HACKER Q&A
📣 consumer451

How hard would it be for a group to alter the opinions of an LLM?


I have been thinking about how realistic it is for an external actor to "hack" an LLM for real. Not a "jailbreak," but an actual modification of the weights.

ChatGPT seems to provide a vector via the public thumbs up/down feature.

I imagine that this will be an even larger problem for open-source ChatGPT-esque efforts.

Will this be the new playground for every intelligence agency, PR firm, and SEO consultant?

HN'ers very familiar with LLMs, how plausible is this type of manipulation?


  👤 YourDadVPN Accepted Answer ✓
Should be impossible: the model won't have write access to its own weights when it isn't being trained. If you hacked the machine it was running on and gained filesystem write permission, sure, but you're not getting GPT to update its own weights, no matter how good you are at prompt engineering.
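To make that concrete, here's a toy sketch in plain Python (not a real LLM or RLHF pipeline; the class and function names are made up for illustration). The point is structural: inference only reads the weights, and thumbs-up/down feedback can only move them when the operator runs a separate training job.

```python
# Toy illustration: weights change only inside an explicit,
# operator-run training step; serving/inference just reads them.

class ToyModel:
    def __init__(self):
        self.weights = [0.5, -0.2]  # frozen at serve time

    def generate(self, inputs):
        # Inference: read-only use of the weights.
        return sum(w * x for w, x in zip(self.weights, inputs))

def rlhf_update(model, feedback, lr=0.1):
    # Thumbs up/down only matters when the operator feeds collected
    # feedback into a training run like this one. The deployed model
    # never calls this on itself mid-conversation.
    for i in range(len(model.weights)):
        model.weights[i] += lr * feedback

model = ToyModel()
before = list(model.weights)

model.generate([1.0, 2.0])        # serving a request: weights untouched
assert model.weights == before

rlhf_update(model, feedback=+1)   # separate training step: weights move
assert model.weights != before
```

So the realistic attack surface isn't the model "hacking itself" via prompts; it's poisoning the feedback data that the operator later trains on.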

👤 uberman
Was MS Tay driven by an LLM? People killed her in like a weekend. It seems like Sydney is headed for the same end.