If someone were telling other humans what they were allowed to think, people would be enraged, but the public seems to demand exactly this from AI.
What do you think?
There are two very different things that get called "AI safety":
1. Making the AI not say anything offensive
2. Making the AI not kill everyone
They aren't generally overlapping issues; people interested in one are usually not particularly interested in the other.
The fact that these different things are both called "AI safety" has led some in the second group to jokingly refer to their issue as "AI notkilleveryoneism"; see for example https://twitter.com/__nmca__/status/1676641876537999385 from someone at OpenAI working on their superalignment project.
If you ask me to talk about killing and I choose not to talk with you about that, I'm definitely not policing your thoughts.
If an AI reported you to the police because it discerned you were thinking about a crime, then that would be the thought police. Maybe this will happen; if so, you'd probably see it as a kind of mandatory-reporter situation, maybe first with suspected suicidal ideation. But that doesn't exist today. It's not impossible to imagine! Worth a debate sometime.
The safety controls are also what AI consumers want, generally. I know that when I use AI in professional settings, I _really_ don't want it doing something embarrassing. The controls that keep it polite are effective in that regard. It can be a bit stilted, but that's how lots of interactions are; that's how speaking with a cashier or a librarian goes.
The limits can be frustrating (and inconsistent). I wanted to brainstorm a story about an ugly girl, and instead it chided me for calling someone ugly. Toxic positivity, certainly! But the limits are seldom as firm as people seem to think. Just today I was testing what kinds of rejection messages I would get from GPT in different contexts, and instead of rejecting my request for information on manufacturing anthrax, it started explaining it to me. Oops! (No police have arrived.)
When I have to work at it a bit to start a conversation on a controversial topic, that also keeps the topic from coming up unintentionally. What if you were talking about "killing it" in the sense of doing a great job, but the LLM misinterpreted it as a discussion of killing? It's best that it stop the conversation short rather than jump off in a disturbing direction.
All of the actually dangerous information is widely available. That's why the LLM knows it in the first place.
From https://www.shouselaw.com/ca/defense/penal-code/31/
I think that is why AI models are trained to refuse to participate in such discussions.
No, not "the public"; this is a small, vocal minority.
It's not thought policing; you aren't being punished by not being able to engage with the tools in a certain way.
If "the public" is a bunch of Bay Area assholes trying to build moats and impose Bay Area sensibilities on the entire planet, and people with no life who spend too much time on Twitter, then sure. I haven't seen many others whining that they might be able to get a computer to display something naughty.
AI safety is a legitimate concern when it comes to using AI to control actual things in the world. But that is a very niche area at the moment.
https://erinkissane.com/meta-in-myanmar-full-series
An LLM could easily find itself in the same situation without an 'alignment' safety system.
It's hard to get a grasp on this because many of us can't imagine what it's like to be borderline literate, or from a totally different culture, etc.
The main headache from current chatbots is that they are a power tool for spammers of all persuasions. The sheer amount of drivel that can now be generated at low cost is a big problem. Solutions will probably involve hiding or de-rating anonymous postings. We may have to go to Real Names, driven off of Real ID or something.
Perhaps it would be more like thought police if the model engaged with the topic and then quietly forwarded the interaction to the authorities.