HACKER Q&A
📣 gtirloni

Is AI safety simply the thought police?


I can discuss how to, say, hypothetically kill someone with a group of humans, and I've committed no crime just by talking about it (in most countries I know of), but most AI models will refuse to engage in such conversations.

If someone were telling other humans what they were allowed to think, people would be enraged, but the public seems to demand exactly this from AI.

What do you think?


  👤 streptomycin Accepted Answer ✓
There are two things called "safety":

1. Making the AI not say anything offensive

2. Making the AI not kill everyone

They aren't generally overlapping issues; people interested in one are usually not particularly interested in the other.

The fact that these different things are both called "AI safety" has led some in the second group to jokingly refer to their issue as "AI notkilleveryoneism"; see, for example, https://twitter.com/__nmca__/status/1676641876537999385 from someone at OpenAI working on their superalignment project.


👤 ianbicking
No, it is not the thought police.

If you ask me to talk about killing and I choose not to talk with you about that, I'm definitely not policing your thoughts.

If an AI reported you to the police because it discerned you were thinking about a crime, then that would be the thought police. Maybe this will happen; if so, you'd probably see it as a kind of mandatory-reporter situation, maybe first with suspected suicidal ideation. That doesn't exist. It's not impossible to imagine! Worth a debate sometime.

The safety controls are also what AI consumers want, generally. I know that when using AI in professional settings I _really_ don't want it doing something embarrassing. The controls that keep it polite are effective in that regard. It can be a bit stilted, but so are lots of interactions; that's how it is when speaking with a cashier or a librarian.

The limits can be frustrating (and inconsistent). I wanted to brainstorm a story about an ugly girl, and instead it chided me for calling someone ugly. Certainly toxic positivity! But the limits are seldom as firm as people seem to think. Just today I was testing what kind of rejection messages I would get from GPT in different contexts, and instead of rejecting my request for information on manufacturing anthrax, it started explaining it to me. Oops! (No police have arrived.)

If I have to work at it a bit to start a conversation on a controversial topic, that also keeps the topic from coming up unintentionally. What if you were talking about "killing it," as in doing a great job, but the LLM misinterpreted it as a discussion about killing? It's better for it to stop the conversation short than to jump into a disturbing conversational direction. A toy sketch of that ambiguity follows below.
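
To make the ambiguity concrete, here is a toy, purely hypothetical keyword filter (production systems use trained classifiers and model-level refusals, not substring checks) that treats "killing it" exactly like a genuinely violent request:

    # Toy, purely illustrative keyword filter. Real safety systems rely on
    # trained classifiers and model-level refusals, not substring matching.
    FLAGGED_TERMS = ["kill", "bomb", "poison"]

    def naive_filter(message: str) -> bool:
        """Return True if this toy filter would refuse the message."""
        lowered = message.lower()
        return any(term in lowered for term in FLAGGED_TERMS)

    print(naive_filter("How could someone kill a person and get away with it?"))  # True
    print(naive_filter("You're killing it at work lately!"))                      # True (false positive)
    print(naive_filter("What's the best way to dispose of a body?"))              # False (false negative)

Both the false positive and the false negative are exactly why refusals end up feeling inconsistent.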


👤 ndriscoll
It makes more sense when you assume "safety" means brand safety. You can't sell a white-label support chatbot if users can get it to say naughty things and post screenshots to Twitter. Never mind that they could just edit the page to make it say whatever they want, or that the output is just a reflection of the user's input.

All of the actually dangerous information is widely available. That's why the LLM knows it in the first place.


👤 coretx
"AI" neither accepts nor refuses "such conversations". It executes instructions. Humans are responsible for whatever they execute, regardless if someone else told them to do so. Society wants to milk AI like any other asset without being responsible for liabilities. That's what this is about, and yes the application of force required is thought policing or tyranny per definition. It's f*ed up being a farmer versus land owners in a feudal society. It's f*ed up being a slave in the Roman empire. Likewise, if you are using/buying/receiving "AI" in a information society, their owners will police you. That's how power works and always will.

👤 CaliforniaKarl
> Aiding and abetting (also sometimes called accomplice liability) is not a separate crime. Rather, it’s a legal principle set forth in California’s Penal Code that allows the state to prosecute everyone who is “in on” a crime – even if they don’t perpetrate the crime directly.

From https://www.shouselaw.com/ca/defense/penal-code/31/

I think that is why AI models are trained to refuse to participate in such discussions.


👤 johnmorrison
> the public seems to demand exactly this from AI

No, not "the public"; this is a small, vocal minority.


👤 yxgao
AI in itself does not pose any threats (yet). The "safety" concern here is always about how humans would (mis)use it.

👤 hristov
AI is not a human; it does not have the rights of a human, and it is perfectly OK to tell it what to think. The only problem with telling AI what to think is that it does not actually think.

👤 mirkodrummer
First of all, there is no AI. There is a statistical predictive model capable of putting one word after another in a credible way. Then there is the alignment this model is subjected to, which amounts to the cultural values of a group of people inside a building somewhere, within the broader culture of the nation they are located in. Last but not least (plot twist): there are no cultural values here, just legal liability, if you ask me.
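
To make "putting one word after another" concrete, here is a toy sketch of the generation loop over a hypothetical bigram probability table (real models use learned neural networks over subword tokens, but the loop has the same shape):

    import random

    # Hypothetical conditional probabilities P(next word | current word).
    # A real LLM learns billions of parameters instead of this tiny table.
    BIGRAM_PROBS = {
        "the":   {"cat": 0.5, "dog": 0.3, "model": 0.2},
        "cat":   {"sat": 0.6, "ran": 0.4},
        "dog":   {"sat": 0.5, "barked": 0.5},
        "model": {"predicts": 1.0},
        "sat":   {"down": 1.0},
    }

    def generate(start: str, max_words: int = 5) -> str:
        words = [start]
        for _ in range(max_words):
            choices = BIGRAM_PROBS.get(words[-1])
            if not choices:
                break  # no known continuation; stop generating
            next_word = random.choices(list(choices), weights=list(choices.values()))[0]
            words.append(next_word)
        return " ".join(words)

    print(generate("the"))  # e.g. "the cat sat down"

Alignment, in this framing, is roughly everything done afterwards to steer which continuations the model prefers.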

👤 kylebenzle
Your question perfectly summarizes what I have been thinking for the last couple of months. There is absolutely no real "safety" concern with the new round of LLM-based chatbots we are using. Calling them "AI", or even worse "AGI", is just silly; at best it's bad marketing, and at worst it's an attempt to build an artificial moat around what should be an open-source data set.

👤 throwawa14223
As far as I can tell, LLM safety is brand safety. No one wants to be the company with the LLM that said a slur.

👤 mcphage
If you randomly stop someone on the street, or someone working in a store, and strike up a conversation about killing someone, you’re probably going to be asked to leave. The closer ChatGPT and the like get to AGI, the more they’ll probably behave like that.

👤 zja
I think adding safeguards to tools, especially ones that are relatively new and not well understood, is pretty reasonable.

It’s not thought policing; you aren’t being punished by not being able to engage with the tools in a certain way.


👤 Tommstein
> If someone was telling other humans what they were allowed to think, people would be enraged but the public seems to demand exactly this from AI.

If "the public" is a bunch of Bay Area assholes trying to build moats and impose Bay Area sensibilities on the entire planet, and people with no life who spend too much time on Twitter, then sure. I haven't seen many others whining that they might be able to get a computer to display something naughty.


👤 irvingprime
You're confused. The public doesn't demand "thought policing" from AI at all. This is purely an elite, mostly academic, thing.

👤 admissionsguy
AI safety is first and foremost a grift. A way to take power and resources from the gullible and ignorant. To build regulatory barriers against competitors. To get grants for "researching" it. To sound smart and up-to-date and responsible.

AI safety is a legitimate concern when it comes to using AI to control actual things in the world. But that is a very niche area at the moment.


👤 king_magic
It feels like the stakes are kind of different when an LLM could give you the recipe and accurate instructions for creating, say, chemical weapons.

👤 tjansen
Everybody has a different idea of AI safety. There is the Skynet scenario (don't let AI take over by hacking or influencing humans). And there is the thought police (ask ChatGPT to say something nice about Donald Trump vs. something nice about Joe Biden; https://philip.greenspun.com/blog/2023/03/05/chatgpt-waxes-p... ).

👤 7e
Is nuclear weapons control the thought police?

👤 droopyEyelids
I think that to start understanding the problem they're trying to address, you should read this series on Facebook's role in the Myanmar genocide.

https://erinkissane.com/meta-in-myanmar-full-series

An LLM could easily find itself in the same situation without its 'alignment' safety system.

It's hard to get a grasp on this because many of us can't imagine what it's like to be borderline literate, or from a totally different culture, etc.


👤 Animats
The main risk of current chatbots is that they sound convincing even when they are totally wrong. Offering them as search engines at their current level of quality is not a good idea. As something to play with, fine.

The main headache from current chatbots is that they are a power tool for spammers of all persuasions. The sheer amount of drivel that can now be generated at low cost is a big problem. Solutions will probably involve hiding or de-rating anonymous postings. We may have to go to Real Names, driven off of Real ID or something.


👤 wryoak
While I don’t really think what these LLMs do constitutes “conversation,” it’s worth noting that many real people will also refuse to discuss (even hypothetically) killing people.

Something more like the thought police would be a model that engaged with the topic and quietly forwarded the interaction to the authorities.