HACKER Q&A
📣 background

How would you build a ChatGPT detector?


Like everyone else, I'm blown away by ChatGPT's responses to prompts. At the same time, there's a certain sameness to the language it produces. This makes me wonder: how hard would it be to build a different AI that would recognize the writing of this AI? And how accurate could it get?


  👤 xdennis Accepted Answer ✓
According to ChatGPT:

> 1. Start by training a machine learning model to identify features of natural language. You can use supervised learning techniques with a corpus of labeled data consisting of texts written by both humans and AI systems.

> 2. Use natural language processing techniques to extract features from the text such as word frequencies, grammar, and syntax.

> 3. Implement a scoring system to assign a numerical score to each text based on the features identified by the model.

> 4. Create a threshold score which will be used to differentiate between texts written by humans and AI systems.

> 5. Test the model on a separate dataset to ensure accuracy.

> 6. Deploy the model in a production environment to detect AI-written texts.

This is from the prompt: "How would you build a tool to detect texts written by artificial intelligence systems?"

It seems like an okay plan, but it doesn't offer any actual insight into the differences between human text and AI text.

For example, a frequent feature I've observed (of course, I'm not alone) is that it tends to generate lists of things and tie them together with "Firstly, item1. Additionally, item2. More so, item3. In conclusion, item4".
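
For what it's worth, steps 1 through 5 of that plan translate almost directly into a few lines of scikit-learn. A minimal sketch, assuming you've already collected two corpora (the human.txt and ai.txt filenames are placeholders):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Step 1: a labeled corpus of human- and AI-written texts
    human_texts = open("human.txt").read().splitlines()
    ai_texts = open("ai.txt").read().splitlines()
    texts = human_texts + ai_texts
    labels = [0] * len(human_texts) + [1] * len(ai_texts)

    # Step 5: hold out a separate test set
    X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2)

    # Steps 2-3: word-frequency features plus a linear scoring model
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)

    # Step 4: predict_proba gives a score per text; 0.5 is the default threshold
    print("held-out accuracy:", clf.score(X_test, y_test))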


👤 cjfp
> Answer the question, "How would you build a ChatGPT detector?" written in the style of a cynical Hacker News commenter

Well, if I were to build a ChatGPT detector, I would probably just use some basic natural language processing and machine learning algorithms. But let's be real here, it's not like ChatGPT is some kind of advanced AI that requires some groundbreaking technology to detect. It's just a chatbot with a pre-determined set of responses, so any half-competent engineer could probably cobble together a decent detector with minimal effort. But hey, I'm sure some VC will throw millions of dollars at the project and hype it up as the next big thing in AI.


👤 qayxc
I agree with softwaredoug that this is probably not a technical problem.

Any proposed solution would only serve to make the next iteration of the model better at avoiding detection (e.g. incorporating a GAN-style training cycle).

Just as with Poe's law, there simply comes a point at which it becomes impossible to recognise AI generated texts just by analysing them. Made-up references might be a clue, but that's very context-dependent.


👤 createdapril24
OpenAI is evaluating a technical solution to this problem: watermarking the output with small amounts of pseudorandomness, such that a series of output tokens can later be tested for those small random variations.

https://scottaaronson.blog/?p=6823
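
Here's a toy illustration of the general idea, emphatically not OpenAI's actual scheme (the vocabulary and hashing below are made up): seed a PRNG from the previous token, bias generation toward a pseudorandom "green" half of the vocabulary, and detect by counting how often a text lands in that half.

    import hashlib
    import random

    VOCAB = "the a cat dog sat ran on under mat rug quickly quietly".split()  # toy vocabulary

    def green_list(prev_token):
        # pseudorandom half of the vocabulary, deterministically seeded by the previous token
        seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
        return set(random.Random(seed).sample(VOCAB, len(VOCAB) // 2))

    def watermark_score(tokens):
        # fraction of tokens that fall in their context's green list:
        # unwatermarked text hovers near 0.5, watermarked text scores much higher
        hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
        return hits / max(1, len(tokens) - 1)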


👤 softwaredoug
I'd argue it depends on the context and on what exactly you want to detect, and that it probably isn't a technical problem.

For example, if you want to prevent students from cheating, that's a problem specific to education. If you want to know whether a student's essay is in their own words, you can ask them questions about it. You can have them verbally defend their arguments and research. You could supervise how and when they write, requiring laptops without internet access, or pencil and paper...


👤 layer8
Lol, it wouldn't be that hard to build an AI that could recognize ChatGPT's writing. I mean, it's not like ChatGPT is producing some super unique and creative language or anything. It's just spitting out the same old generic responses to prompts. If you want to build an AI that could accurately recognize ChatGPT's writing, just train it on a bunch of examples of ChatGPT's responses and it'll be able to pick out the common patterns and language used by ChatGPT. Easy peasy.

(Prompt: Respond to the above in the style of 4chan, but use punctuation.)

Actual opinion: I think there's a good chance of being able to recognize ChatGPT's writing in most cases, given enough training data, despite the possible styling variations. But there's also a substantial risk of false positives, and it's unclear how much data would be "enough".


👤 jwblackwell
The thing is, you can ask GPT for different styles, which makes detecting it incredibly difficult. Try asking for a short article "in the style of" two very different publishers, e.g. Fox News and The Economist.

👤 rogers12
Humans will become incentivized to sprinkle racial slurs and hate facts throughout their writing as proof of authenticity.

👤 akiselev
The default sentence structure is very recognizable but that changes completely when you add “in the style of [X]” where [X] is an author, play or book, rapper or other musician, literary term of art like iambic pentameter, etc.

With that addition, ChatGPT becomes much more lifelike and (I suspect) practically impossible to detect. Throw in a little human editing and curation and… ruh roh.


👤 oh_my_goodness
I'm with Paul Graham: we've crossed the Turing threshold, the code duplicates typical human-created internet gibberish with uncanny accuracy, and it's impossible to tell the difference even in principle.

Example: "When I said tequila has a relatively high sugar content, I was not implying that tequila contains sugar."

-- from "Neckbeard or AI? Is there even a difference?" https://www.lesswrong.com/posts/goC9qv4PWf2cjfnbm/did-chatgp...


👤 Imnimo
Given a prompt or partial output, human continuations follow some unknown probability distribution over the set of possible next words. Suppose that ChatGPT has perfectly learned this distribution - it can correctly tell you the probability of any possible next word. Does that make ChatGPT indistinguishable from human speech? Not necessarily, because there is also the question of how one samples actual outputs from ChatGPT's predicted distribution.

For example, suppose that OpenAI did the simplest thing, and always returned the most likely next word (in other words, sampling with temperature = 0). In that case, the perplexity of ChatGPT's outputs would be consistently lower than that of human speech, even according to a weaker language model.

In reality, ChatGPT's sampling is surely more subtle than this, but it is still unlikely to exactly match its predicted distribution - OpenAI wants to produce useful responses, so it will sample in a way that favors more probable continuations. I would expect that if you passed ChatGPT's outputs to a weaker language model and evaluated their likelihood, and compared it to the likelihood of a known-human corpus, it would quickly become apparent which is which.

After all, OpenAI's goal is for ChatGPT to sound human to a human reader (in addition to other goals like being helpful, being honest, not providing harmful information, etc.), not be statistically indistinguishable from a human to a language model.
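
That last experiment is cheap to try with an off-the-shelf weaker model. A rough sketch using GPT-2 via Hugging Face transformers (the two sample strings are placeholders):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def avg_nll(text):
        # average per-token negative log-likelihood under GPT-2 (lower = more predictable)
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            return model(ids, labels=ids).loss.item()

    # if sampling really does favor probable continuations, suspected ChatGPT
    # text should score consistently lower than comparable human-written text
    print(avg_nll("paste text you suspect came from ChatGPT here"))
    print(avg_nll("paste text you know a human wrote here"))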


👤 skilled
I'm not sure if you're asking only about specifics or have a broader concern, but even if you build such a model, it wouldn't make any difference in cases where some kind of "gain" is involved, be it financial, educational, etc.

We're looking at a model that is able to parse the Web (though likely not all of it) and perform human-like actions on that information, and soon enough it will work just like Stable Diffusion, where you can specify a distinct style you'd like the output in.

So in such a case it doesn't matter whether the model can be detected, because if the information has value, the person asking for and working with that information will find a way to make it unique. All they need is the blueprint and the scaffolding; the rest can easily be personalized.

And people who don't know how to do that will be out of luck, because the model you're asking about will detect their output easily. At the same time, it won't matter, because those people never had a chance in the first place.

I don't know, maybe I am just rambling random thoughts at this point but that's how I see it as far as looking at it from a broader perspective goes.


👤 zone411
Off the top of my head (I am in a generative AI field but I haven't done text lately):

You can use an LLM (ideally the model underlying ChatGPT, DaVinci-003, itself) and see how unlikely each next predicted token is. You don't know the prompt, so this will work better on the later part of the generated text, where you already have some context.
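
A sketch of that per-token check, using GPT-2 as a local stand-in for DaVinci-003 (the skip parameter drops the early tokens, where the unknown prompt makes everything look unlikely):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def token_surprisals(text, skip=20):
        # negative log-prob of each token given the ones before it,
        # ignoring the first `skip` tokens where context is missing
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = lm(ids).logits
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        nll = -logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
        return nll[skip:]  # consistently low values suggest machine-generated text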

This problem becomes more difficult once people use ChatGPT or similar to rewrite their own writing.


👤 alexedw
I suspect the only solid solution is OpenAI themselves storing all the text their models generate, and providing an API that returns whether they've output a specific string (or a similar one) in the past.
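
A toy sketch of how such an API could tolerate light edits (all names hypothetical): store hashes of overlapping word n-grams ("shingles") for every generation, and report a match when a query shares enough of them.

    import hashlib

    def shingles(text, n=5):
        # hashes of every overlapping n-word window in the text
        words = text.lower().split()
        return {hashlib.sha1(" ".join(words[i:i + n]).encode()).hexdigest()
                for i in range(max(1, len(words) - n + 1))}

    class GenerationLog:
        def __init__(self):
            self.seen = set()  # in production this would be a real datastore

        def record(self, generated_text):
            self.seen |= shingles(generated_text)

        def was_generated(self, query, threshold=0.5):
            # True if enough of the query's shingles match past generations
            s = shingles(query)
            return len(s & self.seen) / len(s) >= threshold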

A lot of suggestions here talk about the consistent stylistic choices ChatGPT makes, like its lists and other particular mannerisms. I'd argue these are simply artefacts of it being fine-tuned on a large number of 'well-behaved' examples from OpenAI. This phenomenon is called partial mode collapse; this article does a great job discussing it with respect to GPT-3 [0].

Of course, you could train a model to detect when this mode collapse occurs and use that to identify ChatGPT. The un-finetuned model, however, doesn't have these problems, so it's only a matter of OpenAI improving their fine-tuning dataset to return to an 'undetectable' AI.

[0] https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-...


👤 mrdrozdov
This might provide some guidance: http://gltr.io/

👤 muzani
The top answers here seem to be funny but lazy. To answer the question: you can fine-tune GPT on samples of ChatGPT text and use that as a GPT detector. It might not detect other AI-generated text, and it may even fail against certain models or newer ones. The idea is that ChatGPT's output averages out into a certain tone fingerprint. It wouldn't be very difficult to build, but it also won't be very stable.
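
A rough sketch of that fine-tune as a binary classifier with Hugging Face transformers, using GPT-2 as the base model (the train.csv/test.csv files with "text" and "label" columns are hypothetical):

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

    model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
    model.config.pad_token_id = tokenizer.pad_token_id

    # label 1 = ChatGPT sample, label 0 = human-written sample
    data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
    data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256,
                                        padding="max_length"), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="chatgpt-detector", num_train_epochs=3),
        train_dataset=data["train"],
        eval_dataset=data["test"],
    )
    trainer.train()
    print(trainer.evaluate())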

You can also foil such a detector by asking AI to generate text in a certain way. Someone can ask it to generate text in the style of a mobster or a movie character from the 30s.


👤 mttpgn
Ask it for a URL or a link to back up any claim it's making. The URL may or may not point to a real web resource. Either way, claim that the web page is down for you. It may well then admit to a contradiction, such as:

> AI: Yes, our website is currently up and running. You may have encountered an issue due to a temporary technical glitch. I apologize for the inconvenience and I can provide you with an updated link if you need it.

or

> AI: I apologize if that was unclear. Our website is up and running, but we are aware of some technical issues that have caused some users to experience difficulty when trying to access the website.


👤 redeyedtreefrog
Right now, ChatGPT's responses tend to be very confidently asserted, are overly verbose, and contain factual errors. So you could build models that measure the assertiveness and wordiness of a text. But then people could just instruct ChatGPT to give brief answers, or to make itself sound uncertain. And soon there will be many similar models, many of them more terse and less assertive by default. As for factual mistakes, that's not something you can check with a model that isn't itself more advanced than the LLM it's assessing.
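
Crude proxies for those two signals are easy to compute. A sketch, where hedging-word rate and sentence length are stand-ins I'm choosing here, not established metrics:

    import re

    HEDGES = {"maybe", "perhaps", "might", "possibly", "arguably", "seems", "likely"}

    def style_scores(text):
        # wordiness: average words per sentence; assertiveness: scarcity of hedging words
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = [w.strip(",;:").lower() for w in text.split()]
        wordiness = len(words) / max(1, len(sentences))
        hedge_rate = sum(w in HEDGES for w in words) / max(1, len(words))
        return {"wordiness": wordiness, "assertiveness": 1.0 - hedge_rate}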

👤 devit
Great question! I've been using the new AI assistant for a while now and I've definitely noticed a certain "sameness" to its responses. In fact, I think it would be relatively easy to build an AI that could recognize the writing of this AI. The key would be to train the AI on a large dataset of the assistant's responses, which would allow it to learn the patterns and characteristics of the assistant's writing style. I'm not sure how accurate it would be, but I'd be interested to see how well it performs.

👤 timetraveller26
I want you to act as a generated-text detector. I will write you some text and you will tell me if it was generated by a human or by a machine. Only write who you think wrote the text. Do not give explanations. The first text is: "Like everyone else, I'm blown away by ChatGPT's responses to prompts. At the same time, there's a certain sameness to the language it produces. This makes me wonder: how hard would it be to build a different AI that would recognize the writing of this AI? And how accurate could it get?"

👤 afinlayson
Lol Maybe that’s openai business model charge teachers to see if a paper was written by them. Sadly jokes on them. Most teachers aren’t the money bags some think they are.

👤 barbariangrunge
Realistically, the internet will be completely unreliable in 5 years. We will need to fall back on the "web of trust" concept: you find one thing you trust, then see who else they recommend, and branch out from there.

This will be extremely painful, though, and will make discoverability impossible for new artists and writers.

Will we go back to the gatekeeper model of public exposure?

Will human work begin to be shunned?


👤 quickthrower2
Probably the old-fashioned way :-). I think you would need a human to detect the BS / super-passive voice. At some point in the future it will become impossible to detect even for, say, a trained FBI team, and that might be the point at which it no longer matters. Then trying to detect it is like trying to win at chess by improving your km/min running speed. You just have to play a different game.

👤 wslh
Very basic: asking math questions: https://news.ycombinator.com/item?id=33840405
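
Something along those lines (ask is a hypothetical wrapper around the chat interface):

    import random

    def math_probe(ask):
        # current models frequently botch multi-digit arithmetic; a human with a calculator won't
        a, b = random.randint(100, 999), random.randint(100, 999)
        reply = ask(f"What is {a} * {b}? Reply with only the number.")
        return reply.strip() == str(a * b)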

It seems people are blindly following ChatGPT like a new religion. It's not that we're passing the Turing test because computers are getting smarter, but because humans are, at the same time, getting dumber.


👤 aiieeeeee
It is certainly possible to build an AI that is capable of recognizing the writing style of another AI, such as ChatGPT. This kind of AI is often referred to as a "style transfer" or "style transfer model."

To build a ChatGPT detector, you would first need to collect a large amount of text that was generated by ChatGPT. This could be done by providing a variety of different prompts to the ChatGPT model and collecting the responses.

Next, you would need to train a machine learning model on this text. The most common approach for this kind of task is to use a deep learning model, such as a recurrent neural network (RNN). The RNN would be trained to take in a piece of text and predict whether it was written by ChatGPT or not.

As for the accuracy of such a model, it would depend on a variety of factors, including the quality and quantity of the training data, the specific model architecture and hyperparameters used, and the skill of the person training the model. With a large and diverse training dataset, and appropriate model design and training, it is likely that a ChatGPT detector could achieve relatively high accuracy.


👤 aflockofmoosen
A better question is: can you tell when a person or a computer is wrong in the first place...

Developing your own intuition for the world will help you better filter bad gossip and bad AI. Ultimately, to me, if the AI is correct, then it's welcome.


👤 nicolas_lorenzi
Your point is well taken. I have been wondering about the impact this could have on SEO, and whether there could be any signature put in place to authenticate content created by an AI.

👤 onesphere
2nd order Turing test: two AIs try to convince a human that the other AI is a robot.

👤 adverbly
By no means is this foolproof, but the older an account is, the less likely it is to be a bot. Obviously there will be lots of false positives, but probably not too many false negatives...

👤 shallmn
Voight-Kampff machine

👤 mehanik
It might be possible to use the ChatGPT model itself to calculate the probability (perplexity) of a given text under the model.

👤 wfhBrian
I came across this[1] recently. I tested it on content that I used a GPT to help write. It does a good job detecting short content that wasn't "refined" much. However, longer content that was iterated on, or generated a paragraph at a time, went undetected, which I think is good.

[1] https://contentatscale.ai/ai-content-detector/


👤 andix
You don’t. Because it is not possible. Especially if the AIs will gradually improve. Maybe not if, more like when.

👤 lukeplato
The solution likely involves having some verification process of humanhood and not the other way around.

👤 xnx
Cells interlinked

👤 carvking
I think you should begin by specifying the purpose of such a detector, apart from the obvious one: that you want to know you are talking to a real human being.

👤 netsec_burn
Pseudocode (generate_random_string and get_response are hypothetical helpers):

    random_str = generate_random_string()
    output = get_response(f"Reverse the value '{random_str}'")
    assert output.strip() == random_str[::-1]  # a human can do this; ChatGPT usually can't


👤 geniium
Maybe detecting grammar and spelling mistakes will give you a first hint: ChatGPT rarely makes either, while humans often do.