> 1. Start by training a machine learning model to identify features of natural language. You can use supervised learning techniques with a corpus of labeled data consisting of texts written by both humans and AI systems.
> 2. Use natural language processing techniques to extract features from the text such as word frequencies, grammar, and syntax.
> 3. Implement a scoring system to assign a numerical score to each text based on the features identified by the model.
> 4. Create a threshold score which will be used to differentiate between texts written by humans and AI systems.
> 5. Test the model on a separate dataset to ensure accuracy.
> 6. Deploy the model in a production environment to detect AI-written texts.
This is from the prompt: "How would you build a tool to detect texts written by artificial intelligence systems?"
It seems like an okay plan, but it doesn't offer any actual insight into the differences between human text and AI text.
For example, a frequent feature I've observed (of course, I'm not alone) is that it tends to generate lists of things and tie them together with "Firstly, item1. Additionally, item2. More so, item3. In conclusion, item4".
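For what it's worth, steps 1-4 of the quoted plan boil down to something like the sketch below (scikit-learn, with a toy two-example corpus and an arbitrary 0.5 threshold standing in for real data and tuning). Word/n-gram frequencies would pick up exactly the kind of "Firstly... Additionally..." tic described above.

    # Hedged sketch of the quoted plan: labeled corpus -> features -> score -> threshold.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Steps 1-2: labeled texts (1 = AI-written, 0 = human) and word/bigram features.
    texts = [
        "Firstly, the plan is sound. Additionally, it scales. In conclusion, it works.",  # toy "AI" sample
        "honestly i just think the ending was rushed, idk",                               # toy human sample
    ]
    labels = [1, 0]

    detector = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),   # word frequencies as crude grammar/syntax proxies
        LogisticRegression(),
    )
    detector.fit(texts, labels)

    # Step 3: the predicted probability is the "score"; step 4: threshold it.
    score = detector.predict_proba(["Firstly, one point. Additionally, another. In conclusion, done."])[0][1]
    print("AI-written" if score > 0.5 else "human-written", round(score, 3))

Steps 5-6 (held-out evaluation and deployment) are the part that actually decides whether any of this is usable; with the two classes this close, the false-positive rate matters more than the headline accuracy.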
Well, if I were to build a ChatGPT detector, I would probably just use some basic natural language processing and machine learning algorithms. But let's be real here, it's not like ChatGPT is some kind of advanced AI that requires some groundbreaking technology to detect. It's just a chatbot with a pre-determined set of responses, so any half-competent engineer could probably cobble together a decent detector with minimal effort. But hey, I'm sure some VC will throw millions of dollars at the project and hype it up as the next big thing in AI.
Any proposed solution would only serve to make the next iteration of the model better at avoiding detection (e.g. incorporating a GAN-style training cycle).
Just as with Poe's law, there simply comes a point at which it becomes impossible to recognise AI-generated texts just by analysing them. Made-up references might be a clue, but that's very context-dependent.
For example, wanting to prevent students from cheating is probably a problem specific to education. If you want to know whether a student's essay is their own words, you can ask them questions about it. You can have them verbally defend their arguments and research. You could supervise how and when they write: requiring laptops without internet access, or pencil and paper...
(Prompt: Respond to the above in the style of 4chan, but use punctuation.)
Actual opinion: I think there’s a good chance of being able to recognize ChatGPT’s writing in most cases, given enough training data, despite the possible styling variations. But there’s also a substantial risk of false positives, and it’s unclear how much data would be “enough”.
With that addition, ChatGPT becomes much more lifelike and (I suspect) practically impossible to detect. Throw in a little human editing and curation and… ruh roh.
Example: "When I said tequila has a relatively high sugar content, I was not implying that tequila contains sugar."
-- from "Neckbeard or AI? Is there even a difference?" https://www.lesswrong.com/posts/goC9qv4PWf2cjfnbm/did-chatgp...
For example, suppose that OpenAI did the simplest thing, and always returned the most likely next word (in other words, sampling with temperature = 0). In that case, the perplexity of ChatGPT's outputs would be consistently lower than that of human speech, even according to a weaker language model.
In reality, ChatGPT's sampling is surely more subtle than this, but it is still unlikely to exactly match its predicted distribution - OpenAI wants to produce useful responses, so it will sample in a way that favors more probable continuations. I would expect that if you passed ChatGPT's outputs to a weaker language model and evaluated their likelihood, and compared it to the likelihood of a known-human corpus, it would quickly become apparent which is which.
After all, OpenAI's goal is for ChatGPT to sound human to a human reader (in addition to other goals like being helpful, being honest, not providing harmful information, etc.), not to be statistically indistinguishable from a human to a language model.
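A rough sketch of that comparison, using GPT-2 via Hugging Face transformers as the stand-in "weaker language model" (the model choice, and the idea of comparing score distributions rather than thresholding a single text, are my assumptions):

    # Average per-token negative log-likelihood of a text under a weaker LM (GPT-2 here).
    # Consistently low surprise across many samples suggests low-temperature machine text.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def avg_nll(text: str) -> float:
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)   # loss = mean cross-entropy over the tokens
        return out.loss.item()

    # Compare the distribution of scores over a known-human corpus vs. suspected outputs.
    print(avg_nll("The committee will convene on Tuesday to review the revised proposal."))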
We're looking at a model that is able to parse the Web (though likely not all of it) and perform human-like actions on that information, and soon enough it will work just like Stable Diffusion, where you can specify a distinct style for the output.
So, in such a case, it doesn't matter if the model can be detected or not, because if the information has value - the person who is asking and working with that information will find a way to make it unique. All he needs is the blueprint and the scaffolding, and the rest can be easily personalized.
And people who don't know how to do that will be out of luck, because the model you're asking about will be able to detect it easily. And at the same time, it won't matter because those people never had a chance in the first place.
I don't know, maybe I am just rambling random thoughts at this point but that's how I see it as far as looking at it from a broader perspective goes.
You can use an LLM (best to use the model underlying ChatGPT, DaVinci-003, itself) and see how unlikely the next predicted token is. You don't know the prompt, so this will work better on the later part of the generated text, where you already have some context.
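Concretely, something like the sketch below, assuming the legacy (pre-1.0) openai Python client's Completions endpoint with echo and logprobs; the field names and the model's availability may well have changed, so treat it as a sketch:

    # Sketch: how surprising is each token of a suspect text to text-davinci-003?
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def token_logprobs(text: str):
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=text,
            max_tokens=0,     # score the prompt itself, don't generate anything
            echo=True,
            logprobs=0,
        )
        return resp["choices"][0]["logprobs"]["token_logprobs"]  # first entry is None

    lps = token_logprobs("Some suspect paragraph goes here.")
    later = [lp for lp in lps[len(lps) // 2:] if lp is not None]  # focus on the later, better-contexted tokens
    print(sum(later) / len(later))  # closer to 0 = less surprising to the model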
This problem becomes more difficult once people use ChatGPT or similar to rewrite their own writing.
A lot of suggestions here talk about the consistent stylistic choices that ChatGPT makes, like its lists or other particular mannerisms. I'd argue these are simply artefacts of it being fine-tuned on a large number of 'well-behaved' examples from OpenAI. This phenomenon is called partial mode collapse; this article does a great job discussing it with respect to GPT-3 [0].
Of course you could train a model to detect when this mode collapse occurs and use that to flag ChatGPT. The un-finetuned model, however, does not have these problems, so it's only a matter of OpenAI improving their fine-tuning dataset to return to an 'undetectable' AI.
[0] https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-...
You can also foil such a detector by asking the AI to generate text in a certain way. Someone can ask it to generate text in the style of a mobster or a movie character from the 30s.
> AI: Yes, our website is currently up and running. You may have encountered an issue due to a temporary technical glitch. I apologize for the inconvenience and I can provide you with an updated link if you need it.
or
> AI: I apologize if that was unclear. Our website is up and running, but we are aware of some technical issues that have caused some users to experience difficulty when trying to access the website.
This will be extremely painful though, and make discoverability impossible for new artists and writers.
Will we go back to the gatekeeper model of public exposure?
Will human work begin to be shunned?
It seems people are blindly following ChatGPT like a new religion. It's not that we're passing the Turing test because computers are smarter, but because humans are at the same time getting dumber.
To build a ChatGPT detector, you would first need to collect a large amount of text that was generated by ChatGPT. This could be done by providing a variety of different prompts to the ChatGPT model and collecting the responses.
Next, you would need to train a machine learning model on this text. The most common approach for this kind of task is to use a deep learning model, such as a recurrent neural network (RNN). The RNN would be trained to take in a piece of text and predict whether it was written by ChatGPT or not.
As for the accuracy of such a model, it would depend on a variety of factors, including the quality and quantity of the training data, the specific model architecture and hyperparameters used, and the skill of the person training the model. With a large and diverse training dataset, and appropriate model design and training, it is likely that a ChatGPT detector could achieve relatively high accuracy.
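As a concrete (and heavily simplified) illustration of the description above, here is what such an RNN detector might look like in PyTorch; the character-level encoding, the layer sizes, and the absence of any real training data are all placeholders:

    # Sketch of the described detector: an LSTM reads the text and outputs P(written by ChatGPT).
    import torch
    import torch.nn as nn

    class ChatGPTDetector(nn.Module):
        def __init__(self, vocab_size=128, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)

        def forward(self, token_ids):                 # (batch, seq_len) of token ids
            x = self.embed(token_ids)
            _, (h, _) = self.rnn(x)                   # final hidden state summarizes the text
            return torch.sigmoid(self.head(h[-1]))    # probability the text is ChatGPT-written

    def encode(text, max_len=256):                    # crude character-level tokenizer
        return torch.tensor([[min(ord(c), 127) for c in text[:max_len]]])

    model = ChatGPTDetector()                          # untrained, so the output is ~0.5
    print(model(encode("As an AI language model, I cannot...")).item())

In practice a fine-tuned transformer classifier would likely beat an RNN trained from scratch, but the training-data caveats above apply either way.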
Developing your own intuition for the world will help you better filter bad gossip and bad AI... ultimately, to me, if the AI is correct then it's welcome.
    random_str = generate_random_string()
    output = get_response(f"Reverse the value '{random_str}'")
    assert output.strip() == random_str[::-1]
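A self-contained version of that probe; generate_random_string and get_response aren't defined above, so here the chat system being tested is represented by any ask(message) callable, and the canned bot is just a stand-in for illustration:

    # Reversal probe: send a random string and check for an exact reversal in the reply.
    import random
    import string

    def reversal_probe(ask, length=12):
        random_str = "".join(random.choices(string.ascii_lowercase, k=length))
        output = ask(f"Reverse the value '{random_str}'")
        return output.strip() == random_str[::-1]

    # A canned "support bot" like the ones quoted above will fail the probe:
    canned_bot = lambda msg: "I apologize for the inconvenience. Our website is up and running."
    print(reversal_probe(canned_bot))  # False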