There are other, better-defined intelligence tests, such as the "Winograd Schema" [1], which poses grammatical questions that require real-world knowledge and common sense to answer, such as:
"In the sentence 'The trophy would not fit in the suitcase because it was too big/small.' what does big/small refer to?"
But even these types of questions, which were still being called "The Sentences Computers Can't Understand, But Humans Can" as late as 2020 [2], seem to be easy for LLMs to answer, with GPT-4 topping the leaderboard at near-human accuracy [3].
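To make the format concrete, here is a minimal sketch of probing a chat model with the trophy/suitcase pair. It assumes the OpenAI Python SDK and an API key in the environment; the "gpt-4o" model name is illustrative, not a claim about which model the leaderboard result used.

```python
# Minimal sketch: ask a chat model a Winograd-style pronoun-resolution question,
# once for each adjective, and see whether the referent flips as a human's would.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = "The trophy would not fit in the suitcase because it was too {adj}."

def resolve_pronoun(adj: str) -> str:
    """Ask the model what 'it' refers to when the adjective is `adj`."""
    prompt = (
        f"In the sentence \"{SCHEMA.format(adj=adj)}\" "
        "what does 'it' refer to? Answer with a single word."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any GPT-4-class chat model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# A human answers "trophy" for "big" and "suitcase" for "small";
# the Winograd schema hinges on this one-word swap flipping the referent.
for adj in ("big", "small"):
    print(adj, "->", resolve_pronoun(adj))
```

The point of the schema is that swapping a single word changes the correct answer, so the question can't be solved by surface statistics over the sentence alone.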
So, assuming ChatGPT isn't an AGI (yet), how can we prove it to ourselves? At which linguistic tasks are humans qualitatively better than LLMs?
------
[0] https://en.wikipedia.org/wiki/Eugene_Goostman

[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge

[2] https://www.youtube.com/watch?v=m3vIEKWrP9Q

[3] https://paperswithcode.com/sota/common-sense-reasoning-on-winogrande
The criteria listed on Wikipedia are:
> The evaluator would be aware that one of the two partners in conversation was a machine
> If the evaluator could not reliably tell the machine from the human, the machine would be said to have passed the test.
ChatGPT isn't there by a long shot. If you aren't really considering that an online agent may be a bot and are just casually interacting with it, then it might pass. But within 10 minutes it is not going to become anywhere near as friendly as a person normally would over the same 10 minutes of conversation, especially if you know that one of the participants is a bot and are looking for differences.
The Turing Test is not about casually passing as human in casual interaction. It is about stumping an evaluator who knows that one of the participants is a machine and is dedicated to rooting out which one.
That’s my philosophical pondering over the issue…