There are other, better-defined intelligence tests, such as the "Winograd Schema" [1], which poses grammatical questions that require real-world knowledge and common sense to answer, such as:
"In the sentence 'The trophy would not fit in the suitcase because it was too big/small.' what does big/small refer to?"
But even these types of questions, which were still being called "The Sentences Computers Can't Understand, But Humans Can" as late as 2020 [2], seem to be easy for LLMs to answer, with GPT-4 topping the leaderboard at near-human accuracy [3].
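To make the format concrete, here is a minimal sketch of probing a chat model with the trophy/suitcase pair. It assumes the OpenAI Python SDK and an API key in the environment; the "gpt-4o" model name is illustrative, not a claim about which model the leaderboard result used.

```python
# Minimal sketch: ask a chat model a Winograd-style pronoun-resolution question,
# once for each adjective, and see whether the referent flips as a human's would.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = "The trophy would not fit in the suitcase because it was too {adj}."

def resolve_pronoun(adj: str) -> str:
    """Ask the model what 'it' refers to when the adjective is `adj`."""
    prompt = (
        f"In the sentence \"{SCHEMA.format(adj=adj)}\" "
        "what does 'it' refer to? Answer with a single word."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any GPT-4-class chat model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# A human answers "trophy" for "big" and "suitcase" for "small";
# the Winograd schema hinges on this one-word swap flipping the referent.
for adj in ("big", "small"):
    print(adj, "->", resolve_pronoun(adj))
```

The point of the schema is that swapping a single word changes the correct answer, so the question can't be solved by surface statistics over the sentence alone.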
So, assuming ChatGPT isn't an AGI (yet), how can we prove it to ourselves? At which linguistic tasks are humans qualitatively better than LLMs?
------
[0] https://en.wikipedia.org/wiki/Eugene_Goostman

[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge

[2] https://www.youtube.com/watch?v=m3vIEKWrP9Q

[3] https://paperswithcode.com/sota/common-sense-reasoning-on-winogrande
The criteria listed on Wikipedia are:
> The evaluator would be aware that one of the two partners in conversation was a machine
> If the evaluator could not reliably tell the machine from the human, the machine would be said to have passed the test.
ChatGPT isn't there by a long shot. If you aren't really considering that an online agent may be a bot and are just casually interacting with it, then it might pass. But within 10 minutes it is not going to become anywhere near as friendly as a person normally would over the same 10 minutes of conversation, especially if you know that one of the participants is a bot and are looking for differences.
The Turing Test is not about casually passing as human in casual interaction. It is about stumping an evaluator who knows that one of the participants is a machine and is dedicated to rooting out which one.
That’s my philosophical pondering over the issue…