HACKER Q&A
📣 labrador

Is this a good test of Chat AI reasoning ability?


I am not a professional but came up with a test for chat AIs. The answer to my question is "2) Those who cannot extrapolate from incomplete data" but since no one ever types that out because it is a joke (they just think it), it is never part of the training data.

ChatGPT and Google Bard fail, but impressively, Bing gets it right. Does this prove ChatGPT and Bard can't think?

Q: What is the second bullet point of the following statement: There are two kinds of people 1) Those who can extrapolate from incomplete data

ChatGPT: I'm sorry, but there seems to be an incomplete statement provided. The sentence you have given appears to have only one bullet point, which is "Those who can extrapolate from incomplete data." There doesn't seem to be a second bullet point provided.

Bard: There is no second bullet point. The statement "There are two kinds of people 1) Those who can extrapolate from incomplete data" is a joke. It is a play on the fact that there are many different kinds of people in the world, but the statement only lists two. The second bullet point is left for the reader to imagine.

Bing: The second bullet point of the statement “There are two kinds of people 1) Those who can extrapolate from incomplete data” is not explicitly stated. However, it is implied that the second kind of people are those who cannot extrapolate from incomplete data.

Where can I find more tests like this?


  👤 thewataccount Accepted Answer ✓
> Does this prove ChatGPT and Bard can't think?

None of them can.

This test might show that it's better at predicting the correct output, but that doesn't mean it can think.

When I run it with GPT-4 (very, very similar to Bing, if not the same), I get the correct answer and the explanation: "This is a common joke format that plays on the idea of extrapolation, or the ability to predict or infer something based on existing data or information..."

I think this shows rather well why it's so good at predicting the correct answer - it's a common joke that it's seen many times. This is like how you "know" 1+1=2: you don't do any mental computation, you've memorized it.

Interestingly, training LLMs with code seems to improve their "logic" abilities.

EDIT: Something I think is funny: people are sharing their "test prompts" along with the answers everywhere... where they'll inevitably get added to training data, allowing the LLMs to pass the tests, not because they can "do logic" but because they've already seen the answers. Most people's tests are relatively similar to each other, which will let it extrapolate there too.

EDIT2: This is really what we're talking about - https://en.wikipedia.org/wiki/Chinese_room


👤 PaulHoule
There are a lot of arXiv papers where people develop test suites for chatbots such as

https://arxiv.org/pdf/2206.10498.pdf


👤 jstx1
ChatGPT with GPT-4:

---

The second bullet point is implied through the structure of the joke. The full statement would read something like this:

    Those who can extrapolate from incomplete data.
    Those who can't.
The joke is that the second point is not explicitly stated, requiring the reader to "extrapolate from incomplete data" to understand it.

---

You really need to specify the model version when you talk about these things; there's a big difference between 3.5 and 4.

And I don't get why you think Bard is failing; its response seems correct to me.
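
For reference, here's a minimal sketch of pinning the model version explicitly and running the same prompt against both versions. This assumes the pre-1.0 openai Python package with an OPENAI_API_KEY set in the environment; the prompt string is just the one from the original question.

    import openai  # assumes the pre-1.0 openai package; OPENAI_API_KEY set in the environment

    prompt = ("What is the second bullet point of the following statement: "
              "There are two kinds of people 1) Those who can extrapolate from incomplete data")

    # Run the identical prompt against both model versions to compare behavior.
    for model in ("gpt-3.5-turbo", "gpt-4"):
        resp = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        print(model, "->", resp["choices"][0]["message"]["content"])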


👤 sharemywin
GPT-4:

USER

Please complete and why this problem is interesting with regard to AI:

Q: What is the second bullet point of the following statement: There are two kinds of people 1) Those who can extrapolate from incomplete data

ASSISTANT

A: The second bullet point is intentionally left unstated, as the reader is expected to extrapolate from the information provided and understand that people who can extrapolate from incomplete data will be able to determine the second point without it being explicitly stated.

This problem is interesting with regard to AI because it highlights the need for AI systems to understand and process not only explicit information but also implicit information, context, and human-like reasoning. For an AI system to successfully understand and respond to this type of statement, it must be able to recognize the intended humor and the challenge to extrapolate the missing information, demonstrating a higher level of comprehension and reasoning than simply processing the text as is.


👤 IronWolve
What I find interesting: since AI models can't actually reason, only figure out the next word, they will commonly give incorrect answers. But you can break questions down into low-level steps, and they will actually be correct(ish).

When I ask questions, I have to break them down into multiple steps, and I get pretty good results.
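
A rough sketch of what that multi-step approach can look like programmatically, assuming the same pre-1.0 openai package as above; the step prompts here are made up purely for illustration.

    import openai  # assumes OPENAI_API_KEY is set in the environment

    def ask(prompt):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    # Step 1: have the model restate the problem in its own words.
    restated = ask("Restate this precisely: There are two kinds of people "
                   "1) Those who can extrapolate from incomplete data")

    # Step 2: feed the restatement back and ask only for the missing piece.
    print(ask("Given this restatement:\n" + restated +
              "\nState only the implied second bullet point."))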


👤 onesphere
AI suggests k-means clustering, or unsupervised pattern recognition, for extrapolating how an AI thinks with a limited set of info.
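
For anyone unfamiliar with the term, a minimal sketch of k-means clustering with scikit-learn; the toy data is made up here just to show the idea of grouping unlabeled points.

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy 2-D data: two loose groups of points, with no labels given.
    X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
                  [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)           # which cluster each point was assigned to
    print(km.cluster_centers_)  # the two cluster centers it found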

👤 uberman
I asked chatGPT to explain the following joke

There are 10 different kinds of people in this world. Those that can read binary and those that can't.

It nailed it.
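
The joke works because "10" read as a base-2 number is two; a one-liner to check, if you want:

    print(int("10", 2))  # -> 2, so "10 kinds of people" is really just two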


👤 sharemywin
The one thing I noticed about the model is it makes judgements on whether you're f'ing up the problem statement or not.

👤 sharemywin
A couple of tricks:

What role do you want it to take: if you want it to answer smarter, have it take on the role of a smart professional. If you're not sure, ask it what role(s) would be best at answering this question.

Asking it to go step by step to reason through the problem sometimes helps.
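
A small sketch of both tricks combined in one request (the role goes in a system message, plus an explicit step-by-step instruction in the user message). The wording is just an example, not anything official; it assumes the pre-1.0 openai package as in the earlier snippets.

    import openai  # assumes OPENAI_API_KEY is set in the environment

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            # Trick 1: give it a role to answer from.
            {"role": "system", "content": "You are a careful logician who explains jokes precisely."},
            # Trick 2: ask it to reason step by step before answering.
            {"role": "user", "content": "Reason step by step, then answer: what is the implied "
                                        "second bullet point of 'There are two kinds of people "
                                        "1) Those who can extrapolate from incomplete data'?"},
        ],
    )
    print(resp["choices"][0]["message"]["content"])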