Even if you gave a super-LLM agency and a body, and it moved out to a little shack on a lake, got married, raised children, and started publishing sublime philosophy and fiction, I still wouldn't be convinced.
There's just too much human data available. Anything could be probabilistic word selection... Maybe the distinction is not really important anyway.
Will it become curious or contemplate like a human?
When does it drive by itself, rather than requiring a prompt to do anything?
The obvious problem is inventing a question like that. (You have to invent it; you can't find one online...)
What matters is whether the outputs are useful, and the outputs don't change based on whether you call it "thought", "AGI", or "probabilistic word selection".