HACKER Q&A
📣 abrax3141

What can we learn about human cognition from the performance of LLMs


Some hypotheses (adapted from other posts):

* We have learned that Spreading Activation, when applied through a high-dimensional, non-symbolic network (the network formed by embedding vectors), may be able to account for abstraction in fluent language (a toy sketch of this appears after the list).

* We have learned that "fluent reasoning" (sometimes called "inline" or "online" reasoning), that is, the shallow reasoning embedded in fluent language, may be more powerful than usually thought.

* We have learned that "talking to yourself" (externally, in the case of GPTs, and potentially also internally in the case of human's "hearing yourself think") is able to successfully maintain enough short-term context to track naturally long chains of argument (via contextually-guided fluent reasoning, as above).

* We have learned that, to some extent, powerful "mental models" that support (again, at least fluent) reasoning can be functionally represented and used in a highly distributed system.

* We have learned that meta-reasoning (which the LLMs do not do) may be important in augmenting fluent reasoning, and in tracking extended "trains of thought" (and thus extended dialogues).

* We have a new model of confabulation that fits into the fluent language model as implemented by LLMs.

* We have learned that people's "knowledge space" is quite amazing, given that they have ~10x the current LLM parameter count (~10T for an LLM, whereas an individual has potentially ~100T cortical parameters -- depending on what you count, of course), yet a given individual only encodes a small number of languages and a small number of domains to any great depth (in addition to the standard operating procedures that almost all people encode). [That is, vs. the LLM encoding the whole damned internet in ~10 different languages.]
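
As a toy illustration of the Spreading Activation bullet above: the Python sketch below spreads activation through an embedding-vector "network" whose link strengths are cosine similarities. The four-dimensional vectors, the word list, and the decay factor are all invented for the example; in a real LLM the vectors would be learned embeddings and the "links" would be implicit in attention rather than explicit edges.

    import numpy as np

    # Hypothetical 4-d "embeddings" -- entirely made up for illustration.
    emb = {
        "doctor":   np.array([0.9, 0.1, 0.0, 0.2]),
        "nurse":    np.array([0.8, 0.2, 0.1, 0.3]),
        "hospital": np.array([0.7, 0.0, 0.3, 0.4]),
        "guitar":   np.array([0.0, 0.9, 0.1, 0.0]),
    }

    def cos(a, b):
        """Cosine similarity: the 'link strength' between two embedding vectors."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def spread(source, steps=2, decay=0.5):
        """Spread activation from `source` through similarity-weighted links."""
        act = {w: (1.0 if w == source else 0.0) for w in emb}
        for _ in range(steps):
            new = dict(act)
            for w, a in act.items():
                if a == 0.0:
                    continue
                for v in emb:
                    if v != w:
                        # Each hop passes on a decayed, similarity-weighted share.
                        new[v] += decay * a * cos(emb[w], emb[v])
            act = new
        return act

    # Words similar to "doctor" light up strongly; unrelated ones barely at all.
    print(sorted(spread("doctor").items(), key=lambda kv: -kv[1]))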

What else? (And, of course, it goes w/o saying that you'll argue about the above :-)


  👤 tlb Accepted Answer ✓
Transformer-based LLMs define a theory of time: each token representation has added to it a vector full of sincos(wt) for a set of frequencies w, after which order is ignored. (Each sincos defines 2 elements of the vector: sin(wt) and cos(wt). Use e^iwt if you prefer to think in complex numbers.)

So in "Your heart is stronger than your head", heart and head are 5 words apart, or ~8 tokens. So one gets sincos(w(t+0)), the other gets sincos(w(t+8)). That's the only thing that distinguishes it from the converse sentence, "Your head is stronger than your heart."

Chomsky had a much more symbolic theory of grammar. The fact that ChatGPT can answer questions about the above sentences (try them!) with order defined only by relative timestamps is remarkable.

Interestingly, if you throw in some extra words, like "Bob's head is stronger (and more cromulent) than his heart", it fails to answer questions about which is stronger. Possibly the extra tokens push the sincos terms it had learned to use for "A is Xer than B" statements all the way around the circle.

It'd be interesting to devise similar tests for people, to see what extraneous parentheticals can confuse them.


👤 ted_bunny
Surfing Uncertainty makes a case that the whole damn brain seems to work on a predictive model. I found it convincing, but I'm very much a layman, so that's all I can say with half confidence.