HACKER Q&A
📣 xucian

Anyone working on something better than LLMs?


If you think about it, next-token prediction is just stupid. It's enormously resource-intensive.

Yet it mimics emergent thought quite beautifully. It's shockingly unintuitive how a simple process scaled enormously can lead to this much practical intelligence (practical in the sense that it's useful, even if it's not the way we think). I'm aware there are multiple layers, filters, processes, etc.; I'm just talking about the foundation, which is next-token prediction.

When I first heard that it's not predicting words but parts of words, I immediately saw a red flag. Yes, there are compound words like strawberry (straw + berry) where you can capture meaning at a higher resolution, but most words are not compounds, and in general we're trying to simulate meaning instead of 'understanding' it. 'Understanding' simply means knowing a man is to a woman what a king is to a queen, but without the need to learn about words and letters (those should be just an interface).

I feel we're yet to discover the "machine code" for ASI. It's as if we have no compiler and interpret source code directly. Imagine the speed-ups if we could spare the processor from having to understand our stupid, inefficient language.

I'd really like to see a completely new approach that works in the Meaning Space and transcends the imperfect Language Space. This will require lots of data pre-processing, but it's a fun journey: basically a human-to-machine and machine-to-human parser. I'm sure I'm not the first to think about it.

So, what have we got so far?


  👤 plaidfuji Accepted Answer ✓
As others have noted, Yann LeCun is looking beyond autoregressive (next-token-prediction) models. Here’s one of his slide decks that raises some interesting new concepts:

https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...


👤 BugsJustFindMe
> it's shockingly unintuitive how a simple process scaled enormously can lead to this much practical intelligence

A biological neuron doesn't do much. On its own, it's a simple process. Yet when you put 100 billion of them together in the right configuration, each connected to thousands of others, you get a human brain.


👤 stevenAthompson
> in general we're trying to simulate meaning instead of 'understanding' it. 'understanding' simply means knowing a man is to a woman what a king is to a queen, but without the need to learn about words and letters (that should be just an interface).

I have no idea what I'm talking about, but what you describe is exactly what LLMs do.

Words are tokens that represent concepts. We've found a way to express the relationships between many tokens in a giant web. The tokens are defined by their relationships to each other. Changing the tokens we use probably won't make much more difference than changing the language the LLM is trained on.

We could improve the method we use to store and process those relationships, but it will still be fundamentally the same idea: Large webs of inter-related tokens representing concepts.


👤 jtietema
I think you might find Lex Fridman's interview with Yann LeCun interesting [1]. It discusses exactly this: how LLMs merely mimic intelligent behaviour without any real understanding of the world. It also covers other approaches we should look at instead of current LLMs.

[1] https://youtu.be/5t1vTLU7s40


👤 exe34
> 'understanding' simply means knowing a man is to a woman what a king is to a queen

Turns out this is beautifully represented by embeddings alone!
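A toy sketch of that analogy as vector arithmetic (the hand-picked 3-d vectors here are purely illustrative; real models learn embeddings with hundreds of dimensions from data, but the king - man + woman ≈ queen arithmetic works the same way):

```python
import numpy as np

# Toy embeddings; dimensions loosely encode [royalty, gender, person-ness].
# Real embeddings are learned, not hand-crafted like these.
emb = {
    "king":  np.array([0.9,  0.9, 1.0]),
    "queen": np.array([0.9, -0.9, 1.0]),
    "man":   np.array([0.1,  0.9, 1.0]),
    "woman": np.array([0.1, -0.9, 1.0]),
    "apple": np.array([0.0,  0.1, 0.2]),  # unrelated filler word
}

def nearest(vec, exclude):
    """Word whose embedding has the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max((w for w in emb if w not in exclude),
               key=lambda w: cos(emb[w], vec))

# king - man + woman lands closest to queen
result = nearest(emb["king"] - emb["man"] + emb["woman"],
                 exclude={"king", "man", "woman"})
print(result)  # queen
```

This is the classic word2vec-style demonstration: the analogy is encoded as a consistent offset in the embedding space, no letters involved.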


👤 waldrews
Meaning Space transcending the imperfect Language Space? Yes, there's been some recent thinking in this direction, e.g. Zhuangzi, "Words exist because of meaning. Once you've gotten the meaning, you can forget the words. Where can I find a man who has forgotten words so I can talk with him?"

👤 jncfhnb
> 'understanding' simply means knowing a man is to a woman what a king is to a queen, but without the need to learn about words and letters (that should be just an interface).

Citation needed


👤 quadrature
> Yes, there are compound words like strawberry (straw + berry) and you can capture meaning at a higher resolution, but most words are not compounds

What's really cool about tokenization is that it breaks words down based on how frequently their parts occur in the training data. This helps a lot with handling different forms of words: adding "-ing" to a verb, making words plural, or changing tense. It's like seeing language as a set of building blocks.


👤 xnx
Does anyone have a guess what angle John Carmack is working on with Keen Technologies? https://dallasinnovates.com/john-carmacks-keen-technologies-...

👤 samus
Look no further than here for work on decoding more than one token at a time:

https://hao-ai-lab.github.io/blogs/cllm/


👤 gaganyaan
I don't think you quite understand how tokenization works. Try typing "strawberry" in here:

https://platform.openai.com/tokenizer

Tokens aren't just individual parts of compound words; they're sliced up in whatever way is statistically convenient. The tokenizer has each individual character as a token, so it could be purely character-based if desired; it's just cheaper when common sequences like "berry" are represented by a single token. Type "strawberry" into the tokenizer and you'll see it split into "str", "aw", and "berry".
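A minimal sketch of the idea, assuming a tiny hand-made vocabulary (real tokenizers learn their vocabulary via byte-pair encoding merges, not greedy longest-match, but the character fallback works the same way):

```python
# Toy subword tokenizer: greedy longest-match against a fixed vocabulary.
# Single characters are always in the vocabulary, so any word can be
# tokenized; frequent chunks like "berry" just make the output shorter.
VOCAB = {"str", "aw", "berry", "ing"} | set("abcdefghijklmnopqrstuvwxyz")

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        # take the longest vocab entry matching at position i;
        # the single-character entries guarantee progress
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

print(tokenize("strawberry"))  # ['str', 'aw', 'berry']
```

Note how "strawberry" comes out exactly as the OpenAI tokenizer shows it, even though no token corresponds to a morpheme like "straw".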

Also, next-token prediction is not stupid. A "sufficiently advanced" next-token predictor would have to be at least as intelligent as a human if it could predict any human's next token in any scenario. Obviously we're not there yet, but there's no reason right now to think next-token prediction will hit any fundamental limitation, especially with new models seeing better performance purely from longer training on the same datasets.
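For intuition about what "next-token prediction" means at its crudest, here's a bigram model I sketched up (my own toy illustration; LLMs condition on thousands of tokens through learned representations, but the training objective is the same: predict what comes next):

```python
from collections import Counter, defaultdict

# Count, for each token, what follows it in a tiny "training corpus".
corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    """Most frequent token observed after `prev`."""
    return counts[prev].most_common(1)[0][0]

print(predict("the"))  # cat  ("cat" follows "the" twice; "mat"/"fish" once)
```

The gap between this and GPT-4 is entirely in how the conditional distribution is modeled, not in the objective itself, which is why scaling the same simple objective keeps helping.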


👤 byyoung3
It's not stupid.