HACKER Q&A
📣 jiggawatts

Has anyone considered using rich tokens for LLMs?


The tokens used by LLMs are the rough equivalent of the tokens produced during programming language parsing, but LLM tokens can only represent fixed character sequences. In programming language parsing, literals are parsed into tokens that have the literal value as extra baggage that they carry around.

Current LLMs map words (or subword pieces) to token IDs, such as "apple" -> 1, "the" -> 2, etc.

Why not have them map numeric literals to special token IDs that also have the numeric value associated with them?

E.g., token IDs 0..999 could be assigned to the first 1K distinct numeric literals encountered while processing the input.
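
A minimal sketch of what the tokenizer side of this might look like, purely to illustrate the idea (the reserved ID range, the regex, and all names here are hypothetical):

    import re

    NUM_SLOTS = 1000     # reserved IDs 0..999 for numeric literals (assumption)
    WORD_BASE = 1000     # ordinary vocabulary IDs start after the reserved block

    def rich_tokenize(text, vocab):
        """Map words to ordinary IDs, but give each distinct numeric literal a
        reserved ID and carry its parsed value alongside as 'extra baggage'."""
        ids = []
        values = {}      # reserved ID -> parsed numeric value
        slots = {}       # literal string -> reserved ID
        for piece in re.findall(r"\d+(?:\.\d+)?|\w+|\S", text):
            if re.fullmatch(r"\d+(?:\.\d+)?", piece):
                if piece not in slots:
                    if len(slots) >= NUM_SLOTS:
                        raise ValueError("ran out of reserved numeric slots")
                    slots[piece] = len(slots)
                    values[slots[piece]] = float(piece)
                ids.append(slots[piece])
            else:
                ids.append(WORD_BASE + vocab.setdefault(piece, len(vocab)))
        return ids, values

    vocab = {}
    ids, values = rich_tokenize("multiply 533.5 by 712", vocab)
    # ids    -> [1000, 0, 1001, 1]
    # values -> {0: 533.5, 1: 712.0}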

Subsets of the matrices at each layer could be special-purposed to do arithmetic on those tokens. E.g., if a "maths operator neuron" is activated with inputs that are token IDs 533 and 712, then it would "multiply" the two numeric values those tokens carry.

This could allow LLM-like systems to be built that can do arithmetic in the same way as a calculator instead of trying to do long-form multiplication like a human with pencil and paper.
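
A toy sketch of that routing, continuing the value table from the tokenizer sketch above; the operator detection is hand-waved and nothing here claims to reflect how a real transformer layer works:

    import operator

    OPS = {"multiply": operator.mul, "plus": operator.add, "minus": operator.sub}

    def exact_arithmetic_step(op_word, left_id, right_id, values):
        """If both operands are rich numeric tokens, compute the result exactly
        and mint a new rich token carrying it; otherwise decline and let the
        model decode normally."""
        if op_word in OPS and left_id in values and right_id in values:
            result = OPS[op_word](values[left_id], values[right_id])
            new_id = max(values) + 1          # next free slot in the reserved block
            values[new_id] = result
            return new_id
        return None

    values = {0: 533.5, 1: 712.0}             # carried over from the example above
    out = exact_arithmetic_step("multiply", 0, 1, values)
    print(values[out])                        # 379852.0, exact rather than decoded digit by digit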


  👤 ftxbro Accepted Answer ✓
I think the trend is away from semantic tokens, and away from anything that is remotely like 'feature engineering'. Maybe they will even just use character-level or byte-level encoding in the future. They have used it before, and probably something like it will be used again if Rich Sutton's 'bitter lesson' continues to hold true.
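
For contrast, byte-level encoding needs no special cases at all; a numeric literal is just its digit bytes, with no attached value:

    text = "multiply 533.5 by 712"
    tokens = list(text.encode("utf-8"))
    # every byte is a token ID in 0..255; "533.5" becomes [53, 51, 51, 46, 53]
    # and any notion of its numeric value is left for the model to learn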

👤 i2cmaster
Why have the LLM do arithmetic at all, though? Just have it generate Python if you need a calculation.
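
Something like this on the host side, with the model's reply hard-coded just to show the shape of it:

    import ast

    def run_generated_python(expression):
        """Evaluate a model-generated arithmetic expression exactly, with a
        tiny whitelist so only plain arithmetic gets executed."""
        tree = ast.parse(expression, mode="eval")
        allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                   ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)
        if not all(isinstance(node, allowed) for node in ast.walk(tree)):
            raise ValueError("only plain arithmetic is allowed")
        return eval(compile(tree, "<llm>", "eval"))

    # pretend the model was asked to compute 533.5 * 712 and replied with:
    generated = "533.5 * 712"
    print(run_generated_python(generated))    # 379852.0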