Current LLMs map words to token IDs, e.g. "apple" -> 1, "the" -> 2, etc.
Why not have them map numeric literals to special token IDs that also have the numeric values associated with them?
E.g.: token IDs 0..999 could be reserved for the first 1K distinct numeric literals encountered during processing.
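A minimal sketch of that tokenizer idea (all names here are hypothetical; real tokenizers split text into subwords, but the value-carrying principle is the same):

```python
import re

NUM_BUDGET = 1000  # reserved token IDs 0..999 for numeric literals

class NumericTokenizer:
    """Toy tokenizer: numeric literals get reserved IDs with exact
    values attached; all other words get ordinary IDs above the budget."""

    def __init__(self):
        self.num_ids = {}     # literal string -> reserved numeric ID
        self.num_values = {}  # reserved numeric ID -> float value
        self.word_ids = {}    # other word -> ID starting at NUM_BUDGET

    def encode(self, text):
        ids = []
        for tok in re.findall(r"\d+(?:\.\d+)?|\w+", text):
            if re.fullmatch(r"\d+(?:\.\d+)?", tok):
                # assign the next free reserved ID and remember the value
                if tok not in self.num_ids:
                    if len(self.num_ids) >= NUM_BUDGET:
                        raise ValueError("numeric token budget exhausted")
                    nid = len(self.num_ids)
                    self.num_ids[tok] = nid
                    self.num_values[nid] = float(tok)
                ids.append(self.num_ids[tok])
            else:
                if tok not in self.word_ids:
                    self.word_ids[tok] = NUM_BUDGET + len(self.word_ids)
                ids.append(self.word_ids[tok])
        return ids

tok = NumericTokenizer()
ids = tok.encode("what is 533 times 712")
# the two numeric tokens carry their exact values alongside their IDs
print(ids, tok.num_values)
```

The key point is that the value table travels with the token ID, so later layers never need to reconstruct the number from digit tokens.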
Subsets of the matrices at each layer could be special-purposed to do arithmetic on those tokens. E.g., if a "maths operator neuron" is activated with inputs that are IDs 533 and 712, then it'll "multiply" those two numeric values.
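The "operator neuron" step could look something like this (a toy dispatch function, not a real network component; the value table and IDs are made up for illustration, with IDs 533 and 712 mapping to the values 533 and 712 as in the example above):

```python
# hypothetical value table: reserved token ID -> exact numeric value
value_table = {533: 533.0, 712: 712.0}

def operator_neuron(op, a_id, b_id, table):
    """Toy 'maths operator neuron': when an operator gate fires on two
    value-carrying tokens, compute the exact result in one step instead
    of predicting the answer digit by digit."""
    a, b = table[a_id], table[b_id]
    ops = {
        "multiply": a * b,
        "add": a + b,
        "subtract": a - b,
    }
    return ops[op]

result = operator_neuron("multiply", 533, 712, value_table)
print(result)  # 379496.0
```

In a real system the gate deciding *which* operation fires would still be learned; only the arithmetic itself would be exact, calculator-style.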
This could allow LLM-like systems to be built that can do arithmetic in the same way as a calculator instead of trying to do long-form multiplication like a human with pencil and paper.