HACKER Q&A
📣 anticrymactic

Why can GPT4 en/decode base64?


My conversation: https://imgur.com/a/KzLKdQF All the answers in the picture are true.

I like to think that I understand LLMs pretty well, which is why I was so underwhelmed by most of the mainstream "AI" news. But this threw me for a loop. As a predictor, how can it model base64? It surely can't just be "pretending" like it does with everything else. The precision is what feels most wrong to me: it handles long random strings perfectly. Why does it then fail at simple arithmetic?


  👤 waselighis Accepted Answer ✓
It's a pretty simple and direct mapping. A single character is 8 bits, and a single base64 digit is 6 bits, so they align perfectly at 24 bits. The model simply has to learn how to map every 3 characters to 4 base64 digits. Beyond that, there's likely tons of base64-encoded text in the training data simply from scraping the web.
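To see that 3-byte/4-digit alignment concretely, here's a minimal Python sketch using the standard library's base64 module (the sample strings are just illustrative):

    import base64

    # Every 3 input bytes (24 bits) become exactly 4 base64 digits (4 x 6 bits).
    for text in ["abc", "abcdef", "abcdefghi"]:
        encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
        print(f"{text!r} ({len(text)} bytes) -> {encoded!r} ({len(encoded)} digits)")

    # 'abc' (3 bytes)       -> 'YWJj' (4 digits)
    # 'abcdef' (6 bytes)    -> 'YWJjZGVm' (8 digits)
    # 'abcdefghi' (9 bytes) -> 'YWJjZGVmZ2hp' (12 digits)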

It's not perfect though. I tested it on a few sentences of text and it made a few mistakes. Due to the way that GPT tokenizes the input text, it can't really generalize the pattern, as the mapping of text to tokens is somewhat arbitrary. It effectively has to learn how to map every unique combination of 3 characters to 4 base64 digits, of which there are up to 2^24 = 16,777,216 distinct mappings. On top of that, the number of characters in each token varies, which can also lead to mistakes.

You can use this tool to see how GPT-3 maps text to tokens and token IDs: https://platform.openai.com/tokenizer

As an example, the alphabet "abcdefghijklmnopqrstuvwxyz" maps to [39305, 4299, 456, 2926, 41582, 10295, 404, 80, 81, 301, 14795, 86, 5431, 89]. This is what I mean by the mapping being fairly arbitrary.
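If you'd rather script it than use the web page, the tiktoken library exposes the same tokenizers; a small sketch (the exact IDs depend on which encoding you pick, so treat the output as illustrative):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("r50k_base")  # a GPT-3-era tokenizer
    tokens = enc.encode("abcdefghijklmnopqrstuvwxyz")
    print(tokens)                             # the token IDs
    print([enc.decode([t]) for t in tokens])  # the chunk of text each token covers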


👤 hackinthebochs
People really need to update their model of what a "statistical predictor" can accomplish. We know that Transformers are universal approximators of sequence-to-sequence functions[1], so any structure that can be encoded into a sequence-to-sequence map can be modeled by Transformer layers. It follows that prediction and modeling are not categorically distinct capacities in LLMs, but exist on a continuum. How well the model predicts in a given instance is largely due to the availability of data during training. This is the basis for the beginnings of genuine understanding in LLMs. I talk about this at some length here[2]. Odd failures and hallucinations are just the model responding from different points along the prediction-modeling spectrum.

[1] https://arxiv.org/abs/1912.10077

[2] https://www.reddit.com/r/naturalism/comments/1236vzf/on_larg...


👤 landgenoot
Base64 is just another translation with very basic rules; you can even do it by hand [1].

How hard would it be for an LLM to convert a string to binary? That's just a lookup table. How hard would it be to remove all spaces and regroup the bits every 6 positions? And then convert each group back to a letter? That's a lookup table again.

[1]: https://pthree.org/2011/04/06/convert-text-to-base-64-by-han...
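A rough Python sketch of those steps (character-to-bits lookup, regroup into 6-bit chunks, bits-to-letter lookup; it only handles inputs whose length is a multiple of 3, since real base64 also has padding rules for the tail):

    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

    def b64_by_hand(text: str) -> str:
        # Step 1: each character becomes 8 bits (a lookup table).
        bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
        # Step 2: regroup the bit string into 6-bit chunks.
        chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
        # Step 3: map each 6-bit chunk back to a letter (another lookup table).
        return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)

    print(b64_by_hand("Man"))  # TWFu, same as base64.b64encode(b"Man")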


👤 Someone
I don’t know how LLMs do that, but I would think base64 is easier than arithmetic. ‘All’ you need (more or less) to do it perfectly is a table with 256 × 256 × 256 entries containing four-character ASCII strings, plus the ability to chop up byte sequences at every 3rd byte.

Also, changing a character in the input only has local effects: it changes at most 2 characters in the encoded byte stream.

In arithmetic, on the other hand, a single character change can have effects at arbitrarily long range.
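A quick illustration of that locality, using Python's base64 module (changing the last byte of the input only disturbs the 4-digit group it falls in):

    import base64

    a = base64.b64encode(b"hello world, hello world").decode()
    b = base64.b64encode(b"hello world, hello worlX").decode()
    print(a)  # aGVsbG8gd29ybGQsIGhlbGxvIHdvcmxk
    print(b)  # aGVsbG8gd29ybGQsIGhlbGxvIHdvcmxY
    print([i for i, (x, y) in enumerate(zip(a, b)) if x != y])  # [31] -- confined to the final group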


👤 SkyPuncher
I don’t find this one bit surprising. It’s no different than language translation.

OpenAI's model has learned an association between base64 tokens and plain text. It's likely a pretty high correlation, but like everything, it likely has some unpredictable edge cases.


👤 dave84
Based on my understanding of the recent TED talk [0], they mention that GPT-4 can, for example, add 40-digit numbers perfectly, but makes mistakes when adding a 40-digit and a 35-digit number. It hasn’t worked out a generic understanding of arithmetic, but rather several smaller, case-specific ones which may not always be correct.

[0] https://youtu.be/C_78DM8fG6E


👤 og_kalu
GPT-4 doesn't really fail at simple arithmetic.

But it's a harder task to learn, because arithmetic doesn't encode information about its solution in the preceding context. "9383 + 3545" or "are any of the following numbers prime: 96885, 66576, 4766?" doesn't actually tell you anything that would inform the answer. You go to school and you learn the required set of steps for solving these problems.

On the other hand, for "John is smiling so he is _____", the preceding context screams "happy" as a very likely choice. The preceding context actually helps find the solution rather than being the equivalent of dead weight.

And you simply didn't understand LLMs as well as you thought you did.

Language models trained on code reason better, even on benchmarks that have nothing to do with code: https://arxiv.org/abs/2210.07128

Encoding/Decoding Base64 is neat but not particularly mindblowing unless you have some serious misconceptions on what language models are capable of.


👤 tikkun
Here's my understanding of how GPT-4 works.

Imagine you're a supercomputer and someone feeds you billions and billions and billions of pages of text written by humans.

Then they ask you to compress it really really small.

You can't compress it that small without figuring out a lot of the underlying laws, frameworks, and rules that apply to humans, that apply to the world, and so on.

Compression == coming up with powerful frameworks that condense knowledge.

It's kind of like how with a really powerful set of rules or frameworks in math or physics, you can derive many other things.

As a side note, I suspect GPT-4 has inside its neurons a bunch of powerful frameworks about the world that humans haven't yet discovered.


👤 pmoriarty
It seems to do this well enough for short bits of base64, but I've been unable to get it to work when encoding/decoding paragraphs or pages of it.

👤 usgroup
Presumably, if you ask it to write you some code to decode base64, it will.

Presumably, if you ask it to execute that same code for an input example you provide, it will.

Et voilà.
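i.e. roughly the kind of snippet it would hand back (a minimal sketch, nothing GPT-specific about it):

    import base64

    def decode_b64(s: str) -> str:
        # Decode a base64 string back into UTF-8 text.
        return base64.b64decode(s).decode("utf-8")

    print(decode_b64("aGVsbG8gd29ybGQ="))  # hello world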


👤 blibble
it's probably loaded with heuristics as a pre-filter that modifies the prompt by decoding the base64