HACKER Q&A
📣 mcqueenjordan

Should we still Base64 encode?


Base64 encoding incurs some overhead (storage and computation). For context, Base64 is a mechanism for encoding bytes (values 0-255) in a radix-64 alphabet (values 0-63), with '=' for padding. The output:input ratio is 4:3, so the encoded form is roughly 33% larger than the input.
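
To make the 4:3 ratio concrete, here is a minimal Python sketch. The input sizes are arbitrary examples, and `base64_len` is just a helper name made up for illustration. Note that padding pushes inputs whose length is not a multiple of 3 slightly above 33%.

```python
import base64
import math
import os

def base64_len(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 chars per 3-byte group."""
    return 4 * math.ceil(n / 3)

for n in (30, 32, 1024):  # arbitrary example sizes
    encoded = base64.b64encode(os.urandom(n))
    # The measured length should match the 4:3 formula (rounded up for padding).
    assert len(encoded) == base64_len(n)
    print(n, len(encoded), f"{len(encoded) / n - 1:.1%} overhead")
```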

Assumption: Many systems (modulo email) are 8-bit clean nowadays.

As an example, many session tokens are base64 encoded, but if we know that these tokens will only interact with 8-bit clean systems, should we not avoid the overhead?


  👤 pwg Accepted Answer ✓
> Assumption: Many systems (modulo email) are 8-bit clean nowadays.

Your assumption is flawed. Most of the 'systems' that were not 8-bit clean (and resulted in the various 'encodings' being required) remain just as 'not 8-bit clean' today as they were then.


👤 A-AronBrown
I had a similar thought a few months ago when looking at ways to encode data in HTTP cookies and came to the conclusion that base64 wasn't broken, so there was no need to fix it.

For medium-large messages I would tend to use JSON/msgpack where appropriate, so no need to further encode anything.

For small (binary) messages, other encodings (e.g. ascii85) often weren't much smaller and didn't encode faster, so there was no measurable performance benefit. Given the extra complexity and compatibility issues that switching to something else would bring, it just wasn't worth it.
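
As a rough illustration of the size side of that comparison (not the measurements referred to above, and saying nothing about speed), here is a sketch using Python's standard `base64` module. The 16-byte token is a made-up example.

```python
import base64

token = bytes(range(1, 17))  # hypothetical 16-byte binary token

b64 = base64.b64encode(token)   # 4 output chars per 3 input bytes, padded
a85 = base64.a85encode(token)   # 5 output chars per 4 input bytes

print(len(token), len(b64), len(a85))  # 16 24 20
```

On a message this small, Ascii85 saves only 4 characters, which is the kind of margin that is hard to justify against the compatibility cost.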


👤 wmf
Probably not. Most places where base64 is used are no cleaner than they ever were. XML is not perfectly 8-bit clean and neither are HTTP cookies. You could probably replace base64 with more optimized yEnc-style encodings in many situations, but it's probably not worth the hassle.
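
A small sketch of the cookie case, assuming the `cookie-octet` grammar from RFC 6265 (printable US-ASCII excluding whitespace, double quote, comma, semicolon, and backslash). `ALLOWED` and `is_cookie_safe` are names invented for this example.

```python
import base64

# Byte values permitted by the RFC 6265 cookie-octet rule:
# %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
ALLOWED = (
    {0x21}
    | set(range(0x23, 0x2C))
    | set(range(0x2D, 0x3B))
    | set(range(0x3C, 0x5C))
    | set(range(0x5D, 0x7F))
)

def is_cookie_safe(value: bytes) -> bool:
    """Return True if every byte is a valid cookie-octet."""
    return all(b in ALLOWED for b in value)

token = bytes(range(256))  # stand-in for arbitrary binary data

print(is_cookie_safe(token))                    # False
print(is_cookie_safe(base64.b64encode(token)))  # True
```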

👤 zzo38computer
Such encoding is sometimes necessary, for example to keep the data from interfering with message framing. This comes up in many text-based protocols, NNTP for example (being text-based is good, since you can work with them without specialized software). It is also useful if you are going to type in the data by hand, such as from a printout.
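
The framing problem can be sketched like this, assuming a hypothetical CRLF-delimited protocol (the `DATA` command and the `frames` helper are invented for illustration and are not taken from NNTP or any real protocol):

```python
import base64

def frames(stream: bytes) -> list[bytes]:
    """Split a byte stream into the non-empty frames of a CRLF-delimited protocol."""
    return [f for f in stream.split(b"\r\n") if f]

payload = b"\x00\x01\r\n\xff"  # binary data that happens to contain CRLF

raw_stream = b"DATA " + payload + b"\r\n"
safe_stream = b"DATA " + base64.b64encode(payload) + b"\r\n"

print(len(frames(raw_stream)))   # 2: the payload was split mid-message
print(len(frames(safe_stream)))  # 1: the Base64 alphabet contains no CR or LF
```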