HACKER Q&A
📣 keepamovin

Is it significant that the token length of source code is close to e?


What's the relationship between the Shannon entropy of a distribution and average token length? For example, English is often quoted as having an average token length of ~4 characters, but source code (that I've tested) seems to be closer to 2.7. Is it significant that this is close to e (i.e., the base of the natural logarithm)? Is source code a more efficient and natural representation of structure/knowledge/information than English? Any thoughts? Is there any connection with how the logarithm appears in thermodynamic entropy?
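For anyone who wants to reproduce the kind of measurement being described, here is a minimal sketch. It assumes whitespace tokenization (the post doesn't say how tokens were counted, so this is one plausible reading) and computes character-level Shannon entropy in bits (base-2 log; note the question's connection to e would involve the natural log instead):

```python
import math
from collections import Counter

def avg_token_length(text: str) -> float:
    """Average length of whitespace-separated tokens."""
    tokens = text.split()
    return sum(len(t) for t in tokens) / len(tokens)

def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Example: compare a snippet of code against a snippet of English.
code_sample = "def add(a, b):\n    return a + b\n"
english_sample = "The quick brown fox jumps over the lazy dog."

print(avg_token_length(code_sample), shannon_entropy(code_sample))
print(avg_token_length(english_sample), shannon_entropy(english_sample))
```

The numbers depend heavily on the tokenizer (whitespace vs. a lexer vs. a BPE vocabulary), so the ~2.7 figure is only meaningful relative to a fixed tokenization scheme.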


  👤 uberman Accepted Answer ✓
My guess is that you are not naming your variables correctly.