HACKER Q&A
📣 unipassword

How do I handle Unicode passwords?


Hello, I was recently asked to implement Unicode support in the password field, if possible. Storage itself is not a problem: the hasher accepts binary input, so I could use UTF-8 and wouldn't need to migrate existing ASCII passwords. My question is about what happens before the hasher. I'm aware that Unicode is messy enough that you need to normalise it into some form first (https://en.wikipedia.org/wiki/Unicode_equivalence) so that I would not run into normalisation bugs (such as https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/). If it's impossible, I would have to write up a detailed reason, since the application is used around the world and they would really prefer to support Unicode passwords.


  👤 ev1 Accepted Answer ✓
Just hash raw bytes. Treat it as data and set a maximum byte limit.
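
A minimal sketch of what that could look like in Python, in case it helps. The question doesn't say which hasher is in use, so PBKDF2 from the standard library is only a stand-in here; `MAX_PASSWORD_BYTES`, the iteration count, and the function names are hypothetical and should be adapted to the real setup.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

MAX_PASSWORD_BYTES = 1024   # hypothetical limit, counted in bytes, not characters
PBKDF2_ITERATIONS = 200_000  # assumed value; tune for your hardware


def encode_password(password: str) -> bytes:
    """Encode to UTF-8 with no normalisation and enforce a byte limit."""
    raw = password.encode("utf-8")
    if len(raw) > MAX_PASSWORD_BYTES:
        raise ValueError("password exceeds the maximum byte length")
    return raw


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for the raw UTF-8 bytes of the password."""
    if salt is None:
        salt = os.urandom(16)
    raw = encode_password(password)
    digest = hashlib.pbkdf2_hmac("sha256", raw, salt, PBKDF2_ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Because UTF-8 encodes ASCII characters as the same single bytes, existing ASCII passwords produce the same input to the hasher as before. The limit is checked after encoding, since one visible character can take several bytes.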

👤 alpaca128
If I understand this correctly, canonical equivalence is more about visual appearance and a user's expectation that two visually identical symbols are treated the same. But I find it hard to imagine that a user would somehow manage to input equivalent but differently encoded symbols on any standard keyboard, especially for a password that a single person will probably always type the same way.

👤 rurban
don't normalize. just hash the UTF-8 as is.

the normalization data changes with every yearly Unicode release.

you need normalization only if you need to find or compare strings. or for visual equivalence.
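
For anyone who wants to see the trade-off concretely, here is a small sketch using Python's `unicodedata`. SHA-256 is used only to make the byte difference visible, not as a recommended password hash, and the helper names `raw_digest` and `nfc_digest` are made up for this example.

```python
import hashlib
import unicodedata

composed = "\u00e9"     # é as a single code point (U+00E9)
decomposed = "e\u0301"  # e followed by a combining acute accent (U+0301)


def raw_digest(text: str) -> str:
    """Hash the UTF-8 bytes exactly as given."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def nfc_digest(text: str) -> str:
    """Normalise to NFC first, then hash the UTF-8 bytes."""
    normalised = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'

# Raw bytes differ, so the digests differ.
print(raw_digest(composed) == raw_digest(decomposed))  # False

# After NFC both strings are the same code point sequence.
print(nfc_digest(composed) == nfc_digest(decomposed))  # True
```

So hashing as is treats the two spellings as different passwords, while normalising first treats them as the same one. Which behaviour matters depends on whether the same user can realistically produce both forms.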