Some sign-up forms don't even give you feedback on which characters are problematic. The Oracle Cloud one kept erroring with "you need one uppercase, one lowercase, and one number" when what it meant to say was "remove that tilde". That took a while to figure out.
For a specific example, Oracle Database has a very restrictive list of characters allowed in a user password. If you're using Database Users behind the scenes (even if not directly, but via an Oracle integration), you're subject to those same restrictions. Up until Oracle 11g, passwords were also limited to 30 characters, and a few releases before that they were case-insensitive (!).
Is this a good reason? I'd argue no, but I've worked at tons of organizations where "things that don't make sense" often have an explanation, even if it isn't an explanation you're happy with. We should definitely push companies to use cryptographically secure one-way hashing functions with salts and adjustable difficulty.
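For what it's worth, here is a minimal sketch of what "salted, one-way, adjustable difficulty" can look like, using PBKDF2 from Python's standard library. The function names and the default iteration count are my own assumptions for illustration, not anything Oracle or any particular site uses. The point is that the hash only ever sees bytes, so a tilde or an accented letter is no different from any other character.

```python
import hashlib
import hmac
import os

# Assumption: the work factor is a tunable iteration count; pick it for your hardware.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, *, iterations: int = DEFAULT_ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a new password. Any character is fine: we hash bytes."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    *, iterations: int = DEFAULT_ITERATIONS) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Raising `iterations` later makes each guess more expensive for an attacker, at the cost of slower logins; you would need to store the count next to the salt so old hashes can still be verified.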
The keyboards in the lab were heavily used and noisy. The space bar, because of its shape, sounded distinctly different from the other keys. Like a decent citizen, I stayed away from the admins when they entered the password, but I listened in and found that the password was 7 characters long and that the second and sixth characters were spaces (thanks to the different sound of the key). So .˽...˽.
I brute-forced this using a shell script (since I had just learned how to write shell scripts), ran it overnight, and got in the next day.
So yes, I think there might, at least in theory, be good reasons to avoid certain characters in a password.
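To put rough numbers on how much the space bar gave away, here is a back-of-the-envelope calculation. The alphabet size is an assumption on my part (95 printable ASCII characters including space); the story doesn't say what the admins actually drew from.

```python
import math

ALPHABET_SIZE = 95   # assumption: printable ASCII, space included
LENGTH = 7           # overheard: seven keystrokes
KNOWN_SPACES = 2     # overheard: positions 2 and 6 are spaces

# Attacker who only knows the length.
full_space = ALPHABET_SIZE ** LENGTH

# Attacker who also knows where the spaces are: five unknown positions,
# each of which is known NOT to be a space.
reduced_space = (ALPHABET_SIZE - 1) ** (LENGTH - KNOWN_SPACES)

bits_lost = math.log2(full_space) - math.log2(reduced_space)
print(f"{full_space:,} -> {reduced_space:,} candidates ({bits_lost:.1f} bits lost)")
```

Under that assumption the search shrinks by a factor of several thousand, which is the difference between "not feasible on a lab machine" and "run it overnight".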
Also, since my password manager types letters one by one, I wouldn't use tabs or line feeds.
Maybe don't use grapheme clusters that have multiple valid encodings and make up for it by using a longer password instead?
Because of that, outlawing the likes of line feed, carriage return and backspace (raw input on a tty will store those in passwords, but good luck entering them in a web form) makes sense, as does normalizing Unicode input (typing ‘é’ on a phone may produce a byte sequence that’s different from typing ‘é’ on a PC).
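A small sketch of both ideas, using only Python's standard library. The function name is made up for illustration, and the choice of NFKC is an assumption (NIST SP 800-63B suggests NFKC or NFKD; plain NFC would also merge the two spellings of ‘é’ shown here).

```python
import unicodedata


def prepare_password(password: str) -> bytes:
    """Reject control characters, then normalize before hashing."""
    for ch in password:
        # Category "Cc" covers line feed, carriage return, backspace, tab, etc.
        if unicodedata.category(ch) == "Cc":
            raise ValueError(f"control character U+{ord(ch):04X} is not allowed")
    # Normalize so that equivalent spellings produce identical bytes.
    return unicodedata.normalize("NFKC", password).encode("utf-8")


composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)                                       # False
print(prepare_password(composed) == prepare_password(decomposed))   # True
```

The normalization has to happen both when the password is set and on every login attempt, otherwise it makes things worse rather than better.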
Apart from that, it should not be necessary. If, however, you don’t trust your programmers to do the right thing, you may want to rule out characters that are related to security incidents, such as single quotes, and you may also want to prevent users from entering strings that might get decoded to such characters, such as ‘&quot;’.
That path can be endless, though. If you forbid ‘&’, because your programmers might accidentally html-decode it, should you guard against double html-decoding? URI-decoding and then uudecoding? Getting programmers you can trust to do the right thing and giving them the time to do so is the better option.
But they're probably just storing it in plaintext on some legacy system that can't handle certain characters. Or the plaintext goes through one of those systems on its way to being hashed and salted.
For characters outside that range, there is a good reason: it's hard to type those characters consistently across different platforms/systems, and they don't want you to lock yourself out over that.
> Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length. Verifiers SHOULD permit subscriber-chosen memorized secrets at least 64 characters in length. All printing ASCII [RFC 20] characters as well as the space character SHOULD be acceptable in memorized secrets. Unicode [ISO/ISC 10646] characters SHOULD be accepted as well.
- Requires quoting or escaping in the shell or some other programming environment.
- Hard to type on a mobile keyboard.
- Not in a given person's touch-typing repertoire.
The correct way to think about password security is as randomly generating a binary string of the desired security strength/length and then encoding it. If you generate 16 random bytes, that's 128 bits of security whether you encode it with hex, base32 or base64.
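As a quick illustration of that point, here is a sketch that draws 16 random bytes once and renders them three ways. Only the length of the printed string changes; the number of equally likely secrets stays at 2^128. The variable names are mine.

```python
import base64
import secrets

raw = secrets.token_bytes(16)  # 16 bytes = 128 bits from the OS CSPRNG

encodings = {
    "hex": raw.hex(),
    # Padding '=' carries no information, so it is stripped for display.
    "base32": base64.b32encode(raw).decode("ascii").rstrip("="),
    "base64": base64.b64encode(raw).decode("ascii").rstrip("="),
}

for name, text in encodings.items():
    print(f"{name:>6}: {len(text):2d} chars  {text}")
```

A denser alphabet buys a shorter string to type, nothing more, which is why a site that only accepts letters and digits can still be perfectly safe if it allows enough length.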
Required characters also do little to improve security, since there is usually only 1 of each kind of required character, and it's often at the beginning or end. They don't cause the user to select a random string from a meaningfully larger space.
What I cannot get is sites that make you play 20 questions to figure out their rules instead of just telling you. In my experience, it leads to lousy passwords that meet only the bare minimum. I seem to recall some popular site (I want to say it was AirBnB) that threw an error "password cannot contain name/username" for basically anything it didn't like, regardless of whether the password actually contained that, and it's very annoying.
It was one of the most welcomed changes to the password system at a former workplace when I convinced the small team behind the authentication to state the requirements plainly and have them change from red to green as people met them. We also added a passphrase helper that could be summoned if they missed requirements a few times, which, based on metrics, got some fair use.
People generally want to do well by security and it's on their mind, but no one wants to look stupid because they can't think of a password that meets unknown requirements. Make it clear what's expected, and even offer a nudge towards how to think of good passphrases, and you'll get happy people using your site.
I change my password to something randomly generated by my password manager, the site accepts it, and as far as I know I'm good to go. Then the next time I try to log into the website, it doesn't accept the password it previously (falsely) accepted, and I have to reset it again and play the guessing game of which special character it didn't like. Madness.
Possibly they're preparing for password entry on more ubiquitous devices with limited keyboards? (ATMs, credit card keypads).
Although you should probably not allow "1234" as passwords or anything on the top 100 list for that matter.
That said, I did actually run into an instance where having ";-- in your password would trigger the WAF during login and because we needed to ship ASAP the easiest way to get around that was to ban ; in passwords. I don't think we ever went back to fix that one...
Some emoji, for example, are combinations of multiple other emoji, and a given combined emoji may not be uniquely represented by a sequence of codepoints. In the pathological case, this could mean that an OS update on the user's system changes the composition of the same emoji, which might make it impossible for them to input their password. It is probably prudent for a system to disallow emoji passwords.
One step away from Emoji, Unicode also allows for other m̸̱̜̅ͅȋ̴̩̠̀s̸̺͐c̶͈͇͉̐͛̚h̸̤̣̆i̴͍͍͒͌e̴̲̽̓f̸̞̽̊. Chances are, full-on Zalgo passwords can lead to problems. Again, there are probably prudent reasons to restrict some characters. On the other hand, those modifiers exist for a reason, and disallowing phrases in the user's native language doesn't make for great UX.
Towards the more common use of Unicode, there is a pretty good _practical_ reason to restrict the use of some non-ASCII characters: if your system accepts ç, ö and ø as characters in passwords, and non-technical users venture into a part of the world where the keyboard layout doesn't, your helpdesk is going to have to deal with the occasional annoyed customer. From a systems design perspective, those characters seem fine -- operationally, they may cause headaches.
Finally, we've arrived at printable ASCII characters. Restrictions on maximum length (usually 6 or 8 characters), and on certain characters (%, & or :) tend to be based on interactions with legacy systems (e.g. DES crypt() used to have an 8-character maximum), or on bad input handling. Either way, it's probably a bad sign.
I think it took me about five reboots in single-user mode and password resets before something clicked. I wish Ubuntu had not allowed special characters. :)
So if your password is "password", it will get entered in as "Password" - and the user will get confused why their username/password aren't logging them in.
So a UX pattern is to actually lowercase the first letter on the backend.
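Roughly, that pattern could look like the sketch below. The function names and the `verify` callback are hypothetical stand-ins for whatever checks a candidate against the stored hash; I don't know how any particular site implements it.

```python
from typing import Callable


def candidate_passwords(typed: str) -> list[str]:
    """The password as typed, plus a variant undoing auto-capitalization."""
    candidates = [typed]
    if typed and typed[0].isupper():
        candidates.append(typed[0].lower() + typed[1:])
    return candidates


def login_ok(typed: str, verify: Callable[[str], bool]) -> bool:
    """Accept the login if any candidate matches the stored credential."""
    return any(verify(candidate) for candidate in candidate_passwords(typed))


# Hypothetical usage: the stored password is "password", the phone typed "Password".
print(login_ok("Password", lambda candidate: candidate == "password"))  # True
```

The trade-off is that each login attempt now covers up to two guesses instead of one, which is a small price compared to a locked-out user.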