A talk about letting users know which algorithm, if any, companies use to hash their passwords. With additional free speaker notes.
Ever wondered why some sites limit the length of passwords? You're not alone. Quite often, when someone discovers that a site is limiting the length of their password, they ask why. Most of the time, they don't get any answer. Sometimes even the companies don't know, because the guy who knew has already left.
maxlength(64) = varchar(64)?
People are a bit suspicious that a site might be storing their passwords in plaintext in a column 64 chars wide if it limits the length to 64 chars. It might or might not be true, but I don’t think there’s any significant relation between the maximum allowed password length and the password storage. Of course, storing passwords in plaintext actually does require limiting password length, because the column can only hold so many characters.
Users, like this Shotbow forum member, are asking especially when there's been a data breach at a site they use. Shotbow is a Minecraft server and they suffered a breach in May 2016.
Asking the general public can give you some very interesting answers. Let's just not go into details here, you know, encryption, hashing, Kerckhoffs' principle. Let's move on. Thanks Bruce @PwdRsch for sending me the link.
The official answer from a staff member wasn’t any better. If companies don't disclose details, always expect the worst. Luckily, there are companies who have no problems disclosing all the details of their password hashing.
Like Facebook. Alec Muffett disclosed Facebook's hashing algorithm in a talk at Passwords14 in Norway. Seriously, go watch it. They call the algorithm “The Onion”, because it has layers. At the core of it there's scrypt and HMAC.
Or LastPass, which uses PBKDF2 with SHA-256 to turn the master password into an encryption key. Or 1Password, releasing a 60-page PDF describing their security design and regularly sending Jeffrey Goldberg to conferences like Passwords.
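To make "PBKDF2 with SHA-256" a bit more concrete, here is a minimal Python sketch of turning a password into a key with the standard library. The password, the salt and the iteration count are made up for illustration; they are assumptions, not LastPass's actual parameters.

```python
import hashlib


def derive_key(master_password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key from a password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", master_password.encode("utf-8"), salt, iterations
    )


# Hypothetical inputs, chosen only for this example.
key = derive_key("correct horse battery staple", b"user@example.com", 100_000)
print(len(key))  # 32 bytes, the SHA-256 output size
```

The iteration count is the knob that makes each guess slow for an attacker, which is why a disclosure that includes such parameters is more useful than one that only names the algorithm.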
Some smaller services have no problem disclosing their hashing policies either, for example Scott Helme's report-uri.com, which is a great Content Security Policy reporting tool. Scott had this on the login and sign-up page, right next to the password input field (it has now moved to the FAQ page).
I've actually started collecting info on how companies store user passwords. The collection is available at https://pulse.michalspacek.cz/passwords/storages. It's part of a bigger survey; I also scan HTTPS on Czech banks using SSL Labs Server Test every month. The name Pulse is heavily inspired by the work 18F does, which is available at https://pulse.cio.gov.
My site looks like this. So far (August 2016), I have fewer than 20 records, but I want the collection to grow. If you know a site which should be listed, please let me know. I’ll add it. Here we see a company called Datadog, and we see they are hashing user passwords with bcrypt. They’ve been given a B grade, we'll get to that.
Where I have more details, like hash params, I also share them. I always link to a public disclosure, so the site is actually more like a collection of links to who said what. If I have historical data, I also share it, as you see in this example.
My scoring system is inspired by SSL Labs Server Test rating and it works in the following way. The better the hashing algorithm is and the better the disclosure is, the better score the site gets. So if they use bcrypt (or any other slow hash, like PBKDF2, scrypt, or Argon2) and they tell us in their docs, they score A. If they tell us only in a blog post, or a talk, they score B, because a talk or a blog post is not that visible and you can't easily find it if you don't know what you're looking for. Both A and B are scores for safe password storage.
A site scores C if they use unsuitable hashes like MD5 or SHA-1 with a salt and multiple iterations. They score D if they hash passwords with one iteration of an unsuitable hashing function, with a salt. Grade E is for when they use a plain fast hash or encrypt passwords. Users are advised to create unique passwords for sites with these scores, especially for sites with D or E.
Last but not least, F is for total failure, and that's when the site stores passwords in plaintext. When signing up for the service, users should use a unique password, not used anywhere else.
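The grading rules above can be sketched as a small Python function. This is only an illustration of the rules as described in these notes, not the code running on the site; the function name, the parameter names and the string values are all made up for this example.

```python
SLOW_HASHES = {"bcrypt", "pbkdf2", "scrypt", "argon2"}


def grade(storage: str, disclosure: str = "docs",
          salted: bool = False, iterated: bool = False) -> str:
    """Return a grade from A to F for a password storage description.

    storage:    "plaintext", "encrypted", or a hash name like "bcrypt" or "md5"
    disclosure: "docs" for official documentation, anything else
                (blog post, talk) counts as less visible
    """
    storage = storage.lower()
    if storage == "plaintext":
        return "F"
    if storage == "encrypted":
        return "E"
    if storage in SLOW_HASHES:
        return "A" if disclosure == "docs" else "B"
    # Everything else is treated as an unsuitable fast hash (MD5, SHA-1, ...).
    if salted and iterated:
        return "C"
    if salted:
        return "D"
    # Assumption: an unsalted fast hash is graded E even if iterated,
    # the notes do not cover that combination explicitly.
    return "E"


print(grade("bcrypt", disclosure="blog"))            # B
print(grade("md5", salted=True, iterated=True))      # C
print(grade("sha1"))                                 # E
```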
So, is it ok to share or disclose your password hashing policy? I think it's ok, especially if you do passwords and hashing right. If you don't do it right, then fix it and then disclose. But if you don’t care…
Some companies fear that if they disclose, they will become a target. Well, I have bad news for them: they already are a target. Companies get hacked no matter what hash they use. For example, Datadog uses bcrypt to store their passwords, and they had been using it even before they suffered a breach in July 2016. Ashley Madison was also using bcrypt before their breach, but clearly passwords were not the motivation there. And I know several companies using plain MD5 or SHA-1 who didn’t get hacked (yet). Or at least they don’t know about it.
md5('240610708') == md5('QNKCDZO')
→ 0e... == 0e...
→ 0 == 0
Sometimes, you can even check yourself what hashing is used. There are several tricks, like this one for PHP: you sign up with the password 240610708 and then try to log in with the password QNKCDZO, and if you're in, then it's plain MD5. PHP compares the hashes as numbers if == is used, and they both start with 0e followed only by digits (exponential notation, meaning zero times ten to the power of whatever), which means PHP compares them as zeros. If it doesn't work, it could still be MD5, but they could be comparing hashes in a different way, for example using ===. I've got similar tricks for a few more algorithms on my GitHub. They all exploit some nice features in PHP.
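You don't need PHP to see why these two passwords are special. The following Python sketch only checks the shape of the MD5 hashes; it doesn't reproduce PHP's loose comparison itself. The function name is made up for this example.

```python
import hashlib
import re

# "0e" followed by nothing but digits: a string PHP's == reads as a number
# in exponential notation with the value zero.
ZERO_LIKE = re.compile(r"0e[0-9]+")


def md5_looks_like_zero(password: str) -> bool:
    """Return True if the MD5 hex digest has the form 0e<digits only>."""
    digest = hashlib.md5(password.encode("utf-8")).hexdigest()
    return ZERO_LIKE.fullmatch(digest) is not None


for candidate in ("240610708", "QNKCDZO", "password"):
    print(candidate, md5_looks_like_zero(candidate))
```

The first two passwords produce different digests, but both digests have the "zero-like" shape, which is all a loose numeric comparison looks at.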
5f4dcc3b5aa765d61d8327deb882cf99
If a database gets leaked, people can usually tell what hash is used just by looking at it. Or do you think this is a bcrypt hash? Not disclosing details of password storage won’t prevent database leaks. But if a company has publicly disclosed what exactly they use for hashing passwords, and there’s a database dump claiming to originate from that company but it has different hashes, we know it’s coming from elsewhere. PR’s job is suddenly much easier.
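"Just by looking at it" can be as simple as checking the length and the alphabet of the value. Here is a rough Python sketch; the function name and the returned labels are made up, and the check only recognizes the format, so a 32-character hex string could in principle come from any 128-bit hash, not just MD5.

```python
import hashlib
import re


def guess_hash_format(value: str) -> str:
    """Guess a hash family from the shape of a leaked value."""
    # bcrypt: "$2y$", a two-digit cost, "$", then 53 chars of salt and hash.
    if re.fullmatch(r"\$2[abxy]\$[0-9]{2}\$[./A-Za-z0-9]{53}", value):
        return "bcrypt"
    if re.fullmatch(r"[0-9a-f]{32}", value):
        return "32 hex chars, MD5-sized"
    if re.fullmatch(r"[0-9a-f]{40}", value):
        return "40 hex chars, SHA-1-sized"
    return "unknown"


leaked = "5f4dcc3b5aa765d61d8327deb882cf99"
print(guess_hash_format(leaked))
# With a guess at the algorithm, trying common passwords is trivial:
print(hashlib.md5(b"password").hexdigest() == leaked)
```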
Anthony Ferrara, aka ircmaxell, ran a test some time ago. He gave people two passwords, two salts per password and four hashes in total, and they had to reverse-engineer the hashing algorithm to produce a hash for the password foo with the salt barbarbarbarbarbarbarbarbarbarba. He provided 15 such algorithms, some of them pretty weird.
The amazing people listed above were able to find 14 of the algorithms. One guy found that the server was leaking the algorithms and was able to “reverse” them all.
If a site uses open source software, then they disclose by design. Which is a good thing after all, because it allows bugs in hashing to be fixed soon-ish.
password_hash($cookie_key.$passwd, PASSWORD_BCRYPT);
Like this bug in PrestaShop. They were using MD5 with a static salt, then changed it to bcrypt in the development version, but still kept the extra salt. It was 56 bytes long, effectively cutting passwords to 16 bytes, because bcrypt truncates its input at 72 bytes. I fixed the issue before they released the code, and made PrestaShop more secure just because they had disclosed the way they hash user passwords. Never mind the BCryptSHA256 and encrypt keys, they were renamed in later revisions.
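The arithmetic behind that bug can be shown without calling bcrypt at all. This Python sketch only simulates the 72-byte truncation on the concatenated input; the key and the passwords are made up, and the function name is hypothetical.

```python
BCRYPT_MAX_BYTES = 72    # bcrypt ignores input bytes past this point
COOKIE_KEY_LENGTH = 56   # length of the static salt prepended to the password


def effective_bcrypt_input(cookie_key: bytes, password: bytes) -> bytes:
    """Return the part of cookie_key + password that bcrypt would actually use."""
    return (cookie_key + password)[:BCRYPT_MAX_BYTES]


cookie_key = b"k" * COOKIE_KEY_LENGTH  # hypothetical 56-byte key

# Two different passwords sharing the same first 16 bytes.
first = effective_bcrypt_input(cookie_key, b"0123456789abcdef-first")
second = effective_bcrypt_input(cookie_key, b"0123456789abcdef-second")

print(BCRYPT_MAX_BYTES - COOKIE_KEY_LENGTH)  # 16 bytes left for the password
print(first == second)                       # True, so the hashes would match
```

Anything the user types after the 16th byte never reaches the hash, so a long passphrase is no stronger than its first 16 bytes.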
It's ok to disclose your password hashing policies, especially if you use slow “password hashes”. Users will love the site more, guaranteed. And if you do something nasty to users' passwords, then fix it before it's too late and then disclose. Your users will love you too. I've been there and done that. Don’t forget to let me know so I can add you to my list of sites and their hashing policies.