Talk Cracking passwords, or why use password_hash() (phpCE 2018)

Check the online version, I often update my slides.

Talk detail

Do not use MD5 to hash user passwords! Do not use SHA-x! Yeah, but why? I'll show you, donut worry. We'll talk about password cracking, what are the speeds, how passwords are cracked on GPUs, and of course how to defend against cracking. As an example, I'll use a recent data leak where 750k plaintext passwords have surfaced from a site that supposedly used MD5. We'll also explain what a salt is and what it's not, and that it's not there to prevent cracking, and that's fine. Eventually, I'll explain what slow hashes like bcrypt or Argon2 are and how to use them in PHP. Let's get cracking!

Date and event

October 28, 2018, phpCE 2018 (talk duration 50 minutes, 42 slides)

Slides

#1 A talk about modern password cracking. What techniques, tools and hardware is used to crack passwords blazingly fast. We'll also talk about what PHP developers can do to secure their users' passwords.

#2 If an application stores users' passwords, it should store them hashed. That means a secure password hashing function is used to calculate a hash which is then stored in a database. But you already know this, right? Right?

#3 Hashing functions are so called one-way functions. You can't “decrypt”, or “unhash”, a hash and get the original password back. Hashing is not encryption, a hash function is not a cipher. With encryption, if you know the encryption key, you can decrypt the original data. A hash could be viewed as a checksum, or even as a fingerprint. You can't reconstruct the “full human” from the fingerprints found at the crime scene, but you can search the fingerprint database and compare what's in there with what you've found at the scene.

#4 That's similar to password cracking. We'll somehow generate so called password candidates, calculate their hashes and see if we can find them in the database we have available. This is called offline password attack: the attacker has hashes and tries to crack them without sending anything over the wire. During an online attack, the attacker inputs passwords to login forms in an automated way. The total cracking time depends on the speed of generating those candidates and their “quality”.

#5 The most primitive way of attacking password hashes is a brute-force attack. No, it's not “brutal force”, don't call it that way.

aaa, aab, aac, aad, aae, aaf ⇛ hashes == hash?

#6 Brute-force attack will just try all possible combinations. To attack an eight-character long password we'd need to calculate hashes for:

aaaaaaaa
aaaaaaab
aaaaaaac
aaaaaaad
aaaaaaae

… and so on, and compare them with one or more hashes we have obtained from the database. This technique is slow because most of the candidates won't ever be used as a password by anybody. Users are predictable and their passwords too, unless they use a password manager, of course. And even if they do, password managers usually generate passwords longer than 15 characters and such passwords cannot be cracked by brute-force either. It would just take ages to generate such candidates.

#7 Slightly better way of generating password candidates is a “dictionary attack”. Let's try only those words from a dictionary or some other known list (sometimes called a wordlist), because users know them and use them as their passwords. Users try to be clever though, and will somehow modify the selected word, sometimes because they're forced to do so by a password policy set by their system administrator. You know, the infamous “Your password must contain at least one capital letter and a number!” error message. An old-school dictionary attack is not very successful anymore.

#8 A thing called “Rainbow Tables” is a specific implementation of precomputed tables. It was used quite often when dinosaurs still roamed the Earth but now there are more effective ways.

#9 Precomputed tables use generated databases of hashes and their corresponding values. Thus it's very fast to find the password if you know the hash, it's just a simple lookup. The disadvantage of precomputed tables is their huge size (“rainbow tables” are used to reduce the size), and the need to generate them for every password hashing algorithm which we want to attack. Often, they also contain candidates that are not used for passwords. Cracking passwords with precomputed tables is also prevented by using cryptographic salts. A salt is a unique and random value used for hashing a password, and we'd need to regenerate the tables for each password, as the salt is different for each of them. That would be highly ineffective. We'll get back to salts later on.

#10 You can test-drive cracking with precomputed tables in your browsers, thanks to CrackStation. For MD5 and SHA-1, they have 190 GB, 15-billion-entry lookup tables. Those also include all Wikipedia articles from 2010, and known breaches. You can also download a smaller 19 GB table.

#11 Nowadays, you use graphics cards, like this beauty here, to crack passwords. This is NVIDIA GTX 1080 Ti, currently the most powerful off-the-shelf card for password cracking: it does 31 billion MD5 hashes per second. For cracking, you want the Founder's Edition line, reference cards designed by NVIDIA, with top-quality components. These cards can operate under heavy loads 24/7. The only problems is you can't buy GTX 1080 Ti Founder's Edition anymore so you'll need to wait for a some new cards, like the new RTX 2080 which is actually slower for password cracking than the GTX 1080 Ti. That might improve with some newer drivers, though.

ASUS ROG POSEIDON GeForce GTX 1080Ti Platinum edition OC 11GB

#12 The gaming editions of these cards (with huge fans, often colorful) are not really designed for password cracking. Vendors are using lower-quality components, sometimes even the previous architecture with just a new chip. This NVIDIA dealer even had a note on their site saying this $1k card is not intended for use under heavy loads. If you can't see the note, first check you can read Czech, then check you're not on their mobile site – the note is not present there.

#13 Luckily, even the dealer's mascot had an opinion on that: ¯\_(ツ)_/¯. So you definitely want a Founder's Edition for password cracking.

[a-zA-Z0-9]{8}: 218 340 bln., 7043 sec, 2 hours

#14 31 billion MD5 hashes/sec is a huge number but unless you run on a state budget, you probably can't imagine that. So let's add a context to it. The number of all possible combinations for 8 characters-long password consisting of lower-, uppercase letters, digits is 218 billions (a billion is 10⁹, sometimes called a milliard in Central Europe). The GTX 1080 Ti card can calculate MD5 hashes for all those combinations in 2 hours.

[a-zA-Z0-9]{9}: 62^9, 5 days; [a-zA-Z0-9]{10}: 62^10, 313 days; [a-zA-Z0-9]{11}: 62^11, 53 years; [a-zA-Z0-9]{12}: 62^12, 3300 years

#15 For GTX 1080 Ti, it takes 5 days to calculate hashes of all possible combinations of a password consisting of 9 lower-, uppercase characters and digits; 62× more for 10 characters, 53 years if that would be 11 characters, and 3300 years for 12 characters. So even if an app is using fast and for password storage totally not recommended algorithm like MD5, it would take 3300 years for the attacker to generate all possible combinations of 12 characters long password using a-zA-Z0-9 character set. And if you're generating your passwords randomly in a password manager, then this would be the only way to crack them. In other words, if you have a password 12 or more characters long, then it's safely stored even when hashed with MD5. Add one more character and it would take 62× more. Add special characters and rather do the math yourself. Unfortunately, most users don't create random passwords.

#16 Of course, you can shorten the time by using more GPUs. Such machines with 8× GTX 1080 Ti (or other GPUs) are built by Sagitta HPC (where HPC stands for High Performance ~~Computing~~Cracking), and if you want (and have enough kilowatts and money), they'll build you a cluster of ten or more.

#17 ---8<--- Cut ---8<---

Mall.cz had a security incident some time ago, and this online shopping gallery had to reset passwords for 1.3 million users at the end of August 2017. I was there, and maybe you too.

#18 In July 2017, somebody posted a link to download 750k Mall.cz user accounts and passwords. Lupa.cz has discovered the dump actually contained 750k, 381 908 unique, plaintext passwords. But reportedly, Mall.cz has never stored plaintext passwords. They have used MD5 until 2012, then SHA-1 with a cryptographic salt, and since 2016 they use bcrypt, one of the recommended password hashing algorithms. The details are tracked in my project listing password storages and disclosures. Ok, that doesn't sound like plaintext. Unfortunately, Mall.cz didn't rehash existing passwords and have even stated that those dumped passwords are from the time when they used MD5.

Tesla Model S with a gas pump in the background

#19 Almost immediately, you could read theories online, that Mall.cz stored plaintext passwords and that they're losers and so on. I think that passwords were indeed stored somehow hashed and somebody has cracked them and has dumped them on the Internet. This theory is supported by Mall.cz resetting roughly twice as many passwords (1.3 million) than what was in the dump (750k). To verify the theory (because I'm a fan of PoC||GTFO), I've rented a Tesla. Oh, sorry, wrong picture.

#20 This Tesla. Amazon Web Services don't offer servers with a GTX 1080 Ti, so I've rented a machine with NVIDIA Tesla K80 instead. These cards are used for computing only, you can't even connect them to monitors. K80 is much slower than a GTX 1080 Ti, it does “only” 4.5 billion MD5 hashes/sec.

#21 We can make up for the lower performance of Tesla K80 by renting a p2.16xlarge instance with 16 GPUs doing 73 billion MD5 hashes/sec in total. I wanted to try crack the passwords just like somebody who's dumped those 750k passwords. I took those plaintext passwords, hashed them again with MD5, followed a guideline to set up the instance, and with almost no preparation for the job (unlike a password cracking pro) I was able to crack almost all the password again in 12 hours (then I went to sleep and halted the machine).

I was left with 935 passwords uncracked. In the first 45 minutes I cracked 165k passwords, 43% of all unique passwords. I've used hashcat password cracker, other well known one is John the Ripper. For renting the instance I've paid just $14.40/hour, $170 total. With Spot instances you can even rent it for less than $5/hour sometimes but the instance might be interrupted with 2 minutes notification if they need the capacity back.

#22 These are some of the passwords that were cracked, all of them predictable in one way or another:

Renik2510!! – a name (probably short for Renata), a number (a birthday maybe), exclamation marks predictably at the end
andalusan89T@ – a word, numbers, a special character again at the end
čokoládamilka – two Czech words (“chocolatemilka”), including national characters
lindisfarne793 – Lindisfarne is an island in England, which was raided by the Vikings in 793
kobylamamalybok – 4 Czech words (the words with diacritics are “kobyla”, “má”, “malý”, “bok”), a palindrome in fact, but that's not how it was cracked
asdfghjkl0123456789 – “a keyboard walk” asdfghjkl + a sequence of numbers

There are a few more passwords in my write-up on the same topic.

A password from the rockyou.txt wordlist: "to neuhodnes"

#23 Often, other password leaks are used as dictionaries or wordlists, rockyou.txt is a well-known one. It contains more than 14 million plaintext passwords, including to neuhodnes, Czech for “you won't guess it” (I think I did), which was also in the Mall.cz dump, or cokoladamilka, this time without those strange characters. You can use other wordlists or even build your own with Wordsmith, and you can just google lists of all Czech words.

Combinator attack (first.txt + last.txt)

#24 Let's see how you can generate some useful candidates, we'll start with the combinator attack. It combines words from two lists so you can generate candidates like firstnamelastname which can sometimes lead to quite a long passwords. First names and last names used in the Czech Republic can be downloaded from the site of the Ministry of the Interior of the Czech Republic. You can of course combine other lists, and if you'd like to combine three lists, check combinator3 from hashcat-utils.

PRINCE (PRobability INfinite Chained Elements)

#25 The PRINCE generator can be thought of as an advanced combinator attack. It uses one wordlist and builds “chains” of combined words. If it's generating a candidate of length four, the PRINCE processor will try 4 letter words (like here) from the list, 2 words of 2 letters combined (me + ok), 1-letter word and 3-letter word (u + too), the other way round (you + i), then also 2 chars words and two single char words (we + u + i) and so on. This attack is responsible for cracking the palindrome kobylamamalybok – 6 + 2 + 4 + 3 characters-long words.

Rules for generating even more candidates

#26 All attacks can be extended with rules that replace characters with digits (a → 4, e → 3, o → 0 etc., so c4nd1d4t3 will be generated in addition to plain candidate), duplicate or reverse words, add digits or special characters, change letters to uppercase and so on.

#27 There was not a single password in the dump, that had been generated by a password manager (not even mine). How do I know that? Those passwords that I've managed to crack were not password manager-generated (it wouldn't be possible to crack such password in 12 hours), and I've checked the ones that I didn't have enough time to crack them. Some of them actually looked like random, generated passwords, like this one.

#28 But the password is known by Google, so it was obviously in one or more wordlists I've used. The password, fm9fytmf7qkckct, is the beginning of a Microsoft Office XP license key. If you'll use a password manager then your passwords will be unique and uncrackable in reasonable time.

#29 I didn't manage to crack these passwords in 12 hours. But they are all predictable anyways:

passwordusuniversalis – two words with -us, -is suffixes
3681913234731michal – a number (actually a Counter Strike CD key) + a name (no, that's not my password)
8araD0mca – two names with characters replaced to look-alike numbers (Bara + Domca)
●●●●●●●●●●● – no idea how it got there as a password but a 3 dots-longer variant, ●●●●●●●●●●●●●●, has been cracked

… and some more again in my article about password cracking.

#30 In October 2017, Amazon has introduced new instances with Tesla V100 GPUs, and boy, these are fast! One Tesla V100 generates 56 billion MD5 hashes per second, that's 12× more than Tesla K80, and almost 2× more than the GTX 1080 Ti. These cards are expensive, and you can't even play StarCraft on it, but you can still rent them for just a few dollars.

A machine with just one Tesla V100 will cost you $3.06/hour, if you want 8 of them you'll pay $24.48 per hour (Spot instances might get as low as $5/hour). That's not much but just don't leave this computing beast on for prolonged periods of time – a week of cracking passwords will cost you $4100, you've been warned. In that case it would be better and cheaper to just buy some fine cracking hardware from Sagitta.

#31 Tesla V100, despite being the most powerful password cracking GPU (and the most expensive too), can calculate only 1700 bcrypt hashes/sec with cost parameter set to 10 (2¹⁰ = 1024 rounds), the default in PHP. It's almost impossible to crack any passwords besides the most simple ones, like password, with these speeds. If you raise cost to 11 (2¹¹ = 2048 rounds), the speed will be just 850 hashes/sec. Just don't raise it too much, otherwise users will wait forever for you to verify their password. With cost 10, the password will be hashed in roughly 80 ms on a regular CPU.

Don't forget that historically, benchmarks use cost 5 (2⁵ = 32 rounds), so benchmark results will look more optimistic – 54k/sec. It's an impressive card nonetheless. For cost 10, one GTX 1080 Ti GPU does only 650/sec, Tesla K80 just 66/sec.

Blowfish-based hashing, password_hash(), password_verify()

#32 What's bcrypt anyway? It's a Blowfish-based password hashing function first published in 1999. Sometimes a password hashing function is also called a KDF, a key derivation function. Blowfish is a symmetric-key block cipher and bcrypt is based on it, but it's not the same thing. There are two functions in PHP and they make using bcrypt really easy.

They are password_hash() and password_verify(), the former is used to hash a password on registration or password change, the latter to verify it during signing in for example. These functions are available since PHP 5.5, bcrypt itself even earlier (in crypt()) but you shouldn't be using versions that are not supported anymore. Your framework might also use these functions internally.

#33 password_hash() returns the hashed password or false on failure. The first parameter is a password you want to hash, $algo specifies the hashing algorithm. It's safe to use PASSWORD_DEFAULT which maps to PASSWORD_BCRYPT today but the default might change in the future. The hashes are backwards compatible so don't be afraid to use the default.

If you'd use PASSWORD_BCRYPT you can raise the cost parameter (which is specific to bcrypt) in the $options array to make cracking even slower, see above. The default is 10 and it's the minimum you should use nowadays. It's also important to note that using the PASSWORD_BCRYPT as the algorithm, will result in the $password parameter being truncated to a maximum length of 72 characters. In reality that doesn't really matter, such long passwords are uncrackable anyway and nobody would even dare, so don't try “fixing” it anyhow.

#34 Use password_verify() to verify a password when the user is signing in for example. It's fairly easy, just pass the password from the login form as the first argument, and the hash from the database as the second parameter. You don't need to, can't even, specify an algorithm that was used to create the hash, because…

Output format: $<bcrypt>$<cost>$<salt><hash>

#35 …because both the algorithm identifier and it's settings are encoded into the output from password_hash(). So password_verify() will read them, hash the password with the same settings, and compare the resulting hash with the passed one. password_verify() is safe against timing attacks but if for some reason you'd need to compare any hashes yourself, use hash_equals().

#36 Cryptographic salt is random data used as an additional input to the hashing function. A salt should be unique and different for each password, so a salt stored in a config file, used for all passwords, is not a salt. It defends against precomputed tables attack. The tables would need to be precomputed for each salt and each password so the “precomputed” part would loose it's meaning. Cryptographic salt also prevents a “birthday attack” which uses the fact that the more users you have, the more of them will have the same passwords. The password cracker would then crack a password and see who else has the same hash.

With salts the time needed to crack all the hashes gets longer with the more hashes you have. Password cracker needs to generate a candidate, add a salt, compare hashes, add another salt to the same candidate, compare hashes, and so on. But the speed of calculating the hash stays the same it's just that the cracker needs to calculate more hashes.

You can't specify a salt with password_hash() anymore, and that's a good thing. Salt should be generated by the hashing library and web developers shouldn't really care about that. Don't try to invent your own salts, you might break the hashing algorithm somehow. The $password parameter in password_hash() should really contain just the password entered by the user, unmodified.

Argon2 (Argon2i since PHP 7.2, Argon2id since PHP 7.3)

#37 Several hashing algorithms are suitable for password hashing, not just bcrypt. The full list is Argon2, bcrypt, scrypt, PBKDF2. scrypt is available only in an extension, PBKDF2 can be used with hash_pbkdf2() but setting the params right might be a bit of a challenge. I'd really recommend using password_hash() unless you need to use something else.

password_hash() supports Argon2 since PHP 7.2. There are several variants of the Argon2 hashing function:

Argon2d is suitable for applications with no threats from side-channel timing attacks (eg. cryptocurrencies) and is not supported by PHP at all
Argon2i is preferred for password hashing and password-based key derivation, this variant is supported since PHP 7.2 if PHP has been compiled with Argon2 support
Argon2id is a hybrid of Argon2i and Argon2d, with some of Argon2i's resistance to side-channel cache timing attacks and much of Argon2d's resistance to GPU cracking attacks, is the recommended and primary variant in the specification draft, and will be available starting with PHP 7.3, if compiled with Argon2 support

If you haven't yet switched to Argon2, just wait for PHP 7.3 when Argon2id will be available with password_hash($password, PASSWORD_ARGON2ID). Argon2 might become the PASSWORD_DEFAULT algorithm in future versions of PHP.

Output format: $argon2id$<version>$<parameters>$<salt>$<hash>

#38 The output from password_hash() is similar if you use Argon2, it also contains encoded parameters:

A memory cost m that defines memory usage of the algorithm (in kilobytes, from 8p to 2³² – 1)
A time cost t that defines the execution time of the algorithm and the number of iterations (from 1 to 2³² – 1)
And a parallelism factor p, which defines the number of parallel threads (from 1 to 16777215)

The defaults in PHP are as displayed: memory cost 1024 kB, 2 iterations, 2 threads.

Version number v is 1.3 (the latest Argon2 version at the time of writing) represented as dec 19, hexdec 0x13.

IKEA style upgrades: SHA-1 → BCRÜPT (or Årgon2id)

#39 If IKEA made password hashing upgrade instructions, it would look like this. If you're not using Argon2 or bcrypt (or scrypt, PBKDF2), you really should upgrade your password hashing to protect users and their passwords.

if (password_needs_rehash(...)) { /* calculate a new hash and replace the old one */ }

#40 To upgrade hashing you can use password_needs_rehash(). The function will return true if the hash doesn't match given algorithm and/or options, in that case you can hash the password with current settings and then replace the old hash with the new one. Call password_needs_rehash() when password_verify() returns true. The problem with this technique is that you need to know the password, so you can only upgrade hashes when user logs in.

Upgrade all hashes, then cleanup on login

#41 To upgrade hashing for existing passwords, and to protect them against cracking, you can do this: first, rehash all existing hashes with a password hash, say bcrypt. Run a one-time job to do this. Once finished, your passwords will be hashed like bcrypt(sha1(password)), which from a cryptography point of view might not be perfect, but given the threat (password cracking on GPUs), it will protect passwords much better than just using MD5 for example.

You'll need to “prehash” the user password with the old hashing function (sha1() for example) before you try to verify it with password_verify(). If the password was correct, you can “clean up” the hashes to get rid of the sha1() “prehashing”.

Such transparent upgrades are nice for your users because you don't need to reset passwords. Don't even think about cracking existing passwords, your job is to protect passwords, not attack them! Use this only when you're upgrading from non-password hashes (like MD5, or SHA-x) to password hashes, you don't need to use this when upgrading from bcrypt to Argon2 for example, both are already a password hashing function. In that case just use password_needs_rehash() as shown in the previous slide.

As this probably needs a bit more explaining and details, I'll point you to my article about upgrading existing password hashes. It has more code and generally should give you an idea what needs to be done and how.

#42 Remember, hobby cracking is not expensive anymore, The Cloud™ offers powerful machines (but if you're a cracking pro you should get your own GPU or two, or get Brutalis from Sagitta, it will be cheaper in the end of the day).

If a site limits the length of passwords to say 16 chars and you're using a password manager (unique random passwords) you can still create a secure password, even if you'll use just lowercase & uppercase characters and digits. But of course, that's not how most users create their passwords so you shouldn't limit the length. NIST says you should allow passwords “at least 64 characters in length”. If you're wondering why for example Microsoft limits the length, then wonder no more, they might actually have a good reason.

Use password hashes (password_hash() in PHP) for hashing passwords. These are sometimes called “slow” hashes, while the other hashes that are used for message integrity (like ~~MD5, SHA-1~~don't use these for integrity anymore, SHA-2) are called “fast”.

If you're using those “fast” hashes, you should stop and upgrade hashing. Don't forget to also upgrading existing password hashes.

Still interested in passwords? Want more?

Check videos from Passwords conference series
See this Beyond the Basics of cracking talk by Royce Williams
Follow my Twitter list of people (etc.) that research, create, crack, talk about, or just know your next password
Get Hash Crack: Password Cracking Manual (v2.0) by NETMUX
…
Profit!

Still here? Congratulations!

Cracking passwords, or why use `password_hash()`

Talk detail

Date and event

Slides

Michal Špaček

Public trainings