You don't crack passwords using rainbow tables or brute-force attacks anymore. So this probably wasn't a plaintext leak: somebody cracked 750k passwords and uploaded them online. I've tried cracking them too.
Czech online shopping gallery Mall.cz has suffered a data breach. For about a month, since July 2017, the supposedly leaked data file was available on Ulož.to (“Save it”), a local file sharing service. Czech magazine Lupa.cz has seen the data and says the file contains 750k plaintext passwords. I've read almost all the articles about the incident (including the comments!) and one of the things I've learned is that it must have been a plaintext leak, and that passwords are cracked using precomputed (or rainbow) tables and brute-force attacks, by trying all possible combinations. No, you don't crack passwords like this anymore. And a plaintext leak? Not necessarily.
Also known as the facts: the file hosted on Ulož.to (now deleted) contained 735 956 unique email addresses (Troy Hunt imported just 735 405 addresses into the Have I Been Pwned? notification service; addresses with invalid or national characters were left out) and 766 421 passwords in readable plain text. Roughly 216k passwords looked like randomly generated strings of 6 alphanumeric characters (lower- and upper-case letters and digits). We learned later that such passwords were generated and sent by Mall.cz automatically upon completing the first order.
Mall.cz was using several outdated and insecure password hashing schemes, like MD5 and salted SHA-1, until October 2016, when they switched to bcrypt. Mall.cz says the breach occurred in 2015 and that most passwords in the leak are from the time when they were hashed using MD5. Unfortunately, Mall.cz didn't properly protect existing passwords, so attackers were able to access MD5 hashes in 2015 even though Mall.cz had been using salted SHA-1 since 2012.
We also know that the file hosted on Ulož.to, the Czech file sharing service, did not contain all Mall.cz accounts. My randomly generated password used for Mall.cz exclusively since 2009 (97DS9WK14qMrAbzftnwd) was not in the file. It may be because nobody had cracked the password before the file was uploaded to Ulož.to. I actually think somebody has accessed more (hashed) passwords than what was in the file. First, Mall.cz has reset passwords for almost twice as many accounts, including my own. Second, a few days after the incident was disclosed, a guy, let's call him John, messaged me stating that somebody had successfully logged into his Steam account. John added that he'd used his Steam password for Mall.cz too, but nowhere else. The thing is, John's account with his password (chosen by himself, not computer-generated) is not in the published file with Mall.cz accounts. Eventually, John's Steam account was saved by two-factor authentication.
So, was it a plaintext leak? Maybe, maybe not.
High performance cracking beast built by Sagitta HPC, now Terahash, photo Jeremi Gosney
Nowadays, when you crack passwords from leaked databases (so-called offline attacks), you don't use precomputed lookup tables, also known as rainbow tables. Generating these tables is time-consuming, and they contain a lot of records that don't correspond to how users create their passwords. With general-purpose computing on graphics processing units (GPGPU) now widely available, you use that for cracking passwords instead. Tools like hashcat or John the Ripper create so-called candidate passwords on GPUs, hash them, and compare the results with the available hashes. When the tool finds a match, the candidate is marked as the original password. The chance that it would be a different password with the same hash, a hash collision, is almost zero, because passwords are short strings drawn from limited character sets. A single GPU can generate billions of MD5 hashes per second. Users employ predictable methods when creating passwords, so we can generate only candidates that match those methods.
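The candidate-and-compare loop described above can be sketched in a few lines of Python. This is only an illustration of the idea on a CPU, not how hashcat or John the Ripper are implemented; the function names, the word list, and the sample passwords are all made up for the example.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the hex MD5 digest of a UTF-8 encoded password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def crack(leaked_hashes: set[str], candidates) -> dict[str, str]:
    """Hash each candidate once and look it up among all leaked hashes."""
    remaining = set(leaked_hashes)
    cracked = {}
    for candidate in candidates:
        digest = md5_hex(candidate)
        if digest in remaining:
            cracked[digest] = candidate
            remaining.discard(digest)
            if not remaining:
                break
    return cracked


# Made-up example data: unsalted MD5 hashes as an attacker would see them.
leaked = {md5_hex(p) for p in ["password", "p4ssw0rd", "Zx81qLm0rTu7"]}
wordlist = ["123456", "password", "letmein", "p4ssw0rd"]

found = crack(leaked, wordlist)
print(sorted(found.values()))
print(len(leaked) - len(found), "hash(es) left uncracked")
```

The two predictable passwords fall to the word list, while the random-looking one survives because no candidate ever matches it.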
The most powerful graphics card you can buy today for password cracking is the NVIDIA GeForce GTX 1080 Ti. It can generate 31 billion MD5 hashes per second (the GTX 1080 is about a fifth slower). When you're serious about password cracking, you want to use the Founders Edition, a reference card designed by the chip vendor. Reference design cards can operate 24/7 under full load, while “gaming”/OEM editions simply can't. You play games only occasionally, a few hours a day, so these “gaming” cards can use cheaper and lower-quality components (the chip itself is the same, of course), which unfortunately is not reflected much in the final price.
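To put that speed into perspective, here is a rough back-of-the-envelope calculation for the 6-character alphanumeric passwords Mall.cz generated automatically. It assumes the characters are chosen uniformly from all 62 letters and digits, that the hash is a single unsalted MD5, and that the card actually sustains the quoted benchmark speed.

```python
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters
LENGTH = 6
MD5_RATE_1080_TI = 31e9  # hashes per second, the figure quoted above

# Every possible 6-character string over the 62-character alphabet.
keyspace = len(ALPHABET) ** LENGTH
seconds = keyspace / MD5_RATE_1080_TI

print(f"{keyspace:,} candidates")
print(f"{seconds:.2f} s to try them all on one GTX 1080 Ti")
```

Under these assumptions the whole keyspace is about 56.8 billion candidates, which a single card exhausts in roughly two seconds. For strings this short, even plain brute force is trivial.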
You can't get a new GTX 1080 Ti Founders Edition anymore, so what about the cloud? Amazon AWS offers Elastic GPUs (which most probably can't be used for password cracking) and p2, p3, and g3 instances with the following cards (listed are the max models you can get, prices can vary, consult respective vendors for up-to-date prices):
Just to compare, Microsoft Azure offers the following machines with NVIDIA Tesla GPUs:
OVH offers only the following dedicated server:
Tesla GPUs are not powerful enough for some serious password cracking (except the new V100), when you compare them with GTX cards in MD5 benchmark:
The Mall.cz file hosted on Ulož.to contained 381 908 unique passwords. I've hashed them again with MD5 and tried cracking them. I just wanted to see if it would be possible and how far I'd get. If the passwords could be cracked rather fast, then it could have been hashes that leaked, not plain text, and somebody cracked the passwords afterwards. So I've rented a Tesla K80-based p2.16xlarge (Tesla V100-based P3 instances have been available since October 2017, the events in this “story” happened in September), downloaded probably the two most famous word lists, rockyou.txt and phpbb.txt, and started cracking, essentially skipping the preparation phase. A pro-level password cracker would certainly not underestimate the preparation phase and wouldn't crack passwords on Amazon, but I didn't want to do it as a pro, for a reason. I've even used only the standard hashcat-bundled rules for generating additional candidates, and I built lists of Czech words, first names, and last names only after I'd started cracking. I didn't even bother to build a list of Czech street names. A pro would prepare all of this in advance and would crack more passwords within the same time frame.
In addition to using those two already mentioned word lists, I've tried cracking passwords using the following techniques:
firstnames.txt and lastnames.txt)
All these attacks can be customized and extended using rules. These can, for example, substitute characters with digits (a → 4, e → 3, o → 0 etc., so both password and p4ssw0rd will be tested), change some letters to their upper-case variants, mix numbers or special characters in, duplicate words and so on, eventually producing a bazillion candidates.
I've cracked 165k passwords, about 43%, in 45 minutes. In 12 hours, I've cracked almost all of them; I was left with 935 uncracked passwords only because I wanted to go to sleep. I'm pretty confident that I'd have gotten all of them if I'd kept the cracking running for a few more hours.
I've managed to crack even these passwords, which were not in any word list I've used:
JK52jarka – “jarka” is a short form of a Czech name
lockap7gia – “lock” + hmm, something
soyouitknow – four English words, “incorrect” order
str3ela9133 – Czech word strela customized by rules + a number
Marketa19.. – a name with an upper-cased first letter + a number + dots
Renik2510!! – similar, but with two exclamation marks, both well known patterns
15zdenek1973 – “Zdeněk” is a Czech name
to neuhodnes – means “you won't guess it” in Czech (this is an exception, it was in the rockyou.txt list)
andalusan89T@
natoneprijdes – three Czech words, no spaces, means “you won't find it”
čokoládamilka – “Milka chocolate” in Czech, the rockyou.txt list contains a variant without the accented characters – cokoladamilka
lindisfarne793
kobylamamalybok – four Czech words, a palindrome actually
fm9fytmf7qkckct – first 15 characters of a Microsoft Office license key
asdfghjkl0123456789 – a “keyboard walk” + a sequence
…and 380 961 more. Some of the not cracked passwords are:
carolinepassword2680? – a name + password + a number + a question mark
passwordusuniversalis – two words with -us and -is suffixes
3681913234731michal – a number (actually a CD key for the Counter-Strike game) + a name (nope, not my password)
Qawsedqawsed11+ – 2× a “keyboard walk” + a number + a plus sign
j4 n3v1m v073 – ja nevim vole means “I don't know dude” in Czech, here with well-known substitutions
PyQ7z4XwBf1o9 – at first I thought this was a computer-generated password, the strongest one in the leak, but it smells like password reuse, although the question is for how long the password has been known (no other Mall.cz passwords present on that site)
●●●●●●●●●●● – no idea how this password made it to the database, but I've managed to crack this 3-chars-longer variant anyway: ●●●●●●●●●●●●●●
There's not a single password-manager-generated password on the list of uncracked passwords, and all of them are similar to what has been cracked. So I think that the rest could be cracked relatively quickly too.
I think the passwords in the leak could have been hashed somehow. Cracking salted SHA-1 (the hash Mall.cz used 2012–2016) would take much more time, not only because generating SHA-1 hashes is about three times slower, but also because the optimization of comparing one candidate with all the hashes at once can't be used: each hash uses a different salt. Some of the passwords stored using bcrypt (since 2016) could be cracked too, but only weak ones like password (a GTX 1080 Ti does 646 hashes/s for bcrypt with 2^10 iterations, a Tesla V100 does 1707 hashes/s). However, Mall.cz says that most of the cracked passwords are from the period when they were using MD5.
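The cost of per-hash salts can be illustrated by counting hash computations. The sketch below assumes the salt is simply prepended to the password before hashing with SHA-1; how Mall.cz actually combined salt and password is not known, and the function names and sample data are made up.

```python
import hashlib
import os


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def crack_unsalted(hashes: set[str], candidates: list[str]) -> tuple[dict, int]:
    """One hash per candidate, checked against every account at once."""
    cracked, hash_calls = {}, 0
    for candidate in candidates:
        digest = sha1_hex(candidate.encode())
        hash_calls += 1
        if digest in hashes:
            cracked[digest] = candidate
    return cracked, hash_calls


def crack_salted(records: list[tuple[bytes, str]], candidates: list[str]) -> tuple[dict, int]:
    """Each account has its own salt, so candidates are rehashed per account."""
    cracked, hash_calls = {}, 0
    for salt, digest in records:
        for candidate in candidates:
            hash_calls += 1
            if sha1_hex(salt + candidate.encode()) == digest:
                cracked[digest] = candidate
                break
    return cracked, hash_calls


passwords = ["password", "letmein", "qwerty"]
candidates = ["123456", "password", "qwerty", "letmein", "dragon"]

unsalted = {sha1_hex(p.encode()) for p in passwords}
salted = []
for p in passwords:
    salt = os.urandom(8)  # random per-account salt (assumed scheme: salt + password)
    salted.append((salt, sha1_hex(salt + p.encode())))

cracked_u, calls_u = crack_unsalted(unsalted, candidates)
cracked_s, calls_s = crack_salted(salted, candidates)
print(len(cracked_u), "cracked with", calls_u, "hash calls (unsalted)")
print(len(cracked_s), "cracked with", calls_s, "hash calls (salted)")
```

With unsalted hashes the work grows with the number of candidates only. With salts it grows towards accounts × candidates, which for hundreds of thousands of accounts makes the same attack far slower.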
Two tips to conclude with: