Abstract:
Many cryptographic systems such as OpenPGP and Linux Unified Key Setup utilize passwords to encrypt the payload, whether directly or indirectly. Due to limits on human memory, low entropy of natural language, and other related factors, passwords are oftentimes the weakest link in the crypto system, and thus represent a viable attack point for obtaining unauthorized access. In this article I share my personal experiences with trying a password guessing attack on an OpenPGP keypair, with a focus on the combinatorics of the problem.
Help! I forgot my password!
Last week, a friend of mine approached me with a problem relating to her OpenPGP key (a type of digital certificate used for validating and encrypting messages). She had just received her first encrypted test message from me, and was experiencing problems with convincing her OpenPGP software to decrypt it for her.
It turned out to be a multi-stage debugging session. Quite a few things had gone wrong. And because my friend had approached me via instant messaging I had to debug these problems remotely, which complicated matters even further.
decryption failed: secret key not available
I had actually physically supervised my friend while generating the OpenPGP key. We were not using her computer. As a consequence, on her own computer, it was possible for her to not actually have the key I was encrypting to. So I first checked if she had correctly imported her key. This did in fact turn out to be the (first) problem here.
decryption failed: secret key not available
… Again.
Having verified that she possessed the correct key, I decrypted my own copy of the message to check if I had sent it to the correct recipient. This wasn’t the case: my messaging application had silently dropped her from the list of (cryptographic) recipients because it couldn’t find her key either. It turns out that I had imported her key on the wrong computer.
decryption failed: bad passphrase
After making sure I had sent a correctly encrypted message, she replied with the worst of news: Help! I forgot my password! I tried typing it carefully five or six times now, but it’s still not working! I’ve forgotten it!
Which meant the following: If we wanted to still use that key, we would have to guess the password.
Why blindly guessing won’t work: the “size” of the problem
The problem with blindly guessing the password is that the solution space is too large. If the chosen password is sufficiently long, then the chances for finding the correct solution among all the other candidates is extremely low. Additionally, the cryptography involved here is designed to not reveal any information about the true password if we try an incorrect password. Every other not-yet-proven incorrect password is just as likely a candidate to be the true password as any other. There is thus no shortcut to trying trying out every single candidate.
To make an analogy: The chances of finding a needle in a “normal” haystack are already slim, but what if the haystack were the size of a soccer field?
In this case, my friend said that the password was 24 characters long. There are approximately 100 characters that can be typed directly on a 105-key keyboard using the standard German “QWERTZ” layout, which is what my friend used. This gives us 10024 combinations, meaning 1×1048 possibilities, one of which is the correct one… assuming that she correctly remembered her password length to be 24 characters.
Assuming we can try out 1000 combinations per second (which is unrealistically high for this specific scenario), we can then try out 31,557,600,000 (31.5 billion) combinations per year. Then it will still take us roughly 3.2×1037 (32,000,000,000,000,000,000,000,000,000,000,000,000) years to try out all those combinations.
That is far too long!
Targeting typos: a keyboard-aware approach
My first reaction when I encounter problems like these is to write a program that systematically tries out all combinations within a smaller set that still contains the password. The hard part of this approach is finding the correct smaller set. As my friend had tried to rule out typos during password entry, I thought of the possibility that she had committed a typo while setting the password the first time. So I wrote a password-guessing program for this occasion.
For each character in a given initial guess, the keyboard-aware approach tries out all combinations of “neighbors” of each key, where “neighbors” are keys that are likely to have been pressed instead of the intended key. This program uses proximity on the keyboard as well as accidental presses of the Shift key as indicators of neighborhood. It’s also specifically tailored to the German “QWERTZ” layout. For example, the set of neighbors for the f
key contains the letters e
, r
, t
, g
, b
, v
, c
and d
, as well as all uppercase versions of these letters (including F
), and the additional character €
. The program first determines the correct such set of neighbors for each letter in the initial guess. Then it generates a password candidate by substituting each letter in the initial guess with one of the letter’s neighbors (including the letter itself). Finally, the program then iterates through all possible such combinations, testing each one as to whether it is the true password.
In effect, this approach reduces the character set of the candidates from the full 100 characters to anywhere between 7 and 24 characters, and 16.9216 characters on average.
Figure 1 lists this first version of the keyboard-aware password guessing program. This is, however, not the actual version my friend and I ultimately used.
Targeting the case of the letters
Unfortunately, the keyboard-aware approach just described still is infeasible for moderately long passwords, such as the one my friend chose. Given her password length, we expect to try out roughly 3.0×1029 (300,000,000,000,000,000,000,000,000,000) combinations, which would take us 9.6×1018 (9,600,000,000,000,000,000) years, according to our 1000 combinations per second
assumption. I might have realized this, had I done the calculations before writing the program.
My friend gave me more information on her password to narrow it down. She said that her password contained only lowercase ASCII letters and digits, and the @
character.
Using this new information, I adapted the program to only consider lowercase ASCII letters and digits, and @
. We now assume that the true password contains at most three deviations from the initial guess, and that all deviations are limited to ASCII letters and digits as well as the @
character. The character restriction limits the candidates’ character set to anywhere between 1 and 11 characters, and 6.7234 characters on average. The limit on the number of deviations limits the amount of candidates to those with 0, 1, 2 or 3 deviations from the initial guess, instead of anywhere from 0 to 24 deviations. With this scheme, we test the following groups of candidates:
Those with exactly zero deviations from the initial guess. There is 1 total candidate.
Exactly one deviation. There are on average 137.36 such candidates, depending on the initial guess.
Exactly two deviations: 9041.03 candidates.
Exactly three deviations: 379466.79 such candidates.
In total, we test 388646.18 candidates on average per initial guess. This number is low enough that the program can finish within a few hours or a day at most, even on somewhat old hardware.
Figure 2 lists this revised version of the password guessing program. This is the version my friend and I actually used.
The results?
We gave the program one day to accomplish its job. Sadly, it didn’t find the correct password. At this point, we both decided that the benefits of keeping my friend’s existing OpenPGP key—namely, the assurance that we are using the correct key, and not an imposter’s version—didn’t outweigh the time costs of trying to find the correct password necessary to actually use the key. We had her generate a new key and then performed a key exchange to the best of our ability.
However, there still is hope. While writing this article and showcasing the program I wrote, I discovered a bug in the refined program’s chosen character set: I had left out the @
character. I have contacted my friend and told her how to fix the bug. She is willing to try guessing the password one more time, now that the program is fixed. As of now, this second run of the program hasn’t finished yet. I will report on this once we have the results.
Irrespective of our attempt above, I want to highlight a truth that I have re-discovered through this exercise: cryptography works. Barring any flaws in the mathematics of the specific algorithm used, cryptography is extremely effective at stopping unauthorized users from reading what they shouldn’t be allowed to read—even if the unauthorized user really is the recipient of the message but currently cannot prove this to the crypto system. And my friend did a wonderful job at making even the password—which is the weakest link in the system—a very tough nut to crack. Every other attacker trying to obtain her OpenPGP key will be facing the same 1-in-1048 odds that we did.
I believe we have exhausted the combinatorial enumeration angle of attacking her key. Any further attacks on her key are likely better off by trying a different approach instead of enumerating the password. I can think of the following classes of possible attacks:
Social engineering attacks. These attacks use psychological tactics to obtain the password, or hints to the password which can be used as a basis for another enumeration attack. One example of a social engineering attack is coercing the user into cooperation by threatening her with violence. (This would not work in our case, since my friend already is cooperating, and has forgotten the password anyway.)
Mathematical cryptanalysis attacks. These attacks target the underlying mathematics of my friend’s OpenPGP software implementation. Our OpenPGP software implementation is the well known GnuPG
software, which has already received security reviews, and is currently believed to be secure.
I believe that neither of these approaches are feasible for me, and thus outside the scope of this article.
Final question: Can I reuse your code?
Yes, you may gladly reuse my code. I explicitly place the contents of listings 1 and 2 in the public domain. Please use these as you see fit. I would appreciate a comment stating me as the author (or one of the authors, if you modify the code) along with a link back to this page, but this is not a requirement. Enjoy!