A team of six computer scientists (Lenstra, Hughes, Augier, Bos, Kleinjung and Wachter) are reporting a flaw in the way that certain SSL certificate issuers are generating RSA keys. This is getting significant online coverage at websites like NYTimes, Slashdot and BoingBoing, so I thought I’d try my hand at a low level explanation of the mathematical issues.
The goal of RSA is that you can publish a certain large number — called your public key — and anyone can then use that key to generate coded messages that only you can read. You can basically think of the RSA system like a mail drop box: Anyone can put messages into it, but only you can get them out. There many good explanations of how RSA works online, so I won’t get into the details. The essential point is that RSA keys are products of two large prime numbers. In order to make an RSA key, a computer finds two large primes and multiplies them together. The key’s owner then publishes the product, but keeps the individual primes secret. It is using these secret primes that the key’s owner can read the messages he receives. (Of course, in practice, the key’s owner will have a computer program to do this all for him: I am not suggesting that Bruce Schneier actually hauls out a piece of paper with two 300 digit primes on it everytime he reads his mail!)
If you want to read someone else’s mail, and it is protected by RSA, one method is to factor their key. Fortunately, factoring large numbers is believed to be very hard — much harder than multiplying them. So, what is the vulnerability?
The Euclidean algorithm
In elementary school, teachers may have assigned you to find the greatest common divisor of two numbers. For example, the greatest common divisor of and is because , , and there is no larger number which divides both and .
The way you would probably think of doing this is to find the prime factorizations of and : Namely, and . Then take the primes that they have in common (namely and ) and discard the primes that only appear in one of the factorizations (namely and .) Then the GCD is the product of the overlapping primes: .
The most important thing to know about computing GCD’s is that this is the wrong way to do it. You can easily demonstrate this for yourself. Go over to Wolfram Alpha and type Factor followed by some 90 digit number. The computation will time out; this is too large to factor in the period of time that Wolfram allocates to an individual user. Now type GCD followed by two 90 digit numbers. You’ll get an answer almost instantly. (Here’s mine.) Clearly, Wolfram is doing something more clever than factoring those two numbers.
The right way to compute GCD’s is to use the Euclidean algorithm. This consists of repeatedly subtracting the smaller of your two numbers from the larger. For example, any common divisor of and would also divide . So . Similarly, any divisor of and would divide so . Keeping going in this way, the numbers get small very quickly: At the next step, we are looking at , and divides , so is our GCD.
The Wikipedia link has more details. Whether or not you follow them, the key point is that factoring is hard, while GCD’s are easy.
How factorization methods work
So, one way that you might try to factor some large number is to guess various numbers and compute . If you get an answer other than or , then you have found a factor. In particular, if is a product of two large primes, like in RSA, if you find one factor this way then it will be one of the primes, and dividing by that prime will give you the other factor.
If you just guess at random, you are very unlikely to find a nontrivial factor this way; you’ll just keep getting back “” from your computer. This method is no better than just directly guessing prime factors of . Modern factorization techniques are based on choosing in clever ways to increase the odds of a hit. For example, Pollard’s method is based on computing for certain carefully chosen and . The various methods that are out there increase the odds of finding a useful GCD to well above random chance, but they are still very long odds, which is why factorization is hard.
The point of the new paper is that, rather than using clever number theoretic methods to choose , a good choice of is someone else’s public RSA key. Remember, RSA keys are published, so you can just download millions of them and see whether there is a common factor.
When will this attack work? It will work if the RSA key manufacturers are reusing primes. For example, suppose that I have the key and you have the key . If a malicious person computed the GCD of 440747 and 269107 then they would discover the common prime , and a little division would give them the other factors. The problem is caused by our RSA issuer using the prime more than once.
Security experts have certainly known about this issue. But it’s a tricky one to avoid. RSA certificates are being generated continuously all around the globe, so there is no way to have a database of what primes have been used. (Plus, such a database would be a huge prize for computer criminals; if you stole it, you would know exactly which prime divisors to try in a future attack!)
So the approach which is used is simply to generated random primes from a very large space, and hope that the set of primes being used is large enough. If you and I are trying to go to New York City and not see each other, we don’t have to coordinate plans. We just have to each go to a random street corner and stay put, and we will almost surely succeed.
In this case, New York is the set of all 512 bit prime numbers. (For 1024 bit RSA. There is also 2048 bit RSA, which uses even larger primes.) These are primes that are roughly 154 digits long, and there are about of them. Considering that the number of people on earth is only around and that the number of seconds since recorded history began is around , this should be adequate. Comparing this to two tourists in New York trying to avoid each other wildly over states the difficulty. I was going to say that this is more like two atoms in New York but, in fact, there is no atom or subatomic particle small enough to indicate the hugeness of the space being used here. So this method should work and, indeed, huge portions of internet security are bet on the idea that it will.
Sidenote: Some clever readers might be remembering the game where your teacher bet you that two people in the class had the same birthday. Even though there were only of you, and days in the year, the odds were in her favor. When looking for coincidences between objects in a set of size , you should expect to start seeing them when is near , not all the way up at . But is still plenty big enough to be safe.
Theory versus practice
That’s how its supposed to work. Key generators choose random pairs of primes from a set of size , multiply them, and hand out the products. With such a large range of primes to choose from, they should never need to reuse.
But, apparently, they do. Some primes, it seems, are the Times Square of 300 digit primes. In a sampling of million RSA keys from around the internet, one particular prime turned up 12720 times! The researchers estimate that as many as in every RSA keys may be vulnerable in this way.
A few thoughts on solutions
What should the response be? Here I am hesitant to give ideas, because I am a mathematician, not a security expert. One approach is to figure out why these primes are getting reused. Another is to simply stay away from the key issuers which seem to have this problem most often; the paper reports that this issue is much rarer for some issuers than others. Or one could imagine testing each key against a few million preexisting keys, and regenerating it if a hit comes up.
I also wonder whether going to products of three primes might work, if these more basic methods don’t solve the problem. RSA works just fine with products of more than two primes, and it is believed, you need to find all the prime factors to break it. If we used products of three primes, then a single GCD hit would be survivable, just as long as not more than one of your primes was reused. Again, I hesitate to make suggestions, because this is an area where you should really rely on experts. But it sounds reasonable to me.