Question12.5.1
Can you quickly now factor \(n\) without using Sage?
Let's take a last crack at issues directly related to cryptography. (That doesn't mean that other stuff we do in this text is unrelated – oh no! Especially the geometry is connected. But we will not make direct connections.)
We will focus on the main attack on the RSA algorithm, namely factorization.
Let's look at another toy RSA problem to get a sense of what is going on. First, I choose a modulus \(n=899\). I will also use Sage to verify it has two prime factors, without telling you what they are.
Then I choose an exponent to raise my secret message by …
I haven't told you \(\phi(n)\), but this guarantees it is coprime to my (public) encryption key, which I have chosen to be \(e=13\). Now we can encode our message, \(x=11\).
Now, how could we hope to crack this sinister message? (Assume that euler_phi(899) is just too huge for Sage to do.) Well, we do know \(n=899\) and that \(e=13\). That could help. Remember, if we knew \(p\) and \(q\), we could easily calculate \(\phi(n)\) without even using Sage, which should be enough.
Can you quickly now factor \(n\) without using Sage?
Be smart about it. Think strategically; how should I have chosen a public modulus \(n\) to make this hard to do? How should \(p\) and \(q\) relate?
Hopefully you figured out \(p\) and \(q\). Then we just need to find an inverse modulo \(\phi(n)=(p-1)(q-1)\) to get our decryption key.
You can fill in the values you got for p and q here to make things work. Try it!
And we decrypt:
This gives us the original message \(x=11\) again.
This simple example makes it clear why factorization, not just looking for primes, might be important. To be truthful, many researchers in factorization simply do it to stay one step ahead of the other side, who is presumably also researching factorization – so to some extent it is an arms race.
But factorization is also inherently interesting mathematically! Here is an interesting statement, as an example.
If I know \(\phi(n)\) and \(n\), and know that \(n\) is a product of exactly two distinct primes, I can easily compute them both.
Continuing the example above, \begin{equation*}x^2-(899-840+1)x+899=x^2-60x+899=0\end{equation*} gives \begin{equation*}x=\frac{60\pm \sqrt{60^2-4(1)(899)}}{2(1)}=30\pm\frac{\sqrt{3600-3596}}{2}=30\pm 1=29,31\end{equation*}
The first, and oldest, method of factoring is one you already know, and maybe used a few minutes ago – trial factorization. It is the method we used with the Sieve of Eratosthenes; you just try each prime number, one by one.
Do you remember what the highest number you would have to try in order to factor \(n\) by trial division is? (Can you prove it?)
The following algorithm does this very naively (and slowly, even for trial division). Let's try to talk through what each step does.
This is one of the few places where it really is important to follow the code. That said, the details of the syntax are not as important as the algorithm – unless you want to harness the power of computers more effectively!
To factor \(n\), first enumerate the primes in ascending order \(p_1,p_2,\cdots p_k\), where \(p_k\) is the largest prime less than or equal to \(\sqrt{n}\). For each prime in order, check whether \(p_i \mid n\). If it does, terminate by returning \(p_i\) and \(n/p_i\); otherwise \(n\) must be prime.
Now let me verify it works on easy examples. Remember, we are just looking for factors at this point, not complete factorizations.
Okay, so this seems reasonable. But it's a little more problematic when you try to do large numbers, where large means “bigger than you can do by hand, but nowhere close to the size we looked at in general.” I'll actually time how long it takes.
Sage actually has this implemented in a much faster way, primarily by using optimized integers and a special version of Python that allows turning it into much faster code in the C language (Cython). Notice that it just returns a single factor – another slight speedup.
That's roughly one thousand times faster for the initial example! Naturally, it's possible to speed up even more. But note that getting the full factorization slows us back down; after all, one has to check that the remaining factor is prime (or factor it, if it isn't).
Even for this smaller number it takes some actual time – here is where one sees the difference between different implementations of the same algorithm.
So much for trial division! But we have other tools at our disposal.
Some of you might have tried something other than straight trial factorization when attacking \(n=899\) from our earlier problem. Reason this way; since we know that someone is trying to protect a secret, they probably are not going to pick a number with primes like \(3\) and \(5\) in it. After all, that would be too easy to factor.
In fact, it stands to reason that the primes \(p\) and \(q\) should be relatively large compared to \(n\) – so why not start in the middle?
This was Fermat's idea for factoring larger numbers. However, he didn't just start with primes in the middle; for one thing, if your number is even somewhat big and you don't have a computer or huge list of primes, how would you know where to start? So Fermat became clever, as always, and used an algebraic identity to help himself along.
Write \(n=ab\), with \(a>b\), and assume \(n\) is odd. Then we can write \(n\) as a difference of two square numbers.
Why is it fine to assume \(n\) is odd in these circumstances?
This may seem like an obscure identity to us, but at the time (and even well into the last century) such identities were the bread and butter of algebra, before we had tools like computers to help us along.
So what Fermat did is to try this identity backwards. Here is his strategy.
To find a factor for a number \(n\), begin by seeking a perfect square \(s^2\) bigger than \(n\), but still as close as possible. Now, do the following until you succeed, increasing \(s\) by one each time.
Check whether \(s^2-n\) is itself a perfect square \(t^2\).
That means we essentially turned \begin{equation*}s^2-t^2=n\text{ around into }s^2-n=t^2\end{equation*}
Once you succeed, then \(s\) and \(t\) are not the factors of \(n\); rather, they are \begin{equation*}a=s+t\text{ and }b=s-t\, .\end{equation*}
Here is an implementation – again, assuredly slow, but at least verbose in its explanation – of this strategy. We simply start with the next \(s\) above the square root of \(n\), and just keep trying \(s^2-n\) again and again for bigger and bigger \(s\).
Before we move on, let's try to factor 143 and 93 using this algorithm. Remember, we start with \(s^2-n\), where \(s\) is the next integer above \(\sqrt{n}\), and see if it is a perfect square; then we increase \(s\) by one each time.
After we do them by hand, we can see what Sage does with them to check.
Well, we struck gold on the first try here! That happens if your number is the product of two primes which are two apart. (Such primes are known as twin primes, and have some interesting stories. Among other things, calculating with them helped find a bug in the Pentium computer chip in 1995; see Subsection 22.3.2.)
As you can see, we probably would have been better off with trial division for \(n=93\). It's obvious that it's divisible by 3, but that takes a long time to reach from the middle.