Skip to main content

Section 12.5 Introduction to Factorization

Let's take a last crack at issues directly related to cryptography. (That doesn't mean that other stuff we do in this text is unrelated – oh no! Especially the geometry is connected. But we will not make direct connections.)

We will focus on the main attack on the RSA algorithm, namely finding nontrivial factorizations, or factorization.

Subsection 12.5.1 Factorization and the RSA

Let's look at another toy RSA problem to get a sense of what is going on. First, I choose a modulus \(n=899\text{.}\) I will also use Sage to verify it has two prime factors, without telling you what they are.

Then I choose an exponent to raise my secret message by …

I haven't told you \(\phi(n)\text{,}\) but this guarantees it is coprime to my (public) encryption key, which I have chosen to be \(e=13\text{.}\) Now we can encode our message, \(x=11\text{.}\)

Now, how could we hope to crack this sinister message? (Assume that Sage doesn't have enough power to compute euler_phi(899) directly.) Well, we do know \(n=899\) and that \(e=13\text{.}\) That could help. Remember, if we knew \(p\) and \(q\text{,}\) we could easily calculate \(\phi(n)\) without even using Sage, which should be enough.

Question 12.5.1.

Can you quickly now factor \(n=899\) without using Sage?

Solution

Hint: be smart about it. Think strategically; how should I have chosen a public modulus \(n\) to make this hard to do? How should \(p\) and \(q\) relate?

Hopefully you figured out \(p\) and \(q\text{.}\) Then we just need to find an inverse modulo \(\phi(n)=(p-1)(q-1)\) to get our decryption key.

Sage note 12.5.2. Trying your primes yourself.

You can fill in the values you got for p and q here to make things work. Try it!

When we decrypt, we should get the original message \(x=11\) again.

This simple example makes it clear why factorization, not just looking for primes, might be important. To be truthful, many researchers in factorization simply do it to stay one step ahead of the other side, who is presumably also researching factorization – so to some extent it is an arms race.

But factorization is also inherently interesting mathematically! Here is an interesting statement, as an example.

Of course, if we know \(\phi(n)\text{,}\) we already can crack the code, but who cares; maybe we are given \(\phi(n)\) and \(n\) and want the factorization. Here is the short proof.

Suppose the (as yet unknown) primes are \(p\) and \(q\text{.}\) Then expand our formula to

\begin{equation*} \phi(n)=(p-1)(q-1)=pq-p-q+1=n-(p+q)+1 \end{equation*}

We now can represent both \(p+q\) and \(pq\) as formulas in \(n\) and \(\phi(n)\text{:}\)

  • \(\displaystyle p+q=n-\phi(n)+1\)

  • \(\displaystyle pq=n\)

Where might we have a formula with \(p+q\) and \(pq\text{?}\) That should seem familiar …

\begin{equation*} (x-p)(x-q)=x^2-(p+q)x+pq \end{equation*}

So we can simply use the quadratic formula on this expression to get the values for \(p\) and \(q\text{!}\)

\begin{equation*} p,q=\frac{(p+q)\pm\sqrt{(p+q)^2-4pq}}{2}=\frac{n-\phi(n)+1}{2}\pm\frac{\sqrt{(n-\phi(n)+1)^2-4n}}{2} \end{equation*}
Example 12.5.4.

Continuing the example above,

\begin{equation*} x^2-(899-840+1)x+899=x^2-60x+899=0 \end{equation*}

gives

\begin{equation*} x=\frac{60\pm \sqrt{60^2-4(1)(899)}}{2(1)}=30\pm\frac{\sqrt{3600-3596}}{2}=30\pm 1=29,31\text{.} \end{equation*}

Subsection 12.5.2 Trial division

The first, and oldest, method of factoring is one you already know, and maybe used a few minutes ago – trial factorization, or trial division. It is the method we used with the Sieve of Eratosthenes; you just try each prime number, one by one.

In Algorithm 6.2.2, do you remember what the highest number you would have to try is in order to factor a given \(n\) by trial division? (Can you prove it?)

The following algorithm does this very naively (and slowly, even for trial division). Let's try to talk through what each step does.

Sage note 12.5.5. Code for trial division.

This is one of the few places where it really is important to follow the code. That said, the details of the syntax are not as important as the algorithm – unless you want to harness the power of computers more effectively!

Now let me verify it works on easy examples. Remember, we are just looking for factors at this point, not complete factorizations.

Okay, so this seems reasonable. But it's a little more problematic when you try to do large numbers, where large means “bigger than you can do by hand, but nowhere close to the size we looked at in general.” I'll actually time 3  how long it takes.

Unfortunately Sage interacts do not currently support using timeit in an interact properly.

Sage actually implements this in a much faster way, primarily by using optimized integers and a special version of Python that allows turning it into muchfaster code in the C language (Cython). Notice that the command returns just a single factor – giving another slight speedup.

That's roughly one thousand times faster for the initial example! Naturally, it's possible to speed up even more. Sometimes getting the full factorization slows us back down; after all, one has to check that the remaining factor is prime (or factor it, if it isn't), so checking this is worth it too.

Even for the following smaller number it takes some actual time – here is where one sees the difference between different implementations of the same algorithm.

Subsection 12.5.3 Starting in the middle

So much for trial division! But we have other tools at our disposal.

Some of you might have tried something other than straight trial factorization when attacking \(n=899\) from our earlier problem. Reason this way; since we know that someone is trying to protect a secret, they probably are not going to pick a number with primes like \(3\) and \(5\) in it. After all, that would be too easy to factor.

In fact, it stands to reason that the primes \(p\) and \(q\) should be relatively large compared to \(n\) – so why not start in the middle?

This was Fermat's idea for factoring larger numbers. However, he didn't just start with primes in the middle; for one thing, if your number is even somewhat big and you don't have a computer or huge list of primes, how would you know where to start? So Fermat became clever, as always, and used an algebraic identity to help himself along.

Namely, \(n\) is the difference of the squares of \(s=\frac{a+b}{2}\) and \(t=\frac{a-b}{2}\text{:}\)

\begin{equation*} s^2-t^2=\left(\frac{a+b}{2}\right)^2-\left(\frac{a-b}{2}\right)^2=\frac{a^2}{4}-\frac{a^2}{4}+\frac{b^2}{4}-\frac{b^2}{4}+\frac{2ab}{4}+\frac{2ab}{4}=ab=n\text{.} \end{equation*}
Remark 12.5.8.

Why is it fine to assume \(n\) is odd in these circumstances?

This may seem like an obscure identity to us, but at the time (and even well into the last century) such identities were the bread and butter of algebra, before we had tools like computers to help us along.

So what Fermat did was try this identity backwards. Here is his strategy.

It should be clear why \(a\) and \(b\) are the factors. But how do we know this algorithm terminates?

Assuming you started with \(s\) as instructed, eventually you will reach \(s=(n+1)/2\text{,}\) which is much larger than \(\sqrt{n}\text{.}\) But then \(((n+1)/2)^2-n=\frac{n^2+2n+1-4n}{4}=((n-1)/2)^2\text{.}\) You should check that this gives us the trivial factorization \(n=n\cdot 1\text{,}\) though! (See Exercise 12.7.11)

Here is an implementation – again, assuredly slow, but at least verbose in its explanation – of this strategy. We simply start with the next \(s\) above the square root of \(n\text{,}\) and just keep trying \(s^2-n\) again and again for bigger and bigger \(s\text{.}\)

Example 12.5.10.

Before we move on, let's try to factor \(143\) and \(93\) using this algorithm. Remember, we start with \(s^2-n\text{,}\) where \(s\) is the next integer above \(\sqrt{n}\text{,}\) and see if it is a perfect square; then we increase \(s\) by one each time.

After you attempt this by hand, you can see what Sage does with them to check.

Well, we struck gold on the first try here! That happens if your number is the product of two primes which are two apart. (Such primes are known as twin primes, and have some interesting stories. Among other things, calculating with them helped find a bug in the Pentium computer chip in 1995; see Subsection 22.3.2.)

As you can see, we probably would have been better off with trial division for \(n=93\text{.}\) It's obvious that it's divisible by 3, but that takes a long time to reach from the middle.