Section12.6A Taste of Modernity¶ permalink

Now, these methods are the beginnings of how people really factor big numbers. Typically, one does trial division up to a certain size (maybe the first few hundred or thousand primes), then perhaps some modification of Fermat to make sure that there aren't any factors close to the square root if you are attacking something like RSA where that would otherwise be advantageous.

Then what?

There are many answers, some of which involve cool things called continued fractions or number fields. We won't really touch on those topics, but the previous section did talk about something very useful in the previous sections.

Namely, we could come up with some probabilistic/random methods! That's right, we are going to try to find a factor randomly!

Subsection12.6.1The Pollard Rho algorithm

Here is the essence of this random approach; it is highly recursive, like many good algorithms.

Algorithm12.6.1Generic routine for “random” factoring

Follow these steps.

Pick some polynomial that will be easy to compute mod \((n)\).
Plug in an essentially random seed value. (Often the seed is \(2\) or \(3\).)
Compute the polynomial's value at the seed.
If that has a non-trivial gcd with \(n\), we have a factor. Otherwise, plug the value back into the polynomial, and repeat (and hope it eventually succeeds).

Below is code for the method we'll discuss in this section. It has a modification to the generic algorithm which I will discuss below..

The essence of the method is that by plugging in the values of the polynomial modulo \(n\), we are generating a ‘pseudo-random’ sequence of numbers. And a ‘pseudo-random’ sequence might be better than the sequences we used for trial division or Fermat factorization, precisely because it will hit some small(ish) factors and some other large random factors. It might also be good that it could give us numbers which, although not a factor of \(n\), might at least share a factor with \(n\).

Example12.6.2

Here are some details. Let \(x_0=2\) and \(f(x)=x^2+1\) (these could be different, but they are a typical first choice). We create the following sequence of \(x_i\):

\(x_0\equiv 2\)
\(x_1\equiv f(x_0)\)
\(x_2\equiv f(x_1)\)
etc., all (mod \(n\)).

For instance, doing it with \(n=8051\), we get

Now, these might be kind of random, but we will not actually try to find common divisors of these numbers with \(n\).

Instead, we will try to see if all the differences \(x_i-x_j\) share a common factor with \(n\), using the (highly efficient to compute) gcd. That is a lot more opportunities! And hopefully it's just as (or more) ‘random’, and just as effective at finding factors.

However, that is too many to check quickly. So instead one modifies this one last time, with the following modification to the algorithm.

First, remember that polynomials are well-defined in modular arithmetic, and so the sequence of results will eventually repeat modulo any particular modulus \(d\), since there are finitely many possible \(x_i\). In particular, if \(d\) is a divisor of \(n\), then if \(\gcd(x_i-x_j,n)=d\) is a shared divisor found by the pair \(x_i\) and \(x_j\), then not only will it be the case that \begin{equation*}x_i\equiv x_j\text{ (mod }d)\end{equation*} but it will also be the case for any number of iterations of \(f\) that \begin{equation*}f^n(x_i)\equiv f^n(x_j)\text{ (mod }d)\end{equation*} which means the sequence (modulo \(d\), the common divisor) repeats itself every \(j-i\) terms.

Let \(k=j-i\) (or the first multiple of this which is greater than \(i\)); then the congruence \begin{equation*}x_k\equiv x_{2k}\text{ (mod }d)\end{equation*} will have to be true as well, so all the \(x_i,x_j\) pairs which come from the first one to share a divisor can be checked by checking just this one \(\gcd(x_{2k}-x_k,n)\).

Algorithm12.6.3Pollard Rho factoring algorithm

Follow these steps.

Pick some polynomial \(f(x)\) that will be easy to compute mod \((n)\) (typically \(x^2+1\)).
Plug in an essentially random seed value \(x_0\). (Often the seed is \(2\) or \(3\).)
Compute the polynomial's value at the seed, \(x_1\).
Continue plugging in \(f(x_i)=x_{i+1}\), modulo \(n\).
For each \(k\) we check whether \begin{equation*}1<gcd(x_{2k}-x_k,n)=d<n\, .\end{equation*}

Proof

Example12.6.4

In the example above, the numbers we would get for the first three rounds are

\(x_2-x_1=26-5=21\)
\(x_4-x_2=7474-26=7448\)
\(x_6-x_3=871-677=194\)

These factor as follows:

and have the following gcds with 8051:

So this would catch the four factors \(3,7,19\), and \(97\), which is not bad, and indeed the final one caught a common divisor with \(8051\).

Remark12.6.5

This method is usually called the Pollard rho method because it is due to John Pollard and because a very imaginative eye can interpret the \(x_i\) eventually repeating (mod \(d\)) (in the example, \(d=97\)) as a tail and then a loop, i.e. a Greek \(\rho\).

Notice that sometimes the rho method doesn't come up with an answer quickly, or at all (it took 50 rounds without success here). I could up this to a lot more. So what do you do then – bring out the big guns? Not at all – just as with primality testing, you just change your starting point to try again!

In general, there are other such probabilistic algorithms, and they are quite successful with factoring numbers which might have reasonably sized but not gigantic factors. According to Wikipedia:

The rho algorithm's most remarkable success has been the factorization of the eighth Fermat number by Pollard and Brent. They used Brent's variant of the algorithm, which found a previously unknown prime factor. The complete factorization of \(F_8\) took, in total, 2 hours on a UNIVAC 1100/42.

Well, this was in the 70s. But still, it's not bad! Things don't automatically work quickly even now.

Hmm, what now? Let's change the seed.

Luckily, we have bigger guns at our disposal in Sage (especially in the component program Pari), that polish thing off rather more quickly.

A little better than two hours on a mainframe, or even on this computer, I hope you'll agree.

Sage note12.6.6Building interacts

Real factorization algorithms use several different methods to attack different types of factors. We can try to simulate this in a basic way by creating a Sage interact. Evaluate the first cell to define things, then the second one, which is the interact.

An interact is just a Sage/Python function, except with @interact before it. There are many different input widgets you can use; this one demonstrates using a list and an input box which takes any input. See the interact documentation or Quickstart for many examples and more details.

If you think this sort of thing is cool, the Cunningham Project is a place to explore. I particularly like their Most Wanted lists. The idea is this.

The Cunningham Project seeks to factor the numbers \(b^n \pm 1\) for \(b=2, 3, 5, 6, 7, 10, 11, 12\), up to high powers \(n\).

Another interesting resource is Sage developer Paul Zimmermann's Integer Factoring Records. Finally, Wagstaff's The joy of factoring [C.3.11] has tons of awesome examples and procedures – far too many, really, as well as an excellent discussion of how to speed up trial division etc.