Section 12.6 A Taste of Modernity
Now, these methods are the beginnings of how people really factor big numbers. Typically, one does trial division up to a certain size (perhaps through the first few hundred or thousand primes), then perhaps some modification of Fermat factorization to rule out any factors close to the square root, which would otherwise be advantageous if you are attacking something like RSA.
Then what?
One important modern technique that is beginning to show up in introductory textbooks is Lenstra's elliptic curve method; see once again either [E.4.19] or [E.2.10, Chapter 18.7] for details at the level of this text. There are many other answers, some of which involve cool things called continued fractions or number fields. See Exercises 12.7.12–12.7.15 to investigate these, starting with a simpler (but related) algorithm to Lenstra's in Exercise 12.7.12.
We won't say more about those topics here, but Algorithm 12.4.7 touched on a concept important in factoring, not just in finding primes. Namely, we could come up with some probabilistic/random methods. That's right, we are going to try to find a factor randomly!
Subsection 12.6.1 The Pollard Rho algorithm
Here is the essence of this random (or ‘Monte Carlo’) approach; it is highly recursive, like many good algorithms.
Algorithm 12.6.1. Generic routine for “random” factoring.
Follow these steps.
Pick some polynomial that will be easy to compute mod \(n\text{.}\)
Plug in an essentially random seed value. (Often the seed is \(2\) or \(3\text{.}\))
Compute the polynomial's value at the seed.
If that has a non-trivial gcd with \(n\text{,}\) we have a factor. Otherwise, plug the new value back into the polynomial, and repeat (and hope it eventually succeeds).
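The steps above might be sketched as follows (a minimal illustration of my own, not the optimized version discussed below):

```python
from math import gcd

def generic_random_factor(n, f=lambda x: x**2 + 1, seed=2, max_steps=10**5):
    """Generic 'random' factoring: iterate x -> f(x) mod n and check each
    value for a nontrivial gcd with n.  Returns a factor, or None."""
    x = seed
    for _ in range(max_steps):
        x = f(x) % n
        d = gcd(x, n)
        if 1 < d < n:
            return d
    return None  # no luck; hope, as the text says, is part of the algorithm
```

Note that this naive version only succeeds if some \(x_i\) happens to share a factor with \(n\) directly, which is exactly the weakness the modification below addresses.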
Below is code for the method we'll discuss in this section. It has a modification to the generic algorithm which I will discuss below.
The essence of the method is that by plugging in the values of the polynomial modulo \(n\text{,}\) we are generating a ‘pseudo-random’ sequence of numbers.
\(x_0\equiv 2\) (mod \(n\))
\(x_1\equiv f(x_0)\) (mod \(n\))
\(x_2\equiv f(x_1)\) (mod \(n\))
\(x_{i+1}\equiv f(x_i)\text{,}\) all (mod \(n\)).
Such a ‘pseudo-random’ sequence might be better than the sequences we used for trial division or Fermat factorization, precisely because it will likely hit some small(ish) factors and some fairly large factors, a good mix. It might also be good that it could give us numbers which, although not a factor of \(n\text{,}\) might at least share a factor with \(n\text{.}\)
A first choice of seed and polynomial might be \(x_0=2\) and \(f(x)=x^2+1\text{.}\) These choices could be different, but they are typical; John Pollard's original paper used \(f(x)=x^2-1\text{,}\) for instance.
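A sketch (my own) of generating this sequence for a given modulus, seed, and polynomial:

```python
def rho_sequence(n, seed=2, f=lambda x: x**2 + 1, length=10):
    """First `length` terms x_1, x_2, ... of the sequence
    x_{i+1} = f(x_i) mod n, starting from x_0 = seed."""
    xs = []
    x = seed
    for _ in range(length):
        x = f(x) % n
        xs.append(x)
    return xs
```

For instance, `rho_sequence(9991, length=6)` gives `[5, 26, 677, 8735, 8950, 4654]`.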
Example 12.6.2.
Let's try computing what we get for some specific numbers. Picking \(n=8051\) as in [E.2.4, Example 3.25], we get results as in the following interact.
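The interact itself is not reproduced here, but as a plain (non-interactive) stand-in, here is a sketch of my own that finds the first repeated value of the sequence:

```python
def first_repeat(n, seed=2, f=lambda x: x**2 + 1, max_terms=100):
    """Return indices (i, j) with x_i congruent to x_j (mod n), for the
    first value that repeats, or None if no repeat occurs in max_terms terms."""
    seen = {seed: 0}   # value -> index of first appearance (x_0 = seed)
    x = seed
    for j in range(1, max_terms + 1):
        x = f(x) % n
        if x in seen:
            return seen[x], j
        seen[x] = j
    return None
```

Here `first_repeat(8051)` returns `(4, 19)`, matching the repetition described next.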
Notice that for \(n=8051\text{,}\) the term \(x_{4}\equiv x_{19}\) (mod \(n\)), so the sequence, while seeming fairly random, will not come up with every possible divisor. With seed \(3\text{,}\) \(x_{13}\equiv x_{28}\) is the first repeat; if instead we change the modulus to \(8053\text{,}\) there is no repeat through \(x_{50}\text{.}\) So you can see the output will be hard to predict.
Although the outputs of \(x_{i+1}=f(x_i)\) might already seem to be fairly random, we will not actually try to find common divisors of these numbers with \(n\text{.}\) Instead, we will check whether any of the differences \(x_i-x_j\) shares a common factor with \(n\text{,}\) using the (highly efficient to compute) greatest common divisor. That gives a lot more opportunities to find a common factor than just comparing with \(x_i\text{!}\) And hopefully it's just as (or more) ‘random’, and just as effective at finding factors.
However, having all possible differences to check might slow things down too much. So instead there is a final modification to the algorithm.
First, since there are finitely many possible outcomes modulo any particular modulus \(d\text{,}\) the sequence of results will eventually repeat not just modulo \(n\text{,}\) but for any \(d\text{.}\) In particular, suppose \(d\) is a divisor of \(n\) such that \(\gcd(x_i-x_j,n)=d\) for a specific pair \(x_i\) and \(x_j\) with \(j>i\text{.}\)
Now consider the values of the sequence of \(x_\ell\) modulo \(d\text{.}\) Because polynomials are well-defined in modular arithmetic, we have that
\begin{equation*} x_i\equiv x_j\pmod{d} \end{equation*}
implies that, for any \(m\) we have
\begin{equation*} x_{i+m}\equiv x_{j+m}\pmod{d} \end{equation*}
as well. When we let \(m=j-i\text{,}\) this becomes
\begin{equation*} x_{j}\equiv x_{j+(j-i)}\pmod{d}\text{,} \end{equation*}
which means the sequence (modulo \(d\text{,}\) the common divisor, not \(n\)) repeats itself every \(j-i\) terms (after \(i\text{,}\) of course, if \(j-i\lt i\)).
Now let \(s\) be an integer such that \(s(j-i)\geq i\text{.}\) Then \(x_{s(j-i)}\) appears after the periodic behavior (modulo \(d\)) begins, so
\begin{equation*} x_{s(j-i)}\equiv x_{2s(j-i)}\pmod{d}\text{.} \end{equation*}
If we now let \(k=s(j-i)\text{,}\) this congruence means
\begin{equation*} x_{2k}\equiv x_k\pmod{d}\text{,} \end{equation*}
so \(d\) is a divisor of \(x_{2k}-x_k\) specifically, not just of \(x_i-x_j\text{.}\)
Finally, this means instead of checking all possible differences for a common divisor, we only have to check \(\gcd(x_{2k}-x_k,n)\) for all \(k\) in the algorithm.
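We can check this conclusion numerically; here is a sketch of my own, using \(n=8051=83\cdot 97\) (the modulus from Example 12.6.2) and the divisor \(d=97\text{:}\)

```python
# Numerical check: once the sequence repeats mod d, some difference
# x_{2k} - x_k is divisible by d -- without knowing i, j, or d in advance.
n, d = 8051, 97
xs = [2]                          # x_0 = 2
for _ in range(100):
    xs.append((xs[-1]**2 + 1) % n)

# which k (among the first 50) satisfy d | x_{2k} - x_k?
ks = [k for k in range(1, 50) if (xs[2*k] - xs[k]) % d == 0]
print(ks)   # nonempty: the differences x_{2k} - x_k do pick up the divisor 97
```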
Algorithm 12.6.3. Pollard Rho factoring algorithm.
Follow these steps.
Pick some polynomial \(f(x)\) that will be easy to compute mod \(n\) (such as \(x^2+1\text{,}\) though other quadratics might be used).
Plug in an essentially random seed value \(x_0\text{.}\) (Often the seed is \(2\) or \(3\text{.}\))
Compute the polynomial's value at the seed, \(f(x_0)=x_1\text{.}\)
Continue plugging in \(f(x_i)=x_{i+1}\text{,}\) modulo \(n\text{.}\)
For each \(k\) we check whether
\begin{equation*} 1<\gcd(x_{2k}-x_k,n)=d<n\text{.} \end{equation*}
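Algorithm 12.6.3 might be implemented along these lines (a minimal sketch of my own using Floyd's two-pointer cycle-finding trick to track \(x_k\) and \(x_{2k}\) together; not an optimized version):

```python
from math import gcd

def pollard_rho(n, seed=2, f=lambda x: x**2 + 1, max_steps=10**5):
    """Pollard rho: check gcd(x_{2k} - x_k, n) for k = 1, 2, 3, ...
    Returns a nontrivial factor of n, or None (in which case, retry
    with another seed or polynomial)."""
    x = y = seed            # x will hold x_k, y will hold x_{2k}
    for _ in range(max_steps):
        x = f(x) % n        # advance one step to get x_k
        y = f(f(y) % n) % n # advance two steps to get x_{2k}
        d = gcd(abs(x - y), n)
        if d == n:
            return None     # the rho closed up completely; try another seed
        if d > 1:
            return d
    return None
```

For example, `pollard_rho(9991)` returns \(97\) (at \(k=3\text{,}\) as in the next example).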
Since the algorithm doesn't always find a factor for any given combination of seed, number, and polynomial, there is nothing to prove per se. However, probabilistically (just like with Miller-Rabin) it should succeed for \(k\) in the neighborhood of the square root of the smallest factor of \(n\text{.}\) (This is also the justification given in the original papers introducing this method and its variants.) So if \(n\) has a big, but not too big, divisor, this test should help us find that divisor.
Example 12.6.4.
Let's try this with \(n=9991\text{.}\) Keeping \(x^2+1\) and seed \(2\text{,}\) the numbers we would get for the first ten rounds are
\begin{equation*} 5,\quad 26,\quad 677,\quad 8735,\quad 8950,\quad 4654,\quad 9220,\quad 4973,\quad 3005,\quad 8153\text{.} \end{equation*}
This gives differences as follows:
\(\displaystyle x_2-x_1=26-5=21\)
\(\displaystyle x_4-x_2=8735-26=8709\)
\(\displaystyle x_6-x_3=4654-677=3977\)
\(\displaystyle x_8-x_4=4973-8735=-3762\)
\(\displaystyle x_{10}-x_5=8153-8950=-797\)
These factor as follows:
\(\displaystyle 21=3\cdot 7\)
\(\displaystyle 8709=3\cdot 2903\)
\(\displaystyle 3977=41\cdot 97\)
\(\displaystyle 3762=2\cdot 3^2\cdot 11\cdot 19\)
\(\displaystyle 797\text{ is prime}\)
That is an impressive list of nine different prime factors that could potentially be shared with \(9991\) in just five iterations. These differences have the following gcds with \(9991\text{:}\) \(1\text{,}\) \(1\text{,}\) \(97\text{,}\) \(1\text{,}\) \(1\text{.}\)
Indeed the third one already caught a common divisor with \(9991\text{.}\)
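The computations in this example can be reproduced with a short script; the `prime_factors` helper below is my own trial-division sketch, not a library routine:

```python
from math import gcd

def prime_factors(m):
    """Distinct prime factors of |m|, by simple trial division."""
    m = abs(m)
    ps, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            ps.add(d)
            m //= d
        d += 1
    if m > 1:
        ps.add(m)
    return ps

n = 9991
xs = [2]                                  # x_0 = 2
for _ in range(10):
    xs.append((xs[-1]**2 + 1) % n)

diffs = [xs[2*k] - xs[k] for k in range(1, 6)]        # x_{2k} - x_k, k = 1..5
factorizations = [sorted(prime_factors(t)) for t in diffs]
gcds = [gcd(abs(t), n) for t in diffs]
print(diffs)    # [21, 8709, 3977, -3762, -797]
print(factorizations)
print(gcds)     # [1, 1, 97, 1, 1] -- the third difference catches 97
```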
Remark 12.6.5.
This method is called the Pollard rho method because (apparently) a very imaginative eye can interpret the \(x_i\) eventually repeating (mod \(d\)) (in the example, \(d=97\)) as a tail and then a loop, i.e. a Greek \(\rho\text{.}\) John Pollard actually has another method named after him, the \(p-1\) method, which we will not explore; however, it is related to some of the more advanced methods we mentioned in the introduction to this section.
Example 12.6.6.
Sometimes the rho method doesn't come up with an answer quickly, or at all.
Here we took 50 rounds without success, using the seed \(2\text{.}\) Because of the \(\rho\) repetition, it will never succeed. So what do you do then – bring out the really advanced methods? Not at all – just as with primality testing, you just change your starting point to try again!
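That retry strategy might be sketched like this (a wrapper of my own; it simply moves on to a new seed whenever the rho closes up without yielding a factor):

```python
from math import gcd

def rho_with_retries(n, seeds=(2, 3, 4, 5, 6), f=lambda x: x**2 + 1):
    """Run the rho method with several seeds in turn.  Returns a pair
    (successful seed, nontrivial factor), or None if every seed fails."""
    for seed in seeds:
        x = y = seed
        for _ in range(10**5):
            x = f(x) % n
            y = f(f(y) % n) % n
            d = gcd(abs(x - y), n)
            if d == n:
                break           # rho closed up for this seed; try the next
            if d > 1:
                return seed, d
    return None
```

For instance, `rho_with_retries(9991)` succeeds immediately with seed \(2\text{,}\) returning `(2, 97)`.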
Subsection 12.6.2 More factorization
In general, there are other such probabilistic algorithms, and they are quite successful with factoring numbers which might have reasonably sized but not gigantic factors.
Historical remark 12.6.7. Factoring Fermat.
The big success of this algorithm was the 1980 factorization of \(F_8\) (the eighth Fermat number) by Pollard and Richard Brent (see Brent's website). They used one of several variants of the method, due to Brent, to find the previously unknown prime factor \(1238926361552897\text{.}\) Finding this factor of \(F_8\) took, in total, 2 hours on a UNIVAC 1100/42 (which for the time period was very fast indeed). Interestingly, the other (much larger) factor was not proven to be prime until significantly later; and as of this writing (January 2021) even \(F_{12}\) has not been fully factored!
Things don't automatically work quickly even with today's far more powerful hardware.
Hmm, what now? Let's change the seed.
No one method will factor every number quickly. Luckily, we have bigger guns at our disposal in Sage (especially in the component program Pari) that polish things off rather more quickly.
That is a little better than two hours on a mainframe, or even on your computer, I hope you'll agree.
Real factorization algorithms use several different methods to attack different types of factors. We can try to simulate this in a basic way by creating a Sage interact. Evaluate the first cell to define things (don't forget to keep the rho method defined); then you can evaluate the second cell, which is the interact.
Sage note 12.6.8. Building interacts.
An interact is just a Sage/Python function, except with @interact
before it. There are many different input widgets you can use; this one demonstrates using a list and an input box which takes any input. See the interact documentation or Quickstart for many examples and more details.
If you think this sort of thing is cool, the Cunningham Project is a place to explore. I particularly like their Most Wanted lists. The idea is this:
The Cunningham Project seeks to factor the numbers \(b^n \pm 1\) for \(b=2, 3, 5, 6, 7, 10, 11, 12\text{,}\) up to high powers \(n\text{.}\)
Another interesting resource is Sage developer Paul Zimmermann's Integer Factoring Records. Finally, Wagstaff's The joy of factoring [E.4.12] has tons of awesome examples and procedures – far too many, really, as well as an excellent discussion of how to speed up trial division, etc.