In the past few sections, one of the many things you may have conjectured about sums of squares is that every prime of the form \(p=4k+1\) can be represented as the sum of two squares. (We discussed why limiting the question to primes was sufficient.) It turns out this is true, and we will spend most of the remainder of this chapter proving it (in the manner of [C.1.1, Chapter 10.6], though expanded greatly to avoid any direct reference to Minkowski's Theorem). At the end of the chapter, we'll combine it with the observation about primes of the form \(p=4k+3\) to see exactly which numbers can be thus represented.
Subsection 13.4.1 A useful plot
First, let's look at the following plot on the integer lattice. As you can see, I am plotting certain points on the circle \(x^2+y^2=n\), with \(n=5\) to begin. I have done some ‘magic’ to turn a square root of \(-1\text{ (mod }n)\) into these points. Before I reveal the magic, this graphic will help us get ready for it.
To be precise, I've used this square root of \(-1\) to create the regularly spaced grid of blue points. You can think of these points as the corners of a tiling of the plane by parallelograms.
Here is how we constructed the blue grid.
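Assume that \(p\) is our prime, with \(p\equiv 1\text{ (mod }4)\), and that \(k=\left(\frac{p-1}{2}\right)!\) is our square root of negative one modulo \(p\); for such \(p\), Wilson's Theorem gives \(k^2\equiv -1\text{ (mod }p)\). (This \(k\) is not the \(k\) in \(p=4k+1\).)
The blue points are exactly the points of the form \((ak+bp,a)\), where \(a\) and \(b\) range over all integers.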
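To experiment with this construction without the interactive plot, here is a minimal Python sketch; the helper names `sqrt_minus_one`, `is_blue`, and `blue_points` are hypothetical. It uses the fact that \((x,y)\) is blue exactly when \(x-ky\) is divisible by \(p\) (take \(a=y\)), and it reduces \(k\) modulo \(p\), which does not change the set of blue points because a multiple of \(p\) added to \(k\) can be absorbed into \(b\).

```python
from math import factorial

def sqrt_minus_one(p):
    """k = ((p-1)/2)! reduced mod p; for a prime p with p % 4 == 1,
    Wilson's Theorem gives k*k ≡ -1 (mod p)."""
    return factorial((p - 1) // 2) % p

def is_blue(x, y, p, k):
    """(x, y) = (a*k + b*p, a) for some integers a, b  <=>  p divides x - k*y."""
    return (x - k * y) % p == 0

def blue_points(p, radius):
    """All blue points with both coordinates in [-radius, radius]."""
    k = sqrt_minus_one(p)
    return [(x, y)
            for x in range(-radius, radius + 1)
            for y in range(-radius, radius + 1)
            if is_blue(x, y, p, k)]
```

For \(p=5\), `sqrt_minus_one(5)` is `2`, and `blue_points(5, 2)` includes points such as `(2, 1)` (with \(a=1,b=0\)) and `(1, -2)` (with \(a=-2,b=1\)).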
As a final preliminary, let's define one more quantity for any point \((x,y)\) in the integer lattice (and especially for our blue dots).
Definition 13.4.2
The norm of a point \((x,y)\) is the sum of the squares of its coordinates, \(N(x,y)=x^2+y^2\).
Subsection 13.4.2 Primes which are sums of squares
We are now ready to state our big theorem for the section. (See Fact 14.1.6 for a quite different proof.)
Theorem 13.4.3
Every prime \(p\) of the form \(4k+1\) can be written as a sum of two squares.
Proof
The proof is fairly long. Here is the strategy; the first step will be detailed in Subsection 13.4.3 and Subsection 13.4.4.
Suppose we find some blue dot \((ak+bp,a)\) such that \begin{equation*}0<N(ak+bp,a)=a^2+(ak+bp)^2<2p\, .\end{equation*} Then we know, modulo \(p\), that \begin{equation*}N(ak+bp,a)=a^2+(ak+bp)^2 \equiv a^2+(ak)^2\equiv a^2+a^2 k^2\equiv a^2-a^2\equiv 0\text{ (mod }p)\, ,\end{equation*} so \(p\) in fact divides the norm of the point \((ak+bp,a)\).
So we have both \(0<a^2+(ak+bp)^2<2p\) and \(p\mid a^2+(ak+bp)^2\). The only multiple of \(p\) strictly between \(0\) and \(2p\) is \(p\) itself, so \(p=a^2+(ak+bp)^2\), which writes \(p\) explicitly as a sum of two squares.
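For small primes, this strategy can be checked by brute force. The following is a minimal sketch with a hypothetical helper `two_squares_via_lattice`; it simply scans a window for a blue dot with \(0<N<2p\) and makes no attempt at efficiency. Any point with norm less than \(2p\) has both coordinates at most \(\sqrt{2p}\) in absolute value, so the window is large enough.

```python
from math import factorial, isqrt

def two_squares_via_lattice(p):
    """Search the blue points (a*k + b*p, a) for one with 0 < norm < 2p.
    Assumes p is a prime with p % 4 == 1; returns (x, y) with x^2 + y^2 == p."""
    k = factorial((p - 1) // 2) % p   # square root of -1 mod p (Wilson)
    r = isqrt(2 * p)                  # |x|, |y| <= sqrt(2p) for any candidate
    for a in range(-r, r + 1):
        for x in range(-r, r + 1):
            if (x - a * k) % p != 0:
                continue              # (x, a) is not a blue point
            n = x * x + a * a
            if 0 < n < 2 * p:
                # p divides n and 0 < n < 2p, so n must equal p
                assert n == p
                return (x, a)
    return None
```

For example, `two_squares_via_lattice(13)` returns `(-2, -3)`, and indeed \((-2)^2+(-3)^2=13\).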
Example 13.4.4
For instance, with \(p=5\), we have \(k=\left(\frac{5-1}{2}\right)!=2!=2\), so we need to find a blue point \((2a+5b,a)\) such that \(0<a^2+(2a+5b)^2<2p\). Guess and check with \(a=1\) and \(b=0\) gives us \begin{equation*}N(2\cdot 1+5\cdot 0,1)=1^2+(2\cdot 1+5\cdot 0)^2=5<2\cdot 5=10\, ,\end{equation*} so this point works, and it does give the correct statement that \begin{equation*}5=1^2+2^2\; .\end{equation*}
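The same guess-and-check works for the next prime of this form, \(p=13\) (an example of our own choosing). Since adding a multiple of \(p\) to \(k\) can be absorbed into \(b\), the blue points depend only on \(k\) modulo \(p\), so we may reduce first: \begin{equation*}k=\left(\frac{13-1}{2}\right)!=6!=720=55\cdot 13+5\equiv 5\text{ (mod }13)\, ,\end{equation*} and indeed \(5^2=25\equiv -1\text{ (mod }13)\). The first guess \(a=1,b=0\) gives the point \((5,1)\), with norm \(1^2+5^2=26\), which is not less than \(2p=26\). Trying \(a=2\) and \(b=-1\) instead gives \begin{equation*}N(5\cdot 2+13\cdot(-1),2)=2^2+(5\cdot 2+13\cdot(-1))^2=4+9=13<2\cdot 13=26\, ,\end{equation*} so \begin{equation*}13=2^2+3^2\; .\end{equation*}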
What remains to be shown is that there actually is such a blue dot.
To prove that every prime \(p=4k+1\) can be written as a sum of two squares, it therefore suffices to show that there is a blue dot (somewhere), other than the origin, with norm smaller than \(2p\). We will lean heavily on graphics, but every claim also makes sense algebraically; pictures simply help us think through a more involved proof.
We include a variation on the graphic to make this visually clear. The bigger circle is the one we care about now: it has equation \(x^2+y^2=2p\), so its radius is \(\sqrt{2p}\). If we find a blue point strictly inside that circle but not at the origin, then the argument in the proof sketch shows it must lie on the smaller circle \(x^2+y^2=p\).
Strangely enough, the best way to do this is to consider the areas of the various regions involved, and to show that the bigger circle is so large that it must contain a blue point other than the origin. Let's see how this works.
The area of the bigger circle, which has radius \(\sqrt{2p}\), is \(\pi (\sqrt{2p})^2=2\pi p\). Since \(\pi >2\), we have \(2\pi>2(2)=4\), which means that the area of the bigger circle is bigger than \(4p\).
Next we create a sublattice of the blue dots, which we will color green. (A sublattice is a subset of a lattice that is itself a lattice.)
To create the green sublattice, take all the blue dots and double their coordinates. Each green dot is still a blue dot, since \(2(ak+bp,a)=\big((2a)k+(2b)p,2a\big)\).
Next, let's look at the triangles formed by dots of each color (in the plot, these can be displayed in red), and compare the thinnest such triangles.
The thinnest triangle made by blue dots has vertices at the origin, \((p,0)\) (with \(a=0,b=1\)), and \((k,1)\) (with \(a=1,b=0\)); it has width \(p\) and height \(1\).
The thinnest triangle made by the green dots has width \(2p\) (from the origin to \((2p,0)\), the previous point doubled) and height \(2\) (to the point \((2k,2)\), which is \((k,1)\) doubled).
The green triangle therefore has area \((2p)(2)/2=4p/2=2p\), so the parallelogram made of two of them (outlined with solid red lines in the plot) has area \(4p\). Since the bigger circle has area \(2\pi p>4p\), this parallelogram is smaller than the bigger circle.
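The areas above can be double-checked with the determinant formula: the parallelogram spanned by vectors \(u\) and \(v\) has area \(|u_1v_2-u_2v_1|\), and each thinnest triangle is half of such a parallelogram. A minimal sketch, with hypothetical helper names:

```python
from math import factorial, pi

def parallelogram_area(u, v):
    """Area of the parallelogram spanned by u and v (absolute 2x2 determinant)."""
    return abs(u[0] * v[1] - u[1] * v[0])

def lattice_areas(p):
    """Blue and green fundamental parallelogram areas, and the bigger circle's area."""
    k = factorial((p - 1) // 2) % p
    blue = parallelogram_area((p, 0), (k, 1))           # b=1 and a=1 points
    green = parallelogram_area((2 * p, 0), (2 * k, 2))  # the same points doubled
    circle = pi * 2 * p                                 # disk x^2 + y^2 < 2p
    return blue, green, circle
```

Doubling both coordinates multiplies every area by \(4\), which is why the green area is exactly four times the blue one, \(4p\) versus \(p\).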
This proof is very visual, so before we move on, make sure you believe all of this. Then we will analyze the exact areas involved more closely to finish. Remember, we are trying to prove that there is a blue point inside the bigger blue circle, but away from the origin.
Let's take stock.
We've created circles of various sizes to find points in, and two lattices to examine.
The area of the bigger circle (\(2\pi p\)) is more than the area (\(4p\)) of the smallest parallelogram made by green dots.
Copies of this parallelogram, shifted by green vectors, tile the plane, and every point of the plane (not just green, blue, or lattice points) is a shift of exactly one point of the parallelogram. So \(4p\) is the largest area a region can have without containing two points that are shifts of each other.
So the circle, having a bigger area, must contain two points (not necessarily blue points, just points in the plane) that are repeats of each other under shifting by this parallelogram, which is called a fundamental region.
This may sound a little suspicious, so let's be sure about it.
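Claim 13.4.5
The bigger circle contains two distinct points (of some kind) that are repeats of each other under shifting the fundamental region, that is, two points whose difference is a green vector.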
Call the parallelogram in red \(L\). The circle is the union of its pieces lying in the different green parallelograms, each of which is a shifted copy of \(L\).
Let's ‘move’ each piece of the circle to the corresponding part of \(L\) by shifting it back by the appropriate green vector. Now suppose no two points of the big circle are repeats of each other. Then this movement of the pieces of the circle into \(L\) is one-to-one.
If that movement is one-to-one, then the moved pieces do not overlap, so together they fill up at most \(L\), an area of \(4p\). But moving doesn't change area, so the moved pieces have the same total area as the circle, namely \(2\pi p>4p\). This is impossible, and thus there are two points which are repeated.
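The area argument has a finite analogue that is easy to compute; this is only an illustration, not part of the proof. Replace areas by counts of points on the fine grid \(\frac{1}{N}\mathbb{Z}^2\). The green lattice has index \(4p\) in \(\mathbb{Z}^2\) (the area of its fundamental parallelogram), so up to green shifts there are exactly \(4pN^2\) grid points. If the big circle contains more grid points than that, the pigeonhole principle forces two of them to be repeats. A minimal sketch, with hypothetical function names:

```python
from math import isqrt

def grid_points_in_disk(p, N):
    """Count grid points (X/N, Y/N) strictly inside the big circle x^2 + y^2 = 2p.
    In terms of the integers X, Y, the condition is X^2 + Y^2 < 2p*N^2."""
    bound = 2 * p * N * N
    r = isqrt(bound)
    return sum(1
               for X in range(-r, r + 1)
               for Y in range(-r, r + 1)
               if X * X + Y * Y < bound)

def classes_mod_green(p, N):
    """Number of grid points of (1/N)Z^2 up to shifts by green vectors."""
    return 4 * p * N * N
```

For \(p=5\) and \(N=1\) there are \(29\) grid points in the disk but only \(20\) classes, so some two must repeat. As \(N\) grows, the ratio of the two counts approaches \(2\pi p/4p=\pi/2>1\), mirroring the comparison of areas.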
Now let's continue the proof of the main Theorem 13.4.3. To start, take two distinct points that are repeated in the circle; call them \(v\) and \(w\). Thinking of the points as vectors, \(v-w\) is itself a green point, since being a repeat means exactly that \(v\) is obtained from \(w\) by shifting along an integer combination of the two sides of the green parallelogram.
By the construction of the green points, that means \((v-w)/2\) is a blue point. This point can't be the origin, since \(v\) and \(w\) are different!
Further, this blue point is inside the big circle (so its norm is less than \(2p\)). Why?
Since the circle is nicely symmetric about the origin, the point \(-w\) is also in the circle.
The midpoint of the line segment connecting \(v\) and \(-w\), both points in the big circle, is in fact \((v-w)/2=\frac{v+(-w)}{2}\).
The disk inside a circle is convex, so this blue point, lying on the segment between \(v\) and \(-w\), is in the big circle. So we have found a blue point other than the origin inside the big circle.
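The whole argument can be run mechanically on the fine grid \(\frac{1}{N}\mathbb{Z}^2\): find two grid points \(v\neq w\) strictly inside the big circle that reduce to the same point modulo the green lattice, then return \((v-w)/2\). This is a hypothetical brute-force sketch (the name `blue_point_from_repeat` and the grid parameter `N` are our own), meant only to watch the proof produce an actual blue point; it works with the integers \(X=Nx\), \(Y=Ny\) to keep the arithmetic exact.

```python
from math import factorial, isqrt

def blue_point_from_repeat(p, N=1):
    """Follow the proof on the grid (1/N)Z^2: find distinct v, w strictly inside
    x^2 + y^2 = 2p whose difference is a green vector, and return (v - w)/2.
    Assumes p is a prime with p % 4 == 1."""
    k = factorial((p - 1) // 2) % p
    bound = 2 * p * N * N          # (X/N)^2 + (Y/N)^2 < 2p  <=>  X^2 + Y^2 < bound
    r = isqrt(bound)
    seen = {}                      # reduced point -> first grid point reducing to it
    for X in range(-r, r + 1):
        for Y in range(-r, r + 1):
            if X * X + Y * Y >= bound:
                continue
            # Reduce (X, Y) modulo N times the green lattice, whose basis is
            # (2pN, 0) and (2kN, 2N): bring Y into [0, 2N), then X into [0, 2pN).
            s = Y // (2 * N)
            key = ((X - s * 2 * k * N) % (2 * p * N), Y - s * 2 * N)
            if key in seen:
                X1, Y1 = seen[key]
                dX, dY = X - X1, Y - Y1      # equals N * (v - w), and v - w is green
                assert dX % (2 * N) == 0 and dY % (2 * N) == 0
                return (dX // (2 * N), dY // (2 * N))  # (v - w)/2, a blue point
            seen[key] = (X, Y)
    return None
```

With \(p=5\) and \(N=1\), the first repeat found is \(v=(0,-3)\) and \(w=(-2,1)\), whose difference \((2,-4)\) is green, and the function returns the blue point `(1, -2)` of norm \(5\). The norm comes out as exactly \(p\), as the argument predicts.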
Here is the picture of how to find the blue point in the circle. The black points are \(v\), \(w\), and \(-w\), and you see the midpoint of the line is indeed blue.
Believe it or not, we've concluded the proof – whew!
Why was this so hard? I can think of three reasons.
First, we are trying to prove something about squares by proving something about square roots. It works, but it means there will be many steps.
Second, we are not just proving existence algebraically by solving an equation; we are forced to prove existence with inequalities, which brings its own set of complications.
Third, these are not just any old inequalities but truly geometric ones, so we must gain insight geometrically, which is worthwhile but stretching.