
Section 13.4 Primes as Sum of Squares

In the past few sections, one of the many things you may have conjectured about sums of squares is that every prime of the form \(p=4k+1\) can be represented as the sum of two squares. (We discussed why limiting the question to primes was sufficient.) It turns out this is true, and we will spend most of the remainder of this chapter proving it (in the manner of [C.1.1, Chapter 10.6], though expanded greatly to avoid any direct reference to Minkowski's Theorem). At the end of the chapter, we'll combine it with the observation about primes of the form \(p=4k+3\) to see exactly which numbers can be thus represented.

Subsection 13.4.1 A useful plot

First, let's look at the following plot on the integer lattice. As you can see, I am plotting certain points on the circle \(x^2+y^2=n\), with \(n=5\) to begin. I have done some ‘magic’ to turn the square root of \(-1\text{ (mod }n)\) into these points. Before I tell you the magic, this graphic will help us get ready to see it.

To be precise, I've used this square root of \(-1\) to create the regularly spaced grid of blue points. You can think about it as a bunch of corners of parallelograms.

Remark 13.4.1

Sometimes we generically call things like the set of blue dots a lattice, though we will usually use the word lattice only to refer to the integer lattice of the black dots. A general lattice is related to a concept from linear algebra: it is the set of vectors generated by a basis, except that the coefficients come from \(\mathbb{Z}\) instead of \(\mathbb{Q}\) or \(\mathbb{R}\).

Here is how we constructed the blue grid.

  • Assume that \(p\) is our prime and \(k=\left(\frac{p-1}{2}\right)!\) is our square root of negative one modulo \(p\), so that \(k^2\equiv -1\text{ (mod }p)\). (This \(k\) is not the \(k\) in \(p=4k+1\).)

  • The blue points all are of the form \((ak+bp,a)\) for all integers \(a,b\).
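The two steps above can be sketched in plain Python (rather than Sage). The function names `sqrt_of_minus_one` and `blue_points` are made up for this illustration, and the sketch reduces \(k\) modulo \(p\), which is an assumption of convenience: it gives the same set of blue points, since any multiple of \(p\) can be absorbed into \(b\).

```python
from math import factorial

def sqrt_of_minus_one(p):
    """Return k = ((p-1)/2)! reduced mod p; assumes p is a prime with p % 4 == 1."""
    return factorial((p - 1) // 2) % p

def blue_points(p, bound):
    """List the blue points (a*k + b*p, a) for all integers |a|, |b| <= bound."""
    k = sqrt_of_minus_one(p)
    return [(a * k + b * p, a)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)]

for p in (5, 13, 17):
    k = sqrt_of_minus_one(p)
    # The last printed value is 0 exactly when k^2 = -1 (mod p).
    print(p, k, (k * k + 1) % p)
```

Only a finite window of the grid is listed, controlled by the hypothetical parameter `bound`; the actual set of blue points is infinite.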

For one final preliminary, let's define one more thing for any old point \((x,y)\) in the integer lattice (and especially for our blue dots).

Definition 13.4.2

We call the norm of a point \((x,y)\) the sum of squares, \(N(x,y)=x^2+y^2\).

Subsection 13.4.2 Primes which are sums of squares

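We are now ready to state our big theorem for the section. (See Fact 14.1.6 for a quite different proof.)

Theorem 13.4.3

Every prime \(p\equiv 1\text{ (mod }4)\) can be written as the sum of two squares.

Proof

Take \(k=\left(\frac{p-1}{2}\right)!\) as in the construction of the blue points, so that \(k^2\equiv -1\text{ (mod }p)\). Then every blue point \((ak+bp,a)\) has norm \begin{equation*}N(ak+bp,a)=(ak+bp)^2+a^2=a^2(k^2+1)+2abkp+b^2p^2\equiv 0\text{ (mod }p)\; .\end{equation*} So if a blue point other than the origin has norm less than \(2p\), its norm is a positive multiple of \(p\) below \(2p\), hence exactly \(p\), and its coordinates write \(p\) as a sum of two squares.

Example 13.4.4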

For instance, with \(p=5\), we have that \(k=\left(\frac{5-1}{2}\right)!=2!=2\), so we need to find a point \((2a+5b,a)\) such that \((2a+5b)^2+a^2<2p\). Guess and check with \(a=1\) and \(b=0\) gives us \begin{equation*}N(2\cdot 1 +5\cdot 0,1)=(2\cdot 1+5\cdot 0)^2+1^2=5<2\cdot 5=10\end{equation*} so this point should work, and this does give the correct statement that \begin{equation*}5=2^2+1^2\; .\end{equation*}
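The guess and check in this example can be automated for other primes. The following is a minimal sketch in plain Python, not code from the text; the name `small_blue_point` is hypothetical, and \(k\) is reduced modulo \(p\) for convenience. It only tries \(a\ge 1\) (negating a point does not change its norm) and \(-a\le b\le 0\), which is enough to reach every candidate with first coordinate smaller than \(\sqrt{2p}\) in absolute value.

```python
from math import factorial, isqrt

def norm(x, y):
    """The norm N(x, y) = x^2 + y^2 of a point in the integer lattice."""
    return x * x + y * y

def small_blue_point(p):
    """Search for a blue point (a*k + b*p, a), not the origin, with norm < 2p.

    Assumes p is a prime with p % 4 == 1. Returns None if nothing is found.
    """
    k = factorial((p - 1) // 2) % p
    limit = isqrt(2 * p)               # a^2 < 2p forces |a| <= isqrt(2p)
    for a in range(1, limit + 1):
        for b in range(-a, 1):         # shifts by p that can bring a*k near 0
            x = a * k + b * p
            if norm(x, a) < 2 * p:
                return (x, a)
    return None

for p in (5, 13, 17, 29):
    point = small_blue_point(p)
    print(p, point, norm(*point))
```

A successful search for particular primes is evidence, not a proof, that such a point always exists.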

What remains to be shown is that there actually is such a blue dot.

Subsection 13.4.3 Visualizing the proof

To prove the theorem that for any \(p=4k+1\) we can write it as a sum of squares, we need to prove there is a blue dot (somewhere) that is not at the origin but also has norm smaller than \(2p\). We will prove this by heavy reference to graphics, but all claims also make sense algebraically. Sometimes we need help to be able to think about more involved proofs.

We include a variation on the graphic to make this visually clear. The bigger circle is the one we care about now; it has formula \(x^2+y^2=2p\), so radius \(\sqrt{2p}\). If we find a blue point inside that circle but not at the origin, then the argument in the proof sketch shows it must be on the smaller circle \(x^2+y^2=p\).

Very strangely, the best way to do this is by considering the area of the bigger circle, and showing that it is so big that there just must be a blue point in it (other than the origin). Let's see how this works.

The area of the bigger circle, which has radius \(\sqrt{2p}\), is \(\pi (\sqrt{2p})^2=2\pi p\). Since \(\pi >2\), we have that \(2\pi>2(2)=4\), which means that the area \(2\pi p\) of the bigger circle is bigger than \(4p\).

What we do now is to create a sublattice of the blue dots, which we will color green. (This is just a subset of a lattice which still otherwise satisfies the conditions for being a lattice.)

To create the green sublattice, take all blue dots, and just double their coordinates. (Naturally, each green dot is still a blue dot.)

Next, we take a look at the triangles made by the different colored dots. (You can click on triangles_on in the interact above to see them in red.) Compare the thinnest such triangles.

  • The thinnest triangle made by blue dots would be from the origin and the points \((p,0)\) (with \(a=0,b=1\)) and \((k,1)\) (with \(a=1,b=0\)).

  • The thinnest triangle made by the green dots has width \(2p\) (from the origin to \((2p,0)\), the previous point doubled) and height \(2\) (to the point \((2k,2)\), which is \((k,1)\) doubled).

The green dot triangle has area \(\frac{1}{2}\cdot 2p\cdot 2=2p\), so the parallelogram with the solid red lines made of two of them has area \(4p\). Since the bigger circle has area more than \(4p\), this parallelogram has smaller area than the bigger circle.
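These areas can be double-checked numerically. The sketch below is plain Python with \(p=13\) chosen arbitrarily; the helper `parallelogram_area` is a hypothetical name, and it computes the area as the absolute value of a \(2\times 2\) determinant, which is a standard fact rather than something taken from the text.

```python
from math import factorial, pi

def parallelogram_area(u, v):
    """Area of the parallelogram spanned by plane vectors u and v."""
    return abs(u[0] * v[1] - u[1] * v[0])

p = 13
k = factorial((p - 1) // 2) % p      # square root of -1 mod p, reduced mod p

blue_basis = [(p, 0), (k, 1)]        # the points with (a, b) = (0, 1) and (1, 0)
green_basis = [(2 * x, 2 * y) for (x, y) in blue_basis]   # doubled coordinates

blue_area = parallelogram_area(*blue_basis)     # equals p
green_area = parallelogram_area(*green_basis)   # equals 4p
circle_area = pi * (2 * p)                      # pi * (sqrt(2p))^2 = 2*pi*p

print(blue_area, green_area, circle_area, circle_area > green_area)
```

Doubling both basis vectors multiplies the determinant by four, which is why the green parallelogram has area \(4p\) whatever the value of \(k\).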

This proof is very visual, so before we move on, make sure you believe all of this. Then we will analyze the exact areas involved more closely to finish. Remember, we are trying to prove that there is a blue point inside the bigger blue circle, but away from the origin.

Subsection 13.4.4 Finishing the proof

Let's take stock.

  • We've created circles of various sizes to find points in, and two lattices to examine.

  • The area of the circle is more than the area (\(4p\)) of the smallest parallelogram made by green dots.

  • Every point of the plane (not just green, blue, or lattice points) is a shifted copy of a point inside the parallelogram, so \(4p\) is the biggest area a region can have without containing two points that are shifted copies of each other.

  • So, the circle, having a bigger area, must have two points (not necessarily blue points, just points on the plane) which are repeated by the shifting of this parallelogram (called a fundamental region).

This may sound a little suspicious, so let's be sure about it.

Now let's continue the proof of the main Theorem 13.4.3. To start, take two different points in the circle that are repeats of each other; call them \(v\) and \(w\). Then if we consider the points as vectors, \(v-w\) is itself a green point, since one point is a repeat of another exactly when the shift between them is a combination of the sides of the parallelogram, and those combinations are precisely the green points.

By the construction of the green points, that means \((v-w)/2\) is a blue point. This point can't be the origin, since \(v\) and \(w\) are different!

Further, this blue point is inside the big circle (so its norm is less than \(2p\)). Why?

  • Since the circle is nicely symmetric about the origin, the point \(-w\) is also in the circle.

  • The midpoint of the line segment connecting \(v\) and \(-w\), both points in the big circle, is in fact \((v-w)/2=\frac{v+(-w)}{2}\).

  • Circles are convex, so this blue point being between \(v\) and \(-w\) means it's in the big circle. So we have found a blue point other than the origin in the big circle.

Here is the picture of how to find the blue point in the circle. The black points are \(v\), \(w\), and \(-w\), and you see the midpoint of the line is indeed blue.
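The search for \(v\) and \(w\) can be imitated with a discrete pigeonhole argument. This is a sketch in plain Python under stated assumptions, not the code from the text: it only looks at integer points strictly inside the bigger circle, and it labels each one by its class modulo the green lattice generated by \((2p,0)\) and \((2k,2)\). There are \(4p\) such classes, so once more than \(4p\) integer points lie inside the circle (which happens for the primes tried here), two of them must share a class. The name `blue_point_by_pigeonhole` is hypothetical.

```python
from math import factorial, isqrt

def blue_point_by_pigeonhole(p):
    """Find integer points v, w inside x^2 + y^2 < 2p that differ by a green
    vector, and return (v - w)/2, which is a blue point. Returns None if no
    such pair is found. Assumes p is a prime with p % 4 == 1."""
    k = factorial((p - 1) // 2) % p          # square root of -1 mod p
    r = isqrt(2 * p)                          # no coordinate can exceed this
    seen = {}                                 # class label -> first point seen
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            if x * x + y * y >= 2 * p:
                continue                      # not strictly inside the circle
            # Label the class modulo the green lattice: remove a*(2k, 2) with
            # 2a the even part of y, then reduce the first coordinate mod 2p.
            even_part = y - (y % 2)
            key = ((x - k * even_part) % (2 * p), y % 2)
            if key in seen:
                wx, wy = seen[key]
                # v - w is a green point, so both differences are even.
                return ((x - wx) // 2, (y - wy) // 2)
            seen[key] = (x, y)
    return None

for p in (5, 13, 17, 29):
    u = blue_point_by_pigeonhole(p)
    print(p, u, u[0] ** 2 + u[1] ** 2)
```

Because \(v\) and \(w\) are distinct and both strictly inside the circle, the returned point is not the origin and has norm less than \(2p\), so by the argument above its norm is \(p\).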

Sage note 13.4.6 Examining code is good for you

This is by far the longest code we've seen up to this point. It is a brute force check of all movements of all points in the parallelogram to find two points in the bigger circle. Can you think of ways to make it more efficient?

Believe it or not, we've concluded the proof – whew!

Why was this so hard? I can think of three reasons.

  • First, we are trying to prove something about squares by proving something about square roots. It works, but it means there will be many steps.

  • Secondly, we are not just algebraically proving it exists by solving an equation; we are forced to prove our square root exists with inequalities, which brings another set of complications.

  • Third, we are looking not just at any old inequalities, but truly geometric ones, and so we must gain insight that way – worthwhile, but stretching.

Remark 13.4.7

Many more theorems of this kind can be proved using these techniques; the names of Minkowski and Blichfeldt show up in further generality, which we are intentionally avoiding. Those who have had some physics may have heard of Minkowski before, since his geometric formulation of spacetime became the standard mathematical framework for Einstein's special relativity.