The easy parts of the prime number theorem

The prime number theorem states that the number of primes less than x is approximately \int_{t \leq x} dt/\log t. Proofs of the PNT tend to be easy to follow step by step, but I can never remember them once I step away from the page. I think I finally got the outline of the standard analytic proof to stick, so I’m going to record it here before I lose it:

1. Play around with \zeta(s).

Define the zeta function by

\zeta(s) = \sum_{n=1}^{\infty} n^{-s}.

Notice that \zeta(s) = \int_{t=1}^{\infty} \frac{dt}{t^s} + O(1)=1/(s-1) + O(1) as s \to 1. We have Euler’s factorization:

\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1-p^{-s}}

so

-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{(\log p) p^{-s}}{1- p^{-s}} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.

Here \Lambda(n) is the von Mangoldt function: \Lambda(n) = \log p if n=p^k for a prime p and some k \geq 1, and \Lambda(n)=0 otherwise. Since \zeta(s) has a first-order pole at 1, we have - \zeta'(s)/\zeta(s) = 1/(s-1) + O(1). So

\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} + O(1).

With a little care, we can see that the O(1) is an analytic function of s. So

\lim_{s \to 1} \sum_{n=1}^{\infty} \frac{\Lambda(n)-1}{n^s} exists.
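As a quick numerical sanity check (not part of the argument; the `mangoldt_table` sieve below is an ad hoc helper, just for illustration), one can watch \sum 1/n^s blow up as s decreases toward 1 while \sum (\Lambda(n)-1)/n^s stays bounded:

```python
import math

def mangoldt_table(N):
    # lam[n] = log p if n = p^k for a prime p, else 0 (von Mangoldt)
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

N = 100000
lam = mangoldt_table(N)
for s in [1.5, 1.2, 1.1, 1.05]:
    blow_up = sum(1.0 / n ** s for n in range(1, N + 1))
    bounded = sum((lam[n] - 1.0) / n ** s for n in range(1, N + 1))
    print(f"s={s}:  sum 1/n^s ~ {blow_up:7.3f}   sum (Lambda(n)-1)/n^s ~ {bounded:+.3f}")
```

(The truncation at N = 10^5 is crude, but it is enough to see the cancellation between \Lambda(n) and 1.)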

2. Take the limit as s goes to 1

Use the limit above to deduce that

\sum \frac{\Lambda(n)-1}{n} converges.

This step is really hard; it is where the real work of the proof happens. Maybe I’ll say something about it later. But, intuitively, it makes sense.

Added in response to a comment of E. Kowalski As I discuss a little below, the fear is that \Lambda(n)-1 might behave like \sin(r \log n), for some r. In this case, we could not take the limit. If this exact problem occurred, then -\zeta'(s)/\zeta(s)+\zeta(s) would have a pole at 1+ir. A very clever trick shows that this function has no poles with real part 1. The challenge is to show that the absence of poles means that the limit is permissible.
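Numerically, at least, the convergence looks plausible. In the sketch below (the `mangoldt_table` sieve is an ad hoc helper), the partial sums appear to settle near -1.15; if I have the classical constants right, the limit should be -2\gamma.

```python
import math

def mangoldt_table(N):
    # lam[n] = log p if n = p^k for a prime p, else 0 (von Mangoldt)
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

N = 300000
lam = mangoldt_table(N)
partial, results, n = 0.0, [], 1
for x in [10 ** 3, 10 ** 4, 10 ** 5, 3 * 10 ** 5]:
    while n <= x:
        partial += (lam[n] - 1.0) / n
        n += 1
    results.append(partial)
    print(f"x = {x:>6}:  sum_(n<=x) (Lambda(n)-1)/n ~ {partial:+.4f}")
```

Of course, no finite computation can distinguish convergence from very slow oscillation, which is exactly the fear discussed above.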

3. Add up the partial sums

Fix \epsilon > 0. Then there is an M such that, for M < w \leq x, we have \left| \sum_{n=w}^x \frac{\Lambda(n)-1}{n} \right| < \epsilon. Also, there must be some absolute bound C for the absolute value of any sum of the form \sum_{n=w}^x \frac{\Lambda(n)-1}{n}.

So

\left| \sum_{n=1}^x (\Lambda(n)-1) \right| = \left| \sum_{w=1}^x  \sum_{n=w}^x  \frac{\Lambda(n)-1}{n} \right| < CM + \epsilon x.

Since \epsilon is arbitrary, this shows that \sum_{n=1}^x (\Lambda(n)-1) = o(x). You are more likely to see this quoted as \sum_{n=1}^x \Lambda(n) =x +o(x).
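Here is step 3 checked numerically: with \psi(x) = \sum_{n \leq x} \Lambda(n), the ratio \psi(x)/x drifts toward 1 (again an ad hoc sieve helper, purely for illustration):

```python
import math

def mangoldt_table(N):
    # lam[n] = log p if n = p^k for a prime p, else 0 (von Mangoldt)
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

N = 200000
lam = mangoldt_table(N)
psi, n = 0.0, 1
for x in [10 ** 3, 10 ** 4, 10 ** 5, 2 * 10 ** 5]:
    while n <= x:
        psi += lam[n]  # psi(x) = sum of Lambda(n) for n <= x
        n += 1
    ratio = psi / x
    print(f"x = {x:>6}:  psi(x)/x ~ {ratio:.4f}")
```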

4. Clean up

Standard techniques turn \sum_{n=1}^x (\Lambda(n)-1) = o(x) into \sum_{n=2}^x (\Lambda(n)/\log n-1/\log n) = o(x/\log x). In other words,
\pi(x) + (1/2) \pi(x^{1/2}) + (1/3) \pi(x^{1/3}) + \cdots = \sum_{n=2}^x (1/\log n) + o(x/\log x).
The terms (1/k) \pi(x^{1/k}) are negligible for k>1.
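Numerically (a small sketch with a plain prime sieve): at x = 10^5 we have \pi(x) = 9592, the sum \sum_{n=2}^x 1/\log n is within a few dozen of it, and the correction term (1/2)\pi(x^{1/2}) is already tiny by comparison:

```python
import math

N = 100000
# a plain prime sieve
composite = [False] * (N + 1)
primes = []
for p in range(2, N + 1):
    if not composite[p]:
        primes.append(p)
        for m in range(p * p, N + 1, p):
            composite[m] = True

pi_x = len(primes)
approx = sum(1.0 / math.log(n) for n in range(2, N + 1))
half_sqrt_term = 0.5 * sum(1 for p in primes if p * p <= N)  # (1/2) pi(x^(1/2))
print(f"pi({N}) = {pi_x}")
print(f"sum_(n=2..{N}) 1/log n ~ {approx:.1f}")
print(f"(1/2) pi(x^(1/2)) = {half_sqrt_term}")
```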

A few notes below the fold:

Getting stronger bounds:

Take the equation \sum \frac{\Lambda(n) - 1}{n^s} = g(s), where g(s) is analytic near 1, and differentiate it d times. We get that \lim_{s \to 1} \sum \frac{(\Lambda(n)-1) (\log n)^d}{n^s} exists; that \sum \frac{(\Lambda(n)-1) (\log n)^d}{n} converges; that \sum_{n=1}^x (\Lambda(n)-1)(\log n)^d = o(x); and that \pi(x) = \sum_{n=1}^x 1/\log n + o(x/(\log x)^{d+1}).

UPDATE I should probably add a warning that I have neither seen this done nor checked the hardest part myself. It seems like the obvious way to proceed.

Even without the difficult step 2, the fact that \lim_{s \to 1} \sum \frac{(\Lambda(n)-1) (\log n)^d}{n^s} exists is already powerful. It shows that, if any result of the form
\pi(x) = \int_{t \leq x} \frac{dt}{\log t} + C_1 \frac{x}{\log x} + \cdots + C_d \frac{x}{(\log x)^d} + o\left(\frac{x}{(\log x)^d}\right)
holds, then the C_i must all be 0.
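Since I haven't checked the hardest part of this, here is at least a numerical probe of the d = 1 case, where the partial sums of \sum (\Lambda(n)-1)\log n / n do seem to flatten out (ad hoc sieve helper again; an experiment, not a proof):

```python
import math

def mangoldt_table(N):
    # lam[n] = log p if n = p^k for a prime p, else 0 (von Mangoldt)
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

N = 300000
lam = mangoldt_table(N)
partial, results, n = 0.0, [], 2
for x in [10 ** 3, 10 ** 4, 10 ** 5, 3 * 10 ** 5]:
    while n <= x:
        partial += (lam[n] - 1.0) * math.log(n) / n
        n += 1
    results.append(partial)
    print(f"x = {x:>6}:  sum_(n<=x) (Lambda(n)-1) log n / n ~ {partial:+.4f}")
```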

Converting sums to integrals:

Instead of using \zeta(s) as our example of a function with the correct pole at 1, we could use \int_{t=1}^{\infty} t^{-s} dt. So we would say, for example, that
\lim_{s \to 1} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} - \int_{t=1}^{\infty} t^{-s} dt exists. This makes a few things a little nicer, and a few things a little uglier.

A more dramatic change is to use measures to make all sums into integrals. Let m_{\Lambda} be the atomic measure which is \Lambda(n) times a delta function at n. So we can write statements like “\int_{t=1}^{\infty} \frac{m_{\Lambda}(t) - dt}{t} converges”. This is very helpful in writing the proof of step 2 cleanly. The reader who is scared by measures can use the Riemann-Stieltjes integral \int d(\Theta(t)-t)/t, where \Theta(x) = \sum_{n \leq x} \Lambda(n).

If even that is too scary, one can integrate by parts and talk about \int (\Theta(t)-t) dt/t^2. This is how Newman’s proof is usually written, but I think it obscures more than it reveals.

The oscillatory fear

The key step is going from the fact that \lim_{s \to 1} \int \frac{f(t)}{t^s} exists (for a certain measure f) to the conclusion that \int \frac{f(t)}{t} converges.

The fear here is that f(t) could be something like \sin(r \log t) \, dt for some constant r. Then \int \frac{\sin(r \log t) \, dt}{t^s} = \int e^{- (s-1) u} \sin(r u) \, du, where u=\log t. The limit as s \to 1 exists, but \int \sin(ru) \, du does not converge. ADDED Notice that, in this case, \int f(t) t^{-s} dt would have a pole at s=1+ir.
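A quick numerical sketch of this (midpoint-rule quadrature; the cutoffs are arbitrary): for s > 1 the damped integral matches the closed form r/((s-1)^2 + r^2), which has a finite limit as s \to 1, while the undamped antiderivative (1 - \cos(rU))/r just oscillates forever.

```python
import math

def damped_integral(a, r, U=400.0, du=0.001):
    # midpoint-rule approximation of the integral of e^(-a*u) sin(r*u) over [0, U]
    n = int(U / du)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        total += math.exp(-a * u) * math.sin(r * u) * du
    return total

r = 1.0
for a in [0.2, 0.1, 0.05]:  # a plays the role of s - 1
    closed_form = r / (a * a + r * r)
    print(f"s-1={a}:  numeric ~ {damped_integral(a, r):.4f}   r/((s-1)^2+r^2) = {closed_form:.4f}")

# at s = 1 (a = 0) the antiderivative is 1 - cos(U), which never settles down:
for U in [100.0, 150.0, 200.0]:
    print(f"integral of sin(u) from 0 to {U} = {1.0 - math.cos(U):+.3f}")
```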

5 thoughts on “The easy parts of the prime number theorem”

  1. A professional analytic number theorist would say that the “really hard” step 2 is in fact the proof of the PNT (on the other hand, the post is titled “The easy parts of the proof…”, so it’s fair).

    More seriously, there isn’t that much complex analysis in those steps, so the effect of (potential) zeros of zeta(s) on the line with real part 1 is invisible, although any such zero (however far away) would imply that the PNT is false.

  2. I did want to write about the easy steps :). For one thing, until I started thinking this through, I didn’t understand why it was considered hard to go from s=1+\epsilon to s=1, but easy to go from s=1 to the “s=0” statement that \sum_{n \leq x} (\Lambda(n)-1) = o(x). Now I see that the value of s is hiding not just in the summand, but also in the error bound.

    However, I’ve added some edits to try to get a little deeper.

  3. what are other theorems with “elementary” proofs that are worth memorizing? perhaps, the Wigner Semi-Circle theorem.

  4. Small typo: in the end of Step 1, the lower-case lambda should be upper case.

    The way I think of this is that there is a hierarchy of averages that one can perform on a quantity f(n) (such as the von Mangoldt function), from rough (and sharply localised) averages to smooth (and slowly decaying) averages. The rougher averages convey more information than the smoother ones, but by the same token are harder to control. (Note that one can typically write a smooth average efficiently as an average of rough averages, but not vice versa, since averaging only makes things smoother, not rougher. There are _inefficient_ tricks to express rough averages in terms of smooth averages, such as Polya-Vinogradov completion of sums or analytic continuation, but these tend to not be terribly good for the goal of obtaining good bounds.)

    Various oscillations in the primes (notes in the “music of the primes”, if you will), are picked up by sufficiently rough averages but are damped out to insignificance in sufficiently smooth averages. So as one gets to rougher and rougher averages, one needs to deal with more and more oscillatory scenarios. The first one, as pointed out above, is that \Lambda(n) resonates with \sin( r \log n ), which would destroy the prime number theorem; then there are weaker resonances such as n^{-\sigma} \sin(r \log n) for some 0 < \sigma < 1/2, which would leave the prime number theorem intact but destroy the Riemann hypothesis.

    Where complex analysis comes in is in showing that these sinusoidal conspiracies are the only things that cause the primes to behave non-uniformly, so that once one eliminates those, one has good control on the distribution of the primes. Other types of oscillation are not compatible with the meromorphic nature of the zeta function. One can pick up an echo of this important complex-analytic fact in the elementary approaches to the prime number theorem via the Selberg symmetry formula as discussed in a recent blog post of mine, but the complex method (with its well-developed theory of zeroes, factorisation, etc.) is far slicker in doing this.

  5. David – Thank you for writing this helpful post. I have had the same experience that you have (where the outline of the proof doesn’t stick despite having been through the proof several times) and I appreciate you taking the time to write it down for reference.
