There are a number of sums over primes whose asymptotic behavior is easy to determine. For example, we know that
$\log n! = \sum_p \sum_k \lfloor \frac{n}{p^k} \rfloor \log p$
and also $\log n! = n \log n - n + O(\log n)$. A bit of rearrangement and some elementary bounds yields
$\sum_{p \leq n} \frac{n}{p} \log p = n \log n + O(n)$
so
$\sum_{p \leq n} \frac{\log p}{p} = \log n + O(1). \quad (1)$

On the other hand, the estimate
$\sum_{p \leq n} 1 = \frac{n}{\log n} + O\left( \frac{n}{(\log n)^2} \right) \quad (2)$
is the Prime Number Theorem, whose proof was a major triumph of nineteenth century mathematics.

If you spend a lot of time reading about the Prime Number Theorem, you’ll see a lot of messing around with sums over primes, and it isn’t clear which ones are elementary and which ones are deep. This post is about a heuristic I found a while ago to separate the two: Would the given asymptotic still hold if primes hated to have first digit nine?

Acknowledgements: Much of this appeared earlier in an answer I left on MO. There is also a lot of similarity to Terry Tao’s post on conspiracies and the prime number theorem. One can think of “primes hate to start with nine” as a “conspiracy” whose consequences I find easy to work with.

## Seeing that (1) will not prove PNT

To be more precise, suppose that

$\bullet$ There are no primes between $9 \times 10^k$ and $10^{k+1}$, for $k$ large.

$\bullet$ There are $\approx \frac{\log 10}{\log 9} \int_{a}^{b} \frac{dt}{\log t}$ primes between $a$ and $b$ for $10^k \leq a < b \leq 9 \times 10^k$ (and $a$ and $b$ not too close together).

This would not be consistent with the Prime Number Theorem. PNT gives
$\pi(9 \times 10^k) \approx \frac{9 \times 10^k}{\log (9 \times 10^k)} = \frac{9 \times 10^k}{k \log 10 + O(1)}$
and
$\pi(10 \times 10^k) = \frac{10 \times 10^k}{k \log 10 + O(1)}$
so
$\pi(10 \times 10^k ) - \pi(9 \times 10^k) \approx \frac{10^k}{k \log 10}$
which goes to $\infty$ as $k \to \infty$.

So the proposed conspiracy is very inconsistent with the PNT, and any asymptotic property of primes which is strong enough to imply PNT should violate the conspiracy. But $(1)$ is perfectly compatible with this conspiracy.

Let's see that the conspiracy is consistent with (1): For $n$ of the form $9 \times 10^k$, the sum $\sum_{p \leq n} \log p/p$ would be
$\displaystyle{ \sum_{m=1}^k \sum_{10^m \leq p \leq 9 \times 10^m} \frac{\log p}{p} \approx \sum_{m=1}^k \int_{x=10^m}^{9 \times 10^m} \frac{\log 10}{\log 9} \frac{\log x}{x} \frac{dx}{\log x}=}$
$\displaystyle{ \sum_{m=1}^k \frac{\log 10}{\log 9} \left( \log (9 \times 10^m ) - \log (10^m) \right) = k \log 10. }$
For $n = 10^{k+1}$, the sum $\sum_{p \leq n} \log p/p$ would be equal to the same quantity, because there are no additional terms between $9 \times 10^k$ and $10^{k+1}$, so we also have
$\sum_{p \leq 10^{k+1}} \log p/p \approx k \log 10$. So $\sum_{p \leq 9 \times 10^{k}} \log p/p - \log( 9 \times 10^k) \approx - \log 9$ and $\sum_{p \leq \times 10^{k+1}} \log p/p - \log(10^{k+1}) \approx - \log 10$. Actually, if you are careful about the bounds, you’ll see that the right hand sides should be $c - \log 9$ and $c-\log 10$, where $c$ is a constant that has to do with how large the contributions from small primes are. So this conspiracy would imply $\sum_{p \leq n} \log p/p = \log n + O(1)$ where the $O(1)$ term oscillates between $c - \log 9$ and $c - \log 10$.

So $(1)$ should not be powerful enough to rule out the possibility that primes hate to start with $9$, and therefore should not be good enough to prove PNT.

On the other hand, if we had a proof that $\sum_{p \leq n} \log p/p = \log n + C + o(1)$, that would NOT be compatible with the possibility that primes hate to start with $9$, so it might be good enough to imply PNT and, indeed, it is.

To point out that people really do ask questions where this heuristic helps, see this math.SE question.

## Connection to the zeroes of the zeta function

As you’ve probably heard, the key to most proofs of the PNT is to show that $\zeta(1+it) \neq 0$ for any real $t$. Let’s see how this is related to the possibility of $9$-avoidance. Remember that we have $\zeta(s) = \prod_p (1-p^{-s})^{-1}$ so $\log \zeta(s) = - \sum_p \log (1-p^{-s}) \approx \sum_p p^{-s}$. So $\Re \log \zeta(1+it) \approx \Re \sum_p p^{-1} p^{-it} = \sum_p p^{-1} \cos (t \log p)$. Now, if $\zeta(1+it)$ were $0$, then would would expect $\Re \log \zeta(1+it)$ to be $- \infty$. (In practice, one usually prefers to work with $\zeta'(s)/\zeta(s)$ rather than $\log \zeta(s)$, but either will work for the current discussion.)

Looking at the sum $\sum_p p^{-1} \cos (t \log p)$, we expect it to be absolutely divergent, because $\sum_{p} p^{-1}$ diverges, and there is no reason to expect the cosine to be particularly small. On the other hand, the cosine will have both positive and negative signs, so as long as both signs occur about equally often, the sum probably will converge conditionally. The challenge of proving PNT is showing that there is enough cancellation of signs that we do indeed get conditional convergence.

Suppose that primes hated to start with $9$, and consider $t=2 \pi / \log 10$. Then $t \log p$ never lies between $2 \pi (k+\log 9/\log 10)$ and $2 \pi (k+1)$ for any sufficiently large integer $k$. Since $\cos$ is positive on such an interval, a whole lot of positive terms are missing from the sum $\sum_p p^{-1} \cos (t \log p)$.

So one can imagine that, if primes hated to start with $9$, then $\zeta(1+it)$ would vanish at $t = 2 \pi / \log 10$. This argument isn’t rigorous (one could imagine the primes had other peculiar properties which made the sum converge after all) but it starts to make it clear why the $\zeta$ function comes up. Of course, if you want to actually read a rigorous proof of the PNT, their are plenty of good ones available, and you’ll see that they work with $\zeta(1+it)$ from the start, without going through any thoughts about writing primes in base ten.

Of course, $\zeta$ does have zeroes, they just lie on $1/2+it$ rather than $1+it$. The first such zero is at $1/2+i \rho$ where $\rho \approx 14.134725$. Speaking vaguely, this means that $\cos (\rho \log p)$ should have a bias towards having one sign over another, although this preference only creates an imbalance of size $\approx \sqrt{\pi(N)}$ for primes near $N$. In other words, primes do prefer to have certain “leading digits” in “base” $e^{2 \pi/\rho} \approx 1.55974$.

In the following graphic, the blue points show how $\rho \log p/(2 \pi) \mod \mathbb{Z}$ prefers to avoid the left and right end points of the interval, so the $\cos$ tends to be negative. The red points show an analogous chart computed with $17$ in place of $\rho$, where no such trend appears. See this post for details of what I am plotting.