# Christol’s theorem and the Cartier operator

Let’s suppose that we want to compute $2^n \mod p$, and we have already been given $n$ written out in base $p$ as $n = \sum n_i p^i$. Here $p$ is a small prime and we want to do this conversation repeatedly for many $n$‘s.

Remember that $2^p \equiv 2 \mod p$ and thus $2^n \equiv 2^{n_0+n_1 p + n_2 p^2 + \cdots + n_{\ell} p^{\ell}} \equiv 2^{n_0} 2^{n_1} \cdots 2^{n_{\ell}} \mod p$. So, start with $1$, multiply it by $2^{n_0}$, then by $2^{n_1}$, then by $2^{n_2}$ and so forth. When you get to the end, read off $2^n$.

We can precompute the effect on $\mathbb{F}_p$ of multiplying by $2^k$, for $0 \leq k \leq p-1$. Then we can compute $2^n$ just by scanning across the base $p$ representation of $n$ and applying these precomputed maps to the finite set $\mathbb{F}_p$.

The precise way to say that this is a simple, one-pass, process is that it is a computation which can be done by a finite-state automaton. Here is the definition: let $I$, $S$ and $O$ be finite sets (input, states and output), and $s \in S$ (the start). For each $i \in I$, let $A(i)$ be a map $S \to S$. We also have a map $r:S \to O$ (readout). Given a string $(i_0, i_1, \ldots, i_{\ell})$ in $I^{\ell}$, we compute $r( A(i_{\ell}) \circ \cdots A(i_1) \circ A(i_0) (s))$. So our input is a string of characters from $I$, and our output is in $O$.

We can say that our example above shows that $(n_{\ell}, \ldots, n_1, n_0) \mapsto 2^{\sum n_i p^i} \mod p$ is computable by a finite-state automaton. (In our example, the sets $I$, $S$ and $O$ all have cardinality $p$, but I do not want to identify them.)

This is a special case of an amazing result of Christol et al: Let $f_n$ be a sequence of elements of $\mathbb{F}_p$. Then $(n_{\ell}, \ldots, n_1, n_0) \mapsto f_{\sum n_i p^i}$ can be computed by a finite-state automaton if and only if the generating function $\sum f_n x^n$ is algebraic over $\mathbb{F}_p(x)$!

We have just explained the case $\sum f_n x^n = 1/(1-2x)$. The reader might enjoy working out the cases $1/(1-x-x^2)$ (the Fibonacci numbers) and $(1 - \sqrt{1-4x})/2x$ (the Catalans).

In this post, I will use Christol’s theorem as an excuse to promote the Cartier operator, an amazing tool for working with differential forms in characteristic $p$.

# The technical part of Godel’s proof

We’ll start with a puzzle, taken from Hofstader’s Godel, Escher, Bach. In this puzzle, we are going to learn to speak in a restricted language. We will be talking about integers, so our variables will always stand for integers. We have available to us the ordinary arithmetic operations $+$, $-$, $\times$, and the relations $=$, $\neq$, $<$, $\leq$, $>$, $\geq$. We have the logical connectives: "AND", "OR", "NOT", "IF … THEN" and the quantifiers "THERE EXISTS" and "FOR ALL".

So we can say "$p$ is prime", because we just say "There do not exist $x$ and $y$, both $\geq 2$, with $p=xy$." Or we can say “Every nonnegative integer is a sum of four squares”; we just say “If $n \geq 0$ then there exist $a$, $b$, $c$ and $d$ such that $a^2+b^2+c^2+d^2=n$.” The technical phrase here is that we are speaking the first order language of integers.

Puzzle: Within this limited vocabulary, express the statement $x=10^y$.

Why is this puzzle important? In Godel’s proof of the undecidability of Peano Arithmetic, he took theorems and proofs and encoded them as integers. He then needed to express “$P$ is a proof of $T$” as a sentence in the vocabulary of first order logic. When you think about how challenging the above puzzle is, and how much more complicated “$P$ is a proof of $T$” is than “$x=10^y$“, this should astonish you.

There are a lot of great popular introductions to the overall structure of Godel’s proof. Most of them gloss over HOW Godel solved this technical problem. My goal in this post is to discuss precisely this point, and ignore the larger questions. I will take the view that expressing things in the first order language of arithmetic is a worthwhile goal in its own right, and will show you how surprisingly strong this language is.

Below the fold, I’ll solve the puzzle. Then I’ll show how to express some more serious ideas.

# The easy parts of the prime number theorem

The prime number theorem states that the number of primes less than $x$ is approximately $\int_{t \leq x} dt/\log t$. Proofs of the PNT tend to be easy to follow step by step, but I can never remember them once I step away from the page. I think I finally got the outline of the standard analytic proof to stick, so I’m going to record it here before I lose it:

1. Play around with $\zeta(s)$.

Define the zeta function by

$\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$.

Notice that $\zeta(s) = \int_{t=1}^{\infty} \frac{dt}{t^s} + O(1)=1/(s-1) + O(1)$ as $s \to 1$. We have Euler’s factorization:

$\zeta(s) = \prod_{p=1}^{\infty} \frac{1}{1-p^{-s}}$

so

$-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{(\log p) p^{-s}}{1- p^{-s}} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$.

Here $\Lambda(n)$ is the Von Mangoldt function: $\Lambda(n) = \log p$ if $n=p^k$ and $\Lambda(n)=0$ otherwise. Since $\zeta(s)$ has a first order pole at $1$, we have $- \zeta'(s)/\zeta(s) = 1/(s-1) + O(1)$. So

$\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} + O(1)$.

With a little care, we can see that the $O(1)$ is an analytic function of $s$. So

$\lim_{s \to 1} \sum_{n=1}^{\infty} \frac{\lambda(n)-1}{n^s}$ exists.

2. Take the limit as $s$ goes to $1$

Use the limit above to deduce that

$\sum \frac{\Lambda(n)-1}{n}$ converges.

This is in red because it is really hard. Maybe I’ll say something about it later. But, intuitively, it makes sense.

Added in response to a comment of E. Kowalski As I discuss a little below, the fear is that $\Lambda(n)-1$ might behave like $\sin(r \log n)$, for some $r$. In this case, we could not take the limit. If this exact problem occurred, then $-\zeta'(s)/\zeta(s)+\zeta(s)$ would have a pole at $1+ir$. A very clever trick shows that this function has no poles with real part $1$. The challenge is to show that the absence of poles means that the limit is permissible.

3. Add up the partial sums

Fix $\epsilon > 0$. So there is an $M$ such that, for $M < w,x$, we have $\left| \sum_{n=w}^x \frac{\Lambda(n)-1}{n} \right| < \epsilon$. Also, there must be some absolute bound $C$ for any sum of the form $\sum_{n=w}^x \frac{\Lambda(n)-1}{n}$.

So

$\left| \sum_{n=1}^x (\Lambda(n)-1) \right| = \left| \sum_{w=1}^x \sum_{n=w}^x \frac{\Lambda(n)-1}{n} \right| < CM + \epsilon x$.

Since $\epsilon$ is arbitrary, this shows that $\sum_{n=1}^x (\Lambda(n)-1) = o(x)$. You are more likely to see this quoted as $\sum_{n=1}^x \Lambda(n) =x +o(x)$.

4. Clean up

Standard techniques turn $\sum_{n=1}^x (\Lambda(n)-1) = o(x)$ into $\sum_{n=1}^x (\Lambda(n)/\log n-1/\log n) = o(x/\log x)$. In other words,
$\pi(x) + (1/2) \pi(x^{1/2}) + \cdots = \sum_{n=1}^x (1/\log n) + o(x/\log x)$.
The terms $\pi(x^{1/k})$ are negligible for $k>1$.

A few notes below the fold:

# A (partial) explanation of the fundamental lemma and Ngo’s proof

I would like to take Ben up on his challenge (especially since he seems to have solved the problem that I’ve been working on for the past four years) and try to explain something about the Fundamental Lemma and Ngo’s proof.  In doing so, I am aided by a two expository talks I’ve been to on the subject — by Laumon last year and by Arthur this week.

Before I begin, I should say that I am not an expert in this subject, so please don’t take what I write here too seriously and feel free to correct me in the comments.  Fortunately for me, even though the Fundamental Lemma is a statement about p-adic harmonic analysis, its proof involves objects that are much more familiar to me (and to Ben).  As we shall see, it involves understanding the summands occurring in a particular application of the decomposition theorem in perverse sheaves and then applying trace of Frobenius (stay tuned until the end for that!).

First of all I should begin with the notion of “endoscopy”.  Let $G, G'$ be two reductive groups and let $\hat{G}, \hat{G}'$ be there Langlands duals.  Then $G'$ is called an endoscopic group for $G$ if $\hat{G}'$ is the fixed point subgroup of an automorphism of $\hat{G}$.  A good example of this is to take $G = GL_{2n}$, $G' = SO_{2n+1}$.  At first glance these groups having nothing to do with each other, but you can see they are endoscopic since their dual groups are $GL_{2n}$ and $Sp_{2n}$ and we have $Sp_{2n} \hookrightarrow GL_{2n}$.

As part of a more general conjecture called Langlands functoriality, we would like to relate the automorphic representations of $G$ to the automorphic representations of all possible endoscopic groups $G'$.  Ngo’s proof of the Fundamental Lemma completes the proof of this relationship.

# Topology that Algebra can’t see

Let $X$ be an algebraic variety over $\mathbb{C}$; that is to say, the zero locus of a bunch of polynomials with complex coefficients. We will consider this zero locus as a topological space using the ordinary topology on $\mathbb{C}$. One of the main themes of algebraic geometry in the last century has been learning how to study the topology of $X$ in terms of the algebraic properties of the defining equations.

In this post, I will explain that there are intrinsic limits to this approach; things that cannot be computed algebraically. In particular, I want to explain how from a categorical point of view, we can’t even compute the homology $H_1( \ , \mathbb{Z})$. And, even if you don’t believe in categories, you’ll still have to concede that we can’t compute $\pi_1( \ )$. This is a very pretty example and it should be more widely known.

Absolutely none of the ideas in this post are original; I think most of them are due to Serre. (Thanks to Attila Smith in comments for the reference.)

# Continued Fractions and Hyperelliptic Curves

I recently read a charming little paper: Quasi-elliptic integrals and periodic continued fractions, by van der Poorten and Tran. Most of us who have taken a number theory course of some kind learned how to solve Pell’s equation: $x^2 - D y^2 =1$ where $D$ is a nonsquare positive integer. The usual method is to compute the continued fraction
$\displaystyle{\sqrt{D} = a_0 + \frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{\cdots}}}}$.
One then defines the convergents of $\sqrt{D}$ by
$\displaystyle{x_0/y_0 = a_0}$
$\displaystyle{x_1/y_1 = a_0 + \frac{1}{a_1}}$
$\displaystyle{x_2/y_2 = a_0 + \frac{1}{a_1+\frac{1}{a_2}}}$ etcetera.

Then $x_i^2 - D y_i^2$ tends to be very small and, if you compute long enough, for some $i$ you will have $x_i^2 - D y_i^2=1$.

What van der Poorten and Tran do is to ask what happens if $D$ is not an integer, but a polynomial $D(t) = t^{2g+2} + d_{2g+1} t^{2g+1} + \cdots + d_1 t + d_0$. Before I get into details, I want to tell you about something gorgeous that I won’t explain at all. Using the methods in their paper, van der Poorten and Trap can discover identities like
$\displaystyle{ \int \frac{3 x dx}{\sqrt{x^4+2x}} = \log \left( x^3+1+x \sqrt{x^4+2x} \right)}.$
Isn’t that pretty?

It turns out that the continued fraction algorithm for $\sqrt{D(t)}$ is actually much prettier than for integers. Everything should be understood in terms of the curve $C$ cut out by $y^2 = D(t)$. This is a curve of genus $g$, with two points at infinity. (One of these points is the limit of $(t, \sqrt{D(t)})$ and the other is the limit of $(t, -\sqrt{D(t)})$.) I’ll call these two points $\infty_{+}$ and $\infty_{-}$. The theory is controlled by the line bundles $\mathcal{O}(k \infty_+ + \ell \infty_-)$. In particular, there are nontrivial solutions to $x(t)^2 - D(t) y(t)^2 =1$ if and only if the continued fraction is periodic, if and only if $\mathcal{O}(k \infty_+) = \mathcal{O}(k \infty_-)$ for some $a >0$.

Below the fold, I’ll explain what is meant by the continued fraction algorithm for an algebraic function, and tell you some of the other nice results from the paper.

# Rabinoff on Witt Vectors

I have sometimes thought of writing a post on the ring of Witt vectors. But now I see that there is no need, because Joe Rabinoff has written a superb guide. Every time that I thought “this is pretty good, but it would be clearer if he pointed out $X$“, the next paragraph was an explanation of $X$!

There is one little thing that I could think to add, so I’ll do that here. I think you will get the most out of this paper if you approach each result as an exercise and try to give your own proof. However, right near the beginning is a theorem — Theorem 1.2, part 2 — which is too hard to be an exercise and where failure will be frustrating rather than illuminating. So I’m going to give you a hint.

Here is the result: Let $K$ be a ring of characteristic $p$ for which the map $x \mapsto x^p$ is bijective. Let $R$ be a ring in which $p$ is not a zero divisor, which is complete and Hausdorff in the $p$adic topology*, and such that $R/pR \cong K$. Theorem/Exercise There is a unique lift $\tau: K \to R$ such that $\tau(x)$ is congruent to $x$ modulo $p$ and such that $\tau(x) \tau(y) = \tau(xy)$.

Here is the hint: $\tau(x)$ can be characterized by the fact that it is the unique lift of $x$ such that $\tau(x)^{1/p^i}$ exists for every $i$.

That’s the only improvement I have. Go and enjoy!

* If the statement that $R$ is complete and Hausdorff is weird for you, here is a restatement: The Hausdorff condition says that, for $x \in R$, if $p^i$ divides $x$ for every $i$, then $x=0$. The complete condition says this: Suppose we have a sequence $x_n$ in $R$ such that, for any $i$, the images of $x_n$ in $R/p^i$ are eventually constant. Then there is an element $x \in R$ such that, for every $i$ the image of $x$ in $R/p^i$ assumes the above-mentioned constant value.

I would advise, however, that you learn to think about these conditions topologically before attempting the Theorem/Exercise.

# Generalized moonshine I: Genus zero functions

This is a plug for my first arXiv preprint, 0812.3440. It didn’t really exist as an independent entity until about a month ago, when I got a little frustrated writing a larger paper and decided to package some results separately. It is the first in a series of n (where n is about five right now), attacking the generalized moonshine conjecture. Perhaps the most significant result is that nontrivial replicable functions of finite order with algebraic integer coefficients are genus zero modular functions. This answers a question that has been floating around the moonshine community for about 30 years.

Moonshine originated in the 1970s, when some mathematicians noticed apparent numerical coincidences between the theory of modular functions and the theory of finite simple groups. Most notable was McKay’s observation that 196884=196883+1, where the number on the left is the first nontrivial Fourier coefficient of the modular function j, which classifies complex elliptic curves, and the numbers on the right are the dimensions of the smallest irreducible representations of the largest sporadic finite simple group, called the monster. Modular functions and finite group theory were two areas of mathematics that were not previously thought to be deeply related, so this came as a bit of a surprise. Conway and Norton encoded the above equation together with other calculations by Thompson and themselves in the Monstrous Moonshine Conjecture, which was proved by Borcherds around 1992.

I was curious about the use of the word “moonshine” here, so I looked it up in the Oxford English Dictionary. There are essentially four definitions:

1. Light from the moon, presumably reflected from the sun (1425)
2. Appearance without substance, foolish talk (1468 – originally “moonshine in the water”)
3. A base of rosewater and sugar, or a sweet pudding (1558 cookbook!)
4. Smuggled or illegally distilled alcoholic liquor (1782)

The fourth and most recent definition seems to be the most commonly used among people I know. The second definition is what gets applied to the monster, and as far as I can tell, its use is confined to English people over 60. It seems to be most popularly known among scientists through a quote by Rutherford concerning the viability of atomic power.

I’ll give a brief explanation of monstrous moonshine, generalized moonshine, and my paper below the fold. There is a question at the bottom, so if you get tired, you should skip to that.

# More F_un

Incidentally, I hope you’ve all been reading F_un mathematics. Even if you aren’t all that interested in the field with one element, it’s a beautifully designed site and might give you some ideas about pushing Web 2.0 in mathematics a bit further than just blogs. While I like our blog, with all its messy diversity (as my collaborators can tell you, messy diversity is a core component of my mathematical style), F_un mathematics has a much more organized focused feel, which I think maybe more promising for getting actual mathematics done. I also think the division of the posts into “outreach,” “undergraduate,” “graduate,” and “research” has some interesting potential and sort of makes me feel like we should be doing a better job of indicating the background level for our post.

# Bleg: testing algebraic integrality by computer.

Update 2: we’ve found a nice answer to our question. Maybe it will appear in the comments soon. –Scott M

Scott, Emily, and I have an ongoing project optimistically called “The Atlas of subfactors.” In the long run we’re hoping to have a site like Dror Bar-Natan and Scott’s Knot atlas with information about subfactors of small index and small fusion categories. In the short run we’re trying to automate known tests for eliminating possible fusion graphs for subfactors.

Right now we’re running into a computational bottleneck: given a number that is a ratio of two algebraic integers how can you quickly test whether it is an algebraic integer? Mathematica’s function AlgebraicIntegerQ is horribly slow, and we’re not sure if that’s because it’s poorly implemented or whether the problem is difficult. So, anyone have a good suggestion? After the jump I’ll explain what this question has to do with tensor categories (and hence subfactors which correspond to bi-oidal categories as I’ve discussed before).

To whet your appetite, here’s an example. Is $a/b$, where

$a=-293 \lambda^{11}+4624 \lambda^9-23668 \lambda^7+50302 \lambda^5-44616\lambda^3+14017 \lambda$

$b=131\lambda^{10} - 2033 \lambda^8 + 9974\lambda^6-18951\lambda^4+12233 \lambda^2-1475$

and where $\lambda$ is the largest real root of

$1 - 58 x^2 + 175 x^4 - 186 x^6 + 84 x^8 - 16 x^{10} + x^{12},$

an algebraic integer? Mathematica running on Scott’s computer (using the builtin function AlgebraicIntegerQ) takes more than 5 minutes to decide that it is.

Update: Thanks to David Savitt for pointing out that both this example and an earlier one are answered instantly by MAGMA. Blegging is already working. But what’s the trick? Is it something we can teach Mathematica quickly? –Scott M