We’ll start with a puzzle, taken from Hofstader’s *Godel, Escher, Bach*. In this puzzle, we are going to learn to speak in a restricted language. We will be talking about integers, so our variables will always stand for integers. We have available to us the ordinary arithmetic operations , , , and the relations , , , , , . We have the logical connectives: "AND", "OR", "NOT", "IF … THEN" and the quantifiers "THERE EXISTS" and "FOR ALL".

So we can say " is prime", because we just say "There do not exist and , both , with ." Or we can say “Every nonnegative integer is a sum of four squares”; we just say “If then there exist , , and such that .” The technical phrase here is that we are speaking the first order language of integers.

**Puzzle:** Within this limited vocabulary, express the statement .

Why is this puzzle important? In Godel’s proof of the undecidability of Peano Arithmetic, he took theorems and proofs and encoded them as integers. He then needed to express “ is a proof of ” as a sentence in the vocabulary of first order logic. When you think about how challenging the above puzzle is, and how much more complicated “ is a proof of ” is than ““, this should astonish you.

There are a lot of great popular introductions to the overall structure of Godel’s proof. Most of them gloss over HOW Godel solved this technical problem. My goal in this post is to discuss precisely this point, and ignore the larger questions. I will take the view that expressing things in the first order language of arithmetic is a worthwhile goal in its own right, and will show you how surprisingly strong this language is.

Below the fold, I’ll solve the puzzle. Then I’ll show how to express some more serious ideas.

First, a basic construction:

By ““, we mean “ and there exists such that .”

So, when we write , this is a statement of first order logic.

Now, without further ado, the solution:

There exist , and such that

- .
- For all , there exist and such that , and .
- .

I urge you to stare at this a lot, and convince yourself that it works. For any finite sequence ; we can find such that .

This is the trick everything else will be based on. By giving only three integers , we have managed to encode an entire sequence of integers. Obviously, we could modify this trick to say “” or “ is the -th Fibonacci number.”

The next statement says that the recursion, starting at , terminates:

There exist nonnegative integers , , and such that

- .
- For all , there exist and such that , , if is even then and, if is odd, then .
- .

Now, we are encoding a list of length by giving a quadruple of positive integers. It will be convenient to have a way to encode a list by just one integer. That’s easy enough. Just take the previous example and change it to read

There is a nonnegative integer such that there exist nonnegative integers , , and with and

Of course, I use binomial coefficients as a convenient shorthand. The right hand side is some polynomial in , which we could write out in full if we wanted.

The point is, given any nonnegative integer , there is exactly one way to express it as this sum of binomial coefficients. So we can encode four integers in one. And those four integers, in turn, can encode a list of finite length.

We’re going to be using this idea a lot, so let’s create notation. will mean “the -th term of the list encoded by ” and will be “the length of the list encoded by .” I could always unravel this to talk about and if I wanted to. (But I don’t!)

Here’s a nice example:

If is prime, then there is some with such that

- and .
- For to , , and, for to , .
- For , there is a such that .

Do you see what’s going on here? is a list of lists, encoding the first rows of Pascal’s triangle. This states the (true) theorem that is divisible by .

The ability to talk about lists of lists is incredibly powerful. It can also be very confusing. In one of Douglass Hofstadter’s essays, he says something like: “Godel’s paper isn’t a mathematical paper at all; it’s a LISP compiler!” And, indeed, most of the length of the paper comes down to saying how to convert very simple statements about lists into very long statements about arithmetic relations between integers.

In this analogy, the first order language of arithmetic is assembly language. What is *C*? This is the first order language of hereditarily finite lists of integers. Our variables will denote either integers or finite lists whose elements are either integers or finite lists whose elements are either integers or finite lists , so something like

The important point is that we do **not** have infinite branching or infinite nesting.

We will, again, have all the logical symbols "AND", "OR", "NOT", "IF … THEN" and the quantifiers "THERE EXISTS" and "FOR ALL". We will have the statements “is an integer” and “is a list”. For integers, we will have all the arithmetic operations and relations we had before. We also have basic operations on lists. For example, if is a list and an integer, we have the operations and .

The next few screens will address some technical points. To keep you interested, here is another puzzle:

**Puzzle 2:** Let be the function which takes a hereditarily finite list of integers and returns the list where every integer, at any level of nesting, has been doubled. Express the sentence in the first order language of hereditarily finite sets of integers.

There is one very technical point to warn you about. Since the language of arithmetic only has integers, we have to use integers both to encode lists and to encode integers. We can solve this in many ways; a traditional solution is to use odd integers for lists and even integers for actual integers. So the sentence

in the language of hereditarily finite lists of integers.

has to be compiled to

in the language of integers.

I will completely hide this issue from you in the future, but the first book I read about this didn’t point it out, and it freaked me out for a while when I realized the issue existed.

Hereditarily finite lists of integers are a vastly nicer language than integers. There is one immediate problem. Depending on exactly how we set up our encoding of lists by integers, we might be able to encode the list that contains itself. We could probably dodge that by choosing our encoding right, but it’s easier to just write down a statement that says that is hereditarily finite. Here’s a solution; it is worth reading carefully:

There is a list such that and, for between and , either

- is an integer

or- is a list and, for between and , there is a , with , such that .

Do you see how this works? Every subobject which occurs anywhere in , at any depth of nesting, must occur as an object of . As an imperfect analogy, you can think of this as like the way that very complicated data structures are flattened into a linear computer memory.

We can use this trick to define recursive functions *on hereditarily finite lists*, the same was as we used lists to define recursive functions on numbers. For example, we now give the solution to **puzzle 2**.

There are two lists and , with , with and and such that,

for between and , either

- and are integers with

or- and are lists of the same length and, for between and , there is a such that and .

Again, I encourage you to stare at this until you see that it says what it is supposed to.

So we can do recursion! We are now something like ‘s of the way to LISP.

There is an important point which I’ve glossed over. We can have two different integers encode the same list. All the ~~code~~ first order sentences which I have written so far will work just fine if you read the symbol to mean “are the same actual integer, in the low level encoding.” But we also do need a way to say that two lists are equal, however they are encoded. To solve this, take the block above and just remove the multiplication by . We’ll write for this more general notion of equality.

At this point, you should be able to write Godel’s proof. Assign numbers to characters like , , and so forth. Figure out how to say things like “ is a grammatical sentence, containing the unbound variable .” Keep pushing and, now that we have a decent language for handling lists and recursion, it is not that hard to write sentences that say we have a valid proof.

I’m going to go in a different direction, and explain why I started writing this post. Over at Math Overflow, I asked the question “What is induction up to ?” I didn’t realize it at the time, but I was basically asking someone to write me a multi-screen blogpost. So, now that I understand, I have written that post. Let me explain how to encode .

I’ll say that

if one of the following is true:

- and are integers with
- is an integer and a list
- and are lists and either

- There is some , with such that

- , for
- and
or

- and
- for .

This gives us a total ordering on all hereditarily finite lists of integers. (Thanks to Allison Miller for pointing out that my previous definition didn’t work.) Roughly, we do a depth first search of and , looking for where they first differ. We rank large integers above small integers, and lists above integers.

Exercise for the reader: convince yourself that the relationship really can be written in the first order language of hereditarily finite lists of integers. This will get you a lot of practice in unfolding recursions.

Now that we can compare lists, we can sort lists of lists.

is sorted if either

- is an integer or
- is a list so that

- for and
- is sorted for .

**SOME PREVIOUS ERRORS ARE NOW FIXED** Define a pure list to be a list whose elements are either pure lists or the empty list. (Details are left to you at this point.) So, something like .

Now, I can define . The ordinal is the set of all pure sorted lists, under the relation .

Here are the first few elements of , along with their more standard notations:

I think you can see why no one wanted to write out a complete answer to “How do you encode in the first order language of integers?” on Math Overflow!

This was a lovely article, and much better than I could have written, so I only feel (slightly) less guilty now.

I’m glad that someone’s written all this, and in a much clearer way than I probably would have been able to! When you mentioned Puzzle 2 I was momentarily freaked out by the extreme non-uniqueness of the “function” you wanted us to define, until you gave the clarification a moment later about how this had freaked you out too.

I normally do Gödel’s theorems in a language that has exponentiation built in, and I always remember that there’s some trick using the Chinese Remainder Theorem that lets you encode lists even without exponentiation, but can never quite remember how it goes – it’s nice to see that here.

Seeing the connection to is a nice touch too – I’ve heard of “proof-theoretic ordinals” that encode the strength of logical systems, and it looks like this shows how to go about doing some of that evaluation. The idea is basically that for any axiom system for arithmetic, consider every ordering of the integers that can be proven with these axioms to be a well-ordering. The first ordinal that isn’t represented by such an ordering is the proof-theoretic ordinal for that system. As I understand it, proving Goodstein’s Theorem (and everything else that depends on “induction up to “) requires not just the ability to give the definition that you give, but also the ability to prove that this ordering is a well-ordering.

Unfortunately, I don’t know quite how you even express that claim in this language, let alone prove it.

As I understand it, you express this claim the same way you do in the ordinary integers: There is a theorem schema, with one theorem for every property of a list. It says

“If, for every list , the truth of for all implies , then holds for all lists .”

The statement that we can’t prove induction up to is the statement that there are some for which we can’t prove this.

I thought about putting this in the post, but decided that axiom schemata were one concept too many. It makes a good problem though: given the axiom schema that the ordinary integers are well ordered, deduce the theorem schema which says that is well ordered.

This is a really nice exposition, on a topic that I’ve been curious about for a while. (I know I’ve been spending too much time reading MathOverflow when I instinctively start looking for the “upvote” button…)

However, I don’t believe that the relation that you’ve defined is currently a total ordering: in the case where and neither of the two subconditions are met, the lists and are incomparable. All this stuff is currently making my head spin a bit, so I’m not sure what the right fix is for this.

Grrr. You’re right. I’ll try to fix this later today. Of course, if this were MO, some 3000+ user could come by and fix it for me.

OK, I think the definition of works now. I also, reluctantly, decided I was being too clever at the end. I wanted to use lists of integers, rather than just lists of lists, for three reasons (1) It is much easier to see how to code natural number theoretic constructions as lists of numbers (2) it makes examples more readable and (3) I have an emotional attachment to ur-elements.

So I tried to modify the standard definition of to use lists of numbers. In the end, though, I’m not sure I got it right, so I reverted to the pure lists for .

The encoding is meant to be by the following recipe:

means , and an empty sum is .

Did I get it right this time?

OK – it’s a little unsatisfying to prove well-ordering as a schema, but I guess it’s what we have to do if we don’t allow variables of set type.

Anyway, it looks like that new definition is right. It’s a nice elegant way of encoding things in Cantor Normal Form. The fact that every ordinal can be written uniquely in Cantor Normal Form, and the definition that is the smallest solution to seems to me to show that this representation of ordinals is correct. (It’s trivial to show that 1 can be represented, that the finite sum of any sequence of representable ordinals can be represented, and can be represented for any representable . Thus, the least non-representable ordinal must be a solution to .)

About the 10^n puzzle: that was definitely the stumper out of the exercises from GEB! I am curious though about the part you leave to the reader to convince themselves… I believe that if you take (a, a+M, a+2M, …, a+yM) to be a list of suitably large primes then it follows using the Chinese Remainder Theorem. But Green-Tao’s theorem (there exist arbitrarily long arithmetic sequences of primes) was not proven at time of writing the book.

About 5 years ago I read a book on the solution of Hilbert’s 10th problem and they give one (very complex) solution.

What is the simplest solution you know… is Green-Tao unnecessary?

They don’t have to be primes; they just have to be relatively prime and sufficiently large. Consider , for some large .

By the way, this is not the solution to Hilbert’s 10th problem. Hilbert’s 10th is much harder, because you are only allowed to use Diophantine equations, which are more restrictive than first order number theory sentences. For example, if you unpack the definition of , our solution to the power of 10 problem says in part

“There exists , and such that, for all , there exists a such that …”

You can’t use “for all” in a Diophantine equation! There is an easy trick to get rid of it when it is the inner most quantifier, but that “there exists a ” blocks your way. This is why the first order theory of is known to be undecidable, but the decidability of Diophantine equations over is open.

Awesome, thanks!

Kenny, this doesn’t state that ordinals are well-founded. We only state (via a schema) induction for ordinals. There is no way to even state anything is well founded in a first order way, even < on N. Indeed there are always non-standard models where these relations are not well-founded, but induction for first-order statements over these relations still hold in these non-standard models.

At least this is my understanding.

Russel: You’re saying “well founded” where I would say “well ordered”. Is there a reason?

David: Feel free to substitute well-ordered with well-founded in my statement. It shouldn’t be a issue since the orders in question are presumably provably total. The well-founded part of the definition of well-ordered is the problematic part.

Very nice post!

Here’s a typo when you talk about encoding a list by one non-negative integer: since , , , are non-negative, I think you meant

(also in the line above that the variable is missing)

Thanks for the corrections!

Hi, David. Wonderful post! You might be interested in how ordinals are encoded in the ACL2 (`a computational logic for applicative common lisp’) theorem prover. Two good links: http://www.cs.utexas.edu/users/moore/acl2/v3-3/O-P.html and http://www.cs.utexas.edu/users/moore/acl2/v3-3/ORDINALS.html .