The technical part of Godel’s proof

We’ll start with a puzzle, taken from Hofstader’s Godel, Escher, Bach. In this puzzle, we are going to learn to speak in a restricted language. We will be talking about integers, so our variables will always stand for integers. We have available to us the ordinary arithmetic operations +, -, \times, and the relations =, \neq, <, \leq, >, \geq. We have the logical connectives: "AND", "OR", "NOT", "IF … THEN" and the quantifiers "THERE EXISTS" and "FOR ALL".

So we can say "p is prime", because we just say "There do not exist x and y, both \geq 2, with p=xy." Or we can say “Every nonnegative integer is a sum of four squares”; we just say “If n \geq 0 then there exist a, b, c and d such that a^2+b^2+c^2+d^2=n.” The technical phrase here is that we are speaking the first order language of integers.

Puzzle: Within this limited vocabulary, express the statement x=10^y.

Why is this puzzle important? In Godel’s proof of the undecidability of Peano Arithmetic, he took theorems and proofs and encoded them as integers. He then needed to express “P is a proof of T” as a sentence in the vocabulary of first order logic. When you think about how challenging the above puzzle is, and how much more complicated “P is a proof of T” is than “x=10^y“, this should astonish you.

There are a lot of great popular introductions to the overall structure of Godel’s proof. Most of them gloss over HOW Godel solved this technical problem. My goal in this post is to discuss precisely this point, and ignore the larger questions. I will take the view that expressing things in the first order language of arithmetic is a worthwhile goal in its own right, and will show you how surprisingly strong this language is.

Below the fold, I’ll solve the puzzle. Then I’ll show how to express some more serious ideas.

First, a basic construction:
By “r=\bmod(a,m)“, we mean “0 \leq r < m and there exists q such that a=mq+r.”
So, when we write r = \bmod(a,m), this is a statement of first order logic.

Now, without further ado, the solution:

There exist M, a and L such that

  • 1 = \bmod(L, a+0 \cdot M).
  • For all 0 \leq i < y, there exist r and s such that r=\bmod(L,a+iM), s = \bmod(L, a+(i+1)M) and s=10 * r.
  • x = \bmod(L, a+yM).

I urge you to stare at this a lot, and convince yourself that it works. For any finite sequence (r_0, r_1, \ldots, r_n); we can find (L,M,a) such that r_i = \bmod(L, a+iM).

This is the trick everything else will be based on. By giving only three integers (M,a,L), we have managed to encode an entire sequence of y+1 integers. Obviously, we could modify this trick to say “x=y!” or “x is the y-th Fibonacci number.”

The next statement says that the 3x+1 recursion, starting at n, terminates:

There exist nonnegative integers M, a, L and t such that

  • n = \bmod(L, a+0 \cdot M).
  • For all 0 \leq i \leq t-1, there exist r and s such that r=\bmod(L,a+iM), s = \bmod(L, a+(i+1)M), if r is even then r=2s and, if r is odd, then s=3r+1.
  • 1 = \bmod(L, a+tM).

Now, we are encoding a list of length t by giving a quadruple (M,a,L,t) of positive integers. It will be convenient to have a way to encode a list by just one integer. That’s easy enough. Just take the previous example and change it to read

There is a nonnegative integer N such that there exist nonnegative integers M, a, L and t with N = \binom{M+a+L+t+3}{4} + \binom{M+a+L+2}{3} + \binom{M+a+1}{2} + \binom{M}{1} and \ldots

Of course, I use binomial coefficients as a convenient shorthand. The right hand side is some polynomial in (M,a,L,t), which we could write out in full if we wanted.

The point is, given any nonnegative integer N, there is exactly one way to express it as this sum of binomial coefficients. So we can encode four integers in one. And those four integers, in turn, can encode a list of finite length.

We’re going to be using this idea a lot, so let’s create notation. N[i] will mean “the i-th term of the list encoded by N” and \# N will be “the length of the list encoded by N.” I could always unravel this to talk about (M,aL,,t) and \bmod(L, a+iM) if I wanted to. (But I don’t!)

Here’s a nice example:

If p is prime, then there is some N with \#(N)=p+1 such that

  • \#(N[0]) = 1 and N[0][0]=1.
  • For i=1 to p, \#(N[i]) = i+1, N[i][0] = N[i][i]= 1 and, for j=1 to i-1, N[i][j+1] = N[i-1][j] + N[i-1][j+1].
  • For 1 \leq j \leq p-1, there is a k such that N[p+1][j] = pk.

Do you see what’s going on here? N is a list of lists, encoding the first p+1 rows of Pascal’s triangle. This states the (true) theorem that \binom{p}{j} is divisible by p.

The ability to talk about lists of lists is incredibly powerful. It can also be very confusing. In one of Douglass Hofstadter’s essays, he says something like: “Godel’s paper isn’t a mathematical paper at all; it’s a LISP compiler!” And, indeed, most of the length of the paper comes down to saying how to convert very simple statements about lists into very long statements about arithmetic relations between integers.

In this analogy, the first order language of arithmetic is assembly language. What is C? This is the first order language of hereditarily finite lists of integers. Our variables will denote either integers or finite lists whose elements are either integers or finite lists whose elements are either integers or finite lists \ldots, so something like
( 1,\ 1,\ (1,\ 2),\ ((\ ),\ 1),\ (\ ) ).
The important point is that we do not have infinite branching or infinite nesting.

We will, again, have all the logical symbols "AND", "OR", "NOT", "IF … THEN" and the quantifiers "THERE EXISTS" and "FOR ALL". We will have the statements “is an integer” and “is a list”. For integers, we will have all the arithmetic operations and relations we had before. We also have basic operations on lists. For example, if L is a list and i an integer, we have the operations L[i] and \#(L).

The next few screens will address some technical points. To keep you interested, here is another puzzle:

Puzzle 2: Let f be the function which takes a hereditarily finite list of integers and returns the list where every integer, at any level of nesting, has been doubled. Express the sentence M=f(L) in the first order language of hereditarily finite sets of integers.

There is one very technical point to warn you about. Since the language of arithmetic only has integers, we have to use integers both to encode lists and to encode integers. We can solve this in many ways; a traditional solution is to use odd integers for lists and even integers for actual integers. So the sentence

X=5 in the language of hereditarily finite lists of integers.

has to be compiled to

x=10 in the language of integers.

I will completely hide this issue from you in the future, but the first book I read about this didn’t point it out, and it freaked me out for a while when I realized the issue existed.

Hereditarily finite lists of integers are a vastly nicer language than integers. There is one immediate problem. Depending on exactly how we set up our encoding of lists by integers, we might be able to encode the list that contains itself. We could probably dodge that by choosing our encoding right, but it’s easier to just write down a statement that says that L is hereditarily finite. Here’s a solution; it is worth reading carefully:

There is a list V such that L=V[1] and, for i between 1 and \#(V), either

  • V[i] is an integer
    or

  • V[i] is a list and, for j between 1 and \#(V[i]), there is a k, with i < k \leq \#(V), such that V[i][j] = V[k].

Do you see how this works? Every subobject which occurs anywhere in L, at any depth of nesting, must occur as an object of V. As an imperfect analogy, you can think of this as like the way that very complicated data structures are flattened into a linear computer memory.

We can use this trick to define recursive functions on hereditarily finite lists, the same was as we used lists to define recursive functions on numbers. For example, we now give the solution to puzzle 2.

There are two lists V and W, with \#(V) = \#(W), with L = V[1] and M = W[1] and such that,
for i between 1 and \#(V), either

  • V[i] and W[i] are integers with W[i]=2V[i]
    or

  • V[i] and W[i] are lists of the same length and, for j between 1 and \#(V[i]), there is a k>i such that V[i][j] = V[k] and W[i][j]=W[k].

Again, I encourage you to stare at this until you see that it says what it is supposed to.

So we can do recursion! We are now something like 3/9‘s of the way to LISP.

There is an important point which I’ve glossed over. We can have two different integers encode the same list. All the code first order sentences which I have written so far will work just fine if you read the symbol = to mean “are the same actual integer, in the low level encoding.” But we also do need a way to say that two lists are equal, however they are encoded. To solve this, take the block above and just remove the multiplication by 2. We’ll write L \equiv M for this more general notion of equality.

At this point, you should be able to write Godel’s proof. Assign numbers to characters like +, \leq, \forall and so forth. Figure out how to say things like “T is a grammatical sentence, containing the unbound variable x.” Keep pushing and, now that we have a decent language for handling lists and recursion, it is not that hard to write sentences that say we have a valid proof.


I’m going to go in a different direction, and explain why I started writing this post. Over at Math Overflow, I asked the question “What is induction up to \epsilon_0?” I didn’t realize it at the time, but I was basically asking someone to write me a multi-screen blogpost. So, now that I understand, I have written that post. Let me explain how to encode \epsilon_0.

I’ll say that

L \prec M if one of the following is true:

  • L and M are integers with L \leq M
  • L is an integer and M a list
  • L and M are lists and either
    • There is some i, with 1 \leq i \leq \min(\#(L), \#(M)) such that
      • L(j) \equiv M(j), for 1 \leq j < i
      • L(i) \not\equiv M(i) and
      • L(i) \prec M(i)

      or

      • \#(L) \leq \#(M) and
      • L(j) \equiv M(j) for 1 \leq j \leq \#(L).

This gives us a total ordering on all hereditarily finite lists of integers. (Thanks to Allison Miller for pointing out that my previous definition didn’t work.) Roughly, we do a depth first search of L and M, looking for where they first differ. We rank large integers above small integers, and lists above integers.
Exercise for the reader: convince yourself that the relationship \prec really can be written in the first order language of hereditarily finite lists of integers. This will get you a lot of practice in unfolding recursions.

Now that we can compare lists, we can sort lists of lists.

L is sorted if either

  • L is an integer or
  • L is a list so that
    • L[i] \prec L[i+1] for 1 \leq i \leq \#(L) and
    • L[i] is sorted for 1 \leq i \leq \#(L).

SOME PREVIOUS ERRORS ARE NOW FIXED Define a pure list to be a list whose elements are either pure lists or the empty list. (Details are left to you at this point.) So, something like ( ( (),() ),\ (()),\ (()),\ ()).

Now, I can define \epsilon_0. The ordinal \epsilon_0 is the set of all pure sorted lists, under the relation \prec.
Here are the first few elements of \epsilon_0, along with their more standard notations:

() \quad 0

(()) \quad 1

((), ()) \quad 2

((),(),()) \quad 3

\cdots

( (()) ) \quad \omega

( (()), ()) \quad \omega+1

\cdots

( (()), (()) ) \quad 2 \omega

\cdots

( ((), ()) ) \quad \omega^2

\cdots

( ((), ()), (()), (()), (), (), () ) \quad \omega^2 + 2 \omega +3

\cdots

( ((),(),()) ) \quad \omega^3

\cdots

( ( (()) ) ) \quad \omega^{\omega}

\cdots

I think you can see why no one wanted to write out a complete answer to “How do you encode \epsilon_0 in the first order language of integers?” on Math Overflow!

18 thoughts on “The technical part of Godel’s proof

  1. I’m glad that someone’s written all this, and in a much clearer way than I probably would have been able to! When you mentioned Puzzle 2 I was momentarily freaked out by the extreme non-uniqueness of the “function” you wanted us to define, until you gave the clarification a moment later about how this had freaked you out too.

    I normally do Gödel’s theorems in a language that has exponentiation built in, and I always remember that there’s some trick using the Chinese Remainder Theorem that lets you encode lists even without exponentiation, but can never quite remember how it goes – it’s nice to see that here.

    Seeing the connection to \epsilon_0 is a nice touch too – I’ve heard of “proof-theoretic ordinals” that encode the strength of logical systems, and it looks like this shows how to go about doing some of that evaluation. The idea is basically that for any axiom system for arithmetic, consider every ordering of the integers that can be proven with these axioms to be a well-ordering. The first ordinal that isn’t represented by such an ordering is the proof-theoretic ordinal for that system. As I understand it, proving Goodstein’s Theorem (and everything else that depends on “induction up to \epsilon_0“) requires not just the ability to give the definition that you give, but also the ability to prove that this ordering is a well-ordering.

    Unfortunately, I don’t know quite how you even express that claim in this language, let alone prove it.

  2. As I understand it, you express this claim the same way you do in the ordinary integers: There is a theorem schema, with one theorem for every property \phi of a list. It says

    “If, for every list L, the truth of \phi(M) for all M \prec L implies \phi(L), then \phi(L) holds for all lists L.”

    The statement that we can’t prove induction up to \epsilon_0 is the statement that there are some \phi for which we can’t prove this.

    I thought about putting this in the post, but decided that axiom schemata were one concept too many. It makes a good problem though: given the axiom schema that the ordinary integers are well ordered, deduce the theorem schema which says that \omega^{\omega} is well ordered.

  3. This is a really nice exposition, on a topic that I’ve been curious about for a while. (I know I’ve been spending too much time reading MathOverflow when I instinctively start looking for the “upvote” button…)

    However, I don’t believe that the relation \prec that you’ve defined is currently a total ordering: in the case where \#(L) \le \#(M) and neither of the two subconditions are met, the lists L and M are incomparable. All this \epsilon_0 stuff is currently making my head spin a bit, so I’m not sure what the right fix is for this.

  4. Grrr. You’re right. I’ll try to fix this later today. Of course, if this were MO, some 3000+ user could come by and fix it for me.

  5. OK, I think the definition of \prec works now. I also, reluctantly, decided I was being too clever at the end. I wanted to use lists of integers, rather than just lists of lists, for three reasons (1) It is much easier to see how to code natural number theoretic constructions as lists of numbers (2) it makes examples more readable and (3) I have an emotional attachment to ur-elements.

    So I tried to modify the standard definition of \epsilon_0 to use lists of numbers. In the end, though, I’m not sure I got it right, so I reverted to the pure lists for \epsilon_0.

    The encoding is meant to be by the following recipe:

    (a_1, a_2, \ldots, a_r) means \sum \omega^{a_i}, and an empty sum is 0.

    Did I get it right this time?

  6. OK – it’s a little unsatisfying to prove well-ordering as a schema, but I guess it’s what we have to do if we don’t allow variables of set type.

    Anyway, it looks like that new definition is right. It’s a nice elegant way of encoding things in Cantor Normal Form. The fact that every ordinal can be written uniquely in Cantor Normal Form, and the definition that \epsilon_0 is the smallest solution to x=\omega^x seems to me to show that this representation of ordinals is correct. (It’s trivial to show that 1 can be represented, that the finite sum of any sequence of representable ordinals can be represented, and \omega^x can be represented for any representable x. Thus, the least non-representable ordinal must be a solution to x=\omega^x.)

  7. About the 10^n puzzle: that was definitely the stumper out of the exercises from GEB! I am curious though about the part you leave to the reader to convince themselves… I believe that if you take (a, a+M, a+2M, …, a+yM) to be a list of suitably large primes then it follows using the Chinese Remainder Theorem. But Green-Tao’s theorem (there exist arbitrarily long arithmetic sequences of primes) was not proven at time of writing the book.

    About 5 years ago I read a book on the solution of Hilbert’s 10th problem and they give one (very complex) solution.

    What is the simplest solution you know… is Green-Tao unnecessary?

  8. They don’t have to be primes; they just have to be relatively prime and sufficiently large. Consider M=k*y!, a=1+M for some large k.

    By the way, this is not the solution to Hilbert’s 10th problem. Hilbert’s 10th is much harder, because you are only allowed to use Diophantine equations, which are more restrictive than first order number theory sentences. For example, if you unpack the definition of \bmod, our solution to the power of 10 problem says in part

    “There exists M, a and L such that, for all i, there exists a q such that …”

    You can’t use “for all” in a Diophantine equation! There is an easy trick to get rid of it when it is the inner most quantifier, but that “there exists a q” blocks your way. This is why the first order theory of \mathbb{Q} is known to be undecidable, but the decidability of Diophantine equations over \mathbb{Q} is open.

  9. Kenny, this doesn’t state that ordinals are well-founded. We only state (via a schema) induction for ordinals. There is no way to even state anything is well founded in a first order way, even < on N. Indeed there are always non-standard models where these relations are not well-founded, but induction for first-order statements over these relations still hold in these non-standard models.

    At least this is my understanding.

  10. Russel: You’re saying “well founded” where I would say “well ordered”. Is there a reason?

  11. David: Feel free to substitute well-ordered with well-founded in my statement. It shouldn’t be a issue since the orders in question are presumably provably total. The well-founded part of the definition of well-ordered is the problematic part.

  12. Very nice post!

    Here’s a typo when you talk about encoding a list by one non-negative integer: since L, M, a, t are non-negative, I think you meant

    N = \binom{M+a+L+t+3}{4}+\binom{M+a+L+2}{3}+\binom{M+a+1}{2}+\binom{M}{1}

    (also in the line above that the variable a is missing)

Comments are closed.