Here is the result we are proving: Let be a prime power and let be the cyclic group of order . Let be a set which does not contain any three term arithmetic progression, except for the trivial progressions . Then

The exciting thing about this bound is that it is exponentially better than the obvious bound of . Until recently, all people could prove was bounds like , and this is still the case if is not a prime power.

All of our bounds extend to the colored version: Let be a list of triples in such that , but if are not all equal. Then the same bound applies to . To see that this is a special case of the previous problem, take . Once the problem is cast this way, if is odd, one might as well define , so our hypotheses are that but if are not all equal. We will prove our bounds in this setting.

I must admit, this is the least slick of the three arguments. The reader who wants to cut to the slick versions may want to scroll down to the other sections.

We will put an abelian group structure on the set which is isomorphic to , using formulas found by Witt. I give an example first: Define an addition on by

The reader may enjoy verifying that this is an associative addition, and makes into a group isomorphic to . For example, and .
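The formula did not survive in this copy, so here is a minimal sketch of that verification under an assumption: I use the standard length-two Witt vector addition over the field with p elements, (x0, x1) + (y0, y1) = (x0 + y0, x1 + y1 - sum over 0 < i < p of (1/p) C(p, i) x0^i y0^(p-i)), which for p = 2 reduces to (x0 + y0, x1 + y1 + x0 y0). The function names `witt_add` and `is_cyclic_of_order_p_squared` are mine, not Witt's or the post's.

```python
from itertools import product
from math import comb


def witt_add(a, b, p):
    """Add two length-2 Witt vectors over F_p (assumed standard formula)."""
    (x0, x1), (y0, y1) = a, b
    # Carry term (x0^p + y0^p - (x0 + y0)^p) / p, expanded so that every
    # coefficient C(p, i) / p is an integer (p is assumed prime).
    carry = -sum((comb(p, i) // p) * x0**i * y0 ** (p - i) for i in range(1, p))
    return ((x0 + y0) % p, (x1 + y1 + carry) % p)


def is_cyclic_of_order_p_squared(p):
    """Brute-force check: associativity, and that (1, 0) has order p^2."""
    elems = list(product(range(p), repeat=2))
    associative = all(
        witt_add(witt_add(a, b, p), c, p) == witt_add(a, witt_add(b, c, p), p)
        for a, b, c in product(elems, repeat=3)
    )
    # Repeatedly add (1, 0) to itself until we reach the identity (0, 0).
    g, order = (1, 0), 1
    while g != (0, 0) and order <= p * p:
        g = witt_add(g, (1, 0), p)
        order += 1
    return associative and order == p * p
```

With p = 2, `witt_add((1, 0), (1, 0), 2)` returns `(0, 1)`: adding the generator to itself carries into the second coordinate, which is exactly what distinguishes this group from the product of two copies of the group of order 2.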

In general, Witt found formulas

such that becomes an abelian group isomorphic to . If we define and to have degree , then is homogeneous of degree . (Of course, Witt did much more: See Wikipedia or Rabinoff.)

Write

.

and set

.

For example, when , we have

.

So if and only if in .

We now work with variables, , and , where and . Consider the polynomial

.

Here each is a polynomial in variables.

So is a polynomial on . We identify this domain with . Then if and only if in the group .

We define the degree of a monomial in the , and by setting . In this section, “degree” always has this meaning, not the standard one. The degree of is ; the degree of is and the degree of is .

From each monomial in , extract whichever of , or has lowest degree. We see that we can write

where , and are monomials of degree .

The now-standard argument (I like Terry Tao’s exposition) shows that is bounded by three times the number of monomials of degree . One needs to check that the argument works when the “degree” of a variable need not be , but this is straightforward.

Except we have a problem! There are too many monomials. To solve this issue, let be the polynomial obtained from by replacing every monomial by where with if and if . So coincides with as a function on , but uses smaller monomials. For example, the reader who multiplies out the expression for when will find a term . In , this is replaced by . The polynomial does not have the nice factorization of , but it is much smaller. For example, when , has nonzero monomials and has . Replacing by can only lower degree, so . Now rerun the argument with . Our new bound is three times the number of monomials of degree , **with the additional condition** that all exponents are .

Now, the monomial has degree . Identify with by sending to . We can thus think of as . We get the bound , just as in the prime case.
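To get a feel for the size of this bound, here is a small sketch that counts the exponent vectors directly. It assumes the count in question is the number of tuples in {0, ..., q-1}^n whose entries sum to at most n(q-1)/3, and that the bound is three times that count. The helper names are my own.

```python
import math


def count_low_degree_exponents(q, n):
    """Count tuples (m_1, ..., m_n) in {0, ..., q-1}^n with sum <= n(q-1)/3."""
    ways = [1]  # ways[s] = number of tuples built so far whose entries sum to s
    for _ in range(n):
        new_ways = [0] * (len(ways) + q - 1)
        for s, w in enumerate(ways):
            for m in range(q):
                new_ways[s + m] += w
        ways = new_ways
    # Compare 3*s with n*(q-1) to stay in exact integer arithmetic.
    return sum(w for s, w in enumerate(ways) if 3 * s <= n * (q - 1))


def bound_growth_rate(q, n):
    """n-th root of 3 * count, to compare against the trivial rate q."""
    return math.exp(math.log(3 * count_low_degree_exponents(q, n)) / n)
```

For q = 3 and n in the hundreds, `bound_growth_rate(3, n)` sits between 2.7 and 2.8, visibly below the trivial rate of 3. That gap is the exponential improvement the introduction refers to.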

Let’s be much slicker. Here is how Naslund and Sawin do it (original here).

Notice that, by Lucas’s theorem, the function is a well defined function when . Moreover, using Lucas again,

Define a function by

.

Here we have expanded by Vandermonde’s identity and used .

Define a function by just as before. So if and only if in the abelian group . Expanding gives a sum of terms of the form . Considering such a term to have “degree” , we see that has degree .
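The Lucas step at the start of this section is easy to check by machine. The sketch below assumes the function in question is x ↦ C(x, m) mod p for 0 ≤ m < q = p^r, and confirms that it only depends on x modulo q. The function names are hypothetical, and this tests only the periodicity claim, not the rest of the Naslund–Sawin argument.

```python
from math import comb


def binom_mod_p_lucas(x, m, p):
    """C(x, m) mod p via Lucas's theorem: multiply binomials of base-p digits."""
    result = 1
    while x > 0 or m > 0:
        # math.comb returns 0 when the lower digit exceeds the upper digit.
        result = result * comb(x % p, m % p) % p
        x //= p
        m //= p
    return result


def depends_only_on_residue(p, r, copies=4):
    """Check that C(x, m) mod p is unchanged when x shifts by multiples of q = p^r."""
    q = p**r
    return all(
        binom_mod_p_lucas(x, m, p) == binom_mod_p_lucas(x + j * q, m, p)
        for m in range(q)
        for x in range(q)
        for j in range(1, copies)
    )
```

The reason it works: when m < q, all base-p digits of m beyond the first r are zero, so the higher digits of x only ever contribute factors of C(digit, 0) = 1.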

As in the standard proof, factor out whichever of , or , has least degree. We obtain

where , and are products of binomial coefficients and, taking , we have , and .

We derive the bound , exactly as before.

I have taken the most liberties in rewriting this argument, to emphasize the similarity with the other arguments. The reader can see the original here.

Let . Let be the ring of functions with pointwise operations, and let be the group ring of . We think of acting on by .

Let , , …, be generators for . Let be the functions annihilated by the operators where . For example, is the functions which obey for any , and . We think of as polynomials of degree , and the dimension of is the number of monomials in variables of total degree where each variable has degree .

Define by and otherwise. Define by .

We write , and for the generators of the three factors in .

Then we have

So, if , then as a function on .

On the other hand, we can expand for , and in . We see that, if , then

.

We make the familiar deduction: We can write in the form

where , and run over a basis for .

Once more, we obtain the bound .

Petrov’s method has an advantage not seen in the other proofs: It generalizes well to the case that is non-abelian. For any finite group , let be a one-sided ideal in obeying . In our case, this is the ideal generated by with . Then we obtain a bound for sum free sets in .

I find Petrov’s proof immensely clarifying, because it explains why the arguments all give the same bound. We are all working with functions . I write them as polynomials in variables , Naslund and Sawin use binomial coefficients . The formulas to translate between our variables are a mess: For example, my is their . However, we both agree on what it means to be a polynomial of degree : It means to be annihilated by .

In both cases, we take the indicator function of the identity and pull it back to along the addition map. The first two proofs use explicit identities to see that the result has degree . The third proof points out this is an abstract property of functions pulled back along addition of groups, and has nothing to do with how we write the functions as explicit formulas.

I sometimes think that mathematical progress consists of first finding a dozen proofs of a result, then realizing there is only one proof. My mental image is settling a wilderness — first there are many trails through the dark woods, but later there is an open field where we can run wherever we like. But can we get anywhere beyond the current bounds with this understanding? All I can say is not yet…

]]>

The preprint is here.

Let me first explain the problem. Let be an abelian group. A subset of is said to be free of three term arithmetic progressions if there are no solutions to with , , , other than the trivial solutions . I’ll write for the cyclic group of order . Ellenberg and Gijswijt, building on work by Croot, Lev and Pach, have recently shown that such an in can have size at most , which is . This was the first upper bound better than , and has set off a storm of activity on related questions.

Robert Kleinberg pointed out the argument extends just as well to bound colored-sets without arithmetic progressions. A colored set is a collection of triples in , and we see that it is free of arithmetic progressions if we have if and only if . So, if , then this is the same as a set free of three term arithmetic progressions, but the colored version allows us the freedom to set the three coordinates separately.

Moreover, once , and are treated separately, if is odd, we may as well replace by and just require that if and only if . This is the three-colored sum-free set problem. Three-colored sum-free sets are easier to construct than three-term arithmetic-progression free sets, but the Croot-Lev-Pach/Ellenberg-Gijswijt bounds apply to them as well^{*}.

Our result is a matching of upper and lower bounds: There is a constant such that

(1) We can construct three-colored sum-free subsets of of size and

(2) For a prime power, we can show that three-colored sum-free subsets of have size at most .

So, what is ? We suspect it is the same number as in Ellenberg-Gijswijt, but we don’t know!

When is prime, Ellenberg and Gijswijt establish the bound . Petrov, and independently Naslund and Sawin (in preparation), have extended this argument to prime powers.

In the set , almost all the -tuples have a particular mix of components. For example, when , almost all tuples have roughly zeroes, ones and twos. The number of such tuples is roughly , where is the entropy .

In general, let be the probability distribution on which maximizes entropy, subject to the constraint that the expected value is . Then almost all -tuples in have roughly copies of , and the number of such -tuples grows like . I’ll call the EG-distribution.
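Here is a minimal numerical sketch of this distribution. It assumes the constraint is that the expected value equals (q-1)/3, and it uses the standard fact that an entropy maximizer with a fixed mean on {0, ..., q-1} has weights proportional to rho^i for some rho. The names `eg_distribution` and `entropy` are mine.

```python
import math


def eg_distribution(q, tol=1e-13):
    """Max-entropy distribution on {0, ..., q-1} with mean (q-1)/3 (assumed)."""
    target = (q - 1) / 3

    def mean(rho):
        weights = [rho**i for i in range(q)]
        return sum(i * w for i, w in enumerate(weights)) / sum(weights)

    # The mean increases with rho, from 0 at rho = 0 to (q-1)/2 at rho = 1,
    # so bisection on [0, 1] finds the rho that hits the target.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    rho = (lo + hi) / 2
    weights = [rho**i for i in range(q)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in dist if x > 0)
```

For q = 3, `math.exp(entropy(eg_distribution(3)))` comes out near 2.7551, which is the growth rate usually quoted for the Ellenberg-Gijswijt bound in that case.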

So Robert and I set out to construct three-colored sum-free sets of size

. What we were actually able to do was to construct such sets whenever there was an -symmetric probability distribution on such that was the marginal probability that the first coordinate of was , and the same for the and coordinates. For example, in the case, if we pick , and with probability and , and with probability , then the resulting distribution on each of the three coordinates is the EG-distribution , and we can realize the growth rate of the EG bound for .
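The q = 3 example can be checked by hand or with a few lines of code. The specific triples and probabilities are missing from this copy, so the sketch below rests on my own assumptions: the triples are the permutations of (0, 0, 2) and (0, 1, 1), each family gets a single probability (`alpha` and `beta`, names mine), and the target marginal is the maximum-entropy distribution on {0, 1, 2} with mean 2/3.

```python
import math
from itertools import permutations

# Max-entropy distribution on {0, 1, 2} with mean 2/3: weights 1, rho, rho^2,
# where the mean condition (rho + 2 rho^2) / (1 + rho + rho^2) = 2/3
# simplifies to 4 rho^2 + rho - 2 = 0.
rho = (math.sqrt(33) - 1) / 8
z = 1 + rho + rho**2
pi = [1 / z, rho / z, rho**2 / z]

# Assumed symmetric distribution on triples summing to 2.
alpha = pi[2]      # probability of each permutation of (0, 0, 2)
beta = pi[1] / 2   # probability of each permutation of (0, 1, 1)
triples = {}
for t in set(permutations((0, 0, 2))):
    triples[t] = alpha
for t in set(permutations((0, 1, 1))):
    triples[t] = beta


def marginal(coord):
    """Distribution of one coordinate of a random triple."""
    out = [0.0, 0.0, 0.0]
    for t, prob in triples.items():
        out[t[coord]] += prob
    return out
```

The total probability is 3 * alpha + 3 * beta, and this equals 1 precisely because the mean of `pi` is 2/3. So in this case the mean constraint is exactly what makes a symmetric distribution with the right marginals exist.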

Will pointed out to us that, if such a probability distribution on does not exist, then we can lower the upper bound! So, here is our result:

Consider all -symmetric probability distributions on . Let be the corresponding marginal distribution, with the probability that the first coordinate of will be . Let be the largest value of for such a . Then

(1) There are three-colored sum-free subsets of of size and

(2) If is a prime power, such sets have size at most .

Any marginal of an -symmetric distribution on has expected value , so our upper bound is at least as strong as the Ellenberg-Gijswijt/Petrov-Naslund-Sawin bound. We suspect they are the same: That their optimal probability distribution is such a marginal. But we don’t know!

Here are a few remarks:

(1) The restriction to -symmetric distributions is a notational convenience. Really, all we need is that all three marginals are equal to each other. But we might as well impose -symmetry because, if all the marginals of a distribution are equal, we can just take the average over all permutations of that distribution.

(2) Our lower bound does not need to be a prime power. I’d love to know whether the upper bound can also remove that restriction.

(3) If the largest entropy of a marginal comes from a distribution on with all , then the marginal distribution is the EG distribution. The problem is about the distributions at the boundary; it seems hard to show that it is always beneficial to perturb inwards.

(4) For , there is more than one distribution on with the required marginal. One canonical choice would be the one which has largest entropy given that marginal. If the optimal solution has all , then one can show that it factors as for some function .

* One exception is that Fedor Petrov has lowered the bound for AP free sets in to , whereas the bound for sum-free is still . But, as you will see, I am chasing much rougher bounds here.

]]>

Let me first say that the cc-by-0 license is no problem at all as it allows for other publications without restrictions. Second, our copyright statement of course only talks about the version published in one of our journals, with our copyright line (or the copyright line of a partner society if applicable, or the author’s copyright if Open Access is chosen) on it.

At least if you are publishing in a Springer journal, and more generally, I would strongly encourage you to post your papers to the arXiv under the more permissive CC-BY-0 license, rather than the minimal license the arXiv requires.

As a question to any legally-minded readers: does copyright law genuinely distinguish between “the version published in one of our journals, with our copyright line”, and the “author-formatted post-peer-review” version which is substantially identical, barring the journal’s formatting and copyright line?

]]>

According to the Talmud, in order for the Sanhedrin to sentence a man to death, the majority of them must agree to it. However

R. Kahana said: If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. (Babylonian Talmud, Tractate Sanhedrin, Folio 17a)

Scott Alexander has a devious mind and considers how he would respond to this rule as a criminal:

[F]irst I’d invite a bunch of trustworthy people over as eyewitnesses, then I’d cover all available surfaces of the crime scene with fingerprints and bodily fluids, and finally I’d make sure to videotape myself doing the deed and publish the video on YouTube.

So, suppose you were on a panel of judges, all of whom had seen overwhelming evidence of the accused’s guilt, and wanted to make sure that a majority of you would vote to convict, but not all of you. And suppose you cannot communicate. With what probability would you vote to convict?

Test your intuition by guessing an answer now, then click below:

My gut instincts were that (1) we should choose really close to , probably approaching as and (2) there is no way this question would have a precise round answer. As you will see, I was quite wrong.

Tumblr user lambdaphagy is smarter than I was and wrote a program. Here are his or her results:

As you can see, it appears that is not approaching , or even coming close to it, but is somewhere near . Can we explain this?

We want to avoid two events: unanimity, and a majority vote to acquit. The probability of unanimity is .

The probability of a majority vote to acquit is . Assuming that , and it certainly should be, almost all of the contribution to that sum will come from terms where . In that case, . And we’ll roughly care about such terms. So the odds of acquittal are roughly .

So we roughly want to be as small as possible. For large, one of the two terms will be much larger than the other, so it is the same to ask that be as small as possible.

Here is a plot of :

Ignore the part with below ; that’s clearly wrong and our approximation that is dominated by won’t be good there. Over the range , the minimum is where .

Let’s do some algebra: , , (since is clearly wrong), . Holy cow, is actually right!
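For readers who would rather compute than trust the heuristic, here is a rough sketch of the exact calculation. It assumes n judges who each independently vote to convict with probability p, and it counts a run as a failure if the vote is unanimous or if at most half of the judges vote to convict (so ties count as failures). The function names are mine and this is not lambdaphagy's program.

```python
import math


def failure_probability(n, p):
    """P(unanimous conviction) + P(no strict majority to convict), for 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)

    def log_pmf(k):
        # Log of C(n, k) p^k (1-p)^(n-k), computed in log space to avoid overflow.
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        return log_binom + k * log_p + (n - k) * log_q

    no_majority = sum(math.exp(log_pmf(k)) for k in range(n // 2 + 1))
    return p**n + no_majority


def best_conviction_probability(n, grid=1000):
    """Grid search over p in (1/2, 1) for the smallest failure probability."""
    candidates = [i / grid for i in range(grid // 2 + 1, grid)]
    return min(candidates, key=lambda p: failure_probability(n, p))
```

Running `best_conviction_probability` for a few hundred judges gives a value just below 0.8, and for a panel of 23 the optimum is a little lower still, somewhere in the 0.7s. The limit is 0.8, but for small panels the polynomial factors that the heuristic ignores still matter.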

First of all, actually do some computations.

Secondly, I was wrongly thinking that failing by acquittal would be much more important than failing by unanimity. I think I was misled because one of them occurs for values of and the other only occurs for one value. I should have realized two things: (1) the bell curve is tightly peaked, so it is really only the very close to which matter and (2) exponentials are far more powerful than the ratio between or and anyway.

Finally, for the skeptics, here is an actual proof. Assuming , we have

The main step is to replace each by the largest it can be.

But also,

Here we have lower bounded the sum by one of its terms, and then used the easy bound since it is the largest of the entries in a row of Pascal’s triangle which sums to .

So the odds of failure are bounded between

and . We further use the convenient trick of replacing a with a , up to bounded error to get that the odds of failure are bounded between and .

Now, let be a probability greater than other than . We claim that choosing conviction probability will be better than for large. Indeed, the -strategy will fail with odds at least , and the strategy will fail with odds at most . Since , one of the two exponentials in the first case is larger than , and the -strategy is more likely to fail, as claimed.

Of course, for a Sanhedrin of members, , so our upper bound predicts only a one percent probability of failure. More accurate computations give . So the whole conversation deals with the overly detailed analysis of an unlikely consequence of a bizarre hypothetical event. Fortunately, this is not a problem in the study of Talmud!

]]>

This is the corner of a crystal of salt, as seen under an electron microscope. (I took the image from here; unfortunately, I couldn’t find better information about the sourcing.) As you can see, the corner is a bit rounded, where some of the molecules have rubbed away. They ask the question: “What is the shape of that rounded corner?”

The molecules of a salt crystal form a cubical lattice. We can index them as . But it’s not the ones from the interior that are missing – if is missing and , and , then is missing as well.

A finite subset of with the property that implies for , , is what I’ll call a **three dimensional partition**. (Here I am deliberately rebelling against the awful classical terminology, which is a “plane partition”.)

The opening of Kenyon, Okounkov and Sheffield’s paper discusses the question: “What is the shape of a random three dimensional partition of size ?”

It was only after I had worked through this paper, and its sequels 1 2 fairly carefully that I realized they hadn’t actually answered their motivating question. They certainly implied what the answer should be, and they laid out all the necessary tools, but they never came back and said “let’s do the salt crystal example”.

In this post, I want to lay out in outline how this question is answered. The previous posts on Legendre transforms in statistical mechanics, and on random partitions, were meant as warm ups, where it is easier to make complete arguments.

Let me acknowledge right out that I am making no attempt at rigor here. What I want to do is sketch the argument, and hope this encourages some of you to read the amazing sequence of papers I have linked to.

In the previous posts, our first step was to analyze partitions of a given slope. Namely, we showed that there are roughly partitions that fit in a box, where . We would now like to similarly analyze three dimensional partitions with a given slope.

We could specify a normal vector and ask for the partition to have slope roughly , but there is an approach which turns out to be equivalent and is a bit easier to formulate. Let be positive real numbers with . The boundary of a plane partition is made up of squares which lie either in the plane, the plane or the plane. Let us suppose there were some function such that the number of partitions with some specified boundary, using roughly squares of the first type, of the second type and of the third type, is roughly . We would like to know what this function is.

I said “some specified boundary”. What boundary shall we use? As is often the case in mathematical physics, the nicest thing to do is to use “periodic boundary conditions” — which is to say, to avoid the issue of boundaries by wrapping the problem on a torus.

A perspective drawing can give us inspiration. The boundary of a three dimensional partition looks like a tiling of the plane by rhombi with angles and . The three planes which squares can lie in turn into the three possible orientations of the rhombi (colored red, black and blue below).

(Image taken from Eventually Almost Everywhere, who has further nice discussion of the relation between rhombus tilings and plane partitions.)

Tile the plane by equilateral triangles in the standard manner and then quotient that plane by some lattice to produce a torus. Let that torus contain upward pointing and downward pointing triangles; so we can hope to tile it with rhombi.

Let be the number of such tilings which use , and rhombi of the three potential types. So our goal is to determine .

As in the case of two dimensional partitions, the best strategy is to form a generating function. Let .

There is an amazing explicit formula for this generating function.

To tantalize you, I will state it without proof, and simply point you to the search term “Kasteleyn’s method” if you want to learn more.

Let us suppose our torus has a fundamental domain which is , so , and let be even. Set

.

We have (up to possible sign errors on my part)

.

Asymptotically, all four terms contributing to are about the same size, so . And how big is ? We can approximate that sum by an integral:

.

Set . This is known as the “Ronkin function”. Then, as before, we conclude that is the Legendre transform of the Ronkin function. And we conclude that a random three dimensional partition has the shape for some constant .

Something interesting happens. Suppose that . So cannot be zero for any complex numbers and with , . So is a harmonic function when we restrict and to those discs. The average of a harmonic function over the boundary of a disc is the value at the center of the disc. So simply equals when .

Thus, the surface contains, in particular, the planar region , , and similar planar regions in the and planes. This is a major difference between the two dimensional and three dimensional limit shapes — no part of the two dimensional limit curve is contained in the coordinate axes. While I'm not sure how seriously this picture should really be taken, it might do a bit to explain why those salt crystals above look so cubical.

There is another very important difference between the two and three dimensional pictures. In the two dimensional case, the only solutions of the variational problem were of the form . But, in the three dimensional case, is only one of many solutions to the resulting PDE. There is so much more to say about all of this. If you want to read more, I refer you to Chapter One of Kenyon and Okounkov.

]]>

It’s up on mathjobs, but applicants will need to apply through the university website. Here’s the pitch:

The Mathematical Sciences Institute at the Australian National University is seeking to invigorate its research and teaching profile in the areas of statistics, probability, stochastic analysis, mathematical finance and/or biomathematics/biostatistics. We wish to fill several continuing positions at the Academic Level B and/or Level C (which equates to the position of Associate Professor within the United States of America). Up to 3 full time positions may be awarded.

You will be joining an internationally recognised leading team of academics with a focus on achieving excellence in research and teaching. The Institute comprises of approximately fifty academics, within seven mathematical research programs. Applicants are expected to have an outstanding record in research, teaching and administration. All positions will involve some teaching, in the specialised areas advertised and/or standard mathematics undergraduate courses, but this may be at a reduced level for several years.

It’s a great place to work, excellent opportunities for research grant funding, and a really nice place to live. Feel free to contact me if you have any questions about the job or living in Canberra.

**Please pass this on to friends with relevant interests!**

(Oh, and don’t forget those two postdocs I’m hiring in quantum algebra, higher category theory, subfactors, representation theory, etc.)

]]>

By a partition of , we mean positive integers with . We draw a partition as a collection of boxes: For example, this is :

Suppose we let , select partitions of uniformly at random and rescale the size of the boxes by , so that the diagram of the partition always has area . What is the shape of the most likely diagram?

We want to describe, given a particular shape, how many partitions have roughly that shape. To that end, let , , …, be a sequence of points with and . Let's count partitions which fit inside the box and whose boundary passes through the points .

To describe such a partition, we simply must describe the portion between and , and then describe the portion between and , and so forth; each of these portions is independent. Set and (so ). Set .

So the number of such partitions is

.

From our computations in an earlier post, this is roughly

where . Moreover, if we rescale all of the and by , this approximation becomes better.

So, if we want there to be a lot of partitions, we should make large.

We have now described what happens if we require the boundary of the partition to pass through finitely many fixed points. Suppose instead we insist that the partition lie near a fixed curve? Parametrize that curve as , with the normalization that . I’m not even going to say what I mean precisely, but we can extrapolate from the preceding formula that the number of partitions whose boundary is near is roughly

We thus obtain a problem in the calculus of variations: Consider curves , with the normalizations that and the area enclosed by and the coordinate axes is . Subject to these conditions, maximize . The next several paragraphs are solving this variational problem.

Rotating the coordinates , we see that the condition on the enclosed area is simply that .

As always in problems involving the calculus of variations, the first step is to perturb our proposed solution and see how it changes. Replace by . If was optimal, then should be a maximum, so the first order variation in should vanish.

This is only valid if our perturbation preserves the conditions on . We want the perturbed functions to obey , so we take , and we want to preserve the area, so we take .

For such a perturbation, the first order variation in is supposed to vanish. We compute:

.

Integrating by parts:

.

We want this to hold for any obeying . This will happen if and only if is a constant. So we want for some constants and . So .

The formula , in some sense, solves the variational problem. Our next task is to simplify that formula.

From the previous post, we know that . So , where is another constant of integration.

As , we want . This shows that .

As , we want . This shows that . So we have and .

To evaluate the final constant , we need to use that the area under the curve is . It is convenient to eliminate and relate and directly as , or, . Now, the area under the curve is . We deduce that .

In other words, our final answer is that the curve is .
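The formula for the curve is missing from this copy. The sketch below assumes it is the classical limit shape for uniform random partitions, e^(-Cx) + e^(-Cy) = 1 with C = π/√6. It checks numerically that the area under that curve is 1, and includes a small exact sampler so that a random partition can be compared against the curve. All function names are mine.

```python
import math
import random

C = math.pi / math.sqrt(6)  # assumed constant in e^{-Cx} + e^{-Cy} = 1


def limit_curve_y(x):
    """Height y of the assumed limit curve at horizontal position x > 0."""
    return -math.log1p(-math.exp(-C * x)) / C


def area_under_curve(x_max=40.0, steps=200_000):
    """Midpoint-rule estimate of the area between the curve and the axes."""
    h = x_max / steps
    return h * sum(limit_curve_y((i + 0.5) * h) for i in range(steps))


def partition_counts(n):
    """count[m][k] = number of partitions of m into parts of size at most k."""
    count = [[0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        count[0][k] = 1
    for m in range(1, n + 1):
        for k in range(1, n + 1):
            count[m][k] = count[m][k - 1] + (count[m - k][k] if k <= m else 0)
    return count


def random_partition(n, rng):
    """Sample a partition of n uniformly at random, largest part first."""
    count = partition_counts(n)
    parts, m, k = [], n, n
    while m > 0:
        r = rng.randrange(count[m][k])
        # Partitions of m with largest part exactly j number count[m - j][j].
        for j in range(min(m, k), 0, -1):
            if r < count[m - j][j]:
                break
            r -= count[m - j][j]
        parts.append(j)
        m, k = m - j, j
    return parts
```

To compare a sample with the curve, draw `random_partition(n, random.Random(0))` for a moderately large n, rescale both row lengths and row indices by the square root of n, and plot the boundary next to `limit_curve_y`.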

How good is this approximation? See for yourself! Here is a random partition of , and the approximating curve.

The thing which I think is really cool is that we never needed the explicit formula ; we only needed that it was Legendre dual to . Suppose that we looked at some other sort of local probability distribution on partitions: For example, maybe we would weight the likelihood of a partition by or something like that. We could still build a generating function for partitions fitting in a box with this weighting. We could then try to compute . If we succeeded, the exact same logic as before would show that was the shape of a random partition of area chosen from this distribution. Which brings me to the thing I really want to talk about … next time.

]]>

We are going to be considering systems with parts, and asking how many states they can be in. The answers will be exponential in , and all that we care about is the base of that exponent. For example, the number of ways to partition an element set into two sets of size (if is even) is which Stirling’s formula shows to be . All we will care about is that .

How many ways can we partition an element set into a set of size roughly and a set of size roughly ? More mucking around with Stirling’s formula will get us the answer

I’ll show you a non-rigorous way to get that answer without getting into the details of Stirling’s formula.

Let’s suppose there is a function so that the number of ways to partition an element set into a piece of size and a piece of size is roughly . Let’s look at the generating function

Then we expect

For fixed , there is presumably some which maximizes . The terms coming from near that will overwhelm the others, so we should have

or

Set . So we should have

In other words, we expect and to be Legendre dual. In particular, we expect to have all of the following formulas:

To spell the last one out in words, is a function of and is a function of ; they should be inverse functions. (Keeping the signs straight is one of the real nuisances in this subject.)
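As a numerical sanity check of this duality for the two-piece partition count, here is a small sketch. The formulas and sign conventions of the post are lost in this copy, so the conventions below are my assumptions: the generating function is the sum of C(n, k) x^k = (1 + x)^n, I write x = e^t, and the dual is taken as the minimum over t of log(1 + e^t) - p t. The function names are mine.

```python
import math


def log_binomial_rate(n, p):
    """(1/n) * log C(n, round(p n)), the exponential growth rate of the count."""
    k = round(p * n)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / n


def entropy(p):
    """Binary entropy in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def legendre_dual(p, lo=-30.0, hi=30.0, iters=200):
    """min over t of log(1 + e^t) - p*t, by ternary search (convex in t)."""

    def g(t):
        return math.log1p(math.exp(t)) - p * t

    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return g((lo + hi) / 2)
```

Under these conventions the minimum is attained at t = log(p / (1 - p)), and substituting that back gives the binary entropy exactly. So all three functions agree for p strictly between 0 and 1, up to a correction of order log(n)/n in the first one.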

In this case, we can compute explicitly. We have , so so . We use the relation that the derivatives of Legendre dual functions are inverse. Inverting gives so

and we can compute the integral to get

I’m trying to learn to write shorter posts, so I’ll stop here for now. Next time, we compute the shape of a random partition of .

]]>

]]>

We’ve just put up an ad for a new 2 year postdoctoral position at the ANU, to work with myself and Tony Licata. We’re looking for someone who’s interested in operator algebras, quantum topology, and/or representation theory, to collaborate with us on Australian Research Council funded projects.

The ad hasn’t yet been crossposted to MathJobs, but hopefully it will eventually appear there! In any case, applications need to be made through the ANU website. You need to submit a CV, 3 references, and a document addressing the selection criteria. Let me know if you have any questions about the application process, the job, or Canberra!

]]>