Using discord for online teaching

On Wednesday, I asked several of my students what tools they use to collaborate online on their problem sets. Several of them mentioned Discord. I am currently trying to set up a Discord channel for my class. I imagine I am not the only one in this situation, so I am writing up my progress as I go here . If you have relevant knowledge, please leave an answer to this question or edit mine!

If you have questions to discuss, let’s do that in the comment thread here. And please promote this on twitter and wherever else math teacher’s gather!

I did a dry run with 6 of my students this afternoon. We spent 10 minutes introducing each other to the system, broke into two groups of 3 and spent 10 minutes solving a math problem (problem 19.2 here) and 30 minutes debriefing. Most students found it awkward but workable. Here are things I/we saw as good:

  • Connection was reliable; much less dropping out than the video conferencing tools they report their other courses are using.
  • The presence of the chat stream with integrated graphics and LaTeX encouraged people to write things down, just like we encourage students to write on blackboards when doing group work in class. This made it easier for me to jump in and out of conversations.
  • It was pretty easy for me to jump back and forth from one group to the other. (Not sure how I would do with 4 or more groups, though.)
  • We really did solve a nontrivial problem and have a nontrivial conversation about it!

Things we didn’t like

  • Both groups needed, at one point, to draw an equation or commutative diagram like this one:
    \mathbb{Q}(\cos \tfrac{2 \pi}{7} ) \cong \mathbb{Q}[x]/f(x) \mathbb{Q}[x] \cong \mathbb{Q}(\cos \tfrac{4 \pi}{7}).
    One group did the LaTeX, the other draw a commutative diagram in MS Paint and dropped it into the thread. Both expressed frustration about how much slower this was than drawing a picture in person.
  • Some students felt that it was awkward not seeing the people they were talking to.

Read Izabella Laba on diversity statements

There has been a dispute running through mathematical twitter about diversity statements in academic hiring. Prompted by that, Izabella Laba has just written an excellent post, which affirms the importance of diversity as a goal, but lays out the many tricky issues with diversity statements. It makes a lot of points I would like to make, and raises others that I hadn’t thought of but should have. In case there is someone reading this blog but not reading Professor Laba’s, go check it out.

Sadly, this blog is still dead. I have a list of things I would like to write on it, but I have no idea when or if I will find the time.

Why single variable analysis is easier

Jadagul writes:

Got a draft of the course schedule for next year. Looks like I might get to teach real analysis.

I probably need someone to talk me out of trying to do everything in R^n.

A subsequent update indicates that the more standard alternative is teaching one variable analysis.

This is my second go around teaching rigorous multivariable analysis — key points are the multivariate chain rule, the inverse and implicit function theorems, Fubini’s theorem, the multivariate change of variables formula, the definition of manifolds, differential forms, Stokes’ theorem, the degree of a differentiable map and some preview of de Rham cohomology. I wouldn’t say I’m doing a great job, but at least I know why it’s hard to do. I haven’t taught single variable, but I have read over the day-to-day syllabus and problem sets of our most experienced instructor.

Here is the conceptual difference: It is quite doable to start with the axioms of an ordered field and build rigorously to the Fundamental Theorem of Calculus. Doing this gives students a real appreciation for the nontrivial power of mathematical reasoning. I don’t want to say that it is actually impossible to do the same for Stokes’ theorem (according to rumor, Brian Conrad did it), but I never manage — there comes a point where I start waving my hands and saying “Oh yes, and throw in a partition of unity” or “Yes, there is an inverse function theorem for maps between n-folds just like the one for maps between open subsets of \mathbb{R}^n.” I think most students probably benefit from seeing things done carefully for a term first.

Below the fold, a list of specific topics much harder in more than one variable. If you have found ways not to make them harder, please chime in in the comments!

Continue reading

A grad student fellowship

My CAREER grant includes funding for a thesis-writing fellowship for graduate students who have done extraordinary teaching and outreach during their time as a grad student.  If you know any such grad students who are planning on graduating in the 2018-2019 school year please encourage them to apply.  Deadline is Jan 31 and details here.

While I’m shamelessly plugging stuff for early career people, Dave Penneys, Julia Plavnik, and I are running an MRC this summer in Quantum Symmetry.  It’s aimed at people at -2 to +5 years from Ph.D. working in tensor categories, subfactors, topological phases of matter, and related fields, and the deadline is Feb 15.


Fighting the grad student tax

I’m throwing this post up quickly, because time is of the essence. I had hoped someone else would do the work. If they did, please link them in the comments.

As many of you know, the US House and Senate have passed revisions to the tax code. According to the House, but not the Senate draft, graduate tuition remissions are taxed as income. Thus, here at U Michigan, our graduate stipend is 19K and our tuition is 12K. If the House version takes effect, our students would be billed as if receiving 31K without getting a penny more to pay it with.

It is thus crucial which version of the bill goes forward. The first meeting of the reconciliation committee is TONIGHT, at 6 PM Eastern Time. Please contact your congress people. You can look up their contact information here. Even if they are clearly on the right side of this issue, they need to be able to report how many calls they have gotten about it when arguing with their colleagues. Remember — be polite, make it clear what you are asking for, and make it clear that you live and vote in their district. If you work at a large public university in their district, you may want to point out the effect this will have on that university.

I’ll try to look up information about congress people who are specifically on the committee or otherwise particularly vulnerable. Jordan Ellenberg wrote a “friends only” facebook post relevant to this, which I encourage him to repost publicly, on his blog or in the comments here.

UPDATE According to the Wall Street Journal, the grad tax is out. Ordinarily, I thank my congress people when they’ve been on the right side of an issue and won. (Congress people are human, and appreciate thanks too!) In this case, I believe the negotiations happened largely in secret, so I’m not sure who deserves thanks. If anyone knows, feel free to post below.

… and Elsevier taketh away.

Readers may recall that during the 2013 “peak-Elsevier” period, Elsevier made an interesting concession to the mathematical community — they released all their old mathematical content (“old” here means a rolling 4-year embargo) under a fairly permissive licence.

Unfortunately, sometime in the intervening period they have quietly withdrawn some of the rights they gave to that content. In particular, they no longer give the right to redistribute on non-commercial terms. Of course, the 2013 licence is no longer available on their website, but thankfully David Roberts saved a copy at The critical sentence there is

“Users may access, download, copy, display, redistribute, adapt, translate, text mine and data mine the articles provided that: …”

The new licence, at now reads

“Users may access, download, copy, translate, text and data mine (but may not redistribute, display or adapt) the articles for non-commercial purposes provided that users: …”

I think this is pretty upsetting. The big publishers hold the copyright on our collective cultural heritage, and they can deny us access to the mathematical literature at a whim. The promise that we could redistribute on a non-commercial basis was a guarantee that we could preserve the literature. If this is to be taken away, I hope that mathematicians will go to war again.

Hopefully Elsevier will soon come out with a “oops, this was a mistake, those lawyers, you know?” but this will only happen if we get on their case.

What to do:

  • Elsevier journal editors: please contact your Elsevier representations, and ask that the licence for the open archives be restored to what it was, to assure the mathematical community that we have ongoing access to the old literature.
  • Elsevier referees and authors: please contact your journal editors, to ask them to contact Elsevier. If you are currently refereeing or submitting, please bring up this issue directly.
  • Everyone: contact Elsevier, either by email or social media (twitter facebook google+).
  • Happily, as we have a copy of the 2013 licence, all the Elsevier open mathematics archive up to 2009 is still available for non-commercial redistribution under their terms. You can find these at

Bounds for sum free sets in prime power cyclic groups — three ways

A few weeks ago, I e-mailed Will Sawin excitedly to tell him that I could extend the new bounds on three-term arithmetic progression free subsets of (\mathbb{Z}/p \mathbb{Z})^n to (\mathbb{Z}/p^r \mathbb{Z})^n. Will politely told me that I was the third team to get there — he and Eric Naslund already had the result, as did Fedor Petrov. But I think there might be some expository benefit in writing up the three arguments, to see how they are all really the same trick underneath.

Here is the result we are proving: Let q=p^r be a prime power and let C_q be the cyclic group of order q. Let A \subset C_q^n be a set which does not contain any three term arithmetic progression, except for the trivial progressions (a,a,a). Then

\displaystyle{|A| \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|.}

The exciting thing about this bound is that it is exponentially better than the obvious bound of q^n. Until recently, all people could prove was bounds like q^n/n^c, and this is still the case if q is not a prime power.

All of our bounds extend to the colored version: Let (\bar{a}_i, \bar{b}_i, \bar{c}_i) be a list of N triples in ((C_q)^n)^3 such that \bar{a}_i-2 \bar{b}_i+\bar{c}_i=0, but \bar{a}_i-2\bar{b}_j+\bar{c}_k \neq 0 if (i,j,k) are not all equal. Then the same bound applies to N. To see that this is a special case of the previous problem, take \bar{a}_i = \bar{b}_i = \bar{c}_i. Once the problem is cast this way, if q is odd, one might as well define (a_i, b_i, c_i) = (\bar{a}_i, -2 \bar{b}_i, \bar{c}_i), so our hypotheses are that a_i+b_i+c_i = 0 but a_i+b_j+c_k \neq 0 if (i,j,k) are not all equal. We will prove our bounds in this setting.

My argument — Witt vectors

I must admit, this is the least slick of the three arguments. The reader who wants to cut to the slick versions may want to scroll down to the other sections.

We will put an abelian group structure \oplus on the set \mathbb{F}_p^r which is isomorphic to C_q, using formulas found by Witt. I give an example first: Define an addition on \mathbb{F}_3^2 by

\displaystyle{(x_0,x_1) \oplus (y_0, y_1) = (x_0+y_0, x_1+y_1 - x_0^2 y_0 - x_0 y_0^2)}

The reader may enjoy verifying that this is an associative addition, and makes \mathbb{F}_3^2 into a group isomorphic to C_9. For example, (1,0) \oplus (1,0) \oplus (1,0) = (2,1) \oplus (1,0) = (0,1) and (0,1) \oplus (0,1) \oplus (0,1) = (0,0).

In general, Witt found formulas

\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) = (S_0(x,y), S_1(x,y), \ldots, S_{r-1}(x,y))}

such that \mathbb{F}_p^r becomes an abelian group isomorphic to C_q. If we define x_e and y_e to have degree p^e, then S_e is homogenous of degree p^e. (Of course, Witt did much more: See Wikipedia or Rabinoff.)


\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) = (S_0(x,y,z), S_1(x,y,z), \ldots, S_{r-1}(x,y,z))}.

and set

\displaystyle{\bar{F}(x,y,z) = \prod_{i=0}^{r-1} \left( 1-S_i(x,y,z)^{p-1} \right)}.

For example, when q=9, we have

\displaystyle{\bar{F} = \left( 1-(x_0+y_0+z_0)^2 \right) \left( 1-(x_1+y_1+z_1 - x_0^2 y_0 - x_0 y_0^2 - x_0^2 z_0 - x_0 z_0^2 - y_0^2 z_0 - y_0 z_0^2 + x_0 y_0 z_0)^2 \right)}.

So \bar{F}(x,y,z)=0 if and only if (x_0, \ldots, x_{r-1}) \oplus (y_0, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) \neq 0 in C_q.

We now work with 3kn variables, x_e^f, y_e^f and z_e^f, where 0 \leq e \leq k-1 and 1 \leq f \leq n. Consider the polynomial

\displaystyle{ \bar{G} = \prod_{f=1}^n \bar{F}(x^f, y^f, z^f) }.

Here each \bar{F} is a polynomial in 3k variables.

So \bar{G} is a polynomial on (\mathbb{F}_p^k)^{3n}. We identify this domain with C_q^{3n}. Then \bar{G}(x,y,z)=0 if and only if x+y+z \neq 0 in the group C_q^n.

We define the degree of a monomial in the x_e^f, y_e^f and z_e^f by setting \deg(x_e^f)=\deg(y_e^f)=\deg(z_e^f)=p^e. In this section, “degree” always has this meaning, not the standard one. The degree of S_e is p^e; the degree of \bar{F} is (p-1) + p(p-1) + p^2(p-1) + \cdots + p^{r-1}(p-1) = p^r-1 = q-1 and the degree of \bar{G} is (q-1)n.

From each monomial in \bar{G}, extract whichever of \prod (x_e^f)^{i_e^f}, \prod (y_e^f)^{j_e^f} or \prod (z_e^f)^{k_e^f} has lowest degree. We see that we can write

\displaystyle{ \bar{G}(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s are monomials of degree \leq (q-1)n/3.

The now-standard argument (I like Terry Tao’s exposition) shows that N is bounded by three times the number of monomials \prod (x_e^f)^{m_e^f} of degree \leq (q-1)n/3. One needs to check that the argument works when the “degree” of a variable need not be 1, but this is straightforward.

Except we have a problem! There are too many monomials. To solve this issue, let F be the polynomial obtained from \bar{F} by replacing every monomial u^{\bar{k}} by u^k where k \equiv \bar{k} \bmod p-1 with k=0 if \bar{k}=0 and 1 \leq k \leq p-1 if \bar{k}>0. So F coincides with \bar{F} as a function on \mathbb{F}_p^k, but F uses smaller monomials. For example, the reader who multiplies out the expression for \bar{F} when q=9 will find a term 2 x_0^6 y_0 z_0. In F, this is replaced by 2 x_0^2 y_0 z_0. The polynomial F does not have the nice factorization of \bar{F}, but it is much smaller. For example, when q=9, \bar{F} has 137 nonzero monomials and F has 65. Replacing \bar{F} by F can only lower degree, so \deg(F) \leq q-1. Now rerun the argument with G(x,y,z) := \prod_{f=1}^n F(x^f,y^f,z^f). Our new bound is three times the number of monomials \prod (x_e^f)^{m_e^f} of degree \leq (q-1)n/3, with the additional condition that all exponents m_e^f are \leq p-1.

Now, the monomial \prod_e x_e^{m_e} has degree \sum_e p^e m_e. Identify [0,p-1]^r with [0,q-1] by sending (m_0, m_1, \ldots, m_{r-1}) to \sum m_e p^e. We can thus think of [0,p-1]^{rn} as [0,q-1]^n. We get the bound N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|, just as in the prime case.

Naslund-Sawin — binomial coefficients

Let’s be much slicker. Here is how Naslund and Sawin do it (original here).

Notice that, by Lucas’s theorem, the function x \mapsto \binom{x}{m} is a well defined function C_q \to \mathbb{F}_p when 0 \leq m \leq q-1. Moreover, using Lucas again,

\displaystyle{\binom{x-1}{q-1}=\begin{cases} 1 & x \equiv 0 \bmod q \\ 0 & \mbox{otherwise} \end{cases}}.

Define a function F: C_q^3 \to \mathbb{F}_p by

\displaystyle{F(x,y,z) = \binom{x+y+z-1}{q-1} = \sum_{i+j+k = q-1} \binom{x-1}{i} \binom{y}{i} \binom{z}{k}}

\displaystyle{=\sum_{i+j+k \leq  q-1} (-1)^{q-1-i-j-k} \binom{x}{i} \binom{y}{i} \binom{z}{k}}.

Here we have expanded by Vandermonde’s identity and used \binom{x-1}{m} = \sum_{i \leq m} (-1)^{m-i} \binom{x}{i}.

Define a function G: C_q^{3n} \to \mathbb{F}_p by G(x,y,z) = \prod_{f=1}^n F(x^f,y^f,z^f) just as before. So G(x,y,z)=0 if and only if x+y+z \neq 0 in the abelian group C_q^n. Expanding G gives a sum of terms of the form \prod_f \binom{x^f}{i^f} \binom{y^f}{j^f} \binom{z^f}{k^f}. Considering such a term to have “degree” \sum_f (i^f+j^f+k^f), we see that G has degree \leq (q-1)n.

As in the standard proof, factor out whichever of \prod_f \binom{x^f}{i^f}, \prod_f \binom{y^f}{j^f} or \prod_f \binom{z^f}{k^f}, has least degree. We obtain

\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B and C are products of binomial coefficients and, taking \deg \binom{w}{m}=m, we have \deg A_s, \deg B_s and \deg C_s \leq (q-1)n/3.

We derive the bound N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^f \leq n(q-1)/3 \}|, exactly as before.

Group rings — Petrov’s argument

I have taken the most liberties in rewriting this argument, to emphasize the similarity with the other arguments. The reader can see the original here.

Let \Gamma = C_q^n. Let \mathbb{F}_p^{\Gamma} be the ring of functions \Gamma \to \mathbb{F}_p with pointwise operations, and let \mathbb{F}_p[\Gamma] be the group ring of \Gamma. We think of \mathbb{F}_p[\Gamma] acting on \mathbb{F}_p^{\Gamma} by \left( \left( \sum a_{\sigma} \sigma \right) f \right)(w) = \sum a_{\sigma} f(w+\sigma).

Let \tau_1, \tau_2, …, \tau_n be generators for \Gamma = C_q^n. Let Y_{\leq d} \subset \mathbb{F}_p^{\Gamma} the functions annihilated by the operators \prod_f (\tau_f-1)^{m_f} where \sum m_f = d+1. For example, Y_{\leq 1} is the functions f: \Gamma \to \mathbb{F}_p which obey f(w+\tau_i+\tau_j) - f(w+\tau_i)-f(w+\tau_j)+f(w)=0 for any w, i and j. We think of Y_{\leq d} as polynomials of degree \leq d, and the dimension of Y_{\leq d} is the number of monomials in n variables of total degree \leq d where each variable has degree \leq q-1.

Define H: \Gamma \to \mathbb{F}_p by H(0)=1 and H(w)=0 otherwise. Define G: \Gamma^3 \to \mathbb{F}_p by G(x,y,z) = H(x+y+z).

We write \alpha_i, \beta_i and \gamma_i for the generators of the three factors in \Gamma \times \Gamma \times \Gamma.
Then we have

\displaystyle{ \left( \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G\right) (x,y,z) = }

\displaystyle{ \left(  \prod (\tau_f-1)^{i_f+j_f+k_f}  H\right) (x+y+z)}.

So, if i_f+j_f+k_f = q-1, then \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G=0 as a function on \Gamma^3.

On the other hand, we can expand G(x,y,z) = \sum A_s(x) B_s(y) C_s(z) for A_s, B_s and C_s in \mathbb{F}_p^{\Gamma}. We see that, if i_f+j_f+k_f = q-1, then

\displaystyle{\sum_s \left( \prod (\alpha_f-1)^{i_f} A_s \right) (x) \left( \prod (\beta_f-1)^{j_f} B_s \right) (y) \left( \prod (\gamma_f-1)^{k_f} C \right) (z) =0}.

We make the familiar deduction: We can write G in the form

\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s run over a basis for Y_{\leq (q-1)n/3}.

Once more, we obtain the bound N \leq 3 \dim Y_{(q-1)n/3}.

Petrov’s method has an advantage not seen in the other proofs: It generalizes well to the case that \Gamma is non-abelian. For any finite group \Gamma, let I be a one-sided ideal in \mathbb{F}_p[\Gamma] obeying I^3=0. In our case, this is the ideal generated by \prod_f (\tau_f-1)^{m_f} with \sum m_f = (q-1)n/3+1. Then we obtain a bound N \leq 3 \dim \mathbb{F}_p[\Gamma]/I for sum free sets in \Gamma.

What’s going on?

I find Petrov’s proof immensely clarifying, because it explains why the arguments all give the same bound. We are all working with functions C_q \to \mathbb{F}_p. I write them as polynomials in r variables x_e, Naslund and Sawin use binomial coefficients \binom{x}{m}. The formulas to translate between our variables are a mess: For example, my x_1 is their \tfrac{1}{p} (x-x^p) \bmod p. However, we both agree on what it means to be a polynomial of degree \leq d: It means to be annihilated by (\tau-1)^{d+1}.

In both cases, we take the indicator function of the identity and pull it back to \Gamma^3 along the addition map. The first two proofs use explicit identities to see that the result has degree \leq (q-1)n. The third proof points out this is an abstract property of functions pulled back along addition of groups, and has nothing to do with how we write the functions as explicit formulas.

I sometimes think that mathematical progress consists of first finding a dozen proofs of a result, then realizing there is only one proof. My mental image is settling a wilderness — first there are many trails through the dark woods, but later there is an open field where we can run wherever we like. But can we get anywhere beyond the current bounds with this understanding? All I can say is not yet…

The growth rate for three-colored sum-free sets

There seems to be a rule that all progress on the cap set problem should be announced on blogs, so let me continue the tradition. Robert Kleinberg, Will Sawin and I have found the rate of growth of three-colored sum-free subsets of (\mathbb{Z}/q \mathbb{Z})^n, as n \to \infty. We just don’t know it is that what we’ve found!

The preprint is here.

Let me first explain the problem. Let H be an abelian group. A subset A of H is said to be free of three term arithmetic progressions if there are no solutions to a-2b+c=0 with a, b, c \in A, other than the trivial solutions (a,a,a). I’ll write C_q for the cyclic group of order q. Ellenberg and Giswijt, building on work by Croot, Lev and Pach have recently shown that such an A in C_3^n can have size at most 3 \cdot \# \{ (a_1, a_2, \ldots, a_n) \in  \{0,1,2 \}^n : \sum a_i \leq 2n/3 \}, which is \approx 2.755^n. This was the first upper bound better than 3^n/n^c, and has set off a storm of activity on related questions.

Robert Kleinberg pointed out the argument extends just as well to bound colored-sets without arithmetic progressions. A colored set is a collection of triples (a_i, b_i, c_i) in H^3, and we see that it is free of arithmetic progressions if we have a_i - 2 b_j + c_k = 0 if and only if i=j=k. So, if a_i = b_i = c_i, then this is the same as a set free of three term arithmetic progressions, but the colored version allows us the freedom to set the three coordinates separately.

Moreover, once a, b and c are treated separately, if \# H is odd, we may as well replace b_i by -2b_i and just require that a_i+b_i+c_i=0 if and only if i is odd. This is the three-colored sum-free set problem. Three-colored sum-free sets are easier to construct than three-term arithmetic-progression free sets, but the Croot-Lev-Pach/Ellenberg-Giswijt bounds apply to them as well*.

Our result is a matching of upper and lower bounds: There is a constant \gamma(q) such that

(1) We can construct three-colored sum-free subsets of C_q^n of size \exp(\gamma n-o(n)) and

(2) For q a prime power, we can show that three-colored sum-free subsets of C_q^n have size at most \exp(\gamma n).

So, what is \gamma? We suspect it is the same number as in Ellenberg-Giswijt, but we don’t know!

When q is prime, Ellenberg and Giswijt establish the bound 3 \cdot \{ (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_{\geq 0}^n : \sum a_i \leq (q-1)n/3 \}. Petrov, and independently Naslund and Sawin (in preparation), have extended this argument to prime powers.

In the set \{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \}, almost all the n-tuples have a particular mix of components. For example, when q=3, almost all n tuples have roughly 0.514n zeroes, 0.305 n ones and 0.181 n twos. The number of such n tuples is roughly \exp(n \eta(0.514, 0.305, 0.181)), where \eta(p_0, p_1, \ldots, p_k) is the entropy - \sum p_k \log p_k.

In general, let (p_0, p_1, \ldots, p_{q-1}) be the probability distribution on \{ 0,1,\ldots, q-1 \} which maximizes entropy, subject to the constraint that the expected value \sum j p_j is (q-1)/3. Then almost all n-tuples in \{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \} have roughly p_j n copies of j, and the number of such n-tuples is grows like \exp(n \cdot \eta(p_0, \ldots, p_{q-1})). I’ll call (p_0, \ldots, p_{q-1}) the EG-distribution.

So Robert and I set out to construct three-colored sum-free sets of size
\exp(n \cdot \eta(p_0, \ldots, p_{q-1})). What we were actually able to do was to construct such sets whenever there was an S_3-symmetric probability distribution on T:= \{ (a,b,c) \in \mathbb{Z}_{\geq 0} : a+b+c = q-1 \} such that p_j was the marginal probability that the first coordinate of (a,b,c) was j, and the same for the b and c coordinates. For example, in the q=3 case, if we pick (2,0,0), (0,2,0) and (0,0,2) with probability 0.181 and (1,1,0), (1,0,1) and (0,1,0) with probability 0.152, then the resulting distribution on each of the three coordinates is the EG-distribution (0.514, 0.305, 0.181), and we can realize the growth rate of the EG bound for q=3.

Will pointed out to us that, if such a probability distribution on T does not exist, then we can lower the upper bound! So, here is our result:

Consider all S_3-symmetric probability distributions on T. Let (\psi_0, \psi_1,\ldots, \psi_{q-1}) be the corresponding marginal distribution, with \psi_j the probabilty that the first coordinate of (a,b,c) will be j. Let \gamma be the largest value of \eta(\psi_0, \ldots, \psi_{q-1}) for such a \psi. Then

(1) There are three-colored sum-free subsets of C_q^n of size \exp(\gamma n - o(n)) and

(2) If q is a prime power, such sets have size at most \exp(\gamma n).

Any marginal of an S_3-symmetric distribution on T has expected value (q-1)/3, so our upper bound is at least as strong as the Ellenberg-Gisjwijt/Petrov-Naslund-Sawin bound. We suspect they are the same: That their optimal probability distribution is such a marginal. But we don’t know!

Here are a few remarks:

(1) The restriction to S_3-symmetric distributions is a notational convenience. Really, all we need is that all three marginals are equal to each other. But we might as well impose S_3-symmetry because, if all the marginals of a distribution are equal, we can just take the average over all S_3 permutations of that distribution.

(2) Our lower bound does not need q to be a prime power. I’d love to know whether the upper bound can also remove that restriction.

(3) If the largest entropy of a marginal comes from a distribution \pi_{abc} on T with all \pi_{abc}>0, then the marginal distribution is the EG distribution. The problem is about the distributions at the boundary; it seems hard to show that it is always beneficial to perturb inwards.

(4) For q \geq 7, there is more than one distribution on T with the required marginal. One canonical choice would be the one which has largest entropy given that marginal. If the optimal solution has all \pi_{abc}>0, then one can show that it factors as \pi_{abc} = \beta(a) \beta(b) \beta(c) for some function \beta:\{0,1,\ldots,q-1\} \to \mathbb{R}_{>0}.

* One exception is that Fedor Petrov has lowered the bound for AP free sets in C_3^n to 2 \cdot \{ (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_{\geq 0}^n : \sum a_i \leq 2n/3 \}+1, whereas the bound for sum-free is still 3 \cdot \{ (a_1, a_2, \ldots, a_n) \in \mathbb{Z}_{\geq 0}^n : \sum a_i \leq 2n/3 \}. But, as you will see, I am chasing much rougher bounds here.

Springer’s copyright agreement is, according to Springer, compatible with posting your article on the arXiv under the CC-BY license

It is far from clear to me that Springer’s copyright agreement (quoted below) is compatible with posting your paper on the arXiv under the CC-BY license. Happily, a Springer representative has just confirmed for me that this is allowed:

Let me first say that the cc-by-0 license is no problem at all as it allows for other publications without restrictions. Second, our copyright statement of course only talks about the version published in one of our journals, with our copyright line (or the copyright line of a partner society if applicable, or the author’s copyright if Open Access is chosen) on it.

At least if you are publishing in a Springer journal, and more generally, I would strongly encourage you to post your papers to the arXiv under the more permissive CC-BY-0 license, rather than the minimal license the arXiv requires.

As a question to any legally-minded readers: does copyright law genuinely distinguish between “the version published in one of our journals, with our copyright line”, and the “author-formatted post-peer-review” version which is substantially identical, barring the journals formatting and copyright line?


Pious penal probability puzzle

Here is a fun problem, with a great story and a surprising answer.

According to the Talmud, in order for the Sanhedrin to sentence a man to death, the majority of them must agree to it. However

R. Kahana said: If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. (Babylonian Talmud, Tractate Sanhedrin, Folio 17a)

Scott Alexander has a devious mind and considers how he would respond to this rule as a criminal:

[F]irst I’d invite a bunch of trustworthy people over as eyewitnesses, then I’d cover all available surfaces of the crime scene with fingerprints and bodily fluids, and finally I’d make sure to videotape myself doing the deed and publish the video on YouTube.

So, suppose you were on a panel of N judges, all of whom had seen overwhelming evidence of the accused’s guilt, and wanted to make sure that a majority of you would vote to convict, but not all of you. And suppose you cannot communicate. With what probability p would you vote to convict?

Test your intuition by guessing an answer now, then click below:

My gut instincts were that (1) we should choose p really close to 1, probably approaching 1 as N \to \infty and (2) there is no way this question would have a precise round answer. As you will see, I was quite wrong.

Tumblr user lambdaphagy is smarter than I was and wrote a program. Here are his or her results:

As you can see, it appears that p is not approaching 1, or even coming close to it, but is somewhere near 0.8. Can we explain this?

Heuristic solution

We want to avoid two events: unanimity, and a majority vote to acquit. The probability of unanimity is p^N.

The probability of a majority vote to acquit is \sum_{k=0}^{\lfloor N/2 \rfloor} \binom{N}{k} p^k(1-p)^{N-k}. Assuming that p>0.5, and it certainly should be, almost all of the contribution to that sum will come from terms where k \approx N/2. In that case, \binom{N}{k} \approx 2^N/\sqrt{N}. And we’ll roughly care about \sqrt{N} such terms. So the odds of acquittal are roughly \sqrt{N} \tfrac{2^N}{\sqrt{N}} p^{N/2} (1-p)^{N/2} = \sqrt{4p(1-p)}^N.

So we roughly want p^N+\sqrt{4p(1-p)}^N to be as small as possible. For N large, one of the two terms will be much larger than the other, so it is the same to ask that \max(p, \sqrt{4p(1-p)})^N be as small as possible.

Here is a plot of \max(p, \sqrt{4 p(1-p)}):

Ignore the part with p below 1/2; that’s clearly wrong and our approximation that \sum_{k=0}^{\lfloor N/2 \rfloor} \binom{N}{k} p^k(1-p)^{N-k} is dominated by k \approx N/2 won’t be good there. Over the range p>1/2, the minimum is where p=\sqrt{4p (1-p)}.

Let’s do some algebra: p = \sqrt{4p(1-p)}, p^2= 4 p(1-p), p = 4(1-p) (since p=0 is clearly wrong), p = 4/5. Holy cow, 4/5 is actually right!

Lessons learned

First of all, actually do some computations.

Secondly, I was wrongly thinking that failing by acquittal would be much more important than failing by unanimity. I think I was mislead because one of them occurs for N/2 values of k and the other only occurs for one value. I should have realized two things (1) the bell curve is tightly peaked, so it is really only the k very close to N/2 which matter and (2) exponentials are far more powerful than the ratio between N or \sqrt{N} and 1 anyway.

Rigorous computation

Finally, for the skeptics, here is an actual proof. Assuming p>1/2, we have
\displaystyle{ \sum_{k=0}^{\lfloor N/2 \rfloor} \binom{N}{k} p^k(1-p)^{N-k} \leq  \sum_{k=0}^{\lfloor N/2 \rfloor} \binom{N}{k} p^{N/2} (1-p)^{N/2} \leq  2^N p^{N/2} (1-p)^{N/2}}.
The main step is to replace each p^k(1-p)^{N-k} by the largest it can be.

But also,
\displaystyle{ \sum_{k=0}^{\lfloor N/2 \rfloor} \binom{N}{k} p^k(1-p)^{N-k} \geq \binom{N}{\lfloor N/2 \rfloor} p^{\lfloor N/2 \rfloor} (1-p)^{\lceil N/2 \rceil} \geq \frac{1}{N+1} \sqrt{\frac{1-p}{p}} \ 2^N p^{N/2} (1-p)^{N/2}} .
Here we have lower bounded the sum by one of its terms, and then used the easy bound \binom{N}{\lfloor N/2 \rfloor} \geq \frac{1}{N+1} 2^N since it is the largest of the N+1 entries in a row of Pascal’s triangle which sums to 2^N.

So the odds of failure are bounded between
\tfrac{1}{N+1} \sqrt{\frac{1-p}{p}} \sqrt{4p(1-p)}^N+p^N and \sqrt{4p(1-p)}^N+p^N. We further use the convenient trick of replacing a + with a \max, up to bounded error to get that the odds of failure are bounded between \max \left( \tfrac{1}{N+1} \sqrt{\frac{1-p}{p}} \sqrt{4p(1-p)}^N, p^N \right) and 2 \max \left( \sqrt{4p(1-p)}^N, p^N \right).

Now, let p be a probability greater than 1/2 other than 4/5. We claim that choosing conviction probability 4/5 will be better than p for N large. Indeed, the p-strategy will fail with odds at least \max \left( \tfrac{1}{N+1} \sqrt{\frac{1-p}{p}} \sqrt{4p(1-p)}^N, p^N \right), and the 4/5 strategy will fail with odds at most 2 \cdot (4/5)^N. Since \max(p, \sqrt{4p(1-p)}) > 4/5, one of the two exponentials in the first case is larger than 4/5, and the p-strategy is more likely to fail, as claimed.

Of course, for a Sanhedrin of 23 members, 2 (4/5)^{23} \approx 0.011, so our upper bound predicts only a one percent probability of failure. More accurately computations give 0.005. So the whole conversation deals with the overly detailed analysis of an unlikely consequence of a bizarre hypothetical event. Fortunately, this is not a problem in the study of Talmud!