Bounds for sum free sets in prime power cyclic groups — three ways

A few weeks ago, I e-mailed Will Sawin excitedly to tell him that I could extend the new bounds on three-term arithmetic progression free subsets of (\mathbb{Z}/p \mathbb{Z})^n to (\mathbb{Z}/p^r \mathbb{Z})^n. Will politely told me that I was the third team to get there — he and Eric Naslund already had the result, as did Fedor Petrov. But I think there might be some expository benefit in writing up the three arguments, to see how they are all really the same trick underneath.

Here is the result we are proving: Let q=p^r be a prime power and let C_q be the cyclic group of order q. Let A \subset C_q^n be a set which does not contain any three term arithmetic progression, except for the trivial progressions (a,a,a). Then

\displaystyle{|A| \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|.}

The exciting thing about this bound is that it is exponentially better than the obvious bound of q^n. Until recently, all people could prove was bounds like q^n/n^c, and this is still the case if q is not a prime power.

All of our bounds extend to the colored version: Let (\bar{a}_i, \bar{b}_i, \bar{c}_i) be a list of N triples in ((C_q)^n)^3 such that \bar{a}_i-2 \bar{b}_i+\bar{c}_i=0, but \bar{a}_i-2\bar{b}_j+\bar{c}_k \neq 0 if (i,j,k) are not all equal. Then the same bound applies to N. To see that this is a special case of the previous problem, take \bar{a}_i = \bar{b}_i = \bar{c}_i. Once the problem is cast this way, if q is odd, one might as well define (a_i, b_i, c_i) = (\bar{a}_i, -2 \bar{b}_i, \bar{c}_i), so our hypotheses are that a_i+b_i+c_i = 0 but a_i+b_j+c_k \neq 0 if (i,j,k) are not all equal. We will prove our bounds in this setting.

My argument — Witt vectors

I must admit, this is the least slick of the three arguments. The reader who wants to cut to the slick versions may want to scroll down to the other sections.

We will put an abelian group structure \oplus on the set \mathbb{F}_p^r which is isomorphic to C_q, using formulas found by Witt. I give an example first: Define an addition on \mathbb{F}_3^2 by

\displaystyle{(x_0,x_1) \oplus (y_0, y_1) = (x_0+y_0, x_1+y_1 - x_0^2 y_0 - x_0 y_0^2)}

The reader may enjoy verifying that this is an associative addition, and makes \mathbb{F}_3^2 into a group isomorphic to C_9. For example, (1,0) \oplus (1,0) \oplus (1,0) = (2,1) \oplus (1,0) = (0,1) and (0,1) \oplus (0,1) \oplus (0,1) = (0,0).

In general, Witt found formulas

\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) = (S_0(x,y), S_1(x,y), \ldots, S_{r-1}(x,y))}

such that \mathbb{F}_p^r becomes an abelian group isomorphic to C_q. If we define x_e and y_e to have degree p^e, then S_e is homogenous of degree p^e. (Of course, Witt did much more: See Wikipedia or Rabinoff.)


\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) = (S_0(x,y,z), S_1(x,y,z), \ldots, S_{r-1}(x,y,z))}.

and set

\displaystyle{\bar{F}(x,y,z) = \prod_{i=0}^{r-1} \left( 1-S_i(x,y,z)^{p-1} \right)}.

For example, when q=9, we have

\displaystyle{\bar{F} = \left( 1-(x_0+y_0+z_0)^2 \right) \left( 1-(x_1+y_1+z_1 - x_0^2 y_0 - x_0 y_0^2 - x_0^2 z_0 - x_0 z_0^2 - y_0^2 z_0 - y_0 z_0^2 + x_0 y_0 z_0)^2 \right)}.

So \bar{F}(x,y,z)=0 if and only if (x_0, \ldots, x_{r-1}) \oplus (y_0, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) \neq 0 in C_q.

We now work with 3kn variables, x_e^f, y_e^f and z_e^f, where 0 \leq e \leq k-1 and 1 \leq f \leq n. Consider the polynomial

\displaystyle{ \bar{G} = \prod_{f=1}^n \bar{F}(x^f, y^f, z^f) }.

Here each \bar{F} is a polynomial in 3k variables.

So \bar{G} is a polynomial on (\mathbb{F}_p^k)^{3n}. We identify this domain with C_q^{3n}. Then \bar{G}(x,y,z)=0 if and only if x+y+z \neq 0 in the group C_q^n.

We define the degree of a monomial in the x_e^f, y_e^f and z_e^f by setting \deg(x_e^f)=\deg(y_e^f)=\deg(z_e^f)=p^e. In this section, “degree” always has this meaning, not the standard one. The degree of S_e is p^e; the degree of \bar{F} is (p-1) + p(p-1) + p^2(p-1) + \cdots + p^{r-1}(p-1) = p^r-1 = q-1 and the degree of \bar{G} is (q-1)n.

From each monomial in \bar{G}, extract whichever of \prod (x_e^f)^{i_e^f}, \prod (y_e^f)^{j_e^f} or \prod (z_e^f)^{k_e^f} has lowest degree. We see that we can write

\displaystyle{ \bar{G}(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s are monomials of degree \leq (q-1)n/3.

The now-standard argument (I like Terry Tao’s exposition) shows that N is bounded by three times the number of monomials \prod (x_e^f)^{m_e^f} of degree \leq (q-1)n/3. One needs to check that the argument works when the “degree” of a variable need not be 1, but this is straightforward.

Except we have a problem! There are too many monomials. To solve this issue, let F be the polynomial obtained from \bar{F} by replacing every monomial u^{\bar{k}} by u^k where k \equiv \bar{k} \bmod p-1 with k=0 if \bar{k}=0 and 1 \leq k \leq p-1 if \bar{k}>0. So F coincides with \bar{F} as a function on \mathbb{F}_p^k, but F uses smaller monomials. For example, the reader who multiplies out the expression for \bar{F} when q=9 will find a term 2 x_0^6 y_0 z_0. In F, this is replaced by 2 x_0^2 y_0 z_0. The polynomial F does not have the nice factorization of \bar{F}, but it is much smaller. For example, when q=9, \bar{F} has 137 nonzero monomials and F has 65. Replacing \bar{F} by F can only lower degree, so \deg(F) \leq q-1. Now rerun the argument with G(x,y,z) := \prod_{f=1}^n F(x^f,y^f,z^f). Our new bound is three times the number of monomials \prod (x_e^f)^{m_e^f} of degree \leq (q-1)n/3, with the additional condition that all exponents m_e^f are \leq p-1.

Now, the monomial \prod_e x_e^{m_e} has degree \sum_e p^e m_e. Identify [0,p-1]^r with [0,q-1] by sending (m_0, m_1, \ldots, m_{r-1}) to \sum m_e p^e. We can thus think of [0,p-1]^{rn} as [0,q-1]^n. We get the bound N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|, just as in the prime case.

Naslund-Sawin — binomial coefficients

Let’s be much slicker. Here is how Naslund and Sawin do it (original here).

Notice that, by Lucas’s theorem, the function x \mapsto \binom{x}{m} is a well defined function C_q \to \mathbb{F}_p when 0 \leq m \leq q-1. Moreover, using Lucas again,

\displaystyle{\binom{x-1}{q-1}=\begin{cases} 1 & x \equiv 0 \bmod q \\ 0 & \mbox{otherwise} \end{cases}}.

Define a function F: C_q^3 \to \mathbb{F}_p by

\displaystyle{F(x,y,z) = \binom{x+y+z-1}{q-1} = \sum_{i+j+k = q-1} \binom{x-1}{i} \binom{y}{i} \binom{z}{k}}

\displaystyle{=\sum_{i+j+k \leq  q-1} (-1)^{q-1-i-j-k} \binom{x}{i} \binom{y}{i} \binom{z}{k}}.

Here we have expanded by Vandermonde’s identity and used \binom{x-1}{m} = \sum_{i \leq m} (-1)^{m-i} \binom{x}{i}.

Define a function G: C_q^{3n} \to \mathbb{F}_p by G(x,y,z) = \prod_{f=1}^n F(x^f,y^f,z^f) just as before. So G(x,y,z)=0 if and only if x+y+z \neq 0 in the abelian group C_q^n. Expanding G gives a sum of terms of the form \prod_f \binom{x^f}{i^f} \binom{y^f}{j^f} \binom{z^f}{k^f}. Considering such a term to have “degree” \sum_f (i^f+j^f+k^f), we see that G has degree \leq (q-1)n.

As in the standard proof, factor out whichever of \prod_f \binom{x^f}{i^f}, \prod_f \binom{y^f}{j^f} or \prod_f \binom{z^f}{k^f}, has least degree. We obtain

\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B and C are products of binomial coefficients and, taking \deg \binom{w}{m}=m, we have \deg A_s, \deg B_s and \deg C_s \leq (q-1)n/3.

We derive the bound N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^f \leq n(q-1)/3 \}|, exactly as before.

Group rings — Petrov’s argument

I have taken the most liberties in rewriting this argument, to emphasize the similarity with the other arguments. The reader can see the original here.

Let \Gamma = C_q^n. Let \mathbb{F}_p^{\Gamma} be the ring of functions \Gamma \to \mathbb{F}_p with pointwise operations, and let \mathbb{F}_p[\Gamma] be the group ring of \Gamma. We think of \mathbb{F}_p[\Gamma] acting on \mathbb{F}_p^{\Gamma} by \left( \left( \sum a_{\sigma} \sigma \right) f \right)(w) = \sum a_{\sigma} f(w+\sigma).

Let \tau_1, \tau_2, …, \tau_n be generators for \Gamma = C_q^n. Let Y_{\leq d} \subset \mathbb{F}_p^{\Gamma} the functions annihilated by the operators \prod_f (\tau_f-1)^{m_f} where \sum m_f = d+1. For example, Y_{\leq 1} is the functions f: \Gamma \to \mathbb{F}_p which obey f(w+\tau_i+\tau_j) - f(w+\tau_i)-f(w+\tau_j)+f(w)=0 for any w, i and j. We think of Y_{\leq d} as polynomials of degree \leq d, and the dimension of Y_{\leq d} is the number of monomials in n variables of total degree \leq d where each variable has degree \leq q-1.

Define H: \Gamma \to \mathbb{F}_p by H(0)=1 and H(w)=0 otherwise. Define G: \Gamma^3 \to \mathbb{F}_p by G(x,y,z) = H(x+y+z).

We write \alpha_i, \beta_i and \gamma_i for the generators of the three factors in \Gamma \times \Gamma \times \Gamma.
Then we have

\displaystyle{ \left( \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G\right) (x,y,z) = }

\displaystyle{ \left(  \prod (\tau_f-1)^{i_f+j_f+k_f}  H\right) (x+y+z)}.

So, if i_f+j_f+k_f = q-1, then \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G=0 as a function on \Gamma^3.

On the other hand, we can expand G(x,y,z) = \sum A_s(x) B_s(y) C_s(z) for A_s, B_s and C_s in \mathbb{F}_p^{\Gamma}. We see that, if i_f+j_f+k_f = q-1, then

\displaystyle{\sum_s \left( \prod (\alpha_f-1)^{i_f} A_s \right) (x) \left( \prod (\beta_f-1)^{j_f} B_s \right) (y) \left( \prod (\gamma_f-1)^{k_f} C \right) (z) =0}.

We make the familiar deduction: We can write G in the form

\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s run over a basis for Y_{\leq (q-1)n/3}.

Once more, we obtain the bound N \leq 3 \dim Y_{(q-1)n/3}.

Petrov’s method has an advantage not seen in the other proofs: It generalizes well to the case that \Gamma is non-abelian. For any finite group \Gamma, let I be a one-sided ideal in \mathbb{F}_p[\Gamma] obeying I^3=0. In our case, this is the ideal generated by \prod_f (\tau_f-1)^{m_f} with \sum m_f = (q-1)n/3+1. Then we obtain a bound N \leq 3 \dim \mathbb{F}_p[\Gamma]/I for sum free sets in \Gamma.

What’s going on?

I find Petrov’s proof immensely clarifying, because it explains why the arguments all give the same bound. We are all working with functions C_q \to \mathbb{F}_p. I write them as polynomials in r variables x_e, Naslund and Sawin use binomial coefficients \binom{x}{m}. The formulas to translate between our variables are a mess: For example, my x_1 is their \tfrac{1}{p} (x-x^p) \bmod p. However, we both agree on what it means to be a polynomial of degree \leq d: It means to be annihilated by (\tau-1)^{d+1}.

In both cases, we take the indicator function of the identity and pull it back to \Gamma^3 along the addition map. The first two proofs use explicit identities to see that the result has degree \leq (q-1)n. The third proof points out this is an abstract property of functions pulled back along addition of groups, and has nothing to do with how we write the functions as explicit formulas.

I sometimes think that mathematical progress consists of first finding a dozen proofs of a result, then realizing there is only one proof. My mental image is settling a wilderness — first there are many trails through the dark woods, but later there is an open field where we can run wherever we like. But can we get anywhere beyond the current bounds with this understanding? All I can say is not yet…

2 thoughts on “Bounds for sum free sets in prime power cyclic groups — three ways

  1. This all seems quite similar to Schanuel’s work on generalising Chevalley-Warning to prime congruences (with applications to Davenport constant): Also see Section 2.3 in the paper by Clark, Forrow and Schmitt: where the discuss this. Though I haven’t checked all the details yet. But when I saw Petrov’s linear algebraic proof of cap-set bound and then a comment he made to me that he has proved the result for prime powers as well, this is one of the first thing that came to my mind.

  2. Thanks! Yes, these papers are definitely using similar tricks to get between characteristic p and p-th powers.

    The applications to additive number theory are also interesting. I have to assume experts already tried these methods for the cap set problem and got nowhere, but I haven’t tried, so I’ll give it some thought later today.

Comments are closed.