The shape of a random partition

In this post we will give a heuristic derivation of a result of Vershik, describing the shape of a random partition of a large integer $N$. (Vershik’s Russian original is available here; English translation is pay-walled.)

By a partition of $N$, we mean positive integers $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$ with $\sum \lambda_i = N$. We draw a partition as a collection of boxes: For example, this is $4+2+1+1+1$:

Suppose we let $N \to \infty$, select partitions of $N$ uniformly at random and rescale the size of the boxes by $1/\sqrt{N}$, so that the diagram of the partition always has area $1$. What is the shape of the most likely diagram?

We want to describe, given a particular shape, how many partitions have roughly that shape. To that end, let $(x_1, y_1)$, $(x_2, y_2)$, …, $(x_r, y_r)$ be a sequence of points with $x_1 and $y_1>y_2>\cdots>x_r$. Let's count partitions which fit inside the box $\{ (x,y) : x \leq x_r ,\ y \leq y_1 \}$ and whose boundary passes through the points $(x_i, y_i)$.

To describe such a partition, we simply must describe the portion between $(x_1, y_1)$ and $(x_2, y_2)$, and then describe the portion between $(x_2, y_2)$ and $(x_3, y_3)$, and so forth; each of these portions is independent. Set $(\delta x)_i=x_{i+1}-x_i$ and $(\delta y)_i = y_{i+1}-y_{i}$ (so $(\delta y)_i<0$). Set $(\delta t)_i = (\delta x)_i+(-(\delta y)_i)$.

So the number of such partitions is

$\displaystyle{\prod_i \binom{(\delta t)_i}{(\delta x)_i} }$.

From our computations in an earlier post, this is roughly

$\displaystyle{\exp \left( \sum_i (\delta t)_i \cdot h\left( \frac{\delta x_i}{\delta t_i} \right) \right)}$

where $h(p) = - p \log p - (1-p) \log (1-p)$. Moreover, if we rescale all of the $x_i$ and $y_i$ by $\sqrt{N}$, this approximation becomes better.
So, if we want there to be a lot of partitions, we should make $\sum (\delta t)_i h\left( \frac{\delta x_i}{\delta t_i} \right)$ large.

We have now described what happens if we require the boundary of the partition to pass through finitely many fixed points. Suppose instead we insist that the partition lie near a fixed curve? Parametrize that curve as $(x(t),y(t))$, with the normalization that $x(t)-y(t) = t$. I’m not even going to say what I mean precisely, but we can extrapolate from the preceding formula that the number of partitions whose boundary is near $\sqrt{N} (x(t), y(t))$ is roughly

$\displaystyle{\exp \left( \sqrt{N} \int_t h\left(\frac{dx}{dt} \right) dt \right)}$

We thus obtain a problem in the calculus of variations: Consider curves $(x(t), y(t))$, with the normalizations that $x(t)-y(t)=t$ and the area enclosed by $(x(t), y(t))$ and the coordinate axes is $1$. Subject to these conditions, maximize $\int h(x'(t)) dt$. The next several paragraphs are solving this variational problem.

Rotating the coordinates $45^{\circ}$, we see that the condition on the enclosed area is simply that $\int (x(t)+y(t)-|t|) dt = \sqrt{2}$.

As always in problems involving the calculus of variations, the first step is to perturb our proposed solution and see how it changes. Replace $(x(t), y(t))$ by $(x(t)+\epsilon u(t), y(t)+\epsilon v(t))$. If $(x(t), y(t))$ was optimal, than $\epsilon=0$ should be a maximum, so the first order variation in $\epsilon$ should vanish.

This is only valid if our perturbation preserves the conditions on $(x(t), y(t))$. We want the perturbed functions to obey $x(t)-y(t)=t$, so we take $u(t)=v(t)$, and we want to preserve the area, so we take $\int u(t) dt =0$.
For such a perturbation, the first order variation in $\epsilon$ is supposed to vanish. We compute:

$0 = \left. \frac{\partial}{\partial \epsilon} \right|_{\epsilon=0} \int_t h(x'(t)+\epsilon u'(t)) dt = \int u'(t) h'(x'(t)) dt$.

Integrating by parts:

$0 = \int u(t) \frac{d h'(x'(t))}{dt} dt$.

We want this to hold for any $u$ obeying $\int u(t)=0$. This will happen if and only if $\frac{d h(x'(t))}{dt}$ is a constant. So we want $h'(x'(t)) = Ct+D$ for some constants $C$ and $D$. So $x(t) = \int (h')^{-1}(Ct+D)$.

The formula $x(t) = \int (h')^{-1}(Ct+D)$, in some sense, solves the variational problem. Our next task is to simplify that formula.

From the previous post, we know that $\int h^{-1}(u) du = \log(1+e^u)$. So $x(t) = (1/C) \log (1+e^{Ct+D}) + B$, where $B$ is another constant of integration.

As $t \to - \infty$, we want $x(t) \to 0$. This shows that $B=0$.

As $t \to \infty$, we want $x(t)-t \to 0$. This shows that $D=0$. So we have $x(t) = (1/C) \log(1+e^{Ct})$ and $y(t) = (1/C) \log(1+e^{-Ct})$.

To evaluate the final constant $C$, we need to use that the area under the curve is $1$. It is convenient to eliminate $t$ and relate $x$ and $y$ directly as $e^{-C x} + e^{-C y} = 1$, or, $y = (1/C) \log \frac{1}{1-e^{-C x}}$. Now, the area under the curve is $\int_{x=0}^{\infty} \frac{dx}{C} \log \frac{1}{1-e^{-Cx}} = \frac{1}{C} \sum_{n=1}^{\infty} \frac{1}{n} \int_{x=0}^{\infty} e^{-nCx} dx = \frac{1}{C^2} \sum_{n=1}^{\infty} \frac{1}{n^2}$. We deduce that $C = \tfrac{\pi}{\sqrt{6}}$.

In other words, our final answer is that the curve is $e^{-\pi x/\sqrt{6}} + e^{-\pi y/\sqrt{6}} = 1$.

How good is this approximation? See for yourself! Here is a random partition of $1000$, and the approximating curve.

The thing which I think is really cool is that we never needed the explicit formula $h(p) = -p \log p - (1-p) \log p$; we only needed that it was Legendre dual to $\log(1+e^x)$. Suppose that we looked at some other sort of local probability distribution on partitions: For example, maybe we would weight the likehood of a partition by $\exp(\mbox{number of outer corners})$ or something like that. We could still build a generating function $Z_n(x)$ for partitions fitting in a $k \times (n-k)$ box with this weighting. We could then try to compute $z(x) = \lim_{n \to \infty} \frac{1}{n} \log Z_n(x)$. If we succeeded, the exact same logic as before would show that $z(x)$ was the shape of a random partition of area $N$ chosen from this distribution. Which brings me to the thing I really want to talk about … next time.

6 thoughts on “The shape of a random partition”

1. Curious now; how does one generate a uniform random partition?

2. Like this :).

Without that: Let $q(k,n)$ be the number of partitions of $n$ whose largest part is at most $k$. We can quickly compute a table of values of $q$ by the recursion $q(k,n) = q(k-1,n) + q(k,n-k)$.

We then write a recursive function $r(k,n)$ which builds a random partition of $n$ with largest part at most $k$: If $k=1$, return $(1,1,\ldots,1)$. Otherwise, choose an integer $j$ between $1$ and $k$ with probability $(q(j,n)-q(j-1,n))/q(k,n)$ and return $(j, \lambda)$, where $\lambda$ is the output of $r(j,n-j)$.

I had this implemented and it ran pretty much instantly up to $n=250$ or so, when I discovered that Mathematica already had a function to do this.

3. Can you say anything analogous for choosing a random partition according to Plancherel measure?