Mathematics Literature Project progress

We’ve made some good progress over at the Mathematics Literature Project. In particular, we’ve completely analyzed the 2013 issues of five journals:

(The colour coded bars show the fractions of papers available on the arXiv, available on authors’ webpages, and not freely accessible at all; these now appear all over the wiki, but unfortunately don’t update automatically. Over at the wiki you can hover over these bars to get the numerical totals, too.)

Thanks everyone for your contributions so far! If you’ve just arrived, check out the tutorial I made on editing the wiki. Now, it’s time to do a little planning.

What questions should we be asking?

Here’s one we can start to answer right away.

What fraction of recent papers are available on the arXiv or on authors webpages?

For good generalist journals (e.g. Adv. Math. and Annals), almost everything! For subject area journals, there is wide variation (probably mostly depending on traditions in subfields): AGT is almost completely freely accessible, while Discrete Math. is at most half.

I hope we’ll soon be able to say this for many other journals, too.

Here’s the question I really want to have answers for:

Does being freely accessible correlate well with quality?

It’s certainly tempting to think so, seeing how accessible Advances and Annals are. I think to really answer this question we’re going to have to classify all the articles in slightly older issues (2010?) and then start looking at the citation counts for articles in the two pools. If we get coverage of more journals, we can also look for the correlation between, say, impact factor and the ratio of freely accessible content.

What next?

I don’t want to just list every journal on the wiki; it’s best if editors (and the helpful bots working in the background) can focus attention and enjoy the pleasures of finishing off issues and journals. Suggestions for journals to add next welcome in the comments. I’ve already included the tables of contents for the Journal of Number Theory, and the Journal of Functional Analysis. (It will be nice to be able to make comparisons between JFA and GAFA, I think.)

I’ve been working with some people on automating the entry of data in the wiki (mainly by using arXiv metadata; there are actually way more articles there with journal references and DOIs than I’d expected). Hopefully this will make the wiki editing experience more fun, as a lot of the work will have already been done, and humans just get to handle the hard and interesting cases.

An editable database tracking freely accessible mathematics literature.

(This post continues a discussion started by Tim Gowers on google+. [1] [2])

(For the impatient, go visit, or for the really impatient

It would be nice to know how much of the mathematical literature is freely accessible. Here by ‘freely accessible’ I mean “there is a URL which, in any browser anywhere in the world, resolves to the contents of the article”. (And my intention throughout is that this article is legitimately hosted, either on the arxiv, on an institutional repository, or on an author’s webpage, but I don’t care how the article is actually licensed.) I think it’s going to be okay to not worry too much about discrepancies between the published version and a freely accessible version — we’re all grown ups and understand that these things happen. Perhaps a short comment field, containing for example “minor differences from the published version” could be provided when necessary.

This post outlines an idea to achieve this, via a human editable database containing the tables of contents of journals, and links, where available, to a freely accessible copy of the articles.

It’s important to realize that the goal is *not* to laboriously create a bad search engine. Google Scholar already does a very good job of identifying freely accessible copies of particular mathematics articles. The goal is to be able to definitively answer questions such as “which journals are primarily, or even entirely, freely accessible?”, to track progress towards making the mathematical literature more accessible, and finally to draw attention to, and focus enthusiasm for, such progress.

I think it’s essential, although this is not obvious, that at first the database is primarily created “by hand”. Certainly there is scope for computer programs to help a lot! (For example, by populating tables of contents, or querying google scholar or other sources to find freely accessible versions.) Nevertheless curation at the per-article level will certainly be necessary, and so whichever route one takes it must be possible for humans to edit the database. I think that starting off with the goal of primarily human contributions achieved two purposes: one, it provides an immediate means to recruit and organize interested participants, and two, hopefully it allows much more flexibility in the design and organization of the collected data — hopefully many eyes will reveal bad decisions early, while they’re easy to fix.

That said, we better remember that eventually computers may be very helpful, and avoid design decisions that make computer interaction with the database difficult.

What should this database look like? I’m imagining a website containing a list of journals (at first perhaps just one), and for each journal a list of issues, and for each issue a table of contents.

The table of contents might be very simple, having as few as four columns: the title, the authors, the link to the publishers webpage, and a freely accessible link, if known. All these lists and table of contents entries must be editable by a user — if, for example no freely accessible link is known, this fact should be displayed along with a prominent link or button which allows a reader to contribute one.

At this point I think it’s time to consider what software might drive this website. One option is to build something specifically tailored to the purpose. Another is to use an essentially off-the-shelf wiki, for example tiddlywiki as Tim Gowers used when analyzing an issue of Discrete Math.

Custom software is of course great, but it takes programming experience and resources. (That said, perhaps not much — I’m confident I could make something usable myself, and I know people who could do it in a more reasonable timespan!) I want to essentially ignore this possibility, and instead use mediawiki (the wiki software driving wikipedia) to build a very simple database that is readable and editable by both humans and computers. If you’re impatient, jump to and start editing! I’ve previously used it to develop the Knot Atlas at with Dror Bar-Natan (and subsequently many wiki editors). There we solved a very similar set of problems, achieving human readable and editable pages, with “under the hood” a very simple database maintained directly in the wiki.

The quest for narrow admissible tuples

(A guest post by Andrew Sutherland.)

With more than 400 comments tacked on to the previous blog post, it’s past time to rollover to a new one. As just punishment for having contributed more than my fair share of those comments, Scott has asked me to write a guest post summarizing the current state of affairs. This task is made easier by Tao’s recent progress report on the polymath project to sharpen Zhang’s result on bounded gaps between primes. If you haven’t already read the progress report I encourage you to do so, but for the benefit of newcomers who would like to understand how our quest for narrow admissible tuples fits in the bounded prime gaps polymath project, here goes.

The Hardy-Littlewood prime tuples conjecture states that every admissible tuple has infinitely many translates that consist entirely of primes. Here a tuple is simply a set of integers, which we view as an increasing sequence t_1 < t_2 < \ldots < t_k; we refer to a tuple of size k as a k-tuple. A tuple is admissible if it does not contain a complete set of residues modulo any prime p. For example, 0,2,4 is not an admissible 3-tuple, but both 0,2,6 and 0,4,6 are. A translate of a tuple is obtained by adding a fixed integer to each element; the sequences 5,7,11 and 11,13,17 are the first two translates of 0,2,6 that consist entirely of primes, and we expect that there are infinitely more. Admissibility is clearly a necessary condition for a tuple to have infinitely many translates made up of primes; the conjecture is that it is also sufficient.

Zhang proved a weakened form the prime tuples conjecture, namely, that for all sufficiently large k\ge k_0, every admissible k-tuple has infinitely many translates that contain at least 2 primes (as opposed to k). He made this result explicit by showing that one may take k_0=3,500,000, and then noted the existence of an admissible k_0-tuple with diameter (difference of largest and smallest elements) less than 70,000,000. Zhang’s k_0-tuple consists of the first k_0 primes greater than k_0, which is clearly admissible. As observed by Trudgian, the diameter of this k_0-tuple is actually less than 60,000,000 (it is precisely 59,874,954).

Further improvements to Zhang’s bound came rapidly, first by finding narrower admissible k_0-tuples, then by optimizing k_0 and the critical parameter \varpi on which it depends (this means making \varpi larger; k_0 is proportional to \varpi^{-3/2}). Since it began on June 4, the polymath8 project has been working along three main lines of attack: (1) improving bounds on \varpi and a related parameter \delta, (2) deriving smaller values of k_0 from a given pair (\varpi, \delta), and (3) the search for narrow admissible k_0-tuples.You can see the steady progress that has been made on these three interlocking fronts by viewing the list of world records.

A brief perusal of this list makes it clear that, other than some quick initial advances made by tightening obvious slack in Zhang’s bounds, most of the big gains have come from improving the bounds on \varpi (edit: as pointed out by v08ltu below, reducing the dependence of k_0 on \varpi from \varpi^{-2} to \varpi^{-3/2} was also a major advance); see Tao’s progress report and related blog posts for a summary of this work. Once new values of \varpi and \delta have been established, it is now relatively straight-forward to derive an optimal k_0 (at least within 1 or 2; the introduction of Pintz’s method has streamlined this process). There then remains the task of finding admissible k_0-tuples that are as narrow as possible; it is this last step that is the subject of this blog post and the two that preceded it. Our goal is to compute H(k_0), the smallest possible diameter of an admissible k_0-tuple, or at least to obtain bounds (particularly upper bounds) that are as tight as we can make them.

A general way to construct a narrow admissible k_0-tuple is to suppose that we first sieve the integers of one residue class modulo each prime p\le k_0 and then choose a set of k_0 survivors, preferably ones that are as close together as possible. In fact, it is usually not necessary to sieve a residue class for every prime p\le k_0 in order to obtain an admissible k_0-tuple, asymptotically an O(k_0/\log k_0) bound should suffice. The exact number of residue classes that require sieving depends not only on k_0, but also on the interval in which one looks for survivors (it could also depend on the order in which one sieves residue classes, but we will ignore this issue).

All of the initial methods we considered involved sieving residue classes 0 mod p, and varied only in where to look for the survivors. Zhang takes the first k_0 survivors greater than 1 (after sieving modulo primes up to k_0), and Morrison’s early optimizations effectively did the same, but with a lower sieving bound. The Hensley-Richards approach instead selects survivors from an interval centered at the origin, and the asymmetric Hensley-Richards optimization shifts this interval slightly (see our wiki page for precise descriptions of each of these approaches, along with benchmark results for particular k_0 values of interest).

But there are sound practical reasons for not always sieving 0 mod p. Assuming we believe the prime tuples conjecture (which we do!), we can certainly find an optimally narrow admissible k_0-tuple somewhere among the primes greater than k_0, all of which survive sieving 0 modulo primes p\le k_0. However, the quantitative form of the prime tuples conjecture tells us roughly how far we might need to search in order to find one. The answer is depressingly large: the expected number of translates of any particular admissible k_0-tuple to be found among the primes p\le x is O(x/\log^{k_0}x), thus we may need to search through the primes in an interval of size exponential in k_0 in order to have a good chance of finding even one translate of the k_0-tuple we seek.

Schinzel suggested that it would be better to sieve 1 mod 2 rather than 0 mod 2, and more generally to sieve 1 mod p for all primes up to some intermediate bound and then switch to sieving 0 mod p for the remaining primes. We find that simply following Schinzel’s initial suggestion, works best, and one can see the improvement this yields on the benchmarks page (unlike Schinzel, we don’t restrict ourselves to picking the first k_0 survivors to the right of the origin, we may shift the interval to obtain a better bound).

But sieving a fixed set of residue classes is still too restrictive. In order to find narrower admissible tuples we must relax this constraint and instead consider a greedy approach, where we start by picking an interval in which we hope to find k_0 survivors (we know that the size of this interval should be just slightly larger than k_0 \log k_0), and then run through the primes in order, sieving whichever residue class is least occupied by survivors (we can break ties in any way we like, including at random). Unfortunately a purely greedy approach does not work very well. What works much better is to start with a Schinzel sieve, sieving 1 mod 2 and 0 mod primes up to a bound slightly smaller than \sqrt{k_0\log k_0}, and then start making greedy choices. Initially the greedy choice will tend to be the residue class 0 mod p, but it will deviate as the primes get larger. For best results the choice of interval is based on the success of the greedy sieving.

This is known as the “greedy-greedy” algorithm, and while it may not have a particularly well chosen name (this is the downside of doing math in a blog comment thread, you tend to throw out the first thing that comes to mind and then get stuck with it), it performs remarkably well. For the values of k_0 listed in the benchmarks table, the output of the greedy-greedy algorithm is within 1 percent of the best results known, even for k_0 = 342, where the optimal value is known.

But what about that last 1 percent? Here we switch to local optimizations, taking a good admissible tuple (e.g. one output by the greedy-greedy algorithm) and trying to make it better. There are several methods for doing this, some involve swapping a small set of sieved residue classes for a different set, others shift the tuple by adding elements at one end and deleting them from the other. Another approach is to randomly perturb the tuple by adding additional elements that make it inadmissible and then re-sieving to obtain a new admissible tuple. This can be done in a structured way by using a randomized version of the greedy-greedy algorithm to obtain a similar but slightly different admissible tuple in approximately the same interval, merging it with the reference tuple, and then re-sieving to obtain a new admissible tuple. These operations can all be iterated and interleaved, ideally producing a narrower admissible tuple. But even when this does not happen one often obtains a different admissible tuple with the same diameter, providing another reference tuple against which further local optimizations may be applied.

Recently, improvements in the bounds on \varpi brought k_0 below 4507, and we entered a regime where good bounds on H(k) are already known, thanks to prior work by Engelsma. His work was motivated by the second Hardy-Littlewood conjecture, which claims that \pi(x+y)-\pi(x)\le \pi(y) for all x,y\ge 2, a claim that Hensley and Richards showed is asymptotically incompatible with the prime tuples conjecture (and now generally believed to be false). Engelsma was able to find an admissible 447-tuple with diameter 3158, implying that if the prime tuples conjecture holds then there exists an x (infinitely many in fact) for which \pi(x+3159)-\pi(x)=447, which is greater than \pi(3159) = 446. In the process of obtaining this result, Engelsma spent several years doing extensive computations, and obtained provably optimal bounds on H(k) for all k\le 342, as well as upper bounds on H(k) for k \le 4507. The quality of these upper bounds is better in some regions than in others (Engelsma naturally focused on the areas that were most directly related to his research), but they are generally quite good, and for k up to about 700 believed to be the best possible.

We have now merged our results with those of Engelsma and placed them in an online database of narrow admissible k-tuples that currently holds records for all k up to 5000. The database is also accepts submissions of better admissible tuples, and we invite anyone and everyone to try and improve it. Since it went online a week ago it has processed over 1000 submissions, and currently holds tuples that improve Engelsma’s bounds at 1927 values of k, the smallest of which is 785. As I write this k_0 stands at 873 (subject to confirmation), which happens to be one of the places where we have made an improvement, yielding a current prime gap bound of 6,712 (but I expect this may drop again soon).

In addition to supporting the polymath prime gaps project, we expect this database will have other applications, including further investigations of the second Hardy-Littlewood conjecture. As can be seen in this chart, not only have we found many examples of admissible k-tuples whose diameter d satisfies k > \pi (d+1), one can view the growth rate of the implied lower bounds on k - \pi(H(k)+1) in this chart.

More narrow admissible sets

It looks like it may be time to roll over the search for narrow admissible sets to a new blog post, as we’re approaching 100 comments on the original thread.

In the meantime, an official polymath8 project has started. The wiki page is a good place to get started. Work to understand and improve the bounds in Zhang’s result on prime gaps has split into three main areas.

1) A reading seminar on Zhang’s Theorem 2.
2) A discussion on sieve theory, bridging the gap begin Zhang’s Theorem 2 and the parameter k_0 (see also the follow-up post).
3) Efforts to find narrow admissible sets of a given cardinality k_0 — the width of the narrowest set we find gives the current best bound on prime gaps.

We started on 3) in the previous blog post, and now will continue here. I’ll try to summarize the situation.

Just recently there’s been a significant improvement in k_0, the desired cardinality of the admissible set, and we’re now looking at k_0 = 34,429. Hopefully there’s going to be a whole new round of techniques, made possible by the significantly smaller problem size.

As I write this, the narrowest admissible set of size 34,429 found so far, due to Andrew Sutherland, has width 388,118.

This was found using the “greedy-greedy” algorithm. This starts with some chosen interval of integers, in this case [-185662,202456], and then sieves as follows. First discard 1 mod 2, and then 0 mod p for p \leq b, for some parameter b. (I’m not actually sure of the value of this parameter in Andrew’s best set.) After that, for each prime we pick a minimally occupied residue class, and sieve that out. Assuming we picked a sufficiently wide interval to begin with, when we’re done the resulting admissible set with still have at least k_0 elements.

More generally, there are several directions worth pursuing

1. sharpening bounds on \rho^*(x), the maximal cardinality of an admissible set of width at most x,
2. finding new constructions of admissible sets of a given size (and also ‘almost-admissible’ sequences)
3. developing algorithms or search techniques to find narrow admissible sets, perhaps starting from a wider or smaller admissible set, or starting from an ‘almost-admissible’ set.

(If these questions carry us in different directions, there’s always more room on the internet!)

For sufficiently small sizes (at most 372), everything is completely understood due to exhaustive computer searches described at At least for now, we need to look at much larger sizes, so obtaining bounds and finding probabilistic methods is probably the right approach.

I’m writing this on a bus, beginning 30 hours of travel. (To be followed by a short sleep then an intense 3 day conference!) So my apologies if I missed something important!

I just can’t resist: there are infinitely many pairs of primes at most 59470640 apart

Everyone by now has heard about Zhang’s landmark result showing that there are infinitely many pairs of primes at most 70000000 apart.

His core result is that if a set of 3.5 * 10^6 (corrected, thanks to comment #2) numbers H is admissible (see below), then there are infinitely many n so that n+H contains at least two primes. He then easily constructs an admissible set wherein the largest difference is 7 * 10^7, obtaining the stated result.

A set H is admissible if there is no prime p so H \pmod p occupies every residue class. For a given H this is clearly a checkable condition; there’s no need to look at primes larger than |H|.

(While Zhang went for a nice round number, Mark Lewko found his argument in fact gives 63374611, if you’re being overly specific about these things, which we are right now. :-)

In a short note on the arXiv yesterday, Tim Trudgian (whose office is not far from mine) pointed out another way to build an admissible set, giving a smaller largest difference, obtaining the result that there are infinitely many pairs of primes at most 59874594 apart. He considers sets of the form H_m = {p_{m+1}, \ldots, p_{m+k_0}} (where k_0 is Zhang’s constant 3.5 * 10^7). These aren’t necessarily admissible, but they are for some values of m, and both Zhang and Tim noticed certain values for which this is easy to prove. Zhang used H_m with m=k_0, while Tim’s observation is that m_0=250150=\pi(k_0) also works. (Comment #2 below points out this isn’t right, and Zhang also intended m=\pi(k_0), and the slack in his estimate is coming from just looking at the largest element of H_m, rather than the largest difference.) Thus the bound in his result is p_{m_0+k_0}-p_{m_0+1} = 59874594.

It turns out that checking admissibility for a given H_m isn’t that hard; it takes about an hour to check a single value for m ~ m_0 (but if you find a prime witnessing H_m not being admissible, it very often gives you a fast proof that H_{m+1} is not admissible either, so searching is much faster).

I haven’t looked exhaustively, but one can check that m_1 = m_0 / 2 = 125075 gives an admissible H_m, and hence there are infinitely many pairs of primes at most p_{m_1 + k_0} - p_{m_1 + 1} = 59470640. (Sadly, it’s impossible to get below 59 million with this trick; no m below 27000 works; all witnessed by p=182887 or 378071.)

I just couldn’t resist momentarily “claiming the crown” for the smallest upper bound on gap size. :-) Of course the actual progress, that’s surely coming soon from people who actually understand Zhang’s work, is going to be in reducing his 3.5 * 10^6. You can read more about prospects for that in the answers to this MathOverflow question.

University of Melbourne hiring

Often Australian jobs don’t make it on to, for various reasons, so I thought I’d help distribute this one — Melbourne is hiring, initially a full professor, and subsequently several tenure track assistant professor positions, all in pure mathematics.

Below the fold, Arun Ram’s message about this. Australia is a nice place to come and work!

Continue reading

Mathematicians take a stand

Douglas Arnold and Henry Cohn have just posted to the arXiv their article Mathematicians take a stand, which will also appear in the Notices of the AMS shortly.

In it they describe the background to the Elsevier boycott, and make a case for more people joining in. It might appear at this point that the boycott is losing steam, but I’m pretty sure this is not the case. A lot has been happening, although too much of it is “behind the scenes”. Elsevier has made some concessions, although these so far seem to mostly miss the point. There’s a rumour of more to come. Two weeks ago some representatives met with the Journal of Number Theory’s editorial board, to discuss their concerns. The meeting was inconclusive, but it seems the editorial board there remains unsatisfied with Elsevier. (It appears that the minutes of that meeting are googleable…)

I hope that the prestigious mathematics journals still with Elsevier (particularly the three discussed in Arnold and Cohn’s article: Advances, the Journal of Algebra, and the Journal of Number Theory) make sure they get what they need from Elsevier. Right now, they have the support of the mathematical community, and I hope they will not be timid about making demands. I’m not sure what’s already been discussed in private, but this would be my list:

  • Legal ownership of the journals to be transferred to the editorial board or their chosen representatives, with Elsevier kept on as the publisher on a contract basis.
  • Open access to the historical archives (everything older than 5 years), with clear rules and ideally an open license, allowing access for indexing, preservation and analysis.
  • Prices reduced to no more than twice the per page cost of comparable journals published by professional societies, and a clear mechanism for libraries to achieve proportional reductions in bundle costs by dropping unwanted journals.

I don’t realistically expect that Elsevier will agree to any of these right away. But, nevertheless, I hope we demand this of them. At the moment their interests are simply diametrically opposed to ours — their commercial interest is to restrict access to our work, while our interest is to have the widest possible dissemination. Until they clearly see that we’re willing to take our toys and go home, I think there’s little hope of Elsevier taking meaningful steps.

Finally, it would be great  to hear from editors at Elsevier journals about the impact of the boycott. Has Advances seen a decrease in quantity or quality of submissions? If so, what are they going to do about it?


Although I wouldn’t want to a hit someone while they’re down, hitting a large faceless evil corporation, whose sole purpose is to extract rents from the academic community, while they’re down rather appeals to me.

Elsevier just sent out an email announcing (amongst many other things, to be blogged about later, I guess) they are withdrawing support for the Research Works Act.

Elsevier has announced today that we are withdrawing our support for the Research Works Act. In recent weeks, our support for the Act has caused some in the community to question our commitment to serving the global research community and ensuring the best possible access to research publications and data. We have heard concerns from some Elsevier journal authors, editors and reviewers that the Act would be seen as a step backwards for expanding options for free and low cost public access to scholarly literature. That was certainly not the intention of the legislation or our intention in supporting it. Please read our full statement online.

Good news—now let’s turn retreat into rout.

arXiv overlays at Scholastica

The guys at Scholastica (some grad students at UChicago) just added arXiv integration to their journal management software.

It’s pretty impressive, and it seems they’re very actively working on the software. I’m sure they’d be really excited if one upshot of the current debate about publishing was some journals switching to (or new journals starting on) their software. If journals are going to abandon the big commercial publishers, we need to make sure there are viable alternatives; and part of that is ensuring that the basic software for managing manuscripts and referee reports is up to par. Even if we’re not actually ready to move on any particular journals, it seems like it would be pretty useful to try out things like Scholastica, and their ‘competitors’ OJS, Annotum, and EditFlow. Explaining which features are needed to the developers, and being able to explain to editorial boards exactly what is available, will be really helpful.

I created a dummy journal on their site, called “Experiments in Mathematical Publishing“. If anyone would like to play around with the interface, try submitting a paper there. (Submit one of your old arXiv papers, for example!) Their interface has a feedback form, and we can discuss things here or, even better, over at the Math2.0 forum. If you submit something, I’ll ‘assign’ something to you to ‘referee’, for the sake of trying out the whole interface. If anyone would like to be an ‘editor’, just let me know. I’ll eventually delete the journal, of course.

A forum on mathematical publishing

There’s been lots of great discussion on the future of mathematical publishing in recent weeks, largely inspired by the boycott of Elsevier (1) (2) (3). Mostly this has been happening on blogs, particularly Tim Gower’s, but also here and a number of other places. There’s a nice index of this discussion in a wiki page on Michael Nielsen’s site, to the extent that it’s possible to index a discussion happening all over the internet!

I think a lot of people find it somewhat frustrating that this discussion is predominantly happening in blog comment threads, however. It’s hard to maintain conversations, and almost impossible to coordinate people with similar interests and concerns. Andrew Stacey and I thought that it might be helpful to set up a forum (like the nForum, associated the to nCafe, or to alleviate this.

Thus, please check out Math 2.0! We’ll see what sticks. :-)

Our hope is that this might provide a better home for more focused discussion, and a place for people who want to coordinate concrete next steps in reforming mathematical publishing. Come in and join us!