Mathematics Literature Project progress

We’ve made some good progress over at the Mathematics Literature Project. In particular, we’ve completely analyzed the 2013 issues of five journals:

(The colour coded bars show the fractions of papers available on the arXiv, available on authors’ webpages, and not freely accessible at all; these now appear all over the wiki, but unfortunately don’t update automatically. Over at the wiki you can hover over these bars to get the numerical totals, too.)

Thanks everyone for your contributions so far! If you’ve just arrived, check out the tutorial I made on editing the wiki. Now, it’s time to do a little planning.

What questions should we be asking?

Here’s one we can start to answer right away.

What fraction of recent papers are available on the arXiv or on authors’ webpages?

For good generalist journals (e.g. Adv. Math. and Annals), almost everything! For subject area journals, there is wide variation (probably mostly depending on traditions in subfields): AGT is almost completely freely accessible, while Discrete Math. is at most half.

I hope we’ll soon be able to say this for many other journals, too.

Here’s the question I really want to have answers for:

Does being freely accessible correlate well with quality?

It’s certainly tempting to think so, seeing how accessible Advances and Annals are. I think to really answer this question we’re going to have to classify all the articles in slightly older issues (2010?) and then start looking at the citation counts for articles in the two pools. If we get coverage of more journals, we can also look for the correlation between, say, impact factor and the ratio of freely accessible content.
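To make the two comparisons above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layout, the function names `pool_medians` and `pearson`, and all the numbers are placeholders, not data from the wiki. It compares median citation counts between the freely accessible pool and the rest, and computes a plain Pearson correlation that could be applied to (impact factor, accessible fraction) pairs per journal.

```python
from statistics import median

# Placeholder records, NOT real data: (freely_accessible, citation_count).
papers = [
    (True, 12), (True, 7), (True, 3),
    (False, 5), (False, 1), (False, 2),
]


def pool_medians(records):
    """Return (median citations of free papers, median citations of the rest)."""
    free = [cites for is_free, cites in records if is_free]
    closed = [cites for is_free, cites in records if not is_free]
    return median(free), median(closed)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Placeholder per-journal pairs, NOT real data:
# (impact factor, fraction of papers freely accessible).
journals = [(3.0, 0.95), (1.5, 0.80), (0.6, 0.45)]

if __name__ == "__main__":
    print(pool_medians(papers))
    impact, fraction = zip(*journals)
    print(pearson(impact, fraction))
```

Medians are used rather than means because citation counts tend to be heavily skewed; that is a design choice of this sketch, not something the project has settled on.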

What next?

I don’t want to just list every journal on the wiki; it’s best if editors (and the helpful bots working in the background) can focus attention and enjoy the pleasures of finishing off issues and journals. Suggestions for journals to add next welcome in the comments. I’ve already included the tables of contents for the Journal of Number Theory, and the Journal of Functional Analysis. (It will be nice to be able to make comparisons between JFA and GAFA, I think.)

I’ve been working with some people on automating the entry of data in the wiki (mainly by using arXiv metadata; there are actually way more articles there with journal references and DOIs than I’d expected). Hopefully this will make the wiki editing experience more fun, as a lot of the work will have already been done, and humans just get to handle the hard and interesting cases.
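As a rough illustration of the kind of matching this involves, here is a small Python sketch. It is not the code the bots actually run: the dictionary fields (`doi`, `arxiv_id`), the function names and the sample identifiers are all hypothetical. It pairs journal articles with arXiv records that carry the same DOI, and leaves the rest for human editors.

```python
def normalize_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes (assumed forms)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def match_by_doi(journal_articles, arxiv_records):
    """Split journal articles into those matched to an arXiv record and the rest.

    Both arguments are lists of dicts; the field names are hypothetical.
    """
    index = {
        normalize_doi(rec["doi"]): rec["arxiv_id"]
        for rec in arxiv_records
        if rec.get("doi")  # many arXiv records carry no DOI at all
    }
    matched, unmatched = {}, []
    for article in journal_articles:
        key = normalize_doi(article["doi"])
        if key in index:
            matched[article["doi"]] = index[key]
        else:
            unmatched.append(article["doi"])  # left for human editors
    return matched, unmatched


if __name__ == "__main__":
    # Placeholder identifiers, not real articles.
    articles = [{"doi": "10.0000/example.1"}, {"doi": "10.0000/example.2"}]
    records = [
        {"arxiv_id": "0000.00001", "doi": "doi:10.0000/EXAMPLE.1"},
        {"arxiv_id": "0000.00002", "doi": None},
    ]
    print(match_by_doi(articles, records))
```

Articles without a DOI match would still need a fallback, for example matching on the journal reference or title, which is where the hard and interesting cases come in.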

21 thoughts on “Mathematics Literature Project progress”

  1. From a little bit of poking around on the papers that are not on the arxiv, it appears that authors either arxiv all their papers or arxiv only when they have a coauthor who arxivs all their papers. This means it’s not quality of paper as such that’s the key variable. Instead, it’s the quality of the authors. Authors who publish in good journals tend to also arxiv their papers. (In addition there may be some bump coming from a correlation between papers with more coauthors and quality, if any such correlation exists.)

  2. I have been suspecting this quality-accessibility correlation for ages, but the discrepancy between Discrete Mathematics and the rest of the list struck me. I wondered what the authors are trying to hide. But then I noticed this:
    These are the guidelines for authors in DM (you can tell the professional typesetting by the wordspacing in the “Retained author rights” paragraph on page 4) and, despite being from 2013, don’t ever mention the standard Elsevier permission to publish preprints on repositories and websites. Instead they read like you have to pay a fee for OA and otherwise cannot do anything. I suspect a good deal of the blame for the red paint can be ascribed to this PDF.

  3. Based on tiny amounts of anecdotal evidence, my guess is that there’s a strong correlation between subfield and arxivification rates. (Perhaps there are other significant factors as well, such as the Elsevierity of the journal as Darij suggests.) I would be very interested to see some actual data about this.

  4. Let me speculate that being freely available correlates not so much with quality as with where the authors work, and particularly how well plugged-in the authors’ institutions are to the mathematics research community. I am pretty sure that, of all the journals you have listed, Discrete Math gets the most contributions from very teaching-focused institutions and places in the developing world, in part because it’s the only one of these journals that regularly accepts papers a good undergraduate could understand.

  5. An obvious pair of journals to add would be Journal of Combinatorial Theory A and B. They are the same publisher as Discrete Mathematics and are perceived as being of higher quality. I’ve looked at them in the past and am pretty sure a far higher proportion of their papers are arXived than those of DM, but I would very much like to see this confirmed. It would also be interesting to look at Combinatorica, which is another good combinatorics journal, but this one published by Springer.

  6. @quasihumanist: I see where you’re coming from, but:

    1) I disagree with the idea that an average DM paper is more understandable to an undergraduate than an average JCTA paper. Obviously I can’t judge particularly well since DM has a lot of the graphs and designs kind of combinatorics I don’t have an idea about, but I can tell there is a lot of good insight about elementary things to be learnt from JCTA. Truth is, neither of the existing research journals seems to optimize for readability, at least to an extent I would prefer.

    2) Theoretically, “how well plugged-in the authors’ institutions are to the mathematics research community” shouldn’t have an effect on whether they submit their work on arXiv or not. It’s not like an editor would tell the less-connected authors “you are a nobody, you better give us exclusive copyright or no one will ever read your crap”. (Well, at least this isn’t what I expect to be happening most of the time. After reading the dubious author policy PDF for Discrete Mathematics I’m not 100% sure about this anymore…) Of course, the reality is that the OA virus has been spreading socially from places like Scandinavia, France, MIT etc., and probably is slower to reach developing countries than the publishers’ often targeted PR. So I’m not disagreeing with you on this (although I could bet it’s not as much about where the authors work but where they have been studying for PhD), just suggesting that the reasons might be different from the ones you are probably thinking of.

  7. I wholeheartedly agree with Tim Gowers’ suggestion to add JCTA, and I’d be happy to help out with that. (European J. C. could also be an interesting addition.)

  8. I’m a little concerned about focusing too much on one subject area from the beginning, but in deference to the opinions above (especially weighted with promises to work on them!) I’m uploading the metadata for JCTA now.

  9. While I’m a fan of the project (as hopefully evident from my contribution) I am a bit wary of positing correlation between “quality” and “freely accessible”. Try and — I hasten to add that neither of these is a “joke journal” or a “vanity press”, but I would also politely suggest that these are not the high end of the field.

    Given Elsevier’s apparent status as synecdoche of Evil Publishing, at least according to the online discussion I see, I’d be genuinely interested to see if JFA is worse than GAFA on accessibility. I suspect it will be, if only because of certain high-profile authors simultaneously preferring GAFA and free access.

  10. @Yemon: I think we’re talking about open access as a criterion only when it is the decision of (one of) the author(s) rather than an automatic consequence of the journal’s modus operandi.

  11. @Darij: ah, that makes sense (although then I have some thoughts along similar lines to the comments made by quasihumanist). Certainly I have heard that in some countries, submission to the arXiv is just not part of the “default culture” — this is however purely anecdotal and I don’t claim to have got any kind of representative sample.

  12. @Darij: (1) DM was explicitly compared to the other 4 listed on this post, not to JCTA/B. (2) yes I agree with your reasons and those were the ones I was thinking of, but institution is important because mathematicians who went to grad school before arXiv learn arXiving habits from their younger colleagues (if they do learn).

  13. Hi Scott, how’s this going? It looks like the data that’s currently on the tqft wiki is more or less final (for these specific journals and volumes); can we see an updated chart?

  14. @Darij, I’m in New Zealand at the moment at a conference, and have fairly limited internet access, so I haven’t been able to do much. Next week I’ll have no internet access whatsoever! Making matters worse my office computer back in Canberra seems to not be enjoying the heat wave there.

    I’ll try once more to update the charts and perhaps even add another journal before departing tomorrow.

    In any case, the week after next I’ll be on top of everything again.

  15. It is hard for good mathematicians (like most of those posting here) to understand the following:
    1. The majority of mathematicians are not “good mathematicians” (this is almost a tautology).
    2. Publishing is demanded of everyone teaching math in universities (what I mean by “mathematician”).
    3. There is a need for places for low quality and mediocre mathematicians to publish their low quality and mediocre work. They sometimes are even willing to pay out of their own pocket for publication, and they are certainly willing to make their university pay Elsevier and its ilk for subscriptions to lousy journals, because they need those lousy journals to have decent impact factors, because that is how the incompetent people evaluating their CVs score those CVs. On the other hand, since they are publishing lousy or mediocre papers, and are often not so stupid as to be unaware of this situation, they have little interest in the lousiness of their articles becoming well known. It is convenient to publish in pay-walled journals and to distribute nothing freely. That way no one knows that your paper in Chaos, Solitons and Fractals is complete garbage even though the journal has an impact factor above 1 – because no one can read it. The mediocrity does not need access to articles published in Annals/Acta. He is too busy filling his CV with publications in conference proceedings and trips to Poland to have time to read that stuff; in any case he publishes in journals listed in the JCR and his friends cite him as often as he cites them.

  16. @Bobito, there are more problems in mathematical publishing than expensive subscriptions. I think everyone understands that a large part of the mathematics literature is “write-only”. It would be nice to be able to meaningfully quantify this.

  17. Two things are seriously disturbing me:

    – freely accessible might include “author or their institution pay”, thus favoring scandalous publishing fees in some cases of the so-called “gold open-access”;

    – “Quality measured by number of citations” : this is extremely questionable: is it necessary to recall once more that number of citations has nothing to do with quality (e.g., citations by “friends”, citations to underline a mistake, etc.)? In the same spirit it looks like some people (even in mathematics!) do believe in the non-pertinent (to be polite) “impact factor”!

    Jean-Paul Allouche, CNRS, Paris

  18. Dear Jean-Paul,

    You’re right that it would be interesting to catalogue those articles that have been subject to a publication charge. I’m not sure of a good way to collect this data. If someone has a suggestion, I’d love to try to incorporate this. So far the database essentially ignores copies of the article available from the publisher under any form of open access. (Although there’s currently no prohibition against wiki editors providing such links.)

    I agree that citation data is far from an ideal proxy for quality. But frankly I disagree that citation counts have _nothing_ to do with quality. Nor do I think that “impact factor” is _completely_ irrelevant. If you look at the list of mathematics journals by impact factor (e.g. at eigenfactor), surely you would agree that the list given there is a better indication of the relative quality of journals than the reversed list?

    I understand that there are important political reasons for disparaging these measures, of course.

    best regards,
    Scott Morrison

  19. Dear Scott

    Thanks for your comment on comment :-).
    About impact factor, “important political reasons for disparaging these measures”: I fully agree of course. On the other hand, I said “non-pertinent”; you are more precise (and maybe less polemical) than I was, by saying “not completely irrelevant”. On a more “epistemological” level, I have always been surprised by the classical oxymoron of attempting to “quantify quality”.

    Best wishes
