An editable database tracking freely accessible mathematics literature.

(This post continues a discussion started by Tim Gowers on google+. [1] [2])

(For the impatient, go visit, or for the really impatient

It would be nice to know how much of the mathematical literature is freely accessible. Here by ‘freely accessible’ I mean “there is a URL which, in any browser anywhere in the world, resolves to the contents of the article”. (My intention throughout is that the article is legitimately hosted, whether on the arXiv, on an institutional repository, or on an author’s webpage, but I don’t care how the article is actually licensed.) I think it’s going to be okay not to worry too much about discrepancies between the published version and a freely accessible version; we’re all grown-ups and understand that these things happen. Perhaps a short comment field, containing for example “minor differences from the published version”, could be provided when necessary.

This post outlines an idea for finding out: a human-editable database containing the tables of contents of journals, with links, where available, to a freely accessible copy of each article.

It’s important to realize that the goal is *not* to laboriously create a bad search engine. Google Scholar already does a very good job of identifying freely accessible copies of particular mathematics articles. The goal is to be able to definitively answer questions such as “which journals are primarily, or even entirely, freely accessible?”, to track progress towards making the mathematical literature more accessible, and finally to draw attention to, and focus enthusiasm for, such progress.

I think it’s essential, although this is not obvious, that at first the database is primarily created “by hand”. Certainly there is scope for computer programs to help a lot! (For example, by populating tables of contents, or by querying Google Scholar or other sources to find freely accessible versions.) Nevertheless, curation at the per-article level will certainly be necessary, so whichever route one takes it must be possible for humans to edit the database. I think that starting off with the goal of primarily human contributions achieves two purposes. First, it provides an immediate means to recruit and organize interested participants. Second, it hopefully allows much more flexibility in the design and organization of the collected data: with luck, many eyes will reveal bad decisions early, while they’re easy to fix.

That said, we had better remember that eventually computers may be very helpful, and avoid design decisions that make computer interaction with the database difficult.

What should this database look like? I’m imagining a website containing a list of journals (at first perhaps just one), and for each journal a list of issues, and for each issue a table of contents.

The table of contents might be very simple, having as few as four columns: the title, the authors, the link to the publisher’s webpage, and a freely accessible link, if known. All these lists and table of contents entries must be editable by a user. If, for example, no freely accessible link is known, this fact should be displayed along with a prominent link or button which allows a reader to contribute one.
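To make the shape of the data concrete, here is a minimal sketch in Python of the journal, issue and table-of-contents structure described above, together with the kind of question it should answer (what fraction of a journal is freely accessible, and which entries still need a link). The class names, field names and `example.org` URLs are all made up for illustration; nothing here is a fixed design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    """One row of a table of contents: the four columns, plus an optional comment."""
    title: str
    authors: List[str]
    publisher_url: str
    free_url: Optional[str] = None  # None: no freely accessible copy is known yet
    comment: str = ""               # e.g. "minor differences from the published version"


@dataclass
class Issue:
    label: str
    articles: List[Article] = field(default_factory=list)


@dataclass
class Journal:
    name: str
    issues: List[Issue] = field(default_factory=list)

    def all_articles(self) -> List[Article]:
        return [a for issue in self.issues for a in issue.articles]

    def free_fraction(self) -> float:
        """Fraction of articles with a known freely accessible link (0.0 if empty)."""
        articles = self.all_articles()
        if not articles:
            return 0.0
        return sum(a.free_url is not None for a in articles) / len(articles)

    def missing(self) -> List[Article]:
        """Articles a reader could be invited to contribute a link for."""
        return [a for a in self.all_articles() if a.free_url is None]


# Hypothetical data: one issue with two articles, one of which has a free copy.
journal = Journal("Example Journal", [
    Issue("Vol. 1, no. 1", [
        Article("First paper", ["A. Author"], "https://example.org/paper1",
                free_url="https://example.org/paper1.pdf"),
        Article("Second paper", ["B. Author"], "https://example.org/paper2"),
    ]),
])
print(journal.free_fraction())               # 0.5
print([a.title for a in journal.missing()])  # ['Second paper']
```

Keeping "no link known" as an explicit empty value, rather than leaving the row out, is what lets the site both report progress and show the "contribute one" button on exactly the right entries.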

At this point I think it’s time to consider what software might drive this website. One option is to build something specifically tailored to the purpose. Another is to use an essentially off-the-shelf wiki, for example TiddlyWiki, as Tim Gowers used when analyzing an issue of Discrete Math.

Custom software is of course great, but it takes programming experience and resources. (That said, perhaps not much: I’m confident I could make something usable myself, and I know people who could do it in a more reasonable timespan!) I want to essentially ignore this possibility, and instead use MediaWiki (the wiki software driving Wikipedia) to build a very simple database that is readable and editable by both humans and computers. If you’re impatient, jump to and start editing! I’ve previously used MediaWiki to develop the Knot Atlas at with Dror Bar-Natan (and subsequently many wiki editors). There we solved a very similar set of problems, achieving human-readable and human-editable pages, with a very simple database maintained directly in the wiki “under the hood”.
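One way a wiki page can double as a database row is to store each article as a template call whose named parameters are the columns. The sketch below shows a program reading and writing such a row. The template name `article` and its parameter names are hypothetical, chosen only for illustration; this is not the format used by the Knot Atlas, and it is plain string handling rather than a call to any MediaWiki API. It also assumes no field value contains a `|` character.

```python
# Hypothetical column names for one table-of-contents row.
FIELDS = ("title", "authors", "publisher_url", "free_url")


def format_row(record: dict) -> str:
    """Render a record as a template call, one named parameter per line."""
    lines = ["{{article"]
    lines += [f"|{key}={record.get(key, '')}" for key in FIELDS]
    lines.append("}}")
    return "\n".join(lines)


def parse_row(wikitext: str) -> dict:
    """Read a template call produced by format_row (or typed by hand) into a dict."""
    body = wikitext.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("expected a single template call")
    name, *params = body[2:-2].split("|")
    if name.strip() != "article":
        raise ValueError("unexpected template name: " + name.strip())
    record = {}
    for param in params:
        # Split on the first '=' only, so URLs containing '=' survive intact.
        key, sep, value = param.partition("=")
        if sep:
            record[key.strip()] = value.strip()
    return record


row = {
    "title": "First paper",
    "authors": "A. Author; B. Author",
    "publisher_url": "https://example.org/view?id=3",
    "free_url": "",  # empty: no freely accessible copy known yet
}
text = format_row(row)
assert parse_row(text) == row
```

A human sees and edits the same few lines of text that a program parses, which is the property that matters here: neither kind of contributor needs a separate interface.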