Why isn’t the arXiv more like Craigslist?

While some people might be suspicious of this analogy, I think the arXiv and Craigslist have a lot in common;  both took a service previously only available through an expensive print intermediary, and instead made it freely available on the web.

But there’s one point where the analogy breaks down: the aggressiveness with which they have expanded.  Craigslist started out only in San Francisco, but has since opened a bewildering number of local sites.  Of course, most of them were very sparsely used at first, since people in those cities had never heard of Craigslist.  But relatively quickly, the ones all over the US took off (for a sense of which are getting used, I think Best of Craigslist is an excellent way to get a (highly NSFW) cross-section).  I think some of the UK ones are also getting some use, though there are also dozens of mostly empty Craigslist sites for random foreign cities like Seoul or Buenos Aires.  However, when demand for such sites materializes, Craigslist will be there, and who’s to say where and when it will happen first?  After all, it was very little trouble for Craigslist to set up the foreign sites, having already created the US ones.

Similarly, the arXiv has a model which scales brilliantly, and in the time that Craigslist has gone from an SF-only site to covering 500 cities around the world, the arXiv has…added sections in Quantitative Biology and Quantitative Finance.  I don’t want to be too hard on the arXiv.  I mean, they do an incredible service to the mathematics and physics communities, but I don’t understand why they’ve taken such a restrictive view of their possibilities.  Maybe there’s some serious obstruction I’m not seeing, but I’d like to know what it is.

I mean, why not an arXiv for economics?  For literature?  For history?  My understanding is that most other disciplines don’t have a centralized preprint server (this impression is mostly based on my dating of grad students in other disciplines, so it’s rather, ahem, anecdotal), so why isn’t the arXiv at least providing the opportunity?  Maybe they wouldn’t be used much at first, but what’s the harm in trying?

  1. If we stay serious and don’t count “Why bother?” as an answer, I believe the main reason is that people in other disciplines don’t write preprints, they just publish papers. (I have never heard of a preprint in biology or economics.)
    I believe that, other reasons aside, people in math and physics have somewhat higher moral standards and a higher level of trust, and are more eager to receive feedback and share their ideas. I think the relatively small size of the community also contributes to that.

  2. Actually, I think it’s dangerous to try to expand the arXiv. For example, the Computing Research Repository (= CS section of the arXiv) has simply been a failure in most areas. This is much worse than not having it at all, since it doesn’t remain empty. Instead, the cranks and low-prestige researchers move in, and this drives away most other researchers. For example, surprisingly few of the top researchers in computational complexity arXiv their papers. I’ve encouraged many of them to do so, and I’ve repeatedly been told that they consider the arXiv to be full of junk and that they would be embarrassed to have their papers appear there. The existence of a substandard arXiv section makes it more difficult to establish a central repository for a field, either within the arXiv or elsewhere. (Incidentally, there are a few areas of CS that are more arXiv-friendly, but they are mainly areas adjacent to fields where the arXiv is popular. For example, all quantum computing theorists use the arXiv. And some wise researchers use the arXiv even in subfields in which it is not popular.)

    I would have expected the CS arXiv to become very popular, and I still don’t really understand why it is floundering. The CS publication culture is very different from that in math or physics (which are themselves different), and that may contribute to it. There are also just unpredictable social factors; for example, it’s important that some of the early adopters should have high enough prestige that many people imitate them.

    One area in CS with a widely used central e-print server is cryptography (http://eprint.iacr.org/search.html). This server really ought to be folded into the arXiv, but it’s not going to happen: IACR wants the credit and publicity, plus the arXiv is viewed as a lower-prestige option.

    In computational complexity, the ECCC (http://eccc.hpi-web.de/eccc/) is reasonably widely used. It seems to me to be inferior to the arXiv in every respect. I’ve actually argued for migrating everything to the arXiv, with no success. Basically, a lot of people in the field have reservations about the arXiv, while nobody is strongly in favor of it.

    As for other fields, there’s little hope of extending the arXiv to the humanities, with their book-focused publication practices. Economics is in fact the only area I can think of where it might well succeed. (They don’t have “preprints”, but they do have “working papers”, which amounts to the same thing.) However, I would strongly recommend against setting up an economics section of the arXiv without a clear understanding of what went wrong with CS and how to avoid it in economics.

  3. I hope that we can get NSF money to come with the following string attached: that papers therefrom cannot have their copyright turned over to publishers that will not allow them on public archives.

    That’s a sad tale about the CS arXiv.

  4. There is a preprint server in philosophy of science (http://philsci-archive.pitt.edu/), but I can’t tell how much it’s actually used. It looks like they’re getting about 300 papers a year, but in a quick anecdotal sampling of names of prominent philosophers of science that randomly came to my head, only about half of them had a paper on there in 2008. I haven’t put anything up there yet, but I also don’t really know what the norms are – at what point is something ready to go there? Is it just before submitting to a journal? Or once it’s been accepted? Or just when I have some sort of draft that I’d be willing to e-mail to friends and colleagues and people from conferences?

    People in philosophy certainly circulate papers before they’re published (in the bibliographies of things I read I often find a couple things dated “MS” or “2009”, though I also often don’t find any) but I think to establish a server model that works you need a sort of community norm for what sorts of things go on there.

  5. Also, on the Craigslist side, it sounds like Kijiji is putting up some serious competition in Canada at least. And in Australia there are other sites that serve some of the functionality of Craigslist, like allhomes.com.au, so the fact that Craigslist operates very low-traffic sites in Australian cities only fragments the market.

  6. CS people publish their preprints (in the sense of early drafts of papers) in conference proceedings, where the bar is low as long as the paper is “interesting”; they then republish the final version in a journal.

  7. I’m in chemistry, and people just don’t do preprints around here.

    The general sentiment is “OMG People are going to steal my ideas.”

  8. lukas,

    I have heard of this competitiveness in chemistry, but I was under the impression that some areas of physics have a similarly competitive culture. By time-stamping submissions, the arXiv actually prevents some of the nastier idea-stealing that can happen during the referee process. I think the question of looking bad for posting on an archive that has cranks is somewhat moot if one can cite public time-stamping as a defensive measure. If we could get some high-profile chemists to support more openness, I think it could take off.

  9. Scott,

    I should have clarified: The sort of idea-stealing that you refer to is important and undesirable, of course, but from what I’ve seen, researchers are even more afraid of losing the head start they have on other groups in their field of research. So the delay that comes with traditional publishing is actually desirable for them, because that is time they can use to follow up on their own work without having to fear that someone else is going to do the same.

    The fear that once a discovery is out there, some lab in China, India or the Middle East is going to put fifty grad students to work on it and map every last corner is often exaggerated, but nonetheless grounded in reality. And if it happens to you, you can forget about that grant renewal.

  10. In economics we have SSRN, IDEAS, RePEc and probably something else I am forgetting. Anyone can have their papers listed in those archives without any quality check.

    I am not very familiar with the arXiv, but I believed it was the same there.

    If so, having one’s papers in the archive is neither a badge of honor nor a shame; it is just the easiest way to let people know what you have done.

    After all, nobody seems to complain if a copy of their magnum opus is in a library that also contains trashy novels or Hitler’s Mein Kampf.

  11. lukas,

    Thanks for explaining. That sounds rather troubling, and I can’t think of any simple solutions.

  12. This is a VERY LATE comment, but now it’s Jul 2010, and I can say that arXiv use has picked up greatly among CS folks. Now after the major conference deadlines (STOC/FOCS/SODA etc.), there’s a large number of submissions on the arXiv (the timing is too perfect to be a coincidence).

