Has anyone else tried Zotero yet? It’s a Firefox add-on intended to help you organize the papers you use in your research (organize notes, make bibliographies [it seems to have BibTeX support], etc.). It still seems to have some kinks that need working out (for example, it works better with the arXiv’s website than with the Front’s), but it does look promising. And if it actually lets me keep track of which arXiv articles I have downloaded to my computer, it will be invaluable for that alone (at the moment it’s much easier for me to download a PDF from the web again than to locate it in my downloads folder, a ridiculous state of affairs).

5 thoughts on “Zotero!”

  1. Perhaps I should clarify what I mean: one of the features of the program is that when you’re on a page that Zotero recognizes as a scholarly resource, say a MathSciNet page, a little icon will appear in your address bar, which you click to add the paper to your collection. MathSciNet and the arXiv have this, the Front does not. It might just be a matter of letting the Zotero people know that the Front exists.

  2. I looked into this, and there may be a bit of work involved, although hopefully much of it can be copied and pasted from the existing code for the arXiv.

    To teach Zotero how to understand a collection of webpages, someone needs to write a “translator”. There’s a very basic tool to help you do this, called “Scaffold”, available at http://dev.zotero.org/scaffold. This is yet another Firefox extension.

    Inside Scaffold, you’ll find you can load (and save?) the existing translators in Zotero’s database. Calling up the one for arxiv.org, you’ll find a regular expression that determines whether the translator should be applied, and then two chunks of code (all written in JavaScript) defining functions called “detectWeb” and “doWeb”.

    “detectWeb” is easy enough; it just decides what sort of page we’re looking at, either from the URL itself, or from the page’s “DOM” (document object model).
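
    As a sketch of what a “detectWeb” for the Front might look like: the return values (“journalArticle”, “multiple”) are Zotero’s standard item types, but the URL patterns for front.math.ucdavis.edu below are guesses on my part, not taken from the real site or the real translator.

    ```javascript
    // Hypothetical detectWeb for the Front, modelled on the arXiv translator.
    // The front.math.ucdavis.edu URL patterns are assumptions.
    function detectWeb(url) {
      // A single abstract page, e.g. .../math.AG/0601001 (old-style identifier)
      if (/front\.math\.ucdavis\.edu\/[a-zA-Z.-]+\/\d{7}/.test(url)) {
        return "journalArticle";
      }
      // A search-results page listing many papers
      if (/front\.math\.ucdavis\.edu\/search/.test(url)) {
        return "multiple";
      }
      return false; // not a page this translator handles
    }
    ```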

    “doWeb” is more daunting. It turns out that this code looks through the HTML source of the page extracting the arXiv identifiers, and then performs in the background a lookup to the OAI (Open Archives Initiative) interface provided by the arXiv. This returns a chunk of XML describing the article, and Zotero then extracts what it needs from this.
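
    The lookup itself is just a URL: the arXiv’s OAI interface lives at export.arxiv.org/oai2, and a GetRecord request for a given identifier returns the XML. (Which metadataPrefix the real translator asks for I haven’t checked; “oai_dc”, plain Dublin Core, is an assumption here.)

    ```javascript
    // Build the OAI-PMH GetRecord URL for an arXiv identifier such as
    // "math.AG/0601001". The "oai_dc" metadataPrefix is an assumption;
    // the real translator may request a different format.
    function oaiRequestUrl(arxivId) {
      return "http://export.arxiv.org/oai2" +
        "?verb=GetRecord" +
        "&identifier=oai:arXiv.org:" + arxivId +
        "&metadataPrefix=oai_dc";
    }
    ```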

    It seems one could adapt this model to the Front. Either one could write a parallel translator, or add some extra code to the existing translator for arxiv.org (it already deals with eprintweb.org at the same time).

    The existing code uses XPaths to find the arXiv identifiers in the HTML document; the MIT SIMILE project’s Firefox extension “Solvent” is probably the way to discover these, rather than trying to write them by hand. Once you’ve found the identifiers, however, you can plug straight into the preexisting code: all the later steps of looking data up via the OAI interface, and then interpreting it, require no new code.
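
    (The real translator runs XPaths against the page’s DOM, which I can’t easily reproduce here; as a DOM-free illustration, here is a regex pass over raw HTML that pulls old-style arXiv identifiers out of the hrefs on a listing page. The link format is an assumption.)

    ```javascript
    // Pull arXiv identifiers (e.g. "math.AG/0601001", "hep-th/0605123")
    // out of href attributes in raw HTML. A stand-in for the XPath-based
    // extraction the real translator does; the href format is assumed.
    function extractArxivIds(html) {
      var ids = [];
      var re = /href="[^"]*\/([a-zA-Z-]+(?:\.[A-Z]{2})?\/\d{7})"/g;
      var m;
      while ((m = re.exec(html)) !== null) {
        if (ids.indexOf(m[1]) === -1) ids.push(m[1]); // deduplicate
      }
      return ids;
    }
    ```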

    While this is the sort of thing that, in principle, I know how to do, it feels too boring to do by myself, and I probably should get back to work anyway!

  3. I installed Zotero and it seems like a very powerful tool, but I couldn’t quite figure out how to use it well. If the problem is just searching through PDFs, though, here is a simple solution. Download and install xpdf, which includes a tool called pdftotext that does exactly what it says. Now go to the directory where you keep your PDFs, create a subdirectory named Text (or whatever), and run the following command:
    for f in *.pdf; do pdftotext "$f" "Text/$f.txt"; done
    (Globbing over *.pdf directly, with the quotes, handles filenames containing spaces, which parsing the output of ls does not.) Now you have a copy of the text of all your PDFs. If you want to find the files that mention Gauss, you can simply do:
    grep -l Gauss Text/*
    and you get a list of the files you want.
