Subverting the system.

Subversion, often abbreviated as SVN, is a “version control system”. Prompted by Nathan’s request to hear about collaborative software for mathematicians, and the comments on Ben’s post on the subject, I’m going to briefly describe how you might use Subversion to collaborate on a maths paper. Even better, I’m offering to set up a subversion repository for any mathematician who’d like to try it. Jump to the bottom if you already have your subversion-fu, and just want the goodies.

Why would you want to try it? Essentially because I can’t imagine how you’re currently surviving in the dark ages without it! Having met Subversion through a programming project (the sadly defunct omath.org), I’ve used it for each and every maths paper I’ve written since, and even persuaded 6 coauthors so far to jump through the required hoops. They’ve all been happy enough. The alternative, emailing drafts back and forth, and having to keep track of “who’s in charge” at any given point, seems miserable. Besides automating the process of merging changes made by several people, Subversion provides several nifty capabilities: recovering any previous version, if something goes wrong or you rediscover the charm of a dropped paragraph, as well as tools to show differences, or to “blame” a file, showing who last edited each line or section.

You might have heard of one of Subversion’s many cousins: CVS, essentially made obsolete by SVN, and perhaps darcs or git, both of which are “distributed” systems, not requiring a central repository. I know mathematicians using all 4; I know about SVN, so I’ll stick with that today.

How do you use Subversion? I’ll describe here the command line interface; if you use Windows, you should install TortoiseSVN, which gives a nice “right-click” interface to all of this, but the description should translate readily. There are also GUI clients for varieties of Linux and OS X; hopefully someone will chime in on those in the comments. Installing Subversion is fairly simple for most people: on a Debian-like system, you can use the incantation “sudo apt-get install subversion”. At Berkeley, I used the incantation “Dear Julie, could you please install subversion on the department machines? It would be really useful for me and my collaborators. Thanks, Scott”, to which she replied “Sure, done!” about half an hour later. Julie is awesome.

Each SVN repository has a URL, for example something like http://tqft.net/svn/d4. (That’s a real live URL, corresponding to the SVN repository Noah and Emily and I are using to write a paper about the D_2n subfactor planar algebras. It’s even public, if you want to play along.)

To get started, you “check out” the repository:

svn checkout http://tqft.net/svn/d4

That should create a local directory called “d4”, containing several subdirectories and files. (If you know you only want part of the repository, you can use a longer URL, like http://tqft.net/svn/d4/trunk/code.) Once you’ve done this initial check out, the commands you’ll mostly use are

svn up

and

svn commit -m "This is a message describing the changes I just made."

The svn up command “updates” your local copy, automatically incorporating any changes that other people have made on the repository. Make sure that you’re actually inside a directory containing files under version control: if you’ve just checked out the “d4” repository described above, you have to type cd d4 before svn up will do anything. Noah makes this mistake all the time! :-)

Nearly always Subversion incorporates remote changes successfully, even if you’ve also been editing one of the modified files locally. (Be careful, though, to close and reopen any updated files in your text editor!) Sometimes it fails though, and this is called a conflict. If this happens, well, ask someone who’s dealt with one before, or go read some of the Subversion book! (Incidentally, this is a well-written and thorough resource, available freely online. You don’t need to read much of it for normal use, but the real Subversion guru will eventually need to master all its appendixes.)
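For the curious, here is a minimal sketch of the last step in fixing a conflict by hand. Subversion writes conflict markers into the clashing file; once you have edited the file so that it reads the way you want, you tell Subversion that it is fixed. The function name finish_conflict is made up for this illustration, and svn resolve --accept working assumes Subversion 1.5 or later.

```shell
# Hypothetical helper: after hand-editing a conflicted file, mark it resolved.
finish_conflict() {
    file="$1"
    # Refuse to continue while Subversion's conflict markers are still present.
    if grep -q '^<<<<<<< ' "$file"; then
        echo "conflict markers still present in $file" >&2
        return 1
    fi
    # Tell Subversion that the working copy of the file is the fixed version.
    svn resolve --accept working "$file"
}
```

After something like finish_conflict paper.tex succeeds, an ordinary svn commit sends the merged file to the repository.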

The svn commit command sends your local changes to the repository. You should include a very brief message describing your changes, although I’m often lazy about this, and use a very very brief message: “”. The important rule for happy Subversion use is “commit early and often”. This only really matters when you’re concurrently editing files with other people, but Noah and Emily and I have found this a really nice way to use Subversion (three people typing at once makes for quickly growing papers)!
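If you find yourself typing the same pair of commands all day, the cycle can be wrapped up in a small shell function. This is only a sketch: the name sync_paper is made up, and it assumes you run it from inside the working copy.

```shell
# Hypothetical helper: pull in collaborators' changes, then send yours.
# Run it from inside the working copy (for example after `cd d4`).
sync_paper() {
    message="${1:-}"          # commit message; may be empty
    svn update || return 1    # merge remote changes first; stop if that fails
    svn status                # list local modifications for a quick review
    svn commit -m "$message"  # send local changes to the repository
}
```

Usage would look like sync_paper "Rewrote the introduction". Updating before committing matters because Subversion will not accept a commit to a file that someone else has changed since your last update.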

A few more things: when you create a new file, it isn’t automatically included in the repository. You have to use a command like

svn add my-new-file.tex

Further, instead of using commands mv, cp or rm to move, copy or delete a file, use svn mv, svn cp or svn rm, so Subversion knows what’s going on. The GUI clients mentioned above make this easy.
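The history tools mentioned at the start (differences, blame, recovering an old version) are single commands too. The sketch below lists them as comments and wraps the recovery step in a function; the name recover_version, the file name paper.tex and the revision number 41 are all made up for illustration.

```shell
# Everyday history commands, run inside a working copy:
#   svn log paper.tex          # who committed what, and when
#   svn diff -r 41 paper.tex   # changes between revision 41 and your copy
#   svn blame paper.tex        # last revision and author for each line

# Hypothetical helper: save an old revision next to the current file,
# e.g. `recover_version paper.tex 41` writes paper.tex.at-r41.
recover_version() {
    file="$1"
    rev="$2"
    # `svn cat` prints the file as it was at the given revision.
    svn cat -r "$rev" "$file" > "$file.at-r$rev"
}
```

Writing the old version to a separate file leaves your current copy untouched, so you can paste the dropped paragraph back in by hand.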

Finally, it’s give-away time. I’ve set up so many Subversion repositories by now (papers, all my private files, several programming projects, and repositories for friends) that I have it down to a fine art, and in particular just a few minutes startup time. So, if you’d like one for your own use, tell me:

  1. The name for the repository.
  2. A list of usernames and passwords for accessing the repository
    • I didn’t mention this above, but you’ll be prompted for a password the first time you try to make changes at the repository.
  3. Whether it should be public or private (i.e. readable by the world, or just those with passwords).
  4. If you’d like automatic emails every time anything gets committed, and if so which email address(es) to use. This is strongly recommended, even if it sounds unnecessary at first.

Disclaimers: I’m not promising support outside of this comment thread, and you’re on your own if my hosting company does a runner, or I do. In principle, I can read your private repository, but in practice I don’t care enough to do so. On the other hand, I’ll include your repository in my completely paranoid backup system.

If you’d prefer to do all this yourself, the command you want to start with is “svnadmin create abc123”. Hooking up the repository to apache or another webserver for easy access requires some practice, however. Alternatively, if you don’t mind paying $6 a month, http://dreamhost.com/ offers SVN hosting amongst their many other services. They’re cheap and cheerful, and every so often offline.
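If you only want to experiment on a single machine, you can skip the webserver entirely and check out over a file:// URL. Here is a minimal sketch, assuming repositories live under $HOME/svn (an arbitrary choice) and using a made-up function name, new_local_repo. The parent directory must be an absolute path for the file:// URL to be valid.

```shell
# Hypothetical helper: create a repository on local disk and check it out.
new_local_repo() {
    name="$1"                                  # e.g. abc123
    root="${2:-$HOME/svn}"                     # assumed parent directory (absolute path)
    mkdir -p "$root" || return 1
    svnadmin create "$root/$name" || return 1  # creates the repository itself
    svn checkout "file://$root/$name" "$name"  # working copy in ./$name
}
```

Running new_local_repo abc123 leaves you with a working copy in ./abc123 that behaves like the checkouts described above, just without collaborators.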

18 thoughts on “Subverting the system.”

  1. I’ll just note, there are a ton of places to get free Subversion accounts online, often in communities aimed at developers. The one I’m trying at the moment is assembla.com, but I haven’t gotten enough of a feeling for it to make a real endorsement. It seems like stealing the coders’ bug-tracking tools could be rather useful, though.

    Let me add that if you want a functional free account, you should definitely NOT go to beanstalk. Their free accounts suck.

    Scott, do you have any thoughts about such sites?

  2. Here’s a nice LaTeX trick when using CVS or Subversion. The following code defines a macro \versioninfo which you can use to print the current revision number within your document e.g. in a footnote or the running head:

    \def\RCS$#1: #2 ${\expandafter\def\csname RCS#1\endcsname{#2}}
    \RCS$Revision: 1.99 $
    \RCS$Date: 2008/03/18 03:44:56 $
    \newcommand{\versioninfo}{Version \RCSRevision; Last commit \RCSDate}

    Oh, and for Subversion you need to run:

    svn propset svn:keywords "Id Revision Date" filename.tex

    for this to work.

  3. I recommend Mercurial.
    We use it for everything from papers to very large software projects.

    I’ve switched to Mercurial for coding work, and it is indeed truly excellent there, but have so far stuck with traditional central repository systems (CVS and SVN) for papers. I really like being able to track the revisions with specific numbers that can appear in the TeXed file as per my last comment, and have found this very helpful to coauthors unfamiliar with version control (“does it say Revision 1.41 in your copy of the file”, “no, only 1.40”, “ok, you need to do ‘cvs update’ to get my latest changes…”). Having one’s revisions called things like “2ad3dcb8d811” instead of “1.40” is a tad confusing to the neophyte user, though such keyword substitutions are apparently now supported in Mercurial.

  4. I have asked several of the people in my department (Physics and Astronomy) why they don’t use revision control software, and most hadn’t even heard of it, even the ones that do substantial amounts of coding. The ones that had were all ex-programmers of one sort or another. Once I explained, they all said “sounds too complicated”, even though they have had significant problems with shared code and others changing it.

    It’s frustrating.

  5. I can’t believe I never thought of that :-) I’m currently writing up my thesis, and I work sometimes from my laptop at home and sometimes from my office machine; I use rsync to transfer stuff across, and it’s a right pain, as it will happily overwrite new stuff with old if you get the syntax even slightly wrong.

    Your article has inspired me to just chuck the whole lot into SVN, which is just obviously a better solution, even though this isn’t a collaborative project — just me working from two places.

  6. I use rsync to transfer stuff across, and it’s a right pain, as it will happily overwrite new stuff with old if you get the syntax even slightly wrong.

    Yeah, bidirectional synchronization is not rsync’s forte; unison works very well for this. (Though of course for one’s thesis actual version control is the way to go.)

  7. @ben (#3),

    I have in the past used Jira, a “bug tracking” program. It’s made by a friend of mine in Australia, and his company Atlassian. Sadly it’s closed source, and commercial, but they’re very supportive of open source projects, and I got a free license for the omath.org project years ago.

    I used it for all sorts of things beyond omath, however — keeping track of things that still needed to be written in a long and complicated paper, as well as organising my life for a while (keeping track of travel arrangements, passport and visa applications, papers to referee, etc.) In the end it fizzled out, and I fell back on simpler methods. I still have a working installation of Jira, however, if anyone would like to play.

    Trac is by now the standard free software solution to the “issue tracking” problem. Does anyone have experience using it in mathematics? I’m not sure how useful it is, even for a complicated many-author paper. On the other hand, if I were ever appointed dictator of a maths department, I’d be tempted to have my first command be “Go create an account on our Trac server, and get started.”

  8. The problem with SVN (and CVS) is that it requires a central server. This comes with its own set of problems, like needing the technical know-how to set one up, and a lack of data integrity: if the server crashes you can lose not only data but history as well. And what if the server is unreachable?

    As mentioned above Mercurial:

    http://www.selenic.com/mercurial/wiki/

    Is an excellent (distributed) one that as well can be set-up like a central server. Not only that, but it’s really easy to set that up for use over http with zero extra configuration needed for the web server. In fact, I did this while I was at a web hoster not too long ago. All that was needed was Python to be installed on the server. Everything else can be installed ‘locally’.

    Another distributed version control system is GIT:

    http://git.or.cz/

    I haven’t tried it myself, but many people use and like it.

    As I mentioned, I use Mercurial. I thought about using SVN, but it’s just way too heavy (bloated) for my tastes. Mercurial is fast, written in Python and pretty much runs anywhere. All that with a *very* small learning curve.

    @Scott Morrison:

    Re: Trac:

    Why? Don’t you think that Trac is overkill for a paper? Trac would be like digging a hole in your backyard with an H-bomb.

    Personally, I would think that using a version control system (e.g. Mercurial) and possibly a mailing list would be just right. That way, no data gets lost and everyone gets cc’d (as long as replies go to the list) on emails.

    “Make everything as simple as possible, but not simpler.”
    — Albert Einstein

  9. @Nathan, #4

    just a pointer for Windows users of Subversion — Nathan’s suggested command

    svn propset svn:keywords "Id Revision Date" filename.tex

    doesn’t seem to work on the Windows command line. You can achieve the same effect using the GUI provided by TortoiseSVN, or by running svn through Cygwin. I’m sure this is just something about how quotation marks are interpreted by the Windows shell, but I’m no expert.

  10. I’ve said this several times over email now, so I’m going to copy and paste it here!

    I’m “deprecating” my offer of hosting SVN repositories, in the sense that I’m now offering to host mercurial repositories, and in order to get me to set up an SVN repository for you, you have to first read the following argument for switching! :-)

    Mercurial is a “modern” distributed version control system. This means that while a central server is useful for coordination, it is by no means necessary. Every copy, whether on a client or server, contains a complete copy of the history, and in principle you can push changes between any two copies. In practice, for networking reasons, it usually makes most sense to push and pull from a server via http.

    There are some disadvantages. First, there are some limitations on large files (you’ll get a warning at 10 MB, but on a fast computer you’re okay quite a way beyond that). Second, the mechanism for resolving conflicts is different from that in SVN, and to my mind less intuitive.

    There are some significant advantages, too. First, arranging for notification emails, rss feeds of changes, and highlighted diffs online all works out of the box. Second, because it is distributed, you can work offline, committing changes to your local repository, and pushing them out when you’re back online. If you’re interested in the “theory” of why distributed VCS is so much better, have a look at this Stack Overflow question. The original question is specifically about “git”, the main competitor to mercurial, but most of the answers are just about DVCS vs. VCS.

    Finally, you can easily walk away. If I lose interest, or you have any reason for preferring other hosting, it’s as simple as creating a new repository at one of the several online services supporting mercurial, and pushing the whole history up to there.

    (That said, why bother asking me, rather than going through google code, say? If you’re comfortable with it, it’s probably a better option to use someone else! On the other hand, set up with me is easy: tell me the repository name, some usernames, passwords and emails and you’re all set. If I know you at all, I’m also happy to help you get started if you have problems with the client software.)

    So, what do you need to do to use mercurial?

    I’ll send you a URL, which will look something like
    http://tqft.net/hg/blob/. (You can also use https, but you may see messages about the https certificate being invalid.)
    There you’ll find RSS feeds for changes, but I can also set up automatic emails if you like. There are also nice highlighted diffs for each revision.

    You can find a great tutorial on using mercurial at
    http://hginit.com/top/.

    Instructions:
    1) To get started, clone the repository.
    hg clone https://tqft.net/hg/XXX
    2) Copy the lines below between —hgrc-sample— and —— into a file named .hgrc and put it in your
    home directory (if anyone uses windows, ask me if you don’t know what
    to do here!), and edit it to fill in your name, email address and
    password (I’ll tell you these) at the appropriate places.
    3) Whenever you want to collect others’ changes, type hg fetch.
    4) To commit changes you’ve made, type hg commit -m "message",
    replacing “message” with a description of your changes. To send
    changes to everyone else (at least, when they next type hg fetch)
    type hg push.
    5) You can see your current local changes by typing hg status for a
    summary, or hg diff for all the details. You can check if you have
    any “unpushed commits” by typing hg out.

    —hgrc-sample—
    [ui]
    username = Your Name

    [extensions]
    rebase=
    fetch=

    [auth]
    tqft.prefix = tqft.net
    tqft.username = username
    tqft.password = password
    ——
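The numbered instructions above can be strung together into one function. This is only a sketch: the name hg_sync is made up, and it relies on the fetch extension being enabled as in the .hgrc sample. It commits first because hg fetch refuses to run while there are uncommitted local changes.

```shell
# Hypothetical helper: commit locally, merge in others' changes, then push.
hg_sync() {
    # `hg commit` exits non-zero when nothing has changed; carry on regardless.
    hg commit -m "$1" || true
    # Pull and merge collaborators' changes; stop here if that fails.
    hg fetch || return 1
    # Send local commits to the server so others see them on their next fetch.
    hg push
}
```

Usage would look like hg_sync "Added the proof of Lemma 3". Note that hg push also exits non-zero when there is nothing to push, so a non-zero status from this function is not always an error.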

  11. “If you’re interested in the “theory” of why distributed VCS is so much better, have a look at .”

    Missing link?

  12. Thanks David. I usually surround links with < … > in emails, but that doesn’t work so well when copying and pasting into html-like environments. All the actual command examples were broken for the same reason.

  13. That said, why bother asking me, rather than going through google code, say?

    My understanding with Google Code is that any project on it must be publicly available, right? I don’t think this is how most mathematicians will want to write their papers.
