Grepping the Knot Atlas

I want to make a brief plug for the Knot Atlas, and in particular a new way you can get hold of all the knot theory data hidden away inside it.

First of all, what is the Knot Atlas? Officially, it’s aiming to be “a complete user-editable knot atlas, in the wiki spirit of Wikipedia”. In more mundane terms, it’s a machine sitting underneath Dror Bar-Natan’s desk, running MediaWiki (and other) software maintained by Dror and myself, hosting a wiki with lots of information about knots. You might like to take a self-guided tour, or just read the About page.

Next I’ll explain how the Knot Atlas stores and displays its data, and then introduce a new and simple mechanism we’ve devised for using this data: the classic command-line tool “grep”, secretly operating on an RDF data dump.

The Knot Atlas stores knot theory data by taking advantage of the fact that a wiki is the simplest database system that could possibly work: it lets us store arbitrary chunks of text under fairly arbitrary page names. We store every knot invariant value known to the Knot Atlas as a separate page. Have a look at these examples. To display these, the Knot Atlas uses the wonderful wiki trick of transclusion. An individual page collecting data for a particular knot includes some wiki markup along the lines of {{Data:3_1/Jones_Polynomial}} (or actually, the somewhat fancier markup {{Data:{{{PAGENAME}}}/Jones_Polynomial}}, which works wherever we put it), which magically splices in whatever data appears on that page.
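To make this concrete (a sketch; the actual page on the Knot Atlas may format things slightly differently), the page Data:3_1/Jones_Polynomial would contain nothing but the invariant’s value, something like

<math>q^{-1}+q^{-3}-q^{-4}</math>

and wherever the transclusion appears, exactly that text gets spliced in, so the trefoil’s page can display its Jones polynomial without keeping a second copy of it.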

While this works nicely for what we’ve done so far on the Knot Atlas, it’s not obvious from this description how one would add lots of new data to the Knot Atlas, or extract the data for use elsewhere. In fact, neither of these problems currently has a great solution. We’ve made a big improvement on extracting data, which I’ll explain now, but at present the only way to add data is to use one of the many MediaWiki editing bots (such as the Java-based WikiLink, originally written for the Knot Atlas, which also has a Mathematica wrapper). It’s a little painful, and I don’t think anyone other than me and Dror has overcome the learning curve.

Happily, there’s now a much better way to get data out of the Knot Atlas. Earlier in July, while I was visiting Dror in Toronto, we put together a script that runs every night and dumps out all the “Data:” pages in the Knot Atlas as an RDF file. It turns out you can ignore the fact that it’s well-formed RDF and just think of it as plain text, which looks like this:

<knot:10_71> <invariant:BraidIndex> "5" .
<knot:10_71> <invariant:Alexander_Polynomial> "<math>-t^3+7 t^2-18 t+25-18 t^{-1} +7 t^{-2} - t^{-3} </math>" .
<knot:10_71> <invariant:Bridge_Index> "3" .
<knot:10_71> <invariant:ConcordanceGenus> "<math>3</math>" .
<knot:10_71> <invariant:Conway_Notation> "[22,21,2+]" .

Every line is a “triple”, listing a knot, an invariant, and a literal string, the value of the invariant. If you’d like to play, you can download our entire database (watch out, it’s 400MB uncompressed!) or go to the “Take Home Database” tutorial page on the Knot Atlas where you can find links to subsets of the database. That tutorial page contains some suggestions on using the data, but I’ll give one nice example here:

[drorbn@katlas ~/Data]$ zgrep Determinant Knots11.rdf.gz | grep \"1\"
<knot:K11n34> <invariant:Determinant> "1" .
<knot:K11n42> <invariant:Determinant> "1" .
<knot:K11n49> <invariant:Determinant> "1" .
<knot:K11n116> <invariant:Determinant> "1" .

This found all the 11 crossing knots with determinant 1, using nothing but command line tools. “zgrep” is simply a version of “grep” that works on gzipped files (nice, huh?), so if you grok the Unix command line, you’ll see that the command above picks out every line containing “Determinant”, and then every line containing “1” (with the quotes).
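In the same spirit (a hypothetical session along the lines of the one above, against the same Knots11.rdf.gz dump), you can count those knots by piping through “wc”, or pull out every triple mentioning a single knot:

[drorbn@katlas ~/Data]$ zgrep Determinant Knots11.rdf.gz | grep \"1\" | wc -l
4
[drorbn@katlas ~/Data]$ zgrep "<knot:K11n34>" Knots11.rdf.gz

Since every triple sits on its own line, almost any question of the form “which knots have this value for that invariant?” comes down to a one-line grep.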

Maybe one day we’ll take advantage of the fact that these data files are actually RDF. RDF, which stands for “Resource Description Framework”, is a simple scheme for describing certain labelled oriented graphs. In particular, these are labelled oriented graphs in which edges are labelled by URLs, and vertices are labelled by either URLs or literals (strings, numbers, dates, and a few others), with the extra condition that ‘source vertices’ can only be labelled by URLs. While this sounds rather restrictive, it’s actually rather flexible.

You do need to ‘make up’ URLs to refer to the objects you’re interested in: for example, <knot:K11n42> is actually a URL, obtained simply by prefacing the Knot Atlas’s standard notation for the 42nd non-alternating knot with 11 crossings with the string “knot:”. Often when writing RDF you don’t need to make up URLs for the relationships (edges) you’re interested in, because there are lots of semi-formalised standards out there, for example DC elements (basic bibliographic information), FOAF (relationships between people), and many others. While using the common vocabularies has some nice benefits (e.g. you can often usefully merge graphs from different sources), in this case we need very particular relationships, of the form “this knot has this value for such and such an invariant”, so we make up our own URLs for these too: for example, <invariant:Determinant> above.

(You might wonder how <knot:K11n42> or <invariant:Determinant> count as “URLs”… Well, at the very top of the file there are two extra lines:

@prefix knot:  <http://katlas.org/wiki/> .
@prefix invariant:  <http://katlas.org/wiki/Invariants/> .

With these, it should be obvious how knot:K11n42 translates into http://katlas.org/wiki/K11n42, which turns out to be a nice human-readable page about the knot! Maybe one day it will also be nicely computer-readable, using RDFa to include all the data…)
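So, for example, expanding the prefixes in the determinant line for K11n34 that we grepped out earlier gives the full form (my reconstruction of the expansion; the dump itself keeps the abbreviated version):

<http://katlas.org/wiki/K11n34> <http://katlas.org/wiki/Invariants/Determinant> "1" .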

How would we make use of the data being in RDF? Well, there’s a nice RDF query language called SPARQL, which lets you search for subgraphs with some bound and some unbound labels, and presumably we could easily write something like the KnotInfo page, but perhaps even more powerful.
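For a taste (a sketch I haven’t actually run against our dump, using the prefixes declared above), the determinant-1 search from earlier would look something like this in SPARQL:

PREFIX invariant: <http://katlas.org/wiki/Invariants/>

SELECT ?knot
WHERE { ?knot invariant:Determinant "1" . }

Here ?knot is an unbound label: the query engine finds every vertex with a Determinant edge pointing at the literal "1", and should return exactly the four knots grep found above.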

3 thoughts on “Grepping the Knot Atlas”

  1. Hi Scott,

    The Knot Atlas is cool! Wow, 400MB of data? That’s amazing.

    So it seems that the issue you have is getting “semantic” data out of the wiki. I think your solution is great; it looks like it works like a bomb. But how come you don’t try Semantic MediaWiki? Perhaps because it’s not really stable yet, so you couldn’t trust it with the volume of data you’re talking about?

    The one advantage I could think of for Semantic MediaWiki is that wiki contributors would be able to write these queries inline, inside the wiki pages. E.g., want to find all the 11 crossing knots with determinant 1, and insert the list into a wiki page? Insert

    <ask>
    [[Category:knots]]
    [[crossingNumber := 11]]
    [[hasDeterminant := 1]]
    </ask>

    in the wiki page. Alternatively, go to the Special:Ask page in the wiki. A nice effect of this is that it produces links to those pages too, so the user can click through and see if there’s perhaps something special about those knots.

    Want to export some semantic data as an RDF file? Go to the Special:ExportRDF page. And so on. Anyhow, you’re well aware of this stuff; this is a techno-lam-o (i.e. me) talking to a technocrat!

    The Knot Atlas is a data lover’s dream. I think it’s a really exciting project. Wiki-wise, the stuff you guys do with bots is amazing. I haven’t yet learned how to use bots in MediaWiki.

  2. Bruce,

    you’re right that Semantic MediaWiki seems like a good solution. When we started the (wikified) Knot Atlas 2 years ago, however, neither Dror nor I knew anything about RDF or the joys of the semantic web. We were making everything up as we went along, and only ended up with the current data model after some wrangling. Perhaps if we were starting again we’d be using Semantic MediaWiki! I’m not sure we have the energy to switch now.

    I’m actually watching with great interest your EurekaJournalWatch effort, and in particular your use of Semantic Mediawiki. In the last 6 months I’ve become a big fan of RDF, so your approach tickles me the right way. :-)

    Oh, and while the RDF data dump might be 400MB, it’s actually not as big as you might think at first. The biggest chunk of that is the Dowker-Thistlethwaite codes for the 16 crossing knots, and the second biggest chunk is the HOMFLY-PT and Kauffman polynomials for the 15 crossing knots. Given some computer time and the patience to manage a herd of bots, we could easily expand it massively.
