GraphConnect Europe

I recently attended Neo Technology’s European GraphConnect conference, and it was surprising on several fronts. First, attendance: there were around 500 people there, and next year they will need a bigger hall. Given how few people know anything about graph databases yet (relatively speaking, though perhaps 500 attendees gives the lie to that argument), not to mention the dearth of other analysts, this was a surprisingly high turnout. The company also revealed that its London user group now has 2,000 members, which was another surprise.

The conference consisted of a three-track programme: an industry track, with presenters talking about the business motivation for choosing a graph database (which happens to be Neo4j); a community track dedicated to graph ecosystem partners (graph visualisation, development frameworks and so on); and practitioner sessions that were more technical. If you want to learn about graph databases this is as good a place to come as any. In this article I will focus on the latest release of Neo4j, which was discussed at the conference. I have already blogged about one of the users presenting at GraphConnect (see http://www.bloorresearch.com/analysis/new-thinking-on-spreadsheets/).

The most notable feature of Neo4j 2.2 is the introduction of a cost-based optimiser. According to the company this improves performance by as much as two orders of magnitude for appropriate queries. Unlike some of the other things mentioned in this article, this does not surprise me: there is a good reason why the optimisers in all the leading relational databases are cost-based. However, not all queries are affected by cost statistics (Neo estimates around 20%) or, at least, it is not clear which statistics are relevant in these cases. For the time being, Neo is therefore continuing to use its previous rule-based approach for these queries, with the software deciding which approach is most appropriate. Over time it is likely that statistics will be applied more and more widely. In association with the new optimiser the company has also launched a visual query capability that allows you to view (and amend, if you really want to) query plans. Previously these were hidden away in a black box.
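To make the distinction between the two approaches concrete, here is a minimal sketch in Python. It is not Neo4j's planner: the function names, the idea of picking a starting label and the node counts are all assumptions made purely for illustration. The rule-based version always starts from whatever the query names first, while the cost-based version uses statistics to start from the smallest set of nodes, and falls back to the rule when no statistics are available.

```python
def rule_based_start(labels_in_query):
    """Rule-based choice: always start from the first label in the query."""
    return labels_in_query[0]


def cost_based_start(labels_in_query, label_counts):
    """Cost-based choice: start from the label with the fewest nodes.

    Falls back to the rule-based choice when statistics are missing
    for any label, mirroring the idea of keeping both approaches.
    """
    if not all(label in label_counts for label in labels_in_query):
        return rule_based_start(labels_in_query)
    return min(labels_in_query, key=lambda label: label_counts[label])


# Hypothetical statistics: number of nodes per label (invented figures).
label_counts = {"Order": 800, "Product": 80}
query_labels = ["Order", "Product"]

print(rule_based_start(query_labels))                # Order
print(cost_based_start(query_labels, label_counts))  # Product
```

Starting from 80 nodes rather than 800 means far fewer paths to expand, which is the kind of saving that a cost-based optimiser is looking for.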

There are also major performance and scaling improvements for bulk loading and for both reads and writes: read scaling uses an in-memory page cache and write scaling a fast write buffer. For writes, the company has moved to what it calls a “unified commit”, whereby both transaction and index data are held in a single log in the buffer, and the graph and the index are then updated simultaneously at an appropriate time. This replaces the two-phase approach formerly used, in which the graph and the index were updated separately and which could cause bottlenecks.
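The idea behind a unified commit can be sketched in a few lines of Python. This is a toy model built only from the description above, not a representation of Neo4j's internals; the class name, method names and data layout are all hypothetical. Each commit writes one entry to a single log holding both the graph change and the matching index change, and a later flush applies the two together.

```python
class UnifiedLogStore:
    """Toy store with one log for both graph and index changes."""

    def __init__(self):
        self.log = []    # single buffer of pending entries
        self.graph = {}  # node id -> properties
        self.index = {}  # (property key, value) -> set of node ids

    def commit(self, node_id, properties):
        # One log entry records the graph change and the index change.
        index_entries = list(properties.items())
        self.log.append((node_id, dict(properties), index_entries))

    def flush(self):
        # Apply the graph and index updates of each entry together.
        for node_id, properties, index_entries in self.log:
            self.graph[node_id] = properties
            for entry in index_entries:
                self.index.setdefault(entry, set()).add(node_id)
        applied = len(self.log)
        self.log.clear()
        return applied


store = UnifiedLogStore()
store.commit(1, {"name": "Alice"})
store.flush()
print(store.graph[1], store.index[("name", "Alice")])
```

Because both kinds of change travel in the same entry, there is no window in which the graph has been updated but the index has not.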

For non-graph aficionados, perhaps the most interesting thing about release 2.2 is the Northwind example that now ships with the product. This will be familiar to just about every relational database user, as it is the standard set of trial data that ships with Microsoft SQL Server. Now it ships with Neo4j too and, since Neo4j is open source, you can download it and try it out. There are a couple of reasons why you might want to do this. The first is simply to see how you can explore Northwind data in ways that would be impossible, or too slow, if you were using SQL Server. This was demonstrated live at the conference and it was impressive. The second is to see how you can migrate from a relational environment to a graph environment or, at least, to Neo4j. It’s not as difficult as you might think.
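As a rough illustration of why the migration is not that difficult, the Python sketch below turns relational-style rows into nodes and relationships. It does not use Neo4j or its import tools, and the column names, sample rows and the PURCHASED relationship name are assumptions invented for the example. The essential step is that each row becomes a node and each foreign key becomes a relationship.

```python
def rows_to_graph(customers, orders):
    """Map customer and order rows to nodes and relationships."""
    nodes = {}
    relationships = []
    for row in customers:
        nodes[("Customer", row["customer_id"])] = {"company": row["company"]}
    for row in orders:
        order_key = ("Order", row["order_id"])
        nodes[order_key] = {}
        # The customer_id foreign key becomes an explicit relationship.
        customer_key = ("Customer", row["customer_id"])
        relationships.append((customer_key, "PURCHASED", order_key))
    return nodes, relationships


# Invented sample rows, standing in for two relational tables.
customers = [{"customer_id": "C1", "company": "Example Traders"}]
orders = [
    {"order_id": 1001, "customer_id": "C1"},
    {"order_id": 1002, "customer_id": "C1"},
]

nodes, relationships = rows_to_graph(customers, orders)
for start, rel_type, end in relationships:
    print(start, rel_type, end)
```

Once the foreign keys are stored as relationships, questions such as "which orders did this customer place?" become a matter of following links rather than joining tables.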