Graph Update 4: performance, scalability and Neo4j

As I have mentioned previously in this series of articles, the question I am asked most often about graph databases concerns performance and scalability. And, as Neo4j is the most widely used product in this space, it is not surprising that the question is most often asked about Neo4j. So I have been taking a look under the hood at the product’s architecture, and at how it provides, and will provide, appropriate performance and scale.

The first thing that you need to know is that Neo4j uses the Raft protocol. This is a consensus algorithm used for replicating a state machine across a cluster of computing systems. In Neo4j this is what underpins fault tolerance and related capabilities such as causal consistency (which is to be preferred – in my opinion – to eventual consistency).

That’s your central cluster, but you can then define what are known as Read Replicas that hang off the main cluster. You can have multiple such replicas, defined for different purposes. Thus, you could have one set of Read Replicas for the graph equivalent of OLAP-type queries, and others dedicated to look-up queries, to complex analytics or to operational processing. In effect, you get workload management by devolving different workloads to different Read Replicas. Note that the system knows where to route each request thanks to the metadata it holds about the configuration that you have deployed. Further, it is worth noting that you can geographically disperse your processing: Read Replicas do not have to be co-located with the central cluster.
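
To make the arrangement described above a little more concrete, here is a minimal Python sketch of the idea. It is a toy model only and not Neo4j code or the Neo4j driver API. The names `ToyCluster`, `Member`, `sync_replica`, the workload tags and the integer "bookmark" are all hypothetical, introduced purely for illustration. The sketch assumes three things: a write commits only when a majority of the central cluster acknowledges it (the Raft-style quorum), Read Replicas catch up asynchronously, and a reader who presents the bookmark returned by a write is only served by a replica that has already applied that write (one simple way to picture causal consistency).

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One server; `applied` counts the log entries it has applied so far."""
    name: str
    applied: int = 0
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical model: a core cluster plus tagged groups of read replicas."""

    def __init__(self, core_names, replica_groups):
        self.core = [Member(n) for n in core_names]
        # replica_groups maps a workload tag (an assumption of this sketch)
        # to the names of the replicas reserved for that workload.
        self.replicas = {
            tag: [Member(n) for n in names] for tag, names in replica_groups.items()
        }
        self.log = []  # committed (key, value) entries; entry i has transaction id i + 1

    @staticmethod
    def _apply(member, log, up_to):
        """Apply every log entry the member has not yet seen, up to `up_to`."""
        for key, value in log[member.applied:up_to]:
            member.data[key] = value
        member.applied = max(member.applied, up_to)

    def write(self, key, value, reachable):
        """Commit only if a majority of core members can acknowledge the write."""
        quorum = len(self.core) // 2 + 1
        acks = [m for m in self.core if m.name in reachable]
        if len(acks) < quorum:
            raise RuntimeError("no majority of core members: write rejected")
        self.log.append((key, value))
        tx_id = len(self.log)
        for m in acks:
            self._apply(m, self.log, tx_id)
        return tx_id  # the caller keeps this as a bookmark

    def sync_replica(self, tag, name):
        """Stand-in for asynchronous replication to one read replica."""
        for r in self.replicas[tag]:
            if r.name == name:
                self._apply(r, self.log, len(self.log))

    def read(self, tag, key, bookmark=0):
        """Route to the replica group for `tag`; honour the caller's bookmark."""
        for r in self.replicas[tag]:
            if r.applied >= bookmark:
                return r.name, r.data.get(key)
        # A real system would more likely wait or time out than fail at once.
        raise LookupError("no replica in this group has reached the bookmark yet")


cluster = ToyCluster(
    ["core1", "core2", "core3"],
    {"analytics": ["rr1"], "lookup": ["rr2"]},
)
bookmark = cluster.write("alice", "knows bob", reachable={"core1", "core2"})
cluster.sync_replica("lookup", "rr2")
print(cluster.read("lookup", "alice", bookmark=bookmark))
```

The point of the design is that the central cluster alone decides what is committed, while each group of replicas can lag independently and serve its own class of query. The bookmark is what stops a client from reading a replica that has not yet seen the client's own write.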

Also from a performance perspective, the company has rewritten both its browser and its label indexing in the latest release (3.2), in the latter case so that the label indexes are graph native, as opposed to being based on Lucene (which is greedy on resources). The company has also added composite indexes, a new cost-based optimiser for Cypher, and a compiled runtime capability for Cypher. Further enhancements in these areas are planned.
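
For readers unfamiliar with composite indexes, the following Python sketch illustrates in general terms why they help. It says nothing about how Neo4j implements them internally. The data, the property names `first` and `last`, and the helper functions are all hypothetical. It compares answering a two-property look-up by probing two single-property indexes and intersecting the candidates with answering it by a single probe of an index keyed on both properties together.

```python
def build_index(rows, *props):
    """Map each tuple of property values to the set of matching row ids."""
    index = {}
    for row in rows:
        key = tuple(row[p] for p in props)
        index.setdefault(key, set()).add(row["id"])
    return index


# Hypothetical sample data standing in for nodes that share one label.
people = [
    {"id": 1, "first": "Ada", "last": "Lovelace"},
    {"id": 2, "first": "Ada", "last": "Byron"},
    {"id": 3, "first": "Alan", "last": "Turing"},
    {"id": 4, "first": "Alan", "last": "Lovelace"},
]

first_idx = build_index(people, "first")
last_idx = build_index(people, "last")
composite_idx = build_index(people, "first", "last")


def lookup_with_single_indexes(first, last):
    """Two probes, then an intersection of two candidate sets."""
    candidates_first = first_idx.get((first,), set())
    candidates_last = last_idx.get((last,), set())
    return candidates_first & candidates_last


def lookup_with_composite_index(first, last):
    """One probe that goes straight to the matching ids."""
    return composite_idx.get((first, last), set())


print(lookup_with_single_indexes("Ada", "Lovelace"))
print(lookup_with_composite_index("Ada", "Lovelace"))
```

Both functions return the same ids, but the first has to materialise and intersect every "Ada" and every "Lovelace", whereas the second touches only the exact match. In Neo4j 3.x, to the best of my knowledge, the corresponding declaration takes the form `CREATE INDEX ON :Label(prop1, prop2)`.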

The company has also shared some future plans with me, though, for obvious reasons, I cannot go into too much detail. Nor can I put dates on anything I am about to discuss. There are a number of major initiatives in place, though I can only discuss two. Firstly, there will be support for multiple Raft instances, so that you can have multiple clusters within the same configuration, each with its own Read Replicas. Secondly, there are plans to make Cypher multi-threaded. This is dependent on having an optimiser whose statistics are appropriate for parallel processing. I understand that research into this subject is ongoing, in conjunction with a leading British university.

There’s more – I do not want to divulge too much – but it should be clear that there are some significant ways in which Neo is planning to improve performance and scalability. Note that these aren’t just the incremental improvements you would expect from any vendor: the plans I have outlined represent major architectural steps forward. To conclude, I think any worries about graph performance and scale – certainly with respect to Neo4j – are probably overblown and likely derive from ignorance. For example, one questioner who expressed reservations to me wasn’t aware of the support for Read Replicas. On the other hand, this does suggest that the company may not be doing enough to explain how the product performs and scales.