What is a graph database?

I have written about this before in general terms, but I now need to clarify matters. The upsurge in interest in graphs over recent months (according to db-engines.com, graph databases have seen the fastest growth in interest of any category over the last year) has led to lots of companies jumping into the market. This is not unusual with subjects that are flavour of the month (or year), but it leads to a great deal of confusion, because not all of the products are the same.

It was previously (more or less) the case that graph databases needed to be distinguished from RDF (Resource Description Framework) and triple stores: RDF and triple stores were primarily designed to support the semantic web, while graph databases were intended for more general-purpose use. From a technical point of view, RDF and triple stores do not have the inferencing capabilities that are associated with graph databases.
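The difference between simply storing triples and inferencing over them can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation: the store is a set of (subject, predicate, object) tuples, and the only "inference rule" is an assumed one, namely that a hypothetical is_a predicate is transitive. All names and data are invented for the example.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All names and data here are hypothetical illustrations.
triples = {
    ("Rex", "is_a", "Dog"),
    ("Dog", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in store
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }


def infer_transitive(store, predicate):
    """Return the store plus every triple implied by treating
    `predicate` as transitive (an assumed rule for this sketch)."""
    inferred = set(store)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for (a, p1, b) in list(inferred):
            if p1 != predicate:
                continue
            for (b2, p2, c) in list(inferred):
                new = (a, predicate, c)
                if p2 == predicate and b2 == b and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


# Without inference, only the stored fact about Rex is returned.
stored_only = match(triples, s="Rex", p="is_a")

# With inference, the implied facts are returned as well.
with_inference = match(infer_transitive(triples, "is_a"), s="Rex", p="is_a")
```

Here stored_only contains just the single stored fact that Rex is a Dog, whereas with_inference also contains the derived facts that Rex is a Mammal and an Animal. A real inference engine would support far richer rule sets than this single assumed rule.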

However, things have become more complex as a multitude of vendors have leapt onto the graph bandwagon so that we now have different products that are aimed at different market segments and which use different underlying technologies.

There are three generic use cases for graphs (or, indeed, any other database system): CRUD (create, read, update, delete) applications that are focused on transaction processing; query processing, which covers reporting, business intelligence and real-time analytics; and what we might call deep analytics (typically in batch mode) or data discovery. Different vendors in the graph market focus on one or more of these.
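As a rough illustration of how these three workloads differ, the sketch below uses a toy in-memory adjacency structure in Python. It is hypothetical and does not reflect any vendor's API: a CRUD-style write touches only the nodes involved, a query traverses outwards from a known starting point, and a deep-analytics job has to visit the whole graph. The function names and sample data are assumptions made for the example.

```python
from collections import deque

# Toy undirected graph held as an adjacency dict; names are hypothetical.
graph = {}


def add_edge(g, a, b):
    """CRUD-style write: touches only the two nodes involved."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)


def remove_edge(g, a, b):
    """CRUD-style delete: again local to the two nodes."""
    g.get(a, set()).discard(b)
    g.get(b, set()).discard(a)


def within_hops(g, start, max_hops):
    """Query: bounded breadth-first traversal from a known start node."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in g.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


def connected_components(g):
    """Deep analytics: a whole-graph scan that groups connected nodes."""
    remaining = set(g)
    components = []
    while remaining:
        root = remaining.pop()
        # len(g) hops is enough to reach every node in root's component.
        component = {root} | within_hops(g, root, len(g))
        remaining -= component
        components.append(component)
    return components


for a, b in [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("eve", "fay")]:
    add_edge(graph, a, b)

friends_of_friends = within_hops(graph, "ann", 2)  # local, interactive query
components = connected_components(graph)           # global, batch-style job
```

With this sample data, friends_of_friends is the set containing bob and cat, and components holds two groups: ann, bob, cat and dan in one, eve and fay in the other. The point is only that the cost of the last operation grows with the size of the whole graph, which is one reason products tend to specialise.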

However, target markets are not the only thing that distinguishes vendors: the underlying database technology differs too. While there are suppliers that have a genuine graph database underlying their graph solution, there are now significant numbers of vendors that do not.

Thus the market breaks down into the following groupings:

Graph databases aimed at CRUD and analytic applications, such as Neo4j. These may or may not be ACID compliant: Neo4j is, some others aren’t.
Graph solutions aimed at the above but which are not based on a graph database per se. An example is MarkLogic, which is ACID compliant but whose underlying technology is essentially an XML database with graph indexes.
Graph databases aimed at data discovery. As noted, these are essentially batch environments. An example is Giraph.
Graph databases aimed at data discovery and complex analytics that will operate in real time. An example here is YarcData, which uses in-memory technology to support non-batch operations.
Graph solutions targeted at data discovery that do not use a graph database for storage. Examples here are Pregel and Teradata Aster SQL-GR.
Graph solutions/databases based on hybrid databases that use multiple storage engines, one of which is graph-based, such as IBM’s DB2. However, DB2 doesn’t currently have an inference engine, so for now it can only be described as a triple store. This will change. Note that Teradata Aster has multiple storage engines but none of these is actually graph-based.

You can see how this could be (is!) confusing. Sometime in the spring I plan to write a detailed paper or papers on the different elements of the graph market, but until then you should be aware that the graph database market is by no means homogeneous, and it will be important to understand exactly what any particular vendor means when it refers to graphs.