Analyst Coverage: Philip Howard
A graph database is one that stores data in terms of entities and the relationships between entities. A variant on this theme are RDF (resource description framework) databases which store data in the format subject-predicate-object, which is known as a triple.
There are three types of graph database: true graph databases, triple stores and conventional databases that provide some graphical capabilities. Triple stores are often referred to as RDF databases. The difference between a true graph product and a triple store is that the former supports index free adjacency (which means you can traverse a graph without needing an index) and the latter doesn’t. The former are designed to support property graphs (graphs where properties may be assigned to either entities or their relationships, or both) but recently some triple stores have added this capability.
Both graph and RDF databases may be native products or they may be built on top of other database types. Most commonly, other database types are forms of NoSQL database though there are some relational implementations.
RDF databases target semantic processing, often with the ability to combine information across structured and unstructured data. Both graph and RDF databases may be ACID compliant and both are frequently targeted at transactional environments. All graph products target analytics but different products are targeted at operational analytics (those suitable for transactional environments) as opposed to data warehousing analytics. In this last category there is also a distinction between vendors targeting known-known problems as opposed to those that also cover known-unknowns and those tackling unknown-unknowns: the most intractable of all.
Graph databases handle a class of issues that are too structured for NoSQL and too diverse for relational technologies. In the latter case, relational databases are inherently limited to one-to-one, many-to-one and one-to-many relationships. They do not cater well for problems (such as bill of materials – a classic case) that are many-to-many. For these types of requirements graph databases not only perform way better better than relational databases but they allow some types of query that are simply not possible otherwise. Semantic query support tends to be particularly strong in triple stores.
Another major point is that research suggests that graph visualisations are very easy and intuitive for users. It is also worth noting that many (not all) graph products are schema-free. This means that if you want to change the structure of the environment you simply add a new entity or relationship as required and do not have to explicitly implement a schema change. This is a major advantage over relational databases.
This market is emerging and there are many open source projects and vendors, many of which will not ultimately survive. Nevertheless there are still new products coming on to the marketplace. Conversely, there are companies that have been in this space for more than a decade, so the technology is not entirely new. One noticeable trend is for triple store vendors to add support for property graphs.
One major issue that has yet to be finalised is with respect to language support. SPARQL (SPARQL protocol and RDF query language) is a W3C standard and is a declarative language but by no means all vendors support it. There are potential rivals to SPARQL and one or two vendors have extended SQL rather than adopt SPARQL. As a graph traversal language Gremlin is also popular.
Oracle and Teradata are both examples of major companies that have implemented graph capabilities on top of existing relational technology. IBM, on the other hand, having implemented triple store and SPARQL support in DB2, is not developing this further. It is, however, embedding third party graph technology into a number of its products. Other vendors such as Informatica are also OEMIng third party graph database products. Actian has recently introduced a graph database into its Actian Analytics Platform and MarkLogic has just introduced native graph (and JSON) storage into its database.
Some consolidation in this space is already occurring. For example, Experian acquired 4Store (now for internal use only) and, most recently, DataStax has acquired Aurelius, the developer of the Titan graph database. We can expect further such acquistions in the future.
Despite all of this attention the market is dominated by Neo4J and OntoText (GraphDB), which are graph and RDF database providers respectively. These are the longest established vendors in this space (both founded in 2000) so they have a longevity and experience that other suppliers cannot yet match. How long this will remain the case remains to be seen.