Graph update: Ontotext GraphDB

This is the second of a series of articles discussing recent developments in the Graph and RDF database space. In this piece, I am going to focus on the recent releases (8.0 last December, and 8.1 just recently in April) of GraphDB from Ontotext, which is a semantic (RDF) graph database.

There are probably two major themes in these releases that are worth discussing: ease of use and performance, each of which involves several new features. Given the limited space available to me, I will skip over other additions such as support for Docker images, enhanced monitoring, health checks, and extended support for Elasticsearch.

As far as ease of use is concerned, one of the things that Ontotext has recognised is that RDF is perceived to be relatively difficult to use: it’s not as intuitive as JSON, for example. So, the company has set out to rectify this and has introduced new capabilities into GraphDB to that end. Firstly, the company redesigned the GraphDB Workbench interface to make it more intuitive and, in the latest release (8.1), it has gone further by introducing a new visual interface that allows you to navigate your graph without having to write SPARQL queries. This is not dissimilar to the sort of facility you might get from a graph database such as Neo4j. Further, while on the subject of SPARQL, SPARQL views are now easier to use: you can switch between different query results and preserve these for reuse.

The other major ease of use feature, introduced in 8.0, was the launch of OntoRefine. This is a fork of the open source OpenRefine (www.openrefine.org) project, which was formerly known as Google Refine. The aim of OpenRefine is to “clean up messy data”. Some sources refer to this as data wrangling, but that is a term specifically associated with data lakes and data scientists, whereas there are all sorts of environments where you want to move, transform and cleanse data from spreadsheets, CSV files, JSON documents, XML and so forth. OntoRefine allows these types of data to be queried by means of SPARQL in order to construct relevant graphs.
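To make the idea of turning tabular data into a graph more concrete, here is a minimal sketch in plain Python. It is not OntoRefine's interface or implementation: the sample CSV, the `http://example.org/` base URI and the `rows_to_triples` helper are all assumptions made purely for illustration. It simply shows how each row of a spreadsheet-style file can be read as a set of subject, predicate, object statements.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative only.
CSV_TEXT = """id,name,city
1,Alice,Sofia
2,Bob,London
"""

# Assumed base URI for minting identifiers (not taken from any real dataset).
BASE = "http://example.org/"


def rows_to_triples(csv_text, key_column="id"):
    """Turn each CSV row into (subject, predicate, object) triples.

    The key column identifies the subject; every other non-empty cell
    becomes one statement about that subject.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"{BASE}person/{row[key_column]}"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and empty cells
            triples.append((subject, f"{BASE}{column}", value.strip()))
    return triples


triples = rows_to_triples(CSV_TEXT)
for triple in triples:
    print(triple)
```

Each data row here yields two statements (a name and a city), so a two-row file becomes a four-statement graph. A real tool would also handle data types, cleansing rules and vocabulary mapping, which this sketch deliberately leaves out.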

I often get asked about the performance of RDF and graph databases, and this is an issue that is clearly recognised by Ontotext. In the latest release (8.1), the company has introduced three major new features in order to improve performance: faster writes, parallel inferencing and optimised named graph indexes. The first of these doesn’t require much description; it’s the sort of enhancement you might expect any company to make. According to Ontotext’s own benchmarks (see http://ontotext.com/products/graphdb/benchmark-results), faster writes result in a 35% performance improvement for smaller SPARQL queries and 60% for large file imports. Parallel inferencing is similarly supposed to speed up queries by around 60%. This is multi-core parallelism, so you will continue to get improved performance as you add more cores.

However, perhaps the most interesting of the performance upgrades are the new named graph indexes, which replace what was in the product previously. Without going into too much detail, these might better be described as “context” indexes. So, for example, a context might be the source of the data you are investigating, or its business “owner”. These indexes will therefore be useful in speeding up queries where this context is relevant (which it won’t always be). Governance-related queries would be one obvious example.
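As a rough illustration of why an index keyed on context can help, consider the sketch below. It is a toy model in plain Python, not a description of how GraphDB stores or indexes data: the quads, the `ex:` names and both helper functions are hypothetical. Each statement carries a fourth element, its context, and the index groups statements by that element so that a context-restricted lookup does not have to scan everything.

```python
from collections import defaultdict

# Hypothetical quads: (subject, predicate, object, context).
# Here the context records which source system supplied the statement.
QUADS = [
    ("ex:acme", "ex:revenue", "10", "ex:source/crm"),
    ("ex:acme", "ex:sector", "retail", "ex:source/erp"),
    ("ex:bolt", "ex:revenue", "7", "ex:source/crm"),
]


def scan_for_context(quads, context):
    """Baseline: examine every quad and keep those in the given context."""
    return [(s, p, o) for s, p, o, c in quads if c == context]


def build_context_index(quads):
    """Group statements by context so a lookup touches only relevant ones."""
    index = defaultdict(list)
    for s, p, o, c in quads:
        index[c].append((s, p, o))
    return index


def statements_in_context(index, context):
    """Indexed lookup: return the statements recorded for one context."""
    return index.get(context, [])


index = build_context_index(QUADS)
crm_statements = statements_in_context(index, "ex:source/crm")
print(crm_statements)
```

Both approaches return the same statements; the difference is that the indexed lookup goes straight to the relevant group. In SPARQL, this kind of restriction is normally expressed with the `GRAPH` keyword, and it only pays off when the query actually names a context.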

Finally, it is worth commenting that Ontotext is increasingly emphasising business solutions rather than just being a database technology company. Its current focus is on publishing (especially online) and healthcare, and it also sees potential in the financial and insurance sectors.