Enthusiasm is the SPARQL in your eyes

Content Copyright © 2014 Bloor. All Rights Reserved.
Also posted on: The IM Blog

As there have been several (usually misspelled) searches on the Bloor Research web site since SPARQL City released SPARQLverse recently, it is probably appropriate that I write something about it. So, SPARQLverse is a graph-based analytics platform: what some people refer to as a graph compute engine.

The company has some serious pedigree. SPARQL City was founded by Barry Zane, previously of Netezza (now IBM) and ParAccel (now Actian), along with colleagues from each of those companies. Actian is a major investor in the company, which is why SPARQLverse is available as part of the Actian Analytics Platform as well as directly from SPARQL City. There is a free-to-download, single-node version of the product available so that you can try it out.

SPARQLverse is a massively parallel processing (MPP), in-memory computing platform for doing serious analytics. It is not a graph database per se, in the sense that the company has not built a graph storage engine. In this first release it uses the Hadoop Distributed File System (HDFS) as its storage engine, though other options are likely to be added in due course. The product uses SPARQL (SPARQL Protocol and RDF Query Language) to query the data in the database. SPARQL, of course, is a declarative language, so one of the important elements in a product such as this one is the quality of the query optimiser. While I haven’t looked at it in detail, this is an area where the background of the various people involved with SPARQLverse is important: these are folks with a deep history in analytics who understand the importance of good optimisation, so I wouldn’t expect to be disappointed in this regard.
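To illustrate what "declarative" means here, the following is a minimal Python sketch, not SPARQLverse code. A query is stated as triple patterns with variables, and the engine decides how to match them. The data, the `match` helper and the naive left-to-right evaluation order are all assumptions made for illustration; a real optimiser would choose the order itself.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
    ("carol", "worksFor", "acme"),
]


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns.

    Patterns are evaluated naively, left to right; choosing a better
    order is the job of a query optimiser.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Where does someone known by a friend of alice work?
query = [
    ("alice", "knows", "?x"),
    ("?x", "knows", "?y"),
    ("?y", "worksFor", "?org"),
]
results = list(match(query, TRIPLES))
```

The query says only what to find, not how to find it. Reordering the three patterns would return the same bindings while changing how many intermediate matches are explored, which is where the quality of the optimiser matters.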

The one issue that I do have is with respect to partitioning the data. SPARQLverse automatically collocates data as it sees appropriate, but you can override the sharding policy. Overriding it can potentially reduce the inter-node traffic that slows down performance, but the less you know about the data in advance, the more difficult this becomes. Note that this isn’t an issue that is specific to SPARQLverse: it applies to all graph compute engines. Some companies in this space are working with BSP (bulk synchronous parallel) as an add-on to MPP architectures as a way around this issue, while others are using advanced algebra, but SPARQL City is relying on the speed of the interconnect (and its optimiser), which is currently running at dual-10Gbit but with plans to move to dual-40Gbit.
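The effect of the sharding policy on inter-node traffic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SPARQLverse's collocation logic: the edge list, the round-robin placement and the manual override are all hypothetical, and the number of edges that cross nodes stands in for interconnect traffic.

```python
# Hypothetical graph: two triangles (a-b-c and d-e-f) joined by one edge.
EDGES = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("d", "e"), ("e", "f"), ("f", "d"),
    ("a", "d"),
]


def round_robin(vertices, nodes):
    """Data-agnostic placement: deal vertices out to nodes in turn."""
    return {v: i % nodes for i, v in enumerate(vertices)}


def cross_node_edges(edges, placement):
    """Count edges whose endpoints live on different nodes."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


vertices = sorted({v for edge in EDGES for v in edge})

# Policy 1: placement that knows nothing about the shape of the data.
naive = round_robin(vertices, 2)

# Policy 2: a manual override that keeps each triangle on one node.
manual = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

naive_cut = cross_node_edges(EDGES, naive)    # 5 of 7 edges cross nodes
manual_cut = cross_node_edges(EDGES, manual)  # only the a-d edge crosses
```

The override wins here only because the structure of the graph was known in advance, which is exactly the difficulty described above.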

With respect to SPARQL itself, it is worth commenting that, despite the fact that most products in this space (triple stores, graph databases per se and graph compute engines) typically support SPARQL, the vast majority of users do not use it. This is because the current W3C standard is very limited. For example, you can’t add a time stamp to a sensor reading. To resolve this issue in particular, SPARQL City has extended its version of SPARQL to include predicate attributes, and it has submitted this capability back to the community to be included in the next version of the standard. If SPARQL is to become a standard comparable to SQL or XQuery then it needs this sort of contribution. As a further point, it is also worth commenting that, because most people don’t use SPARQL, most vendors don’t have much in the way of a database optimiser for SPARQL, so SPARQLverse has a further edge in this regard.
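The time stamp example can be made concrete. A plain RDF statement has only three slots (subject, predicate, object), so a qualifier such as a time stamp is usually modelled through an intermediate node. The Python sketch below contrasts that workaround with an edge that carries attributes directly. All names are made up, and the `AttributedEdge` type is purely illustrative: it is not SPARQL City's syntax, which is not described here.

```python
from dataclasses import dataclass, field

# Workaround in plain triples: introduce an intermediate "reading" node
# so that the value and its time stamp can both hang off it.
plain_triples = [
    ("sensor1", "hasReading", "reading42"),
    ("reading42", "value", "21.5"),
    ("reading42", "takenAt", "2014-05-01T12:00:00Z"),
]


@dataclass
class AttributedEdge:
    """Hypothetical edge whose predicate carries its own attributes."""
    subject: str
    predicate: str
    obj: str
    attributes: dict = field(default_factory=dict)


# The same fact as a single edge with an attribute on the predicate.
edge = AttributedEdge(
    "sensor1", "temperature", "21.5",
    attributes={"takenAt": "2014-05-01T12:00:00Z"},
)


def timestamp_from_triples(triples, sensor):
    """Two hops: sensor -> reading node -> time stamp."""
    readings = [o for s, p, o in triples if s == sensor and p == "hasReading"]
    return [o for s, p, o in triples if s in readings and p == "takenAt"]


def timestamp_from_edge(e):
    """One lookup: the time stamp sits on the edge itself."""
    return e.attributes["takenAt"]
```

In the triple form, every query for a time stamp needs an extra join through the reading node; in the attributed form, the qualifier travels with the statement it describes.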

So much for the details: what about the overall package? Readers will know that I am a fan of graph databases in general. The truth is that, in the analytics space, this is the first major play. There are lots of products in the market from vendors you’ve never heard of but there aren’t many from mainstream vendors. Those that are available from names you have heard of are either going to be very expensive or non-specific (graph access is just one choice out of many). Moreover, a number of leading vendors are currently sitting on the fence as to whether to get into this space or not. So, congratulations to SPARQL City and Actian for realising the potential of graphs and stealing a march on their competitors.

This Post Has 2 Comments
    1. I didn’t say people don’t use it: I said that the vast majority don’t. If you add up all the users of the vendors that have their own declarative languages (Neo4j, LexisNexis, ArangoDB, SPARQL City et al.) and then add in the numbers that could use SPARQL but choose to use Gremlin or some other non-declarative language, then SPARQL is in a significant minority.
