Data virtualisation: the next generation

For a long time, the market for data virtualisation – what we used to call data federation – was pretty static. Oh, there have been movements (Composite Software got bought by Cisco, for one), but by and large it has been the same old, same old, and Denodo has more or less been left in a field of one. But the times they are a-changin’.

There are actually a couple of things going on. The first is that traditional database vendors are starting to build data virtualisation directly into their products. IBM, for example, has brought Federation Server (the very first product in this category) back into DB2, where it started. Similarly, version 6 of Exasol, due for release next month, has data virtualisation built into it. Why are these companies doing this? To support the logical data warehouse (LDW). I would expect this to become commonplace. There are probably other vendors already doing this whose announcements I have simply missed. And if they haven’t done so yet, they soon will.

However, there’s another side to this. The sort of data federation supported by relational database vendors can be expected to be strong when it comes to federating queries across relational data sources that can be accessed via SQL (and probably SQL access to Hadoop), but weaker when it comes to other sorts of data such as JSON documents, video files, audio and so forth. To address this problem, a number of graph database vendors – notably MarkLogic, Stardog and Virtuoso – are specifically positioning themselves as “unification” platforms, which provide data virtualisation across multiple data types.

The first question is: why do these graph guys call themselves unification vendors rather than data virtualisation suppliers? There are three reasons. Firstly, they actually provide a database of their own: it’s not just a middleware layer. Secondly, and I think this is the main reason, you can use the graph database to map the federated environment. Consider an ecosystem where you have perhaps hundreds – even thousands – of data sources which you may want to access and query. You actually need a map – literally, a map – of the landscape, so that you can navigate around it. Moreover, you need to know what sort of a source each database is (is it relational, a key-value store, a document store, a property graph database and so on?) because you need to know how to access it. And thirdly, data virtualisation and data federation are both terms associated with analytics and queries. Unification platforms address this market, but they also address operational environments where you may want to bring together multiple resources in real time.
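To make the “map of the landscape” idea a little more concrete, here is a minimal sketch in Python. It is not how MarkLogic, Stardog or Virtuoso actually work: the source names, the links between them and the access-method labels are all invented for illustration. The point is only that once sources are nodes tagged with their kind, you can both navigate between them and look up how each one needs to be accessed.

```python
from collections import deque

# Hypothetical catalogue: each node is a data source tagged with its kind.
SOURCES = {
    "orders_db": "relational",
    "session_cache": "key-value",
    "product_docs": "document",
    "social_graph": "property-graph",
}

# Hypothetical edges: sources assumed to share a key that lets you hop between them.
LINKS = {
    "orders_db": ["product_docs", "session_cache"],
    "product_docs": ["orders_db"],
    "session_cache": ["orders_db", "social_graph"],
    "social_graph": ["session_cache"],
}

# Illustrative labels only: how you would talk to each kind of source.
ACCESS_METHOD = {
    "relational": "SQL",
    "key-value": "get/put by key",
    "document": "document query",
    "property-graph": "graph traversal",
}


def route(start, goal):
    """Breadth-first search for the shortest chain of linked sources."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in LINKS.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the two sources are not connected


def access_plan(path):
    """Pair each source on the route with the way it has to be accessed."""
    return [(name, ACCESS_METHOD[SOURCES[name]]) for name in path]


if __name__ == "__main__":
    path = route("product_docs", "social_graph")
    for name, method in access_plan(path):
        print(f"{name}: {method}")
```

With four sources a dictionary is plenty; the argument for a real graph database only starts to bite when the catalogue runs to hundreds or thousands of nodes.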

One final point is that, historically, data virtualisation tools have worked by building a virtual database view across your various data sources. IBM Federation Server certainly works that way and I expect that Exasol will too. The problem is that this approach becomes unwieldy as the number of data sources you have grows. A graph, in this context, is much easier to visualise.
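For readers who have not seen the “virtual view” approach, the following is a toy stand-in using Python’s built-in sqlite3 module. It says nothing about how IBM Federation Server or Exasol implement federation; the two attached in-memory databases (crm and billing), their tables and the view name are all assumptions made up for the example. It simply shows a single view that joins data living in two separate databases.

```python
import sqlite3

# One connection plays the role of the virtualisation layer.
conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for two independent sources.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")

conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# SQLite only lets TEMP views reference tables in other attached databases,
# so the cross-source view is created as a temporary one.
conn.execute(
    """
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    """
)

# Callers query the view without knowing where the data actually lives.
rows = conn.execute(
    "SELECT name, total FROM customer_totals ORDER BY name"
).fetchall()

if __name__ == "__main__":
    for name, total in rows:
        print(name, total)
```

Two sources and one view are easy to hold in your head. Every additional source means more attachments and more hand-written joins, which is where the unwieldiness comes from.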

So, for everything there is a season and for traditional data virtualisation as we used to know and love it, it is now Autumn. Winter is coming. Who will stand up to the White Walkers? Sorry, mixed metaphors.