Calpont InfiniDB version 4.0

Calpont, one of the last data warehousing vendors founded this century to remain independent, has just released version 4.0 of its InfiniDB product. From a general point of view, what sets InfiniDB apart is its tiered architecture. It is, of course, columnar, but that’s pretty much de rigueur nowadays. What is interesting is the way that incoming SQL is broken down into low-level primitives in the user modules (the top tier). Those primitives are then distributed (parallelised) across performance modules (the middle tier), which access shared storage (the bottom tier). In other words, this is very similar to the way that MapReduce works, except within a conventional massively parallel environment.
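The scatter-gather pattern described above can be sketched in a few lines. This is an illustration of the general idea only, not Calpont’s code. The data, the function names (`scan_filter_sum`, `user_module_query`) and the use of threads to stand in for performance modules are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column data on "shared storage": each extent holds two
# parallel columns, (region, amount), that one performance module would scan.
EXTENTS = [
    (["EU", "US", "EU"], [10, 20, 30]),
    (["US", "US", "EU"], [5, 15, 25]),
    (["EU", "APAC", "US"], [1, 2, 3]),
]

def scan_filter_sum(extent, region):
    """Low-level primitive (performance-module role): filter one column, sum another."""
    regions, amounts = extent
    return sum(a for r, a in zip(regions, amounts) if r == region)

def user_module_query(extents, region, workers=3):
    """User-module role: fan the primitive out in parallel, then merge partial sums.

    Roughly equivalent to: SELECT SUM(amount) FROM sales WHERE region = ?
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(scan_filter_sum, extents, [region] * len(extents))
    return sum(partials)

print(user_module_query(EXTENTS, "EU"))  # 40 + 25 + 1 = 66
```

The merge step is trivial here because a sum of partial sums is still a sum. Aggregates such as averages would need each module to return a (sum, count) pair instead.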

Also interesting about InfiniDB is that it looks like MySQL to the end user (or, more particularly, to the database administrator), although the latest release adds support for SQL 2003 features such as windowing functions, which run in-database. This is the first time that Calpont has extended beyond what is normally available in MySQL environments, and it is a useful extension because it supports calculations such as moving averages. While on the subject of MySQL, it is interesting to note that Calpont has seen a significant number of organisations move from MySQL to a combination of InfiniDB and MongoDB (with which InfiniDB integrates).
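To make the moving-average example concrete, the sketch below computes what a standard SQL window such as `AVG(visits) OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` returns. The point is the semantics of the window; it is not a claim about how InfiniDB evaluates it internally, and the visit figures are made up.

```python
from collections import deque

def moving_average(values, window):
    """Trailing moving average with the semantics of
    AVG(v) OVER (ORDER BY t ROWS BETWEEN window-1 PRECEDING AND CURRENT ROW).
    Early rows average over however many rows exist so far, as SQL does."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the row that slid out of the frame
        out.append(total / len(buf))
    return out

daily_visits = [100, 120, 90, 150, 130]  # hypothetical data, ordered by day
print(moving_average(daily_visits, 3))
```

Running the window in-database means this per-row calculation happens next to the data, rather than after the whole result set has been shipped to a client tool.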

Finally, again from a general point of view, InfiniDB uses metadata (not unlike the way IBM Netezza uses zone maps) to avoid having to perform full table scans.
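The general zone-map technique is easy to illustrate: keep the minimum and maximum value of each block of a column, and skip any block whose range cannot satisfy the predicate. The sketch below is a generic version of that idea; the block layout, the block size and the function names are assumptions for illustration, not InfiniDB’s actual metadata format.

```python
def build_extents(column, extent_size):
    """Split a column into fixed-size blocks and record (min, max, values) for each."""
    extents = []
    for start in range(0, len(column), extent_size):
        chunk = column[start:start + extent_size]
        extents.append((min(chunk), max(chunk), chunk))
    return extents

def scan_between(extents, low, high):
    """Evaluate `low <= v <= high`, reading only blocks whose range can match.
    Returns (matching values, number of blocks actually scanned)."""
    matches, scanned = [], 0
    for lo, hi, chunk in extents:
        if hi < low or lo > high:
            continue  # metadata proves no row here can match, so skip the read
        scanned += 1
        matches.extend(v for v in chunk if low <= v <= high)
    return matches, scanned

# Hypothetical order dates stored as day numbers, loaded in date order.
order_day = list(range(1, 101))
extents = build_extents(order_day, 10)
rows, scanned = scan_between(extents, 42, 47)
print(rows, scanned, "of", len(extents))  # [42, 43, 44, 45, 46, 47] 1 of 10
```

The pruning only pays off when values are clustered, as with data loaded in date order. If the same values were randomly shuffled, almost every block’s min/max range would span the predicate and little could be skipped.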

All good stuff. What about the latest release?

Well, apart from the (statistical) windowing, there is extended cloud (Amazon) support, the removal of open source (GPL v2) restrictions and extended Hadoop/HDFS support. As far as licensing is concerned, there are no longer any SQL syntax restrictions, nor any performance or scalability restrictions. What you get for a subscription, apart from support and so forth, is a set of utilities for re-balancing and tuning (things like compression ratios) and a diagnostics utility. Previously the open source version had limited scalability and no compression, which meant that you had to reload the data if you wanted to move to an enterprise licence after trying the software out. Now you won’t have to.

However, the most interesting thing, as far as I am concerned, is that InfiniDB can now run on top of HDFS. The company already had a Hadoop connector, but running on top of HDFS is another matter altogether. What this means is that technologies such as Hive, Impala or Stinger are not your only choices if you want to run sophisticated SQL analytics on top of HDFS. Not only does Calpont offer a richer SQL syntax than any of these, it has also run real-world benchmarks comparing InfiniDB on HDFS with Impala, using the Piwik data set (Piwik is a web analytics package similar to Google Analytics). InfiniDB performed between one and two orders of magnitude better (38 times better on average), and you would expect an even bigger margin against, say, Hive.

These are pretty significant numbers and, bearing in mind Calpont’s open source model, they are a very good reason for looking at InfiniDB as an alternative to Impala or any of the other Hadoop SQL initiatives.