When you get a press release, as I did recently, from a company you've never heard of, about a product you know equally little about, claiming that the database in question can process "big data" with five to seven orders of magnitude better performance than traditional databases, my immediate reaction is to wonder what they have been smoking.
However, you can't ignore these things, particularly when the company has published benchmarks that have been certified at Purdue University. It turns out that this is more about "lots of data" than "big data" in the sense of unstructured content (though it would certainly include machine-generated data), but the performance does seem to be genuine.
So, what is the Ancelus database and why have I never heard of it despite the fact that development started in 2003 and it went live, albeit in a limited number of environments, in 2007? Well, partly because it has spent a further four years in development and testing within those environments so that it is now really robust and ready to be launched on an unsuspecting world.
But, you might ask, how can a company spend eight or more years on the development of a product without actually bringing it to market? Where's the cashflow to sustain such a prolonged development cycle?
And the answer is that Ancelus is, in fact, a third generation database from the same developers, and the previous generation product is still earning significant revenues. The first generation product was created, as an in-memory relational database, in the early-to-mid 80s. This was re-developed in the late 80s as ERDB. Again, ERDB (like Ancelus) is an in-memory database, but an entity-relational rather than a relational one. Readers of my recent series on graph databases will recognise the similarity.
ERDB is one of those under-the-radar success stories that nobody much knows about, primarily because it is mostly used as an embedded database within specific devices or applications. It is widely used, for example, by Telcos.
There are a few things to note about ERDB. First, each entity is stored exactly once in the database (rather like tokenisation, so there is only one instance of "Chicago", for example, except that you get this for free in ERDB) and it is the relationships that tell you that Chicago is relevant in any particular instance. Second, the vendor cites typical compression rates of 80% and says that it will sometimes be as much as 90%. Third, you don't need indexes because the database is effectively self-indexing, rather in the way that column-based databases don't need indexes. Fourth, the database always enforces referential integrity - you can't turn it off - but without the performance hit you get with a relational store. Fifth, the database is actually 4NF compliant, although the actual data is de-normalised for performance reasons. And, finally, there is a SQL interface for ERDB.
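To make the store-each-entity-once idea concrete, here is a minimal Python sketch of value interning with relationships layered on top. This is an illustration of the general technique (much like string tokenisation), not ERDB's actual implementation; the class and method names are my own invention.

```python
# Hypothetical sketch of single-instance entity storage: each distinct
# value is held exactly once and rows hold references (ids) to it.
class EntityStore:
    def __init__(self):
        self._values = []   # each distinct entity stored exactly once
        self._index = {}    # value -> id; doubles as a "self-index"

    def intern(self, value):
        """Return the id for value, storing it only if it is new."""
        if value not in self._index:
            self._index[value] = len(self._values)
            self._values.append(value)
        return self._index[value]

    def lookup(self, value):
        """Find the id of an already-stored value (or None)."""
        return self._index.get(value)

    def resolve(self, entity_id):
        """Turn an id back into the stored value."""
        return self._values[entity_id]


store = EntityStore()
# Two rows that both mention "Chicago" share one stored instance;
# the relationships (the row tuples) say where "Chicago" is relevant.
rows = [
    (store.intern("Chicago"), store.intern("O'Hare")),
    (store.intern("Chicago"), store.intern("Midway")),
]
assert rows[0][0] == rows[1][0]                 # same id, stored once
assert store.resolve(rows[0][0]) == "Chicago"
```

Because the index maps every value directly to its single stored instance, there is nothing separate to build or maintain - which is one way a database of this shape can be "self-indexing".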
So, if ERDB is so great, why develop a new generation product? The simple answer is that ERDB was originally created to run on 8-bit processors. It has gradually been upgraded to support 16, 32 and 64-bit processors, but that has inevitably led to compromises, hence the development of Ancelus, a pure 64-bit implementation with intra-core parallelism built in. The system is MPP-based and offers continuous availability (that is, no taking the system down for any - note, any - administrative function). The only significant feature that ERDB has and Ancelus does not is a SQL interface, though the company assures me that the two products are similar enough that it would be very easy to port the SQL interface should anyone require it (so far, no-one has).
As far as scalability is concerned, you can have as little as a few GB of memory or as much as you can afford. To give an indication of performance at scale: the benchmark tests, running on a single-CPU, six-core Linux server, processed 1.3 billion transactions per minute (tpm) against a 100 million row table, 1.2 billion tpm against 200 million rows and 1.1 billion tpm against 1 billion rows. These figures are way beyond TPC-C results, even if a direct comparison between the two is difficult. A couple of other figures from the benchmarking exercise are also worth noting. First, the total time that data was locked during the insertion of a new column into a live database was just 8 microseconds, regardless of the size of the database. Second, for a three-table join, first return was between 50 and 65 microseconds and task completion between 2.7 and 6 milliseconds, depending on the number of rows in the database. These are very impressive figures.
From a marketing perspective, Ancelus is sold through distributors in Europe (by Celeram: www.celeram.com, or see www.ancelus.com) and in the States, and the setup is primarily partner-oriented rather than that of a conventional database organisation, whether for OLTP or warehousing (the product is suitable for either). If you are looking for a platform on which to build applications, either for internal use or to be marketed in a particular vertical, then you could do worse than to take a look at it.