DATAllegro: can it do what it claims, or is it all hype?

Towards the end of last year I wrote an article about the future of data warehouse appliances, asking whether it was a boom or bust. Since then a number of the vendors mentioned in that article have been in touch with me, plus one other company that I am not at liberty to name. Most interestingly, I have spent an extended time with DATAllegro, looking into the product’s architecture and examining how (and if) it can do what it claims it can do.

Before I reveal my findings I should tell you about the company’s claims. To begin with, it claims to offer superior performance to the likes of Teradata and IBM, and a still greater advantage over Oracle or Microsoft (a logical corollary of the first statement). Typically, such performance benefits are measured in orders of magnitude. This isn’t difficult to believe, for a number of reasons.

First, there are referenceable customers who will attest to it (although they have not yet been publicly announced); second, Netezza has already proved that performance gains of this magnitude are achievable; and third, the architecture of the product makes such claims realistic.

So, the performance gains are dramatic but they are no more, or not much more, than other appliance vendors are claiming. No, the really startling thing about DATAllegro is that it is claiming to provide 10-100x performance gains at a tenth of the price or less.

The big question is whether this is actually credible. Combining a 10-100x performance gain with a tenfold price cut implies price/performance that is a hundred to a thousand times better, which is a huge step change. Can it really be true?
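To make the arithmetic behind that range explicit, here is a minimal sketch. The function name and constant are my own, and it takes "a tenth of the price" at face value as a price reduction factor of exactly 10; since the claim is "a tenth of the price or less", the results are lower bounds.

```python
def price_performance_gain(performance_gain: int, price_reduction: int) -> int:
    """Combined improvement: N times faster at 1/M of the price is N * M."""
    return performance_gain * price_reduction


# Assumption: "a tenth of the price" taken as a reduction factor of exactly 10.
PRICE_REDUCTION = 10

low = price_performance_gain(10, PRICE_REDUCTION)    # 10x faster end of the claim
high = price_performance_gain(100, PRICE_REDUCTION)  # 100x faster end of the claim

print(f"price/performance improvement: {low}x to {high}x")
```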

Yes, I think it can. Take the slave units in DATAllegro’s architecture. Each consists of one or two Intel processors, some memory, the Ingres open source database and two RAID arrays of six disks each. This is basically the same approach as taken by other appliance vendors, with one exception: the others typically have only one or two disks per slave. While there are other sources of cost cutting, this is the big one: you need far fewer slave units.
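The effect of disks per slave on unit count can be sketched as follows. The total of 120 disks is a purely hypothetical warehouse size chosen for illustration, and the comparison counts raw disks only: it ignores RAID redundancy overhead and any difference in disk capacity, so treat it as a rough indication rather than a sizing method.

```python
import math


def slave_units_needed(total_disks: int, disks_per_unit: int) -> int:
    """Number of slave units required to house a given number of disks."""
    return math.ceil(total_disks / disks_per_unit)


# Hypothetical figure, not taken from any vendor: disks the warehouse needs.
TOTAL_DISKS = 120

# From the description above: two RAID arrays of six disks each per slave.
DATALLEGRO_DISKS_PER_UNIT = 2 * 6

datallegro_units = slave_units_needed(TOTAL_DISKS, DATALLEGRO_DISKS_PER_UNIT)
two_disk_units = slave_units_needed(TOTAL_DISKS, 2)  # other vendors, two disks per slave
one_disk_units = slave_units_needed(TOTAL_DISKS, 1)  # other vendors, one disk per slave

print(datallegro_units, two_disk_units, one_disk_units)
```

Under these assumptions the same number of disks needs six to twelve times fewer slave units, and each unit saved also saves its processors, memory and chassis.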

This raises another question, of course: how can DATAllegro manage two RAID arrays per slave and still maintain its performance? The answer is that it has done some clever things with the partitioning in Ingres, so that it supports sequential I/O (as does Netezza) rather than random I/O. This, incidentally, is the basic secret behind warehouse appliances, along with the fact that you only need an SMP box at the front end of your warehouse, which significantly reduces the cost compared with the all-MPP systems of the likes of Teradata.
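Why sequential I/O matters so much can be illustrated with a deliberately crude cost model. This is not a description of how DATAllegro or Ingres works internally: the model, the function names and every figure below (8 ms per seek, 50 MB/s transfer rate, 64 KB blocks) are assumed round numbers for a single disk, not measurements of any product.

```python
def random_read_seconds(n_blocks: int, block_bytes: int,
                        seek_seconds: float, bytes_per_second: float) -> float:
    """Every block costs a seek plus its transfer time."""
    return n_blocks * (seek_seconds + block_bytes / bytes_per_second)


def sequential_read_seconds(n_blocks: int, block_bytes: int,
                            seek_seconds: float, bytes_per_second: float) -> float:
    """One initial seek, then the blocks stream off the disk back to back."""
    return seek_seconds + n_blocks * block_bytes / bytes_per_second


# Illustrative assumptions only (see the note above).
N_BLOCKS = 100_000
BLOCK_BYTES = 64 * 1024
SEEK_SECONDS = 0.008
BYTES_PER_SECOND = 50 * 1024 * 1024

random_s = random_read_seconds(N_BLOCKS, BLOCK_BYTES, SEEK_SECONDS, BYTES_PER_SECOND)
sequential_s = sequential_read_seconds(N_BLOCKS, BLOCK_BYTES, SEEK_SECONDS, BYTES_PER_SECOND)

print(f"random: {random_s:.0f}s, sequential: {sequential_s:.0f}s, "
      f"ratio: {random_s / sequential_s:.1f}x")
```

The point of the sketch is simply that seek time is paid once in the sequential case and once per block in the random case, so the gap widens as blocks get smaller or seeks get slower.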

Anyway, the bottom line is that I think DATAllegro can genuinely offer the sort of performance and price advantages that it claims. Of course, it is not a panacea: it isn’t, and doesn’t claim to be, an enterprise data warehouse (at least, not yet), but it is also more than a data mart. In particular, its price point is such that it is opening up new possibilities within the data warehousing space that were previously ruled out on grounds of cost.

Finally, for UK readers, it is worth noting that DATAllegro has just appointed an agent in this country (run by Alex McMorland, who is well known in this space) which will act as a sales force (contracts will be with DATAllegro) and provide professional services for the customers that DATAllegro hopes to gain here. I expect that Alex will be kept busy in the months and years to come.