Who will be Snow White?

Six down and counting. That’s the current state of the data warehousing space, though I am not including SAP’s acquisition of Sybase, which was a different kettle of fish. In no particular order, DATAllegro was acquired by Microsoft, Greenplum by EMC, Vertica by HP, Netezza by IBM, and both Kickfire and, most recently, Aster Data by Teradata.

So, who will be Snow White and who will be the seventh dwarf?

Actually, I don’t much care, and I’m not going to bother to speculate about it. What I am more interested in is why these six were bought rather than ParAccel, Infobright, illuminate, Kognitio, Calpont or Exasol.

We need to dispense with three of the acquisitions first. DATAllegro was acquired by Microsoft because Microsoft needed a play in the large-scale warehousing space and thought it could get a quick start by buying in expertise. Not to mention that Stuart Frost and his cohorts are good salesmen. Kickfire was a more opportunistic buy: the company wasn’t doing well but had some interesting technology that was, effectively, going cheap. And then there’s Netezza: successful, and seen by IBM as an Oracle killer.

But what about the other three: Greenplum, Aster Data and Vertica? Why these three and not any of the others? You can’t say that these are the biggest, because Infobright, for example, has far more customers than Aster Data. I would guess that Kognitio does too.

The question is: were these purchases random? Were EMC and HP, in particular, just looking for a good fit for their companies, or was there something else involved? I think the latter. These three have one thing in common: Greenplum, Aster Data and Vertica were the first three data warehousing vendors to pick up on the use of MapReduce. Although Vertica takes a rather different approach from Greenplum and Aster Data, the fact is that these were the thought leaders on this issue in the warehousing space or, at least, they were the first to introduce products and capabilities to support the use of MapReduce.

Of course, MapReduce is associated with big data. The data doesn’t have to be that large, but if the data set is very large then MapReduce can be very useful. And there are lots of bucks in very large systems, so these acquirers want to ensure that they have the firepower to play in this space. So that’s why I think it was these three dwarfs that were picked rather than some of the other goblins. If this discussion is anything to go by, then the seventh dwarf (Happy?) will be the one that starts building out its MapReduce functions.