Yawn wars

Those of you with any familiarity with bridge will know that, at a higher level, the game is replete with conventional bids (that is, bids that mean something specific that is not necessarily related to the suit you bid in). One such convention is called Lebensohl. There are also variations on this convention such as Modified Lebensohl, Lebensohl with transfers, Staymansohl, Extended Lebensohl, and Rubensohl. Finally, somebody came up with YAS, which is the acronym for ‘yet another Sohl’. Well, I am about at that stage with data warehouses, so I have coined the acronym YAWN: ‘yet another warehousing notion’. I’d like to think of a better word than notion (I would welcome suggestions) but yawn encapsulates my feelings so well that it seems appropriate.

Anyway, as an update in the YAWN wars, several things have caught my attention recently, not least a number of new (or new to me) vendors, a couple of lawsuits, and two companies announcing support for MapReduce.

As far as the companies go, these are:

Aster Data, whose nCluster is a shared-nothing, row-based, clustered solution;
InfoBright, which offers the column-based Brighthouse;
InfoBionics, which describes its product as being a cellular database management system and which I would describe as using a hybrid associative/tokenised approach (incidentally, I would describe illuminate’s correlation database in the same way);
1010data, which is another column-based database but is rather more interesting, both because it is provided through a SaaS offering and because the company has been around since 2000 and has a significant number of customers, many of which are major organisations. I will be writing in more detail about 1010data in a separate article.

The lawsuits I will merely mention: DATAllegro and Vertica are both being sued for patent infringement, Vertica by Sybase.

Much more interesting are the announcements by both Greenplum and Aster Data that they are supporting MapReduce. I have to say that I have not fully got my head around MapReduce yet (any contributions that can explain it in words of one syllable would be welcome) but as I understand it, it is effectively a form of tokenisation. Its importance is that it makes it much easier to parallelise functions that are otherwise hard to parallelise. For example, you can parallelise text processing and mining and, potentially, you can use it to parallelise other things where that would be useful and where it is currently difficult.
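For what it is worth, the usual description of the model runs like this: a map function turns each input record into key/value pairs, the pairs are grouped by key, and a reduce function combines the values for each key. What follows is a minimal single-process sketch of that idea in Python, assuming word counting as the task. The function names are my own for illustration and are not taken from Greenplum, Aster Data or any other product, and nothing here actually runs in parallel.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: combine all the values emitted for one key."""
    return word, sum(counts)


def word_count(documents):
    # Each document is mapped independently of the others, so this loop is
    # the part a real system could spread across many machines.
    pairs = []
    for document in documents:
        pairs.extend(map_words(document))
    # Each key is reduced independently too, so this step can also be split up.
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["yet another warehousing notion", "yet another Sohl"]))
```

The point of the sketch is that neither step needs to see the whole data set at once: map looks at one record, reduce looks at one key. That independence is what would make the work easy to parallelise.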

I have seen suggestions that MapReduce could be usefully applied to data mining and analytics. I am not sure about this: isn’t that what massively parallel processing does for you, let alone the various data mining products out there? I can see the advantages in text mining, where it may be difficult to break down the data into subsets, but I am not clear how this would apply to data mining.

In any case, the verdict is not yet in on MapReduce: it has issues with SQL, for example. Whether it will become popular or stay a minority sport remains to be seen. If lots of other vendors start coming out with it then we will know that it is at least seen to be a requirement; if they don’t then we will know that it isn’t.

Anyway, so much for the latest update on the YAWN wars. Oh, and incidentally, a little bird tells me that Oracle may have something to say shortly.