Speculation, Streams, Warehousing and the X-Men

I recently wrote (see “IBM, BEP and CEP”) about IBM’s release of InfoSphere Streams. I reported that, rather than referring to this as a complex event processing (CEP) product, IBM is instead calling it a “stream computing engine used for CEP deployments”. However, I did not discuss this further in that article because that piece was factual, whereas here I intend to be fanciful.

The question is: why this distinction? If InfoSphere Streams is “used for CEP deployments” then does that not imply that it might be used for other purposes? If so, what might these be?

Well, what is the product doing? Put simply, InfoSphere Streams is processing streams of data very quickly. Where else might you wish to do that where CEP is not the right solution? Now, there may be esoteric applications where this would be relevant but I can only think of one mainstream environment where this would be useful.

But before I go into that, it is pertinent to remind you that InfoSphere Streams is hardware agnostic. Now, in my previous article I referred to the fact that it might be deployed on IBM supercomputers for very large scale, low latency requirements. However, hardware agnosticism also means that it should be deployable on a low-end Intel processor or PowerPC, for example.

Which brings me to Netezza. Netezza is built around multiple parallel nodes processing data that is streamed off disk; the results are then collated by the central database management system which, of course, also despatched the queries in the first place.

You see the similarity? If IBM were to implement InfoSphere Streams in parallel at the node level, and then link that to DB2, you would have an MPP (massively parallel processing) version of DB2.
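
To make the pattern concrete, here is a minimal sketch in Python of the scatter-gather arrangement described above: a coordinator despatches a query to several nodes, each node streams through its own partition and pre-aggregates, and the coordinator collates the partial results. It is purely illustrative. The partition data and the names node_scan and coordinator_query are invented for the example, threads stand in for nodes, and none of it reflects how Netezza, InfoSphere Streams or DB2 are actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "node" owns one partition of a sales table.
PARTITIONS = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 1)],
]


def node_scan(partition, min_amount):
    """Node role: stream rows one at a time, filter, and aggregate locally."""
    partial = {}
    for region, amount in partition:
        if amount >= min_amount:
            partial[region] = partial.get(region, 0) + amount
    return partial


def coordinator_query(partitions, min_amount):
    """Central DBMS role: despatch the query to every node, then collate."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: node_scan(p, min_amount), partitions))
    totals = {}
    for partial in partials:
        for region, amount in partial.items():
            totals[region] = totals.get(region, 0) + amount
    return totals


result = coordinator_query(PARTITIONS, min_amount=5)
```

The point of the sketch is simply that the filtering and aggregation happen next to the data, so only small partial results travel back to the centre.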

Far-fetched? Perhaps. Certainly there’d be a lot of work to do, but Dataupia has already proved that you can deploy MPP-based architectures underneath DB2 (and Oracle and SQL Server), so it’s not out of the question.

Anyway, that’s my little confabulation. I haven’t put it to IBM because if they said it was true I’d be put under non-disclosure and if they denied it, it wouldn’t mean anything. Time will tell.

And talking about time telling, it’s shortly time for Larry’s big data warehousing announcement that we’ve been waiting for since last August. My spy is lukewarm about it (but then he would be, given that he’s at another data warehousing company) telling me that it is good for some types of queries but otherwise not spectacular.

But the big question is this: what does this announcement have to do with the X-Men? The answer, along with views on Oracle’s announcement, will follow in due course.