IBM and big data: an introduction

IBM almost certainly has the largest portfolio of products spanning the big data space. To store the data it has DB2 (or Smart Analytics Systems), Netezza, Informix, Hadoop (with HDFS) and BigInsights (Hadoop with GPFS); if the data is too big and too fast for that, you can use InfoSphere Streams as a complex event processing (CEP) engine. To query the data you can use Cognos, SPSS or BigInsights (which includes text analytics). And if you are using BigInsights you can also use JasperSoft, which has just announced a partnership with IBM.

The problem with this, and I haven’t even started on data governance or integration (a subject for another day), is that there is too much choice.

When do you use DB2 and when Netezza? The answer from IBM is that you use DB2 for operational analytics and Netezza for ‘deep’ analytics. In other words, Netezza won’t support lots of users running short queries from, say, a call centre. Well, okay, but that’s a development decision: operational query support was certainly something that Netezza was working on before the company was acquired, and I have no doubt that it would have succeeded in supporting such queries satisfactorily in due course.

When would you use Netezza versus Informix? The particular strength of Informix is that it supports both time-series storage and time-series analysis, whereas Netezza handles only the analysis. So you would use Netezza if you are only interested in analytics and Informix if you want to build instrumented applications around time-series. But it’s all a bit murky.

Hadoop with HDFS or Hadoop with GPFS? I’ve discussed this in a previous article: GPFS is more robust and should outperform HDFS. So, when would you want HDFS? Probably only if you want to try things out to see whether the whole idea is worth pursuing and you want to stick with free (open source) software.

As for Streams, I have no issues. In my opinion it’s probably the best CEP engine on the market. And it supports PMML (Predictive Model Markup Language), which many others don’t. I’ll discuss this further in another article.

When it comes to queries, again, there are multiple products. The biggest question must be: why JasperSoft? Not that I have anything against JasperSoft, it’s a nice product, but why not Cognos? If IBM wants a query and reporting solution alongside BigInsights, why didn’t it develop something out of the Cognos stable?

Okay, so there are issues. But this is an emerging market. I’d rather have too many choices than too few. And it’s not as if competing vendors have a single product either. Moreover, there isn’t really anyone competing on the instrumented data side of the house, either for event processing (Oracle, for example, still sees CEP as a part of its SOA offering rather than as having anything to do with big data) or for instrumented applications, where both Informix and Netezza have a role to play. Overall, I think IBM has got its approach about right, though some product rationalisation might be nice.