IBM boo-boo on big data

On April 3rd IBM put out a press release called “IBM Announces New Innovations to Help Organizations Benefit from the Next Natural Resource: Big Data”. It discussed the introduction of the IBM PureData System for Hadoop, which includes a new version of InfoSphere BigInsights; the latest version of InfoSphere Streams; enhancements for time series processing in Informix; and something known as BLU Acceleration. However, it forgot to mention the latest release of DB2 (v10.5), which is where BLU Acceleration has been implemented (along with Informix). Whoops!

There are some really big deals in this release, but the most important is probably BLU Acceleration. It is so significant that I will be devoting a separate article to what it is and how it works.

As far as the other products are concerned, the most important is the IBM PureData System for Hadoop. Like the other PureData models, this is an appliance that is pre-balanced, with the software pre-installed. It is based on InfoSphere BigInsights, using GPFS storage rather than HDFS. I have discussed GPFS previously but, briefly, it is similar to HDFS except that it stores data in 128MB blocks rather than 64MB blocks, and it has the sort of robustness and resilience that you would like to have from HDFS but which isn’t actually there.

However, this is not all. BigInsights has been extended in a number of ways, the most significant of which is the introduction of what IBM is calling BigSQL, which allows you to use ANSI standard SQL to query the data. A data archival capability is also built into the appliance.

Of course, the PureData System for Hadoop will cost you more than doing it all yourself, but where do you want to put your resources: into getting value from your data, or into spending (wasting?) time configuring, implementing and maintaining your own DIY solution?

As mentioned, Streams has also been enhanced, although there is nothing revolutionary there, and both Informix and DB2 have had BLU Acceleration implemented. DB2 also has a variety of other new and enhanced features, as you would expect. The one thing that hasn’t been announced formally is that a JSON storage engine for DB2 is now available as a preview. I discussed this previously in my article questioning whether a relational epithet is still appropriate for DB2 (www.bloorresearch.com/analysis/11839/db2-relational-epithet-longer/); JSON will be the fourth storage engine available. One thing that hasn’t featured in this release, and which I had hoped for, is an inference engine to support DB2’s graph store.