Netezza Advanced Analytics

Written By:
Content Copyright © 2010 Bloor. All Rights Reserved.
Also posted on: The IM Blog

As usual at TDWI, there has been a series of announcements from major vendors. Not least of this year’s releases has been the introduction of the Netezza TwinFin iClass, which follows on from last month’s announcement of the Netezza Skimmer (a Skimboard is another sort of surfboard) as a low-end entry point to the Netezza range. However, the iClass (where i stands for insight) is a beast of a completely different stripe.

The iClass is an appliance for what Netezza refers to as advanced analytics, which is a part of the general drive across the sector towards in-database analytics. The big advantage of in-database analytics is, of course, that you get much, much better performance. Instead of having to extract the data to an external application server for processing, the analytics can actually be performed in situ. It also means that the analytics are more accurate because you don’t need to sample the data, which is necessary in conventional environments in order to maintain performance. It’s what Netezza refers to as big data meets big math.
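To make the sampling point concrete, here is a minimal sketch in plain Python. It is not Netezza code: the row counts, the 1% sample rate and the idea of a "rare event" flag are all assumptions chosen purely for illustration. It compares a count taken over every row with an estimate scaled up from a sample.

```python
import random

# Hypothetical data set: 10,000 rows, of which exactly 10 carry a rare flag
# (think of a handful of fraudulent transactions). All figures are made up.
TOTAL_ROWS = 10_000
SAMPLE_SIZE = 100  # a 1% sample, as an external tool might extract

rows = [i % 1_000 == 7 for i in range(TOTAL_ROWS)]

# "In situ" analysis: scan every row, so the count is exact.
full_count = sum(rows)

# Conventional approach: pull a sample and scale the result back up.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(rows, SAMPLE_SIZE)
scale = TOTAL_ROWS // SAMPLE_SIZE
sampled_estimate = sum(sample) * scale

print("count over all rows:     ", full_count)
print("estimate from a 1% sample:", sampled_estimate)
```

Because each sampled hit is multiplied by 100, the estimate can only ever be 0, 100, 200 and so on, so it cannot land on the true figure of 10. The rarer the thing you are looking for, the more sampling hurts.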

So, what does the iClass actually provide? To begin with, it isn’t just SAS scoring in the warehouse: it’s much more comprehensive than that. There are two new APIs. One is an open language API that currently supports C/C++, Java, Python, Fortran and, most interestingly, R. It should be easy to add support for others should there be sufficient demand, and there is an SDK so you can implement your own. The second API is an Open Framework API that supports MapReduce and Hadoop.

And then there are massively parallel analytics engines that parallelise analytic operations. These cover three cases: embarrassingly parallel algorithms, for processes that lend themselves naturally to parallelism; task parallelism, for model execution; and algorithms for processes that are not embarrassingly parallel, which are parallelised as far as possible. Specifically, these engines support user defined extensions (functions, aggregates and table functions), which run within a process; analytic executables, which perform the same role but outside of a process; and nzMatrix. This last is a part of the out-of-the-box analytics functions provided by Netezza, in this case specifically focused on linear algebra, with support for the resolution of simultaneous linear equations, least squares, eigenvalues and singular value problems. If you’re not a mathematician or statistician I won’t bother to explain what these are, but suffice it to say that they are important in certain complex analytic computations.
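For readers who would like a feel for how a user defined aggregate and an embarrassingly parallel algorithm fit together, here is a minimal sketch in plain Python. It does not use Netezza's API: the class name, the accumulate/merge/finalize method names and the sample data are all hypothetical. It fits a straight line by least squares, with each slice of data producing its own small summary and the summaries then being combined.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class RegressionState:
    """Running sums needed to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def accumulate(self, x, y):
        # Called once per row, entirely local to one data slice.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other):
        # Combining two partial states is just addition, which is why
        # the slices never need to talk to each other while scanning.
        return RegressionState(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def finalize(self):
        # Closed-form least squares solution for a single predictor.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def summarise_slice(rows):
    """Scan one slice of (x, y) rows and return its partial state."""
    state = RegressionState()
    for x, y in rows:
        state.accumulate(x, y)
    return state


# Made-up data lying on y = 2x + 1, split across three slices.
slices = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7)],
    [(4, 9)],
]

partials = [summarise_slice(rows) for rows in slices]  # independent work
total = reduce(RegressionState.merge, partials)         # cheap combine step
slope, intercept = total.finalize()
print(slope, intercept)
```

Here the slices are processed one after another for simplicity, but nothing in summarise_slice depends on any other slice, so each could in principle run on its own processing node, with only the five running sums travelling back to be merged.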

You might think I’ve finished but I haven’t. On top of all that, the iClass also supports an R GUI and Eclipse, as well as, of course, partner-based development environments.

Which brings me on to my final point, which is that Netezza is neither getting into the analytic application business nor into the data mining business. Instead it is providing a foundation platform for its partners (like bis2 and Fuzzy Logix) to build analytic applications on. I think this makes a lot of sense: after all, there are far more potential partners with feet on the ground, providing greater coverage than Netezza could manage on its own.

The bottom line is that this is a major step forward for Netezza, which differentiates itself still further from the mass of competitors that have yet to implement any sort of in-database analytics. And even those that are doing so are, in many cases, only providing much more elementary out-of-the-box capabilities.