Surfing to a warehouse near you

The big news at TDWI this week was the announcement of Netezza TwinFin. Surfers amongst you will know that a twinfin is a surfboard with twin fins at the back: such boards are supposed to be extremely fast and easily manoeuvrable. You should be able to see the parallel.

The parallel that won’t be obvious is that TwinFin reflects a brand new (well, almost new) Netezza architecture. It’s not new in the sense that the product still has an asymmetric massively parallel architecture (that is, SMP at the front-end and MPP at the back-end) and it’s not new in the sense that the TwinFin still uses “snippet processors” that are based on FPGAs (field programmable gate arrays). However, what is new is that this is no longer implemented on top of proprietary hardware but, instead, on IBM BladeServers.

This may take some explanation. IBM BladeServers feature mezzanine cards known as sidecars. These are basically extension units that link to the blades (making up twin fins). What Netezza has done is to leverage this capability to implement four dual FPGAs per sidecar, supporting a total of eight disk drives. The whole ensemble of blade plus sidecar is known as an S-Blade (snippet blade). Transfer rates are faster with this architecture than with the previous one: each S-Blade supports 8 data streams with a throughput of 115MB/sec each, compared to 65MB/sec previously. It is also worth noting that other blade manufacturers support the concept of sidecars (though they don’t use that terminology), so there is the possibility of TwinFin being ported to other platforms in the future.
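To put the stream figures in perspective, here is a rough back-of-the-envelope sketch. The per-stream rates and the stream count come from the figures above; multiplying them into a per-blade total is my own assumption (it presumes all eight streams run flat out at once), and the names are purely illustrative.

```python
# Figures quoted in the article.
STREAMS_PER_S_BLADE = 8
NEW_MB_PER_SEC = 115   # per stream, TwinFin
OLD_MB_PER_SEC = 65    # per stream, previous architecture


def aggregate_mb_per_sec(streams: int, per_stream: float) -> float:
    """Assumed peak: every stream delivering its full rate simultaneously."""
    return streams * per_stream


new_total = aggregate_mb_per_sec(STREAMS_PER_S_BLADE, NEW_MB_PER_SEC)
per_stream_gain = NEW_MB_PER_SEC / OLD_MB_PER_SEC

print(f"Assumed peak per S-Blade: {new_total:.0f} MB/sec")
print(f"Per-stream improvement: {per_stream_gain:.2f}x")
```

On these assumptions the per-stream gain is a little under 1.8x, which suggests that raw transfer rate is only one contributor to any larger overall performance improvement.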

As far as actual deliverables are concerned, 12 S-Blades can be configured into a single rack containing 96 disk drives of 1TB each, with dual SMP hosts, storing around 32TB of uncompressed user data. You can also license fractions of a rack. Shortly, you should be able to configure up to 10 racks in a single configuration, which will support a theoretical maximum of 1.3PB of compressed data.
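The rack numbers can be sanity-checked with some simple arithmetic. The sketch below uses only the figures quoted above; the decimal conversion (1PB = 1000TB) and the idea that capacity scales linearly across 10 racks are my assumptions, so the resulting compression ratio is implied rather than anything Netezza has stated.

```python
# Figures quoted in the article.
S_BLADES_PER_RACK = 12
DISKS_PER_S_BLADE = 8
DISK_TB = 1
USER_TB_PER_RACK = 32       # uncompressed user data
MAX_RACKS = 10
MAX_COMPRESSED_PB = 1.3

TB_PER_PB = 1000            # assumption: decimal units

disks_per_rack = S_BLADES_PER_RACK * DISKS_PER_S_BLADE
raw_tb_per_rack = disks_per_rack * DISK_TB
user_fraction = USER_TB_PER_RACK / raw_tb_per_rack

# Assumption: user capacity scales linearly with the number of racks.
max_uncompressed_tb = USER_TB_PER_RACK * MAX_RACKS
implied_compression = MAX_COMPRESSED_PB * TB_PER_PB / max_uncompressed_tb

print(f"Disks per rack: {disks_per_rack}")
print(f"Raw capacity per rack: {raw_tb_per_rack} TB")
print(f"User data as a share of raw capacity: {user_fraction:.0%}")
print(f"Implied compression ratio: {implied_compression:.1f}x")
```

In other words, about a third of the raw disk space is quoted as user data, and the 1.3PB ceiling implies compression of roughly 4x if the linear-scaling assumption holds.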

So, why has Netezza done this? Is it simply responding to competitive claims about the downsides of proprietary hardware: lock-in, no reusability and so on? The short answer is no, though that’s a useful by-product. The actual reason is that Netezza figured that it could probably get around a doubling of performance through software and other enhancements whereas, thanks to improvements in multi-core technology and disk drive performance, it could get a lot more by moving to blades. In this release, it estimates a 5x performance boost, and that will increase further through enhanced software functionality in due course. There are further knock-on benefits too: the company can be more price competitive (indeed, its new pricing paradigm is independent of disk capacity) and it can spend more time on software optimisation because it has less hardware to worry about.

Of course, there are software elements to this release, not least of which is the ability to cache compressed data in memory, but the main feature is TwinFin. Netezza is already far and away the leading “next generation” vendor in the data warehousing space and TwinFin can only help it maintain that position.