Performance-driven compression

Content Copyright © 2007 Bloor. All Rights Reserved.

Along with its latest financial results, which have seen the company move into profitability for the first time, Netezza has just announced its Compress Engine, which will be available in version 4.5 of the product in May of next year.

When it first unveiled this facility to analysts over a year ago, Netezza was calling it “compiled tables”. It took me a while to get my head around why the company was using this terminology, but I eventually came to appreciate that the technology is as much about performance as it is about compression, if not more so. However, the company has now decided to go with the more straightforward language of compression, which is something people understand, even though it is not completely accurate. As a corollary, it is interesting to note that in its press release about the Compress Engine the company never once refers to the sort of compression ratios you can achieve with this engine.

This is because compiled tables work differently from, and have a different primary goal from, other database compression technologies. In particular, although compression ratios are typically 2–5 times for large tables, the primary focus of the Compress Engine is not compression as such but improved performance. In its press release Netezza refers to improvements of 100% but in fact performance may be significantly better even than that. This is very different from the typically incremental performance benefits that you get from the compression offered by most other vendors.
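To put rough numbers on how a compression ratio might translate into scan performance, here is a minimal sketch. It rests on assumptions of my own rather than anything Netezza has published: that a table scan is purely disk-bound, and that decompression keeps pace with the disk so it adds no time. The function names and the figures passed to them are hypothetical.

```python
def scan_seconds(table_gb, disk_gb_per_s, compression_ratio=1.0):
    """Time to stream a table off disk.

    Assumes the scan is purely disk-bound and that decompression
    keeps up with the disk (an assumption, not a vendor figure).
    """
    bytes_on_disk_gb = table_gb / compression_ratio
    return bytes_on_disk_gb / disk_gb_per_s


def improvement_percent(compression_ratio):
    """Throughput gain over an uncompressed scan, as a percentage,
    under the same disk-bound assumption."""
    return (compression_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical 100 GB table on a disk subsystem streaming 1 GB/s.
    for ratio in (1.0, 2.0, 5.0):
        print(ratio, scan_seconds(100, 1.0, ratio), improvement_percent(ratio))
```

On these assumptions, a ratio of 2 corresponds to the 100% improvement quoted in the press release, and a ratio of 5 to a gain of 400%. Real workloads that are not purely disk-bound will see less.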

In addition, with the Compress Engine all tables are compressed. With, say, DB2 9.5 or Oracle 11g, the cost of decompression means that compressing smaller tables actually degrades performance. This in turn means that there is an administrative overhead involved in determining which tables to compress and which to leave alone. You do not have this overhead with the Compress Engine.

The Compress Engine works by compressing data (automatically) into a compiled format, on a columnar basis. What is actually stored on disk is the “instruction set” that tells the engine, which is embedded within Netezza’s FPGAs (field programmable gate arrays), how to reconstruct the original data as it is streamed off disk.
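The idea of storing instructions rather than data can be illustrated with a toy example. Netezza has not published its instruction format, so the two operations used here (REPEAT and LITERAL) are invented purely for illustration, and the real engine runs in FPGA hardware rather than in software. The sketch "compiles" a single column into a short program and then replays that program to regenerate the values one at a time, as they would be when streamed.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical instruction format, invented for this sketch:
#   ("REPEAT", (value, count))  emit `value` `count` times
#   ("LITERAL", (value,))       emit `value` once
Instruction = Tuple[str, tuple]


def compile_column(values: List[int]) -> List[Instruction]:
    """Turn one column of values into reconstruction instructions."""
    program: List[Instruction] = []
    i = 0
    while i < len(values):
        # Count how many times the current value repeats consecutively.
        run = 1
        while i + run < len(values) and values[i + run] == values[i]:
            run += 1
        if run > 1:
            program.append(("REPEAT", (values[i], run)))
        else:
            program.append(("LITERAL", (values[i],)))
        i += run
    return program


def replay(program: Iterable[Instruction]) -> Iterator[int]:
    """Regenerate the original column lazily, one value at a time."""
    for op, args in program:
        if op == "REPEAT":
            value, count = args
            for _ in range(count):
                yield value
        elif op == "LITERAL":
            yield args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")


if __name__ == "__main__":
    column = [7, 7, 7, 7, 3, 9, 9]
    program = compile_column(column)
    print(program)
    print(list(replay(program)))
```

Because `replay` is a generator, values are produced only as the consumer asks for them, which is the software analogue of reconstructing data in flight rather than decompressing a whole table up front.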

This is, in fact, the sixth engine to be produced for the FPGAs, the others being the Control Engine, which is a management engine; the Visibility Engine, which enforces ACID (atomicity, consistency, isolation and durability); and three filtering engines, which parse, project and restrict data so that only the data needed to resolve the query in hand is passed along for processing.
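The effect of the three filtering engines can be sketched in software as a chain of generators. This is only an analogy: the real engines are FPGA logic, and the record layout (pipe-delimited text), the function names and the order in which the stages are chained are all assumptions made for the example. Here "parse" splits a raw record into named fields, "restrict" drops rows that fail the query's predicate, and "project" keeps only the columns the query needs.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def parse(lines: Iterable[str], columns: List[str]) -> Iterator[Row]:
    """Split each raw record into named fields (hypothetical '|' layout)."""
    for line in lines:
        yield dict(zip(columns, line.split("|")))


def restrict(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass along only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], wanted: List[str]) -> Iterator[Row]:
    """Pass along only the columns the query asks for."""
    for row in rows:
        yield {name: row[name] for name in wanted}


if __name__ == "__main__":
    raw = ["1|alice|UK", "2|bob|US", "3|carol|UK"]
    rows = parse(raw, ["id", "name", "country"])
    uk_only = restrict(rows, lambda r: r["country"] == "UK")
    for row in project(uk_only, ["name"]):
        print(row)
```

Only two single-column rows leave the pipeline, even though three three-column records went in, which is the point of filtering before the data is passed along for processing.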

The company has also disclosed plans for potential new engines that it might build in the future, for example to support encryption or other capabilities. While there are a number of such possibilities, how many engines can be added is limited by the number of gates on the FPGA, though next generation FPGAs will extend what is possible.

To summarise: while Netezza is calling this new functionality a Compress Engine, the name is, at least in the way that compression is normally thought of, a misnomer. It is both a compression engine and a performance engine.