Netezza: a black swan

Written By:
Content Copyright © 2007 Bloor. All Rights Reserved.

For those of you who have read Nassim Nicholas Taleb’s book “The Black Swan” the relevance of the title of this article should be immediately obvious (I hope). However, for those of you who have not, I had better explain.

Until sometime after Captain Cook discovered Australia everybody living everywhere else thought that swans were always white. Indeed, I am prepared to bet that a good many readers even today will be surprised to hear that you can get non-white swans (I confess that I wouldn’t know it either if it wasn’t for the fact that there is a famous Australian stamp featuring a black swan, and I was something of a philatelist during my youth). Anyway, the point is that everybody thought that whiteness was a defining characteristic of swans until someone in Australia discovered a black one. Such a discovery is disruptive: it totally changes the way that you think about swans.

This is what Netezza has done in the data warehousing market: it has totally changed the way that we think about data warehousing (with a little bit of help from the other companies that have since followed in its footsteps). I had a call very recently from a journalist who told me that he used to write about data warehousing back in the late 90s. Almost the first thing he asked me was whether the market was still as boring as it was then. I rapidly disabused him.

But a Black Swan event, as defined by Taleb, is not just about changing the rules of the game: a further point is that the discovery that swans could be black was totally unpredictable. In the case of Netezza, early employees of course knew what the company was developing, but the market as a whole had no idea.

The final major point about black swan events is that they can have huge consequences, which can be either positive or negative. Perhaps the most recent example of the latter is Northern Rock. Further, these consequences can sometimes take a long time to unwind. For example, the rise of the Internet was a black swan event. What I am going to suggest now is that the impact of the Netezza black swan event has not yet fully played out.

Let’s go back to the beginning. In 2003 Netezza introduced its data warehouse appliance to the world, which mostly didn’t pay much attention. Nevertheless, the company started to gain traction and by 2005 it had become sufficiently successful that new companies were springing up to capitalise on this success. More recently the big boys have been responding, with IBM introducing its Balanced Configuration Unit and Oracle and Microsoft both talking about appliances.

However, let us take Oracle as an example. In a recent press conference Larry Ellison stated that Oracle would have a product that would compete with Netezza (he named the company specifically) in 12 months’ time. The first and most obvious implication of this is that Oracle, no matter how it packages up its database, cannot compete with Netezza today. Sure, there are some things, maybe, that Oracle can do better than Netezza but where Netezza’s strengths lie, Oracle cannot compete, certainly in performance terms. And, by the way, don’t think that I am especially knocking Oracle: the same applies to the other merchant database products as well.

Now let me go back to this claim that Oracle will be able to compete with Netezza sometime next year. No, they won’t. And neither would Microsoft or IBM if they tried it.

Why? Well, let’s suppose that there are some clever things that you can do with Oracle or SQL Server or DB2 that might enable them to compete effectively with the version of Netezza that they have been meeting in competition in recent months. However, therein lies the rub.

The rub is that this version of Netezza will have been version 3.x. Now, in August, Netezza released version 4.0 which, amongst other things, doubled performance. In 12 months’ time Netezza is likely to have had two further releases and, while I am not at liberty to give you the details of what these will feature, what I can say is that between 3.x and the version of Netezza that this hypothetically improved merchant database will be supposed to be competing with in a year’s time, we can expect to see around an order of magnitude improvement in Netezza’s performance. In other words, even if the new Oracle (or whatever) matches Netezza 3.x, it will be running ten times slower than the new Netezza.

However, that isn’t the worst of it for Oracle, IBM, Microsoft, Teradata and the rest. The bad news is that all of these improvements are based on software upgrades only. They are based on facilities that were always latent in the Netezza architecture: it was just that the company hadn’t got around to implementing them yet. There may well be more of these to come.

However, if that was the bad news, wait for the really bad news, which comes in two parts. The first is that the next generation of FPGAs (field programmable gate arrays) will be 5 times faster than the current generation. And the second is the support for user defined functions being leveraged by the Netezza Developer Network (NDN). I have written about the latter elsewhere so suffice it to say that this is providing NDN members (companies like SAS and SPSS) with performance improvements of around an order of magnitude or better. Moreover, products based on these user defined functions will be in the market and available by the time that Oracle’s new database performance features emerge next year.

So, by the middle of next year Netezza will be ten or 100 times (depending on whether you are using an NDN product or not) faster than it was just a couple of months ago. Do I think that this is what Larry meant that Oracle was going to be competing with? No. And bear in mind that at some point in the future Netezza will introduce the next generation of FPGAs, meaning 50 or 500 times better performance.
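The arithmetic behind those figures can be laid out as a minimal sketch. It simply multiplies the round numbers quoted above and assumes, purely for illustration, that the gains are independent and compound cleanly; the names used are hypothetical and the results are not measurements.

```python
# Illustrative arithmetic only: every multiplier is a round figure quoted in
# the article, and the assumption that they compound cleanly is ours.
SOFTWARE_GAIN = 10  # roughly an order of magnitude from 3.x to next year's release
NDN_UDF_GAIN = 10   # roughly an order of magnitude for NDN user defined functions
FPGA_GAIN = 5       # next generation of FPGAs, quoted as 5 times faster


def combined_speedup(*factors):
    """Multiply speedup factors together, treating them as independent."""
    result = 1
    for factor in factors:
        result *= factor
    return result


# By the middle of next year, relative to version 3.x.
without_ndn = combined_speedup(SOFTWARE_GAIN)
with_ndn = combined_speedup(SOFTWARE_GAIN, NDN_UDF_GAIN)

# Once the next generation of FPGAs is introduced.
without_ndn_new_fpga = combined_speedup(SOFTWARE_GAIN, FPGA_GAIN)
with_ndn_new_fpga = combined_speedup(SOFTWARE_GAIN, NDN_UDF_GAIN, FPGA_GAIN)

print(f"Software only: {without_ndn}x, with an NDN product: {with_ndn}x")
print(f"With new FPGAs: {without_ndn_new_fpga}x or {with_ndn_new_fpga}x")
```

In reality the factors would be unlikely to multiply this neatly for every workload, so the totals are best read as upper-end round numbers.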

In practice, Netezza is relatively conservative in its marketing. It claims to offer 10 to 100 times better performance than conventional systems. In fact, I have talked to customers that have reported a 400 times performance boost, and I have talked with customers reporting a better than 200 times average improvement in performance. Netezza doesn’t make these claims because it fears that people will not believe them, but they happen to be true.

Now, consider (conservatively) that you are averaging just 50 times better than previously in terms of performance. Do you really need a further multiplier of 50? Or 500? The answer, with current workloads, is probably not. That said, there are companies now starting to implement Netezza not as a data warehouse at all but simply as a high performance computing platform. For example, one customer is using Netezza for trade matching in capital markets. In this sort of environment (where you would typically be thinking about implementing an event processing engine such as that provided by Streambase) you could well benefit from 500 times better performance. However, in most instances this is not the case, which means that Netezza has potentially got performance to burn. The question then becomes: what does it do with all of this surplus performance?

And the obvious answer is that it extends concurrency to support more users, extends capacity to scale further, extends its mixed workload capabilities (it’s already working on enhancing its existing features here), extends its mission critical support (ditto) and extends its query scope (its NDN partners are already doing this with XML, geospatial data, video data, text analysis and so forth). What this will mean is that its competitors will no longer be able to claim that Netezza can’t support an enterprise data warehouse (EDW). Of course, this is something of an empty claim anyway, bearing in mind that a number of Netezza’s customers would argue that they are doing just this already and that, on the other hand, a number of companies are moving away from the concept of a centralised EDW to a more federated approach.

The key to all of this is, of course, that Netezza owns the entire stack: it puts hardware components together specifically to support data warehousing and it builds software specifically to exploit the features of that hardware. As I have already discussed, the latest release and the upcoming releases are all software only enhancements: they simply leverage the existing hardware better. Other vendors can’t do this. There aren’t many pure play (true) appliance vendors anyway, and those that there are have only conventional PC technology to leverage on the nodes in their architecture. There aren’t latent capabilities built into these in the same way that there are in FPGAs. I can’t see how, for example, other suppliers can match Netezza’s user defined functions.

So the bottom line is not just that Netezza’s entry into the market was a black swan event but that that event has not ceased to unfold. Major vendors often damn Netezza with faint praise: “yes, they’ve done a good job but …” as if they were a petty irritant that will sooner or later go away. Well, they had better wake up and smell the coffee.

Note from editor: This is a new version submitted at 11:30 on the 8th of October. The original document contained factual errors. If you have downloaded or syndicated this analysis please ensure that you have the correct version.