Content Copyright © 2009 Bloor. All Rights Reserved.
At TDWI this week, Vertica announced version 3.5 of its eponymous product. The release has two headline features, FlexStore and MapReduce compatibility, and both merit discussion.
Vertica is column-based. As I have discussed many times, columns provide a number of performance advantages in query environments, which is why there are so many column-based vendors. However, for precisely that reason, being column-based is no longer much of a differentiator. Naturally, then, you start to think about ways in which you could use columns more effectively. This is what FlexStore does.
FlexStore, as you might guess from its name, provides a flexible approach to storage. It has two major attributes. The first is that you can, in effect, combine columns. For example, in capital markets you might have ask price and bid price as separate columns. These are almost invariably retrieved together in any query, so why store them in separate columns requiring two reads when you could store them in one? This is precisely what FlexStore allows. Moreover, since both have the same datatype, combining them has no impact on compression, and there is also a beneficial impact on data loading. In addition, bearing in mind that Vertica supports multiple query profiles on different nodes, you can combine columns in this way for some queries but not for others, whichever is most appropriate for the queries you are processing. Clever stuff.
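To make the read-count argument concrete, here is a minimal Python sketch of a toy storage layer, assuming (for illustration only) that scanning any one storage unit costs one read. The names `build_layout` and `scan` are hypothetical and say nothing about Vertica's internals or API; the point is simply that a query wanting both bid and ask touches two units when the columns are stored separately and one unit when they are grouped, while returning exactly the same rows.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy model, not Vertica's storage engine: a storage unit holds the
# names of the columns it contains plus its rows (one tuple per row), and
# scanning any unit is assumed to cost exactly one read.
Unit = Tuple[Tuple[str, ...], List[tuple]]

def build_layout(table: Dict[str, list], groups: Sequence[Sequence[str]]) -> Dict[str, Unit]:
    """Pack the table's columns into storage units, one unit per column group."""
    layout = {}
    for group in groups:
        rows = list(zip(*(table[c] for c in group)))
        layout["+".join(group)] = (tuple(group), rows)
    return layout

def scan(layout: Dict[str, Unit], wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Return the wanted columns row by row, plus the number of units read."""
    touched = [unit for unit in layout.values() if any(c in unit[0] for c in wanted)]
    if not touched:
        return [], 0
    n_rows = len(touched[0][1])
    result = []
    for i in range(n_rows):
        row = []
        for c in wanted:
            # Find the unit that holds column c and pick its value for row i.
            cols, rows = next(u for u in touched if c in u[0])
            row.append(rows[i][cols.index(c)])
        result.append(tuple(row))
    return result, len(touched)

quotes = {"symbol": ["ABC", "XYZ"], "bid": [10.1, 20.4], "ask": [10.2, 20.5]}
separate = build_layout(quotes, [["symbol"], ["bid"], ["ask"]])
grouped = build_layout(quotes, [["symbol"], ["bid", "ask"]])
print(scan(separate, ["bid", "ask"]))  # ([(10.1, 10.2), (20.4, 20.5)], 2)
print(scan(grouped, ["bid", "ask"]))   # ([(10.1, 10.2), (20.4, 20.5)], 1)
```

Because the two layouts are just different groupings of the same data, nothing stops you keeping one grouping for some queries and another for others, which mirrors the per-node flexibility described above.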
The second thing that FlexStore allows, which is also pretty neat, follows from the fact that a single block for retrieval purposes is around 1MB. So, if you have a small table, such as a dimension table, that is less than 1MB in size, why not simply store it in a single block? This, again, is what FlexStore allows you to do. Seems pretty sensible to me.
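The arithmetic behind the single-block idea can be sketched as follows. This assumes, hypothetically, that in a pure column layout every column occupies at least one block of its own, and it takes the block size as exactly 1MB; both are simplifying assumptions rather than Vertica's documented behaviour, and the function names are illustrative only.

```python
import math

# Assumed block size: the article says "around 1MB"; exactly 1MB is used here.
BLOCK_BYTES = 1024 * 1024

def blocks_per_column_layout(column_sizes):
    """Blocks read if each column is stored separately (assumed minimum one block each)."""
    return sum(max(1, math.ceil(size / BLOCK_BYTES)) for size in column_sizes)

def blocks_single_block_layout(column_sizes):
    """Blocks read if the whole table is packed together."""
    return max(1, math.ceil(sum(column_sizes) / BLOCK_BYTES))

# A hypothetical 5-column dimension table of about 200KB in total.
dim_columns = [40_000] * 5
print(blocks_per_column_layout(dim_columns))    # 5
print(blocks_single_block_layout(dim_columns))  # 1
```

Under these assumptions, a small table drops from one read per column to a single read, which is the whole appeal of the feature.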
So, let’s move on to MapReduce. Other vendors that have been majoring on MapReduce have made a big deal about integrating it with SQL. Vertica has taken a different approach. It believes that MapReduce developers are not typically SQL programmers, and vice versa. In which case, why build a product that requires people to have both sets of skills? Accordingly, Vertica has built input and output compatibility with MapReduce. So, for example, you can load the results of a MapReduce process directly into Vertica for further (SQL-based) processing or, perhaps more interestingly, you will be able to take data from Vertica, process it using MapReduce and then put the result set back into Vertica for further analysis. In other words, Vertica supports running MapReduce (via Hadoop) and SQL functions in sequence rather than intertwining SQL and MapReduce. The company concedes that an integrated approach may become more interesting if the SQL and MapReduce communities become more intertwined in the longer term, but it doesn’t believe that this is the most practical approach today. I am inclined to agree.
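The sequential pattern described above (take data out, run MapReduce over it, load the results back for SQL analysis) can be sketched in Python. This is a minimal, hypothetical illustration: Python's built-in `sqlite3` module stands in for the analytic database, and an in-process map, shuffle and reduce stands in for a Hadoop job. The table names, record format and functions are invented for the example and do not correspond to Vertica's actual connector.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Emit a (symbol, quantity) pair from one raw trade record "SYMBOL,QTY".
    symbol, qty = record.split(",")
    yield symbol, int(qty)

def reduce_phase(key, values):
    # Total the traded quantity for one symbol.
    return key, sum(values)

def run_mapreduce(records):
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_phase(r) for r in records):
        groups[key].append(value)  # shuffle: group intermediate values by key
    return [reduce_phase(k, v) for k, v in sorted(groups.items())]

db = sqlite3.connect(":memory:")  # stand-in for the analytic database
db.execute("CREATE TABLE raw_trades (line TEXT)")
db.executemany("INSERT INTO raw_trades VALUES (?)",
               [("ABC,100",), ("XYZ,50",), ("ABC,25",), ("XYZ,70",), ("QRS,10",)])

# 1. Take data out of the database.
records = [row[0] for row in db.execute("SELECT line FROM raw_trades")]
# 2. Process it with a MapReduce job (here in-process rather than on Hadoop).
totals = run_mapreduce(records)
# 3. Load the result set back in for further SQL analysis.
db.execute("CREATE TABLE volume_by_symbol (symbol TEXT, volume INTEGER)")
db.executemany("INSERT INTO volume_by_symbol VALUES (?, ?)", totals)
print(db.execute(
    "SELECT symbol, volume FROM volume_by_symbol WHERE volume > 20 ORDER BY volume DESC"
).fetchall())  # [('ABC', 125), ('XYZ', 120)]
```

Note that the SQL and the MapReduce code never mix: each stage only consumes the previous stage's output, which is the separation of skills Vertica is betting on.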
There are, of course, a variety of other performance enhancements and features in this release, but these are the headliners. Interesting ones, too.