A Fast Analytic Database for the Silent Majority

Content Copyright © 2008 Bloor. All Rights Reserved.

ParAccel is one of the growing band of fast analytic database vendors that have emerged in recent years following the success of “uber appliance” Teradata. Founded in 2005 and headquartered in sunny San Diego, ParAccel released its first product in October 2007 and named its first customer, Latinode, a voice over IP provider, at the same time. ParAccel uses a columnar database approach similar to Sybase IQ and Vertica. Unlike many appliances, it is not initially aiming at the rarefied strata of giant data warehouses where Teradata and its direct competitors live. Data warehouses in the 100 TB range and above are concentrated in a few business-to-consumer verticals (telco, retail, retail banking) and are a small minority. In the business-to-business (B2B) world, data warehouses are much smaller in volume, though they may have very complex analytic processing needs. According to IDC, over 90% of data warehouses are less than 25 TB in size.

It is sensible to address this “silent majority” of data warehouse customers, whose need for more rapid analytic processing is every bit as real as a telco's. An entry-level ParAccel configuration may be four servers with 2 TB of user data, at a list price of USD 200k (software only). The technology is full MPP (“shared nothing”) and has demonstrated its price-performance effectively through the TPC-H benchmark, where it is, at the time of writing, either first or second for price/performance at the 100 GB, 300 GB and 1 TB scales. One can debate the TPC benchmarks, and certainly vendors can go to a lot of trouble to set things up to show their products off to their best advantage, but they are a lot better than the alternative, which is just unaudited vendor claims. These particular benchmarks are conducted entirely in memory, which is why your particular query on that old server under the desk doesn’t run this fast, but each of the benchmarks is comparable to the others. Customers should obviously check any appliance's performance on their own data rather than extrapolating from benchmarks. ParAccel claims load speeds of up to 30 MB per second per node, and loading can take full advantage of parallel processing.
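To put the claimed load rate in context, here is a rough back-of-envelope sketch in Python. It takes the figures quoted above (four servers, 2 TB of user data, up to 30 MB per second per node) and assumes, purely for illustration, that all four servers load in parallel, that throughput scales linearly with node count, and that 1 TB is 1,000,000 MB. The function name is hypothetical and none of this comes from ParAccel; since the vendor figure is an upper bound, the result is a best case.

```python
# Back-of-envelope load-time estimate.
# Assumptions (not from the vendor): decimal units (1 TB = 1,000,000 MB),
# every node loads in parallel, and throughput scales linearly with nodes.

def estimate_load_hours(data_tb: float, nodes: int, mb_per_sec_per_node: float) -> float:
    """Return the best-case hours needed to load data_tb terabytes."""
    if nodes <= 0 or mb_per_sec_per_node <= 0:
        raise ValueError("nodes and per-node rate must be positive")
    total_mb = data_tb * 1_000_000
    aggregate_mb_per_sec = nodes * mb_per_sec_per_node
    return total_mb / aggregate_mb_per_sec / 3600


if __name__ == "__main__":
    # Entry-level configuration quoted in the article.
    hours = estimate_load_hours(data_tb=2, nodes=4, mb_per_sec_per_node=30)
    print(f"Best-case full load: {hours:.1f} hours")
```

On these assumptions a full 2 TB load would take somewhat under five hours at best; real loads will depend on the data and the hardware.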

Does the world really need another appliance? One important ParAccel feature is its ability to parse and translate SQL stored procedures. All appliance vendors claim ANSI SQL compliance, and that is fine, but in the real world plenty of code is out there that uses proprietary database extensions. If you have to rewrite a few hundred stored procedures in order to get your application to run, this may seriously offset the benefits of the appliance. The vendor claims that in proofs of concept around 70% of SQL Server stored procedures can run unchanged and around 20% run with “minimal” tweaking, while 10% use features that are unlikely ever to be supported and therefore require full rewriting. This is a significant issue for customers, and having even 70% of the pain mitigated enables customers to consider appliances in situations where they might previously have rejected them. At present only SQL Server stored procedures are handled (showing the product's SQL Server accelerator roots), but Oracle stored procedures are in the pipeline. This is a genuine differentiator for ParAccel, since as far as I am aware only Dataupia offers a similar facility (albeit done in a different way).
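As a rough illustration of what those vendor-claimed percentages could mean for a migration, the following Python sketch splits a stored procedure count into the three buckets. The 70/20/10 split is the vendor's claim from proofs of concept, not a measured result; the function name and the example count of 300 procedures are hypothetical, chosen only to match the “few hundred” mentioned above.

```python
# Split a stored procedure estate using the vendor-claimed 70/20/10 ratios.
# The ratios are unaudited vendor claims; the function name is illustrative.

def split_procedures(total: int) -> dict:
    """Return estimated counts per migration bucket for `total` procedures."""
    if total < 0:
        raise ValueError("total must be non-negative")
    unchanged = total * 70 // 100      # claimed to run as-is
    minimal_tweaks = total * 20 // 100 # claimed to need "minimal" changes
    # Whatever remains (about 10%) is assumed to need a full rewrite.
    full_rewrite = total - unchanged - minimal_tweaks
    return {
        "unchanged": unchanged,
        "minimal_tweaks": minimal_tweaks,
        "full_rewrite": full_rewrite,
    }


if __name__ == "__main__":
    for bucket, count in split_procedures(300).items():
        print(f"{bucket}: {count}")
```

For a hypothetical estate of 300 procedures this gives 210 running unchanged, 60 needing small tweaks and 30 needing a rewrite, which is a very different project from rewriting all 300. Your own mix will vary, so treat the ratios as a starting point for a proof of concept rather than a forecast.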

From a customer viewpoint, ParAccel offers fast query performance at an affordable price point for moderately sized data warehouses (i.e. the vast majority), uses no indexes and so needs less DBA maintenance, appears to have good recovery characteristics, and can run many stored procedures as they stand, which will reduce implementation costs. For B2B customers frustrated that appliances mostly cater for giant warehouses (often with a matching price tag), it is well worth a look.