Going to the Xtreme

Content Copyright © 2009 Bloor. All Rights Reserved.

I recently discussed XtremeData's DBx appliance, and I now have more details. As I mentioned before, it is based on PostgreSQL, but only the front end has been retained; the back end has been completely re-engineered to support a shared-nothing, MPP-based architecture with a head node and multiple data nodes. Each data node has a multi-core CPU, twelve 1TB disks, and an FPGA (field programmable gate array) that XtremeData describes as an in-socket accelerator. That is, the FPGA attaches to the motherboard and has direct access to its resources.

It is the use of the FPGA that makes XtremeData different, in that it is applied specifically to SQL processing, which is why the company refers to it as "SQL In Silicon". XtremeData is by no means the only company using FPGAs in its appliances, but there is a distinct architectural difference in the way it leverages them: indeed, its approach is unlike that of any other supplier on the market.

When a query comes into a database, the basic premise of all other vendors is that you look at the way the data is distributed and then optimise the query for that distribution. The problem is that when you have multiple joins, order-bys and group-bys, the intermediate data is often skewed (relatively speaking) and not distributed in an optimal fashion. So XtremeData has questioned the assumption that the data distribution is fixed, and concluded that in an MPP-based system with a very fast interconnect (DBx uses InfiniBand) there is no need to shy away from data-exchange I/O between nodes.
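To make the skew problem concrete, here is a toy sketch in Python; the table, columns and four-node layout are invented for illustration and are not taken from DBx. Rows that are spread evenly on the distribution key can produce intermediate results that pile up on only a few nodes once a query step keys on a different column.

# Toy model of the skew problem: rows are placed on nodes by a fixed
# distribution key, but a later query step groups on a different column.
# (Table, column and node counts are invented for illustration.)
from collections import Counter

NODES = 4

def node_for(key):
    # static placement: the hash of the key picks the node
    return hash(key) % NODES

# hypothetical orders table: (order_id, customer_id, amount)
orders = [(i, "cust%d" % (i % 3), 10.0) for i in range(1000)]

# distributing on order_id spreads the raw rows evenly across the nodes...
print(Counter(node_for(order_id) for order_id, _, _ in orders))

# ...but a group-by on customer_id sends the intermediate rows to at most
# three nodes, leaving the rest of the system idle for that step
print(Counter(node_for(cust) for _, cust, _ in orders))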

What it is doing is redistributing the data whenever required during the execution of a query, using the FPGA to move the data and hash-partition it into temp tables. Once a particular query step is completed, the temp tables are thrown away. Multiple sets of temp tables can be deployed simultaneously in order to support multiple queries that require different distributions of the data.
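As a rough sketch of the idea, here is single-process Python standing in for what DBx does in hardware across nodes; the function and table names are mine, not XtremeData's. A query step that needs a different distribution hash-partitions its input into per-node temp buffers, runs against them locally, and then discards them.

from collections import defaultdict

NODES = 4

def repartition(rows, key_index):
    # Hash-partition rows into per-node temp buffers keyed on a new column;
    # these stand in for the temp tables built for one query step.
    temp = defaultdict(list)
    for row in rows:
        temp[hash(row[key_index]) % NODES].append(row)
    return temp

# hypothetical rows: (order_id, customer_id, amount)
orders = [(i, "cust%d" % (i % 5), 10.0) for i in range(100)]

# redistribute on customer_id so each node can aggregate its share locally
temp_tables = repartition(orders, key_index=1)
per_node_totals = {node: sum(a for _, _, a in rows)
                   for node, rows in temp_tables.items()}
print(per_node_totals)

# once this query step is finished, the temp tables are simply thrown away
temp_tables.clear()

Another query running at the same time could build its own, differently keyed set of temp buffers from the same base rows, which is the point of allowing multiple simultaneous distributions.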

You will have to test the actual performance of this system for yourself, but it certainly appears encouraging based on XtremeData's published figures. Interestingly, because the data is redistributed to suit each query step rather than relying on a fixed distribution, queries of equal complexity across the same data set should give consistent performance.

The company claims that its solution will scale from 8 to 1,024 nodes, which you can have in any configuration. It may not be clear from XtremeData's web site, but the quoted offerings supporting 30, 60, 105 and 225TB of user data, going up to 60 nodes in a four-rack system, are examples only.

Of course, we will have to wait to see how much impact XtremeData has on the market, but initial results are encouraging: the product first went into beta testing at a number of sites last September, and there appears to be definite interest from customers.