It’s a busy time of year

Every mother’s son seems to be announcing new products, or new versions of existing products, at this time of year. As I write this, we can expect announcements from both Oracle and IBM in the next couple of weeks, but here are a few interesting developments that have already been announced.

Tokutek has just released version 6.5 of its storage engine for MySQL and MariaDB, which is billed as “optimised for flash”. Flash is, of course, much faster than conventional disk. However, apart from the fact that solid state disks (SSDs) are still a lot more expensive than conventional disks (by as much as an order of magnitude), they also suffer from “write wear”, whereby performance deteriorates over time. For example, I was recently talking to a hard-core gamer who had installed an SSD to improve his gaming performance. He told me that it was great initially but that the performance benefits started to fall off within six months. So, the less you write to an SSD the better, and that is the point of Tokutek: it uses fractal tree indexes instead of B-trees, and these write much less frequently because, when they do write, they write in much larger blocks than other MySQL storage engines do.
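To make the “write less often, in bigger blocks” argument concrete, here is a toy Python sketch. It is not Tokutek’s code, and `BufferedIndex` and `inplace_write_ops` are hypothetical names: it simply assumes a simplified B-tree that rewrites one leaf page per insert, and compares that with an index that buffers inserts in memory and writes them out as one large sorted block when the buffer fills.

```python
import random


class BufferedIndex:
    """Hypothetical toy model of write buffering (not TokuDB's implementation).

    Inserts accumulate in memory and are written out as one large sorted
    block whenever the buffer reaches `capacity` entries.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.flushed_blocks = []  # each entry stands in for one large device write

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Write whatever is currently buffered as a single sorted block.
        if self.buffer:
            self.flushed_blocks.append(sorted(self.buffer))
            self.buffer = []

    @property
    def write_ops(self):
        return len(self.flushed_blocks)


def inplace_write_ops(n_inserts):
    """Simplified B-tree model: assuming write-through and no caching, every
    insert rewrites the leaf page it lands on, i.e. one small write per insert."""
    return n_inserts


if __name__ == "__main__":
    keys = random.sample(range(10_000_000), 100_000)
    idx = BufferedIndex(capacity=4_096)
    for k in keys:
        idx.insert(k)
    idx.flush()  # write out the final partial block
    print("in-place writes:", inplace_write_ops(len(keys)))  # 100000
    print("buffered writes:", idx.write_ops)                 # 25
```

What matters is the ratio rather than the absolute numbers: 100,000 small writes versus 25 large ones. Real fractal trees also push buffered messages down through several levels of the tree, so each row is written more than once; the sketch deliberately ignores that.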

Apart from the flash support, another major feature of this release is the ability to add and drop columns and indexes on the fly. While users of DB2 and Oracle will take this sort of capability for granted, it is innovative within the MySQL environment. As an aside, Tokutek has been experimenting with using fractal trees in conjunction with MongoDB. There’s no product yet, but there are some interesting blog posts about it on the company website.

Moving on to a different vendor, Infobright has announced its Infopliance. This is a genuine data mart appliance, as opposed to a software package or “pre-integrated” offering. It scales up to 144TB in this first release and it is targeted specifically at machine-generated data: sensor data, log data, clickstream data, smart metering information, call and IP detail records and so on. The company’s high compression, its fast loading (thanks, in part, to that compression) and the fact that it needs no indexes (thanks to the product’s Knowledge Grid) make it well suited to this sort of environment. Indeed, even before the release of the Infopliance, this is where the company has gained much of its traction over the last few years, and it intends to focus its future development on this market. Needless to say, the Infopliance is aggressively priced: less than $5,000 per terabyte for the largest configurations.
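One way to see why an engine like this can do without indexes is that it keeps summary metadata about each compressed block of column values and uses it to avoid reading most blocks. The Python sketch below is not Infobright’s implementation; it assumes, as a simplification, that the metadata is just each pack’s minimum, maximum and row count, and `DataPack`, `build_packs` and `count_between` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DataPack:
    """A block of column values plus summary metadata kept alongside it.
    In a real engine the values would be stored compressed."""
    values: list
    min_val: int
    max_val: int
    row_count: int


def build_packs(column, pack_rows):
    """Split a column into fixed-size packs and record min/max/count for each."""
    packs = []
    for i in range(0, len(column), pack_rows):
        chunk = column[i:i + pack_rows]
        packs.append(DataPack(chunk, min(chunk), max(chunk), len(chunk)))
    return packs


def count_between(packs, lo, hi):
    """Count values v with lo <= v <= hi, using only metadata where possible.

    Returns (count, packs_opened), where packs_opened is how many packs had
    to be decompressed and scanned.
    """
    count, opened = 0, 0
    for p in packs:
        if p.max_val < lo or p.min_val > hi:
            continue                  # no value can match: skip the pack
        if lo <= p.min_val and p.max_val <= hi:
            count += p.row_count      # every value matches: use the stored count
            continue
        opened += 1                   # partial overlap: scan the values
        count += sum(1 for v in p.values if lo <= v <= hi)
    return count, opened


if __name__ == "__main__":
    column = [1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 20, 21]
    packs = build_packs(column, pack_rows=4)
    print(count_between(packs, 10, 13))  # (4, 1)
```

Here only one of the three packs has to be opened; the other two are answered from metadata alone, which is the kind of work an index would otherwise be built to avoid.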

Thirdly, ScaleOut Software has announced ScaleOut Analytics Server. ScaleOut is a purveyor of an in-memory data grid and associated technology. Traditionally, in-memory data grids have been used to improve the performance of operational applications. With this release, however, the company is targeting, as the product’s name implies, the analytics space. In particular, ScaleOut believes there is a significant market for analysing ephemeral data in close to real time. In some environments, for example, you want to detect trends over very short periods, say five or ten minutes. Event processing engines don’t have the storage to do this, combining event processing with an in-memory database is expensive, and storing the data in a warehouse creates a bottleneck. ScaleOut Analytics Server supports MapReduce, and the company is targeting environments with up to around 1TB of (distributed) memory, although it thinks the technology should scale reasonably to an order of magnitude more than that.
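Short-window trend detection of this kind maps naturally onto MapReduce over data that is already partitioned across a grid’s memory. The following Python sketch is hypothetical and does not use ScaleOut’s API: each list in `partitions` stands in for the events held on one grid node, the map step counts events per key within the last five minutes on each node, and the reduce step merges the partial counts.

```python
from collections import Counter
from functools import reduce

WINDOW_SECONDS = 5 * 60  # a five-minute window


def map_partition(events, now):
    """Map step, run against one node's local in-memory events.
    Each event is a (timestamp_seconds, key) pair; count keys seen
    within the window (now - WINDOW_SECONDS, now]."""
    cutoff = now - WINDOW_SECONDS
    return Counter(key for ts, key in events if cutoff < ts <= now)


def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two nodes."""
    return a + b


def trending(partitions, now, top_n=3):
    """Return the top_n keys by event count in the current window."""
    partials = [map_partition(events, now) for events in partitions]
    totals = reduce(reduce_counts, partials, Counter())
    return totals.most_common(top_n)


if __name__ == "__main__":
    partitions = [
        [(990, "a"), (900, "b"), (600, "a")],  # node 1; t=600 falls outside the window
        [(950, "a"), (800, "c"), (100, "b")],  # node 2
        [(999, "a"), (998, "b")],              # node 3
    ]
    print(trending(partitions, now=1000))  # [('a', 3), ('b', 2), ('c', 1)]
```

In a real grid the map step would run in parallel on each node, next to the data, so only the small per-key counts need to cross the network; that is the appeal of this approach over shipping raw events to a warehouse.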

In a way, the introduction of ScaleOut Analytics Server is the most interesting of these announcements. Firstly, as far as I know, this is the first in-memory data grid to specifically target analytics. Secondly, it is symptomatic of what is going on in the data warehousing and big data space in general, in that it represents yet another approach to an existing problem. As already noted, there are various other ways to address this requirement, and I haven’t even mentioned Acunu (a Cassandra distribution), which already has customers in exactly this space (Tellybug, for example: see the video on the Acunu website). This “more than one way to skin a cat” phenomenon is becoming ever more common across the big data market and is making life increasingly difficult for prospective users: it is nice to have choices, but too many choices become confusing.