Tokutek and MongoDB

Written By:
Content Copyright © 2013 Bloor. All Rights Reserved.
Also posted on: The IM Blog

I have written about Tokutek and its TokuDB product before. To save you having to look up previous articles (“Is YourSQL running too slowly?”), the following is an extract from an earlier piece:

“The big difference between TokuDB and the standard storage engines (such as InnoDB) that underpin MySQL is that TokuDB uses Fractal Tree Indexes instead of B-Trees. A Fractal Tree Index is just a marketing (and trademarked) name for what is otherwise known as a cache-oblivious streaming B-Tree (normal B-Trees are cache-aware). The fundamental difference is that a B-Tree has a single cache for the whole tree whereas a Fractal Tree uses multiple caches. The result is that a B-Tree writes to disk much more frequently, and in smaller blocks, than a Fractal Tree-based approach, and is therefore much less efficient.

“In particular, one of the problems with B-Trees, whether implemented in MySQL or any other database, is that when a B-Tree writes data to disk it mixes old data in with new, which means it needs a lot of writes to get all the new data written. Having more indexes makes it worse: every time you update the data on disk you also have to update all the relevant indexes, so you get a lot of writing to disk, which slows down load speeds. As a result, you have to limit the number of indexes you support, which in turn slows down query response. Fractal Tree Indexes get over this issue. When a Fractal Tree Index writes data, it is all new, which means you do a lot less writing. Further, Fractal Tree Indexes write data in much larger blocks (measured in megabytes, as opposed to the 16 KB that is typical for MySQL), which yields more effective compression. As a result you do not have to short-change your environment in terms of the number and richness of indexes you can support. So you get both better OLTP performance and better query performance. And when I say better I don’t mean just a little bit better but a whole lot better.”
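
To make the write-batching argument concrete, here is a rough sketch of my own in Python. It is a toy model, not Tokutek's implementation: the fanout, depth, buffer size and keyspace are invented for illustration, the "B-Tree" is assumed to rewrite one small leaf block for every insert (no cache coalescing at all), and the buffered tree simply queues inserts in each internal node and pushes them down in batches, counting one large write per child touched.

```python
import random

# All of these parameters are assumptions chosen for illustration only.
FANOUT = 16         # children per internal node
DEPTH = 3           # levels of internal nodes above the leaves
BUFFER_SIZE = 1024  # inserts an internal node queues before flushing
LEAF_KEYS = 64      # keys covered by one leaf block
KEYSPACE = FANOUT ** DEPTH * LEAF_KEYS


def unbuffered_writes(keys):
    """Toy B-Tree: every insert rewrites the leaf block it lands in."""
    return len(keys)


class BufferedNode:
    """Internal node that queues inserts and pushes them down in batches."""

    def __init__(self, level, lo, hi):
        self.level = level         # 0 is the root
        self.lo, self.hi = lo, hi  # half-open key range [lo, hi)
        self.buffer = []
        self.children = {}         # created lazily, keyed by child slot

    def insert(self, key):
        """Queue a key and return the number of block writes it triggered."""
        self.buffer.append(key)
        return self.flush() if len(self.buffer) >= BUFFER_SIZE else 0

    def flush(self):
        """Move every queued key one level down, one write per child touched."""
        width = (self.hi - self.lo) // FANOUT
        batches = {}
        for key in self.buffer:
            batches.setdefault((key - self.lo) // width, []).append(key)
        self.buffer = []
        writes = 0
        for slot, batch in batches.items():
            writes += 1  # one large write delivers the whole batch
            if self.level + 1 < DEPTH:  # child is another internal node
                if slot not in self.children:
                    lo = self.lo + slot * width
                    self.children[slot] = BufferedNode(self.level + 1, lo, lo + width)
                child = self.children[slot]
                writes += sum(child.insert(k) for k in batch)
        return writes

    def drain(self):
        """Flush everything still queued in this subtree down to the leaves."""
        writes = self.flush()
        for child in self.children.values():
            writes += child.drain()
        return writes


def buffered_writes(keys):
    """Toy buffered tree: total block writes to get every key into a leaf."""
    root = BufferedNode(0, 0, KEYSPACE)
    writes = sum(root.insert(k) for k in keys)
    return writes + root.drain()


if __name__ == "__main__":
    rng = random.Random(0)
    keys = [rng.randrange(KEYSPACE) for _ in range(100_000)]
    print("unbuffered block writes:", unbuffered_writes(keys))
    print("buffered block writes:  ", buffered_writes(keys))
```

Run as a script, this prints the two write counts for 100,000 random inserts. The absolute numbers mean nothing outside this model, but the buffered tree should need far fewer (and larger) writes, which is the shape of the argument made in the extract above.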

TokuDB, as should be clear from the above, is a storage engine for MySQL. However, I mentioned back in April that the company was planning a MongoDB version of its technology. This duly appeared, as the open source TokuMX, in June. The company has now announced the TokuMX Enterprise Edition (also open source and downloadable from the Tokutek web site) which, amongst other things, adds hot backup capabilities to the MongoDB environment. That’s nice, but it’s the Fractal Tree Indexes that make the big difference between a native MongoDB environment and one supported by TokuMX. Once again, the latter provides orders of magnitude better performance. You also get significantly improved compression, so that the physical size (and therefore cost) of the database is smaller, plus ACID compliance.

I think this is a really interesting development. Despite all the hype about Hadoop there are actually more live MongoDB deployments than there are Hadoop installations. Of course, a lot of these are simply open source, unsupported implementations, but then the same is also true of Hadoop. More particularly, whereas Hadoop has lots of vendors offering service, support and extra functionality (Hortonworks, Cloudera, MapR, IBM and so on), the same has not been true for MongoDB. Indeed, 10gen (now MongoDB, Inc.) has more or less had the market to itself prior to this. However, with the introduction of TokuMX Enterprise Edition we have another significant vendor offering such support, and competition in this market must be a good thing. In addition, the improved performance you’re likely to get from deploying TokuMX should have a substantial impact on the market, especially for enterprise customers: 10gen had better look to its laurels.