Deep dive

If you were to apply a clustering algorithm to the database market you would find a variety of clusters, some based around technology – NewSQL, NoSQL, graph, relational and so forth – and some based around solutions: transaction processing, warehousing, stream processing and HTAP (hybrid transaction/analytic processing). From a technical perspective, one of these clusters is MySQL.

Before the advent of NoSQL and NewSQL, MySQL was the selection du jour for start-ups as well as for many more established companies. However, it hit performance and scalability issues, so a number of alternative/additional vendors have emerged to improve upon the MySQL environment (some are replacements, some offer better engines, some aim to offer distributed capability). Thus there are InfiniDB (previously Calpont) and Infobright for data warehousing; Tokutek, which has been generally regarded (at least by me) as the leader in this space; ScaleBase, which provides distributed scaling; and a number of others. Into this arena has now come Deep Information Sciences and the Deep Engine.
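
All of these products exploit the fact that MySQL has a pluggable storage engine architecture, with the engine chosen per table. By way of illustration, this is what that selection looks like from a client, here using the mysql-connector-python driver; the connection details and the presence of the TokuDB engine on the server are assumptions for the sketch.

```python
# Minimal sketch: choosing MySQL storage engines per table.
# Assumes a reachable MySQL server with the TokuDB engine installed;
# host, credentials and schema are placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="demo"
)
cur = conn.cursor()

# The same schema can live on different engines, chosen table by table.
cur.execute(
    "CREATE TABLE orders_oltp (id INT PRIMARY KEY, total DECIMAL(10,2)) "
    "ENGINE=InnoDB"
)
cur.execute(
    "CREATE TABLE orders_hist (id INT PRIMARY KEY, total DECIMAL(10,2)) "
    "ENGINE=TokuDB"
)

# List the engines this particular server knows about.
cur.execute("SHOW ENGINES")
for name, support, *_ in cur.fetchall():
    print(name, support)

conn.close()
```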

The most interesting thing about the Deep Engine is its use of machine learning. As far as I know, Deep is the first database provider to use this technique to directly support database performance and scaling. Specifically, it uses predictive algorithms to forecast the behaviour of the system, and then assigns hardware resources accordingly in order to optimise performance. As a result of this and other features (of which more in a moment) the company claims to have demonstrated a trillion-row database and to outperform Tokutek by a factor of four (and Tokutek itself claims a 50x performance boost compared to native MySQL). However, I should note that the Deep Engine is not necessarily a replacement for TokuDB or InnoDB: you can use it alongside these and other MySQL engines, or as a replacement, as you prefer.
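
Deep has not published the details of its predictive algorithms, but the general shape of the idea – predict the workload, then provision resources ahead of it rather than reacting to it – can be sketched. In the toy example below an exponential moving average stands in for the machine learning model, and resizing a buffer pool stands in for assigning hardware resources; every class, parameter and number is my own illustration, not Deep’s API.

```python
# Conceptual sketch of prediction-driven resource assignment.
# A moving average stands in for Deep's (unpublished) learning model;
# resizing a write buffer stands in for assigning hardware resources.
# Entirely illustrative.

class WorkloadPredictor:
    """Predicts the next interval's write rate from recent history."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha       # smoothing factor
        self.estimate = 0.0      # predicted writes per interval

    def observe(self, writes_this_interval):
        # Exponentially weighted moving average of observed load.
        self.estimate = (self.alpha * writes_this_interval
                         + (1 - self.alpha) * self.estimate)
        return self.estimate


class AdaptiveBufferPool:
    """Resizes itself ahead of predicted demand instead of after it."""
    def __init__(self, predictor, min_pages=64, max_pages=4096):
        self.predictor = predictor
        self.min_pages = min_pages
        self.max_pages = max_pages
        self.pages = min_pages

    def tick(self, writes_this_interval):
        predicted = self.predictor.observe(writes_this_interval)
        # Provision roughly 2x the predicted load, clamped to budget.
        target = int(predicted * 2)
        self.pages = max(self.min_pages, min(self.max_pages, target))
        return self.pages


pool = AdaptiveBufferPool(WorkloadPredictor())
for load in [100, 400, 1600, 6400, 800]:   # a bursty synthetic workload
    print(f"observed {load:>5} writes -> {pool.tick(load):>4} pages")
```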

Another important feature is that Deep is an append-only database with deferred writes, with data held in memory in the interim so that it can be used immediately. As a result Deep supports HTAP. Further, rather than having fixed page sizes, Deep uses adaptive page sizes, which help to reduce or remove the need for seeks.
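
To make the mechanics concrete, here is a toy sketch of append-only storage with deferred writes: an update is visible immediately from memory, while persistence happens later as a large sequential batch append rather than many random writes. The file format, flush threshold and all names are illustrative assumptions, not Deep’s design.

```python
# Toy sketch of append-only storage with deferred writes. Puts are
# readable immediately from memory; disk writes are deferred and
# batched into sequential appends. Not Deep's actual implementation.
import json

class AppendOnlyStore:
    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every
        self.memtable = {}      # latest value per key, readable at once
        self.pending = []       # records not yet persisted

    def put(self, key, value):
        self.memtable[key] = value          # immediately visible to reads
        self.pending.append((key, value))   # persistence is deferred
        if len(self.pending) >= self.flush_every:
            self.flush()

    def get(self, key):
        return self.memtable.get(key)       # served straight from memory

    def flush(self):
        # One large sequential append instead of many random writes.
        with open(self.path, "a") as log:
            for key, value in self.pending:
                log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.pending.clear()


store = AppendOnlyStore("deep_demo.log", flush_every=3)
store.put("a", 1)
print(store.get("a"))   # 1 -- readable before anything reaches disk
store.put("a", 2)       # an update is just another appended record
store.put("b", 3)       # third put triggers the deferred batch flush
print(store.get("a"))   # 2
```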

Apart from being innovative in its technology, Deep is also being innovative in its marketing. The product is free to use for developers, for start-ups with annual revenues of less than $1m, and for educational establishments. In this way the company is hoping to build up a “cult” following. It will, no doubt, also have a significant focus on OEM and ISV partnerships. The product is available on IBM SoftLayer and Amazon Web Services for those wanting to deploy in the cloud.

More generally, the database market is currently rife with new companies developing database technology in innovative ways. I’m not talking about NewSQL and NoSQL here, but about products like AtomicDB, Ancelus, SpaceCurve and so on, which are using radically new techniques and mathematics to address familiar issues. I would put Deep into this category. Where it differs from the others is that it has a clearly defined market to go after, which makes a lot of sense. As a forecast: if Deep proves to be successful – and there is every reason why it should be – then I would expect mainstream vendors to start building machine learning algorithms into their own database products.