Content Copyright © 2017 Bloor. All Rights Reserved.
Also posted on: The IM Blog
Teradata has announced that it is spinning off the support and development of the Presto SQL on Hadoop engine into a new company called Starburst Data. The logic behind this move is that Teradata is focused on the Fortune 500 but Presto has much wider applicability. Teradata feels that a separate entity will be better placed to exploit opportunities within the wider market. At the same time, the new company – largely staffed by people who had been at Hadapt (a previous Teradata acquisition) – will continue to work closely with Teradata so that, for example, Presto will continue to support the Teradata QueryGrid and will continue to form a part of Teradata’s Appliance for Hadoop.
Presto was originally developed by Facebook as a step forward from Hive. It is an open source product, available under an Apache license, but it is not an Apache project. Unlike every other mainstream data warehousing vendor – all of which have, in effect, ported their data warehousing products onto Hadoop – Teradata instead decided to adopt Presto, and it formally launched support for the product in 2015.
Since 2015, Teradata has become a major contributor to the Presto project, most notably by adding spill-to-disk capability. This caters for queries that need more memory than is available, allowing analyses to continue to run, albeit at a slower pace. The lack of features like this is one of the major reasons why a number of SQL on Hadoop engines still fail to complete all of the TPC-DS SQL benchmark tests (especially at scale and with multiple concurrent users).
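For readers who want a feel for what spill-to-disk means in practice, here is a minimal sketch in Python. It is not Presto's implementation: the function name count_with_spill and the max_keys_in_memory budget are my own inventions for illustration, and a real engine would track bytes rather than key counts. The idea is simply that when an aggregation runs out of its memory budget it writes a sorted partial result to disk and carries on, then merges the spilled runs at the end.

```python
import heapq
import pickle
import tempfile
from collections import Counter
from itertools import groupby
from operator import itemgetter


def _spill(partial):
    """Write one sorted run of (key, count) pairs to a temporary file."""
    run = tempfile.TemporaryFile()
    for pair in sorted(partial.items()):
        pickle.dump(pair, run)
    run.seek(0)
    return run


def _read(run):
    """Yield the (key, count) pairs of a spilled run, in sorted order."""
    while True:
        try:
            yield pickle.load(run)
        except EOFError:
            return


def count_with_spill(rows, max_keys_in_memory):
    """Count rows per key, holding at most max_keys_in_memory keys in RAM.

    Hypothetical illustration only: the budget is a number of distinct keys.
    Yields (key, total) pairs in key order.
    """
    partial, runs = Counter(), []
    for key in rows:
        if key not in partial and len(partial) >= max_keys_in_memory:
            # Budget exhausted: spill the partial counts and keep going,
            # rather than failing the query.
            runs.append(_spill(partial))
            partial = Counter()
        partial[key] += 1

    # Merge the sorted runs on disk with whatever is still in memory.
    streams = [_read(run) for run in runs] + [iter(sorted(partial.items()))]
    try:
        for key, group in groupby(heapq.merge(*streams), key=itemgetter(0)):
            yield key, sum(count for _, count in group)
    finally:
        for run in runs:
            run.close()
```

Calling dict(count_with_spill(list("abacbad"), 2)) spills twice and still returns the same counts as an in-memory aggregation would. The price is the extra disk reads and writes, which is the slower pace mentioned above.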
What is more interesting is what hasn’t been introduced into the product yet. To begin with, there is a cost-based optimiser, the result of joint development between Teradata and Facebook. This was originally intended to be released in September but has been held back pending the launch of Starburst, so expect a major announcement around this capability. An announcement on workload management will follow: this had originally been scheduled for this month’s release of Presto and has similarly been held back.
As features, these are both big deals. While spill-to-disk provides a major advantage over other open source SQL on Hadoop engines, it doesn’t do much for competitive positioning versus traditional competitors such as IBM, Oracle, Kognitio, Actian, Vertica, or Pivotal. And the truth is that cost-based optimisation and workload management won’t give Presto any sort of huge advantage against these vendors either. What they will provide is some sort of parity. Right now, Presto cannot come close to competing with the likes of IBM Big SQL or Kognitio in performance terms, and the main reason for that is the lack of these features. While they may not completely close the performance gap – these will be first releases, after all – they should go a long way towards doing so.
This is all very sensible. The only thing I don’t like is the new company name, which I think is a bit excessive: it sounds like a character from a Marvel comic.