
Figure 2 - The Presto distributed system
Presto is a massively parallel distributed system that runs on a cluster of machines. A full installation includes a coordinator (which enables high availability) and multiple workers, as illustrated in Figure 2. Queries are submitted to the coordinator from a client such as the Presto CLI (command line interface). The coordinator parses, analyses and plans the query execution, then distributes the processing to the workers. Specialised connectors are available for Cassandra, MySQL, Google BigQuery, Elasticsearch, Oracle, MongoDB, Snowflake, PostgreSQL and many others, and ODBC and JDBC are also supported. There are Presto client libraries for C, Go, Java, Node.js, PHP, Python, R and Ruby. Also notable are the in-memory capabilities, the use of vectorised columnar processing and integration with Kubernetes, which allows deployment on any cloud or on-premises.
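The coordinator/worker split can be illustrated with a minimal scatter-gather sketch in Python. It is not Presto's implementation: threads stand in for worker machines, the data is assumed to be already divided into splits, the parse/analyse/plan phase is skipped, and the function names are hypothetical. Each "worker" computes a partial aggregate over its own split, and the "coordinator" merges those partials into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, amount)


def worker_partial_sum(split: List[Row]) -> Dict[str, int]:
    """A hypothetical worker: aggregates one split of the data locally."""
    partial: Dict[str, int] = {}
    for region, amount in split:
        partial[region] = partial.get(region, 0) + amount
    return partial


def coordinator_sum_by_region(splits: List[List[Row]], num_workers: int = 3) -> Dict[str, int]:
    """A hypothetical coordinator: farms splits out in parallel, then merges the partial results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the same order as the input splits
        partials = list(pool.map(worker_partial_sum, splits))
    result: Dict[str, int] = {}
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] = result.get(region, 0) + subtotal
    return result


if __name__ == "__main__":
    splits = [
        [("eu", 10), ("us", 5)],
        [("eu", 7)],
        [("us", 3), ("apac", 4)],
    ]
    print(coordinator_sum_by_region(splits))  # → {'eu': 17, 'us': 8, 'apac': 4}
```

Splitting an aggregation into partial and final steps is what lets this kind of work scale out: each worker sends back one subtotal per group rather than all of its raw rows.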
The product does not currently support query push-down, but the company intends to introduce it in 2020. This will be two-way, in the sense that processing will be pushed down to the source when that is appropriate, but not when the source database is overworked.
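The intended behaviour can be sketched as a simple decision rule. This is an illustration under stated assumptions, not Starburst's design: `SourceStatus`, the `load` metric and the 0.8 threshold are all hypothetical. Both paths return the same rows, but pushing the filter down to the source reduces how many rows are transferred, while declining to push down spares a busy source the extra work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SourceStatus:
    """Hypothetical snapshot of a source database's capabilities and current load."""
    name: str
    supports_pushdown: bool
    load: float  # hypothetical utilisation metric: 0.0 (idle) to 1.0 (saturated)


def should_push_down(source: SourceStatus, max_load: float = 0.8) -> bool:
    """Push work to the source only if it can execute it and is not overworked."""
    return source.supports_pushdown and source.load < max_load


def run_filter(
    rows: List[int],
    predicate: Callable[[int], bool],
    source: SourceStatus,
    max_load: float = 0.8,
) -> Tuple[List[int], int]:
    """Return (matching rows, number of rows transferred from the source to the engine)."""
    if should_push_down(source, max_load):
        # Pushed down: the source evaluates the predicate, so only matches cross the network.
        transferred = [r for r in rows if predicate(r)]
        return transferred, len(transferred)
    # Not pushed down: every row is shipped and the engine applies the predicate itself.
    transferred = list(rows)
    return [r for r in transferred if predicate(r)], len(transferred)


def is_even(r: int) -> bool:
    return r % 2 == 0


if __name__ == "__main__":
    rows = list(range(10))
    idle = SourceStatus("postgres", supports_pushdown=True, load=0.2)
    busy = SourceStatus("postgres", supports_pushdown=True, load=0.95)
    print(run_filter(rows, is_even, idle))  # → ([0, 2, 4, 6, 8], 5)
    print(run_filter(rows, is_even, busy))  # → ([0, 2, 4, 6, 8], 10)
```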
A major feature of Starburst Enterprise Presto is its cost-based optimiser, the result of a collaboration between what is now Starburst and Facebook, which is more capable than the optimiser used in standard Presto distributions. It has been designed specifically for Presto, unlike the Apache Calcite project, which is a more generic optimiser. Another major feature, previously contributed by Teradata, is spill-to-disk, which allows query processing to continue when memory runs out; by contrast, a number of other in-memory engines grind to a halt in that situation. Workload management capabilities are also provided, along with resource groups.
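Spill-to-disk can be illustrated with a hash aggregation that writes its partial results to a temporary file whenever the in-memory table exceeds a limit, then merges the spilled partials at the end. This is a minimal sketch, not Presto's implementation; the function name and the `max_groups_in_memory` limit (counted in groups rather than bytes) are assumptions made for illustration.

```python
import os
import pickle
import tempfile
from typing import Dict, Iterable, List, Tuple


def sum_by_key_with_spill(rows: Iterable[Tuple[str, int]], max_groups_in_memory: int) -> Dict[str, int]:
    """Group-by SUM that spills partial aggregates to disk when the in-memory table grows too large."""
    in_memory: Dict[str, int] = {}
    spill_files: List[str] = []

    def spill() -> None:
        # Write the current partial aggregates to a temporary file, then free the memory.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(in_memory, f)
        spill_files.append(path)
        in_memory.clear()

    for key, value in rows:
        in_memory[key] = in_memory.get(key, 0) + value
        if len(in_memory) > max_groups_in_memory:
            spill()

    # Merge: start from what is still in memory, then fold in each spilled partial.
    result = dict(in_memory)
    for path in spill_files:
        with open(path, "rb") as f:
            partial = pickle.load(f)
        os.remove(path)
        for key, value in partial.items():
            result[key] = result.get(key, 0) + value
    return result


if __name__ == "__main__":
    rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6)]
    print(sorted(sum_by_key_with_spill(rows, max_groups_in_memory=2).items()))
    # → [('a', 4), ('b', 7), ('c', 4), ('d', 6)]
```

For brevity, the final merge reads each spilled partial back into a single dictionary; a real engine would partition or sort the spilled data so that the merge step also stays within its memory budget.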
The product has strong security capabilities, with support for LDAP and Kerberos, and you can inherit security details from the storage environment. In addition, Starburst provides Presto security and governance through role-based access control, data masking, encryption (both at rest and in motion), column- and row-level security, and integration with Apache Ranger. Finally, the company has recently introduced Starburst Mission Control, a management console for Starburst Enterprise Presto clusters across platforms and data sources. It allows you to create, access and manage multiple clusters, even across hybrid cloud environments, from a single intuitive user interface.
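The idea behind role-based row-level security and column masking can be sketched as follows. The roles, policy tables and `****` masking token are hypothetical, and the code does not reflect how Starburst or Apache Ranger actually express policies; it only shows how the same table can return fewer rows, and masked values, depending on who queries it.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]

# Hypothetical policy tables, standing in for rules an administrator might define.
UNMASKED_COLUMNS: Dict[str, Set[str]] = {
    "admin": {"name", "email", "region", "salary"},
    "analyst": {"region", "salary"},
}
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "admin": lambda row: True,                     # admins see every row
    "analyst": lambda row: row["region"] == "eu",  # analysts see EU rows only
}


def apply_policies(rows: List[Row], role: str) -> List[Row]:
    """Apply row-level filtering, then mask any column the role may not see in clear."""
    if role not in ROW_FILTERS:
        raise PermissionError(f"role {role!r} has no access to this table")
    visible = UNMASKED_COLUMNS[role]
    result: List[Row] = []
    for row in rows:
        if not ROW_FILTERS[role](row):
            continue  # row-level security: drop rows outside the role's scope
        # Column-level masking: replace restricted values with a fixed token.
        result.append({col: (val if col in visible else "****") for col, val in row.items()})
    return result


if __name__ == "__main__":
    employees = [
        {"name": "Ana", "email": "ana@example.com", "region": "eu", "salary": 50000},
        {"name": "Bo", "email": "bo@example.com", "region": "us", "salary": 60000},
    ]
    print(apply_policies(employees, "analyst"))
    # → [{'name': '****', 'email': '****', 'region': 'eu', 'salary': 50000}]
```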
It is currently available on AWS and Kubernetes, which covers both cloud and on-premises deployments.