DataStax

Last Updated: 19th July 2023
Analyst Coverage: Daniel Howard and Philip Howard

DataStax is a database vendor that was founded in 2010. Its primary offerings are the self-managed DataStax Enterprise (DSE), the leading database built using Apache Cassandra™, and Astra DB, its cloud equivalent, which is available as a multi-cloud managed service. DataStax is headquartered in Santa Clara, CA, and has additional US offices in Austin and Atlanta, as well as international offices in the UK, France, Germany, Japan, Australia, Ireland, and Singapore.

The company has made several notable acquisitions in its lifetime. In 2015, it acquired Aurelius, the chief developers of the Titan graph database, and it subsequently leveraged that expertise to develop graph capabilities within its platform. In 2016 it acquired DataScale, which it used to develop its managed cloud service. More recently, in 2020, it acquired The Last Pickle, a Cassandra consulting and services company. In 2021, the company acquired Kesque, a cloud messaging service built on Apache Pulsar that it used to develop Luna Streaming, its own Pulsar-driven streaming platform. Streaming functionality is also available through Astra Streaming, the company’s cloud-based streaming service.

Lastly, in early 2023, it acquired Kaskada, a company that specialised in using time-based data to train machine learning models. This capability has since been incorporated into the DataStax platform as part of its real-time AI capabilities. These allow it to apply machine learning to data “in the moment”, enabling real-time predictive analytics and providing vital context to LLMs (Large Language Models), and thereby generative AI. The latter use case is further supported by the addition of vector search capabilities to Astra DB.

Company Info

Headquarters: 3975 Freedom Circle, Santa Clara, CA 95054, USA
Telephone: +1 650 389 6000

DataStax Enterprise

Last Updated: 28th June 2019
Mutable Award: Highly Commended 2019

What is it?

Fig 01 Showing how DataStax is built on top of Cassandra

DSE is a distributed NoSQL database, using CQL (Cassandra Query Language), that is oriented towards (though not exclusive to) cloud and hybrid-cloud architectures. It is built on top of Cassandra, as illustrated in Figure 1. It boasts numerous capabilities above and beyond what Cassandra alone offers, including native search and analytics, auto-management functionality, and significant improvements in speed and performance.

As can be seen in this diagram, DSE provides multi-model capabilities and, unlike some other multi-model products, it lets you leverage all of the models not just within a single database instance but also within a single query. For example, the optimiser can automatically invoke Spark or search (Solr) from a Gremlin (graph) query. This has the advantage that if you are a Gremlin or CQL developer you don’t need to know or understand Spark (or Solr). One possible limitation concerns document model implementations, where DataStax requires that a schema be defined.

Note that from the perspective of supporting hybrid processing environments DataStax takes the view that this should not only encompass analytic and transactional processing but also search.

Customer Quotes

“Search and analytics were some of the key capabilities we were looking for and with DataStax Enterprise, we got a unified platform that provides all these and more all in the same cluster. This was a significant reason why we chose DataStax Enterprise to power our app.”
You Are My Guide

“The key benefit of using DSE is the co-location of data and technology with Cassandra and Solr for search and Cassandra with Spark for analytics. This results in the real-time nodes having access to data instantly and not requiring time-consuming or costly ETL processes to move data between systems, because all the data is transparently replicated in the cluster.”
Macquarie

How does it work?

Fig 02 How DataStax supports workload management

Architecturally, the most notable feature of DSE is that it uses a master-less architecture in which all nodes are the same, with the result that there is no single point of failure. This particularly suits environments where you want to deploy across multiple clouds or in hybrid on-premises and cloud deployments. It also suits the way that DataStax supports workload management, which is illustrated in Figure 2.

As can be seen, you can run any workload within a single node, assign a specific task to a particular node, or dedicate whole clusters (each individually and elastically scalable) to a particular task. You can also mix and match these options.
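The master-less design described above can be illustrated with a small sketch. This is a toy model, not DSE's implementation: the `Ring` class, the node names, the replication factor of 3 and the use of MD5 as the hash are all assumptions made for illustration. It shows the general idea that every node sits on the same ring, any key is owned by several equal peers, and losing one node still leaves live copies.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy peer-to-peer ring: all nodes are equal, none is a master."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # One token per node, kept sorted so we can walk the ring in order.
        self._ring = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str):
        """Walk clockwise from the key's token and collect the next nodes."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_right(tokens, token(key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def without(self, node: str) -> "Ring":
        """Return a new ring that models the loss of one node."""
        remaining = [n for _, n in self._ring if n != node]
        return Ring(remaining, self.replication_factor)


# Hypothetical five-node cluster and key.
ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = ring.replicas("customer:42")
survivors = ring.without(owners[0]).replicas("customer:42")
print("owners:", owners)
print("after losing", owners[0], "->", survivors)
```

In this sketch, removing the first owner simply shifts responsibility to the next peer on the ring, and the two remaining owners still hold the data, which is the sense in which there is no single point of failure.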

From a transactional standpoint, the database supports the atomicity, isolation and durability guarantees of ACID, but offers tuneable rather than fixed consistency. Consistency is tuned by choosing either asynchronous or synchronous replication: asynchronous replication provides eventual consistency, while synchronous replication provides immediate consistency at the cost of reduced performance.
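The trade-off between the two replication modes can be sketched with a toy model. Nothing here is DSE code: the `ReplicatedValue` class, its method names and the three-replica setup are hypothetical, and real replication involves networking and failure handling that this ignores. The sketch only shows why an asynchronous write can be read back stale until replication catches up, whereas a synchronous write cannot.

```python
class ReplicatedValue:
    """Toy model of one value copied to several replicas."""

    def __init__(self, replica_count=3, initial="v0"):
        self.replicas = [initial] * replica_count
        self.pending = []  # (replica index, value) copies not yet applied

    def write(self, value, synchronous):
        """Synchronous: update every replica before returning.
        Asynchronous: update one replica now and queue the others."""
        if synchronous:
            self.replicas = [value] * len(self.replicas)
            self.pending.clear()  # queued older copies are now obsolete
        else:
            self.replicas[0] = value
            self.pending.extend(
                (index, value) for index in range(1, len(self.replicas))
            )

    def read(self, replica_index):
        """Read whatever the chosen replica currently holds."""
        return self.replicas[replica_index]

    def settle(self):
        """Apply queued copies, as background replication eventually would."""
        for index, value in self.pending:
            self.replicas[index] = value
        self.pending.clear()


store = ReplicatedValue()

store.write("v1", synchronous=False)
stale = store.read(2)        # replica 2 has not received the write yet
store.settle()
fresh = store.read(2)        # eventual consistency: the copy has arrived

store.write("v2", synchronous=True)
immediate = store.read(2)    # immediate consistency: no waiting needed

print(stale, fresh, immediate)
```

The synchronous path does more work before the write returns, which is where the reduced performance mentioned above comes from.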

As far as analytics and search are concerned, the company offers specific enterprise components known as DSE Analytics and DSE Search, which work in conjunction with both DSE itself and DSE Graph. As mentioned, DSE Analytics is integrated with Spark, and the company claims that DSE Analytics is significantly faster than open source Spark. The product also supports Python, and it has customers using both R and TensorFlow, though these are not formally supported as yet. PMML (Predictive Model Markup Language) is not supported. It is also worth noting that DSE Graph in and of itself boasts some significant differentiators. These include its dual processing engines, which allow you to switch easily between transactional and analytical processing, and DataStax Studio, a particularly impressive example of a visual development environment for graph.

Finally, it is worth commenting on DSE’s Kafka integration, which enables data to be streamed into the DSE environment. This is currently only a one-way process, but the company plans to support export to Kafka in a future release.

Why should you care?

Cassandra initially made its name as a NoSQL database because it was designed from the outset to support key enterprise requirements such as constant availability, resilience, and disaster recovery, as well as scalability. Many other NoSQL databases did not start from this position and only added mission-critical capabilities – if they did – later. We prefer the approach taken by the developers of Cassandra. Moreover, in DSE there are substantial additional elements that go beyond Cassandra itself, some of which are at the feature level and some of which, such as the multi-model support, and the search and analytics capabilities, are more substantial.

The Bottom Line

DSE is almost unique in supporting both graph and conventional analytics alongside transactional processing and search. No other company we have spoken to sees hybrid processing as a three-way (transactions, analytics and search) environment, and we think DataStax’s approach makes a lot of sense.

DataStax Enterprise (DSE) (Graph Engine)

Last Updated: 11th September 2020

What is it?

DSE is a distributed database oriented towards (though not exclusive to) a hybrid-cloud architecture. It is built on top of Cassandra and includes native search and analytics, continuous availability, and significant improvements in speed and performance. It is available on-premises, in-cloud, or as part of a hybrid solution. A recently released offering, DataStax Astra, provides open source Cassandra as a database-as-a-service. Although Astra supports the GraphQL API, it does not currently include DSE’s graph engine (this may change).

The graph engine in DSE is based on a property graph solution that is optimised for storing billions of items and relationships. It is suited for both transactional and analytical processing. In support of the latter, it also offers Spark-based analytics.

Customer Quotes

“DSE’s scalability and analytics capabilities provide us what we need to not only analyze every aspect of the supply chain, but also bring new innovations to market.”
elementum

“Graph analytics is great for showing relationships between data points, and this can be very valuable in a healthcare scenario. By looking at data in different ways within the same platform, we can support more in-depth interactions with patients and improve healthcare outcomes.”
Babylon Health

What does it do?

Fig 01 - Monitoring the DataStax DSE environment

The DSE Graph Engine is a property graph that is built into DSE and leverages DSE’s capabilities for storage, search and analytics. Consequently, it inherits the scalability, high availability, performance (as much as 10 times faster following this re-engineering) and real-time processing that Cassandra and DataStax are well known for, with scaling up to billions of entities. In service to this, it leverages optimisation techniques such as query optimisation, data partitioning, and distributed query execution, among others. In particular, now that the graph data model is within the platform, you can store your data exactly once but access it via either Cassandra or Gremlin (part of Apache TinkerPop) APIs. This means, for example, that you can create CQL (Cassandra Query Language) tables and read them via Gremlin, or vice versa, which provides interoperability and transparency. SQL and Spark APIs are also supported, with the latter supporting streaming environments as well as batch processing.
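The idea of storing data once and reading it through either a table-style or a graph-style interface can be sketched in a few lines. This is a conceptual illustration only and uses neither CQL nor Gremlin: the `tables` dictionary, the `person` and `knows` schemas, and the `select` and `reachable` helpers are all hypothetical names invented for the example.

```python
from collections import deque

# One stored copy of the data: plain rows grouped by table (hypothetical schema).
tables = {
    "person": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
        {"id": 3, "name": "Linus"},
    ],
    "knows": [
        {"src": 1, "dst": 2},
        {"src": 2, "dst": 3},
    ],
}


def select(table, **filters):
    """Table-style access: return rows whose columns match the filters."""
    return [
        row for row in tables[table]
        if all(row[column] == wanted for column, wanted in filters.items())
    ]


def reachable(start_id, max_hops):
    """Graph-style access: breadth-first walk over the same 'knows' rows."""
    seen = {start_id}
    queue = deque([(start_id, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for edge in select("knows", src=node):
            if edge["dst"] not in seen:
                seen.add(edge["dst"])
                queue.append((edge["dst"], hops + 1))
    seen.discard(start_id)
    return sorted(select("person", id=i)[0]["name"] for i in seen)


row_view = select("person", id=1)          # read a row as a table user would
graph_view = reachable(1, max_hops=2)      # traverse the same rows as a graph
print(row_view, graph_view)
```

Both functions read the same rows, so there is no second copy to load or keep in sync, which is the point being made about the re-engineered Graph Engine.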

The Graph Engine is designed for both transactional and analytical processing, and consequently features two processing engines (one transactional, one analytical) and allows for both OLTP and OLAP graph traversals. Furthermore, switching between engines (and therefore modes of traversal) is relatively simple, and can be done without altering the underlying data. This means that you can run transactional and analytic queries on a single set of data, as needed. In addition, analytical and transactional workloads are separated, and automated workload management is provided. Notable new features for graph processing include significantly faster and simpler loading processes (because you are now simply loading into Cassandra) and an intelligent indexing tool that analyses the traversals you regularly make and then recommends appropriate indexes to optimise traversal performance.

Fig 02 - DataStax Studio

There are a variety of tools for managing all aspects of your graphs and graph clusters. These include Lifecycle Manager and OpsCenter, which allow you to automate and visualise the creation of new graph clusters, respectively. However, the most important tool for interacting with the Graph Engine is probably DataStax Studio (see Figure 2), a visual, browser-based development environment for your graph. It supports Spark SQL, Gremlin, and CQL, and additionally comes with a built-in smart Gremlin editor, similar to an RDBMS smart query editor. In fact, much of DataStax Studio is similar in feel to the visual development tools available in more conventional, relational environments. Moreover, to support the visualisation aspect of this tool, DataStax partners with a number of visualisation vendors, including Cambridge Intelligence, Tom Sawyer, Linkurious and Tableau (although the last of these is a more general partnership, and not specific to graphs).

Why should you care?

In the past, we have commented that graph was well-integrated within DSE and that it therefore shared many of the advantages of Cassandra. However, now that the Graph Engine is built in, such a comment seems superfluous. Perhaps more to the point, to all intents and purposes DataStax no longer markets its graph capabilities as distinct from Cassandra. Of course, the Graph Engine is still available for use in that way if that is what you want to do, but the emphasis is now much more on how the two are complementary, whether that is in IoT environments, for applications involving Customer 360°, or in a variety of other use cases.

The Bottom Line

DataStax is positioning DSE, including its Graph Engine, as “the cloud native platform for developers with zero lock-in, zero downtime at global scale”. Cassandra itself is, of course, widely seen as a popular environment for this purpose. By re-architecting DSE so that the graph data model is embedded within the platform, DataStax is making it that much easier for developers to incorporate graphs as a part of an application, rather than the whole thing. It makes a lot of sense.
