DataStax
Analyst Coverage: Daniel Howard and Philip Howard
DataStax is a database vendor that was founded in 2010. Its primary offering is DataStax Enterprise (DSE), the leading database built using Apache Cassandra™. In 2015, the company acquired Aurelius, the chief developers of the Titan graph database, and it subsequently leveraged that expertise to develop DSE Graph as a graph database add-on for DSE. However, in the most recent release (DSE 6.8) the company has re-engineered these capabilities so that the product’s graph capabilities are now built into the platform rather than being an add-on to it. In addition, DataStax is the leading contributor to Apache TinkerPop, the graph computing framework upon which various graph databases (including DSE Graph) are based.
DataStax is headquartered in Santa Clara, CA, and has additional US offices in Austin and Atlanta, as well as international offices in the UK, France, Germany, Japan and Australia. As of this writing, DataStax has more than 400 employees.
DataStax DSE Graph (2019)
Last Updated: 23rd January 2019
Mutable Award: Gold 2018
DSE is a distributed database oriented towards (though not exclusive to) a hybrid-cloud architecture. It is built on top of Cassandra, but boasts numerous capabilities above and beyond what Cassandra alone offers, including native search and analytics, continuous availability, and significant increases to speed and performance.
DSE Graph is the graph database add-on for DSE. It is a property graph solution that is optimised for storing billions of items and relationships. It is suited for both transactional and analytical processing. In accordance with the latter, it also supports Spark-based analytics. It is available on-premises, in-cloud, or as part of a hybrid solution, and is additionally deployable as a Docker container.
The product originally existed as a bespoke version of the Titan database, optimised to run on the Cassandra database engine used within DSE. It has since been updated with a bevy of new features and capabilities, but retains basic compatibility with Titan, allowing for straightforward migration between the two. Moreover, as with Titan, DSE Graph is built to use Apache TinkerPop. Notably, DataStax is the primary contributor to TinkerPop, having contributed approximately 99% of TinkerPop’s codebase.
Customer Quotes
“DSE’s scalability and analytics capabilities provide us what we need to not only analyze every aspect of the supply chain, but also bring new innovations to market.”
Elementum
“Graph analytics is great for showing relationships between data points, and this can be very valuable in a healthcare scenario. By looking at data in different ways within the same platform, we can support more in-depth interactions with patients and improve healthcare outcomes.”
Babylon Health
DSE Graph is a property graph that is fully integrated with DSE. In fact, it relies on DSE (and Cassandra underneath it) as a data store. It also integrates with a number of built-in DSE capabilities, including DSE Search and DSE Analytics. In addition, it is highly scalable and performant, scaling up to billions of entities. In service to this, it leverages optimisation techniques such as query optimisation, data partitioning, and distributed query execution, among others.
DSE Graph is designed for both transactional and analytical processing, and consequently features two processing engines – one transactional, one analytical – and allows for both OLTP and OLAP graph traversals. For OLAP, Gremlin (part of TinkerPop), SQL and Spark APIs are supported, with the Spark API covering both batch and streaming. Switching between engines (and therefore modes of traversal) is relatively simple, and can be done without altering the underlying data, so you can run transactional and analytic queries on a single set of data, as needed. In addition, analytical and transactional workloads are separated, and automatic workload management is available.
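To make the distinction between the two modes of traversal concrete, the following is a minimal sketch in plain Python. It is a toy in-memory model, not the DSE Graph or Gremlin API, and every name in it (VERTICES, EDGES, oltp_neighbours, olap_out_degrees) is hypothetical. It illustrates only the general idea that an OLTP traversal starts from a known vertex and touches a handful of edges, whereas an OLAP traversal scans the whole graph, and that both can run over the same unaltered data.

```python
from collections import Counter, defaultdict

# Toy property graph held in memory. Hypothetical data, not a DSE schema.
VERTICES = {
    "alice": {"label": "person"},
    "bob": {"label": "person"},
    "carol": {"label": "person"},
    "acme": {"label": "company"},
}
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "works_at", "acme"),
    ("carol", "works_at", "acme"),
]

# Adjacency index keyed by (source vertex, edge label) for point lookups.
ADJACENCY = defaultdict(list)
for src, label, dst in EDGES:
    ADJACENCY[(src, label)].append(dst)


def oltp_neighbours(start, edge_label):
    """OLTP-style traversal: start at one vertex and follow its edges."""
    return sorted(ADJACENCY.get((start, edge_label), []))


def olap_out_degrees():
    """OLAP-style traversal: scan every edge to compute a global statistic."""
    degrees = Counter({vertex: 0 for vertex in VERTICES})
    for src, _label, _dst in EDGES:
        degrees[src] += 1
    return dict(degrees)


print(oltp_neighbours("alice", "knows"))
print(olap_out_degrees())
```

Both functions read the same EDGES list, which mirrors the point that switching modes does not require changing the underlying data.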
DSE Graph includes a variety of tools for managing all aspects of your graphs and graph clusters. This includes Lifecycle Manager and OpsCenter, which allow you to automate and visualise the creation of new graph clusters, respectively. However, the most important tool for interacting with DSE Graph might be the DataStax Studio, a visual, browser-based development environment for your graph. It supports Spark SQL, Gremlin, and CQL (Cassandra Query Language), and additionally comes with a built-in smart Gremlin editor, similar to an RDBMS smart query editor. In fact, much of DataStax Studio is similar in feel to the visual development tools available in more conventional, relational environments. Moreover, to support the visualisation aspect of this tool, DataStax partners with a number of visualisation vendors, including Cambridge Intelligence, Tom Sawyer, Linkurious and Tableau (although the latter is a more general partnership, and not specific to DSE Graph).
In general, the reason graph databases are worth caring about is that they perform well compared to more traditional databases for processing data that involves multiple, complex relationships. However, a graph database by itself can only do so much. In order to effectively address so-called graph problems, your graph must be embedded inside a full software stack that supports a wide range of capabilities, such as search and analytics. In other words, graph problems are bigger than just the graph database. This is where DataStax, with DSE and DSE Graph, excels, providing not only the graph database, but the full stack as well. Many of the benefits of DSE carry over to DSE Graph – including continuous availability, hybrid-cloud deployment and scalability – and the two are well integrated, allowing DSE Graph to take advantage of capabilities available in DSE, including DSE Search and DSE Analytics. DSE Graph also boasts some significant differentiators in its own right. These include its dual processing engines, which allow you to easily switch between transactional and analytical processing, and DataStax Studio, a particularly impressive example of a visual development environment for graph.
The Bottom Line
Together with its parent platform, DSE Graph provides a complete and effective means of addressing graph problems, regardless of whether they are transactional or analytical in nature. If you already use DSE, DSE Graph makes for an excellent addition. If not, it provides a very good reason for doing so.
DataStax Enterprise
Last Updated: 28th June 2019
Mutable Award: Highly Commended 2019
DSE is a distributed NoSQL database, using CQL (Cassandra Query Language), that is oriented towards (though not exclusive to) cloud and hybrid-cloud architectures. It is built on top of Cassandra, as illustrated in Figure 1. It boasts numerous capabilities above and beyond what Cassandra alone offers, including native search and analytics, auto-management functionality, and significant increases to speed and performance.
As can be seen in this diagram, DSE provides multi-model capabilities and, unlike some other multi-model products, you can leverage all of the models not just within a single database instance but also within a single query. For example, the optimiser can automatically invoke Spark or search (Solr) from a Gremlin (graph) query. This has the advantage that if you are a Gremlin or CQL developer you don’t need to know or understand Spark (or Solr). One possible limitation concerns document model implementations, where DataStax requires that a schema be defined.
Note that from the perspective of supporting hybrid processing environments DataStax takes the view that this should not only encompass analytic and transactional processing but also search.
Customer Quotes
“Search and analytics were some of the key capabilities we were looking for and with DataStax Enterprise, we got a unified platform that provides all these and more all in the same cluster. This was a significant reason why we chose DataStax Enterprise to power our app.”
You Are My Guide
“The key benefit of using DSE is the co-location of data and technology with Cassandra and Solr for search and Cassandra with Spark for analytics. This results in the real-time nodes having access to data instantly and not requiring time-consuming or costly ETL processes to move data between systems, because all the data is transparently replicated in the cluster.”
Macquarie
Architecturally, the most notable feature of DSE is that it uses a master-less architecture in which all nodes are the same, with the result that there is no single point of failure. This particularly suits environments where you want to deploy across multiple clouds or in hybrid on-premises and cloud deployments. It also suits the way that DataStax supports workload management, which is illustrated in Figure 2.
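As an illustration of why a master-less design has no single point of failure, here is a minimal Python sketch of a token ring with replication. It is a simplified model under stated assumptions, not DSE or Cassandra code: the Ring class and its methods are hypothetical, SHA-256 stands in for the partitioner's hash function, and replicas are simply the next nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 used as a stand-in)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy peer-to-peer ring: every node is equal and none acts as master."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [tok for tok, _node in self.ring]

    def replicas(self, key: str):
        """Return the nodes holding a key: the next rf nodes clockwise."""
        start = bisect_right(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.rf)]

    def without(self, failed_node: str):
        """Return a new ring that models the loss of one node."""
        survivors = [node for _tok, node in self.ring if node != failed_node]
        return Ring(survivors, min(self.rf, len(survivors)))


ring = Ring(["n1", "n2", "n3", "n4", "n5"], replication_factor=3)
owners = ring.replicas("user:42")
after_failure = ring.without(owners[0]).replicas("user:42")
print(owners, after_failure)
```

In this model, losing the first replica of a key still leaves the other two replicas in place, and the next node on the ring takes over the vacated position.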
As can be seen, you can support any workload within a node, you can specify that a particular node has a specific task, or you can have clusters, each individually (and elastically) scalable, dedicated to a particular task. You can also mix and match these approaches.
From a transactional standpoint, the database supports the atomicity, isolation and durability guarantees of ACID, but with tuneable rather than fixed consistency. You tune consistency by choosing either asynchronous or synchronous replication: the former provides eventual consistency, while the latter provides immediate consistency at the cost of reduced performance.
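In Cassandra-based systems this trade-off is commonly expressed as a per-request consistency level, that is, how many replicas must acknowledge a write or respond to a read. The sketch below shows the arithmetic in plain Python, assuming the usual ONE, QUORUM and ALL levels. The function names are hypothetical and this is not driver code. The rule it encodes is that a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def acks_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    writes = acks_required(write_level, replication_factor)
    reads = acks_required(read_level, replication_factor)
    return writes + reads > replication_factor


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, reads_see_latest_write(write_level, read_level, 3))
```

Lower levels need fewer replica responses per request, which is where the performance side of the trade-off comes from.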
As far as analytics and search are concerned, the company offers specific enterprise components known as DSE Analytics and DSE Search, which work in conjunction with both DSE itself and DSE Graph. As mentioned, DSE Analytics is integrated with Spark, and the company claims that DSE Analytics is significantly faster than open source Spark. The product also supports Python, and it has customers using both R and TensorFlow, though these are not formally supported as yet. PMML (Predictive Model Markup Language) is not supported. It is also worth noting that DSE Graph in and of itself boasts some significant differentiators. These include its dual processing engines, which allow you to easily switch between transactional and analytical processing, and DataStax Studio, a particularly impressive example of a visual development environment for graph.
Finally, it is worth commenting on DSE’s Kafka integration, which enables data to be streamed into the DSE environment. This is currently only a one-way process, but the company plans to support export to Kafka in a future release.
Cassandra initially made its name as a NoSQL database because it was designed from the outset to support key enterprise requirements such as constant availability, resilience, and disaster recovery, as well as scalability. Many other NoSQL databases did not start from this position and only added mission-critical capabilities – if they did – later. We prefer the approach taken by the developers of Cassandra. Moreover, in DSE there are substantial additional elements that go beyond Cassandra itself, some of which are at the feature level and some of which, such as the multi-model support, and the search and analytics capabilities, are more substantial.
The Bottom Line
DSE is almost unique in supporting both graph and conventional analytics alongside transactional processing and search. No other company we have spoken to sees hybrid processing as a three-way (transactions, analytics and search) environment, and we think DataStax’s approach makes a lot of sense.
DataStax Enterprise (DSE) (Graph Engine) (2020)
Last Updated: 11th September 2020
DSE is a distributed database oriented towards (though not exclusive to) a hybrid-cloud architecture. It is built on top of Cassandra and includes native search and analytics, continuous availability, and significant increases to speed and performance. It is available on-premises, in-cloud, or as part of a hybrid solution. A recently released offering, DataStax Astra, provides open source Cassandra as a database-as-a-service. Although Astra supports the GraphQL API, it does not currently include DSE’s graph engine (though this may change).
The graph engine in DSE is based on a property graph solution that is optimised for storing billions of items and relationships. It is suited for both transactional and analytical processing. In accordance with the latter, it also supports Spark-based analytics.
Customer Quotes
“DSE’s scalability and analytics capabilities provide us what we need to not only analyze every aspect of the supply chain, but also bring new innovations to market.”
Elementum
“Graph analytics is great for showing relationships between data points, and this can be very valuable in a healthcare scenario. By looking at data in different ways within the same platform, we can support more in-depth interactions with patients and improve healthcare outcomes.”
Babylon Health
The DSE Graph Engine is a property graph that is built into DSE and leverages DSE’s capabilities for storage, search and analytics. Consequently, it inherits the scalability, high availability, performance (as much as 10 times faster with this re-engineering) and real-time processing that Cassandra and DataStax are well known for, with scaling up to billions of entities. In service to this, it leverages optimisation techniques such as query optimisation, data partitioning, and distributed query execution, among others. In particular, now that the graph data model is within the platform, you can store your data exactly once but access it via either Cassandra or Gremlin (part of TinkerPop) APIs. This means, for example, that you can create CQL (Cassandra Query Language) tables and read them via Gremlin, or vice versa, which provides interoperability and transparency. SQL and Spark APIs are also supported, with the latter supporting streaming environments as well as batch processing.
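The idea of storing data once and reading it through either a table-style or a graph-style interface can be sketched as follows. This is a toy Python model under stated assumptions, not the CQL or Gremlin API: the PERSON and KNOWS tables, their column names, and the two access functions are all hypothetical.

```python
# A single copy of the data, held as plain rows. Hypothetical tables.
PERSON = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 3, "name": "Edsger"},
]
KNOWS = [
    {"out_id": 1, "in_id": 2},
    {"out_id": 2, "in_id": 3},
]


def select_person(person_id):
    """Table-style access: fetch one row by its key."""
    for row in PERSON:
        if row["id"] == person_id:
            return row
    return None


def out_knows(person_id):
    """Graph-style access: treat KNOWS rows as edges and follow them."""
    targets = [edge["in_id"] for edge in KNOWS if edge["out_id"] == person_id]
    return [select_person(target)["name"] for target in targets]


print(select_person(2))
print(out_knows(1))
```

Neither function copies or transforms the rows. Each one interprets the same stored records differently, which is the interoperability the paragraph above describes.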
The Graph Engine is designed for both transactional and analytical processing, and consequently features two processing engines – one transactional, one analytical – and allows for both OLTP and OLAP graph traversals. Furthermore, switching between engines (and therefore modes of traversal) is relatively simple, and can be done without altering the underlying data. This means that you can leverage transactional and analytic queries on a single set of data, as needed. In addition, analytical and transactional workloads are separated, and automated workload management is provided. Notable new features for graph processing include significantly faster and simpler loading processes (because you are now simply loading into Cassandra) and an intelligent indexing tool that analyses the traversals that you regularly make and then recommends appropriate indexes in order to optimise traversal performance.
There are a variety of tools for managing all aspects of your graphs and graph clusters. This includes Lifecycle Manager and OpsCenter, which allow you to automate and visualise the creation of new graph clusters, respectively. However, the most important tool for interacting with the Graph Engine is probably DataStax Studio (see Figure 2), a visual, browser-based development environment for your graph. It supports Spark SQL, Gremlin, and CQL (Cassandra Query Language), and additionally comes with a built-in smart Gremlin editor, similar to an RDBMS smart query editor. In fact, much of DataStax Studio is similar in feel to the visual development tools available in more conventional, relational environments. Moreover, to support the visualisation aspect of this tool, DataStax partners with a number of visualisation vendors, including Cambridge Intelligence, Tom Sawyer, Linkurious and Tableau (although the latter is a more general partnership, and not specific to graphs).
In the past, we have commented that graph was well-integrated within DSE and that it therefore shared many of the advantages of Cassandra. However, now that the Graph Engine is built in, such a comment seems superfluous. Perhaps more to the point, to all intents and purposes DataStax no longer markets its graph capabilities as distinct from Cassandra. Of course, the Graph Engine is still available for use in that way if that is what you want to do, but the emphasis is now much more on how the two are complementary, whether that is in IoT environments, for applications involving Customer 360°, or in a variety of other use cases.
The Bottom Line
DataStax is targeting DSE, including its Graph Engine, as “the cloud native platform for developers with zero lock-in, zero downtime at global scale”. Cassandra itself is, of course, widely seen as a popular environment for this purpose. By re-architecting DSE so that the graph data model is embedded within the platform, DataStax is making it that much easier for developers to incorporate graphs as a part of an application, rather than the whole thing. It makes a lot of sense.