ScyllaDB

Content Copyright © 2017 Bloor. All Rights Reserved.
Also posted on: The IM Blog

ScyllaDB, which is pronounced as in Scylla and Charybdis (that is, like Cilla Black), is a replacement for Cassandra that (very) significantly outperforms other Cassandra implementations. Briefly, and various benchmarks and user tests confirm this, ScyllaDB requires much less hardware, loads data much faster, greatly reduces update latency, and is generally much better when it comes to performance and scalability.

How does it do this? Well, the simple answer is that the company rewrote the whole of the Cassandra engine in C++ instead of using Java. More than that, while Cassandra is multi-threaded, ScyllaDB also shards data on a per-core basis, which eliminates the locking issues you can get with Cassandra. In addition, Cassandra uses a page cache whereas ScyllaDB uses direct memory access (DMA). It also has a dynamic I/O scheduler. There are probably more things, but you get the idea. This is deeply technical, in-the-weeds stuff and you probably don’t care; what you do care about is that it is way faster. And because it is way faster you need less hardware, which reduces administration and saves costs.
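For readers who do care about the weeds, here is a minimal sketch of the per-core sharding idea. It is an illustration only, not ScyllaDB’s actual code: the class names (`Shard`, `ShardedStore`) and the CRC32-based routing are assumptions made up for this example, and it runs in a single thread. The point it shows is that when every key belongs to exactly one shard, and each shard is only ever touched by its own core, the shard’s data needs no lock.

```python
import zlib


class Shard:
    """One shard: a private slice of the data, meant to be owned by one core."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}  # only this shard touches it, so no lock is needed

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class ShardedStore:
    """Routes every key to exactly one shard (hypothetical routing scheme)."""

    def __init__(self, num_cores):
        self.shards = [Shard(i) for i in range(num_cores)]

    def shard_for(self, key):
        # Assumption for illustration: CRC32 of the key, modulo the shard count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def write(self, key, value):
        self.shard_for(key).write(key, value)

    def read(self, key):
        return self.shard_for(key).read(key)


store = ShardedStore(num_cores=4)
for user in ("alice", "bob", "carol"):
    store.write(user, {"name": user})
```

Because `shard_for` is deterministic, a given key always lands on the same shard, which is what lets each core work on its own data without coordinating with the others. A real engine would also need a way to pass requests between cores, which this sketch leaves out.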

So, should you run out right now and replace Cassandra with ScyllaDB? Well, it depends: there are some features of Cassandra, such as secondary indexes, that have not yet been added to ScyllaDB. However, in terms of features, ScyllaDB should be fully equivalent to Cassandra within the next few months, and at that point you can rush to replace Cassandra with ScyllaDB. Moreover, you should be getting a much easier-to-use product, because that is what the ScyllaDB development team is currently focused on. But, to return to the first point, if throughput and latency are what worry you, then by all means rush out right now. Note that you won’t have to change any application code when migrating from Apache Cassandra to ScyllaDB.

The next question is where ScyllaDB goes from here. There are a number of further performance optimisations that the company can implement when it needs to. For example, it could enable column-based storage on disk (despite being called a column family database, Cassandra does not actually store data in columns). However, the company doesn’t think it has to do so right now, so it can focus on new features such as multi-tenancy. Even more interesting, however, are the company’s plans for new developments. To explain these, I need to dive back into the weeds.

Briefly, ScyllaDB is not a monolithic implementation. It is, in fact, an implementation of Cassandra on top of ScyllaDB’s open source Seastar engine. This engine can be used for various purposes, one of which, clearly, was building ScyllaDB. Another has been the development, by Alibaba, of Pedis (Parallel Redis). ScyllaDB is planning more instantiations of Seastar. While nothing has been confirmed yet, and we may be talking years rather than months, possibilities include instantiations for Apache Kafka or the Elastic Stack (including Elasticsearch).

The key question is whether you are an enthusiast or an enterprise. If you are an enthusiast, you should certainly consider moving to ScyllaDB, but by all means go on playing with Cassandra if you really want to. On the other hand, if you are an enterprise, I can’t think of a single good reason why you shouldn’t be looking at ScyllaDB instead.