Redis Labs
Last Updated:
Analyst Coverage: Daniel Howard and Philip Howard
Redis Labs is the ‘open source home’ and official sponsor of Redis, a leading in-memory database platform that uses a key-value storage paradigm. It is also the commercial provider of Redis Enterprise, an enhanced version of Redis that provides additional functionality designed for the enterprise. This is available both on-premises and in the cloud (Amazon, Microsoft and Google). Redis itself is often rated as one of – if not the – most popular NoSQL databases, and Redis Labs has more than 7,900 paying customers.
The company also extends the native characteristics of Redis and Redis Enterprise to many popular industry use cases implemented through modules that the company has developed. By “module” the company means functionality embedded into the product as opposed to something tacked on top.
Redis Labs is a privately held company backed by venture capital. It was founded in 2011 and its corporate headquarters are in Mountain View, California. It has additional offices in London, Tel Aviv and Bangalore.
Redis Enterprise (February 2020)
Last Updated: 28th February 2020
Redis Enterprise is an in-memory, distributed (automated partitioning), NoSQL database with a key-value store as its underpinning. The core open source and commercial capabilities are shown in Figure 1. However, this description does Redis a disservice because the company’s approach to modules – see Figure 2 – is such that, in reality, Redis is better thought of as a multi-model database that can be used to support document processing, graph traversals, stream processing, machine learning, time-series, search and so forth. It is also possible to write your own modules.
From a transactional perspective, Redis Enterprise is ACID compliant within a single cluster that is not geographically distributed. Both synchronous and asynchronous replication are supported and therefore both Active/Active and Active/Passive deployments, in the former case relying on CRDT-based (conflict-free replicated data types) strong eventual consistency.
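To make the idea of CRDT-based strong eventual consistency concrete, the sketch below implements a grow-only counter (G-Counter), one of the simplest conflict-free replicated data types. This is a conceptual model of how replicas converge without coordination, not Redis Enterprise's actual implementation; the class and method names are illustrative.

```python
# Illustrative G-Counter CRDT: each replica increments only its own
# slot, and merging takes the element-wise maximum. Because max is
# commutative, associative and idempotent, replicas converge to the
# same state regardless of the order in which updates arrive -
# the essence of strong eventual consistency.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self):
        self.counts[self.id] += 1

    def merge(self, other):
        # Merging in any order yields the same converged state.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

# Two geo-distributed replicas accept writes independently...
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()

# ...and converge to the same total once they exchange state.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 3
```

Real Active/Active deployments use richer CRDTs (counters, sets, strings and so on), but all rely on the same merge-to-convergence property sketched here.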
Customer Quotes
“We have a very high concurrency: about 40,000 or more at peak times logging into our system to utilise our services. We’ve had very good results with fetching data in pre-populating forms and the user experience has been very good.”
India’s National Informatics Centre
“Redis Labs delivered on its commitments to demonstrate how its application can scale and provide the level of performance needed for high speed, high volume transaction processing. The Redis Labs solutions teams were committed to finding a solution to the problem that we presented and they were dedicated and diligent in following up on providing the solution that we needed.”
Fortune 100 Financial Services
From the perspective of time-series data the key issue is how Redis modules work. Note that these are embedded into the database engine and not just layered on top. The relevant options are illustrated in Figure 2 and, of these, it is the time-series module, along with RedisGears and possibly RedisAI, that are of interest here, though Redis Streams may be useful in supporting high ingestion rates.
RedisTimeSeries provides built-in aggregation functions such as calculating minima, maxima, sums, averages and so on. More significantly, you can label data – based on timestamps – on either an individual or global basis and then you can use those labels to support analytics. This will be particularly useful in Internet of Things (IoT) environments where you are doing initial analytics at the edge. Additional capabilities include the ability to compress data across different time series functions, specific time-series indexing, support for time buckets, programmable retention policies, and downsampling (reducing the sampling rate so that you can determine the granularity – typically of sensor data – you require). There are also built-in Prometheus and Telegraf interfaces and visualisation support through Grafana.
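The downsampling idea described above can be sketched in a few lines. The function below is a pure-Python illustration of fixed-width time-bucket aggregation of the kind RedisTimeSeries performs server-side; the function name and data layout are our own, not the module's API.

```python
# Minimal sketch of time-bucket downsampling: group raw
# (timestamp_ms, value) samples into fixed-width buckets and reduce
# each bucket with an aggregation function (average by default).
def downsample(samples, bucket_ms, agg=lambda xs: sum(xs) / len(xs)):
    buckets = {}
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets.setdefault(ts - ts % bucket_ms, []).append(value)
    return sorted((ts, agg(vals)) for ts, vals in buckets.items())

# Four raw sensor readings collapse into two one-second averages,
# reducing granularity while preserving the trend.
raw = [(0, 10.0), (250, 20.0), (1000, 30.0), (1750, 50.0)]
print(downsample(raw, 1000))   # [(0, 15.0), (1000, 40.0)]
```

Swapping the aggregation function for `min`, `max` or `sum` mirrors the other built-in aggregations the module offers.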
All Redis modules are designed to interoperate with one another but RedisGears takes this one step further by facilitating inter-module transformations. It provides a serverless environment and allows you to aggregate data across multiple Redis database instances and react to activity based on pre-defined triggers. In other words, you get event-driven data transformations from one model to another, in real-time, in memory.
Then there is RedisAI. This enables the deployment of machine learning models and model serving, with the database supporting relevant languages such as Python, R and Scala plus deep learning support with the ability to embed TensorFlow, PyTorch and TorchScript models into your analytic workflows.
Finally, Redis is well-known for its performance, not least because it has historically run everything in memory. However, as the company moves away from focusing on caching use cases into wider environments, it can no longer assume that all of its clients can afford the amount of memory that may be required. For this reason, warm (as opposed to hot) data may be stored on SSDs and the company has been working closely with Intel on its Optane Persistent Memory technology.
Apart from its performance and scalability – which are obviously major factors – the most outstanding thing about Redis Enterprise is the flexibility that its support for different data structures and modules provides. From a time-series perspective the relevant module has some significant features. However, we would like to see more geo-spatial support: the product is limited to supporting latitude and longitude, which may be enough for some IoT applications, but it doesn’t offer the breadth of capability that some other databases in this space do. Conversely, Redis offers many other functions that competitive time-series offerings do not.
The Bottom Line
It is interesting to observe how Redis has managed to leverage its initial success as a caching technology into something more general-purpose. It is now a major contender across a range of functionality and we expect that to also hold true with respect to time-series.
Redis Enterprise (June 2019)
Last Updated: 27th June 2019
Mutable Award: Platinum 2019
Redis Enterprise was originally an in-memory, distributed (automated partitioning), NoSQL database with a key-value store as its underpinning. The core open source and commercial capabilities are shown in Figure 1. However, this description does Redis a disservice because the company’s approach to modules is such that, in reality, Redis is better thought of as a multi-model database that can be used to support document processing, graph traversals, stream processing (a core feature rather than a module, though, rather surprisingly, there is no support for Kafka), machine learning, time-series, search and so forth. It is also possible to write your own modules.
From a transactional perspective, ACID transactions may be supported on data within a cluster, as long as the cluster is not geographically distributed. Both synchronous and asynchronous replication are supported and therefore both Active/Active and Active/Passive deployments, in the former case relying on a CRDT (conflict-free replicated data types) approach and leveraging strong eventual and causal consistency. The choice of replication methods also allows tuneable durability (disk-based or replication-based) and consistency. Both strong and weak consistency (note: this is not the same consistency as ACID consistency) are supported within a local cluster but if you have a geographically dispersed environment (either Active/Active or using read replicas) then these would be eventually consistent.
Customer Quotes
“Redis Enterprise helped us deliver applications faster and with greater reliability than ever before. It enabled us in scaling by executing more than 2 million transactions – all within a short window of 6-8 hours.”
India’s National Informatics Centre
The architecture of Redis Enterprise is illustrated in Figure 2 where the Cluster Watchdog does what its name suggests; the Node Watchdog supports a secure (multi-tenant) user interface, a call level interface and a REST API; and the proxies handle the complexities of, for example, shared memory.
Perhaps more interesting from the point of view of combining analytic processing with operational and transactional processing is the way that Redis leverages different data structures. This is illustrated in Figure 3, along with Redis’ own suggestions about where these particular different data structures might be useful. Alongside these data structures, Redis provides a variety of inline analytics through its modules, which range from specific functions such as TopK, which tracks the most frequent elements in a data set, to RedisAI, serving deep learning models with built-in integrations into popular AI frameworks, supporting machine learning models and model serving. Further, there are also operational analytics capabilities such as RediSearch, RedisGraph, RedisJSON and RedisTimeSeries, amongst others.
More generally, Redis supports a wide range of programming languages, including Python, R and Scala. RedisAI supports TensorFlow and PyTorch and in the future, ONNX.
Finally, Redis is well-known for its performance, not least because it has historically run everything in memory. However, as the company moves away from focusing on caching use cases into wider environments, it can no longer assume that all of its clients can afford the amount of memory that may be required. For this reason, warm (as opposed to hot) data may be stored on SSDs and the company has worked closely with Intel on its Optane DC Persistent Memory technology, with general availability being announced in April 2019.
Apart from its performance and scalability – which are obviously major factors – the most outstanding thing about Redis Enterprise is the flexibility that its support for different data structures and modules provides. Moreover, a significant number of these features explicitly support analytic as well as transactional capabilities. For example, prior to the introduction of modules, Redis did not have any of the core data structures to enable fast and efficient manipulation of JSON documents or the ability to use Redis as a Graph database. Many competitive solutions fall short as they impose restrictions on the data to fit their native data model. Modules like RedisJSON are designed to be intuitive regardless of whether someone is familiar with Redis or JSON. We should also say that we are especially impressed with the architecture behind RedisGraph, which uses sparse adjacency matrices. More broadly, we can say that Redis supports a broader range of pre-built analytic capabilities than most, if not all, other NoSQL databases.
The Bottom Line
It is interesting to observe how Redis has managed to leverage its initial success as a caching technology into something more general-purpose. It is now a major contender in the provision of hybrid analytic and operational/transactional processing.
RedisGraph (2019)
Last Updated: 14th December 2018
Mutable Award: One to Watch 2018
RedisGraph is the graph database module for Redis, where by “module” the company means functionality embedded into the product as opposed to something tacked on top. Version 1.0, available via a Docker container, was released as a preview in July 2018 and was made generally available in November 2018, along with Enterprise (multi-node) support. We are therefore commenting on a product in its very early stages and, as one might expect, it currently lacks some of the advanced features of more mature products (for example, strong consistency is not yet available). On the other hand, it has a major point of distinction from other graph products. This is that, apart from operating as part of the Redis platform, it represents and stores graphs as sparse adjacency matrices instead of adjacency lists. This enables much faster ingestion and query performance than would otherwise be the case.
The product itself is a property graph. Queries are written in (a subset of) the Cypher query language and, further, Redis Labs is participating in the GQL project to create a standardised graph querying language. Consequently, we fully expect RedisGraph to support GQL when it arrives.
Generally speaking, there are two ways to store and represent graphs. The first of these is the adjacency list, which consists of a list of all the nodes in the graph it represents, each paired with the set of nodes with which it shares an edge (or, in other words, has a relationship with). This is effectively the industry standard. The second is the adjacency matrix, which represents its graph as a single, square matrix with one row and column for each node within the graph. Within this matrix, nonzero values indicate the presence of an edge between the nodes represented by the corresponding row and column. An example of an adjacency matrix, alongside the graph it represents, is shown in Figure 1.
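The contrast between the two representations can be shown with a tiny example. The sketch below encodes the same three-node directed graph (edges 0→1, 0→2, 1→2) both ways; the graph itself is our own illustrative choice, not one from the report's figures.

```python
# The same directed graph in both representations discussed above.

# Adjacency list: each node maps to the set of nodes it shares an
# edge with (the "industry standard" representation).
adjacency_list = {0: {1, 2}, 1: {2}, 2: set()}

# Adjacency matrix: one row and column per node; a nonzero value at
# row i, column j indicates an edge from node i to node j.
adjacency_matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# Edge lookup is a set probe in one case and a direct index in the
# other - the matrix answers "is there an edge 0 -> 2?" in one step.
assert 2 in adjacency_list[0]
assert adjacency_matrix[0][2] == 1
```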
This method has several advantages in terms of performance. For example, determining whether two nodes share an edge is significantly faster when using matrix representation. Queries can be performed using direct mathematical operations such as matrix multiplication, which is often much faster than the traditional approach using adjacency lists. Performing a self-join, for instance, is simply a matter of multiplying a matrix by itself. Similarly, ingestion rates can be improved using adjacency matrices.
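The self-join example above can be demonstrated directly. Squaring an adjacency matrix yields, in each cell, the number of two-hop paths between the corresponding pair of nodes; the naive pure-Python multiplication below is only a stand-in for the optimised sparse kernels a real engine would use.

```python
# Plain dense matrix multiplication, used here only to illustrate
# the principle; production systems use optimised sparse routines.
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Adjacency matrix for the directed graph 0->1, 0->2, 1->2.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]

# A x A is the "self-join": cell (i, j) counts two-hop paths i -> j.
A2 = matmul(A, A)
assert A2[0][2] == 1   # exactly one two-hop path: 0 -> 1 -> 2
```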
Historically, matrix representation has had two disadvantages. Firstly, it scales extremely poorly in terms of storage. An adjacency matrix for even a moderately large graph can take up far more space in memory than is commercially viable. However, the vast majority of matrices representing real world graphs are ‘sparse’ – meaning that almost every value is zero – and RedisGraph stores adjacency matrices in Compressed Sparse Column (or CSC) format, meaning that they are, effectively, only storing nonzero values. This almost always results in a very large saving in terms of memory. Notably, storing matrices in CSC format does not impact RedisGraph’s ability to use them in mathematical operations.
The second hurdle is that, until recently, there was no easy or standardised way to implement queries based on linear algebra (which refers to mathematical operations on matrices, such as matrix multiplication). This has been solved by the release of the GraphBLAS engine (see http://graphblas.org), which underpins RedisGraph. GraphBLAS is an open effort that provides standardised building blocks for graph algorithms based on linear algebra and using algebraic constructs known as semirings (ring-like structures that, unlike rings, do not require additive inverses). ‘BLAS’ stands for Basic Linear Algebra Subprograms. The combination of matrix representation and linear algebra optimises and simplifies many different graph queries and algorithms. For example, a comparison between implementations of Breadth-First-Search using linear algebra and the standard approach using adjacency lists is shown in Figure 2. It should be clear that the algebraic approach is easier to write and to understand. It is also computationally simpler.
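The algebraic style of Breadth-First-Search referred to above can be sketched compactly: each BFS level is one matrix-vector product over the Boolean semiring, where OR plays the role of addition and AND the role of multiplication. This is a pure-Python illustration of the idea GraphBLAS standardises, not GraphBLAS code.

```python
# BFS as repeated matrix-vector products over the Boolean semiring:
# next_frontier[j] = OR over i of (frontier[i] AND adj[i][j]),
# masked by the set of already-visited nodes.
def bfs_levels(adj, source):
    n = len(adj)
    frontier = [i == source for i in range(n)]
    visited = list(frontier)
    levels = {source: 0}
    level = 0
    while any(frontier):
        level += 1
        # One semiring "matvec" advances the frontier one hop.
        nxt = [any(frontier[i] and adj[i][j] for i in range(n))
               and not visited[j] for j in range(n)]
        for j, hit in enumerate(nxt):
            if hit:
                visited[j] = True
                levels[j] = level
        frontier = nxt
    return levels

# A simple chain 0 -> 1 -> 2 -> 3: each node sits one level deeper.
adj = [[0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
assert bfs_levels(adj, 0) == {0: 0, 1: 1, 2: 2, 3: 3}
```

The whole traversal reduces to a loop around a single algebraic operation, which is why the approach parallelises so naturally.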
There’s one more advantage of the matrix representation that is worth noting. Matrix operations (such as multiplication) are highly and easily parallelisable, and this property carries over to queries and graph algorithms based on linear algebra. This means that RedisGraph benefits very significantly from parallelisation. It will, in the future, be able to take full advantage of the massive parallelisation offered by GPU based processing.
RedisGraph is an extremely performant graph database. Thanks to the matrix representation it uses and the linear algebra algorithmics it implements, it is able to create over one million nodes in under half a second, and form three million relationships in the same timeframe. What’s more, early benchmarks performed by Redis Labs suggest that its graph database is order(s) of magnitude faster than its competitors.
Moreover, RedisGraph’s CSC storage format mitigates the problems with storage and memory usage that the matrix representation of graphs has had in the past, providing a sixty to seventy percent reduction in memory usage. This makes adjacency matrices a practical proposition, allowing RedisGraph to benefit from all of their advantages with few – if any – of the accompanying downsides.
The Bottom Line
Although RedisGraph has only just become generally available, it has already seen use by dozens of non-production users running real-time graph use cases. While it is early days in terms of features, the theoretical advantages of using adjacency matrices are considerable. Even if you are not already using Redis, it is certainly worth looking into.
RedisGraph (2020)
Last Updated: 11th September 2020
Mutable Award: Highly Commended 2020
RedisGraph is the graph database module for Redis, where by “module” the company means functionality embedded into the product as opposed to something tacked on top. It is available via a Docker container, downloadable software, and as an optional part of Redis Enterprise. As a relatively young product, it currently lacks some of the advanced features of more mature products (for example, strong consistency is not yet available). That said, it offers a major point of distinction from other graph products: it represents and stores graphs as sparse adjacency matrices instead of adjacency lists. This enables much faster ingestion and query performance.
The product itself is a property graph. Queries are written in (a subset of) the Cypher query language and, further, Redis Labs is participating in the GQL project to create a standardised graph querying language.
Customer Quotes
“We tried several graph database technologies and we really found that RedisGraph is the one that gave us the speed to solve instant real-time problems, yielding a minimum 5x improvement in query speed.”
IBM
Generally speaking, there are two ways to store and represent graphs. The first of these is the adjacency list, which consists of a list of all the nodes in the graph it represents, each paired with the set of nodes with which it shares an edge. This is effectively the industry standard. The second is the adjacency matrix, which represents its graph as a matrix with one row and column for each node within the graph. Within this matrix, nonzero values indicate the presence of an edge between the nodes represented by the corresponding row and column. An example of an adjacency matrix, alongside the graph it represents, is shown in Figure 1.
This method has several advantages in terms of performance. For example, determining whether two nodes share an edge is significantly faster when using matrix representation. Queries can be performed using direct mathematical operations such as matrix multiplication, which is often much faster than the traditional approach using adjacency lists. Performing a self-join, for instance, is simply a matter of multiplying a matrix by itself. Similarly, ingestion rates can be improved using adjacency matrices.
The vast majority of matrices representing real world graphs are ‘sparse’ – meaning that almost every value is zero – and RedisGraph stores adjacency matrices in Compressed Sparse Row (or CSR) format, meaning that they are, effectively, only storing nonzero values. This almost always results in a very large saving in terms of memory. Notably, storing matrices in CSR format does not impact RedisGraph’s ability to use them in mathematical operations.
Going further, RedisGraph implements the GraphBLAS engine. GraphBLAS is an open effort whereby ‘BLAS’ stands for Basic Linear Algebra Subprograms. It provides the ability to use linear algebra running against sparse (compressed) matrices, and this combination optimises and simplifies many different graph queries and algorithms. For example, a comparison between implementations of Breadth-First-Search using linear algebra and the standard approach using adjacency lists is shown in Figure 2. It should be clear that the algebraic approach is easier to write and to understand. It is also computationally simpler.
There is one more advantage of the matrix representation that is worth noting. Matrix operations (such as multiplication) are highly and easily parallelisable, and this property carries over to queries and graph algorithms based on linear algebra. This means that RedisGraph benefits very significantly from parallelisation. It will, in the future, be able to take full advantage of the massive parallelisation offered by GPU based processing.
The 2.0 and 2.2 releases of RedisGraph offer a number of new features over previous versions. This includes various performance improvements, enhanced support for Cypher features, full-text search via RediSearch, and graph visualisation leveraging either RedisInsight (see Figure 3), Linkurious or Graphileon (Redis is partnered with the latter two). RedisGraph has also adopted the SuiteSparse implementation of GraphBLAS, which has positive implications for performance, as well as LAGraph, an open source collection of GraphBLAS algorithms developed primarily for academia. A growing number of community created drivers and connectors are also available.
Firstly, we should mention the recent release of RedisAI and RedisGears, which are other Redis modules in the same way that RedisGraph is. As may be imagined, RedisAI is designed to serve ML/DL models that were trained over standard platforms like TensorFlow and PyTorch, while RedisGears is a fully programmable engine that enables orchestration and data-flow across modules, data structures and cluster shards. In the context of this paper, this means that RedisGraph should be able to interoperate with RedisAI and, for that matter, with Redis Streams.
Secondly, RedisGraph’s CSR storage format mitigates the problems with storage and memory usage that the matrix representation of graphs has had in the past, providing a sixty to seventy percent reduction in memory usage.
The Bottom Line
Although RedisGraph has matured significantly since its release in 2018, it is still very much a product in its infancy. That said, while it is still early days in terms of features, the theoretical advantages of using adjacency matrices are considerable. Even if you are not already using Redis, it is certainly worth looking into.
Commentary
Coming soon.