Figure 1 – A graph and corresponding adjacency matrix

Generally speaking, there are two ways to store and represent graphs. The first of these is the adjacency list, which consists of a list of all the nodes in the graph it represents, each paired with the set of nodes with which it shares an edge (or, in other words, has a relationship with). This is effectively the industry standard. The second is the adjacency matrix, which represents its graph as a single, square matrix with one row and column for each node within the graph. Within this matrix, nonzero values indicate the presence of an edge between the nodes represented by the corresponding row and column. An example of an adjacency matrix, alongside the graph it represents, is shown in **Figure 1**.

This method has several advantages in terms of performance. For example, determining whether two nodes share an edge is significantly faster when using matrix representation. Queries can be performed using direct mathematical operations such as matrix multiplication, which is often much faster than the traditional approach using adjacency lists. Performing a self-join, for instance, is simply a matter of multiplying a matrix by itself. Similarly, ingestion rates can be improved using adjacency matrices.

Historically, matrix representation has had two disadvantages. Firstly, it scales extremely poorly in terms of storage. An adjacency matrix for even a moderately large graph can take up far more space in memory than is commercially viable. However, the vast majority of matrices representing real world graphs are ‘sparse’ – meaning that almost every value is zero – and RedisGraph stores adjacency matrices in Compressed Sparse Column (or CSC) format, meaning that they are, effectively, only storing nonzero values. This almost always results in a very large saving in terms of memory. Notably, storing matrices in CSC format does not impact RedisGraph’s ability to use them in mathematical operations.

Figure 2 – Breadth-First-Search via adjacency lists and linear algebra

The second hurdle is that, until recently, there was no easy or standardised way to implement queries based on linear algebra (which refers to mathematical operations on matrices, such as matrix multiplication). This has been solved by
the release of the GraphBLAS engine (see http://graphblas.org), which underpins RedisGraph. GraphBLAS is an open effort that provides standardised building blocks for graph algorithms based on linear algebra and using algebraic constructs known as semirings (rings with an additive inverse). ‘BLAS’ stands for Basic Linear Algebra Subprograms. The combination of matrix representation and linear algebra optimises and simplifies many different graph queries and algorithms. For example, a comparison between implementations of Breadth-First-Search using linear algebra and the standard approach using adjacency lists is shown in **Figure 2**. It should be clear that the algebraic approach is easier to write and to understand. It is also computationally simpler.

There’s one more advantage of the matrix representation that is worth noting. Matrix operations (such as multiplication) are extremely and easily parallelisable, and this property carries over to queries and graph algorithms based on linear algebra. This means that RedisGraph benefits very significantly from parallelisation. It will, in the future, be able to take full advantage of the massive parallelisation offered by GPU based processing.