Memgraph is a graph database start-up that was founded in 2016 and has offices in London and Croatia. While its product is suitable for many environments, the company is currently focusing on financial services, especially real-time use cases such as fraud detection, anti-money laundering and customer 360.
Last Updated: 23rd January 2019
Memgraph is an in-memory, ACID-compliant graph database written in C++. In another context it could easily be described as an HTAP (hybrid transactional/analytical processing) database, since it supports both transactional and analytic processing against the same set of data. It uses a property graph model and emphasises high performance, scalability and, most notably, real-time processing.
Memgraph’s approach to technology is that it will reuse or integrate with existing standards and market-leading products wherever it can, as illustrated in Figure 1. Thus, Memgraph uses openCypher to query the data (for which the company has built its own cost-based optimiser); integrates tightly with Apache Kafka for the ingestion of real-time (streaming) data; leverages LDAP, Active Directory and, soon, Kerberos (Q1 2019) for authentication purposes; supports containerisation via Docker and will be supporting both Kubernetes and OpenShift; plans to support Amazon S3; and supports the use of machine learning models built using TensorFlow or in Python. The company has also developed various algorithms that are shipped with the database, such as breadth-first search and weighted shortest path.
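For readers unfamiliar with the two algorithms named above, the following is a minimal Python sketch of breadth-first search and weighted shortest path (here using Dijkstra's algorithm) over a small in-memory graph. It is illustrative only: the graph, the function names and the choice of Dijkstra are our own assumptions, and it says nothing about how Memgraph implements these algorithms internally.

```python
from collections import deque
import heapq

# Hypothetical toy graph: node -> {neighbour: edge weight}
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}


def bfs_order(graph, start):
    """Return nodes in the order breadth-first search visits them."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def weighted_shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_weight, path),
    or (inf, []) if the target cannot be reached."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node in settled:
            continue  # a cheaper route to this node was already found
        settled.add(node)
        if node == target:
            return cost, path
        for neighbour, weight in graph[node].items():
            if neighbour not in settled:
                heapq.heappush(heap, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


print(bfs_order(GRAPH, "A"))
print(weighted_shortest_path(GRAPH, "A", "D"))
```

Note that the cheapest route from A to D goes through B and C rather than taking the direct but heavier A to C edge, which is exactly the distinction between hop count (breadth-first search) and weighted cost.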
Memgraph is available on-premises or in-cloud, and it is offered in either a single-node or a distributed version. The former is available via free download and the company has plans to open source it. The distributed version is available in both a Developer Edition and an Enterprise Edition, where the latter includes features such as security (schema-based access control rules, encryption and audit logs), high availability and dynamic partitioning (described below).
As alluded to above, one of the key differentiators for Memgraph is its high performance. There are a number of ways it achieves this. For starters, it is written in C++. Consequently, the product enjoys an extremely small footprint: on start-up, it only consumes approximately 10MB of RAM, which means that Memgraph can easily run on edge devices, whether in IoT (Internet of Things) or mobile environments.
Memgraph’s standout feature is something called ‘dynamic graph partitioning’. Essentially, this works by continually and automatically managing the physical location of your data in order to optimise performance. This process is carried out based on a variety of factors, including the structure of the graph and, in the long term, user behaviour. It involves minimising the number of relationships that span multiple machines, thus minimising the number of traversals that need to go across the network. This can then be further refined based on the paths and queries that are most in use (for example, if a relationship is accessed frequently, the entities it connects would most likely be moved to the same machine). Moreover, when the physical system is changed – for instance, a new processor is added – Memgraph will detect this and move data accordingly. In addition, newly ingested data is automatically sampled and directed to the optimal machine. Together, this makes scaling up simple: you just add machines to your system, and data will gradually be moved over to your new machines as it becomes optimal to do so. Not only does this feature help to solidify the product’s high performance, it also provides its other key differentiator: extensive and automatic scalability.
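The underlying idea of minimising the relationships that span machines can be illustrated with a short Python sketch. This is not Memgraph's algorithm, which the company has not detailed here; it is a simple greedy heuristic of our own, with hypothetical names (`edge_cut`, `rebalance_once`) and an assumed per-machine capacity limit so that all data does not simply collapse onto one machine.

```python
def edge_cut(edges, placement):
    """Count relationships whose two endpoints live on different machines."""
    return sum(1 for a, b in edges if placement[a] != placement[b])


def best_machine(node, edges, placement):
    """Return the machine holding most of the node's neighbours.
    Ties are broken in favour of the node's current machine."""
    counts = {}
    for a, b in edges:
        if a == node:
            other = b
        elif b == node:
            other = a
        else:
            continue
        machine = placement[other]
        counts[machine] = counts.get(machine, 0) + 1
    current = placement[node]
    return max(counts, key=lambda m: (counts[m], m == current), default=current)


def rebalance_once(edges, placement, capacity):
    """One greedy pass: move a node next to most of its neighbours if the
    target machine has room and the move reduces the edge cut."""
    placement = dict(placement)  # do not mutate the caller's mapping
    for node in sorted(placement):
        target = best_machine(node, edges, placement)
        if target == placement[node]:
            continue
        load = sum(1 for m in placement.values() if m == target)
        if load >= capacity:
            continue  # assumed limit on nodes per machine
        trial = {**placement, node: target}
        if edge_cut(edges, trial) < edge_cut(edges, placement):
            placement = trial
    return placement


# Hypothetical example: four entities spread over machines 0 and 1.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
START = {"a": 0, "b": 0, "c": 1, "d": 1}

result = rebalance_once(EDGES, START, capacity=3)
print(edge_cut(EDGES, START), "->", edge_cut(EDGES, result))
```

In this example, entity c is moved alongside a and b because two of its three relationships point there, which halves the number of cross-machine relationships. A production system would also have to weigh access frequency, data volume and the cost of the move itself, as the description above implies.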
Another way in which the product enables high performance is concurrency. Memgraph’s data structures are lock-free, and it has implemented MVCC (Multi-Version Concurrency Control) with snapshot isolation to ensure that, for example, reads never block writes and writes never block reads. Not only does this contribute to performance, but the snapshotting used within MVCC combines with write-ahead logging capabilities to prevent data loss during system failure, hence providing a guarantee of durability. Together with the company’s extensive investment in testing and test-driven development, this makes for an eminently robust solution.
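To show why multi-versioning lets readers and writers avoid blocking one another, here is a deliberately simplified Python sketch of a versioned key-value store. The class and method names are hypothetical and the code is single-threaded, with no write conflict detection, garbage collection or logging; it illustrates the general MVCC idea rather than Memgraph's implementation.

```python
class VersionedStore:
    """Toy multi-version store: each key keeps every committed version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the most recent commit

    def begin(self):
        """Start a reader: it will see only commits up to this timestamp."""
        return self._clock

    def write(self, key, value):
        """Commit a new version. Older versions are kept, so readers that
        started earlier are not disturbed."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None  # key did not exist when the snapshot was taken


store = VersionedStore()
store.write("balance", 100)

snapshot = store.begin()     # a long-running reader starts here
store.write("balance", 50)   # a writer commits without waiting for it

print(store.read("balance", snapshot))       # the reader's consistent view
print(store.read("balance", store.begin()))  # a new reader sees the update
```

The reader that began before the second write continues to see the original value, while any reader that starts afterwards sees the new one. Neither side had to wait for the other.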
Finally, we should mention Memgraph Lab, illustrated in Figure 2. This is a lightweight visual user interface for developers, designed to help with openCypher query and graph development. It provides visualisation (of both graphs and schema), exploration capabilities, the ability to tune queries through query profiling (with diagnostics and query plan details) and import capabilities via Memgraph’s Graph Streams.
In general, the reason graph databases are worth caring about is that they perform well – in some cases, exceptionally well – compared to more traditional databases for processing data that involves multiple, complex relationships. Ultimately, this is a matter of performance. It follows that, when choosing to use a graph database, you should be striving to optimise its performance. This is where Memgraph’s design decisions are relevant. It has been developed using C++, which will always outperform (with a smaller footprint) products written in other high-level languages such as Java. It has been designed from the outset to support real-time updates: what is the value of a recommendation engine (a common graph use case) if the resulting recommendations are not up-to-date? And it has been designed as a “Smart Graph” with features such as dynamic graph partitioning, which will adjust the behaviour of the database according to usage. Quite simply, Memgraph has been built from the ground up to be what the company itself describes as a ‘superfast’ graph database.
The Bottom Line
Graph databases significantly outperform traditional technologies when it comes to processing and querying complex, multi-way relationships. Thus, performance is a major characteristic of graph databases. Memgraph combines high performance with real-time capabilities, and this is unusual. We especially like the dynamic graph partitioning which, as far as we know, is unique.