Memgraph
Last Updated:
Analyst Coverage: Philip Howard and Daniel Howard
Memgraph was founded in 2016 and has offices in London and Croatia. While its product is suitable for many environments, the company targets complex graph analytics, often where multiple graph algorithms need to be used in conjunction. Examples include the industrial sector, where complex production networks need real-time performance analysis and optimisation, and the energy sector, where power grids need to be managed in a similar fashion. The company is also addressing more common problems such as customer intelligence (recommendations) and fraud within financial services. More generally, the company is focusing on making operational graph analytics easier to use, especially for data scientists.
The product, also called Memgraph, is available in Community and Enterprise Editions (available both on-premises and in the cloud) and there is also a cloud-based managed service offering available on AWS. The company partners with Cambridge Intelligence, Graphileon and FactGem, amongst others.
Memgraph (2019)
Last Updated: 23rd January 2019
Memgraph is an in-memory, ACID-compliant, graph database written in C++. In another context it could easily be described as an HTAP database, since it supports both transactional and analytic processing against the same set of data. It uses a property graph model and emphasises high performance, scalability and, most notably, real-time processing.
Memgraph’s approach to technology is that it will reuse or integrate with existing standards and market-leading products wherever it can, as illustrated in Figure 1. Thus, Memgraph uses openCypher to query the data (for which the company has built its own cost-based optimiser); it integrates tightly with Apache Kafka for the ingestion of real-time (streaming) data; leverages LDAP, Active Directory and soon Kerberos (Q1 2019) for authentication purposes; supports containerisation via Docker and will be supporting both Kubernetes and OpenShift; plans to support Amazon S3; and supports the use of machine learning models built using TensorFlow or in Python. The company has also developed various algorithms that are shipped with the database, such as breadth-first search and weighted shortest path.
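For readers unfamiliar with the two algorithms named above, the following is a minimal, textbook-style sketch in plain Python. It is not Memgraph's implementation and does not use any Memgraph API: the adjacency-dictionary layout and the names `bfs_order`, `weighted_shortest_path` and `GRAPH` are assumptions made purely for illustration, and the weighted search uses Dijkstra's algorithm, which assumes non-negative weights.

```python
import heapq
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from `start` in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, {}):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def weighted_shortest_path(graph, source, target):
    """Dijkstra's algorithm: return (total_weight, path), or (inf, []) if unreachable."""
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [target]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return float("inf"), []


# Hypothetical toy graph: node -> {neighbour: relationship weight}
GRAPH = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
```

On this toy graph the cheapest route from A to D goes via C and B rather than taking either of the two-hop routes, which is the kind of result a weighted search returns and a hop-count search does not.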
Memgraph is available on-premises or in-cloud, and it is offered in either a single node or distributed version. The former is available via free download and the company has plans to open source this. The distributed version is available in both a Developer Edition and an Enterprise Edition, where the latter includes features such as security (schema-based access control rules, encryption and audit logs), high availability and dynamic partitioning (see next).
As alluded to above, one of the key differentiators for Memgraph is its high performance. There are a number of ways it achieves this. For starters, it is written in C++. Consequently, the product enjoys an extremely small footprint: on start-up, it only consumes approximately 10MB of RAM, which means that Memgraph can easily run on edge devices, whether in IoT (Internet of Things) or mobile environments.
Memgraph’s standout feature is something called ‘dynamic graph partitioning’. Essentially, this works by continually and automatically managing the physical location of your data to optimise performance. This process is carried out based on a variety of factors, including the structure of the graph and, in the long term, user behaviour. It involves minimising the number of relationships that span multiple machines, thus minimising the number of traversals that need to go across the network. This can then be further refined based on the paths and queries that are most in use (for example, if a relationship is accessed frequently, the entities it connects would most likely be moved to the same machine). Moreover, when the physical system is changed – for instance, a new processor is added – Memgraph will detect this and move data accordingly. In addition, newly ingested data is automatically sampled and directed to the optimal machine. Together, this makes scaling up simple: you just add machines to your system, and data will gradually be moved over to your new machines as it becomes optimal to do so. Not only does this feature help to solidify the product’s high performance, it also provides its other key differentiator: extensive and automatic scalability.
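The core idea, reducing the number of relationships that cross machine boundaries, can be illustrated with a small sketch. The greedy heuristic below is an assumption chosen for brevity, not a description of how Memgraph actually partitions data; the function names, the `capacity` limit and the toy graph are all hypothetical.

```python
from collections import Counter


def edge_cut(edges, placement):
    """Count relationships whose two endpoints live on different machines."""
    return sum(1 for a, b in edges if placement[a] != placement[b])


def rebalance_once(edges, placement, capacity):
    """One greedy pass: move a node to the machine holding most of its
    neighbours, provided that machine still has room. Returns a new placement."""
    placement = dict(placement)
    load = Counter(placement.values())
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    for node in sorted(neighbours):
        votes = Counter(placement[n] for n in neighbours[node])
        target, target_votes = votes.most_common(1)[0]
        current = placement[node]
        if (target != current
                and target_votes > votes[current]
                and load[target] < capacity):
            placement[node] = target
            load[current] -= 1
            load[target] += 1
    return placement


# Two triangles (a-b-c and d-e-f) joined by one bridge, initially badly placed.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
INITIAL = {"a": 0, "b": 0, "c": 1, "d": 0, "e": 1, "f": 1}

IMPROVED = rebalance_once(EDGES, INITIAL, capacity=4)
```

In this example the initial placement leaves five relationships spanning the two machines; after one pass each triangle sits on its own machine and only the bridge between them crosses the network.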
Another way in which the product enables high performance is concurrency. Memgraph data structures are lock-free. For concurrency, Memgraph has implemented MVCC (Multi-Version Concurrency Control) with snapshot isolation to ensure that, for example, reads never block writes and writes never block reads. Not only does this contribute to performance, but the snapshotting used within MVCC combines with write-ahead logging capabilities to prevent data loss from occurring during system failure, hence providing a guarantee of durability. Together with the company’s extensive investment in testing and test-driven development, this makes for an eminently robust solution.
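To make the ‘reads never block writes’ property concrete, here is a toy multi-version store with snapshot isolation. It is a sketch under simplifying assumptions (single-threaded, in-memory, no write-ahead log, first-committer-wins conflict detection) and says nothing about Memgraph's internal data structures; the class and method names are hypothetical.

```python
import itertools


class VersionedStore:
    """Keeps every committed version of a key, tagged with its commit timestamp."""

    def __init__(self):
        self._versions = {}             # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)
        self._last_commit = 0

    def begin(self):
        # A transaction sees the database as of the latest commit at start time.
        return Transaction(self, self._last_commit)

    def _read(self, key, snapshot_ts):
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def _commit(self, txn):
        # First committer wins: abort if a key was rewritten after our snapshot.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.snapshot_ts:
                raise RuntimeError(f"write conflict on {key!r}")
        commit_ts = next(self._clock)
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        self._last_commit = commit_ts


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}                # private until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store._commit(self)


store = VersionedStore()
setup = store.begin()
setup.set("balance", 100)
setup.commit()

reader = store.begin()                  # snapshot taken here
writer = store.begin()
writer.set("balance", 50)
writer.commit()                         # does not wait for, or disturb, the reader

seen_by_reader = reader.get("balance")
seen_by_new_txn = store.begin().get("balance")
```

The reader continues to see the value from its own snapshot while a transaction started after the writer's commit sees the new value; neither ever waits on a lock.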
Finally, we should mention Memgraph Lab, illustrated in Figure 2. This is a lightweight visual user interface for developers, designed to help openCypher query and graph development. It provides visualisation (of both graphs and schema), exploration capabilities, the ability to tune queries through query profiling (with diagnostics and query plan details) and import capabilities via Memgraph’s Graph Streams.
In general, the reason graph databases are worth caring about is that they perform well – in some cases, exceptionally well – compared to more traditional databases for processing data that involves multiple, complex relationships. Ultimately, this is a matter of performance. It follows that, when choosing to use a graph database, you should be striving to optimise its performance. This is where Memgraph’s design decisions are relevant. It has been developed using C++, which will always outperform (with a smaller footprint) products written in other high-level languages such as Java. It has been designed from the outset to support real-time updates: what is the value of a recommendation engine (a common graph use case) if the resulting recommendations are not up-to-date? And it has been designed as a “Smart Graph” with features such as dynamic graph partitioning, which will adjust the behaviour of the database according to usage. Quite simply, Memgraph has been built from the ground up to be what the company itself describes as a ‘superfast’ graph database.
The Bottom Line
Graph databases significantly outperform traditional technologies when it comes to processing and querying complex, multi-way relationships. Thus, performance is a major characteristic of graph databases. Memgraph combines high performance with real-time capabilities, and this is unusual. We especially like the dynamic graph partitioning which, as far as we know, is unique.
Memgraph (2020)
Last Updated: 11th September 2020
Memgraph is an in-memory, ACID-compliant, graph database written in C++. In another context it could easily be described as an HTAP database, since it supports both transactional and analytic processing. It uses a property graph model and emphasises high performance, scalability and, most notably, real-time processing.
Memgraph’s approach to technology is that it will reuse or integrate with existing standards and market-leading products wherever it can, as illustrated in Figure 1. Thus, Memgraph uses openCypher to query the data (for which the company has built its own cost-based optimiser); it integrates tightly with Apache Kafka; leverages LDAP, Active Directory and Kerberos for authentication purposes; supports containerisation via Docker; and supports both Kubernetes and OpenShift. It also supports machine learning models built using TensorFlow or PyTorch or written in R, Python or Julia. The company has also developed various graph algorithms that are shipped with the database, such as breadth-first search and weighted shortest path.
Customer Quotes
“For analysis of our production networks we apply complex graph analytics. Until we found Memgraph, no other service met our needs in terms of flexibility, performance and custom analytics. Now we are able to integrate complex graph analytics into our internal applications, and deploy them with ease at global scale, and ultimately generate value.”
Fortune 500 Chemical Company
One of the key differentiators for Memgraph is its high performance. There are a number of ways it achieves this. For starters, it is written in C/C++. Consequently, the product enjoys an extremely small footprint: on start-up, it only consumes approximately 30MB of RAM, which means that Memgraph can easily run on edge devices, whether in IoT (Internet of Things) or mobile environments. The fact that Memgraph is an in-memory database is also significant since it will often mean that the entire graph can be held in memory. Not only will this aid performance in general but it will be particularly useful when the database needs to support mixed workloads.
Memgraph’s focus is on algorithm scalability and extensibility. In other words, you can extend and implement high-performance, user-customised algorithms and procedures. This is enabled through integration with the data science and machine learning ecosystem. Specifically, Memgraph allows you to extend its query language and implement your own custom procedures. These procedures are grouped into ‘Query Modules’, which can be loaded on start-up. The most performant and scalable way to implement these procedures is by using the Memgraph C Query Module API. However, in an effort to make quick development and iteration possible for data scientists, Memgraph also exposes a Python Query Module API, with a Python interpreter embedded inside the database. This makes it easy for data scientists to leverage libraries like scikit-learn, TensorFlow and PyTorch, and to run analytics directly on data stored inside Memgraph. Finally, Memgraph can be combined with more than 300 graph algorithms from NetworkX and works with machine learning libraries such as www.stellargraph.io.
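The notion of grouping custom procedures into named modules that are registered once and then invoked by name can be sketched in plain Python. Note that this deliberately does not use Memgraph's real Query Module API, which is only available inside the database; the `procedure` decorator, the `PROCEDURES` registry, the `call` helper and the adjacency-list graph are all hypothetical stand-ins for illustration.

```python
# Hypothetical registry: "module.procedure" name -> Python callable
PROCEDURES = {}


def procedure(module_name):
    """Decorator that registers a function under `module_name.function_name`."""
    def register(func):
        PROCEDURES[f"{module_name}.{func.__name__}"] = func
        return func
    return register


@procedure("stats")
def degree(graph):
    """Yield one record per node with its number of outgoing relationships."""
    for node, neighbours in graph.items():
        yield {"node": node, "degree": len(neighbours)}


@procedure("stats")
def isolated(graph):
    """Yield nodes that have neither outgoing nor incoming relationships."""
    targets = {n for neighbours in graph.values() for n in neighbours}
    for node, neighbours in graph.items():
        if not neighbours and node not in targets:
            yield {"node": node}


def call(name, graph):
    """Look a procedure up by its qualified name and collect its records."""
    return list(PROCEDURES[name](graph))


# Toy graph as an adjacency list: node -> outgoing neighbours
TOY_GRAPH = {"a": ["b", "c"], "b": ["c"], "c": [], "d": []}
```

Calling `call("stats.degree", TOY_GRAPH)` returns one record per node, in the same spirit as a query invoking a procedure from a loaded module and receiving rows back.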
Another way in which the product enables high performance is concurrency. Memgraph data structures are lock-free. For concurrency, Memgraph has implemented MVCC (Multi-Version Concurrency Control) with snapshot isolation to ensure that, for example, reads never block writes and writes never block reads. Not only does this contribute to performance, but the snapshotting used within MVCC combines with write-ahead logging to prevent data loss from occurring during system failure, hence providing a guarantee of durability. Together with the company’s extensive investment in testing and test-driven development, this makes for an eminently robust solution.
We should also mention Memgraph Lab, illustrated in Figure 2. This is a lightweight visual user interface for developers, designed to help openCypher query and graph development. It provides visualisation (of both graphs and schema), exploration capabilities, and the ability to tune queries through query profiling (with diagnostics and query plan details).
Finally, we must comment on the fact that high availability replication is not yet available in the product. The company has announced that this will be available later during 2020 for both its Enterprise Edition and its managed service.
When choosing to use any (graph) database, you should be striving to optimise its performance. This is where Memgraph’s design decisions are relevant. It has been developed using C/C++, which will always outperform (with a smaller footprint) products written in other high-level languages such as Java. It has been designed from the outset to support real-time updates, and it has been designed as a “Smart Graph”.
We should also add that Memgraph is targeting the most complex graph problems, which other vendors typically ignore. Its emphasis on making graph analytics easier for data scientists is also noteworthy.
The Bottom Line
High performance, real-time capabilities, a focus on complex operational graph analytics, support for open standards, and an environment designed to make analytics as easy as possible together sound to us like a winning combination.