Neo4j
Analyst Coverage: Philip Howard and Daniel Howard
Neo4j Inc (previously Neo Technology) was first conceived in 2000 and was formally founded in 2007 in Sweden, although it is now headquartered in the United States. Outside of these two countries the company also has offices in the UK and Germany, with additional sales and service personnel across the EU, Middle East and Asia Pacific regions. The company’s eponymous product comes in Community and Enterprise Editions and can be deployed on-premises or on the Google, Amazon and Microsoft Azure cloud platforms. Managed service options (Neo4j Aura) are also available, in both public and private clouds. The company also offers Neo4j Bloom as a visualisation engine and the Neo4j Graph Data Science Library.
The company has a significant partner base. Notable partners include Confluent (Kafka), Linkurious, Thales, Tom Sawyer, IBM, EY, GraphAware and NEORIS, amongst many others.
Neo4j (2020)
Last Updated: 11th September 2020
Mutable Award: Platinum 2020
Neo4j is a property graph database with a native engine that is targeted at operational, hybrid operational/analytic (HTAP) and pure analytic use cases. It is ACID compliant and supports immediate consistency. Additional technologies and tooling are available to support the Neo4j environment. Since version 4.0 was released (4.1 is the current version) the product has supported scale-out as well as scale-up, as shown in Figure 1, which depicts the (geographically) distributed environment that Neo4j now supports. This is based on the introduction of support for sharding, which extends the horizontal multi-cluster scaling that was introduced in version 3.4. The replicas illustrated refer to read replicas, which have been available within the product for some time. Also included in the most recent release is support for much more granular security than was previously the case.
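To make the term "property graph" concrete, the following is a minimal sketch in plain Python. It is an illustration of the data model only, not a description of Neo4j's native engine, and the class and method names (Node, Relationship, PropertyGraph, neighbours) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries a set of labels and a map of properties."""
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship is directed, typed, and can carry its own properties."""
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory container, for illustration only."""

    def __init__(self):
        self.nodes = {}     # node_id -> Node
        self.outgoing = {}  # node_id -> list of outgoing Relationship objects

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, frozenset(labels), properties)
        self.outgoing.setdefault(node_id, [])

    def add_relationship(self, start, end, rel_type, **properties):
        self.outgoing[start].append(Relationship(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [self.nodes[r.end] for r in self.outgoing[node_id] if r.rel_type == rel_type]


graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Person"], name="Bob")
graph.add_node(3, ["Company"], name="Acme")
graph.add_relationship(1, 2, "KNOWS", since=2015)
graph.add_relationship(1, 3, "WORKS_AT")

known = [n.properties["name"] for n in graph.neighbours(1, "KNOWS")]
print(known)  # ['Bob']
```

The point of the model is that relationships are first-class: following them is a direct lookup from the node rather than a join computed at query time.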
Most users (see below) employ Cypher, the declarative language developed by Neo4j, or openCypher (the open source version). It is notable that SAP, Redis, Memgraph and others have adopted openCypher, and it is also being used within several open source projects, including Cypher for Apache Spark and Cypher for Gremlin, as well as in research projects like InGraph for streaming queries. As with any declarative language, this is best implemented along with a database optimiser, and the company has devoted considerable resources to this, extending beyond an original rules-based optimiser so that it is now primarily cost-based, supporting optimisation for writes as well as reads.
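The difference between a rules-based and a cost-based optimiser can be sketched in a few lines of Python. This is a toy under stated assumptions, not Neo4j's planner: the access-path names, the statistics and both functions are hypothetical, and a real optimiser weighs far more than a single row-count estimate.

```python
# Hypothetical row-count estimates that a planner might hold for each access path.
STATS = {
    "label_scan:Person": 1_000_000,
    "label_scan:Company": 5_000,
    "index_seek:Person.email": 1,
}


def rule_based_choice(candidates):
    """Apply a fixed rule: prefer an index seek, otherwise take the first candidate.

    The rule never looks at data volumes, so it can start from a very large scan.
    """
    for plan in candidates:
        if plan.startswith("index_seek"):
            return plan
    return candidates[0]


def cost_based_choice(candidates, stats):
    """Pick the candidate with the lowest estimated row count."""
    return min(candidates, key=lambda plan: stats[plan])


# A pattern joining Person and Company nodes with no indexed predicate:
# the planner can start from either label.
candidates = ["label_scan:Person", "label_scan:Company"]

print(rule_based_choice(candidates))         # label_scan:Person
print(cost_based_choice(candidates, STATS))  # label_scan:Company
```

In this sketch the rules-based choice starts from a million Person nodes simply because it was listed first, whereas the cost-based choice starts from the much smaller Company set.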
Customer Quotes
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible.”
eBay Shutl
“I’d like to comment on Neo4j’s scalability and capability of looking at millions and millions of nodes. We have a “big data” problem — not only in structured data, but in unstructured data — and we are continually gathering more data. At NASA, my focus right now is on the unstructured data. And I need a product or an application that can go across and develop millions if not billions of nodes, connect that information and at fast speeds. Neo4j is that tool.”
NASA
Unusually for a property graph, SPARQL is supported. So too is Gremlin (part of the Apache TinkerPop project). Perhaps more significantly, the company has introduced a “BI Connector” which translates SQL queries into Cypher, with an initial focus on supporting Tableau, so that you can use this instead of, or alongside, Neo4j Bloom. The latter provides a visualisation and communication interface for non-technical users so that they can explore, edit and search graphs, and create storyboards. This is illustrated in Figure 2.
Also, importantly, the company is a driving force behind GQL (graph query language), which is intended to be a common standard for graph databases, under the aegis of the ANSI standards committee. This initiative is supported by a range of technology vendors including Talend, SAP, Tableau and others.
A significant recent release is the Neo4j Graph Data Science Library, which works in conjunction with Neo4j Bloom to support advanced analytics and machine learning. More than 50 graph algorithms (see Figure 3) are supported and have been optimised for robust scale and parallelised for performance. This last point is important because there are some vendors offering parallelised graph algorithms (consider MADlib for example) running against relational databases. The problem there is that only a limited number of such algorithms can be parallelised in a relational environment, whereas Neo4j is able to offer a much more comprehensive set of capabilities. Finally, in the context of analytic and query support, it is also worth noting that Neo4j supports 3D geospatial capabilities.
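As an illustration of the kind of algorithm in question, here is a minimal, single-threaded PageRank in plain Python. It is a sketch of the general technique only: it is not the Graph Data Science Library's implementation, it is not parallelised, and the function name and sample edges are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start from a uniform distribution.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the "teleport" share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical sample: a and b both point at c, and c points back at a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
top = max(ranks, key=ranks.get)
print(top, round(sum(ranks.values()), 6))
```

Each iteration touches every relationship once, which is why algorithms of this type benefit from an engine that can traverse relationships directly and split the work across threads.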
Neo4j is the clear market leader in the graph space. It has the most users, and it both uses and drives a widely adopted query language. In many respects, it has consistently been a lot more innovative than its competitors. This is in part because of the maturity of the product and partly because its success has meant that it has the resources to introduce such developments more quickly. Its competitors have historically argued that the product did not scale well but the multi-clustering and sharding that are now available should knock that argument on the head. Some vendors that specialise in analytics will claim that they can outperform Neo4j and this may be valid, but Neo4j does not have this limited focus: it is, in effect, the Oracle or SQL Server of the graph database world. It is not the equivalent of Teradata. In other words, it is a general-purpose graph database, and it is no coincidence that it is the leading product in this space.
The Bottom Line
Whenever we talk to a vendor in the graph database space it is Neo4j they compare themselves to. Even if they do something different and address a different market, Neo4j is the benchmark – the company claims more than 400 enterprise customers globally. In pretty much every instance Neo4j should be on your shortlist.
Neo4j (January 2019)
Last Updated: 23rd January 2019
Mutable Award: Gold 2018
Neo4j is a labelled property graph database with a native engine that is targeted at operational and hybrid operational/analytic (HTAP) use cases, as illustrated in Figure 1. It is ACID compliant and supports immediate consistency. Most users (see below) employ Cypher, the declarative language developed by Neo4j, or openCypher (the open source version). It is notable that SAP, Redis, Memgraph and others have adopted openCypher, and it is also being used within several open source projects, including Cypher for Apache Spark and Cypher for Gremlin, as well as in research projects like InGraph for streaming queries. As with any declarative language, this is best implemented along with a database optimiser, and the company has devoted considerable resources to this, extending beyond an original rules-based optimiser so that it is now primarily cost-based, supporting optimisation for writes as well as reads.
Customer Quotes
“I’d like to comment on Neo4j’s scalability and capability of looking at millions and millions of nodes. We have a “big data” problem – not only in structured data, but in unstructured data – and we are continually gathering more data. At NASA, my focus right now is on the unstructured data. And I need a product or an application that can go across and develop millions if not billions of nodes, connect that information and at fast speeds. Neo4j is that tool.”
NASA
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible.”
eBay Shutl
Historically, Neo4j has prioritised performance over scale but over the last couple of years it has put significant emphasis on scalability. This started with support for read replicas and, in the latest release (3.4), the implementation of full horizontal multi-cluster scaling.
Also in the 3.4 release the product supports native string indexes, which will improve write performance; 3D geospatial search; a bulk data loader; security applied against property values; a new date/time datatype; and faster Cypher run-times.
Unusually for a property graph, SPARQL is supported. So too is Gremlin (part of the Apache TinkerPop project). However, the emphasis is on Cypher and the company plans to introduce a “Cypher for Gremlin” capability in addition to the “Cypher for Spark” capability that is already available. In the context of the latter the company currently has a product in alpha testing called Morpheus for Apache Spark. This is intended to allow graph analytics within a data lake (Hadoop and Hive) with in-memory graphs, graph storage within Neo4j and high-speed data transfer between the two environments. In the future, Morpheus will support any source supported by the open source Kettle data integration suite. Also, importantly, the company has introduced a GQL (graph query language) Manifesto as a step towards having a common standard for graph databases. This has been proposed to the ANSI SQL committee for approval and is supported by a range of technology vendors including Talend, SAP, Tableau and others. This proposal is running in parallel to the ANSI SQL property graph extension program.
Other major developments in Neo4j include support for high performance graph algorithms; preparatory work to leverage new hardware capabilities such as Intel Optane and IBM Power 9; integration; and the introduction of Neo4j Bloom, which was released in May 2018. Neo4j Bloom provides a visualisation and communication interface for non-technical users so that they can explore, edit and search graphs, and create storyboards. An illustration of Neo4j Bloom is provided in Figure 2.
Neo4j is the clear market leader in the graph space. It has the most users, it uses a widely adopted language (not just by Neo4j but also many other suppliers of graph databases) that is much easier to use than Gremlin and, in many respects, it has consistently been a lot more innovative than its competitors. This is in part because of the maturity of the product and partly because its success has meant that it has the resources to introduce such developments more quickly. Its competitors have historically argued that the product did not scale well but the multi-clustering now available should knock that argument on the head. Some vendors that specialise in analytics will claim that they can outperform Neo4j and this may be valid, but Neo4j does not have this limited focus: it is, in effect, the Oracle or SQL Server of the graph database world. It is not the equivalent of Teradata. In other words, it is a general-purpose graph database, and it is no coincidence that it is the leading product in that space. In this context, we should comment that we have little faith in the various benchmarks published by other vendors, not least because they are not typically comparing apples with apples.
The Bottom Line
Whenever we talk to a vendor in the graph database space it is Neo4j they compare themselves to. Even if they do something different and address a different market, Neo4j is the benchmark. In pretty much every instance Neo4j should be on your short list.
Neo4j (June 2019)
Last Updated: 27th June 2019
Mutable Award: Gold 2019
Neo4j is a labelled property graph database with a native engine that is targeted at operational and hybrid operational/transactional and analytic use cases. It is ACID compliant and supports immediate consistency within a cluster and, optionally, for writes. In multi-cluster environments, especially where multiple read replicas are deployed, causal consistency is supported. Most users employ Cypher, the declarative language developed by Neo4j, or openCypher (the open source version), though SPARQL and Gremlin are also supported. As with any declarative language, Cypher is best implemented along with a database optimiser, and the company has devoted considerable resources to this, extending beyond an original rules-based optimiser so that it is now primarily cost-based, supporting optimisation for writes as well as reads.
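Causal consistency means, in practice, that a client which has just written something will not subsequently read an older state from a lagging replica. The sketch below shows one common way of achieving this, by passing a "bookmark" from the write to the later read. It is a simplified model in plain Python, not Neo4j's clustering protocol: the Core and ReadReplica classes are hypothetical, and a real replica would wait for replication to arrive rather than pull entries on demand.

```python
class Core:
    """Hypothetical write-accepting member holding an ordered log of writes."""

    def __init__(self):
        self.log = []  # ordered (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))
        return len(self.log)  # bookmark: position of this write in the log


class ReadReplica:
    """Hypothetical replica that applies the core's log asynchronously."""

    def __init__(self, core):
        self.core = core
        self.applied = 0  # number of log entries applied so far
        self.data = {}

    def catch_up(self, upto):
        """Apply log entries until `upto` entries have been applied."""
        while self.applied < upto:
            key, value = self.core.log[self.applied]
            self.data[key] = value
            self.applied += 1

    def read(self, key, bookmark=0):
        # Causal consistency: never answer from a state older than the bookmark.
        if self.applied < bookmark:
            self.catch_up(bookmark)
        return self.data.get(key)


core = Core()
replica = ReadReplica(core)
bookmark = core.write("order:1", "paid")

stale = replica.read("order:1")            # no bookmark: the lagging replica answers as-is
fresh = replica.read("order:1", bookmark)  # with bookmark: the client sees its own write
print(stale, fresh)  # None paid
```

Reads that do not need this guarantee can omit the bookmark and be served immediately, which is what allows read replicas to scale out query load.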
Customer Quotes
“Neo4j enables a new dimension of data analyses to fight diabetes by helping us to connect highly heterogeneous data from various disciplines, species and locations to build an invaluable body of knowledge. By applying modern machine learning techniques to our Neo4j graph, we are getting closer to understanding this complex disease to help diabetics and those with prediabetes.”
German Center for Diabetes Research (DZD)
“Our Neo4j activity implementation has led to a great decrease in complexity, storage, and infrastructure costs. Our full dataset size is now around 40 GB, down from 50 TB of data that we had stored in Cassandra. We’re able to power our entire activity feed infrastructure using a cluster of 3 Neo4j instances, down from 48 Cassandra instances of pretty much equal specs. That has also led to reduced infrastructure costs. Most importantly, it’s been a breeze for our operations staff to manage since the architecture is simple and lean.”
Adobe
Historically, Neo4j has prioritised performance over scale but over the last couple of years it has put significant emphasis on scalability. This started with support for read replicas and, in the latest release (3.4), the implementation of full horizontal multi-cluster scaling. Since that release the product also supports native string indexes, which improve write performance; 3D geospatial search; a bulk data loader; security applied against property values; a new date/time datatype; and faster Cypher run-times. Other major developments in Neo4j include support for high performance graph algorithms (thirty of them), support for a variety of types of knowledge graph, and preparatory work to leverage new hardware capabilities such as Intel Optane and IBM Power 9. However, from the perspective of supporting analytics, the most notable recent developments from Neo4j have been in complementary products, most notably Morpheus for Apache Spark and Neo4j Bloom.
Morpheus for Apache Spark (currently in beta) will allow pattern-based graph analytic queries to be performed directly within a data lake, using a “tables for labels” scheme to automatically map relational views into an in-memory Spark graph. Data can be stored as a graph in Neo4j and easily moved between the Neo4j and Spark environments at high speed. On a related note, as part of the preparations for the forthcoming Apache Spark 3.0 release, the Spark development community has just voted to add Cypher-based property graph querying based on DataFrames to Spark. As a result, Spark 3.0 users will be able to use Cypher for graph query processing, as well as having access to graph algorithms stemming from the GraphFrames project and GraphX.
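The idea behind a “tables for labels” mapping can be sketched without Spark. The example below assumes one table per node label and one table per relationship type, with an "id" column on node tables and "source" and "target" columns on relationship tables. These conventions, the function name and the sample data are all assumptions made for illustration and are not taken from the Morpheus API.

```python
def tables_to_graph(node_tables, relationship_tables):
    """Map relational-style tables to a labelled property graph.

    node_tables:         {label: [row, ...]}, each row a dict with an "id" column
    relationship_tables: {type: [row, ...]}, each row with "source" and "target"
    Assumes ids are unique across all node tables.
    """
    nodes = {}
    for label, rows in node_tables.items():
        for row in rows:
            # The table name becomes the label; remaining columns become properties.
            properties = {k: v for k, v in row.items() if k != "id"}
            nodes[row["id"]] = {"labels": {label}, "properties": properties}

    relationships = []
    for rel_type, rows in relationship_tables.items():
        for row in rows:
            # The table name becomes the relationship type.
            relationships.append((row["source"], rel_type, row["target"]))
    return nodes, relationships


# Hypothetical sample tables.
node_tables = {
    "Person": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}],
    "Product": [{"id": 100, "title": "Laptop"}],
}
relationship_tables = {
    "BOUGHT": [{"source": 1, "target": 100}],
}

nodes, relationships = tables_to_graph(node_tables, relationship_tables)
print(nodes[1], relationships)
```

Because the mapping is purely structural, it can be applied to existing relational views without first copying the data into a graph store.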
Neo4j Bloom provides a visualisation and communication interface for non-technical users so that they can explore, edit and search graphs, and create storyboards. Features include natural language search; business views of your graphs (for example, by department or looking at/for sensitive data); the ability to select, expand, dismiss or find paths through your graphs; browsing; code-free graph change capabilities and support for the use of GPUs for high-performance rendering. An example of the Neo4j Bloom user interface is shown in Figure 1.
Neo4j is the clear market leader in the graph space. It has the most users, it uses a widely adopted language (not just by Neo4j but also many other suppliers of graph databases) that is much easier to use than Gremlin and, in many respects, it has consistently been a lot more innovative than its competitors. This is in part because of the maturity of the product and partly because its success has meant that it has the resources to introduce such developments more quickly. It is, in effect, the Oracle or SQL Server of the graph database world.
More generally, graph databases are a natural home for combining operational and transactional processing with analytics. Their understanding of relationships, combined with the fact that there is no need for multi-temperature data and no requirement to store the data twice (once in memory and once on disk), makes graphs an obvious candidate for processing in hybrid environments.
The Bottom Line
All analytics involve relationships in one way or another, so graphs that represent those relationships are an obvious way to explore and analyse them. When you bear in mind that Neo4j has spent most of its history focusing on support for operational and transactional processing, it is clear that it is worth serious consideration for hybrid processing environments.