TigerGraph is based in California and develops and markets a graph database of the same name, which was previously known as GraphSQL. The company has existed since 2012 but operated largely in stealth mode until mid-2017. Even so, it had already acquired some prestigious users, including Visa, Uber, Intuit, Zillow, PingAn and Alipay. TigerGraph is VC-backed.
Last Updated: 28th January 2019
TigerGraph is a native parallel graph database that is available in both on-premises and cloud (AWS and Azure) versions. The company has also announced TigerGraph Cloud, through which the product will be available as a service. TigerGraph uses a property graph paradigm and its strengths lie in processing structured rather than semantically oriented data. Its main areas of focus are anti-fraud, customer intelligence, supply chain intelligence and energy efficient analytics. The Internet of Things (IoT) is also of increasing interest, assisted by the product’s direct integration with Kafka (among other things; see below).
Unlike many graph database products, TigerGraph has been designed specifically to support real-time (less than one second) analytics. The keys to achieving this are parallelism, compression and the fact that, in TigerGraph, graph edges and vertices are not just units of storage but also computational units. The engine processes these in parallel, and the product also includes a parallel loader. Compression can be more than 10x, according to TigerGraph, and it is also used as part of the loading and transformation processes to further improve performance.
Also relevant is graph partitioning: the product supports application-specific partitioning and mixed partitioning strategies, as well as automated partitioning. Aligned with this, but not shown in Figure 1, is the ability to run multiple graph engines, each hosting an identical graph but with a different partitioning algorithm tailored to a different type of application query. The front-end server routes application queries to the relevant engine based on the query type.
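The routing idea described above can be illustrated with a minimal Python sketch. This is not TigerGraph's implementation: the Engine and FrontEndRouter classes, the partitioning labels and the query type names are all hypothetical, invented purely to show how a front end might pick an engine by query type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """A hypothetical graph engine holding its own copy of the graph."""
    name: str
    partitioning: str        # illustrative label for the partitioning strategy
    query_types: frozenset   # query types this partitioning is tuned for


class FrontEndRouter:
    """Routes each query to the engine whose partitioning suits its type."""

    def __init__(self, engines, default):
        self._default = default
        self._by_type = {}
        for engine in engines:
            for query_type in engine.query_types:
                # The first engine registered for a query type wins.
                self._by_type.setdefault(query_type, engine)

    def route(self, query_type):
        # Unknown query types fall back to the default engine.
        return self._by_type.get(query_type, self._default)


# Assumed example data: two engines with different (made-up) strategies.
engines = [
    Engine("engine-a", "by-customer", frozenset({"fraud_ring_lookup"})),
    Engine("engine-b", "by-region", frozenset({"supply_chain_trace"})),
]
router = FrontEndRouter(engines, default=engines[0])
chosen = router.route("supply_chain_trace")
```

In this sketch the mapping from query type to engine is fixed at start-up; how the real product classifies queries is not described in the source.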
TigerGraph 2.0, released in February 2018, added various security enhancements, including single sign-on, support for LDAP and Active Directory, encryption (both in motion and at rest) and role-based access control. Most notably, the company introduced a collaboration service whereby multiple groups can share a single master database, with each group having its own view into the database. This has important implications for compliance (not least GDPR) because the service allows you to manage and monitor data access, data lineage and personal data. This includes where a piece of data was first acquired, whether consent was given in obtaining it, where it has moved over time, where it resides in each system, and how it is used.
There is a free trial program for enterprises and a free developer edition for non-commercial use. The company has also introduced Graph Gurus, which is a free, educational webinar series. The product features one-click deployment to several major cloud marketplaces, including AWS and Microsoft Azure; supports Docker and Kubernetes containers; and includes direct integration with a number of popular data storage systems, as shown in Figure 1.
“We selected TigerGraph for its superior data warehousing speed and computational processing capacity, which improved performance by an order of magnitude.”
“Alipay streams 2B+ daily events in real time to a graph with 100B+ vertices and 600B+ edges on a cluster of only 20 commodity machines.”
TigerGraph is aimed at real-time analytics for anomaly detection, pattern recognition, IoT applications, recommendations (next best offer) and similar environments where low latency is required. As an example of its use, Figure 2 illustrates the logical architecture deployed for anti-money laundering (AML) at one of TigerGraph’s users. The blue brackets indicate sub-graphs. Note the support for machine learning, both supervised and unsupervised. Note too that this sort of application requires support for real-time processing of operational data, not just analytics.
You can access the database via GSQL. As its name suggests, this is SQL-like. In addition, a migration toolkit is provided to port queries from Cypher into GSQL, allowing you to reuse queries written for Neo4j. The company is planning to add support for Gremlin, part of the Apache TinkerPop project, in a future release. TigerGraph provides its own graph visualisation capabilities, and also offers a browser-based capability called GraphStudio that can be used to create graph models, queries and so forth. This has been built on top of GSQL to make the environment more user-friendly, allowing ad hoc exploration of your data. There is also a GSQL software developer’s kit (SDK) that third-party graph specialists could use to integrate with TigerGraph. In addition, there is a RESTful API capability, which means that it should be relatively easy to integrate with third-party tools such as Tableau. A user-extensible library of graph algorithms is also provided. Several algorithms (such as PageRank) are available out of the box, with more forthcoming as development continues.
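For readers unfamiliar with PageRank, the following is a minimal, generic Python sketch of the algorithm using power iteration. It is not TigerGraph's GSQL library code and says nothing about how the product implements it; the function name, the damping factor of 0.85, the iteration count and the sample edges are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict of vertex -> rank for a directed edge list."""
    vertices = sorted({v for edge in edges for v in edge})
    if not vertices:
        return {}
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        for src in vertices:
            for dst in out_links[src]:
                # Each vertex shares its rank equally among its out-edges.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Assumed toy graph: a cycle a -> b -> c -> a, plus d pointing at a.
ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
```

Here vertex d has no incoming edges, so it ends up with the lowest rank, while the ranks across all vertices sum to one.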
TigerGraph also supports machine learning via the ability to generate training data en masse, which can then be extracted into your machine learning solution and used to train models. The training data itself is derived from your graph, and can be exported into your machine learning solution on a continuous basis, for example, every two hours. Due to the complexity of the graph structure on which it is based, this exposes a large quantity of information (particularly relationships) which can then be analysed deeply for connections and patterns. This can dramatically improve the accuracy of your models, particularly compared to simple analysis of relatively uncomplicated training data.
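As a rough illustration of deriving training data from a graph, the Python sketch below turns an edge list into one feature row per vertex and serialises the rows as CSV for a downstream machine learning tool. It does not use any TigerGraph API: the function names, the chosen features (degree and number of flagged neighbours), the account identifiers and the CSV layout are all hypothetical.

```python
import csv
import io
from collections import defaultdict

FIELDS = ["vertex", "degree", "flagged_neighbours", "label"]


def build_training_rows(edges, flagged):
    """Derive one feature row per vertex from an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    rows = []
    for vertex in sorted(neighbours):
        linked = neighbours[vertex]
        rows.append({
            "vertex": vertex,
            "degree": len(linked),
            # Relationship-based feature: how many neighbours are flagged.
            "flagged_neighbours": len(linked & flagged),
            "label": int(vertex in flagged),
        })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text for export to an ML tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed toy data: four accounts in a chain, one already flagged.
edges = [("acct1", "acct2"), ("acct2", "acct3"), ("acct3", "acct4")]
rows = build_training_rows(edges, flagged={"acct3"})
csv_text = to_csv(rows)
```

A scheduled job could rerun an export like this at whatever interval suits the model, such as the two-hourly example mentioned above.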
The key point about TigerGraph is its performance. Most other graph databases were built originally to support operational environments and were not intended to be used for complex large-scale and real-time analytics, though they may have been extended in that direction since they were originally designed. TigerGraph, on the other hand, was designed specifically for these environments and it is therefore not surprising that benchmarks suggest that TigerGraph outperforms leading rival products.
The Bottom Line
We should emphasise “complex, large-scale and real-time” as well as “analytics” from the previous section. Add in the ability to process operational data in real-time and you should understand where and why TigerGraph has significant advantages.