Knowledge Graphs: are they just for people to explore, or are they broader than that?

TopQuadrant, a data governance provider, has recently been running a campaign around its support for knowledge graphs. In particular, it has published a paper called “Knowledge Graphs versus Property Graphs: Similarities, Differences and Some Guidance on Capabilities”.

The title of this paper is somewhat misleading, since it is more a discussion of the differences between RDF graphs and property graphs (specifically Neo4j) than a general overview of knowledge graphs. TopQuadrant’s view is that property graphs (Neo4j) are not ideal for building knowledge graphs, a claim that, in turn, depends on your definition of a knowledge graph.

As a general discussion of the differences between RDF and property graphs, the paper is worth reading, though it leaves out some points in favour of property graphs. For example, while extolling the fact that SPARQL is a W3C standard, it does not mention the ongoing effort to create a standard property graph query language (GQL) under ISO. That said, the author makes a reasonable attempt to be even-handed about the relative advantages of the two approaches although, not surprisingly given that TopQuadrant’s products are built on an RDF database, RDF almost always seems to come out ahead.

However, that is not the point of this discussion. As the title of the paper (and of the accompanying webinar) implies, what TopQuadrant is effectively trying to do is equate knowledge graphs with RDF graphs. It then goes on to define knowledge graphs as containing “facts about entities in the world together with the meaning of those facts expressed as models or rules”. Now, there are lots of different definitions of what a knowledge graph is, but here TopQuadrant is choosing a definition to suit its own purposes, as evidenced by the fact that the paper explicitly describes TopQuadrant as providing “an Enterprise Knowledge Graph Infrastructure for Data Governance”. I don’t disagree with this: “models and rules” are exactly what you require for data governance, and these are enabled through the use of SHACL (Shapes Constraint Language). The paper goes on to say that SHACL is “a language for defining rules and constraints for RDF Graphs, turning them into fully fledged Knowledge Graphs.” Fair enough. But the bad news for TopQuadrant, not mentioned in the paper, is that Neo4j supports SHACL through a plug-in called neosemantics. That said, this support is relatively limited compared to the capabilities provided by TopQuadrant.

As a more general comment, TopQuadrant’s focus on knowledge graphs means that, in addition to competing with other data governance vendors such as Collibra, Syniti, Informatica and Alex Solutions, it is also starting to compete with the likes of Ontotext (GraphDB), Stardog and Franz (AllegroGraph), all of which support SHACL. It will also come into competition with products built on top of ODPi Egeria as these start to come to market.

However, that’s an aside. The real point is to discuss what a knowledge graph is, and here I quite like TopQuadrant’s idea. I think “facts about entities in the world together with the meaning of those facts” would be the basis for a reasonable definition. But this raises another question. Are knowledge graphs intended solely for people, or do they also extend to automated processes involving machines, applications and algorithms? If it is only the former (which is certainly the case for some knowledge graph use cases), then I do not see why these facts and their meanings necessarily have to be expressed as models or rules. The reference to rules is surely aimed specifically at governing these entities and their relationships, rather than at environments where you simply want to understand and explore those relationships. In other words, you need rules and models if you want to automate processes, such as data governance processes, but I don’t think you need them if automation is not part of your use case.