Content Copyright © 2020 Bloor. All Rights Reserved.
Also posted on: Bloor blogs
Informatica has announced the acquisition of GreenBay Technologies. Unless you are a keen follower of Informatica, or of the University of Wisconsin (from which GreenBay was a spin-off), you probably won’t have heard of GreenBay, as it had yet to come to market. Nevertheless, Informatica invested in the company last year because it saw promise in GreenBay’s work on artificial intelligence and machine learning, and an opportunity to further enhance its own CLAIRE engine.
In practical terms, there are three major areas where GreenBay’s capabilities will enhance Informatica’s offerings. The first is increased automation for entity matching, through GreenBay’s CloudMatcher technology, which will be extended to support product matching. The second is schema matching, which will be integrated with Informatica’s data cataloguing, data integration and master data management solutions during the course of next year. Thirdly, the company plans to introduce a cloud-native Metadata Knowledge Graph (MKG), leveraging GreenBay’s technology to automatically generate the data asset relationships expressed in the graph.
While Informatica might argue, perfectly reasonably, that entity and schema matching are at least as important as the MKG, if not more so, my personal interest in graph technology means that it is this aspect of the announcement that I am most interested in. So, I have enquired further about the MKG. It will be based on a property graph database. Initially, it will only be used internally by Informatica applications, though the company plans to introduce user access to the MKG via an API at a later date. However, this will not be GraphQL.
There are clearly benefits that Informatica, and its users, will gain from having an embedded MKG, not least the fact that relationships will be recognised and mapped automatically rather than manually. However, I think that restricting access to the graph in this way is a missed opportunity. There must be significant potential advantages for users who want to understand and explore their metadata through the MKG, beyond any visualisations that Informatica may provide. If it’s running on a graph database, why not expose a Gremlin or openCypher interface? At the very least, the company could support GraphQL.
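To make concrete the kind of exploration I have in mind, here is a minimal sketch of a metadata property graph and a downstream-impact traversal over it. It is purely illustrative: the `PropertyGraph` class, the asset names and the `FEEDS` and `JOINS_WITH` relationship types are all my own assumptions, and none of this reflects how Informatica's MKG is actually implemented or what its eventual API will look like.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Tiny in-memory property graph (hypothetical, for illustration only).

    Nodes and directed, typed edges both carry arbitrary properties.
    """

    def __init__(self):
        self.nodes = {}
        self.out_edges = defaultdict(list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, rel, dst, **props):
        self.out_edges[src].append((rel, dst, props))

    def downstream(self, start, rel):
        """Breadth-first walk along edges of type `rel`, excluding `start`."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            current = queue.popleft()
            for edge_rel, dst, _props in self.out_edges[current]:
                if edge_rel == rel and dst not in seen:
                    seen.add(dst)
                    order.append(dst)
                    queue.append(dst)
        return order


# Assumed example metadata: asset names and relationship types are invented.
graph = PropertyGraph()
graph.add_node("crm.customers", "Table", system="CRM")
graph.add_node("dw.dim_customer", "Table", system="Warehouse")
graph.add_node("bi.churn_report", "Report", owner="analytics")
graph.add_node("erp.orders", "Table", system="ERP")
graph.add_edge("crm.customers", "FEEDS", "dw.dim_customer", inferred=True)
graph.add_edge("dw.dim_customer", "FEEDS", "bi.churn_report", inferred=True)
graph.add_edge("crm.customers", "JOINS_WITH", "erp.orders", confidence=0.9)

# Which assets are affected if crm.customers changes?
impacted = graph.downstream("crm.customers", "FEEDS")
print(impacted)  # ['dw.dim_customer', 'bi.churn_report']
```

With an openCypher interface, a user could ask much the same question declaratively, with something along the lines of `MATCH (t {name: 'crm.customers'})-[:FEEDS*]->(d) RETURN d`, again assuming a hypothetical `name` property and `FEEDS` relationship type. Ad hoc questions of this sort are difficult to anticipate with pre-built visualisations.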
Note that Informatica is not seeking to compete with semantic (RDF) graph database vendors that allow you to apply governance and other rules via SHACL (Shapes Constraint Language), or even with Neo4j, which has a semantic plug-in that offers (limited) support for SHACL. Nevertheless, these are clearly overlapping spaces: the company may not position itself as competitive, but that does not mean that the market will agree with it. Moreover, rival vendors are now developing products based on ODPi Egeria (which is also property graph based), so the MKG is likely to be seen as competing with these. Informatica is in danger of being accused of vendor lock-in unless it at least announces that the MKG is Egeria conformant, even if it is not actually built on Egeria. What I would really like to see Informatica do is build the MKG on top of Egeria.
More generally, we are currently conducting some research into data management platforms designed to support (hybrid) cloud analytics environments. While we have not completed this research at the time of writing, we can say that a number (though not all) of the traditional rivals to Informatica are some way behind the curve when it comes to automation and the implementation of machine learning. The acquisition of GreenBay represents a doubling down on this existing advantage, which Informatica claims will give it a twelve-month lead over its rivals. Where it is already ahead of its competitors, I would guess that that is an under-estimate.