Whither data governance?

I have been approached four times in the last few weeks by VCs wanting to talk to me about data governance. That’s more approaches than I usually get in a year. The question is: why?

The first answer is that there is some consolidation going on. Trillium got acquired by Syncsort, Informatica acquired Diaku, and rumour has it that another well-known player in the data governance (and migration) space is also up for sale.

But I think this is a symptom rather than a cause.

Another possible answer is what Henri Peyret at Forrester Research has called Data Governance 2.0. This is essentially about organisations sharing collaboratively within a data governance framework that spans partners, data providers and so forth. I like the idea, but most companies haven’t got their own house in order, let alone started thinking about working collaboratively. So, no, I don’t think this is behind current trends in the market.

A third thing to consider is technology. Historically, there have been two halves to data governance: data quality and stewardship on the one hand, and policy creation and management on the other. And vendors tended to be in one camp or the other. This is changing. It is why Informatica, which did the former, acquired Diaku, which did the latter: the aim is to merge (well, ultimately) the two products into what Informatica calls an Enterprise Data Governance (EDG) solution. But it is also worth considering another company with an EDG solution, namely TopQuadrant. The interesting thing about TopQuadrant is that it comes at data governance from a semantic perspective. And whereas Diaku (and others) use a graph database to help explore and visualise relationships within a data governance context, TopQuadrant goes deeper by using an RDF database that truly works at a semantic level.

That’s interesting but do I think technology is the answer? I don’t think so. It’s an enabler, certainly. But I don’t think it’s why so much interest is being expressed in data governance. Nope, the answer is big data or, more specifically, data lakes.

You might express disbelief at this suggestion but I will explain. Many organisations have stopped playing around with data lakes and want to start using them seriously. And this means understanding both the data and metadata therein, which in turn requires governance. But, and this is the key point, this is not governance as we have historically known it. In the past, governance has been about compliance with appropriate regulations and it has been about the accuracy and timeliness of your data. It still is about those things: you will still need to comply with GDPR, even if you don’t like it. But data governance in this context is essentially about avoidance: trying to ensure that bad things don’t happen. It’s the sort of thing that companies do with reluctance rather than enthusiasm. However, when it comes to the governance and management of data lakes there is another, much more positive aspect to data governance, which is that the data cataloguing and data preparation involved enable productivity, self-service and automation. And companies like that and will invest in it. And that’s why (enterprise) data governance is hot and getting hotter.