From a purely theoretical perspective, data governance is about how you govern your organisation's information assets. That governance is based on corporate policies covering the accuracy, timeliness, completeness and appropriateness of the data, as well as its security, compliance and retention. Some of these policies will be dictated by regulatory requirements and others will be determined internally.
In practice, most data governance initiatives cover relational data only. A few companies have extended this to cover content such as documents. To date, virtually no one is applying it to big data, although the imperatives of data governance apply as much to big data as they do to conventional types of data. Where there is any spreadsheet governance, it is typically treated as entirely separate from data governance. In principle, all sorts of information, in whatever format, should fall within the remit of data governance.
Data governance is about the policies already discussed, and the people and processes that implement and monitor adherence to those policies. In other words, you define an appropriate policy, set up the processes that you will use to ensure that the policy is met, and assign responsibility to relevant individuals (often data stewards) who will monitor and assure the results.
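As a rough illustration of that policy, process and responsibility loop, the following Python sketch defines a single completeness policy, runs a check against a handful of records, and reports the result against the steward who owns it. Everything here is hypothetical and for illustration only: the `Policy` class, the `evaluate` function, the 95% threshold, the steward address and the sample records are assumptions, not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    """A hypothetical governance policy: a rule, an owner and a target."""
    name: str
    steward: str                      # person responsible for monitoring
    check: Callable[[dict], bool]     # returns True when a record complies
    threshold: float                  # minimum share of compliant records


def evaluate(policy: Policy, records: list[dict]) -> dict:
    """Process step: measure compliance and report it to the steward."""
    if records:
        passed = sum(1 for record in records if policy.check(record))
        rate = passed / len(records)
    else:
        rate = 1.0  # assumption: no records means nothing violates the policy
    return {
        "policy": policy.name,
        "steward": policy.steward,
        "compliance": rate,
        "met": rate >= policy.threshold,
    }


# Policy step: customer records must carry a non-empty email address.
completeness = Policy(
    name="customer email completeness",
    steward="steward@example.com",    # hypothetical data steward
    check=lambda record: bool(record.get("email")),
    threshold=0.95,                   # assumed target, set by the organisation
)

records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4},
]

report = evaluate(completeness, records)
print(report)
```

In a real initiative the check would run against production data on a schedule and any shortfall would be raised as an issue for the steward to resolve; the sketch only shows how the three elements relate.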
Data governance is not a technology per se but is supported by a variety of different technologies. Some support data governance in its monitoring and preventative aspects, such as master data management, data quality, and data profiling and discovery. However, each of these can be deployed without any attempt at data governance, and it is entirely possible, if difficult, to implement data governance without some or all of them.
A separate subset of technologies that supports the data governance function relates to security and compliance, specifically static and dynamic data masking and test data management.
The only products aimed directly at data governance and nothing else, and these are few and far between, are those that capture and manage the policy and process aspects of data governance, supporting data stewards with functions such as issue tracking and managing data sharing. Business glossaries and reference data management may both be included in this category but are also commonly provided by data quality vendors.
Data governance is driven by a confluence of interests: the CMO may want more accurate information about customers in order to market to them more effectively; the CSO wants to ensure that data masking is applied to sensitive data; the compliance officer wants to make sure that data archival and retention policies are adhered to; and so on. We also know of cases where the head of personnel has been actively involved in data governance and, indeed, any C-level executive may be actively involved, depending on circumstances. Because of the nature of the technologies involved, the CIO and others within IT will have a significant input into any decision making.
Typically, data governance comes under the aegis of a data governance council that reports at board level, often with a C-level executive on the council. It is arguable that Chief Data Officers, where they are in place, should take on this role. In terms of actual implementation and maintenance, the people most likely to be involved are business analysts and data stewards, who will often work closely together.
Historically, most compliance requirements have been around the processes used to manipulate data. The accuracy of the data itself was of no concern. Sarbanes-Oxley and its derivatives are a classic example of this. However, this is starting to change. Solvency II, MiFID II, Basel III and Dodd-Frank are all examples of legislation that applies to data as well as processes. The words used by both Solvency II and MiFID II are telling: "data should be accurate, complete and appropriate". While these regulations do not actually mandate data governance, they come as close to doing so as possible without actually saying so. And note that they do not limit themselves to data in your databases: they apply equally to data in, say, spreadsheets.
This matters because we expect more governments to introduce more legislation that is focused, at least in part, on data accuracy and completeness. Of course, there is already a significant focus on data privacy.
Vendors have been slow to introduce features specifically designed to support data governance, as opposed to complementary technologies such as data quality. The only common exception is support for a 'Data Steward' interface, but this does not necessarily provide additional capabilities.
The exceptions that offer more functionality are Ataccama, which provides a tracking facility to monitor that processes are being followed; Kalido, which offers a policy control and management suite that supports the definition of the policies to be enforced, together with monitoring thereof; and Collibra, which is a data governance specialist. More recently, IBM has introduced its Business Information Exchange product, which is a policy hub not just for data governance policies but also for business, security and privacy requirements that go beyond pure governance. IBM is also making a significant push around the governance of big data. Varonis is a leading supplier of data governance solutions for unstructured data.
To a significant extent, data quality vendors have been slow to add specific data governance features because data quality is data-driven while data governance is process-centric. Bearing this in mind, it is interesting that Trillium has recently forged a partnership with Collibra, which is one of the few (only?) companies to specialise in data governance software.