Analyst Coverage: Philip Howard
Ab Initio is a privately owned software company with a 25-year pedigree. It was established in Boston in 1995, and still has its headquarters there. It boasts a global presence, with offices located around the world and customers in over 35 countries.
The company’s marketing strategy is almost solely focused on using customer recommendations to win new business. This means that its growth as a business depends heavily on generating success for its customers. The fact that it has successfully relied on this business model for 25 years should speak for itself.
Ab Initio Metadata Hub
Last Updated: 14th July 2020
The Ab Initio Metadata Hub acts as the data governance component for Ab Initio’s data management platform. It can be used as either a system of record or a system of reference, is able to govern technical, business, and even logical assets, and provides both business and technical lineage. It also offers data quality and reference data functionality, as well as role and responsibility management. Moreover, the Metadata Hub is closely integrated with Ab Initio’s other solutions, such as Semantic Discovery, which each provide significant additional capabilities.
The Metadata Hub separates your data assets into a number of categories. For data governance, the most important are business assets, technical assets, and logical assets, as well as reference data. In effect, these consist of business information surrounding your data (business assets), the physical reality of your data (technical assets), and logical data models that describe your data (logical assets). Each type of asset can be browsed through at your leisure, and this acts as the first of two primary means to access (and govern) your assets. The second is through one of the two lineage views that the product provides.
Both of these lineage views visualise the movement and impact of data and data assets through your system as a flowchart, and allow you to access your data assets directly by drilling down into them. Where they differ is in the perspective they take on this movement. Business lineage (shown in Figure 1) approaches it with a high-level, logical view, and illustrates how your business assets interact with and are processed by your system from a business perspective. In other words, it focuses on the interactions that matter to the business, without much regard for the physical reality. An initial view of your business lineage can be generated automatically from the relationships between your fields and your business terms (see below), but some manual effort will usually be required to make it suitable for consumption by the business.
Technical lineage, on the other hand, takes the exact opposite approach, by examining how your technical, physical assets literally move through your system, how your files, fields and tables interact, and so on. What’s more, technical lineage is generated automatically from your metadata, and said metadata can be imported from a wide variety of third-party systems.
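To make the idea of technical lineage concrete, the following is a minimal sketch of how field-level lineage can be represented as a directed graph and queried for downstream impact. It is an illustration only and not the Metadata Hub's implementation or API: the function names, the edge format, and the example field names are all assumptions.

```python
from collections import defaultdict, deque


def build_lineage(edges):
    """Build an adjacency map from (source, target) pairs.

    The pair format is a hypothetical stand-in for imported metadata.
    """
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return every asset reachable from `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids creating empty entries for leaf nodes
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical field-to-field movements
edges = [
    ("crm.customers.email", "staging.customers.email"),
    ("staging.customers.email", "warehouse.dim_customer.email"),
    ("crm.customers.name", "staging.customers.name"),
]
graph = build_lineage(edges)
impacted = downstream(graph, "crm.customers.email")
```

Here `impacted` holds the staging and warehouse email fields, which is the kind of question ("what is affected if this field changes?") that a lineage view answers visually.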
In fact, a sizable range of extractors are provided for importing metadata into the product, both for generating technical lineage and more generally. This includes support for a number of third-party products, including some direct competitors in the data governance space. It’s also possible to write your own extractors using the open documentation that Ab Initio provides. Extractors can be run either through Ab Initio’s UI or via the command line, and can therefore be scheduled by utilising the latter.
The product’s business glossary, as seen in Figure 2, allows you to centrally define and manage your business terms. Using the business glossary, they can be given a number of (configurable) types, such as ‘critical element’, and can be equipped with classifications such as PII (with the latter tying into Ab Initio Semantic Discovery, the company’s sensitive data discovery solution). Your terms can be hierarchical or otherwise related to other terms, as well as to your physical data (again accelerated by Semantic Discovery) and other assets. These relationships are also available as a visualisation, which is generated automatically. Roles and responsibilities are configurable for each term individually (and can have their own hierarchies), and a configurable workflow approval process is used to facilitate this. Search access to each term is also provided.
Your business terms can also be used to measure your data quality from a business perspective. Data quality rules can be created and added to your business terms, which will then contribute to an associated, user-configurable quality metric (typical examples include ‘accuracy’ and ‘consistency’). Each rule provides a historical performance summary as well as a list of associated terms, assets, and so on. Configurable thresholds drive data quality warnings, email notifications, or other actions if your quality metrics fall too far. Data quality checking can be run manually or automatically via scheduling. Notably, data quality information can be overlaid onto your business lineage to form a data quality heat map, allowing you to visually understand the health of your system. This can be seen in Figure 1.
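The relationship between rules, metrics and thresholds can be sketched as follows. This is a hedged illustration under stated assumptions, not the product's actual scoring method: the pass-rate aggregation, the two-level threshold scheme, and the names `metric_score` and `quality_status` are all hypothetical.

```python
def metric_score(rule_results):
    """Aggregate rule results into one metric as an overall pass rate.

    `rule_results` is a list of (records_passed, records_checked) pairs,
    one per rule. Pass-rate aggregation is an assumption for illustration.
    """
    checked = sum(total for _, total in rule_results)
    passed = sum(ok for ok, _ in rule_results)
    return passed / checked if checked else 1.0


def quality_status(score, warn_below, alert_below):
    """Map a metric score onto a status using two configurable thresholds."""
    if score < alert_below:
        return "alert"    # e.g. trigger an email notification
    if score < warn_below:
        return "warning"  # e.g. show a warning on the heat map
    return "ok"


# Two hypothetical 'accuracy' rules attached to one business term
accuracy = metric_score([(90, 100), (45, 50)])
status = quality_status(accuracy, warn_below=0.95, alert_below=0.80)
```

With these example figures, 135 of 150 records pass, giving a score of 0.9 and a status of "warning".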
The Metadata Hub offers a number of advantages as a platform for data governance. For instance, it positions lineage prominently as a part of its solution, and as you might therefore expect, its lineage capabilities are a significant draw. In particular, explicitly providing both business and technical lineage can prove very useful by allowing all of your users to easily comprehend and get what they need out of your lineage. The data quality heat map is also a notable feature. It’s unfortunate that preparing your business lineage for consumption is likely to take manual effort, but on the other hand, generating your technical lineage is completely automated, and can be accomplished using a selection of third-party – and even competitor – products, to boot.
Ultimately, though, the greatest strength of the Metadata Hub is not part of the product itself. Rather, it’s the product’s place in Ab Initio’s entire milieu that makes it so powerful. For instance, it will readily and closely integrate with Semantic Discovery, and hence add full-fledged data discovery and classification (and thereby sensitive data discovery and GDPR compliance) to your governance solution. What’s more, Semantic Discovery is far from the only integration available: Ab Initio offers a wide range of data management solutions, and in many ways the Metadata Hub acts first and foremost to bring those solutions together.
The Bottom Line
Ab Initio is a broad and highly regarded platform for managing your data. The Metadata Hub, as part of that platform, is an excellent way to bring different elements of it together in aid of data governance.
Ab Initio Semantic Discovery
Last Updated: 21st May 2020
Mutable Award: Gold 2020
Ab Initio Semantic Discovery is a data discovery solution offered as part of Ab Initio’s broader data management platform. It provides automated data discovery (including sensitive data discovery) against both structured and unstructured sources (although its capabilities on the latter are somewhat limited). What’s more, it can leverage this discovery process to drive downstream actions and outcomes, such as automated data quality and data masking.
The discovery process in Semantic Discovery begins by profiling your data using Ab Initio’s Data Profiler. This only needs to be done once on a given data set, no matter how many times you run the rest of the process. Semantic Discovery then augments this information via a classification process, which provides additional analysis that is particularly useful for data discovery.
Next, the product uses four different metrics to test your data: business term matching, a metadata comparison of your field definitions against your business terms, which may include fuzzy matching, abbreviations, and synonyms; pattern tests, which compare your data against a selection of recognised patterns and values, using the classification mentioned above to determine which fields to test; keyword tests, which look for specified keywords within your data; and fingerprints, which compare your data against known values. Each of these is used to determine the likelihood that a given field belongs to a given business term. They are then corroborated against each other to provide a final estimate, which in turn is used to recommend an action to take on each field: match, where the field unambiguously belongs to a particular business term; recommend, where a probable match has been found, but there is enough ambiguity to require human intervention; ignore, where no related term has been found; and investigate, where the results as a whole are highly ambiguous (for example, if several possible terms were found, but each had only middling likelihood).
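The corroboration step described above can be sketched as a simple decision function. This is only an illustration of the general idea: how Ab Initio actually combines the four metrics is not documented here, so the averaging, the threshold values, and every name in the code are assumptions.

```python
# Hypothetical thresholds, chosen purely for illustration
MATCH_THRESHOLD = 0.9
RECOMMEND_THRESHOLD = 0.6
IGNORE_THRESHOLD = 0.2


def combine(scores):
    """Average the per-test likelihoods for one (field, term) pair."""
    return sum(scores.values()) / len(scores)


def recommend_action(term_scores):
    """Recommend an action for one field.

    `term_scores` maps each candidate business term to a dict of
    test name -> likelihood in [0, 1].
    """
    combined = {term: combine(s) for term, s in term_scores.items()}
    if not combined:
        return ("ignore", None)
    best_term = max(combined, key=combined.get)
    best = combined[best_term]
    if best >= MATCH_THRESHOLD:
        return ("match", best_term)
    if best >= RECOMMEND_THRESHOLD:
        return ("recommend", best_term)
    if best < IGNORE_THRESHOLD:
        return ("ignore", None)
    # Middling likelihoods with no clear winner: ask a human to look
    return ("investigate", None)


strong = {"Email Address": {"term": 0.95, "pattern": 1.0,
                            "keyword": 0.9, "fingerprint": 0.9}}
ambiguous = {"Phone Number": {"term": 0.4, "pattern": 0.4},
             "Account Number": {"term": 0.4, "pattern": 0.4}}
```

Calling `recommend_action(strong)` yields a match on "Email Address", while `recommend_action(ambiguous)` falls into the investigate category because no candidate term scores convincingly.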
The results of the discovery process are accessed via a metadata portal, allowing you to take action on each field that was discovered. For example, if a match was recommended, you can accept it, reject it, or investigate the field further (and potentially specify a different match). Choosing to investigate provides more information on the field in question, including a clickthrough to the Data Profiler view for that field. Barring investigate, these actions can be done in bulk and trigger approval workflows when instigated.
The metadata portal is not just for reviewing your discovery results. Among other things, it can also be used for role management, activity monitoring, and managing the metrics that are used to discover your data. For the latter, in particular, you can manage your domains, the sets of known data used during fingerprints; your pattern tests, as well as thresholds for determining how prevalent each pattern needs to be within a field before it is recognised; your keyword tests; and finally, your business terms. These are all extensible, are populated out of the box, and each entry within them can be enabled or disabled individually.
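The idea of a prevalence threshold for pattern tests can be illustrated with a short sketch. It assumes, hypothetically, that prevalence means the share of non-empty values in a field that fully match a regular expression; the function names and the example pattern are not taken from the product.

```python
import re


def pattern_prevalence(values, pattern):
    """Share of non-empty values that fully match `pattern`."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if re.fullmatch(pattern, v))
    return hits / len(non_empty)


def field_matches_pattern(values, pattern, threshold):
    """Recognise the pattern only if it is prevalent enough in the field."""
    return pattern_prevalence(values, pattern) >= threshold


# Hypothetical, deliberately simple email pattern
EMAIL = r"[^@\s]+@[^@\s]+\.[a-z]+"
column = ["a@b.com", "c@d.org", "n/a", "e@f.net"]
prevalence = pattern_prevalence(column, EMAIL)
```

In this example three of the four values match, so the field would be recognised as containing emails at a threshold of 0.7 but not at 0.8. Tuning that threshold trades false positives against missed fields.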
Semantic Discovery also provides dashboards for central monitoring of your discovery processes (as seen in Figure 1), a variety of reporting options such as a decision log, and a visualisation of the relationships between your business terms (see Figure 2) with the option to click through to a detailed view for each term. This detailed view also allows you to flag a term as PII (of varying levels, if necessary) and define which action to take (if any) when that term is discovered (for example, a masking function). If these options are used, any data that is discovered under that term will be a) flagged as PII and b) acted on automatically. On the topic of PII, Ab Initio also provides subject access requests via Query>It, its solution for querying on distributed data sources. This is important for regulatory compliance (say, with GDPR).
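The per-term PII flag and automatic action described above can be pictured as a small policy table. The sketch below is an assumption-laden illustration and not Ab Initio's masking engine: the policy structure, the term names, and both masking functions are hypothetical.

```python
import hashlib


def mask_hash(value):
    """Replace a value with a truncated SHA-256 digest (illustrative only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def mask_redact(value):
    """Replace every character with an asterisk."""
    return "*" * len(value)


# Hypothetical per-term policies: PII flag plus optional action
TERM_POLICIES = {
    "Email Address": {"pii": True, "action": mask_hash},
    "Customer Name": {"pii": True, "action": mask_redact},
    "Order Total": {"pii": False, "action": None},
}


def apply_policy(term, values):
    """Return (is_pii, values after the term's action, if any)."""
    policy = TERM_POLICIES.get(term, {"pii": False, "action": None})
    action = policy["action"]
    masked = [action(v) for v in values] if action else list(values)
    return policy["pii"], masked


is_pii, names = apply_policy("Customer Name", ["Ann", "Bob"])
```

Once a field has been matched to a term, looking up the term's policy is all that is needed to decide whether the data is flagged and which function, if any, is applied.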
Semantic Discovery has several qualities that recommend it. It provides a number of different metrics for discovering your data, which can be combined to help minimise false positives. The existence of the investigate and recommend categories provides nuance, allowing the product to request human intervention when appropriate. Investigate, in particular, can act as a way to identify undocumented business terms: if a field is flagged for investigation, it will quite often be because it doesn’t fit into any of your existing terms, which may suggest a new one should be created.
Semantic Discovery also benefits from Ab Initio’s wider product suite, and in particular its close ties to both the Metadata Hub (part of Ab Initio’s solution for data governance) and Ab Initio’s automated masking engine. This is what allows Semantic Discovery to automatically action on your data as soon as it is discovered. Offering this functionality is a significant advantage, and the ability to mask your data immediately after discovery is particularly helpful for protecting your sensitive data.
The Bottom Line
Ab Initio is a broad and highly regarded platform for managing your data. Semantic Discovery, as part of that platform, is an effective means for discovering the data, and particularly the sensitive data, within your system.