Analyst Coverage: Philip Howard
Ab Initio is a privately owned software company with a 25-year pedigree. It was established in Boston in 1995, and still has its headquarters there. It boasts a global presence, with offices located around the world and customers in over 35 countries.
The company’s marketing strategy relies almost entirely on customer recommendations to win new business, which means that its growth depends heavily on the success of its customers. That it has relied on this model successfully for 25 years is, in itself, a strong indication that those customers are well served.
Ab Initio Semantic Discovery
Last Updated: 21st May 2020
Mutable Award: Gold 2020
Ab Initio Semantic Discovery is a data discovery solution offered as part of Ab Initio’s broader data management platform. It provides automated data discovery (including sensitive data discovery) against both structured and unstructured sources (although its capabilities on the latter are somewhat limited). What’s more, it can leverage this discovery process to drive downstream actions and outcomes, such as automated data quality and data masking.
The discovery process in Semantic Discovery begins by profiling your data with Ab Initio’s Data Profiler. Profiling only needs to be done once on a given data set, no matter how many times you run the rest of the process. Semantic Discovery then augments the profile through a classification process, which provides additional analysis that is particularly useful for data discovery.
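To make the "profile once, reuse many times" point concrete, here is a minimal Python sketch. It is not Ab Initio’s implementation or API: the sample data set, the `profile` and `classify` functions and the `ProfileStore` cache are all hypothetical, and caching the classified result alongside the raw profile is an assumption of this sketch rather than something the product documents.

```python
from dataclasses import dataclass, field

# Hypothetical sample data set: field name -> list of raw string values.
DATASETS = {
    "customers": {
        "email": ["ann@example.com", "bob@example.org", "ann@example.com"],
        "age": ["34", "51", "27"],
    }
}


def profile(dataset_id: str) -> dict:
    """Basic per-field statistics, standing in for the profiling step."""
    fields = DATASETS[dataset_id]
    return {name: {"count": len(vals), "distinct": len(set(vals))}
            for name, vals in fields.items()}


def classify(dataset_id: str, prof: dict) -> dict:
    """Augment each field's profile with a coarse class used to choose tests."""
    out = {}
    for name, stats in prof.items():
        vals = DATASETS[dataset_id][name]
        kind = "numeric" if all(v.isdigit() for v in vals) else "text"
        out[name] = {**stats, "class": kind}
    return out


@dataclass
class ProfileStore:
    """Profiles (and classifies) each data set at most once, then reuses it."""
    _cache: dict = field(default_factory=dict)
    profiler_runs: int = 0

    def get(self, dataset_id: str) -> dict:
        if dataset_id not in self._cache:
            self.profiler_runs += 1
            self._cache[dataset_id] = classify(dataset_id, profile(dataset_id))
        return self._cache[dataset_id]


store = ProfileStore()
for _ in range(3):                      # three discovery runs on the same data set
    result = store.get("customers")
print(store.profiler_runs)              # 1
print(result["email"])                  # {'count': 3, 'distinct': 2, 'class': 'text'}
```

The point of the design is simply that the expensive scan is keyed by data set, so later discovery runs (with, say, revised business terms or pattern tests) read the stored profile instead of rescanning the data.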
Next, the product tests your data against four different metrics. Business term matching is a metadata comparison of your field definitions against your business terms, and may include fuzzy matching, abbreviations, and synonyms. Pattern tests compare your data against a selection of recognised patterns and values, using the classification mentioned above to determine which fields to test. Keyword tests look for specified keywords within your data. Finally, fingerprints compare your data against known values. Each metric is used to estimate the likelihood that a given field belongs to a given business term.
These individual estimates are then corroborated against each other to produce a final estimate, which in turn is used to recommend one of four actions for each field: match, where the field unambiguously belongs to a particular business term; recommend, where a probable match has been found but there is enough ambiguity to require human intervention; ignore, where no related term has been found; and investigate, where the results as a whole are highly ambiguous (for example, if several possible terms were found, but each had only a middling likelihood).
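The review does not say how Ab Initio weights or combines the four metrics, so the following Python sketch should be read as one plausible illustration of the idea, not as the product’s algorithm. It assumes each test yields a score between 0 and 1 per field and term, corroborates them with a simple mean over the tests that ran, and maps the result onto the four actions using three made-up thresholds (`MATCH_AT`, `CANDIDATE_AT` and `MARGIN`).

```python
from statistics import mean
from typing import Optional

# Illustrative thresholds only; the product's actual scoring is not described here.
MATCH_AT = 0.85       # combined score needed for an unambiguous match
CANDIDATE_AT = 0.40   # terms scoring below this are treated as unrelated
MARGIN = 0.20         # how far the best term must lead the runner-up


def combine(test_scores: dict[str, float]) -> float:
    """Corroborate the per-test scores (0..1) for one field/term pair.

    Tests that did not apply to the field are simply absent from the dict,
    so the estimate is the mean of the tests that did run.
    """
    return mean(test_scores.values()) if test_scores else 0.0


def recommend_action(term_scores: dict[str, dict[str, float]]) -> tuple[str, Optional[str]]:
    """Map per-term, per-test scores for one field to (action, term)."""
    combined = {term: combine(scores) for term, scores in term_scores.items()}
    candidates = sorted(
        ((score, term) for term, score in combined.items() if score >= CANDIDATE_AT),
        reverse=True,
    )
    if not candidates:
        return "ignore", None                      # no related term found
    best_score, best_term = candidates[0]
    runner_up = candidates[1][0] if len(candidates) > 1 else 0.0
    clear_lead = best_score - runner_up >= MARGIN
    if best_score >= MATCH_AT and clear_lead:
        return "match", best_term                  # unambiguous
    if len(candidates) > 1 and best_score < MATCH_AT and not clear_lead:
        return "investigate", None                 # several middling, competing terms
    return "recommend", best_term                  # probable, but needs a human


fields = {
    "cust_email": {
        "Email Address": {"business_term": 0.9, "pattern": 0.95, "fingerprint": 0.8},
        "Customer Name": {"business_term": 0.3, "pattern": 0.0},
    },
    "dob": {"Date of Birth": {"business_term": 0.6, "pattern": 0.9}},
    "ref_code": {
        "Postcode": {"pattern": 0.5, "fingerprint": 0.5},
        "Account Number": {"pattern": 0.5, "keyword": 0.4},
        "Product Code": {"business_term": 0.45},
    },
    "notes": {"Email Address": {"keyword": 0.1}},
}

for name, scores in fields.items():
    print(name, recommend_action(scores))
```

With these sample scores, `cust_email` is matched outright, `dob` is only recommended because its single candidate falls short of the match threshold, `ref_code` is sent for investigation because three terms compete at middling scores, and `notes` is ignored. Requiring both a high score and a clear lead before auto-matching is what lets several weak signals disagree without producing a false positive.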
The results of the discovery process are accessed via a metadata portal, which lets you act on each field that was examined. For example, if a match was recommended, you can accept it, reject it, or investigate the field further (and potentially specify a different match). Choosing to investigate provides more information on the field in question, including a clickthrough to the Data Profiler view for that field. With the exception of investigate, these actions can be performed in bulk, and they trigger approval workflows when taken.
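The review workflow can be modelled roughly as follows. This Python sketch is illustrative only: `Suggestion`, `ReviewQueue`, the status strings and the list of approval requests are hypothetical stand-ins for the portal and its approval workflow. It captures the two rules described above: accept and reject can be applied to many fields at once and each raises an approval request, while investigate works one field at a time and may propose a different term.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical review model: status names and approval records are illustrative.
RESOLVED_STATUS = {"accept": "accepted", "reject": "rejected"}  # actions allowed in bulk


@dataclass
class Suggestion:
    field_name: str
    term: Optional[str]
    status: str = "pending"


@dataclass
class ReviewQueue:
    items: dict[str, Suggestion]
    approval_requests: list[tuple[str, str, Optional[str]]] = field(default_factory=list)

    def act(self, field_name: str, action: str, new_term: Optional[str] = None) -> None:
        item = self.items[field_name]
        if action == "investigate":
            # Per-field only: the reviewer drills into the field's profile and
            # may propose a different term, which can then be accepted.
            item.status = "investigating"
            if new_term is not None:
                item.term = new_term
            return
        if action not in RESOLVED_STATUS:
            raise ValueError(f"unknown action: {action!r}")
        item.status = RESOLVED_STATUS[action]
        # Each accept/reject also raises an approval request (standing in for a workflow).
        self.approval_requests.append((field_name, action, item.term))

    def bulk(self, field_names: list[str], action: str) -> None:
        if action not in RESOLVED_STATUS:
            raise ValueError(f"{action!r} cannot be applied in bulk")
        for name in field_names:
            self.act(name, action)


queue = ReviewQueue({
    name: Suggestion(name, term)
    for name, term in [("cust_email", "Email Address"),
                       ("dob", "Date of Birth"),
                       ("ref_code", "Postcode")]
})
queue.bulk(["cust_email", "dob"], "accept")
queue.act("ref_code", "investigate", new_term="Account Number")
for s in queue.items.values():
    print(s.field_name, s.status, s.term)
print(len(queue.approval_requests))   # 2
```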
The metadata portal is not just for reviewing your discovery results. Among other things, it can also be used for role management, activity monitoring, and managing the metrics used to discover your data. For the latter, you can manage your domains (the sets of known values used by fingerprints); your pattern tests, together with thresholds that determine how prevalent each pattern must be within a field before it is recognised; your keyword tests; and your business terms. All of these are populated out of the box and are extensible, and each entry within them can be enabled or disabled individually.
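A prevalence threshold on a pattern test can be pictured as follows. The Python sketch below is a hypothetical model, not Ab Initio’s configuration format: the `PatternTest` class, the email regular expression and the rule that empty values are left out of the calculation are all assumptions made for illustration. It shows how a pattern is only recognised once it covers a large enough share of a field, and how a disabled entry never fires.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternTest:
    """One editable entry in a (hypothetical) pattern-test library."""
    name: str
    regex: str
    threshold: float      # share of non-empty values that must match
    enabled: bool = True

    def prevalence(self, values: list[str]) -> float:
        """Fraction of non-empty values that fully match the pattern."""
        non_empty = [v for v in values if v.strip()]
        if not non_empty:
            return 0.0
        hits = sum(1 for v in non_empty if re.fullmatch(self.regex, v.strip()))
        return hits / len(non_empty)

    def recognises(self, values: list[str]) -> bool:
        return self.enabled and self.prevalence(values) >= self.threshold


email = PatternTest("Email address", r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", threshold=0.9)
values = ["ann@example.com", "bob@example.org", "n/a", "cat@example.net", ""]

print(email.prevalence(values))   # 0.75: three of the four non-empty values match
print(email.recognises(values))   # False: 0.75 is below the 0.9 threshold
email.threshold = 0.7
print(email.recognises(values))   # True
email.enabled = False
print(email.recognises(values))   # False: disabled entries never fire
```

Tuning the threshold trades recall for precision: a lower value tolerates dirty data such as the stray `n/a`, while a higher value avoids labelling a free-text field just because it happens to contain a few email addresses.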
Semantic Discovery also provides dashboards for centrally monitoring your discovery processes (as seen in Figure 1), a variety of reporting options such as a decision log, and a visualisation of the relationships between your business terms (see Figure 2), with the option to click through to a detailed view for each term. This detailed view also allows you to flag a term as PII (at varying levels, if necessary) and to define which action, if any, to take when that term is discovered (for example, applying a masking function). If these options are used, any data discovered under that term will be a) flagged as PII and b) acted upon automatically. On the topic of PII, Ab Initio also supports subject access requests via Query>It, its solution for querying distributed data sources. This is important for regulatory compliance (with GDPR, for example).
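The term-level settings described above amount to attaching a PII level and an optional action to each business term, and applying both whenever a field is matched to that term. The following Python sketch is a hypothetical illustration of that behaviour only: `BusinessTerm`, `DiscoveredField`, `apply_term`, the "high" PII level and the toy `mask_email` rule are invented names, and in the product itself the action would be carried out by Ab Initio’s masking engine rather than a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def mask_email(value: str) -> str:
    """Toy masking rule (hypothetical): keep the first character and the domain."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


@dataclass
class BusinessTerm:
    name: str
    pii_level: Optional[str] = None                       # None means "not PII"
    on_discovery: Optional[Callable[[str], str]] = None   # action applied to matched data


@dataclass
class DiscoveredField:
    name: str
    values: list[str]
    term: Optional[str] = None
    pii_level: Optional[str] = None


def apply_term(fld: DiscoveredField, term: BusinessTerm) -> DiscoveredField:
    """Record the match, inherit the term's PII flag and run its action, if any."""
    fld.term = term.name
    fld.pii_level = term.pii_level
    if term.on_discovery is not None:
        fld.values = [term.on_discovery(v) for v in fld.values]
    return fld


email_term = BusinessTerm("Email Address", pii_level="high", on_discovery=mask_email)
country_term = BusinessTerm("Country")   # not PII, no action

emails = apply_term(DiscoveredField("cust_email", ["ann@example.com", "bob@example.org"]), email_term)
countries = apply_term(DiscoveredField("country", ["UK", "DE"]), country_term)
print(emails.pii_level, emails.values)        # high ['a***@example.com', 'b***@example.org']
print(countries.pii_level, countries.values)  # None ['UK', 'DE']
```

Keeping the PII flag and the action on the term, rather than on each field, means a single configuration change covers every field that is later discovered under that term.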
Semantic Discovery has several qualities that recommend it. It provides a number of different metrics for discovering your data, which can be combined to help minimise false positives. The existence of the investigate and recommend categories provides nuance, allowing the product to request human intervention when appropriate. Investigate, in particular, can act as a way to identify undocumented business terms: if a field is flagged for investigation, it will quite often be because it doesn’t fit into any of your existing terms, which may suggest a new one should be created.
Semantic Discovery also benefits from Ab Initio’s wider product suite, and in particular from its close ties to both the Metadata Hub (part of Ab Initio’s solution for data governance) and Ab Initio’s automated masking engine. These ties are what allow Semantic Discovery to act on your data automatically as soon as it is discovered. This is a significant advantage: being able to mask data immediately after discovery is particularly helpful for protecting sensitive data.
The Bottom Line
Ab Initio is a broad and highly regarded platform for managing your data. Semantic Discovery, as part of that platform, is an effective means of discovering the data, and particularly the sensitive data, within your systems.