Analyst Coverage: Philip Howard
Global IDs is a privately funded company established at the beginning of the century (2001). Its initial focus, though this has broadened subsequently, was on Wall Street, hence the company’s location, although software development takes place primarily in India. At the time of writing the company is in the process of establishing a European operation.
Global IDs provides solutions to enterprise information management (EIM) and governance problems and issues, focusing on companies with the largest, most complex and intractable environments, which are typically multi-domain and often span multiple geographies. In particular, the company is targeting what might best be described as “landscape discovery and governance”, by which we mean the discovery, management and governance of large (hundreds or thousands of data sources) IT estates.
The company’s technology suite is extremely broad-ranging, and a large number of individual tools span its solutions, including data profiling, data quality, data governance, and master and reference data management. However, these solutions are not usually implemented for, say, a simple name and address matching and cleansing project: while they could certainly be used for this purpose, in practice it would be overkill. The strength of Global IDs lies in situations of overwhelming complexity, across multiple (and not just a few) data sources. To coin a phrase: “Global IDs refreshes parts of the IT landscape that other vendors cannot reach”.
Global IDs Enterprise Data Automation
Last Updated: 29th March 2021
Mutable Award: Gold 2021
Global IDs does not see data quality as an isolated set of functions. In fact, the company’s methodology is illustrated in Figure 1, and as can be seen, data quality is actually step nine of ten. Specifically, it leverages earlier steps and particularly the discovery, profiling, classification, lineage, and catalog (the platform includes a data catalog) functions to establish effective quality controls, though it can be deployed without all of these previous steps. These all do much as you would expect them to do – or perhaps more – in terms of functionality but what really sets the whole Global IDs environment apart is its emphasis on data management at scale or, as in this case, data quality at scale. As an example of this, one of the company’s clients has two thousand applications, ten thousand databases, ninety thousand schemas, twenty million tables, and two hundred and thirteen million columns under management using Global IDs’ software. And this isn’t even its largest customer, which has five hundred million columns under management.
As a precursor to data cleansing operations, the Global IDs data profiling capability provides the sort of histograms and analysis that are commonplace for these sorts of tools. As far as data quality itself is concerned, the company takes two approaches: rules-based data quality and machine learning-based data quality, with the former applied to single and/or multiple columns and/or rows, while the latter is only used against single columns. Where multiple columns are involved, they are grouped by semantic domain (for example, checking the format and consistency of email addresses across multiple physical columns). More generally, data quality checks are used for completeness, conformity, consistency, integrity, and timeliness. While there is no match engine per se, rules can be used to check the uniqueness of any particular physical row to identify duplicates.
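To make the rules-based checks described above more concrete, the following is a minimal sketch of completeness, conformity and row-uniqueness rules. It is our own illustration and not Global IDs’ implementation: the sample rows, column names, function names and the deliberately simple email pattern are all assumptions.

```python
import re
from collections import Counter

# Hypothetical sample rows; the column names are illustrative only.
ROWS = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example"},
    {"id": 3, "email": None},
    {"id": 1, "email": "ann@example.com"},
]

# Deliberately simple pattern (an assumption), not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, column):
    """Fraction of rows in which the column is populated."""
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def conformity(rows, column, pattern):
    """Fraction of populated values that match the expected pattern."""
    values = [r[column] for r in rows if r.get(column) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def duplicate_rows(rows):
    """Rows that occur more than once, compared across all columns."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    print(completeness(ROWS, "email"))
    print(conformity(ROWS, "email", EMAIL_PATTERN))
    print(duplicate_rows(ROWS))
```

In this sketch each rule returns a score or a list of offending rows, which is the sort of output that could feed a dashboard; consistency, integrity and timeliness checks would follow the same shape.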
Global IDs provides a domain rule generator. Figure 2, for instance, shows profiled results for UK postal codes.
In the domain rule generator, you select the datatype (see Figure 3), then the format (length plus pattern) by similarly clicking the appropriate box, and finally a word analysis, again based on the profiled results. The rule is then generated for you.
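As an illustration of how a format rule might be derived from profiled results, the sketch below reduces each value to a character-class shape (which captures both length and pattern), counts the shapes, and then builds a single rule from the shapes a user selects. This covers only the format step, not the datatype or word analysis steps, and it is an assumption about the general technique rather than a description of the Global IDs generator; the function names and sample values are hypothetical.

```python
import re
import string
from collections import Counter


def shape(value):
    """Reduce a value to its shape: A = upper-case letter, 9 = digit.

    Any other character (spaces, punctuation, lower case) is kept literally.
    """
    out = []
    for ch in value:
        if ch in string.ascii_uppercase:
            out.append("A")
        elif ch in string.digits:
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_shapes(values):
    """Count how often each shape occurs in the profiled values."""
    return Counter(shape(v) for v in values)


def rule_from_shapes(shapes):
    """Build one anchored regular expression accepting every selected shape."""
    def to_regex(s):
        parts = []
        for ch in s:
            if ch == "A":
                parts.append("[A-Z]")
            elif ch == "9":
                parts.append("[0-9]")
            else:
                parts.append(re.escape(ch))
        return "".join(parts)

    alternatives = "|".join(to_regex(s) for s in sorted(shapes))
    return re.compile("^(?:" + alternatives + ")$")


if __name__ == "__main__":
    # Hypothetical profiled values for a postal code column.
    profiled = ["SW1A 1AA", "M1 1AE", "B33 8TH", "M1 1AE", "12345"]
    print(profile_shapes(profiled).most_common())
    # The user "ticks the boxes" for the shapes that are valid.
    rule = rule_from_shapes(["AA9A 9AA", "A9 9AA", "A99 9AA"])
    print(bool(rule.match("EC1A 1BB")), bool(rule.match("12345")))
```

The selection step is left to a person here, mirroring the box-ticking described above; a generator could equally propose the most frequent shapes by default.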
As far as machine learning is concerned, although its data quality use is confined to single columns it can be used in conjunction with classification to, for example, determine that a particular domain contains sensitive data. When running machine learning for data quality the software will make predictions about expected values along with confidence levels.
Other notable facilities include reconciliation analyses that compare source and target records. Needless to say, the company provides a data quality dashboard so that you can monitor data quality over time.
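A reconciliation analysis of the kind mentioned above can be sketched as a keyed comparison between source and target records. The following is a minimal illustration under our own assumptions (records as dictionaries, a single unique key column, hypothetical field names); it is not taken from the product.

```python
def reconcile(source, target, key):
    """Compare source and target records that share a unique key column."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    shared = src.keys() & tgt.keys()
    return {
        # Present in the source but never arrived in the target.
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        # Present in the target with no corresponding source record.
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        # Present in both, but at least one field differs.
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }


if __name__ == "__main__":
    # Hypothetical records; "id" and "balance" are illustrative names.
    source = [{"id": 1, "balance": 100}, {"id": 2, "balance": 250}, {"id": 3, "balance": 75}]
    target = [{"id": 1, "balance": 100}, {"id": 2, "balance": 205}, {"id": 4, "balance": 10}]
    print(reconcile(source, target, "id"))
```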
We have made much of the fact that Global IDs is focused on data management at scale. However, we should note that what the company means by “at scale” is an order of magnitude greater than what most other vendors mean when they talk about scalability. For instance, Global IDs will currently admit that its row-based data quality is not “at scale” to the extent that it would like. But it’s probably already comparable to other tools in the market. However, running at scale causes other problems because it makes visualisation, in particular, difficult. The company therefore partners with Neo4j so that it can represent relationships in the data as a graph, as illustrated in Figure 4.
Aside from that, the major point in Global IDs’ favour is that it sees, and supports, data quality as part of a chain of data management requirements rather than as a stand-alone function. We concur with this view, but it does mean that you are unlikely to select Global IDs purely for data quality purposes; rather, you would choose it as part of a more holistic solution.
The Bottom Line
Global IDs focuses on the most intractable and complex data management problems, typically in the largest enterprises with huge volumes of data. We are not aware of any other vendor that has this singular focus. If your company falls into this category, Global IDs should at least be on your short list of potential providers.
Global IDs landscape discovery and governance
Last Updated: 26th February 2016
While individual parts of the Global IDs portfolio can be used to address any individual data quality or data governance issue, the company's primary focus is on understanding large data landscapes in the first instance and, in the second, taking those management and governance issues on board. For example, major mergers and acquisitions often involve very large numbers of data sources, in both companies, and trying to understand the relationships - both consistencies and inconsistencies - that cross corporate boundaries, and then managing and migrating those relationships, is the sort of complex management issue that Global IDs targets. Similar problems also arise in very large enterprises, even leaving aside mergers and acquisitions, where thousands or even tens of thousands of database instances may be in place and any sort of rationalisation must start with an understanding of that data landscape prior to implementing data quality, master data management or governance processes.
Complex landscapes such as these contain vast amounts of redundant data that describe real world things a business cares about (or once cared about but no longer does). The core problem Global IDs seeks to solve is to help firms make sense of what data exists, what it is about, and how accurate it is, so that they are able to begin systematically and efficiently weeding out the parts of their landscapes which are causing them the most pain. While there are many individual tools and products, from a variety of companies that can be used to start to address these issues, the cost of using these techniques tends to escalate to the point where it is no longer economical (or prudent, because of the risks involved) to tamper with the status quo. As a result, these landscapes continue to expand over time in increasingly complex ways. What Global IDs aims to do is to cut this Gordian knot by making landscape discovery and governance a practical proposition.
As noted, the company targets the world's largest organisations across all verticals. While Global IDs has its own sales force it also works with systems integrators. Partners in this area include Cap Gemini, Cognizant and others. In addition, the company has a number of notable partnerships with other technology vendors including EMC, Pitney Bowes, Acxiom, Red Hat, Cray and SAP.
The company has customers in the Financial Services, Healthcare, Pharmaceuticals, Telecom and Retail sectors. None of its clients are publicly named but the names of some can easily be deduced from their descriptions, such as "one of the world's largest providers of both mobile telephony and fixed telephony. This company is an icon of its industry and can trace its foundational roots back to over 125 years ago" and "one of the world's leading retail giants is an American public corporation that runs a chain of large, discount department stores. It has the largest number of stores, supercentres and neighbourhood markets in the US."
The basis of Global IDs' 'landscape discovery and governance' is that you iteratively profile all of your data sources to discover the relationships that exist across them. Very few other data profiling tools were designed from the outset with this sort of capability, and none at the scale that Global IDs supports. In this latter context, Global IDs supports an elastic computing model designed to scale to very large environments with many data sources.
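One common way to discover relationships across data sources is to compare the sets of values held in columns from different sources and flag pairs with a high overlap as candidate relationships. The sketch below illustrates that general idea only; we are not suggesting that this is how Global IDs does it, and the source names, column names, overlap measure and threshold are all assumptions.

```python
from itertools import combinations


def candidate_relationships(columns, threshold=0.8):
    """Return cross-source column pairs whose values largely overlap.

    columns maps (source, column) to an iterable of profiled values.
    Overlap is the shared distinct values divided by the size of the
    smaller distinct set.
    """
    value_sets = {name: set(values) for name, values in columns.items()}
    edges = []
    for a, b in combinations(sorted(value_sets), 2):
        if a[0] == b[0]:
            continue  # only relationships that cross source boundaries
        smaller = min(len(value_sets[a]), len(value_sets[b]))
        if smaller == 0:
            continue
        overlap = len(value_sets[a] & value_sets[b]) / smaller
        if overlap >= threshold:
            edges.append((a, b, round(overlap, 2)))
    return edges


if __name__ == "__main__":
    # Hypothetical profiled columns from two sources.
    profiled = {
        ("crm", "customer_id"): [1, 2, 3, 4],
        ("billing", "cust_no"): [2, 3, 4, 5, 6],
        ("billing", "amount"): [10, 20, 30],
    }
    print(candidate_relationships(profiled, threshold=0.7))
```

Each returned tuple can be read as an edge between two columns. Note that a pairwise comparison like this grows quadratically with the number of columns, which hints at why doing this across thousands of sources is a hard engineering problem.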
Another major focus of Global IDs is automation. The company sees this as critical to the success of understanding and managing large data landscapes and its technology is based on semantic principles (for example, recognising that a client is a customer, whatever these things are called in other languages, and so on). Of course, the implementation of automation is an on-going process.
In addition, when you try to govern large data landscapes one of the problems that you will encounter is that there is so much information to explore and manage that it is difficult to visualise the environment using traditional techniques. Global IDs' approach to this problem is to store the semantics it captures in a graph database (the company embeds the Titan distributed graph database) so that you can explore inter-relationships using graph technology which, in our view, gives Global IDs a significant competitive advantage. This doesn't mean that it is ever going to be simple to visualise large, complex landscapes but, in our opinion, the use of graphs is the best starting point even if this remains a work in progress.
Of course, discovery across the landscape is only stage one. Typically you are doing this because you want to rationalise across multiple systems, implement master data management, consolidate database systems or implement data governance. You might also want to do this if you have appointed a Chief Data Officer and want to know about all relevant sources of data for analytic purposes. Whatever the case, there will certainly be additional data management functions that are required and Global IDs offers relevant capabilities for these tasks also.
Global IDs provides extensive consulting and support services, which are necessary in the sort of complex environments with which it is dealing. The company offers customer service managers, 24/7 support, a support portal, and an agile development and release process to ensure that fixes and features are rolled out quickly, as well as personalised roll-out support for new versions when they become available.
Global IDs Sensitive Data Discovery
Last Updated: 21st May 2020
Mutable Award: Gold 2020
Global IDs offers data discovery and classification as part of its Enterprise Data Automation (EDA) platform, thereby providing sensitive data discovery and compliance with regulations such as GDPR and CCPA. A number of other relevant capabilities, such as data lineage, are also available, as are enterprise-wide visualisations of your entire sensitive data landscape (see Figure 1). What’s more, as a Global IDs product, the platform has been designed to provide all of this at scale, regardless of the size of your ecosystem.
The product supports a wide range of data sources and file formats, as you would expect from a product designed to support large and therefore often highly varied ecosystems. In particular, it provides support for both relational and NoSQL databases, the latter most notably including MongoDB and Cassandra but, in principle, any data source that can be resolved into a columnar structure. Mainframes are also supported, as is Amazon S3, while unstructured support extends to text files and emails.
For sensitive data discovery, Global IDs leverages the profiling, classification, and lineage capabilities of its EDA platform. This allows it to categorise your data using classification rules. In turn, this process identifies personal data as well as the individual that it corresponds to. Moreover, Global IDs supports its classification rules with semantic tagging (by column) and machine learning capabilities, enabling it to automatically adjust to your system and become more accurate over time. This is particularly relevant for larger ecosystems, since manually creating appropriate classification rules and keeping them up to date will be difficult and time consuming. Automating this process is therefore highly valuable. Notably, Global IDs also leverages disambiguation and validation as part of its process for discovering personal data, which can help to eliminate false associations.
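To illustrate what a column-level classification rule might look like, the sketch below tags a column with a sensitive data category when a sufficient share of its sampled values match a pattern. It shows only the rules-based part of the approach, without semantic tagging, machine learning, disambiguation or validation, and the rule set, tag names, patterns and threshold are our own assumptions rather than anything supplied by Global IDs.

```python
import re

# Hypothetical rule set; both patterns are deliberately simple.
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values, rules=RULES, threshold=0.9):
    """Return (tag, score) for the best-matching rule, or None.

    The score is the fraction of populated values matching the rule;
    a tag is only assigned if the score reaches the threshold.
    """
    populated = [v for v in values if v]
    if not populated:
        return None
    best = None
    for tag, pattern in rules.items():
        score = sum(1 for v in populated if pattern.match(v)) / len(populated)
        if score >= threshold and (best is None or score > best[1]):
            best = (tag, score)
    return best


if __name__ == "__main__":
    print(classify_column(["ann@example.com", "bob@example.org", None]))
    print(classify_column(["123-45-6789", "987-65-4321"]))
    print(classify_column(["hello", "world"]))
```

Maintaining a rule set like this by hand across thousands of sources is exactly the burden that the automation described above is intended to remove.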
In addition to the discovery of sensitive data, Global IDs also allows you to search for and track all personal information related to a specific individual within your system, going so far as to create a data privacy report to that effect, as shown in Figure 2. This is both highly useful for Data Subject Access Requests (DSARs) and particularly impressive when you consider that the view provided is enterprise-wide: in a large, possibly pan-global data ecosystem, it is likely that data relating to an individual will end up distributed far and wide across your system, and Global IDs allows you to reconsolidate that information by providing a centralised view of it.
The product also provides full data traceability and data lineage, allowing you to see where a given individual’s data is being used within your system. Global IDs does this by generating hypotheses regarding where said data is being used. This is based on both human and machine input, in the form of surveys for the former and machine learning for the latter, although the product can be configured to prioritise either of these input methods and therefore weigh either human or machine input more highly. Subsequently, these hypotheses can be validated against your actual system and either confirmed or rejected accordingly.
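The idea of weighting human against machine input can be sketched as follows. The article does not say how the product combines the two, so the weighted average used here, along with the function names, parameters and status labels, is purely an assumption for illustration.

```python
def combined_confidence(human, machine, human_weight=0.5):
    """Blend survey-based and machine-learned confidence (both 0.0 to 1.0).

    human_weight is a hypothetical setting: 1.0 trusts surveys only,
    0.0 trusts the machine only. A weighted average is assumed.
    """
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must be between 0.0 and 1.0")
    return human_weight * human + (1.0 - human_weight) * machine


def validate(hypothesis, observed_usages):
    """Confirm or reject a hypothesis against what the system actually shows.

    hypothesis is a (data_element, consuming_system) pair and
    observed_usages is a set of such pairs found in the real landscape.
    """
    return "confirmed" if hypothesis in observed_usages else "rejected"


if __name__ == "__main__":
    # Hypothetical example: surveys are certain, the model is not.
    print(combined_confidence(human=1.0, machine=0.0, human_weight=0.75))
    observed = {("customer.email", "marketing_db")}
    print(validate(("customer.email", "marketing_db"), observed))
    print(validate(("customer.email", "payroll_db"), observed))
```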
Complementary capabilities, such as data masking, are provided through partnerships. Most prominently, Global IDs has recently partnered with Delphix, a leading vendor in the test data management space, in order to leverage the substantial data masking and FPE (Format-Preserving Encryption) functionality within its platform.
Finally, to aid in understanding the geography of regulations like GDPR, Global IDs provides an interactive map of the world that displays each country alongside the regulations that apply there. For the USA specifically, there is a separate map that does the same on a state-by-state basis. These views are shown in Figures 3 and 4.
The foremost reason to use Global IDs is its capacity for automation, which both reduces human resources costs and allows it to scale to the size of any enterprise. With regard to scalability, there are few, if any, vendors that can compete. In fact, Global IDs is able to operate on huge numbers of data sources, which may themselves be geographically distributed, and not only locate personal data across these sources, but associate that data with a specific individual (who can then be searched for to return this data) and locate where in your system it is being used. These are essential capabilities for GDPR compliance – suppose a customer asks, as is their right, for all of the information you are storing on them – and without Global IDs you will be hard-pressed to provide them at such scales.
Global IDs also supports a much broader range of data sources than many competing products in the space, including mainframes, relational databases, multiple types of NoSQL, S3, and others. This is a notable capability in a space which, for the most part, has very limited database support. Although support for one or two of these ranges of technology is common, supporting all of them within a single product is much less so.
The Bottom Line
Global IDs provides highly automated sensitive data discovery that scales to the size of your enterprise. If you have a very large data ecosystem, it should be your first port of call, but even if you don’t, it is still well worth your consideration.