Global IDs is a privately funded company established at the beginning of the century (2001). Its initial focus, though this has since broadened, was on Wall Street, hence the company’s location, although software development takes place primarily in India. At the time of writing the company is in the process of establishing a European operation.
Global IDs provides solutions to enterprise information management (EIM) and governance problems and issues, focusing on companies with the largest, most complex and intractable environments, which are typically multi-domain and often span multiple geographies. In particular, the company is targeting what might best be described as “landscape discovery and governance”, by which we mean the discovery, management and governance of large (hundreds or thousands of data sources) IT estates.
The company’s technology suite is extremely broad ranging and there are a large number of individual tools that span its solutions, including data profiling, data quality, data governance, and master and reference data management. However, its solutions are not usually implemented for, say, a simple name and address matching and cleansing project: while they could certainly be used for this purpose, in practice it would be overkill. The strength of Global IDs lies in environments of overwhelming complexity, spanning multiple (and not just a few) data sources. To coin a phrase: “Global IDs refreshes parts of the IT landscape that other vendors cannot reach”.
Global IDs Data Lineage
Last Updated: 31st August 2022
The Global IDs platform provides a variety of enterprise information management capabilities, with an evident emphasis on data privacy. This includes data lineage, data profiling, data cataloguing, data quality, data discovery, and data classification, among other things. Additional capabilities, such as data masking, are available via partners. For the purposes of this report, the company’s data lineage solution is of particular interest, but we also examine parts of its (sensitive) data discovery and related capabilities, as they form an essential part of the lineage story.
In general, the platform’s data discovery and lineage capabilities allow you to find the personal and other critical data in your system, track its movements throughout said system at the level of both the individual record and the system as a whole, and then prove that you are doing so via visualisation and report generation. This is of significant benefit for both data privacy and regulatory compliance. Specific supported regulations include GDPR and CCPA, among others, but note that the platform is able to address privacy issues in a general sense and is not limited to any particular set of compliance mandates.
What’s more, the product can do all of this on a massive scale with a high level of automation and accuracy – as is typical for Global IDs – and it supports a wide range of data sources and file formats, as you would expect from a product designed to support large and often highly varied ecosystems. In particular, it provides support for both relational and NoSQL databases. The latter most notably includes MongoDB and Cassandra, as well as – in principle – any data source that can be resolved into a columnar structure. Mainframes are also supported, as is Amazon S3, while unstructured support extends to text files and emails.
“Global IDs Data Lineage was a lifesaver in helping us to meet audit and regulatory requirements. Their solution is highly automated and provided insight to data movement and transformations we were unable to determine before using their tool. Reports were easily exported enabling us to share proof and evidence and the ability to monitor and detect changes was instrumental in building trust in our data.”
Head of Data Governance, Financial Institution
“The auto-discovery of data and its profiling is a miracle. The Machine Learning Algorithms for the data profiling are well executed. The overall workflow of the tool is exceptional and well thought out.”
C-Level, Vendor Selection and Purchasing
Global IDs uses machine learning (ML) driven classification and semantic tagging (with corresponding semantic domains) to automatically scan and categorise your data, including personal data as well as other critical data (perhaps relating to operational processes or decision making). Disambiguation and validation are provided, helping to eliminate false associations, and discovery results can be viewed through the platform’s data catalogue and/or its data privacy dashboard. This process can be leveraged to retrieve all personal information related to a specific individual – and to generate a data privacy report to that effect, potentially in response to a Data Subject Access Request (DSAR) – but more pertinently, it forms the foundation of Global IDs’ data lineage capability.
To wit, off the back of this discovery process the product provides full data lineage and traceability functionality, allowing you to see where a given individual’s data is used within your system. It does this by generating hypotheses regarding where said data is being (or has been) used, based on both human and machine input. It is thus able to recommend probable flows that can be confirmed or rejected by your users by validating them against your actual system. You can also scan ETL flows and stored procedures in order to determine how they move data around, and even stitch together lineage flows manually if you so desire.
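The hypothesise-then-validate workflow described above can be pictured with a small sketch. This is not Global IDs’ implementation or API: the `LineageEdge` class, the `status` values, and the table names are all hypothetical, chosen only to illustrate how candidate flows might be confirmed or rejected by users and then traversed.

```python
from dataclasses import dataclass


@dataclass
class LineageEdge:
    source: str
    target: str
    # Hypothetical states: "proposed", "confirmed", "rejected".
    status: str = "proposed"


def downstream(edges, start):
    """Return every node reachable from `start` via confirmed edges only."""
    graph = {}
    for e in edges:
        if e.status == "confirmed":
            graph.setdefault(e.source, []).append(e.target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Candidate flows, as a discovery process might propose them (made-up names).
edges = [
    LineageEdge("crm.customers", "staging.customers"),
    LineageEdge("staging.customers", "dw.dim_customer"),
    LineageEdge("crm.customers", "hr.employees"),  # a false association
]

# A user validates each hypothesis against the actual system.
edges[0].status = "confirmed"
edges[1].status = "confirmed"
edges[2].status = "rejected"

reachable = downstream(edges, "crm.customers")
```

Only confirmed edges contribute to the traversal, so a rejected hypothesis never appears in the resulting lineage.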
The end result of this process is both a high-level visualisation of the flow of data within your system as a whole – shown in Figure 1 – as well as a series of lower-level visualisations that show the movement of each (and all) of your individual records as they move between tables, applications, and any other part of your system. Both views are important aspects of data lineage, and of modern regulatory compliance. Moreover, the platform supports automated report generation for the lineage of a record in much the same way as it does for the personal data relating to a specific individual. Combined with the visualisations mentioned above, this should serve as adequate proof of your ability to trace personal data, and thus ensure compliance with GDPR, CCPA, and any other relevant regulations (at least as far as lineage is concerned). What’s more – and rather uniquely – all of this lineage information can be viewed in virtual reality (VR), as shown in Figure 2.
It is also worth demonstrating the real-world applicability of the lineage capability offered by Global IDs, and highlighting some of its features in that context. Consider, for example, the use of data lineage as part of financial operations: reporting, regulatory compliance, reconciliation, and so forth. Lineage in general is useful here for all of the reasons we’ve already described: it allows you to trace the flow of your data at all levels, which naturally feeds into effective reporting; it contributes to regulatory compliance, as already discussed; and so on. Global IDs specifically can take a financial report and map all of the data represented in it back to its original source, track records back to individual consumers in order to engage in anti-money laundering monitoring, and more. Note that this is not a comprehensive list of Global IDs’ capabilities, or even its capabilities as they pertain to finance, but rather a brief elucidation of how the platform can be applied to this particular regulatory context. Other contexts can benefit in similar ways.
The most obvious triumph of Global IDs’ data lineage (and indeed, Global IDs in general) is in its ability to operate at extremely large scales, encompassing huge numbers of data sources – which may be geographically distributed – and extracting comprehensive lineage information from even the largest systems. Moreover, the platform supports a broad range of data sources, including mainframes, relational databases, multiple types of NoSQL, S3, and others, making it eminently suitable for enterprise deployment.
That said, Global IDs’ lineage capability has more going for it than just scalability, although that alone makes for a substantial differentiator. The degree of automation offered is also a strong point, for instance, as is the ability to generate lineage information based (at least partially) on human input and feedback. Moreover, the aforementioned automation and scalability advantages combine to allow Global IDs to generate validated lineage information from even highly complex data flows with aplomb.
The end result is that by employing Global IDs you can learn how your data is flowing through your system and provide proof to that effect when regulators and auditors ask for it. The importance of being able to both understand and prove you understand your system really cannot be overstated when it comes to regulatory compliance, and these are some of the core values that Global IDs provides through its data lineage. Moreover, the increased understanding of data flow that its data lineage offers can also be beneficial in general: for generating robust impact analyses, or for resolving data quality issues, for example.
The Bottom Line
Global IDs offers useful, compliant data lineage that is suitable for even the largest and most complex enterprise systems. If that’s you – and perhaps even if it’s not – there is every reason to check it out.
Global IDs Enterprise Data Automation
Last Updated: 29th March 2021
Mutable Award: Gold 2021
Global IDs does not see data quality as an isolated set of functions. In fact, the company’s methodology is illustrated in Figure 1, and as can be seen, data quality is actually step nine of ten. Specifically, it leverages earlier steps and particularly the discovery, profiling, classification, lineage, and catalog (the platform includes a data catalog) functions to establish effective quality controls, though it can be deployed without all of these previous steps. These all do much as you would expect them to do – or perhaps more – in terms of functionality but what really sets the whole Global IDs environment apart is its emphasis on data management at scale or, as in this case, data quality at scale. As an example of this, one of the company’s clients has two thousand applications, ten thousand databases, ninety thousand schemas, twenty million tables, and two hundred and thirteen million columns under management using Global IDs’ software. And this isn’t even its largest customer, which has five hundred million columns under management.
As a precursor to data cleansing operations, the Global IDs data profiling capability provides the sort of histograms and analysis that are commonplace for these sorts of tools. As far as data quality itself is concerned the company takes two approaches: rules-based data quality and machine learning based data quality, with the former applied to single and/or multiple columns and/or rows, while the latter is only used against single columns. In the case of multiple columns these are categorised by semantic domain (for example, checking the format and consistency of email addresses across multiple physical columns). More generally, data quality checks are used for completeness, conformity, consistency, integrity, and timeliness. While there is no match engine per se, rules can be used to check the uniqueness of any particular physical row to identify duplicates.
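To make the rules-based checks more concrete, the following is a minimal sketch of three of them (completeness, conformity, and row uniqueness). It is an illustration only, not the product’s rule engine: the function names, the deliberately simple email pattern, and the sample data are all assumptions.

```python
import re

# A deliberately simple email pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(values):
    """Fraction of values that are present (neither None nor empty)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v not in (None, "")) / len(values)


def conformity(values, pattern):
    """Fraction of non-missing values that match the given pattern."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return 0.0
    return sum(1 for v in present if pattern.match(v)) / len(present)


def duplicate_rows(rows):
    """Return rows that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for row in rows:
        if row in seen and row not in dupes:
            dupes.append(row)
        seen.add(row)
    return dupes


emails = ["a@example.com", "", "not-an-email", "b@example.org"]
rows = [("1", "Ann"), ("2", "Bob"), ("1", "Ann")]

email_completeness = completeness(emails)
email_conformity = conformity(emails, EMAIL_PATTERN)
dupes = duplicate_rows(rows)
```

Applying the same conformity check to every physical column tagged with a given semantic domain would correspond to the multi-column case described above.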
Global IDs provides a domain rule generator. Figure 2, for instance, shows profiled results for UK postal codes.
In the domain rule generator, you select the datatype – see Figure 3 – then the format (length plus pattern), again by clicking the appropriate box, and finally a word analysis, again based on the profiled results. The rule is then generated for you.
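As a rough illustration of how a format rule can be derived from profiled values, the sketch below reduces each sample to a “shape” (letters become `A`, digits become `9`) and builds a regular expression that accepts the shapes seen often enough. It covers only the format step, not datatype selection or word analysis, and it is an assumption about how such a generator could work rather than a description of the Global IDs tool. The `shape` and `generate_rule` names and the `min_share` threshold are hypothetical.

```python
import re
from collections import Counter


def shape(value):
    """Map a value to its shape: letters become 'A', digits become '9'."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value
    )


def generate_rule(samples, min_share=0.1):
    """Build a regex accepting every shape seen in at least min_share of samples.

    Assumes upper-case values; letters are matched as [A-Z].
    """
    counts = Counter(shape(s) for s in samples)
    kept = [s for s, n in counts.items() if n / len(samples) >= min_share]
    parts = []
    for s in sorted(kept):
        parts.append("".join(
            "[A-Z]" if c == "A" else "[0-9]" if c == "9" else re.escape(c)
            for c in s
        ))
    return re.compile("^(?:" + "|".join(parts) + ")$")


# Sample postal codes standing in for profiled results.
profiled = ["SW1A 1AA", "M1 1AE", "B33 8TH", "CR2 6XH", "DN55 1PT"]
rule = generate_rule(profiled)
```

Because each shape fixes both the length and the character classes, a single rule of this kind checks length and pattern together.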
As far as machine learning is concerned, although its data quality use is confined to single columns it can be used in conjunction with classification to, for example, determine that a particular domain contains sensitive data. When running machine learning for data quality the software will make predictions about expected values along with confidence levels.
Other notable facilities include reconciliation analyses that compare source and target records. Needless to say, the company provides a data quality dashboard so that you can monitor data quality over time.
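A reconciliation analysis of this kind can be sketched in a few lines. The example below is a generic illustration, not the product’s own logic: the `reconcile` function, its output keys, and the sample records are hypothetical.

```python
def reconcile(source, target, key):
    """Compare two lists of dict records, matched on the given key field."""
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    return {
        # Records that never arrived in the target.
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        # Records in the target with no counterpart in the source.
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        # Records present on both sides whose contents differ.
        "mismatched": sorted(
            k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
        ),
    }


source = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},
    {"id": 4, "amount": 10},
]

result = reconcile(source, target, "id")
```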
We have made much of the fact that Global IDs is focused on data management at scale. However, we should note that what the company means by “at scale” is an order of magnitude greater than what most other vendors mean when they talk about scalability. For instance, Global IDs will currently admit that its row-based data quality is not “at scale” to the extent that it would like. But it’s probably already comparable to other tools in the market. However, running at scale causes other problems because it makes visualisation, in particular, difficult. The company therefore partners with Neo4j so that it can represent relationships in the data as a graph, as illustrated in Figure 4.
Aside from that, the major point in Global IDs’ favour is that it sees, and supports, data quality as part of a chain of data management requirements rather than as a stand-alone function. We concur with this view, but it does mean that you are unlikely to select Global IDs purely for data quality purposes; you are more likely to choose it as part of a more holistic solution.
The Bottom Line
Global IDs focuses on the most intractable and complex data management problems, typically in the largest enterprises with huge volumes of data. We are not aware of any other vendor that has this singular focus. If your company falls into this category, Global IDs should at least be on your short list of potential providers.
Global IDs landscape discovery and governance
Last Updated: 26th February 2016
While individual parts of the Global IDs portfolio can be used to address any individual data quality or data governance issue, the company's primary focus is on understanding large data landscapes in the first instance and, in the second, taking those management and governance issues on board. For example, major mergers and acquisitions often involve very large numbers of data sources, in both companies, and trying to understand the relationships - both consistencies and inconsistencies - that cross corporate boundaries, and then managing and migrating those relationships, is the sort of complex management issue that Global IDs targets. Similar problems also arise in very large enterprises, even leaving aside mergers and acquisitions, where thousands or even tens of thousands of database instances may be in place and any sort of rationalisation must start with an understanding of that data landscape prior to implementing data quality, master data management or governance processes.
Complex landscapes such as these contain vast amounts of redundant data that describe real world things a business cares about (or once cared about but no longer does). The core problem Global IDs seeks to solve is to help firms make sense of what data exists, what it is about, and how accurate it is, so that they are able to begin systematically and efficiently weeding out the parts of their landscapes which are causing them the most pain. While there are many individual tools and products, from a variety of companies that can be used to start to address these issues, the cost of using these techniques tends to escalate to the point where it is no longer economical (or prudent, because of the risks involved) to tamper with the status quo. As a result, these landscapes continue to expand over time in increasingly complex ways. What Global IDs aims to do is to cut this Gordian knot by making landscape discovery and governance a practical proposition.
As noted, the company targets the world's largest organisations across all verticals. While Global IDs has its own sales force it also works with systems integrators. Partners in this area include Cap Gemini, Cognizant and others. In addition, the company has a number of notable partnerships with other technology vendors including EMC, Pitney Bowes, Acxiom, Red Hat, Cray and SAP.
The company has customers in the Financial Services, Healthcare, Pharmaceuticals, Telecom and Retail sectors. None of its clients are publicly named but the names of some can easily be deduced from their descriptions, such as "one of the world's largest providers of both mobile telephony and fixed telephony. This company is an icon of its industry and can trace its foundational roots back to over 125 years ago" and "one of the world's leading retail giants is an American public corporation that runs a chain of large, discount department stores. It has the largest number of stores, supercentres and neighbourhood markets in the US."
The basis of Global IDs' 'landscape discovery and governance' is that you iteratively profile all of your data sources to discover the relationships that exist across those data sources. There are very few other data profiling tools that were designed from the outset with this sort of capability and none at the scale that Global IDs is supporting. In this latter context, Global IDs supports an elastic computing model designed to scale to support very large environments with many data sources.
Another major focus of Global IDs is automation. The company sees this as critical to the success of understanding and managing large data landscapes, and its technology is based on semantic principles (for example, recognising that a client is a customer, whatever these things are called in other languages). Of course, the implementation of automation is an on-going process.
In addition, when you try to govern large data landscapes one of the problems that you will encounter is that there is so much information to explore and manage that it is difficult to visualise the environment using traditional techniques. Global IDs' approach to this problem is to store the semantics it captures in a graph database (the company embeds the Titan distributed graph database) so that you can explore inter-relationships using graph technology which, in our view, gives Global IDs a significant competitive advantage. This doesn't mean that it is ever going to be simple to visualise large, complex landscapes but, in our opinion, the use of graphs is the best starting point even if this remains a work in progress.
Of course, discovery across the landscape is only stage one. Typically you are doing this because you want to rationalise across multiple systems, implement master data management, consolidate database systems or implement data governance. You might also want to do this if you have appointed a Chief Data Officer and want to know about all relevant sources of data for analytic purposes. Whatever the case, there will certainly be additional data management functions that are required and Global IDs offers relevant capabilities for these tasks also.
Global IDs provides extensive consulting and support services, which are necessary in the sort of complex environments with which it is dealing. The company offers customer service managers, 24/7 support, a support portal, an agile development and release process to ensure fixes and features get rolled out quickly, and personalised roll-out support for new versions when they become available.
Global IDs Sensitive Data Discovery
Last Updated: 8th December 2022
Mutable Award: Gold 2022
Global IDs offers data discovery and classification as part of DEEP, its Data Ecosystem Evolution Platform. It thereby provides sensitive data discovery and compliance with regulations such as GDPR and CCPA. Four primary capabilities are provided: discovering sensitive data in general, scanning for data associated with a particular individual, locating individuals from an address (a CCPA-specific capability), and creating privacy reports that prove compliance. A number of other relevant capabilities, such as data lineage, are available, as are enterprise-wide visualisations of your entire sensitive data landscape. What’s more, as a Global IDs product, it has been designed to do all of this at scale, regardless of the size of your ecosystem.
The product supports a wide range of data sources and file formats, as you would expect from a product designed to support large and therefore often highly varied ecosystems. In particular, it provides support for both relational and NoSQL databases, the latter most notably including MongoDB and Cassandra but, in principle, any data source that can be resolved into a columnar structure. Mainframes are also supported, as is Amazon S3, while unstructured support extends to text files and emails.
For sensitive data discovery, Global IDs leverages its data profiling, classification, and lineage capabilities, as seen in Figure 1. In turn, this process identifies personal data as well as the individual that it corresponds to. For data classification, the product heavily (and increasingly) leverages semantic tagging and machine learning to identify columns and tables containing sensitive data, backed up by more traditional methods such as rule and pattern-based matching. The idea is that it will look at each field holistically – including both associated metadata and the values of the data itself – and make a classification recommendation, which can either be accepted or rejected. Either way, this user feedback will be incorporated into its underlying model, enabling it to automatically adjust to your system and become more accurate over time.
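The idea of combining metadata with data values to produce a classification recommendation can be sketched with simple rule and pattern matching. This is a toy illustration under stated assumptions, not the machine learning model the product uses: the `RULES` table, the `recommend` function, and the equal weighting of column name and values are all hypothetical, and the feedback loop is not modelled.

```python
import re

# Hypothetical rules: a column-name hint and a value pattern per privacy domain.
RULES = {
    "email": (
        re.compile(r"e-?mail", re.I),
        re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    ),
    "us_ssn": (
        re.compile(r"ssn|social", re.I),
        re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    ),
}


def recommend(column_name, values):
    """Return (domain, score) for the best-scoring domain, or (None, 0.0)."""
    best = (None, 0.0)
    for domain, (name_hint, value_pattern) in RULES.items():
        # Evidence from metadata: does the column name suggest this domain?
        name_score = 1.0 if name_hint.search(column_name) else 0.0
        # Evidence from the data: what share of values fit the pattern?
        value_score = (
            sum(1 for v in values if value_pattern.match(v)) / len(values)
            if values else 0.0
        )
        # Equal weighting is an arbitrary choice made for this sketch.
        score = 0.5 * name_score + 0.5 * value_score
        if score > best[1]:
            best = (domain, score)
    return best


domain, score = recommend(
    "contact_email", ["a@example.com", "b@example.org", "n/a"]
)
```

A user would then accept or reject the recommended domain; in a learning system that decision would be fed back into the model.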
This is particularly relevant given the size of ecosystem that Global IDs will generally be working with: with such a large system, creating appropriate classification rules and keeping them up to date will be difficult and time consuming. Automating this process is therefore highly valuable. Notably, Global IDs also leverages disambiguation and validation as part of its process for discovering personal data, which can help to eliminate false associations.
In addition to the discovery of sensitive data, Global IDs also allows you to search for and track all personal information related to a specific individual within your system, going so far as to create a data privacy report to that effect, as shown in Figure 2. The reports themselves can be produced and accessed through a privacy dashboard, with any sensitive values contained therein masked automatically. They are highly useful for addressing Data Subject Access Requests (DSAR), and particularly impressive when you consider the scale that the product operates on: across a large, possibly pan-global data ecosystem, it is likely that data relating to an individual will end up distributed far and wide across your system. Global IDs allows you to reconsolidate that information by providing a centralised view of it.
As an extension to the above, the platform is designed to help you create a comprehensive view of your (sensitive) data landscape. This includes the creation of privacy domains (which is to say, types of personal information), as well as “semantic objects” that consist of multiple such domains. These objects can then be treated as singular entities for the purposes of, say, search, allowing you to locate specific groupings of personal information across your enterprise.
The product also provides full data traceability and data lineage, allowing you to see where a given individual’s data is being used within your system. It does this by generating hypotheses regarding where said data is being (or has been) used, based on both human and machine input. It is thus able to recommend probable flows that can be confirmed or rejected by your users by validating them against your actual system. Further complementary capabilities, such as data masking, are provided through partnerships.
Finally, to aid in understanding the geography of regulations like GDPR, Global IDs provides an interactive map of the world that displays each country alongside the regulations that apply there. For the USA specifically, there is a separate map that does the same on a state-by-state basis. The first of these views is shown in Figure 3.
The foremost reason to use Global IDs is its capacity to operate at extremely large scales. As with many of the spaces it operates in, there are few, if any, vendors that can compete in this regard. Global IDs is able to operate on huge numbers of data sources, which may themselves be geographically distributed, and not only locate personal data across these sources, but associate that data with a specific individual (who can then be searched for to return this data) and locate where in your system it is being used. These are essential capabilities for GDPR compliance – suppose a customer asks, as is their right, for all of the information you are storing on them – and without Global IDs you will be hard pressed to provide them at such scales. Moreover, it is worth noting that privacy regulation requires both compliance and convincing proof of said compliance. Global IDs’ reporting functionality can be instrumental for the latter.
Global IDs also supports a much broader range of data sources than many competing products in the space, including mainframes, relational databases, multiple types of NoSQL, S3, and others. This is a notable capability in a space which, for the most part, has very limited database support. Although support for one or two of these ranges of technology is common, supporting all of them within a single product is much less so.
The Bottom Line
If you need sensitive data discovery at very large scale, Global IDs should be your first port of call. Even at somewhat less extreme scale, it is well worth looking at.