Informatica
Last Updated:
Analyst Coverage: Daniel Howard
Informatica was founded in 1993 as a services company specialising in helping its customers migrate to a client/server environment. It was not until 1996 that it introduced its first product, Informatica PowerMart, which was followed by Informatica PowerCenter in 1998. The following year the company floated on NASDAQ, but in 2015 it went private through a leveraged buyout in which Permira and the Canada Pension Plan Investment Board acquired Informatica for $5.3bn; Microsoft and Salesforce were also investors. In the year prior to the acquisition the company had revenues in excess of $1bn and net income of over $100m.
Since going private, Informatica has not only been aggressive in introducing new products but has also been transitioning away from an on-premises, traditional licensing model to a more cloud-based, subscription-oriented approach. In a number of the markets it serves, Informatica is also having to evolve from a company that traditionally marketed itself to IT and technical departments, to one that is more involved at business levels. This is arguably more difficult to do than a transition to the cloud.
Axon Data Governance became part of the Informatica portfolio when the company acquired the original developer, Diaku, in early 2017. In the time since, Informatica has worked diligently to integrate it with other Informatica products, particularly Informatica Data Quality, Enterprise Data Catalog and Secure@Source.
Informatica Axon Data Governance (July 2020)
Last Updated: 14th July 2020
Mutable Award: Gold 2020
Informatica Axon Data Governance provides browser-based, business-level access to a variety of enterprise-grade, highly automated and democratised data governance capabilities. These capabilities are tightly coupled with Informatica Enterprise Data Catalog, Informatica Data Quality and Informatica Data Privacy Management (formerly Secure@Source), which operate downstream of Axon Data Governance but seamlessly integrate with it using shared metadata-driven intelligence, as well as CLAIRE, Informatica’s Enterprise Unified Metadata Intelligence Engine. Taken together, these products offer a complete, unified, and intelligent solution for data governance.
Customer Quotes
“Having an automated, integrated solution from Informatica is making a difference in our data governance program – because you cannot manage what you cannot see.”
L.A. Care Health Plan
“Informatica helps us tackle data governance and management in new and more effective ways, giving us the tools to win more business and retain our existing customers.”
AIA Singapore
Axon Data Governance allows you to view and manage a variety of business and data assets within a single location, including data sets, business terms, policies, processes, and so on. A contextual graph search is provided, as are automated workflows, approval processes, and a variety of dashboards at both the system and local levels (one of which is displayed in Figure 1).
Assets come equipped with a variety of information, notably including connections with your other assets. This includes explicit associations, applicable rules and policies, impact analysis, and data lineage. For policies, this also includes their position in your overall policy hierarchy. The product thus provides a connected view of your overall system. Relevant stakeholders are also highlighted, including both direct stakeholders as well as the broader stakeholder community, and discussion features are provided to facilitate collaboration. The product also offers a view into your technical metadata, held within Enterprise Data Catalog, and allows you to create data sets within Axon Data Governance directly from that metadata.
Data lineage in Axon Data Governance is business-oriented and available at multiple levels. It is displayed visually, and can be filtered, explored, and so forth on the fly. You can also overlay a variety of metadata, such as data quality and risk, onto your lineage view. This is shown in Figure 2. Corresponding technical lineage is available within the Enterprise Data Catalog.
The product’s data discovery capabilities are quite extensive, leveraging CLAIRE to automatically sort your data into a variety of ‘Smart Domains’, a number of which are provided out of the box. CLAIRE can also be used for intelligent business term association, tagging relevant data assets with business terms based on data discovery rules equipped to each term, and thus connecting your business and technical assets.
For data quality, natural language processing is used via CLAIRE to automatically generate new (or recommend existing) data quality rules based on plain English descriptions of your quality requirements. Data quality checks are automated – in particular, newly ingested data is checked automatically – as is reporting. Several categories of data quality are also offered, allowing for nuanced quality measurements.
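To make the idea of categorised, automated quality checks concrete, the sketch below groups simple rules by category and reports a pass rate per category. This is an illustrative reconstruction of the general technique, not Informatica’s rule syntax or API; all names are hypothetical:

```python
import re

# Hypothetical data quality rules, grouped by category. Each rule takes a
# record (a dict) and returns True when the check passes.
RULES = {
    "completeness": [
        ("email_present", lambda rec: bool(rec.get("email"))),
    ],
    "validity": [
        ("email_format", lambda rec: bool(
            re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", rec.get("email", "")))),
    ],
}

def score_records(records):
    """Return the pass rate per category, enabling nuanced quality measurement."""
    scores = {}
    for category, rules in RULES.items():
        checks = [rule(rec) for _, rule in rules for rec in records]
        scores[category] = sum(checks) / len(checks)
    return scores

records = [{"email": "jane@example.com"}, {"email": "not-an-email"}, {}]
print(score_records(records))  # completeness passes 2/3, validity passes 1/3
```

Newly ingested records would simply be fed through the same scoring function, which is what makes the automated, on-ingest checking described above feasible.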
Several features are provided to support data privacy and regulatory compliance. In addition to the data discovery described above, the product offers sensitive data discovery and classification across structured and unstructured data, support for subject access requests, and the ability to track policy violations and understand and assess risk. Policies can be managed just as any other asset, and can also be associated to business terms and automatically enforced on correspondingly tagged data assets. Integration with Informatica Data Privacy Management also allows your policies to drive risk analysis and privacy monitoring (for example, alerting) inside Axon Data Governance.
Finally, Axon Data Governance provides data democratisation via Informatica Axon Data Marketplace, a new, embedded feature. The Axon Data Marketplace solution allows data owners to publish their data directly from Axon Data Governance, as well as manage and track access to it. For data consumers, it provides a means to search for data – which is organised into meaningful business categories – and request access to it centrally via a checkout process. Axon Data Marketplace also provides a significant degree of automation, automatically notifying data owners of incoming requests, checking requests against existing policies, and delivering the relevant data when a request is approved.
Axon Data Governance provides a notably broad set of data governance capabilities, which are extended even further by its integration with Enterprise Data Catalog and Data Privacy Management. What’s more, these capabilities are, in general, highly automated, often owing to shared metadata-driven intelligence and other functionality provided by CLAIRE. In effect, Informatica provides a complete and often intelligent solution for data governance.
In fact, separating out business concerns within Axon Data Governance from technical concerns in Enterprise Data Catalog and Data Privacy Management provides its own benefits, by offering experiences and views tailored specifically for business and technical users, respectively. The privacy compliance and risk monitoring provided by Informatica Data Privacy Management, and especially the ability to bring the results of that monitoring back into Axon Data Governance for consumption by your business users, is a particular strength.
Axon Data Marketplace also provides significant advantages. By acting as a centralised ‘one stop shop’ for data approval, it makes it much easier for data consumers to find and request access to the data they need. Likewise, the Axon Data Marketplace benefits data owners by providing a single location to manage access requests, by automating much of the approval process, and by automatically delivering data assets to your data consumers upon the completion of said process. In short, it makes the lives of your data owners and consumers simpler and easier, allowing them to spend more of their time and energy on other concerns, such as creating value from the data within your organisation.
The Bottom Line
Informatica Axon Data Governance is the keystone for Informatica’s intelligent data governance solution. Said solution is eminently integrated and complete, highly automated and scalable, and well worth your consideration.
Informatica Axon Data Governance (November 2018)
Last Updated: 7th November 2018
Mutable Award: Gold 2018
Axon Data Governance is a data governance product that supports, among other things, data stewardship, data quality and data monitoring. In support of data governance, it provides a collaborative business environment for both business and IT, as well as the capability to support and monitor compliance with specified regulations.
One of Axon’s primary goals is to facilitate understanding of the data within your organisation. To this end, it provides data quality and compliance dashboards (one of which is demonstrated in Figure 1) that monitor all of your governed assets, as well as more detailed assessments that are contextualised to each individual asset. Business lineage data is available, and workflow driven change requests and policy design and creation are supported.
More broadly, Axon Data Governance is positioned as the core of the Informatica Enterprise Data Governance Solution, a complete data governance solution. It’s supported by (and integrates with) other products in the platform, including Informatica Data Quality, Enterprise Data Catalog, and Secure@Source. These products provide additional functionality – such as data cataloguing, data privacy, and, of course, the company’s well-known data profiling and cleansing capabilities – that combine with Axon to form a complete data governance solution that enables data-driven digital transformation and collaboration.
Customer Quotes
“Choosing Axon is more about the functionality than the cost and, in any case, the costs are dwarfed by the benefits (or words to that effect).”
Interviews conducted by Bloor Research with a major publisher, a major pharmaceutical and a major payment provider
“In comparison to other tools, we chose Axon for its data governance management capabilities. Anyone can use it to focus on the business impact of data. You’ll be able to get a business driven, global vision that’s defined in partnership with technology and enforced via the data governance team. All of that lends itself to removing the silos and increasing collaboration between business units and the IT department.”
McGraw-Hill Education
Axon Data Governance provides access to the data in your system via the aforementioned data quality dashboard. The same dashboard exposes metadata attributes, business terms, processes, and so on, with an emphasis on modelling your data specifically as it relates to your business. Furthermore, it is easy to access any underlying physical data via integration with Enterprise Data Catalog, and this access is provided via a multi-dimensional search across your entire system, either globally or for a particular type of asset. In addition to the dashboard, a detailed data quality assessment is available for each data asset, complete with business lineage data that is displayed visually. You can also make change requests through this interface. If your request is approved, a workflow is triggered to coordinate the relevant stakeholders in implementing the necessary changes, as well as to document any changes made.
Another view within Axon Data Governance is related to data privacy and compliance, which allows you to view privacy information on data assets in more detail. It lets you see precisely how your data has been used, accessed and reported on. As with many tools of this nature, this means that you can both ensure that you are complying with a regulation and prove that you are doing so.
Axon Data Governance also provides a significant level of support for the creation and implementation of policies, particularly those oriented around data quality and data privacy. Specifically, it allows you to design and build a policy hierarchy, as illustrated in Figure 2, beginning with an abstract description of your policy, and ending with concrete business rules that can be executed on your data assets. Policies will typically be organised by project although, of course, reuse is possible. Projects may be purely internal or they may be related to regulations such as the General Data Protection Regulation (GDPR). Axon also has significant graphical capabilities (backed by a graph database) in this area, so that you can visually explore relationships both across policies and between policies and projects.
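Conceptually, a policy hierarchy of this kind is a graph: abstract policies decompose into sub-policies and, ultimately, into concrete, executable rules. The sketch below illustrates the idea with a simple traversal; the node names and structure are hypothetical, not Axon’s actual data model:

```python
# Hypothetical policy hierarchy: each node maps to its children, and leaf
# nodes represent concrete, executable business rules.
hierarchy = {
    "GDPR Compliance": ["Data Minimisation", "Right to Erasure"],
    "Data Minimisation": ["rule: drop unused PII columns"],
    "Right to Erasure": ["rule: purge subject on request"],
}

def concrete_rules(policy):
    """Walk the hierarchy from an abstract policy down to its executable rules."""
    children = hierarchy.get(policy, [])
    if not children:
        return [policy]  # a leaf: a concrete rule
    rules = []
    for child in children:
        rules.extend(concrete_rules(child))
    return rules

print(concrete_rules("GDPR Compliance"))
# ['rule: drop unused PII columns', 'rule: purge subject on request']
```

Representing policies this way is also what makes the visual, graph-database-backed exploration of relationships across policies and projects natural.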
Axon Data Governance models data assets as they relate to the business, making both your data, and the relationships within it, much easier to understand and utilise, particularly for non-technical users. This enables self-service, data discovery and collaboration, which in turn allows everyone in your organisation to effectively contribute to data governance. In our opinion, collaboration on this scale is necessary to achieve effective data governance at the enterprise level. In turn, data governance is increasingly becoming a necessity to promote data quality and ensure regulatory compliance. Indeed, the 2016 Big Data Maturity Survey concluded that “accessibility, security and governance have become the fastest growing areas of concern year-over-year, with governance growing most at 21%.” More generally, the issues that surround poor data quality are well known, and the Gartner Group estimates that bad data can cost large organisations up to $14.2 million per year.
The Bottom Line
Axon Data Governance provides data quality, data privacy and data governance capabilities within a single solution. By itself, it is very capable. However, combined with other products in the Informatica Intelligent Data Platform, it forms the crux of an extremely thorough, well-integrated, and complete data governance solution.
Informatica Data Privacy Management
Last Updated: 21st May 2020
Mutable Award: Gold 2020
Informatica Data Privacy Management is a data-centric privacy, governance and security solution that is focused on discovering and classifying sensitive data to understand how it moves around the organisation, where it may be located from a geographical perspective, who owns the data, and which people and processes access that data. In short, it manages privacy risks in a comprehensive, integrated solution. The product shares a common metadata platform with Informatica Enterprise Data Catalog (EDC) as well as the Informatica Axon data governance offerings. These solutions have many of the same cataloguing capabilities as Informatica Data Privacy Management. Informatica also provides format preserving encryption as well as both static and dynamic data masking, which are used to protect sensitive data. The company also offers encrypted archival capabilities. For consent management, you can use the company’s MDM offering for consent mastering, and Informatica also partners with both OneTrust and TrustArc.
Specifically for discovering sensitive data, the product supports relational databases in the cloud or on-premises, applications such as Salesforce and SAP R/3, Amazon S3, ETL processes (limited to Informatica, Microsoft SSIS and Cloudera Navigator and Atlas), file systems, and both SharePoint and OneDrive. However, unstructured and NoSQL data sources are less of a primary focus area – it is not alone in this. For example, it is limited to supporting Hive and HDFS at present, though the company plans to support Cassandra in version 6.0 (the current release is 5.1), as well as BigQuery.
Customer Quotes
“Before we embarked on this journey, we didn’t have a clear view of sensitive data. Now, with Informatica, we can see and manage our entire universe of information. This capability is a game-changer, and it’s enabling us to take a proactive approach to data protection that is helping to strengthen customer trust in our services.”
Financial Services Company
Considered holistically, Informatica’s approach is that you start by creating actionable data privacy policies (via integration with Axon – see Figure 1 – from which policies can be overlaid into Informatica Data Privacy Management). Then, you discover and classify your sensitive data; uncover and map “identities” to data (which can be used to support data subject rights requests under regulations such as the GDPR and CCPA); analyse the risks posed by sensitive data so that you can prioritise your protection plans; protect the data (masking and other techniques); respond to rights and consent requests; and, finally, track and report on all of this.
The facilities for discovering sensitive data, which may be run against samples of the data if required, are extensive and can be automated using ML and AI. You can match on the metadata via patterns, regular expressions and rules, and you can also introspect SQL – both SQL queries and any SQL used for data movement purposes – though not stored procedures. Distance constraints (for example, a post code needs to be near a city name) can be used, and you can define white (always sensitive) and black (never sensitive) lists. In the latest version, the underlying engine can make recommendations about what should be in these lists. Leveraging primary/foreign key relationships is planned for the next release. For unstructured data the product uses AI to look for parts of speech and otherwise relies heavily on the use of reference data. When potentially sensitive data is discovered, you can set your system up to automatically agree that it is sensitive, that it is not sensitive, or that it needs human validation, according to thresholds that you define.
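The pattern-plus-context approach described above can be sketched in a few lines: a regular expression finds candidate matches, a distance constraint requires supporting context nearby, and allow/deny lists override the result. This is an illustrative reconstruction of the general technique, not Informatica’s implementation; all names and thresholds are hypothetical:

```python
import re

# Hypothetical sensitive-data detector combining a pattern, a distance
# constraint, and white/black lists, as described in the text.
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")
CITIES = {"London", "Leeds", "Bristol"}
ALWAYS_SENSITIVE = {"national_insurance_no"}   # 'white list' columns
NEVER_SENSITIVE = {"record_id"}                # 'black list' columns

def is_sensitive(column, text, max_distance=40):
    """Flag a value as sensitive via lists, then pattern + distance constraint."""
    if column in ALWAYS_SENSITIVE:
        return True
    if column in NEVER_SENSITIVE:
        return False
    m = UK_POSTCODE.search(text)
    if not m:
        return False
    # Distance constraint: the postcode must appear near a known city name.
    return any(abs(text.find(city) - m.start()) <= max_distance
               for city in CITIES if city in text)

print(is_sensitive("address", "Flat 2, London SW1A 1AA"))            # True
print(is_sensitive("notes", "ref code SW1A 1AA, no location given")) # False
```

The same structure accommodates the engine’s list recommendations: anything it learns simply moves columns into (or out of) the two lists, subject to the human-validation thresholds mentioned above.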
The use of identity mapping is interesting because it allows you to support rights requests: for example, where is Philip’s data? To discover and map identities, the product uses fuzzy matching, and it ships with various pre-built classification policies such as PCI, GDPR and so forth. This is augmented by support for domains (name, email address and so forth), which are provided out of the box and can be combined.
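Fuzzy matching for identity mapping can be illustrated with a standard edit-similarity measure: spelling variants of the same name score highly, while different names do not. The sketch below uses Python’s `difflib` purely to demonstrate the idea; it is not Informatica’s matching engine, and the records are invented:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalised edit-similarity between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_identity(query, records, threshold=0.8):
    """Return records whose name fuzzily matches the requested identity."""
    return [rec for rec in records
            if similarity(query, rec["name"]) >= threshold]

records = [
    {"name": "Philip Howard", "table": "crm.customers"},
    {"name": "Phillip Howard", "table": "billing.accounts"},
    {"name": "Daniel Howard", "table": "crm.customers"},
]
matches = find_identity("Philip Howard", records)
print([m["table"] for m in matches])  # both 'Philip' spelling variants match
```

A rights request ("where is Philip’s data?") then reduces to collecting the tables returned for that identity, which is exactly the behaviour the product’s identity mapping supports at scale.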
Finally, we should mention the risk scoring (see Figure 2). In addition to providing risk analytics and key performance indicators there is also risk simulation planning. This allows you to see the impact of using different approaches to, say, masking.
While sensitive data isn’t only about personal data, it is issues over complying with new privacy regulatory mandates that are driving the market for sensitive data discovery and the subsequent protection of that data. Informatica is well-known as a market leader in the data management space and this is where its strength lies. The company has very strong credentials when it comes to structured data and it has focused its discovery capabilities in this area, where it has significant strengths. We particularly like the company’s support for managing identities, which makes a lot of sense within the context of GDPR, CCPA and similar regulations to determine data access.
Conversely, Informatica has respectable rather than comprehensive capabilities when it comes to discovery in unstructured environments. But, and this is a big but, companies that have focused on discovery for unstructured data tend to have very limited structured capability, typically limited to just Oracle and SQL Server. Most large enterprises are not limited to just these providers which would mean having two different sensitive data discovery solutions which, to our minds, does not represent any sort of solution.
The Bottom Line
Organisations should be aiming to have a single solution for sensitive data discovery that enables a data privacy governance strategy across a global enterprise. Organisations with multiple heterogeneous database implementations, as well as file systems, that they need to secure, would do well to shortlist Informatica as one of only a few companies that offers significant structured data discovery along with unstructured support.
Informatica Intelligent Data Platform
Last Updated: 3rd March 2021
Mutable Award: Platinum 2021
The Informatica Intelligent Data Platform is a platform for – as the name suggests – intelligently managing your data. It is designed to help you leverage data across your enterprise to provide strategic business value, and particularly so as part of digital and/or cloud transformations. Between the various solutions available through the platform, its overall capabilities are extremely broad, encompassing data integration, data quality, data governance, master data management, data cataloguing, data privacy, application integration, and more. All of these capabilities are underpinned by unified metadata management and connectivity layers, and CLAIRE, Informatica’s AI and machine learning intelligence engine, which drives automation and guides user experience throughout the platform. The platform itself is cloud-native, supports multi-cloud, hybrid and serverless deployments, and leverages microservices and APIs as core parts of its architecture. Its overall architecture can be seen in Figure 1.
A number of Informatica products (such as Cloud Data Integration, Cloud Data Quality, Informatica Axon Data Governance and Informatica Enterprise Data Catalog) are available through the Intelligent Data Platform, and these products form the heart of the platform’s cloud data management functionality. Taken together, these products provide a great breadth of capability, and what’s more, many (if not all) of them are best of breed within their own areas. They are tightly integrated within the platform, thanks in part to a shared metadata management layer and the AI-powered automation provided by CLAIRE. For lack of space, we refrain from commenting on these products further, except to note that each of them could support a paper similar to this one in its own right, and that several of them already have.
The platform boasts a wide variety of native connectivity options via its connectivity layer. Informatica claims that it will support any type of data and any integration pattern, connecting to any source and any target in any way. To put it more concretely, over 220 native connectors are available. The connectivity layer itself can better be thought of as ‘connectivity as a service’: it provides a common connectivity interface for a variety of connectivity options, notably including CLAIRE-enhanced ‘rich’ metadata. In addition, the layer comes with the security and performance capabilities you would generally expect, including authentication and authorisation mechanisms, partitioning, pushdown optimisation, and so on.
CLAIRE itself enables a wide variety of AI and machine learning driven automation capabilities throughout the Intelligent Data Platform (and therefore throughout your data pipeline). Considered by itself, Informatica likes to describe CLAIRE’s unified metadata foundation as “cloud-native metadata as an operating system”, and this is not without merit: CLAIRE consists of a core metadata “kernel” that provides open access to all of your enterprise metadata across both Informatica and non-Informatica applications, as well as several services that sit on top of and utilise that kernel. In point of fact, there are four of them: the metadata knowledge graph, which is automatically created by scanning critical metadata and relationships across your data assets, and generates enhanced understanding of your data; the cross platform discovery service, which provides metadata intelligence for data management derived across your entire platform; the active metadata service, which enables pervasive metadata availability across data applications; and the customer advisory service, which leverages continuous product feedback to provide in-product recommendations.
These core services are used by Informatica to automate a variety of data management related tasks, which Informatica describes collectively as “Intelligent Data Management Automation”. As before, we lack the space to go over each use case in any amount of detail, but some of these CLAIRE powered automation capabilities can be seen in Figure 2 (as can the architecture of CLAIRE as a whole). Note also that CLAIRE is, among other things, open and extensible: API based access is provided to both the underlying kernel and the overlaying services. Therefore, the list of use cases in Figure 2 is extensible as well, both by Informatica and, at least in principle, by your own organisation.
The reason you should care about the Informatica Intelligent Data Platform is the same as for any data platform: because it provides you with integrated, unified access to the products available within it. In Informatica’s case, these products are of generally high quality (and as already mentioned, many are best of breed) and, taken as a whole, offer comprehensive data management capabilities.
But if this were the only reason to care about the platform – if it were simply a collection of other products – then it would scarcely be worth writing about. In truth, the shared capabilities the platform offers – the metadata management layer, the connectivity layer, and CLAIRE – all add significant value, making the platform more than just the sum of its parts. In particular, a running theme across all of Informatica’s individual data management products is automation (and often AI-driven automation, at that), and the reason it is able to be so prevalent is that it exists at the platform level via CLAIRE. CLAIRE allows the Intelligent Data Platform and the products within it to provide automation at every stage of your data management lifecycle, as evidenced by the examples listed in Figure 2. Suffice it to say that enabling this level of automation is a very significant advantage.
The Bottom Line
The Informatica Intelligent Data Platform grants you access to an integrated, cloud-native, comprehensive and, above all, highly automated portfolio of data management products.
Informatica MDM
Last Updated: 25th September 2014
Informatica, a leader in data integration, has strong offerings for master data management. Its flagship product was originally based on the Siperian customer data integration technology, but is now a full multi-domain MDM product. Informatica has strengthened its product data capability through the acquisition of Heiler, a specialist in the mastering of product data.
Informatica MDM is noted for its high performance and scalability, particularly for high-volume customer data implementations, but it has a broad range of functionality, including support for data governance. It has, for some years, had some of the leading data quality technology on the market, a key element of any successful master data implementation. The company offers a broad platform, covering data integration and data quality as well as MDM.
Informatica focuses on large enterprises and public sector bodies. Known for its strong penetration for MDM in the pharmaceutical market, it now has a wide range of master data implementations across a range of industries. Its strong US presence is now complemented by growing customer deployments in Europe and Asia.
Informatica has some very large master data implementations, with customers including Thomson Reuters, UBS and Harrods. It has a large presence in the healthcare industry, particularly in North America, with customers such as Blue Shield.
The Informatica MDM technology has three distinct editions. Its flagship product has a high-performance master data hub that is based on relational database technology. Its product data hub currently has a separate database but is able to share data with the core product, and there is another technology for its cloud offering, which focuses on Salesforce CRM. These products can co-exist and work together.
Informatica has its own, highly functional, data quality technology. This allows master data records to be validated at source, enriched where needed, and deduplicated across different customer source systems. The technology has support for data governance, providing workflow and reports for data stewards. The flagship MDM technology is noted for its high scalability, and has some of the largest production implementations of high-volume customer master data in the market.
Informatica has a substantial services organisation, which, for example, has an offering to assist customers with building a quantified business case for MDM. It also partners with a wide range of systems integrators, both global and local, in order to ensure that customer implementations are successful.
Informatica Stream Processing
Last Updated: 14th December 2021
Informatica offers stream processing as part of Intelligent Data Management Cloud (IDMC), a broad, cloud-ready, and well-integrated data management platform. IDMC also includes a number of other services, a unified metadata and AI layer (CLAIRE), and over 10,000 metadata-aware connectors that cover all three major public clouds (among other things). In addition, Informatica has partnered with Datumize in order to maximise its compatibility with IoT. The platform also offers comprehensive cloud capabilities and is available on a consumption-based pricing model.
IDMC provides a single architecture (and user experience) for data ingestion – whether via streaming, batch, or whatever else – that leads into solutions for stream processing and data integration (see Figure 1). Features relevant to streaming include real-time ingestion, mass ingestion, automated handling of schema and data drift, and a Kappa messaging architecture.
Customer Quotes
“Informatica Cloud Mass Ingestion allowed us to generate hundreds of mappings in a very short time. It’s a straightforward, secure bridge from source to target, which is exactly what we need. We don’t require a VPN in order to maintain data security.”
University of New Orleans
“Informatica Cloud Mass Ingestion is so easy to use that it saves us 90 percent of the ETL effort. I can just open a browser and access it anytime, anywhere.”
University of New Orleans
Informatica’s solution for stream processing consists of several different products and services that combine within the singular platform of IDMC. The backbone of this solution consists of three services: Cloud Mass Ingestion, High Performance Messaging for Distributed Systems, and Data Engineering Streaming. Respectively, these provide streaming ingestion, high-speed messaging, and stream processing. Other Informatica offerings, such as Cloud Data Integration and Enterprise Data Catalog, can then add to this core. Taken as a whole, IDMC lets you ingest streaming data and move it to wherever it needs to be, while processing, transforming, and governing it as necessary.
More specifically, Cloud Mass Ingestion provides format-agnostic data movement and mass data ingestion, including file transfer, CDC (Change Data Capture) and exactly-once database replication. It also offers mass streaming ingestion from a variety of sources, complete with real time monitoring, alerting, and lifecycle management.
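Exactly-once replication over a delivery channel that may redeliver events is typically achieved by making change application idempotent. The sketch below illustrates that general idea for a CDC-style apply loop; it is a hypothetical reconstruction of the technique, not Informatica’s mechanism:

```python
# Hypothetical change-data-capture apply loop: each change event carries a
# unique id, and replaying a duplicate event has no effect (idempotency),
# which is what makes 'exactly-once' replication achievable even when the
# underlying channel only guarantees at-least-once delivery.
target = {}
applied_ids = set()

def apply_change(event):
    if event["id"] in applied_ids:
        return  # duplicate delivery: skip, preserving exactly-once semantics
    applied_ids.add(event["id"])
    if event["op"] == "upsert":
        target[event["key"]] = event["value"]
    elif event["op"] == "delete":
        target.pop(event["key"], None)

events = [
    {"id": 1, "op": "upsert", "key": "cust-42", "value": "Jane"},
    {"id": 1, "op": "upsert", "key": "cust-42", "value": "Jane"},  # redelivered
    {"id": 2, "op": "delete", "key": "cust-42"},
]
for e in events:
    apply_change(e)
print(target)  # {} – the redelivered upsert was ignored, the delete applied
```

The key design point is that deduplication state (`applied_ids` here) must be persisted atomically with the target data in a real system, otherwise a crash between the two writes reintroduces duplicates.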
High Performance Messaging for Distributed Systems is what it says on the tin: a performant messaging system boasting “ultra-low latency”, targeted at distributed systems. In addition, it provides high resiliency and guaranteed message delivery. Alternatively, you could use Kafka, or another messaging service, via the connectivity options that Informatica provides. For instance, Cloud Data Integration can gather and load in batch data from Kafka directly, with an “at least once” delivery guarantee. Enterprise Data Catalog can also scan Kafka deployments in order to extract relevant metadata (message structure, for instance).
Finally, Data Engineering Streaming is a continuous event processing engine built on Spark Streaming that is designed to handle big data. It supports both batch and streaming data, and it features out of the box connectivity to various messaging sources, as well as no-code visual development (shown in Figure 2). As part of the latter, it provides hundreds of prebuilt transformation functions, connectors and parsers. You can also pipe in your own code, or build your own functions and whatnot using Informatica’s business rules builder. Essentially, it allows you to enrich your streaming data in real time. This could mean improving data quality, masking sensitive data, aggregation, or what have you.
IDMC also supports Spark Structured Streaming, which can be important if you want to aggregate streaming data based on event time (not processing time) and hence reorder data that has arrived out of order before delivering it to your data target. It also supports Confluent Schema Registry, which can be used to parse Kafka messages, retrieve message structure, and handle schemas as they change and grow.
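The event-time-versus-processing-time distinction can be made concrete with a small sketch. This is a simplified, pure-Python illustration of the watermarking idea behind Spark Structured Streaming (the real API uses `withWatermark` and `window` on a streaming DataFrame); the function and parameter names here are hypothetical:

```python
from collections import defaultdict

# Illustrative event-time windowing with a watermark: events are assigned
# to windows by when they *happened*, not when they *arrived*, and events
# arriving later than the allowed lateness are dropped.

def window_counts(events, window_size, max_lateness):
    """Count events per event-time window, honouring a watermark."""
    counts = defaultdict(int)
    watermark = 0
    for event_time, _payload in events:
        watermark = max(watermark, event_time - max_lateness)
        if event_time < watermark:
            continue  # arrived beyond the allowed lateness; discarded
        window_start = (event_time // window_size) * window_size
        counts[window_start] += 1
    return dict(counts)

# Events arrive out of order; event time decides the window regardless.
events = [(0, "a"), (7, "b"), (3, "c"), (12, "d"), (1, "e")]
counts = window_counts(events, window_size=5, max_lateness=10)
```

The watermark is the trade-off dial: a larger allowed lateness reorders more data correctly, at the cost of holding state (and delaying results) for longer.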
Moreover, you can use CLAIRE to augment your solution with machine learning. For example, to automatically detect and generate schemas. This is particularly beneficial in that it allows you to discover and rectify data (and schema) drift. Informatica also provides a ready-made integration pipeline for data science and machine learning which helps you to apply machine learning and AI models to your streaming flows.
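Schema drift detection can be illustrated simply: infer a schema from incoming records and flag fields that appear, disappear, or change type. CLAIRE’s actual machine-learning-driven inference is far more sophisticated; this hypothetical sketch only shows the shape of the problem:

```python
# Illustrative schema inference and drift detection over dict-shaped records.

def infer_schema(record):
    """Map each field to the name of its inferred type."""
    return {field: type(value).__name__ for field, value in record.items()}

def detect_drift(baseline, record):
    """Compare a record's inferred schema against a baseline schema."""
    current = infer_schema(record)
    return {
        "added": set(current) - set(baseline),
        "removed": set(baseline) - set(current),
        "type_changed": {f for f in set(current) & set(baseline)
                         if current[f] != baseline[f]},
    }

baseline = infer_schema({"id": 1, "amount": 9.99, "region": "EU"})
drift = detect_drift(baseline, {"id": 2, "amount": "9.99", "channel": "web"})
```

Catching drift like this automatically matters in streaming because a silently changed upstream schema otherwise corrupts every downstream consumer.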
In addition, Informatica is keenly aware of the need to govern streaming data, and the company’s suite of data governance products is available for this purpose. Data cataloguing, preparation, discovery, lineage, and visualisation are all available, and have been designed to promote self-service and collaboration. Security features, like masking, authentication and access control, are also available, and real-time job monitoring, analytics, and visualisation are provided via the Operational Insights service.
Informatica provides a high-quality, comprehensive stream processing solution that is positioned as just one part of a much larger, and broader, integrated data platform. Moreover, it is highly compatible with the cloud, and includes native cloud ingestion; it provides a broad range of connectivity, exemplified by the sheer quantity of connectors and scanners provided; and it offers a unified user experience, regardless of whether you’re deploying in-cloud or on-prem, which data sources you’re using, whether you’re using it for streaming or batch processing, and so on. The latter is particularly important, in that it allows you to abstract out much of the underlying complexity of stream processing, thus enabling your business users to work that much more efficiently and effectively.
We are also impressed by IDMC’s ability to transform streaming data in real time, especially regarding its substantial number of built-in transformations. This is an area where Informatica’s breadth of capability really shines, by allowing you to combine high-end data quality and masking with stream processing. Moreover, Informatica’s data governance and security solutions combine with streaming in much the same way, with similar benefits.
The Bottom Line
Informatica Data Management Cloud offers highly capable and well-integrated stream processing as part of its overall data management functionality.
Informatica Test Data Management (2019)
Last Updated: 18th June 2019
Mutable Award: Gold 2019
Informatica Test Data Management (TDM) is, as the name suggests, Informatica’s solution for test data management. It offers a number of features, including data subsetting, static data masking, synthetic data generation, a test data warehouse and a self-service portal. Dynamic data masking – usually used with production data – is available but requires a separate license.
TDM can act on a range of data sources that span relational, NoSQL, cloud and mainframe databases. Most recently, Informatica has added connectivity support for VSAM and IMS mainframe databases, MongoDB, Cassandra, Redshift and Azure (including both Azure DB and Azure SQL).
Customer Quotes
“We are shrinking clients’ development cycles by working with smaller sets of test data, and lowering IT costs through the use of smaller data sets that require less storage and fewer system resources.”
Cognizant Testing Services
The product provides a variety of methods for data subsetting and rules for generating synthetic data (including bad data), as well as extensive options to support data masking. The latter is policy-driven, and includes support for unstructured data, federated masking (to ensure consistency across multiple datasets), and encryption. The last of these can be reversible or not as the situation requires, is based on NIST approved algorithms, and can be either format or metadata preserving. The product also ships with out-of-the-box masking policies that support PCI, PII and PHI compliance.
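The deterministic, format-preserving behaviour described above can be sketched as follows. To be clear, real products use NIST-approved algorithms such as FF1 format-preserving encryption; this HMAC-based stand-in (with a hypothetical key) is irreversible and only demonstrates the two properties that matter for federated masking, namely repeatability and format preservation:

```python
import hashlib
import hmac

# Illustrative deterministic, format-preserving masking: the same digit
# string always maps to the same masked digit string of the same length,
# so masked values stay consistent across multiple datasets.

SECRET = b"masking-key"  # hypothetical key; a real deployment manages keys securely

def mask_digits(value: str) -> str:
    """Replace a digit string with a repeatable digit string of equal length."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).digest()
    return "".join(str(digest[i % len(digest)] % 10) for i in range(len(value)))

ssn = "123-45-6789"
masked = "-".join(mask_digits(part) for part in ssn.split("-"))
# The 3-2-4 digit format survives, and re-masking yields the identical result.
```

Because the mapping is deterministic, joins between separately masked datasets still line up, which is what “federated” masking requires.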
A major capability is the creation of a test data warehouse that can be used to provision updated test data on demand and without troubling the production environment. This links into DevOps environments and tools (such as Jenkins), and there is integration with HPE ALM. In addition, there is support for flat files as well as databases.
There are multiple ways to enable self-service within this environment. For example, you can distribute data to your testers via a parameterised test data plan: testers fill in the plan’s parameters whenever they need relevant test data and receive it accordingly. The product also includes a self-service portal (shown in Figure 1) that allows test data admins to publish test data sets for consumption by testers. Testers can provision any data sets that are made available to them, and can modify, subset or copy them locally as they will, allowing them to customise their test data for their specific testing requirements. Data sets can be published directly to the portal from within the test data warehouse, and once within the portal can be tagged to make them easier to search through.
TDM also provides a graphical test data coverage capability, as seen in Figure 2, that allows you to see whether you have sufficient data to provide the level of coverage you require. The blank spaces in the picture indicate uncovered test data combinations. Armed with this information, you can augment your test data as necessary.
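The underlying computation for a coverage view like this is straightforward to sketch: compare the combinations present in a test data set against all possible combinations of the dimensions of interest, and report the gaps (the blank spaces in the grid). The function and field names here are hypothetical:

```python
from itertools import product

# Illustrative test data coverage check over two dimensions.

def coverage_gaps(rows, dim_a, dim_b, values_a, values_b):
    """Return the (a, b) combinations no row in `rows` covers."""
    seen = {(row[dim_a], row[dim_b]) for row in rows}
    return sorted(set(product(values_a, values_b)) - seen)

test_data = [
    {"country": "US", "tier": "gold"},
    {"country": "US", "tier": "silver"},
    {"country": "DE", "tier": "gold"},
]
gaps = coverage_gaps(test_data, "country", "tier",
                     ["US", "DE"], ["gold", "silver"])
# One combination has no covering test data: ("DE", "silver").
```

Armed with the gap list, synthetic data generation can then be targeted at exactly the missing combinations.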
For the discovery of sensitive data, Informatica provides discovery options that use pattern matching, dictionaries, algorithmics, and other techniques – including machine learning – to reduce false positives. Relevant curation facilities are also provided. Moreover, these discovery capabilities are extended by Informatica Secure@Source, an additional product that provides enterprise-wide discovery of sensitive data and continuous multi-factor sensitive data risk monitoring. In particular, Secure@Source provides lineage against masked data and allows you to see where data has been masked as it flows through your enterprise. Moreover, Secure@Source is integrated with TDM’s masking functionality, allowing you to, for example, start a masking process in TDM from within Secure@Source.
Informatica TDM is an outstandingly broad solution, offering functionality that covers almost all aspects of test data management and enabling it to address a wide variety of use cases. Its capabilities include data subsetting, static data masking, synthetic data generation, data discovery, test data provisioning, and format-preserving encryption. Moreover, it is highly capable in all of these areas. This is a rare thing, especially with regard to synthetic data.
Several of its capabilities serve particularly well as differentiators, including the ability to visually display test data coverage, the test data warehouse and the self-service portal. The latter is an especially strong point in the product’s favour: the ease of use and reduced reliance on a test data admin that it provides could be a major boon to your testers’ productivity.
In addition, concerns over data breaches and regulatory compliance (particularly GDPR) have been prominent in 2018, and this will likely continue throughout 2019. Data breaches, in particular, are rapidly becoming a case of “when, not if”. This has led to an increased need for data security, which will often overlap with test data management. For instance, data masking is frequently used for both purposes. Consequently, the fact that Informatica is able to provide both test data management and data security solutions is appealing, since it can make sure that your solutions for the two spaces are well integrated where they overlap. Moreover, it will often be easier to extend an existing data security solution (such as Informatica) to cover test data management (or vice versa) than to start from scratch, not least because at least one major component – masking – will most likely already be in place.
The Bottom Line
TDM proves very competent in almost all areas of test data management. It is certainly a market leader, and it should most likely be on your shortlist.
Informatica Test Data Management (2021)
Last Updated: 12th July 2021
Mutable Award: Gold 2021
Informatica Test Data Management (TDM) is Informatica’s solution for test data management, which it bundles as Secure Testing. It offers data subsetting, static data masking, and synthetic data generation, as well as easy access via a test data warehouse and a self-service portal. It fully supports the cloud and is available on all major cloud providers, and it can act on a range of data sources, including relational, NoSQL, cloud and mainframe databases as well as flat files.
Moreover, TDM is only one of a number of Informatica products that collectively provide a holistic solution for data governance, privacy, and protection. Conversely, the broader Informatica ecosystem enriches TDM: for example, it allows it to leverage CLAIRE, Informatica’s shared metadata and AI layer. TDM also exposes a range of REST APIs for integration with third-party software.
Customer Quotes
“We are shrinking clients’ development cycles by working with smaller sets of test data, and lowering IT costs through the use of smaller data sets that require less storage and fewer system resources.”
Cognizant Testing Services
The product supports a variety of methods and options for data subsetting, data masking, and rules-based synthetic data generation. Data masking in particular is policy-driven, complete with out-of-the-box masking policies that support PCI, PII and PHI compliance. Masking works with structured and unstructured data, is federated to ensure consistency across multiple datasets (including datasets in multiple locations, such as on-prem and in cloud), and can leverage encryption that can be reversible or irreversible, and format or metadata preserving, as required. For auditing purposes, you can also generate compliance reports that show exactly how much of your sensitive (test) data is masked.
Furthermore, there are multiple ways to accelerate your test data provisioning within this environment. For instance, the product includes a self-service portal (see Figure 1) that allows test data admins to publish test data sets for consumption. Testers can provision any data sets that are made available to them, and can modify, subset or copy them locally as they will, allowing them to customise their test data for their specific testing requirements. Data within the portal can also be tagged to make it easier to search through. Somewhat more primitively, you can distribute data to your testers via parameterised test data plans, allowing your testers to fill in the plan’s parameters whenever they need relevant test data, and receive it accordingly.
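A parameterised test data plan amounts to a templated query over a published data set: the tester fills in the parameters and receives the matching subset. This is a minimal sketch with hypothetical field names, not TDM’s actual mechanism:

```python
# Illustrative parameterised test data plan: return only the rows that
# match every parameter the tester filled in.

def provision(plan_params, data_set):
    """Filter a data set down to rows matching all supplied parameters."""
    return [row for row in data_set
            if all(row.get(key) == value for key, value in plan_params.items())]

data_set = [
    {"id": 1, "region": "EU", "status": "active"},
    {"id": 2, "region": "US", "status": "active"},
    {"id": 3, "region": "EU", "status": "closed"},
]
rows = provision({"region": "EU", "status": "active"}, data_set)
# Only the row matching both parameters is provisioned.
```

The appeal of the portal approach over this is that testers browse and tag published sets rather than having to know the right parameters in advance.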
On the other hand, you could integrate provisioning into your existing test automation workflows, perhaps by leveraging the product’s test data warehouse. This is an additional capability that can be used to store, and subsequently provision, up-to-date test data on-demand without troubling your production environment. You can also use it as a baseline to reset your personal testing environment against, allowing you to experiment as you wish without negatively impacting either other testers or your own overall testing efforts. It can even link into your DevOps environments, CI/CD pipelines, and test automation workflows, thus automating your provisioning. You can also use it to contribute to the solution’s self-service capability by publishing data sets in the warehouse directly to the portal.
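The reset-to-baseline behaviour described above rests on a simple invariant: the warehouse keeps an immutable golden copy, and a reset replaces the tester’s working set with a fresh copy of it. A hypothetical sketch of that idea:

```python
import copy

# Illustrative test data warehouse baseline: provisioning always hands out
# a fresh deep copy, so experiments never contaminate the golden copy.

class TestDataWarehouse:
    def __init__(self, baseline):
        self._baseline = copy.deepcopy(baseline)  # golden copy, never mutated

    def provision(self):
        """Return a fresh, independent copy of the baseline data."""
        return copy.deepcopy(self._baseline)

warehouse = TestDataWarehouse([{"id": 1, "name": "Ada"}])
working = warehouse.provision()
working[0]["name"] = "CHANGED"    # experiment freely on the working set...
working = warehouse.provision()   # ...then reset to the pristine baseline
```

In a CI/CD pipeline, the same provision step runs before each test suite, which is what makes automated runs repeatable.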
In addition, the product provides a number of neat visual capabilities, including graphical test data coverage and an entity view while subsetting. The former in particular allows you to see whether you have sufficient data to achieve the level of coverage you desire. This is shown in Figure 2.
Finally, TDM can discover (and hence mask) your sensitive data using several methods. This includes domain and pattern matching, dictionaries, algorithmics, and AI/machine learning. These discovery capabilities are extended by Informatica Data Privacy Management, an additional product that provides enterprise-wide discovery and classification of sensitive data, DSAR reporting, and continuous multi-factor sensitive data risk monitoring with AI-enabled analytics. In particular, Data Privacy Management also tracks the lineage of your masked data and allows you to see where your data has been masked as it flows through your organisation. It is also integrated with TDM’s masking functionality, allowing you to, for example, automate a masking process in TDM from within Data Privacy Management to support risk remediation policies.
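Two of the discovery techniques mentioned, pattern matching and dictionaries, can be sketched briefly. The patterns, dictionary entries, and sample table below are hypothetical illustrations; the real products layer machine learning on top of techniques like these to cut false positives:

```python
import re

# Illustrative sensitive-data discovery: classify columns by matching
# column names against a dictionary and values against regex patterns.

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
DICTIONARY = {"first_name", "last_name", "dob"}  # known sensitive column names

def classify_columns(table):
    """Return {column: classification} for columns flagged as sensitive."""
    findings = {}
    for column, values in table.items():
        if column in DICTIONARY:
            findings[column] = "dictionary"
            continue
        for label, pattern in PATTERNS.items():
            if any(pattern.search(str(value)) for value in values):
                findings[column] = label
                break
    return findings

table = {
    "first_name": ["Ada", "Alan"],
    "contact": ["ada@example.com", "alan@example.com"],
    "notes": ["paid", "pending"],
}
found = classify_columns(table)
```

Once columns are classified, the appropriate masking policy can be applied to each, which is the hand-off point between discovery and masking.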
TDM is an outstandingly broad test data solution. It offers functionality that covers almost all aspects of test data management, which consequently enables it to address a wide variety of use cases. Its capabilities include data subsetting, static data masking, synthetic data generation, data discovery, test data provisioning, self-service, and format-preserving encryption. Moreover, it is capable in all of these areas, and you could reasonably argue that it is best-of-breed in several.
Standout features include visual test data coverage, the test data warehouse, and the self-service portal. The ease of use, reuse, enhanced collaboration, and reduced dependency on admins that the latter two provide are particularly notable as boons to your testers’ productivity.
Even then, TDM is still only one solution among many available from Informatica, and the breadth of that ecosystem – even if you only consider privacy and governance – is its own advantage, and one that TDM benefits from substantially. For example, shared metadata can be used to enable a far more comprehensive approach to data protection and transparency. There is not enough space on this page to go into the finer details, but suffice it to say that when it comes to Informatica, the whole is greater than the sum of its parts (and the parts themselves are highly capable to begin with).
The Bottom Line
TDM proves very competent in almost every area of test data management, with a particular penchant for robust self-service and expedient test data provisioning. There is little reason it shouldn’t be on your shortlist, especially in the context of building out more complete data governance and privacy operations.