Test Data Management

Traditionally, testing and quality assurance teams create test data by copying the live database. However, the average Global 2000 company has seven such copies, which is expensive in terms of license fees, hardware, and running costs. A cheaper option is to take subsets of the database instead of copies. But without sophisticated tools that ensure the subset you take is representative of the database as a whole, you cannot be sure of covering all the testing scenarios that might apply. Thus there is a trade-off between cost and quality of testing.
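
One simple way to keep a subset representative is stratified sampling: sample each category separately so that rare cases are not lost. The following is a minimal sketch under stated assumptions, not a description of how any particular product works. The `stratified_subset` function, the `orders` rows and the `status` column are all hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(rows, key, fraction, seed=0):
    """Sample roughly `fraction` of the rows for each distinct value of `key`,
    always keeping at least one row per value so rare scenarios stay covered."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    subset = []
    for members in groups.values():
        sample_size = max(1, round(len(members) * fraction))
        subset.extend(rng.sample(members, sample_size))
    return subset


# Hypothetical source table: 90 shipped, 9 refunded and 1 disputed order.
orders = (
    [{"id": i, "status": "shipped"} for i in range(90)]
    + [{"id": 90 + i, "status": "refunded"} for i in range(9)]
    + [{"id": 99, "status": "disputed"}]
)

subset = stratified_subset(orders, key="status", fraction=0.1)
print(len(subset), sorted({row["status"] for row in subset}))
```

A plain 10% random sample of the same table would often miss the single disputed order altogether, which is the coverage risk described above. A real subsetting tool also has to preserve referential integrity across tables, which this sketch does not attempt.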

The second problem that assails the test data environment is that you need to put as little workload onto DBAs as possible; otherwise testing (and therefore the development environment as a whole) will be less agile than it needs to be. Operations is frequently seen as an obstacle to providing test data, while Development is all too frequently seen as a nuisance by DBAs. DevOps is a generalised approach to improving collaboration across these environments; test data management is a specific technology designed to achieve this while also supporting an agile development environment where testing is conducted early and often.

Test data management aims to square the circle of providing fully representative data in right-sized datasets (you may need a differently sized subset for different types of test) with minimal impact on the database administrator. There are two methods generally in use for generating test data: either you take a representative subset of the data or you generate a synthetic set of data. The latter can be achieved either by sub-setting the data and then repeatedly applying data masking techniques, or by generating data from a profile of the source, produced using a data profiling and discovery tool.

The advantage of a completely synthetic approach is that you don’t touch the live data at all, other than for the original profiling, and therefore it is very quick and easy to generate new test data sets without having to go to operations for assistance. Thus this is a particularly suitable approach for agile requirements.
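
The profile-then-generate idea can be sketched as follows. This is a deliberately simplified illustration, assuming a profile that records only value frequencies for categorical columns and a minimum and maximum for numeric ones. The function names `build_profile` and `generate_rows` and the sample columns are hypothetical, and commercial tools capture far richer profiles.

```python
import random
from collections import Counter


def build_profile(rows):
    """Profile each column once: min/max for numeric columns,
    value frequencies for everything else."""
    profile = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            profile[column] = {"kind": "numeric", "min": min(values), "max": max(values)}
        else:
            profile[column] = {"kind": "categorical", "counts": dict(Counter(values))}
    return profile


def generate_rows(profile, n, seed=0):
    """Generate n synthetic rows from the profile alone, without reading the source again."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, stats in profile.items():
            if stats["kind"] == "numeric":
                row[column] = round(rng.uniform(stats["min"], stats["max"]), 2)
            else:
                choices = list(stats["counts"])
                weights = list(stats["counts"].values())
                row[column] = rng.choices(choices, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Hypothetical source data, touched only once to build the profile.
source = [
    {"country": "UK", "amount": 12.5},
    {"country": "UK", "amount": 80.0},
    {"country": "DE", "amount": 250.0},
    {"country": "FR", "amount": 40.0},
]
profile = build_profile(source)
synthetic = generate_rows(profile, n=1000)
print(synthetic[:3])
```

Once the profile exists, new data sets of any size can be produced without going back to the live database. Note that this sketch generates each column independently, so any correlation between columns in the source is lost.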

Test data management solutions will also include data masking capabilities, so that personally identifiable and other sensitive data can be discovered and masked in an appropriate fashion (this is really a governance issue). It should be noted, however, that masking is not necessary if you are generating completely synthetic data.
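
As an illustration of one common masking technique, the sketch below replaces sensitive values with deterministic pseudonyms derived from a keyed hash (HMAC-SHA256 from the Python standard library). It assumes the sensitive columns have already been identified and does not attempt discovery. The `mask_value` and `mask_rows` helpers, the secret and the sample rows are hypothetical.

```python
import hashlib
import hmac


def mask_value(value, secret, prefix):
    """Replace a sensitive value with a pseudonym derived from a keyed hash.
    The same input always yields the same pseudonym for a given secret."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_rows(rows, sensitive_columns, secret):
    """Return copies of the rows with the named columns masked and all others untouched."""
    return [
        {
            column: mask_value(str(value), secret, prefix=column)
            if column in sensitive_columns
            else value
            for column, value in row.items()
        }
        for row in rows
    ]


# Hypothetical rows; the secret would be managed outside the test environment.
customers = [
    {"email": "alice@example.com", "plan": "gold"},
    {"email": "bob@example.com", "plan": "free"},
    {"email": "alice@example.com", "plan": "gold"},
]
masked = mask_rows(customers, sensitive_columns={"email"}, secret=b"test-only-secret")
print(masked)
```

Because the pseudonyms are deterministic, the same customer masks to the same value wherever it appears, so joins between masked tables still work. Whether that property is desirable is a governance decision, since consistent pseudonyms can still allow records to be linked.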

Those in charge of testing teams and quality control will be the most interested in test data management, but it is also relevant for compliance officers (especially when development is to be outsourced) because of the synthetic or masked nature of the data used for testing.

In addition, development teams adopting an agile methodology should care because agile development is not much use without agile testing and you can’t have agile testing if you don’t also have agile test data.

While test data management has been around for some years, it is only in this decade that it has really come to the fore. In our view, the most likely trend going forward is the merger of test data management with service virtualisation to further speed up testing processes. Indeed, partnerships and acquisitions are already taking place within this sector to enable exactly this.

One noticeable fissure in the market is between those companies providing test data management from the perspective of developers (integrating with service virtualisation, testing tools, code coverage and so on), as exemplified by Grid-Tools, and those that offer a more data-centric approach, as typified by Informatica. In practice, nearly all vendors are in the latter camp, which potentially gives Grid-Tools an advantage.

Informatica acquired Applimation, IBM acquired Green Hat (a service virtualisation provider) and Grid-Tools has extended its portfolio to include service virtualisation. Grid-Tools has also partnered with a number of service virtualisation vendors. New entrants into the field include Rever and Delphix; the latter provides a virtualised environment for SQL Server and Oracle, and works with, rather than provides, data masking.

The big trend, however, is towards synthetic data generation. It used to be that only Grid-Tools offered this, but now GenRocket has emerged, Rever has introduced a test data management product (SEAL) that also includes data masking, and Informatica has added synthetic data generation. We expect IBM to follow suit in due course.

The next step for vendors will be to introduce something comparable to Grid-Tools’ test data warehouse. Informatica has announced that it will do so later in 2014.

Solutions

  • Ab Initio
  • BMC
  • Broadcom
  • Curiosity Software
  • DATPROF
  • Delphix
  • GenRocket
  • IBM
  • Informatica
  • IRI
  • Mage
  • Redgate
  • Solix
  • Windocks

These organisations are also known to offer solutions:

  • Net2000
  • OpenText
  • Oracle
  • Original Software
  • Polarion
  • Rever
  • Synthesized

Research

Windocks (2024)

Windocks is a platform for containerised, enterprise-level test data management designed to support and be supported by AI and machine learning technologies.

Curiosity Software Test Data Automation (2024)

Curiosity Software Test Data Automation is a test data management solution that embeds automated test data creation and delivery into your test processes.

DATPROF (2024)

DATPROF provides a test data management platform that is comprised of five separate (but highly integrable) products.

Broadcom Test Data Manager (2024)

Broadcom Test Data Manager uses data subsetting, data masking, data profiling and synthetic data generation to produce secure test data at scale.

Test Data Management in IRI Voracity

IRI Voracity is a “total data management” platform that can be used as an effective solution for test data management.

IRI information privacy compliance

GDPR regulates the security and use of EU citizens’ personal data and is also a model for similar regulations worldwide. These laws mandate protection for personal data.

Test Data Management: Delphix, Mage, Redgate

In this paper, we describe and compare three prominent test data management solutions: Delphix DevOps Data Platform, Mage and Redgate SQL Provision.

Test Data Management and Mage

This paper discusses the challenges within the test data management space, and highlights Mage as a means of addressing them.