Test Data Management

Traditionally, testing and quality assurance teams have created test data by copying the live database. However, the average Global 2000 company has seven such copies, which is expensive in terms of license fees, hardware, and running costs. A cheaper option is to take subsets of the database rather than full copies. However, without sophisticated tools that ensure the subset is representative of the database as a whole, you cannot be sure of covering all the testing scenarios that might apply. There is thus a trade-off between cost and quality of testing.

The second problem that assails the test data environment is that you need to put as little workload onto DBAs as possible; otherwise testing (and therefore the development environment as a whole) will be less agile than it needs to be. Operations is frequently seen as an obstacle to providing test data, while Development is all too frequently seen as a nuisance by DBAs. DevOps is a generalised approach to improving collaboration across these environments, whereas test data management is a specific technology designed to achieve this while also supporting an agile development environment in which testing is conducted early and often.

Test data management aims to square the circle of providing fully representative data in right-sized datasets (you may need a differently sized subset for different types of test) with minimal impact on the database administrator. There are two methods generally in use for generating test data: either you take a subset of the data that is representative or you generate a synthetic set of data. Synthetic data can be produced either by sub-setting the data and then repeatedly applying data masking techniques, or by generating it entirely from scratch, which relies on having profiled the source data using a data profiling and discovery tool.
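As a rough illustration of what "representative" sub-setting can mean, the sketch below samples the same fraction from each category of a column and keeps at least one row per category, so that rare cases are not lost. It is a minimal example under stated assumptions, not how any particular vendor's product works: the `orders` rows, the `status` column and the function name `stratified_subset` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_subset(rows, key, fraction, seed=0):
    """Sample roughly `fraction` of the rows in each group defined by `key`,
    always keeping at least one row per group so rare cases survive."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    subset = []
    for value in sorted(groups, key=str):
        members = groups[value]
        size = max(1, round(len(members) * fraction))
        subset.extend(rng.sample(members, size))
    return subset

# Hypothetical source table: 90 shipped, 9 refunded and 1 disputed order.
orders = (
    [{"id": i, "status": "shipped"} for i in range(90)]
    + [{"id": 90 + i, "status": "refunded"} for i in range(9)]
    + [{"id": 99, "status": "disputed"}]
)
subset = stratified_subset(orders, key="status", fraction=0.1)
```

A plain random 10% sample of the same table would frequently miss the single disputed order, which is the kind of gap in test coverage that a representative subset is meant to avoid. Real tools also have to preserve referential integrity across tables, which this sketch does not attempt.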

The advantage of a completely synthetic approach is that you don’t touch the live data at all, other than for the original profiling, and therefore it is very quick and easy to generate new test data sets without having to go to operations for assistance. Thus this is a particularly suitable approach for agile requirements.
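The profile-then-generate idea can be sketched as follows. This is a simplified illustration, assuming a profile that records only value frequencies for text columns and a minimum and maximum for numeric columns; the function names `profile_column` and `generate_rows` and the sample `live` data are hypothetical. Real profiling tools capture far more, such as formats, correlations and cross-table relationships.

```python
import random
from collections import Counter

def profile_column(values):
    """Summarise a column: min/max for numbers, value frequencies otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return {"kind": "numeric", "min": min(values), "max": max(values)}
    counts = Counter(values)
    return {
        "kind": "categorical",
        "values": list(counts),
        "weights": list(counts.values()),
    }

def generate_rows(profile, n, seed=0):
    """Generate n synthetic rows from the profile alone, without the source rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in profile.items():
            if spec["kind"] == "numeric":
                row[column] = rng.uniform(spec["min"], spec["max"])
            else:
                row[column] = rng.choices(spec["values"], weights=spec["weights"])[0]
        rows.append(row)
    return rows

# Hypothetical live data, read once to build the profile.
live = {
    "country": ["UK", "UK", "FR", "DE"],
    "balance": [10.0, 250.5, 99.9, 1200.0],
}
profile = {name: profile_column(column) for name, column in live.items()}

# New test sets of any size can now be produced from the profile only.
synthetic = generate_rows(profile, n=1000)
```

Note that this sketch reuses the observed category values, so it is only appropriate for columns whose values are not themselves sensitive.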

Test data management solutions will also include data masking capabilities, so that personally identifiable and other sensitive data can be discovered and masked in an appropriate fashion (this is really a governance issue). It should be noted, however, that masking is not necessary if you are generating completely synthetic data.
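One common masking technique is deterministic pseudonymisation, in which the same input always maps to the same masked value so that joins between tables still work. The sketch below shows the idea using a keyed hash from the Python standard library. It is an assumption-laden illustration rather than any vendor's method: the key, the column names and the functions `mask_value` and `mask_rows` are hypothetical, and the sensitive columns are assumed to have been identified already.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored and managed securely.
SECRET_KEY = b"example-key-not-for-production"

def mask_value(value, prefix="masked"):
    """Replace a value with a keyed-hash pseudonym; equal inputs give equal outputs."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"

def mask_rows(rows, sensitive_columns):
    """Return copies of the rows with the sensitive columns masked."""
    return [
        {
            column: mask_value(str(value), prefix=column)
            if column in sensitive_columns
            else value
            for column, value in row.items()
        }
        for row in rows
    ]

# Hypothetical customer rows; the first and third share an email address.
rows = [
    {"customer_id": 1, "email": "alice@example.com", "plan": "gold"},
    {"customer_id": 2, "email": "bob@example.com", "plan": "free"},
    {"customer_id": 3, "email": "alice@example.com", "plan": "gold"},
]
masked = mask_rows(rows, sensitive_columns={"email"})
```

Because the mapping is deterministic, rows 1 and 3 still share a masked email, while the original addresses no longer appear in the test data.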

Those in charge of testing teams and quality control will be the most interested, but test data management is also relevant for compliance officers (especially when development is to be outsourced) because the data used for testing is either synthetic or masked.

In addition, development teams adopting an agile methodology should care because agile development is not much use without agile testing and you can’t have agile testing if you don’t also have agile test data.

While test data management has been around for some years, it is only in this decade that it has really come to the fore. In our view the most likely trend going forward is the merger of test data management with service virtualisation to further speed up testing processes. Indeed, partnerships and acquisitions are already taking place within this sector to enable exactly this.

One noticeable fissure in the market is between those companies providing test data management from the perspective of developers (integrating with service virtualisation, testing tools, code coverage and so on), as exemplified by Grid-Tools, and those that offer a more data-centric approach, as typified by Informatica. In practice, nearly all vendors are in the latter camp, which potentially gives Grid-Tools an advantage.

Informatica acquired Applimation, IBM acquired Green Hat (a service virtualisation provider), and Grid-Tools has extended its portfolio to include service virtualisation. Grid-Tools has also partnered with a number of the service virtualisation vendors. New entrants into the field include Rever and Delphix; the latter provides a virtualised environment for SQL Server and Oracle, and works with, rather than provides, data masking.

The big trend, however, is towards synthetic data generation. It used to be that only Grid-Tools offered this, but now GenRocket has emerged, Rever has introduced a test data management product (SEAL) that also includes data masking, and Informatica has added synthetic data generation. We expect IBM to follow suit in due course.

The next step for vendors will be to introduce something comparable to Grid-Tools’ test data warehouse. Informatica has announced that it will do so later in 2014.

Solutions

  • Ab Initio
  • BMC
  • Broadcom
  • Curiosity Software
  • DATPROF
  • Delphix
  • GenRocket
  • IBM
  • Informatica
  • IRI
  • Mage
  • Redgate
  • Solix
  • Windocks

These organisations are also known to offer solutions:

  • Net2000
  • Oracle
  • Original Software
  • Polarion
  • Rever
  • Synthesized

Application Quality Assurance at Broadcom

In this Bloor InContext report, we discuss and evaluate Broadcom’s solution for application quality assurance.

Informatica Test Data Management (2019)

This paper discusses and evaluates Informatica Test Data Management.

MENTIS Test Data Management (2021)

MENTIS is a data governance, privacy and security platform that offers sophisticated data discovery alongside data masking, subsetting, monitoring, and more.

Accelerating Software Quality: Machine Learning and Artificial Intelligence in the Age of DevOps - a book by Eran Kinsbruner et al

Accelerating Software Quality aims to show how AI and ML help to make data-driven decisions, automate processes and deliver higher quality software.

What’s Hot in Data

In this paper, we have identified the potential significance of a wide range of data-based technologies that impact on the move to a data-driven environment.

GenRocket (2021)

GenRocket is a platform for enterprise-level test data automation that offers high-end synthetic data generation.

Redgate SQL Provision (2021)

SQL Provision is a solution for test data management that combines two Redgate products: Data Masker for data masking, and SQL Clone for database cloning.

Compuware Test Data Management

This paper discusses and evaluates Compuware’s test data management offering.