Test Data Management
Analyst Coverage: Philip Howard
Traditionally, testing and quality assurance teams create test data by copying the live database. However, the average Global 2000 company has seven such copies, which is expensive in terms of license fees, hardware, and running costs. A cheaper option is to take subsets of the database instead of full copies. However, without sophisticated tools to verify that the subset you take is representative of the database as a whole, you cannot guarantee coverage of all the testing scenarios that might apply. Thus there is a trade-off between cost and quality of testing.
The second problem that assails the test data environment is that you need to put as little workload onto DBAs as possible; otherwise testing (and therefore the development environment as a whole) will be less agile than it needs to be. Operations is frequently seen as an obstacle to providing test data, while Development is all too often seen as a nuisance by DBAs. DevOps is a generalised approach to improving collaboration across these environments, whereas test data management is a specific technology designed to achieve it, while at the same time supporting an agile development environment in which testing is conducted early and often.
Test data management aims to square the circle of providing fully representative data in right-sized datasets (you may need a differently sized subset for different types of test) with minimal impact on the database administrator. There are two methods generally in use for generating test data: either you take a subset of the data that is representative, or you generate a synthetic set of data. The former can be achieved by sub-setting the data and then applying data masking techniques, while the latter relies on having profiled the source data beforehand using a data profiling and discovery tool.
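To make the subsetting method concrete, the sketch below shows one crude proxy for "representativeness": stratified sampling, which preserves the distribution of a chosen categorical column in the subset. The table, column names, and data are hypothetical; real test data management tools do this across whole schemas while also preserving referential integrity.

```python
import random
from collections import defaultdict

def stratified_subset(rows, key, fraction, seed=42):
    """Sample a fraction of rows while preserving the distribution
    of one categorical column -- a crude proxy for representativeness."""
    random.seed(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    subset = []
    for value, members in groups.items():
        # Keep each stratum in roughly its original proportion.
        k = max(1, round(len(members) * fraction))
        subset.extend(random.sample(members, k))
    return subset

# Hypothetical customer table: 80% EU, 20% US.
customers = (
    [{"id": i, "region": "EU"} for i in range(80)]
    + [{"id": i, "region": "US"} for i in range(80, 100)]
)

# A 10% subset that keeps the same regional mix as the full table.
sample = stratified_subset(customers, key="region", fraction=0.1)
```

A naive random 10% sample could easily miss the smaller US stratum entirely, which is exactly the kind of uncovered test scenario the trade-off above warns about.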
The advantage of a completely synthetic approach is that you don’t touch the live data at all, other than for the original profiling, and therefore it is very quick and easy to generate new test data sets without having to go to operations for assistance. Thus this is a particularly suitable approach for agile requirements.
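The synthetic approach can be sketched as two steps: a one-off profiling pass over the live data that records only distribution parameters, and then on-demand generation from that stored profile. The column, figures, and normal-distribution assumption below are illustrative only; commercial tools profile many column types and preserve cross-column relationships.

```python
import random
import statistics

def profile_column(values):
    # The only touch of live data: record summary statistics, not rows.
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}

def generate_synthetic(prof, n, seed=1):
    # Any number of fresh test sets can be drawn from the stored profile,
    # without going back to operations for another copy of the database.
    rng = random.Random(seed)
    return [rng.gauss(prof["mean"], prof["stdev"]) for _ in range(n)]

# Hypothetical live order values, profiled once.
live_order_values = [120.0, 95.5, 210.0, 87.25, 150.0]
prof = profile_column(live_order_values)

# A synthetic test set of any required size, generated from the profile alone.
test_set = generate_synthetic(prof, n=1000)
```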
Test data management solutions will also include data masking capabilities, so that personally identifiable and other sensitive data can be discovered and masked in an appropriate fashion (this is really a governance issue), although it should be noted that masking is unnecessary if you are generating completely synthetic data.
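As a minimal illustration of masking, the sketch below pseudonymises an email address deterministically: the same input always yields the same token, so joins across masked tables still line up, while the original value is not recoverable from the test environment. The salt and naming are assumptions for the example; production masking tools offer many more techniques (shuffling, format-preserving encryption, and so on).

```python
import hashlib

def mask_email(email, salt="test-env"):
    # Deterministic masking: the same input always maps to the same token,
    # so referential joins across masked tables remain consistent.
    digest = hashlib.sha256((salt + email).encode()).hexdigest()[:12]
    return f"user_{digest}@example.com"

# A hypothetical PII value, masked for use in a test dataset.
masked = mask_email("jane.doe@acme.com")
```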
Those in charge of testing teams and quality control will be the most interested but this is also relevant for compliance officers (especially when development is to be outsourced) because of the synthetic or masked aspects of the data used for testing.
In addition, development teams adopting an agile methodology should care because agile development is not much use without agile testing and you can’t have agile testing if you don’t also have agile test data.
While test data management has actually been around for some years it is only in this decade that it has really come to the fore. In our view the most likely trend going forward is the merger of test data management with service virtualisation to further speed up testing processes. Indeed, partnerships and acquisitions are already taking place within this sector to enable exactly this.
One noticeable fissure in the market is between those companies providing test data management from the perspective of developers (integrating with service virtualisation, testing tools, code coverage and so on), as exemplified by Grid-Tools, and those that offer a more data-centric approach, as typified by Informatica. In practice, nearly all vendors are in the latter camp, which potentially gives Grid-Tools an advantage.
Informatica acquired Applimation, IBM acquired Green Hat (a service virtualisation provider), and Grid-Tools has extended its portfolio to include service virtualisation. The latter has also partnered with a number of the service virtualisation vendors. New entrants into the field include Rever and Delphix; the latter provides a virtualised environment for SQL Server and Oracle, and works with, rather than provides, data masking.
The big trend, however, is towards synthetic data generation. It used to be that only Grid-Tools offered this, but now GenRocket has emerged, Rever has introduced a test data management product (SEAL) that also includes data masking, and Informatica has added synthetic data generation. We expect IBM to follow suit in due course.
The next step for vendors will be to introduce something comparable to Grid-Tools’ test data warehouse. Informatica has announced that it will do so later in 2014.
Further resources to broaden your knowledge:
Test Data Management
This Market Update compares TDM (Test Data Management) products.
Optimising HPE ALM with CA Agile Requirements Designer
CA Agile Requirements Designer brings significant additional capabilities to HPE ALM environments.
The data management implications of GDPR
This paper discusses the EU's forthcoming General Data Protection Regulation (GDPR) not from a legal perspective but from the point of view of data management.
Total cost of ownership
TCO should be more important in decision making than either license fees or subscription costs.
Test Case Generation
In this paper we compare and examine the different vendor products that are available in the market for this purpose.
Automated test case generation
In this paper we focus on automating the testing process, since that is where we believe the greatest savings and efficiencies can be achieved.
Attunity Gold Client
Attunity Gold Client provides test data management and allied capabilities for SAP applications.
Exploring Successful Approaches to Test Data Management
Philip Howard, Research Director and Practice Leader for Information Management at Bloor Research, will present his research on test data management.