DATPROF
Analyst Coverage: Daniel Howard
DATPROF is a Dutch company, founded in 1998, that specialises in test data management. Its clients, which are concentrated in Europe but extend to North America and the Pacific region, include ING, Thomas Cook and the Dutch government. Moreover, it boasts a 95% customer retention rate.
DATPROF offers four products that are relevant to test data management: DATPROF Subset, DATPROF Privacy, DATPROF Analyze and DATPROF Runtime. Respectively, they provide data subsetting, data masking, data analysis and data provisioning. They can be licensed individually or as a whole and, in the latter case, work in concert to create an effective and easy-to-use test data management solution.
DATPROF Test Data Management (2019)
Last Updated: 14th June 2019
Customer Quotes
“The initial results of the implementation at Coöperatie VGZ seem to indicate that DATPROF saves over 70% storage use of the non production environments.”
VGZ
“After a period of familiarization there is little need for the use of production data for development and testing. The tooling of DATPROF is characterized by its high usability and flexibility.”
De Friesland Zorgverzekeraar
DATPROF Subset is used to create subsets of your existing production data for testing purposes. Subsets are generated using a single table as a base, but during the subsetting process you will need to classify the other tables in your system or systems based on their relationship to that table, which will determine whether, and to what extent, those tables are included in your subset. This classification is assisted by suggestions made by the product, and can ultimately be visualised as either a data or process model, as seen in Figure 1. Both models are very helpful for understanding the structure of your database and therefore how to conduct the subsetting process.
Once you have created your subset, the product provides several ways to copy it into your test database. Notably, it allows you to append new test data onto existing tables that may or may not be populated. This is an enormous time-saver: you don’t need to regenerate all of your tables and re-copy all of your data whenever you add a new test case. Additionally, Subset takes care of duplicate data for you (or not, as appropriate). Options for recreating or refilling existing tables are also available. Subsets can also be deployed via DATPROF Runtime.
DATPROF Privacy is used to mask sensitive (test) data by applying one or more masking rules to it. It natively supports Oracle, SQL Server, PostgreSQL and IBM DB2 databases, with support for more forthcoming. In theory, it can also support any other data source by using one of those databases as a processing engine. It can mask data held in files in a variety of formats, including CSV and XML. In addition, it masks live data directly inside each data source, so only metadata, never the data itself, needs to be extracted.
The product provides support for several masking rules out of the box, as well as custom masking rules and constraints. It allows you to enforce the order in which masking rules take effect by setting dependencies, masks consistently over all of your systems and applications, and maintains its own audit log, which is exposed as an HTML report after each run. As with DATPROF Subset, you can run the masking process via the product itself or by using DATPROF Runtime. It’s also worth noting that both DATPROF Privacy and DATPROF Subset complete their deployment operations by generating and running a SQL script. This ensures that they remain performant.
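To make the ideas of rule ordering and consistent masking concrete, the following is a minimal Python sketch. It is not DATPROF's implementation, which generates and runs SQL inside the database. The rule names, the replacement lists and the SECRET key are all hypothetical. Consistency is approximated here with a keyed hash, so the same input always yields the same masked value, and dependencies are resolved with a topological sort.

```python
import hashlib
import hmac
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical key: using the same key on every system gives the same masked value.
SECRET = b"demo-only-key"
FIRST_NAMES = ["Alex", "Sam", "Robin", "Kim"]
LAST_NAMES = ["Jansen", "Smith", "Visser", "Brown"]


def pick(value, choices):
    """Deterministically map an original value to one replacement value."""
    digest = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:4], "big") % len(choices)]


def mask_first_name(row):
    row["first_name"] = pick(row["first_name"], FIRST_NAMES)


def mask_last_name(row):
    row["last_name"] = pick(row["last_name"], LAST_NAMES)


def rebuild_email(row):
    # Must run after both name rules so that it only ever sees masked values.
    row["email"] = f"{row['first_name']}.{row['last_name']}@example.com".lower()


RULES = {
    "mask_first_name": mask_first_name,
    "mask_last_name": mask_last_name,
    "rebuild_email": rebuild_email,
}

# Each rule maps to the set of rules that must complete before it runs.
DEPENDS_ON = {
    "mask_first_name": set(),
    "mask_last_name": set(),
    "rebuild_email": {"mask_first_name", "mask_last_name"},
}


def mask_row(row):
    """Apply every rule to a copy of the row, honouring the declared dependencies."""
    masked = dict(row)
    for rule_name in TopologicalSorter(DEPENDS_ON).static_order():
        RULES[rule_name](masked)
    return masked
```

Calling mask_row twice on the same record returns identical output, which is the property that keeps masked values aligned across systems. A keyed hash is only one way to achieve that, and is used here purely for illustration.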
DATPROF Analyze is a data profiling and discovery product. Its chief relevance to test data management is its ability to find sensitive data and personally identifiable information (PII) within either your subsets or your overall system. As a data profiling tool, it can also be used to understand how your data is being used, and hence inform the creation of subsets and synthetic data, as well as to investigate and identify data quality issues. It includes country-specific profiles, allowing you to search for different varieties of PII depending on the country in which your data originates, and the profiling results it generates can be exported as an HTML report.
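As a rough illustration of what country-specific PII discovery can involve, the sketch below profiles a column by checking what share of its values match a set of patterns. The patterns, the profile names and the 0.8 threshold are assumptions chosen for the example. They are not taken from DATPROF Analyze, and real profiles would be considerably more thorough.

```python
import re

# Illustrative patterns only, grouped by a hypothetical country code.
COUNTRY_PROFILES = {
    "NL": {"postcode": re.compile(r"[1-9]\d{3} ?[A-Z]{2}")},
    "US": {
        "zip_code": re.compile(r"\d{5}(-\d{4})?"),
        "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    },
}
# Patterns that apply regardless of country.
COMMON_PROFILE = {"email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")}


def profile_column(values, country, threshold=0.8):
    """Return {label: match ratio} for each pattern that most non-empty values match."""
    patterns = {**COMMON_PROFILE, **COUNTRY_PROFILES.get(country, {})}
    non_empty = [v for v in values if v]
    findings = {}
    for label, pattern in patterns.items():
        matched = sum(1 for v in non_empty if pattern.fullmatch(v.strip()))
        ratio = matched / len(non_empty) if non_empty else 0.0
        if ratio >= threshold:
            findings[label] = ratio
    return findings
```

For example, profile_column(["1234 AB", "5611EM", None], "NL") flags the column as a likely postcode, while the same values profiled with the "US" profile produce no findings.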
DATPROF Runtime (as seen in Figure 2) allows you to centrally manage your test data environments, testing teams and applications. It can be used to initiate and monitor a variety of test data management processes (such as subsetting and masking), comes with built-in error handling and logging (including a complete run history), and exposes a REST API for integration with, among other things, continuous testing platforms. It also natively integrates with both Tricentis Tosca and Parasoft.
Moreover, DATPROF Runtime is designed to facilitate ‘test data provisioning’, or the operationalisation of test data. It does this in two ways. Firstly, it enables self-service access to test data. Secondly, it allows you to proactively provide your testers with data. This is accomplished by generating application templates from your test datasets, which can then be deployed to any number of different machines, allowing testers to access the instantiated application using specifically the test data you have provided for them.
Finally, DATPROF Runtime also provides synthetic data generation capabilities, primarily as a means to fill in missing data in your existing test datasets. Although creating a synthetic dataset from scratch is possible, it is not a recommended approach except when you have no access to production data. A variety of generators are supplied, several of which are country-specific. You can also opt to create your own.
The area where DATPROF most excels is ease of use: the various DATPROF products, although technically separate tools, feel like a single product that is exceptionally easy and intuitive to work with, and doesn’t require any significant training. This means that you can discover, subset, mask and deliver your data easily and – more importantly – quickly.
This is significantly enhanced by the most recent addition to the DATPROF product suite, DATPROF Runtime. Being able to create test data easily is important, but ultimately somewhat futile without the means to efficiently distribute that data to your testers. This is exactly what DATPROF Runtime provides. Moreover, by supplying a centralised location from which to manage your various testing assets and processes, it makes DATPROF easier to use than ever.
The Bottom Line
When we looked at DATPROF previously, we concluded that it was an exceptionally easy-to-use product with good, but not outstanding, functionality. In the time since, DATPROF has introduced data profiling, synthetic test data, and test data provisioning, among other things. In doing so, DATPROF has expanded its functionality without sacrificing, and in fact while enhancing, its ease of use.
DATPROF Test Data Management (2021)
Last Updated: 12th July 2021
DATPROF offers four products that are relevant to test data management: DATPROF Subset, DATPROF Privacy, DATPROF Analyze and DATPROF Runtime. Respectively, they provide data subsetting; static data masking; data discovery, profiling, and analysis; and data provisioning and centralised deployment. Synthetic data generation is also provided as part of DATPROF Privacy, as a way to augment, rather than replace, subsetting and masking. DATPROF’s products can be licensed individually or as a whole and, in the latter case, work in concert to create an effective and easy-to-use test data management solution.
DATPROF Subset is used to create subsets of your existing production data for testing purposes. Each subset is generated using a single driver table as a starting point, with other tables included based on their relationship to that table. These relationships can be derived from existing database relationships or specified manually, and the process is assisted by intelligent suggestions for which table content should be included in full, as opposed to in part.
The results can be visualised as either a data or process model (both shown in Figure 1). These are helpful for understanding your database’s structure, and thus how best to create your subset. Various validation techniques are provided to facilitate this process. You can either completely refresh your test database or append new test data to what is already there, and duplicate data is handled appropriately while ensuring all constraints remain valid.
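The driver-table approach described above can be sketched in a few lines of Python against an in-memory SQLite database. This is an illustration of the general technique rather than DATPROF Subset itself. The schema, the table names and the country filter are all hypothetical. The customers table acts as the driver, orders is included in part (only rows whose parent survives the filter), and countries is a reference table included in full.

```python
import sqlite3


def build_source():
    """Create a small, hypothetical 'production' database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                                country TEXT REFERENCES countries(code));
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             total REAL);
        INSERT INTO countries VALUES ('NL', 'Netherlands'), ('US', 'United States');
        INSERT INTO customers VALUES (1, 'A', 'NL'), (2, 'B', 'US'), (3, 'C', 'NL');
        INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 1.0), (13, 1, 2.5);
        """
    )
    return conn


def make_subset(conn, country):
    """Copy a referentially consistent subset into an attached 'testdb' database."""
    conn.execute("ATTACH DATABASE ':memory:' AS testdb")
    for table in ("countries", "customers", "orders"):
        # Copy the column layout only; 'WHERE 0' selects no rows.
        conn.execute(f"CREATE TABLE testdb.{table} AS SELECT * FROM main.{table} WHERE 0")
    # Reference table: included in full.
    conn.execute("INSERT INTO testdb.countries SELECT * FROM main.countries")
    # Driver table: filtered by the subset criterion.
    conn.execute(
        "INSERT INTO testdb.customers SELECT * FROM main.customers WHERE country = ?",
        (country,),
    )
    # Child table: included in part, following the foreign key to the driver.
    conn.execute(
        "INSERT INTO testdb.orders SELECT * FROM main.orders "
        "WHERE customer_id IN (SELECT id FROM testdb.customers)"
    )
    conn.commit()
```

After make_subset(build_source(), "NL"), the testdb copy holds only the Dutch customers and their orders, plus the complete countries table, so every foreign key in the subset still resolves.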
DATPROF Privacy is a rule-based data masking solution with native support for Oracle, SQL Server, PostgreSQL, MySQL, IBM DB2 and MariaDB. It can, in theory, support any other data source via a processing engine (which is to say, one of the aforementioned databases) and it can mask data stored in a variety of formats, including CSV and XML. Notably, it masks live data in-situ, meaning you never need to move or extract it for the purposes of masking. Masking rules can be customised or leveraged out of the box, and can be applied in a specific order by setting dependencies. The product masks consistently over all of your systems and applications, and it delivers meaningful audit reports on your data masking and subsetting actions.
DATPROF Privacy also provides the company’s synthetic data generation capability, compatible with all of the data sources listed above. The product provides a selection of replacement data candidates and algorithms out of the box, including logical generators, weighted lists, regular expressions, generators that leverage seed data, and more. You can also build your own, using custom database functions, multi-column seed files (for example, a correlated seed list) and “generator expressions” that allow you to combine other types of generator into a bespoke formula, among other things.
Synthetic data is generated directly in the database, in a uniform fashion for all major databases, and either during or after masking depending on whether you want to add data to your subset or replace data that’s already there. Data is created against “generation sets” of tables, with each column in the table assigned one of the generators described above. Various configuration options are available on each column, including the percentage of null values to generate. Generated values can be combined to create a fully synthetic data set (for example, concatenating first and last names to get full names), columns can be earmarked to generate simultaneously in order to preserve correlations, and foreign key relationships can be discovered and included in your generated data automatically.
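The generator concepts above (weighted lists, correlated seed rows, a null percentage per column, and values combined from other generated values) can be illustrated with a short Python sketch. DATPROF generates data directly in the database, so this is only an analogy in plain Python. The seed values, column names and the null_pct parameter are hypothetical.

```python
import random

# Weighted lists: each value maps to a relative weight.
FIRST_NAMES = {"Anna": 3, "Daan": 2, "Lucas": 1}
LAST_NAMES = {"de Vries": 2, "Bakker": 1}
# Correlated seed rows: city and postcode are always drawn together.
CITY_POSTCODE = [("Groningen", "9711"), ("Utrecht", "3511")]


def weighted_choice(rng, weighted_values):
    """Pick one key from a {value: weight} mapping."""
    values = list(weighted_values)
    weights = list(weighted_values.values())
    return rng.choices(values, weights=weights)[0]


def generate_rows(count, null_pct=0.0, seed=0):
    """Generate synthetic rows; null_pct is the share of rows with no email."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    rows = []
    for _ in range(count):
        first = weighted_choice(rng, FIRST_NAMES)
        last = weighted_choice(rng, LAST_NAMES)
        city, postcode = rng.choice(CITY_POSTCODE)  # preserves the correlation
        email = None
        if rng.random() >= null_pct:
            email = f"{first}.{last}@example.com".replace(" ", "").lower()
        rows.append(
            {
                "first_name": first,
                "last_name": last,
                "full_name": f"{first} {last}",  # combined from generated values
                "city": city,
                "postcode": postcode,
                "email": email,
            }
        )
    return rows
```

Drawing city and postcode as a pair is what keeps the two columns consistent with each other. Generating them independently would produce combinations that never occur in the seed data.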
Finally, DATPROF Runtime (see Figure 2) allows you to centrally configure, manage and monitor your test data user groups, their databases, and the masking and subsetting applications available to them. A REST API is provided to help facilitate this.
First of all, DATPROF excels in terms of ease of use. The various DATPROF products, although technically separate tools, feel like a single product that is exceptionally easy to work with. This means that you can discover, subset, mask and deliver your data easily and, more importantly, quickly. DATPROF Runtime is particularly notable for helping you to fit your test data processes into a CI/CD pipeline, and thus accelerate the delivery of your test data.
Since our last review, DATPROF has added a synthetic data generator to DATPROF Privacy, making the suite as a whole significantly more comprehensive. This capability is also rather robust, and even more so considering it is positioned as an ancillary capability to data subsetting and masking. We particularly like the attention paid to helping you carry relationships from your original data into your synthetic data. It is also demonstrably performant, and even automatically optimises its process flow to facilitate parallelisation.
In addition, it is clear there is still more to come from DATPROF’s synthetic data capability. Future updates will likely make it available as a standalone product, as well as offer greater synergy with DATPROF Analyze that could, for example, use its profiling capability to automatically populate your generators.
The Bottom Line
DATPROF is an easy-to-use, cost-effective, and well-rounded test data management solution. It may not offer every bell and whistle, but it includes the lion’s share, and much of what’s most useful at that. If you’re looking for a product that is streamlined and compact compared to the big boys in the space, yet perhaps just as useful if not more so, it is certainly worth a look.