DATPROF

Last Updated: 6th June 2024
Analyst Coverage: Daniel Howard

DATPROF is a Dutch company that develops test data management software solutions. It was founded in 1998, and has offices in Europe and North America. It has around 500 active users spanning approximately 100 clients, and its client base is concentrated in Europe but increasingly encompasses both North America and the Pacific regions. The company also boasts a greater than 95% customer retention rate.

DATPROF offers four products that are relevant to test data management: DATPROF Subset, DATPROF Privacy, DATPROF Analyze and DATPROF Runtime. Respectively, they provide data subsetting, data masking, data analysis and data provisioning. They can be licensed individually or as a whole and, in the latter case, work in concert to create an effective and easy-to-use test data management solution.

Company Info

Headquarters: Friesestraatweg 213b, 9743 AD Groningen, The Netherlands
Telephone: +31 (0) 50 - 571 03 05

DATPROF

Last Updated: 7th February 2024
Mutable Award: Gold 2024

What is it?

Figure 1 - DATPROF platform conceptual architecture

DATPROF provides a test data management platform that comprises five separate (but highly integrable) products. DATPROF Runtime forms the basis of the platform, containing the company’s core data provisioning and automation functionality. As such, it is necessary for the platform to function. Its sister products function more like modules: DATPROF Subset, DATPROF Privacy, DATPROF Analyze, and DATPROF Virtualize provide data subsetting, data masking and synthetic data generation, data profiling, and database cloning, respectively. Moreover, they can be stacked on top of DATPROF Runtime in a fully modular manner (and licensed accordingly). This is shown in Figure 1.

Customer quotes

“DATPROF helped improve our organization because as more of our development teams adopt the tool, we become faster at creating test data. The test data quality is also improving, resulting in us creating better products.”
Dutch Railroads

“Before DATPROF, we did not have the ability to scramble or mask data in our non-SAP applications… Today we are confident that our customers’ sensitive data is not in anybody else’s hands.”
BCS

What does it do?

Figure 2 - DATPROF Workflow test data automation pipeline

For starters, DATPROF Runtime – which, as already noted, is the basis of the DATPROF platform – allows you to centrally configure, manage and monitor your test data user groups, their databases, and the masking and subsetting applications available to them. It also allows you to deliver data to them automatically via DATPROF Workflow, a tool built into DATPROF Runtime that enables you to create automated test data pipelines visually, using a drag-and-drop interface. These pipelines can stretch from test data creation through to the delivery of test data to your testing teams, and are eminently readable and easy to both create and interact with. An example of such a pipeline is shown in Figure 2. Alternatively, a REST API is provided to do the same job, which may be preferable if you have other CI/CD tools to include in your test data pipelines.

DATPROF Subset is used to create subsets of your existing production data for testing purposes. They are generated using a single driver table as a starting point, with other tables included based on their relationship to that table. These relationships can be derived from existing database relationships or specified manually, and the process is assisted by intelligent suggestions for which table content should be included in full, as opposed to in part. The results can be visualised as either a data model or a process model. These are helpful for understanding your database’s structure, and thus how best to create your subset. Various validation techniques are provided to facilitate this process. Options exist to either completely refresh your test database or to append new test data cases to your existing data content, and duplicate data is handled appropriately while ensuring all constraints remain valid.
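To make the driver-table idea concrete, here is a minimal sketch of relationship-driven subsetting in plain Python. It is not DATPROF's implementation: the function name `build_subset`, the in-memory table layout, and the sample data are all assumptions made for illustration, and the sketch only follows relationships from parent to child, using the first relationship it finds for each table.

```python
from collections import deque


def build_subset(tables, foreign_keys, driver, driver_filter):
    """Select driver rows, then pull in the child rows that reference them.

    tables:       {table_name: [row_dict, ...]}  (hypothetical in-memory layout)
    foreign_keys: [(child_table, child_column, parent_table, parent_column), ...]
    """
    # Start from the filtered driver table.
    subset = {driver: [row for row in tables[driver] if driver_filter(row)]}
    queue = deque([driver])
    while queue:
        parent = queue.popleft()
        for child, child_col, parent_table, parent_col in foreign_keys:
            if parent_table != parent or child in subset:
                continue
            # Keep only child rows whose foreign key points at a selected parent row.
            keys = {row[parent_col] for row in subset[parent]}
            subset[child] = [row for row in tables[child] if row[child_col] in keys]
            queue.append(child)
    return subset


tables = {
    "customers": [{"id": 1, "country": "NL"}, {"id": 2, "country": "US"}],
    "orders": [
        {"id": 10, "customer_id": 1},
        {"id": 11, "customer_id": 2},
        {"id": 12, "customer_id": 1},
    ],
    "order_lines": [{"id": 100, "order_id": 10}, {"id": 101, "order_id": 11}],
}
foreign_keys = [
    ("orders", "customer_id", "customers", "id"),
    ("order_lines", "order_id", "orders", "id"),
]

# "customers" is the driver table; only Dutch customers are kept.
subset = build_subset(tables, foreign_keys, "customers", lambda r: r["country"] == "NL")
```

Because every child row is selected through its parent, the foreign keys in the resulting subset remain valid. A real tool would additionally need to handle tables with several parents, reference tables that are copied in full, and appending to existing test data.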

DATPROF Privacy is a rule-based data masking solution with native support for Oracle, SQL Server, PostgreSQL, MySQL, IBM Db2 (including Db2 on z/OS) and MariaDB. It can, in theory, support any other data source via a processing engine (which is to say, one of the aforementioned databases) and it can mask data stored in a variety of formats, including CSV and XML. Notably, it masks live data in situ, meaning you never need to move or extract it for the purposes of masking. Masking rules can be customised or leveraged out of the box, and can be applied in a specific order by setting dependencies. The product masks consistently over all of your systems and applications, and it delivers meaningful audit reports on your data masking and subsetting actions. Data profiling (and thus the discovery of sensitive data) is offered through DATPROF Analyze.
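Two of the ideas above, consistent masking and rules applied in dependency order, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of how DATPROF Privacy works internally: the keyed-hash lookup is a technique we have chosen for the example, and `SECRET`, `FIRST_NAMES`, `pick` and `apply_rules` are hypothetical names.

```python
import hashlib
import hmac
from graphlib import TopologicalSorter  # standard library, Python 3.9+

SECRET = b"demo-key"  # hypothetical masking key; keep it out of source control in practice
FIRST_NAMES = ["Anna", "Bram", "Chloe", "Daan", "Eva"]  # hypothetical replacement list


def pick(value, choices, secret=SECRET):
    """Deterministically map a value to a replacement: same input, same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]


def mask_first_name(row):
    row["first_name"] = pick(row["first_name"], FIRST_NAMES)


def mask_email(row):
    # Built from the already-masked first name, so this rule must run second.
    row["email"] = f"{row['first_name'].lower()}@example.com"


rules = {"first_name": mask_first_name, "email": mask_email}
# Each rule maps to the set of rules that must run before it.
depends_on = {"first_name": set(), "email": {"first_name"}}


def apply_rules(row):
    masked = dict(row)  # leave the caller's row untouched
    for rule_name in TopologicalSorter(depends_on).static_order():
        rules[rule_name](masked)
    return masked


crm_row = apply_rules({"first_name": "Jan", "email": "jan@corp.example"})
billing_row = apply_rules({"first_name": "Jan", "email": "j.jansen@mail.example"})
```

Since the replacement depends only on the original value and the key, "Jan" receives the same masked name in both rows, which is what allows masked records to keep matching across systems.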

DATPROF Privacy also provides the company’s synthetic data generation capability, compatible with all of the data sources listed above. The product provides a selection of replacement data candidates and algorithms out of the box, including logical generators, weighted lists, regular expressions, generators that leverage seed data, and more. You can also build your own, using custom database functions, multi-column seed files (for example, a correlated seed list) and “generator expressions” that allow you to combine other types of generators into a bespoke formula, among other things. Synthetic data is generated directly in the database, in a uniform fashion for all major databases, and either during or after masking depending on whether you want to add data to your subset or replace data that is already there. It is also demonstrably performant, and automatically optimises its process flow to facilitate parallelisation.

More specifically, synthetic data is created against “generation sets” of tables, with each column in the table assigned one of the generators described above. Various configuration options are available for each column, including the percentage of null values to generate. Generated values can be combined to create a fully synthetic data set (for example, concatenating first and last names to get full names), columns can be earmarked to generate simultaneously in order to preserve correlations, and foreign key relationships can be discovered and included in your generated data automatically.
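As a rough illustration of per-column generators, the sketch below combines a weighted list, a configurable percentage of nulls, and a derived column that concatenates other generated values. It is a simplified stand-in rather than DATPROF's generator engine; the helper names (`weighted_list`, `with_nulls`, `generate_rows`) and the sample values are assumptions.

```python
import random


def weighted_list(values, weights):
    """Generator that draws from `values` in proportion to `weights`."""
    def generate(rng):
        return rng.choices(values, weights=weights, k=1)[0]
    return generate


def with_nulls(generator, null_percentage):
    """Wrap a generator so that roughly `null_percentage` percent of values are None."""
    def generate(rng):
        if rng.random() * 100 < null_percentage:
            return None
        return generator(rng)
    return generate


def generate_rows(column_generators, derived_columns, count, seed=0):
    """Build `count` rows: one generator per column, then formulas over the row."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    rows = []
    for _ in range(count):
        row = {name: gen(rng) for name, gen in column_generators.items()}
        for name, formula in derived_columns.items():
            row[name] = formula(row)
        rows.append(row)
    return rows


column_generators = {
    "first_name": weighted_list(["Anna", "Bram", "Chloe"], [5, 3, 2]),
    "last_name": weighted_list(["de Vries", "Jansen", "Bakker"], [1, 1, 1]),
    "middle_initial": with_nulls(weighted_list(["A", "J", "M"], [1, 1, 1]), 30),
}
derived_columns = {
    # Combine generated values, as in the full-name example above.
    "full_name": lambda row: f"{row['first_name']} {row['last_name']}",
}

rows = generate_rows(column_generators, derived_columns, count=100)
```

Generating the derived column from values already in the row is the simplest way to keep related columns consistent with one another; correlated multi-column seed data would follow the same pattern, with one draw filling several columns at once.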

Finally, DATPROF Virtualize, the newest addition to the DATPROF product catalogue, offers database virtualisation, cloning, and snapshotting. This technology allows you to rapidly create personalised, containerised copies – clones – of entire production databases and distribute them to your testers individually. It is possible to do this without consuming huge volumes of disk space (and without breaking the bank on storage costs) because each clone is very small compared to the original database: only the deltas (the ways in which each clone differs from that database) are stored. This technique allows your testers to work with entire databases, rather than subsets or synthetic data sets, if they so choose. This has clear advantages in terms of coverage, although any sensitive data will still require masking. It also allows your testers to modify their test data on an ad hoc basis whenever they feel it is required, since each tester is given their own clone to work with, and to quickly refresh their clone when new test data is available. The product also lets you take snapshots of your test data, which can then be used for performing comparisons (for instance, before and after a test run) and rapid rollbacks.
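The reason delta-based clones are so cheap can be shown with a toy copy-on-write model. This is a conceptual sketch only: real database virtualisation works at the storage or block level, and the `Clone` class, its methods, and the sample data below are hypothetical.

```python
_DELETED = object()  # marker for rows removed in a clone


class Clone:
    """A lightweight view over a shared base data set that stores only its own changes."""

    def __init__(self, base):
        self._base = base      # shared by all clones, never modified
        self._delta = {}       # key -> new value, or _DELETED
        self._snapshots = {}   # snapshot name -> saved copy of the delta

    def get(self, key):
        # Prefer this clone's own change; otherwise fall through to the base.
        value = self._delta.get(key, self._base.get(key, _DELETED))
        if value is _DELETED:
            raise KeyError(key)
        return value

    def put(self, key, value):
        self._delta[key] = value

    def delete(self, key):
        self._delta[key] = _DELETED

    def snapshot(self, name):
        self._snapshots[name] = dict(self._delta)

    def rollback(self, name):
        self._delta = dict(self._snapshots[name])

    def delta_size(self):
        return len(self._delta)


production = {1: "Alice", 2: "Bob", 3: "Carol"}
tester_a = Clone(production)
tester_b = Clone(production)

tester_a.snapshot("before_test")
tester_a.put(2, "Robert")  # visible to tester_a only
tester_a.delete(3)
changed_value = tester_a.get(2)
changes_stored = tester_a.delta_size()
tester_a.rollback("before_test")
```

Each clone costs only as much storage as the changes made to it, snapshots are copies of that small delta, and a rollback simply swaps the delta back, which is why all three operations can be fast even against a very large base.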

Why should you care?

First of all, DATPROF excels in terms of ease of use. The various DATPROF products, although technically separate tools, feel like a single product that is exceptionally easy to work with. This means that you can discover, subset, mask, clone, and deliver your data easily and, more importantly, quickly. DATPROF Runtime and DATPROF Workflow are particularly notable for helping you build an automated testing pipeline easily and visually, thus accelerating the creation and delivery of your test data sets. We also like the synthetic data generation capabilities built into DATPROF Privacy, and we are particularly impressed by the attention paid in helping you carry relationships from your original data into your synthetic data.

Moreover, DATPROF has matured into an impressively comprehensive solution – indeed, a comprehensive platform – for test data management. With the addition of database virtualisation capabilities in DATPROF Virtualize, it now offers every major feature one could expect from a test data management solution. Few vendors in the space can say the same, and the fact that the DATPROF platform is highly modular in its licensing and deployment only sweetens the deal it is offering.

It is also worth noting that despite this newfound maturity, DATPROF is by no means resting on its laurels: future updates are planned to add NoSQL masking support (specifically, support for MongoDB), improved support for cloud databases, the ability to mask data in SaaS environments (using REST APIs), and more. Even generative AI is in the pipeline, although we are told that at this point it is only in the experimental stages.

The Bottom Line

DATPROF’s aim is to create an easy-to-use, highly modular test data platform that provides secure, compliant test data in a minimal time window and with minimal storage costs. Not only does it achieve this goal, it does so with aplomb, now incorporating every major technique for creating test data sets. We most certainly approve.

