GenRocket

I have written several times in the past about Grid-Tools and its test data management (TDM) solution, and I have extolled the product not least because it was the only one on the market that supported synthetic test data generation. This approach has three advantages: a) it is very agile, b) it requires minimal involvement from database administrators and c) it is the ultimate in data masking, because there is no sensitive or personally identifiable information in synthetic data.

However, Grid-Tools is no longer alone. A Boston-based (US not UK) company called GenRocket launched its eponymous TDM solution in November, following a lengthy beta programme, and this too generates synthetic data (subsetting is an option). The company is currently small and is self-funded.

The product has an interesting, tiered architecture. Put simply, this consists of projects, domains, attributes, generators, receivers and scenarios. The most interesting of these are the generators and receivers: currently 63 of the former and 15 of the latter are delivered out of the box. The generators do exactly what the name suggests: they generate names, postal codes, phone numbers and so on, and there are more advanced options such as subsetting an existing database. The receivers create the XML, JSON, SQL or other appropriate code needed to actually create the required data.

This is neat: it means that the data definition is logically separated from the generation of the data and that, in turn, means that you can potentially generate synthetic data for any environment: you aren’t limited by the architecture to any particular type of database or file structure. The very fact that you can have a receiver that produces SQL and another that produces JSON illustrates this. Indeed, this is a major differentiator. Traditional TDM vendors like Grid-Tools, Informatica, IBM and Net-2000 tend to focus exclusively on generating data for relational databases (plus maybe IMS), so being able to generate XML or JSON is a significant potential benefit for GenRocket.
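To make that separation concrete, here is a minimal sketch in Python. It is not GenRocket's API: every name in it (`name_generator`, `postal_code_generator`, `CUSTOMER`, `sql_receiver`, `json_receiver`) is hypothetical and chosen purely for illustration. It shows only the general idea that values are generated once and then handed to interchangeable receivers that decide the output format.

```python
import json
import random

# Hypothetical generators: each returns one attribute value per call.
def name_generator(rng):
    return rng.choice(["Alice", "Bob", "Carol"])

def postal_code_generator(rng):
    # Five-digit, zero-padded code; purely illustrative.
    return f"{rng.randint(0, 99999):05d}"

# A hypothetical "domain": attribute names mapped to their generators.
CUSTOMER = {"name": name_generator, "postal_code": postal_code_generator}

def generate_rows(domain, count, seed=0):
    """Produce `count` synthetic rows, with no knowledge of the output format."""
    rng = random.Random(seed)
    return [{attr: gen(rng) for attr, gen in domain.items()} for _ in range(count)]

# Hypothetical receivers: each turns the same rows into a different format.
def sql_receiver(table, rows):
    statements = []
    for row in rows:
        columns = ", ".join(row)
        # Double any single quotes so the literal stays valid SQL.
        values = ", ".join("'" + str(v).replace("'", "''") + "'" for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(statements)

def json_receiver(rows):
    return json.dumps(rows, indent=2)

rows = generate_rows(CUSTOMER, 3)
sql_output = sql_receiver("customer", rows)
json_output = json_receiver(rows)
print(sql_output)
print(json_output)
```

The point of the design is that adding a new target (XML, say) would mean writing one more receiver, without touching the generators or the domain definition.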

So: nice. However, there is a downside. For the moment (no doubt this will be rectified in due course) you have to profile source databases manually. The company has plans to pull in schemas and so forth, but this is only a partial solution because there are always relationships in the data that are not formally defined in the metadata. You are therefore probably going to need a data profiling and discovery tool in addition to GenRocket if you want to get up and running quickly.

Generally speaking, the buzz is that more companies are expressing interest in synthetic test data generation, which is good news for both GenRocket and Grid-Tools, at least in theory. On the other hand, if that’s true then the big boys in this market are likely to want to play. That will expand the market further but will also introduce more competition. Anyway, for the time being at least, the introduction of GenRocket means that you have a choice if you are interested in synthetic data.