GenRocket
Analyst Coverage: Philip Howard and Daniel Howard
GenRocket is a private, venture-backed software company focused on test data management and particularly synthetic data generation. It was founded in February 2012, and is based in Ojai, CA.
GenRocket is a platform for enterprise-level synthetic test data generation. It is not technically a pure-play vendor – some data masking capabilities are offered in addition to synthetic data generation – but it’s fair to say that the product is extremely focused on generating high-quality synthetic data.
Last Updated: 17th June 2019
The product can be deployed either on-premises or in the cloud via AWS. Multiple instances can be deployed in parallel via its ‘partition engine’, enabling the creation of millions or even billions of rows of synthetic data in a matter of minutes. It integrates with a variety of third-party testing tools that provide additional capabilities such as pairwise testing, and it exposes REST and runtime APIs that allow it to integrate with many additional products and frameworks. A few of these can be seen in Figure 2, along with a variety of use cases and an overview of GenRocket’s architecture.
Customer Quotes
“With GenRocket and the expertise that the GenRocket team brought from years of experience in the software testing world, we were able to come up with a method to streamline our testing from days and weeks to a matter of hours.”
Solium
“The GenRocket platform is revolutionary – it replaces manual test data generation with a fully automated process that turns dummy data into intelligent data. And because there is no other test data management solution on the market matching its level of price/performance, we can offer GenRocket to any customer regardless of project size.”
QA Mentor
Generating synthetic data in GenRocket requires you to create a model for your test data, consisting of domains, attributes, generators, receivers and scenarios. Domains are roughly equivalent to database tables: you might have a ‘user’ domain, for example. Domains have attributes, which are similarly analogous to columns. In the aforementioned example, the user domain might have attributes for first name, last name, date of birth, and so on.
Each attribute is equipped with one or more generators, which are methods for generating synthetic data. Out of the box, 222 generators are provided across a wide variety of categories. Some of these are straightforward, such as NameGen, which generates names. Others are more complicated, but also more powerful, such as ConcatGen, which concatenates the values of other attributes; SwitchGen, which acts as a switch statement; and EdgeCaseGen, which generates and inserts edge cases into your test data. All have parameters that allow for data customisation: for example, NameGen can be configured to generate any combination of first, last, male or female names. Notably, GenRocket provides several generators specifically to support machine learning via the creation of training data. It can also blend production data with synthetic data, and generate data for data feeds.
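To make the relationship between domains, attributes and generators concrete, here is a minimal Python sketch of the idea. It is not GenRocket’s API: the `Domain` and `Attribute` classes, the `name_gen` and `concat_gen` helpers and the name lists are all hypothetical stand-ins, loosely modelled on the behaviour described above for NameGen and ConcatGen.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for the concepts described above; not GenRocket's API.
# A generator receives a random source and the partially built row.
Generator = Callable[[random.Random, Dict[str, str]], str]

FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Torvalds"]


def name_gen(kind: str) -> Generator:
    """Build a generator that picks a first or last name (assumed NameGen-like)."""
    pool = FIRST_NAMES if kind == "first" else LAST_NAMES
    return lambda rng, row: rng.choice(pool)


def concat_gen(*attrs: str, sep: str = " ") -> Generator:
    """Build a generator that joins already generated attributes (assumed ConcatGen-like)."""
    return lambda rng, row: sep.join(row[a] for a in attrs)


@dataclass
class Attribute:
    name: str
    generator: Generator


@dataclass
class Domain:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def generate(self, count: int, seed: int = 0) -> List[Dict[str, str]]:
        rng = random.Random(seed)  # seeded so runs are repeatable
        rows = []
        for _ in range(count):
            row: Dict[str, str] = {}
            for attr in self.attributes:  # evaluated in declaration order
                row[attr.name] = attr.generator(rng, row)
            rows.append(row)
        return rows


# A 'user' domain with three attributes, as in the example in the text.
user = Domain("user", [
    Attribute("first_name", name_gen("first")),
    Attribute("last_name", name_gen("last")),
    Attribute("full_name", concat_gen("first_name", "last_name")),
])
rows = user.generate(3)
```

Because attributes are evaluated in order, the concatenating generator can only refer to attributes declared before it; a real tool would presumably resolve such dependencies for you.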
Generators can be linked together on an ad hoc basis, allowing you to chain the output from one generator into the next. Any number of generators can be linked in this fashion, and sequences of linked generators can be saved as ‘presets’. When you first create an attribute, an appropriate generator is automatically assigned to it, if one is available. Furthermore, many generators are designed to create data that is internally consistent. For example, if you have an ‘address’ domain with attributes for street, city, state, and so on, the generators for each of those attributes will communicate automatically in order to create addresses which could actually exist – you might get addresses in ‘Boston, Massachusetts’, but never in ‘Boston, Hawaii’.
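The two ideas in this paragraph, chaining generators and keeping related attributes consistent, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of how GenRocket implements either feature: `chain`, `email_preset`, `CITIES_BY_STATE` and `address` are hypothetical names, and the city lists are a tiny hand-written sample.

```python
import random
from typing import Callable, Dict, List

Step = Callable[[str], str]


def chain(steps: List[Step]) -> Step:
    """Compose steps so that each one receives the previous step's output."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


# A saved, reusable sequence of linked steps, similar in spirit to a 'preset'.
email_preset = chain([
    str.strip,
    str.lower,
    lambda s: s.replace(" ", "."),
    lambda s: s + "@example.com",
])

# Internal consistency (assumed approach): choose the state first, then pick
# the city only from that state's cities, so 'Boston, Hawaii' cannot occur.
CITIES_BY_STATE: Dict[str, List[str]] = {
    "Massachusetts": ["Boston", "Worcester"],
    "Hawaii": ["Honolulu", "Hilo"],
}


def address(rng: random.Random) -> Dict[str, str]:
    state = rng.choice(sorted(CITIES_BY_STATE))
    city = rng.choice(CITIES_BY_STATE[state])
    return {"city": city, "state": state}
```

For example, `email_preset(" Ada Lovelace ")` returns `"ada.lovelace@example.com"`, and every address produced by `address` pairs a city with the state it belongs to.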
In addition to attributes, each domain has access to a variety of different receivers which determine the output format of your test data. For example, if a domain has an XML receiver then test data generated from it will be output as an XML file. Receivers exist for more than 44 different output formats, including CSV, JSON, SQL, REST, SOAP and the aforementioned XML, and you can generate test data in several different formats simultaneously by attaching multiple receivers to a given domain.
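As a rough illustration of the receiver concept, the Python sketch below sends the same generated rows to three formats at once using only the standard library. The function names (`csv_receiver`, `json_receiver`, `xml_receiver`) and the element naming in the XML output are assumptions made for this example, not GenRocket’s receivers.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

Rows = List[Dict[str, str]]


def csv_receiver(rows: Rows) -> str:
    """Render rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def json_receiver(rows: Rows) -> str:
    """Render rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def xml_receiver(rows: Rows, domain: str = "user") -> str:
    """Render rows as XML, one element per row (element names are assumed)."""
    root = ET.Element(domain + "s")
    for row in rows:
        item = ET.SubElement(root, domain)
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


# Attaching several receivers to one domain yields several formats at once.
rows = [{"first_name": "Ada", "last_name": "Lovelace"}]
receivers = {"csv": csv_receiver, "json": json_receiver, "xml": xml_receiver}
outputs = {fmt: receive(rows) for fmt, receive in receivers.items()}
```

The point of the design is that the data model stays the same while the output format is a pluggable choice made per domain.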
Finally, scenarios are sets of instructions for generating your test data, created by combining all of the above information into a single specification. You can configure the quantity of test data to create, and also include multiple domains in a single scenario chain. Moreover, you can automatically include multiple domains by creating relationships between them – either parent-child or sibling-sibling – which also guarantees referential integrity between those domains. Once you have created your scenario, it can be used (and reused) to create test data either centrally or on a local machine. The former is assisted by the GenRocket Multi User Server (GMUS), which allows large numbers of users to generate synthetic data centrally and simultaneously. In addition, scenarios can be modified in real time using the GenRocket API (perhaps as part of a dynamic testing workflow).
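The referential integrity point is easiest to see in code. The Python sketch below generates a parent domain and a child domain together so that every child row refers to a parent that exists. It is a simplified illustration only: `run_scenario`, the `user` and `order` domains and their fields are hypothetical, and GenRocket’s patented approach is not described in this report.

```python
import random
from typing import Dict, List


def run_scenario(parent_count: int, children_per_parent: int,
                 seed: int = 0) -> Dict[str, List[dict]]:
    """Generate a parent domain and a child domain in one pass.

    Child rows are created while iterating over the parents, so each
    foreign key is guaranteed to point at an existing parent row.
    """
    rng = random.Random(seed)  # seeded so the scenario can be rerun identically
    users = [{"id": i, "name": f"user_{i}"} for i in range(1, parent_count + 1)]
    orders = []
    order_id = 1
    for user in users:
        for _ in range(children_per_parent):
            orders.append({
                "id": order_id,
                "user_id": user["id"],  # foreign key to the parent
                "amount": rng.randint(1, 500),
            })
            order_id += 1
    return {"user": users, "order": orders}


data = run_scenario(parent_count=3, children_per_parent=2)
```

Because the quantities and the seed are parameters, the same scenario can be reused to produce a small data set on a local machine or a much larger one elsewhere.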
While you can create your model manually, this could become laborious, especially for large systems. Fortunately, GenRocket provides a feature called XTS (Extract Table Schema) that will scan a database and automatically build out an appropriate test data model, including domains, attributes, generators and scenarios. Relationships between domains are established via a wizard, and you can proceed to either use or customise the generated model as you require. Similarly, GenRocket can automatically generate synthetic data for documents by leveraging a relevant document’s XSD file.
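The general idea of deriving a test data model from a database schema can be sketched with Python’s built-in `sqlite3` module. This is not how XTS works internally, which this report does not describe: the `extract_model` function, the `DEFAULT_GENERATOR` mapping and the generator names in it are assumptions made purely for illustration.

```python
import sqlite3
from typing import Dict

# Hypothetical mapping from a column's declared type to a default generator name.
DEFAULT_GENERATOR = {
    "INTEGER": "sequence",
    "TEXT": "random_string",
    "REAL": "random_float",
}


def extract_model(conn: sqlite3.Connection) -> Dict[str, dict]:
    """Scan a SQLite schema and propose a domain for each table."""
    model: Dict[str, dict] = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        model[table] = {
            "attributes": {
                col[1]: DEFAULT_GENERATOR.get(col[2].upper(), "random_string")
                for col in columns
            },
            # child column -> parent table, i.e. a parent-child relationship
            "parents": {fk[3]: fk[2] for fk in fks},
        }
    return model


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES user(id),
        amount REAL
    );
""")
model = extract_model(conn)
```

Here each table becomes a domain, each column becomes an attribute with a default generator, and declared foreign keys suggest the relationships you would then confirm or adjust.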
GenRocket’s most important advantage is its position in the market. Most synthetic data offerings come as one part of a broad test data management platform. Such platforms are expensive and take time to implement, which is often far from ideal if you are only interested in synthetic data generation. GenRocket’s small size and laser focus on synthetic data mean that it avoids this problem. Likewise, if you have a traditional test data management solution which does not offer synthetic data generation, adding GenRocket to your existing solution will likely be far easier than moving to a new platform. This is enhanced by the range of integration capabilities offered by GenRocket, including the APIs it exposes and the variety of output formats it offers.
GenRocket also boasts some significant synthetic data capabilities that few other products share, including parent-child-sibling referential integrity (for which it holds a patent) and distributable and reusable test data generation (via scenarios) that can be performed either centrally or on a local machine. A notable omission is data profiling, the absence of which may limit your ability to create datasets that are representative of your production data. On the other hand, representative data can sometimes be a false ideal: for example, you will still want to test edge cases even if your production data doesn’t contain any.
The Bottom Line
Whether you need a traditional data subsetting solution, a synthetic data solution, or both, will depend on your circumstances and requirements. However, if you are interested in synthetic data, you should certainly be considering GenRocket.