ERwin runs free - just in time to take on the GDPR problem, perhaps

A press release has just come across my desk, announcing that a private equity firm, Parallax Capital Partners, has acquired the ERwin data modelling software from CA Technologies. The new company is ERwin, Inc. That’s interesting for data scientists, as ERwin is a leading modelling product with a long history. Its new CEO is Adam Famularo, and the implications of the change will doubtless become clear as the acquisition beds in. As our founder, Robin Bloor (also Chief Analyst & Cofounder of The Bloor Group) says, “The longevity and endurance of the ERwin brand is an exceptional story in itself. That ERwin has been spun out to become an independent operation as part of ERwin, Inc. is good news for data modeling and will doubtless accelerate the evolution of its industry-leading product into Big Data”. What I am less sure of is why CA Technologies wanted to get rid of ERwin: it seems a good fit with CA’s DevOps and test data management tools, as Philip Howard has pointed out to me, and I agree. CA Technologies tried to sell ERwin to Embarcadero a couple of years ago, and Embarcadero was very keen, but in the end the sale was disallowed under anti-competition regulation in the USA.

It is perhaps a good time to develop and revitalise a data modelling tool. I’ve been a huge fan of data modelling for years. If you don’t understand the semantics (the meaning) of your data and the relationships between data items, you can’t manage your data, and that means you can’t manage a key resource for any company. The best way to build this understanding is by producing visual models of your data. And by “models” I mean more than diagrams: I mean living data models with built-in completeness and consistency checks, links to live databases, and so on.

Data modelling has always been important, but I think it is about to move into the spotlight again. Various privacy regulations require you to know what personal data you hold, to be able to find all of it, and to reliably action the access and update requests made by data subjects. In the EU, this is the GDPR (General Data Protection Regulation) issue, and if you can’t demonstrate a capability that will allow you to implement the requirements of this regulation, you face the possibility of some fairly hefty fines. This is simply an aspect of having (and being able to demonstrate) good governance over an automated organisation.

The big issue companies face at the moment is what is (misleadingly) called “unstructured data” or “big data”: stuff that isn’t in relational databases but often sits in a corporate “data lake” of doubtful provenance, frequently sourced from uncontrolled third-party Web sources that the IT group may not even be aware of. The data protection regulations apply to this stuff if it is “personal data”, but if you don’t know what it is or where it comes from, how do you know whether it is personal data or not? If it is, you need to be able to track it through your systems, match any explicit processing permissions granted by the data subjects against the processing you actually do, and so on. If a data subject discovers that you hold this data before you have it under control, the result could be embarrassing at best and very expensive at worst. Luckily, this data does have structure, which means that processing it can be automated (you don’t have to sort through it by hand), provided you can understand and use that structure; and traditional relational data analysis probably won’t help much with that. This is where next-generation data modelling comes in.

This is why I was interested in this quote in the press release from ERwin’s new CEO: “I am excited to be leading the ERwin, Inc. team and expanding our vision into the Big Data market. We expect to extend and expand ERwin’s leadership in new technology segments to meet market demands for Big Data mastery”. Over the next few months we will see whether this represents genuine innovation or simply words (it’s a bit early to delve too deeply now).

I think that bringing order to Big Data with modelling is a huge opportunity for data analysis generally and for ERwin in particular. There is an immediate need for this in the context of data privacy regulations, but it is also essential to getting full value out of a Big Data resource (and innovation is needed here, because relational analysis isn’t enough). I will be watching the new ERwin’s progress in delivering this (it won’t be trivial) with great interest.