Simulating behaviour to replace legacy - a constraint-driven approach to legacy modernisation

Content Copyright © 2014 Bloor. All Rights Reserved.

Some years ago I wrote about an interesting way of building legacy-replacement and other systems. In essence, this was a technology that reduced an application (legacy or even new-build) to a set of behaviours and then reproduced those behaviours using new technology, which could be enhanced with new behaviours on demand. If you can simulate the behaviour of a legacy system accurately enough that any conceivable input produces the same outcome as the legacy system would deliver, and if the simulation is maintainable (so it can cope with new or changed inputs) and sufficiently performant, then the simulation can replace the legacy system, even if you don’t understand the code in the legacy system and the simulation uses entirely new technology. Of course, that also implies that the replacement reproduces any mistakes built into the legacy system, which may not be acceptable in practice (although the simulation and verification process may itself provide valuable insights into such issues).
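
Verification in this style amounts to replaying captured legacy traffic against the candidate replacement. A minimal Scala sketch (with hypothetical types and names; this is not Erudine’s actual API) might look like this:

```scala
// Hypothetical sketch: replay captured legacy interactions against the
// candidate replacement and report every input whose outcome diverges.
case class Interaction(input: String, legacyOutcome: String)

def divergences(replacement: String => String,
                captured: Seq[Interaction]): Seq[Interaction] =
  captured.filterNot(i => replacement(i.input) == i.legacyOutcome)

// An empty result means the replacement matched the legacy behaviour for
// every captured interaction; a guarantee only as strong as the capture
// period, which is exactly the problem discussed next.
```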

The big issue is “any conceivable input”, of course – if you capture the behaviour of the live system over a particular period of time, as you would, how can you be sure that the period includes all behaviours? Regression testing (showing that nothing has changed, except in ways intended) is always the potential killer. Nevertheless, Martin and Phil Rice came up with a technology using this approach, called the Erudine Behaviour Engine, which seemed to work in difficult cases where other approaches had failed.

Unfortunately, the original Erudine venture failed – not because the technology itself failed but, allegedly, because some large organisations feel that paying small companies in tough times is optional. This is probably an object lesson for small developers with clever ideas: be careful about becoming too dependent on anyone much bigger than you are, and with more lawyers.

This original Erudine technology, however, is still available from Erudine Financial, which bought the IP. It still seems to work well, although it now seems to be focussed more on building systems to meet compliance rules. ‘Legacy replacement’, of course, covers a wide spectrum – the process is to mine the legacy system for requirements, use-cases and corresponding tests, and then to build modernised systems from them; but the process can mine requirements from regulations too…

However, the Rices, the people with the vision behind the original technology, now operating as “The Agile Consultancy”, have produced what they see as an improvement on the original approach. Instead of building an all-singing, all-dancing solution for implementing enterprise requirements, they have taken the essence of their approach and released it as an Open Source project. They can now concentrate on developing the core idea, which is a version of formal constraint-driven development; see, for example, Constraint-driven development by K. Lano of the Department of Computer Science, King’s College London: “constraints, together with UML class diagrams and state machines, can be used as a precise and platform-independent specification language”. The Agile Consultancy’s less formal, less powerful, but faster and easier-to-use version of constraint-driven development is then embedded in Open Source frameworks such as Eclipse in order to fix real-world problems.

This Open Source core framework (which is neither UML-based nor really platform-independent, but that won’t worry many people) is called CDD, and it is written in Scala. It seems particularly appropriate for moving legacy applications to a Cloud environment, according to Martin Banks (see here).

The fundamental problem CDD addresses is the need to integrate the legacy applications that many businesses still depend on with more modern cloud and mobile environments. In many cases, the source code for these applications may have been lost or be unreliable (having source code that doesn’t actually correspond to what is in production is possibly even worse than having no source code at all, because using it leads to unexpected changes in behaviour). And the original ‘requirements’ for these legacy applications may be in the heads of people who have left the organisation, or even life itself.

One approach to this issue is to ‘wrapper’ well-structured legacy as a service with a well-defined API, but that has maintenance implications if you ever want to change the internal behaviour of the service, and it can increase complexity if the legacy is poorly structured (highly coupled and not very cohesive; that is, if it has a complex interface and can’t easily be broken up into modules that each do just one thing).
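
For illustration, a minimal Scala sketch of the wrapper approach (the service name and legacy call are hypothetical) shows why the API can stay stable while the adapter absorbs the legacy detail:

```scala
// Hypothetical sketch of wrapping legacy as a service: modern clients see
// only this narrow, well-defined API.
trait AccountService {
  def balance(accountId: String): BigDecimal
}

// The adapter hides however the legacy system is actually reached (screen
// scraping, a message queue, a mainframe bridge); only this class needs to
// change if the legacy internals do.
class LegacyAccountAdapter(legacyCall: String => String) extends AccountService {
  def balance(accountId: String): BigDecimal =
    BigDecimal(legacyCall(s"BAL $accountId").trim)
}
```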

Re-writing legacy in well-structured, modern, standards-based technology is preferable in the longer term – as long as you can cope with the overheads of re-modelling and validating the requirements and of regression testing the results to show that nothing has changed; and with the overheads of getting the regulators and your customers to accept any changes that result from correcting errors in the legacy logic. A customer will be delighted if an error-correction results in, say, lower charges – and will then ask about refunds for the over-charges made by the legacy system, and possibly try to get any relevant regulators involved; if this possibility isn’t anticipated and allowed for, you may have problems.

CDD seems to address the requirements problem by, in effect, mining the legacy system for ‘requirements’, seen as constraints on the behaviour of the system (although, in an ideal world, there would still be a benefit in building and re-validating a platform-independent business requirements model, if you could afford it, because not all legacy actually does exactly what the business wants). A simple constraint model, for example, might be used to classify system users on entry into poor people (with tightly constrained, low savings), rich people (unconstrained savings above a certain point) and rejected users (a constraint that, say, people under 16 can’t use the system).
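
That classification example can be made concrete. The sketch below is plain Scala, not the actual API of the CDD project; the savings threshold is an invented figure, and constraints are simply tried in order, with the first match winning:

```scala
// Hypothetical sketch, not the CDD project's actual API: each constraint is
// a condition paired with the classification it implies.
sealed trait Classification
case object Rejected extends Classification
case object Poor extends Classification
case object Rich extends Classification

case class User(age: Int, savings: BigDecimal)

// Constraints are tried in order; the first one that matches wins.
// The 100,000 threshold is an invented figure for illustration.
val constraints: Seq[(User => Boolean, Classification)] = Seq(
  ((u: User) => u.age < 16, Rejected),                    // under-16s can't use the system
  ((u: User) => u.savings >= BigDecimal(100000), Rich),   // unconstrained savings above a point
  ((_: User) => true, Poor)                               // everyone else: low, constrained savings
)

def classify(u: User): Classification =
  constraints.collectFirst { case (matches, c) if matches(u) => c }.get
```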

Constraint-driven development also helps address the regression-testing problem, because constraints are specified as test-cases: an inherently simpler way of looking at behaviour, and one that is easier for business users to validate (as the sketch below illustrates). I don’t see CDD as any sort of ‘magic bullet’ (the gap between the ideal business requirements and the actual behaviour of the code, and the regression-testing issue, are both still real), but it does seem, to me, to offer a very useful way forward. I think that small-to-medium enterprises (particularly) should at least look at CDD as part of a legacy modernisation approach.
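
Reusing the types from the previous sketch, constraints-as-test-cases might look like the following (again hypothetical, not the CDD project’s actual API): each scenario pairs an input with the outcome the business has validated.

```scala
// Hypothetical sketch, reusing User, Classification and classify from the
// sketch above: each scenario is a constraint written as a test case.
case class Scenario(description: String, user: User, expected: Classification)

val scenarios = Seq(
  Scenario("under-16s are rejected", User(age = 15, savings = BigDecimal(0)), Rejected),
  Scenario("large savers are unconstrained", User(age = 40, savings = BigDecimal(250000)), Rich),
  Scenario("everyone else is constrained", User(age = 40, savings = BigDecimal(500)), Poor)
)

// Re-running the scenarios after any change is the regression test: a
// failure names exactly which validated behaviour has changed.
val failures = scenarios.filterNot(s => classify(s.user) == s.expected)
assert(failures.isEmpty, s"behaviour changed: ${failures.map(_.description).mkString(", ")}")
```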

However, Martin Rice says: “we are finding that large companies and organisations that we dealt with when with Erudine, who were very much in the ‘we buy gold plated solutions from IBM’ set, are now talking very differently and seriously looking at Open Source in order to make cost reductions. It will be interesting to see how this transpires in actual use rather than just in statements of intent”. It will indeed, especially as Martin tells me that these people are seeing Open Source as reducing lifetime ownership cost and avoiding lock-in, rather than as just ‘free software’ (there’s no such thing as free software, of course – it all has a cost of ownership, including licence management, even if the upfront acquisition cost is zero).