The issue with testing today - Modern systems are so complex that testing may do little more than engender a false sense of security

Content Copyright © 2016 Bloor. All Rights Reserved.
Also posted on: The Norfolk Punt

I think we need to mention a few caveats first. We are not saying, in this blog, that conventional testing tools and techniques are useless (although if, as some people suggest, the majority of development shops still test manually, that is a serious problem), just that systems (especially in the Mutable Enterprise) are becoming so complex that conventional test automation solutions won’t be enough to deliver adequate confidence in the whole system.

To provide some context, fellow analyst David Norris has said: “The reality in an awful lot of circumstances, with major FTSE 100 companies, is that they outsource the testing, they arm the testers with all the ‘best of breed’ testing tools, but the security team, because of the silos in which we work, does not allow the free exchange of data across the firewalls; so whilst senior management think all is rosy because they have invested in the tools, the outcome is sadly still inadequate” – because only parts of the system have been tested, not the behaviour of the system as a whole. “It stays like that,” David continues, “until a crisis exposes the reality. I have come to the conclusion that all large companies are now too complex for the minds of mere mortals to manage in any sensible fashion”.

He also points out that the October 2016 crash in the value of the pound (which cost some people money, even if some traders did well out of it) may well have been caused by the unanticipated behaviour of trading algorithms in unusual circumstances. Hardly a conventional coding bug, but a significant system defect nonetheless – especially if there is any possibility of regulators deciding that “due care and attention” wasn’t applied – and one that should have been tested for. All testing is, effectively, risk management; but you need to be able to quantify the risk and its scope in order to manage it.

I was talking to David Norris recently about the brave new world of the Mutable Enterprise and its consequences for Agile development. This world is very much more complex than the one we are used to, and the one for which our testing methods were developed; it contains many more things that we wish to automate, doing ever cleverer things, talking to other things in an asynchronous manner – and having to deal with their own mistakes.

“Obviously”, in this world, adequate testing and effective defect removal are going to be important; and “obviously” the future lies with automated testing. The trouble is that when one says something is “obvious”, that usually leads one to make dangerous assumptions.

What if I suggested that “fully adequate” testing, even in the conventional sense, is infeasible in practice (just think of running all possible test cases in all possible combinations in all possible environments in anything other than a trivial system), and that the complexity of modern online systems makes accepted testing tools, in effect, all but useless? Well, perhaps that is overstating it – unit testing still has real value – but although unit testing and its kin may be necessary, they are almost certainly insufficient, and we need a radically different approach to testing.
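
To put some rough numbers on that infeasibility, here is a back-of-the-envelope sketch; the counts are invented purely for illustration, not drawn from any real project.

```python
# Back-of-the-envelope arithmetic with invented numbers: exhaustive testing
# blows up combinatorially long before a system becomes "complex".
from math import comb, factorial

test_cases = 50      # hypothetical independent test cases
environments = 6     # hypothetical browser/OS/deployment combinations

# Just covering every *pair* of interacting test cases in every environment:
pairwise_runs = comb(test_cases, 2) * environments
print(f"pairwise coverage: {pairwise_runs:,} runs")       # 7,350 runs

# Running a mere dozen test cases in every possible order, once, in one environment:
print(f"orderings of 12 cases: {factorial(12):,} runs")   # 479,001,600 runs
```

Nothing about those numbers is realistic in detail, but the shape of the growth is the point: “all possible combinations” stops being a testing strategy very early.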

In the delivery cycles commonplace today, the amount of time given to testing is under increasing pressure; and with the increased complexity we face, that leaves many paths untested (even if you automate your testing), while the consequences of the interactions between those untested paths in a distributed, multi-layered application are unknown.

Test Packs

Let’s assume that you have adopted a proper, professional, automated TDD (Test Driven Development) approach. Now ask yourself these questions:

  • Do you have all the relevant test cases? Even for remediation (undoing user actions after the user changes their mind; possibly after their input has been used in another interaction) and asynchronous error recovery?
  • Do your test cases deal with “non-functional requirements” – security, performance, usability – or just “functional requirements” – the required business logic?
  • The better testing tools support testing “non-functional requirements”, but how thoroughly have you taken advantage of this?
  • Are the test cases (“requirements”) you do have “coherent”? That is, does each deal with just one function – self-contained, with a low level of dependence on other functions, and represented by one piece of code – or has the code got other, “less important” functionality included which isn’t really part of the test case but which is “obviously” necessary? If you don’t have coherent test cases, how can you understand exactly what you have tested?
  • Is it even feasible to include a test case for absolutely every piece of functionality, or does the test case test the aggregation of several sub-functions as a whole? If so, does this matter?
  • Have you run your test cases in all possible sequences, or just in the order dictated by the “happy path” adopted by a skilled user doing all the right things in the right order (see the sketch after this list)?
  • Have you run your test cases in all possible situations that might occur in the external systems (which you may have no control over) that make up the environment in which what you are building has to run?
  • Have you run your test cases at production scale, in case of “emergent behaviours” when there are enough interactions to make the very unlikely inevitable?
  • Do you still think that your testing is adequate? Testing doesn’t have to be complete, in practice, but it must be effective at finding defects and controlling risk. How would you measure this effectiveness and demonstrate it to third parties?
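
As a concrete illustration of the “all possible sequences” question above, here is a minimal sketch; the account-workflow steps and their rules are invented for the example, and the point is only that steps are run in every ordering rather than just the one a well-behaved user follows.

```python
# A minimal sketch with invented workflow steps: run the same test steps in
# every ordering, not just the "happy" one, to expose order-dependent defects.
from itertools import permutations

def open_account(state):
    state["open"] = True

def deposit(state):
    if not state.get("open"):
        raise RuntimeError("deposit against an account that is not open")
    state["balance"] = state.get("balance", 0) + 10

def close_account(state):
    state["open"] = False

steps = [open_account, deposit, close_account]

failures = []
for ordering in permutations(steps):
    state = {}
    try:
        for step in ordering:
            step(state)
    except Exception as exc:
        failures.append(([s.__name__ for s in ordering], exc))

# The "happy path" ordering passes; several of the other five orderings do not.
for names, exc in failures:
    print(" -> ".join(names), ":", exc)
```

In a real system the state space is vastly larger, which is exactly why exhaustive sequencing needs tool support and sampling rather than a hand-written list of cases.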

Don’t expect too much input from the vendors of test tools. They are quite happy dealing with the needs of people who generally don’t yet do much testing anyway; they are busy following the status quo (but doing it better) and persuading more developers to adopt it and actually automate their testing. And, to be fair, this isn’t a problem at the moment, because our inefficiency protects us: if something seriously breaks in production, there is usually plenty of time for manual intervention – someone hits a big red stop button. But that doesn’t scale – and might be unexpectedly expensive – and aren’t we increasingly trying to remove inefficiencies anyway?

Cultural issues

The issues with testing go far beyond the technology of testing. There are serious cultural issues:

  • Working in silos: for example, the security group thinks that something like SQL injection is a coding issue, whilst the developers think it is a security issue. People do not talk outside of their silo, with the consequence that no-one tests for it (see the sketch after this list).
  • The promotion of a positive attitude at all times: anybody finding problems is seen as being negative and making development less agile. So the clever people who could identify complex failure scenarios learn to keep quiet about them.
  • Over-reliance on formal communication channels: the agile development team understands the needs of its own group of users and develops test cases for their particular requirements, but doesn’t realise that other groups have different, possibly more complex, requirements; without co-location and informal communication this goes undiscovered until it is too late.
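
To make the SQL-injection point above concrete, here is a minimal sketch, using sqlite3 and an invented user-lookup function, of the kind of test that falls between the two silos: the developers assume security will write it, security assumes the developers will, and so nobody does.

```python
# A hedged sketch using sqlite3 and invented lookup functions: the injection
# test that neither the "coding" silo nor the "security" silo thinks it owns.
import sqlite3

def find_user_unsafe(conn, name):
    # String concatenation: attacker-controlled input becomes part of the SQL.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # Parameterised query: the input is bound as data, never parsed as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

def test_injection_payload_matches_no_user():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    payload = "' OR '1'='1"                      # classic injection string
    assert find_user_safe(conn, payload) == []   # no such user, as intended
    assert find_user_unsafe(conn, payload) != [] # leaks every row: the defect

test_injection_payload_matches_no_user()
```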

The solution

The solution to these issues isn’t new; both of us have worked on last-century CASE tools that built models of evolving business systems, which could be forked into production at any time when the risk/benefit ratio was sufficiently positive. They failed, ultimately, because they were proprietary and too clumsy – the supporting technology wasn’t adequate.

These days, the technology is adequate and analytics tools are available to predict the likelihood and consequences of failure. We are talking, in essence, of the “digital twin” approach being adopted in systems engineering, and suggesting that it can be extended into general business automation.

The Digital Twin is being adopted by, for example, GE: “The ultimate vision for the digital twin is to create, test and build our equipment in a virtual environment. Only when we get it to where it performs to our requirements do we physically manufacture it. We then want that physical build to tie back to its digital twin through sensors so that the digital twin contains all the information that we could have by inspecting the physical build,” says John Vickers, NASA’s leading manufacturing expert and manager of NASA’s National Center for Advanced Manufacturing – see here.

The “digital twin” supports preventative maintenance and “what if” analysis for engineering systems where downtime is prohibitively expensive. Why not do the same thing with “business process”, so you can continually change and evolve process without impacting day-to-day operation? “Testing” now becomes the simulation of working business scenarios and identification, and quantification, of the likelihood of failure. You can invite in teams dedicated to being disruptive or even criminal – and get real input from them by making the whole simulation into a “computer game”. You can now get all stakeholders actively contributing to validation because you are playing in “game space”, not the real world with its political issues.

In this new testing scenario, the analytics are vital – the “digital twin” needs quantitative (statistical) metrics to show how much of the model has been exercised and in what ways, and to quantify the likelihood of failure and the magnitude of its likely impact. One still needs a testing plan focused on finding defects (so that they can be addressed), not on showing how well the system works when it does work. Are you planning to unleash a “Black Hat” team on your Digital Twin and, if not, why not?
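
As a sketch of what those analytics might look like, the following Monte Carlo run over an invented order-to-dispatch process (the steps, failure probabilities and costs are all made up) counts which paths were exercised and estimates the likelihood and impact of failure:

```python
# A hedged sketch with an invented order-to-dispatch process: Monte Carlo runs
# over a business-process "digital twin", counting the paths exercised and
# estimating the likelihood and cost of failure.
import random
from collections import Counter

# Hypothetical model: step -> (probability the step fails, cost if it does).
process_model = {
    "take_order":   (0.001, 100),
    "check_stock":  (0.010, 250),
    "take_payment": (0.005, 1000),
    "dispatch":     (0.020, 400),
}

def run_once(model, rng):
    path = []
    for step, (p_fail, cost) in model.items():
        path.append(step)
        if rng.random() < p_fail:
            return tuple(path), cost   # the process aborts at this step
    return tuple(path), 0              # clean completion

rng = random.Random(42)
runs, failures, total_cost, paths = 100_000, 0, 0.0, Counter()
for _ in range(runs):
    path, cost = run_once(process_model, rng)
    paths[path] += 1
    failures += cost > 0
    total_cost += cost

print(f"failure likelihood ~ {failures / runs:.2%}; expected impact ~ {total_cost / runs:.2f} per run")
for path, count in paths.most_common():
    print(f"{count:>7}  {' -> '.join(path)}")
```

The metrics matter more than the mechanics: coverage of the model, likelihood of failure and expected impact are exactly the numbers a “Black Hat” team, or a regulator, will ask for.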

The danger is that, even with a digital twin, there is cultural pressure to run scenarios that work over and over again (thus improving management morale), rather than to look for defects. That is one reason we think external testing – or “defect removal” – experts, who will speak truth to power, are necessary in addition to in-house testers. But, of course, an ambitious manager can still employ external testers who informally agree to be kind…

Why is this important?

This is important because, past the Mutable inflection point, the increasing use of AI and robotic automation will make the risks of releasing systems with significant defects into production unacceptable. Conventional testing deals with only a subset of possible defects; and current tools that look at aggregate behaviours in simulated environments may not support the analytics needed to quantify evolving system effectiveness. The digital twin may not be the only way to address the emerging issues (perhaps risk-driven behaviour-driven development provides an alternative, or complementary, approach), but we do need something better than what we generally have.