Curiosity about Testing

I see testing as an integral part of application development from the very beginning – a “requirement”, whether it is in a formal requirements document or just a yellow post-it on a whiteboard, isn’t any use unless it has an associated test. You must have a way of deciding whether a particular requirement has been met, and some assurance that it has been met in a way that will delight your customers.

More than that, testing is, to my mind, just a specialised form of application development. You are writing a program to report on the behaviour and limitations of the code you are developing – which sounds like a great opportunity for Augmented Intelligence (AI), by the way. The same standards apply in general: you need to structure your tests so as to maximise the chance of finding defects and minimise the associated cost; you need version control of the testing code and the test data, and configuration management to make sure the tests match the evolving target code; you need searchable libraries to store tests and test data for reuse; you need security to make sure that testing is not a backdoor into your systems and that real people and their personal data can’t be identified. This all explains why I find Curiosity Software interesting: it produces modelling and management tools for testing and for managing test data. Although its capabilities are primarily concerned with testing and test data, in many cases they could also be applied to application development as a whole.

This sort of sophistication matters, because people are often advised that they need more test automation, and if their application assurance processes are poor, more test automation merely delivers poor results faster. As Curiosity itself says, just buying more automation

“assumes that the way you do testing today is effective, and simply needs automating. Sadly this is rarely the case and you likely have a ton of waste, duplication and inefficiency in there.”

I think that part of the problem is that testing metrics are often undefined and without context. You might do a lot of testing by executing essentially the same test over and over again with slightly different numbers – or you might be following a structured approach, exploring boundary conditions. A test that doesn’t find a defect is a waste of time and resources, not a validation of how good you are at building software.
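
To illustrate the difference, here is a minimal sketch of what "exploring boundary conditions" might look like in practice. It is only an illustration: the function under test, `is_valid_quantity`, and its 1 to 100 range are hypothetical and not taken from Curiosity's material. Six deliberately chosen inputs around the edges of the range tell you more than hundreds of arbitrary mid-range values would.

```python
def boundary_values(low: int, high: int) -> list[int]:
    """Return boundary-value test inputs for an inclusive integer range."""
    candidates = [low - 1, low, low + 1, high - 1, high, high + 1]
    # Deduplicate while preserving order (matters for very narrow ranges).
    return list(dict.fromkeys(candidates))


def is_valid_quantity(quantity: int) -> bool:
    """Hypothetical code under test: accepts order quantities from 1 to 100."""
    return 1 <= quantity <= 100


def run_boundary_checks() -> dict[int, bool]:
    """Run the code under test against each boundary value."""
    return {value: is_valid_quantity(value) for value in boundary_values(1, 100)}


if __name__ == "__main__":
    for value, accepted in run_boundary_checks().items():
        print(f"quantity={value}: accepted={accepted}")
```

An off-by-one defect (say, `<` written instead of `<=`) would change the result for exactly one of these inputs, which is the point: each test has a realistic chance of finding something.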

Curiosity lists some testing antipatterns that are a sign of trouble:

You have a large team of testers doing mainly end-to-end (e2e) tests – that is, testing overall black box behaviour so you may know that something is broken, but not where in the code. This sort of testing is necessary, in my opinion, but not sufficient – it tests the overall user experience (both functionality and performance) of an application, using data that replicates real life.
You have a test pack largely made up of UI based e2e tests which always grows after each release – you suspect there’s lots of duplicated effort in there.
Testing is delayed because you are waiting on e2e environments to be built, or waiting for your slot in the few e2e environments that persist in your organization.
Test coverage relies on traceability back to requirements that you know are incomplete or out of date.
Test design is largely an activity that involves testers creating test cases without input from the wider development team. Alternatively, you don’t design tests at all; you just write test cases/automation.
Despite trying many different tools, you are struggling to increase the rate of tests executed through automation as the tests are just too flaky!
A large percentage of your defects are environment/config related, causing large delays in test phases.
You are finding fundamental design issues in late, e2e test phases.

These are good to think about, but I’d like to add a couple more:

You have no idea how many defects you might expect to find in a given component, or how to come up with or justify such a figure. Historical analysis of past efforts? Theoretical models? Guesswork?
You have no idea of how to define “success” for testing, no success criteria defined in advance, so how do you know when to stop? When you run out of resources or when the previously agreed deadline comes around – no matter how flaky the application? Perhaps when your tests stop finding more defects – but that implies that your tests are well-structured and designed. Perhaps when you have found the number of defects you expect to find and are finding it difficult to define tests that find new ones?

I do very much agree with Curiosity that making testing cost-effective is one key to success. As I’ve said before, since you will have defects, any test that doesn’t find a defect is more-or-less wasted. Plus, don’t automate broken processes; that’ll just get you to hell faster and more reliably.

I like the idea of fixing the test approach and strategy first and taking a “lean” – no waste – approach. I was always taught (since well back in the last century) that high cohesion and low coupling made quality management and application assurance easier, because they reduce complexity and you waste less time trying to see exactly what the code does. Of course, you shouldn’t “end up unnecessarily testing interfaces that aren’t changing and so pose zero risk” – but “unnecessarily” matters; there is usually a non-zero risk that the interface was changed unintentionally.

So what does Curiosity actually suggest that people do about all this? Well, there is a very useful article by Rich Jordan called “Going lean on your testing approach”. I do like the idea, in that paper, of “blast radius”: it’s really impact analysis; when you put a software change into production and it blows up in your face, how much damage does it do, how many people and systems are affected? The more modular and componentised (with high cohesion and low coupling) your applications are, the smaller the blast radius should be. Well, at least, you should have fewer surprise detonations, and minimising blast radius is a worthwhile goal for application assurance generally.
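
One way to make blast radius concrete, as a rough sketch of my own rather than anything taken from Rich Jordan's article, is to treat it as reachability in a dependency graph: given a changed component, which other components depend on it, directly or indirectly? The component names and both graphs below are hypothetical, and a real impact analysis would also need to account for data, environments and people.

```python
from collections import deque


def blast_radius(dependents: dict[str, set[str]], changed: str) -> set[str]:
    """Return every component that directly or transitively depends on `changed`.

    `dependents` maps a component to the set of components that depend on it.
    """
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, set()):
            if dependent != changed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical graphs: the same change to "pricing" in two designs.
TIGHTLY_COUPLED = {
    "pricing": {"basket", "invoicing", "reporting"},
    "basket": {"checkout"},
    "invoicing": {"reporting"},
}
LOOSELY_COUPLED = {
    "pricing": {"basket"},
    "basket": {"checkout"},
}

if __name__ == "__main__":
    print(sorted(blast_radius(TIGHTLY_COUPLED, "pricing")))
    print(sorted(blast_radius(LOOSELY_COUPLED, "pricing")))
```

In this toy example the same change touches four components in the tightly coupled design and two in the loosely coupled one, which is also a reasonable first guess at where regression testing effort should go.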