The idea that you can completely test that an application does what it is supposed to do is generally regarded as impracticable at best and impossible at worst. Unless, of course, you are doing something highly dangerous - like running a nuclear reactor, which absolutely has to be tested to the limit. Moreover, theoretically, it is not possible to "completely" test an application. Suppose that you enter a number into a field: there are an infinite number of numbers and you cannot test every one. But, generally speaking, if it works with 2,314, it will probably work with 3,612: you can test sets of numbers rather than numbers themselves. So, let's leave this consideration aside: is it practical to test exhaustively if not "completely"?

The assumption is not. Why is that the case? Think about an application from a user perspective. You start by entering some value into a field. That value belongs to a set of values which you can test for validity. If the entry is valid then something else happens and something rather different happens if the entry is invalid. These require further steps which are either valid or invalid and so on. If you wanted to express this graphically you would use a tree diagram or a flow chart. It's not actually very complicated to conceptualise this.

The question then becomes whether, if you can capture the logic of the application, which is what I have been describing, and if you know all the value sets that apply at each decision point, you can apply the latter to the former and just run it. It's actually an ideal task for a computer to perform - large-scale iterative processing - precisely what the things were built for. Of course, in an agile development environment you probably don't want to do this en masse: you want to test a bit of the logic with the relevant value sets rather than everything all at once. But that's fine - it's just a smaller version of the same thing.

The key question, of course, is whether you can capture the complete logic of the application as well as all of the relevant values that have to be driven through the appropriate test cases. In the first case, we know that this is possible because there are existing products that do exactly that. And we know that we can derive all the relevant values and/or actions: just look at the written test plans that get passed to manual testers. The issue is whether we can generate those values automatically, using a tool. It certainly shouldn't be beyond the wit of man. And, finally, you need an engine to process the whole thing.

So, is exhaustive testing impractical and too expensive? I don't think it is. Or, at least, potentially it isn't. It seems to me perfectly plausible to build a tool suite that will derive test cases from a specification, derive value sets from the same source, and exhaustively run one against the other. It's the sort of automation that all development shops should be looking for.

Philip Howardsaid on :"

A reader, Llyr Wyn Jones, has sent me a lengthy email on this topic. He had wanted to post it here via a comment but it was too long. I am therefore posting it myself:

"A very good and timely article. I have a few comments:

1. When talking about "testing sets of numbers", you fall short of calling out equivalence class testing (ECT) and boundary value analysis. A nitpick, but you were pretty much 99% of the way there and you described ECT to a tee.

2. You stay clear of mentioning ordering, which is a far bigger problem than the "large search domain" problem you outline (the latter being obviated by ECT anyway so it isn't actually a problem). Imagine a set of 6 switches, all able to be set "on" or "off". There are 2^6 = 64 distinct combinations of switch states - this corresponds to the "search domain". However, when you factor in the different ways you can arrive at those 64 states, you introduce a factorial to this number and 64! (or "64 factorial") is a number which is in the region of 10^83, which makes testing all possible permutations impossible. This turns out to be quite an intractable problem, but can be ameliorated by considering pairwise ordering etc. but will introduce gaps.

3. Along with (2) above, the main justification for "totally exhaustive testing" to be impossible is due to an incorrect interpretation of the Halting Problem, which basically says that there cannot exist a general algorithm for determining whether a particular program will finish given a particular input (an obvious corollary/generalization being that you cannot test whether or not a given program actually works 100% correctly). The incorrect interpretation comes from a subtlety in how the problem is worded: a *single* algorithm, but says nothing about the existence of an algorithm to test each particular program. This is identical to saying that no single model type can test everything, but you can always find a particular model type that will test a particular program.

Otherwise a very good article, and the important thing is it'll get people thinking.