Specification-based Testing

I just had the opportunity to hear a live talk by John Hughes given to a small group of Chalmers students which I happened to part of. He was re-giving his talk he held one week ago at the Erlang User Conference about specification-based testing of Ericsson software (written in Erlang). In the following I’ll try to give a summary of what I consider the most essential parts of his very interesting talk. For further information see John’s (et al.) paper or the LtU discussion. Users of QuickCheck know what nice advantages specification-based testing has: Instead of lots and lots of test cases one writes properties of functions, for example (in Erlang):

prop_reverse() ->
  ?FORALL(Xs, list(int())),
  ?FORALL(Ys, list(int())),
    reverse(Xs++Ys) ==
      reverse(Xs) ++ reverse(Ys).

In fact this specification is wrong as a quick check (pun intended) shows us:

Failed! After 12 tests.
[-3,2]
[3,0] 
Shrinking....(10 times)
[0][1]

QuickCheck provides two important features:

  • It provides the interface to a controlled generation of test cases (as opposed to simply random ones, which usually are of little help).

  • It generates and runs tests of the specified properties and–this might be new to users of the GHC package Test.QuickCheck–shrinks them to smaller test cases. This is important as it often hard to find the actual cause of a bug, especially if test cases are very large. (John mentioned that the Mozilla developers actually offer T-Shirts to people just reducing test cases as they found this to be an extremely time consuming task.

In our examples the error was in the specification and can be fixed by substituting Xs and Ys in the last line.

Advantages of Specification-based Testing

The most obvious advantage of using QuickCheck over usual test cases is that it allows us to dramatically reduce the number of test cases. In fact it also does a great job in finding corner cases. In their field study at Ericsson, while spending only 6 days to implement all the test properties (and a library for state-machine-based testing), they found 5 bugs in an already well-tested soon-to-release product, one of which would have been really unlikely to be triggered by any test case and revealed 9 bugs in an older version of the project, of which only one has been documented at this time. (see the paper). Neil Mitchell also recently posted about the merits of QuickCheck can be. John added one note, though. The bugs they found were very small (due to shrinking) and in fact would have never been caused in the real system, because the command sequences that lead to the failures would have never been triggered by the real controller. However, they tested against the documented (400+ pages) specification. This leads us to one other important advantage of specification-based testing.

QuickCheck forces (or allows) us to actually state the specification as executable properties. therefore we might not only find errors in the program but, as seen in our example, also in the specification. This can be very useful and he gave a nice demonstration of buggy (or incomplete) specification for the Erlang functions register/2, unregister/1, and whereis/1 that provide a sort of name server for Erlang processes. To do this he generated a sequence of Erlang calls and checked if their results corresponded to the model of a state machine based on the specification. That is our current state consisted of a simple map, representing the (assumed) state of the name server. Using preconditions he controlled the allowable sequences of generated commands and checked their outcome using postconditions. This small experiment showed quite a couple of incompletenesses in the Erlang specification, e.g., it does state that register/2 will fail if you try to register the same process under different names, but it does not state that the process is not added to the list of registered processes (although sensible to assume–nevertheless, the specification is incomplete). (I think he had some more serious example, but I can’t remember what exactly it was.)

Remarks

Having it used myself I am very convinced of the advantages of QuickCheck. In reply to my question about testability of non-deterministic faults, caused by the effects of concurrent execution of processes, John remarked, that you can get quite deterministic behavior (thus causing reproducible test results) by running the programs on a single CPU and rely on the deterministic scheduler. I am not so sure how far this reaches, but then again, you should use Erlang’s behaviors as much as possible.