Mutation Testing – The Basics
Software quality has become an important factor in the development of software systems. Especially for commercial systems, quality assurance is necessary to provide a reliable and pleasing product. Testing is a crucial step in the software development life cycle and should be every software engineer’s concern. Testing frameworks and tools such as JUnit or PHPUnit have made their way into production systems and are often deeply integrated into release processes.
Consequently, the quality of the tests themselves is crucial: only reliable tests can ensure the software’s quality and correctness. One approach to evaluating tests is Mutation Testing. However, this technique does not seem to be very popular and has its downsides. Is it mature and useful enough to support strategic decisions regarding the development of software and its tests?
First, let’s get down to the basics.
Concept
What is the basic idea of Mutation Testing? In order to find out whether a given set of tests is able to detect possible faults, Mutation Testing introduces a number of faults to the source code and checks whether the software tests can detect the changes by failing.
The rules that produce those modifications are called mutators or mutation operators and are defined beforehand. When a mutation operator is applied, an alternative version of the original software is created. We call this alternative version a mutant. Applying the operators throughout the code creates a whole set of mutants. Then, the test suite is executed against each of these mutants. If the test suite fails on a mutant, which is the desired outcome, the mutant is considered killed. Otherwise, it has survived. If all mutants are killed, the set of tests is considered to be of good quality. In practice, of course, the existing tests usually do not kill all mutants. Dividing the number of killed mutants by the number of all created mutants yields the mutation score, which is the metric by which test quality is then measured.
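The kill-or-survive bookkeeping described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical example rather than part of any real mutation testing tool: the function is_adult, the three hand-written mutants, and the two test suites are assumptions chosen only to show how a mutation score is computed.

```python
from typing import Callable, List

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Original program under test (hypothetical example)."""
    return age >= 18


# Hand-written mutants; a real tool would generate these automatically.
def mutant_gt(age: int) -> bool:
    return age > 18       # ">=" replaced by ">"


def mutant_lt(age: int) -> bool:
    return age < 18       # ">=" replaced by "<"


def mutant_const(age: int) -> bool:
    return age >= 19      # constant 18 replaced by 19


MUTANTS: List[Predicate] = [mutant_gt, mutant_lt, mutant_const]


def weak_suite(fn: Predicate) -> bool:
    """Return True if every check passes; no boundary value is tested."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn: Predicate) -> bool:
    """Same checks as weak_suite plus the boundary value 18."""
    return weak_suite(fn) and fn(18) is True


def mutation_score(mutants: List[Predicate],
                   suite: Callable[[Predicate], bool]) -> float:
    """Number of killed mutants divided by the number of all mutants."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


if __name__ == "__main__":
    print(mutation_score(MUTANTS, weak_suite))
    print(mutation_score(MUTANTS, strong_suite))
```

Both suites pass on the original function. The weak suite kills only mutant_lt, so one of three mutants is killed, while the suite that also checks the boundary value kills all three. The difference in score reflects exactly the missing boundary test.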
The idea of assessing the quality of software tests by Mutation Testing is not new at all. According to an extensive survey conducted in 2010 by Jia and Harman [1], origins of Mutation Testing can be found as early as 1971. But for a long time, the technique was used almost exclusively in an academic context, and only a few, more recent publications applied it to larger or commercial software systems. This long-standing focus on research rather than application illustrates the difficulties Mutation Testing had to overcome in order to be feasible for production use.
Equivalent Mutants
One important issue is so-called equivalent mutants: mutants that, though syntactically different, show the same behavior as the original program. Tests will and should never fail when run against equivalent mutants, so these mutants always survive, resulting in a lower overall score even for a perfect set of tests and thus diminishing the accuracy of the technique. Unfortunately, the problem of deciding whether two programs are equivalent was shown to be undecidable by Budd and Angluin in 1982 [2]. In the context of Mutation Testing, it is accordingly impossible to simply test the generated mutants for equivalence to the original program. As a result, a lot of effort was put into finding ways to reduce the number of equivalent mutants somewhere in the process. The approaches range from heuristics based on compiler optimization [3] and genetic algorithms that selectively create only useful mutants [4] to examining a mutant’s impact on test execution itself [5].
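A small Python sketch may make the notion of an equivalent mutant more concrete. The function largest and its mutant are hypothetical examples, and equivalence is assumed only for plain integer inputs. The brute-force search at the end merely gathers evidence over small inputs. It is not a proof and does not get around the undecidability result.

```python
from itertools import product
from typing import List, Optional


def largest(xs: List[int]) -> int:
    """Original: return the largest element of a non-empty list."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def largest_mutant(xs: List[int]) -> int:
    """Mutant: '>' replaced by '>='."""
    best = xs[0]
    for x in xs[1:]:
        if x >= best:
            best = x      # on a tie this assigns an equal value
    return best


def find_distinguishing_input(max_len: int = 4) -> Optional[List[int]]:
    """Search small integer lists for an input where the results differ."""
    for length in range(1, max_len + 1):
        for candidate in product(range(-2, 3), repeat=length):
            xs = list(candidate)
            if largest(xs) != largest_mutant(xs):
                return xs
    return None


if __name__ == "__main__":
    print(find_distinguishing_input())
```

The search finds no input on which the two versions disagree, so no test that only inspects the return value can kill this mutant. If, say, one of ten generated mutants were of this kind, even a perfect test suite could reach a mutation score of at most 9/10.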
Computational Costs
Reducing execution costs has also received a lot of attention, as Mutation Testing is rather expensive. Not only does running the tests a large number of times take a lot of time, but creating the mutants also requires effort for altering and compiling the source code. Numerous approaches have been proposed to mitigate this drawback. However, as can be seen from the setup described in chapter three, execution costs are not important in the context of this work. Nonetheless, the previously mentioned survey [1] gathers and classifies the existing ideas and is recommended for more detailed insights into this problem.
Mutation Operators
Last, the mutation operators are one of the most relevant aspects of the technique and key to its reliability in a given environment. The result of Mutation Testing is only trustworthy if the source code alterations mimic plausible programming flaws or common failure causes. However, such flaws are often specific to a programming paradigm or even to features of a single programming language. Jia and Harman also report that as long as procedural programming was predominant, “traditional mutation operators” (mostly altering logic and basic statements) were used, but that object-oriented programming poses different needs [1]. For example, languages like Java might additionally require mutators that alter method visibility in order to challenge tests covering inheritance-related code. As a result, several publications look at strategies to cover such needs [6, 7].
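To illustrate what a “traditional” mutation operator might look like in code, here is a minimal sketch built on Python’s standard ast module. The two operator classes, the build helper, and the is_adult sample are hypothetical names introduced for this example. Real tools ship their own, much larger operator sets, and typically create one mutant per changed location rather than changing every match at once.

```python
import ast
from typing import Callable, Optional

SOURCE = """
def is_adult(age):
    return age >= 18
"""


class ReplaceGtEWithGt(ast.NodeTransformer):
    """Relational operator replacement: turn every '>=' into '>'."""

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op
                    for op in node.ops]
        return node


class IncrementIntConstants(ast.NodeTransformer):
    """Constant replacement: add 1 to every integer literal."""

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value + 1), node)
        return node


def build(source: str,
          operator: Optional[ast.NodeTransformer] = None) -> Callable[[int], bool]:
    """Compile the source, optionally mutated, and return is_adult."""
    tree = ast.parse(source)
    if operator is not None:
        tree = operator.visit(tree)
        ast.fix_missing_locations(tree)
    namespace: dict = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["is_adult"]


original = build(SOURCE)
mutant_a = build(SOURCE, ReplaceGtEWithGt())       # behaves like: age > 18
mutant_b = build(SOURCE, IncrementIntConstants())  # behaves like: age >= 19
```

A test that checks the boundary value 18 kills both mutants, since the original returns True there and both mutants return False. Class-level operators such as changing method visibility are not covered by this sketch, and they do not translate directly to Python, which has no access modifiers in the Java sense.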
References
[1] Yue Jia and Mark Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5):649–678, 2011.
[2] Timothy A Budd and Dana Angluin. Two notions of correctness and their relation to testing. Acta Informatica, 18(1):31–45, 1982.
[3] Douglas Baldwin and Frederick Sayward. Heuristics for determining equivalence of program mutations. Technical report, Georgia Institute of Technology, School of Information and Computer Science, 1979.
[4] Konstantinos Adamopoulos, Mark Harman, and Robert M Hierons. How to overcome the equivalent mutant problem and achieve tailored selective mutation using coevolution. In Genetic and Evolutionary Computation Conference, pages 1338–1349. Springer, 2004.
[5] Bernhard JM Grün, David Schuler, and Andreas Zeller. The impact of equivalent mutants. In International Conference on Software Testing, Verification and Validation Workshops (ICSTW ’09), pages 192–199. IEEE, 2009.
[6] Sun-Woo Kim, John A Clark, and John A McDermid. Investigating the effectiveness of object-oriented testing strategies using the mutation method. Software Testing, Verification and Reliability, 11(4):207–225, 2001.
[7] Sunwoo Kim, John Clark, and John McDermid. The rigorous generation of Java mutation operators using HAZOP. Technical report, The University of York, 1999.