Mutation Testing with Pitest

Unit tests can be useful for ensuring code quality and correctness. Not every unit test makes sense, however, and bugs often manage to escape detection by unit tests. How can test quality be increased so that programming errors are detected earlier and more reliably?

Like some other software companies, we practice test-driven development. With this approach, unit tests are written first, and then the actual logic is implemented. A useful side effect is a high degree of code coverage. This metric, however, only allows limited conclusions about the correctness and quality of the tested code. Test coverage refers to all lines of code that are executed during tests. These are not necessarily error-free lines. In the end, this means that the coverage metric can only reliably identify lines that are not tested for correctness at all.

To describe an extreme case: In most cases, it is quite possible to design tests that cover every program path and each line of code without checking a single condition. In spite of 100 percent coverage, the metric provides zero information on the correctness of the code.
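The extreme case can be sketched in a few lines. The following is a minimal illustration in plain Java rather than JUnit, so that it stays self-contained; the class name, the deliberately buggy subtract method and the two "tests" (methods returning true for green) are assumptions made up for this sketch.

```java
public class CoverageWithoutChecks {

    // Deliberately buggy code under test: adds instead of subtracting.
    static int subtract(int a, int b) {
        return a + b;
    }

    // Executes the line (so it counts as covered) but checks nothing.
    // Returns true for "green".
    static boolean testWithoutAssertion() {
        subtract(3, 1);
        return true;
    }

    // Executes the same line and compares the result with the expected value.
    static boolean testWithAssertion() {
        return subtract(3, 1) == 2;
    }

    public static void main(String[] args) {
        System.out.println("without assertion: " + (testWithoutAssertion() ? "green" : "red"));
        System.out.println("with assertion:    " + (testWithAssertion() ? "green" : "red"));
    }
}
```

Both tests produce the same line coverage, but only the second one turns red because of the bug.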

The question therefore arises of how meaningful test coverage can be achieved through higher quality tests. More specifically: How can we measure the quality of unit tests? To answer this question, one needs to understand why we write tests at all. One of the reasons, probably the most important one, is to avoid programming errors. An ideal unit test ensures that the unit of code under test does exactly what it is supposed to do. Unfortunately, tests cannot prove the general absence of errors. However, unit tests can help developers avoid the most likely errors. The test author therefore needs to make sure that the tests check the code for these most likely errors. If the author fails to take important sources of error into consideration, the application can malfunction in spite of high test coverage. So what can be done about this?

Mutation testing

One tool that can help us to improve the quality of our tests is so-called mutation testing. With mutation testing, for each execution of the test suite, each test is executed not just once, but multiple times. Before each test run, the code to be tested is modified or mutated. If the test fails after a change is made, its robustness has been proven for this case. If it remains green, this means that the test does not cover the failure case caused by the change.

At the end of the day, this means that each unit test shows which changes it did not detect. These changes are called mutations. Each mutation that causes the test to fail is referred to as killed, since it did not survive the test. The term survivors refers to mutations that are not detected by a test – despite their presence, the test stays green. It should be mentioned here that mutations that survive a test do not necessarily indicate a problem; however, they could very well hint at one.
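The procedure described above can be sketched by hand. This is only an illustration of the idea, not of how Pitest works internally: the code under test (a max function), the three hand-written mutants and the single test are all assumptions made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

public class MutationLoopSketch {

    // Code under test (hypothetical): returns the larger of two ints.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a : b;

    // A single "unit test": true means green, false means red.
    static final Predicate<IntBinaryOperator> TEST = max -> max.applyAsInt(3, 1) == 3;

    // Hand-written stand-ins for mutants that a tool would generate automatically.
    static Map<String, IntBinaryOperator> mutants() {
        Map<String, IntBinaryOperator> mutants = new LinkedHashMap<>();
        mutants.put("negated condition", (a, b) -> a <= b ? a : b);
        mutants.put("condition always true", (a, b) -> a);
        mutants.put("changed boundary", (a, b) -> a >= b ? a : b);
        return mutants;
    }

    // Runs the test once per mutant: a red test kills the mutant, a green test lets it survive.
    static Map<String, String> run(Predicate<IntBinaryOperator> test) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, IntBinaryOperator> entry : mutants().entrySet()) {
            result.put(entry.getKey(), test.test(entry.getValue()) ? "SURVIVED" : "KILLED");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("original is green: " + TEST.test(ORIGINAL));
        run(TEST).forEach((name, status) -> System.out.println(name + ": " + status));
    }
}
```

With this test, the negated condition is killed, while the other two mutants survive. The "condition always true" survivor hints at a missing test case in which the second argument is larger. The "changed boundary" survivor behaves exactly like the original for every input, so no test could ever kill it; this is one reason why a survivor does not necessarily indicate a problem.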

Running each test suite multiple times and also mutating the code under test between runs would involve significant effort—so much that it would not be reasonable to perform this manually. Therefore, mutation tests would not be practical without tools for automating the process. One tool for JVM-based programming languages that automates mutation testing is Pitest. In this article, we will explain and examine the use and usefulness of this tool.

Example

We will use an illustrative example to demonstrate the usefulness of mutation tests as well as how to set up Pitest with Maven.

We assume that the following class for performing simple calculations on integers is already present in a Maven project:

package de.triology.blog.pitest;

class Calculator {
    static int add(int a, int b) {
        return a + b;
    }

    static int subtract(int a, int b) {
        return a - b;
    }

    static int multiply(int a, int b) {
        return a * b;
    }
}

Since test-driven development was used, the following unit tests already exist for this class:

package de.triology.blog.pitest;

import org.junit.Test;

import static com.google.common.truth.Truth.assertThat;

public class CalculatorTest {

    @Test
    public void add() {
        assertThat(Calculator.add(2, 2)).isEqualTo(4);
    }

    @Test
    public void subtract() {
        assertThat(Calculator.subtract(3, 0)).isEqualTo(3);
    }

    @Test
    public void multiply() {
        assertThat(Calculator.multiply(5, 1)).isEqualTo(5);
    }
}

These tests provide 100 percent method coverage and will also all be green when executed. But how robust are they against mutations? In order to find this out, we will add the Pitest Maven plugin to the project. To do this, we add the following to the pom.xml:

<build>
    <plugins>
        <plugin>
            <groupId>org.pitest</groupId>
            <artifactId>pitest-maven</artifactId>
            <version>1.2.0</version>
        </plugin>
    </plugins>
</build>

After we have added the plugin to the project, it can be invoked as follows:

mvn clean install org.pitest:pitest-maven:mutationCoverage

It will take a moment for the project to be built and for the mutation tests to run. The Pitest plugin creates a folder named pit-reports in the project's target directory. This contains the results of the mutation test. If we now open the HTML file in the subdirectory and follow the links all the way to the tested class, all surviving and killed mutations are displayed.

The report shows six mutations, of which two survived the test suite. Both are mutations of mathematical operators: in one, a minus was replaced with a plus; in the other, an asterisk (multiplication operator) was replaced with a slash (division operator).

The class under test appears to be error-free. However, the result of the mutation test still provides meaningful information: the class could contain errors and our test suite would still be green. In this example, the tests would not fail if an addition were performed instead of a subtraction (3 + 0 is 3, just like 3 - 0), or a division instead of a multiplication (5 / 1 is 5, just like 5 * 1). Of course, for illustrative purposes, the test data was chosen specifically to result in surviving mutations. If we change the unit tests as follows, the Pitest result will be different:

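package de.triology.blog.pitest;

import org.junit.Test;

import static com.google.common.truth.Truth.assertThat;

public class CalculatorTest {

    @Test
    public void add() {
        assertThat(Calculator.add(2, 2)).isEqualTo(4);
    }

    // 3 + 1 would be 4, so replacing the minus with a plus is now detected.
    @Test
    public void subtract() {
        assertThat(Calculator.subtract(3, 1)).isEqualTo(2);
    }

    // 6 / 3 would be 2, so replacing the asterisk with a slash is now detected.
    @Test
    public void multiply() {
        assertThat(Calculator.multiply(6, 3)).isEqualTo(18);
    }
}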

As before, the tests are green; however, all mutations are now killed.

Mutators

Both surviving mutations from the example were created by replacing a mathematical operator with a counterpart: subtraction became addition and multiplication became division. The rules used for these replacements are contained in so-called mutators. Pitest itself provides a number of mutators, such as the MATH mutator used above. This mutator not only replaces subtraction and multiplication operators, but also the following:

    Original    Mutated to
    +           -
    /           *
    %           *
    &           |
    |           &
    ^           &
    <<          >>
    >>          <<
    >>>         <<

Other mutators replace each increment with a decrement, or cause each if condition to always be true (or false). A complete list of the mutators included with Pitest can be found in the Pitest documentation.

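To make the increment-to-decrement replacement concrete, here is a small hand-written sketch. The method countPositives, its mutant and the two test methods are assumptions made up for this illustration; a tool like Pitest would generate the mutant automatically instead of requiring a second method.

```java
import java.util.function.ToIntFunction;

public class IncrementMutantSketch {

    // Code under test (hypothetical): counts the positive values in an array.
    static int countPositives(int[] values) {
        int count = 0;
        for (int value : values) {
            if (value > 0) {
                count++;
            }
        }
        return count;
    }

    // Hand-written stand-in for the mutant: the increment became a decrement.
    static int countPositivesMutant(int[] values) {
        int count = 0;
        for (int value : values) {
            if (value > 0) {
                count--; // was: count++
            }
        }
        return count;
    }

    // Weak test: the input never reaches the mutated line's effect.
    static boolean weakTest(ToIntFunction<int[]> counter) {
        return counter.applyAsInt(new int[] {-1, -2}) == 0;
    }

    // Stronger test: the input contains positive values, so the count matters.
    static boolean strongerTest(ToIntFunction<int[]> counter) {
        return counter.applyAsInt(new int[] {1, -2, 3}) == 2;
    }

    public static void main(String[] args) {
        System.out.println("weak test, mutant green:     " + weakTest(IncrementMutantSketch::countPositivesMutant));
        System.out.println("stronger test, mutant green: " + strongerTest(IncrementMutantSketch::countPositivesMutant));
    }
}
```

Both tests are green for the original method. The mutant survives the weak test because no positive value is ever counted, and it is killed by the stronger one.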
The combination of mutators that Pitest automatically runs out of the box enables identification of problems and ambiguities within the test suite without requiring a lot of the developer’s time. Like most good things, however, mutation tests also have their price.

Disadvantages of mutation tests

With Pitest, mutation tests don't need to be written by the developer. However, at some point they must be executed. In the vast majority of cases, the execution time of the mutation tests is considerably longer than the time required to simply execute the unit tests. This is because one or more mutations are generated for each section of code that is covered by a unit test, and at least one unit test is run for each of these mutations. Even for projects with just a few thousand lines of code but high test coverage, execution of the mutation tests can take a few minutes. And this is in spite of the optimizations that Pitest performs: For example, not all tests necessarily run for a given mutation, but only those that have a chance of detecting it. As soon as a test kills a mutation, no more tests are run for it.
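The effect of these two optimizations on the number of test runs can be estimated with a small sketch. The data below is made up, and the selection logic is a simplified assumption (tests are matched to mutants by the method they execute); it is not Pitest's actual implementation.

```java
import java.util.List;
import java.util.Set;

public class MutationRunCostSketch {

    // A test, the method it executes, and the ids of the mutants it would fail on.
    static final class TestCase {
        final String coveredMethod;
        final Set<String> detects;

        TestCase(String coveredMethod, Set<String> detects) {
            this.coveredMethod = coveredMethod;
            this.detects = detects;
        }
    }

    // A mutant and the method whose code it changes.
    static final class Mutant {
        final String id;
        final String method;

        Mutant(String id, String method) {
            this.id = id;
            this.method = method;
        }
    }

    // Hypothetical test suite: one test each for add and multiply, two for subtract.
    static final List<TestCase> TESTS = List.of(
            new TestCase("add", Set.of("add-math", "add-return")),
            new TestCase("subtract", Set.of("subtract-return")),
            new TestCase("subtract", Set.of("subtract-math", "subtract-return")),
            new TestCase("multiply", Set.of("multiply-math", "multiply-return")));

    // Hypothetical mutants: two per method.
    static final List<Mutant> MUTANTS = List.of(
            new Mutant("add-math", "add"), new Mutant("add-return", "add"),
            new Mutant("subtract-math", "subtract"), new Mutant("subtract-return", "subtract"),
            new Mutant("multiply-math", "multiply"), new Mutant("multiply-return", "multiply"));

    // Naive strategy: every test is run against every mutant.
    static int naiveRuns() {
        return MUTANTS.size() * TESTS.size();
    }

    // Optimized strategy: skip tests that never execute the mutated method,
    // and stop as soon as one test has killed the mutant.
    static int optimizedRuns() {
        int runs = 0;
        for (Mutant mutant : MUTANTS) {
            for (TestCase test : TESTS) {
                if (!test.coveredMethod.equals(mutant.method)) {
                    continue;
                }
                runs++;
                if (test.detects.contains(mutant.id)) {
                    break;
                }
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println("naive test runs:     " + naiveRuns());
        System.out.println("optimized test runs: " + optimizedRuns());
    }
}
```

With this made-up data, the naive strategy needs 24 test runs, while the optimized one needs 7. The savings grow with the size of the test suite, but every mutant still costs at least one test run.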

In combination with the Maven SCM plugin, Pitest can also be configured so that only newly added code is mutation tested. As part of a CI pipeline, for example, it is possible to always run mutation tests in the nightly build, but only on newly added code. At the end of this pipeline there could be a SonarQube instance into which the Pitest results are imported. We will explain how this works in a follow-up to this blog post.

As is the case with high test coverage, it is possible for a software developer to get carried away by a high mutation detection rate and overengineer the unit tests. And even though it is certainly a good thing if no mutations survive the tests, it must be decided in each individual case which survival rates make sense, and which mutations might in good conscience be allowed to survive. Pitest currently does not have the option of ignoring certain survivors. Thus, if one intentionally decides not to kill a mutation, it will continue to appear in the Pitest report.

Conclusion

Mutation tests can help with detecting and improving weak unit tests with a minimum of effort. This contributes to improved software quality. They are no cure-all, however. The introduction of mutation tests will not suddenly cause the quality of a product to shoot up. Mutation tests do, however, provide a practical way to increase developers' confidence in their own unit tests, much like unit tests can increase confidence in the code. Whether and to what extent mutation tests are used must be decided on a case-by-case basis. There is simply no single right path for this.

Philipp Czora
Software Development
Philipp likes to be surrounded by people that he can learn from. When it comes to Software Development, he always strives for the perfect mix of pragmatism and perfectionism.