Universal Mutation Testing Tool for Code

an essay written by Lee Bush, 2020

Table of Contents

  1. Introduction
  2. Background: product code, automated test code, and code coverage -- a crash course
  3. Background: mutation testing
  4. Example usage of the universal mutation testing tool (a fictional story)
  5. How it works
  6. Disadvantages
  7. Conclusion

Introduction

I have developed a tool that helps identify testing gaps in software code by using a technique called mutation testing. Using the insights that this tool provides, software engineers can improve how they are testing their software and uncover more bugs earlier in the software development process. Without the insights that mutation testing tools like mine provide, software engineers may feel a false sense of security that their testing is sufficient, allowing untested or undertested code, and the bugs that come along with it, to be released to customers.

Similar tools exist and they offer more powerful features, but each one is limited to specific programming languages. This tool is unique because it is programming language agnostic: its only mutation operation is removing lines of text. This tool is innovative because it delivers testing gap insights with a relatively simple setup procedure compared to the existing tools and code libraries that I have reviewed. The tool is also innovative because it can be used in situations where no other mutation testing tools are available for a particular programming language.

Background: product code, automated test code, and code coverage -- a crash course

(Please feel free to skip to the next section if you understand these concepts well.)

When companies build software products, their software engineers need to ensure software quality to keep customers happy and avoid security problems, outages, lost customers, and lost revenue associated with low quality software. Software quality is ensured by testing, and some of that testing is automated.

Aside from writing the code that goes into the product, most good engineering teams also write automated tests using code (e.g., unit tests, end-to-end tests). Automated tests can be run frequently to verify that the software product's functionality is working correctly. Automated tests (henceforth referred to simply as tests) check that the product code behaves correctly under certain conditions. When engineers add new features and fixes to the product, they write code, compile/build the code into a product, write test code, and then run their tests, which may fail when something is broken. Failed tests inform engineers that the code needs more work before the product can be placed in production for customers to use. This code-and-test technique can help improve and maintain software quality, but higher levels of quality are possible when these product and test artifacts are processed by additional tools.
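To make this concrete, here is a tiny sketch in Python. The function and test names are my own illustrations, not from any real product: a piece of product code and an automated test that fails if the product misbehaves.

```python
# Hypothetical product code: compute the tax owed on an amount.
def calculate_tax(amount, rate):
    return round(amount * rate, 2)

# Automated test code: raises AssertionError (i.e., the test fails)
# if the product code stops behaving correctly.
def test_calculate_tax():
    assert calculate_tax(100.00, 0.05) == 5.00

test_calculate_tax()
print("all tests passed")
```

Running a suite of checks like this after every change is what lets engineers catch breakage before the product reaches customers.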

Special tools and code libraries exist that record the lines of product code that are actually exercised when the tests are run; the results are stored in a code coverage report. A code coverage report can tell you that a piece of the product was not tested at all, informing you to write more tests. After improving your tests to exercise the right conditions in the product, the code coverage report improves. But if the code coverage report tells you that some product code lines were executed, it does not necessarily mean that they were properly tested. For example, if the product uses a formula to calculate taxes, the code coverage report will only tell you that the tax formula was run. Unfortunately, the coverage report will not tell you if your tests are actually checking that the taxes are correct or incorrect. It is the burden of the writer of automated tests to remember to check everything important, which can be hard to track. This is where mutation testing helps.
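The coverage blind spot can be shown in a few lines of Python. This is a hypothetical illustration (the names and the deliberate bug are mine): the test below executes every line of the product code, so a line-coverage report would show 100% coverage, yet the wrong formula still passes.

```python
# Hypothetical product code with a deliberate bug: the formula
# should be amount * rate, but it adds instead of multiplying.
def calculate_tax(amount, rate):
    return round(amount + rate, 2)

def test_tax_runs():
    # This test executes every line of calculate_tax, so a coverage
    # report shows 100% line coverage -- but it never asserts on the
    # result, so the wrong formula above still "passes".
    calculate_tax(100.00, 0.05)

test_tax_runs()
print("coverage: full; assertions: none -- the bug goes unnoticed")
```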

Background: mutation testing

Mutation testing is an advanced software testing technique where tests are run repeatedly against product code, but each time the product is tested, the product code is intentionally changed in some way to break it. Mutation testing tools mangle lines of product code in different ways (e.g., deleting lines, changing numbers around). Each unique way that the product is changed is called a mutant (e.g., "line 4 is deleted from one code file"). In mutation testing, the goal is for engineers to "kill mutants" and to eliminate or reduce the number of mutants that "survive". These concepts are explained below.

If tests are run against a mutant and any of them fail, then the mutant is considered killed. A killed mutant indicates that the lines of product code that were mangled are properly tested by the tests.

But if all tests pass when run on a mutant, then the mutant has survived. A mutant that survived could mean that the tests do not fully test the behavior of the product, a piece of product code could have a problem in it or not be necessary, or the report could be a false positive. Software engineers analyze these survived mutants and determine the best course of action: fixing tests, fixing the product, or ignoring the false positive.

If an engineer kills enough mutants, she has proven reasonably well that the product cannot function properly unless it contains its original unmangled code. This is much stronger evidence of quality than a code coverage report, which only verifies that the code is executed while testing.
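The kill/survive distinction can be sketched in Python. The functions below are my own illustrations, not real tool output: the "mutant" is a copy of the product with one line deleted, a weak test lets it survive, and a strong test kills it.

```python
# Original product code: sum a list of prices.
def product_original(prices):
    total = 0
    for p in prices:
        total += p
    return total

# Mutant: the line "total += p" has been deleted.
def product_mutant(prices):
    total = 0
    for p in prices:
        pass
    return total

def weak_test(product):
    # Only checks that the call does not crash. It passes on the
    # mutant, so the mutant SURVIVES -- revealing a testing gap.
    product([1, 2, 3])
    return True

def strong_test(product):
    # Checks the actual result. It fails on the mutant, KILLING it.
    return product([1, 2, 3]) == 6

print(weak_test(product_mutant))    # True  -> mutant survived
print(strong_test(product_mutant))  # False -> mutant killed
```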

The following story illustrates the mutation testing experience, its benefits, and how it works with my tool.

Example usage of the universal mutation testing tool (a fictional story)

Claire built some new features for her online tax preparation product, which leverages a variety of technologies and programming languages. In the product code area of her development workspace, she created a C++ file `government/canada.cpp` and edited some logic in the C++ file `tax/real_estate.cpp`. She also wrote five new test cases to verify her functionality in the Python file `tests/test_canadian_real_estate_taxes.py`. As she had done many times a day, she built her product in her terminal by typing the command `make`. After the product was built, she tested it by running the command `make test`. After 30 seconds of waiting, she saw that all 75 tests passed (70 existing and five new). But was Claire done with her work? Were those five tests enough? She couldn't think of additional test cases, but she wanted to be really confident that the Canadian real estate tax feature was correct and well-tested. She had read about the benefits of mutation testing and wanted to try it out, but didn't want to deal with integrating a mutation testing framework now.

She decided to try out my mutation testing tool `mutation-test` to explore how well she had tested her new code. In a configuration file, she specified that `make` builds the product, `make test` tests the product, and `/opt/line-tools/bin/cpp-annotator` identifies lines of code within C++ files. To mutation test her code, she then typed `mutation-test government/canada.cpp tax/real_estate.cpp` and pressed enter. While her computer crunched away, running up to 75 tests against each of the 352 mutants it generated, she went out to lunch. A little after she returned from lunch, the program reported some interesting findings. Some mutants had survived.
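The essay does not show the configuration file's actual format, so the sketch below is purely hypothetical; it only illustrates the three settings Claire supplied:

```
# Hypothetical configuration sketch (the real file format is
# not shown here; only the three settings are):
build_command  = make
test_command   = make test
line_annotator = /opt/line-tools/bin/cpp-annotator
```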

In one mutant, the line `new_priority_factor = 5 * incoming_priority_factor;` was deleted from the product, but the mutant still passed the tests. After some analysis, she determined that the line and some associated variables were unnecessary and removed them, which lowered technical debt. A small win for the product.

In another mutant, a five-line loop was deleted and the mutant still passed all 75 test cases, including the new ones. This was surprising. The loop computed a total, and after looking through her test code in `test_canadian_real_estate_taxes.py`, she realized that she had forgotten to check that the `total` field was coming out as expected. She added some checks to her test code and then re-ran her tests with `make test`. The test that she had modified was now failing. The actual total that the product computed was `498.00 CAD` but her test expected `148.00 CAD`. She identified and fixed the bug in her loop and then re-ran her tests. Now all 75 tests were passing. The tool successfully helped identify a testing gap and, ultimately, a bug in a piece of code that was not properly tested. That was a big win for product quality!

With the use of my mutation testing tool, a critical product bug was found and product testing was strengthened. Product bugs can be a lot more expensive and difficult to fix if they escape to production, so with a few minutes of setup and an hour or so of waiting, the mutation test run was well worth the investment.

How it works

The tool's behavior is pretty simple. It backs up the product code and then starts a series of mutation testing experiments. In each experiment, a line or block of code is deleted and then the user-specified external build and test tools are invoked. The tool records whether each mutant was killed or survived. Finally, the results are printed: the mutants that survived are listed along with code diffs that explain how the product code was mutated.
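As a rough sketch of that experiment loop, assuming a runner that shells out to user-configured build and test commands (all names here are mine, not the tool's actual implementation):

```python
import os
import shutil
import subprocess

def run(cmd):
    """Run a user-configured shell command; True if it exited cleanly."""
    return subprocess.run(cmd, shell=True).returncode == 0

def mutation_test(path, build_cmd="make", test_cmd="make test"):
    """Delete one line at a time from `path`, then re-build and re-test.

    A mutant is killed if the build or any test fails; it survives if
    everything still passes. Survivors are returned for analysis.
    """
    backup = path + ".bak"
    shutil.copyfile(path, backup)                 # back up product code
    with open(path) as f:
        lines = f.readlines()
    survivors = []
    try:
        for i in range(len(lines)):
            mutant = lines[:i] + lines[i + 1:]    # delete line i
            with open(path, "w") as f:
                f.writelines(mutant)
            if run(build_cmd) and run(test_cmd):  # everything passed?
                survivors.append((i + 1, lines[i]))
    finally:
        shutil.copyfile(backup, path)             # restore the original
        os.remove(backup)
    return survivors
```

In the real tool, the configured line annotator also helps choose which lines and blocks are candidates for deletion, and each survivor is reported as a code diff.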

Disadvantages

The tool does have some disadvantages. By design, the tool is not intelligent and runs more slowly than other mutation testing solutions that are tuned for specific languages. Also, if deleting lines always results in compilation breakage, then the tool will not "universally" work for your programming language or code base. :)

Conclusion

Mutation testing helps identify gaps in software testing, which helps software engineers improve tests and catch more bugs earlier, saving engineering time and money. My tool offers a path to quickly set up mutation testing experiments and reap the benefits.
