Can Testedness be Effectively Measured?

Presentation given at FSE 2016

Iftekhar Ahmed
November 21, 2016
Transcript

  1. Can Testedness be Effectively Measured?
    Iftekhar Ahmed
    Rahul Gopinath
    Caius Brindescu
    Alex Groce
    Carlos Jensen

  2. Quality matters
    1

  3. What to do?
    2

  4. Our contribution
    3
    • A novel approach for retroactively evaluating the quality of a test suite.
    • Evaluate the impact of tests on code quality.
    • Do more tests lead to better code quality?
    • Key finding: Testing works… to a certain extent.

  5. How do we measure testedness?
    • Code coverage.
    • Mutation coverage.
    4

  6. Code coverage
    5
    “Code coverage is a measure used to describe the degree to which the
    source code of a program is tested” --Wikipedia
    • In practice, testing often stops once a targeted coverage level is reached.
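
    As a rough illustration of the metric (not any particular tool's output or
    implementation), statement coverage is simply the fraction of statements
    executed by the test suite; the minimal sketch below computes that ratio
    from hypothetical executed/total counts.

    // Minimal sketch, assuming coverage is reported as executed vs. total
    // statement counts; not the behavior of any specific coverage tool.
    class CoverageRatio {
        static double statementCoverage(int executedStatements, int totalStatements) {
            // Guard against empty programs; otherwise a plain ratio.
            return totalStatements == 0 ? 1.0 : (double) executedStatements / totalStatements;
        }

        public static void main(String[] args) {
            // e.g. 80 of 100 statements executed by the suite -> 0.8 (80%)
            System.out.println(statementCoverage(80, 100));
        }
    }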

  7. Two problems with manually written tests
    • Coverage ≠ quality.
    • Prone to manipulation.
    • Depends on the quality of assertions:
    • Assertions have a tendency to be inadequate.
    • Inadequate assertions can give a false sense of security (see the
    sketch below).
    • Manually written tests are subject to the same correctness problems as
    the programs they test.
    6
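
    A minimal JUnit 4 sketch of the false-sense-of-security problem; the
    Pricing class, its off-by-one bug, and the test names are hypothetical,
    invented for this illustration. The test executes every statement of
    applyDiscount, yet its assertions are too weak to expose the bug.

    // Hypothetical example: full statement coverage, but assertions too weak
    // to catch the bug (for this sketch, the spec says a discount applies
    // only for MORE than 10 items, so ">=" is an off-by-one bug).
    class Pricing {
        static double applyDiscount(double price, int quantity) {
            if (quantity >= 10) {          // bug: should be "quantity > 10"
                return price * 0.9;
            }
            return price;
        }
    }

    public class PricingTest {
        @org.junit.Test
        public void coversEverythingButChecksAlmostNothing() {
            double discounted = Pricing.applyDiscount(100.0, 10);  // true branch
            double regular = Pricing.applyDiscount(100.0, 5);      // false branch
            // Both branches are covered, yet these assertions pass on the buggy code.
            org.junit.Assert.assertTrue(discounted <= 100.0);
            org.junit.Assert.assertTrue(regular <= 100.0);
        }
    }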

  8. Enter mutation analysis
    • Mutants look like real bugs.
    d = b^2 - 4 * a * c;   (original)
    d = b^2 + 4 * a * c;   (mutant)
    d = b^2 * 4 * a * c;   (mutant)
    d = b^2 - 4 + a * c;   (mutant)
    d = b^2 - 4 * a * a;   (mutant)

    • Add/replace/remove a single token.
    • Change a constant value / apply negation.
    • Replace one binary operator with another.
    7
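
    A small JUnit 4 sketch of how such a mutant is "killed"; Quadratic and its
    test are illustrative names, not code from the study. A test with an exact
    assertion distinguishes the original discriminant from the '+' mutant shown
    above.

    // Illustrative only: original expression vs. the single-token '+' mutant.
    class Quadratic {
        static double discriminant(double a, double b, double c) {
            return b * b - 4 * a * c;             // original: d = b^2 - 4*a*c
            // mutant:  return b * b + 4 * a * c;  ('-' replaced by '+')
        }
    }

    public class QuadraticTest {
        @org.junit.Test
        public void killsTheOperatorMutant() {
            // For a=1, b=3, c=2 the original yields 1 while the '+' mutant
            // yields 17, so this exact assertion kills the mutant.
            org.junit.Assert.assertEquals(1.0, Quadratic.discriminant(1, 3, 2), 1e-9);
        }
    }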

  9. Why mutation analysis?
    • Mutants are similar to real bugs [Just2014].
    • Their detectability is similar to that of real faults [Andrews2005].
    • Test suites with high mutation scores detect hand-seeded faults better
    than suites satisfying other coverage criteria [Le2009].
    • Subsumes most test evaluation criteria:
    • Statement coverage [Andrews06]
    • Dataflow coverage [Offutt92]
    8

  10. Limitations of mutation analysis
    9
    Mutation analysis is not a silver bullet.
    • We don't have all possible mutants.
    • Mutation analysis is only as good as its mutants.
    • It cannot simulate bugs caused by missing statements.
    • It generates a huge number of mutants, which causes an explosion in
    runtime: in the worst case, every test must be run against every mutant
    (see the sketch below).
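
    A minimal sketch of why the naive approach costs roughly |mutants| x |tests|
    test executions; Mutant, TestCase, and mutationScore are placeholder names
    invented for illustration, not PIT's API.

    import java.util.List;

    // Placeholder interfaces for illustration only (not a real tool's API).
    class NaiveMutationAnalysis {
        interface Mutant { String applyTo(String source); }
        interface TestCase { boolean passesOn(String mutatedSource); }

        static double mutationScore(String source, List<Mutant> mutants, List<TestCase> tests) {
            int killed = 0;
            for (Mutant m : mutants) {                 // one mutated program per mutant
                String mutated = m.applyTo(source);
                for (TestCase t : tests) {             // worst case: all tests per mutant
                    if (!t.passesOn(mutated)) {        // a failing test "kills" the mutant
                        killed++;
                        break;                         // stop once the mutant is killed
                    }
                }
            }
            return (double) killed / mutants.size();   // fraction of mutants killed
        }
    }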

  11. Research questions
    10
    • Is there a correlation between testedness and code quality?
    • Does testing have an impact on code quality?

  12. METHODOLOGY

  13. Data collection
    GitHub & Apache projects: 1,800
    → has a test suite: 796
    → compiles: 326
    → |T| > 100: 49
    12
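
    A hypothetical sketch of that selection funnel as a filter pipeline; the
    Project interface and its accessor methods are invented for illustration
    and are not the study's tooling.

    import java.util.List;
    import java.util.stream.Collectors;

    // Illustrative only: mirrors the funnel above (has a test suite, compiles,
    // more than 100 tests); Project and its methods are hypothetical.
    class ProjectSelection {
        interface Project {
            boolean hasTestSuite();
            boolean compiles();
            int testCount();
        }

        static List<Project> select(List<Project> candidates) {
            return candidates.stream()
                    .filter(Project::hasTestSuite)
                    .filter(Project::compiles)
                    .filter(p -> p.testCount() > 100)
                    .collect(Collectors.toList());
        }
    }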

  14. Tools
    • PIT for mutation testing.
    • GumTree for program element tracking.
    • Machine learning for categorizing commits.
    13

  15. Program element tracking
    [Diagram: program elements tracked across commits over time]
    14

  16. Summary of dataset processing
    • Analyzed 11,566 commits.
    • Used a Naïve Bayes classifier to categorize commits as bug fix or other.
    • Used 1,500 manually classified commits as training data.
    • Precision and recall for identifying bug-fix commits: 63% and 43%,
    respectively (see the sketch below).
    15
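
    For reference, a minimal sketch of the precision and recall definitions
    behind those percentages; the tp/fp/fn counts below are placeholders chosen
    only to reproduce ratios close to 63% and 43%, not the study's actual
    confusion matrix.

    // Sketch of the standard definitions; counts are illustrative placeholders.
    class ClassifierMetrics {
        static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
        static double recall(int tp, int fn)    { return (double) tp / (tp + fn); }

        public static void main(String[] args) {
            int tp = 63, fp = 37, fn = 83;   // placeholder counts, not real data
            System.out.printf("precision=%.2f recall=%.2f%n",
                    precision(tp, fp), recall(tp, fn));   // ~0.63 and ~0.43
        }
    }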

  17. Utility of test suite quality
    17
    • Fully tested ⇒ no bug fixes applied.
    • Not tested ⇒ higher chance of future bug fixes.
    [Diagram: testedness vs. future bug fixes]

  18. Relationship between bug fixes and test suite quality

                 Mutation score   Coverage
                 R² (mean)        R² (mean)
    Statement    -0.12            -0.11
    Blocks       -0.14            -0.13
    Methods      -0.16            -0.14
    Classes      -0.13             0.09

    Correlation between the total number of bug fixes per statement and
    various test quality measurement criteria.
    • Weak correlation (significant at p < 0.05).
    • Mutation score/coverage is not a good predictor.
    18
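
    For context on reading those numbers, a self-contained sketch of a Pearson
    correlation (with R² as its square): an R² around 0.12 means the testedness
    measure explains only about 12% of the variance in bug-fix counts. The
    input arrays below are made-up values, not data from the study.

    // Illustrative only: Pearson correlation r and R^2 = r*r over made-up data.
    class CorrelationSketch {
        static double pearson(double[] x, double[] y) {
            int n = x.length;
            double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
            }
            double cov = sxy - sx * sy / n;   // sum of co-deviations
            double vx = sxx - sx * sx / n;    // sum of squared deviations of x
            double vy = syy - sy * sy / n;    // sum of squared deviations of y
            return cov / Math.sqrt(vx * vy);
        }

        public static void main(String[] args) {
            double[] mutationScore = {0.2, 0.4, 0.6, 0.8, 1.0};  // made-up values
            double[] bugFixes      = {3.0, 4.0, 1.0, 2.0, 1.0};  // made-up values
            double r = pearson(mutationScore, bugFixes);
            System.out.printf("r=%.2f  R^2=%.2f%n", r, r * r);
        }
    }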

  19. Does testing have an impact?

               Covered   Uncovered
    Statement  0.68      1.20
    Blocks     0.42      0.83
    Methods    0.40      0.87

    Difference in mean number of bug fixes between covered and uncovered
    program elements.
    Testing is effective in forcing quality improvements.
    Testing does not guarantee quality.
    19

  20. Impact of coverage
    • More tested ⇒ fewer bug fixes applied.

    Statement coverage score thresholds
    Statement  0.68  1.20
    Blocks     0.42  0.83
    Methods    0.40  0.87

    Mutation score thresholds
    Statement  0.60  1.20
    Blocks     0.39  0.81
    Methods    0.32  0.87

    Covered statements need fewer bug fixes.
    20

  21. Conclusions
    21
    • Testing works!
    • However, test coverage is not enough.
    • There is a limit to the return on investment.
    • High coverage doesn't guarantee bug-free software.
    • Beyond the basic "tested or not?" distinction, continued quality
    improvement with higher coverage is not evident.
    • We need to move beyond coverage to measure the quality of tests.
    Thank You
