the quality of test suite.
• Evaluate the impact of tests on code quality.
• Do more tests lead to better code quality?
• Key finding: Testing works… to a certain extent.
• Prone to manipulation.
• Depends on the quality of assertions.
• Assertions tend to be inadequate.
• Might give a false sense of security.
• Manually written tests are subject to the same correctness problems as programs.
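To make the point about assertion quality concrete, here is a minimal sketch. The function and test names are hypothetical and do not come from the study. Both tests execute every statement of a deliberately buggy function, so both reach full statement coverage, but only the one with an assertion can fail.

```python
def discriminant(a, b, c):
    # Deliberately buggy: uses + where the correct formula uses -.
    return b ** 2 + 4 * a * c


def weak_test():
    # Executes every statement of discriminant(), so statement coverage
    # is 100%, but nothing is checked: this test cannot fail.
    discriminant(1, 2, 1)


def strong_test():
    # Same coverage, but the assertion exposes the bug
    # (the correct discriminant of x^2 + 2x + 1 is 0).
    assert discriminant(1, 2, 1) == 0


def passes(test):
    """Return True if the test function finishes without an AssertionError."""
    try:
        test()
        return True
    except AssertionError:
        return False
```

Under these assumptions, `passes(weak_test)` is true even though the code is wrong, which is the false sense of security the slide warns about.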
= b^2 - 4 * a * c;
d = b^2 + 4 * a * c;
d = b^2 * 4 * a * c;
d = b^2 - 4 + a * c;
d = b^2 - 4 * a * a;
…
• Add/replace/remove a single token.
• Change a constant value or apply negation.
• Replace one binary operator with another.
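The mutants above can be turned into a small runnable sketch of mutation analysis. This is an illustration under stated assumptions, not the tooling used in the study: the slide's `b^2` is written `b ** 2` (in Python `^` is XOR), the original expression acts as the test oracle, and the names `MUTANTS`, `is_killed`, and `mutation_score` are made up for this example.

```python
def original(a, b, c):
    return b ** 2 - 4 * a * c


# The four mutants listed on the slide; each differs from the
# original by a single token.
MUTANTS = {
    "- replaced by +": lambda a, b, c: b ** 2 + 4 * a * c,
    "- replaced by *": lambda a, b, c: b ** 2 * 4 * a * c,
    "* replaced by +": lambda a, b, c: b ** 2 - 4 + a * c,
    "c replaced by a": lambda a, b, c: b ** 2 - 4 * a * a,
}


def is_killed(mutant, inputs):
    """A mutant is killed if at least one test input makes its result
    differ from the original's (which a correct assertion would catch)."""
    return any(mutant(*args) != original(*args) for args in inputs)


def mutation_score(inputs):
    """Fraction of mutants killed by the given test inputs."""
    killed = sum(is_killed(m, inputs) for m in MUTANTS.values())
    return killed / len(MUTANTS)
```

With the single input `(1, 0, 0)`, only the last two mutants are killed (score 0.5), because `b = 0` and `c = 0` hide the operator changes. The input `(1, 2, 3)` kills all four (score 1.0).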
[Just2014].
• Their detectability is similar to that of real faults [Andrews2005].
• A high mutation score is a better indicator of a test suite's ability to detect hand-seeded faults than other test coverage metrics [Le2009].
• Subsumes most test evaluation criteria:
  • Statement coverage [Andrews06]
  • Dataflow coverage [Offutt92]
silver bullet.
• We don’t have all possible mutants.
• Mutation analysis is only as good as its mutants.
• Cannot simulate bugs that occur due to missing statements.
• Generates a huge number of mutants, which causes an explosion in runtime.
• All tests need to be run for each mutant.
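The runtime explosion is simple multiplication, sketched below. The function names and the figures in the usage note are illustrative assumptions, not measurements from the study, and the estimate is a worst case with no early exit once a mutant is killed.

```python
def total_test_runs(num_mutants, num_tests):
    """Worst case: every test is executed against every mutant."""
    return num_mutants * num_tests


def estimated_hours(num_mutants, num_tests, seconds_per_test):
    """Sequential wall-clock estimate for the worst case, in hours."""
    return total_test_runs(num_mutants, num_tests) * seconds_per_test / 3600
```

For example, a hypothetical project with 10,000 mutants and 500 tests needs 5,000,000 test executions. At 0.1 seconds per test that is roughly 139 hours of sequential runtime.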
Naïve Bayes classifier to categorize commits into "bug fix" and "other".
• Used 1,500 manually classified commits as training data.
• Precision and recall for identifying bug fixes: 63% and 43%, respectively.
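The slides do not say which features or implementation were used. The sketch below is one plausible reading: a multinomial Naïve Bayes over the words of the commit message, with add-one smoothing, plus the precision and recall computation. The label strings `"bugfix"` and `"other"`, the class name, and the whitespace tokenizer are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    # Assumed feature extraction: lowercase, split on whitespace.
    return message.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, message):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(message):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall(predicted, actual, positive="bugfix"):
    """Precision and recall for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A recall of 43% means that more than half of the actual bug-fix commits are missed by the classifier, which is worth keeping in mind when reading the bug-fix counts that follow.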
            R2 (Mean)   R2 (Mean)
Statement     -0.12       -0.11
Blocks        -0.14       -0.13
Methods       -0.16       -0.14
Classes       -0.13        0.09
Correlation between total number of bug-fixes per statement and various test quality measurement criteria
• Weak correlation (significant at p < 0.05).
• Mutation score/coverage is not a good predictor.
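The slides do not state how the correlation was computed. As a rough sketch, assuming a plain Pearson correlation between a test quality measure and the number of bug-fixes per statement, the calculation looks like this. The function name and the data in the usage note are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For instance, with invented coverage values `[0.2, 0.4, 0.6, 0.8]` and bug-fix rates `[0.5, 0.3, 0.4, 0.2]`, `pearson_r` returns about -0.8: higher coverage goes with fewer bug-fixes. Values near zero, like those in the table, mean the measure says little about the bug-fix rate.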
Blocks     0.42   0.83
Methods    0.40   0.87
Difference in mean number of bug-fixes between covered and uncovered program elements
Testing is effective in forcing quality improvements.
Testing does not guarantee quality.
not enough.
• There is a limit on return on investment.
• High coverage doesn’t guarantee bug-free software.
• Continuous improvement related to coverage (vs. a mere measure of “tested or not?”) is not evident.
• We need to move beyond coverage to measure the quality of the tests.
Thank You