Can Testedness be Effectively Measured?

Iftekhar Ahmed
November 21, 2016

Presentation given at FSE 2016

Transcript

  1. Can Testedness be Effectively Measured? Iftekhar Ahmed, Rahul Gopinath,
     Caius Brindescu, Alex Groce, Carlos Jensen
  2. Quality matters

  3. What to do?

  4. Our contribution

     • A novel approach for retroactively evaluating the quality of a test suite.
     • Evaluate the impact of tests on code quality.
     • Do more tests lead to better code quality?
     • Key finding: Testing works… to a certain extent.
  5. How do we measure testedness?

     • Code coverage.
     • Mutation coverage.
  6. Code coverage

     “Code coverage is a measure used to describe the degree to which the
     source code of a program is tested” --Wikipedia
     • Stop testing when a target coverage level is reached.
  7. Two problems with manually written tests

     • Coverage ≠ Quality.
     • Prone to manipulation.
     • Depends on the quality of assertions.
     • Assertions have a tendency to be inadequate.
     • Might give a false sense of security.
     • Manually written tests are subject to the same correctness problems as programs.
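The "Coverage ≠ Quality" point can be made concrete with a toy example (the function and test below are illustrative, not from the talk): a test can execute every statement and still miss a bug, because its assertion is too weak and its input happens not to expose the fault.

```python
# Illustrative only: 100% statement coverage, yet the bug survives.

def discriminant(a, b, c):
    # Bug: should be b * b - 4 * a * c
    return b * b + 4 * a * c

def test_discriminant():
    # This test executes every statement of discriminant(),
    # so statement coverage is 100%...
    result = discriminant(1, 2, 0)
    # ...but with c == 0 the '+' bug is invisible, and the assertion
    # only checks the sign anyway, so the test passes regardless.
    assert result >= 0

test_discriminant()
print("test passed despite the bug")
```

This is exactly the failure mode the slide names: the coverage number says "tested", while the assertion quality says otherwise.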
  8. Enter mutation analysis

     • Mutants look like real bugs.
       Original: d = b^2 - 4 * a * c;
       Mutants:  d = b^2 + 4 * a * c;
                 d = b^2 * 4 * a * c;
                 d = b^2 - 4 + a * c;
                 d = b^2 - 4 * a * a;
                 …
     • Add/replace/remove a single token.
     • Change a constant value / negate a condition.
     • Swap one binary operator for another.
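The operator-swap mutants above can be sketched with a purely textual mutator (a toy illustration; real tools such as PIT mutate bytecode, not source text):

```python
# Toy sketch: generate mutants by replacing one binary operator at a time.

def mutate_operators(expr, operators=("+", "-", "*", "/")):
    """Return every variant of `expr` with exactly one operator swapped."""
    mutants = []
    for i, ch in enumerate(expr):
        if ch in operators:
            for op in operators:
                if op != ch:
                    # Replace the single operator at position i.
                    mutants.append(expr[:i] + op + expr[i + 1:])
    return mutants

original = "d = b*b - 4*a*c"
for mutant in mutate_operators(original):
    print(mutant)
```

Each mutant is then run against the test suite; a mutant a test distinguishes from the original is "killed", and the mutation score is the fraction of mutants killed.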
  9. Why mutation analysis?

     • Mutants are similar to real bugs [Just2014].
     • Their detectability is similar to that of real faults [Andrews2005].
     • Tests with a high mutation score detect hand-seeded faults better than tests selected by other coverage metrics [Le2009].
     • Subsumes most test evaluation criteria:
       • Statement coverage [Andrews06]
       • Dataflow coverage [Offutt92]
  10. Limitations of mutation analysis

     Mutation analysis is not a silver bullet.
     • We don’t have all possible mutants.
     • Mutation analysis is only as good as its mutants.
     • Cannot simulate bugs caused by missing statements.
     • Generates a huge number of mutants = explosion in runtime.
     • Every mutant must be run against the full test suite.
  11. Research questions

     • Is there a correlation between testedness and code quality?
     • Does testing have an impact on code quality?
  12. METHODOLOGY

  13. Data collection

     GitHub & Apache libraries:
       1,800 projects
       → 796 have a test suite
       → 326 compile
       → 49 with |T| > 100
  14. Tools

     • PIT for mutation testing.
     • GumTree for program element tracking.
     • Machine learning for categorizing commits.
  15. Program element tracking
      (diagram: program elements tracked across commits over time)

  16. Summary of processing the dataset

     • Analyzed 11,566 commits.
     • Used a Naïve Bayes classifier to categorize commits as bug-fix or other.
     • Used 1,500 manually classified commits as training data.
     • Precision and recall for identifying bug-fix commits: 63% and 43%, respectively.
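As a rough illustration of that classification step, here is a toy word-count Naïve Bayes over made-up commit messages (stdlib only; this is not the authors' actual pipeline, and the messages and labels are invented):

```python
import math
from collections import Counter

def train(samples):
    """samples: list of (message, label). Returns class priors and word counts."""
    counts = {"bugfix": Counter(), "other": Counter()}
    labels = Counter()
    for msg, label in samples:
        labels[label] += 1
        counts[label].update(msg.lower().split())
    return labels, counts

def classify(msg, labels, counts):
    """Return the most likely label for a commit message."""
    total = sum(labels.values())
    vocab = set().union(*counts.values())
    best, best_score = None, float("-inf")
    for label in labels:
        n = sum(counts[label].values())
        score = math.log(labels[label] / total)  # log prior
        for word in msg.lower().split():
            # Laplace smoothing so unseen words don't zero out a class.
            score += math.log((counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

training = [
    ("fix null pointer crash", "bugfix"),
    ("fix off by one error in parser", "bugfix"),
    ("add new logging feature", "other"),
    ("update documentation", "other"),
]
labels, counts = train(training)
print(classify("fix crash in parser", labels, counts))
```

The real study trained on 1,500 hand-labeled commits; the modest 63%/43% precision/recall above shows how noisy commit-message classification is even then.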

  18. Utility of test suite quality

     • Fully tested ⇒ no bug fixes applied.
     • Not tested ⇒ higher chance of future bug fixes.
     (plot: testedness vs. future bug fixes)
  19. Relationship between bug-fixes and test suite quality

     Correlation between the total number of bug-fixes per statement and
     various test quality measurement criteria:

                  Mutation Score   Coverage
                  R² (mean)        R² (mean)
     Statement    -0.12            -0.11
     Blocks       -0.14            -0.13
     Methods      -0.16            -0.14
     Classes      -0.13             0.09

     • Weak correlation (significant at p < 0.05).
     • Mutation score/coverage is not a good predictor.
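For orientation, correlations of this kind can be computed as a Pearson r (with r² as the share of explained variance); a stdlib-only sketch, using made-up per-project data points rather than the study's dataset:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: statement coverage vs. bug fixes per statement (scaled).
coverage = [0.2, 0.4, 0.5, 0.7, 0.9]
bug_fixes = [9, 8, 7, 6, 6]

r = pearson_r(coverage, bug_fixes)
print(round(r, 2), round(r * r, 2))
```

The table's values near -0.1 correspond to r² around 0.01, i.e. testedness explains almost none of the variance in bug fixes, which is why the slide calls it a weak predictor.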
  20. Does testing have an impact?

     Difference in mean number of bug-fixes between covered and uncovered
     program elements:

                  Covered   Uncovered
     Statement    0.68      1.20
     Blocks       0.42      0.83
     Methods      0.40      0.87

     • Testing is effective in forcing quality improvements.
     • Testing does not guarantee quality.
  21. Impact of coverage

     Statement coverage score thresholds:
       Statement    0.68    1.20
       Blocks       0.42    0.83
       Methods      0.40    0.87

     Mutation score thresholds:
       Statement    0.60    1.20
       Blocks       0.39    0.81
       Methods      0.32    0.87

     • More tested ⇒ fewer bug fixes applied.
     • Covered statements need fewer bug fixes.
  22. Conclusions

     • Testing works!
     • However, test coverage is not enough.
     • There is a limit on the return on investment.
     • High coverage doesn’t guarantee bug-free software.
     • Continuous improvement with increasing coverage (as opposed to the binary “tested or not”) is not evident.
     • We need to move beyond coverage to measure the quality of tests.

     Thank You