Can Testedness be Effectively Measured?

Presentation given at FSE 2016

Iftekhar Ahmed
November 21, 2016
Transcript

  1. Can Testedness be Effectively Measured?
    Iftekhar Ahmed
    Rahul Gopinath
    Caius Brindescu
    Alex Groce
    Carlos Jensen

  2. Quality matters
    1

  3. What to do?
    2

  4. Our contribution
    3
    • A novel approach for retroactively evaluating the quality of a test suite.
    • Evaluate the impact of tests on code quality.
    • Do more tests lead to better code quality?
    • Key finding: Testing works… to a certain extent.

  5. How do we measure testedness?
    • Code coverage.
    • Mutation coverage.
    4

  6. Code coverage
    5
    “Code coverage is a measure used to describe the degree to which the
    source code of a program is tested” --Wikipedia
    • In practice, testing often stops once a targeted coverage level is reached.
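
    As a rough illustration of the metric (not any particular tool's output or
    implementation), statement coverage is simply the fraction of statements
    executed by the test suite; the minimal sketch below computes that ratio
    from hypothetical executed/total counts.

    // Minimal sketch, assuming coverage is reported as executed vs. total
    // statement counts; not the behavior of any specific coverage tool.
    class CoverageRatio {
        static double statementCoverage(int executedStatements, int totalStatements) {
            // Guard against empty programs; otherwise a plain ratio.
            return totalStatements == 0 ? 1.0 : (double) executedStatements / totalStatements;
        }

        public static void main(String[] args) {
            // e.g. 80 of 100 statements executed by the suite -> 0.8 (80%)
            System.out.println(statementCoverage(80, 100));
        }
    }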

  7. Two problems with manually written tests
    • Coverage ≠ quality.
    • Prone to manipulation.
    • Depends on the quality of assertions:
    • Assertions have a tendency to be inadequate.
    • Inadequate assertions can give a false sense of security (see the
    sketch below).
    • Manually written tests are subject to the same correctness problems as
    the programs they test.
    6
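
    A minimal JUnit 4 sketch of the false-sense-of-security problem; the
    Pricing class, its off-by-one bug, and the test names are hypothetical,
    invented for this illustration. The test executes every statement of
    applyDiscount, yet its assertions are too weak to expose the bug.

    // Hypothetical example: full statement coverage, but assertions too weak
    // to catch the bug (for this sketch, the spec says a discount applies
    // only for MORE than 10 items, so ">=" is an off-by-one bug).
    class Pricing {
        static double applyDiscount(double price, int quantity) {
            if (quantity >= 10) {          // bug: should be "quantity > 10"
                return price * 0.9;
            }
            return price;
        }
    }

    public class PricingTest {
        @org.junit.Test
        public void coversEverythingButChecksAlmostNothing() {
            double discounted = Pricing.applyDiscount(100.0, 10);  // true branch
            double regular = Pricing.applyDiscount(100.0, 5);      // false branch
            // Both branches are covered, yet these assertions pass on the buggy code.
            org.junit.Assert.assertTrue(discounted <= 100.0);
            org.junit.Assert.assertTrue(regular <= 100.0);
        }
    }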

  8. Enter mutation analysis
    • Mutants look like real bugs.
    d = b^2 - 4 * a * c;   (original)
    d = b^2 + 4 * a * c;   (mutant)
    d = b^2 * 4 * a * c;   (mutant)
    d = b^2 - 4 + a * c;   (mutant)
    d = b^2 - 4 * a * a;   (mutant)

    • Add/replace/remove a single token.
    • Change a constant value / apply negation.
    • Replace one binary operator with another.
    7
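
    A small JUnit 4 sketch of how such a mutant is "killed"; Quadratic and its
    test are illustrative names, not code from the study. A test with an exact
    assertion distinguishes the original discriminant from the '+' mutant shown
    above.

    // Illustrative only: original expression vs. the single-token '+' mutant.
    class Quadratic {
        static double discriminant(double a, double b, double c) {
            return b * b - 4 * a * c;             // original: d = b^2 - 4*a*c
            // mutant:  return b * b + 4 * a * c;  ('-' replaced by '+')
        }
    }

    public class QuadraticTest {
        @org.junit.Test
        public void killsTheOperatorMutant() {
            // For a=1, b=3, c=2 the original yields 1 while the '+' mutant
            // yields 17, so this exact assertion kills the mutant.
            org.junit.Assert.assertEquals(1.0, Quadratic.discriminant(1, 3, 2), 1e-9);
        }
    }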

  9. Why mutation analysis?
    • Mutants are similar to real bugs [Just2014].
    • Their detectability is similar to that of real faults [Andrews2005].
    • Test suites with high mutation scores detect hand-seeded faults better
    than suites satisfying other coverage criteria [Le2009].
    • Subsumes most test evaluation criteria:
    • Statement coverage [Andrews06]
    • Dataflow coverage [Offutt92]
    8

  10. Limitations of mutation analysis
    9
    Mutation analysis is not a silver bullet.
    • We don't have all possible mutants.
    • Mutation analysis is only as good as its mutants.
    • It cannot simulate bugs caused by missing statements.
    • It generates a huge number of mutants, which causes an explosion in
    runtime: in the worst case, every test must be run against every mutant
    (see the sketch below).
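
    A minimal sketch of why the naive approach costs roughly |mutants| x |tests|
    test executions; Mutant, TestCase, and mutationScore are placeholder names
    invented for illustration, not PIT's API.

    import java.util.List;

    // Placeholder interfaces for illustration only (not a real tool's API).
    class NaiveMutationAnalysis {
        interface Mutant { String applyTo(String source); }
        interface TestCase { boolean passesOn(String mutatedSource); }

        static double mutationScore(String source, List<Mutant> mutants, List<TestCase> tests) {
            int killed = 0;
            for (Mutant m : mutants) {                 // one mutated program per mutant
                String mutated = m.applyTo(source);
                for (TestCase t : tests) {             // worst case: all tests per mutant
                    if (!t.passesOn(mutated)) {        // a failing test "kills" the mutant
                        killed++;
                        break;                         // stop once the mutant is killed
                    }
                }
            }
            return (double) killed / mutants.size();   // fraction of mutants killed
        }
    }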

  11. Research questions
    10
    • Is there a correlation between testedness and code quality?
    • Does testing have an impact on code quality?

  12. METHODOLOGY

  13. Data collection
    GitHub & Apache projects: 1,800
    → has a test suite: 796
    → compiles: 326
    → |T| > 100: 49
    12
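
    A hypothetical sketch of that selection funnel as a filter pipeline; the
    Project interface and its accessor methods are invented for illustration
    and are not the study's tooling.

    import java.util.List;
    import java.util.stream.Collectors;

    // Illustrative only: mirrors the funnel above (has a test suite, compiles,
    // more than 100 tests); Project and its methods are hypothetical.
    class ProjectSelection {
        interface Project {
            boolean hasTestSuite();
            boolean compiles();
            int testCount();
        }

        static List<Project> select(List<Project> candidates) {
            return candidates.stream()
                    .filter(Project::hasTestSuite)
                    .filter(Project::compiles)
                    .filter(p -> p.testCount() > 100)
                    .collect(Collectors.toList());
        }
    }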

  14. Tools
    • PIT for mutation testing.
    • GumTree for program element tracking.
    • Machine learning for categorizing commits.
    13

  15. Program element tracking
    [Diagram: program elements tracked across commits over time]
    14

  16. Summary of dataset processing
    • Analyzed 11,566 commits.
    • Used a Naïve Bayes classifier to categorize commits as bug fix or other.
    • Used 1,500 manually classified commits as training data.
    • Precision and recall for identifying bug-fix commits: 63% and 43%,
    respectively (see the sketch below).
    15
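
    For reference, a minimal sketch of the precision and recall definitions
    behind those percentages; the tp/fp/fn counts below are placeholders chosen
    only to reproduce ratios close to 63% and 43%, not the study's actual
    confusion matrix.

    // Sketch of the standard definitions; counts are illustrative placeholders.
    class ClassifierMetrics {
        static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
        static double recall(int tp, int fn)    { return (double) tp / (tp + fn); }

        public static void main(String[] args) {
            int tp = 63, fp = 37, fn = 83;   // placeholder counts, not real data
            System.out.printf("precision=%.2f recall=%.2f%n",
                    precision(tp, fp), recall(tp, fn));   // ~0.63 and ~0.43
        }
    }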

  17. Utility of test suite quality
    17
    • Fully tested ⇒ no bug fixes applied.
    • Not tested ⇒ higher chance of future bug fixes.
    [Diagram: testedness vs. future bug fixes]

  18. Relationship between bug fixes and test suite quality

                 Mutation score   Coverage
                 R² (mean)        R² (mean)
    Statement    -0.12            -0.11
    Blocks       -0.14            -0.13
    Methods      -0.16            -0.14
    Classes      -0.13             0.09

    Correlation between the total number of bug fixes per statement and
    various test quality measurement criteria.
    • Weak correlation (significant at p < 0.05).
    • Mutation score/coverage is not a good predictor.
    18
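
    For context on reading those numbers, a self-contained sketch of a Pearson
    correlation (with R² as its square): an R² around 0.12 means the testedness
    measure explains only about 12% of the variance in bug-fix counts. The
    input arrays below are made-up values, not data from the study.

    // Illustrative only: Pearson correlation r and R^2 = r*r over made-up data.
    class CorrelationSketch {
        static double pearson(double[] x, double[] y) {
            int n = x.length;
            double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
            }
            double cov = sxy - sx * sy / n;   // sum of co-deviations
            double vx = sxx - sx * sx / n;    // sum of squared deviations of x
            double vy = syy - sy * sy / n;    // sum of squared deviations of y
            return cov / Math.sqrt(vx * vy);
        }

        public static void main(String[] args) {
            double[] mutationScore = {0.2, 0.4, 0.6, 0.8, 1.0};  // made-up values
            double[] bugFixes      = {3.0, 4.0, 1.0, 2.0, 1.0};  // made-up values
            double r = pearson(mutationScore, bugFixes);
            System.out.printf("r=%.2f  R^2=%.2f%n", r, r * r);
        }
    }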

  19. Does testing have an impact?

               Covered   Uncovered
    Statement  0.68      1.20
    Blocks     0.42      0.83
    Methods    0.40      0.87

    Difference in mean number of bug fixes between covered and uncovered
    program elements.
    Testing is effective in forcing quality improvements.
    Testing does not guarantee quality.
    19

  20. Impact of coverage
    • More tested ⇒ fewer bug fixes applied.

    Statement coverage score thresholds
    Statement  0.68  1.20
    Blocks     0.42  0.83
    Methods    0.40  0.87

    Mutation score thresholds
    Statement  0.60  1.20
    Blocks     0.39  0.81
    Methods    0.32  0.87

    Covered statements need fewer bug fixes.
    20

  21. Conclusions
    21
    • Testing works!
    • However, test coverage is not enough.
    • There is a limit to the return on investment.
    • High coverage doesn't guarantee bug-free software.
    • Beyond the basic "tested or not?" distinction, continued quality
    improvement with higher coverage is not evident.
    • We need to move beyond coverage to measure the quality of tests.
    Thank You
