Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World

GTAC 2016

Rahul Gopinath

October 18, 2016
Transcript

  1. Code Coverage is a Strong Predictor of
    Test Suite Effectiveness
    in the Real World
    Rahul Gopinath, Iftekhar Ahmed

  2. When should we stop testing?

  3. How to evaluate test suite effectiveness?

  4. Previous research: Do not trust coverage
    (In theory)
    GTAC’15 Inozemtseva

  5. Test suite quality
    Coverage
    Assertions
    Factors affecting test suite quality

  6. Test suite quality
    Coverage
    Assertions
    According to previous research
    Test suite size GTAC’15 Inozemtseva

  7. What is the adequate test suite size?
    But…
    • Is there a maximum number of test cases for a given program?
    • Are different test cases equivalent in strength?
    • How do we account for duplicate tests?
    • Test suite sizes are not comparable even for the same program.

  8. Can I use coverage to measure suite effectiveness?

  9. Mutation score is best predicted by statement coverage
    A fault in a statement has 87%
    probability of being detected
    if a test covers it.
    M = 0.87 × S
    Size of dots follows size of projects
    R² = 0.94
    Results from 250 real-world programs
    largest > 100 KLOC
    On developer-written test suites
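The relation M = 0.87 × S reads as a straight-line fit through the origin. The sketch below shows one way such a slope and R² could be computed, assuming ordinary least squares with no intercept (the slide does not state the exact fitting procedure). The function name `fit_through_origin` and the sample numbers are hypothetical and are not data from the study.

```python
def fit_through_origin(coverage, mutation_score):
    """Least-squares slope k for mutation_score ~ k * coverage (no intercept)."""
    sxy = sum(s * m for s, m in zip(coverage, mutation_score))
    sxx = sum(s * s for s in coverage)
    k = sxy / sxx
    # R^2 for a no-intercept model uses the uncentered total sum of squares.
    ss_res = sum((m - k * s) ** 2 for s, m in zip(coverage, mutation_score))
    ss_tot = sum(m * m for m in mutation_score)
    return k, 1.0 - ss_res / ss_tot


# Hypothetical per-project values (statement coverage, mutation score), both in [0, 1].
coverage = [0.20, 0.45, 0.60, 0.80, 0.95]
mutation = [0.18, 0.38, 0.53, 0.69, 0.83]

slope, r2 = fit_through_origin(coverage, mutation)
print(f"M = {slope:.2f} x S, R^2 = {r2:.2f}")
```

The slope is then read as in the slide: the fraction of faults in covered statements that the suite is expected to detect.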

  10. Mutation score is best predicted by statement coverage
    A fault in a statement has 61%
    probability of being detected
    if a test covers it.
    M = 0.61 × S
    R² = 0.70
    Results from 250 real-world programs
    largest > 100 KLOC
    On Randoop-generated test suites

  11. Controlling for test suite size, coverage provides little extra information.
    Hence don't use coverage
    GTAC’15 Inozemtseva
    But
    Mutation score provides little extra information (<6%) compared to coverage.
    Why use mutation?

  12. Does coverage follow test suite size?
                                    GTAC’15 Inozemtseva           Our research
    # Programs                      5                             250
    Selection of programs           Ad hoc                        Systematic sample from GitHub
    Tools used                      CodeCover, PIT                Emma, Cobertura, CodeCover, PIT
    Test suites                     Random subsets of original    Organic and randomly generated (new results)
    Removal of influence of size    Ad hoc                        Statistical
    Our study is much larger, systematic (not ad hoc), and follows real-world usage.

    Our research (new results)        Variability explained
    M ~ TestsuiteSize                 12.84%
    M ~ log(TSize)                    51.26%
    residuals(M ~ log(TSize)) ~ S     75.25%
    Statement coverage can explain 75% of the variability in mutation score after
    eliminating the influence of test suite size.
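The last row, residuals(M ~ log(TSize)) ~ S, describes a two-step regression: first remove what test suite size explains, then ask how much of the remainder statement coverage explains. A minimal sketch of that procedure follows, assuming ordinary least squares at both steps. The helper names and the project data are hypothetical and are not the study's data or scripts.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares y ~ a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def residuals(x, y):
    """What is left of y after subtracting the fitted line y ~ a + b*x."""
    a, b, _ = linear_fit(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical projects: (number of tests, statement coverage, mutation score).
projects = [
    (10, 0.30, 0.25), (40, 0.35, 0.30), (40, 0.80, 0.70),
    (200, 0.50, 0.45), (200, 0.90, 0.78), (1000, 0.70, 0.62),
]
log_size = [math.log(n) for n, _, _ in projects]
cov = [s for _, s, _ in projects]
mut = [m for _, _, m in projects]

_, _, r2_size = linear_fit(log_size, mut)   # M ~ log(TSize)
leftover = residuals(log_size, mut)         # part of M that size cannot explain
_, _, r2_cov = linear_fit(cov, leftover)    # residuals(M ~ log(TSize)) ~ S
print(f"M ~ log(TSize): {r2_size:.2%}; residuals ~ S: {r2_cov:.2%}")
```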

  13. Is mutation analysis better than coverage analysis?

  14. Δ = b² - 4ac
    d = b^2 + 4 * a * c;

    d = b^2 * 4 * a * c;

    d = b^2 / 4 * a * c;

    d = b^2 ^ 4 * a * c;

    d = b^2 % 4 * a * c;
    d = b^2 << 4 * a * c;
    d = b^2 >> 4 * a * c;
    d = b^2 * 4 + a * c;

    d = b^2 * 4 - a * c;

    d = b^2 * 4 / a * c;

    d = b^2 * 4 ^ a * c;

    d = b^2 * 4 % a * c;
    d = b^2 * 4 << a * c;
    d = b^2 * 4 >> a * c;
    d = b^2 * 4 * a + c;

    d = b^2 * 4 * a - c;

    d = b^2 * 4 * a / c;

    d = b^2 * 4 * a ^ c;

    d = b^2 * 4 * a % c;
    d = b^2 * 4 * a << c;
    d = b^2 * 4 * a >> c;
    d = b + 2 - 4 * a * c;

    d = b - 2 - 4 * a * c;

    d = b * 2 - 4 * a * c;

    d = b / 2 - 4 * a * c;

    d = b % 2 - 4 * a * c;
    d = b^0 - 4 * a * c;

    d = b^1 - 4 * a * c;
    d = b^-1 - 4 * a * c;
    d = b^MAX - 4 * a * c;
    d = b^MIN - 4 * a * c;
    d = b^2 - 0 * a * c;

    d = b^2 - 1 * a * c;

    d = b^2 - (-1) * a * c;

    d = b^2 - MAX * a * c;

    d = b^2 - MIN * a * c;

    Mutation Analysis: High cost of analysis
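The list above grows quickly because every operator in the line can be swapped for every other operator, and each resulting mutant needs its own test run. A rough sketch of that enumeration follows, assuming a simple whitespace-tokenized expression and the operator set seen on the slide. The function `operator_mutants` is hypothetical and is not the tool used in the talk.

```python
OPERATORS = ["+", "-", "*", "/", "^", "%", "<<", ">>"]


def operator_mutants(tokens):
    """Yield every token list that differs from `tokens` in exactly one operator."""
    for i, tok in enumerate(tokens):
        if tok in OPERATORS:
            for replacement in OPERATORS:
                if replacement != tok:
                    yield tokens[:i] + [replacement] + tokens[i + 1:]


original = "b ^ 2 - 4 * a * c".split()
mutants = [" ".join(m) for m in operator_mutants(original)]
print(len(mutants), "operator mutants for one line")
for m in mutants[:3]:
    print("d =", m + ";")
```

This counts operator replacements only. Constant replacements such as 0, 1, -1, MAX and MIN, also shown on the slide, add further mutants on top.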

  15. Δ = b² - 2²ac
    d = b^2 - (2^2) * a * c;

    d = b^2 - (2*2) * a * c;

    d = b^2 - (2+2) * a * c;
    Mutation Analysis: Equivalent Mutants
    Mutants
    Original
    Equivalent Mutant
    Normal Mutant
    Or: Do not trust low mutation scores
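The three variants above all compute the same constant, so no test can tell them apart. A small sketch of checking this by sampling inputs follows, assuming `^` on the slide means exponentiation (written `**` in Python). The function names are hypothetical. Agreement on a sample suggests equivalence but does not prove it.

```python
import itertools


def original(a, b, c):
    return b**2 - (2**2) * a * c


def mutant_times(a, b, c):
    return b**2 - (2 * 2) * a * c   # exponent operator replaced by *


def mutant_plus(a, b, c):
    return b**2 - (2 + 2) * a * c   # exponent operator replaced by +


# Every (a, b, c) triple with each value in -5..5.
inputs = list(itertools.product(range(-5, 6), repeat=3))
for mutant in (mutant_times, mutant_plus):
    survived = all(mutant(a, b, c) == original(a, b, c) for a, b, c in inputs)
    print(mutant.__name__, "indistinguishable on sample:", survived)
```

Mutants like these stay alive however good the tests are, which pulls the mutation score down for reasons unrelated to test quality.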

  16. Mutation Analysis: Redundant Mutants
    Mutants
    Original
    Equivalent Mutant
    Redundant Mutant
    d = b^2 - (-4) * a * c;

    d = b^2 + 4 * a * c;

    d = (-b)^2 - 4 * a * c;

    Δ = b² - 4ac
    Or: Do not trust high mutation scores
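To make the effect on the score concrete, the sketch below runs a single test against the three mutants above. It assumes `^` on the slide means exponentiation and uses a hypothetical one-test suite. The names `MUTANTS`, `tests` and `killed` are illustrative only.

```python
def original(a, b, c):
    return b**2 - 4 * a * c


MUTANTS = {
    "b^2 - (-4)*a*c": lambda a, b, c: b**2 - (-4) * a * c,
    "b^2 + 4*a*c": lambda a, b, c: b**2 + 4 * a * c,          # same function as the mutant above
    "(-b)^2 - 4*a*c": lambda a, b, c: (-b) ** 2 - 4 * a * c,  # equivalent to the original
}

# Hypothetical one-test suite: a single input with its expected discriminant.
tests = [((1, 3, 2), original(1, 3, 2))]


def killed(mutant):
    """A mutant is killed if any test sees a result different from the expected one."""
    return any(mutant(*args) != expected for args, expected in tests)


status = {name: killed(fn) for name, fn in MUTANTS.items()}
raw_score = sum(status.values()) / len(status)
print(status, f"raw mutation score = {raw_score:.2f}")
```

The one test kills the first two mutants, which are really one fault counted twice, while the third can never be killed. The raw score is therefore inflated by the redundant pair and capped by the equivalent mutant.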

  17. Mutation Analysis: Different Operators
    Δ = b² - 4ac
    d = b^2 + 4 * a * c;
    >>> dis.dis(d)
    2 0 LOAD_FAST 0 (b)
    3 LOAD_CONST 1 (2)
    6 LOAD_CONST 2 (4)
    9 LOAD_FAST 1 (a)
    12 BINARY_MULTIPLY
    13 LOAD_FAST 2 (c)
    16 BINARY_MULTIPLY
    17 BINARY_SUBTRACT
    18 BINARY_XOR
    19 RETURN_VALUE x
    [2016 Software Quality Journal]
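The bytecode above ends in a subtraction followed by BINARY_XOR, which reflects that `^` in Python is bitwise XOR and binds more loosely than `-` and `*`. A short sketch using the standard `ast` module makes the resulting parse visible. The expression string and the sample values are chosen here purely for illustration.

```python
import ast

tree = ast.parse("b ^ 2 - 4 * a * c", mode="eval").body

# The outermost operation is XOR; the subtraction sits in its right operand.
print(type(tree.op).__name__)
print(type(tree.right.op).__name__)

b, a, c = 3, 1, 2
# Python reads the expression as b ^ (2 - 4*a*c), not (b to the power 2) - 4*a*c.
assert (b ^ 2 - 4 * a * c) == (b ^ (2 - 4 * a * c))
print(b ^ 2 - 4 * a * c, "vs", b**2 - 4 * a * c)
```

The same source text therefore yields different operators, and different mutants, depending on the language in which it is parsed.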

  18. Does a high coverage test suite actually prevent bugs?

  19. FSE 2016
    We looked at the incidence of bug fixes on actual programs
    An uncovered line is twice as likely to have a bug fix
    as a line covered by any test case.

  20. That is,
    • Coverage is highly correlated with mutation score (92%)
    • Mutation score provides little extra information compared to coverage.
    • Coverage explains 75% of the variability in mutation score that remains after accounting for test suite size.
    • Mutation score can be unreliable.
    • Coverage thresholds actually help reduce incidence of bugs.

  21. Summary
    Beware of theoretical spherical cows
