Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World

GTAC 2016

Rahul Gopinath

October 18, 2016

Transcript

  1. Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World

    Rahul Gopinath, Iftekhar Ahmed
  2. When should we stop testing?

  3. How to evaluate test suite effectiveness?

  4. Previous research: Do not trust coverage (in theory) [GTAC’15, Inozemtseva]

  5. Factors affecting test suite quality: coverage, assertions

  6. Factors affecting test suite quality: coverage, assertions and, according to previous research, test suite size [GTAC’15, Inozemtseva]
  7. What is the adequate test suite size?

    But…
    • Is there a maximum number of test cases for a given program?
    • Are different test cases equivalent in strength?
    • How do we account for duplicate tests?
    • Test suite sizes are not comparable even for the same program.
  8. Can I use coverage to measure suite effectiveness?

  9. Mutation score is best predicted by statement coverage

    A fault in a statement has an 87% probability of being detected if a test covers it.
    M = 0.87 × S, R² = 0.94
    Results from 250 real-world programs (largest > 100 KLOC), on developer-written test suites.
    [Plot: dot size follows project size]
  10. Mutation score is best predicted by statement coverage

    A fault in a statement has a 61% probability of being detected if a test covers it.
    M = 0.61 × S, R² = 0.70
    Results from 250 real-world programs (largest > 100 KLOC), on Randoop-produced test suites.
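The two fits above have the form M = k × S with no intercept term. As a minimal sketch of how such a through-origin least-squares slope could be computed, assuming statement coverage and mutation score are given per project as fractions between 0 and 1: the function name `fit_through_origin` and the sample values are hypothetical and are not data from the study.

```python
def fit_through_origin(coverage, mutation_score):
    """Least-squares slope k for the model M = k * S (no intercept)."""
    sxy = sum(s * m for s, m in zip(coverage, mutation_score))
    sxx = sum(s * s for s in coverage)
    if sxx == 0:
        raise ValueError("coverage values are all zero")
    return sxy / sxx

# Made-up illustration values, NOT measurements from the talk.
coverage = [0.2, 0.5, 0.8, 1.0]
mutation = [0.1, 0.25, 0.4, 0.5]
k = fit_through_origin(coverage, mutation)
```

Read this way, the slope k is the estimated fraction of covered statements whose faults get detected, which is how the slides interpret 0.87 and 0.61.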
  11. Controlling for test suite size, coverage provides little extra information. Hence don't use coverage. [GTAC’15, Inozemtseva]

    But mutation score provides little extra information (<6%) compared to coverage. Why use mutation?
  12. Does coverage follow test suite size?

                                   GTAC’15 Inozemtseva          Our research
    # Programs                     5                            250
    Selection of programs          Ad hoc                       Systematic sample from GitHub
    Tool used                      CodeCover, PIT               Emma, Cobertura, CodeCover, PIT
    Test suites                    Random subsets of original   Organic & randomly generated (new results)
    Removal of influence of size   Ad hoc                       Statistical

    Our study is much larger, systematic (not ad hoc), and follows real-world usage.

    Our research (new results):
    M ~ TestSuiteSize                  12.84%
    M ~ log(TSize)                     51.26%
    residuals(M ~ log(TSize)) ~ S      75.25%

    Statement coverage can explain 75% of the variability in mutation score after eliminating the influence of test suite size.
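The last model above, residuals(M ~ log(TSize)) ~ S, is a two-step regression: first regress mutation score on the log of test suite size, then regress what is left over on statement coverage. The following is a minimal sketch of that procedure using plain ordinary least squares, not the authors' actual analysis script. The helper names (`ols`, `r_squared`, `coverage_r2_after_size`) are hypothetical, and all suite sizes are assumed to be positive.

```python
import math

def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Fraction of the variance of y explained by a linear fit on x."""
    a, b = ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def coverage_r2_after_size(mutation, suite_size, coverage):
    """R^2 of residuals(M ~ log(size)) regressed on statement coverage."""
    log_size = [math.log(t) for t in suite_size]
    a, b = ols(log_size, mutation)
    # What the size model could not explain.
    residuals = [m - (a + b * x) for m, x in zip(mutation, log_size)]
    return r_squared(coverage, residuals)
```

Taking residuals first removes whatever log suite size can explain, so the final R² reflects only the additional variability attributable to coverage.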
  13. Is mutation analysis better than coverage analysis?

  14. Mutation Analysis: High cost of analysis

    Δ = b² − 4ac

    d = b^2 + 4 * a * c;
    d = b^2 * 4 * a * c;
    d = b^2 / 4 * a * c;
    d = b^2 ^ 4 * a * c;
    d = b^2 % 4 * a * c;
    d = b^2 << 4 * a * c;
    d = b^2 >> 4 * a * c;
    d = b^2 * 4 + a * c;
    d = b^2 * 4 - a * c;
    d = b^2 * 4 / a * c;
    d = b^2 * 4 ^ a * c;
    d = b^2 * 4 % a * c;
    d = b^2 * 4 << a * c;
    d = b^2 * 4 >> a * c;
    d = b^2 * 4 * a + c;
    d = b^2 * 4 * a - c;
    d = b^2 * 4 * a / c;
    d = b^2 * 4 * a ^ c;
    d = b^2 * 4 * a % c;
    d = b^2 * 4 * a << c;
    d = b^2 * 4 * a >> c;
    d = b + 2 - 4 * a * c;
    d = b - 2 - 4 * a * c;
    d = b * 2 - 4 * a * c;
    d = b / 2 - 4 * a * c;
    d = b % 2 - 4 * a * c;
    d = b^0 - 4 * a * c;
    d = b^1 - 4 * a * c;
    d = b^-1 - 4 * a * c;
    d = b^MAX - 4 * a * c;
    d = b^MIN - 4 * a * c;
    d = b^2 - 0 * a * c;
    d = b^2 - 1 * a * c;
    d = b^2 - (-1) * a * c;
    d = b^2 - MAX * a * c;
    d = b^2 - MIN * a * c;
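To make the cost concrete, here is a minimal sketch of operator-replacement mutation on a tokenized expression. It assumes a single uniform operator set and works on whitespace-separated tokens, so it is a simplification and does not reproduce the slide's exact mutant list, which also mutates constants. `OPERATORS` and `operator_mutants` are hypothetical names.

```python
OPERATORS = ["+", "-", "*", "/", "^", "%", "<<", ">>"]

def operator_mutants(tokens):
    """Return every variant of `tokens` with exactly one binary operator replaced."""
    mutants = []
    for i, tok in enumerate(tokens):
        if tok not in OPERATORS:
            continue
        for op in OPERATORS:
            if op != tok:
                mutants.append(" ".join(tokens[:i] + [op] + tokens[i + 1:]))
    return mutants

# One short expression with four operators already yields 4 * 7 mutants.
tokens = "b ^ 2 - 4 * a * c".split()
mutants = operator_mutants(tokens)
```

Each mutant has to be run against the test suite, so the number of suite executions grows with the number of mutable locations multiplied by the number of replacement operators.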
  15. Mutation Analysis: Equivalent Mutants

    Δ = b² − 2²ac

    d = b^2 - (2^2) * a * c;
    d = b^2 - (2*2) * a * c;
    d = b^2 - (2+2) * a * c;

    [Figure legend: Mutants; Original; Equivalent Mutant; Normal Mutant]

    Or: Do not trust low mutation scores
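A minimal sketch of why such mutants can never be killed, reading `^` on the slide as exponentiation (Python's `**`). It assumes the first statement is the original and the other two are its mutants. The names `original`, `mutants`, `killed` and the input grid are hypothetical.

```python
def original(a, b, c):
    return b**2 - (2**2) * a * c

# Mutants from the slide: 2^2 replaced by 2*2 and by 2+2. All three equal 4.
mutants = [
    lambda a, b, c: b**2 - (2 * 2) * a * c,
    lambda a, b, c: b**2 - (2 + 2) * a * c,
]

def killed(mutant, inputs):
    """A mutant is killed if some input makes it differ from the original."""
    return any(mutant(*args) != original(*args) for args in inputs)

inputs = [(a, b, c) for a in range(-3, 4) for b in range(-3, 4) for c in range(-3, 4)]
survivors = [m for m in mutants if not killed(m, inputs)]
```

Both mutants survive every input, so they lower the mutation score without indicating any weakness in the test suite.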
  16. Mutation Analysis: Redundant Mutants

    Δ = b² − 4ac

    d = b^2 - (-4) * a * c;
    d = b^2 + 4 * a * c;
    d = (-b)^2 - 4 * a * c;

    [Figure legend: Mutants; Original; Equivalent Mutant; Redundant Mutant]

    Or: Do not trust high mutation scores
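One way to see redundancy is to compare the set of tests that kill each mutant. The sketch below again reads `^` as exponentiation and uses a hypothetical grid of integer inputs as the "test suite"; `kill_set` and the dictionary keys are illustrative names only.

```python
def original(a, b, c):
    return b**2 - 4 * a * c

mutants = {
    "b^2 - (-4)*a*c": lambda a, b, c: b**2 - (-4) * a * c,
    "b^2 + 4*a*c": lambda a, b, c: b**2 + 4 * a * c,
    "(-b)^2 - 4*a*c": lambda a, b, c: (-b) ** 2 - 4 * a * c,
}

tests = [(a, b, c) for a in range(-2, 3) for b in range(-2, 3) for c in range(-2, 3)]

def kill_set(mutant):
    """Set of test inputs on which the mutant's result differs from the original."""
    return frozenset(t for t in tests if mutant(*t) != original(*t))

kill_sets = {name: kill_set(fn) for name, fn in mutants.items()}
```

The first two mutants have identical kill sets, so any test that kills one also kills the other and the pair is counted twice in the score. The third has an empty kill set on these inputs, consistent with (-b)² = b².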
  17. Mutation Analysis: Different Operators

    Δ = b² − 4ac

    d = b^2 + 4 * a * c;

    >>> dis.dis(d)
      2           0 LOAD_FAST                0 (b)
                  3 LOAD_CONST               1 (2)
                  6 LOAD_CONST               2 (4)
                  9 LOAD_FAST                1 (a)
                 12 BINARY_MULTIPLY
                 13 LOAD_FAST                2 (c)
                 16 BINARY_MULTIPLY
                 17 BINARY_SUBTRACT
                 18 BINARY_XOR
                 19 RETURN_VALUE

    [2016 Software Quality Journal]
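The disassembly ends with BINARY_SUBTRACT followed by BINARY_XOR, which suggests the slide's point is that `^` in Python is bitwise XOR and binds more loosely than `-`. A small sketch using the standard `ast` module, whose node names are stable across Python 3 versions (unlike opcode names), shows how Python groups the expression. The variable names are illustrative.

```python
import ast

# Parse the expression as Python sees it and inspect the outermost operation.
tree = ast.parse("b ^ 2 - 4 * a * c", mode="eval")
top = tree.body

top_op = type(top.op).__name__          # operator applied last
right_op = type(top.right.op).__name__  # operator of the right-hand operand

# With Python's real power operator, subtraction is applied last instead.
power_top = type(ast.parse("b ** 2 - 4 * a * c", mode="eval").body.op).__name__
```

Here `top_op` is `BitXor` and `right_op` is `Sub`, so Python evaluates `b ^ (2 - 4 * a * c)`. The same source text, and therefore the same mutation operator, can mean different things in different languages.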
  18. Does a high coverage test suite actually prevent bugs?

  19. FSE 2016: We looked at the incidence of bug fixes on actual programs.

    An uncovered line is twice as likely to have a bug fix as a line covered by any test case.
  20. That is,

    • Coverage is highly correlated with mutation score (92%).
    • Mutation score provides little extra information compared to coverage.
    • Coverage provides 75% more information than just test suite size.
    • Mutation score can be unreliable.
    • Coverage thresholds actually help reduce incidence of bugs.
  21. Summary: Beware of theoretical spherical cows