Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World

GTAC 2016

Rahul Gopinath

October 18, 2016

Transcript

  1. Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World
    Rahul Gopinath, Iftekhar Ahmed
  2. What is the adequate test suite size?
    But…
    • Is there a maximum number of test cases for a given program?
    • Are different test cases equivalent in strength?
    • How do we account for duplicate tests?
    • Test suite sizes are not comparable even for the same program.
  3. Mutation score is best predicted by statement coverage
    A fault in a statement has an 87% probability of being detected if a test covers it.
    M = 0.87 × S (R² = 0.94)
    [Figure: dot size follows project size]
    Results from 250 real-world programs (largest > 100 KLOC), on developer-written test suites.
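The fit on this slide can be read as a rule of thumb. A minimal sketch, assuming statement coverage S and mutation score M are both expressed as fractions in [0, 1]; the function name and the 80% coverage figure are illustrative, and only the 0.87 slope comes from the slide.

```python
def predicted_mutation_score(statement_coverage, slope=0.87):
    """Estimate mutation score M from statement coverage S using M = slope * S.

    The default slope of 0.87 is the fit reported for developer-written suites.
    """
    if not 0.0 <= statement_coverage <= 1.0:
        raise ValueError("statement_coverage must be a fraction in [0, 1]")
    return slope * statement_coverage


# Hypothetical project with 80% statement coverage.
print(round(predicted_mutation_score(0.80), 3))
```

Passing `slope=0.61` gives the corresponding estimate for the Randoop-produced suites reported on the next slide.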
  4. Mutation score is best predicted by statement coverage
    A fault in a statement has a 61% probability of being detected if a test covers it.
    M = 0.61 × S (R² = 0.70)
    Results from 250 real-world programs (largest > 100 KLOC), on Randoop-produced test suites.
  5. Controlling for test suite size, coverage provides little extra information. Hence don't use coverage. (GTAC'15, Inozemtseva)
    But mutation score provides little extra information (<6%) compared to coverage. Why use mutation?
  6. Does coverage follow test suite size?
    GTAC'15 (Inozemtseva) vs. our research:
    • Number of programs: 5 vs. 250
    • Selection of programs: ad hoc vs. systematic sample from GitHub
    • Tools used: CodeCover, PIT vs. Emma, Cobertura, CodeCover, PIT
    • Test suites: random subsets of original vs. organic and randomly generated (new results)
    • Removal of influence of size: ad hoc vs. statistical
    Our study is much larger, systematic (not ad hoc), and follows real-world usage.
    Our research (new results):
    • M ~ TestSuiteSize: 12.84%
    • M ~ log(TSize): 51.26%
    • residuals(M ~ log(TSize)) ~ S: 75.25%
    Statement coverage can explain 75% of the variability in mutation score after eliminating the influence of test suite size.
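The model `residuals(M ~ log(TSize)) ~ S` can be sketched as two ordinary least-squares fits: first regress mutation score on the log of test suite size, then regress what is left over on statement coverage. This is only an illustration of the idea in plain Python; the helper names and the six data points are made up and are not taken from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """Fraction of the variance in ys explained by a linear fit on xs."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def coverage_r2_after_size(mutation, coverage, suite_size):
    """R^2 of residuals(M ~ log(TSize)) ~ S."""
    log_size = [math.log(t) for t in suite_size]
    intercept, slope = fit_line(log_size, mutation)
    residuals = [m - (intercept + slope * x) for m, x in zip(mutation, log_size)]
    return r_squared(coverage, residuals)


# Made-up data for six hypothetical projects (not from the study).
suite_size = [10, 40, 120, 300, 900, 2500]
coverage = [0.35, 0.20, 0.70, 0.55, 0.90, 0.60]
mutation = [0.30, 0.18, 0.62, 0.47, 0.80, 0.52]

print(r_squared(suite_size, mutation))                          # M ~ TestSuiteSize
print(r_squared([math.log(t) for t in suite_size], mutation))   # M ~ log(TSize)
print(coverage_r2_after_size(mutation, coverage, suite_size))   # residuals ~ S
```

Fitting on the residuals rather than on M directly is what removes the part of mutation score already explained by suite size.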
  7. Mutation Analysis: High cost of analysis
    Δ = b² − 4ac
    d = b^2 + 4 * a * c;
    d = b^2 * 4 * a * c;
    d = b^2 / 4 * a * c;
    d = b^2 ^ 4 * a * c;
    d = b^2 % 4 * a * c;
    d = b^2 << 4 * a * c;
    d = b^2 >> 4 * a * c;
    d = b^2 * 4 + a * c;
    d = b^2 * 4 - a * c;
    d = b^2 * 4 / a * c;
    d = b^2 * 4 ^ a * c;
    d = b^2 * 4 % a * c;
    d = b^2 * 4 << a * c;
    d = b^2 * 4 >> a * c;
    d = b^2 * 4 * a + c;
    d = b^2 * 4 * a - c;
    d = b^2 * 4 * a / c;
    d = b^2 * 4 * a ^ c;
    d = b^2 * 4 * a % c;
    d = b^2 * 4 * a << c;
    d = b^2 * 4 * a >> c;
    d = b + 2 - 4 * a * c;
    d = b - 2 - 4 * a * c;
    d = b * 2 - 4 * a * c;
    d = b / 2 - 4 * a * c;
    d = b % 2 - 4 * a * c;
    d = b^0 - 4 * a * c;
    d = b^1 - 4 * a * c;
    d = b^-1 - 4 * a * c;
    d = b^MAX - 4 * a * c;
    d = b^MIN - 4 * a * c;
    d = b^2 - 0 * a * c;
    d = b^2 - 1 * a * c;
    d = b^2 - (-1) * a * c;
    d = b^2 - MAX * a * c;
    d = b^2 - MIN * a * c;
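The cost comes from the number of variants: even a one-line expression yields dozens of mutants, and each one needs a test-suite run. A rough sketch of operator-replacement mutation over a token string, assuming a fixed set of eight binary operators; the operator set and the `operator_mutants` helper are illustrative and do not reproduce the slide's exact list (which also mutates constants).

```python
import re

# Binary operators swapped in for one another. This is an assumed set,
# not the exact set used by any particular mutation tool.
OPERATORS = ["+", "-", "*", "/", "^", "%", "<<", ">>"]
TOKEN = re.compile(r"<<|>>|[-+*/^%]")


def operator_mutants(expression):
    """Return every variant of `expression` with exactly one operator replaced."""
    mutants = []
    for match in TOKEN.finditer(expression):
        for replacement in OPERATORS:
            if replacement != match.group():
                mutants.append(
                    expression[:match.start()] + replacement + expression[match.end():]
                )
    return mutants


mutants = operator_mutants("b ^ 2 - 4 * a * c")
print(len(mutants))  # 4 operator sites x 7 replacements = 28
for mutant in mutants[:3]:
    print("d = " + mutant + ";")
```

The sketch treats every `-` as a binary operator, so it would need a real parser to handle expressions containing unary minus.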
  8. Mutation Analysis: Equivalent Mutants
    Δ = b² − 2²ac
    d = b^2 - (2^2) * a * c;
    d = b^2 - (2*2) * a * c;
    d = b^2 - (2+2) * a * c;
    [Figure: mutants; legend: Original, Equivalent Mutant, Normal Mutant]
    Or: Do not trust low mutation scores
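Equivalent mutants cannot be killed by any test, so they lower the mutation score no matter how good the suite is. A small sketch of that effect, assuming `^` on the slide denotes exponentiation (written `**` in Python); the function names and the sampled input grid are illustrative.

```python
def original(a, b, c):
    return b ** 2 - (2 ** 2) * a * c


def mutant_times(a, b, c):
    # 2 ** 2 replaced by 2 * 2: still 4.
    return b ** 2 - (2 * 2) * a * c


def mutant_plus(a, b, c):
    # 2 ** 2 replaced by 2 + 2: still 4.
    return b ** 2 - (2 + 2) * a * c


def is_killed(mutant, inputs):
    """A mutant is killed if some input makes it differ from the original."""
    return any(mutant(*args) != original(*args) for args in inputs)


inputs = [(a, b, c) for a in range(-3, 4) for b in range(-3, 4) for c in range(-3, 4)]
print(is_killed(mutant_times, inputs))  # False
print(is_killed(mutant_plus, inputs))   # False
```

Both mutants survive every input, yet a naive score would count them as undetected faults.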
  9. Mutation Analysis: Redundant Mutants
    Δ = b² − 4ac
    d = b^2 - (-4) * a * c;
    d = b^2 + 4 * a * c;
    d = (-b)^2 - 4 * a * c;
    [Figure: mutants; legend: Original, Equivalent Mutant, Redundant Mutant]
    Or: Do not trust high mutation scores
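Redundant mutants inflate the score in the other direction: a test that kills one of them kills its duplicates too. One way to see this is to group mutants by their outputs on a set of sampled inputs. The sketch below assumes `^` means exponentiation (`**` in Python); the grouping helper and the input grid are illustrative, and agreement on sampled inputs is evidence of redundancy, not proof.

```python
from collections import defaultdict

# The three mutants listed on the slide, with ^ read as exponentiation (**).
MUTANTS = {
    "b^2 - (-4)*a*c": lambda a, b, c: b ** 2 - (-4) * a * c,
    "b^2 + 4*a*c": lambda a, b, c: b ** 2 + 4 * a * c,
    "(-b)^2 - 4*a*c": lambda a, b, c: (-b) ** 2 - 4 * a * c,
}

INPUTS = [(a, b, c) for a in range(-2, 3) for b in range(-2, 3) for c in range(-2, 3)]


def group_by_behaviour(functions, inputs):
    """Group names whose functions return identical outputs on all sampled inputs."""
    groups = defaultdict(list)
    for name, function in functions.items():
        signature = tuple(function(*args) for args in inputs)
        groups[signature].append(name)
    return list(groups.values())


for group in group_by_behaviour(MUTANTS, INPUTS):
    print(group)
```

The first two mutants land in the same group, so counting them separately rewards the same detection twice.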
  10. Mutation Analysis: Different Operators
    Δ = b² − 4ac
    d = b^2 + 4 * a * c;
    >>> dis.dis(d)
      2           0 LOAD_FAST                0 (b)
                  3 LOAD_CONST               1 (2)
                  6 LOAD_CONST               2 (4)
                  9 LOAD_FAST                1 (a)
                 12 BINARY_MULTIPLY
                 13 LOAD_FAST                2 (c)
                 16 BINARY_MULTIPLY
                 17 BINARY_SUBTRACT
                 18 BINARY_XOR
                 19 RETURN_VALUE
    [2016 Software Quality Journal]
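The bytecode listing applies BINARY_SUBTRACT before BINARY_XOR, which matches Python reading `b ^ 2 - 4 * a * c` as `b XOR (2 - 4*a*c)`. A short sketch for reproducing such a listing, assuming the function takes its parameters in the order `b, a, c` (as the `LOAD_FAST` indices suggest); exact opcode names depend on the Python version, so none are asserted here.

```python
import dis


def d(b, a, c):
    # In Python, ^ is bitwise XOR and binds more loosely than - and *.
    return b ^ 2 - 4 * a * c


# Opcode names differ between Python versions (for example, newer releases
# emit a generic BINARY_OP), so the listing is printed rather than assumed.
for instruction in dis.get_instructions(d):
    print(instruction.opname, instruction.argrepr)

print(d(5, 1, 1) == (5 ^ (2 - 4 * 1 * 1)))  # True
```

The set of mutation operators available therefore depends on whether mutation is applied to source text or to compiled bytecode.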
  11. FSE 2016
    We looked at the incidence of bug fixes on actual programs.
    An uncovered line is twice as likely to have a bug fix as a line covered by any test case.
  12. That is,
    • Coverage is highly correlated with mutation score (92%).
    • Mutation score provides little extra information compared to coverage.
    • Coverage provides 75% more information than just test suite size.
    • Mutation score can be unreliable.
    • Coverage thresholds actually help reduce the incidence of bugs.