Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World

Slide 1

Slide 1 text

Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World
Rahul Gopinath, Iftekhar Ahmed

Slide 2

Slide 2 text

When should we stop testing?

Slide 3

Slide 3 text

How to evaluate test suite effectiveness?

Slide 4

Slide 4 text

Previous research: Do not trust coverage (in theory)
GTAC’15, Inozemtseva

Slide 5

Slide 5 text

Factors affecting test suite quality
• Coverage
• Assertions

Slide 6

Slide 6 text

Test suite quality
• Coverage
• Assertions
• Test suite size (according to previous research: GTAC’15, Inozemtseva)

Slide 7

Slide 7 text

What is the adequate test suite size?
But…
• Is there a maximum number of test cases for a given program?
• Are different test cases equivalent in strength?
• How do we account for duplicate tests?
• Test suite sizes are not comparable even for the same program.

Slide 8

Slide 8 text

Can I use coverage to measure suite effectiveness?

Slide 9

Slide 9 text

Mutation score is best predicted by statement coverage
A fault in a statement has an 87% probability of being detected if a test covers it.
M = 0.87 × S, R² = 0.94 (M: mutation score, S: statement coverage)
Plot: the size of each dot follows the size of the project.
Results from 250 real-world programs (largest > 100 KLOC), on developer-written test suites
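
A relation of this form can be estimated with a least-squares line through the origin. The sketch below is only an illustration: the data points are invented, the function name `fit_through_origin` is hypothetical, and the no-intercept model (with the uncentered R² that usually accompanies it) is an assumption suggested by the form of the equation, not a description of the analysis behind the slide.

```python
def fit_through_origin(coverage, mutation_score):
    """Least-squares slope k for M = k * S (no intercept) and its R^2."""
    sxy = sum(s * m for s, m in zip(coverage, mutation_score))
    sxx = sum(s * s for s in coverage)
    k = sxy / sxx
    ss_res = sum((m - k * s) ** 2 for s, m in zip(coverage, mutation_score))
    # Uncentered total sum of squares, the usual choice for no-intercept models.
    ss_tot = sum(m * m for m in mutation_score)
    return k, 1 - ss_res / ss_tot


# Hypothetical (statement coverage, mutation score) pairs, one per project.
coverage = [0.20, 0.45, 0.60, 0.80, 0.95]
mutation = [0.15, 0.40, 0.55, 0.68, 0.84]

k, r2 = fit_through_origin(coverage, mutation)
print(f"M = {k:.2f} x S, R^2 = {r2:.2f}")
```

The slope plays the role of the detection probability on the slide: the fraction of covered faults that the suite is expected to catch.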

Slide 10

Slide 10 text

Mutation score is best predicted by statement coverage
A fault in a statement has a 61% probability of being detected if a test covers it.
M = 0.61 × S, R² = 0.70 (M: mutation score, S: statement coverage)
Results from 250 real-world programs (largest > 100 KLOC), on Randoop-produced test suites

Slide 11

Slide 11 text

GTAC’15, Inozemtseva: Controlling for test suite size, coverage provides little extra information. Hence, don't use coverage.
But: Mutation score provides little extra information (<6%) compared to coverage. Why use mutation?

Slide 12

Slide 12 text

Does coverage follow test suite size?

GTAC’15 Inozemtseva vs. Our Research
• # Programs: 5 vs. 250
• Selection of programs: Ad hoc vs. Systematic sample from GitHub
• Tool used: CodeCover, PIT vs. Emma, Cobertura, CodeCover, PIT
• Test suites: Random subsets of original vs. Organic & Randomly generated (New results)
• Removal of influence of size: Ad hoc vs. Statistical

Our study is much larger, systematic (not ad hoc), and follows real-world usage.

Our Research (New results)
• M ~ TestsuiteSize: 12.84%
• M ~ log(TSize): 51.26%
• residuals(M ~ log(TSize)) ~ S: 75.25%

Statement coverage can explain 75% of the variability in mutation score after eliminating the influence of test suite size.
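
The two-stage model residuals(M ~ log(TSize)) ~ S can be sketched as follows: first regress mutation score on the logarithm of test suite size, then regress what is left over on statement coverage. This is a minimal sketch with invented data. The helper names `linear_fit` and `coverage_after_size` are hypothetical, and reading the percentages on the slide as R² values is an assumption.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


def coverage_after_size(sizes, coverage, mutation):
    """R^2 of M ~ log(size), then R^2 of the residuals regressed on coverage."""
    log_sizes = [math.log(t) for t in sizes]
    a, b, r2_size = linear_fit(log_sizes, mutation)
    residuals = [m - (a + b * x) for x, m in zip(log_sizes, mutation)]
    _, _, r2_cov = linear_fit(coverage, residuals)
    return r2_size, r2_cov


# Hypothetical per-project data: number of tests, statement coverage, mutation score.
sizes = [10, 25, 60, 150, 400, 900]
coverage = [0.30, 0.70, 0.45, 0.85, 0.60, 0.90]
mutation = [0.22, 0.58, 0.35, 0.72, 0.50, 0.80]

r2_size, r2_cov = coverage_after_size(sizes, coverage, mutation)
print(f"M ~ log(size): R^2 = {r2_size:.2f}")
print(f"residuals ~ S: R^2 = {r2_cov:.2f}")
```

A high second R² means coverage still carries information about mutation score once the effect of suite size has been removed.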

Slide 13

Slide 13 text

Is mutation analysis better than coverage analysis?

Slide 14

Slide 14 text

Mutation Analysis: High cost of analysis
Δ = b² − 4ac
d = b^2 + 4 * a * c;
d = b^2 * 4 * a * c;
d = b^2 / 4 * a * c;
d = b^2 ^ 4 * a * c;
d = b^2 % 4 * a * c;
d = b^2 << 4 * a * c;
d = b^2 >> 4 * a * c;
d = b^2 * 4 + a * c;
d = b^2 * 4 - a * c;
d = b^2 * 4 / a * c;
d = b^2 * 4 ^ a * c;
d = b^2 * 4 % a * c;
d = b^2 * 4 << a * c;
d = b^2 * 4 >> a * c;
d = b^2 * 4 * a + c;
d = b^2 * 4 * a - c;
d = b^2 * 4 * a / c;
d = b^2 * 4 * a ^ c;
d = b^2 * 4 * a % c;
d = b^2 * 4 * a << c;
d = b^2 * 4 * a >> c;
d = b + 2 - 4 * a * c;
d = b - 2 - 4 * a * c;
d = b * 2 - 4 * a * c;
d = b / 2 - 4 * a * c;
d = b % 2 - 4 * a * c;
d = b^0 - 4 * a * c;
d = b^1 - 4 * a * c;
d = b^-1 - 4 * a * c;
d = b^MAX - 4 * a * c;
d = b^MIN - 4 * a * c;
d = b^2 - 0 * a * c;
d = b^2 - 1 * a * c;
d = b^2 - (-1) * a * c;
d = b^2 - MAX * a * c;
d = b^2 - MIN * a * c;
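
The cost comes from the number of mutants: every operator can be swapped for every other one, and each resulting mutant needs its own test suite run. The sketch below enumerates only the operator-replacement mutants of the expression, treating it as a list of tokens in the slide's notation. The function name `operator_mutants` is hypothetical, and real mutation tools apply further operators (such as the constant replacements listed above) that are not modeled here.

```python
# Operator set taken from the mutants listed on the slide.
OPERATORS = ["+", "-", "*", "/", "^", "%", "<<", ">>"]


def operator_mutants(tokens):
    """Return every expression obtained by swapping exactly one operator."""
    mutants = []
    for i, tok in enumerate(tokens):
        if tok in OPERATORS:
            for op in OPERATORS:
                if op != tok:
                    mutants.append(" ".join(tokens[:i] + [op] + tokens[i + 1:]))
    return mutants


# The expression is kept as text; it is never evaluated here.
original = "b ^ 2 - 4 * a * c".split()
mutants = operator_mutants(original)

print(len(mutants))  # 4 operators x 7 replacements each = 28
for m in mutants[:3]:
    print("d =", m)
```

Even this one-line expression yields 28 mutants from a single mutation operator, which is why the analysis becomes expensive on real programs.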

Slide 15

Slide 15 text

Mutation Analysis: Equivalent Mutants
Δ = b² − 2²ac
d = b^2 - (2^2) * a * c;
d = b^2 - (2*2) * a * c;
d = b^2 - (2+2) * a * c;
Legend: Original, Equivalent Mutant, Normal Mutant
Or: Do not trust low mutation scores
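
The effect of equivalent mutants on the score can be reproduced in a few lines. The sketch below is written in Python, so `**` stands in for the slide's `^`. The function names, the test inputs, and the extra killable mutant `mutant_sub` are hypothetical additions for illustration.

```python
def original(a, b, c):
    return b ** 2 - (2 ** 2) * a * c


def mutant_mul(a, b, c):  # 2 ** 2 -> 2 * 2, still 4: equivalent
    return b ** 2 - (2 * 2) * a * c


def mutant_add(a, b, c):  # 2 ** 2 -> 2 + 2, still 4: equivalent
    return b ** 2 - (2 + 2) * a * c


def mutant_sub(a, b, c):  # 2 ** 2 -> 2 - 2, now 0: a normal, killable mutant
    return b ** 2 - (2 - 2) * a * c


def is_killed(mutant, inputs):
    """A mutant is killed if some input makes it disagree with the original."""
    return any(mutant(*args) != original(*args) for args in inputs)


inputs = [(1, 2, 3), (1, 5, 6), (2, 0, -1)]
mutants = [mutant_mul, mutant_add, mutant_sub]
killed = [m.__name__ for m in mutants if is_killed(m, inputs)]
score = len(killed) / len(mutants)
print(killed, f"mutation score = {score:.2f}")
```

Only `mutant_sub` can be killed, so the score stays at one third no matter how many inputs are added. A low score may therefore reflect equivalent mutants and not a weak test suite.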

Slide 16

Slide 16 text

Mutation Analysis: Redundant Mutants
Δ = b² − 4ac
d = b^2 - (-4) * a * c;
d = b^2 + 4 * a * c;
d = (-b)^2 - 4 * a * c;
Legend: Original, Equivalent Mutant, Redundant Mutant
Or: Do not trust high mutation scores

Slide 17

Slide 17 text

Mutation Analysis: Different Operators
Δ = b² − 4ac
d = b^2 + 4 * a * c;
>>> dis.dis(d)
  2           0 LOAD_FAST                0 (b)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (4)
              9 LOAD_FAST                1 (a)
             12 BINARY_MULTIPLY
             13 LOAD_FAST                2 (c)
             16 BINARY_MULTIPLY
             17 BINARY_SUBTRACT
             18 BINARY_XOR
             19 RETURN_VALUE
[2016 Software Quality Journal]
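
A listing like this one can be produced with Python's standard `dis` module. The sketch below assumes the function was defined as `d(b, a, c)`, which matches the argument indices in the listing. The exact output depends on the CPython version: newer releases, for instance, report a generic `BINARY_OP` instead of separate arithmetic opcodes, so no expected output is shown.

```python
import dis


def d(b, a, c):
    # In Python, ^ is bitwise XOR (not exponentiation) and binds more
    # loosely than -, so this computes b ^ (2 - 4 * a * c).
    return b ^ 2 - 4 * a * c


# Print the bytecode; a bytecode-level mutation tool sees these
# instructions, not the source-level operators.
dis.dis(d)

# The parse is visible in the result as well.
print(d(5, 1, 1) == 5 ^ (2 - 4 * 1 * 1))
```

The instruction order shows why tools working on source and on bytecode can disagree about which operators exist to mutate.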

Slide 18

Slide 18 text

Does a high coverage test suite actually prevent bugs?

Slide 19

Slide 19 text

FSE 2016
We looked at the incidence of bug fixes in actual programs.
An uncovered line is twice as likely to have a bug fix as a line covered by any test case.

Slide 20

Slide 20 text

That is,
• Coverage is highly correlated with mutation score (92%).
• Mutation score provides little extra information compared to coverage.
• Coverage provides 75% more information than just test suite size.
• Mutation score can be unreliable.
• Coverage thresholds actually help reduce incidence of bugs.

Slide 21

Slide 21 text

Summary
Beware of theoretical spherical cows