Tests & Types
March 1, 2017

Tests
• Verbose
• Typically easy to understand
• Generally aim to cover the complete specification
• Can have failing test cases

Types
• Terse
• Can be harder to understand
• Typically cover a much smaller set of properties than the complete specification
• All types, if specified, should be correct

How do we choose?

• Both test suites and types are used to prevent bugs
• Both tests and types can be improved with more resources
• We have a finite budget
• We need a way to compare

We know how to evaluate tests

Code Coverage
• Adequate coverage says something about correctness

    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)

    assert factorial(0) == 1
    assert factorial(3) == 6

Doesn't translate well to types
• A program usually has a complete type specification

    def factorial(n: int) -> int:
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)

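As a rough illustration of what statement coverage measures, the sketch below records which lines of `factorial` run under the two test inputs from the slide. It relies on the standard `sys.settrace` hook; `executed_offsets` is a hypothetical helper written for this example, not part of any coverage tool, and a real project would normally use a dedicated tool such as coverage.py.

```python
import sys


def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


def executed_offsets(func, *calls):
    """Run func once per argument tuple and return the executed line offsets.

    Offsets are relative to the `def` line (offset 0), so offset 1 is the
    first line of the body. Hypothetical helper, for illustration only.
    """
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # no line tracing for any other function
        if event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(previous)
    return hits


# The two test inputs from the slide: factorial(0) and factorial(3).
base_only = executed_offsets(factorial, (0,))
both_tests = executed_offsets(factorial, (0,), (3,))
print(sorted(base_only), sorted(both_tests))
```

With only the base-case input, the recursive `return` line (offset 4) never runs; adding `factorial(3)` executes it. There is no comparable "executed or not" measure for a type annotation, which is the gap the slide points at.
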
Subjects

• twitter-graph (own project)
  • 83 LOC
  • 10/10 functions type annotated
  • 25 test cases with 99% statement coverage
• w3lib
  • 369 LOC
  • 34/42 functions type annotated
  • 97 test cases with 94.7% statement coverage

In both, type annotations were added after the test suite.
Adding type annotations did not reveal any previously undetected faults.

Results Summary

• The mutants killed by types are a strict subset of those killed by the test suite
• No difference in mutation scores between stronger and weaker annotations in either project

                 Killed by types   Killed by tests   Total mutants
twitter-graph                 62               142             171
w3lib                         94               545             677

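The counts in the table can be turned into mutation scores (killed mutants divided by total mutants). The short sketch below does that arithmetic, assuming the "Type" and "Test" columns are kill counts and "Total" is the number of generated mutants; the dictionary layout and the name `mutation_score` are illustrative choices.

```python
# Counts copied from the results table; the column reading is an assumption.
results = {
    "twitter-graph": {"type": 62, "test": 142, "total": 171},
    "w3lib": {"type": 94, "test": 545, "total": 677},
}


def mutation_score(killed, total):
    """Percentage of mutants killed."""
    return 100.0 * killed / total


for project, r in results.items():
    type_score = mutation_score(r["type"], r["total"])
    test_score = mutation_score(r["test"], r["total"])
    print(f"{project}: types {type_score:.1f}%, tests {test_score:.1f}%")
# twitter-graph: types 36.3%, tests 83.0%
# w3lib: types 13.9%, tests 80.5%
```
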
Discussion

Mutants Killed:
• Type assertions ⊂ Test suite
  • Type errors ⊂ Semantic errors
  • Strength of type annotations in MyPy for Python 3.6
• Weaker type assertions ~ Stronger type assertions
  • The mutants are targeted at tests, so they are insufficient to detect differences between type assertions

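A small sketch of why kills by types can be a strict subset of kills by tests: one hand-written mutant stays well typed and is only caught by the tests, while another breaks the declared return type and is caught by both. The mutants and the helper names are hypothetical, and `killed_by_types` is only a runtime stand-in for a static checker such as MyPy (it compares returned values against the annotation rather than analysing the source).

```python
from typing import Callable, get_type_hints


def factorial(n: int) -> int:
    if n == 0:
        return 1
    return n * factorial(n - 1)


def mutant_arith(n: int) -> int:
    # Well-typed mutant: '*' replaced by '+'.
    if n == 0:
        return 1
    return n + mutant_arith(n - 1)


def mutant_type(n: int) -> int:
    # Ill-typed mutant: the base case returns a str instead of an int.
    # A static checker would be expected to reject this return statement.
    if n == 0:
        return "1"
    return n * mutant_type(n - 1)


def killed_by_tests(f: Callable[[int], int]) -> bool:
    """True if the two assertions from the earlier slide fail or raise."""
    try:
        return not (f(0) == 1 and f(3) == 6)
    except Exception:
        return True


def killed_by_types(f: Callable[[int], int]) -> bool:
    """Runtime stand-in for a type checker: compare results to the annotation."""
    expected = get_type_hints(f)["return"]
    return not all(isinstance(f(n), expected) for n in (0, 3))


mutants = {"arith": mutant_arith, "type": mutant_type}
type_kills = {name for name, f in mutants.items() if killed_by_types(f)}
test_kills = {name for name, f in mutants.items() if killed_by_tests(f)}
print(sorted(type_kills), sorted(test_kills))
# ['type'] ['arith', 'type']
```
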
Failure to distinguish type assertions

• Our mutation tools usually mutate the AST
• Many simple errors that programmers make have a larger impact on the AST
• A single-token mutation of the AST fails to reproduce them
  • In Python, ** binds tighter than %, while + binds looser, so changing one source token restructures the tree

    fn('Ex %s' % 1 ** 2)    (fn (% 'Ex %s' (** 1 2)))
    fn('Ex %s' % 1 + 2)     (fn (+ (% 'Ex %s' 1) 2))

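The difference between a source-level and an AST-level mutation can be checked with Python's standard `ast` module. The sketch below, assuming Python 3.8 or later, renders parsed expressions as S-expressions and compares a one-token change in the source with a one-node change in the tree. It uses `**` because that operator binds tighter than `%` in Python, whereas `*` and `%` share a precedence level; `to_sexpr` and `PowToAdd` are hypothetical names, not part of any mutation tool.

```python
import ast

# Operator node types mapped to the symbols used in the S-expressions.
_SYMBOLS = {ast.Mod: "%", ast.Add: "+", ast.Mult: "*", ast.Pow: "**"}


def to_sexpr(node):
    """Render a small subset of expression nodes as an S-expression string."""
    if isinstance(node, ast.Expression):
        return to_sexpr(node.body)
    if isinstance(node, ast.Call):
        parts = [to_sexpr(node.func)] + [to_sexpr(a) for a in node.args]
        return "(" + " ".join(parts) + ")"
    if isinstance(node, ast.BinOp):
        symbol = _SYMBOLS[type(node.op)]
        return f"({symbol} {to_sexpr(node.left)} {to_sexpr(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise TypeError(f"unsupported node: {type(node).__name__}")


class PowToAdd(ast.NodeTransformer):
    """AST-level mutation: replace every '**' operator node with '+'."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Pow):
            node.op = ast.Add()
        return node


original = "fn('Ex %s' % 1 ** 2)"
source_mutant = original.replace("**", "+")  # one token changed in the text
ast_mutant = PowToAdd().visit(ast.parse(original, mode="eval"))

print(to_sexpr(ast.parse(original, mode="eval")))
# (fn (% 'Ex %s' (** 1 2)))
print(to_sexpr(ast.parse(source_mutant, mode="eval")))
# (fn (+ (% 'Ex %s' 1) 2))
print(to_sexpr(ast_mutant))
# (fn (% 'Ex %s' (+ 1 2)))
```

The source-level mutant adds an integer to a formatted string, which a type checker can reject, while the AST-level mutant keeps the tree shape and stays well typed.
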
Conclusions

Two possibilities
• Current mutation operators are inadequate for measuring the quality of type annotations
  • Need operators that target types
  • Investigate source-based rather than AST-based mutants
• Stronger type annotations aren't that helpful
  • Far-fetched, but worth investigating