Slide 1

Slide 1 text

How Good Are Your Types?

Rahul Gopinath, Eric Walkingshaw

Slide 2

Slide 2 text

Tests & Types
March 1, 2017

Means of reducing the susceptibility of programs to faults

Slide 3

Slide 3 text

Tests & Types

Tests
• Concrete
• Probabilistic guarantee

    def factorial(n):
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

    assert factorial(0) == 1
    assert factorial(3) == 6

Types
• Abstract
• Absolute guarantee

    def factorial(n: int) -> int:
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

Slide 4

Slide 4 text

Tests & Types

Tests
• Verbose
• Typically easy to understand
• Generally aim to cover the complete specification
• Can have failing test cases

Types
• Terse
• Can be harder to understand
• Typically cover a much smaller set of properties than the complete specification
• All types, if specified, should be correct

Slide 5

Slide 5 text

Aside: When they are applied

[Diagram labels: Specify types, Write tests, TDD, Gradual Typing, Write program]

Slide 6

Slide 6 text

How do we choose?

• Both test suites and types are used to prevent bugs
• Both tests and types can be improved with more resources
• We have a finite budget
• We need a way to compare

Slide 7

Slide 7 text

We know how to evaluate tests

Code Coverage
• Adequate coverage says something about correctness

    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)

    assert factorial(0) == 1
    assert factorial(3) == 6

Doesn't translate well to types
• A program usually has a complete type specification

    def factorial(n: int) -> int:
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)

Slide 8

Slide 8 text

We improve types by refining them

weak

    def factorial(n: Any) -> Any:
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

strong

    def factorial(n: Number) -> Number:
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

stronger

    def factorial(n: Integral) -> Integral:
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

weak

    def hashtags(jr: Any) -> Any: ...

strong

    def hashtags(jr: Dict[str, Any]) -> Tuple[int, List[str]]: ...

stronger

    def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, Any]: ...

How much better is stronger compared to strong?

Slide 9

Slide 9 text

Idea! Use mutation testing

Slide 10

Slide 10 text

Evaluation

Slide 11

Slide 11 text

Tools

• Language: Python
• Annotations verified by MyPy for Python 3.6
• Mutation testing tool: MutPy
• Standard (traditional) mutation operators

Slide 12

Slide 12 text

MutPy Operators

Slide 13

Slide 13 text

Subjects

• twitter-graph (own project)
  • 83 LOC
  • 10/10 functions type annotated
  • 25 test cases with 99% statement coverage
• w3lib
  • 369 LOC
  • 34/42 functions type annotated
  • 97 test cases with 94.7% statement coverage

In both, type annotations were added after the test suite.
Adding type annotations did not reveal any previously undetected faults.

Slide 14

Slide 14 text

Methodology

• Started with the strictest annotations available in MyPy
• Progressively weakened innermost type annotations
  • (int|str|...) -> Any
  • Container[Any*] -> Any

Examples:

    def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
    def hashtags(jr: Dict[str, List[Any]]) -> Tuple[int, Any]:
    def hashtags(jr: Dict[str, Any]) -> Any:
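
The weakening steps can be sketched as a small rewrite over type expressions. This is only one reading of the two rules that happens to reproduce the three `hashtags` signatures in the examples; it is not the authors' tooling. The tuple encoding of types and the names `height`, `weaken`, and `render` are assumptions made for illustration.

```python
# A type expression is a leaf name ("int") or a pair (container, [arguments]).

def height(t):
    """Nesting depth below this node: 0 for a leaf such as "int"."""
    if isinstance(t, str):
        return 0
    return 1 + max(height(arg) for arg in t[1])


def weaken(t):
    """Apply one weakening step to the innermost level of a type expression."""
    if height(t) <= 1:
        # (int|str|...) -> Any, and a container of leaves collapses to Any.
        return "Any"
    name, args = t
    deepest = height(t) - 1
    # Only the arguments on the deepest path are weakened; the rest are kept.
    return (name, [weaken(a) if height(a) == deepest else a for a in args])


def render(t):
    """Format a type expression the way it appears in an annotation."""
    if isinstance(t, str):
        return t
    name, args = t
    return f"{name}[{', '.join(render(a) for a in args)}]"


param = ("Dict", ["str", ("List", [("Tuple", ["int", "str"])])])
result = ("Tuple", ["int", ("List", ["str"])])

for _ in range(3):
    print(f"def hashtags(jr: {render(param)}) -> {render(result)}:")
    param, result = weaken(param), weaken(result)
```

Each loop iteration prints one signature and then weakens both the parameter and the return annotation by a single step.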

Slide 15

Slide 15 text

Results

Slide 16

Slide 16 text

Results Summary

• The mutants killed by types are a strict subset of those killed by the test suite
• No difference in mutation scores between stronger and weaker annotations in either project

                 Type   Test   Total
twitter-graph      62    142     171
w3lib              94    545     677
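
For comparison across projects, the counts can be turned into ratios. The sketch below assumes that the Type and Test columns are mutants killed by the type checker and by the test suite, that Total is the number of mutants generated, and that the score is simply killed divided by total; MutPy's own reporting may treat some mutants differently.

```python
# Counts copied from the results table; the column meanings are an assumption.
results = {
    "twitter-graph": {"type": 62, "test": 142, "total": 171},
    "w3lib": {"type": 94, "test": 545, "total": 677},
}


def mutation_score(killed, total):
    """Fraction of generated mutants that were killed."""
    return killed / total


scores = {
    project: {
        checker: mutation_score(row[checker], row["total"])
        for checker in ("type", "test")
    }
    for project, row in results.items()
}

for project, row in scores.items():
    print(f"{project}: types {row['type']:.1%}, tests {row['test']:.1%}")
```

Under these assumptions the type checker kills roughly a third of the mutants in twitter-graph and about one in seven in w3lib, while the test suites kill a little over 80% in both.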

Slide 17

Slide 17 text

Examples of Type Errors

Operator Replacement

    - print("Hello" + " World")
    + print("Hello" - " World")

Statement Deletion

      def square(x: int) -> int:
    -     y = x**2
    +     pass
          return y

Self Deletion

      class A(object):
          def add(self, x: int) -> int:
    -         y = x + self.val
    +         y = x + val
              return y

Decorator Deletion

      class A(object):
          ...
    -     @staticmethod
          def mymethod(x: int) -> int:
              return x
      ...
      a = A()
      a.mymethod()

Slide 18

Slide 18 text

Discussion

Mutants Killed:
• Type assertions ⊂ Test suite
  • Type errors ⊂ Semantic errors
• Strength of type annotations in MyPy for Python 3.6
  • Weaker type assertions ~ Stronger type assertions
  • Mutants targeted towards tests, so insufficient to detect differences in type assertions

Slide 19

Slide 19 text

Failure to distinguish type assertions

• Our mutation tools usually mutate the AST
• Many simple errors that programmers make have a larger impact in the AST
• Single-token mutation of the AST fails to reproduce them

    fn('Ex %s' % 1 * 2)    (fn (% 'Ex %s' (* 1 2)))
    fn('Ex %s' % 1 + 2)    (fn (+ (% 'Ex %s' 1) 2))
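
One way to check S-expressions like these is to print the tree that Python's own `ast` module builds for each snippet. The sketch below is illustrative only: the helper name `sexpr` is hypothetical, and it handles just the node types these two snippets need. In Python's grammar `%` and `*` share a precedence level and group left to right, so the parser's tree for the first snippet may differ from a hand-drawn one.

```python
import ast

# Only the three binary operators that appear in the two snippets.
OPERATORS = {ast.Add: "+", ast.Mult: "*", ast.Mod: "%"}


def sexpr(source):
    """Render the parse tree of a single Python expression as an S-expression."""

    def walk(node):
        if isinstance(node, ast.BinOp):
            op = OPERATORS[type(node.op)]
            return f"({op} {walk(node.left)} {walk(node.right)})"
        if isinstance(node, ast.Call):
            args = " ".join(walk(arg) for arg in node.args)
            return f"({walk(node.func)} {args})"
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        raise ValueError(f"unsupported node: {type(node).__name__}")

    return walk(ast.parse(source, mode="eval").body)


print(sexpr("fn('Ex %s' % 1 * 2)"))  # (fn (* (% 'Ex %s' 1) 2))
print(sexpr("fn('Ex %s' % 1 + 2)"))  # (fn (+ (% 'Ex %s' 1) 2))
```

Comparing the two printed trees shows how far apart a one-token source change lands in the AST for a given pair of snippets.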

Slide 20

Slide 20 text

Conclusions

Two possibilities
• Current mutation operators are inadequate for measuring the quality of type annotations
  • Need operators that target types
  • Investigate source-based rather than AST-based mutants
• Stronger type annotations aren't that helpful
  • Far-fetched, but worth investigating