Rahul Gopinath
March 13, 2017

# How Good Are Your Types?

Mutation Workshop 2017 in Tokyo


## Transcript

2. ### Tests & Types

March 1, 2017

Means of reducing susceptibility of programs to faults
3. ### Tests & Types

Tests
- Concrete
- Probabilistic guarantee

```python
def factorial(n):
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)

assert factorial(0) == 1
assert factorial(3) == 6
```

Types
- Abstract
- Absolute guarantee

```python
def factorial(n: int) -> int:
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)
```
4. ### Tests & Types

Tests
- Verbose
- Typically easy to understand
- Generally aim to cover the complete specification
- Can have failing test cases

Types
- Terse
- Can be harder to understand
- Typically cover a much smaller set of properties than the complete specification
- All types, if specified, must be correct
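To make the last two points concrete, here is a small sketch (the name `factorial_buggy` is hypothetical, not from the talk): a faulty factorial that still has the signature `int -> int`, so a type checker has nothing to object to, while the two concrete test cases from the previous slide reject it.

```python
def factorial(n: int) -> int:
    if n < 1:
        return 1
    return n * factorial(n - 1)


def factorial_buggy(n: int) -> int:
    # Hypothetical fault: the `n *` factor is missing. The function is still
    # int -> int, so a type checker such as MyPy accepts it unchanged.
    if n < 1:
        return 1
    return factorial_buggy(n - 1)


def passes_tests(f) -> bool:
    # The two concrete test cases from the earlier slide.
    return f(0) == 1 and f(3) == 6


print(passes_tests(factorial))        # True
print(passes_tests(factorial_buggy))  # False: factorial_buggy(3) == 1
```

The annotation `int -> int` says nothing about *which* integer comes back, which is why a type covers only a small part of the specification.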
5. ### Aside: When they are applied

Diagram: the order of "Specify types", "Write tests", and "Write program" under TDD and under gradual typing.
6. ### How do we choose?

- Both test suites and types are used to prevent bugs
- Both tests and types can be improved with more resources
- We have a finite budget
- We need a way to compare them
7. ### We know how to evaluate tests

- Code coverage
- Adequate coverage says something about correctness

```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

assert factorial(0) == 1
assert factorial(3) == 6
```

Coverage doesn't translate well to types
- A program usually has a complete type specification

```python
def factorial(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
```
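A minimal sketch of how statement coverage can be measured, assuming only the standard library's `sys.settrace` hook; the helper `executed_lines` is hypothetical, and real projects would use a dedicated tool such as coverage.py. It records which lines of `factorial` run for each of the two test inputs.

```python
import sys


def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


def executed_lines(func, *args):
    """Run func(*args); return line offsets (relative to the def line) that ran."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Record 'line' events only for frames running func's own code object.
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)  # restore any tracer that was active before
    return hit


# Offsets: 1 = `if n == 0:`, 2 = `return 1`, 4 = `return n * factorial(n - 1)`.
only_zero = executed_lines(factorial, 0)   # contains 1 and 2, but not 4
with_three = executed_lines(factorial, 3)  # contains 1, 2 and 4
```

The input `0` alone never reaches the recursive branch; adding the input `3` executes every statement. A fully annotated program has no equivalent "uncovered" part, which is why this adequacy criterion does not carry over to types.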
8. ### We improve types by refining them

```python
from numbers import Integral, Number
from typing import Any, Dict, List, Tuple

# weak
def factorial(n: Any) -> Any:
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)

# strong
def factorial(n: Number) -> Number:
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)

# stronger
def factorial(n: Integral) -> Integral:
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)

# weak
def hashtags(jr: Any) -> Any: ...

# strong
def hashtags(jr: Dict[str, Any]) -> Tuple[int, List[str]]: ...

# stronger
def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, Any]: ...
```

How much better is stronger compared to strong?
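One way to picture what "stronger" means: a stronger annotation admits fewer values. The sketch below is a rough *runtime* analogy, not what MyPy does (MyPy checks statically). `conforms` and the sample inputs are hypothetical, and only `Any`, bare classes, and `dict`/`list`/`tuple` generics are handled (`typing.Dict[...]` also works, since `get_origin` maps it to `dict`).

```python
from typing import Any, get_args, get_origin


def conforms(value, tp) -> bool:
    """Rough runtime check of `value` against a simple annotation."""
    if tp is Any:
        return True
    origin, args = get_origin(tp), get_args(tp)
    if origin is None:  # a bare class such as int or str
        return isinstance(value, tp)
    if origin is dict:
        key_t, val_t = args
        return isinstance(value, dict) and all(
            conforms(k, key_t) and conforms(v, val_t) for k, v in value.items()
        )
    if origin is list:
        (item_t,) = args
        return isinstance(value, list) and all(conforms(v, item_t) for v in value)
    if origin is tuple:
        return (
            isinstance(value, tuple)
            and len(value) == len(args)
            and all(conforms(v, t) for v, t in zip(value, args))
        )
    raise NotImplementedError(f"unsupported annotation: {tp!r}")


# Hypothetical inputs for hashtags(jr); only the first has the "stronger" shape.
samples = [
    {"tags": [(1, "python")]},
    {"tags": ["python"]},  # list items are not tuples
    {"tags": "python"},    # value is not a list
    ["python"],            # not a dict at all
]
weak = Any
strong = dict[str, Any]
stronger = dict[str, list[tuple[int, str]]]

for name, tp in [("weak", weak), ("strong", strong), ("stronger", stronger)]:
    print(name, sum(conforms(s, tp) for s in samples))  # weak 4, strong 3, stronger 1
```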

11. ### Tools

- Language: Python
- Annotations verified by MyPy for Python 3.6
- Mutation testing tool: MutPy
- Standard (traditional) mutation operators
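For readers unfamiliar with traditional mutation operators, here is a minimal sketch of arithmetic operator replacement done on the AST with Python's `ast` module. It is an illustration, not MutPy's implementation; the class `ReplaceNthOperator`, the function `aor_mutants`, and the `SWAPS` table are hypothetical. It needs Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical swap table for an arithmetic operator replacement (AOR) operator.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}


class ReplaceNthOperator(ast.NodeTransformer):
    """Swap the operator of the n-th replaceable BinOp, counted in post-order."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.seen = 0

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)  # children first, so numbering is post-order
        swap = SWAPS.get(type(node.op))
        if swap is not None:
            if self.seen == self.n:
                node.op = swap()
            self.seen += 1
        return node


def aor_mutants(source: str) -> list[str]:
    """Return one mutant per replaceable operator, each differing in one AST node."""
    count = sum(
        isinstance(node, ast.BinOp) and type(node.op) in SWAPS
        for node in ast.walk(ast.parse(source))
    )
    return [
        ast.unparse(ReplaceNthOperator(i).visit(ast.parse(source)))
        for i in range(count)
    ]


print(aor_mutants('print("Hello" + " World")'))
# ["print('Hello' - ' World')"]
```

This mutant is the operator-replacement example shown under *Examples of Type Errors*: `"Hello" - " World"` is rejected by a type checker as well as failing at runtime.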

13. ### Subjects

| Subject | LOC | Functions type-annotated | Test cases | Statement coverage |
|---|---|---|---|---|
| twitter-graph (own project) | 83 | 10/10 | 25 | 99% |
| w3lib | 369 | 34/42 | 97 | 94.7% |

- In both, type annotations were added after the test suite
- Adding type annotations did not reveal any faults that the test suites had missed
14. ### Methodology

- Started with the strictest annotations available in MyPy
- Progressively weakened the innermost type annotations:
  - `(int|str|...) -> Any`
  - `Container[Any*] -> Any`

Examples:

```python
from typing import Any, Dict, List, Tuple

def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]: ...
def hashtags(jr: Dict[str, List[Any]]) -> Tuple[int, Any]: ...
def hashtags(jr: Dict[str, Any]) -> Any: ...
```
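The weakening steps can be sketched with `typing.get_origin` and `typing.get_args` (Python 3.9+). This uses one assumed reading of the two rules: each step replaces the innermost container of an annotation with `Any`. The helper names are hypothetical, and rebuilt annotations use built-in generics such as `dict[str, Any]`. Applied to the parameter and return annotations of the strictest `hashtags` signature, it reproduces the three signatures above.

```python
from typing import Any, get_args, get_origin


def is_container(tp) -> bool:
    """True for a parameterized generic such as list[str] or dict[str, Any]."""
    return get_origin(tp) is not None and len(get_args(tp)) > 0


def weaken_once(tp):
    """Replace the innermost container(s) of an annotation with Any."""
    if not is_container(tp):
        return Any  # a bare type such as int or str
    args = get_args(tp)
    if not any(is_container(a) for a in args):
        return Any  # innermost container: Container[...] -> Any
    # Keep bare arguments, weaken the nested containers, and rebuild.
    new_args = tuple(weaken_once(a) if is_container(a) else a for a in args)
    return get_origin(tp)[new_args]


def weakening_chain(tp) -> list:
    """All annotations from tp down to Any, one weakening step at a time."""
    chain = [tp]
    while chain[-1] is not Any:
        chain.append(weaken_once(chain[-1]))
    return chain


for step in weakening_chain(dict[str, list[tuple[int, str]]]):
    print(step)
```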

16. ### Results Summary

- The mutants killed by types are a strict subset of those killed by the test suite
- No difference in mutation scores between stronger and weaker annotations in either project

| Project | Killed by types | Killed by tests | Total mutants |
|---|---|---|---|
| twitter-graph | 62 | 142 | 171 |
| w3lib | 94 | 545 | 677 |
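The table can be turned into mutation scores (killed / total). This sketch assumes the *Total* column counts all generated mutants and that no equivalent mutants were removed; the names `results` and `mutation_score` are illustrative.

```python
# Counts copied from the results table; "total" is read as total mutants generated.
results = {
    "twitter-graph": {"types": 62, "tests": 142, "total": 171},
    "w3lib": {"types": 94, "tests": 545, "total": 677},
}


def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants killed (equivalent mutants are not excluded here)."""
    return killed / total


for project, r in results.items():
    by_types = mutation_score(r["types"], r["total"])
    by_tests = mutation_score(r["tests"], r["total"])
    print(f"{project}: types {by_types:.1%}, tests {by_tests:.1%}")
# twitter-graph: types 36.3%, tests 83.0%
# w3lib: types 13.9%, tests 80.5%
```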
17. ### Examples of Type Errors

Operator Replacement

```diff
- print("Hello" + " World")
+ print("Hello" - " World")
```

Statement Deletion

```diff
  def square(x: int) -> int:
-     y = x**2
+     pass
      return y
```

Self Deletion

```diff
  class A(object):
      def add(self, x: int) -> int:
-         y = x + self.val
+         y = x + val
          return y
```

Decorator Deletion

```diff
  class A(object):
      ...
-     @staticmethod
      def mymethod(x: int) -> int:
          return x
      ...
  a = A()
  a.mymethod()
```
18. ### Discussion

Mutants killed:
- Type assertions ⊂ Test suite
- Type errors ⊂ Semantic errors

Strength of type annotations in MyPy for Python 3.6:
- Weaker type assertions ~ Stronger type assertions
- The mutants are targeted towards tests, so they are insufficient to detect differences in type assertions
19. ### Failure to distinguish type assertions

- Our mutation tools usually mutate the AST
- Many simple errors that programmers make have a larger impact on the AST
- Single-token mutation of the AST fails to reproduce them

```
fn('Ex %s' % 1 * 2)   →   (fn (% 'Ex %s' (* 1 2)))
fn('Ex %s' % 1 + 2)   →   (fn (+ (% 'Ex %s' 1) 2))
```
20. ### Conclusions

Two possibilities:
- Current mutation operators are inadequate for measuring the quality of type annotations
  - Need operators that target types
  - Investigate source-based rather than AST-based mutants
- Stronger type annotations aren't that helpful
  - Far-fetched, but worth investigating
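As a rough sketch of what a source-based mutant could look like, the function below (hypothetical, standard library only) edits one operator token in the source text instead of one node in the AST. Applied to the `fn('Ex %s' % 1 * 2)` example, it yields the single-token mutant `fn('Ex %s' % 1 + 2)`; the parser then decides what AST the mutant has. The swap table is an assumption and deliberately small.

```python
import io
import tokenize

# Hypothetical swap table; a real tool would offer many more replacements.
OPERATOR_SWAPS = {"+": "-", "-": "+", "*": "+", "/": "*"}


def token_mutants(source: str) -> list[str]:
    """Return mutants that each replace exactly one operator token in the text."""
    lines = source.splitlines(keepends=True)
    mutants = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Operators inside string literals are part of STRING tokens, so
        # the '%' in 'Ex %s' is never touched.
        if tok.type == tokenize.OP and tok.string in OPERATOR_SWAPS:
            (row, col), (_, end_col) = tok.start, tok.end
            line = lines[row - 1]
            mutated = line[:col] + OPERATOR_SWAPS[tok.string] + line[end_col:]
            mutants.append("".join(lines[: row - 1] + [mutated] + lines[row:]))
    return mutants


print(token_mutants("fn('Ex %s' % 1 * 2)\n"))
# ["fn('Ex %s' % 1 + 2)\n"]
```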