How Good Are Your Types?

Rahul Gopinath, Eric Walkingshaw
March 1, 2017

Tests & Types
Means of reducing the susceptibility of programs to faults

Tests & Types
Tests
• Concrete
• Probabilistic guarantee

    def factorial(n):
        if n < 1:
            return 1
        else:
            return n * factorial(n - 1)

    assert factorial(0) == 1
    assert factorial(3) == 6

Types
• Abstract
• Absolute guarantee

    def factorial(n: int) -> int:
        if n < 1:
            return 1
        else:
            return n * factorial(n - 1)

Tests & Types
Tests
• Verbose
• Typically easy to understand
• Generally aim to cover the complete specification
• Can include failing test cases

Types
• Terse
• Can be harder to understand
• Typically cover a much smaller set of properties than the complete specification
• All specified types must be correct

Aside: When they are applied
[Diagram relating "Specify types", "Write tests", and "Write program" to TDD and gradual typing]

How do we choose?
• Both test suites and types are used to prevent bugs
• Both tests and types can be improved with more resources
• We have a finite budget
• We need a way to compare them

We know how to evaluate tests
Code coverage
• Adequate coverage says something about correctness

    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)

    assert factorial(0) == 1
    assert factorial(3) == 6

Doesn't translate well to types
• A program usually has a complete type specification

    def factorial(n: int) -> int:
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)
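Statement coverage, the metric quoted for the subjects later in the talk, records which lines the tests execute. The sketch below is a toy that assumes only the standard library's sys.settrace hook; covered_lines is a hypothetical helper, and a real project would use a dedicated coverage tool. It reports executed lines as offsets from the def line.

```python
import sys


def factorial(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


def covered_lines(func, *calls):
    """Run func on each argument tuple and return the line offsets
    (relative to the def line) that executed inside func."""
    first = func.__code__.co_firstlineno
    hits = set()

    def tracer(frame, event, arg):
        # Only trace frames that belong to func; ignore everything else.
        if frame.f_code is not func.__code__:
            return None
        if event == "line":
            hits.add(frame.f_lineno - first)
        return tracer

    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(None)
    return hits


# Offsets: 1 = `if n == 0`, 2 = `return 1`, 4 = the recursive `return`.
print(sorted(covered_lines(factorial, (0,))))        # base case only
print(sorted(covered_lines(factorial, (0,), (3,))))  # both branches
```

A suite containing only `factorial(0)` never reaches offset 4, so the recursive branch stays uncovered; adding `factorial(3)` covers it.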

We improve types by refining them

    from numbers import Integral, Number
    from typing import Any, Dict, List, Tuple

weak

    def factorial(n: Any) -> Any:
        if n < 1:
            return 1
        else:
            return n * factorial(n - 1)

strong

    def factorial(n: Number) -> Number:
        if n < 1:
            return 1
        else:
            return n * factorial(n - 1)

stronger

    def factorial(n: Integral) -> Integral:
        if n < 1:
            return 1
        else:
            return n * factorial(n - 1)

weak

    def hashtags(jr: Any) -> Any: ...

strong

    def hashtags(jr: Dict[str, Any]) -> Tuple[int, List[str]]: ...

stronger

    def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, Any]: ...

How much better is stronger compared to strong?
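One way to read "weak", "strong", and "stronger" is as shrinking sets of values an annotation admits. The talk's annotations are checked statically; this sketch only approximates the idea at run time with isinstance, and it models Any as object because Any cannot be passed to isinstance. LEVELS, SAMPLES, and admitted are illustrative names.

```python
from numbers import Integral, Number

# Run-time stand-ins for the three factorial annotations.
# Any cannot be used with isinstance, so `object` approximates it here.
LEVELS = [
    ("weak (Any)", object),
    ("strong (Number)", Number),
    ("stronger (Integral)", Integral),
]

SAMPLES = [3, 2.5, "3", None]


def admitted(cls):
    """Return the sample values an annotation of type `cls` would admit."""
    return [x for x in SAMPLES if isinstance(x, cls)]


for name, cls in LEVELS:
    print(f"{name:22} {admitted(cls)}")
```

Each refinement admits fewer values: the Number version rejects "3" and None, and the Integral version additionally rejects 2.5.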

Idea! Use mutation testing
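Mutation testing injects small faults (mutants) into a program and counts how many of them the checks reject ("kill"). The talk uses MutPy for this; the sketch below is a much-reduced, hypothetical stand-in that assumes only the standard ast module (ast.unparse needs Python 3.9+), a single arithmetic-operator-replacement operator, and two illustrative test suites, STRONG and WEAK. The talk also counts mutants rejected by the type checker; that half is not reproduced here.

```python
import ast

# The subject program; its recursive branch has two arithmetic operators.
SOURCE = """
def factorial(n):
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)
"""

# Replacement candidates for a simplified arithmetic-operator-replacement operator.
ARITH_OPS = [ast.Add, ast.Sub, ast.Mult, ast.Div]


class ReplaceOp(ast.NodeTransformer):
    """Swap the operator of the `target`-th BinOp (in visit order) for `new_op`."""

    def __init__(self, target, new_op):
        self.target, self.new_op = target, new_op
        self.seen, self.changed = -1, False

    def visit_BinOp(self, node):
        self.seen += 1
        if self.seen == self.target and not isinstance(node.op, self.new_op):
            node.op = self.new_op()
            self.changed = True
        self.generic_visit(node)
        return node


def mutants(source):
    """Yield the source code of every first-order operator-replacement mutant."""
    count = sum(isinstance(n, ast.BinOp) for n in ast.walk(ast.parse(source)))
    for i in range(count):
        for op in ARITH_OPS:
            mutator = ReplaceOp(i, op)
            tree = mutator.visit(ast.parse(source))
            if mutator.changed:  # skip the "mutant" identical to the original
                yield ast.unparse(tree)


def killed(mutant_src, suite):
    """A mutant is killed if any test returns False or raises."""
    env = {}
    exec(mutant_src, env)
    try:
        return not all(test(env["factorial"]) for test in suite)
    except Exception:  # e.g. RecursionError from a mutant that never terminates
        return True


# Hypothetical test suites: STRONG mirrors the slide's two asserts.
STRONG = [lambda f: f(0) == 1, lambda f: f(3) == 6]
WEAK = [lambda f: f(0) == 1]

for name, suite in [("STRONG", STRONG), ("WEAK", WEAK)]:
    results = [killed(m, suite) for m in mutants(SOURCE)]
    print(f"{name}: {sum(results)}/{len(results)} mutants killed")
```

With STRONG all six mutants are killed; WEAK only exercises the base case, so all six survive. That gap is what a mutation score measures.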

Evaluation

Tools
• Language: Python
• Annotations verified by MyPy for Python 3.6
• Mutation testing tool: MutPy
• Standard (traditional) mutation operators

MutPy Operators

Subjects

Project                       LOC   Functions annotated   Test cases   Statement coverage
twitter-graph (own project)    83   10/10                 25           99%
w3lib                         369   34/42                 97           94.7%

• In both projects, type annotations were added after the test suite
• Adding type annotations did not uncover any faults the test suite had missed

Methodology
• Started with the strictest annotations available in MyPy
• Progressively weakened the innermost type annotations:
  • (int|str|...) -> Any: primitive types become Any
  • Container[Any*] -> Any: a container whose parameters are all Any becomes Any
Examples:

    def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
    def hashtags(jr: Dict[str, List[Any]]) -> Tuple[int, Any]:
    def hashtags(jr: Dict[str, Any]) -> Any:
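The weakening rules are given by example only. One reading that reproduces the three hashtags signatures folds both rules into a single step: a bare type becomes Any, and otherwise every container at the deepest nesting level becomes Any. The sketch uses a hypothetical T tree instead of real typing objects, so names such as T, t, and weaken are illustrative, not MyPy or typing APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T:
    """A hypothetical, minimal type tree: a name plus type parameters."""
    name: str
    args: tuple = ()

    def __str__(self) -> str:
        if not self.args:
            return self.name
        return f"{self.name}[{', '.join(str(a) for a in self.args)}]"


def t(name: str, *args: T) -> T:
    return T(name, args)


ANY = T("Any")


def depth(tp: T) -> int:
    """Nesting depth: 0 for a bare type, 1 + deepest parameter otherwise."""
    return 0 if not tp.args else 1 + max(depth(a) for a in tp.args)


def _collapse_at(tp: T, level: int) -> T:
    """Replace every parameterised type found `level` steps down with Any."""
    if level == 0:
        return ANY if tp.args else tp
    return T(tp.name, tuple(_collapse_at(a, level - 1) for a in tp.args))


def weaken(tp: T) -> T:
    """One weakening step: a bare type becomes Any; otherwise every
    container at the deepest nesting level becomes Any."""
    d = depth(tp)
    if d == 0:
        return ANY
    return _collapse_at(tp, d - 1)


param = t("Dict", t("str"), t("List", t("Tuple", t("int"), t("str"))))
ret = t("Tuple", t("int"), t("List", t("str")))
for _ in range(3):
    print(f"def hashtags(jr: {param}) -> {ret}:")
    param, ret = weaken(param), weaken(ret)
```

Run as-is, the loop prints the three hashtags signatures from the Methodology slide, in order.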

Results

Results Summary
• Mutants killed by types are a strict subset of those killed by the test suite
• No difference in mutation scores between stronger and weaker annotations in either project

Project         Killed by types   Killed by tests   Total mutants
twitter-graph   62                142               171
w3lib           94                545               677
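Assuming "Total" is the number of mutants generated per project, the counts convert into raw mutation scores (killed / total). These ratios are derived here for illustration and may differ from the talk's own score definition, for example if equivalent mutants were excluded.

```python
# Kill counts as reported on the Results Summary slide.
# Assumption: "total" is the number of mutants generated for each project.
RESULTS = {
    "twitter-graph": {"types": 62, "tests": 142, "total": 171},
    "w3lib": {"types": 94, "tests": 545, "total": 677},
}


def mutation_score(killed: int, total: int) -> float:
    """Raw mutation score: percentage of mutants killed, to one decimal."""
    return round(100 * killed / total, 1)


for project, r in RESULTS.items():
    # Because type kills are a subset of test kills, the difference is the
    # number of mutants only the tests caught.
    print(
        f"{project}: types {mutation_score(r['types'], r['total'])}%, "
        f"tests {mutation_score(r['tests'], r['total'])}%, "
        f"killed only by tests {r['tests'] - r['types']}"
    )
```

Under that assumption, types kill 36.3% of twitter-graph mutants against 83.0% for tests, and 13.9% of w3lib mutants against 80.5%.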

Examples of Type Errors

Operator Replacement

    - print("Hello" + " World")
    + print("Hello" - " World")

Statement Deletion

      def square(x: int) -> int:
    -     y = x**2
    +     pass
          return y

Self Deletion

      class A(object):
          def add(self, x: int) -> int:
    -         y = x + self.val
    +         y = x + val
              return y

Decorator Deletion

      class A(object):
          ...
    -     @staticmethod
          def mymethod(x: int) -> int:
              return x
          ...
      a = A()
      a.mymethod()
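The Statement Deletion example can be produced mechanically with Python's ast module. This is a hypothetical sketch, not MutPy's implementation: DeleteAssign replaces every assignment with pass, whereas a real tool would generate one mutant per statement (square has a single assignment, so the result is the same here). ast.unparse needs Python 3.9 or later.

```python
import ast

SOURCE = """
def square(x: int) -> int:
    y = x**2
    return y
"""


class DeleteAssign(ast.NodeTransformer):
    """Statement deletion (sketch): replace each assignment with `pass`."""

    def visit_Assign(self, node: ast.Assign) -> ast.AST:
        # `pass` keeps the enclosing block syntactically non-empty.
        return ast.Pass()


tree = ast.fix_missing_locations(DeleteAssign().visit(ast.parse(SOURCE)))
print(ast.unparse(tree))

env = {}
exec(compile(tree, "<mutant>", "exec"), env)
try:
    env["square"](3)
except NameError as err:
    # Running the mutant exposes the fault; a static checker such as MyPy
    # reports the undefined name `y` without executing anything.
    print("killed:", err)
```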

Discussion
Mutants killed:
• Type assertions ⊂ Test suite
• Type errors ⊂ Semantic errors
Strength of type annotations in MyPy for Python 3.6:
• Weaker type assertions ~ Stronger type assertions
• The mutants are targeted at tests, so they are insufficient to detect differences between type assertions

Failure to distinguish type assertions
• Our mutation tools usually mutate the AST
• Many simple errors that programmers make have a larger impact on the AST
• Single-token mutations of the AST fail to reproduce them

    fn('Ex %s' % 1 * 2)    (fn (% 'Ex %s' (* 1 2)))
    fn('Ex %s' % 1 + 2)    (fn (+ (% 'Ex %s' 1) 2))
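The claim that a one-token source edit can have a larger impact on the AST can be checked with Python's own parser. The hypothetical helpers sexp and to_prefix below render an expression in the slide's prefix notation. Changing the single token + to * in x + y * z reshapes the tree because of operator precedence, whereas an AST operator-replacement mutant of the same expression keeps the original shape and yields (* x (* y z)).

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Mod: "%"}


def sexp(node: ast.AST) -> str:
    """Render a small Python expression AST in prefix (S-expression) form."""
    if isinstance(node, ast.Expression):
        return sexp(node.body)
    if isinstance(node, ast.BinOp):
        return f"({OPS[type(node.op)]} {sexp(node.left)} {sexp(node.right)})"
    if isinstance(node, ast.Call):
        parts = [sexp(node.func)] + [sexp(a) for a in node.args]
        return f"({' '.join(parts)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)


def to_prefix(src: str) -> str:
    """Parse a single expression and return its prefix rendering."""
    return sexp(ast.parse(src, mode="eval"))


print(to_prefix("fn(x + y * z)"))  # one token differs from the next line...
print(to_prefix("fn(x * y * z)"))  # ...but the tree shape changes
```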

Conclusions
Two possibilities:
• Current mutation operators are inadequate for measuring the quality of type annotations
  • Need operators that target types
  • Investigate source-based rather than AST-based mutants
• Stronger type annotations aren't that helpful
  • Far-fetched, but worth investigating
