How Good Are Your Types?

Mutation workshop 2017 at Tokyo

Rahul Gopinath

March 13, 2017

Transcript

  1. How Good Are Your Types? Rahul Gopinath, Eric Walkingshaw

  2. Tests & Types (March 1, 2017)

    Means of reducing the susceptibility of programs to faults
  3. Tests & Types

    Tests
    • Concrete
    • Probabilistic guarantee

        def factorial(n):
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

        > assert(factorial(0)==1)
        > assert(factorial(3)==6)

    Types
    • Abstract
    • Absolute guarantee

        def factorial(n : int) -> int:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)
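
The contrast on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the assertions check two concrete inputs at run time, while the annotation states a property of every call that a checker such as mypy can verify without running the program. The helper name `check_examples` is introduced here for illustration only.

```python
def factorial(n: int) -> int:
    """Recursive factorial; the annotation applies to every call site."""
    if n < 1:
        return 1
    return n * factorial(n - 1)


def check_examples() -> bool:
    """Concrete tests: only the inputs listed here are checked."""
    assert factorial(0) == 1
    assert factorial(3) == 6
    return True


if __name__ == "__main__":
    check_examples()
    # A static checker is expected to reject factorial("3") before the
    # program runs; the tests above say nothing about that input.
```

The tests give evidence for the two inputs they name; the annotation rules out a whole class of calls but says nothing about whether the returned value is the right number.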
  4. Tests & Types

    Tests
    • Verbose
    • Typically easy to understand
    • Generally, aim to cover the complete specification
    • Can have failing test cases

    Types
    • Terse
    • Can be harder to understand
    • Typically cover a much smaller set of properties than the complete specification
    • All types, if specified, should be correct
  5. Aside: When they are applied

    [Diagram labels: Specify types, Write tests, TDD, Gradual Typing, Write program]
  6. How do we choose?

    • Both test suites and types are used to prevent bugs
    • Both tests and types can be improved with more resources
    • We have a finite budget
    • We need a way to compare
  7. We know how to evaluate tests

    Code Coverage
    • Adequate coverage says something about correctness

        def factorial(n):
            if n == 0:
                return 1
            else:
                return n * factorial(n-1)

        > assert(factorial(0)==1)
        > assert(factorial(3)==6)

    Doesn't translate well to types
    • A program usually has a complete type specification

        def factorial(n : int) -> int:
            if n == 0:
                return 1
            else:
                return n * factorial(n-1)
  8. We improve types by refining them

    weak

        def factorial(n : Any) -> Any:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

    strong

        def factorial(n : Number) -> Number:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

    stronger

        def factorial(n : Integral) -> Integral:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

    weak

        def hashtags(jr : Any) -> Any: ...

    strong

        def hashtags(jr : Dict[str, Any]) -> Tuple[int, List[str]]: ...

    stronger

        def hashtags(jr : Dict[str, List[Tuple[int, str]]]) -> Tuple[int, Any]: ...

    How much better is stronger compared to strong?
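
One informal way to see what "stronger" means is to compare which values each annotation admits. The sketch below uses run-time `isinstance` checks against the `numbers` ABCs as a stand-in for the static check; a static checker such as mypy may treat these ABCs differently, and the helper `accepted` and the sample values are assumptions made for illustration.

```python
from numbers import Integral, Number
from typing import Any

# Sample arguments a caller might pass to factorial (illustrative only).
CANDIDATES = [3, 2.5, "3"]


def accepted(annotation, values):
    """Return the values that the given annotation would admit at run time."""
    if annotation is Any:
        # Any places no restriction on the argument.
        return list(values)
    return [v for v in values if isinstance(v, annotation)]


if __name__ == "__main__":
    for name, ann in [("Any", Any), ("Number", Number), ("Integral", Integral)]:
        print(name, accepted(ann, CANDIDATES))
```

Each refinement admits a subset of the values admitted by the previous one, which is the ordering the slide calls weak, strong, and stronger.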
  9. Idea! Use mutation testing
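
For readers new to the technique, here is a minimal sketch of mutation testing using only the standard `ast` module. It is not how MutPy is implemented; the operator swaps, the helper names, and the tiny test suite are assumptions chosen for illustration. Each mutant changes exactly one operator, and a mutant counts as killed when the test suite fails on it.

```python
import ast
import copy

SOURCE = """
def factorial(n):
    if n < 1:
        return 1
    return n * factorial(n - 1)
"""

# Illustrative operator replacements (a small subset of what real tools offer).
BINOP_SWAPS = {ast.Mult: ast.Div, ast.Sub: ast.Add, ast.Add: ast.Sub}
COMPARE_SWAPS = {ast.Lt: ast.LtE}


def run_tests(namespace):
    """Return True if the test suite passes against the given namespace."""
    try:
        f = namespace["factorial"]
        assert f(0) == 1
        assert f(3) == 6
        return True
    except Exception:  # includes AssertionError and RecursionError
        return False


def mutation_points(tree):
    """Yield the nodes whose operator this sketch knows how to replace."""
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and type(node.op) in BINOP_SWAPS:
            yield node
        elif isinstance(node, ast.Compare) and type(node.ops[0]) in COMPARE_SWAPS:
            yield node


def make_mutants(source):
    """Yield one mutated copy of the syntax tree per mutation point."""
    tree = ast.parse(source)
    count = sum(1 for _ in mutation_points(tree))
    for i in range(count):
        mutant = copy.deepcopy(tree)
        node = list(mutation_points(mutant))[i]
        if isinstance(node, ast.BinOp):
            node.op = BINOP_SWAPS[type(node.op)]()
        else:
            node.ops[0] = COMPARE_SWAPS[type(node.ops[0])]()
        yield ast.fix_missing_locations(mutant)


def mutation_score(source):
    """Return (killed, total) over all generated mutants."""
    killed = total = 0
    for mutant in make_mutants(source):
        namespace = {}
        exec(compile(mutant, "<mutant>", "exec"), namespace)
        total += 1
        if not run_tests(namespace):
            killed += 1
    return killed, total


if __name__ == "__main__":
    print(mutation_score(SOURCE))
```

In this sketch the `<` to `<=` mutant survives because it behaves identically on every input, which is the usual caveat about equivalent mutants. Replacing `run_tests` with a call to a type checker would give the corresponding score for type annotations.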

  10. Evaluation

  11. Tools

    • Language: Python
    • Annotations verified by MyPy for Python 3.6
    • Mutation testing tool: MutPy
    • Standard (traditional) mutation operators
  12. MutPy Operators

  13. Subjects

    • twitter-graph (own project)
        • 83 LOC
        • 10/10 functions type annotated
        • 25 test cases with 99% statement coverage
    • w3lib
        • 369 LOC
        • 34/42 functions type annotated
        • 97 test cases with 94.7% statement coverage

    In both, type annotations were added after the test suite.
    Adding type annotations did not reveal any previously undetected faults.
  14. Methodology

    • Started with the strictest annotations available in MyPy
    • Progressively weakened innermost type annotations
        • (int|str|...) -> Any
        • Container[Any*] -> Any

    Examples:

        def hashtags(jr : Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
        def hashtags(jr : Dict[str, List[Any]]) -> Tuple[int, Any]:
        def hashtags(jr : Dict[str, Any]) -> Any:
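
The weakening steps in the examples above can be reproduced mechanically. The sketch below is one possible reading of "progressively weakened innermost type annotations": at each step, the most deeply nested annotation is collapsed to `Any`. The functions `depth` and `weaken` and the `ALIASES` table are hypothetical helpers, not part of mypy or of the authors' tooling, and the sketch assumes Python 3.8+ for `typing.get_origin` and `typing.get_args`.

```python
from typing import Any, Dict, List, Tuple, get_args, get_origin

# Map run-time origins back to the typing aliases used in the slide examples.
ALIASES = {dict: Dict, list: List, tuple: Tuple}


def depth(tp) -> int:
    """Nesting depth: 0 for plain types and Any, 1 + deepest argument otherwise."""
    args = get_args(tp)
    if get_origin(tp) is None or not args:
        return 0
    return 1 + max(depth(a) for a in args)


def weaken(tp):
    """Collapse the innermost level of an annotation to Any (one step)."""
    if depth(tp) <= 1:
        # A plain type, or a container holding only plain types.
        return Any
    args = get_args(tp)
    deepest = max(depth(a) for a in args)
    # Only the most deeply nested arguments are weakened in this step.
    new_args = tuple(weaken(a) if depth(a) == deepest else a for a in args)
    return ALIASES[get_origin(tp)][new_args]


if __name__ == "__main__":
    tp = Dict[str, List[Tuple[int, str]]]
    while tp is not Any:
        print(tp)
        tp = weaken(tp)
    print(tp)
```

Applied repeatedly to the parameter and return annotations of `hashtags`, this yields the same sequence as the three signatures listed in the examples.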
  15. Results

  16. Results Summary

    • The mutants killed by types are a strict subset of those killed by the test suite
    • No difference in mutation scores between stronger and weaker annotations in either project

                       Type   Test   Total
      twitter-graph      62    142     171
      w3lib              94    545     677
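
The counts above convert to mutation scores with simple division. The sketch assumes that the Type and Test columns are mutants killed by the type checker and by the test suite respectively, and that Total is the number of mutants generated; the slide does not label the columns further. The function name `mutation_scores` is introduced here for illustration.

```python
# Counts copied from the results table (column meaning is an assumption).
RESULTS = {
    "twitter-graph": {"type": 62, "test": 142, "total": 171},
    "w3lib": {"type": 94, "test": 545, "total": 677},
}


def mutation_scores(row):
    """Return (type score, test score) as fractions of all mutants."""
    return row["type"] / row["total"], row["test"] / row["total"]


if __name__ == "__main__":
    for project, row in RESULTS.items():
        type_score, test_score = mutation_scores(row)
        print(f"{project}: types {type_score:.1%}, tests {test_score:.1%}")
```

Under that reading, types kill roughly 36% and 14% of the mutants in the two projects, against roughly 83% and 81% for the test suites.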
  17. Examples of Type Errors

    Operator Replacement

        - print("Hello" + " World")
        + print("Hello" - " World")

    Statement Deletion

          def square(x: int) -> int:
        -     y = x**2
        +     pass
              return y

    Self Deletion

          class A(object):
              def add(self, x: int) -> int:
        -         y = x + self.val
        +         y = x + val
                  return y

    Decorator Deletion

          class A(object):
              ...
        -     @staticmethod
              def mymethod(x: int) -> int:
                  return x
          ...
          a = A()
          a.mymethod()
  18. Discussion

    Mutants Killed:
    • Type assertions ⊂ Test suite
    • Type errors ⊂ Semantic errors
    • Strength of type annotations in MyPy for Python 3.6
    • Weaker type assertions ~ Stronger type assertions
    • Mutants targeted towards tests, so insufficient to detect differences in type assertions
  19. Failure to distinguish type assertions

    • Our mutation tools usually mutate the AST
    • Many simple errors that programmers make have a larger impact in the AST
    • Single token mutation of AST fails to reproduce them

        fn('Ex %s' % 1 * 2)    (fn (% 'Ex %s' (* 1 2)))
        fn('Ex %s' % 1 + 2)    (fn (+ (% 'Ex %s' 1) 2))
  20. Conclusions

    Two possibilities
    • Current mutation operators are inadequate for measuring quality of type annotations
        • Need operators that target types
        • Investigate source-based rather than AST-based mutants
    • Stronger type annotations aren't that helpful
        • Far-fetched, but worth investigating