
How Good Are Your Types?

Mutation workshop 2017 at Tokyo

Rahul Gopinath

March 13, 2017

Transcript

  1. Tests & Types (March 1, 2017)

    Means of reducing the susceptibility of programs to faults
  2. Tests & Types

    Tests
    • Concrete
    • Probabilistic guarantee

        def factorial(n):
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

        > assert(factorial(0)==1)
        > assert(factorial(3)==6)

    Types
    • Abstract
    • Absolute guarantee

        def factorial(n : int) -> int:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)
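
The contrast on this slide can be tried out directly. The following is a minimal sketch, assuming the multiplication by `n` that the asserted value `factorial(3) == 6` requires. The tests check two concrete inputs, while the annotation states a property for every call; note that Python itself does not enforce annotations at run time, so the guarantee comes only from running a separate checker such as mypy over the source.

```python
def factorial(n: int) -> int:
    """Return n! for n >= 1; any input below 1 yields 1, as on the slide."""
    if n < 1:
        return 1
    return n * factorial(n - 1)


# Tests: concrete checks on specific inputs.
assert factorial(0) == 1
assert factorial(3) == 6

# Types: the annotation is ignored at run time, so this call still executes.
# A static checker would be expected to flag the float argument instead.
unchecked = factorial(3.0)
```

Here `unchecked` is a float even though the signature promises `int`, which is why the annotations only help when a checker is part of the workflow.
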
  3. Tests & Types

    Tests
    • Verbose
    • Typically easy to understand
    • Generally aim to cover the complete specification
    • Can have failing test cases

    Types
    • Terse
    • Can be harder to understand
    • Typically cover a much smaller set of properties than the complete specification
    • All types, if specified, should be correct
  4. Aside: When they are applied

    [Diagram; labels: Specify types, Write tests, Write program; TDD, Gradual Typing]
  5. How do we choose?

    • Both test suites and types are used to prevent bugs
    • Both tests and types can be improved with more resources
    • We have a finite budget
    • We need a way to compare
  6. We know how to evaluate tests

    Code Coverage
    • Adequate coverage says something about correctness

        def factorial(n):
            if n == 0:
                return 1
            else:
                return n * factorial(n-1)

        > assert(factorial(0)==1)
        > assert(factorial(3)==6)

    Doesn't translate well to types
    • A program usually has a complete type specification

        def factorial(n : int) -> int:
            if n == 0:
                return 1
            else:
                return n * factorial(n-1)
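
A small sketch of what statement coverage does and does not tell us about this version of `factorial`, assuming the multiplication by `n` implied by the asserted values. The two asserts execute every statement, yet a negative argument never reaches the base case. The helper `fails_on_negative` is a hypothetical name introduced here for illustration.

```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


# These two checks execute both branches, i.e. every statement is covered.
assert factorial(0) == 1
assert factorial(3) == 6


def fails_on_negative() -> bool:
    """Return True if a negative input exhausts the recursion limit."""
    try:
        factorial(-1)
    except RecursionError:
        return True
    return False
```

Full statement coverage therefore says something about the inputs that were tried, but not about the ones that were not. The fully annotated variant on the slide has the same gap, since `-1` is a perfectly valid `int`.
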
  7. We improve types by refining them

    weak

        def factorial(n : Any) -> Any:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

    strong

        def factorial(n : Number) -> Number:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

    stronger

        def factorial(n : Integral) -> Integral:
            if n < 1:
                return 1
            else:
                return n * factorial(n-1)

    weak

        def hashtags(jr : Any) -> Any: ....

    strong

        def hashtags(jr : Dict[str, Any]) -> Tuple[int, List[str]]: ...

    stronger

        def hashtags(jr : Dict[str, List[Tuple[int, str]]]) -> Tuple[int, Any]: ...

    How much better is stronger compared to strong?
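
To make the refinement order concrete, here is a minimal sketch that asks which sample values each of the three annotations would admit. It uses run-time `isinstance` checks against the standard `numbers` classes as a stand-in; this is an assumption made for illustration and is not how mypy performs its static check. The helper `accepted_by` is a hypothetical name.

```python
from numbers import Integral, Number
from typing import Any


def accepted_by(annotation, value) -> bool:
    """Run-time approximation: would `value` fit `annotation`?"""
    return annotation is Any or isinstance(value, annotation)


candidates = [3, 2.5, "3"]

weak = [accepted_by(Any, v) for v in candidates]           # admits everything
strong = [accepted_by(Number, v) for v in candidates]      # rejects the string
stronger = [accepted_by(Integral, v) for v in candidates]  # rejects the float too
```

Each refinement admits a subset of the values admitted by the previous one, which is the sense in which an annotation gets "stronger".
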
  8. Tools

    • Language: Python
    • Annotations verified by MyPy for Python 3.6
    • Mutation testing tool: MutPy
    • Standard (traditional) mutation operators
  9. Subjects

    • twitter-graph (own project)
        • 83 LOC
        • 10/10 functions type annotated
        • 25 test cases with 99% statement coverage
    • w3lib
        • 369 LOC
        • 34/42 functions type annotated
        • 97 test cases with 94.7% statement coverage

    In both, type annotations were added after the test suite.
    Adding type annotations did not reveal any previously undetected faults.
  10. Methodology

    • Started with the strictest annotations available in MyPy
    • Progressively weakened the innermost type annotations
        • (int|str|...) -> Any
        • Container[Any*] -> Any

    Examples:

        def hashtags(jr : Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
        def hashtags(jr : Dict[str, List[Any]]) -> Tuple[int, Any]:
        def hashtags(jr : Dict[str, Any]) -> Any:
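
The weakening steps in the examples above can be reproduced mechanically. The following is a sketch under stated assumptions, not the procedure used in the study: annotations are handled as strings, Python 3.9 or later is assumed (for `ast.unparse` and the simplified subscript nodes), and one step collapses every innermost parameterised type to `Any`, with a bare type also becoming `Any`. The names `weaken` and `weakening_chain` are hypothetical.

```python
import ast


def _type_args(sub: ast.Subscript) -> list:
    """Return the type arguments of a subscript such as Dict[str, int]."""
    inner = sub.slice
    return list(inner.elts) if isinstance(inner, ast.Tuple) else [inner]


class _CollapseInnermost(ast.NodeTransformer):
    """Replace each subscript that contains no nested subscript with Any."""

    def visit_Subscript(self, node: ast.Subscript) -> ast.AST:
        if not any(isinstance(a, ast.Subscript) for a in _type_args(node)):
            return ast.Name(id="Any", ctx=ast.Load())
        self.generic_visit(node)  # descend; this node itself is kept
        return node


def weaken(annotation: str) -> str:
    """Apply one weakening step to an annotation given as source text."""
    tree = ast.parse(annotation, mode="eval")
    if not isinstance(tree.body, ast.Subscript):
        return "Any"  # a bare type such as int or str
    tree = _CollapseInnermost().visit(tree)
    return ast.unparse(tree.body)


def weakening_chain(annotation: str) -> list:
    """List the annotation followed by each weaker form, ending at Any."""
    chain = [annotation]
    while chain[-1] != "Any":
        chain.append(weaken(chain[-1]))
    return chain
```

Calling `weakening_chain("Dict[str, List[Tuple[int, str]]]")` yields the parameter annotations of the three `hashtags` signatures in order, followed by `Any`.
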
  11. Results Summary

    • The mutants killed by types are a strict subset of those killed by the test suite
    • No difference in mutation scores between stronger and weaker annotations in either project

        Project          Type   Test   Total
        twitter-graph      62    142     171
        w3lib              94    545     677
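
The counts above can be turned into mutation scores. This is a minimal sketch that assumes "Type" and "Test" are the numbers of mutants killed by the type checker and by the test suite, and that "Total" is the number of mutants generated; the slide does not spell this out.

```python
# Counts copied from the results table; the column meanings are an assumption.
results = {
    "twitter-graph": {"type": 62, "test": 142, "total": 171},
    "w3lib": {"type": 94, "test": 545, "total": 677},
}


def mutation_score(killed: int, total: int) -> float:
    """Fraction of mutants killed."""
    return killed / total


scores = {
    project: {
        "type": round(mutation_score(row["type"], row["total"]), 3),
        "test": round(mutation_score(row["test"], row["total"]), 3),
    }
    for project, row in results.items()
}
# scores["twitter-graph"] == {"type": 0.363, "test": 0.83}
# scores["w3lib"]         == {"type": 0.139, "test": 0.805}
```

Under that reading, types alone kill roughly 36% and 14% of the mutants, against roughly 83% and 81% for the test suites.
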
  12. Examples of Type Errors

    Operator Replacement

        - print("Hello" + " World")
        + print("Hello" - " World")

    Statement Deletion

          def square(x: int) -> int:
        -     y = x**2
        +     pass
              return y

    Self Deletion

          class A(object):
              def add(self, x: int) -> int:
        -         y = x + self.val
        +         y = x + val
                  return y

    Decorator Deletion

          class A(object):
              ...
        -     @staticmethod
              def mymethod(x: int) -> int:
                  return x
          ...
          a = A()
          a.mymethod()
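
Three of these mutants also fail as soon as the mutated line is executed, which fits the observation that the type checker's kills fall within the test suite's kills. The sketch below runs hand-written copies of the mutants; the harness `killed_by`, the constructor of `A`, and the sample arguments are assumptions added for illustration.

```python
def killed_by(mutant) -> str:
    """Run a mutant and name the exception that exposes it, if any."""
    try:
        mutant()
    except Exception as exc:
        return type(exc).__name__
    return "survived"


def operator_replacement():
    # Mutant of: "Hello" + " World"
    return "Hello" - " World"


def statement_deletion(x: int = 3) -> int:
    # Mutant of: y = x**2
    pass
    return y  # y is never assigned


class A(object):
    def __init__(self) -> None:
        self.val = 1  # assumed initial value

    def add(self, x: int) -> int:
        # Mutant of: y = x + self.val
        y = x + val
        return y


outcomes = [
    killed_by(operator_replacement),
    killed_by(statement_deletion),
    killed_by(lambda: A().add(2)),
]
```

Any test that merely executes the mutated statement would raise one of these exceptions, so such mutants give the type checker no chance to show an advantage over the tests.
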
  13. Discussion

    Mutants killed:
    • Type assertions ⊂ Test suite
    • Type errors ⊂ Semantic errors
    • Strength of type annotations in MyPy for Python 3.6
    • Weaker type assertions ~ Stronger type assertions
    • Mutants are targeted towards tests, so they are insufficient to detect differences in type assertions
  14. Failure to distinguish type assertions

    • Our mutation tools usually mutate the AST
    • Many simple errors that programmers make have a larger impact on the AST
    • A single-token mutation of the AST fails to reproduce them

        fn('Ex %s' % 1 * 2)    (fn (% 'Ex %s' (* 1 2)))
        fn('Ex %s' % 1 + 2)    (fn (+ (% 'Ex %s' 1) 2))
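
The effect of a one-token source change on the tree shape can be inspected with Python's own `ast` module. The renderer below is a hypothetical helper that prints the same prefix notation as the slide and handles only the node types needed here. One caveat: in Python's grammar `%` and `*` have equal precedence and associate to the left, so the sketch uses `**`, which binds tighter than `%`, to obtain the nested right-hand shape.

```python
import ast

_OPS = {ast.Add: "+", ast.Mult: "*", ast.Mod: "%", ast.Pow: "**"}


def sexpr(node: ast.AST) -> str:
    """Render a small expression tree in prefix (s-expression) form."""
    if isinstance(node, ast.Expression):
        return sexpr(node.body)
    if isinstance(node, ast.BinOp):
        op = _OPS[type(node.op)]
        return f"({op} {sexpr(node.left)} {sexpr(node.right)})"
    if isinstance(node, ast.Call):
        args = " ".join(sexpr(arg) for arg in node.args)
        return f"({sexpr(node.func)} {args})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)


def shape(source: str) -> str:
    """Parse one expression and return its prefix form."""
    return sexpr(ast.parse(source, mode="eval"))


plus = shape("fn('Ex %s' % 1 + 2)")    # (fn (+ (% 'Ex %s' 1) 2))
power = shape("fn('Ex %s' % 1 ** 2)")  # (fn (% 'Ex %s' (** 1 2)))
times = shape("fn('Ex %s' % 1 * 2)")   # (fn (* (% 'Ex %s' 1) 2))
```

Going from `plus` to `power` changes one token in the source but moves the `%` node from the inside of the tree to the outside, a change that replacing a single operator node in the AST cannot produce.
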
  15. Conclusions

    Two possibilities:
    • Current mutation operators are inadequate for measuring the quality of type annotations
        • Need operators that target types
        • Investigate source-based rather than AST-based mutants
    • Stronger type annotations aren't that helpful
        • Far-fetched, but worth investigating