How Good Are Your Types?

Mutation workshop 2017 at Tokyo

Rahul Gopinath

March 13, 2017

Transcript

  1. How Good Are Your Types?
    Rahul Gopinath
    Eric Walkingshaw

  2. Tests & Types
    March 1, 2017
    Means of reducing susceptibility of programs to faults

  3. Tests & Types
    Tests
    • Concrete
    • Probabilistic guarantee
    def factorial(n):
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)
    > assert(factorial(0)==1)
    > assert(factorial(3)==6)
    Types
    • Abstract
    • Absolute guarantee
    def factorial(n : int) -> int:
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

  4. Tests & Types
    Tests
    • Verbose
    • Typically easy to understand
    • Generally aim to cover the complete specification
    • Can have failing test cases
    Types
    • Terse
    • Can be harder to understand
    • Typically cover a much smaller set of properties than the complete specification
    • All types, if specified, should be correct

  5. Aside: When they are applied
    Specify types Write tests
    TDD Gradual Typing
    Write program

  6. How do we choose?
    • Both test suites and types are used to prevent bugs
    • Both tests and types can be improved with more resources
    • We have a finite budget
    • We need a way to compare

  7. We know how to evaluate tests
    Code Coverage
    • Adequate coverage says something about correctness
    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)
    > assert(factorial(0)==1)
    > assert(factorial(3)==6)
    Doesn't translate well to types
    • A program usually has a complete type specification
    def factorial(n : int) -> int:
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)

  8. We improve types by refining them
    weak
    def factorial(n : Any) -> Any:
        if n < 1: return 1
        else: return n * factorial(n-1)
    strong
    def factorial(n : Number) -> Number:
        if n < 1: return 1
        else: return n * factorial(n-1)
    stronger
    def factorial(n : Integral) -> Integral:
        if n < 1: return 1
        else: return n * factorial(n-1)
    weak
    def hashtags(jr : Any) -> Any:
        ...
    strong
    def hashtags(jr : Dict[str, Any]) -> Tuple[int, List[str]]:
        ...
    stronger
    def hashtags(jr : Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
        ...
    How much better is stronger compared to strong?
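One way to see what refinement buys is sketched below, assuming the annotations are checked by a static checker such as mypy. The names `factorial_any`, `factorial_int` and `runtime_error_for` are made up for this illustration. Both versions fail in the same way at run time when given a string; only the narrower annotation gives a checker grounds to reject such a call before the program runs.

```python
from typing import Any


def factorial_any(n: Any) -> Any:
    # Weak annotation: Any is compatible with every type, so a checker
    # has nothing to object to at any call site.
    if n < 1:
        return 1
    return n * factorial_any(n - 1)


def factorial_int(n: int) -> int:
    # Stronger annotation: a direct call such as factorial_int("3") is
    # the kind of call a checker like mypy is expected to flag.
    if n < 1:
        return 1
    return n * factorial_int(n - 1)


def runtime_error_for(fn, arg):
    """Return the name of the exception raised by fn(arg), or None."""
    try:
        fn(arg)
    except Exception as exc:
        return type(exc).__name__
    return None


# Annotations are not enforced at run time, so both versions behave the same.
weak_result = runtime_error_for(factorial_any, "3")
strong_result = runtime_error_for(factorial_int, "3")
```

The difference between the two annotations is therefore only visible to the checker, which is why the talk needs some other measure to compare them.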

  9. Idea! Use mutation testing

  10. Evaluation

  11. Tools
    • Language: Python
    • Annotations verified by MyPy for Python 3.6
    • Mutation testing tool: MutPy
    • Standard (traditional) mutation operators
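To make "standard mutation operators" concrete, here is a minimal sketch of one such operator (arithmetic operator replacement) built on Python's `ast` module. It is not MutPy's implementation; the class `SwapAddSub`, the helper functions and the tiny `add` subject are all hypothetical.

```python
import ast

SOURCE = """
def add(a, b):
    return a + b
"""


class SwapAddSub(ast.NodeTransformer):
    """Arithmetic operator replacement: turn every `+` into `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(tree):
    """Compile a module AST and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


def suite_passes(namespace):
    """Run a tiny test suite; True means every assertion held."""
    try:
        assert namespace["add"](2, 3) == 5
        assert namespace["add"](0, 0) == 0
    except AssertionError:
        return False
    return True


original = ast.parse(SOURCE)
mutant = ast.fix_missing_locations(SwapAddSub().visit(ast.parse(SOURCE)))

original_passes = suite_passes(load(original))
# A mutant is "killed" when the suite that passes on the original fails on it.
mutant_killed = not suite_passes(load(mutant))
```

In the study, a failed type check plays the same role as a failing test here: either one counts as a kill.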

  12. MutPy Operators

  13. Subjects
    • twitter-graph (own project)
        • 83 LOC
        • 10/10 functions type annotated
        • 25 test cases with 99% statement coverage
    • w3lib
        • 369 LOC
        • 34/42 functions type annotated
        • 97 test cases with 94.7% statement coverage
    In both, type annotations were added after the test suite
    Adding type annotations did not reveal any previously undetected faults

  14. Methodology
    • Started with the strictest annotations available in MyPy
    • Progressively weakened innermost type annotations
        • (int|str|...) -> Any
        • Container[Any*] -> Any
    Examples:
    def hashtags(jr : Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
    def hashtags(jr : Dict[str, List[Any]]) -> Tuple[int, Any]:
    def hashtags(jr : Dict[str, Any]) -> Any:
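The weakening steps in the examples above can be sketched as a small function. This is an illustrative reading of the rule, not the procedure actually used in the study: the function name `weaken` is made up, and the sketch assumes Python 3.9+ built-in generics (`dict`, `list`, `tuple`) in place of the `typing.Dict`-style names on the slide, so that the rebuilt annotations compare equal.

```python
from typing import Any, get_args, get_origin


def weaken(annotation):
    """Apply one weakening step to the innermost parameterised types."""
    args = get_args(annotation)
    if not args:
        # Leaf such as int, str or Any: nothing to collapse at this level.
        return annotation
    if not any(get_args(arg) for arg in args):
        # Innermost container (all arguments are leaves): collapse to Any.
        return Any
    # Otherwise rebuild the container around its weakened arguments.
    origin = get_origin(annotation)
    return origin[tuple(weaken(arg) for arg in args)]


strict_param = dict[str, list[tuple[int, str]]]
step1 = weaken(strict_param)   # dict[str, list[Any]]
step2 = weaken(step1)          # dict[str, Any]
step3 = weaken(step2)          # Any

strict_return = tuple[int, list[str]]
return_step1 = weaken(strict_return)   # tuple[int, Any]
return_step2 = weaken(return_step1)    # Any
```

Each call removes one level of nesting, which reproduces the three `hashtags` signatures shown above.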

  15. Results

  16. Results Summary
    • The mutants killed by types are a strict subset of those killed by the test suite
    • No difference in mutation scores between stronger and weaker annotations in either project
    Project          Killed by types   Killed by tests   Total mutants
    twitter-graph    62                142               171
    w3lib            94                545               677
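The counts can be turned into mutation scores with a few lines of arithmetic. The sketch below assumes the three columns are mutants killed by type checking, mutants killed by the test suite, and total mutants generated, and it uses the plain definition score = killed / total without adjusting for equivalent mutants. The name `mutation_score` is made up for this illustration.

```python
# Counts copied from the results table; the column meanings are an assumption.
results = {
    "twitter-graph": {"type": 62, "test": 142, "total": 171},
    "w3lib": {"type": 94, "test": 545, "total": 677},
}


def mutation_score(killed: int, total: int) -> float:
    """Percentage of mutants killed, rounded to one decimal place."""
    return round(100.0 * killed / total, 1)


scores = {
    project: {
        "type": mutation_score(row["type"], row["total"]),
        "test": mutation_score(row["test"], row["total"]),
    }
    for project, row in results.items()
}

for project, row in scores.items():
    print(f"{project}: types {row['type']}%, tests {row['test']}%")
# twitter-graph: types 36.3%, tests 83.0%
# w3lib: types 13.9%, tests 80.5%
```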

  17. Examples of Type Errors
    Operator Replacement
    - print("Hello" + " World")
    + print("Hello" - " World")
    Statement Deletion
      def square(x: int) -> int:
    -     y = x**2
    +     pass
          return y
    Self Deletion
      class A(object):
          def add(self, x: int) -> int:
    -         y = x + self.val
    +         y = x + val
              return y
    Decorator Deletion
      class A(object):
          ...
    -     @staticmethod
          def mymethod(x: int) -> int:
              return x
      ...
      a = A()
      a.mymethod()

  18. Discussion
    Mutants Killed:
    • Type assertions ⊂ Test suite
        • Type errors ⊂ Semantic errors
        • Strength of type annotations in MyPy for Python 3.6
    • Weaker type assertions ~ Stronger type assertions
        • Mutants are targeted towards tests, so they are insufficient to detect differences in type assertions
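A small illustration of why kills by types can fall inside kills by tests: the mutant below is a semantic error that is not a type error. The names `factorial_mutant` and `suite_passes` are made up for this sketch, and the claim about the checker is the expected behaviour of a tool such as mypy, not a recorded run.

```python
def factorial(n: int) -> int:
    if n < 1:
        return 1
    return n * factorial(n - 1)


def factorial_mutant(n: int) -> int:
    # Operator replacement: `*` became `+`. The body still maps int to int,
    # so the annotation gives a checker no reason to complain.
    if n < 1:
        return 1
    return n + factorial_mutant(n - 1)


def suite_passes(fn) -> bool:
    """The two assertions from the earlier slides, as a tiny test suite."""
    return fn(0) == 1 and fn(3) == 6


original_ok = suite_passes(factorial)            # suite passes on the original
mutant_killed = not suite_passes(factorial_mutant)  # suite fails on the mutant
```

The test suite kills this mutant because the result for 3 is no longer 6, while the type annotation is still satisfied.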

  19. Failure to distinguish type assertions
    • Our mutation tools usually mutate the AST
    • Many simple errors that programmers make have a larger impact in the AST
    • Single token mutation of AST fails to reproduce them
    fn('Ex %s' % 1 * 2)
    (fn
      (% 'Ex %s'
         (* 1 2)))
    fn('Ex %s' % 1 + 2)
    (fn
      (+ (% 'Ex %s' 1)
         2))

  20. Conclusions
    Two possibilities
    • Current mutation operators are inadequate for measuring the quality of type annotations
        • Need operators that target types
        • Investigate source-based rather than AST-based mutants
    • Stronger type annotations aren't that helpful
        • Farfetched, but worth investigating
