Rahul Gopinath
March 13, 2017

# How Good Are Your Types?

Mutation workshop 2017 at Tokyo


## Transcript

1. How Good Are Your Types?
Rahul Gopinath
Eric Walkingshaw

2. Tests & Types
March 1, 2017
Means of reducing susceptibility of programs to faults

3. Tests & Types
Tests
• Concrete
• Probabilistic guarantee

    def factorial(n):
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

    assert(factorial(0)==1)
    assert(factorial(3)==6)

Types
• Abstract
• Absolute guarantee

    def factorial(n: int) -> int:
        if n < 1:
            return 1
        else:
            return n * factorial(n-1)

4. Tests & Types
Tests
• Verbose
• Typically easy to understand
• Generally aim to cover the complete specification
• Can have failing test cases
Types
• Terse
• Can be harder to understand
• Typically cover a much smaller set of properties than the complete specification
• All types, if specified, should be correct

5. Aside: When they are applied
Specify types Write tests
Write program

6. How do we choose?
• Both test suites and types are used to prevent bugs
• Both tests and types can be improved with more resources
• We have a finite budget
• We need a way to compare them

7. We know how to evaluate tests
Code Coverage

    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)

    assert(factorial(0)==1)
    assert(factorial(3)==6)

Doesn't translate well to types
• A program usually has a complete type specification

    def factorial(n: int) -> int:
        if n == 0:
            return 1
        else:
            return n * factorial(n-1)
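
Statement coverage for a test suite can be sketched with the standard library's `sys.settrace`. The helper name `executed_lines` is hypothetical and introduced only for this illustration; it is not part of any coverage tool. It records which lines of `factorial` each test input reaches.

```python
import sys

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

def executed_lines(func, *args):
    """Return the line offsets (relative to the def line) that func executes."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record 'line' events that belong to the function under test only.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return seen

base_case = executed_lines(factorial, 0)   # never reaches the recursive return
recursive = executed_lines(factorial, 3)   # reaches both return statements
print(sorted(base_case), sorted(recursive))
```

The input `0` alone leaves the recursive branch unexecuted, which is the kind of gap a coverage report exposes. A type annotation has no analogous notion of a "line not reached".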

8. We improve types by refining them
weak

    def factorial(n: Any) -> Any:
        if n < 1: return 1
        else: return n * factorial(n-1)

strong

    def factorial(n: Number) -> Number:
        if n < 1: return 1
        else: return n * factorial(n-1)

stronger

    def factorial(n: Integral) -> Integral:
        if n < 1: return 1
        else: return n * factorial(n-1)

weak

    def hashtags(jr: Any) -> Any:
        ...

strong

    def hashtags(jr: Dict[str, Any]) -> Tuple[int, List[str]]:
        ...

stronger

    def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, Any]:
        ...

How much better is stronger compared to strong?
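
One rough way to picture refinement is to count how many values each annotation admits. The sketch below is a runtime analogy only: a static checker such as MyPy does not work through `isinstance`, and both the helper `accepted` and the sample values are assumptions made for illustration.

```python
import numbers
from typing import Any

# Hypothetical sample inputs: a string, a float, and an int.
SAMPLES = ["3", 3.5, 3]

def accepted(annotation, values):
    """Values that a runtime isinstance check against `annotation` would admit."""
    if annotation is Any:
        return list(values)  # Any admits everything
    return [v for v in values if isinstance(v, annotation)]

weak = accepted(Any, SAMPLES)                   # all three samples
strong = accepted(numbers.Number, SAMPLES)      # the float and the int
stronger = accepted(numbers.Integral, SAMPLES)  # only the int
print(len(weak), len(strong), len(stronger))
```

Each refinement admits fewer values than the one before it. That ordering is what "stronger" means here, but it does not say how many more faults the stronger annotation catches.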

9. Idea! use mutation testing
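
Mutation testing in miniature: inject a small fault, then check whether the existing checks notice it. The sketch below uses only the standard `ast` module and a single hand-written operator that replaces `*` with `+`. It does not use MutPy's API, and the names `MultToAdd`, `load`, and `killed` are hypothetical.

```python
import ast

SOURCE = """
def factorial(n):
    if n < 1:
        return 1
    else:
        return n * factorial(n - 1)
"""

class MultToAdd(ast.NodeTransformer):
    """Hand-written arithmetic operator replacement: `*` becomes `+`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Mult):
            node.op = ast.Add()
        return node

def load(tree):
    """Compile a module AST and return the factorial function it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["factorial"]

def killed(func, tests):
    """A mutant is killed if any test fails or raises."""
    try:
        return not all(func(arg) == expected for arg, expected in tests)
    except Exception:
        return True

original = load(ast.parse(SOURCE))
mutant = load(ast.fix_missing_locations(MultToAdd().visit(ast.parse(SOURCE))))

weak_suite = [(0, 1)]            # only exercises the base case
full_suite = [(0, 1), (3, 6)]    # the two assertions from the earlier slides
print(killed(mutant, weak_suite), killed(mutant, full_suite))
```

The mutant survives the base-case test and is killed once `factorial(3) == 6` is added. The fraction of mutants killed is the mutation score used to compare suites.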

10. Evaluation

11. Tools
• Language: Python
• Annotations verified by MyPy for Python 3.6
• Mutation testing tool: MutPy

12. MutPy Operators

13. Subjects
• 83 LOC
• 10/10 Functions type annotated
• 25 test cases with 99% statement coverage
• w3lib
• 369 LOC
• 34/42 Functions type annotated
• 97 test cases with 94.7% statement coverage
In both projects, type annotations were added after the test suite
Adding type annotations did not reveal any previously undetected faults

14. Methodology
• Started with the strictest annotations available in MyPy
• Progressively weakened innermost type annotations
  • (int|str|...) -> Any
  • Container[Any*] -> Any
Examples:

    def hashtags(jr: Dict[str, List[Tuple[int, str]]]) -> Tuple[int, List[str]]:
    def hashtags(jr: Dict[str, List[Any]]) -> Tuple[int, Any]:
    def hashtags(jr: Dict[str, Any]) -> Any:
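
The weakening steps in these examples can be sketched as a small transformation over annotation strings. This is one reading of the examples, not the authors' tooling: each step collapses every innermost subscripted type to `Any`, and a bare type such as `int` goes straight to `Any`. The names `weaken` and `weakening_chain` are hypothetical, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

def has_nested_subscript(node):
    """True if a subscripted type contains another subscripted type."""
    return any(isinstance(child, ast.Subscript)
               for child in ast.walk(node) if child is not node)

class CollapseInnermost(ast.NodeTransformer):
    """Replace every innermost Container[...] with Any."""
    def visit_Subscript(self, node):
        if not has_nested_subscript(node):
            return ast.Name(id="Any", ctx=ast.Load())
        return self.generic_visit(node)

def weaken(annotation):
    """Apply one weakening step to an annotation given as a string."""
    tree = ast.parse(annotation, mode="eval")
    if not isinstance(tree.body, ast.Subscript):
        return "Any"  # (int|str|...) -> Any
    return ast.unparse(CollapseInnermost().visit(tree).body)

def weakening_chain(annotation):
    """All annotations from the given one down to Any."""
    chain = [annotation]
    while chain[-1] != "Any":
        chain.append(weaken(chain[-1]))
    return chain

for step in weakening_chain("Dict[str, List[Tuple[int, str]]]"):
    print(step)
```

Applied to the parameter and return annotations of `hashtags`, the chain reproduces the three signatures listed in the examples, then continues one step further to a bare `Any` for the parameter.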

15. Results

16. Results Summary
• The mutants killed by types are a strict subset of those killed by the test suite
• No difference in mutation scores between stronger and weaker annotations in either project
| Project | Type | Test | Total |
|---------|------|------|-------|
| w3lib   | 94   | 545  | 677   |
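
As a worked calculation, the w3lib row can be turned into mutation scores. This assumes that "Type" and "Test" are counts of mutants killed by the type checker and by the test suite, and that "Total" is the number of mutants generated. The slide does not spell this out, so treat the percentages as illustrative.

```python
def mutation_score(killed, total):
    """Percentage of mutants killed, rounded to one decimal place."""
    return round(100 * killed / total, 1)

# Figures from the w3lib row; the interpretation of the columns is an assumption.
W3LIB = {"type": 94, "test": 545, "total": 677}

type_score = mutation_score(W3LIB["type"], W3LIB["total"])
test_score = mutation_score(W3LIB["test"], W3LIB["total"])
print(type_score, test_score)
```

Under that reading, the type checker kills about 13.9% of the mutants and the test suite about 80.5%. Because the type kills are a subset of the test kills, combining the two does not raise the score above the test suite's.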

17. Examples of Type Errors
Operator Replacement

    - print("Hello" + " World")
    + print("Hello" - " World")

Statement Deletion

      def square(x: int) -> int:
    -     y = x**2
    +     pass
          return y

Self Deletion

      class A(object):
          def add(self, x: int) -> int:
    -         y = x + self.val
    +         y = x + val
              return y

Decorator Deletion

      class A(object):
          ...
    -     @staticmethod
          def mymethod(x: int) -> int:
              return x
      ...
      a = A()
      a.mymethod(1)
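
Each of these four mutants also fails at run time as soon as the mutated line executes, which is consistent with the type checker's kills falling inside the test suite's kills. The sketch below runs the mutated versions and records the exception each one raises. The class attribute `val`, the argument values, and the helper `outcome` are assumptions added so that the fragments are runnable.

```python
def outcome(thunk):
    """Name of the exception raised by calling thunk, or 'no error'."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return "no error"

def operator_replacement():
    return "Hello" - " World"      # mutant: + replaced by -

def square(x: int) -> int:
    pass                           # mutant: `y = x**2` deleted
    return y

class A(object):
    val = 1                        # hypothetical; the slide does not show where val is set

    def add(self, x: int) -> int:
        y = x + val                # mutant: `self.` deleted
        return y

    def mymethod(x: int) -> int:   # mutant: @staticmethod deleted
        return x

a = A()
results = {
    "Operator Replacement": outcome(operator_replacement),
    "Statement Deletion": outcome(lambda: square(3)),
    "Self Deletion": outcome(lambda: a.add(2)),
    "Decorator Deletion": outcome(lambda: a.mymethod(1)),
}
print(results)
```

Any test that executes the mutated line would therefore fail too, even without type annotations.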

18. Discussion
Mutants Killed:
• Type assertions ⊂ Test suite
• Type errors ⊂ Semantic errors
• Strength of type annotations in MyPy for Python 3.6
• Weaker type assertions ~ Stronger type assertions
• Mutants are targeted at tests, so they are insufficient to detect differences in type assertions

19. Failure to distinguish type assertions
• Our mutation tools usually mutate the AST
• Many simple errors that programmers make have a larger impact on the AST
• Single-token mutation of the AST fails to reproduce them

    fn('Ex %s' % 1 * 2)
    (fn
      (% 'Ex %s'
         (* 1 2)))

    fn('Ex %s' % 1 + 2)
    (fn
      (+ (% 'Ex %s' 1)
         2))
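
The effect of a one-token source change on tree shape can be checked with Python's own parser. The sketch below prints s-expressions in the notation used on this slide. It uses a hypothetical expression, `fn(1 + 2 * 3)`, because `+` and `*` differ in precedence in Python, whereas `%` and `*` share a precedence level there and group left to right. The helper names are assumptions, and `ast.Constant` is what the parser emits on Python 3.8 and later.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Mod: "%"}

def sexpr(node):
    """Render a small subset of Python expressions as an s-expression."""
    if isinstance(node, ast.Expression):
        return sexpr(node.body)
    if isinstance(node, ast.BinOp):
        return "({} {} {})".format(OPS[type(node.op)],
                                   sexpr(node.left), sexpr(node.right))
    if isinstance(node, ast.Call):
        return "({} {})".format(sexpr(node.func),
                                " ".join(sexpr(arg) for arg in node.args))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise ValueError("unsupported node: " + type(node).__name__)

def parse_sexpr(source):
    return sexpr(ast.parse(source, mode="eval"))

before = parse_sexpr("fn(1 + 2 * 3)")   # nested operation is the right operand
after = parse_sexpr("fn(1 * 2 * 3)")    # nested operation is the left operand
print(before)
print(after)
```

Changing the single token `+` to `*` in the source moves the nested operation from the right operand to the left. An AST-level operator swap applied to `(+ 1 (* 2 3))` would instead give `(* 1 (* 2 3))`, which keeps the old shape.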

20. Conclusions
Two possibilities
• Current mutation operators are inadequate for measuring the quality of type annotations
  • Need operators that target types
  • Investigate source-based rather than AST-based mutants
• Stronger type annotations aren't that helpful
  • Far-fetched, but worth investigating