PyConZA 2015: "Property-based testing with Hypothesis" by Jeremy Thurgood

Pycon ZA
October 02, 2015

Unit testing can be more effective and less tedious when you have an army of robot monkeys at your disposal. Why should humans have to worry about finding the particular combination of Turkish and Tengwar that crashes the serialiser, or the convoluted sequence of operations that corrupts the database?

> Hypothesis is a Python library for turning unit tests into generative tests,
> covering a far wider range of cases than you can manually. Rather than just
> testing for the things you already know about, Hypothesis goes out and
> actively hunts for bugs in your code. It usually finds them, and when it
> does it gives you simple and easy to read examples to demonstrate.
>
> -- Hypothesis 1.0 release announcement

Property-based testing lets you think about your tests in terms of general behaviour and invariant properties instead of getting lost in the details of individual examples. Good tools (such as Hypothesis) will explore quite complex combinations of test data and reduce them to minimal failing cases.

This talk will provide a practical introduction to property-based testing with Hypothesis, and show how you can use it to build more effective test suites with less effort.


Transcript

  1. PROPERTY-BASED TESTING WITH HYPOTHESIS: LET YOUR ARMY OF ROBOTS WRITE YOUR TESTS

    Jeremy Thurgood, PyconZA 2015
  2. AN EXAMPLE

    ```python
    class NaivePriorityQueue(object):
        """
        A priority queue moves the smallest item to the front.

        Naive implementation with O(N) `put()` and O(1) `get()`.
        """

        def __init__(self):
            self._items = []

        def __len__(self):
            return len(self._items)

        def put(self, item):
            """Add an item to the collection."""
            self._items.append(item)
            self._items.sort(reverse=True)

        def get(self):
            """Remove and return the smallest item."""
            return self._items.pop()
    ```
  3. HOW WE USUALLY WRITE TESTS

    ```python
    from naive_pqueue import NaivePriorityQueue


    def mkpq(items):
        """Create and fill a NaivePriorityQueue."""
        pq = NaivePriorityQueue()
        for item in items:
            pq.put(item)
        return pq


    def test_get_only():
        """If there's only one item, we get that."""
        pq = mkpq(["a"])
        assert pq.get() == "a"
        assert len(pq) == 0


    def test_get_both():
        """If there are two items, we get both in order."""
        pq = mkpq(["a", "b"])
        assert pq.get() == "a"
        assert pq.get() == "b"
        assert len(pq) == 0


    def test_put_only():
        """If the queue is empty, putting is trivial."""
        pq = NaivePriorityQueue()
        # ... (the rest of this test is cut off in the slide transcript)
    ```
  4. ISSUES WITH EXAMPLE-BASED TESTS

    Tedious to write. Lots of repetition. Painful to maintain. Focus on low-level details.

    ...but infinitely better than no tests at all.
  5. HOW WE WANT TO WRITE TESTS

    In a world made of unicorns and kittens and rainbows...

    ```python
    from magic import assert_correct
    from naive_pqueue import NaivePriorityQueue

    assert_correct(NaivePriorityQueue)
    ```

    ...but how does `assert_correct` know what's correct?
  6. DEFINING CORRECTNESS

    Let's define "correctness" for a priority queue.

    ```python
    def items_are_returned_in_priority_order(pq, items):
        """We always get the smallest item first."""
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        current = pq.get()
        while len(pq) > 0:
            prior, current = current, pq.get()
            assert prior <= current


    def all_items_are_returned_exactly_once(pq, items):
        """We always get every item exactly once."""
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        while len(pq) > 0:
            items.remove(pq.get())
        assert len(items) == 0
    ```
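Before bringing in a library, the two properties can be exercised by hand. This sketch (plain Python, no Hypothesis) copies the slide 2 queue and the slide 6 properties, and adds a hypothetical `BuggyPriorityQueue` that never sorts, so `get()` hands back the newest item. `holds()` is a made-up helper that reports whether a property passes for one input. Only the ordering property catches this particular bug; the exactly-once property still passes, which is one reason to state more than one property.

```python
class NaivePriorityQueue(object):
    """Copy of the slide 2 queue: sorted list, smallest item popped last."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def put(self, item):
        self._items.append(item)
        self._items.sort(reverse=True)

    def get(self):
        return self._items.pop()


class BuggyPriorityQueue(NaivePriorityQueue):
    """Hypothetical broken queue: forgets to sort, so get() is LIFO."""

    def put(self, item):
        self._items.append(item)


def items_are_returned_in_priority_order(pq, items):
    for item in items:
        pq.put(item)
    assert len(pq) == len(items)
    current = pq.get()
    while len(pq) > 0:
        prior, current = current, pq.get()
        assert prior <= current


def all_items_are_returned_exactly_once(pq, items):
    for item in items:
        pq.put(item)
    assert len(pq) == len(items)
    while len(pq) > 0:
        items.remove(pq.get())
    assert len(items) == 0


def holds(prop, queue_class, items):
    """True if prop passes for a fresh queue and a copy of items."""
    try:
        prop(queue_class(), list(items))
    except AssertionError:
        return False
    return True


sample = [3, 1, 2]
print(holds(items_are_returned_in_priority_order, NaivePriorityQueue, sample))  # True
print(holds(items_are_returned_in_priority_order, BuggyPriorityQueue, sample))  # False
print(holds(all_items_are_returned_exactly_once, BuggyPriorityQueue, sample))   # True
```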
  7. HOW PROPERTY-BASED TESTS WORK

    Focus on high-level requirements. Properties define behaviour. Randomly generated input. Failure case minimization.

    ...but no silver bullet.
  8. FOR REAL, WITH HYPOTHESIS

    ```python
    from hypothesis import given, strategies as st

    from naive_pqueue import NaivePriorityQueue


    @given(items=st.lists(st.integers(), min_size=1))
    def test_items_are_returned_in_priority_order(items):
        """We always get the smallest item first."""
        pq = NaivePriorityQueue()
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        current = pq.get()
        while len(pq) > 0:
            prior, current = current, pq.get()
            assert prior <= current


    @given(items=st.lists(st.integers()))
    def test_all_items_are_returned_exactly_once(items):
        """We always get every item exactly once."""
        pq = NaivePriorityQueue()
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        while len(pq) > 0:
            items.remove(pq.get())
        assert len(items) == 0
    ```
  9. ANATOMY OF A TEST

    `@given` turns a test into a property... that runs a bunch of times with random input... generated by the strategies you give it... and reports minimised failure examples.
  10. SIMPLE TEST

    ```python
    from hypothesis import given, strategies as st


    @given(st.floats())
    def test_additive_inverse(x):
        """Double additive inverse has no effect."""
        assert x == -(-x)
    ```

    ```text
    failtest_additive_inverse_nan.py F
    [Traceback elided]
    AssertionError: assert nan == --nan
    Falsifying example: test_additive_inverse(x=nan)
    ======================= 1 failed in 0.02 seconds =======================
    ```
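Hypothesis is right to complain here. Under IEEE 754 floating point, NaN compares unequal to every value, itself included, so `x == -(-x)` can never hold for NaN even though negation itself behaves fine. A quick standard-library check:

```python
import math

x = float("nan")
print(x == x)             # False: NaN is unequal to everything, itself included
print(x == -(-x))         # False, the same failure Hypothesis reported
print(math.isnan(-(-x)))  # True: negating NaN still gives NaN
```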
  11. ASSUME VALID INPUT

    We can assume our input isn't NaN.

    ```python
    import math

    from hypothesis import given, assume, strategies as st


    @given(st.floats())
    def test_additive_inverse(x):
        """Double additive inverse has no effect (except NaN)."""
        assume(not math.isnan(x))
        assert x == -(-x)
    ```

    ```text
    test_additive_inverse_assume.py .
    ======================= 1 passed in 0.06 seconds =======================
    ```
  12. SETTINGS

    Let's see what Hypothesis is actually doing.

    ```python
    from hypothesis import given, Settings, Verbosity, strategies as st

    with Settings(max_examples=5, verbosity=Verbosity.verbose):
        @given(st.integers(-10, 10))
        def test_additive_inverse(x):
            """Double additive inverse has no effect."""
            assert x == -(-x)

    if __name__ == "__main__":
        test_additive_inverse()
    ```

    ```text
    Trying example: test_additive_inverse(x=9)
    Trying example: test_additive_inverse(x=-8)
    Trying example: test_additive_inverse(x=4)
    Trying example: test_additive_inverse(x=3)
    Trying example: test_additive_inverse(x=-3)
    ```
  13. MINIMIZATION

    ```python
    from hypothesis import given, strategies as st


    @given(st.lists(st.integers()))
    def test_sum_less_than_42(numbers):
        assert sum(numbers) < 42
    ```

    ```text
    failtest_minimization.py F
    [Traceback elided]
    AssertionError: assert 42 < 42
     +  where 42 = sum([42])
    Falsifying example: test_sum_less_than_42(numbers=[42])
    ======================= 1 failed in 0.03 seconds =======================
    ```

    The example is (usually) the simplest failing input.
  14. MORE MINIMIZATION

    ```python
    from hypothesis import given, strategies as st


    @given(st.lists(st.integers()))
    def test_sum_less_than_42_nontrivial(numbers):
        if len(numbers) > 2:
            assert sum(numbers) < 42
    ```

    ```text
    failtest_minimization_nontrivial.py F
    [Traceback elided]
    AssertionError: assert 42 < 42
     +  where 42 = sum([0, 0, 42])
    Falsifying example: test_sum_less_than_42_nontrivial(numbers=[0, 0, 42])
    ======================= 1 failed in 0.05 seconds =======================
    ```

    Works for more complicated cases as well.
  15. STRATEGIES

    A strategy is a set of rules:

    - It knows how to generate values. (Of course.)
    - It knows how to simplify values. (Very important!)
    - It's composable. (Building blocks for complex data.)

    Built-in strategies are very clever, so yours can be simple.
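The three rules can be modelled with a tiny class. `ToyStrategy`, `toy_integers`, `toy_booleans` and `toy_tuples` are hypothetical, made up for illustration, and are not Hypothesis's API; the point is that a composite strategy can build both its generator and its simplifier out of its parts.

```python
import random


class ToyStrategy(object):
    """Hypothetical mini-strategy: a value generator plus a value simplifier."""

    def __init__(self, generate, simplify):
        self.generate = generate  # random.Random -> value
        self.simplify = simplify  # value -> list of simpler candidate values


def toy_integers(lo, hi):
    """Integers in [lo, hi]; 'simpler' means closer to zero (assumes lo <= 0 <= hi)."""
    def simplify(n):
        half = abs(n) // 2 * (1 if n > 0 else -1)
        return [m for m in dict.fromkeys([0, half]) if m != n]
    return ToyStrategy(lambda rnd: rnd.randint(lo, hi), simplify)


def toy_booleans():
    """False is considered simpler than True."""
    return ToyStrategy(lambda rnd: rnd.random() < 0.5,
                       lambda b: [False] if b else [])


def toy_tuples(*parts):
    """Composition: generation and simplification both reuse the parts."""
    def generate(rnd):
        return tuple(part.generate(rnd) for part in parts)

    def simplify(value):
        # Simplify one component at a time, leaving the others untouched.
        return [value[:i] + (simpler,) + value[i + 1:]
                for i, part in enumerate(parts)
                for simpler in part.simplify(value[i])]

    return ToyStrategy(generate, simplify)


pairs = toy_tuples(toy_integers(0, 10), toy_booleans())
print(pairs.generate(random.Random(0)))  # some (int, bool) pair
print(pairs.simplify((6, True)))         # [(0, True), (3, True), (6, False)]
```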
  16. SIMPLE STRATEGIES

    ```pycon
    >>> st.just("I'm so lonely.").example()
    "I'm so lonely."
    >>> pprint((st.text("abc", max_size=5) | st.none()).example())
    u'ccabc'
    >>> pprint((st.text("abc", max_size=5) | st.none()).example())
    None
    >>> pprint((st.text("abc", max_size=5) | st.none()).example())
    u''
    >>> st.tuples(st.integers(0, 10), st.booleans()).example()
    (2, True)
    >>> st.tuples(st.integers(0, 10), st.booleans()).example()
    (7, False)
    >>> st.integers(0, 10).map(lambda x: 2 ** x).example()
    1
    >>> st.integers(0, 10).map(lambda x: 2 ** x).example()
    512
    >>> st.integers(0, 10).map(lambda x: 2 ** x).example()
    8
    ```
  17. FLATMAP: SQUARE TEXT

    ```python
    from hypothesis.strategies import integers, text, lists


    def text_line(size):
        return text("1234567890", min_size=size, max_size=size)


    def square_lines(size):
        return lists(text_line(size), min_size=size, max_size=size)


    square_text = integers(1, 5).flatmap(square_lines).map("\n".join)
    ```

    ```pycon
    >>> print square_text.example()
    42
    44
    >>> print square_text.example()
    7
    >>> print square_text.example()
    5005
    8274
    9505
    0599
    ```
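What `flatmap` adds is dependence: the size drawn first decides which strategy produces the lines. The same shape can be sketched with the standard `random` module alone. The `draw_*` functions are hypothetical stand-ins for the strategies on the slide, and unlike Hypothesis they have no way to simplify a failing value.

```python
import random

DIGITS = "1234567890"


def draw_text_line(rnd, size):
    """Like text("1234567890", min_size=size, max_size=size)."""
    return "".join(rnd.choice(DIGITS) for _ in range(size))


def draw_square_lines(rnd, size):
    """Like lists(text_line(size), min_size=size, max_size=size)."""
    return [draw_text_line(rnd, size) for _ in range(size)]


def draw_square_text(rnd):
    size = rnd.randint(1, 5)              # integers(1, 5)
    lines = draw_square_lines(rnd, size)  # .flatmap(square_lines): uses size
    return "\n".join(lines)               # .map("\n".join)


rnd = random.Random(2015)
print(draw_square_text(rnd))
```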
  18. RECURSIVE: NESTED DICTS

    ```python
    from hypothesis.strategies import text, none, dictionaries, recursive


    def nest_dict(values):
        return dictionaries(text("abc", max_size=5), values)


    nested_dicts = recursive(text("def", max_size=5) | none(), nest_dict)
    ```

    ```pycon
    >>> pprint(nested_dicts.example())
    {}
    >>> pprint(nested_dicts.example())
    {u'': None, u'c': {}, u'cc': None, u'ccccc': {u'b': {}, u'bbb': {u'': None, u'a': None}, u'cb': {u'': None, u'a': None, u'b': None}}}
    >>> pprint(nested_dicts.example())
    {u'': {u'': None, u'a': None}, u'bbabb': {}}
    >>> pprint(nested_dicts.example())
    u'fd'
    ```
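`recursive()` takes a base strategy for leaves and a function that wraps any strategy in a container, which is why the last example on slide 18 can be a bare string. Below is a plain-`random` sketch of the same shape with hypothetical helper names. Its explicit `max_depth` cap is this sketch's own way of guaranteeing termination; Hypothesis bounds recursive data differently, through the `max_leaves` argument of `recursive()`.

```python
import random


def draw_leaf(rnd):
    """Like text("def", max_size=5) | none()."""
    if rnd.random() < 0.5:
        return None
    return "".join(rnd.choice("def") for _ in range(rnd.randint(0, 5)))


def draw_key(rnd):
    """Like text("abc", max_size=5)."""
    return "".join(rnd.choice("abc") for _ in range(rnd.randint(0, 5)))


def draw_nested(rnd, max_depth=3):
    """A leaf, or a dict whose values are drawn by this same function."""
    if max_depth == 0 or rnd.random() < 0.5:
        return draw_leaf(rnd)
    return {draw_key(rnd): draw_nested(rnd, max_depth - 1)
            for _ in range(rnd.randint(0, 3))}


def depth(value):
    """Number of nested dict levels (0 for a leaf)."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((depth(v) for v in value.values()), default=0)


rnd = random.Random(18)
for _ in range(3):
    print(draw_nested(rnd))
```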
  19. WHAT MAKES A GOOD PROPERTY?

    - True for (almost) all input.
    - Does not duplicate the code under test.
    - Describes the code under test in a meaningful way.
    - Not too expensive to check.

    Harder to write than example-based tests, but a lot more useful.
  20. IDEMPOTENCE

    f(f(x)) = f(x)

    ```python
    from hypothesis import given, strategies as st


    @given(number=st.floats(-1e300, 1e300), decimals=st.integers(0, 5))
    def test_round_idempotent(number, decimals):
        """
        Rounding an already-rounded number is a no-op.
        """
        rounded = round(number, decimals)
        assert rounded == round(rounded, decimals)
    ```

    It's already been done.
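The same check applies to plenty of standard-library functions. As a sketch without Hypothesis (inputs come from a seeded `random.Random`, and `is_idempotent_on` is a hypothetical helper), sorting, order-preserving de-duplication and `str.strip` pass, while reversing a list does not:

```python
import random


def is_idempotent_on(f, values):
    """Check f(f(x)) == f(x) for every x in values."""
    return all(f(f(x)) == f(x) for x in values)


rnd = random.Random(20)
int_lists = [[rnd.randint(-5, 5) for _ in range(rnd.randint(0, 8))]
             for _ in range(200)]
strings = ["".join(rnd.choice(" \tab") for _ in range(rnd.randint(0, 8)))
           for _ in range(200)]

print(is_idempotent_on(sorted, int_lists))                              # True
print(is_idempotent_on(lambda xs: list(dict.fromkeys(xs)), int_lists))  # True
print(is_idempotent_on(str.strip, strings))                             # True
print(is_idempotent_on(lambda xs: xs[::-1], [[1, 2]]))                  # False
```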
  21. ROUND TRIP

    f⁻¹(f(x)) = x

    ```python
    from hypothesis import given, strategies as st

    import myjson


    def nest_data(st_values):
        return st.lists(st_values) | st.dictionaries(st.text(), st_values)


    def nested_data():
        return st.none() | st.integers() | st.floats(-1e308, 1e308) | st.text()


    @given(st.recursive(nested_data(), nest_data))
    def test_json_round_trip(data):
        """
        Encoding a thing as JSON and decoding it again returns the same thing.

        (This will fail for input that contains tuples, but we don't test that.)
        """
        assert data == myjson.loads(myjson.dumps(data))
    ```

    There and back again.
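The slide relies on the talk's own `myjson` module, which isn't shown. To see the round-trip pattern run end to end, this sketch checks the standard library's `json` and `base64` instead, drawing random data from a seeded `random.Random` (`draw_text` and `draw_data` are hypothetical helpers). The last line illustrates the docstring's caveat about tuples.

```python
import base64
import json
import random


def draw_text(rnd):
    return "".join(rnd.choice('aé€\n"\\') for _ in range(rnd.randint(0, 6)))


def draw_data(rnd, depth=3):
    """Random JSON-compatible data: None, ints, finite floats, text, lists, dicts."""
    kind = rnd.randrange(6 if depth > 0 else 4)  # no containers at depth 0
    if kind == 0:
        return None
    if kind == 1:
        return rnd.randint(-10**6, 10**6)
    if kind == 2:
        return rnd.uniform(-1e300, 1e300)  # finite; NaN would break ==
    if kind == 3:
        return draw_text(rnd)
    if kind == 4:
        return [draw_data(rnd, depth - 1) for _ in range(rnd.randint(0, 3))]
    return {draw_text(rnd): draw_data(rnd, depth - 1)
            for _ in range(rnd.randint(0, 3))}


rnd = random.Random(21)
for _ in range(300):
    data = draw_data(rnd)
    assert json.loads(json.dumps(data)) == data  # decode(encode(x)) == x
    raw = bytes(rnd.randrange(256) for _ in range(rnd.randint(0, 20)))
    assert base64.b64decode(base64.b64encode(raw)) == raw

print(json.loads(json.dumps((1, 2))))  # [1, 2]: tuples come back as lists
```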
  22. INVARIANCE

    g(f(x)) = g(x)

    ```python
    from hypothesis import given, strategies as st


    @given(st.randoms(), st.lists(st.integers()))
    def test_something_invariant(rand, items):
        """
        The set of items in a collection does not change when shuffling.
        """
        orig_items = list(items)
        rand.shuffle(items)
        for item in items:
            orig_items.remove(item)
        assert orig_items == []
    ```

    Some things never change.
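Other invariants work the same way. This sketch (standard library only, with a seeded `random.Random` in place of `st.randoms()`) compares the multiset of items via `collections.Counter`, a shorter equivalent of the slide's remove-one-by-one loop, and also checks that the sum survives shuffling. The final line shows why that sum check is only safe for integers: floating-point addition is not associative, so reordering can change the result.

```python
import random
from collections import Counter

rnd = random.Random(22)
for _ in range(300):
    items = [rnd.randint(-100, 100) for _ in range(rnd.randint(0, 10))]
    shuffled = list(items)
    rnd.shuffle(shuffled)
    # g = Counter (the multiset of items) is unchanged by shuffling or sorting.
    assert Counter(shuffled) == Counter(items)
    assert Counter(sorted(items)) == Counter(items)
    # g = sum is also invariant here, because integer addition is exact.
    assert sum(shuffled) == sum(items)

# Float addition is not associative, so the same check on floats can fail:
print((1e16 + 1.0) - 1e16, (1e16 - 1e16) + 1.0)  # 0.0 1.0
```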
  23. TRANSFORMATION

    f(g(x)) = g'(f(x))

    ```python
    from string import ascii_uppercase as uc, ascii_lowercase as lc, digits

    from hypothesis import given, strategies as st

    # sampled_from() needs a sequence; on Python 3, zip() returns an iterator.
    st_upperlower = st.sampled_from(list(zip(uc + digits, lc + digits)))


    @given(st_upperlower, st.text())
    def test_uppercase_transformation(upperlower, text):
        """
        Appending a lowercase character before uppercasing is equivalent to
        appending its uppercase equivalent after uppercasing.
        """
        (upper, lower) = upperlower
        assert text.upper() + upper == (text + lower).upper()
    ```

    All roads lead to Rome.
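Another instance of f(g(x)) = g'(f(x)), using only built-ins: with f as `sorted` and g as negating every element, sorting the negated list gives the same result as sorting first, negating, and then reversing (so g' is "negate, then reverse"). `negate_all` is a hypothetical helper and the inputs come from a seeded `random.Random`:

```python
import random


def negate_all(xs):
    return [-x for x in xs]


rnd = random.Random(23)
for _ in range(300):
    xs = [rnd.randint(-50, 50) for _ in range(rnd.randint(0, 10))]
    # f = sorted, g = negate_all, g' = negate_all followed by reversing.
    assert sorted(negate_all(xs)) == negate_all(sorted(xs))[::-1]
```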
  24. VERIFICATION

    P(f(x)) is true

    ```python
    from hypothesis import given, strategies as st


    @given(st.text("abcd \t\r\n"))
    def test_no_tabs_after_expandtabs(text):
        """
        Expanding tabs replaces all tab characters.
        """
        assert "\t" not in text.expandtabs()
    ```

    (e) None of the above.
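Verification only needs a predicate on the output, which makes it easy to apply to library code. A standard-library sketch with a hypothetical helper `is_ordered` and seeded random inputs: whatever the input, the result of `sorted()` is in order, and `str.split()` with no arguments never yields an empty piece or one containing whitespace.

```python
import random


def is_ordered(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


rnd = random.Random(24)
for _ in range(300):
    xs = [rnd.randint(-9, 9) for _ in range(rnd.randint(0, 10))]
    text = "".join(rnd.choice("ab \t\r\n") for _ in range(rnd.randint(0, 12)))
    # P(sorted(x)): the output is in order.
    assert is_ordered(sorted(xs))
    # P(text.split()): no piece is empty or contains whitespace.
    assert all(piece and not any(c.isspace() for c in piece)
               for piece in text.split())
```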
  25. ORACLE

    f(x) = g(x)

    ```python
    import json

    from hypothesis import given, strategies as st

    import myjson


    def nest_data(st_values):
        return st.lists(st_values) | st.dictionaries(st.text(), st_values)


    def nested_data():
        return st.none() | st.integers() | st.floats(-1e308, 1e308) | st.text()


    @given(st.recursive(nested_data(), nest_data).map(json.dumps))
    def test_json_oracle(json_text):
        """
        The new thing behaves the same as the old thing.
        """
        assert json.loads(json_text) == myjson.loads(json_text)
    ```

    No, not the database.
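When a trusted implementation already exists, it can serve as the oracle for a new one. In this sketch the "new thing" is a hypothetical hand-written `insertion_sort`, checked against the built-in `sorted()` on seeded random lists:

```python
import random


def insertion_sort(xs):
    """Hypothetical 'new thing' under test: a hand-written insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:  # walk left past larger items
            i -= 1
        out.insert(i, x)
    return out


rnd = random.Random(25)
for _ in range(300):
    xs = [rnd.randint(-20, 20) for _ in range(rnd.randint(0, 12))]
    assert insertion_sort(xs) == sorted(xs)  # sorted() is the trusted oracle
```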
  26. BACK TO OUR PRIORITY QUEUE

    ```python
    class NaivePriorityQueue(object):
        """
        A priority queue moves the smallest item to the front.

        Naive implementation with O(N) `put()` and O(1) `get()`.
        """

        def __init__(self):
            self._items = []

        def __len__(self):
            return len(self._items)

        def put(self, item):
            """Add an item to the collection."""
            self._items.append(item)
            self._items.sort(reverse=True)

        def get(self):
            """Remove and return the smallest item."""
            return self._items.pop()
    ```
  27. MUCH BETTER PRIORITY QUEUE

    ```python
    class FastPriorityQueue(object):
        """
        A priority queue moves the smallest item to the front.

        Heap-based implementation with O(log N) `put()` and `get()`.
        """

        def __init__(self):
            self._heap = []

        def __len__(self):
            return len(self._heap)

        def put(self, item):
            """
            Add an item to the collection.
            """
            self._heap.append(item)
            self._swim(len(self))

        def get(self):
            """
            Remove and return the smallest item in the collection.
            """
            self._swap(1, len(self))
            item = self._heap.pop()
            # ... (the rest of this method and the _swim/_swap helpers
            # are cut off in the slide transcript)
    ```
  28. PRIORITY QUEUE STATEFUL TESTS

    ```python
    from hypothesis import assume, strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, rule

    from fast_pqueue import FastPriorityQueue


    class PriorityQueueStateMachine(RuleBasedStateMachine):
        def __init__(self):
            super(PriorityQueueStateMachine, self).__init__()
            self.items = []
            self.pq = FastPriorityQueue()

        @rule(item=st.integers())
        def check_put(self, item):
            assert len(self.pq) == len(self.items)
            self.pq.put(item)
            self.items.append(item)

        @rule()
        def check_get(self):
            assert len(self.pq) == len(self.items)
            assume(len(self.items) > 0)
            item = min(self.items)
            self.items.remove(item)
            assert self.pq.get() == item


    TestPriorityQueue = PriorityQueueStateMachine.TestCase
    ```
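Without Hypothesis, the same idea can be approximated by replaying random sequences of puts and gets against a plain list used as a model. The slide 27 helpers `_swim` and `_swap` are not shown, so this sketch tests a hypothetical `HeapqPriorityQueue` built on the standard `heapq` module instead, plus a deliberately broken `StackQueue`. `find_failing_steps` is a made-up stand-in for `RuleBasedStateMachine`: it only draws a get when the model is non-empty (rather than discarding steps with `assume`), and it reports the raw failing sequence without any of the minimization shown on slide 29.

```python
import heapq
import random


class HeapqPriorityQueue(object):
    """Hypothetical heap-based queue using heapq: O(log N) put() and get()."""

    def __init__(self):
        self._heap = []

    def __len__(self):
        return len(self._heap)

    def put(self, item):
        heapq.heappush(self._heap, item)

    def get(self):
        return heapq.heappop(self._heap)  # smallest item


class StackQueue(HeapqPriorityQueue):
    """Hypothetical broken queue: get() returns the newest item, not the smallest."""

    def put(self, item):
        self._heap.append(item)

    def get(self):
        return self._heap.pop()


def find_failing_steps(make_queue, steps=20, runs=200, seed=28):
    """Replay random put/get sequences against a plain-list model.

    Returns the steps of the first run that misbehaves, or None.
    """
    rnd = random.Random(seed)
    for _ in range(runs):
        pq, model, log = make_queue(), [], []
        for _ in range(steps):
            if model and rnd.random() < 0.5:
                log.append("check_get()")
                expected = min(model)
                model.remove(expected)
                if pq.get() != expected:
                    return log
            else:
                item = rnd.randint(-10, 10)
                log.append("check_put(item=%d)" % item)
                pq.put(item)
                model.append(item)
            if len(pq) != len(model):
                return log
    return None


print(find_failing_steps(HeapqPriorityQueue))  # None
print(find_failing_steps(StackQueue))          # steps ending in 'check_get()'
```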
  29. FAILURE REPORT

    If we use `max` instead of `min` in the test, it fails:

    ```text
    failtest_pqueue_stateful.py F
    [Traceback elided]
    AssertionError: assert 0 == 1
    Step #1: check_put(item=0)
    Step #2: check_put(item=1)
    Step #3: check_get()
    ======================= 1 failed in 0.05 seconds =======================
    ```

    We get a minimized* failing sequence of operations.

    *But not necessarily minimal.
  30. THE END OF THE SLIDES

    Now you get to ask me hard questions. (Or I can show some more code.)