PyConZA 2015: "Property-based testing with Hypothesis" by Jeremy Thurgood

PROPERTY-BASED TESTING WITH HYPOTHESIS
LET YOUR ARMY OF ROBOTS WRITE YOUR TESTS
Jeremy Thurgood, PyConZA 2015

PART 1: WHAT IS PROPERTY-BASED TESTING?

AN EXAMPLE

    class NaivePriorityQueue(object):
        """
        A priority queue moves the smallest item to the front.

        Naive implementation with O(N) `put()` and O(1) `get()`.
        """

        def __init__(self):
            self._items = []

        def __len__(self):
            return len(self._items)

        def put(self, item):
            """Add an item to the collection."""
            self._items.append(item)
            self._items.sort(reverse=True)

        def get(self):
            """Remove and return the smallest item."""
            return self._items.pop()

HOW WE USUALLY WRITE TESTS

    from naive_pqueue import NaivePriorityQueue


    def mkpq(items):
        """Create and fill a NaivePriorityQueue."""
        pq = NaivePriorityQueue()
        for item in items:
            pq.put(item)
        return pq


    def test_get_only():
        """If there's only one item, we get that."""
        pq = mkpq(["a"])
        assert pq.get() == "a"
        assert len(pq) == 0


    def test_get_both():
        """If there are two items, we get both in order."""
        pq = mkpq(["a", "b"])
        assert pq.get() == "a"
        assert pq.get() == "b"
        assert len(pq) == 0


    def test_put_only():
        """If the queue is empty, putting is trivial."""
        pq = NaivePriorityQueue()

ISSUES WITH EXAMPLE-BASED TESTS

Tedious to write.
Lots of repetition.
Painful to maintain.
Focus on low-level details.

... But infinitely better than no tests at all.

HOW WE WANT TO WRITE TESTS

In a world made of unicorns and kittens and rainbows...

    from magic import assert_correct
    from naive_pqueue import NaivePriorityQueue

    assert_correct(NaivePriorityQueue)

...but how does assert_correct know what's correct?

DEFINING CORRECTNESS

Let's define "correctness" for a priority queue.

    def items_are_returned_in_priority_order(pq, items):
        """We always get the smallest item first."""
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        current = pq.get()
        while len(pq) > 0:
            prior, current = current, pq.get()
            assert prior <= current


    def all_items_are_returned_exactly_once(pq, items):
        """We always get every item exactly once."""
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        while len(pq) > 0:
            items.remove(pq.get())
        assert len(items) == 0

HOW PROPERTY-BASED TESTS WORK

Focus on high-level requirements.
Properties define behaviour.
Randomly generated input.
Failure case minimization.

... But no silver bullet.
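The ideas on this slide can be sketched without any library. What follows is a hand-rolled illustration, not how Hypothesis is implemented: `random_int_list`, `drains_in_sorted_order`, and `check_property` are names invented here, the queue is copied from the earlier slide, and the fixed seed is only there to make the run repeatable.

```python
import random


class NaivePriorityQueue(object):
    """Same behaviour as the queue from the earlier slide."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def put(self, item):
        self._items.append(item)
        self._items.sort(reverse=True)

    def get(self):
        return self._items.pop()


def random_int_list(rng, max_size=10):
    """Generate a random list of small integers (hypothetical helper)."""
    size = rng.randint(0, max_size)
    return [rng.randint(-100, 100) for _ in range(size)]


def drains_in_sorted_order(items):
    """Property: draining the queue yields the items in ascending order."""
    pq = NaivePriorityQueue()
    for item in items:
        pq.put(item)
    drained = [pq.get() for _ in range(len(items))]
    return drained == sorted(items)


def check_property(prop, runs=200, seed=0):
    """Run `prop` on many random inputs; return the first failing input or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        items = random_int_list(rng)
        if not prop(items):
            return items
    return None


counterexample = check_property(drains_in_sorted_order)
print(counterexample)  # prints None: no counterexample was found
```

This sketch covers only the "randomly generated input" part. It reports the first failing input as-is, with no minimization.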

FOR REAL, WITH HYPOTHESIS

    from hypothesis import given, strategies as st

    from naive_pqueue import NaivePriorityQueue


    @given(items=st.lists(st.integers(), min_size=1))
    def test_items_are_returned_in_priority_order(items):
        """We always get the smallest item first."""
        pq = NaivePriorityQueue()
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        current = pq.get()
        while len(pq) > 0:
            prior, current = current, pq.get()
            assert prior <= current


    @given(items=st.lists(st.integers()))
    def test_all_items_are_returned_exactly_once(items):
        """We always get every item exactly once."""
        pq = NaivePriorityQueue()
        for item in items:
            pq.put(item)
        assert len(pq) == len(items)
        while len(pq) > 0:
            items.remove(pq.get())
        assert len(items) == 0

PART 2: HYPOTHESIS BASICS

ANATOMY OF A TEST

@given turns a test into a property ...
... that runs a bunch of times with random input ...
... generated by the strategies you give it ...
... and reports minimised failure examples.

SIMPLE TEST

    from hypothesis import given, strategies as st


    @given(st.floats())
    def test_additive_inverse(x):
        """Double additive inverse has no effect."""
        assert x == -(-x)

    failtest_additive_inverse_nan.py F
    [Traceback elided]
    AssertionError: assert nan == --nan
    Falsifying example: test_additive_inverse(x=nan)
    ======================= 1 failed in 0.02 seconds =======================

ASSUME VALID INPUT

We can assume our input isn't NaN.

    import math

    from hypothesis import given, assume, strategies as st


    @given(st.floats())
    def test_additive_inverse(x):
        """Double additive inverse has no effect (except NaN)."""
        assume(not math.isnan(x))
        assert x == -(-x)

    test_additive_inverse_assume.py .
    ======================= 1 passed in 0.06 seconds =======================

SETTINGS

Let's see what Hypothesis is actually doing.

    from hypothesis import given, Settings, Verbosity, strategies as st


    with Settings(max_examples=5, verbosity=Verbosity.verbose):
        @given(st.integers(-10, 10))
        def test_additive_inverse(x):
            """Double additive inverse has no effect."""
            assert x == -(-x)


    if __name__ == "__main__":
        test_additive_inverse()

    Trying example: test_additive_inverse(x=9)
    Trying example: test_additive_inverse(x=-8)
    Trying example: test_additive_inverse(x=4)
    Trying example: test_additive_inverse(x=3)
    Trying example: test_additive_inverse(x=-3)
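The `Settings` context manager on this slide reflects the Hypothesis API at the time of the talk. In more recent Hypothesis releases the same configuration is normally written with the lowercase `settings` decorator. The sketch below assumes such a release is installed; it is an adaptation of the slide's test, not the speaker's original code.

```python
from hypothesis import given, settings, Verbosity, strategies as st


# Assumption: a Hypothesis version that provides the lowercase
# `settings` decorator (the slide uses the older `Settings` class).
@settings(max_examples=5, verbosity=Verbosity.verbose)
@given(st.integers(-10, 10))
def test_additive_inverse(x):
    """Double additive inverse has no effect."""
    assert x == -(-x)


if __name__ == "__main__":
    test_additive_inverse()
```

The examples that get tried are chosen by Hypothesis, so the values printed in verbose mode will generally differ from the ones on the slide.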

MINIMIZATION

    from hypothesis import given, strategies as st


    @given(st.lists(st.integers()))
    def test_sum_less_than_42(numbers):
        assert sum(numbers) < 42

    failtest_minimization.py F
    [Traceback elided]
    AssertionError: assert 42 < 42
     +  where 42 = sum([42])
    Falsifying example: test_sum_less_than_42(numbers=[42])
    ======================= 1 failed in 0.03 seconds =======================

The example is (usually) the simplest failing input.

MORE MINIMIZATION

    from hypothesis import given, strategies as st


    @given(st.lists(st.integers()))
    def test_sum_less_than_42_nontrivial(numbers):
        if len(numbers) > 2:
            assert sum(numbers) < 42

    failtest_minimization_nontrivial.py F
    [Traceback elided]
    AssertionError: assert 42 < 42
     +  where 42 = sum([0, 0, 42])
    Falsifying example: test_sum_less_than_42_nontrivial(numbers=[0, 0, 42])
    ======================= 1 failed in 0.05 seconds =======================

Works for more complicated cases as well.
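To give a feel for what minimization involves, here is a deliberately naive greedy shrinker for lists of non-negative integers, applied to the simpler `sum(numbers) < 42` property. It is a toy under stated assumptions, not Hypothesis's actual algorithm: `fails` and `shrink` are names made up for this sketch.

```python
def fails(numbers):
    """True when the property `sum(numbers) < 42` is violated."""
    return sum(numbers) >= 42


def shrink(numbers, fails):
    """Greedily simplify a failing list for as long as it keeps failing."""
    current = list(numbers)
    improved = True
    while improved:
        improved = False
        # Pass 1: try dropping each element in turn.
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current, improved = candidate, True
                break
        if improved:
            continue
        # Pass 2: try lowering each positive element by one.
        for i, value in enumerate(current):
            if value > 0:
                candidate = current[:i] + [value - 1] + current[i + 1:]
                if fails(candidate):
                    current, improved = candidate, True
                    break
    return current


print(shrink([3, 50, 7], fails))  # prints [42]
print(shrink([30, 12], fails))    # prints [30, 12]: stuck in a local minimum
```

The second call shows why a greedy approach like this one is not enough on its own: no single deletion or decrement keeps `[30, 12]` failing, so the toy never reaches `[42]`.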

PART 3: GENERATING VALUES

STRATEGIES

A strategy is a set of rules:
It knows how to generate values. (Of course.)
It knows how to simplify values. (Very important!)
It's composable. (Building blocks for complex data.)

Built-in strategies are very clever so yours can be simple.

SIMPLE STRATEGIES

    >>> st.just("I'm so lonely.").example()
    "I'm so lonely."

    >>> pprint((st.text("abc", max_size=5) | st.none()).example())
    u'ccabc'
    >>> pprint((st.text("abc", max_size=5) | st.none()).example())
    None
    >>> pprint((st.text("abc", max_size=5) | st.none()).example())
    u''

    >>> st.tuples(st.integers(0, 10), st.booleans()).example()
    (2, True)
    >>> st.tuples(st.integers(0, 10), st.booleans()).example()
    (7, False)

    >>> st.integers(0, 10).map(lambda x: 2 ** x).example()
    1
    >>> st.integers(0, 10).map(lambda x: 2 ** x).example()
    512
    >>> st.integers(0, 10).map(lambda x: 2 ** x).example()
    8

FLATMAP: SQUARE TEXT

    from hypothesis.strategies import integers, text, lists


    def text_line(size):
        return text("1234567890", min_size=size, max_size=size)


    def square_lines(size):
        return lists(text_line(size), min_size=size, max_size=size)


    square_text = integers(1, 5).flatmap(square_lines).map("\n".join)

    >>> print square_text.example()
    42
    44
    >>> print square_text.example()
    7
    >>> print square_text.example()
    5005
    8274
    9505
    0599
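What `flatmap` buys here is dependent generation: one drawn value (the size) decides how the remaining values are drawn. The same shape can be sketched with the standard library's `random` module. This is an analogy with hypothetical helper functions, not a Hypothesis strategy, and it has no shrinking.

```python
import random


def text_line(rng, size):
    """A string of exactly `size` random digits."""
    return "".join(rng.choice("1234567890") for _ in range(size))


def square_text(rng):
    """Draw a size first, then let that size shape everything drawn after it."""
    # Roughly integers(1, 5): the first, independent draw.
    size = rng.randint(1, 5)
    # Roughly flatmap(square_lines): later draws depend on `size`.
    lines = [text_line(rng, size) for _ in range(size)]
    # Roughly map("\n".join): a plain transformation of the result.
    return "\n".join(lines)


rng = random.Random(42)
print(square_text(rng))
```

Every value produced this way has as many lines as each line has characters, which is the "square" constraint that an independent pair of draws could not guarantee.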

RECURSIVE: NESTED DICTS

    from hypothesis.strategies import text, none, dictionaries, recursive


    def nest_dict(values):
        return dictionaries(text("abc", max_size=5), values)


    nested_dicts = recursive(text("def", max_size=5) | none(), nest_dict)

    >>> pprint(nested_dicts.example())
    {}
    >>> pprint(nested_dicts.example())
    {u'': None,
     u'c': {},
     u'cc': None,
     u'ccccc': {u'b': {},
                u'bbb': {u'': None, u'a': None},
                u'cb': {u'': None, u'a': None, u'b': None}}}
    >>> pprint(nested_dicts.example())
    {u'': {u'': None, u'a': None}, u'bbabb': {}}
    >>> pprint(nested_dicts.example())
    u'fd'

PART 4: WRITING PROPERTIES

WHAT MAKES A GOOD PROPERTY?

True for (almost) all input.
Does not duplicate the code under test.
Describes the code under test in a meaningful way.
Not too expensive to check.

Harder than example-based tests, but a lot more useful.

IDEMPOTENCE

f( f(x) ) = f(x)

    from hypothesis import given, strategies as st


    @given(number=st.floats(-1e300, 1e300), decimals=st.integers(0, 5))
    def test_round_idempotent(number, decimals):
        """
        Rounding an already-rounded number is a no-op.
        """
        rounded = round(number, decimals)
        assert rounded == round(rounded, decimals)

It's already been done.

ROUND TRIP

f⁻¹( f(x) ) = x

    from hypothesis import given, strategies as st

    import myjson


    def nest_data(st_values):
        return st.lists(st_values) | st.dictionaries(st.text(), st_values)


    def nested_data():
        return st.none() | st.integers() | st.floats(-1e308, 1e308) | st.text()


    @given(st.recursive(nested_data(), nest_data))
    def test_json_round_trip(data):
        """
        Encoding a thing as JSON and decoding it again returns the same thing.

        (This will fail for input that contains tuples, but we don't test that.)
        """
        assert data == myjson.loads(myjson.dumps(data))

There and back again.
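The same round-trip property can be tried against the standard library's `json` module, standing in here for the slide's `myjson`, with a hand-written generator. `random_json_value` and `round_trips` are hypothetical helpers for this sketch rather than part of Hypothesis, and unlike a real strategy the generator does no shrinking.

```python
import json
import random
import string


def random_text(rng, alphabet, max_size):
    """A short random string over `alphabet`."""
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_size)))


def random_json_value(rng, depth=0):
    """Build a random JSON-compatible value (hypothetical helper)."""
    kinds = ["none", "int", "float", "text"]
    # Only allow containers near the top so that recursion always terminates.
    if depth < 3:
        kinds += ["list", "dict"]
    kind = rng.choice(kinds)
    if kind == "none":
        return None
    if kind == "int":
        return rng.randint(-10**6, 10**6)
    if kind == "float":
        return rng.uniform(-1e6, 1e6)
    if kind == "text":
        return random_text(rng, string.ascii_letters, 5)
    if kind == "list":
        return [random_json_value(rng, depth + 1) for _ in range(rng.randint(0, 3))]
    # Dictionary keys must be strings for the round trip to be exact.
    keys = [random_text(rng, "abc", 3) for _ in range(rng.randint(0, 3))]
    return {key: random_json_value(rng, depth + 1) for key in keys}


def round_trips(data):
    """Property: decoding the encoded value gives back the original."""
    return json.loads(json.dumps(data)) == data


rng = random.Random(2015)
values = [random_json_value(rng) for _ in range(500)]
failures = [value for value in values if not round_trips(value)]
print(len(failures))  # prints 0
```

As the slide's docstring warns, tuples break the property: `round_trips((1, 2))` is false because JSON decodes the array back into a list.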

INVARIANCE

g( f(x) ) = g(x)

    from hypothesis import given, strategies as st


    @given(st.randoms(), st.lists(st.integers()))
    def test_something_invariant(rand, items):
        """
        The set of items in a collection does not change when shuffling.
        """
        orig_items = list(items)
        rand.shuffle(items)
        for item in items:
            orig_items.remove(item)
        assert orig_items == []

Some things never change.

TRANSFORMATION

f( g(x) ) = g'( f(x) )

    from string import ascii_uppercase as uc, ascii_lowercase as lc, digits

    from hypothesis import given, strategies as st

    # list() keeps this a sequence on Python 3, where zip() returns an iterator.
    st_upperlower = st.sampled_from(list(zip(uc + digits, lc + digits)))


    @given(st_upperlower, st.text())
    def test_uppercase_transformation(upperlower, text):
        """
        Appending a lowercase character before uppercasing is equivalent to
        appending its uppercase equivalent after uppercasing.
        """
        (upper, lower) = upperlower
        assert text.upper() + upper == (text + lower).upper()

All roads lead to Rome.

VERIFICATION

P( f(x) ) is true

    from hypothesis import given, strategies as st


    @given(st.text("abcd \t\r\n"))
    def test_no_tabs_after_expandtabs(text):
        """
        Expanding tabs replaces all tab characters.
        """
        assert "\t" not in text.expandtabs()

(e) None of the above.

ORACLE

f(x) = g(x)

    import json

    from hypothesis import given, strategies as st

    import myjson


    def nest_data(st_values):
        return st.lists(st_values) | st.dictionaries(st.text(), st_values)


    def nested_data():
        return st.none() | st.integers() | st.floats(-1e308, 1e308) | st.text()


    @given(st.recursive(nested_data(), nest_data).map(json.dumps))
    def test_json_oracle(json_text):
        """
        The new thing behaves the same as the old thing.
        """
        assert json.loads(json_text) == myjson.loads(json_text)

No, not the database.

PART 5: STATEFUL TESTS

BACK TO OUR PRIORITY QUEUE

    class NaivePriorityQueue(object):
        """
        A priority queue moves the smallest item to the front.

        Naive implementation with O(N) `put()` and O(1) `get()`.
        """

        def __init__(self):
            self._items = []

        def __len__(self):
            return len(self._items)

        def put(self, item):
            """Add an item to the collection."""
            self._items.append(item)
            self._items.sort(reverse=True)

        def get(self):
            """Remove and return the smallest item."""
            return self._items.pop()

MUCH BETTER PRIORITY QUEUE

    class FastPriorityQueue(object):
        """
        A priority queue moves the smallest item to the front.

        Heap-based implementation with O(log N) `put()` and `get()`.
        """

        def __init__(self):
            self._heap = []

        def __len__(self):
            return len(self._heap)

        def put(self, item):
            """
            Add an item to the collection.
            """
            self._heap.append(item)
            self._swim(len(self))

        def get(self):
            """
            Remove and return the smallest item in the collection.
            """
            self._swap(1, len(self))
            item = self._heap.pop()

PRIORITY QUEUE STATEFUL TESTS

    from hypothesis import assume, strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, rule

    from fast_pqueue import FastPriorityQueue


    class PriorityQueueStateMachine(RuleBasedStateMachine):
        def __init__(self):
            super(PriorityQueueStateMachine, self).__init__()
            self.items = []
            self.pq = FastPriorityQueue()

        @rule(item=st.integers())
        def check_put(self, item):
            assert len(self.pq) == len(self.items)
            self.pq.put(item)
            self.items.append(item)

        @rule()
        def check_get(self):
            assert len(self.pq) == len(self.items)
            assume(len(self.items) > 0)
            item = min(self.items)
            self.items.remove(item)
            assert self.pq.get() == item


    TestPriorityQueue = PriorityQueueStateMachine.TestCase
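The idea behind the state machine, running random operations against the real object and a simple model and comparing them at every step, can also be sketched with the standard library alone. Because the slide's `FastPriorityQueue` is only partly visible, the sketch substitutes a hypothetical `HeapqPriorityQueue` built on `heapq`. `run_random_steps` is likewise invented for illustration and, unlike `RuleBasedStateMachine`, does not minimize failing sequences.

```python
import heapq
import random


class HeapqPriorityQueue(object):
    """Hypothetical stand-in for FastPriorityQueue, backed by heapq."""

    def __init__(self):
        self._heap = []

    def __len__(self):
        return len(self._heap)

    def put(self, item):
        heapq.heappush(self._heap, item)

    def get(self):
        return heapq.heappop(self._heap)


def run_random_steps(rng, steps=50):
    """Apply random put/get steps, checking the queue against a list model."""
    pq = HeapqPriorityQueue()
    model = []    # plain list playing the role of self.items on the slide
    history = []  # operations so far, attached to any assertion failure
    for _ in range(steps):
        assert len(pq) == len(model), history
        if model and rng.random() < 0.5:
            # Like check_get: the model says which item must come out next.
            expected = min(model)
            model.remove(expected)
            history.append("get()")
            assert pq.get() == expected, history
        else:
            # Like check_put: add the same item to both queue and model.
            item = rng.randint(-100, 100)
            history.append("put(%d)" % item)
            pq.put(item)
            model.append(item)
    return history


rng = random.Random(0)
for _ in range(100):
    run_random_steps(rng)
print("all sequences passed")
```

Swapping `min` for `max` in the model makes the assertion fail, and the attached history then plays the role of the step list in a Hypothesis failure report, only without minimization.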

FAILURE REPORT

If we use max instead of min in the test, it fails:

    failtest_pqueue_stateful.py F
    [Traceback elided]
    AssertionError: assert 0 == 1
    Step #1: check_put(item=0)
    Step #2: check_put(item=1)
    Step #3: check_get()
    ======================= 1 failed in 0.05 seconds =======================

We get a minimized* failing sequence of operations.

*but not necessarily minimal

THE END OF THE SLIDES

Now you get to ask me hard questions.
(Or I can show some more code.)
