
CUFP 2015: Modeling state transitions with specification-based random testing

What if you thought about tests only in terms of properties and counterexamples? Properties that assert what must fail and what must succeed. Counterexamples to a set of properties that can “shrink” to smaller failures and be reasoned about more easily. Properties and counterexamples are the foundation of QuickCheck, a tool that generates tests, even over concurrent and non-deterministic code.

The difficult component of most real-world approaches to generative testing is understanding the bounds and requirements surrounding a problem/feature/application. Using Erlang’s QuickCheck implementation, we’ll walk through an example that models a continuous, side-effecting, hashtree-based synchronization mechanism, called Active Anti-Entropy (AAE), as an abstract state machine. By being able to query the (Erlang) process state and compare it against our model state, we can ensure that our system matches its intended specification, which is a whole lot more important than tests being green.
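
As a minimal, hypothetical illustration of what such a property looks like in Erlang QuickCheck (not taken from the talk; the module and property names are invented for this sketch):

      %% Hypothetical example, not from the talk: a property stating that
      %% reversing a list twice returns the original list.
      -module(reverse_eqc).
      -include_lib("eqc/include/eqc.hrl").
      -compile(export_all).

      prop_double_reverse() ->
          ?FORALL(Xs, list(int()),
                  lists:reverse(lists:reverse(Xs)) =:= Xs).

      %% eqc:quickcheck(reverse_eqc:prop_double_reverse()) generates random
      %% lists of integers; any failing input would be shrunk to a minimal
      %% counterexample before being reported.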

Zeeshan Lakhani

September 05, 2015

Transcript

  1. “It's really hard to explain that test infrastructure for distributed systems is actually a fantastically deep and cool problem.” Jay Kreps, 05/15/2015 [1]

  2. curl “$RIAK_HOST/search/query/cufp?wt=json&q=name_s:lj_*”
      • distributed across a cluster of nodes
      • all index entries for a given riak object are co-located on the same physical machine
      • b/c of replication, not all nodes need to be queried, only a coverage set [4]

  3. “A liveness property guarantees that ‘something good eventually happens’; for example, all requests eventually receive a response.” Bailis & Ghodsi, “Eventual consistency today: limitations, extensions, and beyond,” Communications of the ACM #56 [5]

  4. • “To ensure convergence, replicas must exchange information with one another about which writes they have seen”
      • This exchange is called anti-entropy
      • read-repair is an anti-entropy mechanism to repair out-of-date replicas
      optimistic replication and resolution [6, 7, 8]

  5. • background process using/storing merkle trees to keep divergent and missing data in sync
      • disk-based: issues with in-memory trees containing billions of persistent keys; build a tree once
      • real-time updates - liveness
      • non-blocking - can’t affect incoming write-rate
      riak’s AAE (active anti-entropy) [9]

  6. • riak search / yokozuna iterates over entropy data containing {bucket_type, bucket}/key and object hash
      • riak search / yokozuna trees always repair entries remote_missing… key-value data is canonical

  7. • a complete binary tree with an n-bit value associated with each node
      • each internal node value is the result of a hash of the node values of its children (n is the number of bits returned by the hash function)
      merkle trees [10]

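As a rough sketch (not from the talk) of that definition: an internal node’s value is a hash over its children’s values. The module name and the choice of SHA-1 below are illustrative assumptions only.

      %% Hypothetical sketch: the n-bit value of an internal merkle-tree node
      %% is a hash over its children's values (SHA-1 here, so n = 160).
      -module(merkle_sketch).
      -export([leaf/1, node_hash/2]).

      %% hash of a leaf's raw data
      leaf(Data) ->
          crypto:hash(sha, Data).

      %% hash of an internal node, derived from its two children's hashes
      node_hash(LeftHash, RightHash) ->
          crypto:hash(sha, <<LeftHash/binary, RightHash/binary>>).
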
  8.–13. (image-only slides; footnotes [11], [4], [12])

  14. %% @doc - Perform BAD_is_member action
      -spec bad_is_member(list(), non_neg_integer()) -> boolean().
      bad_is_member(S, N) ->
          lists:member(N, lists:sublist(S, 5)).
      shrinking

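As an illustration (hypothetical, not from the deck) of the kind of property that exposes this bug: membership of a just-added element should always hold, but bad_is_member/2 only inspects the first five elements, so QuickCheck finds failing lists and shrinks them.

      %% Hypothetical property, not from the talk: an element appended to a
      %% list should be reported as a member, but bad_is_member/2 only checks
      %% the first five elements, so lists of length >= 5 can fail.
      prop_added_is_member() ->
          ?FORALL({S, N}, {list(nat()), nat()},
                  bad_is_member(S ++ [N], N)).
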
  15. shrinking …
      stateful_sm:is_member([5, 0, 5, 1, 4], 3) -> false
      stateful_sm:is_member([5, 0, 5, 1, 4], 4) -> true
      stateful_sm:is_member([5, 0, 5, 1, 4], 2) -> false
      stateful_sm:add([5, 0, 5, 1, 4], 3) -> [3, 5, 0, 5, 1, 4]
      stateful_sm:is_member([3, 5, 0, 5, 1, 4], 2) -> false
      stateful_sm:add([3, 5, 0, 5, 1, 4], 0) -> [0, 3, 5, 0, 5, 1, 4]
      stateful_sm:add([0, 3, 5, 0, 5, 1, 4], 2) -> [2, 0, 3, 5, 0, 5, 1, 4]
      stateful_sm:add([2, 0, 3, 5, 0, 5, 1, 4], 0) -> [0, 2, 0, 3, 5, 0, 5, 1, 4]
      stateful_sm:is_member([0, 2, 0, 3, 5, 0, 5, 1, 4], 4) -> false
      Reason: false
      Shrinking xxxxx....x..(6 times)
      [{set,{var,1},{call,stateful_sm,add,[[5,1,4],0]}}]

  16. my generation
      vclock() ->
          ?LET(VclockSym, vclock_sym(), eval(VclockSym)).

      vclock_sym() ->
          ?LAZY(oneof([
              {call, vclock, fresh, []},
              ?LETSHRINK([Clock], [vclock_sym()],
                         {call, ?MODULE, increment,
                          [noshrink(binary(4)), nat(), Clock]})
          ])).

      not_empty(G) ->
          ?SUCHTHAT(X, G, X /= [] andalso X /= <<>>).

  17. increment(Actor, Count, Vclock) ->
          lists:foldl(fun vclock:increment/2, Vclock,
                      lists:duplicate(Count, Actor)).

      riak_object() ->
          ?LET({{Bucket, Key}, Vclock, Value},
               {bkey(), vclock(), binary()},
               riak_object:set_vclock(
                   riak_object:new(Bucket, Key, Value), Vclock)).

      bkey() ->
          {non_blank_string(), %% bucket
           non_blank_string()}. %% key

  18. riak_object() ->
          ?LET({{Bucket, Key}, Vclock, Value},
               {bkey(), vclock(), binary()},
               riak_object:set_vclock(
                   riak_object:new(Bucket, Key, Value), Vclock)).
      a generated riak object

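One way (a sketch, not shown in the deck) to see what this generator produces is to sample it from the Erlang shell; eqc_gen:sample/1 prints a handful of generated values, such as the object on the next slide.

      %% Sketch: sampling the riak_object() generator in the shell.
      1> eqc_gen:sample(eqc_util:riak_object()).
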
  19. a generated riak object
      {r_object,<<"|plVWx&F">>,<<"?#sjiGS|">>,
       [{r_content,{dict,0,16,16,8,80,48,
                    {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
         <<"t+">>}],
       [],
       {dict,1,16,16,8,80,48,
        {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
        {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[[clean|true]],[]}}},
       undefined}

  20. • commands - symbolic calls to run during test sequences
      • symbolic variables - generated during test generation
      • dynamic values - generated during test execution
      • next_state - operates during test generation and execution
      • preconditions - ensure a logical precedence between operations; check commands during generation and are used in shrinking
      • postcondition - a predicate called during test execution (that must hold), given the dynamic state before the call
      • aggregate - collects a list of values in each test and shows the distribution of list elements
      a few terms (a minimal skeleton using these callbacks is sketched below)

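As a rough, hypothetical sketch of how those pieces fit together in an eqc_statem module (not the talk’s model): assume the system under test is a trivial counter process with start/0, stop/0, incr/0, and read/0; the counter module and counter_eqc are invented for illustration.

      %% Hypothetical eqc_statem skeleton (not the talk's model). The system
      %% under test is an imaginary counter process; the model state is the
      %% integer we believe the counter holds.
      -module(counter_eqc).
      -include_lib("eqc/include/eqc.hrl").
      -include_lib("eqc/include/eqc_statem.hrl").
      -compile(export_all).

      initial_state() -> 0.

      %% generation: pick a symbolic call to run next in the test sequence
      command(_S) ->
          oneof([{call, counter, incr, []},
                 {call, counter, read, []}]).

      %% preconditions constrain generation and are re-checked while shrinking
      precondition(_S, _Call) -> true.

      %% next_state runs during both test generation and test execution
      next_state(S, _V, {call, counter, incr, []}) -> S + 1;
      next_state(S, _V, {call, counter, read, []}) -> S.

      %% postconditions compare the real result against the model at execution time
      postcondition(S, {call, counter, read, []}, Res) -> Res =:= S;
      postcondition(_S, _Call, _Res) -> true.

      prop_counter() ->
          ?FORALL(Cmds, commands(?MODULE),
                  begin
                      counter:start(),
                      {H, S, Res} = run_commands(?MODULE, Cmds),
                      counter:stop(),
                      pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok)
                  end).
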
  21. -record(state, {yz_idx_tree,
                      kv_idx_tree,
                      yz_idx_objects = dict:new(),
                      kv_idx_objects = dict:new(),
                      trees_updated = false,
                      both = []}).

      %% Initialize State
      initial_state() -> #state{}.
      the model’s state record [13]

  22. • we start 1 riak key-value idx-tree process and 1 yokozuna idx-tree process
      • this happens per vnode (each vnode is responsible for a partition)
      • each process contains multiple hashtrees due to preflist overlap
      • some nodes contain data not part of the preflist; otherwise, the trees would always be divergent

  23. %% We want to constrain all data to fit in the last portion of our
      %% hash space so that it maps to partition 0.
      hashfun({_Bucket, _Key}=ID) ->
          %% Top valid hash value
          Top = (1 bsl 160) - 1,
          %% Calculate partition size
          PartitionSize = chash:ring_increment(?RING_SIZE),
          %% Generate an integer hash
          Hash = intify_sha(riak_core_util:chash_std_keyfun(ID)),
          %% Map the hash to 1/?RING_SIZE of the full hash space
          SmallHash = Hash rem PartitionSize,
          %% Force the hash into the last 1/?RING_SIZE block
          Top - SmallHash.

  24. insert_kv_tree_pre(S, _Args) ->
          S#state.kv_idx_tree /= undefined.

      insert_kv_tree_command(S) ->
          {call, ?MODULE, insert_kv_tree,
           [insert_method(), eqc_util:riak_object(), S#state.kv_idx_tree]}.

      insert_kv_tree_next(S, _V, [_, RObj, _]) ->
          {ok, TreeData} = dict:find(?TEST_INDEX_N, S#state.kv_idx_objects),
          S#state{kv_idx_objects=dict:store(?TEST_INDEX_N,
                                            set_treedata(RObj, TreeData),
                                            S#state.kv_idx_objects),
                  trees_updated=false}.
      insert into the key-val hashtree

  25. insert_kv_tree(Method, RObj, {ok, TreePid}) ->
          {Bucket, Key} = eqc_util:get_bkey_from_object(RObj),
          Items = [{object, {Bucket, Key}, RObj}],
          case Method of
              sync -> riak_kv_index_hashtree:insert(Items, [], TreePid);
              async -> riak_kv_index_hashtree:async_insert(Items, [], TreePid)
          end.

      insert_kv_tree_post(_S, _Args, _Res) ->
          true.

  26. insert_yz_tree_command(S) ->
          {call, ?MODULE, insert_yz_tree,
           [insert_method(), eqc_util:riak_object(), S#state.yz_idx_tree]}.

      insert_yz_tree_next(S, _V, [_, RObj, _]) ->
          {ok, TreeData} = dict:find(?TEST_INDEX_N, S#state.yz_idx_objects),
          S#state{yz_idx_objects=dict:store(?TEST_INDEX_N,
                                            set_treedata(RObj, TreeData),
                                            S#state.yz_idx_objects),
                  trees_updated=false}.

      insert_yz_tree(Method, RObj, {ok, TreePid}) ->
          BKey = eqc_util:get_bkey_from_object(RObj),
          yz_index_hashtree:insert(Method, ?TEST_INDEX_N, BKey,
                                   yz_kv:hash_object(RObj), TreePid, []).
      insert into the search hashtree

  27. -spec insert_both(sync|async, obj(), {ok, tree()}, {ok, tree()}) -> {ok, ok}.
      insert_both(Method, RObj, YZOkTree, KVOkTree) ->
          {insert_yz_tree(Method, RObj, YZOkTree),
           insert_kv_tree(Method, RObj, KVOkTree)}.

  28. (image-only slide) [4]

  29. update_pre(S, _Args) ->
          S#state.yz_idx_tree /= undefined andalso S#state.kv_idx_tree /= undefined.

      update_command(S) ->
          {call, ?MODULE, update, [S#state.yz_idx_tree, S#state.kv_idx_tree]}.

      update_next(S, _Value, _Args) ->
          S#state{trees_updated=true}.

      update({ok, YZTreePid}, {ok, KVTreePid}) ->
          yz_index_hashtree:update(?TEST_INDEX_N, YZTreePid),
          riak_kv_index_hashtree:update(?TEST_INDEX_N, KVTreePid),
          ok.

      update_post(S, _Args, _Res) ->
          ...
          eq(ModelKVKeyCount, RealKVKeyCount) and eq(ModelYZKeyCount, RealYZKeyCount).
      update the hashtrees

  30. (image-only slide) [4]

  31. compare_pre(S, _Args) ->
          S#state.yz_idx_tree /= undefined andalso
          S#state.kv_idx_tree /= undefined andalso
          S#state.trees_updated.

      compare_command(S) ->
          {call, ?MODULE, compare, [S#state.yz_idx_tree, S#state.kv_idx_tree]}.

      compare({ok, YZTreePid}, {ok, KVTreePid}) ->
          Remote = fun(get_bucket, {L, B}) ->
                           riak_kv_index_hashtree:exchange_bucket(?TEST_INDEX_N, L, B, KVTreePid);
                      (key_hashes, Segment) ->
                           riak_kv_index_hashtree:exchange_segment(?TEST_INDEX_N, Segment, KVTreePid);
                      (_, _) ->
                           ok
                   end,
          AccFun = fun(KeyDiff, Count) ->
                           lists:foldl(fun(Diff, InnerCount) ->
                                               case repair(0, Diff) of
                                                   full_repair -> InnerCount + 1;
                                                   _ -> InnerCount
                                               end
                                       end, Count, KeyDiff)
                   end,
          yz_index_hashtree:compare(?TEST_INDEX_N, Remote, AccFun, 0, YZTreePid).
      compare the hashtrees

  32. (image-only slide) [4]

  33. compare_post(S, _Args, Res) ->
          YZTreeData = dict:fetch(?TEST_INDEX_N, S#state.yz_idx_objects),
          KVTreeData = dict:fetch(?TEST_INDEX_N, S#state.kv_idx_objects),
          LeftDiff = dict:fold(fun(BKey, Hash, Count) ->
                                       case dict:find(BKey, KVTreeData) of
                                           {ok, Hash} -> Count;
                                           {ok, _OtherHash} -> Count;
                                           error -> Count+1
                                       end
                               end, 0, YZTreeData),
          RightDiff = dict:fold(fun(BKey, Hash, Count) ->
                                        case dict:find(BKey, YZTreeData) of
                                            {ok, Hash} -> Count;
                                            {ok, _OtherHash} -> Count;
                                            error -> Count+1
                                        end
                                end, LeftDiff, KVTreeData),
          eq(RightDiff, Res).

  34. prop_correct() ->
          ?FORALL(Cmds, commands(?MODULE, #state{}),
                  aggregate(command_names(Cmds),
                            ?TRAPEXIT(
                               begin
                                   {H, S, Res} = run_commands(?MODULE, Cmds),
                                   catch yz_index_hashtree:destroy(
                                           element(2, S#state.yz_idx_tree)),
                                   catch riak_kv_index_hashtree:destroy(
                                           element(2, S#state.kv_idx_tree)),
                                   pretty_commands(?MODULE, Cmds, {H, S, Res},
                                                   Res == ok)
                               end))).
      property test

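To produce output like the next slide, the property would typically be run from the Erlang shell along these lines; the module name matches the command names shown in slide 35, and running 100 tests is QuickCheck’s default.

      %% Sketch: running the state-machine property (100 tests by default).
      1> eqc:quickcheck(yz_index_hashtree_eqc:prop_correct()).
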
  35. property test
      ................. OK, passed 100 tests
      22.3% {yz_index_hashtree_eqc,insert_yz_tree,3}
      20.7% {yz_index_hashtree_eqc,insert_kv_tree,3}
      18.1% {yz_index_hashtree_eqc,insert_both,4}
      17.7% {yz_index_hashtree_eqc,update,2}
      9.2% {yz_index_hashtree_eqc,start_kv_tree,0}
      8.8% {yz_index_hashtree_eqc,start_yz_tree,0}
      3.3% {yz_index_hashtree_eqc,compare,2}

  36. footnotes
      1. bit.ly/1JS3nRB
      2. bit.ly/1Ne15if
      3. bit.ly/1LZ5sxj
      4. bit.ly/1f2JElM
      5. bit.ly/1JG4cLU
      6. bit.ly/1FmYpGS
      7. bit.ly/1JG4cLU
      8. bit.ly/1UqXAZW
      9. bit.ly/1JEHCAX
      10. bit.ly/1GjNLj7
      11. bit.ly/1JDqdJ8
      12. bit.ly/1EE9LLA
      13. bit.ly/1INKmwQ