Hansei: Property-based Development of Concurrent Systems

Slides from my Erlang Workshop 2012 presentation. Related paper: http://dl.acm.org/citation.cfm?id=2364505

Joseph Blomstedt

September 14, 2012

Transcript

  1. Hansei: Property-based Development of Concurrent Systems. Joseph Blomstedt (@jtuple), Basho Technologies, joe@basho.com. Erlang Workshop, September 2012. Tuesday, September 25, 2012
  2. Concurrent programming is hard

  3. Debugging concurrent programs is harder

  4. Testing/Verification Make Things Easier

  5. This talk is about testing

  6. This talk is about testing early, often, always

  7. This talk is about test-driven development

  8. Why?

  9. research/prototype: days-weeks-months; production ready implementation: months-years

  10. Enemy: non-determinism

  11. right scenario + right concurrency interleaving = concurrency violation
  12. Erlang Testing Tools
      • Property-based Testing: QuickCheck, PropEr, Triq
      • Interleaving Tools: PULSE, Concuerror, McErlang

  13. Concurrency Interleaving Tool + Property-based Testing Tool

  14. Examples
      • QuickCheck / PULSE
      • QuickCheck / McErlang
  15. Hansei is built on QuickCheck

  16. Enables lightweight stateful testing

  17. Provides extended OTP behaviors

  18. Provides built-in message interleaving

  19. Supports other interleaving tools

  20. Hansei Goals
      • Enable end to end testing, from prototype to final implementation
      • Use existing OTP behaviors: extended behaviors with property information
      • Message interleaving across VMs

  21. [Diagram: Test, Implement, Prototype/Model, Retest]

  22. Lightweight/Integrated Testing

  23. QuickCheck eqc_statem
      • Generate Command Sequence
      • Run against stateful code
      • Verify postconditions
  24. QuickCheck eqc_statem

      command(State) ->
          %% Commands to run against stateful system
          oneof(Cmds).

      precondition(State, Cmd) ->
          %% Return true if cmd is valid in current state.

      next_state(State, Result, Cmd) ->
          %% Update test state after a given cmd.

      postcondition(State, Cmd, Result) ->
          %% Test postconditions.
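
The four callbacks above can be tried out without QuickCheck. The sketch below is a hand-rolled approximation, not eqc_statem itself: the module name `statem_sketch`, the counter kept in the process dictionary, and the atom commands `inc`, `dec`, `read` are all hypothetical, and real eqc_statem commands are symbolic `{call, M, F, Args}` tuples rather than atoms. It only illustrates how the precondition, next_state and postcondition callbacks fit together when a command sequence is generated and then run.

```erlang
-module(statem_sketch).
-export([command/1, precondition/2, next_state/3, postcondition/3,
         random_commands/1, run/1]).

%% Hypothetical system under test: a counter in the process dictionary.
sut(reset) -> put(counter, 0), ok;
sut(inc)   -> put(counter, get(counter) + 1), ok;
sut(dec)   -> put(counter, get(counter) - 1), ok;
sut(read)  -> get(counter).

%% Model callbacks, named after the eqc_statem callbacks on the slide.
%% The model state is simply the expected counter value.
command(_State) ->
    lists:nth(rand:uniform(3), [inc, dec, read]).

precondition(State, dec) -> State > 0;
precondition(_State, _Cmd) -> true.

next_state(State, _Result, inc)  -> State + 1;
next_state(State, _Result, dec)  -> State - 1;
next_state(State, _Result, read) -> State.

postcondition(State, read, Result) -> Result =:= State;
postcondition(_State, _Cmd, Result) -> Result =:= ok.

%% Generate N commands, keeping only those whose precondition holds
%% in the model state reached so far.
random_commands(N) -> gen(N, 0).

gen(0, _State) -> [];
gen(N, State) ->
    Cmd = command(State),
    case precondition(State, Cmd) of
        true  -> [Cmd | gen(N - 1, next_state(State, undefined, Cmd))];
        false -> gen(N, State)
    end.

%% Run a command sequence against the system, checking the
%% postcondition after every step.
run(Cmds) ->
    sut(reset),
    run(Cmds, 0).

run([], _State) -> ok;
run([Cmd | Rest], State) ->
    Result = sut(Cmd),
    case postcondition(State, Cmd, Result) of
        true  -> run(Rest, next_state(State, Result, Cmd));
        false -> {failed, Cmd, State, Result}
    end.
```

Calling `statem_sketch:run(statem_sketch:random_commands(50))` generates and checks one random sequence; a property-based tool repeats this many times and shrinks any failing sequence.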
  25. System Stateful Test

  26. System Test Annotations

  27. Hansei
      • Test consists of test module and a set of process modules
      • Events: external events, timers, things you do not care to model
      • Calls/casts map to simulated receive/reply semantics

  28. Hansei Test: test module, test process (server), test process (server), test process (server), test process (fsm), test process (fsm), test process (fsm)

  29. gen_server: handle_call, handle_cast, init

  30. gen_server: handle_call, handle_cast, init, handle_event, post_call, post_cast, post_event, always, event, precondition

  31. Test Module: after_call, after_cast, initial_state, after_event, post_call, post_cast, post_event, always, event, precondition, process_modules
  32. Hansei Operating Modes
      • Simulation: used during prototyping / modeling
      • Tracing
      • Tracing + Interception

  33. Simulation Mode
      • Calls/casts mapped to command sequences
      • Generates sequence of events, calls, casts
      • Runs against simulated system of processes
      • Shrinks sequence when postconditions fail
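
The shrinking step in the last bullet can be illustrated with a deliberately naive sketch. This is not QuickCheck's shrinking algorithm, which is considerably more elaborate. The module name `shrink_sketch` and the `Fails` predicate are hypothetical: `Fails(Seq)` is assumed to return `true` when running the sequence still violates a postcondition.

```erlang
-module(shrink_sketch).
-export([shrink/2]).

%% Repeatedly drop one element from a failing sequence as long as the
%% shorter sequence still fails. Returns a sequence from which no single
%% element can be removed without the failure disappearing.
shrink(Seq, Fails) ->
    Candidates = [lists:sublist(Seq, I - 1) ++ lists:nthtail(I, Seq)
                  || I <- lists:seq(1, length(Seq))],
    case [C || C <- Candidates, Fails(C)] of
        [Smaller | _] -> shrink(Smaller, Fails);
        []            -> Seq
    end.
```

For example, if a failure needs both `a` and `b` to be present, `shrink([x, a, y, b, z], Fails)` discards the unrelated commands and keeps only the two that matter.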
  34. [Diagram: Test, Implement, Prototype/Model, Retest]

  35. Tracing Mode
      • Generate event sequences, not calls/casts
      • Run against external stateful system
      • Erlang tracing used to capture actual calls/casts that occurred
      • Verify events + observed calls/casts against model and final cluster state
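
Capturing the messages a process actually received can be done with the VM's built-in tracing. The sketch below uses only `erlang:trace/3` from the standard library; the module name `trace_sketch`, the toy server loop and the message shapes are hypothetical, and it makes no claim about how Hansei itself configures tracing.

```erlang
-module(trace_sketch).
-export([demo/0]).

%% Spawn a toy server, trace every message it receives, and return the
%% observed messages in arrival order.
demo() ->
    Pid = spawn(fun server/0),
    %% The calling process becomes the tracer and gets
    %% {trace, Pid, 'receive', Msg} for each message delivered to Pid.
    1 = erlang:trace(Pid, true, ['receive']),
    Pid ! {cast, {gossip, [1, 3]}},
    Pid ! {cast, stop},
    collect(Pid, 2).

%% Toy server: ignores casts until told to stop.
server() ->
    receive
        {cast, stop} -> ok;
        {cast, _}    -> server()
    end.

%% Gather up to N trace messages for Pid.
collect(_Pid, 0) -> [];
collect(Pid, N) ->
    receive
        {trace, Pid, 'receive', Msg} -> [Msg | collect(Pid, N - 1)]
    after 1000 ->
        []
    end.
```

A test harness would then compare the observed list against what the model predicts for the generated events.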
  36. Tracing + Implementation
      • Modify implementation to enable controlling message interleaving
      • Implemented as a proxy process that delays forwarding messages until told to do so by test module
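
A delaying proxy of this kind can be sketched in a few lines of plain Erlang. Everything here is an assumption made for illustration: the module name `delay_proxy`, the `'$release'` control message and the one-message-at-a-time release policy are hypothetical, not Hansei's actual protocol.

```erlang
-module(delay_proxy).
-export([start/1, release/1, demo/0]).

%% Start a proxy in front of Target. Anything sent to the proxy is held
%% in arrival order until the controller calls release/1.
start(Target) ->
    spawn(fun() -> loop(Target, queue:new()) end).

%% Forward the oldest held message. Returns {forwarded, Msg}, or
%% 'empty' when nothing is being held.
release(Proxy) ->
    Ref = make_ref(),
    Proxy ! {'$release', self(), Ref},
    receive {Ref, Reply} -> Reply end.

loop(Target, Held) ->
    receive
        {'$release', From, Ref} ->
            case queue:out(Held) of
                {{value, Msg}, Held2} ->
                    Target ! Msg,
                    From ! {Ref, {forwarded, Msg}},
                    loop(Target, Held2);
                {empty, _} ->
                    From ! {Ref, empty},
                    loop(Target, Held)
            end;
        Msg ->
            loop(Target, queue:in(Msg, Held))
    end.

%% The caller plays both the target and the test module.
demo() ->
    Proxy = start(self()),
    Proxy ! first,
    Proxy ! second,
    %% Nothing is delivered until the controller says so.
    NothingYet = receive _ -> false after 50 -> true end,
    {forwarded, first} = release(Proxy),
    Got = receive M -> M after 1000 -> timeout end,
    {NothingYet, Got}.
```

With one proxy per process, the order in which the test module issues releases fixes the message interleaving.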
  37. Simple Example
      • Nodes join together and form a cluster
      • Nodes periodically gossip membership state to other known nodes
      • Prototype nodes as gen_servers
  38. Node Server (1/5)

      get_members(Node) ->
          gen_server:call(Node, get_members).

  39. Node Server (2/5)

      -record(state, {id, members}).

      init(Node) ->
          {ok, #state{id=Node, members=[Node]}}.

      handle_call(get_members, _From, State) ->
          {reply, State#state.members, State};
      handle_call(get_state, _From, State) ->
          {reply, State, State}.
  40. Node Server (3/5)

      handle_cast({gossip, #state{members=OtherMembers}}, State=#state{members=Members}) ->
          Members2 = ordsets:union(Members, OtherMembers),
          State2 = State#state{members=Members2},
          {noreply, State2}.
  41. Node Server (4/5)

      events(#state{id=Node, members=Members}) ->
          {call,?MODULE,send_gossip,[Node, [elements(Members)]]}.

      precondition({send_gossip, [Node, [OtherNode]]}, S) ->
          all([lists:member(OtherNode, S#state.members),
               Node /= OtherNode]).
  42. Node Server (5/5)

      handle_event({join, [OtherNode]}, State) ->
          OtherState = gen_server:call(OtherNode, get_state),
          Members = OtherState#state.members,
          Members2 = ordsets:add_element(State#state.id, Members),
          {noreply, State#state{members=Members2}};
      handle_event({send_gossip, [OtherNode]}, State) ->
          gen_server:cast(OtherNode, {gossip, State}),
          {noreply, State}.
  43. Test Module (1/2)

      -record(state, {nodes, singleton}).

      prop_riak() ->
          hansei_test:simulate(?MODULE).

      process_modules() ->
          lists:duplicate(?CLUSTER_SIZE, riak_node).

      initial_state(Procs) ->
          #state{nodes=Procs, singleton=Procs}.
  44. Test Module (2/2)

      events(#state{nodes=Nodes}) ->
          {call,?MODULE,join,[elements(Nodes), [elements(Nodes)]]}.

      precondition({join, [Node,[OtherNode]]}, S) ->
          Singleton = S#state.singleton,
          all([Node /= OtherNode,
               lists:member(Node, Singleton),
               (Singleton == S#state.nodes) or lists:member(OtherNode, Singleton)]).

      %% Assumed: the test state S is passed as a second argument,
      %% as it is for precondition; the body uses S.
      after_event({join, [Node,[OtherNode]]}, S) ->
          Singleton = S#state.singleton -- [Node, OtherNode],
          S#state{singleton=Singleton}.
  45. Extended Example
      • Cluster maintains a weak leader: the lowest node id in the cluster is considered the leader; no actual leader election or failure detection
      • Property we care about: at all times, there is only one node that believes it is the leader of a cluster
  46. Extended Node Server (1/4)

      get_leader(Node) ->
          gen_server:call(Node, get_leader).

  47. Extended Node Server (2/4)

      -record(state, {id, members, leader}).

      init(Node) ->
          {ok, #state{id=Node, members=[Node], leader=Node}}.

      handle_call(get_leader, _From, State) ->
          {reply, State#state.leader, State};
  48. Extended Node Server (3/4)

      handle_cast({gossip, #state{members=OtherMembers, leader=Leader}}, State=#state{members=Members}) ->
          Members2 = ordsets:union(Members, OtherMembers),
          case is_leader(State) of
              true ->
                  Leader2 = hd(lists:sort(Members2));
              false ->
                  %% Assumed: Leader is the leader carried by the gossiped state.
                  Leader2 = Leader
          end,
          State2 = State#state{members=Members2, leader=Leader2},
          {noreply, State2}.
  49. Extended Node Server (4/4)

      handle_event({join, [OtherNode]}, State) ->
          OtherState = gen_server:call(OtherNode, get_state),
          #state{members=Members, leader=Leader} = OtherState,
          Members2 = ordsets:add_element(State#state.id, Members),
          {noreply, State#state{members=Members2, leader=Leader}};
      handle_event({send_gossip, [OtherNode]}, State) ->
          gen_server:cast(OtherNode, {gossip, State}),
          {noreply, State}.
  50. Extended Test Module

      always(S) ->
          all([begin
                   Members = riak_node:get_members(Node),
                   one_leader(Members)
               end || Node <- S#state.nodes]).

      one_leader(Members) ->
          Leaders = [Leader || Node <- Members,
                               Leader <- [riak_node:get_leader(Node)],
                               Leader == Node],
          length(lists:usort(Leaders)) < 2.
  51. Counterexample

      [{init,{test_state,undefined,undefined,riak_model,
              0,[],undefined,undefined,simulate}},
       {set,{var,1},{call,hansei_test,init_dynamic,[]}},
       {set,{var,2},{call,hansei_test,init_system,[riak_model]}},
       {set,{var,3},{call,riak_model,join,[1,[3]]}},
       {set,{var,4},{call,hansei_test,rcvmsg,[3,{1,{call,get_state}}]}},
       {set,{var,5},{call,hansei_test,rcvreply,[1,{3,{state,3,[3],3}}]}},
       {set,{var,6},{call,riak_node,send_gossip,[1,[3]]}},
       {set,{var,7},
            {call,hansei_test,rcvmsg,
                  [3,{1,{cast,{gossip,{state,1,[1,3],3}}}}]}},
       {set,{var,8},{call,riak_node,send_gossip,[3,[1]]}},
       {set,{var,9},{call,riak_node,send_gossip,[1,[3]]}},
       {set,{var,16},
            {call,hansei_test,rcvmsg,[3,{1,{cast,{gossip,{state,1,[1,3],3}}}}]}},
       {set,{var,18},
            {call,hansei_test,rcvmsg,[1,{3,{cast,{gossip,{state,3,[1,3],1}}}}]}}]
      {postcondition,false}
  52. [Sequence diagram of the counterexample across nodes 1, 2, 3. Message labels: join 3; call: get_state; cast: gossip([1,3], 3); send_gossip 3; send_gossip 3; send_gossip 1; reply: ([3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 1). State labels: [1,3], 3; [1,3], 1; [3], 3; [1,3], 3; [1], 1; [1,3], 1]
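
The counterexample can be replayed by hand with pure functions. The sketch below is a reading of the slides, not Hansei output: the module name `leader_replay` is hypothetical, node state is reduced to an `{Id, Members, Leader}` tuple, and it assumes that a node which is not the leader adopts the leader carried by an incoming gossip message. Under that assumption the message order from the counterexample ends with two nodes that each consider themselves leader.

```erlang
-module(leader_replay).
-export([replay/0, one_leader/1]).

%% Node state: {Id, Members, Leader}.
init(Id) -> {Id, [Id], Id}.

%% join: take the other node's members (plus self) and its leader.
join({Id, _, _}, {_, OtherMembers, OtherLeader}) ->
    {Id, ordsets:add_element(Id, OtherMembers), OtherLeader}.

%% gossip: merge members. A leader recomputes the lowest id; a
%% non-leader adopts the gossiped leader (assumption, see lead-in).
gossip({Id, Members, Leader}, {_, OtherMembers, OtherLeader}) ->
    Members2 = ordsets:union(Members, OtherMembers),
    Leader2 = case Leader =:= Id of
                  true  -> hd(Members2);
                  false -> OtherLeader
              end,
    {Id, Members2, Leader2}.

%% At most one node may consider itself the leader.
one_leader(States) ->
    Leaders = [Id || {Id, _, Leader} <- States, Leader =:= Id],
    length(lists:usort(Leaders)) < 2.

%% Replay the message order of the counterexample for nodes 1 and 3.
replay() ->
    N1a = init(1),
    N3a = init(3),
    N1b = join(N1a, N3a),       %% 1 joins 3
    MsgA = N1b,                 %% send_gossip 1 -> 3 (in flight)
    N3b = gossip(N3a, MsgA),    %% 3 receives it and elects 1
    MsgB = N3b,                 %% send_gossip 3 -> 1 (in flight)
    MsgC = N1b,                 %% send_gossip 1 -> 3 (stale leader 3)
    N3c = gossip(N3b, MsgC),    %% 3 reverts to leader 3
    N1c = gossip(N1b, MsgB),    %% 1 adopts leader 1
    {N1c, N3c, one_leader([N1c, N3c])}.
```

`leader_replay:replay()` returns `{{1,[1,3],1}, {3,[1,3],3}, false}`: node 1 and node 3 both believe they lead the same two-node cluster.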
  53. Versioned leader state
      • Add version number to gossiped state
      • Leader increments version when changed
      • Node updates leader only if newer version
      • After changes, model passes without issue
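
One way the three rules above could fit together is sketched below. The slides do not show the versioned code, so every detail is an assumption: the module name `versioned_leader`, the `{Id, Members, Leader, Vsn}` tuple, version numbers starting at 0, and a joining node copying the version of the node it joins. The replay uses the same message order as the counterexample.

```erlang
-module(versioned_leader).
-export([replay/0]).

%% Node state: {Id, Members, Leader, Vsn}.
init(Id) -> {Id, [Id], Id, 0}.

%% join: take members, leader and version from the other node.
join({Id, _, _, _}, {_, OtherMembers, OtherLeader, OtherVsn}) ->
    {Id, ordsets:add_element(Id, OtherMembers), OtherLeader, OtherVsn}.

gossip({Id, Members, Leader, Vsn}, {_, OtherMembers, OtherLeader, OtherVsn}) ->
    Members2 = ordsets:union(Members, OtherMembers),
    Lowest = hd(Members2),
    if
        %% The leader changes the leader: bump the version.
        Leader =:= Id, Lowest =/= Leader ->
            {Id, Members2, Lowest, Vsn + 1};
        %% A non-leader only accepts a strictly newer version.
        Leader =/= Id, OtherVsn > Vsn ->
            {Id, Members2, OtherLeader, OtherVsn};
        true ->
            {Id, Members2, Leader, Vsn}
    end.

%% Same message order as the counterexample, for nodes 1 and 3.
replay() ->
    N1a = init(1),
    N3a = init(3),
    N1b = join(N1a, N3a),       %% 1 joins 3
    MsgA = N1b,                 %% send_gossip 1 -> 3
    N3b = gossip(N3a, MsgA),    %% 3 elects 1, version becomes 1
    MsgB = N3b,                 %% send_gossip 3 -> 1
    MsgC = N1b,                 %% send_gossip 1 -> 3 (version 0, stale)
    N3c = gossip(N3b, MsgC),    %% stale gossip is ignored
    N1c = gossip(N1b, MsgB),    %% 1 adopts leader 1 at version 1
    SelfLeaders = [Id || {Id, _, L, _} <- [N1c, N3c], L =:= Id],
    {N1c, N3c, length(SelfLeaders) < 2}.
```

Here `versioned_leader:replay()` returns `{{1,[1,3],1,1}, {3,[1,3],1,1}, true}`: the stale gossip no longer overrides the newer leader, so both nodes agree on node 1.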
  54. [Sequence diagram of the same scenario with versioned leader state, across nodes 1, 2, 3. Message labels: join 3; call: get_state; cast: gossip([1,3], 3); send_gossip 3; send_gossip 3; send_gossip 1; reply: ([3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 1). State labels: [1,3], 1; [1,3], 1; [3], 3; [1,3], 3; [1], 1; [1,3], 1]
  55. Riak Implementation
      • Simple example similar to Riak clustering system
      • Can run tracing/interception mode against Riak
      • Use riak_test to bring up multiple Riak nodes
      • Change process_modules to return a list [{node(), riak_core_gossip}]
  56. Open source
      • Hansei will be released as open source: http://github.com/basho/hansei
      • Apache License (most likely)
      • Soon!

  57. Future Plans
      • Simulate monitors + links
      • Simulate dropping messages (an earlier prototype did; recent changes broke the code)
      • Support process exits, supervisors
      • Add properties to most of riak_core
      • Use Hansei in construction of basho_ensemble (new dynamic ensemble, leader election library)
  58. Basho is hiring! joe@basho.com @jtuple