Hansei: Property-based Development of Concurrent Systems

Joseph Blomstedt
September 14, 2012

Slides from my Erlang Workshop 2012 presentation. Related paper: http://dl.acm.org/citation.cfm?id=2364505

Transcript

  1. Hansei: Property-based Development of Concurrent Systems
    Joseph Blomstedt (@jtuple)
    Basho Technologies
    [email protected]
    Erlang Workshop
    September 2012
    Tuesday, September 25, 2012

  2. Concurrent programming is hard

  3. Debugging concurrent programs is harder

  4. Testing/Verification Make Things Easier

  5. This talk is about testing

  6. This talk is about testing early, often, always

  7. This talk is about test-driven development

  8. Why?

  9. research/prototype (days-weeks-months) → production-ready implementation (months-years)

  10. Enemy: non-determinism

  11. right scenario + right concurrency interleaving = concurrency violation

  12. Erlang Testing Tools
    • Property-based Testing
      QuickCheck
      PropEr
      Triq
    • Interleaving Tools
      PULSE
      Concuerror
      McErlang

  13. Property-based Testing Tool + Concurrency Interleaving Tool

  14. Examples
    • QuickCheck / PULSE
    • QuickCheck / McErlang

  15. Hansei is built on QuickCheck

  16. Enables lightweight stateful testing

  17. Provides extended OTP behaviors

  18. Provides built-in message interleaving

  19. Supports other interleaving tools

  20. Hansei Goals
    • Enable end to end testing
    prototype to final implementation
    • Use existing OTP behaviors
    Extended behaviors with property information
    • Message interleaving across VMs

  21. [Development cycle diagram: Prototype/Model, Test, Implement, Retest]

  22. Lightweight/Integrated Testing

  23. QuickCheck eqc_statem
    Generate Command Sequence → Run against stateful code → Verify postconditions

  24. QuickCheck eqc_statem
    command(State) ->
        %% Commands to run against the stateful system.
        oneof(Cmds).
    precondition(State, Cmd) ->
        %% Return true if Cmd is valid in the current state.
        true.
    next_state(State, Result, Cmd) ->
        %% Update the test state after a given Cmd.
        State.
    postcondition(State, Cmd, Result) ->
        %% Test postconditions.
        true.

  25. System
    Stateful Test

  26. System
    Test Annotations

  27. Hansei
    • Test consists of a test module and a set of process modules
    • Events
      External events, timers, things you do not care to model
    • Calls/casts map to simulated receive/reply semantics

  28. Hansei Test
    test module
    test process (server)
    test process (server)
    test process (server)
    test process (fsm)
    test process (fsm)
    test process (fsm)

  29. gen_server
    handle_call
    handle_cast
    init

  30. gen_server
    handle_call
    handle_cast
    init
    handle_event
    post_call
    post_cast
    post_event
    always
    event
    precondition

  31. Test Module
    after_call
    after_cast
    initial_state
    after_event
    post_call
    post_cast
    post_event
    always
    event
    precondition
    process_modules

  32. Hansei Operating Modes
    • Simulation
    Used during prototyping / modeling
    • Tracing
    • Tracing + Interception

  33. Simulation Mode
    • Calls/casts mapped to command sequences
    • Generates sequence of events, calls, casts
    • Runs against simulated system of processes
    • Shrinks sequence when postconditions fail
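
To give a feel for the search space that simulation mode explores, here is a minimal sketch that enumerates every delivery order of two senders' messages while preserving each sender's own order. This is an illustration only: Hansei generates and shrinks sequences through QuickCheck rather than enumerating them, and the module name `interleave_sketch` and function `interleavings/2` are hypothetical.

```erlang
-module(interleave_sketch).
-export([interleavings/2, demo/0]).

%% All merges of two message sequences that keep each sender's
%% messages in their original relative order.
interleavings([], Ys) -> [Ys];
interleavings(Xs, []) -> [Xs];
interleavings([X|Xs], [Y|Ys]) ->
    [[X|Rest] || Rest <- interleavings(Xs, [Y|Ys])] ++
    [[Y|Rest] || Rest <- interleavings([X|Xs], Ys)].

%% Two messages from sender a, one from sender b.
demo() ->
    interleavings([a1, a2], [b1]).
```

Even this tiny case has three possible delivery orders; the count grows quickly with more messages, which is why random generation plus shrinking is attractive.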

  34. [Development cycle diagram: Prototype/Model, Test, Implement, Retest]

  35. Tracing Mode
    • Generate event sequences, not calls/casts
    • Run against external stateful system
    • Erlang tracing used to capture the actual calls/casts that occurred
    • Verify events + observed calls/casts against model and final cluster state

  36. Tracing + Interception
    • Modify implementation to enable controlling message interleaving
    • Implemented as a proxy process that delays forwarding messages until told to do so by the test module
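
The proxy idea above can be sketched in plain Erlang as follows. This is a minimal illustration under assumptions, not Hansei's actual proxy: the module name `hold_proxy`, the `'$release'` and `'$pending'` control messages, and the FIFO release policy are all hypothetical. With one such proxy in front of each process, a test could choose which proxy to release next and so pick the interleaving.

```erlang
-module(hold_proxy).
-export([start/1, release/1, pending/1, demo/0]).

%% Start a proxy in front of Target; ordinary messages are held, not forwarded.
start(Target) ->
    spawn(fun() -> loop(Target, queue:new()) end).

%% Tell the proxy to forward its oldest held message, if any.
release(Proxy) ->
    Proxy ! {'$release', self()},
    receive {Proxy, Reply} -> Reply after 1000 -> timeout end.

%% Number of messages currently held by the proxy.
pending(Proxy) ->
    Proxy ! {'$pending', self()},
    receive {Proxy, N} -> N after 1000 -> timeout end.

loop(Target, Held) ->
    receive
        {'$release', From} ->
            case queue:out(Held) of
                {{value, Msg}, Rest} ->
                    Target ! Msg,
                    From ! {self(), {forwarded, Msg}},
                    loop(Target, Rest);
                {empty, _} ->
                    From ! {self(), empty},
                    loop(Target, Held)
            end;
        {'$pending', From} ->
            From ! {self(), queue:len(Held)},
            loop(Target, Held);
        Msg ->
            %% Any other message is held until the test releases it.
            loop(Target, queue:in(Msg, Held))
    end.

%% The calling process acts as both the test and the target.
demo() ->
    Proxy = start(self()),
    Proxy ! hello,
    Proxy ! world,
    2 = pending(Proxy),
    {forwarded, hello} = release(Proxy),
    hello = receive M1 -> M1 after 1000 -> timeout end,
    {forwarded, world} = release(Proxy),
    world = receive M2 -> M2 after 1000 -> timeout end,
    empty = release(Proxy),
    exit(Proxy, kill),
    ok.
```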

  37. Simple Example
    • Nodes join together and form a cluster
    • Nodes periodically gossip membership state to other known nodes
    • Prototype nodes as gen_servers

  38. Node Server (1/5)
    get_members(Node) ->
        gen_server:call(Node, get_members).

  39. Node Server (2/5)
    -record(state, {id, members}).
    init(Node) ->
        {ok, #state{id=Node, members=[Node]}}.
    handle_call(get_members, _From, State) ->
        {reply, State#state.members, State};
    handle_call(get_state, _From, State) ->
        {reply, State, State}.

  40. Node Server (3/5)
    handle_cast({gossip, #state{members=OtherMembers}},
                State=#state{members=Members}) ->
        %% Merge the gossiped membership into the local view.
        Members2 = ordsets:union(Members, OtherMembers),
        State2 = State#state{members=Members2},
        {noreply, State2}.

  41. Node Server (4/5)
    events(#state{id=Node, members=Members}) ->
        {call, ?MODULE, send_gossip, [Node, [elements(Members)]]}.
    precondition({send_gossip, [Node, [OtherNode]]}, S) ->
        all([lists:member(OtherNode, S#state.members),
             Node /= OtherNode]).

  42. Node Server (5/5)
    handle_event({join, [OtherNode]}, State) ->
        %% Adopt the other node's membership and add ourselves to it.
        OtherState = gen_server:call(OtherNode, get_state),
        Members = OtherState#state.members,
        Members2 = ordsets:add_element(State#state.id, Members),
        {noreply, State#state{members=Members2}};
    handle_event({send_gossip, [OtherNode]}, State) ->
        gen_server:cast(OtherNode, {gossip, State}),
        {noreply, State}.

  43. Test Module (1/2)
    -record(state, {nodes, singleton}).
    prop_riak() ->
        hansei_test:simulate(?MODULE).
    process_modules() ->
        lists:duplicate(?CLUSTER_SIZE, riak_node).
    initial_state(Procs) ->
        #state{nodes=Procs, singleton=Procs}.

  44. Test Module (2/2)
    events(#state{nodes=Nodes}) ->
        {call, ?MODULE, join, [elements(Nodes), [elements(Nodes)]]}.
    precondition({join, [Node, [OtherNode]]}, S) ->
        Singleton = S#state.singleton,
        all([Node /= OtherNode,
             lists:member(Node, Singleton),
             (Singleton == S#state.nodes)
                 or lists:member(OtherNode, Singleton)]).
    %% Assumes the test state S is passed as the second argument,
    %% as in precondition/2.
    after_event({join, [Node, [OtherNode]]}, S) ->
        Singleton = S#state.singleton -- [Node, OtherNode],
        S#state{singleton=Singleton}.

  45. Extended Example
    • Cluster maintains a weak leader
      Lowest node id in the cluster is considered the leader
      No actual leader election or failure detection
    • Property we care about
      At all times, there is only one node that believes it is the leader of a cluster

  46. Extended Node Server (1/4)
    get_leader(Node) ->
        gen_server:call(Node, get_leader).

  47. Extended Node Server (2/4)
    -record(state, {id, members, leader}).
    init(Node) ->
        {ok, #state{id=Node, members=[Node], leader=Node}}.
    handle_call(get_leader, _From, State) ->
        {reply, State#state.leader, State};

  48. Extended Node Server (3/4)
    %% Leader is assumed to come from the gossiped state, which is
    %% consistent with the counterexample trace.
    handle_cast({gossip, #state{members=OtherMembers, leader=Leader}},
                State=#state{members=Members}) ->
        Members2 = ordsets:union(Members, OtherMembers),
        Leader2 = case is_leader(State) of
                      true  -> hd(lists:sort(Members2));
                      false -> Leader
                  end,
        State2 = State#state{members=Members2, leader=Leader2},
        {noreply, State2}.

  49. Extended Node Server (4/4)
    handle_event({join, [OtherNode]}, State) ->
        OtherState = gen_server:call(OtherNode, get_state),
        #state{members=Members, leader=Leader} = OtherState,
        Members2 = ordsets:add_element(State#state.id, Members),
        {noreply, State#state{members=Members2,
                              leader=Leader}};
    handle_event({send_gossip, [OtherNode]}, State) ->
        gen_server:cast(OtherNode, {gossip, State}),
        {noreply, State}.

  50. Extended Test Module
    always(S) ->
        all([begin
                 Members = riak_node:get_members(Node),
                 one_leader(Members)
             end || Node <- S#state.nodes]).
    one_leader(Members) ->
        Leaders = [Leader || Node <- Members,
                             Leader <- [riak_node:get_leader(Node)],
                             Leader == Node],
        length(lists:usort(Leaders)) < 2.

  51. Counterexample
    [{init,{test_state,undefined,undefined,riak_model,
    0,[],undefined,undefined,simulate}},
    {set,{var,1},{call,hansei_test,init_dynamic,[]}},
    {set,{var,2},{call,hansei_test,init_system,[riak_model]}},
    {set,{var,3},{call,riak_model,join,[1,[3]]}},
    {set,{var,4},{call,hansei_test,rcvmsg,[3,{1,{call,get_state}}]}},
    {set,{var,5},{call,hansei_test,rcvreply,[1,{3,{state,3,[3],3}}]}},
    {set,{var,6},{call,riak_node,send_gossip,[1,[3]]}},
    {set,{var,7},
    {call,hansei_test,rcvmsg, [3,{1,{cast,{gossip,{state,1,[1,3],3}}}}]}},
    {set,{var,8},{call,riak_node,send_gossip,[3,[1]]}},
    {set,{var,9},{call,riak_node,send_gossip,[1,[3]]}},
    {set,{var,16},
    {call,hansei_test,rcvmsg,[3,{1,{cast,{gossip,{state,1,[1,3],3}}}}]}},
    {set,{var,18},
    {call,hansei_test,rcvmsg,[1,{3,{cast,{gossip,{state,3,[1,3],1}}}}]}}]
    {postcondition,false}

  52. [Sequence diagram of the counterexample on nodes 1, 2, 3]
    Events: join 3; send_gossip 3; send_gossip 3; send_gossip 1
    Messages: call: get_state; reply: ([3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 1)
    State labels (members, leader): [1,3], 3; [1,3], 1; [3], 3; [1,3], 3; [1], 1; [1,3], 1

  53. Versioned leader state
    • Add version number to gossiped state
    • Leader increments version when changed
    • Node updates leader only if newer version
    • After changes, model passes without issue
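
A minimal sketch of the versioning idea, replaying the counterexample's message order. The `version` field, the `merge/2` function, and the exact update rules are assumptions made for illustration; the talk does not show the versioned code.

```erlang
-module(versioned_leader).
-export([merge/2, demo/0]).

%% The version field is a hypothetical addition to the node state.
-record(state, {id, members, leader, version = 0}).

%% Merge a gossiped state (first argument) into the local state.
merge(#state{members=OtherMembers, leader=OtherLeader, version=OtherVsn},
      Local=#state{id=Id, members=Members, leader=Leader, version=Vsn}) ->
    Members2 = ordsets:union(Members, OtherMembers),
    Lowest = hd(lists:sort(Members2)),
    if
        Leader =:= Id, Lowest =/= Id ->
            %% The current leader hands over and bumps the version.
            Local#state{members=Members2, leader=Lowest,
                        version=max(Vsn, OtherVsn) + 1};
        OtherVsn > Vsn ->
            %% Adopt the gossiped leader only if it is strictly newer.
            Local#state{members=Members2, leader=OtherLeader,
                        version=OtherVsn};
        true ->
            %% Stale or equal version: keep the current leader.
            Local#state{members=Members2}
    end.

%% Replays the counterexample: node 1's old gossip reaches node 3 twice,
%% the second time after node 3 has already handed leadership to node 1.
demo() ->
    N1 = #state{id=1, members=[1,3], leader=3, version=0},
    N3 = #state{id=3, members=[3], leader=3, version=0},
    Stale = N1,
    N3a = merge(Stale, N3),
    N3b = merge(Stale, N3a),
    N1a = merge(N3a, N1),
    {N1a#state.leader, N3b#state.leader}.
```

In this replay the stale gossip carries version 0 while node 3 is already at version 1, so node 3 ignores the old leader value and both nodes end up agreeing on node 1.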

  54. [Sequence diagram on nodes 1, 2, 3]
    Events: join 3; send_gossip 3; send_gossip 3; send_gossip 1
    Messages: call: get_state; reply: ([3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 1)
    State labels (members, leader): [1,3], 1; [1,3], 1; [3], 3; [1,3], 3; [1], 1; [1,3], 1

  55. Riak Implementation
    • Simple example similar to Riak clustering system
    • Can run tracing/interception mode against Riak
    • Use riak_test to bring up multiple Riak nodes
    • Change process_modules to return a list
      [{node(), riak_core_gossip}]

  56. Open source
    • Hansei will be released as open-source
    http://github.com/basho/hansei
    • Apache License (most likely)
    • Soon!

  57. Future Plans
    • Simulate monitors + links
    • Simulate dropping messages
    Earlier prototype did, recent changes broke code
    • Support process exits, supervisors
    • Add properties to most of riak_core
    • Use Hansei in construction of basho_ensemble
    New dynamic ensemble, leader election library

  58. Basho is hiring!
    [email protected]
    @jtuple