Slide 1

Slide 1 text

Hansei
Property-based Development of Concurrent Systems
Joseph Blomstedt (@jtuple), Basho Technologies, joe@basho.com
Erlang Workshop, September 2012
Tuesday, September 25, 2012

Slide 2

Slide 2 text

Concurrent programming is hard

Slide 3

Slide 3 text

Debugging concurrent programs is harder

Slide 4

Slide 4 text

Testing/Verification Make Things Easier

Slide 5

Slide 5 text

This talk is about testing

Slide 6

Slide 6 text

This talk is about testing: early, often, always

Slide 7

Slide 7 text

This talk is about test-driven development

Slide 8

Slide 8 text

Why?

Slide 9

Slide 9 text

research/prototype: days-weeks-months
production-ready implementation: months-years

Slide 10

Slide 10 text

Enemy: non-determinism

Slide 11

Slide 11 text

right scenario + right concurrency interleaving = concurrency violation

Slide 12

Slide 12 text

Erlang Testing Tools
• Property-based Testing
  QuickCheck
  PropEr
  Triq
• Interleaving Tools
  PULSE
  Concuerror
  McErlang

Slide 13

Slide 13 text

Property-based Testing Tool + Concurrency Interleaving Tool

Slide 14

Slide 14 text

Examples
• QuickCheck / PULSE
• QuickCheck / McErlang

Slide 15

Slide 15 text

Hansei is built on QuickCheck

Slide 16

Slide 16 text

Enables lightweight stateful testing

Slide 17

Slide 17 text

Provides extended OTP behaviors

Slide 18

Slide 18 text

Provides built-in message interleaving

Slide 19

Slide 19 text

Supports other interleaving tools

Slide 20

Slide 20 text

Hansei Goals
• Enable end-to-end testing
  prototype to final implementation
• Use existing OTP behaviors
  Extended behaviors with property information
• Message interleaving across VMs

Slide 21

Slide 21 text

Prototype/Model -> Test -> Implement -> Retest

Slide 22

Slide 22 text

Lightweight/Integrated Testing

Slide 23

Slide 23 text

QuickCheck eqc_statem
Generate command sequence -> Run against stateful code -> Verify postconditions

Slide 24

Slide 24 text

QuickCheck eqc_statem

command(State) ->
    %% Commands to run against stateful system
    oneof(Cmds).

precondition(State, Cmd) ->
    %% Return true if cmd is valid in current state.
    true.  %% placeholder body

next_state(State, Result, Cmd) ->
    %% Update test state after a given cmd.
    State.  %% placeholder body

postcondition(State, Cmd, Result) ->
    %% Test postconditions.
    true.  %% placeholder body
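To make the role of the four callbacks concrete, here is a minimal stdlib-only sketch of the loop that drives them. It is not QuickCheck: eqc_statem generates symbolic {call, M, F, Args} commands and shrinks failing sequences, while this sketch takes a hand-written list of atoms. The module name statem_sketch, the process-dictionary counter used as the system under test, and the run/1 and demo/0 helpers are all hypothetical names chosen for illustration.

```erlang
-module(statem_sketch).
-export([run/1, demo/0]).

%% Hypothetical system under test: a counter kept in the process dictionary.
incr()  -> N = get_count() + 1, put(count, N), N.
reset() -> put(count, 0), 0.
get_count() ->
    case get(count) of
        undefined -> 0;
        N -> N
    end.

%% Model callbacks, using the argument order shown on the slide.
%% The model state is simply the expected counter value.
precondition(State, reset) -> State > 0;
precondition(_State, incr) -> true.

next_state(State, _Result, incr)   -> State + 1;
next_state(_State, _Result, reset) -> 0.

postcondition(State, incr, Result)   -> Result =:= State + 1;
postcondition(_State, reset, Result) -> Result =:= 0.

%% Map a command to a call against the real system.
execute(incr)  -> incr();
execute(reset) -> reset().

%% Run a command list: skip commands whose precondition fails,
%% execute the rest, and stop at the first failed postcondition.
run(Cmds) ->
    reset(),
    run(Cmds, 0).

run([], _State) ->
    ok;
run([Cmd | Rest], State) ->
    case precondition(State, Cmd) of
        false ->
            run(Rest, State);
        true ->
            Result = execute(Cmd),
            case postcondition(State, Cmd, Result) of
                true  -> run(Rest, next_state(State, Result, Cmd));
                false -> {failed, Cmd, State, Result}
            end
    end.

demo() ->
    run([reset, incr, incr, reset, incr]).
```

In this sketch the first reset is skipped because its precondition fails in the initial model state; the remaining commands run against the counter and are checked against the model.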

Slide 25

Slide 25 text

System Stateful Test

Slide 26

Slide 26 text

System Test Annotations

Slide 27

Slide 27 text

Hansei
• Test consists of test module and a set of process modules
• Events
  External events, timers, things you do not care to model
• Calls/casts map to simulated receive/reply semantics

Slide 28

Slide 28 text

Hansei Test
test module
  test process (server)
  test process (server)
  test process (server)
  test process (fsm)
  test process (fsm)
  test process (fsm)

Slide 29

Slide 29 text

gen_server
init, handle_call, handle_cast

Slide 30

Slide 30 text

gen_server
init, handle_call, handle_cast
handle_event, post_call, post_cast, post_event, always, event, precondition

Slide 31

Slide 31 text

Test Module
initial_state, process_modules
after_call, after_cast, after_event
post_call, post_cast, post_event, always, event, precondition

Slide 32

Slide 32 text

Hansei Operating Modes
• Simulation
  Used during prototyping / modeling
• Tracing
• Tracing + Interception

Slide 33

Slide 33 text

Simulation Mode
• Calls/casts mapped to command sequences
• Generates sequence of events, calls, casts
• Runs against simulated system of processes
• Shrinks sequence when postconditions fail

Slide 34

Slide 34 text

Prototype/Model -> Test -> Implement -> Retest

Slide 35

Slide 35 text

Tracing Mode
• Generate event sequences, not calls/casts
• Run against external stateful system
• Erlang tracing used to capture the actual calls/casts that occurred
• Verify events + observed calls/casts against model and final cluster state
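As an illustration of the tracing step, the sketch below uses the standard erlang:trace/3 BIF with the 'receive' flag to observe a cast arriving at a process. It is not Hansei code: the module name trace_sketch and the sink/0 process are hypothetical, and a plain process stands in for a real gen_server. It relies on gen_server:cast/2 delivering the request wrapped as {'$gen_cast', Request}.

```erlang
-module(trace_sketch).
-export([demo/0]).

%% Hypothetical traced process: accepts and discards every message.
sink() ->
    receive
        _ -> sink()
    end.

demo() ->
    Pid = spawn(fun sink/0),
    %% The calling process becomes the tracer and is sent
    %% {trace, Pid, 'receive', Msg} for each message Pid receives.
    1 = erlang:trace(Pid, true, ['receive']),
    ok = gen_server:cast(Pid, {gossip, [1, 3]}),
    Observed =
        receive
            {trace, Pid, 'receive', {'$gen_cast', Msg}} ->
                {observed_cast, Msg}
        after 1000 ->
            timeout
        end,
    exit(Pid, kill),
    Observed.
```

A test harness could collect such trace messages during a run and compare the observed casts with what the model predicts.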

Slide 36

Slide 36 text

Tracing + Interception
• Modify implementation to enable controlling message interleaving
• Implemented as a proxy process that delays forwarding messages until told to do so by test module
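The proxy idea can be sketched in a few lines of plain Erlang. This is an assumption-laden illustration, not Hansei's implementation: the module name delay_proxy, the '$release' / '$released' control messages, and the start/1 and release/1 functions are all hypothetical. The proxy queues every message it receives and forwards exactly one to the target each time the controller asks.

```erlang
-module(delay_proxy).
-export([start/1, release/1, demo/0]).

%% Start a proxy in front of Target. Messages sent to the proxy are
%% held in a queue instead of being delivered.
start(Target) ->
    spawn(fun() -> loop(Target, queue:new()) end).

%% Ask the proxy to forward the oldest held message.
%% Returns {ok, Msg} or empty.
release(Proxy) ->
    Proxy ! {'$release', self()},
    receive
        {'$released', Proxy, Result} -> Result
    end.

loop(Target, Held) ->
    receive
        {'$release', From} ->
            case queue:out(Held) of
                {{value, Msg}, Held2} ->
                    Target ! Msg,
                    From ! {'$released', self(), {ok, Msg}},
                    loop(Target, Held2);
                {empty, _} ->
                    From ! {'$released', self(), empty},
                    loop(Target, Held)
            end;
        Msg ->
            loop(Target, queue:in(Msg, Held))
    end.

demo() ->
    Proxy = start(self()),
    Proxy ! first,
    Proxy ! second,
    %% Nothing reaches us until the controller releases a message.
    Held = receive _ -> leaked after 50 -> held end,
    {ok, first} = release(Proxy),
    Got = receive M -> M after 1000 -> timeout end,
    exit(Proxy, kill),
    {Held, Got}.
```

A test module controlling several such proxies could choose which pending message to release next, which is one way to explore different interleavings.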

Slide 37

Slide 37 text

Simple Example
• Nodes join together and form a cluster
• Nodes periodically gossip membership state to other known nodes
• Prototype nodes as gen_servers

Slide 38

Slide 38 text

Node Server (1/5)

get_members(Node) ->
    gen_server:call(Node, get_members).

Slide 39

Slide 39 text

Node Server (2/5)

-record(state, {id, members}).

init(Node) ->
    {ok, #state{id=Node, members=[Node]}}.

handle_call(get_members, _From, State) ->
    {reply, State#state.members, State};
handle_call(get_state, _From, State) ->
    {reply, State, State}.

Slide 40

Slide 40 text

Node Server (3/5)

handle_cast({gossip, #state{members=OtherMembers}},
            State=#state{members=Members}) ->
    %% Merge the gossiped membership into the local view.
    Members2 = ordsets:union(Members, OtherMembers),
    State2 = State#state{members=Members2},
    {noreply, State2}.

Slide 41

Slide 41 text

Node Server (4/5)

events(#state{id=Node, members=Members}) ->
    {call,?MODULE,send_gossip,[Node, [elements(Members)]]}.

precondition({send_gossip, [Node, [OtherNode]]}, S) ->
    all([lists:member(OtherNode, S#state.members),
         Node /= OtherNode]).

Slide 42

Slide 42 text

Node Server (5/5)

handle_event({join, [OtherNode]}, State) ->
    %% Copy the other node's membership and add this node to it.
    OtherState = gen_server:call(OtherNode, get_state),
    Members = OtherState#state.members,
    Members2 = ordsets:add_element(State#state.id, Members),
    {noreply, State#state{members=Members2}};
handle_event({send_gossip, [OtherNode]}, State) ->
    gen_server:cast(OtherNode, {gossip, State}),
    {noreply, State}.

Slide 43

Slide 43 text

Test Module (1/2)

-record(state, {nodes, singleton}).

prop_riak() ->
    hansei_test:simulate(?MODULE).

process_modules() ->
    lists:duplicate(?CLUSTER_SIZE, riak_node).

initial_state(Procs) ->
    #state{nodes=Procs, singleton=Procs}.

Slide 44

Slide 44 text

Test Module (2/2)

events(#state{nodes=Nodes}) ->
    {call,?MODULE,join,[elements(Nodes), [elements(Nodes)]]}.

precondition({join, [Node,[OtherNode]]}, S) ->
    Singleton = S#state.singleton,
    all([Node /= OtherNode,
         lists:member(Node, Singleton),
         (Singleton == S#state.nodes) or lists:member(OtherNode, Singleton)]).

%% Assumes the test state S is passed as the second argument, as in precondition/2.
after_event({join, [Node,[OtherNode]]}, S) ->
    Singleton = S#state.singleton -- [Node, OtherNode],
    S#state{singleton=Singleton}.

Slide 45

Slide 45 text

Extended Example
• Cluster maintains a weak leader
  Lowest node id in the cluster is considered the leader
  No actual leader election or failure detection
• Property we care about
  At all times, there is only one node that believes it is the leader of a cluster

Slide 46

Slide 46 text

Extended Node Server (1/4)

get_leader(Node) ->
    gen_server:call(Node, get_leader).

Slide 47

Slide 47 text

Extended Node Server (2/4)

-record(state, {id, members, leader}).

init(Node) ->
    {ok, #state{id=Node, members=[Node], leader=Node}}.

handle_call(get_leader, _From, State) ->
    {reply, State#state.leader, State};
%% Remaining handle_call clauses (get_members, get_state) as in Node Server (2/5).

Slide 48

Slide 48 text

Extended Node Server (3/4)

handle_cast({gossip, #state{members=OtherMembers, leader=Leader}},
            State=#state{members=Members}) ->
    Members2 = ordsets:union(Members, OtherMembers),
    Leader2 = case is_leader(State) of
                  %% The leader picks the lowest node id among the merged members.
                  true  -> hd(lists:sort(Members2));
                  %% A non-leader adopts the leader carried by the gossiped state.
                  false -> Leader
              end,
    State2 = State#state{members=Members2, leader=Leader2},
    {noreply, State2}.

Slide 49

Slide 49 text

Extended Node Server (4/4)

handle_event({join, [OtherNode]}, State) ->
    OtherState = gen_server:call(OtherNode, get_state),
    #state{members=Members, leader=Leader} = OtherState,
    Members2 = ordsets:add_element(State#state.id, Members),
    {noreply, State#state{members=Members2, leader=Leader}};
handle_event({send_gossip, [OtherNode]}, State) ->
    gen_server:cast(OtherNode, {gossip, State}),
    {noreply, State}.

Slide 50

Slide 50 text

Extended Test Module

always(S) ->
    all([begin
             Members = riak_node:get_members(Node),
             one_leader(Members)
         end || Node <- S#state.nodes]).

one_leader(Members) ->
    %% Collect the nodes that believe they are the leader themselves.
    Leaders = [Leader || Node <- Members,
                         Leader <- [riak_node:get_leader(Node)],
                         Leader == Node],
    length(lists:usort(Leaders)) < 2.

Slide 51

Slide 51 text

Counterexample

[{init,{test_state,undefined,undefined,riak_model,
        0,[],undefined,undefined,simulate}},
 {set,{var,1},{call,hansei_test,init_dynamic,[]}},
 {set,{var,2},{call,hansei_test,init_system,[riak_model]}},
 {set,{var,3},{call,riak_model,join,[1,[3]]}},
 {set,{var,4},{call,hansei_test,rcvmsg,[3,{1,{call,get_state}}]}},
 {set,{var,5},{call,hansei_test,rcvreply,[1,{3,{state,3,[3],3}}]}},
 {set,{var,6},{call,riak_node,send_gossip,[1,[3]]}},
 {set,{var,7},
      {call,hansei_test,rcvmsg,
            [3,{1,{cast,{gossip,{state,1,[1,3],3}}}}]}},
 {set,{var,8},{call,riak_node,send_gossip,[3,[1]]}},
 {set,{var,9},{call,riak_node,send_gossip,[1,[3]]}},
 {set,{var,16},
      {call,hansei_test,rcvmsg,[3,{1,{cast,{gossip,{state,1,[1,3],3}}}}]}},
 {set,{var,18},
      {call,hansei_test,rcvmsg,[1,{3,{cast,{gossip,{state,3,[1,3],1}}}}]}}]
{postcondition,false}

Slide 52

Slide 52 text

[Figure: message sequence diagram across nodes 1, 2, and 3.
Events: join 3; send_gossip 3; send_gossip 3; send_gossip 1.
Messages: call: get_state; cast: gossip([1,3], 3); reply: ([3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 1).
State labels (members, leader): [1,3], 3; [1,3], 1; [3], 3; [1,3], 3; [1], 1; [1,3], 1.]
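The counterexample can be replayed by hand with pure functions, which makes the violation easier to see. The sketch below rests on two assumptions that the slides do not state outright: is_leader/1 means "my leader field equals my own id", and a non-leader adopts the leader carried in the gossiped state. The module name gossip_replay and its helper functions are hypothetical; message delivery order follows the rcvmsg entries in the counterexample.

```erlang
-module(gossip_replay).
-export([replay/0]).

-record(state, {id, members, leader}).

%% Assumption: a node is the leader when it names itself as leader.
is_leader(#state{id = Id, leader = Leader}) -> Id =:= Leader.

init(Id) -> #state{id = Id, members = [Id], leader = Id}.

%% Join: copy the other node's membership and leader, then add ourselves.
join(State, #state{members = Members, leader = Leader}) ->
    State#state{members = ordsets:add_element(State#state.id, Members),
                leader = Leader}.

%% Receive gossip. Assumption: a non-leader adopts the gossiped leader.
gossip(#state{members = OtherMembers, leader = OtherLeader},
       State = #state{members = Members}) ->
    Members2 = ordsets:union(Members, OtherMembers),
    Leader2 = case is_leader(State) of
                  true  -> hd(lists:sort(Members2));
                  false -> OtherLeader
              end,
    State#state{members = Members2, leader = Leader2}.

%% At most one node may believe that it is the leader.
one_leader(States) ->
    Leaders = [Id || #state{id = Id, leader = Leader} <- States,
                     Id =:= Leader],
    length(Leaders) < 2.

replay() ->
    N1a  = join(init(1), init(3)),  % join: 1 joins 3
    MsgA = N1a,                     % send_gossip 1 -> 3
    N3a  = gossip(MsgA, init(3)),   % 3 receives it and makes 1 the leader
    MsgB = N3a,                     % send_gossip 3 -> 1, delivery delayed
    MsgC = N1a,                     % send_gossip 1 -> 3, still names 3
    N3b  = gossip(MsgC, N3a),       % 3 adopts the stale leader 3
    N1b  = gossip(MsgB, N1a),       % 1 adopts leader 1
    {N1b#state.leader, N3b#state.leader, one_leader([N1b, N3b])}.
```

Under these assumptions node 1 ends up naming itself as leader while node 3 names itself as well, so the one-leader property fails on the last delivery, matching the {postcondition,false} result.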

Slide 53

Slide 53 text

Versioned leader state
• Add version number to gossiped state
• Leader increments version when changed
• Node updates leader only if newer version
• After changes, model passes without issue
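A minimal sketch of the versioning rule, replaying the same message order as the counterexample. The details are assumptions rather than the author's code: the vsn field name, starting versions at 0, copying the version on join, and the module name gossip_versioned are all hypothetical. As before, is_leader/1 is assumed to mean "my leader field equals my own id".

```erlang
-module(gossip_versioned).
-export([replay/0]).

-record(state, {id, members, leader, vsn = 0}).

is_leader(#state{id = Id, leader = Leader}) -> Id =:= Leader.

init(Id) -> #state{id = Id, members = [Id], leader = Id}.

%% Join: copy membership, leader, and leader version from the other node.
join(State, #state{members = Members, leader = Leader, vsn = Vsn}) ->
    State#state{members = ordsets:add_element(State#state.id, Members),
                leader = Leader, vsn = Vsn}.

gossip(#state{members = OtherMembers, leader = OtherLeader, vsn = OtherVsn},
       State = #state{members = Members, leader = Leader, vsn = Vsn}) ->
    Members2 = ordsets:union(Members, OtherMembers),
    {Leader2, Vsn2} =
        case is_leader(State) of
            true ->
                case hd(lists:sort(Members2)) of
                    Leader    -> {Leader, Vsn};         % leader unchanged
                    NewLeader -> {NewLeader, Vsn + 1}   % changed: bump version
                end;
            false when OtherVsn > Vsn ->
                {OtherLeader, OtherVsn};                % newer view: adopt it
            false ->
                {Leader, Vsn}                           % stale view: ignore it
        end,
    State#state{members = Members2, leader = Leader2, vsn = Vsn2}.

replay() ->
    N1a = join(init(1), init(3)),   % join: 1 joins 3
    N3a = gossip(N1a, init(3)),     % 3 makes 1 the leader, version becomes 1
    N3b = gossip(N1a, N3a),         % stale gossip from 1 (version 0) is ignored
    N1b = gossip(N3a, N1a),         % 1 adopts leader 1 at version 1
    {N1b#state.leader, N3b#state.leader}.
```

With the version check in place, the stale gossip that previously reinstated node 3 as leader is ignored, and both nodes end up agreeing on node 1.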

Slide 54

Slide 54 text

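[Figure: message sequence diagram across nodes 1, 2, and 3, with versioned leader state.
Events: join 3; send_gossip 3; send_gossip 3; send_gossip 1.
Messages: call: get_state; cast: gossip([1,3], 3); reply: ([3], 3); cast: gossip([1,3], 3); cast: gossip([1,3], 1).
State labels (members, leader): [1,3], 1; [1,3], 1; [3], 3; [1,3], 3; [1], 1; [1,3], 1.]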

Slide 55

Slide 55 text

Riak Implementation
• Simple example similar to Riak clustering system
• Can run tracing/interception mode against Riak
• Use riak_test to bring up multiple Riak nodes
• Change process_modules to return a list
  [{node(), riak_core_gossip}]

Slide 56

Slide 56 text

Open source
• Hansei will be released as open source
  http://github.com/basho/hansei
• Apache License (most likely)
• Soon!

Slide 57

Slide 57 text

Future Plans
• Simulate monitors + links
• Simulate dropping messages
  Earlier prototype did, recent changes broke code
• Support process exits, supervisors
• Add properties to most of riak_core
• Use Hansei in construction of basho_ensemble
  New dynamic ensemble, leader election library

Slide 58

Slide 58 text

Basho is hiring! joe@basho.com @jtuple