Slide 1

Slide 1 text

The Mace Model Checker
Christian Stigen Larsen, UiS
2013-01-17

Slide 3

Slide 3 text

« [Mace consists of tools to] enhance development, testing and understanding of the execution of distributed systems ... »

Slide 5

Slide 5 text

[Diagram: High-level description → Verification and compilation → Generated code]

Slide 6

Slide 6 text

Design principles

Slide 7

Slide 7 text

Design principles
• Service objects in a hierarchy of layers
• Events as a unified concurrency model
• Aspects for cross-cutting concerns

Slide 8

Slide 8 text

Layers

Slide 9

Slide 9 text

Layers
Mace organizes everything into layers.

Slide 10

Slide 10 text

Layers

Slide 11

Slide 11 text

Layers
• Upcalls / Downcalls
• Used interfaces / Provided interfaces

Slide 12

Slide 12 text

Layers

Slide 13

Slide 13 text

Layers
[Diagram: an upper layer and a lower layer, connected by provides/uses relations, with upcalls and downcalls between them]

Slide 14

Slide 14 text

Layers
[Diagram: an upper layer and a lower layer, connected by provides/uses relations, with upcalls and downcalls between them]
• Implement downcalls to receive them
• Implement upcalls to receive them

Slide 15

Slide 15 text

Layers

Slide 16

Slide 16 text

Layers

Slide 17

Slide 17 text

Layers
Events are implemented as simple function calls.
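To make the layering concrete, here is a minimal Python sketch of two layers that talk through plain function calls. It is an illustration only, not Mace's generated C++: the class names `Transport` and `Overlay` and the methods `register`, `route`, `send` and `deliver` are hypothetical.

```python
class Transport:
    """Lower layer: implements `route`, so it can receive that downcall."""

    def __init__(self):
        self.upper = None  # filled in when an upper layer registers itself

    def register(self, upper):
        self.upper = upper

    def route(self, dest, message):
        # Downcall received from the layer above. A real transport would
        # send over the network; this sketch delivers locally instead.
        self.upper.deliver(dest, message)  # upcall: a plain function call


class Overlay:
    """Upper layer: uses the transport and implements the `deliver` upcall."""

    def __init__(self, lower):
        self.lower = lower
        self.received = []
        lower.register(self)

    def send(self, dest, message):
        self.lower.route(dest, message)  # downcall: a plain function call

    def deliver(self, dest, message):
        # Upcall received from the layer below.
        self.received.append((dest, message))


transport = Transport()
overlay = Overlay(transport)
overlay.send("node-b", "hello")
```

After the last line, `overlay.received` holds the one message that travelled down as a downcall and came back up as an upcall.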

Slide 18

Slide 18 text

Concurrency

Slide 19

Slide 19 text

Concurrency
Modelled using state machines with transitions and guards.

Slide 20

Slide 20 text

Concurrency

Slide 21

Slide 21 text

Concurrency
• State = enumeration and variables
• Transitions = upcalls from below, downcalls from above, scheduled events
• Guards = only transition if condition is true

Slide 22

Slide 22 text

Concurrency

Slide 23

Slide 23 text

Concurrency

Slide 24

Slide 24 text

Concurrency
states {
  init;
  preJoining;
  joining;
  joined;
}
state_variables {
  NodeKey myhash;
  /* ... */
}
transitions {
  scheduler global_maintenance() guard ( state == joined ) {
    NodeKey d = myhash;
    /* ... */
    TCP.route(n, GlobalSample(d));
  }
  upcall forward(const NodeKey& from, const NodeKey& to, /* etc */)
      guard ( state == joined ) {
    nextHop = make_routing_decision(msg.key);
    return true;
  }
  /* etc */
}

Slide 25

Slide 25 text

Domain Specific Language (DSL)
states {
  init;
  preJoining;
  joining;
  joined;
}
state_variables {
  NodeKey myhash;
  /* ... */
}
transitions {
  scheduler global_maintenance() guard ( state == joined ) {
    NodeKey d = myhash;
    /* ... */
    TCP.route(n, GlobalSample(d));
  }
  upcall forward(const NodeKey& from, const NodeKey& to, /* etc */)
      guard ( state == joined ) {
    nextHop = make_routing_decision(msg.key);
    return true;
  }
  /* etc */
}
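The guard semantics of the listing above can be mimicked in a few lines of Python. This is a rough sketch under stated assumptions, not how Mace compiles its DSL: the `guarded` decorator, the `Node` class, the `join` transition and the `sent` list are all hypothetical, and an event whose guard fails is simply ignored here.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    PRE_JOINING = auto()
    JOINING = auto()
    JOINED = auto()


def guarded(predicate):
    """Run the wrapped transition only when predicate(self) is true."""
    def decorate(transition):
        def wrapper(self, *args, **kwargs):
            if not predicate(self):
                return None  # guard failed: the event is ignored
            return transition(self, *args, **kwargs)
        return wrapper
    return decorate


class Node:
    def __init__(self, myhash):
        self.state = State.INIT   # enumeration part of the state
        self.myhash = myhash      # variable part of the state
        self.sent = []            # stands in for messages routed over TCP

    @guarded(lambda self: self.state == State.INIT)
    def join(self):
        # Hypothetical transition; skips the intermediate states for brevity.
        self.state = State.JOINED

    @guarded(lambda self: self.state == State.JOINED)
    def global_maintenance(self):
        self.sent.append(("GlobalSample", self.myhash))
        return True


node = Node(myhash=42)
before = node.global_maintenance()  # guard fails, nothing is sent
node.join()
after = node.global_maintenance()   # guard holds, one message is recorded
```

In this sketch the state is set explicitly inside a transition body; whether Mace requires the same is one of the questions raised at the end of the talk.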

Slide 26

Slide 26 text

Failures

Slide 27

Slide 27 text

Failures
Uses aspects for detecting both failures and inconsistencies.

Slide 28

Slide 28 text

Failures
The programmer supplies predicates that detect failure situations; these are monitored and acted upon.

Slide 29

Slide 29 text

Failures

Slide 30

Slide 30 text

Failures
// local detection
detect {
  guard = (range != pre(range));
  error = notifyNewRange;
}
// distributed detection across myleafset
detect {
  guard = (state == joined);
  nodes = myleafset;
  send = {
    message = LeafsetPush(myhash, myleafset);
    period = 5sec;
  }
  receive = {
    message = LeafsetPull;
    period = 5min;
  }
  error = leafFailed;
}
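The local detection block above can be approximated in Python as a monitor that compares a value with its previous reading. This is only a sketch: the `Detector` class and its parameters are hypothetical, and it assumes that `pre(range)` means the value of `range` at the previous check.

```python
class Detector:
    """Calls `on_error` whenever `guard(previous, current)` holds."""

    def __init__(self, read, guard, on_error):
        self.read = read          # returns the current monitored value
        self.guard = guard        # predicate over (previous, current)
        self.on_error = on_error  # handler run when the guard is true
        self.previous = read()

    def check(self):
        current = self.read()
        if self.guard(self.previous, current):
            self.on_error(current)
        self.previous = current


node = {"range": (0, 100)}
notifications = []

detector = Detector(
    read=lambda: node["range"],
    guard=lambda pre, now: now != pre,  # range != pre(range)
    on_error=notifications.append,      # stands in for notifyNewRange
)

detector.check()          # nothing changed, no notification
node["range"] = (0, 50)
detector.check()          # range changed, handler fires once
detector.check()          # stable again, no notification
```

The distributed variant would additionally send and expect messages on a timer; that part is left out of this sketch.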

Slide 31

Slide 31 text

Analysis

Slide 32

Slide 32 text

Analysis
With this model, we can find performance and correctness problems.

Slide 33

Slide 33 text

Analysis

Slide 34

Slide 34 text

Analysis
• Uses aspects to generate debugging and logging code
• Causal paths: think distributed call graphs
• Model checking: detect liveness violations

Slide 35

Slide 35 text

Model Checking

Slide 36

Slide 36 text

Model Checking
Instead of only finding safety violations (something bad happens), check for liveness properties (something good eventually happens).

Slide 37

Slide 37 text

Model Checking

Slide 38

Slide 38 text

Model Checking

Slide 39

Slide 39 text

Model Checking
The authors found 52 bugs using MaceMC, including a bug in Pastry and FreePastry.

Slide 40

Slide 40 text

Model Checking

Slide 41

Slide 41 text

Model Checking
« While our experience is restricted to MaceMC, we believe our random execution algorithms for finding liveness violations and the critical transition generalize to any state-exploration model checker capable of replaying executions. »

Slide 42

Slide 42 text

Model Checking
« we believe our algorithms generalize to any model checker capable of replaying executions. »

Slide 43

Slide 43 text

Search Algorithms

Slide 44

Slide 44 text

Search Algorithms
• Bounded depth-first search (BDFS)
• Random walks
• Isolating the critical transition
• Combined exhaustive search and random walks
• Reducing the search space
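A toy Python sketch of the first three ideas in the list above, on a hand-made transition graph. Everything here is an assumption made for illustration: the `GRAPH`, the `LIVE` set and all function names are hypothetical, "dead" is only estimated by random walks, and the critical transition is found with a simple linear scan, which is not necessarily how MaceMC does it.

```python
import random

# Toy transition system (hypothetical): state -> successor states.
# State 3 is the "live" state; states 4 and 5 form a trap with no way out.
GRAPH = {
    0: [1],
    1: [2, 4],
    2: [3, 1],
    3: [3],
    4: [5],
    5: [4],
}
LIVE = {3}


def bounded_dfs(state, depth, is_bad, path=()):
    """Return the first path of at most `depth` transitions to a bad state."""
    path = path + (state,)
    if is_bad(state):
        return path
    if depth == 0:
        return None
    for nxt in GRAPH[state]:
        found = bounded_dfs(nxt, depth - 1, is_bad, path)
        if found is not None:
            return found
    return None


def random_walk_reaches_live(state, rng, max_steps=50):
    """Walk randomly from `state`; report whether a live state is reached."""
    for _ in range(max_steps):
        if state in LIVE:
            return True
        state = rng.choice(GRAPH[state])
    return state in LIVE


def looks_dead(state, rng, walks=200):
    """Heuristic: dead if no random walk from `state` reaches liveness."""
    return not any(random_walk_reaches_live(state, rng) for _ in range(walks))


def critical_transition(path, rng):
    """Return the first step (a, b) of `path` that enters a dead state."""
    for a, b in zip(path, path[1:]):
        if not looks_dead(a, rng) and looks_dead(b, rng):
            return (a, b)
    return None


rng = random.Random(0)
violating_path = bounded_dfs(0, depth=4, is_bad=lambda s: s == 5)
step = critical_transition(violating_path, rng)
```

On this graph the bounded search returns the path 0, 1, 4, 5, and the scan reports the step from 1 to 4 as the point after which no random walk recovers.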

Slide 45

Slide 45 text

Lots of cool stuff here!

Slide 46

Slide 46 text

Can we use this in our projects?

Slide 47

Slide 47 text

Some questions

Slide 48

Slide 48 text

Some questions
• Events are functions and therefore in-process? No remote events?
• Are states set explicitly by code?
• I’m missing a discussion on synchronization and details on external event handling
• ... but this is all in the C++ source code :)

Slide 49

Slide 49 text

References
• Charles Killian et al., «Mace: Language Support for Building Distributed Systems», UCSD
• Charles Killian et al., «Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code»
