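Layers
• Upper layer: uses the lower layer
• Lower layer: provides a service to the upper layer
• Downcalls go from the upper layer to the lower layer; the lower layer
implements downcalls to receive them
• Upcalls go from the lower layer to the upper layer; the upper layer
implements upcalls to receive them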
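A minimal C++ sketch of the layering idea, showing that a downcall and an upcall are both plain function calls across an interface. The names TransportDowncalls, TransportUpcalls, LoopbackTransport and EchoService are hypothetical and are not taken from Mace; the loopback "network" is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface the lower layer provides (downcalls).
struct TransportDowncalls {
    virtual ~TransportDowncalls() = default;
    virtual void send(const std::string& dest, const std::string& payload) = 0;
};

// Hypothetical interface the upper layer implements to receive upcalls.
struct TransportUpcalls {
    virtual ~TransportUpcalls() = default;
    virtual void deliver(const std::string& src, const std::string& payload) = 0;
};

// Lower layer: implements the downcalls and issues upcalls to its handler.
class LoopbackTransport : public TransportDowncalls {
public:
    void setHandler(TransportUpcalls* handler) { handler_ = handler; }

    void send(const std::string& dest, const std::string& payload) override {
        assert(handler_ != nullptr && "setHandler must be called first");
        // Stand-in for a network: the message comes straight back up.
        handler_->deliver(dest, payload);
    }

private:
    TransportUpcalls* handler_ = nullptr;
};

// Upper layer: uses the downcalls and implements the upcalls.
class EchoService : public TransportUpcalls {
public:
    explicit EchoService(TransportDowncalls& lower) : lower_(lower) {}

    void say(const std::string& dest, const std::string& text) {
        lower_.send(dest, text);  // downcall
    }

    void deliver(const std::string& src, const std::string& payload) override {
        received.push_back(src + ":" + payload);  // upcall handler
    }

    std::vector<std::string> received;

private:
    TransportDowncalls& lower_;
};
```

Wiring is done by the caller: construct the transport, construct the service on top of it, then register the service as the transport's upcall handler.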
Slide 15
Layers
Slide 16
Layers
Slide 17
Layers
Events are implemented as
simple function calls
Slide 18
Concurrency
Slide 19
Concurrency
Modelled using state machines
with transitions and guards
Slide 20
Concurrency
Slide 21
Concurrency
• State = enumeration and variables
• Transitions = upcalls from below, downcalls
from above, scheduled events
• Guards = only transition if condition is true
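The three bullets above can be sketched in plain C++ as a table of guarded transitions. This is an illustration of the model only, not the code Mace generates; Transition, Service and makeJoinService are hypothetical names, and the two-step join protocol is invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// State = an enumeration (plus any variables the service keeps).
enum class State { Init, Joining, Joined };

// One transition: the event that triggers it, a guard, and an action.
struct Transition {
    std::string event;                   // upcall, downcall, or scheduled event
    std::function<bool(State)> guard;    // transition only if this returns true
    std::function<State(State)> action;  // runs the handler, returns the next state
};

class Service {
public:
    void add(Transition t) {
        assert(t.guard != nullptr && t.action != nullptr);
        table_.push_back(std::move(t));
    }

    // Fire the first transition whose event matches and whose guard holds.
    // Returns false when the event is ignored in the current state.
    bool dispatch(const std::string& event) {
        for (const Transition& t : table_) {
            if (t.event == event && t.guard(state_)) {
                state_ = t.action(state_);
                return true;
            }
        }
        return false;
    }

    State state() const { return state_; }

private:
    State state_ = State::Init;
    std::vector<Transition> table_;
};

// Hypothetical join protocol used only to exercise the sketch.
Service makeJoinService() {
    Service s;
    s.add({"join", [](State st) { return st == State::Init; },
           [](State) { return State::Joining; }});
    s.add({"joinReply", [](State st) { return st == State::Joining; },
           [](State) { return State::Joined; }});
    return s;
}
```

A "joinReply" that arrives while the service is still in Init is simply dropped, because no guard for that event holds in that state.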
Slide 22
Concurrency
Slide 23
Concurrency
Slide 24
Concurrency
states { init; preJoining; joining; joined; }

state_variables {
    NodeKey myhash;
    /* ... */
}

transitions {
    /* scheduled event, handled only in the joined state */
    scheduler global_maintenance()
    guard ( state == joined ) {
        NodeKey d = myhash;
        /* ... */
        TCP.route(n, GlobalSample(d));
    }

    /* upcall from the layer below, handled only in the joined state */
    upcall forward(const NodeKey& from, const NodeKey& to,
                   /* etc */)
    guard ( state == joined ) {
        nextHop = make_routing_decision(msg.key);
        return true;
    }
    /* etc */
}
Slide 25
Domain Specific Language (DSL)
states { init; preJoining; joining; joined; }

state_variables {
    NodeKey myhash;
    /* ... */
}

transitions {
    /* scheduled event, handled only in the joined state */
    scheduler global_maintenance()
    guard ( state == joined ) {
        NodeKey d = myhash;
        /* ... */
        TCP.route(n, GlobalSample(d));
    }

    /* upcall from the layer below, handled only in the joined state */
    upcall forward(const NodeKey& from, const NodeKey& to,
                   /* etc */)
    guard ( state == joined ) {
        nextHop = make_routing_decision(msg.key);
        return true;
    }
    /* etc */
}
Slide 26
Failures
Slide 27
Failures
Uses aspects for detecting both failures
and inconsistencies
Slide 28
Failures
Programmer supplies predicates that
detect failure situations; these are
monitored and acted upon.
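A rough C++ sketch of programmer-supplied failure predicates that are monitored and acted upon. It does not use Mace's aspect syntax; NodeView, FailureMonitor and the two example predicates are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot of the state a predicate may inspect.
struct NodeView {
    int missedHeartbeats;
    bool hasSuccessor;
};

class FailureMonitor {
public:
    using Predicate = std::function<bool(const NodeView&)>;
    using Handler = std::function<void(const std::string&)>;

    // Register a named predicate that returns true when a failure is detected.
    void watch(std::string name, Predicate p) {
        assert(p != nullptr);
        preds_.emplace_back(std::move(name), std::move(p));
    }

    // Evaluate every predicate against the snapshot, invoke the handler for
    // each one that fires, and return how many fired.
    int check(const NodeView& view, const Handler& onFailure) const {
        int fired = 0;
        for (const auto& entry : preds_) {
            if (entry.second(view)) {
                onFailure(entry.first);
                ++fired;
            }
        }
        return fired;
    }

private:
    std::vector<std::pair<std::string, Predicate>> preds_;
};
```

In this sketch the caller decides when to run check, for example after every state change or on a timer; how Mace schedules its monitoring is not covered here.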
Analysis
With this model, we can find performance
and correctness problems
Slide 33
Analysis
Slide 34
Analysis
• Uses aspects to generate debugging and
logging code
• Causal-paths: Think distributed call graphs
• Model checking: Detect liveness violations
Slide 35
Model Checking
Slide 36
Model Checking
Instead of searching for safety violations (something bad happens),
search for violations of liveness properties (something good never
happens).
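The difference can be sketched in C++. A safety check needs only the current state, while a liveness check has to look at a whole execution; the bounded random walk below is a heuristic stand-in, not MaceMC's actual implementation. State, Successors, Property and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using State = int;
using Successors = std::function<std::vector<State>(State)>;
using Property = std::function<bool(State)>;

// Safety: a violation is visible the moment it occurs, so one state suffices.
bool violatesSafety(State s, const Property& invariant) {
    return !invariant(s);
}

// Liveness heuristic: walk randomly for at most maxSteps steps and report
// whether the "eventually" property ever became true. A false result is only
// a suspected violation, since a longer walk might still have succeeded.
bool reachesLiveState(State start, const Successors& next,
                      const Property& eventually, std::size_t maxSteps,
                      std::mt19937& rng) {
    assert(next != nullptr && eventually != nullptr);
    State s = start;
    for (std::size_t step = 0; step <= maxSteps; ++step) {
        if (eventually(s)) return true;
        const std::vector<State> succ = next(s);
        if (succ.empty()) return false;  // deadlock: nothing more can happen
        std::uniform_int_distribution<std::size_t> pick(0, succ.size() - 1);
        s = succ[pick(rng)];
    }
    return false;
}
```

The step bound is the weak point of the heuristic: set it too low and live executions are reported as suspects, set it too high and each walk becomes expensive.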
Slide 37
Model Checking
Slide 38
Model Checking
Slide 39
Model Checking
The authors found 52 bugs using MaceMC,
including a bug in Pastry and FreePastry.
Slide 40
Model Checking
Slide 41
Model Checking
« While our experience is restricted to MaceMC, we
believe our random execution algorithms for finding
liveness violations and the critical transition generalize to
any state-exploration model checker capable of
replaying executions. »
Slide 42
Model Checking
« we believe our algorithms generalize to any model
checker capable of replaying executions. »
Slide 43
Search Algorithms
Slide 44
Search Algorithms
• Bounded depth-first search (BDFS)
• Random walks
• Isolating the critical transition
• Combined exhaustive search and random
walks
• Reducing the search space
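Two of the items above, sketched in C++ under stated assumptions. boundedDfs explores every state reachable within a depth bound. firstDeadIndex isolates the critical transition in a violating execution by binary search; it assumes an oracle canRecover (in practice perhaps approximated by random walks) and assumes that once an execution can no longer recover, it never can again. All names are hypothetical and this is not MaceMC's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

using State = int;
using Successors = std::function<std::vector<State>(State)>;

// Bounded depth-first search: visit every state reachable from s in at most
// depthLeft steps. The map records the largest remaining depth at which each
// state was expanded, so a state is re-expanded only when reached with more
// budget than before.
void boundedDfs(State s, std::size_t depthLeft, const Successors& next,
                std::map<State, std::size_t>& bestDepthLeft) {
    auto it = bestDepthLeft.find(s);
    if (it != bestDepthLeft.end() && it->second >= depthLeft) return;
    bestDepthLeft[s] = depthLeft;
    if (depthLeft == 0) return;
    for (State n : next(s)) {
        boundedDfs(n, depthLeft - 1, next, bestDepthLeft);
    }
}

// Critical transition: returns the index of the first state on the path from
// which the liveness property can no longer be reached, or path.size() if
// every state can still recover. The critical transition is the step that
// leads into that state.
std::size_t firstDeadIndex(const std::vector<State>& path,
                           const std::function<bool(State)>& canRecover) {
    assert(canRecover != nullptr);
    std::size_t lo = 0;
    std::size_t hi = path.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (canRecover(path[mid])) {
            lo = mid + 1;  // still recoverable: the dead state comes later
        } else {
            hi = mid;      // already dead: the first dead state is here or earlier
        }
    }
    return lo;
}
```

With an approximate oracle the monotonicity assumption may not hold exactly, in which case the result should be treated as a hint about where to look rather than a proof.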
Slide 45
Lots of cool stuff here!
Slide 46
Can we use this in our
projects?
Slide 47
Some questions
Slide 48
Some questions
• Events are functions and therefore in-process?
No remote events?
• Are states set explicitly by code?
• I’m missing a discussion on
synchronization and details on external
event handling
• ... but this is all in the C++ source code :)
Slide 49
References
• Charles Killian et al., «Mace: Language
support for building distributed systems»,
UCSD
• Charles Killian et al., «Life, Death, and the
Critical Transition: Finding Liveness Bugs
in Systems Code»