The Mace Model Checker

This was an internal presentation at the University of Stavanger on the Mace Model Checker (MaceMC). The model checker can be used to check distributed, concurrent systems for correctness.

Christian Stigen Larsen

January 17, 2013

Transcript

  1. The Mace Model Checker. Christian Stigen Larsen, UiS, 2013-01-17

  3. « [Mace consists of tools to] enhance development, testing and understanding of the execution of distributed systems ... »
  5. [Figure: High-level description → Verification and compilation → Generated code]
  6. Design principles

  7. Design principles
    • Service objects in a hierarchy of layers
    • Events as a unified concurrency model
    • Aspects for cross-cutting concerns
  8. Layers

  9. Layers Mace organizes everything into layers.

  10. Layers

  11. Layers
    • Upcalls / Downcalls
    • Used interfaces / Provided interfaces
  12. Layers

  13. Layers [Diagram: an upper layer and a lower layer, labelled Provides, Uses, Upcalls, Downcalls]
  14. Layers [Diagram: an upper layer and a lower layer, labelled Provides, Uses, Upcalls, Downcalls]
    • Implement downcalls to receive them
    • Implement upcalls to receive them
  15. Layers

  16. Layers

  17. Layers Events are implemented as simple function calls
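The layering described on the slides above can be illustrated with a minimal C++ sketch. It is not Mace-generated code. The names `Transport`, `DeliverHandler`, `LoopbackTransport` and `EchoService` are hypothetical and only show the idea that a downcall and an upcall are both plain virtual function calls between an upper and a lower layer.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface PROVIDED by the lower layer.
// The upper layer USES it by making downcalls.
struct Transport {
    virtual ~Transport() = default;
    virtual void route(const std::string& dest, const std::string& msg) = 0;
};

// Hypothetical handler interface: the lower layer makes upcalls on it,
// and the upper layer implements it to receive them.
struct DeliverHandler {
    virtual ~DeliverHandler() = default;
    virtual void deliver(const std::string& src, const std::string& msg) = 0;
};

// Lower layer: implements the downcall route(). As a toy, it only
// "delivers" messages addressed to itself, straight back up the stack.
class LoopbackTransport : public Transport {
public:
    explicit LoopbackTransport(std::string self) : self_(std::move(self)) {}

    void setHandler(DeliverHandler* handler) { handler_ = handler; }

    void route(const std::string& dest, const std::string& msg) override {
        if (dest == self_ && handler_ != nullptr) {
            handler_->deliver(self_, msg);  // upcall: an ordinary function call
        }
    }

private:
    std::string self_;
    DeliverHandler* handler_ = nullptr;
};

// Upper layer: implements the upcall deliver() and issues downcalls.
class EchoService : public DeliverHandler {
public:
    explicit EchoService(Transport& lower) : lower_(lower) {}

    void send(const std::string& dest, const std::string& msg) {
        lower_.route(dest, msg);  // downcall: an ordinary function call
    }

    void deliver(const std::string& src, const std::string& msg) override {
        inbox.push_back(src + ": " + msg);
    }

    std::vector<std::string> inbox;  // messages received via upcalls

private:
    Transport& lower_;
};
```

To wire the two layers together, construct a `LoopbackTransport`, pass it to an `EchoService`, and register the service with `setHandler`. A `send` to the node's own name then ends up in `inbox`, while a message to any other name is silently dropped by this toy transport.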

  18. Concurrency

  19. Concurrency Modelled using state machines with transitions and guards

  20. Concurrency

  21. Concurrency
    • State = enumeration and variables
    • Transitions = upcalls from below, downcalls from above, scheduled events
    • Guards = only transition if condition is true
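As a rough illustration of this model, the following C++ sketch treats a transition as a guard plus a body and fires it only when the guard holds. It is a hand-written approximation, not what the Mace compiler emits. The state names are taken from the slides, while `Node`, `Transition`, `fire`, `maintenanceRuns` and the `joinComplete` transition are assumptions made for the example.

```cpp
#include <functional>
#include <string>

// State = enumeration and variables.
enum class State { init, preJoining, joining, joined };

struct Node {
    State state = State::init;  // the enumeration part of the state
    int maintenanceRuns = 0;    // hypothetical state variable
};

// A transition pairs a guard with a body.
struct Transition {
    std::string name;
    std::function<bool(const Node&)> guard;
    std::function<void(Node&)> body;
};

// Run the body only if the guard is true; report whether it fired.
bool fire(const Transition& t, Node& n) {
    if (!t.guard(n)) {
        return false;
    }
    t.body(n);
    return true;
}

// Scheduled event, loosely modelled on global_maintenance() from the slides:
// it may only run once the node has joined.
Transition globalMaintenance() {
    return Transition{
        "global_maintenance",
        [](const Node& n) { return n.state == State::joined; },
        [](Node& n) { ++n.maintenanceRuns; }};
}

// Hypothetical transition that completes a join.
Transition joinComplete() {
    return Transition{
        "join_complete",
        [](const Node& n) { return n.state == State::joining; },
        [](Node& n) { n.state = State::joined; }};
}
```

Calling `fire(globalMaintenance(), node)` on a freshly constructed `Node` does nothing, because the guard `state == joined` is false. After the node reaches `joined`, the same call runs the body.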
  22. Concurrency

  23. Concurrency

  24. Concurrency

        states {
          init;
          preJoining;
          joining;
          joined;
        }

        state_variables {
          NodeKey myhash;
          /* ... */
        }

        transitions {
          scheduler global_maintenance() guard ( state == joined ) {
            NodeKey d = myhash;
            /* ... */
            TCP.route(n, GlobalSample(d));
          }

          upcall forward(const NodeKey& from, const NodeKey& to, /* etc */)
              guard ( state == joined ) {
            nextHop = make_routing_decision(msg.key);
            return true;
          }

          /* etc */
        }
  25. Domain Specific Language (DSL) [same code as on slide 24]
  26. Failures

  27. Failures Uses aspects for detecting both failures and inconsistencies

  28. Failures The programmer supplies predicates that detect failure situations; these are monitored and acted upon.
  29. Failures

  30. Failures

        // local detection
        detect {
          guard = (range != pre(range));
          error = notifyNewRange;
        }

        // distributed detection across myleafset
        detect {
          guard = (state == joined);
          nodes = myleafset;
          send = {
            message = LeafsetPush(myhash, myleafset);
            period = 5sec;
          }
          receive = {
            message = LeafsetPull;
            period = 5min;
          }
          error = leafFailed;
        }
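The local detection rule above compares a value with its previous value and triggers an error handler when they differ. A minimal C++ sketch of that idea could look as follows. The class `LocalDetector` and its `check` method are hypothetical names for this example, and the sketch assumes that `pre(range)` means "the value of `range` before the last step"; how Mace actually weaves such checks into a service is not shown on the slides.

```cpp
#include <functional>
#include <utility>

// Watches one value and remembers what it was before the latest step.
template <typename T>
class LocalDetector {
public:
    using ErrorHandler = std::function<void(const T& before, const T& after)>;

    LocalDetector(T initial, ErrorHandler onError)
        : pre_(std::move(initial)), onError_(std::move(onError)) {}

    // Call after each step with the current value.
    // Mirrors: guard = (range != pre(range)); error = notifyNewRange;
    // Returns true if the guard held and the error handler was invoked.
    bool check(const T& current) {
        const bool fired = (current != pre_);
        if (fired) {
            onError_(pre_, current);
        }
        pre_ = current;  // the current value becomes the next "pre" value
        return fired;
    }

private:
    T pre_;
    ErrorHandler onError_;
};
```

A detector created with an initial value of 10 stays silent on `check(10)`, invokes the handler once on `check(12)`, and is silent again on a second `check(12)`, since the remembered value has been updated.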
  31. Analysis

  32. Analysis With this model, we can find performance and correctness problems
  33. Analysis

  34. Analysis
    • Uses aspects to generate debugging and logging code
    • Causal paths: think distributed call graphs
    • Model checking: detect liveness violations
  35. Model Checking

  36. Model Checking Instead of finding safety violations, check for liveness properties.
  37. Model Checking

  38. Model Checking

  39. Model Checking The authors found 52 bugs using MaceMC, including a bug in Pastry and FreePastry.
  40. Model Checking

  41. Model Checking « While our experience is restricted to MaceMC, we believe our random execution algorithms for finding liveness violations and the critical transition generalize to any state-exploration model checker capable of replaying executions. »
  42. Model Checking « we believe our algorithms generalize to any model checker capable of replaying executions. »
  43. Search Algorithms

  44. Search Algorithms
    • Bounded depth-first search (BDFS)
    • Random walks
    • Isolating the critical transition
    • Combined exhaustive search and random walks
    • Reducing the search space
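To make the combination of bounded depth-first search and random walks more concrete, here is a small C++ sketch over a toy state graph. It is not MaceMC's implementation. The integer state space, `toyNext`, `toyLive` and all function names are assumptions for illustration. The idea shown is to enumerate states up to a depth bound and then start random walks from each of them; a state from which no walk reaches a live state is only suspected of being dead, since a finite number of bounded walks cannot prove it.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using State = int;
using Successors = std::function<std::vector<State>(State)>;
using Predicate = std::function<bool(State)>;

bool contains(const std::vector<State>& v, State s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// Bounded DFS: record every state visited along paths of at most
// maxDepth transitions (states may be recorded more than once).
void boundedDfs(State s, int maxDepth, const Successors& next,
                std::vector<State>& visited) {
    visited.push_back(s);
    if (maxDepth == 0) return;
    for (State t : next(s)) boundedDfs(t, maxDepth - 1, next, visited);
}

// Random walk: true if a live state is reached within maxSteps transitions.
bool randomWalkReachesLive(State s, int maxSteps, const Successors& next,
                           const Predicate& live, std::mt19937& rng) {
    for (int step = 0; step <= maxSteps; ++step) {
        if (live(s)) return true;
        const std::vector<State> succ = next(s);
        if (succ.empty()) return false;
        std::uniform_int_distribution<std::size_t> pick(0, succ.size() - 1);
        s = succ[pick(rng)];
    }
    return false;
}

// Combined search: states in the bounded prefix from which none of `walks`
// random walks reached a live state. These are suspects, not proofs.
std::vector<State> suspectedDeadStates(State start, int depth, int walks,
                                       int maxSteps, const Successors& next,
                                       const Predicate& live,
                                       std::mt19937& rng) {
    std::vector<State> visited;
    boundedDfs(start, depth, next, visited);
    std::vector<State> suspects;
    for (State s : visited) {
        bool reached = false;
        for (int i = 0; i < walks && !reached; ++i) {
            reached = randomWalkReachesLive(s, maxSteps, next, live, rng);
        }
        if (!reached && !contains(suspects, s)) suspects.push_back(s);
    }
    return suspects;
}

// Hypothetical toy system: 0 -> {1, 2}, 1 -> 3, 2 -> 4; 3 and 4 loop forever.
std::vector<State> toyNext(State s) {
    switch (s) {
        case 0: return {1, 2};
        case 1: return {3};
        case 2: return {4};
        default: return {s};
    }
}

// In the toy system only state 3 satisfies the liveness property.
bool toyLive(State s) { return s == 3; }
```

In the toy graph, states 2 and 4 can never reach the live state 3, so they are always reported as suspects, whereas states 1 and 3 never are. The step from 0 to 2 is the point where the system loses the ability to recover, which is the kind of step the "critical transition" bullet refers to; isolating it automatically is not part of this sketch.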
  45. Lots of cool stuff here!

  46. Can we use this in our projects?

  47. Some questions

  48. Some questions
    • Events are functions and therefore in-process? No remote events?
    • Are states set explicitly by code?
    • I’m missing a discussion on synchronization and details on external event handling
    • ... but this is all in the C++ source code :)
  49. References
    • Charles Killian et al., «Mace: Language support for building distributed systems», UCSD
    • Charles Killian et al., «Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code»