The Mace Model Checker

This was an internal presentation at the University of Stavanger on the Mace Model Checker. It can be used to model check distributed, concurrent systems for correctness.

Christian Stigen Larsen

January 17, 2013

Transcript

  1. The Mace Model Checker
    Christian Stigen Larsen, UiS
    2013-01-17

  3. « [Mace consists of tools to] enhance
    development, testing and understanding of the
    execution of distributed systems ... »

  5. [Diagram: a high-level description passes through verification and
    compilation to produce generated code]

  7. Design principles
    • Service objects in hierarchy of layers
    • Events as a unified concurrency model
    • Aspects for cross-cutting concerns

  9. Layers
    Mace organizes everything into layers.

  11. Layers
    • Upcalls / Downcalls
    • Used interfaces / Provided interfaces

  13. Layers
    [Diagram: an upper layer uses the interface that a lower layer
    provides. Downcalls go from the upper layer to the lower layer;
    upcalls go from the lower layer to the upper layer.]

  14. Layers
    [Same diagram, annotated: a layer implements downcalls to receive
    them, and implements upcalls to receive them.]

  17. Layers
    Events are implemented as
    simple function calls
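
A minimal sketch of what "events as simple function calls" could look like, written in Python rather than Mace's own language. The class and method names (`Transport`, `Overlay`, `route`, `deliver`, `send`) are hypothetical and not taken from Mace; the transport simply loops messages back instead of using a network.

```python
class Transport:
    """Lower layer: provides the route() downcall, issues deliver() upcalls."""

    def __init__(self):
        self.upper = None  # the layer above, which receives our upcalls

    def route(self, dest, msg):
        # Downcall received from the layer above. A real transport would
        # send over the network; this sketch delivers straight back up.
        self.upper.deliver(dest, msg)


class Overlay:
    """Upper layer: uses Transport and implements the deliver() upcall."""

    def __init__(self, lower):
        self.lower = lower
        lower.upper = self
        self.inbox = []

    def send(self, dest, msg):
        # Downcall: an ordinary method call on the layer below.
        self.lower.route(dest, msg)

    def deliver(self, dest, msg):
        # Upcall handler: called by the layer below.
        self.inbox.append((dest, msg))


transport = Transport()
overlay = Overlay(transport)
overlay.send("node-7", "hello")
print(overlay.inbox)
```

Each layer only knows the interface of its neighbour, so a different lower layer could be swapped in as long as it provides `route()` and calls `deliver()`.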

  19. Concurrency
    Modelled using state machines
    with transitions and guards

  21. Concurrency
    • State = enumeration and variables
    • Transitions = upcalls from below, downcalls
    from above, scheduled events
    • Guards = only transition if condition is true
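
The three bullets above can be sketched as a small guarded state machine in Python. This is an illustration only: the state names mirror the deck's Mace example, but the transitions (`join`, `join_reply`, `forward`) and their guards are assumptions made up for this sketch.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    PRE_JOINING = auto()  # listed for completeness; unused in this sketch
    JOINING = auto()
    JOINED = auto()


class Node:
    def __init__(self):
        self.state = State.INIT  # state = enumeration ...
        self.forwarded = []      # ... plus variables

    def join(self):
        # Transition triggered by a downcall from above.
        if self.state != State.INIT:  # guard
            return False
        self.state = State.JOINING
        return True

    def join_reply(self):
        # Transition triggered by an upcall from below.
        if self.state != State.JOINING:  # guard
            return False
        self.state = State.JOINED
        return True

    def forward(self, msg):
        # Only forward messages once the node has joined.
        if self.state != State.JOINED:  # guard
            return False
        self.forwarded.append(msg)
        return True


node = Node()
results = [node.forward("m1"), node.join(), node.join_reply(), node.forward("m2")]
print(results, node.state, node.forwarded)
```

The first `forward` is rejected because its guard fails; after the two join transitions the same call succeeds.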

  24. Concurrency
    states { init; preJoining; joining; joined; }

    state_variables {
      NodeKey myhash;
      /* ... */
    }

    transitions {
      scheduler global_maintenance()
          guard ( state == joined ) {
        NodeKey d = myhash;
        /* ... */
        TCP.route(n, GlobalSample(d));
      }

      upcall forward(const NodeKey& from, const NodeKey& to,
                     /* etc */)
          guard ( state == joined ) {
        nextHop = make_routing_decision(msg.key);
        return true;
      }
      /* etc */
    }

  25. Domain Specific Language (DSL)
    (Same code as on slide 24: the state machine is written in Mace's DSL.)

  27. Failures
    Uses aspects for detecting both failures
    and inconsistencies

  28. Failures
    Programmer supplies predicates that
    detect failure situations; these are
    monitored and acted upon.
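
One way to picture programmer-supplied predicates that are monitored and acted upon is the Python sketch below. It is not Mace's mechanism: the `Monitor` class and its `detect` and `check` methods are hypothetical, and the predicate compares a value with its previous snapshot, loosely in the spirit of a "has the range changed?" check.

```python
class Monitor:
    """Evaluates registered predicates whenever the state is updated."""

    def __init__(self):
        self.checks = []  # list of (predicate, error handler) pairs

    def detect(self, predicate, on_error):
        self.checks.append((predicate, on_error))

    def check(self, before, after):
        # Run every predicate on the old and new state; act on matches.
        for predicate, on_error in self.checks:
            if predicate(before, after):
                on_error(after)


notifications = []
monitor = Monitor()
monitor.detect(
    lambda before, after: after["range"] != before["range"],  # predicate
    lambda after: notifications.append(after["range"]),       # action
)

snapshot = {"range": (0, 100)}
for new_range in [(0, 100), (0, 50), (0, 50)]:
    updated = {"range": new_range}
    monitor.check(snapshot, updated)
    snapshot = updated

print(notifications)
```

Only the update that actually changes the range triggers the handler.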

  30. Failures
    // local detection
    detect {
      guard = (range != pre(range));
      error = notifyNewRange;
    }

    // distributed detection across myleafset
    detect {
      guard = (state == joined);
      nodes = myleafset;
      send = { message = LeafsetPush(myhash, myleafset);
               period = 5sec; }
      receive = { message = LeafsetPull;
                  period = 5min; }
      error = leafFailed;
    }

  32. Analysis
    With this model, we can find performance
    and correctness problems

  34. Analysis
    • Uses aspects to generate debugging and
    logging code
    • Causal-paths: Think distributed call graphs
    • Model checking: Detect liveness violations

  36. Model Checking
    Instead of only looking for safety violations,
    MaceMC also checks liveness properties.

  39. Model Checking
    The authors found 52 bugs using MaceMC,
    including a bug in Pastry and FreePastry.

  41. Model Checking
    « While our experience is restricted to MaceMC, we
    believe our random execution algorithms for finding
    liveness violations and the critical transition generalize to
    any state-exploration model checker capable of
    replaying executions. »

  42. Model Checking
    « we believe our algorithms generalize to any model
    checker capable of replaying executions. »

  44. Search Algorithms
    • Bounded depth-first search (BDFS)
    • Random walks
    • Isolating the critical transition
    • Combined exhaustive search and random
    walks
    • Reducing the search space
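
To make the first four items concrete, here is a toy Python sketch on a hand-written state graph. It is not MaceMC's implementation: the graph, the function names, the walk counts and the linear scan for the critical transition are all assumptions chosen for illustration. A state is treated as "probably dead" when none of many random walks from it reaches a live state.

```python
import random

# Hypothetical toy system: "live" satisfies the liveness property,
# and nothing reachable from "b" can ever get there.
GRAPH = {
    "start": ["a", "b"],
    "a": ["live", "start"],
    "b": ["dead1"],
    "dead1": ["dead2"],
    "dead2": ["dead1"],
    "live": ["live"],
}
rng = random.Random(0)


def is_live(state):
    return state == "live"


def walk_reaches_live(state, steps=50):
    # One random walk: follow random transitions for a bounded number of steps.
    for _ in range(steps):
        if is_live(state):
            return True
        state = rng.choice(GRAPH[state])
    return is_live(state)


def looks_dead(state, walks=100):
    # Heuristic: no random walk from this state ever reaches a live state.
    return not any(walk_reaches_live(state) for _ in range(walks))


def bounded_dfs(state, depth, is_bad, path=()):
    # Exhaustive search of all paths up to `depth` transitions.
    path = path + (state,)
    if is_bad(state):
        return list(path)
    if depth == 0:
        return None
    for nxt in GRAPH[state]:
        found = bounded_dfs(nxt, depth - 1, is_bad, path)
        if found is not None:
            return found
    return None


def critical_transition(path):
    # First step that moves from a recoverable state to a probably dead one.
    for prev, nxt in zip(path, path[1:]):
        if looks_dead(nxt) and not looks_dead(prev):
            return (prev, nxt)
    return None


path = bounded_dfs("start", 3, looks_dead)
critical = critical_transition(path)
print(path, critical)
```

The exhaustive search finds a path ending in a state that looks dead, and the scan along that path points at the step from `start` to `b` as the place where recovery stops being possible.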

  45. Lots of cool stuff here!

  46. Can we use this in our
    projects?

  48. Some questions
    • Events are functions and therefore
    in-process? No remote events?
    • Are states set explicitly by code?
    • I’m missing a discussion on
    synchronization and details on external
    event handling
    • ... but this is all in the C++ source code :)

  49. References
    • Charles Killian et al., «Mace: Language
    support for building distributed systems»,
    UCSD
    • Charles Killian et al., «Life, Death, and the
    Critical Transition: Finding Liveness Bugs
    in Systems Code»
