Elastic{ON} 2018 - Reliable by design - Applying formal methods to distributed systems

Elastic Co
March 01, 2018

Transcript

  1. Reliable by Design
    Applying Formal Methods to Distributed Systems
    David Turner
    @DaveCTurner
    Yannick Welsch
    @ywelsch

  2. Design is thinking

  3. Design is documentation

  4. Richard Guindon
    "Writing is nature’s way of letting you know how sloppy your thinking is."

  5. Leslie Lamport
    "Mathematics is nature’s way of letting you know how sloppy your writing is."

  6. Mathematical tools

  7. Millions of states

  8. Flavours
    Model Checking
    • Exhaustive search
    • Finite state space
    • Accessible
    Interactive Theorem Proving
    • Detailed argument
    • Arbitrary state space
    • More specialised

  9. CONSENSUS

  10. Asynchronous system

  11. Properties
    Safety: nothing bad happens
    Liveness: something good eventually happens

  12. Majorities
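
The slide only names the topic, but the property that makes majorities useful for consensus is that any two majorities of the same node set share at least one node. A minimal Python sketch that checks this exhaustively for a small cluster follows; the function names and node labels are illustrative and do not come from the talk.

```python
from itertools import combinations


def majorities(nodes):
    """Yield every subset of `nodes` that contains more than half of them."""
    nodes = sorted(nodes)
    quorum_size = len(nodes) // 2 + 1
    for size in range(quorum_size, len(nodes) + 1):
        for subset in combinations(nodes, size):
            yield frozenset(subset)


def all_majorities_intersect(nodes):
    """Return True if every pair of majorities has a node in common."""
    quorums = list(majorities(nodes))
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


if __name__ == "__main__":
    print(all_majorities_intersect({"Node1", "Node2", "Node3"}))
```

This is the kind of finite check a model checker performs; showing that the property holds for clusters of every size needs a general argument instead.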

  13. TLA+
    [Diagram: Node1, Node2, Node3]
    • combines temporal logic and set theory
    • specification defines initial state and next-state relation
    • states represented by assigning values to variables
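
One way to read the bullets above, sketched in Python rather than TLA+: a state assigns values to variables, and a specification is an initial-state predicate plus a next-state relation over pairs of states. The variable name `currentTerm` appears later in the deck, but the rule used here (exactly one node raises its term by one per step) is a made-up toy and not the Elasticsearch specification.

```python
NODES = ("Node1", "Node2", "Node3")


def init(state):
    """Initial-state predicate: every node starts in term 0."""
    return all(state["currentTerm"][n] == 0 for n in NODES)


def next_step(old, new):
    """Next-state relation (toy rule): exactly one node bumps its term by one."""
    changed = [n for n in NODES if old["currentTerm"][n] != new["currentTerm"][n]]
    return (
        len(changed) == 1
        and new["currentTerm"][changed[0]] == old["currentTerm"][changed[0]] + 1
    )


def is_behaviour(states):
    """A behaviour starts in an initial state and every step satisfies next_step."""
    return (
        bool(states)
        and init(states[0])
        and all(next_step(a, b) for a, b in zip(states, states[1:]))
    )


if __name__ == "__main__":
    s0 = {"currentTerm": {n: 0 for n in NODES}}
    s1 = {"currentTerm": {**s0["currentTerm"], "Node2": 1}}
    print(is_behaviour([s0, s1]))
```

Unlike a typical TLA+ specification, this toy does not permit stuttering steps in which no variable changes.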

  14. Next-state relation
    Node n (before)
    - firstUncommittedSlot: s
    - currentTerm: t
    - ...
    PublishRequest (received)
    - dest: n
    - slot: s
    - term: t
    - value: v
    Node n (after)
    - firstUncommittedSlot: s
    - currentTerm: t
    - lastAcceptedTerm: t
    - lastAcceptedValue: v
    - ...
    PublishResponse{ ... } (sent)
    - slot: s
    - term: t
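
The transition on this slide can be sketched as a pure function from a node's state and an incoming message to a new state and a reply. The field names mirror the slide; the guard (the request's slot and term must match the node's) is read off the diagram and is an assumption, and the fields the slide elides with "..." are simply left out.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class NodeState:
    first_uncommitted_slot: int
    current_term: int
    last_accepted_term: Optional[int] = None
    last_accepted_value: Optional[str] = None


@dataclass(frozen=True)
class PublishRequest:
    dest: str
    slot: int
    term: int
    value: str


@dataclass(frozen=True)
class PublishResponse:
    slot: int
    term: int


def handle_publish_request(
    node: NodeState, request: PublishRequest
) -> Optional[Tuple[NodeState, PublishResponse]]:
    """Accept the value if slot and term match (assumed guard), else do nothing."""
    if request.slot != node.first_uncommitted_slot:
        return None
    if request.term != node.current_term:
        return None
    # Record what was accepted; slot and term themselves are unchanged.
    new_node = replace(
        node, last_accepted_term=request.term, last_accepted_value=request.value
    )
    return new_node, PublishResponse(slot=request.slot, term=request.term)


if __name__ == "__main__":
    before = NodeState(first_uncommitted_slot=4, current_term=2)
    print(handle_publish_request(before, PublishRequest("n", 4, 2, "v")))
```

Returning a new state instead of mutating the old one keeps the function close to a next-state relation, which relates an old state to a new one.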

  15. Full specification
    • network behavior
    • node failures
    • client submitting values
    • next-state relation
    • safety property
    \* next-state relation
    Next ==
      \/ HandlePublishRequest
      \/ HandlePublishResponse
      \/ HandleClientRequest
      \/ SomeNodeCrashes
      \/ ...
    \* main safety property
    StateMachineSafety ==
      \A n1, n2 \in Nodes :
        firstUncommittedSlot[n1] = firstUncommittedSlot[n2] =>
          /\ currentClusterState[n1] = currentClusterState[n2]
          /\ currentConfiguration[n1] = currentConfiguration[n2]
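
For readers less familiar with TLA+ notation, the StateMachineSafety property on this slide can be restated as a Python predicate over three per-node mappings. This is a sketch of the property's meaning only; the example values below are invented for illustration.

```python
from itertools import combinations


def state_machine_safety(
    first_uncommitted_slot, current_cluster_state, current_configuration
):
    """Nodes at the same slot must agree on cluster state and configuration."""
    nodes = first_uncommitted_slot.keys()
    return all(
        current_cluster_state[n1] == current_cluster_state[n2]
        and current_configuration[n1] == current_configuration[n2]
        for n1, n2 in combinations(nodes, 2)
        # The implication only constrains pairs that are at the same slot.
        if first_uncommitted_slot[n1] == first_uncommitted_slot[n2]
    )


if __name__ == "__main__":
    slots = {"n1": 3, "n2": 3, "n3": 2}
    states = {"n1": "A", "n2": "A", "n3": "B"}
    configs = {n: frozenset(slots) for n in slots}
    print(state_machine_safety(slots, states, configs))
```

The TLA+ formula also quantifies over pairs where n1 = n2, but those hold trivially, so checking each unordered pair of distinct nodes is equivalent.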

  16. TLC
    • model checker
    • integrated into IDE
    • exhaustive state exploration
    • breadth-first
    • bounded state space
    • bugs even for small models
    • good at finding edge cases
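
The bullets above describe the general technique of breadth-first exhaustive exploration. The sketch below illustrates that technique in Python; it is not TLC's implementation. The toy system (two processes that each read a shared counter and then write back the value plus one) is invented for illustration and does not appear in the talk.

```python
from collections import deque


def bfs_check(initial_states, next_states, invariant):
    """Explore all reachable states breadth-first.

    Returns a shortest trace ending in an invariant violation, or None.
    """
    parent = {s: None for s in initial_states}
    queue = deque(parent)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for succ in next_states(state):
            if succ not in parent:  # visit each state only once
                parent[succ] = state
                queue.append(succ)
    return None


# Toy model: state = (counter, ((pc, tmp), (pc, tmp)))
INITIAL = [(0, (("read", 0), ("read", 0)))]


def next_states(state):
    """Each process may take one step: read the counter, then write tmp + 1."""
    counter, procs = state
    for i, (pc, tmp) in enumerate(procs):
        if pc == "read":
            new_proc, new_counter = ("write", counter), counter
        elif pc == "write":
            new_proc, new_counter = ("done", tmp), tmp + 1
        else:
            continue
        yield new_counter, procs[:i] + (new_proc,) + procs[i + 1:]


def invariant(state):
    """Once every process is done, the counter equals the number of processes."""
    counter, procs = state
    return any(pc != "done" for pc, _ in procs) or counter == len(procs)


if __name__ == "__main__":
    for step in bfs_check(INITIAL, next_states, invariant):
        print(step)
```

Because the search is breadth-first, the first violation found is reached by a shortest possible trace, which is part of why even small models tend to produce readable counterexamples.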

  17. Isabelle/HOL
    • interactive proof assistant
    • needs guidance
    • tracks proof goals
    • verifies the finished proof fully automatically

  18. Experiences
    TLA+
    ● executable specs
    ● rapid prototyping
    ● high confidence
    ● rising in popularity
    Isabelle/HOL
    ● no state-space limitations
    ● deep insights
    ● even higher confidence

  19. Where can I learn more about this?
    More Questions? Visit us at the AMA
    ● TLA+ Home Page: http://lamport.azurewebsites.net/tla/tla.html
    ● TLA+ Video Course: http://lamport.azurewebsites.net/video/videos.html
    ● Introduction to TLA+: https://learntla.com
    ● Tutorial on Isabelle/HOL: http://isabelle.in.tum.de/doc/tutorial.pdf
    ● Use of Formal Methods at AWS: http://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf
    ● Formal models of core Elasticsearch algorithms: https://github.com/elastic/elasticsearch-formal-models
    ● Related talk at 3:30pm (Salon 1-7): Elasticsearch Consensus: The Past, the Present, and the Future

  20. www.elastic.co

  21. Please attribute Elastic with a link to elastic.co
