Elastic{ON} 2018 - Reliable by design - Applying formal methods to distributed systems

Elastic Co
March 01, 2018

Transcript

  1. Reliable by Design: Applying Formal Methods to Distributed Systems
    David Turner (@DaveCTurner), Yannick Welsch (@ywelsch)
  2. Design is thinking

  3. Design is documentation

  4. “Writing is nature’s way of letting you know how sloppy your thinking is.” (Richard Guindon)
  5. “Mathematics is nature’s way of letting you know how sloppy your writing is.” (Leslie Lamport)
  6. Mathematical tools

  7. Millions of states

  8. Flavours: Model Checking vs. Interactive Theorem Proving
    • Model checking: exhaustive search, finite state space, accessible
    • Interactive theorem proving: detailed argument, arbitrary state space, more specialised
  9. CONSENSUS

  10. Asynchronous system

  11. Properties
    • Safety: nothing bad happens
    • Liveness: something good eventually happens
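A minimal Python sketch of the distinction, checking two properties against one finite trace. The function names, the two-node trace and both properties are hypothetical illustrations, not taken from the talk. A safety violation always has a finite witness (the first bad state), whereas "eventually" can only be decided on a finite trace if the behaviour has actually stopped; real liveness checking needs fairness assumptions and cycle detection that this sketch leaves out.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")


def first_safety_violation(
    trace: Sequence[S], invariant: Callable[[S], bool]
) -> Optional[int]:
    """Return the index of the first state that breaks `invariant`, or None.

    Safety ("nothing bad happens"): one bad state in a finite prefix is
    already a complete counterexample.
    """
    for i, state in enumerate(trace):
        if not invariant(state):
            return i
    return None


def eventually(trace: Sequence[S], goal: Callable[[S], bool]) -> bool:
    """Liveness ("something good eventually happens") on a *finished* trace.

    Only meaningful if the behaviour has really stopped: on an unfinished
    prefix, "not yet" is never a counterexample.
    """
    return any(goal(state) for state in trace)


# Hypothetical trace: the value each of two nodes has committed (None = nothing yet).
trace = [(None, None), ("a", None), ("a", "a")]


def nodes_agree(state) -> bool:
    """Safety property: no two nodes commit different values."""
    a, b = state
    return a is None or b is None or a == b


def all_committed(state) -> bool:
    """Liveness goal: every node has committed something."""
    return None not in state


print(first_safety_violation(trace, nodes_agree))  # None
print(eventually(trace, all_committed))            # True
```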
  12. Majorities
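Majorities matter because any two of them share at least one node, so two conflicting decisions cannot both be backed by a majority. This small Python sketch, in the spirit of a bounded model check, verifies that by brute force for clusters of one to seven nodes and shows that exactly half of an even-sized cluster is not enough. The helper names are hypothetical.

```python
from itertools import combinations


def majorities(nodes):
    """Yield every subset containing strictly more than half of `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    for size in range(n // 2 + 1, n + 1):
        for quorum in combinations(nodes, size):
            yield frozenset(quorum)


def halves(nodes):
    """Yield every subset containing exactly half of `nodes` (rounded down)."""
    nodes = list(nodes)
    for quorum in combinations(nodes, len(nodes) // 2):
        yield frozenset(quorum)


def any_two_intersect(quorums) -> bool:
    """True if every pair of quorums shares at least one node."""
    qs = list(quorums)
    return all(q1 & q2 for q1 in qs for q2 in qs)


# Exhaustively check clusters of 1 to 7 nodes, like a bounded model check.
for n in range(1, 8):
    assert any_two_intersect(majorities(range(n))), n

# Exactly half is not enough: {0, 1} and {2, 3} are disjoint.
print(any_two_intersect(halves(range(4))))  # False
```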

  13. TLA+
    • combines temporal logic and set theory
    • a specification defines the initial state(s) and the next-state relation
    • states are represented by assigning values to variables
    (diagram: three nodes, Node1, Node2 and Node3)
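TLA+ defines Init as a predicate on states and Next as a relation between a state and its successor. A common way to imitate this in an ordinary language, sketched here in Python, is one function returning the initial state and another generating every successor. The toy model (three nodes that can only advance their currentTerm, capped by a hypothetical MAX_TERM) is an illustration, not part of the talk's specification.

```python
from typing import Iterator, Tuple

# A state assigns a value to every variable; here the only variable is
# currentTerm, one entry per node, stored as an immutable tuple.
NODES = ("Node1", "Node2", "Node3")
MAX_TERM = 2  # hypothetical bound that keeps the state space finite

State = Tuple[int, ...]


def init() -> State:
    """Init: every node starts in term 0."""
    return tuple(0 for _ in NODES)


def next_states(state: State) -> Iterator[State]:
    """Next: some node increments its term (one disjunct per node)."""
    for i, term in enumerate(state):
        if term < MAX_TERM:
            yield state[:i] + (term + 1,) + state[i + 1:]


print(list(next_states(init())))  # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```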
  14. Next-state relation
    (diagram) Before: Node n with firstUncommittedSlot: s, currentTerm: t, ...
    Message received: PublishRequest (dest: n, slot: s, term: t, value: v)
    After: Node n with firstUncommittedSlot: s, currentTerm: t, lastAcceptedTerm: t, lastAcceptedValue: v, ...
    Message sent: PublishResponse (slot: s, term: t)
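The transition in the diagram can be sketched as a Python function over immutable records. The field names follow the slide; the Python classes, the function name and especially the guard (the request's slot and term must match the node's firstUncommittedSlot and currentTerm) are assumptions read off the diagram, not copied from the Elastic models.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class NodeState:
    first_uncommitted_slot: int
    current_term: int
    last_accepted_term: Optional[int] = None
    last_accepted_value: Any = None


@dataclass(frozen=True)
class PublishRequest:
    dest: str  # routing is left to a (not modelled) network
    slot: int
    term: int
    value: Any


@dataclass(frozen=True)
class PublishResponse:
    slot: int
    term: int


def handle_publish_request(
    node: NodeState, req: PublishRequest
) -> Optional[Tuple[NodeState, PublishResponse]]:
    """One step of the next-state relation; None means the step is not enabled."""
    # Assumed guard: the request must be for this node's current slot and term.
    if req.slot != node.first_uncommitted_slot or req.term != node.current_term:
        return None
    accepted = replace(node, last_accepted_term=req.term, last_accepted_value=req.value)
    return accepted, PublishResponse(slot=req.slot, term=req.term)


node = NodeState(first_uncommitted_slot=5, current_term=3)
new_node, response = handle_publish_request(
    node, PublishRequest(dest="n", slot=5, term=3, value="v")
)
print(new_node.last_accepted_value, response)  # v PublishResponse(slot=5, term=3)
```

Returning a new record instead of mutating `node` mirrors the TLA+ view of a step as a pair of states.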
  15. Full specification
    • network behavior
    • node failures
    • client submitting values
    • next-state relation
    • safety property

        \* next-state relation
        Next == \/ HandlePublishRequest
                \/ HandlePublishResponse
                \/ HandleClientRequest
                \/ SomeNodeCrashes
                \/ ...

        \* main safety property
        StateMachineSafety ==
            \A n1, n2 \in Nodes :
                firstUncommittedSlot[n1] = firstUncommittedSlot[n2] =>
                    /\ currentClusterState[n1] = currentClusterState[n2]
                    /\ currentConfiguration[n1] = currentConfiguration[n2]
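StateMachineSafety translates almost directly into a check over a snapshot of all nodes. In this hypothetical Python rendering, checking unordered pairs is enough because the n1 = n2 case of the TLA+ quantifier holds trivially; the node names and cluster-state values in the usage lines are made up.

```python
from itertools import combinations
from typing import Any, Mapping, NamedTuple


class Node(NamedTuple):
    first_uncommitted_slot: int
    current_cluster_state: Any
    current_configuration: frozenset


def state_machine_safety(nodes: Mapping[str, Node]) -> bool:
    """Nodes at the same firstUncommittedSlot agree on state and configuration."""
    for n1, n2 in combinations(nodes.values(), 2):
        if n1.first_uncommitted_slot == n2.first_uncommitted_slot and (
            n1.current_cluster_state != n2.current_cluster_state
            or n1.current_configuration != n2.current_configuration
        ):
            return False
    return True


cfg = frozenset({"n1", "n2", "n3"})
ok = {"n1": Node(4, "cs-3", cfg), "n2": Node(4, "cs-3", cfg), "n3": Node(3, "cs-2", cfg)}
bad = dict(ok, n2=Node(4, "cs-X", cfg))
print(state_machine_safety(ok), state_machine_safety(bad))  # True False
```

A model checker would evaluate such a predicate in every reachable state and report the first state where it is false.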
  16. TLC
    • model checker, integrated into the TLA+ Toolbox IDE
    • exhaustive, breadth-first state exploration
    • bounded state space
    • finds bugs even in small models
    • good at finding edge cases
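The core idea behind TLC, exhaustive breadth-first exploration of a bounded state space, fits in a few lines of Python. This is a sketch of the technique, not of TLC itself, which is far more sophisticated. The toy model is a hypothetical lost-update race: two processes increment a shared counter without a lock. Four steps are enough to expose it, which illustrates how small models already find bugs; because the search is breadth-first, the returned counterexample is a shortest one.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def bfs_check(
    init: Hashable,
    next_states: Callable[[Hashable], Iterable[Hashable]],
    invariant: Callable[[Hashable], bool],
) -> Optional[List[Hashable]]:
    """Visit every reachable state breadth-first.

    Returns a shortest trace from `init` to a state violating `invariant`,
    or None if no reachable state violates it. Terminates only if the
    reachable state space is finite.
    """
    parent = {init: None}  # also serves as the set of seen states
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for succ in next_states(state):
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return None


# Hypothetical toy model: two processes each run `tmp = x; x = tmp + 1`
# without a lock. State = (x, program counters, tmp values).
INIT = (0, ("read", "read"), (0, 0))


def with_item(t, i, v):
    """Copy of tuple `t` with position `i` replaced by `v`."""
    return t[:i] + (v,) + t[i + 1:]


def steps(state):
    x, pcs, tmps = state
    for p in (0, 1):
        if pcs[p] == "read":
            yield (x, with_item(pcs, p, "write"), with_item(tmps, p, x))
        elif pcs[p] == "write":
            yield (tmps[p] + 1, with_item(pcs, p, "done"), tmps)


def no_lost_update(state):
    x, pcs, _ = state
    return pcs != ("done", "done") or x == 2


trace = bfs_check(INIT, steps, no_lost_update)
for s in trace:
    print(s)
```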
  17. Isabelle/HOL
    • interactive proof assistant
    • needs guidance from the user
    • tracks the remaining proof goals
    • verifies the finished proof fully automatically
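The talk does not show its Isabelle/HOL theories. As a hedged illustration of a result that holds for every cluster size, not only the sizes a model checker can enumerate, here is the standard argument that two majorities $Q_1, Q_2$ of the same finite node set $N$ (so $|Q_1|, |Q_2| > |N|/2$) always intersect. In a proof assistant the user would supply steps like these and the tool would check each one.

```latex
\begin{aligned}
|Q_1 \cap Q_2| &= |Q_1| + |Q_2| - |Q_1 \cup Q_2| \\
               &\ge |Q_1| + |Q_2| - |N|
                 && \text{since } Q_1 \cup Q_2 \subseteq N \\
               &> \tfrac{|N|}{2} + \tfrac{|N|}{2} - |N| = 0
                 && \text{since } |Q_1|, |Q_2| > \tfrac{|N|}{2}
\end{aligned}
```

Hence $Q_1 \cap Q_2 \neq \emptyset$ for every finite $N$.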
  18. Experiences
    • TLA+: executable specs, rapid prototyping, high confidence, rising in popularity
    • Isabelle/HOL: no state-space limitations, deep insights, even higher confidence
  19. Where can I learn more about this?
    More questions? Visit us at the AMA.
    • TLA+ Home Page: http://lamport.azurewebsites.net/tla/tla.html
    • TLA+ Video Course: http://lamport.azurewebsites.net/video/videos.html
    • Introduction to TLA+: https://learntla.com
    • Tutorial on Isabelle/HOL: http://isabelle.in.tum.de/doc/tutorial.pdf
    • Use of Formal Methods at AWS: http://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf
    • Formal models of core Elasticsearch algorithms: https://github.com/elastic/elasticsearch-formal-models
    • Related talk at 3:30pm (Salon 1-7): Elasticsearch Consensus: The Past, the Present, and the Future
  20. www.elastic.co

  21. Please attribute Elastic with a link to elastic.co