Slide 1

Slide 1 text

Reliable by Design: Applying Formal Methods to Distributed Systems
David Turner (@DaveCTurner)
Yannick Welsch (@ywelsch)

Slide 2

Slide 2 text

Design is thinking

Slide 3

Slide 3 text

Design is documentation

Slide 4

Slide 4 text

“Writing is nature’s way of letting you know how sloppy your thinking is.”
Richard Guindon

Slide 5

Slide 5 text

“Mathematics is nature’s way of letting you know how sloppy your writing is.”
Leslie Lamport

Slide 6

Slide 6 text

Mathematical tools

Slide 7

Slide 7 text

Millions of states

Slide 8

Slide 8 text

Flavours

Model Checking
• Exhaustive search
• Finite state space
• Accessible

Interactive Theorem Proving
• Detailed argument
• Arbitrary state space
• More specialised

Slide 9

Slide 9 text

CONSENSUS

Slide 10

Slide 10 text

Asynchronous system

Slide 11

Slide 11 text

Properties

Safety: Nothing bad happens
Liveness: Something good eventually happens

Slide 12

Slide 12 text

Majorities
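The slide gives only the heading, so the following is a minimal sketch of the property that usually makes majorities useful for consensus: any two majorities of the same node set share at least one node. The helper names `majorities` and `all_pairs_intersect` are hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def majorities(nodes):
    """Return every subset of `nodes` that contains more than half of them."""
    ordered = sorted(nodes)
    quorum_size = len(ordered) // 2 + 1
    return [
        frozenset(subset)
        for size in range(quorum_size, len(ordered) + 1)
        for subset in combinations(ordered, size)
    ]


def all_pairs_intersect(quorums):
    """True if every pair of quorums has at least one node in common."""
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


# Hypothetical three-node cluster, named after the nodes shown on a later slide.
three_node_quorums = majorities({"Node1", "Node2", "Node3"})
print(len(three_node_quorums), all_pairs_intersect(three_node_quorums))
```

The brute-force check only covers the cluster sizes it is run on; the general argument is that two sets that each hold more than half of the nodes cannot be disjoint.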

Slide 13

Slide 13 text

TLA+

[Diagram: Node1, Node2, Node3]

• combines temporal logic and set theory
• specification defines initial state and next-state relation
• states represented by assigning values to variables

Slide 14

Slide 14 text

Next-state relation

Node n
  firstUncommittedSlot: s
  currentTerm: t
  ...

PublishResponse{ ... }

Node n
  firstUncommittedSlot: s
  currentTerm: t
  lastAcceptedTerm: t
  lastAcceptedValue: v
  ...

PublishRequest
  - dest: n
  - slot: s
  - term: t
  - value: v

PublishResponse
  - slot: s
  - term: t
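One way to read the transition on this slide is as a pure function from a node state and an incoming message to a new node state and an outgoing message. The sketch below is in Python rather than TLA+, and it is an interpretation of the diagram, not the specification itself: the enabling condition (destination, slot and term must all match) and the snake_case field names are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PublishRequest:
    dest: str
    slot: int
    term: int
    value: str


@dataclass(frozen=True)
class PublishResponse:
    slot: int
    term: int


@dataclass(frozen=True)
class NodeState:
    name: str
    first_uncommitted_slot: int
    current_term: int
    last_accepted_term: Optional[int] = None
    last_accepted_value: Optional[str] = None


def handle_publish_request(
    node: NodeState, request: PublishRequest
) -> Tuple[NodeState, Optional[PublishResponse]]:
    """Apply one step of the (assumed) transition shown in the diagram."""
    # Assumed precondition: the request targets this node and matches
    # its current slot and term. Otherwise the step is not taken.
    enabled = (
        request.dest == node.name
        and request.slot == node.first_uncommitted_slot
        and request.term == node.current_term
    )
    if not enabled:
        return node, None
    # Record the accepted term and value, then acknowledge slot and term.
    accepted = replace(
        node,
        last_accepted_term=request.term,
        last_accepted_value=request.value,
    )
    return accepted, PublishResponse(slot=request.slot, term=request.term)
```

Keeping the states immutable mirrors how a next-state relation describes a pair of states (before and after) rather than an in-place update.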

Slide 15

Slide 15 text

Full specification

• network behavior
• node failures
• client submitting values
• next-state relation
• safety property

\* next-state relation
Next ==
    \/ HandlePublishRequest
    \/ HandlePublishResponse
    \/ HandleClientRequest
    \/ SomeNodeCrashes
    \/ ...

\* main safety property
StateMachineSafety ==
    \A n1, n2 \in Nodes :
        firstUncommittedSlot[n1] = firstUncommittedSlot[n2]
            => /\ currentClusterState[n1] = currentClusterState[n2]
               /\ currentConfiguration[n1] = currentConfiguration[n2]
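For readers less familiar with TLA+ notation, the safety property can be paraphrased as an ordinary predicate over one global state. This is a Python paraphrase for illustration only: the per-node dictionaries stand in for the TLA+ variables of the same names, and the sample values are made up.

```python
from itertools import product


def state_machine_safety(
    nodes, first_uncommitted_slot, current_cluster_state, current_configuration
):
    """Any two nodes at the same slot must agree on state and configuration."""
    return all(
        # Implication A => B, written as (not A) or B.
        first_uncommitted_slot[n1] != first_uncommitted_slot[n2]
        or (
            current_cluster_state[n1] == current_cluster_state[n2]
            and current_configuration[n1] == current_configuration[n2]
        )
        for n1, n2 in product(nodes, repeat=2)
    )


# Made-up example: n1 and n2 are at slot 5 and agree; n3 lags at slot 4.
nodes = ["n1", "n2", "n3"]
slots = {"n1": 5, "n2": 5, "n3": 4}
states = {"n1": "S5", "n2": "S5", "n3": "S4"}
configs = {"n1": "C1", "n2": "C1", "n3": "C1"}
print(state_machine_safety(nodes, slots, states, configs))
```

A node that lags behind, like `n3` here, does not violate the property, because the implication only constrains nodes whose first uncommitted slot is equal.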

Slide 16

Slide 16 text

TLC

• model checker
• integrated into IDE
• exhaustive state exploration
• breadth-first
• bounded state space
• bugs even for small models
• good at finding edge cases
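The ideas of exhaustive, breadth-first exploration over a bounded state space can be sketched in a few lines. This is a toy explicit-state checker in Python, not a description of how TLC is implemented; `check_invariant`, `toy_next_states` and the two-counter model are hypothetical and exist only for this example.

```python
from collections import deque


def check_invariant(initial_states, next_states, invariant):
    """Explore all reachable states breadth-first.

    Returns None if the invariant holds in every reachable state,
    otherwise a shortest trace from an initial state to a violation.
    """
    parent = {state: None for state in initial_states}
    frontier = deque(parent)
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            # Walk the parent links back to an initial state.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for successor in next_states(state):
            if successor not in parent:
                parent[successor] = state
                frontier.append(successor)
    return None


BOUND = 3  # keeps the toy state space finite


def toy_next_states(state):
    """Toy model: either of two counters may be incremented up to BOUND."""
    x, y = state
    if x < BOUND:
        yield (x + 1, y)
    if y < BOUND:
        yield (x, y + 1)


# Deliberately false invariant, so that the checker reports a counterexample.
trace = check_invariant([(0, 0)], toy_next_states, lambda s: s[0] + s[1] != 4)
print(trace)
```

Because the search is breadth-first, the first violation found is at minimal depth, which is why counterexamples from this style of checker tend to be short enough to read.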

Slide 17

Slide 17 text

Isabelle/HOL

• interactive proof assistant
• needs guidance
• tracks proof goals
• fully automatically verifies proof

Slide 18

Slide 18 text

Experiences

TLA+
● executable specs
● rapid prototyping
● high confidence
● rising in popularity

Isabelle/HOL
● no state-space limitations
● deep insights
● even higher confidence

Slide 19

Slide 19 text

Where can I learn more about this?

● TLA+ Home Page: http://lamport.azurewebsites.net/tla/tla.html
● TLA+ Video Course: http://lamport.azurewebsites.net/video/videos.html
● Introduction to TLA+: https://learntla.com
● Tutorial on Isabelle/HOL: http://isabelle.in.tum.de/doc/tutorial.pdf
● Use of Formal Methods at AWS: http://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf
● Formal models of core Elasticsearch algorithms: https://github.com/elastic/elasticsearch-formal-models
● Related talk at 3:30pm (Salon 1-7): Elasticsearch Consensus: The Past, the Present, and the Future

More Questions? Visit us at the AMA

Slide 20

Slide 20 text

www.elastic.co

Slide 21

Slide 21 text

Please attribute Elastic with a link to elastic.co