Slide 1

Slide 1 text

The Raft Protocol
Distributed Consensus for Dummies
Arnaud Bailly, @abailly, 2014-06

Slide 6

Slide 6 text

Who am I?
▶ Writing code since 1986
▶ Developing software since 1994
▶ Lead developer, Java/XP consultant at Murex since 2009
▶ Fascinated with distributed computing since …
▶ By the way, Murex is hiring!

Slide 13

Slide 13 text

Why Should I Care about Distributed Consensus?
▶ The real world is distributed (multicore chips, WWW)
▶ ⇒ Today’s applications need to take care of distribution: abstractions leak!
▶ Systems may fail, and large systems may fail more often
▶ ⇒ fault tolerance
▶ Yet we need to provide fast service reliably
▶ ⇒ high availability
▶ Consensus is a basic building block for all kinds of distributed system features

Slide 16

Slide 16 text

Use Case: PaaS Configuration
▶ etcd is part of CoreOS, a Linux distribution for clusters
▶ Provides consistent configuration for all Docker containers hosted on CoreOS
▶ Relies on the Raft distributed consensus algorithm, implemented in Go

Slide 24

Slide 24 text

Use Case: Service Discovery
▶ Apache’s ZooKeeper provides a distributed, consistent, hierarchical key-value store
▶ Airbnb uses ZK to provide service discovery in their SmartStack solution
▶ Example scenario:
  1. A room registration service instance starts
  2. It registers itself as an ephemeral node in ZK
  3. This triggers reconfiguration of HAProxy to route to this service in the cluster
  4. The service can then address other services using “dynamic” HAProxy-ed addresses
▶ Zab ensures distributed consensus across ZK nodes
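The example scenario can be sketched with a toy in-memory registry. This is not ZooKeeper's API and not SmartStack's code: `ToyRegistry`, `reconfigure_proxy`, the session name, the node path and the address are all hypothetical, chosen only to show how an ephemeral registration and a watch callback fit together.

```python
class ToyRegistry:
    """In-memory stand-in for a registry with ephemeral nodes (hypothetical)."""

    def __init__(self):
        self.nodes = {}     # path -> (owning session, service address)
        self.watchers = []  # callbacks receiving the current list of addresses

    def watch(self, callback):
        self.watchers.append(callback)

    def register_ephemeral(self, session, path, address):
        # Step 2: the service instance registers itself under its session.
        self.nodes[path] = (session, address)
        self._notify()

    def close_session(self, session):
        # Ephemeral nodes disappear when the session that owns them ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[0] != session}
        self._notify()

    def _notify(self):
        backends = sorted(address for _, address in self.nodes.values())
        for callback in self.watchers:
            callback(backends)


proxy_backends = []


def reconfigure_proxy(backends):
    # Step 3: stands in for rewriting the proxy configuration.
    proxy_backends[:] = backends


registry = ToyRegistry()
registry.watch(reconfigure_proxy)
registry.register_ephemeral("session-1", "/services/rooms/instance-1", "10.0.0.5:8080")
print(proxy_backends)  # the new instance is now routable
registry.close_session("session-1")
print(proxy_backends)  # the instance vanished together with its session
```

In a real deployment the registry itself is replicated, and the consensus protocol is what keeps every replica's view of these nodes consistent.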

Slide 25

Slide 25 text

Distributed Consensus Is a Very Old Problem…

Slide 33

Slide 33 text

… And It Is Hard
1. Horses and messengers can get killed…
2. Horses can travel only so fast…
3. You can send only so many horses at once…
4. The enemy can set up ambushes…
5. Army corps can move…
6. Nobody knows everything…
7. You need to feed horses…
8. Not all horses are created equal.

Slide 41

Slide 41 text

… Even in Distributed Computing
The 8 Fallacies of Distributed Computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Slide 42

Slide 42 text

Fundamental Impossibility Results
Figure: The Fischer-Lynch-Paterson Theorem (a.k.a. FLP)

Slide 43

Slide 43 text

In an Asynchronous Network…
It is not possible to reach distributed consensus with arbitrary communication failures
Distributed Algorithms, Nancy Lynch, 1997, Morgan Kaufmann

Slide 44

Slide 44 text

In a Partially Synchronous Network…
It is possible to reach consensus assuming f processes fail and there is an upper bound d on delivery time for all messages, provided the number of processes is greater than 2f
Nancy Lynch, op. cit.
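The bound above (more than 2f processes to tolerate f failures) is easier to grasp with a little arithmetic. The sketch below is only illustrative; the function names are hypothetical and not taken from the cited book.

```python
def max_tolerated_failures(n: int) -> int:
    """Largest f such that n > 2f, for a cluster of n processes."""
    if n < 1:
        raise ValueError("a cluster needs at least one process")
    return (n - 1) // 2


def majority(n: int) -> int:
    """Smallest group size such that any two such groups share a member."""
    return n // 2 + 1


for n in (3, 4, 5, 7):
    print(f"n={n}: tolerates f={max_tolerated_failures(n)}, majority={majority(n)}")
```

Note that going from 3 to 4 processes does not buy any extra fault tolerance, which is one reason clusters are usually sized with an odd number of members.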

Slide 45

Slide 45 text

In Practice

Slide 50

Slide 50 text

Paxos
▶ Renowned consensus algorithm invented by Leslie Lamport
▶ Provides foundations for several implementations: ZooKeeper (kinda…), Chubby
▶ But it is hard to implement correctly:
  While Paxos can be described with a page of pseudo-code, our complete implementation contains several thousand lines of C++ code. Converting the algorithm into a practical system involved implementing many features, some published in the literature and some not.
  Paxos Made Live - An Engineering Perspective, T. Chandra et al.

Slide 54

Slide 54 text

Raft
▶ In Search of an Understandable Consensus Algorithm, D. Ongaro and J. Ousterhout, 2013
▶ Novel algorithm designed with understandability in mind
▶ Dozens of implementations in various languages
▶ Most prominent use is currently the Go version for the etcd distributed configuration system in CoreOS

Slide 55

Slide 55 text

Principle: Replicated State Machine With Persistent Log

Slide 58

Slide 58 text

Principles of Operation
▶ Leader-follower based algorithm: the leader is the single entry point for all operations on the cluster
▶ Each instance is a replicated state machine whose state is uniquely determined by a linear persistent log
▶ The leader orchestrates safe log replication to its followers
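The three principles above can be sketched in a few lines. This is a deliberately simplified illustration, not Raft itself: the log lives in memory, there are no terms and no conflict detection, and the names `Node`, `submit` and the third general `davout` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)    # ordered (key, value) commands
    state: dict = field(default_factory=dict)  # state machine: a key-value map
    applied: int = 0                           # number of log entries applied

    def apply_up_to(self, commit_index: int) -> None:
        """Apply committed entries, in log order, to the state machine."""
        limit = min(commit_index, len(self.log))
        while self.applied < limit:
            key, value = self.log[self.applied]
            self.state[key] = value
            self.applied += 1


def submit(leader: Node, followers: list, entry: tuple, reachable: set) -> bool:
    """Append on the leader, replicate, commit once a majority holds the entry."""
    leader.log.append(entry)
    acks = 1  # the leader's own copy counts
    for follower in followers:
        if follower.name in reachable:
            # Send whatever suffix the follower is missing (no conflict handling).
            follower.log.extend(leader.log[len(follower.log):])
            acks += 1
    if acks <= (len(followers) + 1) // 2:
        return False  # not enough copies: the entry stays uncommitted
    commit_index = len(leader.log)
    leader.apply_up_to(commit_index)
    for follower in followers:
        if follower.name in reachable:
            follower.apply_up_to(commit_index)
    return True


ney = Node("ney")
lannes, davout = Node("lannes"), Node("davout")
committed = submit(ney, [lannes, davout], ("order", "attack"), reachable={"lannes"})
print(committed, ney.state, lannes.state, davout.state)
```

Because every node applies the same entries in the same order, any node that has caught up with the log ends in the same state; the unreachable follower simply lags until it receives the missing suffix.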

Slide 59

Slide 59 text

Raft Algorithm
Figure: Ney requests to be appointed leader

Slide 60

Slide 60 text

Raft Algorithm
Figure: Ney becomes leader

Slide 61

Slide 61 text

Raft Algorithm
Figure: The leader replicates its own log to followers

Slide 62

Slide 62 text

Raft Algorithm
Figure: Ney receives an attack order and propagates it

Slide 63

Slide 63 text

Raft Algorithm
Figure: Ney receives a march order but is isolated

Slide 64

Slide 64 text

Raft Algorithm
Figure: Lannes is appointed leader for a new term

Slide 65

Slide 65 text

Raft Algorithm
Figure: Ney comes back and tries to propagate the march order

Slide 66

Slide 66 text

Raft Algorithm
Figure: Ney falls back to follower state
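The sequence of figures (election, isolation, new term, stale leader stepping down) can be replayed with a small sketch. It is a simplification under stated assumptions: votes are granted on the term number alone, whereas Raft also compares logs before granting a vote, and there are no timeouts or real messages. The class `Server`, its methods and the third general `davout` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    term: int = 0
    role: str = "follower"

    def start_election(self, reachable_peers, cluster_size):
        """Stand as candidate for a new term; win with a majority of votes."""
        self.term += 1
        self.role = "candidate"
        votes = 1 + sum(peer.grant_vote(self.term) for peer in reachable_peers)
        if votes > cluster_size // 2:
            self.role = "leader"
        return self.role == "leader"

    def grant_vote(self, candidate_term):
        # Simplified rule: vote only for a strictly newer term, once.
        if candidate_term > self.term:
            self.term = candidate_term
            self.role = "follower"
            return True
        return False

    def append_entries(self, leader_term):
        """Handle a replication request; return (accepted, own term)."""
        if leader_term < self.term:
            return False, self.term  # reject a leader from an older term
        self.term = leader_term
        self.role = "follower"
        return True, self.term

    def replicate_to(self, peer):
        accepted, peer_term = peer.append_entries(self.term)
        if peer_term > self.term:
            # A newer term exists: step down and adopt it.
            self.term = peer_term
            self.role = "follower"
        return accepted


ney, lannes, davout = Server("ney"), Server("lannes"), Server("davout")
ney.start_election([lannes, davout], cluster_size=3)   # Ney becomes leader, term 1
lannes.start_election([davout], cluster_size=3)        # Ney isolated: Lannes wins term 2
accepted = ney.replicate_to(lannes)                    # Ney comes back with a stale term
print(accepted, ney.role, ney.term, lannes.role)
```

The term number acts as a logical clock: any message carrying an older term is refused, and any server that learns of a newer term reverts to follower.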

Slide 69

Slide 69 text

Other Features
▶ Cluster reconfiguration: supports cluster membership changes without service interruption
▶ Log compaction: logs can grow very large on systems with high throughput, slowing down rebuild after a crash and occupying unnecessary disk space
▶ Snapshotting replaces a history prefix with a representation of the state
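Snapshotting can be illustrated with a key-value state machine: the applied prefix of the log is replaced by the state it produces, and a rebuild replays only the remaining entries. The sketch below is an assumption-laden toy (function names, the snapshot layout and the sample log are all made up for illustration).

```python
def replay(entries, state=None):
    """Apply (key, value) entries in order on top of an optional starting state."""
    result = dict(state or {})
    for key, value in entries:
        result[key] = value
    return result


def compact(log, applied_count):
    """Replace the first applied_count entries with a snapshot of their effect."""
    snapshot = {"last_index": applied_count, "state": replay(log[:applied_count])}
    return snapshot, log[applied_count:]


def rebuild(snapshot, remaining_log):
    """Recover the full state from a snapshot plus the entries that follow it."""
    return replay(remaining_log, snapshot["state"])


log = [("x", 1), ("y", 2), ("x", 3), ("z", 4)]
snapshot, remaining = compact(log, applied_count=3)
print(snapshot, remaining)
print(rebuild(snapshot, remaining) == replay(log))
```

The snapshot keeps one value per key regardless of how many times the key was written, which is where the disk-space and rebuild-time savings come from.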

Slide 73

Slide 73 text

Java Implementation: Barge
https://github.com/mgodave/barge
▶ OSS project started by Dave Rusek with contributions from Justin Santa Barbara and yours truly
▶ Still very young but usable; provides 2 transport methods: raw TCP and HTTP
▶ Feature complete w.r.t. the base protocol but missing cluster reconfiguration and log compaction
▶ Friendly (Apache 2.0) license, pull requests are welcome

Slide 74

Slide 74 text

Demo

Slide 81

Slide 81 text

Takeaways
▶ Understand your consistency requirements
▶ Strong consistency ⇒ Consensus
▶ Lowered barrier to entry for using consensus at the application level
▶ Raft is lightweight and understandable
▶ Not a Silver Bullet
▶ Strong consistency has a cost you don’t want to pay for high throughput and large data sets
▶ Sweet spot: configuration data, synchronizing clients at key points

Slide 82

Slide 82 text

TINSTAAFL

Slide 83

Slide 83 text

Questions?

Slide 86

Slide 86 text

Credits & Links
▶ ETH Zurich Course on Distributed Systems
▶ Napoléon à Austerlitz
▶ Nancy Lynch at CSAIL