Slide 1

Slide 1 text

Consensus in Distributed Systems

Slide 2

Slide 2 text

Brian

Slide 3

Slide 3 text

Brian (an ideas guy)

Slide 4

Slide 4 text

Brian (an ideas guy) Twitter for Alsatians?

Slide 5

Slide 5 text

Brian (an ideas guy) Uber for unicycles?

Slide 6

Slide 6 text

Brian (an ideas guy)

Slide 7

Slide 7 text

MySQL Database Ruby on Rails API

Slide 8

Slide 8 text

MySQL Database Ruby on Rails API

Slide 9

Slide 9 text

Leader (master) Follower (slave) Replication

Slide 10

Slide 10 text

Leader (master) Follower (slave) Replication

Slide 11

Slide 11 text

Leader (master) Failover

Slide 12

Slide 12 text

Leader (master) Follower (slave) Network

Slide 13

Slide 13 text

Leader (master) Follower (slave) Network (actual physical cables and stuff)

Slide 14

Slide 14 text

Leader (master) Follower (slave) Network (actual physical cables and stuff)

Slide 15

Slide 15 text

Fallacies of distributed computing #1: The network is reliable. (L Peter Deutsch et al.)

Slide 16

Slide 16 text

Leader (master) Follower (slave) Network Order #17623

Slide 17

Slide 17 text

Leader (master) Follower (slave) Network Order #17623

Slide 18

Slide 18 text

Leader (master) Follower (slave) Network Order #17623 Details of order #17623 please Huh?

Slide 19

Slide 19 text

Options: You’ve got two of ’em

Slide 20

Slide 20 text

Leader (master) Follower (slave) Network Order #17623 Details of order #17623 please Huh? Option A

Slide 21

Slide 21 text

Leader (master) Follower (slave) Network Order #17623 No Option C (!)

Slide 22

Slide 22 text

CAP theorem (paraphrased), Eric Brewer: When operating in a catastrophically broken or unreliable network, a distributed system must choose either to risk returning stale/outdated data or to refuse to accept writes/updates.

Slide 23

Slide 23 text

CAP theorem (paraphrased), Eric Brewer: When operating in a catastrophically broken or unreliable network (Partition Tolerance), a distributed system must choose either to risk returning stale/outdated data (Availability) or to refuse to accept writes/updates (Consistency).
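To make Option A and Option C concrete, here is a toy sketch assuming exactly one leader, one follower, and a hypothetical `partitioned` flag standing in for the broken network. `TwoNodeStore` and `PartitionError` are illustrative names, not any real database's API, and the CP side only refuses writes, mirroring the paraphrase above.

```python
class PartitionError(Exception):
    """Raised when the consistent (CP) store refuses a write it cannot replicate."""


class TwoNodeStore:
    """Toy leader/follower pair illustrating the CAP trade-off.

    mode="AP": keep accepting writes during a partition; the follower may serve stale reads.
    mode="CP": refuse writes that cannot reach the follower, so its reads never go stale.
    """

    def __init__(self, mode: str):
        assert mode in ("AP", "CP")
        self.mode = mode
        self.leader = {}
        self.follower = {}
        self.partitioned = False  # hypothetical stand-in for a broken network

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                raise PartitionError("cannot reach follower; refusing write")
            self.leader[key] = value  # AP: accept locally, follower falls behind
            return
        self.leader[key] = value
        self.follower[key] = value  # network is up: replicate synchronously

    def read_from_follower(self, key):
        return self.follower.get(key)  # None means the follower never saw it


ap = TwoNodeStore("AP")
ap.partitioned = True
ap.write("order:17623", "details")
print(ap.read_from_follower("order:17623"))  # None: the follower says "Huh?"

cp = TwoNodeStore("CP")
cp.partitioned = True
try:
    cp.write("order:17623", "details")
except PartitionError as err:
    print("rejected:", err)  # the leader says "No"
```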

Slide 24

Slide 24 text

Trade-offs

Slide 25

Slide 25 text

Raft Consensus Algorithm

Slide 26

Slide 26 text

Strongly Consistent but also Highly Available

Slide 27

Slide 27 text

Quorum (so you want an odd number of nodes)
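The quorum arithmetic behind the odd-number advice fits in a few lines. A minimal sketch; the function names are illustrative.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def failures_tolerated(cluster_size: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)


for n in range(1, 8):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Adding a fourth node to a three-node cluster raises the quorum from 2 to 3 without tolerating any extra failure, which is why odd cluster sizes are the usual choice.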

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Distributed Log

Slide 31

Slide 31 text

State Machine: best_programming_language = Ruby, current_year = 2008, linux_on_desktop = Maybe
Distributed Log

Slide 32

Slide 32 text

State Machine: best_programming_language = Ruby, current_year = 2018, linux_on_desktop = Maybe
Log: current_year = 2018 (SET)

Slide 33

Slide 33 text

State Machine: best_programming_language = Go, current_year = 2018, linux_on_desktop = Maybe
Log: current_year = 2018 (SET), best_programming_language = Go (SET)

Slide 34

Slide 34 text

State Machine: best_programming_language = Go, current_year = 2018
Log: current_year = 2018 (SET), best_programming_language = Go (SET), linux_on_desktop (DELETE)
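The state-machine slides boil down to one rule: every server that applies the same log entries in the same order ends up in the same state. A minimal sketch, assuming a key-value map with SET and DELETE commands; `Entry` and `apply` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    op: str                      # "SET" or "DELETE"
    key: str
    value: Optional[str] = None  # unused for DELETE


def apply(state: dict, entry: Entry) -> None:
    """Deterministically apply one log entry to the key-value state machine."""
    if entry.op == "SET":
        state[entry.key] = entry.value
    elif entry.op == "DELETE":
        state.pop(entry.key, None)
    else:
        raise ValueError(f"unknown op {entry.op!r}")


state = {"best_programming_language": "Ruby", "current_year": "2008", "linux_on_desktop": "Maybe"}
log = [
    Entry("SET", "current_year", "2018"),
    Entry("SET", "best_programming_language", "Go"),
    Entry("DELETE", "linux_on_desktop"),
]
for entry in log:
    apply(state, entry)
print(state)  # {'best_programming_language': 'Go', 'current_year': '2018'}
```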

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Getting a majority of servers in a cluster to agree on what’s in the log

Slide 37

Slide 37 text

I like my leadership the same way I like my ☕ Strong. — Raft

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Leader Election

Slide 40

Slide 40 text

⏰ Random Timers
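A sketch of the random timers, assuming the 150 to 300 ms range the Raft paper gives as an example; real deployments tune this, and `new_election_timeout` is a hypothetical helper.

```python
import random

# Example range from the Raft paper; tune for your own network.
MIN_TIMEOUT_MS = 150.0
MAX_TIMEOUT_MS = 300.0


def new_election_timeout(rng: random.Random) -> float:
    """Draw a fresh election timeout in milliseconds.

    Intended to be redrawn each time a follower hears from the leader, so a
    timer only runs out when the leader has gone quiet.
    """
    return rng.uniform(MIN_TIMEOUT_MS, MAX_TIMEOUT_MS)


rng = random.Random(7)  # seeded only so the demo is repeatable
timeouts = {f"node{i}": new_election_timeout(rng) for i in range(1, 6)}

# The node whose timer expires first becomes a candidate; spreading the
# timeouts makes it likely the others hear from it before their timers fire.
first = min(timeouts, key=timeouts.get)
print(f"{first} times out first ({timeouts[first]:.0f} ms) and starts an election")
```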

Slide 41

Slide 41 text

Monotonically Increasing Terms
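Terms act as a logical clock. A simplified sketch of the rules, assuming every RPC carries the sender's term; the `Node` class is hypothetical and leaves out everything except term handling.

```python
class Node:
    """Only the term-handling rules of a Raft-style node (simplified sketch)."""

    def __init__(self, name: str):
        self.name = name
        self.current_term = 0  # terms only ever go up
        self.role = "follower"

    def start_election(self) -> int:
        """Become a candidate in a brand-new, strictly higher term."""
        self.current_term += 1
        self.role = "candidate"
        return self.current_term

    def on_message(self, message_term: int) -> bool:
        """Apply the term rules to an incoming RPC; True means 'process it'."""
        if message_term < self.current_term:
            return False  # sender is stuck in an old term: reject
        if message_term > self.current_term:
            self.current_term = message_term  # catch up to the newer term...
            self.role = "follower"            # ...and step down if leading or campaigning
        return True


node = Node("n1")
node.start_election()                # term 1, candidate
print(node.on_message(3))            # True: newer term, so it becomes a follower in term 3
print(node.on_message(2))            # False: term 2 is stale now
print(node.current_term, node.role)  # 3 follower
```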

Slide 42

Slide 42 text

Every node starts off as a Follower.
If a Follower doesn’t hear from a Leader for a while (random timer), it becomes a Candidate.
If the Candidate receives votes from a majority of nodes, it becomes the Leader.

Slide 43

Slide 43 text

In the case of a split vote, nodes simply wait for another election.
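The election rules, split vote included, can be simulated in a few lines. This sketch uses a four-node cluster on purpose so that a 2 to 2 tie is possible, and it omits Raft's check that a candidate's log is up to date; `Voter` and `has_majority` are hypothetical names.

```python
from typing import Dict, Optional


class Voter:
    """RequestVote handling: at most one vote per term (log check omitted)."""

    def __init__(self) -> None:
        self.current_term = 0
        self.voted_for: Optional[str] = None

    def request_vote(self, candidate: str, term: int) -> bool:
        if term < self.current_term:
            return False                  # candidate is from an old term
        if term > self.current_term:
            self.current_term = term      # new term: earlier votes no longer count
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False                      # already voted for someone else this term


def has_majority(votes: int, cluster_size: int) -> bool:
    return votes > cluster_size // 2


cluster: Dict[str, Voter] = {name: Voter() for name in "ABCD"}

# Term 1: A and B time out together and both stand. Each votes for itself,
# C hears from A first and D hears from B first, so later requests are refused.
votes = {"A": 0, "B": 0}
for voter, candidate in [("A", "A"), ("B", "B"), ("C", "A"), ("D", "B"),
                         ("B", "A"), ("D", "A"), ("A", "B"), ("C", "B")]:
    votes[candidate] += cluster[voter].request_vote(candidate, 1)
print(votes)  # {'A': 2, 'B': 2}: split vote, nobody leads in term 1

# Term 2: A's random timer fires first this time and it wins outright.
votes_term2 = sum(cluster[name].request_vote("A", 2) for name in "ABCD")
print(votes_term2, has_majority(votes_term2, len(cluster)))  # 4 True
```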

Slide 44

Slide 44 text

Leader Election

Slide 45

Slide 45 text

Leader goes AWOL

Slide 46

Slide 46 text

Log Replication

Slide 47

Slide 47 text

1. Client sends a command to the Leader.
2. Leader appends an entry to its own log.
3. Leader issues an AppendEntries RPC to each Follower.
4. Each Follower appends the entry to its log and responds to the Leader to acknowledge the entry.
5. Once the entry is stored on a majority of servers (the Leader counts itself), the Leader commits it, applies it to its own state machine, and responds to the Client.
6. Subsequent AppendEntries RPCs (including heartbeats) tell the Followers the entry is committed, and each Follower applies it to its state machine.
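A sketch of the leader's side of steps 2 to 5, assuming synchronous calls, a hypothetical `reachable` flag for failed followers, and no consistency checks or retries (real Raft keeps retrying unreachable followers indefinitely). `Server` and `replicate` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    name: str
    log: List[str] = field(default_factory=list)
    reachable: bool = True  # hypothetical flag for a crashed or partitioned server

    def append_entries(self, entry: str) -> bool:
        """Follower side: store the entry and acknowledge it (checks omitted)."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def replicate(leader: Server, followers: List[Server], command: str) -> bool:
    """Leader side: True once the entry is stored on a majority of servers."""
    leader.log.append(command)          # step 2: append to its own log
    stored = 1                          # the leader's own copy counts
    for follower in followers:          # step 3: AppendEntries to each follower
        if follower.append_entries(command):
            stored += 1                 # step 4: follower acknowledged
    cluster_size = len(followers) + 1
    return stored > cluster_size // 2   # step 5: committed once a majority has it


leader = Server("leader")
followers = [Server(f"follower{i}") for i in range(1, 5)]
followers[0].reachable = False
followers[1].reachable = False
first = replicate(leader, followers, "SET x=1")   # 3 of 5 servers store it
followers[2].reachable = False
second = replicate(leader, followers, "SET x=2")  # only 2 of 5 servers store it
print(first, second)  # True False
```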

Slide 48

Slide 48 text

Log Replication

Slide 49

Slide 49 text

Handling Turbulent Network Conditions

Slide 50

Slide 50 text

Safety Guarantees
Election Safety: At most one leader can be elected in a given term.
Leader Append-Only: A leader never overwrites or deletes entries in its log; it only appends new ones.
Log Matching: If two logs contain an entry with the same index and term, the entries hold the same command and the logs are identical in all entries up to that index.
Leader Completeness: An entry committed in an earlier term will be present in the logs of the leaders of all later terms.
State Machine Safety: If a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry at that index.
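Log Matching is enforced by a consistency check on every AppendEntries call: the leader sends the index and term of the entry just before the new ones, and the follower refuses if its log does not match there. A simplified sketch with 1-based indices and entries as `(term, command)` pairs; the function name and signature are illustrative.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Follower-side consistency check (simplified sketch, 1-based indices).

    Accept only if the log holds an entry at prev_index whose term is
    prev_term; otherwise reject so the leader can retry from an earlier index.
    """
    if prev_index > len(log):
        return False  # follower is missing entries
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # conflicting entry at prev_index
    for offset, entry in enumerate(entries):
        slot = prev_index + offset  # 0-based position for this entry
        if slot < len(log) and log[slot][0] != entry[0]:
            del log[slot:]  # conflict: drop this entry and everything after it
        if slot >= len(log):
            log.append(entry)  # new entry not yet in the log
    return True


follower = [(1, "a")]
print(append_entries(follower, 2, 1, [(1, "c")]))              # False: nothing at index 2
print(append_entries(follower, 1, 1, [(1, "b"), (1, "c")]))    # True
print(follower)  # [(1, 'a'), (1, 'b'), (1, 'c')]
```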

Slide 51

Slide 51 text

Preventing Split-Brain (node terms: 1 1 1 1 1)

Slide 52

Slide 52 text

Preventing Split-Brain (node terms: 1 1 1 1 1)

Slide 53

Slide 53 text

Preventing Split-Brain (node terms: 2 2 2 1 1)

Slide 54

Slide 54 text

Preventing Split-Brain (node terms: 2 2 2 1 1; writes: X=1, X=2)

Slide 55

Slide 55 text

Preventing Split-Brain (node terms: 2 2 2 1 1; node values: X=1 X=1 X=1 X=2 X=2)

Slide 56

Slide 56 text

Preventing Split-Brain (node terms: 2 2 2 1 1; node values: X=1 X=1 X=1 X=2 X=2)

Slide 57

Slide 57 text

Preventing Split-Brain (node terms: 2 2 2 1 1; node values: X=1 X=1 X=1 X=2 X=2)

Slide 58

Slide 58 text

Preventing Split-Brain (node terms: 2 2 2 1 1; node values: X=1 X=1 X=1 X=2 X=2) AppendEntries (Term: 1, X = 2)

Slide 59

Slide 59 text

Preventing Split-Brain (node terms: 2 2 2 1 1; node values: X=1 X=1 X=1 X=2 X=2) NOPE. Term is 2 now.

Slide 60

Slide 60 text

Preventing Split-Brain (node terms: 2 2 2 1 2; node values: X=1 X=1 X=1 X=2)

Slide 61

Slide 61 text

Preventing Split-Brain (node terms: 2 2 2 1 2; node values: X=1 X=1 X=1 X=2) AppendEntries (Term: 2, X = 1)

Slide 62

Slide 62 text

Preventing Split-Brain (node terms: 2 2 2 2 2; node values: X=1 X=1 X=1 X=1 X=1)
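The split-brain sequence can be replayed in code, assuming the small numbers on those slides are each node's current term. This is a deliberately simplified sketch: the `Node` class is hypothetical, its AppendEntries skips the previous-term check, and it simply replaces everything after `prev_index`, which is enough for this one scenario.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[int, str]  # (term the entry was created in, command)


@dataclass
class Node:
    term: int
    log: List[Entry] = field(default_factory=list)
    role: str = "follower"

    def append_entries(self, leader_term: int, prev_index: int,
                       entries: List[Entry]) -> Tuple[bool, int]:
        """Simplified AppendEntries; returns (accepted, receiver's current term)."""
        if leader_term < self.term:
            return False, self.term           # "NOPE. Term is 2 now"
        self.term = leader_term
        self.role = "follower"
        self.log = self.log[:prev_index] + entries
        return True, self.term


# After the partition: three nodes elected a term-2 leader and committed X=1;
# the old term-1 leader and one follower still hold an uncommitted X=2.
new_leader = Node(2, [(2, "X=1")], "leader")
majority = [new_leader, Node(2, [(2, "X=1")]), Node(2, [(2, "X=1")])]
old_leader = Node(1, [(1, "X=2")], "leader")
stale = [old_leader, Node(1, [(1, "X=2")])]

# The network heals and the old leader tries to replicate X=2 in term 1.
ok, term = majority[1].append_entries(1, 0, [(1, "X=2")])
print(ok, term)  # False 2
if term > old_leader.term:            # it learns about term 2 and steps down
    old_leader.term, old_leader.role = term, "follower"

# The term-2 leader replicates X=1, overwriting the uncommitted X=2.
for node in stale:
    node.append_entries(new_leader.term, 0, [(2, "X=1")])
print([node.log for node in majority + stale])  # every node holds [(2, 'X=1')]
```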

Slide 63

Slide 63 text

Snapshots / Log Compaction
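The idea behind log compaction: once a prefix of the log has been applied, a snapshot of the state machine can stand in for that prefix. A minimal sketch, assuming a single compaction on a log of key-value SETs; `compact` is a hypothetical helper, and Raft's snapshot metadata (such as the last included entry's term) is omitted.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (key, value) SET commands, to keep the sketch short


def compact(state: Dict[str, str], log: List[Entry],
            last_applied: int) -> Tuple[Dict[str, str], int, List[Entry]]:
    """Snapshot the state machine and drop the log prefix it already covers.

    `last_applied` is how many entries, from the front of the log, are
    reflected in `state`. Returns (snapshot, last_included_index, remaining_log).
    """
    snapshot = dict(state)  # copy, so later writes don't alter the snapshot
    return snapshot, last_applied, log[last_applied:]


log = [("x", "1"), ("y", "2"), ("x", "3"), ("z", "4")]
state: Dict[str, str] = {}
for key, value in log[:3]:  # the first three entries have been applied
    state[key] = value

snapshot, last_included, log = compact(state, log, last_applied=3)
print(snapshot, last_included, log)  # {'x': '3', 'y': '2'} 3 [('z', '4')]

# A restarting or lagging server rebuilds state from snapshot + remaining log.
restored = dict(snapshot)
for key, value in log:
    restored[key] = value
print(restored)  # {'x': '3', 'y': '2', 'z': '4'}
```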

Slide 64

Slide 64 text

Thanks! https://raft.github.io/raft.pdf http://thesecretlivesofdata.com/raft/