Slide 1

Slide 1 text

You. Must. Build. A. Raft! A tidal wave of information

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

@schristoff25 So contained Much kuberbabby Dead meme wow Many systems engineered catonacomputer.com

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Lead Me

Slide 8

Slide 8 text

Lesson Plan:
- Distributed Systems vs Centralized Systems
- Byzantine Generals Problem
- Byzantine Fault Tolerance
- Consensus
- Paxos
- Raft

Slide 9

Slide 9 text

Centralized Distributed

Slide 10

Slide 10 text

Byzantine Generals Problem

Slide 11

Slide 11 text

Fig. 1: A City

Slide 12

Slide 12 text

Fig. 1: A City Fig. 2: A Byzantine General (obviously)

Slide 13

Slide 13 text

(oh boy.)

Slide 14

Slide 14 text

Fig. 3: A Messenger (coo!)

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

(muahaha) We’re retreating?? ??

Slide 17

Slide 17 text

Fig. 4: The Traitor

Slide 18

Slide 18 text

Solution

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

The generals can reach consensus if more than 2/3 of them are honest. Fig. 5: A Lieutenant

Slide 21

Slide 21 text

Attack! Ok, Attack! Attack! Attack! No, Eat!

Slide 22

Slide 22 text

So.. 2 commands for Attack and 1 for Eat. I’ll ATTACK!

Slide 23

Slide 23 text

Attack! Attack! Nap Time! Eat! Eat! Nap Time! Nap Time! Eat! Attack!

Slide 24

Slide 24 text

Lieutenant 1, Lieutenant 2, and Lieutenant 3 each receive: Attack, Nap, Eat. "So.. Attack, Nap, and Eat. We'll RETREAT! (as a default action)"

Slide 25

Slide 25 text

Byzantine Fault Tolerance

Slide 26

Slide 26 text

"In a 'Byzantine failure', a component such as a server can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers."

Slide 27

Slide 27 text

Byzantine Failure: The loss of system agreement due to a Byzantine Fault. Byzantine Fault: Any fault that presents different symptoms to different observers.

Slide 28

Slide 28 text

Actual Byzantine Fault

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Practical Byzantine Fault Tolerance

Slide 31

Slide 31 text

History
- Written by Castro and Liskov in 1999
- Tests show PBFT is only 3% slower than the standard NFS daemon

Slide 32

Slide 32 text

Breakdown PBFT requires 3f+1 replicas, where f is the number of failures to tolerate, which is more replicas than non-Byzantine consensus modules need.

Slide 33

Slide 33 text

Breakdown
- We want our system to withstand two failures.
- 3 * 2 + 1 = 7
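A minimal Go sketch of this replica math (illustrative only; the function name pbftReplicas is made up, not from the PBFT paper):

package main

import "fmt"

// pbftReplicas returns how many replicas PBFT needs to tolerate f
// Byzantine failures, using the slide's 3f+1 formula.
func pbftReplicas(f int) int {
	return 3*f + 1
}

func main() {
	// The slide's example: withstand two failures.
	fmt.Println(pbftReplicas(2)) // prints 7
}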

Slide 34

Slide 34 text

Breakdown First, the client sends a request (r) to the primary.

Slide 35

Slide 35 text

Breakdown The primary then sends the request to all backups.

Slide 36

Slide 36 text

Breakdown Each backup executes the request and then replies to the client.

Slide 37

Slide 37 text

Breakdown The client waits for f+1 replies from different replicas with the same result.

Slide 38

Slide 38 text

Breakdown If the client doesn't receive replies soon enough, it will send the request to all replicas.
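A rough Go sketch of the client-side rule from the last few slides: accept a result once f+1 replicas agree, otherwise keep waiting or rebroadcast to all replicas. This is illustrative only; the names and the string result type are assumptions, and it presumes each reply in the slice came from a distinct replica.

package sketch // hypothetical package name

// matchingResult returns a result once f+1 replies carry the same value.
// It assumes every reply came from a different replica.
func matchingResult(replies []string, f int) (string, bool) {
	counts := make(map[string]int)
	for _, r := range replies {
		counts[r]++
		if counts[r] >= f+1 {
			return r, true // enough matching replies: safe to accept
		}
	}
	return "", false // not enough agreement yet: wait or rebroadcast
}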

Slide 39

Slide 39 text

(The Saddest Moment)

Slide 40

Slide 40 text

“How can you make a reliable computer service?” the presenter will ask in an innocent voice before continuing, “It may be difficult if you can’t trust anything and the entire concept of happiness is a lie designed by unseen overlords of endless deceptive power.” - James Mickens

Slide 41

Slide 41 text

Consensus Problem

Slide 42

Slide 42 text

X = 1

Slide 43

Slide 43 text

X = 5 X = 0 X = ? X = 10 X = ??

Slide 44

Slide 44 text

?????????????????????????

Slide 45

Slide 45 text

A consensus protocol must satisfy:
Termination: Every correct process decides some value.
Integrity: If all the correct processes proposed the same value x, then any correct process must decide x.
Validity: If a process decides a value x, then x must have been proposed by some correct process.
Agreement: Every correct process must agree on the same value.

Slide 46

Slide 46 text

Paxos

Slide 47

Slide 47 text

History
- Used Lynch and Liskov's work as a base, 1988
- The Part-Time Parliament, Lamport 1989-1998
- No one understood it, so it took ten years to get it published
- Paxos Made Simple, Lamport 2001
- Also not easy to understand..

Slide 48

Slide 48 text

Breakdown

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Howdy!

Slide 52

Slide 52 text

or

Slide 53

Slide 53 text

How about this majestic amazing cat? Wow, yes, she’s really cool.

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Paxos has three roles:
- Proposers propose values
- Acceptors accept them
- Learners learn the agreed-upon value

Slide 56

Slide 56 text

Paxos has four phases:
- Prepare
- Promise
- Accept
- Accepted

Slide 57

Slide 57 text

Howdy! Prepare 9001

Slide 58

Slide 58 text

Promise
Acceptors ask themselves:
- Have we seen a proposal number higher than 9001?
- If not, let's promise never to accept anything numbered lower than 9001, and reply with the highest proposal we've already accepted (say, 8999).
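A small Go sketch of that promise rule, assuming proposal numbers are plain ints and ignoring networking and persistence; the type and field names are made up for illustration:

package sketch // hypothetical package name

// acceptor tracks the highest proposal number promised and the highest
// proposal actually accepted so far.
type acceptor struct {
	promisedN int    // highest proposal number we've promised
	acceptedN int    // highest proposal number we've accepted (0 if none)
	acceptedV string // value of that accepted proposal
}

// onPrepare handles Prepare(n): if n beats every promise so far, promise
// never to accept anything numbered lower than n and report the highest
// proposal we've already accepted (e.g. 8999 on the slide).
func (a *acceptor) onPrepare(n int) (ok bool, prevN int, prevV string) {
	if n <= a.promisedN {
		return false, 0, "" // already promised an equal or higher number
	}
	a.promisedN = n
	return true, a.acceptedN, a.acceptedV
}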

Slide 59

Slide 59 text

Accept ACCEPT!

Slide 60

Slide 60 text

Accepted

Slide 61

Slide 61 text

Breakdown Paxos needs 2m+1 servers to tolerate m failures. Example: I want my cluster to tolerate 2 failures, so m = 2 and 2 * 2 + 1 = 5. I need five servers.
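The same arithmetic as a tiny Go helper (illustrative only; the name is made up):

package sketch // hypothetical package name

// paxosServers returns how many servers Paxos needs to tolerate m
// failures (2m+1), matching the slide's example of m = 2 giving 5.
func paxosServers(m int) int { return 2*m + 1 }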

Slide 62

Slide 62 text

Breakdown: Failures
Proposer fails during prepare phase:
- The proposer can no longer receive promises, so the round doesn't complete.
- A new proposer can take over, though.

Slide 63

Slide 63 text

Breakdown: Failures
Proposer fails during accept phase:
- Another proposer takes over the job, but someone will eventually tell the new proposer about the previously unfinished business. The new proposer will update their value to the previous value.

Slide 64

Slide 64 text

Breakdown: Failures
Acceptor fails:
- The system keeps running unless a majority of acceptors fail.

Slide 65

Slide 65 text

Leaderless Byzantine Paxos
- Leaders are a huge pain point in becoming Byzantine Fault Tolerant
- Once a leader is malicious, it is difficult to choose a new leader
- Lamport calls Castro and Liskov's method for detecting this "ad hoc"
- Each server is a virtual leader
- The message is sent out to each server
- All virtual leaders synchronously send back their responses

Slide 66

Slide 66 text

“If the system does not behave synchronously, then the synchronous Byzantine agreement algorithm may fail, causing different servers to choose different virtual-leader messages. This is equivalent to a malicious leader sending conflicting messages to different processes.” - Leslie Lamport

Slide 67

Slide 67 text

“There are significant gaps between the description of Paxos and the needs of the real world system...” - Google Chubby Authors

Slide 68

Slide 68 text

Raft

Slide 69

Slide 69 text

History
- In Search of an Understandable Consensus Algorithm, Diego Ongaro and John Ousterhout, est. 2013
- College students set out to make a more understandable consensus algorithm
- Stands for "Replicated and Fault Tolerant"

Slide 70

Slide 70 text

Breakdown In the beginning, Raft has to elect a leader. Each node will have a randomized timeout set: 150ms, 157ms, 190ms, 300ms, 201ms.

Slide 71

Slide 71 text

Breakdown The first node to reach the end of its timeout will request to be leader. A node will typically reach the end of its timeout when it doesn't get a message from the leader. 0ms, 7ms, 40ms, 150ms, 51ms. "Vote for me please!"

Slide 72

Slide 72 text

Breakdown The elected leader will send out health checks (heartbeats), which restart the other nodes' timeouts. 51ms, 150ms, 165ms, 40ms, 150ms, 51ms. "New phone, who dis??"
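A Go sketch of the follower-side timer behind these three slides: a randomized timeout (the 150-300ms range mirrors the example values shown) that restarts whenever a heartbeat arrives and otherwise triggers a run for leadership. The channel-based shape and names are assumptions, not from any particular Raft implementation.

package sketch // hypothetical package name

import (
	"math/rand"
	"time"
)

// followerTimer restarts its randomized election timeout every time a
// heartbeat arrives; if the timeout expires first, the node stands for
// election.
func followerTimer(heartbeats <-chan struct{}, becomeCandidate func()) {
	for {
		timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
		select {
		case <-heartbeats:
			// Heartbeat from the leader: loop and pick a fresh timeout.
		case <-time.After(timeout):
			// No word from the leader in time: ask for votes.
			becomeCandidate()
			return
		}
	}
}

The randomization of the timeout is what makes the "vote split, nobody wins" case a few slides later unlikely to repeat: nodes rarely time out at the same moment.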

Slide 73

Slide 73 text

A server can be in any one of three states at any given time:
Follower: Listening for heartbeats
Candidate: Polling for votes
Leader: Listening for incoming commands, sending out heartbeats to keep the term alive
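Those three states as a tiny Go enum (the naming is illustrative only):

package sketch // hypothetical package name

type state int

const (
	follower  state = iota // listening for heartbeats
	candidate              // polling for votes
	leader                 // serving commands, sending heartbeats
)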

Slide 74

Slide 74 text

Breakdown
- Raft is divided into terms; there is at most one leader per term.
- Some terms can have no leader.
- "Terms identify obsolete information" - John Ousterhout
- The leader's log is seen as the truth and is the most up-to-date log.

Slide 75

Slide 75 text

Breakdown: Leader Election
- Timeout occurs after not receiving a heartbeat from the leader
- Request others to vote for you
- Become leader, send out heartbeats
- Somebody else becomes leader: become a follower
- Vote split, nobody wins: new term

Slide 76

Slide 76 text

Leader Election - Servers will deny a candidate their vote if their own log has a higher term or higher index than the proposed leader's log. (Diagram: each server's log shown as index/value entries; a different color represents a new term.) "Vote for me please!"
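A Go sketch of that voting rule, simplified to just the log comparison (real Raft also tracks the current term and who the server has already voted for); all names here are made up for illustration:

package sketch // hypothetical package name

// moreUpToDate reports whether log A is more up to date than log B:
// compare the terms of the last entries first, then the log lengths.
func moreUpToDate(lastTermA, lastIndexA, lastTermB, lastIndexB int) bool {
	if lastTermA != lastTermB {
		return lastTermA > lastTermB
	}
	return lastIndexA > lastIndexB
}

// grantVote denies a candidate whose log is less up to date than ours.
func grantVote(myLastTerm, myLastIndex, candLastTerm, candLastIndex int) bool {
	return !moreUpToDate(myLastTerm, myLastIndex, candLastTerm, candLastIndex)
}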

Slide 77

Slide 77 text

Breakdown: Log Replication
"Keeping the replicated log consistent is the job of the consensus algorithm."
- Raft is designed around the log. Servers with inconsistent logs will never get elected as leader.
- Normal operation of Raft will repair inconsistencies.

Slide 78

Slide 78 text

Breakdown: Log Replication
- Logs must persist through crashes
- Any committed entry is safe to execute in state machines
- A committed entry is replicated on the majority of servers
(Diagram: five servers' logs shown as index/value entries; the entries replicated on a majority are marked as Committed Entries.)

Slide 79

Slide 79 text

Breakdown: Log Replication
(Diagram: leader and follower logs; the follower's entries at indexes 6 and 7 don't match the leader's.)
"Lookin' for a blue 6, and nothing in seven, bud.."

Slide 80

Slide 80 text

Breakdown: Log Replication
(Diagram: the same logs; the follower's entries at indexes 6 and 7 still don't match the leader's, so it rejects the append.)
"Lookin' for a blue 6, and nothing in seven, bud.." "No, thank you, friend."

Slide 81

Slide 81 text

Breakdown: Log Replication
(Diagram: the leader resends its entries from index 5 onward - 5 L = 0, 6 R = 7, 7 Z = 6 - and the follower overwrites its mismatched entries.)
"How's this looking?" "Oh! I can fix this!"
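A simplified Go sketch of the consistency check driving this exchange: the follower accepts new entries only if its log matches the leader's at the previous index and term; otherwise the leader backs up and retries with a longer suffix of its log. The entry layout and names are assumptions for illustration, not Raft's actual RPC definition.

package sketch // hypothetical package name

type entry struct {
	Term  int
	Value string
}

// appendEntries applies the leader's entries to a follower's log.
// prevIndex is 1-based; prevIndex == 0 means "append from the start".
func appendEntries(log []entry, prevIndex, prevTerm int, newEntries []entry) ([]entry, bool) {
	// Reject if we don't have a matching entry at prevIndex ("No, thank you, friend.").
	if prevIndex > len(log) || (prevIndex > 0 && log[prevIndex-1].Term != prevTerm) {
		return log, false
	}
	// Otherwise overwrite everything after prevIndex with the leader's entries.
	return append(log[:prevIndex], newEntries...), true
}

The leader would call this repeatedly with a smaller prevIndex until the follower accepts, which is the back-and-forth shown in the speech bubbles above.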

Slide 82

Slide 82 text

Breakdown: Failures
- Normal operation will heal all log inconsistencies
- If the leader fails before sending out a new entry, then that entry will be lost

Slide 83

Slide 83 text

A Byzantine Fault Tolerant Raft
Tangaroa: a Byzantine Fault Tolerant Raft:
- Uses digital signatures to authenticate messages
- Clients can interrupt current leadership if it fails to make progress; this keeps disloyal leaders from starving the system
- Nodes broadcast each entry they would like to commit to each other, not just to the leader

Slide 84

Slide 84 text

Summary

Slide 85

Slide 85 text

Timeline:
- 1982, Byzantine Generals Problem: Lamport, Shostak, and Pease release the Byzantine Generals Problem to describe failures in reliable computing.
- 1988, Dwork, Lynch, and Stockmeyer: show the solvability of consensus.
- 1989-1998, The Part-Time Parliament: Lamport spends almost ten years trying to get his paper accepted.
- 2001, Paxos Made Simple: Lamport releases Paxos Made Simple in an attempt to re-teach Paxos.
- 2011, Leaderless Byzantine Paxos: Lamport releases a three-page paper describing how to make a Byzantine Fault Tolerant Paxos.
- Est. 2012, Raft: In Search of an Understandable Consensus Algorithm is released; Raft is born!

Slide 86

Slide 86 text

- When something talks about Byzantine, you know what it means
- You can use this in meetings to sound really cool
- If anyone says "Let's use Paxos", you can tell them why it's probably not a good idea
- If someone tells you that X problem is occurring because of Raft, you may be able to tell them they're wrong

Slide 87

Slide 87 text

Lead Me

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

Thank you
Wikipedia: Paxos; Raft; Consensus Problem; Byzantine Fault Tolerance
Medium: Loom Network
Whitepapers: Practical Byzantine Fault Tolerance; Byzantine Leaderless Paxos; The Part-Time Parliament; The Byzantine Generals Problem; Paxos Made Simple; In Search of an Understandable Consensus Algorithm; Consensus in the Cloud: Paxos Demystified; Tangaroa: A Byzantine Fault Tolerant Raft

Slide 90

Slide 90 text

Thank you
Misc.: James Aspnes Notes - Paxos; The Saddest Moment; Mark Nelson - Byzantine Fault; Good Math - Paxos; Byzantine Failures - NASA; GoogleTech Talk - Paxos; Talk on Raft; Raft Website; CSE452 at Washington State; Practical Byzantine Fault Tolerance