Slide 1

Slide 1 text

CONSENSUS ALGORITHMS YIFAN XING - 2018 CONSENSUS ALGORITHMS Yifan Xing DISTRIBUTED SYSTEMS @yifan_xing_e

Slide 2

Slide 2 text

HOW TO PICK CAKE FLAVORS
Consensus

Slide 3

Slide 3 text

Poll
- Kafka: A distributed streaming platform
- Consensus Algorithm, e.g. Raft, Paxos

Slide 4

Slide 4 text

Distributed System

Slide 5

Slide 5 text

Modern Examples
- Web
- DNS
- BitTorrent

Slide 6

Slide 6 text

Fun Fact
American Airlines, 1920s:
- Central Office and Travel Agents
- Cards for each flight
- Mark seats sold on cards

Slide 7

Slide 7 text

Challenges

Slide 8

Slide 8 text

Challenge 1: Lack of Global Knowledge
- Up-to-date
- Exchange
- Inconsistency

Slide 9

Slide 9 text

Challenge 2: Time
- Clock skew
- Delayed/duplicate messages -> order

Slide 10

Slide 10 text

Challenge 3: Consistency
- Concurrent operations
- Consistent state
- Conflicts

Slide 11

Slide 11 text

Challenge 4: Failures
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
- Tolerate failures
- Detect
- Handle
- Recover

Slide 12

Slide 12 text

Consensus Algorithms
- Raft
- Paxos
- Consensus
- Fault-tolerance
- Integrity
- Availability
- Reliability

Slide 13

Slide 13 text

Paxos Background
- Paxos Made Simple: Basic Paxos
- Lynch & Liskov
- Leslie Lamport
- Proved
- The Part Time Parliament
- Multi-Paxos: Paxos + Complexity
- Rejected
- Published
- No mathematical proof

Slide 14

Slide 14 text

Basic Paxos
- Proposer: propose a value
- Prepare: try to propose
- Acceptor: accept/reject value
- Consensus: agree on one value

Slide 15

Slide 15 text

Basic Paxos
Proposer:
- Choose a Proposal Number (n)

Slide 16

Slide 16 text

Basic Paxos
Proposer:
- Choose a Proposal Number (n)
- Broadcast the number to all servers: Prepare(n)

Slide 17

Slide 17 text

Basic Paxos
Proposer:
- Choose a Proposal Number (n)
- Broadcast the number to all servers: Prepare(n)
Acceptor:
- If n > maxProposal: maxProposal = n
- Respond with a Promise: won't accept a proposal with n' < n

Slide 18

Slide 18 text

Basic Paxos
Proposer:
- Choose a Proposal Number (n)
- Broadcast the number to all servers: Prepare(n)
- If a majority promised: Accept(n, value)
Acceptor:
- Respond with a Promise: won't accept a proposal with n' < n

Slide 19

Slide 19 text

Basic Paxos
Proposer:
- Broadcast Accept(n, value)

Slide 20

Slide 20 text

Basic Paxos
Proposer:
- Broadcast Accept(n, value)
Acceptor:
- If n >= maxProposal: acceptedProposal = maxProposal = n, acceptedValue = value
- Respond

Slide 21

Slide 21 text

Basic Paxos
Proposer:
- Broadcast Accept(n, value)
- If a majority responded:
  - if any rejection => n is not the largest, repeat from the beginning
  - else: value chosen
Acceptor:
- If n >= maxProposal: acceptedProposal = maxProposal = n, acceptedValue = value
- Respond

Slide 22

Slide 22 text

Basic Paxos
Proposer:
- Broadcast Accept(n, value)
- If a majority responded:
  - if any rejection => n is not the largest, repeat from the beginning
  - else: value chosen
Acceptor:
- If n >= maxProposal: acceptedProposal = maxProposal = n, acceptedValue = value
- Respond
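The proposer and acceptor steps above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not a networked implementation. The names `Acceptor` and `propose` are hypothetical. Adopting a previously accepted value after the prepare phase is the standard Basic Paxos rule and is an addition here; it is not spelled out on the slides.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """One Basic Paxos acceptor, using the slide's variable names."""
    max_proposal: int = 0          # highest proposal number promised so far
    accepted_proposal: int = 0     # number of the last accepted proposal (0 = none)
    accepted_value: Optional[Any] = None

    def prepare(self, n: int) -> Tuple[bool, int, Optional[Any]]:
        # Promise only if n is larger than anything seen before.
        promised = n > self.max_proposal
        if promised:
            self.max_proposal = n
        return promised, self.accepted_proposal, self.accepted_value

    def accept(self, n: int, value: Any) -> int:
        # Accept unless a higher-numbered proposal was promised meanwhile.
        if n >= self.max_proposal:
            self.accepted_proposal = self.max_proposal = n
            self.accepted_value = value
        return self.max_proposal   # a reply > n tells the proposer it was rejected


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one round; return the chosen value, or None if the round must be retried."""
    majority = len(acceptors) // 2 + 1
    granted = [reply for reply in (a.prepare(n) for a in acceptors) if reply[0]]
    if len(granted) < majority:
        return None
    # Assumption (standard Paxos): reuse the value of the highest accepted proposal.
    _, prior_n, prior_value = max(granted, key=lambda reply: reply[1])
    if prior_n > 0:
        value = prior_value
    replies = [a.accept(n, value) for a in acceptors]
    if any(reply > n for reply in replies):
        return None                # n is not the largest: repeat from the beginning
    return value
```

Once a value is chosen, a later proposal with a higher number ends up choosing the same value, which is what keeps the decision stable.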

Slide 23

Slide 23 text

Paxos: Proposal Number
Each server (e.g. S0) keeps:
- server id: unique
- round number: incremented over time (shared, highest)
Generate a new proposal number:
- increment maxRound
- concatenate with server id
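One way to read "increment maxRound, concatenate with server id" is to put the round number in the high bits and the server id in the low bits, so numbers are unique per server and ordered by round. The class name, the `observe` method, and the 8-bit id width are assumptions made for this sketch.

```python
class ProposalNumberGenerator:
    """Hypothetical generator: proposal number = (round << id_bits) | server_id."""

    def __init__(self, server_id: int, id_bits: int = 8) -> None:
        if not 0 <= server_id < (1 << id_bits):
            raise ValueError("server_id does not fit in id_bits")
        self.server_id = server_id
        self.id_bits = id_bits
        self.max_round = 0  # highest round number seen so far

    def observe(self, proposal_number: int) -> None:
        # Track the highest round seen in any proposal from any server.
        self.max_round = max(self.max_round, proposal_number >> self.id_bits)

    def next(self) -> int:
        # Increment maxRound, then concatenate with the server id.
        self.max_round += 1
        return (self.max_round << self.id_bits) | self.server_id
```

Because the round sits in the high bits, a server that has observed another server's number always generates a larger one next.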

Slide 24

Slide 24 text

Multi-Paxos
- A sequence of instances of Basic Paxos
- Log entries
- Implementation of Basic Paxos Concepts

Slide 25

Slide 25 text

Multi-Paxos
- Client: cmd

Slide 26

Slide 26 text

Multi-Paxos
- Client: cmd
- Basic Paxos: choose cmds (values) in log entries

Slide 27

Slide 27 text

Multi-Paxos
- Apply cmd (log entries)
[Diagram: log of cmd entries numbered 1 to 4]

Slide 28

Slide 28 text

Multi-Paxos
- Client: result

Slide 29

Slide 29 text

Paxos: Leader vs. Non-Leader
Leader / Distinguished Proposer:
- Server with the highest ID
- Sends a heartbeat every T ms
- Accepts requests from clients
Each server (e.g. S0):
- If it didn't receive a heartbeat from a higher ID for >= 2T ms: act as leader, act as proposer

Slide 30

Slide 30 text

Paxos: Leader vs. Non-Leader
Leader / Proposer:
- Server with the highest ID
- Sends a heartbeat every T ms
- Accepts requests from clients
Non-leader:
- Redirects client requests to the leader
- Acts as acceptor
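The heartbeat rule can be written as a single check. This is a sketch under assumptions: the function name and the `last_heartbeat_ms` mapping (server id to the time its last heartbeat arrived) are hypothetical, and times are plain millisecond numbers supplied by the caller.

```python
from typing import Dict


def acts_as_leader(my_id: int,
                   last_heartbeat_ms: Dict[int, float],
                   now_ms: float,
                   heartbeat_interval_ms: float) -> bool:
    """Return True if no server with a higher id has been heard from within 2T ms."""
    for server_id, seen_at in last_heartbeat_ms.items():
        recently_alive = now_ms - seen_at < 2 * heartbeat_interval_ms
        if server_id > my_id and recently_alive:
            return False  # a live higher-id server is the leader; redirect to it
    return True
```

Heartbeats from lower ids never block leadership, so the live server with the highest id is the one that acts as leader.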

Slide 31

Slide 31 text

Paxos: Leader
- Unlikely to have two leaders at the same time
- Can handle multiple leaders, but won't work as efficiently because of conflicts

Slide 32

Slide 32 text

Industry Examples

Slide 33

Slide 33 text

Raft

Slide 34

Slide 34 text

Raft: Background
- Designed by Diego Ongaro and John Ousterhout at Stanford
- Simpler version of Paxos
- Equivalent: performance & fault-tolerance
- Consistency, Conciseness, Correctness
- Understandability
- Why: implemented -> useful, extended/adapted to the environment

Slide 35

Slide 35 text

Raft
- Raft: agrees on a log (not a single value)

Slide 36

Slide 36 text

Raft: Phases
Leader Election:
1. Select one machine to be a leader
2. Detect crashes, reelection

Slide 37

Slide 37 text

Raft: Phases
Leader Election:
1. Select one machine to be a leader
2. Detect crashes, reelection
Log Replication:
1. Leader processes commands from clients
2. Replicates logs (consistency and consensus among servers)

Slide 38

Slide 38 text

Raft: Leader Election
- Timeout: become Candidate
  - CurrentTerm++
  - Vote for itself
  - Send RequestVoteRPC to other servers
- Majority votes: become Leader
  - Send heartbeats
  - Handle requests
- RPC from Leader: become Follower
  - Redirect requests
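The role transitions in this diagram can be sketched as a small state machine. The `Role` and `Server` names and the event methods are hypothetical, and the sketch tracks election state only: no RPCs are sent and no log is kept.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Server:
    """Hypothetical Raft server that tracks only its election state."""

    def __init__(self, server_id: int, cluster_size: int) -> None:
        self.server_id = server_id
        self.cluster_size = cluster_size
        self.role = Role.FOLLOWER
        self.current_term = 0
        self.votes = 0

    def on_election_timeout(self) -> None:
        if self.role is Role.LEADER:
            return  # leaders send heartbeats instead of timing out
        # Timeout: become candidate, CurrentTerm++, vote for itself.
        self.role = Role.CANDIDATE
        self.current_term += 1
        self.votes = 1
        # A real server would now send RequestVoteRPC to the other servers.

    def on_vote_granted(self) -> None:
        if self.role is not Role.CANDIDATE:
            return
        self.votes += 1
        if self.votes >= self.cluster_size // 2 + 1:
            self.role = Role.LEADER  # majority votes: become leader

    def on_leader_rpc(self, leader_term: int) -> None:
        # An RPC from a leader with a current or newer term: become follower.
        if leader_term >= self.current_term:
            self.current_term = leader_term
            self.role = Role.FOLLOWER
```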

Slide 39

Slide 39 text

Raft: How to ensure election works?
At most one leader per term:
- Each server: one vote per term
- Receive majority to win election (N / 2 + 1)
- Example: [Diagram: servers S0 to S4; three votes go to S0 and two to S1]

Slide 40

Slide 40 text

Raft: How to ensure election works?
At most one leader per term:
- Each server: one vote per term
- Receive majority to win election (N / 2 + 1)
- Example: [Diagram: servers S0 to S4; three votes go to S0 and two to S1; S0 becomes Leader]
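Why one vote per term plus a majority rule allows at most one leader can be seen in a short sketch. The `Voter` class and `majority` helper are hypothetical, and the log check that Raft also applies before granting a vote is deliberately left out here.

```python
from typing import Optional


def majority(cluster_size: int) -> int:
    """Votes needed to win: N / 2 + 1 with integer division."""
    return cluster_size // 2 + 1


class Voter:
    """Hypothetical server that grants at most one vote per term."""

    def __init__(self) -> None:
        self.current_term = 0
        self.voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate_id: str) -> bool:
        if term < self.current_term:
            return False                 # stale candidate
        if term > self.current_term:
            self.current_term = term     # new term: the vote is available again
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False                     # already voted for someone else this term
```

With five voters, once three have voted for S0 in a term, S1 can collect at most two votes in that term, which is below `majority(5)`.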

Slide 41

Slide 41 text

Raft: How to ensure election works?
There will eventually be a leader:
- Random election timeout (range 100-300 ms)
- Usually, one server times out first and wins the majority of votes
- If two time out at the same time:
  - Split vote -> election timeout -> re-enter election state (increment term, gather votes)
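A small sketch of the randomized timeout, assuming a uniform draw over the 100-300 ms range from the slide. The function names are hypothetical, and a real server would redraw its timeout on every election attempt.

```python
import random
from typing import Dict, List, Tuple


def election_timeout_ms(rng: random.Random, low: float = 100.0, high: float = 300.0) -> float:
    """Draw one election timeout uniformly from [low, high] milliseconds."""
    return rng.uniform(low, high)


def first_to_time_out(server_ids: List[str], rng: random.Random) -> Tuple[str, Dict[str, float]]:
    """Give every server a random timeout and report which one fires first."""
    timeouts = {server_id: election_timeout_ms(rng) for server_id in server_ids}
    winner = min(timeouts, key=timeouts.get)
    return winner, timeouts
```

Because the draws are independent, exact ties are unlikely, so one server usually becomes a candidate before the others and a split vote is resolved by simply drawing again.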

Slide 42

Slide 42 text

Raft: Log Replication
[Diagram: Client, leader S0, and its Log]

Slide 43

Slide 43 text

Raft: Leader Appends Entry to Log
[Diagram: Client sends cmd; S0 appends cmd to its Log]

Slide 44

Slide 44 text

Raft: Leader Sends AppendEntryRPC
[Diagram: S0 holds cmd in its Log and sends AppendEntryRPC]

Slide 45

Slide 45 text

Raft: Followers Send ACKs
[Diagram: cmd appears in each server's Log; ACK is sent to S0]

Slide 46

Slide 46 text

Raft: Leader Replies to Client
If majority:
- Entry committed
- Execute
- Return result

Slide 47

Slide 47 text

Raft: Leader Notifies Followers of Committed Entry
[Diagram: Client, S0, and the Logs of all servers holding cmd]

Slide 48

Slide 48 text

Raft: Not Majority?
If not majority:
- Leader retries until it succeeds
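The replication walkthrough (append, send, collect ACKs, commit on majority, otherwise retry) can be sketched as one function. This is an in-process illustration: followers are modelled as hypothetical callables that return True for an ACK, and `max_rounds` is an assumption added so the sketch terminates, whereas the slide's leader retries until it succeeds.

```python
from typing import Callable, Dict, List, Set


def replicate(entry: str,
              leader_log: List[str],
              followers: Dict[str, Callable[[str], bool]],
              max_rounds: int = 10) -> bool:
    """Append entry on the leader, then retry followers until a majority has it."""
    leader_log.append(entry)                  # leader appends the entry to its log
    cluster_size = len(followers) + 1         # followers plus the leader
    needed = cluster_size // 2 + 1
    acked: Set[str] = set()
    for _ in range(max_rounds):
        for name, send_append_entry in followers.items():
            if name not in acked and send_append_entry(entry):
                acked.add(name)               # follower sent an ACK
        if len(acked) + 1 >= needed:          # +1 counts the leader's own copy
            return True                       # committed: execute and reply to client
    return False                              # still no majority after max_rounds
```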

Slide 49

Slide 49 text

Raft: Log Entry
- Index: position of the entry in the log
- Term: current term when the cmd was received
- Command: cmd to execute
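As a data structure, a log entry is just these three fields. The `LogEntry` and `append_command` names are hypothetical, and the sketch assumes 1-based indexes, as in the numbered log diagrams.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogEntry:
    index: int     # position in the log, starting at 1
    term: int      # leader's current term when the command was received
    command: str   # command to execute on the state machine


def append_command(log: List[LogEntry], current_term: int, command: str) -> LogEntry:
    """Create the next entry for this log and append it."""
    entry = LogEntry(index=len(log) + 1, term=current_term, command=command)
    log.append(entry)
    return entry
```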

Slide 50

Slide 50 text

Raft: Consistency
- Server crashes => log inconsistency
- Goal: log consistency. But how?
[Diagram: logs of S0 and S2 hold entries from terms 1 to 4; the log of S1 stops at term 2]

Slide 51

Slide 51 text

Log Consistency
- Always trust the leader's log; do not elect a candidate with inconsistent logs
- No "holes" in the log
- Repair inconsistency during the log replication process

Slide 52

Slide 52 text

Leader's Log Completeness
How to guarantee completeness of leader's log?
RequestVoteRPC:
- term: candidate's term
- candidateId: candidate requesting vote
- lastLogIndex: index of candidate's last log entry
- lastLogTerm: term of candidate's last log entry

Slide 53

Slide 53 text

(In)complete Log Example - 1
How to guarantee completeness of leader's log?
Candidate, RequestVoteRPC:
- term: 2
- candidateId: 00001
- lastLogIndex: 3
- lastLogTerm: 1
Current state machine info:
- term: 3
- candidateId: 00002
- lastLogIndex: 3
- lastLogTerm: 1

Slide 54

Slide 54 text

(In)complete Log Example - 1
How to guarantee completeness of leader's log?
Candidate, RequestVoteRPC:
- term: 2
- candidateId: 00001
- lastLogIndex: 3
- lastLogTerm: 1
Current state machine info:
- term: 3
- candidateId: 00002
- lastLogIndex: 3
- lastLogTerm: 1
REJECT VOTE: candidate.term < my.term => the candidate is out of date

Slide 55

Slide 55 text


Slide 56

Slide 56 text

(In)complete Log Example - 2
How to guarantee completeness of leader's log?
Candidate, RequestVoteRPC:
- term: 3
- candidateId: 00001
- lastLogIndex: 3
- lastLogTerm: 2
Current state machine info:
- term: 3
- candidateId: 00002
- lastLogIndex: 4
- lastLogTerm: 2

Slide 57

Slide 57 text

(In)complete Log Example - 2
How to guarantee completeness of leader's log?
Candidate, RequestVoteRPC:
- term: 3
- candidateId: 00001
- lastLogIndex: 3
- lastLogTerm: 2
Current state machine info:
- term: 3
- candidateId: 00002
- lastLogIndex: 4
- lastLogTerm: 2
REJECT VOTE: candidate.lastLogIndex < my.lastLogIndex => my log is longer

Slide 58

Slide 58 text

(In)complete Log Example - 3
How to guarantee completeness of leader's log?
Candidate, RequestVoteRPC:
- term: 3
- candidateId: 00001
- lastLogIndex: 5
- lastLogTerm: 2
Current state machine info:
- term: 3
- candidateId: 00002
- lastLogIndex: 4
- lastLogTerm: 2

Slide 59

Slide 59 text

(In)complete Log Example - 3
How to guarantee completeness of leader's log?
Candidate, RequestVoteRPC:
- term: 3
- candidateId: 00001
- lastLogIndex: 5
- lastLogTerm: 2
Current state machine info:
- term: 3
- candidateId: 00002
- lastLogIndex: 4
- lastLogTerm: 2
ACCEPT VOTE REQUEST: candidate.term >= my.term && candidate.lastLogIdx > my.lastLogIdx => complete
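The three examples can be checked against a short vote-granting function. The class and function names are hypothetical. The comparison follows Raft's usual "at least as up-to-date" rule (compare lastLogTerm first, then lastLogIndex), which is slightly more general than the condition shown on the slide, and the sketch ignores whether the voter has already voted in this term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteRequest:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass(frozen=True)
class VoterState:
    term: int
    last_log_index: int
    last_log_term: int


def grant_vote(request: VoteRequest, me: VoterState) -> bool:
    """Grant the vote only if the candidate's term and log are not behind mine."""
    if request.term < me.term:
        return False                                         # stale term
    if request.last_log_term != me.last_log_term:
        return request.last_log_term > me.last_log_term      # later last term wins
    return request.last_log_index >= me.last_log_index       # same term: longer log wins
```

Applied to the slides: example 1 is rejected on the term check, example 2 because the voter's log is longer, and example 3 is accepted.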

Slide 60

Slide 60 text

Log Replication: Repair Inconsistency
Check for consistency when sending AppendEntriesRPC
AppendEntryRPC:
- CurrentEntry: index, term, cmd
- PrecedingEntry: index, term
[Diagram: log of S0, indexes 1 to 5, entry terms 1 1 2 2 2]

Slide 61

Slide 61 text

Log Replication: Repair Inconsistency
Goal: append a new entry to log
AppendEntryRPC:
- CurrentEntry: index, term, cmd
- PrecedingEntry: index, term
Receiver checks its own preceding index and term
[Diagram: indexes 1 to 5; S0 entry terms 1 1 2 2 2; S1 entry terms 1 1 2 2]

Slide 62

Slide 62 text

Log Replication: Repair Inconsistency
Goal: append a new entry to log
- If match => append entry
- else => reject request
[Diagram: indexes 1 to 5; S0 entry terms 1 1 2 2 2; S1 entry terms 1 1 2 2 2]

Slide 63

Slide 63 text

Log Replication: Repair Inconsistency
Goal: append a new entry to log
AppendEntryRPC:
- CurrentEntry: index, term, cmd
- PrecedingEntry: index, term
[Diagram: indexes 1 to 5; S0 entry terms 1 1 1 2 2; S1 entry terms 1 1 1 1]

Slide 64

Slide 64 text

Log Replication: Repair Inconsistency
Goal: append a new entry to log
Do not match: REJECT
[Diagram: indexes 1 to 5; S0 entry terms 1 1 1 2 2; S1 entry terms 1 1 1 1]

Slide 65

Slide 65 text

Log Replication: Repair Inconsistency
Goal: append a new entry to log
AppendEntryRPC:
- CurrentEntry: index, term, cmd
- PrecedingEntry: index, term
[Diagram: indexes 1 to 5; S0 entry terms 1 1 1 2 2; S1 entry terms 1 1 1 1]

Slide 66

Slide 66 text

Log Replication: Repair Inconsistency
Goal: append a new entry to log
Match: Replicate Log Entry
[Diagram: indexes 1 to 5; S0 entry terms 1 1 1 2 2; S1 entry terms 1 1 1 2 2]
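The reject-then-retry repair shown in these slides can be sketched with logs as lists of (term, cmd) pairs and 1-based indexes. The function names are hypothetical. Truncating the follower's log after the matching point is a simplification that assumes requests arrive in order; a production implementation would delete only entries that actually conflict.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, cmd); the index is the 1-based position in the list


def append_entries(follower_log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Follower side: accept only if the preceding entry's index and term match."""
    if prev_index > 0:
        if len(follower_log) < prev_index or follower_log[prev_index - 1][0] != prev_term:
            return False                     # do not match: REJECT
    del follower_log[prev_index:]            # drop anything after the matching point
    follower_log.extend(entries)             # match: replicate the log entries
    return True


def replicate_with_repair(leader_log: List[Entry], follower_log: List[Entry]) -> int:
    """Leader side: step back one entry per rejection; return the number of RPCs sent."""
    next_index = max(1, len(leader_log))     # start by sending the newest entry
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        if append_entries(follower_log, prev_index, prev_term, leader_log[prev_index:]):
            return attempts
        next_index -= 1                      # rejected: retry with an earlier entry
```

With the logs from the slides (leader terms 1 1 1 2 2, follower terms 1 1 1 1), the first request is rejected and the second succeeds, leaving both logs identical.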

Slide 67

Slide 67 text

Log Matching Property
For each log entry: compare index and term
If both index and term match:
1. the two entries store the same cmd
2. all previous entries are identical
[Diagram: indexes 1 to 5; S0 entry terms 1 1 2 2 2; S1 entry terms 1 1 2 2 2; S2 entry terms 1 1 2 3 3, with cmd xxx at indexes 4 and 5]
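The property means two logs can be compared by terms alone: scanning back from the end, the first index where the terms agree marks the end of the shared prefix. The function name is hypothetical, and logs are reduced to lists of terms for the sketch.

```python
from typing import List


def last_matching_index(terms_a: List[int], terms_b: List[int]) -> int:
    """Return the highest 1-based index at which both logs hold the same term, or 0."""
    for index in range(min(len(terms_a), len(terms_b)), 0, -1):
        if terms_a[index - 1] == terms_b[index - 1]:
            # By the Log Matching Property, everything up to this index is identical.
            return index
    return 0
```

For the diagram's logs, S0 and S1 match through index 5, while S0 and S2 match only through index 3.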

Slide 68

Slide 68 text

Problems!

Slide 69

Slide 69 text

Messages
- Overhead
- Duplicate
- Out-of-order
- Lost
- Latency

Slide 70

Slide 70 text

Failures
- Leader failure
- Follower(s) failure
- Partition

Slide 71

Slide 71 text

Confidentiality
- Who can read/write?

Slide 72

Slide 72 text

Malicious Peers
- Malicious or misbehaving peers
- Abuse

Slide 73

Slide 73 text

Byzantine Generals Problem

Slide 74

Slide 74 text

Byzantine Failure

Slide 75

Slide 75 text

Byzantine Failure

Slide 76

Slide 76 text

Byzantine Failure

Slide 77

Slide 77 text

Byzantine Fault Tolerant Algorithms
- Byzantine Paxos algorithms
- Byzantine Raft algorithms

Slide 78

Slide 78 text

Byzantine Fault Tolerant Algorithms
- Byzantine Paxos algorithms
- Byzantine Raft algorithms

Slide 79

Slide 79 text

Take Away
- Implementing Consensus Algorithms: Don't
- Designing Consensus Algorithms:
  - Consistency, Correctness, etc.
  - Understandability
  - Reliability
  - Complexity
  - Resilient against issues? Worth it to be resilient?

Slide 80

Slide 80 text

THANK YOU
@yifan_xing_e