Consensus Algorithms in Distributed Systems

Yifan Xing
November 15, 2018

Paxos & Raft

Transcript

  1. Consensus Algorithms: How to Pick Cake Flavors
    Yifan Xing, 2018 (@yifan_xing_e)
  2. Poll
    Kafka: a distributed streaming platform
    Consensus algorithm, e.g. Raft, Paxos
  3. Fun Fact
    American Airlines central office, 1920s
    Travel agents; cards for each flight; mark seats sold on cards
  4. Challenge 1: Lack of Global Knowledge
    Up-to-date; exchange; inconsistency
  5. Challenge 2: Time
    Clock skew
    Delayed/duplicate messages -> order
  6. Challenge 3: Consistency
    Concurrent operations; consistent state; conflicts
  7. Challenge 4: Failures
    "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
    Tolerate failures: detect, handle, recover
  8. Consensus Algorithms
    Raft, Paxos
    Consensus: fault-tolerance, integrity, availability, reliability
  9. Paxos Background
    Leslie Lamport
    The Part-Time Parliament: rejected, published
    Paxos Made Simple: Basic Paxos
    Lynch & Liskov: proved
    Multi-Paxos: Paxos + complexity; no mathematical proof
  10. Basic Paxos
    Proposer: proposes a value
    Prepare: try to propose
    Acceptor: accepts/rejects a value
    Consensus: agree on one value
  11. Basic Paxos
    Proposer: choose a proposal number (n)
  12. Basic Paxos
    Proposer: choose a proposal number (n); broadcast the number to all servers: Prepare(n)
  13. Basic Paxos
    Proposer: choose a proposal number (n); broadcast the number to all servers: Prepare(n)
    Acceptor: if n > maxProposal: maxProposal = n; respond with a promise: won't accept a proposal with n' < n
  14. Basic Paxos
    Proposer: choose a proposal number (n); broadcast the number to all servers: Prepare(n)
    Acceptor: respond with a promise: won't accept a proposal with n' < n
    Proposer: if majority: Accept(n, value)
  15. Basic Paxos
    Proposer: broadcast Accept(n, value)
  16. Basic Paxos
    Proposer: broadcast Accept(n, value)
    Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value; respond
  17. Basic Paxos
    Proposer: broadcast Accept(n, value)
    Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value; respond
    Proposer: if majority: if any rejection => n is not the largest, repeat from the beginning; else: value chosen
  18. Basic Paxos
    Proposer: broadcast Accept(n, value)
    Acceptor: if n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value; respond
    Proposer: if majority: if any rejection => n is not the largest, repeat from the beginning; else: value chosen
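The Prepare/Accept exchange on the slides above can be sketched as a small in-process simulation. This is a minimal sketch, not the speaker's code: the names `Acceptor`, `on_prepare`, `on_accept`, and `propose` are hypothetical, messages are plain function calls with no network or failures, and a value counts as chosen once a majority accepts. The step where the proposer adopts the highest-numbered value already accepted is not spelled out on the slides; it comes from the standard Basic Paxos description.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical acceptor holding the three fields named on the slides."""
    max_proposal: int = -1
    accepted_proposal: Optional[int] = None
    accepted_value: Any = None

    def on_prepare(self, n: int) -> Tuple[bool, Optional[int], Any]:
        # Promise only if n is larger than any proposal number seen so far.
        if n > self.max_proposal:
            self.max_proposal = n
            return True, self.accepted_proposal, self.accepted_value
        return False, None, None

    def on_accept(self, n: int, value: Any) -> bool:
        # Accept unless a higher-numbered Prepare has been promised since.
        if n >= self.max_proposal:
            self.accepted_proposal = self.max_proposal = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Any:
    """Run one round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.on_prepare(n) for a in acceptors) if p[0]]
    if len(promises) < majority:
        return None  # n is not the largest: retry with a higher number
    # Assumption from standard Basic Paxos: reuse the highest-numbered
    # value that some acceptor has already accepted, if there is one.
    prior = [(num, val) for _, num, val in promises if num is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.on_accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three fresh acceptors, a first proposal succeeds with its own value, and a later proposal with a higher number ends up re-proposing the value that was already chosen, which is how the protocol keeps a single agreed value.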
  19. Paxos: Proposal Number
    Server id (e.g. S0): unique
    Round number: increments over time; shared, highest
    Generate a new proposal number: increment maxRound, concatenate with the server id
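One way to read "increment maxRound, concatenate with server id" is to pack the round number into the high bits of an integer and the server id into the low bits. The sketch below assumes that encoding; the 8-bit width and the names `SERVER_ID_BITS`, `next_proposal_number`, and `round_of` are illustrative choices, not something the slides specify.

```python
SERVER_ID_BITS = 8  # assumed width: supports up to 256 servers


def next_proposal_number(max_round: int, server_id: int) -> int:
    """Increment the highest round seen and append this server's unique id."""
    if not 0 <= server_id < (1 << SERVER_ID_BITS):
        raise ValueError("server_id does not fit in SERVER_ID_BITS")
    return ((max_round + 1) << SERVER_ID_BITS) | server_id


def round_of(proposal_number: int) -> int:
    """Recover the round part so a server can track the highest round seen."""
    return proposal_number >> SERVER_ID_BITS
```

Because the round occupies the high bits, a number from a later round always compares greater than any number from an earlier round, and two servers in the same round can never produce the same number.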
  20. Multi-Paxos
    A sequence of instances of Basic Paxos
    Log entries
    Implementation of Basic Paxos concepts
  21. Multi-Paxos
    Client cmd -> Basic Paxos: choose cmds (values) in log entries
  22. Multi-Paxos
    Apply cmd (log entries)
    Log: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd
  23. Paxos: Leader vs. Non-Leader
    Leader / Distinguished Proposer: server with the highest ID; sends a heartbeat every T ms; accepts requests from clients
    If a server didn't receive a heartbeat from a higher ID for >= 2T ms: act as leader, act as proposer
  24. Paxos: Leader vs. Non-Leader
    Leader / Proposer: server with the highest ID; sends a heartbeat every T ms; accepts requests from clients
    Non-leader: redirects client requests to the leader; acts as acceptor
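The heartbeat rule on these two slides amounts to a local check each server can run. The following is a rough sketch under the assumption that every server records the time it last heard a heartbeat from each peer; `should_act_as_leader` and its parameters are hypothetical names.

```python
from typing import Dict


def should_act_as_leader(
    my_id: int,
    last_heartbeat_ms: Dict[int, float],
    now_ms: float,
    t_ms: float,
) -> bool:
    """True if no higher-ID server has sent a heartbeat in the last 2T ms.

    last_heartbeat_ms maps a peer's server id to the time (in ms) at which
    its most recent heartbeat was received.
    """
    return not any(
        peer_id > my_id and now_ms - seen_ms < 2 * t_ms
        for peer_id, seen_ms in last_heartbeat_ms.items()
    )
```

A server that returns `True` here starts acting as proposer; one that returns `False` keeps redirecting client requests to the higher-ID leader.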
  25. Paxos: Leader
    Unlikely to have two leaders at the same time
    Can handle multiple leaders; however, it won’t work as efficiently because of conflicts
  26. Raft: Background
    Simpler version of Paxos
    Equivalent: performance & fault-tolerance
    Understandability: consistency, conciseness, correctness
    Why: implemented -> useful, extended/adapted to the environment
    Designed by Diego Ongaro and John Ousterhout at Stanford
  27. Raft: Phases
    Leader Election: 1. Select one machine to be a leader 2. Detect crashes, reelection
  28. Raft: Phases
    Leader Election: 1. Select one machine to be a leader 2. Detect crashes, reelection
    Log Replication: 1. Leader processes commands from clients 2. Replicates logs (consistency and consensus among servers)
  29. Raft: Leader Election
    Become Candidate: CurrentTerm++; vote for itself; send RequestVoteRPC to other servers
    Majority votes -> Become Leader: send heartbeats; handle requests
    RPC from leader -> Become Follower: redirect requests
    Timeout -> Become Candidate
  30. Raft: How to ensure election works?
    At most one leader per term:
    - Each server: one vote per term
    - Receive a majority to win the election (N / 2 + 1)
    - Example: servers S0 S1 S2 S3 S4; votes cast: S0 S1 S1 S0 S0
  31. Raft: How to ensure election works?
    At most one leader per term:
    - Each server: one vote per term
    - Receive a majority to win the election (N / 2 + 1)
    - Example: servers S0 (Leader) S1 S2 S3 S4; votes cast: S1 S0 S0 S0 S1
  32. Raft: How to ensure election works?
    There will eventually be a leader:
    - Random election timeout (range 100-300 ms)
    - Usually, one server times out first and wins the majority of votes
    - If two time out at the same time: split vote -> election timeout -> re-enter election state (increment term, gather votes)
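The two election guarantees above (one vote per server per term with a majority of N / 2 + 1, and randomized timeouts in the 100-300 ms range) can be sketched in a few lines. The function names and the voter-to-candidate mapping are assumptions made for illustration; in particular, the slides list the votes cast but not which server cast which vote.

```python
import random
from collections import Counter
from typing import Dict, Optional


def majority(cluster_size: int) -> int:
    """Votes needed to win: N / 2 + 1 with integer division."""
    return cluster_size // 2 + 1


def election_winner(votes: Dict[str, str], cluster_size: int) -> Optional[str]:
    """votes maps each voter to the single candidate it voted for this term."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count >= majority(cluster_size) else None


def election_timeout_ms(rng: random.Random, low: float = 100.0,
                        high: float = 300.0) -> float:
    """Randomized timeout so that one server usually times out first."""
    return rng.uniform(low, high)


# Assumed assignment of the slide's votes (S0 S1 S1 S0 S0) to voters S0..S4.
example_votes = {"S0": "S0", "S1": "S1", "S2": "S1", "S3": "S0", "S4": "S0"}
leader = election_winner(example_votes, cluster_size=5)
```

Since each voter appears once in the mapping and a majority is more than half the cluster, two candidates can never both reach the threshold in the same term; a split vote simply yields `None` and a new election after the next timeout.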
  33. Raft: Leader Appends Entry to Log
    Diagram: Client -> cmd -> leader S0 log
  34. Raft: Leader Sends AppendEntryRPC
    Diagram: Client -> cmd -> leader S0 log; S0 sends AppendEntryRPC
  35. Raft: Followers Send ACKs
    Diagram: cmd in the leader S0 log and in the follower logs; followers send ACK
  36. Raft: Replies Client
    If majority:
    - Entry committed
    - Execute
    - Return result
  37. Raft: Notifies Followers of Committed Entry
    Diagram: Client; cmd in the leader S0 log and in the follower logs
  38. Raft: Not Majority?
    If not majority:
    - Leader retries until it succeeds
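The replication flow on slides 33 to 38 (send the entry, count ACKs, commit on a majority, otherwise retry) can be sketched as a loop on the leader. This is a simplified, single-threaded illustration: `replicate` and `max_rounds` are hypothetical, each follower is modeled as a callable that returns `True` for an ACK, and the retry count is bounded only so the sketch terminates, whereas the slide says the leader retries until it succeeds.

```python
from typing import Any, Callable, List


def replicate(entry: Any, followers: List[Callable[[Any], bool]],
              max_rounds: int = 10) -> bool:
    """Return True once the entry is stored on a majority of the cluster."""
    cluster_size = len(followers) + 1      # followers plus the leader itself
    needed = cluster_size // 2 + 1
    acked = set()                          # indices of followers that ACKed
    for _ in range(max_rounds):
        for i, send in enumerate(followers):
            if i not in acked and send(entry):
                acked.add(i)
        if len(acked) + 1 >= needed:       # +1: the leader already has it
            return True                    # committed: execute, reply to client
    return False
```

In a three-server cluster, one ACK is enough to commit because the leader's own copy counts toward the majority.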
  39. Raft: Log Entry
    Index
    Term: current term when the cmd is received
    Command: cmd to execute
    Example entry on S0: 1 cmd
  40. Raft: Consistency
    Server crashes => log inconsistency
    Goal: log consistency. But how?
    S0: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd | 4 cmd | 4 cmd | 4 cmd
    S1: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    S2: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd | 4 cmd | 4 cmd | 4 cmd
  41. Log Consistency
    Always trust the leader’s log; do not elect a candidate with inconsistent logs
    No “holes” in the log
    Repair inconsistency during the log replication process
  42. Leader's Log Completeness
    How to guarantee completeness of leader’s log?
    RequestVoteRPC:
    term - candidate’s term
    candidateId - candidate requesting vote
    lastLogIndex - index of candidate’s last log entry
    lastLogTerm - term of candidate’s last log entry
  43. (In)complete Log Example - 1
    How to guarantee completeness of leader’s log?
    Candidate RequestVoteRPC: term: 2; candidateId: 00001; lastLogIndex: 3; lastLogTerm: 1
    Current state machine info: term: 3; candidateId: 00002; lastLogIndex: 3; lastLogTerm: 1
  44. (In)complete Log Example - 1
    How to guarantee completeness of leader’s log?
    Candidate RequestVoteRPC: term: 2; candidateId: 00001; lastLogIndex: 3; lastLogTerm: 1
    Current state machine info: term: 3; candidateId: 00002; lastLogIndex: 3; lastLogTerm: 1
    REJECT VOTE: Candidate.logTerm < my.logTerm => my log is more complete
  45. (In)complete Log Example - 2
    How to guarantee completeness of leader’s log?
    Candidate RequestVoteRPC: term: 3; candidateId: 00001; lastLogIndex: 3; lastLogTerm: 2
    Current state machine info: term: 3; candidateId: 00002; lastLogIndex: 4; lastLogTerm: 2
  46. (In)complete Log Example - 2
    How to guarantee completeness of leader’s log?
    Candidate RequestVoteRPC: term: 3; candidateId: 00001; lastLogIndex: 3; lastLogTerm: 2
    Current state machine info: term: 3; candidateId: 00002; lastLogIndex: 4; lastLogTerm: 2
    REJECT VOTE: Candidate.lastLogIndex < my.lastLogIndex => my log is longer
  47. (In)complete Log Example - 3
    How to guarantee completeness of leader’s log?
    Candidate RequestVoteRPC: term: 3; candidateId: 00001; lastLogIndex: 5; lastLogTerm: 2
    Current state machine info: term: 3; candidateId: 00002; lastLogIndex: 4; lastLogTerm: 2
  48. (In)complete Log Example - 3
    How to guarantee completeness of leader’s log?
    Candidate RequestVoteRPC: term: 3; candidateId: 00001; lastLogIndex: 5; lastLogTerm: 2
    Current state machine info: term: 3; candidateId: 00002; lastLogIndex: 4; lastLogTerm: 2
    ACCEPT VOTE REQUEST: candidate.term >= my.term && candidate.lastLogIdx > my.lastLogIdx => complete
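The three examples above can be checked against a small voting function. This sketch follows the rule as stated in the Raft paper rather than the exact wording of the slides: reject a stale term, then compare the term of the last log entry, and only if those are equal compare the last log index. The class and function names are hypothetical, and bookkeeping such as remembering which candidate a server already voted for in the term is left out.

```python
from dataclasses import dataclass


@dataclass
class VoteRequest:
    """Fields of RequestVoteRPC as listed on the slides."""
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass
class VoterState:
    """The receiving server's own term and last log entry."""
    current_term: int
    last_log_index: int
    last_log_term: int


def grant_vote(req: VoteRequest, me: VoterState) -> bool:
    # A candidate from an older term is always rejected.
    if req.term < me.current_term:
        return False
    # The log whose last entry has the later term is more up to date.
    if req.last_log_term != me.last_log_term:
        return req.last_log_term > me.last_log_term
    # Same last term: the longer (or equally long) log wins.
    return req.last_log_index >= me.last_log_index


example_1 = grant_vote(VoteRequest(2, "00001", 3, 1), VoterState(3, 3, 1))
example_2 = grant_vote(VoteRequest(3, "00001", 3, 2), VoterState(3, 4, 2))
example_3 = grant_vote(VoteRequest(3, "00001", 5, 2), VoterState(3, 4, 2))
```

Under this rule the first two requests are rejected and the third is granted, matching the outcomes shown on the slides.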
  49. Log Replication: Repair Inconsistency
    Check for consistency when sending AppendEntriesRPC
    AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
    S0 (index 1-5): 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
  50. Log Replication: Repair Inconsistency
    Goal: append a new entry to log
    S0 (index 1-5): 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 2 cmd | 2 cmd
    AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
    Receiver checks its own preceding index and term
  51. Log Replication: Repair Inconsistency
    Goal: append a new entry to log
    S0 (index 1-5): 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    If match => append entry; else => reject request
  52. Log Replication: Repair Inconsistency
    Goal: append a new entry to log
    S0 (index 1-5): 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 1 cmd | 1 cmd
    AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
  53. Log Replication: Repair Inconsistency
    Goal: append a new entry to log
    S0 (index 1-5): 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 1 cmd | 1 cmd
    Do not match: REJECT
  54. Log Replication: Repair Inconsistency
    Goal: append a new entry to log
    S0 (index 1-5): 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 1 cmd | 1 cmd
    AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
  55. Log Replication: Repair Inconsistency
    Goal: append a new entry to log
    S0 (index 1-5): 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    Match: replicate log entry
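The repair sequence on slides 49 to 55 can be sketched as a follower-side check plus a leader-side loop that backs up one entry after each rejection. The sketch makes several simplifying assumptions: logs are Python lists of `(term, cmd)` tuples with 1-based indices as on the slides, RPCs are direct function calls, and the follower drops everything after the matching preceding entry (a real implementation would truncate only on an actual conflict). The names `append_entry` and `replicate_to_follower` are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, cmd)


def append_entry(log: List[Entry], prev_index: int, prev_term: int,
                 entry: Entry) -> bool:
    """Follower side: accept only if the preceding entry matches."""
    if prev_index > len(log):
        return False                       # follower log is too short
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False                       # terms differ at prev_index
    del log[prev_index:]                   # simplification: drop the suffix
    log.append(entry)
    return True


def replicate_to_follower(leader_log: List[Entry],
                          follower_log: List[Entry]) -> None:
    """Leader side: back up until the follower accepts, then catch it up."""
    next_index = len(leader_log)           # start with the newest entry
    while next_index >= 1:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        if append_entry(follower_log, prev_index, prev_term,
                        leader_log[next_index - 1]):
            break
        next_index -= 1                    # rejected: try one entry earlier
    for i in range(next_index + 1, len(leader_log) + 1):
        append_entry(follower_log, i - 1, leader_log[i - 2][0],
                     leader_log[i - 1])


# Logs from slide 52: S0 has terms 1 1 1 2 2, S1 has terms 1 1 1 1.
s0 = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e")]
s1 = [(1, "a"), (1, "b"), (1, "c"), (1, "x")]
replicate_to_follower(s0, s1)
```

In this run the first attempt (preceding entry at index 4, term 2) is rejected because S1 holds a term-1 entry there; the second attempt (preceding entry at index 3, term 1) matches, so S1's conflicting entry is replaced and its log ends up identical to S0's.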
  56. Log Matching Property
    For each log entry: compare index and term
    If both index and term match:
    1. the two entries store the same cmd
    2. all previous entries are identical
    S0 (index 1-5): 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    S1: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    S2: 1 cmd | 1 cmd | 2 cmd | 3 xxx | 3 xxx
  57. Byzantine Fault Tolerant Algorithms
    Byzantine Paxos algorithms
    Byzantine Raft algorithms
  58. Byzantine Fault Tolerant Algorithms
    Byzantine Paxos algorithms
    Byzantine Raft algorithms
  59. Take Away
    Designing Consensus Algorithms: consistency, correctness, etc.; understandability
    Implementing Consensus Algorithms: Don't
    Resilient against issues? Worth it to be resilient?
    Reliability; complexity