Yifan Xing
September 19, 2018

Consensus Algorithms in Distributed Systems LambdaWorld2018

Paxos and Raft


Transcript

  1. How to Pick Cake Flavors: Consensus Algorithms

    Yifan Xing (@yifan_xing_e), 2018
  2. Poll

    - Kafka: a distributed streaming platform
    - Consensus algorithm, e.g. Raft, Paxos
  3. Fun Fact

    - American Airline Central Office, 1920s
    - Travel agents
    - Cards for each flight
    - Mark seats sold on cards
  4. Challenge 1: Lack of Global Knowledge

    - Up-to-date
    - Exchange
    - Inconsistency
  5. Challenge 2: Time

    - Clock skew
    - Delayed/duplicate messages -> order
  6. Challenge 3: Consistency

    - Concurrent operations
    - Consistent state
    - Conflicts
  7. Challenge 4: Failures

    "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
    - Tolerate failures: detect, handle, recover
  8. Consensus Algorithms

    - Raft, Paxos
    - Consensus
    - Fault-tolerance
    - Integrity
    - Availability
    - Reliability
  9. Paxos Background

    - Paxos Made Simple: Basic Paxos
    - Lynch & Liskov
    - Leslie Lamport
    - Proved
    - The Part-Time Parliament
    - Multi-Paxos: Paxos + Complexity
    - Rejected
    - Published
    - No mathematical proof
  10. Basic Paxos

    - Proposer: propose a value
    - Prepare: try to propose
    - Acceptor: accept/reject value
    - Consensus: agree on one value
  11. Basic Paxos

    - Proposer: choose a proposal number (n)
  12. Basic Paxos

    - Proposer: choose a proposal number (n); broadcast the number to all servers: Prepare(n)
  13. Basic Paxos

    - Proposer: choose a proposal number (n); broadcast the number to all servers: Prepare(n)
    - Acceptor: respond. If n > maxProposal: maxProposal = n, promise
    - Promise: won't accept proposal with n' < n
  14. Basic Paxos

    - Proposer: choose a proposal number (n); broadcast the number to all servers: Prepare(n)
    - Acceptor: respond with promise: won't accept proposal with n' < n
    - Proposer: if majority: Accept(n, )
  15. Basic Paxos

    - Proposer: broadcast Accept(n, )
  16. Basic Paxos

    - Proposer: broadcast Accept(n, )
    - Acceptor: respond. If n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value
  17. Basic Paxos

    - Proposer: broadcast Accept(n, )
    - Acceptor: respond. If n >= maxProposal: acceptedProposal = maxProposal = n; acceptedValue = value
    - Proposer: if majority: if any rejection => n is not largest, repeat from beginning; else: value chosen
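The Prepare/Accept exchange above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the speaker's implementation: the `Acceptor` class and `propose` function are hypothetical names, there is no networking or failure handling, and the step where the proposer adopts a previously accepted value comes from the standard Basic Paxos description rather than from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Acceptor state, using the variable names from the slides."""
    max_proposal: int = 0                    # highest proposal number seen
    accepted_proposal: Optional[int] = None  # number of the accepted proposal
    accepted_value: Optional[str] = None     # value of the accepted proposal

    def prepare(self, n: int) -> Tuple[bool, Optional[int], Optional[str]]:
        """Phase 1: promise not to accept proposals numbered below n."""
        if n > self.max_proposal:
            self.max_proposal = n
            return True, self.accepted_proposal, self.accepted_value
        return False, self.accepted_proposal, self.accepted_value

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered Prepare was promised."""
        if n >= self.max_proposal:
            self.accepted_proposal = self.max_proposal = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1
    promises = [r for r in (a.prepare(n) for a in acceptors) if r[0]]
    if len(promises) < majority:
        return None  # n is not the largest: caller retries with a higher n
    # Assumption (standard Paxos, not on the slides): if any acceptor already
    # accepted a value, propose the one with the highest proposal number.
    prior = [(p, v) for _, p, v in promises if p is not None]
    if prior:
        value = max(prior)[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three fresh acceptors, `propose(acceptors, 1, "vanilla")` chooses `"vanilla"`; a later `propose(acceptors, 2, "chocolate")` also returns `"vanilla"`, because a chosen value can no longer change.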
  19. Paxos: Proposal Number

    - server id: unique
    - round number: increments over time; shared, highest
    - Generate new proposal number: increment maxRound, concatenate with server id
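One way to read "increment maxRound, concatenate with server id" is to pack both into a single integer, with the round number in the high bits so that a higher round always wins and the server id breaks ties. The sketch below assumes an 8-bit server id field; `SERVER_ID_BITS` and both function names are illustrative choices, not taken from the talk.

```python
from typing import Tuple

SERVER_ID_BITS = 8  # assumed width of the server id field


def next_proposal_number(max_round: int, server_id: int) -> int:
    """Increment the highest round seen and append the server id in the low bits."""
    if not 0 <= server_id < (1 << SERVER_ID_BITS):
        raise ValueError("server_id does not fit in SERVER_ID_BITS")
    return ((max_round + 1) << SERVER_ID_BITS) | server_id


def split_proposal_number(n: int) -> Tuple[int, int]:
    """Recover (round, server_id) from a proposal number."""
    return n >> SERVER_ID_BITS, n & ((1 << SERVER_ID_BITS) - 1)
```

Because the server id is unique, two servers can never generate the same proposal number, even if they pick the same round.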
  20. Multi-Paxos

    - A sequence of instances of Basic Paxos
    - Log entries
    - Implementation of Basic Paxos concepts
  21. Multi-Paxos

    - Client cmd
    - Basic Paxos
    - Choose cmds (values) in log entries
  22. Multi-Paxos

    - Apply cmd (log entries)
    - [Diagram: log entries 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd]
  23. Paxos: Leader vs. Non-Leader

    - Leader / Distinguished Proposer: server with highest ID; sends a heartbeat every T ms; accepts requests from clients; acts as proposer
    - If a server didn't receive a heartbeat from a higher ID for >= 2T ms: act as leader
  24. Paxos: Leader vs. Non-Leader

    - Leader / Proposer: server with highest ID; sends a heartbeat every T ms; accepts requests from clients
    - Non-leader: redirects client requests to the leader; acts as acceptor
  25. Paxos: Leader

    - Unlikely to have two leaders at the same time
    - Can handle multiple leaders; however, it won't work as efficiently because of conflicts
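The heartbeat rule above (act as leader when no higher-ID server has been heard from for at least 2T ms) fits in a single predicate. This is a hedged sketch: `acts_as_leader` and its parameters are hypothetical, and timestamps are plain millisecond numbers supplied by the caller rather than read from a real clock.

```python
from typing import Dict


def acts_as_leader(my_id: int,
                   last_heartbeat_ms: Dict[int, float],
                   now_ms: float,
                   t_ms: float) -> bool:
    """True if no higher-ID server sent a heartbeat within the last 2T ms."""
    return not any(
        server_id > my_id and now_ms - seen_ms < 2 * t_ms
        for server_id, seen_ms in last_heartbeat_ms.items()
    )
```

Heartbeats from lower-ID servers are ignored, so the live server with the highest ID ends up acting as leader.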
  26. Raft: Background

    - Simpler version of Paxos
    - Equivalent: performance & fault-tolerance
    - Consistency, Conciseness, Correctness
    - Understandability
    - Why: implemented -> useful, extended/adapted to the environment
    - Designed by Diego Ongaro and John Ousterhout at Stanford
  27. Raft: Phases

    - Leader Election: 1. Select one machine to be a leader 2. Detect crashes, reelection
  28. Raft: Phases

    - Leader Election: 1. Select one machine to be a leader 2. Detect crashes, reelection
    - Log Replication: 1. Leader processes commands from clients 2. Replicates logs (consistency and consensus among servers)
  29. Raft: Leader Election

    - Timeout -> Become Candidate: CurrentTerm++; vote for itself; send RequestVoteRPC to other servers
    - Majority votes -> Become Leader: send heartbeats; handle requests
    - RPC from leader -> Become Follower: redirect requests
  30. Raft: How to ensure election works?

    At most one leader per term:
    - Each server: one vote per term
    - Receive majority to win election (N / 2 + 1)
    - Example: [Diagram: servers S0 to S4 voting for S0 or S1]
  31. Raft: How to ensure election works?

    At most one leader per term:
    - Each server: one vote per term
    - Receive majority to win election (N / 2 + 1)
    - Example: [Diagram: servers S0 to S4; votes cast: S1, S0, S0, S0, S1; S0 becomes leader]
  32. Raft: How to ensure election works?

    There will eventually be a leader:
    - Random election timeout (range 100-300 ms)
    - Usually, one server times out first and wins the majority of votes
    - If two time out at the same time: split vote -> election timeout -> re-enter election state (increment term, gather votes)
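The two election guarantees above can be illustrated with a short sketch: one vote per server per term plus a majority threshold gives at most one leader per term, and randomized timeouts make repeated split votes unlikely. The names `majority`, `election_timeout_ms` and `Voter` are hypothetical, and the log up-to-date check that Raft also applies before granting a vote is left out here.

```python
import random
from typing import Dict


def majority(cluster_size: int) -> int:
    """Votes needed to win: N / 2 + 1 with integer division."""
    return cluster_size // 2 + 1


def election_timeout_ms(rng: random.Random,
                        low: float = 100.0,
                        high: float = 300.0) -> float:
    """Pick a random election timeout in the 100-300 ms range from the slides."""
    return rng.uniform(low, high)


class Voter:
    """Grants at most one vote per term."""

    def __init__(self) -> None:
        self.voted_for: Dict[int, str] = {}  # term -> candidate id

    def request_vote(self, term: int, candidate: str) -> bool:
        # setdefault records the first candidate seen in this term and
        # returns that same candidate on every later call for the term.
        return self.voted_for.setdefault(term, candidate) == candidate
```

In the five-server example, the votes S1, S0, S0, S0, S1 give S0 three votes, which meets `majority(5) == 3`, while S1 with two votes cannot also win the same term.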
  33. Raft: Leader Appends Entry to Log

    - [Diagram: client sends cmd; leader S0 appends cmd to its log]
  34. Raft: Leader Sends AppendEntryRPC

    - [Diagram: leader S0 sends AppendEntryRPC with cmd]
  35. Raft: Followers Send ACKs

    - [Diagram: followers append cmd to their logs and send ACK to S0]
  36. Raft: Replies to Client

    If majority:
    - Entry committed
    - Execute
    - Return result
  37. Raft: Notifies Followers of Committed Entry

    - [Diagram: leader S0 and followers all hold cmd in their logs]
  38. Raft: Not Majority?

    If not majority:
    - Leader retries until it succeeds
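The commit rule in the slides above (an entry is committed once a majority of servers hold it, and the leader retries otherwise) can be sketched as follows. This is an illustration under simplifying assumptions: each follower is modeled as a hypothetical callable that returns `True` for an ACK, and the retry loop is bounded by `max_rounds` so the example terminates, whereas the slides describe retrying until success.

```python
from typing import Callable, List, Set


def replicate(entry: str,
              followers: List[Callable[[str], bool]],
              max_rounds: int = 3) -> bool:
    """Return True once the entry is stored on a majority of the cluster."""
    cluster_size = len(followers) + 1  # followers plus the leader
    needed = cluster_size // 2 + 1
    acked: Set[int] = set()
    for _ in range(max_rounds):
        for i, send in enumerate(followers):
            # Only retry followers that have not acknowledged yet.
            if i not in acked and send(entry):
                acked.add(i)
        if len(acked) + 1 >= needed:   # +1: the leader already has the entry
            return True                # committed: safe to execute and reply
    return False
```

With four followers, two ACKs are enough, since the leader's own copy makes three of five.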
  39. Raft: Log Entry

    - Index
    - Term: current term when the cmd was received
    - Command: cmd to execute
  40. Raft: Consistency

    Server crashes => log inconsistency. Goal: log consistency. But how?
    - S0: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd | 4 cmd | 4 cmd | 4 cmd
    - S1: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - S2: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd | 3 cmd | 3 cmd | 4 cmd | 4 cmd | 4 cmd | 4 cmd
  41. Log Consistency

    - Always trust the leader's log; do not elect a candidate with inconsistent logs
    - No "holes" in log
    - Repair inconsistency during the log replication process
  42. Leader's Log Completeness

    How to guarantee completeness of the leader's log? RequestVoteRPC:
    - term: candidate's term
    - candidateId: candidate requesting vote
    - lastLogIndex: index of candidate's last log entry
    - lastLogTerm: term of candidate's last log entry
  43. (In)complete Log Example - 1

    How to guarantee completeness of the leader's log?
    - Candidate RequestVoteRPC: term 2, candidateId 00001, lastLogIndex 3, lastLogTerm 1
    - Current state machine info: term 3, candidateId 00002, lastLogIndex 3, lastLogTerm 1
  44. (In)complete Log Example - 1

    How to guarantee completeness of the leader's log?
    - Candidate RequestVoteRPC: term 2, candidateId 00001, lastLogIndex 3, lastLogTerm 1
    - Current state machine info: term 3, candidateId 00002, lastLogIndex 3, lastLogTerm 1
    - REJECT VOTE: Candidate.logTerm < my.logTerm => my log is more complete
  45. (In)complete Log Example - 2

    How to guarantee completeness of the leader's log?
    - Candidate RequestVoteRPC: term 3, candidateId 00001, lastLogIndex 3, lastLogTerm 2
    - Current state machine info: term 3, candidateId 00002, lastLogIndex 4, lastLogTerm 2
  46. (In)complete Log Example - 2

    How to guarantee completeness of the leader's log?
    - Candidate RequestVoteRPC: term 3, candidateId 00001, lastLogIndex 3, lastLogTerm 2
    - Current state machine info: term 3, candidateId 00002, lastLogIndex 4, lastLogTerm 2
    - REJECT VOTE: Candidate.lastLogIndex < my.lastLogIndex => my log is longer
  47. (In)complete Log Example - 3

    How to guarantee completeness of the leader's log?
    - Candidate RequestVoteRPC: term 3, candidateId 00001, lastLogIndex 5, lastLogTerm 2
    - Current state machine info: term 3, candidateId 00002, lastLogIndex 4, lastLogTerm 2
  48. (In)complete Log Example - 3

    How to guarantee completeness of the leader's log?
    - Candidate RequestVoteRPC: term 3, candidateId 00001, lastLogIndex 5, lastLogTerm 2
    - Current state machine info: term 3, candidateId 00002, lastLogIndex 4, lastLogTerm 2
    - ACCEPT VOTE REQUEST: candidate.term >= my.term && candidate.lastLogIdx > my.lastLogIdx => complete
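The three examples above can be checked against a small vote-granting function. This sketch follows the comparison described in the Raft paper (reject a stale term, then compare last log term, then last log index), which is slightly more general than the conditions printed on the slides. The class and function names are hypothetical, and the one-vote-per-term bookkeeping is omitted to keep the focus on log completeness.

```python
from dataclasses import dataclass


@dataclass
class VoteRequest:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass
class ServerState:
    term: int
    last_log_index: int
    last_log_term: int


def grant_vote(req: VoteRequest, me: ServerState) -> bool:
    """Grant the vote only if the candidate's log is at least as complete as mine."""
    if req.term < me.term:
        return False  # candidate is behind on terms
    if req.last_log_term != me.last_log_term:
        return req.last_log_term > me.last_log_term  # later last term wins
    return req.last_log_index >= me.last_log_index   # same term: longer log wins
```

Under this rule, Example 1 is rejected because the candidate's term 2 is behind the voter's term 3, Example 2 is rejected because the voter's log is longer, and Example 3 is accepted.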
  49. Log Replication: Repair Inconsistency

    Check for consistency when sending AppendEntriesRPC
    - AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
  50. Log Replication: Repair Inconsistency

    Goal: append a new entry to log
    - AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - Receiver checks its own preceding index and term
  51. Log Replication: Repair Inconsistency

    Goal: append a new entry to log
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    - If match => append entry; else => reject request
  52. Log Replication: Repair Inconsistency

    Goal: append a new entry to log
    - AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 1 cmd | 1 cmd
  53. Log Replication: Repair Inconsistency

    Goal: append a new entry to log
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 1 cmd | 1 cmd
    - Do not match: REJECT
  54. Log Replication: Repair Inconsistency

    Goal: append a new entry to log
    - AppendEntryRPC: CurrentEntry: index, term, cmd; PrecedingEntry: index, term
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 1 cmd | 1 cmd
  55. Log Replication: Repair Inconsistency

    Goal: append a new entry to log
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 1 cmd | 2 cmd | 2 cmd
    - Match: Replicate Log Entry
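The repair walk-through above can be sketched as a follower-side check plus a leader-side retry loop. This is a simplified illustration with hypothetical names: logs are Python lists of `(term, cmd)` tuples with 1-based indices as on the slides, and the follower always truncates after the preceding entry, whereas a full Raft implementation only deletes entries that actually conflict.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, cmd)


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Follower side: accept only if the preceding entry matches (index and term)."""
    if prev_index > len(log):
        return False  # follower has no entry at prev_index
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # terms differ at prev_index
    del log[prev_index:]   # simplification: drop everything after the match
    log.extend(entries)
    return True


def repair(leader_log: List[Entry], follower_log: List[Entry]) -> int:
    """Leader side: back up one index per rejection; return the rejection count."""
    next_index = len(leader_log) + 1
    rejections = 0
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1][0] if prev_index > 0 else 0
        if append_entries(follower_log, prev_index, prev_term,
                          leader_log[prev_index:]):
            return rejections
        rejections += 1
        next_index -= 1  # prev_index 0 always matches, so the loop terminates
```

For the logs on slides 52 to 55 (leader terms 1, 1, 1, 2, 2 and follower terms 1, 1, 1, 1), the first two attempts are rejected and the third matches at index 3, after which the follower's log equals the leader's.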
  56. Log Matching Property

    For each log entry: compare index and term. If both index and term match:
    1. the two entries store the same cmd
    2. all previous entries are identical
    - Index: 1 2 3 4 5
    - S0: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    - S1: 1 cmd | 1 cmd | 2 cmd | 2 cmd | 2 cmd
    - S2: 1 cmd | 1 cmd | 2 cmd | 3 xxx | 3 xxx
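A practical consequence of the property above is that two logs can be compared by terms alone: once the highest index with equal terms is found, everything before it is guaranteed identical. The helper below is a hypothetical illustration of that idea, with logs as lists of `(term, cmd)` tuples and 1-based indices; it relies on the property holding and does not verify it.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, cmd)


def last_matching_index(a: List[Entry], b: List[Entry]) -> int:
    """Return the highest 1-based index where both logs hold the same term, or 0."""
    for index in range(min(len(a), len(b)), 0, -1):
        if a[index - 1][0] == b[index - 1][0]:
            return index
    return 0
```

For the S0 and S2 logs on the slide, the terms agree up to index 3 and diverge at index 4, so entries 1 to 3 are the shared prefix.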
  57. Byzantine Fault Tolerant Algorithms

    - Byzantine Paxos algorithms
    - Byzantine Raft algorithms
  59. Take Away

    - Designing Consensus Algorithms:
    - Implementing Consensus Algorithms:
    - Consistency; Correctness, etc.; Understandability
    - Reliability; Complexity
    - Resilient against issues? Worth it to be resilient?
    - Don't