Consensus in Distributed Systems

Daniel Upton
September 25, 2018

We rely ever more heavily on distributed systems in our daily lives, from spending money on a debit card to posting a tweet to our followers (tweeple?). We’ll dive into the challenges of building such systems, as identified by the CAP theorem, and take a look at a solution offered by “Raft”, the consensus algorithm at the core of projects such as Consul, etcd and CockroachDB.

Image Credits:

Photo of L Peter Deutsch - Parma Recordings (source: https://parmarecordings-news.com/the-inside-story-coro-del-mundo-moto-bello-and-l-peter-deutsch/)

Photo of Eric Brewer - CC BY-SA 4.0 (source: https://en.wikipedia.org/wiki/Eric_Brewer_(scientist)#/media/File:TNW_Con_EU15_-_Eric_Brewer_(scientist)-2.jpg)

Raft Logo - CC 3.0 (source: https://raft.github.io/)

Rest of the Owl Meme - (source: https://www.reddit.com/r/funny/comments/eccj2/how_to_draw_an_owl/)

Transcript

  1. Consensus in Distributed Systems

  2. Brian

  3. Brian (an ideas guy)

  4. Brian (an ideas guy) Twitter for Alsatians?

  5. Brian (an ideas guy) Uber for unicycles?

  6. Brian (an ideas guy)

  7. MySQL Database Ruby on Rails API

  8. MySQL Database Ruby on Rails API

  9. Leader (master) Follower (slave) Replication

  10. Leader (master) Follower (slave) Replication

  11. Leader (master) Failover

  12. Leader (master) Follower (slave) Network

  13. Leader (master) Follower (slave) Network (actual physical cables and stuff)

  14. Leader (master) Follower (slave) Network (actual physical cables and stuff)

  15. Fallacies of distributed computing #1: The network is reliable. - L Peter Deutsch (et al)

  16. Leader (master) Follower (slave) Network Order #17623

  17. Leader (master) Follower (slave) Network Order #17623

  18. Leader (master) Follower (slave) Network Order #17623 Details of order #17623 please Huh?

  19. Options You’ve got two of ‘em

  20. Leader (master) Follower (slave) Network Order #17623 Details of order #17623 please Huh? Option A

  21. Leader (master) Follower (slave) Network Order #17623 No Option C (!)

  22. CAP theorem (paraphrased) Eric Brewer: When operating in a catastrophically broken or unreliable network, a distributed system must choose to either risk returning stale/outdated data or refuse to accept writes/updates.

  23. CAP theorem (paraphrased) Eric Brewer: When operating in a catastrophically broken or unreliable network (Partition Tolerance), a distributed system must choose to either risk returning stale/outdated data (Availability) or refuse to accept writes/updates (Consistency).
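
A rough Go sketch (not from the talk) of the choice slide 23 describes: when a node suspects it has been partitioned away from the leader, it must either refuse the request (Consistency) or answer from its possibly stale replica (Availability). The node, the timeout threshold and the data below are invented purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Follower is a node holding a local, possibly stale, replica.
type Follower struct {
	data              map[string]string
	lastLeaderPing    time.Time
	preferConsistency bool // the CAP knob: C or A when partitioned
}

var ErrUnavailable = errors.New("possibly partitioned from leader; refusing to serve a stale read")

// Read must pick a side once the node suspects a partition: refuse the
// request (Consistency) or answer from the stale replica (Availability).
func (f *Follower) Read(key string) (string, error) {
	suspectPartition := time.Since(f.lastLeaderPing) > 500*time.Millisecond // hypothetical threshold
	if suspectPartition && f.preferConsistency {
		return "", ErrUnavailable
	}
	return f.data[key], nil
}

func main() {
	f := &Follower{
		data:              map[string]string{"order:17623": "pending"},
		lastLeaderPing:    time.Now().Add(-2 * time.Second), // leader hasn't been heard from
		preferConsistency: true,
	}
	if _, err := f.Read("order:17623"); err != nil {
		fmt.Println(err) // chose Consistency, gave up Availability
	}
}
```
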
  24. Trade-offs

  25. Raft Consensus Algorithm

  26. Strongly Consistent but also Highly Available

  27. Quorum (and you need an odd number of nodes)
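
To make the quorum slide concrete: a quorum is a simple majority, n/2 + 1 nodes (integer division), which is why an even-sized cluster buys no extra fault tolerance. A tiny illustrative Go snippet, not from the talk:

```go
package main

import "fmt"

// quorum is the smallest majority of an n-node cluster.
func quorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("%d nodes: quorum %d, tolerates %d failure(s)\n", n, quorum(n), n-quorum(n))
	}
	// 3 nodes: quorum 2, tolerates 1 failure(s)
	// 4 nodes: quorum 3, tolerates 1 failure(s)  <- the extra even node gains nothing
	// 5 nodes: quorum 3, tolerates 2 failure(s)
}
```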

  28. None
  29. None
  30. Distributed Log

  31. best_programming_language = Ruby current_year = 2008 linux_on_desktop = Maybe State Machine Distributed Log

  32. best_programming_language = Ruby current_year = 2018 linux_on_desktop = Maybe State Machine current_year = 2018 SET

  33. best_programming_language = Go current_year = 2018 linux_on_desktop = Maybe State Machine best_programming_language = Go SET current_year = 2018 SET

  34. best_programming_language = Go current_year = 2018 State Machine best_programming_language = Go SET current_year = 2018 SET linux_on_desktop DELETE
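
Slides 31-34 show a log of SET/DELETE commands being applied, in order, to a key-value state machine. Here is a minimal Go sketch of that idea; the type and function names are illustrative rather than taken from any particular Raft library.

```go
package main

import "fmt"

// Op is the kind of command recorded in the log.
type Op string

const (
	Set    Op = "SET"
	Delete Op = "DELETE"
)

// Entry is one record in the replicated log.
type Entry struct {
	Op    Op
	Key   string
	Value string
}

// StateMachine is the key-value store the log is applied to.
type StateMachine map[string]string

// Apply executes a single log entry against the state machine.
func (sm StateMachine) Apply(e Entry) {
	switch e.Op {
	case Set:
		sm[e.Key] = e.Value
	case Delete:
		delete(sm, e.Key)
	}
}

func main() {
	// Starting state from slide 31.
	sm := StateMachine{
		"best_programming_language": "Ruby",
		"current_year":              "2008",
		"linux_on_desktop":          "Maybe",
	}
	// The entries appended on slides 32-34.
	log := []Entry{
		{Set, "current_year", "2018"},
		{Set, "best_programming_language", "Go"},
		{Delete, "linux_on_desktop", ""},
	}
	// Every node that applies the same log in the same order ends up in the
	// same state: that is what the cluster needs consensus on.
	for _, e := range log {
		sm.Apply(e)
	}
	fmt.Println(sm) // map[best_programming_language:Go current_year:2018]
}
```
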
  35. None
  36. Getting a majority of servers in a cluster to agree on what’s in the log

  37. I like my leadership the same way I like my ☕ Strong. — Raft

  38. None
  39. Leader Election

  40. ⏰ Random Timers

  41. Monotonically Increasing Terms

  42. Every node starts off as a Follower. If a follower doesn’t hear from a leader for a while (random timer) it becomes a Candidate. If the candidate receives votes from a majority of nodes it will become the Leader.

  43. In the case of a split vote, nodes will simply wait for another election.
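
A rough Go sketch of the election rules from slides 40-43: randomised timeouts, monotonically increasing terms, and the majority-vote rule. The timing values and the vote-gathering callback are assumptions made for illustration; real implementations carry far more state.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

type Node struct {
	role        Role
	currentTerm int // terms only ever increase; a higher term always wins
	clusterSize int
}

// electionTimeout is randomised so nodes rarely time out at the same moment,
// which makes split votes unlikely in the first place.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

// onElectionTimeout: a follower that hasn't heard from a leader becomes a
// candidate, bumps its term and asks the cluster for votes (the RPCs are
// faked here via the requestVotes callback).
func (n *Node) onElectionTimeout(requestVotes func(term int) int) {
	n.role = Candidate
	n.currentTerm++
	if votes := requestVotes(n.currentTerm); votes > n.clusterSize/2 {
		n.role = Leader
		return
	}
	// Split vote or lost election: do nothing and simply wait for another
	// randomised timeout to trigger a fresh election.
}

func main() {
	n := &Node{role: Follower, currentTerm: 1, clusterSize: 5}
	time.Sleep(electionTimeout())                        // stand-in for "no heartbeat from the leader"
	n.onElectionTimeout(func(term int) int { return 3 }) // pretend 3 of 5 nodes voted for us
	fmt.Println("role:", n.role, "term:", n.currentTerm) // role: 2 term: 2 (Leader)
}
```
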
  44. Leader Election

  45. Leader goes AWOL

  46. Log Replication

  47. 1. Client sends a command to the Leader. 2. Leader appends an entry to its own log. 3. Leader issues an RPC (AppendEntries) to each Follower. 4. Follower appends the entry to its log and responds to the Leader to acknowledge the entry. 5. Once the entry has been acknowledged by a majority of Followers, the Leader responds to the Client. 6. Leader issues a heartbeat RPC (AppendEntries) to each Follower, which “commits” the entry and applies it to each Follower’s state machine.
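
A leader-side Go sketch of the flow listed on slide 47. The RPC layer is faked with a function value and the commit/heartbeat step is only noted in a comment; real libraries such as hashicorp/raft or etcd/raft differ in many details.

```go
package main

import "fmt"

// Entry is one record in the leader's log.
type Entry struct {
	Term    int
	Index   int
	Command string
}

// Leader holds just enough state for steps 1-5 of slide 47.
type Leader struct {
	term        int
	log         []Entry
	clusterSize int
	// sendAppendEntries stands in for the AppendEntries RPC to one follower;
	// it reports whether that follower acknowledged the entry.
	sendAppendEntries func(peer int, e Entry) bool
}

// HandleClientCommand appends the command to the leader's own log, replicates
// it to the followers, and succeeds once a majority (leader included) has it.
func (l *Leader) HandleClientCommand(cmd string) error {
	e := Entry{Term: l.term, Index: len(l.log) + 1, Command: cmd}
	l.log = append(l.log, e) // step 2: append to the leader's own log

	acks := 1 // the leader counts itself towards the majority
	for peer := 1; peer < l.clusterSize; peer++ {
		if l.sendAppendEntries(peer, e) { // steps 3-4: replicate and collect acks
			acks++
		}
	}
	if acks <= l.clusterSize/2 {
		return fmt.Errorf("entry %d not acknowledged by a majority", e.Index)
	}
	// Step 5: majority reached, so respond to the client.
	// Step 6: the next heartbeat AppendEntries tells followers the entry is
	// committed, and each applies it to its state machine.
	return nil
}

func main() {
	l := &Leader{
		term:        2,
		clusterSize: 5,
		// Pretend one follower is unreachable; 4 of 5 nodes still get the entry.
		sendAppendEntries: func(peer int, e Entry) bool { return peer != 4 },
	}
	fmt.Println(l.HandleClientCommand("SET current_year = 2018")) // <nil>
}
```
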
  48. Log Replication

  49. Handling Turbulent Network Conditions

  50. Safety Guarantees: Election Safety - Only a single leader will be elected in each term. Append Only Leaders - The leader will never delete or overwrite entries. Log Matching - Any two logs with an entry of the same index and term will contain the same value. Leader Completeness - An entry committed in an earlier term will be present in the logs of leaders in later terms. State Machine Safety - If a log entry at a given index has been applied to a server’s state machine, no other server will ever apply a different log entry at the same index.

  51. Preventing Split-Brain 1 1 1 1 1

  52. 1 1 1 1 1 Preventing Split-Brain

  53. 2 2 2 1 1 Preventing Split-Brain

  54. 2 2 2 1 1 Preventing Split-Brain X=1 X=2

  55. 2 2 2 1 1 Preventing Split-Brain X=1 X=1 X=1 X=2 X=2

  56. 2 2 2 1 1 Preventing Split-Brain X=1 X=1 X=1 X=2 X=2

  57. 2 2 2 1 Preventing Split-Brain X=1 X=1 X=1 X=2 1 X=2

  58. 2 2 2 1 Preventing Split-Brain X=1 X=1 X=1 X=2 1 X=2 AppendEntries Term: 1 X = 2

  59. 2 2 2 1 Preventing Split-Brain X=1 X=1 X=1 X=2 1 X=2 NOPE. Term is 2 now

  60. 2 2 2 1 Preventing Split-Brain X=1 X=1 X=1 X=2 2

  61. 2 2 2 1 Preventing Split-Brain X=1 X=1 X=1 X=2 2 AppendEntries Term: 2 X = 1

  62. 2 2 2 2 Preventing Split-Brain X=1 X=1 X=1 2 X=1 X=1
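
The split-brain sequence on slides 51-62 resolves because the deposed leader’s AppendEntries carries a stale term and is rejected. A minimal Go sketch of that term check (illustrative only, not from the talk):

```go
package main

import "fmt"

// AppendEntriesRequest carries the sender's term alongside the entry,
// simplified here to a single key/value write.
type AppendEntriesRequest struct {
	Term  int
	Key   string
	Value int
}

// Node is a follower with its current term and key-value state.
type Node struct {
	currentTerm int
	data        map[string]int
}

// HandleAppendEntries rejects a request whose term is older than the node's
// own; the stale leader learns about the newer term and steps down.
func (n *Node) HandleAppendEntries(req AppendEntriesRequest) bool {
	if req.Term < n.currentTerm {
		return false // "NOPE. Term is 2 now" (slide 59)
	}
	n.currentTerm = req.Term
	n.data[req.Key] = req.Value
	return true
}

func main() {
	n := &Node{currentTerm: 2, data: map[string]int{"X": 1}}
	// The deposed leader from the minority partition still thinks it is term 1.
	fmt.Println(n.HandleAppendEntries(AppendEntriesRequest{Term: 1, Key: "X", Value: 2})) // false
	// The new term-2 leader replicates X=1 and is accepted.
	fmt.Println(n.HandleAppendEntries(AppendEntriesRequest{Term: 2, Key: "X", Value: 1})) // true
}
```
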
  63. Snapshots / Log Compaction

  64. Thanks! https://raft.github.io/raft.pdf http://thesecretlivesofdata.com/raft/