Kiran Bhattaram on Failure Detectors

The problem of consensus is central to many distributed systems algorithms. Failure detectors are central to the way we think about consensus algorithms. In a fully asynchronous system, the FLP impossibility result (https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf) shows that no deterministic consensus algorithm can tolerate even a single crash failure! This simple, stunning result imposed a hard constraint on what can be solved in an asynchronous model.

The FLP (http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/) result kicked off a flurry of research into ways to circumvent the impossibility result. Failure detectors were the most compelling abstraction proposed. These augmented the asynchronous model just enough to allow consensus, while retaining most of the neat abstractions that make asynchronous systems simple to reason about.

In this talk, I'll discuss some of the history and background of Chandra and Toueg's failure detector proposal (http://courses.csail.mit.edu/6.852/08/papers/CT96-JACM.pdf), and cover some of the failure detector mechanisms that followed the paper.

Papers_We_Love

March 27, 2017

Transcript

  1. FAILURE DETECTORS
    Papers We Love NYC
  2. Timeline
    Background, The Paper, Expanding Scope, Examples
  7. System Models
    Set of assumptions about the system
    - how long do operations take?
    - is message delivery reliable?
    - what kind of crashes happen?
  8. The Synchronous System Model
    - upper bound on message delivery delay
    - upper bound on processing time
    - reliable delivery
    - fail-stop crashes
  9. The Asynchronous System Model
    - unbounded message delivery delay
    - unbounded processing time
    - reliable delivery
    - fail-stop crashes
  10. Consensus
    - Termination: The processing will eventually conclude.
    - Agreement: Everyone will agree on the same thing.
    - Validity: Some node will have proposed the agreed-upon value.
  12. Consensus in Synchronous Systems
    Use timeouts to determine whether a process has crashed:
    t > (processing time bound + message delay time bound)
    => perfect failure detectors
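
The timeout rule on this slide can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the talk: `processing_bound` and `delay_bound` are hypothetical names for the two upper bounds the synchronous model guarantees, and time is modelled with plain numbers rather than a real clock.

```python
def timeout_bound(processing_bound: float, delay_bound: float) -> float:
    """Longest time a live process can take to get a heartbeat to us."""
    return processing_bound + delay_bound


def has_crashed(scheduled_at: float, now: float, heartbeat_arrived: bool,
                processing_bound: float, delay_bound: float) -> bool:
    """True only if the heartbeat due at `scheduled_at` is provably missing.

    Because both bounds hold in a synchronous system, a live process can
    never trip this check, so the detector never suspects wrongly.
    """
    waited = now - scheduled_at
    return (not heartbeat_arrived) and waited > timeout_bound(processing_bound, delay_bound)


# Bounds of 1.0 and 2.0 time units give a timeout of 3.0.
print(has_crashed(0.0, 2.5, False, 1.0, 2.0))  # False: still within the bound
print(has_crashed(0.0, 3.5, False, 1.0, 2.0))  # True: bound exceeded
```

In an asynchronous system neither bound exists, so no finite timeout gives this guarantee.
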
  13. Consensus in Asynchronous Systems: FLP!
    - Even if only one process can crash
    - Even with reliable delivery
  15. Wait, what?
    but I use consensus systems all the time!
    Any fault-tolerant algorithm solving consensus has runs that never terminate
    but these runs may have very small probabilities. [Ben-Or] (weakens termination!)
  16. What Now?
    or, Keep Calm and Consensus On
    or, Keep Augmenting the System Model
  17. The Failure Detector Model
    An oracle that guesses at which processes are still alive.
    - might be incorrect!
    - might be different for different processes!
    - might be flappy!
  18. Completeness \ Accuracy | Strong | Weak | Eventually Strong | Eventually Weak
    Strong | Perfect P | Strong S | Eventually Perfect ◇P | Eventually Strong ◇S
    Weak | Q | Weak W | ◇Q | Eventually Weak ◇W
  21. Weak Completeness
    every node that has crashed is permanently suspected by at least one alive node
    (Diagram: nodes A, B, C, D. Speech bubbles: "D has died!", "A has died!")
  24. Strong Completeness
    eventually every process that crashes is permanently suspected by every correct process.
    (Diagram: nodes A, B, C, D. Speech bubbles: "A & D!", "A & D!")
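
The difference between the two completeness properties can be made concrete with a small check over the suspect lists once they have stopped changing. This is a sketch under assumptions: the transcript does not say which node holds which speech bubble, so the assignment of suspect sets to B and C below is hypothetical, as are the function names.

```python
def weak_completeness(crashed, suspects):
    """Every crashed process is suspected by at least one correct process."""
    return all(any(p in s for s in suspects.values()) for p in crashed)


def strong_completeness(crashed, suspects):
    """Every crashed process is suspected by every correct process."""
    return all(all(p in s for s in suspects.values()) for p in crashed)


crashed = {"A", "D"}

# Weak completeness slide: one survivor reports D, the other reports A.
weak_view = {"B": {"D"}, "C": {"A"}}
# Strong completeness slide: both survivors report A and D.
strong_view = {"B": {"A", "D"}, "C": {"A", "D"}}

print(weak_completeness(crashed, weak_view))      # True
print(strong_completeness(crashed, weak_view))    # False
print(strong_completeness(crashed, strong_view))  # True
```
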
  26. Completeness \ Accuracy | Strong | Weak | Eventually Strong | Eventually Weak
    Strong | Perfect P | Strong S | Eventually Perfect ◇P | Eventually Strong ◇S
  28. Perfect Accuracy
    No process is suspected before it crashes.
    (Diagram: nodes A, B, C, D. Speech bubble: "C has died!")
  31. Weak Accuracy
    at least one correct process is never suspected.
    (Diagram: nodes A, B, C, D. Speech bubbles: "C & D have died!", "B has died!", "B & C have died!")
  33. Eventually Strong Accuracy
    eventually NO correct process is suspected by any correct process.
    (Diagram: nodes A, B, C, D. Speech bubble: "C has died!")
  34. Eventually Strong Accuracy
    eventually NO correct process is suspected by any correct process.
    (Diagram: nodes A, B, C, D. Speech bubbles: "C has died!", "B & C have died!")
  35. Eventually Strong Accuracy
    eventually NO correct process is suspected by any correct process.
    (Diagram: nodes A, B, C, D. Speech bubbles: "C has died!", "C has died!")
  36. Eventually Weak Accuracy
    (Diagram: nodes A, B, C, D. Suspect lists: "A, C & D", "A & B", "B & C", "B, C & D")
  37. Eventually Weak Accuracy
    (Diagram: nodes A, B, C, D. Suspect lists: "A, C & D", "B & C", "B", "B, C & D")
  38. Eventually Weak Accuracy
    eventually SOME correct process is not suspected by any correct process.
    (Diagram: nodes A, B, C, D. Suspect lists: "B & C", "B", "C", "B, C & D")
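
The accuracy properties can be checked the same way, against a snapshot of every node's suspect list. The sketch below assumes all four nodes are correct and uses the suspect lists from the first and last of the three Eventually Weak Accuracy slides; the function names are hypothetical.

```python
def strong_accuracy(correct, suspect_sets):
    """No correct process appears in any suspect list."""
    return all(p not in s for p in correct for s in suspect_sets)


def weak_accuracy(correct, suspect_sets):
    """At least one correct process appears in no suspect list."""
    return any(all(p not in s for s in suspect_sets) for p in correct)


correct = {"A", "B", "C", "D"}

# Early snapshot: every node is suspected by somebody.
early = [{"A", "C", "D"}, {"A", "B"}, {"B", "C"}, {"B", "C", "D"}]
# Later snapshot: nobody suspects A any more.
later = [{"B", "C"}, {"B"}, {"C"}, {"B", "C", "D"}]

print(weak_accuracy(correct, early))    # False
print(weak_accuracy(correct, later))    # True
print(strong_accuracy(correct, later))  # False
```

The "eventually" variants only require the property to hold from some point onward, which is why the early snapshot is allowed to fail the check.
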
  39. Completeness \ Accuracy | Strong | Weak | Eventually Strong | Eventually Weak
    Strong | Perfect P | Strong S | Eventually Perfect ◇P | Eventually Strong ◇S
  40. Consensus: ◇S
    initial arbitrary information, BUT:
    - eventually every process that crashes is permanently suspected by every correct process.
    - eventually SOME correct process is not suspected by any correct process.
    solvable as long as fewer than n/2 processes fail!
  43. Consensus: ◇S
    Phase 1: gather proposals
    choose leader by c = (r mod n) + 1
    move on when majority proposes
    (Diagram: nodes A, B, C, D, E, F, G. Speech bubble: "a proposal!")
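
The leader rule and the majority threshold from this slide are easy to compute directly. The sketch below assumes processes are numbered 1 to n, as the formula implies; the function names are hypothetical.

```python
def coordinator(round_number: int, n: int) -> int:
    """Leader for a round, rotating through processes 1..n."""
    return (round_number % n) + 1


def majority(n: int) -> int:
    """Smallest number of processes that forms a majority of n."""
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Largest f such that a majority of n processes is still alive."""
    return (n - 1) // 2


n = 7  # the seven nodes A to G in the diagram
print([coordinator(r, n) for r in range(1, 9)])  # [2, 3, 4, 5, 6, 7, 1, 2]
print(majority(n))                # 4
print(max_tolerated_failures(n))  # 3
```

Because the leader rotates, every process eventually gets a turn, including one that the failure detector eventually stops suspecting.
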
  45. Consensus: ◇S
    Phase 2: send proposal
    no waiting!
    (Diagram: nodes A, B, C, D, E, F, G. Speech bubble: "a proposal!")
  50. Consensus: ◇S
    Phase 3: gather votes
    move on when majority votes
    OR
    cancel if all nodes realize B is down
    (Diagram: nodes A, B, C, D, E, F, G. Speech bubbles: "sgtm!", "I think you may have died!")
  52. Consensus: ◇S
    Phase 4: decision
    no waiting, all done!
    (Diagram: nodes A, B, C, D, E, F, G. Speech bubble: "that’s okay, commit anyway!")
  55. A very simple example of ◇S!
    SWIM
    (Diagram: nodes A and B. Speech bubbles: "I’M ALIVE", "A looks fishy.", "oh hey, it’s back!")
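
The exchange in this diagram (a heartbeat, a suspicion, then a retraction) can be sketched as a heartbeat detector that is allowed to change its mind. This is a minimal illustration, not the talk's implementation: the class and parameter names are hypothetical, and lengthening the timeout after a false suspicion is an assumption borrowed from the usual way of building eventually accurate detectors.

```python
class HeartbeatDetector:
    """Suspects a peer when its heartbeat is late; retracts when it returns."""

    def __init__(self, peers, initial_timeout=1.0, backoff=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()
        self.backoff = backoff

    def on_heartbeat(self, peer, now):
        """Record an "I'M ALIVE" message from `peer`."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # "oh hey, it's back!": the suspicion was wrong, so retract it
            # and wait longer for this peer next time.
            self.suspected.discard(peer)
            self.timeout[peer] += self.backoff

    def tick(self, now):
        """Suspect every peer whose heartbeat is overdue; return the suspects."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)  # "A looks fishy."
        return set(self.suspected)


detector = HeartbeatDetector(["A"], initial_timeout=1.0, backoff=1.0)
detector.on_heartbeat("A", now=0.0)
print(detector.tick(now=1.5))   # {'A'}
detector.on_heartbeat("A", now=2.0)
print(detector.tick(now=3.5))   # set(): the timeout for A is now 2.0
```

A crashed peer stops sending heartbeats and stays suspected, while a slow but live peer is suspected less and less often as its timeout grows.
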
  56. As of 1996
    Model:
    - asynchronous systems
    - fail-stop processes
    - no recovery
    - no message losses
    Problems:
    - consensus
    - atomic broadcast
  57. Completeness \ Accuracy | Strong | Weak | Eventually Strong | Eventually Weak
    Strong | Perfect P | Strong S | Eventually Perfect ◇P | Eventually Strong ◇S
    Weak | Q | Weak W | ◇Q | Eventually Weak ◇W
    timeless, heartbeat, leader Ω
  58. And more!
    Non-blocking Atomic Commit: ?P + ◇S
    - anonymously perfect: if a crash happens, the FD is informed.
    Quiescent Communication: heartbeats
    - HB-completeness: if p[j] crashes, HB_i[j] stops increasing
    - HB-accuracy: if p[j] is correct, HB_i[j] keeps increasing
    FLL
  59. And even more!
    Other models:
    - Crashes & link failures (FLL)
    - Network partitioning
    - Crash/recovery
    Other problems:
    - non-blocking atomic commit
    - group membership
    - leader election
    - k-set agreement
    - reliable communication
  60. In production
    completeness & accuracy, PLUS
    • network efficiency & message load
    • speed of first detection
    • speed of knowledge propagation
    • minimizing flappy alerts
  61. Additional features: SWIM
    • network:
      • constant message load/group member
      • propagate membership updates with gossip
    • time to detection:
      • deterministic bound on failure detection latency
    • prevent flappy alerts:
      • “suspect” nodes before declaring them dead
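
The "suspect before declaring dead" idea can be sketched as a three-state machine per group member. This is a simplified illustration under assumptions: the names are hypothetical, time is passed in explicitly, and details of the real protocol such as indirect probes, gossip, and incarnation numbers are left out.

```python
from enum import Enum


class State(Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"
    DEAD = "dead"


class Member:
    """Tracks one group member through alive -> suspect -> dead."""

    def __init__(self, suspicion_timeout: float):
        self.state = State.ALIVE
        self.suspected_at = None
        self.suspicion_timeout = suspicion_timeout

    def probe_failed(self, now: float) -> None:
        """A probe went unanswered: start suspecting, but do not declare dead."""
        if self.state is State.ALIVE:
            self.state = State.SUSPECT
            self.suspected_at = now

    def heard_from(self) -> None:
        """The member answered in time: clear the suspicion."""
        if self.state is State.SUSPECT:
            self.state = State.ALIVE
            self.suspected_at = None

    def tick(self, now: float) -> State:
        """Declare the member dead once the suspicion has lasted long enough."""
        if (self.state is State.SUSPECT
                and now - self.suspected_at >= self.suspicion_timeout):
            self.state = State.DEAD
        return self.state


member = Member(suspicion_timeout=5.0)
member.probe_failed(now=10.0)
print(member.tick(now=12.0))  # State.SUSPECT
member.heard_from()
print(member.tick(now=12.0))  # State.ALIVE
```

The suspicion window trades detection speed for fewer flappy alerts: a slow node gets a chance to answer before anyone acts on its supposed death.
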
  62. Model: φ
    C is 25% likely to be down
    (Diagram: nodes A, B, C, D, E, F, G)
  63. Use cases: a job scheduler (φ)
    - at 25%, stop sending it new jobs.
    - at 50%, reschedule outstanding jobs on another node, and wait for recovery.
    - at 75%,
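
The scheduler policy on this slide, together with one common way to compute a φ value, can be sketched as follows. Several things here are assumptions rather than content from the talk: the function names, the choice of an exponential distribution for heartbeat intervals (picked for brevity), and the "schedule normally" default. The transcript does not say how φ maps to the percentages on the slide, so the policy takes the suspicion level directly, and the 75% action is left out because the slide text is cut off.

```python
import math


def phi(elapsed: float, mean_interval: float) -> float:
    """phi = -log10(P_later(elapsed)).

    Assumes heartbeat intervals are exponentially distributed, so
    P_later(t) = exp(-t / mean_interval) and phi has the closed form below.
    """
    return elapsed / (mean_interval * math.log(10))


def scheduler_action(suspicion: float) -> str:
    """Map a suspicion level in [0, 1] to the actions listed on the slide."""
    if suspicion >= 0.50:
        return "reschedule outstanding jobs"
    if suspicion >= 0.25:
        return "stop sending new jobs"
    return "schedule normally"  # assumed default, not on the slide


print(scheduler_action(0.10))  # schedule normally
print(scheduler_action(0.25))  # stop sending new jobs
print(scheduler_action(0.60))  # reschedule outstanding jobs
```

The point of an accrual detector is visible in the policy: the detector reports a level of suspicion, and each application picks its own thresholds.
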
  64. Where to Go from Here
    or, a bibliography
    • FLP result
    • Chandra/Toueg
    • Reynal survey
    • SWIM
    • Phi Accrual Failure Detectors
    • Guerraoui et al. survey
    • Freiling et al. survey
  65. Conclusion!
    - Background: history, system models, consensus, impossibility
    - The Paper: oracles, classification, solving consensus
    - Expanding Scope: New models, New problems
    - Examples: Productionization, SWIM, Phi Accrual