Kiran Bhattaram on Failure Detectors

1 FAILURE DETECTORS Papers We Love NYC

2 Kiran Bhattaram @kiranb

3 Why? Failure detectors are pervasive. Failure detectors abstract complexity.

4 Timeline: Background, The Paper, Examples, Expanding Scope

5 Background (1): history, system models, consensus, impossibility

6 System Models: a set of assumptions about the system
- how long do operations take?
- is message delivery reliable?
- what kind of crashes happen?

7 The Synchronous System Model
- upper bound on message delivery delay
- upper bound on processing time
- reliable delivery
- fail-stop crashes

8 The Asynchronous System Model
- unbounded message delivery delay
- unbounded processing time
- reliable delivery
- fail-stop crashes

9 Problems: Consensus [diagram: nodes A, B and C all settle on the value 8]

10 Consensus
- Termination: The processing will eventually conclude.
- Agreement: Everyone will agree on the same thing.
- Validity: Some node will have proposed the agreed-upon value.
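The three properties can be checked mechanically against a recorded run. The sketch below is only an illustration: `check_consensus` and its arguments are hypothetical names, and since termination is a liveness property, a finite trace can only tell us whether every correct process had decided by the time the recording stopped.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    proposals: Dict[str, Hashable],
    decisions: Dict[str, Optional[Hashable]],
    correct: Set[str],
) -> Dict[str, bool]:
    """Check one recorded run against termination, agreement and validity.

    Assumption: None means "has not decided", so None is never a proposal.
    """
    # Only correct (non-crashed) processes are required to decide.
    decided = {p: decisions.get(p) for p in correct}
    values = {v for v in decided.values() if v is not None}
    return {
        # Every correct process has decided by the end of the trace.
        "termination": all(v is not None for v in decided.values()),
        # No two correct processes decided differently.
        "agreement": len(values) <= 1,
        # Every decided value was proposed by some node.
        "validity": values <= set(proposals.values()),
    }
```

For example, with proposals `{"A": 8, "B": 3, "C": 8}` and every node deciding 8, all three checks pass; if B decides 3 instead, agreement fails while validity still holds.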

11 Consensus in Synchronous Systems Use timeouts to determine whether
a process has crashed: t > (processing time bound + message delay time bound) => perfect failure detectors
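As a minimal sketch of the timeout rule above, assuming both bounds are known and expressed in the same unit (the function names are illustrative, not from the talk):

```python
def crash_timeout(processing_bound: float, delay_bound: float) -> float:
    """Longest time a live process can take to answer in a synchronous system."""
    return processing_bound + delay_bound


def has_crashed(elapsed: float, processing_bound: float, delay_bound: float) -> bool:
    """True once we have waited longer than any live process could need.

    In the synchronous model the bounds always hold, so this never
    suspects a live process: the detector is perfect.
    """
    return elapsed > crash_timeout(processing_bound, delay_bound)
```

With a processing bound of 100 ms and a delay bound of 200 ms, waiting 301 ms is conclusive, while waiting exactly 300 ms is not yet.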

12 Consensus in Asynchronous Systems: FLP! Deterministic consensus is impossible, even if only one process can crash and even with reliable delivery.

13 Wait, what? But I use consensus systems all the time! Any fault-tolerant algorithm solving consensus has runs that never terminate, but these runs may have very small probabilities [Ben-Or] (weakens termination!)

14 “consensus is impossible” => “consensus is not always possible”

15 What Now? or, Keep Calm and Consensus On or,
Keep Augmenting the System Model

16 The Paper 2 oracles, classification, solving consensus

17 When do you stop waiting?

18 The Failure Detector Model: an oracle that guesses at which processes are still alive.
- might be incorrect!
- might be different for different processes!
- might be flappy!

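19 Evaluating Failure Detectors
- Completeness: no false negatives (every crashed process gets suspected)
- Accuracy: no false positives (correct processes are not suspected)
[diagram: nodes A, B, C, D]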

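20 Failure detector classes, by completeness (rows) and accuracy (columns)

Completeness | Strong accuracy | Weak accuracy | Eventually strong accuracy | Eventually weak accuracy
Strong       | Perfect P       | Strong S      | Eventually Perfect ◇P      | Eventually Strong ◇S
Weak         | Q               | Weak W        | ◇Q                         | Eventually Weak ◇W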

21 Completeness: Strong, Weak

22 Weak Completeness: every node that has crashed is permanently suspected by at least one alive node. [diagram: nodes A, B, C, D; one node reports “D has died!”, another reports “A has died!”]

23 Strong Completeness: eventually every process that crashes is permanently suspected by every correct process. [diagram: nodes A, B, C, D; both surviving nodes report “A & D!”]
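To make the difference between the two completeness properties concrete, here is a small sketch over the final (permanent) suspicion sets of the surviving processes. The function names and the dictionary layout are assumptions made for this illustration.

```python
from typing import Dict, Set


def strong_completeness(crashed: Set[str], suspects: Dict[str, Set[str]]) -> bool:
    """Every crashed process is suspected by EVERY correct process.

    `suspects` maps each correct process to the set it permanently suspects.
    """
    return all(crashed <= suspected for suspected in suspects.values())


def weak_completeness(crashed: Set[str], suspects: Dict[str, Set[str]]) -> bool:
    """Every crashed process is suspected by AT LEAST ONE correct process."""
    return all(
        any(victim in suspected for suspected in suspects.values())
        for victim in crashed
    )
```

With A and D crashed, `{"B": {"D"}, "C": {"A"}}` satisfies only weak completeness, while `{"B": {"A", "D"}, "C": {"A", "D"}}` satisfies both.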

Strong completeness: Perfect P, Strong S, Eventually Perfect ◇P, Eventually Strong ◇S

26 Accuracy: Strong, Weak, Eventually Weak, Eventually Strong

27 Perfect (Strong) Accuracy: No process is suspected before it crashes. [diagram: nodes A, B, C, D; only after C crashes does a node report “C has died!”]

28 Weak Accuracy: at least one correct process is never suspected. [diagram: nodes A, B, C, D; reports of “C & D have died!”, “B has died!” and “B & C have died!”, but nobody ever suspects A]

29 Accuracy: Strong, Weak, Eventually Weak, Eventually Strong

30 Eventually Strong Accuracy: eventually NO correct process is suspected by any correct process. [diagram: nodes A, B, C, D; early reports such as “B & C have died!” shrink over time to just “C has died!”]

31 Eventually Weak Accuracy: eventually SOME correct process is not suspected by any correct process. [diagram: nodes A, B, C, D; suspicion lists start as “A, C & D”, “A & B”, “B & C”, “B, C & D” and settle to “B & C”, “B”, “C”, “B, C & D”, so A ends up unsuspected]
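The four accuracy properties can be compared side by side on a recorded history of suspicion sets. This is a sketch under simplifying assumptions: time is a list of discrete steps, `crash_step` records when a process crashed, and "eventually" is approximated by a caller-supplied `stable_from` index, because a finite trace cannot prove an eventual property. All names are hypothetical.

```python
from typing import Dict, List, Set

Snapshot = Dict[str, Set[str]]  # observer -> processes it suspects at that step


def strong_accuracy(history: List[Snapshot], crash_step: Dict[str, int]) -> bool:
    """No process is suspected before it crashes."""
    return all(
        p in crash_step and crash_step[p] <= step
        for step, snap in enumerate(history)
        for suspected in snap.values()
        for p in suspected
    )


def weak_accuracy(history: List[Snapshot], correct: Set[str]) -> bool:
    """At least one correct process is never suspected."""
    ever_suspected: Set[str] = set()
    for snap in history:
        for suspected in snap.values():
            ever_suspected |= suspected
    return bool(correct - ever_suspected)


def _suspected_by_correct(tail: List[Snapshot], correct: Set[str]) -> Set[str]:
    """Everything that some correct observer suspects anywhere in `tail`."""
    out: Set[str] = set()
    for snap in tail:
        for observer, suspected in snap.items():
            if observer in correct:
                out |= suspected
    return out


def eventually_strong_accuracy(history, correct, stable_from: int) -> bool:
    """From `stable_from` on, no correct process is suspected by a correct one."""
    return not (_suspected_by_correct(history[stable_from:], correct) & correct)


def eventually_weak_accuracy(history, correct, stable_from: int) -> bool:
    """From `stable_from` on, some correct process is suspected by no correct one."""
    return bool(correct - _suspected_by_correct(history[stable_from:], correct))
```

A history in which everyone is suspected at first, but A drops out of every list later, fails weak accuracy and eventually strong accuracy yet satisfies eventually weak accuracy.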

34 Consensus: ◇S
- initial information can be arbitrary, BUT:
- eventually every process that crashes is permanently suspected by every correct process.
- eventually SOME correct process is not suspected by any correct process.
- solvable as long as fewer than n/2 processes fail!

35 Consensus: ◇S Mickens, The Saddest Moment.

36 Consensus: ◇S, Phase 1: gather proposals
- choose leader by c = (r mod n) + 1
- nodes send the leader their proposals (“a proposal!”)
- move on when a majority proposes
[diagram: nodes A to G]

37 Consensus: ◇S, Phase 2: send proposal
- the leader sends out “a proposal!”
- no waiting!

38 Consensus: ◇S, Phase 3: gather votes
- nodes answer “sgtm!” or “I think you may have died!”
- move on when a majority votes, OR
- cancel if all nodes realize B is down

39 Consensus: ◇S, Phase 4: decision
- “that’s okay, commit anyway!”
- no waiting, all done!
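The arithmetic behind the rotating leader and the majority thresholds is small enough to sketch. This is not the full consensus algorithm, only the leader rule c = (r mod n) + 1 from the slides plus the quorum sizes implied by requiring a majority; the function names are illustrative.

```python
def coordinator(round_number: int, n: int) -> int:
    """1-based index of the leader for a round: c = (r mod n) + 1."""
    return (round_number % n) + 1


def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Largest f with f < n/2, so that a majority can still answer."""
    return (n - 1) // 2
```

With the seven nodes A to G, rounds 0 through 6 rotate through leaders 1 to 7, round 7 wraps back to leader 1, a majority is 4 nodes, and up to 3 crashes can be tolerated. Because leadership rotates, a round led by a crashed or wrongly suspected node is eventually followed by one led by the process that ◇S guarantees nobody suspects.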

40 A very simple example of ◇S! (SWIM) [diagram: nodes A and B exchanging “I’M ALIVE” heartbeats; “A looks fishy.”, then “oh hey, it’s back!”]
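The "looks fishy ... oh hey, it's back!" behaviour can be sketched as a heartbeat detector that suspects a peer after a timeout and retracts the suspicion when a heartbeat shows up. Growing the timeout after each false suspicion is a common trick for making mistakes eventually stop when delays are eventually bounded. The class below is an illustrative sketch with hypothetical names and an arbitrary doubling policy, not SWIM's actual mechanism.

```python
from typing import Dict, Set


class HeartbeatDetector:
    """Suspects peers whose heartbeats are late; un-suspects them on recovery."""

    def __init__(self, initial_timeout: float = 1.0) -> None:
        self.initial_timeout = initial_timeout
        self.timeout: Dict[str, float] = {}
        self.last_seen: Dict[str, float] = {}
        self.suspected: Set[str] = set()

    def heartbeat(self, peer: str, now: float) -> None:
        """Record an "I'M ALIVE" message from `peer` received at time `now`."""
        if peer in self.suspected:
            # "oh hey, it's back!": we were wrong, so be more patient next time.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2
        self.timeout.setdefault(peer, self.initial_timeout)
        self.last_seen[peer] = now

    def check(self, now: float) -> Set[str]:
        """Return the peers currently suspected at time `now`."""
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)  # "A looks fishy."
        return set(self.suspected)
```

A peer that misses one deadline is suspected, then cleared when its next heartbeat arrives, after which it gets twice as long before being suspected again.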

41 Expanding the Scope 4 New models, New problems

42 As of 1996
Model:
- asynchronous systems
- fail-stop processes
- no recovery
- no message losses
Problems:
- consensus
- atomic broadcast

43 Failure detector classes
- Strong completeness: Perfect P, Strong S, Eventually Perfect ◇P, Eventually Strong ◇S
- Weak completeness: Q, Weak W, ◇Q, Eventually Weak ◇W
- also: timeless, heartbeat, leader Ω

44 And more!
- Non-blocking Atomic Commit: ?P + ◇S
  - anonymously perfect (?P): if a crash happens, the FD is informed.
- Quiescent Communication (FLL): heartbeats
  - HB-completeness: if p[j] crashes, HB_i[j] stops increasing
  - HB-accuracy: if p[j] is correct, HB_i[j] keeps increasing

45 And even more!
Other models:
- Crashes & link failures (FLL)
- Network partitioning
- Crash/recovery
Other problems:
- non-blocking atomic commit
- group membership
- leader election
- k-set agreement
- reliable communication

46 Rephrasing problems: encapsulating complexity/hairy bits [diagram: “A, B & C look sus”]

47 Examples 3 Productionization, SWIM, Phi Accrual

48 In production: completeness & accuracy, PLUS
• network efficiency & message load
• speed of first detection
• speed of knowledge propagation
• minimizing flappy alerts

49 SWIM Scalable Weakly-consistent Infection-style Process Group Membership Protocol

50 SWIM: Additional features
• network:
  • constant message load per group member
  • propagate membership updates with gossip
• time to detection:
  • deterministic bound on failure detection latency
• prevent flappy alerts:
  • “suspect” nodes before declaring them dead

51 SWIM: Randomized pings [diagram: k random nodes]
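One SWIM protocol period can be sketched as follows: ping a randomly chosen member directly, and if that gets no ack, ask k other random members to ping it on our behalf before marking it as a suspect. The network is abstracted away here: `ping` and `ping_req` are hypothetical callables supplied by the caller, and a real implementation would send the indirect requests in parallel and bound them with timeouts.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def probe_once(
    me: str,
    members: Sequence[str],
    ping: Callable[[str], bool],
    ping_req: Callable[[str, str], bool],
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[str, str]]:
    """Run one probe; return (target, "alive" | "suspect"), or None if alone."""
    rng = rng or random.Random()
    others = [m for m in members if m != me]
    if not others:
        return None
    target = rng.choice(others)

    # Direct probe first.
    if ping(target):
        return target, "alive"

    # No ack: ask up to k other random members to probe the target for us.
    pool = [m for m in others if m != target]
    helpers = rng.sample(pool, min(k, len(pool)))
    if any(ping_req(helper, target) for helper in helpers):
        return target, "alive"

    # Nobody could reach it: suspect it rather than declaring it dead.
    return target, "suspect"
```

The indirect probes guard against a bad link between the prober and the target being mistaken for a crash.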

52 SWIM: Gossip [diagram: a node tells its peers “Hey, I suspect B is dead!”]

53 Phi Accrual φ

54 Phi Accrual: Model [diagram: nodes A to G] C is 25% likely to be down
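An accrual detector outputs a suspicion level instead of a yes/no answer. In the Phi Accrual formulation the level is φ = -log10(P_later(t)), where P_later(t) is the probability that a heartbeat would still arrive after t time units of silence. The sketch below assumes heartbeat inter-arrival times are roughly normally distributed and estimates the mean and deviation from a window of past intervals; `phi`, `intervals` and `since_last` are illustrative names.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def phi(intervals: Sequence[float], since_last: float, min_sigma: float = 1e-3) -> float:
    """Suspicion level after `since_last` time units without a heartbeat.

    `intervals` holds recent heartbeat inter-arrival times (at least two).
    """
    mu = mean(intervals)
    sigma = max(stdev(intervals), min_sigma)  # avoid dividing by zero
    # P(next heartbeat arrives later than since_last) under Normal(mu, sigma).
    p_later = 0.5 * math.erfc((since_last - mu) / (sigma * math.sqrt(2)))
    # Clamp so that log10 stays defined when the probability underflows to 0.
    return -math.log10(max(p_later, 1e-300))
```

A silence equal to the average interval gives φ of about 0.3 (a 50% chance the heartbeat is merely late), and φ grows as the silence lengthens. The application then picks its own thresholds for acting on it.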

55 Use cases: a job scheduler
- at 25%, stop sending it new jobs.
- at 50%, reschedule outstanding jobs on another node, and wait for recovery.
- at 75%,

56 Where to Go from Here or, a bibliography
• FLP result
• Chandra/Toueg
• Reynal survey
• SWIM
• Phi Accrual Failure Detectors
• Guerraoui et al. survey
• Freiling et al. survey

57 Conclusion!
- Background: history, system models, consensus, impossibility
- The Paper: oracles, classification, solving consensus
- Expanding Scope: New models, New problems
- Examples: Productionization, SWIM, Phi Accrual

58 Thanks! @kiranb kiranbot.com

59 Appendix!

60 Gossip van Renesse et al.

61 Gossip: Ping one node
