Slide 1

Slide 1 text

Failure Detectors
Papers We Love NYC

Slide 2

Slide 2 text

Kiran Bhattaram (@kiranb)

Slide 3

Slide 3 text

Why?
- Failure detectors are pervasive.
- Failure detectors abstract complexity.

Slide 4

Slide 4 text

Timeline: Background, The Paper, Expanding Scope, Examples

Slide 9

Slide 9 text

Background: history, system models, consensus, impossibility

Slide 10

Slide 10 text

System Models
A set of assumptions about the system:
- How long do operations take?
- Is message delivery reliable?
- What kinds of crashes happen?

Slide 11

Slide 11 text

The Synchronous System Model
- upper bound on processing time
- upper bound on message delivery delay
- reliable delivery
- fail-stop crashes

Slide 12

Slide 12 text

The Asynchronous System Model
- unbounded processing time
- unbounded message delivery delay
- reliable delivery
- fail-stop crashes

Slide 13

Slide 13 text

Problems: Consensus
[Diagram: nodes A, B, and C, each holding the value 8]

Slide 20

Slide 20 text

Consensus
- Termination: the processing will eventually conclude.
- Agreement: everyone will agree on the same thing.
- Validity: some node will have proposed the agreed-upon value.

Slide 22

Slide 22 text

Consensus in Synchronous Systems
Use timeouts to determine whether a process has crashed:
t > (processing time bound + message delay time bound)
=> perfect failure detectors
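A minimal sketch of the timeout rule on this slide, in Python. The class name `TimeoutDetector`, its methods, the `margin` parameter, and the explicit `now` arguments are illustrative assumptions, not something shown in the talk. The idea is only that, when both bounds really hold, a timeout larger than their sum should not fire for a live process.

```python
class TimeoutDetector:
    """Suspect a process once it has been silent longer than the timeout."""

    def __init__(self, processing_bound, delay_bound, margin=0.001):
        # Slide's rule: t > processing time bound + message delay time bound.
        # `margin` (an assumption) just makes the inequality strict.
        self.timeout = processing_bound + delay_bound + margin
        self.last_heard = {}

    def heard_from(self, pid, now):
        # Record the time of the latest message from `pid`.
        self.last_heard[pid] = now

    def suspects(self, pid, now):
        # True once `pid` has been silent for longer than the timeout.
        return now - self.last_heard[pid] > self.timeout


detector = TimeoutDetector(processing_bound=0.2, delay_bound=0.3)
detector.heard_from("B", now=10.0)
print(detector.suspects("B", now=10.4))  # silence still within the bounds
print(detector.suspects("B", now=11.0))  # silence exceeds the timeout
```

In an asynchronous system neither bound exists, so no fixed timeout can play this role; that gap is what the rest of the talk addresses.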

Slide 23

Slide 23 text

Consensus in Asynchronous Systems: FLP!
Consensus is impossible:
- even if only one process can crash
- even with reliable delivery

Slide 26

Slide 26 text

Wait, what? But I use consensus systems all the time!
Any fault-tolerant algorithm solving consensus has runs that never terminate, but these runs may have very small probabilities [Ben-Or]. (This weakens termination!)

Slide 27

Slide 27 text

“consensus is impossible” => “consensus is not always possible”

Slide 29

Slide 29 text

What Now?
or, Keep Calm and Consensus On
or, Keep Augmenting the System Model

Slide 30

Slide 30 text

The Paper: oracles, classification, solving consensus

Slide 31

Slide 31 text

When do you stop waiting?

Slide 32

Slide 32 text

The Failure Detector Model
An oracle that guesses which processes are still alive.
- might be incorrect!
- might be different for different processes!
- might be flappy!

Slide 33

Slide 33 text

Evaluating Failure Detectors
- Completeness: no false negatives (crashed processes get suspected)
- Accuracy: no false positives (live processes do not get suspected)
[Diagram: nodes A, B, C, D]

Slide 34

Slide 34 text

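Failure detector classes (rows: completeness, columns: accuracy)

Completeness | Strong accuracy | Weak accuracy | Eventually strong accuracy | Eventually weak accuracy
Strong       | Perfect P       | Strong S      | Eventually Perfect ◇P      | Eventually Strong ◇S
Weak         | Q               | Weak W        | ◇Q                         | Eventually Weak ◇W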

Slide 35

Slide 35 text

Completeness: Strong, Weak

Slide 39

Slide 39 text

Weak Completeness
Eventually, every node that has crashed is permanently suspected by at least one live node.
[Diagram: nodes A, B, C, D; speech bubbles: "D has died!", "A has died!"]

Slide 43

Slide 43 text

Strong Completeness
Eventually, every process that crashes is permanently suspected by every correct process.
[Diagram: nodes A, B, C, D; speech bubbles: "A & D!", "A & D!"]
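The two completeness properties can be compared on a single snapshot of suspicion lists. This is a rough Python sketch: the function names and the dictionary layout (process name mapped to the set of processes it suspects) are assumptions for illustration, and a snapshot cannot capture the "eventually" and "permanently" parts of the definitions. The sample data mirrors the speech bubbles in the two diagrams.

```python
def strong_completeness(suspects, crashed):
    """Every correct process suspects every crashed process."""
    correct = set(suspects) - crashed
    return all(crashed <= suspects[p] for p in correct)


def weak_completeness(suspects, crashed):
    """Every crashed process is suspected by at least one correct process."""
    correct = set(suspects) - crashed
    return all(any(c in suspects[p] for p in correct) for c in crashed)


crashed = {"A", "D"}

# Weak: one survivor noticed D, the other noticed A.
weak_view = {"A": set(), "B": {"D"}, "C": {"A"}, "D": set()}

# Strong: both survivors suspect both crashed nodes.
strong_view = {"A": set(), "B": {"A", "D"}, "C": {"A", "D"}, "D": set()}

print(weak_completeness(weak_view, crashed), strong_completeness(weak_view, crashed))
print(weak_completeness(strong_view, crashed), strong_completeness(strong_view, crashed))
```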

Slide 45

Slide 45 text

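Failure detector classes with strong completeness (columns: accuracy)

Completeness | Strong accuracy | Weak accuracy | Eventually strong accuracy | Eventually weak accuracy
Strong       | Perfect P       | Strong S      | Eventually Perfect ◇P      | Eventually Strong ◇S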

Slide 46

Slide 46 text

Accuracy: Strong, Weak, Eventually Strong, Eventually Weak

Slide 49

Slide 49 text

Perfect (Strong) Accuracy
No process is suspected before it crashes.
[Diagram: nodes A, B, C, D; speech bubble: "C has died!"]

Slide 53

Slide 53 text

Weak Accuracy
At least one correct process is never suspected.
[Diagram: nodes A, B, C, D; speech bubbles: "C & D have died!", "B has died!", "B & C have died!"]

Slide 58

Slide 58 text

Eventually Strong Accuracy
Eventually NO correct process is suspected by any correct process.
[Diagram: nodes A, B, C, D; speech bubbles: "C has died!", "B & C have died!"]

Slide 59

Slide 59 text

Eventually Strong Accuracy
Eventually NO correct process is suspected by any correct process.
[Diagram: nodes A, B, C, D; speech bubbles: "C has died!", "C has died!"]

Slide 61

Slide 61 text

Eventually Weak Accuracy
[Diagram: nodes A, B, C, D; suspicion lists: "A, C & D", "A & B", "B & C", "B, C & D"]

Slide 62

Slide 62 text

Eventually Weak Accuracy
[Diagram: nodes A, B, C, D; suspicion lists: "A, C & D", "B & C", "B", "B, C & D"]

Slide 63

Slide 63 text

Eventually Weak Accuracy
[Diagram: nodes A, B, C, D; suspicion lists: "B & C", "B", "C", "B, C & D"]

Slide 65

Slide 65 text

Eventually Weak Accuracy
Eventually SOME correct process is not suspected by any correct process.
[Diagram: nodes A, B, C, D; suspicion lists: "B & C", "B", "C", "B, C & D"]
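One way to read the definition is as a check over a recorded history of suspicion lists. The Python sketch below makes several assumptions: the function names, the trace format (a list of snapshots, each mapping a process to the set it currently suspects), and the three-process sample data are all hypothetical. A finite trace can only suggest that the property holds, since "eventually" is a statement about the whole, unbounded run.

```python
def stable_from(trace, correct, candidate):
    """Earliest index after which no correct process suspects `candidate`.

    Returns None if `candidate` is still suspected in the last snapshot.
    """
    start = None
    for i, snapshot in enumerate(trace):
        suspected = any(candidate in snapshot[p] for p in correct)
        if suspected:
            start = None  # suspicion came back, so the streak is reset
        elif start is None:
            start = i  # first snapshot of a new unsuspected streak
    return start


def eventually_weak_accuracy(trace, correct):
    """Map each correct process that ends up unsuspected to its start index."""
    result = {}
    for p in sorted(correct):
        start = stable_from(trace, correct, p)
        if start is not None:
            result[p] = start
    return result


# Hypothetical run: A is wrongly suspected at first, then left alone.
trace = [
    {"A": {"B"}, "B": {"A", "C"}, "C": {"A"}},
    {"A": {"B"}, "B": {"C"}, "C": {"A", "B"}},
    {"A": {"B", "C"}, "B": {"C"}, "C": {"B"}},
    {"A": {"B", "C"}, "B": {"C"}, "C": {"B"}},
]
print(eventually_weak_accuracy(trace, correct={"A", "B", "C"}))
```

Only one such process is needed. Everyone else may keep being suspected, which is what separates the weak from the strong variant.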

Slide 67

Slide 67 text

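Failure detector classes with strong completeness (columns: accuracy)

Completeness | Strong accuracy | Weak accuracy | Eventually strong accuracy | Eventually weak accuracy
Strong       | Perfect P       | Strong S      | Eventually Perfect ◇P      | Eventually Strong ◇S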

Slide 68

Slide 68 text

Consensus: ◇S
Initially arbitrary information, BUT:
- eventually every process that crashes is permanently suspected by every correct process (strong completeness).
- eventually SOME correct process is not suspected by any correct process (eventual weak accuracy).
Solvable as long as fewer than n/2 processes fail!

Slide 69

Slide 69 text

Consensus: ◇S
Mickens, The Saddest Moment.

Slide 72

Slide 72 text

Consensus: ◇S
Phase 1: gather proposals
- choose leader by c = (r mod n) + 1
- move on when majority proposes
[Diagram: nodes A to G; speech bubble: "a proposal!"]
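The leader formula on this slide is small enough to try directly. A minimal Python sketch follows; the function name `coordinator` and the numbering of the seven diagram nodes as 1 to 7 are assumptions for illustration.

```python
def coordinator(round_number, n):
    """Return the 1-based id of the leader for a round: c = (r mod n) + 1."""
    return (round_number % n) + 1


n = 7  # nodes A to G in the diagram, numbered 1 to 7 here
leaders = [coordinator(r, n) for r in range(1, 9)]
print(leaders)
```

Because the leader rotates, every process gets a turn within any n consecutive rounds, so the one correct process that ◇S eventually stops suspecting will get to lead a round.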

Slide 74

Slide 74 text

Consensus: ◇S
Phase 2: send proposal
- no waiting!
[Diagram: nodes A to G; speech bubble: "a proposal!"]

Slide 80

Slide 80 text

Consensus: ◇S
Phase 3: gather votes
- move on when majority votes, OR
- cancel if all nodes realize B is down
[Diagram: nodes A to G; speech bubbles: "sgtm!", "I think you may have died!"]
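The vote-gathering step can be sketched as a small decision function. This is a simplified reading, not the algorithm as presented: the names `majority` and `round_outcome`, the "ack"/"nack" strings, and the three outcome labels are assumptions. It models the leader waiting for a majority of replies and deciding only if that majority all accepted; a "nack" stands for a node that suspected the leader instead of voting for its proposal.

```python
def majority(n):
    """Smallest number of processes that forms a majority of n."""
    return n // 2 + 1


def round_outcome(replies, n):
    """Leader's view of a round, given the replies received so far, in order.

    Each reply is "ack" (vote for the proposal) or "nack" (the sender
    suspected the leader).
    """
    needed = majority(n)
    if len(replies) < needed:
        return "wait"
    first_majority = replies[:needed]
    if all(r == "ack" for r in first_majority):
        return "decide"
    return "next round"


print(round_outcome(["ack", "ack", "ack"], n=7))
print(round_outcome(["ack", "ack", "ack", "ack"], n=7))
print(round_outcome(["ack", "nack", "ack", "ack"], n=7))
```

Waiting for only a majority is what lets the round finish while fewer than n/2 processes are down.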

Slide 82

Slide 82 text

Consensus: ◇S
Phase 4: decision
- no waiting, all done!
[Diagram: nodes A to G; speech bubble: "that’s okay, commit anyway!"]

Slide 85

Slide 85 text

A very simple example of ◇S: SWIM
[Diagram: nodes A and B; speech bubbles: "I’M ALIVE", "A looks fishy.", "oh hey, it’s back!"]

Slide 86

Slide 86 text

Expanding the Scope: new models, new problems

Slide 87

Slide 87 text

As of 1996
Model:
- asynchronous systems
- fail-stop processes
- no recovery
- no message losses
Problems:
- consensus
- atomic broadcast

Slide 88

Slide 88 text

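Failure detector classes (rows: completeness, columns: accuracy)

Completeness | Strong accuracy | Weak accuracy | Eventually strong accuracy | Eventually weak accuracy
Strong       | Perfect P       | Strong S      | Eventually Perfect ◇P      | Eventually Strong ◇S
Weak         | Q               | Weak W        | ◇Q                         | Eventually Weak ◇W

Later additions: timeless, heartbeat, leader (Ω)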

Slide 89

Slide 89 text

And more!
- Non-blocking Atomic Commit: ?P + ◇S
  - anonymously perfect (?P): if a crash happens, the FD is informed.
- Quiescent Communication (FLL): heartbeats
  - HB-completeness: if p[j] crashes, HB_i[j] stops increasing.
  - HB-accuracy: if p[j] is correct, HB_i[j] keeps increasing.
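The HB properties above describe a vector of counters rather than a list of suspects. A minimal Python sketch of that shape, with the class name `HeartbeatVector`, its methods, and the helper `stalled` all being illustrative assumptions:

```python
class HeartbeatVector:
    """Process i's counters HB_i[j], one per peer j."""

    def __init__(self, peers):
        self.hb = {p: 0 for p in peers}

    def on_heartbeat(self, peer):
        # Each heartbeat received from `peer` bumps its counter.
        self.hb[peer] += 1

    def snapshot(self):
        # Copy of the current counters, for later comparison.
        return dict(self.hb)


def stalled(before, after):
    """Peers whose counter did not increase between two snapshots."""
    return {p for p in after if after[p] <= before[p]}


hb = HeartbeatVector(["p1", "p2"])
hb.on_heartbeat("p1")
hb.on_heartbeat("p2")
before = hb.snapshot()
hb.on_heartbeat("p1")  # p2 stays silent from here on
after = hb.snapshot()
print(stalled(before, after))
```

No timeout appears anywhere in the detector itself: it only counts. The comparison between snapshots is left to whoever reads the counters.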

Slide 90

Slide 90 text

And even more!
Other models:
- Crashes & link failures (FLL)
- Network partitioning
- Crash/recovery
Other problems:
- non-blocking atomic commit
- group membership
- leader election
- k-set agreement
- reliable communication

Slide 91

Slide 91 text

Rephrasing problems
encapsulating complexity/hairy bits
[Speech bubble: "A, B & C look sus"]

Slide 92

Slide 92 text

Examples: productionization, SWIM, Phi Accrual

Slide 93

Slide 93 text

In production
Completeness & accuracy, PLUS:
- network efficiency & message load
- speed of first detection
- speed of knowledge propagation
- minimizing flappy alerts

Slide 94

Slide 94 text

SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol

Slide 95

Slide 95 text

SWIM: Additional features
- network:
  - constant message load per group member
  - propagate membership updates with gossip
- time to detection:
  - deterministic bound on failure detection latency
- prevent flappy alerts:
  - “suspect” nodes before declaring them dead
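The last bullet, suspecting a node before declaring it dead, amounts to a three-state machine. A rough Python sketch under stated assumptions: the `Member` class, the function names, and the `suspicion_timeout` parameter are hypothetical, and details of the real protocol (such as how a suspected node refutes the suspicion through gossip) are left out.

```python
class Member:
    """Local view of one group member: alive, suspect, or dead."""

    def __init__(self):
        self.state = "alive"
        self.suspected_at = None


def on_probe_failed(member, now):
    # A failed probe only raises suspicion; it does not kill the member.
    if member.state == "alive":
        member.state = "suspect"
        member.suspected_at = now


def on_ack(member):
    # Hearing from a suspected member clears the suspicion.
    if member.state == "suspect":
        member.state = "alive"
        member.suspected_at = None


def tick(member, now, suspicion_timeout):
    # Only a suspicion that survives the whole timeout becomes "dead".
    if member.state == "suspect" and now - member.suspected_at >= suspicion_timeout:
        member.state = "dead"


b = Member()
on_probe_failed(b, now=0.0)
on_ack(b)  # B answered in time, so no alert fires
print(b.state)
```

The grace period trades detection speed for fewer flappy alerts: a slow node gets a chance to answer before anyone acts on its apparent death.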

Slide 96

Slide 96 text

Randomized pings (SWIM)
k random nodes
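A sketch of how a probe through k random nodes might look, in Python. The function `probe` and the two callables `ping` and `ping_req` are hypothetical stand-ins for network calls, not a real library API; `members` is assumed to exclude the prober itself.

```python
import random


def probe(target, members, k, ping, ping_req, rng=random):
    """Return True if `target` answers directly or via any of k random helpers.

    ping(target) and ping_req(helper, target) are caller-supplied callables
    (assumptions here) that return True when an ack comes back.
    """
    if ping(target):
        return True
    others = [m for m in members if m != target]
    helpers = rng.sample(others, min(k, len(others)))
    # Ask each helper to ping the target on our behalf.
    return any(ping_req(helper, target) for helper in helpers)


# Hypothetical network: our direct link to B is broken, but C can reach B.
ok = probe(
    "B",
    members=["B", "C", "D"],
    k=2,
    ping=lambda target: False,
    ping_req=lambda helper, target: helper == "C",
)
print(ok)
```

Going through other nodes helps tell "B is down" apart from "my own link to B is bad", at a message cost that depends on k rather than on the group size.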

Slide 105

Slide 105 text

Gossip (SWIM)
[Diagram: node B; speech bubble: "Hey, I suspect B is dead!"]

Slide 106

Slide 106 text

Phi Accrual (φ)

Slide 107

Slide 107 text

Model (φ)
[Diagram: nodes A to G]
C is 25% likely to be down
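A rough sketch of how a suspicion level like this can be computed, in Python. It follows the usual description of the phi accrual detector: φ is the negative base-10 logarithm of the probability that the next heartbeat is merely late, estimated from recent inter-arrival times. The function names, the normal-distribution assumption, and the sample intervals are illustrative choices, not taken from the slides.

```python
import math
import statistics


def p_later(elapsed, intervals):
    """Probability that a heartbeat arrives later than `elapsed` seconds
    after the previous one, assuming normally distributed intervals."""
    mean = statistics.fmean(intervals)
    std = max(statistics.pstdev(intervals), 1e-6)  # avoid dividing by zero
    return 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))


def phi(elapsed, intervals):
    """Suspicion level: grows as the silence gets harder to explain."""
    return -math.log10(max(p_later(elapsed, intervals), 1e-300))


recent = [1.0, 1.1, 0.9, 1.0]  # hypothetical seconds between heartbeats
for elapsed in (1.0, 1.2, 1.5):
    print(elapsed, round(phi(elapsed, recent), 2))
```

The detector reports a number instead of a yes/no answer, so each application can pick its own threshold for the same node.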

Slide 108

Slide 108 text

Use cases: a job scheduler (φ)
- at 25%, stop sending it new jobs.
- at 50%, reschedule outstanding jobs on another node, and wait for recovery.
- at 75%,
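The tiers above map naturally onto a threshold function. A minimal Python sketch: the name `scheduler_action`, the `p_down` argument (the detector's estimate that the node is down, between 0 and 1), and the returned strings are assumptions. The 75% tier is left out because the slide does not say what happens there.

```python
def scheduler_action(p_down):
    """Pick a scheduler reaction from the estimated probability that a node is down."""
    # The slide's 75% tier is not spelled out, so it is not modeled here.
    if p_down >= 0.50:
        return "reschedule outstanding jobs on another node and wait for recovery"
    if p_down >= 0.25:
        return "stop sending new jobs"
    return "keep scheduling normally"


for estimate in (0.10, 0.30, 0.60):
    print(estimate, "->", scheduler_action(estimate))
```

Cheap, reversible reactions sit at low thresholds and expensive ones at high thresholds, which a binary alive/dead detector cannot express.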

Slide 109

Slide 109 text

Where to Go from Here
or, a bibliography
- FLP result
- Chandra/Toueg
- Reynal survey
- SWIM
- Phi Accrual Failure Detectors
- Guerraoui et al. survey
- Freiling et al. survey

Slide 110

Slide 110 text

Conclusion!
- Background: history, system models, consensus, impossibility
- The Paper: oracles, classification, solving consensus
- Expanding Scope: new models, new problems
- Examples: productionization, SWIM, Phi Accrual

Slide 115

Slide 115 text

Thanks!
@kiranb
kiranbot.com

Slide 116

Slide 116 text

Appendix!

Slide 117

Slide 117 text

Gossip (van Renesse et al.)

Slide 118

Slide 118 text

Gossip: ping one node
