Slide 1
Slide 1 text
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
Slide 2
Slide 2 text
Paul Hinze (phinze)
Slide 3
Slide 3 text
Paul Hinze phinze death stare
Slide 4
Slide 4 text
Armon Dadgar (armon), creator of Serf and Consul
Slide 5
Slide 5 text
ma
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
Process Group Membership Protocol: Who is alive?
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
Process Group Membership Protocol
Slide 11
Slide 11 text
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
Slide 12
Slide 12 text
Scalable: "The SWIM effort is motivated by the unscalability of traditional heartbeating protocols."
Slide 13
Slide 13 text
Heartbeating: Failure Detection + Membership. Diagram: nodes A, B, C.
Slide 14
Slide 14 text
Heartbeating
Slide 15
Slide 15 text
Evaluating Protocols: Completeness, Speed, Accuracy, Overhead
Slide 16
Slide 16 text
Evaluating Protocols: heartbeating
Completeness: Yes
Speed: Limit * Interval
Accuracy: High
Overhead: Nodes^2 (!)
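To make the Speed and Overhead entries for heartbeating concrete, here is a rough sketch. It assumes all-to-all heartbeating, where each node sends one heartbeat per interval to every other node and declares a peer failed after `limit` missed heartbeats. The function names are illustrative and not taken from the talk.

```python
def heartbeat_messages_per_interval(n: int) -> int:
    """Messages per interval when each of n nodes heartbeats every other node."""
    return n * (n - 1)


def heartbeat_detection_time(limit: int, interval: float) -> float:
    """Time until a silent node is declared failed: Limit * Interval."""
    return limit * interval


if __name__ == "__main__":
    # Cluster sizes borrowed from the Performance slide; any n works.
    for n in (56, 2000):
        print(n, "nodes:", heartbeat_messages_per_interval(n), "messages per interval")
```

The quadratic growth is the "Nodes^2" cell: going from 56 to 2000 nodes multiplies the per-interval message count by more than a thousand.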
Slide 17
Slide 17 text
Key Insight: Solve Failure Detection separately from State Updates.
Slide 18
Slide 18 text
Failure Detection. Diagram: nodes A, B, C, D; member list {B,C,D}; messages: ping!, ack!
Slide 19
Slide 19 text
Failure Detection. Diagram: nodes A, B, C, D; member list {B,C,D}; messages: ping!, ack!
Slide 20
Slide 20 text
Indirect Ping. Diagram: nodes B, C, D; member list {B,C,D}; direct ping: fail; ping(C)! sent twice; relayed ping and ack; ack! returned.
Slide 21
Slide 21 text
Indirect Ping. Diagram: nodes B, C, D; ping(C)! sent twice; both attempts: fail. "...C is dead!"
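The direct and indirect ping slides can be read as one probe routine. The sketch below is an in-memory simulation, not a network implementation: `can_reach` is a hypothetical stand-in for "a ping from src to dst is acknowledged before the timeout", and `k` (the number of helpers asked to relay) is an assumed parameter.

```python
import random
from typing import Callable, Iterable, Optional

# Hypothetical transport: True if a ping from src to dst is acked in time.
Reach = Callable[[str, str], bool]


def probe(prober: str, target: str, members: Iterable[str], can_reach: Reach,
          k: int = 2, rng: Optional[random.Random] = None) -> str:
    """Return "alive" or "failed" for target, as seen by prober."""
    rng = rng or random.Random()
    # Step 1: direct ping.
    if can_reach(prober, target):
        return "alive"
    # Step 2: ask up to k other members to ping the target on our behalf.
    helpers = [m for m in members if m not in (prober, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        # The request must reach the helper, and the helper must reach the target.
        if can_reach(prober, helper) and can_reach(helper, target):
            return "alive"
    # Neither the direct ping nor any relayed ping was acknowledged.
    return "failed"


if __name__ == "__main__":
    cluster = ["A", "B", "C", "D"]
    # Only the A <-> C link is broken, so a relay through B or D still works.
    print(probe("A", "C", cluster, lambda s, d: {s, d} != {"A", "C"}))
```

Relaying through other members is what keeps a single bad link between A and C from being mistaken for C's failure.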
Slide 22
Slide 22 text
Key Insight: Solve Failure Detection separately from State Updates.
Slide 23
Slide 23 text
Key Insight: Piggyback State Updates onto Failure Detection messages.
Slide 24
Slide 24 text
State Updates. Diagram: nodes A, B, C, D; ping! carries "(B is dead)"; ack! carries "(D just joined)"; member lists {B,C} and {D}.
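A minimal sketch of piggybacking, assuming each ping or ack carries a short list of `(event, node)` pairs. The `Message` and `Node` classes and the `max_updates` cap are illustrative assumptions. A real implementation would also limit how many times each update is retransmitted, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Update = Tuple[str, str]  # (event, node name); event is "joined" or "dead"


@dataclass
class Message:
    kind: str  # "ping" or "ack"
    updates: List[Update] = field(default_factory=list)


@dataclass
class Node:
    name: str
    members: Set[str] = field(default_factory=set)
    pending: List[Update] = field(default_factory=list)  # updates still to gossip

    def make_message(self, kind: str, max_updates: int = 3) -> Message:
        # Attach pending updates to a message that was being sent anyway.
        return Message(kind, list(self.pending[:max_updates]))

    def receive(self, msg: Message) -> None:
        for event, node in msg.updates:
            if event == "dead" and node in self.members:
                self.members.discard(node)
            elif event == "joined" and node != self.name and node not in self.members:
                self.members.add(node)
            else:
                continue  # nothing new, so do not pass it on again
            # News we had not heard before is queued for our own outgoing messages.
            self.pending.append((event, node))


if __name__ == "__main__":
    a = Node("A", members={"C"}, pending=[("dead", "B")])
    c = Node("C", members={"A", "B"}, pending=[("joined", "D")])
    c.receive(a.make_message("ping"))  # the ping carries "B is dead"
    a.receive(c.make_message("ack"))   # the ack carries "D just joined"
    print(sorted(a.members), sorted(c.members))
```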
Slide 25
Slide 25 text
Infection Style: weakly consistent. Diagram: nodes A, B, C, D; member lists {B,C}, {A,B,D}, {A,B}.
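Infection-style spreading can be illustrated with a small simulation. This sketch assumes a simplified push model in which every node that knows an update tells one uniformly random other node per round. It is not the exact SWIM dissemination rule, and the function name and parameters are hypothetical.

```python
import random


def rounds_to_infect_all(n: int, seed: int = 0, max_rounds: int = 10_000) -> int:
    """Count gossip rounds until all n nodes have heard an update started by node 0."""
    rng = random.Random(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n and rounds < max_rounds:
        newly = set()
        for node in infected:
            # Pick uniformly among the other n - 1 nodes.
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1
            newly.add(peer)
        infected |= newly
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (56, 2000):
        print(n, "nodes reached in", rounds_to_infect_all(n), "rounds")
```

Because the informed set can at most double each round, at least log2(n) rounds are needed. Between rounds, different nodes hold different views, which is the "weakly consistent" part.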
Slide 26
Slide 26 text
Evaluating Protocols: SWIM
Completeness: Yes, eventually
Speed: 1 * Interval
Accuracy: High-ish
Overhead: O(N)
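The O(N) overhead entry can be checked with a back-of-the-envelope count. This sketch assumes a simplified model that is not taken from the slides: each node probes one peer per interval (ping plus ack), and in the worst case a failed direct ping is followed by `k` relayed probes of four messages each.

```python
def swim_messages_per_interval(n: int, k: int = 3,
                               all_direct_pings_fail: bool = False) -> int:
    """Cluster-wide messages per interval under the simplified model above."""
    if all_direct_pings_fail:
        # 1 unanswered ping, then per helper: ping-req, relayed ping, ack, relayed ack.
        per_node = 1 + 4 * k
    else:
        per_node = 2  # ping + ack
    return n * per_node


if __name__ == "__main__":
    for n in (56, 2000):
        print(n, "nodes:", swim_messages_per_interval(n), "messages per interval")
```

Either way the per-node load is a constant that does not depend on cluster size, so the total grows linearly with N.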
Slide 27
Slide 27 text
Improvements: Time Bounded Completeness, Increased Accuracy
Slide 28
Slide 28 text
Completeness. Diagram: nodes A, B, N; member list {B, C, D, ..., N}.
Slide 29
Slide 29 text
Completeness. Diagram: nodes A, B, N; member list {B, C, D, ..., N}.
1. Shuffle List
2. Iterate
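The "Shuffle List, Iterate" steps can be sketched as a probe schedule. The `ProbeSchedule` class below is a hypothetical illustration: it shuffles the member list once, walks it in order, and reshuffles after each full pass, so every member is probed exactly once per pass.

```python
import random
from typing import Iterable, Optional


class ProbeSchedule:
    """Round-robin probe order over a shuffled member list."""

    def __init__(self, members: Iterable[str], rng: Optional[random.Random] = None):
        self._rng = rng or random.Random()
        self._order = list(members)
        if not self._order:
            raise ValueError("need at least one member to probe")
        self._rng.shuffle(self._order)  # 1. Shuffle List
        self._next = 0

    def next_target(self) -> str:
        # 2. Iterate; reshuffle once the whole list has been visited.
        if self._next >= len(self._order):
            self._rng.shuffle(self._order)
            self._next = 0
        target = self._order[self._next]
        self._next += 1
        return target


if __name__ == "__main__":
    schedule = ProbeSchedule(["B", "C", "D", "E"])
    print([schedule.next_target() for _ in range(8)])
```

With purely random target selection a failed node might go unprobed for a long time. Here it is probed within one pass of the list, which bounds the detection time.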
Slide 30
Slide 30 text
Completeness: Fixed Time
Slide 31
Slide 31 text
Evaluating Protocols: SWIM
Completeness: Yes, fixed time
Speed: 1 * Interval
Accuracy: High-ish
Overhead: O(N)
Slide 32
Slide 32 text
Accuracy. Diagram: nodes A, B, C, D. "...C is MAYBE dead!"
Slide 33
Slide 33 text
Accuracy. Diagram: nodes A, B, C, D. "I heard C might be dead."
Slide 34
Slide 34 text
Accuracy. Diagram: nodes A, B, C, D. "I'm not dead!"
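The three Accuracy slides describe a suspicion step between "alive" and "dead". The sketch below is one way to model it, under stated assumptions: a failed probe marks a member as suspect, an alive message with a higher incarnation number refutes the suspicion, and an unrefuted suspicion becomes a death after `timeout`. The class names, the timeout, and treating "dead" as final are simplifications.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Member:
    status: str = "alive"  # "alive", "suspect", or "dead"
    incarnation: int = 0
    suspected_at: Optional[float] = None


class SuspicionTable:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.members: Dict[str, Member] = {}

    def on_probe_failed(self, name: str, now: float) -> None:
        # "...C is MAYBE dead!": suspect first instead of declaring death.
        m = self.members.setdefault(name, Member())
        if m.status == "alive":
            m.status, m.suspected_at = "suspect", now

    def on_alive(self, name: str, incarnation: int) -> None:
        # "I'm not dead!": a newer incarnation overrides the suspicion.
        m = self.members.setdefault(name, Member())
        if m.status != "dead" and incarnation > m.incarnation:
            m.status, m.incarnation, m.suspected_at = "alive", incarnation, None

    def tick(self, now: float) -> None:
        # Suspicions that were never refuted turn into confirmed failures.
        for m in self.members.values():
            if m.status == "suspect" and now - m.suspected_at >= self.timeout:
                m.status = "dead"


if __name__ == "__main__":
    table = SuspicionTable(timeout=5.0)
    table.on_probe_failed("C", now=0.0)
    table.on_alive("C", incarnation=1)
    table.tick(now=10.0)
    print(table.members["C"].status)
```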
Slide 35
Slide 35 text
Accuracy
Slide 36
Slide 36 text
Evaluating Protocols: SWIM
Completeness: Yes, fixed time
Speed: 1 * Interval
Accuracy: High
Overhead: O(N)
Slide 37
Slide 37 text
Evaluating Protocols: SWIM
Completeness: Yes, fixed time
Speed: 1 * Interval
Accuracy: High
Overhead: O(N)
Slide 38
Slide 38 text
No content
Slide 39
Slide 39 text
Limitations
Problem: Update Latency
Solution: Separate Gossip Timer
Slide 40
Slide 40 text
Limitations
Problem: Cannot Handle Network Partitions
Solution: Track and Retry Recently Dead Nodes
Slide 41
Slide 41 text
Limitations
Problem: No Concept of Graceful Leave (vs Failure)
Solution: Broadcast and Tracking of "Intents"
Slide 42
Slide 42 text
Limitations
Problem: New Nodes Take Too Long to Materialize Initial State
Solution: Anti-entropy TCP State Syncs
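A full state sync boils down to merging two membership views. The sketch below assumes each view maps a node name to an `(incarnation, status)` pair and that the entry with the strictly higher incarnation wins, with ties keeping the local entry. That representation and tie-break are assumptions for illustration, and the TCP exchange itself is omitted.

```python
from typing import Dict, Tuple

State = Dict[str, Tuple[int, str]]  # node name -> (incarnation, status)


def merge_state(local: State, remote: State) -> State:
    """Merge a peer's full membership view into ours."""
    merged = dict(local)
    for name, (incarnation, status) in remote.items():
        # Adopt entries we have never seen, or ones that are strictly newer.
        if name not in merged or incarnation > merged[name][0]:
            merged[name] = (incarnation, status)
    return merged


if __name__ == "__main__":
    new_node: State = {}
    seed_view: State = {"A": (1, "alive"), "B": (3, "dead"), "C": (0, "alive")}
    # A freshly joined node learns the whole cluster in a single exchange.
    print(merge_state(new_node, seed_view))
```

A joining node gets the whole picture from one exchange instead of waiting for piggybacked updates to trickle in.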
Slide 43
Slide 43 text
Limitations
Problem: No built-in facility for user data
Solution: Implement user payloads (ordering via Lamport clocks)
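For readers unfamiliar with Lamport clocks, here is a minimal sketch of the textbook rule: increment before sending, and on receipt jump past the larger of the local and received timestamps. The class and method names are illustrative and are not the API of any of the libraries mentioned in the talk.

```python
class LamportClock:
    """Logical clock that orders events without synchronized wall clocks."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event or outgoing message: advance the clock by one.
        self.time += 1
        return self.time

    def witness(self, remote_time: int) -> int:
        # Incoming message: move past both our time and the sender's.
        self.time = max(self.time, remote_time) + 1
        return self.time


if __name__ == "__main__":
    sender, receiver = LamportClock(), LamportClock()
    stamp = sender.tick()    # timestamp attached to a user payload
    receiver.witness(stamp)  # the receiver's later events sort after the payload
    print(stamp, receiver.time)
```

Payloads stamped this way can be ordered by timestamp, so a receiver can tell a stale event from a newer one even when gossip delivers them out of order.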
Slide 44
Slide 44 text
Limitations
Problem: No peer metadata, only IP Addresses
Solution: Inject versioned peer metadata into state messages
Slide 45
Slide 45 text
Limitations
Problem: No encryption
Solution: Implement AES-GCM with key rotation
Slide 46
Slide 46 text
Performance (charts): 56 nodes, 2K nodes
Slide 47
Slide 47 text
Implementations
Memberlist: https://github.com/hashicorp/memberlist
Serf lib: https://github.com/hashicorp/serf
Serf: http://www.serfdom.io
Consul: http://www.consul.io
Slide 48
Slide 48 text
Implementations: events, queries, and scripts. Diagram (stack): serf CLI, serf lib, memberlist.
Slide 49
Slide 49 text
Implementations: service discovery, K/V, health checks. Diagram (stack): consul, serf lib, memberlist.
Slide 50
Slide 50 text
Implementations: service discovery, K/V, health checks
Slide 51
Slide 51 text
Thanks. Cloud icons by Julien Deveaux from the Noun Project.