Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SWIM: Scalable Weakly Consistent Infection Styl...
Search
Paul Hinze
July 22, 2015
Technology
2
390
SWIM: Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Papers We Love, Chicago
July 22, 2015
Paul Hinze
July 22, 2015
Tweet
Share
More Decks by Paul Hinze
See All by Paul Hinze
Getting Good at System Failure Analysis
phinze
0
470
Applying Graph Theory to Infrastructure As Code
phinze
6
2.2k
Infrastructure as Code with Terraform and Friends
phinze
2
320
Smoke & Mirrors: The Primitives of High Availability
phinze
1
730
Git: Everybody's Favorite MMO
phinze
0
200
Shut Up and Pipe! Unix-style Object Collaboration in Rack and Vagrant
phinze
0
180
Freighthop: Vagrant on Rails
phinze
0
320
Puppet Modules Are Our Friends
phinze
0
120
Who Needs Clouds?: HA in Your Datacenter
phinze
1
530
Other Decks in Technology
See All in Technology
Phase12_総括_自走化
overflowinc
0
1.4k
Laravelで学ぶOAuthとOpenID Connectの基礎と実装
kyoshidaxx
4
1.8k
LLMに何を任せ、何を任せないか
cap120
10
4.8k
20260321_エンベディングってなに?RAGってなに?エンベディングの説明とGemini Embedding 2 の紹介
tsho
0
160
Astro Islandsの 内部実装を 「日本で一番わかりやすく」 ざっくり解説!
knj
1
260
JEDAI認定プログラム JEDAI Order 2026 受賞者一覧 / JEDAI Order 2026 Winners
databricksjapan
0
300
PostgreSQL 18のNOT ENFORCEDな制約とDEFERRABLEの関係
yahonda
0
110
Phase05_ClaudeCode入門
overflowinc
0
2k
大規模ECサイトのあるバッチのパフォーマンスを改善するために僕たちのチームがしてきたこと
panda_program
1
390
GitHub Copilot CLI で Azure Portal to Bicep
tsubakimoto_s
0
180
AIエージェント×GitHubで実現するQAナレッジの資産化と業務活用 / QA Knowledge as Assets with AI Agents & GitHub
tknw_hitsuji
0
220
Phase02_AI座学_応用
overflowinc
0
2.7k
Featured
See All Featured
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
220
The Mindset for Success: Future Career Progression
greggifford
PRO
0
290
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.7k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Evolving SEO for Evolving Search Engines
ryanjones
0
170
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
The World Runs on Bad Software
bkeepers
PRO
72
12k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
110
The Pragmatic Product Professional
lauravandoore
37
7.2k
Visualization
eitanlees
150
17k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.8k
Transcript
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Paul Hinze phinze
Paul Hinze phinze death stare
Armon Dadgar armon creator of Serf and Consul
ma
None
None
Process Group Membership Protocol Who is alive
None
Process Group Membership Protocol
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Scalable The SWIM effort is motivated by the unscalability of
traditional heartbeating protocols.
Heartbeating A B C A A B B Failure Detection
+ Membership
Heartbeating
Evaluating Protocols Completeness Speed Accuracy Overhead
Evaluating Protocols Completeness Speed Accuracy Overhead heartbeating Yes Limit *
Interval High Nodes2 !
Key Insight Failure Detection State Updates. Solve separately from
Failure Detection ping! ack! A B C D {B,C,D}
Failure Detection ping! ack! A B C D {B,C,D}
Indirect Ping ping(C)! ack! B C D ping(C)! ping ack
fail {B,C,D}
Indirect Ping ping(C)! B C D ping(C)! fail ...C is
dead! fail
Key Insight Failure Detection State Updates. Solve separately from
Key Insight State Updates Failure Detection Piggyback onto messages.
State Updates ping! (B is dead) ack! (D just joined)
A B C D {B,C} {D}
Infection Style A B C D {B,C} {A,B,D} {A,B} weakly
consistent
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, eventually 1
* Interval High-ish O(N)
Improvements Time Bounded Completeness Increased Accuracy
Completeness A B {B, C, D, ..., N} N
Completeness A B {B, C, D, ..., N} N 1.
Shuffle List 2. Iterate
Completeness Fixed Time
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High-ish O(N)
Accuracy B C D ...C is MAYBE dead! A
Accuracy B C D I heard C might be dead.
A
Accuracy B C D I'm not dead! A
Accuracy
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
None
Limitations Update Latency Problem Solution Separate Gossip Timer
Limitations Cannot Handle Network Partitions Problem Solution Track and Retry
Recently Dead Nodes
Limitations No Concept of Graceful Leave (vs Failure) Problem Solution
Broadcast and Tracking of "Intents"
Limitations New Nodes Take Too Long Materialize Initial State Problem
Solution Anti-entropy TCP State Syncs
Limitations No built-in facility for user data Problem Solution Implement
user payloads (ordering via lamport clocks)
Limitations No peer metadata, only IP Addresses Problem Solution Inject
versioned peer metadata into state messages
Limitations No encryption Problem Solution Implement AES-GCM with key rotation
56 nodes 2K nodes Performance
Implementations Memberlist https://github.com/hashicorp/memberlist Serf lib https://github.com/hashicorp/serf Serf http://www.serfdom.io Consul http://www.consul.io
Implementations events, queries, and scripts serf CLI serf lib memberlist
Implementations service discovery, K/V, health checks consul serf lib memberlist
Implementations service discovery, K/V, health checks
Thanks Cloud icons by Julien Deveaux from the Noun Project