Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SWIM: Scalable Weakly Consistent Infection Styl...
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Paul Hinze
July 22, 2015
Technology
400
2
Share
SWIM: Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Papers We Love, Chicago
July 22, 2015
Paul Hinze
July 22, 2015
More Decks by Paul Hinze
See All by Paul Hinze
Getting Good at System Failure Analysis
phinze
0
480
Applying Graph Theory to Infrastructure As Code
phinze
6
2.2k
Infrastructure as Code with Terraform and Friends
phinze
2
330
Smoke & Mirrors: The Primitives of High Availability
phinze
1
760
Git: Everybody's Favorite MMO
phinze
0
220
Shut Up and Pipe! Unix-style Object Collaboration in Rack and Vagrant
phinze
0
190
Freighthop: Vagrant on Rails
phinze
0
340
Puppet Modules Are Our Friends
phinze
0
140
Who Needs Clouds?: HA in Your Datacenter
phinze
1
550
Other Decks in Technology
See All in Technology
APIテストとは?
nagix
0
180
探して_入れて_作って_使う_Agent_Skills___LT.pdf
peintangos
2
160
Platform Engineering as a Product: Criteria for Improvement and Multi-Tenant Design
kumorn5s
0
490
AI フレンドリーなエラー監視を TypeScript で実現する
shinyaigeek
2
250
エンジニアは生成AIと どのように向き合うべきか? ことばの意味という観点から
verypluming
3
340
AI活用を推進するために ファインディが下した、一つの小さな決断
starfish719
0
230
Spring Boot における AOT Cache 活用テクニックと 起動時間改善事例
ntt_dsol_java
0
200
Chart.js が簡単に使えるようになっていたので OGP 画像生成に使った話
kamekyame
0
140
JEP 522 Deep Dive - G1 GC同期コスト削減によるスループット向上を徹底検証&解説
tabatad
1
710
サプライチェーンセキュリティの空白地帯 - 信頼できる”依存性”の未来を考える
rung
PRO
2
650
ポスター発表&デモと総括 / Poster Presentations & Demonstrations and Summary
ks91
PRO
0
190
サイバーセキュリティ概論 / Introduction to Cybersecurity
ks91
PRO
0
130
Featured
See All Featured
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
1
250
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
1
370
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
1.1k
Building Flexible Design Systems
yeseniaperezcruz
330
40k
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.5k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
201
74k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
150
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
190
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
1
530
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
For a Future-Friendly Web
brad_frost
183
10k
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
130
Transcript
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Paul Hinze phinze
Paul Hinze phinze death stare
Armon Dadgar armon creator of Serf and Consul
ma
None
None
Process Group Membership Protocol Who is alive
None
Process Group Membership Protocol
SWIM Scalable Weakly Consistent Infection Style Process Group Membership Protocol
Scalable The SWIM effort is motivated by the unscalability of
traditional heartbeating protocols.
Heartbeating A B C A A B B Failure Detection
+ Membership
Heartbeating
Evaluating Protocols Completeness Speed Accuracy Overhead
Evaluating Protocols Completeness Speed Accuracy Overhead heartbeating Yes Limit *
Interval High Nodes2 !
Key Insight Failure Detection State Updates. Solve separately from
Failure Detection ping! ack! A B C D {B,C,D}
Failure Detection ping! ack! A B C D {B,C,D}
Indirect Ping ping(C)! ack! B C D ping(C)! ping ack
fail {B,C,D}
Indirect Ping ping(C)! B C D ping(C)! fail ...C is
dead! fail
Key Insight Failure Detection State Updates. Solve separately from
Key Insight State Updates Failure Detection Piggyback onto messages.
State Updates ping! (B is dead) ack! (D just joined)
A B C D {B,C} {D}
Infection Style A B C D {B,C} {A,B,D} {A,B} weakly
consistent
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, eventually 1
* Interval High-ish O(N)
Improvements Time Bounded Completeness Increased Accuracy
Completeness A B {B, C, D, ..., N} N
Completeness A B {B, C, D, ..., N} N 1.
Shuffle List 2. Iterate
Completeness Fixed Time
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High-ish O(N)
Accuracy B C D ...C is MAYBE dead! A
Accuracy B C D I heard C might be dead.
A
Accuracy B C D I'm not dead! A
Accuracy
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
Evaluating Protocols Completeness Speed Accuracy Overhead SWIM Yes, fixed time
1 * Interval High O(N)
None
Limitations Update Latency Problem Solution Separate Gossip Timer
Limitations Cannot Handle Network Partitions Problem Solution Track and Retry
Recently Dead Nodes
Limitations No Concept of Graceful Leave (vs Failure) Problem Solution
Broadcast and Tracking of "Intents"
Limitations New Nodes Take Too Long Materialize Initial State Problem
Solution Anti-entropy TCP State Syncs
Limitations No built-in facility for user data Problem Solution Implement
user payloads (ordering via lamport clocks)
Limitations No peer metadata, only IP Addresses Problem Solution Inject
versioned peer metadata into state messages
Limitations No encryption Problem Solution Implement AES-GCM with key rotation
56 nodes 2K nodes Performance
Implementations Memberlist https://github.com/hashicorp/memberlist Serf lib https://github.com/hashicorp/serf Serf http://www.serfdom.io Consul http://www.consul.io
Implementations events, queries, and scripts serf CLI serf lib memberlist
Implementations service discovery, K/V, health checks consul serf lib memberlist
Implementations service discovery, K/V, health checks
Thanks Cloud icons by Julien Deveaux from the Noun Project