Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Dan Hopkins
April 19, 2014
Programming
190
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
170
Other Decks in Programming
See All in Programming
Dataformのリポジトリを立ち上げるときにまずやること / dataform-day0-2026
snhryt
0
160
生成AI時代にこそ効くGo | Why Go Works in the Age of Generative AI
mom0tomo
8
3.2k
Vite+ Unified Toolchain for the Web
naokihaba
0
310
3Dシーンの圧縮
fadis
1
770
過去最大のMCPアップデート! 2026-07-28 RC版の謎に迫る
licux
6
330
TAKTでAI駆動開発の品質を設計する
j5ik2o
7
1.3k
さぁV100、メモリをお食べ・・・
nilpe
0
140
Hunting Vulnerabilities in Symfony with LLMs
vinceamstoutz
0
540
軽量Java基盤の設計 DIコンテナに頼らない、長期保守と1秒起動の実現 JJUG CCC 2026 Spring
macha64
0
520
ADKを使って簡単にAIエージェントを作ってみよう
k1mu21
0
260
net-httpのHTTP/2対応について
naruse
0
480
The ROI of Quarkus for Spring Boot Applications
hollycummins
0
120
Featured
See All Featured
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
460
Paper Plane
katiecoart
PRO
1
51k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
230
23k
KATA
mclloyd
PRO
35
15k
Color Theory Basics | Prateek | Gurzu
gurzu
0
360
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.7k
The Limits of Empathy - UXLibs8
cassininazir
1
360
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
62
54k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
410
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
1
540
VelocityConf: Rendering Performance Case Studies
addyosmani
333
25k
Paper Plane (Part 1)
katiecoart
PRO
0
9k
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?