Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Dan Hopkins
April 19, 2014
Programming
1
190
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
Tweet
Share
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
150
Other Decks in Programming
See All in Programming
Claude Code + Container Use と Cursor で作る ローカル並列開発環境のススメ / ccc local dev
kaelaela
5
2.4k
Railsアプリケーションと パフォーマンスチューニング ー 秒間5万リクエストの モバイルオーダーシステムを支える事例 ー Rubyセミナー 大阪
falcon8823
5
1.1k
なんとなくわかった気になるブロックテーマ入門/contents.nagoya 2025 6.28
chiilog
1
270
AI時代のソフトウェア開発を考える(2025/07版) / Agentic Software Engineering Findy 2025-07 Edition
twada
PRO
76
25k
#QiitaBash MCPのセキュリティ
ryosukedtomita
1
990
Discover Metal 4
rei315
2
130
PostgreSQLのRow Level SecurityをPHPのORMで扱う Eloquent vs Doctrine #phpcon #track2
77web
2
510
0626 Findy Product Manager LT Night_高田スライド_speaker deck用
mana_takada
0
160
iOS 26にアップデートすると実機でのHot Reloadができない?
umigishiaoi
0
130
AIエージェントはこう育てる - GitHub Copilot Agentとチームの共進化サイクル
koboriakira
0
510
AIともっと楽するE2Eテスト
myohei
0
330
Deep Dive into ~/.claude/projects
hiragram
13
2.5k
Featured
See All Featured
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.4k
How to Ace a Technical Interview
jacobian
277
23k
Writing Fast Ruby
sferik
628
62k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
RailsConf 2023
tenderlove
30
1.1k
Java REST API Framework Comparison - PWX 2021
mraible
31
8.7k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
46
9.6k
Fireside Chat
paigeccino
37
3.5k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
657
60k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
Code Reviewing Like a Champion
maltzj
524
40k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?