Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Dan Hopkins
April 19, 2014
Programming
190
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
170
Other Decks in Programming
See All in Programming
エンジニアと一緒にテストコードの設計と実装を改善した話
mototakatsu
0
180
The ROI of Quarkus for Spring Boot Applications
hollycummins
0
120
キャリア迷子上等 ─ "ない道"は自分で作ればいい
16bitidol
3
2.1k
ふつうのFeature Flag実践入門
irof
7
3.9k
Semantic Version 単位で戦略を柔軟に変えて、パッケージアップデートを自動化する
daitasu
1
240
Honoでのサプライチェーン侵害対策 〜 3つのライブラリに学ぶ
yusukebe
6
1.1k
Creating Composable Callables in Contemporary C++
rollbear
0
130
jQueryをバージョンアップする前に使いたいjQuery Migrate
matsuo_atsushi
0
500
「なぜそう決めたのか」を残し続ける仕組み ― Notion AI カスタムエージェント × Slack連携による設計判断の自動記録 - NIKKEI Tech Talk #47
niftycorp
PRO
0
170
Dataformのリポジトリを立ち上げるときにまずやること / dataform-day0-2026
snhryt
0
160
Skillsは効率化、Agentsは"自分の拡張"——Builder時代のエージェント編成(CC Night 2026)
wemra
1
130
ローカルLLMを使ってB2Bサービスを作っていての学び
yaotti
0
170
Featured
See All Featured
Chasing Engaging Ingredients in Design
codingconduct
0
220
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
The Pragmatic Product Professional
lauravandoore
37
7.3k
Prompt Engineering for Job Search
mfonobong
0
340
Lightning Talk: Beautiful Slides for Beginners
inesmontani
PRO
2
580
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.5k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
10k
Reflections from 52 weeks, 52 projects
jeffersonlam
356
21k
The agentic SEO stack - context over prompts
schlessera
0
820
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
2
330
Joys of Absence: A Defence of Solitary Play
codingconduct
1
390
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
170
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?