There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
Programming
Transcript
There's No Clusterf*ck without a Cluster
How @GoVictorOps went from unicorns and broken to boring and working
Premature availabilization?
• Connect you with your monitors
• Harass you when stuff breaks
@boulderDanH
Availability is our DNA
• Scala
• Akka
• Kafka
• Shard key
What is clustering?
An online encyclopedia says
• Computers working together
(appeals to authority)
A dictionary says
• clus·ter noun \ˈkləs-tər\: a number of similar things that occur together
(includes pronunciation for legitimacy)
Our definition
• Who is currently in the cluster?
• Tell me when nodes are coming and going
• High Availability / scaling
Requirements 1.0
1. Logical actor tree
2. Service discovery
3. Lead me to success
Logical actor tree
• Failover
• Hand off
Service discovery
• “cluster://user/victorops/broadcaster” ! “hello”
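The slide addresses an actor by a logical cluster path rather than by the machine that hosts it. A minimal sketch of that idea follows, assuming a directory that maps logical paths to whichever node currently hosts them. `Node`, `ClusterDirectory`, `register`, `tell`, and the example addresses are hypothetical names for illustration, not the API the talk used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical cluster node; records the messages delivered to it."""
    address: str
    inbox: List[Tuple[str, str]] = field(default_factory=list)

    def deliver(self, path: str, message: str) -> None:
        self.inbox.append((path, message))


class ClusterDirectory:
    """Maps logical actor paths to the node that currently hosts them."""

    SCHEME = "cluster://"

    def __init__(self) -> None:
        self._hosts: Dict[str, Node] = {}

    def register(self, path: str, node: Node) -> None:
        # Registering an existing path again models failover or hand-off.
        self._hosts[path] = node

    def tell(self, uri: str, message: str) -> str:
        """Deliver a message to a logical URI; return the host's address."""
        if not uri.startswith(self.SCHEME):
            raise ValueError(f"not a cluster URI: {uri}")
        path = "/" + uri[len(self.SCHEME):]
        node = self._hosts[path]  # raises KeyError if nobody hosts the path
        node.deliver(path, message)
        return node.address


directory = ClusterDirectory()
node_a = Node("akka.tcp://app@10.0.0.1:2552")  # made-up addresses
node_b = Node("akka.tcp://app@10.0.0.2:2552")

directory.register("/user/victorops/broadcaster", node_a)
first = directory.tell("cluster://user/victorops/broadcaster", "hello")

# Failover: the same logical path now lives on another node.
directory.register("/user/victorops/broadcaster", node_b)
second = directory.tell("cluster://user/victorops/broadcaster", "hello")
```

The sender uses the same URI before and after the failover; only the directory entry changes.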
Tradeoffs are everywhere
• Vector clocks are totally cool
• Async consensus?
Implementation
• Routers / Patterns
• Native = Truth
Actor state
• Easy and Tempting
• Painful to unwind
What could go wrong?
• Partitions are permanent
• Want some config? How about six!
  ◦ failure-detector.threshold x 2
  ◦ failure-detector.min-std-deviation x 2
  ◦ failure-detector.acceptable-heartbeat-pause x 2
• Hazelcast uses hazelcast.max.no.heartbeat.seconds
• ZooKeeper uses “session timeout”
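To show how the three failure-detector settings interact, here is a rough sketch of a phi accrual failure detector, the kind Akka's documentation describes. It assumes heartbeat intervals are normally distributed and computes the tail probability with `math.erfc`; the function names `phi` and `is_suspected` and the sample numbers are illustrative assumptions, not Akka's code.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def phi(elapsed_ms: float, intervals_ms: Sequence[float],
        min_std_deviation_ms: float,
        acceptable_heartbeat_pause_ms: float) -> float:
    """Suspicion level for a node last heard from `elapsed_ms` ago.

    phi = -log10(P(the next heartbeat arrives later than elapsed_ms)),
    under a normal model of the observed heartbeat intervals.
    `min_std_deviation_ms` must be positive.
    """
    # The acceptable pause shifts the expected arrival time later.
    mu = mean(intervals_ms) + acceptable_heartbeat_pause_ms
    # The floor keeps a very regular heartbeat from making phi hypersensitive.
    sigma = max(pstdev(intervals_ms), min_std_deviation_ms)
    y = (elapsed_ms - mu) / sigma
    p_later = 0.5 * math.erfc(y / math.sqrt(2.0))  # 1 - normal CDF
    return -math.log10(max(p_later, 1e-300))  # clamp to avoid log10(0)


def is_suspected(elapsed_ms: float, intervals_ms: Sequence[float],
                 threshold: float, min_std_deviation_ms: float,
                 acceptable_heartbeat_pause_ms: float) -> bool:
    """A node is suspected once phi exceeds the configured threshold."""
    return phi(elapsed_ms, intervals_ms, min_std_deviation_ms,
               acceptable_heartbeat_pause_ms) > threshold


history = [1000.0] * 5  # five perfectly regular one-second heartbeats

on_time = is_suspected(1500.0, history, 8.0, 100.0, 0.0)
late = is_suspected(1600.0, history, 8.0, 100.0, 0.0)
late_but_tolerated = is_suspected(1600.0, history, 8.0, 100.0, 3000.0)
```

In this model a higher threshold or a longer acceptable pause means fewer false alarms but slower detection, which is why each knob needs tuning wherever it appears.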
More picking on Akka
• Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap
1. Logical actor tree
2. Service discovery
3. Lead me to success
Requirements 2.0
1. Member lists
2. Easy to configure, ability to add machines w/o config
3. Pass remoting address around
What is Hazelcast?
• Distributed maps & locks
• Multicast (IGMP)
Implementation
akka.remote.quarantine-systems-for = "off"
akka.remote.gate-invalid-addresses-for = 0 s
src: akka-devel
• Publish Akka address using a map
• Detect nodes joining / leaving cluster
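The two bullets above, publishing each node's address in a shared map and reacting when nodes join or leave, can be sketched as follows. `AddressMap` is a hypothetical in-memory stand-in for a distributed map with listeners. It is not Hazelcast's API, and a single-process dictionary does not exhibit the consistency problems a real cluster can.

```python
from typing import Callable, Dict, List, Tuple

# A listener receives (event, member_id, address).
Listener = Callable[[str, str, str], None]


class AddressMap:
    """In-memory stand-in for a distributed map keyed by member id."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, member_id: str, address: str) -> None:
        """Called by a node on startup to advertise its remoting address."""
        is_new = member_id not in self._entries
        self._entries[member_id] = address
        if is_new:
            self._notify("joined", member_id, address)

    def withdraw(self, member_id: str) -> None:
        """Called when a node leaves or is declared dead."""
        address = self._entries.pop(member_id, None)
        if address is not None:
            self._notify("left", member_id, address)

    def members(self) -> Dict[str, str]:
        return dict(self._entries)  # copy, so callers cannot mutate the map

    def _notify(self, event: str, member_id: str, address: str) -> None:
        for listener in self._listeners:
            listener(event, member_id, address)


events: List[Tuple[str, str, str]] = []
shared = AddressMap()
shared.subscribe(lambda event, member, addr: events.append((event, member, addr)))

shared.publish("node-1", "akka.tcp://app@10.0.0.1:2552")  # made-up addresses
shared.publish("node-2", "akka.tcp://app@10.0.0.2:2552")
shared.withdraw("node-1")
```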
What went wrong?
• Multicast
• In memory
• Cluster Client
• Member list isn’t consistent across cluster
Recap on requirements 2.0
1. Member lists
2. Easy to configure
3. Pass remoting address around
Requirements 3.0
1. Member list is consistent
2. Cluster clients are first class
Cluster Membership
• Consistent - Zk
• Probably consistent - Gossip
• YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper?
• Clustered, consistent file system
• API is focused on building distributed concepts
Implementation
• Cluster Membership = EPHEMERAL
• Leader Election = SEQUENTIAL
• “Cluster” = EPHEMERAL_SEQUENTIAL
• Store akka addresses in ephemeral nodes
• Curator project
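The mapping above can be illustrated with a toy model: ephemeral nodes vanish when their owner's session ends (membership), and sequential nodes get an increasing zero-padded suffix, so the lowest one can act as leader. `FakeZk` and its methods are hypothetical and run in memory only; this is not the ZooKeeper or Curator client API, and the `/cluster/member-` prefix and addresses are made up.

```python
from typing import Dict, List, Optional, Tuple


class FakeZk:
    """Tiny in-memory model of ephemeral sequential znodes."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._nodes: Dict[str, Tuple[str, str]] = {}  # name -> (session, data)

    def create_ephemeral_sequential(self, session_id: str, prefix: str,
                                    data: str) -> str:
        # Sequential: append a 10-digit, zero-padded, increasing counter.
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self._nodes[name] = (session_id, data)
        return name

    def expire_session(self, session_id: str) -> None:
        # Ephemeral: every node owned by the session disappears with it.
        self._nodes = {name: entry for name, entry in self._nodes.items()
                       if entry[0] != session_id}

    def children(self) -> List[str]:
        return sorted(self._nodes)

    def data(self, name: str) -> str:
        return self._nodes[name][1]


def members(zk: FakeZk) -> List[str]:
    """Cluster membership is the set of live ephemeral nodes."""
    return [zk.data(name) for name in zk.children()]


def leader(zk: FakeZk) -> Optional[str]:
    """The node with the lowest sequence number acts as leader."""
    names = zk.children()
    return zk.data(names[0]) if names else None


zk = FakeZk()
for i in (1, 2, 3):
    zk.create_ephemeral_sequential(
        f"session-{i}", "/cluster/member-", f"akka.tcp://app@10.0.0.{i}:2552")

leader_before = leader(zk)
zk.expire_session("session-1")  # the first node crashes or times out
leader_after = leader(zk)
```

Storing the Akka address as the node's data means a membership lookup also yields where to send messages.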
The Good
• Reputation
• Strong Consistency
• Cluster clients / Service Discovery
What was / is hard?
• Twitter’s Zk library
• External Cluster Manager
The final tally
• Solid concept of membership
• Keep things simple
• Log / Graph / Monitor everything
Questions?