Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Dan Hopkins
April 19, 2014
Programming
190
1
Share
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
160
Other Decks in Programming
See All in Programming
UIの境界線をデザインする | React Tokyo #15 メイントーク
sasagar
1
280
Linux Kernelの1文字のミスで 権限昇格ができた話
rqda
0
2.3k
TiDBのアーキテクチャから学ぶ分散システム入門 〜MySQL互換のNewSQLは何を解決するのか〜 / tidb-architecture-study
dznbk
1
160
PCOVから学ぶコードカバレッジ #phpcon_odawara
o0h
PRO
0
260
Rethinking API Platform Filters
vinceamstoutz
0
11k
煩雑なSkills管理をSoC(関心の分離)により解決する――関心を分離し、プロンプトを部品として育てるためのOSSを作った話 / Solving Complex Skills Management Through SoC (Separation of Concerns)
nrslib
4
870
YJITとZJITにはイカなる違いがあるのか?
nakiym
0
200
「接続」—パフォーマンスチューニングの最後の一手 〜点と点を結ぶ、その一瞬のために〜
kentaroutakeda
5
2.5k
PHP 7.4でもOpenTelemetryゼロコード計装がしたい! / PHPerKaigi 2026
arthur1
1
570
Running Swift without an OS
kishikawakatsumi
0
780
The Monolith Strikes Back: Why AI Agents ❤️ Rails Monoliths
serradura
0
310
Make GenAI Production-Ready with Kubernetes Patterns
bibryam
0
110
Featured
See All Featured
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
1
330
Ethics towards AI in product and experience design
skipperchong
2
250
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Code Review Best Practice
trishagee
74
20k
Speed Design
sergeychernyshev
33
1.6k
Designing for Performance
lara
611
70k
Embracing the Ebb and Flow
colly
88
5k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
52k
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?