Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Dan Hopkins
April 19, 2014
Programming
1
190
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
Tweet
Share
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
150
Other Decks in Programming
See All in Programming
FOSDEM 2026: STUNMESH-go: Building P2P WireGuard Mesh Without Self-Hosted Infrastructure
tjjh89017
0
170
Apache Iceberg V3 and migration to V3
tomtanaka
0
170
副作用をどこに置くか問題:オブジェクト指向で整理する設計判断ツリー
koxya
1
610
AIによるイベントストーミング図からのコード生成 / AI-powered code generation from Event Storming diagrams
nrslib
2
1.9k
【卒業研究】会話ログ分析によるユーザーごとの関心に応じた話題提案手法
momok47
0
200
AI時代のキャリアプラン「技術の引力」からの脱出と「問い」へのいざない / tech-gravity
minodriven
21
7.4k
なぜSQLはAIぽく見えるのか/why does SQL look AI like
florets1
0
480
CSC307 Lecture 07
javiergs
PRO
1
560
AI によるインシデント初動調査の自動化を行う AI インシデントコマンダーを作った話
azukiazusa1
1
750
Package Management Learnings from Homebrew
mikemcquaid
0
230
AtCoder Conference 2025
shindannin
0
1.1k
CSC307 Lecture 01
javiergs
PRO
0
690
Featured
See All Featured
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Context Engineering - Making Every Token Count
addyosmani
9
660
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.2k
BBQ
matthewcrist
89
10k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
71k
A Soul's Torment
seathinner
5
2.3k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
330
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
190
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.7k
How STYLIGHT went responsive
nonsquared
100
6k
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?