There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
Transcript
There's No Clusterf*ck without a Cluster
How @GoVictorOps went from unicorns and broken to boring and working
Premature availabilization?
• Connect you with your monitors
• Harass you when stuff breaks
@boulderDanH
Availability is our DNA
• Scala
• Akka
• Kafka
• Shard key
What is clustering?
An online encyclopedia says
• Computers working together
(appeals to authority)
A dictionary says
• clus·ter noun \ˈkləs-tər\ a number of similar things that occur together
(includes pronunciation for legitimacy)
Our definition
• Who is currently in the cluster?
• Tell me when nodes are coming and going
• High Availability / scaling
Requirements 1.0
1. Logical actor tree
2. Service discovery
3. Lead me to success
Logical actor tree
• Failover
• Hand off
Service discovery
• “cluster://user/victorops/broadcaster” ! “hello”
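The slide sends "hello" to a logical cluster path rather than to a fixed host. The sketch below illustrates that idea only. `ClusterRegistry`, the node names, and the in-memory handlers are hypothetical stand-ins, not the code from the talk and not an Akka API. It resolves a `cluster://` path to whichever node currently hosts the actor, so a failover only has to re-register the path.

```python
class ClusterRegistry:
    """Hypothetical in-memory registry: logical actor path -> hosting node."""

    PREFIX = "cluster://"

    def __init__(self):
        self._routes = {}  # logical path -> (node name, message handler)

    def register(self, logical_path, node, handler):
        # Registering the same path again models a failover or hand off.
        self._routes[logical_path] = (node, handler)

    def resolve(self, uri):
        if not uri.startswith(self.PREFIX):
            raise ValueError(f"not a cluster URI: {uri}")
        logical_path = "/" + uri[len(self.PREFIX):]
        if logical_path not in self._routes:
            raise LookupError(f"no node hosts {logical_path}")
        return self._routes[logical_path]

    def tell(self, uri, message):
        # Deliver the message and report which node handled it.
        node, handler = self.resolve(uri)
        handler(message)
        return node


registry = ClusterRegistry()
inbox_a, inbox_b = [], []

registry.register("/user/victorops/broadcaster", "node-a", inbox_a.append)
first = registry.tell("cluster://user/victorops/broadcaster", "hello")

# node-a goes away; node-b takes over the same logical path.
registry.register("/user/victorops/broadcaster", "node-b", inbox_b.append)
second = registry.tell("cluster://user/victorops/broadcaster", "hello again")
```

The sender never changes the path it uses. Only the registry entry moves, which is why service discovery and failover of the logical actor tree sit side by side in the requirements.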
Tradeoffs are everywhere
Vector clocks are totally cool
Async consensus?
Implementation
• Routers / Patterns
• Native = Truth
Actor state
• Easy and Tempting
• Painful to unwind
What could go wrong?
• Partitions are permanent
• Want some config? How about six!
  ◦ failure-detector.threshold x 2
  ◦ failure-detector.min-std-deviation x 2
  ◦ failure-detector.acceptable-heartbeat-pause x 2
• Hazelcast uses hazelcast.max.no.heartbeat.seconds
• ZooKeeper uses “session timeout”
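To see why the three settings named above interact, here is a minimal sketch of an accrual-style failure detector. It is an illustration under assumptions, not Akka's implementation: it models heartbeat intervals with a plain normal distribution, and the class name, parameter defaults, and timestamps are all made up for the example.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Illustrative accrual failure detector (assumed normal distribution)."""

    def __init__(self, threshold=8.0, min_std_deviation=0.1,
                 acceptable_heartbeat_pause=3.0, max_samples=100):
        self.threshold = threshold
        self.min_std_deviation = min_std_deviation
        self.acceptable_heartbeat_pause = acceptable_heartbeat_pause
        self._intervals = deque(maxlen=max_samples)
        self._last = None

    def heartbeat(self, now):
        # Record the gap since the previous heartbeat, in seconds.
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now):
        if self._last is None or not self._intervals:
            return 0.0
        n = len(self._intervals)
        raw_mean = sum(self._intervals) / n
        variance = sum((x - raw_mean) ** 2 for x in self._intervals) / n
        # A floor on the deviation keeps very regular heartbeats from
        # making the detector hypersensitive.
        std = max(math.sqrt(variance), self.min_std_deviation)
        # The acceptable pause shifts the expected arrival time later.
        mean = raw_mean + self.acceptable_heartbeat_pause
        elapsed = now - self._last
        # Probability that the next heartbeat arrives later than `elapsed`.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # avoid log10(0)
        return -math.log10(p_later)

    def is_available(self, now):
        return self.phi(now) < self.threshold


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0, 4.0):
    detector.heartbeat(t)

healthy = detector.is_available(5.0)     # one second of silence
suspected = detector.is_available(14.0)  # ten seconds of silence
```

Raising the threshold, the deviation floor, or the acceptable pause each makes the detector slower to suspect a node, so tuning one without the others is easy to get wrong. Having every setting in two places doubles that surface.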
More picking on Akka
• Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap
1. Logical actor tree
2. Service discovery
3. Lead me to success
Requirements 2.0
1. Member lists
2. Easy to configure, ability to add machines w/o config
3. Pass remoting address around
What is Hazelcast?
• Distributed maps & locks
• Multicast (IGMP)
Implementation
akka.remote.quarantine-systems-for = "off"
akka.remote.gate-invalid-addresses-for = 0 s
src: akka-devel
• Publish Akka address using a map
• Detect nodes joining / leaving cluster
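The two bullets above describe using a shared map as both an address book and a membership signal. The following sketch models that pattern in a single process. `SharedMap` is a hypothetical stand-in and deliberately not the Hazelcast API, and the node names and addresses are invented for illustration.

```python
class SharedMap:
    """In-process stand-in for a distributed map with entry listeners."""

    def __init__(self):
        self._entries = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback(event, node, address) with event "joined" or "left"
        self._listeners.append(callback)

    def put(self, node, address):
        is_new = node not in self._entries
        self._entries[node] = address
        if is_new:
            self._notify("joined", node, address)

    def remove(self, node):
        if node in self._entries:
            address = self._entries.pop(node)
            self._notify("left", node, address)

    def snapshot(self):
        return dict(self._entries)

    def _notify(self, event, node, address):
        for callback in self._listeners:
            callback(event, node, address)


events = []
addresses = SharedMap()
addresses.add_listener(lambda event, node, addr: events.append((event, node, addr)))

# Each node publishes its remoting address when it starts.
addresses.put("node-1", "akka.tcp://app@10.0.0.1:2552")
addresses.put("node-2", "akka.tcp://app@10.0.0.2:2552")

# A node leaving (or being evicted) removes its entry.
addresses.remove("node-1")
```

In one process every listener sees the same sequence of events. Across a real network that guarantee depends on the map implementation, which is the weak point a design like this has to watch for.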
What went wrong?
• Multicast
• In memory
• Cluster Client
• Member list isn’t consistent across cluster
Recap on requirements 2.0
1. Member lists
2. Easy to configure
3. Pass remoting address around
Requirements 3.0
1. Member list is consistent
2. Cluster clients are first class
Cluster Membership
• Consistent - Zk
• Probably consistent - Gossip
• YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper?
• Clustered, consistent file system
• API is focused on building distributed concepts
Implementation
• Cluster Membership = EPHEMERAL
• Leader Election = SEQUENTIAL
• “Cluster” = EPHEMERAL_SEQUENTIAL
• Store Akka addresses in ephemeral nodes
• Curator project
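The mapping above can be illustrated with a toy model. Ephemeral nodes disappear when their owner's session ends, which gives membership. Sequence numbers give a total order, which gives leader election. `FakeCoordinator` below is a hypothetical in-memory stand-in, not the ZooKeeper or Curator API, and the session names and addresses are invented.

```python
import itertools


class FakeCoordinator:
    """Toy model of ephemeral, sequential nodes held by sessions."""

    def __init__(self):
        self._counter = itertools.count()
        self._nodes = {}  # node name -> (session id, stored data)

    def create_ephemeral_sequential(self, session, prefix, data):
        # Zero-padded suffix so lexical order matches creation order.
        name = f"{prefix}{next(self._counter):010d}"
        self._nodes[name] = (session, data)
        return name

    def close_session(self, session):
        # Ephemeral: every node owned by the session vanishes with it.
        self._nodes = {
            name: entry for name, entry in self._nodes.items()
            if entry[0] != session
        }

    def members(self):
        # Membership is simply the data of all live nodes, oldest first.
        return [data for _, (_, data) in sorted(self._nodes.items())]

    def leader(self):
        # Lowest sequence number wins the election.
        current = self.members()
        return current[0] if current else None


zk = FakeCoordinator()
zk.create_ephemeral_sequential("session-a", "/cluster/member-", "akka.tcp://app@10.0.0.1:2552")
zk.create_ephemeral_sequential("session-b", "/cluster/member-", "akka.tcp://app@10.0.0.2:2552")
zk.create_ephemeral_sequential("session-c", "/cluster/member-", "akka.tcp://app@10.0.0.3:2552")

leader_before = zk.leader()
zk.close_session("session-a")  # the first node crashes or times out
leader_after = zk.leader()
```

Because every participant derives the member list and the leader from the same ordered set of nodes, they agree on both as long as they read from the same consistent store.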
The Good
• Reputation
• Strong Consistency
• Cluster clients / Service Discovery
What was / is hard?
• Twitter’s Zk library
• External Cluster Manager
The final tally
• Solid concept of membership
• Keep things simple
• Log / Graph / Monitor everything
Questions?