Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Dan Hopkins
April 19, 2014
Programming
1
190
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
Tweet
Share
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
150
Other Decks in Programming
See All in Programming
Go言語はstack overflowの夢を見るか?
logica0419
0
360
overlayPreferenceValue で実現する ピュア SwiftUI な AdMob ネイティブ広告
uhucream
0
180
AI Agent 時代的開發者生存指南
eddie
3
1.7k
Six and a half ridiculous things to do with Quarkus
hollycummins
0
180
大規模アプリのDIフレームワーク刷新戦略 ~過去最大規模の並行開発を止めずにアプリ全体に導入するまで~
mot_techtalk
1
460
私達はmodernize packageに夢を見るか feat. go/analysis, go/ast / Go Conference 2025
kaorumuta
2
590
3年ぶりにコードを書いた元CTOが Claude Codeと30分でMVPを作った話
maikokojima
0
500
Introducing ReActionView: A new ActionView-Compatible ERB Engine @ Kaigi on Rails 2025, Tokyo, Japan
marcoroth
3
1k
開発生産性を上げるための生成AI活用術
starfish719
3
1.2k
Railsだからできる 例外業務に禍根を残さない 設定設計パターン
ei_ei_eiichi
0
930
(Extension DC 2025) Actor境界を越える技術
teamhimeh
1
260
Cursorハンズオン実践!
eltociear
2
1.1k
Featured
See All Featured
Speed Design
sergeychernyshev
32
1.2k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
10
870
Music & Morning Musume
bryan
46
6.8k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.7k
Side Projects
sachag
455
43k
Building an army of robots
kneath
306
46k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
590
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
KATA
mclloyd
32
15k
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?