Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
There's no Clusterf*ck without a Cluster
Search
Dan Hopkins
April 19, 2014
Programming
1
190
There's no Clusterf*ck without a Cluster
Dan Hopkins
April 19, 2014
Tweet
Share
More Decks by Dan Hopkins
See All by Dan Hopkins
Actors: not just for movies anymore
danielhopkins
1
150
Other Decks in Programming
See All in Programming
Android端末で実現するオンデバイスLLM 2025
masayukisuda
1
150
AWS発のAIエディタKiroを使ってみた
iriikeita
1
190
1から理解するWeb Push
dora1998
7
1.9k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
370
OSS開発者という働き方
andpad
5
1.7k
Navigation 2 を 3 に移行する(予定)ためにやったこと
yokomii
0
230
Kiroで始めるAI-DLC
kaonash
2
590
Laravel Boost 超入門
fire_arlo
3
220
プロパティベーステストによるUIテスト: LLMによるプロパティ定義生成でエッジケースを捉える
tetta_pdnt
0
330
基礎から学ぶ大画面対応(Learning Large-Screen Support from the Ground Up)
tomoya0x00
0
460
デザイナーが Androidエンジニアに 挑戦してみた
874wokiite
0
440
プロポーザル駆動学習 / Proposal-Driven Learning
mackey0225
2
1.3k
Featured
See All Featured
How to Ace a Technical Interview
jacobian
279
23k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
18
1.1k
Imperfection Machines: The Place of Print at Facebook
scottboms
268
13k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Site-Speed That Sticks
csswizardry
10
810
The World Runs on Bad Software
bkeepers
PRO
70
11k
Raft: Consensus for Rubyists
vanstee
140
7.1k
Gamification - CAS2011
davidbonilla
81
5.4k
A Tale of Four Properties
chriscoyier
160
23k
Designing Experiences People Love
moore
142
24k
Producing Creativity
orderedlist
PRO
347
40k
Transcript
There's No Clusterf*ck without a Cluster How @GoVictorOps went from
unicorns and broken to boring and working
Premature availabilization? • Connect you with your monitors • Harass
you when stuff breaks @boulderDanH
Availability is our DNA • Scala • Akka • Kafka
• Shard key
What is clustering?
An online encyclopedia says • Computers working together (appeals to
authority)
A dictionary says • clus·ter noun \ˈkləs-tər\ a number of
similar things that occur together (includes pronunciation for legitimacy)
Our definition • Who is currently in the cluster? •
Tell me when nodes are coming and going • High Availability / scaling
Requirements 1.0 1. Logical actor tree 2. Service discovery 3.
Lead me to success
Logical actor tree • Failover • Hand off
Service discovery • “cluster://user/victorops/broadcaster” ! “hello”
Tradeoffs are everywhere Vector clocks are totally cool Async consensus?
None
Implementation • Routers / Patterns • Native = Truth
Actor state • Easy and Tempting • Painful to unwind
None
None
What could go wrong? • Partitions are permanent • Want
some config? How about six! ◦ failure-detector.threshold x 2 ◦ failure-detector.min-std-deviation x 2 ◦ failure-detector.acceptable-heartbeat-pause x 2 • Hazelcast uses hazelcast.max.no.heartbeat.seconds • ZooKeeper uses “session timeout”
More picking on Akka • Logging during failures is sparse
• Remoting / Failure detection weren’t bulkheaded
Recap 1. Logical actor tree 2. Service discovery 3. Lead
me to success
Requirements 2.0 1. Member lists 2. Easy to configure, ability
to add machines w/o config 3. Pass remoting address around
None
What is Hazelcast? • Distributed maps & locks • Multicast
(IGMP)
Implementation akka.remote.quarantine-systems-for = "off" akka.remote.gate-invalid-addresses-for = 0 s src: akka-devel
• Publish Akka address using a map • Detect nodes joining / leaving cluster
• Multicast • In memory • Cluster Client • Member
list isn’t consistent across cluster What went wrong?
Recap on requirements 2.0 1. Member lists 2. Easy to
configure 3. Pass remoting address around
Requirements 3.0 1. Member list is consistent 2. Cluster clients
are first class
Cluster Membership • Consistent - Zk • Probably consistent -
Gossip • YOLO consistency - Hazelcast
… no seriously, this is the logo
What is ZooKeeper? • Clustered, consistent file system • API
is focused on building distributed concepts
Implementation • Cluster Membership = EPHEMERAL • Leader Election =
SEQUENTIAL • “Cluster” = EPHEMERAL_SEQUENTIAL • Store akka addresses in ephemeral nodes • Curator project
The Good • Reputation • Strong Consistency • Cluster clients
/ Service Discovery
What was / is hard? • Twitter’s Zk library •
External Cluster Manager
The final tally • Solid concept of membership • Keep
things simple • Log / Graph / Monitor everything
Questions?