Monitoring JUST EAT on AWS
Peter Mounce
April 24, 2015
Technology
Or, why we didn't just use CloudWatch.
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just use AWS CloudWatch)
Peter Mounce @petemounce / @justeat_tech
What did we want?
• One source of truth
• Alerts that fire in (hopefully) a few seconds
• Data we can keep for a long time
• Data we can get rid of when we want
What did we end up with?
Publishers: PerfTap + app-specific
• PerfTap harvests OS-level perf-counters into statsd
• Apps publish their own metrics where they choose
What did we end up with?
Receiver: StatsD (by Etsy)
Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms
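As a rough illustration of the slide above, here is a minimal Python sketch of publishing one timer sample in the StatsD line format over UDP. The helper names `format_timer` and `send_metric` are hypothetical, and the host and port are assumptions (8125 is StatsD's usual default); the talk does not show JUST EAT's actual publisher code.

```python
import socket


def format_timer(bucket: str, millis: int) -> str:
    """Render one StatsD timer sample as '<bucket>:<value>|ms'."""
    return f"{bucket}:{millis}|ms"


def send_metric(payload: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric as a single UDP datagram; no reply is expected."""
    # Assumption: a StatsD daemon listens on host:port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


if __name__ == "__main__":
    # The bucket name and value are the ones shown on the slide.
    line = format_timer("timers.uk.paymentsapi.checkout.200.005.eu-west-1.a", 343)
    send_metric(line)
```

Because UDP is connectionless, the publisher does not wait on the receiver, so a slow or absent StatsD daemon should not hold up the application that emits the metric.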
What did we end up with?
Aggregator: Graphite
What did we end up with?
Check-runner / alerter: Seyren
What did we end up with?
Example alert
absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count),50)))
Just kidding.
What did we end up with?
Example alert (comprehensible)
absolute(
  diffSeries(
    movingAverage(
      sumSeries(
        stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*
      ),
      50
    ),
    movingAverage(
      sumSeries(
        stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,
        stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,
        stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count
      ),
      50
    )
  )
)
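To show roughly what a check-runner does with a target expression like the one above, the following Python sketch builds a Graphite render API URL and classifies the newest datapoint against two thresholds. It is an assumption-laden illustration, not Seyren's implementation: the function names, the state labels, the thresholds and the `graphite.example` host are all hypothetical. Only the `/render` endpoint with `target`, `from` and `format=json`, and the `[value, timestamp]` datapoint layout, come from Graphite itself.

```python
import json
from urllib.parse import urlencode


def render_url(base_url: str, target: str, window: str = "-10min") -> str:
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url}/render?{query}"


def latest_value(render_json: str):
    """Return the newest non-null value of the first series, or None."""
    series = json.loads(render_json)
    if not series:
        return None
    # Graphite datapoints are [value, timestamp] pairs; value may be null.
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return value
    return None


def check_state(value, warn: float, error: float) -> str:
    """Classify a value against hypothetical warn/error thresholds."""
    if value is None:
        return "UNKNOWN"
    if value >= error:
        return "ERROR"
    if value >= warn:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # Hypothetical host; the target would be the alert expression above.
    print(render_url("http://graphite.example", "absolute(diffSeries(a.b,c.d))"))
```

In this sketch the alert on the slide reads as "the absolute gap between two smoothed event rates"; a real check would fetch the URL on a schedule and notify when the state changes.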
What did we end up with?
Some other stuff too
• PagerDuty
• Grafana
• HipChat
What does it look like?
[Diagram not included in the transcript]
What does it cost?
Graphite + whisper: 1x m3.2xlarge, 12x 1TB @ 500 PIOPs
StatsD: 1x m3.xlarge
Carbon-relay: 1x m3.xlarge
Seyren: 1x c3.xlarge
Grafana: S3 website
PagerDuty: somebody else’s problem ;-)
Buys: 200k metrics / sec & alarm latency around 2min
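Disk is the largest line item above, so a back-of-envelope estimate of whisper storage per metric may help when reasoning about it. The Python sketch below assumes the standard whisper layout (a 16-byte header, 12 bytes of archive metadata per archive, 12 bytes per datapoint). The retention string in the usage example is purely hypothetical; the talk does not state JUST EAT's retention schema.

```python
import re

# Seconds per unit, matching the suffixes used in Graphite retention strings.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}

HEADER_BYTES = 16        # whisper file metadata
ARCHIVE_INFO_BYTES = 12  # per-archive metadata
POINT_BYTES = 12         # 4-byte timestamp + 8-byte double


def to_seconds(token: str) -> int:
    """Convert a token such as '10s' or '5y' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdy])", token)
    if match is None:
        raise ValueError(f"unsupported retention token: {token!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def whisper_file_bytes(retentions: str) -> int:
    """Estimate the size of one whisper file for 'step:keep,...' retentions."""
    total = HEADER_BYTES
    for archive in retentions.split(","):
        step, keep = (to_seconds(t) for t in archive.split(":"))
        total += ARCHIVE_INFO_BYTES + (keep // step) * POINT_BYTES
    return total


if __name__ == "__main__":
    # Hypothetical schema: 10 s points for 6 h, 1 min for 7 d, 10 min for 5 y.
    print(whisper_file_bytes("10s:6h,1m:7d,10m:5y"))  # 3300532 bytes, ~3.3 MB
```

Whisper files are allocated at full size when a metric is first seen, so total disk under this model is roughly the per-file size multiplied by the number of distinct metric names.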
What did we gain?
Rich set of data analysis functions
Graphite has more analysis functions than CloudWatch does.
• Graphite: ~100
• CloudWatch: 5…?
What did we gain?
Capability for historical analysis
CloudWatch
• retains data for 2 weeks
• … or until shortly after resources are terminated
• … so we would need to archive data ourselves
What did we gain?
Our MTR-React is shorter
CloudWatch
• 1 min granularity
• ~2 min latency
(CloudWatch::DynamoDB - 5 min granularity on CCU)
Happiness! (Mostly)
We’re recruiting! http://tech.just-eat.com/jobs