Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
130
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
310
Other Decks in Technology
See All in Technology
Recap of Next - Google Cloud で実践する クラウドネイティブ最前線 / The Frontlines of Cloud-Native with Insights from Google Cloud
aoto
PRO
1
100
YOLOv10~v12
tenten0727
4
930
Amazon CloudWatch Application Signals ではじめるバーンレートアラーム / Burn rate alarm with Amazon CloudWatch Application Signals
ymotongpoo
5
420
フロントエンドも盛り上げたい!フロントエンドCBとAmplifyの軌跡
mkdev10
2
270
プロダクト開発におけるAI時代の開発生産性
shnjtk
2
230
Рекомендации с нуля: как мы в Lamoda превратили главную страницу в ключевую точку входа для персонализированного шоппинга. Данил Комаров, Data Scientist, Lamoda Tech
lamodatech
0
690
LiteXとオレオレCPUで作る自作SoC奮闘記
msyksphinz
0
570
Beyond {shiny}: The Future of Mobile Apps with R
colinfay
1
410
Creating Awesome Change in SmartNews
martin_lover
1
270
Goの組織でバックエンドTypeScriptを採用してどうだったか / How was adopting backend TypeScript in a Golang company
kaminashi
5
4.1k
Webアプリを Lambdaで動かすまでに考えること / How to implement monolithic Lambda Web Application
_kensh
7
1.3k
.mdc駆動ナレッジマネジメント/.mdc-driven knowledge management
yodakeisuke
24
12k
Featured
See All Featured
How to train your dragon (web standard)
notwaldorf
90
6k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
45
9.5k
Designing for humans not robots
tammielis
252
25k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
5
540
Building Flexible Design Systems
yeseniaperezcruz
329
38k
Agile that works and the tools we love
rasmusluckow
328
21k
We Have a Design System, Now What?
morganepeng
52
7.5k
The Art of Programming - Codeland 2020
erikaheidi
53
13k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
45
7.2k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5.3k
Navigating Team Friction
lara
184
15k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech