Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just use AWS CloudWatch)
Peter Mounce @petemounce / @justeat_tech
What did we want?
• One source of truth
• Alerts that fire in (hopefully) a few seconds
• Data we can keep for a long time
• Data we can get rid of when we want
What did we end up with? Publishers: PerfTap + app-specific
• PerfTap harvests OS-level perf-counters into StatsD
• Apps publish their own metrics where they choose
What did we end up with? Receiver: StatsD (by Etsy)
Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms
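The datagram on this slide follows the StatsD line format `<bucket>:<value>|<type>`, where `ms` marks a timer. A minimal sketch of a publisher, assuming a StatsD daemon listening on its default UDP port 8125 on localhost; the helper names `format_timer` and `send_metric` are illustrative and not taken from the deck:

```python
import socket


def format_timer(bucket: str, millis: int) -> bytes:
    """Encode a StatsD timer datagram: <bucket>:<value>|ms."""
    return f"{bucket}:{millis}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram. UDP is fire-and-forget: no connection, no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# The bucket name is the one shown on the slide; host and port are assumptions.
payload = format_timer("timers.uk.paymentsapi.checkout.200.005.eu-west-1.a", 343)
send_metric(payload)
```

Because nothing is acknowledged, a missing or slow receiver never blocks the application that publishes the metric; the trade-off is that dropped datagrams go unnoticed.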
What did we end up with? Aggregator: Graphite
What did we end up with? Check-runner / alerter: Seyren
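A check-runner of this kind evaluates a Graphite target on a schedule and compares the latest value against warn and error thresholds. A minimal sketch of that comparison, assuming the convention that when the error threshold is above the warn threshold higher values are worse, and otherwise lower values are worse; the function name `classify` is hypothetical and this is not Seyren's actual code:

```python
def classify(value: float, warn: float, error: float) -> str:
    """Map a metric value to OK / WARN / ERROR given two thresholds."""
    if error >= warn:
        # Higher values are worse (e.g. error rates, latency).
        if value >= error:
            return "ERROR"
        if value >= warn:
            return "WARN"
        return "OK"
    # Lower values are worse (e.g. orders per minute dropping).
    if value <= error:
        return "ERROR"
    if value <= warn:
        return "WARN"
    return "OK"


# Illustrative thresholds only; none of these numbers come from the deck.
print(classify(7.0, warn=5.0, error=10.0))
print(classify(2.0, warn=5.0, error=3.0))
```

An alerter would then notify only when the state changes, so a metric that stays in ERROR does not page repeatedly.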
What did we end up with? Example alert
absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count),50)))
Just kidding.
What did we end up with? Example alert (comprehensible)
absolute(
  diffSeries(
    movingAverage(
      sumSeries(
        stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*
      ),
      50
    ),
    movingAverage(
      sumSeries(
        stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,
        stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,
        stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count
      ),
      50
    )
  )
)
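One reading of this alert is that two smoothed event rates should track each other, and the alert value is the size of the gap between them. A minimal sketch of the arithmetic on plain lists, assuming a trailing window that yields `None` until it is full (a simplification of Graphite's `movingAverage`); all sample numbers are hypothetical:

```python
from typing import List, Optional, Sequence


def sum_series(*series: Sequence[float]) -> List[float]:
    """Element-wise sum of several equally long series."""
    return [sum(points) for points in zip(*series)]


def moving_average(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` points; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out


def diff_series(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Subtract b from a point by point, propagating gaps."""
    return [None if x is None or y is None else x - y for x, y in zip(a, b)]


def absolute(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Absolute value of each point, propagating gaps."""
    return [None if x is None else abs(x) for x in series]


# Hypothetical per-interval counts: token saves stop after the third interval
# while the API calls that should trigger them carry on.
tokens_saved = [10, 12, 11, 0, 0, 0]
authorize = [4, 5, 4, 5, 4, 5]
login = [3, 4, 4, 3, 4, 3]
create = [3, 3, 3, 3, 3, 3]

window = 3  # the slide uses 50; a small window keeps the example readable
gap = absolute(
    diff_series(
        moving_average(tokens_saved, window),
        moving_average(sum_series(authorize, login, create), window),
    )
)
print(gap)
```

The moving averages damp single-interval noise, so the gap only grows once the two rates have genuinely diverged for several intervals.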
What did we end up with? Some other stuff too
• PagerDuty
• Grafana
• HipChat
What does it look like?
Diagram credit
What does it cost?
• Graphite + whisper: 1x m3.2xlarge, 12x 1TB @ 500 PIOPs
• StatsD: 1x m3.xlarge
• Carbon-relay: 1x m3.xlarge
• Seyren: 1x c3.xlarge
• Grafana: S3 website
• PagerDuty: somebody else’s problem ;-)
Buys: 200k metrics / sec & alarm latency around 2min
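Disk is the largest line item here, and whisper files are fixed-size, so storage can be estimated up front. A rough sizing sketch, assuming whisper's on-disk layout of a 16-byte header, 12 bytes per archive descriptor and 12 bytes per datapoint; the retention schema and metric count below are hypothetical and do not come from the deck:

```python
METADATA_BYTES = 16      # file header
ARCHIVE_INFO_BYTES = 12  # one descriptor per archive
POINT_BYTES = 12         # 4-byte timestamp + 8-byte double

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text: str) -> int:
    """Convert a string such as '10s' or '30d' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def whisper_file_bytes(retentions: str) -> int:
    """Size of one whisper file for a schema like '10s:7d,1m:30d'."""
    archives = [part.split(":") for part in retentions.split(",")]
    points = sum(parse_duration(keep) // parse_duration(step) for step, keep in archives)
    return METADATA_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


# Hypothetical schema: 10 s resolution for a week, 1 min for a month, 10 min for 2 years.
per_metric = whisper_file_bytes("10s:7d,1m:30d,10m:2y")
distinct_metrics = 100_000  # assumed number of metric names
print(f"{per_metric} bytes per metric, {per_metric * distinct_metrics / 1e9:.1f} GB in total")
```

Because every file is allocated at full size when the metric is first seen, disk usage grows with the number of distinct metric names rather than with how often each one is written.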
What did we gain? Rich set of data analysis functions
Graphite has more analysis functions than CloudWatch does.
• Graphite: ~100
• CloudWatch: 5…?
What did we gain? Capability for historical analysis
CloudWatch
• retains data for 2 weeks
• … or until shortly after resources are terminated
• … so we would need to archive data ourselves
What did we gain? Our MTR-React is shorter
CloudWatch
• 1 min granularity
• ~2 min latency
(CloudWatch::DynamoDB - 5 min granularity on CCU)
Happiness! (Mostly)
We’re recruiting! http://tech.just-eat.com/jobs