Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Peter Mounce
April 24, 2015
Technology
0
140
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
340
Other Decks in Technology
See All in Technology
学生・新卒・ジュニアから目指すSRE
hiroyaonoe
2
650
Ruby版 JSXのRuxが気になる
sansantech
PRO
0
160
2026年、サーバーレスの現在地 -「制約と戦う技術」から「当たり前の実行基盤」へ- /serverless2026
slsops
2
260
配列に見る bash と zsh の違い
kazzpapa3
3
160
超初心者からでも大丈夫!オープンソース半導体の楽しみ方〜今こそ!オレオレチップをつくろう〜
keropiyo
0
110
インフラエンジニア必見!Kubernetesを用いたクラウドネイティブ設計ポイント大全
daitak
1
370
制約が導く迷わない設計 〜 信頼性と運用性を両立するマイナンバー管理システムの実践 〜
bwkw
3
970
広告の効果検証を題材にした因果推論の精度検証について
zozotech
PRO
0
190
Oracle Cloud Observability and Management Platform - OCI 運用監視サービス概要 -
oracle4engineer
PRO
2
14k
今日から始めるAmazon Bedrock AgentCore
har1101
4
410
Bill One急成長の舞台裏 開発組織が直面した失敗と教訓
sansantech
PRO
2
380
Claude_CodeでSEOを最適化する_AI_Ops_Community_Vol.2__マーケティングx_AIはここまで進化した.pdf
riku_423
2
600
Featured
See All Featured
Faster Mobile Websites
deanohume
310
31k
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
1
310
Product Roadmaps are Hard
iamctodd
PRO
55
12k
From π to Pie charts
rasagy
0
120
A designer walks into a library…
pauljervisheath
210
24k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
170
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
1
300
The Pragmatic Product Professional
lauravandoore
37
7.1k
Measuring Dark Social's Impact On Conversion and Attribution
stephenakadiri
1
130
Getting science done with accelerated Python computing platforms
jacobtomlinson
2
120
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.7k
We Have a Design System, Now What?
morganepeng
54
8k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech