Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
140
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
340
Other Decks in Technology
See All in Technology
外部キー制約の知っておいて欲しいこと - RDBMSを正しく使うために必要なこと / FOREIGN KEY Night
soudai
PRO
12
5.6k
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
2
320
Bill One急成長の舞台裏 開発組織が直面した失敗と教訓
sansantech
PRO
2
380
StrandsとNeptuneを使ってナレッジグラフを構築する
yakumo
1
120
学生・新卒・ジュニアから目指すSRE
hiroyaonoe
2
650
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
10k
OWASP Top 10:2025 リリースと 少しの日本語化にまつわる裏話
okdt
PRO
3
820
SREが向き合う大規模リアーキテクチャ 〜信頼性とアジリティの両立〜
zepprix
0
460
生成AIを活用した音声文字起こしシステムの2つの構築パターンについて
miu_crescent
PRO
3
210
Bill One 開発エンジニア 紹介資料
sansan33
PRO
5
17k
クレジットカード決済基盤を支えるSRE - 厳格な監査とSRE運用の両立 (SRE Kaigi 2026)
capytan
6
2.8k
プロポーザルに込める段取り八分
shoheimitani
1
470
Featured
See All Featured
Learning to Love Humans: Emotional Interface Design
aarron
275
41k
Bash Introduction
62gerente
615
210k
Automating Front-end Workflow
addyosmani
1371
200k
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
320
KATA
mclloyd
PRO
34
15k
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.7k
Large-scale JavaScript Application Architecture
addyosmani
515
110k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.7k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
200
How to Ace a Technical Interview
jacobian
281
24k
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
inesmontani
PRO
3
2k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech