Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Peter Mounce
April 24, 2015
Technology
140
0
Share
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
170
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
340
Other Decks in Technology
See All in Technology
システムは「動く」だけでは 足りない - 非機能要件・分散システム・トレードオフの基礎
nwiizo
23
6.9k
スクラムを支える内部品質の話
iij_pr
0
340
Data Enabling Team立ち上げました
sansantech
PRO
0
290
Hooks, Filters & Now Context: Why MCPs Are the “Hooks” of the AI Era
miriamschwab
0
130
Oracle AI Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
4
2.2k
20260410 - CNTUG meetup #72 - DiskImage Builder 介紹:以 Kubespray CI 打造 RockyLinux 10 Cloud Image 為例
tico88612
0
110
MCPゲートウェイ MCPass の設計と実装 エンタープライズで AI を「運用できる」状態にする
mtpooh
1
210
本番環境でPHPコードに触れずに「使われていないコード」を調べるにはどうしたらよいか?
egmc
1
260
Kubernetes基盤における開発者体験 とセキュリティの両⽴ / Balancing developer experience and security in a Kubernetes-based environment
chmikata
0
220
Claude Teamプランの選定と、できること/できないこと
rfdnxbro
1
1.8k
DIPS2.0データに基づく森林管理における無人航空機の利用状況
naokimuroki
0
160
【関西電力KOI×VOLTMIND 生成AIハッカソン】空間AIブレイン ~⼤阪おばちゃんフィジカルAIに続く道~
tanakaseiya
0
180
Featured
See All Featured
The Curse of the Amulet
leimatthew05
1
11k
Principles of Awesome APIs and How to Build Them.
keavy
128
17k
KATA
mclloyd
PRO
35
15k
Git: the NoSQL Database
bkeepers
PRO
432
67k
Heart Work Chapter 1 - Part 1
lfama
PRO
5
35k
Ruling the World: When Life Gets Gamed
codingconduct
0
190
How to train your dragon (web standard)
notwaldorf
97
6.6k
30 Presentation Tips
portentint
PRO
1
270
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.7k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.3k
Claude Code のすすめ
schroneko
67
220k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
250
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech