Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
130
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
320
Other Decks in Technology
See All in Technology
現場が抱える様々な問題は “組織設計上” の問題によって生じていることがある / Team-oriented Organization Design 20250827
mtx2s
6
1.4k
あなたの知らない OneDrive
murachiakira
0
240
Browser
recruitengineers
PRO
4
480
我々は雰囲気で仕事をしている / How can we do vibe coding as well
naospon
2
220
実践アプリケーション設計 ②トランザクションスクリプトへの対応
recruitengineers
PRO
3
360
Postman MCP 関連機能アップデート / Postman MCP feature updates
yokawasa
1
160
Backboneとしてのtimm2025
yu4u
4
1.6k
Go で言うところのアレは TypeScript で言うとコレ / Kyoto.なんか #7
susisu
7
1.9k
そのコンポーネント、サーバー?クライアント?App Router開発のモヤモヤを可視化する補助輪
makotot
4
610
KiroでGameDay開催してみよう(準備編)
yuuuuuuu168
1
140
Evolution on AI Agent and Beyond - AGI への道のりと、シンギュラリティの3つのシナリオ
masayamoriofficial
0
190
Understanding Go GC #coefl_go_jp
bengo4com
0
1.1k
Featured
See All Featured
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
[RailsConf 2023] Rails as a piece of cake
palkan
56
5.8k
Building Adaptive Systems
keathley
43
2.7k
Raft: Consensus for Rubyists
vanstee
140
7.1k
For a Future-Friendly Web
brad_frost
179
9.9k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
1k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.8k
Site-Speed That Sticks
csswizardry
10
790
The World Runs on Bad Software
bkeepers
PRO
70
11k
Agile that works and the tools we love
rasmusluckow
329
21k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech