Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
130
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
330
Other Decks in Technology
See All in Technology
QA業務を変える(!?)AIを併用した不具合分析の実践
ma2ri
0
140
MCP ✖️ Apps SDKを触ってみた
hisuzuya
0
360
組織全員で向き合うAI Readyなデータ利活用
gappy50
0
280
マルチエージェントのチームビルディング_2025-10-25
shinoyamada
0
160
CREが作る自己解決サイクルSlackワークフローに組み込んだAIによる社内ヘルプデスク改革 #cre_meetup
bengo4com
0
330
Wasmの気になる最新情報
askua
0
190
HonoとJSXを使って管理画面をサクッと型安全に作ろう
diggymo
0
180
オブザーバビリティが育むシステム理解と好奇心
maruloop
2
1.1k
SCONE - 動画配信の帯域を最適化する新プロトコル
kazuho
1
380
GraphRAG グラフDBを使ったLLM生成(自作漫画DBを用いた具体例を用いて)
seaturt1e
1
150
ソフトウェアエンジニアの生成AI活用と、これから
lycorptech_jp
PRO
0
900
会社を支える Pythonという言語戦略 ~なぜPythonを主要言語にしているのか?~
curekoshimizu
3
670
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
GitHub's CSS Performance
jonrohan
1032
470k
Building a Scalable Design System with Sketch
lauravandoore
463
33k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
Balancing Empowerment & Direction
lara
5
700
The Art of Programming - Codeland 2020
erikaheidi
56
14k
Six Lessons from altMBA
skipperchong
29
4k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
34
2.3k
GraphQLとの向き合い方2022年版
quramy
49
14k
Mobile First: as difficult as doing things right
swwweet
225
10k
Designing Experiences People Love
moore
142
24k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech