Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
130
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
300
Other Decks in Technology
See All in Technology
AIエージェント時代のエンジニアになろう #jawsug #jawsdays2025 / 20250301 Agentic AI Engineering
yoshidashingo
9
4.4k
開発者のための FinOps/FinOps for Engineers
oracle4engineer
PRO
2
290
エンジニアの健康管理術 / Engineer Health Management Techniques
y_sone
8
6.5k
事業を差別化する技術を生み出す技術
pyama86
3
1k
アジャイルな開発チームでテスト戦略の話は誰がする? / Who Talks About Test Strategy?
ak1210
1
910
困難を「一般解」で解く
fujiwara3
9
2.9k
失敗しないAIエージェント開発:階層的タスク分解の実践
kworkdev
PRO
0
460
プルリクエストレビューを終わらせるためのチーム体制 / The Team for Completing Pull Request Reviews
nekonenene
4
2k
目標と時間軸 〜ベイビーステップでケイパビリティを高めよう〜
kakehashi
PRO
8
1.1k
エンジニアのキャリアパスと、 その中で自分が大切にしていること
noteinc
3
3.1k
開発組織を進化させる!AWSで実践するチームトポロジー
iwamot
2
620
人生を左右する「即答」のススメ: 一瞬の判断を間違えないためにするべきこと
takasyou
9
1.2k
Featured
See All Featured
Six Lessons from altMBA
skipperchong
27
3.6k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
Side Projects
sachag
452
42k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
Practical Orchestrator
shlominoach
186
10k
Unsuck your backbone
ammeep
669
57k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
45
9.4k
How to Ace a Technical Interview
jacobian
276
23k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
How to train your dragon (web standard)
notwaldorf
91
5.9k
Bash Introduction
62gerente
611
210k
StorybookのUI Testing Handbookを読んだ
zakiyama
28
5.5k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech