Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
130
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
290
Other Decks in Technology
See All in Technology
東京Ruby会議12 Ruby と Rust と私 / Tokyo RubyKaigi 12 Ruby, Rust and me
eagletmt
3
870
EMConf JP の楽しみ方 / How to enjoy EMConf JP
pauli
2
150
Cloudflareで実現する AIエージェント ワークフロー基盤
kmd09
0
290
dbtを中心にして組織のアジリティとガバナンスのトレードオンを考えてみた
gappy50
0
250
comilioとCloudflare、そして未来へと向けて
oliver_diary
6
440
2024年活動報告会(人材育成推進WG・ビジネスサブWG) / 20250114-OIDF-J-EduWG-BizSWG
oidfj
0
230
re:Invent 2024のふりかえり
beli68
0
110
完全自律型AIエージェントとAgentic Workflow〜ワークフロー構築という現実解
pharma_x_tech
0
350
シフトライトなテスト活動を適切に行うことで、無理な開発をせず、過剰にテストせず、顧客をビックリさせないプロダクトを作り上げているお話 #RSGT2025 / Shift Right
nihonbuson
3
2.1k
月間60万ユーザーを抱える 個人開発サービス「Walica」の 技術スタック変遷
miyachin
1
140
今から、 今だからこそ始める Terraform で Azure 管理 / Managing Azure with Terraform: The Perfect Time to Start
nnstt1
0
240
あなたの人生も変わるかも?AWS認定2つで始まったウソみたいな話
iwamot
3
850
Featured
See All Featured
GraphQLとの向き合い方2022年版
quramy
44
13k
A designer walks into a library…
pauljervisheath
205
24k
The Pragmatic Product Professional
lauravandoore
32
6.4k
Designing Experiences People Love
moore
139
23k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.4k
A better future with KSS
kneath
238
17k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.2k
Building Flexible Design Systems
yeseniaperezcruz
328
38k
The World Runs on Bad Software
bkeepers
PRO
66
11k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
59k
Why Our Code Smells
bkeepers
PRO
335
57k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech