Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
120
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
150
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
260
Other Decks in Technology
See All in Technology
複雑な構成要素を持つUIとの向き合い方 〜新・支出グラフでの実例〜 / B43 TECH TALK
nakamuuu
0
140
ゼロから始めるVue.jsコミュニティ貢献 / first-vuejs-community-contribution-link-and-motivation
lmi
1
130
本当のAWS基礎
toru_kubota
0
530
家族アルバム みてねにおけるGrafana活用術 / Grafana Meetup Japan Vol.1 LT
isaoshimizu
1
770
require(ESM)とECMAScript仕様
uhyo
3
780
競技としてのKaggle、役に立つKaggle
yu4u
3
1.9k
Vertex AI を中心に 生成AIのアップデートを共有します
kaz1437
0
310
私が trocco を推す理由
__allllllllez__
1
260
一生覚えておきたい「システム開発=コミュニケーション」〜初めての実務案件振り返りLT〜
maimyyym
1
160
Janus
bkuhlmann
1
490
ExaDB-D dbaascli で出来ること
oracle4engineer
PRO
0
2.1k
アクセス制御にまつわる改善 / Improving access control
itkq
0
550
Featured
See All Featured
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
121
39k
A Philosophy of Restraint
colly
197
16k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
274
13k
Thoughts on Productivity
jonyablonski
58
3.8k
KATA
mclloyd
15
12k
Design by the Numbers
sachag
274
18k
Building an army of robots
kneath
300
41k
Java REST API Framework Comparison - PWX 2021
mraible
PRO
18
6.9k
ParisWeb 2013: Learning to Love: Crash Course in Emotional UX Design
dotmariusz
104
6.6k
Building Adaptive Systems
keathley
31
1.9k
BBQ
matthewcrist
80
8.8k
jQuery: Nuts, Bolts and Bling
dougneiner
59
7.1k
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech