Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Monitoring JUST EAT on AWS
Search
Peter Mounce
April 24, 2015
Technology
0
130
Monitoring JUST EAT on AWS
Or, why we didn't just use CloudWatch.
Peter Mounce
April 24, 2015
Tweet
Share
More Decks by Peter Mounce
See All by Peter Mounce
Modern Monitoring for .NET
petemounce
0
160
Embracing DevOps at JUST EAT, within a Microsoft platform
petemounce
1
290
Other Decks in Technology
See All in Technology
マルチモーダル / AI Agent / LLMOps 3つの技術トレンドで理解するLLMの今後の展望
hirosatogamo
37
12k
B2B SaaS × AI機能開発 〜テナント分離のパターン解説〜 / B2B SaaS x AI function development - Explanation of tenant separation pattern
oztick139
2
220
【Pycon mini 東海 2024】Google Colaboratoryで試すVLM
kazuhitotakahashi
2
490
rootlessコンテナのすゝめ - 研究室サーバーでもできる安全なコンテナ管理
kitsuya0828
3
380
インフラとバックエンドとフロントエンドをくまなく調べて遅いアプリを早くした件
tubone24
1
430
Incident Response Practices: Waroom's Features and Future Challenges
rrreeeyyy
0
160
SSMRunbook作成の勘所_20241120
koichiotomo
2
120
ExaDB-D dbaascli で出来ること
oracle4engineer
PRO
0
3.8k
個人でもIAM Identity Centerを使おう!(アクセス管理編)
ryder472
3
190
IBC 2024 動画技術関連レポート / IBC 2024 Report
cyberagentdevelopers
PRO
0
110
Exadata Database Service on Dedicated Infrastructure(ExaDB-D) UI スクリーン・キャプチャ集
oracle4engineer
PRO
2
3.2k
フルカイテン株式会社 採用資料
fullkaiten
0
40k
Featured
See All Featured
Building Adaptive Systems
keathley
38
2.3k
KATA
mclloyd
29
14k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.7k
Why You Should Never Use an ORM
jnunemaker
PRO
54
9.1k
How to Think Like a Performance Engineer
csswizardry
20
1.1k
Fireside Chat
paigeccino
34
3k
Art, The Web, and Tiny UX
lynnandtonic
297
20k
4 Signs Your Business is Dying
shpigford
180
21k
The Invisible Side of Design
smashingmag
298
50k
Six Lessons from altMBA
skipperchong
27
3.5k
Building Your Own Lightsaber
phodgson
103
6.1k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
27
840
Transcript
Monitoring JUST EAT on AWS (Or: why we didn’t just
use AWS CloudWatch) Peter Mounce @petemounce / @justeat_tech
What did we want? Peter Mounce @petemounce / @justeat_tech One
source of truth Alerts that fire in (hopefully) a few seconds Data we can keep for a long time Data we can get rid of when we want
What did we end up with? Harvests OS-level perf-counters into
statsd Apps publish their own metrics where they choose Publishers: PerfTap + app-specific Peter Mounce @petemounce / @justeat_tech
What did we end up with? Send metrics over UDP:
timers.uk.paymentsapi.checkout.200.005.eu-west-1.a:343|ms Receiver: StatsD (by Etsy) Peter Mounce @petemounce / @justeat_tech
What did we end up with? Aggregator: Graphite Peter Mounce
@petemounce / @justeat_tech
What did we end up with? Check-runner / alerter: Seyren
Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute(diffSeries(movingAverage(sumSeries(stats_counts.consumercommunicationservice. uk.*.event-*.reaction-savetoken.*.eu-west-1.*),50),movingAverage(sumSeries(stats. timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count,stats. timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count,stats.timers.api-
consumer.asp-net-responses.create.post.201.*.*.*.count),50))) Just kidding. Example alert Peter Mounce @petemounce / @justeat_tech
What did we end up with? absolute( diffSeries( movingAverage( sumSeries(
stats_counts.consumercommunicationservice.uk.*.event-*.reaction-savetoken.*.eu-west-1.*) ,50), movingAverage( sumSeries( stats.timers.api-consumer.asp-net-responses.*authorizetoken.put.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.loginuser.post.200.*.*.*.count, stats.timers.api-consumer.asp-net-responses.create.post.201.*.*.*.count ) ,50) ) ) Example alert (comprehensible) Peter Mounce @petemounce / @justeat_tech
What did we end up with? • PagerDuty • Grafana
• HipChat Some other stuff too Peter Mounce @petemounce / @justeat_tech
What does it look like? Peter Mounce @petemounce / @justeat_tech
Diagram credit
What does it cost? Peter Mounce @petemounce / @justeat_tech Graphite
+ whisper 1x m3.2xlarge, 12x 1TB @ 500 PIOPs StatsD 1x m3.xlarge Carbon-relay 1x m3.xlarge Seyren 1x c3.xlarge Grafana S3 website PagerDuty somebody else’s problem ;-) Buys: 200k metrics / sec & alarm latency around 2min
What did we gain? Graphite has more analysis functions than
CloudWatch does. Graphite: ~100 CloudWatch: 5…? Rich set of data analysis functions Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch - retains data for 2
weeks … or until shortly after resources are terminated … so we would need to archive data ourselves Capability for historical analysis Peter Mounce @petemounce / @justeat_tech
What did we gain? CloudWatch • 1 min granularity •
~2 min latency (CloudWatch::DynamoDB - 5 min granularity on CCU) Our MTR-React is shorter Peter Mounce @petemounce / @justeat_tech
Happiness! (Mostly) Peter Mounce @petemounce / @justeat_tech
We’re recruiting! http://tech.just-eat.com/jobs Peter Mounce @petemounce / @justeat_tech