Life with Datadog
Takeshi Kondo
January 24, 2021
Technology
Life with Datadog
July Tech Festa 2021 winter
https://techfesta.connpass.com/event/193966/
Transcript
Life with Datadog Takeshi Kondo / @chaspy 2021/01/24 July Tech
Festa Winter 2021
Who am I chaspy chaspy_ Lead Software Engineer Site Reliability
at Quipper Takeshi Kondo
Japan Datadog User Group Organizer • https://datadog-jp.connpass.com/
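Contributed a Datadog hands-on article to the February issue of Software Design
• [Feature 2] How to Start and Keep Up System Monitoring, Chapter 3: SaaS Monitoring in Practice with Datadog https://twitter.com/chaspy_/status/1350270946767081472?s=20
Recommended for Datadog beginners!
I received a T-shirt and socks https://twitter.com/chaspy_/status/1352577570278043649?s=20 If you want some, you might get them by speaking at a meetup or joining a mob programming session!
Today I will talk about my love for Datadog https://www.datadoghq.com/about/resources/ Monitoring SaaS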
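Today's talk
• Audience
  • System operators
  • People interested in monitoring SaaS
  • People interested in Datadog
• Goals
  • Learn what is good about Datadog
  • Pick up ideas for using Datadog by learning about its various advanced features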
Agenda
• What I like about Datadog
  • Support that is full of love for the product
  • A rich and cute GUI
  • SLO Summary & Alert
• Life with Datadog
  • Monitor Summary
  • Weekly Summary
  • Combining with Alfred
• Life with Datadog, advanced
  • Living with Datadog cost
  • New services and Datadog
  • Using custom metrics
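What I like about Datadog: the support is great
• Responses are fast (a reply almost always comes back within 24 hours)
• Answers are accurate
• Even when something cannot be done, they suggest a workaround
• You can see their drive to make the product better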
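How to ask support
• I basically open a ticket (in English)
• Help -> resource -> Tickets & Email Support
• If English really is not an option, apparently you can ask for a reply in Japanese and the Japan support team will handle it during business hours
• Apparently typing @support-datadog in the Event Stream starts a live chat (I have never used it)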
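Examples of past questions
• Terraform apply of an SLO is rejected with Invalid Query
• Asked whether Monitor Summary can be retrieved as metrics
• How to collect EKS control plane metrics
• How to send an alert when an SLO is violated
• Asked whether the number of API keys can be increased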
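Asking whether Monitor Summary can be retrieved as metrics
• Do you know Monitor Summary?
• It is the report that arrives by email once a week
• https://app.datadoghq.com/report/monitor
• There is actually no path to it from the GUI (you can only get there from the email)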
Monitor Summary: I want to filter it down to the alerts for an arbitrary channel only
Apparently the raw data (CSV) can be fetched via an API
• The CSV can be downloaded
• I wrote code that sends custom metrics from this CSV
https://app.datadoghq.com/report/hourly_data/monitor https://docs.datadoghq.com/monitors/faq/how-can-i-export-alert-history/?tab=us
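A minimal sketch of what "code that sends custom metrics from this CSV" could look like. The talk does not show the export's schema or the actual code, so the `channel` column and the `monitor_summary.alerts` metric name are assumptions made for illustration. The payload shape follows Datadog's v1 metric submission endpoint (`POST /api/v1/series`); the HTTP call itself is left out.

```python
import csv
import io
import time
from collections import Counter


def alerts_per_channel(csv_text, channel_column="channel"):
    """Count alert rows per notification channel.

    The column name is an assumption; adjust it to the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[channel_column] for row in reader)


def build_series(counts, metric="monitor_summary.alerts", now=None):
    """Build a request body for POST /api/v1/series.

    The metric name is hypothetical. One gauge point is emitted per channel,
    tagged so that graphs and monitors can filter on a single channel.
    """
    ts = int(now if now is not None else time.time())
    return {
        "series": [
            {
                "metric": metric,
                "type": "gauge",
                "points": [[ts, count]],
                "tags": [f"channel:{channel}"],
            }
            for channel, count in sorted(counts.items())
        ]
    }


# Made-up sample rows standing in for the downloaded CSV.
sample = (
    "monitor_name,channel\n"
    "cpu high,slack-sre\n"
    "disk full,slack-sre\n"
    "5xx rate,pagerduty\n"
)
payload = build_series(alerts_per_channel(sample), now=1611446400)
```

Tagging each point with the channel is what makes the "filter to one channel" wish from the earlier slide possible on a regular dashboard.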
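How to send an alert when an SLO is violated
• At the time, the SLO Error Budget Alert feature did not exist yet
• A Product Manager joined the thread, said the feature was under active development, and asked whether they could do a user interview
• I answered a few questions on the support ticket
• After that, SLO Alert was released in beta!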
My engagement goes up
A rich and cute GUI
• Many ways to visualize data
• Changes are reflected immediately
• Many functions
• Copy and paste is easy
Let's actually play with a graph
SLO Summary & Alert
Datadog's SLO feature is a partner I cannot do without https://quipper.hatenablog.com/entry/2020/01/30/slo-review
Managed with Terraform, plus a generator, so SLOs are easy to create when a new service is set up. Let's take a look.
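We look at Monitor Summary every morning at the Daily Standup
Monitor Summary
• >muted:false tag:(("severity:alert" OR "severity:emergency") AND "team:sre")
• Every alert carries a severity tag and a team tag
• We check only the alerts our own team is responsible for at the Daily Standup
• If something is not OK, that day's person on duty deals with it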
Weekly Summary
• You get an email like this
• [Monitor Report] You received xxx alerts
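Weekly Summary
• It can actually be viewed here
• https://app.datadoghq.com/report/monitor
• Clicking lets you narrow down by week, time of day, alert channel, or monitor
• We use this to track the trend in the number of alerts itself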
Combining with Alfred
• So that dashboards, monitors, and SLOs can be opened quickly
• Registered as Web Searches
  • ddd https://app.datadoghq.com/dashboard/lists?q={query}
  • ddm https://app.datadoghq.com/monitors/manage?q={query}
  • ddslo https://app.datadoghq.com/slo?query={query}
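For readers without Alfred, the effect of the three Web Searches can be sketched in a few lines of Python: the keyword picks a URL template and the typed text replaces `{query}`. The URL templates are taken from the slide; the percent-encoding step is an assumption about how a launcher would handle spaces and colons, not a statement about Alfred's internals.

```python
from urllib.parse import quote

# Keyword -> URL template, copied from the slide.
WEB_SEARCHES = {
    "ddd": "https://app.datadoghq.com/dashboard/lists?q={query}",
    "ddm": "https://app.datadoghq.com/monitors/manage?q={query}",
    "ddslo": "https://app.datadoghq.com/slo?query={query}",
}


def expand(keyword, query):
    """Return the URL a launcher would open for e.g. 'ddm api latency'."""
    template = WEB_SEARCHES[keyword]
    # Percent-encode the query so spaces and colons survive in the URL.
    return template.format(query=quote(query))


monitor_url = expand("ddm", "api latency")
slo_url = expand("ddslo", "checkout")
```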
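How Datadog cost is calculated
• The 99th percentile of the number of hosts over the month (not the max)
• Reducing the number of hosts reduces Datadog cost
• Reducing the number of hosts is the only way (there is no royal road)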
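A small worked example of why the 99th percentile differs from the max. This is a sketch under assumptions: it treats the month as a series of hourly host counts and reads "99th percentile" as "discard the top 1% of hours, then take the highest remaining count". The slide states only the percentile, so check your own contract for the exact rule.

```python
def billable_host_count(hourly_host_counts):
    """Return the host count after discarding the top 1% of hourly samples.

    Assumption: one sample per hour; the top 1% (rounded down) is ignored.
    """
    if not hourly_host_counts:
        raise ValueError("need at least one hourly sample")
    ordered = sorted(hourly_host_counts)
    dropped = len(ordered) // 100  # number of highest samples to ignore
    return ordered[len(ordered) - dropped - 1]


# A 30-day month has 720 hours, so the 7 busiest hours are ignored.
short_spike = [10] * 713 + [100] * 7   # spike lasts 7 hours
long_spike = [10] * 712 + [100] * 8    # spike lasts 8 hours

short_result = billable_host_count(short_spike)
long_result = billable_host_count(long_spike)
```

Under these assumptions a scale-out that lasts only a few hours does not change the bill, while anything longer does, which is why steadily reducing hosts is the only lever.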
New services and Datadog
• There is a "Production Readiness Checklist" for building a new service https://quipper.hatenablog.com/entry/2020/01/30/production-readiness-with-all
Production Readiness Checklist - Monitoring / Logging
Template Dashboard
Let's actually look at the Template Dashboard
Production Readiness Checklist - SLI/SLO
• Just create them with the generator and add them to the Dashboard!
• What the SLO itself should be is decided in the Design Doc phase
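Examples of using custom metrics
• Measure the time taken in CI
  • Docker build time, time for each test, and so on
• Measure and visualize CircleCI cost
• Visualize the number of PRs opened by Dependabot
• To cope with access spikes, send load information as custom metrics and use it from the Kubernetes HPA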
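As an illustration of the first item, here is one way a CI step's duration could be turned into a custom metric. The talk does not say how the measurements are submitted; this sketch assumes a Datadog Agent with DogStatsD listening on UDP port 8125 and uses the DogStatsD datagram format (`name:value|type|#tags`). The metric name and tag are hypothetical.

```python
import socket
import time


def format_dogstatsd(name, value, metric_type="g", tags=()):
    """Format one DogStatsD datagram: name:value|type|#tag1,tag2."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send(datagram, host="127.0.0.1", port=8125):
    """Send the datagram over UDP (assumes an Agent with DogStatsD enabled)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


def timed(step, func):
    """Run func, then return a gauge datagram with its duration in seconds."""
    start = time.monotonic()
    func()
    elapsed = round(time.monotonic() - start, 3)
    # Metric name and tag key are made up for this example.
    return format_dogstatsd("ci.step.duration_seconds", elapsed, "g", [f"step:{step}"])


example = format_dogstatsd("ci.step.duration_seconds", 12.5, "g", ["step:docker_build"])
```

In a CI job this would be used as `send(timed("docker_build", run_build))`, where `run_build` is whatever callable wraps the step.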
I talked about this at Kubernetes Meetup Tokyo #38 https://twitter.com/chaspy_/status/1352194206962388995?s=20 https://speakerdeck.com/chaspy/v2beta2-and-examples-of-using-hpa-external-metrics-with-datadog https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa
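Benefits of sending the information around you as custom metrics
• You can visualize the current state
• You can filter or exclude by arbitrary tags
• You can fire alerts from the metric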
A design pattern for sending custom metrics
• In a Kubernetes environment, the Autodiscovery feature is handy
  https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
• Diagram: a custom metric generator (Kubernetes Deployment) fetches data via API from an external system that has an API, and exports metrics over HTTP in Prometheus format to the Datadog Agent
Autodiscovery annotations on the custom metric generator (Kubernetes Deployment):
  annotations:
    ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
      ["prometheus"]
    ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
      [{}]
    ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
      [
        {
          "prometheus_url": "http://%%host%%:8080/metrics",
          "namespace": "namespace",
          "metrics": ["*"]
        }
      ]
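The annotations point the Agent's Prometheus check at `http://%%host%%:8080/metrics`, so the generator only has to answer that URL with Prometheus text. A minimal sketch using only the Python standard library follows. The real exporter from the talk is not shown, so `fetch_values` and the metric names are placeholders for "fetch data via API".

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_values():
    """Placeholder for the call to the external system's API (made-up data)."""
    return {"open_pull_requests": 12, "open_issues": 34}


def render_prometheus(values):
    """Render name/value pairs as gauges in the Prometheus text format."""
    lines = []
    for name, value in sorted(values.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served, matching prometheus_url in the annotations.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_prometheus(fetch_values()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever serving /metrics; call this from the container entrypoint."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


rendered = render_prometheus({"b": 2, "a": 1})
```

With `"metrics": ["*"]` every exposed metric is collected, so keeping the exporter's output small keeps the custom metric count small as well.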
Examples of data collected with this pattern:
• Number of GitHub PRs and issues
• AWS security events (GuardDuty / ECR Image Scan)
• Number of EC2 hosts per OS
• Number of RDS clusters per version
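Summary
• If there is something you do not understand about Datadog...
  • Ask Support right away! They are fast and courteous!
  • Asking in the Japan Datadog User Group Slack is OK too!
• Use Monitor Summary and the Weekly Report to cut down on the monitors themselves
• Custom metrics are the real joy of Datadog, so make heavy use of them!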
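More
• If you want to see actual graphs, come when there is no recording!
• The Zoom breakout room after this talk is fine too
• If you contact me individually on Twitter, I am happy to talk through Datadog questions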
Promotion
Thank you! chaspy chaspy_ Lead Software Engineer Site Reliability at
Quipper Takeshi Kondo