Life with Datadog
Takeshi Kondo
January 24, 2021
July Tech Festa 2021 winter
https://techfesta.connpass.com/event/193966/
Transcript
Life with Datadog
Takeshi Kondo / @chaspy
2021/01/24 July Tech Festa Winter 2021
Who am I
Takeshi Kondo (chaspy, chaspy_)
Lead Software Engineer, Site Reliability at Quipper
Japan Datadog User Group Organizer • https://datadog-jp.connpass.com/
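Contributed a Datadog hands-on article to the February issue of Software Design
• [Special Feature 2] How to Start and Keep Up System Monitoring, Chapter 3: SaaS Monitoring in Practice with Datadog https://twitter.com/chaspy_/status/1350270946767081472?s=20
Datadog beginners, please take a look!
I received a T-shirt and socks https://twitter.com/chaspy_/status/1352577570278043649?s=20
If you want some, you might get them by speaking at a meetup or joining a mob programming session!
Today I will talk about my love for Datadog https://www.datadoghq.com/about/resources/ Monitoring SaaS
Today's talk
• Audience
  • System operators
  • People interested in monitoring SaaS
  • People interested in Datadog
• Goals
  • Learn what is good about Datadog
  • Pick up ideas for using Datadog by learning about its more advanced features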
Agenda
• What I like about Datadog
  • Support that is full of love for the product
  • A rich and cute GUI
  • SLO Summary & Alert
• Life with Datadog
  • Monitor Summary
  • Weekly Summary
  • Combining with Alfred
• Life with Datadog: advanced
  • Living with Datadog cost
  • New services and Datadog
  • Using custom metrics
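What I like about Datadog: the support is great
• Responses are fast (a reply almost always comes back within 24 hours)
• Answers are accurate
• Even when something cannot be done, they suggest a workaround
• You can see their commitment to making the product better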
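How to ask support for help
• I basically open a ticket (in English)
  • Help -> resource -> Tickets & Email Support
• If English really is not an option, apparently you can ask for a reply in Japanese and the ticket is passed to the Japan support team during business hours
• Apparently you can start a live chat by typing @support-datadog in the Event Stream (I have never used it)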
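Examples of past questions
• Terraform apply of an SLO is rejected with Invalid Query
• Asked whether Monitor Summary can be retrieved as metrics
• How to get EKS Control Plane metrics
• How to send an alert when an SLO is violated
• Asked whether the number of API keys can be increased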
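Asking whether Monitor Summary can be retrieved as metrics
• Do you know about Monitor Summary?
• It is the report that arrives by email once a week
• https://app.datadoghq.com/report/monitor
• There is actually no path to it from the GUI (you can only reach it from the email)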
Monitor Summary
Monitor Summary: I want to filter down to the alerts for a specific channel
Apparently the raw data (CSV) can be fetched via the API
• The CSV can be downloaded
• I wrote code that sends custom metrics from this CSV
https://app.datadoghq.com/report/hourly_data/monitor
https://docs.datadoghq.com/monitors/faq/how-can-i-export-alert-history/?tab=us
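The code mentioned on this slide is not shown in the deck. As a rough illustration only, the following sketch counts rows of an exported CSV per monitor and builds a payload in the shape accepted by Datadog's v1 "submit metrics" endpoint (`POST /api/v1/series`). The column name `monitor_name`, the metric name `monitor_summary.alerts`, and the sample rows are all hypothetical; the real export may use different columns.

```python
import csv
import io
import json
import time
from collections import Counter


def csv_to_series(csv_text, metric_name="monitor_summary.alerts",
                  group_column="monitor_name", now=None):
    """Count CSV rows per value of group_column and build a series payload.

    metric_name and group_column are hypothetical defaults, not names
    taken from Datadog's actual CSV export.
    """
    timestamp = int(now if now is not None else time.time())
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[group_column] for row in reader)
    return {
        "series": [
            {
                "metric": metric_name,
                "type": "gauge",
                "points": [[timestamp, count]],
                "tags": [f"{group_column}:{value}"],
            }
            for value, count in sorted(counts.items())
        ]
    }


# Made-up sample rows standing in for a downloaded export.
SAMPLE_CSV = (
    "monitor_name,channel\n"
    "cpu_high,slack-sre\n"
    "cpu_high,slack-sre\n"
    "disk_full,slack-dev\n"
)

payload = csv_to_series(SAMPLE_CSV, now=1611446400)
print(json.dumps(payload, indent=2))
```

The payload would then be sent with an API key in the `DD-API-KEY` header. Keeping the payload builder separate from the HTTP call makes it easy to check the counts before any metric is submitted.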
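How to send an alert when an SLO is violated
• At the time, the SLO Error Budget Alert feature did not exist yet
• A Product Manager joined the ticket, said the feature was under active development, and asked whether they could do a user interview
• I answered several questions on the Support Ticket
• Afterwards the SLO Alert beta was released!
Engagement goes up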
A rich and cute GUI
• Many ways to visualize data
• Changes are reflected immediately
• Many functions
• Copy and paste is easy
Let's actually play with a graph
SLO Summary & Alert
SLO Summary & Alert
Datadog's SLO feature is a partner I cannot do without https://quipper.hatenablog.com/entry/2020/01/30/slo-review
Managed with Terraform, and with a generator, SLOs are easy to create when a new service is set up. Let's take a look
We look at Monitor Summary every morning at the Daily Standup
Monitor Summary
• >muted:false tag:(("severity:alert" OR "severity:emergency") AND "team:sre")
• Every alert has a severity tag and a team tag
• At the Daily Standup we check only the alerts our own team is responsible for
• If something is NG, the person on duty deals with it
Weekly Summary
• You receive an email like this
• [Monitor Report] You received xxx alerts
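Weekly Summary
• You can actually view it here
• https://app.datadoghq.com/report/monitor
• Clicking lets you narrow down by week, time of day, alert channel, and monitor
• We use it to track how the number of alerts itself changes over time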
Combining with Alfred
• So that dashboards, monitors, and SLOs can be opened quickly
• Registered as Web Searches
  • ddd https://app.datadoghq.com/dashboard/lists?q={query}
  • ddm https://app.datadoghq.com/monitors/manage?q={query}
  • ddslo https://app.datadoghq.com/slo?query={query}
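To make the `{query}` placeholder concrete, here is a small sketch that mimics what a launcher does with these templates: it URL-encodes the typed query and substitutes it into the URL. The three templates are the ones listed on the slide; the `expand` helper is hypothetical and is not part of Alfred or Datadog.

```python
from urllib.parse import quote

# Keyword -> URL template, as registered on the slide.
TEMPLATES = {
    "ddd": "https://app.datadoghq.com/dashboard/lists?q={query}",
    "ddm": "https://app.datadoghq.com/monitors/manage?q={query}",
    "ddslo": "https://app.datadoghq.com/slo?query={query}",
}


def expand(keyword, query):
    """Return the URL for a keyword with the query percent-encoded."""
    # safe="" also encodes ":" and "/", which appear in tag queries.
    return TEMPLATES[keyword].format(query=quote(query, safe=""))


print(expand("ddm", "team:sre"))
print(expand("ddslo", "api latency"))
```

Typing `ddm team:sre` in the launcher would then open the monitor list already filtered to that tag.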
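How Datadog cost is calculated
• 99th percentile of the monthly host count (not the max)
• Reducing the number of hosts reduces Datadog cost
• The only option is to reduce the number of hosts (there is no shortcut)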
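A short sketch of why the 99th percentile differs from the max: a brief spike in host count falls into the top 1% of samples and does not affect the result. The sketch assumes hourly samples and the nearest-rank percentile method; Datadog's exact billing calculation may differ, and the sample numbers are made up.

```python
def percentile_host_count(hourly_host_counts, percentile=99):
    """Nearest-rank percentile of a list of host counts.

    Returns the smallest sample such that at least `percentile` percent
    of all samples are less than or equal to it.
    """
    if not hourly_host_counts:
        raise ValueError("at least one sample is required")
    ordered = sorted(hourly_host_counts)
    # Integer ceiling of percentile * n / 100, avoiding float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]


# 720 hourly samples (30 days): 10 hosts normally, 50 hosts for 5 hours.
short_spike = [10] * 715 + [50] * 5
print(max(short_spike), percentile_host_count(short_spike))

# The same spike lasting 20 hours exceeds 1% of the month and counts.
long_spike = [10] * 700 + [50] * 20
print(max(long_spike), percentile_host_count(long_spike))
```

With these samples a 5-hour spike is ignored, while a 20-hour spike sets the value, which is why only a sustained reduction in hosts lowers the cost.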
New services and Datadog
• When a new service is built, there is a "Production Readiness Checklist" https://quipper.hatenablog.com/entry/2020/01/30/production-readiness-with-all
Production Readiness Checklist - Monitoring / Logging
Template Dashboard
Let's look at an actual Template Dashboard
Production Readiness Checklist - SLI/SLO
• Just create it with the generator and add it to the dashboard!
• What the SLO itself should be is decided in the Design Doc phase
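Examples of using custom metrics
• Measure the time spent in CI
  • Docker build time, the time for each test, and so on
• Measure and visualize CircleCI cost
• Visualize the PRs opened by Dependabot
• To handle access spikes, send load information as custom metrics and use it from the Kubernetes HPA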
I talked about this at Kubernetes meetup tokyo #38
https://twitter.com/chaspy_/status/1352194206962388995?s=20
https://speakerdeck.com/chaspy/v2beta2-and-examples-of-using-hpa-external-metrics-with-datadog
https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa
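Benefits of sending information about the things around you as custom metrics
• You can visualize the current state
• You can filter or exclude by any tag
• You can fire alerts from the metrics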
A design pattern for sending custom metrics
• In a Kubernetes environment, the Autodiscovery feature is handy https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
Diagram: a custom metric generator (Kubernetes Deployment) fetches data via API from an external system that has an API, and exports metrics over HTTP in Prometheus format for Datadog-agent
A design pattern for sending custom metrics
• In a Kubernetes environment, the Autodiscovery feature is handy https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
Diagram: external system with an API, custom metric generator (Kubernetes Deployment) that fetches data via API, Datadog-agent
Annotations on the generator's pod:
    annotations:
      ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
        ["prometheus"]
      ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
        [{}]
      ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
        [
          {
            "prometheus_url": "http://%%host%%:8080/metrics",
            "namespace": "namespace",
            "metrics": ["*"]
          }
        ]
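The generator side of this pattern is not shown in the deck. As a minimal sketch using only the Python standard library, the following serves metrics in the Prometheus text format at `/metrics` on port 8080, the path and port that `prometheus_url` points at. The function `fetch_values` and the metric name `open_pull_requests` are hypothetical placeholders for the call to the external system's API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(values):
    """Render a dict of metric name -> number as Prometheus text format."""
    lines = []
    for name, value in sorted(values.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def fetch_values():
    """Hypothetical placeholder for fetching data from an external API."""
    return {"open_pull_requests": 3}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_values()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` from the container's entrypoint starts the exporter. The agent's prometheus check, enabled by the annotations, then collects whatever `render_metrics` returns, so adding a metric only requires returning one more key from `fetch_values`.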
A design pattern for sending custom metrics
• In a Kubernetes environment, the Autodiscovery feature is handy https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
Diagram: external system with an API, custom metric generator (Kubernetes Deployment) that fetches data via API, Datadog-agent
• GitHub PRs and issues
• AWS Security Events (GuardDuty / ECR Image Scan)
• EC2 hosts per OS
• RDS clusters per version
Summary
• If there is something you do not understand in Datadog...
  • Ask Support right away! They are fast and thorough!
  • Asking in the Japan Datadog User Group Slack is ok too!
• Use Monitor Summary and Weekly Report to reduce the monitors themselves
• Custom metrics are the real joy of Datadog, so make more and more use of them!
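More
• If you want to see the actual graphs, come by when nothing is being recorded!
  • The zoom breakout room after this talk works too
• If you contact me individually on Twitter, I am happy to talk through Datadog questions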
Promotion
Thank you!
Takeshi Kondo (chaspy, chaspy_)
Lead Software Engineer, Site Reliability at Quipper