Life with Datadog
Takeshi Kondo
January 24, 2021
Technology
July Tech Festa 2021 winter
https://techfesta.connpass.com/event/193966/
Transcript
Life with Datadog • Takeshi Kondo / @chaspy • 2021/01/24 July Tech Festa Winter 2021
Who am I • Takeshi Kondo (chaspy, chaspy_) • Lead Software Engineer, Site Reliability at Quipper
Japan Datadog User Group Organizer • https://datadog-jp.connpass.com/
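Contributed a hands-on Datadog article to the February issue of Software Design • [Special Feature 2] How to start and keep up system monitoring, Chapter 3: SaaS monitoring in practice with Datadog https://twitter.com/chaspy_/status/1350270946767081472?s=20
Recommended for Datadog beginners!
I received a T-shirt and socks https://twitter.com/chaspy_/status/1352577570278043649?s=20 If you want them, you might get them by speaking at a meetup or joining a mob programming session!
Today I will talk about my love for Datadog https://www.datadoghq.com/about/resources/ Monitoring SaaS
Today's talk • Audience • System operators • People interested in monitoring SaaS • People interested in Datadog • Goals • Learn what is good about Datadog • Get ideas for making use of Datadog by learning about its various advanced features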
Agenda • What I like about Datadog • Support full of love for the product • Rich and cute GUI • SLO Summary & Alert • Life with Datadog • Monitor Summary • Weekly Summary • Combining with Alfred • Life with Datadog: advanced • Living with Datadog cost • New services and Datadog • Making use of custom metrics
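What I like about Datadog: the support is amazing • Responses are fast (they almost always come back within 24 hours) • Answers are accurate • They suggest a workaround even for things that cannot be done • You can see that they want to make the product better
How to ask support • I basically open a ticket (in English) • Help -> resource -> Tickets & Email Support • If English really is not an option, apparently you can ask for an answer in Japanese and Japanese support will pick it up during business hours • Apparently typing @support-datadog in the Event Stream gives you a live chat (I have never used it)
Examples of past questions • Terraform apply of an SLO fails with Invalid Query • Asked whether Monitor Summary can be retrieved as metrics • How to get EKS control plane metrics • How to send an alert when an SLO is violated • Asked whether the number of API keys can be increased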
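Asking whether Monitor Summary can be retrieved as metrics • Do you know Monitor Summary? • It is the email that arrives once a week • https://app.datadoghq.com/report/monitor • In fact there is no path to it from the GUI (you can only get there from the email)
Monitor Summary
Monitor Summary • I want to filter down to just the alerts for an arbitrary channel
Raw data (CSV) can apparently be fetched via the API • The CSV can be downloaded • I wrote code that sends custom metrics from this CSV https://app.datadoghq.com/report/hourly_data/monitor https://docs.datadoghq.com/monitors/faq/how-can-i-export-alert-history/?tab=us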
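The CSV-to-custom-metrics idea can be sketched roughly as follows. This is not the speaker's code: the column names (`monitor_name`, `notification_channel`), the metric name `monitor_summary.alert_count`, and the sample rows are all hypothetical, since the slides do not show the export's schema. The sketch counts alert rows per monitor and channel and formats each count as a DogStatsD gauge datagram (`name:value|g|#tags`).

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an exported alert history; real column names may differ.
SAMPLE_CSV = """monitor_name,notification_channel
api-5xx,slack-sre-alert
api-5xx,slack-sre-alert
db-cpu,slack-db-alert
"""


def count_alerts(csv_text):
    """Count alert rows per (monitor, channel) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row["monitor_name"], row["notification_channel"]) for row in reader
    )


def to_dogstatsd(counts, metric="monitor_summary.alert_count"):
    """Format counts as DogStatsD gauge datagrams: name:value|g|#tag:v,tag:v"""
    return [
        f"{metric}:{n}|g|#monitor:{monitor},channel:{channel}"
        for (monitor, channel), n in sorted(counts.items())
    ]


datagrams = to_dogstatsd(count_alerts(SAMPLE_CSV))
```

Each datagram could then be sent over UDP to a Datadog Agent, where DogStatsD listens on port 8125 by default. Tagging by channel is what would make the per-channel filtering possible on the Datadog side.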
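How to send an alert when an SLO is violated • At the time, the SLO Error Budget Alert feature did not exist yet • A Product Manager showed up, said the feature was in active development, and asked whether they could do a user interview • I answered several questions on the support ticket • After that, SLO Alert was released in beta!
Engagement goes up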
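For readers new to the term, the arithmetic behind an error budget alert can be sketched like this. The 99.9% target, the 30-day window, the 30 bad minutes, and the 50% threshold are example values chosen for illustration, not figures from the talk, and the function names are hypothetical.

```python
def error_budget_minutes(target_percent, window_days):
    """Allowed 'bad' minutes in the window for a time-based SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


def budget_consumed(bad_minutes, target_percent, window_days):
    """Fraction of the error budget already spent (1.0 means fully spent)."""
    return bad_minutes / error_budget_minutes(target_percent, window_days)


# A 99.9% target over 30 days leaves about 43.2 minutes of budget.
budget = error_budget_minutes(99.9, 30)
consumed = budget_consumed(30, 99.9, 30)

# Hypothetical alert rule: notify once half of the budget is gone.
should_alert = consumed >= 0.5
```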
Rich and cute GUI • Many ways to visualize • Changes show up immediately • Many functions • Copy and paste is easy
Let's actually play with a graph
SLO Summary & Alert
Datadog's SLO feature is a partner I cannot do without https://quipper.hatenablog.com/entry/2020/01/30/slo-review
Managed with Terraform, and with a generator they are easy to create when a new service is set up • Let's actually take a look
We look at Monitor Summary every morning at the Daily Standup
Monitor Summary • muted:false tag:(("severity:alert" OR "severity:emergency") AND "team:sre") • Every alert has a severity tag and a team tag • We check only the alerts we are responsible for at the Daily Standup • If something is NG, the person on duty deals with it
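Because every alert carries a severity tag and a team tag, the filter above can be generated per team. A small sketch, assuming the query syntax shown on the slide; the function name and parameters are hypothetical.

```python
def monitor_summary_query(team, severities, include_muted=False):
    """Build a filter string in the same shape as the one shown on the slide."""
    severity_clause = " OR ".join(f'"severity:{s}"' for s in severities)
    muted = "true" if include_muted else "false"
    return f'muted:{muted} tag:(({severity_clause}) AND "team:{team}")'


query = monitor_summary_query("sre", ["alert", "emergency"])
```

Calling it with another team name gives that team the same view for its own standup.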
Weekly Summary • You get an email like this • [Monitor Report] You received xxx alerts
Weekly Summary • You can actually view it here • https://app.datadoghq.com/report/monitor • Clicking lets you narrow down by week, time of day, alert channel, and monitor • We use this to track how the number of alerts itself changes over time
Combining with Alfred • So that dashboards, monitors, and SLOs can be opened quickly • Registered as Web Searches • ddd https://app.datadoghq.com/dashboard/lists?q={query} • ddm https://app.datadoghq.com/monitors/manage?q={query} • ddslo https://app.datadoghq.com/slo?query={query}
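What these Web Searches do is substitute the typed text for `{query}` in the URL. The Python sketch below only imitates that substitution for illustration. The keyword-to-URL table is taken from the slide, while the `expand` helper and the exact URL-encoding are assumptions rather than a description of how Alfred behaves.

```python
from urllib.parse import quote

# Keyword -> URL template, as registered on the slide.
SEARCH_TEMPLATES = {
    "ddd": "https://app.datadoghq.com/dashboard/lists?q={query}",
    "ddm": "https://app.datadoghq.com/monitors/manage?q={query}",
    "ddslo": "https://app.datadoghq.com/slo?query={query}",
}


def expand(keyword, text):
    """Fill the {query} placeholder with the percent-encoded search text."""
    return SEARCH_TEMPLATES[keyword].format(query=quote(text, safe=""))


url = expand("ddm", "team:sre api")
```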
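How Datadog cost is calculated • The 99th percentile of the host count over the month (not the max) • Reducing the number of hosts reduces Datadog cost • The only way is to reduce the number of hosts (there is no royal road)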
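The difference between the 99th percentile and the max can be seen with a small calculation. This is a sketch under assumptions: the hourly sampling, the nearest-rank percentile definition, and the host counts are hypothetical, and Datadog's actual billing computation may differ in detail.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical month of 720 hourly samples: 40 hosts normally, a 5-hour burst to 100.
hourly_hosts = [40] * 715 + [100] * 5

billable = percentile_nearest_rank(hourly_hosts, 99)
peak = max(hourly_hosts)
```

In this example the short burst does not change the 99th percentile, so only a sustained reduction in hosts lowers the figure.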
New services and Datadog • We have a "Production Readiness Checklist" for building a new service https://quipper.hatenablog.com/entry/2020/01/30/production-readiness-with-all
Production Readiness Checklist - Monitoring / Logging
Template Dashboard
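Let's actually look at the Template Dashboard
Production Readiness Checklist - SLI/SLO • Just create it with the generator and add it to the dashboard! • What the SLO itself should be is decided in the Design Doc phase
Examples of using custom metrics • Measure the time spent in CI • Docker build time, the time of each test, and so on • Measure and visualize CircleCI cost • Visualize the PRs that Dependabot has left open • To cope with access spikes, send load information as custom metrics and use it from the Kubernetes HPA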
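As an illustration of the CI timing example, here is a minimal sketch that times a step and formats the result as a DogStatsD timer datagram (`name:value|ms|#tags`). The metric name `ci.step.duration`, the tags, and the `timed` helper are hypothetical, and the sleep stands in for a real step such as a Docker build.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(metric, tags, sink):
    """Time the enclosed block and append a DogStatsD timer datagram to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        sink.append(f"{metric}:{elapsed_ms:.0f}|ms|#{tag_str}")


datagrams = []
with timed("ci.step.duration", {"step": "docker_build", "repo": "example"}, datagrams):
    time.sleep(0.01)  # stand-in for the real CI step
```

Tagging by step makes it possible to graph build time and test time separately from one metric.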
I talked about this at Kubernetes Meetup Tokyo #38 https://twitter.com/chaspy_/status/1352194206962388995?s=20 https://speakerdeck.com/chaspy/v2beta2-and-examples-of-using-hpa-external-metrics-with-datadog https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa
Benefits of sending the information around you as custom metrics • You can visualize the current state • You can filter or exclude by any tag • You can fire alerts from the metric
Design pattern for sending custom metrics • In a Kubernetes environment, the Autodiscovery feature is handy https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes • Diagram: an external system that has an API • a custom metric generator (Kubernetes Deployment) fetches data via the API • it exports the metrics over HTTP in Prometheus format • Datadog-agent collects them
Autodiscovery annotations shown for the generator:
    annotations:
      ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
        ["prometheus"]
      ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
        [{}]
      ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
        [
          {
            "prometheus_url": "http://%%host%%:8080/metrics",
            "namespace": "namespace",
            "metrics": ["*"]
          }
        ]
Examples of external data shown on the slide • Number of GitHub PRs and issues • AWS security events (GuardDuty / ECR Image Scan) • Number of EC2 hosts per OS • Number of RDS clusters per version
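The generator side of this pattern can be sketched with the Python standard library alone. This is an illustration under assumptions, not the speaker's exporter: `fetch_counts` stands in for a call to some external API, and the metric names and values are made up. The port (8080) and path (`/metrics`) match the `prometheus_url` in the annotations.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_counts():
    """Stand-in for a call to an external API (hypothetical values)."""
    return {"open_pull_requests": 12, "open_issues": 34}


def render_metrics(counts):
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counts.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_counts()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever, serving metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` in the container's entrypoint would expose the endpoint that the agent's `prometheus` check scrapes.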
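Summary • If there is something you do not understand in Datadog... • Ask Support right away! They are fast and thorough! • Asking in the Japan Datadog User Group Slack is OK too! • Use Monitor Summary and Weekly Report to reduce the monitors themselves • Custom metrics are the real appeal of Datadog, so make more and more use of them!
More • If you want to see actual graphs, come when there is no recording! • The Zoom breakout room after this works too • If you contact me individually on Twitter, I am happy to talk about Datadog
Promotion
Thank you! • Takeshi Kondo (chaspy, chaspy_) • Lead Software Engineer, Site Reliability at Quipper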