Alert Handling with Datadog Incident Management
Takeshi Kondo
August 25, 2020
JDDUG meetup#1
https://datadog-jp.connpass.com/event/185920/
Transcript
Alert Handling with Datadog Incident Management Takeshi Kondo / @chaspy
2020/08/25 JDDUG meetup#1
Datadog Incident Management
Blog: https://www.datadoghq.com/blog/incident-response-with-datadog/
Docs: https://docs.datadoghq.com/monitors/incident_management/ (Cool)
Who am I: Takeshi Kondo (chaspy, chaspy_), Lead Software Engineer, Site Reliability Engineering at Quipper
Agenda • Introduction of Datadog Incident Management • Alert Handling in Quipper
Incident Response 6-Step Plan (https://www.varonis.com/blog/incident-response-plan/)
1. Preparation
2. Identification
3. Containment
4. Eradication
5. Recovery
6. Review lessons learned -> Postmortem
Incident Management
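The six steps above can be sketched as an ordered enumeration. This is only an illustrative model, not part of the talk or of Datadog's product: the names `Step` and `next_step` are hypothetical and exist just to show that the plan is a fixed sequence ending in the lessons-learned review (the postmortem).

```python
from enum import IntEnum


class Step(IntEnum):
    """The six incident response steps, in order (hypothetical model)."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6  # the review step, i.e. the postmortem


def next_step(step):
    """Return the step that follows `step`, or None after the final review."""
    if step is Step.LESSONS_LEARNED:
        return None
    return Step(step + 1)


# Walk the whole plan from the first step to the last.
plan = []
current = Step.PREPARATION
while current is not None:
    plan.append(current.name)
    current = next_step(current)
```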
Datadog Incident Management • Overview • Timeline • Remediation
Datadog Incident Management: Overview
Severity Levels: Smart default and configurable
Status Levels and Properties Fields
Datadog Incident Management: Timeline
Datadog Incident Management: Remediation
Agenda • Introduction of Datadog Incident Management • Alert Handling in Quipper
See “Alerting Strategy for Self-Contained Team” https://speakerdeck.com/chaspy/alerting-strategy-for-self-contained-team
Review alerts daily
Review alerts at the daily standup
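A minimal sketch of how a daily review list might be prepared, assuming alerts are already available as plain records. Nothing here comes from the talk or from the Datadog API: the record fields `monitor` and `fired_at`, the function `alerts_to_review`, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def alerts_to_review(alerts, now, window=timedelta(days=1)):
    """Count alerts per monitor that fired within `window` before `now`.

    `alerts` is a list of dicts with hypothetical keys "monitor" and
    "fired_at"; the result is sorted from most to least frequent.
    """
    recent = [a for a in alerts if now - window <= a["fired_at"] <= now]
    return Counter(a["monitor"] for a in recent).most_common()


# Hypothetical sample data for a standup held at 10:00 on 2020-08-25.
now = datetime(2020, 8, 25, 10, 0)
alerts = [
    {"monitor": "api-latency", "fired_at": datetime(2020, 8, 25, 3, 0)},
    {"monitor": "api-latency", "fired_at": datetime(2020, 8, 24, 22, 0)},
    {"monitor": "disk-usage", "fired_at": datetime(2020, 8, 24, 12, 0)},
    {"monitor": "disk-usage", "fired_at": datetime(2020, 8, 23, 9, 0)},  # outside the window
]
summary = alerts_to_review(alerts, now)
```

Sorting by frequency puts the noisiest monitors at the top of the list, which is one way to decide what to discuss first at the standup.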
Thank You! Takeshi Kondo (chaspy, chaspy_), Lead Software Engineer at Quipper