Narimichi Takamura
January 26, 2025
Improving Incident Response using Incident Key Metrics
Slides presented at SRE Kaigi 2025. The main topic is TTX metrics.
https://2025.srekaigi.net/
Transcript
Topotal, Inc.
• https://topotal.com
• A startup centered on SRE
• Runs two businesses
  • SRE as a Service
  • SaaS for SRE (Waroom)
• Platinum sponsor of this event
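SRE as a Service
• topotal.com/services/sre-as-a-service
• A technical support service specializing in SRE
• Examples of support
  • Introducing SLIs/SLOs and improving their operation
  • Building and improving CI/CD
  • Improving incident management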
Waroom
• waroom.com
• A SaaS for handling incidents as an organization
• Automates and reduces the effort of Slack-based incident response
Build feedback for improvement
Agenda
1. The problem with MTTR
2. Defining practical TTX metrics
3. Examples of using TTX metrics
4. Advanced metrics
1. The problem with MTTR
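What is MTTR (mean time to recovery)?
• The average time from when a failure occurs until it is repaired or recovered
• Short for Mean Time To Recovery (also Repair, Resolve, Restore)
• How it is calculated [1]
  • MTTR = total repair time / total number of failures
• It is also one of the Four Keys metrics
[1] "What is MTTR (mean time to recovery)? How it is calculated and how it relates to MTBF in failures and availability"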
SREs should move away from defaulting to the assumption that MTTX can be useful.
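Verifying the effectiveness of MTTR
• Hypothesis
  • If MTTR is a valid metric, improving (shortening) TTR should also improve MTTR
• Outline of the verification
  • Split the dataset 1:1, improve TTR by 10% in one half, leave the other half untouched, then compute and compare MTTR
  • Check whether MTTR improves by 10%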
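Verifying the effectiveness of MTTR
1. Randomly split the incident dataset [2] into two
2. Reduce the repair time (TTR) of one half by 10%
3. Compute the MTTR (mean time to repair) of each half
4. Take the difference in MTTR between the two halves
  • diff = MTTR(unmodified) - MTTR(modified)
  • diff > 0 => MTTR improved
  • diff < 0 => MTTR worsened
5. Repeat steps 1 to 4 100,000 times
[2] The dataset was taken from the incident status dashboards of three well-known internet companies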
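The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: the function name `simulate_mttr_diffs`, the synthetic log-normal durations, and the default of 1,000 trials (the talk repeats the experiment 100,000 times) are all assumptions made for the example.

```python
import random
import statistics


def simulate_mttr_diffs(durations, reduction=0.10, trials=1000, seed=0):
    """Repeat the split / shorten / compare experiment and return the MTTR diffs."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        shuffled = durations[:]
        rng.shuffle(shuffled)  # step 1: random 1:1 split
        half = len(shuffled) // 2
        unmodified = shuffled[:half]
        # step 2: shorten every TTR in the other half by `reduction`
        modified = [d * (1 - reduction) for d in shuffled[half:]]
        # steps 3-4: diff > 0 looks like an improvement, diff < 0 like a regression
        diffs.append(statistics.mean(unmodified) - statistics.mean(modified))
    return diffs


# Synthetic heavy-tailed incident durations in minutes (an assumption, not the talk's data).
data_rng = random.Random(42)
durations = [data_rng.lognormvariate(3.5, 1.2) for _ in range(200)]

diffs = simulate_mttr_diffs(durations)
worse = sum(1 for d in diffs if d < 0) / len(diffs)
print(f"MTTR looked worse despite the 10% improvement in {worse:.0%} of trials")
```

With durations that have no spread at all, every trial reports exactly the 10% gain; the more heavy-tailed the input, the more often a genuine improvement is hidden by the random split.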
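Characteristics of incident data [3]
• Most incidents are resolved fairly quickly
• A few turn into catastrophic incidents (black swan events)
• → When the dataset is split at random, the uneven spread of catastrophic incidents has a large effect on the computed MTTR
[3] The VOID Report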
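Reference: black swan events
• Events that cannot be anticipated and that cause devastating results
• In Europe, swans were once believed to be white only
• "Unexpected major events" came to be called "black swans"
• The term spread with the book "The Black Swan", published in 2007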
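Simulation results
Even though the repair time of every incident was shortened by 10%, MTTR became at least 10% shorter in only 49%, 50%, and 64% of cases
→ In roughly half of the cases, the shorter repair times are not reflected in MTTR
Reference: results of simulating without changing the repair times
→ Whether or not any improvement work was done, MTTR improves or worsens depending on the dataset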
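What "Incident Metrics in SRE" argues
• What the simulation showed
  • Incident durations vary so widely that improvements are hard to see in MTTR
  • There are a fair number of cases where MTTR worsens despite an improvement
• Conclusion
  • MTTR is not useful as a metric for evaluating improvement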
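What was the problem?
• Incident durations are highly variable
• MTTR is used as a metric of some kind
• The results of improvement are checked against that metric
None of these is a problem on its own
→ The problem is that the purpose and the metric do not fit together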
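The flow of data analysis (hypothesis-testing type)
The line of thinking when MTTR is used as a metric
What was happening: an inconsistency in the hypothesis-testing logic
Solution: make clear what is being improved, and reduce variability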
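Supplement: how TTR can be used
The mean (MTTR) is too coarse
→ Comparing distributions can be a lead for finding issues
• e.g. Fewer incidents that recover immediately
  • → Is automatic recovery of minor failures paying off?
  • → Is something wrong with the failure-detection mechanism?
• e.g. More black swan events
  • → Is the quality of the code or infrastructure declining?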
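One way to compare TTR distributions rather than means is to look at percentiles and at the share of quickly resolved incidents. The sketch below is only an illustration: `ttr_profile`, the 5-minute `quick_threshold`, and the two sample periods are hypothetical and do not come from the slides.

```python
import statistics


def ttr_profile(durations, quick_threshold=5.0):
    """Summarize a TTR distribution instead of collapsing it into a single mean."""
    # quantiles(..., n=100) returns 99 cut points; index 49 is p50, index 89 is p90.
    q = statistics.quantiles(durations, n=100, method="inclusive")
    return {
        "mean": statistics.mean(durations),
        "p50": q[49],
        "p90": q[89],
        # Share of incidents that recovered almost immediately.
        "quick_share": sum(d <= quick_threshold for d in durations) / len(durations),
    }


# Hypothetical TTR samples in minutes for two periods.
previous_period = [2, 3, 3, 4, 5, 8, 12, 20, 45, 300]
current_period = [6, 8, 9, 10, 12, 15, 20, 30, 60, 90]

for name, sample in [("previous", previous_period), ("current", current_period)]:
    print(name, ttr_profile(sample))
```

In this made-up data the mean drops because the one very long incident is gone, yet the share of immediately recovered incidents also drops, which is the kind of shift the questions above are meant to surface.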
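Summary so far
• MTTR (recovery time) data is highly variable, which makes it unsuitable as an improvement metric
• Variability can be reduced by making clear what is being improved and using finer-grained TTX metrics
→ This creates demand for metrics finer than TTR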
2. Practical TTX metrics
What Waroom considers practical metrics
• They are comprehensive
• They are fine-grained
• They are realistic to collect
Which TTX metrics should we collect?
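The challenges with TTX metrics
• Several examples have been published, but their definitions are not unified
• Trying to combine the examples leads to overlaps and gaps
• → Aim for a fine-grained, comprehensive definition based on well-known literature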
The flow of defining TTX metrics
1. Learn the best practices
2. Define incident statuses
3. Define incident milestones (the boundaries between statuses)
4. Define TTX metrics
Learn the best practices
Define the statuses roughly
Turn the milestones into TTX metrics
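Column: collecting metrics is hard
• With fine-grained metrics, a timestamp has to be recorded every time a milestone is passed
• Having people record each timestamp by hand during a response is unrealistic
• → Waroom collects them automatically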
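Example of automatic collection triggered by events during a response
Milestone | Event during the response
Detected | Alert notification
Acknowledged | Channel creation, incident ticket creation
Identified (solution identified) | Runbook phase split (Precheck and Resolution)
Recovered | AI judges from the Slack conversation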
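The idea of stamping milestones from response events can be sketched as follows. This is not Waroom's implementation: the event names in `EVENT_TO_MILESTONE`, the helper functions, and the sample timestamps are hypothetical, and only the four milestone names come from the table above.

```python
from datetime import datetime, timedelta

# Milestone order taken from the table above.
MILESTONES = ["detected", "acknowledged", "identified", "recovered"]

# Hypothetical event names; a real integration would map whatever its tooling emits.
EVENT_TO_MILESTONE = {
    "alert_notified": "detected",
    "incident_channel_created": "acknowledged",
    "runbook_resolution_started": "identified",
    "recovery_confirmed": "recovered",
}


def record_event(timeline, event, at):
    """Store the first timestamp seen for the milestone that `event` maps to."""
    milestone = EVENT_TO_MILESTONE.get(event)
    if milestone is not None:
        timeline.setdefault(milestone, at)
    return timeline


def durations_between_milestones(timeline):
    """Return the elapsed time between each pair of consecutive recorded milestones."""
    reached = [m for m in MILESTONES if m in timeline]
    return {
        f"{a}->{b}": timeline[b] - timeline[a]
        for a, b in zip(reached, reached[1:])
    }


start = datetime(2025, 1, 26, 10, 0)
timeline = {}
record_event(timeline, "alert_notified", start)
record_event(timeline, "incident_channel_created", start + timedelta(minutes=4))
record_event(timeline, "runbook_resolution_started", start + timedelta(minutes=30))
record_event(timeline, "recovery_confirmed", start + timedelta(minutes=50))
ttx = durations_between_milestones(timeline)
print(ttx)
```

Keeping only the first timestamp per milestone means a repeated alert or a second channel message does not move a boundary that has already been crossed.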
3. Using TTX metrics
To use metrics effectively, align the purpose of the analysis with the characteristics of the metrics
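Examples of metrics and improvement measures
TTX | Issue | Improvement measure
TTDetect (detection) | It takes a long time from occurrence to detection | Improve monitoring
TTEngage (team formation) | It takes a long time to assemble the response team | Clarify shifts and roles, introduce an on-call system
TTInvestigate (investigation) | It takes a long time to isolate the failure | Prepare runbooks and dashboards
TTFix (repair) | It takes a long time to repair the failure | Speed up rollbacks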
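Start from a vague hypothesis and find issues from the distribution
Hypothesis | Example of a newly found issue
For incidents within one company, the TTX distribution should be consistent | Performance differs by service and by team
Each TTX should be close to what is expected (e.g. TTA within about 10 minutes) | (In fact) work starts late across the board, and identifying a solution is slow across the board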
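Checking such hypotheses amounts to grouping a TTX metric by team or service and comparing it with the expected value. The sketch below assumes incident records shaped as dictionaries with a `team` key and a `tta` value in minutes; the record layout, function names, and numbers are all hypothetical.

```python
import statistics
from collections import defaultdict


def median_ttx_by_team(records, metric):
    """Group incident records by team and return the median of one TTX metric per team."""
    grouped = defaultdict(list)
    for record in records:
        if metric in record:
            grouped[record["team"]].append(record[metric])
    return {team: statistics.median(values) for team, values in grouped.items()}


def teams_over_target(medians, target_minutes):
    """List teams whose median exceeds the expected value, slowest first."""
    over = [(team, m) for team, m in medians.items() if m > target_minutes]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical records; `tta` is in minutes.
records = [
    {"team": "payments", "tta": 4},
    {"team": "payments", "tta": 6},
    {"team": "payments", "tta": 8},
    {"team": "search", "tta": 12},
    {"team": "search", "tta": 20},
    {"team": "search", "tta": 16},
]

medians = median_ttx_by_team(records, "tta")
print(teams_over_target(medians, target_minutes=10))
```

The median is used here instead of the mean so that a single very long incident does not dominate a team's figure, in line with the earlier point about variability.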
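4. Advanced metrics
What matters besides service recovery
• The TTX metrics covered so far focus on system recovery
• Real incident response has to take care of people as well as systems
• Using metrics from the customer-support and business-operations perspectives brings an organization-wide response, including members other than engineers, closer to reality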
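Examples of advanced metrics
Also focus on customer support and root-cause countermeasures, involve a variety of roles, and accelerate organization-wide incident response
Metric name | Target role | Purpose
Incident Response Metrics | Engineer | Metrics for identifying and improving issues in the recovery work itself
Customer Reliability Metrics | Sales, CRE | Metrics for identifying and improving issues in customer support
Learning Metrics | Manager, Engineer | Tracking the activities up to the point where the organization has learned from the incident
Improvement Metrics | Manager, Engineer | Analyzing how far root-cause countermeasures have been carried out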
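Summary
These are the five points covered in this talk. If anything is unclear, please come to Ask the Speaker!
1. MTTR is not useful as an improvement metric
  • Reason: incident data is highly variable
2. When using metrics, consistency from the purpose through to the data analysis is important
3. To reduce variability, it is important to make the question concrete and to break the metrics down
4. How TTX metrics are defined and used in Waroom
5. Metrics that matter besides service recovery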
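In closing
• Building a mechanism that collects metrics automatically is hard
• Building a visualization platform on top of that is also hard
• Extracting subsets by cause category or arbitrary labels is hard as well
• → Please make use of Waroom
• If this caught your interest, please visit the Topotal booth
Thank you very much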