Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Topotal, Inc. (pronounced "Topotaru")
• https://topotal.com
• A startup centered on SRE
• Operates two businesses
  • SRE as a Service
  • SaaS for SRE (Waroom)
• Platinum sponsor of this event

Slide 4

Slide 4 text

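SRE as a Service
• topotal.com/services/sre-as-a-service
• A technical support service specializing in SRE
• Examples of support
  • Introducing SLIs/SLOs and improving their operation
  • Building and improving CI/CD
  • Improving incident management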

Slide 5

Slide 5 text

Waroom (pronounced "Warūmu")
• waroom.com
• A SaaS for handling incident response as an organization
• Automates and reduces the effort of Slack-based incident response

Slide 6

Slide 6 text


Slide 7

Slide 7 text

Building feedback for improvement

Slide 8

Slide 8 text


Slide 9

Slide 9 text

Agenda
1. Problems with MTTR
2. Defining practical TTX metrics
3. Examples of using TTX metrics
4. Advanced metrics

Slide 10

Slide 10 text

1. Problems with MTTR

Slide 11

Slide 11 text

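What is MTTR (mean time to recovery)?
• The average time from when a failure occurs until it is repaired or recovered
• Short for Mean Time To Recovery (also Repair, Resolve, Restore)
• How it is calculated [1]
  • MTTR = total repair time / total number of failures
• It is also one of the Four Keys metrics

[1] "What is MTTR (mean time to recovery)? How to calculate it and how it relates to MTBF in failure rate and availability"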
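The formula above can be illustrated with a minimal sketch. The function name `mttr` and the sample durations are assumptions made for illustration; they do not come from the talk.

```python
from datetime import timedelta


def mttr(repair_times: list[timedelta]) -> timedelta:
    """Mean time to recovery: total repair time / total number of failures."""
    if not repair_times:
        raise ValueError("at least one incident is required")
    # sum() needs a timedelta start value because the default start is int 0.
    return sum(repair_times, timedelta()) / len(repair_times)


# Hypothetical sample: three incidents that took 10, 20, and 60 minutes to repair.
sample = [timedelta(minutes=10), timedelta(minutes=20), timedelta(minutes=60)]
print(mttr(sample))  # 0:30:00
```

Note how the single 60-minute incident pulls the mean well above the two shorter ones; this sensitivity to outliers is what the following slides examine.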

Slide 12

Slide 12 text


Slide 13

Slide 13 text

SREs should move away from defaulting to the assumption that MTTX can be useful.

Slide 14

Slide 14 text

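Testing whether MTTR is effective
• Hypothesis
  • If MTTR is a valid metric, then improving (shortening) TTR should also improve MTTR
• Outline of the test
  • Split the dataset 1:1, improve TTR by 10% in one half, leave the other half untouched, then compute and compare MTTR
  • Check whether MTTR improves by 10%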

Slide 15

Slide 15 text

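Testing whether MTTR is effective
1. Randomly split the incident dataset [2] into two halves
2. Reduce the repair time (TTR) of one half by 10%
3. Compute the MTTR (mean time to repair) of each half
4. Take the difference in MTTR between the two halves
   • diff = MTTR(unmodified) - MTTR(modified)
   • diff > 0 => MTTR improved
   • diff < 0 => MTTR worsened
5. Repeat steps 1 to 4 100,000 times

[2] The datasets were obtained from the incident status dashboards of three well-known internet companies.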
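The procedure above can be sketched as a small Monte Carlo simulation. This is an illustration under stated assumptions, not the original analysis: the function name `simulate`, the log-normal synthetic data, and the reduced run count are all hypothetical, and the real study used incident data from public status dashboards.

```python
import random
from statistics import mean


def simulate(ttrs, improvement=0.10, runs=10_000, seed=0):
    """Return (share of runs where MTTR improved,
               share of runs where MTTR improved by at least `improvement`)."""
    rng = random.Random(seed)
    improved = 0
    reached_target = 0
    for _ in range(runs):
        # Step 1: random 1:1 split.
        shuffled = list(ttrs)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        unmodified = shuffled[:half]
        # Step 2: shorten every TTR in the other half by `improvement`.
        modified = [t * (1 - improvement) for t in shuffled[half:]]
        # Step 3: MTTR of each half.
        mttr_unmodified = mean(unmodified)
        mttr_modified = mean(modified)
        # Step 4: diff > 0 means the modified half looks better.
        diff = mttr_unmodified - mttr_modified
        if diff > 0:
            improved += 1
        if mttr_modified <= (1 - improvement) * mttr_unmodified:
            reached_target += 1
    # Step 5 is the loop itself.
    return improved / runs, reached_target / runs


# Synthetic, heavy-tailed TTRs in minutes (an assumption, not real incident data).
data_rng = random.Random(42)
synthetic_ttrs = [data_rng.lognormvariate(3.0, 1.5) for _ in range(200)]
improved, reached = simulate(synthetic_ttrs, runs=2_000)
print(f"MTTR improved in {improved:.0%} of runs; by at least 10% in {reached:.0%}")
```

With heavy-tailed input, which half happens to receive the few very long incidents can outweigh the 10% reduction, so the reported shares depend strongly on the data and on the random split.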

Slide 16

Slide 16 text

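Characteristics of incident data [3]
• Most incidents are resolved fairly quickly
• A few become disastrous incidents (black swan events)
• → When the dataset is split at random, the uneven distribution of disastrous incidents has a large effect on the computed MTTR

[3] The VOID Report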

Slide 17

Slide 17 text

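Reference: black swan events
• Events that cannot be anticipated and that cause catastrophic results
• In Europe, swans were once believed to be white birds only
• "Large, unexpected events" came to be called "black swans"
• The term spread with the book "The Black Swan", published in 2007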

Slide 18

Slide 18 text

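Simulation results
Even though the repair time of every incident was shortened by 10%, MTTR became at least 10% shorter in only 49%, 50%, and 64% of cases
→ About half of the time, the shorter repair times are not reflected in MTTR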

Slide 19

Slide 19 text

Reference: results of the simulation without changing the repair time
→ Regardless of whether any improvement was made, MTTR improves or worsens depending on the dataset

Slide 20

Slide 20 text

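The argument of "Incident Metrics in SRE"
• What the simulation showed
  • Incident durations vary so widely that the results of improvement are hard to see in MTTR
  • A fair number of cases get worse even after improvement
• Conclusion
  • MTTR is not useful as a metric for evaluating improvement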

Slide 21

Slide 21 text

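What was the problem?
• Incident durations are highly variable
• MTTR is used as a metric of some kind
• The results of improvement are checked against that metric
Each element is fine on its own → the problem is that the purpose and the metric do not fit together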

Slide 22

Slide 22 text

Flow of (hypothesis-testing) data analysis

Slide 23

Slide 23 text

Flow of thinking when MTTR is used as the metric

Slide 24

Slide 24 text

What was happening: an inconsistency in the hypothesis-testing logic

Slide 25

Slide 25 text

Solution: make clear what is being improved, and reduce variability

Slide 26

Slide 26 text

Solution: make clear what is being improved, and reduce variability

Slide 27

Slide 27 text

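Supplement: what TTR is good for
The mean (MTTR) is too coarse → comparing distributions gives clues for finding issues
• e.g., incidents that recover immediately have decreased
  • → A result of automatic recovery from minor failures?
  • → A defect in the failure detection mechanism?
• e.g., black swan events have increased
  • → A drop in code or infrastructure quality?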
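One way to compare distributions rather than means is to look at a few summary points side by side. The sketch below is only an illustration: the function name `summarize`, the 5-minute "quick recovery" threshold, and both sample lists are hypothetical.

```python
from statistics import median, quantiles


def summarize(ttr_minutes, quick_threshold=5):
    """Summarize a TTR distribution instead of collapsing it into one mean."""
    deciles = quantiles(ttr_minutes, n=10)  # 9 cut points; the last is the 90th percentile
    quick = sum(t <= quick_threshold for t in ttr_minutes)
    return {
        "median": median(ttr_minutes),
        "p90": deciles[-1],
        "quick_share": quick / len(ttr_minutes),
    }


# Hypothetical TTRs in minutes for two periods.
previous_period = [2, 3, 3, 4, 5, 8, 12, 20, 45, 300]
current_period = [6, 7, 9, 10, 12, 15, 18, 25, 40, 280]

for label, data in (("previous", previous_period), ("current", current_period)):
    print(label, summarize(data))
```

In this made-up data the share of quickly recovered incidents drops to zero in the current period even though the longest incident got shorter, which is the kind of shift a single mean would hide.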

Slide 28

Slide 28 text

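Summary so far
• MTTR (recovery time) is unsuitable as an improvement metric because the data is highly variable
• Variability can be reduced by making clear what is being improved and using finer-grained TTX metrics
→ This creates demand for metrics finer than TTR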

Slide 29

Slide 29 text

2. Practical TTX metrics

Slide 30

Slide 30 text

What Waroom considers practical metrics
• They are comprehensive
• They are fine-grained
• They are realistic to collect

Slide 31

Slide 31 text

Which TTX metrics should we collect?

Slide 32

Slide 32 text


Slide 33

Slide 33 text

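Concerns about TTX metrics
• There are several published examples, but the definitions are not unified
• Combining the examples leads to overlaps and gaps
• → Aim for a fine-grained, comprehensive definition based on well-known literature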

Slide 34

Slide 34 text

Flow for defining TTX metrics
1. Learn the best practices
2. Define the incident statuses
3. Define the incident milestones (the boundaries between statuses)
4. Define the TTX metrics

Slide 35

Slide 35 text

Learn the best practices

Slide 36

Slide 36 text

Define the statuses roughly

Slide 37

Slide 37 text


Slide 38

Slide 38 text


Slide 39

Slide 39 text

Turn the milestones into TTX metrics

Slide 40

Slide 40 text


Slide 41

Slide 41 text

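Column: collecting metrics is hard work
• Once fine-grained metrics are defined, a timestamp has to be recorded every time a milestone is passed
• Having a person record each timestamp during the response is unrealistic
• → Waroom collects them automatically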

Slide 42

Slide 42 text

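Example of automatic collection triggered by events during the response

Milestone | Event during the response
Detected | Alert notification
Acknowledged | Channel creation, incident filing
Identified (solution identified) | Runbook phase split (Precheck and Resolution)
Recovered | AI judges from the Slack conversation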
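Once a timestamp exists for each milestone (Detected, Acknowledged, Identified, Recovered), each TTX metric is simply the elapsed time between two milestones. The following is a minimal sketch under stated assumptions: the `occurred` milestone, the metric names, and the pairing of metrics to milestones are hypothetical and are not Waroom's actual definitions.

```python
from datetime import datetime, timedelta

# Assumed mapping: metric name -> (start milestone, end milestone).
TTX_DEFINITIONS = {
    "TTDetect": ("occurred", "detected"),
    "TTAcknowledge": ("detected", "acknowledged"),
    "TTIdentify": ("acknowledged", "identified"),
    "TTRecover": ("identified", "recovered"),
}


def compute_ttx(milestones: dict[str, datetime]) -> dict[str, timedelta]:
    """Compute every TTX metric whose two milestones were both recorded."""
    result = {}
    for name, (start, end) in TTX_DEFINITIONS.items():
        if start in milestones and end in milestones:
            result[name] = milestones[end] - milestones[start]
    return result


# Hypothetical incident timeline.
incident = {
    "occurred": datetime(2024, 1, 10, 9, 0),
    "detected": datetime(2024, 1, 10, 9, 4),
    "acknowledged": datetime(2024, 1, 10, 9, 10),
    "identified": datetime(2024, 1, 10, 9, 40),
    "recovered": datetime(2024, 1, 10, 9, 55),
}

for metric, duration in compute_ttx(incident).items():
    print(metric, duration)
```

Skipping metrics with a missing milestone, instead of raising an error, reflects the practical point above: timestamps collected during a live response are not always complete.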

Slide 43

Slide 43 text

3. Using TTX metrics

Slide 44

Slide 44 text

To use metrics effectively
Align the purpose of the analysis with the characteristics of the metric

Slide 45

Slide 45 text


Slide 46

Slide 46 text

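Examples of metrics and improvement measures

TTX | Issue | Improvement measure
TTDetect (detection) | It takes a long time from occurrence to detection | Improve monitoring
TTEngage (team formation) | It takes a long time to assemble the response team | Clarify shifts and roles; introduce an on-call system
TTInvestigate (investigation) | It takes a long time to isolate the failure | Prepare Runbook dashboards
TTFix (repair) | It takes a long time to repair the failure | Speed up rollbacks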

Slide 47

Slide 47 text


Slide 48

Slide 48 text

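Start from a vague hypothesis and find issues from the trends

Hypothesis | Example of a newly discovered issue
Incidents within the same company should show a consistent TTX trend | Performance differs by service and by team
Each TTX should be close to its expected value (e.g., TTA within about 10 minutes) | (In fact) response starts late overall, and identifying the solution is slow overall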
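Checking hypotheses like these comes down to grouping incidents and comparing a TTX metric per group against its expected value. The sketch below is illustrative only: the record layout, the service names, and the helper `median_by_group` are hypothetical, and the 10-minute target follows the TTA example above.

```python
from collections import defaultdict
from statistics import median


def median_by_group(incidents, group_key, metric):
    """Median of `metric` for each distinct value of `group_key`."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[incident[group_key]].append(incident[metric])
    return {group: median(values) for group, values in groups.items()}


# Hypothetical incident records; tta_minutes is the time to acknowledge.
incidents = [
    {"service": "checkout", "tta_minutes": 4},
    {"service": "checkout", "tta_minutes": 6},
    {"service": "search", "tta_minutes": 20},
    {"service": "search", "tta_minutes": 30},
    {"service": "search", "tta_minutes": 40},
]

TTA_TARGET_MINUTES = 10  # expected value taken from the hypothesis
medians = median_by_group(incidents, "service", "tta_minutes")
over_target = sorted(s for s, m in medians.items() if m > TTA_TARGET_MINUTES)
print(medians)
print("over target:", over_target)
```

The median is used here instead of the mean so that one disastrous incident does not dominate a group's figure.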

Slide 49

Slide 49 text


Slide 50

Slide 50 text


Slide 51

Slide 51 text

4. Advanced metrics

Slide 52

Slide 52 text

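What matters besides service recovery
• The TTX metrics covered so far focus on system recovery
• Real incident response has to take care of people, not only the system
• Using metrics from the customer-support and business-operations perspectives brings an organization-wide response, including non-engineers, closer to reality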

Slide 53

Slide 53 text

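Examples of advanced metrics
Focus on customer support and root-cause countermeasures, involve a variety of roles, and accelerate organization-wide incident response

Metric name | Target role | Purpose
Incident Response Metrics | Engineer | Identify issues in, and track improvement of, the recovery response itself
Customer Reliability Metrics | Sales, CRE | Identify issues in, and track improvement of, customer support
Learning Metrics | Manager, Engineer | Track activities until the organization gains its learnings
Improvement Metrics | Manager, Engineer | Analyze how far root-cause countermeasures have been implemented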

Slide 54

Slide 54 text

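Summary
We covered the following five points. If anything is unclear, please come to Ask the Speaker!
1. MTTR is not useful as an improvement metric
   • Reason: incident data is highly variable
2. When using metrics, consistency from the purpose through to the data analysis is important
3. To reduce variability, it is important to make the question concrete and to break the metrics down
4. How Waroom defined its TTX metrics and how it uses them
5. Metrics that matter besides service recovery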

Slide 55

Slide 55 text

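In closing
• Building a mechanism for collecting metrics automatically is hard work
• On top of that, building a visualization platform is hard work
• On top of that, extracting subsets by cause category or arbitrary labels is also hard work
• → Please consider using Waroom
• If you are interested, please visit the Topotal booth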

Slide 56

Slide 56 text

Thank you very much