
SRE を実現するための組織マネジメント / Management to achieve SRE



Takeshi Kondo

March 12, 2022



Transcript

  1. Organizational Management to Achieve SRE. Takeshi Kondo / @chaspy, 2022/03/12, 6-Company Joint SRE Study Session

  2. Who am I: chaspy / chaspy_. Engineering Manager, Site Reliability at Recruit Co., Ltd. Takeshi Kondo https://chaspy.me
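  3. Who am I: chaspy / chaspy_. Recruit Co., Ltd., Product Division, Product Development Management Office, Product Development Office, Manabi (Learning) Domain Product Development Unit, Elementary/Junior High/High School Product Development Department, Elementary/Junior High/High School SRE Group, Group Manager. Takeshi Kondo https://chaspy.me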
  4. What I will talk about today: a case study of using the Recruit Group's "mission management" to support a development team in acquiring SRE capabilities

  5. Or: a case study of (Partially) Embedded / Enabling SRE

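  6. Tl;dr
    • There are two ways for a development team to acquire reliability-related capabilities
      • Embedded SRE (from Pure SRE) / conveyed from the outside
      • Enabling SRE (in the Team) / spread from the inside
    • The best pattern depends on the size and phase of the organization
      • Small scale / early development phase: Embedded SRE Pattern
      • Medium to large scale / once the development team has matured: Enabling SRE Pattern
    • Both patterns can be designed through management
    • It does not have to be 100/0: practicing them even "partially" is effective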
  7. Disclaimer • I am presenting this as an example of management, but the results are thanks to the members who took on the missions, the SREs, and the development team. Thank you, as always!

  8. Agenda • Premise: what does it mean to achieve SRE? • Looking back at the history of SRE at "Study Sapuri" • Case study: (Partially) Embedded / Enabling SRE • Summary and what comes next
  9. Agenda • Premise: what does it mean to achieve SRE? • Looking back at the history of SRE at "Study Sapuri" • Case study: (Partially) Embedded / Enabling SRE • Summary and what comes next
  10. What it means to achieve SRE

  11. The development team has acquired the capability to control reliability

  12. What is Site Reliability Engineering in the first place? Not like this
    • The service maintains "high reliability (= 100%)"
    • SLIs/SLOs are being met
    • The development team runs the on-call rotation
    https://github.com/twitter/twemoji
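  13. What is Site Reliability Engineering in the first place? Like this!
    • The service maintains "the reliability that users expect"
    • SLIs/SLOs are set and used as an indicator for deciding priorities between non-functional and functional requirements
    • The team has agreed on monitoring methods and policies that let it respond appropriately when an SLO violation occurs
    • The above are reviewed regularly
    https://github.com/twitter/twemoji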
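  14. The development team acquires the capability to control reliability: Like this! [Diagram: SRE supports the development team in acquiring reliability-related capabilities; the development team controls the reliability of its own services by itself]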
  15. Team Topologies
    • Four team patterns
      • Stream Aligned
      • Platform
      • Enabling
      • Complicated Subsystem
    • Three communication patterns
      • Collaboration
      • X as a Service
      • Facilitation
    https://pub.jmam.co.jp/book/b593881.html
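  16. The development team acquires the capability to control reliability: Like this! [Diagram: SRE (Platform Team / Enabling Team) builds the platform and culture that support the development team's self-containment; the development team (Stream Aligned Team) can prepare what it needs by itself = self-contained]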
  17. The SRE Team's Vision / Mission / Values https://blog.studysapuri.jp/entry/sre-vision-mission-values

  18. Mission: build the platform and culture that let self-contained teams keep delivering products quickly and safely

  19. Agenda • Premise: what does it mean to achieve SRE? • Looking back at the history of SRE at "Study Sapuri" • Case study: (Partially) Embedded / Enabling SRE • Summary and what comes next
  20. (Before that) Product introduction

  21. None
  22. None
  23. None
  24. Looking back at the history of SRE at "Study Sapuri"

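  25. Looking back at the history of SRE at "Study Sapuri"
    • 2019: Migrated the application platform to Kubernetes
    • 2020: Established Microservices Readiness
      • Defined service ownership
      • Design Doc / Production Readiness Checklist
      • Self-service infrastructure (terraform monorepo)
      • SLI/SLO
    • 2021: Fully handed SLI/SLO operations over to the development teams
    [Annotations: as a Platform Team, building the platform; as an Enabling Team, fostering a culture of SLI/SLO and similar practices in the development organization]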
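  26. Changes in organization size [Chart: headcount of developers and SREs over time]. "Developers" counts Web Engineers (frontend & backend) at both Study Sapuri and Quipper. Native engineers are excluded.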
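  27. In 2021, we changed strategy to grow Enabling SREs from within the development teams
    • The "reliability" problems facing the development organization were becoming more specific to applications and domains
      • Load testing
      • Domain-specific Pod Auto Scaling
      • Measuring Frontend Performance and improving SLIs/SLOs
      • QA automation
    • Rather than having a single SRE Team act as the Enabling Team, we shifted toward creating Enabling SREs inside the development teams
    https://blog.studysapuri.jp/entry/2022/02/17/sre-study-session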
  28. Creating Enabling SREs within the development teams

  29. Creating Enabling SREs within the development teams

  30. The situation around 2020 [Diagram: the SRE team (Enabling Team) facilitating two development teams (Stream Aligned Teams)]
  31. As of 2022 [Diagram labels: SRE (Platform Team / Enabling Team); development team (Stream Aligned Team) made up of an Enabling SRE and members; Facilitation; X as a Service; "it becomes fractal"]
  32. Pure SRE vs Embedded SRE https://www.slideshare.net/newrelic/sreiously-defining-the-principles-habits-and-practices-of-site-reliability-engineering-112178269

  33. The situation around 2020 [Diagram: the SRE team (Pure SRE) facilitating two development teams]

  34. As of 2022 [Diagram labels: SRE (Pure SRE); development team made up of an SRE (Embedded SRE) and members; Facilitating; X as a Service; "it becomes fractal"]
  35. Agenda • Premise: what does it mean to achieve SRE? • Looking back at the history of SRE at "Study Sapuri" • Case study: (Partially) Embedded / Enabling SRE • Summary and what comes next
  36. What I will talk about today: a case study of using the Recruit Group's "mission management" to support a development team in acquiring SRE capabilities

  37. Or: a case study of (Partially) Embedded / Enabling SRE

  38. Mission management https://github.com/twitter/twemoji

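  39. Mission management at Recruit
    • Each member aligns their Will / Can / Must with their manager
    • For each mission, the allocation (percentage), content, and achievement criteria are agreed on
    • The reporting line for a mission does not have to be the member's direct team manager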
  40. Mission management at Recruit [Diagram: an EM and members across the SRE team and the development team; 30% of a member's missions is set to SRE-related work]
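  41. What missions were set, concretely
    • Development team member (developing the junior high school course renewal)
      • Promoting self-containment in the infrastructure domain: 30%
      • Missions for product development: 70%
    • SRE member
      • Supporting the development team's developer productivity: 20%
      • Supporting Production Release: 20%
      • Missions for SRE: 60%
    https://studysapuri.jp/course/junior/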
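  42. Mission management at Recruit [Diagram: development team member: promoting self-containment in the infrastructure domain (30%), missions related to product development (70%); SRE member: developer productivity / Production Release support (40%), other (60%)]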
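  43. What the manager did • Regular 1on1s with each member • Mid-term retrospectives on missions • Creating a mission tree to visualize missions • Setting up a forum for members to explain their missions to one another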

  44. A mission tree for visualizing missions https://blog.studysapuri.jp/entry/2022/02/25/sre-mission-tree

  45. What happened (1) • The productivity improvement cycle accelerated • The cycle of surfacing issues -> implementation -> feedback -> improvement sped up
  46. What happened (2) • SRE Culture spread: pre-mortems were held https://blog.studysapuri.jp/entry/pre-mortem

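  47. What happened (3) • Support for alert handling • SRE supported by explaining the alerts themselves and how to investigate them • The response itself was carried out by the development team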

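  48. The results • The full renewal of the Study Sapuri junior high school course was released without any major incident • The development team became able to handle alerts itself https://studysapuri.jp/course/junior/ https://github.com/twitter/twemoji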

  49. What was it that we did this time? [Diagram labels: SRE (Pure SRE); development team with members; (Partially) Enabling SRE (within the development team); an SRE moved in as (Partially) Embedded SRE; Facilitation]
  50. What was it that we did this time? [Diagram labels: SRE (Pure SRE); development team with members; (Partially) Enabling SRE (within the development team); an SRE moved in as (Partially) Embedded SRE; Facilitating]
  51. What was it that we did this time? [Diagram labels: SRE (Pure SRE); development team with members; (Partially) Enabling SRE (within the development team); an SRE moved in as (Partially) Embedded SRE; Collaboration]
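  52. Reflections on this pattern
    • Facilitating by an Enabling SRE is better built from the "inside"
      • It can be applied in a form that better fits the development team's operating style
    • For technical implementation, it is better to have a Pure SRE who knows the Platform well be Embedded from the "outside" and work in Collaboration
      • Pure SREs know Infrastructure such as ArgoCD and GitHub Actions well
      • Running the cycle of issue discovery, implementation, and feedback quickly makes it possible to provide a better Platform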
  53. Agenda • Premise: what does it mean to achieve SRE? • Looking back at the history of SRE at "Study Sapuri" • Case study: (Partially) Embedded / Enabling SRE • Summary and what comes next
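  54. Tl;dr
    • There are two ways for a development team to acquire reliability-related capabilities
      • Embedded SRE (from Pure SRE) / conveyed from the outside
      • Enabling SRE (in the Team) / spread from the inside
    • The best pattern depends on the size and phase of the organization
      • Small scale / early development phase: Embedded SRE Pattern
      • Medium to large scale / once the development team has matured: Enabling SRE Pattern
    • Both patterns can be designed through management
    • It does not have to be 100/0: practicing them even "partially" is effective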
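  55. Going forward, we will work on the following to further improve the scalability of the development teams
    • Support for acquiring SRE Capability
      • Adopting Enabling SREs within development teams through mission management
      • Creating and conducting an SRE maturity assessment
      • Onboarding support for acquiring SRE knowledge and skills
    • Developer Success / support for improving development productivity
      • Developing the Platform as a Product
      • Developer Support
    [Callout: "this case study"]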
  56. Special Thanks
    • @kyontan: as Embedded SRE
    • @ravelll: as Enabling SRE
    • Everyone involved in the full renewal of the "Study Sapuri" junior high school course
    • The SRE team members
  57. Thank you! chaspy / chaspy_. Engineering Manager, Site Reliability at Recruit Co., Ltd. Takeshi Kondo https://chaspy.me
  58. Bonus: SRE maturity assessment