SRE を実現するための組織マネジメント / Management to achieve SRE

Takeshi Kondo

March 12, 2022

Transcript

 1. Organizational Management to Achieve SRE
  Takeshi Kondo / @chaspy
  2022/03/12
  6-Company Joint SRE Study Session

 2. Who am I
  chaspy chaspy_
  Engineering Manager, Site Reliability at Recruit Co., Ltd.
  Takeshi Kondo
  https://chaspy.me

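 3. Who am I
  chaspy chaspy_
  Recruit Co., Ltd.
  Product Division
  Product Development Management Office
  Product Development Office
  Manabi (Learning) Domain Product Development Unit
  Elementary, Junior High and High School Product Development Department
  Elementary, Junior High and High School SRE Group
  Group Manager
  Takeshi Kondo
  https://chaspy.me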

 4. What I will talk about today
  A case study of using the Recruit Group's "mission management"
  to support a development team in acquiring SRE capabilities

 5. Or, put another way
  A case study of (Partially) Embedded / Enabling SRE

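 6. Tl;dr
  • There are two ways for a development team to acquire reliability capabilities
    • Embedded SRE (from Pure SRE) / conveyed from the outside
    • Enabling SRE (in the Team) / spread from the inside
  • The best pattern depends on the size and phase of the organization
    • Small scale / early development phase: Embedded SRE Pattern
    • Medium to large scale / once development teams have matured: Enabling SRE Pattern
  • Both patterns can be designed through management
    • It does not have to be 100/0; practicing them even "partially" is effective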

 7. Disclaimer
  • I present this as an example of management, but the results are thanks to
  the members who took on the missions, the SREs, and the development team.
  Thank you, as always!

 8. Agenda
  • Premise: what does it mean to achieve SRE?
  • Looking back at the history of SRE at "StudySapuri"
  • Case study: (Partially) Embedded / Enabling SRE
  • Summary and what comes next

 9. Agenda
  • Premise: what does it mean to achieve SRE?
  • Looking back at the history of SRE at "StudySapuri"
  • Case study: (Partially) Embedded / Enabling SRE
  • Summary and what comes next

 10. What it means to achieve SRE

 11. The development team has acquired
  the capability to control reliability

 12. What is Site Reliability Engineering in the first place? Not like this
  • The service maintains "high reliability (= 100%)"
  • The SLI/SLO are being met
  • On-call rotation is handled by the development team
  https://github.com/twitter/twemoji

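 13. What is Site Reliability Engineering in the first place? Like this!
  • The service maintains "the reliability that users expect"
  • SLI/SLO are defined and used as an indicator for deciding priorities
  between non-functional and functional requirements
  • The team has agreed on a monitoring method and a policy that allow it to
  respond appropriately when an SLO violation occurs
  • The above are reviewed regularly
  https://github.com/twitter/twemoji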
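
The SLI/SLO bullets on slide 13 can be illustrated with an error budget calculation. This is a minimal sketch, assuming a request-based availability SLI and a simple policy in which reliability work takes priority once the budget is spent. The names `SloWindow`, `error_budget_remaining`, `next_priority`, and the 99.9% target are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed during one SLO window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    # A target of exactly 1.0 (100%) leaves no budget at all, so it is rejected.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * window.total_requests
    return 1.0 - window.failed_requests / allowed_failures


def next_priority(window: SloWindow, slo_target: float) -> str:
    """Toy policy (an assumption): reliability work first once the budget is gone."""
    remaining = error_budget_remaining(window, slo_target)
    return "reliability work" if remaining <= 0.0 else "feature work"
```

With a 99.9% target and 400 failures in 1,000,000 requests, roughly 60% of the budget remains and the sketch returns "feature work"; at 1,500 failures the budget is overspent and it returns "reliability work". A real policy would be whatever the team has agreed on, as the slide says.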

 14. The development team acquires the capability to control reliability: Like this!
  SRE: supports the development team in acquiring reliability capabilities
  Development team: is able to control the reliability of its own services

 15. Team Topologies
  • 4 team patterns
    • Stream Aligned
    • Platform
    • Enabling
    • Complicated Subsystem
  • 3 communication patterns
    • Collaboration
    • X as a Service
    • Facilitation
  https://pub.jmam.co.jp/book/b593881.html

 16. The development team acquires the capability to control reliability: Like this!
  SRE (Platform Team / Enabling Team): builds the platform and culture that
  support the development team's self-containment
  Development team (Stream Aligned Team): can prepare what it needs by itself
  = self-contained

 17. The SRE Team's Vision / Mission / Values
  https://blog.studysapuri.jp/entry/sre-vision-mission-values

 18. Mission
  Build the platform and culture that let self-contained teams
  keep delivering products quickly and safely

 19. Agenda
  • Premise: what does it mean to achieve SRE?
  • Looking back at the history of SRE at "StudySapuri"
  • Case study: (Partially) Embedded / Enabling SRE
  • Summary and what comes next

 20. (But first)
  Product introduction

 21. [image-only slide]

 22. [image-only slide]

 23. [image-only slide]

 24. Looking back at the history of
  SRE at "StudySapuri"

 25. Looking back at the history of SRE at "StudySapuri"
  • 2019: Migrated the application platform to Kubernetes
  • 2020: Established Microservices Readiness
    • Defined service ownership
    • Design Doc / Production Readiness Checklist
    • Self-services Infrastructure (terraform monorepo)
    • SLI/SLO
  • 2021: Fully handed over SLI/SLO operations to the development teams
  As a Platform Team: building the platform
  As an Enabling Team: fostering a culture of SLI/SLO and similar practices
  in the development organization

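 26. Changes in organization size
  [chart: number of developers and number of SREs over time]
  "Developers" counts Web Engineers (frontend & backend) across both StudySapuri and Quipper. Native engineers are excluded.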

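 27. In 2021, the strategy changed to creating Enabling SREs from within development teams
  • The reliability problems facing the development organization were becoming
  more specific to individual applications and domains
    • Load testing
    • Domain-specific Pod Auto Scaling
    • Measuring Frontend Performance and improving SLI/SLO
    • QA automation
  • Rather than having a single SRE Team act as the Enabling Team, the strategy
  shifted toward creating Enabling SREs inside the development teams
  https://blog.studysapuri.jp/entry/2022/02/17/sre-study-session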

 28. Creating Enabling SREs within development teams

 29. Creating Enabling SREs within development teams

 30. The situation around 2020
  SRE (Enabling Team)
  Development team x2 (Stream Aligned Team)
  Interaction: Facilitating (toward each development team)

 31. As of 2022
  SRE (Platform Team / Enabling Team)
  Development team (Stream Aligned Team): an SRE (Enabling SRE) and three members
  Interactions: X as a Service, Facilitation (shown twice)
  Note: the structure becomes fractal

 32. Pure SRE vs Embedded SRE
  https://www.slideshare.net/newrelic/sreiously-defining-the-principles-habits-and-practices-of-site-reliability-engineering-112178269

 33. The situation around 2020
  SRE (Pure SRE)
  Development team x2
  Interaction: Facilitating (toward each development team)

 34. As of 2022
  SRE (Pure SRE)
  Development team: an SRE (Embedded SRE) and three members
  Interactions: X as a Service, Facilitating (shown twice)
  Note: the structure becomes fractal

 35. Agenda
  • Premise: what does it mean to achieve SRE?
  • Looking back at the history of SRE at "StudySapuri"
  • Case study: (Partially) Embedded / Enabling SRE
  • Summary and what comes next

 36. What I will talk about today
  A case study of using the Recruit Group's "mission management"
  to support a development team in acquiring SRE capabilities

 37. Or, put another way
  A case study of (Partially) Embedded / Enabling SRE

 38. Mission management
  https://github.com/twitter/twemoji

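 39. Mission management at Recruit
  • Each member aligns their Will / Can / Must with their manager
  • For each mission, the allocation (percentage), content, and achievement criteria are agreed upon
  • A mission's reporting line does not have to be the member's direct team manager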

 40. Mission management at Recruit
  EM
  Member x4
  SRE / Development team
  Note: 30% of the missions are set to SRE-related work

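 41. What missions were actually set
  • Development team member (development of the junior high school course renewal)
    • Promoting self-containment in the infrastructure domain: 30%
    • Missions for product development: 70%
  • SRE member
    • Supporting the development team's developer productivity: 20%
    • Supporting the Production Release: 20%
    • Missions for SRE: 60%
  https://studysapuri.jp/course/junior/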

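 42. Mission management at Recruit
  EM
  Member x2
  SRE / Development team
  Development team member: promoting self-containment in the infrastructure domain (30%), missions related to product development (70%)
  SRE member: support for developer productivity / Production Release (40%), other (60%)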

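 43. What the manager did
  • Regular 1on1s with each member
  • Mid-term reviews of the missions
  • Creating a mission tree to visualize the missions
  • Setting up a forum where members explain their missions to each other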

 44. The mission tree that visualizes the missions
  https://blog.studysapuri.jp/entry/2022/02/25/sre-mission-tree

 45. What happened (1)
  • The productivity improvement cycle accelerated
    • The cycle of surfacing issues -> implementation -> feedback -> improvement sped up

 46. What happened (2)
  • SRE culture spread: a pre-mortem was conducted
  https://blog.studysapuri.jp/entry/pre-mortem

 47. What happened (3)
  • Support for alert handling
    • SRE supports explaining the alerts themselves and how to investigate them
    • The response itself is carried out by the development team

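 48. What were the results
  • The full renewal of the StudySapuri junior high school course was released
  without any major incident
  • The development team became able to respond to alerts itself
  https://studysapuri.jp/course/junior/ https://github.com/twitter/twemoji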

 49. So what did we actually do this time
  Pure SRE: the SRE team
  (Partially) Enabling SRE: an SRE role within the development team, alongside three members
  (Partially) Embedded SRE: an SRE who moved in from the SRE team
  Interaction: Facilitation

 50. So what did we actually do this time
  Pure SRE: the SRE team
  (Partially) Enabling SRE: an SRE role within the development team, alongside three members
  (Partially) Embedded SRE: an SRE who moved in from the SRE team
  Interaction: Facilitating

 51. So what did we actually do this time
  Pure SRE: the SRE team
  (Partially) Enabling SRE: an SRE role within the development team, alongside three members
  (Partially) Embedded SRE: an SRE who moved in from the SRE team
  Interaction: Collaboration

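 52. Observations on this pattern
  • Facilitating by an Enabling SRE is better built from the "inside"
    • It can be applied in a form that better fits the development team's operating style
  • For technical implementation, it is better to have a Pure SRE who knows the
  Platform well be Embedded from the "outside" and work in Collaboration
    • Pure SREs know Infrastructure such as ArgoCD and GitHub Actions well
    • Running the cycle of issue discovery, implementation, and feedback quickly
    makes it possible to provide a better Platform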

 53. Agenda
  • Premise: what does it mean to achieve SRE?
  • Looking back at the history of SRE at "StudySapuri"
  • Case study: (Partially) Embedded / Enabling SRE
  • Summary and what comes next

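 54. Tl;dr
  • There are two ways for a development team to acquire reliability capabilities
    • Embedded SRE (from Pure SRE) / conveyed from the outside
    • Enabling SRE (in the Team) / spread from the inside
  • The best pattern depends on the size and phase of the organization
    • Small scale / early development phase: Embedded SRE Pattern
    • Medium to large scale / once development teams have matured: Enabling SRE Pattern
  • Both patterns can be designed through management
    • It does not have to be 100/0; practicing them even "partially" is effective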

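 55. Going forward, we will work on the following to further scale the development teams
  • Support for acquiring SRE capabilities
    • Adopting Enabling SREs within development teams through mission management (this talk's case study)
    • Creating and running an SRE maturity assessment
    • Onboarding support for acquiring SRE knowledge and skills
  • Developer Success / support for improving development productivity
    • Developing the Platform as a Product
    • Developer Support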

 56. Special Thanks
  • @kyontan
    • As Embedded SRE
  • @ravelll
    • As Enabling SRE
  • Everyone involved in the full renewal of the "StudySapuri" junior high school course
  • The SRE team members

 57. Thank you!
  chaspy chaspy_
  Engineering Manager, Site Reliability at Recruit Co., Ltd.
  Takeshi Kondo
  https://chaspy.me

 58. Appendix: SRE maturity assessment