SRE を実現するための組織マネジメント / Management to achieve SRE

Takeshi Kondo

March 12, 2022
Transcript

  1. Who am I
    Takeshi Kondo (chaspy, chaspy_), Engineering Manager, Site Reliability at Recruit Co., Ltd. https://chaspy.me
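  2. Tl;dr
    • There are two ways for a development team to acquire reliability-related capabilities
      • Embedded SRE (from Pure SRE): conveyed from the outside
      • Enabling SRE (in the Team): spread from the inside
    • The best pattern depends on the size and phase of the organization
      • Small organization / early development phase: Embedded SRE Pattern
      • Mid-to-large organization / once development teams have matured: Enabling SRE Pattern
    • Both patterns can be designed through management
    • It does not have to be 100/0; even practicing them "partially" is effective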
  3. What is Site Reliability Engineering in the first place: Not like this
    • The service maintains "high reliability (≈100%)"
    • SLIs/SLOs are being met
    • The development team runs the on-call rotation
    https://github.com/twitter/twemoji
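  4. What is Site Reliability Engineering in the first place: Like this!
    • The service maintains "the reliability that users expect"
    • SLIs/SLOs are defined and used as an indicator for prioritizing between non-functional and functional requirements
    • The team has agreed on monitoring methods and a policy that let it respond appropriately when an SLO violation occurs
    • All of the above are reviewed periodically
    https://github.com/twitter/twemoji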
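
Slide 4 says SLIs/SLOs should feed the prioritization between non-functional and functional work. One common way to do that in SRE practice is an error budget, which the deck itself does not name; the sketch below is only an illustration under that assumption. The names `SloWindow`, `error_budget_remaining`, `suggest_priority` and the 25% threshold are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed during one SLO evaluation window (hypothetical shape)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow) -> float:
    """Return the unspent fraction of the error budget (negative when overspent)."""
    allowed_failures = (1.0 - window.slo_target) * window.total_requests
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0 if window.failed_requests == 0 else float("-inf")
    return 1.0 - window.failed_requests / allowed_failures


def suggest_priority(window: SloWindow, threshold: float = 0.25) -> str:
    """Suggest reliability work when less than `threshold` of the budget is left."""
    if error_budget_remaining(window) < threshold:
        return "reliability"
    return "features"


if __name__ == "__main__":
    healthy = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    burning = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=900)
    print(suggest_priority(healthy), suggest_priority(burning))
```

The threshold is a policy decision; as the slide notes, it only works if the team has agreed on it and revisits it periodically.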
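  5. Development teams acquire the capability to control reliability: Like this!
    • SRE: supports the development team in acquiring reliability-related capabilities
    • Development team: is able to control the reliability of its own service by itself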
  6. Team Topologies
    • Four team patterns
      • Stream Aligned
      • Platform
      • Enabling
      • Complicated Subsystem
    • Three communication patterns
      • Collaboration
      • X as a Service
      • Facilitation
    https://pub.jmam.co.jp/book/b593881.html
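  7. Development teams acquire the capability to control reliability: Like this!
    • SRE (Platform Team, Enabling Team): builds the platform and the culture that support the development team's self-containment
    • Development team (Stream Aligned Team): can prepare what it needs by itself = self-contained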
  8. Looking back at the history of StudySapuri SRE
    • 2019: Migrated the Application Platform to Kubernetes
    • 2020: Established Microservices Readiness
      • Defined service ownership
      • Design Doc / Production Readiness Checklist
      • Self-service Infrastructure (terraform monorepo)
      • SLI/SLO
    • 2021: Fully handed SLI/SLO operations over to the development teams
    As a Platform Team: building the Platform
    As an Enabling Team: fostering a culture around SLI/SLO and similar practices in the development organization
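  9. Organization size over time
    [Chart: headcount of developers and of SRE]
    "Developers" is the number of Web Engineers (frontend & backend) across both StudySapuri and Quipper. Native engineers are excluded.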
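  10. In 2021, the strategy changed to growing Enabling SREs from inside the development teams
    • The "reliability" issues facing the development organization were becoming more specific to the application and its domain
      • Load testing
      • Domain-specific Pod Auto Scaling
      • Measuring Frontend Performance and improving SLIs/SLOs
      • QA automation
    • Rather than having one SRE Team act as the Enabling Team, the strategy shifted toward creating Enabling SREs inside the development teams
    https://blog.studysapuri.jp/entry/2022/02/17/sre-study-session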
  11. As of 2022
    [Diagram labels: SRE; development team; SRE; members; Enabling SRE; Platform Team; Enabling Team; Stream Aligned Team; Facilitation (twice); X as a Service. Annotation: the structure becomes fractal]
  12. As of 2022
    [Diagram labels: SRE; development team; SRE; members; Pure SRE; Embedded SRE; Facilitating (twice); X as a Service. Annotation: the structure becomes fractal]
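  13. Mission management at Recruit
    • Each member aligns their Will / Can / Must with their manager
    • For each mission, the share of time, the content, and the achievement criteria are agreed upon
    • The reporting line for a mission does not have to be the member's direct team manager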
  14. Mission management at Recruit
    [Diagram labels: EM; members; SRE; development team. Annotation: 30% of the mission is set to SRE-related work]
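  15. What missions were actually set
    • Development team member (development of the junior high school course renewal)
      • Promoting self-containment in the infrastructure domain: 30%
      • Missions for product development: 70%
    • SRE member
      • Supporting the development team's developer productivity: 20%
      • Supporting the Production Release: 20%
      • Missions for SRE: 60%
    https://studysapuri.jp/course/junior/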
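
The percentage split on slide 15 can be written down as plain data and sanity-checked. This is a minimal sketch: the mission names and percentages come from the slide, while the function names and the rule that shares must total exactly 100 are assumptions made for illustration, not a description of Recruit's actual tooling.

```python
from typing import Iterable, Mapping

# Mission shares as listed on slide 15 (percent of each member's mission).
DEV_MEMBER = {
    "Promote self-containment in the infrastructure domain": 30,
    "Product development": 70,
}
SRE_MEMBER = {
    "Support developer productivity of the development team": 20,
    "Support the Production Release": 20,
    "SRE missions": 60,
}


def check_allocation(missions: Mapping[str, int]) -> None:
    """Raise ValueError unless every share is in (0, 100] and the shares total 100."""
    for name, percent in missions.items():
        if not 0 < percent <= 100:
            raise ValueError(f"share out of range for {name!r}: {percent}")
    total = sum(missions.values())
    if total != 100:
        raise ValueError(f"shares total {total}, expected 100")


def combined_share(missions: Mapping[str, int], names: Iterable[str]) -> int:
    """Sum the shares of the named missions."""
    return sum(missions[name] for name in names)


if __name__ == "__main__":
    check_allocation(DEV_MEMBER)
    check_allocation(SRE_MEMBER)
    cross_team = combined_share(
        SRE_MEMBER,
        ["Support developer productivity of the development team",
         "Support the Production Release"],
    )
    print(cross_team)  # the SRE member's share spent supporting the development team
```

The combined 40% for the SRE member matches the figure shown in the diagram on slide 16.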
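  16. Mission management at Recruit
    [Diagram labels: EM; members; SRE; development team]
    • Development team member: promoting self-containment in the infrastructure domain (30%); missions related to product development (70%)
    • SRE member: supporting developer productivity / Production Release (40%); other (60%)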
  17. What did we actually do this time
    [Diagram labels: SRE; development team; SRE; members; Pure SRE; Facilitation]
    • Inside the development team: (Partially) Enabling SRE
    • SRE: moved over as a (Partially) Embedded SRE
  18. What did we actually do this time
    [Diagram labels: SRE; development team; SRE; members; Pure SRE; Facilitating]
    • Inside the development team: (Partially) Enabling SRE
    • SRE: moved over as a (Partially) Embedded SRE
  19. What did we actually do this time
    [Diagram labels: SRE; development team; SRE; members; Pure SRE; Collaboration]
    • Inside the development team: (Partially) Enabling SRE
    • SRE: moved over as a (Partially) Embedded SRE
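  20. Observations on this pattern
    • Facilitating by an Enabling SRE is better built from the "inside"
      • It can be applied in a form that fits the development team's way of operating
    • Technical implementation is better done in Collaboration with a Pure SRE who knows the Platform well and is Embedded from the "outside"
      • Pure SREs are more familiar with Infrastructure such as ArgoCD and GitHub Actions
      • Cycling quickly through problem discovery, implementation, and feedback makes it possible to provide a better Platform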
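  21. Tl;dr
    • There are two ways for a development team to acquire reliability-related capabilities
      • Embedded SRE (from Pure SRE): conveyed from the outside
      • Enabling SRE (in the Team): spread from the inside
    • The best pattern depends on the size and phase of the organization
      • Small organization / early development phase: Embedded SRE Pattern
      • Mid-to-large organization / once development teams have matured: Enabling SRE Pattern
    • Both patterns can be designed through management
    • It does not have to be 100/0; even practicing them "partially" is effective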
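  22. Going forward, we will work on the following to further improve the scalability of the development teams
    • Supporting the acquisition of SRE Capability
      • Adopting Enabling SREs inside development teams through mission management
      • Creating and running an SRE maturity assessment
      • Onboarding support for acquiring SRE knowledge and skills
    • Developer Success / supporting improvements in development productivity
      • Developing the Platform as a Product
      • Developer Support
    [Label on the slide: "this case study"]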
  23. Special Thanks
    • @kyontan
      • As Embedded SRE
    • @ravelll
      • As Enabling SRE
    • Everyone involved in the full renewal of the StudySapuri junior high school course
    • The SRE team members