
Management to achieve SRE (original title: SRE を実現するための組織マネジメント)


Takeshi Kondo

March 12, 2022


Transcript

  1. Organizational management to achieve SRE
    Takeshi Kondo / @chaspy
    2022/03/12
    6-company joint SRE study session

  2. Who am I
    Takeshi Kondo (chaspy / chaspy_)
    Engineering Manager, Site Reliability at Recruit Co., Ltd.
    https://chaspy.me

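  3. Who am I
    Takeshi Kondo (chaspy / chaspy_)
    Recruit Co., Ltd.
    Product Division
    Product Development Management Office
    Product Development Office
    Learning Domain Product Development Unit
    Elementary, Junior High and High School Product Development Department
    Elementary, Junior High and High School SRE Group
    Group Manager
    https://chaspy.me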

  4. What I will talk about today
    A case study of using the Recruit Group's "mission management" to support a development team in acquiring SRE capabilities

  5. Or, put another way
    A case study of (Partially) Embedded / Enabling SRE

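  6. TL;DR
    • There are two ways for a development team to acquire reliability-related capabilities
        • Embedded SRE (from Pure SRE) / conveyed from the outside
        • Enabling SRE (in the team) / spread from the inside
    • The best pattern depends on the size and phase of the organization
        • Small scale / early development phase: Embedded SRE pattern
        • Medium to large scale / once the development team has matured: Enabling SRE pattern
    • Both patterns can be designed through management
        • It is not all-or-nothing (100/0): practicing them "partially" is already effective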

  7. Disclaimer
    • I present this as an example of management, but the results are thanks to the members who took on the missions, the SREs, and the development team. Thank you, as always!

  8. Agenda
    • Premise: what does it mean to achieve SRE?
    • Looking back at the history of SRE at "Study Sapuri"
    • Case study: (Partially) Embedded / Enabling SRE
    • Summary and what comes next

  9. Agenda
    • Premise: what does it mean to achieve SRE?
    • Looking back at the history of SRE at "Study Sapuri"
    • Case study: (Partially) Embedded / Enabling SRE
    • Summary and what comes next

  10. What it means to achieve SRE

  11. The development team has acquired the capability to control reliability

  12. What is Site Reliability Engineering in the first place: Not like this
    • The service maintains "high reliability (= 100%)"
    • SLIs/SLOs are being met
    • The development team runs the on-call rotation
    https://github.com/twitter/twemoji

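  13. What is Site Reliability Engineering in the first place: Like this!
    • The service maintains "the reliability that users expect"
    • SLIs/SLOs are defined and used as an indicator for prioritizing between non-functional and functional requirements
    • The team has agreed on monitoring methods and a policy that let it respond appropriately when an SLO violation occurs
    • The above are reviewed regularly
    https://github.com/twitter/twemoji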
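
    To illustrate how an SLO can act as a prioritization signal between feature work and reliability work, here is a minimal Python sketch of an error budget calculation. It is only an illustration under stated assumptions: the `SloWindow` type, the 99.9% target, the request counts, and the 25% threshold in `should_prioritize_reliability` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed over one SLO window (hypothetical data)."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        if self.error_budget == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / self.error_budget


def should_prioritize_reliability(window: SloWindow, threshold: float = 0.25) -> bool:
    """Assumed team policy: favor reliability work once the remaining
    error budget drops below the threshold."""
    return window.budget_remaining < threshold
```

    With a 99.9% target over 1,000,000 requests the budget is about 1,000 failed requests, so 400 failures leave roughly 60% of the budget and feature work can continue; at 900 failures only about 10% remains and the assumed policy would shift priority to reliability.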

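  14. The development team acquires the capability to control reliability: Like this!
    SRE: supports the development team in acquiring reliability-related capabilities
    Development team: is able to control the reliability of its own services by itself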

  15. Team Topologies
    • Four team patterns
        • Stream Aligned
        • Platform
        • Enabling
        • Complicated Subsystem
    • Three communication patterns
        • Collaboration
        • X as a Service
        • Facilitation
    https://pub.jmam.co.jp/book/b593881.html

  16. The development team acquires the capability to control reliability: Like this!
    SRE (Platform Team / Enabling Team): builds the platform and culture that support the development team's self-containment
    Development team (Stream Aligned Team): can provide what it needs by itself (= self-contained)

  17. The SRE team's Vision / Mission / Values
    https://blog.studysapuri.jp/entry/sre-vision-mission-values

  18. Mission
    Build the platform and culture that let self-contained teams keep delivering products quickly and safely

  19. Agenda
    • Premise: what does it mean to achieve SRE?
    • Looking back at the history of SRE at "Study Sapuri"
    • Case study: (Partially) Embedded / Enabling SRE
    • Summary and what comes next

  20. (Before that) Product introduction

  21-23. (Product introduction slides; no transcribed text)

  24. Looking back at the history of SRE at "Study Sapuri"

  25. Looking back at the history of SRE at "Study Sapuri"
    • 2019: Migrated the application platform to Kubernetes
    • 2020: Established microservices readiness
        • Defined service ownership
        • Design Doc / Production Readiness Checklist
        • Self-service infrastructure (Terraform monorepo)
        • SLI/SLO
    • 2021: Fully handed SLI/SLO operations over to the development teams
    As a Platform Team: building the platform
    As an Enabling Team: fostering a culture of SLI/SLO and similar practices in the development organization

  26. Change in organization size
    [Chart legend: Developers, SRE]
    "Developers" counts web engineers (frontend and backend) across both Study Sapuri and Quipper. Native app engineers are excluded.

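  27. 2021: Strategy changed to create Enabling SREs from inside the development teams
    • The reliability issues surrounding the development organization were becoming increasingly specific to applications and domains
        • Load testing
        • Domain-specific Pod autoscaling
        • Measuring frontend performance and improving SLIs/SLOs
        • QA automation
    • Rather than having a single SRE team act as the Enabling Team, the strategy changed toward creating Enabling SREs inside the development teams
    https://blog.studysapuri.jp/entry/2022/02/17/sre-study-session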

  28. Creating Enabling SREs inside the development teams

  29. Creating Enabling SREs inside the development teams

  30. The situation around 2020
    [Diagram] SRE (Enabling Team) -> development teams (Stream Aligned Teams): Facilitating

  31. As of 2022
    [Diagram] SRE (Platform Team / Enabling Team) -> development team (Stream Aligned Team): Facilitation, X as a Service
    Enabling SRE inside the development team -> members: Facilitation
    The structure becomes fractal

  32. Pure SRE vs Embedded SRE
    https://www.slideshare.net/newrelic/sreiously-defining-the-principles-habits-and-practices-of-site-reliability-engineering-112178269

  33. The situation around 2020
    [Diagram] SRE (Pure SRE) -> development teams: Facilitating

  34. As of 2022
    [Diagram] SRE (Pure SRE) -> development team: Facilitating, X as a Service
    Embedded SRE inside the development team -> members: Facilitating
    The structure becomes fractal

  35. Agenda
    • Premise: what does it mean to achieve SRE?
    • Looking back at the history of SRE at "Study Sapuri"
    • Case study: (Partially) Embedded / Enabling SRE
    • Summary and what comes next

  36. What I will talk about today
    A case study of using the Recruit Group's "mission management" to support a development team in acquiring SRE capabilities

  37. Or, put another way
    A case study of (Partially) Embedded / Enabling SRE

  38. Mission management
    https://github.com/twitter/twemoji

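  39. Mission management at Recruit
    • Each member aligns their Will / Can / Must with their manager
    • For each mission, the allocation (percentage), the content, and the achievement criteria are agreed on
    • The reporting line for a mission does not have to be the member's direct team manager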

  40. Mission management at Recruit
    [Diagram] EM; members of the SRE team and the development team
    30% of a member's mission is set to SRE-related work

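  41. What missions were set, concretely
    • Development team member (developing the junior high school course renewal)
        • Promoting self-containment in the infrastructure area: 30%
        • Missions for product development: 70%
    • SRE member
        • Supporting the development team's developer productivity: 20%
        • Supporting the production release: 20%
        • Missions for SRE: 60%
    https://studysapuri.jp/course/junior/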
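
    The allocations above can be written down as data and sanity-checked. The following Python sketch is only an illustration: the `Mission` type and the `validate_allocation` helper are hypothetical names, not part of Recruit's actual process; the percentages are the ones listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mission:
    title: str
    percent: int  # share of the member's time agreed with the manager


def validate_allocation(missions: list[Mission]) -> None:
    """Raise ValueError unless every share is positive and the shares total 100%."""
    if any(m.percent <= 0 for m in missions):
        raise ValueError("each mission needs a positive share")
    total = sum(m.percent for m in missions)
    if total != 100:
        raise ValueError(f"shares total {total}%, expected 100%")


# Allocations as listed on the slide.
dev_member = [
    Mission("Promote self-containment in the infrastructure area", 30),
    Mission("Product development missions", 70),
]
sre_member = [
    Mission("Support the development team's developer productivity", 20),
    Mission("Support the production release", 20),
    Mission("SRE missions", 60),
]

for allocation in (dev_member, sre_member):
    validate_allocation(allocation)  # passes silently when shares total 100%
```

    Keeping the share explicit per mission makes the "partial" nature of the arrangement visible: the development team member spends 30% on SRE-related work, and the SRE member spends 40% (20% + 20%) supporting the development team.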

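  42. Mission management at Recruit
    [Diagram] EM; members of the SRE team and the development team
    Development team member: promoting self-containment in the infrastructure area (30%); missions related to product development (70%)
    SRE member: developer productivity / production release support (40%); other (60%)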

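  43. What the manager did
    • Regular 1-on-1s with each member
    • Mid-term retrospectives on the missions
    • Creating a mission tree that visualizes the missions
    • Setting up sessions where members explain their missions to one another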

  44. The mission tree that visualizes the missions
    https://blog.studysapuri.jp/entry/2022/02/25/sre-mission-tree

  45. What happened (1)
    • The productivity improvement cycle accelerated
        • The cycle of surfacing issues -> implementation -> feedback -> improvement sped up

  46. What happened (2)
    • SRE culture spread: a pre-mortem was conducted
    https://blog.studysapuri.jp/entry/pre-mortem

  47. What happened (3)
    • Support for alert handling
        • The SRE helped explain the alerts themselves and how to investigate them
        • The response itself was carried out by the development team

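  48. What was the result
    • The full renewal of the Study Sapuri junior high school course was released without any major incident
    • The development team became able to handle alerts itself
    https://studysapuri.jp/course/junior/ https://github.com/twitter/twemoji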

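  49. So what was it that we did this time?
    [Diagram] Pure SRE team and development team (SRE plus members); interaction label: Facilitation
    An SRE moved into the development team as a (Partially) Embedded SRE
    Inside the development team: a (Partially) Enabling SRE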

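  50. So what was it that we did this time?
    [Diagram] Pure SRE team and development team (SRE plus members); interaction label: Facilitating
    An SRE moved into the development team as a (Partially) Embedded SRE
    Inside the development team: a (Partially) Enabling SRE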

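  51. So what was it that we did this time?
    [Diagram] Pure SRE team and development team (SRE plus members); interaction label: Collaboration
    An SRE moved into the development team as a (Partially) Embedded SRE
    Inside the development team: a (Partially) Enabling SRE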

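  52. Reflections on this pattern
    • Facilitating by an Enabling SRE is better built from the "inside"
        • It can be applied in a form that better fits the development team's way of operating
    • For technical implementation, it is better for a Pure SRE who knows the platform well to be embedded from the "outside" and collaborate
        • Pure SREs know infrastructure such as ArgoCD and GitHub Actions well
        • Running the cycle of issue discovery, implementation, and feedback quickly makes it possible to provide a better platform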

  53. Agenda
    • Premise: what does it mean to achieve SRE?
    • Looking back at the history of SRE at "Study Sapuri"
    • Case study: (Partially) Embedded / Enabling SRE
    • Summary and what comes next

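  54. TL;DR
    • There are two ways for a development team to acquire reliability-related capabilities
        • Embedded SRE (from Pure SRE) / conveyed from the outside
        • Enabling SRE (in the team) / spread from the inside
    • The best pattern depends on the size and phase of the organization
        • Small scale / early development phase: Embedded SRE pattern
        • Medium to large scale / once the development team has matured: Enabling SRE pattern
    • Both patterns can be designed through management
        • It is not all-or-nothing (100/0): practicing them "partially" is already effective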

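  55. Going forward, we will work on the following to make the development teams more scalable
    • Support for acquiring SRE capabilities
        • Adopting Enabling SREs inside development teams through mission management (this talk's case study)
        • Creating and running an SRE maturity assessment
        • Onboarding support for acquiring SRE knowledge and skills
    • Developer Success / support for improving development productivity
        • Developing the platform as a product
        • Developer Support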

  56. Special Thanks
    • @kyontan
        • As Embedded SRE
    • @ravelll
        • As Enabling SRE
    • Everyone involved in the full renewal of the "Study Sapuri" junior high school course
    • The SRE team members

  57. Thank you!
    Takeshi Kondo (chaspy / chaspy_)
    Engineering Manager, Site Reliability at Recruit Co., Ltd.
    https://chaspy.me

  58. Bonus: SRE maturity assessment