Who owns the Service Level?

Takeshi Kondo

May 15, 2022

Transcript

  1. Who am I: Takeshi Kondo (chaspy / chaspy_), Engineering Manager, Site Reliability at Recruit Co., Ltd. https://chaspy.me
  2. What is Site Reliability Engineering, anyway? Not like this: • The service maintains "high reliability" (≈100%) • SLI/SLO are being met • Development teams run the on-call rotation https://github.com/twitter/twemoji
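  3. What is Site Reliability Engineering, anyway? Like this! • The service maintains the reliability that users expect • SLI/SLO are defined and used as the metric for deciding priority between non-functional and functional requirements • The team has agreed on monitoring methods and policies that let it respond appropriately when an SLO violation occurs • All of the above is reviewed regularly https://github.com/twitter/twemoji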
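
Slide 3 describes using SLI/SLO as the metric for deciding priority between functional and non-functional work. A minimal Python sketch of that idea follows. It assumes a request-based availability SLI and a hypothetical error-budget rule. The class and function names and the 99.5% target are illustrative and are not the talk's actual policy.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (the window length is up to the team)."""

    good_requests: int
    total_requests: int
    slo_target: float  # e.g. 0.995: 99.5% of requests must be "good"

    def __post_init__(self) -> None:
        # A 100% target leaves no error budget at all ("Not like this" on slide 2).
        if not 0.0 < self.slo_target < 1.0:
            raise ValueError("slo_target must be strictly between 0 and 1")

    def sli(self) -> float:
        """Availability SLI: the fraction of requests that were good."""
        if self.total_requests == 0:
            return 1.0
        return self.good_requests / self.total_requests

    def error_budget_remaining(self) -> float:
        """Unspent share of the error budget; negative once the SLO is violated."""
        if self.total_requests == 0:
            return 1.0
        allowed_bad = (1.0 - self.slo_target) * self.total_requests
        actual_bad = self.total_requests - self.good_requests
        return 1.0 - actual_bad / allowed_bad


def next_work_priority(window: SloWindow) -> str:
    """Hypothetical policy: favor non-functional work once the budget is spent."""
    if window.error_budget_remaining() <= 0.0:
        return "non-functional (reliability) work first"
    return "functional (feature) work may continue"


if __name__ == "__main__":
    w = SloWindow(good_requests=99_700, total_requests=100_000, slo_target=0.995)
    print(w.sli(), w.error_budget_remaining(), next_work_priority(w))
```

Take 99,700 good requests out of 100,000 against a 99.5% target. That uses 300 of the 500 allowed failures, so about 40% of the budget remains and feature work can continue. A real Error Budget Policy would typically spell out more graded responses than this single switch.
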
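  4. Development teams acquire the capability to control reliability: Like this! • SRE: supports development teams in acquiring reliability-related capabilities • Development team: controls the reliability of its own service by itself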
  5. Team Topologies • 4 team patterns • Stream Aligned • Platform • Enabling • Complicated Subsystem • 3 communication patterns • Collaboration • X as a Service • Facilitation https://pub.jmam.co.jp/book/b593881.html
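  6. Development teams acquire the capability to control reliability: Like this! • SRE (Platform Team, Enabling Team): builds the platform and culture that support development teams' self-containment • Development team (Stream Aligned Team): can provide what it needs by itself = self-contained / self-containment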
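  7. Why self-containment matters • SRE (Platform Team, Enabling Team): builds the platform and culture that support development teams' self-containment • Development team (Stream Aligned Team): can provide what it needs by itself = self-contained / self-containment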
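  8. Why self-containment matters: Not "VS", but "And" • Dev and (not vs) Ops • Get fast feedback from users (DevOps) • Dev and (not vs) Infrastructure • Build through self-service to shorten lead time • Productivity and (not vs) Reliability • Productivity and reliability depend on each other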
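  9. Summary: what achieving SRE means • Development teams have acquired the capability to control reliability • Development teams are "self-contained" • The SRE team supports this with a platform and by fostering culture • Achieving this requires diverse perspectives beyond product development • Knowing what users expect / Product Management • High development productivity / Development Skills • How much to spend on non-functional requirements / Business Development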
  10. Looking back at the history of Study Sapuri SRE: after @chaspy joined • 2018: @chaspy joins • 2019: Application Platform migrated to Kubernetes • 2020: Microservices Readiness put in place • Service ownership defined • Design Doc / Production Readiness Checklist • Self-services Infrastructure (terraform monorepo) • SLI/SLO • 2021: SLI/SLO operation fully handed over to development teams. As a Platform Team, we build the Platform; as an Enabling Team, we foster a culture of SLI/SLO and similar practices across the development organization
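  11. Looking back at the history of Study Sapuri SRE: 2021 • COVID-19 pandemic, surge in traffic • Platform evolution • Terraform monorepo • Loadtest Platform • Separating monorepo CI with GitHub Actions • Organizational changes • Technology Strategy group launched • Transferred to Recruit following the business transfer; Quipper's Japan branch liquidated • chaspy appointed EM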
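  12. Organization size over time (chart). Developers: 35, 53, 54, 73, 114. SRE: 4, 5, 7, 7, 7. "Developers" counts Web Engineers (frontend & backend) across both Study Sapuri and Quipper; Native engineers are excluded. From 2022, contractors are also counted; we also worked with contractors in 2021 and earlier.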
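  13. Looking back at the history of Study Sapuri SRE • In every period, we have acted as both a Platform Team and an Enabling Team • Especially since 2019, under the theme of "self-containment," we have built a Platform that keeps the requests development teams need to make to us to a minimum • At the same time, we took on "building culture" in development teams and fostered a culture of watching SLI/SLO across the organization • → "SLO Review" at SRE NEXT 2020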
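  14. "SLO Review" at SRE NEXT 2020 • An initiative that built a culture of reviewing SLOs in the development organization • Introduced bottom-up to 2 products and 15 teams • The 4 steps of the climb • Decide ownership of systems and organization • Run the process alone and establish how to make it work • Define SLI/SLO together with developers and run the process • Define an Error Budget Policy and act on it (not yet achieved) • Lessons learned • Provide standardized SLIs • Codify the configuration at an early stage • Make the learning curve steep (let people learn quickly) https://blog.studysapuri.jp/entry/2020/01/30/slo-review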
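
The slide 14 lessons "provide standardized SLIs" and "codify the configuration at an early stage" can be sketched as SLO definitions kept in code. The same goes for the first step, deciding ownership. The SliKind catalogue, the SloSpec fields, standard_slos(), and the default targets below are hypothetical placeholders. The talk does not show its actual configuration format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SliKind(Enum):
    """Standardized SLI kinds offered to every team (an illustrative catalogue)."""

    AVAILABILITY = "availability"  # share of requests that did not fail
    LATENCY = "latency"            # share of requests served under a threshold


@dataclass(frozen=True)
class SloSpec:
    service: str
    owner_team: str  # step 1 of the climb: every system has an owning team
    kind: SliKind
    target: float    # e.g. 0.995
    latency_threshold_ms: Optional[int] = None

    def __post_init__(self) -> None:
        # Validate as soon as the definition is built, not when a dashboard looks wrong.
        if not self.owner_team:
            raise ValueError(f"{self.service}: an SLO needs an owning team")
        if not 0.0 < self.target < 1.0:
            raise ValueError(f"{self.service}: target must be strictly between 0 and 1")
        if self.kind is SliKind.LATENCY and self.latency_threshold_ms is None:
            raise ValueError(f"{self.service}: a latency SLO needs a threshold")


def standard_slos(service: str, owner_team: str) -> List[SloSpec]:
    """Default SLOs a team starts from; the numbers are placeholders, not recommendations."""
    return [
        SloSpec(service, owner_team, SliKind.AVAILABILITY, target=0.995),
        SloSpec(service, owner_team, SliKind.LATENCY, target=0.99, latency_threshold_ms=300),
    ]
```

In this sketch every team starts from standard_slos() and validation runs when a spec is created. A missing owner or an out-of-range target therefore fails immediately rather than surfacing later.
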
  15. What went well (annotating the slide 14 summary) • We were obsessive about thoroughly lowering development teams' cognitive load • We ran feedback cycles to reduce uncertainty about the goal https://blog.studysapuri.jp/entry/2020/01/30/slo-review
  16. What didn't go well? (annotating the slide 14 summary) • The fourth step, "Define an Error Budget Policy and act on it," was never achieved • Why didn't it work out? https://blog.studysapuri.jp/entry/2020/01/30/slo-review
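  17. Why didn't we get as far as "acting"? • At the time, actions on SLO violations were delegated to the Product Manager / Team • It's not that we could do nothing at all • We could only do what fit into the improvement time the team already had budgeted (one day every other week) • This is called QB Day • Direct fixes for errors, minor performance improvements, etc. • Long-term, fundamental fixes such as architecture changes or infrastructure separation were difficult • We never reached the point of "prioritizing functional vs. non-functional requirements based on the metrics" • If the metrics don't help with prioritization, you could argue they just added work for the development teams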
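  18. "SLO Review" at SRE NEXT 2020: what happened afterwards • Building a culture of "defining and observing reliability metrics" was valuable • But the metrics did not grow into something useful for business-strategy decisions • Reason 1. Product development teams lacked the decision-making authority and budget to shift the balance between non-functional and functional requirements • The incentive to build new features was strong • There was no mechanism for making such technology-strategy / technology-investment decisions across the whole product • Reason 2. The reliability metrics were not easy for everyone in Biz/Dev/SRE to understand • Backend API SLIs do not directly represent the user experience, and latency-related work is hard to explain even to TPMs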
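
Reason 2 on slide 18 notes that backend API SLIs do not directly represent the user experience. One way to see the gap is to assume, as a simplification that real systems often violate, that API failures are independent. If a screen needs several API calls to succeed, each API can meet its own SLO while the screen as a whole fails noticeably more often. The function name and numbers are illustrative.

```python
from math import prod
from typing import Sequence


def screen_success_probability(api_success_rates: Sequence[float]) -> float:
    """Probability that a screen renders fully when every listed API call must succeed.

    Assumes API failures are independent; correlated failures (for example a
    shared database outage) would make the real number differ.
    """
    for rate in api_success_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"success rate out of range: {rate}")
    # Independent events: the screen succeeds only if all of its calls succeed.
    return prod(api_success_rates)


if __name__ == "__main__":
    # Hypothetical screen backed by three APIs, each at a 99% success rate.
    print(round(screen_success_probability([0.99, 0.99, 0.99]), 4))
```

Three APIs at 99% each give about 0.9703. Roughly 3% of screen loads break even though every backend SLI looks healthy.
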
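  19. Products: Study Sapuri Elementary, Junior High, High School, and University Entrance Exam Course • Study Sapuri For TEACHERS • Study Sapuri For SCHOOL. Current org chart: the K-12 Product Development Division, with 17 groups under it: TPM BtoB • TPM BtoC • TPM ForSCHOOL • TPM Cross-functional • BtoC • BtoB • QA • Development Support • SRE • Technology Strategy • Coaching • New Development 1 • Enhancement • Learning Support • Native (iOS, Android) • New Development 2 • Career Planning & Student Agency • Communication Support • ForSCHOOL Mobile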
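  20. Disclaimer • The Technology Strategy group was launched by my predecessor as manager • As of last year, @chaspy was Lead of the DevOps WG; the role is now EM/Lead • Since the predecessor left, the division head has also served as EM of the Technology Strategy group, running it together with several other EMs • The group was not created because we were unable to act on SLO violations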
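  21. Why does technology strategy need a "group"? • How technology strategy is decided differs by organization • A single CTO may decide top-down • Everyone may decide bottom-up by consensus • Or anything in between • The Study Sapuri K-12 development organization is taking on the challenge of building a mechanism in which technology strategy does not depend on a single person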
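  22. Working bodies • Purpose • Make the product development organization and its systems more resilient to change • Goals • Define a technical vision and policy • Bring technical issues and debt under control • Establish an improvement cycle and acquire self-diagnosis capability • DevOps WG • Cross-functional WG • Backend WG • Frontend WG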
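  24. Of course, it isn't perfect • Technical issues can't really be measured by frequency and severity • They are qualitative • Participating members may be biased • Each issue is placed at the centroid of several members' votes, so its precision is questionable • Some issues can't be tackled right away because of development resources, technical difficulty, or risk • Prioritizing issues takes time • etc…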
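
Slide 24 says each technical issue is placed at the centroid of several members' votes, which makes its precision questionable. A small sketch shows why. It assumes votes on a 1-5 frequency and severity scale; the scale, the Vote type, and the spread measure are assumptions for illustration. Two issues can land on the same centroid even though members agree on one and sharply disagree on the other.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass(frozen=True)
class Vote:
    member: str
    frequency: int  # hypothetical 1 (rare) .. 5 (constant) scale
    severity: int   # hypothetical 1 (minor) .. 5 (blocking) scale


def place_issue(votes: List[Vote]) -> Tuple[float, float, float]:
    """Return (frequency, severity, spread) for one technical issue.

    frequency and severity are the centroid (mean) of all votes; spread is the
    larger population standard deviation of the two axes, i.e. how much
    disagreement between members the centroid hides.
    """
    if not votes:
        raise ValueError("an issue needs at least one vote")
    freqs = [v.frequency for v in votes]
    sevs = [v.severity for v in votes]
    spread = max(pstdev(freqs), pstdev(sevs))
    return float(mean(freqs)), float(mean(sevs)), spread


if __name__ == "__main__":
    agreed = [Vote("a", 3, 3), Vote("b", 3, 3)]
    split = [Vote("a", 1, 5), Vote("b", 5, 1)]
    print(place_issue(agreed))
    print(place_issue(split))
```

Both example issues land at (3, 3). The first has a spread of 0 and the second a spread of 2. Reporting the spread next to the centroid is one way to flag issues whose position needs discussion.
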
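  26. Purpose and activities of the DevOps WG • Purpose: set up so that development teams can acquire self-diagnosis capability • Members: WebDev / QA / SRE from each domain • Activities • Run value stream mapping • Identify and measure candidate Metrics / Indicators • Run the DX Criteria assessment • Resolve factors that impede the value stream (e.g. E2E Automation) • Outreach outside the Product Development Division • As a first step, we validated metrics and assessments that looked promising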
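
Value stream mapping, the first DevOps WG activity on slide 26, commonly records waiting time and active process time for each step from idea to production. Below is a minimal sketch of two figures often derived from such a map: lead time and flow efficiency. It also picks out the step with the longest wait as a first candidate impediment. The steps and hours in the example are invented, and the talk does not say which metrics the WG adopted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    wait_hours: float     # time the change waits in a queue before this step starts
    process_hours: float  # time someone actively works on this step


def lead_time(steps: List[Step]) -> float:
    """Total elapsed time through the value stream (waiting plus working)."""
    return sum(s.wait_hours + s.process_hours for s in steps)


def flow_efficiency(steps: List[Step]) -> float:
    """Share of the lead time spent actively working."""
    total = lead_time(steps)
    if total == 0:
        raise ValueError("value stream has zero lead time")
    return sum(s.process_hours for s in steps) / total


def longest_wait(steps: List[Step]) -> Step:
    """The step preceded by the longest queue: a first candidate impediment."""
    return max(steps, key=lambda s: s.wait_hours)


if __name__ == "__main__":
    # Invented numbers for a hypothetical change on its way to production.
    stream = [
        Step("code review", wait_hours=6, process_hours=2),
        Step("manual E2E test", wait_hours=20, process_hours=4),
        Step("deploy", wait_hours=1, process_hours=1),
    ]
    print(lead_time(stream), round(flow_efficiency(stream), 2), longest_wait(stream).name)
```

For these hypothetical steps the lead time is 34 hours and the flow efficiency is 7/34, about 0.21. The longest wait sits before the manual E2E test, the kind of impediment the slide's "E2E Automation" example targets.
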
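  27. Outreach outside the Product Development Division: presenting at the BtoC All Hands https://blog.studysapuri.jp/entry/2020/08/17/dx-criteria-system • A forum for learning about the business beyond one's own group • Market news • Business status • Product KPIs • SLI / developer productivity • Various other topics • What is SRE? • What are microservices, and what's good about them?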
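  28. SRE and technology strategy • The DevOps WG's activities extend "achieving SRE" to the cultural side • The metrics we should watch are not only system reliability metrics • Look at metrics for everything and make decisions based on them • Going forward, we will run an improvement cycle on this culture-building itself • 1. The effectiveness of candidate metrics becomes clear, and they are quantified • 2. Development teams can look at them and work out actions • 3. Development teams run the action -> improvement cycle • 4. Evaluate whether steps 1-3 themselves are working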
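  29. SRE and technology strategy: summary • Achieving SRE requires the budget and authority to act when an SLO is violated • On top of that, you need a technology strategy that can set priorities for solving technical issues • The Study Sapuri K-12 Product Development Division is taking on the challenge of achieving this technology strategy as a group, without depending on a single person • A culture of looking at everything through metrics is important for reliability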
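  30. SRE "NEXT" in Study Sapuri • When it comes to "reliability," the SRE Team is increasingly fulfilling its role as an Enabling Team • What's next for the SRE Team • To make the organization more Sustainable / Scalable, expand Platform onboarding and provide assessments that let development teams decide on their own which reliability capabilities to acquire • Focus on developing a Platform that delivers not only reliability but also development productivity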
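  31. Who owns the Service Level? • The Service Level belongs to everyone involved in the product • Let's evolve reliability metrics into something everyone can care about • Track SLI/SLO on the client side (WebFrontend/Native), which directly reflects the user experience • Let's run an improvement cycle on "looking at metrics and acting" itself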
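
Slide 31 proposes tracking SLI/SLO on the client side (WebFrontend/Native). Below is a minimal sketch of a client-side SLI. It assumes each app reports one event per screen view, carrying a success flag and a render time. The ScreenView fields, the platform names, and the 3000 ms threshold are hypothetical and are not values from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ScreenView:
    platform: str   # hypothetical values: "web", "ios", "android"
    rendered: bool  # the screen finished rendering without an error
    render_ms: int  # time until the screen was usable, measured on the device


def client_sli_by_platform(
    views: Iterable[ScreenView], threshold_ms: int = 3000
) -> Dict[str, float]:
    """Fraction of good screen views per platform.

    A view is good only if it rendered successfully AND within threshold_ms,
    so the SLI reflects what the user actually saw on the device.
    """
    good: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for view in views:
        total[view.platform] += 1
        if view.rendered and view.render_ms <= threshold_ms:
            good[view.platform] += 1
    return {platform: good[platform] / total[platform] for platform in total}
```

A failed render counts against the SLI even when it was fast. A slow but successful one counts against it too, which is closer to what a user or a TPM would call "the screen didn't work."
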