Who owns the Service Level?

Takeshi Kondo

May 15, 2022

Transcript

 1. Who am I
   Takeshi Kondo (chaspy / chaspy_), Engineering Manager, Site Reliability at Recruit Co., Ltd. https://chaspy.me
 2. What is Site Reliability Engineering in the first place: Not like this
   • The service maintains "high reliability (≈100%)"
   • SLIs/SLOs are being met
   • The on-call rotation is run by the development team
   https://github.com/twitter/twemoji
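 3. What is Site Reliability Engineering in the first place: Like this!
   • The service maintains "the reliability that users expect"
   • SLIs/SLOs are defined and used as the indicator for deciding priority between non-functional and functional requirements
   • The team has agreed on monitoring methods and policies that allow an appropriate response when an SLO violation occurs
   • All of the above are reviewed regularly
   https://github.com/twitter/twemoji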
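As a minimal sketch of how an SLO could act as the indicator for prioritizing functional against non-functional work, the following Python computes an availability SLI and the remaining error budget. The `SloWindow` shape, the 99.9% target used in the usage note, and the `prioritize` rule are illustrative assumptions, not something stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed over one SLO window (hypothetical shape)."""
    good_requests: int
    total_requests: int


def sli(window: SloWindow) -> float:
    """Availability SLI: fraction of requests that were good."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully compliant
    return window.good_requests / window.total_requests


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo_target) * window.total_requests
    if allowed_bad == 0:
        return 0.0
    actual_bad = window.total_requests - window.good_requests
    return 1.0 - actual_bad / allowed_bad


def prioritize(window: SloWindow, slo_target: float) -> str:
    """Toy rule (an assumption): reliability work first once the budget is spent."""
    if error_budget_remaining(window, slo_target) <= 0:
        return "non-functional"
    return "functional"
```

With 100,000 requests and a 99.9% target, the budget is 100 bad requests. 50 failures leave about half of the budget, so feature work continues under this toy rule, while 200 failures overspend it and reliability work takes priority.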
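 4. Development teams acquire the capability to control reliability: Like this!
   SRE: supports the development team in acquiring reliability-related capabilities
   Development team: controls the reliability of its own services by itself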
 5. Team Topologies
   • Four team patterns
     • Stream Aligned
     • Platform
     • Enabling
     • Complicated Subsystem
   • Three communication patterns
     • Collaboration
     • X as a Service
     • Facilitation
   https://pub.jmam.co.jp/book/b593881.html
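 6. Development teams acquire the capability to control reliability: Like this!
   SRE (Platform Team / Enabling Team): builds the platform and the culture that support the development team's self-containment
   Development team (Stream Aligned Team): can prepare what it needs by itself = self-contained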
 7. Why self-containment matters
   SRE (Platform Team / Enabling Team): builds the platform and the culture that support the development team's self-containment
   Development team (Stream Aligned Team): can prepare what it needs by itself = self-contained
 8. Why self-containment matters: Not "VS", but "And"
   • Dev and Ops
     • Get feedback from users quickly (DevOps)
   • Dev and Infrastructure
     • Build through self-service to shorten lead time
   • Productivity and Reliability
     • Productivity and reliability depend on each other
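 9. Summary: what it means to realize SRE
   • Development teams have acquired the capability to control reliability
     • The development team is in a "self-contained" state
     • The SRE team supports this through the platform and by fostering the culture
   • Achieving this requires diverse perspectives that go beyond product development
     • Knowing what users expect / Product Management
     • High development productivity / Development Skills
     • How much cost to spend on non-functional requirements / Business Development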
 10. Looking back at the history of Study Sapuri SRE ~ after @chaspy joined
   • 2018: @chaspy joins
   • 2019: Application Platform migrated to Kubernetes
   • 2020: Microservices Readiness put in place
     • Service ownership defined
     • Design Doc / Production Readiness Checklist
     • Self-services Infrastructure (terraform monorepo)
     • SLI/SLO
   • 2021: SLI/SLO operation fully handed over to the development teams
   As a Platform Team: building the Platform
   As an Enabling Team: fostering a culture of SLI/SLO and similar practices in the development organization
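 11. Looking back at the history of Study Sapuri SRE ~ 2021
   • COVID-19 pandemic, access volume increases
   • Evolution of the Platform
     • Terraform monorepo
     • Loadtest Platform
     • monorepo CI split out using GitHub Actions
   • Organizational changes
     • Technology Strategy Group launched
     • Staff transferred to Recruit following the business transfer; Quipper Japan branch liquidated
     • chaspy appointed EM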
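 12. Organization size over time
   [Chart: one value per year; the year labels on the axis are not recoverable from the transcript]
   Developers: 35, 53, 54, 73, 114
   SRE: 4, 5, 7, 7, 7
   "Developers" is the number of Web Engineers (frontend & backend) across both Study Sapuri and Quipper. Native engineers are excluded. From 2022, contractors are also counted. Contractors also worked with the team in 2021 and earlier.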
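 13. Looking back at the history of Study Sapuri SRE
   • In every period, the team has behaved as both a Platform Team and an Enabling Team
   • Since 2019 in particular, with "self-containment" as the theme, it has built a Platform that minimizes the number of requests made to SRE
   • At the same time it stepped into "building the culture" of the development teams and fostered an organization-wide culture of watching SLIs/SLOs
   • → "SLO Review" at SRE NEXT 2020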
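 14. "SLO Review" at SRE NEXT 2020
   • An initiative that built a culture of reviewing SLOs in the development organization
   • Introduced bottom-up to 2 Products and 15 Teams
   • Four steps of the climb
     • Decide ownership of systems and of the organization
     • Run the process alone and establish how to do it
     • Define SLIs/SLOs together with Developers and run the process
     • Define an Error Budget Policy and act on it (not yet achieved)
   • Lessons learned
     • Provide standardized SLIs
     • Codify configuration at an early stage
     • Make the learning curve steep
   https://blog.studysapuri.jp/entry/2020/01/30/slo-review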
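The lessons "provide standardized SLIs" and "codify configuration at an early stage" can be illustrated with a small sketch. Everything below is hypothetical: the `SliKind` menu, the `SloDefinition` fields, the validation rules, and the thresholds in `policy_action` are assumptions made for illustration, not the format or policy used by the team in the talk.

```python
from dataclasses import dataclass
from enum import Enum


class SliKind(Enum):
    """A small, fixed menu of SLI types offered to every team (assumed)."""
    AVAILABILITY = "availability"
    LATENCY = "latency"


@dataclass(frozen=True)
class SloDefinition:
    """One SLO kept in version control next to the service (assumed shape)."""
    service: str
    owner_team: str
    kind: SliKind
    target: float          # e.g. 0.999 means 99.9% of events must be good
    window_days: int = 28


def validate(slo: SloDefinition) -> list[str]:
    """Return human-readable problems; an empty list means the definition is usable."""
    problems = []
    if not slo.owner_team:
        problems.append(f"{slo.service}: owner_team is required")
    if not 0.0 < slo.target < 1.0:
        problems.append(f"{slo.service}: target must be between 0 and 1 (exclusive)")
    if slo.window_days <= 0:
        problems.append(f"{slo.service}: window_days must be positive")
    return problems


def policy_action(budget_remaining: float) -> str:
    """Hypothetical error budget policy: map remaining budget to an agreed action."""
    if budget_remaining <= 0.0:
        return "pause feature releases and prioritize reliability work"
    if budget_remaining < 0.25:
        return "review the SLO with the product manager this week"
    return "continue feature development"
```

Keeping definitions as typed data lets a CI job call `validate` on every change, and restricting `kind` to a short enum is one way to keep the SLIs standardized across teams.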
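 15. What went well
   • An initiative that built a culture of reviewing SLOs in the development organization
   • Introduced bottom-up to 2 Products and 15 Teams
   • Steps of the climb
     • Decide ownership of systems and of the organization
     • Run the process alone and establish how to do it
     • Define SLIs/SLOs together with Developers and run the process
     • Define an Error Budget Policy and act on it (not yet achieved)
   • Lessons learned
     • Provide standardized SLIs
     • Codify configuration at an early stage
     • Make the learning curve steep
   https://blog.studysapuri.jp/entry/2020/01/30/slo-review
   Callout: we insisted on thoroughly reducing the cognitive load on development teams
   Callout: we ran feedback cycles to reduce uncertainty about the purpose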
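 16. What did not go well?
   • An initiative that built a culture of reviewing SLOs in the development organization
   • Introduced bottom-up to 2 Products and 15 Teams
   • Steps of the climb
     • Decide ownership of systems and of the organization
     • Run the process alone and establish how to do it
     • Define SLIs/SLOs together with Developers and run the process
     • Define an Error Budget Policy and act on it (not yet achieved)
   • Lessons learned
     • Provide standardized SLIs
     • Codify configuration at an early stage
     • Make the learning curve steep
   https://blog.studysapuri.jp/entry/2020/01/30/slo-review
   Callout: why did it not work?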
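 17. Why we never got as far as "acting"
   • At the time, the action to take on an SLO violation was delegated to the Product Manager / Team
   • It is not that nothing at all could be done
     • Teams could only do what fit into the improvement time they already had budgeted (one day every two weeks)
       • Known as QB Day
     • Direct fixes for errors, minor Performance improvements, and so on
     • Long-term, fundamental measures such as architecture changes or infrastructure separation were difficult
   • We did not reach the point where "priority between functional and non-functional requirements can be decided based on the indicator"
     • If it does not help with prioritization, one could say it only added work for the development teams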
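 18. "SLO Review" at SRE NEXT 2020: summary of what followed
   • Building a culture of "defining and observing reliability indicators" was valuable
   • The indicators did not grow into ones that help with decisions on business strategy
   • Reason 1. Product development teams had neither the decision-making authority nor the budget to change the balance between non-functional and functional requirements
     • The incentive to develop new features was strong
     • There was no mechanism for making that kind of technology strategy / technology investment across all products
   • Reason 2. The reliability indicators were not easy for everyone in Biz/Dev/SRE to understand
     • The SLI of a backend API does not directly represent the user experience, and Latency-related measures are hard to explain even to TPMs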
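 19. Current org chart: Elementary / Junior High / High School Product Development Department, 17 groups below
   Products: Study Sapuri Elementary / Junior High / High School / University Entrance Exam Courses, Study Sapuri For TEACHERS, Study Sapuri For SCHOOL
   Groups shown: TPM, BtoB TPM, BtoC TPM, ForSCHOOL TPM, Cross-functional, BtoC, BtoB, QA, Development Support, SRE, Technology Strategy, Coaching, New Development 1, Enhancement, Learning Support, Native, iOS, Android, New Development 2, Career/Self-Direction, Communication Support, ForSCHOOL Mobile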
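 20. Disclaimer
   • The Technology Strategy Group was launched by the previous manager
   • As of last year, @chaspy was Lead of the DevOps WG -> EM/Lead
   • Following the previous manager's departure, the department head also serves as EM of the Technology Strategy Group and runs it together with several other EMs
   • The group was not launched because SLO violations could not be handled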
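 21. Why a Technology Strategy "Group" is needed
   • How technology strategy is decided differs from organization to organization
     • A single CTO may decide top-down
     • Everyone may decide bottom-up by consensus
     • Somewhere in between is also fine
   • The Study Sapuri elementary/junior high/high school development organization is trying to build a mechanism in which technology strategy does not depend on one person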
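 22. Working bodies
   • Purpose
     • Make the product development organization and its systems more resilient to change
   • Goals
     • Define the technical vision and policies
     • Bring technical issues and debt under control
     • Establish improvement cycles and acquire self-diagnosis capability
   DevOps WG / Cross-functional WG / Backend WG / Frontend WG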
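 23. Working bodies
   • Purpose
     • Make the product development organization and its systems more resilient to change
   • Goals
     • Define the technical vision and policies
     • Bring technical issues and debt under control
     • Establish improvement cycles and acquire self-diagnosis capability
   DevOps WG / Cross-functional WG / Backend WG / Frontend WG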
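 24. Of course, it is not perfect
   • Technical issues are not something that can be measured by frequency and intensity
     • The assessment is qualitative
     • The participating members may be biased
     • Each issue is placed at the centroid of several members' votes, so the precision is questionable
   • Some issues cannot be started right away because of development resources, technical difficulty, or risk
   • Prioritizing issues takes time
   • etc…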
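To make the precision concern concrete, here is a minimal sketch of placing an issue at the centroid of several members' votes. It assumes each vote is a (frequency, intensity) pair on a hypothetical 1 to 5 scale; the slides do not describe the scale or the tooling, and the `spread` measure is an addition for illustration only.

```python
from math import sqrt
from statistics import fmean

Vote = tuple[float, float]  # (frequency, intensity), hypothetical 1-5 scales


def centroid(votes: list[Vote]) -> Vote:
    """Mean position of all votes; this is where the issue would be plotted."""
    if not votes:
        raise ValueError("at least one vote is required")
    return (fmean(v[0] for v in votes), fmean(v[1] for v in votes))


def spread(votes: list[Vote]) -> float:
    """Root-mean-square distance of the votes from their centroid."""
    cx, cy = centroid(votes)
    return sqrt(fmean((x - cx) ** 2 + (y - cy) ** 2 for x, y in votes))
```

Three votes at (1, 1), (3, 5) and (5, 3) average to (3, 3), the same point as three identical votes at (3, 3). A spread value reported next to the centroid is one way to tell those two situations apart.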
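 25. Working bodies
   • Purpose
     • Make the product development organization and its systems more resilient to change
   • Goals
     • Define the technical vision and policies
     • Bring technical issues and debt under control
     • Establish improvement cycles and acquire self-diagnosis capability
   DevOps WG / Cross-functional WG / Backend WG / Frontend WG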
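 26. Purpose and activities of the DevOps WG
   • Purpose: set up so that development teams acquire self-diagnosis capability
   • Members are WebDev / QA / SRE from each area
   • Activities
     • Running value stream mapping
     • Identifying and measuring candidate Metrics / Indicators
     • Running DX Criteria
     • Resolving factors that block the value stream (e.g. E2E Automation)
     • Communicating the work outside the Product Development Department
   • As a first step, we validated metrics and assessments that looked useful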
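 27. Communicating outside the Product Development Department: presentation at the BtoC All Hands
   https://blog.studysapuri.jp/entry/2020/08/17/dx-criteria-system
   • A forum for learning about the state of the business beyond one's own group
     • Market news
     • State of the business
     • Product KPIs
     • SLI / developer productivity
     • Various other topics
       • What is SRE?
       • What are microservices? What are they good for?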
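 28. SRE and technology strategy
   • The DevOps WG's activities extend "realizing SRE" on the cultural side
     • The indicators we should watch are not only system reliability indicators
     • Look at indicators for everything, then make decisions
   • From here on, we run an improvement cycle on this culture-building itself
     • 1. The usefulness of candidate metrics becomes clear and they are quantified
     • 2. Development teams can look at them and think of actions
     • 3. Development teams run the action -> improvement cycle
     • 4. Evaluate whether steps 1-3 themselves are working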
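 29. SRE and technology strategy: summary
   • Realizing SRE requires the budget and the authority to act when an SLO is violated
   • On top of that, it requires a technology strategy that can prioritize which technical issues to solve
   • The Study Sapuri Elementary / Junior High / High School Product Development Department is trying to realize this technology strategy as a group, without depending on one person
   • A culture of looking at everything through indicators is important for reliability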
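 30. SRE "NEXT" in Study Sapuri
   • For "reliability", the SRE Team is close to fulfilling its role as an Enabling Team
   • What comes next for the SRE Team
     • To make the organization more Sustainable / Scalable, expand onboarding for the Platform and provide an assessment that lets development teams judge autonomously how far they have acquired reliability-related capabilities
     • Focus on developing a Platform that delivers development productivity, not only reliability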
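 31. Who owns the Service Level?
   • The Service Level belongs to everyone involved in the product
   • Let's evolve reliability indicators into ones that everyone can care about
     • Track SLIs/SLOs on the Client-side (WebFrontend/Native), which represent the user experience directly
   • Let's run an improvement cycle on "looking at indicators and acting" itself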
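A client-side SLI of the kind the last slide calls for could be computed roughly as follows. This is a sketch under stated assumptions: the `PageView` event shape, the "usable within a threshold" definition of a good event, and the platform labels are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """One client-reported page view (hypothetical event shape)."""
    platform: str          # e.g. "web" or "native"
    load_ms: float         # time until the screen was usable
    failed: bool = False   # True when the client showed an error screen


def client_side_sli(events: list[PageView], threshold_ms: float) -> float:
    """Fraction of page views that succeeded and became usable within threshold_ms."""
    if not events:
        return 1.0  # assumption: no traffic counts as fully compliant
    good = sum(1 for e in events if not e.failed and e.load_ms <= threshold_ms)
    return good / len(events)


def sli_by_platform(events: list[PageView], threshold_ms: float) -> dict[str, float]:
    """Compute the same SLI separately for each platform label."""
    grouped: dict[str, list[PageView]] = {}
    for e in events:
        grouped.setdefault(e.platform, []).append(e)
    return {p: client_side_sli(es, threshold_ms) for p, es in grouped.items()}
```

Because a good event is defined by what the user saw rather than by a backend response code, a number like "95% of screens were usable within 2 seconds" may be easier to discuss across Biz, Dev, and SRE than a backend latency percentile.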