Implementing Site Reliability Engineering in your organization

- Making Culture, Enabling DevOps, Building Platform -

Infra Study 2nd #7 "SRE and Organizations" (「SREと組織」)
https://forkwell.connpass.com/event/228038/

Takeshi Kondo

November 16, 2021

Transcript

  1. Implementing Site Reliability Engineering in your organization - Making Culture, Enabling DevOps, Building Platform -

    Takeshi Kondo / @chaspy 2021/11/16 Infra Study 2nd #7 "SRE and Organizations" (「SREと組織」)
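  2. What I will talk about / What I won't talk about / Who this is for • Will talk about • The underlying thinking behind implementing SRE in an organization

    • Won't talk about • Specific technologies • Examples of SRE practices in action • Who this is for • People who want to implement SRE in their organization but are struggling with how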
  3. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at Study Sapuri / Quipper

    3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and what's next
  4. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at Study Sapuri / Quipper

    3. SRE and culture (main) 4. SRE and DevOps 5. SRE and Platform 6. Summary and what's next
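  6. What it means to implement SRE in an organization • The goal of SRE • Controlling the reliability of the site • Deciding whether to invest in Agility or in Reliability based on a metric called the SLO

    • Aiming for a state in which the teams that build their own products and services do these things as a matter of course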
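  7. What it means to implement SRE in an organization • The goal of SRE • Controlling the reliability of the site • Deciding whether to invest in Agility or in Reliability based on a metric called the SLO

    • Aiming for a state in which the teams that build their own products and services do these things as a matter of course. In other words: getting an organization, its teams, and its people to actually carry something out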
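  9. Study Sapuri K12 SRE Team Vision / Mission / Values

    • Vision • Realize a development organization that can keep building the best learning products • Mission • Build the platform and culture that let self-contained teams keep delivering their products quickly and safely • Values • Fail smart / Learning / Borderless / Metrics-driven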
  10. Study Sapuri K12 SRE Team Vision / Mission / Values

    https://blog.studysapuri.jp/entry/sre-vision-mission-values
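  11. Growth of the organization [chart: number of developers and SREs over time]

    "Developers" counts the Web Engineers (frontend & backend) at both Study Sapuri and Quipper. Native engineers are excluded.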
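  12. The situation as of January 2020 • We had moved the Platform onto Kubernetes and were aiming for a Microservices Ready state • We had put processes in place so that the organization and its systems could scale

    • Defining service ownership • Design Doc • Production Readiness Checklist • Self-service Infrastructure (Terraform). Blog post: "The world we want to achieve by adopting Kubernetes, and the Microservices beyond it" https://blog.studysapuri.jp/entry/future-with-kubernetes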
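  14. What does it mean to implement SRE in an organization? • Implementing SRE in an organization ▶ building an SRE culture within the development organization

    • A state in which developers, as members of the development organization, practice things such as defining SLIs/SLOs and naturally behave in ways that keep reliability under control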
  15. Cognitive load: "In cognitive psychology, cognitive load refers to the used amount of working memory resources" (from Wikipedia)
  16. Terraform Platform in Quipper: I spoke about our product's Terraform Platform at HashiTalks Japan 2021

    https://blog.studysapuri.jp/entry/2021/10/13/080000
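  17. Articulate everything thoroughly • Put the Background, Problem, Why, What, and How into words • This makes it easier to get feedback

    • It spreads on its own. Words are powerful. The only way forward is to keep repeating the cycle: put it into words, make a decision, try it, and reflect.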
  19. Class SRE implements DevOps • SRE is the practice that puts the DevOps philosophy into action • What's the Difference Between DevOps and SRE? (class SRE implements DevOps)

    https://www.youtube.com/watch?v=uTEL8Ff1Zvk
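  20. DevOps WG • Each development team works with SRE and QA members on "acquiring the ability to self-diagnose" • Applying the DX Criteria

    • Creating value stream maps of the development process • Collecting and observing metrics • Providing input for fundamental improvements to the Platform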
  22. Why Platform? • Platform = a shared runtime foundation used across teams • Why do we need a Platform?

    • Lower operational load • Lower cognitive load • Collect metrics more effectively • Improve both Agility and Reliability
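  24. Summary • Things worth keeping in mind when implementing SRE in an organization • Understand the narrative • Lower cognitive load • Articulate thoroughly

    • Build, across the whole development organization, an organization and systems that are resilient to change • Realizing DevOps • Developing the Platform • Supporting SRE practice within development teams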
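  25. What's next • We will keep implementing Site Reliability Engineering in the organization • Support development teams in acquiring these capabilities themselves

    • The SRE Team will focus on Platform development while paying down infrastructure technical debt • Security follows the same pattern as Reliability: we will build a culture and platform that let development teams practice it autonomously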
  26. FAQ: Does the SRE team itself develop infrastructure platforms? • Yes. We mostly develop platforms built on the Cloud

    • Cloud Native Platform • Application CI/CD • Infrastructure CI/CD • Kubernetes Platform • Loadtest Platform
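Slides 6, 7 and 14 describe deciding between Agility and Reliability based on an SLO, with developers defining SLIs/SLOs themselves. The deck does not show how that decision is computed, so the following is only a minimal Python sketch of the common error-budget approach. It assumes a request-based SLI in which a request counts as "good" if it is not a 5xx and finishes under a latency threshold. `Request`, `Slo`, `decide`, and the 0.99 / 300 ms values are hypothetical, not Quipper's actual tooling or targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    status: int        # HTTP status code returned to the user
    latency_ms: float  # server-side latency in milliseconds


@dataclass(frozen=True)
class Slo:
    target: float                # required fraction of good requests, e.g. 0.99
    latency_threshold_ms: float  # hypothetical cut-off for a "fast enough" request

    def __post_init__(self) -> None:
        # A 100% target leaves no error budget, which would make the ratio below divide by zero.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def is_good(req: Request, slo: Slo) -> bool:
    """Assumed SLI definition: not a server error and within the latency threshold."""
    return req.status < 500 and req.latency_ms <= slo.latency_threshold_ms


def sli(requests: list[Request], slo: Slo) -> float:
    """Fraction of good requests; a window with no traffic counts as fully compliant."""
    if not requests:
        return 1.0
    return sum(is_good(r, slo) for r in requests) / len(requests)


def error_budget_remaining(requests: list[Request], slo: Slo) -> float:
    """1.0 = budget untouched, 0.0 = exactly spent, negative = SLO missed."""
    consumed = (1.0 - sli(requests, slo)) / (1.0 - slo.target)
    return 1.0 - consumed


def decide(requests: list[Request], slo: Slo) -> str:
    """Hypothetical policy: budget left -> favor agility, otherwise favor reliability."""
    return "agility" if error_budget_remaining(requests, slo) > 0 else "reliability"


if __name__ == "__main__":
    slo = Slo(target=0.99, latency_threshold_ms=300)
    healthy = [Request(200, 120)] * 995 + [Request(503, 80)] * 5
    slow = [Request(200, 120)] * 980 + [Request(200, 900)] * 20
    print(round(error_budget_remaining(healthy, slo), 2))  # 0.5
    print(decide(healthy, slo))  # agility
    print(decide(slow, slo))  # reliability
```

With 5 bad requests out of 1,000 against a 99% target, half of the budget remains and the sketch favors shipping features; with 20 slow requests the budget is overspent and it flips to reliability work. Keeping the rule this mechanical is what lets a product team apply it on its own, which is the state slide 7 aims for.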
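Slide 20 lists "collecting and observing metrics" and slide 22 "collecting metrics more effectively", but neither names the metrics. Purely as an assumption, the sketch below uses two widely used DevOps delivery metrics from the DORA research, deployment frequency and lead time for changes, computed from hypothetical `Deploy` records holding a commit time and a production deploy time. The deck does not say which metrics the DevOps WG actually tracks or where the data comes from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deploy:
    commit_time: datetime  # hypothetical field: when the change was committed
    deploy_time: datetime  # hypothetical field: when it reached production


def deploys_per_week(deploys: list[Deploy], start: datetime, end: datetime) -> float:
    """Deployment frequency: production deploys per week within [start, end)."""
    weeks = (end - start) / timedelta(weeks=1)
    if weeks <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for d in deploys if start <= d.deploy_time < end)
    return count / weeks


def median_lead_time(deploys: list[Deploy]) -> timedelta:
    """Lead time for changes: median commit-to-production duration."""
    if not deploys:
        raise ValueError("need at least one deploy")
    return median(d.deploy_time - d.commit_time for d in deploys)


if __name__ == "__main__":
    base = datetime(2021, 11, 1)
    history = [
        Deploy(base, base + timedelta(hours=1)),
        Deploy(base + timedelta(days=1), base + timedelta(days=1, hours=2)),
        Deploy(base + timedelta(days=8), base + timedelta(days=8, hours=4)),
    ]
    print(deploys_per_week(history, base, base + timedelta(weeks=2)))  # 1.5
    print(median_lead_time(history))  # 2:00:00
```

Using the median rather than the mean keeps a single long-lived change from dominating the lead-time figure; that choice is also an assumption of this sketch.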