Implementing Site Reliability Engineering in your organization

- Making Culture, Enabling DevOps, Building Platform -

Infra Study 2nd #7 "SRE and Organization" (「SREと組織」)
https://forkwell.connpass.com/event/228038/

Takeshi Kondo

November 16, 2021
Transcript

  1. Implementing Site Reliability Engineering in your organization - Making Culture, Enabling DevOps, Building Platform - Takeshi Kondo / @chaspy 2021/11/16 Infra Study 2nd #7 "SRE and Organization"

  2. Who am I chaspy chaspy_ Engineering Manager Site Reliability at Recruit Co., Ltd. Takeshi Kondo
  3. SRE NEXT 2020 https://sre-next.dev/schedule#c4

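  4. What I will cover today / What I won't cover / Audience • What I will cover • The thinking that underpins implementing SRE in an organization • What I won't cover • Specific technologies • Examples of SRE practices in action • Audience • People who want to implement SRE in their organization but are struggling with how

  5. Tl;dr • Things to keep in mind when implementing SRE in an organization • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly

  6. Infra Study Meetup #3 "SRE: Past and Future" https://speakerdeck.com/masayoshi/sre-culture-organization?slide=29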

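  7. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  8. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture (main) 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  9. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  10. What it means to implement SRE in an organization • The goal of SRE • Control the reliability of the site • Use SLOs as the indicator for deciding whether to invest in agility or in reliability • Aim for a state where the teams that build their own products and services do this as a matter of course

  11. An organization is a system too https://speakerdeck.com/masayoshi/sre-culture-organization?slide=29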

  12. An organization is a system too 💡

  13. An organization is a system too 🤔

  14. An organization is people 🙆

  15. Organizations don't talk https://note.com/qsona/n/ncb9e1f242fb4

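  16. What it means to implement SRE in an organization • The goal of SRE • Control the reliability of the site • Use SLOs as the indicator for deciding whether to invest in agility or in reliability • Aim for a state where the teams that build their own products and services do this as a matter of course. In other words: getting an organization, a team, a person to carry something out

  17. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps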
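  18. StudySapuri K12 SRE Team's Vision / Mission / Values • Vision • Realize a development organization that can keep building the best learning products • Mission • Build the platform and culture that let self-contained teams keep delivering products quickly and safely • Values • Fail smart / Learning / Borderless / Metrics-driven

  19. StudySapuri K12 SRE Team's Vision / Mission / Values https://blog.studysapuri.jp/entry/sre-vision-mission-values

  20. Growth in organization size [chart: number of developers and SREs over time] "Developers" counts Web Engineers (frontend & backend) at both StudySapuri and Quipper. Native engineers are excluded.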
  21. Timeline at 2020-01-11 (SRE NEXT 2020)

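  22. The situation as of January 2020 • We were moving the platform onto Kubernetes and aiming to become Microservices Ready • We put processes in place so that the organization and the systems could scale • Defining service ownership • Design Doc • Production Readiness Checklist • Self-service infrastructure (Terraform). Blog post: "The world we want to realize by adopting Kubernetes, and the microservices beyond it" https://blog.studysapuri.jp/entry/future-with-kubernetes

  23. A year and a half later, as of October 2021 • Aiming to be "self-contained" • Each service team writes Design Docs as a matter of course, thinks through and defines its SLIs/SLOs, and reviews them regularly • It should be fair to say this has taken root as culture

  24. Culture 🤔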

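  25. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture (main) 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  26. What is culture? The totality of the many behaviors that people acquire as members of a society (from Wikipedia)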

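  27. What does implementing SRE in an organization mean? • Implementing SRE in an organization ▶ Creating an SRE culture in the development organization • A state in which developers, as members of the development organization, carry out practices such as defining SLIs/SLOs and naturally behave in ways that control reliability

  28. How do you create a culture? • Get development teams to accept a foreign culture • Organizations are made up of people, so understand human nature • People feel uneasy about what they don't know • People don't want to do what is difficult • People are not good with change. That is where they differ from systems.

  29. What does it take for a foreign culture to be accepted? • People know what the subject (SRE culture) actually is • They understand the rationale and the benefits, and feel an incentive • Putting it into practice is as easy as it can possibly be. It is hard to achieve this just by "explaining".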

  30. Three keys to building a culture • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly

  31. Three keys to building a culture • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly

  32. Understand narratives https://publishing.newspicks.com/books/9784910063010 https://twitter.com/chaspy_/status/1223088950387982337?s=20 https://twitter.com/chaspy_/status/1403587911421894657?s=20

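  33. In praise of ad hoc 1on1s: cooperating across section boundaries https://speakerdeck.com/chaspy/how-we-overcame-the-covid-19-crisis?slide=56 Web Developer: I learned about the operational burden while team size is changing, and how they see the current situation. Business Developer / Planner: we talked about "Do you even know what SRE is?" and shared learner (User) KPIs and reliability indicators. By the way, when I go into one of these I write an agenda in a Google Doc and share it beforehand. Otherwise it would catch people off guard.

  34. Understand narratives • Get to know others through dialogue, and recognize where you cannot understand each other • While working with people in different positions, think through how to realize the world we are aiming for • The other side gets to understand what we are trying to do in the first place • It prompts us to think about how to explain the incentives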

  35. Three keys to building a culture • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly

  36. Cognitive load: In cognitive psychology, cognitive load refers to the used amount of working memory resources (from Wikipedia)

  37. Reduce cognitive load. Blog post: "I gave a talk titled 'SLO Review' at SRE NEXT 2020 #srenext" https://blog.studysapuri.jp/entry/2020/01/30/slo-review

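  38. Reduce cognitive load • Documentation and the paths that lead to it • Guides and automation • Simplicity

  39. Reduce cognitive load. Together with the deploy-completed notification, we post a link to the Preview environment (an environment generated per PR) and a link to the Argo CD UI. When a Kubernetes manifest is changed, we post the diff produced after running kustomize.

  40. Reduce cognitive load. Manifest lint with Conftest. Example finding: Node affinity is not specified. The message points to documentation that explains how to fix it.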

  41. Terraform Platform in Quipper. Blog post: "I spoke at HashiTalks Japan 2021 about our product's Terraform Platform" https://blog.studysapuri.jp/entry/2021/10/13/080000

  42. Three keys to building a culture • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly

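  43. Put everything into words, thoroughly • Put the Background, Problem, Why, What, and How into words • Make it easy to get feedback • It gets shared on its own. Words are powerful. All we can do is keep repeating the cycle: put it into words, decide, try it, and look back.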
  44. quipper/snippets-ja

  45. quipper/snippets-ja

  46. Three keys to building a culture • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly

  47. Agenda 1. What it means to implement SRE in an organization 2. What we have done so far as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  48. Class SRE implements DevOps • SRE is what puts the DevOps philosophy into practice • What's the Difference Between DevOps and SRE? (class SRE implements DevOps) https://www.youtube.com/watch?v=uTEL8Ff1Zvk
  49. What is DevOps?

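  50. StudySapuri elementary, junior high, high school, and university development department, "Technology Strategy Group" • Make the product development organization and its systems more resilient to change • Define the technical vision and direction • Bring technical issues and technical debt under control • Establish an improvement cycle and acquire the ability to self-diagnose • Currently active as the DevOps WG (Working Group)

  51. DevOps WG • Members from each development team, SRE, and QA are working out how to "acquire the ability to self-diagnose" • Running the DX Criteria assessment • Creating value stream maps of the development process • Collecting and observing metrics • Providing input toward fundamental improvements to the Platform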
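  52. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  53. Why Platform? • Platform = an execution foundation that everyone uses in common • Why do we need a Platform? • Reduce operational load • Reduce cognitive load • Collect metrics more effectively • Raise both agility and reliability

  54. Cloud Native Platform. We built a mission tree that arranges the team members' missions in a tree structure. I plan to write a blog post about it. These are the child nodes of the tree called Platform.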
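  55. Agenda 1. What it means to implement SRE in an organization 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and next steps

  56. Summary • Things to keep in mind when implementing SRE in an organization • Understand narratives • Reduce cognitive load • Put everything into words, thoroughly • Build an organization and systems that are resilient to change, across the whole development organization • Realize DevOps • Develop the Platform • Support SRE practice inside development teams

  57. Next steps • We will keep working on implementing Site Reliability Engineering in the organization • Support development teams in acquiring capabilities • The SRE team will focus on Platform development while paying down infrastructure technical debt • Security follows the same pattern as reliability. We will build a culture and a platform that let development teams practice it autonomously.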
  58. We are hiring! https://brand.studysapuri.jp/career/position/sre

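  59. Thank you! chaspy chaspy_ Engineering Manager Site Reliability at Recruit Co., Ltd. Takeshi Kondo

  60. FAQ: Where was resistance strongest, and what brought the two sides together? • There was never really a scene where we "ran into strong resistance" • That is because we introduced things slowly and carefully

  61. FAQ: After the SRE implementation took root, did other teams' mindset or behavior change in any way? • Teams now improve performance autonomously, based on SLO violation alerts and load test results

  62. FAQ: Does the SRE team itself develop infrastructure platforms? • Yes. We mainly develop platforms that make use of the cloud • Cloud Native Platform • Application CI/CD • Infrastructure CI/CD • Kubernetes Platform • Loadtest Platform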
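
Slides 10 and 61 describe using SLOs to decide whether to invest in agility or in reliability, and teams acting on SLO violation alerts. The talk does not show how this is implemented, so the following is only a minimal sketch of one common way to frame that decision, an error-budget check. The `SloWindow` type, both function names, and all numbers are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (hypothetical data shape)."""
    target: float        # SLO target as a fraction, e.g. 0.999
    total_events: int    # all valid requests in the window
    good_events: int     # requests that met the SLI criterion


def error_budget_remaining(window: SloWindow) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_bad = (1.0 - window.target) * window.total_events
    if allowed_bad == 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0
    actual_bad = window.total_events - window.good_events
    return 1.0 - actual_bad / allowed_bad


def investment_focus(window: SloWindow) -> str:
    """Suggest 'agility' while budget remains, otherwise 'reliability'."""
    return "agility" if error_budget_remaining(window) > 0 else "reliability"
```

With a 99.9% target and 1,000,000 requests of which 999,500 were good, about half of the budget remains and `investment_focus` suggests "agility". With only 998,000 good requests the budget is overspent and it suggests "reliability". A real policy would likely add burn-rate alerting and a human review step rather than a single threshold.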