
Implementing Site Reliability Engineering in your organization

Takeshi Kondo
November 16, 2021

- Making Culture, Enabling DevOps, Building Platform -

Infra Study 2nd #7 "SRE and Organization"
https://forkwell.connpass.com/event/228038/

Transcript

  1. Implementing Site Reliability Engineering in your organization
    - Making Culture, Enabling DevOps, Building Platform -
    Takeshi Kondo / @chaspy
    2021/11/16
    Infra Study 2nd #7 "SRE and Organization"

  2. Who am I
    chaspy chaspy_
    Engineering Manager

    Site Reliability at Recruit Co., Ltd.
    Takeshi Kondo

  3. SRE NEXT 2020
    https://sre-next.dev/schedule#c4

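  4. What I'll cover / What I won't / Who this is for
    • What I'll cover
      • The thinking behind implementing SRE in an organization
    • What I won't cover
      • Specific technologies
      • Concrete examples of SRE practices
    • Who this is for
      • People who want to implement SRE in their organization but are struggling with how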

  5. TL;DR
    • Things to keep in mind when implementing SRE in an organization
      • Understand the narrative
      • Reduce cognitive load
      • Put everything into words, thoroughly

  6. Infra Study Meetup #3 "SRE: Past and Future"
    https://speakerdeck.com/masayoshi/sre-culture-organization?slide=29

  7. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

  8. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture ← main
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

  9. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

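  10. What does it mean to implement SRE in an organization?
    • The goals of SRE
      • Control the reliability of the site
      • Decide whether to invest in agility or in reliability, based on a metric called the SLO
      • Reach a state where the teams building their own products and services do these things as a matter of course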
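The SLO-based choice between agility and reliability on this slide is commonly made through an error budget. Below is a minimal sketch. It assumes a request-based availability SLI and a hypothetical policy: "while budget remains, spend it on agility; once it is exhausted, invest in reliability." The names `SLOWindow`, `error_budget_remaining`, and `next_investment` are illustrative and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class SLOWindow:
    """Request counts observed over one SLO window (e.g. 28 or 30 days)."""
    slo_target: float    # e.g. 0.999 means 99.9% of requests must be good
    good_requests: int
    total_requests: int


def error_budget_remaining(w: SLOWindow) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if w.total_requests == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_bad = (1.0 - w.slo_target) * w.total_requests
    actual_bad = w.total_requests - w.good_requests
    if allowed_bad <= 0:
        # A 100% target leaves no budget at all.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def next_investment(w: SLOWindow) -> str:
    """Hypothetical policy: budget left -> ship features, else harden the service."""
    return "agility" if error_budget_remaining(w) > 0 else "reliability"
```

With a 99.9% target over 1,000,000 requests, 250 failures leave 75% of the budget, so the team keeps shipping. 1,500 failures overspend it, so the next investment goes to reliability.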

  11. An organization is a system, too
    https://speakerdeck.com/masayoshi/sre-culture-organization?slide=29

  12. An organization is a system, too 💡

  13. An organization is a system, too 🤔

  14. An organization is people 🙆

  15. Organizations don't talk, you know
    https://note.com/qsona/n/ncb9e1f242fb4

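  16. What does it mean to implement SRE in an organization?
    • The goals of SRE
      • Control the reliability of the site
      • Decide whether to invest in agility or in reliability, based on a metric called the SLO
      • Reach a state where the teams building their own products and services do these things as a matter of course
    → Getting an organization, a team, and individual people to actually do something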

  17. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

  18. StudySapuri K12 SRE Team: Vision / Mission / Values
    • Vision
      • A development organization that can keep building the best learning products
    • Mission
      • Build the platform and culture that let self-contained teams keep delivering the product quickly and safely
    • Values
      • Fail smart / Learning / Borderless / Metrics-driven

  19. StudySapuri K12 SRE Team: Vision / Mission / Values
    https://blog.studysapuri.jp/entry/sre-vision-mission-values

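  20. Growth in organization size
    [Chart: number of developers and number of SREs over time]
    Developers = the number of web engineers (frontend & backend) across both StudySapuri and Quipper. Native app engineers are excluded.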

  21. Timeline at 2020-01-11 (SRE NEXT 2020)

  22. The situation as of January 2020
    • We had moved the platform onto Kubernetes and were aiming for a microservices-ready state
    • We put processes in place so that the organization and the system could scale
      • Defining service ownership
      • Design Doc
      • Production Readiness Checklist
      • Self-service Infrastructure (Terraform)
    "The world we want to achieve by adopting Kubernetes, and the microservices beyond it": https://blog.studysapuri.jp/entry/future-with-kubernetes
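A Production Readiness Checklist can be encoded as data so that missing items are reported automatically instead of being checked by hand. This is a hypothetical sketch. The checklist items loosely mirror the slide (ownership, Design Doc, SLOs, Terraform-managed infrastructure), but the runbook item and all `ServiceProfile` fields are assumptions, not the team's actual checklist.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ServiceProfile:
    """Hypothetical metadata a team fills in before going to production."""
    name: str
    owner_team: Optional[str] = None
    design_doc_url: Optional[str] = None
    slos: List[str] = field(default_factory=list)
    runbook_url: Optional[str] = None  # assumption: not listed on the slide
    managed_by_terraform: bool = False


# Each checklist item maps to a predicate over the profile (items are illustrative).
CHECKS: Dict[str, Callable[[ServiceProfile], bool]] = {
    "has an owning team": lambda s: bool(s.owner_team),
    "has a Design Doc": lambda s: bool(s.design_doc_url),
    "defines at least one SLO": lambda s: len(s.slos) > 0,
    "has a runbook": lambda s: bool(s.runbook_url),
    "infrastructure managed via Terraform": lambda s: s.managed_by_terraform,
}


def readiness_gaps(service: ServiceProfile) -> List[str]:
    """Return the checklist items the service does not satisfy yet, in order."""
    return [item for item, check in CHECKS.items() if not check(service)]
```

Keeping the checklist as data means a CI job can run it on every new service, which fits the "self-service" direction on the slide better than a document people must remember to read.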

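  23. A year and a half later: as of October 2021
    • Aiming for "self-contained" teams
    • Every service team routinely writes Design Docs, thinks through and defines SLIs/SLOs, and reviews them regularly
    • It's fair to say this has taken root as culture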
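Reviewing SLIs/SLOs regularly comes down to computing each SLI over a window and comparing it with its target. Below is a minimal sketch, assuming a latency SLI defined as "the fraction of requests served within a threshold". `latency_sli` and `review` are hypothetical helpers. A real setup would read these numbers from a monitoring system rather than from a Python list.

```python
from typing import Iterable


def latency_sli(latencies_ms: Iterable[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within threshold_ms."""
    samples = list(latencies_ms)
    if not samples:
        return 1.0  # no requests in the window: treat as meeting the SLI
    fast = sum(1 for v in samples if v <= threshold_ms)
    return fast / len(samples)


def review(name: str, sli: float, slo: float) -> str:
    """One line of a periodic SLO review report."""
    status = "OK" if sli >= slo else "VIOLATED"
    return f"{name}: SLI={sli:.2%} SLO={slo:.2%} -> {status}"
```

For example, `review("checkout", latency_sli([100, 200, 300, 400, 1200], 500), 0.9)` reports `checkout: SLI=80.00% SLO=90.00% -> VIOLATED`.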

  24. Culture 🤔

  25. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture ← main
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

  26. What is culture?
    The totality of the many behaviors that people acquire as members of a society
    (from Wikipedia)

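  27. What does implementing SRE in an organization mean?
    • Implementing SRE in an organization ▶ creating an SRE culture in the development organization
    • That is, a state in which developers, as members of the development organization, practice things like defining SLIs/SLOs and naturally behave in ways that keep reliability under control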

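  28. How do you create a culture?
    • Get development teams to accept a foreign culture
    • Organizations are made of people, so understand how people behave
      • People feel anxious about things they don't know
      • People don't want to do things that are hard
      • People don't handle change well
    That's where organizations differ from systems.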

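  29. How do you get people to accept a foreign culture?
    • They know what it (SRE culture) actually is
    • They understand its rationale and benefits, and feel an incentive
    • Putting it into practice is as easy as possible
    Achieving this just by "explaining" is hard.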

  30. Three points for building culture
    • Understand the narrative
    • Reduce cognitive load
    • Put everything into words, thoroughly

  31. Three points for building culture
    • Understand the narrative
    • Reduce cognitive load
    • Put everything into words, thoroughly

  32. Understand the narrative
    https://publishing.newspicks.com/books/9784910063010
    https://twitter.com/chaspy_/status/1223088950387982337?s=20
    https://twitter.com/chaspy_/status/1403587911421894657?s=20

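  33. The case for informal ("stray") 1on1s
    Collaborating across section boundaries: https://speakerdeck.com/chaspy/how-we-overcame-the-covid-19-crisis?slide=56
    • Web Developers: learned about their operational burden and how they see the current situation while the team size keeps changing
    • Business Developers / Planners: talked about things like "Do you actually know what SRE is?", and shared learner (user) KPIs and reliability metrics
    By the way, whenever I set one up, I write an agenda in a Google Doc and share it beforehand. Otherwise people get startled.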

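  34. Understand the narrative
    • Get to know others through dialogue, including where you don't yet understand each other
    • While working with people in different positions, think hard about how to achieve the world we're aiming for
      • The other side comes to understand what we are trying to do in the first place
      • It prompts us to think about how to explain the incentives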

  35. Three points for building culture
    • Understand the narrative
    • Reduce cognitive load
    • Put everything into words, thoroughly

  36. Cognitive load
    In cognitive psychology, cognitive load refers to the used amount of working memory resources.
    (from Wikipedia)

  37. Reduce cognitive load
    I gave a talk titled "SLO Review" at SRE NEXT 2020 #srenext: https://blog.studysapuri.jp/entry/2020/01/30/slo-review

  38. Reduce cognitive load
    • Documentation, and clear paths to it
    • Guides and automation
    • Simplicity

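  39. Reduce cognitive load
    • Along with the deploy-complete notification, post a link to the Preview environment (an environment generated for each PR) and a link to the Argo CD UI
    • When a Kubernetes manifest is changed, post the diff of the manifests after running kustomize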
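Below is a sketch of the two notifications on this slide. It assumes the rendered manifests (for example, kustomize output) are already available as strings for both the base branch and the PR. The preview and Argo CD URL layouts, the `deploy_message` helper, and the `base`/`pr` file labels are hypothetical. Only `difflib.unified_diff` is a real standard-library call.

```python
import difflib


def deploy_message(pr_number: int, preview_base: str, argocd_base: str, app: str) -> str:
    """Deploy-complete message with links (the URL layout is an assumption)."""
    preview = f"{preview_base}/pr-{pr_number}"       # hypothetical per-PR preview URL
    argocd = f"{argocd_base}/applications/{app}"     # Argo CD UI page for the app
    return (f"Deploy of PR #{pr_number} finished.\n"
            f"Preview: {preview}\n"
            f"Argo CD: {argocd}")


def manifest_diff(before: str, after: str) -> str:
    """Unified diff of two rendered manifests; empty string if nothing changed."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="base",
        tofile="pr",
    )
    return "".join(lines)
```

Diffing the rendered output rather than the kustomize overlays shows reviewers exactly what will reach the cluster, so they need not expand patches in their heads.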

  40. Reduce cognitive load
    Manifest linting with Conftest
    • Example violation: node affinity is not specified
    • The message leads to a document that explains how to fix it
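Conftest policies are written in Rego. The Python below is only a stand-in for the same idea: check a Deployment's pod template for `nodeAffinity` and, if it is missing, point the author to a fix-it document. `FIX_DOC_URL` and `lint_node_affinity` are hypothetical. The field path `spec.template.spec.affinity.nodeAffinity` follows the Kubernetes Deployment schema.

```python
from typing import Any, Dict, List

# Hypothetical location of the "how to fix it" document.
FIX_DOC_URL = "https://example.com/docs/node-affinity"


def lint_node_affinity(manifest: Dict[str, Any]) -> List[str]:
    """Return violation messages for a Deployment without nodeAffinity."""
    if manifest.get("kind") != "Deployment":
        return []  # this rule only applies to Deployments
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    affinity = pod_spec.get("affinity") or {}
    if "nodeAffinity" in affinity:
        return []
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    # Put the fix-it link in the message itself so nobody has to search for it.
    return [f"Deployment {name}: nodeAffinity is not set. How to fix: {FIX_DOC_URL}"]
```

Putting the document link in the failure message is the cognitive-load point of the slide: the author goes straight from the error to the fix.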

  41. Terraform Platform in Quipper
    I talked about our product's Terraform Platform at HashiTalks Japan 2021: https://blog.studysapuri.jp/entry/2021/10/13/080000

  42. Three points for building culture
    • Understand the narrative
    • Reduce cognitive load
    • Put everything into words, thoroughly

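  43. Put everything into words, thoroughly
    • Write down the Background, Problem, Why, What, and How
      • It becomes easier to get feedback
      • It gets shared without you having to push it
    Words are powerful. All you can do is keep repeating the cycle: put it into words, decide, try it, and look back.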

  44. quipper/snippets-ja

  45. quipper/snippets-ja

  46. Three points for building culture
    • Understand the narrative
    • Reduce cognitive load
    • Put everything into words, thoroughly

  47. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done so far as SRE at StudySapuri / Quipper
    3. SRE and culture
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

  48. class SRE implements DevOps
    • SRE is what puts the DevOps philosophy into practice
    • What's the Difference Between DevOps and SRE? (class SRE implements DevOps)
      https://www.youtube.com/watch?v=uTEL8Ff1Zvk

  49. What is DevOps?

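  50. StudySapuri K-12 and University Development Department: "Technology Strategy Group"
    • Make the product development organization and its systems more resilient to change
    • Define the technical vision and direction
    • Bring technical problems and technical debt under control
    • Establish an improvement cycle and gain the ability to self-diagnose
    • Currently working on this through the DevOps WG (Working Group)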

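  51. DevOps WG
    • Members from each development team, SRE, and QA work together on "gaining the ability to self-diagnose"
      • Running the DX Criteria assessment
      • Creating a value stream map of development
      • Collecting and observing metrics
      • Feeding into fundamental improvements to the platform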
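A value stream map is often read by splitting each stage into active (process) time and waiting time, then computing total lead time and flow efficiency (process time divided by lead time). Below is a minimal sketch with hypothetical stage names and hours. The talk does not say which numbers the DevOps WG actually collected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Stage:
    name: str
    process_hours: float  # time someone is actively working on the change
    wait_hours: float     # time the change sits idle before this stage


def summarize(stages: List[Stage]) -> Dict[str, Union[float, Optional[str]]]:
    """Lead time, flow efficiency, and the stage with the longest wait."""
    process = sum(s.process_hours for s in stages)
    wait = sum(s.wait_hours for s in stages)
    lead_time = process + wait
    return {
        "lead_time_hours": lead_time,
        "flow_efficiency": process / lead_time if lead_time else 0.0,
        "biggest_wait": max(stages, key=lambda s: s.wait_hours).name if stages else None,
    }
```

With hypothetical stages Design (4h work, 0h wait), Implement (16h, 8h), Review (2h, 24h), and Deploy (1h, 5h), the lead time is 60 hours and flow efficiency is about 38%. Review waiting time is the largest single delay, so it would be the first candidate for improvement.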

  52. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

  53. Why Platform?
    • Platform = a shared runtime foundation that every team uses
    • Why do we need a platform?
      • To reduce operational load
      • To reduce cognitive load
      • To collect metrics more effectively
      • To raise both agility and reliability

  54. Cloud Native Platform
    I built a mission tree that lays out the missions of the team members as a tree structure. I plan to write a blog post about it. These missions are the child nodes of a tree called "Platform".

  55. Agenda
    1. What does it mean to implement SRE in an organization?
    2. What we have done as SRE at StudySapuri / Quipper
    3. SRE and culture
    4. SRE and DevOps
    5. SRE and Platform
    6. Summary and what's next

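  56. Summary
    • Things to keep in mind when implementing SRE in an organization
      • Understand the narrative
      • Reduce cognitive load
      • Put everything into words, thoroughly
    • Build an organization and systems that are resilient to change, across the whole development organization
      • Realizing DevOps
      • Developing the platform
      • Supporting SRE practice within development teams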

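  57. What's next
    • We will keep implementing Site Reliability Engineering in the organization
      • Help development teams build these capabilities themselves
      • The SRE team will focus on platform development while paying down infrastructure technical debt
    • Security follows the same pattern as reliability: we will build a culture and platform that let development teams practice it autonomously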

  58. We are hiring!
    https://brand.studysapuri.jp/career/position/sre

  59. Thank you!
    chaspy chaspy_
    Engineering Manager

    Site Reliability at Recruit Co., Ltd.
    Takeshi Kondo

  60. FAQ: Where was resistance strongest, and what made it go away?
    • There was never really a moment where we "ran into strong resistance"
      • because we introduced things slowly and deliberately

  61. FAQ: Once SRE took root, did other teams' mindset or behavior change?
    • Teams now improve performance on their own, driven by SLO-violation alerts and load test results
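SLO-violation alerts are often implemented as burn-rate alerts. Below is a minimal sketch, assuming the multi-window approach described in Google's SRE Workbook. It pages only when both a long window (e.g. 1 hour) and a short window (e.g. 5 minutes) burn the error budget at least 14.4 times faster than sustainable. For a 30-day SLO, that rate uses about 2% of the budget in one hour (14.4 × 1 h / 720 h). The function names are hypothetical, and the talk does not describe the team's actual alert rules.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is consumed (1.0 = exactly on budget)."""
    budget = 1.0 - slo_target
    if budget <= 0:
        return float("inf") if error_ratio > 0 else 0.0
    return error_ratio / budget


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window shows the problem is significant; the short window
    shows it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)
```

With a 99.9% SLO, a 2% error ratio in the last hour and 3% in the last five minutes triggers a page. If the last five minutes have recovered to 0.1%, it does not.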

  62. FAQ: Does the SRE team itself build infrastructure platforms?
    • Yes. We mainly build platforms that make use of the cloud
      • Cloud Native Platform
        • Application CI/CD
        • Infrastructure CI/CD
        • Kubernetes Platform
        • Loadtest Platform