SRE Practices in Organizations

Slides from my talk at Infra Study 2nd #7, "SRE and Organizations" (「SREと組織」).
https://forkwell.connpass.com/event/228038/

Takamura Narimichi

November 16, 2021

Transcript

  3. about:me

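  8. Motivation
    • I understand the tenets of SRE, and I understand its practices
    • Yet it is still unclear how to introduce SRE at my own company
    • Elements beyond IT technology seem to be needed, but what are they, concretely?
    • Case studies from other companies are a useful reference
    • Yet from what perspectives should I evaluate them in order to apply them to my own company?
    → This talk covers soft skills and the design points of an SRE organization
    → I hope it helps you launch and mature an SRE organization at your own company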
  9. Table of Contents
    • Why is Organization Important in SRE?
    • Soft Skills required to implement SRE
    • SRE Organization Design
  10. Why is Organization Important in SRE?

  11. Business metrics include engineering metrics [1]
    [1] Mohit Suley and Kurt Andersen, Understanding Business Metrics Can Make You a Better SRE, 2019, SREcon
  12. SREs collaborate a lot!!

  13. "Culture beats strategy every time" (Chapter 31 - Communication and Collaboration in SRE)
  14. Soft Skills required to implement SRE

  15. Why are soft skills so important?

  16. Chapter 31 - Communication and Collaboration in SRE

  17. “A good SRE has an ability to critically examine a system and use that to guide them when asking questions of the system.” (Jamie Wilkinson, SRE at Google)
  18. Top 5 Soft Skills in SRE [3]
    1. Problem solving
    2. Teamwork
    3. Composure under pressure
    4. Written communication
    5. Verbal communication
    [3] Catchpoint, 2018 SRE report
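  19. Soft skills required of SREs
    • Solving problems effectively requires the ability to work well with others
    • You are not expected to know every answer; you need to know whom in your team or organization to ask for help and how to communicate with them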

  20. Examples of Soft Skills in Implementing SRE

  21. Case: Soft Skill
    • Postmortem: Blameless, Critical Thinking...
    • SLI/SLO: Organizational Behavior...
    • Building consensus with managers: Facilitation, Negotiation...
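  22. Organizational Behavior
    • Approaches to getting people to act fall into two categories
    • HRM: an approach based on systems and mechanisms
    • OB: an interpersonal approach
    • OB is useful in situations that involve other teams, such as SLI/SLO and postmortems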
  24. The underlying behavioral principles

  25. Three key categories of foundational knowledge

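  26. Examples of foundational theories
    • Individual
      • e.g. Spencer's iceberg model, Boyatzis's competency concept diagram, Bandura's components of self-efficacy
    • Group
      • e.g. Lewin's organizational change process, the Tuckman model
    • Leadership theory
      • e.g. charismatic leadership, servant leadership...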
  27. Bandura's components of self-efficacy [5]
    [5] GLOBIS 知見録, 実録！MBA⑥ 自分変革は、行きつ戻りつ、少しずつ【最終回】, 2015

  28. Lewin's organizational change process

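  29. OB in practice: a phased introduction of SLI/SLO
    • Understand the characteristics of the organization, then introduce SLI/SLO by working through the unfreeze → change → refreeze steps
    • Understand the behavioral principles of Dev, then lower the barriers to adopting SLI/SLO
    • Work on measures that encourage behavior change, so that SLI/SLO can serve as a trigger for action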
  30. An example of splitting an SLI/SLO rollout into phases

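  31. SLI/SLO rollout: Phase 1
    SRE first takes the lead in introducing SLI/SLO to the organization, with the aim of validating its value.
    Operations should involve everyone, but it is best to start within the scope that SRE can control.
    1. SLIs/SLOs are defined
    2. Workflows around SLI/SLO are defined
    3. SRE leads SLO operations while involving the service teams
      • Send alert notifications triggered by the SLO value
      • Hold retrospectives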
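Phase 1 mentions alert notifications triggered by the SLO value. As a rough illustration only, the sketch below computes an SLI from request counts and raises an alert flag once the remaining error budget falls below a threshold. The `Slo` class, the function names, the service name, the 99.9% target, and the 25% threshold are all hypothetical choices made for this example; the slides do not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO definition: `target` is the required fraction of good events."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


def sli(good_events: int, total_events: int) -> float:
    """Measured SLI: fraction of good events (an empty window counts as healthy)."""
    return 1.0 if total_events == 0 else good_events / total_events


def error_budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Unspent share of the error budget; goes negative once the SLO is missed."""
    allowed_bad = (1.0 - slo.target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def should_alert(slo: Slo, good_events: int, total_events: int,
                 threshold: float = 0.25) -> bool:
    """Alert when less than `threshold` of the budget is left (an assumed policy)."""
    return error_budget_remaining(slo, good_events, total_events) < threshold


if __name__ == "__main__":
    availability = Slo(name="checkout-availability", target=0.999)  # made-up service
    # 100,000 requests in the window, 90 failed; the budget is roughly 100 failures.
    print(sli(99_910, 100_000))
    print(should_alert(availability, 99_910, 100_000))
```

Alerting on remaining budget rather than on each failed request is one way to keep notifications tied to the SLO itself; a real setup would more likely rely on the alerting features of its monitoring system than on hand-written checks like this.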
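  32. SLI/SLO rollout: Phase 2
    Aim for a setup in which SLI/SLO is operated without direct support from SRE.
    Move to this phase once the value of SLI/SLO has been recognized in Phase 1.
    It differs from Phase 1 in that more people and more roles are involved.
    Aim for a state in which more people operate SLI/SLO with the customer's perspective in mind.
    1. SLIs/SLOs can be set together with PdMs, business owners, and others, taking the business perspective into account
    2. Service teams lead SLO operations
    3. There is a structure for following up with service teams as Embedded SREs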
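  33. Facilitation
    • A skill for reaching conclusions that participants can genuinely accept
    • The established practices of meeting management needed to prepare and run meetings effectively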

  34. "it's difficult to find someone who's lucky enough to only have useful, effective meetings. This is equally true for SRE." (Chapter 31 - Communication and Collaboration in SRE)
  38. SRE Organization Design

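  39. Sharpen your picture of the gap between the ideal and reality
    1. SRE often ends up short of resources relative to demand
    2. The organization needs a structure that scales
    3. Keeping it scalable requires a variety of practices
    4. In reality resources are scarce, so progress has to be gradual; do not stop thinking at "it all comes down to balance"
    5. → Understand where you have room to be creative when building an SRE organization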
  40. Three key points when building an SRE organization
    • Roles
    • Responsibilities
    • Mindset

  41. Two typical roles [6]
    [6] New Relic, SRE-iously: Defining the Principles, Habits, and Practices of Site Reliability Engineering, 2018
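  42. Responsibilities
    • Make clear how work is divided and where responsibility lies
    • A RACI matrix is useful for making the following four elements explicit
      • R (Responsible): the person who carries out the work
      • A (Accountable): the person answerable for the outcome
      • C (Consulted): those who are consulted
      • I (Informed): those who are kept informed
    • Google's article also uses RACI terminology [7]
    [7] Alex Bramley, Are we there yet? Thoughts on assessing an SRE team’s maturity, 2021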
  43. RACI Matrix example [8]
    [8] Devops Raci Matrix Ppt Powerpoint Presentation File Format
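To make the RACI idea concrete, here is a minimal sketch that stores a matrix as a nested dictionary and reports tasks with unclear ownership. The tasks, the parties, and the `find_gaps` helper are hypothetical and are not taken from the slide's figure. The check assumes the common RACI convention of exactly one Accountable and at least one Responsible per task, and simplifies by giving each party a single letter per task.

```python
from enum import Enum
from typing import Dict, List


class Raci(Enum):
    R = "Responsible"
    A = "Accountable"
    C = "Consulted"
    I = "Informed"


RaciMatrix = Dict[str, Dict[str, Raci]]

# Hypothetical tasks and parties, chosen only to illustrate the shape of the matrix.
matrix: RaciMatrix = {
    "Define SLI/SLO": {"SRE": Raci.R, "Service team": Raci.C, "PdM": Raci.A},
    "Operate SLO alerts": {"SRE": Raci.R, "Service team": Raci.I, "PdM": Raci.I},
    "Run postmortems": {"SRE": Raci.C, "Service team": Raci.R, "PdM": Raci.I},
}


def find_gaps(m: RaciMatrix) -> List[str]:
    """Return one message per task that breaks the assumed ownership rules."""
    problems = []
    for task, assignments in m.items():
        roles = list(assignments.values())
        accountable = roles.count(Raci.A)
        if accountable != 1:
            problems.append(f"{task}: expected exactly 1 Accountable, found {accountable}")
        if Raci.R not in roles:
            problems.append(f"{task}: nobody is Responsible")
    return problems


if __name__ == "__main__":
    # Two rows above deliberately lack an Accountable party, so they are reported.
    for problem in find_gaps(matrix):
        print(problem)
```

Keeping the matrix as data makes it easy to review when responsibilities shift, for example when SRE hands the R for a task over to a service team.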
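  44. Mindset
    • Organizational reliability has five basic stages, which describe the mindset of an organization at a given point in time [9]
    • Absent: reliability is an afterthought for the organization
    • Reactive: recent reliability problems are followed up, but there is no long-term investment in the system
    • Proactive: potential reliability risks are identified and addressed through regular organizational processes
    • Strategic: classes of risk are managed through systematic changes to architecture, products, and processes
    • Visionary: the organization has reached the highest level of reliability and can drive broad reliability efforts inside and outside the company, based on best practices and experience
    [9] What’s your org’s reliability mindset? Insights from Google SREs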
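  45. Key points about Mindset
    • An organization does not necessarily need to be in the Strategic or Visionary phase
    • It is common to have attributes that span several phases
      • For example, an organization may be mostly reactive while having some proactive attributes
    • The mindset needs to change in step with the state of the organization
      • e.g. reactive → proactive → strategic
    • Move up through the phases by abstracting work, passing on skills, and writing down your thinking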
  46. Lessons Learned

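  47. Why is Organization Important in SRE?
    • Because reliability is an important business metric and affects the whole company
      • Reliability is tightly tied to customers and is hard for the SRE team to manage on its own
    • Because practicing SRE requires an organization-wide effort built on a great deal of collaboration
    • Because putting practices grounded in consistent tenets into action requires cultivating a culture and sharing values
      • It has to be tackled as an organization, not by individuals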
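  48. Soft Skills required to implement SRE
    • SRE needs soft skills as well as hard skills
    • Introduced examples of the soft skills that matter when bringing SRE into an organization
    • Explained Organizational Behavior and Facilitation as skills that help in carrying out practices such as SLI/SLO and postmortems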
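  49. SRE Organization Design
    • Introduced three key points for building an SRE organization that suits your own company
    • To make gradual progress, it is best to move each point forward in stages
    • Roles: start with Pure SRE, then gradually consider Embedded SRE
    • Responsibilities: start with SRE taking the R, then delegate authority step by step and move to A or C
    • Mindset: first get out of Absent, then find the parts that can change and move them toward Reactive or Proactive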
  50. We are Hiring! topotal.com/careers/software_engineer_sre