SRE Practices in Organizations
Narimichi Takamura
November 16, 2021
Technology
Slides for my talk at Infra Study 2nd #7, "SRE and Organizations".
https://forkwell.connpass.com/event/228038/
Transcript
about:me
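Motivation
• I understand the principles of SRE, and I understand its practices
• Even so, it is not obvious how to introduce SRE at my own company
• It seems to need more than IT skills, but what exactly?
• Case studies from other companies are a useful reference
• Even so, from what angles should I evaluate them before applying them at my own company?
→ This talk covers soft skills and the design points of an SRE organization
→ I hope it helps you launch and mature an SRE organization at your own company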
Table of Contents
• Why is Organization Important in SRE?
• Soft Skills required to implement SRE
• SRE Organization Design
Why is Organization Important in SRE?
Business metrics include engineering metrics [1]
[1] Mohit Suley and Kurt Andersen, Understanding Business Metrics Can Make You a Better SRE, 2019, SREcon
SREs collaborate a lot!!
Culture beats strategy every time — Chapter 31 - Communication
and Collaboration in SRE
Soft Skills required to implement SRE
Why are soft skills so important?
Chapter 31 - Communication and Collaboration in SRE
“A good SRE has an ability to critically examine a
system and use that to guide them when asking questions of the system.” — Jamie Wilkinson, SRE at Google
Top 5 Soft Skills in SRE [3]
1. Problem solving
2. Teamwork
3. Composure under pressure
4. Written communication
5. Verbal communication
[3] Catchpoint, 2018 SRE report
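Soft skills required of SREs
• Solving problems effectively requires the ability to cooperate well with others
• SREs are not expected to know every answer; they need to know whom in the team or organization to ask for help and how to communicate with them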
Soft Skill Examples in Implementing SRE
Case                               Soft Skill
Postmortem                         Blameless, Critical Thinking...
SLI/SLO                            Organizational Behavior...
Building consensus with managers   Facilitation, Negotiation...
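Organizational Behavior
• Approaches to moving people fall into two categories
  • HRM: an approach through systems and mechanisms
  • OB: an interpersonal approach
• OB helps in situations that involve other teams, such as SLI/SLO and postmortems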
The underlying behavioral principles
Three important categories of foundational knowledge
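Concrete examples of foundational theories
• Individual
  • e.g. Spencer's iceberg model, Boyatzis's competency concept diagram, Bandura's components of self-efficacy
• Group
  • e.g. Lewin's organizational change process, the Tuckman model
• Leadership
  • e.g. charismatic leadership, servant leadership...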
Bandura's components of self-efficacy [5]
[5] GLOBIS, MBA series, "Self-transformation goes back and forth, little by little" (final installment), 2015
Lewin's organizational change process
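OB in practice: phased introduction of SLI/SLO
• Understand the characteristics of the organization, then introduce SLI/SLO through the steps unfreeze → change → refreeze
• Understand the behavioral principles of the Dev side, then lower the barriers to adopting SLI/SLO
• Work on measures that encourage behavior change, so that people can take action with SLI/SLO as the trigger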
An example of splitting SLI/SLO introduction into phases
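SLI/SLO introduction: Phase 1
The first goal is for SRE to take the lead in introducing SLI/SLO to the organization and validating its value. Operation should involve everyone, but it is best to start within the scope SRE can control.
1. SLI/SLO are defined
2. A workflow around SLI/SLO is defined
3. SLO operation is led by SRE while involving the service teams
  • Send alert notifications triggered by the SLO
  • Hold retrospectives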
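The Phase 1 item about sending alert notifications triggered by the SLO can be sketched as a small error-budget check. This is a minimal illustration, not the speaker's implementation: the `Slo` class, the function names, the `checkout-availability` example, and the alert threshold are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical SLO: `target` is the required fraction of good events."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


def sli(good: int, total: int) -> float:
    """SLI as the ratio of good events to all events (1.0 when there is no traffic)."""
    return good / total if total else 1.0


def budget_consumed(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget used in the window (1.0 means fully spent)."""
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        return 0.0 if actual_bad == 0 else float("inf")
    return actual_bad / allowed_bad


def should_alert(slo: Slo, good: int, total: int, threshold: float = 1.0) -> bool:
    """Alert once the consumed budget reaches `threshold` (an assumed policy)."""
    return budget_consumed(slo, good, total) >= threshold


if __name__ == "__main__":
    slo = Slo("checkout-availability", 0.999)  # hypothetical service and target
    print(sli(99_950, 100_000), should_alert(slo, 99_950, 100_000))
```

With a 99.9% target and 100,000 requests, 50 failures use about half of the error budget, while 150 failures exceed it and would trigger the alert. A lower `threshold` would notify earlier, which may suit a team that is still building trust in its SLOs.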
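SLI/SLO introduction: Phase 2
The goal is a structure in which SLI/SLO are operated without relying on SRE support. Move to this phase once the value of SLI/SLO has been recognized in Phase 1. It differs from Phase 1 in that more people and roles are involved. Aim for a state where more people operate SLI/SLO from the customer's perspective.
1. SLI/SLO can be set together with PdMs, business owners, and others, taking the business perspective into account
2. SLO operation is led by the service teams
3. There is a structure for following up with the service teams as Embedded SRE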
Facilitation
• A skill for reaching conclusions that people find convincing
• The standard techniques of meeting management needed to prepare and run meetings effectively
it's difficult to find someone who's lucky enough to only
have useful, effective meetings. This is equally true for SRE. — Chapter 31 - Communication and Collaboration in SRE
SRE Organization Design
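Sharpen your picture of the gap between the ideal and reality
1. SRE often runs short of resources relative to demand
2. You need a structure that scales
3. Keeping that scalability requires a variety of practices
4. In reality resources are limited, so you have to proceed little by little (balance is needed), and you should not stop thinking there
5. → Understand where the points you can work on lie when building an SRE organization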
Three key points when building an SRE organization
• Roles
• Responsibilities
• Mindset
Two representative roles [6]
[6] New Relic, SRE-iously: Defining the Principles, Habits, and Practices of Site Reliability Engineering, 2018
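Responsibilities
• Make clear who is responsible for each piece of work
• A RACI matrix is useful for spelling out the following four elements
  • R (Responsible): the person who carries out the work
  • A (Accountable): the person who answers for the work
  • C (Consulted): the person to consult
  • I (Informed): the person to keep informed
• Google articles also use RACI terminology [7]
[7] Alex Bramley, Are we there yet? Thoughts on assessing an SRE team's maturity, 2021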
RACI Matrix example [8]
[8] Devops Raci Matrix Ppt Powerpoint Presentation File Format
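A RACI matrix can also be kept as data and checked automatically. The sketch below is only an illustration: the tasks, team names, and the `validate` helper are hypothetical, and the rule that each task has exactly one Accountable and at least one Responsible is a common RACI convention assumed here, not something stated in the slides.

```python
from enum import Enum


class Raci(Enum):
    R = "Responsible"
    A = "Accountable"
    C = "Consulted"
    I = "Informed"


# Hypothetical tasks and teams, for illustration only.
MATRIX: dict[str, dict[str, Raci]] = {
    "Define SLI/SLO": {"SRE": Raci.R, "Service team": Raci.C, "PdM": Raci.A},
    "Run postmortem": {"SRE": Raci.R, "Service team": Raci.A, "PdM": Raci.I},
}


def validate(matrix: dict[str, dict[str, Raci]]) -> list[str]:
    """Return a list of problems found in the matrix (empty when it is consistent).

    Assumed convention: exactly one Accountable and at least one Responsible per task.
    """
    problems = []
    for task, roles in matrix.items():
        values = list(roles.values())
        if values.count(Raci.A) != 1:
            problems.append(f"{task}: expected exactly one Accountable")
        if Raci.R not in values:
            problems.append(f"{task}: no Responsible assigned")
    return problems


if __name__ == "__main__":
    print(validate(MATRIX) or "matrix is consistent")
```

Keeping the matrix in a reviewable file makes a handover of R or A from SRE to a service team visible as an explicit change.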
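Mindset
• Organizational reliability has five basic phases, each describing the organization's mindset at a given point in time [9]
  • Absent: reliability is an afterthought for the organization
  • Reactive: recent reliability problems get followed up, but there is no long-term investment in the system
  • Proactive: potential reliability risks are identified and addressed through regular organizational processes
  • Strategic: classes of risk are managed by systematically changing architecture, products, and processes
  • Visionary: the organization has reached the highest level of reliability and can promote broad reliability efforts inside and outside the company, based on best practices and experience
[9] What's your org's reliability mindset? Insights from Google SREs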
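Key points on Mindset
• You do not necessarily need to be in the Strategic or Visionary phase
• It is common to have attributes that span several phases
  • For example, mostly reactive with some proactive attributes
• The mindset needs to change along with the state of the organization
  • e.g. reactive → proactive → strategic
• Raise the phase by abstracting work, passing on skills, and writing down your thinking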
Lessons Learned
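Why is Organization Important in SRE?
• Because reliability is an important business metric that affects the whole company
  • Reliability is tightly tied to customers and is hard for the SRE team to manage alone
• Because practicing SRE requires an organization-wide effort through a great deal of collaboration
• Because practicing on a consistent set of principles requires fostering a culture and sharing values
  • It has to be tackled as an organization, not as individuals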
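Soft Skills required to implement SRE
• Soft skills matter for SRE, not just hard skills
• Introduced examples of soft skills that matter when bringing SRE into an organization
• Explained Organizational Behavior and Facilitation as skills that help in practicing SLI/SLO, postmortems, and similar practices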
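SRE Organization Design
• Introduced three key points for building the SRE organization that best fits your company
• To proceed little by little, shift each point in stages
  • Roles: start with Pure SRE, then gradually consider Embedded SRE
  • Responsibilities: SRE takes on R at first, then moves to A and C as authority is delegated step by step
  • Mindset: first eliminate Absent, then find the parts that can change and move them to Reactive and Proactive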
We are Hiring! topotal.com/careers/software_engineer_sre