SRE Practices in Organizations
Narimichi Takamura
November 16, 2021
Technology
Slides for my talk at Infra Study 2nd #7, “SREと組織” (SRE and Organizations).
https://forkwell.connpass.com/event/228038/
Transcript
about:me
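Motivation • I understand SRE's principles, and I understand SRE practices • Yet it is not clear how to actually introduce SRE at my own company • Things beyond IT skills seem necessary, but what are they, concretely? • Other companies' SRE case studies are useful • Yet from what perspectives should I evaluate them in order to apply them to my company?
→ This talk covers soft skills and the key points of SRE organization design → I hope it helps you launch and mature an SRE organization at your company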
Table of Contents • Why is Organization Important in SRE?
• Soft Skills required to implement SRE • SRE Organization Design
Why is Organization Important in SRE?
Business metrics include engineering metrics [1]
[1] Mohit Suley and Kurt Andersen, Understanding Business Metrics Can Make You a Better SRE, SREcon 2019
SREs collaborate a lot!!
“Culture beats strategy every time.” (Chapter 31, Communication and Collaboration in SRE)
Soft Skills required to implement SRE
Why are soft skills so important?
Chapter 31 - Communication and Collaboration in SRE
“A good SRE has an ability to critically examine a system and use that to guide them when asking questions of the system.” (Jamie Wilkinson, SRE at Google)
Top 5 Soft Skills in SRE [3] 1. Problem solving 2. Teamwork 3. Composure under pressure 4. Written communication 5. Verbal communication
[3] Catchpoint, 2018 SRE Report
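Soft skills required of SREs • To solve problems effectively, you need the ability to work well with others • You are not expected to know every answer; instead, you need to know whom to ask for help within your team and organization, and how to communicate with them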
Examples of Soft Skills for Implementing SRE
Case | Soft skills
Postmortem | Blameless, Critical Thinking...
SLI/SLO | Organizational Behavior...
Building consensus with managers | Facilitation, Negotiation...
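Organizational Behavior • Approaches to moving people fall into two categories: • HRM: an approach through systems and mechanisms • OB: an interpersonal approach
• OB is useful in situations that draw in other teams, such as SLI/SLO and postmortems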
Underlying behavioral principles
Three key categories of foundational knowledge
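Examples of foundational theories • Individual • e.g., Spencer's “Iceberg Model”, Boyatzis's “Competency Concept Diagram”, Bandura's “Components of Self-Efficacy” • Group
• e.g., Lewin's “Organizational Change Process”, the Tuckman model • Leadership • e.g., charismatic leadership, servant leadership...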
Bandura's “Components of Self-Efficacy” [5]
[5] GLOBIS, 自己変革、行きつ戻りつ、少しずつ【最終回】 (Self-transformation: back and forth, little by little [final installment]), 2015
Lewin's “Organizational Change Process”
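OB in practice: a phased introduction of SLI/SLO • Grasp the characteristics of the organization, then introduce SLI/SLO through the steps of unfreezing → changing → refreezing
• Understand the behavioral principles of Dev, then lower the barriers to adopting SLI/SLO • Work on measures that encourage behavior change, so that people can take action with SLI/SLO as the trigger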
Example of splitting the SLI/SLO rollout into phases
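SLI/SLO rollout: Phase 1. First, SRE takes the lead in introducing SLI/SLO to the organization and aims to validate their value. It is best to start within the scope SRE can control, while drawing in operations as a whole.
1. SLI/SLO are defined 2. Workflows around SLI/SLO are defined 3. SRE leads SLO operations while drawing in the service teams • Alert notifications are triggered by SLO violations • Retrospective meetings are held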
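The alerting step of Phase 1 can be sketched in a few lines. This is a minimal Python illustration, not the author's setup. It assumes a request-based availability SLI (good events divided by total events) over a single window. `SLOWindow`, `check_and_alert`, and the `notify` callback are hypothetical names, and `notify` stands in for whatever chat or paging channel the team actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SLOWindow:
    """Request-based availability SLI over one measurement window (hypothetical helper)."""
    good_events: int
    total_events: int
    objective: float  # target fraction of good events, e.g. 0.999 for 99.9%

    def __post_init__(self) -> None:
        if not 0.0 < self.objective < 1.0:
            raise ValueError("objective must be strictly between 0 and 1")
        if not 0 <= self.good_events <= self.total_events:
            raise ValueError("need 0 <= good_events <= total_events")

    def sli(self) -> float:
        # With no traffic there is nothing to violate, so report full compliance.
        if self.total_events == 0:
            return 1.0
        return self.good_events / self.total_events

    def error_budget_remaining(self) -> float:
        # Fraction of the error budget left: 1.0 = untouched, 0.0 = used up,
        # negative = overspent.
        if self.total_events == 0:
            return 1.0
        allowed_bad = (1.0 - self.objective) * self.total_events
        actual_bad = self.total_events - self.good_events
        return 1.0 - actual_bad / allowed_bad

    def is_violated(self) -> bool:
        return self.sli() < self.objective


def check_and_alert(service: str, window: SLOWindow, notify: Callable[[str], None]) -> bool:
    """Send one notification if the SLO is violated; return whether an alert fired."""
    if not window.is_violated():
        return False
    notify(
        f"[SLO violation] {service}: SLI {window.sli():.3%} < objective {window.objective:.3%}, "
        f"error budget remaining {window.error_budget_remaining():.0%}"
    )
    return True


if __name__ == "__main__":
    window = SLOWindow(good_events=99_800, total_events=100_000, objective=0.999)
    check_and_alert("checkout-api", window, print)  # "checkout-api" is a made-up service name
```

A single-window threshold like this is the simplest possible rule. Teams often move to multi-window burn-rate alerts, as described in Google's SRE Workbook, to reduce noise once the Phase 1 workflow has settled.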
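SLI/SLO rollout: Phase 2. The goal is a setup in which SLI/SLO are operated without SRE having to drive them. Move to this phase once Phase 1 has shown that SLI/SLO are valuable. Unlike Phase 1, more people and roles are drawn in.
The goal is a state in which more people operate SLI/SLO from the customer's point of view. 1. SLI/SLO can be defined together with PdMs, business owners, and others, taking the business perspective into account 2. Service teams lead SLO operations 3. Embedded SREs are in place to support the service teams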
Facilitation • The skill of reaching conclusions that everyone can accept • The established practices of meeting management needed to prepare and run effective meetings
“…it's difficult to find someone who's lucky enough to only have useful, effective meetings. This is equally true for SRE.” (Chapter 31, Communication and Collaboration in SRE)
SRE Organization Design
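Sharpen your view of the gap between the ideal and reality 1. SRE often runs short of resources relative to demand 2. It needs to adopt a structure that scales 3. Keeping things scalable requires a variety of practices 4. In reality resources are scarce, so you have to proceed little by little (it's a balancing act), but don't stop thinking at that point
5. → Understand which points you can actually act on when building an SRE organization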
Three key points when building an SRE organization • Roles • Responsibilities • Mindset
Two representative roles [6]
[6] New Relic, SRE-iously: Defining the Principles, Habits, and Practices of Site Reliability Engineering, 2018
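Responsibilities • Make it clear who owns each piece of work • A RACI matrix is useful for making the following four elements explicit • R (Responsible): the person who carries out the work • A (Accountable): the person who answers for the outcome • C (Consulted): those who are consulted • I (Informed): those who are kept informed • RACI terminology is also used in a Google article [7]
[7] Alex Bramley, Are we there yet? Thoughts on assessing an SRE team's maturity, 2021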
RACI matrix example [8]
[8] DevOps RACI Matrix PPT PowerPoint Presentation File Format
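A RACI matrix is easy to keep as data and check mechanically. The sketch below is a minimal Python illustration. It assumes the common RACI convention that each task has exactly one Accountable party and at least one Responsible party. The task names, team names, and helper functions are hypothetical examples, not taken from the talk.

```python
from enum import Enum


class Raci(Enum):
    R = "Responsible"  # does the work
    A = "Accountable"  # answers for the outcome; conventionally exactly one per task
    C = "Consulted"    # gives input before or during the work
    I = "Informed"     # is told about progress or results


# task -> party -> set of RACI letters held by that party
Matrix = dict[str, dict[str, set[Raci]]]


def validate(matrix: Matrix) -> list[str]:
    """Return human-readable problems; an empty list means the matrix passes."""
    problems: list[str] = []
    for task, assignments in matrix.items():
        accountable = [p for p, roles in assignments.items() if Raci.A in roles]
        responsible = [p for p, roles in assignments.items() if Raci.R in roles]
        if len(accountable) != 1:
            problems.append(f"{task}: expected exactly one A, found {len(accountable)}")
        if not responsible:
            problems.append(f"{task}: no R assigned")
    return problems


def parties_with(matrix: Matrix, task: str, role: Raci) -> list[str]:
    """List the parties that hold `role` for `task`, sorted for stable output."""
    return sorted(p for p, roles in matrix[task].items() if role in roles)


# Hypothetical early-stage assignment: SRE carries R, the service team is consulted.
early: Matrix = {
    "Define SLI/SLO": {"SRE": {Raci.R, Raci.A}, "Service team": {Raci.C}, "PdM": {Raci.I}},
    "Run SLO review meeting": {"SRE": {Raci.R, Raci.A}, "Service team": {Raci.C}},
}

if __name__ == "__main__":
    print(validate(early))                                # []
    print(parties_with(early, "Define SLI/SLO", Raci.C))  # ['Service team']
```

Keeping the matrix as data makes a gradual handoff explicit. For example, moving R for "Run SLO review meeting" from SRE to the service team is a one-line edit, after which `validate` can be rerun to confirm that every task still has one A and at least one R.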
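Mindset • An organization's reliability has five basic stages, each describing the organization's mindset at a given point in time [9] • Absent: reliability is an afterthought for the organization • Reactive: recent reliability issues are followed up, but there is no long-term investment in the system
• Proactive: potential reliability risks are identified and addressed through regular organizational processes • Strategic: classes of risk are managed by systematically changing architecture, products, and processes • Visionary: the organization has reached the highest level of reliability and, based on its best practices and experience, can drive broader reliability efforts beyond the company as well
[9] What's your org's reliability mindset? Insights from Google SREs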
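Key points about mindset • You don't necessarily need to be in the Strategic or Visionary phase • It is common to have attributes that span several phases
• A common pattern is mostly reactive with some proactive attributes • The mindset needs to change to match the state of the organization • e.g., Reactive → Proactive → Strategic • Move up the phases while abstracting work, passing on skills, and putting ideas into writing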
Lessons Learned
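Why is Organization Important in SRE? • Because reliability is an important metric for the business and affects the entire company
• Reliability is closely tied to customers, so it is hard for the SRE team to manage it alone • Because practicing SRE requires an organizational effort built on a lot of collaboration • Because applying practices grounded in consistent principles requires cultivating a culture and sharing values • It has to be tackled by the organization, not by individuals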
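Soft Skills required to implement SRE • SRE needs soft skills as well as hard skills
• Introduced examples of soft skills that matter when bringing SRE into an organization • Explained Organizational Behavior and Facilitation as skills that help in practicing SLI/SLO, postmortems, and similar practices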
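SRE Organization Design • Introduced three key points for building the SRE organization that best fits your company • To move forward little by little, shift each point in stages
• Roles: start with Pure SRE, then gradually consider Embedded SRE • Responsibilities: SRE first takes on R, then gradually delegates authority and moves to A or C • Mindset: first get out of Absent, then find the parts that can change and move them toward Reactive or Proactive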
We are Hiring! topotal.com/careers/software_engineer_sre