SRE Practices in Organizations
Narimichi Takamura
November 16, 2021
Slides from my talk at Infra Study 2nd #7, "SRE and Organizations".
https://forkwell.connpass.com/event/228038/
Transcript
about:me
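Motivation
• I understand the SRE principles && I understand the SRE practices
• But it still doesn't click how to introduce SRE at my own company
• Elements beyond IT technology seem to be needed, but what exactly are they?
• Case studies from other companies on SRE are useful references
• But from what perspectives should we examine them in order to apply them to our own company?
→ This talk covers soft skills and design points for SRE organizations
→ I hope it helps you launch and mature the SRE organization at your own company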
Table of Contents
• Why is Organization Important in SRE?
• Soft Skills required to implement SRE
• SRE Organization Design
Why is Organization Important in SRE?
Business Metrics Include Engineering Metrics [1]
[1] Mohit Suley and Kurt Andersen, "Understanding Business Metrics Can Make You a Better SRE", 2019, SREcon
SREs collaborate a lot!!
Culture beats strategy every time — Chapter 31 - Communication
and Collaboration in SRE
Soft Skills required to implement SRE
Why are soft skills so important?
Chapter 31 - Communication and Collaboration in SRE
“A good SRE has an ability to critically examine a
system and use that to guide them when asking questions of the system.” — Jamie Wilkinson, SRE at Google
Top 5 Soft Skills in SRE [3]
1. Problem solving
2. Teamwork
3. Composure under pressure
4. Written communication
5. Verbal communication
[3] Catchpoint, 2018 SRE Report
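Soft Skills Required of SREs
• Solving problems effectively requires the ability to cooperate well with others
• You are not expected to know every answer; you need to know whom in the team or organization to ask for help and how to communicate with them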
Examples of Soft Skills in Implementing SRE (Case: Soft Skill)
• Postmortem: Blameless, Critical Thinking...
• SLI/SLO: Organizational Behavior...
• Building consensus with managers: Facilitation, Negotiation...
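Organizational Behavior
• Approaches to moving people fall into two categories
• HRM: an approach through systems and mechanisms
• OB: an interpersonal approach
• OB is useful in situations that involve other teams, such as SLI/SLO and postmortems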
The underlying behavioral principles
Three important categories of foundational knowledge
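Examples of foundational theories
• Individual
• e.g. Spencer's "iceberg model", Boyatzis's "competency concept diagram", Bandura's "components of self-efficacy"
• Group
• e.g. Lewin's "organizational change process", the Tuckman model
• Leadership
• e.g. charismatic leadership, servant leadership...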
Bandura's "components of self-efficacy" [5]
[5] GLOBIS, "Self-change: back and forth, little by little" (final installment), 2015
Lewin's "organizational change process"
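OB in practice: introducing SLI/SLO in stages
• Understand the organization's characteristics, then introduce SLI/SLO through the steps unfreeze → change → refreeze
• Understand Dev's behavioral principles, then lower the barriers to adopting SLI/SLO
• Work on measures that encourage behavior change, so that people can act with SLI/SLO as the trigger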
An example of dividing SLI/SLO introduction into phases
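SLI/SLO introduction: Phase 1
First, SRE takes the lead in introducing SLI/SLO to the organization, aiming to validate their value. While involving operations as a whole, it is best to start within the scope that SRE can control.
1. SLI/SLO are defined
2. Workflows around SLI/SLO are defined
3. SLOs are operated with SRE taking the lead while involving the service teams
• Send alert notifications triggered by the SLO
• Hold retrospectives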
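As a rough illustration of the Phase 1 item "alert notifications triggered by the SLO", here is a minimal Python sketch. Everything in it is an assumption for illustration and not taken from the slides: the `SloWindow` shape, the error-budget formulation, and the `min_budget` threshold of 25% are hypothetical, and a real setup would normally rely on the alerting features of a monitoring system.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Event counts observed over one SLO evaluation window (hypothetical shape)."""
    good_events: int
    total_events: int


def sli(window: SloWindow) -> float:
    """SLI as the fraction of good events; an empty window counts as fully good."""
    if window.total_events == 0:
        return 1.0
    return window.good_events / window.total_events


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo_target) * window.total_events
    actual_bad = window.total_events - window.good_events
    if allowed_bad == 0:
        # No budget at all: any bad event means the budget is overspent.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def should_alert(window: SloWindow, slo_target: float, min_budget: float = 0.25) -> bool:
    """Assumed policy: alert once less than `min_budget` of the budget remains."""
    return error_budget_remaining(window, slo_target) < min_budget
```

With a 99.9% target, a window of 100,000 requests tolerates about 100 failures, so 50 failures leave roughly half the budget and do not alert, while 90 failures leave roughly 10% and do. The retrospective in Phase 1 is a natural place to revisit the assumed threshold.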
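SLI/SLO introduction: Phase 2
Aim for a structure in which SLI/SLO are operated without relying on SRE's support. Move to this phase once the value of SLI/SLO has been recognized in Phase 1. It differs from Phase 1 in that more people and roles are involved. Aim for a state in which more people operate SLI/SLO with a customer perspective.
1. SLI/SLO can be set together with PdMs, business stakeholders, and others, taking the business perspective into account
2. SLOs are operated with the service teams taking the lead
3. There is a structure in which Embedded SREs follow up with the service teams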
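Facilitation
• A skill for reaching conclusions that people feel convinced by
• The established practices of meeting management needed to prepare and run meetings effectively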
it's difficult to find someone who's lucky enough to only
have useful, effective meetings. This is equally true for SRE. — Chapter 31 - Communication and Collaboration in SRE
SRE Organization Design
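Sharpening our picture of the gap between the ideal and reality
1. SRE often runs short of resources relative to demand
2. It needs a structure that scales
3. Maintaining scalability requires a variety of practices
4. In reality resources are scarce, so progress has to be made little by little (balance required), without giving up on thinking it through
5. → Understand where the actionable points are when building an SRE organization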
Three key points when building an SRE organization
• Roles
• Responsibilities
• Mindset
Two representative roles [6]
[6] New Relic, "SRE-iously: Defining the Principles, Habits, and Practices of Site Reliability Engineering", 2018
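Responsibilities
• Clarify where responsibility for each piece of work lies
• A RACI matrix is useful for making the following four elements explicit
• R (Responsible): the person who carries out the work
• A (Accountable): the person who answers for the outcome
• C (Consulted): the person to consult
• I (Informed): the person to keep informed
• RACI terminology is also used in a Google article [7]
[7] Alex Bramley, "Are we there yet? Thoughts on assessing an SRE team's maturity", 2021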
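A RACI matrix can also be kept as data and checked automatically. The Python sketch below is only an illustration under stated assumptions: the task and role names are hypothetical, each role holds at most one letter per task (a simplification), and the rule "exactly one Accountable per task" is a common RACI convention rather than something stated in the slides.

```python
from enum import Enum


class Raci(Enum):
    RESPONSIBLE = "R"
    ACCOUNTABLE = "A"
    CONSULTED = "C"
    INFORMED = "I"


# Hypothetical matrix: task -> {role: assignment}. All names are illustrative.
RaciMatrix = dict[str, dict[str, Raci]]


def validate(matrix: RaciMatrix) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = []
    for task, assignments in matrix.items():
        values = list(assignments.values())
        if Raci.RESPONSIBLE not in values:
            problems.append(f"{task}: no Responsible role")
        accountable = values.count(Raci.ACCOUNTABLE)
        if accountable != 1:
            problems.append(
                f"{task}: expected exactly one Accountable role, found {accountable}"
            )
    return problems


example: RaciMatrix = {
    "Define SLO": {
        "SRE": Raci.RESPONSIBLE,
        "Service team": Raci.ACCOUNTABLE,
        "PdM": Raci.CONSULTED,
    },
    "Run postmortem": {
        "SRE": Raci.RESPONSIBLE,
        "Service team": Raci.INFORMED,
    },
}

for problem in validate(example):
    print(problem)
```

In this example "Run postmortem" has nobody Accountable, so the check reports it. Keeping the matrix in a reviewable form like this makes gradual changes in who holds R, A, and C easier to track.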
RACI Matrix example [8]
[8] Devops Raci Matrix Ppt Powerpoint Presentation File Format
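Mindset
• An organization's reliability has five basic stages, each expressing the organization's mindset at a given point in time [9]
• Absent: reliability is an afterthought for the organization
• Reactive: recent reliability problems get followed up, but there is no long-term investment in the system
• Proactive: potential reliability risks are identified and addressed through regular organizational processes
• Strategic: classes of risk are managed through systematic changes to architecture, products, and processes
• Visionary: the organization has reached the highest level of reliability and can drive broad reliability efforts outside the company, based on best practices and experience
[9] "What's your org's reliability mindset? Insights from Google SREs"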
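Mindset: key points
• You do not necessarily need to be in the Strategic or Visionary phase
• It is common to have attributes that span multiple phases
• Some organizations are mostly reactive but have some proactive attributes
• The mindset needs to change along with the state of the organization
• e.g. reactive → proactive → strategic
• Raise the phase by abstracting the work, passing on skills, and writing down the thinking behind them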
Lessons Learned
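Why is Organization Important in SRE?
• Because reliability is an important business metric and affects the whole company
• Reliability is tightly tied to customers, and is hard for the SRE team to manage alone
• Because practicing SRE has to be tackled as an organization, through a great deal of collaboration
• Because practices grounded in consistent principles require cultivating a culture and sharing values
• It has to be tackled as an organization, not as individuals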
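Soft Skills required to implement SRE
• SRE needs soft skills as well as hard skills
• Introduced examples of the soft skills that matter when bringing SRE into an organization
• Explained Organizational Behavior and Facilitation as skills that help in practicing SLI/SLO, postmortems, and similar practices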
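SRE Organization Design
• Introduced three key points for building the SRE organization that best fits your company
• To progress little by little, move each point forward in stages
• Roles: start with Pure SRE, then gradually consider Embedded SRE
• Responsibilities: SRE first takes on R, then gradually delegates authority and moves to A or C
• Mindset: first get out of Absent, then find the parts that can change and move them to Reactive or Proactive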
We are Hiring! topotal.com/careers/software_engineer_sre