SRE Practices in Organizations
Narimichi Takamura
November 16, 2021
Slides from my talk at Infra Study 2nd #7, "SRE and Organizations."
https://forkwell.connpass.com/event/228038/
Transcript
about:me
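Motivation
• I understand the SRE tenets && I understand the SRE practices
• But it still does not click how to introduce SRE at my own company
• Elements beyond IT skills seem to be needed, but what exactly are they?
• Case studies on SRE from other companies are a useful reference
• But from what perspectives should I evaluate them in order to apply them at my own company?
→ This talk covers soft skills and the design points of an SRE organization
→ I hope it helps you launch and mature the SRE organization at your own company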
Table of Contents
• Why is Organization Important in SRE?
• Soft Skills required to implement SRE
• SRE Organization Design
Why is Organization Important in SRE?
Business Metrics Include Engineering Metrics ¹
¹ Mohit Suley and Kurt Andersen, Understanding Business Metrics Can Make You a Better SRE, 2019, SREcon
SREs collaborate a lot!!
"Culture beats strategy every time" (Chapter 31 - Communication and Collaboration in SRE)
Soft Skills required to implement SRE
Why are soft skills so important?
Chapter 31 - Communication and Collaboration in SRE
“A good SRE has an ability to critically examine a system and use that to guide them when asking questions of the system.” (Jamie Wilkinson, SRE at Google)
Top 5 Soft Skills in SRE ³
1. Problem solving
2. Teamwork
3. Composure under pressure
4. Written communication
5. Verbal communication
³ Catchpoint, 2018 SRE Report
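Soft Skills Required of SREs
• To solve problems effectively, you need the ability to collaborate well with others
• You are not expected to know all the answers; you need to know whom in the team or organization to ask for help and how to communicate with them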
Examples of Soft Skills in Implementing SRE
Case                              Soft Skill
Postmortem                        Blameless, Critical Thinking...
SLI/SLO                           Organizational Behavior...
Building consensus with managers  Facilitation, Negotiation...
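Organizational Behavior
• Approaches to getting people to act fall into two categories
  • HRM: an approach based on systems and mechanisms
  • OB: an interpersonal approach
• OB is useful in situations that involve other teams, such as SLI/SLO and postmortems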
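Underlying Principles of Behavior
Three Key Categories of Foundational Knowledge
Examples of Foundational Theories
• Individual
  • e.g. Spencer's Iceberg Model, Boyatzis's competency concept diagram, Bandura's components of self-efficacy
• Group
  • e.g. Lewin's organizational change process, the Tuckman model
• Leadership
  • e.g. charismatic leadership, servant leadership...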
Bandura's Components of Self-Efficacy ⁵
⁵ GLOBIS, MBA, "Self-transformation: back and forth, little by little" (final installment), 2015
Lewin's Organizational Change Process
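OB in Practice: Introducing SLI/SLO in Stages
• Understand the characteristics of the organization, then introduce SLI/SLO through the steps unfreeze → change → refreeze
• Understand what drives developers' behavior, then lower the barriers to adopting SLI/SLO
• Work on measures that encourage behavior change, so that SLI/SLO can serve as a trigger for action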
Example of Phasing an SLI/SLO Rollout
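SLI/SLO Rollout: Phase 1
First, SRE takes the lead in introducing SLI/SLO to the organization, with the aim of validating its value. Involve operations as a whole, but start within the scope that SRE can control.
1. SLIs/SLOs are defined
2. A workflow around SLI/SLO is defined
3. SRE takes the lead in operating the SLOs while involving the service teams
  • Alert notifications are triggered by the SLO
  • Retrospectives are held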
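SLI/SLO Rollout: Phase 2
Aim for a setup in which SLI/SLO is operated without depending on support from SRE. Move to this phase once the value of SLI/SLO has been recognized in Phase 1. Unlike Phase 1, more people and roles are involved. The goal is a state in which more people operate SLI/SLO with the customer's perspective in mind.
1. SLIs/SLOs can be set from a business perspective, together with PdMs, business owners, and others
2. The service teams take the lead in operating the SLOs
3. There is a structure in which Embedded SREs follow up with the service teams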
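Phase 1 above calls for alert notifications that are triggered by the SLO. The following is a minimal sketch of what such a trigger could look like, assuming a simple request-based availability SLI and an error-budget threshold. The names `Slo`, `sli`, `error_budget_remaining`, and `should_alert`, as well as the 25% threshold, are illustrative assumptions and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO over some measurement window."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def sli(good_events: int, total_events: int) -> float:
    """SLI as the ratio of good events to all events (1.0 when there is no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left in the window (negative once overspent)."""
    allowed_bad = (1.0 - slo.target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def should_alert(slo: Slo, good_events: int, total_events: int,
                 min_budget: float = 0.25) -> bool:
    """Alert when less than `min_budget` of the error budget remains (assumed policy)."""
    return error_budget_remaining(slo, good_events, total_events) < min_budget
```

With a 99.9% target and one million requests, 500 failures leave roughly half of the budget and would not alert under this assumed policy, while 900 failures leave roughly a tenth and would. In Phase 2 the same check could be owned by the service team rather than by SRE.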
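Facilitation
• A skill for reaching conclusions that the participants genuinely buy into
• The standard techniques of meeting management needed to prepare and run meetings effectively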
"[I]t's difficult to find someone who's lucky enough to only have useful, effective meetings. This is equally true for SRE." (Chapter 31 - Communication and Collaboration in SRE)
SRE Organization Design
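Sharpening Our Picture of the Gap Between Ideal and Reality
1. SRE often runs short of resources relative to demand
2. The organization needs a structure that scales
3. Keeping it scalable requires a variety of practices
4. In reality resources are scarce, so progress has to be made little by little (balance is required), and the thinking should not stop there
5. → Understand where the adjustable points are when building an SRE organization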
Three Key Points When Building an SRE Organization
• Roles
• Responsibilities
• Mindset
Two Representative Roles ⁶
⁶ New Relic, SRE-iously: Defining the Principles, Habits, and Practices of Site Reliability Engineering, 2018
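Responsibilities
• Make clear who is responsible for each piece of work
• A RACI matrix is useful for making the following four elements explicit
  • R (Responsible): the person who carries out the work
  • A (Accountable): the person answerable for the outcome
  • C (Consulted): the people who are consulted
  • I (Informed): the people who are kept informed
• A Google article also uses RACI terminology ⁷
⁷ Alex Bramley, Are we there yet? Thoughts on assessing an SRE team’s maturity, 2021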
RACI Matrix Example ⁸
⁸ Devops Raci Matrix Ppt Powerpoint Presentation File Format
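The matrix on this slide is an image and did not survive in the transcript, so here is a small sketch of how a RACI matrix could be written down and checked in code. Every task, role, and assignment below is hypothetical and is not taken from the talk. The rules the check enforces (at least one Responsible and exactly one Accountable per task) are a common RACI convention, assumed here rather than stated in the slides.

```python
from collections import Counter

# Hypothetical RACI matrix: task -> {role: one of "R", "A", "C", "I"}.
# Tasks, roles, and assignments are illustrative, not taken from the talk.
RACI = {
    "Define SLI/SLO": {"SRE": "R", "Service team": "C", "PdM": "A"},
    "Run postmortem": {"SRE": "R", "Service team": "A", "PdM": "I"},
    "Respond to alerts": {"SRE": "C", "Service team": "R"},
}

VALID_CODES = {"R", "A", "C", "I"}


def validate_raci(matrix):
    """Return a list of human-readable problems; an empty list means the matrix passes."""
    problems = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())
        unknown = set(counts) - VALID_CODES
        if unknown:
            problems.append(f"{task}: unknown codes {sorted(unknown)}")
        if counts["R"] == 0:
            problems.append(f"{task}: nobody is Responsible")
        if counts["A"] != 1:
            problems.append(
                f"{task}: expected exactly one Accountable, found {counts['A']}"
            )
    return problems
```

Calling `validate_raci(RACI)` on the sample data flags "Respond to alerts" because no role is Accountable for it. Gaps like this are the kind of unclear ownership a RACI matrix is meant to surface.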
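Mindset
• Organizational reliability has five basic phases, each describing the organization's mindset at a given point in time ⁹
  • Absent: reliability is an afterthought for the organization
  • Reactive: recent reliability problems are followed up, but no long-term investment is made in the system
  • Proactive: potential reliability risks are identified and addressed through regular organizational processes
  • Strategic: classes of risk are managed by systematically changing architecture, products, and processes
  • Visionary: the organization has reached the highest level of reliability and can promote broad reliability efforts outside the company, based on best practices and experience
⁹ What’s your org’s reliability mindset? Insights from Google SREs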
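Key Points on Mindset
• An organization does not necessarily need to be in the Strategic or Visionary phase
• It is common to have attributes that span multiple phases
  • For example, an organization can be mostly reactive while having some proactive attributes
• The mindset needs to change in step with the state of the organization
  • e.g. reactive → proactive → strategic
• Move up the phases by abstracting work, passing on skills, and writing down the thinking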
Lessons Learned
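Why is Organization Important in SRE?
• Because reliability is an important business metric that affects the whole company
  • Reliability is closely tied to customers, so it is hard for an SRE team to manage it alone
• Because practicing SRE requires an organization-wide effort built on extensive collaboration
• Because practicing from a consistent set of tenets requires cultivating a culture and sharing values
  • It has to be tackled by the organization, not by individuals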
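Soft Skills required to implement SRE
• SRE needs soft skills as well as hard skills
• Introduced examples of soft skills that matter when bringing SRE into an organization
• Explained Organizational Behavior and Facilitation as skills that help in practicing SLI/SLO, postmortems, and similar practices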
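SRE Organization Design
• Introduced three key points for building the SRE organization that best fits your company
• To make gradual progress, move each point forward in stages
  • Roles: start with Pure SRE and gradually consider Embedded SRE
  • Responsibilities: start with SRE holding R, then delegate authority little by little and move to A or C
  • Mindset: first get out of Absent, then find the parts that can change and move them to Reactive or Proactive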
We are Hiring! topotal.com/careers/software_engineer_sre