How We Foster Reliability in Diversity
Narimichi Takamura
May 14, 2022
Slides from the keynote at SRE NEXT 2022.
https://sre-next.dev/2022/schedule#kc01
Transcript
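Motivation
• Every company has its own context, such as its strategy, services, and culture. Within that context, how should an organization practice SRE?
• What can SRE do in response to changes in the environment surrounding a service?
=> This talk introduces efforts to define and grow an SRE practice that fits the organization, and discusses how SRE can move forward together with diversity.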
The diversification of what is asked of SRE
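Why SRE is diversifying
• The definition of reliability differs from organization to organization
• Reliability is shaped by many factors, such as business strategy and system characteristics
• No two services are surrounded by exactly the same set of factors
• How reliability is measured, what the targets are, and how they are achieved will never all be the same
• → Practicing SRE in one uniform way does not work
Example: how reliability differs with the product phase
Example: how reliability differs with system characteristics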
It's a marathon, not a sprint [1]
[1] With SRE, failing to plan is planning to fail
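Why is SRE a marathon?
• Many technical and human (culture, process, and so on) challenges are layered on top of one another
• Team leaders and management need to be brought on board
• ex. allocating resources (people and money) to achieving reliability
• ex. moving away from a department-centric mindset and fostering SRE's cultural principles
→ SRE entails organizational change, and it cannot be accomplished overnight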
How should we go about an SRE practice that fits the organization?
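Five important steps in practicing SRE [2][3]
1. Gather the information needed for the practice
2. Start small and iterate
3. Support the teams
4. Scale what you have learned
5. Embody a data-driven mindset
=> Begin by understanding the situation rather than jumping straight into practices
=> The ultimate goal is a state in which the organization practices SRE autonomously
[2] The Professional Scrum Product Owner Book
[3] Four steps to jumpstarting your SRE practice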
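Applying the steps to SLI/SLO
Step | Concrete example
Gather the information needed for the practice | Understand the reliability challenges and identify the business owners and the leaders of the engineering organization
Start small and iterate | Skip a strict SLI/SLO definition and start with a small service or team
Support the teams | Use Embedded SREs to practice SLI/SLO in each team
Scale what you have learned | Widen the scope through study sessions, appointing SRE ambassadors, and so on
Embody a data-driven mindset | Foster a culture of measurement by improving SLIs/SLOs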
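The "start small and iterate" step for SLI/SLO can be made concrete with a rough, non-strict check on a single small service. The following is a minimal sketch, not something taken from the talk: the names `SloReport` and `evaluate_slo`, the event counts, and the 99.5% target are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # observed fraction of good events
    slo_target: float        # target fraction, e.g. 0.995 (hypothetical)
    budget_remaining: float  # share of the error budget left; negative if overspent


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compare a simple ratio-based SLI against an SLO target."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / total_events
    # The error budget is the number of bad events the SLO tolerates.
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    budget_remaining = 1.0 - actual_bad / allowed_bad
    return SloReport(sli=sli, slo_target=slo_target, budget_remaining=budget_remaining)


if __name__ == "__main__":
    # Hypothetical numbers for one small service over one measurement window.
    report = evaluate_slo(good_events=9_980, total_events=10_000, slo_target=0.995)
    print(f"SLI={report.sli:.4f}, budget remaining={report.budget_remaining:.0%}")
```

A loose target like this is only a starting point. The idea in the table is to iterate on the definition with the team rather than to get it right on the first attempt.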
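How the five steps change the organization
• The flow is "understand the situation" → "practice on a small scale" → "support" → "update the perception of value" → "the whole organization practices proactively"
• → Not only the organization's behavior but also its values change
• Depending on the practice, the road beyond updating the perception of value can feel long...
• → Applying a theory of behavioral principles should help sort out the picture
The iceberg model of organizations
• An iceberg metaphor for the principles behind behavior
• Beneath the surface lies a mass of ice many times larger than the part that is visible
• The lower layers are the cause of the upper layers
• ex. the policies that determine behavior
• ex. the values that determine policies
• The upper layers are easy to change; the lower layers are hard to change
• Organizational change needs an approach from the bottom as well as, of course, from the top
SRE practices and the iceberg model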
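A concern in practicing SRE
• Practicing SLI/SLO and similar practices can be seen as an effort that repeatedly changes Level 1 and Level 2 and eventually updates values
• Case studies for Level 1 and 2 are increasing, which makes it easy to get started
• However, these efforts alone sometimes do not reach a change at Level 3
• Day-to-day work changes, but an autonomous SRE practice is still some distance away
• => Is there some way to approach Level 3 directly?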
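An approach to Level 3
=> Clarify the direction and values of SRE
How to clarify the direction and values
• Use Mission, Vision, Values (MVV) as the framework
• To start small, target only the SREs
• Involve organizational leaders such as the CTO and EMs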
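Defining the SRE vision, mission, and values
Step | Concrete example
Gather the information needed for the practice | Understand the context of the company and the organization
Start small and iterate | Create an MVV, apply it to the SREs only, and keep improving it
Support the teams | Use Embedded SREs to practice SRE in each team
Scale what you have learned | Share values and align on direction, using the MVV as the basis
Embody a data-driven mindset | Define KPIs for the SRE practice and foster a culture of measurement
Understanding the company and organizational context
Premise: align the SREs' MVV with the company's MVV
How to understand the company and organizational context
• Listing or measuring every variable that describes a company or organization is difficult
• → Take a bird's-eye view of corporate activity and organize the main variables that affect SRE
The overall picture of corporate activity
Abstracting corporate activity
Main variables in the organizational context
1. Context about company policy
2. Context about the service
3. Context about the organization
Understanding the company policy context
What to check | Example inputs | Viewpoint
Company direction | Materials on the mission, vision, and values | Where is the company heading, and what does it value?
Business strategy | IR materials, company kickoff materials | Grasp the company's current phase and business strategy
Understanding the service context
What to check | Example inputs | Viewpoint
Service content | Service description materials; actually using the service | What kind of service is it?
Service architecture | System architecture diagrams, development flow, release process | How does it work, and how is it developed?
Service issues | Incident history, important issues in the system | What are the problems?
Understanding the organizational context
What to check | Example inputs | Viewpoint
Organizational structure | Org chart, documents describing team roles | What teams exist, and who are the key people?
Reliability awareness | Survey the mindset | How much does reliability influence activities?
Supplement: the organization's reliability mindset
An organization's mindset toward reliability can be classified into five phases [4]
Phase | Description
Absent | Reliability is an afterthought for the organization
Reactive | Recent reliability problems are followed up, but there is almost no long-term investment in the system
Proactive | Potential reliability risks are identified and addressed through regular organizational processes
Strategic | Classes of risk are managed by systematically changing architecture, products, and processes
Visionary | The organization has reached the highest level of reliability and can promote broad reliability efforts inside and outside the company, based on best practices and experience
[4] What's your org's reliability mindset? Insights from Google SREs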
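Creating the SREs' MVV
Steps for creating the MVV
1. Interview the CTO, EMs, and SREs
• What they prioritize unconsciously
• The values they consider important
• Aligning on the best case for systems and workflows
2. Create a first draft
• Based on the context prepared in the previous stage and on the interviews
3. Review and revise
• The SRE team reviews asynchronously using Google Docs
• Once the outline is settled, choose the wording and check the nuances synchronously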
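An example MVV
Defining KPIs for fostering the culture
Keyword | Example KPI
Deploy quickly | Measure CI/CD time
Deploy safely | Measure the share of deployments that need a fix afterwards
Users can use the service with confidence | Measure how far the system meets its security standards
• For development-related KPIs, searching for "Agile KPIs" turns up plenty of material
• Ideally, combine these with KPIs for other activities to define your own "SRE KPIs"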
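The first two example KPIs (CI/CD time and the share of deployments that need a fix afterwards) lend themselves to a short calculation. This is a minimal sketch under stated assumptions: the `Deployment` record, its field names, and the sample values are hypothetical and do not come from the talk, and the median is just one reasonable way to summarize CI/CD time.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class Deployment:
    pipeline_seconds: float  # CI/CD time for this deployment (hypothetical field)
    needed_fix: bool         # True if a fix was required after deploying


def median_pipeline_seconds(deploys: Sequence[Deployment]) -> float:
    """'Deploy quickly': the typical CI/CD time, summarized by the median."""
    if not deploys:
        raise ValueError("at least one deployment is required")
    return median(d.pipeline_seconds for d in deploys)


def fix_required_ratio(deploys: Sequence[Deployment]) -> float:
    """'Deploy safely': the share of deployments that needed a fix afterwards."""
    if not deploys:
        raise ValueError("at least one deployment is required")
    return sum(d.needed_fix for d in deploys) / len(deploys)


if __name__ == "__main__":
    # Hypothetical sample data for one team over one period.
    history = [
        Deployment(pipeline_seconds=300, needed_fix=False),
        Deployment(pipeline_seconds=420, needed_fix=True),
        Deployment(pipeline_seconds=360, needed_fix=False),
        Deployment(pipeline_seconds=600, needed_fix=False),
    ]
    print("median CI/CD seconds:", median_pipeline_seconds(history))
    print("fix-required ratio:", fix_required_ratio(history))
```

Tracking these values over time, rather than reading a single snapshot, is what ties them to the culture of measurement described in the steps.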
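Summary so far
• The practice of SRE can be divided into five steps
• It starts with understanding the situation and goes as far as changing the organization's mindset
• The five steps are a process of organizational change, so they can be organized with the iceberg model
• Building an MVV for SRE was introduced as an example of approaching the iceberg from the bottom up
• The five steps also work for building direction and values
• Taking a bird's-eye view of corporate activity identifies the angles from which to understand the situation
• A data-driven mindset is also effective when fostering direction and values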
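The right level of reliability changes as the environment changes
• The environment surrounding a service keeps changing over time
• What was introduced stops being optimal as the environment changes
• How should we face changes in the environment?
→ Is there a good approach for responding to change?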
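Dynamic capabilities
• The idea that, because we live in an era in which we cannot know what will happen, we should be able to respond to whatever does happen
• A company's ability to transform itself in response to drastic changes in its environment and circumstances [5]
• They can be classified into the following three capabilities
• Sensing: the ability to sense threats and crises
• Seizing: the ability to seize opportunities and gain competitiveness by reconfiguring existing assets, knowledge, and technology
• Transforming: the ability to renew and transform the whole organization in order to make competitiveness sustainable
[5] What is "Dynamic Capability", the management strategy for a new era? (Japanese-language article)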
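The three capabilities can be tied to the SRE steps
1. Gather the information needed for the practice => Sensing
2. Start small and iterate => Seizing
3. Support the teams => Transforming
4. Scale what you have learned => Transforming
5. Embody a data-driven mindset => Transforming
Can we really cover the new problems that change creates all by ourselves?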
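18.3.5 Team Dynamics
In selecting engineers to work on an SRE software development project, we have found great benefit in creating a seed team that combines generalists who can get up to speed quickly on a new topic with engineers who have a breadth of knowledge and experience. A diversity of experiences covers blind spots and avoids the pitfall of assuming that every team's use case is the same as your own. It is essential for the team to establish a working relationship with the necessary specialists, and for its engineers to be comfortable working in a new problem space.
Change and diversity
• Repeating the five steps lets us respond to change, but there is a limit to how far the transformation can go
• Widening diversity in the organization strengthens its tolerance to change
• The SRE book also combined different types of engineers
• If we think of the SRE community as a team, it is an excellent opportunity to widen diversity
• Using diversity as a countermeasure to change is part of the antifragile way of thinking
• For further countermeasures, approaches from system resilience are likely to be a useful reference
Summary
• Introduced five important steps for practicing SRE
• Organizational change is best approached from both the top and the bottom of the iceberg
• Introduced an effort to clarify the direction and values of SRE
• Organizational diversity helps maintain tolerance to changes in the environment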