SRE を実現するための組織マネジメント / Management to achieve SRE
Takeshi Kondo
March 12, 2022
https://line.connpass.com/event/236497/
Transcript
Management to achieve SRE
Takeshi Kondo / @chaspy, 2022/03/12, 6-company joint SRE study session
Who am I chaspy chaspy_ Engineering Manager, Site Reliability at
Recruit Co., Ltd. Takeshi Kondo https://chaspy.me
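Who am I
chaspy / chaspy_
Recruit Co., Ltd., Product Division, Product Development Management Office, Product Development Office, Learning Domain Product Development Unit, Elementary, Junior High and High School Product Development Department, Elementary, Junior High and High School SRE Group, Group Manager
Takeshi Kondo https://chaspy.me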
What I will talk about today
A case study of using Recruit Group's "mission management" to support a development team in acquiring SRE capability
Or: a case study of (Partially) Embedded / Enabling SRE
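Tl;dr
• There are two ways for a development team to acquire reliability-related capability
  • Embedded SRE (from Pure SRE) / brought in from outside
  • Enabling SRE (in the Team) / built up from inside
• The best pattern depends on the size and phase of the organization
  • Small scale / early development phase: Embedded SRE Pattern
  • Medium to large scale / once the development team has matured: Enabling SRE Pattern
• Both patterns can be designed through management
• It does not have to be 100/0; practicing them even "partially" is effective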
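Disclaimer
• I present this as an example of management, but the results are thanks to the members who took on the missions, the SREs, and everyone on the development team. Thank you, as always!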
Agenda
• Premise: what does it mean to achieve SRE?
• Looking back at the history of SRE at StudySapuri
• Case study: (Partially) Embedded / Enabling SRE
• Summary and next steps
What it means to achieve SRE
The development team has acquired the capability to control reliability
What Site Reliability Engineering is in the first place: Not like this
• The service maintains "high reliability (= 100%)"
• SLIs/SLOs are being met
• The development team runs the on-call rotation
https://github.com/twitter/twemoji
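What Site Reliability Engineering is in the first place: Like this!
• The service maintains "the reliability that users expect"
• SLIs/SLOs are defined and used as an indicator for prioritizing between non-functional and functional requirements
• The team has agreed on monitoring methods and a policy so that an SLO violation can be handled appropriately when it occurs
• All of the above is reviewed regularly
https://github.com/twitter/twemoji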
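The idea of using an SLO as a prioritization indicator can be sketched with an error budget calculation. This is a minimal illustration rather than the speaker's implementation: the `SloWindow` shape, the function names, and the "reliability first once the budget is spent" policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (hypothetical shape)."""
    total_requests: int
    good_requests: int


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Return the unspent fraction of the error budget (negative when overspent)."""
    allowed_bad = (1.0 - slo_target) * window.total_requests
    actual_bad = window.total_requests - window.good_requests
    if allowed_bad == 0:
        # A 100% target leaves no budget at all: any failure is a violation.
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def next_priority(window: SloWindow, slo_target: float) -> str:
    """Assumed team policy: switch to reliability work once the budget is spent."""
    remaining = error_budget_remaining(window, slo_target)
    return "reliability" if remaining <= 0 else "features"
```

With a 99.9% target over 100,000 requests, 50 failed requests leave roughly half the budget, so this sketch would keep prioritizing feature work; 200 failures overspend the budget and flip the priority to reliability.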
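The development team acquires the capability to control reliability: Like this!
• SRE: supports the development team in acquiring reliability-related capability
• Development team: controls the reliability of its own services by itself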
Team Topologies
• 4 team patterns
  • Stream Aligned
  • Platform
  • Enabling
  • Complicated Subsystem
• 3 communication patterns
  • Collaboration
  • X as a Service
  • Facilitation
https://pub.jmam.co.jp/book/b593881.html
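The development team acquires the capability to control reliability: Like this!
• SRE (Platform Team / Enabling Team): builds the platform and culture that support the development team in becoming self-contained
• Development team (Stream Aligned Team): can prepare what it needs by itself = self-contained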
SRE Team Vision / Mission / Values https://blog.studysapuri.jp/entry/sre-vision-mission-values
Mission: build the platform and culture that let self-contained teams keep delivering products quickly and safely
(Before that) Product introduction
Looking back at the history of SRE at StudySapuri
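Looking back at the history of SRE at StudySapuri
• 2019: Migrated the Application Platform to Kubernetes
• 2020: Established Microservices Readiness
  • Defined service ownership
  • Design Doc / Production Readiness Checklist
  • Self-service Infrastructure (terraform monorepo)
  • SLI/SLO
• 2021: Fully handed over SLI/SLO operations to the development teams
As a Platform Team: building the Platform
As an Enabling Team: fostering a culture of SLI/SLO and similar practices in the development organization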
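Change in organization size (chart: developers, SRE)
The developer count is the number of Web Engineers (frontend & backend) across both StudySapuri and Quipper. Native engineers are excluded.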
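2021: strategy changed to create Enabling SREs from within the development teams
• The reliability concerns of the development organization were becoming more specific to the application and its domain
  • Load testing
  • Domain-specific Pod Auto Scaling
  • Measuring Frontend Performance and improving SLIs/SLOs
  • QA automation
• Rather than having a single SRE Team act as the Enabling Team, the strategy changed toward creating Enabling SREs inside the development teams
https://blog.studysapuri.jp/entry/2022/02/17/sre-study-session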
Creating Enabling SREs in the development teams
Situation around 2020 (diagram). Labels: SRE (Enabling Team), Facilitating, development teams (Stream Aligned Team)
2022, present (diagram). Labels: SRE (Platform Team, Enabling Team), X as a Service, Facilitation, development team (Stream Aligned Team) containing an SRE (Enabling SRE) and members, Facilitation. The structure becomes fractal.
Pure SRE vs Embedded SRE https://www.slideshare.net/newrelic/sreiously-defining-the-principles-habits-and-practices-of-site-reliability-engineering-112178269
Situation around 2020 (diagram). Labels: SRE (Pure SRE), Facilitating, development teams
2022, present (diagram). Labels: SRE (Pure SRE), X as a Service, Facilitating, development team containing an SRE (Embedded SRE) and members, Facilitating. The structure becomes fractal.
What I will talk about today
A case study of using Recruit Group's "mission management" to support a development team in acquiring SRE capability
Or: a case study of (Partially) Embedded / Enabling SRE
Mission management https://github.com/twitter/twemoji
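Mission management at Recruit
• Each member aligns their Will / Can / Must with their manager
• For each mission, the share, the content, and the achievement criteria are agreed upon
• The reporting line for a mission does not have to be the manager of the member's own team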
Mission management at Recruit (diagram). Labels: EM, SRE, development team, members. 30% of a member's missions are set to SRE-related work.
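Which missions were set, concretely
• Development team member (developing the junior high school course renewal)
  • Promoting self-containment in the infrastructure area: 30%
  • Missions for product development: 70%
• SRE member
  • Supporting developer productivity of the development team: 20%
  • Supporting the Production Release: 20%
  • Missions for SRE: 60%
https://studysapuri.jp/course/junior/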
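The allocation above can be written down as data and checked mechanically. The following is only a sketch of that idea: the `Mission` class, the `area` tags, and both helper functions are hypothetical names introduced for illustration, and only the titles and percentages come from the slide.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Mission:
    title: str
    share_percent: int  # share of the member's missions, as on the slide
    area: str           # hypothetical tag used only for grouping in this sketch


def validate_allocation(missions: Sequence[Mission]) -> None:
    """Raise if one member's mission shares do not add up to 100%."""
    total = sum(m.share_percent for m in missions)
    if total != 100:
        raise ValueError(f"mission shares must sum to 100, got {total}")


def share_by_area(missions: Sequence[Mission], area: str) -> int:
    """Total share of one member's missions that falls under the given tag."""
    return sum(m.share_percent for m in missions if m.area == area)


# Development team member (junior high school course renewal)
dev_member = [
    Mission("Promote self-containment in the infrastructure area", 30, "sre"),
    Mission("Missions for product development", 70, "product"),
]

# SRE member
sre_member = [
    Mission("Support developer productivity of the development team", 20, "dev-support"),
    Mission("Support the Production Release", 20, "dev-support"),
    Mission("Missions for SRE", 60, "sre"),
]

validate_allocation(dev_member)
validate_allocation(sre_member)
```

Here `share_by_area(dev_member, "sre")` gives the 30% of SRE-related work taken on by the development team member, and `share_by_area(sre_member, "dev-support")` gives the 40% the SRE member spends supporting the development team.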
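Mission management at Recruit (diagram). Labels: EM, SRE, development team, members.
• Development team member: promoting self-containment in the infrastructure area (30%), missions related to product development (70%)
• SRE member: supporting developer productivity / the Production Release (40%), other (60%)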
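What the manager did
• Regular 1on1s with each member
• Mid-term mission retrospectives
• Creating a mission tree to visualize the missions
• Setting up a forum for members to explain their missions to one another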
A mission tree to visualize the missions https://blog.studysapuri.jp/entry/2022/02/25/sre-mission-tree
What happened (1)
• The productivity improvement cycle accelerated
  • The cycle of surfacing issues -> implementation -> feedback -> improvement sped up
What happened (2)
• SRE Culture spread: a pre-mortem was held
https://blog.studysapuri.jp/entry/pre-mortem
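What happened (3)
• Support for alert handling
  • SRE supported the explanation of each Alert and how to investigate it
  • The response itself was carried out by the development team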
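What was the result
• The full renewal of the StudySapuri junior high school course was released without any major incident
• Alert response by the development team was achieved
https://studysapuri.jp/course/junior/ https://github.com/twitter/twemoji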
What was it that we did this time (diagram, three builds). Labels: SRE (Pure SRE), development team containing an SRE and members, (Partially) Enabling SRE inside the development team, and an SRE who moved in as a (Partially) Embedded SRE.
• Build 1: the interaction is labeled Facilitation
• Build 2: the interaction is labeled Facilitating
• Build 3: the interaction is labeled Collaboration
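Observations on this pattern
• Facilitating by an Enabling SRE is better created from the "inside"
  • It can be applied in a form that better fits the development team's way of operating
• For technical implementation, it is better for a Pure SRE who knows the Platform well to be Embedded from the "outside" and work in Collaboration
  • Pure SREs know Infrastructure such as ArgoCD and GitHub Actions well
  • Running the cycle of finding issues, implementing, and getting feedback at high speed makes it possible to provide a better Platform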
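Tl;dr
• There are two ways for a development team to acquire reliability-related capability
  • Embedded SRE (from Pure SRE) / brought in from outside
  • Enabling SRE (in the Team) / built up from inside
• The best pattern depends on the size and phase of the organization
  • Small scale / early development phase: Embedded SRE Pattern
  • Medium to large scale / once the development team has matured: Enabling SRE Pattern
• Both patterns can be designed through management
• It does not have to be 100/0; practicing them even "partially" is effective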
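Going forward, we will work on the following for the scalability of the development teams
• Support for acquiring SRE Capability
  • Adopting Enabling SREs in development teams through mission management (this case study)
  • Creating and running an SRE maturity assessment
  • Onboarding support for acquiring SRE knowledge and skills
• Developer Success / support for improving development productivity
  • Developing the Platform as a Product
  • Developer Support (this case study)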
Special Thanks
• @kyontan
  • As Embedded SRE
• @ravelll
  • As Enabling SRE
• Everyone involved in the full renewal of the StudySapuri junior high school course
• The SRE team members
Thank you! chaspy chaspy_ Engineering Manager, Site Reliability at Recruit
Co., Ltd. Takeshi Kondo https://chaspy.me
Bonus: SRE maturity assessment