Implementing Site Reliability Engineering in your organization - Making Culture, Enabling DevOps, Building Platform -
Infra Study 2nd #7 "SRE and Organization" https://forkwell.connpass.com/event/228038/
Takeshi Kondo / @chaspy, 2021/11/16, Infra Study 2nd #7 "SRE and Organization"
Who am I: Takeshi Kondo (chaspy / chaspy_), Engineering Manager, Site Reliability at Recruit Co., Ltd.
SRE NEXT 2020 https://sre-next.dev/schedule#c4
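What I'll cover / what I won't / who this is for • Will cover: the way of thinking behind implementing SRE in an organization • Won't cover: specific technologies; worked examples of SRE practices • For: people who want to implement SRE in their organization but are struggling with it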
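TL;DR • Things worth keeping in mind when implementing SRE in an organization: • Understand the narrative • Reduce cognitive load • Put everything into words, thoroughly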
Infra Study Meetup #3 "SRE: Past and Future" https://speakerdeck.com/masayoshi/sre-culture-organization?slide=29
Agenda 1. What does it mean to implement SRE in an organization? 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and what's next
Agenda 1. What does it mean to implement SRE in an organization? 2. What we have done as SRE at StudySapuri / Quipper 3. SRE and culture ← main topic 4. SRE and DevOps 5. SRE and Platform 6. Summary and what's next
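What does it mean to implement SRE in an organization? • The goal of SRE • Controlling the reliability of the site • Deciding whether to invest in agility or reliability based on a metric called the SLO • Aiming for a state where the teams building their own products and services do this as a matter of course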
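The slide above describes deciding between agility and reliability based on the SLO. One common way to make that decision concrete is an error-budget check. What follows is a minimal sketch, assuming request counts for one SLO window are already available. `SloWindow` and `decide_investment` are hypothetical names. The "ship features while budget remains" rule is an illustrative policy, not one stated in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed over one SLO window (for example, the last 28 days)."""

    target: float  # SLO target, e.g. 0.999 for 99.9%
    good: int      # requests that met the SLI definition
    total: int     # all eligible requests

    def error_budget_remaining(self) -> float:
        """Fraction of the error budget left; negative once the SLO is violated."""
        allowed_bad = (1.0 - self.target) * self.total
        actual_bad = self.total - self.good
        if allowed_bad <= 0:
            # No traffic, or a 100% target: any failure exhausts the budget.
            return 1.0 if actual_bad == 0 else float("-inf")
        return 1.0 - actual_bad / allowed_bad


def decide_investment(window: SloWindow) -> str:
    """Hypothetical policy: invest in agility while budget remains, else in reliability."""
    return "agility" if window.error_budget_remaining() > 0 else "reliability"


# 99.9% SLO, 1,000,000 requests, 500 failures: about half the budget is left.
print(decide_investment(SloWindow(target=0.999, good=999_500, total=1_000_000)))  # prints "agility"
```

With this framing, the team does not have to argue about reliability in the abstract. The remaining budget answers the question.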
Organizations are systems https://speakerdeck.com/masayoshi/sre-culture-organization?slide=29
Organizations are systems 💡
Organizations are systems 🤔
Organizations are people 🙆
Organizations don't talk, you know https://note.com/qsona/n/ncb9e1f242fb4
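What does it mean to implement SRE in an organization? • The goal of SRE • Controlling the reliability of the site • Deciding whether to invest in agility or reliability based on a metric called the SLO • Aiming for a state where the teams building their own products and services do this as a matter of course → In other words: getting an organization, a team, people to actually do something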
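StudySapuri K12 SRE Team's Vision / Mission / Values • Vision: a development organization that can keep building the best learning products • Mission: build the platform and culture that let self-contained teams keep delivering products quickly and safely • Values: Fail smart / Learning / Borderless / Metrics-driven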
StudySapuri K12 SRE Team's Vision / Mission / Values https://blog.studysapuri.jp/entry/sre-vision-mission-values
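Growth in organization size [chart: number of developers and SREs over time] Counts web engineers (frontend & backend) across both StudySapuri and Quipper. Native app engineers are excluded.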
Timeline at 2020-01-11 (SRE NEXT 2020)
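Situation as of January 2020 • Migrating the platform to Kubernetes, aiming to become Microservices Ready • Putting processes in place so that the organization and the system could scale • Defining service ownership • Design Doc • Production Readiness Checklist • Self-service Infrastructure (Terraform) • "The world we want to realize by adopting Kubernetes, and the microservices beyond it" https://blog.studysapuri.jp/entry/future-with-kubernetes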
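Since then (as of October 2021) • Aiming for "self-contained" teams • Each service team routinely writes Design Docs, thinks through and defines SLIs/SLOs, and observes them regularly • It should be fair to say this has taken root as culture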
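The slide says teams now define SLIs/SLOs and observe them regularly. As an illustration of what "defining an SLI" can mean in practice, here is a minimal sketch. It assumes a request log with status codes and latencies. The combined availability-plus-latency SLI, its 300 ms threshold, and all names are hypothetical choices, not the teams' actual definitions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    status: int        # HTTP status code
    latency_ms: float  # server-side latency


@dataclass(frozen=True)
class LatencyAvailabilitySli:
    """Hypothetical SLI: a request is good if it did not fail with 5xx and was fast enough."""

    threshold_ms: float

    def is_good(self, req: Request) -> bool:
        return req.status < 500 and req.latency_ms <= self.threshold_ms


def measure_sli(sli: LatencyAvailabilitySli, requests: Iterable[Request]) -> float:
    """Ratio of good requests to all requests; 1.0 when there was no traffic."""
    reqs = list(requests)
    if not reqs:
        return 1.0
    return sum(sli.is_good(r) for r in reqs) / len(reqs)


def meets_slo(measured: float, target: float) -> bool:
    return measured >= target


sample = [Request(200, 120), Request(200, 450), Request(503, 80), Request(404, 90)]
measured = measure_sli(LatencyAvailabilitySli(threshold_ms=300), sample)
print(measured, meets_slo(measured, 0.99))  # prints "0.5 False"
```

A 404 counts as good here because it is a client error, not a service failure. Whether that is right is exactly the kind of decision a team writes down in its Design Doc.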
Culture 🤔
What is culture? "The totality of the many behaviors that people acquire as members of a society" (from Wikipedia)
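What does implementing SRE in an organization mean? • Implementing SRE in an organization ▶ creating an SRE culture in the development organization • A state in which developers, as members of the development organization, practice things like defining SLIs/SLOs and naturally behave in ways that keep reliability under control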
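How do you create culture? • By getting development teams to accept a foreign culture • An organization is made of people, so understand human nature • People feel anxious about things they don't know • People don't want to do difficult things • People are bad at change ← this is how organizations differ from systems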
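How do you get people to accept a foreign culture? • They know what the subject (SRE culture) actually is • They understand its rationale and merits, and feel an incentive • Practicing it is as easy as possible • Achieving this just by "explaining" is hard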
Three points for building culture • Understand the narrative • Reduce cognitive load • Put everything into words, thoroughly
Understand the narrative https://publishing.newspicks.com/books/9784910063010 https://twitter.com/chaspy_/status/1223088950387982337?s=20 https://twitter.com/chaspy_/status/1403587911421894657?s=20
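In praise of good 1on1s: cooperating across section boundaries https://speakerdeck.com/chaspy/how-we-overcame-the-covid-19-crisis?slide=56 • Web Developers: we learned about their operational burden as team size keeps changing, and how they see the current situation • Business Developers / Planners: starting from "What does SRE actually do, anyway?", we shared learner (user) KPIs and reliability metrics • By the way, we always write the agenda in a Google Doc and share it in advance, so nobody is caught off guard.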
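Understand the narrative • Get to know others through dialogue, including where you fail to understand each other • While working with people in different positions, think hard about how to realize the world you are aiming for • The other side also comes to understand what you are trying to do in the first place • It becomes an opportunity to rethink how you explain incentives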
Cognitive load: "In cognitive psychology, cognitive load refers to the used amount of working memory resources" (from Wikipedia)
Reduce cognitive load: I gave a talk titled "SLO Review" at SRE NEXT 2020 #srenext https://blog.studysapuri.jp/entry/2020/01/30/slo-review
Reduce cognitive load • Documentation, and clear paths that lead people to it • Guides and automation • Simplicity
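Reduce cognitive load • Along with the deploy-completion notification, post a link to the Preview environment (an environment created for each PR) and a link to the Argo CD UI • When a Kubernetes manifest is changed, post the diff produced after running kustomize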
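The notification described above bundles everything a developer needs next into one message. What follows is a minimal sketch of how such a message could be assembled. It assumes the manifests have already been rendered (for example, by kustomize) on the base and PR branches. The URL patterns, `Deploy`, and `build_notification` are hypothetical, and posting to a chat tool is left out.

```python
import difflib
from dataclasses import dataclass

# Hypothetical URL patterns; real hosts and paths depend on your environment.
PREVIEW_URL = "https://pr-{pr}.preview.example.com"
ARGOCD_URL = "https://argocd.example.com/applications/{app}"


@dataclass(frozen=True)
class Deploy:
    app: str
    pr_number: int
    rendered_before: str  # manifests rendered (e.g. by kustomize) on the base branch
    rendered_after: str   # manifests rendered on the PR branch


def manifest_diff(before: str, after: str) -> str:
    """Unified diff of two rendered manifests; empty string when nothing changed."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="base",
            tofile="pr",
        )
    )


def build_notification(d: Deploy) -> str:
    """Deploy-complete message with the links and diff a developer needs next."""
    lines = [
        f"Deployed {d.app} for PR #{d.pr_number}",
        f"Preview: {PREVIEW_URL.format(pr=d.pr_number)}",
        f"Argo CD: {ARGOCD_URL.format(app=d.app)}",
    ]
    diff = manifest_diff(d.rendered_before, d.rendered_after)
    if diff:  # only mention manifests when they actually changed
        lines.append("Manifest diff:")
        lines.append(diff.rstrip("\n"))
    return "\n".join(lines)


print(build_notification(Deploy("api", 42, "replicas: 2\n", "replicas: 3\n")))
```

Diffing the rendered output, rather than the kustomize sources, shows developers what will actually reach the cluster.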
Reduce cognitive load • Lint manifests with Conftest • e.g., Node affinity not specified → the message leads to a document explaining how to fix it
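Conftest policies are written in Rego. To keep all examples in one language, the logic of the rule above is mirrored here in Python. This is a minimal sketch, assuming manifests have already been parsed from YAML into dicts. The documentation URL and `lint_node_affinity` are hypothetical. The key point is that the failure message carries the link to the fix.

```python
from typing import Any

# Hypothetical internal page explaining how to fix this violation.
NODE_AFFINITY_DOC = "https://docs.example.com/kubernetes/node-affinity"


def lint_node_affinity(manifest: dict[str, Any]) -> list[str]:
    """Return violation messages for a Deployment without nodeAffinity; [] if it passes."""
    if manifest.get("kind") != "Deployment":
        return []
    name = (manifest.get("metadata") or {}).get("name", "<unnamed>")
    pod_spec = ((manifest.get("spec") or {}).get("template") or {}).get("spec") or {}
    if (pod_spec.get("affinity") or {}).get("nodeAffinity"):
        return []
    # Point the developer at the fix instead of only reporting the problem.
    return [
        f"Deployment {name}: spec.template.spec.affinity.nodeAffinity is not set. "
        f"How to fix: {NODE_AFFINITY_DOC}"
    ]


bad = {"kind": "Deployment", "metadata": {"name": "api"}, "spec": {"template": {"spec": {}}}}
for message in lint_node_affinity(bad):
    print(message)
```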
Terraform Platform in Quipper: I talked about our product's Terraform Platform at HashiTalks Japan 2021 https://blog.studysapuri.jp/entry/2021/10/13/080000
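Put everything into words, thoroughly • Put the Background, Problem, Why, What, and How into words • Make it easy to get feedback • Words that spread on their own are powerful. All you can do is keep repeating the cycle: put it into words, decide, try it, and look back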
quipper/snippets-ja
Agenda 1. What does it mean to implement SRE in an organization? 2. What we have done so far as SRE at StudySapuri / Quipper 3. SRE and culture 4. SRE and DevOps 5. SRE and Platform 6. Summary and what's next
class SRE implements DevOps • SRE is the practice that puts the philosophy of DevOps into action • What's the Difference Between DevOps and SRE? (class SRE implements DevOps) https://www.youtube.com/watch?v=uTEL8Ff1Zvk
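The "class SRE implements DevOps" phrase is itself a code metaphor: DevOps defines an interface of principles, and SRE is one concrete implementation. The sketch below spells that out with a Python abstract base class. The five method names follow the commonly cited DevOps pillars. The return strings paraphrase how SRE addresses each pillar, roughly following the framing in the linked video. They are a paraphrase, not quotations.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """DevOps as an interface: principles to follow, with no prescribed implementation."""

    @abstractmethod
    def reduce_organizational_silos(self) -> str: ...

    @abstractmethod
    def accept_failure_as_normal(self) -> str: ...

    @abstractmethod
    def implement_gradual_change(self) -> str: ...

    @abstractmethod
    def leverage_tooling_and_automation(self) -> str: ...

    @abstractmethod
    def measure_everything(self) -> str: ...


class SRE(DevOps):
    """One concrete implementation of the DevOps interface."""

    def reduce_organizational_silos(self) -> str:
        return "share ownership of production with developers"

    def accept_failure_as_normal(self) -> str:
        return "error budgets and blameless postmortems"

    def implement_gradual_change(self) -> str:
        return "small, canaried changes to reduce the cost of failure"

    def leverage_tooling_and_automation(self) -> str:
        return "automate away toil"

    def measure_everything(self) -> str:
        return "measure reliability with SLIs/SLOs and track toil"


print(SRE().accept_failure_as_normal())  # prints "error budgets and blameless postmortems"
```

Instantiating `DevOps()` directly raises `TypeError`, which mirrors the point of the slide: DevOps describes what to achieve, and SRE supplies one way to do it.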
What is DevOps?
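StudySapuri Elementary, Middle, High School & University Development Division "Technology Strategy Group" • Make the product development organization and its systems more resilient to change • Define the technical vision and direction • Keep technical problems and debt under control • Establish an improvement cycle and gain the capability to diagnose ourselves • Currently working through the DevOps WG (Working Group)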
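DevOps WG • Each development team, together with SRE and QA members, is working out how to "gain the capability to diagnose ourselves" • Running DX Criteria • Creating a value stream map of development • Collecting and observing metrics • Providing input for fundamental improvements to the Platform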
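The slide does not say which metrics the WG collects. As an example of "collecting and observing metrics" for delivery, the sketch below computes two commonly used ones: deployment frequency and lead time from commit to production. It is a minimal sketch under a simplifying assumption. Each record is one deploy, and its lead time is measured from the oldest commit it contains. `DeployRecord` and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class DeployRecord:
    first_commit_at: datetime  # oldest commit included in this deploy
    deployed_at: datetime      # when it reached production


def deploys_per_day(records: list[DeployRecord], period_days: int) -> float:
    """Average number of production deploys per day over the period."""
    return len(records) / period_days


def median_lead_time(records: list[DeployRecord]) -> timedelta:
    """Median time from first commit to production; assumes at least one record."""
    return median(r.deployed_at - r.first_commit_at for r in records)


records = [
    DeployRecord(datetime(2021, 11, 1, 10), datetime(2021, 11, 1, 12)),  # 2h
    DeployRecord(datetime(2021, 11, 2, 9), datetime(2021, 11, 2, 13)),   # 4h
    DeployRecord(datetime(2021, 11, 3, 8), datetime(2021, 11, 3, 14)),   # 6h
]
print(median_lead_time(records))  # prints "4:00:00"
```

The median is used rather than the mean so that one unusually slow change does not dominate the number.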
Why Platform? • Platform = a shared execution foundation • Why do we need a Platform? • To reduce operational load • To reduce cognitive load • To collect metrics more effectively • To improve both agility and reliability
Cloud Native Platform: We created a mission tree that arranges the team members' missions into a tree structure. I plan to write a blog post about it. These are the child nodes of a tree called Platform.
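Summary • Things worth keeping in mind when implementing SRE in an organization • Understand the narrative • Reduce cognitive load • Put everything into words, thoroughly • Build, across the whole development organization, an organization and systems that are resilient to change • Realizing DevOps • Developing the Platform • Supporting SRE practice in development teams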
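What's next • Keep implementing Site Reliability Engineering in the organization • Support development teams in acquiring these capabilities • The SRE Team will focus on Platform development while paying down infrastructure technical debt • Security follows the same pattern as reliability: build a culture and platform that let development teams practice it on their own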
We are hiring! https://brand.studysapuri.jp/career/position/sre
Thank you!
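FAQ: Where was the resistance strongest, and what helped you break through it? • There wasn't really a moment where we "met strong resistance" • We introduced things very gradually and carefully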
FAQ: After SRE took root, did other teams' awareness or behavior change? • Teams now improve performance on their own, based on SLO-violation alerts and load test results
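The answer mentions SLO-violation alerts. A common way to alert on an SLO is the burn rate, which is how fast the error budget is being spent relative to plan. Below is a minimal multiwindow sketch. It pages only when both a long and a short window burn fast, so brief blips and already-recovered incidents stay quiet. The default threshold of 14.4 is the example value from the Google SRE Workbook: it corresponds to spending 2% of a 30-day budget in one hour. The window sizes and function names are assumptions for illustration.

```python
def burn_rate(bad: int, total: int, target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (bad / total) / (1.0 - target)


def should_page(
    long_window: tuple[int, int],   # (bad, total) over e.g. the last hour
    short_window: tuple[int, int],  # (bad, total) over e.g. the last 5 minutes
    target: float,
    threshold: float = 14.4,
) -> bool:
    """Multiwindow check: page only if both windows burn faster than the threshold."""
    return (
        burn_rate(*long_window, target) >= threshold
        and burn_rate(*short_window, target) >= threshold
    )


print(should_page((200, 10_000), (30, 1_000), target=0.999))  # prints "True"
```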
FAQ: Does the SRE team develop infrastructure platforms itself? • Yes. We mainly develop platforms built on the cloud • Cloud Native Platform • Application CI/CD • Infrastructure CI/CD • Kubernetes Platform • Loadtest Platform