…ible
BizReach, Inc. (株式会社ビズリーチ)
Observability Strategy of the HRMOS採用 (HRMOS Recruiting) SRE Team
SAITO Takuro @BizReach, inc.
2019/08/02 SRE Lounge #10
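…ible / Agenda
1. Defining the SRE team mission: a code of conduct to guide us toward our goal
2. Defining and visualizing SLOs: aiming for SLO-driven SRE
3. Defining and visualizing toil: is it above 50% or below?
4. Building an issue map and prioritization axes: so that we do not keep fearing an invisible enemy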
Agenda, section 1: Defining the SRE team mission (a code of conduct to guide us toward our goal)
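… we have codified rules for how to interact with the environment. That environment includes not only the production environment but also the product development teams, the testing teams, the users, and so on. These rules and work practices help us stay focused on engineering work rather than operations work.
SRE book / Section 1.3, Tenets of SRE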
Agenda, section 2: Defining and visualizing SLOs (aiming for SLO-driven SRE)
… things” and hold the pager. Their day-to-day tasks and projects are driven by SLOs: ensuring that SLOs are defended in the short term and that they can be maintained in the medium to long term. One could even claim that without SLOs, there is no need for SREs.
SRE Workbook / Chapter 2, Implementing SLOs
The relationship between SRE and SLOs (SLO: Service Level Objective)
Agenda, section 3: Defining and visualizing toil (is it above 50% or below?)
…nition
Automatable
O(n) with service growth
Manual
Repetitive
No enduring value
Tactical
Target of measurement and visualization
Members of the same team or organization often arrive at different conclusions regarding the magnitude of engineering effort lost to toil, and therefore prioritize remediation efforts differently. Furthermore, toil reduction efforts can span quarters or even years, during which time team priorities and personnel can change. To maintain focus and justify cost over the long term, you need an objective measure of progress.
Workbook / Chapter 6 - Eliminating Toil
Measuring Toil
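…g Toil
Category   Subcategory      Description
toil       -                Work that matches the "definition of toil"
not-toil   troubleshooting  Responding to trouble
not-toil   checksheet       Contract-related procedures
not-toil   collaboration    Requests between departments, work for other products
not-toil   sre-culture      Promoting SRE culture
not-toil   debt             Paying down technical debt
not-toil   optimize         Optimizing the service
not-toil   overhead         Overhead
not-toil   other            (Does not fit any of the above)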
… to make sure you promote it in the right place.
Graph of toil measurement results
toil 10%
collabo 21%
debt 19%
sre-culture 14%
optimize 14%
troubleshoot 8%
automation 7%
overhead 3%
other 2%
By also measuring work other than toil, we identified the tasks we should focus on (optimize / automation / sre-culture).
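The reading of this graph can be sketched in a few lines of Python. This is only an illustrative sketch: the percentages are the ones on the slide, while the names `MEASURED_SHARE`, `TOIL_CAP_PERCENT`, `FOCUS_CATEGORIES`, and `summarize` are assumptions introduced here, and the 50% threshold is taken from the agenda question "is it above 50% or below?".

```python
# Category shares in percent, as listed on the toil measurement graph.
MEASURED_SHARE = {
    "toil": 10,
    "collabo": 21,
    "debt": 19,
    "sre-culture": 14,
    "optimize": 14,
    "troubleshoot": 8,
    "automation": 7,
    "overhead": 3,
    "other": 2,
}

# Assumption: the threshold behind the agenda question "above or below 50%?".
TOIL_CAP_PERCENT = 50

# The tasks the slide names as the ones to focus on.
FOCUS_CATEGORIES = ("optimize", "automation", "sre-culture")


def summarize(shares):
    """Return the toil share, whether it is under the cap, and the focus-task share."""
    toil = shares["toil"]
    focus = sum(shares[name] for name in FOCUS_CATEGORIES)
    return {
        "total": sum(shares.values()),  # the slide's labels add up to 98, not 100
        "toil": toil,
        "toil_below_cap": toil < TOIL_CAP_PERCENT,
        "focus": focus,
    }


summary = summarize(MEASURED_SHARE)
print(summary)
```

With the slide's numbers, toil sits well below the 50% line, and the three focus categories together account for 35% of measured work.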
Agenda, section 4: Building an issue map and prioritization axes (so that we do not keep fearing an invisible enemy)
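…re
Availability metric: how much availability degradation (how many hours) the issue causes
Toil metric: roughly how many hours of toil per week the issue causes
Security metric: how much security risk the issue carries (large / medium / small)
Rough Estimated Story Point: the rough estimate for carrying out "a certain measure" that fully resolves the issue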
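The slides name the inputs to an issue score but do not show how they are combined. The sketch below is therefore purely hypothetical: the impact-over-cost formula, the `SECURITY_WEIGHT` values, the `Issue` class, and the sample issues are all assumptions made for illustration, not the team's actual scoring.

```python
from dataclasses import dataclass

# Hypothetical numeric weights for the large / medium / small security rating.
SECURITY_WEIGHT = {"small": 1, "medium": 3, "large": 5}


@dataclass
class Issue:
    name: str
    availability_loss_hours: float  # availability metric
    toil_hours_per_week: float      # toil metric
    security_risk: str              # "large", "medium", or "small"
    story_points: float             # rough estimated story points


def issue_score(issue: Issue) -> float:
    """Hypothetical score: summed impact divided by estimated cost."""
    if issue.story_points <= 0:
        raise ValueError("story_points must be positive")
    impact = (
        issue.availability_loss_hours
        + issue.toil_hours_per_week
        + SECURITY_WEIGHT[issue.security_risk]
    )
    return impact / issue.story_points


# Made-up sample issues, for illustration only.
issues = [
    Issue("manual certificate renewal", 0.0, 2.0, "medium", 5),
    Issue("single-instance batch server", 4.0, 1.0, "small", 3),
    Issue("unpatched base image", 0.0, 0.5, "large", 8),
]

# Highest score first: the most impact per story point.
ranked = sorted(issues, key=issue_score, reverse=True)
for issue in ranked:
    print(f"{issue.name}: {issue_score(issue):.2f}")
```

Dividing impact by story points is one common way to surface cheap, high-impact work first; a real scoring scheme would need weights that put hours and risk levels on a comparable scale.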
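…nition
The SRE team's purpose: we defined a mission to set our direction, and can now carry out our work with a sense of purpose.
SLOs were not visible: we defined, measured, and visualized them, and are now ready for SLO-driven SRE.
The amount of toil was not visible: we defined, measured, and visualized toil, which revealed the tasks we should focus on.
Issue priority was unclear: defining and calculating an issue score made the importance of each issue visible and made clear which issues to prioritize.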
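Self Introduction
… System Division, Platform Infrastructure Promotion Office, company-wide HRMOS採用 management business SRE Group
Concurrently: HRTech Company, Recruiting Platform Business Unit, HRMOS採用 Business Department, Product Development Department, Site Reliability Engineering Group
(Joined 2018/11)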