Make it Visible - BizReach HRMOS SRE Team's Observability Strategy

saitotak
August 02, 2019

Make it Visible - 株式会社ビズリーチHRMOS-SREチームのObservability戦略 / BizReach HRMOS SRE Team's Observability Strategy

Slides for a talk at SRE Lounge #10
https://sre-lounge.connpass.com/event/139832/

Transcript

  1. Make it Visible: the Observability Strategy of the HRMOS Recruiting (HRMOS採用) SRE Team at BizReach, Inc.
    SAITO Takuro @ BizReach, inc., 2019/08/02, SRE Lounge #10
  2. Fighting an enemy you cannot see: the daily life of an SRE

  3. Observability: what we value
  4. What comes to mind when we say "visualization": Observability
  6. OUR VISION (System Headquarters, Platform Promotion Office)
    - Make it Visible: make reliability and productivity visible, so that we surface problems and foster healthy growth.
    - No Ops, More Code: give engineers an environment where they can focus on development and business growth.
  8. Make it Visible / Agenda
    - Defining the SRE team mission: guiding principles that lead us to our destination
    - Defining and visualizing SLOs: toward SLO-driven SRE
    - Defining and visualizing toil: are we above or below 50%?
    - Building an issue map and priority axes: so we stop living in fear of an invisible enemy
  9. Make it Visible / Agenda (next section: Defining the SRE team mission)
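  10. About our mission
    "While the details of workflows, priorities, and day-to-day operations vary from SRE team to SRE team, all SRE teams share a set of basic responsibilities for the services they support and adhere to the same core tenets. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of its services. We have codified rules of engagement and principles for how SRE teams interact with their environment, which includes not only the production environment but also product development teams, testing teams, users, and so on. These rules and work practices help us keep our focus on engineering work rather than operations work."
    (SRE book, Section 1.3: Tenets of SRE)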
  11. Defining the team mission (Team Mission)
  12. Defining the team mission (Team Mission): discuss until everyone is convinced
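  13. Defining the team mission (Team Mission): increase user value by giving the product sufficient reliability
    - "Product" means HRMOS Recruiting (HRMOS採用) in production.
    - "Reliability" means the uptime ratio.
    - "Users" means the people using the product.
    - "User value" is the product of features and reliability.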
  14. Make it Visible / Agenda (next section: Defining and visualizing SLOs)
  15. The relationship between SRE and SLOs (Service Level Objective)
    SREs’ core responsibilities aren’t merely to automate “all the things” and hold the pager. Their day-to-day tasks and projects are driven by SLOs: ensuring that SLOs are defended in the short term and that they can be maintained in the medium to long term. One could even claim that without SLOs, there is no need for SREs.
    (SRE Workbook, Chapter 2: Implementing SLOs)
  16. Defining SLOs (Service Level Objective)
  17. Defining SLOs (Service Level Objective)
    - What does it mean for the service to be up?
    - What does it mean for users to be satisfied?
    - Can the indicator be measured?
  18. Defining SLOs (Service Level Objective): SLO sheet with Indicator, Objective, and Actuals columns

    Availability (Web, per feature)

    | Indicator | Measured items | Data source | Target | Granularity | 1Q actual | 2Q actual |
    |---|---|---|---|---|---|---|
    | XXX feature | 5xx Error Rate, HealthCheck, Connection Timeout | Kibana-index/xxx | 99.99% | per month | xx.xx% | xx.xx% |
    | YYY feature | 5xx Error Rate, HealthCheck, Connection Timeout | Kibana-index/yyy | 99.99% | per month | xx.xx% | xx.xx% |
    | … | … | … | … | … | … | … |

    Latency (Web, per API)

    | Indicator | Measured items | Data source | Target | Granularity | 1Q actual | 2Q actual |
    |---|---|---|---|---|---|---|
    | xxx-module API | Target Response Time for GET, PUT, POST | Kibana-index/xxx | 1sec | 99%le | xx.xx% | xx.xx% |
    | yyy-module API | Target Response Time for GET, PUT, POST | Kibana-index/yyy | 1sec | 99%le | xx.xx% | xx.xx% |
    | … | … | … | … | … | … | … |

    Batch (per service)

    | Indicator | Measured items | Data source | Target | Granularity | 1Q actual | 2Q actual |
    |---|---|---|---|---|---|---|
    | mail-XXX | Service log (processing time): from XXX queue IN to XXX api 200; API (success rate): XXX queue success count / XXX api 200 count | *** | 1h | delay | xx.xx% | xx.xx% |
    | YYY-Cordination | Service log (processing time): from YYY api call to YYY table update; Database (success rate): YYY table count / API call | *** | 1h | delay | xx.xx% | xx.xx% |
    | … | … | … | … | … | … | … |
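The availability rows above track the 5xx error rate against a target of 99.99% per month. The following is a minimal sketch of how a request-based availability SLI and the monthly error budget implied by that target could be computed. It is our assumption, not the team's implementation. `WindowCounts` and the function names are hypothetical, and the HealthCheck and Connection Timeout signals are left out.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Hypothetical request counts for one feature over one SLO window."""
    total_requests: int
    error_5xx: int


def availability(counts: WindowCounts) -> float:
    """Request-based availability: share of requests that did not return a 5xx."""
    if counts.total_requests == 0:
        return 1.0  # no traffic in the window: count it as fully available
    return 1.0 - counts.error_5xx / counts.total_requests


def meets_slo(counts: WindowCounts, target: float = 0.9999) -> bool:
    """True if the window's availability is at or above the target (99.99% by default)."""
    return availability(counts) >= target


def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Downtime a time-based availability target allows over the window, in minutes."""
    return (1.0 - target) * window_days * 24 * 60


if __name__ == "__main__":
    month = WindowCounts(total_requests=1_000_000, error_5xx=50)
    print(f"availability = {availability(month):.5%}, meets SLO: {meets_slo(month)}")
    print(f"99.99% over 30 days allows about {error_budget_minutes(0.9999):.2f} minutes of downtime")
```

Over a 30-day month, 99.99% leaves about 4.32 minutes of downtime, or 0.01% of requests. That is very little room for incidents, which is why the measurement granularity matters.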
  19. Defining SLOs (Service Level Objective): the same SLO sheet, highlighting the Indicator column (Availability: Web, per feature; Latency: Web, per API; Batch: per service)
  20. Defining SLOs (Service Level Objective): the same SLO sheet, highlighting the measured items and data sources
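The latency rows set a target of 1 s at the 99th percentile (99%le) and split measurement by GET, PUT, and POST. One way to check this, sketched under our own assumptions, is to require that at least 99% of requests for each method finish within 1 s. That is roughly equivalent to p99 ≤ 1 s. The sample tuples stand in for whatever the Kibana indices actually hold, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def latency_sli_by_method(
    samples: Iterable[Tuple[str, float]], threshold_s: float = 1.0
) -> Dict[str, float]:
    """Per HTTP method, the fraction of requests that finished within threshold_s seconds."""
    total: Dict[str, int] = defaultdict(int)
    fast: Dict[str, int] = defaultdict(int)
    for method, seconds in samples:
        total[method] += 1
        if seconds <= threshold_s:
            fast[method] += 1
    return {method: fast[method] / total[method] for method in total}


def methods_below_target(sli: Dict[str, float], target: float = 0.99) -> List[str]:
    """Methods whose share of fast requests falls short of the target (99% by default)."""
    return sorted(method for method, ratio in sli.items() if ratio < target)


if __name__ == "__main__":
    # Hypothetical samples: (HTTP method, response time in seconds).
    samples = ([("GET", 0.2)] * 995 + [("GET", 2.5)] * 5
               + [("POST", 0.4)] * 95 + [("POST", 1.8)] * 5)
    sli = latency_sli_by_method(samples)
    print(sli, methods_below_target(sli))
```

Counting "fast" requests is easier to aggregate than computing a percentile directly, because fast and total counts can be added across time windows and hosts.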
  21. Defining SLOs (Service Level Objective): the same SLO sheet, highlighting the Objective (target, granularity) and Actuals (1Q, 2Q) columns
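For batch services, the objective is a delay of at most 1h from entry into the queue (`XXX queue IN`) to a successful API response (`XXX api 200`). The minimal sketch below measures the share of jobs that finished within that window. It assumes each job's two timestamps can be recovered from the service logs. The tuple layout and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def batch_delay_sli(
    jobs: Iterable[Tuple[datetime, Optional[datetime]]],
    max_delay: timedelta = timedelta(hours=1),
) -> float:
    """Fraction of jobs whose end-to-end delay (queue IN -> API 200) was within max_delay.

    Each job is (enqueued_at, completed_at). completed_at is None if the job never
    reached a successful API response; such jobs count as misses.
    """
    jobs = list(jobs)
    if not jobs:
        return 1.0  # nothing ran in the window
    on_time = sum(
        1
        for enqueued, completed in jobs
        if completed is not None and completed - enqueued <= max_delay
    )
    return on_time / len(jobs)


if __name__ == "__main__":
    t0 = datetime(2019, 8, 2, 9, 0)
    jobs = [
        (t0, t0 + timedelta(minutes=5)),
        (t0, t0 + timedelta(minutes=59)),
        (t0, t0 + timedelta(hours=2)),  # too slow
        (t0, None),                     # never succeeded
    ]
    print(batch_delay_sli(jobs))
```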
  22. Defining SLOs (Service Level Objective)
  23. Make it Visible / Agenda (next section: Defining and visualizing toil)
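  24. About toil (About Toil)
    "Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows. Google's SRE organization has an advertised goal of keeping operational work (i.e., toil) below 50% of each SRE's time. At least 50% of each SRE's time should be spent on engineering project work that will either reduce future toil or add service features."
    (SRE book, Chapter 5: Eliminating Toil)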
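  25. Defining toil (Toil Definition)

    | Example task | Manual | Repetitive | Automatable | No enduring value | Tactical | O(n) with service growth |
    |---|---|---|---|---|---|---|
    | SSL certificate renewal | 1 | 1 | 1 | 1 | 1 | 1 |
    | Terraform review, plan, apply | 1 | 1 | 0 | 0 | 1 | 1 |
    | Running EXPLAIN for query reviews | 1 | 1 | 1 | 0 | 1 | 1 |
    | Applying security patches | 1 | 1 | 0 | 1 | 1 | 1 |
    | Handling resource-limit alerts | 1 | 1 | 1 | 1 | 1 | 1 |
    | Release work | 1 | 1 | 1 | 1 | 1 | 1 |
    | Paying down technical debt | 1 | 1 | 0 | 0 | 1 | 0 |
    | Writing and summarizing meeting minutes | 1 | 1 | 0 | 0 | 1 | 0 |
    | Checking specifications during incident investigation | 1 | 1 | 0 | 1 | 1 | 1 |
    | Evaluating new technologies | 1 | 1 | 0 | 0 | 1 | 0 |
    | Deleting unneeded cloud resources | 1 | 1 | 1 | 0 | 1 | 1 |
    | Investigating production incidents | 1 | 1 | 0 | 0 | 1 | 1 |
    | .... | | | | | | |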
  26. Defining toil (Toil Definition): adding a toil value (formula under consideration)
    This is the same table with a toil-value column added and the Automatable and O(n) with service growth columns highlighted. In the row order of slide 25, the toil values are 4, 0, 3, 0, 4, 4, 0, 0, 0, 0, 3, 0. The values follow a consistent rule. A task scores 0 unless it is both automatable and O(n) with service growth. Otherwise, its score is the number of the other four traits (manual, repetitive, no enduring value, tactical) that apply.
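The toil values in this table fit a simple gated sum. The sketch below encodes that reading of the slide. It is our inference from the numbers and from the definition on slide 28, not a formula the deck states outright. It reproduces the values for a few rows and ranks tasks the way slide 27 does. `Task` and the function names are hypothetical.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    """One row of the toil table; each trait is 1 (applies) or 0 (does not)."""
    name: str
    manual: int
    repetitive: int
    automatable: int
    no_enduring_value: int
    tactical: int
    scales_with_growth: int  # O(n) with service growth


def toil_value(task: Task) -> int:
    """0 unless the task is automatable and O(n); otherwise count the other four traits."""
    if not (task.automatable and task.scales_with_growth):
        return 0
    return task.manual + task.repetitive + task.no_enduring_value + task.tactical


def rank_by_toil(tasks: List[Task]) -> List[Task]:
    """Sort tasks by toil value, highest first; ties keep their original order."""
    return sorted(tasks, key=toil_value, reverse=True)


# A few rows copied from the table (column order as on the slide).
TASKS = [
    Task("SSL certificate renewal", 1, 1, 1, 1, 1, 1),
    Task("Terraform review, plan, apply", 1, 1, 0, 0, 1, 1),
    Task("Running EXPLAIN for query reviews", 1, 1, 1, 0, 1, 1),
    Task("Paying down technical debt", 1, 1, 0, 0, 1, 0),
    Task("Deleting unneeded cloud resources", 1, 1, 1, 0, 1, 1),
]

if __name__ == "__main__":
    for t in rank_by_toil(TASKS):
        print(toil_value(t), t.name)
```

Gating on "automatable and O(n)" first means work that cannot be automated, or that does not grow with the service, never counts as toil, however manual it is.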
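  27. Defining toil (Toil Definition): sorted by toil value

    | Example task | Manual | Repetitive | Automatable | No enduring value | Tactical | O(n) with service growth | Toil value |
    |---|---|---|---|---|---|---|---|
    | SSL certificate renewal | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
    | Handling resource-limit alerts | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
    | Release work | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
    | Running EXPLAIN for query reviews | 1 | 1 | 1 | 0 | 1 | 1 | 3 |
    | Deleting unneeded cloud resources | 1 | 1 | 1 | 0 | 1 | 1 | 3 |
    | Terraform review, plan, apply | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
    | Applying security patches | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
    | Paying down technical debt | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
    | Writing and summarizing meeting minutes | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
    | Checking specifications during incident investigation | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
    | Evaluating new technologies | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
    | Investigating production incidents | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
    | .... | | | | | | | |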
  28. Defining toil (Toil Definition)
    For us, toil is work that is O(n) with service growth and automatable, and that has traits such as being manual, repetitive, tactical, or lacking enduring value. This work is what we measure and visualize.
  29. Defining toil (Toil Definition)
    Automatable and O(n) with service growth, plus traits such as Manual, Repetitive, No enduring value, and Tactical: the target of measurement and visualization
  30. Measuring toil (Measuring Toil)
    Experience and intuition are not repeatable, objective, or transferable. Members of the same team or organization often arrive at different conclusions regarding the magnitude of engineering effort lost to toil, and therefore prioritize remediation efforts differently. Furthermore, toil reduction efforts can span quarters or even years, during which time team priorities and personnel can change. To maintain focus and justify cost over the long term, you need an objective measure of progress.
    (SRE Workbook, Chapter 6: Eliminating Toil)
  31. Measuring toil (Measuring Toil): categories

    | Category | Description |
    |---|---|
    | toil | Work that matches our definition of toil |
    | not-toil | Everything else |
  32. Graph of toil measurement results: not-toil 90%, toil 10%
    Measuring our working hours over a half-year showed that 40 h were spent on toil and 376 h on other work.
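The 10% on the chart follows directly from the measured hours. 40 h out of 40 + 376 = 416 h is about 9.6%, well under the 50% ceiling from the SRE book quoted on slide 24. Here is a quick check. The function and constant names are ours.

```python
def toil_share(toil_hours: float, other_hours: float) -> float:
    """Fraction of measured working time that went to toil."""
    total = toil_hours + other_hours
    if total <= 0:
        raise ValueError("no measured hours")
    return toil_hours / total


TOIL_CAP = 0.5  # SRE book goal: keep toil below 50% of each SRE's time

share = toil_share(40, 376)  # half-year figures from the slide
print(f"toil share: {share:.1%}, under cap: {share < TOIL_CAP}")
```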
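  33. Measuring toil (Measuring Toil): subcategories

    | Category | Subcategory | Description |
    |---|---|---|
    | toil | - | Work that matches our definition of toil |
    | not-toil | troubleshooting | Handling trouble and incidents |
    | not-toil | checksheet | Contract-related procedures |
    | not-toil | collaboration | Requests between departments; work on other products |
    | not-toil | sre-culture | Promoting SRE culture |
    | not-toil | debt | Paying down technical debt |
    | not-toil | optimize | Optimizing the service |
    | not-toil | overhead | Overhead |
    | not-toil | other | Anything that fits none of the above |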
  34. Measuring toil (Measuring Toil)
  36. Graph of toil measurement results (breakdown)
    other 2%, overhead 3%, automation 7%, troubleshoot 8%, optimize 14%, sre-culture 14%, debt 19%, collabo 21%, toil 10%
    We measured non-toil work as well, which showed us the tasks we should focus on (optimize / automation / sre-culture).
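A breakdown like this requires every logged block of work to carry one of the categories from slide 33. The minimal sketch below turns such a time log into the chart's percentages. It assumes a hypothetical log of (category, hours) pairs, and the sample entries are illustrative, not the team's data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def category_shares(entries: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Whole-percent share of logged hours per category, largest first."""
    hours: Counter = Counter()
    for category, h in entries:
        hours[category] += h
    total = sum(hours.values())
    if total <= 0:
        raise ValueError("no logged hours")
    return {category: round(100 * h / total) for category, h in hours.most_common()}


if __name__ == "__main__":
    # Hypothetical time-log entries: (category, hours)
    log = [("toil", 4), ("debt", 6), ("collaboration", 8), ("debt", 2)]
    print(category_shares(log))
```

Because each share is rounded separately, the percentages need not add up to exactly 100.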
  37. Make it Visible / Agenda (next section: Building an issue map and priority axes)
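  38. Issue map
    Areas: Security, Availability, Release Engineering. Severity levels: fatal, serious injury, compound fracture, minor injury.
    Issues on the map include: source code is chaotic and fragile to change; few tests, so changes are risky; log operations take a lot of effort; DBA work is falling behind; weak authentication; audit compliance; responding to security assessment results; many slow queries; outdated PostgreSQL version; AWS costs are not appropriate; no process for triaging security alerts. The remaining issues are labeled with letters (Issue A, Issue B, and so on).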
  39. Building the issue map (Issue Map)
    "If Everything is important, Nothing is important." Radical Focus: Achieving Your Most
  40. Everything is top priority
    Issues shown: source code is chaotic and fragile to change, weak authentication, many slow queries, and Issues A, B, C, D, E, F, G, H, I
  41. Issue score (Issue Score)
    - Availability indicator
    - Toil indicator
    - Security indicator
    - Rough estimated story points
  42. Issue score (Issue Score)
    - Availability indicator: how much availability loss (in hours) the issue causes
    - Toil indicator: how many hours of toil per week the issue causes
    - Security indicator: how much security risk the issue carries (high / medium / low)
    - Rough estimated story points: the rough effort needed to carry out a given measure that fully resolves the issue
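The deck defines the four inputs to the issue score but does not say how they are combined. Purely as an illustration, one could convert each input to hours and divide by effort, so that the issues removing the most downtime, toil, and risk per story point rise to the top. The security weights, the 26-week horizon, and all names below are assumptions, not part of the original method.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical: the deck rates security risk as high / medium / low but gives
# no numeric weight, so these hours-equivalent values are placeholders.
SECURITY_WEIGHT_HOURS: Dict[str, float] = {"high": 40.0, "medium": 10.0, "low": 0.0}


@dataclass
class Issue:
    name: str
    availability_loss_hours: float  # hours of availability loss the issue causes
    toil_hours_per_week: float      # weekly hours of toil the issue causes
    security_risk: str              # "high", "medium", or "low"
    story_points: float             # rough effort of a measure that fully resolves it


def issue_score(issue: Issue, horizon_weeks: int = 26) -> float:
    """Hypothetical score: hours-equivalent benefit per story point over the horizon."""
    benefit = (
        issue.availability_loss_hours
        + issue.toil_hours_per_week * horizon_weeks
        + SECURITY_WEIGHT_HOURS[issue.security_risk]
    )
    return benefit / issue.story_points


def prioritize(issues: List[Issue]) -> List[Issue]:
    """Highest score first."""
    return sorted(issues, key=issue_score, reverse=True)


if __name__ == "__main__":
    issues = [
        Issue("A", 2, 1, "low", 8),
        Issue("B", 0, 0, "high", 5),
        Issue("C", 10, 0, "medium", 20),
    ]
    print([i.name for i in prioritize(issues)])
```

The 26-week horizon matches the half-year period used for the toil measurement. Any real weighting would need to be agreed on by the team.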
  43. Issue score (Issue Score): the issues (source code is chaotic and fragile to change; Issues A, B, C, E, F, G, H, I) arranged by issue score, with Issue D marked as top priority
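  44. Summary
    - The SRE team's destination: defining a mission set our direction, so we can now carry out our work with a clear sense of purpose.
    - SLOs were invisible: defining, measuring, and visualizing them has prepared us for SLO-driven SRE.
    - The amount of toil was invisible: defining, measuring, and visualizing toil showed us which tasks to focus on.
    - Issue priorities were unclear: defining and calculating an issue score made the importance of each issue visible and clarified which issues to prioritize.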
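  45. Self Introduction: SAITO Takuro
    BizReach, Inc.: System Headquarters, Platform Promotion Office, SRE Group for the HRMOS Recruiting management business; concurrently HRTech Company, Recruiting Platform Business Unit, HRMOS Recruiting Division, Product Development Department, Site Reliability Engineering Group (joined November 2018)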
  46. SPECIAL THANKS: Contributors
  47. References

  48. EOF

  49. BizReach, Inc. (株式会社ビズリーチ)
    - Founded: April 2009
    - Representative: Soichiro Minami (南 壮一郎)
    - Employees: 1,479 (as of June 2019)
    - Offices: Tokyo, Osaka, Nagoya, Fukuoka, Singapore
    - Business: Internet-based services
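  51. Creating new value in the world while growing rapidly as an organization: BizReach total headcount over time (as of June 2019)

    | Year | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | Now |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | Employees | 2 | 5 | 23 | 74 | 151 | 251 | 548 | 738 | 942 | 1,306 | 1,479 |

    Copyright (C) 2019 BizReach, Inc.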
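  52. "Labor shortages", "IT adoption by businesses", and "business succession" are listed as priority items in the Small and Medium Enterprise Agency's FY2018 (Heisei 30) policies for SMEs and small businesses.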

  53. Launched 17 services in 9 years (timeline 2009 to 2018; only selected services shown), including "a job-change site for ambitious people in their 20s"
  55. Copyright (C) 2018 BizReach, Inc.

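  56. Hiring workflow: job posting → applicant management → requests to recruitment agencies → interview scheduling → interview evaluation → interview result registration → hiring results report and analysis
    - Job seekers apply through job boards, recruitment agencies, company career sites, and so on.
    - Applicant information (such as work history) gathered through the various recruiting channels is managed.
    - Applicants are evaluated through tests and interviews.
    - Selection results are reviewed to plan and carry out improvements.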
  57. EOF