Where Chaos Engineering comes from, and what's next

itkq
April 13, 2019

Transcript

  1. Where Chaos Engineering comes from, and what's next • Web System Architecture Study Group #4 (@itkq)
  2. whoami • @itkq [ˈɪtəkəʊ] • SRE @ Cookpad • SLI/SLO, availability, ... • I want to make web systems autonomous enough to put myself out of a job
  3. Agenda • Self-introduction • The rise of Chaos Engineering • Resilience Engineering and Safety-II • The relationship between SRE and Chaos Engineering • Can an antifragile web system be realized?
  4. (slide without text)

  5. The rise of Chaos Engineering • The term "Chaos Engineering" first appeared in 2015 [1] • The "Chaos Engineering" paper was published in 2016 [2] • The "Chaos Engineering" book was published in 2017 [3] • Gremlin: Failure as a Service [4] (2017 onward) • Whatever the state of actual practice, the concept itself has spread widely
    [1] https://medium.com/netflix-techblog/chaos-engineering-upgraded-878d341f15fa [2] Chaos Engineering, Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal [3] https://www.oreilly.com/library/view/chaos-engineering/9781491988459/ [4] http://principlesofchaos.org
  6. What is Chaos Engineering? • "A discipline of experimentation for building an environment in which a distributed system can withstand unstable conditions" (Principles of Chaos Engineering [4])
    [4] http://principlesofchaos.org
  7. What was new about Chaos Engineering? • The tools? • Breaking things at random? • Breaking things on purpose? • Breaking things in production? • Breaking things continuously?
  8. What was new about Chaos Engineering? • "Companies such as Amazon, Google, Microsoft, and Facebook had been applying similar techniques to test the resilience of their own systems. We call the activities that form this discipline emerging in our industry 'Chaos Engineering'." (Netflix [2])
    [2] Chaos Engineering, Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal
  9. Resilience Engineering

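  10. Resilience Engineering [5] • A methodology for improving the resilience of socio-technical systems [6] • Resilience: a concept denoting a state of excellent elasticity, restoring force, and recoverability
    [5] Resilience Engineering - Concepts and Precepts, E. Hollnagel, D. D. Woods, and N. Leveson, Eds., Ashgate Publishing Ltd., Aldershot, England, 2006 [6] Masaharu Kitamura, "レジリエンスエンジニアリングが目指す安全 Safety-II とその実現法" (Safety-II, the safety that Resilience Engineering aims for, and how to realize it), IEICE Fundamentals Review Vol.8 No.2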
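  11. How problems are framed in socio-technical systems • 1. The system is not time-invariant; it is constantly changing • 2. Important decisions are made on the basis of incomplete information • 3. Profit and efficiency must be pursued, but that effort often erodes safety margins, so the system tends to drift toward a dangerous state • 4. Safety is important, but safety itself is not the purpose of operating the system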
  12. Limits of the concept of "safety" • Examples of conventional "safety" • "Undesirable events do not occur" • "There is no unacceptable risk" • This view cannot cope with catastrophes • e.g. the Great East Japan Earthquake
  13. Limits of the concept of "safety" • Excessive influence • Slogans such as "n days and counting without an accident" • Groundless overconfidence • "We protect safety so thoroughly that a severe accident could never happen"
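  14. From Safety-I to Safety-II • Safety-I • The conventional "safety", defined negatively and statically • Safety-II [10] • "When a major disturbance prevents the system from maintaining its normal operating state, it can keep operating, even at reduced performance" • "Once conditions recover, it can quickly return to its original state or an equivalent one" • That is, the system can behave resiliently • Safety-II does not reject Safety-I outright; it is what lies beyond it
    [10] E. Hollnagel, Safety-I and Safety-II - The Past and Future of Safety Management, Ashgate Publishing Ltd., Surrey, England, 2014.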
  15. The four fundamental capabilities • Respond • Includes improvised, flexible responses • Monitor • Ideally with the ability to act proactively • Anticipate • Not necessarily data-driven • Learn • Continuously improve the capabilities above
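  16. How investment in safety is perceived • Safety-I based • Because of the implicit assumption that the ideal is for nothing to happen, investment looks like a non-refundable insurance premium and tends to meet resistance • Safety-II based • Because the goal is continued operation, investment that raises the likelihood of it can be justified even if no major disturbance or environmental change ever occurs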
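  17. Why Safety-II is hard to evaluate • The dilemma between sustained safety effort and evaluation of results • The target system has already achieved a fairly high level of safety, so outcome metrics such as the number of accidents measure little • Process evaluation is needed as a complement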
  18. What about web systems? • They change constantly • There is no silver bullet • Having operations judged by counting demerits is painful • ... • What happens if we replace "safety" with "reliability" and "accident" with "failure"?
  19. Reliability of web systems • An important metric for many web systems • Targeting 100% reliability is a mistake [7] • The methodology for controlling the reliability of web systems => SRE • SRE can be seen as one implementation of Resilience Engineering
    [7] Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy
  20. SLO and error budget • As long as the SLO (service level objective) is being met, releases are allowed • => error budget remains • When no error budget remains • Improve the system's resilience • Relax the SLO
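The release rule on the "SLO and error budget" slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the `ErrorBudget` class, its field names, and the request-count definition of the budget are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that violated the SLI in the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Positive: budget is left. Zero or negative: budget is exhausted.
        return self.allowed_failures - self.failed_requests

    def can_release(self) -> bool:
        # Releases are allowed only while some error budget remains.
        return self.remaining > 0


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    exhausted = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
    print(healthy.can_release(), exhausted.can_release())
```

With a 99.9% SLO and 1,000,000 requests in the window, roughly 1,000 failures are allowed, so 400 observed failures leave room to release while 1,500 do not. When the budget is exhausted, the two options on the slide correspond to reducing `failed_requests` (a more resilient system) or lowering `slo_target`.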
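  21. Evaluating resilience in the SRE context • The SLO is not a direct evaluation criterion • On the other hand, failures of web systems are easier to reproduce than natural disasters • Rather than only waiting for failures to happen, we can emulate them and thereby perform a process evaluation of the system's resilience • This is Chaos Engineering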
  22. Chaos Engineering revisited • Targeting 100% reliability for a web system is a mistake • The goal of SRE is "to pursue maximum change velocity without violating the service's SLO" [7] • To avoid exhausting the error budget, Safety-II has to be improved • Chaos Engineering, which emulates failures, is one method for process evaluation of resilience
    [7] Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy
  23. What Chaos Engineering gives you • Verification of known-unknown weaknesses • e.g. how the SLI changes when an instance is suddenly terminated • Discovery of unknown-unknown problems • For example, correlated ones • e.g. inconsistency caused by a fallback cache that was added to improve resilience [11]
    [11] https://medium.com/netflix-techblog/from-chaos-to-control-testing-the-resiliency-of-netflixs-content-discovery-platform-ce5566aef0a4
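The known-unknown example (how the SLI changes when an instance is suddenly terminated) can be illustrated with a toy simulation. This is a sketch under loud assumptions: the round-robin router, the `availability_sli` and `run_experiment` functions, and the retry model are all hypothetical stand-ins for a real system and real fault-injection tooling.

```python
from __future__ import annotations

from itertools import cycle


def availability_sli(instances_up: list[bool], requests: int, retries: int = 0) -> float:
    """Fraction of successful requests under a simulated round-robin router."""
    ring = cycle(range(len(instances_up)))
    ok = 0
    for _ in range(requests):
        # Each request tries the next instance, then up to `retries` more.
        for _attempt in range(retries + 1):
            if instances_up[next(ring)]:
                ok += 1
                break
    return ok / requests


def run_experiment(n_instances: int, victim: int, requests: int, retries: int) -> tuple[float, float]:
    """Measure the SLI in steady state, then again with one instance terminated."""
    steady = [True] * n_instances
    baseline = availability_sli(steady, requests, retries)

    injected = steady.copy()
    injected[victim] = False  # fault injection: the victim instance is down
    during = availability_sli(injected, requests, retries)
    return baseline, during


if __name__ == "__main__":
    for retries in (0, 1):
        baseline, during = run_experiment(n_instances=4, victim=0, requests=1000, retries=retries)
        print(f"retries={retries}: baseline={baseline:.3f}, during fault={during:.3f}")
```

In this toy model, losing one of four instances without retries drops the availability SLI to 0.75, while a single retry keeps it at 1.0. A real experiment would compare the SLI observed during the fault against the SLO, which is the kind of process evaluation the slides describe.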
  24. Limits of Chaos Engineering • Unknown-unknown failures and weaknesses cannot be emulated • For example, contagious ones [8] • e.g. the Linux leap second bug • "Diversity" is one way to confront them [8] • Diversity...?
    [8] https://www.gremlin.com/blog/adrian-cockroft-chaos-engineering-what-it-is-and-where-its-going-chaos-conf-2018/
  25. Systems that are "stronger" against failure • Resilient systems • Antifragile systems, which go beyond resilient
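  26. Antifragility • A concept regarded as the opposite of fragility [9] • Examples • Muscle: becomes stronger than before when put under load • Information: feeds more on efforts to destroy it than on efforts to spread it
    [9] Nassim Nicholas Taleb, 反脆弱性 [上] (Japanese edition of Antifragile, vol. 1), translated by Mamoru Mochizuki and Toshio Chiba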
  27. Antifragility • Fragility: the property of getting weaker under shock • Robustness: the property of not changing under shock • Resilience: the property of adapting to shock • Antifragility: the property of getting stronger under shock
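  28. The layered structure of antifragile systems • Behind antifragility there is a layered structure [9] • For web systems as a whole to "evolve" antifragile, individual web systems must be fragile and able to fail • For a web system to be antifragile, its individual components must be fragile and able to fail • For a component to be antifragile, its individual hosts must be fragile and able to fail => diversity of hosts
    [9] Nassim Nicholas Taleb, 反脆弱性 [上] (Japanese edition of Antifragile, vol. 1), translated by Mamoru Mochizuki and Toshio Chiba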
  29. Summary • Introduced Resilience Engineering • Explained the relationship between Resilience Engineering, SRE, and Chaos Engineering • Described the idea of an antifragile web system
  30. https://twitter.com/tammybutow/status/1115027371999027200