Where Chaos Engineering comes from, and what's next

itkq
April 13, 2019

Transcript

  1. Where Chaos Engineering comes from, and what's next
    Web System Architecture 研究会 (study group) #4
    @itkq

  2. whoami
    • @itkq [ˈɪtəkəʊ]
    • SRE @ Cookpad
    • SLI/SLO, availability, ...
    • I want to make web systems autonomous and put myself out of a job

  3. Agenda
    • Self-introduction
    • The rise of Chaos Engineering
    • Resilience Engineering and Safety-II
    • The relationship between SRE and Chaos Engineering
    • Can an antifragile web system be realized?

  4. (no transcribed text)

  5. The rise of Chaos Engineering
    • The term "Chaos Engineering" first appeared in 2015 [1]
    • The "Chaos Engineering" paper came out in 2016 [2]
    • The "Chaos Engineering" book came out in 2017 [3]
    • Gremlin: Failure as a Service [4] (2017 onward)
    • Whatever the state of practice, the concept itself has spread widely
    [1] https://medium.com/netflix-techblog/chaos-engineering-upgraded-878d341f15fa
    [2] Chaos Engineering, Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal
    [3] https://www.oreilly.com/library/view/chaos-engineering/9781491988459/
    [4] http://principlesofchaos.org

  6. What is Chaos Engineering?
    "A discipline of verification for building an environment in which a distributed system can withstand unstable conditions."
    Source: Principles of Chaos Engineering [4]
    [4] http://principlesofchaos.org

  7. What was new about Chaos Engineering?
    • The tools?
    • Breaking things at random?
    • Breaking things on purpose?
    • Breaking things in production?
    • Breaking things continuously?

  8. What was new about Chaos Engineering?
    "Companies such as Amazon, Google, Microsoft, and Facebook had been applying similar techniques to test the resilience of their own systems. We call the activities that form this discipline, now emerging in our industry, 'Chaos Engineering'."
    Source: Netflix [2]
    [2] Chaos Engineering, Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal

  9. Resilience Engineering

  10. Resilience Engineering [5]
    • A methodology for improving the resilience of socio-technical systems [6]
    • Resilience: a concept describing a state with strong elasticity, restorative capacity, and ability to recover
    [5] Resilience Engineering - Concepts and Precepts, E. Hollnagel, D. D. Woods, and N. Leveson, Eds., Ashgate Publishing Ltd., Aldershot, England, 2006
    [6] Masaharu Kitamura, "Safety-II, the safety that Resilience Engineering aims for, and how to realize it" (in Japanese), IEICE Fundamentals Review Vol.8 No.2

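  11. How problems are framed in socio-technical systems
    1. The system is not time-invariant; it is always changing
    2. Important decisions are made on the basis of incomplete information
    3. Pursuing profit and efficiency is required, but that effort often erodes safety margins, so the system tends to drift toward a dangerous state
    4. Safety is important, but safety itself is not the purpose of the system's operation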

  12. Limits of the concept of "safety"
    • Examples of the traditional notion of "safety"
      • "Undesirable events do not occur"
      • "There is no unacceptable risk"
    • It cannot cope with catastrophes
      • For example, the Great East Japan Earthquake

  13. Limits of the concept of "safety"
    • Excessive influence
      • Slogans such as "n days and counting without an accident"
    • Groundless overconfidence
      • "We protect safety so thoroughly that a severe accident could never happen"

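  14. From Safety-I to Safety-II
    • Safety-I
      • The traditional notion of "safety", defined negatively and statically
    • Safety-II [10]
      • "When a major disturbance prevents the system from maintaining its normal operating state, it can keep operating, even at reduced performance"
      • "Once conditions recover, it can promptly return to its original state or an equivalent one"
      • That is, the system can behave resiliently
    • Safety-II does not reject Safety-I outright; it is what lies beyond it
    [10] E. Hollnagel, Safety-I and Safety-II - The Past and Future of Safety Management, Ashgate Publishing Ltd., Surrey, England, 2014.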

  15. The four fundamental capabilities
    • Responding
      • Includes improvised, flexible responses
    • Monitoring
      • Ideally includes the ability to act proactively
    • Anticipating
      • Not necessarily data-driven
    • Learning
      • Continuously improving the capabilities above

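  16. How investment in safety is perceived
    • Safety-I based
      • Under the implicit assumption that "nothing happening" is the desired outcome, investment looks like non-refundable insurance and tends to meet resistance
    • Safety-II based
      • The goal is continued operation, so investment that raises the likelihood of it can be justified even if no major disturbance or environmental change ever occurs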

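  17. Why Safety-II is hard to evaluate
    • There is a dilemma between continued safety effort and evaluating its results
    • The target system has already achieved a fairly high level of safety, so outcome measures such as accident counts are hard to use
    • Process evaluation is needed as a complement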

  18. What about web systems?
    • They are always changing
    • There is no silver bullet
    • Having operations graded on a demerit system is painful
    • ...
    • What if we replace "safety" with "reliability" and "accident" with "failure"?

  19. Reliability of web systems
    • An important metric for many web systems
    • Targeting 100% reliability is a mistake [7]
    • A methodology for controlling the reliability of web systems => SRE
    • SRE can be seen as one implementation of Resilience Engineering
    [7] Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy

  20. SLOs and error budgets
    • As long as the SLO (service level objective) is being met, releases can go out
      • => error budget remains
    • When no error budget remains
      • Improve the system's resilience
      • Relax the SLO
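
As a rough illustration of the error-budget bookkeeping on this slide, the following Python sketch computes a request-based budget and gates releases on it. The 99.9% SLO, the request counts, and the function names are assumptions chosen for the example; none of them come from the talk.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Failed requests the SLO tolerates over the measurement window."""
    return (1.0 - slo) * total_requests


def remaining_budget(slo: float, total_requests: int, failed_requests: int) -> float:
    """Budget left after subtracting the failures observed so far."""
    return error_budget(slo, total_requests) - failed_requests


def can_release(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Releases are allowed only while some error budget remains."""
    return remaining_budget(slo, total_requests, failed_requests) > 0


# Hypothetical window: 1,000,000 requests against a 99.9% SLO.
print(can_release(0.999, 1_000_000, 400))    # some budget is left
print(can_release(0.999, 1_000_000, 1_500))  # budget is exhausted
```

When `can_release` returns `False`, the two options on the slide apply: invest in resilience so fewer requests fail, or relax the SLO by passing a lower `slo`.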

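  21. Evaluating resilience in the SRE context
    • The SLO is not a direct evaluation criterion
    • On the other hand, failures in web systems are easier to reproduce than natural disasters
    • Rather than just waiting for failures to happen, emulating them makes a process evaluation of the system's resilience possible
    • This is Chaos Engineering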

  22. Chaos Engineering revisited
    • Targeting 100% reliability for a web system is a mistake
    • The goal of SRE is "to pursue maximum change velocity without dropping below the service's SLO" [7]
    • To avoid exhausting the error budget, Safety-II has to be raised
    • Chaos Engineering, which emulates failures, is one method for process evaluation of resilience
    [7] Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy

  23. What Chaos Engineering gives you
    • Verification of known-unknown weaknesses
      • Example: how does the SLI change when an instance is suddenly terminated?
    • Discovery of unknown-unknown problems
      • For instance, correlated ones
      • Example: inconsistencies caused by a fallback cache that was added to improve resilience [11]
    [11] https://medium.com/netflix-techblog/from-chaos-to-control-tesplatform-ce5566aef0a4
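
The "terminate an instance and watch the SLI" experiment can be sketched as a small deterministic simulation. Everything here is a toy assumption made for illustration: the `Instance` class, the round-robin routing, the single retry, and the request count stand in for a real load balancer and real traffic, and no chaos tooling is involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool = True

    def handle(self) -> bool:
        """Return True if the request succeeds on this instance."""
        return self.healthy


def availability_sli(pool: List[Instance], requests: int, retry: bool = False) -> float:
    """Fraction of successful requests under round-robin routing.

    With retry=True, a failed request is retried once on the next instance.
    """
    succeeded = 0
    size = len(pool)
    for i in range(requests):
        ok = pool[i % size].handle()
        if not ok and retry:
            ok = pool[(i + 1) % size].handle()
        succeeded += ok
    return succeeded / requests


pool = [Instance("a"), Instance("b"), Instance("c")]

baseline = availability_sli(pool, 300)    # steady state before the experiment
pool[0].healthy = False                   # inject the failure: instance "a" goes down
without_retry = availability_sli(pool, 300)
with_retry = availability_sli(pool, 300, retry=True)

print(f"baseline={baseline:.3f} without_retry={without_retry:.3f} with_retry={with_retry:.3f}")
```

Comparing the SLI before and after the injected failure is the known-unknown check described on the slide. The retry variant shows how a resilience mechanism changes the outcome of the same experiment.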

  24. Limits of Chaos Engineering
    • Unknown-unknown failures and weaknesses cannot be emulated
      • For instance, contagious ones [8]
      • Example: the Linux leap second bug
    • "Diversity" is one way to confront them [8]
      • Diversity...?
    [8] https://www.gremlin.com/blog/adrian-cockroft-chaos-engineering-what-it-is-and-where-its-going-chaos-conf-2018/

  25. Systems that are "stronger" against failure
    • Resilient systems
    • Antifragile systems, which go beyond resilient

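  26. Antifragility
    • A concept presented as the opposite of fragility [9]
    • Examples
      • Muscle: putting it under load makes it stronger than before
      • Information: efforts to destroy it feed it more than efforts to spread it
    [9] Nassim Nicholas Taleb, Antifragile, Vol. 1 (Japanese edition, 反脆弱性［上］), translated by Mamoru Mochizuki and Toshio Chiba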

  27. Antifragility
    • Fragility: the property of becoming weaker under shock
    • Robustness: the property of not changing under shock
    • Resilience: the property of adapting to shock
    • Antifragility: the property of becoming stronger under shock

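  28. The layered structure of antifragile systems
    • Behind antifragility there is a layered structure [9]
    • For web systems as a whole to "evolve" antifragilely, individual web systems must be fragile and able to fail
    • For a web system to be antifragile, its individual components must be fragile and able to fail
    • For a component to be antifragile, its individual hosts must be fragile and able to fail => host diversity
    [9] Nassim Nicholas Taleb, Antifragile, Vol. 1 (Japanese edition, 反脆弱性［上］), translated by Mamoru Mochizuki and Toshio Chiba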

  29. Summary
    • Introduced Resilience Engineering
    • Explained the relationship between Resilience Engineering, SRE, and Chaos Engineering
    • Described the idea of an antifragile web system

  30. https://twitter.com/tammybutow/status/1115027371999027200