
Where Chaos Engineering comes from, and what's next

itkq
April 13, 2019


Transcript

  1. Where Chaos Engineering comes from, and what's next
    Web System Architecture Study Group #4
    @itkq


  2. whoami
    • @itkq [ˈɪtəkəʊ]
    • SRE @ Cookpad
    • SLI/SLO, availability, ...
    • I want to make Web systems autonomous and put myself out of a job

  3. Agenda
    • Self-introduction
    • The rise of Chaos Engineering
    • Resilience Engineering and Safety-II
    • The relationship between SRE and Chaos Engineering
    • Can an antifragile Web system be realized?

  4. (no text on this slide)

  5. The rise of Chaos Engineering
    • The term "Chaos Engineering" first appeared in 2015 [1]
    • The "Chaos Engineering" paper: 2016 [2]
    • The "Chaos Engineering" book: 2017 [3]
    • Gremlin: Failure as a Service [4] (since 2017)
    • Practice aside, it has at least spread as a concept
    [1] https://medium.com/netflix-techblog/chaos-engineering-upgraded-878d341f15fa
    [2] Chaos Engineering, Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal
    [3] https://www.oreilly.com/library/view/chaos-engineering/9781491988459/
    [4] http://principlesofchaos.org

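  6. What is Chaos Engineering?
    "A discipline of experimentation on distributed systems, aimed at building
    an environment in which the system can withstand unstable conditions"
    (from the Principles of Chaos Engineering [4])
    [4] http://principlesofchaos.org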

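  7. What was new about Chaos Engineering?
    • The tools?
    • Breaking things at random?
    • Breaking things intentionally?
    • Breaking things in production?
    • Breaking things continuously?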

  8. What was new about Chaos Engineering?
    "Companies such as Amazon, Google, Microsoft, and Facebook were applying
    similar techniques to test the resilience of their own systems. We call the
    activities that form this discipline, which has emerged in our industry,
    'Chaos Engineering'."
    (Netflix [2])
    [2] Chaos Engineering, Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal

  9. Resilience Engineering

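  10. Resilience Engineering [5]
    • A methodology for improving the resilience of socio-technical systems [6]
    • Resilience: a concept denoting a state of excellent elasticity, restoring force, and recoverability
    [5] Resilience Engineering - Concepts and Precepts, E. Hollnagel, D. D. Woods, and N. Leveson, Eds., Ashgate Publishing Ltd., Aldershot, England, 2006
    [6] Masaharu Kitamura, "Safety-II, the safety that Resilience Engineering aims for, and how to realize it" (in Japanese), IEICE Fundamentals Review Vol.8 No.2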

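  11. How problems are framed in socio-technical systems
    1. They are not time-invariant; they change constantly
    2. Important decisions are made on the basis of incomplete information
    3. Profit and efficiency must be pursued, but that effort often erodes
    safety margins, so systems tend to drift toward dangerous states
    4. Safety matters, but safety itself is not the purpose of operating the system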

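  12. Limits of the concept of "safety"
    • Examples of the conventional notion of "safety"
      • "Undesirable events do not occur"
      • "There is no unacceptable risk"
    • It is no match for catastrophes
      • For example, the Great East Japan Earthquake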

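  13. Limits of the concept of "safety"
    • Excessive influence
      • Slogans such as "n days without an accident and counting"
    • Groundless overconfidence
      • "We protect safety so thoroughly that a severe accident could never happen"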

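  14. From Safety-I to Safety-II
    • Safety-I
      • The conventional "safety," defined negatively and statically
    • Safety-II [10]
      • "When a major disturbance or the like prevents the system from maintaining its normal operating state, it can keep operating, even at reduced performance"
      • "Once conditions recover, it can promptly return to its original state or an equivalent one"
      • Being able to behave resiliently
    • Not a wholesale rejection of Safety-I, but what lies beyond it
    [10] E. Hollnagel, Safety-I and Safety-II - The Past and Future of Safety Management, Ashgate Publishing Ltd., Surrey, England, 2014.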

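  15. The four fundamental capabilities
    • Able to respond
      • Includes flexible, ad hoc responses
    • Able to monitor
      • Ideally with the ability to respond proactively
    • Able to anticipate
      • Not necessarily data-driven
    • Able to learn
      • Continuously improving the capabilities above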

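  16. Differences in how investment in safety is perceived
    • Based on Safety-I
      • Under the implicit assumption that it is best if nothing happens, investment
      looks like insurance premiums that are never paid back, and tends to meet resistance
    • Based on Safety-II
      • Because the goal is continued operation, investment that raises that likelihood
      can be justified even if no major disturbance or environmental change ever occurs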

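  17. The difficulty of evaluating Safety-II
    • The dilemma between continued effort toward safety and the evaluation of its results
      • The target system has already achieved a fairly high level of safety,
      so outcome measures such as the number of accidents say little
    • It needs to be complemented by process evaluation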

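  18. What about Web systems?
    • They change constantly
    • There is no silver bullet
    • Having operations evaluated by point deduction is painful
    • ...
    • What if we replace "safety" with "reliability" and "accident" with "failure"?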

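  19. Reliability of Web systems
    • An important metric for many Web systems
    • Targeting 100% reliability is a mistake [7]
    • A methodology for controlling the reliability of Web systems => SRE
    • SRE can be seen as one implementation of Resilience Engineering
    [7] Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy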

  20. SLOs and error budgets
    • As long as the SLO (service level objective) is met, releases are allowed
      • => error budget remains
    • When no error budget remains
      • Raise the resilience of the system
      • Loosen the SLO
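
The release rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming a request-based availability SLO; the `ErrorBudget` class, its field names, and the `next_action` helper are hypothetical and do not come from the talk or from any SRE tooling.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Error budget for a request-based availability SLO (hypothetical helper)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that failed in the same window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Budget left after subtracting the failures already observed.
        return self.allowed_failures - self.failed_requests

    def can_release(self) -> bool:
        # Releases stay possible only while some budget remains.
        return self.remaining > 0


def next_action(budget: ErrorBudget) -> str:
    """Map the budget state to the two options listed on the slide."""
    if budget.can_release():
        return "release"
    return "improve resilience or loosen the SLO"


healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
exhausted = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
print(next_action(healthy))
print(next_action(exhausted))
```

With a 99.9% target over one million requests the budget is about 1,000 failed requests, so the first window still allows releases and the second does not.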

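  21. Evaluating resilience in the SRE context
    • The SLO is not a direct evaluation criterion
    • On the other hand, failures of Web systems are easier to reproduce than natural disasters
    • Instead of only waiting for failures to happen, emulating failures
    makes process evaluation of the system's resilience possible
    • This is Chaos Engineering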

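  22. Chaos Engineering reconsidered
    • Targeting 100% reliability for a Web system is a mistake
    • The goal of SRE is "pursuing maximum change velocity without violating
    a service's SLO" [7]
    • To avoid exhausting the error budget, Safety-II needs to be raised
    • Chaos Engineering, which emulates failures, is one method for
    process evaluation of resilience
    [7] Site Reliability Engineering, Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy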

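  23. What Chaos Engineering gives you
    • Verification of known-unknown weaknesses
      • Example: how the SLI changes when an instance is suddenly taken down
    • Discovery of unknown-unknown problems
      • For example, correlated ones
      • Example: inconsistencies caused by a fallback cache introduced to raise resilience [11]
    [11] https://medium.com/netflix-techblog/from-chaos-to-control-tesplatform-ce5566aef0a4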
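
The known-unknown check on this slide (how does the SLI change when an instance suddenly goes down?) can be tried on a toy model before touching a real system. The sketch below is a simulation only: `Instance`, `Service`, `measure_sli`, and `run_experiment` are hypothetical names, the SLI is assumed to be the fraction of successful requests, and the load balancer is assumed to pick instances at random without health checks.

```python
import random


class Instance:
    """A simulated host that either answers or refuses connections."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return "ok"


class Service:
    """Toy service that sends each request to a randomly chosen instance."""

    def __init__(self, instances, rng: random.Random, retry: bool) -> None:
        self.instances = instances
        self.rng = rng
        self.retry = retry  # if True, retry once on a different instance

    def request(self) -> bool:
        candidates = list(self.instances)
        attempts = 2 if self.retry else 1
        for _ in range(attempts):
            instance = self.rng.choice(candidates)
            try:
                instance.handle()
                return True
            except ConnectionError:
                # Do not pick the same failed instance again.
                candidates.remove(instance)
        return False


def measure_sli(service: Service, requests: int = 1000) -> float:
    """Assumed SLI: fraction of requests that succeed."""
    successes = sum(service.request() for _ in range(requests))
    return successes / requests


def run_experiment(retry: bool, seed: int = 0):
    """Measure the SLI before and after killing one of four instances."""
    rng = random.Random(seed)
    instances = [Instance(f"host-{i}") for i in range(4)]
    service = Service(instances, rng, retry)
    baseline = measure_sli(service)
    rng.choice(instances).alive = False  # the injected failure
    after_failure = measure_sli(service)
    return baseline, after_failure


for retry in (False, True):
    baseline, after_failure = run_experiment(retry=retry)
    print(f"retry={retry}: baseline={baseline:.3f} after_failure={after_failure:.3f}")
```

Under these assumptions, the run without retries loses roughly a quarter of its requests after the failure, while a single retry on another instance keeps the SLI at 1.0. Comparing the two runs is a small instance of the process evaluation described in the talk.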

  24. Limits of Chaos Engineering
    • Unknown-unknown failures and weaknesses cannot be emulated
      • For example, contagious ones [8]
      • Example: Linux Leap Second bug
    • "Diversity" is one way to confront them [8]
      • Diversity...?
    [8] https://www.gremlin.com/blog/adrian-cockroft-chaos-engineering-what-it-is-and-where-its-going-chaos-conf-2018/

  25. Systems that are "stronger" against failure
    • Resilient systems
    • Antifragile systems, which go beyond resilient

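  26. Antifragility
    • A concept regarded as the opposite of fragility [9]
    • Examples
      • Muscle: becomes stronger than before when put under load
      • Information: efforts to destroy it nourish it more than efforts to spread it
    [9] Nassim Nicholas Taleb, Antifragile, Vol. 1 (Japanese edition, subtitled "The only way of thinking for surviving an uncertain world"), translated by Mamoru Mochizuki and Toshio Chiba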

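  27. Antifragility
    • Fragility: the property of getting weaker under shock
    • Robustness: the property of not changing under shock
    • Resilience: the property of adapting to shock
    • Antifragility: the property of getting stronger under shock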

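  28. The hierarchical structure of antifragile systems
    • Behind antifragility there is a hierarchical structure [9]
    • For Web systems as a whole to "evolve" in an antifragile way, individual Web systems
    must be fragile and able to fail
    • For a Web system to be antifragile, its individual components must be fragile
    and able to fail
    • For a component to be antifragile, its individual hosts must be fragile and able
    to fail => diversity of hosts
    [9] Nassim Nicholas Taleb, Antifragile, Vol. 1 (Japanese edition, subtitled "The only way of thinking for surviving an uncertain world"), translated by Mamoru Mochizuki and Toshio Chiba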

  29. Summary
    • Introduced Resilience Engineering
    • Explained the relationship among Resilience Engineering, SRE, and Chaos Engineering
    • Described the idea of an antifragile Web system

  30. https://twitter.com/tammybutow/status/1115027371999027200