Slide 1

Slide 1 text

Research Paper Introduction #33 “Destroying networks for fun (and profit)” (#92 overall) @cafenero_777 2022/02/10

Slide 2

Slide 2 text

Agenda
• Target paper
• Overview and why I chose to read it
1. INTRODUCTION
2. MOTIVATION & BACKGROUND
3. ARCHITECTURE
4. COMPUTING SMART FAILURES
5. PRELIMINARY EVALUATION
6. DISCUSSIONS AND FUTURE WORK
7. CONCLUSION

Slide 3

Slide 3 text

Target paper
• Destroying networks for fun (and profit)
  • Nick Shelly∗§, Brendan Tschaen†, Klaus-Tycho Förster∗, Michael Chang‡, Theophilus Benson†, Laurent Vanbever∗
  • ∗ETH Zürich, ‡Princeton University, †Duke University, §Forward Networks
  • ACM HotNets 2015
  • paper: https://dl.acm.org/doi/10.1145/2834050.2834099
  • slide: https://tik-db.ee.ethz.ch/file/2a818c8733e21ab762ec0a92b7057c4e/hotnets.pdf

Slide 4

Slide 4 text

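Overview and why I chose to read it
• Overview
  • SDN (network control software) is responsible for guaranteeing the network's fault tolerance, but in practice that is hard
  • Introduces a failure injection framework (Armageddon)
  • Using an efficient failure-scenario algorithm, covers 80% of link failures in 3 scenario runs
• Why I chose to read it
  • It was cited fairly often in the Chaos Engineering papers I read previously
  • The title drew me in (fun -> ?)
  • I wanted something operations-flavored for a change (it turned out less operations-flavored than expected)
  • A bit dated? (the "what is SDN controller design...", "what is SDN reliability..." kind of framing)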

Slide 5

Slide 5 text

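Netflix and the history of C.E., plus the material introduced so far
• 2008: full migration to AWS (everything except the CDN)
• Chaos Monkey: stops VMs
• Christmas Eve 2012: major outage in a single region
• Chaos Kong: stops a region
• The misconception that "CE = breaking things at random" arises
• 2015: Destroying networks for fun (and profit)
• 2016: ChAP (Chaos Automation Platform), introduced to microservices
• 2017: PRINCIPLES OF CHAOS ENGINEERING published
• 2019: Automating chaos experiments in production
• 2022: ???
Callouts on the slide:
• Chaos engineering: Building confidence in system behavior through experiments
• Mainly about C.E. methodology (working through concrete examples)
• An account of actually building a system and operating C.E. with it
• This talk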

Slide 6

Slide 6 text

Chaos as a Service
• AWS FIS (Fault Injection Simulator): https://aws.amazon.com/jp/fis/ (2021/03/15-)
• Azure Chaos Studio: https://azure.microsoft.com/ja-jp/services/chaos-studio/ (2021/11/13-, preview version)

Slide 7

Slide 7 text

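1. INTRODUCTION
• Netflix's Chaos Monkey
  • Randomly kills VMs and services in production. It stopped 65k VMs in 2012 alone. Bugs were found and fixed, and the failures went unnoticed.
• DCs, WANs, and SDN fail too
• White-box approaches
  • Model checking, symbolic execution, x-testing, etc. in a development environment
  • Some bugs occur only in production or on real hardware (e.g. race conditions)
• Black-box approaches
  • Actually try plausible failures (ones that seem likely to happen or worth verifying)
  • Coverage (run every combination, or once per resource)
  • Keep the number of failure trials (iterations) to a minimum
  • Can the SLA still be met? -> Usually redundancy is already in place. Daytime (when engineers are around) is preferable.
• Armageddon
  • Generates failure scenarios algorithmically while preserving E2E connectivity
  • If a connectivity problem is found, aborts the test and generates a report on the problem location
  • Evaluated on the number of trials and on failure coverage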

Slide 8

Slide 8 text

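2. MOTIVATION & BACKGROUND
• Limits of network modeling and simulation
  • Model checking and *test approaches have limits, e.g. concurrency bugs, conformance to the OpenFlow specification (not to the implementation), and test frameworks built on those assumptions
• Checking invariants
  • Reactive approach: detect bugs from control-plane (C-plane) output and data-plane (D-plane) execution results
  • Active bug detection: Armageddon actually makes things fail
• Failure detection and recovery
  • Prior work: config checking (using a virtual environment), bug-avoidance techniques (using virtual links), and automated DC recovery techniques
  • Armageddon detects problems using the entire real network (no virtualization needed; simpler D-plane)
  • Armageddon can be operated and run at whatever time suits; recovery tools are used if needed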

Slide 9

Slide 9 text

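3. ARCHITECTURE
• Inducing failures
  • Failures are mediated and executed via SDN (with OpenFlow messages)
• Monitoring for correctness
  • After a flow state change is detected, the invariant checker runs
  • NetPlumber is used to track the state of flows
• Generating reports and restoring correctness
  • Reports the failure scenario and the trace results
  • Rolls back automatically. If the network does not recover, the modification to the target system is considered to be at fault
• Dealing with concurrent physical failures
  • On detecting a failure, stops immediately and enters the recovery flow
  • While a physical resource failure is detected, the attack is not executed in the first place
Figure annotations: the input is the topology and the invariants; failure tests run while the invariants are maintained; flows are mediated (active/passive).
https://www.usenix.org/conference/ons2014/technical-sessions/presentation/al-shabibi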

Slide 10

Slide 10 text

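4. COMPUTING SMART FAILURES (1/4)
• Coverage
  • Induce link failures and check whether E2E reachability still holds
  • Does the SDN controller behave correctly?
  • Take down every link, one at a time?
• Coverage + Speed
  • At least 2 iterations are trivially required, and at most n are needed. What is the minimum number for a given topology?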
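As a rough illustration of the coverage question above, the sketch below checks E2E reachability after a set of links is taken down and runs the naive "one link per iteration" baseline. It is a minimal sketch, not Armageddon's implementation: the function names `is_connected` and `one_link_at_a_time` are hypothetical, links are assumed to be undirected `(u, v)` tuples, and reachability is approximated by plain graph connectivity.

```python
from collections import deque


def is_connected(nodes, links, failed=frozenset()):
    """Return True if every node can still reach every other node
    once the links in `failed` are down (BFS over the live links)."""
    adj = {n: [] for n in nodes}
    for u, v in links:
        if (u, v) in failed:
            continue  # this link is part of the failure scenario
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def one_link_at_a_time(nodes, links):
    """Naive baseline: one iteration per link.
    Returns the links whose individual failure breaks reachability."""
    return [l for l in links if not is_connected(nodes, links, {l})]


ring_nodes = ["a", "b", "c", "d"]
ring_links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(one_link_at_a_time(ring_nodes, ring_links))
```

The baseline needs as many iterations as there are links, which is the cost the "Coverage + Speed" bullet is trying to reduce.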

Slide 11

Slide 11 text

4. COMPUTING SMART FAILURES (2/4) Algorithm refresher
• Graph nodes!
• Algorithms for finding a minimum spanning tree (MST)
  • Prim's algorithm
  • Kruskal's algorithm
  • https://algo-logic.info/kruskal-mst/
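For readers who want the refresher in code, here is a minimal sketch of Kruskal's algorithm with a union-find structure. The function name `kruskal_mst` and the `(weight, u, v)` edge format are choices made for this example only.

```python
def kruskal_mst(nodes, weighted_edges):
    """Kruskal's algorithm: scan edges in order of increasing weight and
    keep an edge only if it joins two different components."""
    parent = {n: n for n in nodes}

    def find(x):
        # Find the component representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(weighted_edges, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge does not close a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c")]
mst = kruskal_mst(["a", "b", "c"], edges)
print(mst)
```

On a connected graph with v nodes the result always has v - 1 edges; the heaviest edge of each cycle is the one left out.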

Slide 12

Slide 12 text

4. COMPUTING SMART FAILURES (3/4) The problem to solve here = “graph Jenga”
• Maintain a network that stays connected by at least one path (= a spanning tree) and cut everything else
• Cut every link at least once, in as few iterations as possible
[Figure: two panels, "Intuitive example" and "This paper's algorithm (Greedy Killer Algorithm)", showing graphs whose edges are labeled 1 or 0; legend: vertex (node), edge (link)]

Slide 13

Slide 13 text

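4. COMPUTING SMART FAILURES (4/4)
• Algorithm
  • MST + α
• Complexity
  • O(ve log v)
  • e: number of edges (links)
  • v: number of vertices (nodes)
If the graph can be decomposed into T1 and T2 with no overlap, fail E1, then fail E2, and finish (the minimum-iteration case).
Algorithm flow:
• Set the weight of every edge (link) to 1
• Build an MST T(V, E’)
• Inject failures on E’’ (= E not E’)
• Count the number of links failed
• Set the weight of the links in E’’ to 0
• Terminate if there is nothing left to fail or all weights are 0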
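The flow above can be sketched as follows. This is a hedged reading of the slide, not the authors' code: the names `spanning_tree` and `greedy_killer` are hypothetical, Kruskal is assumed as the MST subroutine (the slide does not say which one is used), and links are undirected `(u, v)` tuples. Because already-failed links get weight 0, each new MST prefers them, which pushes untested links out of the tree and into the next failure scenario.

```python
def spanning_tree(nodes, edges, weight):
    """Kruskal with union-find; lower-weight edges are preferred.
    Returns the set of edges kept in the tree (E')."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    for e in sorted(edges, key=lambda e: weight[e]):
        ru, rv = find(e[0]), find(e[1])
        if ru != rv:
            parent[ru] = rv
            tree.add(e)
    return tree


def greedy_killer(nodes, edges):
    """Return a list of failure scenarios (lists of links to take down).
    Each scenario leaves a spanning tree intact."""
    weight = {e: 1 for e in edges}  # 1 = not yet failed, 0 = already failed
    scenarios = []
    while any(w == 1 for w in weight.values()):
        tree = spanning_tree(nodes, edges, weight)           # E'
        to_fail = [e for e in edges if e not in tree]        # E'' = E not E'
        if not any(weight[e] == 1 for e in to_fail):
            break  # nothing new to fail: the untested links are bridges
        scenarios.append(to_fail)
        for e in to_fail:
            weight[e] = 0
    return scenarios


nodes = ["a", "b", "c", "d"]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
for i, scenario in enumerate(greedy_killer(nodes, ring), start=1):
    print(i, scenario)
```

On a 4-node ring only one link can be cut at a time, so this sketch needs one iteration per link; denser topologies let each scenario cut many links at once.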

Slide 14

Slide 14 text

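5. PRELIMINARY EVALUATION
• Dataset
  • 261 topologies from the Internet Topology Zoo and 7 from the RocketFuel topologies
  • Intended to span large ISPs through mid-sized campus networks
  • 265 topologies in total. Size: 36 at the 50th percentile, 962 at the maximum
• Results of running the algorithm
  • 60% can be covered in 5 iterations or fewer, 91% in 8 iterations or fewer
  • A case needing 28 iterations -> low redundancy; a kind of ring topology
• Optimality of the algorithm
  • 52% (138/265) reached the optimal value (the minimum number of iterations)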

Slide 15

Slide 15 text

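6. DISCUSSIONS AND FUTURE WORK
• Want to run even more efficiently
• Exploiting coverage properties
  • For example, failures that only occur for a specific order of link-downs. Make use of models.
  • Reduce the number of scenarios using domain knowledge (e.g. traffic patterns)
• Preserving more advanced properties
  • Take bandwidth preservation, congestion, and so on into account
  • Seems achievable by improving the algorithm (combining invariants)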

Slide 16

Slide 16 text

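7. CONCLUSIONS
• Network failures should be embraced, not avoided
• Let's do chaos engineering
• Armageddon: enables efficient failure testing of networks
  • Ingenuity in both the system and the algorithm
  • Tests link-downs across the whole network in a minimal number of iterations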

Slide 17

Slide 17 text

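Thoughts after finishing
• Being hit with graph theory out of nowhere is daunting
• I understood what it meant by reading the presentation slides
  • My hastily acquired knowledge of formal methods (how logical operations are written) came in handy
• The content is a bit dated... (a passing trend?)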

Slide 18

Slide 18 text

EoP

Slide 19

Slide 19 text

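Key takeaways
• Network failures should be embraced, not avoided
• Let's do chaos engineering
• Armageddon: enables efficient failure testing of networks
  • Ingenuity in both the system and the algorithm
  • Tests link-downs across the whole network in a minimal number of iterations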
