Slide 1

Slide 1 text

Research Paper Introduction #46 “Technology-Driven, Highly-Scalable Dragonfly Topology” (#116 overall) @cafenero_777 2022/05/11

Slide 2

Slide 2 text

Agenda
• Target paper
• Overview and why I read it
1. Introduction
2. Technology Model
3. Dragonfly Topology
4. Routing
5. Cost Comparison
6. Related Work
7. Conclusion

Slide 3

Slide 3 text

Target paper
• Technology-Driven, Highly-Scalable Dragonfly Topology
  • John Kim, William J. Dally, Steve Scott, Dennis Abts
  • Northwestern University, Stanford University, Cray Inc., Google Inc.
  • ISCA '08 (International Symposium on Computer Architecture)
  • https://dl.acm.org/doi/abs/10.1145/1394608.1382129
  • https://iscaconf.org/isca2008/

Slide 4

Slide 4 text

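Overview and why I read it
• Overview
  • Advances in ASICs make high-radix routers feasible, but cable length and cable count dominate the cost
  • The paper addresses this with routers, virtual routers, and the Dragonfly topology: 50% lower cost than Clos
  • By distinguishing virtual channels and introducing congestion-sensing credits, adaptive routing reaches near-ideal throughput and latency
• Why I read it
  • I wanted to understand the basics of the Dragonfly topology
  • Aquila at Google also uses a Dragonfly topology
  • Dragonfly topology (DT) came up at IETF 116 rtgwg and caught my attention // I could not follow the content at all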

Slide 5

Slide 5 text

What is a Dragonfly topology?
• https://www.usenix.org/conference/nsdi22/presentation/gibson
• https://www.microsoft.com/en-us/research/video/network-topologies-for-large-scale-datacenters-its-the-diameter-stupid/
• https://datatracker.ietf.org/meeting/116/materials/slides-116-rtgwg-routing-in-dragonfly-topologies

Slide 6

Slide 6 text

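1. INTRODUCTION
• In the supercomputer (SC) world, the trend is toward radix-64 Clos (moving away from 3D torus)
• More ports (radix), longer distances, and fiber -> expensive
• Dragonfly topology: group routers together to raise the effective radix
• Example: radix needed to reach the destination in one global (long-distance) hop: Fig. 1
  • 2*sqrt(N): radix 64 and 128 are feasible, but N=1M is out of reach!
  • Hosting 100k hosts would need a 600-port switch?!
• Routing toward global channels
  • When global channels are directly attached, UGAL (Universal Globally Adaptive Load-balanced routing) can be used
  • The local -> global -> ... case should also be optimized, so the paper proposes:
    • Bandwidth: UGAL_VC_H
    • Latency: UGAL_CR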
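The 2*sqrt(N) relation quoted from Fig. 1 can be checked with a few lines of arithmetic. This is only a sketch of the slide's rule of thumb; the function name `radix_for_one_global_hop` is made up for illustration and does not come from the paper.

```python
import math


def radix_for_one_global_hop(num_terminals: int) -> float:
    """Approximate router radix needed so that any two terminals are at most
    one global hop apart, using the slide's rule of thumb k ~ 2*sqrt(N)."""
    return 2 * math.sqrt(num_terminals)


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"N={n:>9,}: radix ~ {radix_for_one_global_hop(n):.0f}")
```

For N = 100k the result is a little over 600 ports, which matches the callout on the slide, and N = 1M would need a radix in the thousands.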

Slide 7

Slide 7 text

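2. Technology Model
• Interconnect hierarchy
  • Turnaround inside the ASIC
  • Turnaround at the backplane (midplane)
  • Global: turnaround between racks // dominant cost
• From copper to fiber (using AOC, active optical cables)
  • The cost crossover is at around 10 m
  • This paper: reduce the number of cables and let them be longer
• Shape the topology around cable length and high-radix characteristics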

Slide 8

Slide 8 text

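3. Dragonfly Topology 1/2
• Server (terminal) connections per router: p
• R: there are a routers per group; group ~= virtual router
• Router-to-router connections within the group: a-1
• Group-to-group (global) connections per router: h
• Radix of each router: k = p + h + a - 1
• Per group: a*p ports toward servers, a*h ports toward other groups
• Radix of the group: k' = a*(p+h), with k' >> k
  • Direct connections become easier (the global diameter shrinks)
• Limits for reaching any group in 1 hop
  • g: up to ah+1 groups
  • N: up to ap(ah+1) terminals
• Inside a group: fully connected topology
• Between groups: any topology will do
• With 50 ports, 100k hosts can be accommodated!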
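The parameter formulas on this slide are easy to evaluate directly. The sketch below encodes them as-is; the class name `Dragonfly` and the helper `balanced` are hypothetical, and the balanced choice a = 2p = 2h is an assumption recalled from the paper rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    p: int  # terminals (servers) per router
    a: int  # routers per group
    h: int  # global channels per router

    @property
    def router_radix(self) -> int:
        # k = p + h + (a - 1)
        return self.p + self.h + self.a - 1

    @property
    def group_radix(self) -> int:
        # k' = a * (p + h): the group seen as one virtual router
        return self.a * (self.p + self.h)

    @property
    def max_groups(self) -> int:
        # g = a*h + 1 groups reachable with one global hop
        return self.a * self.h + 1

    @property
    def max_terminals(self) -> int:
        # N = a * p * (a*h + 1)
        return self.a * self.p * self.max_groups


def balanced(h: int) -> Dragonfly:
    """Assumed load-balanced sizing a = 2p = 2h (not stated on the slide)."""
    return Dragonfly(p=h, a=2 * h, h=h)


if __name__ == "__main__":
    for net in (Dragonfly(p=2, a=4, h=2), Dragonfly(p=4, a=8, h=4), balanced(13)):
        print(net, net.router_radix, net.group_radix, net.max_groups, net.max_terminals)
```

Under the balanced assumption, h = 13 gives a router radix of 51 and a little over 114k terminals, which is in line with the "50 ports for 100k hosts" callout.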

Slide 9

Slide 9 text

3. Dragonfly Topology 2/2
• Example: p=h=2, a=4, k=7 -> k' = 16
• Many variations are possible
  • 2D flattened butterfly: same k'=16 as Fig. 5, but with higher locality between router pairs
  • Somewhat larger scale: 3D flattened butterfly, k'=32, N=1056

Slide 10

Slide 10 text

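4. Routing: Routing on the Dragonfly
(Rs/Gs: source router and group; Rd/Gd: destination router and group)
• Minimal routing (MIN)
  • If Gs != Gd and Rs is not directly connected to Gd, forward to Ra, which owns the global connection, and arrive at Rb
  • If Rb != Rd, route within Gd to reach Rd
• For more load balancing -> Valiant's algorithm can be applied to route via an intermediate group Gi (but the hop count also grows, so it is not something to do lightly)
  • If Gs != Gd and Rs is not directly connected to Gi, forward to Ra, which owns the global connection to Gi
  • If Gs != Gd, forward from Ra to Rx in Gi
  • If Gi != Gd and Rx is not connected to Gd, forward within Gi from Rx to Ry, which owns a connection to Gd
  • If Gi != Gd, forward from Ry to Rb in Gd
  • If Rb != Rd, route within Gd from Rb to Rd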
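The MIN and Valiant steps above can be traced on a small example. The slide does not say how global channels are wired, so this sketch assumes a hypothetical arrangement in which global port j of a group leads to group (own group + j + 1) mod g; the names `gateway`, `min_route`, and `valiant_route` are made up for illustration. Routers are written as (group, router index) pairs, using the p=h=2, a=4 sizing from the previous slide.

```python
A, H = 4, 2      # routers per group, global channels per router
G = A * H + 1    # groups, with exactly one global channel per pair of groups


def gateway(src_group: int, dst_group: int) -> int:
    """Index of the router in src_group that owns the channel to dst_group.

    Assumed wiring (hypothetical): global port j of a group connects to
    group (src_group + j + 1) mod G, and router j // H owns port j.
    Only valid for src_group != dst_group.
    """
    j = (dst_group - src_group - 1) % G
    return j // H


def min_route(src, dst):
    """Minimal route: optional local hop to Ra, global hop to Rb, local hop to Rd."""
    gs, gd = src[0], dst[0]
    path = [src]
    if gs != gd:
        ra = (gs, gateway(gs, gd))
        if ra != src:
            path.append(ra)                  # local hop Rs -> Ra
        path.append((gd, gateway(gd, gs)))   # global hop Ra -> Rb
    if path[-1] != dst:
        path.append(dst)                     # local hop Rb -> Rd
    return path


def valiant_route(src, dst, gi: int):
    """Route minimally to intermediate group gi, then minimally to dst."""
    gs, gd = src[0], dst[0]
    if gi in (gs, gd):
        return min_route(src, dst)
    rx = (gi, gateway(gi, gs))               # router where the packet lands in Gi
    return min_route(src, rx) + min_route(rx, dst)[1:]


if __name__ == "__main__":
    print(min_route((0, 0), (5, 0)))
    print(valiant_route((0, 0), (5, 0), gi=3))
```

With this wiring, a minimal route takes at most three hops (local, global, local) and a Valiant route at most five, which is the hop-count penalty the slide mentions.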

Slide 11

Slide 11 text

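4. Routing: Evaluation
• Algorithms: evaluated with Minimal (MIN) and Valiant (VAL)
  • 1k nodes: p=h=4, a=8
• UGAL: Universal Globally-Adaptive Load-balanced
  • Chooses MIN or VAL based on queue occupancy (Q) and hop count
  • UGAL-L: decides using only the local router's Q
  • UGAL-G: decides using the Q of all routers in Gs
• (a) Uniform random (UR) traffic
  • MIN is best
  • VAL is worst, with exactly half the performance
• (b) Worst-case (WC) traffic
  • MIN is worst: it clogs (only 1/ah of the bandwidth gets through)
  • VAL and UGAL-G are high, but reach at most 50% of the bandwidth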
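A minimal sketch of the UGAL decision, assuming the usual form of the rule: estimate delay as queue occupancy times hop count for each candidate and take the smaller. The slide only says that Q and hop count are used, so the exact comparison and the names `ugal_choose` and `min_worst_case_throughput` are assumptions.

```python
def ugal_choose(q_min: int, hops_min: int, q_val: int, hops_val: int) -> str:
    """Pick the minimal path unless its estimated delay (Q * hops) is larger
    than that of the Valiant (non-minimal) candidate."""
    return "MIN" if q_min * hops_min <= q_val * hops_val else "VAL"


def min_worst_case_throughput(a: int, h: int) -> float:
    """Fraction of bandwidth MIN achieves under the worst-case pattern (1/ah)."""
    return 1 / (a * h)


if __name__ == "__main__":
    print(ugal_choose(q_min=1, hops_min=3, q_val=1, hops_val=5))
    print(ugal_choose(q_min=4, hops_min=3, q_val=1, hops_val=5))
    print(min_worst_case_throughput(a=8, h=4))
```

For the evaluated 1k-node network (a=8, h=4), the 1/ah bound means MIN delivers only 1/32 of the bandwidth under worst-case traffic.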

Slide 12

Slide 12 text

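4. Routing: Indirect Adaptive Routing
• Difficulty: what needs to be known accurately is the utilization of the group's outputs (its global channels), not of a single router's outputs
  • In other words, this is an indirect routing problem
  • The global channel (and the router that owns it) has to be chosen from local-router information (indirect information)
  • Look at the local router's Q -> estimate global-channel pressure from local-channel pressure
  • The pressure is especially pronounced when the local routers are over-provisioned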

Slide 13

Slide 13 text

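4. Routing: Problem 1: Limited throughput
• Example: from R1, q1 and q2 saturate toward gc6 (1 hop) and gc7 (2 hops)
  • The 1-hop route is preferred, so R1 clogs
  • -> gc6/gc7 bandwidth is free, yet it goes unused
• Improvement: UGAL-L_VC
  • Choose the route whose per-VC Q*H is favorable (H: hop count)
• Further improvement: UGAL-L_VC_H
  • If the output ports (Out) differ, choose by Q*H
  • If the output port is the same, choose by the per-VC comparison
• About 80% performance on UR, slight latency increase on WC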
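The hybrid rule in the callouts (compare Q*H per port when the output ports differ, per VC when they coincide) can be written as a single predicate. This is a sketch of that reading of the slide; the `Candidate` fields and the function name `ugal_l_vc_h_prefers_minimal` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    out_port: int  # output port the packet would leave through
    hops: int      # H: hop count of this candidate path
    q_port: int    # total queue occupancy of the output port
    q_vc: int      # occupancy of the virtual channel this path would use


def ugal_l_vc_h_prefers_minimal(minimal: Candidate, non_minimal: Candidate) -> bool:
    """Return True when the minimal path should be taken.

    Different output ports: compare port occupancy times hop count.
    Same output port: port occupancy cannot tell the two apart, so
    compare the per-VC occupancy times hop count instead.
    """
    if minimal.out_port != non_minimal.out_port:
        return minimal.q_port * minimal.hops <= non_minimal.q_port * non_minimal.hops
    return minimal.q_vc * minimal.hops <= non_minimal.q_vc * non_minimal.hops


if __name__ == "__main__":
    m = Candidate(out_port=1, hops=3, q_port=10, q_vc=8)
    nm = Candidate(out_port=1, hops=5, q_port=10, q_vc=2)
    print(ugal_l_vc_h_prefers_minimal(m, nm))
```

In the example both candidates share port 1, so the decision falls back to the VC queues: 8*3 exceeds 2*5 and the non-minimal path is chosen.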

Slide 14

Slide 14 text

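4. Routing: Problem 2: Higher intermediate latency 1/3
• Latency changes with the amount of buffering
  • Especially pronounced for short packets
• Example: when packets arrive at R1 from gc0 and gc7, which way should they be sent back out?
  • q0 and q3 are not visible from R1 -> q1 and q2 are used instead
  • This is right for throughput, but it loses on latency
  • In other words, traffic does not shift to the non-minimal route until Q saturates. This is the cause of the net latency increase
• The shallower the buffers, the more readily back pressure from the global Q arrives: Fig. 14
  • Throughput is sacrificed, however
• Latency distribution: short packets see higher latency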

Slide 15

Slide 15 text

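4. Routing: Problem 2: Higher intermediate latency 2/3
• Proposed remedy for latency: credit round-trip latency
  • Let R0 be upstream, with R1 and R2 downstream of it
  • A packet is forwarded to the downstream router and the credit count decreases (flit)
  • When the downstream router finishes forwarding, the upstream credit count is increased (credit)
  • At zero load the round-trip time is t_crt0. As load rises, t rises too
• The global-channel congestion state is estimated from this t
  • For each port O: t_d(O) = t_crt(O) - t_crt0
  • Credits are returned after a delay of t_d(O) - min[t_d(o)]
  • Taking the difference from the minimum ~= taking the spread (it carries that router's load information)
• This behaves as if the buffers were shallow, but the buffers themselves remain fully usable, so high throughput is still possible
  • Congestion is noticed before Q fills up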
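A small worked example of the credit-delay computation, assuming t_d(O) is the measured credit round-trip time of port O minus the zero-load value t_crt0, and that each port's credits are held back by its t_d minus the smallest t_d across ports. The function name `credit_delays` and the sample numbers are made up for illustration.

```python
def credit_delays(t_crt: dict[int, float], t_crt0: float) -> dict[int, float]:
    """Per-port delay to apply before returning credits.

    t_crt  : measured credit round-trip time per output port O
    t_crt0 : zero-load credit round-trip time
    """
    # t_d(O) = t_crt(O) - t_crt0: extra round-trip time caused by load
    t_d = {port: t - t_crt0 for port, t in t_crt.items()}
    # Delay each port's credits by how much worse it is than the best port
    least = min(t_d.values())
    return {port: d - least for port, d in t_d.items()}


if __name__ == "__main__":
    measured = {0: 12, 1: 20, 2: 15}  # hypothetical cycle counts
    print(credit_delays(measured, t_crt0=10))
```

Because the same t_crt0 is subtracted from every port, it cancels in the final difference; what remains is how much slower each port's credit loop is than the least congested one.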

Slide 16

Slide 16 text

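4. Routing: Problem 2: Higher intermediate latency 3/3
• Evaluation of the proposed UGAL-L (cr): see the figure
  • Figure panels: buffer size 16 and buffer size 256, each under WC and UR traffic
  • Compared with UGAL-L (vc-h), intermediate latency improves by 30% and 200%
  • Intermediate latency does not depend on buffer size
  • Still worse than UGAL-G (because part of the traffic is routed non-minimally)
• Required functions
  • t_crt measurement and credit tracking
    • To identify packets, push a timestamp onto a simple queue (CTQ)
    • Pop it when the credit comes back
  • A register holding the t_d value
    • Simply holds the value
  • A delay mechanism for returning credits
    • This needs to be built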
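The push/pop timestamp queue described above can be sketched in software to show how t_crt falls out of it. This assumes credits return in the same order as the flits were sent, so a plain FIFO is enough; the class and method names are hypothetical and not taken from the paper.

```python
from collections import deque


class CreditTimestampQueue:
    """Measures credit round-trip time with a FIFO of send timestamps."""

    def __init__(self) -> None:
        self._sent: deque[int] = deque()

    def on_flit_sent(self, now: int) -> None:
        # Push the send time when a flit leaves and a credit is consumed
        self._sent.append(now)

    def on_credit_returned(self, now: int) -> int:
        # Pop the oldest send time; the difference is the measured t_crt
        return now - self._sent.popleft()


if __name__ == "__main__":
    ctq = CreditTimestampQueue()
    ctq.on_flit_sent(now=0)
    ctq.on_flit_sent(now=3)
    print(ctq.on_credit_returned(now=10))
    print(ctq.on_credit_returned(now=15))
```

No packet identifier is stored; the queue position alone pairs each credit with its flit, which is why a simple queue suffices under the in-order assumption.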

Slide 17

Slide 17 text

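5. Cost Comparison 1/2
• (Folded-)Clos -> flattened butterfly topology (FBT): 50% cost reduction
  • Cost drops because intermediate routers and channels are removed
• Dragonfly topology (DT): more scalable still, at lower cost
  • Example: 16 R (= 1 group = 256 nodes) * 16 * 16 = 64k nodes
  • FBT: 16 connections from the group in each of the vertical and horizontal directions (adds 2 dimensions)
    • 50% of the connections are global
    • Scaling means adding dimensions (hard)
  • DT: 1 connection from the group to every R (adds 1 dimension)
    • Global connections are halved compared with FBT
    • 25% of the connections are global
    • Scaling means growing the group size (easy)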

Slide 18

Slide 18 text

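5. Cost Comparison 2/2
• Hop counts are about the same
• On cable length, DT is at a slight disadvantage
  • Reduce the number of global cables and move to new signaling technology
• Cost by number of hosts
  • Up to 1k, routers are fully connected, so FBT and DT cost the same
  • At 4k, 10% lower cost than FBT (because cables are shorter)
  • Above 4k, 20% lower cost than FBT (because global cables are longer but fewer)
  • 3D torus is expensive because it needs many cables
  • 50% lower cost than folded Clos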

Slide 19

Slide 19 text

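6. Related Work
• SOENet: Scalable Opto-Electronic Network
  • Built from global switches that connect multiple subnets (groups)
  • Needs intermediate routers (DT has a flat hierarchy)
• Various hierarchical topologies
  • DT is hierarchical too, but differs in keeping the hop count low while reducing global connections
  • Tree structures -> worse bandwidth and latency
  • Cube-connected cycles -> cannot exploit the higher effective radix
• Importance of signaling technology
  • The arrival of cheap, long-reach cable technology changes the topology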

Slide 20

Slide 20 text

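7. Conclusion
• Dragonfly Topology
  • Exploits a high effective radix to optimize network size (diameter), cost, and latency
  • Reduces global cables to optimize cost
    • 20% lower than flattened butterfly, 50% lower than folded Clos
  • Routing challenges: proposes virtual channels and credit-based congestion control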

Slide 21

Slide 21 text

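Bonus: notes from rtgwg at IETF 116
• Routing in Dragonfly Topologies, IETF 116 Yokohama
• New Topologies for Data Center
  • Dragonfly+ recap
    • Inside a group: use Clos
    • Between groups: drop the full mesh (allow 2 or more hops)
  • Open issues
    • BGP: min+1 works with ordinary routing, but min+3 needs source routing
    • Min for Core, ECMP/WCMP for Pods (?!)
    • Min and non-min paths should be used selectively, but ECMP cannot do that
    • Global uses path properties, but local uses Q? That seems unworkable
    • If the goal is proactive rather than reactive adjustment, Q is more appealing than BGP. RTT is around 10 us, so it is very fast, and the count is large
    • How about using ECN or the flow label (for path mapping)?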

Slide 22

Slide 22 text

References
• Dragonfly
  • Routing in Dragonfly Topologies, IETF 116 Yokohama // IETF 116 material
  • Dragonfly+: Low Cost Topology for Scaling Datacenters // a clear introduction to Dragonfly+
  • Exascale HPC Fabric Topology // another clear introduction
  • Aquila: A unified, low-latency fabric for datacenter networks // use case at Google
• References
  • Load-Balanced Routing in Interconnection Networks // a PhD thesis!
  • The BlackWidow High-radix Clos Network
  • The Cube-Connected Cycles: A Versatile Network for Parallel Computation

Slide 23

Slide 23 text

Memo
• The cube-connected cycles: a versatile network for parallel computation
  • https://dl.acm.org/doi/10.1145/358645.358660