Slide 1

Research Paper Introduction #37 “Bluebird: High-performance SDN for Bare-metal Cloud Services” (#101 overall) @cafenero_777 2022/06/09

Slide 2

Agenda
• Target paper
• Overview and why I chose to read it
1. Introduction
2. Background
3. Design Goals and Rationale
4. System Design
5. Performance
6. Operationalization and Experiences
7. Related Work
8. Conclusions and Future Work

Slide 3

Target paper
• Bluebird: High-performance SDN for Bare-metal Cloud Services
• Manikandan Arumugam et al. (Arista, Intel, Microsoft)
• NSDI 2022
• https://www.usenix.org/conference/nsdi22/presentation/arumugam
• Previously introduced at the recent NSDI 2022 recap session

Slide 4

Bluebird: High-performance SDN for Bare-metal Cloud Services
Arista, Intel, Microsoft
• Serves the virtual network for Azure's bare-metal cloud services with P4 switches
• Customers: NetApp, Cray, SAP
• 100 Gbps, two years in production
• A commentary article in Japanese is also available
(Excerpt from the previous slide deck)

Slide 5

Overview and why I chose to read it
• Overview
  • Connects the network for Azure's bare-metal cloud services cleanly with P4 switches
  • Designed for high availability; achieves 100 Gb/s line rate with <1 us latency
  • Shares experience from more than two years in production
• Why I chose to read it
  • A P4 use case in the cloud
  • Curious about the challenges and how they were solved (design, etc.)

Slide 6

1. Introduction
• SDN with the D-plane implemented on the end-host side (HV)
  • OvS, DPDK, ASIC, FPGA, SmartNIC
• Customers evaluating a move of in-house systems to the cloud
  • They run on (dedicated) appliances and the like (NetApp, Cray, SAP, and HPC)
• Bare-metal cloud services / HWaaS cannot host the SDN stack!
• A ToR-based SDN solution: Bluebird
  • Intended for Barefoot Tofino ToRs and SmartToRs

Slide 7

2. Background
• Doing everything on the HV is simple; doing it in software is hard, and the agent consumes host resources.
• The goal is higher performance while preserving scalability and programmability.
• The host-based approach does not suit bare metal well (too complex; modify VFP?).
• Instead, the ToR can take on the complex processing on behalf of the bare-metal host.
• This paper: VRFs (per-customer network isolation) plus a per-VRF CA-PA mapping (VXLAN static routes); the various routing/tunneling steps are implemented in P4 (see the lookup sketch below).
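A minimal Python sketch of the per-VRF lookup just described, assuming illustrative table contents: a tenant's customer address (CA) resolves, inside that tenant's VRF/VNI, to the physical address (PA) of the remote VXLAN tunnel endpoint. In Bluebird this state lives in the Tofino pipeline's tables; the dictionary model below is only for intuition.

```python
from typing import Dict, Optional

# vrf_tables[vni][customer_ip] -> underlay (PA) IP of the remote VTEP.
# VNI and addresses are hypothetical examples, not from the paper.
vrf_tables: Dict[int, Dict[str, str]] = {
    20500: {"10.0.0.4": "100.64.1.7"},
}

def lookup_ca_to_pa(vni: int, ca_ip: str) -> Optional[str]:
    """Return the PA (underlay VTEP IP) for a CA inside one tenant's VRF."""
    return vrf_tables.get(vni, {}).get(ca_ip)

pa = lookup_ca_to_pa(20500, "10.0.0.4")
if pa is not None:
    print(f"VXLAN-encapsulate, VNI=20500, outer dst={pa}")
else:
    print("no mapping: drop or punt to the control plane")
```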

Slide 8

3. Design Goals and Rationale
1. Programmability: an SDN stack on par with VFP; requirements change over time and must keep being met.
2. Scalability: ToR memory capacity is the bottleneck, so a caching system was developed.
3. Latency and throughput: use a programmable ASIC.
4. High availability: Bluebird is also designed with redundancy.
5. Multitenancy support: a mandatory functional requirement.
6. Minimal overhead on host resources: effectively zero; bare-metal performance is delivered as-is.
7. Seamless integration: realized by Bluebird alone, with no changes on the bare-metal side.
8. External network access: NAT is supported so bare-metal hosts can reach the Internet directly.
9. Interoperability: cooperates with the existing SDN stack and behaves transparently.

Slide 9

4. System Design (1/5): packet flow
# Baremetal -> VM
• VLAN 400 -> VRF/VNI 20500
• Destination MAC rewritten at the ToR
• VXLAN tunnel between ToR and VFP
# VM -> Baremetal
• VXLAN tunnel between VFP and ToR
• VRF/VNI 20500 -> VLAN 400
• Destination MAC resolved at the ToR
(A scapy sketch of the Baremetal -> VM rewrite follows below.)
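The following is a minimal scapy sketch, under assumed addresses, of the Baremetal -> VM path above: the ToR pops the VLAN 400 tag, rewrites the destination MAC, and VXLAN-encapsulates toward the VFP host with VNI 20500. Every MAC/IP value here is hypothetical; in Bluebird this state comes from the VTEP table and the rewrite happens in the Tofino pipeline, not in Python.

```python
from scapy.layers.l2 import Ether, Dot1Q
from scapy.layers.inet import IP, UDP
from scapy.layers.vxlan import VXLAN

# Frame as it arrives from the bare-metal server, tagged VLAN 400
inner = (Ether(src="00:00:00:aa:aa:aa", dst="00:00:00:bb:bb:bb") /
         Dot1Q(vlan=400) /
         IP(src="10.0.0.4", dst="10.0.0.5"))

# ToR: pop the VLAN tag and rewrite the destination MAC (values hypothetical)
rewritten = Ether(src="00:00:00:cc:cc:cc", dst="00:00:00:dd:dd:dd") / inner[IP]

# ToR: VXLAN-encapsulate toward the VFP host's underlay (PA) address
outer = (Ether() /
         IP(src="100.64.1.1", dst="100.64.1.7") /  # ToR PA -> host PA (assumed)
         UDP(sport=49152, dport=4789) /            # 4789 = VXLAN well-known port
         VXLAN(vni=20500) /
         rewritten)
outer.show()
```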

Slide 10

4. System Design (2/5): overview
• Trade-off among device cost, memory (FIB), and NPU/ASIC features
  • Core router: expensive, large capacity, feature-rich
  • Bluebird: cheap, moderate capacity, feature-rich (built in-house)
  • NetApp's requirement (240 Gbps, <4 ms) met with a 6.4 Tbps ToR
• Designing the P4 pipeline was hard work
  • Goal: maximize the number of CA-PA mappings expressed in the VTEP (VXLAN Tunnel Endpoint) table
  • Shrank Tofino's IPv4/v6 unicast FIB and grew the VTEP table from 16K to 192K entries
  • Enough? -> No. It was enough at launch, but...
  • Caching the mapping state made it possible to handle more than 192K entries

Slide 11

4. System Design (3/5): P4 platform/pipeline
• Adopted Tofino-1
  • 6.4 Tbps, 12 stages, 256x25G SerDes, quad-core 2.2 GHz CPU, on an Arista 7170
  • Clears the 192K CA-to-PA mapping requirement
• P4 pipeline tricks
  • A naive implementation cannot fit the CA-to-PA table when the underlay uses IPv6
  • Solved with a custom P4 pipeline
• Switching the ToR profile switches to a different P4 program
• The BM -> VFP destination MAC is deployed as a static route on the BM side
• https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4

Slide 12

4. System Design (4/5): route cache
• The 192K CA-PA mapping limit emerged as a bottleneck
  • Option 1: use Tofino-2 (1.5M CA-PA mappings)
  • Option 2: build a cache mechanism
    • Keep mappings that actually carry traffic in HW (Tofino) as much as possible
    • Demote routes to SW (CPU) by LRU age
• Grew effective capacity to roughly 1M mappings (see the sketch below)
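A hedged sketch of the cache idea above (not the paper's implementation): an LRU-ordered hardware table of bounded size, with the least recently used mappings demoted to a software table on the switch CPU and promoted back on use. The capacity constant is a tiny stand-in for the ~192K hardware entries.

```python
from collections import OrderedDict
from typing import Dict, Optional

HW_CAPACITY = 4  # stand-in for the ~192K Tofino entries

hw_table: "OrderedDict[str, str]" = OrderedDict()  # CA -> PA, in LRU order
sw_table: Dict[str, str] = {}                      # spillover kept on the CPU

def insert(ca: str, pa: str) -> None:
    if ca not in hw_table and len(hw_table) >= HW_CAPACITY:
        old_ca, old_pa = hw_table.popitem(last=False)  # evict the LRU route
        sw_table[old_ca] = old_pa                      # demote, don't discard
    hw_table[ca] = pa
    hw_table.move_to_end(ca)

def lookup(ca: str) -> Optional[str]:
    if ca in hw_table:            # fast path: ASIC hit at line rate
        hw_table.move_to_end(ca)
        return hw_table[ca]
    if ca in sw_table:            # slow path (~8 us penalty per the paper)
        pa = sw_table.pop(ca)
        insert(ca, pa)            # promote the now-active route back to HW
        return pa
    return None
```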

Slide 13

4. System Design (5/5): C-plane & policy
• Provisioned from an external service (Bluebird Service, BBS)
• BBS builds a goal state and pushes it
  • DAL: command sequence -> JSON-RPC -> EOS CLI
  • Computes the config diff against the target and reconciles (sketched below)
  • Each component is applied atomically; configurations are versioned
  • Keeps logical ToRs (multiple devices) consistent
• There is a BBS per AZ; a single BBS can also serve multiple AZs.
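A minimal sketch of the diff-and-reconcile step, assuming a simplified (VNI, CA, PA) entry format of my own: compare the goal state pushed by BBS with what is running on the ToR and emit only the additions and removals needed to converge. The real system drives such changes through DAL/JSON-RPC/EOS CLI and applies them atomically.

```python
from typing import List, Set, Tuple

Entry = Tuple[int, str, str]  # (VNI, customer address, physical address)

def reconcile(goal: Set[Entry], running: Set[Entry]) -> Tuple[List[Entry], List[Entry]]:
    to_add = goal - running       # present in the goal state, missing on the ToR
    to_remove = running - goal    # stale entries left over from an old config
    return sorted(to_add), sorted(to_remove)

goal = {(20500, "10.0.0.4", "100.64.1.7"), (20501, "10.1.0.9", "100.64.2.3")}
running = {(20500, "10.0.0.4", "100.64.1.7"), (20500, "10.0.0.9", "100.64.9.9")}
adds, removes = reconcile(goal, running)
print("add:", adds)        # apply atomically, then record the config version
print("remove:", removes)
```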

Slide 14

5. Performance (1/3)
• SDN ToRs in use in 42+ Azure DCs over the past two years
  • Thousands of bare-metal servers in service (including Cray ClusterStor and NetApp Files)
  • The route cache has not kicked in yet (likely to in about a year)
  • Test setup: 40 Gbps NIC, Xeon E5-2673 v4 (2.3 GHz) on Windows Server 2019

Slide 15

5. Performance (2/3)
• SDN ToR snake test
  • Nearly 100 Gbps at <1 us
  • A good match for bandwidth- and latency-sensitive BM workloads
  • Power efficiency on par with conventional ToRs
• Route-cache latency
  • 8 us penalty
  • SFE forwarding delay plus the SW -> HW entry migration delay

Slide 16

5. Performance (3/3)
• Validating the route cache
  • Production data shows only ~25% of mappings carry "active" traffic
  • The other 75% can be moved to SW (CPU)
  • So more than 192K CA-PA entries become usable
  • Routes are classified into buckets by age (sketched below)
  • How aggressively routes are demoted is tunable
(Figure: percentage of active mappings resident in HW (Tofino))
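A small sketch of the age-based bucketing mentioned above. The bucket thresholds are assumptions for illustration; the tunable point is which bucket gets demoted from HW (Tofino) to SW (CPU) first.

```python
import time
from typing import Optional

BUCKET_EDGES = [60, 600, 3600]  # seconds since last hit; thresholds assumed

def age_bucket(last_hit: float, now: Optional[float] = None) -> int:
    """0 = most recently active bucket; higher index = staler."""
    if now is None:
        now = time.time()
    age = now - last_hit
    for i, edge in enumerate(BUCKET_EDGES):
        if age < edge:
            return i
    return len(BUCKET_EDGES)

# Demoting from the stalest bucket first, and tuning BUCKET_EDGES, controls
# how aggressively mappings move from HW (Tofino) to SW (CPU).
print(age_bucket(time.time() - 5))     # -> 0 (active)
print(age_bucket(time.time() - 7200))  # -> 3 (stalest)
```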

Slide 17

6. Lessons Learned (1/2)
• Packet mirroring: mirror at the ToR CPU to debug in production
• Reconfigurable ASIC: enabled features (e.g., the route-cache mechanism) that other approaches could not
• ASIC emulators: faster development; flow verification and tests by replaying packets
• C-plane tests using the ToR image: leveraged for testing
• 64-bit OS: can use all the memory -> more route-cache entries available
• Restricted C-plane feature set: only VRF/mapping add and delete; maintenance is left to other frameworks
• Processing tuned to scale: queues and batching
Reference: https://t.co/KEWgX8pfuj
(Presenter's points of interest)

Slide 18

6. Lessons Learned (2/2)
• ToR redundancy (MLAG) simplified BBS rollout and maintenance
• Why reconciliation is needed:
  • While restoring from an old config to the correct one, errors must be fixed to regain consistency
  • Add/remove config relative to the submitted config to stay consistent; same on fail-over
• Stateful reconciliation: BBS started with a stateless model, but it took too long, so it was changed; state is guaranteed via versioning etc. (sketched below)
• Safety valves increase operational toil:
  • Until the route cache was ready, per-customer mapping counts were capped (for safety, but the cap was too low)
  • Limits had to be raised on demand; even after raising them, actual usage did not grow that much
• ToR OS images are re-imaged rather than patched: simpler and easier to manage, and service quality improved
• The ToR OS is an ordinary Linux: "ordinary" tools such as tcpdump and iperf work, and certificate renewal and Docker containers work just as on servers
(Presenter's points of interest)
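A hedged sketch of what "stateful reconciliation" can look like, assuming per-ToR version bookkeeping of my own devising (the slide does not spell out BBS's exact mechanism): remembering the last applied goal state and its version lets a push diff against cached state instead of recomputing from scratch, and lets stale pushes be rejected.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str, str]  # (VNI, CA, PA), as in the earlier sketch

class BBSState:
    """Per-ToR memory of the last applied goal state (assumed model)."""

    def __init__(self) -> None:
        self.applied: Dict[str, Tuple[int, Set[Entry]]] = {}  # ToR -> (version, entries)

    def push(self, tor: str, version: int,
             goal: Set[Entry]) -> Tuple[List[Entry], List[Entry]]:
        old_version, old_entries = self.applied.get(tor, (0, set()))
        if version <= old_version:
            return [], []                       # stale push; already reconciled
        adds = sorted(goal - old_entries)
        removes = sorted(old_entries - goal)
        self.applied[tor] = (version, goal)     # remember for the next diff
        return adds, removes

bbs = BBSState()
adds, removes = bbs.push("tor-1", 1, {(20500, "10.0.0.4", "100.64.1.7")})
print(adds, removes)  # first push installs everything, removes nothing
```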

Slide 19

7. Related Work
• OpenNF, Embark, ClickOS, NFV work, serverless NF work, middle-box work, OpenFlow work
  • Do not meet the Azure bare-metal service requirements (high bandwidth, low latency)
• SmartNICs cannot meet this set of requirements
• Switch + server designs -> high power consumption
• Resource limits of programmable switches
  • Caching, upgrading to Tofino-2, extending switch memory
• SDN is not only about multi-tenancy: FBOSS, B4, Egress Engineering, Jupiter, Robotron, Espresso

Slide 20

Conclusions and Future Work
• Design, implementation, and experience of Bluebird
  • An SDN ToR system for Azure bare-metal cloud services
  • Two years in production under the (demanding) workloads of NetApp, Cray, and SAP
  • Programmable ASIC + home-grown cache mechanism
  • Future work: improved cache algorithms and support for more diverse workloads

Slide 21

Key takeaways
• Azure bare-metal services (NetApp, etc.) are covered by VLAN/VXLAN translation on a P4 ToR
• The HW capacity shortfall is solved with a cache (a software-side trick)
• Two years of operation; performance (100 Gb/s line rate at <1 us latency) and experience shared

Slide 22

EoP