Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#37 “Bluebird: High-performance SDN for Bare-me...
Search
cafenero_777
June 22, 2023
Technology
1
150
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
NSDI 2022
https://www.usenix.org/conference/nsdi22/presentation/arumugam
cafenero_777
June 22, 2023
Tweet
Share
More Decks by cafenero_777
See All by cafenero_777
#51 “Empowering Azure Storage with RDMA”
cafenero_777
3
530
#49 “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”
cafenero_777
2
130
#50 “Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction”
cafenero_777
0
150
#33 “Destroying networks for fun (and profit)”
cafenero_777
0
110
#34 “MTPSA: Multi-Tenant Programmable Switches”
cafenero_777
0
79
#39 “Profiling a warehouse-scale computer”
cafenero_777
0
58
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”
cafenero_777
0
270
#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
0
320
#25 “Swift: Delay is Simple and Effective for Congestion Control in the Datacenter”
cafenero_777
0
180
Other Decks in Technology
See All in Technology
Claude Code for NOT Programming
kawaguti
PRO
1
100
データの整合性を保ちたいだけなんだ
shoheimitani
8
3.2k
プロポーザルに込める段取り八分
shoheimitani
1
640
1,000 にも届く AWS Organizations 組織のポリシー運用をちゃんとしたい、という話
kazzpapa3
0
160
今こそ学びたいKubernetesネットワーク ~CNIが繋ぐNWとプラットフォームの「フラッと」な対話
logica0419
5
430
Bill One急成長の舞台裏 開発組織が直面した失敗と教訓
sansantech
PRO
2
400
SREチームをどう作り、どう育てるか ― Findy横断SREのマネジメント
rvirus0817
0
350
コスト削減から「セキュリティと利便性」を担うプラットフォームへ
sansantech
PRO
3
1.6k
GitHub Issue Templates + Coding Agentで簡単みんなでIaC/Easy IaC for Everyone with GitHub Issue Templates + Coding Agent
aeonpeople
1
260
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
2
330
Why Organizations Fail: ノーベル経済学賞「国家はなぜ衰退するのか」から考えるアジャイル組織論
kawaguti
PRO
1
190
Oracle AI Database移行・アップグレード勉強会 - RAT活用編
oracle4engineer
PRO
0
110
Featured
See All Featured
YesSQL, Process and Tooling at Scale
rocio
174
15k
A Tale of Four Properties
chriscoyier
162
24k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.7k
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
3.9k
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
590
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.1k
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
58
50k
Art, The Web, and Tiny UX
lynnandtonic
304
21k
Redefining SEO in the New Era of Traffic Generation
szymonslowik
1
220
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
61
52k
Conquering PDFs: document understanding beyond plain text
inesmontani
PRO
4
2.3k
Transcript
Research Paper Introduction #37 “Bluebird: High-performance SDN for Bare-metal Cloud
Services” ௨ࢉ#101 @cafenero_777 2022/06/09 1
Agenda •ରจ •֓ཁͱಡ͏ͱͨ͠ཧ༝ 1. Introduction 2. Background 3. Design Goals
and Rationale 4. System Design 5. Performance 6. Operationalization and Experiences 7. Related Work 8. Conclusions and Future Work 2
ରจ •Bluebird: High-performance SDN for Bare-metal Cloud Services • Manikandan
Arumugam1, et al • Arista1, Intel2, Microsoft3 • NSDI 2022 • https://www.usenix.org/conference/nsdi22/presentation/arumugam • ઌͷNSDI 2022 RecapճͰհͨ͠ͷ 3
Bluebird: High-performance SDN for Bare-metal Cloud Services Arista, Intel, Microsoft
• AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷԾNWΛP4SWͰ·͔ͳ͏ • Netapp, Cray, SAP • 100Gbps, 2ӡ༻ • ຊޠղઆهࣄ લճͷεϥΠυΑΓൈਮ
֓ཁͱಡ͏ͱͨ͠ཧ༝ •֓ཁ • AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷNWΛP4SWͰ͏·͘ܨ͙ • Մ༻ੑΛߟྀͨ͠ઃܭͰɺ<1us latencyͰ100Gb/s line-rateग़ͤΔ • ೋҎ্Քಇͨ͠ܦݧͷհ
•ಡ͏ͱͨ͠ཧ༝ • ΫϥυͰͷP4 use case • ՝ͱͦͷղܾํ๏ʢઃܭͳͲʣ͕ؾʹͳΔ 5
1. Introduction •SDN, Τϯυϗετଆ (HV)ͰD-plane࣮ • OvS, DPDK, ASIC, FGPA,
SmartNIC •ࣗࣾγεςϜͷΫϥυҠߦͷݕ౼ • ʢઐ༻ʣΞϓϥΠΞϯεΛ͍ͬͯΔʢNetApp, Cray, SAP, and HPCʣ •ϕΞϝλϧΫϥυαʔϏε/HWaaSSDNελοΫΛೖΕΒΕͳ͍ʂ •ToRϕʔεͷSDNιϦϡʔγϣϯ: Bluebird • Barefoot To fi noͷToRSmartToRΛར༻ఆ • 1<us, 100Gbps, NAT༻ͳͲͷඦສͷconntrackͷ࣮ݱ • ίϯτϩʔϧϓϨʔϯ 6
2. Background 7 HVͰશ෦ΔͷͰγϯϓϧɻ SWͰΔͷେมɻagent͕Ϧιʔε͏ɻ scalability/programmabilityΛҡ࣋͠ͳ͕ΒߴੑೳԽɻ ϕΞϝλϧʹ͋·Γద͞ͳ͍ɻʢෳࡶա͗ΔɻVFPվʁʣ ϕΞϝλϧͷΘΓʹToRͰෳࡶͳ͜ͱ͕Ͱ͖Δɻ ࠓճVRF(ސ٬ຖͷNWׂ)ͱVRFຖͷCA-PA mapping
(VxLAN static route) ֤छrouting/tunnelingॲཧΛP4Ͱ࣮ɻ
3. Design Goals and Rationale 1. Programmability: VFPͱಉͳSDNελοΫɻ࣌ͱͱʹཁ͕݅มΘ͍͕ͬͯ͘ҡ࣋͢Δඞཁ͋Γɻ 2. Scalability:
ToRͷϝϞϦ༰ྔ͕ϘτϧωοΫͷͨΊɺΩϟογϡγεςϜΛ։ൃɻ 3. Latency and Throughput: Programmable ASICΛར༻ɻ 4. High availability: BluebirdઃܭΛͨ͠ɻ 5. Multitenancy support: ඞਢͳػೳཁ݅ɻ 6. Minimal overhead on host resources: θϩʹͳΔɻϕΞϝλϧੑೳͦͷ··ग़ͤΔɻ 7. Seamless integration: ϕΞϝλϧଆΛมߋͤͣʹɺBluebird͚ͩͰ࣮ݱɻ 8. External network access: ϕΞϝλϧ͕Πϯλʔωοτͱܨ͛ΔΑ͏ʹNATΛαϙʔτɻ 9. Interoperability: طଘͷSDNελοΫͱ࿈ܞ͠ಁաతͳಈ࡞Λ࣮ݱɻ 8
4. System Design (1/5) ύέοτͷྲྀΕ # Baremetal -> VM •
VLAN 400 -> VRF/VNI 20500 • ѼઌMACΛToRͰม • ToR/VFPؒVXLANτϯωϧ 9 # VM -> Baremetal • VFP/ToRؒVXLANτϯωϧ • VRF/VNI 20500 -> VLAN 400 • ѼઌMACΛToRͰղܾ
4. System Design (2/5) ֓ཁ •σόΠείετɾϝϞϦʢFIBʣɾNPU/ASICػೳͷτϨʔυΦϑ • ίΞϧʔλ: ߴ͍ɾେ༰ྔɾଟػೳ •
Bluebird: ͍҆ɾͦΕͳΓͷྔɾଟػೳʢࣗ࡞ʣ • NetAppͷཁ݅ʢ240Gbps, <4msʣΛ6.4TbpsͳToRΛͬͯղܾ •P4ύΠϓϥΠϯઃܭʹۤ࿑ • VTEP (VXLAN Tunnel Endpoint) tableͰදݱ͞ΕΔCA-PAϚοϐϯάΛ࠷େԽ͍ͨ͠ • To fi noͷIPv4/v6 unicast FIBΛॖখ͠ɺVTEP tableΛ16K -> 192Kʹ૿ͨ͠ • ेʁ -> NO, ։࢝ॳे͕ͩͬͨɺɺɺ • mappingใΛΩϟογϡͤ͞ɺ192KΤϯτϦҎ্Λ͚͞ΔΑ͏ʹͳͬͨ 10
4. System Design (3/5) P4 Platform/pipeline •To fi no-1ͷ࠾༻ •
6.4Tbps, 12stage, 256*25G SerDes, Quad-core 2.2Ghz CPU on Arista 7170 • 192K CA-to-PA mappingཁ݅ΛΫϦΞ •P4 Pipelineͷ • ૉͳ࣮ͩͱΞϯμʔϨΠʹIPv6Λ͏߹CA-to-PAαΠζ֬อෆՄ • ΧελϜP4ύΠϓϥΠϯΛ͏͜ͱͰ͜ΕΛղܾ •ToRͷϓϩϑΝΠϧΛΓସ͑Δ͜ͱͰɺҟͳΔP4ϓϩάϥϜʹΓସ͑ •BM->VFPͷѼઌMACBMଆͰstatic routeͱͯ͠deploy •https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4 11
4. System Design (4/5) route cache •192K CA-PA mappingͷϘτϧωοΫ͕ݟ͖͑ͯͨ •
ղܾҊ1: To fi no2 (1.5M CA-PA mapping)Λ͏ • ղܾҊ2: cacheػߏΛ࡞Δ • ࣮ࡍʹ௨৴ͨ͠ΒͳΔ͘HW (To fi no)͏ • LRU age/routeͰSW (CPU)ʹୀආ •1Mఔ·Ͱ૿ͤͨ 12
4. System Design (5/5) C-plane & policy •֎෦αʔϏε(Bluebird Service) ͔ΒϓϩϏδϣχϯά͢Δ
•BBS: goal-stateΛ࡞ͬͯpush͢Δ • DAL: ίϚϯυγʔέϯε->JSON-RPC->EOS CLI • λʔήοτͱͷcon fi gࠩΛܭࢉͯ͠reconciliation͢Δ • ֤ߏཁૉΞτϛοΫॲཧɺߏόʔδϣϯཧ͞ΕΔ • ཧToRʢෳʣͷҰ؏ੑରԠ •BBSAZ͝ͱʹ͋ΔɻҰͭͷBBSෳAZαϙʔτՄೳɻ 13
5. Performance (1/3) •AzureͰաڈ2Ͱ42Ҏ্ͷDCͰSDN-ToRར༻ • ઍنͷϕΞϝλϧαʔόʢCray ClusterStor, and NetApp FilesؚΉʣ͕Քಇ
• route cache·ͩൃಈͤͣʢҰޙ͙Β͍ʹൃಈͦ͠͏ʣ • 40Gbps NIC, Xeon E5-2673 v4 (2.3GHz) on Windows Server 2019 14
5. Performance (2/3) •SDN ToR εωʔΫςετ • <1usͰ΄΅100Gbps • ଳҬɾϨΠςϯγʹහײͳBMϫʔΫϩʔυʹ߹͍ͬͯΔ
• ిྗޮطଘͷToRͱมΘΒͣ •route cacheͷԆ • 8usԆ • SFEసૹԆͱSFW->HWΤϯτϦҠಈԆ 15
5. Performance (3/3) •route cacheͷݕূ • ࣮Քಇͷσʔλతʹ~25%ఔ͕”active”ͳ௨৴ • 75%SW (CPU)ʹҠߦՄೳ
• ͭ·Γ192K PA-CAΤϯτϦҎ্͕ར༻Մೳ • route͝ͱʹageͰbucketྨ • ͲͷఔੵۃతʹҠಈ͍͔ͤͨ͞νϡʔχϯάՄೳ 16 HW(To fi no)ʹ͍ͬͯΔactiveͳmapping(%)
6. Lessons Learned (1/2) •packet mirroring: ToR CPUͰϛϥʔϦϯάͯ͠ຊ൪Ͱσόοά •Re-con fi
gurable ASIC: route cacheػߏͳͲɺʢଞͷํ๏ͰͰ͖ͳ͔ͬͨʣػೳΛ։ൃͰ͖ͨ •ASIC emulators: ։ൃͷߴԽɻύέοτྲྀͯ͠ϑϩʔݕূςετՄೳɻ •ToR imageΛͬͨC−planeςετ: ςετͰ׆༻ •64bit OS: ϝϞϦ͍ͬͺ͍͑Δ-> route cacheΤϯτϦΛଟ͘ར༻Ͱ͖Δ •C-planeͷػೳ੍ݶ: VRF/mappingՃɾআͷΈɻϝϯςφϯεଞͷϑϨʔϜϫʔΫʹͤΔ •نʹԠͨ͡ॲཧௐ: Ωϡʔͱόονॲཧ 17 ࢀߟ: https://t.co/KEWgX8pfuj ղઆऀͷ ؾʹͳΔ
6. Lessons Learned (2/2) •ToRԽʢMLAGʣʹΑΔBBSಋೖɾҡ࣋ͷ؆қԽ •Reconciliationͷඞཁੑɿ • ݹ͍ઃఆ͔Βਖ਼͍͠ઃఆʹ͢ʢ෮ݩϓϩηεʣͷதͰΤϥʔΛमਖ਼ͯ͠߹ੑΛऔΔඞཁ͋Γɻ • ೖઃఆͱͷࠩΛߟྀͯ͠ઃఆՃɾআΛߦ͍ɺ߹ੑΛอͭɻfail-over࣌ಉ༷ɻ
•Stateful Reconciliation: BBS࠷ॳstatelessϞσϧ͕ͩͬͨɺॲཧʹֻ͕͔࣌ؒΓա͗ͨͷมߋɻόʔδϣϯཧͳͲͰstate୲อ •҆શห͕ӡ༻ͷ૿ՃΛҾ͖ى͜͢ɿ • route cache͕͑ΔΑ͏ʹͳΔ·Ͱɺސ٬༻ͷmappingΛ੍ݶͨ͠ʢ҆શͷͨΊɻ͕ɺ੍ݶ͕͗ͨ͢ʣ • ্ݶΛΦϯσϚϯυͰ্͛Δඞཁ͋Γɻ੍ݶΛ্࣮͛ͯࡍͦ͜·Ͱ૿͑ͳ͔ͬͨ •ToR OS imagepatchΛͯΔͷͰͳ͘ম͖͢ɻ͜ͷํ͕ཧ͕୯७͔ͭ༰қɺαʔϏε্࣭ •ToR OSී௨ͷlinux OS, tcpdumpiperfͳͲ”ී௨ͷ”πʔϧ͕͑ɺূ໌ॻͷߋ৽dockerίϯςφαʔόͱಉ͡Α͏ʹར༻Ͱ͖Δ 18 ղઆऀͷ ؾʹͳΔ
7. Related Work •OpenNF, Embark, ClickOS, NFVܥ, Serverless NFܥ, middle-boxܥ,
OpenFlowܥ • Azure bare-metalαʔϏεཁ݅ʢଳҬɾԆʣʹ߹Θͳ͍ •SmartNICࠓճͷཁ݅ʹ͑ͳ͍ •εΠον+αʔόߏ -> ফඅిྗ͕ߴ͍ •ϓϩάϥϚϒϧεΠονͷϦιʔε੍ݶ • ΩϟογϡɾTo fi no-2ͷupgrade, εΠονͷϝϞϦ֦ு •SDNmulti-tenancy͚ͩͷͷͰͳ͍: FBOSS, B4, EgressEngineering, Jupiter, Robotron, Espresso 19
Conclusions and Future Work •Bluebirdͷઃܭɾ࣮ɾܦݧ • Azure ϕΞϝλϧΫϥυαʔϏε༻ͷSDN ToRγεςϜ •
Neap, Cray, SAPͷʢݫ͍͠ʣϫʔΫϩʔυͰ2ؒӡ༻ • ϓϩάϥϚϒϧASIC + ࣗ࡞ͷΩϟογϡػߏ • ΩϟογϡΞϧΰϦζϜվળଟ༷ͳϫʔΫϩʔυʹରԠ༧ఆ 20
Key takeaways •AzureϕΞϝλϧαʔϏεʢNetappͳͲʣΛP4 ToRͷVLAN/VXLANมͰΧόʔ •HW༰ྔෆΩϟογϡʢSWͰͷʣͰղܾ •2ӡ༻ɺੑೳ(<1us latencyͰ100Gb/s line-rate)ܦݧΛڞ༗ 21
EoP 22