Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#37 “Bluebird: High-performance SDN for Bare-me...
Search
cafenero_777
June 22, 2023
Technology
1
110
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
NSDI 2022
https://www.usenix.org/conference/nsdi22/presentation/arumugam
cafenero_777
June 22, 2023
Tweet
Share
More Decks by cafenero_777
See All by cafenero_777
#51 “Empowering Azure Storage with RDMA”
cafenero_777
3
440
#49 “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”
cafenero_777
2
110
#50 “Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction”
cafenero_777
0
110
#33 “Destroying networks for fun (and profit)”
cafenero_777
0
81
#34 “MTPSA: Multi-Tenant Programmable Switches”
cafenero_777
0
45
#39 “Profiling a warehouse-scale computer”
cafenero_777
0
31
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”
cafenero_777
0
210
#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
0
230
#25 “Swift: Delay is Simple and Effective for Congestion Control in the Datacenter”
cafenero_777
0
140
Other Decks in Technology
See All in Technology
2/18/25: Java meets AI: Build LLM-Powered Apps with LangChain4j
edeandrea
PRO
0
120
利用終了したドメイン名の最強終活〜観測環境を育てて、分析・供養している件〜 / The Ultimate End-of-Life Preparation for Discontinued Domain Names
nttcom
2
190
なぜ私は自分が使わないサービスを作るのか? / Why would I create a service that I would not use?
aiandrox
0
740
Classmethod AI Talks(CATs) #16 司会進行スライド(2025.02.12) / classmethod-ai-talks-aka-cats_moderator-slides_vol16_2025-02-12
shinyaa31
0
110
スタートアップ1人目QAエンジニアが QAチームを立ち上げ、“個”からチーム、 そして“組織”に成長するまで / How to set up QA team at reiwatravel
mii3king
2
1.5k
Amazon S3 Tablesと外部分析基盤連携について / Amazon S3 Tables and External Data Analytics Platform
nttcom
0
130
Oracle Cloud Infrastructure:2025年2月度サービス・アップデート
oracle4engineer
PRO
1
210
OpenID Connect for Identity Assurance の概要と翻訳版のご紹介 / 20250219-BizDay17-OIDC4IDA-Intro
oidfj
0
270
Culture Deck
optfit
0
420
白金鉱業Meetup Vol.17_あるデータサイエンティストのデータマネジメントとの向き合い方
brainpadpr
6
750
あれは良かった、あれは苦労したB2B2C型SaaSの新規開発におけるCloud Spanner
hirohito1108
2
590
プロセス改善による品質向上事例
tomasagi
2
2.6k
Featured
See All Featured
GitHub's CSS Performance
jonrohan
1030
460k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
44
7k
Building Flexible Design Systems
yeseniaperezcruz
328
38k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
Scaling GitHub
holman
459
140k
Testing 201, or: Great Expectations
jmmastey
42
7.2k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7.1k
Faster Mobile Websites
deanohume
306
31k
Reflections from 52 weeks, 52 projects
jeffersonlam
348
20k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
BBQ
matthewcrist
87
9.5k
Bash Introduction
62gerente
611
210k
Transcript
Research Paper Introduction #37 “Bluebird: High-performance SDN for Bare-metal Cloud
Services” ௨ࢉ#101 @cafenero_777 2022/06/09 1
Agenda •ରจ •֓ཁͱಡ͏ͱͨ͠ཧ༝ 1. Introduction 2. Background 3. Design Goals
and Rationale 4. System Design 5. Performance 6. Operationalization and Experiences 7. Related Work 8. Conclusions and Future Work 2
ରจ •Bluebird: High-performance SDN for Bare-metal Cloud Services • Manikandan
Arumugam1, et al • Arista1, Intel2, Microsoft3 • NSDI 2022 • https://www.usenix.org/conference/nsdi22/presentation/arumugam • ઌͷNSDI 2022 RecapճͰհͨ͠ͷ 3
Bluebird: High-performance SDN for Bare-metal Cloud Services Arista, Intel, Microsoft
• AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷԾNWΛP4SWͰ·͔ͳ͏ • Netapp, Cray, SAP • 100Gbps, 2ӡ༻ • ຊޠղઆهࣄ લճͷεϥΠυΑΓൈਮ
֓ཁͱಡ͏ͱͨ͠ཧ༝ •֓ཁ • AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷNWΛP4SWͰ͏·͘ܨ͙ • Մ༻ੑΛߟྀͨ͠ઃܭͰɺ<1us latencyͰ100Gb/s line-rateग़ͤΔ • ೋҎ্Քಇͨ͠ܦݧͷհ
•ಡ͏ͱͨ͠ཧ༝ • ΫϥυͰͷP4 use case • ՝ͱͦͷղܾํ๏ʢઃܭͳͲʣ͕ؾʹͳΔ 5
1. Introduction •SDN, Τϯυϗετଆ (HV)ͰD-plane࣮ • OvS, DPDK, ASIC, FGPA,
SmartNIC •ࣗࣾγεςϜͷΫϥυҠߦͷݕ౼ • ʢઐ༻ʣΞϓϥΠΞϯεΛ͍ͬͯΔʢNetApp, Cray, SAP, and HPCʣ •ϕΞϝλϧΫϥυαʔϏε/HWaaSSDNελοΫΛೖΕΒΕͳ͍ʂ •ToRϕʔεͷSDNιϦϡʔγϣϯ: Bluebird • Barefoot To fi noͷToRSmartToRΛར༻ఆ • 1<us, 100Gbps, NAT༻ͳͲͷඦສͷconntrackͷ࣮ݱ • ίϯτϩʔϧϓϨʔϯ 6
2. Background 7 HVͰશ෦ΔͷͰγϯϓϧɻ SWͰΔͷେมɻagent͕Ϧιʔε͏ɻ scalability/programmabilityΛҡ࣋͠ͳ͕ΒߴੑೳԽɻ ϕΞϝλϧʹ͋·Γద͞ͳ͍ɻʢෳࡶա͗ΔɻVFPվʁʣ ϕΞϝλϧͷΘΓʹToRͰෳࡶͳ͜ͱ͕Ͱ͖Δɻ ࠓճVRF(ސ٬ຖͷNWׂ)ͱVRFຖͷCA-PA mapping
(VxLAN static route) ֤छrouting/tunnelingॲཧΛP4Ͱ࣮ɻ
3. Design Goals and Rationale 1. Programmability: VFPͱಉͳSDNελοΫɻ࣌ͱͱʹཁ͕݅มΘ͍͕ͬͯ͘ҡ࣋͢Δඞཁ͋Γɻ 2. Scalability:
ToRͷϝϞϦ༰ྔ͕ϘτϧωοΫͷͨΊɺΩϟογϡγεςϜΛ։ൃɻ 3. Latency and Throughput: Programmable ASICΛར༻ɻ 4. High availability: BluebirdઃܭΛͨ͠ɻ 5. Multitenancy support: ඞਢͳػೳཁ݅ɻ 6. Minimal overhead on host resources: θϩʹͳΔɻϕΞϝλϧੑೳͦͷ··ग़ͤΔɻ 7. Seamless integration: ϕΞϝλϧଆΛมߋͤͣʹɺBluebird͚ͩͰ࣮ݱɻ 8. External network access: ϕΞϝλϧ͕Πϯλʔωοτͱܨ͛ΔΑ͏ʹNATΛαϙʔτɻ 9. Interoperability: طଘͷSDNελοΫͱ࿈ܞ͠ಁաతͳಈ࡞Λ࣮ݱɻ 8
4. System Design (1/5) ύέοτͷྲྀΕ # Baremetal -> VM •
VLAN 400 -> VRF/VNI 20500 • ѼઌMACΛToRͰม • ToR/VFPؒVXLANτϯωϧ 9 # VM -> Baremetal • VFP/ToRؒVXLANτϯωϧ • VRF/VNI 20500 -> VLAN 400 • ѼઌMACΛToRͰղܾ
4. System Design (2/5) ֓ཁ •σόΠείετɾϝϞϦʢFIBʣɾNPU/ASICػೳͷτϨʔυΦϑ • ίΞϧʔλ: ߴ͍ɾେ༰ྔɾଟػೳ •
Bluebird: ͍҆ɾͦΕͳΓͷྔɾଟػೳʢࣗ࡞ʣ • NetAppͷཁ݅ʢ240Gbps, <4msʣΛ6.4TbpsͳToRΛͬͯղܾ •P4ύΠϓϥΠϯઃܭʹۤ࿑ • VTEP (VXLAN Tunnel Endpoint) tableͰදݱ͞ΕΔCA-PAϚοϐϯάΛ࠷େԽ͍ͨ͠ • To fi noͷIPv4/v6 unicast FIBΛॖখ͠ɺVTEP tableΛ16K -> 192Kʹ૿ͨ͠ • ेʁ -> NO, ։࢝ॳे͕ͩͬͨɺɺɺ • mappingใΛΩϟογϡͤ͞ɺ192KΤϯτϦҎ্Λ͚͞ΔΑ͏ʹͳͬͨ 10
4. System Design (3/5) P4 Platform/pipeline •To fi no-1ͷ࠾༻ •
6.4Tbps, 12stage, 256*25G SerDes, Quad-core 2.2Ghz CPU on Arista 7170 • 192K CA-to-PA mappingཁ݅ΛΫϦΞ •P4 Pipelineͷ • ૉͳ࣮ͩͱΞϯμʔϨΠʹIPv6Λ͏߹CA-to-PAαΠζ֬อෆՄ • ΧελϜP4ύΠϓϥΠϯΛ͏͜ͱͰ͜ΕΛղܾ •ToRͷϓϩϑΝΠϧΛΓସ͑Δ͜ͱͰɺҟͳΔP4ϓϩάϥϜʹΓସ͑ •BM->VFPͷѼઌMACBMଆͰstatic routeͱͯ͠deploy •https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4 11
4. System Design (4/5) route cache •192K CA-PA mappingͷϘτϧωοΫ͕ݟ͖͑ͯͨ •
ղܾҊ1: To fi no2 (1.5M CA-PA mapping)Λ͏ • ղܾҊ2: cacheػߏΛ࡞Δ • ࣮ࡍʹ௨৴ͨ͠ΒͳΔ͘HW (To fi no)͏ • LRU age/routeͰSW (CPU)ʹୀආ •1Mఔ·Ͱ૿ͤͨ 12
4. System Design (5/5) C-plane & policy •֎෦αʔϏε(Bluebird Service) ͔ΒϓϩϏδϣχϯά͢Δ
•BBS: goal-stateΛ࡞ͬͯpush͢Δ • DAL: ίϚϯυγʔέϯε->JSON-RPC->EOS CLI • λʔήοτͱͷcon fi gࠩΛܭࢉͯ͠reconciliation͢Δ • ֤ߏཁૉΞτϛοΫॲཧɺߏόʔδϣϯཧ͞ΕΔ • ཧToRʢෳʣͷҰ؏ੑରԠ •BBSAZ͝ͱʹ͋ΔɻҰͭͷBBSෳAZαϙʔτՄೳɻ 13
5. Performance (1/3) •AzureͰաڈ2Ͱ42Ҏ্ͷDCͰSDN-ToRར༻ • ઍنͷϕΞϝλϧαʔόʢCray ClusterStor, and NetApp FilesؚΉʣ͕Քಇ
• route cache·ͩൃಈͤͣʢҰޙ͙Β͍ʹൃಈͦ͠͏ʣ • 40Gbps NIC, Xeon E5-2673 v4 (2.3GHz) on Windows Server 2019 14
5. Performance (2/3) •SDN ToR εωʔΫςετ • <1usͰ΄΅100Gbps • ଳҬɾϨΠςϯγʹහײͳBMϫʔΫϩʔυʹ߹͍ͬͯΔ
• ిྗޮطଘͷToRͱมΘΒͣ •route cacheͷԆ • 8usԆ • SFEసૹԆͱSFW->HWΤϯτϦҠಈԆ 15
5. Performance (3/3) •route cacheͷݕূ • ࣮Քಇͷσʔλతʹ~25%ఔ͕”active”ͳ௨৴ • 75%SW (CPU)ʹҠߦՄೳ
• ͭ·Γ192K PA-CAΤϯτϦҎ্͕ར༻Մೳ • route͝ͱʹageͰbucketྨ • ͲͷఔੵۃతʹҠಈ͍͔ͤͨ͞νϡʔχϯάՄೳ 16 HW(To fi no)ʹ͍ͬͯΔactiveͳmapping(%)
6. Lessons Learned (1/2) •packet mirroring: ToR CPUͰϛϥʔϦϯάͯ͠ຊ൪Ͱσόοά •Re-con fi
gurable ASIC: route cacheػߏͳͲɺʢଞͷํ๏ͰͰ͖ͳ͔ͬͨʣػೳΛ։ൃͰ͖ͨ •ASIC emulators: ։ൃͷߴԽɻύέοτྲྀͯ͠ϑϩʔݕূςετՄೳɻ •ToR imageΛͬͨC−planeςετ: ςετͰ׆༻ •64bit OS: ϝϞϦ͍ͬͺ͍͑Δ-> route cacheΤϯτϦΛଟ͘ར༻Ͱ͖Δ •C-planeͷػೳ੍ݶ: VRF/mappingՃɾআͷΈɻϝϯςφϯεଞͷϑϨʔϜϫʔΫʹͤΔ •نʹԠͨ͡ॲཧௐ: Ωϡʔͱόονॲཧ 17 ࢀߟ: https://t.co/KEWgX8pfuj ղઆऀͷ ؾʹͳΔ
6. Lessons Learned (2/2) •ToRԽʢMLAGʣʹΑΔBBSಋೖɾҡ࣋ͷ؆қԽ •Reconciliationͷඞཁੑɿ • ݹ͍ઃఆ͔Βਖ਼͍͠ઃఆʹ͢ʢ෮ݩϓϩηεʣͷதͰΤϥʔΛमਖ਼ͯ͠߹ੑΛऔΔඞཁ͋Γɻ • ೖઃఆͱͷࠩΛߟྀͯ͠ઃఆՃɾআΛߦ͍ɺ߹ੑΛอͭɻfail-over࣌ಉ༷ɻ
•Stateful Reconciliation: BBS࠷ॳstatelessϞσϧ͕ͩͬͨɺॲཧʹֻ͕͔࣌ؒΓա͗ͨͷมߋɻόʔδϣϯཧͳͲͰstate୲อ •҆શห͕ӡ༻ͷ૿ՃΛҾ͖ى͜͢ɿ • route cache͕͑ΔΑ͏ʹͳΔ·Ͱɺސ٬༻ͷmappingΛ੍ݶͨ͠ʢ҆શͷͨΊɻ͕ɺ੍ݶ͕͗ͨ͢ʣ • ্ݶΛΦϯσϚϯυͰ্͛Δඞཁ͋Γɻ੍ݶΛ্࣮͛ͯࡍͦ͜·Ͱ૿͑ͳ͔ͬͨ •ToR OS imagepatchΛͯΔͷͰͳ͘ম͖͢ɻ͜ͷํ͕ཧ͕୯७͔ͭ༰қɺαʔϏε্࣭ •ToR OSී௨ͷlinux OS, tcpdumpiperfͳͲ”ී௨ͷ”πʔϧ͕͑ɺূ໌ॻͷߋ৽dockerίϯςφαʔόͱಉ͡Α͏ʹར༻Ͱ͖Δ 18 ղઆऀͷ ؾʹͳΔ
7. Related Work •OpenNF, Embark, ClickOS, NFVܥ, Serverless NFܥ, middle-boxܥ,
OpenFlowܥ • Azure bare-metalαʔϏεཁ݅ʢଳҬɾԆʣʹ߹Θͳ͍ •SmartNICࠓճͷཁ݅ʹ͑ͳ͍ •εΠον+αʔόߏ -> ফඅిྗ͕ߴ͍ •ϓϩάϥϚϒϧεΠονͷϦιʔε੍ݶ • ΩϟογϡɾTo fi no-2ͷupgrade, εΠονͷϝϞϦ֦ு •SDNmulti-tenancy͚ͩͷͷͰͳ͍: FBOSS, B4, EgressEngineering, Jupiter, Robotron, Espresso 19
Conclusions and Future Work •Bluebirdͷઃܭɾ࣮ɾܦݧ • Azure ϕΞϝλϧΫϥυαʔϏε༻ͷSDN ToRγεςϜ •
Neap, Cray, SAPͷʢݫ͍͠ʣϫʔΫϩʔυͰ2ؒӡ༻ • ϓϩάϥϚϒϧASIC + ࣗ࡞ͷΩϟογϡػߏ • ΩϟογϡΞϧΰϦζϜվળଟ༷ͳϫʔΫϩʔυʹରԠ༧ఆ 20
Key takeaways •AzureϕΞϝλϧαʔϏεʢNetappͳͲʣΛP4 ToRͷVLAN/VXLANมͰΧόʔ •HW༰ྔෆΩϟογϡʢSWͰͷʣͰղܾ •2ӡ༻ɺੑೳ(<1us latencyͰ100Gb/s line-rate)ܦݧΛڞ༗ 21
EoP 22