#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
June 19, 2023
ACM SIGCOMM ’13
https://dl.acm.org/doi/10.1145/2534169.2486026
Transcript
Research Paper Introduction #24 “Ananta: Cloud Scale Load Balancing” (#75 overall)
@cafenero_777 2021/06/24 (P.1)
Agenda (P.2)
• Target paper
• Overview and why I read it
1. INTRODUCTION
2. BACKGROUND
3. DESIGN
4. IMPLEMENTATION
5. MEASUREMENTS
6. OPERATIONAL EXPERIENCE
7. RELATED WORK
8. CONCLUSION

Target paper (P.3)
• Ananta: Cloud Scale Load Balancing
• Parveen Patel, Deepak Bansal, Lihua Yuan, Ashwin Murthy, Albert Greenberg, David A. Maltz, Randy Kern, Hemant Kumar, Marios Zikos, Hongyu Wu, Changhoon Kim, Naveen Karri
• Microsoft
• ACM SIGCOMM ’13
• https://dl.acm.org/doi/10.1145/2534169.2486026

Overview and why I read it (P.4)
• Overview
  • Ananta: Scalable L4LB (DSR/NAT)
  • 100Gbps for a single VIP, more than 1Tbps of bandwidth in total
  • Runs on Azure
• Why I read it, and impressions
  • Cited in Azure's VFP paper
  • Cited fairly often by other LB papers (Maglev and others)
• Citation graph: https://www.connectedpapers.com/main/5c295df1a7f302c97f6f379eab6abba592811d42/Ananta-cloud-scale-load-balancing/graph
• Graph labels: Ananta, Maglev, SilkRoad, Beamer, Faild; a Middlebox-related group and a distributed / high-efficiency group
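1. Introduction (P.5)
• Spread of cloud computing
  • High availability (SLA), multi-tenancy, large-scale traffic
  • 100Gbps per VIP, 1000 hosts/VIP, 6-60 operations per minute
  • 36% of all SLA violations/failures are LB-related
• Ananta (Sanskrit for "infinite")
  • Scalable L4 LB (NAT/DSR)
  • D-plane: ECMP (in NW), LB, NAT (in VFP/HV)
  • C-plane: SDN/Paxos, S-NAT coordination
  • Deployed on Azure since 2011/09 with 100 instances, 1Tbps, 100k VIPs
• Presents measurement and operational results for an L4LB in the cloud, covering networked distributed systems and scaling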
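2. BACKGROUND (P.6)
• Data Center Clos NW: 10G servers * 40k, oversubscription 1:4, 400Gbps at the border
• Nature of VIP traffic
  • 44% is VIP traffic (intra-DC : outside DC = 2:1)
  • Inter-DC traffic is 1:1 in/out, mostly data synchronization
• Summary of requirements
1. “Scale, Scale and Scale”:
  • Cost (1% of server cost; 400 units is too many)
  • Up to 100Gbps and 1M concurrent connections per VIP, 100 configuration changes per minute
2. Reliability: automatic recovery in an N+1 configuration, support for maintenance
3. “Any Service Anywhere”: must not be tied to L2 domain limits
4. Tenant isolation: countermeasures against DoS impact caused by sharing the LB (other customers losing their bandwidth)
• Figure labels: 400Gbps, 100Tbps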
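3. DESIGN (1/5) Principles & Architecture (P.7)
• Designed so that it can scale out
  • Like a router, it avoids having a mechanism that maintains per-flow state
  • It does not use anything that requires full synchronization
    • i.e. WRR (Weighted Round Robin) and Weighted Random
• Hard processing is done on the HV side (offloaded)
  • ACL, Rate Limit, Metering
• Ananta Manager (AM), Multiplexer (Mux), Host Agent (HA)
• Inbound: IP-in-IP, NAT and DSR
• Outbound: DIP->VIP. The VIP:sport mapping is kept in sync between the Mux and the HA
• Figure labels: advertise VIP / ECMP selection / IP-in-IP; L3 routing; decap/DNAT; reverse NAT; DSR (no encap); request sport and VIP; configure sport and VIP; dispatch to the VM by dport and VIP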
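The outbound (S-NAT) flow in the figure labels above, where the host requests a source port on the VIP and return traffic is dispatched by destination port, can be sketched roughly as follows. This is only an illustration: `SnatAllocator`, the batch size, and the port range are hypothetical names and values, not taken from the deck or from Ananta's implementation.

```python
class SnatAllocator:
    """Hypothetical stand-in for the controller side of S-NAT port allocation."""

    def __init__(self, vip, first_port=1024, last_port=65535, batch=8):
        self.vip = vip
        self.next_port = first_port
        self.last_port = last_port
        self.batch = batch          # assumption: ports are handed out in small batches
        self.port_to_dip = {}       # mapping a Mux would consult for return traffic

    def allocate(self, dip):
        """Reserve a contiguous batch of VIP source ports for one DIP."""
        if self.next_port + self.batch - 1 > self.last_port:
            raise RuntimeError("no free SNAT ports left on this VIP")
        ports = list(range(self.next_port, self.next_port + self.batch))
        self.next_port += self.batch
        for port in ports:
            self.port_to_dip[port] = dip
        return ports

    def dip_for_return_packet(self, dport):
        """Return traffic to VIP:dport goes to the DIP that owns that port."""
        return self.port_to_dip.get(dport)
```

Because each port belongs to exactly one DIP, the return path needs no per-connection state beyond this port-to-DIP table.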
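3. DESIGN (2/5) Principles & Architecture (P.8)
• Fastpath: for VIP-to-VIP communication, bypass the LB and let the HVs talk to each other
  • At first, traffic goes through the LB
  • Once the 3-way handshake completes, the DIP mapping information is sent as a redirect
  • The HAs let the endpoints communicate
  • From then on, traffic does not pass through the LB
  • Countermeasures against hijacking are needed
• Figure labels: "this connection is mapped to DIP2"; DIP1 is associated as well; DIP1/DIP2 communicate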
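The Fastpath steps above can be sketched as a small redirect table on the host agent. This is a minimal sketch under stated assumptions: the `HostAgent` class, its method names, and the "only accept redirects from known Muxes" check are hypothetical illustrations of the hijacking concern, not the mechanism described in the deck.

```python
class HostAgent:
    """Hypothetical host-agent view of Fastpath redirects."""

    def __init__(self, trusted_muxes):
        self.trusted_muxes = set(trusted_muxes)  # assumption: known Mux identities
        self.fastpath = {}                       # flow -> peer DIP learned from a redirect

    def on_redirect(self, sender, flow, peer_dip):
        """Install a direct path only if the redirect comes from a trusted Mux."""
        if sender not in self.trusted_muxes:
            return False                         # ignore possible hijack attempts
        self.fastpath[flow] = peer_dip
        return True

    def next_hop(self, flow):
        """flow = (src_vip, src_port, dst_vip, dst_port).

        Before a redirect, packets are addressed to the destination VIP (via the LB);
        afterwards they go straight to the peer DIP.
        """
        dst_vip = flow[2]
        return self.fastpath.get(flow, dst_vip)
```

Until a valid redirect arrives, `next_hop` keeps returning the VIP, so a dropped or rejected redirect only costs performance, not connectivity.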
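3. DESIGN (3/5) Mux/Host Agent (P.9)
• Mux Pool (a set of Muxes)
• Mux: BGP speaker. Advertises VIPs; routes are withdrawn on failure. TCP MD5 authentication
  • The AM deploys the VIP/DIP mapping to the Muxes. Selection is by 5-tuple, and the hash function and seed are shared by all Muxes (so a packet gets the same treatment whichever Mux ECMP delivers it to)
  • Once a mapping has been looked up, the flow is kept. However, the memory quota is separate (SYN-flood countermeasure)
    • Trusted flows: multiple packets seen -> longer timeout
    • Untrusted flows: a single packet seen -> shorter timeout
  • When a Mux goes down, ECMP reshuffles everything
    • If a mapping changes while a Mux is down, flows cannot be maintained -> use a DHT (distributed hash table)
• Host Agent: present on every HV; handles Fastpath, NAT and health checks (explained on P.8)
  • Port reuse feature
  • Health checks are done on the HA side, not on the Mux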
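The point about sharing the hash function and seed across Muxes can be illustrated with a small sketch: if every Mux hashes the 5-tuple the same way over the same ordered DIP list, any Mux picks the same DIP for a given flow. The seed value, the use of BLAKE2, and the weighted selection scheme below are assumptions for illustration, not details from the deck.

```python
import hashlib
import struct
from bisect import bisect_right
from itertools import accumulate

SHARED_SEED = b"hypothetical-seed"  # assumption: identical on every Mux


def flow_hash(five_tuple, seed=SHARED_SEED):
    """Keyed 64-bit hash of (src_ip, src_port, dst_ip, dst_port, proto)."""
    src_ip, src_port, dst_ip, dst_port, proto = five_tuple
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.blake2b(key, key=seed, digest_size=8).digest()
    return struct.unpack(">Q", digest)[0]


def select_dip(five_tuple, dips, seed=SHARED_SEED):
    """Pick a DIP for a flow.

    dips is a list of (dip, weight) pairs with positive integer weights,
    in the same order on every Mux.
    """
    cumulative = list(accumulate(weight for _, weight in dips))
    point = flow_hash(five_tuple, seed) % cumulative[-1]
    return dips[bisect_right(cumulative, point)][0]
```

Since the choice depends only on the packet header and the deployed mapping, two Muxes need no communication to agree on a flow's DIP, which is what lets ECMP spread packets across them freely.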
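3. DESIGN (4/5) Ananta Manager/Tenant Isolation (P.10)
• Ananta Manager (AM)
  • Paxos-based distributed controller
  • Runs with 5 replicas; processes normally with 3 or more replicas
  • S-NAT: port allocation is handled in bulk
• Tenant isolation
  • It is enough for each Mux to implement tenant isolation independently
  • AM: requests are served FCFS (first-come-first-serve). In addition, similar new requests are dropped. (2)
  • Mux: when a VIP exceeds its appropriate bandwidth, the Mux drops and rate-limits with a probability proportional to the excess bandwidth
  • The top-talker VIP (the one sending the most traffic) is moved off the Mux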
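The Mux-side rate limiting above (drop with a probability proportional to the excess bandwidth) can be sketched as follows. The exact formula is an assumption made for illustration: here the drop probability is the excess divided by the observed rate, so that the expected rate after dropping equals the fair share. The function names are hypothetical.

```python
import random


def drop_probability(observed_bps, fair_share_bps):
    """Fraction of packets to drop so the expected passed rate equals the fair share.

    Assumed formula: (observed - fair_share) / observed when over the share, else 0.
    """
    if observed_bps <= fair_share_bps:
        return 0.0
    return (observed_bps - fair_share_bps) / observed_bps


def should_drop(observed_bps, fair_share_bps, rng=random):
    """Per-packet decision; needs no shared state between Muxes."""
    return rng.random() < drop_probability(observed_bps, fair_share_bps)
```

For example, a VIP observed at 400 Mbps with a 100 Mbps share would have roughly three out of four packets dropped, while a VIP within its share is never touched.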
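3. DESIGN (5/5) Alternatives (P.11)
• DNS-based LB
  • Hard to predict the load distribution in advance (requests from clients are skewed) ???
  • It takes time for DNS caches to expire
  • Cannot do stateful processing (NAT and so on)
• OpenFlow-based LB
  • Commercial OpenFlow devices hold only 2-4k flows (a Mux wants to keep state for around millions of flows)
  • Tenant isolation functionality
  • Cannot advertise via BGP (leave it to the AM?)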
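4. IMPLEMENTATION (P.12)
• AM: responsiveness is important
  • Lock-free design in the style of SEDA (Staged event-driven Arch.)
    • Shared thread pool (total count is limited)
    • Priorities (e.g. VIP creation is prioritized)
  • Implemented with Paxos SDK + Discovery + Health Monitoring
    • Guarantees that the primary does the processing
    • Guarantees that no more than one AM instance goes down during an upgrade
• Mux: packet processing in the kernel (driver) + BGP processing in user mode
  • Uses kernel features as they are: IPIP/RSS/IPv6 etc
  • 20k DIPs and 1.6M SNAT port mappings per VIP. Holds information for around millions of concurrent connections
• Figure annotations: *5 *8 *all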
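5. MEASUREMENTS Micro-benchmark (P.13)
• CPU load with and without Fastpath: 10VM * 2 tenants, 1MB transferred per connection. Host load rises slightly, but Mux load drops sharply.
• SYN-flood Attack Mitigation: 10VM * 5 tenants (base traffic + SYN-flood * 10 runs). Under medium to high load, DoS traffic becomes harder to tell apart.
• Latency up to S-NAT: almost all requests finish within 75ms. When a port cannot be secured, the extra time has a long tail.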
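5. MEASUREMENTS Real World Data (1/2) (P.14)
• 1Tbps in total, 3 years in operation, inter/intranet; uses: blob, table/queue, storage
• S-NAT request time: in percentile terms there are almost no problematic scenarios. Figure markers: <- 50ms, <- 200ms, <- max 2s
• Availability: req/5min@test tenant, average availability 99.95%. Dips annotated as: Mux overload (SYN-flood), NW, false detection
• Config completion time: <- 75ms@50%ile, <- max 2s. Depends on the scale of the tenant and the Mux. Stays within the SLA.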
5. MEASUREMENTS Real World Data (2/2) (P.15)
• 800Mbps (220Kpps) / core
• Processing is indeed spread by ECMP
• 33.6Gbps in total: 2.4Gbps*14
• Figure: bandwidth and load of the 14 Muxes (25%)
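6. OPERATIONAL EXPERIENCE (P.16)
• Operated in the cloud for 3 years
• Reasons for "giving up on" HW LBs: they cannot cope with DoS attacks, have no elasticity, and cannot keep up with bandwidth growth and price pressure
• Problems that actually occurred with the SW LB, and open issues
  • AM dual primary: the old primary sends requests to a Mux, and the Mux rejects them.
    • Solved by having the primary run a transaction whenever a Mux rejects a request
  • IP-in-IP changes the MTU. The HA is supposed to adjust the MSS, but for some reason this failed and packets over the MTU were dropped
    • A bug in a certain home router that fixes the MSS
    • A TCP bug in a certain mobile OS that sends full-size segments as they are when TCP reconnects
    • The MTU of the whole network was raised
  • BGP and the LB are co-located, so both go down together when bandwidth overflows. Moreover, when one instance goes down its traffic shifts to the others, so cascading failures are possible
    • Separate the interfaces for BGP and LB, or throttle the traffic rate on the router side. Co-locating BGP and LB keeps the design simpler
  • Idle connection timeout of HW LBs (60 seconds)
    • The SW LB inherited it as a state-overflow (DoS) countermeasure. Many mobile connections are left open -> since the VIP mapping exists anyway, no state needs to be created -> the timeout could safely be lengthened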
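7. RELATED WORK (P.17)
• HW LBs are the scale-up type (1+1)
  • Cannot scale up or down with demand
  • Cloud availability requirements call for N+1 redundancy
• Virtual appliances and OSS (HAProxy and others)
  • Cannot do N+1. On a network failure they use a spare IP (I/F), so they are limited to an L2 domain
  • Cannot scale a single VIP
• Embrace: runs on the host side. Egi et al./RouteBricks: high-performance routers built on commodity HW
• ETTM: every end host processes packets. Ananta uses dedicated servers for the LB only.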
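8. CONCLUSION (P.18)
• Ananta
  • Distributed L4LB/NAT
  • Designed to meet (Azure's) requirements for multi-tenancy, high reliability and operations
• It should be useful outside Azure as well
  • Large-scale uses need a design that is cost-effective and can scale out
  • ECMP, BGP, DSR, Fastpath, host-side NAT, rate limit
• 100 LB instances, providing VIP service to more than 100,000 users

Three-line summary (P.19)
• The design philosophy is the same as Maglev (google) and VPPLB (YNWLB2)
  • BGP/ECMP/Consistent-hash/L4LB/DSR
• Techniques for scaling
  • Offload the processing an LB needs (NAT, health checks and so on) to the HV side
  • 1% rule (LB nodes may be at most 1% of the servers in the cluster)
• LB configuration changes are extremely fast (75ms)
• Fastpath (switching mid-connection to communication that does not go through the LB) improves efficiency further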
EoP 20