
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”

cafenero_777

June 19, 2023

Transcript

  1. Research Paper Introduction #23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud” (cumulative #74) @cafenero_777 2021/06/10
  2. Agenda
    • Target paper
    • Overview and why I read it
    1. Introduction
    2. Design Goals and Rationale
    3. Overview and Comparison
    4. Filtering Model
    5. Programming Model
    6. Packet Processor and Flow Compiler
    7. Switching Model
    8. Operational Considerations
    9. Hardware Offloads and Performance
    10. Experiences
    11. Conclusions and Future Work
  3. Target paper
    • VFP: A Virtual Switch Platform for Host SDN in the Public Cloud
    • Daniel Firestone, Microsoft
    • NSDI ’17
    • https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/firestone
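  4. Overview and why I read it
    • Overview
      • VFP: the virtual switch underpinning Azure networking (Tunnel/NAT/Stateful-ACL/QoS)
      • In use on more than 1 million IaaS/PaaS hosts
      • Shares more than 4 years of experience
    • Why I read it, and impressions
      • I was curious how Azure's virtual network is built
      • Design and implementation differences from LinuxBridge/OvS
      • How the tunnels are set up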
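  5. 1. Introduction
    • Public Cloud
      • Provides each customer with a private network space
      • Not feasible with conventional HW switches -> virtual switch on the host (on the hypervisor)
      • For SDN to scale, both the C-plane and the D-plane matter
    • VFP: Virtual Filtering Platform
      • A virtual switch running on every hypervisor; it operates as a "filter" on the virtual NIC
      • Operates in coordination with SDN policy
      • stateless-tunnel (VNET), Ananta (L4LB=LB-NAT), etc.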
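  6. 2. Design Goals and Rationale
    • Originally a network filter driver on Hyper-V
      • The ACL/Tunnel/LB/QoS functions were "fixed". In 2011 it was split out, given an API, and changed to a MAT (Match/Action Table) base. Differs from VL2.
    • Goals
      1. Instead of a single SDN controller, multiple controllers can be used at the same time (roles can be divided: the multi-controller model)
      2. Process per connection rather than per packet. Stateful connections are first-class objects (unlike ASIC processing), i.e. NAT/ACL
      3. Logic such as Encap/Decap is not pushed into the controller. If a suitable schema can be pushed into the D-plane, updates are fast (nothing as general as P4 is needed)
      4. Connectivity loss caused by a VFP update is not acceptable -> using complex stateful flows is tough
      5. Fast forwarding even with huge rule sets: 10+ tables, 40Gbps+, a caching mechanism, and efficient classification techniques
      6. SRIOV/NIC offload: handled by precompiling to fit NIC capacity, plus an exact-match flow model
    • Non-goals
      • Cross-platform: Windows only. Porting the fast path is difficult
      • A controller protocol: something like the OVS/OVSDB protocols
      • Detecting config conflicts or invalid config: neither necessary nor feasible! Instead, a policy validation tool for developers was built
    Ref: https://dl.acm.org/doi/10.1145/1594977.1592576
    [Figure labels: VFPv1, VFPv2]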
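  7. 3. Overview and Comparison
    • Comparison assuming a scenario with O(1M) VMs
    • Comparison with OvS: OvS is excellent, but
      • It does not fit the multi-controller model; tables can only be traversed in the forward direction -> stateful layers
      • MATs have no stateful support -> stateful layers
      • Changed so that Encap/Decap can be modeled in the MAT
      • A mechanism for looking up virtual/physical address mappings
      • No general-purpose offload action API -> built one. VTEP schema in particular!!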
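  8. 5. Programming Model
    • Ports: based on the ingress/egress of a VM/NIC
    • Layers
      • A grouping per function. Return traffic traverses the layers in reverse; layers are stateful, and the return flow has the reversed 5-tuple
      • LB example: forward: VIP -> DIP, return: DIP -> VIP. Rules can be written independently in the layers before and after, in VIP space and DIP space respectively
      • Only one rule is matched/actioned within a layer; there is a processing order
    • Rules: match/action entities
      • For Encap/Decap actions, they receive the data and keys they need
    • Groups
      • Sets of rules (the group is the smallest unit)
      • Examples: using custom IPs in Docker containers; ACLs (using both user-defined and infrastructure-defined ones)
    • Other
      • These can all be handled as policy objects. Address mapping (hash), dynamic NAT-key
      • Fast eventing API and asynchronous I/O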
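
The layer and rule model on the Programming Model slide can be illustrated with a minimal Python sketch. Everything named here (`Packet`, `Rule`, `Layer`, `run_port`, the addresses, and the "lower priority value wins" convention) is a hypothetical stand-in, not the VFP API. The sketch is also stateless: an explicit outbound rule stands in for the reverse flow that VFP would create statefully.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass
class Rule:
    priority: int  # assumption: lower value is evaluated first
    match: Callable[[Packet], bool]
    action: Callable[[Packet], Optional[Packet]]  # returning None drops the packet


@dataclass
class Layer:
    name: str
    inbound: List[Rule] = field(default_factory=list)
    outbound: List[Rule] = field(default_factory=list)

    def process(self, pkt: Packet, inbound: bool) -> Optional[Packet]:
        rules = self.inbound if inbound else self.outbound
        for rule in sorted(rules, key=lambda r: r.priority):
            if rule.match(pkt):
                return rule.action(pkt)  # only the first matching rule fires
        return pkt  # assumption: no match means pass through unchanged


def run_port(layers: List[Layer], pkt: Packet, inbound: bool) -> Optional[Packet]:
    # Inbound packets walk the layers in order; outbound packets walk them in reverse.
    for layer in (layers if inbound else reversed(layers)):
        pkt = layer.process(pkt, inbound)
        if pkt is None:
            return None
    return pkt


VIP, DIP = "203.0.113.10", "10.0.0.4"  # documentation addresses, chosen for illustration

# NAT layer: rules written in VIP space on the way in, DIP space on the way out.
nat = Layer(
    "nat",
    inbound=[Rule(1, lambda p: p.dst_ip == VIP, lambda p: replace(p, dst_ip=DIP))],
    outbound=[Rule(1, lambda p: p.src_ip == DIP, lambda p: replace(p, src_ip=VIP))],
)
# ACL layer: sits behind NAT, so its rules only ever see the DIP.
acl = Layer(
    "acl",
    inbound=[
        Rule(1, lambda p: p.dst_ip == DIP and p.dst_port == 80, lambda p: p),
        Rule(2, lambda p: True, lambda p: None),  # default deny
    ],
)
layers = [nat, acl]

print(run_port(layers, Packet("198.51.100.7", VIP, 80), inbound=True))
print(run_port(layers, Packet("198.51.100.7", VIP, 22), inbound=True))
print(run_port(layers, Packet(DIP, "198.51.100.7", 50000), inbound=False))
```

Because the ACL layer sits after NAT on the inbound path, its rules never mention the VIP, which is the independence between layers that the LB example describes.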
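  9. 6. Packet Processor and Flow Compiler (1/2)
    • Mechanisms for improving performance
      • VFPv1: performance suffers with many stacked layers -> match on metadata extracted from the packet rather than on the packet itself
      • UFID: a unified flow ID
      • Header Transpositions (HTs) ~= the header manipulations of OvS actions (Table 2; concrete examples in Table 3)
        • From v2, metadata is manipulated as well
      • Caching of UFIDs/HTs (UF Table = fast path = OvS microflow cache)
        • Only the first packet goes through the slow path (the HT engine)
        • Designed so that tunnels can also be supported
    • Actions other than packet modification
      • Implemented as callbacks on HTs: metering, encryption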
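
The caching idea above (first packet through the slow path, later packets served from the unified flow table) can be sketched as follows. This is a simplified illustration under stated assumptions: `UnifiedFlowTable`, `nat_layer`, and `meter_layer` are hypothetical names, and a header transposition is reduced to a dictionary of "field -> new value", whereas the paper's HTs cover more operations than a plain field rewrite.

```python
from typing import Callable, Dict, List, Tuple

Header = Dict[str, object]
Transposition = Dict[str, object]  # field name -> value to write (a simplified HT)
FlowId = Tuple[object, ...]


class UnifiedFlowTable:
    def __init__(self, layers: List[Callable[[Header], Transposition]]):
        self.layers = layers
        self.cache: Dict[FlowId, Transposition] = {}
        self.slow_path_runs = 0

    @staticmethod
    def flow_id(h: Header) -> FlowId:
        # Assumption: a plain 5-tuple serves as the unified flow ID.
        return (h["src_ip"], h["dst_ip"], h["src_port"], h["dst_port"], h["proto"])

    def _compile(self, header: Header) -> Transposition:
        # Slow path: run every layer once and compose the per-layer results.
        self.slow_path_runs += 1
        current = dict(header)
        composed: Transposition = {}
        for layer in self.layers:
            ht = layer(current)
            current.update(ht)   # later layers see the rewritten header
            composed.update(ht)  # accumulate one combined transposition
        return composed

    def process(self, header: Header) -> Header:
        fid = self.flow_id(header)
        if fid not in self.cache:  # first packet of the flow
            self.cache[fid] = self._compile(header)
        out = dict(header)
        out.update(self.cache[fid])  # fast path: a single lookup and rewrite
        return out


def nat_layer(h: Header) -> Transposition:
    return {"dst_ip": "10.0.0.4"} if h["dst_ip"] == "203.0.113.10" else {}


def meter_layer(h: Header) -> Transposition:
    return {"meter_class": "web"} if h["dst_port"] == 80 else {}


uft = UnifiedFlowTable([nat_layer, meter_layer])
pkt = {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.10",
       "src_port": 50000, "dst_port": 80, "proto": "tcp", "payload": b"a"}
first = uft.process(pkt)
second = uft.process(dict(pkt, payload=b"b"))
print(first["dst_ip"], second["dst_ip"], uft.slow_path_runs)
```

The per-packet cost after the first packet no longer depends on the number of layers, which matches the slide's point that stacked layers were the bottleneck in VFPv1.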
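  10. 6. Packet Processor and Flow Compiler (2/2)
    • Flow Reconciliation
      • After a rule change, the new rules should apply even to flows that already have a UF entry
      • A generation number is assigned to and managed for each port
      • Tracking TCP flow state: decided by looking at the counters of the reverse direction
    • Flow pairing
      • With DSR, the pair is not a "simple reverse"
      • inbound UF -> iUFID ... reverse it -> generate the outbound UF
    • TCP tracking
      • Uses a TCP state machine. Enables SYN-flood protection. Various stats can also be tracked and diagnosed
    • Packet classification
      • Complex ACL rules, or thousands of rules, affect performance
      • Optimized with compressed tries, interval trees, hash tables, and lists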
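
The per-port generation number used for flow reconciliation can be illustrated with a small sketch. The `Port` class, its rule format (destination port -> action), and the default-deny fallback are hypothetical simplifications; the point is only that a cached flow entry is re-evaluated when its stored generation no longer matches the port's.

```python
class Port:
    def __init__(self, rules):
        self.rules = dict(rules)  # hypothetical policy: dst_port -> "allow" / "deny"
        self.generation = 0       # bumped on every policy change
        self.flows = {}           # flow id -> (generation at caching time, action)
        self.slow_path_lookups = 0

    def update_rule(self, dst_port, action):
        self.rules[dst_port] = action
        self.generation += 1      # invalidates every cached flow lazily

    def _slow_path(self, flow_id):
        self.slow_path_lookups += 1
        return self.rules.get(flow_id[3], "deny")  # assumption: default deny

    def process(self, flow_id):
        entry = self.flows.get(flow_id)
        if entry is None or entry[0] != self.generation:
            # Missing or stale entry: reconcile against the current rules.
            action = self._slow_path(flow_id)
            self.flows[flow_id] = (self.generation, action)
            return action
        return entry[1]


port = Port({80: "allow"})
flow = ("198.51.100.7", "10.0.0.4", 50000, 80, "tcp")
results = [port.process(flow), port.process(flow)]
port.update_rule(80, "deny")
results.append(port.process(flow))
print(results, port.slow_path_lookups)
```

Comparing a single integer per packet avoids walking the flow table on every policy update; stale entries are fixed up only when their flow is next seen.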
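  11. 7. Switching Model
    • Everything other than filtering, i.e. ordinary packet forwarding
    • L2 forwarding by a bridge (based on the outer or inner MAC address)
    • Hairpinning and mirroring are handled in VFP; especially useful for gateway VMs
    • QoS support: bandwidth reservation and the like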
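  12. 8. Operational Considerations
    • Serviceability, monitoring, and diagnostics are essential!
    • Rebootless update: completes in under 1 second. It is noticeable from the VM
    • State Save/Restore (SSR): its purpose is to preserve VFP's ACL/NAT state
    • VM migration: state is preserved as well (using the SSR above)
    • More than 300 counters and statistics per port, layer, and rule (Table 8)
    • Diagnostic features
      • Diagnosis is possible from both VFP and VFPAPI clients
      • Example: queries using an arbitrary UFID
      • With VFP tracing enabled, the actual action log can be inspected
      • Take a snapshot, then restore and analyze it locally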
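
The State Save/Restore idea (serialize ACL/NAT state, swap the component, restore the state) can be sketched as below. This is only an illustration of the pattern: `PortState`, its fields, the JSON format, and the version check are all assumptions, and the slides do not describe VFP's actual serialization format.

```python
import json


class PortState:
    """Hypothetical per-port connection state worth preserving across an update."""

    def __init__(self):
        self.nat_mappings = {}   # "ip:port" inside the VM -> "ip:port" seen outside
        self.allowed_flows = []  # flows already admitted by a stateful ACL

    def save(self) -> str:
        # A version tag lets a newer build reject state it cannot interpret.
        return json.dumps(
            {"version": 1, "nat": self.nat_mappings, "acl": self.allowed_flows}
        )

    @classmethod
    def restore(cls, blob: str) -> "PortState":
        data = json.loads(blob)
        if data["version"] != 1:
            raise ValueError("unsupported state version")
        state = cls()
        state.nat_mappings = data["nat"]
        state.allowed_flows = data["acl"]
        return state


old = PortState()
old.nat_mappings["10.0.0.4:50000"] = "203.0.113.10:1025"
old.allowed_flows.append("10.0.0.4:50000->198.51.100.7:443/tcp")

blob = old.save()               # taken just before the old instance is unloaded
new = PortState.restore(blob)   # loaded by the updated instance
print(new.nat_mappings, new.allowed_flows)
```

Established connections survive the swap because the new instance starts from the saved mappings instead of an empty table.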
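  13. 9. Hardware Offloads and Performance
    • Offload to the NIC (HW)
    • NVGRE/VXLAN offload:
      • Achieves 40Gbps line-rate Encap
    • QoS offload
    • VFP policy offload
      • Serial lookups across multi-stage tables would require TCAM/CPU
      • Resolved by hashing on Unified Flows
      • 25Gbps line rate in a VM, <25us end-to-end within a VNET
    [Figure captions: 1-2 orders of magnitude better for TCP SYN; performance holds as layers are added; performance improves with Q (number of CPUs); even better in the physical (PacketDirect) case]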
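  14. 10. Experiences (1/3)
    • Goals (restated from slide 6)
      1. Instead of a single SDN controller, multiple controllers can be used at the same time (roles can be divided: the multi-controller model)
        • Layers are kept independent, so each can be deployed without affecting other controllers
      2. Process per connection rather than per packet. Stateful connections are first-class objects (unlike ASIC processing), i.e. NAT/ACL
        • In VFP every connection is treated statefully
      3. Logic such as Encap/Decap is not pushed into the controller. If a suitable schema can be pushed into the D-plane, updates are fast (nothing as general as P4 is needed)
        • VNET, LB, ACL, and so on keep working without changes to VFP
      4. Connectivity loss caused by a VFP update is not acceptable -> using complex stateful flows is tough
        • Updated many times, rebootlessly
      5. Fast forwarding even with huge rule sets: 10+ tables, 40Gbps+, a caching mechanism, and efficient classification techniques
        • The UFT improves VFP performance, especially with many stacked layers
      6. SRIOV/NIC offload: handled by precompiling to fit NIC capacity, plus an exact-match flow model
        • Lookups in the UFT give a 1-2 orders of magnitude improvement, and HW offload (SRIOV) improves performance further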
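  15. 10. Experiences (2/3)
    • Since 2012, VFP has had 21 major releases, running on all Azure servers across 30 regions
    • The authors argue that an L4 flow cache is sufficient and that OvS-style megaflows (wildcards) are unnecessary
    • If you are going to be stateful, do it from the start. Changing the match/action tables afterwards is not feasible!
    • Layering matters: layering semantics between controllers keep each controller properly independent
    • goto is harmful: "just be careful" does not work. Enforce it through layering semantics
    • IaaS is sensitive to downtime: design the update procedure properly and guarantee the outage time. Features needed for this were added, such as StateSaveRestore
    • Decouple the D-plane from the wire protocol: implementation experience with a controller/agent model at O(1M) scale. VFPAPI (the southbound API) is kept independent
    (Points the presenter found noteworthy)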
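  16. 10. Experiences (3/3)
    • Conflict detection turned out to be unnecessary. Independent controllers worked well. Controller debugging tools matter more
    • Everything is an "action": VL2's tunnels were tunnel interfaces. Turning them into actions simplified the match/action (MA) model and made offload possible
    • MTU was a concern, but sizing the underlay MTU larger avoided any problem
    • MAT scale: up to 10-20 layers, up to several hundred groups per layer, O(50k) rules per group, 500k concurrent TCP connections per port
    • Nothing beyond simple MAC-address-based forwarding was found
    • E2E monitoring: since operators cannot log in to the VMs directly, inject/response is built as VFP rules and used for monitoring. This monitors the VM/host boundary
    • Commodity NICs are not ideal: Azure's policies cannot be offloaded with SR-IOV. Implemented on FPGA-based NICs
    (Points the presenter found noteworthy)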
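  17. 11. Conclusion and Future Work
    • VFP (Virtual Filtering Platform) is the virtual switch of Microsoft Azure
    • The paper presents its programmability and scalability
    • Concerns and experience with serviceability, monitoring, and diagnostics in production
    • Future work: newer HW models and extensions to the VFP offload language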