#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”

cafenero_777

June 19, 2023

Transcript

  1. Research Paper Introduction #23


    “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”

    No. 74 overall
    @cafenero_777

    2021/06/10

  2. Agenda
    • Target paper
    • Overview and why I read it
    1. Introduction
    2. Design Goals and Rationale
    3. Overview and Comparison
    4. Filtering Model
    5. Programming Model
    6. Packet Processor and Flow Compiler
    7. Switching Model
    8. Operational Considerations
    9. Hardware Offloads and Performance
    10. Experiences
    11. Conclusions and Future Work

  3. Target Paper
    • VFP: A Virtual Switch Platform for Host SDN in the Public Cloud
    • Daniel Firestone, Microsoft
    • NSDI ’17
    • https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/firestone

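  4. Overview and Why I Read It
    • Overview
      • VFP: the virtual switch that underpins Azure networking (Tunnel/NAT/Stateful-ACL/QoS)
      • In use on more than 1 million IaaS/PaaS hosts
      • Shares more than four years of experience
    • Why I read it, and impressions
      • I was curious about how Azure's virtual network is built
      • How its design and implementation differ from LinuxBridge/OvS
      • How the tunnels are set up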

  5. 1. Introduction
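    • Public Cloud
      • Provides each customer with a private network space
      • Not feasible with conventional hardware switches -> a virtual switch on the host (on the hypervisor)
      • Both a C-plane and a D-plane that scale with SDN are important
    • VFP: Virtual Filtering Platform
      • A virtual switch that runs on every hypervisor and acts as a "filter" on the virtual NICs
      • Operates in step with SDN policy
      • stateless-tunnel (VNET), Ananta (L4LB=LB-NAT), etc.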

  6. 2. Design Goals and Rationale
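    • Originally a network filter driver on Hyper-V
      • The ACL/Tunnel/LB/QoS functions were "fixed". In 2011 this was split out, given an API, and moved to a MAT (Match/Action Table) base. This differs from VL2.
    • Goals
      1. Use multiple controllers at the same time rather than a single SDN controller (roles can be split: the multi-controller model)
      2. Process per connection rather than per packet. Stateful connections are first-class objects (unlike ASIC processing), e.g. NAT/ACL
      3. Logic such as Encap/Decap is not pushed onto the controller. If a suitable schema can be pushed into the D-plane, updates are fast (no need to go as far as P4)
      4. Connectivity loss caused by a VFP update is not acceptable -> using complex stateful flows is hard
      5. Fast forwarding even with huge rule sets: 10+ tables, 40Gbps+, a caching mechanism, and efficient classification
      6. SRIOV/NIC-offload: handled by precompiling to fit the NIC's capacity, plus an exact-match flow model
    • Non-goals
      • Cross-platform support: Windows only. Porting the fast path is difficult
      • A controller protocol: something like the OVS/OVSDB protocols
      • Detecting config conflicts or invalid config: neither necessary nor feasible! Instead, a policy verification tool for developers was built
    Ref: https://dl.acm.org/doi/10.1145/1594977.1592576
    Figure labels: VFPv1, VFPv2
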

  7. 3. Overview and Comparison
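    • Comparison assuming a scenario with O(1M) VMs
    • Comparison with OvS: OvS is excellent, but
      • It does not fit the multi-controller model. Tables only work in the forward direction -> made into stateful layers
      • MATs have no stateful support -> made into stateful layers
      • MATs were modified so that Encap/Decap can be modeled
      • A mechanism for looking up virtual/physical address mappings
      • There is no generic offload action API -> built one. Especially the VTEP schema!!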

  8. 4. Filtering Model
    • Filters packets inside the OS according to the MATs
    • Usable on Hyper-V and on ordinary bare-metal hosts
    • PacketDirect mode is reportedly fast

  9. 5. Programming Model
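    • Port: based on the ingress/egress of a VM/NIC
    • Layer
      • A grouping per function. Layers are stateful: return traffic walks the layers in reverse order, and the return 5-tuple is reversed
      • LB example: forward: VIP -> DIP, return: DIP -> VIP. The layers before and after can write their rules independently in VIP space and DIP space
      • Only one rule is matched and actioned within a layer; there is a processing order
    • Rule: a match/action entity
      • For Encap/Decap actions, it receives the data and keys it needs
    • Group
      • A set of rules (the group is the smallest unit)
      • Example: using custom IPs in Docker containers. ACLs (using both user-defined and infrastructure-defined ones)
    • Other
      • These can be handled as policy objects. address mapping (hash), dynamic NAT-key
      • Fast eventing API and asynchronous I/O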
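
The port/layer/rule model above can be sketched in a few lines of Python. This is only an illustration of the slide's description, not VFP's API: the class names `Rule`, `Layer`, `Port`, the helpers `set_field`, `allow`, `drop`, and the addresses are all hypothetical, groups are omitted for brevity, and connection state is not modeled. It shows one rule firing per layer, layers walked in reverse on the way out, and an ACL written purely in DIP space next to an LB layer that translates VIP <-> DIP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]  # parsed header fields only


@dataclass
class Rule:
    match: Dict[str, object]                      # exact-match conditions
    action: Callable[[Packet], Optional[Packet]]  # returns None to drop
    priority: int = 0                             # lower value is tried first

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())


@dataclass
class Layer:
    name: str
    inbound: List[Rule] = field(default_factory=list)
    outbound: List[Rule] = field(default_factory=list)

    def process(self, pkt: Packet, direction: str) -> Optional[Packet]:
        rules = self.inbound if direction == "in" else self.outbound
        for rule in sorted(rules, key=lambda r: r.priority):
            if rule.matches(pkt):
                return rule.action(pkt)  # only one rule fires per layer
        return pkt  # assumption: a packet matching no rule passes through


@dataclass
class Port:
    layers: List[Layer]  # ordered from the wire side to the VM side

    def process(self, pkt: Packet, direction: str) -> Optional[Packet]:
        # Outbound traffic walks the same layers in reverse order.
        order = self.layers if direction == "in" else self.layers[::-1]
        for layer in order:
            pkt = layer.process(pkt, direction)
            if pkt is None:
                return None
        return pkt


def set_field(name: str, value: object) -> Callable[[Packet], Packet]:
    return lambda pkt: {**pkt, name: value}


def allow(pkt: Packet) -> Packet:
    return pkt


def drop(pkt: Packet) -> None:
    return None


VIP, DIP = "203.0.113.10", "10.0.0.4"  # documentation/example addresses
lb = Layer("lb",
           inbound=[Rule({"dst_ip": VIP}, set_field("dst_ip", DIP))],
           outbound=[Rule({"src_ip": DIP}, set_field("src_ip", VIP))])
acl = Layer("acl",  # written in DIP space only; it never sees the VIP
            inbound=[Rule({"dst_ip": DIP, "dst_port": 80}, allow),
                     Rule({}, drop, priority=100)],
            outbound=[Rule({}, allow)])
port = Port([lb, acl])

request = {"src_ip": "198.51.100.7", "dst_ip": VIP, "dst_port": 80}
reply = {"src_ip": DIP, "dst_ip": "198.51.100.7", "src_port": 80}
print(port.process(request, "in"))
print(port.process(reply, "out"))
```

Because the LB layer sits on the wire side of the ACL layer, the two sets of rules can be owned by different controllers without either knowing the other's address space.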

  10. 6. Packet Processor and Flow Compiler (1/2)
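    • Mechanisms for improving performance
      • In VFPv1, performance suffered with many layers -> match on metadata rather than on the packet itself
      • UFID: a unified flow ID
      • Header Transpositions (HTs) ~= the header manipulation in OvS actions // Table 2; concrete examples in Table 3
      • From v2, metadata is manipulated as well.
      • UFIDs/HTs are cached (UF Table = fast-path = OvS microflow cache)
      • Only the first packet goes through the slow path (HT engine)
      • Designed so that tunnels can also be supported
    • Actions other than packet modification
      • Implemented as HT callbacks: metering, encryption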
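
A minimal sketch of the caching idea above, under simplifying assumptions: a "header transposition" is reduced to a dict of field rewrites, the flow ID is a plain 5-tuple, and the names `FlowCache`, `flow_id`, `nat_layer`, and `port_map_layer` are hypothetical. The first packet of a flow runs every layer and the combined rewrite is stored; later packets of the same flow need only one hash lookup.

```python
from typing import Callable, Dict, List, Tuple

Header = Dict[str, object]
Transposition = Dict[str, object]  # field -> new value (simplified HT)
LayerFn = Callable[[Header], Transposition]


def flow_id(hdr: Header) -> Tuple:
    """Stand-in for a unified flow ID: the 5-tuple of the packet."""
    return (hdr["src_ip"], hdr["dst_ip"], hdr["proto"],
            hdr["src_port"], hdr["dst_port"])


class FlowCache:
    def __init__(self, layers: List[LayerFn]) -> None:
        self.layers = layers
        self.table: Dict[Tuple, Transposition] = {}
        self.slow_path_runs = 0

    def _slow_path(self, hdr: Header) -> Transposition:
        current = dict(hdr)
        combined: Transposition = {}
        for layer in self.layers:
            change = layer(current)  # each layer sees the previous result
            current.update(change)
            combined.update(change)  # compose into one rewrite
        return combined

    def process(self, hdr: Header) -> Header:
        key = flow_id(hdr)
        if key not in self.table:    # first packet of the flow
            self.slow_path_runs += 1
            self.table[key] = self._slow_path(hdr)
        return {**hdr, **self.table[key]}  # fast path: one lookup


def nat_layer(hdr: Header) -> Transposition:
    return {"dst_ip": "10.0.0.4"} if hdr["dst_ip"] == "203.0.113.10" else {}


def port_map_layer(hdr: Header) -> Transposition:
    if hdr["dst_ip"] == "10.0.0.4" and hdr["dst_port"] == 80:
        return {"dst_port": 8080}
    return {}


cache = FlowCache([nat_layer, port_map_layer])
pkt = {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.10",
       "proto": "tcp", "src_port": 40000, "dst_port": 80}
first = cache.process(pkt)
second = cache.process(pkt)
print(first, cache.slow_path_runs)
```

The cost of adding layers is paid once per flow rather than once per packet, which is the property the slide attributes to the UF table.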

  11. 6. Packet Processor and Flow Compiler (2/2)
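    • Flow Reconciliation
      • After a rule change, the new rules should apply even if a UF already exists
      • A generation number is assigned and managed per port
      • Tracking TCP flow state: decided by looking at the reverse-direction counters
    • Flow pairing
      • With DSR it is not a "simple reverse"
      • inbound UF -> iUFID … reverse it -> generate the outbound UF
    • TCP tracking
      • Uses a TCP state machine. Enables SYN-flood protection. Various stats can also be tracked and diagnosed
    • Packet classification
      • Complex ACL rules, or thousands of rules, affect performance
      • Optimized with a compressed trie, an interval tree, a hash table, and a list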
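
One way to read the per-port generation number is as a lazy invalidation scheme: a policy update only bumps a counter, and a cached flow is re-evaluated the next time it is hit with a stale tag. The sketch below assumes that reading; `ReconcilingPort`, `update_policy`, and `action_for` are hypothetical names, and the policy is reduced to a function from flow key to action.

```python
from typing import Callable, Dict, Hashable, Tuple

Policy = Callable[[Hashable], str]  # flow key -> action name


class ReconcilingPort:
    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.generation = 0
        self.flows: Dict[Hashable, Tuple[int, str]] = {}  # key -> (gen, action)
        self.slow_path_runs = 0

    def update_policy(self, policy: Policy) -> None:
        self.policy = policy
        self.generation += 1  # O(1): cached flows are not touched here

    def action_for(self, key: Hashable) -> str:
        entry = self.flows.get(key)
        if entry is not None and entry[0] == self.generation:
            return entry[1]            # cached entry is still current
        self.slow_path_runs += 1       # missing or stale: re-evaluate
        action = self.policy(key)
        self.flows[key] = (self.generation, action)
        return action


flow = ("198.51.100.7", "10.0.0.4", "tcp", 40000, 80)
port = ReconcilingPort(lambda key: "allow")
before = [port.action_for(flow), port.action_for(flow)]
port.update_policy(lambda key: "deny")
after = port.action_for(flow)
print(before, after, port.slow_path_runs)
```

The update itself stays cheap no matter how many flows are cached, and only flows that are still active pay the re-evaluation cost.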

  12. 7. Switching Model
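    • Everything other than filtering, i.e. ordinary packet forwarding
    • L2 forwarding by a bridge (by outer or inner MAC address)
    • Hairpinning and mirroring are handled in VFP, which is especially useful for gateway VMs.
    • QoS support: bandwidth reservation, etc.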

  13. 8. Operational Considerations
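    • Serviceability, monitoring, and diagnostics are essential!
    • Rebootless update: completes in under 1 second. It is noticeable from the VM
    • State Save/Restore: meant to preserve VFP's ACL/NAT state
    • VM migration: state is preserved as well (using the SSR above)
    • More than 300 counters and statistics per port, layer, and rule // Table 8
    • Diagnostics
      • Possible from both VFP and VFPAPI clients
      • Example: queries using an arbitrary UFID
      • Enabling VFP tracing shows the actual action log
      • Take a snapshot, then restore and analyze it locally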

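  14. 9. Hardware Offloads and Performance
    • Offloading to the NIC (hardware)
      • NVGRE/VXLAN offload:
        • Achieves 40Gbps line-rate Encap
      • QoS offload
      • VFP policy offload
        • Looking up multiple tables in series requires TCAM/CPU
        • Resolved by hashing on Unified Flows
    • 25Gbps line rate in a VM, <25us end-to-end within a VNET
    Figure annotations: 1-2 orders of magnitude better for TCP SYN; performance holds as layers are added; performance improves with Q (number of CPUs); better still in the physical (PacketDirect) case
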

  15. 10. Experiences (1/3)
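    • Goals (repeated from slide 6)
      1. Use multiple controllers at the same time rather than a single SDN controller (roles can be split: the multi-controller model)
        • Layers are kept independent, so each can be deployed without affecting other controllers
      2. Process per connection rather than per packet. Stateful connections are first-class objects (unlike ASIC processing), e.g. NAT/ACL
        • In VFP every connection is treated statefully
      3. Logic such as Encap/Decap is not pushed onto the controller. If a suitable schema can be pushed into the D-plane, updates are fast (no need to go as far as P4)
        • VNET, LB, ACL, etc. keep working without changes to VFP.
      4. Connectivity loss caused by a VFP update is not acceptable -> using complex stateful flows is hard
        • VFP has been updated rebootlessly many times
      5. Fast forwarding even with huge rule sets: 10+ tables, 40Gbps+, a caching mechanism, and efficient classification
        • The UFT improves VFP performance, especially with many layers.
      6. SRIOV/NIC-offload: handled by precompiling to fit the NIC's capacity, plus an exact-match flow model
        • UFT lookup improves performance by 1-2 orders of magnitude, and hardware offload (SRIOV) improves it further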

  16. 10. Experiences (2/3)
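    • Since 2012, VFP has had 21 major releases; all Azure servers, 30 regions
    • Claim: an L4 flow cache is sufficient; what OvS calls a megaflow (wildcard) is not needed
    • If you are going to be stateful, do it from the start. Changing the match/action tables afterwards is impossible!
    • Layering matters: layering semantics between controllers keep each controller properly independent
    • goto is harmful: "just be careful" does not work. Enforce it through layering semantics
    • IaaS is sensitive to downtime: design the update procedure properly and guarantee the interruption time. Features needed for this were added, such as State Save/Restore.
    • Decouple the D-plane from the wire protocol: implementation experience with a controller/agent model at O(1M) scale. VFPAPI (southbound API) is kept independent
    Annotation on this slide: points the presenter found notable
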

  17. 10. Experiences (3/3)
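    • Conflict detection turned out not to be needed. Independent controllers worked well. Controller debugging tools matter more
    • Everything is an "action": VL2 tunnels were tunnel interfaces. Turning them into actions simplified the MATs and made offload possible
    • MTU was a concern, but allowing a larger MTU on the underlay side avoided problems
    • MAT scale: up to 10-20 layers, up to several hundred groups per layer, O(50k) rules per group, 500,000 concurrent TCP connections per port
    • No approach other than simple MAC-address-based forwarding was found
    • E2E monitoring: since the VMs cannot be entered directly, inject/response is built as VFP rules and used for monitoring. This watches the VM/host boundary
    • Commodity NICs are not ideal: Azure's policy cannot be offloaded with SR-IOV. Implemented on FPGA-based NICs
    Annotation on this slide: points the presenter found notable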
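
As a worked example of the MTU point above, the arithmetic below estimates how large the underlay IP MTU must be for a VM to keep a given inner MTU. It assumes an IPv4 outer header, no VLAN tags, and the standard header sizes for VXLAN and NVGRE; the function name `required_underlay_mtu` is hypothetical, and the slides do not state which values Azure actually uses.

```python
ETHERNET = 14      # inner Ethernet header, no VLAN tag, FCS not counted
IPV4 = 20          # outer IPv4 header without options
UDP = 8            # outer UDP header (VXLAN only)
VXLAN = 8          # VXLAN header
GRE_WITH_KEY = 8   # NVGRE: 4-byte GRE header + 4-byte key field

OVERHEAD = {
    "vxlan": IPV4 + UDP + VXLAN,    # 36 bytes around the inner frame
    "nvgre": IPV4 + GRE_WITH_KEY,   # 28 bytes around the inner frame
}


def required_underlay_mtu(inner_mtu: int, encap: str) -> int:
    """Smallest underlay IP MTU that carries an inner packet unfragmented."""
    inner_frame = inner_mtu + ETHERNET  # the whole inner frame is the payload
    return inner_frame + OVERHEAD[encap]


for encap in ("vxlan", "nvgre"):
    print(encap, required_underlay_mtu(1500, encap))
```

With a standard inner MTU of 1500 bytes this gives 1550 for VXLAN and 1542 for NVGRE, so an underlay configured for jumbo frames leaves ample headroom.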


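  18. 11. Conclusions and Future Work
    • VFP (Virtual Filtering Platform) is the virtual switch of Microsoft Azure
    • Presents its programmability and scalability
    • Concerns and experience around serviceability, monitoring, and diagnostics in production
    • Future work: newer hardware models and extensions to the VFP offload language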