
#47-48 “NSDI 2023 recap”

cafenero_777

June 08, 2023

Transcript

  1. $ which • NSDI 2023 • Boston, MA, USA, April 17-19, 2023 • https://www.usenix.org/conference/nsdi23/technical-sessions • '23: 96/560 papers, acceptance rate: 17% • '22: 78/396 papers, acceptance rate: 19.7% • In-person (offline) only! (Last year was the first hybrid event) • The dual-track format continues
  2. Awards • Best Paper • LeakyScatter: A Frequency-Agile Directional Backscatter Network Above 100 GHz • CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation • DOTE: Rethinking (Predictive) WAN Traffic Engineering • Community Award • Building Flexible, Low-Cost Wireless Access Networks With Magma
  3. NSDI '23 Technical Sessions • 2023/04/17 • RDMA • Learning with GPUs • RPC and Remote Memory • Congestion Control • Distributed Systems • Wireless • Cloud • Internet-Scale Network • 2023/04/18 • Synthesis and Formal Methods • Data Centers • Systems for Learning • Privacy and Security • Video • Data • Making Systems Learn • IoT Networks • 2023/04/19 • Programming the Network • Alternative Networks • Performance • Serverless and Network Functions • Real Networks • Cellular • Testing Physical Layer • 23 tracks, 96 sessions
  4. Reference: NSDI '22 Technical Sessions • 2022/04/04 • Cluster Resource Management • Transport Layer - Part 1 • Video Streaming • Programmable Switches - Part 1 • Security and Privacy • Network Troubleshooting and Debugging • Operational Track - Part 1 • Wireless - Part 1 • 2022/04/05 • Reliable Distributed Systems • Raising the Bar for Programmable Hardware • Testing and Verification • Programmable Switches - Part 2 • Sketch-based Telemetry • Transport Layer - Part 2 • Troubleshooting • Wireless - Part 2 • 2022/04/06 • Operational Track - Part 2 • Edge IoT Applications • Cloud Scale Services • ISPs and CDNs • Cloud Scale Resource Management • Data Center Network Infrastructure • Multi-tenancy • Software Switching and Beyond • 24 tracks, 78 sessions
  5. Reference: NSDI '19 Technical Sessions • 2019/02/26 • Host Networking • Distributed Systems • Modern Network Hardware • Analytics • Data Center Network Architecture • 2019/02/27 • Wireless Technologies • Operating Systems • Monitoring and Diagnosis • Improving Machine Learning • Network Functions • Wireless Applications • 2019/02/28 • Network Characterization • Privacy and Security • Network Modeling • Wireless Applications • 15 tracks, 50 sessions
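  6. Recent trends • // as seen from my own point of view • RDMA now has its own dedicated session. Heading toward production use? • Satellite communication; video delivery specialized for a particular use case (TikTok-style swipes) • Offload: offloading only the state rather than the packets themselves • Machine learning (job/resource scheduling) is as plentiful as ever • Wireless is thriving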
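  7. How this recap is put together • Caveats • Only papers that interested me are covered • Only papers I could understand are covered • Topics that are hard to explain properly: NIC queue, Distributed system, AI/DL, Semantics, Verification, Compiler, Wireless, Edge/IoT • In other words, my usual criteria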
  8. SRNIC: A Scalable Architecture for RDMA NICs • Hong Kong University of Science and Technology, ByteDance, Unaffiliated • Scalable RDMA NIC architecture: SRNIC improves scalability • Prototype implemented on an FPGA • Stable even with 10k QPs (queue pairs) • PFC-free
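  9. Hostping: Diagnosing Intra-host Network Bottlenecks in RDMA Servers • BUPT, Purple Mountain Laboratories, ByteDance Inc. • With GPUs plus RDMA at 100G and beyond, the intra-host network becomes the bottleneck • Hostping: diagnoses and analyzes latency and bandwidth with loopback tests between the RNIC and intra-host endpoints • Found six new bottlenecks in addition to the already known ones • Figure labels: Intra-host, Inter-host (misconfig.)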
  10. Understanding RDMA Microarchitecture Resources for Performance Isolation • Duke University, Microsoft, Shanghai Jiao Tong University • Goal: per-VM performance isolation for RDMA • No microarchitecture that can isolate RNIC performance exists today • Findings already shared with NVIDIA, Chelsio, and Intel
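  11. Empowering Azure Storage with RDMA • Microsoft • How Azure began supporting RDMA storage within a region • RDMA enabled on both the VM (hypervisor) side and the storage side; also used between DCs within a region • DCQCN and the sK-RDMA protocol on the NIC; PFC/SONiC/SAI in the network • Uses RoCEv2 (RDMA over Converged Ethernet v2) on the existing infrastructure • 70% of traffic is RDMA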
  12. Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training • University of Michigan • Training completion time is the usual focus, with energy efficiency disregarded • Clarifies the tradeoff between energy consumption and training time
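  13. Remote Procedure Call as a Managed System Service • Duke University, University of Washington, Shanghai Jiao Tong University • Implementing RPC separately in every application is inefficient, so it is turned into a service (a daemon?) • mRPC: 2.5x compared with a sidecar; flexibility also improves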
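  14. Bolt: Sub-RTT Congestion Control for Ultra-Low Latency • Stanford University, Google LLC • Congestion control for the 200G/400G era; does not fit within the BDP • SRC (Sub-RTT Control) notices congestion early; Proactive Ramp Up anticipates flow completions and quickly claims the freed capacity • Compared with Swift and HPCC: 99th-percentile latency cut by 88%, FCT improved 3x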
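Why the BDP matters at these line rates can be seen with a rough calculation. The sketch below computes BDP = line rate x RTT; the 10 µs round-trip time is a hypothetical data-center value chosen for illustration and does not come from the slides or the paper.

```python
def bdp_bytes(line_rate_gbps: float, rtt_us: float) -> float:
    """Bandwidth-delay product in bytes: line rate multiplied by round-trip time."""
    bits_in_flight = line_rate_gbps * 1e9 * rtt_us / 1e6
    return bits_in_flight / 8


ASSUMED_RTT_US = 10.0  # hypothetical RTT, not taken from the slides

for rate_gbps in (100, 200, 400):
    kilobytes = bdp_bytes(rate_gbps, ASSUMED_RTT_US) / 1e3
    print(f"{rate_gbps}G link: about {kilobytes:.0f} kB in flight per RTT")
```

Under this assumption, doubling the line rate doubles the data that can be sent before any once-per-RTT feedback arrives, which is the motivation for reacting on a sub-RTT timescale.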
  15. Understanding the impact of host networking elements on traffic bursts • Johns Hopkins University, Meta • Visualizes traffic processing with eBPF • Bursts, congestion control, qdisc, sched., NIC-sched., HW-offload, protocol • Observable from nanosecond to second timescales
  16. DiSh: Dynamic Shell-Script Distribution • MIT, University of Pennsylvania, Purdue University, Brown University • DiSh: • Let's do distributed computing with shell scripts! • Bash-based; uses an automatic parallelization system (PaSh) and HDFS / Hadoop Streaming
  17. SkyPilot: An Intercloud Broker for Sky Computing • University of California, Berkeley, UC Berkeley and ICSI • Sky Computing = intercloud broker • Choosing a different public cloud for each workload yields cost benefits (time, price) • cf: https://misreading.chat/2023/04/25/112-skypilot-an-intercloud-broker-for-sky-computing/
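The per-workload choice of cloud can be pictured with a toy broker. This is only an illustrative sketch: the `Offer` type, the provider labels, the prices, and the runtime estimates are all made up, and this is not SkyPilot's actual API or optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    cloud: str         # hypothetical provider label
    hourly_usd: float  # made-up hourly price
    est_hours: float   # made-up runtime estimate on that cloud


def pick_offer(offers, time_weight=0.0):
    """Pick the offer with the lowest total price plus an optional penalty per hour."""
    return min(
        offers,
        key=lambda o: o.hourly_usd * o.est_hours + time_weight * o.est_hours,
    )


workloads = {
    "train": [Offer("cloud-a", 30.0, 10.0), Offer("cloud-b", 24.0, 14.0)],
    "serve": [Offer("cloud-a", 2.0, 100.0), Offer("cloud-b", 1.5, 100.0)],
}

placement = {name: pick_offer(offers).cloud for name, offers in workloads.items()}
print(placement)
```

With these invented numbers the two workloads land on different clouds, which is the kind of split the slide describes; `time_weight` lets the same function trade price against completion time.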
  18. Invisinets: Removing Networking from Cloud Networks • UC Berkeley, Google, Microsoft • The problem: using cloud networking is far too hard • Provides an API that abstracts the tenant network layer • PRDO: Publicly Routable but Default Off • Routing is possible, but the default is deny • Every endpoint is assigned an IPv6 address • Reduced complexity by 90% • Cf: https://misreading.chat/2023/05/18/114-invisinets-removing-networking-from-cloud-networks/
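The "publicly routable but default off" idea can be sketched as an endpoint that has an address yet accepts nothing until a permit entry is added. The `Endpoint` class and its methods below are hypothetical names for illustration, not the Invisinets API; the addresses use the IPv6 documentation prefix.

```python
import ipaddress


class Endpoint:
    """Hypothetical endpoint: it has an IPv6 address but denies all traffic by default."""

    def __init__(self, address: str):
        self.address = ipaddress.ip_address(address)
        self.permits = []  # an empty permit list means "default off"

    def permit(self, cidr: str, port: int) -> None:
        """Allow traffic from the given source prefix to the given port."""
        self.permits.append((ipaddress.ip_network(cidr), port))

    def allows(self, src: str, port: int) -> bool:
        """Return True only if some permit entry matches both source and port."""
        src_ip = ipaddress.ip_address(src)
        return any(src_ip in net and port == p for net, p in self.permits)


db = Endpoint("2001:db8::10")
before = db.allows("2001:db8::20", 5432)   # nothing permitted yet
db.permit("2001:db8::/64", 5432)
after = db.allows("2001:db8::20", 5432)    # now covered by the permit entry
outside = db.allows("2001:db8:1::1", 5432) # source outside the permitted prefix
print(before, after, outside)
```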
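  19. xBGP: Faster Innovation in Routing Protocols • ICTEAM, UCLouvain, IIJ/Arrcus, Inc, NSG, ETH Zürich • Adding features to BGP is slow, yet operators want to use them soon • A vendor-neutral API and the extension points of BGP implementations are defined and implemented with eBPF • Implemented in FRR and BIRD • Seven use cases presented: discarding routes via TS when a withdraw fails; monitoring and sharing how routes are selected; measuring propagation time; etc. • Cf: https://blog.apnic.net/2021/01/27/xbgp-toward-a-fully-extensible-bgp/ • 873k routes (IPv4), 120k routes (IPv6)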
  20. Flattened Clos: Designing High-performance Deadlock-free Expander Data Center Networks Using Graph Contraction • Shanghai Jiao Tong University, Chinese Academy of Sciences • FC: proposes the Flattened Clos topology • Each ToR is logically split into k parts, virtual up-down paths are built between adjacent ones, and the result is flattened • CBD-free routing
  21. TOPOOPT: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs • Massachusetts Institute of Technology, Meta, CMU, Telescent • DNN training over 100G RDMA on the TOPOOPT topology • Direct-connect network with optical switches + patch panels + NPAR • 3x faster and cheaper than Fat-Tree at 12 nodes • Figure: communication patterns during training
  22. A High-Speed Stateful Packet Processing Approach for Tbps Programmable Switches • KTH Royal Institute of Technology, Roma Tre University, UCLouvain • During RDMA transfers, the state is split off and forwarded to the NF • This is done in P4 • Achieves 300 Gbps
  23. ExoPlane: An Operating System for On-Rack Switch Resource Augmentation • Microsoft, University of Texas at Austin, Carnegie Mellon University • In-network computing (INC) on the rack • Realizes INC with a ToR switch (P4) and SmartNICs, in particular by coordinating state management between them
  24. RingLeader: Efficiently Offloading Intra-Server Orchestration to NICs • Google, UT Austin • Intra-server orchestration (scheduling?) becomes NIC-assisted CPU scheduling • Implemented on an FPGA; improves tail latency, throughput, and CPU utilization
  25. Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays • University of California, Berkeley • A bulk data transfer system across clouds • Finds the most cost-efficient way to transfer (Skyplane planner) • Solved with linear programming • Within a cloud: up to 4.6x • Across clouds: up to 5.0x
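The planner's job, finding the cheapest transfer route that still meets a throughput target, can be illustrated at toy scale. The sketch below is a brute-force stand-in over a direct path and one relay, not the linear-programming formulation mentioned on the slide; the region names, prices, and bandwidths are invented.

```python
# Hypothetical per-hop egress price ($/GB) and bandwidth (Gbps); all values are made up.
LINKS = {
    ("src", "dst"): (0.05, 2.0),
    ("src", "relay"): (0.02, 8.0),
    ("relay", "dst"): (0.05, 6.0),
}


def candidate_paths(src, dst, relays):
    """Yield the direct path and every single-relay overlay path."""
    yield (src, dst)
    for relay in relays:
        yield (src, relay, dst)


def evaluate(path):
    """Return (total $/GB, bottleneck Gbps) for a path, or None if a hop is missing."""
    hops = list(zip(path, path[1:]))
    if any(hop not in LINKS for hop in hops):
        return None
    price = sum(LINKS[hop][0] for hop in hops)
    gbps = min(LINKS[hop][1] for hop in hops)
    return price, gbps


def cheapest_path(src, dst, relays, min_gbps):
    """Exhaustively pick the cheapest path whose bottleneck meets min_gbps."""
    best = None
    for path in candidate_paths(src, dst, relays):
        result = evaluate(path)
        if result is None:
            continue
        price, gbps = result
        if gbps >= min_gbps and (best is None or price < best[1]):
            best = (path, price, gbps)
    return best


print(cheapest_path("src", "dst", ["relay"], min_gbps=1.0))
print(cheapest_path("src", "dst", ["relay"], min_gbps=4.0))
```

With these numbers a low throughput target selects the cheaper direct path, while a higher target forces the more expensive relay route, showing the cost and throughput tradeoff the planner navigates.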
  26. Electrode: Accelerating Distributed Protocols with eBPF • Harvard University, Peking University, Cornell University • Implements distributed protocols in the kernel (eBPF) • No context-switch or network-stack overhead • Throughput improved by 128%, latency by 41%
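  27. Disaggregating Stateful Network Functions • Microsoft and AMD Pensando • Uses general-purpose ARM cores and an ASIC (fast stateful match/action) to move processing off the host and disaggregate NFs • Built a 12-NIC machine; NF performance improved 10x • Presents results from production operation in Azure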
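  28. DOTE: Rethinking (Predictive) WAN Traffic Engineering • Hebrew University of Jerusalem, Microsoft Research, Technion • Best paper! • DOTE: applies deep learning to historical data only and performs WAN TE • Direct Optimization for Traffic Engineering • Optimizes directly instead of predicting demand (no fine-grained IPFIX analysis, not demand-based) • Stochastic optimization, plus ML/DL to cope with real-world conditions • Fast to compute, with good results • Also robust to traffic changes and failures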
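  29. Dashlet: Taming Swipe Uncertainty for Robust Short Video Streaming • Princeton University • Improves video streaming with a method specialized for swipe timing • Implements buffering coordinated with video recommendation, and bitrate improvements • Confirmed improved video quality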
  30. Norma: Towards Practical Network Load Testing • Nanjing University, Alibaba Group • What pktgen cannot do • Stateful, realistic traffic • Tbps-class bandwidth with rate control • Norma: built on a programmable switch ASIC (Tofino with 1k lines of P4*) • Generates 3 Tbps of TCP traffic and 1 Tbps of HTTP traffic • * plus 8k lines for basic switch functions
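  31. Thoughts after finishing • There is simply too much! (second time, one year after the first) • Even reading only the abstracts and conclusions is exhausting • Better than last year // I just got used to it • I wish I could have attended NSDI in person • Best to read the papers you care about when you come to care about them • I want to get a bit more hands-on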
  32. EoP