#29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”

cafenero_777

June 19, 2023

Transcript

  1. Research Paper Introduction #29


    “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”

    #84 overall
    @cafenero_777

    2021/10/14

  2. Agenda
    • Target paper

    • Overview and why I read it

    1. Introduction

    2. Kernel-Bypass Accelerators in the Datacenter

    3. Evolving the Datacenter OS for Kernel Bypass

    4. The Demikernel

    5. Future Work

    6. Related Work

    7. CONCLUSION

  3. Target paper
    • I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era

    • Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam

    • Microsoft Research, University of Wisconsin, University of Texas

    • HotOS '19

    • https://dl.acm.org/doi/10.1145/3317550.3321422

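  4. Overview and why I read it
    • Overview

      • Is the OS facing its “demise” for datacenter-network (DCNW) workloads?!

      • RDMA/DPDK are fast, but they kill abstraction

      • A new I/O abstraction: the Demikernel proposal

    • Why I read it

      • Which direction is network acceleration heading?

      • A near-future topic: library OS?

      • The title was catchy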

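  5. 1. Introduction
    • Server I/O speedups over the past 10 years vs. CPU performance

      • TCP offload, SmartNIC/SR-IOV, Comp./Enc./ML on FPGA

    • Kernel-bypass techniques cut I/O overhead

      • They provide functionality, but no abstraction layer

      • (e.g. socket, file, pipe)

    • Distributed memory and distributed storage with RDMA

      • Customizing the system to match the HW -> a lot of work!

    • How should the OS change? The paper designs and discusses a new OS architecture, Demikernel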

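  6. 2. Kernel-Bypass Accelerators in the Datacenter
    • Kernel bypass

      • Aims for the fastest possible packet forwarding with minimal kernel overhead

      • No interfaces or features are provided

      • Programmers must add OS-equivalent functionality themselves, and do so per device

    • Examples:

      • DPDK: specifies basic I/O device functionality

      • Arrakis: implemented with HW virtualization (SR-IOV etc.)

      • RDMA: specifications exist for the verbs I/F and the rdmacm I/F (~socket), but...

      • FPGA: can do anything, but practicality depends on the use case...

    https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf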

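  7. 3. Evolving the Datacenter OS for Kernel Bypass
    • Optimizing I/O processing in user space

      • Existing library OSes have sharing and multiplexing mechanisms (but they are heavyweight)

      • Transparent memory allocation (i.e. DDIO, NIC<->LLC) and the like are missing, so they have to be reimplemented

    • Efficient abstractions

      • Designs (holdovers) from when I/O was slow vs. today: Redis handles a read in 2 us

    • Caught between abstraction and performance

      • Keeping the existing POSIX API adds even more overhead

    • Functional differences from existing library OSes

      • Existing: assume kernel interfaces and device capabilities are uniform

      • This work: wants to bring in a kernel-bypass framework (taking the best of HW and of SW/kernel)

    https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf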

  8. 4. The Demikernel (1/3)
    • Architecture: separate control (C) and data (D) paths

      • C: network/file open; being slow is acceptable -> existing kernel

      • D: read/write to network/storage/memory -> LibOS + accelerator

    • Abstraction as I/O queues

      • HW normally uses queues -> abstract them as they are

      • Elements can be handled as atomic data units (no unnecessary waiting)

    • A device-independent, high-level abstraction layer for kernel bypass
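
The "atomic data unit" point on this slide can be illustrated with a small sketch. It is not Demikernel code: the `IOQueue` class and its method names are hypothetical, and the queue lives purely in memory. It only shows the property the slide describes: a pop either returns one complete unit or nothing, so the caller never has to wait for the rest of a partially delivered message.

```python
from collections import deque
from typing import Deque, List, Optional


class IOQueue:
    """Hypothetical toy queue whose elements are whole data units, never byte fragments."""

    def __init__(self) -> None:
        self._units: Deque[bytes] = deque()

    def push(self, segments: List[bytes]) -> None:
        # A scatter-gather list is stored as ONE unit, so it stays atomic.
        self._units.append(b"".join(segments))

    def pop(self) -> Optional[bytes]:
        # Returns a complete unit, or None when nothing has arrived yet.
        return self._units.popleft() if self._units else None


q = IOQueue()
q.push([b"SET ", b"key ", b"value"])  # one request built from three segments
q.push([b"GET key"])

first = q.pop()
second = q.pop()
third = q.pop()   # queue is empty by now
print(first, second, third)
```

With a byte-stream abstraction such as a pipe, the reader would have to find the message boundaries itself; here the boundary is part of the queue element.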

  9. 4. The Demikernel (2/3)
    • Syscall interface

      • C: socket(): returns a queue descriptor (not a file descriptor)

      • C: filter() by packet type etc.: assumes a BPF framework

      • C: merge(): merges I/O queues

      • C: sort(): uses I/O queues according to priority

      • C: map(): complex, P4-like packet processing could probably be implemented too

      • D: push/pop

        • Specify the range to operate on

        • Non-blocking; fetch results with wait_*()
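
As a rough illustration of how the calls on this slide fit together, here is a minimal in-memory sketch. The `ToyLibOS` class, its signatures, and the loopback behaviour (a push lands in the same queue it was pushed to) are all assumptions made for the example, not the paper's API; `filter()`, `merge()`, `sort()` and `map()` are left out. It shows `socket()` returning a queue descriptor, non-blocking `push`/`pop` returning a qtoken, and a `wait_any()` that hands back the data directly.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ToyLibOS:
    """Hypothetical in-memory stand-in for a Demikernel-style libOS."""

    def __init__(self) -> None:
        self._queues: Dict[int, Deque[bytes]] = {}
        self._ops: Dict[int, Tuple[str, int]] = {}  # qtoken -> (op, qd)
        self._next_qd = 0
        self._next_qt = 0

    def socket(self) -> int:
        # Control path: returns a queue descriptor, not a file descriptor.
        qd, self._next_qd = self._next_qd, self._next_qd + 1
        self._queues[qd] = deque()
        return qd

    def _token(self, op: str, qd: int) -> int:
        # Every queue operation gets its own qtoken.
        qt, self._next_qt = self._next_qt, self._next_qt + 1
        self._ops[qt] = (op, qd)
        return qt

    def push(self, qd: int, unit: bytes) -> int:
        # Data path, non-blocking. Assumption: loopback into the same queue.
        self._queues[qd].append(unit)
        return self._token("push", qd)

    def pop(self, qd: int) -> int:
        # Data path, non-blocking. The data is fetched later via wait_any().
        return self._token("pop", qd)

    def wait_any(self, qtokens: List[int]) -> Tuple[int, str, bytes]:
        # Returns (qtoken, op, data) for the first operation that can complete.
        for qt in qtokens:
            op, qd = self._ops[qt]
            if op == "push":
                del self._ops[qt]
                return qt, op, b""
            if self._queues[qd]:
                del self._ops[qt]
                return qt, op, self._queues[qd].popleft()
        raise BlockingIOError("nothing ready; a real libOS would block here")


libos = ToyLibOS()
qd_a, qd_b = libos.socket(), libos.socket()
pop_a = libos.pop(qd_a)            # nothing will arrive on qd_a
pop_b = libos.pop(qd_b)
push_b = libos.push(qd_b, b"hello")

done_qt, op, data = libos.wait_any([pop_a, pop_b])
print(done_qt == pop_b, op, data)
```

Because the completed qtoken comes back together with the data, the caller needs no follow-up read call after being told a queue is ready, which is the contrast with epoll noted on the next slide.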

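  10. 4. The Demikernel (3/3)
    • qtoken: unique to each queue operation

      • Can improve on epoll

      • wait_*() returns the data directly -> no extra syscall that might come up empty

      • pop completion: a thread wakes up when the pop completes; no busy polling needed

    • Zero copy:

      • 1. Transparent memory allocation: the LibOS performs IOMMU memory registration

      • 2. Minimize coordination of memory shared between the application and the I/O device

      • Free protection: the application issues a buffer free -> the LibOS waits for the I/O to finish before freeing

      • (As before) there is no write protection -> modifying (writing) the buffer has to wait for the I/O

      • The paper argues this is not a problem in datacenter use. Example: for each put request, Redis allocates a buffer and then points its data structure at it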
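
The free-protection idea above can be sketched as a deferred free. The `BufferManager` class below is hypothetical and tracks in-flight I/O with a simple counter; a real libOS would tie this to device completions. It shows that a `free()` issued while I/O is pending only takes effect once the I/O completes, and (as on the slide) it makes no attempt at write protection.

```python
from typing import Dict, Set


class BufferManager:
    """Hypothetical libOS-side allocator that defers frees while I/O is in flight."""

    def __init__(self) -> None:
        self._buffers: Dict[int, bytearray] = {}
        self._in_flight: Dict[int, int] = {}  # buffer id -> pending I/O count
        self._free_requested: Set[int] = set()
        self._next_id = 0

    def alloc(self, size: int) -> int:
        buf_id, self._next_id = self._next_id, self._next_id + 1
        self._buffers[buf_id] = bytearray(size)
        self._in_flight[buf_id] = 0
        return buf_id

    def io_start(self, buf_id: int) -> None:
        # Called when the buffer is handed to a device (e.g. by a push).
        self._in_flight[buf_id] += 1

    def io_complete(self, buf_id: int) -> None:
        self._in_flight[buf_id] -= 1
        self._release_if_idle(buf_id)

    def free(self, buf_id: int) -> None:
        # The application may call free() at any time; release is deferred.
        self._free_requested.add(buf_id)
        self._release_if_idle(buf_id)

    def _release_if_idle(self, buf_id: int) -> None:
        if buf_id in self._free_requested and self._in_flight[buf_id] == 0:
            del self._buffers[buf_id]
            del self._in_flight[buf_id]
            self._free_requested.discard(buf_id)

    def is_live(self, buf_id: int) -> bool:
        return buf_id in self._buffers


mgr = BufferManager()
buf = mgr.alloc(64)
mgr.io_start(buf)                  # the device now owns a reference
mgr.free(buf)                      # application frees while I/O is pending
live_during_io = mgr.is_live(buf)
mgr.io_complete(buf)               # deferred free takes effect here
live_after_io = mgr.is_live(buf)
print(live_during_io, live_after_io)
```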

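  11. 5. Future Work
    • OS Design

      • The LibOS fills in whatever a given accelerator lacks (for DPDK, the entire network stack)

      • With many kinds of accelerators, the LibOS carries code for all of them (!). What is a LibOS, then...?

    • Network Protocols

      • Aim for a more general way of splitting I/O into data units.

      • Existing frameworks (TCP, HTTPS, etc.) let the receiving side reassemble the data, but that limits generality

    • File System and Storage

      • Existing file systems (ext4 etc.) carry too much overhead to use from a LibOS (single application)

      • A file system suited to accelerators?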

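  12. 6. Related Work
    • OS

      • A higher-level interface than Arrakis and IX

      • User-level OS extensions hide the HW -> but offer no OS functionality such as a network stack

    • I/O Accelerated System

      • Keeping the POSIX interface makes them inefficient. Example: mTCP has higher latency than DPDK

      • Other directions: do network/TCP processing in the PMD/NIC, or move congestion control out of the OS (QUIC etc.)

    • I/O Accelerated Application

      • Low-latency applications that access remote memory over RDMA, etc.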

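  13. 7. Conclusion
    • Kernel-bypass accelerators are used to keep up with the large gains in I/O performance

    • Because they bypass the kernel, OS/kernel features are unavailable and I/O cannot be abstracted

    • The paper designs a LibOS that can fill this gap and discusses I/O abstractions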

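  14. Three-line summary
    • Kernel-bypass accelerators are used to keep up with the large gains in I/O performance

    • Because they bypass the kernel, OS/kernel features are unavailable and I/O cannot be abstracted

    • The paper designs a LibOS that can fill this gap and discusses I/O abstractions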