Slide 1

Slide 1 text

Research Paper Introduction #29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era” (#84 overall) @cafenero_777 2021/10/14

Slide 2

Slide 2 text

Agenda
• Target paper
• Overview and why I chose to read it
1. Introduction
2. Kernel-Bypass Accelerators in the Datacenter
3. Evolving the Datacenter OS for Kernel Bypass
4. The Demikernel
5. Future Work
6. Related Work
7. Conclusion

Slide 3

Slide 3 text

Target paper
• I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era
• Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam
• Microsoft Research, University of Wisconsin, University of Texas
• HotOS '19
• https://dl.acm.org/doi/10.1145/3317550.3321422

Slide 4

Slide 4 text

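Overview and why I chose to read it
• Overview
  • Is the OS facing its "demise" in datacenter networking?!
  • RDMA and DPDK are fast, but they kill abstraction
  • A new I/O abstraction: the Demikernel proposal
• Why I chose to read it
  • Which direction is network acceleration heading?
  • A near-future topic: library OS?
  • The title was catchy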

Slide 5

Slide 5 text

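1. Introduction
• Server I/O speedups over the past 10 years vs. CPU performance
  • TCP offload, SmartNIC/SR-IOV, compression/encryption/ML on FPGA
• Kernel-bypass techniques cut I/O overhead
  • They provide functionality, but no abstraction layer (e.g. socket, file, pipe)
• Distributed memory and distributed storage with RDMA
  • Customizing the system to fit the hardware -> a lot of work!
• How should the OS change? The paper designs and discusses a new OS architecture, the Demikernel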

Slide 6

Slide 6 text

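2. Kernel-Bypass Accelerators in the Datacenter
• Kernel bypass
  • Aims for the fastest possible packet forwarding with minimal kernel overhead
  • No common interfaces or OS features exist
  • Programmers must add OS-equivalent functionality themselves, and must do so per device
• Examples:
  • DPDK: defines basic I/O device functions
  • Arrakis: implemented with hardware virtualization (SR-IOV, etc.)
  • RDMA: specifications exist for the verbs interface and the rdmacm interface (socket-like), but...
  • FPGA: can do anything, but practicality depends on the use case...
https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf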

Slide 7

Slide 7 text

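3. Evolving the Datacenter OS for Kernel Bypass
• Optimizing I/O processing in user space
  • Existing library OSes have sharing and multiplexing mechanisms (but they are heavyweight)
  • Features such as transparent memory allocation (i.e. DDIO, NIC<->LLC) are missing, so they must be reimplemented
• Efficient abstractions
  • Designs (leftovers) from the days when I/O processing was slow vs. today: a Redis read takes 2 us
• Caught between abstraction and performance
  • Keeping the existing POSIX API adds even more overhead
• Functional differences from existing library OSes
  • Existing: assume that kernel interfaces and device capabilities are uniform
  • This work: wants to introduce a kernel-bypass framework (a "best of both worlds" between hardware and software/kernel)
https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf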

Slide 8

Slide 8 text

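4. The Demikernel (1/3)
• Architecture: separation of the control (C) path and the data (D) path
  • C: network/file open, allowed to be slow -> existing kernel
  • D: read/write to network/storage/memory -> LibOS + accelerator
• Abstracting I/O as queues
  • Hardware normally uses queues -> abstract them as they are
  • Elements can be handled as atomic data units (no unnecessary waiting occurs)
• A device-independent, high-level kernel-bypass abstraction layer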
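
To make the "atomic data unit" point on this slide concrete, here is a minimal sketch that contrasts a POSIX-style byte stream with a queue of whole units. The class names `ByteStream` and `IOQueue` are hypothetical and chosen only for illustration; this is not the Demikernel implementation.

```python
from collections import deque


class ByteStream:
    """POSIX-style stream: message boundaries are lost once data is written."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data

    def read(self, n: int) -> bytes:
        out = bytes(self._buf[:n])
        del self._buf[:n]
        return out


class IOQueue:
    """Hypothetical queue abstraction: every element is one atomic data unit."""

    def __init__(self):
        self._units = deque()

    def push(self, unit: bytes) -> None:
        self._units.append(bytes(unit))

    def pop(self):
        # Either a complete unit or None; never a fragment of a unit.
        return self._units.popleft() if self._units else None


stream = ByteStream()
stream.write(b"GET k1")
stream.write(b"GET k2")
partial = stream.read(8)  # this read crosses a message boundary

queue = IOQueue()
queue.push(b"GET k1")
queue.push(b"GET k2")
first = queue.pop()       # always one whole request
```

With the stream, the reader has to re-discover where each request ends; with the queue, each pop hands back exactly what one push supplied.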

Slide 9

Slide 9 text

4. The Demikernel (2/3)
• Syscall interface
  • C: socket(): returns a queue descriptor (not a file descriptor)
  • C: filter() by packet type, etc.: assumes a BPF-like framework
  • C: merge(): merges I/O queues
  • C: sort(): uses I/O queues according to priority
  • C: map(): looks like it could also implement complex, P4-style packet processing
  • D: push/pop
    • Specify the range to operate on
    • Non-blocking; fetch the result with wait_*()
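
The call flow on this slide (socket() returns a queue descriptor, push/pop are non-blocking, results are fetched with wait_*()) can be sketched as a toy single-process model. Everything here is an assumption made for illustration: the class `DemiSketch`, the loopback behavior (a push to a queue feeds pops on the same queue), and a `wait()` that returns immediately with a `(done, data)` pair instead of blocking as a real wait_*() call would.

```python
import itertools
from collections import deque


class DemiSketch:
    """Toy model of the queue-based syscall interface; not the real Demikernel."""

    def __init__(self):
        self._next_qd = itertools.count(1)
        self._next_qt = itertools.count(1)
        self._queues = {}   # queue descriptor -> deque of data units
        self._pending = {}  # qtoken -> (operation, queue descriptor)

    def socket(self) -> int:
        # Control path: returns a queue descriptor, not a file descriptor.
        qd = next(self._next_qd)
        self._queues[qd] = deque()
        return qd

    def push(self, qd: int, unit: bytes) -> int:
        # Data path: non-blocking, hands back a qtoken immediately.
        self._queues[qd].append(bytes(unit))
        return self._register("push", qd)

    def pop(self, qd: int) -> int:
        # Data path: non-blocking request for the next data unit.
        return self._register("pop", qd)

    def _register(self, op: str, qd: int) -> int:
        qt = next(self._next_qt)
        self._pending[qt] = (op, qd)
        return qt

    def wait(self, qt: int):
        """Return (done, data); for a completed pop, data is the unit itself."""
        op, qd = self._pending[qt]
        if op == "push":
            del self._pending[qt]
            return True, None
        if not self._queues[qd]:
            return False, None  # nothing has arrived; the token stays valid
        del self._pending[qt]
        return True, self._queues[qd].popleft()


demi = DemiSketch()
qd = demi.socket()
pop_token = demi.pop(qd)        # issued before any data exists
early = demi.wait(pop_token)    # not complete yet
push_token = demi.push(qd, b"hello")
pushed = demi.wait(push_token)
result = demi.wait(pop_token)   # the data comes back from wait itself
```

The point of the sketch is the shape of the interface: each operation gets its own token, and the wait call that reports completion also delivers the data.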

Slide 10

Slide 10 text

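4. The Demikernel (3/3)
• qtoken: unique to each individual queue operation
  • Can improve on epoll
  • wait_*() returns the data directly -> no additional syscall is needed (and no wasted calls that come back empty)
  • pop completion: the thread wakes up when the pop completes; no busy polling is needed
• Zero copy:
  • 1. Transparent memory allocation: the LibOS performs the IOMMU memory registration
  • 2. Keep coordination of shared memory between the application and the I/O device to a minimum
• Free protection: when the application frees a buffer, the LibOS waits until the I/O has finished before releasing it
• (As in existing systems) there is no write protection -> the application must wait for the I/O to finish before modifying (writing to) the buffer
• The authors argue this is not a problem for datacenter use. Example: for each put request, Redis allocates a buffer and then stores a pointer to it in its data structure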
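
The free-protection rule on this slide (a buffer freed by the application is only released once its I/O has finished) can be sketched with a small bookkeeping class. The name `FreeProtectedPool` and its methods are hypothetical; a real LibOS would track this inside its memory allocator, which is not modeled here.

```python
class FreeProtectedPool:
    """Toy model of free protection: frees are deferred while I/O is in flight."""

    def __init__(self):
        self._in_flight = {}           # buffer id -> number of outstanding I/Os
        self._free_requested = set()   # buffers the application already freed
        self.released = []             # buffers actually given back, in order

    def start_io(self, buf_id: str) -> None:
        self._in_flight[buf_id] = self._in_flight.get(buf_id, 0) + 1

    def complete_io(self, buf_id: str) -> None:
        self._in_flight[buf_id] -= 1
        if self._in_flight[buf_id] == 0:
            del self._in_flight[buf_id]
            # A free that arrived during the I/O is carried out now.
            if buf_id in self._free_requested:
                self._free_requested.discard(buf_id)
                self.released.append(buf_id)

    def free(self, buf_id: str) -> None:
        if buf_id in self._in_flight:
            self._free_requested.add(buf_id)  # defer until the I/O completes
        else:
            self.released.append(buf_id)      # no I/O pending: release now


pool = FreeProtectedPool()
pool.start_io("buf-A")
pool.free("buf-A")                       # the device may still read buf-A
released_before = list(pool.released)
pool.complete_io("buf-A")
released_after = list(pool.released)
```

Note that this only covers frees. As the slide says, writes are not protected, so the application still has to wait for completion before modifying a buffer it pushed.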

Slide 11

Slide 11 text

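5. Future Work
• OS Design
  • The LibOS fills in features that a given accelerator lacks (for DPDK, the entire network stack)
  • When there are many kinds of accelerators, the LibOS carries code for all of them (!). So what is a LibOS...?
• Network Protocols
  • Aim for a more general way of splitting I/O into data units.
  • With existing frameworks (TCP, HTTPS, etc.) the receiver can reconstruct the units, but generality is limited
• File System and Storage
  • Existing file systems (ext4, etc.) carry too much overhead to use in a LibOS (single application)
  • A file system suited to accelerators?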
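
As an illustration of the Network Protocols point, one common way a receiver can reconstruct data units on top of a byte-stream transport such as TCP is length-prefixed framing. The sketch below is an assumption for illustration only: the 4-byte big-endian header and the function names `frame` and `deframe` are not taken from the paper.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed format)


def frame(unit: bytes) -> bytes:
    """Prefix one data unit with its length so the receiver can find its end."""
    return HEADER.pack(len(unit)) + unit


def deframe(stream: bytes):
    """Split a received byte stream into complete units plus leftover bytes."""
    units, offset = [], 0
    while offset + HEADER.size <= len(stream):
        (length,) = HEADER.unpack_from(stream, offset)
        end = offset + HEADER.size + length
        if end > len(stream):
            break  # the last unit has not fully arrived yet
        units.append(stream[offset + HEADER.size:end])
        offset = end
    return units, stream[offset:]


# One complete unit followed by the first 5 bytes of a second one.
wire = frame(b"put k1") + frame(b"put k2")[:5]
units, rest = deframe(wire)
```

Framing like this ties the unit boundaries to one protocol's wire format, which is the generality limit the slide mentions.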

Slide 12

Slide 12 text

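6. Related Work
• OS
  • A higher-level interface than Arrakis and IX
  • User-level OS extensions hide the hardware -> no OS functionality such as a network stack
• I/O Accelerated System
  • Keeping the POSIX interface makes things inefficient. Example: mTCP has higher latency than DPDK
  • A trend toward doing network/TCP processing in the PMD/NIC, or moving congestion control out of the OS (QUIC, etc.)
• I/O Accelerated Application
  • Low-latency applications that access remote memory over RDMA, etc.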

Slide 13

Slide 13 text

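7. Conclusion
• Kernel-bypass accelerators are used to keep up with the large gains in I/O performance
• Because of kernel bypass, OS/kernel features are unavailable and I/O cannot be abstracted
• The paper designs a LibOS that can fill this gap and discusses I/O abstraction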

Slide 14

Slide 14 text

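Three-line summary
• Kernel-bypass accelerators are used to keep up with the large gains in I/O performance
• Because of kernel bypass, OS/kernel features are unavailable and I/O cannot be abstracted
• The paper designs a LibOS that can fill this gap and discusses I/O abstraction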

Slide 15

Slide 15 text

EoP