cafenero_777
June 19, 2023
#29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
HotOS '19
https://dl.acm.org/doi/10.1145/3317550.3321422
Transcript
Research Paper Introduction #29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
(cumulative #84) @cafenero_777 2021/10/14
Agenda
• Target paper
• Overview and why I read it
1. Introduction
2. Kernel-Bypass Accelerators in the Datacenter
3. Evolving the Datacenter OS for Kernel Bypass
4. The Demikernel
5. Future Work
6. Related Work
7. Conclusion
Target paper
• I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era
• Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam
• Microsoft Research, University of Wisconsin, University of Texas
• HotOS '19
• https://dl.acm.org/doi/10.1145/3317550.3321422
Overview and why I read it
• Overview
  • Is the OS heading for its “demise” in datacenter networking use?!
  • RDMA and DPDK are fast, but they kill abstraction
  • A new I/O abstraction: the Demikernel proposal
• Why I read it
  • Which direction is network acceleration heading?
  • The near future: library OS?
  • The title was catchy
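1. Introduction
• Server I/O speedups over the past 10 years vs. CPU performance
  • TCP offload, SmartNIC/SR-IOV, compression/encryption/ML on FPGA
• Kernel-bypass techniques reduce I/O overhead
  • They provide functionality, but no abstraction layer
  • (e.g. socket, file, pipe)
• Distributed memory and distributed storage with RDMA
  • Customizing the system to fit the hardware -> a lot of work!
• How should the OS change? The paper discusses the design through a new OS architecture, the Demikernel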
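2. Kernel-Bypass Accelerators in the Datacenter
• Kernel bypass
  • Aims for the fastest packet forwarding with minimal kernel overhead
  • No OS interfaces or OS features are available
  • Programmers must add OS-equivalent functionality themselves, and add it per device
• Examples:
  • DPDK: defines basic I/O device functions
  • Arrakis: implemented with hardware virtualization (SR-IOV)
  • RDMA: specified interfaces exist, such as the verbs I/F and the rdmacm I/F (socket-like), but...
  • FPGA: can do anything, but practicality depends on the use case...
https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf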
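3. Evolving the Datacenter OS for Kernel Bypass
• Optimizing I/O processing in user space
  • Existing library OSes have mechanisms for sharing and multiplexing (but they are heavyweight)
  • There is no transparent memory allocation (i.e. DDIO, NIC<->LLC), so it has to be reimplemented
• Efficient abstractions
  • Designs left over from when I/O was slow vs. today: Redis handles a read in 2 us
• Caught between abstraction and performance
  • Preserving the existing POSIX API adds further overhead
• Differences from existing library OSes
  • Existing: assume the kernel interface and device capabilities are uniform
  • This work: wants to introduce a kernel-bypass framework (taking the best of both the hardware and the software/kernel)
https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf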
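4. The Demikernel (1/3)
• Architecture: separate control (C) and data (D) paths
  • C: network/file open; may be slow -> existing kernel
  • D: read/write of network/storage/memory -> LibOS + accelerator
• Abstract I/O as queues
  • Hardware normally uses queues -> abstract them as they are
  • Elements can be treated as atomic data units (no unnecessary waiting occurs)
  • A device-independent, high-level abstraction layer for kernel bypass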
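The "atomic data unit" point on this slide can be illustrated with a small sketch. This is a toy in-memory model, not Demikernel code: `ByteStream` and `IOQueue` are hypothetical names introduced here, and the request strings are made up. It only shows why a queue of whole elements spares the reader the re-framing work that a byte stream forces on it.

```python
from collections import deque


class ByteStream:
    """POSIX-style pipe or stream socket: bytes only, message boundaries are lost."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data

    def read(self, n: int) -> bytes:
        out = bytes(self._buf[:n])
        del self._buf[:n]
        return out


class IOQueue:
    """Queue of atomic data units: each pop returns one whole pushed element."""

    def __init__(self) -> None:
        self._items = deque()

    def push(self, element: bytes) -> None:
        self._items.append(bytes(element))

    def pop(self):
        # Return one complete element, or None when the queue is empty.
        return self._items.popleft() if self._items else None


stream = ByteStream()
stream.write(b"GET a")
stream.write(b"PUT b=1")
chunk = stream.read(8)  # spans two requests; the reader must re-frame them

queue = IOQueue()
queue.push(b"GET a")
queue.push(b"PUT b=1")
first = queue.pop()   # exactly the first request
second = queue.pop()  # exactly the second request
```

With the stream, a fixed-size read can return the tail of one request glued to the head of the next; with the queue, each pop hands back one request as pushed.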
4. The Demikernel (2/3)
• Syscall interface
  • C: socket(): returns a queue descriptor (not a file descriptor)
  • C: filter() by packet type: assumes a BPF-like framework
  • C: merge(): merges I/O queues
  • C: sort(): orders an I/O queue by priority
  • C: map(): looks able to express complex, P4-like packet processing
  • D: push/pop
    • Specify the range of data to operate on
    • Non-blocking; results are fetched with wait_*()
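The interface listed above could be modeled roughly as follows. This is a minimal in-memory sketch under stated assumptions: `ToyLibOS`, its method signatures, and the integer descriptors are hypothetical and are not the paper's actual API; only `filter`, `push`, `pop` and a single-shot `wait` are modeled, and `wait` polls once instead of blocking.

```python
import itertools
from collections import deque


class ToyLibOS:
    """In-memory stand-in for a queue-based libOS interface (hypothetical names)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._sources = {}  # qd -> callable returning the next element or None
        self._sinks = {}    # qd -> callable accepting one element
        self._ops = {}      # qtoken -> callable returning (done, element)

    # Control path: create and compose queues.
    def queue(self) -> int:
        items = deque()
        qd = next(self._ids)
        self._sources[qd] = lambda: items.popleft() if items else None
        self._sinks[qd] = items.append
        return qd

    def filter(self, qd: int, predicate) -> int:
        source = self._sources[qd]

        def pull():
            # Skip (and drop) elements that do not match the predicate.
            while (item := source()) is not None:
                if predicate(item):
                    return item
            return None

        new_qd = next(self._ids)
        self._sources[new_qd] = pull
        self._sinks[new_qd] = self._sinks[qd]
        return new_qd

    # Data path: non-blocking push/pop that return a qtoken.
    def push(self, qd: int, element: bytes) -> int:
        self._sinks[qd](element)  # the toy completes pushes immediately
        qt = next(self._ids)
        self._ops[qt] = lambda: (True, None)
        return qt

    def pop(self, qd: int) -> int:
        source = self._sources[qd]

        def attempt():
            item = source()
            return item is not None, item

        qt = next(self._ids)
        self._ops[qt] = attempt
        return qt

    def wait(self, qt: int):
        # Unlike a real wait_*(), this polls once instead of blocking.
        done, item = self._ops[qt]()
        if done:
            del self._ops[qt]
        return done, item


libos = ToyLibOS()
qd = libos.queue()  # a queue descriptor, not a file descriptor
puts = libos.filter(qd, lambda e: e.startswith(b"PUT"))
libos.push(qd, b"GET a")
libos.push(qd, b"PUT b=1")
done, data = libos.wait(libos.pop(puts))  # the result arrives with the wait call
```

The design point the sketch tries to capture is that the data comes back from the wait call itself, so no second call is needed to fetch it once the operation is ready.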
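4. The Demikernel (3/3)
• qtoken: unique to each individual queue operation
• Can improve on epoll
  • wait_*() returns the data itself -> no further syscall (and no wasted call that comes back empty) is needed
  • pop completion: a thread wakes up when the pop completes, so no busy polling is needed
• Zero copy:
  • 1. Transparent memory allocation: the LibOS performs the IOMMU memory registration
  • 2. Reduce coordination over memory shared between the application and the I/O device as much as possible
  • Free protection: the application issues a buffer free -> the LibOS waits until the I/O finishes, then frees
  • (As before) no write protection -> modifying a buffer (write) must wait for the I/O
  • The paper argues this is not a problem for datacenter use. Example: Redis allocates a buffer for each put request, then stores a pointer to it in its data structure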
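The free-protection rule on this slide can be sketched as a deferred release. This is a toy model, not the Demikernel allocator: `Buffer`, `ToyAllocator`, and the `io_start`/`io_complete` hooks are hypothetical names, and the count of in-flight operations is an assumed mechanism chosen for illustration. Write protection is deliberately not modeled, matching the slide.

```python
class Buffer:
    """An application buffer that may be referenced by in-flight I/O."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.in_flight = 0           # I/O operations still using the buffer
        self.free_requested = False  # the application has called free()
        self.released = False        # memory actually returned


class ToyAllocator:
    """Defers release of a buffer until no I/O operation references it."""

    def io_start(self, buf: Buffer) -> None:
        buf.in_flight += 1

    def io_complete(self, buf: Buffer) -> None:
        buf.in_flight -= 1
        self._maybe_release(buf)

    def free(self, buf: Buffer) -> None:
        buf.free_requested = True
        self._maybe_release(buf)

    @staticmethod
    def _maybe_release(buf: Buffer) -> None:
        # Release only when the app asked for it and no I/O is pending.
        if buf.free_requested and buf.in_flight == 0:
            buf.released = True


alloc = ToyAllocator()
buf = Buffer(b"value")
alloc.io_start(buf)            # e.g. a push() that has not completed yet
alloc.free(buf)                # the application is done with the buffer
released_early = buf.released  # still False: I/O is in flight
alloc.io_complete(buf)
released_late = buf.released   # True: released once the I/O finished
```

The application can call free as soon as it is finished with the buffer, and the release happens only after the device is done with it.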
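5. Future Work
• OS Design
  • The LibOS fills in whatever a particular accelerator lacks (for DPDK, the entire network stack)
  • With many kinds of accelerators, the LibOS ends up carrying code for all of them (!) So what is a LibOS...?
• Network Protocols
  • Aim for a more general way of dividing I/O into data units
  • With existing frameworks (TCP, HTTPS, etc.) the receiver can reassemble the units, but generality is limited
• File System and Storage
  • Using an existing file system (ext4, etc.) from a LibOS (single application) carries too much overhead
  • A file system suited to accelerators?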
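6. Related Work
• OS
  • A higher-level interface than Arrakis and IX
  • User-level OS extensions hide the hardware -> but lack OS features such as a network stack
• I/O Accelerated System
  • Keeping the POSIX interface makes things inefficient. Example: mTCP has higher latency than DPDK
  • Directions that do network/TCP processing in the PMD/NIC, or move control out of the OS (QUIC, etc.)
• I/O Accelerated Application
  • Low-latency remote-memory applications using RDMA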
7. Conclusion (three-line summary)
• Kernel-bypass accelerators are used to keep up with large gains in I/O performance
• Because they bypass the kernel, OS/kernel features are unavailable and I/O cannot be abstracted
• The paper designs a LibOS that can fill this gap and discusses I/O abstraction
EoP