cafenero_777
June 19, 2023
#29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
HotOS '19
https://dl.acm.org/doi/10.1145/3317550.3321422
Transcript
Research Paper Introduction #29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
(#84 overall) @cafenero_777 2021/10/14
Agenda
• Target paper
• Summary and why I read it
1. Introduction
2. Kernel-Bypass Accelerators in the Datacenter
3. Evolving the Datacenter OS for Kernel Bypass
4. The Demikernel
5. Future Work
6. Related Work
7. Conclusion

Target paper
• I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era
• Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam
• Microsoft Research, University of Wisconsin, University of Texas
• HotOS '19
• https://dl.acm.org/doi/10.1145/3317550.3321422

Summary and why I read it
• Summary
  • Is the OS facing its "demise" in datacenter networking?!
  • RDMA/DPDK are fast but kill abstraction
  • Proposes a new I/O abstraction: Demikernel
• Why I read it
  • Which direction is network acceleration heading?
  • A near-future topic: library OS?
  • The title was catchy
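1. Introduction
• Server I/O speedups over the past 10 years vs. CPU performance
  • TCP offload, SmartNIC/SR-IOV, compression/encryption/ML on FPGA
• Kernel-bypass techniques reduce I/O overhead
  • They provide functionality, but no abstraction layer
  • (e.g. socket, file, pipe)
  • Distributed memory and distributed storage with RDMA
  • The system has to be customized to fit the hardware -> painful!
• How should the OS change? The paper designs a new OS architecture, Demikernel, and discusses it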
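2. Kernel-Bypass Accelerators in the Datacenter
• Kernel bypass
  • Aims for the fastest packet forwarding with minimal kernel overhead
  • No common interface or OS functionality exists
  • Programmers add OS-equivalent functionality themselves, and do so again for each device
• Examples:
  • DPDK: specifies basic I/O device functionality
  • Arrakis: implemented with hardware virtualization (SR-IOV)
  • RDMA: specifications exist for the verbs interface and the rdmacm interface (~socket), but...
  • FPGA: can do anything, but practicality depends on the use case...
Figure source: https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf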
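3. Evolving the Datacenter OS for Kernel Bypass
• Optimizing I/O processing in user space
  • Existing library OSes have sharing and multiplexing mechanisms (but they are heavy)
  • There is no transparent memory allocation (i.e. DDIO, NIC<->LLC), so it has to be reimplemented
• Efficient abstractions
  • Designs (and their leftovers) from the days when I/O was slow vs. today: Redis handles a read in 2 us
• Caught between abstraction and performance
  • Keeping the existing POSIX API adds even more overhead
• How the functionality differs from existing library OSes
  • Existing: assume the kernel interface and device capabilities are uniform
  • This work: wants to introduce a kernel-bypass framework (a "best of both worlds" mix of hardware and software/kernel)
Figure source: https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf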
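4. The Demikernel (1/3)
• Architecture: separate the control (C) path from the data (D) path
  • C: network/file open; allowed to be slow -> existing kernel
  • D: read/write on network/storage/memory -> LibOS + accelerator
• Abstract I/O as queues
  • Hardware normally uses queues -> expose them as the abstraction as-is
  • Data can be handled as atomic data units (no unnecessary waiting occurs)
  • A device-independent, high-level kernel-bypass abstraction layer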
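A minimal sketch of why "atomic data unit" matters, assuming a toy in-memory model rather than any real Demikernel API. The names `stream_read`, `first_read`, and `first_pop` are hypothetical and exist only for illustration. A byte stream loses message boundaries, so the reader has to re-frame the data; a queue hands back each pushed unit whole.

```python
from collections import deque

messages = [b"SET k1 v1", b"GET k1"]

# Stream-style channel: bytes are concatenated, so message boundaries vanish.
stream = bytearray()
for m in messages:
    stream += m


def stream_read(buf, n):
    """Return up to n bytes from the front of buf, like read() on a stream."""
    chunk = bytes(buf[:n])
    del buf[:n]
    return chunk


# A 12-byte read straddles two messages; the reader must re-frame them.
first_read = stream_read(stream, 12)

# Queue-style channel: each pushed element is popped as one whole unit.
queue = deque()
for m in messages:
    queue.append(m)
first_pop = queue.popleft()

print(first_read)  # contains all of message 1 plus the start of message 2
print(first_pop)   # exactly message 1
```

With the stream, the application needs its own framing (length prefixes, delimiters) before it can act on a request; with the queue, one pop yields one request.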
4. The Demikernel (2/3)
• Syscall interface
  • C: socket(): returns a queue descriptor (not a file descriptor)
  • C: filter() by packet type: assumes a BPF-like framework
  • C: merge(): merges I/O queues
  • C: sort(): uses I/O queues according to priority
  • C: map(): looks like it could also implement complex, P4-style packet processing
  • D: push/pop
    • Specify the range to operate on
    • Non-blocking; fetch the result with wait_*()
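The push/pop/wait flow above can be sketched as a toy, single-process model. This is an assumption-laden illustration, not the Demikernel API: the class `ToyLibOS` and its method names and signatures are hypothetical, and the "device" is just an in-memory deque. It shows only the shape of the interface: a queue descriptor, non-blocking push/pop that return a token, and a wait call that returns the data itself.

```python
from collections import deque
from itertools import count


class ToyLibOS:
    """In-memory stand-in for a libOS. Every name here is hypothetical."""

    def __init__(self):
        self._queues = {}    # queue descriptor -> deque of data units
        self._pending = {}   # qtoken -> (operation, queue descriptor)
        self._next_qd = count(0)
        self._next_qt = count(100)

    def queue(self):
        """Control path: create an I/O queue and return its descriptor."""
        qd = next(self._next_qd)
        self._queues[qd] = deque()
        return qd

    def push(self, qd, unit):
        """Data path: enqueue one data unit; returns a qtoken immediately."""
        self._queues[qd].append(unit)
        qt = next(self._next_qt)
        self._pending[qt] = ("push", qd)
        return qt

    def pop(self, qd):
        """Data path: request one data unit; returns a qtoken immediately."""
        qt = next(self._next_qt)
        self._pending[qt] = ("pop", qd)
        return qt

    def wait(self, qt):
        """Complete the operation named by qt and return its result."""
        op, qd = self._pending[qt]
        result = None
        if op == "pop":
            if not self._queues[qd]:
                # A real libOS would block or poll; the toy just reports it.
                raise BlockingIOError("no data unit available yet")
            result = self._queues[qd].popleft()
        del self._pending[qt]
        return result


libos = ToyLibOS()
qd = libos.queue()
pop_token = libos.pop(qd)                      # returns at once, no data yet
push_token = libos.push(qd, [b"GET", b"key"])  # one scatter-gather style unit
libos.wait(push_token)
print(libos.wait(pop_token))                   # the whole unit, in one call
```

Because wait returns the popped unit directly, the caller does not need a separate read call after being told the queue is ready, which is the contrast with epoll drawn on the next slide.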
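4. The Demikernel (3/3)
• qtoken: unique to each individual queue operation
• Can improve on epoll
  • wait_*() returns the data -> no other syscall has to be made (no wasted calls)
  • pop completion: a thread wakes up when the pop completes; no busy polling needed
• Zero copy:
  • 1. Transparent memory allocation: the LibOS performs IOMMU memory registration
  • 2. Reduce coordination of memory shared between the application and the I/O device as much as possible
• Free protection: the application issues a free on a buffer -> the LibOS waits until the I/O finishes before freeing it
  • (As before) there is no write protection -> modifying (writing) a buffer has to wait for the I/O
  • The paper argues this is not a problem for datacenter use. Example: Redis allocates a buffer for each put request and then stores a pointer to it in its data structure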
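The free-protection idea above can be sketched with a simple in-flight counter. This is a minimal sketch under stated assumptions: `ProtectedBuffer` and its methods are hypothetical names, and the real mechanism inside a libOS allocator may differ. The point shown is only the ordering: an application-level free during in-flight I/O is deferred until the I/O completes.

```python
class ProtectedBuffer:
    """Toy buffer whose release is deferred while I/O is in flight.

    Hypothetical illustration; not the Demikernel allocator.
    """

    def __init__(self, data):
        self.data = bytearray(data)
        self.inflight = 0        # number of I/O operations still using data
        self.app_freed = False   # the application has asked to free it
        self.released = False    # the memory has actually been given back

    def _maybe_release(self):
        # Release only once the app has freed it AND no I/O is using it.
        if self.app_freed and self.inflight == 0 and not self.released:
            self.data = None
            self.released = True

    def start_io(self):
        """Called by the libOS when the device starts using the buffer."""
        self.inflight += 1

    def complete_io(self):
        """Called by the libOS when the device is done with the buffer."""
        self.inflight -= 1
        self._maybe_release()

    def free(self):
        """Called by the application; may return before memory is released."""
        self.app_freed = True
        self._maybe_release()


buf = ProtectedBuffer(b"value")
buf.start_io()       # e.g. a push of this buffer is in progress
buf.free()           # the app frees early; the device still needs the bytes
print(buf.released)  # still held
buf.complete_io()
print(buf.released)  # released now that the I/O has finished
```

Nothing here stops the application from writing to `buf.data` while the I/O is in flight, which matches the slide's note that there is no write protection.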
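5. Future Work
• OS design
  • The LibOS fills in whatever a given accelerator lacks (for DPDK, the entire network stack)
  • With many kinds of accelerators, the LibOS ends up carrying code for all of them (!) So what is a LibOS...?
• Network protocols
  • Aim for a more general way of splitting I/O into data units
  • With existing frameworks (TCP, HTTPS, etc.) the receiver can reassemble the data, but generality is limited
• File system and storage
  • Using an existing FS (ext4, etc.) from a LibOS (single application) costs too much overhead
  • A file system suited to accelerators?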
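6. Related Work
• OS
  • An interface at a higher level of abstraction than Arrakis and IX
  • User-level OS extensions hide the hardware -> but provide no OS functionality such as a network stack
• I/O accelerated systems
  • Keeping the POSIX interface makes them inefficient. Example: mTCP has higher latency than DPDK
  • The trend is to do network/TCP processing in the PMD/NIC, or to move control out of the OS (QUIC, etc.)
• I/O accelerated applications
  • Low-latency applications that use remote memory over RDMA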
7. Conclusion (three-line summary)
• Use kernel-bypass accelerators to keep up with large gains in I/O performance
• Kernel bypass means OS/kernel features are unavailable and there is no I/O abstraction
• The paper designs a LibOS that can fill this gap and discusses I/O abstractions