Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GPU on OpenStack日本語版
Search
masafumi_ohta
May 15, 2017
Technology
0
750
GPU on OpenStack日本語版
日本語バージョン作りました。若干訂正も加えました。
masafumi_ohta
May 15, 2017
Tweet
Share
More Decks by masafumi_ohta
See All by masafumi_ohta
COSCUP25 Intro at OSC Tokyo Spring 25, at Komazawa Univ.
masafumi_ohta
0
22
Deepdiving to Raspberry Pi 5
masafumi_ohta
0
36
树莓派的历史、相关信息及使用案例
masafumi_ohta
0
410
海外カンファレンスのCFPの正しい書き方
masafumi_ohta
4
570
GPD.pdf
masafumi_ohta
0
71
GPD MicroPCのご紹介
masafumi_ohta
0
170
3大あくじょ考察
masafumi_ohta
0
480
GPU on OpenStack GPUインターナルクラウドのベストプラクティス
masafumi_ohta
0
310
これからのSIerに必要なこと
masafumi_ohta
1
530
Other Decks in Technology
See All in Technology
AI エージェント活用のベストプラクティスと今後の課題
asei
2
430
useEffectってなんで非推奨みたいなこと言われてるの?
maguroalternative
9
5.9k
進化の早すぎる生成 AI と向き合う
satohjohn
0
460
原理から解き明かす AIと人間の成長 - Progate BAR
teba_eleven
2
270
Modern Data Stack大好きマンが語るSnowflakeの魅力
sagara
0
220
MS Ignite 2025で発表されたFoundry IQをRecap
satodayo
2
190
ローカルLLM基礎知識 / local LLM basics 2025
kishida
26
12k
"なるべくスケジューリングしない" を実現する "PreferNoSchedule" taint
superbrothers
0
130
re:Invent2025とAWS Builder Cards Resilience Expansionのご紹介
tsuwa61
1
130
[続・営業向け 誰でも話せるOCI セールストーク] AWSよりOCIの優位性が分からない編(2025年11月21日開催)
oracle4engineer
PRO
1
200
機械学習を「社会実装」するということ 2025年冬版 / Social Implementation of Machine Learning November 2025 Version
moepy_stats
4
2k
DGX SparkでローカルLLMをLangChainで動かした話
ruzia
1
210
Featured
See All Featured
Rails Girls Zürich Keynote
gr2m
95
14k
Why Our Code Smells
bkeepers
PRO
340
57k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
690
Code Review Best Practice
trishagee
73
19k
Documentation Writing (for coders)
carmenintech
76
5.2k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
3k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Building Adaptive Systems
keathley
44
2.8k
The Cost Of JavaScript in 2023
addyosmani
55
9.3k
Visualization
eitanlees
150
16k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
360
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Transcript
GPU on OpenStack ͓͓ͨɹ·͞;Έ @masafumiohta
͓લͩΕʁ> SIerۈ ͍͔ͭ͘ͷOpenStack PoCʹࢀՃɺٴͼΞυόΠε OpenStackͷఏҊɾޚ૬ஊঝͬͯ·͢ɻ OpenStackίʔυಡΜͰमਖ਼ͨ͠Γͱ͔… ଓ͖WebͰ https://jp.linkedin.com/in/ohtamasafumi
͡Ίʹ OpenStackಛघར༻ Sahara(HadoopͷOpenStack)ͳͲͳͲ ΄ͱΜͲ͕·ͱΊΒΕͯͳ͙ͬͯ͘ௐΔ͔͠ͳ͍ ͍ΘΏΔʰυΩϡϝϯτϩετʱ docs.openstack.orgɹʹ͋͛ͯ΄͍͠ɻ
GPU on OpenStackͱ
GPUτϨϯυ GPUϝχʔίΞత ֤MPUίΞҰݸ͋ͨΓখ͍͕͘͞ز͔ͭͷܭࢉ (MATLABܥͳͲ)Ͱ༗ޮ GPUͷμΠूੵʹର͢ΔলిྗHPCϢʔβʹ ͱͬͯେ͖͍ɻ ίϯύΫτͳγεςϜলిྗɾলεϖʔε͕ඞਢͳຊ ͷHPCγεςϜʹେ͖͍
GPUͲ͏ͬͯ͏ͷ͔ʁ LinuxͷPCIύεεϧʔٕज़Ͱ͏ AWSଟ͜ͷϕʔε KVMʹґଘ VSphereGPUίΞͷεϓϦοτ͕Մೳ VSphereXenͷΑ͏ͳGPUεϓϦοτՄೳ͔ʁ ݱߦͰ͖ͳ͍͕Nvidia͕αϙʔτ༧ఆ
GPU Docker DockerΛ͔ͭͬͨGPUͷར༻ ίΞׂͰ͖ͳ͍͕λεΫʹԠͯ͡GPUίΞར༻Λ ࣗಈར༻ׂɻ ࠷ۙͷGPU on OpenStackͱ͍͑͜ͷGPU Dockerར ༻
ͨͩɺύεεϧʔ+KVMΑΓύϑΥʔϚϯεͰͳ͍
GPU OpenStack୭ͷͨΊ HPCͷςϯϙϥϦར༻ ͪΐͬͱܭࢉͬͯऴΘͬͨΒVM͝ͱյ͢ ͍͔ͭ͘ͷVMΛ͔ͭͬͯ؆୯ʹHPCάϦουͱ͔.. EC2ͷΑ͏ʹHPCΛͬͨΓ͢Δ ෦ར༻Ͱ͏ɺಛʹۀͰEC2ͳͲΫϥυʹग़ͪ͠Ό ͍͚ͳ͍ਓ͚
GPU on OpenStackΛ͏
PCIύεεϧʔ(1) PCIσόΠεΛμΠϨΫτʹVMʹͭͳ͙ ཧϗετΑΓ͔ΒPCIσόΠεΛΓ͢ඞཁ͕͋Δɻ KVMͷػೳʹࠨӈ͞ΕΔ͜ͱʹͳ͍ͬͯΔɻ OpenStackͱؔͳ͍ ҰͭͷσόΠεʹҰͭͷVM GPUࣗମ֤VMʹର͠ڞ༗͓ΑͼׂෆՄೳ ͋͘·ͰKVMͷ੍ݶɺOpenStackͰͳ͍ɻ
PCIύεεϧʔ(2) RedhatͰެࣜαϙʔτ ͔͠͠ར༻ੵۃਪ͍ͯ͠ͳ͍ʢ͍͋͠ʁ) ࣮ࡍRedhatͰϋϚΔ.. UbuntuͰυΩϡϝϯτԽ͞Ε͍ͯͳ͍ άάͬͯใΛ͕͞͞ͳ͖Ό͍͚ͳ͍ɻ
K Linux OS for KVM hypervisor GPU Driver App Instance
VMM/KVM IOMMU/Vt-d PCI Express x16 Linux/Win OS ComputeNode GPU Card GPU Card Nova Compute Nova Scheduler AMQP Nova API Linux OS ControllerNode ਤ:GPUύεεϧʔͲ͏ͬͯOpenStackͰಈ͔͘ʢҹ͕Πϯελϯεൃߦϓϩηεʣ
KVMϗετͷGPU ϗετͷGPUΛνΣοΫ͢Δ: lspci -nn | grep -i nvidia lspci -nn
| grep -i nvidia 88:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:11b4] (rev a1) 88:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1) GPUͷϢχοτ͝ͱύεεϧʔ͢Δඞཁ͕͋Δ GPUϗετ͚ͩͰͳ͘HDMIͳͲϢχοτ͝ͱΓ͢ඞཁ͋Γ Ͱͳ͍ͱVM্Ͱಈ͍ͯ͘Εͳ͍=ύεεϧʔͰ͖ͳ͍
GPUͷϙʔτΛΈΔ GPUʹHDMIϏσΦͱΦʔσΟΦ͕͋ΔͷͰ ͜Ε͝ͱύεεϧʔ͢Δඞཁ͕༗Δ
IOMMUͷηοτΞοϓ ཧσόΠεΛ͏ͨΊͷԾγεςϜʹ͓͍ͯ IOMMU(Input/Output Memory Management Unit) ηοτΞο ϓඞਢ ͪΖΜ intel
vt-d (I/OԾԽ)Φϯʹ (EFI/BIOSνΣοΫ) grubͰͷηοτΞοϓඞཁ:/etc/default/grubΛมߋهड़ GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1”
pci-stub pci-stub ཧPCIσόΠεΛϗετͰར༻Ͱ͖ͳ͍Α͏ʹ͢Δ ͨΊʹઃఆ͢Δ /etc/moduleʹσϑΥϧτͰઃఆ͞Ε͍ͯͳ͍ͷͰઃఆ͢Δɻؔ ࿈͢Δίϯϙʔωϯτઃఆ͢Δ (vfio,kvm)ɻ pci_stub vfio vfio_iommu_type1
vfio_pci kvm kvm_intel
VFIO(1) ύεεϧʔ͞ΕΔͷΛཧσόΠε͔ΒΓͨ͢ ΊʹVFIO(Virtual Function IO)্ʹՃ͢Δඞཁ͕͋Δɻ ͜ΕΒͷσόΠεΛramfsͰೝࣝ͞ΕΔͷΛ͙ /etc/initramfs-tools/modules to initramfs (ubuntu)
echo ‘pci_stub ids=10de:11b4,10de:0e0a’ >> /etc/initramfs-tools/modules sudo update-initramfs -u && sudo reboot
VFIO(2) ͜ΕΒͷσόΠε·ͨbootͷࡍʹೝࣝ͞Εͳ͍ Α͏ʹ͢Δɻͯ͢ͷboot upγʔέϯεͰཧσό ΠεͰͷೝࣝΛࢭ͢Δɻ /etc/modprobe.d/blacklist.conf ʹՃ: blacklist nvidia blacklist
nvidia-uvm NvidiaޓυϥΠόೝࣝͤ͞ͳ͍Α͏ʹՃ blacklist nouveau
ཧ͔ΒΓ͢ pci-stubΛ͔ͭͬͯཧσόΠεΑΓΓ͠VMͱ͚ͬͭ͘Δɻσ όΠεIDΛnew_idʹՃ͢Δɻ·ͨؔ࿈ࣝผࢠΛΓ͠ɺVMͱ ͚ͬͭ͘Δɻ echo 11de 11b4 > /sys/bus/pci/drivers/pci-stub/new_id echo
11de 0e0a > /sys/bus/pci/drivers/pci-stub/new_id echo 0000:88:00.0 > /sys/bus/pci/devices/0000:88:00.0/driver/unbind echo 0000:88:00.1 > /sys/bus/pci/devices/0000:88:00.1/driver/unbind echo 0000:88:00.0 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:88:00.1 > /sys/bus/pci/drivers/pci-stub/bind ཧϚγϯ͔ΒΓ͞Ε͔ͨɺ’claimed’ʹͳ͍ͬͯΔ͔ௐΔɻ pci-stub 0000:88:00.1: claimed by stub
modprobe /etc/modprobe.d/blacklist.conf pci-stab /sys/bus/pci/drivers/pci-stub/ /sys/bus/pci/devices/$(Identifier)/driver/unbind ramfs /etc/initramfs-tools/modules GRUB /etc/default/grub modules
/etc/modules UEFI/BIOS Vt-d ਤ:GPUύεεϧʔͰComputeNodeͷBOOTϓϩηε্Ͱઃఆ͢Δͷ B O O T ϓ ϩ η ε IOMMU IOMMU BLACK LIST BLACK LIST IOMMU BLACK LIST
ཧσόΠε pci-stub (ԾͰ͏) GPUϢχοτ (σόΠεશମ) GPUΛཧσόΠε͔ΒΓ͠ɺ ԾσόΠεʹ͚ସ͑Δɻ echo 11de 11b4
> /sys/bus/pci/drivers/pci-stub/new_id echo 11de 0e0a > /sys/bus/pci/drivers/pci-stub/new_id echo 0000:88:00.0 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:88:00.1 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:88:00.0 > /sys/bus/pci/devices/0000:88:00.0/driver/unbind echo 0000:88:00.1 > /sys/bus/pci/devices/0000:88:00.1/driver/unbind
͞ΒʹGPUΛՃ͑Δ(1) lspciͷ݁ՌΛ֬ೝɺ2ͭͷσόΠεID͕݁Ռͱͯ͠ݟ͑Δɺ ͜ͷݟ͑ํγεςϜʹґଘɻ lspci -nn | grep -i nvidia 88:00.0
VGA compatible controller [0300]: NVIDIA Corporation Device [10de:11b4] (rev a1) 88:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1) 84:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:11b4] (rev a1) 84:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)
͞ΒʹGPUΛՃ͑Δ(2) pci-stubΛར༻͠͞ΒʹσόΠεΛύεεϧʔ͢Δɻ echo 0000:84:00.0 > /sys/bus/pci/devices/0000:84:00.0/driver/unbind echo 0000:84:00.1 > /sys/bus/pci/devices/0000:84:00.1/driver/unbind
echo 0000:84:00.0 > /sys/bus/pci/drivers/pci-stub/bind echo 0000:84:00.1 > /sys/bus/pci/drivers/pci-stub/bind CUDAΞϓϦΛ͏্ͰಉػछͷGPUͷՃ͕ඞਢ(͕ͪ ͏ͷෆՄ)ɺಉ͡ͷͰ͋Δ͔ฉ͔ΕΔ͜ͱ͕͋Δɻ /nbody -benchmark -numdevices=2 -num bodies=65536
͞ΒʹGPUΛՃ͑Δ(3) ύεεϧʔʹޭ͢ΔͱVM্Ͱಈ͍͍ͯΔͷ͕lspciʹͯ νΣοΫͰ͖Δɻ ubuntu@guestos$ lspci -nn | grep -i nvidia
00:07.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [Quadro K4200] [10de:11b4] (rev a1) 00:08.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [Quadro K4200] [10de:11b4] (rev a1)
Novaͷه(1) ComputeNodeͷwhitelist͕PCIύεεϧʔͱGPUͷ VMͷσϓϩΠϝϯτͰඞཁɻ /etc/nova/nova.conf ʹ pci_passthrough_whitelist Λه pci_passthrough_whitelist={"name":"K4200","vendor_id":"10de","product_id": "11b4"}
Novaͷه(2) ίϯτϩʔϥʔϊʔυͷnova aliasͷઃఆ͕ඞཁɺޙ ड़͢Δflavor-key Ͱར༻͢Δɻ to /etc/nova/nova.confʹ pci_aliasesΛه pci_alias={“name”:”K4200”,"vendor_id":"10de","product_id":"11b4"}
Novaͷه(3) ίϯτϩʔϥϊʔυʹͯPCIύεεϧʔϑΟϧλʔΛ novaʹՃɻ /etc/nova/nova.conf ʹΞϯμʔϥΠϯ෦Ճ scheduler_available_filters=nova.scheduler.filters.all_filters scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciP assthroughFilter scheduler_default_filters=DifferentHostFilter,RetryFilter,AvailabilityZoneFilter, RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,Imag
ePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,Aggre gateInstanceExtraSpecsFilter,PciPassthroughFilter
nova alias GPUΛ͏ͨΊʹflavor-keyΛه͢ΔɻϑϨʔόʹset໊ Ͱطଘflavorʹର͠pciύεεϧʔͷͰܾΊͨGPUͷalias ໊ɺ͍͍ͨGPUϢχοτΛՃ͢Δɻ ্هલड़·ͰͷաఔͰɺQuadro K4200Λ2ϢχοτՃ͍ͯ͠ Δ͜ͱ͕લఏͳͷͰɺҎԼͷΑ͏ͳهड़ʹͳΔɻ pci_aliasΛK4200ͱ͍ͯ͠ΔͨΊҎԼͷهड़ͱͳΔɻ nova
flavor-key $flavor_name set “pci_passthrough:alias”=“K4200:$amount_of_gpu”
طͷ
ΫϥυΠϝʔδͷ Πϝʔδ͕GPUΛ͏ʹͱͯখ͘͞qemu-imgͰΠϝʔδͷϦα ΠζΛͯ͠Δඞཁ͕༗Δɻ CUDA driverΛΠϯετʔϧ͢Δʹperl-packagesΛطଘͷΫϥυ Πϝʔδʹ͍ΕͯΔඞཁ͕͋Δɻ CUDAυϥΠό.deb or .rpMύοέʔδͰόΠφϦϑΝΠϧͰͳ͘ɺΠϯε τʔϧ࣌ʹιʔεΛmakeͰΠϯετʔϧ͢ΔܗΛͱ͍ͬͯΔ=.runϑΝΠϧͱม
ΘΒͣɻ NvidiaকདྷͷϦϦʔεͰspecϑΝΠϧʹperlؔ࿈ϑΝΠϧͷهΛ͢ΔܗͰfix ͢Δ༧ఆɻ 7.6Ҏ߱Ͱగਖ਼ͯ͘͠ΕΔɺͱ͍͍͕ͬͯͨ…(·ͩςετͯ͠ͳ͍ɻʣ
Windows as VDI Windows্ͷCUDAͪΌΜͱΠϯετʔϧͰ͖Εૣ ͘ͳΔ͚Ͳ࣌ͨ·ΧΫΧΫͯ͠͠·͏ɻ vmͷdisk speedͷɺΤϑΣϝϥϧʹ͢ΔͳΓૣ͍σΟεΫΛ ͏ͳΓͰͦͦ͜͜ղܾ͢Δ(SSD,NVMe or..etc) vmcontext
switchͰr/wΛߦ͍ͬͯΔͷͰϔϏʔϫʔΫϩʔυ ͷCUDAͳΓ͜ͷΑ͏ͳΧΫΧΫঢ়ଶΛى͜͢Մೳੑ͕͋Δɻ ͜ͷ͋ͱޙʹ·͔͕ͤͨɺগ͠ΤϑΣϝϥϧͰվળͨ͠Β͍͠ ސ٬·ɺͳΜͱ͔ͳͬͨͱͱΓ͋͑ͣͷຬΒͬͨɻ
ϥΠϒϚΠάϨʔγϣϯ ϥΠϒϚΠΫϨʔγϣϯͰ͖ͳ͍ɺvm͕ݹ͍ϗετͷଓ ใΛΓͤͳ͍ɻݹ͍ϗετͷใΛѲͬͨ··ʹͳΔɻ ϫʔΫΞϥϯυ:nova.pci_devicesͷMySQLͷDBใΛফ͠ ͯɺݹ͍ϗετΛ࠶ىಈ͢Δʼҙຯͳ͠ʂ | 2016-08-11 00:54:45 | 2016-08-19
04:58:01 | NULL | 0 | 45 | 21 | 0000:84:00.0 | 11b4 | 10de | type-PCI | pci_0000_84_00_0 | label_10de_11b4 | available | {} | NULL | NULL | 1 | <<-- old-host | 2016-08-11 00:54:45 | 2016-08-19 04:58:01 | NULL | 0 | 48 | 21 | 0000:88:00.0 | 11b4 | 10de | type-PCI | pci_0000_88_00_0 | label_10de_11b4 | available | {} | NULL | NULL | 1 | <<-- old-host
·ͱΊ OpenStackͷಛघར༻͕Ͱ͖Δ͔Ͳ͏͔ͷOS࣍ୈ Ͱ͢ɻ ͷOS͕μϝͳͷʹOpenStack͕..ͱ͍͏ͷΠέͯͳ͍ਓ͕͍ ͏͓ݴ༿… αʔόܥOSҙ֎ʹ͍Ζ͍ΖΠέͯ·ͤΜ…͋Γ͋ΓͰ͢ɻ GPUͷར༻ࠓ͜ͷύεεϧʔ͕ݱߦͰ͕͢ɺNvidia Intel࣍ୈ͔ͱ(AMD…Ͳ͏͢ΔΜͩΖ͏…)
Thank you Masafumi Ohta @masafumiohta