A 'Stacker' looking into GPGPU use. Volunteer for the Raspberry Pi Foundation :) Masafumi Ohta
Presentation renewed
This presentation has been updated for OpenStack Days Tokyo 2017. It includes feedback from the session at #LC3 China in Beijing, 2017.
This project is…
Organized by VTJ, bringing together me, NVIDIA (ASK), Dell EMC, NESIC, and other companies interested in GPU use in OpenStack environments for our customers.
Now evaluating..
Tesla M60 + Dell EMC PowerEdge C4130 + OpenStack, to evaluate GPUs in an OpenStack environment.
Thanks for helping out and providing these servers and cards.
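Demand for specialized OpenStack use
Demand for specialized uses of OpenStack is growing: Hadoop (Sahara), HPC, and so on.
Almost none of this is documented in one place; you can only Google around for it…
In other words, a "lost documentation" state.
These topics should be consolidated in the OpenStack Docs.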
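GPU and OpenStack
• Instant HPC use
  • Destroy the VM itself once some computations are finished.
  • Create a few VMs for a quick trial of an HPC grid, and destroy them as soon as you are done.
• GPU use as an internal GPU cloud
  • Mainly in enterprises: some systems cannot be moved to a public cloud because of information-management policies.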
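Ways to run GPUs on OpenStack (current)
• PCI passthrough
  • Attaches the PCI device directly to the VM
  • Depends on the compute node's hypervisor, not on OpenStack
  • With Xen, GPU cores can be split between VMs; with KVM, core splitting is not possible for either NVIDIA or AMD
  • Intel GPU core splitting via Intel GVT-g (Xen)/GVT-d (KVM)
• Containers
  • Use NVIDIA Docker
  • Multiple containers can use the GPU, but GPU cores cannot be explicitly allocated; GPU usage depends on the tasks
  • Managed in combination with Kubernetes/Mesos/Docker Swarm, etc.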
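PCI passthrough on OpenStack
• Officially supported by Red Hat
  • ...though it does not seem to be something they promote much..
• Ubuntu does not even have proper documentation…
  • All you can do is search Google (typical Ubuntu.. orz)
• I, together with NVIDIA Japan, Dell EMC, and VTJ, plan to re-verify the details and compile documentation for the OpenStack community (companies that agree with this aim and would like to take part are welcome).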
[Figure 1: How does OpenStack implement GPU passthrough? (KVM case). Controller node (Linux OS): Nova API, Nova Scheduler, Nova Conductor. Compute node (Linux OS for KVM hypervisor): Nova Compute, VMM/KVM, IOMMU/VT-d, pci-stub/vfio-pci, GPU card on PCI Express x16. VM: Linux/Windows OS, GPU driver, app.]
Step 2: IOMMU setup
• The IOMMU (Input/Output Memory Management Unit) is required to use physical devices in a virtualized system.
• VT-d must of course be enabled as well (ON by default in the EFI/BIOS).
• intel_iommu and vfio_iommu_type1.allow_unsafe_interrupts must be set in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"
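A minimal sketch of the GRUB edit above, assuming the file keeps GRUB_CMDLINE_LINUX_DEFAULT on one line in straight double quotes (a typographic closing quote, as often pasted from slides, makes the helper refuse the line). The helper name `add_kernel_params` is hypothetical. On Ubuntu the change only takes effect after `sudo update-grub` and a reboot.

```python
import re

# Kernel parameters from this step (assumed to belong on GRUB_CMDLINE_LINUX_DEFAULT).
REQUIRED_PARAMS = ["intel_iommu=on", "vfio_iommu_type1.allow_unsafe_interrupts=1"]

# Only straight double quotes are accepted around the value.
_LINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"[ \t]*$', re.MULTILINE)


def add_kernel_params(grub_text: str, params=REQUIRED_PARAMS) -> str:
    """Return grub_text with missing params appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Parameters that are already present are not duplicated, so running the
    helper twice yields the same text.
    """
    match = _LINE.search(grub_text)
    if match is None:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found or not in straight quotes")
    current = match.group(2).split()
    merged = current + [p for p in params if p not in current]
    new_line = f'{match.group(1)}"{" ".join(merged)}"'
    return grub_text[: match.start()] + new_line + grub_text[match.end():]
```

Reading /etc/default/grub, passing its text through the helper, and writing it back (as root) is left to the caller, so the function itself can be tried on a copy first.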
Step 3: pci-stub and VFIO
• pci-stub prevents the Linux host from using the physical device.
• VFIO (Virtual Function I/O) does the same job as pci-stub; supported on kernel 4.1 and later.
  • Puts the device into the D3 state (low-power mode) while it is unused.
• These modules are not loaded by default, so edit /etc/modules and list them together with the related components (kvm, kvm_intel):
pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel
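The /etc/modules edit can be sketched as an idempotent merge. This assumes the Debian/Ubuntu format (one module per line, optional parameters after the name, lines starting with "#" ignored); `ensure_modules` is a hypothetical helper, not part of any tool.

```python
# Module names from this step; /etc/modules lists one module per line,
# optionally followed by module parameters.
REQUIRED_MODULES = ["pci_stub", "vfio", "vfio_iommu_type1", "vfio_pci", "kvm", "kvm_intel"]


def _module_name(line: str) -> str:
    # modprobe treats "-" and "_" in module names as equivalent.
    return line.split()[0].replace("-", "_")


def ensure_modules(modules_text: str, required=REQUIRED_MODULES) -> str:
    """Return /etc/modules contents with any missing required modules appended.

    Comments and existing entries are kept unchanged and in place.
    """
    present = {
        _module_name(line)
        for line in modules_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    missing = [m for m in required if m not in present]
    if not missing:
        return modules_text
    if modules_text and not modules_text.endswith("\n"):
        modules_text += "\n"
    return modules_text + "\n".join(missing) + "\n"
```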
Step 4-2: Blacklist (2)
• Prevent the host from recognizing the GPU device at boot.
• Add the following to /etc/modprobe.d/blacklist.conf:
blacklist nvidia
blacklist nvidia-uvm
• The nouveau driver must also be blacklisted:
blacklist nouveau
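One way to confirm the blacklist worked after a reboot is to look at the "Kernel driver in use" lines of `lspci -nnk`: an NVIDIA device should show pci-stub or vfio-pci, not nvidia or nouveau. (On Ubuntu, regenerating the initramfs with `sudo update-initramfs -u` is commonly needed before that reboot.) The parser below is a sketch; the sample output in the test is illustrative, and the default vendor ID 10de is NVIDIA's, as in the pci_alias settings later on.

```python
import re

# Device header line of `lspci -nnk`, e.g.
#   84:00.0 3D controller [0302]: NVIDIA Corporation ... [10de:13f2] (rev a1)
# Greedy ".*" picks the last [vvvv:dddd] pair on the line.
HEADER = re.compile(r"^(?P<addr>\S+) .*\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]")
# Indented detail line naming the bound driver.
DRIVER = re.compile(r"^\s+Kernel driver in use: (?P<driver>\S+)")


def drivers_in_use(lspci_nnk: str, vendor_id: str = "10de") -> dict:
    """Map PCI address -> bound kernel driver (None if unbound) for one vendor."""
    result = {}
    current = None
    for line in lspci_nnk.splitlines():
        header = HEADER.match(line)
        if header:
            # Track only devices from the requested vendor.
            current = header["addr"] if header["vendor"] == vendor_id else None
            if current:
                result[current] = None
            continue
        driver = DRIVER.match(line)
        if driver and current:
            result[current] = driver["driver"]
    return result
```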
[Figure 3: GPU blacklisting process at boot (Ubuntu). Components: UEFI/BIOS (VT-d); GRUB (/etc/default/grub, IOMMU); initramfs (/etc/initramfs-tools/modules); modules (/etc/modules); modprobe (/etc/modprobe.d/blacklist.conf); pci-stub (/sys/bus/pci/drivers/pci-stub/, /sys/bus/pci/devices/$(Identifier)/driver/unbind).]
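The sysfs paths in Figure 3 can also hand a device to pci-stub at runtime. The sketch follows the commonly documented sequence (register the vendor/device ID with pci-stub via new_id, unbind from the current driver, then bind to pci-stub). Function name and parameters are hypothetical; writing to the real /sys needs root, and `sysfs_root` exists only so the steps can be dry-run against a scratch directory.

```python
from pathlib import Path


def rebind_to_pci_stub(addr: str, vendor: str, device: str, sysfs_root: str = "/sys") -> None:
    """Detach a PCI device from its host driver and bind it to pci-stub.

    addr is the full PCI address, e.g. "0000:84:00.0"; vendor and device are
    the hex IDs shown by `lspci -nn`, e.g. "10de" and "13f2".
    """
    pci = Path(sysfs_root) / "bus" / "pci"
    stub = pci / "drivers" / "pci-stub"
    # 1. Tell pci-stub which vendor/device ID pair it may claim.
    (stub / "new_id").write_text(f"{vendor} {device}")
    # 2. Unbind from the current driver; "driver" only exists while bound.
    unbind = pci / "devices" / addr / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(addr)
    # 3. Bind the device to pci-stub.
    (stub / "bind").write_text(addr)
```

On a real host, step 3 can fail with a "device busy" error if pci-stub already claimed the device in step 1; the boot-time blacklist in the previous step avoids doing any of this by hand.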
OpenStack: nova-api settings
• Edit /etc/nova/nova.conf on the controller node and restart nova-api.
• Describe the PCI device information and an alias name in pci_alias:
pci_alias={"name":"K4200","vendor_id":"10de","product_id":"11b4"}
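Because the pci_alias value is JSON, typographic quotes copied from slides break it. Generating the line with `json.dumps` sidesteps that. This is a sketch: `pci_alias_line` is a hypothetical helper, the lowercase-hex check is its own conservative choice, and the IDs are the ones `lspci -nn` prints in brackets (for example [10de:11b4]). Newer Nova releases read these options from a [pci] section (alias, passthrough_whitelist), so check the configuration reference for the release in use.

```python
import json
import re

HEX4 = re.compile(r"^[0-9a-f]{4}$")


def pci_alias_line(name: str, vendor_id: str, product_id: str) -> str:
    """Render a pci_alias line for nova.conf with straight ASCII quotes."""
    for value in (vendor_id, product_id):
        if not HEX4.match(value):
            raise ValueError(f"expected 4 lowercase hex digits, got {value!r}")
    alias = {"name": name, "vendor_id": vendor_id, "product_id": product_id}
    # Compact separators reproduce the one-line style used on this slide.
    return "pci_alias=" + json.dumps(alias, separators=(",", ":"))
```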
OpenStack: nova-compute settings
• Edit /etc/nova/nova.conf on the compute node and restart nova-compute.
• Describe the PCI device information and the alias name in pci_passthrough_whitelist:
pci_passthrough_whitelist={"name":"K4200","vendor_id":"10de","product_id":"11b4"}
*In this case, every device whose vendor ID and product ID match is passed through to VMs.
• Likewise, describe the PCI device information and alias name in pci_alias (*Newton and later):
pci_alias={"name":"K4200","vendor_id":"10de","product_id":"11b4"}
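The note that every matching device is passed through can be illustrated with a toy matcher. This is not Nova's implementation: it only compares vendor_id and product_id, ignores other keys such as "name", and treats a missing key as matching anything. Nova's whitelist can also narrow the match by PCI address when only some cards should be exposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciDevice:
    address: str
    vendor_id: str
    product_id: str


def whitelisted(devices, spec: dict):
    """Return devices whose vendor_id/product_id match the whitelist spec (toy model)."""
    def matches(dev: PciDevice) -> bool:
        return all(
            spec.get(key) in (None, getattr(dev, key))
            for key in ("vendor_id", "product_id")
        )
    return [dev for dev in devices if matches(dev)]
```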
OpenStack: nova-scheduler settings
• Edit /etc/nova/nova.conf on the controller node and restart nova-scheduler.
• To enable PciPassthroughFilter, add it to scheduler_default_filters.
• Likewise, add PciPassthroughFilter to scheduler_available_filters.
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_default_filters=DifferentHostFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter,PciPassthroughFilter
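The long scheduler_default_filters line is easy to break when copying (as "Dis kFilter" shows). A small sketch that appends PciPassthroughFilter to an existing key=value line only when it is missing; `ensure_filter` is a hypothetical helper and it does not validate the other filter names.

```python
def ensure_filter(option_line: str, name: str = "PciPassthroughFilter") -> str:
    """Append `name` to a "key=Filter1,Filter2" nova.conf line if it is missing.

    Whitespace around entries is dropped; the existing order is preserved.
    """
    key, sep, value = option_line.partition("=")
    if not sep:
        raise ValueError("expected a key=value line")
    filters = [f.strip() for f in value.split(",") if f.strip()]
    if name not in filters:
        filters.append(name)
    return f"{key.strip()}={','.join(filters)}"
```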
[Figure 4: How Nova works with a GPU (PCI) passthrough compute node. Controller node: Nova API, Nova Scheduler, Nova Conductor; compute node: Nova Compute, VMM/KVM, VM. Settings shown: pci_alias, scheduler_default_filters, scheduler_available_filters, pci_alias and pci_passthrough_whitelist. Steps: (1) make the PCI device available so that PCI requests can be sent; (2) use the PCI passthrough filter to select a compute node with GPU passthrough; (3) spawn the GPU-passthrough instance using pci_alias and pci_passthrough_whitelist.]
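Step (2) of Figure 4 can be pictured with a toy model: parse the flavor's alias request and keep only hosts that still have enough free devices of that alias. This is an illustration under simplifying assumptions (free devices given as plain per-host counts, hypothetical host names), not how PciPassthroughFilter is actually implemented with PCI device pools.

```python
def parse_alias_spec(spec: str):
    """Parse a "pci_passthrough:alias" value like "K4200:2" into [(name, count)].

    Comma-separated entries such as "A:1,B:2" are split as well.
    """
    requests = []
    for entry in spec.split(","):
        name, count = entry.strip().split(":")
        requests.append((name, int(count)))
    return requests


def passing_hosts(free_devices: dict, requests) -> list:
    """Toy filter: hosts whose free devices cover every (alias, count) request.

    free_devices maps host name -> {alias name: number of unallocated devices}.
    """
    return sorted(
        host
        for host, pools in free_devices.items()
        if all(pools.get(name, 0) >= count for name, count in requests)
    )
```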
OpenStack: flavor-key settings
• Set a flavor key so that GPU instances can use PCI passthrough, by adding the PCI passthrough setting to the flavor.
• pci_passthrough:alias=<pci_alias name>:<number of GPUs to use>
nova flavor-key <flavor_name> set "pci_passthrough:alias"="K4200:<number_of_gpus>"
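Calling the CLI with an argument list avoids shell quoting entirely: the shell turns `"pci_passthrough:alias"="K4200:1"` into the single argument `pci_passthrough:alias=K4200:1`, which is what the sketch passes directly. The flavor name "gpu.large" and both helper names are hypothetical; actually running the command requires python-novaclient and OpenStack credentials in the environment.

```python
import shlex
import subprocess


def flavor_key_argv(flavor: str, alias: str, num_gpus: int) -> list:
    """Build argv for `nova flavor-key <flavor> set pci_passthrough:alias=<alias>:<n>`."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["nova", "flavor-key", flavor, "set", f"pci_passthrough:alias={alias}:{num_gpus}"]


def set_gpu_flavor(flavor: str, alias: str, num_gpus: int, dry_run: bool = True) -> str:
    """Optionally run the command; always return it as a copy-pasteable string."""
    argv = flavor_key_argv(flavor, alias, num_gpus)
    if not dry_run:
        subprocess.run(argv, check=True)
    return shlex.join(argv)
```

The unified client offers the same setting as `openstack flavor set --property "pci_passthrough:alias"="K4200:1" <flavor_name>`.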
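Cloud image caveats
• Cloud images are very small for GPU use, so they need to be resized with qemu-img.
• The CUDA driver needs perl packages (dev packages) at install time.
  • They are needed even when installing from a .deb or .rpm package, because the CUDA package is not a binary package: it builds binaries from source and runs make during package installation.
  • According to NVIDIA, this is to be fixed in the next CUDA driver release.
  • It was supposed to be fixed in CUDA 7.6 or later.. but it still isn't.. orz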
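The resize step uses `qemu-img resize <image> <size>`, where a leading "+" grows the virtual disk by that amount. A dry-run sketch; the image file name in the test is hypothetical. The virtual disk grows immediately, but whether the guest's root filesystem follows depends on the image (many cloud images expand it at first boot via cloud-init), so check inside the guest.

```python
import re
import shutil
import subprocess

# Absolute ("40G") or relative ("+20G") size with an optional unit suffix.
SIZE = re.compile(r"^\+?\d+[kKMGT]?$")


def resize_image(image_path: str, size: str, dry_run: bool = True) -> list:
    """Build (and optionally run) `qemu-img resize <image> <size>`."""
    if not SIZE.match(size):
        raise ValueError(f"unexpected size {size!r}")
    argv = ["qemu-img", "resize", image_path, size]
    if not dry_run:
        if shutil.which("qemu-img") is None:
            raise FileNotFoundError("qemu-img not found on PATH")
        subprocess.run(argv, check=True)
    return argv
```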
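Using Windows as VDI
• CUDA on Windows is faster than expected, but the display stutters.
  • Probably many factors are involved, such as disk speed and the network. Ephemeral mode or other very fast SSD/NVMe storage, plus a 10G or faster network, will probably be needed.
  • I have not tried any improvements yet, but I plan to investigate soon why this happens.
  • Because a VM basically handles memory and network transfers through context switches, heavy CUDA workloads may stutter.
• Details in the demo video…
• Windows behavior on GPU on OpenStack needs more investigation.. but I need more time…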
Using Windows as VDI (feedback)
• Thanks for the feedback on my session at LC3 China!
• I should check and investigate the Windows-related issues below; I will post updates later.
• Windows 10 on KVM issue
  • http://bart.vanhauwaert.org/hints/installing-win10-on-KVM.html
  • Deploying Windows 10 succeeds on my plain KVM host, but the same deployment still fails on my OpenStack. I need to investigate in more detail what happens.
• NVIDIA driver issue (version 337.88 or later)
  • https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#.22Error_43:_Driver_failed_to_load.22_on_Nvidia_GPUs_passed_to_Windows_VMs
  • 'Nvidia drivers on Windows check if an hypervisor is running and fail if it detects one, which results in an Error 43 in the Windows device manager.' I have not seen this issue on my Windows 7 VMs, so I need to check in more detail.
• Related Nova libvirt driver code for adding this setting, to be checked:
  • https://github.com/openstack/nova/blob/master/nova/virt/libvirt/config.py#L2025-L2036
  • https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4262-L4264
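The Arch wiki page linked above works around Error 43 by hiding the hypervisor through two libvirt domain features: `<kvm><hidden state='on'/></kvm>` and a Hyper-V `<vendor_id state='on' value='...'/>` (libvirt limits the value to 12 characters). The sketch adds both to a domain XML string; the helper names and the placeholder vendor value "0123456789ab" are hypothetical. It only helps for VMs managed directly with libvirt (for example via `virsh edit`): for OpenStack instances Nova generates the domain XML itself, which is why the Nova libvirt driver code above needs checking.

```python
import xml.etree.ElementTree as ET


def _child(parent: ET.Element, tag: str) -> ET.Element:
    """Return the first child element named `tag`, creating it if absent."""
    node = parent.find(tag)
    return node if node is not None else ET.SubElement(parent, tag)


def hide_hypervisor(domain_xml: str, vendor_value: str = "0123456789ab") -> str:
    """Add <kvm><hidden state='on'/></kvm> and a Hyper-V vendor_id to <features>.

    Existing elements are reused, so applying the function twice is harmless.
    """
    if len(vendor_value) > 12:
        raise ValueError("the Hyper-V vendor_id value is limited to 12 characters")
    root = ET.fromstring(domain_xml)
    features = _child(root, "features")
    _child(_child(features, "kvm"), "hidden").set("state", "on")
    vendor = _child(_child(features, "hyperv"), "vendor_id")
    vendor.set("state", "on")
    vendor.set("value", vendor_value)
    return ET.tostring(root, encoding="unicode")
```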
Special thanks to:
• GPU on OpenStack project members: VirtualTech Japan, NVIDIA, Dell EMC, NEC Networks & System Integration
• @zgock999 at Tokaido-LUG, Nagoya, Japan, for teaching me hints on using GPGPU in VMs!
• Matthew Treinish of IBM, for attending my session at LC3 China and pointing out several issues!
• Our customers, for giving us the chance to evaluate!
Presented by Masafumi Ohta
Tweet: @masafumiohta
mailto: masafumi@pid0.org
Thank you very much for coming to my session!