
Installing Nutanix CE on a Self-Built PC

Yamauchi
August 27, 2023

The story of installing Nutanix CE on a self-built PC and promptly getting stuck.

Transcript

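  1. Self-introduction

    • Yamauchi
    • X (formerly Twitter): nogisawa
    • Born in Fukushima Prefecture, now living in Kawagoe
    • Member of the astronomy club in high school
    • Currently in product support for HPC products (fault isolation during incident response); previously a web server engineer, then help desk and technical support staff, then on-site supercomputer staff
    • Hobbies: home servers, railways (riding them), cameras, long-distance TV reception

    Photo: the astronomy club room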
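  2. Checking the Nutanix CE requirements

    CPU: Intel: Sandy Bridge or later with VT-x and AVX support
         AMD: Zen (Gen1) or later
    Memory: 20GB minimum, 32GB or more recommended
    NIC: Intel or Realtek
    Storage: Storage (Cold Tier): HDD or SSD of 500GB or more
             Storage (Hot Tier Flash): SSD of 200GB or more
             USB memory: drive of 32GB or more

    Only the parts relevant to parts selection are excerpted here.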
  3. Selected parts

    CPU: AMD Ryzen 5 4500
    Memory: 48GB (Crucial DDR4 16x2 + 8x2)
    M/B: ASRock A520M Phantom Gaming 4 (Realtek 1G NIC)
    Storage: ADATA 512GB SATA SSD
             Intel 512GB NVMe SSD
             BUFFALO 32GB USB memory
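
The comparison between the requirement list and the selected parts can be written down as a small check. This is only a sketch: the `Build` class and `check_requirements` function are hypothetical names for illustration, not a Nutanix tool. The CPU generation check is left out, and which of the two 512GB SSDs serves as the cold or hot tier is an assumption that does not change the result.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Build:
    """Hypothetical description of a candidate machine (not a Nutanix API)."""
    memory_gb: int
    nic_vendor: str
    cold_tier_gb: int
    hot_tier_ssd_gb: int
    usb_gb: int


def check_requirements(build: Build) -> List[str]:
    """Return the unmet requirements; an empty list means every check passed."""
    problems = []
    if build.memory_gb < 20:
        problems.append("memory below the 20GB minimum")
    if build.nic_vendor not in ("Intel", "Realtek"):
        problems.append("NIC is neither Intel nor Realtek")
    if build.cold_tier_gb < 500:
        problems.append("cold tier smaller than 500GB")
    if build.hot_tier_ssd_gb < 200:
        problems.append("hot tier SSD smaller than 200GB")
    if build.usb_gb < 32:
        problems.append("USB drive smaller than 32GB")
    return problems


# The build from the slide: 48GB RAM, Realtek NIC, two 512GB SSDs, 32GB USB drive.
build = Build(memory_gb=48, nic_vendor="Realtek", cold_tier_gb=512,
              hot_tier_ssd_gb=512, usb_gb=32)
problems = check_requirements(build)
print(problems or "all listed requirements met")
```

On paper the build satisfies every item in the excerpted list, so the requirement table alone does not predict the trouble that follows.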
  4. Investigating why the CVM does not start…

    • First, check the logs (journalctl)

        Aug 20 08:38:07 NTNX-aa58deac-A python[2056]: , stderr: error: Failed to start domain NTNX-aa58deac-A-CVM
        Aug 20 08:38:07 NTNX-aa58deac-A python[2056]: \
          error: Hook script execution failed: internal error: \
          Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin /etc/libvirt/hooks/qemu NTNX-aa58deac-A-CVM prepare begin -) \
          unexpected exit status 1: Could not detach device : 0000:05:00.0

    "Could not detach device : 0000:05:00.0" looks like the keyword.
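
When the log is long, the device named in the error can be pulled out mechanically. A minimal sketch, assuming the message keeps the exact "Could not detach device : <address>" wording seen above; `failed_devices` is a hypothetical helper name.

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:05:00.0
PCI_ADDR = re.compile(
    r"Could not detach device : "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def failed_devices(log_text):
    """Return the distinct PCI addresses named in 'Could not detach device' errors."""
    seen = []
    for addr in PCI_ADDR.findall(log_text):
        if addr not in seen:
            seen.append(addr)
    return seen


sample = "unexpected exit status 1: Could not detach device : 0000:05:00.0"
print(failed_devices(sample))  # ['0000:05:00.0']
```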
  5. Trying to start the CVM manually

    • The same error occurs

        [root@NTNX-aa58deac-A ~]# virsh list --all
         Id    Name                   State
        --------------------------------------
         -     NTNX-aa58deac-A-CVM    shut off

        [root@NTNX-aa58deac-A ~]# virsh start NTNX-aa58deac-A-CVM
        error: Failed to start domain NTNX-aa58deac-A-CVM
        error: Hook script execution failed: \
          internal error: Child process \
          (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin /etc/libvirt/hooks/qemu NTNX-aa58deac-A-CVM prepare begin -) \
          unexpected exit status 1: Could not detach device : 0000:05:00.0
  6. What "0000:05:00.0" really is

    • It is probably a PCI address, so check it with lspci

        [root@NTNX-aa58deac-A libvirt]# lspci -D -v -s 05:00.0
        0000:05:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03) (prog-if 02 [NVM Express])
                Subsystem: Intel Corporation Intel Corporation SSD Pro 7600p/760p/E 6100p Series [NVM Express]
                Flags: bus master, fast devsel, latency 0, IRQ 58, NUMA node 0
                Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
                Capabilities: [40] Power Management version 3
                Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
                Capabilities: [70] Express Endpoint, MSI 00
                Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
                Capabilities: [100] Advanced Error Reporting
                Capabilities: [158] Secondary PCI Express
                Capabilities: [178] Latency Tolerance Reporting
                Capabilities: [180] L1 PM Substates
                Kernel driver in use: nvme
                Kernel modules: nvme

    It was the Intel NVMe SSD.
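
The "Kernel driver in use" line that lspci prints is also visible in sysfs as the `driver` symlink of the device. A minimal sketch of reading it directly, assuming the standard Linux sysfs layout; `bound_driver` and its `sysfs_root` parameter are hypothetical names added so the function can be pointed at a test directory.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address, "driver")
    if not os.path.islink(link):
        # No driver symlink: the device is missing or has no driver bound.
        return None
    # The link target ends in the driver name, e.g. .../drivers/nvme
    return os.path.basename(os.readlink(link))


print(bound_driver("0000:05:00.0"))
```

On the host in the slide this would be expected to report `nvme`; on a machine without a device at that address it returns `None`.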
  7. Why "Could not detach"?

    Search for where the error message comes from:

        [root@NTNX-aa58deac-A ~]# find / -type f | xargs grep 'Could not detach device' 2>/dev/null
        /usr/lib/python3.6/nutanix-site-packages/ahv/utils.py: raise RuntimeError("Could not detach device : %s" % self._pci_address_str)
        Binary file /usr/lib/python3.6/nutanix-site-packages/ahv/__pycache__/utils.cpython-36.opt-1.pyc matches
        Binary file /usr/lib/python3.6/nutanix-site-packages/ahv/__pycache__/utils.cpython-36.pyc matches
        /usr/lib/python2.7/nutanix-site-packages/ahv/utils.py: raise RuntimeError("Could not detach device : %s" % self._pci_address_str)
        Binary file /usr/lib/python2.7/nutanix-site-packages/ahv/utils.pyo matches
        Binary file /usr/lib/python2.7/nutanix-site-packages/ahv/utils.pyc matches

    utils.py looks suspicious.
  8. Trying it out…

        [root@NTNX-aa58deac-A libvirt]# readlink /sys/bus/pci/devices/0000:05:00.0/iommu_group
        (no output; normally something is printed here)
        [root@NTNX-aa58deac-A libvirt]# ls -la /sys/bus/pci/devices/0000:05:00.0/*iommu*
        ls: cannot access /sys/bus/pci/devices/0000:05:00.0/*iommu*: No such file or directory

    ??????
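
An empty result from readlink means the device has no `iommu_group` link in sysfs, which typically points to the IOMMU not being active on the host. The same check can be sketched in Python, assuming the standard Linux sysfs layout. The function name `iommu_group` and the `sysfs_root` parameter are illustrative; this is not the code from Nutanix's utils.py.

```python
import os


def iommu_group(pci_address, sysfs_root="/sys"):
    """Return the IOMMU group number of a PCI device as a string, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices",
                        pci_address, "iommu_group")
    try:
        target = os.readlink(link)
    except OSError:
        # The link does not exist: the device has no IOMMU group.
        return None
    # The target looks like ../../../../kernel/iommu_groups/1
    return os.path.basename(target)


group = iommu_group("0000:05:00.0")
print("no IOMMU group" if group is None else "IOMMU group " + group)
```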
  9. The IOMMU problem is cleared. However…

    • This time the error says memory cannot be allocated

        2023-08-20 13:03:19.668+0000: 1977: error : \
          qemuProcessReportLogError:2067 : \
          internal error: qemu unexpectedly closed the monitor: \
          2023-08-11T13:03:19.498297Z qemu-kvm: \
          -object memory-backend-file,id=ram-node0,mem-path=\
          /dev/hugepages/libvirt/qemu/\
          NX-aa58deac-A-CVM,share=yes,prealloc=yes,size=51539607552:\
          unable to map backing store for guest RAM: Cannot allocate memory
  10. The memory setting is wrong

        [root@NTNX-aa58deac-A ~]# virsh edit NTNX-aa58deac-A-CVM
        (excerpt)
        <memory unit='KiB'>50331648</memory>
        <currentMemory unit='KiB'>50331648</currentMemory>

        [root@NTNX-aa58deac-A ~]# free -k
                      total        used        free      shared  buff/cache   available
        Mem:       49076696    47398408     1302676        1048      375612     1355236
        Swap:             0           0           0

    Changed as follows
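
The mismatch is easy to confirm with unit arithmetic. The sketch below uses only the three figures shown on the slides (the `size=` value from the qemu-kvm error, the `<memory>` value from the domain XML, and the `total` column of `free -k`); the variable names are illustrative.

```python
KIB = 1024
GIB = 1024 ** 3

qemu_size_bytes = 51539607552   # size= in the qemu-kvm error message
domain_memory_kib = 50331648    # <memory unit='KiB'> in the domain XML
host_total_kib = 49076696       # "total" column of free -k

# The qemu figure is the domain XML figure expressed in bytes.
same_request = qemu_size_bytes == domain_memory_kib * KIB

requested_gib = domain_memory_kib * KIB / GIB
host_gib = host_total_kib * KIB / GIB
shortfall_kib = domain_memory_kib - host_total_kib

print("same request:", same_request)
print("CVM requests %.2f GiB, host reports %.2f GiB" % (requested_gib, host_gib))
print("shortfall: %d KiB" % shortfall_kib)
```

The CVM is configured for exactly 48 GiB, the full nominal capacity of the installed memory, while the host reports less than that as usable, so the allocation cannot succeed even before the hypervisor's own usage is counted.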
  11. It still does not start…

        [root@NTNX-aa58deac-A ~]# virsh start NTNX-aa58deac-A-CVM
        error: Failed to start domain NTNX-aa58deac-A-CVM
        error: internal error: qemu unexpectedly closed the monitor: 2023-08-11T14:48:17.106515Z qemu-kvm: warning: Large machine and max_ram_below_4g (536870912) not a multiple of 1G; possible bad performance.
        2023-08-11T14:48:17.127954Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
        2023-08-11T14:48:17.128854Z qemu-kvm: \
          -device vfio-pci,host=0000:05:00.0,id=hostdev0,bus=pci.0,addr=0x7,rombar=0: vfio 0000:05:00.0: group 1 is not viable
        Please ensure all devices within the iommu_group are bound to their vfio bus driver.

    A common cause is apparently multiple devices hanging off the same PCI bus, but that does not apply this time…
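
The "group is not viable" message is about every device in the IOMMU group, not only the one being passed through. A minimal sketch for listing a group's members and their drivers, assuming the standard `/sys/kernel/iommu_groups/<group>/devices/` layout. `group_drivers` and `blocking_devices` are hypothetical helper names, and the filter is a simplification: the kernel's real viability rule has further exceptions that this sketch does not model.

```python
import os


def group_drivers(group, sysfs_root="/sys"):
    """Map each PCI address in an IOMMU group to its bound driver (or None)."""
    devices_dir = os.path.join(sysfs_root, "kernel", "iommu_groups",
                               str(group), "devices")
    try:
        addresses = sorted(os.listdir(devices_dir))
    except FileNotFoundError:
        return {}
    drivers = {}
    for addr in addresses:
        link = os.path.join(devices_dir, addr, "driver")
        drivers[addr] = (os.path.basename(os.readlink(link))
                         if os.path.islink(link) else None)
    return drivers


def blocking_devices(group, sysfs_root="/sys"):
    """Devices in the group bound to a driver other than vfio-pci.

    Simplification: unbound devices are treated as harmless here.
    """
    return {addr: drv for addr, drv in group_drivers(group, sysfs_root).items()
            if drv is not None and drv != "vfio-pci"}


if __name__ == "__main__":
    # Group 1 is the group named in the error message above.
    print(group_drivers(1))
    print(blocking_devices(1))
```

If `blocking_devices` returns an empty mapping while the error persists, the cause lies outside what this simple listing can show, which matches the situation described on the slide.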
  12. At first glance the NVMe is assigned to VFIO

        [root@NTNX-aa58deac-A ~]# lspci -s 0000:05:00.0 -v -D
        0000:05:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03) (prog-if 02 [NVM Express])
                Subsystem: Intel Corporation Intel Corporation SSD Pro 7600p/760p/E 6100p Series [NVM Express]
                Flags: fast devsel, IRQ 59, NUMA node 0, IOMMU group 1 <-★
                Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
                Capabilities: [40] Power Management version 3
                Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
                Capabilities: [70] Express Endpoint, MSI 00
                Capabilities: [b0] MSI-X: Enable- Count=16 Masked-
                Capabilities: [100] Advanced Error Reporting
                Capabilities: [158] Secondary PCI Express
                Capabilities: [178] Latency Tolerance Reporting
                Capabilities: [180] L1 PM Substates
                Kernel driver in use: vfio-pci <-★
                Kernel modules: nvme

    I have a feeling I am about to sink into a swamp…
  13. Changing the configuration

    CPU: AMD Ryzen 5 4500
    Memory: 48GB (Crucial DDR4 16x2 + 8x2)
    M/B: ASRock A520M Phantom Gaming 4 (Realtek 1G NIC)
    Storage: ADATA 512GB SATA SSD
             Intel 512GB NVMe SSD → ADATA 512GB SATA SSD
             BUFFALO 32GB USB memory