Installing Nutanix CE on a Home-Built PC

Yamauchi
August 27, 2023

The story of installing Nutanix CE on a home-built PC and promptly getting stuck.

Transcript

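  1. About me
    • Yamauchi
    • X (formerly Twitter): nogisawa
    • Born in Fukushima Prefecture, now living in Kawagoe
    • Member of the astronomy club in high school
    • Currently in product support for HPC products (fault isolation during incident response), after working as a web server engineer, then help desk and technical support staff, then on-site supercomputer staff
    • Hobbies: home servers, railways (riding them), cameras, long-distance TV reception
    (Photo: the astronomy club room)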
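  2. Checking the Nutanix CE requirements
    CPU: Intel: Sandy Bridge or later with VT-x and AVX support; AMD: Zen (Gen 1) or later
    Memory: 20 GB minimum, 32 GB or more recommended
    NIC: Intel or Realtek
    Storage (Cold Tier): HDD or SSD of 500 GB or more
    Storage (Hot Tier Flash): SSD of 200 GB or more
    USB memory: drive of 32 GB or more
    (Only the parts relevant to component selection are excerpted.)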
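The numeric requirements above lend themselves to a quick check before buying parts. The following is a minimal sketch in Python, not anything shipped with Nutanix CE: `Build` and `check_build` are hypothetical names introduced here, the thresholds are the ones listed on the slide, and the CPU generation rule is left out because it does not reduce to a single number.

```python
from dataclasses import dataclass
from typing import List

# Minimums copied from the requirements slide (all sizes in GB).
MIN_MEMORY_GB = 20
MIN_COLD_TIER_GB = 500
MIN_HOT_TIER_GB = 200
MIN_USB_GB = 32
SUPPORTED_NIC_VENDORS = {"intel", "realtek"}


@dataclass
class Build:
    """Hypothetical description of a candidate machine (not a Nutanix API)."""
    memory_gb: int
    nic_vendor: str
    cold_tier_gb: int
    hot_tier_gb: int
    hot_tier_is_ssd: bool
    usb_gb: int


def check_build(build: Build) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if build.memory_gb < MIN_MEMORY_GB:
        problems.append("memory below %d GB" % MIN_MEMORY_GB)
    if build.nic_vendor.lower() not in SUPPORTED_NIC_VENDORS:
        problems.append("NIC vendor is neither Intel nor Realtek")
    if build.cold_tier_gb < MIN_COLD_TIER_GB:
        problems.append("cold tier below %d GB" % MIN_COLD_TIER_GB)
    if build.hot_tier_gb < MIN_HOT_TIER_GB or not build.hot_tier_is_ssd:
        problems.append("hot tier must be an SSD of at least %d GB" % MIN_HOT_TIER_GB)
    if build.usb_gb < MIN_USB_GB:
        problems.append("USB drive below %d GB" % MIN_USB_GB)
    return problems
```

A build with 48 GB of memory, a Realtek NIC, two 512 GB drives (one of them an SSD for the hot tier) and a 32 GB USB drive passes all of these checks, so meeting the listed minimums does not by itself rule out the kind of trouble described in the later slides.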
  3. Selected parts
    CPU: AMD Ryzen 5 4500
    Memory: 48 GB (Crucial DDR4, 16 GB x 2 + 8 GB x 2)
    M/B: ASRock A520M Phantom Gaming 4 (Realtek 1G NIC)
    Storage: ADATA 512 GB SATA SSD; Intel 512 GB NVMe SSD; BUFFALO 32 GB USB memory
  4. Investigating why the CVM does not start...
    • First, check the logs (journalctl)
        Aug 20 08:38:07 NTNX-aa58deac-A python[2056]: , stderr: error: Failed to start domain NTNX-aa58deac-A-CVM
        Aug 20 08:38:07 NTNX-aa58deac-A python[2056]: \
          error: Hook script execution failed: internal error: \
          Child process (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin /etc/libvirt/hooks/qemu NTNX-aa58deac-A-CVM prepare begin -) \
          unexpected exit status 1: Could not detach device : 0000:05:00.0
    "Could not detach device : 0000:05:00.0" looks like the key phrase.
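When the journal is long, pulling the device address out of the hook error by pattern saves some scrolling. This is a small sketch, assuming the message format shown in the log above; `find_undetachable_devices` is a hypothetical helper name, and the regular expression only covers the standard `domain:bus:device.function` PCI address form.

```python
import re
from typing import List

# Matches "Could not detach device : 0000:05:00.0" and captures the PCI address.
DETACH_ERROR = re.compile(
    r"Could not detach device\s*:\s*"
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def find_undetachable_devices(log_text: str) -> List[str]:
    """Return each PCI address named in a detach error, in first-seen order."""
    seen = []
    for match in DETACH_ERROR.finditer(log_text):
        address = match.group(1)
        if address not in seen:
            seen.append(address)
    return seen
```

The function takes plain text, so it can be fed the saved output of `journalctl` rather than running anything on the host itself.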
  5. Trying to start the CVM by hand
    • The same error occurs
        [root@NTNX-aa58deac-A ~]# virsh list --all
         Id    Name                   State
        --------------------------------------
         -     NTNX-aa58deac-A-CVM    shut off
        [root@NTNX-aa58deac-A ~]# virsh start NTNX-aa58deac-A-CVM
        error: Failed to start domain NTNX-aa58deac-A-CVM
        error: Hook script execution failed: \
          internal error: Child process \
          (LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin /etc/libvirt/hooks/qemu NTNX-aa58deac-A-CVM prepare begin -) \
          unexpected exit status 1: Could not detach device : 0000:05:00.0
  6. What "0000:05:00.0" really is
    • It is probably a PCI address, so check with lspci
        [root@NTNX-aa58deac-A libvirt]# lspci -D -v -s 05:00.0
        0000:05:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03) (prog-if 02 [NVM Express])
            Subsystem: Intel Corporation Intel Corporation SSD Pro 7600p/760p/E 6100p Series [NVM Express]
            Flags: bus master, fast devsel, latency 0, IRQ 58, NUMA node 0
            Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
            Capabilities: [40] Power Management version 3
            Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
            Capabilities: [70] Express Endpoint, MSI 00
            Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
            Capabilities: [100] Advanced Error Reporting
            Capabilities: [158] Secondary PCI Express
            Capabilities: [178] Latency Tolerance Reporting
            Capabilities: [180] L1 PM Substates
            Kernel driver in use: nvme
            Kernel modules: nvme
    It was the Intel NVMe SSD.
  7. Why "Could not detach"?
    Try to find where the error message comes from.
        [root@NTNX-aa58deac-A ~]# find / -type f | xargs grep 'Could not detach device' 2>/dev/null
        /usr/lib/python3.6/nutanix-site-packages/ahv/utils.py: raise RuntimeError("Could not detach device : %s" % self._pci_address_str)
        Binary file /usr/lib/python3.6/nutanix-site-packages/ahv/__pycache__/utils.cpython-36.opt-1.pyc matches
        Binary file /usr/lib/python3.6/nutanix-site-packages/ahv/__pycache__/utils.cpython-36.pyc matches
        /usr/lib/python2.7/nutanix-site-packages/ahv/utils.py: raise RuntimeError("Could not detach device : %s" % self._pci_address_str)
        Binary file /usr/lib/python2.7/nutanix-site-packages/ahv/utils.pyo matches
        Binary file /usr/lib/python2.7/nutanix-site-packages/ahv/utils.pyc matches
    utils.py looks suspicious.
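The same search can be written in Python, which avoids the filename-quoting pitfalls of piping `find` into `xargs`. This is a rough equivalent rather than a replacement for the command on the slide: `find_files_containing` is a hypothetical helper, and it reads each file fully into memory, so it is better pointed at a narrower root such as `/usr/lib` than at `/`.

```python
import os
from typing import List


def find_files_containing(root: str, needle: str) -> List[str]:
    """Return sorted paths of regular files under root whose bytes contain needle."""
    needle_bytes = needle.encode("utf-8")
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror `find -type f`: skip symlinks, sockets and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Whole-file read: simple, but memory-hungry on large files.
                    if needle_bytes in handle.read():
                        matches.append(path)
            except OSError:
                # Unreadable files are skipped, much like 2>/dev/null hides errors.
                continue
    return sorted(matches)
```

Because it compares raw bytes, this also reports compiled `.pyc` files that embed the string, just as grep reported "Binary file ... matches" above.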
  8. Trying it out...
        [root@NTNX-aa58deac-A libvirt]# readlink /sys/bus/pci/devices/0000:05:00.0/iommu_group
        (no output; normally something would be printed here)
        [root@NTNX-aa58deac-A libvirt]# ls -la /sys/bus/pci/devices/0000:05:00.0/*iommu*
        ls: cannot access /sys/bus/pci/devices/0000:05:00.0/*iommu*: No such file or directory
    ??????
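The `readlink` test above can be wrapped in a small function that distinguishes "no IOMMU group" from "group N". This is a sketch under the usual Linux sysfs layout, where `iommu_group` is a symlink ending in the group number; `iommu_group_of` and its `sysfs_root` parameter are hypothetical names, the latter added only so the function can be exercised against a fake directory tree.

```python
import os
from typing import Optional


def iommu_group_of(pci_address: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the IOMMU group number of a PCI device, or None if it has none."""
    link = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_address, "iommu_group"
    )
    if not os.path.islink(link):
        # Same situation as the empty readlink output on the slide.
        return None
    # The link target ends in .../iommu_groups/<number>.
    return os.path.basename(os.readlink(link))
```

A `None` result for every device usually points at the IOMMU not being active on the host (firmware setting or kernel parameters) rather than at the individual device.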
  9. The IOMMU problem is cleared. However...
    • This time the error says memory cannot be allocated
        2023-08-20 13:03:19.668+0000: 1977: error : \
          qemuProcessReportLogError:2067 : \
          internal error: qemu unexpectedly closed the monitor: \
          2023-08-11T13:03:19.498297Z qemu-kvm: \
          -object memory-backend-file,id=ram-node0,mem-path=\
          /dev/hugepages/libvirt/qemu/\
          NX-aa58deac-A-CVM,share=yes,prealloc=yes,size=51539607552:\
          unable to map backing store for guest RAM: Cannot allocate memory
  10. The memory setting is wrong
        [root@NTNX-aa58deac-A ~]# virsh edit NTNX-aa58deac-A-CVM
        (excerpt)
        <memory unit='KiB'>50331648</memory>
        <currentMemory unit='KiB'>50331648</currentMemory>
        [root@NTNX-aa58deac-A ~]# free -k
                      total        used        free      shared  buff/cache   available
        Mem:       49076696    47398408     1302676        1048      375612     1355236
        Swap:             0           0           0
    The values were then changed (the edited values do not appear in the transcript).
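The mismatch is easier to see once the numbers share a unit. The arithmetic below uses only values from the slides (the domain XML, the `free -k` total, and the `size=` in the qemu error); the function names are hypothetical.

```python
KIB_PER_GIB = 1024 * 1024

# Values copied from the slides.
DOMAIN_MEMORY_KIB = 50331648      # <memory unit='KiB'> in the CVM definition
HOST_TOTAL_KIB = 49076696         # "total" column of `free -k`
QEMU_SIZE_BYTES = 51539607552     # size= in the memory-backend-file error


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


def memory_shortfall_kib(requested_kib: int, host_total_kib: int) -> int:
    """Return how many KiB the request exceeds the host total by (0 if it fits)."""
    return max(0, requested_kib - host_total_kib)


# The qemu error and the domain XML describe the same amount: exactly 48 GiB.
same_request = DOMAIN_MEMORY_KIB * 1024 == QEMU_SIZE_BYTES
requested_gib = kib_to_gib(DOMAIN_MEMORY_KIB)
shortfall = memory_shortfall_kib(DOMAIN_MEMORY_KIB, HOST_TOTAL_KIB)
```

The CVM is defined with 48 GiB, which is the full nominal size of the installed memory, while the host kernel reports about 46.8 GiB in total. The request therefore exceeds the host's total by 1254952 KiB (roughly 1.2 GiB) before the hypervisor's own usage is even counted, so the allocation cannot succeed.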
  11. It still does not start...
        [root@NTNX-aa58deac-A ~]# virsh start NTNX-aa58deac-A-CVM
        error: Failed to start domain NTNX-aa58deac-A-CVM
        error: internal error: qemu unexpectedly closed the monitor:
        2023-08-11T14:48:17.106515Z qemu-kvm: warning: Large machine and max_ram_below_4g (536870912) not a multiple of 1G; possible bad performance.
        2023-08-11T14:48:17.127954Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
        2023-08-11T14:48:17.128854Z qemu-kvm: \
          -device vfio-pci,host=0000:05:00.0,id=hostdev0,bus=pci.0,addr=0x7,rombar=0: vfio 0000:05:00.0: group 1 is not viable
        Please ensure all devices within the iommu_group are bound to their vfio bus driver.
    Apparently a common cause is several devices hanging off the same PCI bus, but that does not apply this time...
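One way to check the "several devices in one group" theory is to list every device in the IOMMU group together with its bound driver. The sketch below assumes the standard sysfs layout (`/sys/kernel/iommu_groups/<n>/devices/<address>/driver`); `group_drivers`, `devices_blocking_vfio` and the `sysfs_root` parameter are hypothetical names. The kernel's real viability rule is somewhat more permissive than "everything must be on vfio-pci" (for example for bridge port drivers), so the result is a hint, not a verdict.

```python
import os
from typing import Dict, List, Optional


def group_drivers(group: int, sysfs_root: str = "/sys") -> Dict[str, Optional[str]]:
    """Map each PCI address in an IOMMU group to its bound driver (None if unbound)."""
    devices_dir = os.path.join(
        sysfs_root, "kernel", "iommu_groups", str(group), "devices"
    )
    drivers = {}
    for address in sorted(os.listdir(devices_dir)):
        driver_link = os.path.join(devices_dir, address, "driver")
        if os.path.islink(driver_link):
            # The link target ends in .../drivers/<driver name>.
            drivers[address] = os.path.basename(os.readlink(driver_link))
        else:
            drivers[address] = None
    return drivers


def devices_blocking_vfio(drivers: Dict[str, Optional[str]]) -> List[str]:
    """Return addresses bound to some driver other than vfio-pci."""
    return [
        address
        for address, driver in drivers.items()
        if driver is not None and driver != "vfio-pci"
    ]
```

Calling `group_drivers(1)` on the host would show whether 0000:05:00.0 is really alone in group 1; an empty result from `devices_blocking_vfio` would support the slide's remark that the usual explanation does not fit here.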
  12. At a glance, the NVMe device is assigned to VFIO
        [root@NTNX-aa58deac-A ~]# lspci -s 0000:05:00.0 -v -D
        0000:05:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03) (prog-if 02 [NVM Express])
            Subsystem: Intel Corporation Intel Corporation SSD Pro 7600p/760p/E 6100p Series [NVM Express]
            Flags: fast devsel, IRQ 59, NUMA node 0, IOMMU group 1 <-★
            Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
            Capabilities: [40] Power Management version 3
            Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
            Capabilities: [70] Express Endpoint, MSI 00
            Capabilities: [b0] MSI-X: Enable- Count=16 Masked-
            Capabilities: [100] Advanced Error Reporting
            Capabilities: [158] Secondary PCI Express
            Capabilities: [178] Latency Tolerance Reporting
            Capabilities: [180] L1 PM Substates
            Kernel driver in use: vfio-pci <-★
            Kernel modules: nvme
    I have a feeling this is about to become a rabbit hole...
  13. Changing the configuration
    CPU: AMD Ryzen 5 4500
    Memory: 48 GB (Crucial DDR4, 16 GB x 2 + 8 GB x 2)
    M/B: ASRock A520M Phantom Gaming 4 (Realtek 1G NIC)
    Storage: ADATA 512 GB SATA SSD; Intel 512 GB NVMe SSD → replaced with an ADATA 512 GB SATA SSD; BUFFALO 32 GB USB memory