Go Out with Blackhole

Tenstorrent Tech Talk #4 - Lightning Talk
Tetsuya Hayashi
Note: This material is not official information from Tenstorrent. Please treat it solely as personal hobby information.

Introduction
I want to hack on Blackhole anytime, anywhere!
I really want to hack on the actual hardware, in my own hands.
I want to bring my own Blackhole along and show it off.
I want to bring Blackholes together and connect them at 800 Gbps.
A 2D torus with 4 people, a 3D torus with 8 people? That would be incredible! (1)
(1) Might be doable with effort. TT is freedom!
※ The image on the right was generated by Gemini Nano Banana.

Shopping
1. Thunderbolt 3 M.2 NVMe Adapter: Wavlink Portable M.2 NVMe SSD
2. M.2 NVMe PCIe 3.0 x4 Adapter: ADT-Link R42UF
3. 1000W ATX 3.1 Power Supply: Thermalright TR-TPFX-1000-W
Purchased the entire set during AliExpress's July sale for ¥32,335.
Building it myself was cheaper than buying an eGPU box.
Reference site: https://darekasan-net.hatenablog.com/entry/2024/09/04/152918

Tried various hosts

Host                               Result  I/F          Memory
Old ThinkPad X1 Carbon             ✗       TB3          16GB
Business card-sized x86 Radxa X4   ✗       M.2 OCuLink  8GB
Recent ThinkPad P14s Gen5          ◯       TB3          64GB

Old ThinkPad X1 Carbon: Abandoned because the BIOS has no Above 4G Decoding option, so the PCIe memory regions could not be recognized.
Radxa X4: Close! The small sample runs, but vLLM fails due to insufficient memory. tt-smi OK (※1), run_op_on_device.py OK, vLLM failed (※2).
ThinkPad P14s Gen5: Worked with IOMMU (VT-d) off.

※1 TT-SMI 3.0.32 on the Radxa X4, Oct 23 2025, 11:12:04 PM
  Host Info (Config Warning!)
    OS: Linux (x86_64)
    Distro: Ubuntu 24.04.3 LTS
    Kernel: 6.14.0-33-generic
    Hostname: hauynite
    Python: 3.12.3
    Memory: 7.54 GB (Recommended Config: 32 GB+)
    Driver: TT-KMD 2.4.1
  Device Information (columns to the right of Link Speed are cut off in the capture)
    #  Bus ID        Board Type  Board ID     Coords  DRAM Trained  DRAM Speed  Link Speed
    0  0000:03:00.0  p100a       4323191105c  N/A     N/A           N/A         Gen3 / Gen5
※2 ERROR 10-12 03:13:08 [engine.py:453] [enforce fail at alloc_cpu.cpp:117] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 17179869184 bytes. Error code 12 (Cannot allocate memory)
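The allocation size in the vLLM error is easier to judge in binary units. The following is only a sketch of that arithmetic. Treating the 7.54 GB reported by tt-smi as GiB is an assumption, and `fits_in_host` is a hypothetical helper, not part of any Tenstorrent or vLLM tool.

```python
# Size taken from the vLLM error message, in bytes.
requested_bytes = 17179869184

GIB = 2**30
requested_gib = requested_bytes / GIB

# Host memory reported by tt-smi on the Radxa X4.
# Assumption: the "GB" shown by tt-smi is really GiB.
radxa_gib = 7.54


def fits_in_host(requested: float, available: float) -> bool:
    """Return True if one allocation of `requested` GiB could fit in `available` GiB."""
    return requested <= available


print(f"requested: {requested_gib:.0f} GiB, host: {radxa_gib} GiB")
print("fits on the Radxa X4:", fits_in_host(requested_gib, radxa_gib))
```

The single 16 GiB request is roughly twice the total memory of the 8GB host, so no tuning of the small board would have helped here.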

It worked!
Board: My Blackhole p100a
PC: ThinkPad P14s Gen5, Intel Core Ultra 7 155H, 64GB
OS: Ubuntu 24.04.3, installed on bare metal
BIOS:
  Thunderbolt 3 -> Security Level: No Security
  Security -> Virtualization -> VT-d Feature: Disable (※)
※ For some reason, in my case, vLLM would not work and threw errors when IOMMU (VT-d) was enabled.
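After changing the VT-d setting in the BIOS, it can be useful to confirm from Linux whether the IOMMU is actually in use. A minimal sketch, assuming the usual sysfs layout in which the kernel creates numbered directories under /sys/kernel/iommu_groups when an IOMMU is active. The function name `iommu_active` is made up for this example.

```python
from pathlib import Path


def iommu_active(groups_dir: str = "/sys/kernel/iommu_groups") -> bool:
    """Return True if the kernel has created at least one IOMMU group.

    Assumption: an empty or missing directory means no IOMMU is in use.
    """
    path = Path(groups_dir)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    print("IOMMU active:", iommu_active())
```

On the working setup described above, this would be expected to report False, since VT-d was disabled in the BIOS.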

TT-SMI Display
TT-SMI 3.0.32, Oct 20 2025, 12:24:37 AM
Host Info (Fully Compatible)
  OS: Linux (x86_64)
  Distro: Ubuntu 24.04.3 LTS
  Kernel: 6.14.0-33-generic
  Hostname: midnight
  Python: 3.12.3
  Memory: 62.30 GB
  Driver: TT-KMD 2.4.1
Device Information
  #  Bus ID        Board Type  Board ID     Coords  DRAM Trained  DRAM Speed  Link Speed   Link Width
  0  0000:52:00.0  p100a       4323191105c  N/A     N/A           N/A         Gen3 / Gen5  x4 / x16
Saving an SVG from tt-smi adds a window border like this, but this is not a Mac.

It worked! TT-Inference-Server
Following the "Deploying LLMs" section of the tutorial, vLLM worked smoothly. (※)
https://docs.tenstorrent.com/getting-started/vLLM-servers.html#deploying-llms

(request-venv) hayate@midnight:~/git/tt-inference-server$ curl -sS "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $VLLM_API_KEY" \
    -d "{
      \"model\": \"meta-llama/$MODEL\",
      \"prompt\": \"Jim Keller is?\",
      \"max_tokens\": 60,
      \"temperature\": 0
    }" | jq
{
  "id": "cmpl-9c65c696ebaa4031a5900aaec091ab11",
  "object": "text_completion",
  "created": 1761145166,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " (Part 2)\nJim Keller is a renowned American computer architect and engineer, best known for his work at AMD and Apple. He is credited with designing the x86-64 architecture, which is the foundation of modern personal computers.\nKeller's career spans over three decades, with significant contributions to",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 65,
    "completion_tokens": 60,
    "prompt_tokens_details": null
  }
}

※ For the tt-inference-server branch, try bh-getting-started first, then move on to dev once that works.
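The same request can also be built from Python with only the standard library. This is a sketch under assumptions rather than anything taken from tt-inference-server: the endpoint, headers, and payload fields are copied from the curl command in this slide, while the helper names `build_completion_request` and `first_completion_text` are invented for illustration.

```python
import json
import urllib.request


def build_completion_request(model, prompt, api_key,
                             base_url="http://localhost:8000"):
    """Build (but do not send) a POST request mirroring the curl command."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 60,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def first_completion_text(response_body: str) -> str:
    """Extract choices[0].text from a completions response body."""
    return json.loads(response_body)["choices"][0]["text"]
```

With the server running, passing the request to `urllib.request.urlopen` and feeding the decoded body to `first_completion_text` should return the same generated text that jq shows under `choices[0].text`.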

Summary
Built a portable Blackhole environment using a Thunderbolt adapter + p100a.
Now able to go out with my Blackhole anytime, anywhere.
Future work
  Investigate performance degradation from the Thunderbolt connection (is 8.0 Gb/s sufficient?) (2)
  Evaluate Blackhole peer-to-peer 800 Gbps connection performance. Requires two or more P150 units. Yes, I want them!
(2) Kernel log and lspci output for the link:
  pci 0000:52:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.2 (capable of 504.112 Gb/s with 32.0 GT/s PCIe x16 link)
  00:07.2 PCI bridge: Intel Corporation Meteor Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
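The two bandwidth figures in the kernel message follow from the link rate, the lane count, and the line encoding. A small sketch of that arithmetic, assuming 8b/10b encoding at 2.5 and 5 GT/s and 128b/130b at 8 to 32 GT/s. Truncating the per-lane rate to whole Mb/s is an assumption chosen here because it reproduces the logged 504.112 Gb/s exactly.

```python
# Transfer rate in MT/s -> (payload bits, total bits) of the line encoding.
ENCODING = {
    2500: (8, 10),
    5000: (8, 10),
    8000: (128, 130),
    16000: (128, 130),
    32000: (128, 130),
}


def link_bandwidth_mbps(rate_mtps: int, lanes: int) -> int:
    """Usable bandwidth in Mb/s for a PCIe link of the given rate and width."""
    payload_bits, total_bits = ENCODING[rate_mtps]
    per_lane = rate_mtps * payload_bits // total_bits  # truncated to whole Mb/s
    return per_lane * lanes


def format_gbps(mbps: int) -> str:
    """Format Mb/s as Gb/s with three decimals, like the kernel message."""
    return f"{mbps // 1000}.{mbps % 1000:03d} Gb/s"


print(format_gbps(link_bandwidth_mbps(2500, 4)))    # reported Thunderbolt link
print(format_gbps(link_bandwidth_mbps(32000, 16)))  # what the card is capable of
```

By these numbers the reported link carries about 1.6% of what the card could use in a Gen5 x16 slot, which is why the performance impact is listed as future work.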

Thank you
Some setup tips follow.

Tips: Linux Device Recognition and hugepages
1. Add a udev rule so that Thunderbolt devices are authorized when they are connected
   /etc/udev/rules.d/99-removable.rules
     ACTION=="add", SUBSYSTEM=="thunderbolt", ATTR{authorized}=="0", ATTR{authorized}="1"
   ※ Reference URL: https://wiki.archlinux.org/title/Thunderbolt
2. Connect the p100a and verify with lspci
   Run lspci -vv -d 1e52:*
     52:00.0 Processing accelerators: Tenstorrent Inc Blackhole
   The device must be listed, and three memory regions (0, 2, 4) must be allocated.
3. Re-apply hugepages (mandatory for plug-and-play connections)
   If the device shows up in lspci, manually run sudo /opt/tenstorrent/bin/hugepages-setup.sh
   If it prints "Node 0 hugepages after: 4", it's OK. You can also check the information with cat /proc/meminfo.
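The /proc/meminfo check in step 3 can be scripted. The following is a sketch only: the sample text is made up for illustration and is not output from the author's machine, and `hugepage_counts` is a hypothetical helper. Note that the HugePages_* lines in /proc/meminfo count pages of the default hugepage size only.

```python
def hugepage_counts(meminfo_text: str) -> dict:
    """Collect the HugePages_* counters from /proc/meminfo-style text."""
    counts = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("HugePages_"):
            counts[key] = int(value.split()[0])
    return counts


# Illustrative sample, not captured from real hardware.
SAMPLE_MEMINFO = """\
MemTotal:       65327104 kB
HugePages_Total:       4
HugePages_Free:        4
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
"""

counts = hugepage_counts(SAMPLE_MEMINFO)
print("hugepages configured:", counts["HugePages_Total"])
```

On a real host, pass the contents of /proc/meminfo instead of the sample and compare HugePages_Total with the number printed by the setup script.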

lspci and hugepages-setup.sh output

$ lspci -vv -d 1e52:*
52:00.0 Processing accelerators: Tenstorrent Inc Blackhole
        Subsystem: Tenstorrent Inc Blackhole
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 198
        Region 0: Memory at 4800000000 (64-bit, prefetchable) [size=512M]
        Region 2: Memory at 4820000000 (64-bit, prefetchable) [size=1M]
        Region 4: Memory at 4000000000 (64-bit, prefetchable) [size=32G]
        Capabilities: <access denied>
        Kernel driver in use: tenstorrent
        Kernel modules: tenstorrent

$ sudo /opt/tenstorrent/bin/hugepages-setup.sh
Node 0 hugepages needed: 4
Node 0 hugepages after: 4
Completed hugepage setup
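The "three memory regions (0, 2, 4)" check from the tips can be automated by parsing the lspci text. A minimal sketch, assuming the line format shown in the output above. The regular expression and helper names are illustrative, and a region whose address is not a hex number (for example an unassigned one) is deliberately not counted.

```python
import re

# Matches lines such as:
#   Region 0: Memory at 4800000000 (64-bit, prefetchable) [size=512M]
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) \(.*?\) \[size=(\w+)\]"
)


def memory_regions(lspci_text: str) -> dict:
    """Map region index -> size string for every assigned memory region."""
    return {int(m.group(1)): m.group(3) for m in REGION_RE.finditer(lspci_text)}


def has_expected_regions(lspci_text: str) -> bool:
    """True if regions 0, 2 and 4 are all present with an address assigned."""
    return {0, 2, 4} <= set(memory_regions(lspci_text))


SAMPLE = """\
Region 0: Memory at 4800000000 (64-bit, prefetchable) [size=512M]
Region 2: Memory at 4820000000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at 4000000000 (64-bit, prefetchable) [size=32G]
"""

print(memory_regions(SAMPLE))
print("all regions allocated:", has_expected_regions(SAMPLE))
```

To use it on a live system, capture the output of `lspci -vv -d 1e52:*` (for example with `subprocess.run`) and pass it in place of `SAMPLE`.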

Tenstorrent Japan
