
Go Out with Blackhole

Tenstorrent Japan
October 29, 2025

Tenstorrent TechTalk #4 in Tokyo, LT2


Transcript

  1. Go Out with Blackhole

    Tenstorrent Tech Talk #4 - Lightning Talk
    Tetsuya Hayashi
    Note: This material is not official information from Tenstorrent. Please treat it as purely personal hobby information.
  2. Introduction

    I want to hack Blackhole anytime, anywhere!
    I really want to hack on the actual hardware in my own hands.
    I want to bring my own Blackhole along and show it off.
    I want people to bring their Blackholes together and connect them at 800Gbps.
    A 2D torus with 4 people, a 3D torus with 8 people? That would be incredible! (1)
    It might be doable with some effort. TT is freedom!
    ※ The image on the right was generated by Gemini Nano Banana (1)
  3. Shopping

    1. Thunderbolt 3 M.2 NVMe Adapter: Wavlink Portable M.2 NVMe SSD
    2. M.2 NVMe PCIe 3.0 x4 Adapter: ADT-Link R42UF
    3. 1000W ATX 3.1 Power Supply: Thermalright TR-TPFX-1000-W
    Purchased the entire set during AliExpress's July sale for ¥32,335.
    DIY was cheaper than buying an eGPU box.
    Reference site: https://darekasan-net.hatenablog.com/entry/2024/09/04/152918
  4. Tried various hosts

    | Host | Result | I/F | Memory | Notes |
    | --- | --- | --- | --- | --- |
    | Old ThinkPad X1 Carbon | ✗ | TB3 | 16GB | Abandoned: the BIOS has no Above 4G Decoding option, so the PCIe memory could not be recognized |
    | Business card-sized x86 Radxa X4 | ✗ | M.2 OCI link | 8GB | Close! The small sample runs, but vLLM fails due to insufficient memory. tt-smi OK, run_op_on_device.py OK, vLLM NG (※2) |
    | Recent ThinkPad P14s Gen5 | ◯ | TB3 | 64GB | Worked with IOMMU (VT-d) off |

    TT-SMI screenshot (Version 3.0.32, Oct 23 2025 11:12:04 PM)
        Host Info (Config Warning!)
            OS: Linux (x86_64)
            Distro: Ubuntu 24.04.3 LTS
            Kernel: 6.14.0-33-generic
            Hostname: hauynite
            Python: 3.12.3
            Memory: 7.54 GB (* 32GB+)
            Driver: TT-KMD 2.4.1
            * Recommended Config
        Device Information
            #: 0
            Bus ID: 0000:03:00.0
            Board Type: p100a
            Board ID: 4323191105c
            Coords: N/A
            DRAM Trained: N/A
            DRAM Speed: N/A
            Link Speed: Gen3 / Gen5
            Link Width: (cut off in the capture)

    ※2 ERROR 10-12 03:13:08 [engine.py:453] [enforce fail at alloc_cpu.cpp:117] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 17179869184 bytes. Error code 12 (Cannot allocate memory)
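To put the allocation failure in footnote ※2 into perspective, here is a small sketch that converts the byte count from the error message into GiB and compares it with the host memory shown by tt-smi. Treating the tt-smi figure as GiB is an assumption; the conclusion is the same either way.

```python
# Size of the failed allocation, copied from the vLLM error message.
requested_bytes = 17_179_869_184

# Host memory as displayed by tt-smi (interpreting the unit as GiB is an assumption).
host_memory_gib = 7.54

requested_gib = requested_bytes / 2**30
print(f"requested: {requested_gib:.1f} GiB, host: {host_memory_gib} GiB")
print("fits in host RAM:", requested_gib <= host_memory_gib)
```

A single 16 GiB request cannot succeed on a host with roughly 8 GB of RAM, which matches the "vLLM NG" result for that machine.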
  5. It worked!

    Board: My Blackhole p100a
    PC: ThinkPad P14s Gen5
        Intel Core Ultra 7 155H, 64GB
        Ubuntu 24.04.3 installed on bare metal
    BIOS:
        Thunderbolt 3 -> Security Level: No Security
        Security -> Virtualization -> VT-d Feature: Disable
    ※ For some reason, in my case vLLM threw errors and would not work while IOMMU (VT-d) was enabled.
  6. TT-SMI Display

    TT-SMI screenshot (Version 3.0.32, Oct 20 2025 12:24:37 AM)
        Host Info (Fully Compatible)
            OS: Linux (x86_64)
            Distro: Ubuntu 24.04.3 LTS
            Kernel: 6.14.0-33-generic
            Hostname: midnight
            Python: 3.12.3
            Memory: 62.30 GB
            Driver: TT-KMD 2.4.1
        Device Information
            #: 0
            Bus ID: 0000:52:00.0
            Board Type: p100a
            Board ID: 4323191105c
            Coords: N/A
            DRAM Trained: N/A
            DRAM Speed: N/A
            Link Speed: Gen3 / Gen5
            Link Width: x4 / x16

    Saving an SVG from tt-smi adds this window border, but this is not a Mac.
  7. It worked! TT-Inference-Server

    Following the tutorial's "Deploying LLMs" section, vLLM worked smoothly.

        (request-venv) hayate@midnight:~/git/tt-inference-server$ curl -sS "http://localhost:8000/v1/completions" \
            -H "Content-Type: application/json" \
            -H "Authorization: Bearer $VLLM_API_KEY" \
            -d "{ \"model\": \"meta-llama/$MODEL\", \"prompt\": \"Jim Keller is?\", \"max_tokens\": 60, \"temperature\": 0 }" | jq
        {
          "id": "cmpl-9c65c696ebaa4031a5900aaec091ab11",
          "object": "text_completion",
          "created": 1761145166,
          "model": "meta-llama/Llama-3.1-8B-Instruct",
          "choices": [
            {
              "index": 0,
              "text": " (Part 2)\nJim Keller is a renowned American computer architect and engineer, best known for his work at AMD and Apple. He is credited with designing the x86-64 architecture, which is the foundation of modern personal computers.\nKeller's career spans over three decades, with significant contributions to",
              "logprobs": null,
              "finish_reason": "length",
              "stop_reason": null,
              "prompt_logprobs": null
            }
          ],
          "usage": {
            "prompt_tokens": 5,
            "total_tokens": 65,
            "completion_tokens": 60,
            "prompt_tokens_details": null
          }
        }

    https://docs.tenstorrent.com/getting-started/vLLM-servers.html#deploying-llms
    ※ For the tt-inference-server branch, try bh-getting-started first, then move on to dev once that works.
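The same completion request can be issued from Python instead of curl. The sketch below uses only the standard library and assumes the server is listening on localhost:8000 with the API key in `VLLM_API_KEY`, as in the slide. The helper names `build_completion_request` and `complete` are hypothetical and not part of tt-inference-server.

```python
import json
import urllib.request


def build_completion_request(model, prompt, api_key, base_url="http://localhost:8000"):
    """Build the POST request that the curl command on the slide sends."""
    payload = {"model": model, "prompt": prompt, "max_tokens": 60, "temperature": 0}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def complete(model, prompt, api_key):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(model, prompt, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("meta-llama/Llama-3.1-8B-Instruct", "Jim Keller is?", os.environ["VLLM_API_KEY"])` (after `import os`) should return the `text` field seen in the JSON response.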
  8. Summary

    Built a portable Blackhole environment using a Thunderbolt adapter + p100a.
    Now able to go out with my Blackhole anytime, anywhere.
    Future work
        Investigate performance degradation from the Thunderbolt connection (Is 8.0 Gb/s sufficient?) (2)
        Evaluate Blackhole peer-to-peer 800Gbps connection performance
        Requires two or more P150 units. Yes, I want them!
    (2) pci 0000:52:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.2 (capable of 504.112 Gb/s with 32.0 GT/s PCIe x16 link)
        00:07.2 PCI bridge: Intel Corporation Meteor Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
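The two bandwidth figures in the kernel message in note (2) can be reproduced from the link parameters. This is a minimal sketch assuming the usual PCIe line encodings (8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s and above) and integer rounding of the per-lane rate in Mb/s; the rounding is an assumption chosen because it matches the logged values. `link_bandwidth_mbps` is a hypothetical helper name.

```python
# Line-encoding efficiency per raw lane rate in MT/s: (payload bits, total bits).
ENCODING = {
    2_500: (8, 10),
    5_000: (8, 10),
    8_000: (128, 130),
    16_000: (128, 130),
    32_000: (128, 130),
}


def link_bandwidth_mbps(rate_mts, lanes):
    """Usable bandwidth in Mb/s for a link with the given lane rate and width."""
    payload_bits, total_bits = ENCODING[rate_mts]
    per_lane = rate_mts * payload_bits // total_bits  # rounded down per lane
    return per_lane * lanes


current = link_bandwidth_mbps(2_500, 4)    # 2.5 GT/s x4, as reported at 0000:00:07.2
capable = link_bandwidth_mbps(32_000, 16)  # 32.0 GT/s x16, what the board supports
print(f"available: {current / 1000:.3f} Gb/s")
print(f"capable:   {capable / 1000:.3f} Gb/s")
```

Under these assumptions the reported link offers about 1.6% of the bandwidth the board is capable of, which is why measuring the actual impact is listed as future work.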
  9. Tips: Linux Device Recognition and hugepages

    1. Add a udev rule so that Thunderbolt devices are authorized on connection
        /etc/udev/rules.d/99-removable.rules
        ACTION=="add", SUBSYSTEM=="thunderbolt", ATTR{authorized}=="0", ATTR{authorized}="1"
        ※ Reference URL: https://wiki.archlinux.org/title/Thunderbolt
    2. Connect the p100a and verify with lspci
        Verify with lspci -vv -d 1e52:*
        52:00.0 Processing accelerators: Tenstorrent Inc Blackhole
        The device must be listed and three Memory regions (0, 2, 4) must be allocated.
    3. Re-apply hugepages (mandatory when the board is hot-plugged)
        If the device shows up in lspci, manually run sudo /opt/tenstorrent/bin/hugepages-setup.sh
        If it prints "Node 0 hugepages after: 4", it's OK.
        You can also check the hugepage information with cat /proc/meminfo
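The /proc/meminfo check from step 3 can be scripted. The following is a minimal sketch that extracts the hugepage-related fields; the `SAMPLE` text is made up for illustration and is not output from the author's machine, and `parse_hugepages` is a hypothetical helper name. Note that `HugePages_Total` only describes the pool of the default hugepage size, so `Hugetlb` (total hugepage memory across sizes, on kernels that report it) is included as well.

```python
def parse_hugepages(meminfo_text):
    """Return the hugepage-related fields of /proc/meminfo as integers."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize", "Hugetlb")
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            info[key] = int(rest.split()[0])  # drop the trailing "kB" unit if present
    return info


# Hypothetical excerpt, for illustration only.
SAMPLE = """\
MemTotal:       65327104 kB
HugePages_Total:       4
HugePages_Free:        4
Hugepagesize:    1048576 kB
Hugetlb:         4194304 kB
"""

print(parse_hugepages(SAMPLE))
```

On a real host, pass the file contents instead: `parse_hugepages(open("/proc/meminfo").read())`.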
  10. lspci and hugepages-setup.sh output

    $ lspci -vv -d 1e52:*
    52:00.0 Processing accelerators: Tenstorrent Inc Blackhole
        Subsystem: Tenstorrent Inc Blackhole
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 198
        Region 0: Memory at 4800000000 (64-bit, prefetchable) [size=512M]
        Region 2: Memory at 4820000000 (64-bit, prefetchable) [size=1M]
        Region 4: Memory at 4000000000 (64-bit, prefetchable) [size=32G]
        Capabilities: <access denied>
        Kernel driver in use: tenstorrent
        Kernel modules: tenstorrent
    $ sudo /opt/tenstorrent/bin/hugepages-setup.sh
    Node 0 hugepages needed: 4
    Node 0 hugepages after: 4
    Completed hugepage setup
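The "three Memory regions (0, 2, 4) must be allocated" check can be automated by parsing the lspci text. This is a rough sketch, not an official Tenstorrent tool: `memory_regions` and `regions_ok` are hypothetical helpers, and the regular expression assumes the `Region N: Memory at <hex address> ... [size=...]` layout shown in the output above. A region without a hexadecimal address would not match and is therefore reported as missing.

```python
import re

# Matches e.g. "Region 0: Memory at 4800000000 (64-bit, prefetchable) [size=512M]".
REGION_RE = re.compile(r"Region (\d+): Memory at ([0-9a-f]+) .*?\[size=(\w+)\]")


def memory_regions(lspci_text):
    """Map each allocated memory region index to its size string."""
    return {int(index): size for index, _address, size in REGION_RE.findall(lspci_text)}


def regions_ok(lspci_text, required=(0, 2, 4)):
    """True if every required memory region has an address assigned."""
    return set(required) <= set(memory_regions(lspci_text))


# Region lines copied from the lspci output on the slide.
SAMPLE = """\
Region 0: Memory at 4800000000 (64-bit, prefetchable) [size=512M]
Region 2: Memory at 4820000000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at 4000000000 (64-bit, prefetchable) [size=32G]
"""

print(memory_regions(SAMPLE))
print("regions OK:", regions_ok(SAMPLE))
```

To run it against live hardware, the text could come from `subprocess.run(["lspci", "-vv", "-d", "1e52:"], capture_output=True, text=True).stdout`.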