© Prometech Software, Inc.
Preliminary Evaluation
NVIDIA A100 (p4d.24xlarge)
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00260 s, Size = 32000000 bytes, NumDevsUsed = 1
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00243 s, Size = 32000000 bytes, NumDevsUsed = 1
NVIDIA V100 (p3dn.24xlarge)
bandwidthTest-H2D-Pinned, Bandwidth = 11.2 GB/s, Time = 0.00286 s, Size = 32000000 bytes, NumDevsUsed = 1
bandwidthTest-D2H-Pinned, Bandwidth = 12.5 GB/s, Time = 0.00255 s, Size = 32000000 bytes, NumDevsUsed = 1
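Each bandwidthTest line reports Size, Time and Bandwidth together. A quick consistency check is to recompute Size / Time and compare it with the reported figure. The sketch below assumes the line format shown above and that "GB" means 10^9 bytes (the reported values are consistent with that). `LINES`, `PATTERN` and `recompute_gbps` are hypothetical names used only for illustration.

```python
import re

# Result lines copied from the A100 and V100 bandwidthTest runs above.
LINES = [
    "bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00260 s, Size = 32000000 bytes, NumDevsUsed = 1",
    "bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00243 s, Size = 32000000 bytes, NumDevsUsed = 1",
    "bandwidthTest-H2D-Pinned, Bandwidth = 11.2 GB/s, Time = 0.00286 s, Size = 32000000 bytes, NumDevsUsed = 1",
    "bandwidthTest-D2H-Pinned, Bandwidth = 12.5 GB/s, Time = 0.00255 s, Size = 32000000 bytes, NumDevsUsed = 1",
]

# Assumption: fields appear in this order, as in the output above.
PATTERN = re.compile(
    r"Bandwidth = (?P<bw>[\d.]+) GB/s, Time = (?P<t>[\d.]+) s, Size = (?P<size>\d+) bytes"
)


def recompute_gbps(line: str) -> tuple[float, float]:
    """Return (reported GB/s, Size/Time in GB/s with 1 GB = 1e9 bytes)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line}")
    reported = float(m["bw"])
    computed = int(m["size"]) / float(m["t"]) / 1e9
    return reported, computed


for line in LINES:
    reported, computed = recompute_gbps(line)
    test_name = line.split(",")[0]
    print(f"{test_name}: reported {reported} GB/s, size/time {computed:.2f} GB/s")
```

Rounded to one decimal place, Size / Time reproduces every reported bandwidth, so the 32 MB transfer is long enough that the reported figure is simply the payload divided by the elapsed time.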
Max 15.75 GB/s (theoretical PCIe gen3 x16 peak, per direction)
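The 15.75 GB/s maximum is the per-direction line rate of a PCIe gen3 x16 link. A minimal sketch of the arithmetic, assuming 16 lanes at 8 GT/s with 128b/130b encoding (the PCIe 3.0 parameters); the constant and function names are illustrative, not from the slide.

```python
# PCIe 3.0 parameters: 8 GT/s per lane, 128b/130b line encoding.
GT_PER_S_PER_LANE = 8e9
ENCODING_EFFICIENCY = 128 / 130
LANES = 16  # assumption: the GPU is attached through an x16 link


def pcie_gen3_peak_gbytes_per_s(lanes: int = LANES) -> float:
    """Per-direction rate in GB/s (1 GB = 1e9 bytes) after line encoding,
    before PCIe packet (TLP/DLLP) overhead is subtracted."""
    bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e9


print(f"{pcie_gen3_peak_gbytes_per_s():.2f} GB/s")
```

Because packet headers are not subtracted, this figure is an upper bound that measured transfers cannot fully reach.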
[Figure: osu_bibw (inter-node CPU), Bandwidth [GB/s] vs. Message size [Byte] from 1 B to 4 MB, with a 100 Gbps reference line]
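The plot's y-axis is in GB/s while the reference line is labeled in Gbps. A small conversion sketch, assuming decimal units and 8 bits per byte; treating 100 Gbps as the single EFA link rate mentioned below is an interpretation, and `gbit_to_gbyte` is a hypothetical helper.

```python
def gbit_to_gbyte(gbps: float) -> float:
    """Convert a link rate in Gbit/s to GB/s (decimal units, 8 bits per byte)."""
    return gbps / 8


# Reference line drawn in the osu_bibw plot (assumed: one EFA link).
EFA_LINK_GBPS = 100
print(f"{EFA_LINK_GBPS} Gbps = {gbit_to_gbyte(EFA_LINK_GBPS)} GB/s")
```

The 100 Gbps line therefore sits at 12.5 GB/s on the plot's axis.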
• CPU-GPU data transfer performance is limited by PCIe gen3
• Inter-node CPU communication reaches the performance of a single EFA link
• Communication that aggregates multiple links sometimes does not work well, even on InfiniBand
• P4d provides EFA GDR (GPUDirect RDMA), but runs stop with an Open MPI internal error (cause still under investigation)
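To put the PCIe gen3 limit in numbers, the pinned-memory bandwidthTest results can be expressed as a fraction of the 15.75 GB/s maximum stated on this slide. A minimal sketch; the dictionary layout and `fraction_of_peak` are illustrative choices, and the values are copied from the measurements above.

```python
# "Max" value stated on the slide, in GB/s.
PCIE_GEN3_X16_PEAK = 15.75

# Pinned-memory bandwidthTest results from this slide, in GB/s.
MEASURED = {
    ("A100 p4d.24xlarge", "H2D"): 12.3,
    ("A100 p4d.24xlarge", "D2H"): 13.2,
    ("V100 p3dn.24xlarge", "H2D"): 11.2,
    ("V100 p3dn.24xlarge", "D2H"): 12.5,
}


def fraction_of_peak(gbps: float, peak: float = PCIE_GEN3_X16_PEAK) -> float:
    """Measured bandwidth as a fraction of the theoretical peak."""
    return gbps / peak


for (gpu, direction), gbps in MEASURED.items():
    print(f"{gpu} {direction}: {gbps} GB/s = {fraction_of_peak(gbps):.0%} of peak")
```

All four transfers land at roughly 71% to 84% of the theoretical figure, on both A100 and V100 instances.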