Slide 27
ISC’23
Relaxed alignment restriction for packed vectors
• VE20 required 8-byte alignment for FP32 vectors, resulting in poor performance with
some access patterns (e.g., stencil-like).
• VE30 relaxes the restriction to 4-byte alignment.
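Why a stencil runs into the 8-byte rule can be seen from simple address arithmetic. The sketch below is illustrative only: the program name, the row length nx = 8 and the assumption that b(0) starts on an 8-byte boundary are not from the slide. It prints the byte offset at which each shifted stream b(i-1), b(i), b(i+1) begins when the i loop starts at 1.

```fortran
program packed_alignment_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  integer, parameter :: nx = 8          ! hypothetical row length
  real(real32) :: b(0:nx+1)             ! one row of b with a one-cell halo
  integer :: s, elem_bytes, offset

  ! storage_size returns the element size in bits: 32 for FP32, i.e. 4 bytes.
  elem_bytes = storage_size(b) / 8

  ! Assumption: b(0) itself starts on an 8-byte boundary.
  do s = -1, 1
     ! Byte offset of b(1+s), the first element read by the stream b(i+s)
     ! when the i loop starts at i = 1.
     offset = (1 + s - lbound(b, 1)) * elem_bytes
     print '(a,i2,a,i2,a,l1,a,l1)', 'shift ', s, ': offset ', offset, &
          ', 8-byte aligned: ', mod(offset, 8) == 0, &
          ', 4-byte aligned: ', mod(offset, 4) == 0
  end do
end program packed_alignment_sketch
```

Under these assumptions the three streams start at byte offsets 0, 4 and 8, so the middle stream is 4-byte aligned but not 8-byte aligned. Whatever the base alignment, neighbouring FP32 streams differ by 4 bytes, so they cannot all meet an 8-byte requirement at once.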
Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer
[Figure: bar chart, y-axis GFLOP/s (0 to 70); bars: VE20 w/o packed, VE30 w/o packed, VE30 w/ packed]
do k = 1, nz
  do j = 1, ny
    do i = 1, nx
      a(i,j,k) = a(i,j,k) + &
        (b(i-1,j-1,k-1) + b(i  ,j-1,k-1) + b(i+1,j-1,k-1) + &
         b(i-1,j  ,k-1) + b(i  ,j  ,k-1) + b(i+1,j  ,k-1) + &
         b(i-1,j+1,k-1) + b(i  ,j+1,k-1) + b(i+1,j+1,k-1) + &
         b(i-1,j-1,k  ) + b(i  ,j-1,k  ) + b(i+1,j-1,k  ) + &
         b(i-1,j  ,k  ) + b(i  ,j  ,k  ) + b(i+1,j  ,k  ) + &
         b(i-1,j+1,k  ) + b(i  ,j+1,k  ) + b(i+1,j+1,k  ) + &
         b(i-1,j-1,k+1) + b(i  ,j-1,k+1) + b(i+1,j-1,k+1) + &
         b(i-1,j  ,k+1) + b(i  ,j  ,k+1) + b(i+1,j  ,k+1) + &
         b(i-1,j+1,k+1) + b(i  ,j+1,k+1) + b(i+1,j+1,k+1))/27.0
    end do
  end do
end do
27-point stencil microbenchmark
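The kernel on the slide omits declarations, so a minimal self-contained driver may help when trying it out. The sketch below is not the authors' benchmark harness: the program name, the problem size, the one-cell halo on b and the constant initial values are all assumptions chosen so that the result is easy to check.

```fortran
program stencil27_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  ! Hypothetical problem size; the slide does not state nx, ny, nz.
  integer, parameter :: nx = 64, ny = 32, nz = 16
  ! FP32 arrays; b carries a one-cell halo so that i-1, i+1, etc. stay in bounds.
  real(real32) :: a(nx, ny, nz)
  real(real32) :: b(0:nx+1, 0:ny+1, 0:nz+1)
  integer :: i, j, k

  a = 0.0_real32
  b = 1.0_real32

  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        a(i,j,k) = a(i,j,k) + &
          (b(i-1,j-1,k-1) + b(i  ,j-1,k-1) + b(i+1,j-1,k-1) + &
           b(i-1,j  ,k-1) + b(i  ,j  ,k-1) + b(i+1,j  ,k-1) + &
           b(i-1,j+1,k-1) + b(i  ,j+1,k-1) + b(i+1,j+1,k-1) + &
           b(i-1,j-1,k  ) + b(i  ,j-1,k  ) + b(i+1,j-1,k  ) + &
           b(i-1,j  ,k  ) + b(i  ,j  ,k  ) + b(i+1,j  ,k  ) + &
           b(i-1,j+1,k  ) + b(i  ,j+1,k  ) + b(i+1,j+1,k  ) + &
           b(i-1,j-1,k+1) + b(i  ,j-1,k+1) + b(i+1,j-1,k+1) + &
           b(i-1,j  ,k+1) + b(i  ,j  ,k+1) + b(i+1,j  ,k+1) + &
           b(i-1,j+1,k+1) + b(i  ,j+1,k+1) + b(i+1,j+1,k+1)) / 27.0_real32
      end do
    end do
  end do

  ! With b = 1 everywhere, each update adds 27/27 = 1 to a.
  if (all(abs(a - 1.0_real32) < 1.0e-6_real32)) then
    print '(a)', 'stencil check passed'
  else
    print '(a)', 'stencil check FAILED'
  end if
end program stencil27_sketch
```

The innermost loop runs over i, the contiguous dimension in Fortran, so the three i-shifted reads of b in each row are the unit-stride streams whose start addresses differ by one 4-byte element.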