2 2 with BLAS # Compute tables q_norms = norms(Q) # 1 2 2, 2 2 2, … , 2 2 x_norms = norms(X) # 1 2 2, 2 2 2, … , 2 2 ip = sgemm_(Q, X, …) # ⊤ # Scan and sum parfor (m = 0; m < M; ++m): for (n = 0; n < N; ++n): dist = q_norms[m] + x_norms[n] – ip[m][n] Stack -dim query vectors to a × matrix: = 1 , 2 , … , ∈ ℝ× Stack -dim database vectors to a × matrix: = 1 , 2 , … , ∈ ℝ× SIMD-accelerated function ➢ Matrix multiplication by BLAS ➢ Dominant if and are large ➢ The difference of the background matters: ✓ Intel MKL is 30% faster than OpenBLAS 29 2 2 2 2 ⊤ − 2 2