number of samples)

Two keys:
1. Incomplete Cholesky Factorization - a reduced-rank approximate factorization of the kernel matrix
2. Interior Point Method - a parallelizable quadratic programming solver

Reduces memory use from O(n²) to O(np/m), where p/m ≪ n (p is the reduced column dimension of the factorized matrix, m is the number of machines)
Reduces computation time from O(n²) (decomposition method) to O(np²/m) (the bottleneck is the matrix inversion in the IPM step)
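To make the first key concrete, here is a minimal sketch of a pivoted incomplete Cholesky factorization in C. The RBF kernel, gamma, and the cutoff p are my own illustrative choices, not taken from the paper. The routine builds an n×p factor H with Q ≈ HHᵀ while touching only one column of the kernel matrix per step, which is where the O(np) memory figure comes from (O(np/m) once the rows of H are split across m machines).

```c
#include <math.h>
#include <stdlib.h>

/* Illustrative RBF kernel; gamma is an assumed parameter. */
static double rbf(const double *xi, const double *xj, int d, double gamma) {
    double s = 0.0;
    for (int k = 0; k < d; k++) {
        double diff = xi[k] - xj[k];
        s += diff * diff;
    }
    return exp(-gamma * s);
}

/* Pivoted incomplete Cholesky factorization of the kernel matrix Q:
 * builds H (n x p, row-major) with Q ~= H * H^T without ever storing
 * the full n x n matrix. X holds n training vectors of dimension d,
 * row-major. Returns the number of columns actually produced. */
int icf(const double *X, int n, int d, int p, double gamma, double *H) {
    double *diag = malloc((size_t)n * sizeof(double));
    int *done = calloc((size_t)n, sizeof(int));
    for (int i = 0; i < n; i++)
        diag[i] = rbf(X + (size_t)i * d, X + (size_t)i * d, d, gamma);
    for (size_t i = 0; i < (size_t)n * p; i++)
        H[i] = 0.0;

    int k;
    for (k = 0; k < p; k++) {
        /* Greedy pivot: largest remaining residual diagonal entry. */
        int piv = -1;
        double best = 1e-12;
        for (int i = 0; i < n; i++)
            if (!done[i] && diag[i] > best) { best = diag[i]; piv = i; }
        if (piv < 0) break;                 /* residual is numerically zero */

        done[piv] = 1;
        H[(size_t)piv * p + k] = sqrt(best);

        /* Fill column k for every row not yet chosen as a pivot. */
        for (int j = 0; j < n; j++) {
            if (done[j]) continue;
            double q = rbf(X + (size_t)j * d, X + (size_t)piv * d, d, gamma);
            double s = 0.0;
            for (int t = 0; t < k; t++)
                s += H[(size_t)j * p + t] * H[(size_t)piv * p + t];
            double h = (q - s) / H[(size_t)piv * p + k];
            H[(size_t)j * p + k] = h;
            diag[j] -= h * h;               /* update residual diagonal */
        }
    }
    free(diag);
    free(done);
    return k;
}
```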
Calculate the training vector squares (the sums of the squares of the elements for each training vector)
Convert the training vectors array into a GPU-compatible format
Allocate memory on the GPU for the training vectors array
Load the training vectors array to the GPU memory
FOR (each training vector) DO
    Load the training vector to the GPU (because the version on the GPU is in a translated format)
    Perform the matrix-vector multiplication, i.e. calculate the dot products, using CUBLAS
    Retrieve the dot-products vector from the GPU
    Calculate the line of the KM by adding the training vector squares, then calculating Φ(x, y) according to Equation 1
END DO
De-allocate memory from the GPU
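A minimal CUDA/CUBLAS sketch of the loop body above, assuming the training vectors sit on the GPU as a column-major d×n matrix and that Equation 1 is the RBF kernel exp(-γ‖x−y‖²); the function and variable names are mine, not the paper's. One cublasSgemv call produces all n dot products for the current training vector, and the kernel-matrix row is finished on the CPU from the precomputed vector squares.

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <math.h>

/* Compute one row of the kernel matrix: km_row[j] = exp(-gamma * ||x_i - x_j||^2).
 * d_X    : d x n training matrix, column-major, already on the GPU
 * d_xi   : the i-th training vector (length d), already on the GPU
 * d_dots : GPU buffer of length n for the dot products
 * sq     : host array of precomputed vector squares, sq[j] = x_j . x_j
 * km_row : host output buffer of length n                                   */
void kernel_matrix_row(cublasHandle_t handle, const float *d_X, const float *d_xi,
                       float *d_dots, const float *sq, int d, int n, int i,
                       float gamma, float *km_row)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* dots = X^T * x_i : all n dot products in a single CUBLAS call */
    cublasSgemv(handle, CUBLAS_OP_T, d, n, &alpha, d_X, d, d_xi, 1, &beta, d_dots, 1);

    /* Retrieve the dot-products vector from the GPU */
    cublasGetVector(n, sizeof(float), d_dots, 1, km_row, 1);

    /* Finish the row on the CPU: ||x_i - x_j||^2 = sq[i] + sq[j] - 2*dot */
    for (int j = 0; j < n; j++)
        km_row[j] = expf(-gamma * (sq[i] + sq[j] - 2.0f * km_row[j]));
}
```

Before the loop the caller would create the handle with cublasCreate, cudaMalloc d_X and d_dots, and upload the converted training array with cublasSetMatrix; afterwards cudaFree and cublasDestroy release the resources, mirroring the allocate/load/de-allocate steps of the pseudocode.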
        Memory      Computation
LIBSVM  O(n²)       O(#ITER ∗ l ∗ n)
PSVM    O(np/m)     O(np²/m)

#ITER empirically scales more than linearly with the number of samples
l is the number of operations needed for the kernel function
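As a rough illustration of what the table means for memory (the values n = 100,000, p = 1,000, m = 10 are my own, and single-precision storage is assumed): the full kernel matrix would have n² = 10^10 entries, about 40 GB, whereas the ICF factor split across the machines needs only np/m = 10^7 entries, about 40 MB per machine.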
In this analysis I used a single machine (4 hardware cores)
PSVM is highly dependent on the number of samples
LIBSVM should also depend heavily on the number of samples
The number of kernel matrix elements is equal to the square of the number of training vectors (it contains all possible inner products)
the following hardware: quad-core Intel Q6600, NVIDIA 8800 GTS, 3.5 GB DDR2 RAM, Windows XP 32-bit
This study performed parameter optimization through cross-validation of the data
An automated script was used to accomplish this, but the nature of the cross-validation was not published
The increased speedup will be noticeable here, as parameter optimization repeatedly recalculates the kernel matrix
Cross-validation also repeats the calculation of the kernel matrix
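For concreteness, a sketch of the kind of automated parameter search described above, written against LIBSVM's C API (svm_cross_validation); the grid values, the 5-fold choice, and the grid_search name are assumptions, since the study did not publish its cross-validation setup. Every call re-trains on each fold and therefore re-evaluates kernel values, which is why a faster kernel-matrix computation pays off many times over during parameter optimization.

```c
#include <stdlib.h>
#include "svm.h"   /* LIBSVM C API */

/* Hypothetical grid search over (C, gamma) with 5-fold cross-validation.
 * `prob` is a populated svm_problem; `param` has all other fields
 * (svm_type, kernel_type, eps, cache_size, ...) already set. */
double grid_search(const struct svm_problem *prob, struct svm_parameter *param,
                   double *best_C, double *best_gamma)
{
    const double Cs[]     = {0.5, 2.0, 8.0, 32.0, 128.0};          /* assumed grid */
    const double gammas[] = {1.0/512, 1.0/128, 1.0/32, 1.0/8, 0.5}; /* assumed grid */
    double *pred = malloc((size_t)prob->l * sizeof(double));
    double best_acc = -1.0;

    for (size_t i = 0; i < sizeof(Cs) / sizeof(Cs[0]); i++) {
        for (size_t j = 0; j < sizeof(gammas) / sizeof(gammas[0]); j++) {
            param->C = Cs[i];
            param->gamma = gammas[j];

            /* Each call re-trains on every fold, recomputing kernel values. */
            svm_cross_validation(prob, param, 5, pred);

            int correct = 0;
            for (int k = 0; k < prob->l; k++)
                if (pred[k] == prob->y[k])
                    correct++;
            double acc = (double)correct / prob->l;
            if (acc > best_acc) {
                best_acc = acc;
                *best_C = Cs[i];
                *best_gamma = gammas[j];
            }
        }
    }
    free(pred);
    return best_acc;
}
```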