
pSVM and LIBSVM CUDA Package Comparison

Masters progress report, Nov. 1, 2012

Matthew Barga
November 05, 2012

Transcript

  1. pSVM

     Optimizes the QP problem directly (O(n³), where n = number of samples).
     Two keys:
     1 Incomplete Cholesky Factorization - reduced row-rank approximate
       factorization of the kernel matrix
     2 Interior Point Method - parallelizable quadratic programming solver
     Reduces memory use from O(n²) to O(np/m), where p/m ≪ n (p is the
     reduced column dimension of the factorized matrix, m is the number of
     machines).
     Reduces computation time from O(n²) (decomposition method) to O(np²/m)
     (the bottleneck is the matrix inversion in the IPM step).
     A sketch of the ICF step follows below.
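     Below, a minimal NumPy sketch of the ICF idea (not pSVM's MPI
     implementation; the names icf and rbf, the gamma value, and the pivot
     tolerance are illustrative): the kernel matrix K is never materialized,
     only p of its columns are ever computed, yielding the n-by-p factor H
     with K ≈ HHᵀ.

       import numpy as np

       def rbf(x, y, gamma=0.5):
           # RBF kernel between two sample vectors (gamma is illustrative)
           d = x - y
           return np.exp(-gamma * np.dot(d, d))

       def icf(X, p, kernel=rbf):
           # Incomplete Cholesky Factorization: K ~= H @ H.T with H (n x p),
           # computing only one column of K per iteration.
           n = X.shape[0]
           H = np.zeros((n, p))
           d = np.array([kernel(x, x) for x in X])  # residual diagonal of K
           for j in range(p):
               i = int(np.argmax(d))                # greedy pivot choice
               if d[i] <= 1e-12:                    # residual exhausted early
                   return H[:, :j]
               col = np.array([kernel(x, X[i]) for x in X])  # column i of K
               H[:, j] = (col - H[:, :j] @ H[i, :j]) / np.sqrt(d[i])
               d -= H[:, j] ** 2                    # update residual diagonal
           return H

     With n = 1000 samples and p = 32, H holds 32,000 entries versus the
     1,000,000 of the full kernel matrix; row-partitioning H across the m
     machines gives the O(np/m) per-machine figure.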
  2. GPU-LIBSVM

     Pre-calculate on the CPU the sum of squares of the elements of each
     training vector
     Convert the training vectors array into a GPU-compatible format
     Allocate memory on the GPU for the training vectors array
     Load the training vectors array into GPU memory
     FOR (each training vector) DO
       Load the training vector to the GPU (because the version on the GPU
       is in a translated format)
       Perform the matrix-vector multiplication, i.e. calculate the dot
       products, using CUBLAS
       Retrieve the dot products vector from the GPU
       Calculate the row of the KM by adding the training vector squares,
       then calculating Φ(x, y) according to Equation 1
     END DO
     De-allocate the memory on the GPU
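     The loop body above is what the GPU accelerates: one matrix-vector
     product per training vector plus an elementwise transform. A CPU sketch
     in NumPy of the same structure (the RBF form K(x, y) = exp(-γ‖x − y‖²)
     is assumed for Equation 1, which the slide does not reproduce; the
     function name and gamma are illustrative):

       import numpy as np

       def kernel_matrix_by_rows(X, gamma=0.5):
           # Precompute per-vector sums of squares once (the CPU step above)
           sq = np.einsum('ij,ij->i', X, X)
           n = X.shape[0]
           K = np.empty((n, n))
           for i in range(n):          # FOR (each training vector) DO
               dots = X @ X[i]         # the CUBLAS matrix-vector product
               # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, then Equation 1
               K[i] = np.exp(-gamma * (sq + sq[i] - 2.0 * dots))
           return K                    # END DO

     Offloading only the dot products keeps the GPU code simple while moving
     the bulk of the O(n²d) work onto CUBLAS.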
  3. Complexity Overview

                 Memory     Computational
     LIBSVM      O(n)       O(ITER · l); O(ITER · l · n)
     PSVM        O(np/m)    O(np²/m)

     Empirically, #ITER scales more than linearly with the number of
     samples; l is the number of operations needed by the kernel function.
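     A back-of-the-envelope comparison of the memory column (n and m are
     illustrative; p ≈ √n is a common choice for the reduced rank):

       n, m = 60_000, 4                 # samples, machines (illustrative)
       p = int(n ** 0.5)                # reduced column dimension, ~245

       full_kernel = n * n              # O(n²) entries in the full matrix
       icf_factor = n * p // m          # O(np/m) entries per machine

       print(f"full kernel: {full_kernel * 8 / 1e9:.1f} GB")  # 28.8 GB
       print(f"ICF factor : {icf_factor * 8 / 1e9:.3f} GB")   # ~0.029 GB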
  4. Runtime Analysis

                   gisette_scale  cod_rna  a9a     news20   ijcnn
     Features      5000           8        123     1355191  22
     Samples       6000           59535    32561   19996    49990
     GPU-LIBSVM    1:21m          3:16m    1:22m   N/A      35s
     LIBSVM        1:52m          3:53m    0:52m   14:21m   36s
     PSVM          2:18m          > 12h    > 5m    > 12h    > 1h
  5. Empirical Discussion

     pSVM is implemented in MPI for computing clusters; in this analysis I
     used a single machine (4 hardware cores).
     pSVM is highly dependent on the number of samples.
     LIBSVM should likewise depend strongly on the number of samples: the
     number of kernel matrix elements equals the square of the number of
     training vectors (it contains all possible inner products).
  6. GPU-LIBSVM

     The GPU-LIBSVM group published runtime tests on their website using
     the following hardware:
       quad-core Intel Q6600
       NVIDIA 8800 GTS
       3.5 GB DDR2 RAM
       Windows XP 32-bit
     Their study performed parameter optimization through cross-validation
     of the data, using an automated script, but did not publish the nature
     of the cross-validation.
     The increased speedup will be noticeable here, as parameter
     optimization repeatedly recalculates the kernel matrix; cross-validation
     also repeats this calculation (see the count sketched below).
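     To see why this compounds, count the kernel-matrix rebuilds in a grid
     search with k-fold cross-validation (the grid values below follow the
     spirit of LIBSVM's grid.py defaults; the sample count is taken from
     gisette_scale above):

       Cs     = [2 ** k for k in range(-5, 16, 2)]   # 11 candidate C values
       gammas = [2 ** k for k in range(-15, 4, 2)]   # 10 candidate γ values
       folds  = 5
       n      = 6000                                 # gisette_scale samples

       rebuilds = len(Cs) * len(gammas) * folds      # 550 SVM trainings
       train_n  = n * (folds - 1) // folds           # samples per training
       evals    = rebuilds * train_n ** 2            # kernel evaluations
       print(f"{rebuilds} kernel-matrix constructions, "
             f"~{evals:.2e} kernel evaluations")     # ~1.27e+10

     Every one of those rebuilds benefits from the CUBLAS offload, so the
     per-run speedup multiplies across the whole parameter search.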
  7. GPU-LIBSVM Input Data

     TRECVID 2007 video features, dimension of 6000
     20 feature models (sets), from 36 to 3772 training samples
     source: http://mklab.iti.gr/project/GPU-LIBSVM
  8. Upcoming

     Determine the current state of large-scale applications and their needs
     Multikernel SVM
     Multiclass SVM (multisvm CUDA)