
K Nearest Neighbourhood on GPU

Ciel
July 24, 2014

K Nearest Neighbourhood using an inverted list on the GPU

Transcript

  1. K Nearest Neighbourhood: a fundamental operator in data mining

    Used for classification, regression, and collaborative filtering ("You may like": Apple, Google, Amazon).
    [Figure: classification and regression example plots]
  2. K Nearest Neighbourhood: a running example

    Query: SELECT SEX = M, AGE = 18, SALARY = 2900

    Sex   Age   Salary   …
    M     20    3000     …
    F     17    3600     …
    M     18    4000     …
    F     19    2900     …
    …
  3. K Nearest Neighbourhood: a running example (continued)

    Same query (SELECT SEX = M, AGE = 18, SALARY = 2900) over the same Sex / Age / Salary table as slide 2.
  4. Inverted list: DIM + VALUE → row_id

    DIM + VALUE   Row IDs
    SEX+M         0, 2
    SEX+F         1, 3
    AGE+18        2
    AGE+19        3
    …

    Query: SELECT SEX = M, AGE = 18, SALARY = 2900
    How do we store the inverted list table on the GPU?
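
As a minimal CPU-only sketch, the inverted list for the running example can be built by emitting one "DIM+VALUE" key per cell and appending the row id to that key's list. Row ids are assumed to be 0-based positions in the table; the names ROWS and build_inverted_list are hypothetical.

```python
from collections import defaultdict

# Running example from slide 2 (row ids assumed to be 0-based positions).
ROWS = [
    {"SEX": "M", "AGE": 20, "SALARY": 3000},
    {"SEX": "F", "AGE": 17, "SALARY": 3600},
    {"SEX": "M", "AGE": 18, "SALARY": 4000},
    {"SEX": "F", "AGE": 19, "SALARY": 2900},
]


def build_inverted_list(rows):
    """Map each "DIM+VALUE" key to the ascending list of row ids holding it."""
    inverted = defaultdict(list)
    for row_id, row in enumerate(rows):
        for dim, value in row.items():
            inverted[f"{dim}+{value}"].append(row_id)
    return dict(inverted)


if __name__ == "__main__":
    inv = build_inverted_list(ROWS)
    for key in ("SEX+M", "SEX+F", "AGE+18", "AGE+19"):
        print(key, inv[key])
```

Because rows are visited in order, every posting list comes out sorted, which keeps later lookups and merges simple.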
  5. Step 1: Matching & Aggregation (SELECT AGE 18±1)

    Inverted list:
    DIM + VALUE   Row IDs
    …             …
    AGE+17        1
    AGE+18        2, 3
    AGE+19        4
    AGE+20        9, 10
    AGE+21        11
    …             …

    Row ID   Count   AGG
    …        …       …
    1        0       0
    2        0       0
    3        0       0
    4        0       0
    …        …       …
  6. Step 1: Matching & Aggregation (SELECT AGE 18±1)

    Inverted list as in slide 5. After scanning AGE+18 (rows 2, 3):

    Row ID   Count   AGG
    …        …       …
    1        0       0
    2        1       1*0.5
    3        1       1*0.5
    4        0       0
    …        …       …
  7. Step 1: Matching & Aggregation (SELECT AGE 18±1)

    Inverted list as in slide 5. After also scanning AGE+17 (row 1) and AGE+19 (row 4):

    Row ID   Count   AGG
    …        …       …
    1        1       1*0.5
    2        1       1*0.5
    3        1       1*0.5
    4        1       1*0.5
    …        …       …
  8. Step 1: Matching & Aggregation (SELECT SALARY 2900±1000)

    Inverted list:
    DIM + VALUE   Row IDs
    …             …
    SALARY+2500   NULL
    SALARY+3000   0, 3
    SALARY+3500   1
    SALARY+4000   2
    SALARY+4500   4, 5
    …             …

    Row ID   Count   AGG   (carried over from the AGE step)
    …        …       …
    1        1       0.5
    2        1       0.5
    3        1       0.5
    4        1       0.5
    …        …       …
  9. Step 1: Matching & Aggregation (SELECT SALARY 2900±1000)

    Inverted list as in slide 8. After scanning SALARY+3000:

    Row ID   Count   AGG
    …        …       …
    1        1       0.5
    2        1       0.5
    3        2       1*0.3+0.5
    4        1       0.5
    …        …       …
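
The matching and aggregation step can be sketched on the CPU as follows. Every posting list whose value falls inside the query range is scanned, and each row id in it gets Count += 1 and AGG += weight. The weights 0.5 (AGE) and 0.3 (SALARY) are read off the slides; how the deck derives them is not stated. Keys are stored as (dim, value) tuples so ranges can be compared numerically, and the name match_and_aggregate is hypothetical.

```python
from collections import defaultdict

# Posting lists copied from the Step 1 slides (row ids as shown there).
INVERTED = {
    ("AGE", 17): [1],
    ("AGE", 18): [2, 3],
    ("AGE", 19): [4],
    ("AGE", 20): [9, 10],
    ("AGE", 21): [11],
    ("SALARY", 2500): [],  # NULL on the slide: no rows
    ("SALARY", 3000): [0, 3],
    ("SALARY", 3500): [1],
    ("SALARY", 4000): [2],
    ("SALARY", 4500): [4, 5],
}


def match_and_aggregate(inverted, dim, lo, hi, weight, count=None, agg=None):
    """Scan every posting list of `dim` whose value lies in [lo, hi].

    Each matched row gets Count += 1 and AGG += weight, mirroring the
    Count / AGG columns on the slides. Pass the returned defaultdicts back
    in to accumulate over several query dimensions.
    """
    count = defaultdict(int) if count is None else count
    agg = defaultdict(float) if agg is None else agg
    for (key_dim, value), row_ids in inverted.items():
        if key_dim != dim or not lo <= value <= hi:
            continue
        for row_id in row_ids:  # on a GPU this loop would be spread over threads
            count[row_id] += 1
            agg[row_id] += weight
    return count, agg
```

For example, `match_and_aggregate(INVERTED, "AGE", 17, 19, 0.5)` yields Count 1 and AGG 0.5 for rows 1 to 4, as on slide 7; feeding those results back in with `("SALARY", 3000, 3000, 0.3)` raises row 3 to Count 2 and AGG 0.8.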
  10. Step 2: K Selection

    Row ID   Count   AGG
    …        …       …
    1        1       0.5
    2        1       0.5
    3        2       0.8
    4        1       0.5
    …        …       …

    K Selection: what is a fast K Selection algorithm?
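
For reference, a simple CPU baseline for Step 2 is heap-based selection with Python's `heapq`, which costs roughly O(n log k). This is only a point of comparison, not the GPU method the deck proposes. It assumes a higher AGG means a nearer neighbour, and it breaks ties by smaller row id (an arbitrary choice); the name top_k is hypothetical.

```python
import heapq


def top_k(agg, k):
    """Return the k (row_id, score) pairs with the largest AGG, best first.

    Ordering key is (-score, row_id): highest score first, then smaller row id.
    """
    return heapq.nsmallest(k, agg.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    table = {1: 0.5, 2: 0.5, 3: 0.8, 4: 0.5}  # Row ID -> AGG from slide 10
    print(top_k(table, 2))
```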
  11. First approach to storing the inverted list table on the GPU

    [Diagram: invert_list_idx (keys D+V1, D+V2, D+V3), end_index, and invert_list_table (a row of R_id entries), all on the GPU]
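
The three arrays on slide 11 suggest a CSR-style layout: all posting lists packed back to back in one flat table, plus a per-key end offset. The sketch below assumes that `end_index[i]` is the exclusive end of key i's slice, so key i spans `[end_index[i-1], end_index[i])`; the deck does not spell this out, and the helper names are hypothetical.

```python
def flatten(inverted):
    """Pack a {key: [row_id, ...]} dict into three flat arrays.

    invert_list_idx[i]  -> the i-th key (D+V), keys in sorted order
    end_index[i]        -> one past the last position of key i in the table
    invert_list_table   -> all row ids, posting lists stored back to back
    """
    invert_list_idx, end_index, invert_list_table = [], [], []
    for key in sorted(inverted):
        invert_list_idx.append(key)
        invert_list_table.extend(inverted[key])
        end_index.append(len(invert_list_table))
    return invert_list_idx, end_index, invert_list_table


def posting_list(i, end_index, invert_list_table):
    """Row ids of key i: the slice [end_index[i-1], end_index[i])."""
    start = end_index[i - 1] if i > 0 else 0
    return invert_list_table[start:end_index[i]]
```

Flat arrays like these can be copied to device memory in one transfer, and a thread that knows a key's position needs only two offset reads to find its rows.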
  12. Mapping

    CPU MEMORY: MAP(KEY, INDEX)
    GPU: device_vector
  13. Mapping

    CPU MEMORY: raw_pointer
    Interface: get(key), map(key, value), freeze(), ratio()
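
Slides 12 and 13 suggest the key → index map stays in CPU memory while the row ids live in a GPU device_vector. The slides do not define the methods, so the class below is a hypothetical sketch that assumes map(key, value) registers one row id under a key, freeze() packs everything into one contiguous array (a plain list standing in for the device_vector), raw_pointer() hands that array out, and get(key) returns the key's slice. ratio() is left out because its meaning is not given.

```python
class InvertedListMapping:
    """Hypothetical CPU-side key -> (start, end) map over one flat row-id array."""

    def __init__(self):
        self._pending = {}   # key -> row ids, collected before freeze()
        self._index = {}     # key -> (start, end) offsets, built by freeze()
        self._data = []      # stands in for the device_vector contents
        self._frozen = False

    def map(self, key, value):
        """Register row id `value` under `key` (only allowed before freeze())."""
        if self._frozen:
            raise RuntimeError("mapping is frozen")
        self._pending.setdefault(key, []).append(value)

    def freeze(self):
        """Pack all posting lists back to back and record each key's offsets."""
        for key, row_ids in self._pending.items():
            start = len(self._data)
            self._data.extend(row_ids)
            self._index[key] = (start, len(self._data))
        self._pending.clear()
        self._frozen = True

    def raw_pointer(self):
        """The flat array a kernel would read (here just the Python list)."""
        return self._data

    def get(self, key):
        """Row ids for `key`, or an empty list if the key is unknown."""
        start, end = self._index.get(key, (0, 0))
        return self._data[start:end]
```

Keeping the hash map on the host means the GPU only ever sees dense arrays, at the cost of a CPU lookup per query key.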
  14. Bucket Top K Selection Algorithm

    Bucket sizes: 2, 4, 1, 5, 2, 1
    K = 10: the first three buckets give the first 7 results (2 + 4 + 1)
    Bucket_Num = (Value - MIN) / (MAX - MIN) * Number_Of_Buckets
  15. #define NAME "YIWEI GONG"
    #define UNIVERSITY "NTU"
    #define EMAIL "[email protected]"
    #define BLOG "http://ciel.im"
    #define ME "A stupid programmer"

    THANK YOU
  16. [Diagram: a GPU grid of Block 1 to Block 6, with one block expanded into Thread 1 to Thread 6]
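
In a 1-D CUDA launch, each thread's global index is `blockIdx.x * blockDim.x + threadIdx.x`. The sketch below reproduces that arithmetic for the slide's 6 blocks × 6 threads in plain Python. How the deck assigns work to threads is not shown; the grid-stride mapping in grid_stride_items (thread t handles items t, t + total_threads, …) is an assumption, and both function names are hypothetical.

```python
def global_thread_ids(num_blocks, threads_per_block):
    """Global id of every (block, thread) pair in a 1-D grid, mirroring
    CUDA's blockIdx.x * blockDim.x + threadIdx.x."""
    return [
        block * threads_per_block + thread
        for block in range(num_blocks)
        for thread in range(threads_per_block)
    ]


def grid_stride_items(tid, total_threads, num_items):
    """Items one thread would visit under a grid-stride loop (assumed mapping)."""
    return list(range(tid, num_items, total_threads))


if __name__ == "__main__":
    ids = global_thread_ids(6, 6)  # 6 blocks x 6 threads, as on the slide
    print(len(ids), ids[:8])
    print(grid_stride_items(5, len(ids), 100))
```

A grid-stride loop lets a fixed-size grid cover posting lists of any length, with neighbouring threads reading neighbouring row ids.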