Anonymize Large-scale Sparse User Features at LINE Corp

2019/3/7 Machine Learning Production Pitch #1
Yeo Chaerim

LINE Developers

March 07, 2019

Transcript

  1. ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
     CHAERIM YEO, LINE CORPORATION
     MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07
  2. ABOUT ME
     - Chaerim Yeo (呂 彩林)
     - 2018.12 ~ LINE Corporation
     - Account Platform Development Dept.
     - Ad performance optimization
  3. DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE)
     [Chart: categories type1 to type9; log-scale axis from 0.0001 to 100]
  4. DATA DENSITY (LOG-SCALE)
     [Chart: categories type1 to type9, z-features, y-features; log-scale axis from 0.0000001 to 1]
  5. DATA SIZE RELATIVE TO Z-FEATURES
     [Chart: categories type1 to type9; axis from 0 to 50]
  6. USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES)
     [Chart: groups gender, age-group, region; measures precision, recall, f1-score; axis from 0.95 to 1.02]
  7. USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES)
     [Chart: groups gender, age-group, region; measures training, prediction; axis from 0 to 0.5]
  8. CONCLUSION
     - Anonymize user features based on SCDV
       - Enough to use in ML
     - Future works
       - Add workflow to production
       - Apply further dimensionality reduction
         - Auto encoders, PCA, …
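
The conclusion lists PCA as one candidate for further dimensionality reduction. The sketch below is only an illustration of that idea, not the workflow used in the talk: it projects dense feature rows onto their top-k principal components using power iteration with deflation, in plain Python. The function name `pca_project` and its parameters are hypothetical, and it assumes the features have already been converted to small dense vectors. A production pipeline would more likely rely on an optimized library implementation.

```python
import math
import random


def pca_project(rows, k, iters=200, seed=0):
    """Project rows (equal-length lists of floats) onto their top-k principal components.

    Hypothetical helper for illustration; uses power iteration with deflation.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    rng = random.Random(seed)  # seeded so the result is reproducible
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged direction.
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
        components.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]

    # Coordinates of each centered row along the found components.
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Usage: five 3-dimensional rows that all lie on one line reduce to 1 dimension.
rows = [[t, 2.0 * t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
reduced = pca_project(rows, k=1)
```

The sign of each component is arbitrary, so downstream code should not depend on it. Power iteration is chosen here only to keep the example dependency-free.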