Slide 1

Slide 1 text

ANONYMIZE LARGE-SCALE SPARSE USER FEATURES AT LINE CORP
CHAERIM YEO, LINE CORPORATION
MACHINE LEARNING PRODUCTION PITCH #1, 2019/03/07

Slide 2

Slide 2 text

ABOUT ME
• Chaerim Yeo (呂 彩林)
• LINE Corporation (2018.12 ~ present)
• Account Platform Development Dept.
• Ad performance optimization

Slide 3

Slide 3 text

Agenda
• Z-Features
• Y-Features
• Evaluation
• Conclusion

Slide 4

Slide 4 text

Z-FEATURES

Slide 5

Slide 5 text

WHAT ARE Z-FEATURES

Slide 6

Slide 6 text

WHAT ARE Z-FEATURES

Slide 7

Slide 7 text

WHAT ARE Z-FEATURES

Slide 8

Slide 8 text

WHAT ARE Z-FEATURES

Slide 9

Slide 9 text

WHAT ARE Z-FEATURES

Slide 10

Slide 10 text

BENEFITS OF Z-FEATURES
• Reusable
• Flexible

Slide 11

Slide 11 text

LIMITATIONS OF Z-FEATURES
• Human interpretable
• Extremely sparse

Slide 12

Slide 12 text

Y-FEATURES

Slide 13

Slide 13 text

BEYOND Z-FEATURES
• Obfuscation
• Dimensionality reduction

Slide 14

Slide 14 text

BEYOND Z-FEATURES
• Obfuscation
• Dimensionality reduction
... while keeping as much information as possible

Slide 15

Slide 15 text

BEYOND Z-FEATURES
• Obfuscation
• Dimensionality reduction
• SCDV (Sparse Composite Document Vectors): https://arxiv.org/abs/1612.06778

Slide 16

Slide 16 text

OVERVIEW OF SCDV
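SCDV builds a document vector in a few steps: soft-cluster the word embeddings with a Gaussian mixture model; form each word's word-topic vector by concatenating the embedding scaled by its probability under every cluster, then multiplying by the word's idf; sum the word-topic vectors over the words in a document; and finally zero out entries whose magnitude falls below t = p/100 * (|a_min| + |a_max|) / 2, where a_min and a_max are the per-document minimum and maximum averaged over all documents. The minimal NumPy sketch below follows our reading of the linked paper. The embeddings and the cluster probabilities p(c_k|w) are taken as given inputs (for example from a fitted Gaussian mixture), and the function names are hypothetical.

```python
import numpy as np


def word_topic_vectors(word_vecs, cluster_probs, idf):
    """Build SCDV word-topic vectors (function name is hypothetical).

    word_vecs:     (V, d) embedding per word
    cluster_probs: (V, K) soft assignments p(c_k | w_i), assumed precomputed
    idf:           (V,)   inverse document frequency per word
    Returns a (V, K * d) array: idf_i * concat_k(p(c_k | w_i) * w_i).
    """
    V, d = word_vecs.shape
    K = cluster_probs.shape[1]
    # (V, K, d): slot k holds the word vector scaled by its probability under cluster k
    wcv = cluster_probs[:, :, None] * word_vecs[:, None, :]
    # Concatenate the K slots per word, then weight the whole vector by idf
    return idf[:, None] * wcv.reshape(V, K * d)


def document_vectors(docs, wtv):
    """Sum the word-topic vectors of the word indices in each document."""
    return np.stack([wtv[doc].sum(axis=0) for doc in docs])


def sparsify(doc_vecs, p=4.0):
    """Zero entries with |value| < t, where t = p/100 * (|a_min| + |a_max|) / 2."""
    a_min = doc_vecs.min(axis=1).mean()  # average of per-document minima
    a_max = doc_vecs.max(axis=1).mean()  # average of per-document maxima
    t = p / 100.0 * (abs(a_min) + abs(a_max)) / 2.0
    out = doc_vecs.copy()
    out[np.abs(out) < t] = 0.0
    return out
```

The output dimension is K * d, so SCDV grows the representation before sparsifying it; the threshold parameter p controls how aggressively small entries are dropped.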

Slide 17

Slide 17 text

INTEGRATE Z-FEATURES WITH SCDV
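How the z-features are fed into SCDV is shown only in the slide image. One natural mapping, assumed here purely for illustration, treats each user as a document and each active z-feature ID as a word. Under that mapping the SCDV vocabulary, corpus, and idf weights come directly from the per-user feature lists. The function name and the example feature IDs are hypothetical, idf is assumed to be log(N / df), and training the feature embeddings (for instance with a word2vec-style model over these lists) is left out.

```python
import math
from typing import Dict, List, Tuple


def build_corpus(
    user_features: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, int], List[List[int]], List[float]]:
    """Turn user -> feature-ID lists into an SCDV-style corpus (hypothetical helper)."""
    # Vocabulary: one "word" per distinct feature ID, sorted for determinism
    vocab = sorted({f for feats in user_features.values() for f in feats})
    index = {f: i for i, f in enumerate(vocab)}
    users = sorted(user_features)
    # Each user becomes a "document" made of feature-token indices
    docs = [[index[f] for f in user_features[u]] for u in users]
    # Document frequency: number of users that have each feature
    df = [0] * len(vocab)
    for doc in docs:
        for i in set(doc):
            df[i] += 1
    n = len(docs)
    idf = [math.log(n / d) for d in df]
    return users, index, docs, idf
```

A feature shared by every user gets idf 0, so it contributes nothing to the user vector, which is the same down-weighting of ubiquitous words that SCDV applies to text.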

Slide 18

Slide 18 text

SYSTEM OVERVIEW

Slide 19

Slide 19 text

EVALUATION

Slide 20

Slide 20 text

DATA DIMENSION RELATIVE TO Z-FEATURES (LOG-SCALE)
[Chart: relative dimension per feature type, type1 to type9]

Slide 21

Slide 21 text

DATA DENSITY (LOG-SCALE)
[Chart: density of z-features and y-features per feature type, type1 to type9]
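The slide does not define density; it is presumably the usual fraction of non-zero entries, nnz / (rows * columns), and that is the assumption in this small pure-Python sketch. The dict-per-row layout and the function name are hypothetical.

```python
from typing import Dict, List


def density(rows: List[Dict[int, float]], num_cols: int) -> float:
    """Fraction of non-zero cells in a matrix stored as one {column: value} dict per row."""
    nnz = sum(1 for row in rows for v in row.values() if v != 0.0)
    return nnz / (len(rows) * num_cols)
```

On a log scale, a very sparse feature set and a much denser one can differ by several orders of magnitude, which is why the chart needs the logarithmic axis.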

Slide 22

Slide 22 text

DATA SIZE RELATIVE TO Z-FEATURES
[Chart: relative data size per feature type, type1 to type9]
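Lower dimension does not automatically mean smaller storage: a sparse format stores only the non-zero entries, whereas a dense matrix stores every cell. The sketch below estimates bytes for a CSR layout (one value and one column index per non-zero, plus num_rows + 1 row pointers) against a dense layout, assuming 4-byte values and indices. The slide does not say which storage formats were actually measured, and the helper names are hypothetical.

```python
def csr_bytes(num_rows: int, nnz: int, value_bytes: int = 4, index_bytes: int = 4) -> int:
    """Approximate CSR size: values + column indices per non-zero, plus row pointers."""
    return nnz * (value_bytes + index_bytes) + (num_rows + 1) * index_bytes


def dense_bytes(num_rows: int, num_cols: int, value_bytes: int = 4) -> int:
    """Dense size: every cell is stored, zero or not."""
    return num_rows * num_cols * value_bytes
```

With made-up figures, 1,000 users with 50 non-zeros each need about 404,004 bytes in CSR regardless of how many columns exist, while a dense 1,000 x 3,000 matrix needs 12,000,000 bytes; the column count only matters for the dense layout.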

Slide 23

Slide 23 text

USER DEMOGRAPHICS ESTIMATION METRICS (RELATIVE TO Z-FEATURES)
[Chart: precision, recall, and f1-score for gender, age-group, and region]
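Each bar is presumably the y-feature model's metric divided by the z-feature model's metric on the same task, so 1.0 means parity. The slide does not state the averaging scheme; this pure-Python sketch assumes macro averaging over labels, and the helper names and example labels are hypothetical.

```python
from typing import Dict, Sequence


def macro_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    p_sum = r_sum = f_sum = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum, r_sum, f_sum = p_sum + prec, r_sum + rec, f_sum + f1
    n = len(labels)
    return {"precision": p_sum / n, "recall": r_sum / n, "f1-score": f_sum / n}


def relative(y_metrics: Dict[str, float], z_metrics: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each y-feature metric to the corresponding z-feature metric."""
    return {k: y_metrics[k] / z_metrics[k] for k in z_metrics}
```

For example, comparing a y-feature model's gender predictions against a perfect z-feature baseline yields ratios below 1.0 for every metric.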

Slide 24

Slide 24 text

USER DEMOGRAPHICS ESTIMATION RUNNING TIME (RELATIVE TO Z-FEATURES)
[Chart: training and prediction time for gender, age-group, and region]

Slide 25

Slide 25 text

CONCLUSION

Slide 26

Slide 26 text

CONCLUSION
• Anonymize user features based on SCDV
• Anonymized features are good enough to use in ML
• Future work
  • Add the workflow to production
  • Apply further dimensionality reduction
    • Autoencoders, PCA, …
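PCA appears here as future work, not as something the talk implemented. As a sketch of what that step could look like on a dense y-feature matrix, assuming NumPy and a hypothetical helper name, PCA can be computed from the SVD of the column-centered data, keeping the top k right singular vectors.

```python
import numpy as np


def pca_reduce(x: np.ndarray, k: int):
    """Project rows of x onto the top-k principal components (hypothetical helper).

    Returns (projected, components, mean, explained_variance_ratio).
    """
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the principal directions, ordered by singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    # Share of total variance captured by the kept components
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return centered @ components.T, components, mean, explained
```

The explained-variance ratio gives a simple way to choose k: keep the smallest k that retains enough variance, in line with the goal of keeping as much information as possible.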

Slide 27

Slide 27 text

THANK YOU