Slide 1

Slide 1 text

2019 DevDay
Feature as a Service at Data Labs
> Chaerim Yeo
> Senior Software Engineer, LINE Machine Learning Team

Slide 2

Slide 2 text

LINE PLATFORM

Slide 3

Slide 3 text

DATA LABS
(Diagram: Data Labs linked to the Sticker, Ad, Manga, Music, Live, and News services)
> Independent from service/development departments
> Aggregates data across various services
> Provides analysis and solutions based on data across various services

Slide 4

Slide 4 text

Feature as a Service

Slide 5

Slide 5 text

WHAT IS IT?

Slide 6

Slide 6 text

WHAT IS IT?

Slide 7

Slide 7 text

WHAT IS IT?

Slide 8

Slide 8 text

WHAT IS IT?
> Standardization
> Democratization

Slide 9

Slide 9 text

AVAILABLE FEATURES
> Z-Features: User Features
> Y-Features: Obfuscated User Features
> C-Features: Content Features

Slide 10

Slide 10 text

Background

Slide 11

Slide 11 text

SYSTEM OVERVIEW

Slide 12

Slide 12 text

SYSTEM OVERVIEW

Slide 13

Slide 13 text

SYSTEM OVERVIEW

Slide 14

Slide 14 text

SYSTEM OVERVIEW

Slide 15

Slide 15 text

SYSTEM OVERVIEW

Slide 16

Slide 16 text

SYSTEM OVERVIEW

Slide 17

Slide 17 text

SYSTEM OVERVIEW

Slide 18

Slide 18 text

SYSTEM OVERVIEW

Slide 19

Slide 19 text

SYSTEM OVERVIEW

Slide 20

Slide 20 text

SYSTEM OVERVIEW

Slide 21

Slide 21 text

SYSTEM OVERVIEW

Slide 22

Slide 22 text

SYSTEM OVERVIEW

Slide 23

Slide 23 text

SYSTEM OVERVIEW

Slide 24

Slide 24 text

SYSTEM OVERVIEW

Slide 25

Slide 25 text

NATURE OF CENTRALIZED FEATURES
> Versatile
> Flexible
> Reusable
> Extensible

Slide 26

Slide 26 text

NATURE OF CENTRALIZED FEATURES
> Versatile
> Flexible
> Reusable
> Extensible

Slide 27

Slide 27 text

NATURE OF CENTRALIZED FEATURES
> Versatile
> Flexible
> Reusable
> Extensible

Slide 28

Slide 28 text

NATURE OF CENTRALIZED FEATURES
> Versatile
> Flexible
> Reusable
> Extensible

Slide 29

Slide 29 text

NATURE OF CENTRALIZED FEATURES
> Versatile
> Flexible
> Reusable
> Extensible

Slide 30

Slide 30 text

Available Features

Slide 31

Slide 31 text

AVAILABLE FEATURES
> Z-Features: User Features
> Y-Features: Obfuscated User Features
> C-Features: Content Features

Slide 32

Slide 32 text

Z-FEATURES BACKGROUND
> User Demographics Estimation: gender, age group, region
> Lookalike Engine

Slide 33

Slide 33 text

Z-FEATURES BACKGROUND
> Input: sparse vector from users' behavioral logs
> Output: class probabilities
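The slide frames demographic estimation as a mapping from a sparse behavioral vector to class probabilities. A minimal sketch, assuming a multinomial logistic-regression model (the slide does not name the model) and a hypothetical `{feature_index: value}` dict encoding for the sparse input; `predict_proba` and the toy weights are illustrative only:

```python
import math

def predict_proba(sparse_x, weights, biases):
    """Multinomial logistic regression on a sparse input (hypothetical model).

    sparse_x: dict feature index -> value (non-zero entries only).
    weights:  dict class label -> dict feature index -> weight.
    biases:   dict class label -> bias term.
    Returns dict class label -> probability (sums to 1).
    """
    # Linear score per class; only the non-zero input entries are visited.
    scores = {}
    for label, w in weights.items():
        s = biases.get(label, 0.0)
        for idx, val in sparse_x.items():
            s += w.get(idx, 0.0) * val
        scores[label] = s
    # Numerically stable softmax over the class scores.
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical example: two classes, three active features.
x = {3: 1.0, 17: 2.0, 42: 1.0}
W = {"female": {3: 0.8, 17: -0.2}, "male": {17: 0.5, 42: 0.3}}
b = {"female": 0.0, "male": 0.0}
probs = predict_proba(x, W, b)
```

Keeping the input as a dict means the cost of scoring grows with the number of active features rather than with the full (very large) dimensionality.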

Slide 34

Slide 34 text

Z-FEATURES OVERVIEW
> Collection of users' behavioral logs across various LINE services

Slide 35

Slide 35 text

Z-FEATURES OVERVIEW
> Collection of users' behavioral logs across various LINE services
> Transformed into structures that cover about 80% of all ML use cases

Slide 36

Slide 36 text

Z-FEATURES OVERVIEW
> Collection of users' behavioral logs across various LINE services
(Figure: logs transformed into structured records {...}, {...}, ...)
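The transformation step can be pictured as folding raw log events into one sparse record per user. A minimal sketch; the `(user_id, service, action, item)` event schema, the `service:action:item` key format, and `build_user_records` are hypothetical and not the actual z-features schema:

```python
from collections import Counter, defaultdict

def build_user_records(events):
    """Aggregate behavioral log events into per-user sparse count vectors.

    events: iterable of (user_id, service, action, item) tuples (hypothetical schema).
    Returns {user_id: Counter({"service:action:item": count, ...})}.
    """
    records = defaultdict(Counter)
    for user_id, service, action, item in events:
        # Each distinct (service, action, item) combination is one sparse dimension.
        records[user_id][f"{service}:{action}:{item}"] += 1
    return dict(records)

logs = [
    ("u1", "news", "view", "article_9"),
    ("u1", "news", "view", "article_9"),
    ("u1", "sticker", "buy", "pack_3"),
    ("u2", "music", "play", "track_5"),
]
users = build_user_records(logs)
```

Because every service writes into the same keyed structure, downstream models can consume one record format regardless of which service produced the events.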

Slide 37

Slide 37 text

Z-FEATURES STATISTICS
> Dimensions: 50M+
> Users: 890M+
> Types: 30+
> Services: 10+

Slide 38

Slide 38 text

Z-FEATURES
COMPONENTS USING Z-FEATURES

Slide 39

Slide 39 text

AVAILABLE FEATURES
> Z-Features: User Features
> Y-Features: Obfuscated User Features
> C-Features: Content Features

Slide 40

Slide 40 text

Y-FEATURES BACKGROUND
> Human-interpretable
> Extremely sparse

Slide 41

Slide 41 text

Y-FEATURES OVERVIEW
> Obfuscated user features
> Mitigate z-features' problems
  • Accumulate content embeddings based on users' behavioral logs
  • Reduce dimensionality

Slide 42

Slide 42 text

Y-FEATURES OVERVIEW
> Obfuscated user features
> Mitigate z-features' problems
  • Accumulate content embeddings based on users' behavioral logs
  • Reduce dimensionality

Slide 43

Slide 43 text

Y-FEATURES OVERVIEW
> Obfuscated user features
> Mitigate z-features' problems
  • Accumulate content embeddings based on users' behavioral logs
  • Reduce dimensionality
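One plausible reading of "accumulate content embeddings based on users' behavioral logs" is a weighted average of the embeddings of the contents a user interacted with. A minimal sketch under that assumption; `accumulate_user_embedding`, the interaction weights, and the averaging choice are illustrative, not the team's confirmed method:

```python
def accumulate_user_embedding(interactions, content_emb):
    """Weighted average of content embeddings over one user's interactions.

    interactions: list of (content_id, weight) pairs, e.g. view counts (assumed).
    content_emb:  dict content_id -> list[float], all of the same length.
    Contents without a known embedding are skipped.
    Returns the averaged vector, or None if nothing matched.
    """
    total_w = 0.0
    acc = None
    for cid, w in interactions:
        vec = content_emb.get(cid)
        if vec is None:
            continue
        if acc is None:
            acc = [0.0] * len(vec)
        for i, v in enumerate(vec):
            acc[i] += w * v
        total_w += w
    if acc is None or total_w == 0:
        return None
    return [a / total_w for a in acc]

emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
user_vec = accumulate_user_embedding([("a", 3), ("b", 1), ("x", 5)], emb)
```

The user vector no longer names individual items or actions, which is one way such a representation can be less directly human-interpretable than raw behavioral counts.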

Slide 44

Slide 44 text

Y-FEATURES OVERVIEW
> Obfuscated user features
> Mitigate z-features' problems
  • Accumulate content embeddings based on users' behavioral logs
  • Reduce dimensionality: matrix sketching + PCA
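The slide names "matrix sketching + PCA" without saying which sketch is used. Frequent Directions (Liberty, 2013) is one well-known deterministic matrix sketching algorithm, so the sketch below assumes it: rows are streamed into a small ell x d matrix `B` whose `B.T @ B` approximates `A.T @ A`, and principal axes are then read off `B`'s SVD. This is an illustration under that assumption (it uses NumPy and a two-pass centering step), not the team's implementation:

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent Directions sketch (Liberty, 2013).

    Streams the rows of A (n x d) and returns an ell x d matrix B with
    B.T @ B <= A.T @ A (PSD order) approximating it. Assumes ell <= d.
    """
    d = A.shape[1]
    B = np.zeros((ell, d))
    next_zero = 0  # index of the first all-zero row of B
    for row in A:
        if next_zero == ell:
            # Sketch is full: shrink every direction by the smallest
            # squared singular value, which zeroes at least the last row.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s_shrunk = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
            B = s_shrunk[:, None] * Vt
            # Singular values are sorted descending, so zero rows are at the end.
            next_zero = int(np.count_nonzero(s_shrunk))
        B[next_zero] = row
        next_zero += 1
    return B


def sketch_pca(A, ell, k):
    """Approximate top-k PCA: center A, sketch it, use B's top right singular vectors."""
    centered = A - A.mean(axis=0)
    B = frequent_directions(centered, ell)
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    components = Vt[:k]                           # k x d approximate principal axes
    return centered @ components.T, components    # n x k projected users


# Hypothetical data: 1,000 users with 50-dim accumulated embeddings that
# mostly vary along 3 latent directions.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 50))
A += 0.01 * rng.normal(size=(1000, 50))
Z, comps = sketch_pca(A, ell=10, k=3)
```

The appeal of a sketch here is memory: only an ell x d matrix is kept while the user rows stream past, and the SVD runs on that small matrix instead of the full user-by-dimension matrix.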

Slide 45

Slide 45 text

Y-FEATURES STATISTICS
> Dimensions: 60K
> Users: 400M+
> Types: 20+
> Services: 10+

Slide 46

Slide 46 text

Y-FEATURES
USER DEMOGRAPHICS ESTIMATION FOR JP REGION
Estimation metrics (relative to z-features)
Task        Precision  Recall  F1-score
Gender      1.00       1.00    0.99
Age group   0.88       0.88    0.88
Region      0.98       0.98    0.99

Slide 47

Slide 47 text

Y-FEATURES
USER DEMOGRAPHICS ESTIMATION FOR JP REGION
Time (relative to z-features)
Task        Training  Prediction
Gender      0.06      0.52
Age group   0.02      0.51
Region      0.05      0.20
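Because the timing values are ratios to the z-feature baseline, their reciprocals give the implied speedups. The snippet only restates the reported ratios as "times faster" factors; it adds no new measurements:

```python
# Reported time ratios (y-features / z-features), taken from the slide.
training = {"gender": 0.06, "age-group": 0.02, "region": 0.05}
prediction = {"gender": 0.52, "age-group": 0.51, "region": 0.20}

def speedups(ratios):
    """Convert time ratios into 'x times faster' factors, rounded to 0.1."""
    return {task: round(1.0 / r, 1) for task, r in ratios.items()}

train_speedup = speedups(training)      # {'gender': 16.7, 'age-group': 50.0, 'region': 20.0}
predict_speedup = speedups(prediction)  # {'gender': 1.9, 'age-group': 2.0, 'region': 5.0}
```

Read this way, training is roughly 17x to 50x faster and prediction roughly 2x to 5x faster, while the accuracy table stays at 0.88 to 1.00 of the z-feature baseline.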

Slide 48

Slide 48 text

Y-FEATURES
COMPONENTS USING Y-FEATURES
> User-to-user recommendation
> CTR/CVR prediction on the ads platform
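Compared with the 50M+-dimensional z-features, y-features are far smaller, which makes similarity search between users more practical. A minimal brute-force sketch of user-to-user recommendation by cosine similarity; the slide does not describe the actual recommender, so the similarity measure, `similar_users`, and the toy vectors are assumptions:

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similar_users(target, user_vecs, k):
    """Return the ids of the k users most similar to `target`, excluding itself."""
    q = user_vecs[target]
    scored = ((cosine(q, v), uid) for uid, v in user_vecs.items() if uid != target)
    return [uid for _, uid in heapq.nlargest(k, scored)]

users = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0], "u4": [0.7, 0.7]}
neighbors = similar_users("u1", users, 2)
```

At the scale of hundreds of millions of users, a production system would more likely use an approximate nearest-neighbor index than this exhaustive scan.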

Slide 49

Slide 49 text

AVAILABLE FEATURES
> Z-Features: User Features
> Y-Features: Obfuscated User Features
> C-Features: Content Features

Slide 50

Slide 50 text

C-FEATURES OVERVIEW
> Embeddings of each service's content
> Currently available for two services
  • News articles: SCDV with fastText
  • Sticker images: Xception
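SCDV (Sparse Composite Document Vectors) builds a document vector by soft-clustering word vectors, concatenating cluster-weighted, idf-scaled copies of each word vector, averaging over the document, and zeroing small entries with a corpus-level threshold. A minimal sketch of the document-vector step, assuming the fastText word vectors, per-word cluster probabilities (SCDV obtains these from a Gaussian mixture, whose fitting is omitted here), and idf values are precomputed; the 4% sparsity default and the toy inputs are illustrative:

```python
def word_topic_vector(word_vec, cluster_probs, idf):
    """idf-weighted concatenation of P(c_k | w) * word_vec over clusters k."""
    out = []
    for p in cluster_probs:
        out.extend(idf * p * x for x in word_vec)
    return out


def scdv(docs, word_vecs, cluster_probs, idf, sparsity_pct=4.0):
    """Sparse Composite Document Vectors for tokenized documents.

    word_vecs:     dict word -> list[float] (e.g. fastText vectors, length d)
    cluster_probs: dict word -> list[float] (soft cluster assignment, length K)
    idf:           dict word -> float
    Returns one K*d-dimensional vector per document.
    """
    d = len(next(iter(word_vecs.values())))
    K = len(next(iter(cluster_probs.values())))
    dense = []
    for doc in docs:
        known = [w for w in doc if w in word_vecs and w in cluster_probs and w in idf]
        acc = [0.0] * (K * d)
        for w in known:
            wtv = word_topic_vector(word_vecs[w], cluster_probs[w], idf[w])
            for i, x in enumerate(wtv):
                acc[i] += x
        n = len(known) or 1  # avoid division by zero for documents with no known words
        dense.append([a / n for a in acc])
    # Corpus-level threshold from the average per-document min and max values.
    a_min = sum(min(v) for v in dense) / len(dense)
    a_max = sum(max(v) for v in dense) / len(dense)
    t = sparsity_pct / 100.0 * (abs(a_min) + abs(a_max)) / 2.0
    return [[x if abs(x) >= t else 0.0 for x in v] for v in dense]


# Toy inputs: 2-dim word vectors, 2 clusters, so documents become 4-dim vectors.
word_vecs = {"a": [1.0, 0.0], "b": [0.0, 2.0]}
cluster_probs = {"a": [1.0, 0.0], "b": [0.5, 0.5]}
idf = {"a": 1.0, "b": 2.0}
vecs = scdv([["a", "b"], ["b"]], word_vecs, cluster_probs, idf)
```

The output dimension is the number of clusters times the word-vector dimension, which is why SCDV document vectors are high-dimensional yet mostly zero after thresholding.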

Slide 51

Slide 51 text

C-FEATURES STATISTICS
> Dimensions: 15K
> Contents: 3M+
> Types: 5
> Services: 2

Slide 52

Slide 52 text

Conclusion

Slide 53

Slide 53 text

HOW WE USE FEATURES AT DATA LABS
> Feature as a Service
  • Achieve data standardization/democratization
  • Improve development efficiency
> Available Features
  • User features
  • Obfuscated user features
  • Content features

Slide 54

Slide 54 text

Thank You