Feature as a Service at Data Labs

Chaerim Yeo
LINE Machine Learning Team Senior Software Engineer
https://linedevday.linecorp.com/jp/2019/sessions/C1-5

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay
    Feature as a Service at Data Labs
    > Chaerim Yeo
    > LINE Machine Learning Team Senior Software Engineer

  2. LINE PLATFORM

  3. DATA LABS
    [Diagram: Data Labs alongside Sticker, Ad, Manga, Music, Live, News]
    > Independent from service/dev depts.
    > Aggregate data across various services
    > Provide analysis/solution from data across various services

  4. Feature as a Service

  5. WHAT IS IT?

  6. WHAT IS IT?

  7. WHAT IS IT?

  8. WHAT IS IT?
    Standardization, Democratization

  9. AVAILABLE FEATURES
    Z-Features: User Features
    Y-Features: Obfuscated User Features
    C-Features: Content Features

  10. Background

  11. SYSTEM OVERVIEW

  12. SYSTEM OVERVIEW

  13. SYSTEM OVERVIEW

  14. SYSTEM OVERVIEW

  15. SYSTEM OVERVIEW

  16. SYSTEM OVERVIEW

  17. SYSTEM OVERVIEW

  18. SYSTEM OVERVIEW

  19. SYSTEM OVERVIEW

  20. SYSTEM OVERVIEW

  21. SYSTEM OVERVIEW

  22. SYSTEM OVERVIEW

  23. SYSTEM OVERVIEW

  24. SYSTEM OVERVIEW

  25. NATURE OF CENTRALIZED FEATURES
    Versatile, Flexible, Reusable, Extensible

  26. NATURE OF CENTRALIZED FEATURES
    Versatile, Flexible, Reusable, Extensible

  27. NATURE OF CENTRALIZED FEATURES
    Versatile, Flexible, Reusable, Extensible

  28. NATURE OF CENTRALIZED FEATURES
    Versatile, Flexible, Reusable, Extensible

  29. NATURE OF CENTRALIZED FEATURES
    Versatile, Flexible, Reusable, Extensible

  30. Available Features

  31. AVAILABLE FEATURES
    Z-Features: User Features
    Y-Features: Obfuscated User Features
    C-Features: Content Features

  32. Z-FEATURES
    BACKGROUND
    User Demographics Estimation: GENDER, AGE-GROUP, REGION
    Look-alike Engine

  33. Z-FEATURES
    BACKGROUND
    Input: Sparse vector from user's behavioral logs
    Output: Class probabilities
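
The input/output contract on this slide can be illustrated with a minimal Python sketch. It assumes a multinomial logistic regression as the classifier, which the slides do not specify; the vocabulary, event names, weights, and the helper names `to_sparse_vector` and `class_probabilities` are all hypothetical.

```python
import math
from collections import Counter

def to_sparse_vector(events, vocabulary):
    """Map a user's behavioral events to a sparse {index: count} vector."""
    counts = Counter(e for e in events if e in vocabulary)
    return {vocabulary[e]: float(c) for e, c in counts.items()}

def class_probabilities(sparse_vec, weights, biases):
    """Softmax over per-class linear scores (assumed model, not from the talk)."""
    scores = []
    for w, b in zip(weights, biases):
        # Only non-zero features contribute, so the cost scales with the
        # number of observed events rather than the full dimensionality.
        scores.append(b + sum(w.get(i, 0.0) * v for i, v in sparse_vec.items()))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary of events and toy model parameters.
vocabulary = {"news:sports": 0, "sticker:cat": 1, "music:rock": 2}
events = ["news:sports", "music:rock", "news:sports", "manga:unknown"]
weights = [{0: 0.8, 2: 0.3}, {1: 1.2}]  # one sparse weight map per class
biases = [0.0, 0.1]

vec = to_sparse_vector(events, vocabulary)
probs = class_probabilities(vec, weights, biases)
print(vec, probs)
```

Storing only the non-zero entries is what keeps a vector with tens of millions of dimensions practical to hold per user.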

  34. Z-FEATURES
    OVERVIEW
    > Collection of users' behavioral logs across various LINE services

  35. Z-FEATURES
    OVERVIEW
    > Collection of users' behavioral logs across various LINE services
    Transform into structures that cover about 80% of all ML use cases

  36. Z-FEATURES
    OVERVIEW
    > Collection of users' behavioral logs across various LINE services
    {...} {...}
    {...} {...}
    {...} {...}
    ...
    ...

  37. Z-FEATURES
    STATISTICS
    Dimensions: 50M+
    Users: 890M+
    Types: 30+
    Services: 10+

  38. Z-FEATURES
    COMPONENTS USING Z-FEATURES

  39. AVAILABLE FEATURES
    Z-Features: User Features
    Y-Features: Obfuscated User Features
    C-Features: Content Features

  40. Y-FEATURES
    BACKGROUND
    Human-interpretable
    Extremely sparse

  41. Y-FEATURES
    OVERVIEW
    > Obfuscated user features
    > Mitigate z-features' problems
    • Accumulate content embedding based on users' behavioral logs
    • Reduce dimensionality

  42. Y-FEATURES
    OVERVIEW
    > Obfuscated user features
    > Mitigate z-features' problems
    • Accumulate content embedding based on users' behavioral logs
    • Reduce dimensionality

  43. Y-FEATURES
    OVERVIEW
    > Obfuscated user features
    > Mitigate z-features' problems
    • Accumulate content embedding based on users' behavioral logs
    • Reduce dimensionality

  44. Y-FEATURES
    OVERVIEW
    > Obfuscated user features
    > Mitigate z-features' problems
    • Accumulate content embedding based on users' behavioral logs
    • Reduce dimensionality
    Matrix sketching + PCA
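
As a rough illustration of the two bullets on this slide, the following Python sketch sums content embeddings per user and then projects the result onto its leading principal component. It is a toy under stated assumptions: summation is assumed as the accumulation rule, PCA is done by plain power iteration, and the matrix sketching step named on the slide is omitted entirely. All names and data are hypothetical.

```python
import math

def accumulate(logs, content_embeddings, dim):
    """Sum the embeddings of the content each user interacted with."""
    users = {}
    for user, content in logs:
        acc = users.setdefault(user, [0.0] * dim)
        for j, x in enumerate(content_embeddings[content]):
            acc[j] += x
    return users

def top_component(rows, iterations=200):
    """Leading principal direction of mean-centered rows (power iteration)."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Apply the (unnormalized) covariance implicitly: w = X^T (X v)
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical content embeddings and (user, content) behavioral logs.
content_embeddings = {"a": [1.0, 0.0, 2.0], "b": [0.0, 1.0, 0.0], "c": [3.0, 1.0, 0.0]}
logs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "a"), ("u3", "c")]

users = accumulate(logs, content_embeddings, dim=3)
names = sorted(users)
rows = [users[n] for n in names]
mean = [sum(col) / len(rows) for col in zip(*rows)]
centered = [[x - m for x, m in zip(r, mean)] for r in rows]

# NOTE: the matrix sketching step from the slide is omitted in this toy.
component = top_component(centered)
reduced = {n: sum(x * c for x, c in zip(r, component)) for n, r in zip(names, centered)}
print(reduced)  # one coordinate per user instead of three
```

Because the accumulated vectors are built from dense content embeddings rather than raw event counts, individual coordinates no longer map to a recognizable behavior, which is consistent with the "obfuscated" label on the slide.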

  45. Y-FEATURES
    STATISTICS
    Dimensions: 60K
    Users: 400M+
    Types: 20+
    Services: 10+

  46. Y-FEATURES
    USER DEMOGRAPHICS ESTIMATION FOR JP REGION
    [Bar charts, y-axis 0 to 1; values listed in the order precision / recall / f1-score]
    GENDER ESTIMATION METRICS (RELATIVE TO Z-FEATURES): 1.00 / 1.00 / 0.99
    AGE-GROUP ESTIMATION METRICS (RELATIVE TO Z-FEATURES): 0.88 / 0.88 / 0.88
    REGION ESTIMATION METRICS (RELATIVE TO Z-FEATURES): 0.98 / 0.98 / 0.99

  47. Y-FEATURES
    USER DEMOGRAPHICS ESTIMATION FOR JP REGION
    [Bar charts, y-axis 0 to 1; values listed in the order gender / age-group / region]
    TRAINING TIME (RELATIVE TO Z-FEATURES): 0.06 / 0.02 / 0.05
    PREDICTION TIME (RELATIVE TO Z-FEATURES): 0.52 / 0.51 / 0.20
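
Assuming "relative to Z-features" means Y-features time divided by Z-features time, and that the values appear in the same order as the axis labels (gender, age-group, region), the ratios on this slide convert to speedup factors as sketched below. This is simple arithmetic on the slide's numbers, not an additional measurement.

```python
# Relative times read from the slide (assumed: Y-features time / Z-features time).
training = {"gender": 0.06, "age-group": 0.02, "region": 0.05}
prediction = {"gender": 0.52, "age-group": 0.51, "region": 0.20}

def speedup(relative_time):
    """A relative time r < 1 corresponds to a 1/r speedup."""
    return 1.0 / relative_time

training_speedup = {k: round(speedup(v), 1) for k, v in training.items()}
prediction_speedup = {k: round(speedup(v), 1) for k, v in prediction.items()}

print(training_speedup)    # {'gender': 16.7, 'age-group': 50.0, 'region': 20.0}
print(prediction_speedup)  # {'gender': 1.9, 'age-group': 2.0, 'region': 5.0}
```

Under these assumptions, training is roughly 17x to 50x faster with Y-features, while prediction is about 2x to 5x faster.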

  48. Y-FEATURES
    COMPONENTS USING Y-FEATURES
    User to User Recommendation
    CTR/CVR Prediction on Ads Platform

  49. FEATURES
    Z-Features: User Features
    Y-Features: Obfuscated User Features
    C-Features: Content Features

  50. C-FEATURES
    OVERVIEW
    > Embedding of each service's contents
    > Currently available for two services
    • News articles: SCDV with fastText
    • Sticker images: Xception
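
For readers unfamiliar with SCDV (Sparse Composite Document Vectors), the sketch below shows its core composition step in Python. It assumes that word vectors (e.g. from fastText), soft cluster assignments (e.g. from a Gaussian mixture model), and idf weights are already computed; the toy values, the fixed sparsity threshold, and the function names are hypothetical and not taken from the talk.

```python
def scdv_document_vector(tokens, word_vectors, cluster_probs, idf):
    """Sum idf-weighted word-cluster vectors over the tokens of one document."""
    dim = len(next(iter(word_vectors.values())))
    clusters = len(next(iter(cluster_probs.values())))
    doc = [0.0] * (dim * clusters)
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary tokens
        for k, p in enumerate(cluster_probs[tok]):
            for j, x in enumerate(word_vectors[tok]):
                # Word-cluster vector: word vector scaled by P(cluster k | word),
                # placed in the k-th block of the concatenated vector.
                doc[k * dim + j] += idf[tok] * p * x
    return doc

def sparsify(vec, threshold):
    """Zero out entries whose magnitude falls below the threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in vec]

# Hypothetical precomputed inputs: 2-dimensional word vectors, 2 clusters.
word_vectors = {"goal": [1.0, 0.0], "match": [0.5, 0.5]}
cluster_probs = {"goal": [0.9, 0.1], "match": [0.5, 0.5]}
idf = {"goal": 2.0, "match": 1.0}

doc = scdv_document_vector(["goal", "match", "the"], word_vectors, cluster_probs, idf)
sparse = sparsify(doc, threshold=0.3)  # fixed threshold chosen for illustration
print(doc)
print(sparse)
```

The output dimension is the word-vector dimension times the number of clusters, which is how SCDV ends up with much wider vectors than the underlying word embeddings.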

  51. C-FEATURES
    STATISTICS
    Dimension: 15K
    Contents: 3M+
    Types: 5
    Services: 2

  52. Conclusion

  53. HOW WE USE FEATURES AT DATA LABS
    > Feature as a Service
    • Achieve data standardization/democratization
    • Improve development efficiency
    > Available Features
    • User features
    • Obfuscated user features
    • Content features

  54. Thank You