Feature as a Service at Data Labs

Chaerim Yeo
LINE Machine Learning Team Senior Software Engineer
https://linedevday.linecorp.com/jp/2019/sessions/C1-5

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay: Feature as a Service at Data Labs

    > Chaerim Yeo
    > LINE Machine Learning Team Senior Software Engineer
  2. LINE PLATFORM

  3. DATA LABS

    (diagram labels: Sticker, Ad, Manga, Music, Live, News, Data Labs)
    > Independent from service/dev depts.
    > Aggregate data across various services
    > Provide analysis/solution from data across various services
  4. Feature as a Service

  5. WHAT IS IT?

  6. WHAT IS IT?

  7. WHAT IS IT?

  8. WHAT IS IT?

    > Standardization
    > Democratization

  9. AVAILABLE FEATURES

    > Z-Features: User Features
    > Y-Features: Obfuscated User Features
    > C-Features: Content Features
  10. Background

  11. SYSTEM OVERVIEW

  12. SYSTEM OVERVIEW

  13. SYSTEM OVERVIEW

  14. SYSTEM OVERVIEW

  15. SYSTEM OVERVIEW

  16. SYSTEM OVERVIEW

  17. SYSTEM OVERVIEW

  18. SYSTEM OVERVIEW

  19. SYSTEM OVERVIEW

  20. SYSTEM OVERVIEW

  21. SYSTEM OVERVIEW

  22. SYSTEM OVERVIEW

  23. SYSTEM OVERVIEW

  24. SYSTEM OVERVIEW

  25. NATURE OF CENTRALIZED FEATURES

    > Versatile
    > Flexible
    > Reusable
    > Extensible

  26. NATURE OF CENTRALIZED FEATURES

    > Versatile
    > Flexible
    > Reusable
    > Extensible

  27. NATURE OF CENTRALIZED FEATURES

    > Versatile
    > Flexible
    > Reusable
    > Extensible

  28. NATURE OF CENTRALIZED FEATURES

    > Versatile
    > Flexible
    > Reusable
    > Extensible

  29. NATURE OF CENTRALIZED FEATURES

    > Versatile
    > Flexible
    > Reusable
    > Extensible

  30. Available Features

  31. AVAILABLE FEATURES

    > Z-Features: User Features
    > Y-Features: Obfuscated User Features
    > C-Features: Content Features
  32. Z-FEATURES BACKGROUND

    > User Demographics Estimation: gender, age-group, region
    > Look-a-like Engine
  33. Z-FEATURES BACKGROUND

    > Input: sparse vector from user's behavioral logs
    > Output: class probabilities
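
The input/output contract on this slide (a sparse vector in, class probabilities out) can be illustrated with a small sketch. The slide does not name the model, so the softmax linear classifier below is an assumption, and `predict_proba`, the feature indices, and the weights are all hypothetical toy values.

```python
import math

def predict_proba(sparse_x, class_weights, class_biases):
    """Return one probability per class for a sparse input vector.

    sparse_x:      {feature_index: value}, only non-zero entries are stored
    class_weights: one {feature_index: weight} dict per class (hypothetical)
    class_biases:  one bias per class (hypothetical)
    """
    logits = []
    for weights, bias in zip(class_weights, class_biases):
        # Only the non-zero entries of the input contribute to the dot product.
        logits.append(bias + sum(value * weights.get(index, 0.0)
                                 for index, value in sparse_x.items()))
    # Softmax, shifted by the largest logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy input: two active dimensions out of a very large feature space.
user_vector = {3: 1.0, 1_204_557: 2.0}
class_weights = [{3: 0.8}, {1_204_557: 0.5}, {}]
class_biases = [0.0, 0.0, 0.0]
probabilities = predict_proba(user_vector, class_weights, class_biases)
```

Storing only the non-zero entries keeps the cost of a prediction proportional to the number of active features rather than to the full dimensionality.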
  34. Z-FEATURES OVERVIEW

    > Collection of users' behavioral logs across various LINE services
  35. Z-FEATURES OVERVIEW

    > Collection of users' behavioral logs across various LINE services
    > Transform into structures that cover about 80% of all ML use cases
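
As a rough illustration of turning behavioral logs into a reusable structure, the sketch below counts events per user and stores them as sparse index-to-count mappings. The log schema `(user_id, service, action)`, the function name `build_sparse_features`, and the counting scheme are assumptions made for this example; the slides do not describe the actual structures.

```python
from collections import Counter

def build_sparse_features(logs):
    """Map (user_id, service, action) log rows to per-user sparse count vectors.

    Returns (vocab, features) where vocab assigns a column index to each
    (service, action) pair and features maps user_id -> {index: count}.
    """
    vocab = {}
    per_user = {}
    for user_id, service, action in logs:
        key = (service, action)
        # Assign the next free column index the first time a key is seen.
        index = vocab.setdefault(key, len(vocab))
        per_user.setdefault(user_id, Counter())[index] += 1
    return vocab, {user: dict(counts) for user, counts in per_user.items()}

# Hypothetical log rows from two services.
example_logs = [
    ("u1", "news", "read"),
    ("u1", "news", "read"),
    ("u1", "sticker", "purchase"),
    ("u2", "sticker", "purchase"),
]
vocab, features = build_sparse_features(example_logs)
```

Because every service writes into the same index space, one user vector can serve several downstream models without per-model preprocessing.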
  36. Z-FEATURES OVERVIEW

    > Collection of users' behavioral logs across various LINE services
    (diagram: a series of {...} structures)
  37. Z-FEATURES STATISTICS

    > Dimensions: 50M+
    > Users: 890M+
    > Types: 30+
    > Services: 10+

  38. Z-FEATURES COMPONENTS USING Z-FEATURES

  39. AVAILABLE FEATURES

    > Z-Features: User Features
    > Y-Features: Obfuscated User Features
    > C-Features: Content Features
  40. Y-FEATURES BACKGROUND

    > Human-interpretable
    > Extremely sparse

  41. Y-FEATURES OVERVIEW

    > Obfuscated user features
    > Mitigate z-features' problems
      • Accumulate content embedding based on users' behavioral logs
      • Reduce dimensionality
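
The first bullet, accumulating content embeddings over a user's behavioral logs, might look like the sketch below. The log schema `(user_id, content_id)`, the plain summation, and the name `accumulate_user_embeddings` are assumptions for illustration; the talk does not specify how the accumulation is weighted or normalized.

```python
from collections import defaultdict

def accumulate_user_embeddings(logs, content_embeddings):
    """Sum the embeddings of every content item a user interacted with.

    logs:               iterable of (user_id, content_id) pairs (hypothetical schema)
    content_embeddings: {content_id: [float, ...]}, all of the same length
    """
    dim = len(next(iter(content_embeddings.values())))
    sums = defaultdict(lambda: [0.0] * dim)
    for user_id, content_id in logs:
        embedding = content_embeddings.get(content_id)
        if embedding is None:
            continue  # skip content that has no embedding
        accumulated = sums[user_id]
        for i, value in enumerate(embedding):
            accumulated[i] += value
    return dict(sums)

# Toy data: two content items with 2-dimensional embeddings.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 2.0]}
toy_logs = [("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "b"), ("u2", "missing")]
user_embeddings = accumulate_user_embeddings(toy_logs, toy_embeddings)
```

The resulting vectors are dense and no longer map to named behaviors, which is consistent with the "obfuscated" description on the slide.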
  42. Y-FEATURES OVERVIEW

    > Obfuscated user features
    > Mitigate z-features' problems
      • Accumulate content embedding based on users' behavioral logs
      • Reduce dimensionality
  43. Y-FEATURES OVERVIEW

    > Obfuscated user features
    > Mitigate z-features' problems
      • Accumulate content embedding based on users' behavioral logs
      • Reduce dimensionality
  44. Y-FEATURES OVERVIEW

    > Obfuscated user features
    > Mitigate z-features' problems
      • Accumulate content embedding based on users' behavioral logs
      • Reduce dimensionality (Matrix sketching + PCA)
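
The slide names "Matrix sketching + PCA" without further detail. As one possible reading, the sketch below first compresses the feature dimension with a Gaussian random projection (one common sketching technique, chosen here as an assumption) and then finds the first principal component of the sketch by power iteration. All function names and the toy data are hypothetical, and only a single component is computed to keep the example short.

```python
import math
import random

def random_projection(rows, k, seed=0):
    """Sketch each d-dimensional row down to k dimensions with a Gaussian matrix."""
    rng = random.Random(seed)
    d = len(rows[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(x * proj[i][j] for i, x in enumerate(row)) for j in range(k)]
            for row in rows]

def top_principal_component(rows, iterations=200):
    """Return (column means, unit-length first principal component)."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in rows]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left to iterate on
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Coordinate of each row along the given component after centering."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(component)))
            for row in rows]

# Toy data: 30 users with 20 raw dimensions, sketched to 5, then reduced to 1.
data_rng = random.Random(1)
users = [[data_rng.random() for _ in range(20)] for _ in range(30)]
sketch = random_projection(users, k=5)
means, pc1 = top_principal_component(sketch)
reduced = project(sketch, means, pc1)
```

Sketching first means the PCA step only ever sees a small matrix, which is the usual motivation for combining the two when the raw dimensionality is very large.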
  45. Y-FEATURES STATISTICS

    > Dimensions: 60K
    > Users: 400M+
    > Types: 20+
    > Services: 10+

  46. Y-FEATURES USER DEMOGRAPHICS ESTIMATION FOR JP REGION

    Estimation metrics (relative to z-features):
    > Gender: precision 1.00, recall 1.00, f1-score 0.99
    > Age-group: precision 0.88, recall 0.88, f1-score 0.88
    > Region: precision 0.98, recall 0.98, f1-score 0.99
  47. Y-FEATURES USER DEMOGRAPHICS ESTIMATION FOR JP REGION

    > Training time (relative to z-features): gender 0.06, age-group 0.02, region 0.05
    > Prediction time (relative to z-features): gender 0.52, age-group 0.51, region 0.20
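
The relative times on this slide can be turned into speedup factors with a line of arithmetic. The snippet assumes that "relative to z-features" means the y-features time divided by the z-features time, so the speedup is the reciprocal; that reading is an interpretation of the chart, not something the slide states.

```python
# Relative times as read off the slide (assumed: y-features time / z-features time).
training_time = {"gender": 0.06, "age-group": 0.02, "region": 0.05}
prediction_time = {"gender": 0.52, "age-group": 0.51, "region": 0.20}

def speedup(relative_time):
    """How many times faster y-features are, given a relative time below 1."""
    return 1.0 / relative_time

training_speedup = {task: round(speedup(t), 1) for task, t in training_time.items()}
prediction_speedup = {task: round(speedup(t), 1) for task, t in prediction_time.items()}
```

Under that reading, training is roughly 17 to 50 times faster with y-features and prediction roughly 2 to 5 times faster.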
  48. Y-FEATURES COMPONENTS USING Y-FEATURES

    > User to User Recommendation
    > CTR/CVR Prediction on Ads Platform
  49. FEATURES

    > Z-Features: User Features
    > Y-Features: Obfuscated User Features
    > C-Features: Content Features
  50. C-FEATURES OVERVIEW

    > Embedding of each service's contents
    > Currently available for two services
      • News articles: SCDV with fastText
      • Sticker images: Xception
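
For the news-article case, the composition step of SCDV (Sparse Composite Document Vectors) can be sketched as follows. Each word vector is weighted by its soft cluster membership and its idf, the per-cluster pieces are concatenated, the results are summed over the document, and small entries are zeroed out. The word vectors, cluster probabilities, idf values, and threshold below are toy stand-ins; in practice the word vectors would come from fastText and the cluster probabilities from a fitted mixture model, neither of which is reproduced here.

```python
def scdv_document_vector(tokens, word_vectors, cluster_probs, idf, threshold=0.0):
    """Compose an SCDV-style document vector from precomputed parts.

    word_vectors:  {word: [float, ...]}  (toy stand-in for fastText vectors)
    cluster_probs: {word: [P(cluster_k | word), ...]}  (toy soft assignments)
    idf:           {word: float}
    threshold:     entries with absolute value below this are set to zero
    """
    dim = len(next(iter(word_vectors.values())))
    n_clusters = len(next(iter(cluster_probs.values())))
    doc = [0.0] * (dim * n_clusters)
    for token in tokens:
        if token not in word_vectors:
            continue  # out-of-vocabulary tokens are skipped
        vector = word_vectors[token]
        for k, prob in enumerate(cluster_probs[token]):
            for j, value in enumerate(vector):
                # Block k of the output holds the cluster-k weighted word vector.
                doc[k * dim + j] += idf[token] * prob * value
    return [x if abs(x) >= threshold else 0.0 for x in doc]

toy_vectors = {"line": [1.0, 0.0], "news": [0.0, 1.0]}
toy_clusters = {"line": [0.5, 0.5], "news": [1.0, 0.0]}
toy_idf = {"line": 2.0, "news": 1.0}
article_vector = scdv_document_vector(["line", "news", "unknown"],
                                      toy_vectors, toy_clusters, toy_idf)
```

The output length is the word-vector dimension times the number of clusters, which is why SCDV document vectors tend to be much wider than the underlying word embeddings.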
  51. C-FEATURES STATISTICS

    > Dimension: 15K
    > Contents: 3M+
    > Types: 5
    > Services: 2

  52. Conclusion

  53. HOW WE USE FEATURES AT DATA LABS

    > Feature as a Service
      • Achieve data standardization/democratization
      • Improve development efficiency
    > Available Features
      • User features
      • Obfuscated user features
      • Content features
  54. Thank You