A Case Study of Issues Arising from LINE’s User Persona Systems Revamping

LINE DEVDAY 2021

November 10, 2021

Transcript

  1. Self introduction: Tetsuroh Watanabe, Ph.D., Machine Learning Solution Team 2

    - Machine learning engineer, in charge of
      - User persona system: estimating users’ demographic and psychographic profiles
      - Feature store system
    - Past work experiences
      - Data scientist @ video game companies
      - Accounting
      - Researcher: evolutionary computing / multi-agent simulation
    - Hobbies
      - Travel by train, plane, ship, bus
      - Anime / Game / Manga
      - Quiz club activities
  2. Agenda

    - Introduction to LINE’s user persona system
    - Improvement points identified in the revamping
    - Future works
  4. LINE’s user persona system: Systems overview

    - Estimating users’ persona (demographic and psychographic profiles)
      - By machine learning methods
      - With the service logs from users who have consented
    - Used in various LINE services
      - For delivering useful content to LINE users
    - Estimated users’ persona (example)
      - Female, age 20-24, interests: Music, Fashion
      - Male, age 40-44, interests: Travel, Games, Entertainment
  5. LINE’s user persona system: [Use case-1] LINE Display Ads

    - External AD owners can specify their AD target users.
      - By estimated user demographic segments
    - https://linebizid.com/id-en/service/line-display-ads
  7. LINE’s user persona system: [Use case-2] LINE Official Account: Push message targeting

    - External account owners can specify target users who will receive their push messages.
      - By estimated user demographic segments
    - https://developers.line.biz/en/reference/messaging-api/#send-multicast-message
  8. LINE’s user persona system: [Use case-3] New LINE services: Dealing with Cold Start Problems

    - Service log: very limited since the service has just started
    - Item recommendation for User A ("For you"): too little data to generate recommendations for User A
  9. LINE’s user persona system: [Use case-3] New LINE services: Dealing with Cold Start Problems

    - Service log: very limited since the service has just started
    - User demographics of User A: Female, age 20-24
    - Item recommendation for User A: items popular among "Female, Age: 20-24" users (Item A, Item B, Item C)
    - Popular items tailored to User A can be recommended!
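
The cold-start fallback on this slide can be sketched as follows. This is a minimal illustration, not LINE's implementation: the function names, the `(gender, age band)` segment key, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def popular_items_by_segment(interactions, demographics, top_n=3):
    """Count item interactions per demographic segment and keep the top_n items."""
    counts = defaultdict(Counter)
    for user_id, item in interactions:
        segment = demographics.get(user_id)
        if segment is not None:
            counts[segment][item] += 1
    return {seg: [item for item, _ in c.most_common(top_n)] for seg, c in counts.items()}


def recommend(user_id, demographics, popular, fallback=()):
    """Recommend segment-level popular items for a user who has no service log."""
    return popular.get(demographics.get(user_id), list(fallback))


# Hypothetical data: (gender, age band) stands in for the estimated persona.
demographics = {
    "u1": ("female", "20-24"),
    "u2": ("female", "20-24"),
    "u3": ("male", "40-44"),
    "user_a": ("female", "20-24"),  # new user, no interactions yet
}
interactions = [("u1", "item_a"), ("u2", "item_a"), ("u1", "item_b"), ("u3", "item_z")]

popular = popular_items_by_segment(interactions, demographics)
print(recommend("user_a", demographics, popular))
```

User A has no interactions, so the recommendation comes entirely from other users in the same estimated segment.
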
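  10. LINE’s user persona system: System scale

    - Japan: MAU 89M, feature dimensions 4.8M+
    - Taiwan: MAU 50M, feature dimensions 1.4M+
    - Thailand: MAU 21M, feature dimensions 1.1M+
    - Indonesia: MAU 10M, feature dimensions 0.4M+
    - MAU as of June 2021; feature dimensions as of September 2021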
  11. Data for machine learning: z-features, a feature store storing cross-service logs

    - Only permitted data will be collected across services.
  13. Data for machine learning: Ground truth (answer label) data

    - Results of questionnaire surveys of users who gave their consent
    - Developing another model
      - Auto-label AD categories from small human-labeled data
  19. Data for machine learning: Auto-label AD categories from small human-labeled data

    - AD ML model
      - Train data and train label: annotated ADs (ground truth: “Game”)
      - Predict: categories of ADs that are not annotated (estimated: “Game”)
    - User ML model
      - Train label: a click (tap) on an AD gives the user a ground truth (“Interests: Game”)
      - Train data: user feature data
      - Predict: interests of users (estimated: “Interests: Game”)
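
The two-stage labeling flow above can be sketched roughly as follows. This is a toy illustration only: the word-count classifier stands in for the AD ML model, and all function names and sample data are hypothetical. The final step, training the user ML model on feature data with these labels, is omitted.

```python
from collections import Counter, defaultdict


def train_keyword_model(labeled_ads):
    """Stage 1 (AD model): count which words appear under each human label."""
    word_counts = defaultdict(Counter)
    for text, category in labeled_ads:
        word_counts[category].update(text.lower().split())
    return word_counts


def predict_category(model, text):
    """Pick the category whose training words overlap the AD text the most."""
    words = text.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in model.items()}
    return max(scores, key=scores.get)


def label_users_from_clicks(clicks, ad_categories):
    """Stage 2 labels: a click on an AD of category X marks the user as interested in X."""
    labels = defaultdict(set)
    for user_id, ad_id in clicks:
        labels[user_id].add(ad_categories[ad_id])
    return labels


# Hypothetical data: a small human-labeled set and a larger unlabeled set.
labeled_ads = [("new puzzle game download", "Game"), ("summer fashion sale", "Fashion")]
unlabeled_ads = {"ad_1": "free rpg game", "ad_2": "fashion brand sale"}

ad_model = train_keyword_model(labeled_ads)
ad_categories = {ad_id: predict_category(ad_model, text) for ad_id, text in unlabeled_ads.items()}

clicks = [("user_a", "ad_1"), ("user_b", "ad_2")]
user_labels = label_users_from_clicks(clicks, ad_categories)
print(dict(user_labels))
```

The resulting `user_labels` would serve as train labels for the user model, paired with each user's feature data.
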
  20. System history

    - [Timeline figure, 2014 to 2021] Neural network from scratch, Theano & Mesos cluster, MPI introduced, GPU cluster introduced, ghee & Kubernetes cluster, Further improved!
  25. System revamping

    - Legacy system
      - Mesos cluster
      - Theano
      - Difficulties
        - Maintenance: due to system age
        - Updating models: due to inability to use current mainstream libraries (such as PyTorch)
    - Revamped system
      - GPU cluster
      - Kubernetes cluster
      - Distributed processing: using in-house library “ghee”
      - Major deep learning libraries (including PyTorch) are available
    - Identified more room for improvement!
  26. Agenda

    - Introduction to LINE’s user persona system
    - Improvement points identified in the revamping
    - Future works
  27. Improvement points

    - 1. Further improvement of ML models
    - 2. Systems for stable delivery and monitoring
    - 3. Further improvement of data and model reusability
  31. 1. Further improvement of ML models: Separated embedding by service

    - Before: Feature data (from service A, B, C, ...) -> Embedding layer -> Simple DNN -> Output
    - After: Feature data (from service A, B, C, ...) -> Split embedding layer (one per service) -> ResNet (CNN) -> Output
  33. 1. Further improvement of ML models: Implementing a state-of-the-art model

    - Model: Feature data (from service A, B, C, ...) -> Split embedding layer, with service dropout -> MLP-mixer (state-of-the-art) -> Output
    - [Chart] Model accuracy (axis 0.0 to 0.8), old vs. new model, by segment type (# of segments): A (22), B (13), C (11)
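
The slide names "service dropout" without defining it. One plausible reading, which is an assumption here and not confirmed by the deck, is that all features from a randomly chosen service are dropped during training so the model does not over-rely on any single service. A minimal sketch under that assumption, with hypothetical names and plain Python lists in place of tensors:

```python
import random


def service_dropout(service_embeddings, drop_prob, rng, training=True):
    """Zero out the whole embedding of each service with probability drop_prob.

    Assumption: this interprets "service dropout" as per-service masking at
    training time. No rescaling of the kept embeddings is applied.
    """
    if not training:
        return {name: list(vec) for name, vec in service_embeddings.items()}
    out = {}
    for name, vec in service_embeddings.items():
        dropped = rng.random() < drop_prob
        out[name] = [0.0] * len(vec) if dropped else list(vec)
    return out


def concat(service_embeddings):
    """Concatenate per-service embeddings in a fixed (sorted) service order."""
    return [x for name in sorted(service_embeddings) for x in service_embeddings[name]]


# Hypothetical per-service embeddings produced by the split embedding layer.
embeddings = {"service_a": [0.2, 0.5], "service_b": [0.1, 0.9], "service_c": [0.7, 0.3]}
rng = random.Random(0)
print(concat(service_dropout(embeddings, drop_prob=0.3, rng=rng)))
```

At inference time (`training=False`) every service embedding passes through unchanged.
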
  34. Improvement points

    - 1. Further improvement of ML models
    - 2. Systems for stable delivery and monitoring
    - 3. Further improvement of data and model reusability
  37. 2-1. System for stable delivery: [Issue] Variation in segment volume leads to inconvenience for biz side

    - [Example] AD / Official account message delivery using estimated demographic segments
    - [Chart] Segment volume by date (d, d+1) for segments A, B, C, D, with the delivery target users marked
    - Decreased by [amount missing] after just one day, even though the AD account owner has not changed the delivery setting
  39. 2-1. System for stable delivery: [Solution-1] Smoothing: Reduce daily fluctuations

    - Smoothing the daily segment volume with a moving average
    - Developing “Smoothing API”
      - Standardizing processes
      - Simplifying implementation
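
A trailing moving average over daily volumes is one simple way to realize the smoothing named on this slide. The sketch below is illustrative only: the window length, the function name, and the sample volumes are assumptions, and the actual Smoothing API is not described in the deck.

```python
def moving_average(volumes, window=7):
    """Trailing moving average; early days average over the days available so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(volumes):
        running += v
        if i >= window:
            running -= volumes[i - window]  # drop the day that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Hypothetical daily segment volumes.
daily = [100, 130, 70, 120, 80]
print(moving_average(daily, window=3))
```

A longer window suppresses more of the daily fluctuation but reacts more slowly to genuine changes in the segment.
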
  44. 2-1. System for stable delivery: [Solution-2] Volume calibration

    - Example (dummy data and settings)
      - [Chart] Segment volume (0 to 300,000) by date: original (after smoothing) vs. calibrated
      - Suppresses too large fluctuations (±20% in this case)
      - Guarantees the lower limit (150,000 in this case)
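
The two calibration rules can be sketched as a clamp applied day by day. How the real system combines them is not stated in the deck, so the order used here (limit the change relative to the previous calibrated value, then apply the floor) is an assumption, as are the function name and sample data.

```python
def calibrate(volumes, max_change=0.20, lower_limit=150_000):
    """Limit day-over-day change to +/- max_change and never go below lower_limit."""
    calibrated = []
    for v in volumes:
        if calibrated:
            prev = calibrated[-1]
            low, high = prev * (1 - max_change), prev * (1 + max_change)
            v = min(max(v, low), high)  # suppress too large fluctuations
        calibrated.append(max(v, lower_limit))  # guarantee the lower limit
    return calibrated


# Hypothetical smoothed volumes with one spike and one sharp drop.
smoothed = [200_000, 260_000, 120_000, 100_000]
print(calibrate(smoothed))
```

In this sketch the lower limit takes priority over the fluctuation limit, so a segment that starts below the floor is raised to it immediately.
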
  47. 2-1. System for stable delivery: [Solution-3] Gradual rollout: Release model updates gradually

    - Timeline: OLD model applied, then NEW model applied
    - Gradual rollout: from old model 100% to new model 100%
    - [Chart] Segment volume by date (example of actual data)
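
One common way to implement a gradual rollout is to give each user a stable hash bucket and raise the share of buckets served by the new model each day. The deck does not say how LINE assigns users, so the linear schedule, the hashing scheme, and all names below are assumptions.

```python
import hashlib


def new_model_share(day, rollout_days):
    """Share of users on the new model: 0.0 on day 0, 1.0 from rollout_days onward."""
    if rollout_days <= 0:
        return 1.0
    return min(max(day / rollout_days, 0.0), 1.0)


def uses_new_model(user_id, day, rollout_days=10):
    """Stable per-user bucket in [0, 1); a user who has switched never switches back."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < new_model_share(day, rollout_days)


# Hypothetical user ids; roughly half should be on the new model at day 5 of 10.
users = [f"user_{i}" for i in range(1000)]
share_on_day_5 = sum(uses_new_model(u, day=5) for u in users) / len(users)
print(share_on_day_5)
```

Because the bucket depends only on the user id, each user's segment changes at most once during the rollout, which spreads the volume shift over several days.
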
  49. 2-2. System for output monitoring: As a framework for MLOps

    - MLOps monitoring system “Lupus”
    - Dashboard tool “OASIS”
    - Session No. ML-2, “Lupus - A monitoring system for accelerating MLOps”, is just after this session!
  50. Improvement points

    - 1. Further improvement of ML models
    - 2. Systems for stable delivery and monitoring
    - 3. Further improvement of data and model reusability
  55. 3. Improvement of data & model reusability: [Issue] Similar processes implemented in different ways

    - Task A: Feature data -> Preprocess A -> Processed feature data A; with Ground truth A -> Create dataset A -> Dataset A -> Train model A -> Model A
    - Task B: Feature data -> Preprocess B -> Processed feature data B; with Ground truth B -> Create dataset B -> Dataset B -> Train model B -> Model B
    - Task C: Feature data -> Preprocess C -> Processed feature data C; with Ground truth C -> Create dataset C -> Dataset C -> Train model C -> Model C
  56. 3. Improvement of data & model reusability: [Solution-1] Data pipeline integration

    - Shared by all tasks: Feature data -> Preprocess -> Processed feature data
    - Task A: Processed feature data, with Ground truth A -> Create dataset A -> Dataset A -> Train model A -> Model A
    - Task B: Processed feature data, with Ground truth B -> Create dataset B -> Dataset B -> Train model B -> Model B
    - Task C: Processed feature data, with Ground truth C -> Create dataset C -> Dataset C -> Train model C -> Model C
  57. 3. Improvement of data & model reusability: [Solution-1] Data pipeline integration

    - Shared by all tasks: Feature data -> Preprocess -> Processed feature data
    - Shared by all tasks: Create dataset API, which builds Dataset A, B, C from the processed feature data and Ground truth A, B, C
    - Per task: Dataset A -> Train model A -> Model A (likewise for Task B and Task C)
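
The idea of a single "Create dataset API" shared across tasks can be sketched as one function that joins the shared processed features with a task's labels on user id. The signature and data shapes are hypothetical; the deck does not show the actual API.

```python
def create_dataset(processed_features, ground_truth):
    """Join shared processed features with one task's labels on user id."""
    user_ids = sorted(set(processed_features) & set(ground_truth))
    features = [processed_features[u] for u in user_ids]
    labels = [ground_truth[u] for u in user_ids]
    return user_ids, features, labels


# Hypothetical shared features and two tasks with different labeled users.
processed = {"u1": [0.1, 0.9], "u2": [0.4, 0.2], "u3": [0.8, 0.5]}
truth_a = {"u1": "female", "u3": "male"}
truth_b = {"u2": "20-24", "u3": "40-44", "u9": "30-34"}  # u9 has no features

dataset_a = create_dataset(processed, truth_a)
dataset_b = create_dataset(processed, truth_b)
print(dataset_a)
print(dataset_b)
```

Each task only supplies its own ground truth; the preprocessing and the join logic are written once.
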
  58. 3. Improvement of data & model reusability: [Solution-2] ghee-models API: Further simplifying the use and modification of models

    - “ghee-models” = LINE’s proprietary ML models
    - Shared by all tasks: Feature data -> Preprocess -> Processed feature data
    - Shared by all tasks: Create dataset API, which builds Dataset A, B, C with Ground truth A, B, C
    - Shared by all tasks: ghee-models API, which produces Model A, B, C from Dataset A, B, C
  59. 3. Improvement of data & model reusability: [Solution-3] felib: Organizing preprocessing as a library

    - Util library “felib” (Util function 1, Util function 2, ...) is used by Preprocess
    - Shared by all tasks: Feature data -> Preprocess -> Processed feature data
    - Shared by all tasks: Create dataset API, which builds Dataset A, B, C with Ground truth A, B, C
    - Shared by all tasks: ghee-models API, which produces Model A, B, C from Dataset A, B, C
  60. System history

    - [Timeline figure, 2014 to 2021] Neural network from scratch, Theano & Mesos cluster, MPI introduced, GPU cluster introduced, ghee & Kubernetes cluster, Further improved!
  61. Agenda

    - Introduction to LINE’s user persona systems
    - Improvement points identified in the revamping
    - Future works
  63. Future works: [1] ghee-models upgrade: Reduce implementation for adding new models

    - AS-IS: every ML method has its own code for every step
      - ML Method α: Dataset -> Preprocess code α -> Train code α -> Predict code α -> Postprocess code α -> Model α, Output α
      - ML Method β: Dataset -> Preprocess code β -> Train code β -> Predict code β -> Postprocess code β -> Model β, Output β
      - ML Method γ: Dataset -> Preprocess code γ -> Train code γ -> Predict code γ -> Postprocess code γ -> Model γ, Output γ
  64. Future works: [1] ghee-models upgrade: Reduce implementation for adding new models

    - TO-BE: the steps share core code, and each ML method adds only its own part
      - Shared: Preprocess core code, Train core code, Predict core code, Postprocess core code
      - Method-specific parts of each core code: for α, for β, for γ
      - ML Method α -> Model α, Output α; ML Method β -> Model β, Output β; ML Method γ -> Model γ, Output γ
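
The TO-BE structure resembles a template-method design: the pipeline steps live in shared core code, and a new method only implements small hooks. The sketch below illustrates that pattern with toy methods. The class names, the hook names, and the trivial preprocessing and postprocessing are all hypothetical and do not reflect the ghee-models API.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Method(ABC):
    """Method-specific hooks; everything else lives in the shared core code."""

    @abstractmethod
    def fit(self, X, y):
        """Learn from feature rows X and labels y."""

    @abstractmethod
    def predict_one(self, x):
        """Return the predicted label for one feature row."""


class MajorityClass(Method):
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]

    def predict_one(self, x):
        return self.label


class NearestNeighbor(Method):
    def fit(self, X, y):
        self.examples = list(zip(X, y))

    def predict_one(self, x):
        def dist(example):
            return sum((a - b) ** 2 for a, b in zip(example[0], x))
        return min(self.examples, key=dist)[1]


def run_pipeline(method, train_X, train_y, test_X):
    """Shared core: preprocess -> train -> predict -> postprocess."""
    scale = max(abs(v) for row in train_X for v in row) or 1.0   # shared preprocess
    norm = lambda rows: [[v / scale for v in row] for row in rows]
    method.fit(norm(train_X), train_y)                           # shared train step
    predictions = [method.predict_one(x) for x in norm(test_X)]  # shared predict step
    return [str(p).upper() for p in predictions]                 # shared postprocess


train_X, train_y = [[1.0], [2.0], [9.0]], ["young", "young", "senior"]
for method in (MajorityClass(), NearestNeighbor()):
    print(type(method).__name__, run_pipeline(method, train_X, train_y, [[8.0]]))
```

Adding a third method means writing one small class with two hooks, while the four pipeline steps stay untouched.
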
  68. Future works: [2] Auto-persona: Enabling original segment estimation by anyone in LINE

    - Each LINE service side
      - Input: original ground truth (some users)
      - Output: original estimated segments (all users)
    - Auto-persona API
      - Feature data -> Preprocess -> Processed feature data
      - Processed feature data, with the original ground truth -> Create dataset -> Dataset -> Train model -> Model
    - Each service can generate its original segments.
    - All they need is the label data mapped to users.
    - No feature data is required.
    - Able to deliver more useful content to each and every LINE user