Data Analysis Supporting Policy Decisions of New Service LIVEBUY

Keiichiro Nagao (LINE / Family Service Data Science Team / Data Scientist)

https://tech-verse.me/ja/sessions/40
https://tech-verse.me/en/sessions/40
https://tech-verse.me/ko/sessions/40

Tech-Verse2022

November 17, 2022

Transcript

  1. Data Analysis Supporting Policy Decisions of New Service LIVEBUY
    Keiichiro Nagao / LINE
  2. Self-Introduction
    [Org chart labels: ML & DS Planning Team; Data Science center; Machine Learning Solution Dept.; Data Science Dept.2; Data Science Dept.1; Ad Data Dept.]
    - Keiichiro Nagao
    - Joined LINE Corporation as a Data Scientist in October 2020
    - Working on data analysis for family service data projects, especially LIVEBUY
  3. Intro

  4. Session Theme
    - What if you are assigned to a new service as a data scientist?
    - Supporting fundamental policy decisions seems impossible when service data is lacking
    - Multi-dimensional log data from other services causes compliance issues
    - Data unique to LINE helps to resolve these issues
  5. Agenda
    - What is LIVEBUY?
    - Hypothesis
    - Problems & Solutions
    - y-features
    - Result
    - Application
    - Session Summary
  6. What is LIVEBUY?

  7. Live commerce
    - Pre-released in November 2021
    - Broadcasting various programs in the LINE app
    - Users are able to purchase products with a few taps
  8. Interactive Communication
    - Users can also chat in broadcasts
    - Comments usually praise products, say hello to presenters, or ask how to use a product
  9. Service Growth
    - The number of broadcasts increased 2.8x in this term
    [Chart: Cumulative Count of Broadcasts, Apr-22 to Sep-22]
  10. Hypothesis

  11. Chatting in Broadcasts Stimulates Purchasing Products?
    - This hypothesis was important for UI and UX improvement
    - Verify whether chatting users turn into purchasers
    - If so, programs should be more chatting-oriented
  12. Glance of Aggregation
    - Chatting users showed higher averages for both times of purchasing and amount of purchasing
    - However, can we really take it for granted?

    Difference of Avg between two groups
    Program     Times of Purchasing   Amount of Purchasing
    Program1    68.3x                 102.1x
    Program2    16.4x                 14.6x
    Program3    12.4x                 11.6x
  14. Problems & Solutions

  15. Diagram
    [Diagram of problems and solutions. Labels: Selection Bias; Cold Start Problem; Compliance Issues; Other Services Log Data; Rubin’s Counterfactual (Propensity Score); y-features]
  17. Selection Bias
    - Arises when only a part of the subjects is selected from the whole population
    - Simple aggregation often misleads
    - In the LIVEBUY case, chatting users are likely to be more motivated to purchase
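
To make the selection-bias point concrete, here is a minimal simulation sketch. It is not from the talk: the `motivation` variable, the coefficient 5.0, and the true effect of 1.0 are all made-up assumptions. A hidden motivation level drives both chatting and purchasing, so the plain difference of averages overstates the effect of chatting.

```python
import random

def simulate(n_users=20000, true_effect=1.0, seed=0):
    """Return the naive difference of average purchases (chatters minus non-chatters).

    All numbers here are illustrative assumptions, not LIVEBUY data.
    """
    rng = random.Random(seed)
    chat_purchases, no_chat_purchases = [], []
    for _ in range(n_users):
        motivation = rng.random()              # hidden confounder in [0, 1)
        chatted = rng.random() < motivation    # motivated users chat more often
        # Purchases depend on motivation, plus a fixed true effect of chatting.
        purchases = 5.0 * motivation + rng.gauss(0, 0.5)
        if chatted:
            purchases += true_effect
            chat_purchases.append(purchases)
        else:
            no_chat_purchases.append(purchases)
    mean_chat = sum(chat_purchases) / len(chat_purchases)
    mean_no_chat = sum(no_chat_purchases) / len(no_chat_purchases)
    return mean_chat - mean_no_chat

naive_difference = simulate()
print(f"true effect: 1.00, naive difference of averages: {naive_difference:.2f}")
```

Under these assumptions the naive difference is about 1 + 5 × (2/3 − 1/3) ≈ 2.67 in expectation, well above the true effect of 1.0, because chatters have higher average motivation than non-chatters.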
  19. Rubin’s Counterfactual
    - Ask what would have happened if a chatting user had not chatted, and take the difference of averages as the effect of chatting
    - In reality, such data is not available
    - Nevertheless, users having similar covariates can compensate for it

    $(y_1, y_0) \perp z \mid \boldsymbol{x}$

                                     Treatment (z = 1)       Control (z = 0)
    Result when treated y_1          Treated group’s data    NA
    Result when not treated y_0      NA                      Control group’s data
  22. Propensity Score
    - Create a model that predicts the probability $e_i$ of chatting from the covariates
    - Using $e_i$, the difference of averages is weighted correctly (IPW estimator)

    $$\mathrm{ATE}\ (\text{Average Treatment Effect}) = E(Y_1) - E(Y_0) = \frac{\sum_{i=1}^{N} \frac{z_i}{e_i} y_i}{\sum_{j=1}^{N} \frac{z_j}{e_j}} - \frac{\sum_{i=1}^{N} \frac{1 - z_i}{1 - e_i} y_i}{\sum_{j=1}^{N} \frac{1 - z_j}{1 - e_j}}$$
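
The IPW estimator on this slide weights each treated user by 1/e and each control user by 1/(1 − e), then normalizes by the sum of the weights in each group. A minimal sketch in plain Python follows. The function name `ipw_ate` and the toy numbers are assumptions for illustration, not the speaker's code or data.

```python
def ipw_ate(y, z, e):
    """Normalized IPW estimate of E(Y_1) - E(Y_0).

    y: outcomes; z: 1 if treated (chatted), else 0; e: propensity scores in (0, 1).
    """
    if not (len(y) == len(z) == len(e)):
        raise ValueError("y, z and e must have the same length")
    # Treated users are weighted by 1/e, control users by 1/(1 - e).
    treated_num = sum(zi / ei * yi for yi, zi, ei in zip(y, z, e))
    treated_den = sum(zi / ei for zi, ei in zip(z, e))
    control_num = sum((1 - zi) / (1 - ei) * yi for yi, zi, ei in zip(y, z, e))
    control_den = sum((1 - zi) / (1 - ei) for zi, ei in zip(z, e))
    return treated_num / treated_den - control_num / control_den

# Toy data, made up for illustration only.
y = [10.0, 4.0, 6.0, 2.0]
z = [1, 1, 0, 0]
e = [0.8, 0.5, 0.5, 0.2]
ate = ipw_ate(y, z, e)
print(f"IPW estimate of the ATE: {ate:.3f}")
```

When every propensity score is the same (for example 0.5 for all users), the weights cancel and the estimate reduces to the plain difference of group averages.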
  24. Cold Start Problem
    - New services such as LIVEBUY do not have enough data to train a model
    - Multi-dimensional data from other services causes compliance issues
    - However, at LINE, the ML department provides a well-thought-out solution
    [Diagram labels: Users; Service Data; Alternative Data; ?]
  26. y-features

  27. Collaboration between Teams
    [Org chart labels: ML & DS Planning Team; Data Science center; Data Science Dept.2; Machine Learning Solution Dept.; Data Science Dept.1; Ad Data Dept.]
    - Developed by Machine Learning Solution Department
    - DS & ML Departments collaborate on improvement of recommendation engines etc.
    - ML & DS Planning Team plays a great role in connecting us
  28. Overview
    - Transformed from z-features, which cover cross-sectional service usage of LINE users
    - Over 30 types of data such as LINE News, LINE Sticker and AD reaction are available
    - Mitigate two problems of z-features: being interpretable and being extremely sparse
    z-features: https://speakerdeck.com/line_devday2019/feature-as-a-service-at-data-labs
    [Diagram labels: ML User Friendly]
  29. Data Pipeline
    [Pipeline diagram. Labels: IU (Information Universe); Integration Jobs; Ingestion, Integration; z-features-meta; z-features; y-features; Representation Learning; Feature Store; Contents; Logs (server); Service(s); Kafka; LINE Apps; Logs (client); Filter. Note on the diagram: only permitted data can be used]
  30. Data Pipeline (same diagram as slide 29)
    - Highlighted note: User logs of various services that are extremely sparse
  32. Data Pipeline (same diagram as slide 29)
    - Highlighted note: Content info such as news title and sticker price
  33. Data Pipeline (same diagram as slide 29)
    - Highlighted note: Dense vector of user log info is generated by GCN
    GCN: https://speakerdeck.com/line_devday2020/distributed-computing-library-for-big-data-ml-applications?slide=44
  34. Data Pipeline (same diagram as slide 29)
    - Highlighted note: Cannot be reverted to original user logs (z-features)
  35. Datamart for Modeling
    - Using each user's dense vectors of other service logs as covariates

    User   y-features (covariates)                                                                Chat (target)
           Service A                      Service B                      ...   Service Z
    ...    (xxx, xxx, xxx, ..., xxx, xxx)  (xxx, xxx, xxx, ..., xxx, xxx)  ...   (xxx, xxx, xxx, ..., xxx, xxx)   0
    ...    (xxx, xxx, xxx, ..., xxx, xxx)  (xxx, xxx, xxx, ..., xxx, xxx)  ...   (xxx, xxx, xxx, ..., xxx, xxx)   1
  36. Result

  37. Model Validation
    - A logistic regression model with L2 regularization satisfied the statistical criteria
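
As a rough sketch of how a propensity model of this kind can be fitted, the block below trains an L2-regularized logistic regression by batch gradient descent using only the standard library. Everything in it is an assumption for illustration: the synthetic three-dimensional covariates stand in for the dense user vectors, and the hyperparameters (`l2`, `lr`, `n_iter`) are arbitrary. The talk does not say which library or settings were used.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    et = math.exp(t)
    return et / (1.0 + et)

def fit_propensity_model(X, z, l2=0.01, lr=0.5, n_iter=500):
    """Fit L2-regularized logistic regression by batch gradient descent.

    X: list of covariate vectors; z: list of 0/1 treatment flags.
    Returns (weights, bias). The bias term is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, zi in zip(X, z):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - zi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average log-loss gradient plus the gradient of (l2 / 2) * ||w||^2.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity_scores(X, w, b):
    """Predicted probability of treatment (chatting) for each user."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Synthetic stand-in for dense user vectors (illustrative only).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
z = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]

w, b = fit_propensity_model(X, z)
e = propensity_scores(X, w, b)
print("weights:", [round(wj, 2) for wj in w], "bias:", round(b, 2))
```

The predicted scores `e` are what an IPW estimator would consume as propensity scores. In practice a tested library implementation would usually be preferable to a hand-written loop.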
  38. Covariate Distribution
    - The difference of means was within ±0.2 for each feature
    - Correction by propensity score succeeded
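
One common way to check covariate balance is the standardized mean difference (SMD) between the treated and control groups, before and after inverse-probability weighting. The sketch below assumes that the slide's "difference of mean" refers to this kind of standardized quantity, which the transcript does not confirm. The function names and the toy data are hypothetical.

```python
import math
import statistics

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def standardized_mean_difference(x, z, e=None):
    """SMD of one covariate between treated (z = 1) and control (z = 0).

    With e=None the raw SMD is returned. Otherwise each user is weighted by
    the inverse probability of the treatment they actually received.
    """
    if e is None:
        w = [1.0] * len(x)  # no correction: raw comparison
    else:
        w = [1.0 / ei if zi == 1 else 1.0 / (1.0 - ei) for zi, ei in zip(z, e)]
    x_t = [xi for xi, zi in zip(x, z) if zi == 1]
    x_c = [xi for xi, zi in zip(x, z) if zi == 0]
    w_t = [wi for wi, zi in zip(w, z) if zi == 1]
    w_c = [wi for wi, zi in zip(w, z) if zi == 0]
    diff = weighted_mean(x_t, w_t) - weighted_mean(x_c, w_c)
    # Scale by the pooled unweighted standard deviation (one common convention).
    pooled_sd = math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2)
    return diff / pooled_sd

# Toy data: users with x = 1 chat with probability 0.8, users with x = 0 with 0.2.
x = [1.0] * 100 + [0.0] * 100
z = [1] * 80 + [0] * 20 + [1] * 20 + [0] * 80
e = [0.8] * 100 + [0.2] * 100

raw_smd = standardized_mean_difference(x, z)
weighted_smd = standardized_mean_difference(x, z, e)
print(f"raw SMD: {raw_smd:.2f}, weighted SMD: {weighted_smd:.2f}")
```

In this toy setup the raw SMD is far outside ±0.2, while the weighted SMD is essentially zero because the propensity scores used are the true ones.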
  39. Estimation
    - Using the propensity score, bias in the raw difference should be corrected
    - The weighted difference still indicated a higher lift in both indices
    - Hypothesis “Chatting in broadcasts stimulates purchasing products” is supported

    Program     Metrics                     Raw Difference   Weighted Difference
    Program2    Avg Times of Purchasing     16.4x            10.1x
    Program2    Avg Amount of Purchasing    14.6x            6.4x
  40. Application

  41. UI Improvement: Chat Scrolling (As-Is / To-Be)

  42. User Engagement
    - As program direction shifts toward making chatting easier, LIVEBUY increases paid users
  43. Session Summary

  44. Session Summary
    - Supposed a situation where you are assigned to a new service such as LIVEBUY
    - Found it hard to run analysis given the lack of service data and the compliance issues around multi-dimensional data from other services
    - In LINE Data Science Center, y-features enable us to overcome these problems
    - Thanks to that, propensity score analysis succeeded and supported the fundamental policy decision
  45. Thank you