
Data Analysis Supporting Policy Decisions of New Service LIVEBUY

Keiichiro Nagao (LINE / Family Service Data Science Team / Data Scientist)

https://tech-verse.me/ja/sessions/40
https://tech-verse.me/en/sessions/40
https://tech-verse.me/ko/sessions/40

Tech-Verse2022

November 17, 2022
Transcript

  1. Self-Introduction - Keiichiro Nagao - Joined LINE Corporation as a Data Scientist in October 2020 - Working on data analysis for family service projects, especially LIVEBUY (Org chart: ML & DS Planning Team, Machine Learning Solution Dept., Data Science Dept. 1, Data Science Dept. 2 and Ad Data Dept. under the Data Science Center)
  2. Session Theme - What if you are assigned to a new service as a data scientist? - Supporting fundamental policy decisions seems impossible given the lack of service data - Multi-dimensional log data from other services raises compliance issues - Data unique to LINE helps resolve these issues
  3. Agenda - What is LIVEBUY? - Hypothesis - Problems &

    Solutions - y-features - Result - Application - Session Summary
  4. Live Commerce - Pre-released in November 2021 - Broadcasts various programs in the LINE app - Users can purchase products with a few taps
  5. Interactive Communication - Users can also chat during broadcasts - Comments typically praise products, greet presenters, or ask how to use a product
  6. Service Growth - The number of broadcasts increased 2.8x in this term (Chart: cumulative count of broadcasts, Apr 2022 to Sep 2022)
  7. Chatting in Broadcasts Stimulates Purchasing Products? - This hypothesis was important for UI and UX improvement - Verify whether chatting users turn into purchasers - If so, programs should become more chat-oriented
  8. Glance at Aggregation - Chatting users showed higher averages in both times of purchasing and amount of purchasing - However, can we really take it for granted?

    Difference of averages between the two groups:
    Program  | Times of Purchasing | Amount of Purchasing
    Program1 | 68.3x               | 102.1x
    Program2 | 16.4x               | 14.6x
    Program3 | 12.4x               | 11.6x
  10. Diagram - Problem: Selection Bias → Solution: Rubin's Counterfactual (Propensity Score) - Problem: Cold Start Problem / Compliance Issues of Other Services' Log Data → Solution: y-features
  12. Selection Bias - Occurs when only a part of the subjects is selected from the whole population - Simple aggregation therefore often misleads - In the LIVEBUY case, chatting users are likely to be more motivated to purchase in the first place
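The bias on this slide can be sketched with a toy simulation (entirely hypothetical numbers, not LIVEBUY data): an unobserved "motivation" drives both chatting and purchasing, so the naive group comparison shows a large gap even though chatting has no causal effect at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder: motivated users both chat more and buy more.
motivation = rng.normal(size=n)
chat = rng.random(n) < 1 / (1 + np.exp(-2 * motivation))
# Purchases depend only on motivation; chatting itself has zero effect.
purchases = np.maximum(0.0, motivation + rng.normal(size=n))

# Naive comparison of group means overstates the "effect" of chatting.
naive_diff = purchases[chat].mean() - purchases[~chat].mean()
print(f"naive difference in avg purchases: {naive_diff:.2f}")  # clearly positive
```

This naive comparison is exactly the "glance" aggregation from the earlier slide; the confounder is what makes it misleading.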
  14. Rubin's Counterfactual - Imagine what would have happened if a chatting user had not chatted, and take the difference of averages as the effect of chatting - In reality, such counterfactual data is not available - Nevertheless, users with similar covariates can compensate for it: (y1, y0) ⊥ z | x

    Outcome                      | Treatment (z = 1)    | Control (z = 0)
    y1 (result when treated)     | treated group's data | NA
    y0 (result when not treated) | NA                   | control group's data
  17. Propensity Score - Build a model that predicts the probability e_i of chatting from the covariates - Using e_i, the difference of averages can be correctly weighted (IPW estimator):

    ATE (Average Treatment Effect) = E[Y1] - E[Y0]
      = ( Σ_{i=1}^{N} (z_i / e_i) y_i ) / ( Σ_{j=1}^{N} z_j / e_j )
      - ( Σ_{i=1}^{N} ((1 - z_i) / (1 - e_i)) y_i ) / ( Σ_{j=1}^{N} (1 - z_j) / (1 - e_j) )
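A minimal numpy sketch of this IPW estimator (function and variable names are mine, not from the talk):

```python
import numpy as np

def ipw_ate(y, z, e):
    """IPW (inverse probability weighting) estimate of the ATE.

    y: outcomes, z: treatment indicator (1 = chatted), e: propensity scores.
    Weights are normalised within each group, matching the IPW estimator.
    """
    y, z, e = (np.asarray(a, dtype=float) for a in (y, z, e))
    treated = np.sum(z * y / e) / np.sum(z / e)
    control = np.sum((1 - z) * y / (1 - e)) / np.sum((1 - z) / (1 - e))
    return treated - control
```

On simulated data with a known effect and the true propensities, the estimate recovers the effect; in practice e_i comes from the fitted propensity model and is usually clipped away from 0 and 1 to keep the weights stable.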
  19. Cold Start Problem - New services such as LIVEBUY do not have enough data to train a model - Multi-dimensional data from other services causes compliance issues - However, at LINE, ML provides a well-thought-out solution (Diagram: users, service data vs. alternative data)
  21. Collaboration between Teams - Developed by the Machine Learning Solution Department - The DS and ML departments collaborate on improving recommendation engines, etc. - The ML & DS Planning Team plays a great role in connecting us (Org chart: ML & DS Planning Team, Machine Learning Solution Dept., Data Science Dept. 1, Data Science Dept. 2 and Ad Data Dept. under the Data Science Center)
  22. Overview - Transformed from z-features, which cover the cross-sectional service usage of LINE users - Over 30 types of data are available, such as LINE News, LINE Sticker and ad reactions - Mitigates z-features' two problems: they are directly interpretable (a compliance concern) and extremely sparse z-features: https://speakerdeck.com/line_devday2019/feature-as-a-service-at-data-labs
  23. Data Pipeline (Diagram) - Sources: service logs (server) and LINE app logs (client) flow through Kafka into IU (Information Universe) - Filter: only permitted data can be used - Integration Jobs (ingestion, integration) produce z-features, z-features-meta and Contents - Representation Learning produces y-features, which are served from the Feature Store
  24. Data Pipeline - z-features: user logs of various services, which are extremely sparse
  26. Data Pipeline - Contents: content info such as news titles and sticker prices
  27. Data Pipeline - Representation Learning: a dense vector of user log info is generated by a GCN GCN: https://speakerdeck.com/line_devday2020/distributed-computing-library-for-big-data-ml-applications?slide=44
  28. Data Pipeline - y-features cannot be reverted to the original user logs (z-features)
  29. Datamart for Modeling - Use the dense vector of each user's other-service logs as covariates (Table: one row per user covering Service A, Service B, ..., Service Z, with a y-features vector as covariates and Chat (0/1) as the target; values elided as xxx)
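Fitting the propensity model on such a datamart could look like the sketch below; the random vectors stand in for y-features, and every name here is illustrative, not the real internal schema.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in datamart: one row per user, a dense y-features-like vector
# as covariates and a binary "chatted" target.
X = rng.normal(size=(5000, 16))
chat = (rng.random(5000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

# Propensity model: predicted probability of chatting given covariates.
model = LogisticRegression(max_iter=1000).fit(X, chat)
e = model.predict_proba(X)[:, 1]  # e_i in (0, 1), used for IPW weighting
```

Any classifier that outputs calibrated probabilities would do here; logistic regression is simply the most common baseline for propensity scores.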
  30. Covariate Distribution - The difference of means fell within ±0.2 for each feature - The propensity-score correction succeeded
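One common way to run this balance check is the standardized mean difference per covariate, before and after weighting; below is a sketch (my own helper, assuming the slide's ±0.2 is the usual standardized threshold):

```python
import numpy as np

def smd(x, z, w):
    """Weighted standardized mean difference of one covariate.

    x: covariate values, z: treatment indicator (0/1), w: weights
    (np.ones_like(x) for the raw check, IPW weights after correction).
    |SMD| < 0.2 is a common rule of thumb for acceptable balance.
    """
    m1 = np.average(x[z == 1], weights=w[z == 1])
    m0 = np.average(x[z == 0], weights=w[z == 0])
    v1 = np.average((x[z == 1] - m1) ** 2, weights=w[z == 1])
    v0 = np.average((x[z == 0] - m0) ** 2, weights=w[z == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)
```

Comparing smd(x, z, np.ones_like(x)) against smd(x, z, ipw_weights) for each feature reproduces the before/after picture: weighting should pull each covariate's SMD inside ±0.2.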
  31. Estimation - Using the propensity score, the bias in the raw difference is corrected - The weighted difference still indicates a higher lift in both metrics - The hypothesis "chatting in broadcasts stimulates purchasing products" is supported

    Program2 metrics         | Raw Difference | Weighted Difference
    Avg Times of Purchasing  | 16.4x          | 10.1x
    Avg Amount of Purchasing | 14.6x          | 6.4x
  32. Session Summary - Considered the situation where you are assigned to a new service such as LIVEBUY - Analysis was hard given the lack of service data, and multi-dimensional data from other services raises compliance issues - At LINE's Data Science Center, y-features enabled us to overcome these problems - Thanks to that, the propensity-score analysis succeeded and supported a fundamental policy decision