Data Analysis Supporting Policy Decisions of New Service LIVEBUY

Keiichiro Nagao / LINE

Self-Introduction
- Keiichiro Nagao
- Joined LINE Corporation as a Data Scientist in October 2020
- Working on data analysis for family service data projects, especially LIVEBUY
Org chart labels: Data Science center, ML & DS Planning Team, Machine Learning Solution Dept., Data Science Dept.1, Data Science Dept.2, Ad Data Dept.

Session Theme
- What if you are assigned to a new service as a data scientist?
- Supporting fundamental policy decisions seems impossible when service data is lacking
- Using multi-dimensional log data from other services causes compliance issues
- Data unique to LINE helps resolve these issues

Agenda
- What is LIVEBUY?
- Hypothesis
- Problems & Solutions
- y-features
- Result
- Application
- Session Summary

What is LIVEBUY?

Live commerce
- Pre-released in 2021/11
- Broadcasts various programs in the LINE app
- Users can purchase products with a few taps

Interactive Communication
- Users can also chat in broadcasts
- Comments usually praise products, greet presenters, or ask how to use a product

Service Growth
- The number of broadcasts increased 2.8x in this term
[Chart: Cumulative Count of Broadcasts, Apr-22 to Sep-22]

Hypothesis

Chatting in Broadcasts Stimulates Purchasing Products?
- This hypothesis was important for UI and UX improvement
- Verify whether chatting users turn into purchasers
- If so, programs should be more chat-oriented

Glance of Aggregation
- Chatting users showed a higher average number of purchases and a higher average purchase amount
- However, can we really take this at face value?

Difference of average between the two groups
Program    Times of Purchasing    Amount of Purchasing
Program1   68.3x                  102.1x
Program2   16.4x                  14.6x
Program3   12.4x                  11.6x

Problems & Solutions

Diagram
Problem -> Solution
- Selection Bias -> Rubin’s Counterfactual (Propensity Score)
- Cold Start Problem -> y-features
- Compliance Issues (Other Services Log Data) -> y-features

Selection Bias
- Arises when only part of the subjects are selected from the whole population
- Simple aggregation is often misleading
- In the LIVEBUY case, chatting users are likely to be more motivated to purchase
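The effect described above can be reproduced with a small simulation. This is only a sketch with made-up assumptions: a hidden `motivation` value per user drives both chatting and purchasing, and chatting itself has no effect on purchasing at all. The function name and parameters are hypothetical and not from the talk.

```python
import random

def simulate_naive_gap(n_users=20000, seed=0):
    """Return purchase rate of chatters minus purchase rate of non-chatters.

    Chatting has NO causal effect here: purchasing depends only on motivation.
    """
    rng = random.Random(seed)
    chat_outcomes, silent_outcomes = [], []
    for _ in range(n_users):
        motivation = rng.random()                # hidden confounder in [0, 1)
        chatted = rng.random() < motivation      # motivated users chat more
        purchased = 1 if rng.random() < motivation else 0  # and also buy more
        (chat_outcomes if chatted else silent_outcomes).append(purchased)
    chat_rate = sum(chat_outcomes) / len(chat_outcomes)
    silent_rate = sum(silent_outcomes) / len(silent_outcomes)
    return chat_rate - silent_rate

naive_gap = simulate_naive_gap()
print(f"naive gap with zero true effect: {naive_gap:.3f}")
```

Under these assumptions the gap comes out at roughly one third even though the true effect is zero, which is why a simple comparison of the two groups cannot be read as the effect of chatting.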

Rubin’s Counterfactual
- Ask what would have happened if a chatting user hadn’t chatted, and take the difference of averages as the effect of chatting
- In reality, such data is not available
- Nevertheless, users with similar covariates can compensate for it: $(y_1, y_0) \perp z \mid \boldsymbol{x}$

                                 Treatment (z = 1)       Control (z = 0)
Result when treated (y_1)        Treated group’s data    NA
Result when not treated (y_0)    NA                      Control group’s data

Propensity Score
- Create a model that predicts the probability $e_i$ of chatting from covariates
- Using $e_i$, the difference of averages is weighted correctly (IPW estimator)

$$\mathrm{ATE\ (Average\ Treatment\ Effect)} = E(Y_1) - E(Y_0) = \frac{\sum_{i=1}^{N} \frac{z_i}{e_i} y_i}{\sum_{j=1}^{N} \frac{z_j}{e_j}} - \frac{\sum_{i=1}^{N} \frac{1 - z_i}{1 - e_i} y_i}{\sum_{j=1}^{N} \frac{1 - z_j}{1 - e_j}}$$
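The IPW estimator above can be written out directly. The following is a minimal sketch, not the code used in the talk: `ipw_ate` is a hypothetical helper, the propensity scores are taken as given, and the toy data is invented so that the true effect is 1 in both strata.

```python
def ipw_ate(z, e, y):
    """Normalized IPW estimate of E(Y_1) - E(Y_0).

    z: 1 if the user chatted, else 0
    e: propensity score (probability of chatting) per user
    y: observed outcome per user
    """
    treated_num = sum(zi * yi / ei for zi, ei, yi in zip(z, e, y))
    treated_den = sum(zi / ei for zi, ei in zip(z, e))
    control_num = sum((1 - zi) * yi / (1 - ei) for zi, ei, yi in zip(z, e, y))
    control_den = sum((1 - zi) / (1 - ei) for zi, ei in zip(z, e))
    return treated_num / treated_den - control_num / control_den

# Invented toy data: two strata of five users each.
# Stratum A (e = 0.8): chatters score 5, the non-chatter scores 4.
# Stratum B (e = 0.2): the chatter scores 2, non-chatters score 1.
z = [1, 1, 1, 1, 0] + [1, 0, 0, 0, 0]
e = [0.8] * 5 + [0.2] * 5
y = [5, 5, 5, 5, 4] + [2, 1, 1, 1, 1]

treated = [yi for zi, yi in zip(z, y) if zi == 1]
control = [yi for zi, yi in zip(z, y) if zi == 0]
naive_diff = sum(treated) / len(treated) - sum(control) / len(control)
ate = ipw_ate(z, e, y)
print(f"naive: {naive_diff:.2f}, IPW: {ate:.2f}")
```

On this toy data the naive difference is 2.8, while the weighted estimate recovers the built-in effect of 1, because users from the under-represented cells are weighted up.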

Cold Start Problem
- New services such as LIVEBUY do not have enough data to train a model
- Multi-dimensional data from other services causes compliance issues
- However, at LINE, ML provides a well-thought-out solution
[Diagram: Users, Service Data, Alternative Data?]

y-features

Collaboration between Teams
- y-features were developed by the Machine Learning Solution Department
- DS & ML departments collaborate on improving recommendation engines, etc.
- ML & DS Planning Team plays a great role in connecting us
Org chart labels: Data Science center, ML & DS Planning Team, Machine Learning Solution Dept., Data Science Dept.1, Data Science Dept.2, Ad Data Dept.

Overview
- Transformed from z-features, which cover cross-sectional service usage of LINE users
- Over 30 types of data, such as LINE News, LINE Sticker, and AD reaction, are available
- Mitigates two problems of z-features: being interpretable and being extremely sparse
z-features: https://speakerdeck.com/line_devday2019/feature-as-a-service-at-data-labs
ML User Friendly

Data Pipeline
[Diagram components: LINE Apps, Logs (client), Service(s), Logs (server), Kafka, IU (Information Universe), Integration Jobs (Ingestion, Integration), Filter (only permitted data can be used), z-features-meta, z-features, Contents, Representation Learning, y-features, Feature Store]

Notes shown on the diagram across slides:
- User logs of various services that are extremely sparse
- Content info such as news title and sticker price
- Dense vector of user log info is generated by GCN
  GCN: https://speakerdeck.com/line_devday2020/distributed-computing-library-for-big-data-ml-applications?slide=44
- Cannot be reverted to original user logs (z-features)

Datamart for Modeling
- Uses each user's dense vectors of other service logs as covariates

        y-features (covariates)
User    Service A                          Service B                          ・・・    Service Z                          Chat (target)
・・・    (xxx, xxx, xxx, ・・・, xxx, xxx)    (xxx, xxx, xxx, ・・・, xxx, xxx)    ・・・    (xxx, xxx, xxx, ・・・, xxx, xxx)    0
・・・    (xxx, xxx, xxx, ・・・, xxx, xxx)    (xxx, xxx, xxx, ・・・, xxx, xxx)    ・・・    (xxx, xxx, xxx, ・・・, xxx, xxx)    1
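A datamart of this shape could be assembled roughly as follows. This is a sketch under assumptions: the function name, the dictionary layout, the service names, and the rule of dropping users who lack a vector for any service are all hypothetical, and the vectors are invented.

```python
def build_datamart(y_features_by_service, chat_flags):
    """Concatenate per-service dense vectors into one covariate row per user.

    y_features_by_service: {service name: {user id: dense vector}}
    chat_flags: {user id: 1 if the user chatted, else 0}
    """
    services = sorted(y_features_by_service)
    rows = []
    for user_id in sorted(chat_flags):
        vectors = [y_features_by_service[s].get(user_id) for s in services]
        if any(v is None for v in vectors):
            continue  # assumption: skip users missing a vector for any service
        covariates = [value for vector in vectors for value in vector]
        rows.append({"user": user_id,
                     "covariates": covariates,
                     "chat": chat_flags[user_id]})
    return rows

# Invented example values; real y-features are not shown in the slides.
y_features_by_service = {
    "service_a": {"u1": [0.1, -0.3], "u2": [0.7, 0.2]},
    "service_b": {"u1": [0.5, 0.0], "u2": [-0.4, 0.9], "u3": [0.2, 0.2]},
}
chat_flags = {"u1": 1, "u2": 0, "u3": 1}

datamart = build_datamart(y_features_by_service, chat_flags)
for row in datamart:
    print(row)
```

Each row then holds one user, the concatenated covariates, and the chat flag used as the target of the propensity model.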

Result

Model Validation
- A logistic regression model with L2 regularization satisfied the statistical criteria
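For readers unfamiliar with the model type, here is a minimal sketch of L2-regularized logistic regression fitted by plain gradient descent. It is not the implementation used in the talk, and the slide does not say which statistical criteria were checked; the function names, hyperparameters, and toy data are assumptions.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    exp_t = math.exp(t)
    return exp_t / (1.0 + exp_t)

def fit_logistic_l2(X, z, l2=0.1, lr=0.1, n_iter=2000):
    """Fit weights w and intercept b by gradient descent.

    The L2 penalty (l2 / 2) * ||w||^2 is applied to w only, not to b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, zi in zip(X, z):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - zi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict_propensity(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Invented one-dimensional covariate; z = 1 means the user chatted.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
z = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic_l2(X, z)
propensity = predict_propensity(X, w, b)
print([round(p, 3) for p in propensity])
```

The predicted probabilities play the role of the propensity scores; the penalty keeps them away from 0 and 1, which helps keep inverse-probability weights from becoming extreme.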

Covariate Distribution
- The difference of means was within ±0.2 for each feature
- Correction by propensity score succeeded
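A balance check of this kind is often done with the standardized mean difference (SMD) of each covariate, before and after weighting. The sketch below assumes that the ±0.2 band refers to such a standardized difference, which the slide does not state; the helper names and toy data are hypothetical.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_variance(values, weights):
    mean = weighted_mean(values, weights)
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / sum(weights)

def standardized_mean_difference(x, z, weights=None):
    """(treated mean - control mean) / pooled standard deviation.

    Assumes at least one group has non-zero variance in x.
    """
    if weights is None:
        weights = [1.0] * len(x)
    xt = [xi for xi, zi in zip(x, z) if zi == 1]
    wt = [wi for wi, zi in zip(weights, z) if zi == 1]
    xc = [xi for xi, zi in zip(x, z) if zi == 0]
    wc = [wi for wi, zi in zip(weights, z) if zi == 0]
    pooled_sd = math.sqrt((weighted_variance(xt, wt) + weighted_variance(xc, wc)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd

# Invented toy data: a binary covariate that strongly predicts chatting.
x = [1, 1, 1, 1, 1] + [0, 0, 0, 0, 0]
z = [1, 1, 1, 1, 0] + [1, 0, 0, 0, 0]
e = [0.8] * 5 + [0.2] * 5
ipw_weights = [1 / ei if zi == 1 else 1 / (1 - ei) for zi, ei in zip(z, e)]

raw_smd = standardized_mean_difference(x, z)
weighted_smd = standardized_mean_difference(x, z, ipw_weights)
print(f"raw SMD: {raw_smd:.2f}, weighted SMD: {weighted_smd:.2f}")
```

On this toy data the raw SMD is 1.5 and the weighted SMD is about 0, so the covariate is balanced after weighting.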

Estimation
- Using the propensity score, the bias in the raw difference is corrected
- The weighted difference still indicated a higher lift in both metrics
- The hypothesis “Chatting in broadcasts stimulates purchasing products” is supported

Program    Metrics                     Raw Difference    Weighted Difference
Program2   Avg Times of Purchasing     16.4x             10.1x
Program2   Avg Amount of Purchasing    14.6x             6.4x
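The table reports multiples ("x"), so one plausible reading is that each value is the ratio of the chatting group's average to the non-chatting group's average, raw or weighted. The slide does not say how the values were computed, so the sketch below is an assumption; `lift_ratio` and the toy data are hypothetical.

```python
def lift_ratio(z, y, weights=None):
    """Weighted mean outcome of chatters divided by that of non-chatters."""
    if weights is None:
        weights = [1.0] * len(z)
    treated_sum = sum(w * yi for zi, yi, w in zip(z, y, weights) if zi == 1)
    treated_w = sum(w for zi, w in zip(z, weights) if zi == 1)
    control_sum = sum(w * yi for zi, yi, w in zip(z, y, weights) if zi == 0)
    control_w = sum(w for zi, w in zip(z, weights) if zi == 0)
    return (treated_sum / treated_w) / (control_sum / control_w)

# Invented toy data: two strata with propensity 0.8 and 0.2.
z = [1, 1, 1, 1, 0] + [1, 0, 0, 0, 0]
e = [0.8] * 5 + [0.2] * 5
y = [5, 5, 5, 5, 4] + [2, 1, 1, 1, 1]
ipw_weights = [1 / ei if zi == 1 else 1 / (1 - ei) for zi, ei in zip(z, e)]

raw_lift = lift_ratio(z, y)
weighted_lift = lift_ratio(z, y, ipw_weights)
print(f"raw: {raw_lift:.2f}x, weighted: {weighted_lift:.2f}x")
```

On this toy data the raw lift of 2.75x shrinks to 1.40x after weighting but stays above 1, the same qualitative pattern as in the table.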

Application

UI Improvement
[Figure: Chat Scrolling, As-Is and To-Be]

User Engagement
- As program direction shifts toward making chatting easier, LIVEBUY increases paid users

Session Summary

Session Summary
- Supposed a situation where you are assigned to a new service such as LIVEBUY
- Found it hard to analyze because service data is lacking and multi-dimensional data from other services is also hard to use
- In LINE Data Science Center, y-features enable us to overcome these problems
- Thanks to that, the propensity score analysis succeeded and supported the fundamental policy decision

Thank you

More Decks by Tech-Verse2022