Slide 1

2019 DevDay
Building a Smart Recommender System Across LINE Services
> Jun Namikawa
> LINE Machine Learning Team Fellow

Slide 2

Introduction

Slide 3

LINE Services

Slide 4

Smart Channel
> Display recommended content and advertisements at the top of the chat tab

Slide 5

No content

Slide 6

Concept of Smart Channel (Overview)
Feed / Contents / Personalize

Slide 7

History of Smart Channel Country: JP

Slide 8

History of Smart Channel Country: JP

Slide 9

Related Sessions
> Day 1: B1-2 14:30-15:10 The Art of Smart Channel: Continuous Improvements in Smart Channel Platform/Contents
> Day 2: C2-1 12:00-12:40 "LINE-Like" Product Management
> Poster Session: 13:40-14:20 / 15:30-16:10 (both days)

Slide 10

ML Architecture

Slide 11

Recommender System for Smart Channel
Constraints:
> Cooperation with existing recommender systems
> Cold start problem
> Scalability

Slide 12

Many Recommender Systems Exist in LINE
Each system has a different:
> Implementation
> Algorithm
> Objective

Slide 13

Smart Channel Current Stats (Global, 2019-10)
> Impressions / Day: 500M
> Contents / Day: 60K+
> Global DAU: 100M+

Slide 14

Only New Content Has Value

Slide 15

Recommender System Architecture
> Recommender systems for each service supply Recommended Items (Candidates)
> Ranker receives an item request with a User ID from the LINE App and returns the top-k items for each user
> Trainer consumes events (imp, click, etc.) and updates the model parameters used by the Ranker
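
The request path above can be sketched as follows: per-service recommenders supply candidates, and the Ranker scores them per user and returns the top k. This is a minimal illustration; the function name `rank_top_k` and the toy scores are assumptions, not LINE's actual API (the real Ranker uses contextual bandits rather than a fixed argmax, as the next slide shows).

```python
from typing import Callable, List

def rank_top_k(user_id: str,
               candidates: List[str],
               score: Callable[[str, str], float],
               k: int) -> List[str]:
    """Score every candidate item for this user and return the k best."""
    ranked = sorted(candidates, key=lambda item: score(user_id, item), reverse=True)
    return ranked[:k]

# Toy expected scores standing in for the per-item prediction models.
expected = {"itemA": 0.7, "itemB": 0.4, "itemC": 0.6, "itemD": 0.1}
top2 = rank_top_k("user1", list(expected), lambda u, i: expected[i], k=2)
```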

Slide 16

Ranker
Current expected scores: Item A 0.7, Item B 0.4, Item C 0.6, Item D 0.1
> Ranker chooses an item from candidates A, B, C … by using contextual bandits
> Each expected score is computed by a prediction model corresponding to the item

Slide 17

Prediction Model
Bayesian Factorization Machine (FM) as an Arm of Contextual Bandits
> Imp: 0.5, Click: 1.0, Mute: 0.0
> Balance Exploration-Exploitation Tradeoff
> Laplace Approximation
Inputs: User ID (Embedding), Item ID (Embedding), User Features (Gender, Age, …), Other Features (Timestamp, …) → Bayesian FM → Output

Slide 18

Parameter Server for Distributed ML
> Trainer workers compute model updates (Δw) from events and push them to the Parameter Server
> Ranker executors pull the current parameters (W) to serve content requests from the LINE App
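
The push/pull cycle on this slide can be reduced to a tiny in-process sketch. This is purely illustrative: a real deployment shards W across servers and communicates over RPC, and the class and method names below are made up for the example.

```python
import numpy as np

class ParameterServer:
    """Toy single-shard parameter server: workers push deltas, executors pull W."""

    def __init__(self, dim: int):
        self.w = np.zeros(dim)        # global model parameters W

    def push(self, delta: np.ndarray) -> None:
        self.w += delta               # trainer worker sends its update Δw

    def pull(self) -> np.ndarray:
        return self.w.copy()          # ranker executor fetches a snapshot of W

ps = ParameterServer(dim=3)
ps.push(np.array([0.1, 0.0, -0.2]))   # worker 1's Δw
ps.push(np.array([0.0, 0.3, 0.0]))    # worker 2's Δw
snapshot = ps.pull()                  # what an executor would serve with
```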

Slide 19

Asynchronous Distributed Online Learning
Example of asynchronous communication between the parameter server and trainers. In this situation, learning does not work well if the parameter server simply accumulates gradients.

Slide 20

Asynchronous Distributed Online Learning
Asynchronous distributed learning algorithm: Deceleration, Backtrack
Example of asynchronous communication between the parameter server and trainers. In this situation, learning does not work well if the parameter server simply accumulates gradients.
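
The slide names "Deceleration" and "Backtrack" without giving the algorithm. One plausible staleness-aware reading (entirely an assumption on my part, not the talk's published method) is: damp a delayed gradient in proportion to how stale it is, and reject updates whose staleness exceeds a bound.

```python
from typing import Tuple

def apply_async_update(w: float, grad: float, staleness: int,
                       lr: float = 0.1, max_staleness: int = 5) -> Tuple[float, bool]:
    """Apply one asynchronously-delivered gradient to parameter w.
    Returns (new_w, accepted)."""
    if staleness > max_staleness:
        return w, False               # "backtrack": drop hopelessly stale work
    decel = 1.0 / (1 + staleness)     # "deceleration": damp stale gradients
    return w - lr * decel * grad, True

w = 1.0
w, ok1 = apply_async_update(w, grad=2.0, staleness=0)   # fresh: full step to 0.8
w, ok2 = apply_async_update(w, grad=2.0, staleness=9)   # too stale: rejected
```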

Slide 21

Storage for Parameters
> Item Embedding: Parameter Server
> User Embedding: Trainer (Bayesian FM, fed by Events)

Slide 22

Platform for Data Analysis

Slide 23

Primary Performance Metric
Why is score used as the main indicator?
> Consistent with user satisfaction trends obtained from questionnaire research
> Easy to calculate
> Stable under temporary fluctuations due to users' unfamiliarity

Slide 24

Primary Performance Metric
Why is score used as the main indicator?
> Consistent with user satisfaction trends obtained from questionnaire research
> Easy to calculate
> Stable under temporary fluctuations due to users' unfamiliarity
Release new types of contents, or expand target users

Slide 25

Dashboard Country: JP

Slide 26

Anomaly Detection Country: JP

Slide 27

Offline Test
Framework of Offline Test To Evaluate New Logic
Off-policy Evaluation:
> We use the More Robust Doubly Robust (MRDR) algorithm to estimate the performance of a new logic from the data generated by other logics.
Offline Test Environment:
> Parameter server and trainers are clones of the production system.
> We use the event logs stored in DataLake by using PySpark (Trainer, Parameter Server (Offline), Ranker, DataLake).
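
MRDR builds on the standard doubly-robust (DR) off-policy estimator, differing mainly in how the reward model q̂ is trained (to minimize the DR estimator's variance; that training step is omitted here). The DR value of a new policy from logs of an old one is the model prediction for the new policy's action, plus an importance-weighted correction wherever the logged action matches. The sketch below assumes a deterministic target policy and logged propensities.

```python
from typing import Callable, List, Tuple

Log = Tuple[str, str, float, float]   # (context, action, reward, logging propensity)

def dr_value(logs: List[Log],
             target_policy: Callable[[str], str],
             q_hat: Callable[[str, str], float]) -> float:
    """Doubly-robust estimate of target_policy's value from logged data."""
    total = 0.0
    for x, a, r, p in logs:
        pi_a = target_policy(x)
        direct = q_hat(x, pi_a)                         # model-based term
        # Importance-weighted residual, nonzero only when the logged action
        # is the one the target policy would have taken.
        correction = (r - q_hat(x, a)) / p if a == pi_a else 0.0
        total += direct + correction
    return total / len(logs)

logs = [("u1", "A", 1.0, 0.5), ("u2", "B", 0.0, 0.5)]
v = dr_value(logs, target_policy=lambda x: "A",
             q_hat=lambda x, a: 0.8 if a == "A" else 0.1)
```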

Slide 28

A/B Test Country: JP

Slide 29

Experiments

Slide 30

Recent Experiments To Improve Recommendation
Successful Experiments:
> LinUCB to Bayesian FM
> Incorporate Images in Banner
> User and Item Embeddings

Slide 31

LinUCB To Bayesian FM
> LinUCB: Linearity (Easy To Parallelize)
> Bayesian FM: Explicit Feature Interactions
Results: CTR +4.8%, Score +5.8%, xCTR -1.0%

Slide 32

Incorporate Images in Banner

Slide 33

Incorporate Images in Banner
Results: CTR +56%, Score +16%, xCTR +35%

Slide 34

User and Item Embeddings
Inputs: User ID (Embedding), Item ID (Embedding), User Features (Gender, Age, …), Other Features (Timestamp, …) → Bayesian FM

Slide 35

User and Item Embeddings
Results: CTR +5.1%, Score +25.3%, xCTR -16.2%

Slide 36

Future Work

Slide 37

Synergies Between Online and Offline Learning Systems
Feed / Contents / Personalize

Slide 38

Improve Machine Learning Platform (Country: JP)
> GPUs on Kubernetes
> Unified Hadoop Cluster

Slide 39

Thank You