
Building a smart recommender system across LINE services


Jun Namikawa
LINE Machine Learning Team Fellow
https://linedevday.linecorp.com/jp/2019/sessions/C1-7

LINE DevDay 2019

November 20, 2019


Transcript

  1. 2019 DevDay
    Building a Smart Recommender System Across LINE Services
    > Jun Namikawa
    > LINE Machine Learning Team Fellow

  2. Introduction

  3. LINE Services

  4. Smart Channel
    > Display recommended content and advertisements at the top of the chat tab


  6. Overview
    Concept of Smart Channel
    Feed Contents Personalize

  7. History of Smart Channel
    Country: JP


  9. Related Sessions
    > Day2: C2-1 12:00-12:40
    "LINE-Like" Product Management
    > Poster Session 13:40-14:20/15:30-16:10 (2 days)
    > Day1: B1-2 14:30-15:10
    The Art of Smart Channel
    Continuous Improvements in Smart Channel Platform/Contents

  10. ML Architecture

  11. Recommender System for Smart Channel
    Constraints
    > Cooperation with existing recommender systems
    > Cold start problem
    > Scalability

  12. Many Recommender Systems Exist in LINE
    Each system has a different
    > Implementation
    > Algorithm
    > Objective

  13. Current Stats
    Smart Channel 2019-10 (Global)
    > Impressions / Day: 500M
    > Contents / Day: 60K+
    > Global DAU: 100M+

  14. Only New Content Has Value

  15. Recommender System Architecture
    [Diagram. Components: Recommender System for Service (x4), Ranker, Trainer, LINE App.
    Labels: Recommended Items (Candidates), Top k items for each user, Events (imp, click, etc), Model parameter, User ID, Items, Item, Request]

  16. Ranker
    [Diagram. Current expected score per candidate, in the order listed: Item A 0.7, Item B 0.4, Item C 0.6, Item D 0.1]
    > Ranker chooses an item from candidates A, B, C … by using contextual bandits
    > Each expected score is computed by a prediction model corresponding to the item
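
The selection step on this slide can be sketched as follows. This is only an illustration: the slides say that contextual bandits are used but do not name the exploration rule, so the Gaussian sampling (Thompson-sampling style), the `choose_item` function, and the standard deviations are all assumptions. The means are the four scores from the slide, paired with items A to D in the order they appear.

```python
import random

def choose_item(means, stds, rng):
    """Sample one score per candidate and return the item with the highest draw.

    means, stds: dicts mapping item -> mean / standard deviation of its score.
    The Gaussian sampling rule is an illustrative assumption.
    """
    sampled = {item: rng.gauss(means[item], stds[item]) for item in means}
    return max(sampled, key=sampled.get)

# Expected scores as listed on the slide (pairing with items A-D assumed in order).
means = {"A": 0.7, "B": 0.4, "C": 0.6, "D": 0.1}
# Hypothetical uncertainties: item C is assumed to be less explored than the others.
stds = {"A": 0.05, "B": 0.05, "C": 0.30, "D": 0.05}

rng = random.Random(0)
picks = [choose_item(means, stds, rng) for _ in range(1000)]
counts = {item: picks.count(item) for item in means}
print(counts)
```

With zero uncertainty the rule reduces to picking the highest expected score (item A). With the made-up uncertainties above, item C is still chosen regularly, which is the exploration behaviour a bandit is meant to provide.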

  17. Prediction Model
    Bayesian Factorization Machine (FM) as an Arm of Contextual Bandits
    > Imp: 0.5, Click: 1.0, Mute: 0.0
    > Balance Exploration-Exploitation Tradeoff
    > Laplace Approximation
    [Diagram. Inputs to Bayesian FM: User ID (Embedding), Item ID (Embedding), User Features (Gender, Age, …), Other Features (Timestamp, …). Result: Output]
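
As a rough illustration of the model family named on this slide, the sketch below computes the score of a plain second-order factorization machine. It is a point-estimate version only: the Bayesian treatment (a posterior over the parameters, obtained with the Laplace approximation mentioned above) is not shown. The `TARGET` dictionary interprets the slide's values as per-event target scores, which is an assumption, and `fm_score` and the toy feature layout are hypothetical.

```python
import numpy as np

# Values from the slide, interpreted here as the target score per event type.
TARGET = {"imp": 0.5, "click": 1.0, "mute": 0.0}

def fm_score(x, w0, w, V):
    """Second-order factorization machine score.

    x: (n,) feature vector, w0: bias, w: (n,) linear weights,
    V: (n, k) latent factors. The pairwise term uses the O(n*k) identity
    sum_{i<j} <V[i], V[j]> x_i x_j
      = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2].
    """
    xv = x @ V                                        # shape (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return w0 + w @ x + pairwise

# Toy input: one-hot user (3 users), one-hot item (2 items), one numeric feature.
x = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 0.3])
rng = np.random.default_rng(0)
w0 = 0.0
w = rng.normal(size=6)
V = rng.normal(scale=0.1, size=(6, 4))
score = fm_score(x, w0, w, V)
```

The pairwise term is what gives the model explicit feature interactions, for example between a user ID and an item ID, without storing a weight for every feature pair.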

  18. Parameter Server for Distributed ML
    [Diagram labels: Events, LINE App, Trainer (Worker + Model, x2), Parameter Server, Ranker (Executor + Model, x2), Δw, W, Request, Contents]
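
The push/pull pattern suggested by the Δw and W labels can be sketched as below. This is a toy, in-process stand-in, not the system from the talk: the `ParameterServer` class, its `push` and `pull` methods, and the use of threads in place of separate trainer workers are all assumptions. It also shows only naive accumulation of updates.

```python
import threading

class ParameterServer:
    """Holds the shared weights W; trainers push deltas, rankers pull W."""

    def __init__(self, dim):
        self._w = [0.0] * dim
        self._lock = threading.Lock()

    def push(self, delta):
        # A trainer worker sends its local update (read here as the Δw label).
        with self._lock:
            for i, d in enumerate(delta):
                self._w[i] += d

    def pull(self):
        # A ranker executor or trainer worker fetches a copy of W.
        with self._lock:
            return list(self._w)

server = ParameterServer(dim=3)

def worker(delta, steps):
    # Hypothetical trainer loop: push the same delta a fixed number of times.
    for _ in range(steps):
        server.push(delta)

threads = [
    threading.Thread(target=worker, args=([0.5, 0.0, -0.25], 100))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

weights = server.pull()
print(weights)
```

The lock keeps concurrent pushes from corrupting the weights, but it does nothing about workers computing updates from stale copies of W.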

  19. Asynchronous Distributed Online Learning
    Example of asynchronous communications between the parameter server and trainers. In this situation, learning doesn't work well just by accumulating the gradient in the parameter server.

  20. Asynchronous Distributed Online Learning
    Asynchronous distributed learning algorithm
    > Deceleration
    > Backtrack

  21. Storage for Parameters
    [Diagram labels: Events, Trainer (Bayesian FM), Parameter Server, Item Embedding, User Embedding]

  22. Platform for Data Analysis

  23. Primary Performance Metric
    Why is score used as the main indicator?
    > Consistent with user satisfaction trends obtained from questionnaire research
    > Easy to calculate
    > Stable under temporary fluctuations due to users' unfamiliarity

  24. Primary Performance Metric
    Why is score used as the main indicator?
    Release new types of content, or expand target users

  25. Dashboard
    Country: JP

  26. Anomaly Detection
    Country: JP

  27. Offline Test
    Framework of Offline Test To Evaluate New Logic
    Off-policy Evaluation
    We use the More Robust Doubly Robust (MRDR) algorithm to estimate the performance of a new logic from the data generated by other logics.
    Offline Test Environment
    Parameter server and trainers are clones of the production system. We use the event logs stored in DataLake, read with PySpark.
    [Diagram labels: DataLake, Trainer, Parameter Server (Offline), Ranker]
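
To make the off-policy evaluation idea concrete, the sketch below implements the standard doubly robust (DR) estimator. Note that this is plain DR, not MRDR: MRDR keeps the same estimator form but fits the reward model differently, and that fitting step is not shown here. The function name, the log tuple layout, and the toy policies are hypothetical.

```python
def doubly_robust_value(logs, target_prob, reward_model, actions):
    """Standard doubly robust estimate of a target policy's value.

    logs: iterable of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave to `action`.
    target_prob(context, action): action probability under the new policy.
    reward_model(context, action): estimated reward for that pair.
    """
    total, n = 0.0, 0
    for x, a, r, mu in logs:
        # Model-based part: expected modelled reward under the target policy.
        direct = sum(target_prob(x, b) * reward_model(x, b) for b in actions)
        # Importance-weighted correction of the model's error on the logged action.
        weight = target_prob(x, a) / mu
        total += direct + weight * (r - reward_model(x, a))
        n += 1
    return total / n

actions = ["A", "B"]
# Toy logs from a uniform logging policy: item A earned 1.0, item B earned 0.0.
logs = [("u1", "A", 1.0, 0.5), ("u1", "B", 0.0, 0.5)]

def always_a(x, a):
    # Hypothetical new logic that always shows item A.
    return 1.0 if a == "A" else 0.0

def good_model(x, a):
    return 1.0 if a == "A" else 0.0

def zero_model(x, a):
    return 0.0

value_good = doubly_robust_value(logs, always_a, good_model, actions)
value_zero = doubly_robust_value(logs, always_a, zero_model, actions)
print(value_good, value_zero)
```

On this toy log both calls give the same estimate: when the reward model is wrong, the importance-weighted term corrects it, which is the property that motivates doubly robust estimators.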

  28. A/B Test
    Country: JP

  29. Experiments

  30. Recent Experiments To Improve Recommendation
    Successful Experiments
    > Incorporate Images in Banner
    > User and Item Embeddings
    > LinUCB to Bayesian FM

  31. LinUCB To Bayesian FM
    > CTR: +4.8%
    > Score: +5.8%
    > xCTR: -1.0%
    LinUCB
    > Linearity: Easy To Parallelize
    Bayesian FM
    > Explicit Feature Interactions
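
For reference, one arm of LinUCB can be sketched as below. The `merge` method is one possible reading of the slide's remark that linearity makes LinUCB easy to parallelize: the arm's statistics are plain sums, so partial results from separate workers combine by addition. The class name, the `alpha` default, and the `merge` helper are assumptions, not details from the talk.

```python
import numpy as np

class LinUCBArm:
    """One arm of LinUCB, keeping ridge-regression statistics A and b."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)       # identity prior + sum of x x^T
        self.b = np.zeros(dim)     # sum of reward * x
        self.alpha = alpha         # exploration strength (hypothetical default)

    def ucb(self, x):
        # Upper confidence bound: estimated reward plus an uncertainty bonus.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

    def merge(self, other):
        # Both arms started from the same identity prior, so subtract one copy.
        merged = LinUCBArm(len(self.b), self.alpha)
        merged.A = self.A + other.A - np.eye(len(self.b))
        merged.b = self.b + other.b
        return merged

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])

# Two workers each see one event; a third arm sees both events in sequence.
worker_a = LinUCBArm(2)
worker_a.update(x1, 1.0)
worker_b = LinUCBArm(2)
worker_b.update(x2, 0.5)
sequential = LinUCBArm(2)
sequential.update(x1, 1.0)
sequential.update(x2, 0.5)

combined = worker_a.merge(worker_b)
```

Here `combined` holds the same statistics as `sequential`. A factorization machine has no such additive summary, which is presumably part of why the deck spends several slides on the parameter server.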

  32. Incorporate Images in Banner

  33. Incorporate Images in Banner
    > CTR: +56%
    > Score: +16%
    > xCTR: +35%

  34. User and Item Embeddings
    [Diagram. Inputs to Bayesian FM: User ID (Embedding), Item ID (Embedding), User Features (Gender, Age, …), Other Features (Timestamp, …). Additional label: 16]

  35. User and Item Embeddings
    > CTR: +5.1%
    > Score: +25.3%
    > xCTR: -16.2%

  36. Future Work

  37. Synergies Between Online and Offline Learning Systems
    Feed Contents Personalize

  38. Improve Machine Learning Platform
    Country: JP
    > GPUs on Kubernetes
    > Unified Hadoop Cluster

  39. Thank You