Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

LINE Services

Slide 3

Slide 3 text

Amount of data being processed, 2020-10 (Global)
› Services: 53
› Records/Day: 700B
› Tables: 17,800+

Slide 4

Slide 4 text

Data Science & Engineering Center Data Science & Engineering Data Management Data Platform Data Labs Engineering Infrastructure Data Governance Data Strategy Inquiry Management Business Consulting Data Product Management Data ETL Data Engineering IU Dev Data Solutions Cloudera PS/PSE Data Science 1-4 Machine Learning 1-2 DSP ML OCR Voice Speech NLP Speech & Voice Planning SET Delivery Infra Observability Infra

Slide 5

Slide 5 text

Mission & Goal Unified Self-Service Data Platform Machine Learning Engineering Data Science Data Governance

Slide 6

Slide 6 text

Agenda
› Data and ML Platform
› Application: Cross-domain recommendation
› Data analysis and management

Slide 8

Slide 8 text

Information Universe (IU) The data platform at LINE HDFS:// s3a:// POSIX filesystem YARN Container Docker Container Distributed system

Slide 9

Slide 9 text

Information Universe (IU) The data platform at LINE HDFS:// s3a:// POSIX filesystem YARN Container Docker Container Distributed system Execution engine Read data Write data

Slide 10

Slide 10 text

Information Universe (IU) The data platform at LINE HDFS:// s3a:// POSIX filesystem YARN Container Docker Container Distributed system Execution engine Read data Write data External data source Export to Collect data

Slide 11

Slide 11 text

Information Universe (IU) The data platform at LINE HDFS:// s3a:// POSIX filesystem YARN Container Docker Container Distributed system Execution engine Read data Write data External data source Export to Collect data Business intelligence

Slide 12

Slide 12 text

IU - Scale, 2020-10 (Global)
› Servers: 2,585
› Workloads/Day: 303K+
› Storage: 270PB

Slide 13

Slide 13 text

Aggregated Feature Data Across Services

Slide 17

Slide 17 text

Cross-Service User Features

Slide 18

Slide 18 text

Z-Features - Statistics, 2020-10 (Global)
› Users: 935M+
› Dimension: 62M+

Slide 19

Slide 19 text

Agenda
› Data and ML Platform
› Application: Cross-domain recommendation
› Data analysis and management

Slide 20

Slide 20 text

Cross-Domain Recommendation
› Timeline Discover
  › Use various features obtained from other LINE Family services (News, Live, etc.)
› LINE Theme Recommendation
  › Utilize the sticker purchase log
› Smart Channel
  › Leverage feedback from multiple domains to improve recommendation performance
(Screenshots: Timeline Discover, Theme Recommendation, Smart Channel)

Slide 21

Slide 21 text

Smart Channel
› Display recommended content of various services and advertisements
  › Weather
  › Fortune
  › News
  › Sticker
  › Theme
  › Manga
  › Music
  › Point
  › Search
  › Local Safety
  › Train Delay
  › Lottery

Slide 22

Slide 22 text

Where do these contents come from? Smart Channel Service A First-stage Recommendation Recommendation for User A Service B News articles Sticker Fortune Service C

Slide 23

Slide 23 text

Where do these contents come from? Smart Channel Service A First-stage Recommendation Recommendation for User A Service B News articles Sticker Fortune Service C CRS Engine Second-stage Cross-Domain Recommendation targeting scoring filtering Only a subset of items passes User A 35-39 male Feedback
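The second-stage flow on this slide (targeting, scoring, filtering, with only a subset of items passing) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the CRS Engine's actual design: the `Candidate` fields, the `second_stage` function, the segment-matching rule, and the `min_score` / `top_k` parameters are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One item proposed by a service's first-stage recommender (hypothetical shape)."""
    item_id: str
    service: str
    target_segments: frozenset  # empty set means "no targeting restriction"
    base_score: float           # hypothetical first-stage score


def second_stage(candidates, user_segments, score_fn, min_score=0.0, top_k=1):
    """Targeting -> scoring -> filtering over first-stage candidates."""
    # Targeting: keep items with no restriction or with a segment the user belongs to.
    targeted = [
        c for c in candidates
        if not c.target_segments or (c.target_segments & user_segments)
    ]
    # Scoring: apply a cross-domain scoring function to each remaining item.
    scored = [(score_fn(c), c) for c in targeted]
    # Filtering: only a subset of items passes (score threshold, then top-k).
    passed = [(s, c) for s, c in scored if s >= min_score]
    passed.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in passed[:top_k]]


if __name__ == "__main__":
    user_a = frozenset({"35-39", "male"})
    items = [
        Candidate("news-1", "news", frozenset(), 0.2),
        Candidate("sticker-1", "sticker", frozenset({"35-39"}), 0.9),
        Candidate("fortune-1", "fortune", frozenset({"20-24"}), 0.95),
    ]
    chosen = second_stage(items, user_a, lambda c: c.base_score, min_score=0.1, top_k=2)
    print([c.item_id for c in chosen])
```

In this sketch the fortune item is dropped at the targeting step even though it has the highest first-stage score, which mirrors the idea that the second stage decides what User A actually sees.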

Slide 24

Slide 24 text

CRS Engine: Available Features
› User Segment / Preference
  › Estimated from z-features (user features)
› Contextual Bandits
  › Algorithm to maximize the rewards. As contexts change, the model should adapt its bandit choice.
› Cross-domain User / Item Embedding
  › Learn node embeddings in an online manner.
(Diagram labels: 35-39 male, fond of music; User; Manga, News, Sticker, Music)
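As a rough illustration of the contextual-bandit idea on this slide, here is a minimal epsilon-greedy sketch. The slides do not say which bandit algorithm the CRS Engine uses, so epsilon-greedy is a stand-in chosen for brevity, and the class name, the use of a user-segment string as the context, and the reward values are all assumptions.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps a running mean reward per (context, arm) pair.

    A stand-in for illustration only; not the algorithm used in production.
    """

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of updates
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental mean, so estimates keep adapting as feedback arrives.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


if __name__ == "__main__":
    bandit = EpsilonGreedyContextualBandit(["news", "sticker", "music"], epsilon=0.1)
    context = "35-39 male"  # hypothetical context key derived from a user segment
    bandit.update(context, "music", 1.0)  # e.g. a click
    bandit.update(context, "news", 0.0)   # e.g. no click
    print(bandit.select(context))
```

Because estimates are stored per context, the same arm can be preferred for one segment and ignored for another, which is the behaviour the slide describes as adapting the bandit choice as contexts change.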

Slide 25

Slide 25 text

CRS Engine Available Features User Segment / Preference Estimated from z-features (user features) Contextual Bandits Algorithm to maximize the rewards. As contexts change, the model should adapt its bandit choice. Cross-domain User / Item Embedding Learn node embeddings in an online manner. 35-39 male fond of music User Manga News Sticker Music User News Sticker

Slide 26

Slide 26 text

CRS Engine Available Features User Segment / Preference Estimated from z-features (user features) Contextual Bandits Algorithm to maximize the rewards. As contexts change, the model should adapt its bandit choice. Cross-domain User / Item Embedding Learn node embeddings in an online manner. 35-39 male fond of music User Manga News Sticker Music User News Sticker

Slide 27

Slide 27 text

CRS Engine Available Features User Segment / Preference Estimated from z-features (user features) Contextual Bandits Algorithm to maximize the rewards. As contexts change, the model should adapt its bandit choice. Cross-domain User / Item Embedding Learn node embeddings in an online manner. 35-39 male fond of music User Manga News Sticker Music User News Sticker

Slide 28

Slide 28 text

Case: Free Stickers
Notify all JP users of free stickers
› 1st Trial: Do not use cross-domain user / item embeddings
› 2nd Trial: Use cross-domain user / item embeddings

Slide 29

Slide 29 text

Case: Free Stickers
Results
› Impression: ×36
› Score (= click / mute): ×13
› CTR: +40%
› Note that a low score brings fewer impressions because other content is more likely to be chosen by the bandit algorithm.
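The three result figures mix two kinds of comparison: a multiplicative ratio ("×36", "×13") and a relative lift ("+40%"). The sketch below shows how such figures could be computed from raw counts. Reading the slide's "= click / mute" annotation as the definition of Score is an assumption, CTR is taken as the usual clicks divided by impressions, and the counts in the usage section are made up for illustration; they are not the campaign's data.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks per impression (0.0 when nothing was shown)."""
    return clicks / impressions if impressions else 0.0


def score(clicks, mutes):
    """Score read as clicks per mute (assumed definition); None when undefined."""
    return clicks / mutes if mutes else None


def ratio(after, before):
    """Multiplicative change, the kind of figure written as 'x36'."""
    return after / before


def relative_lift(after, before):
    """Relative change, the kind of figure written as '+40%'."""
    return (after - before) / before


if __name__ == "__main__":
    # Hypothetical counts for two trials.
    first = {"impressions": 1_000, "clicks": 10, "mutes": 5}
    second = {"impressions": 36_000, "clicks": 504, "mutes": 20}

    print("impression ratio:", ratio(second["impressions"], first["impressions"]))
    print("score ratio:", ratio(score(second["clicks"], second["mutes"]),
                                score(first["clicks"], first["mutes"])))
    print("CTR lift:", relative_lift(ctr(second["clicks"], second["impressions"]),
                                     ctr(first["clicks"], first["impressions"])))
```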

Slide 30

Slide 30 text

Auto Targeting Smart Channel Service A First-stage Recommendation Recommendation for User A Service B News articles Sticker Fortune Service C CRS Engine Second-stage Cross-Domain Recommendation targeting scoring filtering Only a subset of items passes User A 35-39 male Feedback Service D Upload Content First-stage recommendation is not mandatory

Slide 31

Slide 31 text

Agenda
› Data and ML Platform
› Application: Cross-domain recommendation
› Data analysis and management

Slide 32

Slide 32 text

Data Science efforts Data Science Team 1 Data Science Teams Data Science Team 2 Data Science Team 3 Data Science Team 4

Slide 33

Slide 33 text

BI Suite IU tools OASIS yanagishima

Slide 34

Slide 34 text

Analytics IU tools LINE Analytics Logsearch

Slide 35

Slide 35 text

A/B Testing Tool Libra suite Libra Report Libra

Slide 36

Slide 36 text

Data Analysis Examples
› Chat Menu Renewal
  › Define KPIs in order of priority
  › Estimate the effects of new-UI bias
› Open Score for OA
  › Users tend to open messages less often the more messages they receive
  › Predict the "open rate" and control the volume of message delivery

Slide 37

Slide 37 text

OA Targeting for Fintech Services
Improvement with Lookalike
› Past: Manual targeting
› Present: Lookalike targeting
(Diagram labels: Fintech Services; Send OA message; Text message; Rich message; Sent March 18, 2020)

Slide 38

Slide 38 text

Lookalike Audience Targeting
› The lookalike engine takes a seed user set as input and outputs a set of similar users
(Diagram labels: All Users; Seed Users; z-features; Lookalike Engine; Similar Users)
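One simple way to picture "seed users in, similar users out" is nearest-neighbour search around the seed set's average feature vector. The sketch below does that with cosine similarity over sparse feature dictionaries. It is a hedged illustration only: the slides do not describe the engine's actual method, and the function names, the centroid approach, and the toy feature values are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of sparse vectors."""
    if not vectors:
        raise ValueError("at least one seed user is required")
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def lookalike(features, seed_ids, top_k):
    """Return the top_k non-seed users most similar to the seed centroid."""
    seeds = set(seed_ids)
    center = centroid([features[uid] for uid in seeds])
    ranked = [
        (cosine(vec, center), uid)
        for uid, vec in features.items()
        if uid not in seeds
    ]
    # Highest similarity first; ties broken by user id for determinism.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uid for _, uid in ranked[:top_k]]


if __name__ == "__main__":
    # Toy stand-ins for z-features; real vectors would be far larger and sparser.
    features = {
        "u1": {"music": 1.0, "sticker": 1.0},
        "u2": {"music": 1.0},
        "u3": {"news": 1.0},
        "u4": {"music": 0.9, "sticker": 1.1},
    }
    print(lookalike(features, seed_ids=["u1"], top_k=2))
```

A brute-force scan like this would not scale to hundreds of millions of users; it is meant only to make the input/output contract concrete.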

Slide 39

Slide 39 text

Experiments
Manual Targeting vs Lookalike Targeting (2019-12 - 2020-02)
› CTR +164%, CVR +159%
› CTR +117%, CVR +53%
› CTR +67%, CVR +12%
› CTR +200%, CVR +814%
Note that these campaigns have already ended

Slide 40

Slide 40 text

Data Management
› Data Catalog
› Data Governance
  › Information security
  › Data owner approval
  › Data Open guidance
› Security
  › Authentication: LDAP + Kerberos
  › Authorization: Apache Ranger
  › Auditing: Apache Ranger + native audit log for each component

Slide 41

Slide 41 text

Data Catalog IU tools

Slide 42

Slide 42 text

Data Governance Communication Data management is a hub for inquiries and assists with utilizing data Planner/Engineer Data Management Security Privacy Legal Data Scientist / ML Inquiry

Slide 43

Slide 43 text

Future Work

Slide 44

Slide 44 text

ML Universe (MLU) Towards company-wide ML democratization

Slide 45

Slide 45 text

DeepPocket/PicCell
To help service developers integrate various ML/DL models easily

Slide 46

Slide 46 text

Jutopia
Jupyter to Pipeline
Architecture: Notebooks, Multi-framework, Model Serving, Pipelines, Infrastructures

Slide 47

Slide 47 text

Dataground IU Kubernetes

Slide 48

Slide 48 text

Masala
Library for Distributed ML on Kubernetes
› ZeroMQ
  › Fast and stable
  › asyncio with aiozmq library
› Transfer Manager
  › Manage push/pull sockets lifecycle
› MPI
  › State Synchronization
  › Distributed Training (e.g. Horovod)
(Diagram labels: Kubernetes; mpi run; CPU Pods with processes that push; GPU Pods with processes that pull; Transfer Manager)

Slide 49

Slide 49 text

Closing The Distance By Data

Slide 50

Slide 50 text

Thank you