
How Does LINE Implement Cross-Service Data Utilization?

LINE DevDay 2020

November 27, 2020

Transcript

  1. None
  2. LINE Services

  3. Amount of data being processed (2020-10, global)

    › Services: 53
    › Records/day: 700B
    › Tables: 17,800+
  4. Data Science & Engineering Center

    › Data Science & Engineering: Data Management, Data Platform, Data Labs, Engineering Infrastructure
    › Teams: Data Governance, Data Strategy, Inquiry Management, Business Consulting, Data Product Management, Data ETL, Data Engineering, IU Dev, Data Solutions, Cloudera PS/PSE, Data Science 1-4, Machine Learning 1-2, DSP, ML, OCR, Voice, Speech, NLP, Speech & Voice Planning, SET, Delivery Infra, Observability Infra
  5. Mission & Goal

    › Unified Self-Service Data Platform › Machine Learning Engineering › Data Science › Data Governance
  6-7. Agenda › Data and ML Platform › Application: Cross-domain recommendation › Data analysis and management
  8-11. Information Universe (IU): the data platform at LINE

    › Storage: HDFS (hdfs://), S3 (s3a://), POSIX filesystem
    › Compute: YARN containers and Docker containers on a distributed system
    › Execution engines read and write data
    › Data is collected from, and exported to, external data sources
    › Business intelligence
  12. IU scale (2020-10, global)

    › Servers: 2,585
    › Workloads/day: 303K+
    › Storage: 270PB
  13-16. Aggregated Feature Data Across Services

  17. Cross-Service User Features

  18. Z-Features statistics (2020-10, global)

    › Users: 935M+
    › Dimensions: 62M+
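
The deck shows per-service feature data being aggregated into cross-service user features (z-features) but does not describe the mechanics. A minimal sketch, assuming each service emits `(user_id, service, feature_name, value)` records: prefixing each key with its service keeps features from different services apart, and a sparse mapping stores only a user's non-zero entries, which is one natural fit for a feature space with 62M+ dimensions. `aggregate_z_features` and the record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-service record: (user_id, service, feature_name, value).
FeatureRecord = Tuple[str, str, str, float]


def aggregate_z_features(records: Iterable[FeatureRecord]) -> Dict[str, Dict[str, float]]:
    """Merge per-service features into one sparse vector per user.

    Keys are namespaced as "<service>.<feature>" so features from different
    services never collide; repeated keys for the same user are summed.
    """
    users: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user_id, service, name, value in records:
        key = f"{service}.{name}"
        users[user_id][key] = users[user_id].get(key, 0.0) + value
    return dict(users)


if __name__ == "__main__":
    logs = [
        ("u1", "news", "sports_clicks", 3.0),
        ("u1", "sticker", "purchases", 1.0),
        ("u1", "news", "sports_clicks", 2.0),
        ("u2", "music", "plays", 10.0),
    ]
    # {'u1': {'news.sports_clicks': 5.0, 'sticker.purchases': 1.0}, 'u2': {'music.plays': 10.0}}
    print(aggregate_z_features(logs))
```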

  19. Agenda › Data and ML Platform › Application: Cross-domain recommendation › Data analysis and management
  20. Cross-Domain Recommendation

    › Timeline Discover: uses various features obtained from other LINE Family services (News, Live, etc.)
    › LINE Theme Recommendation: utilizes the sticker purchase log
    › Smart Channel: leverages feedback from multiple domains to improve recommendation performance
  21. Smart Channel

    › Displays recommended content from various services, as well as advertisements
    › Content types: Weather, Fortune, News, Sticker, Theme, Manga, Music, Point, Search, Local Safety, Train Delay, Lottery
  22-23. Where do these contents come from?

    › First stage: each service (Service A, B, C) runs its own recommendation for User A, producing candidates such as news articles, stickers, and fortune content
    › Second stage: the CRS Engine performs cross-domain recommendation (targeting, scoring, filtering), and only a subset of items passes through to Smart Channel
    › User attributes (e.g. User A: male, 35-39) feed into the CRS Engine, and user feedback flows back to it
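
Slides 22-23 describe the two-stage flow but not its code. A minimal sketch, assuming each service hands over its first-stage candidates as a list of items tagged with the segments they target; `Item`, `second_stage`, the segment label, and the score threshold are all hypothetical. It shows how targeting, scoring, and filtering compose so that only a subset of items passes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Item:
    item_id: str
    service: str              # e.g. "news", "sticker", "fortune"
    target_segments: Set[str]  # hypothetical: segments this item may be shown to


def second_stage(
    user_segment: str,
    candidates: Dict[str, List[Item]],
    score: Callable[[str, Item], float],
    threshold: float,
    top_k: int,
) -> List[Item]:
    """Hypothetical second-stage cross-domain pass: targeting, scoring, filtering."""
    # Pool first-stage candidates from every service into one list.
    pooled = [item for items in candidates.values() for item in items]
    # Targeting: keep only items aimed at this user's segment.
    targeted = [it for it in pooled if user_segment in it.target_segments]
    # Scoring: score each item once, then rank from best to worst.
    scored = sorted(((score(user_segment, it), it) for it in targeted),
                    key=lambda pair: pair[0], reverse=True)
    # Filtering: drop low scores and cap the slate, so only a subset passes.
    return [it for s, it in scored if s >= threshold][:top_k]


if __name__ == "__main__":
    cands = {
        "news": [Item("n1", "news", {"M35-39"}), Item("n2", "news", {"F20-24"})],
        "sticker": [Item("s1", "sticker", {"M35-39"})],
        "fortune": [Item("f1", "fortune", {"M35-39"})],
    }
    toy_scores = {"n1": 0.9, "n2": 0.8, "s1": 0.4, "f1": 0.1}
    picked = second_stage("M35-39", cands, lambda seg, it: toy_scores[it.item_id], 0.3, 2)
    print([it.item_id for it in picked])  # ['n1', 's1']
```

In a real engine the `score` callable would be the learned model; passing it in keeps the targeting and filtering logic independent of how scores are produced.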
  24-27. CRS Engine: available features

    › User segment / preference: estimated from z-features (user features), e.g. "35-39, male, fond of music"
    › Contextual bandits: an algorithm to maximize the rewards; as contexts change, the model should adapt its bandit choice
    › Cross-domain user / item embedding: learns node embeddings in an online manner over a graph linking users to items from Manga, News, Sticker, and Music
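
The deck does not name the bandit algorithm the CRS Engine uses. As one illustration of adapting the bandit choice as contexts change, here is Beta-Bernoulli Thompson sampling with a separate posterior per (user segment, content type) pair; the class name, the segment-as-context choice, and the click-as-reward convention are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class SegmentThompsonBandit:
    """Beta-Bernoulli Thompson sampling with one posterior per (context, arm).

    The context is a discrete user segment; this is an illustrative choice,
    not the CRS Engine's actual model.
    """

    def __init__(self, arms: List[str], seed: Optional[int] = None) -> None:
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # (context, arm) -> [alpha, beta], starting from a uniform Beta(1, 1) prior.
        self.params: Dict[Tuple[str, str], List[float]] = defaultdict(lambda: [1.0, 1.0])

    def select(self, context: str) -> str:
        # Sample a plausible reward rate for each arm and play the best sample.
        draws = {arm: self.rng.betavariate(*self.params[(context, arm)]) for arm in self.arms}
        return max(draws, key=draws.get)

    def update(self, context: str, arm: str, reward: int) -> None:
        # reward is 1 (e.g. a click) or 0 (e.g. no click or a mute).
        alpha_beta = self.params[(context, arm)]
        if reward:
            alpha_beta[0] += 1
        else:
            alpha_beta[1] += 1


if __name__ == "__main__":
    bandit = SegmentThompsonBandit(["news", "sticker", "music"], seed=0)
    for _ in range(200):
        bandit.update("M35-39", "music", 1)
        bandit.update("M35-39", "news", 0)
        bandit.update("M35-39", "sticker", 0)
    print(bandit.select("M35-39"))  # almost surely "music"
    print(bandit.select("F20-24"))  # no data yet for this segment: any arm
```

A separate posterior per segment is the simplest form of context; an algorithm such as LinUCB would instead share information across segments through a feature vector.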
  28. Case: Free Stickers

    › Campaign: notify all JP users of free stickers
    › 1st trial: without cross-domain user / item embeddings
    › 2nd trial: with cross-domain user / item embeddings
  29. Case: Free Stickers, results

    › Impressions ×36, score ×13, CTR +40% (score = click / mute)
    › Note that a low score brings fewer impressions, because the bandit algorithm is more likely to choose other content.
  30. Auto Targeting

    › Same two-stage flow as Smart Channel: first-stage recommendations from Services A, B, and C, then second-stage cross-domain recommendation (targeting, scoring, filtering) in the CRS Engine
    › Service D uploads content directly: first-stage recommendation is not mandatory
  31. Agenda › Data and ML Platform › Application: Cross-domain recommendation › Data analysis and management
  32. Data Science efforts

    › Data Science Teams: Team 1, Team 2, Team 3, Team 4
  33. BI Suite (IU tools): OASIS, yanagishima

  34. Analytics (IU tools): LINE Analytics, Logsearch

  35. A/B Testing Tool: Libra suite (Libra, Libra Report)

  36. Data Analysis Examples

    › Chat Menu Renewal
      › Define KPIs in order of priority
      › Estimate the effects of new-UI bias
    › Open Score for OA (Official Accounts)
      › Users tend to open messages less when they receive more of them
      › Predict the "open rate" and control the volume of message delivery
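
Slide 36 says users open messages less when they receive more of them, and that a predicted open rate is used to control delivery volume. The sketch below stands in a hypothetical geometric decay model (each additional message in a period multiplies the open rate by `decay`) for a learned predictor; `predicted_open_rate`, `max_messages`, and all parameter values are illustrative.

```python
def predicted_open_rate(base_rate: float, n_received: int, decay: float) -> float:
    """Hypothetical model: open rate of the next message after n_received messages."""
    return base_rate * decay ** n_received


def max_messages(base_rate: float, decay: float, min_open_rate: float, cap: int) -> int:
    """Largest number of messages to send so each is predicted to clear min_open_rate."""
    sent = 0
    # Keep sending while the next message is still predicted to be opened often enough.
    while sent < cap and predicted_open_rate(base_rate, sent, decay) >= min_open_rate:
        sent += 1
    return sent


if __name__ == "__main__":
    # Rates 0.4, 0.32, 0.256, 0.2048 clear 0.2; the fifth (0.16384) does not.
    print(max_messages(base_rate=0.4, decay=0.8, min_open_rate=0.2, cap=10))  # 4
```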
  37. OA Targeting for Fintech Services: Improvement with Lookalike

    › Fintech services send OA messages (text messages, rich messages)
    › Past: manual targeting
    › Present: lookalike targeting
    › (Example message sent March 18, 2020)
  38. Lookalike Audience Targeting

    › The lookalike engine takes a set of seed users as input and outputs a set of similar users
    › Seed users (drawn from all users) → Lookalike Engine (z-features) → similar users
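
The slides state the lookalike engine's input and output but not its method. One simple way to realize it, shown here as an assumption rather than LINE's engine, is to average the seed users' z-feature vectors into a centroid and rank every other user by cosine similarity to it; `lookalike`, `cosine`, and the sparse-dict representation are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse feature vector, e.g. a user's z-features


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lookalike(seed_ids: List[str], users: Dict[str, Vector], top_n: int) -> List[str]:
    """Return the top_n non-seed users closest to the centroid of the seed users."""
    # Average the seed vectors into a single centroid.
    centroid: Vector = {}
    for uid in seed_ids:
        for k, v in users[uid].items():
            centroid[k] = centroid.get(k, 0.0) + v / len(seed_ids)
    # Rank every non-seed user by similarity to that centroid.
    seeds = set(seed_ids)
    candidates = [uid for uid in users if uid not in seeds]
    candidates.sort(key=lambda uid: cosine(users[uid], centroid), reverse=True)
    return candidates[:top_n]


if __name__ == "__main__":
    users = {
        "s1": {"fin": 1.0},
        "s2": {"fin": 0.9, "news": 0.1},
        "a": {"fin": 0.8, "news": 0.2},
        "b": {"game": 1.0},
        "c": {"fin": 0.1, "game": 0.9},
    }
    print(lookalike(["s1", "s2"], users, top_n=2))  # ['a', 'c']
```

A centroid is the cheapest choice; a production engine might instead train a classifier that separates seed users from everyone else, but the input/output contract stays the same.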
  39. Experiments: Manual Targeting vs Lookalike Targeting (2019-12 to 2020-02)

    › Campaign 1: CTR +164%, CVR +159%
    › Campaign 2: CTR +117%, CVR +53%
    › Campaign 3: CTR +67%, CVR +12%
    › Campaign 4: CTR +200%, CVR +814%
    › Note that these campaigns have already ended.
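
For reading these numbers: a relative lift of +L% means the lookalike rate is (1 + L/100) times the manual-targeting rate, so CVR +814% is about 9.14 times the manual CVR. The helpers below make that arithmetic explicit; the CVR definition (conversions per click) and the toy counts in the usage are assumptions, not campaign data.

```python
def rate(events: int, base: int) -> float:
    """CTR = clicks / impressions; CVR = conversions / clicks (one common definition)."""
    return events / base if base else 0.0


def relative_lift(treatment: float, control: float) -> float:
    """Percentage change of treatment over control; control must be non-zero."""
    return (treatment - control) / control * 100.0


def lift_to_ratio(lift_percent: float) -> float:
    """Convert a lift such as +164 (%) into a multiplier such as 2.64."""
    return 1.0 + lift_percent / 100.0


if __name__ == "__main__":
    # Toy counts: 30 vs 12 clicks per 1,000 impressions.
    print(round(relative_lift(rate(30, 1000), rate(12, 1000)), 1))  # 150.0
    print(round(lift_to_ratio(814), 2))  # 9.14
```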
  40. Data Management

    › Data Catalog
    › Data Governance: information security, data owner approval, Data Open guidance
    › Security
      › Authentication: LDAP + Kerberos
      › Authorization: Apache Ranger
      › Auditing: Apache Ranger + native audit log for each component
  41. Data Catalog (IU tools)

  42. Data Governance Communication

    › Data management is a hub for inquiries and assists with utilizing data
    › Inquiries from planners/engineers and data scientists/ML go to Data Management, which works with Security, Privacy, and Legal
  43. Future Work

  44. ML Universe (MLU) Towards company-wide ML democratization

  45. DeepPocket/PicCell: to help service developers integrate various ML/DL models easily
  46. Jutopia: Jupyter to Pipeline

    › Architecture: Notebooks, Multi-framework, Model Serving, Pipelines, Infrastructures
  47. Dataground: IU + Kubernetes

  48. Masala: Library for Distributed ML on Kubernetes

    › ZeroMQ: fast and stable; asyncio with the aiozmq library
    › Transfer Manager: manages the push/pull socket lifecycle
    › MPI: state synchronization; distributed training (e.g. Horovod)
    › Diagram: on Kubernetes, mpirun launches processes in CPU pods, which push data, and in GPU pods, which pull it; the Transfer Manager sits between them
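
The slide describes Masala's data path as ZeroMQ push/pull sockets driven by asyncio (via aiozmq), with a Transfer Manager owning the socket lifecycle. To stay self-contained, this sketch substitutes a bounded `asyncio.Queue` for the ZeroMQ PUSH/PULL pair and leaves out MPI state synchronization entirely; `cpu_worker`, `gpu_worker`, and `transfer_manager` are hypothetical names, not Masala's API.

```python
import asyncio
from typing import List, Optional


async def cpu_worker(wid: int, queue: "asyncio.Queue[Optional[str]]", n_batches: int) -> None:
    # Stand-in for a CPU-pod process that PUSHes preprocessed batches.
    for i in range(n_batches):
        await queue.put(f"cpu{wid}-batch{i}")


async def gpu_worker(queue: "asyncio.Queue[Optional[str]]", consumed: List[str]) -> None:
    # Stand-in for a GPU-pod process that PULLs batches until told to stop.
    while True:
        batch = await queue.get()
        if batch is None:  # shutdown sentinel sent by the manager
            queue.task_done()
            return
        consumed.append(batch)  # a real worker would run a training step here
        queue.task_done()


async def transfer_manager(n_cpu: int, n_gpu: int, n_batches: int) -> List[str]:
    """Hypothetical manager owning the channel lifecycle: open, run, drain, close."""
    # Bounded queue: producers block when full, like a ZeroMQ high-water mark.
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize=8)
    consumed: List[str] = []
    gpus = [asyncio.create_task(gpu_worker(queue, consumed)) for _ in range(n_gpu)]
    await asyncio.gather(*(cpu_worker(w, queue, n_batches) for w in range(n_cpu)))
    await queue.join()  # wait until every pushed batch has been pulled
    for _ in range(n_gpu):
        await queue.put(None)  # one sentinel per consumer
    await asyncio.gather(*gpus)
    return consumed


if __name__ == "__main__":
    batches = asyncio.run(transfer_manager(n_cpu=2, n_gpu=2, n_batches=3))
    print(len(batches))  # 6
```

Centralizing start-up, draining, and shutdown in one coroutine mirrors the slide's split of responsibilities: workers only push or pull, while the manager decides when channels open and close.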
  49. Closing The Distance By Data

  50. Thank you