Definition
› The continuous process of planning and managing tasks across data science, ML model development, production, and infra to deliver machine learning technology as a service
Features
› Tasks are independent in terms of time and complexity, but have dependencies from a service perspective
› Data driven, but not a data pipeline
› Need to understand domain knowledge
› The goal is to provide a service, so the service environment must be understood and taken into account
Approach
› Components: Data Collection, Data Verification, Data Transform, Feature Extraction, Analysis Tool & Knowledge, Modeling, Serving, Process Management, Infra Automation, Data Infra, GPU, Security
› Grouped into three areas: Data Science & ML, System & Infra, and the Model Core
By data flow
› Collect Data: prepare data infra, collect & store data, security
› Data Analysis: validate data, EDA
› Feature Eng: create features, prepare features for train & service
› Train ML Model: GPU / computing, develop & tune the ML model
› Model Serving: prepare serving infra, deploy the trained model for service
› Production: API for the service, service monitoring
[Diagram: pipeline stages (Collect Data, Data Analysis, Feature Eng, Train ML Model, Model Serving, Production) mapped onto infra components (Data Cluster, Jupyter, MySQL/Redis, Object Storage, Serving Cluster, Filtering, API), from user approach to operation]
Infra Structure of Explore
› Raw Data Storage: HDFS + Hive with Spark, Kafka
› Data Analysis and Modeling: Jupyter (with Jutopia), on-demand Jupyter environments, distributed GPU training environment
› Serving Data Storage: RDB (MySQL), IMDB (Redis Cluster)
› Serving Platform: Kubernetes serving cluster, Clipper, API
› Workflow automation: Airflow
Collect Data (roles: App Developer, Front Engineer, Backend Engineer, Data Engineer, ML Developer, Domain Expert; infra: Data Cluster, MySQL/Redis, Object Storage)
› Define the data to collect, together with the domain expert
› What we collect: users' views and clicks, content objects / metadata
› Some data is dumped from the data cluster to the serving side
Data Analysis (infra: Data Cluster, Jupyter)
› Validate the data
› Analyze the data: EDA & visualization
Train ML Model (infra: Data Cluster, Jupyter, Production + GPUs)
› Aggregate data
› Train the model
› Prepare data for inference
Model Serving (infra: Serving Cluster)
› Deploy the trained model
› Load balancing
Production (infra: Serving Cluster, Filtering, API)
› Map the API to the model
› Monitoring
› Apply the required filters
› In each stage there are tasks such as data preprocessing, modeling, and so on
› Each task must be executed in an order that follows its data dependencies
› Things to manage
  › Dependencies
  › Task execution schedule & status monitoring (a minimal Airflow sketch follows)
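The deck lists Airflow for workflow automation, so here is a minimal sketch of how such dependencies and a schedule could be declared; the DAG id, task names, schedule, and task bodies are illustrative assumptions, not the actual pipeline.

```python
# Minimal Airflow DAG sketch: tasks run in data-dependency order and the
# scheduler tracks execution status. All names and the schedule are
# illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def preprocess():
    pass  # e.g. build features from the raw logs


def train():
    pass  # e.g. train the embedding model


def deploy():
    pass  # e.g. push the trained model to the serving cluster


with DAG(
    dag_id="ml_pipeline_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # daily update, as in the follow-embedding case
    catchup=False,
) as dag:
    t_preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess)
    t_train = PythonOperator(task_id="train", python_callable=train)
    t_deploy = PythonOperator(task_id="deploy", python_callable=deploy)

    # data dependency: preprocess -> train -> deploy
    t_preprocess >> t_train >> t_deploy
```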
› Understand the concept of the machine learning pipeline through some cases
› Some cases do not perfectly match the ML pipeline perspective
› Follow Embedding: matches the ML pipeline steps perfectly
› Offline Recommendation (for LINE Smart Channel): uses a pre-trained model
Case: Follow Embedding
› Create only the embeddings of the authors of the posts to recommend
› Follow relation
  › A directed relation (e.g. "A follows B" is represented as "A→B")
  › If A follows B, then A can subscribe to B's posts even if A and B are not friends
› Problem: too many follow relations
  › JP, TH: about 2 billion; TW: about 0.5 billion
› Train a model that generates an embedding for each user
› A user's embedding represents the users that user follows
› The higher the similarity between two embeddings, the more similar the followed users (see the sketch below)
› Updated daily
› Difficulty: too many relations
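A minimal sketch of the similarity idea, assuming cosine similarity between embedding vectors (the talk does not state the exact similarity measure); the vectors are made up.

```python
# Toy comparison of two user embeddings with cosine similarity: a higher
# score means the two users follow a more similar set of accounts.
# The vectors here are made up; real ones come from the trained model.
import numpy as np


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


user_a = np.array([0.1, 0.8, -0.3])
user_b = np.array([0.2, 0.7, -0.1])
print(cosine_similarity(user_a, user_b))
```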
Case: Follow Embedding / Collect Data
› Data to collect: follow relations
› Uses data that already exists in the data cluster
Case: Follow Embedding / Data Analysis
› Find out whether the relation data is meaningful (an EDA sketch follows)
  › Average number of users a user follows, average number of followers
  › Ratio of the top-k users by follower count
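A sketch of this kind of EDA with Spark, since the data lives on Hive/HDFS; the table and column names are assumptions, and "ratio of the top-k users" is interpreted here as the share of follow edges pointing to the k most-followed users.

```python
# EDA sketch over the follow-relation table (names are assumptions):
# average followed count, average follower count, and the share of all
# follow edges that point to the top-k most-followed users.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("follow_relation_eda").enableHiveSupport().getOrCreate()
edges = spark.table("graph_db.follow_edges")  # columns: src (follower), dst (followed)

avg_followed = edges.groupBy("src").count().agg(F.avg("count")).first()[0]
avg_follower = edges.groupBy("dst").count().agg(F.avg("count")).first()[0]

k = 1000
top_k = edges.groupBy("dst").count().orderBy(F.desc("count")).limit(k)
top_k_ratio = top_k.agg(F.sum("count")).first()[0] / edges.count()

print(avg_followed, avg_follower, top_k_ratio)
```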
Case: Follow Embedding / Feature Engineering
› Tried graph-based embedding models (APP, ASNE, and so on): they only worked for small-scale relations
› PBG (PyTorch-BigGraph): can model the whole relation graph (a rough config sketch follows)
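A rough PyTorch-BigGraph config sketch for a single directed "follow" relation; the paths, partition count, dimension, and hyperparameters are assumptions, not the values used in the talk.

```python
# Rough PyTorch-BigGraph config sketch for one entity type (user) and one
# directed relation (follow). Paths and hyperparameters are assumptions.
def get_torchbiggraph_config():
    return dict(
        entity_path="data/follow_graph",            # preprocessed entities
        edge_paths=["data/follow_graph/edges"],     # preprocessed edge lists
        checkpoint_path="model/follow_embedding",   # where embeddings are written
        entities={"user": {"num_partitions": 16}},  # partition so the graph fits in memory
        relations=[{"name": "follow", "lhs": "user", "rhs": "user", "operator": "none"}],
        dimension=128,
        comparator="cos",       # cosine similarity between embeddings
        num_epochs=10,
        num_edge_chunks=10,
        batch_size=10000,
        lr=0.01,
    )
```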
Case: Follow Embedding / Train ML Model
› Aggregate data, train the model, prepare data for inference
› CPU-based PBG: running time > 1 day (measured on the JP relations)
› GPU-based PBG: set up the environment and fixed some code; running time about 6 hours (measured on the JP relations)
Case: Follow Embedding / Model Serving
› Find users who follow users similar to me (based on the followed users)
› User recommendation: recommend users I have not followed yet, based on the users that similar users follow
› Post recommendation: recommend posts written by similar users
› Create the embeddings first, then save them for production use (Jupyter → embedding store → Serving Cluster); a sketch of this follows
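A small sketch of pushing the created embeddings into the serving store, assuming Redis (which the deck lists as the IMDB for serving) and a hypothetical key format.

```python
# Push trained user embeddings into Redis so the serving side can look them
# up with low latency. Host, port, and the key format are assumptions.
import json

import numpy as np
import redis

r = redis.Redis(host="serving-redis", port=6379)


def save_embeddings(embeddings):
    """embeddings: dict mapping user_id -> numpy vector."""
    pipe = r.pipeline()
    for user_id, vector in embeddings.items():
        pipe.set(f"follow_emb:{user_id}", json.dumps(vector.tolist()))
    pipe.execute()


def load_embedding(user_id):
    raw = r.get(f"follow_emb:{user_id}")
    return np.array(json.loads(raw)) if raw is not None else None
```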
Case: Follow Embedding / Production
› Problems: the data size is too large, and computing the similarity between all users in real time cannot satisfy the time requirements
› Offer pre-computed inference results for user/post recommendation
› Use embedding-based search (sketched below)
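One common way to implement embedding-based search is a nearest-neighbor index over the user embeddings; the talk does not name a library, so FAISS is used here only as an illustrative choice, with toy data.

```python
# Embedding-based search sketch: index all user embeddings and retrieve the
# top-k most similar users ahead of time (pre-inference), instead of
# computing all-pairs similarity at request time. FAISS is an assumption.
import faiss
import numpy as np

dim = 128
user_vectors = np.random.rand(100_000, dim).astype("float32")  # toy embeddings
faiss.normalize_L2(user_vectors)        # normalize so inner product == cosine

index = faiss.IndexFlatIP(dim)          # exact inner-product search
index.add(user_vectors)

query = user_vectors[:1]                # e.g. one user's embedding
scores, neighbor_ids = index.search(query, 10)  # top-10 most similar users
```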
Case: Offline Recommendation (for LINE Smart Channel)
› Recommend a personalized post for each user, for Smart Channel
› Offline method: pre-compute inference for every user after the model is updated
› Problem: too many users to run inference on
› Use the most recently trained model
› Must not affect the service that is currently serving
› Provide recommendation results in the format Smart Channel expects
› Difficulty: the present system does not support offline serving
Collect Data / Data Analysis / Feature Engineering / Train Model
› Use already-processed data and an already-trained model
› Only need to copy the model file and the corresponding data
Offline inference per user
› Smart Channel needs a recommendation for ALL users, so a list of all users is required
› An inference test with the existing system takes about 72 hours
› The running service cannot produce recommendations for all users, due to lack of resources, latency, etc.
› Set up a separate cluster for offline serving: an inference test with the new system takes < 5 hours
› Production
  › Save the inference results on HDFS first (see the sketch below)
  › Insert the results into Smart Channel
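A sketch of the offline-serving flow under stated assumptions: the Hive table, output path, and the toy recommend() are hypothetical; only the overall shape (batch inference for all users on a separate cluster, results written to HDFS first) follows the talk.

```python
# Offline batch inference for ALL users, writing the result to HDFS first;
# a separate step then inserts it into Smart Channel. Names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("offline_recommendation")
         .enableHiveSupport()
         .getOrCreate())

# hypothetical Hive table holding the list of all users
users = spark.table("service.all_users").select("user_id")


def recommend(user_id):
    # placeholder for inference with the most recently trained model
    return f"post_for_{user_id}"


recs = (users.rdd
        .map(lambda row: (row.user_id, recommend(row.user_id)))
        .toDF(["user_id", "recommended_post"]))

recs.write.mode("overwrite").parquet("hdfs:///tmp/smart_channel_recommendations")
```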
› Approach with an ML pipeline perspective
  › Divide tasks according to the stages of the ML pipeline
  › Define dependencies between tasks according to their data relationships
  › Run the tasks continuously
› With an ML pipeline
  › Visualize the tasks of a project (service)
  › Automate the ETL / train / deploy process to enable tracking and debugging
  › Easy to find improvements in both the model and the infra
Test Environment != Serving Environment
› The test environment uses Jutopia, whose environment is organized per user (data scientist, model developer)
› Difference: packages / resources / ACLs … are different
› Solution: … that runs on the Jupyter notebook environment
Storage for a service with large-scale data
Solution
› For analysis / training purposes
  › Large-scale data
  › Raw data is saved on the data cluster
  › Features / models are also saved on Hive/HDFS
› For the service
  › Only a part of the data is needed
  › Low latency, large-scale traffic
  › Dump the needed data from Hive / HDFS (a sketch follows)
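A sketch of the dump step, assuming the serving RDB is MySQL (listed in the infra slide); the table names, columns, and connection settings are made up.

```python
# Dump only the subset of data the service needs from Hive/HDFS into the
# serving MySQL database via JDBC. All names and settings are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dump_serving_data")
         .enableHiveSupport()
         .getOrCreate())

# the full feature table lives on Hive/HDFS; the service needs only a few columns
serving_df = (spark.table("feature_db.follow_recommendation")
              .select("user_id", "recommended_users"))

(serving_df.write
    .format("jdbc")
    .option("url", "jdbc:mysql://serving-mysql:3306/recsys")
    .option("dbtable", "follow_recommendation")
    .option("user", "writer")
    .option("password", "********")
    .mode("overwrite")
    .save())
```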