
Explore in Timeline : from a ML pipeline perspective

Hochul Kim
LINE Plus, Data Science Dev & ML Platform Dev, Machine Learning Engineer
https://linedevday.linecorp.com/2020/ja/sessions/5543
https://linedevday.linecorp.com/2020/en/sessions/5543

LINE DevDay 2020

November 26, 2020

Transcript

  2. Agenda

    › About Explore in Timeline › What is a Machine Learning Pipeline? › Explore in Timeline: from a ML Pipeline perspective › Summary › Next mission
  3. About Explore in Timeline

  4. About Explore in Timeline

    › About Explore recommendation service › What is Explore in Timeline?
  5. About Explore in Timeline What is an Explore in Timeline?

  13. About Explore in Timeline - Available Features

    › Post › Post contents › From other LINE Family services › News, Live, etc. › User › User’s preference / category
  14. About Explore in Timeline About Explore recommendation service RAW data

  15. About Explore in Timeline About Explore recommendation service Feature Model

    RAW data User feature Post feature …
  16. About Explore in Timeline About Explore recommendation service Feature Model

    Filtering RAW data
  17. About Explore in Timeline About Explore recommendation service Feature Model

    Filtering Recommendation Model RAW data
  18. About Explore in Timeline About Explore recommendation service Feature Model

    Filtering API RAW data Recommendation Model
  20. About Explore in Timeline - But we have to consider…

    › Large Scale Data › Service Requirement › Resource Limitation › Training Time › Model Complexity › etc.
  21. What is a ML Pipeline?

  22. What is a ML Pipeline?

    Features › High cost in terms of time and complexity › Tasks that are independent but depend on each other from a service perspective
    Data driven, but not a data pipeline › Requires understanding of domain knowledge › The goal is to provide a service, so the service environment must be understood and considered
    Definition › The continuous process of planning and managing the tasks of Data Science, ML Model Development, Production, and Infra in order to deliver Machine Learning technology as a service.
  23. What is a ML Pipeline?

    › Explore › A type of machine learning based service › How does it differ from a classical service?
  24. What is a ML Pipeline? Machine Learning Approach vs. Classical Approach

    Classical › Given: Logic, Data › Wanted: Result
    ML › Given: Data, Result › Wanted: Model
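
The contrast on this slide can be illustrated with a minimal sketch. It is not taken from the talk: the post-length rule, the threshold values, and the function names are hypothetical, and the "model" is reduced to a single learned threshold. In the classical case the logic is written by hand; in the ML case the logic is derived from data and the results we want.

```python
# Classical approach: the logic is given, and we apply it to data to get a result.
def classical_is_long(post_length: int) -> bool:
    return post_length > 100  # hand-written rule (hypothetical threshold)


# ML approach: data and wanted results are given; the model is what we look for.
# Here the "model" is just the threshold that best reproduces the labels.
def fit_threshold(lengths, labels):
    candidates = sorted(set(lengths))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((x > t) == y for x, y in zip(lengths, labels)) / len(labels)
        if acc > best_acc:  # keep the first threshold with the best accuracy
            best_t, best_acc = t, acc
    return best_t


lengths = [10, 40, 80, 120, 200, 300]               # data
labels = [False, False, False, True, True, True]    # wanted results
threshold = fit_threshold(lengths, labels)          # learned "model"
```

The learned threshold plays the role that the hand-written `100` plays in the classical function, which is why the ML side additionally needs data collection, training, and serving steps.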
  25. Machine Learning Part Classical Service What is a ML Pipeline?

    Machine Learning Approach vs. Classical Approach Model Learning Process Inference Process Result Data Input Data Training Data
  26. What is a ML Pipeline? Machine Learning Approach vs. Classical

    Approach Feature Extraction Data Verification Data Collection Data Transform Process Management Modeling Serving Infra Automation Data Infra GPU Analysis Tool & Knowledge Security
  27. What is a ML Pipeline? Machine Learning Approach vs. Classical

    Approach Feature Extraction Data Verification Data Collection Data Transform Process Management Modeling Serving Infra Automation Data Infra GPU Security Analysis Tool & Knowledge Data Science & ML System & Infra
  28. What is a ML Pipeline? Machine Learning Approach vs. Classical

    Approach Feature Extraction Data Verification Data Collection Data Transform Process Management Modeling Serving Infra Automation Data Infra GPU Analysis Tool & Knowledge Data Science & ML System & Infra Model Core Security
  29. What is a ML Pipeline? ML Pipeline by Data flow

    Collect data › Prepare Data Infra › Collect & Store Data › Security
    Data Analysis › Validate Data › EDA
    Feature Eng › Create Feature › Prepare Feature for train & service
    Train ML Model › GPU / Computing › Develop & Tune ML Model
    Model Serving › Prepare Serving Infra › Deploy trained model for service
    Production › API for Service › Service Monitoring
    Scheduling / Managing
  30. Explore in Timeline: from a ML Pipeline perspective

  31. Explore in Timeline: from a ML Pipeline perspective

    › Steps of pipeline › Infra Structure of Explore › Case Study
  32. Infra Structure of Explore

  33. Infra Structure of Explore

    Automation / Scheduling › Infra: MySQL/Redis, Serving Cluster, Filtering, API, Object Storage, Data Cluster, Jupyter › User approach › Operation
  34. Infra Structure of Explore - Infra in use

    Raw Data Storage › Data Cluster (with IU): HDFS + Hive with Spark › Kafka
    Data Analysis and Modeling › Jupyter (with Jutopia) › On-demand Jupyter environment › Distributed GPU training environment
    Serving Data Storage › RDB: MySQL › IMDB: Redis Cluster
    Serving Platform › Kubernetes Serving Cluster › Clipper › API
    Workflow automation › Airflow
  40. Steps of Pipeline

  41. Steps of Pipeline - General approach: ML Pipeline by Data flow

    Collect data › Prepare Data Infra › Collect & Store Data › Security
    Data Analysis › Validate Data › EDA
    Feature Eng › Create Feature › Prepare Feature for train & service
    Train ML Model › GPU / Computing › Develop & Tune ML Model
    Model Serving › Prepare Serving Infra › Deploy trained model for service
    Production › API for Service › Service Monitoring
    Scheduling / Managing
  42. Collect Data

    › Define data to collect › With domain expert › What we collect › User’s views and clicks › Content objects / metadata › Some data is dumped from the cluster to serving
    Infra: Data Cluster, MySQL/Redis, Object Storage
    Roles: App Developer, Front Engineer, Backend Engineer, Data Engineer, ML Developer, Domain Expert
  43. Data Analysis

    › Validate data › Analyze data › EDA & visualize
    Infra: MySQL/Redis, Object Storage, Data Cluster, Jupyter
  44. Feature Engineering

    › Develop features › Execution type › Batch: Jupyter + papermill › Stream: Kafka + Spark
    Infra: MySQL/Redis, Jupyter, Serving Cluster + Kafka, Object Storage, Data Cluster
  46. Train ML Model

    › Aggregate data › Train model › Prepare data for inference
    Infra: MySQL/Redis, Filtering, Data Cluster, Jupyter + GPUs
  47. Model Serving

    › Deploy trained model › Load balancing
    Infra: MySQL/Redis, Serving Cluster
  48. Production

    › Mapping API with model › Monitoring › Apply required filters
    Infra: MySQL/Redis, Serving Cluster, Filtering, API, Object Storage, Data Cluster
  49. Automation / Scheduling

    Infra: MySQL/Redis, Serving Cluster, Filtering, API, Object Storage, Data Cluster, Jupyter › User approach › Operation
  50. Automation / Scheduling

    › Task › Providing the service involves tasks such as data preprocessing, modeling, and so on. › Each task must be executed in an order that follows its data dependencies › Things to manage › Dependencies › Task execution schedule & status monitoring
  51. Automation / Scheduling

    › Use Airflow to manage tasks › Example workflow
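
The deck manages its tasks with Airflow, but the workflow itself is shown only as a screenshot. The sketch below is therefore not the author's DAG. It uses only the Python standard library to illustrate the underlying idea of running tasks in an order that respects data dependencies. All task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each task maps to the set of tasks it depends on.
dependencies = {
    "collect_data": set(),
    "validate_data": {"collect_data"},
    "build_features": {"validate_data"},
    "train_model": {"build_features"},
    "prepare_inference_data": {"build_features"},
    "deploy_model": {"train_model", "prepare_inference_data"},
}


def run_in_order(deps):
    """Visit tasks so that each one comes after all of its dependencies."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        executed.append(task)  # a real scheduler would launch the task here
    return executed


order = run_in_order(dependencies)
```

A scheduler such as Airflow adds what this sketch leaves out: time-based triggering, retries, and status monitoring for each task.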
  52. Automation / Scheduling Duration Gantt

  53. Case Study

  54. Case Study

    In the real world › Various types of cases › Some cases do not perfectly match the ML Pipeline perspective
    Understand the concept of the Machine Learning Pipeline through some cases › Follow embedding › Perfectly matches the ML pipeline steps › Offline Recommendation (for LINE Smart Channel) › Uses a pre-trained model
  55. Case: Follow embedding

  56. Case: Follow embedding - About Case

    › Background › Previous user embedding › Created embeddings only for the authors of posts to recommend › Follow relation › A directed relation (e.g. ‘A follows B’ is represented as ‘A→B’) › If A follows B, A can subscribe to B’s posts even if A and B are not friends › Problem › Too many follow relations › JP, TH: about 2 billion; TW: about 0.5 billion
  57. Case: Follow embedding - Follow relations to embedding

    (figure: follow relations are turned into one embedding vector per user: User 1 [ … ], User 2 [ … ], …, User N [ … ])
  58. Case: Follow embedding - About Case

    › Goal › Implement a model that generates an embedding for each user › A user’s embedding represents the users that user follows › The higher the similarity between two embeddings, the more similar the users they follow › Daily update › Difficulty › Too many relations
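
The slide does not say which similarity measure is used. As a minimal sketch, assuming cosine similarity and hypothetical 3-dimensional embeddings, users who follow similar accounts end up with a score close to 1:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones would come from the trained model.
alice = [1.0, 0.0, 1.0]
bob = [2.0, 0.0, 2.0]    # same direction as alice: similar followed users
carol = [0.0, 1.0, 0.0]  # orthogonal to alice: unrelated followed users

sim_alice_bob = cosine_similarity(alice, bob)
sim_alice_carol = cosine_similarity(alice, carol)
```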
  59. Case: Follow embedding - Collect Data

    › Define data to collect › With domain expert › What we collect › User’s views and clicks › Content objects / metadata › Some data is dumped from the cluster to serving
    Infra: Data Cluster, MySQL/Redis, Object Storage
  60. Case: Follow embedding - Collect Data

    › Data to collect › Follow relations › Uses data that already exists in the cluster
    Infra: Data Cluster, MySQL/Redis, Object Storage
  61. Case: Follow embedding - Data Analysis

    › Validate data › Analyze data › EDA & visualize
    Infra: MySQL/Redis, Object Storage, Data Cluster, Jupyter
  62. Case: Follow embedding - Data Analysis

    › Data analysis › Check whether the relation data is meaningful › Average number of users each user follows, average number of followers › Ratio of top-k users based on follower count
    Infra: MySQL/Redis, Object Storage, Data Cluster, Jupyter
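
The statistics named on this slide can be sketched on a toy edge list. The edges are hypothetical. The reading of "ratio of top-k users" as the share of all follow relations that point at the k most-followed users is also an assumption, since the slide does not define it.

```python
from collections import Counter

# Hypothetical follow relations as (follower, followee) pairs, i.e. 'A→B'.
follows = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("d", "b"), ("c", "a")]


def follow_stats(edges, k):
    following = Counter(src for src, _ in edges)  # users followed, per follower
    followers = Counter(dst for _, dst in edges)  # followers, per followee
    # Averages are taken over users that appear at least once on that side.
    avg_following = len(edges) / len(following)
    avg_followers = len(edges) / len(followers)
    # Share of all relations that point at the k most-followed users.
    top_k_edges = sum(count for _, count in followers.most_common(k))
    return avg_following, avg_followers, top_k_edges / len(edges)


stats = follow_stats(follows, k=1)
```

At the scale quoted in the deck (billions of relations) the same aggregates would be computed on the data cluster rather than in memory.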
  63. Case: Follow embedding - Feature Engineering

    › Develop features › Execution type › Batch: Jupyter + papermill › Stream: Kafka + Spark
    Infra: MySQL/Redis, Jupyter, Serving Cluster, Object Storage, Data Cluster
  64. Case: Follow embedding - Feature Engineering

    › Feature engineering › Tried graph based embedding models › Tried APP, ASNE and so on › Only worked for a small number of relations › PBG (PyTorch BigGraph) › Can model the whole set of relations
    Infra: MySQL/Redis, Jupyter, Serving Cluster, Object Storage, Data Cluster
  65. Case: Follow embedding - Train ML Model

    › Aggregate data › Train model › Prepare data for inference
    Infra: MySQL/Redis, Filtering, Data Cluster, Jupyter + GPUs
  66. Case: Follow embedding - Train ML Model

    › CPU based PBG › Running time > 1 day (measured with JP relations) › GPU based PBG › Set up an environment to run it & fixed some code › Running time about 6 hours (measured with JP relations)
    Infra: MySQL/Redis, Filtering, Data Cluster, Jupyter + GPUs
  67. Case: Follow embedding - Model Serving

    › Model Serving › Find users whose followed users are similar to mine › User recommendation › Based on the users that similar users follow, recommend users that I have not followed yet › Post recommendation › Recommend posts written by similar users
  68. Case: Follow embedding - Model Serving

    › Deploy trained model › Load balancing
    Infra: MySQL/Redis, Serving Cluster
  69. Case: Follow embedding - Model Serving

    › Find users whose followed users are similar to mine › Based on the followed users › User recommendation › Recommend users that I have not followed yet › Post recommendation › Recommend posts written by similar users
    Infra: MySQL/Redis, Serving Cluster, Jupyter
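
The user-recommendation step can be sketched as follows, assuming the list of similar users has already been obtained from the embeddings. The function name, the sample data, and the ranking rule (count how many similar users follow a candidate) are illustrative assumptions, not the deck's implementation.

```python
def recommend_users(me, similar_users, following, limit=3):
    """Suggest users that similar users follow but `me` does not follow yet."""
    already = following.get(me, set()) | {me}
    scores = {}
    for other in similar_users:
        for candidate in following.get(other, set()):
            if candidate not in already:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Rank by how many similar users follow the candidate; break ties by name.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:limit]


# Hypothetical follow sets: user -> users they follow.
following = {
    "me": {"u1", "u2"},
    "s1": {"u1", "u3", "u4"},
    "s2": {"u2", "u3", "me"},
}
suggestions = recommend_users("me", ["s1", "s2"], following)
```

Post recommendation would follow the same pattern, with posts written by the similar users as candidates instead of the users they follow.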
  70. Case: Follow embedding - Model Serving

    › Create embeddings first › Save the created embeddings for production use
    Infra: MySQL/Redis, Jupyter, Embedding, Serving Cluster
  71. Case: Follow embedding - Production

    › Production › Hard to calculate similarity between all users in real time › Offer precomputed inference results for user/post recommendation › Using embedding based search
  72. Case: Follow embedding - Production

    › Problems › Data size is too large › Can’t satisfy time requirements
    Infra: MySQL/Redis, Serving Cluster, API, Object Storage, Data Cluster
  73. Case: Follow embedding - Production

    Input › Desired output (figure: post embedding vectors, Post 1 [ … ], Post 2 [ … ], …, Post N [ … ], and the posts to return)
  74. Case: Follow embedding - Production: Embedding Search Engine

    › Collection of vectors: Post 1 [ … ], Post 2 [ … ], …, Post N [ … ] › Query by vector › List of similar posts by vector, with score
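
The deck does not name the search engine or its scoring function. As a rough sketch of the interface described here (a collection of vectors, a query vector, and a scored list of similar posts), the following exhaustive scan ranks hypothetical post vectors by dot product. A production engine would typically use an approximate nearest-neighbour index instead of scanning everything.

```python
import heapq


def search(query, collection, top_k=2):
    """Return the top_k (post_id, score) pairs, best score first."""
    scored = (
        (post_id, sum(q * v for q, v in zip(query, vector)))
        for post_id, vector in collection.items()
    )
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Hypothetical 2-dimensional post embeddings.
posts = {
    "post1": [1.0, 0.0],
    "post2": [0.5, 0.5],
    "post3": [0.0, 1.0],
}
results = search([1.0, 0.0], posts)
```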
  75. Case: Follow embedding - Production

    Search Engine › Redis (figure: a query vector [ … ] and the posts returned for it)
  76. Case: Follow embedding - Production

    Infra: MySQL/Redis, Jupyter, Embedding, Serving Cluster, API, Search Engine › Offline Inference
  77. Case: Offline recommendation

  78. Case: Offline recommendation - About Case

    › Background › Required recommendation › Recommend a personalized post to each user › For Smart Channel › Offline method › Inference must be precomputed for all users after each model update › Problem › Too many users to run inference for
  79. Case: Offline recommendation - About Case

    › Goal › Use the most recently trained model › Do not affect the service that is currently serving › Provide recommendation results in the format Smart Channel requires › Difficulty › The present system does not support offline serving
  80. Case: Offline recommendation - Existing tasks to serving

    Task-1 › Task-2 › … › Task-n › Processed Data › Trained Model › Inference Results
  81. Case: Offline recommendation - Collect Data / Data Analysis / Feature Engineering / Train Model

    › Collect Data / Data Analysis / Feature Engineering / Train Model › Use already processed data and the trained model › Only need to copy the model file and the corresponding data › Offline inference for users › Smart Channel needs a recommendation for ALL users › Needs a list of all users › Inference test with the existing system › Takes about 72 hours
  82. Case: Offline recommendation - Model Serving / Production

    › Model Serving › The running service can’t produce recommendations for all users › Due to lack of resources, latency, etc. › Set up a separate cluster for offline serving › Inference test with the new system: < 5 h › Production › Save inference results on HDFS first › Insert the results into Smart Channel
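
The offline flow above (run inference for every user in bulk, store the results, then load them into the consumer) can be sketched as below. This is an illustration only: `fake_predict` stands in for the trained model, the batch size is arbitrary, and an in-memory dict takes the place of HDFS and Smart Channel.

```python
from itertools import islice


def batched(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def offline_inference(user_ids, predict, batch_size):
    """Precompute one recommendation per user, batch by batch."""
    results = {}
    for batch in batched(user_ids, batch_size):
        results.update(zip(batch, predict(batch)))
    return results


# Hypothetical stand-in for the trained model: one post id per user.
def fake_predict(batch):
    return [f"post_for_{user}" for user in batch]


recommendations = offline_inference(["u1", "u2", "u3"], fake_predict, batch_size=2)
```

Because the batches are independent, they can be spread over separate inference instances, which is the kind of setup the dedicated offline serving cluster in the deck provides.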
  83. Case: Offline recommendation - Model Serving / Production

    Offline serving Cluster › Inference Instance › Model › Data for Serving › HDFS › Smart Channel
  84. Summary

  85. Summary

    › ML based service › Various types of tasks › High complexity › From logging to serving › From Data Science to Frontend/Backend › From Math to Computer Science › Periodic
  86. Summary - If you want to serve a ML based service

    › Approach it from the ML Pipeline perspective › Divide tasks according to the stages of the ML pipeline › Define dependencies between tasks according to their data relationships › Run tasks continuously › With the ML Pipeline › Visualize the tasks of the project (service) › Automate the ETL / train / deploy process to enable tracking and debugging › Easy to find things to improve in both model and infra
  87. Next Mission

  88. Problems to solve

    › Separated data and feature › Test Environment != Serving Environment
  89. Serving Cluster Docker Images Prepare Serving Develop Process Jupyter Develop

    Env Jupyter Notebook Convert Notebook Build Image Docker Images Prepare Data MySQL/Redis Data Cluster Docker Images
  90. Next mission - Test Environment != Serving Environment

    Test Environment != Serving Environment › The test environment uses Jutopia, where each environment is set up by its user (data scientist, model developer) › Difference › Packages / Resource / ACL … are different
    Solution › Preparing a cluster that runs on the Jupyter notebook environment
  91. Docker Images Prepare Serving Test Environment != Serving Environment Jupyter

    Develop Env + Serving Cluster Jupyter Notebook Docker Images Prepare Data MySQL/Redis Data Cluster Docker Images
  92. Next mission - Separated data and feature

    Storage for analysis / training purposes › Large scale data › Raw data is saved on the data cluster › Features / models are also saved on Hive/HDFS
    For service › Needs only a part of the data › Low latency, large scale traffic › Needed data is dumped from Hive / HDFS
    Solution › Offer Feature as a Service › For large scale data
  93. Docker Images Separated data and feature Jupyter Develop Env +

    Serving Cluster Jupyter Notebook Docker Images Feature Storage Data Cluster Docker Images
  94. Thank you