LINE TECHPULSE 2020 - Scaling Machine Learning at LINE

Scaling Machine Learning at LINE by Shawn Tsai & Penny Sun @ LINE TECHPULSE 2020 https://techpulse.line.me/

line_developers_tw

December 18, 2020

Transcript

  1. Scaling Machine Learning at LINE. Shawn Tsai, Penny Sun / LINE TAIWAN Data Dev
  2. Agenda
    › ML-enhanced LINE Services
    › Scaling Data Team
    › Key Roles and Activities in ML Pipeline
    › Data Platform at LINE
    › Scaling Machine Learning
    › ML Platform at LINE
    › Real Cases
  3. ML-enhanced LINE Services

  4. Machine Learning Everywhere

  5. Scaling Data Team

  6. Challenges When Scaling Data Team: Team Scale vs. Data Scale
    Axes: Data Scale, Team Scale
    Phases: Phase 1: DE; Phase 2: MLE; Phase 3: DA, MSE; Phase 4: PM, BIZ
    Challenges: Pipeline Standardization, ML Model Management, Collaboration, Data-driven Culture
  7. Roles and Responsibilities
    › Data Engineer (DE)
      Responsibilities: ✓ Build and optimize data pipeline architecture ✓ Assemble large, complex data sets that meet requirements
      Skills: Big data infra, SQL, ETL, message queuing
    › ML Engineer (MLE)
      Responsibilities: ✓ Select appropriate datasets and data representation methods ✓ Research and implement appropriate ML algorithms
      Skills: Machine learning, deep learning, CV, NLP, Speech
    › ML Service Engineer (MSE)
      Responsibilities: ✓ Build and scale machine learning infrastructure ✓ Monitor model performance
      Skills: System infrastructure design, DevOps
    › Data Analyst (DA)
      Responsibilities: ✓ Interpret data, analyze results using statistical techniques ✓ Identify, analyze, and interpret trends or patterns in complex data sets
      Skills: Statistics, Data Visualization, Business Knowledge
  8. Key Roles and Activities in ML Pipeline

  9. ML Pipeline: Prepare, Discover, Develop, Train, Test, Deploy

  10. Key Roles in ML Pipeline
    Stages: Prepare, Discover, Develop, Train, Test, Deploy
    Roles: MLE, DA, DE, PM, BIZ, MSE
    Themes: Domain Knowledge, Collaboration, Optimization, Monitoring
  11. Key Activities in ML Pipeline
    Stages: Prepare, Discover, Develop, Train, Test, Deploy
    Roles: MLE, DA, DE, MSE, PM, BIZ
    Activities: Data Labeling, Model Reusability, Feature Reusability, Model Analysis, Serving Scale-out, Model Skew, Model Decay, Computing Resources, Hyperparameter Tuning, Model Version Control, Model Development
  12. Data Platform at LINE: Information Universe (IU)
  13. Mission & Goal: Data Governance, Unified Self-Service Data Platform (IU), Machine Learning, Data Science
  14. Information Universe (IU): The Data Platform at LINE

  15. Scaling Machine Learning: MLOps Adoption
  16. MLOps = ML + Dev + Ops
    Activities: Extract data, Design model, Develop model, Testing, CI, CD, Feedback, Monitoring
  17. System Design for MLOps
    Environments: DEVELOPMENT, PRODUCTION
    Components: Experiment, Source code, Source repository, Pipeline CI, Package, Pipeline CD, Automated Pipeline, Continuous Training, data/feature store, Trained model, Model registry, Model CD, Prediction service, Monitoring
    Infrastructure: GPU, K8S
    Reference: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
  18. ML Platform at LINE: Machine Learning Universe (MLU)
  19. Machine Learning Universe (MLU): Data Governance, Unified Self-Service Data Platform (IU), Machine Learning, Data Science, Unified Self-Service Machine Learning Platform (MLU)
  20. MLU with MLOps
    Environments: DEVELOPMENT, PRODUCTION
    Components: Experiment, Source code, Source repository, Pipeline CI, Package, Pipeline CD, Automated Pipeline, Continuous Training, data/feature store, Trained model, Model registry, Model CD, Prediction service, Monitoring
    Infrastructure: GPU, K8S
    MLU modules: Portal, Pipeline Automation, Training Platform, Feature Store, Serving Platform, Monitoring
    Reference: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
  21. Feature Store
    Stages: Prepare, Discover, Develop, Train, Test, Deploy
    Roles: MLE, DA, DE, PM, BIZ, MSE
    Components: Raw Data, Feature Engineering, Feature Store, API, Training, Models
    Import aggregated data. Discover and get reusable features.
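
The two feature store activities on this slide (importing aggregated data, then discovering and fetching reusable features) can be sketched with a toy in-memory store. This is only an illustration: `FeatureStore`, its methods, and the feature names are hypothetical and are not the MLU API.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Toy in-memory feature store (hypothetical, for illustration only)."""

    _features: dict = field(default_factory=dict)

    def register(self, name, values, tags=()):
        """Import an aggregated feature: a mapping of entity id -> value."""
        self._features[name] = {"values": dict(values), "tags": set(tags)}

    def discover(self, tag):
        """Return the sorted names of all features carrying the given tag."""
        return sorted(n for n, f in self._features.items() if tag in f["tags"])

    def get(self, name, entity_id, default=None):
        """Fetch one feature value for one entity, or the default if absent."""
        return self._features[name]["values"].get(entity_id, default)


# Hypothetical features imported once and reused by several teams.
store = FeatureStore()
store.register("purchases_30d", {"u1": 3, "u2": 0}, tags=["shopping"])
store.register("searches_7d", {"u1": 12}, tags=["shopping", "search"])
shopping_features = store.discover("shopping")
```

A real feature store would add persistence, point-in-time lookups, and access control; the sketch only shows the register/discover/get shape.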
  22. Portal: Collaborative development
    › Jupyter Notebook
    › Jnotebook reader
    › ReviewNB
  23. Portal: Collaborative development
    › Jupyter Notebook
    › Jnotebook reader
    › ReviewNB
  24. Portal: Collaborative code review
    › Jupyter Notebook
    › Jnotebook reader
    › ReviewNB
  25. Pipeline Automation
    › Pipeline Editor
    › Airflow (automation)
    Edit pipelines in a visual UI.
  26. Pipeline Automation
    › Pipeline Editor
    › Airflow (automation)
    Trigger the automated pipeline to avoid model training/serving skew.
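
One simple way to decide when to trigger the automated pipeline is to compare a feature's distribution at training time with what the serving side sees. The sketch below is an assumption for illustration, not the check used on the slide: the function names, the mean-shift score, and the `tolerance` value are all hypothetical.

```python
from statistics import mean, pstdev


def skew_score(train_values, serving_values):
    """Shift of the serving mean from the training mean, in training std devs."""
    spread = pstdev(train_values) or 1.0  # avoid dividing by zero for constant features
    return abs(mean(serving_values) - mean(train_values)) / spread


def should_trigger_pipeline(train_values, serving_values, tolerance=0.5):
    """Return True when the serving data has drifted past the tolerance."""
    return skew_score(train_values, serving_values) > tolerance


# Hypothetical feature values seen at training time and at serving time.
train = [1, 2, 3, 4, 5]
same_distribution = [1, 2, 3, 4, 5]
shifted_distribution = [4, 5, 6, 7, 8]
```

In a scheduler such as Airflow, a check like this would typically gate a retraining task; that wiring is omitted here.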
  27. Training Platform
    › NSML (GPU Cluster)
    › AutoML
    Adjustable GPU resources.
  28. Training Platform
    › NSML (GPU Cluster)
    › AutoML
    Get the best model by auto-tuning hyperparameters.
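
The idea of auto-tuning hyperparameters can be sketched with an exhaustive grid search, a deliberately simple stand-in for AutoML. Everything here is an assumption for illustration: the search space, the `toy_objective` (a placeholder for "train and return a validation score"), and the function names are hypothetical.

```python
from itertools import product


def grid_search(objective, space):
    """Try every combination in the space and keep the highest-scoring one."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for combo in product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Placeholder for a validation score; peaks at lr=0.01, batch_size=32."""
    return -abs(params["lr"] - 0.01) * 100 - abs(params["batch_size"] - 32) / 32


space = {"lr": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(toy_objective, space)
```

Real AutoML systems usually replace the exhaustive loop with random or Bayesian search and run trials in parallel on the GPU cluster.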
  29. Training Platform
    › MLflow
    Log each version of models. Analyze and validate models.
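
Logging each model version so that versions can be compared later can be sketched as follows. This is a toy in-memory registry and not the MLflow API: `ModelRegistry`, `log`, `best`, and the example model name and metrics are hypothetical.

```python
class ModelRegistry:
    """Toy registry that records params and metrics per model version."""

    def __init__(self):
        self._versions = {}

    def log(self, name, params, metrics):
        """Store a new version of the named model and return its version number."""
        versions = self._versions.setdefault(name, [])
        record = {
            "version": len(versions) + 1,
            "params": dict(params),
            "metrics": dict(metrics),
        }
        versions.append(record)
        return record["version"]

    def best(self, name, metric):
        """Return the logged version with the highest value of the given metric."""
        return max(self._versions[name], key=lambda r: r["metrics"][metric])


registry = ModelRegistry()
v1 = registry.log("text-classifier", {"lr": 0.01}, {"f1": 0.81})
v2 = registry.log("text-classifier", {"lr": 0.001}, {"f1": 0.86})
best = registry.best("text-classifier", "f1")
```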
  30. Serving Platform
    Components: BentoML, KFServing, image, Model, Docker Hub, Object Storage, K8S
    Auto scale-out model service.
  31. Monitoring
    › Prometheus + Grafana
    › BI Dashboard
    Monitor whether the service is healthy. Monitor whether the model has decayed.
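
A model decay check can be as simple as comparing a recent window of a quality metric with the value measured at release time. The sketch below assumes a daily accuracy series; `has_decayed`, the 7-day window, and the 0.05 allowed drop are hypothetical choices, not values from the talk.

```python
from statistics import mean


def has_decayed(daily_accuracy, baseline, window=7, max_drop=0.05):
    """True if the mean of the last `window` days fell more than `max_drop` below baseline."""
    recent = daily_accuracy[-window:]
    return baseline - mean(recent) > max_drop


# Hypothetical daily accuracy series for two models released at 0.91 accuracy.
stable_model = [0.90] * 7
decaying_model = [0.90, 0.90, 0.80, 0.80, 0.80, 0.80, 0.80, 0.80, 0.80]
```

In practice the metric would come from the monitoring stack (for example a dashboard query) rather than a hard-coded list.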
  32. Real Cases

  33. Our ML Applications
    › Conversational AI: LINE Fact Checker, LINE HELP TW
    › Recommendation: Shopping Related Search, Content Recommendation
    › MarTech: User Targeting, CRM
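  34. Shopping Related Search. StarSpace: Word/Doc. Embedding
    User search logs: coach 棒球手套 coach 旅行袋 手拉旅行袋 coach 短夾 coach 零錢 鑰匙包 coach 斜背包 mk 斜背包 …
    TagSpace word / tag embedding model: coach 零錢 鑰匙包 #品牌 #鑰匙包; coach 斜背包 #斜背包 #品牌; mk 斜背包 #斜背包 #品牌
    Glossary: 棒球手套 = baseball glove; 旅行袋 = travel bag; 手拉旅行袋 = pull-along travel bag; 短夾 = short wallet; 零錢 = coins; 鑰匙包 = key pouch; 斜背包 = crossbody bag; 品牌 = brand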
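
Once queries and tags live in one embedding space, related searches can be found by ranking other queries by similarity. The sketch below uses cosine similarity over made-up 3-dimensional vectors; the vectors, the English query strings, and the `related` helper are hypothetical and do not come from a trained StarSpace model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related(query, embeddings, top_k=2):
    """Return the top_k other queries most similar to the given query."""
    q = embeddings[query]
    scored = [(cosine(q, v), name) for name, v in embeddings.items() if name != query]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


# Hypothetical embeddings; a real model would learn these from search logs.
embeddings = {
    "coach crossbody bag": [0.9, 0.8, 0.1],
    "mk crossbody bag": [0.8, 0.9, 0.1],
    "coach key pouch": [0.9, 0.1, 0.8],
    "baseball glove": [0.0, 0.1, 0.2],
}
suggestions = related("coach crossbody bag", embeddings)
```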
  35. Shopping Related Search. StarSpace: Word/Doc. Embedding
    Environments: DEVELOPMENT, PRODUCTION
    Components: Experiment, Source code, Source repository, Pipeline CI, Package, Pipeline CD, Model registry, User search features, Batch Trigger, Online Model Serving, BI Dashboard
    Check if new model metrics pass the threshold.
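
The "metrics pass the threshold" check before promoting a new model can be sketched as a small gate function. The metric names and minimum values below are hypothetical; the talk does not state which metrics or thresholds are used.

```python
def passes_gate(new_metrics, thresholds):
    """True only if every thresholded metric is present and meets its minimum."""
    return all(
        new_metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical thresholds and candidate-model metrics.
thresholds = {"precision_at_5": 0.30, "recall_at_5": 0.20}
good_candidate = {"precision_at_5": 0.34, "recall_at_5": 0.25}
weak_candidate = {"precision_at_5": 0.34, "recall_at_5": 0.15}
missing_metric = {"precision_at_5": 0.40}
```

Treating a missing metric as a failure keeps an incomplete evaluation run from being promoted by accident.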
  36. Content Recommendation. Text Classifier: ELECTRA-based fine-tuning model
    Model: ELECTRA (pre-trained LM), Classifier (fine-tuning)
    Example: input tokens "I love L ##IN ##E", output class #1
    Text Classifier hyperparameters: epoch, lr, batch size, decay, …
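
The token sequence "L ##IN ##E" on this slide is the kind of output produced by WordPiece tokenization, where a word is split greedily into the longest matching vocabulary pieces and non-initial pieces carry a "##" prefix. A minimal sketch of that greedy longest-match-first procedure follows; the five-entry vocabulary is hypothetical and far smaller than a real ELECTRA vocabulary.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece tokens using greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark continuation pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # shrink the candidate and try again
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary chosen to reproduce the slide's example.
vocab = {"I", "love", "L", "##IN", "##E"}
tokens = [p for w in "I love LINE".split() for p in wordpiece(w, vocab)]
```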
  37. Content Recommendation. Text Classifier: ELECTRA-based fine-tuning model
    Environments: DEVELOPMENT, PRODUCTION
    Components: Experiment, Source code, Source repository, Pipeline CI, Package, Pipeline CD, Model registry, Text features, Batch Trigger, Online Model Serving, Monitoring
    Use AutoML to find the best hyperparameters.
  38. User Targeting: Use Uplift Model to Engage Persuadables
    Environments: DEVELOPMENT, PRODUCTION
    Components: Experiment, Source code, Source repository, Pipeline CI, Package, Pipeline CD, Automated Pipeline, Continuous Training, Service features, Trained model, Model registry, Offline Model Serving, BI Analytics
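
Uplift is the difference between the conversion rate of users who received a treatment (for example a message) and that of comparable users who did not; "persuadables" are users for whom that difference is clearly positive. The sketch below estimates uplift per segment from logged outcomes. It is a simple difference of rates, not necessarily the model used in the talk, and the segments, records, and 0.1 cutoff are hypothetical.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """records: iterable of (segment, treated, converted) tuples with boolean flags."""
    # For each segment keep [conversions, total] for treated and control users.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        cell = counts[segment][treated]
        cell[0] += int(converted)
        cell[1] += 1
    uplift = {}
    for segment, cells in counts.items():
        (t_conv, t_n), (c_conv, c_n) = cells[True], cells[False]
        if t_n and c_n:  # need both groups to estimate a difference
            uplift[segment] = t_conv / t_n - c_conv / c_n
    return uplift


# Hypothetical campaign log: segment A responds to the treatment, segment B does not.
records = (
    [("A", True, True)] * 3 + [("A", True, False)]
    + [("A", False, True)] + [("A", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", True, False)] * 2
    + [("B", False, True)] * 2 + [("B", False, False)] * 2
)
uplift = uplift_by_segment(records)
persuadables = sorted(s for s, u in uplift.items() if u > 0.1)
```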
  39. Summary: Scaling Machine Learning at LINE through Data/ML Platform
    Role: DE, MLE, DA, MSE, PM, BIZ
    ML: Prepare, Discover, Develop, Train, Test, Deploy
    IU / MLU: Portal, Pipeline Automation, Training Platform, Data / Feature Store, Serving Platform, Monitoring
  40. Thank you