Challenges by Data Scale and Team Scale
Phase 1: DE
Phase 2: MLE
Phase 3: DA, MSE
Phase 4: PM, BIZ
Challenges: Pipeline Standardization, ML Model Management, Collaboration, Data-driven Culture
Roles, Responsibilities, and Skills

DE
Responsibilities: Build and optimize data pipeline architecture; assemble large, complex data sets that meet requirements
Skills: Big data infra, SQL, ETL, message queuing

MLE
Responsibilities: Select appropriate datasets and data representation methods; research and implement appropriate ML algorithms
Skills: Machine learning, deep learning, CV, NLP, Speech

Svc Engineer (MSE)
Responsibilities: Build and scale machine learning infrastructure; monitor model performance
Skills: System infrastructure design, DevOps

Data Analyst (DA)
Responsibilities: Interpret data and analyze results using statistical techniques; identify, analyze, and interpret trends or patterns in complex data sets
Skills: Statistics, Data Visualization, Business Knowledge
Challenges from Model Development to Deploy
Roles: MLE, DA, DE, MSE, PM, BIZ
Challenges: Data Labeling, Model Reusability, Feature Reusability, Model Analysis, Serving Scale-out, Model Skew, Model Decay, Computing Resources, Hyperparameter Tuning, Model Version Control
[Diagram: MLOps pipeline]
Environments: DEVELOPMENT, PRODUCTION
Components: Source code, Source repository, Package, Automated Pipeline, Training, data/feature store, Trained model, Model registry, Model CD, Prediction service, Monitoring
Infrastructure: GPU, K8S
Reference: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
[Diagram: MLOps pipeline with Continuous Training, mapped to platform components]
Environments: DEVELOPMENT, PRODUCTION
Components: Source repository, Package, Automated Pipeline, Continuous Training, data/feature store, Trained model, Model registry, Model CD, Prediction service, Monitoring
Infrastructure: GPU, K8S
Platform: Portal, Pipeline Automation, Training Platform, Feature Store, Serving Platform, Monitoring
Reference: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
[Diagram: feature store in the ML workflow]
Elements: Raw Data, Models, Feature Engineering, Training
Stages: Prepare, Discover, Develop, Train, Test, Deploy
Access: API
Roles: MLE, DA, DE, PM, BIZ
Actions: Import aggregated data; discover and get reusable features
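To make "Discover and get reusable features" concrete, here is a minimal in-memory sketch. The `Feature` and `FeatureStore` classes, their methods, and the sample feature names are hypothetical and only illustrate the idea; they are not the API of the platform shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Feature:
    """A registered feature plus its per-entity values (hypothetical schema)."""
    name: str
    owner: str
    description: str
    values: Dict[str, Any] = field(default_factory=dict)  # entity_id -> value


class FeatureStore:
    """Toy in-memory store: register once, then discover and reuse."""

    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    def register(self, feature: Feature) -> None:
        # Refuse duplicates so a shared name always means the same feature.
        if feature.name in self._features:
            raise ValueError(f"feature already registered: {feature.name}")
        self._features[feature.name] = feature

    def discover(self, keyword: str) -> List[str]:
        # Case-insensitive match on the feature name or its description.
        kw = keyword.lower()
        return sorted(
            name
            for name, feat in self._features.items()
            if kw in name.lower() or kw in feat.description.lower()
        )

    def get_vector(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        # Missing values come back as None; unknown feature names raise KeyError.
        return {n: self._features[n].values.get(entity_id) for n in names}
```

In this sketch one team registers a feature once, and another team finds it by keyword and reads it by entity ID instead of rebuilding it from raw data.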
[Diagram: pipeline for User search features]
Environments: DEVELOPMENT, PRODUCTION
Components: Source code, Source repository, Package, Model registry, CD, Batch Trigger, Online Model Serving, BI Dashboard
Features: User search features
CD step: Check if new model metrics pass the threshold
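The CD step "Check if new model metrics pass the threshold" could be sketched as a small gate function. This is a minimal illustration under stated assumptions: the function name `evaluate_gate`, the metric names, and the rule that higher values are better are all hypothetical and not taken from the slides.

```python
from typing import Dict, List, Optional, Tuple


def evaluate_gate(
    candidate: Dict[str, float],
    thresholds: Dict[str, float],
    production: Optional[Dict[str, float]] = None,
) -> Tuple[bool, List[str]]:
    """Return (promote, reasons). Assumes every metric is higher-is-better."""
    failures: List[str] = []
    for name, floor in thresholds.items():
        value = candidate.get(name)
        if value is None:
            failures.append(f"{name}: missing from candidate metrics")
        elif value < floor:
            failures.append(f"{name}: {value:.3f} is below threshold {floor:.3f}")
        elif production is not None and name in production and value < production[name]:
            # Optional extra check: do not regress against the serving model.
            failures.append(
                f"{name}: {value:.3f} is worse than production {production[name]:.3f}"
            )
    return (not failures, failures)
```

A pipeline would call this after evaluation and register or deploy the model only when the first element of the result is `True`. The list of reasons can be logged or shown on a dashboard.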
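[Diagram: pipeline for Text features]
Environments: DEVELOPMENT, PRODUCTION
Components: Source code, Source repository, Package, CI, Pipeline, Model registry, CD, Batch Trigger, Online Model Serving, Monitoring
Features: Text features
Training step: Use AutoML to find the best hyperparameters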
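The slides do not say which AutoML tool is used. As a stand-in for the hyperparameter search step, the sketch below uses plain random search, which is a different and much simpler technique than a full AutoML system. The function name, the search space, and the toy objective are assumptions for illustration only.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, List[Any]],
    n_trials: int = 50,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_trials configurations and keep the highest-scoring one."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and toy objective. In a real pipeline the
# objective would train a model and return a validation metric.
SPACE = {"learning_rate": [0.001, 0.01, 0.1], "max_depth": [2, 4, 6, 8]}


def toy_objective(params: Params) -> float:
    return -abs(params["learning_rate"] - 0.01) - abs(params["max_depth"] - 6)


best, score = random_search(toy_objective, SPACE, n_trials=30, seed=1)
```

The winning configuration would then be passed to the training step, and the resulting model would go through the same CD checks as any other candidate.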
[Diagram: pipeline for Service features]
Environments: DEVELOPMENT, PRODUCTION
Components: Source code, Source repository, Package, CI, Pipeline, Automated Pipeline, Continuous Training, Trained model, Model registry, CD, Offline Model Serving, BI Analytics
Features: Service features
Role: MLE, DA, MSE, PM, BIZ
Stages: Prepare, Discover, Develop, Train, Test, Deploy
ML platform components: Portal, Pipeline Automation, Training Platform, Data / Feature Store, Serving Platform, Monitoring
IU / MLU