
MTC2018 - Mercari ML Platform

mercari
October 04, 2018

Speaker: Hirofumi Nakagawa

Mercari runs several machine learning (ML) models. Once an ML model is created, it requires continuous tuning and retraining, so it is not possible to operate multiple ML models without a highly automated system in place. In this session, I will talk about the fundamentals of such automation, the systems we are currently using, and what we are developing.

Transcript

  1. ML Platform Architecture (diagram): the platform runs on Kubernetes
     and consists of a CLI, Image builder, Cluster Pipeline Engine,
     Dashboard, Metrics, Runner, ML Component, and Mercari ML Component.
  2. Component overview:
     • CLI/Image builder: builds container images automatically; there is
       no need for the user to write Dockerfiles.
     • Cluster Pipeline Engine: manages cluster resources and implements
       the containerized data pipeline.
     • Metrics: acquires metrics pertaining to the ML models, using
       Prometheus (see the sketch below).
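
To make the Metrics component concrete, here is a minimal sketch of exposing model metrics with the Python prometheus_client library. The metric names, labels, port, and dummy model call are illustrative assumptions; the slide only says Prometheus is used.

```python
from prometheus_client import Counter, Histogram, start_http_server
import time

# Hypothetical metric names and labels, not Mercari's actual ones.
PREDICTIONS = Counter(
    "ml_model_predictions_total",
    "Number of predictions served",
    ["model", "version"],
)
LATENCY = Histogram(
    "ml_model_prediction_latency_seconds",
    "Prediction latency in seconds",
    ["model"],
)

def predict(features):
    # Time the call and count it, labelled by model and version.
    with LATENCY.labels(model="item-categorizer").time():
        PREDICTIONS.labels(model="item-categorizer", version="v1").inc()
        return sum(features)  # stand-in for a real model call

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        predict([1.0, 2.0])
        time.sleep(1)
```
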
  3. • ML Component: componentized so that pre-processing and classifiers
       can be executed through the container pipeline.
     • Runner: bridges the gap between the cluster and local environments
       by providing the training/serving environment.
     • Mercari ML Component: specialized components for internal use, such
       as data sources and item categorization.
  4. Pipeline example (diagram): a DataSource feeds a text pre-processing
     image and a picture pre-processing image; their outputs land on
     PersistentVolumes (PVs) that feed an Estimator image. All output is
     saved to a PV and can be used as a cache.
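
The PV-as-cache idea can be sketched as follows; the mount point, file names, and pickle format are assumptions for illustration, not the platform's actual layout.

```python
from pathlib import Path
import pickle

PV_ROOT = Path("/mnt/pv")  # assumed PersistentVolume mount point

def cached_step(name, fn, *args):
    """Run one pipeline step; reuse its output from the PV if present."""
    out = PV_ROOT / f"{name}.pkl"
    if out.exists():
        return pickle.loads(out.read_bytes())  # cache hit: skip the work
    result = fn(*args)
    PV_ROOT.mkdir(parents=True, exist_ok=True)
    out.write_bytes(pickle.dumps(result))  # cache the output for later runs
    return result

# Example: the text pre-processing stage from the diagram.
features = cached_step(
    "text-preprocessing",
    lambda rows: [r.lower() for r in rows],
    ["Red Shirt", "Leather Bag"],
)
```
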
  5. Training and serving flow (diagram): CI kicks off jobs on the
     Training Cluster; trained models are stored in the Model Repository
     and served from the Serving Cluster behind a REST API (TF Serving,
     Faiss, etc.). All models are turned into images and placed under
     version management.
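
A hedged sketch of the "model as versioned image" step, using the docker Python SDK; the registry, image name, tag scheme, and build context are hypothetical.

```python
import docker

client = docker.from_env()
version = "v20181004-1"  # hypothetical version tag
tag = f"registry.example.com/ml/item-categorizer:{version}"

# Assumes ./model-image contains a Dockerfile that COPYs the trained
# model file into a serving base image.
image, build_logs = client.images.build(path="./model-image", tag=tag)

# Pushing the tag puts the model under the registry's version management.
for line in client.images.push(tag, stream=True, decode=True):
    print(line)
```
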
  6. Basic Serving API Architecture (diagram): requests from the Mercari
     API pass through a Virtual Service, which routes them either to Flask
     instances serving scikit-learn (SK) models or to TensorFlow Serving
     instances serving TF models.
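
A minimal sketch of the Flask side of this architecture: load scikit-learn models once at startup and expose a prediction endpoint. The route, model name, and model file are assumptions, not the actual Mercari API.

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)

# Load SK models once at startup, keyed by name (hypothetical file).
models = {"item-categorizer": joblib.load("item_categorizer.joblib")}

@app.route("/v1/models/<name>:predict", methods=["POST"])
def predict(name):
    # Expects {"instances": [[...], ...]} in the request body.
    features = request.get_json()["instances"]
    preds = models[name].predict(features)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```
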
  7. Streaming Serving API Architecture (diagram): the same Flask/SK and
     TensorFlow Serving backends sit behind the Virtual Service, with a
     Proxy added in front for streaming traffic.
  8. Huge Model File vs. Container Image
     • Can a huge ML model file be included in the image?
     • If not, where should it be placed? (One option is sketched after
       this list.)
     • There is a trade-off between portability and load times.
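
One side of that trade-off, sketched below: keep the image small and pull the large model file from object storage at startup. Google Cloud Storage, the bucket, and the paths are illustrative assumptions; baking the file into the image instead buys portability at the cost of larger images and slower pulls.

```python
from pathlib import Path
from google.cloud import storage

MODEL_PATH = Path("/models/current/model.joblib")  # assumed local path

def fetch_model(bucket="ml-models", blob="item-categorizer/v1/model.joblib"):
    """Download the model at startup unless it is already in the image."""
    if MODEL_PATH.exists():
        return MODEL_PATH  # baked into the image: maximum portability
    MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    # Small image, but startup pays the download time.
    storage.Client().bucket(bucket).blob(blob).download_to_filename(
        str(MODEL_PATH)
    )
    return MODEL_PATH
```
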
  9. Effective Memory Usage
     • A model normally uses several GB of memory.
     • An environment where the model must be loaded separately for each
       process is not ideal.
     • Using copy-on-write (CoW) is necessary; see the sketch after this
       list.
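
A minimal sketch of the CoW approach, assuming a hypothetical joblib model file: load the model once in the parent process and then fork workers, so the children share the model's memory pages instead of each holding a multi-GB copy.

```python
import os
import joblib

# Load the multi-GB model exactly once, in the parent process.
model = joblib.load("model.joblib")  # hypothetical model file

def worker():
    # Children only read the model, so its pages stay shared (CoW).
    print(os.getpid(), model.predict([[1.0, 2.0]]))

for _ in range(4):
    if os.fork() == 0:  # child process
        worker()
        os._exit(0)

for _ in range(4):
    os.wait()
```

This is effectively what gunicorn's --preload flag does for web workers. One caveat: CPython's reference counting writes into object headers, so some shared pages still get copied over time.
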
  10. High Degree of Automation
      • Automated generation of models: model generation can be automated
        through methods like architecture search and hyperparameter
        optimization (see the sketch after this list).
      • Automatic deployment to the production environment: generated
        models are deployed to production, and the best model is selected
        automatically.
      • Automation of model evaluation and re-training: evaluate and
        visualize the models in operation, automate re-training, and raise
        its scale.
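
As an illustration of the hyperparameter-optimization piece, here is a small sketch using Optuna and scikit-learn; these libraries, the dataset, and the search space are our choices for the example, not necessarily what Mercari used.

```python
import optuna
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

def objective(trial):
    # Each trial samples one hyperparameter configuration.
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 3, 20),
    )
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)  # the best configuration is selected automatically
```
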
  11. ML Continuous Deployment (diagram): a loop of Deploy → Monitoring →
      Evaluation → Hyperparameter Optimization → Re-Training. Even after
      release, precision monitoring, hyperparameter tuning, re-training,
      and deployment are conducted automatically, as sketched below.
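
A minimal sketch of that loop as a control process; the threshold and the four functions are hypothetical stand-ins for the platform's real monitoring, training, evaluation, and deployment components.

```python
import random
import time

THRESHOLD = 0.90  # hypothetical precision target

# Stand-ins for the platform's real components.
def fetch_live_precision():
    return random.uniform(0.85, 0.99)

def retrain():
    return "model-candidate"

def evaluate(model):
    return random.uniform(0.88, 0.99)

def deploy(model):
    print("deploying", model)

while True:
    if fetch_live_precision() < THRESHOLD:    # Monitoring
        candidate = retrain()                 # Re-Training (HPO would run here)
        if evaluate(candidate) >= THRESHOLD:  # Evaluation
            deploy(candidate)                 # Deploy
    time.sleep(3600)                          # check again in an hour
```
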
  12. Edge Device
      • An environment has been arranged that can run inference with
        TensorFlow Lite, Core ML, etc. on edge devices (see the sketch
        after this list).
      • Predictions on the edge are considered to have great benefits for
        UX.
      • Research and investigation are underway, so stay tuned.
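
For reference, a hedged sketch of on-device inference with the TensorFlow Lite Python interpreter; the model file and the zeroed placeholder input are assumptions.

```python
import numpy as np
import tensorflow as tf

# Load a converted .tflite model (hypothetical file).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Placeholder input matching the model's expected shape and dtype.
image = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```
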
  13. Democratization of AI
      • Many models are in demand.
      • Make it possible for people other than ML engineers to build
        models.
      • An environment aligned with the DataPlatform needs to be prepared.
  14. Further Automation
      • Several thousand models will need to be operated in the near
        future.
      • It will be vital to conduct automation on an even greater scale.