
MTC2018 - Mercari ML Platform

mercari
October 04, 2018

Speaker: Hirofumi Nakagawa

Mercari runs several machine learning (ML) models. Once an ML model is created, it requires continuous tuning and retraining, so it is not possible to operate multiple ML models without a highly automated system in place. In this session, I will talk about the fundamentals of such automation, the systems we are currently using, and what we are developing.

Transcript

  1. ML Platform Architecture (diagram): the platform runs on Kubernetes
     and consists of a CLI, Image builder, Cluster Pipeline Engine,
     Dashboard, Metrics, Runner, ML Component, and Mercari ML Component.
  2. Component overview:
     • CLI/Image builder: builds container images automatically; there is
       no need for the user to write Dockerfiles.
     • Cluster Pipeline Engine: manages cluster resources and implements
       the containerized data pipeline.
     • Metrics: acquires metrics pertaining to the ML models, using
       Prometheus (see the sketch below).
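
To make the Metrics component concrete, here is a minimal sketch of exposing model metrics with the Python prometheus_client library. The metric names, labels, port, and dummy model call are illustrative assumptions; the slide only says Prometheus is used.

```python
from prometheus_client import Counter, Histogram, start_http_server
import time

# Hypothetical metric names and labels, not Mercari's actual ones.
PREDICTIONS = Counter(
    "ml_model_predictions_total",
    "Number of predictions served",
    ["model", "version"],
)
LATENCY = Histogram(
    "ml_model_prediction_latency_seconds",
    "Prediction latency in seconds",
    ["model"],
)

def predict(features):
    # Time the call and count it, labelled by model and version.
    with LATENCY.labels(model="item-categorizer").time():
        PREDICTIONS.labels(model="item-categorizer", version="v1").inc()
        return sum(features)  # stand-in for a real model call

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        predict([1.0, 2.0])
        time.sleep(1)
```
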
  3. • ML Component: componentized so that pre-processing and classifiers
       can be executed through the container pipeline.
     • Runner: bridges the gap between the cluster and local environments
       by providing the training/serving environment.
     • Mercari ML Component: specialized components for internal use, such
       as data sources and item categorization.
  4. Pipeline example (diagram): a DataSource feeds a text pre-processing
     image and a picture pre-processing image; their outputs land on
     PersistentVolumes (PVs) that feed an Estimator image. All output is
     saved to a PV and can be used as a cache.
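
The PV-as-cache idea can be sketched as follows; the mount point, file names, and pickle format are assumptions for illustration, not the platform's actual layout.

```python
from pathlib import Path
import pickle

PV_ROOT = Path("/mnt/pv")  # assumed PersistentVolume mount point

def cached_step(name, fn, *args):
    """Run one pipeline step; reuse its output from the PV if present."""
    out = PV_ROOT / f"{name}.pkl"
    if out.exists():
        return pickle.loads(out.read_bytes())  # cache hit: skip the work
    result = fn(*args)
    PV_ROOT.mkdir(parents=True, exist_ok=True)
    out.write_bytes(pickle.dumps(result))  # cache the output for later runs
    return result

# Example: the text pre-processing stage from the diagram.
features = cached_step(
    "text-preprocessing",
    lambda rows: [r.lower() for r in rows],
    ["Red Shirt", "Leather Bag"],
)
```
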
  5. Training and serving flow (diagram): CI kicks off jobs on the
     Training Cluster; trained models are stored in the Model Repository
     and served from the Serving Cluster behind a REST API (TF Serving,
     Faiss, etc.). All models are turned into images and placed under
     version management.
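
A hedged sketch of the "model as versioned image" step, using the docker Python SDK; the registry, image name, tag scheme, and build context are hypothetical.

```python
import docker

client = docker.from_env()
version = "v20181004-1"  # hypothetical version tag
tag = f"registry.example.com/ml/item-categorizer:{version}"

# Assumes ./model-image contains a Dockerfile that COPYs the trained
# model file into a serving base image.
image, build_logs = client.images.build(path="./model-image", tag=tag)

# Pushing the tag puts the model under the registry's version management.
for line in client.images.push(tag, stream=True, decode=True):
    print(line)
```
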
  6. Basic Serving API Architecture (diagram): requests from the Mercari
     API pass through a Virtual Service, which routes them either to Flask
     instances serving scikit-learn (SK) models or to TensorFlow Serving
     instances serving TF models.
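
A minimal sketch of the Flask side of this architecture: load scikit-learn models once at startup and expose a prediction endpoint. The route, model name, and model file are assumptions, not the actual Mercari API.

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)

# Load SK models once at startup, keyed by name (hypothetical file).
models = {"item-categorizer": joblib.load("item_categorizer.joblib")}

@app.route("/v1/models/<name>:predict", methods=["POST"])
def predict(name):
    # Expects {"instances": [[...], ...]} in the request body.
    features = request.get_json()["instances"]
    preds = models[name].predict(features)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```
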
  7. Streaming Serving API Architecture (diagram): the same Flask/SK and
     TensorFlow Serving backends sit behind the Virtual Service, with a
     Proxy added in front for streaming traffic.
  8. Huge Model File vs. Container Image
     • Can a huge ML model file be included in the image?
     • If not, where should it be placed? (One option is sketched after
       this list.)
     • There is a trade-off between portability and load times.
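
One side of that trade-off, sketched below: keep the image small and pull the large model file from object storage at startup. Google Cloud Storage, the bucket, and the paths are illustrative assumptions; baking the file into the image instead buys portability at the cost of larger images and slower pulls.

```python
from pathlib import Path
from google.cloud import storage

MODEL_PATH = Path("/models/current/model.joblib")  # assumed local path

def fetch_model(bucket="ml-models", blob="item-categorizer/v1/model.joblib"):
    """Download the model at startup unless it is already in the image."""
    if MODEL_PATH.exists():
        return MODEL_PATH  # baked into the image: maximum portability
    MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    # Small image, but startup pays the download time.
    storage.Client().bucket(bucket).blob(blob).download_to_filename(
        str(MODEL_PATH)
    )
    return MODEL_PATH
```
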
  9. Effective Memory Usage
     • A model normally uses several GB of memory.
     • An environment where the model must be loaded separately for each
       process is not ideal.
     • Using copy-on-write (CoW) is necessary; see the sketch after this
       list.
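
A minimal sketch of the CoW approach, assuming a hypothetical joblib model file: load the model once in the parent process and then fork workers, so the children share the model's memory pages instead of each holding a multi-GB copy.

```python
import os
import joblib

# Load the multi-GB model exactly once, in the parent process.
model = joblib.load("model.joblib")  # hypothetical model file

def worker():
    # Children only read the model, so its pages stay shared (CoW).
    print(os.getpid(), model.predict([[1.0, 2.0]]))

for _ in range(4):
    if os.fork() == 0:  # child process
        worker()
        os._exit(0)

for _ in range(4):
    os.wait()
```

This is effectively what gunicorn's --preload flag does for web workers. One caveat: CPython's reference counting writes into object headers, so some shared pages still get copied over time.
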
  10. High Degree of Automation
      • Automated generation of models: model generation can be automated
        through methods like architecture search and hyperparameter
        optimization (see the sketch after this list).
      • Automatic deployment to the production environment: generated
        models are deployed to production, and the best model is selected
        automatically.
      • Automation of model evaluation and re-training: evaluate and
        visualize the models in operation, automate re-training, and raise
        its scale.
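
As an illustration of the hyperparameter-optimization piece, here is a small sketch using Optuna and scikit-learn; these libraries, the dataset, and the search space are our choices for the example, not necessarily what Mercari used.

```python
import optuna
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

def objective(trial):
    # Each trial samples one hyperparameter configuration.
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 3, 20),
    )
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)  # the best configuration is selected automatically
```
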
  11. ML Continuous Deployment (diagram): a loop of Deploy → Monitoring →
      Evaluation → Hyperparameter Optimization → Re-Training. Even after
      release, precision monitoring, hyperparameter tuning, re-training,
      and deployment are conducted automatically, as sketched below.
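
A minimal sketch of that loop as a control process; the threshold and the four functions are hypothetical stand-ins for the platform's real monitoring, training, evaluation, and deployment components.

```python
import random
import time

THRESHOLD = 0.90  # hypothetical precision target

# Stand-ins for the platform's real components.
def fetch_live_precision():
    return random.uniform(0.85, 0.99)

def retrain():
    return "model-candidate"

def evaluate(model):
    return random.uniform(0.88, 0.99)

def deploy(model):
    print("deploying", model)

while True:
    if fetch_live_precision() < THRESHOLD:    # Monitoring
        candidate = retrain()                 # Re-Training (HPO would run here)
        if evaluate(candidate) >= THRESHOLD:  # Evaluation
            deploy(candidate)                 # Deploy
    time.sleep(3600)                          # check again in an hour
```
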
  12. Edge Device
      • An environment has been arranged that can run inference with
        TensorFlow Lite, Core ML, etc. on edge devices (see the sketch
        after this list).
      • Predictions on the edge are considered to have great benefits for
        UX.
      • Research and investigation are underway, so stay tuned.
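
For reference, a hedged sketch of on-device inference with the TensorFlow Lite Python interpreter; the model file and the zeroed placeholder input are assumptions.

```python
import numpy as np
import tensorflow as tf

# Load a converted .tflite model (hypothetical file).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Placeholder input matching the model's expected shape and dtype.
image = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```
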
  13. Democratization of AI
      • Many models are in demand.
      • Make it possible for people other than ML engineers to build
        models.
      • An environment aligned with the DataPlatform needs to be prepared.
  14. Further Automation
      • Several thousand models will need to be operated in the near
        future.
      • It will be vital to conduct automation on an even greater scale.