Scale Machine learning model deployment

Scale Machine Learning Deployment Gang Tao

Data Science Project Life Cycle

Model Persistence

Python / Sklearn
• pickle-based code serialization
• sklearn.externals.joblib

Spark
• Spark provides an API to save a model/pipeline as a file

Tensorflow
• Tensorflow provides tf.train.Saver, which persists the tensor graph
• It is pickle + metadata + checkpoint
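
The pickle-based approach can be sketched with the standard library alone. The `model` dictionary, its values, and the file name `model.pkl` are placeholders standing in for a trained estimator; they are assumptions for illustration and do not come from the deck.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: a plain dict of learned parameters.
# (Hypothetical values; a real project would pickle an estimator object.)
model = {"weights": [0.5, -1.25, 2.0], "bias": 0.1}


def save_model(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a pickled object; only do this for files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(params, features):
    """Linear score: dot(weights, features) + bias."""
    return sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    save_model(model, model_path)
    restored = load_model(model_path)
```

The restored object behaves like the original, but the file is only readable by a compatible Python and library environment.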

Issues and Limitations

• Models from different tools are not compatible
• Code serialization depends on the Python version
• Code serialization has potential security concerns
• For a tf model, the tensor names are required (need to check whether they are in the metadata)
• A tf model depends on the custom code that defines its custom operations
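
The security concern with code serialization can be shown in a few lines: a pickle stream may instruct the loader to call an arbitrary callable. The `Payload` class below is a hypothetical, deliberately harmless illustration; the expression it evaluates is an assumption chosen so the effect is visible.

```python
import pickle


class Payload:
    """Object whose pickle stream asks the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" instead of object state.
        # The expression is harmless here; an attacker could supply anything.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs eval("6 * 7") and returns its result, not a Payload instance.
result = pickle.loads(untrusted_bytes)
```

This is why pickled models should only be loaded from trusted sources.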

A simple view of model deployment

ML Deployment Challenges

• Enable a wide range of ML modeling tools: Python, R, Tensorflow, Spark
• Scale up and down
• Performance, latency optimization
• Accessing the model, API
• Audit and versioning
• CI/CD
• Metrics and monitoring
• Optimization, AB tests

Seldon

Seldon is a London company that focuses on providing control over machine learning based on open source software.

Seldon Core is an open source platform for deploying machine learning models on Kubernetes:
• Python/Spark/H2O/R model support
• REST and gRPC API
• Deploys an inference graph of Models/Routers/Combiners/Transformers as microservices
• Leverages K8s to provide scale, security, monitoring, etc.
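
The roles in an inference graph (transformer, router, combiner, model) can be sketched as plain Python functions. This is a conceptual illustration only: every name below is hypothetical, and it does not use or reproduce the Seldon Core API, where each node would run as its own microservice.

```python
import random
from statistics import mean


def scale_transformer(features):
    """Transformer: preprocess inputs before they reach a model."""
    return [x / 10.0 for x in features]


def model_a(features):
    """Model A: hypothetical scorer."""
    return sum(features)


def model_b(features):
    """Model B: hypothetical scorer with a different weighting."""
    return 2.0 * sum(features)


def ab_router(features, rng, traffic_to_b=0.5):
    """Router: send a request to exactly one model (A/B test)."""
    chosen = model_b if rng.random() < traffic_to_b else model_a
    return chosen.__name__, chosen(features)


def average_combiner(features):
    """Combiner: call every model and average the outputs (ensemble)."""
    return mean([model_a(features), model_b(features)])


rng = random.Random(0)  # seeded so the routing is repeatable
request = [10.0, 20.0, 30.0]
prepared = scale_transformer(request)
routed_name, routed_score = ab_router(prepared, rng)
ensemble_score = average_combiner(prepared)
```

A router returns one model's answer per request, while a combiner merges the answers of several models; both sit behind the same entry point.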

Summary

Pros
• Seamless K8s integration
• Graph definition to support AB tests and ensembling

Cons
• No Scala support for Spark
• Needs a custom image for pySpark
• No customization support for liveness/readiness checks, due to CRD

Clipper

Clipper.ai is a system developed by the UC Berkeley RISE Lab. Clipper is a prediction serving system that sits between user-facing applications and a wide range of commonly used machine learning models and frameworks.

Summary

Pros
• Easy-to-use interactive model deployment
• Supports Docker and K8s
• Query latency objective support
• Model version management
    • Update and rollback

Cons
• cloudpickle version issue
• Python only
• Few examples and documents
• Not friendly to AWS
    • use_internal_ip does not work well
    • Need to manually create a repo for the model
    • Failed to pull image from ECR
• Cluster creation is not stable
• Tensorflow failed to pickle
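
Model version management with update and rollback can be sketched as a small registry that remembers the activation order. `ModelRegistry` and its methods are hypothetical names for illustration; this is not Clipper's API.

```python
class ModelRegistry:
    """Keeps every deployed version and a pointer to the active one."""

    def __init__(self):
        self._versions = {}  # version label -> predict callable
        self._history = []   # activation order, newest last

    def update(self, version, predict_fn):
        """Register a new version and make it active."""
        if version in self._versions:
            raise ValueError(f"version {version!r} already deployed")
        self._versions[version] = predict_fn
        self._history.append(version)

    def rollback(self):
        """Reactivate the previously active version and return its label."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def active_version(self):
        return self._history[-1] if self._history else None

    def predict(self, features):
        """Serve a prediction from the active version."""
        if not self._history:
            raise RuntimeError("no model deployed")
        return self._versions[self.active_version](features)


registry = ModelRegistry()
registry.update("1", lambda xs: sum(xs))
registry.update("2", lambda xs: max(xs))
after_update = registry.predict([1, 2, 3])    # served by version "2"
registry.rollback()
after_rollback = registry.predict([1, 2, 3])  # served by version "1" again
```

Callers keep using the same `predict` entry point while the active version changes underneath them.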

MLflow

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. MLflow is developed by Databricks.

Summary

Pros
• Flexible
• Easy to use with Sklearn
• Cloud integration to support SageMaker and Azure

Cons
• No K8s integration
• Spark/Tensorflow support is based on Python
• Projects are better managed by container

MLeap

MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine.
• A JSON-based serialization
• A runtime execution engine
• Benchmarks

http://mleap-docs.combust.ml/core-concepts/transformers/support.html
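
The idea of a portable format plus a runtime execution engine can be sketched as follows: the file holds parameters and an operation name, and the runtime supplies the implementation. The JSON layout, the `op` field, and all function names are assumptions made for illustration; this is not MLeap's bundle format.

```python
import json


def export_linear_model(weights, bias):
    """Serialize model parameters (not code) to a JSON document."""
    return json.dumps(
        {"op": "linear_regression", "weights": weights, "bias": bias},
        indent=2,
    )


def linear_regression(spec, features):
    """Runtime implementation of the 'linear_regression' op."""
    return sum(w * x for w, x in zip(spec["weights"], features)) + spec["bias"]


# The runtime maps op names found in the file to its own implementations.
RUNTIME_OPS = {"linear_regression": linear_regression}


def run_model(document, features):
    """Execute a serialized model using only the JSON and the runtime."""
    spec = json.loads(document)
    op = RUNTIME_OPS.get(spec["op"])
    if op is None:
        raise ValueError(f"unsupported op: {spec['op']}")
    return op(spec, features)


document = export_linear_model(weights=[2.0, 3.0], bias=1.0)
prediction = run_model(document, [1.0, 1.0])
```

Because the document is plain JSON it is human readable and independent of the training tool, but every operation needs a matching entry in the runtime, which is where an incomplete support matrix shows up.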

MLeap Serialization

Summary

Pros
• Portable model between Spark and Sklearn
• Human-readable model
• Easy model serving

Cons
• Support matrix is incomplete
• Extensibility
    • Need to write code for each estimator/transformer
• Tensorflow support needs a custom build of the tf-java binding and is experimental

Wrap up

• Seldon integrates tightly with K8s to support scalable model serving, and its graph function is powerful
• Clipper provides good interaction, but the code is not stable enough
• MLflow's model serving is simple, with fewer functions
• MLeap aims to provide interoperation between different tools, which is very nice, but there is still a long way to go to support all the features
    • PMML is not covered
• Some other tools are not touched
    • MXnet model server
    • Oracle Graphpipe

| Tool | Model Persistence | ML Tools | K8s Integration | Version | License | Implementation |
|---|---|---|---|---|---|---|
| Seldon Core | S2i + Pickle | Tensorflow, Sklearn, Keras, R, H2O, Nodejs, PMML | Yes | 0.3.2 | Apache | Docker + K8s CRD |
| Clipper | Pickle | Python, PySpark, PyTorch, Tensorflow, MXnet, Custom Container | Yes | 0.3.0 | Apache | CPP / Python |
| MLflow | Directory + Metadata | Python, H2O, Keras, MLeap, PyTorch, Sklearn, Spark, Tensorflow, R | No | Alpha | Apache | Python |
| MLeap | | Spark, Sklearn, Tensorflow | No | 0.12.0 | Apache | Scala/Java |

Other findings

• Enabling Spark is not easy
    • Version, pyspark version, java version
    • Build the Spark image with glibc support
    • "Java gateway process exited before sending its port number"
    • Accessing Spark from K8s is not easy
• Some K8s pods stay pending with Unknown status
    • kubectl delete pod {} --grace-period=0 --force
• Building your own ML image from python is not easy; using continuumio/miniconda may save you some time
• Use batch commands to clean up docker images
    • docker images | grep "something_to_search" | awk '{print $1 ":" $2}' |xargs docker rmi -f
    • docker system prune
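
The selection step of the image cleanup pipeline (the `grep` and `awk` stages) can be sketched in Python for cases where the list should be reviewed before anything is deleted. The function name and the sample listing are hypothetical, and the match is a plain substring test rather than a grep regular expression.

```python
def matching_image_refs(docker_images_output, needle):
    """Return repository:tag for each line containing needle.

    Mirrors `grep needle | awk '{print $1 ":" $2}'` on `docker images` output.
    """
    refs = []
    for line in docker_images_output.splitlines():
        if needle not in line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            refs.append(f"{fields[0]}:{fields[1]}")
    return refs


# Hypothetical listing in the column layout that `docker images` prints.
SAMPLE_OUTPUT = """\
REPOSITORY          TAG       IMAGE ID       CREATED        SIZE
example/model-a     0.1       0123456789ab   2 days ago     1.2GB
example/model-b     latest    ba9876543210   3 days ago     900MB
python              3.6       aaaabbbbcccc   4 weeks ago    920MB
"""

refs = matching_image_refs(SAMPLE_OUTPUT, "example/")
```

The resulting references are what the original one-liner passes to `docker rmi -f`.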

References

• https://cmry.github.io/notes/serialize
• https://cmry.github.io/notes/serialize-sk
• https://github.com/hiveml/simple-ml-serving
• https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58
• https://qconsp.com/system/files/presentation-slides/qconsp18-deployingml-may18-npentreath.pdf
• https://www.slideshare.net/dscrankshaw/veloxampcamp5-final
