
Deploying Machine Learning Models to Production


Machine learning techniques are powerful, but building and deploying such models for production use requires a great deal of care and expertise.

Many books, articles, and best practices have been written on machine learning techniques and feature engineering, but putting those techniques to use in a production environment is usually forgotten and underestimated. The aim of this talk is to shed some light on current machine learning deployment practices and to go into detail on how to deploy sustainable machine learning pipelines.

#sklearn #python #Flask


Anass Bensrhir

May 16, 2017

Transcript

  1. Who am I?
    • Founder and Senior Data Scientist @Bolddata
    • Big Data Project Leader @Schlumberger
    • MSc Advanced Systems & Machine Learning @CentraleParis
    [email protected] @anassbensrhir @abensrhir
  2. Typical ML Model

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()
    # Create a dataframe with the four feature variables
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
    df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
    train, test = df[df['is_train']==True], df[df['is_train']==False]
    features = df.columns[:4]
    y = pd.factorize(train['species'])[0]
    model = RandomForestClassifier(n_jobs=2)
    model.fit(train[features], y)
    model.predict(test[features])
  3. “Might work well for Kaggle! But Kaggle isn’t real-world machine learning!”
    >>> What matters in production: interpretability + low complexity + speed.
    [Comparison: accuracy = 0.81 at 30 ms vs accuracy = 0.91 at 3 s]
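The accuracy-versus-speed tradeoff above can be measured directly. A minimal sketch, assuming two illustrative model choices and a timing harness of my own (not from the deck), comparing test accuracy and prediction latency on the iris data:

```python
import time

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

def bench(model):
    """Return (test accuracy, prediction latency in ms) for a model."""
    model.fit(X_train, y_train)
    start = time.perf_counter()
    model.predict(X_test)
    latency_ms = (time.perf_counter() - start) * 1000
    return model.score(X_test, y_test), latency_ms

results = {}
for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              LogisticRegression(max_iter=1000)):
    acc, ms = bench(model)
    results[type(model).__name__] = (acc, ms)
    print(f"{type(model).__name__}: accuracy={acc:.2f}, latency={ms:.2f} ms")
```

The exact numbers depend on hardware and model size; the point is to put both axes of the tradeoff on the table before choosing a model for production.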
  4. Pickling

    import cPickle as Pickle  # Python 2; on Python 3 use the built-in pickle module

    with open("mymodel.pkl", "wb") as mymodelfile:
        Pickle.dump(model, mymodelfile)

    with open("mymodel.pkl", "rb") as mymodelfile:
        thenewmodel = Pickle.load(mymodelfile)
    thenewmodel.predict(newvector)
  5. With Sklearn’s Joblib

    from sklearn.externals import joblib  # in recent versions: import joblib

    joblib.dump(model, "model.joblib", compress=1)  # compression into 1 file
    thenewmodel = joblib.load("model.joblib")
    thenewmodel.predict(newvector)
  6. Pickle vs Joblib Performance
    [Bar charts comparing the same model serialized with Joblib and with Pickle:
    time to load the model, 0.72 s vs 0.23 s; model file size, 48 kb vs 4.7 kb.]
  7. Verdict
    GOOD
    • Consistent way to save time and reuse the same model everywhere.
    • Fast!
    BAD
    • Might not work if the scikit-learn and Python versions differ between the saving and loading environments.
    • DevOps nightmare.
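One common mitigation for the version-mismatch problem is to store the library version next to the model and fail loudly on mismatch, instead of letting a deserialized model silently misbehave. A sketch; the file name and payload layout are illustrative, not from the deck:

```python
import pickle

import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(
    iris.data, iris.target)

# Bundle the model with the scikit-learn version it was trained under
payload = {"sklearn_version": sklearn.__version__, "model": model}
with open("mymodel.pkl", "wb") as f:
    pickle.dump(payload, f)

# At load time, check the version before serving predictions
with open("mymodel.pkl", "rb") as f:
    loaded = pickle.load(f)
if loaded["sklearn_version"] != sklearn.__version__:
    raise RuntimeError("model trained under sklearn %s, runtime has %s"
                       % (loaded["sklearn_version"], sklearn.__version__))
model = loaded["model"]
print(model.predict(iris.data[:1]))
```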
  8. Cost vs Technological Benefit Tradeoff
    [Chart plotting cost ($) against technological benefit for the deployment options:
    native Java/C++ models, rebuilding the whole stack, a Python API powered model,
    a hybrid approach, and PMML.]
  9. Native Libraries
    • Mostly used on legacy systems (old CRMs, banking…) or high-frequency trading strategies.
    • If used correctly, they are fast.
    • Entire list of libraries: https://github.com/josephmisiti/awesome-machine-learning
    • C++:
      • LightGBM (https://github.com/Microsoft/LightGBM)
      • MLPack (http://www.mlpack.org/)
      • Caffe/CUDA (deep learning) (http://caffe.berkeleyvision.org/)
    • Java:
      • Aerosolve (https://github.com/airbnb/aerosolve)
      • H2O (https://github.com/h2oai/h2o-3)
      • Weka (http://www.cs.waikato.ac.nz/ml/weka/)
  10. Native Java/C++ Verdict
    GOOD
    • Used on high-frequency trading floors where speed trumps usability and agility.
    • Faster!!!
    BAD
    • No use of the scikit-learn / pandas data science libraries.
    • Limited choice of available algorithms.
    • Difficult and costly ($$).
    • Does anybody know a data scientist who works exclusively in Java or C++? (They are all in New York.)
  11. PMML (Predictive Model Markup Language)
    PMML stands for "Predictive Model Markup Language". It is the de facto standard for representing predictive solutions. A PMML file may contain a myriad of data transformations (pre- and post-processing) as well as one or more predictive models. Because it is a standard, PMML allows different statistical and data mining tools to speak the same language: a predictive solution can easily be moved among tools and applications without the need for custom coding. For example, it may be developed in one application and directly deployed in another.
  12. PMML Pipeline
    Scikit-learn model → export as PMML (sklearn2pmml) → model.pmml →
    import the PMML file from Knime, Weka, R, SAS, C++, Java, or a general-purpose app → use the model.
    • sklearn2pmml: https://github.com/jpmml/sklearn2pmml
    • Java PMML library: https://github.com/jpmml
    • Apache Spark PMML: https://github.com/jpmml/jpmml-spark
  13. Python Code (Simplified)

    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn2pmml import PMMLPipeline, sklearn2pmml

    iris = load_iris()
    # Create a dataframe with the four feature variables and the target
    iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
    iris_df["Species"] = iris.target

    iris_pipeline = PMMLPipeline([
        ("classifier", RandomForestClassifier())
    ])
    iris_pipeline.fit(iris_df[iris_df.columns.difference(["Species"])],
                      iris_df["Species"])

    sklearn2pmml(iris_pipeline, "RandomForestClassifier_Iris.pmml")
  14. Java Code (Much Simplified)

    // Load the file using our simple util function.
    PMML pmml = JPMMLUtils.loadModel("RandomForestClassifier_Iris.pmml");

    // Now we need a prediction evaluator for the loaded model
    PMMLManager mgr = new PMMLManager(pmml);
    ModelEvaluator modelEvaluator = (ModelEvaluator) mgr.getModelManager(
        modelName, ModelEvaluatorFactory.getInstance());
    Evaluator evaluator = modelEvaluator;
    …
    Map results = evaluator.evaluate(features); // prediction happens here
  15. Use PMML in Spark

    spark-submit --master local --class org.jpmml.spark.EvaluationExample \
        example-1.0-SNAPSHOT.jar RandomForestClassifier_Iris.pmml Iris.csv /tmp/output/

    example-1.0-SNAPSHOT.jar contains Java code that imports the CSV and the PMML model.
  16. Verdict
    GOOD
    • Interoperable.
    • Use the Python data science stack and deploy everywhere.
    BAD
    • No agility or sustainability.
    • Not every ML algorithm is available.
    • PMML files are BIG (gigabytes…).
    • Need unit tests to match the Python output with the new output = slow deployment.
  17. Flask - Scikit-learn Model
    [Diagram: Application / Webapp / Mobile App ↔ (web request/response) ↔ Nginx ↔ Flask ↔ Scikit-learn model.
    The request carries the features vector (x); the response carries the predicted value (y).]
    Request: x {a=1, b=3.4, c=3}
    Response: status=200, y {predicted="setosa"}
  18. JSON POST Request

    curl -H "Content-Type: application/json" -X POST \
        -d '{"a":1,"b":2,"c":4}' http://localhost:5000/api/1.0/predict

    Better with security enabled:

    curl -H "Content-Type: application/json" \
        -H "Authorization: Bearer <ACCESS_TOKEN>" -X POST \
        -d '{"a":1,"b":2,"c":4}' http://localhost:5000/api/1.0/predict
  19. Flask Code (Simplified)

    from flask import Flask, request, jsonify
    from config import VERSION

    app = Flask(__name__)
    # model is assumed to be loaded at startup (e.g. via joblib.load)

    @app.route('/api/{version}/predict'.format(version=VERSION), methods=['POST'])
    def predict():
        data = request.get_json(silent=True)  # don't shadow flask.request
        a = data.get('a')
        b = data.get('b')
        c = data.get('c')
        prediction = model.predict([[a, b, c]])  # predict expects a 2-D array
        response = dict(status="ok", prediction=prediction[0])
        return jsonify(response)
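An endpoint like this can be exercised without deploying anything, using Flask's built-in test client. A self-contained sketch; the three-feature model and the hard-coded route are illustrative stand-ins for the deck's a/b/c example:

```python
from flask import Flask, request, jsonify
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train on the first three iris features so they map onto the
# hypothetical request fields a, b, c used in the slides
iris = load_iris()
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(
    iris.data[:, :3], iris.target)

app = Flask(__name__)

@app.route("/api/1.0/predict", methods=["POST"])
def predict():
    data = request.get_json(silent=True)  # don't shadow flask.request
    vector = [[data["a"], data["b"], data["c"]]]
    pred = model.predict(vector)[0]
    return jsonify(status="ok", prediction=str(iris.target_names[pred]))

# Exercise the endpoint in-process with Flask's test client
client = app.test_client()
resp = client.post("/api/1.0/predict", json={"a": 5.1, "b": 3.5, "c": 1.4})
print(resp.get_json())
```

The same request shape works against the real server once it is behind Gunicorn and Nginx, so the test doubles as a contract check on the API.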
  20. Python Web Frameworks Benchmark
    • Results for loading and returning a JSON object.
    • Falcon and Flask have the best speed/usability tradeoff.
  21. MongoDB Document

    {
      "_id": ObjectId("4f693d40e4b04cde19f17205"),
      "hostname": "ec2-203-0-113-25.compute-1.amazonaws.com",
      "user_id": "19846",
      "prediction_id": "3f6dcfe0-f0ac-4e94-ac46-35c1ce8d59f8",  // for traceability
      "model_version": "1.0",
      "request_features": { "a": 1.5, "b": 0, "c": 3.2 },
      "response_prediction": { "class": "setosa", "probability": 0.701 },
      "requested_at": ISODate("2017-03-10T10:50:42.389Z"),
      "predicted_at": ISODate("2017-03-10T10:50:43.132Z")
    }
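Building an audit document like this is straightforward in Python. A sketch; the field layout follows the slide, while the helper name and the commented pymongo call are my own assumptions:

```python
import uuid
from datetime import datetime, timezone

def make_prediction_log(user_id, model_version, features, prediction, probability):
    """Build an audit document recording one prediction request."""
    return {
        "user_id": user_id,
        "prediction_id": str(uuid.uuid4()),  # for traceability
        "model_version": model_version,
        "request_features": features,
        "response_prediction": {"class": prediction, "probability": probability},
        "predicted_at": datetime.now(timezone.utc),
    }

doc = make_prediction_log("19846", "1.0",
                          {"a": 1.5, "b": 0, "c": 3.2}, "setosa", 0.701)
print(doc["prediction_id"])

# In production, assuming a running MongoDB and the pymongo driver:
# from pymongo import MongoClient
# MongoClient()["mldb"]["predictions"].insert_one(doc)
```

Logging the model version with every prediction is what makes it possible to compare models later, as in the A/B strategies below.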
  22. Nginx Config File

    # Define your "upstream" servers -
    # the servers requests will be sent to
    upstream app_example {
        least_conn;                 # Use Least Connections strategy
        server 192.168.1.19:5000;   # Flask Server 1
        server 192.168.1.19:5001;   # Flask Server 1, Model 2
        server 192.168.1.20:5000;   # Flask Server 2
        server 192.168.1.21:5000;   # Flask Server 3
    }

    server {
        listen 80;
        server_name model.example.com;

        # Pass the request to the Flask/Gunicorn servers,
        # with some correct headers for proxy-awareness
        location / {
            proxy_pass http://app_example;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_set_header X-NginX-Proxy true;
        }
    }
  23. 2 Strategies
    Model 1 (Sample 1): 3000 visits, 10% conversion
    Model 2 (Sample 2): 3000 visits, 40% conversion → use Model 2 everywhere
    Strategy 1: use the different models as they are, and update the models with new data afterwards.
    Strategy 2: use an A/B testing strategy to serve the best-performing model to the whole sample.
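Strategy 2's traffic split can be sketched as weighted routing between model versions. A minimal sketch; the model names and weights are illustrative, not from the deck:

```python
import random

def route(weights):
    """Pick a model version for the incoming request (weighted A/B split)."""
    names, probs = zip(*weights.items())
    return random.choices(names, weights=probs, k=1)[0]

# Phase 1: 50/50 split while conversions are being measured
weights = {"model_1": 0.5, "model_2": 0.5}
sample = [route(weights) for _ in range(1000)]

# Phase 2: after observing 10% vs 40% conversion, send all traffic to the winner
weights = {"model_1": 0.0, "model_2": 1.0}
print(route(weights))  # prints "model_2"
```

In practice the same effect is often achieved at the load-balancer level (e.g. Nginx upstream server weights) rather than in application code.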
  24. Verdict
    GOOD
    • SCALABLE.
    • Interoperable (can be used by both backend and frontend languages, think: JavaScript!).
    • Agile: models can be put in production very fast and with no code change, as simple as launching a new server instance.
    • As fast as you need it to be (add new servers or Docker containers).
    • Did I say agile?
    BAD
    • The infrastructure can become overwhelming and costly over time.