Slide 1

Slide 1 text

Real World Challenges in Deploying Machine Learning Applications
Anand Chitipothu, Ananth Krishnamoorthy
rorodata

Slide 2

Slide 2 text

Outline
— Challenges in deploying machine learning applications
— Our approach to address them
— Summary

Slide 3

Slide 3 text

The Challenges
— Deployment
— Infrastructure Provisioning
— Model Management
— Collaboration

Slide 4

Slide 4 text

Challenges: Deployment
— Deploy a model as a service
— Keeping track of usage and latencies
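As a rough illustration of the second bullet, usage and latency can be recorded by wrapping the served function. This is a minimal in-memory sketch, not part of any rorodata tool: `tracked`, `LATENCIES`, `usage_report` and the stand-in `predict` are hypothetical names chosen for this example.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: function name -> list of latencies in seconds.
LATENCIES = defaultdict(list)

def tracked(func):
    """Record one latency sample for every call to func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even when func raises.
            LATENCIES[func.__name__].append(time.perf_counter() - start)
    return wrapper

def usage_report():
    """Summarise call count and mean latency per tracked function."""
    return {
        name: {"calls": len(samples), "mean_seconds": sum(samples) / len(samples)}
        for name, samples in LATENCIES.items()
    }

@tracked
def predict(features):
    # Stand-in for a real model call.
    return sum(features)
```

A real deployment would ship these samples to a metrics store instead of keeping them in process memory.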

Slide 5

Slide 5 text

Challenges: Infrastructure
— Starting instances on demand
— Better resource utilization
— Scaling

Slide 6

Slide 6 text

Challenges: Model Management
— Maintaining multiple versions of a model
— Keeping track of what went into building a model

Slide 7

Slide 7 text

Challenges: Collaboration
— Share models with others in the team
— Model reuse across the team
— Access control

Slide 8

Slide 8 text

The pace of innovation of a data-driven business is limited by the bottlenecks in its data science workflows

Slide 9

Slide 9 text

Our Approach

Slide 10

Slide 10 text

Our Approach
A data science platform to address these issues.
Key Elements:
— Firefly: a tool to run Python functions as a RESTful API (open source)
— Compute environment to deploy models, run scheduled jobs and notebooks
— Model versioning

Slide 11

Slide 11 text

Firefly
Deploying ML Models

Slide 12

Slide 12 text

The Problem
How to expose an ML model as an API for others to use?
Or
How to expose a Python function as an API?

Slide 13

Slide 13 text

Challenges
— Requires writing a web application
— What about authentication?
— How to do data validation?
— Need to write a client library too?
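To make the first and third bullets concrete, here is a minimal sketch of the web application one would otherwise write by hand to expose a single function, using only the standard library's WSGI conventions. The route `/square`, the error payloads and the in-process helper `call` are assumptions made for this example, not something taken from the slides.

```python
import io
import json

def square(n):
    return n * n

def app(environ, start_response):
    """Minimal WSGI application exposing square() at POST /square."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/square":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        n = payload["n"]
        # Validation is written by hand; bool is an int subclass, so exclude it.
        if isinstance(n, bool) or not isinstance(n, (int, float)):
            raise TypeError("n must be a number")
    except (ValueError, KeyError, TypeError) as exc:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [json.dumps({"error": str(exc)}).encode()]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(square(n)).encode()]

def call(app, path, body):
    """Invoke the WSGI app in-process, without opening a socket."""
    raw = json.dumps(body).encode()
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))
```

The same `app` could be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`. Authentication and a client library would still have to be added on top.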

Slide 14

Slide 14 text

Welcome to Firefly
Deploying functions made easy!

Slide 15

Slide 15 text

Code
Write your function:

    # sq.py
    def square(n):
        return n*n

Slide 16

Slide 16 text

Run
Start web service:

    $ firefly sq.square
    http://127.0.0.1:8000/
    ...

Slide 17

Slide 17 text

Use
And use it with a client.

    >>> import firefly
    >>> client = firefly.Client("http://127.0.0.1:8000")
    >>> client.square(n=4)
    16

Slide 18

Slide 18 text

Behind the scenes, it is a RESTful API.

    $ curl -d '{"n": 4}' http://127.0.0.1:8000/square
    16

And it supports any JSON-friendly datatype.
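The curl call above can be reproduced from Python with the standard library alone. This is a sketch under assumptions: `build_request` and `call_remote` are hypothetical helpers, and the `Content-Type: application/json` header is a choice made here (the curl command on the slide does not set it).

```python
import json
import urllib.request

def build_request(base_url, function_name, **kwargs):
    """Build the POST request that mirrors: curl -d '{...}' <base_url>/<function_name>."""
    url = base_url.rstrip("/") + "/" + function_name
    body = json.dumps(kwargs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )

def call_remote(base_url, function_name, **kwargs):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_request(base_url, function_name, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the request does not touch the network.
request = build_request("http://127.0.0.1:8000", "square", n=4)
```

With the service from the earlier slide running, `call_remote("http://127.0.0.1:8000", "square", n=4)` would send the same JSON body as the curl example.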

Slide 19

Slide 19 text

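Deploying a Machine Learning Model
The code:

    # model.py
    import pickle

    # pickle.load expects an open binary file, not a path
    with open('model.pkl', 'rb') as f:
        model = pickle.load(f)

    def predict(features):
        # The model expects a batch, so wrap the single sample in a list
        result = model.predict([features])
        return int(result[0])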
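A self-contained variant of the same pattern, runnable without a trained model. The "model" here is plain data (three class centroids with made-up, illustrative values, not fitted to any dataset) and the prediction is a nearest-centroid lookup; both are stand-ins chosen for this sketch. It shows the two details that matter for serving: `pickle.load` reads from an open binary file, and the result is converted to a plain `int` so it can be serialized as JSON.

```python
import os
import pickle
import tempfile

# Stand-in "model": illustrative centroids per class label (made-up values).
centroids = {
    0: [5.0, 3.4, 1.5, 0.2],
    1: [5.9, 2.8, 4.3, 1.3],
    2: [6.6, 3.0, 5.6, 2.0],
}

# Save the model the way a training script might.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(centroids, f)

# Load it once at import time, as model.py does.
with open(path, "rb") as f:
    model = pickle.load(f)

def predict(features):
    """Return the label of the nearest centroid as a plain int."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, model[label]))
    return int(min(model, key=distance))
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.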

Slide 20

Slide 20 text

Run the server using:

    $ firefly model.predict
    ...

And use it in the client:

    >>> remote_model = firefly.Client("http://localhost:8080/")
    >>> remote_model.predict(features=[5.9, 3, 5.1, 1.8])
    2

Slide 21

Slide 21 text

Authentication
Firefly has built-in support for authentication.

    $ firefly --token abcd1234 sq.square
    ...

Slide 22

Slide 22 text

The client must pass the same token to authenticate.

    >>> client = firefly.Client(
    ...     "http://127.0.0.1:8000",
    ...     auth_token="abcd1234")
    >>> client.square(n=4)
    16
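The slides do not show how the token travels over HTTP. As a sketch of what a shared-token check on the server side could look like, assume the client sends it in an `Authorization: Token <value>` header (that header format is an assumption here, and `is_authorized` is a hypothetical helper):

```python
import hmac

def is_authorized(headers, expected_token):
    """Return True when the Authorization header carries the expected token."""
    scheme, _, supplied = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "token" or not supplied:
        return False
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking how much of the token matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A shared token like this should only be sent over HTTPS, since anyone who can read the request can reuse it.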

Slide 23

Slide 23 text

It's Open Source!
Firefly is an open source project released under the Apache 2 License.

Slide 24

Slide 24 text

The Compute Platform

Slide 25

Slide 25 text

The Compute Platform
We have tools to manage ML models and serve them as APIs.
But we still need to:
— Set up the right environment
— Provision the server as needed
— Serve the functions

Slide 26

Slide 26 text

The Abstraction
Every project gets an elastic computer in the cloud.

Slide 27

Slide 27 text

Projects
A project contains:
— Unique name
— A runtime
— The code
— Services and scheduled tasks
— Data volumes

Slide 28

Slide 28 text

The Setup (1/3)
Every project contains a special file roro.yml. It contains the project name and runtime.

    project: credit-risk
    runtime: python3

Slide 29

Slide 29 text

The Setup (2/3)
The services that need to be running.

    services:
      - name: default
        function: predict.predict
        size: S1
      - name: credit-grade
        function: credit_grade.get_credit_grade
        size: S1

Slide 30

Slide 30 text

The Setup (3/3)
And the scheduled periodic tasks.

    tasks:
      - name: train
        command: python train.py
        size: S2
        when: every day at 10:00 AM
      - name: restart-web
        command: roro ps:restart web
        when: after train

Slide 31

Slide 31 text

The API
Make your code changes and:

    $ roro deploy
    Deploying credit-risk...
    Building docker image... done.
    Updating scheduled jobs... done.
    Restarting services...
      default: https://credit-risk.rorocloud.io/
      credit-grade: https://credit-risk--credit-grade.rorocloud.io/

    Deployed v4 of credit-risk project.
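The two URLs in the deploy output suggest a naming pattern: the `default` service sits at the project's hostname and any other service gets a `--<service>` suffix. The sketch below encodes that pattern as inferred from this one example; the actual rule used by the platform is not stated in the slides, and `service_url` is a hypothetical helper.

```python
def service_url(project, service, domain="rorocloud.io"):
    """Build a service URL following the pattern seen in the deploy output."""
    # "default" maps to the bare project hostname; other services
    # are appended after a double hyphen.
    host = project if service == "default" else f"{project}--{service}"
    return f"https://{host}.{domain}/"

# The services declared in roro.yml for the credit-risk project.
urls = {name: service_url("credit-risk", name) for name in ("default", "credit-grade")}
```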

Slide 32

Slide 32 text

Run scripts and notebooks.

    $ roro run -size C64 train.py
    Created new job b42c12a0

    $ roro run --gpu train.py
    Created new job b42c12a0

    $ roro run:notebook
    Created new job 60984179
    Jupyter notebook is available at:
    https://60984179-nb.rorocloud.io/?token=LNRZDpHdPhGLzf00

    The jupyter notebook server can be stopped using:
      roro stop 60984179

Slide 33

Slide 33 text

Inspect:

    $ roro ps
    JOBID     STATUS    WHEN            TIME     CMD
    --------  --------  --------------  -------  ---------------
    60984179  running   14 minutes ago  0:14:18  [notebook]
    74ee24a1  running   24 minutes ago  0:24:47  python train.py

    $ roro logs 74ee24a1
    ...
    Iteration 1 - 43.01
    Iteration 2 - 44.04
    ...
    Iteration 34 - 67.32

Slide 34

Slide 34 text

Opportunities
— Record and monitor model predictions and performance
— A/B testing of models
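One common way to approach both bullets together is to assign each request to a model deterministically and log what was predicted. The following is only an illustration of that idea, not a description of the platform: `assign_variant`, `predict_ab` and the log record layout are hypothetical.

```python
import hashlib

def assign_variant(request_id, share_b=0.5):
    """Deterministically map a request or user id to model 'A' or 'B'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled into [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2 ** 32
    return "B" if bucket < share_b else "A"

def predict_ab(request_id, features, model_a, model_b, log):
    """Route to one model and record the prediction for later comparison."""
    variant = assign_variant(request_id)
    model = model_b if variant == "B" else model_a
    prediction = model(features)
    log.append({"id": request_id, "variant": variant, "prediction": prediction})
    return prediction
```

Hashing the id, rather than drawing a random number per request, keeps a given user on the same model across calls.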

Slide 35

Slide 35 text

Model Versioning

Slide 36

Slide 36 text

The Requirements
— Storage: Save and retrieve multiple versions of a model
— Metadata: Associate additional metadata with each model version
— Simplicity: Simple Python and command-line interface

Slide 37

Slide 37 text

The Concepts
Model Repository: Repository for storing multiple versions of a model.
Model Image: A saved version of a model, including the metadata.

Slide 38

Slide 38 text

+---------------------------------------+
| Model Repository A                    |
|                                       |
| ModelImage - v1     ModelImage - v2   |
| +---------------+   +---------------+ |
| | Model v1      |   | Model v2      | |
| +---------------+   +---------------+ |
| | Metadata v1   |   | Metadata v2   | |
| +---------------+   +---------------+ |
+---------------------------------------+
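The diagram can be read as a small data structure: a repository holds an ordered list of images, and each image pairs a model with its metadata. The sketch below is an in-memory illustration of that relationship only; the class layout, the `save` and `get` methods and the tag handling are assumptions for this example and are not the roro API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ModelImage:
    """One saved version of a model together with its metadata."""
    model: Any
    version: int
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class ModelRepository:
    """Holds every version of one named model, plus tags pointing at versions."""
    name: str
    images: List[ModelImage] = field(default_factory=list)
    tags: Dict[str, int] = field(default_factory=dict)

    def save(self, model: Any, metadata: Optional[Dict[str, str]] = None) -> ModelImage:
        # Versions are numbered from 1 in the order they are saved.
        image = ModelImage(model, len(self.images) + 1, dict(metadata or {}))
        self.images.append(image)
        self.tags["latest"] = image.version
        return image

    def get(self, tag: str = "latest") -> ModelImage:
        return self.images[self.tags[tag] - 1]

repo = ModelRepository("A")
repo.save("model-v1", {"Author": "Alice Foo"})
repo.save("model-v2", {"Author": "Alice Foo"})
repo.tags["production"] = 1  # pin an older version under a stable name
```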

Slide 39

Slide 39 text

The Metadata
It is important to capture everything that went into building a model, including:
— who built it
— what dataset was used
— which features were used
— what the accuracy was
— etc.

Slide 40

Slide 40 text

Sample metadata:

    Model-ID: c54e00eb
    Model-Name: iris
    Model-Version: 3
    Author: Alice Foo
    Date: 2017-08-02T10:20:30Z
    Content-Encoding: pickle+gzip
    Input-Source: s3://iris-sample-data
    Python-Version: 3.5.1
    Dataset-Features: Sepal-Length,Sepal-Width,Petal-...
    Dataset-Rows: 150
    Training-Algorithm: SVM
    Training-Parameters: C=10; alpha=0.4; kernel=rbf
    Training-Accuracy: 0.85
    Job-Id: 0c51db57
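The sample reads like a list of `Key: Value` header lines. Assuming that layout (the slides do not say how the metadata is actually stored), a few lines of Python are enough to turn it into a dictionary and to split the `Training-Parameters` value. `parse_metadata` and `parse_parameters` are hypothetical helpers, and `SAMPLE` repeats a subset of the fields shown above.

```python
def parse_metadata(text):
    """Parse 'Key: Value' lines into a dict, splitting on the first colon only."""
    metadata = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata

def parse_parameters(value):
    """Split 'C=10; alpha=0.4' style text into a dict of strings."""
    pairs = (item.split("=", 1) for item in value.split(";") if item.strip())
    return {name.strip(): setting.strip() for name, setting in pairs}

SAMPLE = """
Model-Name: iris
Model-Version: 3
Date: 2017-08-02T10:20:30Z
Input-Source: s3://iris-sample-data
Training-Parameters: C=10; alpha=0.4; kernel=rbf
Training-Accuracy: 0.85
"""

metadata = parse_metadata(SAMPLE)
parameters = parse_parameters(metadata["Training-Parameters"])
```

Splitting on the first colon only keeps values such as the timestamp and the `s3://` URL intact.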

Slide 41

Slide 41 text

The Python Interface
Get a Model:

    import roro

    # Get the current project
    project = roro.get_current_project()

    # Get repo for the model you are looking for
    repo = project.get_model_repository("credit-risk")

    # Get the model image
    model_image = repo.get_model_image(tag="latest")
    print(model_image["Model-Version"], model_image["Training-Accuracy"])

Slide 42

Slide 42 text

Save a Model:

    import roro

    project = roro.get_current_project()
    repo = project.get_model_repository("credit-risk")

    model_image = repo.new_image(model)
    model_image["Dataset-Features"] = "A,B,C,D"
    model_image["Training-Parameters"] = parameters
    model_image["Training-Accuracy"] = 0.35
    model_image.save(
        comment="Built a new model using the data till August 2017")

Slide 43

Slide 43 text

The CLI

    $ roro models:list
    credit-risk  v4
    hello-world  v3

    $ roro models:download credit-risk:latest
    Downloaded credit-risk model to credit-risk.model

    $ roro models:tags credit-risk
    jan2017     v1
    production  v3
    latest      v4
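The metadata elsewhere in the deck lists `Content-Encoding: pickle+gzip`. Assuming that means a pickled object compressed with gzip, and assuming a downloaded `.model` file uses that encoding (neither is confirmed by the slides), reading and writing such a file could look like this. `encode_model` and `decode_model` are hypothetical helpers.

```python
import gzip
import pickle

def encode_model(model):
    """Serialize a model as gzip-compressed pickle bytes."""
    return gzip.compress(pickle.dumps(model))

def decode_model(blob):
    """Inverse of encode_model. Only unpickle data from a trusted source."""
    return pickle.loads(gzip.decompress(blob))

# Round trip with a stand-in model made of plain data.
model = {"weights": [0.1, 0.2, 0.3], "bias": 1.5}
blob = encode_model(model)
restored = decode_model(blob)
```

Under the same assumptions, a downloaded file would be loaded with `decode_model(open("credit-risk.model", "rb").read())`.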

Slide 44

Slide 44 text

    $ roro models:log
    Model-ID: 4fbe8871
    Model-Name: credit-risk
    Model-Version: 4
    Date: Thu Sep 1 13:16:14 2017 +0530

        Updated the model with August data.

    Model-ID: bdc0a3b4
    Model-Name: hello-world
    Model-Version: 1
    Date: Thu Jul 27 11:17:14 2017 +0530

        First Version of the hello-world model.

Slide 45

Slide 45 text

    $ roro models:show credit-risk:latest
    Model-ID: 4fbe8871
    Model-Name: credit-risk
    Model-Version: 4
    Date: Thu Sep 1 13:16:14 2017 +0530
    Author: Alice Foo
    Content-Encoding: pickle+gzip
    Input-Source: s3://credit-risk-data
    Python-Version: 3.5.1
    Dataset-Features: age,income,years,ownership,grade
    Dataset-Rows: 150
    Training-Algorithm: DecisionTree
    Training-Parameters: max-depth=5
    Training-Accuracy: 0.85
    Job-Id: 01c5d25b

        Updated the model with August data.

Slide 46

Slide 46 text

Summary
— Deploying ML apps is non-trivial
— The pace of innovation of a data-driven business is limited by the bottlenecks in its data science workflows
— Data Science Platforms are essential to fill that gap

Slide 47

Slide 47 text

Thank You!
Links:
firefly - https://github.com/rorodata/firefly
rorodata - http://rorodata.com/