Real World Challenges in Deploying Machine Learning Applications

Talk given by Anand Chitipothu and Ananth Krishnamoorthy at BangPypers meetup in Nov 2017.

Anand Chitipothu

November 18, 2017

Transcript

  1. Challenges: Deployment
    - Deploy a model as a service
    - Keeping track of usage and latencies
  2. Challenges: Model Management
    - Maintaining multiple versions of a model
    - Keeping track of what went into building a model
  3. Challenges: Collaboration
    - Share models with others in the team
    - Model reuse across the team
    - Access control
  4. The pace of innovation of a data-driven business is limited by the bottlenecks in its data science workflows.
  5. Our Approach
    A data science platform to address these issues. Key elements:
    - Firefly: a tool to run Python functions as a RESTful API (open source)
    - Compute environment to deploy models, run scheduled jobs and notebooks
    - Model versioning
  6. The Problem
    How to expose an ML model as an API for others to use?
    Or: how to expose a Python function as an API?
  7. Challenges
    - Requires writing a web application
    - What about authentication?
    - How to do data validation?
    - Need to write a client library too?
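To make these challenges concrete, here is a minimal sketch of what exposing one function by hand can involve, using only the Python standard library. It is not how Firefly is implemented; `make_app`, `respond` and `call` are hypothetical names chosen for this illustration, and the status codes are one reasonable choice among several.

```python
import inspect
import io
import json


def square(n):
    """Example function to expose; mirrors the square example from the talk."""
    return n ** 2


def respond(start_response, status, payload):
    """Send a JSON response through the WSGI start_response callable."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_app(functions):
    """Build a WSGI app that serves each function at POST /<name>."""
    def app(environ, start_response):
        name = environ.get("PATH_INFO", "/").strip("/")
        func = functions.get(name)
        if func is None:
            return respond(start_response, "404 Not Found",
                           {"error": "no such function"})
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            kwargs = json.loads(environ["wsgi.input"].read(length) or b"{}")
            # Data validation: reject unknown or missing arguments up front.
            inspect.signature(func).bind(**kwargs)
        except (ValueError, TypeError) as exc:
            return respond(start_response, "422 Unprocessable Entity",
                           {"error": str(exc)})
        return respond(start_response, "200 OK", func(**kwargs))
    return app


def call(app, path, payload):
    """Invoke the WSGI app in-process, without opening a socket."""
    body = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


app = make_app({"square": square})
print(call(app, "/square", {"n": 4}))
```

Even this small version needs routing, JSON decoding, argument validation and error handling, and it still has no authentication or client library. To serve it over HTTP, the app could be passed to `wsgiref.simple_server.make_server`.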
  8. Use
    And use it with a client.

        >>> import firefly
        >>> client = firefly.Client("http://127.0.0.1:8000")
        >>> client.square(n=4)
        16
  9. Behind the scenes, it is a RESTful API.

        $ curl -d '{"n": 4}' http://127.0.0.1:8000/square
        16

    It supports any JSON-friendly datatype.
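The client call and the curl request above describe the same exchange: the function name becomes the URL path and the keyword arguments become a JSON object in the POST body. A minimal sketch of a client built on that idea follows. `SketchClient` is a hypothetical stand-in written for this illustration and is not `firefly.Client`; the `send` parameter is an assumption added so the class can be exercised without a running server.

```python
import json
import urllib.request


class SketchClient:
    """Hypothetical remote-function client (not firefly.Client)."""

    def __init__(self, base_url, send=None):
        self.base_url = base_url.rstrip("/")
        # `send` takes a urllib Request and returns the response body as bytes.
        self._send = send or self._send_over_http

    @staticmethod
    def _send_over_http(request):
        with urllib.request.urlopen(request) as response:
            return response.read()

    def build_request(self, name, kwargs):
        """POST the keyword arguments as a JSON object to <base_url>/<name>."""
        return urllib.request.Request(
            url="{}/{}".format(self.base_url, name),
            data=json.dumps(kwargs).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def __getattr__(self, name):
        # Only called for attributes not found normally, so client.square
        # becomes a function that calls the remote endpoint /square.
        if name.startswith("_"):
            raise AttributeError(name)

        def remote_call(**kwargs):
            return json.loads(self._send(self.build_request(name, kwargs)))

        return remote_call


def fake_send(request):
    """Stand-in transport that squares n locally instead of using the network."""
    return json.dumps(json.loads(request.data)["n"] ** 2).encode("utf-8")


client = SketchClient("http://127.0.0.1:8000", send=fake_send)
print(client.square(n=4))
```

Because the arguments and the result both travel as JSON, only JSON-friendly datatypes can cross the wire, which matches the restriction noted on the slide.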
  10. Deploying a Machine Learning Model
    The code:

        # model.py
        import pickle

        # pickle.load expects an open binary file, not a path
        with open('model.pkl', 'rb') as f:
            model = pickle.load(f)

        def predict(features):
            result = model.predict([features])
            return int(result[0])
  11. Run the server using:

        $ firefly model.predict
        ...

    And use it in the client:

        >>> remote_model = firefly.Client("http://localhost:8080/")
        >>> remote_model.predict(features=[5.9, 3, 5.1, 1.8])
        2
  12. The client must pass the same token to authenticate.

        >>> client = firefly.Client(
        ...     "http://127.0.0.1:8000",
        ...     auth_token="abcd1234")
        >>> client.square(n=4)
        16
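The server side of the token check is not shown in this transcript. As a rough sketch of what such a check can look like, the function below compares a token taken from an `Authorization` header with the expected one. The header layout (`Token <value>`) and the name `is_authorized` are assumptions made for this illustration, not Firefly's documented behavior.

```python
import hmac


def is_authorized(headers, expected_token):
    """Return True if the Authorization header carries the expected token.

    Assumes a header of the form "Authorization: Token <value>"; the scheme
    name is an assumption for this sketch.
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme != "Token":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode("utf-8"),
                               expected_token.encode("utf-8"))


print(is_authorized({"Authorization": "Token abcd1234"}, "abcd1234"))
```

Using a constant-time comparison is a small design choice that costs nothing and avoids timing side channels on the token.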
  13. The Compute Platform
    We have tools to manage ML models and serve them as APIs. But we still need to:
    - Set up the right environment
    - Provision the server as needed
    - Serve the functions
  14. Projects
    A project contains:
    - A unique name
    - A runtime
    - The code
    - Services and scheduled tasks
    - Data volumes
  15. The Setup (1/3)
    Every project contains a special file, roro.yml. It contains the project name and runtime.

        project: credit-risk
        runtime: python3
  16. The Setup (2/3)
    The services that need to be running.

        services:
          - name: default
            function: predict.predict
            size: S1
          - name: credit-grade
            function: credit_grade.get_credit_grade
            size: S1
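Each `function` entry above is a dotted path of the form `module.function`, the same shape as the `model.predict` argument passed to the firefly command earlier. The transcript does not say how the platform resolves these paths; one common way to do it in Python is sketched below, with `resolve_function` as a hypothetical helper name.

```python
import importlib


def resolve_function(dotted_path):
    """Turn "module.function" into the callable it names."""
    module_name, _, func_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.function', got {!r}".format(dotted_path))
    # Import the module part, then look the function up as an attribute.
    func = getattr(importlib.import_module(module_name), func_name)
    if not callable(func):
        raise TypeError("{!r} is not callable".format(dotted_path))
    return func


# Standard-library functions stand in for predict.predict here.
sqrt = resolve_function("math.sqrt")
print(sqrt(16))
```

Splitting on the last dot means nested modules such as `package.module.function` resolve the same way.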
  17. The Setup (3/3)
    And the scheduled periodic tasks.

        tasks:
          - name: train
            command: python train.py
            size: S2
            when: every day at 10:00 AM
          - name: restart-web
            command: roro ps:restart web
            when: after train
  18. The API
    Make your code changes and:

        $ roro deploy
        Deploying credit-risk...
        Building docker image... done.
        Updating scheduled jobs... done.
        Restarting services...
          default: https://credit-risk.rorocloud.io/
          credit-grade: https://credit-risk--credit-grade.rorocloud.io/
        Deployed v4 of credit-risk project.
  19. Run scripts and notebooks.

        $ roro run -size C64 train.py
        Created new job b42c12a0

        $ roro run --gpu train.py
        Created new job b42c12a0

        $ roro run:notebook
        Created new job 60984179
        Jupyter notebook is available at:
        https://60984179-nb.rorocloud.io/?token=LNRZDpHdPhGLzf00
        The jupyter notebook server can be stopped using:
        roro stop 60984179
  20. Inspect:

        $ roro ps
        JOBID     STATUS    WHEN            TIME     CMD
        --------  --------  --------------  -------  ---------------
        60984179  running   14 minutes ago  0:14:18  [notebook]
        74ee24a1  running   24 minutes ago  0:24:47  python train.py

        $ roro logs 74ee24a1
        ...
        Iteration 1 - 43.01
        Iteration 2 - 44.04
        ...
        Iteration 34 - 67.32
  21. The Requirements
    - Storage: Save and retrieve multiple versions of a model
    - Metadata: Associate additional metadata with each model version
    - Simplicity: Simple Python and command-line interface
  22. The Concepts
    - Model Repository: Repository for storing multiple versions of a model.
    - Model Image: A saved version of a model, including the metadata.
  23. Diagram of a model repository holding two model images:

        +---------------------------------------+
        |          Model Repository A           |
        |                                       |
        |  ModelImage - v1   ModelImage - v2    |
        | +---------------+ +---------------+   |
        | |   Model v1    | |   Model v2    |   |
        | +---------------+ +---------------+   |
        | |  Metadata v1  | |  Metadata v2  |   |
        | +---------------+ +---------------+   |
        +---------------------------------------+
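The repository and image concepts can be sketched as a small in-memory data structure: a repository is an append-only list of images, each image pairs a model with its metadata, and tags are names that point at version numbers. The classes below are hypothetical and written only to illustrate the concepts; they are not the roro API, and a real repository would persist images rather than hold them in memory.

```python
class ModelImage:
    """One saved version of a model together with its metadata."""

    def __init__(self, model, version, metadata=None):
        self.model = model
        self.version = version
        self.metadata = dict(metadata or {})

    def __getitem__(self, key):
        return self.metadata[key]

    def __setitem__(self, key, value):
        self.metadata[key] = value


class ModelRepository:
    """Append-only store of ModelImage versions for one named model."""

    def __init__(self, name):
        self.name = name
        self._images = []   # index i holds version i + 1
        self._tags = {}     # tag name -> version number

    def save(self, model, metadata=None):
        """Store a new version and move the "latest" tag to it."""
        image = ModelImage(model, version=len(self._images) + 1,
                           metadata=metadata)
        image["Model-Name"] = self.name
        image["Model-Version"] = image.version
        self._images.append(image)
        self._tags["latest"] = image.version
        return image

    def tag(self, tag, version):
        """Point a named tag at an existing version."""
        if not 1 <= version <= len(self._images):
            raise KeyError("no version {} in {}".format(version, self.name))
        self._tags[tag] = version

    def get(self, tag="latest"):
        """Return the image a tag points at."""
        return self._images[self._tags[tag] - 1]


repo = ModelRepository("credit-risk")
repo.save("model-a", {"Training-Accuracy": 0.80})
repo.save("model-b", {"Training-Accuracy": 0.85})
repo.tag("production", 1)
print(repo.get()["Model-Version"], repo.get("production")["Model-Version"])
```

Keeping old versions immutable and moving only the tags is what lets a `production` tag lag behind `latest` without copying any model data.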
  24. The Metadata
    It is important to capture everything that went into building a model, including:
    - who built it
    - what dataset was used
    - what were the features used
    - what was the accuracy
    - etc.
  25. Sample metadata:

        Model-ID: c54e00eb
        Model-Name: iris
        Model-Version: 3
        Author: Alice Foo <[email protected]>
        Date: 2017-08-02T10:20:30Z
        Content-Encoding: pickle+gzip
        Input-Source: s3://iris-sample-data
        Python-Version: 3.5.1
        Dataset-Features: Sepal-Length,Sepal-Width,Petal-...
        Dataset-Rows: 150
        Training-Algorithm: SVM
        Training-Parameters: C=10; alpha=0.4; kernel=rbf
        Training-Accuracy: 0.85
        Job-Id: 0c51db57
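The sample metadata is a list of `Key: Value` lines, with training parameters packed into one value as `name=value` pairs separated by semicolons. The transcript does not specify how this is stored or parsed; the sketch below only shows that the format as displayed is easy to read back into a dictionary. `parse_metadata` and `parse_parameters` are hypothetical helper names, and all values are kept as strings.

```python
def parse_metadata(text):
    """Parse "Key: Value" lines into a dict, keeping keys in order."""
    metadata = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a 'Key: Value' line: {!r}".format(line))
        metadata[key.strip()] = value.strip()
    return metadata


def parse_parameters(value):
    """Split "C=10; kernel=rbf" into {"C": "10", "kernel": "rbf"}."""
    pairs = (item.split("=", 1) for item in value.split(";") if item.strip())
    return {name.strip(): setting.strip() for name, setting in pairs}


sample = """\
Model-Name: iris
Model-Version: 3
Date: 2017-08-02T10:20:30Z
Input-Source: s3://iris-sample-data
Training-Parameters: C=10; alpha=0.4; kernel=rbf
"""

meta = parse_metadata(sample)
print(meta["Model-Version"], parse_parameters(meta["Training-Parameters"]))
```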
  26. The Python Interface
    Get a Model:

        import roro

        # Get the current project
        project = roro.get_current_project()

        # Get the repo for the model you are looking for
        repo = project.get_model_repository("credit-risk")

        # Get the model image
        model_image = repo.get_model_image(tag="latest")
        print(model_image["Model-Version"], model_image["Accuracy"])
  27. Save a Model:

        import roro

        project = roro.get_current_project()
        repo = project.get_model_repository("credit-risk")

        model_image = repo.new_image(model)
        model_image["Dataset-Features"] = "A,B,C,D"
        model_image["Training-Parameters"] = parameters
        model_image["Training-Accuracy"] = 0.35
        model_image.save(
            comment="Built a new model using the data till August 2017")
  28. The CLI

        $ roro models:list
        credit-risk   v4
        hello-world   v3

        $ roro models:download credit-risk:latest
        Downloaded credit-risk model to credit-risk.model

        $ roro models:tags credit-risk
        jan2017      v1
        production   v3
        latest       v4
  29. Model history:

        $ roro models:log
        Model-ID: 4fbe8871
        Model-Name: credit-risk
        Model-Version: 4
        Date: Thu Sep 1 13:16:14 2017 +0530

            Updated the model with August data.

        Model-ID: bdc0a3b4
        Model-Name: hello-world
        Model-Version: 1
        Date: Thu Jul 27 11:17:14 2017 +0530

            First Version of the hello-world model.
  30. Details of one model image:

        $ roro models:show credit-risk:latest
        Model-ID: 4fbe8871
        Model-Name: credit-risk
        Model-Version: 4
        Date: Thu Sep 1 13:16:14 2017 +0530
        Author: Alice Foo <[email protected]>
        Content-Encoding: pickle+gzip
        Input-Source: s3://credit-risk-data
        Python-Version: 3.5.1
        Dataset-Features: age,income,years,ownership,grade
        Dataset-Rows: 150
        Training-Algorithm: DecisionTree
        Training-Parameters: max-depth=5
        Training-Accuracy: 0.85
        Job-Id: 01c5d25b

            Updated the model with August data.
  31. Summary
    - Deploying ML apps is non-trivial
    - The pace of innovation of a data-driven business is limited by the bottlenecks in its data science workflows
    - Data science platforms are essential to fill that gap