
Serverless deployment of ML inference models

When you’re looking for use cases for serverless applications, machine learning inference might not immediately (if at all) come to mind. Why? Because trained ML models are basically a big bunch of state (ouch!), and most frameworks come with a large pile of dependencies (double ouch!).

We’ll investigate working examples of deep learning technology for image and text classification using Tensorflow, PyTorch and SpaCy on AWS, Google Cloud and Azure. We will use these examples to discuss handling dependencies, managing global state, and deployment/packaging options in serverless environments. Furthermore, we will try to answer the question of whether these limitations help us improve architectural goals like separation of concerns in code related to machine learning. Finally, I will summarize whether serverless is a good fit for these use cases.

Don't be afraid: machine learning models are only used as the subject of discussion in this talk. No in-depth knowledge of these technologies is required for this session; everything you need to follow the discussion will be covered by the talk itself.

Michael Krämer

April 02, 2019

Transcript

  1. „No coffee, no results!“ Michael Krämer, Software Architect & ML Enthusiast. 15+ years of software development. Works at INNOQ Schweiz in Zürich.
  2. Rumors about Machine Learning • Needs a GPU! • Lots of data • A lot of state • Dependencies • Immature, volatile & low-level dependencies • Needs more GPUs!
  3. Rumors about Cloud Functions • So lightweight • Stateless • For small function scopes only • With minimal instantiation effort • Very limited runtime environment
  4. And what does inference mean? Inference = usage of a trained model: one single data input + the model = a prediction of the output.
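To make the idea concrete: inference is just applying frozen, trained state to a single input. The tiny linear "model" below is a made-up illustration (the weights and labels are not from the talk), kept deliberately framework-free:

```python
# Minimal sketch of inference: trained parameters + one input -> one prediction.
# WEIGHTS and BIAS stand in for a real model's learned state, frozen after training.
WEIGHTS = [0.8, -0.4, 0.1]
BIAS = 0.2

def predict(features):
    """Forward pass of a tiny linear classifier over one data input."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "positive" if score > 0 else "negative"

print(predict([1.0, 0.0, 0.0]))  # applies the trained state to a single input
```

A real model differs only in scale: the "state" grows from three numbers to millions of values, which is exactly what makes it awkward in a stateless function.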
  5. How we checked: 3 typical use cases with 3 well-known frameworks • Sentiment analysis: SpaCy, 10M values (word vectors) • Image classification: Tensorflow, 4M values (retrained MobileNet) • Structured data: Scikit-learn, 1K values (Random Forest)
  6. What you get: language support • AWS Lambda: Python (2.7, 3.6, 3.7), JavaScript (Node 6 & 8), Java 8, Go, PowerShell, C# (.NET Core 1.0, 2.0, 2.1), Custom Runtime • Google Cloud Functions: Python 3.7, JavaScript (Node 6 & 8), Go • Azure Functions: Python 3.6, Java 8, C#, F# (.NET Core 2), JavaScript (Node 8 & 10)
  7. Limitation 1: no GPU • AWS Lambda: no GPU, 128 to 3008 MB memory • Google Cloud Functions: no GPU, 128 to 2048 MB memory • Azure Functions: no GPU, 128 to 1536 MB memory • What can be done: nothing
  8. Limitation 2: deployable artifact size • AWS Lambda: 256 MB uncompressed • Google Cloud Functions: 500 MB uncompressed • Azure Functions: no limit
  9. Limitation 2: deployable artifact size. What can be done: • package only what you need, in minimal size • use requirements.txt to keep dependencies out of your artifact https://github.com/Accenture/serverless-ephemeral/blob/master/docs/build-tensorflow-package.md https://github.com/antonpaquin/Tensorflow-Lambda-Layer
  10. Limitation 3: cold start. „Start on demand, fulfil the task, get terminated.“ Init model: 20s. Evaluate image: 4s. (Retrained MobileNet, 4M values, on AWS Lambda with 1GB memory; predicted categories: Daisy, Rose, Dandelion, Tulip, Sunflower)
  11. Limitation 3: cold start https://mikhail.io/2018/08/serverless-cold-start-war/ What can be done: • use hacks to keep your functions warm (which won't work in every case) • declare expensive resources as global variables so that they are cached within a function instance • be aware of autoscaling • might not be suited for customer-facing services
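The "global variable" trick can be sketched as follows. This is a minimal illustration, not code from the talk: the handler follows the common AWS Lambda Python `(event, context)` convention, and the `time.sleep` stands in for an expensive model load such as reading MobileNet weights:

```python
import time

_model = None  # module-level: survives across warm invocations of one instance

def _load_model():
    # stands in for an expensive framework call, e.g. loading retrained
    # MobileNet weights; this cost is paid only once per function instance
    time.sleep(0.1)
    return lambda image: "Daisy"  # dummy "model" that always predicts one class

def handler(event, context=None):
    global _model
    if _model is None:           # only true on a cold start
        _model = _load_model()
    return {"prediction": _model(event["image"])}
```

The first invocation on a fresh instance pays the load cost; every warm invocation reuses the cached `_model`. Autoscaling still means every *new* instance pays the cold-start price again.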
  12. Performance (warm / cold invocation times) • Structured data, Scikit-learn, 1K values: Google, Python, 256MB: 0.28s / 0.38s; Microsoft, Python, 256MB: 0.25s / 0.7s • Image classification, Tensorflow, 4M values: AWS, Python, 1GB: 4s / 22s; Google, JavaScript, 1GB: 4.3s / 17s • Sentiment analysis, SpaCy, 10M values: Google, Python, 1GB: 0.15s / 15s
  13. Limitation 4: hard to test offline • the API which the cloud uses to call your function, and the APIs of the cloud services your function calls, are available in the cloud only • deployment takes 3-5 minutes
  14. Limitation 4: hard to test offline. What can be done: try offline emulators. For AWS: https://www.npmjs.com/package/serverless-offline or https://github.com/localstack/localstack For Google: https://cloud.google.com/functions/docs/emulator For Azure: https://docs.microsoft.com/de-de/azure/azure-functions/functions-develop-local
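Short of a full emulator, the business logic can at least be exercised locally by calling the handler with a hand-built event. The handler below is a toy stand-in (not code from the talk), but the pattern applies to any `(event, context)`-style function:

```python
def handler(event, context=None):
    """Toy sentiment handler with the usual (event, context) signature."""
    text = event["text"]
    sentiment = "positive" if "good" in text.lower() else "negative"
    return {"sentiment": sentiment}

# a plain local call: no deployment, no cloud round-trip
print(handler({"text": "A good movie"}))  # {'sentiment': 'positive'}
```

This does not cover the cloud-side trigger wiring or service APIs, which is exactly where the emulators above come in.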
  15. Limitation 6: cloud APIs are proprietary and not standardised. Plan for some extra effort if you decide to move. The Serverless Framework does not really help here.
  16. Detour: Serverless Framework • https://github.com/serverless/serverless • I (naively) expected a unified function API • write once, run everywhere • this does not exist (yet) :-[ • deployment descriptors for functions • serverless.yml • share common properties for all providers • we used it primarily for AWS
  17. Limitation 7: deployment debugging https://stackoverflow.com/questions/55449313/ When deployments fail, you often get cryptic messages • use community support • if nothing else helps, try to isolate the problem step by step • feels like the 90s :-[ • often, a low-level dependency like CUDA is the problem
  18. Takeaways • It works • Fits best for fast, lightweight models • Fits best for light-load scenarios • Benefits if you use other cloud services as well • Benefits if your ML model is not suitable for your other environment
  19. Best practices • Declare your model as a global variable • Try offline emulators • You can convert your model to run from other languages
  20. What does the architect say? • ML-related code often tends to be poorly structured • it often originates from experiments • it is important not to miss the turning point • serverless can force you to clean up!
  21. Thank you! Questions? Michael Krämer, [email protected], @mkraemerx, www.innoq.com • innoQ Deutschland GmbH: Krischerstr. 100, 40789 Monheim am Rhein, Germany, +49 2173 3366-0 • Ohlauer Str. 43, 10999 Berlin, Germany • Ludwigstr. 180E, 63067 Offenbach, Germany • Kreuzstr. 16, 80331 München, Germany • c/o WeWork, Hermannstrasse 13, 20095 Hamburg, Germany • innoQ Schweiz GmbH: Gewerbestr. 11, CH-6330 Cham, Switzerland, +41 41 743 01 11 • Albulastr. 55, 8048 Zürich, Switzerland