
Serverless deployment of ML inference models

When you’re looking for use cases for serverless applications, machine learning inference might not immediately (if at all) come to mind. Why? Because trained ML models are basically a big bunch of state (ouch!), and most frameworks come with a large pile of dependencies (double ouch!).

We’ll investigate working examples of deep learning technology for image and text classification using Tensorflow, PyTorch and SpaCy on AWS, Google Cloud and Azure. We will use these examples to discuss handling dependencies, managing global state, and deployment/packaging options in serverless environments. Furthermore, we will try to answer the question of whether these limitations help us improve architectural goals like separation of concerns in code related to machine learning. Finally, I will summarize whether serverless is a good fit for these use cases.

Don't be afraid: machine learning models are only used as the subject of discussion in this talk. No in-depth knowledge of these technologies is required for this session; everything you need to follow the discussion will be covered by the talk itself.

Michael Krämer

April 02, 2019

Transcript

  1. „No coffee, no results!“ Michael Krämer, Software Architect & ML Enthusiast. 15+ years of software development. Works at INNOQ Schweiz in Zürich.
  2. Rumors about Machine Learning • Needs a GPU! • Lots of data • A lot of state • Dependencies • Immature, volatile & low-level dependencies • Needs more GPUs!
  3. Rumors about Cloud Functions • So lightweight • Stateless • For small function scopes only • With minimal instantiation effort • Very limited runtime environment
  4. And what does inference mean? Inference = usage of a trained model: one single data input + the model = a prediction of the output.
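To make the idea concrete: inference is just applying frozen, trained state to a single input. The tiny linear "model" below is a made-up illustration (the weights and labels are not from the talk), kept deliberately framework-free:

```python
# Minimal sketch of inference: trained parameters + one input -> one prediction.
# WEIGHTS and BIAS stand in for a real model's learned state, frozen after training.
WEIGHTS = [0.8, -0.4, 0.1]
BIAS = 0.2

def predict(features):
    """Forward pass of a tiny linear classifier over one data input."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "positive" if score > 0 else "negative"

print(predict([1.0, 0.0, 0.0]))  # applies the trained state to a single input
```

A real model differs only in scale: the "state" grows from three numbers to millions of values, which is exactly what makes it awkward in a stateless function.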
  5. How we checked: 3 typical use cases with 3 well-known frameworks • Sentiment analysis: SpaCy, 10M values (word vectors) • Image classification: Tensorflow, 4M values (retrained MobileNet) • Structured data: Scikit-learn, 1K values (Random Forest)
  6. What you get: language support • AWS Lambda: Python (2.7, 3.6, 3.7), JavaScript (Node 6 & 8), Java 8, Go, PowerShell, C# (.NET Core 1.0, 2.0, 2.1), Custom Runtime • Google Cloud Functions: Python 3.7, JavaScript (Node 6 & 8), Go • Azure Functions: Python 3.6, Java 8, C#, F# (.NET Core 2), JavaScript (Node 8 & 10)
  7. Limitation 1: no GPU • AWS Lambda: no GPU, 128 to 3008 MB memory • Google Cloud Functions: no GPU, 128 to 2048 MB memory • Azure Functions: no GPU, 128 to 1536 MB memory • What can be done: nothing
  8. Limitation 2: deployable artifact size • AWS Lambda: 256 MB uncompressed • Google Cloud Functions: 500 MB uncompressed • Azure Functions: no limit
  9. Limitation 2: deployable artifact size. What can be done: • package only what you need, in minimal size • use requirements.txt to keep dependencies out of your artifact https://github.com/Accenture/serverless-ephemeral/blob/master/docs/build-tensorflow-package.md https://github.com/antonpaquin/Tensorflow-Lambda-Layer
  10. Limitation 3: cold start. „Start on demand, fulfil the task, get terminated.“ Init model: 20s. Evaluate image: 4s. (Retrained MobileNet, 4M values, on AWS Lambda with 1GB memory; predicted categories: Daisy, Rose, Dandelion, Tulip, Sunflower)
  11. Limitation 3: cold start https://mikhail.io/2018/08/serverless-cold-start-war/ What can be done: • use hacks to keep your functions warm (which won't work in every case) • declare expensive resources as global variables so that they are cached within a function instance • be aware of autoscaling • might not be suited for customer-facing services
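The "global variable" trick can be sketched as follows. This is a minimal illustration, not code from the talk: the handler follows the common AWS Lambda Python `(event, context)` convention, and the `time.sleep` stands in for an expensive model load such as reading MobileNet weights:

```python
import time

_model = None  # module-level: survives across warm invocations of one instance

def _load_model():
    # stands in for an expensive framework call, e.g. loading retrained
    # MobileNet weights; this cost is paid only once per function instance
    time.sleep(0.1)
    return lambda image: "Daisy"  # dummy "model" that always predicts one class

def handler(event, context=None):
    global _model
    if _model is None:           # only true on a cold start
        _model = _load_model()
    return {"prediction": _model(event["image"])}
```

The first invocation on a fresh instance pays the load cost; every warm invocation reuses the cached `_model`. Autoscaling still means every *new* instance pays the cold-start price again.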
  12. Performance (warm / cold invocation times) • Structured data, Scikit-learn, 1K values: Google, Python, 256MB: 0.28s / 0.38s; Microsoft, Python, 256MB: 0.25s / 0.7s • Image classification, Tensorflow, 4M values: AWS, Python, 1GB: 4s / 22s; Google, JavaScript, 1GB: 4.3s / 17s • Sentiment analysis, SpaCy, 10M values: Google, Python, 1GB: 0.15s / 15s
  13. Limitation 4: hard to test offline • the API which the cloud uses to call your function, and the APIs of the cloud services your function calls, are available in the cloud only • deployment takes 3-5 minutes
  14. Limitation 4: hard to test offline. What can be done: try offline emulators. For AWS: https://www.npmjs.com/package/serverless-offline or https://github.com/localstack/localstack For Google: https://cloud.google.com/functions/docs/emulator For Azure: https://docs.microsoft.com/de-de/azure/azure-functions/functions-develop-local
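Short of a full emulator, the business logic can at least be exercised locally by calling the handler with a hand-built event. The handler below is a toy stand-in (not code from the talk), but the pattern applies to any `(event, context)`-style function:

```python
def handler(event, context=None):
    """Toy sentiment handler with the usual (event, context) signature."""
    text = event["text"]
    sentiment = "positive" if "good" in text.lower() else "negative"
    return {"sentiment": sentiment}

# a plain local call: no deployment, no cloud round-trip
print(handler({"text": "A good movie"}))  # {'sentiment': 'positive'}
```

This does not cover the cloud-side trigger wiring or service APIs, which is exactly where the emulators above come in.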
  15. Limitation 6: cloud APIs are proprietary and not standardised. Plan for some extra effort if you decide to move. The Serverless Framework does not really help here.
  16. Detour: Serverless Framework • https://github.com/serverless/serverless • I (naively) expected a unified function API • write once, run everywhere • this does not exist (yet) :-[ • deployment descriptors for functions • serverless.yml • share common properties for all providers • we used it primarily for AWS
  17. Limitation 7: deployment debugging https://stackoverflow.com/questions/55449313/ When deployments fail, you often get cryptic messages • use community support • if nothing else helps, try to isolate the problem step by step • feels like the 90s :-[ • often, a low-level dependency like CUDA is the problem
  18. Takeaways • It works • Fits best for fast, lightweight models • Fits best for light-load scenarios • Benefits if you use other cloud services as well • Benefits if your ML model is not suitable for your other environment
  19. Best practices • Declare your model as a global variable • Try offline emulators • You can convert your model to run from other languages
  20. What does the architect say? • ML-related code often tends to be poorly structured • it often originates from experiments • it is important not to miss the turning point • serverless can force you to clean up!
  21. Thank you! Questions? Michael Krämer, [email protected], @mkraemerx, www.innoq.com • innoQ Deutschland GmbH: Krischerstr. 100, 40789 Monheim am Rhein, Germany, +49 2173 3366-0 • Ohlauer Str. 43, 10999 Berlin, Germany • Ludwigstr. 180E, 63067 Offenbach, Germany • Kreuzstr. 16, 80331 München, Germany • c/o WeWork, Hermannstrasse 13, 20095 Hamburg, Germany • innoQ Schweiz GmbH: Gewerbestr. 11, CH-6330 Cham, Switzerland, +41 41 743 01 11 • Albulastr. 55, 8048 Zürich, Switzerland