Deploying an ML Model as an API | Postman Student Summit

My talk at Postman Student Summit

Rishit Dagli

July 30, 2021

Transcript

  1. Deploying an ML
    Model as an API
    Rishit Dagli
    High School
    TEDx, TED-Ed Speaker
    rishit_dagli
    Rishit-dagli


  2. “Most models don’t get
    deployed.”

  3. 90% of models don’t get deployed.

  4. Source: Laurence Moroney

  6. ● High School Student
    ● TEDx and 2x TED-Ed Speaker
    ● Postman Student Leader
    ● I ❤ ML Research
    ● My coordinates - www.rishit.tech
    $whoami

  7. ● Devs who have worked on Deep Learning
    models (Keras)
    ● Devs looking for a production-ready way
    to deploy their models
    Ideal Audience

  8. Why care about
    ML deployments?
    Source: memegenerator.net



  16. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    Latency
    What things to take care of?


  18. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    ● API
    ● Model Versioning
    What things to take care of?

  19. Simple
    Deployments
    Why are they inefficient?



  21. Simple Deployments
    Why are they inefficient?
    ● No consistent API
    ● No model versioning
    ● No mini-batching
    ● Inefficient for
    large models
    Source: Hannes Hapke

  22. TensorFlow Serving


  23. TensorFlow Serving
    TensorFlow Extended components:
    ● TensorFlow Data Validation
    ● TensorFlow Transform
    ● TensorFlow Model Analysis
    ● TensorFlow Serving


  26. ● Part of TensorFlow Extended
    ● Used internally at Google
    ● Makes deployment a lot easier
    TensorFlow Serving

  27. The Process


  28. ● The SavedModel
    format
    ● Graph definitions as
    protocol buffers
    Export Model
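The export step can be sketched as below, assuming a trained Keras model (the tiny model here is only a stand-in):

```python
import tensorflow as tf

# Stand-in for a trained Keras model (assumption: yours is already trained).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# TensorFlow Serving expects a numeric version subdirectory:
# <model_name>/<version>/saved_model.pb
export_path = "my_model/1"

if hasattr(model, "export"):       # newer Keras versions
    model.export(export_path)
else:                              # older TF 2.x
    tf.saved_model.save(model, export_path)
```

Either path produces a SavedModel directory: the graph definition as a protocol buffer (`saved_model.pb`), a `variables/` folder, and optionally auxiliary assets.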


  32. SavedModel Directory
    ● Graph definitions
    ● Variables
    ● Auxiliary files, e.g. vocabularies

  33. TensorFlow Serving
    Also supports gRPC
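One common way to run TensorFlow Serving is the official Docker image; a sketch, assuming the SavedModel exported above lives in ./my_model and the model is named my_model:

```shell
# Launch TensorFlow Serving from the official Docker image.
# Paths and the model name are illustrative assumptions.
docker run -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source="$(pwd)/my_model",target=/models/my_model \
  -e MODEL_NAME=my_model \
  tensorflow/serving
# Port 8500 serves gRPC, port 8501 serves REST -- the two ports
# used for inference on the following slides.
```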

  40. Inference


  41. ● Consistent APIs
    ● Serves gRPC (port 8500) and REST
    (port 8501) simultaneously
    ● Inputs are not bare lists but lists
    of lists (a batch of examples)
    Inference


  44. ● JSON response
    ● Can specify a particular version
    Inference with REST
    Default URL:
    http://{HOST}:8501/v1/models/test:predict
    Model version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
    (8501 is the REST port; "test" is the model name)

  45. Inference with REST
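A REST predict call can be sketched as below; the host, model name, and input values are illustrative assumptions, and the inner list-of-lists shape matches the "lists of lists" point above:

```python
import json

HOST = "localhost"      # assumption: Serving runs locally
MODEL_NAME = "test"     # model name from the URL on the slide

# REST predict endpoint on port 8501
url = f"http://{HOST}:8501/v1/models/{MODEL_NAME}:predict"

# Inputs are a list of lists: one inner list per example in the batch.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0, 4.0]]})

# To actually send it (requires a running server):
#   import requests
#   response = requests.post(url, data=payload)
#   predictions = response.json()["predictions"]
```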

  46. ● Better (persistent) connections
    ● Data converted to protocol buffers
    ● Requests have designated types
    ● Payloads converted to base64
    ● Use gRPC stubs
    Inference with gRPC
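The stub-based flow can be sketched as below, assuming the `grpc` and `tensorflow-serving-api` packages, a server on port 8500, and a model input named `input_1` (the input name is an assumption; check your model's actual signature via the metadata endpoint):

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Open a channel to the gRPC port and create a stub.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Requests have a designated protobuf type: PredictRequest.
request = predict_pb2.PredictRequest()
request.model_spec.name = "test"                 # model name (assumption)
request.model_spec.signature_name = "serving_default"

# Data is converted to a protocol buffer (TensorProto), not JSON.
request.inputs["input_1"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0, 4.0]], dtype=tf.float32))

# Requires a running server; second argument is the timeout in seconds.
result = stub.Predict(request, 10.0)
```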

  47. Model Meta
    Information


  48. ● You have an API to get meta info
    ● Useful for model tracking in
    telemetry systems
    ● Provides model inputs/outputs and
    signatures
    Model Meta Information

  49. Model Meta Information
    http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
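Fetching the metadata can be sketched as below (host, model name, and version are illustrative assumptions):

```python
HOST = "localhost"      # assumption: Serving runs locally
MODEL_NAME = "test"
MODEL_VERSION = 1

# Metadata endpoint: returns the model's signatures and input/output specs.
metadata_url = (
    f"http://{HOST}:8501/v1/models/{MODEL_NAME}"
    f"/versions/{MODEL_VERSION}/metadata"
)

# To fetch it (requires a running server):
#   import requests
#   metadata = requests.get(metadata_url).json()
```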

  50. Batch
    Inferences


  51. ● Use hardware efficiently
    ● Save costs and compute resources
    ● Take multiple requests and process
    them together
    ● Super cool😎 for large models
    Batch inferences

  52. ● max_batch_size
    ● batch_timeout_micros
    ● num_batch_threads
    ● max_enqueued_batches
    ● file_system_poll_wait_seconds
    ● tensorflow_session_parallelism
    ● tensorflow_intra_op_parallelism
    Batch Inference
    Highly customizable

  53. ● Load a configuration
    file on startup
    ● Change parameters
    according to your
    use case
    Batch Inference
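The batching parameters above live in a small text-format configuration file loaded on startup; a minimal sketch (the values are illustrative assumptions to tune per use case):

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```

The file is passed to the server with `--enable_batching=true --batching_parameters_file=<path>`.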

  54. Also take a
    look at...


  55. ● Kubeflow deployments
    ● Data pre-processing on server🚅
    ● AI Platform Predictions
    ● Deployment on edge devices
    ● Federated learning
    Also take a look at...

  56. Slides
    bit.ly/postman-summit-deck

  57. Demos!
    bit.ly/postman-summit-demo

  58. Thank You