Deploying Models to Production with TF Serving

I plan to make this an intermediate-level talk: I expect the audience to know how to build their own models with TensorFlow or Keras, and I will take it forward from there to show how they can serve those models over HTTP and HTTPS. I will walk through the main steps of putting a model into production: packaging it and making it ready for deployment, uploading it to the cloud, building an API, and, most importantly, updating the model with no downtime while handling version numbering efficiently. These are the steps required to deploy a model in the wild, and I will cover how TensorFlow simplifies each of them for a developer. I will also show how applications can access the model through web or cloud calls. If time permits, I will show how to make this deployment auto-scale using GCP Cloud Functions and/or Kubernetes.

Rishit Dagli

October 18, 2020

Transcript

  1. Deploying models
    to production
    with TF Serving
    Rishit Dagli
    High School
    TEDx, TED-Ed Speaker
    rishit_dagli
    Rishit-dagli

  2. “Most models don’t get
    deployed.”

  3. 90% of models don’t get deployed.

  4. Source: Laurence Moroney

  5. Source: Laurence Moroney

  6. ● 11th Grade Student
    ● TEDx and TED-Ed Speaker
    ● ♡ Hackathons and competitions
    ● ♡ Research
    ● My coordinates - www.rishit.tech
    $whoami
    rishit_dagli Rishit-dagli

  7. ● Devs who have worked on Deep Learning
    Models (Keras)
    ● Devs looking for ways to put their
    models into production
    Ideal Audience

  8. Why care about
    ML deployments?
    Source: memegenerator.net

  9. (image-only slide)

  10. ● Package the model
    What things to take care of?

  11. ● Package the model
    ● Post the model on a server
    What things to take care of?

  12. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    What things to take care of?

  13. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    What things to take care of?

  14. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    What things to take care of?

  15. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    What things to take care of?

  16. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    Latency
    What things to take care of?

  17. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    ● API
    What things to take care of?

  18. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    ● API
    ● Model Versioning
    What things to take care of?

  19. Simple
    Deployments
    Why are they inefficient?

  20. (image-only slide)

  21. Simple Deployments
    Why are they inefficient?
    ● No consistent API
    ● No model versioning
    ● No mini-batching
    ● Inefficient for
    large models
    Source: Hannes Hapke

  22. TensorFlow Serving

  23. TensorFlow Serving
    TensorFlow Data Validation
    TensorFlow Transform
    TensorFlow Model
    Analysis
    TensorFlow Serving
    TensorFlow Extended

  24. ● Part of TensorFlow Extended
    TensorFlow Serving

  25. ● Part of TensorFlow Extended
    ● Used Internally at Google
    TensorFlow Serving

  26. ● Part of TensorFlow Extended
    ● Used Internally at Google
    ● Makes deployment a lot easier
    TensorFlow Serving

  27. The Process

  28. ● The SavedModel
    format
    ● Graph definitions as
    protocol buffers
    Export Model
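
    For example, a minimal export sketch (the tiny stand-in model and the
    /tmp/models/test path are illustrative assumptions; the numbered
    subdirectory is the version TF Serving watches for):

        import tensorflow as tf

        # Assume a trained tf.keras model; a tiny stand-in is built here.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
        ])

        # TF Serving expects <base_path>/<model_name>/<version>/ on disk.
        # Exporting a new version number, e.g. .../test/2, lets the server
        # swap models with no downtime.
        tf.saved_model.save(model, "/tmp/models/test/1")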

  29. SavedModel
    Directory

  30. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory

  31. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory
    Variables

  32. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory
    Variables
    Graph definitions
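
    For reference, the exported directory then looks like this (these are
    the standard file names TensorFlow writes):

        test/1/
        ├── saved_model.pb      # graph definitions as a protocol buffer
        ├── variables/          # learned weights
        │   ├── variables.data-00000-of-00001
        │   └── variables.index
        └── assets/             # auxiliary files, e.g. vocabularies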

  33. TensorFlow Serving

  34. TensorFlow Serving

  35. TensorFlow Serving
    Also supports gRPC

  36. TensorFlow Serving

  37. TensorFlow Serving

  38. TensorFlow Serving

  39. TensorFlow Serving

  40. Inference

  41. ● Consistent APIs
    ● Supports gRPC (port 8500) and
    REST (port 8501) simultaneously
    ● No plain lists, but lists of lists
    Inference

    ● No plain lists, but lists
    of lists (see the sketch below)
    Inference
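
    Concretely: even a single example must be wrapped in an outer batch
    list. A sketch, assuming a flat 784-float input:

        # "instances" holds a list of examples; each example is itself a list.
        payload = {
            "instances": [
                [0.0] * 784  # a batch containing one example
            ]
        }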

  43. ● JSON response
    ● Can specify a
    particular version
    Inference with
    REST
    Default URL:
    http://{HOST}:8501/v1/models/test
    Model version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict

  44. ● JSON response
    ● Can specify a
    particular version
    Inference with
    REST
    Default URL:
    http://{HOST}:8501/v1/models/test
    (8501 is the port, test is the model name)
    Model version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict

  45. Inference with REST
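
    The code on this slide did not survive extraction; here is a minimal
    REST predict sketch (the model name test, the input shape, and a
    server already running on port 8501, e.g. via the tensorflow/serving
    Docker image, are all assumptions):

        import json
        import requests

        # Inputs are lists of lists, never bare lists.
        payload = {"instances": [[0.0] * 784]}

        url = "http://localhost:8501/v1/models/test:predict"
        response = requests.post(url, data=json.dumps(payload))
        print(response.json()["predictions"])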

  46. ● Better, persistent connections
    ● Data converted to protocol buffers
    ● Request types have designated types
    ● Payload converted to base64
    ● Use gRPC stubs
    Inference with gRPC
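
    A sketch of the same call over gRPC (assumes the tensorflow-serving-api
    package; the model name, signature name, and input key are assumptions
    that depend on your exported model):

        import grpc
        import numpy as np
        import tensorflow as tf
        from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

        # Stub over a channel to the gRPC port (8500).
        channel = grpc.insecure_channel("localhost:8500")
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        # Build a typed PredictRequest; the payload travels as a TensorProto.
        request = predict_pb2.PredictRequest()
        request.model_spec.name = "test"
        request.model_spec.signature_name = "serving_default"
        data = np.zeros((1, 784), dtype=np.float32)
        request.inputs["input_1"].CopyFrom(tf.make_tensor_proto(data))

        result = stub.Predict(request, 10.0)  # 10-second timeout
        print(result.outputs)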

  47. Model Meta
    Information

  48. ● You have an API to get meta info
    ● Useful for model tracking in
    telemetry systems
    ● Provides model inputs/outputs and
    signatures
    Model Meta Information

  49. Model Meta Information
    http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
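
    A quick sketch of querying it (host, model name, and version are
    placeholders):

        import requests

        # Returns the model spec and signature defs (inputs/outputs) as JSON.
        url = "http://localhost:8501/v1/models/test/versions/1/metadata"
        print(requests.get(url).json())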

  50. Batch
    Inferences

  51. ● Use hardware efficiently
    ● Save costs and compute resources
    ● Takes multiple requests and processes them
    together
    ● Super cool for large models
    Batch inferences

  52. ● max_batch_size
    ● batch_timeout_micros
    ● num_batch_threads
    ● max_enqueued_batches
    ● file_system_poll_wait_seconds
    ● tensorflow_session_parallelism
    ● tensorflow_intra_op_parallelism
    Batch Inference
    Highly customizable

  53. ● Load configuration
    file on startup
    ● Change parameters
    according to use
    cases
    Batch Inference
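
    A sketch of such a configuration file, in protocol buffer text format,
    passed to the model server with --enable_batching and
    --batching_parameters_file (the values are illustrative assumptions):

        max_batch_size { value: 32 }
        batch_timeout_micros { value: 5000 }
        num_batch_threads { value: 4 }
        max_enqueued_batches { value: 100 }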

  54. Also take a
    look at...

  55. ● Kubeflow deployments
    ● Data pre-processing on server
    ● AI Platform Predictions
    ● Deployment on edge devices
    ● Federated learning
    Also take a look at...

  56. ● Valid only for today
    ● go.qwiklabs.com/cloud-study-jams-2020
    ● Select ML Infrastructure Study Jam
    ● Enter code 1s-Nairobi-8989
    ● Complete 1 lab to get 1 month free access
    ● Complete the quest to get 2 months free access!
    ● 1 month free Coursera access
    Qwiklabs rishit.tech/qwiklabs-offer

  57. df-kenya.rishit.tech
    Demos!

  58. Q & A
    rishit_dagli Rishit-dagli

  59. Thank You
    rishit_dagli Rishit-dagli
