Slide 1

Slide 1 text

Deploying models to production with TF Serving Rishit Dagli High School TEDx, TED-Ed Speaker rishit_dagli Rishit-dagli

Slide 2

Slide 2 text

“Most models don’t get deployed.”

Slide 3

Slide 3 text

90% of models don’t get deployed.

Slide 4

Slide 4 text

Source: Laurence Moroney

Slide 5

Slide 5 text

Source: Laurence Moroney

Slide 6

Slide 6 text

● 11th Grade Student ● TEDx and TED-Ed Speaker ● ♡ Hackathons and competitions ● ♡ Research ● My coordinates - www.rishit.tech $whoami rishit_dagli Rishit-dagli

Slide 7

Slide 7 text

● Devs who have worked on Deep Learning models (Keras) ● Devs looking for ways to put their models into production Ideal Audience

Slide 8

Slide 8 text

Why care about ML deployments? Source: memegenerator.net

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

● Package the model What things to take care of?

Slide 11

Slide 11 text

● Package the model ● Post the model on Server What things to take care of?

Slide 12

Slide 12 text

● Package the model ● Post the model on Server ● Maintain the server What things to take care of?

Slide 13

Slide 13 text

● Package the model ● Post the model on Server ● Maintain the server ○ Auto-scale What things to take care of?

Slide 15

Slide 15 text

● Package the model ● Post the model on Server ● Maintain the server ○ Auto-scale ○ Global availability What things to take care of?

Slide 16

Slide 16 text

● Package the model ● Post the model on Server ● Maintain the server ○ Auto-scale ○ Global availability ○ Latency What things to take care of?

Slide 17

Slide 17 text

● Package the model ● Post the model on Server ● Maintain the server ● API What things to take care of?

Slide 18

Slide 18 text

● Package the model ● Post the model on Server ● Maintain the server ● API ● Model Versioning What things to take care of?

Slide 19

Slide 19 text

Simple Deployments Why are they inefficient?

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Simple Deployments Why are they inefficient? ● No consistent API ● No model versioning ● No mini-batching ● Inefficient for large models Source: Hannes Hapke

Slide 22

Slide 22 text

TensorFlow Serving

Slide 23

Slide 23 text

TensorFlow Serving TensorFlow Data validation TensorFlow Transform TensorFlow Model Analysis TensorFlow Serving TensorFlow Extended

Slide 24

Slide 24 text

● Part of TensorFlow Extended TensorFlow Serving

Slide 25

Slide 25 text

● Part of TensorFlow Extended ● Used Internally at Google TensorFlow Serving

Slide 26

Slide 26 text

● Part of TensorFlow Extended ● Used Internally at Google ● Makes deployment a lot easier TensorFlow Serving

Slide 27

Slide 27 text

The Process

Slide 28

Slide 28 text

● The SavedModel format ● Graph definition as a protocol buffer Export Model
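A minimal sketch of the export step. The numeric version subdirectory is the layout TF Serving expects under a model's base path; the model name and paths here are illustrative assumptions, and the actual TensorFlow export call is left in comments so the snippet stays dependency-free.

```python
# Sketch: TF Serving loads models from a base directory containing
# one numeric subdirectory per version, e.g. /models/test/1/.
def versioned_export_path(base_dir: str, version: int) -> str:
    """Return the per-version path TF Serving watches under base_dir."""
    return f"{base_dir}/{version}"

# With TensorFlow installed, exporting a Keras model to that path is:
#   import tensorflow as tf
#   tf.saved_model.save(model, versioned_export_path("/models/test", 1))
print(versioned_export_path("/models/test", 1))  # /models/test/1
```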

Slide 29

Slide 29 text

SavedModel Directory

Slide 30

Slide 30 text

auxiliary files e.g. vocabularies SavedModel Directory

Slide 31

Slide 31 text

auxiliary files e.g. vocabularies SavedModel Directory Variables

Slide 32

Slide 32 text

auxiliary files e.g. vocabularies SavedModel Directory Variables Graph definitions

Slide 33

Slide 33 text

TensorFlow Serving

Slide 34

Slide 34 text

TensorFlow Serving

Slide 35

Slide 35 text

TensorFlow Serving Also supports gRPC
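The server itself is usually started from the official `tensorflow/serving` Docker image. This is an illustrative sketch of assembling that command from Python; the model name and host path are assumptions, and the actual launch is left commented out.

```python
# Illustrative sketch: the usual Docker invocation for TF Serving,
# exposing gRPC on 8500 and REST on 8501.
def serving_command(model_name: str, model_dir: str) -> list:
    """Build the docker command list for serving one model."""
    return [
        "docker", "run",
        "-p", "8500:8500",  # gRPC
        "-p", "8501:8501",  # REST
        "--mount", f"type=bind,source={model_dir},target=/models/{model_name}",
        "-e", f"MODEL_NAME={model_name}",
        "-t", "tensorflow/serving",
    ]

cmd = serving_command("test", "/tmp/saved_models/test")
# import subprocess; subprocess.run(cmd)  # uncomment to actually start the server
print(" ".join(cmd))
```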

Slide 36

Slide 36 text

TensorFlow Serving

Slide 37

Slide 37 text

TensorFlow Serving

Slide 38

Slide 38 text

TensorFlow Serving

Slide 39

Slide 39 text

TensorFlow Serving

Slide 40

Slide 40 text

Inference

Slide 41

Slide 41 text

● Consistent APIs ● Supports gRPC (port 8500) and REST (port 8501) simultaneously ● No lists but lists of lists Inference

Slide 42

Slide 42 text

● No lists but lists of lists Inference

Slide 43

Slide 43 text

● JSON response ● Can specify a particular version Inference with REST Default URL http://{HOST}:8501/v1/models/test Model Version http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict

Slide 44

Slide 44 text

● JSON response ● Can specify a particular version Inference with REST Default URL http://{HOST}:8501/v1/models/test Model Version http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict (8501: port, test: model name)

Slide 45

Slide 45 text

Inference with REST
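A hedged sketch of a REST predict request. It illustrates the "no lists but lists of lists" point: `instances` is a list of examples, and each example is itself a list. Host, port, and model name are illustrative; the actual HTTP call is left commented so the snippet runs without a server.

```python
# Build the predict URL and JSON body for TF Serving's REST API.
import json

host, model_name = "localhost", "test"
url = f"http://{host}:8501/v1/models/{model_name}:predict"
payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]})  # one 3-feature example

# With a running server, e.g. using the third-party `requests` package:
#   import requests
#   predictions = requests.post(url, data=payload).json()["predictions"]
print(url)
```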

Slide 46

Slide 46 text

● Better connections ● Data converted to protocol buffer ● Request types have designated type ● Payload converted to base64 ● Use gRPC stubs Inference with gRPC

Slide 47

Slide 47 text

Model Meta Information

Slide 48

Slide 48 text

● You have an API to get meta info ● Useful for model tracking in telemetry systems ● Provides model inputs/outputs and signatures Model Meta Information

Slide 49

Slide 49 text

Model Meta Information http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
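A small sketch of filling in that URL; the host, model name, and version values are illustrative.

```python
# Build the metadata URL for one model version.
host, model_name, model_version = "localhost", "test", 1
metadata_url = (
    f"http://{host}:8501/v1/models/{model_name}"
    f"/versions/{model_version}/metadata"
)
# A GET on this URL returns the model's signatures and input/output specs.
print(metadata_url)
```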

Slide 50

Slide 50 text

Batch Inferences

Slide 51

Slide 51 text

● Use hardware efficiently ● Save costs and compute resources ● Take multiple requests and process them together ● Super cool for large models Batch inferences

Slide 52

Slide 52 text

● max_batch_size ● batch_timeout_micros ● num_batch_threads ● max_enqueued_batches ● file_system_poll_wait_seconds ● tensorflow_session_parallelism ● tensorflow_intra_op_parallelism Batch Inference Highly customizable

Slide 53

Slide 53 text

● Load configuration file on startup ● Change parameters according to use cases Batch Inference
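The configuration loaded on startup is a text-format protocol buffer file, typically passed via `--enable_batching --batching_parameters_file=...`. A sketch of such a file, with illustrative values rather than recommendations:

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```

Tuning these is a latency/throughput trade-off: a larger `max_batch_size` uses the hardware better, while a smaller `batch_timeout_micros` keeps individual requests from waiting too long.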

Slide 54

Slide 54 text

Also take a look at...

Slide 55

Slide 55 text

● Kubeflow deployments ● Data pre-processing on server ● AI Platform Predictions ● Deployment on edge devices ● Federated learning Also take a look at...

Slide 56

Slide 56 text

● Valid only for today ● go.qwiklabs.com/cloud-study-jams-2020 ● Select ML Infrastructure Study Jam ● Enter code 1s-Nairobi-8989 ● Complete 1 lab to get 1 month of free access ● Complete the quest to get 2 months of free access! ● 1 month of free Coursera access Qwiklabs rishit.tech/qwiklabs-offer

Slide 57

Slide 57 text

df-kenya.rishit.tech Demos!

Slide 58

Slide 58 text

Q & A rishit_dagli Rishit-dagli

Slide 59

Slide 59 text

Thank You rishit_dagli Rishit-dagli