Rishit Dagli
May 11, 2021
Making Deployments Easy with TF Serving | TF Everywhere India
My talk at TensorFlow Everywhere India
Transcript
Making Deployments Easy with TF Serving
Rishit Dagli, High School TEDx and TED-Ed Speaker
rishit_dagli | Rishit-dagli
“Most models don’t get deployed.” 90% of models don’t get deployed. (Source: Laurence Moroney)
$whoami
• High School Student
• TEDx and TED-Ed Speaker
• ♡ Hackathons and competitions
• ♡ Research
• My coordinates: www.rishit.tech
Ideal Audience
• Devs who have worked on deep learning models (Keras)
• Devs looking for ways to put their models into production
Why care about ML deployments? Source: memegenerator.net
What things to take care of?
• Package the model
• Post the model on a server
• Maintain the server
• Auto-scale
• Global availability
• Latency
• API
• Model versioning
Simple Deployments: why are they inefficient?
• No consistent API
• No model versioning
• No mini-batching
• Inefficient for large models
(Source: Hannes Hapke)
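The mini-batching gap is easy to illustrate: instead of running the model once per request, a server can group pending requests and run them in a single pass, amortizing per-call overhead. A minimal sketch with a toy "model" (all names here are illustrative, not TF Serving's API):

```python
# Toy "model": weighted sum of the input features.
# A real serving system saves GPU kernel launches and graph-execution
# setup by batching; here we just show that the batched path returns
# the same answers as one model call per request.

WEIGHTS = [0.5, -1.0, 2.0]

def predict_one(features):
    """Run the toy model on a single instance."""
    return sum(w * x for w, x in zip(WEIGHTS, features))

def predict_batch(batch):
    """Run the toy model on many instances in one call."""
    return [sum(w * x for w, x in zip(WEIGHTS, f)) for f in batch]

pending_requests = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
single = [predict_one(r) for r in pending_requests]
batched = predict_batch(pending_requests)
assert single == batched  # same answers, one model invocation
```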
TensorFlow Serving
• Part of TensorFlow Extended (TFX), alongside TensorFlow Data Validation, TensorFlow Transform, and TensorFlow Model Analysis
• Used internally at Google
• Makes deployment a lot easier
The Process
Export Model
• The SavedModel format
• Graph definitions as protocol buffers

SavedModel Directory
• Graph definitions
• Variables
• Auxiliary files, e.g. vocabularies
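Assuming a Keras model exported with `model.save("serving_model/1")` (the numeric version subdirectory is what TF Serving scans for), the directory layout maps onto the three parts above:

```
serving_model/
└── 1/                    <- model version directory
    ├── saved_model.pb    <- graph definitions (protocol buffer)
    ├── variables/        <- trained weights
    │   ├── variables.data-00000-of-00001
    │   └── variables.index
    └── assets/           <- auxiliary files, e.g. vocabularies
```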
TensorFlow Serving: also supports gRPC
Inference
• Consistent APIs
• Supports gRPC (port 8500) and REST (port 8501) simultaneously
• No lists but lists of lists
• JSON response
• Can specify a particular version

Inference with REST
• Default URL: http://{HOST}:8501/v1/models/test
• Model version: http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
(8501 is the port; "test" is the model name)
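The "lists of lists" rule means even a single instance is wrapped in an outer list. A sketch of building the REST request (the model name `test` and the input values are placeholders; actually sending it assumes a server listening on 8501, e.g. one started from the `tensorflow/serving` Docker image):

```python
import json

HOST = "localhost"
MODEL_NAME = "test"
MODEL_VERSION = 1

# Default URL (latest version) and version-pinned URL, per the slides.
default_url = f"http://{HOST}:8501/v1/models/{MODEL_NAME}:predict"
versioned_url = (
    f"http://{HOST}:8501/v1/models/{MODEL_NAME}"
    f"/versions/{MODEL_VERSION}:predict"
)

# TF Serving expects "instances": a list of instances, where each
# instance is itself a list -- hence "no lists but lists of lists".
payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]})

# To actually send it (requires a running server):
#   response = requests.post(default_url, data=payload)
#   predictions = response.json()["predictions"]
```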
Inference with gRPC
• Better connections
• Data converted to protocol buffers
• Request types have designated types
• Payload converted to base64
• Use gRPC stubs
Model Meta Information
• You have an API to get meta info
• Useful for model tracking in telemetry systems
• Provides model inputs/outputs and signatures
• URL: http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
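A sketch of building the metadata URL and reading signatures out of the response. The JSON below is a trimmed, hand-written example of the shape this endpoint returns (tensor names and the model name are made up for illustration):

```python
import json

HOST, MODEL_NAME, MODEL_VERSION = "localhost", "test", 1

metadata_url = (
    f"http://{HOST}:8501/v1/models/{MODEL_NAME}"
    f"/versions/{MODEL_VERSION}/metadata"
)

# Hypothetical, trimmed-down metadata response for illustration.
example_response = json.loads("""
{
  "model_spec": {"name": "test", "version": "1"},
  "metadata": {
    "signature_def": {
      "signature_def": {
        "serving_default": {
          "inputs":  {"x": {"name": "serving_default_x:0"}},
          "outputs": {"y": {"name": "StatefulPartitionedCall:0"}}
        }
      }
    }
  }
}
""")

signatures = example_response["metadata"]["signature_def"]["signature_def"]
print(list(signatures))                               # ['serving_default']
print(list(signatures["serving_default"]["inputs"]))  # ['x']
```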
Batch Inference
• Use hardware efficiently
• Save costs and compute resources
• Take multiple requests and process them together
• Super cool 😎 for large models

Highly customizable:
• max_batch_size
• batch_timeout_micros
• num_batch_threads
• max_enqueued_batches
• file_system_poll_wait_seconds
• tensorflow_session_parallelism
• tensorflow_intra_op_parallelism

• Load the configuration file on startup
• Change parameters according to use cases
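The first four parameters above live in a batching parameters file handed to the server at startup (roughly `--enable_batching --batching_parameters_file=batching.config`). A sketch in protobuf text format, with illustrative values:

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```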
Also take a look at...
• Kubeflow deployments
• Data pre-processing on the server 🚅
• AI Platform Predictions
• Deployment on edge devices
• Federated learning
Demos! bit.ly/tf-everywhere-ind
Slides: bit.ly/serving-deck
Thank You rishit_dagli Rishit-dagli