$whoami
● High School Student
● TEDx and 2x TED-Ed Speaker
● Postman Student Leader
● I ❤ ML Research
● My coordinates: www.rishit.tech
rishit_dagli · Rishit-dagli
Ideal Audience
● Devs who have worked on deep learning models (Keras)
● Devs looking for ways to deploy their models in a production-ready manner
Simple Deployments: Why are they inefficient?
● No consistent API
● No model versioning
● No mini-batching
● Inefficient for large models
Source: Hannes Hapke
Inference with REST
● JSON response
● Can specify a particular version
Default URL (8501 is the REST port, test is the model name):
http://{HOST}:8501/v1/models/test:predict
With a specific model version:
http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
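A minimal client sketch for the REST endpoint above, assuming a model named test served on localhost:8501 whose signature accepts a batch of flat numeric vectors (the instances payload shape depends on your model):

```python
import requests

# Hypothetical setup: model "test" on the standard TF Serving REST port
# 8501; the input shape must match your model's serving signature.
response = requests.post(
    "http://localhost:8501/v1/models/test:predict",
    json={"instances": [[1.0, 2.0, 3.0]]},
)

# TF Serving returns a JSON body with a "predictions" key.
print(response.json()["predictions"])
```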
Inference with gRPC
● Better connections
● Data converted to protocol buffers
● Requests have a designated type
● Payloads converted to Base64
● Use gRPC stubs
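A sketch of the same prediction over gRPC, using the real grpcio and tensorflow-serving-api packages; the port (8500), model name (test), and input tensor name (input) are assumptions that depend on your deployment:

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Hypothetical setup: gRPC port 8500, model "test", input tensor "input".
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# PredictRequest is the designated request type for the Predict RPC.
request = predict_pb2.PredictRequest()
request.model_spec.name = "test"
request.model_spec.signature_name = "serving_default"
request.inputs["input"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32)
)

# Call through the stub with a 10-second deadline.
response = stub.Predict(request, 10.0)
print(response.outputs)
```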
Model Meta Information
● You have an API to get meta info
● Useful for model tracking in telemetry systems
● Provides model inputs/outputs and signatures
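TF Serving's REST API exposes this through a metadata endpoint; a small sketch, again assuming a model named test on port 8501:

```python
import requests

# The /metadata endpoint returns the model spec plus its signature
# definitions (input/output names, dtypes, and shapes).
response = requests.get("http://localhost:8501/v1/models/test/metadata")
metadata = response.json()
print(metadata["metadata"]["signature_def"])
```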
Batch Inference
● Use hardware efficiently
● Save costs and compute resources
● Take multiple requests and process them together
● Super cool 😎 for large models
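Server-side batching is switched on with tensorflow_model_server's --enable_batching and --batching_parameters_file flags; a minimal sketch of writing such a config from Python (the flag and parameter names are real, but every value below is illustrative and should be tuned for your model and hardware):

```python
# Batching parameters use protobuf text format; values are examples only.
batching_config = """\
max_batch_size { value: 32 }          # largest batch the server will form
batch_timeout_micros { value: 5000 }  # wait up to 5 ms to fill a batch
num_batch_threads { value: 4 }        # parallelism for processing batches
max_enqueued_batches { value: 100 }   # queue depth before rejecting requests
"""

with open("batching.config", "w") as f:
    f.write(batching_config)

# Then start the server with batching enabled, e.g.:
#   tensorflow_model_server --rest_api_port=8501 \
#     --model_name=test --model_base_path=/models/test \
#     --enable_batching=true \
#     --batching_parameters_file=batching.config
```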
Also take a look at...
● Kubeflow deployments
● Data pre-processing on server 🚅
● AI Platform Predictions
● Deployment on edge devices
● Federated learning