Deploying an ML Model as an API | Postman Student Summit

My talk at Postman Student Summit

Rishit Dagli

July 30, 2021

Transcript

  1. Deploying an ML Model as an API
    Rishit Dagli, High School TEDx and TED-Ed Speaker
    rishit_dagli | Rishit-dagli
  2. $whoami
    • High School Student
    • TEDx and 2x TED-Ed Speaker
    • Postman Student Leader
    • I ❤ ML Research
    • My coordinates: www.rishit.tech
    rishit_dagli | Rishit-dagli
  3. Ideal Audience
    • Devs who have worked on deep learning models (Keras)
    • Devs looking for ways to put their models into production
  4–11. What things to take care of? (built up point by point)
    • Package the model
    • Post the model on a server
    • Maintain the server: auto-scale, global availability, latency
    • API
    • Model Versioning
  12. Simple Deployments: why are they inefficient?
    • No consistent API
    • No model versioning
    • No mini-batching
    • Inefficient for large models
    (Source: Hannes Hapke)
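
    To make the critique concrete, here is a minimal sketch (not from the deck) of such a simple deployment: a hand-rolled Flask endpoint around a Keras model. Every name and path in it is hypothetical.

        import tensorflow as tf
        from flask import Flask, jsonify, request

        app = Flask(__name__)
        model = tf.keras.models.load_model("model.h5")  # hypothetical path

        @app.route("/predict", methods=["POST"])
        def predict():
            # One request, one forward pass: no mini-batching, no model
            # versioning, and the URL and response shapes are improvised.
            instances = request.get_json()["instances"]
            predictions = model.predict(tf.constant(instances))
            return jsonify({"predictions": predictions.tolist()})

        if __name__ == "__main__":
            app.run(port=5000)
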
  13. TensorFlow Serving
    • Part of TensorFlow Extended (TFX)
    • Used internally at Google
    • Makes deployment a lot easier
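
    TensorFlow Serving loads models exported in the SavedModel format. A minimal sketch, using a hypothetical toy model and path; Serving treats each numbered subdirectory as a model version.

        import tensorflow as tf

        # Toy stand-in for a trained Keras model.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
        ])

        # Serving watches the base directory ("models/test" here) and
        # loads each numbered subdirectory as a version of the model.
        model.save("models/test/1")
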
  14. Inference
    • Consistent APIs
    • Supports gRPC (port 8500) and REST (port 8501) simultaneously
    • No plain lists but lists of lists: every request carries a batch of instances
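
    Concretely, the "lists of lists" point means even a single input is wrapped in an outer batch dimension. A hedged example of a REST request body for a model taking three features:

        {"instances": [[1.0, 2.0, 3.0]]}
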
  15–16. Inference with REST
    • JSON response
    • Can specify a particular version
    • Default URL (8501 is the REST port, test is the model name):
      http://{HOST}:8501/v1/models/test:predict
    • Model version URL:
      http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
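
    A minimal sketch of calling the default REST endpoint with Python's requests library, assuming a model named test served locally:

        import json

        import requests

        # A batch of one instance, matching the list-of-lists shape.
        payload = json.dumps({"instances": [[1.0, 2.0, 3.0]]})

        response = requests.post(
            "http://localhost:8501/v1/models/test:predict",
            data=payload,
        )
        print(response.json()["predictions"])
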
  17. Inference with gRPC
    • Better connections
    • Data converted to protocol buffers
    • Each request type has a designated message type
    • Payload converted to base64
    • Use gRPC stubs
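
    A minimal sketch of the same prediction over gRPC, using the stubs from the tensorflow-serving-api package. The model name test, signature serving_default, and input key inputs are assumptions that depend on the exported model.

        import grpc
        import tensorflow as tf
        from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

        # Open a channel to the gRPC port (8500) and build a stub.
        channel = grpc.insecure_channel("localhost:8500")
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        # Build the PredictRequest protocol buffer.
        grpc_request = predict_pb2.PredictRequest()
        grpc_request.model_spec.name = "test"
        grpc_request.model_spec.signature_name = "serving_default"
        grpc_request.inputs["inputs"].CopyFrom(
            tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32)
        )

        result = stub.Predict(grpc_request, 10.0)  # 10-second deadline
        print(result.outputs)
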
  18. Model Meta Information
    • You have an API to get meta info
    • Useful for model tracking in telemetry systems
    • Provides model inputs/outputs and signatures
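
    Over REST, the metadata sits at a /metadata sibling of the predict URL. A hedged sketch, again assuming the model test on the local REST port:

        import requests

        response = requests.get("http://localhost:8501/v1/models/test/metadata")
        metadata = response.json()
        # The response describes the model spec and its signatures,
        # i.e. the inputs and outputs the model expects and produces.
        print(metadata["metadata"]["signature_def"])
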
  19. Batch Inference
    • Use hardware efficiently
    • Save costs and compute resources
    • Take multiple requests and process them together
    • Super cool 😎 for large models
  20. Batch Inference: highly customizable
    • max_batch_size
    • batch_timeout_micros
    • num_batch_threads
    • max_enqueued_batches
    • file_system_poll_wait_seconds
    • tensorflow_session_parallelism
    • tensorflow_intra_op_parallelism
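
    The first four knobs go in a batching parameters file (protobuf text format) that the model server reads when started with --enable_batching and --batching_parameters_file; the remaining ones are server flags. A sketch with illustrative values:

        max_batch_size { value: 32 }
        batch_timeout_micros { value: 5000 }
        num_batch_threads { value: 4 }
        max_enqueued_batches { value: 100 }
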
  21. Also take a look at...
    • Kubeflow deployments
    • Data pre-processing on server 🚅
    • AI Platform Predictions
    • Deployment on edge devices
    • Federated learning