
Deploying an ML Model as an API | Postman Student Summit

My talk at Postman Student Summit

Rishit Dagli

July 30, 2021

Transcript

  1. Deploying an ML Model as an API Rishit Dagli High School TEDx, TED-Ed Speaker rishit_dagli Rishit-dagli

  2. “Most models don’t get deployed.” rishit_dagli

  3. 90% of models don’t get deployed. rishit_dagli

  4. Source: Laurence Moroney rishit_dagli

  5. Source: Laurence Moroney rishit_dagli

  6. $whoami • High School Student • TEDx and 2x TED-Ed Speaker • Postman Student Leader • I ❤ ML Research • My coordinates: www.rishit.tech rishit_dagli Rishit-dagli

  7. Ideal Audience • Devs who have worked on deep learning models (Keras) • Devs looking for ways to put their models into production rishit_dagli

  8. Why care about ML deployments? Source: memegenerator.net rishit_dagli

  9. rishit_dagli

  10. What things to take care of? • Package the model rishit_dagli

  11. What things to take care of? • Package the model • Post the model on a server rishit_dagli

  12. What things to take care of? • Package the model • Post the model on a server • Maintain the server rishit_dagli

  13. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale) rishit_dagli

  14. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale) rishit_dagli

  15. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale, global availability) rishit_dagli

  16. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale, global availability, latency) rishit_dagli

  17. What things to take care of? • Package the model • Post the model on a server • Maintain the server • API rishit_dagli

  18. What things to take care of? • Package the model • Post the model on a server • Maintain the server • API • Model versioning rishit_dagli

  19. Simple Deployments Why are they inefficient?

  20. None
  21. Simple Deployments: why are they inefficient? • No consistent API • No model versioning • No mini-batching • Inefficient for large models Source: Hannes Hapke rishit_dagli

  22. TensorFlow Serving

  23. TensorFlow Extended: TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis, TensorFlow Serving rishit_dagli

  24. TensorFlow Serving • Part of TensorFlow Extended rishit_dagli

  25. TensorFlow Serving • Part of TensorFlow Extended • Used internally at Google rishit_dagli

  26. TensorFlow Serving • Part of TensorFlow Extended • Used internally at Google • Makes deployment a lot easier rishit_dagli

  27. The Process

  28. Export Model • The SavedModel format • Graph definitions as protocol buffers rishit_dagli

  29. SavedModel Directory rishit_dagli

  30. SavedModel Directory: auxiliary files (e.g. vocabularies) rishit_dagli

  31. SavedModel Directory: auxiliary files (e.g. vocabularies), variables rishit_dagli

  32. SavedModel Directory: auxiliary files (e.g. vocabularies), variables, graph definitions rishit_dagli
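
A minimal sketch of producing that directory from a Keras model. The tiny architecture and the /tmp path are placeholders; the numbered sub-directory is the version that TensorFlow Serving looks for:

```python
import tensorflow as tf

# A tiny stand-in model; any Keras model exports the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# TensorFlow Serving expects numbered version sub-directories,
# e.g. /tmp/models/test/1, /tmp/models/test/2, ...
# In TF 2.x, saving to a directory produces the SavedModel format:
# saved_model.pb (graph definitions), variables/, and assets/.
model.save("/tmp/models/test/1")
```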

  33. TensorFlow Serving rishit_dagli

  34. TensorFlow Serving rishit_dagli

  35. TensorFlow Serving Also supports gRPC rishit_dagli

  36. TensorFlow Serving rishit_dagli

  37. TensorFlow Serving rishit_dagli

  38. TensorFlow Serving rishit_dagli

  39. TensorFlow Serving rishit_dagli
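
The diagrams above show TensorFlow Serving fronting the exported SavedModel directory. A minimal sketch of starting the server, assuming the tensorflow_model_server binary is installed (the official tensorflow/serving Docker image ships the same binary); the model name `test` and the path are the placeholders used earlier:

```python
import subprocess

# Ports match the deck: gRPC on 8500, REST on 8501.
subprocess.run([
    "tensorflow_model_server",
    "--port=8500",                 # gRPC endpoint
    "--rest_api_port=8501",        # REST endpoint
    "--model_name=test",
    "--model_base_path=/tmp/models/test",
])
```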

  40. Inference

  41. Inference • Consistent APIs • Supports gRPC (port 8500) and REST (port 8501) simultaneously • No lists, but lists of lists rishit_dagli

  42. Inference • No lists, but lists of lists rishit_dagli
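
"No lists, but lists of lists" refers to the REST payload shape: even a single example is wrapped in an outer batch dimension. A hedged illustration with made-up feature values:

```python
# A single 4-feature example is NOT sent as a flat list...
bad_payload = {"instances": [1.0, 2.0, 3.0, 4.0]}

# ...but as a list of lists: the outer list is the batch dimension.
good_payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}

# Two examples in one request:
batched_payload = {"instances": [[1.0, 2.0, 3.0, 4.0],
                                 [5.0, 6.0, 7.0, 8.0]]}
```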

  43. Inference with REST • JSON response • Can specify a particular version Default URL: http://{HOST}:8501/v1/models/test:predict Model version: http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict rishit_dagli

  44. Inference with REST • JSON response • Can specify a particular version Default URL: http://{HOST}:8501/v1/models/test:predict Model version: http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict (8501 is the port, test is the model name) rishit_dagli

  45. Inference with REST rishit_dagli
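
A minimal sketch of the REST call shown above, assuming the server is running on localhost; the model name `test` and the feature values are placeholders:

```python
import requests

# Default URL: latest available version of the model.
url = "http://localhost:8501/v1/models/test:predict"
payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}

response = requests.post(url, json=payload)
print(response.json())  # e.g. {"predictions": [[...]]}

# Pin a specific version instead of the latest:
versioned = "http://localhost:8501/v1/models/test/versions/1:predict"
```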

  46. Inference with gRPC • Better connections • Data converted to protocol buffers • Request types have a designated type • Payload converted to base64 • Use gRPC stubs rishit_dagli
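
A hedged sketch of the same prediction over gRPC using the stubs from the tensorflow-serving-api package; the input key `dense_input` is an assumption and must match the SavedModel's serving signature (discoverable via the metadata endpoint below):

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Open a channel to the gRPC port (8500) and create a stub.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a PredictRequest; the payload travels as a protocol buffer.
request = predict_pb2.PredictRequest()
request.model_spec.name = "test"
request.model_spec.signature_name = "serving_default"
request.inputs["dense_input"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0, 4.0]], dtype=tf.float32))

response = stub.Predict(request, 10.0)  # 10-second timeout
print(response.outputs)
```
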
  47. Model Meta Information

  48. Model Meta Information • You have an API to get meta info • Useful for model tracking in telemetry systems • Provides model inputs/outputs, signatures rishit_dagli

  49. Model Meta Information http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata rishit_dagli
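
A minimal sketch of querying that metadata endpoint; host, model name, and version are placeholders:

```python
import requests

url = "http://localhost:8501/v1/models/test/versions/1/metadata"
meta = requests.get(url).json()

# The response contains the model spec and its signature definitions,
# i.e. the expected input and output tensors.
print(meta["metadata"]["signature_def"])
```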

  50. Batch Inferences

  51. Batch Inference • Use hardware efficiently • Save costs and compute resources • Take multiple requests and process them together • Super cool 😎 for large models rishit_dagli

  52. Batch Inference: highly customizable • max_batch_size • batch_timeout_micros • num_batch_threads • max_enqueued_batches • file_system_poll_wait_seconds • tensorflow_session_parallelism • tensorflow_intra_op_parallelism rishit_dagli

  53. Batch Inference • Load a configuration file on startup • Change parameters according to use case rishit_dagli
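
A hedged sketch of such a batching configuration file (text protobuf); the values are illustrative, not tuned recommendations:

```python
# Batching parameters as a text protobuf, written to disk so the
# server can load it on startup.
batching_config = """
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
"""

with open("/tmp/batching_parameters.txt", "w") as f:
    f.write(batching_config)

# Picked up on startup via flags such as:
#   tensorflow_model_server --enable_batching=true \
#       --batching_parameters_file=/tmp/batching_parameters.txt ...
```
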
  54. Also take a look at...

  55. Also take a look at... • Kubeflow deployments • Data pre-processing on the server 🚅 • AI Platform Predictions • Deployment on edge devices • Federated learning rishit_dagli

  56. Slides bit.ly/postman-summit-deck rishit_dagli

  57. Demos! bit.ly/postman-summit-demo rishit_dagli

  58. Thank You rishit_dagli Rishit-dagli