
Making Deployments Easy with TF Serving | TF Everywhere India

My talk at TensorFlow Everywhere India


Rishit Dagli

May 11, 2021

Transcript

  1. Making Deployments Easy with TF Serving. Rishit Dagli, High School Student, TEDx and TED-Ed Speaker. rishit_dagli Rishit-dagli
  2. “Most models don’t get deployed.”

  3. 90% of models don’t get deployed.

  4. Source: Laurence Moroney

  5. Source: Laurence Moroney

  6. $whoami • High School Student • TEDx and TED-Ed Speaker • ♡ Hackathons and competitions • ♡ Research • My coordinates: www.rishit.tech • rishit_dagli Rishit-dagli
  7. Ideal Audience • Devs who have worked on Deep Learning models (Keras) • Devs looking for ways to put their models into production
  8. Why care about ML deployments? Source: memegenerator.net

  9. None
  10. What things to take care of? • Package the model

  11. What things to take care of? • Package the model • Post the model on a server

  12. What things to take care of? • Package the model • Post the model on a server • Maintain the server

  13. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale)

  14. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale)

  15. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale, global availability)

  16. What things to take care of? • Package the model • Post the model on a server • Maintain the server (auto-scale, global availability, latency)

  17. What things to take care of? • Package the model • Post the model on a server • Maintain the server • API

  18. What things to take care of? • Package the model • Post the model on a server • Maintain the server • API • Model versioning
  19. Simple Deployments: Why are they inefficient?

  20. None
  21. Simple Deployments: Why are they inefficient? • No consistent API • No model versioning • No mini-batching • Inefficient for large models (Source: Hannes Hapke)
  22. TensorFlow Serving

  23. TensorFlow Serving within TensorFlow Extended: TensorFlow Data Validation • TensorFlow Transform • TensorFlow Model Analysis • TensorFlow Serving
  24. TensorFlow Serving • Part of TensorFlow Extended

  25. TensorFlow Serving • Part of TensorFlow Extended • Used internally at Google

  26. TensorFlow Serving • Part of TensorFlow Extended • Used internally at Google • Makes deployment a lot easier
  27. The Process

  28. Export Model • The SavedModel format • Graph definitions as protocol buffers
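
A minimal sketch of such an export, assuming a toy Keras model (the path serving/test/1 is a placeholder; the trailing 1 is the version number TensorFlow Serving looks for):

    import tensorflow as tf

    # Placeholder model: any Keras or TF model exports the same way.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])

    # Writes the SavedModel (graph definitions as protocol buffers,
    # plus variables and auxiliary assets) under serving/test/1.
    tf.saved_model.save(model, "serving/test/1")
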
  29. SavedModel Directory

  30. SavedModel Directory • auxiliary files, e.g. vocabularies

  31. SavedModel Directory • auxiliary files, e.g. vocabularies • variables

  32. SavedModel Directory • auxiliary files, e.g. vocabularies • variables • graph definitions
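
Put together, the exported directory looks roughly like this (file names are TensorFlow's defaults; assets/ appears only if the model has auxiliary files):

    serving/test/1/
    ├── saved_model.pb            # graph definitions (protocol buffer)
    ├── variables/
    │   ├── variables.data-00000-of-00001
    │   └── variables.index
    └── assets/                   # auxiliary files, e.g. vocabularies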

  33. TensorFlow Serving

  34. TensorFlow Serving

  35. TensorFlow Serving Also supports gRPC

  36. TensorFlow Serving

  37. TensorFlow Serving

  38. TensorFlow Serving

  39. TensorFlow Serving
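
The architecture slides boil down to pointing the model server at that directory. A minimal sketch using the official Docker image (the host path and the model name test are placeholders; 8500 is the gRPC port, 8501 the REST port):

    docker run -p 8500:8500 -p 8501:8501 \
      --mount type=bind,source=/path/to/serving/test,target=/models/test \
      -e MODEL_NAME=test -t tensorflow/serving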

  40. Inference

  41. Inference • Consistent APIs • Supports gRPC (port 8500) and REST (port 8501) simultaneously • No lists but lists of lists
  42. Inference • No lists but lists of lists
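
“No lists but lists of lists” means the REST payload always wraps inputs in an outer batch dimension. A sketch with a made-up 4-feature example:

    # A single example still goes in as a batch of one:
    payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}   # list of lists
    # Not: {"instances": [5.1, 3.5, 1.4, 0.2]}        # a flat list is rejected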

  43. Inference with REST • JSON response • Can specify a particular version • Default URL: http://{HOST}:8501/v1/models/test:predict • Specific version: http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
  44. Inference with REST • JSON response • Can specify a particular version • Default URL: http://{HOST}:8501/v1/models/test:predict (8501 is the port, test the model name) • Specific version: http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
  45. Inference with REST
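
A minimal sketch of a REST predict call, assuming the server from the Docker sketch above is running locally and the model is named test:

    import json
    import requests  # assumes the requests package is installed

    url = "http://localhost:8501/v1/models/test:predict"
    payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

    # The server answers with a JSON body: {"predictions": [...]}
    response = requests.post(url, data=json.dumps(payload))
    print(response.json())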

  46. Inference with gRPC • Better connections • Data converted to protocol buffers • Request types have a designated type • Payload converted to base64 • Use gRPC stubs
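
A minimal sketch of the same prediction over gRPC (assumes the tensorflow-serving-api package; the input tensor name dense_input depends on the exported signature and is a placeholder):

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Build a typed PredictRequest protocol buffer.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "test"
    request.model_spec.signature_name = "serving_default"
    request.inputs["dense_input"].CopyFrom(
        tf.make_tensor_proto([[5.1, 3.5, 1.4, 0.2]], dtype=tf.float32))

    result = stub.Predict(request, 10.0)  # 10-second timeout
    print(result.outputs)
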
  47. Model Meta Information

  48. Model Meta Information • You have an API to get meta info • Useful for model tracking in telemetry systems • Provides model inputs/outputs, signatures
  49. Model Meta Information http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
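
For example, a minimal sketch of fetching the metadata (signature names, input/output specs) for version 1 of the placeholder model test:

    import requests  # assumes the requests package is installed

    url = "http://localhost:8501/v1/models/test/versions/1/metadata"
    print(requests.get(url).json())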

  50. Batch Inferences

  51. Batch Inference • Use hardware efficiently • Save costs and compute resources • Take multiple requests and process them together • Super cool 😎 for large models
  52. Batch Inference: highly customizable • max_batch_size • batch_timeout_micros • num_batch_threads • max_enqueued_batches • file_system_poll_wait_seconds • tensorflow_session_parallelism • tensorflow_intra_op_parallelism
  53. Batch Inference • Load configuration file on startup • Change parameters according to use case
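
A minimal sketch of such a configuration file (protobuf text format; the values are illustrative and should be tuned per use case):

    # batching_parameters.txt
    max_batch_size { value: 32 }
    batch_timeout_micros { value: 5000 }
    num_batch_threads { value: 4 }
    max_enqueued_batches { value: 100 }

The file is loaded at server startup with --enable_batching=true --batching_parameters_file=/path/to/batching_parameters.txt.
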
  54. Also take a look at...

  55. Also take a look at... • Kubeflow deployments • Data pre-processing on server 🚅 • AI Platform Predictions • Deployment on edge devices • Federated learning
  56. Demos! bit.ly/tf-everywhere-ind

  57. Slides: bit.ly/serving-deck

  58. Thank You rishit_dagli Rishit-dagli