
Making Deployments Easy with TF Serving | TF Everywhere India


My talk at TensorFlow Everywhere India

Rishit Dagli

May 11, 2021

Transcript

  1. Making Deployments
    Easy with
    TF Serving
    Rishit Dagli
    High School
    TEDx, TED-Ed Speaker
    rishit_dagli
    Rishit-dagli


  2. “Most models don’t get
    deployed.”


  3. 90% of models don’t get deployed.


  4. Source: Laurence Moroney


  5. Source: Laurence Moroney


  6. ● High School Student
    ● TEDx and TED-Ed Speaker
    ● ♡ Hackathons and competitions
    ● ♡ Research
    ● My coordinates - www.rishit.tech
    $whoami
    rishit_dagli Rishit-dagli


  7. ● Devs who have worked on Deep Learning
    models (Keras)
    ● Devs looking for ways to put their
    models into production
    Ideal Audience


  8. Why care about
    ML deployments?
    Source: memegenerator.net


  9. (image-only slide)

  10. ● Package the model
    What things to take care of?


  11. ● Package the model
    ● Host the model on a server
    What things to take care of?


  12. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    What things to take care of?


  13. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    Auto-scale
    What things to take care of?


  14. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    Auto-scale
    What things to take care of?


  15. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    What things to take care of?


  16. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    Latency
    What things to take care of?


  17. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    ● API
    What things to take care of?


  18. ● Package the model
    ● Host the model on a server
    ● Maintain the server
    ● API
    ● Model Versioning
    What things to take care of?


  19. Simple
    Deployments
    Why are they inefficient?


  20. (image-only slide)

  21. Simple Deployments
    Why are they inefficient?
    ● No consistent API
    ● No model versioning
    ● No mini-batching
    ● Inefficient for
    large models
    Source: Hannes Hapke


  22. TensorFlow Serving


  23. TensorFlow Serving
    TensorFlow Data
    validation
    TensorFlow Transform
    TensorFlow Model
    Analysis
    TensorFlow Serving
    TensorFlow Extended


  24. ● Part of TensorFlow Extended
    TensorFlow Serving


  25. ● Part of TensorFlow Extended
    ● Used Internally at Google
    TensorFlow Serving


  26. ● Part of TensorFlow Extended
    ● Used Internally at Google
    ● Makes deployment a lot easier
    TensorFlow Serving


  27. The Process


  28. ● The SavedModel
    format
    ● Graph definitions as
    protocol buffer
    Export Model

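The export step can be sketched as follows. This is a minimal illustration using a plain `tf.Module` (a Keras model can be passed to `tf.saved_model.save` the same way), and the `serving_model/test/1` path is a made-up example — the trailing `1` matters because TF Serving loads numeric version subdirectories:

```python
import tensorflow as tf

# Stand-in for a real model: halves its input (illustrative only).
class Half(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        return x * 0.5

# Writes saved_model.pb (graph definitions as a protocol buffer)
# plus the variables/ and assets/ subdirectories.
tf.saved_model.save(Half(), "serving_model/test/1")
```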

  29. SavedModel
    Directory


  30. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory


  31. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory
    Variables


  32. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory
    Variables
    Graph definitions


  33. TensorFlow Serving


  34. TensorFlow Serving


  35. TensorFlow Serving
    Also supports gRPC


  36. TensorFlow Serving


  37. TensorFlow Serving


  38. TensorFlow Serving


  39. TensorFlow Serving

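Slides 33–39 presumably walked through the serving architecture diagrams. The quickest way to stand that server up is the official Docker image; a sketch, where the local model path and the model name `test` are illustrative:

```shell
# Serve a SavedModel with the official image:
# REST on port 8501, gRPC on port 8500.
docker run -p 8500:8500 -p 8501:8501 \
  -v "$(pwd)/serving_model/test:/models/test" \
  -e MODEL_NAME=test tensorflow/serving
```

TF Serving then watches `/models/test` inside the container and serves the highest numeric version subdirectory it finds.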

  40. Inference


  41. ● Consistent APIs
    ● Serves gRPC (port 8500) and REST
    (port 8501) simultaneously
    ● No lists but lists of lists
    Inference


  42. ● No lists but lists
    of lists
    Inference

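The "lists of lists" point means the REST API expects a batch dimension: a request carries an `instances` list whose elements are themselves per-example inputs. A standard-library-only sketch (the four feature values are made up):

```python
import json

# One example with four features. Even a single example must be wrapped
# in an outer list: "instances" is a list of inputs, not a bare input.
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

body = json.dumps(payload)
print(body)  # {"instances": [[5.1, 3.5, 1.4, 0.2]]}
```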

  43. ● JSON response
    ● Can specify a particular version
    Inference with REST
    Default URL:
    http://{HOST}:8501/v1/models/test
    Model Version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict


  44. ● JSON response
    ● Can specify a particular version
    Inference with REST
    Default URL:
    http://{HOST}:8501/v1/models/test
    (8501 is the port, "test" the model name)
    Model Version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict


  45. Inference with REST

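Slide 45 presumably showed code; a stand-in sketch using only the standard library, where the host, the model name `test`, version `2`, and the helper name `build_predict_request` are all illustrative:

```python
import json
import urllib.request

def build_predict_request(host, model, instances, version=None):
    """Build (but do not send) a POST request for the predict endpoint."""
    url = f"http://{host}:8501/v1/models/{model}"
    if version is not None:
        url += f"/versions/{version}"
    url += ":predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

req = build_predict_request("localhost", "test", [[5.1, 3.5, 1.4, 0.2]], version=2)
print(req.full_url)  # http://localhost:8501/v1/models/test/versions/2:predict
# urllib.request.urlopen(req) would return the JSON predictions once a
# TF Serving instance is actually listening on port 8501.
```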

  46. ● Better connections
    ● Data converted to protocol buffers
    ● Each request type has a designated
    protobuf message
    ● Binary payloads converted to base64
    ● Use gRPC stubs
    Inference with gRPC

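The base64 point concerns binary payloads such as image bytes, which cannot travel as plain JSON or text. A standard-library sketch of just the encoding step (the gRPC stub calls themselves come from the `tensorflow-serving-api` package and are omitted here; the byte string is fake):

```python
import base64

# Pretend these are raw image bytes read from disk (illustrative).
image_bytes = b"\x89PNG\r\n\x1a\nfake-image-data"

# For the REST API, binary payloads go in a {"b64": ...} object
# inside "instances"; gRPC clients serialize bytes similarly.
encoded = base64.b64encode(image_bytes).decode("ascii")
payload = {"instances": [{"b64": encoded}]}

# Decoding on the server side recovers the original bytes exactly.
assert base64.b64decode(encoded) == image_bytes
```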

  47. Model Meta
    Information


  48. ● You have an API to get meta info
    ● Useful for model tracking in
    telemetry systems
    ● Provides model inputs/outputs,
    signatures
    Model Meta Information


  49. Model Meta Information
    http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata

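A hypothetical helper that formats the metadata URL above (host, model name, and version values are placeholders):

```python
def metadata_url(host, model_name, model_version):
    # 8501 is TF Serving's default REST port.
    return (f"http://{host}:8501/v1/models/{model_name}"
            f"/versions/{model_version}/metadata")

print(metadata_url("localhost", "test", 1))
# http://localhost:8501/v1/models/test/versions/1/metadata
```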

  50. Batch
    Inferences


  51. ● Use hardware efficiently
    ● Save costs and compute resources
    ● Take multiple requests and process
    them together
    ● Super cool😎 for large models
    Batch inferences


  52. ● max_batch_size
    ● batch_timeout_micros
    ● num_batch_threads
    ● max_enqueued_batches
    ● file_system_poll_wait_seconds
    ● tensorflow_session_parallelism
    ● tensorflow_intra_op_parallelism
    Batch Inference
    Highly customizable


  53. ● Load configuration
    file on startup
    ● Change parameters
    according to use
    cases
    Batch Inference

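The configuration file loaded on startup is a protobuf text-format file passed to the server via the `--enable_batching` and `--batching_parameters_file` flags. A sketch with made-up values (placeholders, not tuning recommendations):

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```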

  54. Also take a
    look at...


  55. ● Kubeflow deployments
    ● Data pre-processing on server🚅
    ● AI Platform Predictions
    ● Deployment on edge devices
    ● Federated learning
    Also take a look at...


  56. bit.ly/tf-everywhere-ind
    Demos!


  57. bit.ly/serving-deck
    Slides


  58. Thank You
    rishit_dagli Rishit-dagli
