Deploying an ML Model as an API | Postman Student Summit

My talk at Postman Student Summit

Rishit Dagli

July 30, 2021

Transcript

  1. Deploying an ML
    Model as an API
    Rishit Dagli
    High School
    TEDx, TED-Ed Speaker
    rishit_dagli
    Rishit-dagli


  2. “Most models don’t get
    deployed.”

  3. 90% of models don’t get deployed.

  4. Source: Laurence Moroney

  6. ● High School Student
    ● TEDx and 2x TED-Ed Speaker
    ● Postman Student Leader
    ● I ❤ ML Research
    ● My coordinates - www.rishit.tech
    $whoami

  7. ● Devs who have worked on Deep Learning
    models (Keras)
    ● Devs looking for a production-ready way
    to deploy their models
    Ideal Audience

  8. Why care about
    ML deployments?
    Source: memegenerator.net



  16. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    Latency
    What things to take care of?


  18. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    ● API
    ● Model Versioning
    What things to take care of?

  19. Simple
    Deployments
    Why are they inefficient?



  21. Simple Deployments
    Why are they inefficient?
    ● No consistent API
    ● No model versioning
    ● No mini-batching
    ● Inefficient for
    large models
    Source: Hannes Hapke

  22. TensorFlow Serving


  23. TensorFlow Serving
    TensorFlow Extended components:
    ● TensorFlow Data Validation
    ● TensorFlow Transform
    ● TensorFlow Model Analysis
    ● TensorFlow Serving


  26. ● Part of TensorFlow Extended
    ● Used internally at Google
    ● Makes deployment a lot easier
    TensorFlow Serving

  27. The Process


  28. ● The SavedModel
    format
    ● Graph definitions as
    protocol buffers
    Export Model
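The export step can be sketched as below, assuming a trained Keras model (the tiny model here is only a stand-in):

```python
import tensorflow as tf

# Stand-in for a trained Keras model (assumption: yours is already trained).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# TensorFlow Serving expects a numeric version subdirectory:
# <model_name>/<version>/saved_model.pb
export_path = "my_model/1"

if hasattr(model, "export"):       # newer Keras versions
    model.export(export_path)
else:                              # older TF 2.x
    tf.saved_model.save(model, export_path)
```

Either path produces a SavedModel directory: the graph definition as a protocol buffer (`saved_model.pb`), a `variables/` folder, and optionally auxiliary assets.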


  32. SavedModel Directory
    ● Graph definitions
    ● Variables
    ● Auxiliary files, e.g. vocabularies

  33. TensorFlow Serving
    Also supports gRPC
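One common way to run TensorFlow Serving is the official Docker image; a sketch, assuming the SavedModel exported above lives in ./my_model and the model is named my_model:

```shell
# Launch TensorFlow Serving from the official Docker image.
# Paths and the model name are illustrative assumptions.
docker run -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source="$(pwd)/my_model",target=/models/my_model \
  -e MODEL_NAME=my_model \
  tensorflow/serving
# Port 8500 serves gRPC, port 8501 serves REST -- the two ports
# used for inference on the following slides.
```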

  40. Inference


  41. ● Consistent APIs
    ● Serves gRPC (port 8500) and REST
    (port 8501) simultaneously
    ● Inputs are not bare lists but lists
    of lists (a batch of examples)
    Inference


  44. ● JSON response
    ● Can specify a particular version
    Inference with REST
    Default URL:
    http://{HOST}:8501/v1/models/test:predict
    Model version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
    (8501 is the REST port; "test" is the model name)

  45. Inference with REST
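A REST predict call can be sketched as below; the host, model name, and input values are illustrative assumptions, and the inner list-of-lists shape matches the "lists of lists" point above:

```python
import json

HOST = "localhost"      # assumption: Serving runs locally
MODEL_NAME = "test"     # model name from the URL on the slide

# REST predict endpoint on port 8501
url = f"http://{HOST}:8501/v1/models/{MODEL_NAME}:predict"

# Inputs are a list of lists: one inner list per example in the batch.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0, 4.0]]})

# To actually send it (requires a running server):
#   import requests
#   response = requests.post(url, data=payload)
#   predictions = response.json()["predictions"]
```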

  46. ● Better (persistent) connections
    ● Data converted to protocol buffers
    ● Requests have designated types
    ● Payloads converted to base64
    ● Use gRPC stubs
    Inference with gRPC
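The stub-based flow can be sketched as below, assuming the `grpc` and `tensorflow-serving-api` packages, a server on port 8500, and a model input named `input_1` (the input name is an assumption; check your model's actual signature via the metadata endpoint):

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Open a channel to the gRPC port and create a stub.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Requests have a designated protobuf type: PredictRequest.
request = predict_pb2.PredictRequest()
request.model_spec.name = "test"                 # model name (assumption)
request.model_spec.signature_name = "serving_default"

# Data is converted to a protocol buffer (TensorProto), not JSON.
request.inputs["input_1"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0, 4.0]], dtype=tf.float32))

# Requires a running server; second argument is the timeout in seconds.
result = stub.Predict(request, 10.0)
```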

  47. Model Meta
    Information


  48. ● You have an API to get meta info
    ● Useful for model tracking in
    telemetry systems
    ● Provides model inputs/outputs and
    signatures
    Model Meta Information

  49. Model Meta Information
    http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
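Fetching the metadata can be sketched as below (host, model name, and version are illustrative assumptions):

```python
HOST = "localhost"      # assumption: Serving runs locally
MODEL_NAME = "test"
MODEL_VERSION = 1

# Metadata endpoint: returns the model's signatures and input/output specs.
metadata_url = (
    f"http://{HOST}:8501/v1/models/{MODEL_NAME}"
    f"/versions/{MODEL_VERSION}/metadata"
)

# To fetch it (requires a running server):
#   import requests
#   metadata = requests.get(metadata_url).json()
```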

  50. Batch
    Inferences


  51. ● Use hardware efficiently
    ● Save costs and compute resources
    ● Take multiple requests and process
    them together
    ● Super cool😎 for large models
    Batch inferences

  52. ● max_batch_size
    ● batch_timeout_micros
    ● num_batch_threads
    ● max_enqueued_batches
    ● file_system_poll_wait_seconds
    ● tensorflow_session_parallelism
    ● tensorflow_intra_op_parallelism
    Batch Inference
    Highly customizable

  53. ● Load a configuration
    file on startup
    ● Change parameters
    according to your
    use case
    Batch Inference
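The batching parameters above live in a small text-format configuration file loaded on startup; a minimal sketch (the values are illustrative assumptions to tune per use case):

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }
```

The file is passed to the server with `--enable_batching=true --batching_parameters_file=<path>`.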

  54. Also take a
    look at...


  55. ● Kubeflow deployments
    ● Data pre-processing on server🚅
    ● AI Platform Predictions
    ● Deployment on edge devices
    ● Federated learning
    Also take a look at...

  56. Slides
    bit.ly/postman-summit-deck

  57. Demos!
    bit.ly/postman-summit-demo

  58. Thank You