Deploying Models to production with Azure ML | Scottish Summit

Deploying models is an important part of building end-to-end ML applications. I first show how models can be registered with Azure ML so that they are accessible and can be loaded for deployment. I then show how to build configurations for deploying those models with Azure ML. Azure ML makes it easy to deploy models for low-latency, real-time inference, which many applications require, so I focus mainly on this and also show how to consume the deployed models. I further show how to build batch inference pipelines. If time permits, I will also show demos of each of these.

Rishit Dagli

May 11, 2021

Transcript

  1. #ScottishSummit2021
    Rishit Dagli
    Deploying Models to Production with Azure ML
    Methven, Sat 17:30
    @rishit_dagli
    Rishit-dagli
    www.rishit.tech

  2. Our Sponsors

  3. “Most models don’t get deployed.”

  4. 90%
    Of models don’t get deployed

  5. Source: Laurence Moroney

  6. TEDx, TED-Ed Speaker
    High School
    Rishit Dagli
    @rishit_dagli
    Rishit-dagli
    www.rishit.tech

  7. • High School Student
    • TEDx and TED-Ed Speaker
    • ♡ Hackathons and competitions
    • ♡ Research
    • My coordinates - www.rishit.tech
    $whoami

  8. Acknowledgements
    • Henk Boelman (Microsoft)
    • Dawood Iddris (AI MVP)

  9. • Devs who have worked on creating Machine Learning Models
    • Devs looking for ways to make their models production-ready
    Ideal Audience

  10. Why care about ML deployments?
    Source: memegenerator.net

  11. (image-only slide)

  12. • Package the model
    What things to take care of?

  13. • Package the model
    • Post the model on Server
    What things to take care of?

  14. • Package the model
    • Post the model on Server
    • Maintain the server
    What things to take care of?

  15. • Package the model
    • Post the model on Server
    • Maintain the server
    o Auto-scale
    What things to take care of?

  16. • Package the model
    • Post the model on Server
    • Maintain the server
    o Auto-scale
    What things to take care of?

  17. • Package the model
    • Post the model on Server
    • Maintain the server
    o Auto-scale
    o Global Availability
    What things to take care of?

  18. • Package the model
    • Post the model on Server
    • Maintain the server
    o Auto-scale
    o Global Availability
    o Latency
    What things to take care of?

  19. • Package the model
    • Post the model on Server
    • Maintain the server
    • API
    What things to take care of?

  20. • Package the model
    • Post the model on Server
    • Maintain the server
    • API
    • Model Versioning
    What things to take care of?

  21. • Package the model
    • Post the model on Server
    • Maintain the server
    • API
    • Model Versioning
    • Batch Predictions
    What things to take care of?

  22. Simple Deployments
    Why are they Inefficient?

  23. (image-only slide)

  24. Simple Deployments
    Why are they Inefficient?
    • No consistent API
    • No model versioning
    • No mini-batching
    • Inefficient for large models
    Source: Hannes Hapke

  25. Deploying a Model
    A Walkthrough

  26. What do we need?
    • Register Your Model
    • Load the Model
    • Perform Inference
    • Deploy the model

  27. What do we need?
    • Register Your Model
    • Load the Model
    • Perform Inference
    Do it at Scale

  28. Register a Model

  29. Register a Model
    TensorFlow 2
    Saved Model

  30. Register a Model
    .onnx
    .pkl
    .pt

  31. Register a Model
    With the run object

  32. Register a Model
    With the run object
    .onnx
    .pkl
    .pt
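    The registration call itself was a screenshot on the slide; below is a minimal sketch of what it could look like with the Azure ML Python SDK (v1), assuming a model file saved to outputs/model.pkl and the hypothetical name "demo-model".

    from azureml.core import Workspace
    from azureml.core.model import Model

    ws = Workspace.from_config()  # reads the workspace details from config.json

    # Register a model file from disk
    model = Model.register(workspace=ws,
                           model_path="outputs/model.pkl",   # local path to the saved model
                           model_name="demo-model",          # name used to look the model up later
                           description="Model for the Scottish Summit demo")

    # Or register straight from a training run, keeping the lineage to that run
    # (run is the azureml.core.Run object of the training job):
    # model = run.register_model(model_name="demo-model",
    #                            model_path="outputs/model.pkl")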

  33. Creating an Inference Service
    • Load the Model

  34. Creating an Inference Service
    • Load the Model
    • Inference from the Model

  35. Creating an Inference Service
    • Load the Model
    • Inference from the Model
    Environment

  36. Creating an Inference Service

  37. Creating an Inference Service
    Load the registered model
    Really do the inference

  38. Load a model

  39. Load a model
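    The loading code was shown on the slide; a minimal sketch of the init() half of the entry (scoring) script, assuming a scikit-learn model registered as "demo-model".

    import joblib
    from azureml.core.model import Model

    model = None

    def init():
        """Runs once when the service starts: resolve and load the registered model."""
        global model
        model_path = Model.get_model_path("demo-model")  # path of the model inside the service container
        model = joblib.load(model_path)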

  40. Let’s run inference with the model

  41. Let’s run inference with the model

  42. Let’s run inference with the model

  43. Let’s run inference with the model
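    The inference code was also a screenshot; a sketch of the run() half of the same entry script, assuming a JSON payload of the form {"data": [[...], ...]}.

    import json
    import numpy as np

    def run(raw_data):
        """Runs on every request: parse the payload, predict, return something JSON-serialisable."""
        data = np.array(json.loads(raw_data)["data"])
        predictions = model.predict(data)  # model was loaded in init()
        return predictions.tolist()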

  44. And that’s it

  45. Set up an environment
    Customizable
    • Can use a Docker Image directly
    • Can manage the dependencies yourself too
    • Can specify a custom interpreter
    • Customizable Spark Settings
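    A minimal sketch of one way to build such an environment with azureml.core.Environment, assuming the scoring script needs scikit-learn and joblib.

    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies

    env = Environment(name="inference-env")
    env.python.conda_dependencies = CondaDependencies.create(
        pip_packages=["scikit-learn", "joblib", "azureml-defaults"]  # azureml-defaults is required for web services
    )

    # Or point at a Docker image you manage yourself:
    # env.docker.base_image = "<your-registry>/<your-image>:<tag>"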

  46. Let’s deploy it!

  47. Let’s deploy it!

  48. Let’s deploy it!

  49. Let’s deploy it!

  50. Deployment Configuration
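    A sketch of an AKS deployment configuration, with illustrative resource and autoscale values.

    from azureml.core.webservice import AksWebservice

    deployment_config = AksWebservice.deploy_configuration(
        cpu_cores=1,                # per replica
        memory_gb=2,
        autoscale_enabled=True,     # let the service scale replicas with load
        autoscale_min_replicas=1,
        autoscale_max_replicas=4,
    )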

  51. And Deploy to AKS
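    Putting the pieces together; a sketch that assumes an AKS inference cluster named "aks-cluster" is already attached to the workspace, and reuses ws, model, env and deployment_config from the earlier snippets.

    from azureml.core.compute import AksCompute
    from azureml.core.model import InferenceConfig, Model

    inference_config = InferenceConfig(entry_script="score.py", environment=env)
    aks_target = AksCompute(ws, "aks-cluster")

    service = Model.deploy(workspace=ws,
                           name="demo-service",
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=deployment_config,
                           deployment_target=aks_target)
    service.wait_for_deployment(show_output=True)
    print(service.scoring_uri)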

  52. Inference with REST
    • JSON response
    • Can specify a particular version

  53. Inference with REST
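    A sketch of calling the deployed endpoint over REST, assuming the JSON contract used in run() above and the service object from the deployment sketch.

    import json
    import requests

    headers = {"Content-Type": "application/json"}
    # For an AKS service with key auth enabled, also send the key:
    # headers["Authorization"] = "Bearer " + service.get_keys()[0]

    payload = json.dumps({"data": [[5.1, 3.5, 1.4, 0.2]]})
    response = requests.post(service.scoring_uri, data=payload, headers=headers)
    print(response.json())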

  54. Inference with gRPC
    • Better connections
    • Data converted to protocol buffers
    • Requests have a designated protobuf type
    • Payload converted to base64
    • Use gRPC stubs
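    The exact client depends on how the model is served; as a rough sketch, this is what a gRPC call looks like against a TensorFlow Serving style prediction service (not an Azure ML SDK API), with host, model and tensor names as placeholders.

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("<host>:8500")  # gRPC port of the serving container
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()           # requests have a designated protobuf type
    request.model_spec.name = "demo-model"
    request.model_spec.signature_name = "serving_default"
    request.inputs["input"].CopyFrom(tf.make_tensor_proto([[5.1, 3.5, 1.4, 0.2]]))

    result = stub.Predict(request, timeout=10.0)
    print(result.outputs)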

  55. Batch Inferences

  56. Batch Inferences
    • Use hardware efficiently
    • Save costs and compute resources
    • Take multiple requests and process them together
    • Super cool😎for large models

  57. Batch Inferences
    • Update the run() function
    • Runs on each batch of data

  58. Batch Inferences
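    The batch entry script follows the ParallelRunStep contract: init() loads the model once per worker and run() is called once per mini-batch. A minimal sketch, assuming the input is a FileDataset of CSV files and the model registered earlier.

    import joblib
    import pandas as pd
    from azureml.core.model import Model

    def init():
        global model
        model = joblib.load(Model.get_model_path("demo-model"))

    def run(mini_batch):
        # For a FileDataset input, mini_batch is a list of file paths.
        results = []
        for file_path in mini_batch:
            data = pd.read_csv(file_path)
            predictions = model.predict(data)
            results.append(f"{file_path}: {predictions.tolist()}")
        return results  # one output row per processed file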

  59. Configure the ParallelRun
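    A sketch of the corresponding ParallelRunConfig, with illustrative sizes; env is the environment from earlier and compute_target is assumed to be an AmlCompute cluster attached to the workspace.

    from azureml.pipeline.steps import ParallelRunConfig

    parallel_run_config = ParallelRunConfig(
        source_directory="scripts",        # folder containing the batch entry script
        entry_script="batch_score.py",     # the init()/run() script sketched above
        mini_batch_size="5",               # number of files handed to each run() call
        error_threshold=10,                # how many failed items to tolerate before aborting
        output_action="append_row",        # collect run() return values into a single output file
        environment=env,
        compute_target=compute_target,
        node_count=2,
    )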

  60. Create the pipeline
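    A sketch of wiring the config into a ParallelRunStep and a pipeline, assuming the input files are available as a registered dataset (here called "batch-data").

    from azureml.core import Dataset
    from azureml.pipeline.core import Pipeline, PipelineData
    from azureml.pipeline.steps import ParallelRunStep

    batch_dataset = Dataset.get_by_name(ws, "batch-data")
    output_dir = PipelineData(name="inferences", datastore=ws.get_default_datastore())

    batch_step = ParallelRunStep(
        name="batch-scoring",
        parallel_run_config=parallel_run_config,
        inputs=[batch_dataset.as_named_input("batch_data")],
        output=output_dir,
        allow_reuse=False,
    )

    pipeline = Pipeline(workspace=ws, steps=[batch_step])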

  61. Publish the pipeline
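    Finally, a sketch of running the pipeline once and publishing it so it can be triggered again later over REST.

    from azureml.core import Experiment

    pipeline_run = Experiment(ws, "batch-inference").submit(pipeline)
    pipeline_run.wait_for_completion(show_output=True)

    published = pipeline_run.publish_pipeline(
        name="batch-inference-pipeline",
        description="Scores new data in batches",
        version="1.0",
    )
    print(published.endpoint)  # REST endpoint that re-runs the published pipeline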

  62. Demos!

  63. #ScottishSummit2021
    Thank You
    @rishit_dagli
    Rishit-dagli
    www.rishit.tech
