Deploying Models to Production with TF Serving

I plan to make this an intermediate-level talk: I expect the audience to know how to build their own models with TensorFlow or Keras, and I will take it forward from there to show how they can serve those models over HTTP and HTTPS. I will walk through the main steps of putting a model into production: packaging it and making it ready for deployment, uploading it to the cloud, building an API, and, most importantly, updating the model with no downtime while handling version numbering efficiently. These are the steps required to deploy a model in the wild, and I will cover how TensorFlow simplifies each of them for a developer. I will also show how applications can access the model through web or cloud calls. If time permits, I will show how to make this deployment auto-scale using GCP Cloud Functions and/or Kubernetes.

Rishit Dagli

October 18, 2020

Transcript

  1. Deploying models
    to production
    with TF Serving
    Rishit Dagli
    High School
    TEDx, TED-Ed Speaker
    rishit_dagli
    Rishit-dagli

  2. “Most models don’t get
    deployed.”

  3. 90% of models don’t get deployed.

  4. Source: Laurence Moroney

  5. Source: Laurence Moroney

  6. ● 11th Grade Student
    ● TEDx and TED-Ed Speaker
    ● ♡ Hackathons and competitions
    ● ♡ Research
    ● My coordinates - www.rishit.tech
    $whoami
    rishit_dagli Rishit-dagli

  7. ● Devs who have worked on Deep Learning
    Models (Keras)
    ● Devs looking for ways to put their
    models into production
    Ideal Audience

  8. Why care about
    ML deployments?
    Source: memegenerator.net

  9. (image-only slide)

  10. ● Package the model
    What things to take care of?

  11. ● Package the model
    ● Post the model on a server
    What things to take care of?

  12. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    What things to take care of?

  13. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    What things to take care of?

  14. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    What things to take care of?

  15. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    What things to take care of?

  16. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    Auto-scale
    Global availability
    Latency
    What things to take care of?

  17. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    ● API
    What things to take care of?

  18. ● Package the model
    ● Post the model on a server
    ● Maintain the server
    ● API
    ● Model Versioning
    What things to take care of?

  19. Simple
    Deployments
    Why are they inefficient?

  20. (image-only slide)

  21. Simple Deployments
    Why are they inefficient?
    ● No consistent API
    ● No model versioning
    ● No mini-batching
    ● Inefficient for
    large models
    Source: Hannes Hapke

  22. TensorFlow Serving

  23. TensorFlow Serving
    TensorFlow Data Validation
    TensorFlow Transform
    TensorFlow Model
    Analysis
    TensorFlow Serving
    TensorFlow Extended

  24. ● Part of TensorFlow Extended
    TensorFlow Serving

  25. ● Part of TensorFlow Extended
    ● Used Internally at Google
    TensorFlow Serving

  26. ● Part of TensorFlow Extended
    ● Used Internally at Google
    ● Makes deployment a lot easier
    TensorFlow Serving

  27. The Process

  28. ● The SavedModel
    format
    ● Graph definitions as
    protocol buffers
    Export Model
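
    For example, a minimal export sketch (the tiny stand-in model and the
    /tmp/models/test path are illustrative assumptions; the numbered
    subdirectory is the version TF Serving watches for):

        import tensorflow as tf

        # Assume a trained tf.keras model; a tiny stand-in is built here.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
        ])

        # TF Serving expects <base_path>/<model_name>/<version>/ on disk.
        # Exporting a new version number, e.g. .../test/2, lets the server
        # swap models with no downtime.
        tf.saved_model.save(model, "/tmp/models/test/1")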

  29. SavedModel
    Directory

  30. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory

  31. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory
    Variables

  32. auxiliary files
    e.g. vocabularies
    SavedModel
    Directory
    Variables
    Graph definitions
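
    For reference, the exported directory then looks like this (these are
    the standard file names TensorFlow writes):

        test/1/
        ├── saved_model.pb      # graph definitions as a protocol buffer
        ├── variables/          # learned weights
        │   ├── variables.data-00000-of-00001
        │   └── variables.index
        └── assets/             # auxiliary files, e.g. vocabularies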

  33. TensorFlow Serving

  34. TensorFlow Serving

  35. TensorFlow Serving
    Also supports gRPC

  36. TensorFlow Serving

  37. TensorFlow Serving

  38. TensorFlow Serving

  39. TensorFlow Serving

  40. Inference

  41. ● Consistent APIs
    ● Supports gRPC (port 8500) and
    REST (port 8501) simultaneously
    ● No plain lists, but lists of lists
    Inference

    ● No plain lists, but lists
    of lists (see the sketch below)
    Inference
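
    Concretely: even a single example must be wrapped in an outer batch
    list. A sketch, assuming a flat 784-float input:

        # "instances" holds a list of examples; each example is itself a list.
        payload = {
            "instances": [
                [0.0] * 784  # a batch containing one example
            ]
        }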

  43. ● JSON response
    ● Can specify a
    particular version
    Inference with
    REST
    Default URL:
    http://{HOST}:8501/v1/models/test
    Model version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict

  44. ● JSON response
    ● Can specify a
    particular version
    Inference with
    REST
    Default URL:
    http://{HOST}:8501/v1/models/test
    (8501 is the port, test is the model name)
    Model version:
    http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict

  45. Inference with REST
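
    The code on this slide did not survive extraction; here is a minimal
    REST predict sketch (the model name test, the input shape, and a
    server already running on port 8501, e.g. via the tensorflow/serving
    Docker image, are all assumptions):

        import json
        import requests

        # Inputs are lists of lists, never bare lists.
        payload = {"instances": [[0.0] * 784]}

        url = "http://localhost:8501/v1/models/test:predict"
        response = requests.post(url, data=json.dumps(payload))
        print(response.json()["predictions"])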

  46. ● Better, persistent connections
    ● Data converted to protocol buffers
    ● Request types have designated types
    ● Payload converted to base64
    ● Use gRPC stubs
    Inference with gRPC
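
    A sketch of the same call over gRPC (assumes the tensorflow-serving-api
    package; the model name, signature name, and input key are assumptions
    that depend on your exported model):

        import grpc
        import numpy as np
        import tensorflow as tf
        from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

        # Stub over a channel to the gRPC port (8500).
        channel = grpc.insecure_channel("localhost:8500")
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        # Build a typed PredictRequest; the payload travels as a TensorProto.
        request = predict_pb2.PredictRequest()
        request.model_spec.name = "test"
        request.model_spec.signature_name = "serving_default"
        data = np.zeros((1, 784), dtype=np.float32)
        request.inputs["input_1"].CopyFrom(tf.make_tensor_proto(data))

        result = stub.Predict(request, 10.0)  # 10-second timeout
        print(result.outputs)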

  47. Model Meta
    Information

  48. ● You have an API to get meta info
    ● Useful for model tracking in
    telemetry systems
    ● Provides model inputs/outputs and
    signatures
    Model Meta Information

  49. Model Meta Information
    http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
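
    A quick sketch of querying it (host, model name, and version are
    placeholders):

        import requests

        # Returns the model spec and signature defs (inputs/outputs) as JSON.
        url = "http://localhost:8501/v1/models/test/versions/1/metadata"
        print(requests.get(url).json())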

  50. Batch
    Inferences

  51. ● Use hardware efficiently
    ● Save costs and compute resources
    ● Takes multiple requests and processes them
    together
    ● Super cool for large models
    Batch inferences

  52. ● max_batch_size
    ● batch_timeout_micros
    ● num_batch_threads
    ● max_enqueued_batches
    ● file_system_poll_wait_seconds
    ● tensorflow_session_parallelism
    ● tensorflow_intra_op_parallelism
    Batch Inference
    Highly customizable

  53. ● Load configuration
    file on startup
    ● Change parameters
    according to use
    cases
    Batch Inference
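
    A sketch of such a configuration file, in protocol buffer text format,
    passed to the model server with --enable_batching and
    --batching_parameters_file (the values are illustrative assumptions):

        max_batch_size { value: 32 }
        batch_timeout_micros { value: 5000 }
        num_batch_threads { value: 4 }
        max_enqueued_batches { value: 100 }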

  54. Also take a
    look at...

  55. ● Kubeflow deployments
    ● Data pre-processing on server
    ● AI Platform Predictions
    ● Deployment on edge devices
    ● Federated learning
    Also take a look at...

  56. ● Valid only for today
    ● go.qwiklabs.com/cloud-study-jams-2020
    ● Select ML Infrastructure Study Jam
    ● Enter code 1s-Nairobi-8989
    ● Complete 1 lab to get 1 month free access
    ● Complete the quest to get 2 months free access!
    ● 1 month free Coursera access
    Qwiklabs rishit.tech/qwiklabs-offer

  57. df-kenya.rishit.tech
    Demos!

  58. Q & A
    rishit_dagli Rishit-dagli

  59. Thank You
    rishit_dagli Rishit-dagli
