
Build and deploy PyTorch models with Azure Machine Learning

With machine learning becoming more and more an engineering problem, the need to track experiments, collaborate, and easily deploy ML models with integrated CI/CD tooling is more relevant than ever.

In this session we take a deep dive into Azure Machine Learning, a cloud service you can use to track your work as you build, train, deploy, and manage models. We use the Azure Machine Learning Python SDK to manage the complete life cycle of a PyTorch model: from managing the data, to training the model, to finally running it in a production Kubernetes cluster.

By the end of this session you will have a good grasp of the technological building blocks of Azure Machine Learning and know how to train a PyTorch model at scale.

Henk Boelman

October 04, 2022

Transcript

  1. What are we building? A Lego Simpsons image dataset, a PyTorch classification model, and an endpoint that returns predictions such as { “homer”: 0.9 }.
  2. What is Transfer Learning? Transfer learning, as used in machine learning, is the reuse of a pre-trained model on a new problem: the machine exploits the knowledge gained from a previous task to improve generalization on another. For example, a classifier trained to predict whether an image contains food could reuse the knowledge it gained during training to help recognize drinks.
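
For illustration, a minimal transfer-learning sketch in PyTorch; the ResNet-18 backbone and the count of ten target classes are assumptions, not taken from the deck:

```python
# Minimal transfer-learning sketch (assumed: ResNet-18 backbone, 10 target classes).
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # reuse weights learned on ImageNet
for param in model.parameters():
    param.requires_grad = False                 # freeze the pre-trained feature extractor

# Replace the classification head with a new layer for the new problem.
model.fc = nn.Linear(model.fc.in_features, 10)
```
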
  3. What is Computer Vision? Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images and deep learning, machines can accurately detect and classify objects.
  4. What is PyTorch? PyTorch is an open-source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, and primarily developed by Facebook's AI Research lab (FAIR). It is free and open-source software released under the modified BSD license.
  5. Distributed training. Challenges of distributed training: managing dependencies and containers, scheduling jobs, distributing data, scaling resources, provisioning clusters of VMs, gathering results, securing access, and handling failures.
  6. What is Azure Machine Learning? A set of Azure cloud services, a Python SDK, and AML Studio (or an interface of your choice) that enable you to: prepare data, build models, train models, track experiments, manage models, and deploy models.
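
As a small, hedged sketch of the Python SDK entry point: connecting to a workspace with SDK v1 (azureml-core), assuming a config.json downloaded from the studio sits in the working directory:

```python
# Connect to an Azure ML workspace (assumes ./config.json with subscription,
# resource group and workspace name, as downloaded from the Azure ML studio).
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location)
```
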
  7. [Architecture diagram: the building blocks of Azure Machine Learning. Compute: AML Instance, Local Compute, Linked Compute, AML Cluster. Run: AutoML, Visual Designer, SDK, Pipeline, Training Script. Data: Dataset, Datastore, Data labeling. Experiment: Artifacts, Model, Logs & Metrics. Model Management. Environments: Curated, Custom. Deployment: Scoring Script, Deployment configuration, AKS, ACI, Online Endpoint, Batch Endpoint.]
  8. Training options. Three ways to train your model in Azure ML: 1. All-code: run configuration. 2. No-code: automated machine learning (AutoML). 3. Some-code: a machine learning pipeline built with the Azure ML designer.
  9. What is Automated Machine Learning? Automated machine learning (automated ML) automates featurization and feature engineering, algorithm selection, and hyperparameter tuning to find the ‘best model’ for your data. User inputs: a dataset plus configuration and constraints. Output: a model leaderboard ranking the candidate models by score (e.g. rank 1, score 95%).
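
As a hedged illustration only (the deck's own scenario is an image task; this sketch shows a generic tabular classification setup, and the dataset, label column, and compute names are assumptions), an automated ML run with SDK v1 could look like:

```python
# Generic AutoML classification sketch (dataset, label column and compute names are assumed).
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
training_data = Dataset.get_by_name(ws, "some-tabular-dataset")   # hypothetical dataset name

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=training_data,
    label_column_name="label",          # hypothetical label column
    compute_target="cpu-cluster",       # hypothetical AML compute cluster
    experiment_timeout_hours=1,
)

run = Experiment(ws, "automl-demo").submit(automl_config)
best_run, best_model = run.get_output()   # the leaderboard winner
```
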
  10. Azure ML Designer. What is the Azure ML Designer? AML Designer is a UI that lets you build machine learning pipelines with a drag-and-drop experience and simplifies publishing and deploying your pipelines. Key advantages: connect to your own data with ease; hundreds of pre-built components help you build and train models without writing code; automate model validation, evaluation and interpretation in your pipeline; deploy models and publish endpoints with a few clicks.
  11. All-code: Run Configuration. Run configuration is an all-code approach to training. Submit jobs with the following specifications: the training script to execute, the compute target to execute the run on, the environment to execute the run in, and any other parameters such as data, distributed configuration, custom variables, etc. Then use the Azure ML Experiments UI to track the results of your run.
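
A minimal all-code sketch with the SDK v1 ScriptRunConfig; the script path, cluster name and curated environment name are assumptions:

```python
# Submit a training script as a run (script path, cluster and environment names are assumed).
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()
env = Environment.get(ws, name="AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu")  # curated env, name assumed

config = ScriptRunConfig(
    source_directory="./src",
    script="train.py",
    compute_target="gpu-cluster",       # hypothetical AML cluster
    environment=env,
    arguments=["--epochs", "10"],
)

run = Experiment(ws, "simpsons-classifier").submit(config)
run.wait_for_completion(show_output=True)   # results also show up in the Experiments UI
```
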
  12. “DevOps is the union of people, process, and products to enable continuous delivery of value to your end users.”
  13. MLOps workflow. The app developer uses Azure DevOps: build app, collaborate, test app, release app, monitor app. The data scientist uses Azure Machine Learning: train model, validate model, deploy model, monitor model, retrain model. Shared concerns: model management & monitoring, model performance analysis, model reproducibility, model validation, model deployment, model retraining.
  14. ML life cycle. Data preparation: upload data to Blob storage and create a dataset. Model building: use PyTorch and Python in GitHub Codespaces. Model training: a code-first pipeline using a PyTorch training environment, run on an Azure Machine Learning cluster via a GitHub Action. Model registration: the pipeline registers the model in Model Management. Model deployment: deploy the model to an Online Endpoint using a GitHub Action.
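
A minimal sketch of the data-preparation step with SDK v1; the local folder, target path and dataset name are assumptions:

```python
# Upload images to the default blob datastore and register them as a file dataset
# (local folder, target path and dataset name are assumptions).
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

datastore.upload(src_dir="./data/simpsons", target_path="simpsons", overwrite=True)

dataset = Dataset.File.from_files(path=(datastore, "simpsons"))
dataset = dataset.register(ws, name="simpsons-lego-dataset", create_new_version=True)
```
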
  15. [Architecture diagram repeated from slide 7: the building blocks of Azure Machine Learning.]
  16. ML Pipelines. An Azure Machine Learning pipeline is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. A pipeline can be as simple as one that calls a single Python script, so it can do just about anything. Pipelines are workflows of steps that can use data sources and datasets and run on compute targets.
  17. Create a step / component. A step runs a script on a compute target in a Docker container, and has parameters, inputs, and outputs.
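
A hedged sketch of one pipeline step with SDK v1; the environment file, folder names and cluster name are assumptions:

```python
# One pipeline step: run train.py on a compute target inside a Docker container
# (environment file, folders and cluster name are assumptions).
from azureml.core import Workspace, Environment
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

run_config = RunConfiguration()
run_config.environment = Environment.from_conda_specification("pytorch-env", "environment.yml")

model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())  # step output

train_step = PythonScriptStep(
    name="train-simpsons-model",
    script_name="train.py",
    source_directory="steps/train",
    compute_target="gpu-cluster",       # hypothetical AML cluster
    runconfig=run_config,
    arguments=["--output-dir", model_folder],
    outputs=[model_folder],
    allow_reuse=False,
)
```
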
  18. Create a pipeline. The pipeline takes the dataset of Simpsons images from the Blob storage account, trains the PyTorch model, and registers the model in Model Management.
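
Putting it together, continuing from the step sketched above (train_step and ws come from that sketch; a registration step would be added the same way):

```python
# Assemble and run the pipeline; train_step is the hypothetical step sketched above,
# and a model-registration step would be appended in the same way.
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[train_step])   # e.g. steps=[train_step, register_step]
run = Experiment(ws, "simpsons-training-pipeline").submit(pipeline)
run.wait_for_completion(show_output=True)
```
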
  19. Choosing an inferencing target (decision flow):
     First time deploying? Yes: start with a local deployment.
     No: are you prioritizing low latency or high throughput?
       High throughput: batch inference on AmlCompute.
       Low latency: are you prioritizing high availability or minimal costs?
         Low costs: Azure Container Instances (ACI).
         High availability: do you need to manage your own cluster?
           No: Azure Machine Learning Online Endpoint.
           Yes: what kind of workload? Traditional ML, low throughput: Azure Kubernetes Service on CPUs. Deep learning, high throughput: Azure Kubernetes Service on NVIDIA GPUs.
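
For the managed Online Endpoint branch, a hedged sketch using the Azure ML Python SDK v2 (azure-ai-ml); the endpoint name, model and environment references, scoring folder and VM size are all assumptions:

```python
# Deploy a registered model to a managed online endpoint (SDK v2).
# Endpoint name, model/environment references, scoring folder and VM size are assumptions.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    CodeConfiguration,
)

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

endpoint = ManagedOnlineEndpoint(name="simpsons-classifier", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="simpsons-classifier",
    model="azureml:simpsons-classifier:1",          # registered model reference (assumed)
    environment="azureml:pytorch-inference-env:1",  # environment reference (assumed)
    code_configuration=CodeConfiguration(code="./score", scoring_script="score.py"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```
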
  20. Recap. A GitHub Action training pipeline creates the compute, then creates and starts an Azure ML pipeline that trains on the Simpsons dataset. The pipeline registers the model in Model Management, which adds an event to Event Grid; that adds a message to a queue and starts a Logic App workflow. The Logic App triggers the GitHub Action deploy pipeline, which creates the environment and starts the deployment to the Online Endpoint. The user then calls the Online Endpoint and gets back a prediction such as { “homer”: 0.9 }.