
AI Bootcamp 2019 - Bulgaria - Getting started with Azure ML Service

Ivan Donev
December 14, 2019



Transcript

  1. DECEMBER 14 GLOBAL AI BOOTCAMP IS POWERED BY: Introduction to

    Azure Machine Learning Service Ivan Donev Solution Architect @ Inspirit Trainer @ SQL Master Academy
  2. Let’s work with a sample case • Business problem: brooms

    are sold out before the end of each month and management needs to know how to properly stock them • Your input data is like this: • You need to go through the whole data science process. What would be the steps?
  3. Data Preparation • Often performed by a data engineer but

    can also be done by the data scientist. • Cleanse data of null and bad values. • Data may be aggregated (counts, sums, averages, etc.) and reformatted or calculated columns may be added. • Data can come from many sources and formats such as SQL Server, Cosmos DB, Flat Files, etc. • Data must be merged from the sources into a consistent and useful dataset.
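The cleanse-merge-aggregate flow described above can be sketched with pandas. The column names ("store", "units_sold", "region") and the two toy sources are assumptions for illustration, not from the deck:

```python
# Sketch of a typical data-preparation step with pandas; the sources,
# column names, and values here are made up for illustration.
import pandas as pd

# Two hypothetical sources, e.g. sales from a flat file, stores from SQL Server.
sales = pd.DataFrame({
    "store": [1, 1, 2, 2, 2],
    "units_sold": [10, None, 7, 5, -3],   # contains a null and a bad value
})
stores = pd.DataFrame({"store": [1, 2], "region": ["North", "South"]})

# Cleanse: drop nulls and impossible negative quantities.
sales = sales.dropna()
sales = sales[sales["units_sold"] >= 0]

# Merge the sources into one consistent dataset, then aggregate.
merged = sales.merge(stores, on="store", how="inner")
summary = merged.groupby("region", as_index=False)["units_sold"].sum()
print(summary)
```

The same pattern scales to real sources: replace the literal DataFrames with `pd.read_sql` or `pd.read_csv` and the rest of the pipeline is unchanged.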
  4. Feature Engineering • Goal is to identify the data needed

    to support your machine learning model. • In some cases, raw data can be used with little or no transformation. • In other cases, you need to create the required data through transformations or aggregations. • An example requiring little or no transformation might be the customer country code. • An example of a transformation might be to calculate the patient's age from the birth date and create age bands like ‘< 18’, ’18 – 25’, ‘26 – 35’, ‘36 – 45’, etc. • An example of an aggregation would be to count the number of past readmissions for a patient.
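The age-band transformation mentioned above can be sketched with `pandas.cut`. The birth dates are made up, and the reference date is fixed for reproducibility:

```python
# Sketch of the age-band feature from the slide; birth dates are
# illustrative, and the reference date is pinned so the result is stable.
import pandas as pd

patients = pd.DataFrame({"birth_date": ["2010-06-01", "1995-03-15", "1980-11-30"]})
today = pd.Timestamp("2019-12-14")

# Derive an approximate age in years, then bin it into the bands.
age = (today - pd.to_datetime(patients["birth_date"])).dt.days // 365
patients["age_band"] = pd.cut(
    age,
    bins=[0, 17, 25, 35, 45, 200],
    labels=["< 18", "18 - 25", "26 - 35", "36 - 45", "> 45"],
)
print(patients["age_band"].tolist())
```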
  5. Model Training: Model Selection • Select the machine learning models

    you think will have the best accuracy. • Different types of models work best for certain types of problems. • Some models can predict continuous values like sales. • Others classify data such as good email or SPAM. • Models vary in the complexity of algorithms they support. • More complex models require more resources to use.
  6. Model Training: Steps 1. Split the data into Training and

    Testing. 2. Generate (Train) the model. o Cross-Validation 3. Test the model. Use the Testing data to make predictions with the model. 4. Score the model, i.e. compare predicted with actual values.
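The four steps above can be sketched with scikit-learn. The dataset (iris) and model (logistic regression) are illustrative choices, not from the deck:

```python
# The split / train (with cross-validation) / test / score steps,
# sketched with scikit-learn on an illustrative dataset and model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# 1. Split the data into Training and Testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Generate (train) the model, with cross-validation on the training set.
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)

# 3. Test: use the Testing data to make predictions with the model.
predictions = model.predict(X_test)

# 4. Score: compare predicted with actual values.
score = accuracy_score(y_test, predictions)
print(f"CV mean: {cv_scores.mean():.2f}, test accuracy: {score:.2f}")
```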
  7. Model Training: Evaluating Results • Accuracy and precision: a couple of

    metrics that are useful when evaluating a classification model are accuracy and precision. To understand these terms, it is essential to understand the confusion matrix, which includes the following counts: o True positive: the number of times the model predicts true (yes) when the actual value is true o True negative: the number of times the model predicts false (no) when the actual value is false o False negative: the number of times the model predicts false when the actual value is true o False positive: the number of times the model predicts true when the actual value is false • Mean squared error: mean squared error (MSE) is one of the most popular model evaluation metrics in statistical modeling. It measures how far, on average, your predictions are from the correct y values. • These are just a couple of the metrics available for model evaluation. Whether the model is a classification or a numerical prediction determines which evaluation techniques apply.
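The confusion-matrix terms and MSE above can be computed by hand on small vectors; all the numbers below are made up for illustration:

```python
# Confusion-matrix counts, accuracy, precision, and MSE computed
# by hand on small illustrative vectors.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)   # correct predictions over all predictions
precision = tp / (tp + fp)           # of the predicted-true, how many were true

# MSE for a numerical prediction: average squared distance from the actual y.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.5, 2.0]
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```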
  8. Model Training vs Model Use • Model training is often

    resource-intensive, using a lot of CPU cycles and memory. • The end result of model training is a small, lightweight program called a model. • Using a model usually takes significantly fewer resources than training it. • The bottom line is that we don’t need to use the same platform for consuming the model that we used to train it.
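The "train once, consume cheaply" point can be shown by serializing a trained model: the expensive `fit` happens once, and the consumer only loads a small artifact and predicts. The dataset and model are illustrative:

```python
# Training is the expensive step; the resulting artifact is small, and
# consuming it is just load-and-predict (illustrative model and data).
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Training side: resource-intensive fit, then save the lightweight model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
blob = pickle.dumps(model)
print(f"serialized model: {len(blob)} bytes")  # a few KB, not the training data

# Consuming side: a different process just loads the artifact and predicts.
loaded = pickle.loads(blob)
prediction = loaded.predict(X[:1])
```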
  9. Model Deployment • Putting the trained model somewhere it can

    be used, such as by a web/mobile app. • Usually this is done by a developer. • Deployment requirements are very different from model training requirements: o Security o Dependencies o Availability • Ideal for cloud-based containers.
  10. AML service within the Data Science Pipeline • Create a

    workspace to store your work. • Use Python or the Azure Portal. • An experiment within the workspace stores model training information. • Use the IDE of your choice. • Use Python libraries or the Azure Data Prep SDK. • Train models with the Python open source modules of your choice. • Train locally or on Azure. • Submit model training to Azure containers. • Monitor model training. • Register final model. • Deploy trained model to Azure target environment: • Azure Kubernetes Service (AKS) • IoT Edge • Field Programmable Gate Array (FPGA)
  11. Deployment • To make a model available for consumption, you

    deploy it. • Supported target deployment environments: o Docker image o Azure Container Instances o Azure Kubernetes Service (AKS) o Azure IoT Edge o Field Programmable Gate Array (FPGA) • For the deployment, you need the following files: o A score script file that tells AMLS how to call the model o An environment file that specifies package dependencies o A configuration file that requests the required resources for the container
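The score script follows an init()/run() contract: `init()` loads the model once, and `run()` handles each request. This sketch stubs out the model so it runs without a real registered model; the stub and sample payload are assumptions:

```python
# A minimal sketch of an AMLS score (entry) script with the usual
# init()/run() contract; the model here is a stub so the sketch runs
# locally without a registered model.
import json

model = None

def init():
    # In a real deployment this would load the registered model, e.g. with
    # joblib.load(os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")).
    global model
    model = lambda rows: [sum(r) for r in rows]  # stub standing in for model.predict

def run(raw_data):
    # The service passes the request body as a JSON string.
    data = json.loads(raw_data)["data"]
    return json.dumps({"result": model(data)})

# Local smoke test of the contract.
init()
response = run(json.dumps({"data": [[1, 2], [3, 4]]}))
```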
  12. The Pipeline Features: • Can schedule tasks and execution times.

    • Flexibility to allocate compute targets for individual steps and to coordinate multiple pipelines. • Can reuse pipeline scripts and customize them for different processes; for example, in retraining and batch scoring. • Can record and manage all input, output, intermediate tasks, and data.
  13. Deployment Targets • Azure Container Instances (ACI) o Testing o

    Limited resources (memory and CPU) • Azure Kubernetes Service (AKS) o Real-time inference o Production and web services o Highly scalable (auto-scale) o Supports GPU • AML Compute o Batch inference o Normal and low-priority VMs managed by AMLS o Supports GPU • Azure IoT Edge • FPGA
  14. Summary of options and BEST PRACTICES • Real time scoring

    o ACI (docker) (small loads) o AKS o WebService + GPU = AKS ONLY!!! • Batch scoring o Using pipelines! o Azure ML Compute (preview) o GPU + batch scoring = ML Compute WITH Pipelines