Slide 1

DECEMBER 14 GLOBAL AI BOOTCAMP IS POWERED BY:
Introduction to Azure Machine Learning Service
Ivan Donev
Solution Architect @ Inspirit
Trainer @ SQL Master Academy

Slide 2

Thanks to our Sponsors:

Slide 3

The Data Science Process

Slide 4

The Model Training Cycle

Slide 5

Let’s work with a sample case
• Business problem: brooms sell out before the end of each month, and management needs to know how to stock them properly.
• Your input data is like this:
• You need to go through the whole data science process. What would be the steps?

Slide 6

Data Preparation
• Often performed by a data engineer but can also be done by the data scientist.
• Cleanse data of null and bad values.
• Data may be aggregated (counts, sums, averages, etc.) and reformatted, or calculated columns may be added.
• Data can come from many sources and formats such as SQL Server, Cosmos DB, flat files, etc.
• Data must be merged from the sources into a consistent and useful dataset.
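
To make this step concrete, here is a minimal pandas sketch for the broom sample case. The file name and column names (broom_sales.csv, sale_date, quantity, store_id) are assumptions for illustration, not from the deck:

```python
import pandas as pd

# Load raw data (the source could equally be SQL Server, Cosmos DB, etc.)
df = pd.read_csv("broom_sales.csv", parse_dates=["sale_date"])

# Cleanse: drop rows with null values and filter out bad values
df = df.dropna(subset=["quantity", "store_id"])
df = df[df["quantity"] > 0]

# Aggregate: monthly units sold per store
monthly = (
    df.groupby(["store_id", df["sale_date"].dt.to_period("M")])["quantity"]
      .sum()
      .reset_index(name="units_sold")
)
print(monthly.head())
```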

Slide 7

Let’s work with a sample case
• Data prep/cleansing on this sample?

Slide 8

Feature Engineering
• The goal is to identify the data needed to support your machine learning model.
• In some cases, raw data can be used with little or no transformation.
• In other cases, you need to create the data you need through transformations or aggregations.
• An example requiring little transformation might be the customer country code.
• An example of a transformation might be to calculate the patient’s age from their birth date and create age bands like ‘< 18’, ’18 – 25’, ‘26 – 35’, ‘36 – 45’, etc.
• An example of aggregation would be to count the number of past readmissions for a patient.
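
A hedged pandas sketch of the age-band transformation described above; the column names (birth_date, admit_date) are hypothetical:

```python
import pandas as pd

patients = pd.DataFrame({
    "birth_date": pd.to_datetime(["2008-05-01", "1990-07-15", "1972-03-30"]),
    "admit_date": pd.to_datetime(["2018-12-01", "2018-12-01", "2018-12-01"]),
})

# Derive age from birth date, then bin into the bands from the slide
patients["age"] = (patients["admit_date"] - patients["birth_date"]).dt.days // 365
patients["age_band"] = pd.cut(
    patients["age"],
    bins=[0, 17, 25, 35, 45, 120],
    labels=["< 18", "18 - 25", "26 - 35", "36 - 45", "> 45"],
)
print(patients[["age", "age_band"]])
```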

Slide 9

Let’s work with a sample case
• What features can you add here?

Slide 10

Model Training: Model Selection
• Select the machine learning models you think will have the best accuracy.
• Different types of models work best for certain types of problems.
• Some models can predict continuous values, like sales.
• Others classify data, such as good email vs. spam.
• Models vary in the complexity of algorithms they support.
• More complex models require more resources to use.
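
To make the distinction concrete, a small scikit-learn sketch: a regressor for continuous targets versus a classifier for labels. The toy data is invented purely for illustration:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Continuous prediction (regression), e.g. monthly broom sales
reg = LinearRegression()
reg.fit([[1], [2], [3]], [10.0, 20.0, 30.0])
print(reg.predict([[4]]))        # ~40.0

# Classification, e.g. 0 = good email, 1 = spam
clf = LogisticRegression()
clf.fit([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1])
print(clf.predict([[0.85]]))     # [1]
```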

Slide 11

Let’s work with a sample case
• Choosing the model in our case?

Slide 12

Model Training: Steps
1. Split the data into training and testing sets.
2. Generate (train) the model.
   o Cross-validation
3. Test the model: use the testing data to make predictions with the model.
4. Score the model, i.e. compare predicted with actual values.
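
A minimal scikit-learn sketch of these four steps, using synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 1. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Train the model, with cross-validation on the training set
model = LinearRegression()
print("CV R^2 scores:", cross_val_score(model, X_train, y_train, cv=5))
model.fit(X_train, y_train)

# 3. Test: predict on the held-out data
y_pred = model.predict(X_test)

# 4. Score: compare predictions with actual values
print("Test MSE:", mean_squared_error(y_test, y_pred))
```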

Slide 13

Model Training: Evaluating Results
Accuracy and precision
A couple of metrics that are useful when evaluating a classification model are accuracy and precision. To understand these terms, it is essential to understand what a confusion matrix is. A confusion matrix includes the following counts:
• True positive: the number of times the model predicts true (or yes) when the actual value is true
• True negative: the number of times the model predicts false (or no) when the actual value is false
• False negative: the number of times the model predicts false when the actual value is true
• False positive: the number of times the model predicts true when the actual value is false

Mean squared error
Mean squared error (MSE) is one of the most popular model evaluation metrics in statistical modeling. It measures how far, on average, your predictions are from the correct y values: MSE = (1/n) Σ (yᵢ − ŷᵢ)².

These are just a couple of the metrics available for model evaluation. The type of model drives the choice of evaluation technique: classification models call for metrics like accuracy and precision, while numerical prediction models call for metrics like MSE.
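
These metrics can be computed directly with scikit-learn; the labels below are illustrative:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             confusion_matrix, mean_squared_error)

# Classification: actual vs. predicted labels (1 = positive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual, columns = predicted: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))    # (TP + TN) / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)

# Numerical prediction: mean squared error
print("MSE:", mean_squared_error([10.0, 20.0], [12.0, 18.0]))  # 4.0
```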

Slide 14

Model Training vs Model Use
• Model training is often resource intensive, using a lot of CPU cycles and memory.
• The end result of model training is a small, lightweight program called a model.
• Using a model usually requires significantly fewer resources than training it.
• The bottom line is that we don’t need the same platform for consuming the model that we used to train it.

Slide 15

Model Deployment
• Putting the trained model somewhere it can be used, such as by a web/mobile app.
• Usually this is done by a developer.
• Deployment requirements are very different from model training requirements:
  o Security
  o Dependencies
  o Availability
• Ideal for cloud-based containers.

Slide 16

AML service within the Data Science Pipeline
• Create a workspace to store your work.
• Use Python or the Azure Portal.
• An experiment within the workspace stores model training information.
• Use the IDE of your choice.
• Use Python libraries or the Azure Data Prep SDK.
• Train models with the Python open source modules of your choice.
• Train locally or on Azure.
• Submit model training to Azure containers.
• Monitor model training.
• Register the final model.
• Deploy the trained model to an Azure target environment:
  o Azure Kubernetes Service (AKS)
  o IoT Edge
  o Field Programmable Gate Array (FPGA)
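
A minimal sketch of the workspace/experiment flow using the azureml-core Python SDK (v1), the SDK this deck describes; the names, IDs, and region below are placeholders:

```python
from azureml.core import Workspace, Experiment

# Create (or attach to) a workspace that stores your work
ws = Workspace.create(
    name="my-workspace",
    subscription_id="<subscription-id>",
    resource_group="my-resource-group",
    location="westeurope",
)

# An experiment groups the runs (training information) in the workspace
exp = Experiment(workspace=ws, name="broom-forecast")
run = exp.start_logging()
run.log("alpha", 0.5)   # metrics logged here show up in the portal
run.complete()
```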

Slide 17

Deployment
• To make a model available for consumption, you deploy it.
• Supported target deployment environments are:
  o Docker image
  o Azure Container Instances
  o Azure Kubernetes Service (AKS)
  o Azure IoT Edge
  o Field Programmable Gate Array (FPGA)
• For the deployment, you need the following files:
  o A scoring script file that tells AMLS how to call the model.
  o An environment file that specifies the package dependencies.
  o A configuration file that requests the required resources for the container.
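
For the scoring script, AMLS expects an init() that loads the model and a run() called per request. A minimal sketch follows (the model name is a placeholder); the environment file is a conda YAML listing the packages, and the configuration file requests CPU/memory for the container:

```python
# score.py
import json
import joblib
from azureml.core.model import Model

def init():
    global model
    # Resolves the path of the registered model inside the container
    model_path = Model.get_model_path("broom-forecast-model")
    model = joblib.load(model_path)

def run(raw_data):
    # Called for each scoring request with a JSON payload
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()
```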

Slide 18

Creating a Machine Learning Experiment
• AMLS Artifacts

Slide 19

The Pipeline
Features:
• Can schedule tasks and execution times.
• Flexibility to allocate compute targets for individual steps and to coordinate multiple pipelines.
• Can reuse pipeline scripts and customize them for different processes, for example in retraining and batch scoring.
• Can record and manage all input, output, intermediate tasks, and data.
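
A sketch of a two-step pipeline with the v1 azureml.pipeline SDK; the script names and compute targets are placeholders:

```python
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

prep_step = PythonScriptStep(
    name="prep", script_name="prep.py",
    compute_target="cpu-cluster", source_directory=".",
)
train_step = PythonScriptStep(
    name="train", script_name="train.py",
    compute_target="gpu-cluster", source_directory=".",
)
# Each step can run on a different compute target
train_step.run_after(prep_step)

pipeline = Pipeline(workspace=ws, steps=[train_step])
run = Experiment(ws, "retraining").submit(pipeline)
```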

Slide 20

Deployment Targets
• Azure Container Instances (ACI)
  o Testing
  o Limited resources (memory and CPU)
• Azure Kubernetes Service (AKS)
  o Real-time inference
  o Production and web services
  o Highly scalable (auto-scale)
  o Supports GPU
• AML Compute
  o Batch inference
  o Normal- and low-priority VMs managed by AMLS
  o Supports GPU
• Azure IoT Edge
• FPGA
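
The deployment configuration differs per target; a short v1 SDK sketch showing ACI's fixed small resources versus AKS auto-scaling:

```python
from azureml.core.webservice import AciWebservice, AksWebservice

# ACI: small, fixed resources -- good for testing
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# AKS: production web services with auto-scaling (and optional GPU nodes)
aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True, autoscale_min_replicas=1, autoscale_max_replicas=4,
)
```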

Slide 21

Summary of options and BEST PRACTICES
• Real-time scoring
  o ACI (Docker) for small loads
  o AKS
  o Web service + GPU = AKS only!
• Batch scoring
  o Use pipelines!
  o Azure ML Compute (preview)
  o GPU + batch scoring = ML Compute with pipelines

Slide 22

Thanks to our Sponsors: