Slide 1

Oracle Machine Learning Office Hours
Machine Learning 102 – Classification
With Marcos Arancibia, Product Manager, Data Science and Big Data, @MarcosArancibia
and Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning, @MarkHornick
oracle.com/machine-learning
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 2

Today's Agenda
• Questions
• Upcoming session
• Speaker: Marcos Arancibia – Machine Learning 102
• Q&A
Copyright © 2020 Oracle and/or its affiliates.

Slide 3

Web Questions
• "The previous session was awesome and I really loved it. I hope that similar sessions will be organized on different ML functions."
• "In the current session, I would like you to provide details about the Cost Matrix and how to calculate and assign costs. I have gone through all the documentation, but it does not give a clear explanation of how costs should be assigned. I also request details on model evaluation with respect to the cost matrix."
Copyright © 2020 Oracle and/or its affiliates.

Slide 4

Next Session
June 25, 2020: Oracle Machine Learning Office Hours, 9AM US Pacific
Machine Learning 101 – Regression
Have you always been curious about what machine learning can do for your business problem, but could never find the time to learn the necessary practical skills? Do you wish to learn what Classification, Regression, Clustering, and Feature Extraction techniques do, and how to apply them using the Oracle Machine Learning family of products?
Join us for this special series, "Oracle Machine Learning Office Hours – Machine Learning 101", where we will go through the main steps of solving a business problem from beginning to end, using the different components available in Oracle Machine Learning: programming languages and interfaces, including Notebooks with SQL, UI, and languages like R and Python.
This second session in the series will cover Regression, where we will learn how to set up a data set for regression modeling, build machine learning models that predict numeric values such as home prices, and evaluate model quality.
Marcos Arancibia, OML Product Management
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 5

For product info… https://www.oracle.com/machine-learning Copyright © 2020 Oracle and/or its affiliates.

Slide 6

https://www.oracle.com/cloud/free/
Copyright © 2020 Oracle and/or its affiliates.

Slide 7

Today's Session: Machine Learning 102 – Classification
In this "ML Classification 102" session we will pick up where we left off in our 101 session and go deeper into our discussion of ML algorithms and the importance of Feature Selection, and we will further explore the correct way to evaluate models using the Confusion Matrix and the many statistics that can be computed from it.
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 8

Agenda
• What is machine learning?
• What is classification?
• Business problems addressed by classification
• Types of data needed for classification
• Terminology
• Data preparation
• Model evaluation
• AutoML
• Q&A
• Further details on model evaluation
Copyright © 2020 Oracle and/or its affiliates

Slide 9

Review – Model Evaluation: Confusion Matrix
How can we determine if a Model is any good? After scoring new (Test or Validation) data, we compare what the Model predicted was going to happen vs. the Actual Target responses found in the test data.

                 Actual 1   Actual 0
Predicted 1         20         12
Predicted 0         10         50

Precision takes into account only the True Positives among all the cases the Model predicted as positive:
Precision = 20 / (20 + 12) = 62.5%
Accuracy takes into account the Positives but also the Negatives, which is key in many use cases:
Accuracy = (20 + 50) / (20 + 12 + 10 + 50) = 76.1%
Copyright © 2020, Oracle and/or its affiliates
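A minimal sketch in plain Python (not from the original deck) reproducing the two numbers above from the confusion-matrix counts:

```python
# Confusion-matrix counts from the slide
tp, fp = 20, 12  # predicted 1: actually 1, actually 0
fn, tn = 10, 50  # predicted 0: actually 1, actually 0

precision = tp / (tp + fp)                  # 20 / 32
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 70 / 92

print(f"Precision = {precision:.1%}")  # 62.5%
print(f"Accuracy  = {accuracy:.1%}")   # 76.1%
```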

Slide 10

Model Evaluation: Confusion Matrix
There are many more measures of Model quality available; several can be computed easily, and several are available directly in Oracle Machine Learning. See the overview table in the Wikipedia article on the Confusion Matrix.
Copyright © 2020, Oracle and/or its affiliates
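As an illustration (standard textbook definitions, not content from the slide), a few of those additional statistics computed from the same counts:

```python
tp, fp, fn, tn = 20, 12, 10, 50  # counts from the previous slide

recall = tp / (tp + fn)       # sensitivity, true positive rate
specificity = tn / (tn + fp)  # true negative rate
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Recall      = {recall:.1%}")       # 66.7%
print(f"Specificity = {specificity:.1%}")  # 80.6%
print(f"F1 score    = {f1:.3f}")           # 0.645
```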

Slide 11

Lift/Gains Chart: Cumulative positive cases by Model, by Percentile
In Oracle Data Miner (SQL Developer desktop App):
• Interactive comparison of multiple Cutoff Points
• Comparison of all Models
• In practice, the interpretation is: "How many of the total actual positive cases would I have captured if I had chosen to contact only the top X% of the customers, sorted in descending order by Probability?"
Copyright © 2020, Oracle and/or its affiliates

Slide 12

Lift/Gains Chart: Cumulative lift by Model, by Percentile
In Oracle Data Miner (SQL Developer desktop App):
• Interactive comparison of multiple Cutoff Points
• Comparison of all Models
• In practice, the interpretation is: "How much better than a Random Choice would my model be if I had chosen to contact only the top X% of the customers, sorted in descending order by Probability?"
• Notice that at the rightmost point all models converge to a lift of 1, because if you contact everyone you will have reached all positive responders.
Copyright © 2020, Oracle and/or its affiliates

Slide 13

Lift/Gains Chart: Cumulative lift by Model, by Percentile
In Oracle Data Miner (SQL Developer desktop App):
• Interactive comparison of multiple Cutoff Points
• Comparison of all Models
• In practice, the interpretation is: "How much money would I actually win or lose depending on the cutoff point, given that I have a base cost for contacting people (the cost of a lead), incremental revenue when a customer actually accepts an offer, and an incremental cost when the customer accepts an offer (processing, welcome kits, etc.)?" A sketch of this calculation follows below.
• We can also add limits on budget and on the number of people we can contact (Call Center limitations, for example)
Copyright © 2020, Oracle and/or its affiliates
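A hypothetical sketch of that profit calculation; the data, variable names, and dollar amounts are illustrative assumptions, not values from the session. Customers are sorted descending by model probability, and we compute the cumulative profit at every possible cutoff:

```python
import numpy as np

# Illustrative assumptions: 1,000 scored customers, sorted descending by
# the model's probability; `responded` flags the actual positive responders.
rng = np.random.default_rng(0)
prob = np.sort(rng.random(1000))[::-1]
responded = rng.random(1000) < prob  # synthetic actual outcomes

cost_per_lead = 2.0        # base cost of contacting anyone
revenue_per_accept = 50.0  # incremental revenue when an offer is accepted
cost_per_accept = 8.0      # processing, welcome kits, etc.

# Cumulative profit if we contact only the top-k customers, for every k
n_contacted = np.arange(1, len(prob) + 1)
accepts = np.cumsum(responded)
profit = accepts * (revenue_per_accept - cost_per_accept) - n_contacted * cost_per_lead

best_k = int(np.argmax(profit)) + 1
print(f"Best cutoff: contact the top {best_k} customers; "
      f"expected profit = ${profit[best_k - 1]:,.2f}")
```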

Slide 14

Lift/Gains Chart: Decile statistics of the proportion of Detected Targets by Model
1. Sort the data descending on the PROBABILITY of being the TARGET, as decided by the model.
2. Divide the data into percentiles or deciles (slices of data with the same number of observations). For deciles this means 10 slices; for percentiles it means 100.
3. Evaluate, in each slice, how many correct TARGETS the Model is able to identify.
4. To simulate the IDEAL model, we separately sort the data descending on the TARGET itself and compute the proportion of TARGETS in each slice.
5. The IDEAL model would score 100% on all initial slices until it ran out of TARGET records.
6. The overall average target proportion can be used as the Random Guess comparison (see the sketch below).
Copyright © 2020, Oracle and/or its affiliates
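A sketch of those steps in Python (synthetic data and assumed column names, not from the demo), including the per-decile lift versus a random choice and the cumulative gains used on the previous two slides:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
prob = rng.random(1000)
df = pd.DataFrame({
    "probability": prob,                              # model score
    "target": (rng.random(1000) < prob).astype(int),  # actual outcome
})

# Step 1: sort descending on the probability assigned by the model
df = df.sort_values("probability", ascending=False).reset_index(drop=True)

# Step 2: divide into 10 slices with the same number of observations
df["decile"] = pd.qcut(df.index, 10, labels=range(1, 11))

# Step 3: proportion of actual targets identified in each slice
per_decile = df.groupby("decile", observed=True)["target"].mean()

# Step 6: overall target proportion = the random-guess baseline
baseline = df["target"].mean()
print(per_decile / baseline)  # per-decile lift vs. a random choice

# Cumulative gains: share of all positives captured through each decile
gains = (df.groupby("decile", observed=True)["target"].sum().cumsum()
         / df["target"].sum())
print(gains)
```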

Slide 15

Density Chart of Predictions: Histogram of TARGET='1' vs. TARGET='0'
1. The distribution of "PROBABILITY_OF_1" for each TARGET level is compared for the Model.
2. Models usually have a cutoff at 0.5 (or 50%) PROBABILITY for assigning a PREDICTION of "0" or "1": if PROBABILITY_OF_1 > 0.5, the decision by the model is PREDICTION = '1'; otherwise PREDICTION = '0' (see the sketch below).
3. When looking at the Density Chart, one should expect that the better the model, the larger the separation it will be able to detect between the '1' and '0' distributions.
Copyright © 2020, Oracle and/or its affiliates
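A minimal sketch of that default cutoff rule (the array values are made up for illustration):

```python
import numpy as np

# Assumed model output: probability of class '1' for five scored rows
probability_of_1 = np.array([0.91, 0.42, 0.67, 0.08, 0.55])

# Default decision rule: predict '1' when the probability exceeds 0.5
prediction = np.where(probability_of_1 > 0.5, "1", "0")
print(prediction)  # ['1' '0' '1' '0' '1']
```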

Slide 16

ROC (Receiver Operating Characteristic) Curve: True Positive Rate (Sensitivity) vs. False Positive Rate (1 – Specificity)
1. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
2. The ROC curve is thus the sensitivity (recall) as a function of the fall-out.
3. ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones, independently of (and prior to specifying) the cost context or the class distribution.
4. The Area Under the Curve (AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').
5. If prediction models are well calibrated and unbiased, the Gini inequality coefficient can be derived from the AUC: Gini = AUC × 2 − 1.
Copyright © 2020, Oracle and/or its affiliates
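As an illustration, a sketch computing the ROC curve, AUC, and the derived Gini coefficient with scikit-learn (the session demo uses OML4Py; the labels and scores here are made up):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Assumed inputs: actual labels and the model's probability of class 1
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_prob = np.array([0.9, 0.4, 0.8, 0.35, 0.2, 0.6, 0.7, 0.1, 0.5, 0.85])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # one point per threshold
auc = roc_auc_score(y_true, y_prob)
gini = auc * 2 - 1  # as on the slide

print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")
```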

Slide 17

ROC (Receiver Operating Characteristic) Curve: True Positive Rate (Sensitivity) vs. False Positive Rate (1 – Specificity)
A few statistics were added inside the ROC Curve Chart to make the models easier to compare.
Copyright © 2020, Oracle and/or its affiliates

Slide 18

Relationship between the Density Chart, the ROC Curve, and the Confusion Matrix
ROC on Wikipedia: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Copyright © 2020, Oracle and/or its affiliates

Slide 19

Demo on OML4Py: Advanced Model Evaluation and Comparison, Predictions and Statistics
Copyright © 2020, Oracle and/or its affiliates

Slide 20

For more information… oracle.com/machine-learning Copyright © 2020 Oracle and/or its affiliates.

Slide 21

Q & A
Copyright © 2020, Oracle and/or its affiliates

Slide 22

Thank You
Marcos Arancibia | [email protected]
Mark Hornick | [email protected]
Oracle Machine Learning Product Management