Slide 1

Slide 1 text

OML usage highlight: ML Concepts - Best Practices when using ML Classification Metrics
OML AskTOM Office Hours
Jie Liu, Data Scientist, Oracle Machine Learning
Supported by Marcos Arancibia, Mark Hornick and Sherry LaMonica, Product Management, Oracle Machine Learning
Move the Algorithms; Not the Data!
Copyright © 2021, Oracle and/or its affiliates.
This Session will be Recorded

Slide 2

Slide 2 text

Topics for today
• Upcoming Sessions
• Classification Metrics
• Q&A

Slide 3

Slide 3 text

Upcoming Sessions
• November 9 2021, 08:00 AM Pacific – OML Usage Highlight: Leveraging OML algorithms in Retail Science platform
• November 2 2021, 08:00 AM Pacific – Weekly Office Hours: OML on Autonomous Database - Ask & Learn
• October 5 2021, 08:00 AM Pacific – OML4Py: Using third-party Python packages from Python, SQL and REST
• September 28 2021, 08:00 AM Pacific – Weekly Office Hours: OML on Autonomous Database - Ask & Learn (OML4Py updated Hands-on-Lab)
• September 21 2021, 08:00 AM Pacific – Weekly Office Hours: OML on Autonomous Database - Ask & Learn

Slide 4

Slide 4 text

Classification Metrics
OML Office Hours
Jie Liu, Data Scientist, Oracle Machine Learning
Move the Algorithms; Not the Data!

Slide 5

Slide 5 text

Outline
• Overview of classification metrics
• Discussion of various classification metrics and plots
• Computation of classification metrics using SQL and OML4Py
• Conclusion

Slide 6

Slide 6 text

Motivation
Choosing the right metric after a classification model is built
• A long list of classification metrics + plots
  • Accuracy
  • Confusion matrix: FP, FN, TP, TN
  • Precision, Recall, F1 score + Precision-Recall curve
  • Lift chart + waterfall analysis
  • AUC + ROC curve
• Typical claims: "Achieved AUC 0.8", "Achieved an accuracy of 0.98", "Achieved high precision but low recall"
• Questions:
  • Does a good-looking metric truly reflect reality?
  • The chosen metric looks good, but does it help the business?
  • How do you explain AUC to stakeholders with little knowledge of ML?

Slide 7

Slide 7 text

Accuracy
Starting with accuracy
• Definition: # of correctly predicted rows / # of all rows of data
• Pros
  • Easy to compute and understand
  • Used in multi-class prediction tasks such as image classification
• Cons
  • Does not provide specific information
    • Among the errors, how many of the positive labels are misclassified as negative?
  • Fails to reflect reality when the dataset is imbalanced
    • 999 negative examples, 1 positive example
    • Predicting everything as negative achieves 99.9% accuracy
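The imbalance pitfall in the last bullet can be sketched in a few lines of plain Python (an illustrative sketch with made-up data, not OML code):

```python
def accuracy(y_true, y_pred):
    """Fraction of rows where the prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Imbalanced data from the slide: 999 negatives, 1 positive.
y_true = [0] * 999 + [1]
# A useless model that predicts everything as negative...
y_pred = [0] * 1000
# ...still reaches 99.9% accuracy.
print(accuracy(y_true, y_pred))  # 0.999
```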

Slide 8

Slide 8 text

Probability Score Histogram
Better overview for choosing a threshold
• Most classification algorithms output a probability score
• A threshold is needed for classification decisions
  • Predict a case as positive if p >= 0.4
  • Predict a case as negative if p < 0.4
• Methodology
  • Based on ground truth (calibration) – what is the actual ratio of people buying insurance?
  • Based on cost – how to balance the cost of each type of error
[Figure: histogram of probability scores with the threshold marked]

Slide 9

Slide 9 text

Confusion Matrix
List all scenarios
• True positive (TP) – number of positive cases predicted as positive
• True negative (TN) – number of negative cases predicted as negative
• False positive (FP) – number of negative cases predicted as positive
• False negative (FN) – number of positive cases predicted as negative
• Note: the confusion matrix depends on the choice of threshold
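The four counts above can be computed directly once a threshold turns scores into predictions. A minimal sketch with hypothetical data (labels and scores are made up for illustration):

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Threshold the probability scores first (0.4, as on the previous slide),
# then count each scenario.
scores = [0.9, 0.6, 0.3, 0.2]
y_true = [1, 0, 1, 0]
y_pred = [1 if s >= 0.4 else 0 for s in scores]
print(confusion_matrix(y_true, y_pred))  # (1, 1, 1, 1)
```

Moving the threshold changes `y_pred` and therefore all four counts, which is exactly why the slide notes the matrix depends on the threshold.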

Slide 10

Slide 10 text

Precision, Recall
Definition
• Precision – among the cases predicted as positive, the ratio of actual positive cases
  • Precision = TP/(TP + FP) (41.54% in the slide's example)
• Recall – among the actual positive cases, the ratio of cases predicted as positive
  • Recall = TP/(TP + FN) (12.61% in the slide's example)
• Both depend on the threshold
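Both ratios follow mechanically from the confusion-matrix counts. A sketch with hypothetical counts (not the slide's actual numbers):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: 5 true positives, 3 false positives, 15 false negatives.
p, r = precision_recall(tp=5, fp=3, fn=15)
print(p, r)  # 0.625 0.25
```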

Slide 11

Slide 11 text

Precision, Recall
Trade-off
• Different emphasis based on different business targets
  • Focus on precision – care more about false positives
    • Marketing target audience
    • Video/music/item recommendation
  • Focus on recall – care more about false negatives
    • Detecting fraud
    • Detecting disease infection
• Useful when the data is imbalanced
  • Imbalanced data leads to high accuracy easily, but high precision or recall is not easy to achieve
• Precision – Pr(actual = 1 | predicted = 1) – changes with the target distribution
  • The same model is likely to have lower precision on a test set with fewer positive labels
• Recall – Pr(predicted = 1 | actual = 1) – independent of the target distribution
  • If a test set has fewer positive labels, recall remains roughly the same

Slide 12

Slide 12 text

F1 Score
Combining precision and recall
• The F1 score is defined as 2 / (1/Precision + 1/Recall), the harmonic mean of precision and recall
• Takes values between 0 and 1
• When precision and recall are both high, the F1 score approaches 1
• If either precision or recall is very low, the F1 score becomes very low
• Depends on the chosen threshold
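The harmonic mean's "punish the weaker component" behavior is easy to see numerically (a sketch with made-up precision/recall values):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2 / (1/P + 1/R)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 0.5))            # 0.5 – balanced components
print(round(f1_score(0.9, 0.1), 4))  # 0.18 – dragged down by the low recall
```

Note how 0.9 precision cannot compensate for 0.1 recall: the F1 score stays close to the smaller of the two, unlike a simple average (which would give 0.5).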

Slide 13

Slide 13 text

Precision – Recall Curve
Plot of precision against recall
• Observation
  • Increasing the threshold: precision goes up, recall goes down
  • Decreasing the threshold: precision goes down, recall goes up
• PR curve
  • Plot the curve by increasing the threshold from lowest to highest
  • Does not depend on one particular threshold
[Figure: PR curve, swept from the lowest threshold to the highest threshold]
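The sweep described above can be sketched by trying every distinct score as a threshold (illustrative toy data, not the session's notebook; assumes at least one positive label):

```python
def pr_curve(y_true, scores):
    """Sweep every distinct score as a threshold, lowest to highest,
    and collect one (precision, recall) point per threshold."""
    points = []
    for thr in sorted(set(scores)):
        pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(p and t for p, t in zip(pred, y_true))
        fp = sum(p and not t for p, t in zip(pred, y_true))
        fn = sum(not p and t for p, t in zip(pred, y_true))
        points.append((tp / (tp + fp), tp / (tp + fn)))
    return points

# Toy labels and scores for illustration.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.3, 0.2]
for precision, recall in pr_curve(y_true, scores):
    print(precision, recall)
```

At the lowest threshold everything is predicted positive (recall 1.0, precision at its floor); at the highest threshold only the top-scored case is positive, which is the precision-favoring end of the curve.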

Slide 14

Slide 14 text

Lift Chart
Metric mostly used in marketing
• Select a subset of potential customers as the marketing target to maximize conversions
• Predictive marketing – use machine learning to predict which customers are going to make a purchase, i.e., convert
• Focus on the recall obtained by targeting the top customers ranked by probability
  • Example: target customers with prediction scores in the top 20%
  • Lift-chart recall = # targeted customers who will make the purchase / # total customers who will make the purchase
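The "recall at the top 20%" quantity can be sketched as follows (customer scores and purchase labels are hypothetical):

```python
def recall_at_top(y_true, scores, fraction=0.2):
    """Recall obtained by targeting the top `fraction` of customers
    ranked by predicted probability (one point on the lift chart)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    k = int(len(ranked) * fraction)
    hits = sum(t for _, t in ranked[:k])
    return hits / sum(y_true)

# 10 customers, 4 eventual buyers; a good model ranks buyers near the top.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
bought = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]
print(recall_at_top(bought, scores, 0.2))  # 0.5: top 20% captures 2 of 4 buyers
```

Random targeting of 20% of customers would capture roughly 20% of buyers, so the ratio 0.5 / 0.2 = 2.5 is the "lift" at that depth.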

Slide 15

Slide 15 text

Waterfall Analysis
Metric specific to marketing: use random targeting as a baseline
• Bin customers by probability score, sorted from high to low
• For each customer bin, plot the percentage of customers who actually converted
• Draw a reference line for the average percentage of customers who will purchase
  • Targeting customers at random achieves this dashed line
• Ideally, the bars for the segments with high probability scores should be much higher than the baseline percentage (obtained by targeting random customers)

Slide 16

Slide 16 text

AUC & ROC (Receiver Operating Characteristic) Curve
Most popular metric for classification
• ROC curve
  • Plots the false positive rate (FP/N) against the true positive rate (TP/P) as the classification threshold moves from low to high
• AUC – area under the ROC curve, ranging from 0 to 1
• AUC focuses on ranking performance
  • For a randomly chosen pair of one positive and one negative example, AUC equals the chance that the positive example has the higher score
  • Independent of the classification threshold
  • Independent of the range of the classification score
• Cons – insensitive to class imbalance
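The pairwise-ranking interpretation of AUC can be verified by brute force over all (positive, negative) pairs; this is a didactic sketch on toy data (ties counted as half a win, as is conventional), not how AUC is computed at scale:

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs where the
    positive example gets the higher score; ties count as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: the positive example with score 0.3 loses one pairwise
# comparison (against the negative at 0.6), so AUC = 3/4.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.3, 0.2]
print(auc_pairwise(y_true, scores))  # 0.75
```

Because only the ordering of scores matters, rescaling all scores (say, dividing by 10) leaves the result unchanged, which is the threshold- and range-independence noted above.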

Slide 17

Slide 17 text

Computation of Metrics
Compute using OML4Py and Oracle SQL queries
• Table-wide aggregation is needed for each row
  • Precision – Recall plot: precision and recall using each row's probability score as a threshold
  • Lift chart: recall for all data up to each row's probability score as a threshold
  • ROC: true positive rate and false positive rate computed using each row's probability score as a threshold
• Use Oracle window functions to speed this up
• Go through the notebook
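The window-function trick amounts to a single pass over rows sorted by score, keeping running TP/FP counts instead of re-aggregating the whole table per threshold. A Python sketch of that cumulative idea (toy data; the analogous SQL would use something like `SUM(label) OVER (ORDER BY score DESC)` – shown here only as an analogy, not the session's actual query):

```python
def roc_points(y_true, scores):
    """One pass over rows sorted by score descending, with running
    TP/FP counts -- the cumulative-sum idea behind the window-function
    speedup. Ties are broken by sort order in this simple sketch."""
    P = sum(y_true)              # total positives
    N = len(y_true) - P          # total negatives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, t in sorted(zip(scores, y_true), reverse=True):
        if t == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))  # (FPR, TPR) at this row's score
    return points

for fpr, tpr in roc_points([1, 0, 1, 0], [0.9, 0.6, 0.3, 0.2]):
    print(fpr, tpr)
```

Each row contributes one ROC point in O(n log n) total (the sort dominates), versus O(n²) for recomputing the confusion matrix at every threshold.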

Slide 18

Slide 18 text

Thank you
[email protected]