ML Concepts - Best Practices when using ML Classification Metrics

OML usage highlight: ML Concepts - Best Practices when using ML Classification Metrics
OML AskTOM Office Hours
Jie Liu, Data Scientist, Oracle Machine Learning
Supported by Marcos Arancibia, Mark Hornick and Sherry LaMonica, Product Management, Oracle Machine Learning
Move the Algorithms; Not the Data!
Copyright © 2021, Oracle and/or its affiliates.
This session will be recorded.

Topics for today
• Upcoming Sessions
• Classification Metrics
• Q&A

Upcoming Sessions
• November 9 2021 08:00 AM Pacific
  • OML Usage Highlight: Leveraging OML algorithms in Retail Science platform
• November 2 2021 08:00 AM Pacific
  • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
• October 5 2021 08:00 AM Pacific
  • OML4Py: Using third-party Python packages from Python, SQL and REST
• September 28 2021 08:00 AM Pacific
  • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
  • (OML4Py updated Hands-on-Lab)
• September 21 2021 08:00 AM Pacific
  • Weekly Office Hours: OML on Autonomous Database - Ask & Learn

Classification Metrics
OML Office Hours
Jie Liu, Data Scientist, Oracle Machine Learning

Outline
• Overview of classification metrics
• Discussion of various classification metrics and plots
• Computation of classification metrics using SQL and OML4Py
• Conclusion

Motivation: choose the right metric after a classification model is built
• A long list of classification metrics and plots
  • Accuracy
  • Confusion matrix: FP, FN, TP, TN
  • Precision, recall, F1 score, and the precision-recall curve
  • Lift chart and waterfall analysis
  • AUC and ROC curve
• Example results
  • Achieved an AUC of 0.8
  • Achieved an accuracy of 0.98
  • Achieved high precision but low recall
• Questions
  • Does a good-looking metric truly reflect reality?
  • The chosen metric looks good, but does it help the business?
  • How do you explain AUC to a stakeholder with little knowledge of ML?

Accuracy: starting with accuracy
• Definition: # of correctly predicted rows / # of all rows of data
• Pros
  • Easy to compute and understand
  • Used in multi-class prediction tasks such as image classification
• Cons
  • Does not say which kinds of errors were made
    • Among the errors, how many positive labels were misclassified as negative?
  • Does not reflect reality when the dataset is imbalanced
    • Example: 999 negative examples, 1 positive example
    • Predicting everything as negative achieves 99.9% accuracy
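
The imbalanced example above can be reproduced in a few lines. This is a minimal sketch in plain Python; the helper name `accuracy` and the all-negative "model" are illustrative assumptions, not part of the session's notebook.

```python
# Data mirroring the slide's example: 999 negatives (0) and 1 positive (1).
labels = [0] * 999 + [1]
# A trivial "model" that predicts every case as negative.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label equals the actual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


acc = accuracy(labels, predictions)
# Number of actual positives the model managed to identify.
positives_found = sum(1 for t, p in zip(labels, predictions) if t == 1 and p == 1)
print(f"accuracy = {acc:.3f}, positives found = {positives_found}")
```

The accuracy is very high even though the single positive case is never detected, which is why accuracy alone is not enough on imbalanced data.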

Threshold: better overview for choosing a threshold
• Most classification algorithms output a probability score
• A threshold is needed to turn scores into classification decisions, for example:
  • Predict a case as positive if p >= 0.4
  • Predict a case as negative if p < 0.4
• Methodology
  • Based on ground truth (calibration): what is the actual ratio of people buying insurance?
  • Based on cost: how to balance the cost of each type of error
[Figure: probability score histogram]
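
A minimal sketch of both ideas, assuming made-up scores, labels, candidate thresholds, and error costs (none of these values come from the session): `classify` applies the p >= threshold rule, and `total_cost` weighs the two error types so that a threshold can be chosen by cost.

```python
def classify(scores, threshold):
    """Predict positive (1) when the score is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in scores]


def total_cost(labels, scores, threshold, cost_fp, cost_fn):
    """Weighted cost of the false positives and false negatives at a threshold."""
    preds = classify(scores, threshold)
    fp = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 0)
    return cost_fp * fp + cost_fn * fn


# Hypothetical scores and ground-truth labels.
scores = [0.05, 0.39, 0.40, 0.72]
labels = [0, 1, 0, 1]

predicted = classify(scores, 0.4)

# Assumed costs: missing a positive is 5x as expensive as a false alarm.
candidates = [0.2, 0.4, 0.6]
best = min(candidates, key=lambda t: total_cost(labels, scores, t, 1.0, 5.0))
print(predicted, best)
```

With false negatives priced higher than false positives, the cheapest candidate is the lowest threshold; reversing the costs would push the choice the other way.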

Confusion Matrix: list all scenarios
• True positive (TP): number of positive cases predicted as positive
• True negative (TN): number of negative cases predicted as negative
• False positive (FP): number of negative cases predicted as positive
• False negative (FN): number of positive cases predicted as negative
• Note
  • The confusion matrix depends on the choice of threshold
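
The four counts can be computed directly from labels and scores. The sketch below uses hypothetical data and a hypothetical helper, `confusion_counts`, and evaluates two thresholds to show how the matrix changes with the threshold.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is predicted positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["FN" if actual == 1 else "TN"] += 1
    return counts


# Hypothetical ground truth (1 = positive) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

at_050 = confusion_counts(labels, scores, 0.5)
at_025 = confusion_counts(labels, scores, 0.25)
print(at_050)
print(at_025)
```

Lowering the threshold from 0.5 to 0.25 removes the false negative but adds a false positive.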

Precision, Recall: definition
• Precision
  • Among the cases predicted as positive, the fraction that are actually positive
  • Precision = TP / (TP + FP) (41.54% in the example shown on the slide)
• Recall
  • Among the actual positive cases, the fraction that are predicted as positive
  • Recall = TP / (TP + FN) (12.61% in the example shown on the slide)
• Both depend on the threshold
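
The two formulas translate directly into code. The counts below are hypothetical (they are not the counts behind the slide's percentages), and returning 0.0 for an empty denominator is an assumed convention.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 by convention when nothing is predicted positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 by convention when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical confusion-matrix counts.
tp, fp, fn = 30, 20, 70

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.2%}, recall = {r:.2%}")
```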

Precision, Recall: trade-off
• Different emphasis for different business targets
• Focus on precision
  • Care more about false positives
  • Marketing target audience
  • Video/music/item recommendation
• Focus on recall
  • Care more about false negatives
  • Fraud detection
  • Disease infection detection
• Useful when the data is imbalanced
  • Imbalanced data easily leads to high accuracy
  • High precision or recall is not easy to achieve
• Precision = Pr(actual = 1 | predicted = 1): changes with the target distribution
  • The same model is likely to have lower precision on a test set with fewer positive labels
• Recall = Pr(predicted = 1 | actual = 1): independent of the target distribution
  • If a test set has fewer positive labels, recall remains roughly the same
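
The last two points can be checked with a small calculation. The sketch assumes a model whose true positive rate and false positive rate stay fixed (0.8 and 0.1 here, both made-up values) and only changes how many positives the test set contains.

```python
def expected_precision_recall(n_pos, n_neg, tpr, fpr):
    """Expected precision and recall for a model with fixed TPR and FPR."""
    tp = tpr * n_pos   # positives correctly flagged
    fn = n_pos - tp    # positives missed
    fp = fpr * n_neg   # negatives wrongly flagged
    return tp / (tp + fp), tp / (tp + fn)


# Same hypothetical model behaviour, two different target distributions.
p_balanced, r_balanced = expected_precision_recall(100, 100, tpr=0.8, fpr=0.1)
p_rare, r_rare = expected_precision_recall(10, 100, tpr=0.8, fpr=0.1)

print(f"balanced: precision = {p_balanced:.3f}, recall = {r_balanced:.3f}")
print(f"rare:     precision = {p_rare:.3f}, recall = {r_rare:.3f}")
```

Precision drops when positives become rarer, while recall is unchanged because it is computed only over the actual positives.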

F1 Score: combining precision and recall
• The F1 score is defined as 2 / (1/Precision + 1/Recall), the harmonic mean of precision and recall
• Its value lies between 0 and 1
• When precision and recall are both high, the F1 score approaches 1
• If either precision or recall is very low, the F1 score becomes very low
• Depends on the chosen threshold
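
A short sketch of the definition, with made-up precision and recall values. Returning 0.0 when either input is zero is an assumed convention that avoids division by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2 / (1/P + 1/R)."""
    if precision == 0 or recall == 0:
        return 0.0  # assumed convention when the harmonic mean is undefined
    return 2 / (1 / precision + 1 / recall)


both_high = f1_score(0.9, 0.9)  # precision and recall both high
one_low = f1_score(0.9, 0.1)    # high precision cannot offset low recall
print(f"both high: {both_high:.2f}, one low: {one_low:.2f}")
```

The arithmetic mean of 0.9 and 0.1 would be 0.5, but the harmonic mean is pulled toward the smaller value, which is the behaviour the slide describes.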

Precision-Recall Curve: plot of precision against recall
• Observation
  • Increasing the threshold
    • Precision generally goes up
    • Recall goes down
  • Decreasing the threshold
    • Precision generally goes down
    • Recall goes up
• PR curve
  • Plot the curve by increasing the threshold from lowest to highest
  • Does not depend on one particular threshold
[Figure: precision-recall curve, annotated with "Highest Threshold" and "Lowest Threshold"]
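
The points of a PR curve can be generated by sweeping the threshold over the observed scores. This is a simple quadratic-time sketch on hypothetical data; `pr_curve` is an illustrative helper and assumes at least one positive label.

```python
def pr_curve(labels, scores):
    """(threshold, precision, recall) for each distinct score, lowest threshold first."""
    total_pos = sum(labels)
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        # tp + fp >= 1 because the threshold is itself one of the scores.
        points.append((threshold, tp / (tp + fp), tp / total_pos))
    return points


# Hypothetical ground truth (1 = positive) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

points = pr_curve(labels, scores)
for threshold, p, r in points:
    print(f"threshold = {threshold:.2f}  precision = {p:.2f}  recall = {r:.2f}")
```

Recall never increases as the threshold rises. Precision tends to rise but can dip locally, for example when the row just below a threshold is a positive.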

Lift Chart: metric mostly used in marketing
• Select a subset of potential customers as the marketing target to maximize conversions
• Predictive marketing: use machine learning to predict which customers are going to make a purchase, i.e. convert
• Focus on the recall obtained by targeting the top customers ranked by probability
  • Example: target customers with prediction scores in the top 20%
  • Recall = # targeted customers who will make the purchase / # total customers who will make the purchase
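
The quantity described above, recall within the top-ranked fraction of customers, can be sketched as follows. The ten customers and the helper `recall_at_top` are hypothetical, and the data is assumed to contain at least one purchaser.

```python
def recall_at_top(labels, scores, fraction):
    """Share of all actual purchasers captured in the top `fraction` by score."""
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    n_top = max(1, round(fraction * len(ranked)))
    captured = sum(label for _, label in ranked[:n_top])
    return captured / sum(labels)


# Hypothetical customers: 1 = will purchase, 0 = will not.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

recall_top20 = recall_at_top(labels, scores, 0.2)
# Random targeting of 20% would capture about 20% of purchasers on average,
# so the ratio below expresses the gain over random targeting.
gain_over_random = recall_top20 / 0.2
print(f"recall in top 20% = {recall_top20:.3f}, vs random = {gain_over_random:.2f}x")
```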

Waterfall Analysis: metric specific to marketing, using random targeting as a baseline
• Bin customers by probability score, sorted from high to low
• For each customer bin, plot the percentage of customers who actually converted
• Draw a reference (dashed) line at the average percentage of customers who will purchase
  • Targeting customers at random achieves the dashed line
• Ideally, the bars for the bins with high probability scores should be much higher than the overall conversion percentage (the rate obtained by targeting random customers)
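
The binning steps above can be sketched in plain Python. The data and the helper `waterfall` are hypothetical; equal-sized bins are an assumption, and the number of customers is assumed to be at least the number of bins.

```python
def waterfall(labels, scores, n_bins):
    """Conversion rate per score-ranked bin, plus the overall baseline rate."""
    # Labels ordered by descending score.
    ranked = [label for _, label in sorted(zip(scores, labels), reverse=True)]
    n = len(ranked)
    rates = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        rates.append(sum(chunk) / len(chunk))
    baseline = sum(ranked) / n  # what random targeting achieves on average
    return rates, baseline


# Hypothetical customers: 1 = converted, 0 = did not convert.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

rates, baseline = waterfall(labels, scores, n_bins=5)
print(rates, baseline)
```

Here the two highest-scoring bins sit well above the baseline and the remaining bins sit below it, which is the pattern a useful model should produce.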

AUC & ROC (Receiver Operating Characteristic) Curve: most popular metric for classification
• ROC curve
  • Plot the true positive rate (TP/P) against the false positive rate (FP/N) as the classification threshold goes from low to high
• AUC: area under the ROC curve, ranging from 0 to 1
• AUC focuses on ranking performance
  • For a randomly chosen pair of one positive example and one negative example, AUC equals the probability that the positive example has a higher score than the negative example
  • Independent of the classification threshold
  • Independent of the range of the classification score
• Cons: insensitive to imbalance in the dataset
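
The ranking interpretation gives a direct, if slow, way to compute AUC: compare every positive with every negative. The sketch uses hypothetical data; counting ties as half a win is a common convention assumed here.

```python
def auc_pairwise(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = positive) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

auc = auc_pairwise(labels, scores)
# Rescaling the scores keeps their order, so the AUC does not change.
auc_rescaled = auc_pairwise(labels, [100 * s for s in scores])
print(auc, auc_rescaled)
```

Because only the ordering of scores matters, neither a threshold nor the score range enters the calculation.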

Computation of metrics: compute using OML4Py and Oracle SQL queries
• Each row needs an aggregation over the whole table
  • Precision-recall plot
    • Precision and recall, using each row's probability score as the threshold
  • Lift chart
    • Recall over all data up to each row's probability score as the threshold
  • ROC
    • True positive rate and false positive rate, using each row's probability score as the threshold
• Use Oracle window functions to speed this up
• Go through the notebook
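
The session's notebook is not reproduced in this transcript. As a rough plain-Python analogue of the window-function idea (not the OML4Py or SQL code from the session), the sketch below sorts rows by score once and uses running sums so that every row's threshold is evaluated in a single pass. The data and helper names are hypothetical, and distinct scores with both classes present are assumed.

```python
from itertools import accumulate


def per_row_metrics(labels, scores):
    """Precision, recall (TPR) and FPR using each row's score as the threshold."""
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    ys = [y for _, y in ranked]
    cum_tp = list(accumulate(ys))                 # running count of positives
    cum_fp = list(accumulate(1 - y for y in ys))  # running count of negatives
    total_pos, total_neg = cum_tp[-1], cum_fp[-1]
    rows = []
    for i, (score, _) in enumerate(ranked):
        rows.append({
            "threshold": score,
            "precision": cum_tp[i] / (i + 1),
            "recall": cum_tp[i] / total_pos,  # also the TPR and the lift-chart value
            "fpr": cum_fp[i] / total_neg,
        })
    return rows


def auc_from_rows(rows):
    """Trapezoidal area under the ROC points, starting from (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for r in rows:
        area += (r["fpr"] - prev_fpr) * (r["recall"] + prev_tpr) / 2
        prev_fpr, prev_tpr = r["fpr"], r["recall"]
    return area


# Hypothetical ground truth (1 = positive) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

rows = per_row_metrics(labels, scores)
roc_auc = auc_from_rows(rows)
print(rows[0], roc_auc)
```

The running sums play the role that a cumulative window aggregate ordered by score would play in SQL: one sort, then one pass, instead of re-scanning the table for every threshold.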

Thank you
jie.jl.liu@oracle.com