ML Concepts - Best Practices when using ML Classification Metrics

In this weekly Office Hours session for Oracle Machine Learning on Autonomous Database, Jie Liu, Data Scientist for Oracle Machine Learning, covered best practices for using ML classification metrics and showed, in a live demo, a variety of ways to use them with Oracle Machine Learning for Python (OML4Py).

The Oracle Machine Learning product family helps data scientists, analysts, developers, and IT achieve data science project goals faster while taking full advantage of the Oracle platform.

Oracle Machine Learning Notebooks offers an easy-to-use, interactive, multi-user, collaborative interface based on Apache Zeppelin notebook technology, and supports SQL, PL/SQL, Python and Markdown interpreters. It is available on all Autonomous Database versions and tiers, including the always-free editions.

OML includes AutoML, which provides automated machine learning algorithm features for algorithm selection, feature selection and model tuning, in addition to a specialized AutoML UI exclusive to the Autonomous Database.

OML Services is also included in Autonomous Database, where you can deploy and manage native in-database OML models as well as ONNX ML models (for classification and regression) built using third-party engines, and can also invoke cognitive text analytics.

Marcos Arancibia

September 14, 2021

Transcript

  1. OML usage highlight: ML Concepts - Best Practices when using ML Classification Metrics

    OML AskTOM Office Hours
    Jie Liu, Data Scientist, Oracle Machine Learning
    Supported by Marcos Arancibia, Mark Hornick and Sherry LaMonica, Product Management, Oracle Machine Learning
    Move the Algorithms; Not the Data!
    Copyright © 2021, Oracle and/or its affiliates.
    This Session will be Recorded
  2. Topics for today

    • Upcoming Sessions
    • Classification Metrics
    • Q&A
  3. Upcoming Sessions

    • November 9 2021 08:00 AM Pacific
      • OML Usage Highlight: Leveraging OML algorithms in Retail Science platform
    • November 2 2021 08:00 AM Pacific
      • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
    • October 5 2021 08:00 AM Pacific
      • OML4Py: Using third-party Python packages from Python, SQL and REST
    • September 28 2021 08:00 AM Pacific
      • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
      • (OML4Py updated Hands-on-Lab)
    • September 21 2021 08:00 AM Pacific
      • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
  4. Classification Metrics

    OML Office Hours
    Jie Liu, Data Scientist, Oracle Machine Learning
    Move the Algorithms; Not the Data!
  5. Outline

    • Overview of classification metrics
    • Discussion of various classification metrics and plots
    • Computation of classification metrics using SQL and OML4Py
    • Conclusion
  6. Motivation: Choose the right metric after a classification model is built

    • A long list of classification metrics and plots
      • Accuracy
      • Confusion matrix, FP, FN, TP, TN
      • Precision, Recall, F1 score + Precision-Recall curve
      • Lift chart + waterfall analysis
      • AUC + ROC curve
    • Example results
      • Achieved AUC 0.8
      • Achieved an accuracy of 0.98
      • Achieved high precision but low recall
    • Questions
      • Does a good-looking metric truly reflect reality?
      • The chosen metric looks good, but does it help the business?
      • How to explain AUC to a stakeholder with little knowledge of ML?
  7. Accuracy: Starting with accuracy

    • Definition: # of correctly predicted rows / # of all rows of data
    • Pros
      • Easy to compute and understand
      • Used in multi-class prediction tasks, e.g. image classification
    • Cons
      • Does not provide specific information
        • Among the errors, how many of the positive labels are misclassified as negative?
      • Unable to reflect reality when the dataset is unbalanced
        • 999 negative examples, 1 positive example
        • Predict everything as negative: 99.9% accuracy achieved
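
The unbalanced-data example on this slide can be reproduced in a few lines. This is a minimal sketch in plain Python (not the OML4Py code from the session); the helper name `accuracy` is made up for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of rows whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# 999 negative examples and 1 positive example, as on the slide.
actual = [0] * 999 + [1]
# A trivial "model" that predicts everything as negative.
predicted = [0] * 1000

acc = accuracy(actual, predicted)
print(f"accuracy = {acc:.3f}")  # accuracy = 0.999
```

The model never finds the single positive case, yet the accuracy is 99.9%.
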
  8. Threshold: Better overview for choosing threshold

    • Most classification algorithms output a probability score
    • Need to choose a threshold for classification decisions
      • Predict a case as positive if p >= 0.4
      • Predict a case as negative if p < 0.4
    • Methodology
      • Based on ground truth (calibration): what is the actual ratio of people buying insurance
      • Based on cost: how to balance the cost for each type of error
    [Figure: Probability Score Histogram]
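
A small sketch of both ideas on this slide: turning probability scores into labels with a threshold, and picking a threshold by cost. The data, the candidate thresholds and the per-error costs are hypothetical values chosen for illustration, and `classify` and `total_cost` are made-up helper names.

```python
def classify(scores, threshold):
    """Label a case positive (1) when its probability score is >= threshold."""
    return [1 if p >= threshold else 0 for p in scores]


def total_cost(actual, scores, threshold, fp_cost, fn_cost):
    """Sum the cost of false positives and false negatives at one threshold."""
    cost = 0
    for a, pred in zip(actual, classify(scores, threshold)):
        if pred == 1 and a == 0:
            cost += fp_cost  # negative case flagged as positive
        elif pred == 0 and a == 1:
            cost += fn_cost  # positive case missed
    return cost


# Hypothetical labels and scores, for illustration only.
actual = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.45, 0.60, 0.90]

labels = classify(scores, 0.4)
print(labels)

# Assume a missed positive costs five times as much as a false alarm.
candidates = [0.2, 0.4, 0.5]
best = min(candidates, key=lambda t: total_cost(actual, scores, t, fp_cost=1, fn_cost=5))
print(best)
```

With these assumed costs the lowest candidate threshold wins, because missing a positive case is much more expensive than a false alarm.
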
  9. Confusion Matrix: List all scenarios

    • True positive: number of positive cases that are predicted as positive
    • True negative: number of negative cases that are predicted as negative
    • False positive: number of negative cases that are predicted as positive
    • False negative: number of positive cases that are predicted as negative
    • Note
      • The confusion matrix depends on the choice of threshold
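
The four counts can be tallied directly from labels and scores. The sketch below uses hypothetical data and a made-up helper `confusion_matrix`; it also shows that the counts change when the threshold changes.

```python
def confusion_matrix(actual, scores, threshold):
    """Count TP, FP, FN and TN for binary labels at a given score threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, scores):
        predicted = 1 if p >= threshold else 0
        if a == 1 and predicted == 1:
            counts["TP"] += 1
        elif a == 0 and predicted == 1:
            counts["FP"] += 1
        elif a == 1 and predicted == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical labels and probability scores, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.45, 0.5]

cm_low = confusion_matrix(actual, scores, 0.4)
cm_high = confusion_matrix(actual, scores, 0.65)
print(cm_low)   # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
print(cm_high)  # {'TP': 2, 'FP': 0, 'FN': 2, 'TN': 4}
```
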
  10. Precision, Recall: Definition

    • Precision
      • Among the cases predicted as positive, the ratio of actual positive cases
      • Precision = TP / (TP + FP) (41.54% in the example shown)
    • Recall
      • Among the actual positive cases, the ratio of cases which are predicted as positive
      • Recall = TP / (TP + FN) (12.61% in the example shown)
    • Both depend on the threshold
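
The two formulas translate directly into code. The counts below are hypothetical (they are not the counts behind the 41.54% and 12.61% shown on the slide, which the transcript does not give), and returning 0.0 for an empty denominator is a convention assumed here.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn = 3, 2, 1

prec = precision(tp, fp)
rec = recall(tp, fn)
print(f"precision = {prec:.2f}, recall = {rec:.2f}")  # precision = 0.60, recall = 0.75
```
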
  11. Precision, Recall: Trade off

    • Different emphasis based on different business targets
      • Focus on precision
        • Care more about false positives
        • Marketing target audience
        • Video/music/item recommendation
      • Focus on recall
        • Care more about false negatives
        • Detect fraud
        • Detect disease infection
    • Useful when the data is imbalanced
      • Imbalanced data leads to high accuracy easily
      • Not easy to achieve high precision or recall
    • Precision, Pr(actual = 1 | predicted = 1): changes with the target distribution
      • The same model is likely to have lower precision on a test set with fewer positive labels
    • Recall, Pr(predicted = 1 | actual = 1): independent of the target distribution
      • If a test set has fewer positive labels, recall remains roughly the same
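
The last two points can be checked with simple arithmetic. The sketch below assumes a model with a fixed true positive rate and false positive rate (both values are hypothetical) and applies it to two test sets of the same size but with different shares of positive labels.

```python
def expected_precision_recall(n_pos, n_neg, tpr, fpr):
    """Expected precision and recall for a model with fixed TPR and FPR."""
    tp = tpr * n_pos   # positives correctly flagged
    fn = n_pos - tp    # positives missed
    fp = fpr * n_neg   # negatives wrongly flagged
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical model behaviour: catches 80% of positives, flags 10% of negatives.
TPR, FPR = 0.8, 0.1

balanced = expected_precision_recall(n_pos=500, n_neg=500, tpr=TPR, fpr=FPR)
imbalanced = expected_precision_recall(n_pos=100, n_neg=900, tpr=TPR, fpr=FPR)

print(f"balanced:   precision = {balanced[0]:.3f}, recall = {balanced[1]:.3f}")
print(f"imbalanced: precision = {imbalanced[0]:.3f}, recall = {imbalanced[1]:.3f}")
```

Recall is the same on both test sets, while precision drops on the set with fewer positives, because the false positives now come from a much larger pool of negative cases.
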
  12. F1 Score: Combining precision and recall

    • F1-score is defined as 2 / (1/Precision + 1/Recall), the harmonic mean of precision and recall
    • Its value lies between 0 and 1
    • When precision and recall are both high, the F1-score approaches 1
    • If either precision or recall is very low, the F1-score becomes very low
    • Depends on the actual threshold
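
A short sketch of the formula, applied to the precision and recall values from the Precision, Recall slide (41.54% and 12.61%). Returning 0.0 when either input is zero is a convention assumed here to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2 / (1/precision + 1/recall)."""
    if precision == 0 or recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 / (1 / precision + 1 / recall)


f1_slide = f1_score(0.4154, 0.1261)   # values shown on the Precision, Recall slide
f1_both_high = f1_score(0.9, 0.9)     # both high: F1 is high
f1_one_low = f1_score(0.99, 0.01)     # one very low: F1 is very low

print(f"{f1_slide:.4f} {f1_both_high:.4f} {f1_one_low:.4f}")
```

Because the harmonic mean is dominated by the smaller value, high precision cannot compensate for very low recall, and vice versa.
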
  13. Precision-Recall Curve: Plot on precision and recall

    • Observation
      • Increasing the threshold
        • Precision goes up
        • Recall goes down
      • Decreasing the threshold
        • Precision goes down
        • Recall goes up
    • PR curve
      • Plot the curve by increasing the threshold from lowest to highest
      • Does not depend on one particular threshold
    [Figure: Precision-Recall curve with the "Highest Threshold" and "Lowest Threshold" ends marked]
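
The points of a PR curve can be produced by sweeping the threshold over the observed scores, from lowest to highest. The data and the helper name `pr_curve` are hypothetical, and the sketch assumes at least one positive label.

```python
def pr_curve(actual, scores):
    """Return (threshold, precision, recall) for each distinct score, low to high."""
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for a, p in zip(actual, scores) if p >= t and a == 1)
        fp = sum(1 for a, p in zip(actual, scores) if p >= t and a == 0)
        fn = sum(1 for a, p in zip(actual, scores) if p < t and a == 1)
        # tp + fp >= 1 because t is itself one of the scores.
        points.append((t, tp / (tp + fp), tp / (tp + fn)))
    return points


# Hypothetical labels and scores, for illustration only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = pr_curve(actual, scores)
for t, prec, rec in curve:
    print(f"threshold={t:.1f}  precision={prec:.3f}  recall={rec:.3f}")
```

On this toy data recall never increases as the threshold rises, while precision rises overall but not at every step; on small samples the precision trend is a tendency rather than a strict rule.
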
  14. Lift Chart: Metric mostly used in marketing

    • Select a subset of potential customers as the marketing target to maximize conversions
    • Predictive marketing: use machine learning to predict which customers are going to make a purchase, i.e. convert
    • Focus on the recall obtained by targeting the top customers ranked by probability
    • Example: target customers with prediction scores in the top 20%
      • # targeted customers who will make a purchase / # total customers who will make a purchase
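
The ratio on this slide is the recall among the top-ranked customers. A minimal sketch, using ten hypothetical customers and a made-up helper `recall_at_top`:

```python
def recall_at_top(actual, scores, fraction):
    """Share of all converting customers captured in the top `fraction` by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_target = round(len(ranked) * fraction)
    captured = sum(a for _, a in ranked[:n_target])
    return captured / sum(actual)


# Hypothetical customers: 1 = will purchase, 0 = will not.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

top20 = recall_at_top(actual, scores, 0.2)
top50 = recall_at_top(actual, scores, 0.5)
print(top20, top50)  # 0.5 0.75
```

Targeting the top 20% by score reaches half of all buyers here, whereas targeting 20% of customers at random would be expected to reach about 20% of them.
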
  15. Waterfall Analysis: Metric specific to marketing, using random targeting as a baseline

    • Bin customers based on the probability score, sorted from high to low
    • For each customer bin, plot the percentage of customers who actually converted
    • Draw a reference line to represent the average percentage of customers who will purchase
      • Targeting customers at random achieves the dashed line
    • Ideally, the bar for the customer segment with a high probability score should be much higher than the real customer percentage (obtained by targeting random customers)
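
The binning steps can be sketched as follows. Equal-sized bins, the toy data and the helper name `waterfall` are assumptions made for illustration; the slide does not say how the bins are formed.

```python
def waterfall(actual, scores, n_bins):
    """Conversion rate per score-ranked bin, plus the overall baseline rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    size = len(ranked) // n_bins
    rates = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover rows.
        end = start + size if b < n_bins - 1 else len(ranked)
        chunk = ranked[start:end]
        rates.append(sum(chunk) / len(chunk))
    baseline = sum(actual) / len(actual)  # rate achieved by random targeting
    return rates, baseline


# Hypothetical customers: 1 = converted, 0 = did not convert.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

rates, baseline = waterfall(actual, scores, n_bins=5)
print(rates)     # [1.0, 0.5, 0.0, 0.5, 0.0]
print(baseline)  # 0.4
```

The first bin converts at 100% against a 40% baseline, which is the pattern the slide describes as ideal for the highest-scoring segment.
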
  16. AUC & ROC (Receiver Operating Characteristic) Curve: Most popular metric for classification

    • ROC curve
      • Plot the false positive rate (FP/N) against the true positive rate (TP/P) as the classification threshold goes from low to high
    • AUC: area under the ROC curve, ranging from 0 to 1
    • AUC focuses on ranking performance
      • For a randomly chosen pair of one positive example and one negative example, AUC equals the probability that the positive example has a higher score than the negative example
      • Independent of the classification threshold
      • Independent of the range of the classification score
    • Cons: insensitive to imbalanced datasets
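
The ranking interpretation can be computed literally by comparing every positive example with every negative example. This brute-force sketch is for illustration only; counting ties as half a win is a common convention assumed here, and the data is hypothetical.

```python
def auc_pairwise(actual, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = auc_pairwise(actual, scores)
# Rescaling the scores keeps their order, so the AUC is unchanged.
auc_scaled = auc_pairwise(actual, [100 * s for s in scores])
print(f"{auc:.4f} {auc_scaled:.4f}")  # 0.8889 0.8889
```

Eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9, and multiplying all scores by 100 does not change it.
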
  17. Computation of metrics: Compute using OML4Py and Oracle SQL query

    • Table-wise aggregation is needed for each row
      • Precision-Recall plot
        • Precision and recall, using each row's probability score as a threshold
      • Lift chart
        • Recall for all data up to each row's probability score as a threshold
      • ROC
        • True positive rate and false positive rate, computed using each row's probability score as a threshold
    • Use Oracle window functions to speed up the computation
    • Go through the notebook
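
The notebook itself is not part of this transcript. As a rough plain-Python analogue of the window-function idea, the sketch below sorts rows by score once and uses running totals (similar in spirit to a SQL `SUM(...) OVER (ORDER BY score DESC)`) so that every row's metrics come from a single pass instead of one aggregation per row. It assumes distinct scores and at least one positive and one negative label; `per_row_metrics` is a made-up name.

```python
from itertools import accumulate


def per_row_metrics(actual, scores):
    """Precision, recall (TPR) and FPR using each row's score as the threshold."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    labels = [a for _, a in ranked]
    cum_tp = list(accumulate(labels))                   # running count of positives
    cum_fp = list(accumulate(1 - a for a in labels))    # running count of negatives
    total_pos, total_neg = cum_tp[-1], cum_fp[-1]
    rows = []
    for (score, _), tp, fp in zip(ranked, cum_tp, cum_fp):
        rows.append({
            "threshold": score,
            "precision": tp / (tp + fp),
            "recall": tp / total_pos,   # also the true positive rate
            "fpr": fp / total_neg,
        })
    return rows


# Hypothetical labels and scores, for illustration only.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

rows = per_row_metrics(actual, scores)
for row in rows:
    print(row)
```

The same running totals feed all three plots: (recall, precision) pairs give the PR curve, recall by rank gives the lift chart, and (fpr, recall) pairs give the ROC curve.
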