
ML Concepts - Best Practices when using ML Classification Metrics

In this weekly Office Hours session for Oracle Machine Learning on Autonomous Database, Jie Liu, Data Scientist for Oracle Machine Learning, covered best practices for using ML classification metrics and showed a variety of ways to compute them with Oracle Machine Learning for Python (OML4Py), including a live demo.

The Oracle Machine Learning product family supports data scientists, analysts, developers, and IT to achieve data science project goals faster while taking full advantage of the Oracle platform.

Oracle Machine Learning Notebooks offers an easy-to-use, interactive, multi-user, collaborative interface based on Apache Zeppelin notebook technology, and supports SQL, PL/SQL, Python, and Markdown interpreters. It is available in all Autonomous Database versions and tiers, including the Always Free editions.

OML includes AutoML, which provides automated machine learning features for algorithm selection, feature selection, and model tuning, in addition to a specialized AutoML UI exclusive to Autonomous Database.

OML Services is also included in Autonomous Database. With it, you can deploy and manage native in-database OML models as well as ONNX ML models (for classification and regression) built using third-party engines, and you can also invoke cognitive text analytics.

Marcos Arancibia

September 14, 2021

Transcript

  1. OML usage highlight:
    ML Concepts - Best Practices when using ML Classification Metrics
    OML AskTOM Office Hours
    Jie Liu
    Data Scientist, Oracle Machine Learning
    Supported by Marcos Arancibia, Mark Hornick and Sherry LaMonica
    Product Management, Oracle Machine Learning
    Move the Algorithms; Not the Data!
    Copyright © 2021, Oracle and/or its affiliates.
    This Session will
    be Recorded


  2. Topics for today
    • Upcoming Sessions
    • Classification Metrics
    • Q&A


  3. Upcoming Sessions
    • November 9, 2021, 08:00 AM Pacific
    • OML Usage Highlight: Leveraging OML algorithms in Retail Science platform
    • November 2, 2021, 08:00 AM Pacific
    • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
    • October 5, 2021, 08:00 AM Pacific
    • OML4Py: Using third-party Python packages from Python, SQL and REST
    • September 28, 2021, 08:00 AM Pacific
    • Weekly Office Hours: OML on Autonomous Database - Ask & Learn
    • (OML4Py updated Hands-on-Lab)
    • September 21, 2021, 08:00 AM Pacific
    • Weekly Office Hours: OML on Autonomous Database - Ask & Learn


  4. Classification Metrics
    OML Office Hours
    Jie Liu
    Data Scientist, Oracle Machine Learning
    Move the Algorithms; Not the Data!


  5. Outline
    • Overview of classification metrics
    • Discussion of various classification metrics and plots
    • Computation of classification metrics using SQL and OML4Py
    • Conclusion


  6. Motivation
    Choose the right metric after a classification model is built
    • A long list of classification metrics and plots:
    • Accuracy
    • Confusion matrix, FP, FN, TP, TN
    • Precision, Recall, F1 score + Precision-Recall curve
    • Lift chart + waterfall analysis
    • AUC + ROC curve
    • Typical claims: "Achieved AUC 0.8", "Achieved an accuracy of 0.98", "Achieved high precision but low recall"
    • Questions:
    • Does a good-looking metric truly reflect reality?
    • The chosen metric looks good, but does it help the business?
    • How do you explain AUC to stakeholders with little knowledge of ML?


  7. Accuracy
    Starting with accuracy
    • Definition – # of correctly predicted rows / # of all rows
    • Pros
    • Easy to compute and understand
    • Used in multi-class prediction tasks, e.g., image classification
    • Cons
    • Does not provide specific information
    • Among the errors, how many of the positive labels are misclassified as negative?
    • Fails to reflect reality when the dataset is imbalanced
    • 999 negative examples, 1 positive example
    • Predict everything as negative – 99.9% accuracy achieved
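The imbalance pitfall above can be reproduced in a few lines of plain Python (a sketch with made-up data, not the OML4Py API):

```python
# Hypothetical data reproducing the imbalance pitfall from the slide:
# 999 negative examples, 1 positive example, and a model that
# predicts everything as negative.
y_true = [0] * 999 + [1]
y_pred = [0] * 1000

def accuracy(y_true, y_pred):
    """Accuracy = # of correctly predicted rows / # of all rows."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(y_true, y_pred))  # 0.999 -- high accuracy, yet no positive case is ever found
```

The metric looks excellent even though the model never finds the one positive case, which is exactly why accuracy alone is misleading on imbalanced data.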


  8. Probability Score Histogram
    Better overview for choosing a threshold
    • Most classification algorithms output a probability score
    • You need to choose a threshold for classification decisions
    • Predict a case as positive if p >= 0.4
    • Predict a case as negative if p < 0.4
    • Methodology
    • Based on ground truth (calibration) – what is the actual ratio of people buying insurance
    • Based on cost – how to balance the cost of each type of error
    [Chart: probability score histogram with the classification threshold marked]
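Turning probability scores into labels at a chosen threshold is a one-liner; this sketch (hypothetical scores, not OML4Py code) uses the slide's 0.4 cutoff:

```python
def classify(scores, threshold=0.4):
    """Predict positive (1) when p >= threshold, negative (0) when p < threshold."""
    return [1 if p >= threshold else 0 for p in scores]

# Hypothetical probability scores for five cases.
scores = [0.05, 0.35, 0.40, 0.72, 0.90]
print(classify(scores))       # [0, 0, 1, 1, 1]
print(classify(scores, 0.8))  # [0, 0, 0, 0, 1] -- a stricter threshold predicts fewer positives
```

Raising the threshold trades false positives for false negatives, which is the calibration-versus-cost decision described above.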


  9. Confusion Matrix
    List all scenarios
    • True positive (TP) – number of positive cases predicted as positive
    • True negative (TN) – number of negative cases predicted as negative
    • False positive (FP) – number of negative cases predicted as positive
    • False negative (FN) – number of positive cases predicted as negative
    • Note
    • The confusion matrix depends on the choice of threshold
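The four cells can be counted directly from the label pairs; a minimal pure-Python sketch with made-up labels:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```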


  10. Precision, Recall
    Definition
    • Precision – among the cases predicted as positive, the ratio of actual positive cases
    • Precision = TP/(TP + FP) (41.54% in the demo example)
    • Recall – among the actual positive cases, the ratio of cases predicted as positive
    • Recall = TP/(TP + FN) (12.61% in the demo example)
    • Both depend on the threshold
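Both formulas fall out of the confusion-matrix counts; a sketch on hypothetical labels (not the demo's data, so the percentages differ):

```python
def precision_recall(y_true, y_pred):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), guarding against empty denominators."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0]
print(precision_recall(y_true, y_pred))  # precision = 2/3, recall = 2/4 = 0.5
```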


  11. Precision, Recall
    Trade-off
    • Different emphasis based on different business targets
    • Focus on precision
    • Care more about false positives
    • Marketing target audience
    • Video/music/item recommendation
    • Focus on recall
    • Care more about false negatives
    • Detect fraud
    • Detect disease infection
    • Useful when the data is imbalanced
    • Imbalanced data leads to high accuracy easily
    • It is not easy to achieve high precision or recall
    • Precision – Pr(actual = 1 | predicted = 1) – changes with the target distribution
    • The same model is likely to have lower precision on a test set with fewer positive labels
    • Recall – Pr(predicted = 1 | actual = 1) – independent of the target distribution
    • If a test set has fewer positive labels, recall remains roughly the same


  12. F1 Score
    Combining precision and recall
    • F1-score is defined as 2 / (1/Precision + 1/Recall), the harmonic mean of precision and recall
    • Takes values between 0 and 1
    • When precision and recall are both high, the F1-score approaches 1
    • If either precision or recall is very low, the F1-score becomes very low
    • Depends on the chosen threshold
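The harmonic mean behaves as described: it only stays high when both inputs are high. A minimal sketch:

```python
def f1_score(precision, recall):
    """F1 = 2 / (1/precision + 1/recall), i.e. the harmonic mean of the two."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.9, 0.9))  # ~0.9  -- both components high, F1 high
print(f1_score(0.9, 0.1))  # ~0.18 -- one low component drags F1 far down
```

Compare with the arithmetic mean of 0.9 and 0.1 (0.5): the harmonic mean punishes the imbalance much harder.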


  13. Precision – Recall Curve
    Plot of precision and recall
    • Observation
    • Increasing the threshold: precision goes up, recall goes down
    • Decreasing the threshold: precision goes down, recall goes up
    • PR curve
    • Plot the curve by increasing the threshold from lowest to highest
    • Does not depend on one particular threshold
    [Plot: PR curve traced from the lowest threshold to the highest threshold]
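Sweeping the threshold over each distinct score yields the curve's points; a pure-Python sketch on a tiny hypothetical dataset (note that precision is not strictly monotone in practice):

```python
def pr_curve(y_true, scores):
    """(threshold, precision, recall) using each distinct score as a
    threshold, swept from lowest to highest."""
    points = []
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn)
        points.append((thr, precision, recall))
    return points

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for thr, p, r in pr_curve(y_true, scores):
    print(f"thr={thr:.2f}  precision={p:.2f}  recall={r:.2f}")
```

At the lowest threshold everything is predicted positive (recall 1, precision low); at the highest, only the top-scored case is positive (precision 1, recall low), matching the trade-off above.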


  14. Lift Chart
    Metric mostly used in marketing
    • Select a subset of potential customers as the marketing target to maximize conversions
    • Predictive marketing – use machine learning to predict which customers are going to make a purchase, i.e., convert
    • Focus on the recall obtained by targeting the top customers ranked by probability
    • Example: target customers with prediction scores in the top 20%
    • Lift-chart recall = # targeted customers who will make the purchase / # total customers who will make the purchase
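One point of a lift chart, computed on hypothetical scores (a sketch, not OML4Py code):

```python
def recall_at_top(y_true, scores, fraction=0.2):
    """Recall obtained by targeting the top `fraction` of customers
    ranked by predicted probability -- one point of a lift chart."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    k = int(len(ranked) * fraction)
    captured = sum(y for _, y in ranked[:k])
    return captured / sum(y_true)

# Hypothetical scores for 10 customers, 4 of whom actually purchase.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(recall_at_top(y_true, scores, 0.2))  # 0.5 -- the top 20% captures 2 of the 4 buyers
```

Random targeting would capture only 20% of buyers at that depth, so the model gives a lift of 2.5x here.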


  15. Waterfall Analysis
    Metric specific to marketing: use random targeting as a baseline
    • Bin customers based on the probability score, sorted from high to low
    • For each customer bin, plot the percentage of actually converted customers
    • Draw a reference line representing the average percentage of customers who will purchase
    • Targeting customers at random achieves that dashed line
    • Ideally, the bars for the customer segments with high probability scores should be much higher than the overall conversion percentage (what random targeting would obtain)
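The binning described above can be sketched in pure Python (hypothetical data; equal-sized bins assumed for simplicity):

```python
def waterfall_bins(y_true, scores, n_bins=5):
    """Conversion rate per score bin (sorted high to low), plus the
    random-targeting baseline (the overall conversion rate)."""
    ranked = [y for _, y in sorted(zip(scores, y_true), reverse=True)]
    size = len(ranked) // n_bins
    rates = [sum(ranked[i * size:(i + 1) * size]) / size for i in range(n_bins)]
    baseline = sum(y_true) / len(y_true)
    return rates, baseline

# Hypothetical scores for 10 customers, 4 of whom actually convert.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(waterfall_bins(y_true, scores))  # ([1.0, 0.5, 0.5, 0.0, 0.0], 0.4)
```

The high-score bins beat the 0.4 baseline and the low-score bins fall below it, which is the shape a useful model's waterfall chart should show.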


  16. AUC & ROC (Receiver Operating Characteristic) Curve
    Most popular metric for classification
    • ROC curve
    • Plots the false positive rate (FP/N) against the true positive rate (TP/P) as the classification threshold sweeps from low to high
    • AUC – area under the ROC curve, ranging from 0 to 1
    • AUC focuses on ranking performance
    • For a randomly chosen pair of one positive and one negative example, AUC equals the chance that the positive example has a higher score than the negative example
    • Independent of the classification threshold
    • Independent of the range of the classification score
    • Cons – insensitive to class imbalance
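The pairwise interpretation of AUC translates directly into code; an O(P x N) sketch on made-up scores (fine for illustration, too slow for large tables):

```python
def auc_by_pairs(y_true, scores):
    """AUC as the chance that a random positive example scores higher
    than a random negative example (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3]
print(auc_by_pairs(y_true, scores))  # 0.75 -- one of the four pairs (0.4 vs 0.5) is mis-ranked
```

Because only the ordering of scores matters, rescaling all scores leaves the result unchanged, which is the threshold- and range-independence noted above.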


  17. Computation of metrics
    Compute using OML4Py and Oracle SQL queries
    • Table-wide aggregation is needed for each row
    • Precision – Recall plot
    • Precision and recall using each row's probability score as a threshold
    • Lift chart
    • Recall for all data up to each row's probability score as a threshold
    • ROC
    • True positive rate and false positive rate computed using each row's probability score as a threshold
    • Use Oracle window functions to speed this up
    • Go through the notebook
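The window-function trick (running sums over rows ordered by score) has a direct pure-Python analogue; this sketch is an illustration of the idea, not the notebook's actual SQL or OML4Py code:

```python
def roc_points(y_true, scores):
    """TPR and FPR at every row's score, via running sums over the data
    sorted by score descending -- the Python analogue of a SQL
    SUM(...) OVER (ORDER BY score DESC) window aggregate."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = []
    for score, label in ranked:
        tp += label          # cumulative true positives at threshold = score
        fp += 1 - label      # cumulative false positives at that threshold
        points.append((score, tp / pos, fp / neg))
    return points

y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.2]
for s, tpr, fpr in roc_points(y_true, scores):
    print(f"thr={s:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

One sort plus one linear pass produces every ROC point, instead of re-aggregating the whole table per threshold; the same cumulative-sum idea yields the PR-curve and lift-chart points.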


  18. Thank you
    [email protected]
