Evaluating Machine Learning Models

Ike Okonkwo

March 01, 2016

Transcript

  1. Evaluating Machine Learning Models
     Ike Okonkwo - @ikeondata
     Data Scientist @6senseInc
     3.1.16 Metis Data Science Speaker Series
  2. About Me
     • Data Scientist
     • 6sense - B2B multichannel predictive intelligence engine for marketing and sales
     • Data Science mentor and writer
     • Background : Physics / Electrical Engineering, Industrial & Systems Engineering
  3. Evaluation Metrics
     Model evaluation answers the question : how do I choose, in an objective manner, between different models for a particular use case?
     Generalization >>> Memorization
  4. Evaluation Metrics
     • Classification : Accuracy, Confusion Matrix, ROC/AUC, Logloss, Decile Chart, Fixed Bucket Decile, Lift
     • Regression : R^2, MSE, RMSE
     • Ranking : NDCG
  5. Classification
     • Accuracy : measures how often the classifier makes the correct prediction
     • PROS : easy to calculate
     • CONS : it doesn't tell us anything about the distribution of the target values, nor what types of errors the classifier is making
     • a useful model should have accuracy > null accuracy (the baseline of always predicting the most frequent class)
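
A minimal sketch of that baseline check, assuming scikit-learn; the arrays y_test and y_pred are made-up toy data, not from the talk:

    import numpy as np
    from sklearn.metrics import accuracy_score

    y_test = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 0])  # toy labels (70% class 0)
    y_pred = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0])  # toy predictions

    accuracy = accuracy_score(y_test, y_pred)                  # 0.80
    # Null accuracy: the score you get by always predicting the majority class.
    null_accuracy = max(np.mean(y_test), 1 - np.mean(y_test))  # 0.70

    print(accuracy > null_accuracy)  # True -- the model beats the baseline
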
  6. Classification
     • Accuracy : (TP + TN) / (TP + TN + FP + FN)
     • Error Rate : 1 - Accuracy
     • Sensitivity (TPR / Recall) : when the actual value is +ve, how often is the prediction correct. TP / (TP + FN)
     • Specificity : when the actual value is -ve, how often is the prediction correct. TN / (TN + FP)
     • FPR (1 - Specificity) : when the actual value is -ve, how often is the prediction incorrect. FP / (TN + FP)
     • Precision : when a +ve value is predicted, how often is the prediction correct. TP / (TP + FP)
     • f1-score : harmonic mean of Precision and Recall. (2 * P * R) / (P + R)
     • MCC : correlation coefficient between observed and predicted results, in [-1, +1]
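
All of these fall out directly from the four confusion-matrix counts. A sketch with made-up counts:

    # Made-up counts for illustration.
    TP, TN, FP, FN = 40, 45, 10, 5

    accuracy    = (TP + TN) / (TP + TN + FP + FN)
    error_rate  = 1 - accuracy
    sensitivity = TP / (TP + FN)   # TPR / recall
    specificity = TN / (TN + FP)
    fpr         = FP / (TN + FP)   # 1 - specificity
    precision   = TP / (TP + FP)
    f1          = 2 * precision * sensitivity / (precision + sensitivity)

    # MCC: +1 = perfect prediction, 0 = random, -1 = total disagreement.
    mcc = (TP * TN - FP * FN) / ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5
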
  7. Classification
     • Confusion Matrix
     • PROS : lets you calculate many other metrics, is useful for multi-class problems, and supports expected value (cost) calculations (Type I [FP] vs Type II [FN] errors)
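
A sketch of building one with scikit-learn, on toy labels:

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 0, 1, 0, 1]
    y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

    # For binary labels, rows are actual classes and columns are predicted:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))  # [[3 1]
                                             #  [1 3]]
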
  8. Classification
     • ROC curve : a plot of TPR (sensitivity) vs FPR (1 - Specificity) for every possible classification threshold
     • PROS : single-graph summary of classifier performance, also useful for cases of high class imbalance, enables you to understand the tradeoff in classifier performance
     • CONS : less interpretable for multi-class problems, sometimes doesn't tell the entire story
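
A sketch with scikit-learn and matplotlib; the synthetic imbalanced dataset and logistic regression model here are illustrative stand-ins, not from the talk:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split

    # Imbalanced toy problem: ~90% of samples are class 0.
    X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression().fit(X_train, y_train)
    y_scores = clf.predict_proba(X_test)[:, 1]  # P(class = 1)

    # One (FPR, TPR) point per candidate threshold:
    fpr, tpr, thresholds = roc_curve(y_test, y_scores)

    plt.plot(fpr, tpr, label="classifier")
    plt.plot([0, 1], [0, 1], "--", label="chance")
    plt.xlabel("FPR (1 - specificity)")
    plt.ylabel("TPR (sensitivity)")
    plt.legend()
    plt.show()
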
  9. Classification
     • AUC : the area under the ROC curve
     • PROS : single-number summary of classifier performance, also useful for cases of high class imbalance
     • CONS : less interpretable for multi-class problems
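
A minimal sketch; the labels and scores are toy values:

    from sklearn.metrics import roc_auc_score

    y_true   = [0, 0, 1, 1]
    y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted probabilities for class 1

    # AUC = fraction of (positive, negative) pairs ranked correctly.
    print(roc_auc_score(y_true, y_scores))  # 0.75
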
  10. Classification
     • Logloss : a measure of accuracy that incorporates probabilistic confidence
     • PROS : gauges the extra noise that comes from using predicted probabilities instead of the true labels, so confident wrong predictions are penalized heavily
     • CONS : predictions must be probabilities
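
A sketch of that penalty, with toy probabilities:

    from sklearn.metrics import log_loss

    y_true = [1, 1, 0, 0]

    confident_right = [0.9, 0.9, 0.1, 0.1]
    hedged          = [0.6, 0.6, 0.4, 0.4]
    confident_wrong = [0.1, 0.1, 0.9, 0.9]

    for name, probs in [("confident right", confident_right),
                        ("hedged", hedged),
                        ("confident wrong", confident_wrong)]:
        print(name, round(log_loss(y_true, probs), 3))
    # confident right ~0.105, hedged ~0.511, confident wrong ~2.303
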
  11. Classification
     • Lift : how much better the classifier performs than random guessing on a targeted segment of the data
     • profit / lift curves : 2x..5x
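
A sketch of top-decile lift under one common definition (response rate among the top 10% of scores divided by the overall rate); the function name is mine, not from the talk:

    import numpy as np

    def decile_lift(y_true, y_scores, decile=0.1):
        """Response rate in the top decile of scores vs. the overall rate."""
        y_true = np.asarray(y_true, dtype=float)
        order = np.argsort(y_scores)[::-1]           # highest scores first
        top_k = order[: max(1, int(len(y_true) * decile))]
        return y_true[top_k].mean() / y_true.mean()  # e.g. 3.0 means 3x lift
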
  12. Regression
     • RMSE : square root of the average squared distance between actual and predicted values
       • the Euclidean distance between the actual and predicted vectors, normalized by the number of data points
       • not robust to outliers, since it's an average; gives the distance of a data point from the fitted line, on average
     • MSE : average squared distance between actual and predicted values
       • the squared distance of a data point from the fitted line, on average
     • R^2 : proportion of the variability in Y that can be explained by the model
       • a measure of correlation between actual and predicted values
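
A sketch of all three side by side, assuming scikit-learn; the numbers are made up:

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    y_true = np.array([3.0, -0.5, 2.0, 7.0])   # toy actual values
    y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # toy predictions

    mse  = mean_squared_error(y_true, y_pred)  # 0.375
    rmse = np.sqrt(mse)                        # ~0.612, same units as y
    r2   = r2_score(y_true, y_pred)            # ~0.949
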
  13. Ranking
     • NDCG : Normalized Discounted Cumulative Gain. Sums up the relevance of the top k ranked items
     • ex. search engine results : the top few answers matter more / are more relevant than those lower down the list
     • important in information retrieval, where the positioning of the returned items is very important
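
A from-scratch sketch of NDCG@k; the helper names and toy relevance scores are mine:

    import numpy as np

    def dcg(relevances, k):
        """Relevance of each result, discounted by its rank position."""
        relevances = np.asarray(relevances)[:k]
        positions = np.arange(1, len(relevances) + 1)
        return np.sum(relevances / np.log2(positions + 1))

    def ndcg(relevances, k):
        """DCG normalized by the best possible ordering, so 1.0 is perfect."""
        ideal = sorted(relevances, reverse=True)
        return dcg(relevances, k) / dcg(ideal, k)

    # Relevance of each result as ranked by a hypothetical search engine:
    print(round(ndcg([3, 2, 3, 0, 1, 2], k=6), 3))  # 0.961
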
  14. References
     • An Introduction to ROC Analysis - Tom Fawcett : https://ccrma.stanford.edu/workshops/mir2009/references/ROCintro.pdf
     • Simple guide to Confusion Matrix Terminology : http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
     • Model Evaluation : http://scikit-learn.org/stable/modules/model_evaluation.html
  15. Data Science @ 6sense
     • ETL Layer : Hive, Presto, Feature Eng.
     • Data Layer : HDFS, Hadoop
     • Automation Layer : Python, R, Shell
     • ML Layer : h2o, Python, R, Shell, SQL
     • SAAS Layer : REST API
     • R&D : Python, R, Scala, C++, Shell, Java, UDFs, Scaling / Distributing models, etc
  16. Interview Tips
     • Get good at doing data take-home challenges
     • Get really good at doing data take-home challenges
     • Take-home challenges are the new phone interview
     • Network, network : meetups, LinkedIn, etc
     • Build a portfolio of interesting data science projects
     • Really understand how most of the major ML algorithms work under the hood
     • Work on open-source data-related libraries; if you don't find any interesting ones, start writing your own
     • Become more visible : blog, contribute to open source ML, etc