Machine Learning and Performance Evaluation @ DataPhilly 2016

Every day, in scientific research and business applications, we rely on statistics and machine learning as support tools for predictive modeling. To model uncertainty, predict trends, and recognize patterns that may occur in the future, we have developed a vast library of tools for decision making. In other words, we have learned to take advantage of computers to replicate the real world, making intuitive decisions more quantitative, labeling unlabeled data, predicting trends, and ultimately trying to predict the future. Now, whether we are applying predictive modeling techniques to research or business problems, we want to make "good" predictions!

With modern machine learning libraries, choosing a machine learning algorithm and fitting a model to our training data has never been simpler. However, making sure that the model generalizes well to unseen data is still up to us, the machine learning practitioners and researchers. In this talk, we will discuss the two most important components of the various estimators of generalization performance: bias and variance. We will look at how to make the best use of the data at hand through proper (re)sampling, and how to pick appropriate performance metrics. Then, we will compare various techniques for algorithm selection and model selection to find the right tool and approach for the task at hand. In the context of the "bias-variance trade-off," we will go over potential weaknesses of common modeling techniques, and we will learn how to take uncertainty into account to build a predictive model that performs well on unseen data.

Sebastian Raschka

December 01, 2016

Transcript

  1. Wednesday, November 30, 2016, 6:30 PM to 7:00 PM. 441 N 5th St, Suite 301, Philadelphia, PA. Machine Learning and Performance Evaluation. Sebastian Raschka. DATAPHILLY
  2. Estimating the Performance of Predictive Models Why bother?

  3. ①  Generalization Performance ②  Model Selection ③  Algorithm Selection

  4. target y

  5. target y variance bias

  6. $\text{Bias} = E[\hat{\theta}] - \theta$ and $\text{Variance} = E[(\hat{\theta} - E[\hat{\theta}])^2]$, where $\hat{\theta}$ is the estimated value and $E[\hat{\theta}]$ its expected value. [Figure: quadrants contrasting Low Variance (Precise) vs. High Variance (Not Precise) and Low Bias (Accurate) vs. High Bias (Not Accurate); figure by Sebastian Raschka.]
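
As an illustrative aside, the two definitions above can be checked with a small simulation; the toy setup below (estimating the mean of a normal distribution from repeated small samples) is an assumption for demonstration, not an example from the slides.

```python
import numpy as np

# Toy simulation of estimator bias and variance (illustrative setup,
# not from the slides): estimate the mean of a normal distribution
# from many repeated small samples.
rng = np.random.RandomState(123)
true_theta = 5.0

estimates = []
for _ in range(10000):
    sample = rng.normal(loc=true_theta, scale=2.0, size=20)
    estimates.append(sample.mean())  # theta_hat for this sample

estimates = np.array(estimates)
bias = estimates.mean() - true_theta                     # E[theta_hat] - theta
variance = np.mean((estimates - estimates.mean()) ** 2)  # E[(theta_hat - E[theta_hat])^2]
print(f"bias: {bias:.4f}  variance: {variance:.4f}")
```
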
  7. Performance Estimates – Absolute vs Relative

  8. ①  Generalization Performance ②  Model Selection ③  Algorithm Selection

  9. Sources of Bias and Variance TRAIN TRAIN TEST

  10. TRAIN TEST

  11. TRAIN TEST Pessimistic Bias

  12. * SoftMax Classifier on a small MNIST subset

  13. TRAIN TEST

  14. TRAIN TEST Pessimistic Bias

  15. TRAIN TEST Pessimistic Bias Variance

  16. [Figure: 70% Train / 30% Test splits for n=1000 and n=100; Samples 1, 2, and 3 from the real-world distribution illustrate resampling.]
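
A minimal sketch of such a 70/30 holdout split, assuming scikit-learn and the Iris data (the dataset choice here is only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# 70/30 holdout split; stratify keeps the class proportions of y
# roughly equal in the training and test subsets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```
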
  17. * 3-NN on Iris dataset

  18. * 3-NN on Iris dataset
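
A rough way to reproduce the spirit of these plots is to repeat the holdout split with different random seeds and watch the accuracy estimate fluctuate; the number of repetitions and the seeds below are assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Repeat the 70/30 holdout split with different random seeds to see how
# much the accuracy estimate of a 3-NN classifier on Iris varies.
X, y = load_iris(return_X_y=True)
accuracies = []
for seed in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
    accuracies.append(clf.score(X_te, y_te))

print(f"accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```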

  19. K-fold cross-validation. [Figure: K = 5 iterations (folds); in each iteration a different fold serves as the validation fold and the remaining folds as the training folds, yielding Performance$_1$ through Performance$_5$.] The overall estimate is $\text{Performance} = \frac{1}{5}\sum_{i=1}^{5}\text{Performance}_i$.
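
A minimal k-fold cross-validation sketch with K = 5, reusing the 3-NN/Iris example from the earlier slides via scikit-learn's cross_val_score:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 5-fold cross-validation: each fold serves once as the validation fold,
# and the final estimate is the average of the five fold performances.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores)
print("mean performance:", scores.mean())
```
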
  20. [Figure: within each of the K iterations, the learning algorithm with given hyperparameter values is fit to the training-fold data and labels to produce a model; the model's predictions on the validation-fold data are compared with the validation-fold labels to obtain that fold's performance, and the K fold performances are averaged as on the previous slide, $\text{Performance} = \frac{1}{5}\sum_{i=1}^{5}\text{Performance}_i$.]
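
Spelling the per-fold procedure out by hand makes the diagram concrete; the candidate values of k for the k-NN model below are assumptions, not taken from the slides:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# For each candidate hyperparameter value, fit on the training folds and
# score on the held-out validation fold, then average over the K folds.
X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

for k in (1, 3, 5, 7):
    fold_scores = []
    for train_idx, valid_idx in cv.split(X, y):
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[valid_idx], y[valid_idx]))
    print(f"k={k}: {np.mean(fold_scores):.3f} +/- {np.std(fold_scores):.3f}")
```
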
  21. [Figure: logistic regression model. Inputs $x_1, \ldots, x_m$ with weights $w_0, w_1, \ldots, w_m$ feed a net input function (weighted sum $\Sigma$), followed by the logistic (sigmoid) function and a quantizer that outputs the predicted class label $\hat{y}$; the logistic cost compares predictions with the true class label $y$ and updates the model parameters $w$ over a number of iterations, with $\lambda$ as the L2-regularization strength.]
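
A hedged sketch of an L2-regularized logistic classifier, using scikit-learn instead of the from-scratch model pictured on the slide; note that scikit-learn parameterizes the penalty as C = 1/lambda:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# L2-regularized logistic regression; scikit-learn expresses the
# regularization strength as C = 1/lambda, so a larger lambda means a
# stronger penalty on the weights.
X, y = load_iris(return_X_y=True)
for lam in (0.001, 0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(penalty="l2", C=1.0 / lam, max_iter=1000)
    mean_acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"lambda={lam:g}: mean CV accuracy = {mean_acc:.3f}")
```
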
  22. The law of parsimony: the 1-standard-error method

  23. The law of parsimony: the 1-standard-error method
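
A rough sketch of the 1-standard-error method, reusing the regularization-strength example from above (the candidate lambda values and the 10-fold setup are assumptions): among all settings whose mean cross-validation score lies within one standard error of the best mean, prefer the most parsimonious, i.e. most strongly regularized, model.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 1-standard-error rule: among all lambda values whose mean CV score is
# within one standard error of the best mean, pick the largest lambda
# (the most parsimonious, most strongly regularized model).
X, y = load_iris(return_X_y=True)
stats = {}
for lam in (0.001, 0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(penalty="l2", C=1.0 / lam, max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=10)
    stats[lam] = (scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores)))

best_mean, best_se = max(stats.values(), key=lambda t: t[0])
eligible = [lam for lam, (mean, _) in stats.items() if mean >= best_mean - best_se]
print("lambda chosen by the 1-SE rule:", max(eligible))
```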

  24. None
  25. K-fold for model selection, step by step. [Figure, step 1: split the data and labels into training data/labels and test data/labels.]

  26. [Figure, step 2: evaluate the learning algorithm with each set of candidate hyperparameter values on the training data and labels (k-fold cross-validation), recording a performance estimate for each.]

  27. [Figure, step 3: refit the learning algorithm with the best hyperparameter values on the complete training data and labels to obtain a model.]

  28. [Figure, step 4: use that model to predict the test data and compare the predictions with the test labels to estimate the generalization performance.]

  29. [Figure, step 5: fit the final model with the best hyperparameter values on all of the data and labels.]
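
The five steps above can be sketched with GridSearchCV; the dataset, estimator, and hyperparameter grid are placeholders rather than the ones used in the talk.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) split the data and labels into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# 2) + 3) k-fold cross-validation over the hyperparameter grid on the
#         training data; refit=True retrains the best setting on the
#         complete training set afterwards
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [1, 3, 5, 7]},
                    cv=5, refit=True)
grid.fit(X_train, y_train)

# 4) estimate generalization performance once, on the untouched test set
print("best hyperparameters:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))

# 5) fit the final model with the best hyperparameters on all data and labels
final_model = KNeighborsClassifier(**grid.best_params_).fit(X, y)
```

Because the test set is touched exactly once, in step 4, the reported accuracy is not biased by the hyperparameter search.
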
  30. Nested cross-validation for algorithm selection. [Figure: an outer loop of 5 folds, each holding out an outer validation fold; within each outer training fold, an inner 2-fold loop (e.g. $\frac{1}{2}\sum_{j=1}^{2}\text{Performance}_{5,j}$) selects the best algorithm and model; the outer-fold performances are then averaged into the overall estimate, $\text{Performance} = \frac{1}{10}\sum_{i=1}^{10}\text{Performance}_i$.]
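
A compact nested-cross-validation sketch in the spirit of this figure, with 5 outer and 2 inner folds; the two candidate algorithms and their grids are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Nested cross-validation: the inner loop (cv=2) tunes hyperparameters,
# the outer loop (cv=5) estimates generalization performance, so each
# algorithm is compared on data its tuning never touched.
X, y = load_iris(return_X_y=True)

candidates = {
    "k-NN": GridSearchCV(KNeighborsClassifier(),
                         {"n_neighbors": [1, 3, 5, 7]}, cv=2),
    "decision tree": GridSearchCV(DecisionTreeClassifier(random_state=1),
                                  {"max_depth": [1, 2, 3, 4, None]}, cv=2),
}

for name, inner_search in candidates.items():
    outer_scores = cross_val_score(inner_search, X, y, cv=5)
    print(f"{name}: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Because hyperparameter tuning happens strictly inside each outer training fold, the outer average estimates the performance of the whole tuning procedure, not of a single already-tuned model.
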
  31. Beyond Performance Metrics. Ideal features are: discriminatory, salient, invariant.
  32. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. ”Why

    Should I Trust You?”: Explaining the Predictions of Any Classifier. In Knowledge Discovery and Data Mining (KDD).
  33. THANK YOU!

  34. https://github.com/rasbt mail@sebastianraschka.com http://sebastianraschka.com @rasbt