Slide 1

Machine Learning and Performance Evaluation
Sebastian Raschka
DataPhilly
Wednesday, November 30, 2016, 6:30 PM to 7:00 PM
441 N 5th St, Suite 301, Philadelphia, PA

Slide 2

Estimating the Performance of Predictive Models Why bother?

Slide 3

① Generalization Performance
② Model Selection
③ Algorithm Selection

Slide 4

target y

Slide 5

target y variance bias

Slide 6

Bias = E[θ̂] − θ
Variance = E[(θ̂ − E[θ̂])²]
(θ̂ is the estimated value; E[θ̂] is its expected value.)
Low Variance (Precise) vs. High Variance (Not Precise)
Low Bias (Accurate) vs. High Bias (Not Accurate)
This work by Sebastian Raschka is licensed under a
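As a quick numeric illustration of these two definitions (my own sketch, not from the slides), the uncorrected sample variance is a classic estimator with nonzero bias:

```python
import numpy as np

# Bias     = E[theta_hat] - theta
# Variance = E[(theta_hat - E[theta_hat])^2]
# Estimator under study: the uncorrected sample variance (np.var with ddof=0),
# whose theoretical bias for n samples from a distribution with variance
# sigma^2 is -sigma^2 / n.
rng = np.random.default_rng(0)
theta = 1.0  # true variance of a standard normal
n = 10
estimates = np.array([np.var(rng.normal(size=n)) for _ in range(20000)])

bias = estimates.mean() - theta                       # theory: -1/10 = -0.1
variance = np.mean((estimates - estimates.mean()) ** 2)
print(f"bias ~ {bias:.3f}, variance ~ {variance:.3f}")
```

Averaging over many simulated datasets approximates the expectations in the formulas; the simulated bias lands close to the theoretical value of −0.1.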

Slide 7

Performance Estimates – Absolute vs Relative

Slide 8

① Generalization Performance
② Model Selection
③ Algorithm Selection

Slide 9

Sources of Bias and Variance TRAIN TRAIN TEST

Slide 10

TRAIN TEST

Slide 11

TRAIN TEST Pessimistic Bias

Slide 12

* Softmax classifier on a small MNIST subset
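The slide's exact experiment isn't reproduced here; a hedged sketch using scikit-learn's built-in 8×8 digits data (an assumption standing in for the MNIST subset) looks like this:

```python
# Sketch only: scikit-learn's small digits set stands in for the MNIST subset,
# and the split sizes/seed are my assumptions, not the slide's settings.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# For multi-class data, LogisticRegression acts as a softmax classifier here.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```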

Slide 13

TRAIN TEST

Slide 14

TRAIN TEST Pessimistic Bias

Slide 15

TRAIN TEST Pessimistic Bias Variance

Slide 16

Real-World Distribution → Sample 1, Sample 2, Sample 3 (resampling)
Each sample split into Train (70%) / Test (30%), with sample sizes n=1000 vs. n=100

Slide 17

* 3-NN on Iris dataset

Slide 18

* 3-NN on Iris dataset
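A sketch of this kind of experiment (the 70/30 split and the seeds are my assumptions): repeating the holdout split with different random seeds shows how much the 3-NN estimate fluctuates on a small dataset like Iris.

```python
# My illustration, not the slides' code: variance of the holdout estimate.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = []
for seed in range(50):  # 50 different random 70/30 splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    model = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print(f"accuracy: mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```

The nonzero standard deviation across seeds is exactly the variance of the performance estimate that the preceding slides warn about.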

Slide 19

K-fold Cross-Validation
K iterations (K folds); in each iteration, one fold serves as the validation fold and the remaining folds form the training fold.
Performance_1, Performance_2, Performance_3, Performance_4, Performance_5
Performance = (1/5) Σ_{i=1}^{5} Performance_i
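The five-fold procedure above can be sketched with scikit-learn (the dataset and classifier are my choices, not from the slide):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

fold_scores = []
for train_idx, valid_idx in kfold.split(X, y):
    # train on the K-1 training folds, evaluate on the held-out validation fold
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[valid_idx], y[valid_idx]))

# Performance = (1/5) * sum_i Performance_i
print(f"5-fold CV accuracy: {np.mean(fold_scores):.3f}")
```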

Slide 20

Within each of the K iterations:
Learning Algorithm + Hyperparameter Values + Training-Fold Data and Labels → Model
Model + Validation-Fold Data → Prediction, compared against the Validation-Fold Labels → Performance
Performance = (1/5) Σ_{i=1}^{5} Performance_i

Slide 21

Net input function (weighted sum) → logistic (sigmoid) function → quantizer → predicted class label ŷ
Inputs 1, x_1, x_2, ..., x_m with model parameters w_0, w_1, w_2, ..., w_m
Logistic cost compares ŷ against the true class label y → update the model parameters w
Hyperparameters: number of iterations; L2-regularization strength λ (penalty (λ/2)·‖w‖²)
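A minimal from-scratch sketch of this diagram, assuming plain gradient-descent logistic regression with a (λ/2)·‖w‖² penalty (all names and the toy data are mine):

```python
import numpy as np

def fit_logistic(X, y, eta=0.3, n_iter=5000, lam=0.01, seed=1):
    """Gradient-descent logistic regression with L2 penalty (lam/2)*||w||^2."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # model parameters w_1..w_m
    b = 0.0                                      # bias unit w_0
    for _ in range(n_iter):                      # number of iterations (hyperparameter)
        z = X @ w + b                            # net input function (weighted sum)
        p = 1.0 / (1.0 + np.exp(-z))             # logistic (sigmoid) function
        err = p - y                              # gradient of logistic cost w.r.t. z
        w -= eta * (X.T @ err / len(y) + lam * w)  # update; lam*w is the L2 term
        b -= eta * err.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b >= 0.0).astype(int)        # quantizer -> predicted class label

# toy 1-D, linearly separable data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = fit_logistic(X, y)
print(predict(X, w, b))
```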

Slide 22

The law of parsimony
The 1-standard-error method
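A possible reading of the 1-standard-error method in code (my illustration; "simpler" here is taken to mean a larger neighborhood size k): among the candidates, pick the simplest model whose cross-validation score is within one standard error of the best score.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
candidates = [1, 3, 5, 7, 9, 11]  # larger k == smoother decision boundary
stats = {}
for k in candidates:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    # mean CV score and its standard error across the 5 folds
    stats[k] = (scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores)))

best_mean = max(m for m, _ in stats.values())
best_se = [se for m, se in stats.values() if m == best_mean][0]
# the simplest (largest-k) model still within one SE of the best score
chosen = max(k for k, (m, _) in stats.items() if m >= best_mean - best_se)
print(f"chosen k = {chosen}")
```

Favoring the simplest model inside the one-standard-error band is the law of parsimony applied to noisy CV estimates.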

Slide 23

The law of parsimony
The 1-standard-error method

Slide 24

No content

Slide 25

K-fold for Model Selection, step by step
Step 1: split the Data and Labels into Training Data/Labels and Test Data/Labels.

Slide 26

Step 2: apply the Learning Algorithm with each set of candidate hyperparameter values to the Training Data and Labels, yielding one Performance estimate per candidate.

Slide 27

Step 3: Learning Algorithm + Best Hyperparameter Values + Training Data/Labels → Model.

Slide 28

Step 4: Model + Test Data → Prediction; compare against the Test Labels → Performance.

Slide 29

Step 5: Learning Algorithm + Best Hyperparameter Values + all Data/Labels → Final Model.
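The five steps above can be sketched with scikit-learn's GridSearchCV (the dataset and grid are my assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)             # step 1

gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={"n_neighbors": [1, 3, 5, 7]},
                  cv=5)
gs.fit(X_train, y_train)       # step 2 (CV over candidates) + step 3 (refit=True)
test_acc = gs.score(X_test, y_test)                              # step 4
final_model = KNeighborsClassifier(**gs.best_params_).fit(X, y)  # step 5
print(f"best params: {gs.best_params_}, test accuracy: {test_acc:.3f}")
```

Note that the test set is touched exactly once (step 4); the final model in step 5 is then refit on all of the data for deployment.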

Slide 30

Nested Cross-Validation for Algorithm Selection
Outer loop (5 folds): Outer Training Fold / Outer Validation Fold → Performance_1 ... Performance_5
Inner loop (2 folds) within each outer training fold, e.g., Performance_{5,1} and Performance_{5,2}: (1/2) Σ_{j=1}^{2} Performance_{5,j} selects the Best Algorithm / Best Model
Performance = (1/10) Σ_{i=1}^{10} Performance_i
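A hedged nested-CV sketch in scikit-learn, following the 5 outer / 2 inner fold counts of the slide (here the inner loop tunes a hyperparameter rather than choosing among algorithms, an assumption for brevity):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)  # inner loop
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # outer loop

# the inner CV selects the best candidate on each outer training fold ...
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={"n_neighbors": [1, 3, 5]},
                  cv=inner)
# ... and the outer CV scores that whole selection procedure
outer_scores = cross_val_score(gs, X, y, cv=outer)
print(f"nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Because tuning happens strictly inside each outer training fold, the outer estimate is not biased by the hyperparameter search.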

Slide 31

Beyond Performance Metrics
Ideal features are ...
•  discriminatory
•  salient
•  invariant

Slide 32

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Knowledge Discovery and Data Mining (KDD).

Slide 33

THANK YOU!

Slide 34

https://github.com/rasbt
[email protected]
http://sebastianraschka.com
@rasbt