Wednesday, November 30, 2016
6:30 PM to 7:00 PM
441 N 5th St, Suite 301, Philadelphia, PA
Machine Learning and Performance Evaluation
Sebastian Raschka
DATAPHILLY
Slide 2
Estimating the Performance of
Predictive Models
Why bother?
Slide 3
① Generalization Performance
② Model Selection
③ Algorithm Selection
Slide 4
[Figure: dartboard analogy, target y]
Slide 5
[Figure: dartboard analogy, target y, with bias and variance annotated]
Slide 6
Bias = E[θ̂] − θ
Variance = E[(θ̂ − E[θ̂])²]
Low Variance (Precise) vs. High Variance (Not Precise)
Low Bias (Accurate) vs. High Bias (Not Accurate)
BIAS: the difference between the expected estimate E[θ̂] and the true value θ
VARIANCE: the expected squared deviation of the estimate θ̂ from its expected value E[θ̂]
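The bias and variance definitions above can be checked numerically. A minimal pure-Python sketch, assuming a hypothetical setup where the estimator θ̂ is the sample mean of n draws from Uniform(0, 1), whose true mean is θ = 0.5:

```python
import random
import statistics

# Hypothetical setup: theta_hat is the sample mean of n draws from
# Uniform(0, 1); the true mean is theta = 0.5.
random.seed(0)
theta = 0.5
n, repeats = 20, 2000

estimates = [statistics.mean(random.random() for _ in range(n))
             for _ in range(repeats)]

expected_estimate = statistics.mean(estimates)        # E[theta_hat]
bias = expected_estimate - theta                      # E[theta_hat] - theta
variance = statistics.mean(                           # E[(theta_hat - E[theta_hat])^2]
    (e - expected_estimate) ** 2 for e in estimates)

print(round(bias, 4), round(variance, 5))
```

The sample mean is an unbiased estimator, so the simulated bias hovers near zero while the variance stays close to (1/12)/n.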
Slide 7
Performance Estimates
– Absolute vs Relative
Slide 8
① Generalization Performance
② Model Selection
③ Algorithm Selection
Slide 9
Sources of Bias and Variance
[Figure: the dataset drawn as TRAIN and TEST portions]
Slide 10
[Figure: TRAIN | TEST split]
Slide 11
[Figure: TRAIN | TEST split]
Pessimistic Bias
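The holdout split behind this slide can be sketched in a few lines of pure Python (hypothetical data: 100 indices standing in for labeled examples). Holding out test data is what makes the estimate pessimistically biased, since the model is trained on less data than is ultimately available:

```python
import random

# Pure-Python holdout sketch; the "data" is a hypothetical list of
# 100 example indices.
random.seed(1)
indices = list(range(100))
random.shuffle(indices)

split = int(0.7 * len(indices))        # 70% train / 30% test
train_idx, test_idx = indices[:split], indices[split:]

print(len(train_idx), len(test_idx))   # 70 30
```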
Slide 12
* Softmax classifier on a small MNIST subset
Slide 13
[Figure: TRAIN | TEST split]
Slide 14
[Figure: TRAIN | TEST split]
Pessimistic Bias
Slide 15
[Figure: TRAIN | TEST split]
Pessimistic Bias
Variance
Slide 16
[Figure: samples 1-3 drawn from the real-world distribution, each split into Train (70%) and Test (30%), comparing n=1000 with n=100]
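The n=1000 vs. n=100 comparison can be simulated directly: with a smaller sample, the 30% test-set estimate varies more across random draws. A toy sketch (hypothetical data: Bernoulli(0.6) labels, where the "performance" estimated is simply the label rate p):

```python
import random
import statistics

# Repeat a 70/30 split many times and measure how much the test-set
# estimate of p spreads out, for a small vs. a large sample.
def holdout_estimates(n, repeats=200, p=0.6):
    estimates = []
    for _ in range(repeats):
        sample = [1 if random.random() < p else 0 for _ in range(n)]
        test = sample[int(0.7 * n):]             # the 30% test portion
        estimates.append(statistics.mean(test))
    return estimates

random.seed(3)
spread_small = statistics.stdev(holdout_estimates(100))
spread_large = statistics.stdev(holdout_estimates(1000))
print(spread_small > spread_large)
```

The spread for n=100 comes out several times larger than for n=1000, which is the variance problem the resampling approaches on the following slides address.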
Resampling
[Figure: in each resampling iteration, the learning algorithm is fit to the training-fold data and labels under the chosen hyperparameter values; the resulting model predicts on the validation-fold data, and the predictions are compared with the validation-fold labels to yield a performance estimate]
K Iterations (K-Folds)
[Figure: 5-fold cross-validation; in each of the five iterations (1st-5th), a different fold serves as the validation fold and the remaining folds as training folds, giving performance estimates 1-5]
Performance = (1/5) ∑_{i=1}^{5} Performance_i
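The k-fold scheme can be sketched as an index generator (pure Python, no libraries; `k_fold_indices` is a hypothetical helper name, not a library function):

```python
# Pure-Python sketch of k-fold index generation.
def k_fold_indices(n, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 5))
# Every example appears in exactly one validation fold:
print([val for _, val in folds])
```

Averaging the k per-fold performance values then gives the overall estimate shown in the formula above.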
Slide 21
[Figure: logistic regression. Inputs 1, x₁, x₂, …, x_m are weighted by the model parameters w₀, w₁, w₂, …, w_m in the net input function (weighted sum Σ), passed through the logistic (sigmoid) function, and a quantizer outputs the predicted class label ŷ. The logistic cost, computed from ŷ and the true class label y, drives the update of the model parameters over a number of iterations; λ/2 scales the L2-regularization strength on w]
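The regularized cost from the diagram can be written out directly. A minimal sketch, assuming binary labels y ∈ {0, 1} and treating w[0] as the bias unit (input x₀ = 1); all function names here are illustrative, not from a library:

```python
import math

def net_input(w, x):
    # Weighted sum: w0 * 1 + w1 * x1 + ... + wm * xm
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, X, y, lam):
    """Logistic (cross-entropy) cost plus (lam/2) * L2 penalty on w[1:]."""
    data_term = sum(
        -yi * math.log(sigmoid(net_input(w, xi)))
        - (1 - yi) * math.log(1 - sigmoid(net_input(w, xi)))
        for xi, yi in zip(X, y))
    l2_term = (lam / 2.0) * sum(wi ** 2 for wi in w[1:])
    return data_term + l2_term

# With all-zero weights every prediction is 0.5, so the cost is 2*ln(2):
print(round(cost([0.0, 0.0], [[0.0], [1.0]], [0, 1], lam=1.0), 4))
```

The λ hyperparameter is exactly the kind of value that model selection via cross-validation has to tune.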
Slide 22
The law of parsimony
1-standard error method
Slide 23
The law of parsimony
1-standard error method
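The 1-standard error method can be sketched in a few lines: among the candidate models, pick the simplest one whose mean CV score lies within one standard error of the best mean. A sketch with hypothetical 5-fold CV accuracies for three decision-tree depths, listed from simplest to most complex:

```python
import statistics

# Hypothetical 5-fold CV accuracies, simplest model first.
cv_scores = {
    "depth=1": [0.70, 0.72, 0.71, 0.69, 0.73],
    "depth=3": [0.88, 0.90, 0.89, 0.87, 0.91],
    "depth=9": [0.92, 0.88, 0.93, 0.87, 0.90],
}

def mean_se(scores):
    # Mean and standard error of the fold scores.
    return statistics.mean(scores), statistics.stdev(scores) / len(scores) ** 0.5

best_mean, best_se = max(mean_se(s) for s in cv_scores.values())

# Simplest candidate whose mean lies within one SE of the best mean:
chosen = next(name for name, s in cv_scores.items()
              if mean_se(s)[0] >= best_mean - best_se)
print(chosen)
```

Here the deepest tree has the best mean, but the simpler depth-3 tree falls within one standard error of it, so parsimony favors depth 3.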
Slide 24
Slide 25
K-fold for Model Selection step-by-step
Step 1: [Figure: Data and Labels are split into Training Data/Labels and Test Data/Labels]
Slide 26
Step 2: [Figure: the learning algorithm is run on the Training Data and Labels once per candidate set of hyperparameter values, yielding a performance estimate for each]
Slide 27
Step 3: [Figure: the learning algorithm is fit to the Training Data and Labels with the best hyperparameter values, producing a model]
Slide 28
Step 4: [Figure: the model predicts on the Test Data, and the predictions are compared with the Test Labels to estimate performance]
Slide 29
Step 5: [Figure: the final model is fit to the complete Data and Labels with the best hyperparameter values]
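The five steps above can be strung together in a toy end-to-end sketch (pure Python; the "model" is just a decision threshold on one feature, and the grid of thresholds is a hypothetical stand-in for real hyperparameter values):

```python
import random
import statistics

random.seed(2)
# Toy 1-D data: class 0 ~ Uniform(0, 1), class 1 ~ Uniform(0.4, 1.4)
data = [(random.random(), 0) for _ in range(50)] + \
       [(0.4 + random.random(), 1) for _ in range(50)]
random.shuffle(data)

split = int(0.7 * len(data))                  # Step 1: train/test split
train, test = data[:split], data[split:]

def accuracy(t, examples):
    # Classify x as class 1 when x >= t and compare with the labels.
    return statistics.mean((x >= t) == (y == 1) for x, y in examples)

grid = [0.3, 0.5, 0.7, 0.9]                   # candidate hyperparameter values

def cv_score(t, examples, k=5):
    # Step 2: average validation accuracy over k folds (the toy
    # "model" needs no fitting, so only the validation part remains).
    size = len(examples) // k
    return statistics.mean(
        accuracy(t, examples[i * size:(i + 1) * size]) for i in range(k))

best_t = max(grid, key=lambda t: cv_score(t, train))  # Steps 2-3: pick best
test_acc = accuracy(best_t, test)                     # Step 4: test estimate
final_t = best_t                                      # Step 5: "refit" on all data
print(best_t, round(test_acc, 2))
```

A real pipeline would refit the learner inside each fold and again on the full dataset in step 5; the toy threshold model skips the fitting but preserves the data flow of the five steps.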
Slide 30
Nested Cross-Validation for Algorithm Selection
[Figure: an outer loop of five folds (outer training and validation folds, iterations 1st-5th) wraps an inner loop of two folds (inner training and validation folds). The inner loop selects hyperparameters, e.g. Performance₅ = (1/2) ∑_{j=1}^{2} Performance₅,ⱼ; the outer-fold estimates are averaged as Performance = (1/10) ∑_{i=1}^{10} Performanceᵢ to choose the best algorithm and best model]
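The nested loop structure can be sketched as follows (pure Python; `evaluate` is a hypothetical placeholder returning a deterministic score, where a real version would fit the algorithm with the given params on the training indices and score it on the validation indices):

```python
# Structural sketch of nested CV (5 outer folds, 2 inner folds, as on
# the slide): the outer loop estimates generalization performance, the
# inner loop tunes hyperparameters.
def evaluate(algorithm, params, train_idx, val_idx):
    return 0.5 + 0.01 * params          # deterministic stand-in score

def nested_cv(algorithm, param_grid, n, outer_k=5, inner_k=2):
    outer_size = n // outer_k
    outer_scores = []
    for i in range(outer_k):
        outer_val = list(range(i * outer_size, (i + 1) * outer_size))
        outer_train = [j for j in range(n) if j not in outer_val]

        def inner_score(p):
            # Inner loop: 2-fold CV on the outer-training fold only.
            size = len(outer_train) // inner_k
            return sum(
                evaluate(algorithm, p,
                         outer_train[:k * size] + outer_train[(k + 1) * size:],
                         outer_train[k * size:(k + 1) * size])
                for k in range(inner_k)) / inner_k

        best_params = max(param_grid, key=inner_score)
        outer_scores.append(evaluate(algorithm, best_params,
                                     outer_train, outer_val))
    return sum(outer_scores) / outer_k  # averaged outer-fold performance

result = nested_cv("toy-algo", [1, 2, 3], n=100)
print(round(result, 2))
```

Running this once per candidate learning algorithm and comparing the averaged outer-fold scores is the algorithm-selection step the slide describes.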
Slide 31
Beyond Performance Metrics
Ideal features that are ...
• discriminatory
• salient
• invariant
Slide 32
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Knowledge Discovery and Data Mining (KDD).