Modeling Social Data, Lecture 8: Regression, Part 2

Model complexity and generalization APAM E4990 Modeling Social Data Jake
Hofman Columbia University March 15, 2019 Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 1 / 1

Overﬁtting (a la xkcd) Jake Hofman (Columbia University) Model complexity
and generalization March 15, 2019 2 / 1

Overﬁtting (a la xkcd) Jake Hofman (Columbia University) Model complexity
and generalization March 15, 2019 3 / 1

Complexity Our models should be complex enough to explain the
past, but simple enough to generalize to the future Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 4 / 1

Bias-variance tradeoﬀ Jake Hofman (Columbia University) Model complexity and generalization
March 15, 2019 5 / 1

Bias-variance tradeoff 38 2. Overview of Supervised Learning High Bias
Low Variance Low Bias High Variance Prediction Error Model Complexity Training Sample Test Sample Low High FIGURE 2.11. Test and training error as a function of model complexity. be close to f(x0 ). As k grows, the neighbors are further away, and then anything can happen. The variance term is simply the variance of an average here, and de- creases as the inverse of k. So as k varies, there is a bias–variance tradeoff. Simple models may be “wrong” (high bias), but fits don’t vary a lot with different samples of training data (low variance) Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 6 / 1

Bias-variance tradeoﬀ 38 2. Overview of Supervised Learning High Bias
Low Variance Low Bias High Variance Prediction Error Model Complexity Training Sample Test Sample Low High FIGURE 2.11. Test and training error as a function of model complexity. be close to f(x0 ). As k grows, the neighbors are further away, and then anything can happen. The variance term is simply the variance of an average here, and de- creases as the inverse of k. So as k varies, there is a bias–variance tradeoﬀ. Flexible models can capture more complex relationships (low bias), but are also sensitive to noise in the training data (high variance) Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 6 / 1

Bigger models = Better models Jake Hofman (Columbia University) Model
complexity and generalization March 15, 2019 7 / 1

Cross-validation set error of the final chosen model will underestimate
the true test error, sometimes substantially. It is difficult to give a general rule on how to choose the number of observations in each of the three parts, as this depends on the signal-to- noise ratio in the data and the training sample size. A typical split might be 50% for training, and 25% each for validation and testing: Test Train Validation Test Train Validation Test Validation Train Validation Test Train The methods in this chapter are designed for situations where there is insufficient data to split it into three parts. Again it is too difficult to give a general rule on how much training data is enough; among other things, this depends on the signal-to-noise ratio of the underlying function, and the complexity of the models being fit to the data. • Randomly split our data into three sets • Fit models on the training set • Use the validation set to find the best model • Quote final performance of this model on the test set Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 8 / 1

K-fold cross-validation Estimates of generalization error from one train /
validation split can be noisy, so shuﬄe data and average over K distinct validation partitions instead Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 9 / 1

K-fold cross-validation: pseudocode (randomly) divide the data into K parts
for each model for each of the K folds train on everything but one fold measure the error on the held out fold store the training and validation error compute and store the average error across all folds pick the model with the lowest average validation error evaluate its performance on a final, held out test set Jake Hofman (Columbia University) Model complexity and generalization March 15, 2019 10 / 1

Modeling Social Data, Lecture 8: Regression, Pa...

Modeling Social Data, Lecture 8: Regression, Part 2

Jake Hofman

More Decks by Jake Hofman

Other Decks in Education

Featured

Transcript

Model complexity and generalization APAM E4990 Modeling Social Data Jake

Overﬁtting (a la xkcd) Jake Hofman (Columbia University) Model complexity

Overﬁtting (a la xkcd) Jake Hofman (Columbia University) Model complexity

Complexity Our models should be complex enough to explain the

Bias-variance tradeoﬀ Jake Hofman (Columbia University) Model complexity and generalization

Bias-variance tradeoﬀ 38 2. Overview of Supervised Learning High Bias

Bias-variance tradeoﬀ 38 2. Overview of Supervised Learning High Bias

Bigger models = Better models Jake Hofman (Columbia University) Model

Cross-validation set error of the ﬁnal chosen model will underestimate

K-fold cross-validation Estimates of generalization error from one train /

K-fold cross-validation: pseudocode (randomly) divide the data into K parts