Intro to Leave One Out Cross Validation – LMU 2022

Suppose you have two or more models that fit your data reasonably well. How do you choose which model performs best on unseen data? In this short, practical tutorial, I will introduce Leave-One-Out Cross-Validation (LOO-CV) as a technique for estimating a model’s predictive accuracy, and briefly show stacking for model selection. The talk is built around a series of example plots generated by a pedagogical Python notebook, which you can experiment with at:

https://gist.github.com/bmorris3/a69842ce9384966feba965eb0d726da6

Brett Morris

March 18, 2022

Transcript

  1. A Practical Introduction to Leave One Out Cross Validation
     Brett Morris, Universität Bern
     Demo Notebook: https://gist.github.com/bmorris3/a69842ce9384966feba965eb0d726da6
  2. Goals
     1. Prerequisite: take a model described in a probabilistic programming language (e.g. PyMC3, numpyro), draw posterior samples, and compute the pointwise log likelihood
     2. Compute the leave-one-out predictive density
     3. Compute the expected LOO predictive density
        • Compare (2) and (3)
     4. Trick: use Pareto smoothing to
        • “correct” the weights for importance sampling
        • give warnings when there are strong outliers
     5. Use Bayesian model stacking to compare models via LOO-CV
     (See the workflow sketch after the transcript.)
  3. Log pointwise predictive density

     $$\mathrm{lppd}_{\mathrm{LOO}} = \log \prod_{i=1}^{n} \int p(y_i \mid \theta)\, p(\theta \mid y)\, \mathrm{d}\theta \approx \sum_{i=1}^{n} \log\!\left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta^s) \right) = \sum_{i=1}^{n} \left( \log \sum_{s=1}^{S} p(y_i \mid \theta^s) - \log S \right)$$

     where $p(y_i \mid \theta^s)$ is the pointwise likelihood and $\theta^s \sim p(\theta \mid y)$, $s = 1, 2, \ldots, S$, are posterior samples. (See the lppd sketch after the transcript.)
  4. Log pointwise predictive density (slide repeated; same formula and notation as slide 3)
  5. Expected log pointwise predictive density

     The expected log predictive density without point $i$ is
     $$\mathrm{elpd}_{\mathrm{LOO}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}),$$
     where the predictive density without $i$ is
     $$p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, \mathrm{d}\theta.$$
     Importance sampling with importance ratios:
     $$w_i^s \propto \frac{\prod_{j} p(y_j \mid \theta^s)\, p(\theta^s)}{\prod_{i} p(y_i \mid \theta^s)\, p(\theta^s)} = \frac{1}{p(y_i \mid \theta^s)}, \qquad j = \{1, \ldots, i-1, i+1, \ldots, N\},$$
     so $\log w_i^s = -\log p(y_i \mid \theta^s)$, i.e. the negative pointwise log likelihood. (See the importance-sampling sketch after the transcript.)
     [Plot legend: Quadratic model, Importance weighted]
  6. Pareto smoothed importance sampling

     [Plot annotated from low ln(like) (left) to high ln(like) (right).]

     The generalized Pareto density used for the smoothing:
     $$p(y \mid u, \sigma, \hat{k}) = \begin{cases} \dfrac{1}{\sigma}\left(1 + \hat{k}\,\dfrac{y - u}{\sigma}\right)^{-\frac{1}{\hat{k}} - 1}, & \hat{k} \neq 0, \\[1ex] \dfrac{1}{\sigma}\exp\!\left(-\dfrac{y - u}{\sigma}\right), & \hat{k} = 0. \end{cases}$$
  7. Pareto smoothed importance sampling (same plot and generalized Pareto density as slide 6): outliers show large $\hat{k}$. (See the diagnostics sketch after the transcript.)
  8. Results
     $\mathrm{elpd}_{\mathrm{LOO}}$ is the importance-weighted log likelihood; $\mathrm{lppd}_{\mathrm{LOO}} - \mathrm{elpd}_{\mathrm{LOO}}$ is the effective number of free parameters. Sanity check #2!
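
Workflow sketch (referenced from slide 2). This is a minimal sketch of the steps listed on slide 2, not the deck's own notebook: it assumes two already-fitted models stored as ArviZ InferenceData objects (the names idata_linear and idata_quadratic are placeholders) that contain a pointwise log_likelihood group, as PyMC3 and numpyro can produce. ArviZ then handles PSIS-LOO and stacking-based comparison:

    import arviz as az

    # PSIS-LOO for each model; pointwise=True also returns the per-observation
    # Pareto k-hat diagnostics discussed on slides 6-7.
    loo_linear = az.loo(idata_linear, pointwise=True)
    loo_quadratic = az.loo(idata_quadratic, pointwise=True)
    print(loo_linear)  # reports elpd_loo, p_loo, and Pareto k-hat warnings

    # Bayesian model stacking via the LOO predictive densities (slide 2, goal 5)
    comparison = az.compare(
        {"linear": idata_linear, "quadratic": idata_quadratic},
        ic="loo",
        method="stacking",
    )
    print(comparison)  # stacking weights and elpd_loo differences

The stacking weights sum to one and can be read as how much each model should contribute to an optimal predictive mixture.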
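
lppd sketch (referenced from slides 3–4). The computed form of the log pointwise predictive density, $\sum_i (\log \sum_s p(y_i \mid \theta^s) - \log S)$, translates into a few lines of NumPy/SciPy. The array name log_lik and its shape (S, n) (posterior samples by data points) are my conventions, not the notebook's:

    import numpy as np
    from scipy.special import logsumexp

    def lppd(log_lik):
        """Log pointwise predictive density.

        log_lik : array of shape (S, n) holding log p(y_i | theta^s).
        Implements sum_i [ logsumexp_s log p(y_i | theta^s) - log S ].
        """
        S = log_lik.shape[0]
        return np.sum(logsumexp(log_lik, axis=0) - np.log(S))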
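
Importance-sampling sketch (referenced from slide 5). With importance ratios $w_i^s \propto 1 / p(y_i \mid \theta^s)$, the self-normalized importance-sampling estimate of $p(y_i \mid y_{-i})$ reduces to a harmonic mean of the per-sample likelihoods, i.e. $\log S - \mathrm{logsumexp}_s\,(-\log p(y_i \mid \theta^s))$. A sketch of the raw (un-smoothed) estimate, using the same log_lik convention as above:

    import numpy as np
    from scipy.special import logsumexp

    def elpd_loo_raw(log_lik):
        """Raw importance-sampled elpd_loo (no Pareto smoothing).

        log w_i^s = -log p(y_i | theta^s), so the self-normalized estimate of
        p(y_i | y_-i) is S / sum_s [1 / p(y_i | theta^s)], a harmonic mean.
        """
        S = log_lik.shape[0]
        pointwise = np.log(S) - logsumexp(-log_lik, axis=0)
        return pointwise.sum()

As slides 6–7 emphasize, these raw weights can have very heavy tails when some observations are outliers, which is what the Pareto smoothing corrects.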
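
Diagnostics sketch (referenced from slides 6–8). ArviZ's az.loo applies the Pareto smoothing automatically and exposes the per-observation $\hat{k}$ values, and the effective number of parameters from slide 8 can be recomputed by hand from the two helpers above. Here idata is again a placeholder InferenceData, the observed variable name "y" is an assumption, and the 0.7 threshold is the commonly quoted rule of thumb rather than anything from the deck:

    import numpy as np
    import arviz as az

    loo_res = az.loo(idata, pointwise=True)  # Pareto smoothed importance sampling

    # Slides 6-7: observations with large k-hat have unreliable importance weights
    k_hat = np.asarray(loo_res.pareto_k)
    print("observations with k-hat > 0.7:", np.where(k_hat > 0.7)[0])

    # Slide 8, sanity check #2: effective number of free parameters,
    # p_loo = lppd_LOO - elpd_LOO, recomputed with the (un-smoothed) helpers
    # sketched above, so it will differ slightly from the PSIS value.
    log_lik = (
        idata.log_likelihood["y"]            # assumes the observed variable is called "y"
        .stack(sample=("chain", "draw"))
        .transpose("sample", ...)
        .values
    )
    print("p_loo (manual):", lppd(log_lik) - elpd_loo_raw(log_lik))
    print("p_loo (ArviZ): ", loo_res.p_loo)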