Statistical Rethinking 2022 Lecture 03

by Richard McElreath

Slide 1

Slide 1 text

Statistical Rethinking 03: Geocentric Models 2022

Slide 2

Slide 2 text

Mars: June 2020 until Feb 2021 — Tunç Tezel — https://vimeo.com/user48630149

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

MARS EARTH

Slide 5

Slide 5 text

MARS EARTH Geocentric Model

Slide 6

Slide 6 text

Prediction Without Explanation Geocentric Model

Slide 7

Slide 7 text

Geocentrism • Descriptively accurate • Mechanistically wrong • General method of approximation • Known to be wrong Linear Regression • Descriptively accurate • Mechanistically wrong • General method of approximation • Taken too seriously

Slide 8

Slide 8 text

Simple statistical golems Model of mean and variance of variable Mean as weighted sum of other variables Many special cases: ANOVA, ANCOVA, t-test, MANOVA Can also be generative models Linear Regression From Breath of Bones: A Tale of the Golem

Slide 9

Slide 9 text

1809 Bayesian argument for normal error and least-squares estimation

Slide 10

Slide 10 text

Positions Distribution

Slide 11

Slide 11 text

Why Normal? Two arguments (1) Generative: Summed fluctuations tend towards normal distribution (2) Statistical: For estimating mean and variance, normal distribution is least informative distribution (maxent) Variable does not have to be normally distributed for normal model to be useful

Slide 12

Slide 12 text

Making Normal Models Goals: (1) Language for representing models (2) How to calculate bigger posterior distributions (3) Constructing & understanding linear models

Slide 13

Slide 13 text

FLOW

Slide 14

Slide 14 text

Language for modeling Revisit globe tossing model: W ~ Binomial(N, p) p ~ Uniform(0, 1)

Slide 15

Slide 15 text

Language for modeling Revisit globe tossing model: outcome W ~ Binomial(N, p) p ~ Uniform(0, 1)

Slide 16

Slide 16 text

Language for modeling Revisit globe tossing model: outcome “is distributed” W ~ Binomial(N, p) p ~ Uniform(0, 1)

Slide 17

Slide 17 text

Language for modeling Revisit globe tossing model: outcome “is distributed” W ~ Binomial(N, p) p ~ Uniform(0, 1) data distribution (likelihood)

Slide 18

Slide 18 text

Language for modeling Revisit globe tossing model: outcome parameter  to estimate “is distributed” W ~ Binomial(N, p) p ~ Uniform(0, 1) data distribution (likelihood)

Slide 19

Slide 19 text

Language for modeling Revisit globe tossing model: outcome parameter  to estimate prior distribution “is distributed” W ~ Binomial(N, p) p ~ Uniform(0, 1) data distribution (likelihood)

Slide 20

Slide 20 text

Now make it compute — arrange as probability statements Binomial( Uniform( W ~ p ~ N, p) 0, 1)

Slide 21

Slide 21 text

Now make it compute — arrange as probability statements W p Pr(W|N, p) = Pr(p) = Binomial( Uniform( N, p) 0, 1) | |

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Now make it compute — using code Pr(p|W, N) ∝ Binomial(W|N, p) Uniform(p|0,1) W <- 6 N <- 9 p <- seq(from=0,to=1,len=1000) PrW <- dbinom(W,N,p) Prp <- dunif(p,0,1) posterior <- PrW * Prp

Slide 24

Slide 24 text

Linear Regression Drawing the Owl (1) Question/goal/estimand (2) Scientific model (3) Statistical model(s) (4) Validate model (5) Analyze data

Slide 25

Slide 25 text

Linear Regression Drawing the Owl (1) Question/goal/estimand (2) Scientific model (3) Statistical model(s) (4) Validate model (5) Analyze data library(rethinking) data(Howell1) 60 80 100 140 180 10 20 30 40 50 60 height (cm) weight (kg)

Slide 26

Slide 26 text

Linear Regression Drawing the Owl (1) Question/goal/estimand Describe association between weight and height library(rethinking) data(Howell1) 60 80 100 140 180 10 20 30 40 50 60 height (cm) weight (kg)

Slide 27

Slide 27 text

Linear Regression Drawing the Owl (1) Question/goal/estimand Describe association between ADULT weight and height data(Howell1) d <- Howell1[Howell1$age>=18,] 140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg)

Slide 28

Slide 28 text

Linear Regression (2) Scientific model How does height influence weight? 140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg) data(Howell1) d <- Howell1[Howell1$age>=18,] H W W = f(H) “Weight is some function of height”

Slide 29

Slide 29 text

Generative models Options

Slide 30

Slide 30 text

Generative models Options (1) Dynamic: Incremental growth of organism; both mass and height (length) derive from growth pattern; Gaussian variation result of summed fluctuations

Slide 31

Slide 31 text

Generative models Options (1) Dynamic: Incremental growth of organism; both mass and height (length) derive from growth pattern; Gaussian variation result of summed fluctuations (2) Static: Changes in height result in changes in weight, but no mechanism; Gaussian variation result of growth history

Slide 32

Slide 32 text

y i = α + βx i index intercept slope Anatomy of a linear model

Slide 33

Slide 33 text

μ i = α + βx i y i ∼ Normal(μ i , σ) expectation standard  deviation Anatomy of a linear model “Each x value has a different expectation, E(y|x) = µ”

Slide 34

Slide 34 text

Generative model: H → W μ i = α + βH i W i ∼ Normal(μ i , σ) alpha <- 0 beta <- 0.5 sigma <- 5 n_individuals <- 100 H <- runif(n_individuals,130,170) mu <- alpha + beta*H W <- rnorm(n_individuals,mu,sigma)

Slide 35

Slide 35 text

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Slide 38

Slide 38 text

Slide 39

Slide 39 text

PAUSE

Slide 40

Slide 40 text

Linear Regression Drawing the Owl (1) Question/goal/estimand (2) Scientific model (3) Statistical model(s) (4) Validate model (5) Analyze data 140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg) data(Howell1) d <- Howell1[Howell1$age>=18,]

Slide 41

Slide 41 text

μ i = α + βx i y i ∼ Normal(μ i , σ) Anatomy of a linear model α ∼ Normal(0,1) β ∼ Normal(0,1) σ ∼ Uniform(0,1)

Slide 42

Slide 42 text

Sampling the prior distribution

Slide 43

Slide 43 text

Sampling the prior distribution n_samples <- 10 alpha <- rnorm(n_samples,0,1) beta <- rnorm(n_samples,0,1) plot(NULL,xlim=c(-2,2),ylim=c(-2,2), xlab="x",ylab="y") for ( i in 1:n_samples ) abline(alpha[i],beta[i],lwd=4,col=2) -2 -1 0 1 2 -2 -1 0 1 2 x y

Slide 44

Slide 44 text

Updating the posterior

Slide 45

Slide 45 text

Statistical model for H → W Structure of statistical model similar to generative model, BUT (1) Useful to re-scale variables (2) Must think about priors These two things go together μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(?, ?) β ∼ Normal(?, ?) σ ∼ Uniform(0,?)

Slide 46

Slide 46 text

Statistical model for H → W Re-scaling height so that the intercept makes sense μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) H i − ¯ H α value of µ when H i − ¯ H = 0 mean value of H i

Slide 47

Slide 47 text

Statistical model for H → W Now what are scientifically reasonable priors? : average adult weight : kilograms per centimeter μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(60,10) β ∼ Normal(0,10) σ ∼ Uniform(0,10)

Slide 48

Slide 48 text

Sampled regression lines n <- 10 alpha <- rnorm(n,60,10) beta <- rnorm(n,0,10) Hbar <- 150 Hseq <- seq(from=130,to=170,len=30) plot(NULL,xlim=c(130,170),ylim=c(10,100), xlab="height (cm)",ylab="weight (kg)") for ( i in 1:n ) lines( Hseq , alpha[i] + beta[i]*(Hseq-Hbar) , lwd=3 , col=2 )

Slide 49

Slide 49 text

Slide 50

Slide 50 text

Slide 51

Slide 51 text

Statistical model for H → W μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(60,10) β ∼ LogNormal(0,1) σ ∼ Uniform(0,10) 0 1 2 3 4 5 0.0 0.5 1.0 1.5 simulated b values Density

Slide 52

Slide 52 text

Sampled regression lines n <- 10 alpha <- rnorm(n,60,10) beta <- rlnorm(n,0,1) Hbar <- 150 Hseq <- seq(from=130,to=170,len=30) plot(NULL,xlim=c(130,170),ylim=c(10,100), xlab="height (cm)",ylab="weight (kg)") for ( i in 1:n ) lines( Hseq , alpha[i] + beta[i]*(Hseq-Hbar) , lwd=3 , col=2 ) 130 140 150 160 170 20 40 60 80 100 height (cm) weight (kg)

Slide 53

Slide 53 text

Sermon on Priors There are no correct priors, only scientifically justifiable priors Justify with information outside the data — like rest of model Priors not so important in simple linear models But need to practice now: simulate, understand, expand 130 140 150 160 170 20 40 60 80 100 height (cm) weight (kg)

Slide 54

Slide 54 text

Fitting the model Posterior is Pr(α, β, σ|W, H) μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(60,10) β ∼ LogNormal(0,1) σ ∼ Uniform(0,10) Pr(W i |μ i , σ) Pr(α) Pr(β) Pr(σ)

Slide 55

Slide 55 text

Slide 56

Slide 56 text

30 adults from Howell1

Slide 57

Slide 57 text

Approximate posterior Many posterior distributions are approximately Gaussian Instead of grid approximation, Gaussian approximation Sometimes called quadratic or Laplace approximation See page 41 in book for more detail

Slide 58

Slide 58 text

Slide 59

Slide 59 text

Simulation-Based Validation Bare minimum: Test statistical model with simulated observations from scientific model Golem might be broken Even working golems might not deliver what you hoped Strong test: Simulation-Based Calibration Fahrvergnügen

Slide 60

Slide 60 text

Model formula W ~ dnorm(mu,sigma), mu <- a + b*(H-Hbar), a ~ dnorm(60,10), b ~ dlnorm(0,1), sigma ~ dunif(0,10) μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(60,10) β ∼ LogNormal(0,1) σ ∼ Uniform(0,10)

Slide 61

Slide 61 text

First validate with simulation alpha <- 70 beta <- 0.5 sigma <- 5 n_individuals <- 100 H <- runif(n_individuals,130,170) mu <- alpha + beta*(H-mean(H)) W <- rnorm(n_individuals,mu,sigma) dat <- list( H=H , W=W , Hbar=mean(H) ) m_validate <- quap( alist( W ~ dnorm(mu,sigma), mu <- a + b*(H-Hbar), a ~ dnorm(60,10), b ~ dlnorm(0,1), sigma ~ dunif(0,10) ), data=dat ) 130 140 150 160 170 55 60 65 70 75 80 85 H W

Slide 62

Slide 62 text

Slide 63

Slide 63 text

Slide 64

Slide 64 text

Slide 65

Slide 65 text

Now with the real data data(Howell1) d <- Howell1 d <- d[ d$age>=18 , ] dat <- list( W = d$weight, H = d$height, Hbar = mean(d$height) ) m_adults <- quap( alist( W ~ dnorm(mu,sigma), mu <- a + b*(H-Hbar), a ~ dnorm(60,10), b ~ dlnorm(0,1), sigma ~ dunif(0,10) ), data=dat ) μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(60,10) β ∼ LogNormal(0,1) σ ∼ Uniform(0,10)

Slide 66

Slide 66 text

Obey The Law First Law of Statistical Interpretation: The parameters are not independent of one another and cannot always be independently interpreted Instead: Push out posterior predictions and describe/interpret those

Slide 67

Slide 67 text

Posterior predictive distribution (1) Plot the sample (2) Plot the posterior mean (3) Plot uncertainty of the mean (4) Plot uncertainty of predictions 140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg)

Slide 68

Slide 68 text

140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg) Posterior predictive distribution (1) Plot the sample (2) Plot the posterior mean (3) Plot uncertainty of the mean (4) Plot uncertainty of predictions

Slide 69

Slide 69 text

Slide 70

Slide 70 text

Slide 71

Slide 71 text

Posterior predictive distribution # plot sample col2 <- col.alpha(2,0.8) plot( d$height , d$weight , col=col2 , lwd=3 , cex=1.2 , xlab="height (cm)" , ylab="weight (kg)" ) # expectation with 99% compatibility interval xseq <- seq(from=130,to=190,len=50) mu <- link(m0,data=list(H=xseq,Hbar=mean(d$height))) lines( xseq , apply(mu,2,mean) , lwd=4 ) shade( apply(mu,2,PI,prob=0.99) , xseq , col=col.alpha(2,0.5) ) # 89% prediction interval W_sim <- sim(m0,data=list(H=xseq,Hbar=mean(d$height))) shade( apply(W_sim,2,PI,prob=0.89) , xseq , col=col.alpha(1,0.3) ) See 4.4.3 starting page 98 in book 140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg)

Slide 72

Slide 72 text

Slide 73

Slide 73 text

140 150 160 170 180 30 35 40 45 50 55 60 height (cm) weight (kg) Posterior predictive distribution # plot sample col2 <- col.alpha(2,0.8) plot( d$height , d$weight , col=col2 , lwd=3 , cex=1.2 , xlab="height (cm)" , ylab="weight (kg)" ) # expectation with 99% compatibility interval xseq <- seq(from=130,to=190,len=50) mu <- link(m0,data=list(H=xseq,Hbar=mean(d$height))) lines( xseq , apply(mu,2,mean) , lwd=4 ) shade( apply(mu,2,PI,prob=0.99) , xseq , col=col.alpha(2,0.5) ) # 89% prediction interval W_sim <- sim(m0,data=list(H=xseq,Hbar=mean(d$height))) shade( apply(W_sim,2,PI,prob=0.89) , xseq , col=col.alpha(1,0.3) ) See 4.4.3 starting page 98 in book

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

Flexible Linear Thermometers (2) Scientific model How does height influence weight? H W W = f(H) “Weight is some function of height” μ i = α + β(H i − ¯ H) W i ∼ Normal(μ i , σ) α ∼ Normal(60,10) β ∼ LogNormal(0,1) σ ∼ Uniform(0,10)

Slide 76

Slide 76 text

Course Schedule Week 1 Bayesian inference Chapters 1, 2, 3 Week 2 Linear models & Causal Inference Chapter 4 Week 3 Causes, Confounds & Colliders Chapters 5 & 6 Week 4 Overfitting / Interactions Chapters 7 & 8 Week 5 MCMC & Generalized Linear Models Chapters 9, 10, 11 Week 6 Integers & Other Monsters Chapters 11 & 12 Week 7 Multilevel models I Chapter 13 Week 8 Multilevel models II Chapter 14 Week 9 Measurement & Missingness Chapter 15 Week 10 Generalized Linear Madness Chapter 16 https://github.com/rmcelreath/statrethinking_2022

Slide 77

Slide 77 text

No content