Statistical Rethinking 2023 - Lecture 10

Richard McElreath

February 01, 2023

Transcript

  1. Statistical Rethinking
    10. Counts & Hidden Confounds
    2023


  2. Generalized Linear Models
    Linear Models: Expected value is
    additive (“linear”) combination of
    parameters
    Generalized Linear Models:
    Expected value is some function of
    an additive combination of
    parameters
    Y_i ∼ Normal(μ_i, σ)
    μ_i = α + β_X X_i + β_Z Z_i
    Y_i ∼ Bernoulli(p_i)
    f(p_i) = α + β_X X_i + β_Z Z_i

  3. logit(p_i) = α + βx_i
    Generalized Linear Models:
    Expected value is some function
    of an additive combination of
    parameters
    Uniform changes in predictor are not
    uniform changes in prediction
    All predictor variables interact,
    moderate one another
    Influences predictions &
    uncertainty of predictions
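    A small numeric illustration of the point above (added here, not on the slide): a one-unit change on the logit scale implies different changes in probability depending on the baseline, which is why predictors moderate one another on the outcome scale.
    # same +1 change on the logit scale, different change in probability
    inv_logit <- function(x) 1/(1+exp(-x))  # same as rethinking::inv_logit or plogis()
    inv_logit(0+1) - inv_logit(0)  # about 0.23 near p = 0.5
    inv_logit(3+1) - inv_logit(3)  # about 0.03 near the ceiling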


  4. Confounded Admissions
    G
    D
    A
    gender
    department
    admission


  5. Confounded Admissions
    G
    D
    A
    gender
    department
    admission
    u
    ability


  6. G
    D
    A
    u
    library(rethinking)  # provides rbern(), ulam(), precis(), dens()
    set.seed(12)
    N <- 2000 # number of applicants
    # even gender distribution
    G <- sample( 1:2 , size=N , replace=TRUE )
    # sample ability, high (1) to average (0)
    u <- rbern(N,0.1)
    # gender 1 tends to apply to department 1, 2 to 2
    # and G=1 with greater ability tend to apply to 2 as well
    D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
    # matrix of acceptance rates [dept,gender]
    p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
    p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
    p_u <- list( p_u0 , p_u1 )
    # simulate acceptance
    p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
    A <- rbern( N , p )


  7. (Repeats the simulation code and DAG from slide 6.)

  8. (Repeats the simulation code and DAG from slide 6.)

  9. (Same simulation code as slide 6.)
    > table(G,D,u)
    , , u = 0
       D
    G     1   2
      1 908   0
      2 220 645
    , , u = 1
       D
    G     1   2
      1   0 114
      2  28  85

  10. (Same simulation code as slide 6.)
    > p_u0
    [,1] [,2]
    [1,] 0.1 0.1
    [2,] 0.1 0.3
    > p_u1
    [,1] [,2]
    [1,] 0.3 0.5
    [2,] 0.3 0.5
    Department 2 discriminates
    against gender 1
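    A small check, added here and not on the slide, of how the confound hides this in the raw data: tabulate the empirical admission rates by department and gender.
    # raw admission rates by [department, gender]
    round( tapply( A , list(D,G) , mean ) , 2 )
    # department 2 admits both genders at similar raw rates in this simulation,
    # because its gender 1 applicants are all high ability (u=1)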


  11. dat_sim <- list( A=A , D=D , G=G )
    # total effect gender
    m1 <- ulam(
    alist(
    A ~ bernoulli(p),
    logit(p) <- a[G],
    a[G] ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )
    # direct effects - now confounded!
    m2 <- ulam(
    alist(
    A ~ bernoulli(p),
    logit(p) <- a[G,D],
    matrix[G,D]:a ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )
    > precis(m1,depth=3)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1] -1.82 0.09 -1.96 -1.67 1483 1
    a[2] -0.89 0.07 -1.00 -0.78 1562 1
    > precis(m2,depth=3)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1,1] -2.05 0.11 -2.23 -1.89 2482 1
    a[1,2] -0.66 0.18 -0.96 -0.38 2369 1
    a[2,1] -1.77 0.19 -2.06 -1.47 2278 1
    a[2,2] -0.68 0.08 -0.81 -0.56 2210 1
    total effect shows disadvantage
    direct effect confounded


  12. post2 <- extract.samples(m2)
    dens(inv_logit(post2$a[,1,1]),lwd=3,col=4,xlim=c(0,0.5),xlab="probability admission")
    dens(inv_logit(post2$a[,2,1]),lwd=3,col=4,lty=2,add=TRUE)
    dens(inv_logit(post2$a[,1,2]),lwd=3,col=2,add=TRUE)
    dens(inv_logit(post2$a[,2,2]),lwd=3,col=2,lty=2,add=TRUE)
    > precis(m1,depth=3)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1] -1.82 0.09 -1.96 -1.67 1483 1
    a[2] -0.89 0.07 -1.00 -0.78 1562 1
    total effect shows disadvantage
    direct effect confounded
    [Figure: posterior densities of admission probability for D1,G1; D1,G2; D2,G1; D2,G2]

  13. You guessed it: Collider bias
    Stratifying by D opens non-
    causal path through u
    Can estimate total causal effect
    of G, but isn't what we want
    Cannot estimate direct effect
    of D or G
    Worst place in the world, Collider Bias country


  14. You guessed it: Collider bias
    More intuitive explanation:
    High ability G1s apply to
    discriminatory department anyway
    G1s in that department are higher
    ability on average than G2s
    High ability compensates for
    discrimination => masks evidence
    G
    D
    A
    u
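    That intuition can be checked directly in the simulation (a small addition, not on the slide):
    # average ability by gender among applicants to department 2
    round( tapply( u[D==2] , G[D==2] , mean ) , 2 )
    # gender 1 applicants to department 2 are all high ability (mean 1.0),
    # gender 2 applicants are mostly average ability (mean about 0.12)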


  15. (image slide)

  16. https://arxiv.org/abs/2207.13665


  17. Citation networks
    Citation networks of members of
    NAS
    Women associated with lower
    lifetime citation rate
    G
    C
    gender
    citations
    Lerman et al 2022 Gendered citation patterns among the scientific elite


  18. Membership
    Elections to NAS
    Women associated with 3–15 times
    higher election rate, controlling for
    citations
    gender
    Card et al 2022 Gender Gaps at the Academies
    NAS member
    citations
    G M
    C


  19. gender NAS member
    citations
    G M
    C
    q quality (hidden)


  20. gender NAS member
    citations
    G M
    C
    q quality (hidden)
    Restrict sample to NAS members, examine citations
    If men less likely to be elected, then must
    have higher q,C to compensate


  21. gender NAS member
    citations
    G M
    C
    q quality (hidden)
    Control for citations, examine elections to NAS
    G is “treatment”. C is a post-treatment variable!
    If women less likely to be cited (bias), then
    women more likely to be elected because they
    have higher q than indicated by C
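    A hypothetical simulation sketch of this argument (not from the paper or the slides; names and effect sizes are invented): gender has no direct effect on election, citations are biased against women, and election depends only on quality and citations. Stratifying by the post-treatment variable C then makes gender look like it helps election.
    # illustrative toy simulation of the DAG above
    set.seed(7)
    n <- 5000
    Gs <- rbern(n,0.5)                          # 1 = woman (arbitrary label)
    qs <- rnorm(n)                              # hidden quality
    Cs <- rnorm( n , qs - 0.5*Gs )              # citation bias against Gs=1
    Ms <- rbern( n , inv_logit( qs + Cs - 2 ) ) # election depends on quality and citations only
    # "controlling for citations" makes Gs appear to raise election odds
    round( coef( glm( Ms ~ Gs + Cs , family=binomial ) ) , 2 )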


  22. No causes in; No causes out
    Hyped papers with vague estimands,
    unwise adjustment sets
    Policy design through collider bias?
    We can do better
    Strong assumptions required
    Qualitative data useful
    G M
    C
    q


  23. Sensitivity analysis
    What are the implications of what
    we don’t know?
    Assume confound exists, model its
    consequences for different
    strengths/kinds of influence
    How strong must the confound be
    to change conclusions?
    G
    D
    A
    u


  24. Sensitivity analysis
    A_i ∼ Bernoulli(p_i)
    logit(p_i) = α[G_i, D_i] + β[G_i] u_i
    G
    D
    A
    u


  25. Sensitivity analysis
    A_i ∼ Bernoulli(p_i)
    logit(p_i) = α[G_i, D_i] + β[G_i] u_i
    (D_i = 2) ∼ Bernoulli(q_i)
    logit(q_i) = δ[G_i] + γ[G_i] u_i
    G
    D
    A
    u

  26. A_i ∼ Bernoulli(p_i)
    logit(p_i) = α[G_i, D_i] + β[G_i] u_i
    (D_i = 2) ∼ Bernoulli(q_i)
    logit(q_i) = δ[G_i] + γ[G_i] u_i
    u_j ∼ Normal(0,1)
    datl <- dat_sim
    datl$D2 <- ifelse(datl$D==2,1,0)
    datl$N <- length(datl$D)
    datl$b <- c(1,1)
    datl$g <- c(1,0)
    mGDu <- ulam(
    alist(
    # A model
    A ~ bernoulli(p),
    logit(p) <- a[G,D] + b[G]*u[i],
    matrix[G,D]:a ~ normal(0,1),
    # D model
    D2 ~ bernoulli(q),
    logit(q) <- delta[G] + g[G]*u[i],
    delta[G] ~ normal(0,1),
    # declare unobserved u
    vector[N]:u ~ normal(0,1)
    ), data=datl , chains=4 , cores=4 )
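    The next slide compares the posterior admission probabilities from the confounded model m2 with this sensitivity model. A minimal sketch of how those densities could be drawn, assuming m2 and mGDu above have been sampled (the plotting code is not on the slide):
    # posterior admission probabilities under the assumed confound
    postu <- extract.samples(mGDu)
    dens( inv_logit(postu$a[,1,1]) , lwd=3 , col=4 , xlim=c(0,0.5) , xlab="probability admission" )
    dens( inv_logit(postu$a[,2,1]) , lwd=3 , col=4 , lty=2 , add=TRUE )
    dens( inv_logit(postu$a[,1,2]) , lwd=3 , col=2 , add=TRUE )
    dens( inv_logit(postu$a[,2,2]) , lwd=3 , col=2 , lty=2 , add=TRUE )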


  27. [Figure: posterior densities of admission probability for D1,G1; D1,G2; D2,G1; D2,G2. Left panel: ignore confound. Right panel: assume confound.]

  28. Sensitivity analysis
    What are the implications of what we
    don't know?
    Somewhere between pure simulation
    and pure analysis
    Vary confound strength over range and
    show how results change –or– vary other
    effects and estimate confound strength
    Confounds persist — don't pretend
    G
    D
    A
    u
    [Figure: posterior densities of admission probability under the assumed confound]

  29. More parameters (2006)
    than observations (2000)!
    A_i ∼ Bernoulli(p_i)
    logit(p_i) = α[G_i, D_i] + β[G_i] u_i
    (D_i = 2) ∼ Bernoulli(q_i)
    logit(q_i) = δ[G_i] + γ[G_i] u_i
    u_j ∼ Normal(0,1)

  30. Charlie Chaplin, Modern Times (1936)


  31. PAUSE


  32. (image slide)

  33. Oceanic Technology
    How is technological
    complexity related to
    population size?
    To social structure?
    data(Kline)
    Influence of population size
    and contact on total tools
    Kline & Boyd 2010 Population size predicts technological complexity in Oceania


  34. Oceanic Technology
    data(Kline)
    culture population contact total_tools mean_TU
    1 Malekula 1100 low 13 3.2
    2 Tikopia 1500 low 22 4.7
    3 Santa Cruz 3600 low 24 4.0
    4 Yap 4791 high 43 5.0
    5 Lau Fiji 7400 high 33 5.0
    6 Trobriand 8000 high 19 4.0
    7 Chuuk 9200 high 40 3.8
    8 Manus 13000 low 28 6.6
    9 Tonga 17500 high 55 5.4
    10 Hawaii 275000 low 71 6.6
    Kline & Boyd 2010 Population size predicts technological complexity in Oceania


  35. Tools
    Innovations
    Population Loss
    Contact


  36. P
    C
    T
    L
    contact
    population tools
    location


  37. P
    C
    T
    L
    contact
    population tools
    location
    Adjustment set for P: L
    Also want to stratify by C,
    to study interaction


  38. Modeling tools
    Tool count is not binomial: No maximum
    Poisson distribution: Very high maximum and very low
    probability of each success
    Here: Many many possible technologies, very few realized in
    any one place


  39. Poisson link is log
    Poisson distribution takes shape
    from expected value
    Must be positive
    Exponential scaling can be
    surprising!
    Y_i ∼ Poisson(λ_i)
    log(λ_i) = α + βx_i
    λ_i = exp(α + βx_i)

  40. Poisson (poison) priors
    Exponential scaling can be surprising
    α ∼ Normal(0,10)
    log(λ_i) = α

  41. Poisson (poison) priors
    Exponential scaling can be surprising
    α ∼ Normal(0,10)
    log(λ_i) = α
    [Figure: prior predictive density of the mean number of tools under α ∼ Normal(0,10)]

  42. Poisson (poison) priors
    Exponential scaling can be surprising
    log(λ_i) = α
    α ∼ Normal(0,10)
    α ∼ Normal(3,0.5)
    [Figure: prior predictive densities of the mean number of tools under each prior for α]
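    A short sketch, assumed rather than taken from the slide, of how these prior predictive densities can be computed: with log(λ_i) = α, the implied mean count exp(α) is lognormal.
    # prior predictive distribution of the mean count exp(alpha)
    curve( dlnorm(x,0,10) , from=0 , to=100 , n=200 ,
        xlab="mean number of tools" , ylab="Density" )   # alpha ~ Normal(0,10): mass piles up near 0 or explodes
    curve( dlnorm(x,3,0.5) , from=0 , to=100 , n=200 , add=TRUE , col=4 )   # alpha ~ Normal(3,0.5): sensible counts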


  43. Poisson priors
    log(λ_i) = α + βx_i
    α ∼ Normal(0,1), β ∼ Normal(0,10)
    α ∼ Normal(3,0.5), β ∼ Normal(0,0.2)
    [Figure: prior predictive curves of expected count versus x value under each pair of priors]
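    A sketch, assumed rather than shown on the slide, of the prior predictive simulation behind panels like these:
    # simulate prior predictive curves for log(lambda_i) = alpha + beta*x_i
    N <- 100
    a <- rnorm( N , 3 , 0.5 )   # use rnorm(N,0,1) to reproduce the wide-prior panel
    b <- rnorm( N , 0 , 0.2 )   # use rnorm(N,0,10) to reproduce the wide-prior panel
    plot( NULL , xlim=c(-2,2) , ylim=c(0,100) , xlab="x value" , ylab="expected count" )
    for ( i in 1:N ) curve( exp(a[i] + b[i]*x) , add=TRUE , col=col.alpha("black",0.3) )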


  44. data(Kline)
    d <- Kline
    d$P <- scale( log(d$population) )
    d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
    dat <- list(
    T = d$total_tools ,
    P = d$P ,
    C = d$contact_id )
    # intercept only
    m11.9 <- ulam(
    alist(
    T ~ dpois( lambda ),
    log(lambda) <- a,
    a ~ dnorm( 3 , 0.5 )
    ), data=dat , chains=4 , log_lik=TRUE )
    # interaction model
    m11.10 <- ulam(
    alist(
    T ~ dpois( lambda ),
    log(lambda) <- a[C] + b[C]*P,
    a[C] ~ dnorm( 3 , 0.5 ),
    b[C] ~ dnorm( 0 , 0.2 )
    ), data=dat , chains=4 , log_lik=TRUE )
    compare( m11.9 , m11.10 , func=PSIS )
    Y_i ∼ Poisson(λ_i)
    log(λ_i) = α[C_i] + β[C_i] log(P_i)
    α_j ∼ Normal(3,0.5)
    β_j ∼ Normal(0,0.2)

  45. (Same models m11.9 and m11.10 as slide 44, compared with PSIS.)
    > compare( m11.9 , m11.10 , func=PSIS )
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    PSIS SE dPSIS dSE pPSIS weight
    m11.10 85.9 13.50 0.0 NA 7.3 1
    m11.9 141.3 33.69 55.4 33.13 8.0 0


  46. > compare( m11.9 , m11.10 , func=PSIS )
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    PSIS SE dPSIS dSE pPSIS weight
    m11.10 85.9 13.50 0.0 NA 7.3 1
    m11.9 141.3 33.69 55.4 33.13 8.0 0
    pPSIS is the “penalty”, the effective number of parameters.
    How can a model with more parameters (m11.10) have fewer
    effective parameters?
    m11.9 changes more when individual societies are dropped


  47. [Figure: total tools versus log population (std), points scaled by leverage; Yap, Trobriand, Tonga, and Hawaii labeled]
    k <- PSIS( m11.10 , pointwise=TRUE )$k
    plot( dat$P , dat$T , xlab="log population (std)" , ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )
    # set up the horizontal axis values to compute predictions at
    P_seq <- seq( from=-1.4 , to=3 , len=100 )
    # predictions for C=1 (low contact)
    lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
    lmu <- apply( lambda , 2 , mean )
    lci <- apply( lambda , 2 , PI )
    lines( P_seq , lmu , lty=2 , lwd=1.5 )
    shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) )
    # predictions for C=2 (high contact)
    lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
    lmu <- apply( lambda , 2 , mean )
    lci <- apply( lambda , 2 , PI )
    lines( P_seq , lmu , lty=1 , lwd=1.5 )
    shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3) )
    Points scaled by leverage

  48. [Figure: total tools versus population, shown on the log scale (left) and the natural scale (right); Yap, Trobriand, Tonga, and Hawaii labeled]

  49. [Figure: total tools versus population on the natural scale; Tonga and Hawaii labeled]

  50. [Figure: the same plot of total tools versus population; Tonga labeled]

  51. This model is wack
    Two immediate ways to improve the
    model
    (1) Use a robust model:
    gamma-Poisson (neg-binomial)
    (2) A principled scientific model
    [Figure: total tools versus population on the natural scale; Tonga and Hawaii labeled]
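    A minimal sketch of option (1), assuming dat from slide 44; this model and the name m11.10gp are not on the slide. The gamma-Poisson (negative-binomial) likelihood lets the rate vary across societies, so single societies like Hawaii carry less leverage.
    # robust gamma-Poisson version of the interaction model
    m11.10gp <- ulam(
    alist(
    T ~ dgampois( lambda , phi ),
    log(lambda) <- a[C] + b[C]*P,
    a[C] ~ dnorm( 3 , 0.5 ),
    b[C] ~ dnorm( 0 , 0.2 ),
    phi ~ dexp(1)
    ), data=dat , chains=4 , cores=4 , log_lik=TRUE )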


  52. (image slide)

  53. ΔT = αP^β − γT

  54. ΔT = αP^β − γT
    change in tools


  55. ΔT = αP^β − γT
    change in tools
    innovation
    rate


  56. ΔT = αP^β − γT
    change in tools
    innovation
    rate
    diminishing returns
    (elasticity)


  57. ΔT = αP^β − γT
    change in tools
    innovation
    rate
    diminishing returns
    (elasticity)
    rate of loss


  58. ΔT = α_C P^β_C − γT
    diminishing returns
    depend upon contact
    innovation depends
    upon contact

  59. ΔT = α_C P^β_C − γT
    Difference equation: Says how tools
    change, not expected number
    What is the equilibrium?

  60. ΔT = α_C P^β_C − γT
    [Figure: Tools versus time, trajectories for P = 1e3 and P = 1e4]
    f <- function(a=0.02,b=0.5,g=0.2,P=1e4,t_max=50) {
    T <- rep(0,t_max)
    for ( i in 2:t_max )
    T[i] <- T[i-1] + a*P^b - g*T[i-1]
    return(T)
    }
    plot( NULL , xlim=c(0,50) , ylim=c(0,10) ,
    xlab="time" , ylab="Tools" )
    T <- f(P=1e3)
    lines( 1:50 , T , lwd=3 , col=2 )
    T <- f(P=1e4)
    lines( 1:50 , T , lwd=3 , col=2 )


  61. ΔT = α_C P^β_C − γT = 0
    Solve for equilibrium T

  62. ΔT = α_C P^β_C − γT = 0
    T̂ = α_C P^β_C / γ
    Solve for equilibrium T
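    A quick numerical check, added here, using the f() defined on slide 60: the simulated trajectory converges to the analytical equilibrium αP^β/γ.
    # long-run value of the difference equation versus the analytical equilibrium
    f( a=0.02 , b=0.5 , g=0.2 , P=1e4 , t_max=200 )[200]   # close to 10 after many steps
    0.02 * 1e4^0.5 / 0.2                                   # alpha * P^beta / gamma = 10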


  63. T̂ = α_C P^β_C / γ
    T_i ∼ Poisson(λ_i)
    λ_i = T̂

  64. # innovation/loss model
    dat2 <- list( T=d$total_tools, P=d$population,
    C=d$contact_id )
    m11.11 <- ulam(
    alist(
    T ~ dpois( lambda ),
    lambda <- exp(a[C])*P^b[C]/g,
    a[C] ~ dnorm(1,1),
    b[C] ~ dexp(1),
    g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 )
    All parameters must be positive
    Two ways to do this
    (1) use exp()
    (2) use appropriate prior
    T̂ = α_C P^β_C / γ

  65. All parameters must be positive
    Two ways to do this
    (1) use exp()
    (2) use appropriate prior
    (Same model m11.11 as slide 64.)
    T̂ = α_C P^β_C / γ
    > precis(m11.11,2)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1] 0.85 0.68 -0.26 1.90 698 1
    a[2] 0.93 0.83 -0.39 2.31 902 1
    b[1] 0.26 0.03 0.21 0.32 1149 1
    b[2] 0.29 0.10 0.12 0.45 711 1
    g 1.11 0.70 0.32 2.43 862 1


  66. [Figure: posterior predictions of total tools versus population (natural scale) from the innovation/loss model; Tonga and Hawaii labeled]
    (Same model m11.11 as slide 64.)
    T̂ = α_C P^β_C / γ

  67. [Figure: total tools versus population on the natural scale; left panel the innovation/loss model, right panel the generalized linear model; Tonga and Hawaii labeled]
    Still have to deal with location as confound
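    A sketch of how posterior predictions like the innovation/loss panel could be drawn (assumed; it mirrors the slide 47 code, using dat2 and m11.11 from slide 64):
    # posterior predictions on the natural population scale
    plot( dat2$P , dat2$T , xlab="population" , ylab="total tools" ,
        col=ifelse( dat2$C==1 , 4 , 2 ) , lwd=3 , ylim=c(0,75) )
    P_seq <- seq( from=0 , to=280000 , len=100 )
    for ( cid in 1:2 ) {   # cid=1 low contact (dashed), cid=2 high contact (solid)
        lambda <- link( m11.11 , data=data.frame( P=P_seq , C=cid ) )
        lines( P_seq , apply( lambda , 2 , mean ) , lty=3-cid , lwd=1.5 )
        shade( apply( lambda , 2 , PI ) , P_seq , col=col.alpha( ifelse(cid==1,4,2) , 0.3 ) )
    }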


  68. Count GLMs
    Distributions from constraints
    Maximum entropy priors: Binomial,
    Poisson, and extensions
    More event types: Multinomial and
    categorical
    Robust regressions:
    Beta-binomial, gamma-Poisson


  69. Course Schedule
    Week 1 Bayesian inference Chapters 1, 2, 3
    Week 2 Linear models & Causal Inference Chapter 4
    Week 3 Causes, Confounds & Colliders Chapters 5 & 6
    Week 4 Overfitting / MCMC Chapters 7, 8, 9
    Week 5 Generalized Linear Models Chapters 10, 11
    Week 6 Mixtures & ordered categories Chapters 11 & 12
    Week 7 Multilevel models I Chapter 13
    Week 8 Multilevel models II Chapter 14
    Week 9 Measurement & Missingness Chapter 15
    Week 10 Generalized Linear Madness Chapter 16
    https://github.com/rmcelreath/stat_rethinking_2023


  70. (image slide)

  71. BONUS


  72. Simpson’s Pandora’s Box
    Simpson’s paradox: Reversal of an
    association when groups are
    combined or separated
    No way to know which association
    (combined/separated) is correct
    without causal assumptions
    Infinite evils unleashed
    Simpson 1951. The Interpretation of Interaction in Contingency Tables
    The container HELD
    every imaginable
    estimate.
    Pandora opens a spreadsheet
    Simpson’s paradox
    plagues us to this day.


  73. Berkeley Paradox
    Unconditional on department:
    Women admitted at lower rate
    Conditional on department:
    Women admitted slightly more
    Which is correct? No way to
    know without assumptions
    dept admit reject applications gender
    1 A 512 313 825 male
    2 A 89 19 108 female
    3 B 353 207 560 male
    4 B 17 8 25 female
    5 C 120 205 325 male
    6 C 202 391 593 female
    7 D 138 279 417 male
    8 D 131 244 375 female
    9 E 53 138 191 male
    10 E 94 299 393 female
    11 F 22 351 373 male
    12 F 24 317 341 female


  74. Berkeley Paradox
    Which is correct? No way to
    know without assumptions
    Mediator (department)
    Collider + confound (ability)
    dept admit reject applications gender
    1 A 512 313 825 male
    2 A 89 19 108 female
    3 B 353 207 560 male
    4 B 17 8 25 female
    5 C 120 205 325 male
    6 C 202 391 593 female
    7 D 138 279 417 male
    8 D 131 244 375 female
    9 E 53 138 191 male
    10 E 94 299 393 female
    11 F 22 351 373 male
    12 F 24 317 341 female


  75. X Z Y
    The Pipe
    X Z Y
    The Fork
    X Z Y
    The Collider
    X Z Y
    The Descendant
    A

  76. Z = 1
    Z = 0
    X Z Y
    [Figure: Y versus X colored by Z, with per-group and pooled regression lines]
    cols <- c(4,2)
    N <- 300
    Z <- rbern(N)
    X <- rnorm(N,2*Z-1)
    Y <- rnorm(N,2*Z-1)
    plot( X , Y , col=cols[Z+1] , lwd=3 )
    abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
    abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
    abline(lm(Y~X),lwd=3)


  77. Z = 1
    Z = 0
    cols <- c(4,2)
    N <- 300
    X <- rnorm(N)
    Z <- rbern(N,inv_logit(X))
    Y <- rnorm(N,(2*Z-1))
    plot( X , Y , col=cols[Z+1] , lwd=3 )
    abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
    abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
    abline(lm(Y~X),lwd=3)
    X Z Y
    [Figure: Y versus X colored by Z, with per-group and pooled regression lines]

  78. Z = 1
    Z = 0
    cols <- c(4,2)
    N <- 300
    X <- rnorm(N)
    Y <- rnorm(N)
    Z <- rbern(N,inv_logit(2*X+2*Y-2))
    plot( X , Y , col=cols[Z+1] , lwd=3 )
    abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
    abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
    abline(lm(Y~X),lwd=3)
    X Z Y
    [Figure: Y versus X colored by Z, with per-group and pooled regression lines]

  79. Non-linear haunting
    In event models, effect reversal
    can arise other ways
    Example: Base rate differences
    X Y
    Z
    treatment outcome
    covariate


  80. # base rate differences erasing effect of X
    N <- 1000
    X <- rnorm(N)
    Z <- rbern(N)
    p <- inv_logit(X + ifelse(Z==1,-1,5))
    Y <- rbern(N,p)
    mX <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b*X,
    a ~ dnorm(0,1),
    b ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X) )
    mXZ <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b[Z]*X,
    a ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    X Y
    Z


  81. X Y
    Z
    (Same simulation and models mX, mXZ as slide 80.)
    logit(p_i) = α + βX_i

  82. (Same simulation and models mX, mXZ as slide 80.)
    X Y
    Z
    logit(p_i) = α + βX_i
    logit(p_i) = α + β[Z_i] X_i

  83. logit(p_i) = α + βX_i
    logit(p_i) = α + β[Z_i] X_i
    Z = 1
    Z = 2
    Z collapsed
    [Figure: outcome (Y) versus treatment (X); left panel the pooled model (Z collapsed), right panel stratified by Z]

  84. logit(p_i) = α + β[Z_i] X_i
    Z = 1
    Z = 2
    [Figure: outcome (Y) versus treatment (X) stratified by Z, and the posterior density of beta[Z]]
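    A sketch, assumed rather than shown on the slide, of how the beta[Z] densities can be drawn from mXZ:
    # posterior densities of the stratified slopes
    post <- extract.samples(mXZ)
    dens( post$b[,1] , lwd=3 , xlab="beta[Z]" )
    dens( post$b[,2] , lwd=3 , lty=2 , add=TRUE )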


  85. (mX and mXZ as defined on slide 80.)
    mXZ2 <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a[Z] + b[Z]*X,
    a[Z] ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    logit(p_i) = α + βX_i
    logit(p_i) = α + β[Z_i] X_i
    logit(p_i) = α[Z_i] + β[Z_i] X_i

  86. logit(p_i) = α + β[Z_i] X_i
    logit(p_i) = α[Z_i] + β[Z_i] X_i
    Z = 1
    Z = 2
    [Figure: outcome (Y) versus treatment (X) stratified by Z, under the common-intercept model (left) and the stratified-intercept model (right)]

  87. logit(p_i) = α + β[Z_i] X_i
    logit(p_i) = α[Z_i] + β[Z_i] X_i
    Z = 1
    Z = 2
    [Figure: the two models' predictions of outcome (Y) versus treatment (X), each with the posterior density of beta[Z]]

  88. logit(p_i) = α[Z_i] + β[Z_i] X_i
    Z = 1
    Z = 2
    [Figure: posterior density of beta[Z] under the stratified-intercept model]

  89. Simpson’s Pandora’s Box
    No paradox, because almost
    anything can produce it
    People do not have intuitions
    about coefficient reversals
    Stop naming statistical paradoxes;
    start teaching scientific logic
    Simpson 1951. The Interpretation of Interaction in Contingency Tables
    The container HELD
    every imaginable
    estimate.
    Pandora opens a spreadsheet
    Simpson’s paradox
    plagues us to this day.


  90. (image slide)