Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Statistical Rethinking 2023 - Lecture 10

Statistical Rethinking 2023 - Lecture 10

Richard McElreath

February 01, 2023
Tweet

More Decks by Richard McElreath

Other Decks in Education

Transcript

  1. Statistical Rethinking
    10. Counts & Hidden Confounds
    2023

    View full-size slide

  2. Generalized Linear Models
    Linear Models: Expected value is
    additive (“linear”) combination of
    parameters
    Generalized Linear Models:
    Expected value is some function of
    an additive combination of
    parameters
    Y
    i
    ∼ Normal(μ
    i
    , σ)
    μ
    i
    = α + β
    X
    X
    i
    + β
    Z
    Z
    i
    Y
    i
    ∼ Bernoulli(p
    i
    )
    f(p
    i
    ) = α + β
    X
    X
    i
    + β
    Z
    Z
    i
    f(p
    i
    )

    View full-size slide

  3. logit(p
    i
    ) = α + βx
    i
    Generalized Linear Models:
    Expected value is some function
    of an additive combination of
    parameters
    Uniform changes in predictor not
    uniform changes in prediction
    All predictor variables interact,
    moderate one another
    In uences predictions &
    uncertainty of predictions

    View full-size slide

  4. Confounded Admissions
    G
    D
    A
    gender
    department
    admission

    View full-size slide

  5. Confounded Admissions
    G
    D
    A
    gender
    department
    admission
    uability

    View full-size slide

  6. G
    D
    A
    u
    set.seed(12)
    N <- 2000 # number of applicants
    # even gender distribution
    G <- sample( 1:2 , size=N , replace=TRUE )
    # sample ability, high (1) to average (0)
    u <- rbern(N,0.1)
    # gender 1 tends to apply to department 1, 2 to 2
    # and G=1 with greater ability tend to apply to 2 as well
    D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
    # matrix of acceptance rates [dept,gender]
    p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
    p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
    p_u <- list( p_u0 , p_u1 )
    # simulate acceptance
    p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
    A <- rbern( N , p )

    View full-size slide

  7. set.seed(12)
    N <- 2000 # number of applicants
    # even gender distribution
    G <- sample( 1:2 , size=N , replace=TRUE )
    # sample ability, high (1) to average (0)
    u <- rbern(N,0.1)
    # gender 1 tends to apply to department 1, 2 to 2
    # and G=1 with greater ability tend to apply to 2 as well
    D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
    # matrix of acceptance rates [dept,gender]
    p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
    p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
    p_u <- list( p_u0 , p_u1 )
    # simulate acceptance
    p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
    A <- rbern( N , p )
    G
    D
    A
    u

    View full-size slide

  8. set.seed(12)
    N <- 2000 # number of applicants
    # even gender distribution
    G <- sample( 1:2 , size=N , replace=TRUE )
    # sample ability, high (1) to average (0)
    u <- rbern(N,0.1)
    # gender 1 tends to apply to department 1, 2 to 2
    # and G=1 with greater ability tend to apply to 2 as well
    D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
    # matrix of acceptance rates [dept,gender]
    p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
    p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
    p_u <- list( p_u0 , p_u1 )
    # simulate acceptance
    p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
    A <- rbern( N , p )
    G
    D
    A
    u

    View full-size slide

  9. set.seed(12)
    N <- 2000 # number of applicants
    # even gender distribution
    G <- sample( 1:2 , size=N , replace=TRUE )
    # sample ability, high (1) to average (0)
    u <- rbern(N,0.1)
    # gender 1 tends to apply to department 1, 2 to 2
    # and G=1 with greater ability tend to apply to 2 as well
    D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
    # matrix of acceptance rates [dept,gender]
    p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
    p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
    p_u <- list( p_u0 , p_u1 )
    # simulate acceptance
    p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
    A <- rbern( N , p )
    G
    D
    A
    u
    > table(G,D,u)
    , , u = 0
    D
    G 1 2
    1 908 0
    2 220 645
    , , u = 1
    D
    G 1 2
    1 0 114
    2 28 85

    View full-size slide

  10. set.seed(12)
    N <- 2000 # number of applicants
    # even gender distribution
    G <- sample( 1:2 , size=N , replace=TRUE )
    # sample ability, high (1) to average (0)
    u <- rbern(N,0.1)
    # gender 1 tends to apply to department 1, 2 to 2
    # and G=1 with greater ability tend to apply to 2 as well
    D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
    # matrix of acceptance rates [dept,gender]
    p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
    p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
    p_u <- list( p_u0 , p_u1 )
    # simulate acceptance
    p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
    A <- rbern( N , p )
    G
    D
    A
    u
    > p_u0
    [,1] [,2]
    [1,] 0.1 0.1
    [2,] 0.1 0.3
    > p_u1
    [,1] [,2]
    [1,] 0.3 0.5
    [2,] 0.3 0.5
    Department 2 discriminates
    against gender 1

    View full-size slide

  11. dat_sim <- list( A=A , D=D , G=G )
    # total effect gender
    m1 <- ulam(
    alist(
    A ~ bernoulli(p),
    logit(p) <- a[G],
    a[G] ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )
    # direct effects - now confounded!
    m2 <- ulam(
    alist(
    A ~ bernoulli(p),
    logit(p) <- a[G,D],
    matrix[G,D]:a ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )
    > precis(m1,depth=3)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1] -1.82 0.09 -1.96 -1.67 1483 1
    a[2] -0.89 0.07 -1.00 -0.78 1562 1
    > precis(m2,depth=3)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1,1] -2.05 0.11 -2.23 -1.89 2482 1
    a[1,2] -0.66 0.18 -0.96 -0.38 2369 1
    a[2,1] -1.77 0.19 -2.06 -1.47 2278 1
    a[2,2] -0.68 0.08 -0.81 -0.56 2210 1
    total e ect shows disadvantage
    direct e ect confounded

    View full-size slide

  12. post2 <- extract.samples(m2)
    dens(inv_logit(post2$a[,1,1]),lwd=3,col=4,xli
    m=c(0,0.5),xlab="probability admission")
    dens(inv_logit(post2$a[,2,1]),lwd=3,col=4,lty
    =2,add=TRUE)
    dens(inv_logit(post2$a[,1,2]),lwd=3,col=2,add
    =TRUE)
    dens(inv_logit(post2$a[,2,2]),lwd=3,col=2,lty
    =2,add=TRUE)
    > precis(m1,depth=3)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1] -1.82 0.09 -1.96 -1.67 1483 1
    a[2] -0.89 0.07 -1.00 -0.78 1562 1
    total e ect shows disadvantage
    direct e ect confounded
    D2,G2
    D2,G1
    0.0 0.1 0.2 0.3 0.4 0.5
    0 10 20 30 40
    probability admission
    Density
    D1,G1
    D1,G2

    View full-size slide

  13. You guessed it: Collider bias
    Stratifying by D opens non-
    causal path through u
    Can estimate total causal e ect
    of G, but isn’t what we want
    Cannot estimate direct e ect
    of D or G
    Worst place in the world, Collider Bias country

    View full-size slide

  14. You guessed it: Collider bias
    More intuitive explanation:
    High ability G1s apply to
    discriminatory department anyway
    G1s in that department are higher
    ability on average than G2s
    High ability compensates for
    discrimination => masks evidence
    G
    D
    A
    u

    View full-size slide

  15. https://arxiv.org/abs/2207.13665

    View full-size slide

  16. Citation networks
    Citation networks of members of
    NAS
    Women associated with lower
    lifetime citation rate
    G
    C
    gender
    citations
    Lerman et al 2022 Gendered citation patterns among the scienti c elite

    View full-size slide

  17. Membership
    Elections to NAS
    Women associated with 3–15 times
    higher election rate, controlling for
    citations
    gender
    Card et al 2022 Gender Gaps at the Academies
    NAS member
    citations
    G M
    C

    View full-size slide

  18. gender NAS member
    citations
    G M
    C
    q quality (hidden)

    View full-size slide

  19. gender NAS member
    citations
    G M
    C
    q quality (hidden)
    Restrict sample to NAS members, examine citations
    If men less likely to be elected, then must
    have higher q,C to compensate

    View full-size slide

  20. gender NAS member
    citations
    G M
    C
    q quality (hidden)
    Control for citations, examine elections to NAS
    G is “treatment”. C is a post-treatment variable!
    If women less likely to be cited (bias), then
    women more likely to be elected because they
    have higher q than indicated by C

    View full-size slide

  21. No causes in; No causes out
    Hyped papers with vague estimands,
    unwise adjustment sets
    Policy design through collider bias?
    We can do better
    Strong assumptions required
    Qualitative data useful
    G M
    C
    q

    View full-size slide

  22. Sensitivity analysis
    What are the implications of what
    we don’t know?
    Assume confound exists, model its
    consequences for di erent
    strengths/kinds of in uence
    How strong must the confound be
    to change conclusions?
    G
    D
    A
    u

    View full-size slide

  23. Sensitivity analysis
    A
    i
    ∼ Bernoulli(p
    i
    )
    logit(p
    i
    ) = α[G
    i
    , D
    i
    ] + β
    G[i]
    u
    i
    G
    D
    A
    u

    View full-size slide

  24. Sensitivity analysis
    A
    i
    ∼ Bernoulli(p
    i
    )
    logit(p
    i
    ) = α[G
    i
    , D
    i
    ] + β
    G[i]
    u
    i
    G
    D
    A
    u
    (D
    i
    = 2) ∼ Bernoulli(q
    i
    )
    logit(q
    i
    ) = δ[G
    i
    ] + γ
    G[i]
    u
    i
    G
    D
    A
    u

    View full-size slide

  25. A
    i
    ∼ Bernoulli(p
    i
    )
    logit(p
    i
    ) = α[G
    i
    , D
    i
    ] + β
    G[i]
    u
    i
    (D
    i
    = 2) ∼ Bernoulli(q
    i
    )
    logit(q
    i
    ) = δ[G
    i
    ] + γ
    G[i]
    u
    i
    datl <- dat_sim
    datl$D2 <- ifelse(datl$D==2,1,0)
    datl$N <- length(datl$D)
    datl$b <- c(1,1)
    datl$g <- c(1,0)
    mGDu <- ulam(
    alist(
    # A model
    A ~ bernoulli(p),
    logit(p) <- a[G,D] + b[G]*u[i],
    matrix[G,D]:a ~ normal(0,1),
    # D model
    D2 ~ bernoulli(q),
    logit(q) <- delta[G] + g[G]*u[i],
    delta[G] ~ normal(0,1),
    # declare unobserved u
    vector[N]:u ~ normal(0,1)
    ), data=datl , chains=4 , cores=4 )
    u
    j
    ∼ Normal(0,1)

    View full-size slide

  26. 0.0 0.1 0.2 0.3 0.4 0.5
    0 10 20 30 40
    probability admission
    Density
    0.0 0.1 0.2 0.3 0.4 0.5
    0 10 20 30 40
    probability admission
    Density
    Ignore
    confound
    Assume
    confound
    D2,G2
    D2,G1
    D1,G1
    D1,G2

    View full-size slide

  27. Sensitivity analysis
    What are the implications of what we
    don’t know?
    Somewhere between pure simulation
    and pure analysis
    Vary confound strength over range and
    show how results change –or– vary other
    e ects and estimate confound strength
    Confounds persist — don’t pretend
    G
    D
    A
    u
    0.0 0.1 0.2 0.3 0.4 0.5
    0 10 20 30 40
    probability admission
    Density

    View full-size slide

  28. More parameters (2006)
    than observations (2000)!
    A
    i
    ∼ Bernoulli(p
    i
    )
    logit(p
    i
    ) = α[G
    i
    , D
    i
    ] + β
    G[i]
    u
    i
    (D
    i
    = 2) ∼ Bernoulli(q
    i
    )
    logit(q
    i
    ) = δ[G
    i
    ] + γ
    G[i]
    u
    i
    u
    j
    ∼ Normal(0,1)

    View full-size slide

  29. Charlie Chaplin, Modern Times (1936)

    View full-size slide

  30. Oceanic Technology
    How is technological
    complexity related to
    population size?
    To social structure?
    data(Kline)
    In uence of population size
    and contact on total tools
    Kline & Boyd 2010 Population size predicts technological complexity in Oceania

    View full-size slide

  31. Oceanic Technology
    data(Kline)
    culture population contact total_tools mean_TU
    1 Malekula 1100 low 13 3.2
    2 Tikopia 1500 low 22 4.7
    3 Santa Cruz 3600 low 24 4.0
    4 Yap 4791 high 43 5.0
    5 Lau Fiji 7400 high 33 5.0
    6 Trobriand 8000 high 19 4.0
    7 Chuuk 9200 high 40 3.8
    8 Manus 13000 low 28 6.6
    9 Tonga 17500 high 55 5.4
    10 Hawaii 275000 low 71 6.6
    Kline & Boyd 2010 Population size predicts technological complexity in Oceania

    View full-size slide

  32. Tools
    Innovations
    Population Loss
    Contact

    View full-size slide

  33. P
    C
    T
    L
    contact
    population tools
    location

    View full-size slide

  34. P
    C
    T
    L
    contact
    population tools
    location
    Adjustment set for P: L
    Also want to stratify by C,
    to study interaction

    View full-size slide

  35. Modeling tools
    Tool count is not binomial: No maximum
    Poisson distribution: Very high maximum and very low
    probability of each success
    Here: Many many possible technologies, very few realized in
    any one place

    View full-size slide

  36. Poisson link is log
    Poisson distribution takes shape
    from expected value
    Must be positive
    Exponential scaling can be
    surprising!
    Y
    i
    ∼ Poisson(λ
    i
    )
    log(λ
    i
    ) = α + βx
    i
    λ
    i
    = exp(α + βx
    i
    )
    log(λ
    i
    )
    exp

    View full-size slide

  37. Poisson (poison) priors
    Exponential scaling can be surprising
    α ∼ Normal(0,10)
    log(λ
    i
    ) = α

    View full-size slide

  38. Poisson (poison) priors
    Exponential scaling can be surprising
    α ∼ Normal(0,10)
    log(λ
    i
    ) = α
    0 20 40 60 80 100
    0.00 0.01 0.02 0.03 0.04 0.05
    mean number of tools
    Density

    View full-size slide

  39. 0 20 40 60 80 100
    0.00 0.01 0.02 0.03 0.04 0.05
    mean number of tools
    Density
    Poisson (poison) priors
    Exponential scaling can be surprising
    α ∼ Normal(0,10)
    log(λ
    i
    ) = α
    α ∼ Normal(3,0.5)

    View full-size slide

  40. Poisson priors
    -2 -1 0 1 2
    0 20 40 60 80 100
    x value
    expected count
    α ∼ Normal(0,1)
    log(λ
    i
    ) = α + βx
    i
    β ∼ Normal(0,10)
    -2 -1 0 1 2
    0 20 40 60 80 100
    x value
    expected count
    α ∼ Normal(3,0.5)
    β ∼ Normal(0,0.2)

    View full-size slide

  41. data(Kline)
    d <- Kline
    d$P <- scale( log(d$population) )
    d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
    dat <- list(
    T = d$total_tools ,
    P = d$P ,
    C = d$contact_id )
    # intercept only
    m11.9 <- ulam(
    alist(
    T ~ dpois( lambda ),
    log(lambda) <- a,
    a ~ dnorm( 3 , 0.5 )
    ), data=dat , chains=4 , log_lik=TRUE )
    # interaction model
    m11.10 <- ulam(
    alist(
    T ~ dpois( lambda ),
    log(lambda) <- a[C] + b[C]*P,
    a[C] ~ dnorm( 3 , 0.5 ),
    b[C] ~ dnorm( 0 , 0.2 )
    ), data=dat , chains=4 , log_lik=TRUE )
    compare( m11.9 , m11.10 , func=PSIS )
    Y
    i
    ∼ Poisson(λ
    i
    )
    log(λ
    i
    ) = α
    C[i]
    + β
    C[i]
    log(P
    i
    )
    α
    j
    ∼ Normal(3,0.5)
    β
    j
    ∼ Normal(0,0.2)

    View full-size slide

  42. data(Kline)
    d <- Kline
    d$P <- scale( log(d$population) )
    d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
    dat <- list(
    T = d$total_tools ,
    P = d$P ,
    C = d$contact_id )
    # intercept only
    m11.9 <- ulam(
    alist(
    T ~ dpois( lambda ),
    log(lambda) <- a,
    a ~ dnorm( 3 , 0.5 )
    ), data=dat , chains=4 , log_lik=TRUE )
    # interaction model
    m11.10 <- ulam(
    alist(
    T ~ dpois( lambda ),
    log(lambda) <- a[C] + b[C]*P,
    a[C] ~ dnorm( 3 , 0.5 ),
    b[C] ~ dnorm( 0 , 0.2 )
    ), data=dat , chains=4 , log_lik=TRUE )
    compare( m11.9 , m11.10 , func=PSIS )
    Y
    i
    ∼ Poisson(λ
    i
    )
    log(λ
    i
    ) = α
    C[i]
    + β
    C[i]
    log(P
    i
    )
    α
    j
    ∼ Normal(3,0.5)
    β
    j
    ∼ Normal(0,0.2)
    > compare( m11.9 , m11.10 , func=PSIS )
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    PSIS SE dPSIS dSE pPSIS weight
    m11.10 85.9 13.50 0.0 NA 7.3 1
    m11.9 141.3 33.69 55.4 33.13 8.0 0

    View full-size slide

  43. > compare( m11.9 , m11.10 , func=PSIS )
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
    PSIS SE dPSIS dSE pPSIS weight
    m11.10 85.9 13.50 0.0 NA 7.3 1
    m11.9 141.3 33.69 55.4 33.13 8.0 0
    pPSIS is the “penalty”, the e ective number of parameters.
    How can a model with more parameters (m11.10) have fewer
    e ective parameters?
    m11.9 changes more when individual societies are dropped

    View full-size slide

  44. -1 0 1 2
    0 20 40 60
    log population (std)
    total tools
    Yap
    Trobriand
    Tonga
    Hawaii
    k <- PSIS( m11.10 , pointwise=TRUE )$k
    plot( dat$P , dat$T , xlab="log population (std)" ,
    ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )
    # set up the horizontal axis values to compute predictions
    at
    P_seq <- seq( from=-1.4 , to=3 , len=100 )
    # predictions for C=1 (low contact)
    lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
    lmu <- apply( lambda , 2 , mean )
    lci <- apply( lambda , 2 , PI )
    lines( P_seq , lmu , lty=2 , lwd=1.5 )
    shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) )
    # predictions for C=2 (high contact)
    lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
    lmu <- apply( lambda , 2 , mean )
    lci <- apply( lambda , 2 , PI )
    lines( P_seq , lmu , lty=1 , lwd=1.5 )
    shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3))
    Points scaled by leverage

    View full-size slide

  45. -1 0 1 2
    0 20 40 60
    log population (std)
    total tools
    Yap
    Trobriand
    Tonga
    Hawaii
    0 50000 150000 250000
    0 20 40 60
    population
    total tools
    Tonga
    Hawaii
    log scale Natural scale

    View full-size slide

  46. 0 50000 150000 250000
    0 20 40 60
    population
    total tools
    Tonga
    Hawaii

    View full-size slide

  47. 0 50000 150000 250000
    0 20 40 6
    total tools
    Tonga

    View full-size slide

  48. is model is wack
    Two immediate ways to improve the
    model
    (1) Use a robust model:
    gamma-Poisson (neg-binomial)
    (2) A principled scienti c model
    0 50000 150000 250000
    0 20 40 60
    population
    total tools
    Tonga
    Hawaii

    View full-size slide

  49. ΔT = αPβ − γT

    View full-size slide

  50. ΔT = αPβ − γT
    change in tools

    View full-size slide

  51. ΔT = αPβ − γT
    change in tools
    innovation
    rate

    View full-size slide

  52. ΔT = αPβ − γT
    change in tools
    innovation
    rate
    diminishing returns
    (elasticity)

    View full-size slide

  53. ΔT = αPβ − γT
    change in tools
    innovation
    rate
    diminishing returns
    (elasticity)
    rate of loss

    View full-size slide

  54. ΔT = α
    C

    C
    − γT
    diminishing returns
    depend upon contact
    innovation depends
    upon contact

    View full-size slide

  55. ΔT = α
    C

    C
    − γT
    Di erence equation: Says how tools
    change, not expected number
    What is the equilibrium?

    View full-size slide

  56. ΔT = α
    C

    C
    − γT
    0 10 20 30 40 50
    0 2 4 6 8 10
    time
    Tools
    P = 1e4
    P = 1e3
    f <- function(a=0.02,b=0.5,g=0.2,P=1e4,t_max=50) {
    T <- rep(0,t_max)
    for ( i in 2:t_max )
    T[i] <- T[i-1] + a*P^b - g*T[i-1]
    return(T)
    }
    plot( NULL , xlim=c(0,50) , ylim=c(0,10) ,
    xlab="time" , ylab="Tools" )
    T <- f(P=1e3)
    lines( 1:50 , T , lwd=3 , col=2 )
    T <- f(P=1e4)
    lines( 1:50 , T , lwd=3 , col=2 )

    View full-size slide

  57. ΔT = α
    C

    C
    − γT = 0
    Solve for equilibrium T

    View full-size slide

  58. ΔT = α
    C

    C
    − γT = 0
    ̂
    T =
    α
    C

    C
    γ
    Solve for equilibrium T

    View full-size slide

  59. ̂
    T =
    α
    C

    C
    γ
    T
    i
    ∼ Poisson(λ
    i
    )
    λ
    i
    = ̂
    T

    View full-size slide

  60. # innovation/loss model
    dat2 <- list( T=d$total_tools, P=d$population,
    C=d$contact_id )
    m11.11 <- ulam(
    alist(
    T ~ dpois( lambda ),
    lambda <- exp(a[C])*P^b[C]/g,
    a[C] ~ dnorm(1,1),
    b[C] ~ dexp(1),
    g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 )
    All parameters must be positive
    Two ways to do this
    (1) use exp()
    (2) use appropriate prior
    ̂
    T =
    α
    C

    C
    γ

    View full-size slide

  61. All parameters must be positive
    Two ways to do this
    (1) use exp()
    (2) use appropriate prior
    # innovation/loss model
    dat2 <- list( T=d$total_tools, P=d$population,
    C=d$contact_id )
    m11.11 <- ulam(
    alist(
    T ~ dpois( lambda ),
    lambda <- exp(a[C])*P^b[C]/g,
    a[C] ~ dnorm(1,1),
    b[C] ~ dexp(1),
    g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 )
    ̂
    T =
    α
    C

    C
    γ
    > precis(m11.11,2)
    mean sd 5.5% 94.5% n_eff Rhat4
    a[1] 0.85 0.68 -0.26 1.90 698 1
    a[2] 0.93 0.83 -0.39 2.31 902 1
    b[1] 0.26 0.03 0.21 0.32 1149 1
    b[2] 0.29 0.10 0.12 0.45 711 1
    g 1.11 0.70 0.32 2.43 862 1

    View full-size slide

  62. 0 50000 150000 250000
    0 20 40 60
    population
    total tools
    Tonga
    Hawaii
    # innovation/loss model
    dat2 <- list( T=d$total_tools, P=d$population,
    C=d$contact_id )
    m11.11 <- ulam(
    alist(
    T ~ dpois( lambda ),
    lambda <- exp(a[C])*P^b[C]/g,
    a[C] ~ dnorm(1,1),
    b[C] ~ dexp(1),
    g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 )
    ̂
    T =
    α
    C

    C
    γ

    View full-size slide

  63. 0 50000 150000 250000
    0 20 40 60
    population
    total tools
    Tonga
    Hawaii
    0 50000 150000 250000
    0 20 40 60
    population
    total tools
    Tonga
    Hawaii
    Innovation/loss model Generalized linear model
    Still have to deal with location as confound

    View full-size slide

  64. Count GLMs
    Distributions from constraints
    Maximum entropy priors: Binomial,
    Poisson, and extensions
    More event types: Multinomial and
    categorical
    Robust regressions:
    Beta-binomial, gamma-Poisson

    View full-size slide

  65. Course Schedule
    Week 1 Bayesian inference Chapters 1, 2, 3
    Week 2 Linear models & Causal Inference Chapter 4
    Week 3 Causes, Confounds & Colliders Chapters 5 & 6
    Week 4 Over tting / MCMC Chapters 7, 8, 9
    Week 5 Generalized Linear Models Chapters 10, 11
    Week 6 Mixtures & ordered categories Chapters 11 & 12
    Week 7 Multilevel models I Chapter 13
    Week 8 Multilevel models II Chapter 14
    Week 9 Measurement & Missingness Chapter 15
    Week 10 Generalized Linear Madness Chapter 16
    https://github.com/rmcelreath/stat_rethinking_2023

    View full-size slide

  66. Simpson’s Pandora’s Box
    Simpson’s paradox: Reversal of an
    association when groups are
    combined or separated
    No way to know which association
    (combined/separated) is correct
    without causal assumptions
    In nite evils unleashed
    Simpson 1951. e Interpretation of Interaction in Contingency Tables
    The container HELD
    every imaginable
    estimate.
    Pandora opens a spreadsheet
    Simpson’s paradox
    plagues us to this day.

    View full-size slide

  67. Berkeley Paradox
    Unconditional on department:
    Women admitted at lower rate
    Conditional on department:
    Women admitted slightly more
    Which is correct? No way to
    know without assumptions
    dept admit reject applications gender
    1 A 512 313 825 male
    2 A 89 19 108 female
    3 B 353 207 560 male
    4 B 17 8 25 female
    5 C 120 205 325 male
    6 C 202 391 593 female
    7 D 138 279 417 male
    8 D 131 244 375 female
    9 E 53 138 191 male
    10 E 94 299 393 female
    11 F 22 351 373 male
    12 F 24 317 341 female

    View full-size slide

  68. Berkeley Paradox
    Which is correct? No way to
    know without assumptions
    Mediator (department)
    Collider + confound (ability)
    dept admit reject applications gender
    1 A 512 313 825 male
    2 A 89 19 108 female
    3 B 353 207 560 male
    4 B 17 8 25 female
    5 C 120 205 325 male
    6 C 202 391 593 female
    7 D 138 279 417 male
    8 D 131 244 375 female
    9 E 53 138 191 male
    10 E 94 299 393 female
    11 F 22 351 373 male
    12 F 24 317 341 female

    View full-size slide

  69. X Z Y
    e Pipe
    X Z Y
    e Fork
    X Z Y
    e Collider
    X Z Y
    e Descendant
    A

    View full-size slide

  70. Z = 1
    Z = 0
    X Z Y
    -3 -2 -1 0 1 2 3 4
    -3 -2 -1 0 1 2 3
    X
    Y
    cols <- c(4,2)
    N <- 300
    Z <- rbern(N)
    X <- rnorm(N,2*Z-1)
    Y <- rnorm(N,2*Z-1)
    plot( X , Y , col=cols[Z+1] , lwd=3 )
    abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
    abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
    abline(lm(Y~X),lwd=3)

    View full-size slide

  71. Z = 1
    Z = 0
    cols <- c(4,2)
    N <- 300
    X <- rnorm(N)
    Z <- rbern(N,inv_logit(X))
    Y <- rnorm(N,(2*Z-1))
    plot( X , Y , col=cols[Z+1] , lwd=3 )
    abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
    abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
    abline(lm(Y~X),lwd=3)
    X Z Y
    -3 -2 -1 0 1 2 3
    -3 -2 -1 0 1 2 3
    X
    Y

    View full-size slide

  72. Z = 1
    Z = 0
    cols <- c(4,2)
    N <- 300
    X <- rnorm(N)
    Y <- rnorm(N)
    Z <- rbern(N,inv_logit(2*X+2*Y-2))
    plot( X , Y , col=cols[Z+1] , lwd=3 )
    abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
    abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
    abline(lm(Y~X),lwd=3)
    X Z Y
    -2 -1 0 1 2 3
    -2 -1 0 1 2 3
    X
    Y

    View full-size slide

  73. Non-linear haunting
    In event models, e ect reversal
    can arise other ways
    Example: Base rate di erences
    X Y
    Z
    treatment outcome
    covariate

    View full-size slide

  74. # base rate differences erasing effect of X
    N <- 1000
    X <- rnorm(N)
    Z <- rbern(N)
    p <- inv_logit(X + ifelse(Z==1,-1,5))
    Y <- rbern(N,p)
    mX <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b*X,
    a ~ dnorm(0,1),
    b ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X) )
    mXZ <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b[Z]*X,
    a ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    X Y
    Z

    View full-size slide

  75. X Y
    Z
    # base rate differences erasing effect of X
    N <- 1000
    X <- rnorm(N)
    Z <- rbern(N)
    p <- inv_logit(X + ifelse(Z==1,-1,5))
    Y <- rbern(N,p)
    mX <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b*X,
    a ~ dnorm(0,1),
    b ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X) )
    mXZ <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b[Z]*X,
    a ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    logit(p
    i
    ) = α + βX
    i

    View full-size slide

  76. # base rate differences erasing effect of X
    N <- 1000
    X <- rnorm(N)
    Z <- rbern(N)
    p <- inv_logit(X + ifelse(Z==1,-1,5))
    Y <- rbern(N,p)
    mX <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b*X,
    a ~ dnorm(0,1),
    b ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X) )
    mXZ <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b[Z]*X,
    a ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    X Y
    Z
    logit(p
    i
    ) = α + βX
    i
    logit(p
    i
    ) = α + β
    Z[i]
    X
    i

    View full-size slide

  77. -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    logit(p
    i
    ) = α + βX
    i
    logit(p
    i
    ) = α + β
    Z[i]
    X
    i
    Z = 1
    Z = 2
    Z collapsed

    View full-size slide

  78. -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    logit(p
    i
    ) = α + β
    Z[i]
    X
    i
    Z = 1
    Z = 2
    -0.5 0.0 0.5 1.0
    0 1 2 3 4
    beta[Z]
    Density

    View full-size slide

  79. mX <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b*X,
    a ~ dnorm(0,1),
    b ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X) )
    mXZ <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a + b[Z]*X,
    a ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    mXZ2 <- quap(
    alist(
    Y ~ dbern(p),
    logit(p) <- a[Z] + b[Z]*X,
    a[Z] ~ dnorm(0,1),
    b[Z] ~ dnorm(0,1)
    ) , data=list(Y=Y,X=X,Z=Z+1) )
    logit(p
    i
    ) = α + βX
    i
    logit(p
    i
    ) = α + β
    Z[i]
    X
    i
    logit(p
    i
    ) = α
    Z[i]
    + β
    Z[i]
    X
    i

    View full-size slide

  80. -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    logit(p
    i
    ) = α + β
    Z[i]
    X
    i
    Z = 1
    Z = 2
    logit(p
    i
    ) = α
    Z[i]
    + β
    Z[i]
    X
    i
    -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    Z = 1
    Z = 2

    View full-size slide

  81. -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    logit(p
    i
    ) = α + β
    Z[i]
    X
    i
    Z = 1
    Z = 2
    logit(p
    i
    ) = α
    Z[i]
    + β
    Z[i]
    X
    i
    -2 -1 0 1 2
    0.0 0.2 0.4 0.6 0.8 1.0
    treatment (X)
    outcome (Y)
    Z = 1
    Z = 2
    -1.0 -0.5 0.0 0.5 1.0 1.5
    0.0 1.0 2.0 3.0
    beta[Z]
    Density
    -0.5 0.0 0.5 1.0
    0 1 2 3 4
    beta[Z]
    Density

    View full-size slide

  82. logit(p
    i
    ) = α
    Z[i]
    + β
    Z[i]
    X
    i
    Z = 1
    Z = 2
    -1.0 -0.5 0.0 0.5 1.0 1.5
    0.0 1.0 2.0 3.0
    beta[Z]
    Density

    View full-size slide

  83. Simpson’s Pandora’s Box
    No paradox, because almost
    anything can produce it
    People do not have intuitions
    about coe cient reversals
    Stop naming statistical paradoxes;
    start teaching scienti c logic
    Simpson 1951. e Interpretation of Interaction in Contingency Tables
    The container HELD
    every imaginable
    estimate.
    Pandora opens a spreadsheet
    Simpson’s paradox
    plagues us to this day.

    View full-size slide