Richard McElreath
February 01, 2019

# L12 Statistical Rethinking Winter 2019

Lecture 12 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers Chapter 11: generalized linear models, binomial and Poisson GLMs, survival analysis.

## Transcript

Week 6

[Figure 11.4: Proportion pulling the left lever (0 to 1) for actors 1 through 7 under treatments R/N, L/N, R/P, L/P. Panels: observed proportions; posterior predictions.]

Relative and absolute effects
• Parameters on relative effect scale
• Predictions on absolute effect scale
• Proportional odds: Relative effect measure
• Good for scaring people, getting published
• Not so good for public health, scientific progress
• But needed for causal inference

[Textbook excerpt shown on the slide; the right edge is cropped, so "[…]" marks lost text and bracketed words are reconstructed from context.]

Relative shark and absolute penguin. In the analysis above, I mostly fo[…] changes in predictions on the outcome scale: how much difference does the treatm[…] in the probability of pulling a lever? This view of posterior prediction focuses on a[bsolute] effects, the difference a counterfactual change in a variable might make on an […] scale of measurement, like the probability of an event.

It is more common to see logistic regressions interpreted through relative […]. Relative effects are proportional changes in the odds of an outcome. If we change […] and say the odds of an outcome double, then we are discussing relative effects. You […]late these proportional odds relative effect sizes by simply exponentiating the p[…] of interest. For example, to calculate the proportional odds of switching from treat[ment 2 to] treatment [4] (adding a partner):

R code: `post <- extract.samples(m11.4)` then `mean( exp(post$b[,4]-post$b[,2]) )`, which prints `[1] 0.9206479`.

On average, the switch multiplies the odds of pulling the left lever by [0.92], an [8%] r[eduction] in odds. This is what is meant by proportional odds. The new odds are calculated […] the old odds and multiplying them by the proportional odds, which is [0.92] in this […].

The risk of focusing on relative effects, such as proportional odds, is that th[…] enough to tell us whether a variable is important or not. If the other parameters in t[…] make the outcome very unlikely, then even a large proportional odds like […] would […] the outcome frequent. Consider for example a rare disease which occurs in […] per […]on people. Suppose also that reading this textbook increased the odds of the disease f[…]
Relative and absolute effects
• Parameters on relative effect scale
• Predictions on absolute effect scale
• Using relative effects may exaggerate importance of predictor
• Good for scaring people, getting published
• Not so good for public health, scientific progress
• But needed for causal inference
[Image labels: relative shark, absolute penguin]
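
The gap between the two scales can be sketched numerically. This is a minimal illustration, not the lecture's code: the helper name `apply_proportional_odds`, the proportional odds of 5, and the three baseline probabilities are all assumed values chosen for the example.

```python
def apply_proportional_odds(p, proportional_odds):
    """Multiply the odds implied by probability p, then convert back to a probability."""
    odds = p / (1.0 - p)
    new_odds = odds * proportional_odds
    return new_odds / (1.0 + new_odds)


# Hypothetical values: the same relative effect applied to three different baselines.
PROPORTIONAL_ODDS = 5.0
for baseline in (1e-6, 0.01, 0.5):
    after = apply_proportional_odds(baseline, PROPORTIONAL_ODDS)
    print(f"baseline={baseline:g}  after={after:.6g}  absolute change={after - baseline:.6g}")
```

The same relative effect produces a negligible absolute change when the baseline probability is tiny and a large one when the baseline is near 0.5.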
Risk communication
• Many people mistake relative risk for absolute risk
• Example:
• 1/1000 women develop blood clots
• 3/1000 women on birth control develop blood clots
• => 200% increase in blood clots!
• Change in probability is only 0.002
• Pregnancy much more dangerous than blood clots
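
The arithmetic behind the blood clot example, as a small sketch. The function name `risk_change` is made up for illustration; the two risks are the ones on the slide.

```python
def risk_change(baseline_risk, exposed_risk):
    """Return (relative increase, absolute increase) between two risks."""
    absolute = exposed_risk - baseline_risk
    relative = absolute / baseline_risk
    return relative, absolute


# Slide values: 1/1000 baseline, 3/1000 on birth control.
relative, absolute = risk_change(1 / 1000, 3 / 1000)
print(f"relative increase: {relative:.0%}")
print(f"absolute increase: {absolute:.3f}")
```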
Aggregated Binomial • Numbers accepted/rejected to 6 PhD programs at

UCB admissions dept applicant.gender admit reject applications 1 A male

Posterior contrast • Compute the contrast between genders • On

[Figure 11.5: Posterior validation check for model m11.7. Horizontal axis: case (1 to 12), grouped by department A to F; vertical axis: admit (0 to 1); point labels m and f. Blue points are observed proportions admitted for each row in the data, with points from the same department connected by a blue line. Open points, the tiny vertical […]]

Females admitted more in all but 2 departments!
Backdoor admissions • Backdoor path through department • Use unique

Stratification by department Stat Q: What are the average probabilities

Again with shark & penguin • Difference on logit and

Backdoor admissions • What happened? Females apply more to most

Backdoor admissions • Careful about casual interpretation • No evidence

Poisson GLMs
• Counts without upper limit, constant expected value
• Single parameter: events per unit time/distance
• Variance equal to mean

$$y \sim \mathrm{Poisson}(\lambda), \qquad \mathrm{E}(y) = \lambda, \qquad \mathrm{var}(y) = \lambda$$

Also in the slide text:

$$y_i \sim \mathrm{Normal}(\mu_i, \sigma), \qquad \mu_i = \alpha + \beta x_i, \qquad \mathrm{E}(y_i \mid x_i) = \alpha + \beta x_i, \qquad \frac{\partial}{\partial x_i} \mathrm{E}(y_i \mid x_i) = \beta$$

$$y_i \sim \mathrm{Binomial}(p_i, n)$$

[Two histograms: Frequency against Count.]
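
The "variance equal to mean" property can be checked directly from the Poisson probability mass function. This is a minimal sketch; the function names, the truncation point `k_max`, and the rate of 3 are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k, computed on the log scale to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_mean_var(lam, k_max=100):
    """Mean and variance from summing the pmf over 0..k_max (the tail beyond is negligible for small lam)."""
    ks = range(k_max + 1)
    probs = [poisson_pmf(k, lam) for k in ks]
    mean = sum(k * p for k, p in zip(ks, probs))
    var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
    return mean, var


mean, var = poisson_mean_var(3.0)  # hypothetical rate
print(mean, var)
```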
Poisson GLMs
• Examples: Soccer goals, fission events, photons striking a detector, DNA mutations, soldiers killed by horses
[Portraits: Siméon Denis Poisson (1781–1840), Abraham de Moivre (1667–1754)]
Oceanic tool complexity

Log link • Goal: Map linear model to positive reals

Priors & the log link • Log link not intuitive
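
One way to see why priors behave unintuitively under a log link: if the intercept has a Normal prior, the implied prior on the expected count is log-normal, whose mean is exp(mu + sigma^2 / 2). The sketch below uses two priors chosen purely for illustration, Normal(0, 10) and Normal(3, 0.5); neither is taken from this transcript.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of exp(a) when a ~ Normal(mu, sigma)."""
    return math.exp(mu + sigma**2 / 2)


# Assumed example priors on the intercept of a log-link model.
wide = lognormal_mean(0.0, 10.0)   # a seemingly vague prior
tight = lognormal_mean(3.0, 0.5)   # a narrower prior

print(f"implied prior mean count, Normal(0, 10):  {wide:.3g}")
print(f"implied prior mean count, Normal(3, 0.5): {tight:.3g}")
```

A prior that looks flat on the linear scale puts enormous expected counts on the outcome scale, which is why prior predictive checks matter here.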

Compare using PSIS-LOO • Warning indicates strongly influential points •

Hawaii has leverage

Generalized Linear Madness • This model is terrible: • Intercepts

Scientific model • Change in tools per unit time:

Scientific model • Solve for steady state expected number of

Poisson exposure (offsets) • Poisson outcome: events per unit time/distance
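
A standard way to write the exposure idea, given here as a sketch: the symbol $\tau_i$ for exposure (time or distance observed) and the linear predictor $\alpha + \beta x_i$ are assumed notation, not taken from the slide. If $\lambda_i$ is the rate per unit of exposure, the expected count is the rate times the exposure, and the log of the exposure enters the linear model as an offset:

$$
\begin{aligned}
y_i &\sim \mathrm{Poisson}(\mu_i), \qquad \mu_i = \lambda_i \tau_i \\
\log \mu_i &= \log \tau_i + \log \lambda_i = \log \tau_i + \alpha + \beta x_i
\end{aligned}
$$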

Additional count distributions
• Multinomial/categorical: generalized binomial, more than 2 un-ordered outcomes
• Geometric: number of trials until specific event
• Mixtures, coping with heterogeneity:
• Beta-binomial: varying probabilities
• Gamma-Poisson: aka negative binomial, varying rates
• Others (e.g. Dirichlet-multinomial)
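
A small simulation of the gamma-Poisson idea, as a sketch under assumed values: each observation gets its own rate drawn from a gamma distribution, which makes the counts more variable than a plain Poisson with the same mean. The shape, scale, sample size, seed, and helper names are illustrative choices, and the Poisson sampler is a simple standard-library stand-in.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n, shape, scale, seed=1):
    """Return (pure Poisson counts, gamma-Poisson counts) with the same mean shape * scale."""
    rng = random.Random(seed)
    pure = [sample_poisson(shape * scale, rng) for _ in range(n)]
    mixed = [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]
    return pure, mixed


pure, mixed = simulate(n=20000, shape=2.0, scale=2.5)  # hypothetical settings
for name, counts in (("Poisson", pure), ("gamma-Poisson", mixed)):
    print(name, round(statistics.mean(counts), 2), round(statistics.pvariance(counts), 2))
```

In the pure Poisson sample the variance stays close to the mean; in the mixture the variance is much larger, which is the heterogeneity the slide refers to.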
Survival Analysis
• Count models are fundamentally about rates
• Rate of heads per coin toss
• Rate of tools per person
• Can also estimate rates by modeling time-to-event
• Tricky, because cannot ignore censored cases
• Left-censored: Don’t know when time started
• Right-censored: Something cut observation off before event occurred
• Ignoring censored cases leads to inferential error
• Imagine estimating time-to-PhD but ignoring people who drop out
• Time in program before dropping out is info about rate
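
The cost of ignoring censored cases can be sketched with simulated exponential waiting times. Everything here is an assumption made for illustration: the true rate of 0.1, the observation cutoff of 10 time units, the sample size, the seed, and the function names. For exponential times with right-censoring, the maximum likelihood estimate of the rate is the number of observed events divided by the total observed time, censored time included.

```python
import random


def simulate_censored(n, rate, cutoff, seed=1):
    """Return (observed_time, event_observed) pairs, right-censored at `cutoff`."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        t = rng.expovariate(rate)
        records.append((min(t, cutoff), t <= cutoff))
    return records


def rate_with_censoring(records):
    """Events divided by total observed time; censored cases still contribute their time."""
    events = sum(1 for _, observed in records if observed)
    total_time = sum(t for t, _ in records)
    return events / total_time


def rate_ignoring_censored(records):
    """Naive estimate that throws away every censored case."""
    times = [t for t, observed in records if observed]
    return len(times) / sum(times)


records = simulate_censored(n=50000, rate=0.1, cutoff=10.0)
print("using censored cases:   ", round(rate_with_censoring(records), 4))
print("ignoring censored cases:", round(rate_ignoring_censored(records), 4))
```

Dropping the censored cases keeps only the short waiting times, so the naive estimate overstates the rate.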
Survival Analysis
• Example: Cat adoptions
• data(AustinCats)
• 20-thousand cats
• time-to-event
• Event either: (1) adopted or (2) something else
• Something else could be: death, escape, censored
Censored cats • Cumulative distribution (CDF): Probability event before-or-at time

[Plot: Posterior survival curves; visible label: Other cats.]

Homework
• 3 problems, 2 data sets, multiple good DAGs
• One of the data sets (NWOGrants) is new in rethinking 1.83, so update
• Next week: More adventures with integers