Lecture 11 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers Chapters 10 and 11, maximum entropy, generalized linear models, binomial GLMs.
The maxent principle (E. T. Jaynes, 1922–1998):
• The distribution with the largest entropy is the distribution most consistent with the stated assumptions
• It can happen the largest number of ways
• For parameters, provides a way to construct (and understand) priors
• For observations, provides a way to construct (and understand) likelihoods
• Also reproduces Bayesian updating as a special case (minimum cross-entropy)
Maximum entropy
From the book: "In Chapter … you met the basics of information theory. In brief, we seek a measure of uncertainty that satisfies three criteria: the measure should be continuous; it should increase as the number of possible events increases; and it should be additive. The resulting unique measure of the uncertainty of a probability distribution $p$ with probabilities $p_i$ for each possible event $i$ turns out to be just the average log-probability: $H(p) = -\sum_i p_i \log p_i$. This function is known as information entropy."
• Q: What kind of distribution maximizes this quantity?
• A: The flattest distribution still consistent with the constraints. This is the distribution that can happen the greatest number of ways.
• Whatever does happen is bound to be one of those ways.
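A minimal sketch of the entropy formula H(p) = −Σ p_i log p_i, assuming the only constraint is that the probabilities sum to one. The four candidate distributions are hypothetical, chosen for illustration; under that single constraint the flat one should come out with the largest entropy.

```python
import math

def entropy(p):
    """Information entropy H(p) = -sum_i p_i * log(p_i), in nats."""
    # Events with p_i == 0 contribute nothing (p * log p tends to 0).
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical candidate distributions over the same four events.
candidates = {
    "flat": [0.25, 0.25, 0.25, 0.25],
    "mild": [0.40, 0.30, 0.20, 0.10],
    "peaked": [0.85, 0.05, 0.05, 0.05],
    "certain": [1.0, 0.0, 0.0, 0.0],
}

entropies = {name: entropy(p) for name, p in candidates.items()}
for name, h in entropies.items():
    print(f"{name:8s} H = {h:.3f}")

# The flattest candidate has the largest entropy.
flattest = max(entropies, key=entropies.get)
print("largest entropy:", flattest)
```

Adding further constraints (for example a fixed expected value) shrinks the set of admissible candidates, and the maxent answer is then the flattest distribution within that smaller set.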
• … variable
• Still geocentric!
• Strategy:
  1. Pick an outcome distribution
  2. Model its parameters using links to linear models
  3. Compute posterior
• Can model multivariate relationships and non-linear responses
• Building blocks of multilevel models
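The three-step strategy can be sketched for the simplest possible case: an intercept-only binomial model with a logit link. Everything specific here is an assumption made for illustration: the data (6 successes in 9 trials), the Normal(0, 1.5) prior on the intercept, and the use of grid approximation to compute the posterior, which is a stand-in and not necessarily how the lecture fits its models.

```python
import math

def inv_logit(x):
    """Map a real-valued linear predictor onto a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def log_binomial(y, n, p):
    """Log probability of y successes in n trials with success probability p."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1 - p)

def log_normal(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Hypothetical data: y successes out of n trials.
y, n = 6, 9

# Step 1: outcome distribution is Binomial(n, p).
# Step 2: link the parameter to a linear model, logit(p) = alpha.
# Step 3: compute the posterior for alpha on a grid (assumed prior: Normal(0, 1.5)).
grid = [-4 + 8 * i / 400 for i in range(401)]
log_post = [log_binomial(y, n, inv_logit(a)) + log_normal(a, 0.0, 1.5) for a in grid]

# Normalize on the log scale for numerical stability.
top = max(log_post)
weights = [math.exp(lp - top) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

# Posterior mean of p, obtained by pushing each alpha through the inverse link.
post_mean_p = sum(w * inv_logit(a) for w, a in zip(posterior, grid))
print(f"posterior mean of p: {post_mean_p:.3f}")
```

Grid approximation only scales to one or two parameters, but it keeps each of the three steps visible.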
• Mostly exponential family
• Arise from natural processes
• Maximum entropy interpretations
• Select from first principles
• Resist histomancy: the superstitious practice of picking likelihoods by gazing at a histogram
[Figure 9.5: exponential-family distributions with their R density functions: y ∼ Normal(µ, σ) (dnorm); y ∼ Binomial(n, p) (dbinom); y ∼ Poisson(λ) (dpois); y ∼ Gamma(λ, k) (dgamma); y ∼ Exponential(λ) (dexp). Surviving arrow label: "… probability, many trials".]
Binomial distribution
y ∼ Binomial(n, p), where y is the count of "successes", n is the number of trials, and p is the probability of success.
• Counts of a specific event out of n possibilities
• Constant expected value
• Maxent: Binomial
• Mean and variance not independent: $\mathrm{E}(y) = np$, $\mathrm{var}(y) = np(1 - p)$
[Figure: histogram of counts; x-axis "Count" (0 to 10), y-axis "Frequency" (0 to 2500); panel title "lambda=0.5".]
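To see why the binomial mean and variance cannot be set independently, the following sketch computes both directly from the probability mass function and compares them with np and np(1 − p). The values of n and p are hypothetical.

```python
import math

def binomial_pmf(y, n, p):
    """Probability of y successes in n trials with success probability p."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

def mean_and_variance(n, p):
    """Mean and variance computed by summing over the whole pmf."""
    pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]
    mean = sum(y * w for y, w in enumerate(pmf))
    var = sum((y - mean) ** 2 * w for y, w in enumerate(pmf))
    return mean, var

n = 10  # hypothetical number of trials
results = {}
for p in (0.1, 0.5, 0.9):
    mean, var = mean_and_variance(n, p)
    results[p] = (mean, var)
    # Both moments are functions of the same two parameters, n and p.
    print(f"p={p}: mean={mean:.3f} (np={n * p:.3f}), "
          f"var={var:.3f} (np(1-p)={n * p * (1 - p):.3f})")
```

Once n and p fix the mean, the variance is fixed too, which is the sense in which the two are not independent.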
Several good answers:
• "Natural" link inside probability formula
• Log-odds is the fundamental parameter
• See Overthinking box, pages 313–314
• Other links sometimes justified
  • Probit (common in economics)
  • Complementary-log-log (cloglog)
• If you have a real scientific model, the link is automatic
From the book: "The logit link maps a parameter that is defined as a probability mass, and therefore constrained to lie between zero and one, onto a linear model that can take on any real value. This link is extremely common when working with binomial GLMs. In the context of a model definition, it looks like this:
$y_i \sim \mathrm{Binomial}(n, p_i)$
$\mathrm{logit}(p_i) = \alpha + \beta x_i$
And the logit function itself is defined as the log-odds:
$\mathrm{logit}(p_i) = \log \frac{p_i}{1 - p_i}$
The "odds" of an event are just the probability it happens divided by the probability it does not happen. So really all that is being stated here is:
$\log \frac{p_i}{1 - p_i} = \alpha + \beta x_i$
So to figure out the definition of $p_i$ implied here, just do a little algebra and solve the above equation for $p_i$:
$p_i = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$"
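A short sketch of the logit link and its inverse, p = exp(x) / (1 + exp(x)), showing that any real-valued linear predictor maps to a probability strictly between zero and one. The coefficients alpha and beta and the predictor values are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: exp(x) / (1 + exp(x)), written as 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients and predictor values for logit(p_i) = alpha + beta * x_i.
alpha, beta = -1.0, 0.5
xs = [-10, -2, 0, 2, 10]

linear = [alpha + beta * x for x in xs]
ps = [inv_logit(v) for v in linear]

for x, v, p in zip(xs, linear, ps):
    # Applying logit to p recovers the linear predictor.
    print(f"x={x:4d}  alpha+beta*x={v:6.2f}  p={p:.4f}  logit(p)={logit(p):6.2f}")
```

Because the link is non-linear, a one-unit change in x changes p by a different amount depending on where on the curve the linear predictor sits.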
• Two options: (1) prosocial, (2) asocial
• Two outcomes: (1) left lever, (2) right lever
• Want to predict outcome as a function of condition and which side the prosocial option is on
• Do chimps prefer the left lever when a partner is present and the prosocial option is on the left? => interaction!
From the book (fragment): "… BINOMIAL … When human students participate in an ex… the lever linked to two pieces of food, the pros…"
[Figure 11.4: proportion of left-lever pulls (y-axis, 0 to 1) for actors 1 to 7 across the four treatments R/N, L/N, R/P, L/P. Two panels: observed proportions and posterior predictions.]
• Predictions on absolute effect scale
• Using relative effects may exaggerate the importance of a predictor
• Good for scaring people, getting published
• Not so good for public health, scientific progress
• But needed for causal inference
[Images: relative shark, absolute penguin]
• … risk
• Example:
  • 1/1000 women develop blood clots
  • 3/1000 women on birth control develop blood clots
  • => 200% increase in blood clots!
  • Change in probability is only 0.002
  • Pregnancy much more dangerous than blood clots
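The arithmetic behind the blood clot example, as a small sketch using the two rates quoted above. The variable names are illustrative.

```python
# Rates from the example: 1 in 1000 at baseline, 3 in 1000 on birth control.
baseline = 1 / 1000
treated = 3 / 1000

# Absolute effect: difference in probability.
absolute_change = treated - baseline

# Relative effect: change expressed as a fraction of the baseline.
relative_increase = (treated - baseline) / baseline

print(f"absolute change in probability: {absolute_change:.3f}")
print(f"relative increase: {relative_increase:.0%}")
```

The same pair of numbers yields a 200% relative increase and a 0.002 absolute change, which is why the scale chosen for reporting matters so much.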