Slide 1

Slide 1 text

The Golem of Prague Statistical Rethinking Winter 2019 Week 1

Slide 2

Slide 2 text

JJ Harrison, “Phylidonyris novaehollandiae Bruny Island”

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

The Golem of Prague go•lem |gōlǝm| noun • (in Jewish legend) a clay figure brought to life by magic. • an automaton or robot. ORIGIN late 19th cent.: from Yiddish goylem, from Hebrew gōlem ‘shapeless mass.’

Slide 9

Slide 9 text

The Golem of Prague “Even the most perfect of Golem, risen to life to protect us, can easily change into a destructive force. Therefore let us treat carefully that which is strong, just as we bow kindly and patiently to that which is weak.” Rabbi Judah Loew ben Bezalel (1512–1609) From Breath of Bones: A Tale of the Golem

Slide 10

Slide 10 text

The Golems of Science

Golem
• Made of clay
• Animated by “truth”
• Powerful
• Blind to creator’s intent
• Easy to misuse
• Fictional

Model
• Made of...silicon?
• Animated by “truth”
• Hopefully powerful
• Blind to creator’s intent
• Easy to misuse
• Not even false

Slide 11

Slide 11 text

Statistical Rethinking: A Bayesian Course in R & Stan

Week 1    Bayesian inference       Chapters 1, 2, 3
Week 2    Linear models            Chapter 4
Week 3    More linear models       Chapters 5 & 6
Week 4    Overfitting              Chapter 7
Week 5    Interactions             Chapter 8
Week 6    MCMC & GLMs              Chapters 9, 10, 11
Week 7    GLMs II                  Chapters 11 & 12
Week 8    Multilevel models I      Chapter 13
Week 9    Multilevel models II     Chapter 14
Week 10   Measurement error etc.   Chapters 15 & 16

https://github.com/rmcelreath/statrethinking_winter2019

Slide 12

Slide 12 text

Goals & Methods • Practical model-building, model-criticizing skills • Enough philosophy to ground you • Enough confidence to be comfortable with confusion

Slide 13

Slide 13 text

Stan
http://mc-stan.org/

rethinking package (Experimental)

    install.packages(c("coda","mvtnorm","devtools","loo"))
    library(devtools)
    devtools::install_github("rmcelreath/rethinking", ref="Experimental")

2nd Edition book draft
http://xcelab.net/rm/sr2/
blossom

Slide 14

Slide 14 text

2nd Edition: Ch-Ch-Changes • Lots of prior predictive simulation • Causal inference: DAGs, colliders, instrumental variables • map becomes quap (name change) • map2stan replaced by ulam • New examples

Slide 15

Slide 15 text

Against Tests • Specialized, pre-made golems, “procedures” • Most developed in early 20th century, fragile, eclipsed by more recent tools • Users don’t know they are using models (golems) • Falsifying null model not sufficient • Inference is not decision O, that way madness lies

Slide 16

Slide 16 text

Figure 1.2
Hypotheses: H0 “Evolution is neutral”
Process models: P0A Neutral, equilibrium
Statistical models: MII

Slide 17

Slide 17 text

Figure 1.2
Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”
Process models: P0A Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection
Statistical models: MII; MIII

Slide 18

Slide 18 text

Figure 1.2
Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”
Process models: P0A Neutral, equilibrium; P0B Neutral, non-equilibrium; P1A Constant selection; P1B Fluctuating selection
Statistical models: MI; MII; MIII

Slide 19

Slide 19 text

Failure of Falsification • Null models not unique • Should falsify explanatory model, not null model • Falsification is consensual, not logical • Falsifiability about demarcation, not method • No statistical procedure sufficient • Science is social technology “There is even something like a methodological justification for individual scientists to be dogmatic and biased. Since the method of science is that of critical discussion, it is of great importance that the theories criticized should be tenaciously defended. For only in this way can we learn their real power.” —Karl Popper, The Myth of the Framework

Slide 20

Slide 20 text

Golem Engineering • Need a framework for developing and vetting statistical golems • Several options • We’ll use this one • Bayesian data analysis • Multilevel modeling • Model comparison From Breath of Bones: A Tale of the Golem

Slide 21

Slide 21 text

Bayesian data analysis • Use probability to describe uncertainty • Extends ordinary logic (true/false) to continuous plausibility • Computationally difficult • Markov chain Monte Carlo (MCMC) to the rescue • Used to be controversial • Ronald Fisher: Bayesian analysis “must be wholly rejected.” Pierre-Simon Laplace (1749–1827) Sir Harold Jeffreys (1891–1989) with Bertha Swirles, aka Lady Jeffreys (1903–1999)

Slide 22

Slide 22 text

Bayesian data analysis Count all the ways data can happen, according to assumptions. Assumptions with more ways that are consistent with data are more plausible.

Slide 23

Slide 23 text

Multilevel models • Models with multiple levels of uncertainty • Replace parameters with models • Common uses • Repeat & imbalanced sampling • Study variation • Avoid averaging • Phylogenetics, factor and path analysis, networks, spatial models • Natural Bayesian strategy

Slide 24

Slide 24 text

Model comparison • Instead of falsifying a null model, compare meaningful models • Basic problems • Overfitting • Causal inference • Ockham’s razor is silly • Information theory less silly • AIC, WAIC, cross-validation • Must distinguish prediction from inference

Slide 25

Slide 25 text

Colombo’s Mistake Behaim’s globe, as detailed in 1492

Slide 26

Slide 26 text

Colombo’s Mistake Behaim’s globe, as detailed in 1492

Slide 27

Slide 27 text

Small and Large Worlds • Sensu L.J. Savage (1954) • Small world: The world of the golem’s assumptions. Bayesian golems are optimal, in the small world. • Large world: The real world. No guarantee of optimality for any kind of golem. • Have to worry about both

Slide 28

Slide 28 text

Bayesian data analysis Count all the ways data can happen, according to assumptions. Assumptions with more ways that are consistent with data are more plausible.

Slide 29

Slide 29 text

Garden of Forking Data
• The future:
  • Full of branching paths
  • Each choice closes some
• The data:
  • Many possible events
  • Each observation eliminates some

Slide 30

Slide 30 text

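Garden of Forking Data

Bag: contains 4 marbles, contents unknown (?)
Possible contents: (1) [W W W W] (2) [B W W W] (3) [B B W W] (4) [B B B W] (5) [B B B B]
Observe: B W B (blue, white, blue)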

Slide 31

Slide 31 text

Conjecture: Data:

Slide 32

Slide 32 text

Conjecture: Data:

Slide 33

Slide 33 text

Conjecture: Data:

Slide 34

Slide 34 text

Conjecture: Data: 3 paths consistent with data
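
The "3 paths consistent with data" result can be checked by brute force. The sketch below assumes a setup the transcript does not show, because the marble images were lost: a bag with one blue and three white marbles, three draws with replacement, and an observed sequence of blue, white, blue. It lists every possible path through the garden and keeps those that match the data.

```r
# Assumed setup (marble images are not in the transcript):
# conjecture = one blue ("B") and three white ("W") marbles,
# data = blue, white, blue, drawn with replacement.
bag <- c("B", "W", "W", "W")
observed <- c("B", "W", "B")

# Every possible path: one marble index (1..4) per draw, 4^3 = 64 paths.
paths <- expand.grid(draw1 = seq_along(bag),
                     draw2 = seq_along(bag),
                     draw3 = seq_along(bag))

# A path is consistent when the colour at each step matches the data.
consistent <- bag[paths$draw1] == observed[1] &
              bag[paths$draw2] == observed[2] &
              bag[paths$draw3] == observed[3]

n_paths <- nrow(paths)           # all forking paths
n_consistent <- sum(consistent)  # paths that survive the data

print(c(total = n_paths, consistent = n_consistent))
```

Under these assumptions, 3 of the 64 paths survive, which matches the count on the slide.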

Slide 35

Slide 35 text

Garden of Forking Data

Possible contents   Ways to produce
(1)                 ?
(2)                 3
(3)                 ?
(4)                 ?
(5)                 ?

Slide 36

Slide 36 text

Garden of Forking Data

Possible contents   Ways to produce
(1)                 0
(2)                 3
(3)                 ?
(4)                 ?
(5)                 0

Slide 37

Slide 37 text

3 ways 9 ways 8 ways

Slide 38

Slide 38 text

3 ways 9 ways 8 ways

Slide 39

Slide 39 text

3 ways 9 ways 8 ways

Slide 40

Slide 40 text

Garden of Forking Data

[…] and white, there are […] paths that survive. […] we've considered five different conjectures about the contents of the bag, […] marbles to four blue marbles. For each of these conjectures, we've […] sequences, paths through the garden of forking data, could potentially […] data:

Conjecture    Ways to produce
[W W W W]     0 × 4 × 0 = 0
[B W W W]     1 × 3 × 1 = 3
[B B W W]     2 × 2 × 2 = 8
[B B B W]     3 × 1 × 3 = 9
[B B B B]     4 × 0 × 4 = 0

[…] of ways to produce the data, for each conjecture, can be computed […] number of paths in each “ring” of the garden and then by multiplying […]. This is just a computational device. It tells us the same thing as Figure […] having to draw the garden. The fact that numbers are multiplied during […] change the fact that this is still just counting of logically possible paths.
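
The "multiply the paths in each ring" device described on this slide can be sketched in R. This is a minimal illustration, not code from the lecture. It assumes four marbles per bag, conjectures indexed by the number of blue marbles (0 to 4), and an observed sequence of blue, white, blue; `count_ways` is a hypothetical helper name.

```r
# Assumptions (not in the transcript): 4 marbles per bag, data = B, W, B.
n_marbles <- 4
n_blue <- 0:4                      # one conjecture per possible blue count
observed <- c("B", "W", "B")

# Hypothetical helper: ways a conjecture can produce the sequence.
# Each draw contributes (number of marbles of the observed colour);
# the contributions of the successive "rings" are multiplied.
count_ways <- function(blue, data, total) {
  per_draw <- ifelse(data == "B", blue, total - blue)
  prod(per_draw)
}

ways <- sapply(n_blue, count_ways, data = observed, total = n_marbles)
names(ways) <- paste(n_blue, "blue")
print(ways)
```

Under these assumptions the counts come out as 0, 3, 8, 9, 0, matching the "3 ways", "8 ways" and "9 ways" shown on the preceding slides.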

Slide 41

Slide 41 text

Updating

Another draw from the bag:

[…] paths compatible with the data sequence […]. Or you could take the p[…] over conjectures and just update them in light of the new observation. […] out that these two methods are mathematically identical, as long as the new […] is logically independent of the previous observations.

Here's how to do it. First we count the numbers of ways each conjecture could produce […] new observation. Then we multiply each of these new counts by the previous number […] for each conjecture. In table form:

Conjecture    Ways to produce   Previous counts   New count
[W W W W]     0                 0                 0 × 0 = 0
[B W W W]     1                 3                 3 × 1 = 3
[B B W W]     2                 8                 8 × 2 = 16
[B B B W]     3                 9                 9 × 3 = 27
[B B B B]     4                 0                 0 × 4 = 0

The new counts in the right-hand column above summarize all the evidence for each […]. As new data arrive, and provided those data are independent of previous observations, […] the number of logically possible ways for a conjecture to produce all the data up […] can be computed just by multiplying the new count by the old count.

This updating approach amounts to nothing more than asserting that when we […] previous information suggesting there are W_prior ways for a conjecture to produce a p[…]
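
The updating rule on this slide (multiply each new count by the previous count) can be checked against a recount from scratch. The sketch assumes, since the marble image is missing from the transcript, that the earlier data were blue, white, blue and that the new draw is blue.

```r
# Assumptions: previous data = B, W, B; the new draw is blue.
n_blue <- 0:4
previous <- c(0, 3, 8, 9, 0)   # ways to produce B, W, B under each conjecture
new_ways <- n_blue             # ways each conjecture can produce one blue draw

# Updating: multiply the new count by the old count.
updated <- new_ways * previous

# Recount all four draws (B, W, B, B) at once for comparison.
all_at_once <- n_blue * (4 - n_blue) * n_blue * n_blue

print(rbind(updated, all_at_once))
```

Both rows agree (0, 3, 16, 27, 0), which illustrates why the two methods are identical when the new draw is independent of the earlier ones.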

Slide 42

Slide 42 text

Using other information

Factory says: blue marbles rare, but every bag contains at least one.

[…] this example, the prior data and new data are of the same type: marbles drawn from […]. But in general, the prior data and new data can be of different types. Suppose for […] that someone from the marble factory tells you that blue marbles are rare. So for […] bag containing [B B B W], they made 2 bags containing [B B W W] and 3 bags containing [B W W W]. They also ensured that every bag contained at least one blue and one white […]. We can update our counts again:

Conjecture    Prior ways   Factory count   New count
[W W W W]     0            0               0 × 0 = 0
[B W W W]     3            3               3 × 3 = 9
[B B W W]     16           2               16 × 2 = 32
[B B B W]     27           1               27 × 1 = 27
[B B B B]     0            0               0 × 0 = 0

[…] conjecture [B B W W] is most plausible, but barely better than [B B B W]. Is there a […] difference in these counts at which we can safely decide that one of the conjectures […] correct one? You'll spend the next chapter exploring that question.

[…] Original ignorance. Which assumption should we use, when there is no previous in[…]
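
The same multiplication works when the extra information is of a different type than marble draws. The sketch below is illustrative only. The factory counts (3 : 2 : 1 for bags with one, two and three blue marbles, and none with zero or four) are an assumption, because the figures did not survive in the transcript; the prior ways are the counts after four assumed draws (blue, white, blue, blue).

```r
# Assumed inputs (figures are not recoverable from the transcript):
conjecture <- c("WWWW", "BWWW", "BBWW", "BBBW", "BBBB")
prior_ways <- c(0, 3, 16, 27, 0)   # counts after B, W, B, B
factory <- c(0, 3, 2, 1, 0)        # relative number of bags the factory makes

# Update exactly as before: multiply count by count.
new_count <- prior_ways * factory
names(new_count) <- conjecture
print(new_count)

# The conjecture with the largest count is the most plausible one.
best <- names(new_count)[which.max(new_count)]
print(best)
```

With these assumed inputs, the bag with two blue marbles comes out ahead (32 against 27), so the lead is narrow, as the slide text says.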

Slide 43

Slide 43 text

Using other information

Factory says: blue marbles rare.

[…] this example, the prior data and new data are of the same type: marbles drawn […]. But in general, the prior data and new data can be of different types. Suppose […] that someone from the marble factory tells you that blue marbles are rare. […] bag containing [B B B W], they made 2 bags containing [B B W W] and 3 bags containing [B W W W]. They also ensured that every bag contained at least one blue and one […]. We can update our counts again:

Conjecture    Prior ways   Factory count   New count
[W W W W]     0            0               0 × 0 = 0
[B W W W]     3            3               3 × 3 = 9
[B B W W]     16           2               16 × 2 = 32
[B B B W]     27           1               27 × 1 = 27
[B B B B]     0            0               0 × 0 = 0

[…] conjecture [B B W W] is most plausible, but barely better than [B B B W]. Is […] difference in these counts at which we can safely decide that one of the conjectures […] correct one? You'll spend the next chapter exploring that question.

[…] Original ignorance. Which assumption should we use, when there is no pre[…]

Slide 44

Slide 44 text

Counts to plausibility

Unglamorous basis of applied probability: Things that can happen more ways are more plausible.

[…] standardize is to add up all of the products, one for each value p can take:

    ways p can produce D_new × prior plausibility p

[…] then divide each product by the sum of products:

    plausibility of p after D_new = (ways p can produce D_new × prior plausibility p) / (sum of products)

[…] nothing special, really, about standardizing to one. Any value will do. But using […] ends up making the mathematics more convenient.

Consider again the table from before, now updated using our definitions of p and “plausibility”:

Possible composition   p      ways to produce data   plausibility
[W W W W]              0      0                      0
[B W W W]              0.25   3                      0.15
[B B W W]              0.5    8                      0.40
[B B B W]              0.75   9                      0.45
[B B B B]              1      0                      0

[…] can quickly compute these plausibilities in R:

    ways <- c( 3 , 8 , 9 )

Slide 45

Slide 45 text

Counts to plausibility

[…] the plausibilities for all possible conjectures will be one. All you need to do in order to standardize is to add up all of the products, one for each value p can take:

    ways p can produce D_new × prior plausibility p

And then divide each product by the sum of products:

    plausibility of p after D_new = (ways p can produce D_new × prior plausibility p) / (sum of products)

There's nothing special, really, about standardizing to one. Any value will do. But using the number 1 ends up making the mathematics more convenient.

Consider again the table from before, now updated using our definitions of p and “plausibility”:

Possible composition   p      ways to produce data   plausibility
[W W W W]              0      0                      0
[B W W W]              0.25   3                      0.15
[B B W W]              0.5    8                      0.40
[B B B W]              0.75   9                      0.45
[B B B B]              1      0                      0

You can quickly compute these plausibilities in R:

    ways <- c( 3 , 8 , 9 )
    ways/sum(ways)

    [1] 0.15 0.40 0.45

These plausibilities are also probabilities: they are non-negative (zero or positive) real numbers that sum to one. And all of the mathematical things you can do with probabilities you can also do with these values. Specifically, each piece of the calculation has a direct partner in applied probability theory. These partners have stereotyped names, so it's worth […]

Slide 46

Slide 46 text

Counts to plausibility

[…] sum of products

[…] nothing special, really, about standardizing to one. Any value will do. But using […] ends up making the mathematics more convenient.

Consider again the table from before, now updated using our definitions of p and “plausibility”:

Possible composition   p      ways to produce data   plausibility
[W W W W]              0      0                      0
[B W W W]              0.25   3                      0.15
[B B W W]              0.5    8                      0.40
[B B B W]              0.75   9                      0.45
[B B B B]              1      0                      0

[…] can quickly compute these plausibilities in R:

    ways <- c( 3 , 8 , 9 )
    ways/sum(ways)

    [1] 0.15 0.40 0.45

These plausibilities are also probabilities: they are non-negative (zero or positive) […] that sum to one. And all of the mathematical things you can do with probabi[…]

Plausibility is probability: Set of non-negative real numbers that sum to one.

Probability theory is just a set of shortcuts for counting possibilities.
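
The standardizing rule on these slides (ways × prior plausibility, divided by the sum of products) can be written as a small function. This is a sketch, not the lecture's code. `update_plausibility` is a hypothetical helper, and the flat starting prior and the extra blue draw are assumptions made for illustration.

```r
# Hypothetical helper: one standardized update.
# plausibility after = (ways * prior) / sum(ways * prior)
update_plausibility <- function(ways, prior) {
  products <- ways * prior
  products / sum(products)
}

n_blue <- 0:4
p <- n_blue / 4                    # proportion of blue under each conjecture
flat_prior <- rep(1 / 5, 5)        # assumption: no conjecture favoured at first
ways_bwb <- c(0, 3, 8, 9, 0)       # ways to produce blue, white, blue

post1 <- update_plausibility(ways_bwb, flat_prior)
# Assumed extra observation: one more blue draw, with n_blue ways each.
post2 <- update_plausibility(n_blue, post1)

print(round(data.frame(p, post1, post2), 3))
```

With a flat prior, `post1` reproduces the 0.15, 0.40, 0.45 from the slide. `post2` equals the raw counts 3, 16, 27 divided by their sum of 46, so standardizing at every step changes nothing about the relative plausibilities.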

Slide 47

Slide 47 text

Building a model • How to use probability to do typical statistical modeling? 1. Design the model (data story) 2. Condition on the data (update) 3. Evaluate the model (critique)

Slide 48

Slide 48 text

Nine tosses of the globe: W L W W W L W L W