Statistical Rethinking 2023 - Lecture 01

Course description and materials: https://github.com/rmcelreath/stat_rethinking_2023

Richard McElreath

January 02, 2023

Transcript

  1. Statistical Rethinking: A Bayesian Course with Examples in R and Stan

    Texts in Statistical Science. Richard McElreath. Second Edition. Third (draft page):

    Figure 2.1. Illustration of Martin Behaim's globe, showing the small world that Colombo anticipated. Europe lies on the right-hand side. Asia lies on the left. The big island labeled “Cipangu” is Japan.

    […] worlds, using the strengths of qualitative insight to criticize small world quantitative insight. The precision and transparency of the small world is powerful and essential. But it is very difficult to incorporate all of our scientific knowledge and expertise into the small world. So often we realize how bad a model is only when we see how badly it behaves in the broader scientific context.

    In this chapter, you will begin to build Bayesian models. Almost all of the work in this chapter takes place in the small world. We'll start with a simple goal (estimand): What proportion of the Earth's surface is covered by water? This is a descriptive estimand, but it still requires causal assumptions about how the sample arises. Once those assumptions are in place, we'll use Bayesian inference to produce an estimate.

    Rethinking: Fast and frugal in the large world. The natural world is complex, as trying to do science serves to remind us. Yet everything from the humble tick to the industrious squirrel to the idle sloth manages to frequently make adaptive decisions. But it's a good bet that most animals are not Bayesian, if only because being Bayesian is expensive and depends upon having a good model. Instead, animals use various heuristics that are fit to their environments, past or present. These heuristics take adaptive shortcuts and so may outperform a rigorous Bayesian analysis, once costs of information gathering and processing (and overfitting, Chapter …) are taken into account. Once you already know which information to ignore or attend to, being fully Bayesian is a waste. It's neither necessary nor sufficient for making good decisions, as real animals demonstrate. But for human animals, Bayesian analysis provides a general way to discover relevant information and process it logically.

    The garden of forking data. Our goal in this section is to build Bayesian inference up from humble beginnings, so there is no superstition about it. Bayesian inference is really just counting and comparing of possibilities. We make causal assumptions, we enumerate the implications of those assump[…]
  2. Changes and updates

    Fewer examples, more depth
    Detailed workflow, testing
    Interventions, post-stratification
    Foreground measurement, missing
    Sensitivity analysis
    [Background: draft page, “Small Worlds and Large Worlds”, Figure 2.1]
  3. Science Before Statistics

    For statistical models to produce scientific insight, they require additional scientific (causal) models.
    The reasons for a statistical analysis are not found in the data themselves, but rather in the causes of the data.
    The causes of the data cannot be extracted from the data alone. No causes in; no causes out.
  4. What is Causal Inference?

    More than association between variables
    Causal inference is prediction of intervention
    Causal inference is imputation of missing observations
  5. Causal Prediction

    Knowing a cause means being able to predict the consequences of an intervention.
    What if I do this?
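The gap between association and prediction of intervention can be sketched with a small simulation. This is only an illustration under assumed conditions, not an example from the lecture: the variables Z, X and Y, the effect sizes, and the sample size are all hypothetical. Z causes both X and Y, and X has no effect on Y at all.

```r
# Hypothetical generative model (an assumption for illustration):
# Z causes both X and Y; X has NO causal effect on Y.
set.seed(1)
n <- 10000
Z <- rnorm(n)

# Observational world: X and Y share the common cause Z
X_obs <- Z + rnorm(n)
Y_obs <- 2 * Z + rnorm(n)
b_assoc <- unname(coef(lm(Y_obs ~ X_obs))["X_obs"])

# Interventional world: we set X ourselves, so Z no longer influences it.
# Y is generated exactly as before, because X never enters its equation.
X_do <- rnorm(n)
Y_do <- 2 * Z + rnorm(n)
b_do <- unname(coef(lm(Y_do ~ X_do))["X_do"])

print(c(association = b_assoc, intervention = b_do))
```

Under these assumptions the observational slope is clearly positive while the slope under intervention is close to zero, so the association alone would mispredict "What if I do this?".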
  6. Causal Imputation

    Knowing a cause means being able to construct unobserved counterfactual outcomes.
    What if I had done something else?
  7. Causes Are Not Optional

    Even when the goal is descriptive, you need a causal model
    The sample differs from the population; describing the population requires causal thinking about why
  8. DAGs

    Directed Acyclic Graphs
    Heuristic causal models
    Clarify scientific thinking
    Analyze to deduce appropriate statistical models
    “What can we decide, without additional assumptions?”
    Gateway to scientific modeling
    [Figure: example DAG]
  9. DAGs

    [Figure: example DAG]
    Different queries, different models
    Which control variables?
    Absolutely not safe to add everything: bad controls
    How to test/refine the causal model?
    DAGs are intuition pumps: get head out of data, into science
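Why adding every available variable is unsafe can be sketched with one kind of bad control, a collider. The DAG here is an assumption made for illustration and is not the DAG on the slide: X causes Y with a true effect of 0.5, and both X and Y cause C.

```r
# Hypothetical DAG (an assumption for illustration): X -> Y, X -> C <- Y
set.seed(2)
n <- 10000
X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)   # true effect of X on Y is 0.5
C <- X + Y + rnorm(n)     # C is a collider: caused by both X and Y

# Without the bad control, the estimate is close to the true effect
b_good <- unname(coef(lm(Y ~ X))["X"])

# "Controlling" for the collider distorts the estimate of X
b_bad <- unname(coef(lm(Y ~ X + C))["X"])

print(c(without_C = b_good, with_C = b_bad))
```

In this sketch, conditioning on C pushes the coefficient on X from about 0.5 to a negative value, even though nothing about the true effect has changed.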
  10. Statistical Models

    [Image: The Golem of Prague]
    Clay robots
    Powerful
    No wisdom or foresight
    Dangerous
  11. Statistical Models

    [Image: The Golem of Prague]
    Incredibly limiting
    Focus on rejecting null hypotheses
    Relationship between research and test not clear
    Industrial framework
  12. Null Models Rarely Unique

    Null population dynamics?
    Null phylogeny?
    Null ecological community?
    Null social network?
    Problem: Many processes produce similar distributions
  13. Figure 1.2

    Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”
    Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection
    Statistical models: MI; MII; MIII
  14. Figure 1.2

    Hypotheses: H0 “Evolution is neutral”
    Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium
    Statistical models: MI; MII
  15. Figure 1.2

    Hypotheses: H0 “Evolution is neutral”; H1 “Selection matters”
    Process models: P0A Neutral, non-equilibrium; P0B Neutral, equilibrium; P1A Constant selection; P1B Fluctuating selection
    Statistical models: MI; MII; MIII
  16. No null ecology

    [Figure: presence/absence matrices, species by islands (locations). Connor & Simberloff 1979, Diamond & Gilpin 1982]
    Fig. 3. Example of a simple presence/absence matrix with a checkerboard distribution: four islands and two species.
    Excerpt: “[…] islands and are absent from species-rich islands (see Fig. 2 right, which has a supertramp as species e but has the same row sums and grand total as Fig. 2 left). Now compare the rearrangeability of the two matrices of Fig. 2. Recall that the rearrangement algorithm by which Connor and Simberloff generate simulated matrices seeks 2 by 2 submatrices (not necessarily in adjacent rows or columns) of the form (1 0 / 0 1) or (0 1 / 1 0) and changes them to the opposite form. This manipulation alters neither row nor column sums and hence maintains the con[…]”
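The rearrangement step described in the excerpt, flipping a 2 by 2 checkerboard submatrix to its opposite form, can be sketched as follows. The function name `swap_checkerboard` and the small matrix are hypothetical, chosen only to illustrate the idea. This is not the authors' code.

```r
# Flip one 2x2 submatrix if it has a checkerboard form:
# (1 0 / 0 1) becomes (0 1 / 1 0) and vice versa. Otherwise leave m unchanged.
swap_checkerboard <- function(m, rows, cols) {
  sub <- m[rows, cols]
  is_diag <- all(sub == matrix(c(1, 0, 0, 1), 2, 2))
  is_anti <- all(sub == matrix(c(0, 1, 1, 0), 2, 2))
  if (is_diag || is_anti) m[rows, cols] <- 1 - sub
  m
}

# Illustrative presence/absence matrix: 2 species (rows) by 4 islands (columns)
m <- matrix(c(1, 0, 1, 0,
              0, 1, 0, 1), nrow = 2, byrow = TRUE)

m2 <- swap_checkerboard(m, rows = c(1, 2), cols = c(1, 2))

print(m2)
print(rbind(before = rowSums(m), after = rowSums(m2)))  # row sums unchanged
print(rbind(before = colSums(m), after = colSums(m2)))  # column sums unchanged
```

The rows and columns need not be adjacent; any pair of rows and pair of columns whose intersection has a checkerboard form can be flipped without altering the marginal sums.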
  17. Null networks & fantastical beasts

    Network permutation methods: low power, high false positives
    Figure 1: Dependence structure between data points must not change under permutations. In node-[…]
    Hart et al 2021, bioRxiv preprint, posted June 7, 2021. https://doi.org/10.1101/2021.06.04.447124
  18. Hypotheses and Models

    Research requires more than null robots
    Also requires:
    Generative causal models
    Statistical models justified by generative models & questions (estimands)
    An effective way to produce estimates
  19. Justifying “controls”

    [Figure: DAG]
    Y ~ X
    Y ~ X + A
    Y ~ X + A + B
    Y ~ X + C
    Y ~ X + A + C
    Y ~ X + B + C
    [Background text fragment: “[…]sion (Pearl et al. 2016)) would measure th[…]”]
  20. Justifying “controls”

    [Figure: DAG]
    Y ~ X
    Y ~ X + A
    Y ~ X + A + B
    Y ~ X + C
    Y ~ X + A + C
    Y ~ X + B + C
    “Adjustment set”
    [Background text fragment: “[…]sion (Pearl et al. 2016)) would measure th[…]”]
  21. Finite data, infinite problems

    DAG is not enough
    Need generative model to design/debug inference
    Need a strategy to derive estimate and uncertainty
    Easiest approach: Bayesian data analysis
    [Background, cropped draft text: “Notice that even though the three white sides look the same from […] just record the color of the sides, after all […] they are really different e[…] because it means that there are three more ways to see […] than to see […]. Now consider the garden as we get another toss of the globe. It […] one layer. Now there are […] possible paths through the garden, one for each […] second draw from the bag, each of the paths above again forks into fo[…] Because we believe that each toss of the globe gives each side a fair […] regardless of which sides was sampled previously. The third layer i[…]”]
  22. Bayes is practical, not philosophical

    Simple analyses: little difference, adds mess
    Realistic analyses: huge difference
    Measurement error, missing data, latent variables, regularization
    Bayesian models are generative
  23. Statistics wars are over

    Bayes no longer controversial or marginalized
    Bayesian methods routine
    Waiting for teaching to catch up
    The action is in machine learning, which has different battles
  24. R code

    # function to toss a globe covered p by water N times
    sim_globe <- function( p=0.7 , N=9 ) {
      sample(c("W","L"),size=N,prob=c(p,1-p),replace=TRUE)
    }

    Nothing happens until we call the function by its name:

    sim_globe()
    [1] "L" "W" "W" "W" "L" "L" "L" "W" "L"

    Repeat calling the function to see that it simulates a different sample each time. And by naming the proportion of water p and number of tosses N in the function definition, we can easily change these values when we call the function.

    Quality assurance

    # function to compute posterior distribution
    compute_posterior <- function( the_sample , poss=c(0,0.25,0.5,0.75,1) ) {
      W <- sum(the_sample=="W") # number of W observed
      L <- sum(the_sample=="L") # number of L observed
      # number of ways each proportion q could produce the sample (q*4 water sides out of 4)
      ways <- sapply( poss , function(q) (q*4)^W * ((1-q)*4)^L )
      post <- ways/sum(ways)
      # make_bar() is a helper that is not shown on this slide
      bars <- sapply( post, function(q) make_bar(q) )
      data.frame( poss , ways , post=round(post,3) , bars )
    }

    To use this function, you need to give it a sample. And we can just embed the previous simulation function inside it:

    compute_posterior( sim_globe() )
      poss ways  post bars
    1 0.00    0 0.000
    2 0.25  243 0.291 ######
    3 0.50  512 0.612 ############
    4 0.75   81 0.097 ##
    5 1.00    0 0.000

    Repeat this function call a few times to show that as the sample varies, so too does the posterior distribution.

    [Figure: posterior probability (0.00 to 0.30) against proportion water (0 to 1), 11 possibilities]
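A self-contained sketch of the same computation with 11 possibilities, as in the figure, might look like this. Two things here are assumptions rather than the slide's code: `make_bar` is a hypothetical stand-in for the helper the slide does not show, and the counts are computed as q^W * (1-q)^L, which differs from the slide's (q*4)^W * ((1-q)*4)^L only by a constant factor that cancels when normalizing.

```r
# Simulate N tosses of a globe covered p by water (as on the slide)
sim_globe <- function(p = 0.7, N = 9) {
  sample(c("W", "L"), size = N, prob = c(p, 1 - p), replace = TRUE)
}

# Hypothetical helper: a text bar whose length is proportional to q
make_bar <- function(q, width = 20) strrep("#", round(q * width))

# Posterior over a grid of candidate proportions (11 by default)
compute_posterior <- function(the_sample, poss = seq(0, 1, length.out = 11)) {
  W <- sum(the_sample == "W")        # number of W observed
  L <- sum(the_sample == "L")        # number of L observed
  ways <- poss^W * (1 - poss)^L      # relative number of ways for each candidate
  post <- ways / sum(ways)           # normalize to probabilities
  data.frame(poss, post = round(post, 3), bars = sapply(post, make_bar))
}

set.seed(11)
res <- compute_posterior(sim_globe())
print(res)

# The five-possibility case from the slide, using the sample shown there
slide_sample <- c("L", "W", "W", "W", "L", "L", "L", "W", "L")
slide_res <- compute_posterior(slide_sample, poss = c(0, 0.25, 0.5, 0.75, 1))
print(slide_res)
```

Passing the sample printed on the slide reproduces the slide's posterior column, which is one way to check that dropping the factor of 4 changes nothing after normalization.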
  25. Drawing the Bayesian Owl

    Scientific data analyses: amateur software engineering
    Three modes:
    Understand what you are doing
    Document your work, reduce error
    Respectable scientific workflow
  26. Drawing the Bayesian Owl

    1. Theoretical estimand
    2. Scientific (causal) model(s)
    3. Use 1 & 2 to build statistical model(s)
    4. Simulate from 2 to validate 3 yields 1
    5. Analyze real data
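Step 4 can be sketched with the globe example: simulate data from the generative model with a known proportion of water, run the estimator on the synthetic data, and check that it recovers the known value. The sample size, grid, and tolerance below are assumptions chosen for illustration. `dbinom` is used for the counting because it differs from the number of ways only by a constant that cancels on normalization.

```r
set.seed(2023)

# Step 2 stand-in: generative model with a KNOWN proportion of water
true_p <- 0.7
the_sample <- sample(c("W", "L"), size = 1000,
                     prob = c(true_p, 1 - true_p), replace = TRUE)

# Step 3 stand-in: grid posterior over candidate proportions
W <- sum(the_sample == "W")
L <- sum(the_sample == "L")
poss <- seq(0, 1, length.out = 101)
post <- dbinom(W, size = W + L, prob = poss)
post <- post / sum(post)

# Step 4: does the estimate land near the value we simulated from?
est <- poss[which.max(post)]
print(c(true_p = true_p, estimate = est))
```

If the estimate were far from `true_p` with this much synthetic data, that would point to a bug in the estimator or a mismatch with the generative model, and it would be found before any real data are analyzed.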
  31. DAGs, Golems & Owls

    DAGs: Transparent scientific assumptions to
      justify scientific effort
      expose it to useful critique
      connect theories to golems
    Golems: Brainless, powerful statistical models
    Owls: Documented procedures, quality assurance
  32. Course Schedule

    Week 1: Bayesian inference (Chapters 1, 2, 3)
    Week 2: Linear models & Causal Inference (Chapter 4)
    Week 3: Causes, Confounds & Colliders (Chapters 5 & 6)
    Week 4: Overfitting / Interactions (Chapters 7 & 8)
    Week 5: MCMC & Generalized Linear Models (Chapters 9, 10, 11)
    Week 6: Integers & Other Monsters (Chapters 11 & 12)
    Week 7: Multilevel models I (Chapter 13)
    Week 8: Multilevel models II (Chapter 14)
    Week 9: Measurement & Missingness (Chapter 15)
    Week 10: Generalized Linear Madness (Chapter 16)
    https://github.com/rmcelreath/stat_rethinking_2023