L05 Statistical Rethinking Winter 2019

Lecture 05 of the Dec 2018 through March 2019 edition of Statistical Rethinking. This lecture covers Chapter 5 of the text.

Richard McElreath

January 07, 2019

Transcript

  1. The Many Variables & The Spurious Waffles. Statistical Rethinking Winter 2019, Lecture 04 / Week 3
  2. None
  3. None
  4. “If you get there and the Waffle House is closed? That's really bad. That's when you go to work.” Craig Fugate, director (2009–2017), USA Federal Emergency Management Agency (FEMA)
  5. Does Waffle House cause divorce?
    [Figure: Divorce rate against Waffle Houses per million, by State. Labeled States: AL, AR, GA, ME, NJ, OK, SC.]
  6. http://www.tylervigen.com/spurious-correlations Correlation is commonplace

  7. Goals this week • Multiple regression models • The good: • Reveal spurious correlation • Uncover masked association • The bad: • Cause spurious correlation • Hide real associations • Learn basics of causal inference • Directed acyclic graphs • forks, pipes, colliders, oh my! • Backdoor criterion
    [Figure: Divorce rate against Waffle Houses per million, by State.]
  8. Spurious association • Correlation does not imply causation • Causation does not imply correlation • Causation implies conditional correlation • Need more than just models • Q: Does marriage cause divorce?
    [Figure 5.2, left panel: Divorce rate against Marriage rate.]
  9. Spurious association
    [Figure 5.2: Divorce rate is associated with both marriage rate (left) and median age at marriage (right). Both predictor variables are standardized in this example.]
  10. Multiple causes of divorce • Want to know: what is the value of a predictor, once we know the other predictors? • What is the value of knowing marriage rate, once we already know median age at marriage? • What is the value of knowing median age at marriage, once we already know marriage rate?
  11. They’re good DAGs, Brent • Directed Acyclic Graphs: tools for causal models • Directed: Arrows • Acyclic: Arrows don’t make loops • Graphs: Nodes and edges • Unlike a statistical model, a DAG has causal implications: if it is correct, it tells you the consequences of intervening to change a variable
    [Book excerpt: the pattern in Figure 5.2 is symptomatic of a situation in which only one of the predictor variables, A in this case, has a causal impact on the outcome, even though both predictors are strongly associated with it. The excerpt ends with the DAG for the divorce rate example, with nodes A, M, and D.]
  12. DAG for the divorce rate example • Nodes: A (median age of marriage), M (marriage rate), D (divorce rate) • Arrows show directions of influence: A –> M, A –> D, M –> D • Like other models, the DAG is an analytical assumption • Implications: (1) M is a function of A (2) D is a function of A and M (3) The total causal effect of A has two paths: (a) A –> M –> D (b) A –> D
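The three implications above can be made concrete with a small simulation. This is a minimal sketch, not code from the lecture: the effect sizes (-0.7, -0.5, 0.3) are made up, and least-squares `lm` fits stand in for the `quap` models used in the course. It splits the total effect of A into the direct path A -> D and the indirect path A -> M -> D.

```r
# Simulate the DAG A -> M, A -> D, M -> D with made-up linear effects.
set.seed(12)
n <- 500
A <- rnorm(n)                      # median age at marriage (std)
M <- rnorm(n, -0.7 * A)            # (1) M is a function of A
D <- rnorm(n, -0.5 * A + 0.3 * M)  # (2) D is a function of A and M

# Least-squares fits stand in for the quap models used in the lecture.
b_AM     <- unname(coef(lm(M ~ A))["A"])   # path A -> M
full     <- coef(lm(D ~ A + M))
direct   <- unname(full["A"])              # path (b): A -> D
indirect <- b_AM * unname(full["M"])       # path (a): A -> M -> D
total    <- unname(coef(lm(D ~ A))["A"])   # regression of D on A alone

print(round(c(direct = direct, indirect = indirect, total = total), 3))
```

For linear least squares the decomposition is exact in-sample: the slope from regressing D on A alone equals the direct plus the indirect estimate.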
  13. Good DAGs • Given association M <–> D, cannot tell difference between: (a) the DAG with A –> M, A –> D, and M –> D, and (b) the DAG with only A –> M and A –> D, where the association between M and D arises entirely from A's influence on both • Both DAGs are consistent with the inferences from models m5.1 and m5.2 • Need conditional association: M <–> D | A
    [Book excerpt: Is there a direct effect of marriage rate, or is age at marriage just driving a spurious correlation between marriage rate and divorce rate? To find out, we need a model that "controls" for A while assessing the association between M and D. That is what multiple regression helps with.]
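A simulated fork shows why the conditional association is the quantity to check. The sketch below assumes the second DAG (no direct M -> D path) with made-up effect sizes, and again uses `lm` as a stand-in for the lecture's `quap` models.

```r
# Simulate the fork M <- A -> D with no direct M -> D effect (illustrative values).
set.seed(5)
n <- 1000
A <- rnorm(n)             # median age at marriage (std)
M <- rnorm(n, -0.7 * A)   # marriage rate depends only on A
D <- rnorm(n, -0.6 * A)   # divorce rate depends only on A

# Marginal association: M predicts D when A is ignored.
b_marginal <- unname(coef(lm(D ~ M))["M"])

# Conditional association: the M slope shrinks toward zero once A is included.
b_conditional <- unname(coef(lm(D ~ M + A))["M"])

print(round(c(marginal = b_marginal, conditional = b_conditional), 2))
```

The marginal slope is clearly positive while the conditional slope sits near zero, which is the M <-> D | A pattern expected when A drives both variables.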
  14. Multiple regression model for divorce rate • Annotations: D = divorce rate, M = marriage rate, A = median age marriage, $\beta_M$ = “slope” for marriage rate, $\beta_A$ = “slope” for median age marriage • For each predictor, make a parameter that will measure its association with the outcome, multiply the parameter by the variable, and add that term to the linear model
    $D_i \sim \mathrm{Normal}(\mu_i, \sigma)$ [probability of data]
    $\mu_i = \alpha + \beta_M M_i + \beta_A A_i$ [linear model]
    $\alpha \sim \mathrm{Normal}(0, 0.2)$ [prior for $\alpha$]
    $\beta_M \sim \mathrm{Normal}(0, 0.5)$ [prior for $\beta_M$]
    $\beta_A \sim \mathrm{Normal}(0, 0.5)$ [prior for $\beta_A$]
    $\sigma \sim \mathrm{Exponential}(1)$ [prior for $\sigma$]
  15. Priors • Standardize divorce rate D, marriage rate M, median age at marriage A • We expect alpha to be near zero • Slopes should not produce impossibly strong relationships
    $D_i \sim \mathrm{Normal}(\mu_i, \sigma)$ [probability of data]
    $\mu_i = \alpha + \beta_M M_i + \beta_A A_i$ [linear model]
    $\alpha \sim \mathrm{Normal}(0, 0.2)$ [prior for $\alpha$]
    $\beta_M \sim \mathrm{Normal}(0, 0.5)$ [prior for $\beta_M$]
    $\beta_A \sim \mathrm{Normal}(0, 0.5)$ [prior for $\beta_A$]
    $\sigma \sim \mathrm{Exponential}(1)$ [prior for $\sigma$]
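Standardizing means centering each variable and dividing by its standard deviation. A minimal sketch in base R follows; `z_score` is a hypothetical helper name chosen here, and the raw values are made up for illustration rather than taken from the divorce data.

```r
# Hypothetical helper: center a variable and scale it to unit standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

# Illustrative raw values (made up, not the divorce data).
marriage_rate <- c(20.2, 26.0, 20.3, 26.4, 19.1, 23.5)
M <- z_score(marriage_rate)

# After standardizing, the mean is 0 and the standard deviation is 1.
print(round(c(mean = mean(M), sd = sd(M)), 3))
```

With outcome and predictors all on this scale, an intercept near zero and slopes well inside -1 to 1 are natural prior expectations, which is what the Normal(0, 0.2) and Normal(0, 0.5) priors encode.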
  16. Prior predictive simulation • A slope of 1 would imply that a change of one standard deviation in the predictor goes with a full standard deviation change in the outcome variable. That seems like an insanely strong relationship. The prior above thinks that only 5% of plausible slopes are more extreme than 1. • To compute the approximate posterior, there are no new code tricks or techniques here.
    R code:
        m5.1 <- quap(
            alist(
                D ~ dnorm( mu , sigma ) ,
                mu <- a + bA * A ,
                a ~ dnorm( 0 , 0.2 ) ,
                bA ~ dnorm( 0 , 0.5 ) ,
                sigma ~ dexp( 1 )
            ) , data = d )
    To simulate from the priors, we can use extract.prior and link as in the previous chapter. Plot the lines over the range of 2 standard deviations for both the outcome and predictor. That covers most of the possible range of both variables.
    R code:
        set.seed(10)
        prior <- extract.prior( m5.1 )
        mu <- link( m5.1 , post=prior , data=list( A=c(-2,2) ) )
        plot( NULL , xlim=c(-2,2) , ylim=c(-2,2) )
        for ( i in 1:50 ) lines( c(-2,2) , mu[i,] , col=col.alpha("black",0.4) )
    Figure 5.3 displays the result. You may wish to try some vaguer, flatter priors and see how quickly the prior regression lines become ridiculous.
  17. Prior predictive simulation
    [Figure 5.3: plausible regression lines implied by the priors in m5.1. Axes: Median age marriage (std) against Divorce rate (std), each from -2 to 2.]
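The same prior predictive check can be sketched without the rethinking package by drawing directly from the priors. This is an illustrative base R version under that assumption, not the lecture's code; the plotting call is left as a comment so the block runs in any session.

```r
# Draw intercepts and slopes from the priors used for m5.1.
set.seed(10)
n_lines <- 50
a  <- rnorm(n_lines, 0, 0.2)   # prior draws for the intercept
bA <- rnorm(n_lines, 0, 0.5)   # prior draws for the slope

# Expected outcome at the two ends of the predictor range.
A_seq <- c(-2, 2)
# mu has one row per prior draw and one column per value in A_seq.
mu <- cbind(a + bA * A_seq[1], a + bA * A_seq[2])

# To draw the lines: matplot(A_seq, t(mu), type = "l", ylim = c(-2, 2))

# Share of prior slopes more extreme than 1 under Normal(0, 0.5).
p_extreme <- 2 * pnorm(-1, mean = 0, sd = 0.5)
print(round(p_extreme, 3))
```

Widening the slope prior, for example to a standard deviation of 10, makes almost every simulated line leave the plausible -2 to 2 outcome range.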
  18. The model that predicts divorce rate using both marriage rate and age at marriage
    $D_i \sim \mathrm{Normal}(\mu_i, \sigma)$ [probability of data]
    $\mu_i = \alpha + \beta_M M_i + \beta_A A_i$ [linear model]
    $\alpha \sim \mathrm{Normal}(0, 0.2)$ [prior for $\alpha$]
    $\beta_M \sim \mathrm{Normal}(0, 0.5)$ [prior for $\beta_M$]
    $\beta_A \sim \mathrm{Normal}(0, 0.5)$ [prior for $\beta_A$]
    $\sigma \sim \mathrm{Exponential}(1)$ [prior for $\sigma$]
    What does it mean to assume $\mu_i = \alpha + \beta_M M_i + \beta_A A_i$? It means that the expected outcome for any State with marriage rate $M_i$ and median age at marriage $A_i$ is the sum of three terms.
  19. The quap code to approximate the posterior distribution
    R code:
        m5.3 <- quap(
            alist(
                D ~ dnorm( mu , sigma ) ,
                mu <- a + bM*M + bA*A ,
                a ~ dnorm( 0 , 0.2 ) ,
                bM ~ dnorm( 0 , 0.5 ) ,
                bA ~ dnorm( 0 , 0.5 ) ,
                sigma ~ dexp( 1 )
            ) , data = d )
        precis( m5.3 )
    Output:
               mean   sd   5.5%  94.5%
        a      0.00  0.10  -0.16   0.16
        bM    -0.07  0.15  -0.31   0.18
        bA    -0.61  0.15  -0.85  -0.37
        sigma  0.79  0.08   0.66   0.91
    The posterior mean for marriage rate, bM, is now close to zero, with plenty of probability on both sides of zero. The posterior mean for age at marriage, bA, is essentially unchanged. It will help to visualize the posterior distributions for all three models, focusing just on the slope parameters $\beta_A$ and $\beta_M$:
        plot( coeftab(m5.1,m5.2,m5.3), par=c("bA","bM") )
  20. Comparing the slopes across models: plot( coeftab(m5.1,m5.2,m5.3), par=c("bA","bM") )
    [Figure: estimates of bA and bM for m5.1, m5.2, and m5.3 on a scale from about -0.5 to 0.5. Posterior means are shown by the points and the 89% compatibility intervals by the solid horizontal lines.]
    • m5.1: age of marriage only, D ~ A • m5.2: marriage rate only, D ~ M • m5.3: multiple regression, D ~ A + M
    Notice how bA doesn't move, only grows a bit more uncertain, while bM is only associated with divorce when age at marriage is missing from the model. You can interpret these distributions as saying: once we know median age at marriage for a State, there is little or no additional predictive power in also knowing the rate of marriage in that State. This does not mean that there is no value in knowing marriage rate.
  21. Multiple regression • Once we know median age marriage, little additional value in knowing marriage rate. • Once we know marriage rate, still value in knowing median age marriage. • If we don’t know median age marriage, still useful to know marriage rate.
    [Book excerpts repeated from earlier slides: the precis( m5.3 ) output and the two candidate DAGs for A, M, and D.]
  22. Posterior predictions • Lots of plotting options now: 1. Predictor residual plots 2. Counterfactual plots 3. Posterior prediction plots
    [Figure 5.2 repeated: Divorce against Marriage.s (left) and Divorce against MedianAgeMarriage.s (right).]
  23. Predictor residual plots • Goal: Show association of each predictor with outcome, “controlling” for other predictors • Useful intuition • Never analyze residuals! • Recipe: 1. Regress predictor on other predictors 2. Compute predictor residuals 3. Regress outcome on residuals
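The three-step recipe can be sketched on simulated data. The effect sizes below are made up, and `lm` stands in for the lecture's `quap` models; the point is only that the slope on the predictor residuals matches the multiple regression slope.

```r
# Simulated standardized data (illustrative effect sizes).
set.seed(23)
n <- 200
A <- rnorm(n)
M <- rnorm(n, -0.7 * A)
D <- rnorm(n, -0.6 * A + 0.1 * M)

# 1. Regress the predictor of interest (M) on the other predictor (A).
fit_MA <- lm(M ~ A)
# 2. Compute the predictor residuals: the part of M that A does not explain.
M_resid <- resid(fit_MA)
# 3. Regress the outcome on the residuals.
b_resid <- unname(coef(lm(D ~ M_resid))["M_resid"])

# For comparison: the slope for M in the multiple regression.
b_multi <- unname(coef(lm(D ~ M + A))["M"])

print(c(residual = b_resid, multiple = b_multi))
```

For least squares the two slopes are identical, which is why the residual plot is a useful picture of what "controlling" for A means. The warning against analyzing residuals still applies: the multiple regression does this in one step and carries the uncertainty correctly.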
  24. [Figure 5.4: predictor residual plots. Top row: Marriage rate (std) against Age at marriage (std), and Age at marriage (std) against Marriage rate (std). Bottom row: Divorce rate (std) against Marriage rate residuals, and Divorce rate (std) against Age at marriage residuals. Labeled States: DC, HI, ME, ND, WY in the marriage rate panels; DC, HI, ID in the age at marriage panels.]
  25. [Figure 5.4, build step: Marriage rate (std) against Age at marriage (std), beside residual marriage rate against Age at marriage (std).]
  26. [Figure 5.4, repeated.]
  27. [Figure 5.4, repeated.]
  28. [Figure 5.4, repeated.]
  29. Statistical “control” • Multiple linear regression answers question: How is each predictor associated with outcome, once we know all the other predictors? • Uses model to build expected outcomes, not magic! • Don’t get cocky: Marriage rate may still be associated with divorce, for some subset of States • Can’t make strong causal inferences from averages; need data on individuals
    [Figures: residual marriage rate in each State after accounting for the linear association with median age at marriage; predictor residual plots of Divorce rate against Marriage rate residuals (faster/slower) and against Age of marriage residuals (older/younger).]
  30. Counterfactual plots • Goal: Explore model implications for outcomes • Fix other predictor(s) • Compute predictions across values of predictor • Compute for unobserved (impossible?) cases, hence “counterfactual”
    [Figure 5.5: Divorce rate (standardized) against Marriage rate (standardized) with Median age marriage (std) = 0, and Divorce rate (standardized) against Median age marriage (standardized) with Marriage rate (std) = 0.]
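A counterfactual prediction table can be built by holding one predictor at a fixed value and sweeping the other. The sketch below uses simulated data and `lm` with `predict` as a stand-in for `quap` with `link`; the 89% interval here is a least-squares confidence interval, not a posterior compatibility interval.

```r
# Simulated standardized data (illustrative effect sizes).
set.seed(30)
n <- 200
A <- rnorm(n)
M <- rnorm(n, -0.7 * A)
D <- rnorm(n, -0.6 * A)
fit <- lm(D ~ M + A)

# Counterfactual cases: vary marriage rate while holding age at marriage at 0.
M_seq <- seq(-2, 3, length.out = 30)
cf <- data.frame(M = M_seq, A = 0)

# Expected divorce rate with an 89% interval for each counterfactual case.
pred <- predict(fit, newdata = cf, interval = "confidence", level = 0.89)
print(head(cbind(cf, pred), 3))
```

Some of these combinations (very high marriage rate with average age at marriage) may never occur in real States, which is the sense in which the cases are counterfactual.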
  31. Posterior prediction checks • Goal: Compute implied predictions for observed cases • Check model fit: golems do make mistakes • Find model failures, stimulate new ideas • Always average over the posterior distribution • Using only posterior mean leads to overconfidence • Embrace the uncertainty
  32. Predicted compared to observed
    [Figure 5.6: Predicted divorce against Observed divorce for the multivariate divorce model, with a line of equality. Labeled States: ID, ME, RI, UT.]
  33. Masked association • Sometimes association between outcome and predictor masked by another variable • Need both variables to see influence of either • Tends to arise when • Another predictor associated with outcome in opposite direction • Both predictors associated with one another • Noise in predictors can also mask association (residual confounding)
  34. Milk and Brain • Eulemur fulvus: 0.49 kcal/g, 55% neocortex • Homo sapiens: 0.71 kcal/g, 75% neocortex • Cebus apella: 0.89 kcal/g, 68% neocortex
  35. Masked influence • Primate milk data
    R code:
        library(rethinking)
        data(milk)
        d <- milk
        pairs( ~kcal.per.g + log(mass) + neocortex.perc , data=d )
    [Figure: pairs plot of kcal.per.g, log(mass), and neocortex.perc.]
  36. Necessary sermon on priors
    [Figure 5.7: prior predictive distributions for the first primate milk model, m5.5, plotting kilocal per g (std) against neocortex percent (std). Left: a ~ dnorm(0, 1), bN ~ dnorm(0, 1). Right: a ~ dnorm(0, 0.2), bN ~ dnorm(0, 0.5). Each plot shows a range of 2 standard deviations for each variable.]
  37. Single predictor models
    [Figure 5.8, top row: kilocal per g (std) against neocortex percent (std), and kilocal per g (std) against log body mass (std).]
  38. Multiple regression model • When two predictors are associated with one another, this masks each variable's association with the outcome unless both are considered simultaneously. The multivariate model, in math form:
    $K_i \sim \mathrm{Normal}(\mu_i, \sigma)$
    $\mu_i = \alpha + \beta_N N_i + \beta_M M_i$
    $\alpha \sim \mathrm{Normal}(0, 0.2)$
    $\beta_N \sim \mathrm{Normal}(0, 0.5)$
    $\beta_M \sim \mathrm{Normal}(0, 0.5)$
    $\sigma \sim \mathrm{Exponential}(1)$
    Approximating the posterior requires no new tricks.
    R code:
        m5.7 <- quap(
            alist(
                K ~ dnorm( mu , sigma ) ,
                mu <- a + bN*N + bM*M ,
                a ~ dnorm( 0 , 0.2 ) ,
                bN ~ dnorm( 0 , 0.5 ) ,
                bM ~ dnorm( 0 , 0.5 ) ,
                sigma ~ dexp( 1 )
            ) , data=dcc )
        precis(m5.7)
    Output:
               mean   sd   5.5%  94.5%
        a      0.07  0.13  -0.15   0.28
        bN     0.68  0.25   0.28   1.07
        bM    -0.70  0.22  -1.06  -0.35
        sigma  0.74  0.13   0.53   0.95
    By incorporating both predictor variables in the regression, the posterior association of each with the outcome has increased.
  39. Bivariate compared to multiple regression
    [Figure 5.8. Top row (Bivariate): kilocal per g (std) against neocortex percent (std), and against log body mass (std). Bottom row (Multiple): the same axes as counterfactual plots, holding M = 0 and holding N = 0.]
  40. Synthetic masked association • Body size and neocortex are correlated, which masks their relationships with the outcome unless we statistically account for both • At least three graphs are consistent with the pattern: (1) body mass (M) influences neocortex percent (N), and both influence kilocalories in milk (K); (2) neocortex influences body mass instead; (3) an unobserved variable U influences both M and N • In the diagram, one path to K is marked + and the other – • Which graph is right? We don't know.
    R code:
        # M -> K <- N
        # N -> M
        n <- 100
        N <- rnorm( n )
        M <- rnorm( n , N )
        K <- rnorm( n , N - M )
        d_sim2 <- data.frame(K=K,N=N,M=M)

        # M -> K <- N
        # M <- U -> N
        n <- 100
        U <- rnorm( n )
        N <- rnorm( n , U )
        M <- rnorm( n , U )
        K <- rnorm( n , N - M )
        d_sim3 <- data.frame(K=K,N=N,M=M)
    In the primate milk example, it may be that the positive association between large body size and neocortex percent arises from a tradeoff between lifespan and learning. Large animals tend to live a long time, and in such animals an investment in learning may be a better investment, because learning can be amortized over a longer lifespan. Both large body size and large neocortex then influence milk composition, but in different directions, for different reasons. This story implies that the DAG with an arrow from M to N, the first one, is the right one. But with the evidence at hand, we cannot easily see which is right.
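Fitting regressions to synthetic data of this kind shows the masking directly. This sketch assumes the first graph (M -> N, with both influencing K), uses a larger sample than the slide so the pattern is stable, and uses `lm` in place of `quap`.

```r
# M -> K <- N, with M -> N (the first of the three candidate graphs).
set.seed(40)
n <- 1000
M <- rnorm(n)
N <- rnorm(n, M)        # neocortex increases with body mass
K <- rnorm(n, N - M)    # N raises K, M lowers K

# Single-predictor fits: each association is weakened or hidden.
b_N_alone <- unname(coef(lm(K ~ N))["N"])
b_M_alone <- unname(coef(lm(K ~ M))["M"])

# Both predictors together: both associations appear at full strength.
b_both <- coef(lm(K ~ N + M))[c("N", "M")]

print(round(c(N_alone = b_N_alone, M_alone = b_M_alone, b_both), 2))
```

Because N and M are positively correlated but push K in opposite directions, each single-predictor slope is pulled toward zero, and only the joint model recovers slopes near +1 and -1.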
  41. Categorical variables • Many predictors are discrete, unordered categories • Gender, region, species • How to use in regression? • Two approaches • Use dummy/indicator variables • Use index variables • Most automated software uses dummy variables • Usually easier to think & code with index variables
  42. Dummy (indicator) variables • Variables that use 1 to indicate a category and 0 to indicate some other category
    [Figure: density of height (cm), from about 130 to 170.]
             height    weight  age  male
         1  151.765  47.82561   63     1
         2  139.700  36.48581   63     0
         3  136.525  31.86484   65     0
         4  156.845  53.04191   41     1
         5  145.415  41.27687   51     0
         6  163.830  62.99259   35     1
         7  149.225  38.24348   32     0
         8  168.910  55.47997   27     1
         9  147.955  34.86988   19     0
        10  165.100  54.48774   54     1
        11  154.305  49.89512   47     0
        12  151.130  41.22017   66     1
  43. Dummy variables • Dummy variables allow each category to have unique intercept • Coefficient is the difference from baseline category
    $h_i \sim \mathrm{Normal}(\mu_i, \sigma)$
    $\mu_i = \alpha + \beta_m m_i$
    Annotations: $m_i$ is the 0/1 variable male; $\alpha$ is the mean when $m_i = 0$; $\beta_m$ is the change in mean when $m_i = 1$.
    It doesn't matter which category, “male” or “female”, is indicated by the 1. The model won't care. But correctly interpreting the model demands that you remember, so it's a good idea to name the variable after the category assigned the 1 value. The effect of a dummy variable is to turn a parameter on for the cases in that category and off for the cases in the other category. When $m_i = 0$, $\beta_m$ has no effect on prediction.
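The "difference from baseline" reading of the coefficient can be checked on simulated heights. The group means and spread below are made up for illustration, and `lm` stands in for a `quap` fit.

```r
# Simulated heights: 50 females (male = 0) and 50 males (male = 1).
set.seed(43)
male <- rep(c(0, 1), each = 50)
height <- rnorm(100, mean = 135 + 7 * male, sd = 5)

fit <- lm(height ~ male)

mean_f <- mean(height[male == 0])   # mean of the baseline category
mean_m <- mean(height[male == 1])

# Intercept = baseline mean; slope = change in mean when male = 1.
print(round(coef(fit), 2))
print(round(c(mean_f = mean_f, difference = mean_m - mean_f), 2))
```

The intercept reproduces the female mean and the `male` coefficient reproduces the difference in means, so the male mean is only available as the sum of two parameters.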
  44. Problems with dummy variables • For k categories, need k - 1 dummy variables • Makes one of the categories a priori more uncertain than others
           season  spring  summer  fall
        1  winter       0       0     0
        2  spring       1       0     0
        3  summer       0       1     0
        4  fall         0       0     1
    The height model using only sex, with an indicator variable:
    $h_i \sim \mathrm{Normal}(\mu_i, \sigma)$
    $\mu_i = \alpha + \beta_m m_i$
    $\alpha \sim \mathrm{Normal}(178, 20)$
    $\beta_m \sim \mathrm{Normal}(0, 10)$
    $\sigma \sim \mathrm{Uniform}(0, 50)$
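Both problems can be illustrated in a few lines of base R. The first part builds the k - 1 indicator columns for the four seasons with `model.matrix`; the second simulates the priors implied for each sex, assuming the Normal(178, 20) prior on the intercept and a Normal(0, 10) prior on the male coefficient.

```r
# k = 4 categories produce k - 1 = 3 indicator columns plus an intercept.
season <- factor(c("winter", "spring", "summer", "fall"),
                 levels = c("winter", "spring", "summer", "fall"))
X <- model.matrix(~ season)
print(X)

# Prior for the expected height of each sex under the indicator approach.
set.seed(44)
mu_female <- rnorm(1e4, 178, 20)                      # alpha only
mu_male   <- rnorm(1e4, 178, 20) + rnorm(1e4, 0, 10)  # alpha + beta_m

print(round(c(sd_female = sd(mu_female), sd_male = sd(mu_male)), 1))
```

The male prior is wider because it sums two uncertain parameters, even though nothing about the problem says we know less about male height before seeing data.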
  45. Index variable • With the indicator approach we end up a priori more unsure about male height than female height. Another approach, using the same information, is an index variable. An index variable contains integers that correspond to different categories. The integers are just names, but they also let us reference a list of corresponding parameters, one for each category.
    R code:
        d$sex <- ifelse( d$male==1 , 2 , 1 )
        str( d$sex )
    Output:
        num [1:544] 2 1 1 2 1 2 1 2 1 2 ...
    Now “1” means female and “2” means male. No order is implied. These are just labels. The mathematical version of the model becomes:
    $h_i \sim \mathrm{Normal}(\mu_i, \sigma)$
    $\mu_i = \alpha_{\mathrm{sex}[i]}$
    $\alpha_j \sim \mathrm{Normal}(178, 20)$, for $j = 1..2$
    $\sigma \sim \mathrm{Uniform}(0, 50)$
    This creates a list of $\alpha$ parameters, one for each unique value in the index variable. In this case we end up with two, named $\alpha_1$ and $\alpha_2$. The numbers correspond to the values in the index variable sex. This seems overly complicated, but it solves the problem with the priors: the same prior can be assigned to each category, corresponding to the notion that all the categories are the same prior to the data.
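A small base R sketch of the index approach, using the twelve rows of height and male shown earlier in the deck. Here `lm` with the intercept removed is a least-squares stand-in for the `a[sex]` parameter vector in `quap`, so it ignores the priors.

```r
# The twelve rows shown on the dummy variable slide.
height <- c(151.765, 139.700, 136.525, 156.845, 145.415, 163.830,
            149.225, 168.910, 147.955, 165.100, 154.305, 151.130)
male   <- c(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)

# Index variable: 1 = female, 2 = male. The integers are labels only.
sex <- ifelse(male == 1, 2, 1)

# One parameter per category: dropping the intercept gives each level its own mean.
fit <- lm(height ~ 0 + factor(sex))
group_means <- tapply(height, sex, mean)

print(round(coef(fit), 2))
print(round(group_means, 2))
```

Each coefficient is the expected height in its own category, so no category plays the role of a baseline.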
  46. Index variable
    R code:
        m5.8 <- quap(
            alist(
                height ~ dnorm( mu , sigma ) ,
                mu <- a[sex] ,
                a[sex] ~ dnorm( 178 , 20 ) ,
                sigma ~ dunif( 0 , 50 )
            ) , data=d )
        precis( m5.8 , depth=2 )
    Output:
                 mean    sd    5.5%   94.5%
        a[1]   134.91  1.61  132.34  137.48
        a[2]   142.58  1.70  139.86  145.29
        sigma   27.31  0.83   25.98   28.63
    Note the depth=2 added to precis. This tells it to show any vector parameters, like the new a vector. Vector (and matrix) parameters are hidden by precis by default, because sometimes there are lots of these and you don't want to inspect their individual values. Interpreting these parameters is easy enough: they are the expected heights in each category. But often we are interested in differences between categories. In this case, what is the expected difference between females and males? We can compute this using samples from the posterior.
  47. Differences • Extract posterior samples into a data frame and insert the calculation directly into the same frame.
    R code:
        post <- extract.samples(m5.8)
        post$diff_fm <- post$a[,1] - post$a[,2]
        precis( post , depth=2 )
    Output:
        quap posterior: 10000 samples from m5.8
                    mean    sd    5.5%   94.5%
        sigma      27.29  0.84   25.95   28.63
        a[1]      134.91  1.59  132.37  137.42
        a[2]      142.60  1.71  139.90  145.35
        diff_fm    -7.70  2.33  -11.41   -3.97
    The calculation appears at the bottom, as if it were a new parameter in the posterior. This is the expected difference between a female and male in the sample. This kind of calculation is called a contrast. No matter how many categories you have, you can compute the contrast between any two by using samples from the posterior to compute their difference. Then you get the posterior distribution of the difference.
    Many categories: binary categories are easy, whether you use an indicator variable or an index variable. But when there are more than two categories, the indicator variable approach explodes. You need a new indicator variable for each new category. If you have k unique categories, you need k - 1 indicator variables. Automated tools like R's lm do in fact go this route, constructing k - 1 indicator variables for you.
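The contrast calculation itself needs nothing beyond subtraction on samples. The sketch below does not refit m5.8. It assumes, for illustration only, that the two category means have independent normal posteriors matching the precis summaries above, and draws samples from those.

```r
# Stand-in for extract.samples(m5.8): independent normal draws that match
# the reported posterior means and standard deviations (an assumption).
set.seed(47)
post <- data.frame(
  a1 = rnorm(1e4, 134.91, 1.61),   # expected female height
  a2 = rnorm(1e4, 142.58, 1.70)    # expected male height
)

# The contrast: compute the difference sample by sample.
post$diff_fm <- post$a1 - post$a2

diff_mean <- mean(post$diff_fm)
diff_sd   <- sd(post$diff_fm)
print(round(c(mean = diff_mean, sd = diff_sd), 2))
print(round(quantile(post$diff_fm, c(0.055, 0.945)), 2))
```

The standard deviation of the difference is larger than that of either mean. Uncertainty about a difference has to be computed from the samples and cannot be read off the two marginal intervals.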
  48. Difference and uncertainty
    [Figure: estimates for category 1, category 2, and their difference (diff), on a scale from about -6 to 4.]