[Figure: divorce rate plotted against Waffle Houses per million, with States labeled (AL, AR, GA, ME, NJ, OK, SC)]
does not imply correlation • Causation implies conditional correlation • Need more than just models • Q: Does marriage cause divorce? • [Figure 5.2: divorce rate plotted against marriage rate, marked SPURIOUS; the surviving caption fragment notes that divorce rate is also associated with median age at marriage]
value of a predictor, once we know the other predictors? • What is the value of knowing marriage rate, once we already know median age at marriage? • What is the value of knowing median age at marriage, once we know marriage rate?
[Figure: DAG with A = median age at marriage, M = marriage rate, D = divorce rate] • Like other models, the DAG is an analytical assumption • The arrows show directions of influence • Implications: (1) M is a function of A (2) D is a function of A and M (3) The total causal effect of A has two paths: (a) A -> M -> D (b) A -> D
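The two paths in implication (3) can be written out explicitly. The sketch below assumes both functions are linear with independent zero-mean noise; that assumption and the coefficient symbols are added here for illustration and are not from the slides.

```latex
\begin{align*}
M &= \alpha_M + \beta_{AM} A + \varepsilon_M \\
D &= \alpha_D + \beta_A A + \beta_M M + \varepsilon_D \\
\frac{\partial\, \mathbb{E}[D \mid A]}{\partial A}
  &= \underbrace{\beta_A}_{A \to D}
   + \underbrace{\beta_{AM}\,\beta_M}_{A \to M \to D}
\end{align*}
```

Under this assumption the total effect of A is the direct coefficient plus the product of the coefficients along the indirect path.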
[Figure: estimates of bA and bM across models; posterior means shown by points, compatibility intervals by horizontal lines] • m5.1: age at marriage only, D ~ A • m5.2: marriage rate only, D ~ M • m5.3: multiple regression, D ~ A + M • bA doesn't move, only grows a bit more uncertain, while bM is only associated with divorce when age at marriage is missing from the model • Once we know median age at marriage for a State, there is little or no additional predictive power in also knowing the rate of marriage in that State • This does not mean that there is no value in knowing marriage rate
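The pattern across the three models can be reproduced on simulated data. This is a minimal sketch, not the lecture's code: it uses ordinary least squares via NumPy in place of Bayesian fitting, and the data are simulated under the assumption that A influences both M and D while M has no direct effect on D. The effect sizes, sample size, and the helper `ols` are all hypothetical.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(5)
n = 5000  # large, so sampling noise does not blur the pattern (assumption)

# Simulated stand-ins for standardized variables (assumed effect sizes).
A = rng.normal(size=n)                          # median age at marriage
M = -0.7 * A + rng.normal(scale=0.714, size=n)  # marriage rate, driven by A
D = -0.6 * A + rng.normal(scale=0.8, size=n)    # divorce rate, driven by A only

m51 = ols(D, A)     # D ~ A
m52 = ols(D, M)     # D ~ M
m53 = ols(D, A, M)  # D ~ A + M

print(f"m5.1  bA = {m51[1]: .2f}")
print(f"m5.2  bM = {m52[1]: .2f}")
print(f"m5.3  bA = {m53[1]: .2f}  bM = {m53[2]: .2f}")
```

In this simulation bM is clearly nonzero when M is the only predictor and shrinks toward zero once A is included, while bA barely changes between the single-predictor and multiple regression fits.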
with outcome, “controlling” for other predictors • Useful intuition • Never analyze residuals! • Recipe: 1. Regress predictor on other predictors 2. Compute predictor residuals 3. Regress outcome on residuals
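The three-step recipe can be sketched as follows, using least squares on simulated data (the variable values and effect sizes are assumptions, and `fit` is a hypothetical helper). The slope from step 3 matches the coefficient from the multiple regression, which is the intuition the recipe is meant to give.

```python
import numpy as np

def fit(y, X):
    """Least-squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=n)                                   # other predictor
M = -0.7 * A + rng.normal(scale=0.7, size=n)             # predictor of interest
D = -0.6 * A + 0.1 * M + rng.normal(scale=0.8, size=n)   # outcome

ones = np.ones(n)
X_A = np.column_stack([ones, A])

# 1. Regress the predictor on the other predictors.
coef_M_on_A = fit(M, X_A)
# 2. Compute the predictor residuals.
M_resid = M - X_A @ coef_M_on_A
# 3. Regress the outcome on the residuals.
slope_resid = fit(D, np.column_stack([ones, M_resid]))[1]

# For comparison: the coefficient on M in the multiple regression D ~ A + M.
slope_multi = fit(D, np.column_stack([ones, A, M]))[2]

print(slope_resid, slope_multi)
```

The recipe is for building intuition only. In practice the multiple regression does all three steps at once, which is why the slide warns against analyzing residuals directly.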
[Figure 5.4: predictor residual plots in four panels. Marriage rate (std) and age at marriage (std) plotted against each other; divorce rate (std) plotted against marriage rate residuals and against age at marriage residuals. Labeled States include DC, HI, ID, ME, ND, WY.]
cases • Check model fit — golems do make mistakes • Find model failures, stimulate new ideas • Always average over the posterior distribution • Using only posterior mean leads to overconfidence • Embrace the uncertainty
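The overconfidence point can be illustrated with a deliberately simple case. This is a sketch under stated assumptions, not the lecture's model: a normal mean with known sigma and a flat prior, so the posterior for the mean is Normal(ybar, sigma / sqrt(n)). The data, the sample size, and the 89% interval mass are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                                   # assumed known
y = rng.normal(loc=0.5, scale=sigma, size=5)  # small simulated sample
n = y.size

post_mean = y.mean()
post_sd = sigma / np.sqrt(n)                  # posterior sd under a flat prior

draws = 100_000
# Average over the posterior: draw a mean, then draw a prediction given it.
mu_draws = rng.normal(post_mean, post_sd, size=draws)
pred_avg = rng.normal(mu_draws, sigma)
# Plug-in: pretend the posterior mean is the true mean.
pred_plugin = rng.normal(post_mean, sigma, size=draws)

def interval_width(x, mass=0.89):
    """Width of the central interval containing `mass` of the draws."""
    lo, hi = np.quantile(x, [(1 - mass) / 2, 1 - (1 - mass) / 2])
    return hi - lo

width_avg = interval_width(pred_avg)
width_plugin = interval_width(pred_plugin)
print(width_plugin, width_avg)
```

The plug-in interval is narrower because it ignores uncertainty about the mean. In this setup the averaged predictive standard deviation is larger by a factor of sqrt(1 + 1/n).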
[Figure: posterior predictive plot for the multivariate divorce model. Horizontal axis: observed divorce rate for each State; vertical axis: predicted divorce rate. Labeled States: ID, ME, RI, UT.]
by another variable • Need both variables to see influence of either • Tends to arise when: (a) another predictor is associated with the outcome in the opposite direction, and (b) both predictors are associated with one another • Noise in predictors can also mask association (residual confounding)
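A small simulation shows how the masking arises. This is a sketch with assumed values: two positively correlated predictors with equal and opposite effects on the outcome, fit by least squares. The correlation `rho`, the effect sizes, and the helper `slopes` are hypothetical.

```python
import numpy as np

def slopes(y, *predictors):
    """Least-squares slopes for y ~ 1 + predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(7)
n = 2000
rho = 0.9  # assumed correlation between the two predictors

x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
# Opposite-direction effects on the outcome.
y = x1 - x2 + rng.normal(size=n)

b1_alone = slopes(y, x1)[0]
b2_alone = slopes(y, x2)[0]
b1_both, b2_both = slopes(y, x1, x2)

print(f"one at a time: b1 = {b1_alone: .2f}, b2 = {b2_alone: .2f}")
print(f"both together: b1 = {b1_both: .2f}, b2 = {b2_both: .2f}")
```

Fit one at a time, each slope is close to zero because the other predictor's opposing effect rides along with it. With both in the model, the slopes recover values near +1 and -1.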
Gender, region, species • How to use in regression? • Two approaches • Use dummy/indicator variables • Use index variables • Most automated software uses dummy variables • Usually easier to think & code with index variables
unique intercept • Coefficient is the difference from baseline category • Model to fit: $h_i \sim \mathrm{Normal}(\mu_i, \sigma)$, $\mu_i = \alpha + \beta_m m_i$, where $h$ is height and $m$ is the 0/1 dummy variable indicating a male individual • $\alpha$: mean when $m_i = 0$ • $\beta_m$: change in mean when $m_i = 1$ • A dummy variable turns a parameter on for cases in one category and off for cases in the other: $\beta_m$ influences prediction only when $m_i = 1$, and has no effect when $m_i = 0$ • It doesn't matter which category, "male" or "female", is indicated by the 1; the model won't care • Correct interpretation demands that you remember, so name the variable after the category assigned the value 1
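The two coding approaches can be compared side by side. The sketch below uses least squares on simulated heights; the group means, noise level, and sample size are made up for the illustration. The dummy coding estimates a baseline mean and a difference, while the index coding estimates one mean per category.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
male = rng.integers(0, 2, size=n)  # 0/1 dummy: 1 = male, 0 = female
# Hypothetical heights: baseline 135, males 7 taller on average.
height = 135.0 + 7.0 * male + rng.normal(scale=5.0, size=n)

# Dummy/indicator coding: mu_i = alpha + beta_m * m_i
X_dummy = np.column_stack([np.ones(n), male])
(alpha, beta_m), *_ = np.linalg.lstsq(X_dummy, height, rcond=None)

# Index coding: mu_i = alpha[sex_i], one intercept per category.
sex = male                  # index 0 = female, 1 = male
X_index = np.eye(2)[sex]    # one column per category
alpha_by_sex, *_ = np.linalg.lstsq(X_index, height, rcond=None)

print(f"dummy: alpha = {alpha:.2f}, beta_m = {beta_m:.2f}")
print(f"index: alpha[female] = {alpha_by_sex[0]:.2f}, "
      f"alpha[male] = {alpha_by_sex[1]:.2f}")
```

Both codings describe the same fitted means: `alpha` equals the female mean and `alpha + beta_m` equals the male mean. The index version extends to more than two categories by adding columns, without having to choose a baseline.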