Slide 1

Slide 1 text

Dealing with Separation in Logistic Regression Models
Carlisle Rainey
Assistant Professor, University at Buffalo, SUNY
[email protected]
paper, data, and code at crain.co/research

Slide 2

Slide 2 text

Dealing with Separation in Logistic Regression Models

Slide 3

Slide 3 text

The prior matters a lot, so choose a good one.
43 million times larger

Slide 4

Slide 4 text

The prior matters a lot, so choose a good one.
1. in practice
2. in theory
3. concepts
4. software

Slide 5

Slide 5 text

The Prior Matters in Practice

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

2 million

Slide 10

Slide 10 text

3,000

Slide 11

Slide 11 text

100%

Slide 12

Slide 12 text

90%

Slide 13

Slide 13 text

“To expand this program is not unlike adding a thousand people to the Titanic.” — July 2012

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

[Figure: politics vs. need]

Slide 16

Slide 16 text

“Obamacare is going to be horrible for patients. It’s going to be horrible for taxpayers. It’s probably the biggest job killer ever.” — October 2010

Slide 17

Slide 17 text

“Obamacare is going to be horrible for patients. It’s going to be horrible for taxpayers. It’s probably the biggest job killer ever.” — October 2010
“While the federal government is committed to paying 100 percent of the cost, I cannot, in good conscience, deny Floridians that need it access to healthcare.” — February 2013

Slide 18

Slide 18 text

In the tug-of-war between politics and need, which one wins?

Slide 19

Slide 19 text

Variable              Coefficient   Confidence Interval
Democratic Governor   -20.35        [-6,340.06; 6,299.36]
% Uninsured (Std.)     0.92         [-3.46; 5.30]
% Favorable to ACA     0.01         [-0.17; 0.18]
GOP Legislature        2.43         [-0.47; 5.33]
Fiscal Health          0.00         [-0.02; 0.02]
Medicaid Multiplier   -0.32         [-2.45; 1.80]
% Non-white            0.05         [-0.12; 0.21]
% Metropolitan        -0.08         [-0.17; 0.02]
Constant               2.58         [-7.02; 12.18]

Slide 20

Slide 20 text

            Doesn’t Oppose   Opposes
Republican  14               16
Democrat    20               0

No Democratic governor opposes the expansion: having a Democratic governor perfectly predicts the outcome, so the data are separated.

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Variable              Coefficient   Confidence Interval
Democratic Governor   -26.35        [-126,979.03; 126,926.33]
% Uninsured (Std.)     0.92         [-3.46; 5.30]
% Favorable to ACA     0.01         [-0.17; 0.18]
GOP Legislature        2.43         [-0.47; 5.33]
Fiscal Health          0.00         [-0.02; 0.02]
Medicaid Multiplier   -0.32         [-2.45; 1.80]
% Non-white            0.05         [-0.12; 0.21]
% Metropolitan        -0.08         [-0.17; 0.02]
Constant               2.58         [-7.02; 12.18]

Slide 23

Slide 23 text

Variable              Coefficient   Confidence Interval
Democratic Governor   -26.35        [-126,979.03; 126,926.33]
% Uninsured (Std.)     0.92         [-3.46; 5.30]
% Favorable to ACA     0.01         [-0.17; 0.18]
GOP Legislature        2.43         [-0.47; 5.33]
Fiscal Health          0.00         [-0.02; 0.02]
Medicaid Multiplier   -0.32         [-2.45; 1.80]
% Non-white            0.05         [-0.12; 0.21]
% Metropolitan        -0.08         [-0.17; 0.02]
Constant               2.58         [-7.02; 12.18]

The coefficient for Democratic Governor is unreasonable, and its confidence interval is useless. This is a failure of maximum likelihood.
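For intuition, a minimal sketch of the failure on toy data (illustrative only; this is not the deck's politics_and_need dataset): when a binary predictor perfectly predicts the outcome, glm()'s maximum-likelihood estimate diverges and its standard error explodes.

# toy separated data: x == 1 perfectly predicts y == 0
set.seed(42)
x <- c(rep(0, 30), rep(1, 20))           # a binary predictor, like dem_governor
y <- c(rbinom(30, 1, 0.5), rep(0, 20))   # every observation with x == 1 has y == 0
m <- glm(y ~ x, family = binomial)       # may warn: fitted probabilities numerically 0 or 1 occurred
summary(m)  # the coefficient on x is hugely negative with an enormous standard error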

Slide 24

Slide 24 text

Jeffreys’ Prior (Zorn 2005)

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Cauchy Prior (Gelman et al. 2008)

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

The Cauchy prior produces… a confidence interval that is 250% wider

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

The Cauchy prior produces… a coefficient estimate that is 50% larger

Slide 31

Slide 31 text

The Cauchy prior produces… a risk-ratio estimate that is 43 million times larger

Slide 32

Slide 32 text

Different default priors produce different results.

Slide 33

Slide 33 text

The Prior Matters in Theory

Slide 34

Slide 34 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 35

Slide 35 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 36

Slide 36 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 37

Slide 37 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 38

Slide 38 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 39

Slide 39 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 40

Slide 40 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 41

Slide 41 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 42

Slide 42 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 43

Slide 43 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 44

Slide 44 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 45

Slide 45 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 46

Slide 46 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 47

Slide 47 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 48

Slide 48 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 49

Slide 49 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 50

Slide 50 text

The prior determines crucial parts of the posterior.

Slide 51

Slide 51 text

Key Concepts for Choosing a Good Prior

Slide 52

Slide 52 text

$\Pr(y_i) = \Lambda(\beta_{cons} + \beta_s s_i + \beta_1 x_{i1} + \dots + \beta_k x_{ik})$
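Here $\Lambda$ denotes the logistic (inverse-logit) function, the standard link for logit models:

$\Lambda(x) = \dfrac{\exp(x)}{1 + \exp(x)}$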

Slide 53

Slide 53 text

Prior Predictive Distribution
$p(y^{new}) = \int_{-\infty}^{\infty} p(y^{new} \mid \beta)\, p(\beta)\, d\beta$

Slide 54

Slide 54 text

$\begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1k} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2k} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{k1} & \sigma_{k2} & \sigma_{k3} & \cdots & \sigma_{kk} \end{pmatrix}$

Slide 55

Slide 55 text

simplify

Slide 56

Slide 56 text

We Already Know a Few Things
$\beta_1 \approx \hat{\beta}^{mle}_1$, $\beta_2 \approx \hat{\beta}^{mle}_2$, $\dots$, $\beta_k \approx \hat{\beta}^{mle}_k$, and $\beta_s < 0$

Slide 57

Slide 57 text

$\begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1k} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2k} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{k1} & \sigma_{k2} & \sigma_{k3} & \cdots & \sigma_{kk} \end{pmatrix}$

Slide 58

Slide 58 text

$\begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1k} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2k} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{k1} & \sigma_{k2} & \sigma_{k3} & \cdots & \sigma_{kk} \end{pmatrix}$

Slide 59

Slide 59 text

Partial Prior Predictive Distribution
$p^*(y^{new}) = \int_{-\infty}^{0} p(y^{new} \mid \beta_s, \hat{\beta}^{mle}_{-s})\, p(\beta_s \mid \beta_s \le 0)\, d\beta_s$

Slide 60

Slide 60 text

1. Choose a prior distribution $p(\beta_s)$.
2. Estimate the model coefficients $\hat{\beta}^{mle}$.
3. For $i$ in 1 to $n_{sims}$, do the following:
   (a) Simulate $\tilde{\beta}^{[i]}_s \sim p(\beta_s)$.
   (b) Replace $\hat{\beta}^{mle}_s$ in $\hat{\beta}^{mle}$ with $\tilde{\beta}^{[i]}_s$, yielding the vector $\tilde{\beta}^{[i]}$.
   (c) Calculate and store the quantity of interest $\tilde{q}^{[i]} = q(\tilde{\beta}^{[i]})$.
4. Keep only the simulations in the direction of the separation.
5. Summarize the simulations $\tilde{q}$ using quantiles, histograms, or density plots.
6. If the prior is inadequate, then update the prior distribution $p(\beta_s)$.
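A minimal R sketch of this loop, for intuition only (it is not the separation package's internal code): it assumes the quantity of interest is the first difference in Pr(y = 1) as the separating indicator moves from 0 to 1, and it collapses the other coefficients' contribution into a single illustrative linear-predictor value of 2.58, the constant's MLE from the table earlier in the deck.

# partial prior predictive simulation, following steps 1-6 above
inv_logit <- function(x) 1 / (1 + exp(-x))          # the logistic function, Lambda

n_sims <- 10000
beta_s_tilde <- rnorm(n_sims, mean = 0, sd = 4.5)   # steps 1 and 3(a): draws from a candidate Normal(0, 4.5) prior
beta_s_tilde <- beta_s_tilde[beta_s_tilde < 0]      # step 4: keep draws in the direction of the separation

eta_mle <- 2.58                                     # steps 2 and 3(b): other coefficients held fixed (illustrative value)
q_tilde <- inv_logit(eta_mle + beta_s_tilde) - inv_logit(eta_mle)  # step 3(c): first difference as s moves 0 -> 1

quantile(q_tilde, probs = c(0.05, 0.50, 0.95))      # step 5: summarize with quantiles...
hist(q_tilde, breaks = 50)                          # ...or a histogram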

Slide 61

Slide 61 text

Example: Nuclear Weapons and War

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

The prior matters, so robustness checks are critical.

Slide 66

Slide 66 text

[Figure: histograms of simulated risk-ratios (log scale, 1 to 100,000) under three candidate priors.
Informative Normal(0, 4.5) Prior: 1% of simulations
Skeptical Normal(0, 2) Prior: < 1% of simulations
Enthusiastic Normal(0, 8) Prior: 15% of simulations]

Slide 67

Slide 67 text

[Figure: posterior densities of the coefficient of symmetric nuclear dyads (x-axis: -20 to 0) under five priors: Informative Normal(0, 4.5), Skeptical Normal(0, 2), Enthusiastic Normal(0, 8), Zorn's default Jeffreys' prior, and Gelman et al.'s default Cauchy(0, 2.5) prior.]

Slide 68

Slide 68 text

[Figure: posterior distributions of the risk-ratio of war in nonnuclear dyads compared to symmetric nuclear dyads (log scale, 0.1 to 100,000). Values as printed, by prior:
Informative Normal(0, 4.5) Prior: 0.1, 24.5, 1986.4
Skeptical Normal(0, 2) Prior: 0.1, 4, 31.2
Enthusiastic Normal(0, 8) Prior: 0.1, 299.2, 499043.2
Zorn's Default Jeffreys' Prior: 0.1, 3.4, 100.2
Gelman et al.'s Default Cauchy(0, 2.5) Prior: 0.1, 9.2, 25277.4]

Slide 69

Slide 69 text

Software for Choosing a Good Prior

Slide 70

Slide 70 text

separation (on GitHub)

Slide 71

Slide 71 text

crain.co/example

Slide 72

Slide 72 text

# install packages
devtools::install_github("carlislerainey/compactr")
devtools::install_github("carlislerainey/separation")

# load packages
library(separation)
library(arm)  # for rescale()

# load and recode data
data(politics_and_need)
d <- politics_and_need
d$dem_governor <- 1 - d$gop_governor
d$st_percent_uninsured <- rescale(d$percent_uninsured)

# formula to use throughout
f <- oppose_expansion ~ dem_governor + percent_favorable_aca + gop_leg +
  st_percent_uninsured + bal2012 + multiplier + percent_nonwhite +
  percent_metro

Slide 73

Slide 73 text

Workflow
1. Calculate the PPPD: calc_pppd()
2. Simulate from the posterior: sim_post_*()
3. Calculate quantities of interest: calc_qi()

Slide 74

Slide 74 text

calc_pppd()

Slide 75

Slide 75 text

# informative prior
prior_sims_4.5 <- rnorm(10000, 0, 4.5)
pppd <- calc_pppd(formula = f, data = d,
                  prior_sims = prior_sims_4.5,
                  sep_var_name = "dem_governor",
                  prior_label = "Normal(0, 4.5)")

Slide 76

Slide 76 text

plot(pppd)

Slide 77

Slide 77 text

plot(pppd, log_scale = TRUE)

Slide 78

Slide 78 text

sim_post_normal()
sim_post_gelman()
sim_post_jeffreys()

Slide 79

Slide 79 text

# mcmc estimation
post <- sim_post_normal(f, d, sep_var = "dem_governor", sd = 4.5,
                        n_sims = 10000, n_burnin = 1000, n_chains = 4)

Slide 80

Slide 80 text

calc_qi()

Slide 81

Slide 81 text

# compute quantities of interest
## dem_governor
X_pred_list <- set_at_median(f, d)
x <- c(0, 1)
X_pred_list$dem_governor <- x
qi <- calc_qi(post, X_pred_list, qi_name = "fd")

Slide 82

Slide 82 text

plot(qi, xlim = c(-1, 1),
     xlab = "First Difference",
     ylab = "Posterior Density",
     main = "The Effect of Democratic Partisanship on Opposing the Expansion")

Slide 83

Slide 83 text

## st_percent_uninsured
X_pred_list <- set_at_median(f, d)
x <- seq(min(d$st_percent_uninsured),
         max(d$st_percent_uninsured),
         by = 0.1)
X_pred_list$st_percent_uninsured <- x
qi <- calc_qi(post, X_pred_list, qi_name = "pr")

Slide 84

Slide 84 text

plot(qi, x,
     xlab = "Percent Uninsured (Std.)",
     ylab = "Predicted Probability",
     main = "The Probability of Opposition as the Percent Uninsured (Std.) Varies")

Slide 85

Slide 85 text

15 lines

Slide 86

Slide 86 text

Conclusion

Slide 87

Slide 87 text

The prior matters a lot, so choose a good one.

Slide 88

Slide 88 text

The prior matters in practice.

Slide 89

Slide 89 text

The prior matters in theory.

Slide 90

Slide 90 text

The partial prior predictive distribution simplifies the choice of prior.

Slide 91

Slide 91 text

Software makes choosing a prior, estimating the model, and interpreting the estimates easy.

Slide 92

Slide 92 text

What should you do?
1. Notice the problem and do something.
2. Recognize that the prior affects the inferences and choose a good one.
3. Assess the robustness of your conclusions to a range of prior distributions.

Slide 93

Slide 93 text

Questions?

Slide 94

Slide 94 text

Appendix

Slide 95

Slide 95 text

[Figure: posterior medians and 90% HPD intervals for the coefficient of symmetric nuclear dyads (x-axis: -15 to 0) under five priors: Informative Normal(0, 4.5), Skeptical Normal(0, 2), Enthusiastic Normal(0, 8), Zorn's default Jeffreys' invariant prior, and Gelman et al.'s default Cauchy(0, 2.5) prior.]

Slide 96

Slide 96 text

[Figure: Pr(RR > 1), by prior:
Informative Normal(0, 4.5) Prior: 0.93
Skeptical Normal(0, 2) Prior: 0.86
Enthusiastic Normal(0, 8) Prior: 0.96
Zorn's Default Jeffreys' Prior: 0.79
Gelman et al.'s Default Cauchy(0, 2.5) Prior: 0.90]

Slide 97

Slide 97 text

For 1. a monotonic likelihood $p(y \mid \beta)$ decreasing in $\beta_s$, 2. a proper prior distribution $p(\beta \mid \sigma)$, and 3. a large, negative $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 98

Slide 98 text

No content

Slide 99

Slide 99 text

No content

Slide 100

Slide 100 text

No content

Slide 101

Slide 101 text

No content

Slide 102

Slide 102 text

No content

Slide 103

Slide 103 text

Theorem 1. For a monotonic likelihood $p(y \mid \beta)$ increasing [decreasing] in $\beta_s$, proper prior distribution $p(\beta \mid \sigma)$, and large positive [negative] $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$.

Slide 104

Slide 104 text

Proof. Due to separation, $p(y \mid \beta)$ is monotonic increasing in $\beta_s$ to a limit $L$, so that $\lim_{\beta_s \to \infty} p(y \mid \beta_s) = L$. By Bayes’ rule,
$$p(\beta \mid y) = \frac{p(y \mid \beta)\, p(\beta \mid \sigma)}{\int_{-\infty}^{\infty} p(y \mid \beta)\, p(\beta \mid \sigma)\, d\beta} = \frac{p(y \mid \beta)\, p(\beta \mid \sigma)}{\underbrace{p(y \mid \sigma)}_{\text{constant w.r.t. } \beta}}.$$
Integrating out the other parameters $\beta_{-s} = \langle \beta_{cons}, \beta_1, \beta_2, \dots, \beta_k \rangle$ to obtain the posterior distribution of $\beta_s$,
$$p(\beta_s \mid y) = \frac{\int_{-\infty}^{\infty} p(y \mid \beta)\, p(\beta \mid \sigma)\, d\beta_{-s}}{p(y \mid \sigma)}, \quad (1)$$
and the prior distribution of $\beta_s$,
$$p(\beta_s \mid \sigma) = \int_{-\infty}^{\infty} p(\beta \mid \sigma)\, d\beta_{-s}.$$
Notice that $p(\beta_s \mid y) \propto p(\beta_s \mid \sigma)$ iff $\frac{p(\beta_s \mid y)}{p(\beta_s \mid \sigma)} = k$, where the constant $k \neq 0$. Thus,

Slide 105

Slide 105 text

Theorem 1 implies that
$$\lim_{\beta_s \to \infty} \frac{p(\beta_s \mid y)}{p(\beta_s \mid \sigma)} = k.$$
Substituting in Equation 1,
$$\lim_{\beta_s \to \infty} \frac{\int_{-\infty}^{\infty} p(y \mid \beta)\, p(\beta \mid \sigma)\, d\beta_{-s} \,/\, p(y \mid \sigma)}{p(\beta_s \mid \sigma)} = k.$$
Multiplying both sides by $p(y \mid \sigma)$, which is constant with respect to $\beta$,
$$\lim_{\beta_s \to \infty} \frac{\int_{-\infty}^{\infty} p(y \mid \beta)\, p(\beta \mid \sigma)\, d\beta_{-s}}{p(\beta_s \mid \sigma)} = k\, p(y \mid \sigma).$$
Setting $\int_{-\infty}^{\infty} p(y \mid \beta)\, p(\beta \mid \sigma)\, d\beta_{-s} = p(y \mid \beta_s)\, p(\beta_s \mid \sigma)$,

Slide 106

Slide 106 text

$$\lim_{\beta_s \to \infty} \frac{p(y \mid \beta_s)\, p(\beta_s \mid \sigma)}{p(\beta_s \mid \sigma)} = k\, p(y \mid \sigma).$$
Canceling $p(\beta_s \mid \sigma)$ in the numerator and denominator,
$$\lim_{\beta_s \to \infty} p(y \mid \beta_s) = k\, p(y \mid \sigma).$$