
Dealing with Separation in Logistic Regression Models

Carlisle Rainey
December 04, 2014


Slides for a paper available at http://www.carlislerainey.com/papers/separation.pdf

Transcript

  1. Dealing with Separation in Logistic Regression Models
      Carlisle Rainey
      Assistant Professor, University at Buffalo, SUNY
      rcrainey@buffalo.edu
      paper, data, and code at crain.co/research
  2. Dealing with Separation in Logistic Regression Models

  3. The prior matters a lot, so choose a good one.

    43 million times larger
  4. The prior matters a lot, so choose a good one.

    1. in practice 2. in theory 3. concepts 4. software
  5. The Prior Matters in Practice

  9. 2 million

  10. 3,000

  11. 100%

  12. 90%

  13. “To expand this program is not unlike adding a thousand

    people to the Titanic.” — July 2012
  15. politics need

  16. “Obamacare is going to be horrible for patients. It’s going

    to be horrible for taxpayers. It’s probably the biggest job killer ever.” — October 2010
  17. “Obamacare is going to be horrible for patients. It’s going

    to be horrible for taxpayers. It’s probably the biggest job killer ever.” — October 2010 “While the federal government is committed to paying 100 percent of the cost, I cannot, in good conscience, deny Floridians that need it access to healthcare.” — February 2013
  18. In the tug-of-war between politics and need, which one wins?

  19. Variable               Coefficient  Confidence Interval
      Democratic Governor         -20.35  [-6,340.06; 6,299.36]
      % Uninsured (Std.)            0.92  [-3.46; 5.30]
      % Favorable to ACA            0.01  [-0.17; 0.18]
      GOP Legislature               2.43  [-0.47; 5.33]
      Fiscal Health                 0.00  [-0.02; 0.02]
      Medicaid Multiplier          -0.32  [-2.45; 1.80]
      % Non-white                   0.05  [-0.12; 0.21]
      % Metropolitan               -0.08  [-0.17; 0.02]
      Constant                      2.58  [-7.02; 12.18]
  20.              Doesn't Oppose   Opposes
      Republican         14            16
      Democrat           20             0
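The table above shows the problem: no state with a Democratic governor opposes the expansion, so the likelihood keeps increasing as the coefficient on Democratic Governor decreases, and the MLE does not exist. A minimal Python sketch (an illustration of the mechanics, not the paper's replication code; the intercept is held at the x = 0 group's log-odds for simplicity):

```python
import math

# Quasi-complete separation, matching the 2x2 table:
# Republicans (x = 0): 14 don't oppose, 16 oppose; Democrats (x = 1): 20 don't oppose, 0 oppose.
y = [1] * 16 + [0] * 14 + [0] * 20   # 1 = opposes the expansion
x = [0] * 30 + [1] * 20              # 1 = Democratic governor

def log_lik(alpha, beta):
    """Logistic-regression log-likelihood for intercept alpha and slope beta."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

# Hold the intercept at the Republican group's log-odds, log(16/14),
# and push the slope toward negative infinity:
alpha = math.log(16 / 14)
for beta in [-1, -5, -10, -20]:
    print(beta, round(log_lik(alpha, beta), 4))

# The log-likelihood increases monotonically as beta decreases: it has no
# interior maximum, so maximum likelihood "estimates" beta as -infinity.
```

This is exactly why the reported coefficient and confidence interval explode: the optimizer stops at some arbitrarily large negative value determined by its convergence tolerance.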

  22. Variable               Coefficient  Confidence Interval
      Democratic Governor         -26.35  [-126,979.03; 126,926.33]
      % Uninsured (Std.)            0.92  [-3.46; 5.30]
      % Favorable to ACA            0.01  [-0.17; 0.18]
      GOP Legislature               2.43  [-0.47; 5.33]
      Fiscal Health                 0.00  [-0.02; 0.02]
      Medicaid Multiplier          -0.32  [-2.45; 1.80]
      % Non-white                   0.05  [-0.12; 0.21]
      % Metropolitan               -0.08  [-0.17; 0.02]
      Constant                      2.58  [-7.02; 12.18]
  23. The coefficient estimate for Democratic Governor is unreasonable, and its confidence interval is useless. This is a failure of maximum likelihood.
  24. Jeffreys’ Prior Zorn (2005)

  26. Cauchy Prior Gelman et al. (2008)

  28. The Cauchy prior produces… a confidence interval that is 250%

    wider
  30. The Cauchy prior produces… a coefficient estimate that is 50%

    larger
  31. The Cauchy prior produces… a risk-ratio estimate that is 43

    million times larger
  32. Different default priors produce different results.

  33. The Prior Matters in Theory

  34. For
      1. a monotonic likelihood $p(y|\beta)$ decreasing in $\beta_s$,
      2. a proper prior distribution $p(\beta|\sigma)$, and
      3. a large, negative $\beta_s$,
      the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s|y) \propto p(\beta_s|\sigma)$.
  50. The prior determines crucial parts of the posterior.

  51. Key Concepts for Choosing a Good Prior

  52. $\Pr(y_i) = \Lambda(\beta_c + \beta_s s_i + \beta_1 x_{i1} + \dots + \beta_k x_{ik})$
  53. Prior Predictive Distribution

      $p(y^{new}) = \int_{-\infty}^{\infty} p(y^{new} \mid \beta)\, p(\beta)\, d\beta$
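The prior predictive distribution rarely has a closed form, but it is easy to approximate by simulation: draw coefficients from the prior and average the implied probabilities. A minimal Python sketch, assuming a single coefficient with a Normal(0, 4.5) prior (the function name and setup are illustrative, not from the paper or the `separation` package):

```python
import math
import random

random.seed(1)

def prior_predictive_pr(x, n_sims=100_000, prior_sd=4.5):
    """Monte Carlo approximation of Pr(y_new = 1) at covariate value x under a
    Normal(0, prior_sd) prior on a single logistic-regression coefficient beta:
    p(y_new) = integral of p(y_new | beta) p(beta) dbeta
             ~ average of logistic(beta_i * x) over draws beta_i from the prior."""
    total = 0.0
    for _ in range(n_sims):
        beta = random.gauss(0, prior_sd)
        total += 1.0 / (1.0 + math.exp(-beta * x))
    return total / n_sims

print(prior_predictive_pr(1.0))  # near 0.5, by symmetry of the prior around zero
```

The same averaging idea extends to any quantity of interest, which is what the partial prior predictive distribution below exploits.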
  54. $\Sigma = \begin{pmatrix}
      \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1k} \\
      \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2k} \\
      \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3k} \\
      \vdots & \vdots & \vdots & \ddots & \vdots \\
      \sigma_{k1} & \sigma_{k2} & \sigma_{k3} & \cdots & \sigma_{kk}
      \end{pmatrix}$
  55. simplify

  56. We Already Know a Few Things

      $\beta_1 \approx \hat{\beta}_1^{mle}$, $\beta_2 \approx \hat{\beta}_2^{mle}$, ..., $\beta_k \approx \hat{\beta}_k^{mle}$, and $\beta_s < 0$
  59. Partial Prior Predictive Distribution

      $p^*(y^{new}) = \int_{-\infty}^{0} p(y^{new} \mid \beta_s, \hat{\beta}_{-s}^{mle})\, p(\beta_s \mid \beta_s < 0)\, d\beta_s$
  60. 1. Choose a prior distribution $p(\beta_s)$.
      2. Estimate the model coefficients $\hat{\beta}^{mle}$.
      3. For $i$ in 1 to $n_{sims}$, do the following:
         (a) Simulate $\tilde{\beta}_s^{[i]} \sim p(\beta_s)$.
         (b) Replace $\hat{\beta}_s^{mle}$ in $\hat{\beta}^{mle}$ with $\tilde{\beta}_s^{[i]}$, yielding the vector $\tilde{\beta}^{[i]}$.
         (c) Calculate and store the quantity of interest $\tilde{q}^{[i]} = q(\tilde{\beta}^{[i]})$.
      4. Keep only the simulations in the direction of the separation.
      5. Summarize the simulations $\tilde{q}$ using quantiles, histograms, or density plots.
      6. If the prior is inadequate, then update the prior distribution $p(\beta_s)$.
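The steps above fit in a few lines of code. This illustrative Python version (the author's `separation` R package implements the real workflow; the intercept value and the risk-ratio quantity of interest here are hypothetical stand-ins) simulates the separating coefficient from its prior, keeps only the draws in the direction of separation, and summarizes the implied risk ratios:

```python
import math
import random

random.seed(42)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: choose a prior for the separating coefficient beta_s (here Normal(0, 4.5)).
prior_draws = [random.gauss(0, 4.5) for _ in range(10_000)]

# Step 2: hold the other coefficients at their MLEs; for this toy sketch,
# just a hypothetical intercept.
intercept_mle = 2.5

# Steps 3-4: for each draw, compute a quantity of interest -- here the risk
# ratio Pr(y = 1 | s = 0) / Pr(y = 1 | s = 1) -- keeping only draws in the
# direction of the separation (negative beta_s).
risk_ratios = [
    logistic(intercept_mle) / logistic(intercept_mle + b)
    for b in prior_draws
    if b < 0
]

# Step 5: summarize the simulations with quantiles.
risk_ratios.sort()
median = risk_ratios[len(risk_ratios) // 2]
q90 = risk_ratios[int(0.9 * len(risk_ratios))]
print(f"median risk ratio: {median:.1f}, 90th percentile: {q90:.1f}")
```

If these implied risk ratios look implausible (step 6), tighten or loosen the prior and repeat until the partial prior predictive distribution matches your substantive beliefs.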
  61. Example Nuclear Weapons and War

  65. The prior matters, so robustness checks are critical.

  66. [Figure: prior predictive distributions of the risk ratio (log scale, 1 to 100,000). Informative Normal(0, 4.5) prior: 1% of simulations; skeptical Normal(0, 2) prior: < 1% of simulations; enthusiastic Normal(0, 8) prior: 15% of simulations.]
  67. [Figure: posterior densities for the coefficient of symmetric nuclear dyads under the informative Normal(0, 4.5), skeptical Normal(0, 2), and enthusiastic Normal(0, 8) priors, and under Zorn's default Jeffreys' prior and Gelman et al.'s default Cauchy(0, 2.5) prior.]
  68. Posterior distribution of the risk ratio of war in nonnuclear dyads compared to symmetric nuclear dyads (log scale):

      Informative Normal(0, 4.5) prior:             0.1, 24.5, 1986.4
      Skeptical Normal(0, 2) prior:                 0.1, 4, 31.2
      Enthusiastic Normal(0, 8) prior:              0.1, 299.2, 499043.2
      Zorn's default Jeffreys' prior:               0.1, 3.4, 100.2
      Gelman et al.'s default Cauchy(0, 2.5) prior: 0.1, 9.2, 25277.4
  69. Software for Choosing a Good Prior

  70. separation (on GitHub)

  71. crain.co/example

  72. # install packages
      devtools::install_github("carlislerainey/compactr")
      devtools::install_github("carlislerainey/separation")

      # load packages
      library(separation)
      library(arm)  # for rescale()

      # load and recode data
      data(politics_and_need)
      d <- politics_and_need
      d$dem_governor <- 1 - d$gop_governor
      d$st_percent_uninsured <- rescale(d$percent_uninsured)

      # formula to use throughout
      f <- oppose_expansion ~ dem_governor + percent_favorable_aca + gop_leg +
        st_percent_uninsured + bal2012 + multiplier + percent_nonwhite + percent_metro
  73. Workflow 1. Calculate the PPPD: calc_pppd() 2. Simulate from the

    posterior: sim_post_*() 3. Calculate quantities of interest: calc_qi()
  74. calc_pppd()

  75. # informative prior
      prior_sims_4.5 <- rnorm(10000, 0, 4.5)
      pppd <- calc_pppd(formula = f, data = d,
                        prior_sims = prior_sims_4.5,
                        sep_var_name = "dem_governor",
                        prior_label = "Normal(0, 4.5)")
  76. plot(pppd)

  77. plot(pppd, log_scale = TRUE)

  78. sim_post_normal() sim_post_gelman() sim_post_jeffreys()

  79. # mcmc estimation
      post <- sim_post_normal(f, d, sep_var = "dem_governor", sd = 4.5,
                              n_sims = 10000, n_burnin = 1000, n_chains = 4)
  80. calc_qi()

  81. # compute quantities of interest
      ## dem_governor
      X_pred_list <- set_at_median(f, d)
      x <- c(0, 1)
      X_pred_list$dem_governor <- x
      qi <- calc_qi(post, X_pred_list, qi_name = "fd")
  82. plot(qi, xlim = c(-1, 1),
           xlab = "First Difference",
           ylab = "Posterior Density",
           main = "The Effect of Democratic Partisanship on Opposing the Expansion")
  83. ## st_percent_uninsured
      X_pred_list <- set_at_median(f, d)
      x <- seq(min(d$st_percent_uninsured), max(d$st_percent_uninsured), by = 0.1)
      X_pred_list$st_percent_uninsured <- x
      qi <- calc_qi(post, X_pred_list, qi_name = "pr")
  84. plot(qi, x,
           xlab = "Percent Uninsured (Std.)",
           ylab = "Predicted Probability",
           main = "The Probability of Opposition as the Percent Uninsured (Std.) Varies")
  85. 15 lines

  86. Conclusion

  87. The prior matters a lot, so choose a good one.

  88. The prior matters in practice.

  89. The prior matters in theory.

  90. The partial prior predictive distribution simplifies the choice of prior.

  91. Software makes choosing a prior, estimating the model, and interpreting

    the estimates easy.
  92. What should you do?
      1. Notice the problem and do something.
      2. Recognize that the prior affects the inferences and choose a good one.
      3. Assess the robustness of your conclusions to a range of prior distributions.
  93. Questions?

  94. Appendix

  95. [Figure: posterior median and 90% HPD interval for the coefficient of symmetric nuclear dyads under the informative Normal(0, 4.5), skeptical Normal(0, 2), and enthusiastic Normal(0, 8) priors, Zorn's default Jeffreys' invariant prior, and Gelman et al.'s default Cauchy(0, 2.5) prior.]
  96. Pr(RR > 1):

      Informative Normal(0, 4.5) prior:             0.93
      Skeptical Normal(0, 2) prior:                 0.86
      Enthusiastic Normal(0, 8) prior:              0.96
      Zorn's default Jeffreys' prior:               0.79
      Gelman et al.'s default Cauchy(0, 2.5) prior: 0.90
  97. For
      1. a monotonic likelihood $p(y|\beta)$ decreasing in $\beta_s$,
      2. a proper prior distribution $p(\beta|\sigma)$, and
      3. a large, negative $\beta_s$,
      the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s|y) \propto p(\beta_s|\sigma)$.
  103. Theorem 1. For a monotonic likelihood $p(y|\beta)$ increasing [decreasing] in $\beta_s$, proper prior distribution $p(\beta|\sigma)$, and large positive [negative] $\beta_s$, the posterior distribution of $\beta_s$ is proportional to the prior distribution for $\beta_s$, so that $p(\beta_s|y) \propto p(\beta_s|\sigma)$.
  104. Proof. Due to separation, $p(y|\beta)$ is monotonic increasing in $\beta_s$ to a limit $L$, so that $\lim_{\beta_s \to \infty} p(y|\beta_s) = L$. By Bayes' rule,
      $$p(\beta|y) = \frac{p(y|\beta)\,p(\beta|\sigma)}{\int_{-\infty}^{\infty} p(y|\beta)\,p(\beta|\sigma)\,d\beta} = \frac{p(y|\beta)\,p(\beta|\sigma)}{p(y|\sigma)},$$
      where $p(y|\sigma)$ is constant with respect to $\beta$. Integrating out the other parameters $\beta_{-s} = \langle \beta_{cons}, \beta_1, \beta_2, \dots, \beta_k \rangle$ to obtain the posterior distribution of $\beta_s$,
      $$p(\beta_s|y) = \frac{\int_{-\infty}^{\infty} p(y|\beta)\,p(\beta|\sigma)\,d\beta_{-s}}{p(y|\sigma)}, \quad (1)$$
      and the prior distribution of $\beta_s$,
      $$p(\beta_s|\sigma) = \int_{-\infty}^{\infty} p(\beta|\sigma)\,d\beta_{-s}.$$
      Notice that $p(\beta_s|y) \propto p(\beta_s|\sigma)$ iff $\frac{p(\beta_s|y)}{p(\beta_s|\sigma)} = k$, where the constant $k \neq 0$. Thus, Theorem 1 implies that
      $$\lim_{\beta_s \to \infty} \frac{p(\beta_s|y)}{p(\beta_s|\sigma)} = k.$$
      Substituting in Equation 1,
      $$\lim_{\beta_s \to \infty} \frac{\int_{-\infty}^{\infty} p(y|\beta)\,p(\beta|\sigma)\,d\beta_{-s}}{p(y|\sigma)\,p(\beta_s|\sigma)} = k.$$
      Multiplying both sides by $p(y|\sigma)$, which is constant with respect to $\beta$,
      $$\lim_{\beta_s \to \infty} \frac{\int_{-\infty}^{\infty} p(y|\beta)\,p(\beta|\sigma)\,d\beta_{-s}}{p(\beta_s|\sigma)} = k\,p(y|\sigma).$$
      Setting $\int_{-\infty}^{\infty} p(y|\beta)\,p(\beta|\sigma)\,d\beta_{-s} = p(y|\beta_s)\,p(\beta_s|\sigma)$,
      $$\lim_{\beta_s \to \infty} \frac{p(y|\beta_s)\,p(\beta_s|\sigma)}{p(\beta_s|\sigma)} = k\,p(y|\sigma).$$
      Canceling $p(\beta_s|\sigma)$ in the numerator and denominator,
      $$\lim_{\beta_s \to \infty} p(y|\beta_s) = k\,p(y|\sigma).$$