Carlisle Rainey
December 04, 2014

Dealing with Separation in Logistic Regression Models

Slides for a paper available at http://www.carlislerainey.com/papers/separation.pdf


Transcript

1. Dealing with Separation in
Logistic Regression Models
Carlisle Rainey
Assistant Professor
University at Buffalo, SUNY
[email protected]
paper, data, and code at
crain.co/research

2. Dealing with Separation in
Logistic Regression Models

3. The prior matters a lot,
so choose a good one.
43 million times larger

4. The prior matters a lot,
so choose a good one.
1. in practice
2. in theory
3. concepts
4. software

5. The Prior Matters
in Practice

6. 2 million

7. 3,000

8. 100%

9. 90%

10. “To expand this program is not
unlike adding a thousand
people to the Titanic.”

— July 2012

11. politics vs. need

12. “Obamacare is going to be horrible
for patients. It’s going to be horrible
for taxpayers. It’s probably the
biggest job killer ever.”
— October 2010

13. “Obamacare is going to be horrible
for patients. It’s going to be horrible
for taxpayers. It’s probably the
biggest job killer ever.”
— October 2010
“While the federal government is committed
to paying 100 percent of the cost, I cannot,
in good conscience, deny Floridians that
needed access to health care.”
— February 2013

14. In the tug-of-war between politics and need,
which one wins?

15. Variable Coefficient Confidence Interval
Democratic Governor -20.35 [-6,340.06; 6,299.36]
% Uninsured (Std.) 0.92 [-3.46; 5.30]
% Favorable to ACA 0.01 [-0.17; 0.18]
GOP Legislature 2.43 [-0.47; 5.33]
Fiscal Health 0.00 [-0.02; 0.02]
Medicaid Multiplier -0.32 [-2.45; 1.80]
% Non-white 0.05 [-0.12; 0.21]
% Metropolitan -0.08 [-0.17; 0.02]
Constant 2.58 [-7.02; 12.18]

16. Doesn’t Oppose Opposes
Republican 14 16
Democrat 20 0

17. Variable Coefficient Confidence Interval
Democratic Governor -26.35 [-126,979.03; 126,926.33]
% Uninsured (Std.) 0.92 [-3.46; 5.30]
% Favorable to ACA 0.01 [-0.17; 0.18]
GOP Legislature 2.43 [-0.47; 5.33]
Fiscal Health 0.00 [-0.02; 0.02]
Medicaid Multiplier -0.32 [-2.45; 1.80]
% Non-white 0.05 [-0.12; 0.21]
% Metropolitan -0.08 [-0.17; 0.02]
Constant 2.58 [-7.02; 12.18]

18. [The same table as slide 17, annotated:]
useless
unreasonable
This is a failure of maximum likelihood.
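The 2×2 table on slide 16 is the source of the failure: every state with a Democratic governor sits in the "doesn't oppose" cell, so the likelihood rises monotonically as the coefficient on the separating variable falls, and maximum likelihood never reaches a finite optimum. A minimal sketch in Python (illustrative, not part of the talk's R code; the intercept is fixed at its Republican-only MLE to keep the example one-dimensional):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Counts from slide 16: 30 Republican governors (14 don't oppose, 16 oppose),
# 20 Democratic governors (all 20 don't oppose) -- perfect separation.
data = [(0, 0)] * 14 + [(0, 1)] * 16 + [(1, 0)] * 20

def log_lik(alpha, beta):
    """Log-likelihood of Pr(oppose) = inv_logit(alpha + beta * dem)."""
    ll = 0.0
    for dem, y in data:
        p = inv_logit(alpha + beta * dem)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

alpha = math.log(16 / 14)  # intercept MLE among Republican governors
for beta in [0, -5, -10, -20]:
    print(beta, round(log_lik(alpha, beta), 3))
# The log-likelihood keeps rising as beta falls: no finite MLE exists.
```

Each more negative beta fits a little better, so the optimizer drifts toward −∞ and the reported standard errors explode, which is exactly the huge interval in the table above.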

19. Jeffreys’ Prior
Zorn (2005)

20. Cauchy Prior
Gelman et al. (2008)

21. The Cauchy prior produces…
a conﬁdence interval that is
250% wider

22. The Cauchy prior produces…
a coefﬁcient estimate that is
50% larger

23. The Cauchy prior produces…
a risk-ratio estimate that is
43 million times larger

24. Different default priors
produce different results.

25. The Prior Matters
in Theory

26. For
1. a monotonic likelihood p(y | β) decreasing in β_s,
2. a proper prior distribution p(β | σ), and
3. a large, negative β_s,
the posterior distribution of β_s is proportional to the prior distribution for β_s, so
that p(β_s | y) ∝ p(β_s | σ).

27.–41. [Slides 27–41 repeat the statement on slide 26 as presentation builds.]

42. The prior determines
crucial parts of the posterior.

43. Key Concepts
for Choosing a Good Prior

44. Pr(y_i = 1) = Λ(β_cons + β_s s_i + β_1 x_i1 + … + β_k x_ik)

45. Prior Predictive Distribution
p(y^new) = ∫_{−∞}^{∞} p(y^new | β) p(β) dβ
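The prior predictive integral on slide 45 has a direct Monte Carlo approximation: draw coefficients from the prior and average the implied probabilities. A sketch, assuming a single covariate s and independent Normal(0, 4.5) priors on both coefficients (both are illustrative choices, not the talk's model):

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def prior_predictive(n_sims=100_000, s=1.0, sd=4.5, seed=1):
    """Monte Carlo estimate of p(y_new = 1) = ∫ Λ(b_cons + b_s * s) p(β) dβ."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        b_cons = rng.gauss(0, sd)  # intercept drawn from its prior
        b_s = rng.gauss(0, sd)     # slope drawn from its prior
        total += inv_logit(b_cons + b_s * s)
    return total / n_sims

print(round(prior_predictive(), 2))  # ≈ 0.5, by symmetry of the prior
```

Averaging over prior draws like this is what makes the prior's implications for observable outcomes visible before any data are used.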

46. [A k × k matrix of prior parameters:
σ_11 σ_12 σ_13 … σ_1k
σ_21 σ_22 σ_23 … σ_2k
σ_31 σ_32 σ_33 … σ_3k
 ⋮    ⋮    ⋮   ⋱   ⋮
σ_k1 σ_k2 σ_k3 … σ_kk]

47. simplify

48. We Already Know a Few Things
β_1 ≈ β̂_1^mle
β_2 ≈ β̂_2^mle
⋮
β_k ≈ β̂_k^mle
β_s < 0

49.–50. [Slides 49–50 repeat the k × k matrix from slide 46.]

51. Partial Prior Predictive Distribution
p*(y^new) = ∫_{−∞}^{0} p(y^new | β_s, β̂_{−s}^mle) p(β_s | β_s < 0) dβ_s

52. 1. Choose a prior distribution p(β_s).
2. Estimate the model coefficients β̂^mle.
3. For i in 1 to n_sims, do the following:
(a) Simulate β̃_s^[i] ~ p(β_s).
(b) Replace β̂_s^mle in β̂^mle with β̃_s^[i], yielding the vector β̃^[i].
(c) Calculate and store the quantity of interest q̃^[i] = q(β̃^[i]).
4. Keep only the simulations in the direction of the separation.
5. Summarize the simulations q̃ using quantiles, histograms, or density plots.
6. If the prior is inadequate, then update the prior distribution p(β_s).
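The six steps above can be sketched in a few lines. This Python illustration uses the slide-15 intercept (2.58) and a risk-ratio quantity of interest, holding everything except the separating coefficient at one fixed value — a simplification of what the talk's `separation` package does with calc_pppd():

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_pppd(n_sims=10_000, prior_sd=4.5, beta_cons=2.58, seed=42):
    """Sketch of the slide-52 algorithm for one quantity of interest:
    the risk-ratio Pr(y=1 | s=0) / Pr(y=1 | s=1)."""
    rng = random.Random(seed)
    qis = []
    for _ in range(n_sims):
        beta_s = rng.gauss(0, prior_sd)  # (a) simulate from the prior
        if beta_s >= 0:
            continue                     # step 4: keep only draws in the
                                         # direction of the separation
        p0 = inv_logit(beta_cons)            # s = 0
        p1 = inv_logit(beta_cons + beta_s)   # s = 1, drawn beta_s substituted
        qis.append(p0 / p1)              # (c) store the quantity of interest
    return qis

qis = sorted(simulate_pppd())
# step 5: summarize with quantiles (median and 95th percentile)
print(round(qis[len(qis) // 2], 1), round(qis[int(0.95 * len(qis))], 1))
```

If the upper quantiles imply absurd risk-ratios, step 6 says to tighten the prior and rerun, which is the comparison the talk makes across the Normal(0, 2), Normal(0, 4.5), and Normal(0, 8) priors.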

53. Example
Nuclear Weapons and War

54. The prior matters,
so robustness checks
are critical.

55. [Figure: histograms of the simulated risk-ratio (log scale, 1 to 100,000) under three priors. Panel labels — Informative Normal(0, 4.5) Prior: "1% of simulations"; Skeptical Normal(0, 2) Prior: "< 1% of simulations"; Enthusiastic Normal(0, 8) Prior: "15% of simulations".]

56. [Figure: posterior densities of the coefficient of symmetric nuclear dyads (x-axis −20 to 0) under five priors — Informative Normal(0, 4.5), Skeptical Normal(0, 2), Enthusiastic Normal(0, 8), Zorn's Default Jeffreys' Prior, and Gelman et al.'s Default Cauchy(0, 2.5) Prior.]

57. [Figure: posterior distributions of the risk-ratio of war in nonnuclear dyads compared to symmetric nuclear dyads (log scale, 0.1 to 100,000). Labeled values — Informative Normal(0, 4.5) Prior: 24.5 and 1,986.4; Skeptical Normal(0, 2) Prior: 4 and 31.2; Enthusiastic Normal(0, 8) Prior: 299.2 and 499,043.2; Zorn's Default Jeffreys' Prior: 3.4 and 100.2; Gelman et al.'s Default Cauchy(0, 2.5) Prior: 9.2 and 25,277.4.]

58. Software
for Choosing a Good Prior

59. separation
(on GitHub)

60. crain.co/example

61. # install packages
devtools::install_github("carlislerainey/compactr")
devtools::install_github("carlislerainey/separation")
library(separation)
library(arm)  # for rescale()

# load and recode data
data(politics_and_need)
d <- politics_and_need
d$dem_governor <- 1 - d$gop_governor
d$st_percent_uninsured <- rescale(d$percent_uninsured)

# formula to use throughout
f <- oppose_expansion ~ dem_governor +
  percent_favorable_aca + gop_leg +
  st_percent_uninsured + bal2012 +
  multiplier + percent_nonwhite +
  percent_metro

62. Workflow
1. Calculate the PPPD: calc_pppd()
2. Simulate from the posterior: sim_post_*()
3. Calculate quantities of interest: calc_qi()

63. calc_pppd()

64. # informative prior
prior_sims_4.5 <- rnorm(10000, 0, 4.5)
pppd <- calc_pppd(formula = f,
data = d,
prior_sims = prior_sims_4.5,
sep_var_name = "dem_governor",
prior_label = "Normal(0, 4.5)")

65. plot(pppd)

66. plot(pppd, log_scale = TRUE)

67. sim_post_normal()
sim_post_gelman()
sim_post_jeffreys()

68. # mcmc estimation
post <- sim_post_normal(f, d, sep_var = "dem_governor",
sd = 4.5,
n_sims = 10000,
n_burnin = 1000,
n_chains = 4)

69. calc_qi()

70. # compute quantities of interest
## dem_governor
X_pred_list <- set_at_median(f, d)
x <- c(0, 1)
X_pred_list$dem_governor <- x
qi <- calc_qi(post, X_pred_list, qi_name = "fd")

71. plot(qi, xlim = c(-1, 1),
xlab = "First Difference",
ylab = "Posterior Density",
main = "The Effect of Democratic Partisanship on
Opposing the Expansion")

72. ## st_percent_uninsured
X_pred_list <- set_at_median(f, d)
x <- seq(min(d$st_percent_uninsured),
         max(d$st_percent_uninsured),
         by = 0.1)
X_pred_list$st_percent_uninsured <- x
qi <- calc_qi(post, X_pred_list, qi_name = "pr")

73. plot(qi, x,
xlab = "Percent Uninsured (Std.)",
ylab = "Predicted Probability",
main = "The Probability of Opposition as the
Percent Uninsured (Std.) Varies")

74. 15 lines

75. Conclusion

76. The prior matters a lot,
so choose a good one.

77. The prior matters
in practice.

78. The prior matters
in theory.

79. The partial prior predictive distribution
simplifies the choice of prior.

80. Software makes choosing a prior,
estimating the model, and
interpreting the estimates easy.

81. What should you do?
1. Notice the problem and do something.
2. Recognize that the prior affects the inferences
and choose a good one.
3. Assess the robustness of your conclusions to a
range of prior distributions.

82. Questions?

83. Appendix

84. [Figure: posterior median and 90% HPD interval for the coefficient of symmetric nuclear dyads (x-axis −15 to 0) under five priors — Informative Normal(0, 4.5), Skeptical Normal(0, 2), Enthusiastic Normal(0, 8), Zorn's Default Jeffreys' Invariant Prior, and Gelman et al.'s Default Cauchy(0, 2.5) Prior.]

85. Pr(RR > 1)
Informative Normal(0, 4.5) Prior: 0.93
Skeptical Normal(0, 2) Prior: 0.86
Enthusiastic Normal(0, 8) Prior: 0.96
Zorn's Default Jeffreys' Prior: 0.79
Gelman et al.'s Default Cauchy(0, 2.5) Prior: 0.90

86. For
1. a monotonic likelihood p(y | β) decreasing in β_s,
2. a proper prior distribution p(β | σ), and
3. a large, negative β_s,
the posterior distribution of β_s is proportional to the prior distribution for β_s, so
that p(β_s | y) ∝ p(β_s | σ).

87. Theorem 1. For a monotonic likelihood p(y | β) increasing [decreasing] in β_s,
proper prior distribution p(β | σ), and large positive [negative] β_s, the posterior
distribution of β_s is proportional to the prior distribution for β_s, so that
p(β_s | y) ∝ p(β_s | σ).

88.–90. Proof. Due to separation, p(y | β) is monotonic increasing in β_s to a limit L, so
that lim_{β_s → ∞} p(y | β_s) = L. By Bayes' rule,

p(β | y) = p(y | β) p(β | σ) / ∫_{−∞}^{∞} p(y | β) p(β | σ) dβ
         = p(y | β) p(β | σ) / p(y | σ),

where the denominator p(y | σ) is constant with respect to β. Integrating out the other
parameters β_{−s} = ⟨β_cons, β_1, β_2, …, β_k⟩ to obtain the posterior distribution of β_s,

p(β_s | y) = ∫_{−∞}^{∞} p(y | β) p(β | σ) dβ_{−s} / p(y | σ),   (1)

and the prior distribution of β_s,

p(β_s | σ) = ∫_{−∞}^{∞} p(β | σ) dβ_{−s}.

Notice that p(β_s | y) ∝ p(β_s | σ) iff p(β_s | y) / p(β_s | σ) = k, where the constant
k ≠ 0. Thus, Theorem 1 implies that

lim_{β_s → ∞} p(β_s | y) / p(β_s | σ) = k.

Substituting in Equation 1,

lim_{β_s → ∞} [ ∫_{−∞}^{∞} p(y | β) p(β | σ) dβ_{−s} / p(y | σ) ] / p(β_s | σ) = k.

Multiplying both sides by p(y | σ), which is constant with respect to β_s,

lim_{β_s → ∞} ∫_{−∞}^{∞} p(y | β) p(β | σ) dβ_{−s} / p(β_s | σ) = k p(y | σ).

Setting ∫_{−∞}^{∞} p(y | β) p(β | σ) dβ_{−s} = p(y | β_s) p(β_s | σ),

lim_{β_s → ∞} p(y | β_s) p(β_s | σ) / p(β_s | σ) = k p(y | σ).

Canceling p(β_s | σ) in the numerator and denominator,

lim_{β_s → ∞} p(y | β_s) = k p(y | σ).