Carlisle Rainey
September 16, 2015

# Logit Models in Small Samples

Talk given on September 16, 2015, at Texas A&M University.

## Transcript

1. Logit Models in Small Samples

2. How likely are the Kuwaitis to form
an active resistance to the PGFK?
What factors make this resistance
more or less likely?

3. “A good understanding of when resistance is likely after […]
to fight in the first place, whether to accept compromise
settlements prior to a final military victory, and when it may
be worthwhile to encourage resistance among friendly
conquered populations”
–Weisiger (2014)

4. Linear Probability Model Estimates

| Variable | Standardized Coefficient Estimate | 90% Confidence Interval |
|---|---|---|
| Polity Score of Conqueror | -0.27* | [-0.55, 0.00] |
| log(Distance) | 0.47* | [0.21, 0.74] |
| Terrain | 0.20 | [-0.04, 0.45] |
| Soldiers per Territory | -0.13 | [-0.41, 0.15] |
| GDP per Capita | -0.13 | [-0.38, 0.12] |
| Constant | 0.40* | [0.30, 0.50] |
| N | 35 | |
| Events | 14 | |
| R² | 0.61 | |

5. What is the probability of a Kuwaiti resistance
in the aftermath of the Iraqi invasion?
-0.22

6. How many times more likely is a Kuwaiti
resistance if they have a coordinating leader?
-0.90

7. What We Need: a logit model

8. How many times more likely is a Kuwaiti
resistance if they have a coordinating leader?
281.9

9. 35 observations
“It is risky to use ML with samples smaller than 100,
while samples larger than 500 seem adequate.”
–J. Scott Long

10. The biostatistics literature recommends 10
events per explanatory variable.
2.3 EPEV

11. ML estimates are
severely biased
in small samples.

12. $$\Pr(y_i = 1) = \operatorname{logit}^{-1}\left(\beta_{cons} + 0.5 x_1 + \sum_{j=2}^{k} 0.2 x_j\right)$$
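
To make the simulation model on this slide concrete, here is a minimal Python sketch that draws one data set from it. The slide does not say how the covariates are distributed, so independent standard normal covariates are an assumption here, as are the seed and the function names. The values n = 50, k = 9, and an intercept of −1 are taken from the axis and panel labels of the bias figures.

```python
import math
import random


def inv_logit(eta):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate(n, k, beta_cons, rng):
    """Draw n observations from the slide's model with k covariates.

    Assumption: covariates are independent standard normals.
    """
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Linear predictor: beta_cons + 0.5*x_1 + 0.2*(x_2 + ... + x_k)
        eta = beta_cons + 0.5 * x[0] + 0.2 * sum(x[1:])
        y = 1 if rng.random() < inv_logit(eta) else 0
        data.append((x, y))
    return data


rng = random.Random(2015)  # arbitrary seed for reproducibility
sample = simulate(n=50, k=9, beta_cons=-1.0, rng=rng)
events = sum(y for _, y in sample)
print(f"{events} events in {len(sample)} observations")
```

Repeating this draw many times and refitting the model each time is the usual way to estimate the bias of an estimator by simulation.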

13. [Figure: Percent Bias of the ML Estimator. Panel: βcons = −1, k = 9. x-axis: Sample Size (50 to 200); y-axis: Percent Bias (25% to 125%).]

14. [Figure: Percent Bias of the ML Estimator. Panel columns: βcons = −1, βcons = −0.5, βcons = 0; panel rows: k = 3, k = 6, k = 9. x-axis: Sample Size (N), 50 to 200; y-axis: Percent Bias, 0% to 125%.]

15. What We Need: a logit model; a good estimator

16. $$L^*(\beta \mid y) = L(\beta \mid y)\,\overbrace{|I(\beta)|^{\frac{1}{2}}}^{\text{penalty}}$$
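
As a worked illustration of the penalized likelihood on this slide, the sketch below maximizes it for an intercept-only logit model, where the information matrix reduces to the scalar n·p·(1 − p). The intercept-only simplification, the function names, and the golden-section search are assumptions made for illustration and are not taken from the talk. The counts (35 observations, 14 events) are the ones reported on the earlier slides.

```python
import math


def penalized_loglik(beta, events, n):
    """log L*(beta | y) = log L(beta | y) + 0.5 * log|I(beta)| for an
    intercept-only logit model (an illustrative simplification)."""
    p = 1.0 / (1.0 + math.exp(-beta))
    loglik = events * math.log(p) + (n - events) * math.log(1.0 - p)
    information = n * p * (1.0 - p)  # scalar Fisher information
    return loglik + 0.5 * math.log(information)


def maximize(f, lo=-10.0, hi=10.0, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


n, events = 35, 14
beta_ml = math.log(events / (n - events))  # closed-form ML estimate
beta_pml = maximize(lambda b: penalized_loglik(b, events, n))
p_pml = 1.0 / (1.0 + math.exp(-beta_pml))

print(f"ML:  beta = {beta_ml:.4f}")
print(f"PML: beta = {beta_pml:.4f}, p = {p_pml:.4f}")
```

In this special case the penalized score equation can be solved by hand: the fitted probability is (events + 1/2) / (n + 1), so the penalty acts like adding half an event and half a non-event and pulls the estimate toward zero.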

17. [Figure: Percent Bias of ML and PML Estimators. Panel columns: βcons = −1, βcons = −0.5, βcons = 0; panel rows: k = 3, k = 6, k = 9. x-axis: Sample Size (50 to 200); y-axis: Percent Bias (0% to 120%). Legend (Method): ML, PML. Annotation at 30 observations: 120% and 7%.]

18. variance?

19. [Figure: Variance Inflation (%) of ML Relative to PML. Panel columns: βcons = −1, βcons = −0.5, βcons = 0; panel rows: k = 3, k = 6, k = 9. x-axis: Sample Size (50 to 200); y-axis: Variance Inflation (5% to 500%).]

20. $$\text{MSE}(\hat{\beta}) = E[(\hat{\beta} - \beta^{true})^2] = \text{Var}(\hat{\beta}) + [E(\hat{\beta}) - \beta^{true}]^2$$
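
The decomposition of mean-squared error into variance plus squared bias can be checked numerically. The sketch below does so for a handful of made-up estimates; the numbers and the function name are hypothetical and do not come from the talk's simulations.

```python
def mse_decomposition(estimates, truth):
    """Return (MSE, variance, bias) of a list of estimates of `truth`."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((b - mean) ** 2 for b in estimates) / n  # population variance
    bias = mean - truth
    mse = sum((b - truth) ** 2 for b in estimates) / n
    return mse, variance, bias


# Hypothetical estimates from repeated samples (illustrative values only)
estimates = [0.9, 1.4, 0.2, 2.1, 0.7, 1.6]
mse, variance, bias = mse_decomposition(estimates, truth=1.0)
print(mse, variance + bias ** 2)  # the two numbers agree up to rounding
```

Because both terms enter the MSE, an estimator that reduces bias and variance at the same time improves on both counts.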

21. [Figure: Mean-Squared Error Inflation (%) of ML Relative to PML. Panel columns: βcons = −1, βcons = −0.5, βcons = 0; panel rows: k = 3, k = 6, k = 9. x-axis: Sample Size (50 to 200); y-axis: Mean-Squared Error Inflation (1% to 1,000%).]

22. [Figure: The Relative Contribution of the Variance Compared to the Bias as the Sample Size Varies. x-axis: Sample Size (50 to 200); y-axis: Relative Contribution of Variance Compared to Bias (0 to 150). Legends: Number of Variables (3, 6, 9); Intercept (−1, −0.5, 0).]

23. [Figure: Logistic Regression Model Explaining Post-Conflict Guerrilla War. N = 35 (14 events). Rows: Intercept, Conqueror's Polity Score, log(Intercapital Distance), Terrain, Occupying Force Density, Per Capita GDP. x-axis: Logistic Regression Coefficients and 90% Confidence Intervals (Variables Standardized), −20 to 20. Legend (Method): ML Estimate, PML Estimate.]

24. [Figure: Out-of-Sample Prediction Scores. x-axis: Score Type (Brier Score, Log Score); y-axis: Score (0.0 to 0.6). Legend (Method): ML, PML.]
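
The slide names the two scores without defining them. The sketch below uses common definitions, which are an assumption about what the talk computed: the Brier score as the mean squared difference between predicted probability and outcome, and the log score as the negative mean log probability assigned to the observed outcome, so that lower is better for both. The predictions shown are hypothetical.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def log_score(probs, outcomes):
    """Negative mean log probability assigned to the observed outcome."""
    total = sum(math.log(p if y == 1 else 1.0 - p)
                for p, y in zip(probs, outcomes))
    return -total / len(outcomes)


# Hypothetical out-of-sample predictions and outcomes (illustrative only)
probs = [0.8, 0.3, 0.6, 0.1]
outcomes = [1, 0, 0, 0]
print(brier_score(probs, outcomes), log_score(probs, outcomes))
```

The log score penalizes confident wrong predictions much more heavily than the Brier score does, which matters when ML produces extreme fitted probabilities in small samples.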

25. How many times more likely is a Kuwaiti
resistance if they have a coordinating leader?
ML: 281.9
PML: 18.5

27. What is a
“small sample”?

28. $$\xi = \frac{1}{k} \min\left[\sum_{i=1}^{n} y_i, \sum_{i=1}^{n} (1 - y_i)\right]$$
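
The information measure ξ is simple to compute from the outcome vector. The sketch below is a direct transcription of the formula; the function name is hypothetical. The example uses the 14 events in 35 observations from the resistance data, with k = 6 chosen as an assumption because it reproduces the 2.3 events per explanatory variable quoted on the earlier slide.

```python
def xi(y, k):
    """(1/k) * min(number of events, number of non-events)."""
    events = sum(y)
    nonevents = len(y) - events
    return min(events, nonevents) / k


y = [1] * 14 + [0] * 21  # 14 events, 21 non-events
print(round(xi(y, k=6), 1))  # 2.3
```

Taking the minimum of events and non-events means that a sample with very many events is treated as just as uninformative as one with very few.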

29. [Figure: MSE-Inflation of ML Relative to PML for the Slope Coefficients as the Information Increases. Panel: Slope Coefficient. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0.5% to 50.0%). Legends: Sample Size (1000, 2000); True Coefficient (−1, 0, 1). Marked thresholds: ξ = 12 and ξ = 51.]

30. [Figure: MSE-Inflation of ML Relative to PML for the Intercept as the Information Increases. Panel: Intercept. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0% to 200%). Legends: Sample Size (1000, 2000); True Coefficient (−2, 0, 2). Marked thresholds: ξ = 33 and ξ = 96.]

31. Some Thresholds

| Acceptable Inaccuracy | Slope Coefficients | Intercept |
|---|---|---|
| Substantial | ξ < 12 | ξ < 33 |
| Noticeable | 12 ≤ ξ < 51 | 33 ≤ ξ < 96 |
| Negligible | ξ ≥ 51 | ξ ≥ 96 |
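
The thresholds above translate directly into a small lookup. This sketch only encodes the cutoffs shown on the slide; the function name and the lowercase labels are hypothetical.

```python
def inaccuracy(xi, intercept=False):
    """Classify the ML inaccuracy implied by xi using the slide's thresholds."""
    low, high = (33, 96) if intercept else (12, 51)
    if xi < low:
        return "substantial"
    if xi < high:
        return "noticeable"
    return "negligible"


# The resistance data have roughly 2.3 events per explanatory variable.
print(inaccuracy(2.3), inaccuracy(2.3, intercept=True))
```

The intercept needs more information than the slope coefficients before the difference between ML and PML becomes negligible, which is why its cutoffs are higher.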

34. PML in Short
• Always better in theory.
• Easy to implement.
• Makes a big difference in small samples.
• Makes a small, but noticeable, difference in much larger samples.

35. Generating a Random DGP

   1. Choose the number of covariates $k$ randomly from a uniform distribution from 3 to 12.
   2. Choose the sample size $n$ randomly from a uniform distribution from 200 to 3,000.
   3. Choose the intercept $\beta_{cons}$ randomly from a uniform distribution from -4 to 4.
   4. Choose the slope coefficients $\beta_1, \ldots, \beta_k$ randomly from a normal distribution with mean 0 and standard deviation 0.5.
   5. Choose a covariance matrix $\Sigma$ for the explanatory variables randomly using the method developed by Joe (2006) such that the variances along the diagonal range from 0.25 to 2.
   6. Choose the explanatory variables $x_1, x_2, \ldots, x_k$ randomly from a multivariate normal distribution with mean 0 and covariance matrix $\Sigma$.
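
The six steps for generating a random DGP can be sketched in plain Python as follows. Two parts are assumptions rather than the talk's procedure: the random correlation matrix is built by normalizing A·Aᵀ for a random Gaussian matrix A, which is a simple stand-in and not the method of Joe (2006), and the variances are drawn uniformly from 0.25 to 2 because the slide only gives their range. All function names and the seed are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def random_covariance(k, rng):
    """Random covariance matrix with variances in [0.25, 2].

    Stand-in for Joe (2006): normalizes A A' for a random Gaussian A.
    """
    cols = 2 * k  # more columns than rows keeps A A' well conditioned
    a = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(k)]
    s = [[sum(a[i][m] * a[j][m] for m in range(cols)) for j in range(k)]
         for i in range(k)]
    sd = [math.sqrt(rng.uniform(0.25, 2.0)) for _ in range(k)]  # assumed uniform
    return [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) * sd[i] * sd[j]
             for j in range(k)] for i in range(k)]


def random_dgp(rng):
    k = rng.randint(3, 12)                           # step 1
    n = rng.randint(200, 3000)                       # step 2
    beta_cons = rng.uniform(-4.0, 4.0)               # step 3
    beta = [rng.gauss(0.0, 0.5) for _ in range(k)]   # step 4
    chol = cholesky(random_covariance(k, rng))       # step 5
    X = []
    for _ in range(n):                               # step 6: x = chol @ z
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        X.append([sum(chol[i][m] * z[m] for m in range(i + 1))
                  for i in range(k)])
    return {"k": k, "n": n, "beta_cons": beta_cons, "beta": beta, "X": X}


dgp = random_dgp(random.Random(35))  # arbitrary seed
print(dgp["k"], dgp["n"], round(dgp["beta_cons"], 2))
```

Outcomes can then be drawn by applying the inverse logit to the linear predictor for each row of `X` and comparing the result with a uniform draw.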

36. [Figure: MSE-Inflation of ML Relative to PML for the Slope Coefficients as the Information Increases. Panels: Sample Size Between 200 and 500; Sample Size Between 501 and 1,000; Sample Size Between 1,001 and 1,500; Sample Size Between 1,501 and 2,000. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0.5% to 50.0%). Legends: Sample Size (1000, 2000); True Coefficient (−1, 0, 1). Marked thresholds in each panel: ξ = 12 and ξ = 51.]

37. [Figure: MSE-Inflation of ML Relative to PML for the Intercept as the Information Increases. Panels: Sample Size Between 200 and 500; Sample Size Between 501 and 1,000; Sample Size Between 1,001 and 1,500; Sample Size Between 1,501 and 2,000. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0% to 200%). Legends: Sample Size (1000, 2000); True Coefficient (−2, 0, 2). Marked thresholds in each panel: ξ = 33 and ξ = 96.]