Logit Models in Small Samples

Carlisle Rainey
September 16, 2015

Talk given on September 16, 2015, at Texas A&M University.

Transcript

  1. Logit Models in
    Small Samples
    Kelly McCaskey and Carlisle Rainey

  3. How likely are the Kuwaitis to form
    an active resistance to the PGFK?
    What factors make this resistance
    more or less likely?

  5. “A good understanding of when resistance is likely after
    conquest should influence leaders’ decisions about whether
    to fight in the first place, whether to accept compromise
    settlements prior to a final military victory, and when it may
    be worthwhile to encourage resistance among friendly
    conquered populations”
    –Weisiger (2014)

  6. Linear Probability Model Estimates

    Variable                     Standardized Coefficient Estimate   90% Confidence Interval
    Coordinating Leader           0.44*                              [0.19, 0.69]
    Polity Score of Conqueror    -0.27*                              [-0.55, 0.00]
    log(Distance)                 0.47*                              [0.21, 0.74]
    Terrain                       0.20                               [-0.04, 0.45]
    Soldiers per Territory       -0.13                               [-0.41, 0.15]
    GDP per Capita               -0.13                               [-0.38, 0.12]
    Constant                      0.40*                              [0.30, 0.50]

    N = 35, Events = 14, R² = 0.61

  7. What is the probability of a Kuwaiti resistance
    in the aftermath of the Iraqi invasion?
    -0.22

  8. How many times more likely is a Kuwaiti
    resistance if they have a coordinating leader?
    -0.90

  9. What We Need
    a logit model

  10. How many times more likely is a Kuwaiti
    resistance if they have a coordinating leader?
    281.9

  11. 35 observations
    “It is risky to use ML with samples smaller than 100,
    while samples larger than 500 seem adequate.”
    –J. Scott Long

  12. The biostatistics literature recommends 10
    events per explanatory variable.
    2.3 EPEV
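
The 2.3 figure can be reproduced from the counts reported earlier in the deck. The sketch below assumes that "explanatory variables" means the six non-constant regressors in the linear probability model table and that the 14 events are the cases coded as resistance; the function name is illustrative only.

```python
def events_per_explanatory_variable(n_events, n_variables):
    """Events per explanatory variable (EPEV): events divided by regressors."""
    return n_events / n_variables

# 14 events and 6 explanatory variables, taken from the estimates table
# (assumption: the constant is not counted as an explanatory variable).
epev = events_per_explanatory_variable(n_events=14, n_variables=6)
print(round(epev, 1))  # 2.3, well below the recommended 10
```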

  13. ML estimates are
    severely biased
    in small samples.

  14. $$\Pr(y_i = 1) = \operatorname{logit}^{-1}\left(\beta_{cons} + 0.5 x_1 + \sum_{j=2}^{k} 0.2 x_j\right)$$
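
A minimal sketch of drawing one data set from this simulation model may make the setup concrete. The slide does not say how the covariates are distributed, so independent standard normal draws are an assumption here, as are the function names and the seed.

```python
import math
import random

def inv_logit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n, k, b_cons, rng):
    """Draw n observations with k covariates from the logit model on the slide."""
    data = []
    for _ in range(n):
        # ASSUMPTION: covariates are independent standard normal draws.
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Coefficient 0.5 on x1 and 0.2 on each of x2, ..., xk.
        eta = b_cons + 0.5 * x[0] + sum(0.2 * xj for xj in x[1:])
        y = 1 if rng.random() < inv_logit(eta) else 0
        data.append((x, y))
    return data

rng = random.Random(2015)  # arbitrary seed for reproducibility
sample = simulate(n=50, k=9, b_cons=-1.0, rng=rng)
n_events = sum(y for _, y in sample)
print(len(sample), n_events)
```

Repeating this draw many times and refitting the model each time is what produces bias curves of the kind the talk reports.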

  15. [Figure: Percent Bias of the ML Estimator. Panel: βcons = −1, k = 9. x-axis: Sample Size (50 to 200); y-axis: Percent Bias (25% to 125%).]

  16. [Figure: Percent Bias of the ML Estimator. Panels by βcons (−1, −0.5, 0) and k (3, 6, 9). x-axis: Sample Size (N), 50 to 200; y-axis: Percent Bias (0% to 125%).]

  17. What We Need
    a logit model
    a good estimator

  18. $$L^{*}(\beta \mid y) = L(\beta \mid y)\,\overbrace{|I(\beta)|^{1/2}}^{\text{penalty}}$$
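
To see what the penalty does, the following sketch evaluates the log of the penalized likelihood for a logit model with an intercept and one slope. It assumes I(β) is the Fisher information X'WX with weights p(1 − p), which the slide does not spell out; the toy data and function names are hypothetical.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, x, y):
    """Ordinary logit log-likelihood, log L(beta | y)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def log_det_information(b0, b1, x):
    """log |I(beta)|, with I(beta) = X'WX and W = diag(p_i (1 - p_i))."""
    s00 = s01 = s11 = 0.0
    for xi in x:
        p = inv_logit(b0 + b1 * xi)
        w = p * (1 - p)
        s00 += w
        s01 += w * xi
        s11 += w * xi * xi
    return math.log(s00 * s11 - s01 * s01)  # determinant of the 2x2 matrix

def penalized_log_likelihood(b0, b1, x, y):
    """log L*(beta | y) = log L(beta | y) + 0.5 * log |I(beta)|."""
    return log_likelihood(b0, b1, x, y) + 0.5 * log_det_information(b0, b1, x)

# Hypothetical, perfectly separated toy data: y = 1 exactly when x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]
for slope in (1.0, 10.0):
    print(slope, log_likelihood(0.0, slope, x, y),
          penalized_log_likelihood(0.0, slope, x, y))
```

On these separated data the ordinary log-likelihood keeps rising as the slope grows, while the penalized version is lower at a slope of 10 than at a slope of 1, so the penalty pulls the estimate back toward zero.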

  19. [Figure: Percent Bias of ML and PML Estimators. Panels by βcons (−1, −0.5, 0) and k (3, 6, 9). x-axis: Sample Size (50 to 200); y-axis: Percent Bias (0% to 120%); Method: ML, PML. Annotation: 30 observations, about 8 events; 120% and 7%.]

  20. But what about the
    variance?

  21. [Figure: Variance Inflation (%) of ML Relative to PML. Panels by βcons (−1, −0.5, 0) and k (3, 6, 9). x-axis: Sample Size (50 to 200); y-axis: Variance Inflation (5% to 500%).]

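  22. $$\begin{aligned}\operatorname{MSE}(\hat{\beta}) &= E\left[(\hat{\beta} - \beta^{true})^2\right] \\ &= \operatorname{Var}(\hat{\beta}) + \left[E(\hat{\beta}) - \beta^{true}\right]^2\end{aligned}$$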
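
The decomposition of mean-squared error into variance plus squared bias can be checked numerically. The estimates below are made-up numbers chosen only for illustration, not results from the talk.

```python
# Hypothetical estimates of a coefficient whose true value is 1.0.
estimates = [0.8, 1.4, 1.1, 1.5]
beta_true = 1.0

m = len(estimates)
mean_est = sum(estimates) / m

# MSE: average squared distance from the truth.
mse = sum((b - beta_true) ** 2 for b in estimates) / m
# Variance: average squared distance from the mean estimate.
variance = sum((b - mean_est) ** 2 for b in estimates) / m
# Bias: mean estimate minus the truth.
bias = mean_est - beta_true

print(mse, variance + bias ** 2)  # the two quantities agree up to rounding
```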

  23. [Figure: Mean-Squared Error Inflation (%) of ML Relative to PML. Panels by βcons (−1, −0.5, 0) and k (3, 6, 9). x-axis: Sample Size (50 to 200); y-axis: Mean-Squared Error Inflation (1% to 1,000%).]

  24. [Figure: The Relative Contribution of the Variance Compared to the Bias as the Sample Size Varies. x-axis: Sample Size (50 to 200); y-axis: Relative Contribution of Variance Compared to Bias (0 to 150); legends: Number of Variables (3, 6, 9), Intercept (−1, −0.5, 0).]

  25. [Figure: Logistic Regression Model Explaining Post-Conflict Guerrilla War, N = 35 (14 events). Logistic regression coefficients and 90% confidence intervals (variables standardized), axis from −20 to 20, for Intercept, Conqueror's Polity Score, log(Intercapital Distance), Terrain, Occupying Force Density, Per Capita GDP, and Coordinating Leader; Method: ML Estimate, PML Estimate.]

  26. [Figure: Out-of-Sample Prediction Scores. x-axis: Score Type (Brier Score, Log Score); y-axis: Score (0.0 to 0.6); Method: ML, PML.]

  27. How many times more likely is a Kuwaiti
    resistance if they have a coordinating leader?
    ML: 281.9
    PML: 18.5

  28. Practical Advice

  29. What is a
    “small sample”?

  30. $$\xi = \frac{1}{k}\min\left[\sum_{i=1}^{n} y_i,\ \sum_{i=1}^{n} (1 - y_i)\right]$$
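
As a rough sketch, ξ can be computed directly from a binary outcome vector and the number of explanatory variables. The function name and the example vector are illustrative; the counts (35 observations, 14 events, 6 explanatory variables) are those reported earlier in the deck for the resistance data, with the constant assumed not to count toward k.

```python
def information(y, k):
    """xi: the smaller of the event and non-event counts, divided by k."""
    events = sum(y)
    non_events = len(y) - events
    return min(events, non_events) / k

# 14 events and 21 non-events, 6 explanatory variables.
y = [1] * 14 + [0] * 21
xi = information(y, k=6)
print(round(xi, 2))  # 2.33
```

Because ξ uses the rarer of the two outcomes, it treats a sample with very few non-events as just as uninformative as one with very few events.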

  31. [Figure: MSE-Inflation of ML Relative to PML for the Slope Coefficients as the Information Increases. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0.5% to 50%); legends: Sample Size, True Coefficient; reference lines at ξ = 12 and ξ = 51.]

  32. [Figure: MSE-Inflation of ML Relative to PML for the Intercept as the Information Increases. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0% to 200%); legends: Sample Size, True Coefficient; reference lines at ξ = 33 and ξ = 96.]

  33. Some Thresholds

    Acceptable Inaccuracy   Slope Coefficients   Intercept
    Substantial             ξ < 12               ξ < 33
    Noticeable              12 ≤ ξ < 51          33 ≤ ξ < 96
    Negligible              ξ ≥ 51               ξ ≥ 96
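
The thresholds translate into a simple lookup. This sketch maps a value of ξ to the matching row label of the table; the function name, argument names, and lowercase labels are illustrative choices rather than anything from the talk.

```python
def threshold_row(xi, coefficient="slope"):
    """Return the table row that a given xi falls into."""
    cutoffs = {"slope": (12, 51), "intercept": (33, 96)}
    if coefficient not in cutoffs:
        raise ValueError("coefficient must be 'slope' or 'intercept'")
    lower, upper = cutoffs[coefficient]
    if xi < lower:
        return "substantial"
    if xi < upper:
        return "noticeable"
    return "negligible"

# xi = 14 / 6 for the resistance data (14 events, 6 explanatory variables).
print(threshold_row(14 / 6))                 # substantial
print(threshold_row(14 / 6, "intercept"))    # substantial
print(threshold_row(60, "slope"))            # negligible
```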

  36. PML in Short
    • Always better in theory.
    • Easy to implement.
    • Makes a big difference in small samples.
    • Makes a small, but noticeable, difference in much
    larger samples.

  37. Generating a Random DGP
    1. Choose the number of covariates k randomly from a uniform distribution
    from 3 to 12.
    2. Choose the sample size n randomly from a uniform distribution from 200
    to 3,000.
    3. Choose the intercept βcons randomly from a uniform distribution from -4
    to 4.
    4. Choose the slope coefficients β1, ..., βk randomly from a normal
    distribution with mean 0 and standard deviation 0.5.
    5. Choose a covariance matrix Σ for the explanatory variables randomly
    using the method developed by Joe (2006) such that the variances along
    the diagonal range from 0.25 to 2.
    6. Choose the explanatory variables x1, x2, ..., xk randomly from a
    multivariate normal distribution with mean 0 and covariance matrix Σ.
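
The six steps can be sketched in a few lines of standard-library Python. One important caveat: this sketch does not implement the method of Joe (2006) for the covariance matrix. It swaps in a simpler construction (a Gram matrix of random unit vectors, rescaled so the variances are uniform between 0.25 and 2), purely to keep the example self-contained. Treating the uniform draws for k and n as integer-valued and inclusive is also an assumption, and all names are hypothetical.

```python
import math
import random

def random_dgp(rng):
    """Draw one data-generating process and its covariates (steps 1 to 6)."""
    k = rng.randint(3, 12)                             # step 1 (inclusive integers)
    n = rng.randint(200, 3000)                         # step 2 (inclusive integers)
    b_cons = rng.uniform(-4.0, 4.0)                    # step 3
    slopes = [rng.gauss(0.0, 0.5) for _ in range(k)]   # step 4

    # Step 5. ASSUMPTION: NOT Joe's (2006) method. Rows of A are random unit
    # vectors, so A A' is a valid correlation matrix with ones on the diagonal.
    rows = []
    for _ in range(k):
        row = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row])
    sds = [math.sqrt(rng.uniform(0.25, 2.0)) for _ in range(k)]  # variances in [0.25, 2]
    sigma = [[sds[i] * sds[j] * sum(rows[i][m] * rows[j][m] for m in range(k))
              for j in range(k)] for i in range(k)]

    # Step 6. x = D A z with z standard normal has covariance D A A' D = sigma.
    X = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        X.append([sds[i] * sum(rows[i][m] * z[m] for m in range(k))
                  for i in range(k)])
    return {"k": k, "n": n, "b_cons": b_cons, "slopes": slopes,
            "sigma": sigma, "X": X}

dgp = random_dgp(random.Random(37))  # arbitrary seed
print(dgp["k"], dgp["n"], round(dgp["b_cons"], 2))
```

Outcomes for each random DGP would then be drawn from the logit model using the intercept, slopes, and covariates returned here.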

  38. [Figure: MSE-Inflation of ML Relative to PML for the Slope Coefficients as the Information Increases. Panels: Sample Size Between 200 and 500; Between 501 and 1,000; Between 1,001 and 1,500; Between 1,501 and 2,000. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0.5% to 50%); legends: Sample Size, True Coefficient; reference lines at ξ = 12 and ξ = 51.]

  39. [Figure: MSE-Inflation of ML Relative to PML for the Intercept as the Information Increases. Panels: Sample Size Between 200 and 500; Between 501 and 1,000; Between 1,001 and 1,500; Between 1,501 and 2,000. x-axis: ξ (2 to 500); y-axis: MSE-Inflation (0% to 200%); legends: Sample Size, True Coefficient; reference lines at ξ = 33 and ξ = 96.]
