
[For blog] Comparison of Estimation Methods in Causal Inference

Masa
May 25, 2022


Slide deck of figures and tables for the blog post
Blog: Comparison of Estimation Methods in Causal Inference
https://medium.com/p/16f5ac9ed122



Transcript

  1. Comparison of estimation methods in causal inference (with RCT benchmark)

    twitter @asas_mimi


  2. Well-known RCT dataset: LaLonde(1986)

    Dehejia, Rajeev and Sadek Wahba. (1999). Causal Effects in Non-Experimental Studies: Re-Evaluating the Evaluation of Training Programs.
    Journal of the American Statistical Association 94 (448): 1053-1062.
    LaLonde, Robert. (1986). Evaluating the Econometric Evaluations of Training Programs. American Economic Review 76: 604-620.
    The National Supported Work Demonstration (NSW): this experiment asks whether
    "vocational training" (counseling and short-term work experience) affects
    subsequent earnings. In the dataset, the treatment variable, vocational training
    (1 or 0), is denoted by treat, and the outcome variable, income in 1978, is denoted by re78.
    Data can be downloaded from the following website:
    https://users.nber.org/~rdehejia/data/


  3. Basic statistics : NSW

    This experiment was conducted as an RCT, but the table below shows that the
    covariates are not completely balanced.
    [Table columns: treated average, control average, |(treated avg - control avg) / std|]
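
    The balance measure in the last column can be sketched as follows. This is a minimal illustration on toy values, not the NSW data. The function name is hypothetical, and since the slide does not say which standard deviation is used, the full-sample standard deviation is an assumption.

```python
import numpy as np

def abs_standardized_diff(covariate, treat):
    """|mean(treated) - mean(control)| / std of the covariate.

    Assumption: std is taken over the full sample (ddof=1); the slide
    does not specify which standard deviation it uses.
    """
    covariate = np.asarray(covariate, dtype=float)
    treat = np.asarray(treat)
    diff = covariate[treat == 1].mean() - covariate[treat == 0].mean()
    return abs(diff) / covariate.std(ddof=1)

# Toy values for illustration only
age = [1.0, 2.0, 3.0, 4.0]
treat = [1, 1, 0, 0]
smd = abs_standardized_diff(age, treat)
print(round(smd, 4))
```

    Values close to 0 indicate that the covariate is balanced between the two groups.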


  4. RCT causal effect := 1676.3426 

    We will use a simple multiple regression model and consider the "true causal effect" to
    be 1676.3426.
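
    As a rough sketch of what such a regression looks like, the block below fits ordinary least squares on simulated data and reads the effect off the coefficient of treat. The data are synthetic (true effect set to 1500), the covariate age and all numeric values are assumptions for illustration, and the result is unrelated to the 1676.3426 reported above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the NSW columns (illustrative assumption)
age = rng.normal(25, 7, n)
treat = rng.integers(0, 2, n).astype(float)   # randomized assignment, as in an RCT
re78 = 5000 + 50 * age + 1500 * treat + rng.normal(0, 3000, n)

# Design matrix: intercept, treatment flag, covariate
design = np.column_stack([np.ones(n), treat, age])
coef, *_ = np.linalg.lstsq(design, re78, rcond=None)

effect = coef[1]   # coefficient on treat = estimated causal effect
print(f"estimated effect of treat: {effect:.1f}")
```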


  5. Validation dataset

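    Yasui, Shota (2020). 効果検証入門: 正しい比較のための因果推論/計量経済学の基礎
    [Introduction to Effect Verification: Causal Inference for Correct Comparison / Basics of Econometrics]. Gijutsu-Hyohron (技術評論社).
    A validation dataset is created by excluding the NSW control group and
    instead using non-experimental data (CPS: Current Population Survey) as the
    control group.
    [Diagram: NSW treated group + CPS control group (replacing the NSW control group) = new dataset]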


  6. Our Assumptions

    True ATT = 1676.3426
    In the treated group, we know the true effect from the RCT.
    However, there is no guarantee that the same effect would be obtained if the
    treatment were applied to the CPS data. In other words, ATT = ATU does not
    necessarily hold (i.e., we do not know the true ATE).
    All we can know with this validation data is the ATT (average treatment effect
    on the treated) := 1676.3426.

    Strongly ignorable treatment assignment
    Strongly ignorable treatment assignment can also be expressed in terms of
    conditional independence.
    If treatment assignment T is conditionally independent of Y(1), Y(0) given the
    confounding covariates X, the treatment assignment is said to be strongly ignorable.
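
    In symbols, the statements on this slide can be written as below. The notation follows the slide (T, X, Y(1), Y(0)); the decomposition of the ATE is a standard identity added here for illustration. Note that the usual textbook definition of strong ignorability also requires overlap, 0 < P(T=1 | X) < 1, which the slide does not state.

```latex
% Conditional independence of potential outcomes and treatment given X
\bigl(Y(1),\, Y(0)\bigr) \;\perp\!\!\!\perp\; T \mid X

% Effect on the treated and on the untreated
\mathrm{ATT} = \mathbb{E}\bigl[\,Y(1) - Y(0) \mid T = 1\,\bigr], \qquad
\mathrm{ATU} = \mathbb{E}\bigl[\,Y(1) - Y(0) \mid T = 0\,\bigr]

% The ATE mixes both, so knowing the ATT alone does not pin it down
\mathrm{ATE} = P(T = 1)\,\mathrm{ATT} + P(T = 0)\,\mathrm{ATU}
```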


  7. Our Approaches 

    Based on conditional treatment assignment:
    (1) Multiple regression
    (2) Propensity score approach (IPW)
    (3) Meta Learner
    ● S-Learner
    ● T-Learner
    ● X-Learner
    ● DomainAdaptation-Learner
    (4) Double/Debiased ML
    Based on conditional parallel-trend:
    (5) Doubly Robust DID
    (6) Double/Debiased DID
    Except for Doubly Robust DID, these were implemented without relying on a
    well-made package such as EconML, for the following reasons:
    - To calculate the standard error (using the bootstrap method)
    - For my own study
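
    A minimal sketch of a bootstrap standard error is shown below. It is not the author's code: the function names, the toy difference-in-means estimator, and the simulated data are assumptions for illustration. Any estimator that maps a dataset to a single number can be plugged in the same way.

```python
import numpy as np

def diff_in_means(data):
    """Toy estimator: column 0 is the treatment flag, column 1 the outcome."""
    t, y = data[:, 0], data[:, 1]
    return y[t == 1].mean() - y[t == 0].mean()

def bootstrap_se(estimator, data, n_boot=500, seed=0):
    """Standard error = std of the estimator over resampled datasets."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # draw rows with replacement
        estimates.append(estimator(data[idx]))
    return float(np.std(estimates, ddof=1))

# Simulated data: 500 treated, 500 control, unit-variance noise
rng = np.random.default_rng(0)
t = np.repeat([1.0, 0.0], 500)
y = 1.0 * t + rng.normal(0, 1, 1000)
data = np.column_stack([t, y])

se = bootstrap_se(diff_in_means, data)
print(f"bootstrap SE: {se:.3f}")
```

    For this toy setup the analytic standard error is about sqrt(1/500 + 1/500) ≈ 0.063, which gives a sanity check on the bootstrap value.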


  8. (1) Multiple regression

    Recently, propensity score-based methods have become popular, and some people
    assume that causal inference cannot be done with multiple regression models.
    ● Remember that the identification strategy is almost the same as in the
    propensity score-based approach.
    ● Whether it is easier to model the outcome directly or the assignment to the
    treatment group depends on the case.
    [Speech bubble: "Multiple regression? Nonsense. lol"]
    Although this approach is very simple, it is a mistake to assume that causal
    inference is impossible merely because multiple regression is used.


  9. (2) IPW for ATT

    The weight for IPW estimation of the ATT can be defined as follows. Only the control group
    is weighted by the propensity score.
    [Formula: definition of the weight w]
    [Figure: covariate balance improved after weighting.]
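
    The slide's formula is not preserved in the transcript. A common form of the ATT weight that matches the description (treated units unweighted, controls weighted by the propensity score) is w = T + (1 - T) * e(X) / (1 - e(X)), where e(X) is the propensity score. The sketch below assumes this form; the function names and the simulated data are hypothetical, and the true propensity score is used only because the data are simulated.

```python
import numpy as np

def att_ipw_weights(t, ps):
    """Treated units get weight 1; control units get ps / (1 - ps)."""
    t = np.asarray(t, dtype=float)
    ps = np.asarray(ps, dtype=float)
    return t + (1 - t) * ps / (1 - ps)

def att_ipw(t, y, ps):
    """Weighted difference in mean outcomes between treated and control."""
    w = att_ipw_weights(t, ps)
    treated_mean = np.average(y[t == 1], weights=w[t == 1])
    control_mean = np.average(y[t == 0], weights=w[t == 0])
    return treated_mean - control_mean

# Simulated data with a confounder x and a true ATT of 1.0
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(0, 1, n)
ps = 1 / (1 + np.exp(-x))                    # true propensity score
t = (rng.uniform(0, 1, n) < ps).astype(float)
y = 2.0 * x + 1.0 * t + rng.normal(0, 1, n)

att_hat = att_ipw(t, y, ps)
naive = y[t == 1].mean() - y[t == 0].mean()  # biased by confounding
print(f"naive: {naive:.2f}, IPW ATT: {att_hat:.2f}")
```

    In practice the propensity score has to be estimated, for example with a logistic regression or a gradient boosting classifier.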


  10. (3) Meta Learner : S & T -Learner

    EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/metalearners.html )
    We use the following process to estimate the ATT:
    (1) First, calculate the CATE (conditional ATE) for each
    individual using these algorithms.
    (2) Then calculate the ATT by averaging the estimated CATE over
    the T=1 records.
    T-Learner: Create separate models for the treatment and
    control groups, and then take the difference
    between the output values of the two models
    for each record.
    S-Learner: The simplest method! Fit a single model that includes T as a
    feature, and adopt the difference between its predictions with T=1 and T=0
    as the CATE.
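
    The two learners and the CATE-to-ATT averaging step can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the base learner is ordinary least squares instead of LGBM, the helper names are hypothetical, and the data are simulated.

```python
import numpy as np

def fit_ols(features, target):
    """Least-squares coefficients for target ~ features."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

def with_intercept(x):
    return np.column_stack([np.ones(len(x)), x])

def t_learner_cate(x, t, y):
    """Two models: one per group; CATE = difference of their predictions."""
    coef1 = fit_ols(with_intercept(x[t == 1]), y[t == 1])
    coef0 = fit_ols(with_intercept(x[t == 0]), y[t == 0])
    return with_intercept(x) @ coef1 - with_intercept(x) @ coef0

def s_learner_cate(x, t, y):
    """One model with T as a feature; CATE = prediction(T=1) - prediction(T=0)."""
    def design(x_, t_):
        # A linear model needs explicit X*T interactions to express a
        # non-constant CATE; a tree-based learner would not.
        return np.column_stack([np.ones(len(x_)), x_, t_, x_ * t_[:, None]])
    coef = fit_ols(design(x, t), y)
    ones, zeros = np.ones(len(x)), np.zeros(len(x))
    return design(x, ones) @ coef - design(x, zeros) @ coef

# Simulated data with a heterogeneous effect
rng = np.random.default_rng(0)
n = 3000
x = rng.normal(0, 1, (n, 2))
ps = 1 / (1 + np.exp(-x[:, 0]))
t = (rng.uniform(0, 1, n) < ps).astype(float)
true_cate = 1.5 + 0.5 * x[:, 0]
y = x @ np.array([1.0, 2.0]) + true_cate * t + rng.normal(0, 1, n)

# Step (2): average the estimated CATE over the T=1 records
att_t = t_learner_cate(x, t, y)[t == 1].mean()
att_s = s_learner_cate(x, t, y)[t == 1].mean()
att_true = true_cate[t == 1].mean()
print(f"T-learner: {att_t:.2f}, S-learner: {att_s:.2f}, truth: {att_true:.2f}")
```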


  11. (3) Meta Learner : X-Learner


    EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/metalearners.html )
    We use the following process to estimate the ATT:
    (1) First, calculate the CATE (conditional ATE) for each
    individual using these algorithms.
    (2) Then calculate the ATT by averaging the estimated CATE over
    the T=1 records.
    X-Learner steps:
    ● Estimate the outcome functions
    ● Compute imputed treatment effects
    ● Estimate the CATE in 2 ways
    ● Average the two estimates, weighted by g(x) (g(x): propensity score)
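
    The X-Learner steps can be sketched as below. This is a minimal illustration, not the author's code: ordinary least squares stands in for the LGBM base learners, helper names are hypothetical, the data are simulated, and the true propensity score is passed in only because it is known in the simulation.

```python
import numpy as np

def fit_ols(features, target):
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

def with_intercept(x):
    return np.column_stack([np.ones(len(x)), x])

def x_learner_cate(x, t, y, propensity):
    f = with_intercept(x)
    f1, f0 = f[t == 1], f[t == 0]
    y1, y0 = y[t == 1], y[t == 0]

    # Estimate the outcome function of each group
    mu1 = fit_ols(f1, y1)
    mu0 = fit_ols(f0, y0)

    # Compute imputed treatment effects
    d1 = y1 - f1 @ mu0          # treated: observed minus predicted control outcome
    d0 = f0 @ mu1 - y0          # control: predicted treated outcome minus observed

    # Estimate the CATE in 2 ways
    tau1 = fit_ols(f1, d1)
    tau0 = fit_ols(f0, d0)

    # Average the two estimates, weighted by the propensity score g(x)
    return propensity * (f @ tau0) + (1 - propensity) * (f @ tau1)

# Simulated data with a heterogeneous effect
rng = np.random.default_rng(0)
n = 3000
x = rng.normal(0, 1, (n, 2))
ps = 1 / (1 + np.exp(-x[:, 0]))
t = (rng.uniform(0, 1, n) < ps).astype(float)
true_cate = 1.5 + 0.5 * x[:, 0]
y = x @ np.array([1.0, 2.0]) + true_cate * t + rng.normal(0, 1, n)

att_x = x_learner_cate(x, t, y, ps)[t == 1].mean()
att_true = true_cate[t == 1].mean()
print(f"X-learner ATT: {att_x:.2f}, truth: {att_true:.2f}")
```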


  12. (3) Meta Learner : DA-Learner

    EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/metalearners.html )
    We use the following process to estimate the ATT:
    (1) First, calculate the CATE (conditional ATE) for each
    individual using these algorithms.
    (2) Then calculate the ATT by averaging the estimated CATE over
    the T=1 records.
    DA-Learner steps:
    ● Estimate the outcome functions using propensity score weighting
    ● Compute imputed treatment effects
    ● Estimate the CATE


  13. (4) Double/Debiased Machine Learning for non-linear CATE

    EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/dml.html )
    We use the following process to estimate the ATT:
    (1) First, calculate the CATE (conditional ATE) for each
    individual using these algorithms.
    (2) Then calculate the ATT by averaging the estimated CATE over
    the T=1 records.
    DML: τ is a function of X, and the aim is to compute the CATE.
    [Formula: loss function for τ(X), annotated with "sample weight" and "supervised label"]
    Estimating τ(X) can be viewed as weighted supervised learning.
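
    The "weighted supervised learning" view rests on the identity (Ỹ - τ(X)·T̃)² = T̃² · (Ỹ/T̃ - τ(X))², where Ỹ and T̃ are the residuals of Y and T after removing their predictions from X. The label is Ỹ/T̃ and the sample weight is T̃². The sketch below illustrates this with cross-fitting. It is not the author's code: linear least squares replaces LGBM for every model, all function names are hypothetical, and the data are simulated.

```python
import numpy as np

def fit_linear(features, target, sample_weight=None):
    """(Weighted) least squares; weights are applied by sqrt-weight scaling."""
    if sample_weight is not None:
        root_w = np.sqrt(sample_weight)
        features = features * root_w[:, None]
        target = target * root_w
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

def nuisance_features(x):
    return np.column_stack([np.ones_like(x), x, x ** 2])

def dml_cate(x, t, y, n_folds=2, seed=0):
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds
    y_res = np.empty(len(y))
    t_res = np.empty(len(y))
    # Cross-fitting: residuals for each fold come from models fit on the other folds
    for k in range(n_folds):
        train, test = fold != k, fold == k
        f_train, f_test = nuisance_features(x[train]), nuisance_features(x[test])
        y_res[test] = y[test] - f_test @ fit_linear(f_train, y[train])
        t_res[test] = t[test] - f_test @ fit_linear(f_train, t[train])
    # Final stage as weighted supervised learning
    label = y_res / t_res          # supervised label
    weight = t_res ** 2            # sample weight
    tau_features = np.column_stack([np.ones_like(x), x])
    coef = fit_linear(tau_features, label, sample_weight=weight)
    return tau_features @ coef     # estimated CATE for every record

# Simulated data: true CATE is 1 + x
rng = np.random.default_rng(42)
n = 4000
x = rng.uniform(0, 1, n)
t = (rng.uniform(0, 1, n) < 0.3 + 0.4 * x).astype(float)
true_tau = 1.0 + x
y = 1.0 + 2.0 * x + true_tau * t + rng.normal(0, 1, n)

cate_hat = dml_cate(x, t, y)
att_hat = cate_hat[t == 1].mean()      # average the CATE over the T=1 records
att_true = true_tau[t == 1].mean()
print(f"DML ATT: {att_hat:.2f}, truth: {att_true:.2f}")
```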


  14. Assumptions for DID models

    Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.
    Assumption: conditional parallel-trend
    [Formula: stated in terms of potential outcomes, i.e. the counterfactual outcomes if no intervention is received]
    [Figure: trend plots for the treatment group and the control group. The parallel-trend assumption is violated
    in the raw data, but holds after conditioning on X (trend plot | X = 〇〇).]
    Assumption: the support of the propensity score (ps) of the treated is a
    subset of the support for the untreated.
    [Figure: ps distributions of the two groups. Where they do not overlap, the groups cannot be compared;
    the common support is a subset of the untreated support, and there the groups are comparable.]
    This is the same constraint placed on ATT estimation in
    other propensity score methods.
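
    In symbols, the two assumptions can be written roughly as below. The notation is an assumption made for this note and may differ from Chang (2020): D is the treatment-group indicator, and Y_0(0), Y_1(0) are the potential outcomes without intervention in the pre- and post-treatment periods.

```latex
% Conditional parallel trend: given X, the untreated outcome would have
% evolved in the same way in both groups
\mathbb{E}\bigl[\,Y_{1}(0) - Y_{0}(0) \mid X,\, D = 1\,\bigr]
  = \mathbb{E}\bigl[\,Y_{1}(0) - Y_{0}(0) \mid X,\, D = 0\,\bigr]

% Overlap for the ATT: every covariate value seen among the treated
% also occurs among the untreated
P(D = 1 \mid X) < 1 \quad \text{almost surely}, \qquad P(D = 1) > 0
```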


  15. (5) Doubly Robust DID

    For this model only, the R package was used as is.
    Please note that this is not a fair comparison, since we used the default
    model without any modification.

    https://psantanna.com/DRDID/index.html


  16. (6) Double/Debiased DID

    Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.
    [Formula annotations:]
    ● Outcome model: supervised learning with label = Diff. (the before-after difference
    in the outcome), learning with the control group only
    ● Propensity score
    ● Cross fitting: separates samples for "fitting" and "prediction", as in
    Chernozhukov et al. (2018)
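
    The ingredients above (an outcome-difference model trained on the control group, a propensity score, and cross fitting) can be combined as in the sketch below. It is written in the spirit of Chang (2020) rather than as a faithful reproduction. The score used is (D - g(X)) / (P(D=1) · (1 - g(X))) · (ΔY - ℓ(X)), where ℓ(X) predicts ΔY among controls. As a simplifying assumption, X is a single discrete covariate so that both nuisance functions are plain cell means instead of LGBM models; all names and data are hypothetical.

```python
import numpy as np

def dml_did_att(x, d, delta_y, n_folds=2, seed=0):
    """Cross-fitted DID estimate of the ATT with a discrete covariate x."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(d)) % n_folds
    levels = np.unique(x)
    fold_estimates = []
    for k in range(n_folds):
        train, test = fold != k, fold == k
        g = np.empty(test.sum())     # propensity score g(X)
        ell = np.empty(test.sum())   # predicted outcome difference for controls
        for lv in levels:
            in_cell = train & (x == lv)
            g[x[test] == lv] = d[in_cell].mean()
            # label = Diff., learned with the control group only
            ell[x[test] == lv] = delta_y[in_cell & (d == 0)].mean()
        p = d[test].mean()           # share of treated units
        score = (d[test] - g) / (p * (1 - g)) * (delta_y[test] - ell)
        fold_estimates.append(score.mean())
    return float(np.mean(fold_estimates))

# Simulated data: the trend depends on x, so the unconditional
# parallel-trend assumption fails; the true ATT is 2.0
rng = np.random.default_rng(1)
n = 4000
x = rng.integers(0, 3, n)
d = (rng.uniform(0, 1, n) < 0.2 + 0.2 * x).astype(float)
delta_y = 1.0 * x + 2.0 * d + rng.normal(0, 1, n)

att_hat = dml_did_att(x, d, delta_y)
naive = delta_y[d == 1].mean() - delta_y[d == 0].mean()   # plain DID
print(f"plain DID: {naive:.2f}, cross-fitted DID: {att_hat:.2f}")
```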


  17. Estimated Results 

    ● Point estimates and absolute errors relative to the RCT result
    ● The default hyperparameters of LGBM are used for DML, Meta Learner, and
    DMLDID. DRDID uses the R package defaults.
    If we only consider point estimates,
    DML is closest to the RCT result.
    In the present case (tabular data with low-dimensional features), a
    simple model is sufficient to adjust for the bias,
    without resorting to the more complex models that
    follow it.


  18. Standard error…

    ● Considering the standard error, it is clear that DML was not suitable in this case.
    ● This may be due to underfitting of the ML model, caused by the reduction in
    data volume from cross-fitting.


  19. Conclusions

    ● In this experiment, the identification strategy is almost the same across methods.
    ● The only difference is the "estimation method".
    ● The most important thing in practice is to agree on the identification strategy,
    which should be discussed with the stakeholders, making full use of domain
    knowledge.
    ● The choice of estimation method itself should be decided flexibly depending on
    the characteristics of the data.
    ○ In the present case, a nonlinear ML-based approach proved to be overkill.
    ■ Of course, if the given data are expected to have high-dimensional features or nonlinear
    relationships with the treatment or the outcome, the ML-based approach is likely to have strengths.
    ○ It is better to try several estimation methods and, if their results differ,
    to dig deeper into the causes.


  20. Inappropriate advice vs. constructive advice

    Inappropriate advice:
    "Multiple regression model? That's too naive!
    If you're going to do causal inference, it has to be a XX
    model (propensity model or DML or …), right?
    So reject!!"
    Constructive advice:
    "Multiple regression model, right? I see.
    But your data is p>>n, so OLS may not estimate
    it well. Maybe you should try DML or something?"


  21. Extra Analysis

    ● Since DML and Meta Learner calculate the CATE, it is useful to visualize it using SHAP.
    ● We can also check flexible non-linear relationships, rather than only the interaction terms
    available in linear models. If you are interested, please refer to my notebook.
    [Figure annotation: younger people with higher annual income in 1975 are more likely to
    benefit from the treatment.]


  22. Thank you


    Note: the Python code for this article is stored in this repository:
    https://github.com/MasaAsami/D2ML


  23. References:

    [1] Yasui, S. (author), HOXO-M Inc. (supervisor) (2020). 効果検証入門: 正しい比較のための因果推論/計量経済学の基礎
    [Introduction to Effect Verification: Causal Inference for Correct Comparison / Basics of Econometrics]. Gijutsu-Hyohron (技術評論社).

    [2] Microsoft Research. EconML User Guide.

    [3] Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using
    random forests. Journal of the American Statistical Association, 113(523), 1228–1242.

    [4] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J.
    (2018). Double/debiased machine learning for treatment and structural parameters. The
    Econometrics Journal, 21(1), C1–C68.

    [5] Sant'Anna, P. H., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal
    of Econometrics, 219(1), 101–122.

    [6] R package DRDID: Doubly Robust Difference-in-Differences. https://psantanna.com/DRDID/index.html

    [7] Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The
    Econometrics Journal, 23(2), 177–191.
