[For blog] Comparison of Estimation Methods in Causal Inference

Masa
May 25, 2022

Slides with figure and table material for the blog post.
Blog: Comparison of Estimation Methods in Causal Inference
https://medium.com/p/16f5ac9ed122

Transcript

  1. Comparison of estimation methods in causal inference (with RCT benchmark)
    twitter @asas_mimi
  2. Well-known RCT dataset: LaLonde (1986)
    The National Supported Work Demonstration (NSW): this experiment asks whether "vocational training" (counseling and short-term work experience) affects subsequent earnings. In the dataset, the treatment variable (vocational training, 1 or 0) is denoted by treat, and the outcome variable (income in 1978) is denoted by re78.
    Data can be downloaded from the following website: https://users.nber.org/~rdehejia/data/
    Dehejia, Rajeev and Sadek Wahba. (1999). Causal Effects in Non-Experimental Studies: Re-Evaluating the Evaluation of Training Programs. Journal of the American Statistical Association 94 (448): 1053-1062.
    LaLonde, Robert. (1986). Evaluating the Econometric Evaluations of Training Programs. American Economic Review 76: 604-620.
  3. Basic statistics: NSW
    This experiment was conducted as an RCT, but the table below shows that the covariates are not completely balanced.
    [Table: for each covariate, the treated average, the control average, and | (treated avg - control avg) / std |]
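The balance measure in the table header, | (treated avg - control avg) / std |, can be sketched as below. This is a minimal illustration on made-up numbers, not the NSW data; the helper name `standardized_mean_difference` is hypothetical, and using the standard deviation of the whole column as the denominator is an assumption (the slide does not say which standard deviation is used).

```python
import numpy as np

def standardized_mean_difference(x, treat):
    """Absolute difference in group means, scaled by the column's std.

    Assumption: the denominator is the sample std of the full column.
    """
    x = np.asarray(x, dtype=float)
    treat = np.asarray(treat).astype(bool)
    diff = x[treat].mean() - x[~treat].mean()
    return abs(diff) / x.std(ddof=1)

# Toy covariate (hypothetical values, not taken from the NSW data).
age = np.array([20, 25, 30, 35, 22, 28, 33, 40])
treat = np.array([1, 1, 1, 1, 0, 0, 0, 0])

smd = standardized_mean_difference(age, treat)
print(f"standardized mean difference for age: {smd:.3f}")
```

A value near 0 indicates that the covariate is balanced between the two groups.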
  4. RCT causal effect := 1676.3426
    We fit a simple multiple regression model to the experimental data and treat its estimate, 1676.3426, as the "true causal effect".
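A regression estimate of this kind can be sketched as follows. The data here are simulated, with one covariate and a known effect of 1500, so the numbers are illustrative only. The variable names `treat` and `re78` follow the dataset description; the covariate `age` and the data-generating process are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated stand-in for experimental data (NOT the NSW data).
age = rng.uniform(18, 55, n)
treat = rng.integers(0, 2, n)            # randomized assignment
re78 = 1000.0 + 1500.0 * treat + 50.0 * age + rng.normal(0, 100, n)

# OLS of the outcome on an intercept, the treatment, and the covariate.
X = np.column_stack([np.ones(n), treat, age])
beta, *_ = np.linalg.lstsq(X, re78, rcond=None)

effect = beta[1]                          # coefficient on treat
print(f"estimated treatment effect: {effect:.1f}")
```

Because assignment is randomized, the coefficient on `treat` estimates the causal effect; the covariates only absorb residual imbalance and reduce variance.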
  5. Validation dataset
    A validation dataset is created by dropping the NSW control group and using non-experimental data (CPS: Current Population Survey) as the control group instead.
    [Figure: NSW treated + CPS control = new dataset]
    Yasui, Shota (2020). 効果検証入門: 正しい比較のための因果推論/計量経済学の基礎 [Introduction to Effect Verification: Causal Inference for Valid Comparisons / Fundamentals of Econometrics]. Gijutsu-Hyoron-sha.

  6. Our Assumptions
    • True ATT = 1676.3426
    • Strongly ignorable treatment assignment
    For the treated group, we know the true effect from the RCT. However, there is no guarantee that the same effect would be obtained if the treatment were applied to the CPS data. In other words, ATT = ATU does not necessarily hold (i.e., we do not know the true ATE). All we can know with this validation data is the ATT (average treatment effect on the treated), which is 1676.3426.
    Strongly ignorable treatment assignment can also be expressed in terms of conditional independence: if the treatment assignment T is conditionally independent of Y(1), Y(0) given the confounding covariates X, the treatment assignment is said to be strongly ignorable.
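The quantities on this slide can be written out as follows. The symbols T, Y(1), Y(0) and X follow the slide; the ATU line, the decomposition of the ATE, and the overlap condition are the standard textbook definitions, added here for completeness rather than taken from the deck.

```latex
\begin{aligned}
\mathrm{ATT} &= E[\,Y(1) - Y(0) \mid T = 1\,] \\
\mathrm{ATU} &= E[\,Y(1) - Y(0) \mid T = 0\,] \\
\mathrm{ATE} &= P(T = 1)\,\mathrm{ATT} + P(T = 0)\,\mathrm{ATU} \\
\{Y(1),\, Y(0)\} &\perp\!\!\!\perp T \mid X, \qquad 0 < P(T = 1 \mid X) < 1
\end{aligned}
```

The third line shows why the ATE is unknown here: the validation data pin down the ATT only, and the ATU for the CPS population is not identified by the experiment.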
  7. Our Approaches
    Under conditional treatment assignment:
    (1) Multiple regression
    (2) Propensity score approach (IPW)
    (3) Meta Learner
        • S-Learner
        • T-Learner
        • X-Learner
        • DomainAdaptation-Learner
    (4) Double/Debiased ML
    Under conditional parallel-trend:
    (5) Doubly Robust DID
    (6) Double/Debiased DID
    Except for Doubly Robust DID, these were implemented without relying on a convenient package such as EconML. The reasons are as follows:
    - To calculate the standard error (using the bootstrap method)
    - For my own study
  8. (1) Multiple regression
    Recently, propensity score-based methods have become popular, and some people assume that causal inference cannot be done with multiple regression models ("Multiple regression? Nonsense. lol").
    • Remember that the identification strategy is almost the same as for the propensity score-based approach.
    • Whether it is easier to model the outcome directly or the assignment to the treatment group depends on the case.
    Although this approach is very simple, dismissing a causal analysis merely because it uses multiple regression is a very bad attitude.
  9. (2) IPW for ATT
    The weight w for IPW estimation of the ATT can be defined as follows. Only the control group is weighted by the propensity score.
    [Equation: definition of the weight w]
    [Figure: covariate balance improved after weighting]
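The slide's formula did not survive in this transcript. A common form of ATT weighting, consistent with "only the control group is weighted", gives treated units weight 1 and control units weight e(x) / (1 - e(x)), where e(x) is the propensity score. The sketch below assumes that form and uses simulated data with a known propensity score; the function name `ipw_att` is hypothetical.

```python
import numpy as np

def ipw_att(y, t, ps):
    """ATT via inverse probability weighting (normalized weights).

    Treated units keep weight 1; controls get ps / (1 - ps).
    Assumption: this standard weighting matches the deck's definition.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t).astype(bool)
    ps = np.asarray(ps, dtype=float)
    w = ps[~t] / (1.0 - ps[~t])
    control_mean = np.sum(w * y[~t]) / np.sum(w)
    return y[t].mean() - control_mean

# Simulated data with one binary confounder and a true effect of 2.
rng = np.random.default_rng(0)
n = 20000
x = rng.integers(0, 2, n)
ps = np.where(x == 1, 0.6, 0.2)          # true propensity score
t = rng.uniform(size=n) < ps
y = 2.0 * t + 3.0 * x + rng.normal(0, 1, n)

att = ipw_att(y, t, ps)
naive = y[t].mean() - y[~t].mean()        # biased by the confounder
print(f"IPW ATT: {att:.2f}, naive difference: {naive:.2f}")
```

In practice the propensity score is estimated (for example with logistic regression) rather than known, which adds estimation error not shown here.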
  10. (3) Meta Learner: S- & T-Learner
    We use the following process to estimate the ATT:
    (1) Calculate the CATE (conditional ATE) for each individual using these algorithms.
    (2) Calculate the ATT by averaging the estimated CATE over the T=1 records.
    • S-Learner: the simplest method! Fit a single model, then take the difference between its predictions with T=1 and with T=0 as the CATE.
    • T-Learner: create separate models for the treatment and control groups, then take the difference between the outputs of the two models for each record.
    Source: EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/metalearners.html )
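The two learners and the averaging step can be sketched as below. The deck reports using LGBM as the base model; ordinary least squares is swapped in here to keep the example dependency-free, and the simulated data and the helper names `fit_ols` and `design` are assumptions of this sketch.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (stand-in for any base learner)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def design(x, t=None):
    """Feature matrix; with t given, add t and the t*x interaction."""
    cols = [np.ones_like(x), x]
    if t is not None:
        cols += [t, t * x]
    return np.column_stack(cols)

# Simulated data: true CATE is 1 + 3x.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n).astype(float)
y = 1 + 2 * x + t * (1 + 3 * x) + rng.normal(0, 0.5, n)
treated = t == 1

# S-Learner: one model with T as a feature; predict with T=1 and T=0.
beta_s = fit_ols(design(x, t), y)
cate_s = design(x, np.ones(n)) @ beta_s - design(x, np.zeros(n)) @ beta_s

# T-Learner: separate models for treated and control records.
beta_1 = fit_ols(design(x[treated]), y[treated])
beta_0 = fit_ols(design(x[~treated]), y[~treated])
cate_t = design(x) @ beta_1 - design(x) @ beta_0

# ATT: average the estimated CATE over the T=1 records.
att_s = cate_s[treated].mean()
att_t = cate_t[treated].mean()
print(f"S-Learner ATT: {att_s:.2f}, T-Learner ATT: {att_t:.2f}")
```

With a fully interacted linear base model the two learners coincide; with a flexible learner such as gradient boosting they generally differ.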
  11. (3) Meta Learner: X-Learner
    Steps:
    (a) Estimate the outcome functions.
    (b) Compute the imputed treatment effects.
    (c) Estimate the CATE in 2 ways.
    (d) Average the estimates (g(x): propensity score).
    The ATT is then obtained in the same way as for the other meta learners: calculate the CATE (conditional ATE) for each individual, then average it over the T=1 records.
    Source: EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/metalearners.html )
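The X-Learner steps can be sketched as follows, under the same caveats as any toy example: ordinary least squares stands in for the base learners, the data are simulated, and the propensity score g(x) is taken to be the constant treated share because assignment in this simulation is random. The final combination g(x) * tau_0(x) + (1 - g(x)) * tau_1(x) follows the form given in the EconML user guide.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (stand-in for any base learner)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def design(x):
    return np.column_stack([np.ones_like(x), x])

# Simulated data: true CATE is 1 + 3x.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n)
y = 1 + 2 * x + t * (1 + 3 * x) + rng.normal(0, 0.5, n)
treated = t == 1

# Step 1: estimate the outcome functions for each group.
mu1 = fit_ols(design(x[treated]), y[treated])
mu0 = fit_ols(design(x[~treated]), y[~treated])

# Step 2: compute the imputed treatment effects.
d1 = y[treated] - design(x[treated]) @ mu0      # treated: observed - predicted Y(0)
d0 = design(x[~treated]) @ mu1 - y[~treated]    # control: predicted Y(1) - observed

# Step 3: estimate the CATE in two ways.
tau1 = fit_ols(design(x[treated]), d1)
tau0 = fit_ols(design(x[~treated]), d0)

# Step 4: average the two estimates, weighted by the propensity score.
g = np.full(n, treated.mean())                  # assumption: constant g(x)
cate = g * (design(x) @ tau0) + (1 - g) * (design(x) @ tau1)

att = cate[treated].mean()
print(f"X-Learner ATT: {att:.2f}")
```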
  12. (3) Meta Learner: DA-Learner
    Steps:
    (a) Estimate the outcome functions using propensity score weighting.
    (b) Compute the imputed treatment effects.
    (c) Estimate the CATE.
    The ATT is then obtained in the same way as for the other meta learners: calculate the CATE (conditional ATE) for each individual, then average it over the T=1 records.
    Source: EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/metalearners.html )
  13. (4) Double/Debiased Machine Learning for non-linear CATE
    In this DML variant, τ is a function of X, and the aim is to estimate the CATE. Estimating τ(X) can be viewed as weighted supervised learning, with a supervised label and a sample weight.
    [Equation: DML objective, annotated with "sample weight" and "supervised label"]
    The ATT is then obtained as before: calculate the CATE (conditional ATE) for each individual, then average it over the T=1 records.
    Source: EconML, "EconML User Guide" ( https://econml.azurewebsites.net/spec/estimation/dml.html )
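The "weighted supervised learning" view can be written out as below. The residual notation is an assumption of this sketch rather than something recoverable from the slide: \tilde{Y} = Y - E[Y | X] and \tilde{T} = T - E[T | X] are the outcome and treatment after the nuisance predictions are subtracted.

```latex
\begin{aligned}
\hat{\tau}
  &= \arg\min_{\tau} \sum_i \bigl(\tilde{Y}_i - \tau(X_i)\,\tilde{T}_i\bigr)^2 \\
  &= \arg\min_{\tau} \sum_i
     \underbrace{\tilde{T}_i^{\,2}}_{\text{sample weight}}
     \Bigl(\underbrace{\tilde{Y}_i / \tilde{T}_i}_{\text{supervised label}} - \tau(X_i)\Bigr)^2
\end{aligned}
```

The second line follows by factoring \tilde{T}_i^2 out of each squared term, so any regressor that accepts sample weights can be used to fit \tau(X).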
  14. Assumptions for DID models
    • Conditional parallel-trend: conditional on X, the potential outcomes (the counterfactual outcomes if no intervention is received) of the treatment group and the control group follow parallel trends.
      [Figure: trend plots for the treatment group and control group; the parallel trend is violated unconditionally but holds after conditioning on X (trend plot | X = x)]
    • Common support: the support of the propensity score (ps) of the treated group is a subset of the support for the untreated. This is the same constraint placed on ATT estimation in other propensity score methods.
      [Figure: ps distributions; regions marked "Not overlap!!" and "Comparable!!"]
    Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.
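In symbols, the two assumptions can be sketched as follows. The notation is an assumption of this sketch: Y_t(0) is the untreated potential outcome in period t (t = 0 before, t = 1 after), D is the treatment-group indicator, and X are the covariates.

```latex
\begin{aligned}
&E[\,Y_1(0) - Y_0(0) \mid X,\, D = 1\,] = E[\,Y_1(0) - Y_0(0) \mid X,\, D = 0\,]
  && \text{(conditional parallel trend)} \\
&P(D = 1) > 0, \qquad P(D = 1 \mid X) < 1 \ \text{almost surely}
  && \text{(support condition)}
\end{aligned}
```

The support condition is one-sided: every treated unit must have comparable untreated units, but not the reverse, which is what makes it an ATT-type requirement.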
  15. (5) Doubly Robust DID
    For this model only, the R package was used as is.
    Please note that this is not a fair comparison, since we used the default model without any modification.
    https://psantanna.com/DRDID/index.html
  16. (6) Double/Debiased DID
    • Outcome model: supervised learning with label = Diff. (the difference in the outcome between periods), trained on the control group only.
    • Propensity score model.
    • Cross-fitting separates the samples used for "fitting" and "prediction", as in Chernozhukov (2018).
    Source: Chang (2020).
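The three ingredients (an outcome-difference model trained on controls, a propensity score model, and cross-fitting) can be combined as in the sketch below. Several things here are assumptions rather than the author's implementation: the score (D - g(X)) / (p (1 - g(X))) * (ΔY - ℓ(X)) is a doubly robust form that should be checked against Chang (2020) before use, linear and logistic models replace the LGBM models the deck mentions, p is taken as the overall treated share, and the data are simulated with a true ATT of 2.

```python
import numpy as np

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_logit(X, d, n_iter=25):
    """Logistic regression by Newton's method (no regularization)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def dml_did_att(X, d, dy, n_folds=2, seed=0):
    """Cross-fitted doubly robust DID estimate of the ATT (sketch)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(d)), n_folds)
    p = d.mean()                                   # assumption: overall treated share
    fold_estimates = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        ctrl = train[d[train] == 0]
        # Outcome-difference model: fitted on control units only.
        ell = X[test] @ fit_ols(X[ctrl], dy[ctrl])
        # Propensity score model: fitted on the training folds.
        g = 1.0 / (1.0 + np.exp(-X[test] @ fit_logit(X[train], d[train])))
        score = (d[test] - g) / (p * (1.0 - g)) * (dy[test] - ell)
        fold_estimates.append(score.mean())
    return float(np.mean(fold_estimates))

# Simulated panel: dy is the post-minus-pre outcome difference.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-0.5 * x))).astype(float)
dy = 1 + x + 2 * d + rng.normal(0, 1, n)

att = dml_did_att(X, d, dy)
print(f"cross-fitted DID ATT: {att:.2f}")
```

Each observation's nuisance predictions come from models that never saw that observation, which is the point of cross-fitting.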
  17. Estimated Results
    • Point estimates and absolute error relative to the RCT result.
    • The default hyperparameters of LGBM are used for DML, Meta Learner, and DMLDID. DRDID uses the R package defaults.
    [Table: point estimate and absolute error for each method]
    If we only consider point estimates, DML is closest to the RCT result.
    In the present case (tabular data with low-dimensional features), such simple models are sufficient to adjust for the bias, without resorting to the more complex models listed after them.
  18. Standard error…
    • Considering the standard error, it is clear that DML was not suitable in this case.
    • This may be due to underfitting of the ML models, caused by the reduction in data volume that cross-fitting entails.
    [Figure: estimates with standard errors; DML highlighted]
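The deck states that standard errors were computed with the bootstrap method. A generic nonparametric bootstrap can be sketched as below; the helper names `bootstrap_se` and `diff_in_means`, the number of replications, and the simulated data are assumptions of this sketch, and any of the estimators above could be plugged in as `estimator`.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=500, seed=0):
    """Standard error of `estimator` via resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)        # one bootstrap sample of row indices
        estimates[b] = estimator(data[idx])
    return estimates.std(ddof=1)

def diff_in_means(data):
    """Toy estimator: column 0 is the outcome, column 1 the treatment."""
    y, t = data[:, 0], data[:, 1]
    return y[t == 1].mean() - y[t == 0].mean()

# Simulated data with a true effect of 2.
rng = np.random.default_rng(1)
t = rng.integers(0, 2, 1000)
y = 2.0 * t + rng.normal(0, 1, 1000)
data = np.column_stack([y, t])

se = bootstrap_se(diff_in_means, data)
print(f"bootstrap standard error: {se:.3f}")
```

For estimators that involve model fitting, the whole pipeline, including nuisance models and cross-fitting, is refit inside each replication.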
  19. Conclusions
    • In this experiment, the identification strategy is almost the same across methods.
    • The only difference is the "estimation method".
    • The most important thing in practice is to agree on the identification strategy, which should be discussed with the stakeholders, making full use of domain knowledge.
    • The choice of estimation method itself should be made flexibly, depending on the characteristics of the data.
      ◦ In the present case, a nonlinear ML-based approach proved to be overkill.
        ▪ Of course, if the given data are expected to have high-dimensional features, or nonlinear relationships with the treatment or the outcome, the ML-based approach is likely to have strengths.
      ◦ It is better to try several estimation methods and, if they disagree, to dig deeper into the causes.
  20. Inappropriate advice: "Multiple regression model? That's too naive! If you're going to do causal inference, it has to be a XX model (propensity model or DML or …), right? So reject!!"
    Constructive advice: "Multiple regression model, right? I see. But your data is p>>n, so maybe OLS doesn't estimate it well. Maybe you should try DML or something?"
  21. Extra Analysis
    • Since DML and Meta Learner calculate the CATE, it is useful to visualize it using SHAP.
    • We can also check flexible non-linear relationships, rather than relying on interaction terms in linear models.
    If you are interested, please refer to my notebook.
    Younger age groups with higher annual incomes in 1975 are more likely to benefit from the treatment.
  22. Thank you
    ※ The Python code for this article is stored in this repository.
    https://github.com/MasaAsami/D2ML
  23. References:
    [1] Yasui, Shota (author), Hoxo-m Inc. (supervision) (2020). 効果検証入門: 正しい比較のための因果推論/計量経済学の基礎 [Introduction to Effect Verification: Causal Inference for Valid Comparisons / Fundamentals of Econometrics]. Gijutsu-Hyoron-sha.
    [2] Microsoft Research. EconML User Guide.
    [3] Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.
    [4] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters.
    [5] Sant'Anna, P. H., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101-122.
    [6] R pkg. Doubly Robust Difference-in-Differences.
    [7] Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.