[For blog] Comparison of Estimation Methods in Causal Inference

Comparison of estimation methods in causal inference (with RCT benchmark)
Twitter @asas_mimi

Well-known RCT dataset: LaLonde (1986)
The National Supported Work Demonstration (NSW): the question in this experiment is whether vocational training (counseling and short-term work experience) affects subsequent earnings. In the dataset, the treatment variable, vocational training (1 or 0), is denoted by treat, and the outcome variable, income in 1978, is denoted by re78. The data can be downloaded from https://users.nber.org/~rdehejia/data/
Sources: Dehejia, R., & Wahba, S. (1999). Causal Effects in Non-Experimental Studies: Re-Evaluating the Evaluation of Training Programs. Journal of the American Statistical Association, 94(448), 1053-1062. LaLonde, R. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review, 76, 604-620.

Basic statistics: NSW
This experiment was conducted as an RCT, but the table below shows that the covariates are not perfectly balanced.
[Table: for each covariate, treated average, control average, and |(treated avg - control avg) / std|]
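The balance statistic in the table can be computed per covariate as below. This is a minimal sketch: the slide does not say which standard deviation is in the denominator, so the pooled standard deviation used here is an assumption, and the function name is hypothetical.

```python
import numpy as np

def abs_std_mean_diff(x_treated, x_control):
    """|(treated avg - control avg) / std| for a single covariate.

    Assumption: "std" is the pooled standard deviation
    sqrt((var_treated + var_control) / 2); other definitions exist.
    """
    x_t = np.asarray(x_treated, dtype=float)
    x_c = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return abs(x_t.mean() - x_c.mean()) / pooled_sd

# Toy numbers, not the NSW covariates
print(abs_std_mean_diff([1, 2, 3], [2, 3, 4]))  # 1.0
```

Values close to 0 indicate good balance; the same function can be applied after weighting or matching to check whether balance improved.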

RCT causal effect := 1676.3426
We fit a simple multiple regression model to the NSW experimental data and treat the estimated effect of treat, 1676.3426, as the "true causal effect".
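As a minimal sketch of how such a number is obtained, the function below returns the OLS coefficient on the treatment dummy in y ~ 1 + treat + X. The function name and the synthetic data (true effect 1.5) are illustrative assumptions; with the NSW data, y would be re78, t would be treat, and X the remaining covariates.

```python
import numpy as np

def regression_effect(X, t, y):
    """OLS coefficient on the treatment dummy in y ~ 1 + t + X."""
    design = np.column_stack([np.ones(len(y)), t, X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # beta[0] is the intercept, beta[1] the treatment effect

# Synthetic illustration (NOT the NSW data): true effect = 1.5
rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
y = X @ np.array([1.0, -2.0, 0.5]) + 1.5 * t + rng.normal(size=n)
print(regression_effect(X, t, y))  # close to 1.5
```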

Validation dataset
A validation dataset is created by dropping the NSW control group and using non-experimental data from the CPS (Current Population Survey) as the control group instead.
[Figure: NSW treated group + CPS control group → new dataset]
Source: Yasui, S. (2020). 『効果検証入門：正しい比較のための因果推論／計量経済学の基礎』 (Introduction to Effect Verification: Causal Inference and Econometric Foundations for Correct Comparisons). Gijutsu-Hyoron-sha.
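The construction can be sketched with NumPy as follows. The function name and the toy arrays are hypothetical; in practice the inputs would be the NSW and CPS files, with re78 as the outcome.

```python
import numpy as np

def build_validation_set(X_nsw, t_nsw, y_nsw, X_cps, y_cps):
    """Stack NSW treated units on top of CPS respondents (all treat = 0)."""
    treated = np.asarray(t_nsw) == 1          # the NSW control group is dropped
    X = np.vstack([X_nsw[treated], X_cps])
    t = np.concatenate([np.ones(treated.sum(), dtype=int),
                        np.zeros(len(X_cps), dtype=int)])
    y = np.concatenate([y_nsw[treated], y_cps])
    return X, t, y

# Toy arrays standing in for the real covariates and re78
X_nsw = np.array([[25, 10], [30, 12], [22, 9], [35, 11]])
t_nsw = np.array([1, 0, 1, 0])
y_nsw = np.array([5000.0, 0.0, 8000.0, 3000.0])
X_cps = np.array([[40, 12], [28, 14], [33, 8]])
y_cps = np.array([20000.0, 15000.0, 12000.0])
X, t, y = build_validation_set(X_nsw, t_nsw, y_nsw, X_cps, y_cps)
print(t)  # [1 1 0 0 0]
```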

Our Assumptions: True ATT = 1676.3426; strongly ignorable treatment assignment
For the treated group, the RCT tells us the true effect. However, there is no guarantee that the same effect would be obtained if the treatment were applied to the people in the CPS data. In other words, ATT = ATU (average treatment effect on the untreated) does not necessarily hold, so we do not know the true ATE. All this validation data can tell us is the ATT (average treatment effect on the treated, := 1676.3426).
Strongly ignorable treatment assignment can be expressed in terms of conditional independence: if the treatment assignment T is conditionally independent of the potential outcomes Y(1), Y(0) given the confounding covariates X, i.e. (Y(1), Y(0)) ⊥ T | X, and overlap holds (0 < P(T = 1 | X) < 1), the treatment assignment is said to be strongly ignorable.
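In standard potential-outcome notation, written out here for reference, the estimands and the conditional independence condition are as follows. The second line shows why knowing the ATT alone does not determine the ATE unless ATU = ATT.

```latex
\begin{aligned}
\mathrm{ATT} &= E[\,Y(1) - Y(0) \mid T = 1\,], \qquad
\mathrm{ATU} = E[\,Y(1) - Y(0) \mid T = 0\,] \\
\mathrm{ATE} &= E[\,Y(1) - Y(0)\,] = \pi\,\mathrm{ATT} + (1 - \pi)\,\mathrm{ATU},
\qquad \pi = P(T = 1) \\
&\{Y(1), Y(0)\} \perp\!\!\!\perp T \mid X
\end{aligned}
```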

Our Approaches
Identification assumptions: conditional treatment assignment for (1) to (4), and conditional parallel trends for (5) and (6).
(1) Multiple regression
(2) Propensity score approach (IPW)
(3) Meta-Learners
 • S-Learner
 • T-Learner
 • X-Learner
 • Domain Adaptation Learner
(4) Double/Debiased ML
(5) Doubly Robust DID
(6) Double/Debiased DID
Except for Doubly Robust DID, these methods were implemented by hand rather than with a ready-made package such as EconML, for two reasons:
- to compute standard errors with the bootstrap method
- for my own study
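The bootstrap standard error can be sketched as follows: resample rows with replacement, re-run the estimator, and take the standard deviation of the estimates. This assumes a plain nonparametric row bootstrap (the exact resampling scheme used in the deck is not stated), and `diff_in_means` is a hypothetical placeholder; any function with the same (X, t, y) signature could be passed instead.

```python
import numpy as np

def diff_in_means(X, t, y):
    """Placeholder estimator, used here only to exercise the bootstrap."""
    return y[t == 1].mean() - y[t == 0].mean()

def bootstrap_se(estimator, X, t, y, n_boot=300, seed=0):
    """Resample rows with replacement, re-estimate, and return the
    standard deviation of the bootstrap estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        estimates[b] = estimator(X[idx], t[idx], y[idx])
    return estimates.std(ddof=1)

# Synthetic data for illustration
rng = np.random.default_rng(1)
n = 2_000
X = rng.normal(size=(n, 2))
t = rng.integers(0, 2, size=n)
y = X[:, 0] + 1.5 * t + rng.normal(size=n)
print(bootstrap_se(diff_in_means, X, t, y))
```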

(1) Multiple regression
Propensity score-based methods have become popular recently, and some people assume that causal inference cannot be done with multiple regression models.
• Remember that the identification strategy is almost the same as in the propensity score-based approach.
• Whether it is easier to model the outcome directly or the assignment to the treatment group depends on the case.
A typical reaction: "Multiple regression? Nonsense, lol." This approach is very simple, but dismissing an analysis as non-causal merely because it uses multiple regression is a very bad attitude.

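(2) IPW for ATT
The IPW weights for estimating the ATT can be defined as follows; only the control group is weighted by the propensity score e(X) = P(T = 1 | X):
w_i = 1 if T_i = 1, and w_i = e(X_i) / (1 - e(X_i)) if T_i = 0
[Figure: covariate balance before and after weighting; balance improved]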
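A minimal NumPy sketch of this estimator, assuming the propensity scores have already been estimated (for example with a logistic regression) and using synthetic data whose true propensity score is known. The control weights are normalized by their sum, which is one common choice and not necessarily the one used in the original notebook; the function name is hypothetical.

```python
import numpy as np

def ipw_att(y, t, ps):
    """IPW estimate of the ATT.

    Treated units keep weight 1; each control unit gets weight
    ps / (1 - ps). The normalized weighted control mean stands in
    for E[Y(0) | T = 1].
    """
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    treated = t == 1
    w0 = ps[~treated] / (1 - ps[~treated])
    y0_reweighted = np.sum(w0 * y[~treated]) / np.sum(w0)
    return y[treated].mean() - y0_reweighted

# Synthetic data with a known propensity score and a true ATT of 1.5
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))                  # true propensity score
t = (rng.random(n) < ps).astype(int)
y = 2 * x + 1.5 * t + rng.normal(size=n)
print(ipw_att(y, t, ps))  # close to 1.5
```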

(3) Meta-Learner: S- & T-Learner
Source: EconML User Guide (https://econml.azurewebsites.net/spec/estimation/metalearners.html)
We estimate the ATT as follows: (1) compute the CATE (conditional average treatment effect) for each individual using these algorithms, then (2) average the estimated CATE over the T=1 records.
• S-Learner (the simplest method!): fit a single model with T as a feature, and take the difference between its predictions at T=1 and T=0 as the CATE.
• T-Learner: fit separate models for the treatment and control groups, and take the difference between the two models' predictions for each record as the CATE.
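The two learners can be sketched as follows. This is an illustration under stated assumptions: ordinary least squares stands in for the LightGBM models used in the deck so that only NumPy is needed, the function names are hypothetical, and the data are synthetic with a constant true effect of 1.5.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in base learner (the deck used LightGBM): OLS with intercept.
    Returns a prediction function."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def s_learner_att(X, t, y, fit=fit_ols):
    model = fit(np.column_stack([X, t]), y)          # one model, T is a feature
    X1 = X[t == 1]
    ones, zeros = np.ones(len(X1)), np.zeros(len(X1))
    cate = model(np.column_stack([X1, ones])) - model(np.column_stack([X1, zeros]))
    return cate.mean()                               # average CATE over T = 1 rows

def t_learner_att(X, t, y, fit=fit_ols):
    mu1 = fit(X[t == 1], y[t == 1])                  # model for the treated
    mu0 = fit(X[t == 0], y[t == 0])                  # model for the controls
    X1 = X[t == 1]
    return (mu1(X1) - mu0(X1)).mean()

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 2))
t = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = X @ np.array([2.0, -1.0]) + 1.5 * t + rng.normal(size=n)
print(s_learner_att(X, t, y), t_learner_att(X, t, y))  # both close to 1.5
```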

(3) Meta-Learner: X-Learner
Source: EconML User Guide (https://econml.azurewebsites.net/spec/estimation/metalearners.html)
As with the S- and T-Learners, the ATT is the average of the estimated CATE over the T=1 records. The X-Learner proceeds in four steps:
Step 1: Estimate the outcome function for each group.
Step 2: Compute imputed treatment effects.
Step 3: Estimate the CATE in two ways (one model on the treated, one on the controls).
Step 4: Average the two estimates, weighting by the propensity score g(x).
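A sketch of these four steps, again with OLS standing in for LightGBM, hypothetical function names, and synthetic data whose true propensity score g(x) is known. Step 4 follows the weighting τ(x) = g(x) τ0(x) + (1 - g(x)) τ1(x) described in the EconML User Guide.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in base learner (OLS with intercept); returns a predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def x_learner_att(X, t, y, ps, fit=fit_ols):
    X1, y1 = X[t == 1], y[t == 1]
    X0, y0 = X[t == 0], y[t == 0]
    mu1, mu0 = fit(X1, y1), fit(X0, y0)        # step 1: outcome functions
    d1 = y1 - mu0(X1)                          # step 2: imputed effects (treated)
    d0 = mu1(X0) - y0                          #         imputed effects (controls)
    tau1, tau0 = fit(X1, d1), fit(X0, d0)      # step 3: CATE estimated two ways
    g = ps[t == 1]                             # step 4: weight by g(x)
    cate = g * tau0(X1) + (1 - g) * tau1(X1)
    return cate.mean()                         # ATT = mean CATE over T = 1 rows

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 2))
ps = 1 / (1 + np.exp(-X[:, 0]))                # true propensity score
t = (rng.random(n) < ps).astype(int)
y = X @ np.array([2.0, -1.0]) + 1.5 * t + rng.normal(size=n)
print(x_learner_att(X, t, y, ps))  # close to 1.5
```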

(3) Meta-Learner: DA-Learner (Domain Adaptation Learner)
Source: EconML User Guide (https://econml.azurewebsites.net/spec/estimation/metalearners.html)
As with the S- and T-Learners, the ATT is the average of the estimated CATE over the T=1 records. The steps are:
Step 1: Estimate the outcome functions using propensity score weighting.
Step 2: Compute imputed treatment effects.
Step 3: Estimate the CATE.
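A sketch of the DA-Learner under the same assumptions (OLS instead of LightGBM, hypothetical names, synthetic data with a known propensity score). The outcome-model weights g/(1 - g) for controls and (1 - g)/g for the treated, and the single final CATE model fitted on all imputed effects, follow the EconML User Guide's description as understood here and should be checked against it.

```python
import numpy as np

def fit_wls(X, y, w):
    """Weighted least squares with intercept; returns a predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def da_learner_att(X, t, y, ps):
    X1, y1, g1 = X[t == 1], y[t == 1], ps[t == 1]
    X0, y0, g0 = X[t == 0], y[t == 0], ps[t == 0]
    # Step 1: propensity-weighted outcome models (weights assumed, see lead-in)
    mu0 = fit_wls(X0, y0, g0 / (1 - g0))
    mu1 = fit_wls(X1, y1, (1 - g1) / g1)
    # Step 2: imputed treatment effects
    d1 = y1 - mu0(X1)
    d0 = mu1(X0) - y0
    # Step 3: one CATE model fitted on all imputed effects
    tau = fit_wls(np.vstack([X1, X0]), np.concatenate([d1, d0]), np.ones(len(y)))
    return tau(X1).mean()                      # ATT = mean CATE over T = 1 rows

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 2))
ps = 1 / (1 + np.exp(-X[:, 0]))
t = (rng.random(n) < ps).astype(int)
y = X @ np.array([2.0, -1.0]) + 1.5 * t + rng.normal(size=n)
print(da_learner_att(X, t, y, ps))  # close to 1.5
```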

(4) Double/Debiased Machine Learning for non-linear CATE
Source: EconML User Guide (https://econml.azurewebsites.net/spec/estimation/dml.html)
As with the Meta-Learners, the ATT is the average of the estimated CATE over the T=1 records. In DML, τ is a function of X whose purpose is to compute the CATE. Writing Ỹ = Y - E[Y | X] and T̃ = T - E[T | X] for the residuals, τ(X) minimizes Σ_i (Ỹ_i - τ(X_i) T̃_i)² = Σ_i T̃_i² (Ỹ_i / T̃_i - τ(X_i))², so estimating τ(X) can be viewed as weighted supervised learning with label Ỹ / T̃ and sample weight T̃².
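The residualize-then-weight idea can be sketched with 2-fold cross-fitting as below. Assumptions: OLS stands in for LightGBM in both nuisance models and in the final weighted regression, E[T | X] is fitted as a linear probability model, the function names are hypothetical, and the synthetic data are built so that E[T | X] is linear and the true effect is 1.5.

```python
import numpy as np

def fit_ols(X, y, w=None):
    """(Weighted) least squares with intercept; returns a predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(np.ones(len(y)) if w is None else w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def dml_att(X, t, y, n_folds=2, seed=0):
    n = len(y)
    folds = np.random.default_rng(seed).integers(0, n_folds, size=n)
    y_res, t_res = np.empty(n), np.empty(n)
    for k in range(n_folds):                         # cross-fitting
        train, test = folds != k, folds == k
        q = fit_ols(X[train], y[train])              # nuisance: E[Y | X]
        p = fit_ols(X[train], t[train].astype(float))  # nuisance: E[T | X]
        y_res[test] = y[test] - q(X[test])
        t_res[test] = t[test] - p(X[test])
    # Final stage: label = y_res / t_res, sample weight = t_res ** 2
    tau = fit_ols(X, y_res / t_res, w=t_res ** 2)
    return tau(X[t == 1]).mean()                     # ATT = mean CATE over T = 1

rng = np.random.default_rng(0)
n = 20_000
X = rng.uniform(-1, 1, size=(n, 2))
t = (rng.random(n) < 0.5 + 0.3 * X[:, 0]).astype(int)
y = X @ np.array([1.0, -1.0]) + 1.5 * t + rng.normal(size=n)
print(dml_att(X, t, y))  # close to 1.5
```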

Assumptions for DID models
Source: Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.
• Conditional parallel trends: without the intervention, the potential (counterfactual) outcomes of the treatment and control groups would follow parallel trends once we condition on X.
[Figure: trend plots of potential outcomes without intervention for the treatment and control groups; parallel trends are violated unconditionally but hold within trend plots conditioned on X (X = 〇〇)]
• Overlap: the support of the propensity score of the treated group is a subset of the support for the untreated group.
[Figure: propensity score distributions; where they do not overlap, the groups are not comparable; when the treated group's support lies within that of the untreated, they are comparable]
This is the same constraint placed on ATT estimation in other propensity score methods.
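Overlap can be checked roughly by asking how many treated units have a propensity score inside the range spanned by the controls. This min/max check is a crude, hypothetical diagnostic (not from the source) and ignores gaps inside the range.

```python
import numpy as np

def share_on_control_support(ps, t):
    """Share of treated units whose propensity score falls inside the
    [min, max] range of the control group's scores."""
    ps, t = np.asarray(ps, dtype=float), np.asarray(t)
    lo, hi = ps[t == 0].min(), ps[t == 0].max()
    ps_treated = ps[t == 1]
    return np.mean((ps_treated >= lo) & (ps_treated <= hi))

print(share_on_control_support([0.1, 0.5, 0.9, 0.2, 0.6], [0, 0, 0, 1, 1]))   # 1.0
print(share_on_control_support([0.1, 0.5, 0.9, 0.2, 0.95], [0, 0, 0, 1, 1]))  # 0.5
```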

(5) Doubly Robust DID
For this model only, the R package was used as is (https://psantanna.com/DRDID/index.html). Note that this is not a fair comparison, since we used the package's default models without any modification.
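Since the DRDID package was used as is, the sketch below is not its code; it is a hand-written NumPy version of the panel-data doubly robust DID formula of Sant'Anna & Zhao (2020). Assumptions: the propensity scores are given, the outcome-change model is linear and fitted on controls, the function names are hypothetical, and the synthetic data satisfy conditional parallel trends with a true ATT of 1.5.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with intercept; returns a predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def dr_did_att(X, d, y_pre, y_post, ps):
    d = np.asarray(d, dtype=float)
    dy = y_post - y_pre                      # change in outcome per unit
    m = fit_ols(X[d == 0], dy[d == 0])       # outcome model: E[dY | X, D = 0]
    resid = dy - m(X)
    w1 = d / d.mean()                        # normalized treated weights
    w0 = ps * (1 - d) / (1 - ps)             # propensity-reweighted controls
    w0 = w0 / w0.mean()
    return np.mean((w1 - w0) * resid)

rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 1))
ps = 1 / (1 + np.exp(-X[:, 0]))
d = (rng.random(n) < ps).astype(int)
y_pre = X[:, 0] + rng.normal(size=n)
# The trend depends on X, so unconditional parallel trends fail
y_post = y_pre + 1.0 + 0.5 * X[:, 0] + 1.5 * d + rng.normal(size=n)
print(dr_did_att(X, d, y_pre, y_post, ps))  # close to 1.5
```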

(6) Double/Debiased DID
Source: Chang, N. C. (2020)
• Outcome model: supervised learning with label = difference (post-period outcome minus pre-period outcome), trained on the control group only.
• Propensity score model.
• Cross-fitting: samples for "fitting" and "prediction" are separated, as in Chernozhukov et al. (2018).
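A sketch of the DMLDID estimator for panel data with cross-fitting, using the score (D - g(X)) / (p (1 - g(X))) · (ΔY - ℓ(X)) as understood from Chang (2020), where ℓ(X) is the outcome change learned on controls, g(X) the propensity score, and p = P(D = 1). Assumptions: OLS and a linear probability model stand in for the ML learners, g is clipped to [0.01, 0.99] as a safeguard, p is estimated on the training folds, the cross-fitted scores are pooled into one average, names are hypothetical, and the data are synthetic (true ATT 1.5).

```python
import numpy as np

def fit_ols(X, y):
    """OLS with intercept; returns a predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def dml_did_att(X, d, dy, n_folds=2, seed=0):
    """dy is the change in outcome (post - pre) for each unit."""
    n = len(dy)
    folds = np.random.default_rng(seed).integers(0, n_folds, size=n)
    scores = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        ctrl = train & (d == 0)
        ell = fit_ols(X[ctrl], dy[ctrl])                     # label = dY, controls only
        g_model = fit_ols(X[train], d[train].astype(float))  # propensity (linear prob.)
        g = np.clip(g_model(X[test]), 0.01, 0.99)
        p = d[train].mean()                                  # P(D = 1)
        scores[test] = (d[test] - g) / (p * (1 - g)) * (dy[test] - ell(X[test]))
    return scores.mean()

rng = np.random.default_rng(0)
n = 20_000
X = rng.uniform(-1, 1, size=(n, 1))
d = (rng.random(n) < 0.5 + 0.3 * X[:, 0]).astype(int)
dy = 1.0 + 0.5 * X[:, 0] + 1.5 * d + rng.normal(size=n)
print(dml_did_att(X, d, dy))  # close to 1.5
```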

Estimated Results
• Point estimates and absolute errors relative to the RCT result
• The default LightGBM (LGBM) hyperparameters are used for DML, the Meta-Learners, and DMLDID; DRDID uses the R package defaults.
If we only consider point estimates, DML is the closest to the RCT result. In the present case (tabular data with low-dimensional features), however, a simple model is sufficient to adjust the bias, without resorting to a complex ML-based model.

Standard error…
• Considering the standard errors, DML was clearly not suitable in this case.
• This may be because the ML models underfit: cross-fitting reduces the amount of data available to each model.

Conclusions
• In this experiment, the identification strategy is almost the same across methods.
• The only difference is the estimation method.
• The most important thing in practice is to agree on the identification strategy, which should be discussed with the stakeholders while making full use of domain knowledge.
• The choice of estimation method itself should be made flexibly, depending on the characteristics of the data.
  ◦ In the present case, a nonlinear ML-based approach proved to be overkill.
    ▪ Of course, if the data are expected to have high-dimensional features or nonlinear relationships with the treatment or outcome, the ML-based approach is likely to have strengths.
  ◦ It is better to try several estimation methods and, if their results differ, to dig deeper into the causes.

Inappropriate advice: "Multiple regression model? That's too naive! If you're going to do causal inference, it has to be an XX model (propensity score model or DML or …), right? So, rejected!!"
Constructive advice: "A multiple regression model, right? I see. But your data is p >> n, so OLS may not estimate it well. Maybe you should try DML or something?"

Extra Analysis
• Since DML and the Meta-Learners compute the CATE, it is useful to visualize it with SHAP.
• We can also examine flexible non-linear relationships, rather than only interaction terms in linear models. If you are interested, please refer to my notebook.
Younger people with higher annual earnings in 1975 (re75) are more likely to benefit from the treatment.

Thank you
※ The Python code for this article is stored in this repository: https://github.com/MasaAsami/D2ML

References:
[1] Yasui, S. (author), HOXO-M Inc. (supervising ed.) (2020). 『効果検証入門：正しい比較のための因果推論／計量経済学の基礎』 (Introduction to Effect Verification: Causal Inference and Econometric Foundations for Correct Comparisons). Gijutsu-Hyoron-sha.
[2] Microsoft Research. EconML User Guide.
[3] Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.
[4] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.
[5] Sant'Anna, P. H., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101–122.
[6] DRDID: Doubly Robust Difference-in-Differences (R package).
[7] Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

Masa
