Wahba. (1999). Causal Effects in Non-Experimental Studies: Re-Evaluating the Evaluation of Training Programs. Journal of the American Statistical Association 94 (448): 1053-1062.
LaLonde, Robert. (1986). Evaluating the Econometric Evaluations of Training Programs. American Economic Review 76: 604-620.

The National Supported Work Demonstration (NSW): The question of interest in this experiment is whether "vocational training" (counseling and short-term work experience) affects subsequent earnings.
• Treatment: vocational training, denoted by treat (1 or 0).
• Outcome: income in 1978, denoted by re78.
The data can be downloaded from https://users.nber.org/~rdehejia/data/
This experiment was conducted as an RCT, but the table below shows that the covariates are not completely balanced.

[Table: covariate balance. Columns: treated average, control average, |(treated avg - control avg) / std|]
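The balance statistic in the last column of that table can be computed along the following lines. This is only a sketch: the slide does not say which standard deviation is used, so taking it over the pooled sample is an assumption, and the function name and the toy ages are hypothetical.

```python
import statistics

def standardized_diff(treated, control):
    """|mean(treated) - mean(control)| / std.

    Assumption: std is the standard deviation of the pooled sample.
    """
    pooled_std = statistics.pstdev(list(treated) + list(control))
    return abs(statistics.mean(treated) - statistics.mean(control)) / pooled_std

# Toy example with made-up ages (not NSW data)
age_treated = [22, 25, 27, 30]
age_control = [24, 26, 28, 34]
smd = standardized_diff(age_treated, age_control)
print(round(smd, 3))
```

A common rule of thumb treats values above roughly 0.1 as a sign of imbalance, but the threshold is a convention rather than a test.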
We fit a simple multiple regression model to the experimental data and treat the resulting estimate, 1676.3426, as the "true causal effect".
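A regression of this kind can be sketched with plain least squares. The exact covariate set behind 1676.3426 is not given on the slide, so the function below takes any covariate matrix; its name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def ols_treatment_effect(y, treat, X):
    """Coefficient on `treat` from an OLS regression of y on [1, treat, X]."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), treat, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Synthetic check with a known effect of 2.0 (not NSW data)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
treat = rng.integers(0, 2, size=500)
y = 1.0 + 2.0 * treat + X @ np.array([0.5, -1.0]) + rng.normal(scale=0.1, size=500)
effect = ols_treatment_effect(y, treat, X)
print(effect)
```

On the real data, `y` would be re78 and `treat` the treat column, with the pre-treatment covariates in `X`.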
The validation dataset is created by excluding the NSW control group and using non-experimental data (CPS: Current Population Survey) as the control group instead.

[Figure: new dataset = NSW treated + CPS control; the NSW control group is dropped]
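The construction can be sketched in pandas as below. The column name treat comes from the dataset description; the function name and the toy frames are hypothetical, and the real files would first have to be loaded from the download page.

```python
import pandas as pd

def build_validation_data(nsw, cps_controls, treat_col="treat"):
    """Keep the NSW treated rows, drop the NSW controls, append CPS rows as controls."""
    nsw_treated = nsw[nsw[treat_col] == 1]
    cps = cps_controls.assign(**{treat_col: 0})  # every CPS record is untreated
    return pd.concat([nsw_treated, cps], ignore_index=True)

# Toy frames standing in for the real files (values are made up)
nsw = pd.DataFrame({"treat": [1, 1, 0, 0], "re78": [9000.0, 4000.0, 5000.0, 0.0]})
cps = pd.DataFrame({"re78": [15000.0, 22000.0, 8000.0]})
data = build_validation_data(nsw, cps)
print(data)
```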
In the treated group, we know the true effect from the RCT. However, there is no guarantee that the same effect would be obtained if the treatment were applied to the CPS sample. In other words, ATT = ATU does not necessarily hold (i.e., we do not know the true ATE). All we can learn from this validation data is the ATT (:= 1676.3426).

• ATE: average treatment effect
• ATT: average treatment effect on the treated

Strongly ignorable treatment assignment can also be expressed in terms of conditional independence: if the treatment assignment T is conditionally independent of Y(1), Y(0) given the confounding covariates X, the treatment assignment is said to be strongly ignorable.
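For reference, these quantities can be written out in standard potential-outcomes notation. The symbols T, X, Y(1), Y(0) follow the slide; the ATU definition, the decomposition of the ATE, and the overlap condition are standard additions that the slide does not state explicitly.

```latex
\begin{aligned}
\mathrm{ATE} &= E[\,Y(1) - Y(0)\,] \\
\mathrm{ATT} &= E[\,Y(1) - Y(0) \mid T = 1\,] \\
\mathrm{ATU} &= E[\,Y(1) - Y(0) \mid T = 0\,] \\
\mathrm{ATE} &= P(T = 1)\,\mathrm{ATT} + P(T = 0)\,\mathrm{ATU} \\
\text{Strong ignorability:}\quad & \bigl(Y(1), Y(0)\bigr) \perp T \mid X,
\qquad 0 < P(T = 1 \mid X) < 1
\end{aligned}
```

The fourth line shows why knowing the ATT alone does not pin down the ATE: the ATU term remains unknown.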
(1) Multiple regression
(2) Propensity score approach (IPW)
(3) Meta Learner
  • S-Learner
  • T-Learner
  • X-Learner
  • DomainAdaptation-Learner
(4) Double/Debiased ML
(5) Doubly Robust DID
(6) Double/Debiased DID

Except for Doubly Robust DID, these were implemented and validated without relying on an established package such as EconML. The reasons are as follows:
- To calculate the standard error (using the bootstrap method)
- For my own learning
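A bootstrap standard error of the kind mentioned above can be sketched generically: resample rows with replacement, re-run the estimator, and take the standard deviation of the estimates. The function names, the number of replications, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=500, seed=0):
    """Nonparametric bootstrap standard error of estimator(data), resampling rows."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        estimates.append(estimator(data[idx]))
    return float(np.std(estimates, ddof=1))

def diff_in_means(d):
    """Example estimator: column 0 is the outcome, column 1 the treatment flag."""
    y, t = d[:, 0], d[:, 1]
    return y[t == 1].mean() - y[t == 0].mean()

# Synthetic data: 200 controls, 200 treated, unit-variance noise
rng = np.random.default_rng(1)
t = np.repeat([0, 1], 200)
y = 1.5 * t + rng.normal(size=400)
se = bootstrap_se(diff_in_means, np.column_stack([y, t]))
print(se)
```

Any of the ATT estimators in the list can be passed as `estimator`, provided every nuisance model is refit inside each bootstrap replication.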
Some people assume that causal inference cannot be done with multiple regression models.
• Remember that the identification strategy is almost the same as in the propensity score-based approach.
• Whether it is easier to model the outcome directly or the assignment to the treatment group depends on the case.

[Illustration: a skeptic saying "Multiple regression? Nonsense. lol"]

Although this approach is very simple, dismissing causal inference merely because it relies on multiple regression is a very bad attitude.
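To make the contrast concrete, here is a sketch of the alternative that models the assignment rather than the outcome: an IPW estimator of the ATT. It assumes the propensity scores have already been estimated and are passed in as `ps`; the function name and the two-stratum toy data are hypothetical.

```python
import numpy as np

def ipw_att(y, t, ps):
    """ATT by inverse probability weighting.

    Treated rows get weight 1; control rows get weight ps / (1 - ps),
    normalized to sum to one (a Hajek-type estimator).
    """
    y, t, ps = (np.asarray(a, dtype=float) for a in (y, t, ps))
    treated_mean = y[t == 1].mean()
    w = ps[t == 0] / (1.0 - ps[t == 0])
    control_mean = np.sum(w * y[t == 0]) / np.sum(w)
    return treated_mean - control_mean

# Two strata: x=0 has propensity 0.2, x=1 has propensity 0.8; the true effect is 1
x = np.array([0] * 10 + [1] * 10)
t = np.array([1] * 2 + [0] * 8 + [1] * 8 + [0] * 2)
y = 10.0 * x + 1.0 * t
ps = np.where(x == 1, 0.8, 0.2)
att = ipw_att(y, t, ps)
naive = y[t == 1].mean() - y[t == 0].mean()
print(att, naive)
```

In this toy data the naive difference in means is badly confounded by x, while the weighted comparison recovers the effect; a regression of y on t and x would do the same, which is the point of the slide.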
User Guide" (https://econml.azurewebsites.net/spec/estimation/metalearners.html)

We use the following process to estimate the ATT:
(1) Calculate the CATE (conditional ATE) for each individual using these algorithms.
(2) Calculate the ATT by averaging the estimated CATE over the T=1 records.

• T-Learner: create separate models for the treatment and control groups, then take the difference between the output values of the two models for each record.
• S-Learner: the simplest method! Train a single model and adopt the difference between its predictions at T=1 and T=0 as the CATE.
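The two learners and the ATT step can be sketched as follows. The slides use LGBM as the base model; the least-squares `fit_linear` helper here is a stand-in chosen to keep the example dependency-free, and all function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a predict function (stand-in for LGBM)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def t_learner_cate(X, t, y):
    mu1 = fit_linear(X[t == 1], y[t == 1])  # outcome model for the treated
    mu0 = fit_linear(X[t == 0], y[t == 0])  # outcome model for the controls
    return mu1(X) - mu0(X)

def s_learner_cate(X, t, y):
    mu = fit_linear(np.column_stack([X, t]), y)  # one model, T as an extra feature
    ones, zeros = np.ones(len(X)), np.zeros(len(X))
    return mu(np.column_stack([X, ones])) - mu(np.column_stack([X, zeros]))

def att_from_cate(cate, t):
    return cate[t == 1].mean()  # step (2): average the CATE over the T=1 records

# Noise-free synthetic data with CATE(x) = 3 + x
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
t = rng.integers(0, 2, size=300)
y = 1.0 + 2.0 * X[:, 0] + t * (3.0 + X[:, 0])
cate_t = t_learner_cate(X, t, y)
cate_s = s_learner_cate(X, t, y)
att_t = att_from_cate(cate_t, t)
print(att_t, att_from_cate(cate_s, t))
```

With a linear base model and no interaction terms, the S-Learner can only return a constant effect; a tree-based model such as LGBM does not have that restriction.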
X-Learner:
(1) Estimate the outcome functions.
(2) Compute the imputed treatment effects.
(3) Estimate the CATE in two ways.
(4) Average the two estimates, weighted by g(x) (g(x): propensity score).
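The four X-Learner steps can be sketched as below. As assumptions for illustration, the propensity scores `g` are taken as given, a least-squares helper stands in for the LGBM base model, and the function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a predict function (stand-in for LGBM)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def x_learner_cate(X, t, y, g):
    """g: propensity scores P(T=1 | X) for every row, estimated beforehand."""
    X1, y1 = X[t == 1], y[t == 1]
    X0, y0 = X[t == 0], y[t == 0]
    # (1) outcome functions for each group
    mu1, mu0 = fit_linear(X1, y1), fit_linear(X0, y0)
    # (2) imputed treatment effects
    d1 = y1 - mu0(X1)
    d0 = mu1(X0) - y0
    # (3) CATE estimated in two ways
    tau1, tau0 = fit_linear(X1, d1), fit_linear(X0, d0)
    # (4) propensity-weighted average of the two estimates
    return g * tau0(X) + (1.0 - g) * tau1(X)

# Noise-free synthetic data with CATE(x) = 3 + x
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
t = rng.integers(0, 2, size=300)
y = 1.0 + 2.0 * X[:, 0] + t * (3.0 + X[:, 0])
cate_x = x_learner_cate(X, t, y, g=np.full(300, 0.5))
print(cate_x[t == 1].mean())
```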
Domain Adaptation Learner:
(1) Estimate the outcome functions using propensity score weighting.
(2) Compute the imputed treatment effects.
(3) Estimate the CATE.
Guide" (https://econml.azurewebsites.net/spec/estimation/dml.html)

As with the meta-learners, the ATT is obtained by (1) calculating the CATE (conditional ATE) for each individual and (2) averaging the estimated CATE over the T=1 records.

DML: τ is a function of X, and the aim is to compute the CATE. Estimating τ(X) can be viewed as weighted supervised learning, with a supervised label and a sample weight for each record.
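One standard way to write the weighted-learning view is the following: with residuals y_res = Y - E[Y|X] and t_res = T - E[T|X], minimizing the sum of (y_res - τ(X) t_res)² is the same as regressing the label y_res / t_res on X with sample weight t_res². The sketch below assumes the residuals come from cross-fitted first-stage models that are not shown, and uses a linear final model as a stand-in; whether the slide's formula uses exactly this parameterization is an assumption.

```python
import numpy as np

def dml_final_stage(X, y_res, t_res):
    """Weighted least squares for tau(X): label = y_res / t_res, weight = t_res**2.

    Assumes t_res is never exactly zero.
    """
    label = y_res / t_res
    weight = t_res ** 2
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(weight)  # scaling rows by sqrt(weight) implements the weighting
    coef, *_ = np.linalg.lstsq(A * sw[:, None], label * sw, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

# Synthetic residuals with tau(x) = 2 + x and a constant propensity of 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
t = rng.integers(0, 2, size=200)
t_res = t - 0.5
y_res = (2.0 + X[:, 0]) * t_res
tau = dml_final_stage(X, y_res, t_res)
att = tau(X[t == 1]).mean()
print(att)
```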
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.

Assumptions:
• Conditional parallel trends: after conditioning on X, the potential outcomes without intervention (the counterfactual outcomes if no intervention is received) follow parallel trends in the treatment group and the control group.
• Common support: the support of the propensity score (ps) of the treated is a subset of the support for the untreated. This is the same constraint placed on ATT estimation in other propensity score methods.

[Figure: trend plots for the treatment group and the control group. One panel shows a violation of parallel trends; the other shows the trend plot after conditioning on X (trend plot | X = some value).]
[Figure: propensity score distributions. "No overlap!!" versus "Comparable!!", where the support of the treated is a subset of the support of the untreated.]
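A rough diagnostic for the common support condition is to check which treated records have a propensity score inside the range observed among the controls. Using the min-max range as a proxy for the support is a simplification, and the function name and toy scores are hypothetical.

```python
import numpy as np

def treated_within_control_support(ps, t):
    """Flag treated rows whose propensity score lies in the range seen among controls."""
    ps, t = np.asarray(ps, dtype=float), np.asarray(t)
    lo, hi = ps[t == 0].min(), ps[t == 0].max()
    return (ps >= lo) & (ps <= hi) & (t == 1)

# Made-up propensity scores: three controls followed by three treated
ps = np.array([0.10, 0.30, 0.60, 0.40, 0.70, 0.95])
t = np.array([0, 0, 0, 1, 1, 1])
ok = treated_within_control_support(ps, t)
share_comparable = ok.sum() / (t == 1).sum()
print(share_comparable)
```

Plotting the two score distributions, as on the slide, is usually more informative than a single share.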
The DRDID R package was used as is. Please note that this is not a fair comparison, since we used the default model without any modification. https://psantanna.com/DRDID/index.html
Double/Debiased DID (Chang, 2020):
• Outcome model: supervised learning with label = Diff. (the outcome difference), trained on the control group only.
• Propensity score model.
• Cross-fitting separates the samples used for "fitting" and "prediction", as in Chernozhukov (2018).
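Given the two nuisance predictions, the ATT can be computed from an orthogonal score. The sketch below uses the panel-data form commonly associated with Chang (2020), averaging (D - g(X)) / (p (1 - g(X))) × (ΔY - ℓ(X)) with p = P(D=1); treat the exact form as an assumption to verify against the paper. The predictions `g_hat` and `l_hat` are assumed to be out-of-fold (cross-fitted), and the function name is hypothetical.

```python
import numpy as np

def dmldid_att(delta_y, d, g_hat, l_hat):
    """ATT from the debiased DID score.

    delta_y: post-period minus pre-period outcome
    d:       treatment indicator (1 or 0)
    g_hat:   predicted propensity score P(D=1 | X)
    l_hat:   predicted E[delta_y | X, D=0], learned on the control group only
    """
    delta_y, d, g_hat, l_hat = (
        np.asarray(a, dtype=float) for a in (delta_y, d, g_hat, l_hat)
    )
    p = d.mean()
    score = (d - g_hat) / (p * (1.0 - g_hat)) * (delta_y - l_hat)
    return score.mean()

# Without covariates the score reduces to the plain DID of group means
d = np.array([1, 1, 0, 0, 0, 0])
delta_y = np.array([5.0, 7.0, 1.0, 2.0, 3.0, 2.0])
g_hat = np.full(6, d.mean())                # constant propensity
l_hat = np.full(6, delta_y[d == 0].mean())  # control-group mean of the difference
att = dmldid_att(delta_y, d, g_hat, l_hat)
print(att)
```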
with RCT result
• The default hyperparameters of LGBM are used for DML, Meta Learner, and DMLDID. DRDID uses the R package defaults.
• If we only consider point estimates, DML is closest to the RCT result.
• In the present case (tabular data with low-dimensional features), such a simple model is sufficient to adjust for the bias, without using a complex model.
almost the same.
• The only difference is the "estimation method".
• The most important thing in practice is to agree on the identification strategy, which should be discussed with the stakeholders while making full use of domain knowledge.
• The choice of estimation method should be made flexibly, depending on the characteristics of the data.
  ◦ In the present case, a nonlinear ML-based approach proved to be overkill.
    ▪ Of course, if the data are expected to have high-dimensional features, or if the treatment or outcome depends nonlinearly on them, the ML-based approach is likely to have strengths.
  ◦ It is better to try several estimation methods and, if their results differ, to dig deeper into the causes.
Inappropriate advice: "to do causal inference, it has to be a XX model (propensity model or DML or …), right? So reject!!"

Constructive advice: "Multiple regression model, right? I see. But your data is p >> n, so OLS may not estimate it well. Maybe you should try DML or something?"
CATE, it is useful to visualize it using SHAP.
• We can also check flexible non-linear relationships, rather than relying on interaction terms in linear models.
If you are interested, please refer to my notebook.

Finding: younger people with higher annual income in 1975 are more likely to benefit from the treatment.
Guide
[3] Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.
[4] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters.
[5] Sant'Anna, P. H., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101-122.
[6] R pkg. Doubly Robust Difference-in-Differences
[7] Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177-191.