Slide 1

Double/debiased machine learning for DiD with Python (repeated outcomes)
twitter: @asas_mimi

Slide 2

Table of Contents
1. DMLDiD
2. Reproducing the paper
3. New simulation
4. Next works

Slide 3

Original paper: Chang, Neng-Chieh (2020). "Double/debiased machine learning for difference-in-differences models." The Econometrics Journal, 23(2), 177–191.
https://academic.oup.com/ectj/article/23/2/177/5722119#247745047

Slide 4

Data structure: repeated outcomes
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
The following data can be observed for each unit:
- Pre-intervention outcome
- Post-intervention outcome
- Treatment indicator (treatment group or not)
- Covariates
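As a concrete illustration, a minimal toy example of this data structure (hypothetical column names: y0 = pre-intervention outcome, y1 = post-intervention outcome, D = treatment indicator, x0/x1 = covariates):

import pandas as pd

# each row is one unit observed both before (y0) and after (y1) the intervention
df = pd.DataFrame({
    "y0": [1.2, 0.8, 1.5, 0.9],   # pre-intervention outcome
    "y1": [2.1, 1.0, 3.0, 1.1],   # post-intervention outcome
    "D":  [1, 0, 1, 0],           # 1 = treatment group, 0 = control
    "x0": [0.3, -1.2, 0.5, 0.1],  # covariates
    "x1": [1.0, 0.2, -0.4, 0.9],
})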

Slide 5

Assumptions for repeated outcomes
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
1. Conditional parallel trend: the potential outcomes (the counterfactual outcomes if no intervention is received) of the treatment and control groups follow parallel trends conditional on X. Even where the raw trends violate parallel trends, conditioning on X makes the two groups comparable.
2. Overlap: the support of the propensity score of the treated group is a subset of the support for the untreated. This is the same constraint placed on ATT estimation in other propensity score methods.
[Figure: trend plots with and without conditioning on X, and propensity score distributions illustrating common support]
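Stated formally (a sketch in potential-outcomes notation; Y_1(0) and Y_0(0) denote the post- and pre-intervention outcomes that would be realized without treatment, and g(X) = P(D = 1 | X) is the propensity score):

% Conditional parallel trend
\mathbb{E}[\,Y_1(0) - Y_0(0) \mid X,\ D = 1\,] = \mathbb{E}[\,Y_1(0) - Y_0(0) \mid X,\ D = 0\,]
% Overlap for ATT: treated propensity scores bounded away from 1
g(X) \le 1 - \varepsilon \quad \text{for some } \varepsilon > 0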

Slide 6

Previous work: Abadie (2005)
Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies, 72, 1–19.
The estimator weights the simple pre-vs-post difference ΔY by the propensity score. In the example below, P(D) = 0.5:
- D = 1 & ps = 0.9 and D = 1 & ps = 0.1: no propensity weight (only a factor of 2, the inverse of P(D) = 0.5). Since we want the ATT, we do not weight the treatment group by the propensity score.
- D = 0 & ps = 0.9: weight −9. A ps of 0.9 means the unit is similar to the treated, so it gets a large weight.
- D = 0 & ps = 0.1: weight −0.111. A ps of 0.1 means the unit is dissimilar to the treated.
These are the untreated, so they are weighted negatively.
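A minimal sketch of this estimator in Python (my own illustration, not the repository code; hypothetical column names, logistic regression as the propensity model, and clipping to avoid extreme weights):

import numpy as np
from sklearn.linear_model import LogisticRegression

def abadie_did(df, y0_col, y1_col, d_col, X_cols):
    # propensity score g(X) = P(D = 1 | X)
    ps_model = LogisticRegression(max_iter=1000).fit(df[X_cols], df[d_col])
    ghat = np.clip(ps_model.predict_proba(df[X_cols])[:, 1], 0.01, 0.99)
    p_hat = df[d_col].mean()              # P(D = 1)
    dy = df[y1_col] - df[y0_col]          # simple diff of pre vs. post
    # weight: 1 for the treated, -g/(1-g) for the untreated
    w = (df[d_col] - ghat) / (1 - ghat)
    return float((dy * w).mean() / p_hat)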

Slide 7

DMLDiD: Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
Compared with Abadie (2005), another ML model is added: in addition to the propensity score g(X) and the constant p, DMLDiD introduces an outcome model (detailed on the next slides).
[Figure: the estimator formula with g(X), p, and the newly added ML component highlighted]


Slide 8

DMLDiD: Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
The added component is a predictive model (supervised learning):
- Label = the pre/post difference (Diff)
- Trained on the control group only
Cross-fitting separates the samples used for "fitting" and "prediction", as in Chernozhukov et al. (2018).
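A minimal sketch of the two-fold cross-fitting scheme for this outcome model (my own illustration under the same column-name assumptions as above; LassoCV matches the learner used in the reproduction code later):

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

def crossfit_outcome_model(df, y0_col, y1_col, d_col, X_cols, seed=0):
    # split into two folds: one fits the model, the other receives predictions
    fold_a, fold_b = train_test_split(df, random_state=seed, test_size=0.5)
    out = {}
    for fit_fold, pred_fold, name in [(fold_a, fold_b, "b"), (fold_b, fold_a, "a")]:
        controls = fit_fold[fit_fold[d_col] == 0]       # control group only
        model = LassoCV(cv=5).fit(
            controls[X_cols],
            controls[y1_col] - controls[y0_col],        # label = Diff
        )
        out[name] = model.predict(pred_fold[X_cols])    # out-of-fold predictions
    return out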


Slide 9

DMLDiD: Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
As in Abadie (2005), the untreated are weighted by the propensity score. The propensity scores and P(D) are also calculated by cross-fitting.
The target is: observable increase/decrease (Diff) minus counterfactual increase/decrease (Diff), where the counterfactual Diff is what the Diff would have looked like if there had been no intervention.
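In symbols (a sketch consistent with the implementation later in the deck, writing \ell(X) = \mathbb{E}[Y_1 - Y_0 \mid X, D = 0] for the control-group outcome model):

\theta_{\mathrm{ATT}}
  = \underbrace{\mathbb{E}[\,Y_1 - Y_0 \mid D = 1\,]}_{\text{observable Diff}}
  - \underbrace{\mathbb{E}[\,\ell(X) \mid D = 1\,]}_{\text{counterfactual Diff}}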

Slide 10

Score function
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
DMLDiD's score function is as follows:
\psi(W; \theta, p, g, \ell) = \frac{Y_1 - Y_0}{p} \cdot \frac{D - g(X)}{1 - g(X)} - \frac{D - g(X)}{p\,(1 - g(X))}\, \ell(X) - \theta
It is new in that it contains the unknown constant p_0 = P(D = 1) together with two infinite-dimensional nuisance parameters: the propensity score g(X) and the outcome model \ell(X).


Slide 11

Orthogonality & Asymptotic properties
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
DMLDiD's score function obeys Neyman orthogonality: the score is invariant to small perturbations of the nuisance parameters g (propensity score) and ℓ (outcome model). Together with a consistent estimator for the asymptotic variance, this means DMLDiD can achieve root-N consistency even when the nuisance parameters are estimated with machine learning.
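Formally, Neyman orthogonality says the Gateaux derivative of the moment condition with respect to the nuisance parameters vanishes at the truth (a sketch, writing \eta = (g, \ell) for the nuisance parameters and \theta_0, \eta_0 for the true values):

\left.\frac{\partial}{\partial r}\,
  \mathbb{E}\big[\psi\big(W;\ \theta_0,\ \eta_0 + r(\eta - \eta_0)\big)\big]\right|_{r=0} = 0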

Slide 12

Reproducing Chang (2020)
 My notebooks are here:
 https://github.com/MasaAsami/ReproducingDMLDiD 
 
 These implementations were based on the following R package:
 https://github.com/NengChiehChang/Diff-in-Diff 

Slide 13

DMLDiD for repeated outcomes: cross-fitting the propensity score

import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.model_selection import train_test_split

def dmldid_rc(
    df,
    y1_col,
    y0_col,
    d_col,
    X_cols,
    ps_model=LogisticRegressionCV(cv=5, random_state=333, penalty="l1", solver="saga"),
    l1k_model=LassoCV(cv=5, random_state=333),
) -> float:
    K = 2
    # two folds for cross-fitting
    df_set = train_test_split(df, random_state=0, test_size=0.5)
    thetabar = []
    for i in range(K):
        k = 0 if i == 0 else 1  # fold used for prediction
        c = 1 if i == 0 else 0  # fold used for fitting
        # cross-fitted propensity score g(X), clipped away from 0 and 1
        ps_model.fit(df_set[c][X_cols], df_set[c][d_col])
        eps = 0.03
        ghat = np.clip(
            ps_model.predict_proba(df_set[k][X_cols])[:, 1],
            eps,
            1 - eps,
        )

Slide 14

DMLDiD for repeated outcomes (continued): the outcome model

def dmldid_rc(....):
    .....
        # outcome model l(X): supervised learning on the control group only,
        # label = the pre/post difference
        control_y0 = df_set[c].query(f"{d_col} < 1")[y0_col]
        control_y1 = df_set[c].query(f"{d_col} < 1")[y1_col]
        _y = control_y1 - control_y0
        control_x = df_set[c].query(f"{d_col} < 1")[X_cols]
        l1k_model.fit(control_x, _y)
        l1hat = l1k_model.predict(df_set[k][X_cols])
        # plug the cross-fitted nuisances into the orthogonal score and average
        p_hat = df_set[c][d_col].mean()
        _e = (
            (df_set[k][y1_col] - df_set[k][y0_col])
            / p_hat
            * (df_set[k][d_col] - ghat)
            / (1 - ghat)
            - (df_set[k][d_col] - ghat) / p_hat / (1 - ghat) * l1hat
        ).mean()
        thetabar.append(_e)
    return np.mean(thetabar)
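Usage would look something like this (a sketch with hypothetical column names; x0–x99 match the simulation data introduced later):

theta_hat = dmldid_rc(
    df,
    y1_col="y1",   # post-intervention outcome
    y0_col="y0",   # pre-intervention outcome
    d_col="D",     # treatment indicator
    X_cols=[f"x{i}" for i in range(100)],
)
print(f"estimated ATT: {theta_hat:.3f}")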

Slide 15

Simulation data of Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
The data generating process in the original paper seems inappropriate for testing the accuracy of this model: the conditional parallel trend assumption is not well represented, so ordinary DiD would already be sufficient.

Slide 16

Reproduction result
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
Although we were able to show DMLDiD's superiority over previous studies, simple DiD is still sufficient here because the data generating process does not represent the bias well.

Slide 17

New Simulation Data
I prepared the following data and experimented again. (※ ΔY := Y(1) − Y(0))
[Figure: DAGs over X, D, and ΔY comparing the DGP of Chang (2020) with the new data, which adds an unobservable variable]

Slide 18

New Simulation Data
'Latent group' is the simulation variable that generates each of the other variables. In the simulation, we assume that it is not possible to directly observe which latent group each unit belongs to.
Columns: Y(0) = Y_2022, Y(1) = Y_2023, X = x0–x99, latent_group
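A minimal sketch of such a DGP (my own illustration: the 10 latent groups of 20 units and the latent_ps line match the allocation slide below, and the true ATT of 3 matches the simulation result; the covariate and trend equations are illustrative assumptions, not the exact experiment code):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_groups, units_per_group, n_x = 10, 20, 100
sg = np.repeat(np.arange(n_groups), units_per_group)   # latent group 0..9

df = pd.DataFrame({"latent_group": sg})
# treatment assignment driven by the (unobservable) latent group
df["latent_ps"] = np.clip(1 - sg / 10, 0.0001, 1 - 0.1)
df["D"] = rng.binomial(1, df["latent_ps"])
# covariates x0..x99: noisy proxies of the latent group (illustrative)
for j in range(n_x):
    df[f"x{j}"] = sg * rng.normal(1.0, 0.1) + rng.normal(0, 1, len(df))
# group-specific trend (conditional parallel trend) plus a true ATT of 3
trend = 1.0 * sg                                        # illustrative scale
df["Y_2022"] = rng.normal(0, 1, len(df))                # Y(0), pre-intervention
df["Y_2023"] = df["Y_2022"] + trend + 3 * df["D"] + rng.normal(0, 1, len(df))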


Slide 19

No parallel trend
The parallel trend assumption is clearly not met in the aggregate data, so the ATT cannot be recovered by ordinary DiD.

Slide 20

Conditional parallel trend
The conditional parallel trend assumption is satisfied. However, the latent groups are not observable.

Slide 21

Treatment group allocation
Each latent group is assigned 20 units.

# (Code excerpt)
df["latent_group"] = sg  # 0 ~ 9
df["latent_ps"] = np.clip(1 - sg / 10, 0.0001, 1 - 0.1)  # group-wise treatment probability
df["D"] = df["latent_ps"].apply(lambda x: np.random.binomial(1, x))

The `latent group` directly affects D, but it is unobservable.
[Figure: DAG over X, D, and ΔY with the unobservable latent group pointing into D]


Slide 22

Failure to meet the backdoor criterion with PS
The `latent group` variable is designed to make estimation difficult for PS-based models such as Abadie (2005): it directly affects D but is unobservable, so conditioning on a propensity score built from X alone fails to close all backdoor paths. A backdoor path through the `latent group` may still remain. I tested whether DMLDiD can estimate the ATT without bias under such a DAG.
[Figure: DAGs showing the PS-based approach blocking the backdoor through X while the path through the unobservable latent group stays open]

Slide 23

Simulation result
 True ATT = 3

Slide 24

Next works:
Chang (2020) also devised a DMLDiD estimator for repeated cross-section data. I will try to reproduce it next time. (There seem to be a few errors in the original demonstration.)
Also, DMLDiD seems to be very versatile. I am currently developing a Python package.

Slide 25

Thank you

Slide 26

References:
[1] Chang, Neng-Chieh (2020). "Double/debiased machine learning for difference-in-differences models." The Econometrics Journal, 23(2), 177–191.
[2] Abadie, Alberto (2005). "Semiparametric difference-in-differences estimators." Review of Economic Studies, 72(1), 1–19.
[3] Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins (2018). "Double/debiased machine learning for treatment and structural parameters." The Econometrics Journal, 21(1), C1–C68.
[4] (slides) 加藤真大 (2021). "DiD estimation with DML" (DMLによる差分の差推定, in Japanese). https://speakerdeck.com/masakat0/dmlniyoruchai-fen-falsechai-tui-ding