Slide 1

Double/debiased machine learning for DiD with Python (repeated outcomes)
twitter: @asas_mimi

Slide 2

Table of Contents
1. DMLDiD
2. Reproducing the paper
3. New simulation
4. Next works

Slide 3

Original paper: Chang, Neng-Chieh (2020). "Double/debiased machine learning for difference-in-differences models." The Econometrics Journal, 23(2), 177–191.
https://academic.oup.com/ectj/article/23/2/177/5722119#247745047

Slide 4

Data structure: repeated outcomes
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
The following data can be observed for each unit:
- Pre-intervention outcome
- Post-intervention outcome
- Treatment indicator (treatment group or not)
- Covariates
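As a concrete illustration, a minimal toy example of this data structure (hypothetical column names: y0 = pre-intervention outcome, y1 = post-intervention outcome, D = treatment indicator, x0/x1 = covariates):

import pandas as pd

# each row is one unit observed both before (y0) and after (y1) the intervention
df = pd.DataFrame({
    "y0": [1.2, 0.8, 1.5, 0.9],   # pre-intervention outcome
    "y1": [2.1, 1.0, 3.0, 1.1],   # post-intervention outcome
    "D":  [1, 0, 1, 0],           # 1 = treatment group, 0 = control
    "x0": [0.3, -1.2, 0.5, 0.1],  # covariates
    "x1": [1.0, 0.2, -0.4, 0.9],
})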

Slide 5

Assumptions for repeated outcomes
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
1. Conditional parallel trend: the potential outcomes (the counterfactual outcomes if no intervention is received) of the treatment and control groups follow parallel trends conditional on X. Even where the raw trends violate parallel trends, conditioning on X makes the two groups comparable.
2. Overlap: the support of the propensity score of the treated group is a subset of the support for the untreated. This is the same constraint placed on ATT estimation in other propensity score methods.
[Figure: trend plots with and without conditioning on X, and propensity score distributions illustrating common support]
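Stated formally (a sketch in potential-outcomes notation; Y_1(0) and Y_0(0) denote the post- and pre-intervention outcomes that would be realized without treatment, and g(X) = P(D = 1 | X) is the propensity score):

% Conditional parallel trend
\mathbb{E}[\,Y_1(0) - Y_0(0) \mid X,\ D = 1\,] = \mathbb{E}[\,Y_1(0) - Y_0(0) \mid X,\ D = 0\,]
% Overlap for ATT: treated propensity scores bounded away from 1
g(X) \le 1 - \varepsilon \quad \text{for some } \varepsilon > 0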

Slide 6

Previous work: Abadie (2005)
Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies, 72, 1–19.
The estimator weights the simple pre-vs-post difference ΔY by the propensity score. In the example below, P(D) = 0.5:
- D = 1 & ps = 0.9 and D = 1 & ps = 0.1: no propensity weight (only a factor of 2, the inverse of P(D) = 0.5). Since we want the ATT, we do not weight the treatment group by the propensity score.
- D = 0 & ps = 0.9: weight −9. A ps of 0.9 means the unit is similar to the treated, so it gets a large weight.
- D = 0 & ps = 0.1: weight −0.111. A ps of 0.1 means the unit is dissimilar to the treated.
These are the untreated, so they are weighted negatively.
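A minimal sketch of this estimator in Python (my own illustration, not the repository code; hypothetical column names, logistic regression as the propensity model, and clipping to avoid extreme weights):

import numpy as np
from sklearn.linear_model import LogisticRegression

def abadie_did(df, y0_col, y1_col, d_col, X_cols):
    # propensity score g(X) = P(D = 1 | X)
    ps_model = LogisticRegression(max_iter=1000).fit(df[X_cols], df[d_col])
    ghat = np.clip(ps_model.predict_proba(df[X_cols])[:, 1], 0.01, 0.99)
    p_hat = df[d_col].mean()              # P(D = 1)
    dy = df[y1_col] - df[y0_col]          # simple diff of pre vs. post
    # weight: 1 for the treated, -g/(1-g) for the untreated
    w = (df[d_col] - ghat) / (1 - ghat)
    return float((dy * w).mean() / p_hat)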

Slide 7

DMLDiD: Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
Compared with Abadie (2005), another ML model is added: in addition to the propensity score g(X) and the constant p, DMLDiD introduces an outcome model (detailed on the next slides).
[Figure: the estimator formula with g(X), p, and the newly added ML component highlighted]


Slide 8

DMLDiD: Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
The added component is a predictive model (supervised learning):
- Label = the pre/post difference (Diff)
- Trained on the control group only
Cross-fitting separates the samples used for "fitting" and "prediction", as in Chernozhukov et al. (2018).
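A minimal sketch of the two-fold cross-fitting scheme for this outcome model (my own illustration under the same column-name assumptions as above; LassoCV matches the learner used in the reproduction code later):

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

def crossfit_outcome_model(df, y0_col, y1_col, d_col, X_cols, seed=0):
    # split into two folds: one fits the model, the other receives predictions
    fold_a, fold_b = train_test_split(df, random_state=seed, test_size=0.5)
    out = {}
    for fit_fold, pred_fold, name in [(fold_a, fold_b, "b"), (fold_b, fold_a, "a")]:
        controls = fit_fold[fit_fold[d_col] == 0]       # control group only
        model = LassoCV(cv=5).fit(
            controls[X_cols],
            controls[y1_col] - controls[y0_col],        # label = Diff
        )
        out[name] = model.predict(pred_fold[X_cols])    # out-of-fold predictions
    return out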


Slide 9

DMLDiD: Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
As in Abadie (2005), the untreated are weighted by the propensity score. The propensity scores and P(D) are also calculated by cross-fitting.
The target is: observable increase/decrease (Diff) minus counterfactual increase/decrease (Diff), where the counterfactual Diff is what the Diff would have looked like if there had been no intervention.
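In symbols (a sketch consistent with the implementation later in the deck, writing \ell(X) = \mathbb{E}[Y_1 - Y_0 \mid X, D = 0] for the control-group outcome model):

\theta_{\mathrm{ATT}}
  = \underbrace{\mathbb{E}[\,Y_1 - Y_0 \mid D = 1\,]}_{\text{observable Diff}}
  - \underbrace{\mathbb{E}[\,\ell(X) \mid D = 1\,]}_{\text{counterfactual Diff}}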

Slide 10

Score function
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
DMLDiD's score function is as follows:
\psi(W; \theta, p, g, \ell) = \frac{Y_1 - Y_0}{p} \cdot \frac{D - g(X)}{1 - g(X)} - \frac{D - g(X)}{p\,(1 - g(X))}\, \ell(X) - \theta
It is new in that it contains the unknown constant p_0 = P(D = 1) together with two infinite-dimensional nuisance parameters: the propensity score g(X) and the outcome model \ell(X).


Slide 11

Orthogonality & Asymptotic properties
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
DMLDiD's score function obeys Neyman orthogonality: the score is invariant to small perturbations of the nuisance parameters g (propensity score) and ℓ (outcome model). Together with a consistent estimator for the asymptotic variance, this means DMLDiD can achieve root-N consistency even when the nuisance parameters are estimated with machine learning.
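Formally, Neyman orthogonality says the Gateaux derivative of the moment condition with respect to the nuisance parameters vanishes at the truth (a sketch, writing \eta = (g, \ell) for the nuisance parameters and \theta_0, \eta_0 for the true values):

\left.\frac{\partial}{\partial r}\,
  \mathbb{E}\big[\psi\big(W;\ \theta_0,\ \eta_0 + r(\eta - \eta_0)\big)\big]\right|_{r=0} = 0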

Slide 12

Reproducing Chang (2020)
 My notebooks are here:
 https://github.com/MasaAsami/ReproducingDMLDiD 
 
 These implementations were based on the following R package:
 https://github.com/NengChiehChang/Diff-in-Diff 

Slide 13

DMLDiD for repeated outcomes: cross-fitting the propensity score

import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.model_selection import train_test_split

def dmldid_rc(
    df,
    y1_col,
    y0_col,
    d_col,
    X_cols,
    ps_model=LogisticRegressionCV(cv=5, random_state=333, penalty="l1", solver="saga"),
    l1k_model=LassoCV(cv=5, random_state=333),
) -> float:
    K = 2
    # two folds for cross-fitting
    df_set = train_test_split(df, random_state=0, test_size=0.5)
    thetabar = []
    for i in range(K):
        k = 0 if i == 0 else 1  # fold used for prediction
        c = 1 if i == 0 else 0  # fold used for fitting
        # cross-fitted propensity score g(X), clipped away from 0 and 1
        ps_model.fit(df_set[c][X_cols], df_set[c][d_col])
        eps = 0.03
        ghat = np.clip(
            ps_model.predict_proba(df_set[k][X_cols])[:, 1],
            eps,
            1 - eps,
        )

Slide 14

DMLDiD for repeated outcomes (continued): the outcome model

def dmldid_rc(....):
    .....
        # outcome model l(X): supervised learning on the control group only,
        # label = the pre/post difference
        control_y0 = df_set[c].query(f"{d_col} < 1")[y0_col]
        control_y1 = df_set[c].query(f"{d_col} < 1")[y1_col]
        _y = control_y1 - control_y0
        control_x = df_set[c].query(f"{d_col} < 1")[X_cols]
        l1k_model.fit(control_x, _y)
        l1hat = l1k_model.predict(df_set[k][X_cols])
        # plug the cross-fitted nuisances into the orthogonal score and average
        p_hat = df_set[c][d_col].mean()
        _e = (
            (df_set[k][y1_col] - df_set[k][y0_col])
            / p_hat
            * (df_set[k][d_col] - ghat)
            / (1 - ghat)
            - (df_set[k][d_col] - ghat) / p_hat / (1 - ghat) * l1hat
        ).mean()
        thetabar.append(_e)
    return np.mean(thetabar)
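Usage would look something like this (a sketch with hypothetical column names; x0–x99 match the simulation data introduced later):

theta_hat = dmldid_rc(
    df,
    y1_col="y1",   # post-intervention outcome
    y0_col="y0",   # pre-intervention outcome
    d_col="D",     # treatment indicator
    X_cols=[f"x{i}" for i in range(100)],
)
print(f"estimated ATT: {theta_hat:.3f}")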

Slide 15

Simulation data of Chang (2020)
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
The data generating process in the original paper seems inappropriate for testing the accuracy of this model: the conditional parallel trend assumption is not well represented, so ordinary DiD would already be sufficient.

Slide 16

Reproduction result
Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.
Although we were able to show DMLDiD's superiority over previous studies, simple DiD is still sufficient here because the data generating process does not represent the bias well.

Slide 17

New Simulation Data
I prepared the following data and experimented again. (※ ΔY := Y(1) − Y(0))
[Figure: DAGs over X, D, and ΔY comparing the DGP of Chang (2020) with the new data, which adds an unobservable variable]

Slide 18

New Simulation Data
'Latent group' is the simulation variable that generates each of the other variables. In the simulation, we assume that it is not possible to directly observe which latent group each unit belongs to.
Columns: Y(0) = Y_2022, Y(1) = Y_2023, X = x0–x99, latent_group
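A minimal sketch of such a DGP (my own illustration: the 10 latent groups of 20 units and the latent_ps line match the allocation slide below, and the true ATT of 3 matches the simulation result; the covariate and trend equations are illustrative assumptions, not the exact experiment code):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_groups, units_per_group, n_x = 10, 20, 100
sg = np.repeat(np.arange(n_groups), units_per_group)   # latent group 0..9

df = pd.DataFrame({"latent_group": sg})
# treatment assignment driven by the (unobservable) latent group
df["latent_ps"] = np.clip(1 - sg / 10, 0.0001, 1 - 0.1)
df["D"] = rng.binomial(1, df["latent_ps"])
# covariates x0..x99: noisy proxies of the latent group (illustrative)
for j in range(n_x):
    df[f"x{j}"] = sg * rng.normal(1.0, 0.1) + rng.normal(0, 1, len(df))
# group-specific trend (conditional parallel trend) plus a true ATT of 3
trend = 1.0 * sg                                        # illustrative scale
df["Y_2022"] = rng.normal(0, 1, len(df))                # Y(0), pre-intervention
df["Y_2023"] = df["Y_2022"] + trend + 3 * df["D"] + rng.normal(0, 1, len(df))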


Slide 19

No parallel trend
The parallel trend assumption is clearly not met in the aggregate data, so the ATT cannot be recovered by ordinary DiD.

Slide 20

Conditional parallel trend
The conditional parallel trend assumption is satisfied. However, the latent groups are not observable.

Slide 21

Treatment group allocation
Each latent group is assigned 20 units.

# (Code excerpt)
df["latent_group"] = sg  # 0 ~ 9
df["latent_ps"] = np.clip(1 - sg / 10, 0.0001, 1 - 0.1)  # group-wise treatment probability
df["D"] = df["latent_ps"].apply(lambda x: np.random.binomial(1, x))

The `latent group` directly affects D, but it is unobservable.
[Figure: DAG over X, D, and ΔY with the unobservable latent group pointing into D]


Slide 22

Failure to meet the backdoor criterion with PS
The `latent group` variable is designed to make estimation difficult for PS-based models such as Abadie (2005): it directly affects D but is unobservable, so conditioning on a propensity score built from X alone fails to close all backdoor paths. A backdoor path through the `latent group` may still remain. I tested whether DMLDiD can estimate the ATT without bias under such a DAG.
[Figure: DAGs showing the PS-based approach blocking the backdoor through X while the path through the unobservable latent group stays open]

Slide 23

Simulation result
 True ATT = 3

Slide 24

Next works:
Chang (2020) also devised a DMLDiD estimator for repeated cross-section data. I will try to reproduce it next time. (There seem to be a few errors in the original demonstration.)
Also, DMLDiD seems to be very versatile. I am currently developing a Python package.

Slide 25

Thank you

Slide 26

References:
[1] Chang, Neng-Chieh (2020). "Double/debiased machine learning for difference-in-differences models." The Econometrics Journal, 23(2), 177–191.
[2] Abadie, Alberto (2005). "Semiparametric difference-in-differences estimators." Review of Economic Studies, 72(1), 1–19.
[3] Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins (2018). "Double/debiased machine learning for treatment and structural parameters." The Econometrics Journal, 21(1), C1–C68.
[4] (slides) 加藤真大 (2021). "DiD estimation with DML" (DMLによる差分の差推定, in Japanese). https://speakerdeck.com/masakat0/dmlniyoruchai-fen-falsechai-tui-ding