
DMLDiD

Masa
March 22, 2022


  1. Double/debiased machine learning for DiD with Python
     (repeated outcomes)

     twitter: @asas_mimi

  2. Table of Contents
     1. DMLDiD
     2. Reproducing the paper
     3. New simulation
     4. Future work

  3. Original paper:
     Chang, Neng-Chieh (2020). "Double/debiased machine learning for
     difference-in-differences models." The Econometrics Journal, 23(2), 177–191.
     https://academic.oup.com/ectj/article/23/2/177/5722119#247745047

  4. Data structure: repeated outcomes

     Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

     The following data can be observed for each unit:
     - pre-intervention outcome
     - post-intervention outcome
     - treatment-group indicator
     - covariates
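     As a concrete picture, here is a minimal sketch of what such a dataset
     could look like as a pandas DataFrame. The column names are illustrative
     (not from the deck), chosen to match the dmldid_rc signature shown on
     slides 13–14:

     import pandas as pd

     # Illustrative repeated-outcomes layout: one row per unit, with
     # pre/post outcomes, a treatment flag, and covariates.
     df = pd.DataFrame({
         "y0": [10.2, 9.8, 11.1, 10.5],   # pre-intervention outcome
         "y1": [12.0, 10.1, 13.4, 10.9],  # post-intervention outcome
         "D":  [1, 0, 1, 0],              # 1 = treatment group, 0 = control
         "x0": [0.3, -1.2, 0.8, 0.1],     # covariates x0, x1, ...
         "x1": [1.5, 0.4, -0.7, 2.2],
     })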

  5. Assumptions for repeated outcomes

     Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

     - Potential outcomes: the counterfactual outcomes if no intervention is
       received.
     - Conditional parallel trend: the raw trends of the treatment and control
       groups may diverge (a violation of the unconditional parallel trend),
       but conditioning on X (plotting the trend within each stratum X = 〇〇)
       makes the two groups comparable.
     - Overlap: the support of the propensity score of the treated group is a
       subset of the support for the untreated. If the propensity-score
       distributions do not overlap, the groups are not comparable. This is
       the same constraint placed on ATT estimation in other propensity-score
       methods.

     [Figure: trend plots with and without conditioning on X, and
     propensity-score histograms illustrating common support.]
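     In symbols (a reconstruction using standard notation, as in Abadie (2005)
     and Chang (2020); Y_t(0) is the untreated potential outcome at time t and
     g(X) = P(D = 1 | X) is the propensity score):

     % Conditional parallel trend: absent treatment, both groups share the
     % same expected trend once we condition on X.
     E[Y_1(0) - Y_0(0) \mid X, D = 1] = E[Y_1(0) - Y_0(0) \mid X, D = 0]

     % Overlap for the ATT: the propensity score of the treated stays away
     % from 1, so every treated unit has comparable untreated units.
     P(D = 1) > 0, \qquad g(X) = P(D = 1 \mid X) < 1 \quad \text{a.s.}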

  6. Previous work: Abadie (2005)

     Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies, 72, 1–19.

     Abadie weights the simple pre-vs.-post difference ΔY by the propensity
     score. In the example, P(D=1) = 0.5 and units come in four types:
     D=1 or D=0 crossed with ps = 0.9 or ps = 0.1.

     Since we want the ATT, the treatment group is not weighted by its
     propensity score: treated units only carry the factor 1/P(D=1) (= 2 here).
     The untreated are weighted negatively, and more heavily the more
     homogeneous they are with the treated:
     - D=0 & ps = 0.9: factor -9 (homogeneous with the treated)
     - D=0 & ps = 0.1: factor ≈ -0.111 (heterogeneous with the treated)
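     The estimand behind this weighting scheme is Abadie's (2005)
     semiparametric DiD estimator for the ATT, with ΔY = Y_1 - Y_0:

     \tau_{ATT} = E\left[ \frac{\Delta Y}{P(D=1)} \cdot \frac{D - g(X)}{1 - g(X)} \right]

     Plugging in the example: a treated unit gets (1 - g)/(1 - g) = 1 times
     1/P(D=1) = 2, while an untreated unit gets -g/(1 - g) times 1/P(D=1),
     i.e. -0.9/0.1 = -9 when g = 0.9 and -0.1/0.9 ≈ -0.111 when g = 0.1.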

  7. DMLDiD: Chang (2020)

     Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

     On top of the nuisance components already present in Abadie (2005), the
     propensity score g(X) and the constant p_0 = P(D=1), Chang (2020) adds
     another ML model: an outcome model for the untreated trend. [Formula
     shown as an image in the original slide; it is reconstructed on slide 10.]

  8. DMLDiD: Chang (2020)

     Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

     - Predictive model (supervised learning): the label is the pre/post
       difference (Diff), learned on the control group only.
     - Cross-fitting: separates the samples used for "fitting" and
       "prediction", as in Chernozhukov et al. (2018). A generic sketch of the
       pattern follows.
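     A minimal, generic sketch of the cross-fitting pattern (illustrative
     only; the deck's actual DMLDiD implementation follows on slides 13–14).
     X and y are assumed to be numpy arrays:

     import numpy as np
     from sklearn.model_selection import KFold

     def cross_fit_predict(model, X, y, n_splits=2, random_state=0):
         """Fit `model` on one fold and predict on the other, so no unit's
         nuisance prediction comes from a model trained on that unit."""
         preds = np.empty(len(X), dtype=float)
         kf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
         for train_idx, test_idx in kf.split(X):
             model.fit(X[train_idx], y[train_idx])
             preds[test_idx] = model.predict(X[test_idx])
         return preds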

  9. DMLDiD: Chang (2020)

     Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

     As in Abadie (2005), the propensity-score weighting is applied to the
     untreated. The propensity scores and P(D) are also calculated by
     cross-fitting.

     The estimator compares the observable increase/decrease (Diff) with the
     counterfactual increase/decrease (Diff): if there were no intervention
     (the counterfactual), the Diff would look like the outcome model's
     prediction.

  10. Score function

      Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

      DMLDiD's score function involves the unknown constant p0 = P(D = 1) and
      two infinite-dimensional nuisance parameters: the propensity score and
      the outcome model. The new ingredient relative to Abadie (2005) is the
      outcome model.
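      The formula itself appeared as an image in the original slide; a
      reconstruction in LaTeX, consistent with the estimator implemented in
      the code on slides 13–14, is:

      \psi(W; \theta, \eta) = \frac{D - g(X)}{p_0 \,(1 - g(X))}\,\bigl(Y_1 - Y_0 - \ell(X)\bigr) - \theta

      where the nuisance parameter is \eta = (g, \ell): g(X) = P(D = 1 | X) is
      the propensity score and \ell(X) = E[Y_1 - Y_0 | X, D = 0] is the
      outcome model. Setting E[\psi] = 0 and solving for \theta yields the
      ATT estimator.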

  11. Orthogonality & asymptotic properties

      Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

      DMLDiD's score function obeys Neyman orthogonality: the score function
      is invariant to small perturbations of the nuisance parameters g
      (propensity score) and ℓ (outcome model). Together with a consistent
      estimator for the asymptotic variance, DMLDiD can achieve root-N
      consistency.
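      Formally, Neyman orthogonality says the Gateaux derivative of the
      expected score with respect to the nuisance parameters vanishes at the
      truth (a standard statement, written here in the notation of
      Chernozhukov et al. (2018), with \eta = (g, \ell)):

      \left.\frac{\partial}{\partial r}\, E\bigl[\psi\bigl(W; \theta_0,\ \eta_0 + r\,(\eta - \eta_0)\bigr)\bigr]\right|_{r=0} = 0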

  12. Reproducing Chang (2020)

      My notebooks are here:
      https://github.com/MasaAsami/ReproducingDMLDiD

      These implementations were based on the following R package:
      https://github.com/NengChiehChang/Diff-in-Diff

  13. DMLDiD for repeated outcomes (1/2)

      import numpy as np
      from sklearn.linear_model import LassoCV, LogisticRegressionCV
      from sklearn.model_selection import train_test_split


      def dmldid_rc(
          df, y1_col, y0_col, d_col, X_cols,
          ps_model=LogisticRegressionCV(cv=5, random_state=333, penalty="l1", solver="saga"),
          l1k_model=LassoCV(cv=5, random_state=333),
      ) -> float:  # np.float is deprecated; use plain float
          K = 2
          # Cross-fitting: split the sample into two folds; fold c fits the
          # nuisance models and fold k evaluates the score, then roles swap.
          df_set = train_test_split(df, random_state=0, test_size=0.5)
          thetabar = []
          for i in range(K):
              k = 0 if i == 0 else 1
              c = 1 if i == 0 else 0
              # Propensity score g(X), fitted on the complementary fold and
              # clipped away from 0 and 1 for numerical stability.
              ps_model.fit(df_set[c][X_cols], df_set[c][d_col])
              eps = 0.03
              ghat = np.clip(
                  ps_model.predict_proba(df_set[k][X_cols])[:, 1],
                  eps,
                  1 - eps,
              )

  14. DMLDiD for repeated outcomes (2/2)

      def dmldid_rc(....):
          .....
              # Outcome model l(X): fitted on the control group of the
              # complementary fold, with the pre/post difference as the label.
              control_y0 = df_set[c].query(f"{d_col} < 1")[y0_col]
              control_y1 = df_set[c].query(f"{d_col} < 1")[y1_col]
              _y = control_y1 - control_y0
              control_x = df_set[c].query(f"{d_col} < 1")[X_cols]
              l1k_model.fit(control_x, _y)
              l1hat = l1k_model.predict(df_set[k][X_cols])
              p_hat = df_set[c][d_col].mean()
              # Empirical score: Abadie-style weighting of the observed Diff,
              # minus the same weighting applied to the counterfactual Diff
              # predicted by the outcome model.
              _e = (
                  (df_set[k][y1_col] - df_set[k][y0_col])
                  / p_hat
                  * (df_set[k][d_col] - ghat)
                  / (1 - ghat)
                  - (df_set[k][d_col] - ghat) / p_hat / (1 - ghat) * l1hat
              ).mean()
              thetabar.append(_e)
          return np.mean(thetabar)
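      A hypothetical call, assuming the toy column layout sketched after
      slide 4 (the names "y1", "y0", "D", "x0", "x1" are illustrative, not
      from the deck):

      att_hat = dmldid_rc(
          df,
          y1_col="y1",   # post-intervention outcome
          y0_col="y0",   # pre-intervention outcome
          d_col="D",     # treatment indicator
          X_cols=["x0", "x1"],
      )
      print(f"estimated ATT: {att_hat:.3f}")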

  15. Simulation data of Chang (2020)

      Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

      The data-generating process in the original paper seems inappropriate
      for testing the accuracy of this model: the conditional parallel trend
      assumption is not well represented, so ordinary DiD would already be
      sufficient.

  16. Reproduction result

      Chang, N. C. (2020). Double/debiased machine learning for difference-in-differences models. The Econometrics Journal, 23(2), 177–191.

      Although we were able to show DMLDiD's superiority over previous
      studies, the data-generating process still does not represent the bias
      well, so simple DiD remains sufficient.

  17. New simulation data

      I prepared the following data and experimented again
      (※ ΔY := Y(1) − Y(0)):

      - Chang (2020): the observed covariates X drive D, ΔY, and Y(0).
      - New data: in addition to X, an unobservable variable drives D and ΔY,
        so the DAG can no longer be handled through X alone.

      [Figure: DAGs for the Chang (2020) DGP and for the new DGP with an
      unobservable variable.]

  18. New simulation data

      `latent_group` is the simulation variable that generates every other
      variable. In the simulation, we assume that it is not possible to
      directly observe which latent group each unit belongs to.

      - Y(0) = Y_2022 (pre-intervention outcome)
      - Y(1) = Y_2023 (post-intervention outcome)
      - X: x0–x99
      - latent_group

  19. No parallel trend

      The parallel trend assumption is clearly not met in the aggregate data,
      so the ATT cannot be recovered by ordinary DiD.

  20. Conditional parallel trend

      The conditional parallel trend assumption is satisfied. However, the
      latent groups are not observable.

  21. Treatment group allocation

      Each latent group is assigned 20 units. The unobservable `latent_group`
      directly affects D:

      # (Code excerpt)
      df["latent_group"] = sg  # 0 ~ 9
      df["latent_ps"] = np.clip(1 - sg / 10, 0.0001, 1 - 0.1)
      df["D"] = df["latent_ps"].apply(lambda x: np.random.binomial(1, x))
      # (Code excerpt)
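      For context, a minimal end-to-end sketch of how such a latent-group DGP
      could be wired up. This is a hypothetical reconstruction, not the deck's
      actual code (see the repository notebooks for that); the covariate,
      trend, and noise choices are assumptions, with the true ATT set to 3 to
      match slide 23:

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(0)
      sg = np.repeat(np.arange(10), 20)  # 10 latent groups x 20 units each

      df = pd.DataFrame({"latent_group": sg})
      df["latent_ps"] = np.clip(1 - sg / 10, 0.0001, 1 - 0.1)  # as in the excerpt
      df["D"] = rng.binomial(1, df["latent_ps"].to_numpy())

      # Hypothetical: covariates x0..x99 are noisy signals of the latent group.
      for j in range(100):
          df[f"x{j}"] = sg + rng.normal(0, 1, len(df))

      # Hypothetical: a group-specific trend breaks the unconditional parallel
      # trend, while the true effect on the treated is 3 (cf. slide 23).
      df["Y_2022"] = sg + rng.normal(10, 1, len(df))
      df["Y_2023"] = df["Y_2022"] + 2 * sg + 3 * df["D"] + rng.normal(0, 1, len(df))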

  22. Failure to meet the backdoor criterion with PS

      The `latent_group` variable is provided to make estimation difficult for
      PS-based models such as Abadie (2005): `latent_group` directly affects D
      but is unobservable, so conditioning on the propensity score fails to
      close all backdoor paths; a backdoor path through `latent_group` may
      still remain. I tested whether DMLDiD can estimate the ATT without bias
      under such a DAG.

      [Figure: DAGs contrasting the PS-based approach, which blocks the
      backdoor through X, with the remaining backdoor path through
      `latent_group`.]

  23. Simulation result

      True ATT = 3

      [Figure: simulation results.]

  24. Future work:

      Chang (2020) also devised DMLDiD for repeated cross-section data. I will
      try to reproduce it next time. (There seem to be a few errors in the
      original demonstration.)

      Also, DMLDiD seems to be very versatile. I am currently developing a
      Python package.

  25. Thank you

  26. References:

      [1] Chang, Neng-Chieh (2020). "Double/debiased machine learning for
      difference-in-differences models." The Econometrics Journal, 23(2),
      177–191.

      [2] Abadie, A. (2005). "Semiparametric difference-in-differences
      estimators." Review of Economic Studies, 72, 1–19.

      [3] Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen,
      W. Newey, and J. Robins (2018). "Double/debiased machine learning for
      treatment and structural parameters." Econometrics Journal, 21, C1–C68.

      [4] (slides) 加藤真大 (2021). 「DMLによる差分の差推定」
      ["Difference-in-differences estimation via DML"].
      https://speakerdeck.com/masakat0/dmlniyoruchai-fen-falsechai-tui-ding