
Synthetic Control Methods through Predictive Synthesis

MasaKat0
August 02, 2023


Presentation slides at EcoSta 2023.


  1. Synthetic Control Methods
    through Predictive Synthesis
    Masahiro Kato (The University of Tokyo)
    Coauthors: Akira Fukuda, Kosaku Takanashi,
    Kenichiro McAlinn, Akari Ohda, Masaaki Imaizumi
    Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity (https://arxiv.org/abs/2307.11127)
    Paper 2: Bayesian Predictive Synthetic Control Methods
    (https://drive.google.com/file/d/1veWTQTuWTx2gAMyh7VSZnenxsVqs1nla/view)
    Speaker Deck: https://speakerdeck.com/masakat0/synthetic-control-methods-through-predictive-synthesis?slide=25


  2. Synthetic Control Methods
    ØSynthetic Control Methods (SCMs; Abadie and Gardeazabal, 2003).
    n Core idea.
    • There are several units. One unit among them receives a policy intervention (the treated unit).
    • The policy intervention affects the outcomes of the treated unit.
    • We cannot observe the outcomes the treated unit would have had without the policy intervention.
    • We therefore estimate the counterfactual outcomes of the treated unit by a weighted sum of the observed outcomes of the untreated units.
    • Then, using the estimated outcome, we estimate the causal effect on the treated unit.


  3. Problem Setting
    n $J + 1$ units, $j \in \mathcal{J} \coloneqq \{0, 1, 2, \dots, J\}$.
    • $j = 0$: the treated unit (the unit affected by the policy intervention).
    • $j \in \mathcal{J}_1 \coloneqq \mathcal{J} \setminus \{0\}$: the untreated units.
    n $T$ periods, $t \in \mathcal{T} \coloneqq \{1, 2, \dots, T\}$.
    • The intervention occurs at $t = T_0 < T$.
    • $t \in \mathcal{T}_0 \coloneqq \{1, 2, \dots, T_0\}$: before the intervention.
    • $t \in \mathcal{T}_1 \coloneqq \mathcal{T} \setminus \mathcal{T}_0$: after the intervention ($T_1 \coloneqq |\mathcal{T}_1| = T - T_0$).


  4. Problem Setting
    ØPotential outcomes (Neyman, 1923; Rubin, 1974):
    n For each unit $j \in \mathcal{J}$ and period $t \in \mathcal{T}$, define potential outcomes $Y_{j,t}^{(1)}, Y_{j,t}^{(0)} \in \mathbb{R}$.
    • $Y_{j,t}^{(1)}$ and $Y_{j,t}^{(0)}$ are the potential outcomes with and without the intervention.
    • $\mathbb{E}_{j,t}$: expectation over $Y_{j,t}^{(1)}$ and $Y_{j,t}^{(0)}$.
    ØObservations:
    n We observe only the outcome $Y_{j,t} \in \mathbb{R}$ corresponding to the actual intervention status; that is,
    $$Y_{0,t} = \begin{cases} Y_{0,t}^{(1)} & \text{if } t \in \mathcal{T}_1 \\ Y_{0,t}^{(0)} & \text{if } t \in \mathcal{T}_0 \end{cases}, \qquad Y_{j,t} = Y_{j,t}^{(0)} \ \text{ for } j \in \mathcal{J}_1.$$


  5. Problem Setting
    ØCausal effects:
    $$\tau_{0,t} \coloneqq \mathbb{E}_{0,t}\left[Y_{0,t}^{(1)} - Y_{0,t}^{(0)}\right] \quad \text{for } t \in \mathcal{T}_1.$$
    n We estimate the causal effect by predicting $Y_{0,t}^{(0)}$ for $t \in \mathcal{T}_1$.
    ØCore idea.
    n Predict $Y_{0,t}^{(0)}$ by a weighted sum of $Y_{1,t}^{(0)}, \dots, Y_{J,t}^{(0)}$:
    $$\widehat{Y}_{0,t}^{(0)} = \sum_{j \in \mathcal{J}_1} w_j Y_{j,t}^{(0)}.$$
    • $\widehat{Y}_{0,t}^{(0)}$ is a counterfactual trend of the treated unit.
    • $\widehat{Y}_{0,t}^{(0)}$ is called a synthetic control unit.
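The weighted-sum prediction and the resulting effect estimate can be sketched numerically. This is a minimal illustration, not code from the papers: the outcomes, weights, and the constant post-intervention effect of 2.0 are all invented for the example.

```python
# Minimal sketch of the SCM prediction step, assuming the weights are already
# estimated; all data and weights here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
J, T, T0 = 3, 12, 8                        # untreated units, periods, intervention time

Y_untreated = rng.normal(1.0, 0.1, (J, T))  # Y_{j,t}^{(0)} for j = 1..J
w = np.array([0.5, 0.3, 0.2])               # weights on the simplex

# Synthetic control unit: weighted sum of the untreated outcomes.
Y0_hat = w @ Y_untreated                    # \hat{Y}_{0,t}^{(0)} for all t

# Observed treated outcome: the counterfactual trend plus an effect after T0.
Y0_obs = Y0_hat + np.where(np.arange(T) >= T0, 2.0, 0.0)

# Estimated causal effect in the post-intervention periods.
tau_hat = Y0_obs[T0:] - Y0_hat[T0:]
print(tau_hat)   # ≈ 2.0 in each post-intervention period (by construction here)
```

Because the treated series is constructed as the synthetic trend plus 2.0, the estimated effect recovers 2.0 exactly; with real data the gap between the observed and synthetic series is the estimate.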


  6. Contents
    ØResearch questions mainly lie in the estimation of the weights $w_1, \dots, w_J$.
    n Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity.
    • Estimators in existing SCMs are not consistent (Ferman and Pinto, 2021).
    • We discuss the inconsistency problem from the viewpoint of endogeneity.
    • Propose frequentist SCMs with the generalized method of moments (GMM).
    n Paper 2: Bayesian Predictive Synthetic Control Methods.
    • Apply Bayesian predictive synthesis to SCMs.
    • Flexible modeling with time-varying parameters, finite-sample analysis, and minimax optimality.


  7. Synthetic Control Methods by Density Matching
    under Implicit Endogeneity


  8. Least-Squares Estimator
    n In standard SCMs, we usually estimate the weights by constrained least squares.
    • That is, we estimate $w_j$ as
    $$\left\{\widehat{w}_j^{\mathrm{LS}}\right\}_{j \in \mathcal{J}_1} = \operatorname*{arg\,min}_{\{w_j\}_{j \in \mathcal{J}_1}} \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \left( Y_{0,t}^{(0)} - \sum_{j \in \mathcal{J}_1} w_j Y_{j,t}^{(0)} \right)^2 \quad \text{such that } \sum_{j \in \mathcal{J}_1} w_j = 1,\ w_j \geq 0\ \ \forall j \in \mathcal{J}_1.$$
    n To justify the least-squares (LS) estimator, we assume linearity in the expected outcomes:
    $$\mathbb{E}\left[Y_{0,t}^{(0)}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \mathbb{E}\left[Y_{j,t}^{(0)}\right].$$
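The constrained least-squares step can be sketched with a generic solver. This is an illustrative implementation under invented data, not the authors' code; in this noiseless example the true weights are recovered.

```python
# Sketch of the constrained LS step in standard SCM: minimize pre-intervention
# fit subject to the simplex constraint. Data and true_w are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
J, T0 = 4, 50
true_w = np.array([0.4, 0.3, 0.2, 0.1])

Y = rng.normal(0.0, 1.0, (J, T0))   # untreated outcomes in the pre-period
Y0 = true_w @ Y                     # treated outcome: noiseless combination

def pre_fit(w):
    # (1/T0) * sum_t ( Y_{0,t} - sum_j w_j Y_{j,t} )^2
    return np.mean((Y0 - w @ Y) ** 2)

res = minimize(
    pre_fit,
    x0=np.full(J, 1.0 / J),
    method="SLSQP",
    bounds=[(0.0, 1.0)] * J,                                       # w_j >= 0
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum_j w_j = 1
)
print(np.round(res.x, 3))   # recovers true_w here because the fit is noiseless
```

With noisy outcomes the same solver still returns simplex weights, but, as the next slides show, those weights are generally inconsistent for $w_j^*$.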


  9. Inconsistency of the LS Estimator
    n Ferman and Pinto (2021) show that the LS estimator is inconsistent; that is,
    $$\widehat{w}_j^{\mathrm{LS}} \xrightarrow{\ \mathrm{p}\ } \overline{w}_j \neq w_j^*.$$
    • They propose another LS-based estimator that reduces the bias.
    • However, that estimator is still biased.
    n Their results imply that the LS estimator is incompatible with SCMs under the linearity assumption $\mathbb{E}\left[Y_{0,t}^{(0)}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \mathbb{E}\left[Y_{j,t}^{(0)}\right]$.


  10. Implicit Endogeneity
    n We investigate this problem from the viewpoint of endogeneity.
    • Let $Y_{j,t}^{(0)} = \mathbb{E}_{j,t}\left[Y_{j,t}^{(0)}\right] + \varepsilon_{j,t}$.
    • Under $\mathbb{E}\left[Y_{0,t}^{(0)}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \mathbb{E}\left[Y_{j,t}^{(0)}\right]$, it holds that
    $$Y_{0,t}^{(0)} = \sum_{j \in \mathcal{J}_1} w_j^* Y_{j,t}^{(0)} - \sum_{j \in \mathcal{J}_1} w_j^* \varepsilon_{j,t} + \varepsilon_{0,t} = \sum_{j \in \mathcal{J}_1} w_j^* Y_{j,t}^{(0)} + v_t.$$
    n Implicit endogeneity (measurement-error bias): correlation between $Y_{j,t}^{(0)}$ and $v_t$.
    • There is an (implicit) endogeneity between the explanatory variables and the error term.
    • This is why the LS estimator $\widehat{w}_j^{\mathrm{LS}}$ is biased; that is, $\widehat{w}_j^{\mathrm{LS}} \xrightarrow{\ \mathrm{p}\ } \overline{w}_j \neq w_j^*$.
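The implicit-endogeneity claim can be checked by simulation: under the mean-level linearity assumption, the regression error $v_t = \varepsilon_{0,t} - \sum_j w_j^* \varepsilon_{j,t}$ is correlated with each regressor $Y_{j,t}^{(0)}$, with $\mathrm{Cov}(Y_{j,t}^{(0)}, v_t) = -w_j^* \mathrm{Var}(\varepsilon_{j,t})$. The data-generating values below are illustrative.

```python
# Sketch of the implicit-endogeneity argument: the error term v_t is correlated
# with the regressors, so LS is biased. All distributions are illustrative.
import numpy as np

rng = np.random.default_rng(2)
J, T0 = 3, 200_000
w_star = np.array([0.5, 0.3, 0.2])

mean = rng.normal(5.0, 1.0, (J, T0))   # E_{j,t}[Y_{j,t}^{(0)}]
eps = rng.normal(0.0, 1.0, (J, T0))    # eps_{j,t}, unit variance
Y = mean + eps                         # Y_{j,t}^{(0)} = mean + noise

eps0 = rng.normal(0.0, 1.0, T0)
Y0 = w_star @ mean + eps0              # linearity holds in expectations
v = Y0 - w_star @ Y                    # v_t = eps0_t - sum_j w_j* eps_{j,t}

# Theory: Cov(Y_{j,t}, v_t) = -w_j* * Var(eps_{j,t}) = -w_j* here.
covs = np.array([np.cov(Y[j], v)[0, 1] for j in range(J)])
print(np.round(covs, 2))   # ≈ [-0.5, -0.3, -0.2]
```

The nonzero covariances are exactly the endogeneity that makes the LS weights converge to a limit other than $w_j^*$.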


  11. Mixture Models
    n The implicit endogeneity implies that the LS estimator is incompatible with SCMs.
    ØConsider another estimation strategy.
    n Assume mixture models and estimate the weights by the generalized method of moments (GMM).
    • $p_{j,t}(y)$: the density of $Y_{j,t}^{(0)}$.
    n Mixture model between $p_{0,t}(y)$ and $\{p_{j,t}(y)\}_{j \in \mathcal{J}_1}$:
    $$p_{0,t}(y) = \sum_{j \in \mathcal{J}_1} w_j^* \, p_{j,t}(y).$$


  12. Fine-Grained Models
    n Assuming mixture models is stronger than assuming $\mathbb{E}\left[Y_{0,t}^{(0)}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \mathbb{E}\left[Y_{j,t}^{(0)}\right]$.
    ØMixture models can be justified from the viewpoint of fine-grained models (Shi et al., 2022).
    n Linear factor models are usually assumed in SCMs:
    $$Y_{j,t}^{(0)} = c_j + \delta_t + \lambda_t \mu_j + \varepsilon_{j,t}, \qquad Y_{j,t}^{(1)} = \tau_{0,t} + Y_{j,t}^{(0)}.$$
    • Shi et al. (2022) find that mixture models imply factor models under some assumptions.


  13. Fine-Grained Models
    ØFine-grained models (Shi et al., 2022).
    n Assume that $Y_{j,t}^{(0)}$ represents a group-level outcome.
    n In each unit $j$, there are unobserved small units $Y_{j,t,1}^{(0)}, Y_{j,t,2}^{(0)}, \dots$ that constitute $Y_{j,t}^{(0)}$.
    → Under some assumptions,
    • each $p_{j,t}(y)$ can be linked to the linear factor model, and
    • $p_{0,t}(y) = \sum_{j \in \mathcal{J}_1} w_j^* \, p_{j,t}(y)$ holds.
    [Figure: each group-level outcome $Y_{j,t}^{(0)}$ is composed of unobserved fine-grained units $Y_{j,t,i}^{(0)}$.]


  14. Moment Conditions
    ØMoment conditions.
    n Under the mixture models, the following moment conditions hold:
    $$\mathbb{E}_{0,t}\left[\left(Y_{0,t}^{(0)}\right)^{\gamma}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \, \mathbb{E}_{j,t}\left[\left(Y_{j,t}^{(0)}\right)^{\gamma}\right] \quad \forall \gamma \in \mathbb{R}_{+}.$$
    n Empirical approximation of $\mathbb{E}_{0,t}\left[\left(Y_{0,t}^{(0)}\right)^{\gamma}\right] - \sum_{j \in \mathcal{J}_1} w_j^* \, \mathbb{E}_{j,t}\left[\left(Y_{j,t}^{(0)}\right)^{\gamma}\right]$:
    $$\widehat{m}_{\gamma}(w) \coloneqq \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \left( \left(Y_{0,t}^{(0)}\right)^{\gamma} - \sum_{j \in \mathcal{J}_1} w_j \left(Y_{j,t}^{(0)}\right)^{\gamma} \right).$$
    • We estimate $w$ to achieve $\widehat{m}_{\gamma}(w) \approx 0$.


  15. GMM
    n A set of positive values $\Gamma \coloneqq \{1, 2, 3, \dots, G\}$; e.g., $\Gamma = \{1, 2, 3, 4, 5\}$.
    n Estimate $w_j^*$ as
    $$\left\{\widehat{w}_j^{\mathrm{GMM}}\right\}_{j \in \mathcal{J}_1} \coloneqq \operatorname*{arg\,min}_{\{w_j\}:\ \sum_{j \in \mathcal{J}_1} w_j = 1} \sum_{\gamma \in \Gamma} \left( \widehat{m}_{\gamma}(w) \right)^2.$$
    • We can also weight each empirical moment condition; that is, using weights $v_{\gamma} \in \mathbb{R}_{+}$,
    $$\left\{\widehat{w}_j^{\mathrm{GMM}}\right\}_{j \in \mathcal{J}_1} \coloneqq \operatorname*{arg\,min}_{\{w_j\}:\ \sum_{j \in \mathcal{J}_1} w_j = 1} \sum_{\gamma \in \Gamma} v_{\gamma} \left( \widehat{m}_{\gamma}(w) \right)^2.$$
    n We can show that the GMM estimator is asymptotically unbiased; that is,
    $$\widehat{w}_j^{\mathrm{GMM}} \xrightarrow{\ \mathrm{p}\ } w_j^*.$$


  16. Inference
    n Hypothesis testing of the sharp null
    $$H_0\colon \tau_{0,t} = 0 \quad \text{for } t \in \mathcal{T}_1.$$
    • Note that $\tau_{0,t} = Y_{0,t}^{(1)} - Y_{0,t}^{(0)}$ under the linear factor model.
    n We usually employ conformal inference for testing this hypothesis.
    • It tests the sharp null nonparametrically.
    • However, it incurs computational costs.


  17. Simulation Studies
    n $G$ is chosen from $\{2, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$. $J$ is chosen from $\{10, 30, 60\}$.
    • Recall that $\left\{\widehat{w}_j^{\mathrm{GMM}}\right\}_{j \in \mathcal{J}_1} \coloneqq \operatorname*{arg\,min}_{\{w_j\}:\ \sum_{j \in \mathcal{J}_1} w_j = 1} \sum_{\gamma \in \{1, 2, \dots, G\}} \left( \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \left( \left(Y_{0,t}^{(0)}\right)^{\gamma} - \sum_{j \in \mathcal{J}_1} w_j \left(Y_{j,t}^{(0)}\right)^{\gamma} \right) \right)^2$.
    n Generate $Y_{j,t}^{(0)}$ from Gaussian distributions.
    n [Figure: the y-axis denotes the estimation error, and the x-axis denotes $G$.]


  18. Empirical Studies
    n Empirical analysis using case studies from existing studies.
    • Tobacco control in California (Abadie, Diamond, and Hainmueller, 2010).
    • Basque conflict in the Basque Country (Abadie and Gardeazabal, 2003).
    • Reunification of Germany (Abadie, Diamond, and Hainmueller, 2015).
    n Pretreatment fit: predictive ability for outcomes for $t \in \mathcal{T}_0$.


  19. Bayesian Predictive Synthetic Control Methods


  20. Bayesian SCMs
    n We have introduced frequentist methods for SCMs.
    n Frequentist SCMs require
    • large samples to show the convergence of the weight estimators;
    • special inference methods, such as conformal inference;
    • distance minimization to employ covariates, which is not easy to justify.
    n We therefore consider a Bayesian approach to SCMs.
    • It works with finite samples.
    • Inference is conducted with the posterior distribution.


  21. Bayesian Predictive Synthesis
    n Our Bayesian SCMs are based on the formulation of Bayesian predictive synthesis (BPS).
    n BPS: a method for synthesizing predictive models (McAlinn and West, 2019).
    • Synthesizes predictive models while reflecting model uncertainty.
    • A generalization of Bayesian model averaging.
    • Incorporates various predictive models, weighting them with time-varying parameters.
    n We regard the untreated outcomes, and predictive models for the outcomes that use covariates, as predictors of $Y_{0,t}^{(0)}$.
    • We first predict outcomes using covariates.
    • Then, we incorporate the predictors using the BPS.


  22. BPSCM
    n We propose SCMs based on the BPS, referred to as BPSCMs.
    ØBPSCM.
    • $\Phi_t$: a set of time-varying parameters at $t$; $\Phi_t$ depends on $\{Y_{0,s}^{(0)}\}_{s \in [1:t]}$.
    n The conditional density function of $Y_{0,t+1}^{(0)}$ is referred to as the synthesis function, denoted by $\alpha\left(y \mid \{Y_{j,t}^{(0)}\}_{j \in \mathcal{J}_1}, \Phi_t\right)$.
    n The Bayesian decision maker predicts $Y_{0,t+1}^{(0)}$ using the posterior distribution defined as
    $$p^{(0)}\left(y \mid \left\{Y_{0,s}^{(0)}\right\}_{s \in [1:t]}, \Phi_t\right) \coloneqq \int \alpha\left(y \mid \left\{y_{j,t}^{(0)}\right\}_{j \in \mathcal{J}_1}, \Phi_t\right) \prod_{j \in \mathcal{J}_1} p_{j,t}\left(y_{j,t}^{(0)}\right) \mathrm{d}y_{j,t}^{(0)}.$$
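The integral defining the BPS predictive distribution can be approximated by Monte Carlo: draw latent outcomes from each agent density $p_{j,t}$ and pass them through the synthesis function. The normal agent densities, parameter values, and normal synthesis function below are illustrative stand-ins, not the paper's specification.

```python
# Monte Carlo sketch of the BPS predictive distribution: samples from
# int alpha(y | y_{1:J}, Phi_t) prod_j p_{j,t}(y_j) dy_j. Illustrative values.
import numpy as np

rng = np.random.default_rng(6)
J, n_draws = 3, 10_000
w = np.array([0.2, 0.5, 0.3, 0.2])   # Phi_t here: intercept w_0 and weights w_1..w_J
nu = 0.25                            # synthesis-function variance

# Agent densities p_{j,t}: normals with unit-specific means (illustrative).
mu = np.array([[1.0], [2.0], [4.0]])
y_latent = rng.normal(loc=mu, scale=0.5, size=(J, n_draws))

# Synthesis function alpha(y | y_{1:J}, Phi_t) = N(w_0 + sum_j w_j y_j, nu);
# averaging over the latent draws yields samples from the predictive density.
y_pred = w[0] + w[1:] @ y_latent + rng.normal(0.0, np.sqrt(nu), n_draws)

print(round(y_pred.mean(), 1))   # ≈ w_0 + sum_j w_j mu_j = 2.1
```

Each predictive draw integrates over the agents' uncertainty, which is how BPS propagates model uncertainty into the counterfactual prediction.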


  23. Dynamic Latent Factor Linear Regression Models
    n There are several specifications for the synthesis function.
    n Ex. Latent factor dynamic linear model:
    • Set the synthesis function as
    $$\alpha\left(y_{0,t}^{(0)} \mid \left\{Y_{j,t}^{(0)}\right\}_{j \in \mathcal{J}_1}, \Phi_t\right) = \phi\left(y_{0,t}^{(0)};\ w_{0,t} + \sum_{j=1}^{J} w_{j,t} Y_{j,t}^{(0)},\ \nu_t\right).$$
    • $\phi(\cdot\,; a, b^2)$: a univariate normal density with mean $a$ and variance $b^2$.
    • $\epsilon_t$ and $\eta_{j,t}$ below are unobserved error terms.
    • Specify the processes of $Y_{0,t}^{(0)}$ and $w_{j,t}$ as
    $$Y_{0,t}^{(0)} = w_{0,t} + \sum_{j \in \mathcal{J}_1} w_{j,t} Y_{j,t}^{(0)} + \epsilon_t, \quad \epsilon_t \sim N(0, \nu_t), \qquad w_{j,t} = w_{j,t-1} + \eta_{j,t}, \quad \eta_{j,t} \sim N(0, \nu_t \boldsymbol{W}_t).$$
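The state-space specification above can be made concrete with a forward simulation: weights evolve as a random walk and the treated outcome is a noisy time-varying combination of the untreated outcomes. This sketch only simulates the generative model with invented constants ($\nu$, $W$, initial weights); it is not the posterior computation used in the paper.

```python
# Forward simulation of the latent factor dynamic linear model:
# random-walk weights plus a noisy observation equation. Illustrative values.
import numpy as np

rng = np.random.default_rng(4)
J, T = 3, 100
nu, W = 0.25, 0.01   # observation variance nu_t and state-noise scale W_t (constants here)

Y = rng.normal(0.0, 1.0, (J, T))        # untreated outcomes Y_{j,t}^{(0)}
w = np.empty((T, J + 1))                # states (w_{0,t}, w_{1,t}, ..., w_{J,t})
w[0] = np.array([0.1, 0.5, 0.3, 0.2])   # illustrative initial state
Y0 = np.empty(T)

for t in range(T):
    if t > 0:
        # State evolution: w_{j,t} = w_{j,t-1} + eta_{j,t}, eta ~ N(0, nu*W)
        w[t] = w[t - 1] + rng.normal(0.0, np.sqrt(nu * W), J + 1)
    # Observation: Y_{0,t}^{(0)} = w_{0,t} + sum_j w_{j,t} Y_{j,t}^{(0)} + eps_t
    Y0[t] = w[t, 0] + w[t, 1:] @ Y[:, t] + rng.normal(0.0, np.sqrt(nu))
```

In practice the weights are not simulated but inferred; this generative form is what standard dynamic-linear-model machinery (e.g. forward filtering, backward sampling) inverts.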


  24. Auxiliary Covariates
    ØThe BPSCM can use covariates by predicting outcomes with various predictive models.
    • $X_{j,t}$: covariates for unit $j$.
    n Define $L$ predictors for $Y_{j,t}^{(0)}$ by $\{\hat{f}_{\ell}(X_{j,t})\}_{\ell=1}^{L}$.
    • These predictors can be constructed with machine learning methods.
    • We can use covariates in the predictive models.
    n Together with the original untreated outcome $Y_{j,t}^{(0)}$, there are $K = 1 + L$ predictors per unit, $\left\{Y_{j,t}^{(0)}, \{\hat{f}_{\ell}(X_{j,t})\}_{\ell=1}^{L}\right\}$.
    n We incorporate them by using the BPS.


  25. Auxiliary Covariates
    n The set of predictors is denoted by
    $$\boldsymbol{Z}_t = \left( Y_{1,t}^{(0)}, \dots, Y_{J,t}^{(0)},\ \hat{f}_1(X_{1,t}), \dots, \hat{f}_L(X_{1,t}),\ \dots,\ \hat{f}_1(X_{J,t}), \dots, \hat{f}_L(X_{J,t}) \right).$$
    n Conduct the BPSCM as if there are $J + JL$ untreated units that can be used for SCMs:
    $$p^{(0)}\left(y \mid \Phi_t, \left\{Y_{0,s}^{(0)}\right\}_{s \in [1:t]}\right) = \int \alpha\left(y \mid \boldsymbol{z}_t, \Phi_t\right) \prod_{k \in \{1, 2, \dots, (1+L)J\}} p_{k,t}\left(z_{k,t}\right) \mathrm{d}z_{k,t}.$$
    Ex. Synthesize predictive models such as linear regression and random forests.
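The construction of the augmented predictor set $\boldsymbol{Z}_t$ can be sketched as follows. The model choices (least squares and a random forest) follow the example on the slide; fitting one model per unit on that unit's own covariates, and all dimensions and data, are illustrative assumptions.

```python
# Sketch of assembling the J + J*L predictor columns of Z_t: the J raw
# untreated outcomes plus L fitted-model predictions per unit. Illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
J, T, d = 3, 60, 2                                   # units, periods, covariate dim
X = rng.normal(0.0, 1.0, (J, T, d))                  # covariates X_{j,t}
Y = X.sum(axis=2) + rng.normal(0.0, 0.1, (J, T))     # untreated outcomes Y_{j,t}^{(0)}

models = [LinearRegression(),
          RandomForestRegressor(n_estimators=50, random_state=0)]
L = len(models)

# Z stacks the J raw outcomes with L model predictions per unit: J + J*L columns.
Z = np.empty((T, J * (1 + L)))
Z[:, :J] = Y.T
for j in range(J):
    for l, model in enumerate(models):
        model.fit(X[j], Y[j])                 # f_hat_l fitted on unit j's data
        Z[:, J + j * L + l] = model.predict(X[j])

print(Z.shape)   # (60, 9): J + J*L = 3 + 3*2 columns
```

Each column of `Z` then plays the role of one "untreated unit" in the BPSCM synthesis step.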


  26. Advantages of the BPSCM
    üTime-varying parameters.
    üIncorporates the uncertainty of each untreated unit's outcome.
    üMinimax optimality.
    • Even under model misspecification, the predictor of the BPSCM is minimax optimal in terms of KL divergence (Takanashi and McAlinn, 2021).
    • Can it thereby avoid the implicit endogeneity problem?
    üWorks with finite samples.
    üInference via the posterior distribution.


  27. Empirical Analysis
    ØEmpirical studies using the same case studies as in the previous slides.
    n We compare the following five prediction models.

    | Model               | Time-varying coef.s | Using covariates | Synthesized predictive models  |
    |---------------------|---------------------|------------------|--------------------------------|
    | Abadie              | ✓                   | -                |                                |
    | BPSCM               | ✓                   | -                |                                |
    | BPSCM (Linear)      | ✓                   | ✓                | Least squares                  |
    | BPSCM (RF)          | ✓                   | ✓                | Random forests                 |
    | BPSCM (Linear + RF) | ✓                   | ✓                | Least squares + random forests |


  28. Empirical Analysis
    n We mainly check the pretreatment fit and the posterior distributions of the BPSCMs.


  29. Conclusion


  30. n SCMs suffer from the issue of inconsistency.
    • The LS estimator is incompatible with the assumption $\mathbb{E}\left[Y_{0,t}^{(0)}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \mathbb{E}\left[Y_{j,t}^{(0)}\right]$.
    → Implicit endogeneity (measurement-error bias).
    • Is $Y_{0,t}^{(0)} = \sum_{j \in \mathcal{J}_1} w_j^* Y_{j,t}^{(0)}$ perhaps not realistic?
    n Frequentist density matching (mixture model + GMM).
    • The mixture model $p_{0,t}(y) = \sum_{j \in \mathcal{J}_1} w_j^* \, p_{j,t}(y)$ is a stronger assumption than $\mathbb{E}\left[Y_{0,t}^{(0)}\right] = \sum_{j \in \mathcal{J}_1} w_j^* \mathbb{E}\left[Y_{j,t}^{(0)}\right]$.
    • Using the GMM under this assumption, we can estimate the weights consistently.
    n BPSCM.
    • Using the Bayesian method, we can obtain the minimax optimal predictor without assuming the mixture model.
    • Further advantages include flexible modeling and finite-sample inference.


  31. References


  32. • Abadie, A. and Gardeazabal, J. "The economic costs of conflict: A case study of the Basque Country." American Economic Review, 2003.
    • Abadie, A., Diamond, A., and Hainmueller, J. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association, 2010.
    • Abadie, A., Diamond, A., and Hainmueller, J. "Comparative politics and the synthetic control method." American Journal of Political Science, 2015.
    • Ferman, B. and Pinto, C. "Synthetic controls with imperfect pretreatment fit." Quantitative Economics, 12(4):1197–1221, 2021.
    • McAlinn, K. and West, M. "Dynamic Bayesian predictive synthesis in time series forecasting." Journal of Econometrics, 2019.
    • McAlinn, K., Aastveit, K. A., Nakajima, J., and West, M. "Multivariate Bayesian predictive synthesis in macroeconomic forecasting." Journal of the American Statistical Association, 2020.
    • Shi, C., Sridhar, D., Misra, V., and Blei, D. "On the assumptions of synthetic control methods." In AISTATS, pp. 7163–7175, 2022.
    • Takanashi, K. and McAlinn, K. "Predictions with dynamic Bayesian predictive synthesis are exact minimax." 2021.
    • West, M. and Harrison, P. J. "Bayesian Forecasting and Dynamic Models." Springer-Verlag, 2nd edition, 1997.
