
Synthetic Control Methods through Predictive Synthesis

MasaKat0
August 02, 2023


Presentation slides at EcoSta 2023.


Transcript

  1. Synthetic Control Methods through Predictive Synthesis
     Masahiro Kato (The University of Tokyo)
     Coauthors: Akira Fukuda, Kosaku Takanashi, Kenichiro McAlinn, Akari Ohda, Masaaki Imaizumi
     Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity (https://arxiv.org/abs/2307.11127)
     Paper 2: Bayesian Predictive Synthetic Control Methods (https://drive.google.com/file/d/1veWTQTuWTx2gAMyh7VSZnenxsVqs1nla/view)
     Speaker Deck: https://speakerdeck.com/masakat0/synthetic-control-methods-through-predictive-synthesis?slide=25

  2. Synthetic Control Methods
     • Synthetic control methods (SCMs; Abadie and Gardeazabal, 2003).
     • Core idea:
       • There are several units, one of which receives a policy intervention (the treated unit).
       • The policy intervention affects the outcomes of the treated unit.
       • We cannot observe the outcomes of the treated unit without the policy intervention.
       • Estimate the counterfactual outcomes of the treated unit by a weighted sum of the observed outcomes of the untreated units.
       • Then, using the estimated outcomes, estimate the causal effect on the treated unit.

  3. Problem Setting
     • $J + 1$ units, $j \in \mathcal{J} := \{0, 1, 2, \dots, J\}$.
       • $j = 0$: treated unit (the unit affected by the policy intervention).
       • $j \in \mathcal{J}^c := \mathcal{J} \setminus \{0\}$: untreated units.
     • $T$ periods, $t \in \mathcal{T} := \{1, 2, \dots, T\}$.
       • The intervention occurs at $t = T_0 < T$.
       • $t \in \mathcal{T}_0 := \{1, 2, \dots, T_0\}$: before the intervention.
       • $t \in \mathcal{T}_1 := \mathcal{T} \setminus \mathcal{T}_0$: after the intervention ($T_1 := |\mathcal{T}_1| = T - T_0$).

  4. Problem Setting
     • Potential outcomes (Neyman, 1923; Rubin, 1974):
       • For each unit $j \in \mathcal{J}$ and period $t \in \mathcal{T}$, define potential outcomes $Y^1_{j,t}, Y^0_{j,t} \in \mathbb{R}$.
       • $Y^1_{j,t}$ and $Y^0_{j,t}$ are the potential outcomes with and without the intervention.
       • $\mathbb{E}_{j,t}$: expectation over $Y^1_{j,t}$ and $Y^0_{j,t}$.
     • Observations: we observe one of the outcomes, $Y_{j,t} \in \mathbb{R}$, corresponding to the actual intervention; that is,
       $Y_{0,t} = Y^1_{0,t}$ if $t \in \mathcal{T}_1$, $Y_{0,t} = Y^0_{0,t}$ if $t \in \mathcal{T}_0$, and $Y_{j,t} = Y^0_{j,t}$ for $j \in \mathcal{J}^c$.

  5. Problem Setting
     • Causal effects: $\tau_{0,t} := \mathbb{E}_{0,t}[Y^1_{0,t} - Y^0_{0,t}]$ for $t \in \mathcal{T}_1$.
     • Estimate the causal effect by predicting $Y^0_{0,t}$ for $t \in \mathcal{T}_1$.
     • Core idea: predict $Y^0_{0,t}$ by a weighted sum of $Y^0_{1,t}, \dots, Y^0_{J,t}$:
       $\widehat{Y}^0_{0,t} = \sum_{j \in \mathcal{J}^c} w_j Y^0_{j,t}$.
       • $\widehat{Y}^0_{0,t}$ is a counterfactual trend of the treated unit, called a synthetic control unit.

  6. Contents
     • The research questions mainly lie in the estimation of the weights $w_1, \dots, w_J$.
     • Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity.
       • Estimators in existing SCMs are not consistent (Ferman and Pinto, 2021).
       • We discuss this inconsistency problem from the viewpoint of endogeneity.
       • We propose frequentist SCMs based on the generalized method of moments (GMM).
     • Paper 2: Bayesian Predictive Synthetic Control Methods.
       • Apply Bayesian predictive synthesis to SCMs.
       • Flexible modeling with time-varying parameters, finite-sample analysis, and minimax optimality.

  7. Least-Squares Estimator
     • In standard SCMs, we usually estimate the weights by constrained least squares; that is, we estimate $w_j$ as
       $\{\widehat{w}^{\mathrm{LS}}_j\}_{j \in \mathcal{J}^c} = \arg\min_{\{w_j\}_{j \in \mathcal{J}^c}} \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \big( Y^0_{0,t} - \sum_{j \in \mathcal{J}^c} w_j Y^0_{j,t} \big)^2$
       such that $\sum_{j \in \mathcal{J}^c} w_j = 1$ and $w_j \geq 0$ for all $j \in \mathcal{J}^c$ (a numerical sketch follows below).
     • To justify the least-squares (LS) estimator, we assume linearity in the expected outcomes:
       $\mathbb{E}[Y^0_{0,t}] = \sum_{j \in \mathcal{J}^c} w^*_j \mathbb{E}[Y^0_{j,t}]$.

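The constrained LS problem above is a quadratic program over the simplex. A minimal numerical sketch (our illustration, not the authors' code; the function name `fit_scm_ls` and the use of SLSQP via `scipy.optimize.minimize` are our choices):

```python
# Constrained least squares for SCM weights: minimize pretreatment fit
# subject to the simplex constraint (weights sum to one, nonnegative).
import numpy as np
from scipy.optimize import minimize

def fit_scm_ls(y0_pre, Y_pre):
    """y0_pre: (T0,) treated outcomes; Y_pre: (T0, J) untreated outcomes."""
    T0, J = Y_pre.shape

    def loss(w):
        resid = y0_pre - Y_pre @ w
        return np.mean(resid ** 2)

    w_init = np.full(J, 1.0 / J)                       # start from uniform weights
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * J                          # w_j >= 0 (and <= 1 given the sum)
    res = minimize(loss, w_init, bounds=bounds, constraints=constraints)
    return res.x

# Example: predict the counterfactual path after T0 with the fitted weights.
# y0_hat_post = Y_post @ fit_scm_ls(y0_pre, Y_pre)
```

The simplex constraint is what keeps the synthetic control unit an interpolation of the untreated units rather than an arbitrary regression fit.
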
  8. Inconsistency of the LS Estimator
     • Ferman and Pinto (2021) show that the LS estimator is inconsistent; that is, $\widehat{w}^{\mathrm{LS}}_j \xrightarrow{\mathrm{p}} \widetilde{w}_j \neq w^*_j$.
       • They propose another LS-based estimator that reduces the bias.
       • However, that estimator is still biased.
     • Their results imply that the LS estimator is incompatible with SCMs under the linearity assumption $\mathbb{E}[Y^0_{0,t}] = \sum_{j \in \mathcal{J}^c} w^*_j \mathbb{E}[Y^0_{j,t}]$.

  9. Implicit Endogeneity
     • We investigate this problem from the viewpoint of endogeneity.
       • Let $Y^0_{j,t} = \mathbb{E}_{j,t}[Y^0_{j,t}] + \varepsilon_{j,t}$.
       • Under $\mathbb{E}[Y^0_{0,t}] = \sum_{j \in \mathcal{J}^c} w^*_j \mathbb{E}[Y^0_{j,t}]$, it holds that
         $Y^0_{0,t} = \sum_{j \in \mathcal{J}^c} w^*_j Y^0_{j,t} - \sum_{j \in \mathcal{J}^c} w^*_j \varepsilon_{j,t} + \varepsilon_{0,t} = \sum_{j \in \mathcal{J}^c} w^*_j Y^0_{j,t} + v_t$.
     • Implicit endogeneity (measurement-error bias): $Y^0_{j,t}$ and $v_t$ are correlated.
       • There is an (implicit) endogeneity between the explanatory variables and the error term.
       • This is why the LS estimator $\widehat{w}^{\mathrm{LS}}_j$ is biased; that is, $\widehat{w}^{\mathrm{LS}}_j \xrightarrow{\mathrm{p}} \widetilde{w}_j \neq w^*_j$ (see the simulation sketch below).

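To see the mechanism concretely, here is a small simulation (ours, not from the paper; the data-generating process is a simplifying assumption) in which linearity holds in expectation but the regressors are noisy, so LS exhibits the classic measurement-error attenuation:

```python
# Implicit endogeneity demo: the regressors Y_{j,t}^0 are noisy versions of
# their means, so regressing the treated outcome on them is biased toward zero.
import numpy as np

rng = np.random.default_rng(0)
T0 = 200
w_star = np.array([0.7, 0.3])                # true weights

mu = 2.0 * rng.normal(size=(T0, 2))          # E[Y_{j,t}^0]: unit-level mean paths
Y = mu + rng.normal(size=(T0, 2))            # observed untreated outcomes (mean + noise)
y0 = mu @ w_star + rng.normal(size=T0)       # treated unit: linearity holds in expectation

# Unconstrained LS for simplicity: the estimate does not converge to w_star.
w_ls = np.linalg.lstsq(Y, y0, rcond=None)[0]
print("true:", w_star, "LS estimate:", w_ls)  # attenuated even as T0 grows
```
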
  10. Mixture Models
     • The implicit endogeneity implies that the LS estimator is incompatible with SCMs.
     • Consider another estimation strategy: assume mixture models and estimate the weights by the GMM.
       • $p_{j,t}(y)$: density of $Y^0_{j,t}$.
       • Mixture model linking $p_{0,t}(y)$ and $\{p_{j,t}(y)\}_{j \in \mathcal{J}^c}$: $p_{0,t}(y) = \sum_{j \in \mathcal{J}^c} w^*_j \, p_{j,t}(y)$.

  11. Fine-Grained Models
     • Assuming mixture models is stronger than assuming $\mathbb{E}[Y^0_{0,t}] = \sum_{j \in \mathcal{J}^c} w^*_j \mathbb{E}[Y^0_{j,t}]$.
     • Mixture models can be justified from the viewpoint of fine-grained models (Shi et al., 2022).
     • Linear factor models are usually assumed in SCMs:
       $Y^0_{j,t} = c_j + \delta_t + \lambda_t \mu_j + \varepsilon_{j,t}, \qquad Y^1_{j,t} = \tau_{0,t} + Y^0_{j,t}$.
       • Shi et al. (2022) find that mixture models imply factor models under some assumptions.

  12. Fine-Grained Models
     • Fine-grained models (Shi et al., 2022).
     • Assume that $Y^0_{j,t}$ represents a group-level outcome.
     • In each unit $j$, there are unobserved small units $Y^0_{j,t,1}, Y^0_{j,t,2}, \dots$ that constitute $Y^0_{j,t}$.
     → Under some assumptions,
       • each $p_{j,t}(y)$ can be linked to the linear factor model, and
       • $p_{0,t}(y) = \sum_{j \in \mathcal{J}^c} w^*_j \, p_{j,t}(y)$ holds.
     [Diagram: group-level outcomes $Y^0_{1,t}, Y^0_{2,t}, Y^0_{3,t}$, each composed of unobserved sub-unit outcomes $Y^0_{j,t,1}, Y^0_{j,t,2}, \dots$]

  13. Moment Conditions
     • Under the mixture models, the following moment conditions hold:
       $\mathbb{E}_{0,t}\big[(Y^0_{0,t})^\gamma\big] = \sum_{j \in \mathcal{J}^c} w^*_j \, \mathbb{E}_{j,t}\big[(Y^0_{j,t})^\gamma\big] \quad \forall \gamma \in \mathbb{R}_+$.
     • Empirical approximation of $\mathbb{E}_{0,t}\big[(Y^0_{0,t})^\gamma\big] - \sum_{j \in \mathcal{J}^c} w^*_j \, \mathbb{E}_{j,t}\big[(Y^0_{j,t})^\gamma\big]$:
       $\widehat{m}_\gamma(w) := \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \Big( (Y^0_{0,t})^\gamma - \sum_{j \in \mathcal{J}^c} w_j (Y^0_{j,t})^\gamma \Big)$.
     • We estimate $w$ to achieve $\widehat{m}_\gamma(w) \approx 0$.

  14. GMM
     • A set of positive values $\Gamma := \{1, 2, 3, \dots, G\}$, e.g., $\Gamma = \{1, 2, 3, 4, 5\}$.
     • Estimate $w^*_j$ as
       $\{\widehat{w}^{\mathrm{GMM}}_j\}_{j \in \mathcal{J}^c} := \arg\min_{\{w_j\}: \sum_{j \in \mathcal{J}^c} w_j = 1} \sum_{\gamma \in \Gamma} \big( \widehat{m}_\gamma(w) \big)^2$.
     • We can weight each empirical moment condition; that is, using some weights $v_\gamma \in \mathbb{R}_+$,
       $\{\widehat{w}^{\mathrm{GMM}}_j\}_{j \in \mathcal{J}^c} := \arg\min_{\{w_j\}: \sum_{j \in \mathcal{J}^c} w_j = 1} \sum_{\gamma \in \Gamma} v_\gamma \big( \widehat{m}_\gamma(w) \big)^2$.
     • We can show that the GMM estimator is asymptotically unbiased; that is, $\widehat{w}^{\mathrm{GMM}}_j \xrightarrow{\mathrm{p}} w^*_j$ (a code sketch follows below).

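A minimal sketch of this moment-matching estimator (our illustration; the function name, the nonnegativity bounds, and the default equal moment weights $v_\gamma = 1$ are our assumptions):

```python
# GMM for SCM weights: match empirical moments (Y_{0,t}^0)^gamma to weighted
# sums of (Y_{j,t}^0)^gamma over gamma in {1, ..., G}, with sum(w) = 1.
import numpy as np
from scipy.optimize import minimize

def fit_scm_gmm(y0_pre, Y_pre, G=5, v=None):
    """y0_pre: (T0,) treated outcomes; Y_pre: (T0, J) untreated; G: # of moments."""
    T0, J = Y_pre.shape
    gammas = np.arange(1, G + 1)
    v = np.ones(G) if v is None else v               # optional moment weights v_gamma

    def objective(w):
        total = 0.0
        for v_g, g in zip(v, gammas):
            m_g = np.mean(y0_pre ** g - (Y_pre ** g) @ w)  # empirical moment m_hat_gamma(w)
            total += v_g * m_g ** 2
        return total

    w_init = np.full(J, 1.0 / J)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * J
    return minimize(objective, w_init, bounds=bounds, constraints=constraints).x
```
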
  15. Inference
     • Hypothesis testing of the sharp null $H_0: \tau_{0,t} = 0$ for $t \in \mathcal{T}_1$.
       • Note that $\tau_{0,t} = Y^1_{0,t} - Y^0_{0,t}$ under the linear factor model.
     • We usually employ conformal inference for testing this hypothesis.
       • It tests the sharp null nonparametrically.
       • Drawback: computational cost.

  16. Simulation Studies
     • $G$ is chosen from $\{2, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$; $J$ is chosen from $\{10, 30, 60\}$.
     • Recall that
       $\{\widehat{w}^{\mathrm{GMM}}_j\}_{j \in \mathcal{J}^c} := \arg\min_{\{w_j\}: \sum_{j \in \mathcal{J}^c} w_j = 1} \sum_{\gamma \in \{1, 2, \dots, G\}} \Big( \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \big( (Y^0_{0,t})^\gamma - \sum_{j \in \mathcal{J}^c} w_j (Y^0_{j,t})^\gamma \big) \Big)^2$.
     • Generate $Y^0_{j,t}$ from Gaussian distributions.
     • [Figure: the y-axis denotes the estimation error and the x-axis denotes $G$.]

  17. Empirical Studies
     • Empirical analysis using case studies from existing studies:
       • Tobacco control in California (Abadie, Diamond, and Hainmueller, 2010).
       • Conflict in the Basque Country (Abadie and Gardeazabal, 2003).
       • Reunification of Germany (Abadie, Diamond, and Hainmueller, 2015).
     • Pretreatment fit: predictive ability for outcomes for $t \in \mathcal{T}_0$.

  18. Bayesian SCMs
     • We have introduced frequentist methods for SCMs.
     • Frequentist SCMs require:
       • large samples to show the convergence of the weight estimators;
       • special inference methods, such as conformal inference;
       • distance minimization to employ covariates, which is not easy to justify.
     • Consider a Bayesian approach to SCMs:
       • works with finite samples;
       • inference via the posterior distribution.

  19. Bayesian Predictive Synthesis
     • Our Bayesian SCMs are based on the formulation of Bayesian predictive synthesis (BPS).
     • BPS: a method for synthesizing predictive models (McAlinn and West, 2019).
       • Synthesizes predictive models while reflecting model uncertainty.
       • A generalization of Bayesian model averaging.
       • Incorporates various predictive models, weighting them with time-varying parameters.
     • We regard the untreated outcomes, and predictive models for those outcomes built from covariates, as predictors of $Y^0_{0,t}$.
       • We first predict the outcomes using covariates.
       • Then, we incorporate the predictors using the BPS.

  20. BPSCM
     • We propose SCMs based on the BPS, referred to as BPSCMs.
     • BPSCM:
       • $\Phi_t$: a set of time-varying parameters at $t$; $\Phi_t$ depends on $\{Y^0_{0,s}\}_{s \in [1:t]}$.
       • The conditional density function of $Y^0_{0,t+1}$ is referred to as the synthesis function, denoted by $\alpha\big(y \mid \{Y^0_{j,t}\}_{j \in \mathcal{J}^c}, \Phi_t\big)$.
       • The Bayesian decision maker predicts $Y^0_{0,t+1}$ using the posterior distribution defined as
         $p\big(y \mid \{Y^0_{0,s}\}_{s \in [1:t]}, \Phi_t\big) := \int \alpha\big(y \mid \{y^0_{j,t}\}_{j \in \mathcal{J}^c}, \Phi_t\big) \prod_{j \in \mathcal{J}^c} p_{j,t}(y^0_{j,t}) \, \mathrm{d}y^0_{j,t}$.

  21. Dynamic Latent Factor Linear Regression Models
     • There are several specifications for the synthesis function.
     • Example: latent factor dynamic linear model.
       • Set the synthesis function as
         $\alpha\big(y^0_{0,t} \mid \{Y^0_{j,t}\}_{j \in \mathcal{J}^c}, \Phi_t\big) = \phi\big(y^0_{0,t};\ w_{0,t} + \sum_{j=1}^{J} w_{j,t} Y^0_{j,t},\ \nu_t\big)$.
       • $\phi(\cdot;\ a, b^2)$: a univariate normal density with mean $a$ and variance $b^2$.
       • $\epsilon_t$ and $\eta_{j,t}$ below are unobserved error terms.
       • Specify the processes of $Y^0_{0,t}$ and $w_{j,t}$ as (a simulation sketch follows below)
         $Y^0_{0,t} = w_{0,t} + \sum_{j \in \mathcal{J}^c} w_{j,t} Y^0_{j,t} + \epsilon_t, \quad \epsilon_t \sim N(0, \nu_t)$,
         $w_{j,t} = w_{j,t-1} + \eta_{j,t}, \quad \eta_{j,t} \sim N(0, \nu_t \boldsymbol{W}_t)$.

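For intuition, a short forward simulation of this model (ours, not from the paper; the constant variance $\nu_t = \nu$, the diagonal $\boldsymbol{W}_t$, and the toy untreated paths are simplifying assumptions):

```python
# Latent factor dynamic linear model: random-walk weights w_{j,t} (state
# equation) feeding the observation equation for Y_{0,t}^0.
import numpy as np

rng = np.random.default_rng(1)
T, J = 100, 3
nu = 0.25                                   # observation variance nu_t (held constant here)
W = 0.01 * np.eye(J + 1)                    # state evolution covariance W_t (intercept + J slopes)

Y = rng.normal(size=(T, J)).cumsum(axis=0)  # untreated outcome paths (illustrative)
w = np.zeros((T, J + 1))                    # w[t] = (w_{0,t}, w_{1,t}, ..., w_{J,t})
w[0] = rng.normal(scale=0.1, size=J + 1)
y0 = np.zeros(T)

for t in range(T):
    if t > 0:                               # state equation: w_t = w_{t-1} + eta_t
        w[t] = w[t - 1] + rng.multivariate_normal(np.zeros(J + 1), nu * W)
    # observation equation: Y_{0,t}^0 = w_{0,t} + sum_j w_{j,t} Y_{j,t}^0 + eps_t
    y0[t] = w[t, 0] + Y[t] @ w[t, 1:] + rng.normal(scale=np.sqrt(nu))
```

In practice, posterior inference for such dynamic linear models typically proceeds by forward filtering and backward sampling (West and Harrison, 1997).
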
  22. Auxiliary Covariates
     • The BPSCM can use covariates by predicting outcomes with various predictive models.
       • $X_{j,t}$: covariates for unit $j$.
     • Define $L$ predictors for $Y^0_{j,t}$ by $\{\widehat{f}_\ell(X_{j,t})\}_{\ell=1}^{L}$.
       • These predictors can be constructed with machine-learning methods.
       • We can use the covariates through the predictive models.
     • Together with the original untreated outcomes $Y^0_{j,t}$, there are $K = 1 + L$ predictors per untreated unit, $\{Y^0_{j,t}, \{\widehat{f}_\ell(X_{j,t})\}_{\ell=1}^{L}\}_{j=1}^{J}$.
     • We incorporate them by using the BPS.

  23. Auxiliary Covariates
     • The set of predictors is denoted by
       $\boldsymbol{Z}_t = \big( Y^0_{1,t}, \dots, Y^0_{J,t},\ \widehat{f}_1(X_{1,t}), \dots, \widehat{f}_L(X_{1,t}),\ \dots,\ \widehat{f}_1(X_{J,t}), \dots, \widehat{f}_L(X_{J,t}) \big)$.
     • Conduct the BPSCM as if there are $J + JL$ untreated units that can be used for SCMs:
       $p\big(y \mid \Phi_t, \{Y^0_{0,s}\}_{s \in [1:t]}\big) = \int \alpha\big(y \mid \boldsymbol{z}_t, \Phi_t\big) \prod_{k \in \{1, 2, \dots, (1+L)J\}} p_{k,t}(z_{k,t}) \, \mathrm{d}z_{k,t}$.
     • Example: synthesize predictive models such as linear regression and random forests (a construction sketch follows below).

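A sketch of the construction (our illustration; the toy data-generating process and the choice of $L = 2$ models per unit, linear regression and random forest, are assumptions):

```python
# Build the augmented predictor set Z_t: the J untreated outcomes plus L = 2
# fitted models per unit, giving (1 + L) * J columns for the BPS step.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
T, J, D = 120, 4, 3                          # periods, untreated units, covariate dim
X = rng.normal(size=(T, J, D))               # covariates X_{j,t}
Y = X.sum(axis=2) + rng.normal(size=(T, J))  # untreated outcomes Y_{j,t}^0 (toy DGP)

columns = []
for j in range(J):
    columns.append(Y[:, j])                  # the raw untreated outcome
    for model in (LinearRegression(), RandomForestRegressor(n_estimators=100)):
        model.fit(X[:, j, :], Y[:, j])       # f_hat_l fitted per unit from covariates
        columns.append(model.predict(X[:, j, :]))

Z = np.column_stack(columns)                 # shape (T, (1 + 2) * J): input to the BPS
```
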
  24. Advantages of the BPSCM
     ✓ Time-varying parameters.
     ✓ Incorporates the uncertainty of each untreated unit's outcome.
     ✓ Minimax optimality.
       • Even under model misspecification, the BPSCM predictor is minimax optimal in terms of KL divergence (Takanashi and McAlinn, 2021).
       • Avoid the implicit endogeneity problem?
     ✓ Works with finite samples.
     ✓ Inference (via the posterior distribution).

  25. Empirical Analysis
     • Empirical studies using the same case studies as in the previous slides.
     • Compare the following five prediction models:

       Method              | Time-varying coef.s | Using covariates | Synthesized predictive models
       Abadie              | ✓                   | -                | -
       BPSCM               | ✓                   | -                | -
       BPSCM (Linear)      | ✓                   | ✓                | Least squares
       BPSCM (RF)          | ✓                   | ✓                | Random forests
       BPSCM (Linear + RF) | ✓                   | ✓                | Least squares + random forests

  26. • SCMs suffer from the issue of inconsistency.
       • The LS estimator is incompatible with the assumption $\mathbb{E}[Y^0_{0,t}] = \sum_{j \in \mathcal{J}^c} w^*_j \mathbb{E}[Y^0_{j,t}]$.
       → Implicit endogeneity (measurement-error bias).
       • $Y^0_{0,t} = \sum_{j \in \mathcal{J}^c} w^*_j Y^0_{j,t}$ is not realistic...?
     • Frequentist density matching (mixture model + GMM).
       • The mixture model $p_{0,t}(y) = \sum_{j \in \mathcal{J}^c} w^*_j \, p_{j,t}(y)$ is a stronger assumption than $\mathbb{E}[Y^0_{0,t}] = \sum_{j \in \mathcal{J}^c} w^*_j \mathbb{E}[Y^0_{j,t}]$.
       • Under this assumption, the GMM estimates the weights consistently.
     • BPSCM.
       • The Bayesian method yields a minimax optimal predictor without assuming mixture models.
       • Additional advantages: flexible modeling and finite-sample inference.

  27. • Abadie, A. and Gardeazabal, J. "The economic costs of conflict: A case study of the Basque Country." American Economic Review, 2003.
     • Abadie, A., Diamond, A., and Hainmueller, J. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association, 2010.
     • Abadie, A., Diamond, A., and Hainmueller, J. "Comparative politics and the synthetic control method." American Journal of Political Science, 2015.
     • Ferman, B. and Pinto, C. "Synthetic controls with imperfect pretreatment fit." Quantitative Economics, 12(4):1197–1221, 2021.
     • McAlinn, K. and West, M. "Dynamic Bayesian predictive synthesis in time series forecasting." Journal of Econometrics, 2019.
     • McAlinn, K., Aastveit, K. A., Nakajima, J., and West, M. "Multivariate Bayesian predictive synthesis in macroeconomic forecasting." Journal of the American Statistical Association, 2020.
     • Shi, C., Sridhar, D., Misra, V., and Blei, D. "On the assumptions of synthetic control methods." In AISTATS, pp. 7163–7175, 2022.
     • Takanashi, K. and McAlinn, K. "Predictions with dynamic Bayesian predictive synthesis are exact minimax." 2021.
     • West, M. and Harrison, P. J. "Bayesian Forecasting and Dynamic Models." Springer Verlag, 2nd edition, 1997.