Slide 1

Slide 1 text

Synthetic Control Methods through Predictive Synthesis
Masahiro Kato (The University of Tokyo)
Coauthors: Akira Fukuda, Kosaku Takanashi, Kenichiro McAlinn, Akari Ohda, Masaaki Imaizumi
Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity (https://arxiv.org/abs/2307.11127)
Paper 2: Bayesian Predictive Synthetic Control Methods (https://drive.google.com/file/d/1veWTQTuWTx2gAMyh7VSZnenxsVqs1nla/view)
Speaker Deck: https://speakerdeck.com/masakat0/synthetic-control-methods-through-predictive-synthesis?slide=25

Slide 2

Slide 2 text

Synthetic Control Methods
Ø Synthetic control methods (SCMs; Abadie and Gardeazabal, 2003).
n Core idea.
• There are several units. One unit among them receives a policy intervention (the treated unit).
• The policy intervention affects the outcomes of the treated unit.
• We cannot observe the outcomes the treated unit would have had without the policy intervention.
• Estimate the counterfactual outcomes of the treated unit by a weighted sum of the observed outcomes of the untreated units.
• Then, using the estimated outcomes, estimate the causal effect on the treated unit.

Slide 3

Slide 3 text

Problem Setting
n J + 1 units, j ∈ 𝒥 ≔ {0, 1, 2, …, J}.
• j = 0: treated unit (the unit affected by the policy intervention).
• j ∈ 𝒥₁ ≔ 𝒥 ∖ {0}: untreated units.
n T periods, t ∈ 𝒯 ≔ {1, 2, …, T}.
• The intervention occurs at t = T₀ < T.
• t ∈ 𝒯₀ ≔ {1, 2, …, T₀}: before the intervention.
• t ∈ 𝒯₁ ≔ 𝒯 ∖ 𝒯₀: after the intervention (T₁ ≔ |𝒯₁| = T − T₀).

Slide 4

Slide 4 text

Problem Setting
Ø Potential outcomes (Neyman, 1923; Rubin, 1974):
n For each unit j ∈ 𝒥 and period t ∈ 𝒯, define potential outcomes Y_{j,t}^I, Y_{j,t}^N ∈ ℝ.
• Y_{j,t}^I and Y_{j,t}^N are the potential outcomes with and without the intervention.
• 𝔼_{j,t}: expectation over Y_{j,t}^I and Y_{j,t}^N.
Ø Observations:
n We observe one of the outcomes, Y_{j,t} ∈ ℝ, corresponding to the actual intervention; that is,
Y_{0,t} = Y_{0,t}^I if t ∈ 𝒯₁,  Y_{0,t} = Y_{0,t}^N if t ∈ 𝒯₀,  and Y_{j,t} = Y_{j,t}^N for j ∈ 𝒥₁.

Slide 5

Slide 5 text

Problem Setting
Ø Causal effects: τ_{0,t} ≔ 𝔼_{0,t}[Y_{0,t}^I − Y_{0,t}^N] for t ∈ 𝒯₁.
n We estimate the causal effect by predicting Y_{0,t}^N for t ∈ 𝒯₁.
Ø Core idea.
n Predict Y_{0,t}^N by a weighted sum of Y_{1,t}^N, …, Y_{J,t}^N:
Ŷ_{0,t}^N = Σ_{j∈𝒥₁} w_j Y_{j,t}^N.
• Ŷ_{0,t}^N is a counterfactual trend of the treated unit.
• Ŷ_{0,t}^N is called a synthetic control unit.
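The core idea above can be sketched numerically. A minimal illustration with made-up Gaussian outcome data, fixed weights, and an assumed constant post-treatment effect of +2 (all numbers are hypothetical, not from the papers):

```python
import numpy as np

rng = np.random.default_rng(0)

J, T, T0 = 3, 12, 8                              # 3 untreated units, intervention after t = T0
Y_untreated = rng.normal(5.0, 1.0, size=(J, T))  # observed Y_{j,t}^N for j in J_1

w = np.array([0.2, 0.3, 0.5])                    # weights summing to 1; taken as given here

# Synthetic control unit: weighted sum of the untreated outcomes.
Y0_hat_N = w @ Y_untreated                       # estimated counterfactual Y_{0,t}^N, length T

# Suppose the observed treated outcome equals the counterfactual plus a +2 effect after T0.
Y0_obs = Y0_hat_N + np.where(np.arange(T) >= T0, 2.0, 0.0)

# Estimated causal effect tau_{0,t} in the post-treatment periods.
tau_hat = Y0_obs[T0:] - Y0_hat_N[T0:]
```

By construction the estimated effect is exactly the injected +2 in every post-treatment period; with real data, `tau_hat` would vary around the true effect.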

Slide 6

Slide 6 text

Contents
Ø The research questions mainly lie in the estimation of the weights w₁, …, w_J.
n Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity.
• Estimators in existing SCMs are not consistent (Ferman and Pinto, 2021).
• We discuss the inconsistency problem from the viewpoint of endogeneity.
• We propose frequentist SCMs based on the generalized method of moments (GMM).
n Paper 2: Bayesian Predictive Synthetic Control Methods.
• We apply Bayesian predictive synthesis to SCMs.
• Flexible modeling with time-varying parameters, finite-sample analysis, and minimax optimality.

Slide 7

Slide 7 text

Synthetic Control Methods by Density Matching under Implicit Endogeneity

Slide 8

Slide 8 text

Least-Squares Estimator
n In standard SCMs, we usually estimate the weights by constrained least squares.
• That is, we estimate w_j as
{ŵ_j^{LS}}_{j∈𝒥₁} = argmin_w (1/T₀) Σ_{t∈𝒯₀} (Y_{0,t}^N − Σ_{j∈𝒥₁} w_j Y_{j,t}^N)²
such that Σ_{j∈𝒥₁} w_j = 1 and w_j ≥ 0 for all j ∈ 𝒥₁.
n To justify the least-squares (LS) estimator, we assume linearity in the expected outcomes:
𝔼[Y_{0,t}^N] = Σ_{j∈𝒥₁} w_j* 𝔼[Y_{j,t}^N].
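The constrained least-squares problem above can be sketched with a generic solver; a minimal implementation, assuming hypothetical pre-treatment data and using `scipy.optimize.minimize` with simplex constraints (the function name `scm_ls_weights` is ours, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize

def scm_ls_weights(Y0_pre, Y_pre):
    """Constrained least squares for SCM weights.

    Y0_pre : (T0,) pre-treatment outcomes of the treated unit.
    Y_pre  : (J, T0) pre-treatment outcomes of the untreated units.
    Weights are restricted to the simplex: nonnegative, summing to 1.
    """
    J = Y_pre.shape[0]

    def objective(w):
        return np.mean((Y0_pre - w @ Y_pre) ** 2)

    res = minimize(
        objective,
        x0=np.full(J, 1.0 / J),
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        method="SLSQP",
    )
    return res.x

# Usage: on noiseless data that is exactly a convex combination, the weights are recovered.
rng = np.random.default_rng(1)
Y_pre = rng.normal(0.0, 1.0, size=(3, 100))
w_true = np.array([0.1, 0.3, 0.6])
w_hat = scm_ls_weights(w_true @ Y_pre, Y_pre)
```

Note this only shows the optimization; the inconsistency discussed on the next slides concerns what happens when the outcomes are noisy.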

Slide 9

Slide 9 text

Inconsistency of the LS Estimator
n Ferman and Pinto (2021) show that the LS estimator is inconsistent; that is,
ŵ_j^{LS} →p w̄_j ≠ w_j*.
• They propose another LS-based estimator that reduces the bias.
• However, that estimator is still biased.
n Their results imply that the LS estimator is incompatible with SCMs under the linearity assumption 𝔼[Y_{0,t}^N] = Σ_{j∈𝒥₁} w_j* 𝔼[Y_{j,t}^N].

Slide 10

Slide 10 text

Implicit Endogeneity
n We investigate this problem from the viewpoint of endogeneity.
• Let Y_{j,t}^N = 𝔼_{j,t}[Y_{j,t}^N] + ε_{j,t}.
• Under 𝔼[Y_{0,t}^N] = Σ_{j∈𝒥₁} w_j* 𝔼[Y_{j,t}^N], it holds that
Y_{0,t}^N = Σ_{j∈𝒥₁} w_j* Y_{j,t}^N − Σ_{j∈𝒥₁} w_j* ε_{j,t} + ε_{0,t} = Σ_{j∈𝒥₁} w_j* Y_{j,t}^N + v_t.
n Implicit endogeneity (measurement-error bias): correlation between Y_{j,t}^N and v_t.
• There is an (implicit) endogeneity between the explanatory variables and the error term.
• This is the reason why the LS estimator ŵ_j^{LS} is biased; that is, ŵ_j^{LS} →p w̄_j ≠ w_j*.
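The correlation behind the implicit endogeneity can be checked directly by simulation. A sketch under assumed Gaussian noise with unit variance and weights w* = (0.5, 0.5): since v_t = ε_{0,t} − Σ_j w_j* ε_{j,t}, we expect Cov(Y_{1,t}^N, v_t) = −w_1* Var(ε_{1,t}) = −0.5 (all distributional choices here are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
w_star = np.array([0.5, 0.5])
mu = np.array([1.0, 3.0])                # deterministic means E[Y_{j,t}^N]

eps = rng.normal(0.0, 1.0, size=(2, T))  # unit-level noise eps_{j,t}, variance 1
eps0 = rng.normal(0.0, 1.0, size=T)      # treated unit's noise eps_{0,t}
Y = mu[:, None] + eps                    # Y_{j,t}^N = E[Y_{j,t}^N] + eps_{j,t}

# Composite error term: v_t = eps_{0,t} - sum_j w_j^* eps_{j,t}
v = eps0 - w_star @ eps

# Sample covariance between the regressor Y_{1,t}^N and the error v_t.
cov = np.cov(Y[0], v)[0, 1]
```

The sample covariance comes out near −0.5 rather than 0, which is exactly the regressor–error correlation that biases the LS weights.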

Slide 11

Slide 11 text

Mixture Models
n The implicit endogeneity implies that the LS estimator is incompatible with SCMs.
Ø Consider another estimation strategy.
n Assume mixture models and estimate the weights by the GMM.
• p_{j,t}(y): density of Y_{j,t}^N.
n Mixture model linking p_{0,t}(y) and {p_{j,t}(y)}_{j∈𝒥₁}:
p_{0,t}(y) = Σ_{j∈𝒥₁} w_j* p_{j,t}(y).

Slide 12

Slide 12 text

Fine-Grained Models
n Assuming mixture models is stronger than assuming 𝔼[Y_{0,t}^N] = Σ_{j∈𝒥₁} w_j* 𝔼[Y_{j,t}^N].
Ø Mixture models can be justified from the viewpoint of fine-grained models (Shi et al., 2022).
n Linear factor models are usually assumed in SCMs:
Y_{j,t}^N = c_j + δ_t + λ_t μ_j + ε_{j,t},  Y_{j,t}^I = τ_{0,t} + Y_{j,t}^N.
• Shi et al. (2022) find that mixture models imply factor models under some assumptions.

Slide 13

Slide 13 text

Fine-Grained Models
Ø Fine-grained models (Shi et al., 2022).
n Assume that Y_{j,t}^N represents a group-level outcome.
n In each unit j, there are unobserved small units Y_{j,t,1}^N, Y_{j,t,2}^N, ….
= In each unit, there are unobserved units that constitute Y_{j,t}^N.
→ Under some assumptions,
• each p_{j,t}(y) can be linked to the linear factor model, and
• p_{0,t}(y) = Σ_{j∈𝒥₁} w_j* p_{j,t}(y) holds.
[Figure: each group-level outcome Y_{j,t}^N is composed of unobserved fine-grained units Y_{j,t,1}^N, Y_{j,t,2}^N, Y_{j,t,3}^N.]

Slide 14

Slide 14 text

Moment Conditions
Ø Moment conditions.
n Under the mixture models, the following moment conditions hold:
𝔼_{0,t}[(Y_{0,t}^N)^γ] = Σ_{j∈𝒥₁} w_j* 𝔼_{j,t}[(Y_{j,t}^N)^γ]  for all γ ∈ ℕ.
n Empirical approximation of 𝔼_{0,t}[(Y_{0,t}^N)^γ] − Σ_{j∈𝒥₁} w_j* 𝔼_{j,t}[(Y_{j,t}^N)^γ]:
m̂_γ(w) ≔ (1/T₀) Σ_{t∈𝒯₀} ((Y_{0,t}^N)^γ − Σ_{j∈𝒥₁} w_j (Y_{j,t}^N)^γ).
• We estimate w so that m̂_γ(w) ≈ 0.

Slide 15

Slide 15 text

GMM
n A set of positive integers Γ ≔ {1, 2, 3, …, G}, e.g., Γ = {1, 2, 3, 4, 5}.
n Estimate w_j* as
{ŵ_j^{GMM}}_{j∈𝒥₁} ≔ argmin_{w : Σ_{j∈𝒥₁} w_j = 1} Σ_{γ∈Γ} (m̂_γ(w))².
• We can weight each empirical moment condition; that is, using some weights v_γ > 0,
{ŵ_j^{GMM}}_{j∈𝒥₁} ≔ argmin_{w : Σ_{j∈𝒥₁} w_j = 1} Σ_{γ∈Γ} v_γ (m̂_γ(w))².
n We can show that the GMM estimator is consistent; that is, ŵ_j^{GMM} →p w_j*.
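The moment-matching idea can be sketched end to end on simulated data. A minimal illustration, not the paper's implementation: we assume each untreated unit's outcome is N(μ_j, 1), draw the treated unit from the implied mixture with true weights w* = (0.2, 0.3, 0.5), and minimize the sum of squared empirical moment conditions over the simplex with Γ = {1, 2, 3}:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical mixture setup: p_{0,t}(y) = sum_j w_j^* p_{j,t}(y).
w_star = np.array([0.2, 0.3, 0.5])
mu = np.array([0.0, 2.0, 4.0])
T0 = 50_000

Y_untreated = rng.normal(mu[:, None], 1.0, size=(3, T0))  # Y_{j,t}^N ~ N(mu_j, 1)
components = rng.choice(3, size=T0, p=w_star)
Y0 = rng.normal(mu[components], 1.0)                      # treated unit: mixture draw

Gamma = [1, 2, 3]                                         # moment orders

def gmm_objective(w):
    # Sum of squared empirical moment conditions m_gamma(w).
    total = 0.0
    for g in Gamma:
        m = np.mean(Y0 ** g) - w @ np.mean(Y_untreated ** g, axis=1)
        total += m ** 2
    return total

res = minimize(
    gmm_objective,
    x0=np.full(3, 1.0 / 3.0),
    bounds=[(0.0, 1.0)] * 3,
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    method="SLSQP",
)
w_hat = res.x
```

With the first three moments the weights are identified in this toy design, so `w_hat` lands close to `w_star` up to sampling error.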

Slide 16

Slide 16 text

Inference
n Hypothesis testing for the sharp null H₀: τ_{0,t} = 0 for t ∈ 𝒯₁.
• Note that τ_{0,t} = Y_{0,t}^I − Y_{0,t}^N under the linear factor model.
n We usually employ conformal inference for testing this hypothesis.
• It tests the sharp null nonparametrically.
• It incurs nontrivial computational costs.

Slide 17

Slide 17 text

Simulation Studies
n G is chosen from {2, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100}. J is chosen from {10, 30, 60}.
• Recall that
{ŵ_j^{GMM}}_{j∈𝒥₁} ≔ argmin_{w : Σ_{j∈𝒥₁} w_j = 1} Σ_{γ∈{1,2,…,G}} ((1/T₀) Σ_{t∈𝒯₀} ((Y_{0,t}^N)^γ − Σ_{j∈𝒥₁} w_j (Y_{j,t}^N)^γ))².
n We generate Y_{j,t}^N from Gaussian distributions.
n The y-axis denotes the estimation error, and the x-axis denotes G.

Slide 18

Slide 18 text

Empirical Studies
n Empirical analysis using case studies from existing studies.
• Tobacco control in California (Abadie, Diamond, and Hainmueller, 2010).
• The Basque conflict in the Basque Country (Abadie and Gardeazabal, 2003).
• The reunification of Germany (Abadie, Diamond, and Hainmueller, 2015).
n Pretreatment fit: predictive ability for the outcomes for t ∈ 𝒯₀.

Slide 19

Slide 19 text

Bayesian Predictive Synthetic Control Methods

Slide 20

Slide 20 text

Bayesian SCMs
n We have introduced frequentist methods for SCMs.
n Frequentist SCMs require
• large samples to show the convergence of the weight estimators;
• special inference methods, such as conformal inference;
• distance minimization to employ covariates, which is not easy to justify.
n We consider a Bayesian approach to SCMs.
• It works with finite samples.
• Inference is based on the posterior distribution.

Slide 21

Slide 21 text

Bayesian Predictive Synthesis
n Our Bayesian SCMs are based on the formulation of Bayesian predictive synthesis (BPS).
n BPS: a method for synthesizing predictive models (McAlinn and West, 2019).
• It synthesizes predictive models while reflecting model uncertainty.
• It is a generalization of Bayesian model averaging.
• It incorporates various predictive models by weighting them with time-varying parameters.
n We regard the untreated outcomes, and predictive models for those outcomes based on covariates, as predictors of Y_{0,t}^N.
• We first predict outcomes using covariates.
• Then, we incorporate the predictors using the BPS.

Slide 22

Slide 22 text

BPSCM
n We propose SCMs based on the BPS, referred to as BPSCMs.
Ø BPSCM.
• Φ_t: a set of time-varying parameters at time t. Φ_t depends on {Y_{0,s}^N}_{s∈[1:t]}.
n The conditional density function of Y_{0,t+1}^N is referred to as the synthesis function, denoted by α(y | {Y_{j,t}^N}_{j∈𝒥₁}, Φ_t).
n The Bayesian decision maker predicts Y_{0,t+1}^N using the posterior distribution defined as
p(y | {Y_{0,s}^N}_{s∈[1:t]}, Φ_t) ≔ ∫ α(y | {y_{j,t}^N}_{j∈𝒥₁}, Φ_t) Π_{j∈𝒥₁} p_{j,t}(y_{j,t}^N) dy_{j,t}^N.

Slide 23

Slide 23 text

Dynamic Latent Factor Linear Regression Models
n There are several specifications for the synthesis function.
n Ex. Latent factor dynamic linear model:
• Set the synthesis function as
α(y_{0,t}^N | {Y_{j,t}^N}_{j∈𝒥₁}, Φ_t) = φ(y_{0,t}^N; w_{0,t} + Σ_{j=1}^J w_{j,t} Y_{j,t}^N, ν_t).
• φ(·; a, b²): a univariate normal density with mean a and variance b².
• ν_t: the variance of the unobserved error terms.
• Specify the processes of Y_{0,t}^N and w_{j,t} as
Y_{0,t}^N = w_{0,t} + Σ_{j∈𝒥₁} w_{j,t} Y_{j,t}^N + ε_t,  ε_t ∼ N(0, ν_t),
w_{j,t} = w_{j,t−1} + η_{j,t},  η_{j,t} ∼ N(0, ν_t W_t).
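The state-space pair above can be simulated forward in a few lines. A sketch under simplifying assumptions of ours: constant observation variance ν, a scalar stand-in for the state innovation matrix W_t, and made-up untreated outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

J, T = 3, 100
nu = 0.25                 # observation variance nu_t, held constant here
W = 0.01                  # state innovation scale (scalar stand-in for W_t)

Y_untreated = rng.normal(5.0, 1.0, size=(J, T))

# Time-varying weights follow a random walk: w_{j,t} = w_{j,t-1} + eta_{j,t}.
w = np.zeros((J + 1, T))                       # row 0 holds the intercept w_{0,t}
w[:, 0] = np.concatenate(([0.0], np.full(J, 1.0 / J)))
for t in range(1, T):
    w[:, t] = w[:, t - 1] + rng.normal(0.0, np.sqrt(nu * W), size=J + 1)

# Observation equation: Y_{0,t}^N = w_{0,t} + sum_j w_{j,t} Y_{j,t}^N + eps_t.
Y0 = np.array([
    w[0, t] + w[1:, t] @ Y_untreated[:, t] + rng.normal(0.0, np.sqrt(nu))
    for t in range(T)
])
```

This only shows the generative side; fitting the model runs the corresponding forward filter over the posterior of (w_t, ν_t), which we omit here.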

Slide 24

Slide 24 text

Auxiliary Covariates
Ø The BPSCM can use covariates by predicting outcomes with various predictive models.
• X_{j,t}: covariates for unit j.
n Define L predictors for Y_{j,t}^N by {f̂_l(X_{j,t})}_{l=1}^L.
• These predictors can be constructed with machine learning methods.
• We can use covariates in the predictive models.
n Together with the original untreated outcome Y_{j,t}^N, there are K = 1 + L predictors per unit: {Y_{j,t}^N, {f̂_l(X_{j,t})}_{l=1}^L}.
n We incorporate them by using the BPS.

Slide 25

Slide 25 text

Auxiliary Covariates
n The set of predictors is denoted by
Z_t = (Y_{1,t}^N, …, Y_{J,t}^N, f̂_1(X_{1,t}), …, f̂_L(X_{1,t}), …, f̂_1(X_{J,t}), …, f̂_L(X_{J,t})).
n We conduct the BPSCM as if there were J + JL untreated units available for SCMs:
p(y | Φ_t, {Y_{0,s}^N}_{s∈[1:t]}) = ∫ α(y | z_t, Φ_t) Π_{k∈{1,2,…,(1+L)J}} p_{k,t}(z_{k,t}) dz_{k,t}.
Ex. Synthesize predictive models such as linear regression and random forests.
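Assembling Z_t is just stacking the J untreated outcomes with the L model predictions per unit. A sketch with toy data and two hypothetical fitted predictors of our own (a covariate mean and a fixed linear score; real f̂_l would be trained models):

```python
import numpy as np

rng = np.random.default_rng(0)
J, L = 4, 2
X = rng.normal(size=(J, 3))      # covariates X_{j,t} for each unit at a given t
Y_t = rng.normal(size=J)         # untreated outcomes Y_{j,t}^N at the same t

# Two hypothetical fitted predictors f_hat_l mapping covariates to an outcome forecast.
f_hats = [lambda x: x.mean(), lambda x: x @ np.array([0.5, 0.3, 0.2])]
assert len(f_hats) == L

# Z_t stacks the J untreated outcomes and the L predictions per unit: (1 + L) * J entries.
Z_t = np.concatenate([Y_t, np.array([f(X[j]) for j in range(J) for f in f_hats])])
```

The synthesis step then treats all (1 + L)·J entries of `Z_t` symmetrically, as if each were an untreated unit.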

Slide 26

Slide 26 text

Advantages of the BPSCM
ü Time-varying parameters.
ü Incorporates the uncertainty of each untreated unit's outcome.
ü Minimax optimality.
• Even under model misspecification, the predictor of the BPSCM is minimax optimal in terms of the KL divergence (Takanashi and McAlinn, 2021).
• Does it avoid the implicit endogeneity problem?
ü Works with finite samples.
ü Inference via the posterior distribution.

Slide 27

Slide 27 text

Empirical Analysis
Ø Empirical studies using the same case studies as in the previous slides.
n We compare the following five prediction models.

Model               | Time-varying coefs | Using covariates | Synthesized predictive models
Abadie              | ✓                  | -                |
BPSCM               | ✓                  | -                |
BPSCM (Linear)      | ✓                  | ✓                | Least squares
BPSCM (RF)          | ✓                  | ✓                | Random forests
BPSCM (Linear + RF) | ✓                  | ✓                | Least squares + random forests

Slide 28

Slide 28 text

Empirical Analysis n We mainly check the pretreatment fit and posterior distribution of the BPSCMs.

Slide 29

Slide 29 text

Conclusion

Slide 30

Slide 30 text

n SCMs suffer from an inconsistency issue.
• The LS estimator is incompatible with the assumption 𝔼[Y_{0,t}^N] = Σ_{j∈𝒥₁} w_j* 𝔼[Y_{j,t}^N].
→ Implicit endogeneity (measurement-error bias).
• Y_{0,t}^N = Σ_{j∈𝒥₁} w_j* Y_{j,t}^N is not realistic…?
n Frequentist density matching (mixture model + GMM).
• Mixture model p_{0,t}(y) = Σ_{j∈𝒥₁} w_j* p_{j,t}(y): a stronger assumption than 𝔼[Y_{0,t}^N] = Σ_{j∈𝒥₁} w_j* 𝔼[Y_{j,t}^N].
• Using the GMM under this assumption, we can estimate the weights consistently.
n BPSCM.
• Using the Bayesian method, we can obtain the minimax optimal predictor without assuming mixture models.
• It has further advantages such as flexible modeling and finite-sample inference.

Slide 31

Slide 31 text

Reference

Slide 32

Slide 32 text

• Abadie, A. and Gardeazabal, J. "The economic costs of conflict: A case study of the Basque Country." American Economic Review, 2003.
• Abadie, A., Diamond, A., and Hainmueller, J. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association, 2010.
• Abadie, A., Diamond, A., and Hainmueller, J. "Comparative politics and the synthetic control method." American Journal of Political Science, 2015.
• Ferman, B. and Pinto, C. "Synthetic controls with imperfect pretreatment fit." Quantitative Economics, 12(4):1197–1221, 2021.
• McAlinn, K. and West, M. "Dynamic Bayesian predictive synthesis in time series forecasting." Journal of Econometrics, 2019.
• McAlinn, K., Aastveit, K. A., Nakajima, J., and West, M. "Multivariate Bayesian predictive synthesis in macroeconomic forecasting." Journal of the American Statistical Association, 2020.
• Shi, C., Sridhar, D., Misra, V., and Blei, D. "On the assumptions of synthetic control methods." In AISTATS, pp. 7163–7175, 2022.
• Takanashi, K. and McAlinn, K. "Predictions with dynamic Bayesian predictive synthesis are exact minimax." 2021.
• West, M. and Harrison, P. J. "Bayesian Forecasting and Dynamic Models." Springer-Verlag, 2nd edition, 1997.