MasaKat0
August 02, 2023

# Synthetic Control Methods through Predictive Synthesis

Presentation slides at EcoSta 2023.


## Transcript

1. ### Synthetic Control Methods through Predictive Synthesis

Masahiro Kato (The University of Tokyo)

Coauthors: Akira Fukuda, Kosaku Takanashi, Kenichiro McAlinn, Akari Ohda, Masaaki Imaizumi

- Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity (https://arxiv.org/abs/2307.11127)
- Paper 2: Bayesian Predictive Synthetic Control Methods (https://drive.google.com/file/d/1veWTQTuWTx2gAMyh7VSZnenxsVqs1nla/view)
- Speaker Deck: https://speakerdeck.com/masakat0/synthetic-control-methods-through-predictive-synthesis?slide=25
2. ### Synthetic Control Methods

Synthetic control methods (SCMs; Abadie and Gardeazabal, 2003).

- Core idea.
  - There are several units, one of which receives a policy intervention (the treated unit).
  - The policy intervention affects the outcomes of the treated unit.
  - We cannot observe the outcomes that the treated unit would have had without the policy intervention.
  - We estimate these counterfactual outcomes of the treated unit by a weighted sum of the observed outcomes of untreated units.
  - Then, using the estimated counterfactual outcomes, we estimate the causal effect of the intervention on the treated unit.
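3. ### Problem Setting

- $J + 1$ units, $j \in \mathcal{J} := \{0, 1, 2, \dots, J\}$.
  - $j = 0$: the treated unit (the unit affected by the policy intervention).
  - $j \in \mathcal{J}_0 := \mathcal{J} \setminus \{0\}$: untreated units.
- $T$ periods, $t \in \mathcal{T} := \{1, 2, \dots, T\}$.
  - The intervention occurs at $t = T_0 < T$.
  - $t \in \mathcal{T}_0 := \{1, 2, \dots, T_0\}$: before the intervention.
  - $t \in \mathcal{T}_1 := \mathcal{T} \setminus \mathcal{T}_0$: after the intervention ($T_1 := |\mathcal{T}_1| = T - T_0$).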
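4. ### Problem Setting

Potential outcomes (Neyman, 1923; Rubin, 1974):

- For each unit $j \in \mathcal{J}$ and period $t \in \mathcal{T}$, define potential outcomes $Y^I_{j,t}, Y^N_{j,t} \in \mathbb{R}$.
  - $Y^I_{j,t}$ and $Y^N_{j,t}$ are the potential outcomes with and without the intervention, respectively.
  - $\mathbb{E}_{j,t}$: expectation over $Y^I_{j,t}$ and $Y^N_{j,t}$.

Observations:

- We observe only one of the potential outcomes, $Y_{j,t} \in \mathbb{R}$, corresponding to the actual intervention; that is,
$$Y_{0,t} = \begin{cases} Y^I_{0,t} & \text{if } t \in \mathcal{T}_1, \\ Y^N_{0,t} & \text{if } t \in \mathcal{T}_0, \end{cases} \qquad Y_{j,t} = Y^N_{j,t} \ \text{ for } j \in \mathcal{J}_0.$$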
5. ### Problem Setting

Causal effects: $\tau_{0,t} := \mathbb{E}_{0,t}\big[Y^I_{0,t} - Y^N_{0,t}\big]$ for $t \in \mathcal{T}_1$.

- We estimate the causal effect by predicting $Y^N_{0,t}$ for $t \in \mathcal{T}_1$.

Core idea:

- Predict $Y^N_{0,t}$ by a weighted sum of $Y^N_{1,t}, \dots, Y^N_{J,t}$:
$$\widehat{Y}^N_{0,t} = \sum_{j \in \mathcal{J}_0} w_j Y^N_{j,t}.$$
  - $\widehat{Y}^N_{0,t}$ is a counterfactual trend of the treated unit.
  - $\widehat{Y}^N_{0,t}$ is called a synthetic control unit.
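
To make the two formulas concrete, here is a minimal NumPy sketch, assuming the weights $w_j$ have already been estimated: it builds the synthetic control $\widehat{Y}^N_{0,t} = \sum_{j} w_j Y^N_{j,t}$ and the per-period effect estimate $Y_{0,t} - \widehat{Y}^N_{0,t}$ for $t \in \mathcal{T}_1$. The function names and the array layout (rows are periods, columns are untreated units) are illustrative choices, not taken from the papers.

```python
import numpy as np


def synthetic_control(Y_untreated: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Synthetic control Y_hat^N_{0,t} = sum_j w_j Y^N_{j,t} for every period t.

    Y_untreated has shape (T, J): row t holds the outcomes of the J untreated units.
    """
    return Y_untreated @ w


def effect_estimates(y0: np.ndarray, Y_untreated: np.ndarray, w: np.ndarray, T0: int) -> np.ndarray:
    """Estimates of tau_{0,t} for the post-intervention periods (rows T0, ..., T-1)."""
    y0_hat = synthetic_control(Y_untreated, w)
    return y0[T0:] - y0_hat[T0:]


# Usage: three periods, two untreated units, intervention after period T0 = 2.
Y = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
y0 = np.array([1.5, 2.5, 6.0])
w = np.array([0.5, 0.5])
print(effect_estimates(y0, Y, w, T0=2))  # -> [2.]
```

Here the synthetic control reproduces the treated unit exactly in the two pre-intervention periods, so the whole post-intervention gap is attributed to the intervention.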
6. ### Contents

The research questions mainly concern the estimation of the weights $w_1, \dots, w_J$.

- Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity.
  - Estimators in existing SCMs are not consistent (Ferman and Pinto, 2021).
  - We discuss this inconsistency from the viewpoint of endogeneity.
  - We propose frequentist SCMs based on the generalized method of moments (GMM).
- Paper 2: Bayesian Predictive Synthetic Control Methods.
  - We apply Bayesian predictive synthesis to SCMs.
  - Flexible modeling with time-varying parameters, finite-sample analysis, and minimax optimality.

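8. ### Least-Squares Estimator

- In standard SCMs, we usually estimate the weights by constrained least squares:
$$\{\widehat{w}^{\mathrm{LS}}_j\}_{j \in \mathcal{J}_0} = \operatorname*{arg\,min}_{\{w_j\}_{j \in \mathcal{J}_0}} \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \Big( Y^N_{0,t} - \sum_{j \in \mathcal{J}_0} w_j Y^N_{j,t} \Big)^2 \quad \text{such that} \quad \sum_{j \in \mathcal{J}_0} w_j = 1, \ w_j \geq 0 \ \ \forall j \in \mathcal{J}_0.$$
- To justify the least-squares (LS) estimator, we assume linearity in the expected outcomes:
$$\mathbb{E}\big[Y^N_{0,t}\big] = \sum_{j \in \mathcal{J}_0} w^*_j \, \mathbb{E}\big[Y^N_{j,t}\big].$$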
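
The slides do not say which solver is used for this problem. As one possibility, the sketch below, assuming NumPy only, minimizes the LS objective by projected gradient descent, using the standard sort-based Euclidean projection onto the probability simplex to enforce $\sum_j w_j = 1$ and $w_j \ge 0$; a quadratic-programming solver is a common alternative. The names `project_to_simplex` and `sc_weights_ls` are hypothetical.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]   # largest k with a positive gap
    theta = (css[rho - 1] - 1.0) / rho     # shift that makes the result sum to one
    return np.maximum(v - theta, 0.0)


def sc_weights_ls(y0_pre: np.ndarray, Y_pre: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Constrained LS weights: minimise (1/T0) * ||y0 - Y w||^2 over the simplex.

    y0_pre: (T0,) pre-intervention outcomes of the treated unit.
    Y_pre:  (T0, J) pre-intervention outcomes of the untreated units.
    """
    T0, J = Y_pre.shape
    w = np.full(J, 1.0 / J)                                   # start at equal weights
    lipschitz = 2.0 * np.linalg.eigvalsh(Y_pre.T @ Y_pre).max() / T0
    step = 1.0 / lipschitz                                    # safe step for a convex quadratic
    for _ in range(n_iter):
        grad = -2.0 / T0 * Y_pre.T @ (y0_pre - Y_pre @ w)     # gradient of the LS objective
        w = project_to_simplex(w - step * grad)
    return w


rng = np.random.default_rng(0)
Y_pre = rng.normal(size=(50, 3))
y0_pre = Y_pre @ np.array([0.5, 0.3, 0.2])   # noiseless, so LS can recover the weights
w_ls = sc_weights_ls(y0_pre, Y_pre)
print(np.round(w_ls, 3))
```

With noiseless data whose true weights lie inside the simplex, the iterates converge to those weights; with noisy outcomes, the same code returns the LS estimate whose inconsistency is the subject of the following slides.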
9. ### Inconsistency of the LS Estimator

- Ferman and Pinto (2021) show that the LS estimator is inconsistent; that is, $\widehat{w}^{\mathrm{LS}}_j \xrightarrow{p} \widetilde{w}_j \neq w^*_j$.
  - They propose another LS-based estimator that reduces the bias.
  - However, that estimator is still biased.
- Their results imply that the LS estimator is incompatible with SCMs under the linearity assumption $\mathbb{E}[Y^N_{0,t}] = \sum_{j \in \mathcal{J}_0} w^*_j \mathbb{E}[Y^N_{j,t}]$.
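10. ### Implicit Endogeneity

- We investigate this problem from the viewpoint of endogeneity.
  - Let $Y^N_{j,t} = \mathbb{E}_{j,t}[Y^N_{j,t}] + \varepsilon_{j,t}$.
  - Under $\mathbb{E}[Y^N_{0,t}] = \sum_{j \in \mathcal{J}_0} w^*_j \mathbb{E}[Y^N_{j,t}]$, it holds that
$$Y^N_{0,t} = \sum_{j \in \mathcal{J}_0} w^*_j Y^N_{j,t} - \sum_{j \in \mathcal{J}_0} w^*_j \varepsilon_{j,t} + \varepsilon_{0,t} = \sum_{j \in \mathcal{J}_0} w^*_j Y^N_{j,t} + v_t, \qquad v_t := \varepsilon_{0,t} - \sum_{j \in \mathcal{J}_0} w^*_j \varepsilon_{j,t}.$$
- Implicit endogeneity (measurement error bias): $Y^N_{j,t}$ and $v_t$ are correlated, because both contain $\varepsilon_{j,t}$.
  - There is an (implicit) endogeneity between the explanatory variables and the error term.
  - This is why the LS estimator $\widehat{w}^{\mathrm{LS}}_j$ is biased; that is, $\widehat{w}^{\mathrm{LS}}_j \xrightarrow{p} \widetilde{w}_j \neq w^*_j$.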
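
The correlation can be checked by direct computation: if the noise terms $\varepsilon_{j,t}$ are mutually independent with variances $\sigma_j^2$, then $\operatorname{Cov}(Y^N_{j,t}, v_t) = \operatorname{Cov}\big(\varepsilon_{j,t}, \varepsilon_{0,t} - \sum_{k} w^*_k \varepsilon_{k,t}\big) = -w^*_j \sigma_j^2 \neq 0$. The small simulation below, assuming independent Gaussian noise, two untreated units, and two arbitrary deterministic trends chosen only for illustration, compares the empirical covariance with this value.

```python
import numpy as np


def endogeneity_demo(T: int = 100_000, seed: int = 0):
    """Return (empirical Cov(Y_j, v), theoretical -w*_j Var(eps_j)) for two untreated units."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    w_star = np.array([0.6, 0.4])            # true weights (illustrative)
    sigma = np.array([1.0, 0.5, 0.8])        # noise s.d. of units 0, 1, 2 (illustrative)
    # Expected untreated outcomes E[Y^N_{j,t}] of the untreated units (illustrative trends)
    mean_untreated = np.stack([np.sin(t / 50.0), np.cos(t / 70.0)], axis=1)
    # Linearity assumption: E[Y^N_{0,t}] = sum_j w*_j E[Y^N_{j,t}]
    mean_treated = mean_untreated @ w_star
    eps = rng.normal(size=(T, 3)) * sigma    # column j holds eps_{j,t}
    y0 = mean_treated + eps[:, 0]
    Y = mean_untreated + eps[:, 1:]
    # Composite error: v_t = eps_0 - sum_j w*_j eps_j
    v = y0 - Y @ w_star
    empirical = np.array([np.cov(Y[:, j], v)[0, 1] for j in range(2)])
    theoretical = -w_star * sigma[1:] ** 2
    return empirical, theoretical


emp, theo = endogeneity_demo()
print(emp, theo)   # the two vectors should be close, and both are negative
```

Because the regressors $Y^N_{j,t}$ are negatively correlated with the error $v_t$, the exogeneity condition behind LS fails even though the linearity assumption holds exactly in this simulation.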
11. ### Mixture Models

- The implicit endogeneity implies that the LS estimator is incompatible with SCMs.

We consider another estimation strategy.

- Assume mixture models and estimate the weights by the GMM.
  - $p_{j,t}(y)$: density of $Y^N_{j,t}$.
- Mixture model between $p_{0,t}(y)$ and $\{p_{j,t}(y)\}_{j \in \mathcal{J}_0}$:
$$p_{0,t}(y) = \sum_{j \in \mathcal{J}_0} w^*_j \, p_{j,t}(y).$$
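
Under the mixture model, a draw of $Y^N_{0,t}$ can be generated by first selecting an untreated unit $j$ with probability $w^*_j$ and then drawing from $p_{j,t}$. The sketch below, assuming two Gaussian component densities chosen only for illustration and a hypothetical sampler interface `draw(rng, n)`, does exactly this and checks that the first two sample moments of the treated unit match the $w^*$-weighted moments of the components, which is the property that moment-based estimation exploits.

```python
import numpy as np


def sample_mixture(samplers, w, size, rng):
    """Draw `size` values from p_0(y) = sum_j w_j p_j(y).

    samplers[j](rng, n) must return n draws from the density p_j (assumed interface).
    """
    w = np.asarray(w, dtype=float)
    component = rng.choice(len(w), size=size, p=w)   # latent unit index for each draw
    out = np.empty(size)
    for j, draw in enumerate(samplers):
        mask = component == j
        out[mask] = draw(rng, int(mask.sum()))
    return out


rng = np.random.default_rng(0)
samplers = [lambda g, n: g.normal(0.0, 1.0, n),    # p_1 = N(0, 1)
            lambda g, n: g.normal(5.0, 1.0, n)]    # p_2 = N(5, 1)
w_star = [0.7, 0.3]
y0 = sample_mixture(samplers, w_star, 200_000, rng)
# Population moments: E[Y] = 0.7*0 + 0.3*5 = 1.5 and E[Y^2] = 0.7*1 + 0.3*26 = 8.5
print(y0.mean(), (y0 ** 2).mean())
```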
12. ### Fine-Grained Models

- Assuming mixture models is stronger than assuming $\mathbb{E}[Y^N_{0,t}] = \sum_{j \in \mathcal{J}_0} w^*_j \mathbb{E}[Y^N_{j,t}]$ (integrating $y$ against both sides of the mixture model yields this linearity).

Mixture models can be justified from the viewpoint of fine-grained models (Shi et al., 2022).

- Linear factor models are usually assumed in SCMs:
$$Y^N_{j,t} = c_j + \delta_t + \lambda_t \mu_j + \varepsilon_{j,t}, \qquad Y^I_{0,t} = \tau_{0,t} + Y^N_{0,t}.$$
  - Shi et al. (2022) find that mixture models imply factor models under some assumptions.
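
A minimal simulation of the linear factor model above, assuming a single scalar factor ($\lambda_t$ and $\mu_j$ are scalars), standard normal draws for $c_j$, $\delta_t$, $\lambda_t$, $\mu_j$, Gaussian noise, and a constant effect $\tau_{0,t} = \tau$; all sizes and scales are illustrative. Rows $0, \dots, T_0 - 1$ of the arrays correspond to $\mathcal{T}_0$, and column 0 is the treated unit.

```python
import numpy as np


def simulate_factor_model(J=5, T=40, T0=30, tau=2.0, noise_sd=0.1, seed=0):
    """Simulate Y^N_{j,t} = c_j + delta_t + lambda_t * mu_j + eps_{j,t} for j = 0..J.

    Returns (Y_obs, Y_N), each of shape (T, J + 1); column 0 is the treated unit.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=J + 1)        # unit fixed effects c_j
    delta = rng.normal(size=T)        # common time effects delta_t
    lam = rng.normal(size=T)          # factor lambda_t (scalar, assumed)
    mu = rng.normal(size=J + 1)       # factor loadings mu_j (scalar, assumed)
    eps = rng.normal(scale=noise_sd, size=(T, J + 1))
    Y_N = c[None, :] + delta[:, None] + lam[:, None] * mu[None, :] + eps
    Y_obs = Y_N.copy()
    # Only the treated unit (column 0) is affected, and only after period T0:
    # Y^I_{0,t} = tau_{0,t} + Y^N_{0,t}, with tau_{0,t} = tau constant (assumed).
    Y_obs[T0:, 0] = Y_N[T0:, 0] + tau
    return Y_obs, Y_N


Y_obs, Y_N = simulate_factor_model()
print(Y_obs.shape)  # -> (40, 6)
```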
13. ### Fine-Grained Models

Fine-grained models (Shi et al., 2022):

- Assume that $Y^N_{j,t}$ represents a group-level outcome.
- In each unit $j$, there are unobserved small units $Y^N_{j,t,1}, Y^N_{j,t,2}, \dots$; that is, each unit consists of unobserved sub-units that constitute $Y^N_{j,t}$.

→ Under some assumptions,

- each $p_{j,t}(y)$ can be linked to the linear factor model, and
- $p_{0,t}(y) = \sum_{j \in \mathcal{J}_0} w^*_j \, p_{j,t}(y)$ holds.

[Figure: the treated unit $Y^N_{0,t}$ and the untreated units $Y^N_{1,t}$, $Y^N_{2,t}$, each composed of unobserved sub-unit outcomes such as $Y^N_{0,t,1}, Y^N_{0,t,2}, Y^N_{0,t,3}$.]
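14. ### Moment Conditions

- Under the mixture models, the following moment conditions hold:
$$\mathbb{E}_{0,t}\big[(Y^N_{0,t})^\gamma\big] = \sum_{j \in \mathcal{J}_0} w^*_j \, \mathbb{E}_{j,t}\big[(Y^N_{j,t})^\gamma\big] \quad \forall \gamma \in \mathbb{R}_+.$$
- Empirical approximation of $\mathbb{E}_{0,t}[(Y^N_{0,t})^\gamma] - \sum_{j \in \mathcal{J}_0} w^*_j \mathbb{E}_{j,t}[(Y^N_{j,t})^\gamma]$:
$$\widehat{m}_\gamma(w) := \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \Big( (Y^N_{0,t})^\gamma - \sum_{j \in \mathcal{J}_0} w_j (Y^N_{j,t})^\gamma \Big).$$
  - We estimate $w$ so that $\widehat{m}_\gamma(w) \approx 0$.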
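
A direct transcription of $\widehat{m}_\gamma(w)$ in NumPy, assuming the pre-intervention data are stored as a length-$T_0$ vector for the treated unit and a $(T_0, J)$ matrix for the untreated units. The function name is hypothetical, and for non-integer $\gamma$ the outcomes would have to be nonnegative for the powers to be real.

```python
import numpy as np


def empirical_moment(y0_pre: np.ndarray, Y_pre: np.ndarray, w: np.ndarray, gamma: float) -> float:
    """m_hat_gamma(w) = (1/T0) * sum_t [ (Y_{0,t})^gamma - sum_j w_j (Y_{j,t})^gamma ].

    y0_pre: (T0,) treated pre-intervention outcomes; Y_pre: (T0, J) untreated ones.
    """
    return float(np.mean(y0_pre ** gamma - (Y_pre ** gamma) @ w))


y0_pre = np.array([1.0, 2.0])
Y_pre = np.array([[1.0, 3.0],    # period 1: units 1 and 2
                  [2.0, 2.0]])   # period 2
w = np.array([0.5, 0.5])
print(empirical_moment(y0_pre, Y_pre, w, 1), empirical_moment(y0_pre, Y_pre, w, 2))  # -> -0.5 -2.0
```

Note that $\widehat{m}_\gamma(w)$ is linear in $w$ for every fixed $\gamma$, which keeps the estimation problem built on these moments simple.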
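15. ### GMM

- A set of positive values $\Gamma := \{1, 2, 3, \dots, G\}$, e.g., $\Gamma = \{1, 2, 3, 4, 5\}$.
- Estimate $w^*_j$ as
$$\{\widehat{w}^{\mathrm{GMM}}_j\}_{j \in \mathcal{J}_0} := \operatorname*{arg\,min}_{\{w_j\}:\ \sum_{j \in \mathcal{J}_0} w_j = 1} \ \sum_{\gamma \in \Gamma} \big( \widehat{m}_\gamma(w) \big)^2.$$
  - We can weight each empirical moment condition; that is, using some weights $v_\gamma \in \mathbb{R}_+$,
$$\{\widehat{w}^{\mathrm{GMM}}_j\}_{j \in \mathcal{J}_0} := \operatorname*{arg\,min}_{\{w_j\}:\ \sum_{j \in \mathcal{J}_0} w_j = 1} \ \sum_{\gamma \in \Gamma} v_\gamma \big( \widehat{m}_\gamma(w) \big)^2.$$
- We can show that the GMM estimator is consistent; that is, $\widehat{w}^{\mathrm{GMM}}_j \xrightarrow{p} w^*_j$.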
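
Because each $\widehat{m}_\gamma(w)$ is linear in $w$, the (weighted) GMM objective above is a quadratic function of $w$, and the sum-to-one constraint can be imposed by substituting $w_J = 1 - \sum_{j<J} w_j$, which leaves an ordinary least-squares problem. The sketch below follows that route. Like the slide, it imposes only $\sum_j w_j = 1$ (no nonnegativity), and the function name and defaults ($\Gamma = \{1, \dots, 5\}$, $v_\gamma = 1$) are our own assumptions.

```python
import numpy as np


def gmm_weights(y0_pre, Y_pre, gammas=(1, 2, 3, 4, 5), v=None):
    """GMM weights: minimise sum_g v_g * m_hat_g(w)^2 subject to sum_j w_j = 1.

    m_hat_g(w) = b_g - (A w)_g with b_g = mean_t Y_{0,t}^g and A_{g,j} = mean_t Y_{j,t}^g.
    """
    gammas = np.asarray(gammas)
    v = np.ones(gammas.size) if v is None else np.asarray(v, dtype=float)
    b = np.array([np.mean(y0_pre ** g) for g in gammas])            # (G,)
    A = np.stack([np.mean(Y_pre ** g, axis=0) for g in gammas])     # (G, J)
    # Fold the moment weights into the rows: sum_g v_g r_g^2 = ||sqrt(v) * r||^2
    sqrt_v = np.sqrt(v)
    b, A = sqrt_v * b, sqrt_v[:, None] * A
    # Impose sum_j w_j = 1 by substituting w_J = 1 - sum_{j<J} w_j:
    # b - A w = (b - A[:, -1]) - (A[:, :-1] - A[:, [-1]]) w_{-J}
    M = A[:, :-1] - A[:, [-1]]
    z, *_ = np.linalg.lstsq(M, b - A[:, -1], rcond=None)
    return np.append(z, 1.0 - z.sum())


rng = np.random.default_rng(1)
u = [rng.uniform(0.0, 1.0, 200), rng.uniform(0.5, 2.0, 200), rng.uniform(0.0, 3.0, 200)]
Y_pre = np.stack([np.tile(x, 4) for x in u], axis=1)    # (800, 3): each column repeats its sample
y0_pre = np.concatenate([u[0], u[0], u[1], u[2]])       # empirical mixture, weights (0.5, 0.25, 0.25)
w_hat = gmm_weights(y0_pre, Y_pre)
print(w_hat)
```

Because the treated column is built by concatenating the untreated samples in proportions 2:1:1, every empirical moment matches exactly at $w = (0.5, 0.25, 0.25)$, so the printed weights should equal these values up to floating-point error. The substitution requires at least $J - 1$ linearly independent moment rows, so in practice $G \ge J - 1$.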
16. ### Inference

- Hypothesis testing of the sharp null $H_0: \tau_{0,t} = 0$ for $t \in \mathcal{T}_1$.
  - Note that $\tau_{0,t} = Y^I_{0,t} - Y^N_{0,t}$ under the linear factor model.
- We usually employ conformal inference to test the hypothesis.
  - It tests the sharp null nonparametrically.
  - However, it is computationally costly.
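
The slides do not spell out the conformal procedure. One widely used variant for SCMs is the permutation approach of Chernozhukov, Wüthrich and Zhu; the sketch below implements only its final step under stated assumptions: the residuals $\widehat{u}_t = Y_{0,t} - \widehat{Y}^N_{0,t}$ are assumed to come from weights fitted on all $T$ periods with $H_0$ imposed, the statistic is taken to be $S_q(\widehat{u}) = \big(T_1^{-1/2} \sum_{t \in \mathcal{T}_1} |\widehat{u}_t|^q\big)^{1/q}$, and the permutations are the $T$ cyclic shifts of the time index. The function name is hypothetical.

```python
import numpy as np


def conformal_pvalue(residuals, T1: int, q: float = 1.0) -> float:
    """Permutation p-value for H0: tau_{0,t} = 0 for the last T1 periods.

    residuals: length-T vector u_t = Y_{0,t} - Y_hat^N_{0,t}, where the weights were
    fitted on all T periods with the null imposed (assumed to be done beforehand).
    """
    u = np.asarray(residuals, dtype=float)
    T = u.size

    def statistic(x):
        # S_q(u) = ( T1^{-1/2} * sum over the last T1 periods of |u_t|^q )^{1/q}
        return (np.sum(np.abs(x[-T1:]) ** q) / np.sqrt(T1)) ** (1.0 / q)

    s_obs = statistic(u)
    # Moving-block permutations: all T cyclic shifts of the time index (shift 0 = identity)
    s_perm = np.array([statistic(np.roll(u, k)) for k in range(T)])
    return float(np.mean(s_perm >= s_obs))


u = np.concatenate([np.zeros(18), [5.0, 5.0]])   # large residuals only in the last T1 = 2 periods
print(conformal_pvalue(u, T1=2))                 # -> 0.05 (only the identity shift is as extreme)
```

The cost mentioned on the slide comes mostly from the step this sketch assumes away: the weights have to be re-estimated under each null hypothesis, for example for every candidate value when inverting the test into a confidence set.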
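17. ### Simulation Studies

- $G$ is chosen from $\{2, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$, and $J$ is chosen from $\{10, 30, 60\}$.
  - Recall that
$$\{\widehat{w}^{\mathrm{GMM}}_j\}_{j \in \mathcal{J}_0} := \operatorname*{arg\,min}_{\{w_j\}:\ \sum_{j \in \mathcal{J}_0} w_j = 1} \ \sum_{\gamma \in \{1, 2, \dots, G\}} \bigg( \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \Big( (Y^N_{0,t})^\gamma - \sum_{j \in \mathcal{J}_0} w_j (Y^N_{j,t})^\gamma \Big) \bigg)^2.$$
- $Y^N_{j,t}$ is generated from Gaussian distributions.
- In the figure, the y-axis denotes the estimation error, and the x-axis denotes $G$.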
18. ### Empirical Studies

- Empirical analysis using case studies from existing studies:
  - Tobacco control in California (Abadie, Diamond and Hainmueller, 2010).
  - Conflict in the Basque Country (Abadie and Gardeazabal, 2003).
  - Reunification of Germany (Abadie, Diamond and Hainmueller, 2015).
- Pretreatment fit: predictive ability for outcomes for $t \in \mathcal{T}_0$.

20. ### Bayesian SCMs

- We have introduced a frequentist method for SCMs.
- Frequentist SCMs require
  - large samples to show the convergence of the weight estimators,
  - special inference methods, such as conformal inference, and
  - distance minimization to employ covariates, which is not easy to justify.
- We consider a Bayesian approach for SCMs, which
  - works with finite samples, and
  - enables inference with the posterior distribution.
21. ### Bayesian Predictive Synthesis

- Our Bayesian SCMs are based on the formulation of Bayesian predictive synthesis (BPS).
- BPS: a method for synthesizing predictive models (McAlinn and West, 2019).
  - Synthesizes predictive models while reflecting model uncertainty.
  - A generalization of Bayesian model averaging.
  - Incorporates various predictive models, weighting them with time-varying parameters.
- We regard the untreated outcomes, together with predictive models for those outcomes that use covariates, as predictors of $Y^N_{0,t}$.
  - We first predict outcomes using covariates.
  - Then, we combine the predictors using BPS.
22. ### BPSCM

We propose SCMs with BPS, referred to as BPSCMs.

BPSCM:

- $\Phi_t$: a set of time-varying parameters at $t$; $\Phi_t$ depends on $\{Y^N_{0,s}\}_{s \in [1:t]}$.
- The conditional density function of $Y^N_{0,t+1}$ is referred to as the synthesis function, denoted by $\alpha\big(y \mid \{Y^N_{j,t}\}_{j \in \mathcal{J}_0}, \Phi_t\big)$.
- The Bayesian decision maker predicts $Y^N_{0,t+1}$ using the posterior distribution defined as
$$p^N\big(y \mid \{Y^N_{0,s}\}_{s \in [1:t]}, \Phi_t\big) := \int \alpha\big(y \mid \{y^N_{j,t}\}_{j \in \mathcal{J}_0}, \Phi_t\big) \prod_{j \in \mathcal{J}_0} p_{j,t}\big(y^N_{j,t}\big) \, \mathrm{d}y^N_{j,t}.$$
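
The integral defining the posterior predictive density can be approximated by Monte Carlo: draw $y^N_{j,t} \sim p_{j,t}$ for every $j$, evaluate the synthesis function at the draws, and average. The sketch below assumes, for illustration only, independent Gaussian densities $p_{j,t}$ with means $m_j$ and variances $s_j^2$, and a linear Gaussian synthesis function $\phi\big(y; w_0 + \sum_j w_j y^N_{j,t}, \nu\big)$ with fixed parameters $\Phi_t = (w_0, w, \nu)$. In that case the integral has the closed form $\phi\big(y; w_0 + \sum_j w_j m_j, \nu + \sum_j w_j^2 s_j^2\big)$, which serves as a check. The function names are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean, var):
    """Univariate normal density phi(y; mean, var)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def bps_predictive_density(y, agent_means, agent_vars, w0, w, nu, n_draws=200_000, seed=0):
    """Monte Carlo approximation of p(y) = integral of alpha(y | y_1..y_J, Phi) prod_j p_j(y_j) dy_j.

    Assumed for illustration: p_j = N(agent_means[j], agent_vars[j]) independently, and
    alpha(y | y_1..y_J, Phi) = N(y; w0 + sum_j w_j y_j, nu) with Phi = (w0, w, nu) fixed.
    """
    rng = np.random.default_rng(seed)
    agent_means = np.asarray(agent_means, dtype=float)
    sd = np.sqrt(np.asarray(agent_vars, dtype=float))
    draws = rng.normal(agent_means, sd, size=(n_draws, agent_means.size))  # y_j ~ p_j
    synth_mean = w0 + draws @ np.asarray(w, dtype=float)                    # mean of alpha per draw
    return float(np.mean(normal_pdf(y, synth_mean, nu)))


m, s2, w0, w, nu = [1.0, -0.5], [0.5, 0.2], 0.1, [0.6, 0.4], 0.3
mc = bps_predictive_density(0.8, m, s2, w0, w, nu)
exact = normal_pdf(0.8, w0 + np.dot(w, m), nu + np.dot(np.square(w), s2))
print(mc, exact)   # the Monte Carlo estimate should be close to the closed form
```

The Monte Carlo route does not depend on the Gaussian choices, so the same averaging applies to any synthesis function and any agent densities that can be sampled.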
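23. ### Dynamic Latent Factor Linear Regression Models

- There are several specifications for the synthesis function.
- Ex. Latent factor dynamic linear model:
  - Set the synthesis function as
$$\alpha\big(y^N_{0,t} \mid \{Y^N_{j,t}\}_{j \in \mathcal{J}_0}, \Phi_t\big) = \phi\Big(y^N_{0,t};\ w_{0,t} + \sum_{j=1}^{J} w_{j,t} Y^N_{j,t},\ \nu_t\Big).$$
  - $\phi(\cdot\,; a, b^2)$: a univariate normal density with mean $a$ and variance $b^2$.
  - $\epsilon_t$ are unobserved error terms.
  - Specify the processes of $Y^N_{0,t}$ and $w_{j,t}$ as
$$Y^N_{0,t} = w_{0,t} + \sum_{j \in \mathcal{J}_0} w_{j,t} Y^N_{j,t} + \epsilon_t, \quad \epsilon_t \sim N(0, \nu_t), \qquad w_{j,t} = w_{j,t-1} + \eta_{j,t}, \quad \boldsymbol{\eta}_t := (\eta_{0,t}, \dots, \eta_{J,t})^\top \sim N(\boldsymbol{0}, \nu_t \boldsymbol{W}_t).$$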
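
The full BPSCM treats the untreated-unit values as latent draws and learns the variances, which is beyond a short example. As a simplified sketch of one building block only, the code below runs the standard forward (Kalman) filter for a dynamic linear regression with random-walk coefficients, treating the regressors (an intercept plus the untreated outcomes) as observed and assuming a known constant observation variance $\nu$ and a known constant evolution covariance $W$ (the slide scales it as $\nu_t \boldsymbol{W}_t$); see West and Harrison (1997) for the general recursions. The function name and default values are hypothetical.

```python
import numpy as np


def dlm_filter(y, X, nu=0.01, W=None, m0=None, C0=None):
    """Forward (Kalman) filtering for y_t = x_t' w_t + eps_t, w_t = w_{t-1} + eta_t.

    Simplifying assumptions: the regressors x_t are observed, nu is a known constant
    observation variance, and eta_t ~ N(0, W) with a known constant W.
    Returns the filtered means E[w_t | y_1, ..., y_t] as a (T, p) array.
    """
    T, p = X.shape
    W = 1e-6 * np.eye(p) if W is None else W
    m = np.zeros(p) if m0 is None else m0
    C = 10.0 * np.eye(p) if C0 is None else C0
    means = np.empty((T, p))
    for t in range(T):
        a, R = m, C + W                 # prior for w_t under the random-walk evolution
        x = X[t]
        f = x @ a                       # one-step forecast mean of y_t
        Q = x @ R @ x + nu              # one-step forecast variance
        A = R @ x / Q                   # adaptive (Kalman) gain
        m = a + A * (y[t] - f)          # posterior mean of w_t
        C = R - np.outer(A, A) * Q      # posterior covariance of w_t
        means[t] = m
    return means


rng = np.random.default_rng(0)
T = 500
Y_untreated = rng.normal(size=(T, 2))
X = np.column_stack([np.ones(T), Y_untreated])                  # intercept w_{0,t} plus two units
y0 = X @ np.array([1.0, 0.7, 0.3]) + rng.normal(scale=0.1, size=T)
filtered = dlm_filter(y0, X)
print(filtered[-1])                                             # should be close to [1.0, 0.7, 0.3]
```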
24. ### Auxiliary Covariates

The BPSCM can use covariates by predicting outcomes with various predictive models.

- $X_{j,t}$: covariates of unit $j$ at period $t$.
- Define $L$ predictors for $Y^N_{j,t}$ by $\{\widehat{f}_l(X_{j,t})\}_{l=1}^{L}$.
  - These predictors can be constructed with machine learning methods.
  - We can use covariates in the predictive models.
- Together with the original untreated outcomes $Y^N_{j,t}$, there are $K = (1 + L)J$ predictors, $\big\{Y^N_{j,t}, \{\widehat{f}_l(X_{j,t})\}_{l=1}^{L}\big\}_{j=1}^{J}$.
- We combine them using BPS.
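25. ### Auxiliary Covariates

- The set of predictors is denoted by
$$\boldsymbol{Z}_t = \big\{ Y^N_{1,t}, \dots, Y^N_{J,t},\ \widehat{f}_1(X_{1,t}), \dots, \widehat{f}_L(X_{1,t}),\ \widehat{f}_1(X_{2,t}), \dots, \widehat{f}_L(X_{J-1,t}),\ \widehat{f}_1(X_{J,t}), \dots, \widehat{f}_L(X_{J,t}) \big\}.$$
- Conduct BPSCM as if there were $J + JL$ untreated units available for SCMs:
$$p^N\big(y \mid \Phi_t, \{Y^N_{0,s}\}_{s \in [1:t]}\big) = \int \alpha\big(y \mid \boldsymbol{z}_t, \Phi_t\big) \prod_{k \in \{1, 2, \dots, J + JL\}} p_{k,t}(z_{k,t}) \, \mathrm{d}z_{k,t}.$$

Example: synthesize predictive models such as linear regression and random forests.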
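
The stacking of $\boldsymbol{Z}_t$ can be sketched as follows, assuming each predictive model $\widehat{f}_l$ is a single function fitted on the pooled untreated units over all periods and then applied to every unit's covariates. Two hypothetical models stand in for the linear regression and random forest mentioned on the slide: a pooled OLS fit and a covariate-free grand-mean baseline; a random-forest regressor could be wrapped in the same `fit(X, Y) -> predict` interface. Columns follow the unit-major order of $\boldsymbol{Z}_t$ given above.

```python
import numpy as np


def fit_pooled_ols(X, Y):
    """Hypothetical model: pooled OLS of Y_{j,t} on (1, X_{j,t}) over all untreated (j, t).

    X: (T, J, p) covariates, Y: (T, J) untreated outcomes.
    Returns f_hat mapping covariates of shape (..., p) to predictions of shape (...).
    """
    T, J, p = X.shape
    design = np.column_stack([np.ones(T * J), X.reshape(T * J, p)])
    beta, *_ = np.linalg.lstsq(design, Y.reshape(T * J), rcond=None)
    return lambda X_new: beta[0] + X_new @ beta[1:]


def fit_grand_mean(X, Y):
    """Hypothetical baseline model: predicts the overall mean, ignoring covariates."""
    mean = float(Y.mean())
    return lambda X_new: np.full(X_new.shape[:-1], mean)


def build_predictors(Y, X, fitters):
    """Rows Z_t = (Y_{1,t},...,Y_{J,t}, f_1(X_{1,t}),...,f_L(X_{1,t}), ..., f_L(X_{J,t})).

    The result has shape (T, J + J*L), i.e. J + JL "untreated units" for the BPSCM.
    """
    preds = np.stack([fit(X, Y)(X) for fit in fitters], axis=2)   # (T, J, L): f_l(X_{j,t})
    T, J, L = preds.shape
    return np.hstack([Y, preds.reshape(T, J * L)])                # unit-major order, as in Z_t


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3, 2))                  # T = 20, J = 3, p = 2
Y = 1.0 + X @ np.array([2.0, -1.0])              # noiseless linear outcomes, shape (20, 3)
Z = build_predictors(Y, X, [fit_pooled_ols, fit_grand_mean])
print(Z.shape)                                   # -> (20, 9), i.e. (T, J + J*L) with L = 2
```

Each column of `Z` is then treated as one more untreated unit whose density $p_{k,t}$ enters the synthesis integral.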
26. ### Advantages of the BPSCM

- ✓ Time-varying parameters.
- ✓ Incorporates the uncertainty of each untreated unit's outcome.
- ✓ Minimax optimality.
  - Even under model misspecification, the BPSCM predictor is minimax optimal in terms of the KL divergence (Takanashi and McAlinn, 2021).
  - Does it avoid the implicit endogeneity problem?
- ✓ Works with finite samples.
- ✓ Inference via the posterior distribution.
27. ### Empirical Analysis

Empirical studies using the same case studies as before. We compare the following five prediction models:

| Model | Time-varying coefficients | Uses covariates | Synthesized predictive models |
|---|---|---|---|
| Abadie | | ✓ | - |
| BPSCM | ✓ | | - |
| BPSCM (Linear) | ✓ | ✓ | Least squares |
| BPSCM (RF) | ✓ | ✓ | Random forests |
| BPSCM (Linear + RF) | ✓ | ✓ | Least squares + random forests |
28. ### Empirical Analysis

- We mainly check the pretreatment fit and the posterior distributions of the BPSCMs.

30. ### Summary

- SCMs suffer from the issue of inconsistency.
  - The LS estimator is incompatible with the assumption $\mathbb{E}[Y^N_{0,t}] = \sum_{j \in \mathcal{J}_0} w^*_j \mathbb{E}[Y^N_{j,t}]$.
    → Implicit endogeneity (measurement error bias).
  - The exact relation $Y^N_{0,t} = \sum_{j \in \mathcal{J}_0} w^*_j Y^N_{j,t}$ may not be realistic.
- Frequentist density matching (mixture model + GMM).
  - The mixture model $p_{0,t}(y) = \sum_{j \in \mathcal{J}_0} w^*_j \, p_{j,t}(y)$ is a stronger assumption than $\mathbb{E}[Y^N_{0,t}] = \sum_{j \in \mathcal{J}_0} w^*_j \mathbb{E}[Y^N_{j,t}]$.
  - By using the GMM under this assumption, we can estimate the weights consistently.
- BPSCM.
  - By using the Bayesian method, we can obtain a minimax optimal predictor without assuming mixture models.
  - Advantages such as flexible modeling and finite-sample inference.

32. ### References

- Abadie, A. and Gardeazabal, J. "The economic costs of conflict: A case study of the Basque Country." American Economic Review, 2003.
- Abadie, A., Diamond, A., and Hainmueller, J. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association, 2010.
- Abadie, A., Diamond, A., and Hainmueller, J. "Comparative politics and the synthetic control method." American Journal of Political Science, 2015.
- Ferman, B. and Pinto, C. "Synthetic controls with imperfect pretreatment fit." Quantitative Economics, 12(4):1197–1221, 2021.
- McAlinn, K. and West, M. "Dynamic Bayesian predictive synthesis in time series forecasting." Journal of Econometrics, 2019.
- McAlinn, K., Aastveit, K. A., Nakajima, J., and West, M. "Multivariate Bayesian predictive synthesis in macroeconomic forecasting." Journal of the American Statistical Association, 2020.
- Shi, C., Sridhar, D., Misra, V., and Blei, D. "On the assumptions of synthetic control methods." In AISTATS, pp. 7163–7175, 2022.
- Takanashi, K. and McAlinn, K. "Predictions with dynamic Bayesian predictive synthesis are exact minimax." 2021.
- West, M. and Harrison, P. J. "Bayesian Forecasting and Dynamic Models." Springer-Verlag, 2nd edition, 1997.