MasaKat0
August 02, 2023

# Synthetic Control Methods through Predictive Synthesis

Presentation slides at EcoSta 2023.


## Transcript

1. Synthetic Control Methods
through Predictive Synthesis
Masahiro Kato (The University of Tokyo)
Coauthors: Akira Fukuda, Kosaku Takanashi,
Kenichiro McAlinn, Akari Ohda, Masaaki Imaizumi
Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity (https://arxiv.org/abs/2307.11127)
Paper 2: Bayesian Predictive Synthetic Control Methods
Speaker Deck: https://speakerdeck.com/masakat0/synthetic-control-methods-through-predictive-synthesis?slide=25

2. Synthetic Control Methods
ØSynthetic Control Methods (SCMs; Abadie and Gardeazabal, 2003).
n Core idea.
• There are several units. One unit among them receives a policy intervention (the treated unit).
• The policy intervention affects the outcomes of the treated unit.
• We cannot observe the outcomes the treated unit would have had without the policy intervention.
• Estimate the counterfactual outcomes of the treated unit by a weighted sum of the observed outcomes of the untreated units.
• Then, using the estimated outcome, estimate the causal effect on the treated unit.

3. Problem Setting
n $J + 1$ units, $j \in \mathcal{J} := \{0, 1, 2, \dots, J\}$.
• $j = 0$: treated unit (the unit affected by the policy intervention).
• $j \in \mathcal{J}_0 := \mathcal{J} \setminus \{0\}$: untreated units.
n $T$ periods, $t \in \mathcal{T} := \{1, 2, \dots, T\}$.
• The intervention occurs at $t = T_0 < T$.
• $t \in \mathcal{T}_0 := \{1, 2, \dots, T_0\}$: before the intervention.
• $t \in \mathcal{T}_1 := \mathcal{T} \setminus \mathcal{T}_0$: after the intervention ($T_1 := |\mathcal{T}_1| = T - T_0$).

4. Problem Setting
ØPotential outcomes (Neyman, 1923; Rubin, 1974):
n For each unit $j \in \mathcal{J}$ and period $t \in \mathcal{T}$, define potential outcomes $Y_{j,t}^{I}, Y_{j,t}^{N} \in \mathbb{R}$.
• $Y_{j,t}^{I}$ and $Y_{j,t}^{N}$ are the potential outcomes with and without the intervention.
• $\mathbb{E}_{j,t}$: expectation over $Y_{j,t}^{I}$ and $Y_{j,t}^{N}$.
ØObservations:
n We observe one of the outcomes, $Y_{j,t} \in \mathbb{R}$, corresponding to the actual intervention; that is,
$$Y_{0,t} = \begin{cases} Y_{0,t}^{I} & \text{if } t \in \mathcal{T}_1 \\ Y_{0,t}^{N} & \text{if } t \in \mathcal{T}_0 \end{cases}, \qquad Y_{j,t} = Y_{j,t}^{N} \ \text{for } j \in \mathcal{J}_0.$$

5. Problem Setting
ØCausal effects:
$$\tau_{0,t} := \mathbb{E}_{0,t}\left[ Y_{0,t}^{I} - Y_{0,t}^{N} \right] \quad \text{for } t \in \mathcal{T}_1.$$
n We estimate the causal effect by predicting $Y_{0,t}^{N}$ for $t \in \mathcal{T}_1$.
ØCore idea.
n Predict $Y_{0,t}^{N}$ by a weighted sum of $Y_{1,t}^{N}, \dots, Y_{J,t}^{N}$:
$$\widehat{Y}_{0,t}^{N} = \sum_{j \in \mathcal{J}_0} w_j Y_{j,t}^{N}.$$
• $\widehat{Y}_{0,t}^{N}$ is a counterfactual trend of the treated unit.
• $\widehat{Y}_{0,t}^{N}$ is called a synthetic control unit.
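The weighted-sum construction on this slide can be sketched in a few lines of NumPy. All data and weights below are hypothetical placeholders, not values from the papers:

```python
import numpy as np

# Hypothetical example: J = 3 untreated units observed over T = 10 periods.
rng = np.random.default_rng(0)
Y_untreated = rng.normal(size=(3, 10))     # rows: Y_{j,t}^N for j = 1, 2, 3
w = np.array([0.5, 0.3, 0.2])              # weights, summing to one

# Synthetic control unit: \hat{Y}_{0,t}^N = sum_j w_j Y_{j,t}^N for every t.
Y0_synthetic = w @ Y_untreated             # shape (10,)

# Given observed treated outcomes Y_{0,t} in post-treatment periods, the
# effect estimate is the gap Y_{0,t} - \hat{Y}_{0,t}^N.
Y0_observed = Y0_synthetic + 1.0           # pretend the intervention adds +1
tau_hat = Y0_observed - Y0_synthetic
```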

6. Contents
ØThe research questions mainly lie in the estimation of the weights, $w_1, \dots, w_J$.
n Paper 1: Synthetic Control Methods by Density Matching under Implicit Endogeneity.
• Estimators in existing SCMs are not consistent (Ferman and Pinto, 2021).
• We discuss the inconsistency problem from the viewpoint of endogeneity.
• We propose frequentist SCMs based on the generalized method of moments (GMM).
n Paper 2: Bayesian Predictive Synthetic Control Methods.
• We apply Bayesian predictive synthesis to SCMs.
• Flexible modeling with time-varying parameters, finite-sample analysis, and minimax optimality.

7. Synthetic Control Methods by Density Matching
under Implicit Endogeneity

8. Least-Squares Estimator
n In standard SCMs, we usually estimate the weights by constrained least squares.
• That is, we estimate $w_j$ as
$$\left\{ \widehat{w}_j^{\mathrm{LS}} \right\}_{j \in \mathcal{J}_0} = \arg\min_{\{w_j\}_{j \in \mathcal{J}_0}} \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \left( Y_{0,t}^{N} - \sum_{j \in \mathcal{J}_0} w_j Y_{j,t}^{N} \right)^2 \quad \text{such that} \quad \sum_{j \in \mathcal{J}_0} w_j = 1, \ w_j \ge 0 \ \forall j \in \mathcal{J}_0.$$
n To justify the least-squares (LS) estimator, we assume linearity in the expected outcomes:
$$\mathbb{E}\left[ Y_{0,t}^{N} \right] = \sum_{j \in \mathcal{J}_0} w_j^* \mathbb{E}\left[ Y_{j,t}^{N} \right].$$
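The constrained least-squares problem above can be sketched with SciPy's SLSQP solver. This is a minimal illustration, not the papers' implementation; the toy data at the end are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def scm_ls_weights(Y0_pre, Y_donors_pre):
    """Constrained least-squares SCM weights (a sketch of the estimator above).

    Y0_pre:       (T0,) pre-treatment outcomes of the treated unit.
    Y_donors_pre: (J, T0) pre-treatment outcomes of the untreated units.
    """
    J = Y_donors_pre.shape[0]
    objective = lambda w: np.mean((Y0_pre - w @ Y_donors_pre) ** 2)
    res = minimize(objective,
                   x0=np.full(J, 1.0 / J),
                   bounds=[(0.0, 1.0)] * J,                         # w_j >= 0
                   constraints=[{"type": "eq",
                                 "fun": lambda w: w.sum() - 1.0}],  # sum_j w_j = 1
                   method="SLSQP")
    return res.x

# Toy check: the treated unit is an exact convex combination of the donors,
# so the estimator should recover the true weights.
rng = np.random.default_rng(1)
Y_donors = rng.normal(size=(3, 50))
w_true = np.array([0.2, 0.5, 0.3])
w_hat = scm_ls_weights(w_true @ Y_donors, Y_donors)
```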

9. Inconsistency of the LS Estimator
n Ferman and Pinto (2021) show that the LS estimator is inconsistent; that is,
$$\widehat{w}_j^{\mathrm{LS}} \xrightarrow{\mathrm{p}} \overline{w}_j \neq w_j^*.$$
• They propose another LS-based estimator that reduces the bias.
• However, that estimator is still biased.
n Their results imply that the LS estimator is incompatible with SCMs under the linearity assumption, $\mathbb{E}[Y_{0,t}^{N}] = \sum_{j \in \mathcal{J}_0} w_j^* \mathbb{E}[Y_{j,t}^{N}]$.

10. Implicit Endogeneity
n We investigate this problem from the viewpoint of endogeneity.
• Let $Y_{j,t}^{N} = \mathbb{E}_{j,t}[Y_{j,t}^{N}] + \varepsilon_{j,t}$.
• Under $\mathbb{E}[Y_{0,t}^{N}] = \sum_{j \in \mathcal{J}_0} w_j^* \mathbb{E}[Y_{j,t}^{N}]$, it holds that
$$Y_{0,t}^{N} = \sum_{j \in \mathcal{J}_0} w_j^* Y_{j,t}^{N} - \sum_{j \in \mathcal{J}_0} w_j^* \varepsilon_{j,t} + \varepsilon_{0,t} = \sum_{j \in \mathcal{J}_0} w_j^* Y_{j,t}^{N} + v_t.$$
n Implicit endogeneity (measurement-error bias): correlation between $Y_{j,t}^{N}$ and $v_t$.
• There is an (implicit) endogeneity between the explanatory variables and the error term.
• This is a reason why the LS estimator $\widehat{w}_j^{\mathrm{LS}}$ is biased; that is, $\widehat{w}_j^{\mathrm{LS}} \xrightarrow{\mathrm{p}} \overline{w}_j \neq w_j^*$.
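A small Monte Carlo makes the implicit-endogeneity bias visible. All numbers here are hypothetical, and an unconstrained OLS regression is used for clarity; with latent-mean variance 4 and noise variance 1, classical measurement-error theory predicts attenuation of the weights by the factor $4/(4+1) = 0.8$:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200_000                                  # many periods: what remains is bias
w_star = np.array([0.6, 0.4])

mu = 2.0 * rng.normal(size=(2, T))           # latent means E_{j,t}[Y_{j,t}^N]
Y_donors = mu + rng.normal(size=(2, T))      # observed donors: mean + epsilon_{j,t}
Y0 = w_star @ mu + rng.normal(size=T)        # treated unit built from the means

# Regressing Y0 on the *noisy* donor outcomes leaves the donors' noise inside
# the error term v_t, which is correlated with the regressors, so the fitted
# weights are attenuated toward zero even with a huge T.
w_ls, *_ = np.linalg.lstsq(Y_donors.T, Y0, rcond=None)
```

Here `w_ls` lands near `[0.48, 0.32]` rather than the true `[0.6, 0.4]`, and no amount of extra data removes the gap.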

11. Mixture Models
n The implicit endogeneity implies that the LS estimator is incompatible with SCMs.
ØWe consider another estimation strategy.
n Assume mixture models and estimate the weights by the GMM.
• $p_{j,t}(y)$: density of $Y_{j,t}^{N}$.
n Mixture model between $p_{0,t}(y)$ and $\{p_{j,t}(y)\}_{j \in \mathcal{J}_0}$:
$$p_{0,t}(y) = \sum_{j \in \mathcal{J}_0} w_j^* \, p_{j,t}(y).$$

12. Fine-Grained Models
n Assuming mixture models is stronger than assuming $\mathbb{E}[Y_{0,t}^{N}] = \sum_{j \in \mathcal{J}_0} w_j^* \mathbb{E}[Y_{j,t}^{N}]$.
ØMixture models can be justified from the viewpoint of fine-grained models (Shi et al., 2022).
n Linear factor models are usually assumed in SCMs:
$$Y_{j,t}^{N} = c_j + \delta_t + \lambda_t \mu_j + \varepsilon_{j,t}, \qquad Y_{j,t}^{I} = \tau_{0,t} + Y_{j,t}^{N}.$$
• Shi et al. (2022) find that mixture models imply factor models under some assumptions.

13. Fine-Grained Models
ØFine-grained models (Shi et al., 2022).
n Assume that $Y_{j,t}^{N}$ represents a group-level outcome.
n In each unit $j$, there are unobserved small units $Y_{j,t,1}^{N}, Y_{j,t,2}^{N}, \dots$; that is, in each unit, there are unobserved units that constitute $Y_{j,t}^{N}$.
→ Under some assumptions,
• each $p_{j,t}(y)$ can be linked to the linear factor model, and
• $p_{0,t}(y) = \sum_{j \in \mathcal{J}_0} w_j^* \, p_{j,t}(y)$ holds.
[Figure: a tree diagram in which each group-level outcome $Y_{j,t}^{N}$ is composed of unobserved small-unit outcomes $Y_{j,t,1}^{N}, Y_{j,t,2}^{N}, \dots$]

14. Moment Conditions
ØMoment conditions.
n Under the mixture models, the following moment conditions hold:
$$\mathbb{E}_{0,t}\left[ \left( Y_{0,t}^{N} \right)^{\gamma} \right] = \sum_{j \in \mathcal{J}_0} w_j^* \, \mathbb{E}_{j,t}\left[ \left( Y_{j,t}^{N} \right)^{\gamma} \right] \quad \forall \gamma \in \mathbb{R}_+.$$
n Empirical approximation of $\mathbb{E}_{0,t}\left[ \left( Y_{0,t}^{N} \right)^{\gamma} \right] - \sum_{j \in \mathcal{J}_0} w_j^* \, \mathbb{E}_{j,t}\left[ \left( Y_{j,t}^{N} \right)^{\gamma} \right]$:
$$\widehat{m}_{\gamma}(w) := \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \left( \left( Y_{0,t}^{N} \right)^{\gamma} - \sum_{j \in \mathcal{J}_0} w_j \left( Y_{j,t}^{N} \right)^{\gamma} \right).$$
• We estimate $w$ to achieve $\widehat{m}_{\gamma}(w) \approx 0$.

15. GMM
n A set of positive values $\Gamma := \{1, 2, 3, \dots, G\}$, e.g., $\Gamma = \{1, 2, 3, 4, 5\}$.
n Estimate $w_j^*$ as
$$\left\{ \widehat{w}_j^{\mathrm{GMM}} \right\}_{j \in \mathcal{J}_0} := \arg\min_{\{w_j\} : \sum_{j \in \mathcal{J}_0} w_j = 1} \sum_{\gamma \in \Gamma} \left( \widehat{m}_{\gamma}(w) \right)^2.$$
• We can weight each empirical moment condition; that is, using some weights $v_{\gamma} \in \mathbb{R}_+$,
$$\left\{ \widehat{w}_j^{\mathrm{GMM}} \right\}_{j \in \mathcal{J}_0} := \arg\min_{\{w_j\} : \sum_{j \in \mathcal{J}_0} w_j = 1} \sum_{\gamma \in \Gamma} v_{\gamma} \left( \widehat{m}_{\gamma}(w) \right)^2.$$
n We can show that the GMM estimator is asymptotically unbiased; that is,
$$\widehat{w}_j^{\mathrm{GMM}} \xrightarrow{\mathrm{p}} w_j^*.$$
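The moment-matching estimator can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions (equal moment weights $v_\gamma = 1$, a small $G$), not the papers' code; the toy mixture at the end is hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def scm_gmm_weights(Y0_pre, Y_donors_pre, G=3):
    """Minimize sum_gamma m_hat_gamma(w)^2 over the simplex, Gamma = {1..G}."""
    J = Y_donors_pre.shape[0]
    gammas = np.arange(1, G + 1)
    # Time-averaged moments of order gamma for the treated and untreated units.
    m0 = np.array([np.mean(Y0_pre ** g) for g in gammas])
    M = np.array([[np.mean(Y_donors_pre[j] ** g) for j in range(J)]
                  for g in gammas])
    objective = lambda w: np.sum((m0 - M @ w) ** 2)
    res = minimize(objective,
                   x0=np.full(J, 1.0 / J),
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: w.sum() - 1.0}],
                   method="SLSQP")
    return res.x

# Toy check under the mixture model: treated outcomes are drawn from the
# mixture 0.3 * p_1 + 0.7 * p_2 of two Gaussian donor densities.
rng = np.random.default_rng(3)
T0, w_star = 100_000, np.array([0.3, 0.7])
donors = np.vstack([rng.normal(1.0, 0.5, T0), rng.normal(3.0, 0.5, T0)])
comp = rng.choice(2, size=T0, p=w_star)
Y0 = np.where(comp == 0, rng.normal(1.0, 0.5, T0), rng.normal(3.0, 0.5, T0))
w_hat = scm_gmm_weights(Y0, donors, G=3)
```

With a large pre-treatment sample the estimated weights land close to the mixture weights `[0.3, 0.7]`, in contrast to the biased LS weights of the earlier slides.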

16. Inference
n Hypothesis testing of the sharp null
$$H_0 : \tau_{0,t} = 0 \quad \text{for } t \in \mathcal{T}_1.$$
• Note that $\tau_{0,t} = Y_{0,t}^{I} - Y_{0,t}^{N}$ under the linear factor model.
n We usually employ conformal inference for testing this hypothesis.
• It tests the sharp null nonparametrically.
• It incurs computational costs.

17. Simulation Studies
n $G$ is chosen from $\{2, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100\}$. $J$ is chosen from $\{10, 30, 60\}$.
• Recall that
$$\left\{ \widehat{w}_j^{\mathrm{GMM}} \right\}_{j \in \mathcal{J}_0} := \arg\min_{\{w_j\} : \sum_{j \in \mathcal{J}_0} w_j = 1} \sum_{\gamma \in \{1, 2, \dots, G\}} \left( \frac{1}{T_0} \sum_{t \in \mathcal{T}_0} \left( \left( Y_{0,t}^{N} \right)^{\gamma} - \sum_{j \in \mathcal{J}_0} w_j \left( Y_{j,t}^{N} \right)^{\gamma} \right) \right)^2.$$
n Generate $Y_{j,t}^{N}$ from Gaussian distributions.
n The y-axis denotes the estimation error, and the x-axis denotes $G$.

18. Empirical Studies
n Empirical analysis using case studies from existing studies.
• Tobacco control in California (Abadie, Diamond and Hainmueller, 2010).
• The Basque conflict in the Basque Country (Abadie and Gardeazabal, 2003).
• The reunification of Germany (Abadie, Diamond and Hainmueller, 2015).
n Pretreatment fit: predictive ability for outcomes for $t \in \mathcal{T}_0$.

19. Bayesian Predictive Synthetic Control Methods

20. Bayesian SCMs
n We introduced a frequentist method for SCMs.
n Frequentist SCMs require
• large samples to show the convergence of the weight estimators;
• special inference methods, such as conformal inference;
• distance minimization to employ covariates, which is not easy to justify.
n We consider a Bayesian approach for SCMs, which
• works with finite samples;
• allows inference with the posterior distribution.

21. Bayesian Predictive Synthesis
n Our Bayesian SCMs are based on the formulation of Bayesian predictive synthesis (BPS).
n BPS: a method for synthesizing predictive models (McAlinn and West, 2019).
• It synthesizes predictive models while reflecting model uncertainty.
• It is a generalization of Bayesian model averaging.
• It incorporates various predictive models, weighting them with time-varying parameters.
n We regard the untreated outcomes, and predictive models for those outcomes using covariates, as predictors of $Y_{0,t}^{N}$.
• We first predict outcomes using covariates.
• Then, we incorporate the predictors using the BPS.

22. BPSCM
n We propose SCMs based on the BPS, referred to as BPSCMs.
ØBPSCM.
• $\Phi_t$: a set of time-varying parameters at $t$; $\Phi_t$ depends on $\left\{ Y_{0,s}^{N} \right\}_{s \in [1:t]}$.
n The conditional density function of $Y_{0,t+1}^{N}$ is referred to as the synthesis function, denoted by
$$\alpha\left( y \mid \left\{ Y_{j,t}^{N} \right\}_{j \in \mathcal{J}_0}, \Phi_t \right).$$
n The Bayesian decision maker predicts $Y_{0,t+1}^{N}$ using the posterior distribution defined as
$$p\left( y \mid \left\{ Y_{0,s}^{N} \right\}_{s \in [1:t]}, \Phi_t \right) := \int \alpha\left( y \mid \left\{ y_{j,t}^{N} \right\}_{j \in \mathcal{J}_0}, \Phi_t \right) \prod_{j \in \mathcal{J}_0} p_{j,t}\left( y_{j,t}^{N} \right) \mathrm{d}y_{j,t}^{N}.$$

23. Dynamic Latent Factor Linear Regression Models
n There are several specifications for the synthesis function.
n Ex. Latent factor dynamic linear model:
• Set the synthesis function as
$$\alpha\left( y_{0,t}^{N} \mid \left\{ Y_{j,t}^{N} \right\}_{j \in \mathcal{J}_0}, \Phi_t \right) = \phi\left( y_{0,t}^{N} ;\ w_{0,t} + \sum_{j=1}^{J} w_{j,t} Y_{j,t}^{N},\ \nu_t \right).$$
• $\phi(\cdot\,; a, b^2)$: a univariate normal density with mean $a$ and variance $b^2$.
• $\nu_t$ are unobserved variance parameters.
• Specify the processes of $Y_{0,t}^{N}$ and $w_{t,j}$ as
$$Y_{0,t}^{N} = w_{0,t} + \sum_{j \in \mathcal{J}_0} w_{t,j} Y_{j,t}^{N} + \epsilon_t, \quad \epsilon_t \sim N(0, \nu_t), \qquad w_{t,j} = w_{t-1,j} + \eta_{t,j}, \quad \eta_{t,j} \sim N(0, \nu_t \boldsymbol{W}_t).$$
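A forward simulation of this state-space specification can be sketched as follows. The constants are hypothetical ($\nu_t$ and $\boldsymbol{W}_t$ are held fixed scalars for simplicity); the actual BPSCM places priors on these quantities and computes the posterior, rather than simulating forward:

```python
import numpy as np

rng = np.random.default_rng(4)
T, J = 100, 3
nu = 0.25            # observation variance nu_t (held constant here)
W = 0.01             # state innovation scale, standing in for W_t

Y_donors = rng.normal(size=(J, T))       # Y_{j,t}^N for the untreated units
w = np.zeros((T, J + 1))                 # w[t] = (w_{0,t}, w_{t,1}, ..., w_{t,J})
w[0] = np.array([0.0, 0.4, 0.3, 0.3])    # arbitrary initial weights
Y0 = np.zeros(T)

for t in range(T):
    if t > 0:
        # Random-walk evolution: w_{t,j} = w_{t-1,j} + eta_{t,j}, eta ~ N(0, nu * W)
        w[t] = w[t - 1] + rng.normal(0.0, np.sqrt(nu * W), size=J + 1)
    # Observation: Y_{0,t}^N = w_{0,t} + sum_j w_{t,j} Y_{j,t}^N + eps_t
    Y0[t] = w[t, 0] + w[t, 1:] @ Y_donors[:, t] + rng.normal(0.0, np.sqrt(nu))
```

The random-walk evolution is what lets the weights drift over time, in contrast to the fixed weights of the frequentist SCMs.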

24. Auxiliary Covariates
ØThe BPSCM can use covariates by predicting outcomes with various predictive models.
• $X_{j,t}$: covariates for unit $j$.
n Define $L$ predictors for $Y_{j,t}^{N}$ by $\left\{ \widehat{f}_{\ell}(X_{j,t}) \right\}_{\ell=1}^{L}$.
• These predictors can be constructed with machine learning methods.
• We can use covariates in the predictive models.
n Together with the original untreated outcome $Y_{j,t}^{N}$, there are $K = 1 + L$ predictors per unit, $\left\{ Y_{j,t}^{N}, \left\{ \widehat{f}_{\ell}(X_{j,t}) \right\}_{\ell=1}^{L} \right\}$.
n We incorporate them by using the BPS.

25. Auxiliary Covariates
n The set of predictors is denoted by
$$\boldsymbol{Z}_t = \left( Y_{1,t}^{N}, \dots, Y_{J,t}^{N},\ \widehat{f}_1(X_{1,t}), \dots, \widehat{f}_L(X_{1,t}),\ \dots,\ \widehat{f}_1(X_{J,t}), \dots, \widehat{f}_L(X_{J,t}) \right).$$
n We conduct the BPSCM as if there were $J + JL$ untreated units that can be used for SCMs:
$$p\left( y \mid \Phi_t, \left\{ Y_{0,s}^{N} \right\}_{s \in [1:t]} \right) = \int \alpha\left( y \mid \boldsymbol{z}_t, \Phi_t \right) \prod_{k \in \{1, 2, \dots, J+JL\}} p_{k,t}\left( z_{k,t} \right) \mathrm{d}z_{k,t}.$$
Ex. Synthesize predictive models such as linear regression and random forests.
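Assembling the augmented predictor set $\boldsymbol{Z}_t$ can be sketched in NumPy. The two predictors `f1` and `f2` below are hypothetical stand-ins for fitted predictive models (e.g., a linear regression and a random forest trained on pre-treatment data):

```python
import numpy as np

rng = np.random.default_rng(5)
J, T, L = 3, 20, 2
Y_donors = rng.normal(size=(J, T))     # Y_{j,t}^N for the J untreated units
X = rng.normal(size=(J, T))            # one covariate per unit, for illustration

f1 = lambda x: 0.5 * x                 # hypothetical "linear" predictor
f2 = lambda x: np.tanh(x)              # hypothetical "nonlinear" predictor

# Stack the J donor outcomes with the J * L covariate-based predictions into
# a (J + J*L, T) matrix; each column is the vector Z_t fed to the synthesis,
# treated as if there were J + J*L untreated units.
Z = np.vstack([Y_donors] + [f(X) for f in (f1, f2)])
```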

26. Advantages of the BPSCM
✓Time-varying parameters.
✓Incorporates the uncertainty of each untreated unit's outcome.
✓Minimax optimality.
• Even under model misspecification, the predictor of the BPSCM is minimax optimal in terms of KL divergence (Takanashi and McAlinn, 2021).
• It may also avoid the implicit endogeneity problem (an open question).
✓Works with finite samples.
✓Inference with the posterior distribution.

27. Empirical Analysis
ØEmpirical studies using the same case studies as in the previous slides.
n We compare the following prediction models.

| Model | Time-varying coef.s | Using covariates | Synthesized predictive models |
|---|---|---|---|
| BPSCM | ✓ | - | - |
| BPSCM (Linear) | ✓ | ✓ | Least squares |
| BPSCM (RF) | ✓ | ✓ | Random forests |
| BPSCM (Linear + RF) | ✓ | ✓ | Least squares + random forests |

28. Empirical Analysis
n We mainly check the pretreatment fit and posterior distribution of the BPSCMs.

29. Conclusion

30. n SCMs suffer from the issue of inconsistency.
• The LS estimator is incompatible with the assumption $\mathbb{E}[Y_{0,t}^{N}] = \sum_{j \in \mathcal{J}_0} w_j^* \mathbb{E}[Y_{j,t}^{N}]$.
→ Implicit endogeneity (measurement-error bias).
• Is $Y_{0,t}^{N} = \sum_{j \in \mathcal{J}_0} w_j^* Y_{j,t}^{N}$ perhaps not realistic?
n Frequentist density matching (mixture model + GMM).
• The mixture model $p_{0,t}(y) = \sum_{j \in \mathcal{J}_0} w_j^* \, p_{j,t}(y)$ is a stronger assumption than $\mathbb{E}[Y_{0,t}^{N}] = \sum_{j \in \mathcal{J}_0} w_j^* \mathbb{E}[Y_{j,t}^{N}]$.
• By using the GMM under this assumption, we can estimate the weights consistently.
n BPSCM.
• By using the Bayesian method, we can obtain the minimax optimal predictor without assuming mixture models.
• Additional advantages include flexible modeling and finite-sample inference.

31. Reference

32. • Abadie, A. and Gardeazabal, J. "The economic costs of conflict: A case study of the Basque Country." American Economic Review, 2003.
• Abadie, A., Diamond, A., and Hainmueller, J. "Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program." Journal of the American Statistical Association, 2010.
• Abadie, A., Diamond, A., and Hainmueller, J. "Comparative politics and the synthetic control method." American Journal of Political Science, 2015.
• Ferman, B. and Pinto, C. "Synthetic controls with imperfect pretreatment fit." Quantitative Economics, 12(4):1197–1221, 2021.
• McAlinn, K. and West, M. "Dynamic Bayesian predictive synthesis in time series forecasting." Journal of Econometrics, 2019.
• McAlinn, K., Aastveit, K. A., Nakajima, J., and West, M. "Multivariate Bayesian predictive synthesis in macroeconomic forecasting." Journal of the American Statistical Association, 2020.
• Shi, C., Sridhar, D., Misra, V., and Blei, D. "On the assumptions of synthetic control methods." In AISTATS, pp. 7163–7175, 2022.
• Takanashi, K. and McAlinn, K. "Predictions with dynamic Bayesian predictive synthesis are exact minimax." 2021.
• West, M. and Harrison, P. J. "Bayesian Forecasting and Dynamic Models." Springer Verlag, 2nd edition, 1997.