MasaKat0
August 12, 2024

Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices

ICML 2024.

Transcript

  1. Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices

    Masahiro Kato, Ryo Inokuchi, Wataru Komatsubara, Akihiro Oga
    Mizuho-DL Financial Technology Co., Ltd.
    ICML 2024, Oral 6F: Experimental Design and Simulation, 25 Jul, Vienna
    Copyright (c) Mizuho–DL Financial Technology Co., Ltd. All Rights Reserved.
  2. Research Background

    • When the covariates in the experimental data differ from those in the population we want to evaluate, does the treatment effect estimator become inefficient?
    • Whether the asymptotic variance increases under a distribution shift depends on the situation.
    • It may be possible to collect experimental data in a way that reduces the asymptotic variance of the treatment effect estimator for the population of interest.
    • This study began with discussions with a data scientist interested in causal inference under distribution shifts.
  3. Contributions

    • Goal: Estimate the average treatment effect (ATE) using experimental data.
    • Task: Design an experiment that returns an ATE estimator with a smaller variance.
    • Contribution: An active adaptive experiment for efficient ATE estimation.
      1. New framework for experimental design: optimize both the propensity score and the covariate density.
      2. Efficient adaptive experiment: minimize the asymptotic variance of an ATE estimator.
      3. Asymptotic analysis of our estimator: asymptotic normality and efficiency.
    • Related work: Adaptive experimental design for ATE estimation (Hahn et al., 2011; Kato et al., 2020) + covariate shift adaptation (Shimodaira, 2000; Sugiyama, 2006).
  4. Problem Setting

    • Binary treatment $a \in \{1, 0\}$. Ex. A/B testing, a new drug versus a placebo, etc.
    • Potential outcome $Y(a) \in \mathbb{R}$, where $(Y(1), Y(0)) \mid X \sim \zeta(y(1), y(0) \mid X)$.
      • If treatment $a$ is assigned, we observe the corresponding outcome $Y(a)$.
    • Each experimental unit has covariates $X \in \mathcal{X}$.
      • Covariates represent characteristics of experimental units (age, height, weight, etc.).
    • We denote the covariate density of interest by $q^*(x)$. Ex. US demographics.
    • ATE over $q^*(x)$:
      $$\theta_0 := \int \big(y(1) - y(0)\big)\, \zeta(y(1), y(0) \mid x)\, q^*(x)\, dy(1)\, dy(0)\, dx.$$
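When the covariate space is finite, the integral defining the ATE reduces to a weighted sum of conditional mean differences. The sketch below assumes two hypothetical covariate groups (`"low"`, `"high"`) and made-up conditional means; none of these names or numbers come from the slides.

```python
def ate_over_target(q_star, mu1, mu0):
    """ATE over the target covariate distribution q_star (finite covariate set).

    q_star: dict mapping covariate value -> target probability q*(x)
    mu1, mu0: dicts mapping covariate value -> E[Y(1) | x] and E[Y(0) | x]
    """
    return sum(q_star[x] * (mu1[x] - mu0[x]) for x in q_star)


# Hypothetical illustration values (assumptions, not from the slides).
q_star = {"low": 0.5, "high": 0.5}
mu1 = {"low": 2.0, "high": 4.0}
mu0 = {"low": 1.0, "high": 1.0}

theta0 = ate_over_target(q_star, mu1, mu0)
print(theta0)
```

With these values the conditional effects are 1 and 3, so the ATE over `q_star` is their equal-weight average.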
  5. Experiment for ATE Estimation

    • Randomized controlled trials (RCTs) are the gold standard among experimental approaches.
      • Treatments are assigned randomly with a probability of 1/2.
      • However, RCTs can be costly.
    • We propose designing an experiment that returns an ATE estimator with a smaller asymptotic variance than that obtained from an RCT.
    • To gain efficiency, we optimize the experiment using past observations. → Active adaptive experiment.
      • We optimize both the propensity score and the covariate density.
  6. Active Adaptive Experimental Design

    • Active adaptive experiment with $T$ rounds. In each round $t \in [T] := \{1, 2, \dots, T\}$:
      • Sample an experimental unit from a covariate density $p_t(x)$ as $X_t \sim p_t(x)$.
      • Assign treatment $A_t \in \{1, 0\}$ with probability $\pi_t(A_t \mid X_t)$ (the propensity score).
      • Observe the corresponding outcome $Y_t = \sum_{a \in \{1, 0\}} 1[A_t = a]\, Y(a) = Y(A_t)$.
    • After round $T$, we estimate the ATE using the observations $\{(X_t, A_t, Y_t)\}_{t=1}^{T}$.
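The round structure above can be sketched as a simple loop. This is a minimal illustration, assuming a finite covariate set and user-supplied callbacks; `run_experiment`, `sample_from_target`, `coin_flip`, and `draw_outcome` are hypothetical names, and the outcome model is made up for the example.

```python
import random


def run_experiment(T, sample_x, propensity, draw_outcome, seed=0):
    """Run T rounds: draw X_t, assign A_t, observe Y_t = Y(A_t)."""
    rng = random.Random(seed)
    history = []  # past observations (X_s, A_s, Y_s), s < t
    for _ in range(T):
        x = sample_x(history, rng)             # X_t ~ p_t(x), may depend on history
        pi1 = propensity(x, history)           # pi_t(1 | X_t), may depend on history
        a = 1 if rng.random() < pi1 else 0     # A_t ~ pi_t(. | X_t)
        y = draw_outcome(x, a, rng)            # only the assigned arm is observed
        history.append((x, a, y))
    return history


# Hypothetical baseline: sample covariates uniformly and assign by a fair coin.
def sample_from_target(history, rng):
    return rng.choice(["low", "high"])


def coin_flip(x, history):
    return 0.5


def draw_outcome(x, a, rng):
    sd = {"low": 1.0, "high": 3.0}[x]  # made-up noise level per covariate group
    return 1.0 * a + rng.gauss(0.0, sd)


data = run_experiment(100, sample_from_target, coin_flip, draw_outcome)
```

Replacing `sample_from_target` and `coin_flip` with history-dependent rules turns this baseline into an active adaptive design.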
  7. Active Adaptive Experimental Design

    • We update the covariate density $p_t(x)$ and the propensity score $\pi_t(a \mid X_t)$ using the past observations $\{(X_s, A_s, Y_s)\}_{s=1}^{t-1}$.
    • Question: Which covariate densities and propensity scores reduce the asymptotic variance of the estimator?
  8. Semiparametric Efficiency Bound

    • The semiparametric efficiency bound is a lower bound on the asymptotic variance of ATE estimators.
    • Data-generating process (DGP) under a fixed $\pi(a \mid x)$ and $p(x)$, where $\zeta_d(y \mid x)$ denotes the conditional density of $Y(d)$ given $x$:
      $$(Y, A, X) \sim p(y, a, x) := \prod_{d \in \{1, 0\}} \big\{\zeta_d(y \mid x)\, \pi(d \mid x)\big\}^{1[a = d]}\, p(x).$$
    • Assume that we can obtain infinitely many samples from $q^*(x)$.
    • Semiparametric efficiency bound for $\theta_0$ (see also Hahn, 1998 and Uehara et al., 2020):
      $$V(\pi, p) := \int \left\{ \frac{\mathrm{Var}(Y(1) \mid x)}{\pi(1 \mid x)} + \frac{\mathrm{Var}(Y(0) \mid x)}{\pi(0 \mid x)} \right\} \frac{q^*(x)}{p(x)}\, q^*(x)\, \zeta(y(1), y(0) \mid x)\, dy(1)\, dy(0)\, dx.$$
    • The efficiency bound is a functional of $\pi$ and $p$.
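For a finite covariate set, the bound becomes a sum over covariate values, since the integrand does not depend on the outcomes and the outcome density integrates to one. The sketch below evaluates it for a standard RCT; the function name, the covariate groups, and the variances are illustrative assumptions.

```python
def efficiency_bound(q_star, p, pi1, var1, var0):
    """Discrete analogue of V(pi, p): sum over x of
    (Var(Y(1)|x)/pi(1|x) + Var(Y(0)|x)/pi(0|x)) * q*(x)^2 / p(x)."""
    total = 0.0
    for x in q_star:
        per_unit = var1[x] / pi1[x] + var0[x] / (1.0 - pi1[x])
        total += per_unit * q_star[x] ** 2 / p[x]
    return total


# Hypothetical conditional variances (assumptions, not from the slides).
q_star = {"low": 0.5, "high": 0.5}
var1 = {"low": 1.0, "high": 9.0}
var0 = {"low": 1.0, "high": 1.0}

# RCT baseline: sample covariates from q* and assign treatment with probability 1/2.
half = {x: 0.5 for x in q_star}
v_rct = efficiency_bound(q_star, q_star, half, var1, var0)
print(v_rct)
```

Any other choice of `p` and `pi1` can be plugged into the same function to compare designs.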
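  9. Efficient Propensity Score and Covariate Density

    • We minimize the lower bound by optimizing $V(\pi, p)$ with respect to $\pi$ and $p$.
    • Define the efficient propensity score and covariate density as $(\pi^*, p^*) := \arg\min_{\pi, p} V(\pi, p)$.
      • We can obtain closed forms of $\pi^*$ and $p^*$:
        $$\pi^*(a \mid x) = \frac{\sqrt{\mathrm{Var}(Y(a) \mid x)}}{\sqrt{\mathrm{Var}(Y(1) \mid x)} + \sqrt{\mathrm{Var}(Y(0) \mid x)}},$$
        $$p^*(x) = \frac{\left(\sqrt{\mathrm{Var}(Y(1) \mid x)} + \sqrt{\mathrm{Var}(Y(0) \mid x)}\right) q^*(x)}{\int \left(\sqrt{\mathrm{Var}(Y(1) \mid x')} + \sqrt{\mathrm{Var}(Y(0) \mid x')}\right) q^*(x')\, dx'}.$$
      • Note that $\pi^*$ and $p^*$ are unknown since $\mathrm{Var}(Y(a) \mid x)$ is unknown.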
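A short sketch of why the minimizers take this form, using the shorthand $\sigma_a(x) := \sqrt{\mathrm{Var}(Y(a) \mid x)}$ (a notation introduced here for brevity). The outcome density integrates to one, so the bound depends only on $x$; the argument is pointwise minimization in $\pi$ followed by the Cauchy–Schwarz inequality in $p$.

```latex
% Step 1: for each fixed x, minimize over \pi(1 \mid x) \in (0, 1).
\frac{d}{d\pi}\left\{\frac{\sigma_1^2(x)}{\pi} + \frac{\sigma_0^2(x)}{1 - \pi}\right\}
  = -\frac{\sigma_1^2(x)}{\pi^2} + \frac{\sigma_0^2(x)}{(1 - \pi)^2} = 0
\;\Longrightarrow\;
\pi^*(1 \mid x) = \frac{\sigma_1(x)}{\sigma_1(x) + \sigma_0(x)},
\qquad
\frac{\sigma_1^2(x)}{\pi^*(1 \mid x)} + \frac{\sigma_0^2(x)}{\pi^*(0 \mid x)}
  = \big(\sigma_1(x) + \sigma_0(x)\big)^2 .

% Step 2: minimize over densities p, using \int p(x)\,dx = 1 and Cauchy--Schwarz.
\left(\int \big(\sigma_1(x) + \sigma_0(x)\big)\, q^*(x)\, dx\right)^2
  = \left(\int \frac{\big(\sigma_1(x) + \sigma_0(x)\big)\, q^*(x)}{\sqrt{p(x)}}\,\sqrt{p(x)}\, dx\right)^2
  \le \int \frac{\big(\sigma_1(x) + \sigma_0(x)\big)^2\, q^*(x)^2}{p(x)}\, dx ,

% with equality if and only if p(x) \propto (\sigma_1(x) + \sigma_0(x))\, q^*(x). Hence
V(\pi^*, p^*) = \left(\int \big(\sigma_1(x) + \sigma_0(x)\big)\, q^*(x)\, dx\right)^2 .
```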
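  10. Efficient Propensity Score and Covariate Density

    • The efficient propensity score:
      $$\pi^*(a \mid X) = \frac{\sqrt{\mathrm{Var}(Y(a) \mid X)}}{\sqrt{\mathrm{Var}(Y(1) \mid X)} + \sqrt{\mathrm{Var}(Y(0) \mid X)}}.$$
      • This is the Neyman allocation (Neyman, 1932).
      • Kato et al. (2020) propose optimizing only $\pi^*$.
    • The efficient covariate density:
      $$p^*(x) = \frac{\left(\sqrt{\mathrm{Var}(Y(1) \mid x)} + \sqrt{\mathrm{Var}(Y(0) \mid x)}\right) q^*(x)}{\int \left(\sqrt{\mathrm{Var}(Y(1) \mid x')} + \sqrt{\mathrm{Var}(Y(0) \mid x')}\right) q^*(x')\, dx'}.$$
      • It samples more experimental units from regions where the outcomes have larger variance.
      • Although the ATE is defined over $q^*$, the efficient covariate density $p^*$ generally differs from $q^*$.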
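To make the gain concrete, the sketch below computes the efficient design for a finite covariate set and compares the efficiency bound under three designs: a standard RCT, optimizing only the propensity score, and optimizing both. The function names and all numeric values are illustrative assumptions, not results from the slides.

```python
import math


def efficient_design(q_star, var1, var0):
    """Efficient propensity score pi*(1|x) and covariate distribution p*(x)."""
    sd_sum = {x: math.sqrt(var1[x]) + math.sqrt(var0[x]) for x in q_star}
    norm = sum(sd_sum[x] * q_star[x] for x in q_star)
    pi1 = {x: math.sqrt(var1[x]) / sd_sum[x] for x in q_star}
    p = {x: sd_sum[x] * q_star[x] / norm for x in q_star}
    return pi1, p


def efficiency_bound(q_star, p, pi1, var1, var0):
    """Discrete analogue of V(pi, p)."""
    return sum(
        (var1[x] / pi1[x] + var0[x] / (1.0 - pi1[x])) * q_star[x] ** 2 / p[x]
        for x in q_star
    )


# Hypothetical conditional variances (assumptions for illustration only).
q_star = {"low": 0.5, "high": 0.5}
var1 = {"low": 1.0, "high": 9.0}
var0 = {"low": 1.0, "high": 1.0}

pi1_star, p_star = efficient_design(q_star, var1, var0)
half = {x: 0.5 for x in q_star}

v_rct = efficiency_bound(q_star, q_star, half, var1, var0)          # RCT
v_pi_only = efficiency_bound(q_star, q_star, pi1_star, var1, var0)  # optimize pi only
v_both = efficiency_bound(q_star, p_star, pi1_star, var1, var0)     # optimize pi and p
print(v_rct, v_pi_only, v_both)
```

In this toy setting the three bounds evaluate to 12, 10, and 9 (up to floating-point rounding), and the efficient design samples the high-variance group with probability 2/3 even though its target share is 1/2.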
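  11. Efficient Active Adaptive Experiment

    • We estimate $\mathrm{Var}(Y(a) \mid X)$ in order to use $\pi^*$ and $p^*$ in an experiment.
    • Our designed adaptive experiment. In each round $t \in [T]$:
      • Estimate $\mathrm{Var}(Y(a) \mid X)$ using $\{(Y_s, A_s, X_s)\}_{s=1}^{t-1}$. We denote the estimator by $\widehat{\mathrm{Var}}_t(Y(a) \mid X)$.
      • Sample
        $$X_t \sim p_t(x) = \frac{\left(\sqrt{\widehat{\mathrm{Var}}_t(Y(1) \mid x)} + \sqrt{\widehat{\mathrm{Var}}_t(Y(0) \mid x)}\right) q^*(x)}{\int \left(\sqrt{\widehat{\mathrm{Var}}_t(Y(1) \mid x')} + \sqrt{\widehat{\mathrm{Var}}_t(Y(0) \mid x')}\right) q^*(x')\, dx'}.$$
      • Assign treatment
        $$A_t \sim \pi_t(a \mid X_t) = \frac{\sqrt{\widehat{\mathrm{Var}}_t(Y(a) \mid X_t)}}{\sqrt{\widehat{\mathrm{Var}}_t(Y(1) \mid X_t)} + \sqrt{\widehat{\mathrm{Var}}_t(Y(0) \mid X_t)}}.$$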
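One possible plug-in version of this round, for a finite covariate set, is sketched below. The slides do not specify how the conditional variances are estimated; here the sample standard deviation within each (covariate, treatment) cell is used, with a fallback value when a cell has fewer than two observations and a floor to keep probabilities away from zero. The function name, `floor`, and `default_sd` are assumptions made for this illustration.

```python
import statistics


def plug_in_design(history, q_star, floor=0.1, default_sd=1.0):
    """Estimate pi_t(1|x) and p_t(x) from past observations (X_s, A_s, Y_s)."""
    sd = {}
    for x in q_star:
        for a in (0, 1):
            ys = [y for (x_s, a_s, y) in history if x_s == x and a_s == a]
            if len(ys) >= 2:
                # Sample standard deviation, floored so no cell gets zero weight.
                sd[(x, a)] = max(statistics.stdev(ys), floor)
            else:
                # Too little data: fall back to a common default (assumption).
                sd[(x, a)] = default_sd
    sd_sum = {x: sd[(x, 1)] + sd[(x, 0)] for x in q_star}
    norm = sum(sd_sum[x] * q_star[x] for x in q_star)
    pi1_t = {x: sd[(x, 1)] / sd_sum[x] for x in q_star}
    p_t = {x: sd_sum[x] * q_star[x] / norm for x in q_star}
    return pi1_t, p_t


q_star = {"low": 0.5, "high": 0.5}
# Made-up history: treated outcomes in the "high" group look noisy.
history = [("high", 1, 0.0), ("high", 1, 6.0), ("low", 0, 1.0)]
pi1_t, p_t = plug_in_design(history, q_star)
pi1_0, p_0 = plug_in_design([], q_star)  # no data yet
```

With an empty history every cell uses the default, so the design reduces to sampling from `q_star` with a fair coin; as noisy cells are identified, both the sampling distribution and the assignment probability shift toward them.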
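  12. Efficient Active Adaptive Experiment

    • At the end, using $\{(X_t, A_t, Y_t)\}_{t=1}^{T}$, we estimate the ATE as
      $$\hat{\theta}_T = \frac{1}{T} \sum_{t \in [T]} \left\{ \left( \frac{1[A_t = 1]\,\big(Y_t - \hat{\mu}_t(1)(X_t)\big)}{\pi_t(1 \mid X_t)} - \frac{1[A_t = 0]\,\big(Y_t - \hat{\mu}_t(0)(X_t)\big)}{\pi_t(0 \mid X_t)} \right) \frac{q^*(X_t)}{p_t(X_t)} + \int \big(\hat{\mu}_t(1)(x) - \hat{\mu}_t(0)(x)\big)\, q^*(x)\, dx \right\}.$$
      • $\hat{\mu}_t(a)(X_t)$ is an estimator of $\mathbb{E}[Y(a) \mid X]$ that uses only $\{(X_s, A_s, Y_s)\}_{s=1}^{t-1}$ (cf. double machine learning).
    • We can show the asymptotic normality and efficiency of the estimator.
    • Theorem (asymptotic normality and efficiency): Assume $\pi_t(a \mid x) - \pi^*(a \mid x) \to 0$ and $p_t(x) \to p^*(x)$ as $t \to \infty$ almost surely. Then
      $$\sqrt{T}\,\big(\hat{\theta}_T - \theta_0\big) \xrightarrow{d} \mathcal{N}\big(0, V(\pi^*, p^*)\big) \quad (T \to \infty).$$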
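The estimator can be sketched for a finite covariate set as follows. Each record is assumed to carry the design used in its round and the outcome-regression predictions fitted from earlier rounds only; the record layout and the function name `aipw_estimate` are hypothetical, and the numbers are made up to keep the arithmetic easy to follow.

```python
def aipw_estimate(records, q_star):
    """Importance-weighted AIPW-style ATE estimate over q_star.

    records: list of (x, a, y, pi1, p_x, mu1, mu0), where
      pi1  = pi_t(1 | X_t), p_x = p_t(X_t),
      mu1, mu0 = dicts x -> predictions of E[Y(1)|x], E[Y(0)|x]
                 fitted using rounds before t only.
    """
    total = 0.0
    for x, a, y, pi1, p_x, mu1, mu0 in records:
        if a == 1:
            correction = (y - mu1[x]) / pi1
        else:
            correction = -(y - mu0[x]) / (1.0 - pi1)
        # Plug-in term: predicted effect averaged over the target distribution.
        plug_in = sum(q_star[z] * (mu1[z] - mu0[z]) for z in q_star)
        # The density ratio q*(X_t) / p_t(X_t) reweights the residual term.
        total += correction * q_star[x] / p_x + plug_in
    return total / len(records)


q_star = {"low": 0.5, "high": 0.5}
mu1 = {"low": 1.0, "high": 1.0}
mu0 = {"low": 0.0, "high": 0.0}
records = [
    ("low", 1, 2.0, 0.5, 0.5, mu1, mu0),    # residual +1, weight 1/0.5
    ("high", 0, 1.0, 0.75, 0.5, mu1, mu0),  # residual +1 on control, weight 1/0.25
]
estimate = aipw_estimate(records, q_star)
```

In this two-record example the per-round terms are 3 and -3, so the estimate is 0; with real data the residual corrections average out around the plug-in term.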
  13. Simulation Studies

    [Figures: conditional variances of the two treatments; comparison of $q^*(x)$ and $p^*(x)$ in two cases, Gaussian and uniform distributions for $q^*(x)$.]
  14. Simulation Studies

    [Figure: empirical squared errors between the ATE estimators and the true ATE.]
    • AS-AIPW: optimizes only the treatment-assignment probabilities.
    • AAS-AIPW: optimizes both the treatment-assignment probabilities and the covariate density.
  15. Conclusion

    • Active adaptive experimental design for ATE estimation.
      1. Minimize the semiparametric efficiency bound with respect to both the propensity score and the covariate density. → Define the minimizers as the efficient propensity score and the efficient covariate density.
      2. Estimate the efficient propensity score and covariate density during the experiment. Draw $X_t$ from the estimated efficient covariate density and $A_t$ from the estimated efficient propensity score.
      3. At the end of the experiment, estimate the ATE using the experimental data. The estimator's asymptotic variance matches the minimized efficiency bound.
  16. Thank you for your attention!

    References
    • Hahn, J., Hirano, K., and Karlan, D. Adaptive experimental design using the propensity score. Journal of Business and Economic Statistics, 2011.
    • Kato, M., Ishihara, T., Honda, J., and Narita, Y. Adaptive experimental design for efficient treatment effect estimation: Randomized allocation via contextual bandit algorithm. arXiv:2002.05308, 2020.
    • Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 2000.
    • Sugiyama, M. Active learning in approximately linear regression based on conditional expectation of generalization error. Journal of Machine Learning Research, 2006.
    • Uehara, M., Kato, M., and Yasui, S. Off-policy evaluation and learning for external validity under a covariate shift. In NeurIPS, 2020.