Slide 1

Adaptive Experimental Design for Efficient ATE Estimation
Masahiro Kato, Takuya Ishihara, Junya Honda, Yusuke Narita
NeurIPS 2020 Workshop on Causal Discovery and Causality-Inspired Machine Learning

Slide 2

Problem Setting

■ RCT: sequentially assign a treatment and observe the outcome.
  • At each round, an individual arrives with covariate X_t ∼ p(x), the researcher assigns a treatment A_t ∈ {0, 1} ∼ π(a ∣ x), and the outcome Y_t(A_t) is observed.
■ Goal: estimating the Average Treatment Effect (ATE) θ = E[f(1 ∣ X) − f(0 ∣ X)], where f(a ∣ x) = E[Y(a) ∣ X = x]. Ex. the effect of a new vaccine for COVID-19.
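The sequential interaction can be sketched as a small simulation. The linear-Gaussian outcome model, the constant assignment probability, and the true ATE of 1.0 below are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rct(T=1000, pi=0.5):
    """One pass of the sequential experiment: draw a covariate,
    assign a treatment, observe the outcome."""
    records = []
    for t in range(T):
        x = rng.normal()              # covariate X_t ~ p(x)
        a = rng.binomial(1, pi)       # treatment A_t ~ pi(a | x) (constant here)
        # Hypothetical outcome model: Y = theta * A + 0.5 * X + noise, theta = 1.0.
        y = 1.0 * a + 0.5 * x + rng.normal()
        records.append((x, a, y))
    return records

data = simulate_rct()
y1 = np.array([y for _, a, y in data if a == 1])
y0 = np.array([y for _, a, y in data if a == 0])
print(y1.mean() - y0.mean())   # difference-in-means estimate, close to 1.0
```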

Slide 3

Efficient Experimental Design

■ Efficient RCT (or A/B testing)? = Hypothesis testing with the smallest sample size.
■ How do we reduce the sample size? → Conduct the experiment so as to minimize the asymptotic variance.
➢ The lower bound of the asymptotic variance of the ATE estimator:
  E[ σ²(1 ∣ X) / π(a = 1 ∣ X) + σ²(0 ∣ X) / (1 − π(a = 1 ∣ X)) + (f(1 ∣ X) − f(0 ∣ X) − θ)² ]  (1)
  → minimize the asymptotic variance by adjusting π(a = 1 ∣ x).
■ There is an ATE estimator achieving the lower bound.
  → We construct an optimal π(a = 1 ∣ x) and an efficient ATE estimator.
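To see why the bound depends on the assignment probability, the sketch below evaluates the π-dependent integrand of the lower bound for a constant propensity; the noise levels σ(1) = 2 and σ(0) = 1 are hypothetical values chosen for illustration.

```python
def variance_bound(pi, sigma1, sigma0, tau_dev=0.0):
    """Pointwise integrand of the semiparametric lower bound (1) for a
    constant propensity pi: sigma1^2/pi + sigma0^2/(1-pi) + tau_dev^2,
    where tau_dev stands in for (f(1|x) - f(0|x) - theta)."""
    return sigma1**2 / pi + sigma0**2 / (1.0 - pi) + tau_dev**2

# With unequal noise levels, a 50/50 split is not optimal.
sigma1, sigma0 = 2.0, 1.0
print(variance_bound(0.5, sigma1, sigma0))    # uniform assignment -> 10.0
print(variance_bound(2/3, sigma1, sigma0))    # sigma1/(sigma1+sigma0) -> 9.0
```

Allocating more samples to the noisier arm strictly lowers the bound, which is the lever the design exploits.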

Slide 4

Desirable Assignment Probability of a Treatment

■ π(a = 1 ∣ x) is the assignment probability of a treatment. The minimizer of the asymptotic variance of the ATE estimator is
  π*(a = 1 ∣ x) = σ(1 ∣ x) / (σ(1 ∣ x) + σ(0 ∣ x)).  (2)
■ Assign a treatment A_t following π*(a = 1 ∣ x).
■ However, π*(a = 1 ∣ x) includes the unknown σ(a ∣ x).
  → Sequentially estimate σ(a ∣ x) and assign A_t with an estimate of π*(a = 1 ∣ x).
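A quick grid search over constant propensities (hypothetical noise levels again) confirms that the π-dependent part of the bound is minimized at the closed form (2):

```python
import numpy as np

sigma1, sigma0 = 2.0, 1.0
grid = np.linspace(0.01, 0.99, 9801)
# pi-dependent part of the variance lower bound (1)
values = sigma1**2 / grid + sigma0**2 / (1.0 - grid)
pi_hat = grid[np.argmin(values)]
pi_star = sigma1 / (sigma1 + sigma0)   # closed-form minimizer (2)
print(pi_hat, pi_star)                 # both ~ 0.6667
```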

Slide 5

Adaptive Experimental Design

■ In our adaptive experimental design, at each period t, we sequentially
  1. using the past observations Ω_{t−1} = {(X_s, A_s, Y_s)}_{s=1}^{t−1}, estimate σ(a ∣ x), i.e., estimate π*(a = 1 ∣ x);
  2. assign a treatment A_t following the estimator of π*(a = 1 ∣ x);
  3. observe the outcome Y_t(A_t).
At the end of the experiment (period T), construct the ATE estimator
  θ̂_T = (1/T) Σ_{t=1}^{T} [ 1[A_t = 1](Y_t − f̂_{t−1}(1 ∣ X_t)) / π̂*_{t−1}(a = 1 ∣ X_t) − 1[A_t = 0](Y_t − f̂_{t−1}(0 ∣ X_t)) / π̂*_{t−1}(a = 0 ∣ X_t) + f̂_{t−1}(1 ∣ X_t) − f̂_{t−1}(0 ∣ X_t) ],  (3)
where f̂_{t−1}(a ∣ x) is an estimator of E[Y_t ∣ A_t = a, X_t = x] using only Ω_{t−1}.
➢ This estimator achieves the asymptotic lower bound!
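The three-step loop and estimator (3) can be sketched as follows. This is a minimal sketch, assuming a covariate-free setting so the nuisances f̂ and σ̂ reduce to running per-arm means and variances (a real design would regress them on x); the arm means, noise levels, and the propensity-clipping constant are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

TRUE_MEANS = (0.0, 1.0)   # f(0), f(1): true ATE = 1.0 (hypothetical)
TRUE_SDS = (1.0, 2.0)     # sigma(0), sigma(1) (hypothetical)

def run_adaptive(T=5000, clip=0.1):
    sums = [0.0, 0.0]; sqs = [0.0, 0.0]; ns = [0, 0]
    scores = []
    for t in range(T):
        # 1. Estimate sigma(a) from past data only; form pi*_t (clipped).
        sds = []
        for a in (0, 1):
            if ns[a] >= 2:
                var = sqs[a] / ns[a] - (sums[a] / ns[a]) ** 2
                sds.append(max(np.sqrt(max(var, 0.0)), 1e-6))
            else:
                sds.append(1.0)   # uninformative prior guess
        pi1 = float(np.clip(sds[1] / (sds[1] + sds[0]), clip, 1 - clip))
        # 2. Assign a treatment and 3. observe the outcome.
        a = rng.binomial(1, pi1)
        y = TRUE_MEANS[a] + TRUE_SDS[a] * rng.normal()
        # AIPW-type score of (3), with f_hat built from past data only
        # (this predictability is what yields the martingale property).
        f1 = sums[1] / ns[1] if ns[1] else 0.0
        f0 = sums[0] / ns[0] if ns[0] else 0.0
        pi_a = pi1 if a == 1 else 1 - pi1
        f_a = f1 if a == 1 else f0
        sign = 1 if a == 1 else -1
        scores.append(sign * (y - f_a) / pi_a + f1 - f0)
        sums[a] += y; sqs[a] += y * y; ns[a] += 1
    return float(np.mean(scores))

print(run_adaptive())   # close to the true ATE of 1.0
```

More samples flow to the noisier arm 1, approaching the allocation π*(1) = 2/3 from (2).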

Slide 6

Questions

For the obtained dataset {(X_t, A_t, Y_t)}_{t=1}^{T}, the samples are dependent.
■ Can we derive asymptotic normality from the dependent samples?
  • Yes. The martingale property of (3) solves the problem.
■ Can we stop the experiment as early as possible?
  • Yes. To be able to stop the experiment at any time, we use sequential testing.
  • We derived a non-asymptotic deviation bound for the ATE estimator (3).
  ➢ Standard hypothesis testing: the sample size is fixed (we cannot stop early).
  ➢ Sequential testing: the sample size is a random variable (we can stop early).
■ Which should we use, standard hypothesis testing or sequential testing?
  • It depends on the application; each has its own advantages.
  • Asymptotically, both approaches return the same result.
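A sketch of sequential testing on the estimator's score sequence, using a Bonferroni correction across a fixed set of checkpoints. The checkpoints, effect size, and noise level are hypothetical, and the paper's non-asymptotic (LIL-type) deviation bound would replace the Bonferroni threshold with a tighter anytime-valid one.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)

def sequential_test(stream, checkpoints, alpha=0.05):
    """Bonferroni-corrected sequential test of H0: theta = 0.
    Splitting alpha across the checkpoints keeps the overall
    false-positive rate below alpha despite repeated looks."""
    alpha_k = alpha / len(checkpoints)
    z = NormalDist().inv_cdf(1 - alpha_k / 2)
    seen = []
    for t, s in enumerate(stream, start=1):
        seen.append(s)
        if t in checkpoints:
            mean = np.mean(seen)
            se = np.std(seen, ddof=1) / np.sqrt(t)
            if abs(mean) > z * se:
                return t      # stop early: reject H0 at time t
    return None               # never rejected

# Scores with a true effect of 1.0 (sd 3): the test should stop early.
scores = 1.0 + 3.0 * rng.normal(size=5000)
print(sequential_test(scores, checkpoints={250, 500, 1000, 2000, 5000}))
```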

Slide 7

Experimental Results

■ We show the results of standard hypothesis testing and sequential testing.
■ For sequential testing, we use non-asymptotic deviation-bound-based testing (LIL) and Bonferroni-method-based sequential testing.

Slide 8

Other Questions and Open Problems

■ Is Donsker's condition required for the nuisance estimators?
  • No. The martingale property relaxes this requirement.
■ Is the martingale property necessary?
  • No. Under appropriate assumptions, we can show the asymptotic normality without requiring Donsker's condition. See van der Laan and Lendle (2014) and Kato (2020).
■ Can we extend the method to best arm identification (BAI)?
  • We are tackling this problem!
■ The difficulty lies in extending semiparametric inference to BAI.
  • Semiparametric theory is mainly based on asymptotic theory.
  • BAI theory is mainly based on non-asymptotic theory.

Slide 9

Thank you for listening

Slide 10

References

• Klaassen, C. A. J. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 1987.
• van der Laan, M. J. The construction and analysis of adaptive group sequential designs. 2008.
• Hahn, J., Hirano, K., and Karlan, D. Adaptive experimental design using the propensity score. Journal of Business and Economic Statistics, 29(1):96–108, 2011.
• Zheng, W. and van der Laan, M. J. Cross-validated targeted minimum-loss-based estimation. In Targeted Learning: Causal Inference for Observational and Experimental Data, Springer Series in Statistics, 2011.
• van der Laan, M. J. and Lendle, S. D. Online targeted learning. 2014.
• Johari, R., Pekelis, L., and Walsh, D. J. Always valid inference: Bringing sequential analysis to A/B testing. arXiv preprint arXiv:1512.04922, 2015.

Slide 11

References (cont.)

• Luedtke, A. R. and van der Laan, M. J. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Annals of Statistics, 2016.
• Balsubramani, A. and Ramdas, A. Sequential nonparametric testing with the law of the iterated logarithm. In UAI, 2016.
• Kaufmann, E., Cappé, O., and Garivier, A. On the complexity of best-arm identification in multi-armed bandit models. JMLR, 2016.
• Zhao, S., Zhou, E., Sabharwal, A., and Ermon, S. Adaptive concentration inequalities for sequential decision problems. In NeurIPS, pp. 1343–1351, 2016.
• Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21:C1–C68, 2018.
• Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S., and Athey, S. Confidence intervals for policy evaluation in adaptive experiments. arXiv preprint arXiv:1911.02768, 2019.