
Causal Discovery Workshop

MasaKat0
November 30, 2020


This deck introduces "Adaptive Experimental Design for Efficient ATE Estimation."


Transcript

  1. Adaptive Experimental Design for Efficient ATE Estimation

    Masahiro Kato, Takuya Ishihara, Junya Honda, Yusuke Narita
    NeurIPS 2020 Workshop: Causal Discovery and Causality-Inspired Machine Learning
  2. Problem Setting

    • RCT: sequentially assign a treatment and observe the outcome.
    • Goal: estimate the average treatment effect (ATE), $\mathbb{E}[Y(1) - Y(0)]$. Example: the effect of a new vaccine for COVID-19.
    • At each period $t$: an individual arrives with covariate $X_t \sim p(x)$; the researcher assigns a treatment $A_t \in \{0, 1\}$ with $A_t \sim \pi(a \mid X_t)$ and observes the outcome $Y_t(A_t)$.
  3. Efficient Experimental Design

    • What is an efficient RCT (or A/B test)? Hypothesis testing with the smallest sample size.
    • How do we reduce the sample size? → Conduct the experiment so as to minimize the asymptotic variance.
      • The lower bound of the asymptotic variance of the ATE estimator is
        $$\mathbb{E}\left[\frac{\mathrm{Var}(Y(1)\mid X)}{\pi(A=1\mid X)} + \frac{\mathrm{Var}(Y(0)\mid X)}{1-\pi(A=1\mid X)} + \bigl(\mathbb{E}[Y(1)\mid X] - \mathbb{E}[Y(0)\mid X] - \theta_0\bigr)^2\right], \quad (1)$$
        where $\theta_0$ is the ATE.
        → Minimize the asymptotic variance by adjusting $\pi(A=1\mid X)$.
    • There is an ATE estimator achieving the lower bound.
      → We construct an optimal $\pi(A=1\mid X)$ and an efficient ATE estimator.
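To make the role of the assignment probability in bound (1) concrete, the sketch below evaluates the bound by averaging its integrand over covariate draws. The function name `variance_bound` and the toy numbers (conditional variances 4 and 1, a constant conditional effect) are assumptions chosen for illustration; they do not come from the slides.

```python
import numpy as np

def variance_bound(pi1, var1, var0, cate, theta0):
    """Monte Carlo average of the integrand of bound (1) over covariate draws.

    pi1  : assignment probabilities pi(A=1 | X) at each covariate draw
    var1 : Var(Y(1) | X) at each draw; var0 : Var(Y(0) | X) at each draw
    cate : E[Y(1) | X] - E[Y(0) | X] at each draw
    theta0 : the ATE (a scalar)
    """
    pi1 = np.asarray(pi1, dtype=float)
    integrand = var1 / pi1 + var0 / (1.0 - pi1) + (cate - theta0) ** 2
    return float(np.mean(integrand))

# Toy case (assumed numbers): nothing depends on the covariate.
n_draws = 1000
var1 = np.full(n_draws, 4.0)
var0 = np.full(n_draws, 1.0)
cate = np.full(n_draws, 0.5)
theta0 = 0.5

# Fair coin assignment versus assigning the noisier arm more often.
bound_uniform = variance_bound(np.full(n_draws, 0.5), var1, var0, cate, theta0)
bound_skewed = variance_bound(np.full(n_draws, 2.0 / 3.0), var1, var0, cate, theta0)
```

In this toy case, assigning the treatment with probability 2/3 gives a smaller bound than a fair coin, because the treated arm has the larger outcome variance.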
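  4. Desirable Assignment Probability of a Treatment

    • $\pi(A=1\mid X)$ is the assignment probability of a treatment. The minimizer of the asymptotic variance of the ATE estimator is
      $$\pi^*(A=1\mid X) = \frac{\sqrt{\mathrm{Var}(Y(1)\mid X)}}{\sqrt{\mathrm{Var}(Y(1)\mid X)} + \sqrt{\mathrm{Var}(Y(0)\mid X)}}. \quad (2)$$
    • Assign a treatment $A_t$ following $\pi^*(A=1\mid X)$.
    • However, $\pi^*(A=1\mid X)$ includes the unknown $\mathrm{Var}(Y(a)\mid X)$.
      → Sequentially estimate $\mathrm{Var}(Y(a)\mid X)$ and assign $A_t$ with the resulting estimate of $\pi^*(A_t=1\mid X_t)$.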
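The form of the minimizer (2) can be checked by minimizing the variance bound pointwise in the covariate. The shorthand $p = \pi(A=1\mid x)$ and $v_a = \mathrm{Var}(Y(a)\mid x)$ is introduced here only for this check; the term of the bound that does not involve $p$ is dropped.

```latex
\begin{align*}
g(p) &= \frac{v_1}{p} + \frac{v_0}{1-p}, \qquad p \in (0,1),\\
g'(p) &= -\frac{v_1}{p^2} + \frac{v_0}{(1-p)^2} = 0
  \;\Longrightarrow\; \frac{1-p}{p} = \sqrt{\frac{v_0}{v_1}}
  \;\Longrightarrow\; p^* = \frac{\sqrt{v_1}}{\sqrt{v_1}+\sqrt{v_0}},\\
g(p^*) &= \left(\sqrt{v_1}+\sqrt{v_0}\right)^2 .
\end{align*}
```

Since $g$ is convex on $(0,1)$, the stationary point is the minimizer: the arm with the larger conditional standard deviation is assigned more often.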
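  5. Adaptive Experimental Design

    • In our adaptive experimental design, at each period $t$, we sequentially
      1. estimate $\mathrm{Var}(Y(a)\mid X)$, and hence $\pi^*(A=1\mid X)$, using $\Omega_{t-1} = \{(X_s, A_s, Y_s(A_s))\}_{s=1}^{t-1}$;
      2. assign a treatment $A_t$ following the estimator of $\pi^*(A=1\mid X)$;
      3. observe the outcome $Y_t = Y_t(A_t)$.
    • At the end of the experiment (period $T$), construct an ATE estimator defined as
      $$\frac{1}{T}\sum_{t=1}^{T}\left(\frac{\mathbf{1}[A_t=1]\,\bigl(Y_t-\hat{f}_{t-1}(1,X_t)\bigr)}{\pi^*(A_t=1\mid X_t,\Omega_{t-1})} - \frac{\mathbf{1}[A_t=0]\,\bigl(Y_t-\hat{f}_{t-1}(0,X_t)\bigr)}{\pi^*(A_t=0\mid X_t,\Omega_{t-1})} + \hat{f}_{t-1}(1,X_t) - \hat{f}_{t-1}(0,X_t)\right), \quad (3)$$
      where $\hat{f}_{t-1}(a,X_t)$ is an estimator of $\mathbb{E}[Y_t(a)\mid X_t]$ that uses only $\Omega_{t-1}$, and $\pi^*(\cdot\mid X_t,\Omega_{t-1})$ is the estimate of $\pi^*$ used to assign $A_t$.
      • This estimator achieves the asymptotic lower bound.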
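The three steps and estimator (3) can be sketched in a small simulation. The data-generating process (a binary covariate with Gaussian outcomes), the per-cell sample means and variances used as nuisance estimators, the clipping threshold `clip`, and the function name `run_adaptive_experiment` are all assumptions made for illustration; this is not the authors' experimental setup.

```python
import numpy as np

def run_adaptive_experiment(T=5000, seed=0, clip=0.1):
    rng = np.random.default_rng(seed)
    # Hypothetical data-generating process: X uniform on {0, 1}, Gaussian outcomes.
    mean = np.array([[0.0, 1.0], [0.5, 2.5]])  # mean[x, a] = E[Y(a) | X = x]
    std = np.array([[1.0, 2.0], [1.0, 0.5]])   # std[x, a]  = sd(Y(a) | X = x)
    true_ate = float(np.mean(mean[:, 1] - mean[:, 0]))

    # Running statistics per (x, a) cell, built from past observations only.
    n = np.zeros((2, 2))
    s1 = np.zeros((2, 2))  # sum of outcomes
    s2 = np.zeros((2, 2))  # sum of squared outcomes
    scores = np.empty(T)

    for t in range(T):
        x = rng.integers(0, 2)
        # Step 1: estimate E[Y(a) | x] and Var(Y(a) | x) from the history.
        f_hat = np.where(n[x] > 0, s1[x] / np.maximum(n[x], 1), 0.0)
        if np.all(n[x] >= 2):
            var_hat = np.maximum(s2[x] / n[x] - (s1[x] / n[x]) ** 2, 1e-12)
            sd_hat = np.sqrt(var_hat)
            p1 = sd_hat[1] / (sd_hat[0] + sd_hat[1])  # plug-in version of (2)
        else:
            p1 = 0.5  # too little data in this cell: assign uniformly
        # Assumed safeguard: keep the probability away from 0 and 1.
        p1 = float(np.clip(p1, clip, 1.0 - clip))
        # Step 2: assign the treatment with the estimated probability.
        a = int(rng.random() < p1)
        # Step 3: observe the outcome of the assigned arm.
        y = rng.normal(mean[x, a], std[x, a])
        # Summand of estimator (3); nuisance estimates come from the past only.
        ipw1 = (a == 1) * (y - f_hat[1]) / p1
        ipw0 = (a == 0) * (y - f_hat[0]) / (1.0 - p1)
        scores[t] = ipw1 - ipw0 + f_hat[1] - f_hat[0]
        # Update the history only after the summand has been computed.
        n[x, a] += 1
        s1[x, a] += y
        s2[x, a] += y * y

    return float(scores.mean()), true_ate

estimate, truth = run_adaptive_experiment()
```

Updating the running statistics after computing each summand is the design choice that matters here: it keeps the nuisance estimates a function of past data only, which is what the martingale argument relies on.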
  6. Questions

    For an obtained dataset $\{(X_t, A_t, Y_t)\}_{t=1}^{T}$, the samples are dependent.
    • Can we derive asymptotic normality from the dependent samples?
      • Yes. The martingale property of (3) solves the problem.
    • Can we stop the experiment as early as possible?
      • Yes. To stop the experiment at an arbitrary time, we use sequential testing.
      • We derived a non-asymptotic deviation bound for the ATE estimator (3).
      • Standard hypothesis testing: the sample size is fixed (we cannot stop).
      • Sequential testing: the sample size is a random variable (we can stop).
    • Which should we use, standard hypothesis testing or sequential testing?
      • It depends on the application; each has different advantages.
      • Asymptotically, both approaches return the same result.
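The martingale property of (3) can be spelled out in a few lines. Let $z_t$ denote the $t$-th summand of (3) and $\theta_0$ the ATE (notation introduced here). The check assumes that $\hat{f}_{t-1}$ and the assignment probability depend only on $\Omega_{t-1}$ and $X_t$, that $A_t$ is drawn with that probability independently of the potential outcomes given $(X_t, \Omega_{t-1})$, and that $(X_t, Y_t(0), Y_t(1))$ is independent of $\Omega_{t-1}$.

```latex
\begin{align*}
\mathbb{E}\!\left[\frac{\mathbf{1}[A_t=a]\,\bigl(Y_t-\hat f_{t-1}(a,X_t)\bigr)}{\pi^*(A_t=a\mid X_t,\Omega_{t-1})}\,\middle|\,X_t,\Omega_{t-1}\right]
 &= \mathbb{E}[Y_t(a)\mid X_t]-\hat f_{t-1}(a,X_t),\\
\mathbb{E}[z_t\mid X_t,\Omega_{t-1}]
 &= \mathbb{E}[Y_t(1)\mid X_t]-\mathbb{E}[Y_t(0)\mid X_t],\\
\mathbb{E}[z_t-\theta_0\mid\Omega_{t-1}] &= 0 .
\end{align*}
```

So $\{z_t-\theta_0\}$ is a martingale difference sequence, which is the structure a martingale central limit theorem needs even though the samples are dependent.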
  7. Experimental Results

    • We show the results of standard hypothesis testing and sequential testing.
    • For sequential testing, we use testing based on a non-asymptotic deviation bound (law of the iterated logarithm, LIL) and Bonferroni method-based sequential testing.
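The slides do not say how the Bonferroni method-based sequential test is implemented, so the following is only a generic sketch of the idea: run a two-sided z-test at a fixed number of looks, each at level alpha divided by the number of looks. The function name `bonferroni_sequential_test`, the checkpoint schedule, and the use of the sample variance of the summands as the variance estimate are assumptions.

```python
import numpy as np
from statistics import NormalDist

def bonferroni_sequential_test(scores, checkpoints, alpha=0.05, null_value=0.0):
    """Return the first checkpoint at which H0: ATE = null_value is rejected, else None.

    scores      : per-period summands of an ATE estimator (their mean is the estimate)
    checkpoints : increasing sample sizes at which a two-sided z-test is run
    Each look uses level alpha / K, so by the union bound the overall type I
    error is at most alpha, up to the normal approximation.
    """
    scores = np.asarray(scores, dtype=float)
    num_looks = len(checkpoints)
    critical = NormalDist().inv_cdf(1.0 - alpha / (2.0 * num_looks))
    for t in checkpoints:
        window = scores[:t]
        std_err = window.std(ddof=1) / np.sqrt(t)
        z = (window.mean() - null_value) / std_err
        if abs(z) > critical:
            return t  # reject and stop the experiment early
    return None

# Synthetic summands with a clear effect (assumed numbers).
rng = np.random.default_rng(1)
scores_effect = rng.normal(loc=0.5, scale=1.0, size=2000)
stop_time = bonferroni_sequential_test(scores_effect, [250, 500, 1000, 2000])
```

With a fixed-sample test, only the final look would be used at level alpha; the sequential version pays a stricter per-look threshold in exchange for the option to stop early.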
  8. Other Questions and Open Problems

    • Is Donsker’s condition required for the nuisance estimators?
      • No. The martingale property relaxes this requirement.
    • Is the martingale property necessary?
      • No. Under appropriate assumptions, we can show asymptotic normality without requiring Donsker’s condition. See van der Laan and Lendle (2014) and Kato (2020).
    • Can we extend the method to best arm identification (BAI)?
      • We are tackling this problem!
    • The difficulty lies in extending semiparametric inference to BAI.
      • Semiparametric theory is mainly based on asymptotic theory.
      • BAI theory is mainly based on non-asymptotic theory.
  9. Thank you for listening

  10. References

    • Klaassen, C. A. J. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 1987.
    • van der Laan, M. J. The construction and analysis of adaptive group sequential designs. 2008.
    • Hahn, J., Hirano, K., and Karlan, D. Adaptive experimental design using the propensity score. Journal of Business and Economic Statistics, 29(1):96–108, 2011.
    • Zheng, W. and van der Laan, M. J. Cross-validated targeted minimum-loss-based estimation. In Targeted Learning: Causal Inference for Observational and Experimental Data, Springer Series in Statistics. 2011.
    • van der Laan, M. J. and Lendle, S. D. Online targeted learning. 2014.
    • Johari, R., Pekelis, L., and Walsh, D. J. Always valid inference: Bringing sequential analysis to A/B testing. arXiv preprint arXiv:1512.04922, 2015.
  11. References (continued)

    • Luedtke, A. R. and van der Laan, M. J. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Annals of Statistics, 2016.
    • Balsubramani, A. and Ramdas, A. Sequential nonparametric testing with the law of the iterated logarithm. In UAI, 2016.
    • Kaufmann, E., Cappé, O., and Garivier, A. On the complexity of best-arm identification in multi-armed bandit models. JMLR, 2016.
    • Zhao, S., Zhou, E., Sabharwal, A., and Ermon, S. Adaptive concentration inequalities for sequential decision problems. In NeurIPS, pp. 1343–1351. Curran Associates, Inc., 2016.
    • Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21:C1–C68, 2018.
    • Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S., and Athey, S. Confidence intervals for policy evaluation in adaptive experiments. arXiv preprint arXiv:1911.02768, 2019.