Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice

MasaKat0

July 08, 2025

Transcript

  1. 1 Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design

    for Treatment Choice. Statistics reading seminar (統計輪講). Masahiro Kato. https://arxiv.org/abs/2506.24007
  2. 2 Experimental approach for causal inference ◼ The gold standard

    for causal inference is randomized controlled trials (RCTs). • We randomly allocate treatments to experimental units. ◼ RCTs are the gold standard but often costly and inefficient. → Design more efficient experiments (in some sense). ➢ Adaptive experimental design. • Update the treatment-allocation probabilities during the experiment to gain efficiency. ◼ How do we define an ideal treatment-allocation probability (propensity score)?
  3. 3 Table of contents ◼ I explain how we design

    adaptive experiments for treatment choice. 1. (Introduction) General approach for adaptive experimental design. 2. Problem setting. 3. Algorithm and optimality. 4. Theoretical analysis. 5. On the optimal experimental design for treatment choice
  4. 5 How to Design Adaptive Experiments? ◼ A typical procedure

    of experimental design for causal inference: Step 1. Define the goal of causal inference and the performance measure. Step 2. Compute a lower bound and an ideal treatment-allocation probability. Step 3. Conduct the designed adaptive experiment: allocate treatment arms while estimating the ideal treatment-allocation probability, and return an estimate of the target of interest using the observations from the experiment. Step 4. Investigate the performance of the designed experiment and confirm its optimality.
  5. 6 Step 1: goals and performance measures ◼ Goal 1:

    Average treatment effect (ATE) estimation. * The number of treatment arms is usually two (binary treatments). • Goal. Estimate the ATE. • Performance measure. The (Asymptotic) Variance. • Smaller asymptotic variance is better. ◼ Goal 2: Treatment choice, also known as best-arm identification. * The number of treatments is more than or equal to two (multiple treatments). • Goal. Choose the best treatment = a treatment whose expected outcome is the highest. • Performance measure. The probability of misidentifying the best treatment or the regret.
  6. 7 Step 2: lower bounds and ideal allocation probability ◼

    After deciding the goal and performance metric, we derive a lower bound. ◼ ATE estimation. • Semiparametric efficiency bound (Hahn, 1998). ◼ Best-arm identification. • Various lower bounds have been proposed; there is no consensus on which one to use. ✓ Lower bounds are often functions of the treatment-allocation probabilities (propensity scores). → We can minimize the lower bound with respect to the treatment-allocation probabilities. We refer to the probabilities that minimize the lower bound as ideal treatment-allocation probabilities.
  7. 8 Step 3: adaptive experiment ◼ Ideal allocation probabilities usually

    depend on unknown parameters of a distribution. • We estimate the probabilities (unknown parameters) during an experiment. ◼ We run an adaptive experiment, which consists of the following two phases. 1. Treatment-allocation phase: in each round 𝑡 = 1,2, … , 𝑇: • Estimate the ideal treatment-allocation probability based on past observations. • Allocate treatment following the estimated ideal treatment-allocation probability. 2. Decision-making phase: at the end of the experiment: • Return an estimate of a target of interest.
  8. 9 Step 4: upper bound and optimality ◼ For the

    designed experiment, we investigate its theoretical performance. ➢ ATE estimation. • We prove the asymptotic normality and check the asymptotic variance. • If the asymptotic variance matches the efficiency bound minimized for the treatment- allocation probability, the design is (asymptotically) optimal. ➢ Best-arm identification. • We investigate the probability of misidentifying the best arm or the regret. • Distribution-dependent, minimax and Bayes optimality.
  9. 11 From ATE estimation to treatment choice ◼ From ATE

    estimation to decision-making (treatment choice; Manski, 2004). ◼ Treatment choice via adaptive experiments is called best-arm identification (BAI). • BAI has been investigated in various areas, including operations research and economics. [Diagram: gathering data → parameter estimation → decision-making; the pipeline covers both ATE estimation and treatment choice.]
  10. 12 Setup ◼ Treatment. 𝐾 treatments indexed by 1,2, …

    $K$. Also referred to as treatment arms or arms. ◼ Potential outcome. $Y_a \in \mathbb{R}$. ◼ No covariates. ◼ Distribution. $Y_a$ follows a parametric distribution $P_{\boldsymbol{\mu}}$, where $\boldsymbol{\mu} = (\mu_a)_{a \in [K]} \in \mathbb{R}^K$. • The mean of $Y_a$ is $\mu_a$ (and only the mean parameters can take different values). • For simplicity, let $P_{\boldsymbol{\mu}}$ be a Gaussian distribution under which the variance of $Y_a$ is fixed at $\sigma_a^2$. ◼ Goal. Find the following best treatment arm efficiently: $a^*_{\boldsymbol{\mu}} := \arg\max_{a \in \{1,2,\dots,K\}} \mu_a$.
  11. 13 Setup ◼ Adaptive experimental design with 𝑻 rounds: •

    Treatment-allocation phase: in each round $t = 1, 2, \dots, T$: • Allocate treatment $A_t \in [K] := \{1, 2, \dots, K\}$ using the past observations $\{(A_s, Y_s)\}_{s=1}^{t-1}$. • Observe the outcome $Y_t = \sum_{a \in [K]} 1[A_t = a]\, Y_{a,t}$. • Decision-making phase: at the end of the experiment: • Obtain an estimator $\hat{a}_T$ of the best treatment arm using $\{(A_t, Y_t)\}_{t=1}^{T}$. [Diagram: in round $t$, unit $t$ receives treatment $A_t$ and yields outcome $Y_t$; after round $T$, choose the best treatment.]
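To make the protocol concrete, here is a minimal Python sketch of the loop above under the Gaussian model of the setup. The function names and the uniform baseline rule are mine; `allocation_probs` stands in for whatever estimated ideal treatment-allocation probability a particular design prescribes.

```python
import numpy as np

def run_adaptive_experiment(mu, sigma, T, allocation_probs, seed=0):
    """Simulate T rounds: allocate A_t, observe Y_t, then return the empirical best arm."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    K = len(mu)
    history = []  # logged (A_t, Y_t)
    for t in range(T):
        w = allocation_probs(history, K)        # treatment-allocation probability for round t
        a = rng.choice(K, p=w)                  # allocate treatment A_t
        y = rng.normal(mu[a], sigma[a])         # observe outcome Y_t = Y_{A_t, t}
        history.append((a, y))
    # Decision-making phase: arm with the highest sample mean.
    sums, counts = np.zeros(K), np.zeros(K)
    for a, y in history:
        sums[a] += y
        counts[a] += 1
    means = np.where(counts > 0, sums / np.maximum(counts, 1), -np.inf)
    return int(np.argmax(means))

# Baseline: uniform allocation, ignoring the history.
uniform = lambda history, K: np.ones(K) / K
best_arm = run_adaptive_experiment(mu=[0.0, 0.2, 0.1], sigma=[1.0, 1.0, 1.0],
                                   T=1000, allocation_probs=uniform)
```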
  12. 14 Performance measures • Let ℙ𝝁 and 𝔼𝝁 be the

    probability law and expectation under $P_{\boldsymbol{\mu}}$. ◼ Probability of misidentification: $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$. ➢ The probability that the estimate $\hat{a}_T$ of the best treatment is not the true best one $a^*_{\boldsymbol{\mu}}$. ◼ Expected simple regret: $\mathrm{Regret}_{\boldsymbol{\mu}} := \mathbb{E}_{\boldsymbol{\mu}}[Y_{a^*_{\boldsymbol{\mu}}} - Y_{\hat{a}_T}]$. ➢ The welfare loss when we deploy the estimated best treatment $\hat{a}_T$ for the population. • In this talk, we simply refer to this as the regret. • Also called the out-of-sample regret or the policy regret (Kasy and Sautmann, 2021), in contrast to, e.g., the in-sample regret in regret minimization.
  13. 15 Evaluation framework ◼ Distribution-dependent analysis. • Evaluate the performance

    measures given $\boldsymbol{\mu}$ fixed with respect to $T$. • For each $\boldsymbol{\mu} \in \mathbb{R}^K$, we evaluate $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ or $\mathrm{Regret}_{\boldsymbol{\mu}}$. ◼ Minimax analysis. • Evaluate the performance measures for the worst-case $\boldsymbol{\mu}$. • We evaluate $\sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ or $\sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \mathrm{Regret}_{\boldsymbol{\mu}}$. ◼ Bayes analysis. • A prior $\Pi$ is given. • Evaluate the performance measures by averaging them over the prior. • We evaluate $\int_{\mathbb{R}^K} \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})\, d\Pi(\boldsymbol{\mu})$ or $\int_{\mathbb{R}^K} \mathrm{Regret}_{\boldsymbol{\mu}}\, d\Pi(\boldsymbol{\mu})$.
  14. 17 Minimax and Bayes optimal experiment for the regret Kato,

    M. Minimax and Bayes optimal best-arm identification: adaptive experimental design for treatment choice. • We design an experiment for treatment choice (best-arm identification). • The designed experiment is minimax and Bayes optimal for the (expected simple) regret.
  15. 18 Minimax and Bayes optimal experiment for the regret ➢

    Our designed experiment consists of the following two elements: • Treatment-allocation phase: two-stage sampling. • In the first stage, we remove clearly suboptimal treatments and estimate the variances. • In the second stage, we allocate treatment arms with a variant of the Neyman allocation. • Decision-making phase: choice of the empirical best arm (Bubeck et al., 2011; Manski, 2004). → We refer to our designed experiment as the TS-EBA experiment.
  16. 19 Minimax and Bayes optimal experiment for the regret ◼

    Treatment-allocation phase: We introduce the following two-stage sampling (see the sketch below): • Split $T$ into a first stage with $rT$ rounds and a second stage with $(1-r)T$ rounds. 1. First stage: • Allocate each treatment with equal ratio, i.e., $rT/K$ rounds each. • Using a concentration inequality, select a set of candidate best treatments $\hat{\mathcal{S}} \subseteq [K]$. • Obtain an estimator $\hat{\sigma}^2_{a, rT}$ of the variance $\sigma_a^2$ (the sample variance is fine). 2. Second stage (treatments $a \notin \hat{\mathcal{S}}$ are no longer allocated): • If $|\hat{\mathcal{S}}| = 1$, return the arm $a \in \hat{\mathcal{S}}$ as the estimate of the best arm. • If $|\hat{\mathcal{S}}| = 2$, allocate treatment $a \in \hat{\mathcal{S}}$ with probability $w_{a,t} := \hat{\sigma}_{a, rT} / (\hat{\sigma}_{1, rT} + \hat{\sigma}_{2, rT})$, where $1$ and $2$ index the two arms in $\hat{\mathcal{S}}$. • If $|\hat{\mathcal{S}}| \geq 3$, allocate treatment $a \in \hat{\mathcal{S}}$ with probability $w_{a,t} := \hat{\sigma}^2_{a, rT} / \sum_{b \in \hat{\mathcal{S}}} \hat{\sigma}^2_{b, rT}$.
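A rough Python sketch of this two-stage allocation. The split ratio `r`, the function names, and the round-robin first stage are my illustrative choices; `select_candidates` is only a placeholder here and is sketched after the next slide.

```python
import numpy as np

def select_candidates(means, stds, T):
    # Placeholder: keep every arm; the confidence-bound rule is sketched after the next slide.
    return np.arange(len(means))

def two_stage_experiment(mu, sigma, T, r=0.3, seed=0):
    """First stage: uniform allocation and variance estimation.
    Second stage: Neyman-type allocation restricted to the candidate set."""
    rng = np.random.default_rng(seed)
    K = len(mu)
    outcomes = [[] for _ in range(K)]
    n1 = int(r * T)
    for t in range(n1):                                   # first stage: ~rT/K rounds per arm
        a = t % K
        outcomes[a].append(rng.normal(mu[a], sigma[a]))
    means = np.array([np.mean(o) for o in outcomes])
    stds = np.array([np.std(o, ddof=1) for o in outcomes])  # hat{sigma}_{a, rT}

    S = np.asarray(select_candidates(means, stds, T))        # candidate best arms hat{S}
    if len(S) == 1:
        return int(S[0])
    if len(S) == 2:
        w = stds[S] / stds[S].sum()                  # standard-deviation-proportional weights
    else:
        w = stds[S] ** 2 / (stds[S] ** 2).sum()      # variance-proportional weights
    for t in range(T - n1):                          # second stage: allocate only within hat{S}
        a = int(rng.choice(S, p=w))
        outcomes[a].append(rng.normal(mu[a], sigma[a]))
    means = np.array([np.mean(o) for o in outcomes])
    return int(np.argmax(means))                     # empirical best arm hat{a}_T
```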
  17. 20 Minimax and Bayes optimal experiment for the regret ◼

    How to select the candidate set $\hat{\mathcal{S}}$? • Let $v_{a, rT} := \sqrt{\frac{\log T}{T}} \max_{a} \hat{\sigma}_{a, rT}$ and $\hat{a}_{rT} = \arg\max_{a \in [K]} \hat{\mu}_{a, rT}$. • Construct lower and upper confidence bounds as $\hat{\mu}_{a, rT} - v_{a, rT}$ and $\hat{\mu}_{a, rT} + v_{a, rT}$. • Define $\hat{\mathcal{S}}$ as $\hat{\mathcal{S}} := \{a \in [K] : \hat{\mu}_{a, rT} + v_{a, rT} \geq \hat{\mu}_{\hat{a}_{rT}, rT} - v_{\hat{a}_{rT}, rT}\}$.
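A sketch of this selection rule in Python. I read the width as $\sqrt{(\log T)/T}\,\max_a \hat{\sigma}_{a,rT}$; the square root is my reading of the slide, so treat the exact scaling as an assumption.

```python
import numpy as np

def select_candidates(means, stds, T):
    """Keep arms whose upper confidence bound reaches the lower bound of the empirical leader."""
    v = np.sqrt(np.log(T) / T) * np.max(stds)   # common width v_{a, rT}
    leader = int(np.argmax(means))              # hat{a}_{rT}
    return np.flatnonzero(means + v >= means[leader] - v)   # hat{S}

# Example: after the first stage, arm 1 leads and arm 2 is within the confidence margin.
S_hat = select_candidates(np.array([0.10, 0.40, 0.38]), np.array([1.0, 0.9, 1.1]), T=10_000)
```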
  18. 21 Minimax and Bayes optimal experiment for the regret ◼

    Decision-making phase: • Just choose the treatment arm with the highest sample mean: $\hat{a}_T = \arg\max_{a \in [K]} \hat{\mu}_{a,T}$, where $\hat{\mu}_{a,T} := \frac{1}{\sum_{t=1}^{T} 1[A_t = a]} \sum_{t=1}^{T} 1[A_t = a]\, Y_t$ is the sample mean. ◼ Denote the TS-EBA experiment by $\delta^{\mathrm{TS\text{-}EBA}}$. • We also write $\hat{a}_T^{\delta^{\mathrm{TS\text{-}EBA}}}$ when we want to clarify the dependence on the experiment. • Similarly, let $\mathrm{Regret}_{\boldsymbol{\mu}}(\delta^{\mathrm{TS\text{-}EBA}})$ be the regret of $\delta^{\mathrm{TS\text{-}EBA}}$ under $\boldsymbol{\mu}$.
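The decision rule is just the empirical best arm computed from the logged data; a minimal sketch (the function name is mine):

```python
import numpy as np

def empirical_best_arm(arms, outcomes, K):
    """Return argmax_a of the sample mean hat{mu}_{a,T} from the logged (A_t, Y_t)."""
    arms, outcomes = np.asarray(arms), np.asarray(outcomes, float)
    means = np.array([outcomes[arms == a].mean() if np.any(arms == a) else -np.inf
                      for a in range(K)])
    return int(np.argmax(means))

a_hat = empirical_best_arm(arms=[0, 1, 2, 1, 2], outcomes=[0.1, 0.5, 0.3, 0.7, 0.2], K=3)  # -> 1
```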
  19. 22 Minimax and Bayes optimal experiment for the regret ◼

    For the designed experiment, we can prove minimax and Bayes optimality. • To derive lower bounds, we restrict the class of experiments. ◼ Regular experiments. • Let $\mathcal{E}$ be a set of regular experiments. • For any $\delta \in \mathcal{E}$, the following holds: • If $\sqrt{T}(\mu_{a^*_{\boldsymbol{\mu}}} - \mu_a) \to \infty$ for all $a \neq a^*_{\boldsymbol{\mu}}$, then $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T^{\delta} \neq a^*_{\boldsymbol{\mu}}) \to 0$ as $T \to \infty$. • If there exists $a \in [K]$ such that $\sqrt{T}(\mu_{a^*_{\boldsymbol{\mu}}} - \mu_a) \to 0$, then there exists a constant $C \in (0, 1)$, independent of $T$, such that $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T^{\delta} \neq a^*_{\boldsymbol{\mu}}) \to C$ as $T \to \infty$.
  20. 23 Minimax and Bayes optimal experiment for the regret ➢

    Minimax optimality. • If $K = 2$, $\lim_{T \to \infty} \sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \sqrt{T}\, \mathrm{Regret}_{\boldsymbol{\mu}}(\delta^{\mathrm{TS\text{-}EBA}}) \leq \frac{1}{\sqrt{e}}(\sigma_1 + \sigma_2) \leq \inf_{\delta \in \mathcal{E}} \lim_{T \to \infty} \sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \sqrt{T}\, \mathrm{Regret}_{\boldsymbol{\mu}}(\delta)$. • If $K \geq 3$, $\lim_{T \to \infty} \sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \sqrt{T}\, \mathrm{Regret}_{\boldsymbol{\mu}}(\delta^{\mathrm{TS\text{-}EBA}}) \leq \sqrt{2}\left(1 + \sqrt{\tfrac{K-1}{K}}\right) \sqrt{\sum_{a \in [K]} \sigma_a^2 \log K}$.
  21. 24 Minimax and Bayes optimal experiment for the regret ➢

    Bayes optimality. $\lim_{T \to \infty} T \int_{\boldsymbol{\mu} \in \mathbb{R}^K} \mathrm{Regret}_{\boldsymbol{\mu}}(\delta^{\mathrm{TS\text{-}EBA}})\, d\Pi(\boldsymbol{\mu}) \leq 4 \sum_{a \in [K]} \int_{\boldsymbol{\mu}_{\setminus a} \in \mathbb{R}^{K-1}} \sigma^{2*}_{\setminus a}\, h_a(\mu^{*}_{\setminus a} \mid \boldsymbol{\mu}_{\setminus a})\, dH_{\setminus a}(\boldsymbol{\mu}_{\setminus a}) \leq \inf_{\delta \in \mathcal{E}} \lim_{T \to \infty} T \int_{\boldsymbol{\mu} \in \mathbb{R}^K} \mathrm{Regret}_{\boldsymbol{\mu}}(\delta)\, d\Pi(\boldsymbol{\mu})$. • $\sigma^{2*}_{\setminus a} = \sigma^2_{b^*}$ and $\mu^{*}_{\setminus a} = \mu_{b^*}$, where $b^* = \arg\max_{c \in [K] \setminus \{a\}} \mu_c$. • $\boldsymbol{\mu}_{\setminus a}$ is the parameter vector obtained by removing $\mu_a$ from $\boldsymbol{\mu}$. • $H_{\setminus a}(\boldsymbol{\mu}_{\setminus a})$ is a prior of $\boldsymbol{\mu}_{\setminus a}$.
  22. 25 Intuition ◼ The regret decomposes as

    $\mathrm{Regret}_{\boldsymbol{\mu}} = \mathbb{E}_{\boldsymbol{\mu}}[Y_{a^*_{\boldsymbol{\mu}}}] - \mathbb{E}_{\boldsymbol{\mu}}[Y_{\hat{a}_T}] = \sum_{b \neq a^*_{\boldsymbol{\mu}}} (\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b)\, \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = b)$. • $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = b) \approx \exp(-C_1 T (\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b)^2)$. • If $\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b$ converges to zero more slowly than $1/\sqrt{T}$, then $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = b) \to 0$ ($b \notin \hat{\mathcal{S}}_{rT}$). • If $\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b = C_2 / \sqrt{T}$, then $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = b)$ stays constant ($b \in \hat{\mathcal{S}}_{rT}$). [Figure: five treatment means; only the arms whose means lie within roughly $C_2/\sqrt{T}$ of the best remain in $\hat{\mathcal{S}}_{rT}$.]
  23. 26 Intuition ◼ When two treatments are given, we allocate

    them so as to maximize the discrepancy between their confidence intervals. ◼ When multiple treatments are given, there is no unique way to compare their confidence intervals. [Figure: confidence intervals around $\mu_1$ and $\mu_2$ for two treatments, and around $\mu_1$, $\mu_2$, $\mu_3$ for three treatments.] → The choice depends on the performance measure and how uncertainty is evaluated. → Maximizing the discrepancy is closely related to making the variance of the ATE estimator smaller. → Use the Neyman allocation.
  24. 27 Bayes optimal experiment for the simple regret ◼ Our

    designed experiment and theory are also applicable to other distributions, e.g., Bernoulli distributions. Komiyama, J., Ariu, K., Kato, M., and Qin, C. (2023). Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research. • A Bayes optimal experiment for treatment choice with Bernoulli distributions. • Komiyama et al. (2023) corresponds to a simplified case of our new results. • In that previous study, we only considered Bernoulli bandits, and the lower bound was not tight (there was a constant gap between the upper and lower bounds). • In the Bernoulli case, the Neyman allocation reduces to uniform sampling, so there is no need to estimate the variances.
  25. 28 Related literature ◼ Ordinal optimization: The best-arm identification has

    been studied as ordinal optimization. In this setting, the ideal allocation probability is assumed to be known. • Chen (2000) considers Gaussian outcomes; Glynn and Juneja (2004) consider more general distributions. ◼ Best-arm identification: • Audibert, Bubeck, and Munos (2010) establish the best-arm identification framework, in which the ideal allocation probability is unknown and needs to be estimated. • Bubeck, Munos, and Stoltz (2011) propose a minimax-rate-optimal algorithm.
  26. 29 Related literature ◼ Limit-of-experiment (or local asymptotic normality) framework

    for experimental design: tools for developing experiments with matching lower and upper bounds by assuming locality of the underlying distributions. • Hirano and Porter (2008) introduce this framework for treatment choice. • Armstrong (2021) and Hirano and Porter (2023) apply the framework to adaptive experiments. • Adusumilli (2025) proposes an adaptive experimental design based on the limit-of-experiment framework and diffusion-process theory. ➢ My view: we can derive tight lower and upper bounds without using the limit-of-experiment framework. • The $1/\sqrt{T}$ regime (locality) appears directly as a result of taking the worst case for the regret.
  27. 31 Preliminary: regret decomposition ◼ The simple regret can be

    written as $\mathrm{Regret}_{\boldsymbol{\mu}} = \mathbb{E}_{\boldsymbol{\mu}}[Y_{a^*_{\boldsymbol{\mu}}}] - \mathbb{E}_{\boldsymbol{\mu}}[Y_{\hat{a}_T}] = \sum_{b \neq a^*_{\boldsymbol{\mu}}} (\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b)\, \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = b)$. • The regret is the sum of the probabilities $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = b)$ weighted by the ATEs $\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b$ between the best and suboptimal arms. • It also holds that $\mathrm{Regret}_{\boldsymbol{\mu}} \leq \sum_{b \neq a^*_{\boldsymbol{\mu}}} (\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b)\, \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$. ◼ $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ is roughly upper bounded by $\sum_{b \neq a^*_{\boldsymbol{\mu}}} \exp(-C^* T (\mu_{a^*_{\boldsymbol{\mu}}} - \mu_b)^2)$ for some constant $C^* > 0$. • Cf. the Chernoff bound and large-deviation principles.
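A quick Monte Carlo check of this decomposition under a toy Gaussian instance with uniform allocation (all numbers illustrative): the simulated simple regret equals the gap-weighted selection probabilities, and the misidentification probability decays roughly like $\exp(-C^* T \Delta_b^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = np.array([0.5, 0.3, 0.0]), np.array([1.0, 1.0, 1.0])
T, reps = 300, 20_000
K, best = len(mu), int(np.argmax(mu))

picks = np.zeros(K)
for _ in range(reps):
    n = T // K                                         # uniform allocation: n samples per arm
    sample_means = rng.normal(mu, sigma / np.sqrt(n))  # distribution of each arm's sample mean
    picks[np.argmax(sample_means)] += 1
p_select = picks / reps                                # estimates of P(hat{a}_T = b)

gaps = mu[best] - mu
regret_from_decomposition = np.sum(gaps * p_select)    # sum_b (mu_* - mu_b) P(hat{a}_T = b)
p_misidentify = 1.0 - p_select[best]                   # P(hat{a}_T != a_*)
```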
  28. 32 Preliminary: evaluation framework ◼ Distribution-dependent analysis. • Evaluate the

    performance measures given $\boldsymbol{\mu}$ fixed with respect to $T$. • For each $\boldsymbol{\mu} \in \mathbb{R}^K$, we evaluate $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ or $\mathrm{Regret}_{\boldsymbol{\mu}}$. ◼ Minimax analysis. • Evaluate the performance measures for the worst-case $\boldsymbol{\mu}$. • We evaluate $\sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ or $\sup_{\boldsymbol{\mu} \in \mathbb{R}^K} \mathrm{Regret}_{\boldsymbol{\mu}}$. ◼ Bayes analysis. • A prior $\Pi$ is given. • Evaluate the performance measures by averaging them over the prior. • We evaluate $\int_{\mathbb{R}^K} \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})\, d\Pi(\boldsymbol{\mu})$ or $\int_{\mathbb{R}^K} \mathrm{Regret}_{\boldsymbol{\mu}}\, d\Pi(\boldsymbol{\mu})$.
  29. 33 Preliminary: distribution-dependent, minimax, and Bayes analysis ◼ To obtain

    an intuition, we consider a binary case ($K = 2$), where $a^*_{\boldsymbol{\mu}} = 1$. $\mathrm{Regret}_{\boldsymbol{\mu}} = (\mu_1 - \mu_2) \cdot \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2) \leq (\mu_1 - \mu_2) \cdot \exp(-C^* T (\mu_1 - \mu_2)^2)$. ◼ Distribution-dependent analysis. → $\mu_1 - \mu_2$ can be asymptotically ignored since it is a constant. → It is enough to evaluate $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ for evaluating $\mathrm{Regret}_{\boldsymbol{\mu}}$. → We evaluate the exponent $C^*$ via $-\frac{1}{T} \log \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}}) \approx -\frac{1}{T} \log \mathrm{Regret}_{\boldsymbol{\mu}} \approx C^*$.
  30. 34 Preliminary: distribution-dependent, minimax, and Bayes analysis ◼ To obtain

    an intuition, we consider a binary case ($K = 2$), where $a^*_{\boldsymbol{\mu}} = 1$. $\mathrm{Regret}_{\boldsymbol{\mu}} = (\mu_1 - \mu_2) \cdot \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2) \leq (\mu_1 - \mu_2) \cdot \exp(-C^* T (\mu_1 - \mu_2)^2)$. ◼ Minimax and Bayes analysis. • Probability of misidentification, $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$: • The rate of convergence to zero is still exponential (if the worst case is well defined). • Regret, $\mathrm{Regret}_{\boldsymbol{\mu}}$: • Distributions whose mean gaps are $\mu_1 - \mu_2 = O(1/\sqrt{T})$ dominate the regret. • The rate of convergence to zero is $O(1/\sqrt{T})$. → The analysis differs between the probability of misidentification and the regret.
  31. 35 Preliminary: distribution-dependent, minimax, and Bayes analysis ◼ Define $\Delta_{\boldsymbol{\mu}} := \mu_1 - \mu_2$.

    ◼ For some constant $C > 0$, the regret $\mathrm{Regret}_{\boldsymbol{\mu}} = \Delta_{\boldsymbol{\mu}}\, \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T \neq a^*_{\boldsymbol{\mu}})$ can be written as $\mathrm{Regret}_{\boldsymbol{\mu}} \approx \Delta_{\boldsymbol{\mu}} \exp(-C T \Delta_{\boldsymbol{\mu}}^2)$. → How $\Delta_{\boldsymbol{\mu}}$ depends on $T$ determines the evaluation of the regret. 1. $\Delta_{\boldsymbol{\mu}}$ converges to zero more slowly than $1/\sqrt{T}$: → for some increasing function $g$, the regret becomes $\mathrm{Regret}_{\boldsymbol{\mu}} \approx \exp(-g(T))$. 2. $\Delta_{\boldsymbol{\mu}}$ converges to zero at the order $1/\sqrt{T}$: → for some $C' > 0$, $\mathrm{Regret}_{\boldsymbol{\mu}} \approx C'/\sqrt{T}$. 3. $\Delta_{\boldsymbol{\mu}}$ converges to zero faster than $1/\sqrt{T}$: → $\mathrm{Regret}_{\boldsymbol{\mu}} = o(1/\sqrt{T})$. → In the worst case, $\mathrm{Regret}_{\boldsymbol{\mu}} \approx C'/\sqrt{T}$, where $\Delta_{\boldsymbol{\mu}} = O(1/\sqrt{T})$.
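The worst case can be made explicit by maximizing the approximation over the gap; a one-line calculus sketch behind the last bullet:

```latex
f(\Delta) := \Delta\, e^{-C T \Delta^2}, \qquad
f'(\Delta) = \bigl(1 - 2 C T \Delta^2\bigr)\, e^{-C T \Delta^2} = 0
\;\Longrightarrow\; \Delta^\ast = \frac{1}{\sqrt{2 C T}}, \qquad
f(\Delta^\ast) = \frac{1}{\sqrt{2 e C T}} = O\!\left(\frac{1}{\sqrt{T}}\right).
```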
  32. 36 Preliminary: bandit lower bound ◼ Lower bounds in the

    bandit problem are derived via information theory. ◼ The following lemma is one of the most general and tight results for lower bounds. • Let $P$ and $Q$ be two distributions over $K$ arms such that for all $a$, the distributions $P_a$ and $Q_a$ of $Y_a$ are mutually absolutely continuous. • We have $\sum_{a \in [K]} \mathbb{E}_P\left[\sum_{t=1}^{T} 1[A_t = a]\right] \mathrm{KL}(P_a, Q_a) \geq \sup_{\mathcal{E} \in \mathcal{F}_T} d(\mathbb{P}_P(\mathcal{E}), \mathbb{P}_Q(\mathcal{E}))$. • $d(x, y) := x \log\frac{x}{y} + (1 - x) \log\frac{1 - x}{1 - y}$ is the binary relative entropy, with the convention $d(0, 0) = d(1, 1) = 0$. Transportation lemma (Lai and Robbins, 1985; Kaufmann et al., 2016).
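For the Gaussian model used throughout this talk, the KL term in the lemma has a closed form; this standard identity (stated here for completeness) is what turns the lemma into the mean-gap bounds on the following slides:

```latex
\mathrm{KL}\bigl(\mathcal{N}(\mu_a, \sigma_a^2),\, \mathcal{N}(\mu'_a, \sigma_a^2)\bigr)
  = \frac{(\mu_a - \mu'_a)^2}{2\sigma_a^2},
\qquad\text{so the left-hand side of the lemma becomes}\qquad
\sum_{a \in [K]} \mathbb{E}_P\Bigl[\,\sum_{t=1}^{T} 1[A_t = a]\Bigr]
  \frac{(\mu_a - \mu'_a)^2}{2\sigma_a^2},
```

where $\mu'_a$ denotes the mean of arm $a$ under the alternative $Q$.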
  33. 37 Proof of the lower bound ◼ For simplicity, consider only

    binary treatments, where $\mu_1 > \mu_2$. • The regret is $\mathrm{Regret}_{\boldsymbol{\mu}} = (\mu_1 - \mu_2)\, \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2)$. ◼ Using Kaufmann et al. (2016)'s lemma, we can derive the following lower bound for the probability of misidentification: $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2) \geq \exp\left(-\frac{T (\mu_1 - \mu_2)^2}{2(\sigma_1^2/w_1 + \sigma_2^2/w_2)} + o(1)\right)$. ◼ Therefore, we have $\mathrm{Regret}_{\boldsymbol{\mu}} \geq (\mu_1 - \mu_2) \exp\left(-\frac{T (\mu_1 - \mu_2)^2}{2(\sigma_1^2/w_1 + \sigma_2^2/w_2)} + o(1)\right)$. ◼ Letting $\mu_1 - \mu_2 = \sqrt{(\sigma_1^2/w_1 + \sigma_2^2/w_2)/T}$ and optimizing over $w_a$, we have $\mathrm{Regret}_{\boldsymbol{\mu}} \geq (\sigma_1 + \sigma_2)/\sqrt{eT}$ (see the worked steps below).
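The last two steps, spelled out in the slide's notation (a sketch): first maximize over the gap, then minimize the variance term over the allocation, which is exactly the Neyman allocation.

```latex
\text{Let } \Delta := \mu_1 - \mu_2,\;\; V(w) := \frac{\sigma_1^2}{w_1} + \frac{\sigma_2^2}{w_2}.
\qquad
\sup_{\Delta > 0} \Delta\, e^{-\frac{T\Delta^2}{2 V(w)}}
  = \sqrt{\frac{V(w)}{e\,T}}
  \quad\text{(attained at } \Delta = \sqrt{V(w)/T}\text{)},
\qquad
\min_{w_1 + w_2 = 1} V(w) = (\sigma_1 + \sigma_2)^2
  \quad\text{at } w_a = \frac{\sigma_a}{\sigma_1 + \sigma_2},
```

so for any allocation $w$, $\sup_{\boldsymbol{\mu}} \mathrm{Regret}_{\boldsymbol{\mu}} \ge (\sigma_1 + \sigma_2)/\sqrt{eT}$.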
  34. 38 Proof of the upper bound ◼ We upper bound

    the regret. ◼ From the regret decomposition $\mathrm{Regret}_{\boldsymbol{\mu}} = (\mu_1 - \mu_2)\, \mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2)$, only $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2)$ depends on the experiment. ◼ We aim to bound $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2)$. • Note that $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2) = \mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} \leq \hat{\mu}_{2,T}) = \mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}} \leq -\Delta_{2,\boldsymbol{\mu}})$, where $\Delta_{2,\boldsymbol{\mu}} = \mu_1 - \mu_2$.
  35. 39 Proof of the upper bound ◼ Can we use

    the central limit theorem? • E.g., $\sqrt{T}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}}) \xrightarrow{d} \mathcal{N}(0, (\sigma_1 + \sigma_2)^2)$ (Hahn, Hirano, and Karlan, 2011), if the propensity score is given as $w_1 = \frac{\sigma_1}{\sigma_1 + \sigma_2}$. • The central limit theorem guarantees $\lim_{T \to \infty} \sup_{z} \left| \mathbb{P}_{\boldsymbol{\mu}}\left(\frac{\sqrt{T}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}})}{\sigma_1 + \sigma_2} \leq z\right) - \Phi(z) \right| = 0$, where $\Phi$ is the cumulative distribution function of the standard normal distribution. • However, we need to evaluate the large-deviation probability $\mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}} \leq -\Delta_{2,\boldsymbol{\mu}})$ when $\sqrt{T}\Delta_{2,\boldsymbol{\mu}} \to \infty$. • The central limit guarantee is insufficient for this regime.
  36. 40 Proof of the upper bound ◼ Note that

    $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2) = \mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} \leq \hat{\mu}_{2,T}) = \mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}} \leq -\Delta_{2,\boldsymbol{\mu}})$. ◼ From the Chernoff bound, for any $\lambda > 0$, we have $\mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}} \leq -\Delta_{2,\boldsymbol{\mu}}) \leq \exp(-\lambda \sqrt{T} \Delta_{2,\boldsymbol{\mu}})\, \mathbb{E}\left[\exp\left(-\lambda \sqrt{T}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}})\right)\right]$. • Here, $\mathbb{E}[\exp(\cdot)] = \exp(\log \mathbb{E}[\exp(\cdot)])$. ◼ From a Taylor expansion and the central limit theorem, if the third moment of $\sqrt{T}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}})$ is bounded, we have $\exp\left(\log \mathbb{E}\left[\exp\left(-\lambda \sqrt{T}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}})\right)\right]\right) \leq \exp\left(\lambda^2 (\sigma_1 + \sigma_2)^2/2 + o(1)\right)$.
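Under the Gaussian model with a fixed (non-adaptive) allocation, the moment-generating-function step is exact rather than asymptotic; a sketch of that idealized case (the adaptive case is where the $o(1)$ correction enters):

```latex
Z := \sqrt{T}\bigl(\hat\mu_{1,T} - \hat\mu_{2,T} - \Delta_{2,\boldsymbol{\mu}}\bigr)
  \sim \mathcal{N}(0, s^2), \quad
s^2 = \frac{\sigma_1^2}{w_1} + \frac{\sigma_2^2}{w_2}
  = (\sigma_1 + \sigma_2)^2 \ \text{under the Neyman allocation},
\qquad
\mathbb{E}\bigl[e^{-\lambda Z}\bigr] = e^{\lambda^2 s^2 / 2},
```

and minimizing $\exp(-\lambda\sqrt{T}\Delta_{2,\boldsymbol{\mu}} + \lambda^2 s^2/2)$ over $\lambda$ gives the choice $\lambda = \sqrt{T}\Delta_{2,\boldsymbol{\mu}}/s^2$ used on the next slide.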
  37. 41 Proof of the upper bound ◼ Finally, letting $\lambda = \sqrt{T}\Delta_{2,\boldsymbol{\mu}}/(\sigma_1 + \sigma_2)^2$,

    we have $\mathbb{P}_{\boldsymbol{\mu}}(\hat{a}_T = 2) \leq \mathbb{P}_{\boldsymbol{\mu}}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}} \leq -\Delta_{2,\boldsymbol{\mu}}) \leq \exp(-\lambda \sqrt{T} \Delta_{2,\boldsymbol{\mu}})\, \mathbb{E}\left[\exp\left(-\lambda \sqrt{T}(\hat{\mu}_{1,T} - \hat{\mu}_{2,T} - \Delta_{2,\boldsymbol{\mu}})\right)\right] \leq \exp(-\lambda \sqrt{T} \Delta_{2,\boldsymbol{\mu}}) \exp\left(\lambda^2 (\sigma_1 + \sigma_2)^2/2 + o(1)\right) \leq \exp\left(-\frac{T \Delta_{2,\boldsymbol{\mu}}^2}{2(\sigma_1 + \sigma_2)^2} + o(1)\right)$. ◼ Question: is this upper bound tight? • When $\sqrt{T}\Delta_{2,\boldsymbol{\mu}} \to 0$, the central-limit guarantee is tighter than the Chernoff bound.
  38. 43 Difficulty in best-arm identification ◼ Why do we consider minimax

    or Bayes optimality for treatment choice? → One might instead hope for optimality under each fixed distribution. • Kasy and Sautmann (2021) claim that they developed such a result. • However, their proof had several issues. → Through the investigation of these issues, various impossibility theorems have been shown. ◼ Impossibility theorems: there is no algorithm that is optimal for every distribution. • Kaufmann (2020). • Ariu, Kato, Komiyama, McAlinn, and Qin (2021). • Degenne (2023). • Wang, Ariu, and Proutiere (2024). • Imbens, Qin, and Wager (2025).
  39. 44 Preliminary: bandit lower bound ◼ Lower bounds in the

    bandit problem are derived via information theory. ◼ The following lemma is one of the most general and tight results for lower bounds. • Let $P$ and $Q$ be two distributions over $K$ arms such that for all $a$, the distributions $P_a$ and $Q_a$ of $Y_a$ are mutually absolutely continuous. • We have $\sum_{a \in [K]} \mathbb{E}_P\left[\sum_{t=1}^{T} 1[A_t = a]\right] \mathrm{KL}(P_a, Q_a) \geq \sup_{\mathcal{E} \in \mathcal{F}_T} d(\mathbb{P}_P(\mathcal{E}), \mathbb{P}_Q(\mathcal{E}))$. • $d(x, y) := x \log\frac{x}{y} + (1 - x) \log\frac{1 - x}{1 - y}$ is the binary relative entropy, with the convention $d(0, 0) = d(1, 1) = 0$. Transportation lemma (Lai and Robbins, 1985; Kaufmann et al., 2016).
  40. 45 Example: regret lower bound in regret minimization ◼ Lai

    and Robbins (1985) develop a lower bound for the regret-minimization problem. • Regret-minimization problem. • Goal. Maximize the cumulative outcome $\sum_{t=1}^{T} Y_{A_t, t}$. • The (in-sample) regret is defined as $\mathrm{Regret}_P := \mathbb{E}_P\left[\sum_{t=1}^{T} Y_{a^*_P, t} - \sum_{t=1}^{T} Y_{A_t, t}\right]$. • Under each distribution $P_0$, for large $T$, the lower bound is given as $\mathrm{Regret}_{P_0} \geq \log T \cdot \inf_{Q \in \mathrm{Alt}(P_0)} \sum_{a \in [K]} \mathbb{E}_{P_0}\left[\sum_{t=1}^{T} 1[A_t = a]\right] \mathrm{KL}(P_{0,a}, Q_a)$, where $\mathrm{Alt}(P_0) := \{Q \in \mathcal{P} : a^*_Q \neq a^*_{P_0}\}$. ◼ Kaufmann's lower bound is a generalization of Lai and Robbins' lower bound. • It is known that Kaufmann's lower bound can yield lower bounds for various bandit problems. • "Can we develop a lower bound for best-arm identification using Kaufmann's lemma?"
  41. 46 Lower bound in the fixed-budget setting ◼ A direct

    application of Kaufmann's lemma to best-arm identification yields the following bound. ◼ Lower bound (conjecture): • Under each distribution $P_0$ of the data-generating process, a lower bound is given as $\mathbb{P}_{P_0}(\hat{a}_T \neq a^*_{P_0}) \geq \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-\sum_{a \in [K]} \mathbb{E}_Q\left[\sum_{t=1}^{T} 1[A_t = a]\right] \mathrm{KL}(Q_a, P_{0,a})\right)$. • $\frac{1}{T} \mathbb{E}_Q\left[\sum_{t=1}^{T} 1[A_t = a]\right]$ corresponds to a treatment-allocation probability (ratio) under $Q$. • Denoting $\frac{1}{T} \mathbb{E}_Q\left[\sum_{t=1}^{T} 1[A_t = a]\right]$ by $w_a$, we can rewrite the above conjecture as $\mathbb{P}_{P_0}(\hat{a}_T \neq a^*_{P_0}) \geq \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$. • Note again that $w_a$ is the treatment-allocation probability (ratio) under $Q$.
  42. 47 Optimal design in the fixed-budget setting ◼ Question: Does

    there exist an algorithm whose probability of misidentification $\mathbb{P}_{P_0}(\hat{a}_T \neq a^*_{P_0})$ exactly matches the following lower bound? $\inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$. → Answer: No (without additional assumptions). • When the number of treatments is two and the outcomes follow Gaussian distributions with known variances, such an optimal algorithm exists. • The optimal algorithm is the Neyman allocation. • In more general cases, no such algorithm exists.
  43. 48 Neyman allocation is optimal in two-armed bandits ◼ Consider

    two treatments ($K = 2$) and assume that the variances are known. ◼ The lower bound can be computed as $\inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right) \geq \exp\left(-\frac{T (\mathbb{E}_{P_0}[Y_1] - \mathbb{E}_{P_0}[Y_2])^2}{2(\sigma_1 + \sigma_2)^2}\right)$. ◼ The asymptotically optimal algorithm is the Neyman allocation (Kaufmann et al., 2016): • Allocate treatments in proportion to the standard deviations. • At the end of the experiment, recommend the treatment with the highest sample mean as the best treatment. ◼ $(\sigma_1 + \sigma_2)^2$ is the asymptotic variance of the ATE estimator under the Neyman allocation.
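A minimal simulation sketch of the Neyman allocation with known variances (the function name and the deterministic realization of the σ-proportional split are my choices):

```python
import numpy as np

def neyman_experiment(mu, sigma, T, seed=None):
    """Allocate in proportion to the standard deviations, then recommend the empirical best arm."""
    rng = np.random.default_rng(seed)
    w1 = sigma[0] / (sigma[0] + sigma[1])        # Neyman allocation ratio for arm 1
    n1 = max(1, int(round(w1 * T)))
    n2 = max(1, T - n1)
    mean1 = rng.normal(mu[0], sigma[0], size=n1).mean()
    mean2 = rng.normal(mu[1], sigma[1], size=n2).mean()
    return 0 if mean1 >= mean2 else 1            # empirical best arm hat{a}_T

# Estimated misidentification probability when arm 0 is the true best.
errors = np.mean([neyman_experiment([0.2, 0.0], [1.0, 2.0], T=500, seed=s) != 0
                  for s in range(2000)])
```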
  44. 49 Optimal algorithm in the multi-armed bandits. Kasy and Sautmann

    (KS) (2021) ◼ They propose the Exploration Sampling (ES) algorithm. • A variant of Top-Two Thompson sampling (Russo, 2016). ◼ They show that under the ES algorithm, for each Bernoulli distribution $P_0$ fixed independently of $T$, it holds that $\mathrm{Regret}_{P_0} \leq \inf_{w} \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$. → They claim that their algorithm is optimal for Kaufmann's lower bound $\mathbb{P}_{P_0}(\hat{a}_T \neq a^*_{P_0}) \geq \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$.
  45. 50 Impossibility theorem in the multi-armed bandits ◼ Issues in

    KS (Ariu et al., 2025). • The KL divergence is flipped. • KS (incorrect): $\mathrm{Regret}_{P_0} \leq \inf_{w} \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$. • Ours (correct): $\mathrm{Regret}_{P_0} \leq \inf_{w} \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(P_{0,a}, Q_a)\right)$. • There exists a distribution under which no algorithm can attain the conjectured rate $\Gamma^*$. Impossibility theorem: there exists a distribution $P_0 \in \mathcal{P}$ under which $\mathrm{Regret}_{P_0} \geq \inf_{w} \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$.
  46. 51 Why? ◼ Why does the conjectured lower bound not

    work? ◼ There are several technical issues. 1. We cannot compute an ideal treatment-allocation probability from the conjectured lower bound. • Recall that the lower bound is given as $\mathbb{P}_{P_0}(\hat{a}_T \neq a^*_{P_0}) \geq V^* := \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$, where $w_a = \frac{1}{T} \mathbb{E}_Q\left[\sum_{t=1}^{T} 1[A_t = a]\right]$ depends on $Q$. → $\min_{Q} \min_{w} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right) \neq \inf_{w} \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$. → We cannot obtain an ideal allocation probability from this lower bound. 2. There exists $P_0$ such that $\mathbb{P}_{P_0}(\hat{a}_T \neq a^*_{P_0}) \geq \inf_{w} \inf_{Q \in \mathrm{Alt}(P_0)} \exp\left(-T \sum_{a \in [K]} w_a\, \mathrm{KL}(Q_a, P_{0,a})\right)$.
  47. 52 Related literature • Glynn and Juneja (2004) develops the

    large-deviation-optimal experiment for treatment choice. • Kasy and Sautmann (2021) show that Russo (2016)'s Top-Two Thompson sampling is optimal in the sense that the upper bound matches the bound in Glynn and Juneja (2004). • Kaufmann, Cappé, and Garivier (2016) develop a general lower bound for the bandit problem. ◼ Impossibility theorems. • Ariu, Kato, Komiyama, McAlinn, and Qin (2025) find a counterexample to Kasy and Sautmann (2021). • Degenne (2023) shows that the bound in Glynn and Juneja (2004) is a special case of Kaufmann's bound under strong assumptions. • Wang, Ariu, and Proutiere (2024) and Imbens, Qin, and Wager (2025) also derive counterexamples.
  48. 54 Concluding remarks ◼ Adaptive experimental design for treatment choice

    (BAI). • Goal. Choose the best treatment arm. • The lower bound and the ideal treatment-allocation probability depend on how uncertainty is evaluated. • Distribution-dependent analysis: • No globally optimal algorithm exists that matches Kaufmann's lower bound. • Impossibility theorems: there exists a distribution under which a valid lower bound is larger than Kaufmann's conjectured one. • Minimax and Bayes analysis: • The TS-EBA experiment is minimax and Bayes optimal.
  49. 55 Reference • Masahiro Kato, Takuya Ishihara, Junya Honda, and

    Yusuke Narita. Efficient adaptive experimental design for average treatment effect estimation, 2020. arXiv:2002.05308. • Masahiro Kato, Kenichiro McAlinn, and Shota Yasui. The adaptive doubly robust estimator and a paradox concerning logging policy. In International Conference on Neural Information Processing Systems (NeurIPS), 2021. • Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, and Chao Qin. A comment on “adaptive treatment assignment in experiments for policy choice”, 2021. • Masahiro Kato. Generalized Neyman allocation for locally minimax optimal best-arm identification, 2024a. arXiv:2405.19317. • Masahiro Kato. Locally optimal fixed-budget best arm identification in two-armed Gaussian bandits with unknown variances, 2024b. arXiv:2312.12741. • Masahiro Kato and Kaito Ariu. The role of contextual information in best arm identification, 2021. Accepted for Journal of Machine Learning Research conditional on minor revisions. • Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024a. • Masahiro Kato, Kyohei Okumura, Takuya Ishihara, and Toru Kitagawa. Adaptive experimental design for policy learning, 2024b. arXiv:2401.03756. • Junpei Komiyama, Kaito Ariu, Masahiro Kato, and Chao Qin. Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research, 2023.
  50. 56 Reference • van der Vaart, A. (1998), Asymptotic Statistics,

    Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press. • Tabord-Meehan, M. (2022), “Stratification Trees for Adaptive Randomization in Randomized Controlled Trials,” The Review of Economic Studies. • van der Laan, M. J. (2008), “The Construction and Analysis of Adaptive Group Sequential Designs,” https://biostats.bepress.com/ucbbiostat/paper232. • Neyman, J. (1923), “Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes,” Statistical Science, 5, 463–472. • Neyman, J. (1934), “On the Two Different Aspects of the Representative Method: the Method of Stratified Sampling and the Method of Purposive Selection,” Journal of the Royal Statistical Society, 97, 123–150. • Manski, C. F. (2002), “Treatment choice under ambiguity induced by inferential problems,” Journal of Statistical Planning and Inference, 105, 67–82. • Manski (2004), “Statistical Treatment Rules for Heterogeneous Populations,” Econometrica, 72, 1221–1246.
  51. 57 Reference • Kitagawa, T. and Tetenov, A. (2018), “Who

    Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice,” Econometrica, 86, 591–616. • Garivier, A. and Kaufmann, E. (2016), “Optimal Best Arm Identification with Fixed Confidence,” in Conference on Learning Theory. • Glynn, P. and Juneja, S. (2004), “A large deviations perspective on ordinal optimization,” in Proceedings of the 2004 Winter Simulation Conference, IEEE, vol. 1. • Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal. • Degenne, R. (2023), “On the Existence of a Complexity in Fixed Budget Bandit Identification,” Conference on Learning Theory (COLT). • Kasy, M. and Sautmann, A. (2021), “Adaptive Treatment Assignment in Experiments for Policy Choice,” Econometrica, 89, 113– 132. • Rubin, D. B. (1974), “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology.