
Arshak Minasyan


(CREST-ENSAE, IP Paris)

Title — Recent advances in robust mean estimation in high dimensions

Abstract — Arguably the first rigorously studied question in robust statistics is the estimation of the mean (or location parameter) of a contaminated Gaussian distribution, a problem pioneered by P. Huber in 1964. Its natural extension is the high-dimensional setting, which has received renewed attention in recent years from both the statistical and the computational viewpoint. In this talk I review recent advances in the statistical performance of mean estimators under the adversarial contamination model. First, from the practical point of view, one would require a robust estimator to run in polynomial time; I will discuss the challenges and the crucial properties of such estimators. The typical price paid by computationally tractable estimators is an additional logarithmic factor in the dependence on the contamination level. Second, from the statistical point of view, despite the significant recent interest in robust statistics, achieving a dimension-free bound with optimal dependence on the contamination level remained open even in the canonical Gaussian case. In [3] we constructed an estimator of the mean vector whose error bound is dimension-free and has optimal dependence on the contamination level. Previously known results were either dimension-dependent, required the covariance matrix to be close to the identity, or had a sub-optimal dependence on the contamination level.

References
The talk will be built on the following papers:

1. A. Dalalyan and A. Minasyan. All-in-one robust estimator of the Gaussian mean. The Annals of Statistics, 50:1193–1219, 2022.
2. A.-H. Bateni, A. Minasyan and A. Dalalyan. Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction. arXiv preprint arXiv:2204.02323, 2022.
3. A. Minasyan and N. Zhivotovskiy. Statistically optimal robust mean and covariance estimation for anisotropic Gaussians. arXiv preprint arXiv:2301.09024, 2023.

S³ Seminar

March 08, 2023



Transcript

  1. Recent advances in robust mean estimation in high dimensions

    Arshak Minasyan (CREST-ENSAE, IP Paris)
    S³ – The Paris-Saclay Signal Seminar, March 8, 2023
    Based on joint works with Arnak Dalalyan (CREST-ENSAE, IP Paris), Amir-Hossein Bateni (Université Grenoble Alpes) and Nikita Zhivotovskiy (University of California, Berkeley).
  2. classical mean estimation setting

    We observe n vectors X₁, . . . , Xₙ ∈ R^d drawn i.i.d. from N_d(µ∗, Σ) with unknown values of µ∗ and Σ. The celebrated Borell–Tsirelson-Ibragimov-Sudakov Gaussian concentration inequality states that, w.p. ≥ 1 − δ,

        ∥(1/n) ∑_{i=1}^n X_i − µ∗∥₂ ≤ √(Tr(Σ)/n) + √(2∥Σ∥ log(1/δ)/n).

    Question: What happens if some “small” fraction (denoted by ε) of the data is contaminated?
    ▶ Statistical guarantees
    ▶ Computational aspects
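As a quick numerical sanity check of the outlier-free bound above (a sketch in Python with NumPy, not part of the talk; the isotropic choice Σ = I_d is an illustrative assumption), one can compare the error of the sample mean with the concentration bound:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, delta = 50, 1000, 0.01
Sigma = np.eye(d)                    # Tr(Sigma) = d, operator norm ||Sigma|| = 1

X = rng.standard_normal((n, d))      # X_i ~ N_d(0, I_d), so mu* = 0
err = np.linalg.norm(X.mean(axis=0))

# sqrt(Tr(Sigma)/n) + sqrt(2*||Sigma||*log(1/delta)/n), valid w.p. >= 1 - delta
bound = np.sqrt(np.trace(Sigma) / n) + np.sqrt(2 * np.log(1 / delta) / n)
```

With these values the observed error sits below the bound, as the inequality predicts with probability at least 1 − δ.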
  5. contamination models

    M_OF = { P_µ^⊗n : µ ∈ M },
    M_HC(ε) = { ((1 − ε)P_µ + εQ)^⊗n : µ ∈ M, Q ∈ P },
    M_PC(o) = { P_{µ₁} ⊗ · · · ⊗ P_{µₙ} : µ_i ∈ M, ∃ O ⊂ [n] s.t. |O| ≤ o, µ_i = µ_j ∀ i, j ∈ O^c and µ_i ≠ µ_j ∀ i ∈ O and some j ∈ O^c },
    M_AC(o) = { σ(P_µ^⊗(n−o) ⊗ P₁ ⊗ · · · ⊗ P_o) : µ ∈ M, σ ∈ S_n the permutation group, and P₁, . . . , P_o arbitrary }.
  9. relation between contamination models

    M_OF ⊂ M_HC ⊂ M_PC ⊂ M_AC
  10. (sub-)Gaussian adversarial contamination

    Definition (Gaussian Adversarial Contamination). Let Y₁, . . . , Yₙ i.i.d. ∼ N_d(µ∗, Σ); the contaminated sample X₁, . . . , Xₙ is distributed according to GAC(µ∗, Σ, ε) with ε ∈ (0, 1/2) when |{i : X_i ≠ Y_i}| ≤ εn.

    Definition (Sub-Gaussian Adversarial Contamination). Let ξ₁, . . . , ξₙ ind. ∼ SG_d(τ); the contaminated sample X₁, . . . , Xₙ is distributed according to SGAC(µ∗, Σ, ε, τ) when |{i = 1, . . . , n : X_i ≠ µ∗ + Σ^{1/2} ξ_i}| ≤ εn.

    Outliers: O = {i : X_i ≠ Y_i}. Inliers: I = {1, . . . , n} \ O.
  13. basic estimators

    ▶ Sample mean: µ̄ₙ = (1/n) ∑_{i=1}^n X_i.
      Not robust! One point can make the risk arbitrarily large.
    ▶ Coordinate-wise / geometric median:
      µ̂ₙ = med{X₁, . . . , Xₙ},  µ̂^GM ∈ arg min_{µ∈R^d} ∑_{i=1}^n ∥X_i − µ∥₂.
      Robust! But optimal only in low dimensions.
    ▶ Tukey’s median: µ̂ₙ^TM = arg max_{µ∈R^d} Dₙ(µ), where Dₙ(µ) = inf_{u∈S^{d−1}} ∑_{i=1}^n 1(u⊤X_i ≤ u⊤µ).
      Robust and optimal, but only when the covariance is proportional to the identity, and it takes exponential time to compute!
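The contrast between the sample mean and the medians can be seen on a small simulation (a Python/NumPy sketch, not from the talk; the Weiszfeld iteration is a standard way to compute the geometric median):

```python
import numpy as np

def geometric_median(X, iters=100, tol=1e-8):
    """Weiszfeld iterations for argmin_mu sum_i ||X_i - mu||_2."""
    mu = X.mean(axis=0)
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(X - mu, axis=1), tol)  # avoid /0
        w = 1.0 / dist
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

rng = np.random.default_rng(1)
d, n = 5, 200
X = rng.standard_normal((n, d))          # inliers, true mean mu* = 0
X[:10] += 100.0                          # 5% gross outliers

err_mean = np.linalg.norm(X.mean(axis=0))
err_cmed = np.linalg.norm(np.median(X, axis=0))  # coordinate-wise median
err_gmed = np.linalg.norm(geometric_median(X))
```

The outliers drag the sample mean far from the origin, while both medians stay close to the true mean.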
  20. median of means

    A simple estimator is the median of means, going back to Nemirovsky and Yudin (1983), Jerrum, Valiant and Vazirani (1986), and Alon, Matias and Szegedy (2002):

        µ̂ₙ,δ^MOM ≜ med( (1/m) ∑_{j=1}^m X_j, . . . , (1/m) ∑_{j=(k−1)m+1}^{km} X_j ).

    Let δ ∈ (0, 1), k = 8 log(1/δ) and m = n / (8 log(1/δ)). Then,

        |µ̂ₙ,δ^MOM − µ∗| ≤ σ √(32 log(1/δ)/n).

    Further developments of the MOM approach, including heavy-tailed distributions in high dimensions: Depersin and Lecué (2021), Lugosi and Mendelson (2019), Minsker (2015), Hsu and Sabato (2013), etc.
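The block construction above is a few lines of code (a Python/NumPy sketch, not from the talk), here applied to a heavy-tailed one-dimensional sample:

```python
import numpy as np

def median_of_means(x, delta):
    """1-D median-of-means with k = ceil(8*log(1/delta)) blocks of size m = n // k."""
    n = len(x)
    k = max(1, int(np.ceil(8 * np.log(1 / delta))))
    m = n // k
    block_means = x[: k * m].reshape(k, m).mean(axis=1)  # one mean per block
    return np.median(block_means)

rng = np.random.default_rng(2)
x = rng.standard_t(df=2.5, size=10_000)  # heavy-tailed, true mean 0
est = median_of_means(x, delta=0.01)
```

Taking the median of the block means suppresses the influence of the rare huge observations a heavy-tailed sample produces.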
  21. trimmed mean

    Arguably the oldest and most natural idea in robust statistics is the trimmed mean: Tukey and McLaughlin (1963), Huber and Ronchetti (1984), Bickel (1965), Stigler (1973). Divide the contaminated sample into two halves X₁, . . . , Xₙ and Y₁, . . . , Yₙ. Let α = Y_(εn) and β = Y_((1−ε)n) be the corresponding empirical quantiles of the second half, and define

        µ̂₂ₙ = (1/n) ∑_{i=1}^n ϕ_{α,β}(X_i),  where  ϕ_{α,β}(x) = β if x > β;  x if x ∈ [α, β];  α if x < α.

    Then, w.p. ≥ 1 − δ (given nε ∼ log(1/δ)),

        |µ̂₂ₙ − µ∗| ≤ 9σ √(log(8/δ)/n).

    Lugosi and Mendelson (2021) showed the optimality of the trimmed mean in high dimensions for distributions with a finite second moment.
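The two-halves construction can be sketched as follows (Python/NumPy, not from the talk; index rounding at the quantiles is an implementation choice):

```python
import numpy as np

def trimmed_mean(x, y, eps):
    """Clip the first half at the eps and (1-eps) empirical quantiles of the second half."""
    y = np.sort(y)
    n = len(y)
    lo = y[int(np.floor(eps * n))]                     # alpha = Y_(eps*n)
    hi = y[min(int(np.ceil((1 - eps) * n)), n - 1)]    # beta  = Y_((1-eps)*n)
    return np.clip(x, lo, hi).mean()                   # mean of phi_{alpha,beta}(X_i)

rng = np.random.default_rng(3)
n, eps = 1000, 0.05
x = rng.standard_normal(n)
y = rng.standard_normal(n)
x[:30] = 1e6                         # contaminate 3% (< eps) of the first half
est = trimmed_mean(x, y, eps)
```

The huge outliers are clipped to the (1 − ε)-quantile of the clean-ish second half, so their effect on the average is bounded.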
  23. lower bound for the Gaussian mean

    Combining the lower bound from Chen et al. (2018) with the lower bound in the outlier-free regime, we have

        inf_{µ̂ₙ} ∥µ̂ₙ − µ∗∥₂ ≥ c √∥Σ∥ ( √(r_Σ/n) + ε )

    with positive probability, for some absolute constant c > 0, where r_Σ ≜ Tr(Σ)/∥Σ∥ is usually called the effective rank. Chen et al. (2018) also proved that Tukey’s median of Gaussians satisfies, w.p. ≥ 1 − δ when Σ = σ²I_d,

        ∥µ̂ₙ^TM − µ∗∥₂ ≤ Cσ ( √((d + log(1/δ))/n) + ε ).
  24. lower bound for the sub-Gaussian mean

    Lugosi and Mendelson (2021) showed that in the family of sub-Gaussian distributions it is indeed impossible to achieve a rate better than √∥Σ∥ ( √(r_Σ/n) + ε √log(1/ε) ), i.e.,

        inf_{µ̂ₙ} ∥µ̂ₙ − µ∗∥₂ ≥ c √∥Σ∥ ( √(r_Σ/n) + ε √log(1/ε) )

    holds with positive probability.
  25. optimal robust Gaussian mean estimation

    Theorem (M., Zhivotovskiy 2023+). Assume X₁, . . . , Xₙ ∼ GAC(µ∗, Σ, ε) and let ε < c₁. Then there is an estimator µ̂ₙ satisfying, with probability at least 1 − δ,

        ∥µ̂ₙ − µ∗∥₂ ≤ c₂ √∥Σ∥ ( √(r_Σ/n) + √(log(1/δ)/n) + ε ),

    where c₁, c₂ > 0 are some absolute constants. The estimator is a smoothed median:

        µ̂ₙ = arg min_{ν∈R^d} sup_{v∈S^{d−1}} | E_{ρ_v} Med(⟨X₁, θ⟩, . . . , ⟨Xₙ, θ⟩) − ⟨ν, v⟩ |,

    where θ ∼ ρ_v = N_d(v, β⁻¹I_d) and the parameter β satisfies r_Σ/10 ≤ β ≤ 10 r_Σ.
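The sup over all directions makes the full estimator expensive, but the inner smoothed-median functional E_{ρ_v} Med(⟨X₁, θ⟩, . . . , ⟨Xₙ, θ⟩) for one fixed direction v is easy to approximate by Monte Carlo (a Python/NumPy sketch, not from the talk; the choices of d, n, µ∗ and the number of θ draws are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 20, 2000
mu_star = np.linspace(1.0, 2.0, d)
X = mu_star + rng.standard_normal((n, d))    # Sigma = I_d, hence r_Sigma = d

beta = float(d)                              # any beta with r_Sigma/10 <= beta <= 10 r_Sigma
v = np.zeros(d); v[0] = 1.0                  # one fixed unit direction

# Monte-Carlo approximation of E_theta Med(<X_1,theta>,...,<X_n,theta>),
# theta ~ N_d(v, I_d / beta): one directional median per sampled theta, then average
thetas = v + rng.standard_normal((500, d)) / np.sqrt(beta)
smoothed_med = np.median(X @ thetas.T, axis=0).mean()
```

On clean data the smoothed directional median concentrates around ⟨µ∗, v⟩, which is exactly the quantity the outer minimization matches against ⟨ν, v⟩.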
  27. tuning the parameter β

    The parameter β is chosen by the statistician so that r_Σ/10 ≤ β ≤ 10 r_Σ. To estimate β we need to estimate both ∥Σ∥ and Tr(Σ). Abdalla and Zhivotovskiy (2022) provided an estimator ω such that ∥Σ∥/4 ≤ ω ≤ 4∥Σ∥. Estimating Tr(Σ) reduces to mean estimation in R, and using the estimator of Lugosi and Mendelson (2019) we obtain τ such that Tr(Σ)/2 ≤ τ ≤ 2 Tr(Σ). Hence the ratio satisfies r_Σ/8 ≤ τ/ω ≤ 8 r_Σ. An alternative approach to estimating β would be Lepskii’s method.
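The robust constant-factor estimators cited above are beyond a short snippet, but the pipeline itself (estimate Tr(Σ) and ∥Σ∥, take the ratio) is easy to illustrate with non-robust plug-ins on clean data (a Python/NumPy sketch, not from the talk; the diagonal spectrum is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 30, 5000
evals = np.linspace(0.5, 3.0, d)             # spectrum of a diagonal Sigma
X = rng.standard_normal((n, d)) * np.sqrt(evals)

S = np.cov(X, rowvar=False)                  # sample covariance (non-robust plug-in)
tau = np.trace(S)                            # estimates Tr(Sigma)
omega = np.linalg.eigvalsh(S)[-1]            # estimates ||Sigma|| (eigvalsh is ascending)
beta = tau / omega                           # constant-factor proxy for r_Sigma

r_sigma = evals.sum() / evals.max()          # true effective rank Tr(Sigma)/||Sigma||
```

A constant-factor estimate of r_Σ is all the smoothed median needs, since β is only required up to a factor of 10.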
  29. going beyond the Gaussian assumption

    The Gaussian assumption does not play a crucial role. Our results extend to distributions that have:
    ▶ symmetry around the mean and a spherical-symmetry property (the law of ⟨X − µ, v⟩ / √(v⊤Σv) does not depend on v ∈ S^{d−1});
    ▶ |F⁻¹(1/2 ± ε) − F⁻¹(1/2)| ≤ cε for small enough ε and some c > 0;
    ▶ a density f separated from zero by an absolute constant on [F⁻¹(1/2 − ε), F⁻¹(1/2 + ε)];
    ▶ sub-Gaussian tails.
  30. computational aspects of the smoothed median

    Recall µ̂ₙ = arg min_{ν∈R^d} sup_{v∈S^{d−1}} | E_{ρ_v} Med(⟨X₁, θ⟩, . . . , ⟨Xₙ, θ⟩) − ⟨ν, v⟩ |.
    ▶ It takes time exponential in d to compute the smoothed median of the contaminated data X₁, . . . , Xₙ.
    ▶ For mean estimation it is a challenging open problem to find a polynomial-time algorithm with a linear dependence on ε, even when Σ = I_d.
  32. iteratively reweighted mean estimator

    We consider the weighted sample mean X̄_w = ∑_{i=1}^n w_i X_i.
    Goal: find w that mimics the oracle weights w∗_j ∝ 1(j ∈ O^c) for all j ∈ {1, . . . , n}.
    Notice that X̄_w − µ∗ = ψ_w, where

        ψ_w = ∑_{i∈I} (w_i/∥w_I∥₁) ζ_i + (1/(1 − ε_w)) ∑_{i∈I^c} w_i (X_i − X̄_w),

    with ε_w = ∑_{i∈I^c} w_i and ζ₁, . . . , ζₙ i.i.d. ∼ N_d(0, Σ). Hence

        ∥X̄_w − µ∗∥₂ ≤ sup_{v∈B₂} v⊤ψ_w.
  35. iteratively reweighted mean estimator

    For any pair of vectors w ∈ Δ^{n−1} and µ ∈ R^d we proved

        ∥X̄_w − µ∗∥₂ ≤ √ε · G(w, µ)^{1/2} + R(ζ, I),  where  G(w, µ) = λ_max( ∑_{i=1}^n w_i (X_i − µ)^{⊗2} − Σ ).

    ▶ G(w, µ) is indeed bi-convex in (w, µ).
    ▶ For a fixed value of µ, minimizing G(w, µ) over w becomes an SDP.
    ▶ For a fixed value of w ∈ Δ^{n−1}, X̄_w = arg min_{µ∈R^d} G(w, µ).
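The objective G(w, µ) is cheap to evaluate, and it cleanly separates clean from contaminated data (a Python/NumPy sketch, not from the talk):

```python
import numpy as np

def G(w, mu, X, Sigma):
    """lambda_max( sum_i w_i (X_i - mu)(X_i - mu)^T - Sigma )."""
    R = X - mu
    C = (w[:, None] * R).T @ R - Sigma       # weighted second moment minus Sigma
    return np.linalg.eigvalsh(C)[-1]         # eigvalsh returns ascending eigenvalues

rng = np.random.default_rng(6)
d, n = 10, 500
Sigma = np.eye(d)
X = rng.standard_normal((n, d))
w = np.full(n, 1.0 / n)                      # uniform weights

g_clean = G(w, X.mean(axis=0), X, Sigma)     # small: weighted covariance ~ Sigma
X_out = X.copy(); X_out[0] += 50.0           # plant a single gross outlier
g_out = G(w, X_out.mean(axis=0), X_out, Sigma)
```

A single gross outlier inflates the top eigenvalue of the weighted second-moment matrix, which is exactly what the weight minimization exploits to push weight away from outliers.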
  39. algorithm for known ε and Σ

    Alg. 1: Iteratively reweighted mean estimator
    Input: data X₁, . . . , Xₙ ∈ R^d, contamination rate ε and covariance Σ
    Output: parameter estimate µ̂ₙ^IR
    Initialize: compute µ̂⁰ as a minimizer of ∑_{i=1}^n ∥X_i − µ∥₂
    Set K = 0 ∨ (log(4r_Σ) − 2 log(ε(1 − 2ε))) / (2 log(1 − 2ε) − log ε − log(1 − ε)).
    For k = 1 : K
        Compute current weights:
            ŵ ∈ arg min_{(n−nε)∥w∥_∞ ≤ 1} λ_max( ∑_{i=1}^n w_i (X_i − µ̂^{k−1})^{⊗2} − Σ )
        Update the estimator: µ̂^k = X̄_ŵ
    EndFor
    Return µ̂ₙ^IR := µ̂^K.
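The weight step of Alg. 1 is an SDP; the following simplified heuristic (a Python/NumPy sketch, not the algorithm of the paper) replaces it with a cheap filter that zeroes out the εn points with the largest projections on the worst direction of the weighted second-moment matrix, which conveys the same idea:

```python
import numpy as np

def iter_reweighted_mean(X, eps, Sigma, n_iter=20):
    """Heuristic sketch of iterative reweighting: downweight points that look
    most outlying along the top eigenvector of sum_i w_i (X_i-mu)^{x2} - Sigma."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    mu = np.median(X, axis=0)                # robust initializer
    for _ in range(n_iter):
        R = X - mu
        C = (w[:, None] * R).T @ R - Sigma
        vals, vecs = np.linalg.eigh(C)
        if vals[-1] <= 0:                    # weighted covariance already matches Sigma
            break
        score = (R @ vecs[:, -1]) ** 2       # outlyingness along the worst direction
        k = int(np.ceil(eps * n))
        w = np.full(n, 1.0 / n)
        w[np.argsort(score)[-k:]] = 0.0      # drop the eps*n most suspicious points
        w /= w.sum()                         # nonzero weights: 1/(n - k)
        mu = w @ X
    return mu

rng = np.random.default_rng(7)
d, n, eps = 10, 1000, 0.05
X = rng.standard_normal((n, d))              # inliers, mu* = 0
X[: int(eps * n)] += 20.0                    # 5% outliers shifted by 20 per coordinate

err_ir = np.linalg.norm(iter_reweighted_mean(X, eps, np.eye(d)))
err_mean = np.linalg.norm(X.mean(axis=0))
```

Note the surviving weights equal 1/(n − k), so the constraint (n − nε)∥w∥_∞ ≤ 1 from Alg. 1 is met with equality; the genuine algorithm instead solves the SDP over all such w.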
  40. in-expectation bound for Gaussians

    Theorem (M. & Dalalyan, 2022). Assume that X₁, . . . , Xₙ ∼ GAC(µ∗, Σ, ε). Let ε < (5 − √5)/10 and µ∗ ∈ R^d. Then the estimator µ̂ₙ^IR satisfies

        ∥µ̂ₙ^IR − µ∗∥_{L₂} ≤ ( 10 ∥Σ∥_op^{1/2} / (1 − 2ε − √(ε(1 − ε))) ) ( √(r_Σ/n) + ε √log(1/ε) ).

    ▶ Small and explicit constant in front of the rate, compared with other estimators whose constants are non-explicit or very large, e.g. around 10⁷.
    ▶ Effective rank of the covariance matrix instead of the dimension.
  43. in-probability bound for sub-Gaussians

    Theorem (M. & Dalalyan, 2022). Assume that X₁, . . . , Xₙ ∼ SGAC(µ∗, Σ, ε, τ). Let ε < (5 − √5)/10 and µ∗ ∈ R^d. Then the estimator µ̂ₙ^IR satisfies, with probability at least 1 − δ,

        ∥µ̂ₙ^IR − µ∗∥₂ ≤ ( A(τ) ∥Σ∥_op^{1/2} / (1 − 2ε − √(ε(1 − ε))) ) ( √((d + log(4/δ))/n) + ε √log(1/ε) ).

    ▶ The constant A(τ) is not explicit and depends only on the variance proxy τ.
    ▶ The dependence on ε is optimal for sub-Gaussians.
  46. unknown Σ

    Theorem (M. & Dalalyan, 2022). Assume that X₁, . . . , Xₙ ∼ SGAC(µ∗, Σ, ε, τ) with unknown Σ. Let ε < (5 − √5)/10 and µ∗ ∈ R^d. Then the estimator µ̂ₙ^IR satisfies, with probability at least 1 − δ,

        ∥µ̂ₙ^IR − µ∗∥₂ ≤ ( A(τ) ∥Σ∥_op^{1/2} / (1 − 2ε − √(ε(1 − ε))) ) ( √((d + log(1/δ))/n) + √ε ).

    ▶ For unknown isotropic Σ = σ²I_d, the sub-Gaussian rates hold.
    ▶ Among computationally tractable estimators, the rate √ε for general unknown Σ is the best known in the literature.
  49. in-probability bound for the SDR estimator

    Theorem (Bateni, M., Dalalyan, 2022). Let X₁, . . . , Xₙ ∼ GAC(µ∗, Σ, ε) with ε ≤ ε∗ ∈ (0, 1/2) and δ ∈ (0, 1/2). Then, for an appropriately chosen threshold t and some absolute constant C, the SDR estimator satisfies, with probability at least 1 − δ,

        ∥µ̂^SDR − µ∗∥₂ ≤ ( C √log p / (1 − 2ε∗) ) ( √(r_Σ/n) + ε √log(2/ε) + √(log(1/δ)/n) ).

    ▶ The SDR algorithm is much faster than the one based on iterative reweighting.
    ▶ It does not require knowledge of ε.
    ▶ Its breakdown point equals 0.5.
    ▶ It carries an additional √log p factor.
  54. unknown Σ

    Theorem (Bateni, M., Dalalyan, 2022). Let X₁, . . . , Xₙ ∼ GAC(µ∗, Σ, ε) with unknown Σ, ε ≤ ε∗ ∈ (0, 1/2) and δ ∈ (0, 1/2). Assume a matrix Σ̄ is available satisfying ∥Σ^{−1/2} Σ̄ Σ^{−1/2} − I_d∥ ≤ γ for some γ ∈ (0, 1/2]. Then, for an appropriately chosen threshold t and some absolute constant C, the SDR estimator satisfies, with probability at least 1 − δ,

        ∥µ̂^SDR − µ∗∥₂ / ∥Σ∥^{1/2} ≤ ( C √log p / (1 − 2ε∗) ) ( √(r_Σ/n) + ε √log(2/ε) + √(εγ) + √(log(1/δ)/n) ).

    ▶ If γ is at most of order ε log(1/ε), we obtain the same rate as if Σ were known.
    ▶ If γ is of constant order, we recover the best known rate for computationally tractable estimators, i.e., √(r_Σ/n) + √ε.