Arshak Minasyan

(CREST-ENSAE, IP Paris)

Title — Recent advances in robust mean estimation in high dimensions

Abstract — Arguably the first rigorously studied question in robust statistics is the estimation of the mean (or location parameter) of a contaminated Gaussian distribution, dating back to 1964 and pioneered by P. Huber. The natural extension of this question is to consider it in high dimensions. The problem has received renewed attention in the last few years, from both statistical and computational perspectives. In this talk, I review some recent advances in the statistical performance of mean estimators under the adversarial contamination model. First, from the practical point of view, one would require a robust estimator to run in polynomial time. I will discuss the challenges and crucial properties of such estimators. Computationally tractable estimators typically pay an additional logarithmic factor in their dependence on the contamination level. Second, from the statistical point of view, despite the recent surge of interest in robust statistics, achieving a dimension-free bound with optimal dependence on the contamination level in the canonical Gaussian case remained open. In [3], we constructed an estimator of the mean vector that is dimension-free and has optimal dependence on the contamination level. Previously known results were either dimension-dependent and required the covariance matrix to be close to the identity, or had a sub-optimal dependence on the contamination level.

References
The talk is based on the following papers:

1. A. Dalalyan and A. Minasyan. All-in-one robust estimator of the Gaussian mean. The Annals of Statistics, 50:1193–1219, 2022.
2. A.-H. Bateni, A. Minasyan, and A. Dalalyan. Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction. arXiv preprint arXiv:2204.02323, 2022.
3. A. Minasyan and N. Zhivotovskiy. Statistically optimal robust mean and covariance estimation for anisotropic Gaussians. arXiv preprint arXiv:2301.09024, 2023.

S³ Seminar

March 8, 2023

Transcript

  1. Recent advances in robust mean estimation in
    high dimensions
    Arshak Minasyan
    CREST-ENSAE, IP Paris
    S3 – The Paris-Saclay Signal Seminar
    March 8, 2023
    Based on joint works with Arnak Dalalyan (CREST-ENSAE, IP Paris),
    Amir-Hossein Bateni (Université Grenoble Alpes), Nikita Zhivotovskiy
    (University of California, Berkeley).

  2. classical mean estimation setting
    We observe n vectors from ℝᵈ such that
    X1, . . . , Xn i.i.d. ∼ N_d(µ∗, Σ)
    with unknown values of µ∗ and Σ.
    The celebrated Borell, Tsirelson-Ibragimov-Sudakov Gaussian
    concentration inequality states that w.p. ≥ 1 − δ it holds that
    ∥(1/n) ∑_{i=1}^n X_i − µ∗∥₂ ≤ √(Tr(Σ)/n) + √(2∥Σ∥ log(1/δ)/n).
    Question: What happens if some “small” fraction (denoted by ε)
    of the data is contaminated?
    ▶ Statistical guarantees
    ▶ Computational aspects
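
    As a quick illustration (not on the slides), the concentration bound above is easy to check numerically; the sample size, dimension, and diagonal covariance below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, delta = 500, 50, 0.01
Sigma = np.diag(np.linspace(0.1, 2.0, d))   # arbitrary diagonal covariance
mu_star = np.zeros(d)

# Error of the sample mean over many independent repetitions.
errs = [np.linalg.norm(rng.multivariate_normal(mu_star, Sigma, size=n).mean(axis=0) - mu_star)
        for _ in range(200)]

# Borell-TIS bound: sqrt(Tr(Sigma)/n) + sqrt(2 * ||Sigma|| * log(1/delta) / n).
bound = np.sqrt(np.trace(Sigma) / n) \
        + np.sqrt(2 * np.linalg.norm(Sigma, 2) * np.log(1 / delta) / n)
print(f"empirical (1 - delta)-quantile: {np.quantile(errs, 1 - delta):.4f}  vs  bound: {bound:.4f}")
```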

  3. contamination models
    Outlier-free model:
    M_OF = { P_µ^{⊗n} : µ ∈ M }.
    Huber’s contamination model:
    M_HC(ε) = { ((1 − ε)P_µ + εQ)^{⊗n} : µ ∈ M, Q ∈ P }.
    Parameter contamination model:
    M_PC(o) = { P_µ₁ ⊗ · · · ⊗ P_µₙ : µ_i ∈ M, ∃O ⊂ [n] s.t. |O| ≤ o,
    µ_i = µ_j ∀i, j ∈ Oᶜ and µ_i ≠ µ_j ∀i ∈ O and some j ∈ Oᶜ }.
    Adversarial contamination model:
    M_AC(o) = { σ(P_µ^{⊗(n−o)} ⊗ P₁ ⊗ · · · ⊗ P_o) : µ ∈ M, σ ∈ S_n the
    permutation group and P₁, . . . , P_o arbitrary }.

  4. relation between contamination models
    [Diagram: nested contamination models, from the least general,
    M_OF, through M_HC and M_PC, to the most general, M_AC.]

  5. (sub-)Gaussian adversarial contamination
    Definition (Gaussian Adversarial Contamination)
    Let Y1, . . . , Yn i.i.d. ∼ N_d(µ∗, Σ); then the contaminated sample
    X1, . . . , Xn is distributed according to GAC(µ∗, Σ, ε) with
    ε ∈ (0, 1/2) when
    |{i : X_i ≠ Y_i}| ≤ εn.
    Definition (Sub-Gaussian Adversarial Contamination)
    Let ξ1, . . . , ξn ind. ∼ SG_d(τ); then the contaminated sample
    X1, . . . , Xn is distributed according to SGAC(µ∗, Σ, ε, τ) when
    |{i ∈ {1, . . . , n} : X_i ≠ µ∗ + Σ^{1/2} ξ_i}| ≤ εn.
    Outliers: O = {i : X_i ≠ Y_i}; inliers: I = {1, . . . , n} \ O.
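
    A minimal sketch of drawing a sample from GAC(µ∗, Σ, ε): start from an i.i.d. Gaussian sample and let the adversary overwrite at most εn points. The particular outlier placement below is just one arbitrary adversary among many.

```python
import numpy as np

def sample_gac(n, mu_star, Sigma, eps, rng):
    """Draw X_1, ..., X_n ~ GAC(mu*, Sigma, eps)."""
    Y = rng.multivariate_normal(mu_star, Sigma, size=n)   # inliers Y_i
    X = Y.copy()
    k = int(eps * n)                                      # |{i : X_i != Y_i}| <= eps * n
    X[:k] = mu_star + 100.0                               # one arbitrary adversarial choice
    return X

rng = np.random.default_rng(0)
X = sample_gac(1000, np.zeros(5), np.eye(5), eps=0.1, rng=rng)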

  6. basic estimators
    ▶ Sample mean:
    µ̄_n = (1/n) ∑_{i=1}^n X_i.
    Not robust! One point can make the risk arbitrarily large.
    ▶ Coordinate-wise / geometric median:
    µ̂_n = med{X1, . . . , Xn},
    µ̂_GM ∈ arg min_{µ∈ℝᵈ} ∑_{i=1}^n ∥X_i − µ∥₂.
    Robust! Optimal only in low dimensions.
    ▶ Tukey’s median:
    µ̂^TM_n = arg max_{µ∈ℝᵈ} D_n(µ),
    D_n(µ) = inf_{u∈S^{d−1}} ∑_{i=1}^n 1(u⊤X_i ≤ u⊤µ).
    Robust & optimal, but only when the covariance is the identity,
    and it takes exponential time to compute!
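
    The geometric median has no closed form; a standard way to approximate it is Weiszfeld’s fixed-point iteration, sketched below (the iteration budget and tolerance are arbitrary choices, not part of the slides).

```python
import numpy as np

def geometric_median(X, n_iter=100, tol=1e-8):
    """Weiszfeld iteration for argmin_mu sum_i ||X_i - mu||_2."""
    mu = X.mean(axis=0)                          # start from the sample mean
    for _ in range(n_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        dist = np.maximum(dist, tol)             # avoid division by zero at data points
        w = 1.0 / dist
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```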

  7. median of means
    A simple estimator is the median of means, which goes back to
    Nemirovsky and Yudin (1983), Jerrum, Valiant, and Vazirani
    (1986), and Alon, Matias, and Szegedy (2002):
    µ̂^MOM_{n,δ} ≜ med{ (1/m) ∑_{j=1}^m X_j, . . . , (1/m) ∑_{j=(k−1)m+1}^{km} X_j }.
    Let δ ∈ (0, 1), k = 8 log(1/δ) and m = n / (8 log(1/δ)). Then,
    w.p. ≥ 1 − δ,
    |µ̂^MOM_{n,δ} − µ∗| ≤ σ √(32 log(1/δ) / n).
    Further developments of the MOM approach, including
    heavy-tailed distributions in high dimensions: Depersin
    and Lecué (2021), Lugosi and Mendelson (2019), Minsker
    (2015), Hsu and Sabato (2013), etc.
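
    A direct transcription of this construction (assuming n is large enough that each of the k blocks is non-empty; discarding the last n − km points is one common convention):

```python
import numpy as np

def median_of_means(x, delta):
    """One-dimensional median-of-means with k = 8*log(1/delta) blocks."""
    n = len(x)
    k = max(1, int(8 * np.log(1 / delta)))    # number of blocks
    m = n // k                                # block size; assumes n >> k
    block_means = x[: k * m].reshape(k, m).mean(axis=1)
    return np.median(block_means)
```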

  8. trimmed mean
    Arguably the oldest and most natural idea in robust statistics is
    the trimmed mean: Tukey and McLaughlin (1963), Huber and
    Ronchetti (1984), Bickel (1965), Stigler (1973).
    Divide the contaminated sample into two halves,
    X1, . . . , Xn and Y1, . . . , Yn. Let α = Y_(εn) and β = Y_((1−ε)n) be
    empirical quantiles of the second half. Define
    µ̂_2n = (1/n) ∑_{i=1}^n φ_{α,β}(X_i),
    where
    φ_{α,β}(x) = β if x > β;  x if x ∈ [α, β];  α if x < α.
    Then, w.p. ≥ 1 − δ (given nε ∼ log(1/δ)),
    |µ̂_2n − µ∗| ≤ 9σ √(log(8/δ) / n).
    Lugosi and Mendelson (2021) showed the optimality of the
    trimmed mean in high dimensions for distributions with two
    finite moments.
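
    A sketch of this two-sample trimmed mean; the rounding of the quantile indices is a convention, not something fixed by the slide:

```python
import numpy as np

def trimmed_mean(x, y, eps):
    """Clip the first half at empirical eps- and (1-eps)-quantiles
    of the second half, then average."""
    n = len(y)
    y_sorted = np.sort(y)
    alpha = y_sorted[int(np.floor(eps * n))]                   # Y_(eps*n)
    beta = y_sorted[min(n - 1, int(np.ceil((1 - eps) * n)))]   # Y_((1-eps)*n)
    return np.clip(x, alpha, beta).mean()
```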

  9. lower bound for Gaussian mean
    Combining the lower bound from Chen et al. (2018) with the
    lower bound in the outlier-free regime, we have that
    inf_{µ̂_n} ∥µ̂_n − µ∗∥₂ ≥ c ∥Σ∥^{1/2} ( √(r_Σ / n) + ε )
    holds with positive probability for some absolute constant c > 0,
    where r_Σ = Tr(Σ) / ∥Σ∥ is usually called the effective rank.
    Chen et al. (2018) also proved that Tukey’s median of Gaussians
    satisfies
    ∥µ̂^TM_n − µ∗∥₂ ≤ Cσ ( √((d + log(1/δ)) / n) + ε )
    w.p. ≥ 1 − δ when Σ = σ²I_d.

  10. lower bound for sub-Gaussian mean
    Lugosi and Mendelson (2021) showed that in the family of
    sub-Gaussian distributions it is indeed impossible to achieve a
    rate better than ∥Σ∥^{1/2} ( √(r_Σ / n) + ε √log(1/ε) ), i.e.,
    inf_{µ̂_n} ∥µ̂_n − µ∗∥₂ ≥ c ∥Σ∥^{1/2} ( √(r_Σ / n) + ε √log(1/ε) )
    holds with positive probability.

  11. optimal robust Gaussian mean estimation
    Theorem (M., Zhivotovskiy 2023+)
    Assume X1, . . . , Xn ∼ GAC(µ∗, Σ, ε). Let ε < c₁; then there is
    an estimator µ̂_n satisfying, with probability at least 1 − δ,
    ∥µ̂_n − µ∗∥₂ ≤ c₂ ∥Σ∥^{1/2} ( √(r_Σ / n) + √(log(1/δ) / n) + ε ),
    where c₁, c₂ > 0 are some absolute constants.
    The estimator is a smoothed multivariate median:
    µ̂_n = arg min_{ν∈ℝᵈ} sup_{v∈S^{d−1}} |E_{ρ_v} Med(⟨X₁, θ⟩, . . . , ⟨Xₙ, θ⟩) − ⟨ν, v⟩|,
    where θ ∼ N_d(v, β⁻¹I_d) and the parameter β satisfies
    r_Σ / 10 ≤ β ≤ 10 r_Σ.
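
    For a fixed direction v, the inner quantity E_{ρ_v} Med(⟨X₁, θ⟩, . . . , ⟨Xₙ, θ⟩) is straightforward to approximate by Monte Carlo; it is the outer min–max over ν and v that is expensive (see the computational slide below). A sketch, with an arbitrary Monte Carlo budget:

```python
import numpy as np

def smoothed_median(X, v, beta, n_mc=10_000, seed=0):
    """Monte Carlo approximation of E_{theta ~ N(v, I/beta)} Med(<X_i, theta>)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    thetas = v + rng.standard_normal((n_mc, d)) / np.sqrt(beta)  # theta ~ N(v, I/beta)
    return np.median(X @ thetas.T, axis=0).mean()                # average of medians
```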

  12. tuning the parameter β
    The parameter β is chosen by the statistician in such a way that
    r_Σ / 10 ≤ β ≤ 10 r_Σ.
    To estimate β we need to estimate both ∥Σ∥ and Tr(Σ).
    Abdalla and Zhivotovskiy (2022) provided an estimator ω such
    that ∥Σ∥/4 ≤ ω ≤ 4∥Σ∥. The estimation of Tr(Σ) reduces to
    mean estimation in ℝ, and using the estimator from Lugosi and
    Mendelson (2019) we get an estimator τ such that
    Tr(Σ)/2 ≤ τ ≤ 2 Tr(Σ). Hence, this yields an estimator such that
    r_Σ / 8 ≤ τ/ω ≤ 8 r_Σ.
    An alternative approach for estimating β would be to use
    Lepskii’s method.

  13. going beyond the Gaussian assumption
    The Gaussian assumption does not play a crucial role. Our results
    extend to distributions that have
    ▶ Symmetry around the mean and a spherical symmetry
    property: the distribution of ⟨X − µ, v⟩ / √(v⊤Σv) does not
    depend on v ∈ S^{d−1}.
    ▶ For small enough ε and some c > 0,
    |F⁻¹(1/2 ± ε) − F⁻¹(1/2)| ≤ cε.
    ▶ A density f separated from zero by an absolute constant
    for all x ∈ [F⁻¹(1/2 − ε), F⁻¹(1/2 + ε)].
    ▶ Sub-Gaussian tails.

  14. computational aspects of smoothed median
    Recall
    µ̂_n = arg min_{ν∈ℝᵈ} sup_{v∈S^{d−1}} |E_{ρ_v} Med(⟨X₁, θ⟩, . . . , ⟨Xₙ, θ⟩) − ⟨ν, v⟩|.
    ▶ It takes exponential time in d to compute the smoothed
    median of contaminated data X1, . . . , Xn.
    ▶ For mean estimation, it is a challenging open problem to
    find a polynomial-time algorithm with a linear dependence
    on ε, even when Σ = I_d.

  15. iteratively reweighted mean estimator
    We consider the weighted sample mean
    X̄_w = ∑_{i=1}^n w_i X_i.
    Goal: find a w that mimics w∗_j ∝ 1(j ∈ Oᶜ) for all j ∈ {1, . . . , n}.
    Notice that
    X̄_w − µ∗ = ψ_w,
    where
    ψ_w = ∑_{i∈I} (w_i / ∥w_I∥₁) ζ_i + (1 / (1 − ε_w)) ∑_{i∈Iᶜ} w_i (X_i − X̄_w)
    with ε_w = ∑_{i∈Iᶜ} w_i and ζ₁, . . . , ζₙ i.i.d. ∼ N_d(0, Σ). Hence
    ∥X̄_w − µ∗∥₂ ≤ sup_{v∈B₂} v⊤ψ_w.

  16. iteratively reweighted mean estimator
    For any pair of vectors w ∈ ∆^{n−1} and µ ∈ ℝᵈ we proved
    ∥X̄_w − µ∗∥₂ ≤ √ε · G(w, µ)^{1/2} + R(ζ, I),
    where
    G(w, µ) = λ_max( ∑_{i=1}^n w_i (X_i − µ)⊗² − Σ ).
    ▶ G(w, µ) is indeed bi-convex in (w, µ).
    ▶ For a fixed value of µ, minimizing G(w, µ) over w is an SDP.
    ▶ For a fixed value of w ∈ ∆^{n−1},
    X̄_w = arg min_{µ∈ℝᵈ} G(w, µ).
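
    Evaluating G at a given pair (w, µ) is a single extreme-eigenvalue computation, e.g.:

```python
import numpy as np

def G(w, mu, X, Sigma):
    """lambda_max( sum_i w_i (X_i - mu)(X_i - mu)^T - Sigma )."""
    Xc = X - mu
    M = (w[:, None] * Xc).T @ Xc - Sigma
    return np.linalg.eigvalsh(M)[-1]   # eigvalsh sorts eigenvalues in ascending order
```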

  39. algorithm for known ε and Σ
    alg]algo:a1
    Alg. 1: Iteratively reweighted mean estimator
    Input: data X1
    , . . . , Xn
    ∈ Rd, contamination rate ε and Σ
    Output: parameter estimate µIR
    n
    Initialize: compute µ0 as a minimizer of n
    i=1
    ∥Xi
    − µ∥2
    Set K = 0 ∨ log(4rΣ)−2 log(ε(1−2ε))
    2 log(1−2ε)−log ε−log(1−ε)
    .
    For k = 1 : K
    Compute current weights:
    w ∈ arg min
    (n−nε)∥w∥∞≤1
    λmax
    n
    i=1
    wi
    (Xi
    − µk−1)⊗2 − Σ .
    Update the estimator: µk = Xw
    .
    EndFor
    Return µIR
    n
    := µK.
    arshak minasyan robust mean estimation in high dimensions 17

    View Slide
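
    Below is a compact sketch of Alg. 1, assuming cvxpy with an SDP-capable solver such as SCS. The coordinate-wise-median initializer and the fixed iteration budget stand in for the exact geometric-median initialization and the value K from the slide, and the simplex constraint w ∈ ∆^{n−1} from the previous slides is added explicitly.

```python
import numpy as np
import cvxpy as cp

def iteratively_reweighted_mean(X, eps, Sigma, n_iter=10):
    """Sketch of Alg. 1: alternate the weight SDP and the weighted-mean update."""
    n, d = X.shape
    mu = np.median(X, axis=0)               # stand-in for the geometric-median initializer
    w_max = 1.0 / (n - int(n * eps))        # (n - n*eps) * ||w||_inf <= 1
    for _ in range(n_iter):
        Xc = X - mu
        w = cp.Variable(n, nonneg=True)
        M = Xc.T @ cp.diag(w) @ Xc - Sigma  # affine in w, symmetric by construction
        prob = cp.Problem(cp.Minimize(cp.lambda_max(M)),
                          [cp.sum(w) == 1, w <= w_max])
        prob.solve(solver=cp.SCS)
        mu = X.T @ w.value                  # update: mu_k = weighted mean X_w
    return mu
```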

  18. in-expectation bound for Gaussians
    Theorem (M. & Dalalyan, 2022)
    Assume that X1, . . . , Xn ∼ GAC(µ∗, Σ, ε). Let ε < (5 − √5)/10
    and µ∗ ∈ ℝᵈ; then the estimator µ̂^IR_n satisfies
    ∥µ̂^IR_n − µ∗∥_{L₂} ≤ (10 ∥Σ∥^{1/2}_op / (1 − 2ε − √(ε(1 − ε)))) ( √(r_Σ / n) + ε √log(1/ε) ).
    ▶ Small and explicit constant in front of the rate, compared
    to other estimators’ constants, which are non-explicit or
    very large, e.g., around 10⁷.
    ▶ Effective rank of the covariance matrix instead of the
    dimension.

  19. in-probability bound for sub-Gaussians
    Theorem (M. & Dalalyan, 2022)
    Assume that X1, . . . , Xn ∼ SGAC(µ∗, Σ, ε, τ). Let
    ε < (5 − √5)/10 and µ∗ ∈ ℝᵈ; then the estimator µ̂^IR_n satisfies,
    with probability at least 1 − δ,
    ∥µ̂^IR_n − µ∗∥₂ ≤ (A(τ) ∥Σ∥^{1/2}_op / (1 − 2ε − √(ε(1 − ε)))) ( √((d + log(4/δ)) / n) + ε √log(1/ε) ).
    ▶ The constant A(τ) is not explicit and depends only on the
    variance proxy τ.
    ▶ The dependence on ε is optimal for sub-Gaussians.

  20. unknown Σ
    Theorem (M. & Dalalyan, 2022)
    Assume that X1, . . . , Xn ∼ SGAC(µ∗, Σ, ε, τ) with unknown Σ.
    Let ε < (5 − √5)/10 and µ∗ ∈ ℝᵈ; then, with probability at
    least 1 − δ, the estimator µ̂^IR_n satisfies
    ∥µ̂^IR_n − µ∗∥₂ ≤ (A(τ) ∥Σ∥^{1/2}_op / (1 − 2ε − √(ε(1 − ε)))) ( √((d + log(1/δ)) / n) + √ε ).
    ▶ For unknown isotropic Σ = σ²I_d, the sub-Gaussian rates
    still hold.
    ▶ Among computationally tractable estimators, the rate √ε
    for general unknown Σ is the best known in the literature.

  21. mean estimator based on spectral dimension reduction
    (SDR)

  22. in-probability bound for SDR estimator
    Theorem (Bateni, M., Dalalyan, 2022)
    Let X1, . . . , Xn ∼ GAC(µ∗, Σ, ε) with ε ≤ ε∗ ∈ (0, 1/2), and let
    δ ∈ (0, 1/2). Then, for an appropriately chosen threshold t and
    some absolute constant C, the SDR estimator satisfies, with
    probability at least 1 − δ,
    ∥µ̂^SDR − µ∗∥₂ ≤ (C ∥Σ∥^{1/2} √(log d) / (1 − 2ε∗)) ( √(r_Σ / n) + ε √log(2/ε) + √(log(1/δ) / n) ).
    ▶ The SDR algorithm is much faster than the one based on
    iterative reweighting.
    ▶ It does not require knowledge of ε.
    ▶ Its breakdown point is equal to 0.5.
    ▶ It carries an additional factor √(log d).

  23. unknown Σ
    Theorem (Bateni, M., Dalalyan, 2022)
    Let X1, . . . , Xn ∼ GAC(µ∗, Σ, ε) with unknown Σ,
    ε ≤ ε∗ ∈ (0, 1/2), and δ ∈ (0, 1/2). Assume that Σ is close to a
    known proxy matrix Σ̄, in the sense that
    ∥Σ̄^{−1/2} Σ Σ̄^{−1/2} − I_d∥ ≤ γ for some γ ∈ (0, 1/2]. Then, for
    an appropriately chosen threshold t and some absolute constant C,
    the SDR estimator satisfies, with probability at least 1 − δ,
    ∥µ̂^SDR − µ∗∥₂ / ∥Σ∥^{1/2} ≤ (C √(log d) / (1 − 2ε∗)) ( √(r_Σ / n) + ε √log(2/ε) + √(εγ) + √(log(1/δ) / n) ).
    ▶ If γ is at most of order ε log(1/ε), then we get the same
    rate as if we knew Σ.
    ▶ If γ is of constant order, then we recover the best-known rate
    for computationally tractable estimators, i.e., √(r_Σ / n) + √ε.

  24. comparison
    [Figure: empirical comparison of the estimators; source:
    Bateni, M., Dalalyan (2022).]

  25. references
    A. Dalalyan and A. Minasyan.
    All-in-one robust estimator of the Gaussian mean.
    The Annals of Statistics, 50:1193–1219, 2022.
    A.-H. Bateni, A. Minasyan, and A. Dalalyan.
    Nearly minimax robust estimator of the mean vector by
    iterative spectral dimension reduction.
    arXiv preprint arXiv:2204.02323, 2022.
    A. Minasyan and N. Zhivotovskiy.
    Statistically optimal robust mean and covariance estimation for
    anisotropic Gaussians.
    arXiv preprint arXiv:2301.09024, 2023.