Slide 1

Slide 1 text

Recent advances in robust mean estimation in high dimensions
Arshak Minasyan (CREST-ENSAE, IP Paris)
S3 – The Paris-Saclay Signal Seminar, March 8, 2023
Based on joint works with Arnak Dalalyan (CREST-ENSAE, IP Paris), Amir-Hossein Bateni (Université Grenoble Alpes), and Nikita Zhivotovskiy (University of California, Berkeley).
arshak minasyan robust mean estimation in high dimensions 1

Slide 2

Slide 2 text

classical mean estimation setting
We observe n vectors in $\mathbb{R}^d$ such that $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}_d(\mu^*, \Sigma)$ with unknown $\mu^*$ and $\Sigma$. The celebrated Borell–Tsirelson-Ibragimov-Sudakov Gaussian concentration inequality states that, w.p. $\ge 1 - \delta$,
$$\Big\|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu^*\Big\|_2 \le \sqrt{\frac{\operatorname{Tr}(\Sigma)}{n}} + \sqrt{\frac{2\|\Sigma\|\log(1/\delta)}{n}}.$$
Question: What happens if some “small” fraction (denoted by ε) of the data is contaminated?
▶ Statistical guarantees
▶ Computational aspects
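As a quick numerical illustration (a simulation sketch added here, not part of the original slides; the function name and parameters are ours), one can check that the error of the sample mean stays below the concentration bound above:

```python
import numpy as np

def sample_mean_error_vs_bound(n, d, delta, sigma2=1.0, seed=0):
    """Simulate X_1, ..., X_n ~ N(0, sigma2 * I_d) and return the error of the
    sample mean together with the Gaussian concentration bound
    sqrt(Tr(Sigma)/n) + sqrt(2 * ||Sigma|| * log(1/delta) / n)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=np.sqrt(sigma2), size=(n, d))
    err = float(np.linalg.norm(X.mean(axis=0)))      # true mean is 0 here
    tr_sigma, op_norm = sigma2 * d, sigma2           # Tr(Sigma), ||Sigma|| for sigma2 * I_d
    bound = float(np.sqrt(tr_sigma / n) + np.sqrt(2 * op_norm * np.log(1 / delta) / n))
    return err, bound
```

The bound is supposed to fail only with probability δ, so for moderate n and d a single simulated sample should typically satisfy it.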

Slide 5

Slide 5 text

contamination models
$$\mathcal{M}_{\mathrm{OF}} = \big\{ P_\mu^{n} : \mu \in \mathcal{M} \big\}.$$
$$\mathcal{M}_{\mathrm{HC}}(\varepsilon) = \big\{ \big((1-\varepsilon)P_\mu + \varepsilon Q\big)^{n} : \mu \in \mathcal{M},\ Q \in \mathcal{P} \big\}.$$
$$\mathcal{M}_{\mathrm{PC}}(o) = \big\{ P_{\mu_1} \otimes \cdots \otimes P_{\mu_n} : \mu_i \in \mathcal{M},\ \exists\, O \subset [n] \text{ s.t. } |O| \le o,\ \mu_i = \mu_j\ \forall i, j \in O^c \text{ and } \mu_i \ne \mu_j\ \forall i \in O \text{ and some } j \in O^c \big\}.$$
$$\mathcal{M}_{\mathrm{AC}}(o) = \big\{ \sigma\big(P_\mu^{n-o} \otimes P_1 \otimes \cdots \otimes P_o\big) : \mu \in \mathcal{M},\ \sigma \in S_n \text{ (permutation group)},\ P_1, \ldots, P_o \text{ arbitrary} \big\}.$$

Slide 9

Slide 9 text

relation between contamination models
[Diagram: relations between the contamination models $\mathcal{M}_{\mathrm{OF}}$, $\mathcal{M}_{\mathrm{HC}}$, $\mathcal{M}_{\mathrm{PC}}$, $\mathcal{M}_{\mathrm{AC}}$.]

Slide 10

Slide 10 text

(sub-)Gaussian adversarial contamination
Definition (Gaussian Adversarial Contamination). Let $Y_1, \ldots, Y_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}_d(\mu^*, \Sigma)$. The contaminated sample $X_1, \ldots, X_n$ is distributed according to $\mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ with $\varepsilon \in (0, 1/2)$ when $|\{i : X_i \ne Y_i\}| \le \varepsilon n$.
Definition (Sub-Gaussian Adversarial Contamination). Let $\xi_1, \ldots, \xi_n \overset{\text{ind.}}{\sim} \mathcal{SG}_d(\tau)$. The contaminated sample $X_1, \ldots, X_n$ is distributed according to $\mathrm{SGAC}(\mu^*, \Sigma, \varepsilon, \tau)$ when $|\{i \in \{1, \ldots, n\} : X_i \ne \mu^* + \Sigma^{1/2}\xi_i\}| \le \varepsilon n$.
Outliers: $O = \{i : X_i \ne Y_i\}$. Inliers: $I = \{1, \ldots, n\} \setminus O$.

Slide 13

Slide 13 text

basic estimators
▶ Sample mean: $\bar{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Not robust! A single point can make the risk arbitrarily large.
▶ Coordinate-wise median / geometric median:
$$\hat{\mu}_n = \operatorname{med}\{X_1, \ldots, X_n\}, \qquad \hat{\mu}^{\mathrm{GM}} \in \arg\min_{\mu \in \mathbb{R}^d} \sum_{i=1}^{n} \|X_i - \mu\|_2.$$
Robust! But optimal only in low dimensions.
▶ Tukey’s median:
$$\hat{\mu}^{\mathrm{TM}}_n = \arg\max_{\mu \in \mathbb{R}^d} D_n(\mu), \qquad D_n(\mu) = \inf_{u \in S^{d-1}} \sum_{i=1}^{n} \mathbb{1}(u^\top X_i \le u^\top \mu).$$
Robust, and optimal only when the covariance is proportional to the identity; takes exponential time to compute!
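The geometric median has no closed form, but Weiszfeld's classical fixed-point iteration (a standard method, sketched here for illustration; it is not discussed on these slides) computes it:

```python
import numpy as np

def geometric_median(X, n_iter=200, tol=1e-9):
    """Weiszfeld's iteration for argmin_mu sum_i ||X_i - mu||_2.
    Each step is a weighted average with weights 1 / ||X_i - mu||."""
    mu = X.mean(axis=0)                      # start from the sample mean
    for _ in range(n_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        dist = np.maximum(dist, tol)         # avoid division by zero at data points
        w = 1.0 / dist
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

Unlike the sample mean, the geometric median is essentially unaffected by a single gross outlier, illustrating the robustness claim above.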

Slide 20

Slide 20 text

median of means
A simple estimator is the median of means, which goes back to Nemirovsky and Yudin (1983), Jerrum, Valiant, and Vazirani (1986), and Alon, Matias, and Szegedy (2002):
$$\hat{\mu}^{\mathrm{MOM}}_{n,\delta} \triangleq \operatorname{med}\Big(\frac{1}{m}\sum_{j=1}^{m} X_j,\ \ldots,\ \frac{1}{m}\sum_{j=(k-1)m+1}^{km} X_j\Big).$$
Let $\delta \in (0, 1)$, $k = 8\log(1/\delta)$ and $m = \frac{n}{8\log(1/\delta)}$. Then
$$|\hat{\mu}^{\mathrm{MOM}}_{n,\delta} - \mu^*| \le \sigma\sqrt{\frac{32\log(1/\delta)}{n}}.$$
Further developments of the MOM approach, including heavy-tailed distributions in high dimensions: Depersin and Lecué (2021), Lugosi and Mendelson (2019), Minsker (2015), Hsu and Sabato (2013), etc.
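A one-dimensional sketch of the estimator above (illustrative NumPy code written for this note, not code from the talk):

```python
import numpy as np

def median_of_means(x, k):
    """Median of means: split x into k equal-size blocks, average each
    block, and return the median of the k block means."""
    m = len(x) // k                              # block size (drop the remainder)
    block_means = x[:k * m].reshape(k, m).mean(axis=1)
    return float(np.median(block_means))
```

With $k \approx 8\log(1/\delta)$ blocks, a minority of contaminated blocks cannot move the median of the block means far, while the sample mean is destroyed by the same outliers.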

Slide 21

Slide 21 text

trimmed mean
Arguably the oldest and most natural idea in robust statistics is the trimmed mean: Tukey and McLaughlin (1963), Huber and Ronchetti (1984), Bickel (1965), Stigler (1973). Divide the contaminated sample into two halves $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$. Let $\alpha = Y_{(\varepsilon n)}$ and $\beta = Y_{((1-\varepsilon)n)}$ be empirical quantiles of the second half. Define
$$\hat{\mu}_{2n} = \frac{1}{n}\sum_{i=1}^{n} \phi_{\alpha,\beta}(X_i), \qquad \phi_{\alpha,\beta}(x) = \begin{cases} \beta, & x > \beta, \\ x, & x \in [\alpha, \beta], \\ \alpha, & x < \alpha. \end{cases}$$
Then, w.p. $\ge 1 - \delta$ (given $n\varepsilon \sim \log(1/\delta)$),
$$|\hat{\mu}_{2n} - \mu^*| \le 9\sigma\sqrt{\frac{\log(8/\delta)}{n}}.$$
Lugosi and Mendelson (2021) showed the optimality of the trimmed mean in high dimensions for distributions with two finite moments.
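A one-dimensional sketch of the two-sample trimmed mean above (illustrative; `np.quantile` stands in for the order statistics $Y_{(\varepsilon n)}$ and $Y_{((1-\varepsilon)n)}$):

```python
import numpy as np

def trimmed_mean(x, y, eps):
    """Two-sample trimmed mean: the clipping thresholds are the empirical
    eps- and (1-eps)-quantiles of the second half y; the clipping function
    phi_{alpha,beta} is then applied to the first half x."""
    alpha, beta = np.quantile(y, [eps, 1 - eps])
    return float(np.clip(x, alpha, beta).mean())
```

Clipping at data-driven quantiles caps the influence of each outlier at $\max(|\alpha|, |\beta|)/n$, which is why a small contaminated fraction cannot ruin the estimate.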

Slide 23

Slide 23 text

lower bound for Gaussian mean
Combining the lower bound from Chen et al. (2018) with the lower bound in the outlier-free regime, we have that
$$\inf_{\hat{\mu}_n} \|\hat{\mu}_n - \mu^*\|_2 \ge c\,\|\Sigma\|^{1/2}\Big(\sqrt{\frac{r_\Sigma}{n}} + \varepsilon\Big)$$
holds with positive probability for some absolute constant $c > 0$, where $r_\Sigma \triangleq \frac{\operatorname{Tr}(\Sigma)}{\|\Sigma\|}$ is usually called the effective rank. Chen et al. (2018) also proved that, when $\Sigma = \sigma^2 I_d$, Tukey’s median of Gaussians satisfies, w.p. $\ge 1 - \delta$,
$$\|\hat{\mu}^{\mathrm{TM}}_n - \mu^*\|_2 \le C\sigma\Big(\sqrt{\frac{d + \log(1/\delta)}{n}} + \varepsilon\Big).$$

Slide 24

Slide 24 text

lower bound for sub-Gaussian mean
Lugosi and Mendelson (2021) showed that in the family of sub-Gaussian distributions it is indeed impossible to achieve a rate better than $\|\Sigma\|^{1/2}\big(\sqrt{r_\Sigma/n} + \varepsilon\sqrt{\log(1/\varepsilon)}\big)$, i.e.,
$$\inf_{\hat{\mu}_n} \|\hat{\mu}_n - \mu^*\|_2 \ge c\,\|\Sigma\|^{1/2}\Big(\sqrt{\frac{r_\Sigma}{n}} + \varepsilon\sqrt{\log(1/\varepsilon)}\Big)$$
holds with positive probability.

Slide 25

Slide 25 text

optimal robust Gaussian mean estimation
Theorem (M., Zhivotovskiy 2023+). Assume $X_1, \ldots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ and let $\varepsilon < c_1$. Then there is an estimator $\hat{\mu}_n$ satisfying, with probability at least $1 - \delta$,
$$\|\hat{\mu}_n - \mu^*\|_2 \le c_2\,\|\Sigma\|^{1/2}\Big(\sqrt{\frac{r_\Sigma}{n}} + \sqrt{\frac{\log(1/\delta)}{n}} + \varepsilon\Big),$$
where $c_1, c_2 > 0$ are absolute constants. The estimator is
$$\hat{\mu}_n = \arg\min_{\nu \in \mathbb{R}^d}\, \sup_{v \in S^{d-1}} \big|\mathbb{E}_{\rho_v}\operatorname{Med}(\langle X_1, \theta\rangle, \ldots, \langle X_n, \theta\rangle) - \langle \nu, v\rangle\big|,$$
where $\theta \sim \mathcal{N}_d(v, \beta^{-1} I_d)$ and the parameter $\beta$ satisfies $r_\Sigma/10 \le \beta \le 10\, r_\Sigma$.
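For a fixed direction $v$, the smoothed directional median $\mathbb{E}_{\rho_v}\operatorname{Med}(\langle X_1, \theta\rangle, \ldots, \langle X_n, \theta\rangle)$ can be approximated by plain Monte Carlo. The sketch below (written for this note; the function name and parameters are ours) illustrates only this inner expectation; the full estimator additionally requires a supremum over all directions and is computationally hard:

```python
import numpy as np

def smoothed_dir_median(X, v, beta, n_mc=2000, seed=0):
    """Monte Carlo approximation of E_theta Med(<X_1, theta>, ..., <X_n, theta>)
    with theta ~ N(v, beta^{-1} I_d), for a fixed direction v."""
    rng = np.random.default_rng(seed)
    thetas = v + rng.normal(size=(n_mc, X.shape[1])) / np.sqrt(beta)
    proj = X @ thetas.T                      # (n, n_mc): projections on each theta
    return float(np.median(proj, axis=0).mean())
```

For clean Gaussian data and large β, the smoothing is mild and the quantity is close to the directional median $\langle \mu^*, v\rangle$, as the theorem's proof exploits.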

Slide 27

Slide 27 text

tuning the parameter β
The parameter $\beta$ is chosen by the statistician so that $r_\Sigma/10 \le \beta \le 10\, r_\Sigma$. To estimate $\beta$ we need to estimate both $\|\Sigma\|$ and $\operatorname{Tr}(\Sigma)$. Abdalla and Zhivotovskiy (2022) provided an estimator $\omega$ such that $\|\Sigma\|/4 \le \omega \le 4\|\Sigma\|$. Estimating $\operatorname{Tr}(\Sigma)$ reduces to mean estimation in $\mathbb{R}$; using the estimator from Lugosi and Mendelson (2019), we obtain $\tau$ such that $\operatorname{Tr}(\Sigma)/2 \le \tau \le 2\operatorname{Tr}(\Sigma)$. Hence this yields $r_\Sigma/8 \le \tau/\omega \le 8\, r_\Sigma$. An alternative approach for choosing $\beta$ would be Lepskii’s method.

Slide 29

Slide 29 text

going beyond the Gaussian assumption
The Gaussian assumption does not play a crucial role. Our results extend to distributions that
▶ are symmetric around the mean and spherically symmetric in the sense that the law of $\langle X - \mu, v\rangle / \sqrt{v^\top \Sigma v}$ does not depend on $v \in S^{d-1}$;
▶ satisfy, for small enough $\varepsilon$ and some $c > 0$, $|F^{-1}(1/2 \pm \varepsilon) - F^{-1}(1/2)| \le c\,\varepsilon$;
▶ have a density $f$ separated from zero by an absolute constant for all $x \in [F^{-1}(1/2 - \varepsilon), F^{-1}(1/2 + \varepsilon)]$;
▶ have sub-Gaussian tails.

Slide 30

Slide 30 text

computational aspects of smoothed median
Recall
$$\hat{\mu}_n = \arg\min_{\nu \in \mathbb{R}^d}\, \sup_{v \in S^{d-1}} \big|\mathbb{E}_{\rho_v}\operatorname{Med}(\langle X_1, \theta\rangle, \ldots, \langle X_n, \theta\rangle) - \langle \nu, v\rangle\big|.$$
▶ Computing the smoothed median of the contaminated data $X_1, \ldots, X_n$ takes time exponential in $d$.
▶ For mean estimation, it is a challenging open problem to obtain a polynomial-time algorithm with a linear dependence on $\varepsilon$, even when $\Sigma = I_d$.

Slide 32

Slide 32 text

iteratively reweighted mean estimator
We consider the weighted sample mean $\bar{X}_w = \sum_{i=1}^{n} w_i X_i$.
Goal: find $w$ that mimics $w^*_j \propto \mathbb{1}(j \in O^c)$ for all $j \in \{1, \ldots, n\}$.
Notice that $\bar{X}_w - \mu^* = \psi_w$, where
$$\psi_w = \sum_{i \in I} \frac{w_i}{\|w_I\|_1}\,\zeta_i + \frac{1}{1 - \varepsilon_w} \sum_{i \in I^c} w_i\,(X_i - \bar{X}_w)$$
with $\varepsilon_w = \sum_{i \in I^c} w_i$ and $\zeta_1, \ldots, \zeta_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}_d(0, \Sigma)$. Consequently,
$$\|\bar{X}_w - \mu^*\|_2 \le \sup_{v \in B_2} v^\top \psi_w.$$

Slide 35

Slide 35 text

iteratively reweighted mean estimator
For any pair of vectors $w \in \Delta^{n-1}$ and $\mu \in \mathbb{R}^d$ we proved
$$\|\bar{X}_w - \mu^*\|_2 \le \sqrt{\varepsilon}\; G(w, \mu)^{1/2} + R(\zeta, I), \qquad G(w, \mu) = \lambda_{\max}\Big(\sum_{i=1}^{n} w_i (X_i - \mu)^{\otimes 2} - \Sigma\Big).$$
▶ $G(w, \mu)$ is indeed bi-convex in $(w, \mu)$.
▶ For a fixed value of $\mu$, minimizing $G(w, \mu)$ over $w$ becomes an SDP.
▶ For a fixed value of $w \in \Delta^{n-1}$, $\bar{X}_w = \arg\min_{\mu \in \mathbb{R}^d} G(w, \mu)$.

Slide 39

Slide 39 text

algorithm for known ε and Σ
Alg. 1: Iteratively reweighted mean estimator
Input: data $X_1, \ldots, X_n \in \mathbb{R}^d$, contamination rate $\varepsilon$, and $\Sigma$
Output: parameter estimate $\hat{\mu}^{\mathrm{IR}}_n$
Initialize: compute $\mu^0$ as a minimizer of $\sum_{i=1}^{n} \|X_i - \mu\|_2$
Set $K = 0 \vee \frac{\log(4 r_\Sigma) - 2\log(\varepsilon(1-2\varepsilon))}{2\log(1-2\varepsilon) - \log\varepsilon - \log(1-\varepsilon)}$.
For $k = 1 : K$
  Compute current weights:
  $$w \in \arg\min_{(n - n\varepsilon)\|w\|_\infty \le 1} \lambda_{\max}\Big(\sum_{i=1}^{n} w_i (X_i - \mu^{k-1})^{\otimes 2} - \Sigma\Big).$$
  Update the estimator: $\mu^k = \bar{X}_w$.
EndFor
Return $\hat{\mu}^{\mathrm{IR}}_n := \mu^K$.
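The weight-update step of Alg. 1 is an SDP. As a rough illustrative stand-in (a spectral filtering heuristic in the spirit of the algorithm, not its exact SDP step; the function name, `n_iter`, and the removal rule are choices made here), one can iteratively discard the points most extreme along the top eigendirection of the empirical covariance of the remaining points:

```python
import numpy as np

def filtered_mean(X, eps, n_iter=5):
    """Spectral filtering heuristic: repeatedly drop the eps-fraction of
    remaining points with the largest squared projection onto the top
    eigenvector of the empirical covariance of the remaining points,
    then return the mean of the survivors."""
    n = len(X)
    active = np.ones(n, dtype=bool)
    for _ in range(n_iter):
        mu = X[active].mean(axis=0)
        R = X[active] - mu
        cov = R.T @ R / active.sum()
        u = np.linalg.eigh(cov)[1][:, -1]        # top eigendirection
        scores = ((X - mu) @ u) ** 2
        thr = np.quantile(scores[active], 1 - eps)
        active &= scores <= thr                  # removal is monotone
    return X[active].mean(axis=0)
```

The intuition matches the slides: outliers that shift the mean necessarily inflate the covariance along some direction, so the top eigendirection exposes them.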

Slide 40

Slide 40 text

in-expectation bound for Gaussians
Theorem (M. & Dalalyan, 2022). Assume that $X_1, \ldots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$. Let $\varepsilon < (5 - \sqrt{5})/10$ and $\mu^* \in \mathbb{R}^d$. Then the estimator $\hat{\mu}^{\mathrm{IR}}_n$ satisfies
$$\big\|\,\|\hat{\mu}^{\mathrm{IR}}_n - \mu^*\|\,\big\|_{L_2} \le \frac{10\,\|\Sigma\|_{\mathrm{op}}^{1/2}}{1 - 2\varepsilon - \sqrt{\varepsilon(1-\varepsilon)}}\Big(\sqrt{r_\Sigma/n} + \varepsilon\sqrt{\log(1/\varepsilon)}\Big).$$
▶ Small and explicit constant in front of the rate, compared with other estimators, whose constants are non-explicit or very large, e.g. around $10^7$.
▶ Effective rank of the covariance matrix instead of the dimension.

Slide 43

Slide 43 text

in-probability bound for sub-Gaussians
Theorem (M. & Dalalyan, 2022). Assume that $X_1, \ldots, X_n \sim \mathrm{SGAC}(\mu^*, \Sigma, \varepsilon, \tau)$. Let $\varepsilon < (5 - \sqrt{5})/10$ and $\mu^* \in \mathbb{R}^d$. Then the estimator $\hat{\mu}^{\mathrm{IR}}_n$ satisfies, with probability at least $1 - \delta$,
$$\|\hat{\mu}^{\mathrm{IR}}_n - \mu^*\|_2 \le \frac{A(\tau)\,\|\Sigma\|_{\mathrm{op}}^{1/2}}{1 - 2\varepsilon - \sqrt{\varepsilon(1-\varepsilon)}}\Big(\sqrt{\frac{d + \log(4/\delta)}{n}} + \varepsilon\sqrt{\log(1/\varepsilon)}\Big).$$
▶ The constant $A(\tau)$ is not explicit and depends only on the variance proxy $\tau$.
▶ The dependence on $\varepsilon$ is optimal for sub-Gaussians.

Slide 46

Slide 46 text

unknown Σ
Theorem (M. & Dalalyan, 2022). Assume that $X_1, \ldots, X_n \sim \mathrm{SGAC}(\mu^*, \Sigma, \varepsilon, \tau)$ with unknown $\Sigma$. Let $\varepsilon < (5 - \sqrt{5})/10$ and $\mu^* \in \mathbb{R}^d$. Then the estimator $\hat{\mu}^{\mathrm{IR}}_n$ satisfies, with probability at least $1 - \delta$,
$$\|\hat{\mu}^{\mathrm{IR}}_n - \mu^*\|_2 \le \frac{A(\tau)\,\|\Sigma\|_{\mathrm{op}}^{1/2}}{1 - 2\varepsilon - \sqrt{\varepsilon(1-\varepsilon)}}\Big(\sqrt{\frac{p + \log(1/\delta)}{n}} + \sqrt{\varepsilon}\Big).$$
▶ For unknown isotropic $\Sigma = \sigma^2 I_d$, the sub-Gaussian rates hold.
▶ Among computationally tractable estimators, the rate $\sqrt{\varepsilon}$ for general unknown $\Sigma$ is the best known in the literature.

Slide 49

Slide 49 text

mean estimator based on spectral dimension reduction (SDR)

Slide 50

Slide 50 text

in-probability bound for SDR estimator
Theorem (Bateni, M., Dalalyan, 2022). Let $X_1, \ldots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ with $\varepsilon \le \varepsilon^* \in (0, 1/2)$, and let $\delta \in (0, 1/2)$. Then, for an appropriately chosen threshold $t$ and some absolute constant $C$, the SDR estimator satisfies, with probability at least $1 - \delta$,
$$\|\hat{\mu}^{\mathrm{SDR}} - \mu^*\|_2 \le \frac{C\sqrt{\log p}}{1 - 2\varepsilon^*}\Big(\sqrt{\frac{r_\Sigma}{n}} + \varepsilon\sqrt{\log(2/\varepsilon)} + \sqrt{\frac{\log(1/\delta)}{n}}\Big).$$
▶ The SDR algorithm is much faster than the one based on iterative reweighting.
▶ It does not require knowledge of $\varepsilon$.
▶ Its breakdown point equals 0.5.
▶ It has an additional factor $\sqrt{\log p}$.

Slide 55

Slide 55 text

unknown Σ
Theorem (Bateni, M., Dalalyan, 2022). Let $X_1, \ldots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ with unknown $\Sigma$, $\varepsilon \le \varepsilon^* \in (0, 1/2)$, and $\delta \in (0, 1/2)$. Assume that $\Sigma$ satisfies $\|\Sigma^{-1/2}\Sigma\Sigma^{-1/2} - I_d\| \le \gamma$ for some $\gamma \in (0, 1/2]$. Then, for an appropriately chosen threshold $t$ and some absolute constant $C$, the SDR estimator satisfies, with probability at least $1 - \delta$,
$$\frac{\|\hat{\mu}^{\mathrm{SDR}} - \mu^*\|_2}{\|\Sigma\|^{1/2}} \le \frac{C\sqrt{\log p}}{1 - 2\varepsilon^*}\Big(\sqrt{\frac{r_\Sigma}{n}} + \varepsilon\sqrt{\log(2/\varepsilon)} + \sqrt{\varepsilon\gamma} + \sqrt{\frac{\log(1/\delta)}{n}}\Big).$$
▶ If $\gamma$ is at most of order $\varepsilon\log(1/\varepsilon)$, we have the same rate as if $\Sigma$ were known.
▶ If $\gamma$ is of constant order, we recover the best-known rate for computationally tractable estimators, i.e., $\sqrt{r_\Sigma/n} + \sqrt{\varepsilon}$.

Slide 58

Slide 58 text

comparison
[Figure: numerical comparison of the estimators. Source: Bateni, M., Dalalyan (2022).]

Slide 59

Slide 59 text

references
▶ A. Dalalyan and A. Minasyan. All-in-one robust estimator of the Gaussian mean. Annals of Statistics, 2022.
▶ A.-H. Bateni, A. Minasyan, and A. Dalalyan. Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction. arXiv preprint arXiv:2204.02323, 2022.
▶ A. Minasyan and N. Zhivotovskiy. Statistically optimal robust mean and covariance estimation for anisotropic Gaussians. arXiv preprint arXiv:2301.09024, 2023.