Florent Bouchard

Riemannian geometry for data analysis: application to blind source separation
and low-rank structured covariance matrices Florent Bouchard (L2S – CNRS, Univ. Paris-Saclay, CentraleSupélec) S3 seminar – November, 25 2020

Introduction

Introduction – Statistical data analysis • Characterize data x with
some parameters θ x θ • To estimate θ, optimization problem: argmin θ∈M f (x, θ) • f : X × M → R, cost function corresponding to the model • x and θ might possess a structure ⇒ X and M are manifolds 1

Introduction – Riemannian geometry Geometry: metric, curves, distance or divergence
treat diﬃcult problems, complicated to handle with other approaches • modeling: design appropriate cost functions for the considered model exploit the structure of data x, e.g. centers of mass • optimization: generic convex and non-convex methods, modularity naturally takes into account structure of parameters θ • performance analysis: error measures and associated performance bounds captures the geometrical structure of the problem M TθM •θ 2

Introduction Riemannian geometry and optimization Approximate joint diagonalization for blind
source separation Intrinsic Cramér-Rao bound for low-rank structured elliptical models Conclusions and perspectives

Riemannian geometry and optimization

Riemannian geometry [Absil et al., 2008] • Manifold M: locally
diﬀeomorphic to Rd , with dim(M) = d, i.e. ∀θ ∈ M, ∃ Uθ ⊂ M and ϕθ : Uθ → Rd , diﬀeomorphism M • θ Uθ • Curve γ : R → M, γ(0) = θ, derivative: ˙ γ(0) = lim t→0 γ(t) − γ(0) t • Tangent space Tθ M = { ˙ γ(0) : γ : R → M, γ(0) = θ} M TθM θ • 3

Riemannian geometry [Absil et al., 2008] • Riemannian metric ·,
· θ : Tθ M × Tθ M → R • inner product on TθM – bilinear, symmetric, positive definite • defines length and relative positions of tangent vectors ξ 2 θ = ξ, ξ θ α(ξ, η) = ξ, η θ ξ θ η θ • Geodesics γ : [0, 1] → M • generalizes straight lines to manifolds – defined by (γ(0), ˙ γ(0)) or (γ(0), γ(1)) • curves on M with zero acceleration: D2γ dt2 = 0 operator D2 dt2 depends on M and ·, · · • Riemannian distance: δ(θ, ˆ θ) = 1 0 ˙ γ(t) γ(t) dt γ: geodesic connecting θ and ˆ θ ⇒ distance = length of γ M TθM θ ξ η α • M θ ˆ θ ˙ γ(t) • • • γ(t) 4

Riemannian optimization [Absil et al., 2008] argmin θ∈M f (θ)
• framework for optimization on manifold M equipped with metric ·, · · M TθM ξ • θ f (θ) D f (θ)[ξ] • descent direction of f at θ: ξ ∈ Tθ M, D f (θ)[ξ] < 0 • gradient of f at θ: grad f (θ), ξ θ = D f (θ)[ξ] 5

Riemannian optimization [Absil et al., 2008] • minimize f on
M from θ: descent direction ξ ∈ TθM grad f (θ), ξ θ < 0 retraction of ξ on M M TθM ξ • θ • Rθ(ξ) reiterate until critical point: grad f (θ) = 0 • example of optimization algorithm: gradient descent θi+1 = Rθi (−ti grad f (θi )) 6

Approximate joint diagonalization for blind source separation

Problem: approximate joint diagonalization • data: set {Ck } of
n × n symmetric positive deﬁnite matrices • goal: ﬁnd joint diagonalizer B of matrices Ck {Ck } ··· k B BCk BT ··· k • formulation: non-singular matrix B ∈ Rn×n such that: argmin B f (B, {Ck }) f – diagonality criterion 7

Application: blind source separation • instantaneous linear mixing: x(t) =
A s(t) [Comon and Jutten, 2010] • goal: retrieve A and s(t) knowing x(t) s(t) x(t) A Bx(t) B • used in many engineering ﬁelds such as electroencephalography 8

Geometrical modeling of the problem • goal: get Ck as
close as possible to D++ n in S++ n • diagonality measure of BCk BT : relative position to D++ n D++ n diagonal positive deﬁnite matrices S++ n symmetric positive deﬁnite matrices • • • • • • Ck 9

Proposed geometrical model • appropriate criterion: f (B, {Ck }))
= k d(BCk BT , Λk (B)) Λk (B): target diagonal matrices d(·, ·) : divergence on S++ n D++ n diagonal positive deﬁnite matrices S++ n symmetric positive deﬁnite matrices • BCk BT • Λk (B) • Λk (B) • natural choice for Λk (B): Λk (B) = argmin Λ∈D++ n d(BCk BT , Λ) 10

Considered divergences • least squares criterion: Euclidean distance on S++
n [Cardoso and Souloumiac, 1993] δ2 F (C, Λ) = C − Λ 2 F Λ = ddiag(C) • log-likelihood cirterion: left Kullback-Leibler divergence [Pham, 2000] d KL(C, Λ) = dKL(C, Λ) Λ = ddiag(C) where dKL(P, S) = trace(PS−1 − In) − log det(PS−1) • other measures obtained from dKL (·, ·): right measure: drKL(C, Λ) = dKL(Λ, C) Λ = ddiag(C−1)−1 symmetrized measure: dsKL(C, Λ) = 1 2 (dKL(C, Λ) + dKL(Λ, C)) Λ = ddiag(C)1/2 ddiag(C−1)−1/2 11

Considered divergences • natural Riemannian distance: [Skovgaard, 1984], [Bhatia, 2009]
δ2 R (C, Λ) = log(Λ−1/2CΛ−1/2) 2 F ddiag(log(C−1Λ)) = 0n • log-Euclidean distance: [Arsigny et al., 2007] δ2 LE (C, Λ) = log(C) − log(Λ) 2 F Λ = exp(ddiag(log(C))) • Bhattacharyya distance: [Chebbi and Moakher, 2012], [Sra, 2013] δ2 B (C, Λ) = 4 log det((C + Λ)/2) det(C)1/2 det(Λ)1/2 2 ddiag((C + Λ)−1) = Λ−1 • Wasserstein distance: [Villani, 2008], [Bhatia et al., 2017] δ2 W (C, Λ) = trace 1 2 (C + Λ) − (Λ1/2CΛ1/2)1/2 ddiag((Λ1/2CΛ1/2)1/2) = Λ 12

First experiment: simulated data Ck = AΛk AT + 1
σ Ek ∆k ET k A, Ek : random elements, standard normal distribution conditioning of A with respect to inversion in [1, 10] Λk , ∆k : diagonal – pth element, χ2 distribution with expectancy 1 and divided by p σ : signal to noise ratio • performance measure: Moreau-Amari criterion [Moreau and Macchi, 1994] measure degree of similarity of BA with matrix of the form PΣ P permutation matrix – Σ non-singular diagonal matrix 13

First experiment: comparisons of divergences 10 50 100 500 1000
−30 −20 −10 σ IM-A (dB) F W rKL B R sKL KL LE medians, ﬁrst and ninth deciles (error bars) estimated on 50 trials 14

Second experiment: electroencephalographic data Nasion Inion 1 2 3 4
5 6 7 8 1. Oz 2. O1 13 Hz 17 Hz 3. O2 21 Hz repos 4. POz 10 20 f (Hz) 5. PO3 10 20 6. PO4 10 20 7. PO7 10 20 8. PO8 15

Obtained sources with proposed methods 10 20 f (Hz) s1
s2 13 Hz 17 Hz 21 Hz repos 16

Comparisons of divergences performance measure ISSVEP (f ) ∈ [0,
1] for class f ratio between power at f and total power ISSVEP(f ) F W rKL B R sKL KL LE 13 Hz 0, 95 0, 95 0, 96 0, 96 0, 96 0, 96 0, 96 0, 96 17 Hz 0, 87 0, 89 0, 91 0, 91 0, 91 0, 91 0, 91 0, 91 21 Hz 0, 50 0, 54 0, 60 0, 60 0, 60 0, 60 0, 60 0, 60 s1 17

Intrinsic Cramér-Rao bound for low-rank structured elliptical models

Intrinsic Cramér-Rao bound • x ∼ L(θ), θ ∈ M,
with log-likelihood Lx : M → R • given x, estimation problem: ˆ θ = argmin θ∈M −Lx (θ) • Cramèr-Rao bound of an unbiased estimator ˆ θ of θ : Eˆ θ [Err(θ, ˆ θ)] F−1 ⇒ MSE = Eˆ θ [err(θ, ˆ θ)] ≥ trace(F−1) F: Fisher information matrix 18

Intrinsic Cramér-Rao bound Eˆ θ [err(θ, ˆ θ)] ≥ trace(F−1)
• classical case: M = Rd err(θ, ˆ θ) = θ − ˆ θ 2 2 Fij = −Ex ∂2Lx (θ) ∂θi ∂θj ⇒ constraints, invariances of the model not taken into account • generalization to Riemannian manifold M [Smith, 2005, Boumal, 2013] • err(θ, ˆ θ) = δ2 M (θ, ˆ θ) δM distance on M associated to ·, · M · • Fij = ei , ej F θ • ξ, η F θ = −Ex [D2 Lx (θ)[ξ, η]], Fisher metric • {ei } orthonormal basis of TθM associated to ·, · M θ M TθM •θ ⇒ allows to exhibit diﬀerent properties from the classical case 19

Elliptical distributions / low-rank structure • x ∼ CES(0, R),
R ∈ H++ p , with log-likelihood L++ x (R) = log(g(xH R−1x)) − log det(R)−1 g : R+ → R+ e.g. gaussian (g(t) = exp(−t)), generalized gaussian, Student t • Fisher metric of elliptical distributions on H++ p : [Breloy et al., 2018] ξ, η F,++ R = αF trace(R−1ξR−1η) + βF trace(R−1ξ) trace(R−1η) • low-rank structure – few data [Sun et al., 2016, Bouchard et al., 2020] R = Ip + H H ∈ H+ p,k ⇒ geometry of H+ p,k : several possibilities, none ideal e.g. [Bonnabel and Sepulchre, 2009, Vandereycken et al., 2012, Massart and Absil, 2018] 20

Geometry of low-rank structured covariance matrices • H = ϕ(U,
Σ) = UΣUH with (U, Σ) ∈ Mp,k = Stp,k × H++ k • ∀O ∈ Uk , ϕ(UO, OH ΣO) = ϕ(U, Σ) ⇒ H+ p,k isomorphic to quotient [Bonnabel and Sepulchre, 2009, Bouchard et al., 2020] Mp,k = Mp,k /Uk = {π(U, Σ) : (U, Σ) ∈ Mp,k } où π(U, Σ) = {(UO, OH ΣO) : O ∈ Uk } Mp,k O → (UO, O H ΣO) • θ = (U, Σ) 21

quotient Mp,k : Riemannian geometry • θ = (U, Σ)
∈ Mp,k • Tθ Mp,k = {(UΩξ + U⊥ Kξ , ξΣ ) : Ωξ ∈ Ak , Kξ ∈ R(p−k)×k , ξΣ ∈ Sk } U⊥ ∈ Stp,(p−k) such that UH U⊥ = 0 ξ, η θ = 1 2 trace(ΩH ξ Ωη ) + trace(KH ξ Kη ) + α trace(Σ−1ξΣ Σ−1ηΣ ) + β trace(Σ−1ξΣ ) trace(Σ−1ηΣ ) • vertical space – tangent space to the equivalence class Vθ = {(UΩ, ΣΩ − ΩΣ) : Ω ∈ Ak } • horizontal space – orthogonal complement to Vθ according to ·, · θ Hθ = {ξ ∈ Tθ Mp,k : Ωξ = 2α(Σ−1ξΣ − ξΣ Σ−1)} Mp,k O → (UO, O T SO) TθMp,k Vθ Hθ • θ 22

quotient Mp,k : error measure • Riemannian distance not known
⇒ alternative error measure • alternative horizontal space – complement to Vθ, non orthogonal [Bonnabel and Sepulchre, 2009] Hθ = {ξ ∈ Tθ Mp,k : Ωξ = 0} = {(U⊥ Kξ , ξΣ ) : Kξ ∈ R(p−k)×k , ξΣ ∈ Sk } • induces divergence onto Mp,k d(π(θ), π(ˆ θ)) = Θ 2 2 + α log(Σ−1/2O ˆ OH ˆ Σ ˆ OOH Σ−1/2) 2 2 + β log det(Σ−1 ˆ Σ) 2 UT ˆ U = O cos(Θ) ˆ OT ⇒ err(θ, ˆ θ) = d(θ, ˆ θ) Mp,k O → (UO, O H ΣO) TθMp,k Vθ Hθ Hθ • θ • θ 23

quotient Mp,k : performance bound Eˆ θ [err(θ, ˆ θ)]
= Eˆ θ [d(θ, ˆ θ)] ≥ trace(F−1 ) Fij = ei , ej F,Mp,k θ • {ei }, orthonormal basis of Hθ according to ·, · θ : {(U⊥Kij , 0), (iU⊥Kij , 0)}1≤i≤p−k 1≤j≤k , {(0, 1 √ α Σ1/2Hij Σ1/2 + √ α− √ α+kβ k √ α √ α+kβ trace(Hij )Σ)}1≤j≤i≤k , {(0, 1 √ α Σ1/2H ij Σ1/2)}1≤j<i≤k • Kij ∈ R(p−k)×k : element ij equals 1, 0 elsewhere • Hii ∈ Rk×k : element ii equals 1, 0 elsewhere • Hij ∈ Rk×k , i = j: elements ij and ji equal 1/√ 2, 0 elsewhere • H ij ∈ iRk×k , i = j: elements ij and ji equal i/√ 2 et −i/√ 2, 0 elsewhere • ·, · F,Mp,k · : Fisher metric onto Mp,k 24

quotient Mp,k : Fisher metric of elliptical distributions • ϕ(θ)
= UΣUH D ϕ(θ)[ξ] = UξΣ UH + ξU ΣUH + UΣξU • log-likelihood: LMp,k x (θ) = L++ x (Ip + ϕ(θ)) • Fisher metric: ξ, η F,Mp,k θ = D ϕ(θ)[ξ], D ϕ(θ)[η] F,++ Ip+ϕ(θ) where ξR , ηR F,++ R = αF trace(R−1ξR R−1ηR ) + βF trace(R−1ξR ) trace(R−1ηR ) 25

Numerical illustrations • covariance matrix: R = Ip + σUΣUH
• p = 16, k = 8 • U ∈ Stp,k : random orthgonal matrix • Σ ∈ H++ k : diagonal matrix, minimal / maximal element: 1/ √ c, √ c (c=20: conditionning) other elements: random, uniform distribution between 1/ √ c and √ c trace normalisée : trace(Σ) = trace(Ik ) = k • σ = 50 : free parameter • data : • 500 sets of n ∈ [12, 300] random samples • t-distribution, d = 3 (non gaussian) and d = 100 (almost gaussian) degrees of freedom [Ollila et al., 2012] 26

Numerical illustrations 101n = p 102 −10 0 10 n
err(θ, ˆ θ) (dB) d = 3 trace(F−1 θ ) T-RGD pSCM T-RTR 101n = p 102 n d = 100 27

Conclusions and perspectives

Conclusions and perspectives • Riemannian geometry: • powerful tool for
signal processing and machine learning • exploits intrinsic structure of data and parameters • quite complete and allows modularity – modeling, optimization, performance analysis • when to use Riemannian geometry and optimization ? • data and/or parameters possess a structure: constraints, invariances (geometrical properties of model) • metric of particular interest with respect to model (e.g., information theory) • issue: for some manifolds, geometry complicated to fully characterize e.g., low rank symmetric positive semideﬁnite matrices ⇒ go beyond Riemannian geometry to provide simple and eﬃcient geometrical objects 28

Absil, P.-A., Mahony, R., and Sepulchre, R. (2008). Optimization Algorithms
on Matrix Manifolds. Princeton University Press, Princeton, NJ, USA. Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM journal on matrix analysis and applications, 29(1):328–347. Bhatia, R. (2009). Positive definite matrices. Princeton University Press. Bhatia, R., Jain, T., and Lim, Y. (2017). On the Bures-Wasserstein distance between positive definite matrices. preprint. Bonnabel, S. and Sepulchre, R. (2009). Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM Journal on Matrix Analysis and Applications, 31(3):1055–1070. Bouchard, F., Breloy, A., Ginolhac, G., Renaux, A., and Pascal, F. (2020). A Riemannian framework for low-rank structured elliptical models. IEEE Transactions on Signal Processing, submitted. Available on arxiv. Boumal, N. (2013). On intrinsic Cramér-Rao bounds for Riemannian submanifolds and quotient manifolds. IEEE Transactions on Signal Processing, 61(7):1809–1821. Breloy, A., Ginolhac, G., Renaux, A., and Bouchard, F. (2018). Intrinsic Cramér–Rao bounds for scatter and shape matrices estimation in CES distributions. IEEE Signal Processing Letters, 26(2):262–266. Cardoso, J.-F. and Souloumiac, A. (1993). Blind beamforming for non Gaussian signals. IEEE Proceedings-F, 140(6):362–370. 29

Chebbi, Z. and Moakher, M. (2012). Means of Hermitian positive-definite
matrices based on the log-determinant α-divergence function. Linear Algebra and its Applications, 436(7):1872–1889. Comon, P. and Jutten, C. (2010). Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 1st edition. Massart, E. and Absil, P.-A. (2018). Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices. Technical Report UCL-INMA-2018.06. Moreau, E. and Macchi, O. (1994). A one stage self-adaptive algorithm for source separation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., volume 3, pages 49–52. Ollila, E., Tyler, D. E., Koivunen, V., and Poor, H. V. (2012). Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Transactions on Signal Processing, 60(11):5597–5625. Pham, D.-T. (2000). Joint approximate diagonalization of positive definite Hermitian matrices. SIAM J. Matrix Anal. Appl., 22(4):1136–1152. Skovgaard, L. T. (1984). A Riemannian geometry of the multivariate normal model. Scandinavian Journal of Statistics, pages 211–223. Smith, S. T. (2005). Covariance, subspace, and intrinsic Cramér-Rao bounds. IEEE Transactions on Signal Processing, 53(5):1610–1630. Sra, S. (2013). Positive definite matrices and the s-divergence. arXiv preprint arXiv:1110.1773. 30

Sun, Y., Babu, P., and Palomar, D. P. (2016). Robust
estimation of structured covariance matrix for heavy-tailed elliptical distributions. IEEE Transactions on Signal Processing, 64(14):3576–3590. Vandereycken, B., Absil, P.-A., and Vandewalle, S. (2012). A Riemannian geometry with complete geodesics for the set of positive semideﬁnite matrices of ﬁxed rank. IMA Journal of Numerical Analysis, 33(2):481–514. Villani, C. (2008). Optimal transport: old and new, volume 338. Springer Science & Business Media. 31

Florent Bouchard

Florent Bouchard

More Decks by S³ Seminar

Other Decks in Research

Featured

Transcript