
Florent Bouchard

S³ Seminar
November 27, 2020

Florent Bouchard

(L2S — CNRS, Université Paris-Saclay, CentraleSupélec)

https://s3-seminar.github.io/seminars/florent-bouchard/

Title — Riemannian geometry for data analysis: illustration on blind source separation and low-rank structured covariance matrices

Abstract — In this presentation, Riemannian geometry for data analysis is introduced. In particular, it is applied to two specific statistical signal processing problems: blind source separation and low-rank structured covariance matrices. Blind source separation can be solved by jointly diagonalizing some covariance matrices. We show here how geometry can be exploited to provide original joint diagonalization criteria and ways to compare them theoretically. These results are illustrated with numerical experiments on both simulated and real data. Concerning low-rank structured covariance matrices, an intrinsic Cramér-Rao bound of the corresponding estimation problem is presented, illustrating the value of geometry for performance analysis. These results are validated with simulations.


Transcript

  1. Riemannian geometry for data analysis:
    application to blind source separation and low-rank structured
    covariance matrices
    Florent Bouchard (L2S – CNRS, Univ. Paris-Saclay, CentraleSupélec)
    S3 seminar – November 25, 2020


  2. Introduction


  3. Introduction – Statistical data analysis
    • Characterize data x with some parameters θ (x → θ)
    • To estimate θ, solve an optimization problem:
      argmin_{θ∈M} f(x, θ)
    • f : X × M → R, cost function corresponding to the model
    • x and θ might possess a structure ⇒ X and M are manifolds


  4. Introduction – Riemannian geometry
    Geometry: metric, curves, distance or divergence
    – treats difficult problems that are complicated to handle with other approaches
    • modeling: design appropriate cost functions for the considered model
      exploits the structure of the data x, e.g. centers of mass
    • optimization: generic convex and non-convex methods, modularity
      naturally takes into account the structure of the parameters θ
    • performance analysis: error measures and associated performance bounds
      captures the geometrical structure of the problem


  5. Introduction
    Riemannian geometry and optimization
    Approximate joint diagonalization for blind source separation
    Intrinsic Cramér-Rao bound for low-rank structured elliptical models
    Conclusions and perspectives


  6. Riemannian geometry and optimization


  7. Riemannian geometry [Absil et al., 2008]
    • Manifold M: locally diffeomorphic to R^d, with dim(M) = d, i.e.
      ∀θ ∈ M, ∃ U_θ ⊂ M and ϕ_θ : U_θ → R^d, a diffeomorphism
    • Curve γ : R → M with γ(0) = θ; derivative: γ̇(0) = lim_{t→0} (γ(t) − γ(0)) / t
    • Tangent space T_θM = {γ̇(0) : γ : R → M, γ(0) = θ}


  8. Riemannian geometry [Absil et al., 2008]
    • Riemannian metric ⟨·, ·⟩_θ : T_θM × T_θM → R
      • inner product on T_θM – bilinear, symmetric, positive definite
      • defines lengths and relative positions of tangent vectors:
        ‖ξ‖²_θ = ⟨ξ, ξ⟩_θ,   α(ξ, η) = ⟨ξ, η⟩_θ / (‖ξ‖_θ ‖η‖_θ)
    • Geodesics γ : [0, 1] → M
      • generalize straight lines to manifolds – defined by (γ(0), γ̇(0)) or (γ(0), γ(1))
      • curves on M with zero acceleration: D²γ/dt² = 0
        the operator D²/dt² depends on M and ⟨·, ·⟩
    • Riemannian distance: δ(θ, θ̂) = ∫₀¹ ‖γ̇(t)‖_{γ(t)} dt
      γ: geodesic connecting θ and θ̂ ⇒ distance = length of γ
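
    For intuition, here is a minimal numpy sketch (an illustrative addition, not taken from the slides) of these objects on the simplest non-trivial manifold, the unit sphere S^{n−1} ⊂ R^n: the metric is the restriction of the Euclidean inner product to T_θS^{n−1}, geodesics are great circles, and the Riemannian distance between θ and θ̂ is arccos(θᵀθ̂).

```python
import numpy as np

def sphere_geodesic(theta, v, t):
    """Geodesic on the unit sphere starting at theta with initial velocity v in T_theta."""
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return theta.copy()
    return np.cos(t * nv) * theta + np.sin(t * nv) * (v / nv)

def sphere_distance(theta, theta_hat):
    """Riemannian distance on the sphere = length of the connecting great-circle arc."""
    return np.arccos(np.clip(theta @ theta_hat, -1.0, 1.0))

theta = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.0])              # tangent vector: theta @ v = 0
theta_hat = sphere_geodesic(theta, v, 1.0)
print(sphere_distance(theta, theta_hat))   # equals ||v|| = 0.5
```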


  9. Riemannian optimization [Absil et al., 2008]
    argmin_{θ∈M} f(θ)
    • framework for optimization on a manifold M equipped with a metric ⟨·, ·⟩
    • descent direction of f at θ: ξ ∈ T_θM such that D f(θ)[ξ] < 0
    • gradient of f at θ: ⟨grad f(θ), ξ⟩_θ = D f(θ)[ξ]


  10. Riemannian optimization [Absil et al., 2008]
    • minimize f on M from θ:
      • choose a descent direction ξ ∈ T_θM: ⟨grad f(θ), ξ⟩_θ < 0
      • retract ξ onto M: R_θ(ξ)
      • reiterate until a critical point is reached: grad f(θ) = 0
    • example of optimization algorithm: gradient descent
      θ_{i+1} = R_{θ_i}(−t_i grad f(θ_i))
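
    As an illustration of the scheme θ_{i+1} = R_{θ_i}(−t_i grad f(θ_i)), here is a hedged numpy sketch of Riemannian gradient descent on the unit sphere for f(θ) = θᵀAθ, using the orthogonal projection of the Euclidean gradient and the normalization retraction (a standard textbook choice, not necessarily the configuration used in the talk). Critical points are eigenvectors of A and the minimum is its smallest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = A + A.T                                  # symmetric cost matrix

def rgrad(theta):
    """Riemannian gradient on the sphere: project the Euclidean gradient onto T_theta."""
    g = 2 * A @ theta
    return g - (theta @ g) * theta

def retract(theta, xi):
    """Retraction: move in the tangent direction, then renormalize onto the sphere."""
    v = theta + xi
    return v / np.linalg.norm(v)

theta = rng.standard_normal(n)
theta /= np.linalg.norm(theta)
t = 0.05                                     # fixed step size (may need tuning)
for _ in range(500):
    theta = retract(theta, -t * rgrad(theta))

# At a critical point grad f(theta) = 0, i.e. theta is an eigenvector of A;
# minimizing f reaches the smallest eigenvalue of A.
print(theta @ A @ theta, np.linalg.eigvalsh(A)[0])
```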


  11. Approximate joint diagonalization for
    blind source separation


  12. Problem: approximate joint diagonalization
    • data: a set {C_k} of n × n symmetric positive definite matrices
    • goal: find a joint diagonalizer B of the matrices C_k: {C_k} → B → {B C_k Bᵀ}
    • formulation: find a non-singular matrix B ∈ R^{n×n} solving
      argmin_B f(B, {C_k})
      f: diagonality criterion


  13. Application: blind source separation
    • instantaneous linear mixing: x(t) = A s(t) [Comon and Jutten, 2010]
    • goal: retrieve A and s(t) knowing x(t): s(t) → A → x(t) → B → B x(t)
    • used in many engineering fields such as electroencephalography


  14. Geometrical modeling of the problem
    • goal: bring the C_k as close as possible to D⁺⁺_n within S⁺⁺_n
    • diagonality measure of B C_k Bᵀ: relative position to D⁺⁺_n
      D⁺⁺_n: diagonal positive definite matrices – S⁺⁺_n: symmetric positive definite matrices


  15. Proposed geometrical model
    • appropriate criterion: f(B, {C_k}) = Σ_k d(B C_k Bᵀ, Λ_k(B))
      Λ_k(B): target diagonal matrices – d(·, ·): divergence on S⁺⁺_n
    • natural choice for Λ_k(B):
      Λ_k(B) = argmin_{Λ∈D⁺⁺_n} d(B C_k Bᵀ, Λ)


  16. Considered divergences
    • least-squares criterion: Euclidean distance on S⁺⁺_n [Cardoso and Souloumiac, 1993]
      δ²_F(C, Λ) = ‖C − Λ‖²_F,   Λ = ddiag(C)
    • log-likelihood criterion: left Kullback-Leibler divergence [Pham, 2000]
      d_lKL(C, Λ) = d_KL(C, Λ),   Λ = ddiag(C)
      where d_KL(P, S) = trace(P S⁻¹ − I_n) − log det(P S⁻¹)
    • other measures obtained from d_KL(·, ·):
      right measure: d_rKL(C, Λ) = d_KL(Λ, C),   Λ = ddiag(C⁻¹)⁻¹
      symmetrized measure: d_sKL(C, Λ) = ½ (d_KL(C, Λ) + d_KL(Λ, C)),   Λ = ddiag(C)^{1/2} ddiag(C⁻¹)^{−1/2}
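
    A small numpy sketch (an illustrative addition) of the first criteria and their closed-form targets Λ; `ddiag` keeps only the diagonal part, and d_KL is the divergence written on the slide.

```python
import numpy as np

def ddiag(M):
    """ddiag operator of the slides: keep only the diagonal part of M."""
    return np.diag(np.diag(M))

def d_kl(P, S):
    """d_KL(P, S) = trace(P S^-1 - I_n) - log det(P S^-1)."""
    PSinv = P @ np.linalg.inv(S)
    n = P.shape[0]
    return np.trace(PSinv - np.eye(n)) - np.log(np.linalg.det(PSinv))

def crit_frobenius(C):
    """Least-squares criterion with its closed-form target Lambda = ddiag(C)."""
    return np.linalg.norm(C - ddiag(C), 'fro') ** 2

def crit_left_kl(C):
    """Left KL criterion, Lambda = ddiag(C)."""
    return d_kl(C, ddiag(C))

def crit_right_kl(C):
    """Right KL criterion, Lambda = ddiag(C^-1)^-1."""
    Lam = np.linalg.inv(ddiag(np.linalg.inv(C)))
    return d_kl(Lam, C)
```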


  17. Considered divergences
    • natural Riemannian distance [Skovgaard, 1984], [Bhatia, 2009]
      δ²_R(C, Λ) = ‖log(Λ^{−1/2} C Λ^{−1/2})‖²_F,   ddiag(log(C⁻¹Λ)) = 0_n
    • log-Euclidean distance [Arsigny et al., 2007]
      δ²_LE(C, Λ) = ‖log(C) − log(Λ)‖²_F,   Λ = exp(ddiag(log(C)))
    • Bhattacharyya distance [Chebbi and Moakher, 2012], [Sra, 2013]
      δ²_B(C, Λ) = 4 log( det((C + Λ)/2) / (det(C)^{1/2} det(Λ)^{1/2}) ),   2 ddiag((C + Λ)⁻¹) = Λ⁻¹
    • Wasserstein distance [Villani, 2008], [Bhatia et al., 2017]
      δ²_W(C, Λ) = trace( ½ (C + Λ) − (Λ^{1/2} C Λ^{1/2})^{1/2} ),   ddiag((Λ^{1/2} C Λ^{1/2})^{1/2}) = Λ
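
    These distances can be evaluated directly with matrix functions; the sketch below (illustrative, scipy.linalg assumed) follows the formulas above, including the halved Wasserstein expression as written on the slide.

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def dist2_riemannian(C, Lam):
    """Natural (affine-invariant) distance: ||log(Lam^-1/2 C Lam^-1/2)||_F^2."""
    Lih = np.linalg.inv(sqrtm(Lam))
    return np.real(np.linalg.norm(logm(Lih @ C @ Lih), 'fro') ** 2)

def dist2_log_euclidean(C, Lam):
    """Log-Euclidean distance: ||log(C) - log(Lam)||_F^2 (target: exp(ddiag(log C)))."""
    return np.real(np.linalg.norm(logm(C) - logm(Lam), 'fro') ** 2)

def dist2_bhattacharyya(C, Lam):
    """Bhattacharyya / S-divergence as on the slide."""
    return 4 * np.log(np.linalg.det((C + Lam) / 2)
                      / np.sqrt(np.linalg.det(C) * np.linalg.det(Lam)))

def dist2_wasserstein(C, Lam):
    """Wasserstein expression as on the slide: trace((C + Lam)/2 - (Lam^1/2 C Lam^1/2)^1/2)."""
    Lh = sqrtm(Lam)
    return np.real(np.trace((C + Lam) / 2 - sqrtm(Lh @ C @ Lh)))
```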


  18. First experiment: simulated data
    C_k = A Λ_k Aᵀ + (1/σ) E_k Δ_k E_kᵀ
      A, E_k: random elements, standard normal distribution
        condition number of A (with respect to inversion) in [1, 10]
      Λ_k, Δ_k: diagonal – pth element drawn from a χ² distribution with expectation 1, divided by p
      σ: signal-to-noise ratio
    • performance measure: Moreau-Amari criterion [Moreau and Macchi, 1994]
      measures the degree of similarity of BA with a matrix of the form PΣ
      P: permutation matrix – Σ: non-singular diagonal matrix
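
    Below is a hedged numpy sketch of this simulation setup and of one common form of the Moreau-Amari index (using squared entries of G = BA); the exact normalization and the control of the condition number of A used in the experiments may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sigma = 5, 20, 100.0                    # matrix size, number of matrices, SNR

A = rng.standard_normal((n, n))               # mixing matrix (condition number not controlled here)
p_idx = np.arange(1, n + 1)
C = []
for _ in range(K):
    lam = rng.chisquare(1, n) / p_idx         # chi^2 with expectation 1, divided by p
    dlt = rng.chisquare(1, n) / p_idx
    E = rng.standard_normal((n, n))
    C.append(A @ np.diag(lam) @ A.T + (1.0 / sigma) * E @ np.diag(dlt) @ E.T)

def moreau_amari(B, A):
    """Moreau-Amari index of G = B A (squared-entry form); 0 when G = P Sigma."""
    G = np.abs(B @ A) ** 2
    rows = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * n * (n - 1))

# index is 0 for a perfect separator (B = A^-1) and positive otherwise
print(moreau_amari(np.linalg.inv(A), A), moreau_amari(rng.standard_normal((n, n)), A))
```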


  19. First experiment: comparisons of divergences
    [Figure: Moreau-Amari index IM-A (dB) versus signal-to-noise ratio σ ∈ [10, 1000], for the criteria F, W, rKL, B, R, sKL, KL and LE]
    medians, first and ninth deciles (error bars) estimated on 50 trials


  20. Second experiment: electroencephalographic data
    [Figure: electrode montage (Nasion–Inion) with 8 occipital/parietal channels – 1. Oz, 2. O1, 3. O2, 4. POz, 5. PO3, 6. PO4, 7. PO7, 8. PO8 – and the corresponding spectra (f in Hz) for the 13 Hz, 17 Hz, 21 Hz and rest conditions]


  21. Obtained sources with the proposed methods
    [Figure: spectra (f in Hz) of the estimated sources s1 and s2 for the 13 Hz, 17 Hz, 21 Hz and rest conditions]


  22. Comparisons of divergences
    performance measure I_SSVEP(f) ∈ [0, 1] for class f:
    ratio between the power at f and the total power (source s1)

    I_SSVEP(f) |   F  |   W  |  rKL |   B  |   R  |  sKL |  KL  |  LE
    13 Hz      | 0.95 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
    17 Hz      | 0.87 | 0.89 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91
    21 Hz      | 0.50 | 0.54 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.60


  23. Intrinsic Cramér-Rao bound for low-rank
    structured elliptical models


  24. Intrinsic Cramér-Rao bound
    • x ∼ L(θ), θ ∈ M, with log-likelihood L_x : M → R
    • given x, estimation problem:
      θ̂ = argmin_{θ∈M} −L_x(θ)
    • Cramér-Rao bound of an unbiased estimator θ̂ of θ:
      E_θ̂[Err(θ, θ̂)] ⪰ F⁻¹  ⇒  MSE = E_θ̂[err(θ, θ̂)] ≥ trace(F⁻¹)
      F: Fisher information matrix
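
    As a sanity check of the classical bound (an illustrative addition, not the model of the talk), the sketch below estimates the mean of a Gaussian N(θ, σ²I_d) from n samples: F = (n/σ²) I_d, so trace(F⁻¹) = dσ²/n, which the sample mean attains.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n, trials = 4, 2.0, 50, 2000
theta = rng.standard_normal(d)                      # true mean

# Fisher information of n i.i.d. N(theta, sigma^2 I_d) samples: F = (n / sigma^2) I_d
crb = d * sigma**2 / n                              # trace(F^{-1})

errs = []
for _ in range(trials):
    x = theta + sigma * rng.standard_normal((n, d))
    theta_hat = x.mean(axis=0)                      # unbiased estimator
    errs.append(np.sum((theta_hat - theta) ** 2))   # err = ||theta - theta_hat||_2^2

print(np.mean(errs), ">=", crb)                     # MSE matches trace(F^{-1}) here
```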


  25. Intrinsic Cramér-Rao bound
    E_θ̂[err(θ, θ̂)] ≥ trace(F⁻¹)
    • classical case: M = R^d
      err(θ, θ̂) = ‖θ − θ̂‖²₂,   F_ij = −E_x[∂²L_x(θ) / ∂θ_i ∂θ_j]
      ⇒ constraints and invariances of the model are not taken into account
    • generalization to a Riemannian manifold M [Smith, 2005, Boumal, 2013]
      • err(θ, θ̂) = δ²_M(θ, θ̂),   δ_M: distance on M associated with the metric ⟨·, ·⟩^M
      • F_ij = ⟨e_i, e_j⟩^F_θ
      • ⟨ξ, η⟩^F_θ = −E_x[D² L_x(θ)[ξ, η]], the Fisher metric
      • {e_i}: orthonormal basis of T_θM associated with ⟨·, ·⟩^M_θ
      ⇒ allows exhibiting properties that differ from the classical case


  26. Elliptical distributions / low-rank structure
    • x ∼ CES(0, R), R ∈ H⁺⁺_p, with log-likelihood
      L⁺⁺_x(R) = log(g(x^H R⁻¹ x)) − log det(R),   g : R⁺ → R⁺
      e.g. Gaussian (g(t) = exp(−t)), generalized Gaussian, Student t
    • Fisher metric of elliptical distributions on H⁺⁺_p [Breloy et al., 2018]:
      ⟨ξ, η⟩^{F,++}_R = α_F trace(R⁻¹ ξ R⁻¹ η) + β_F trace(R⁻¹ ξ) trace(R⁻¹ η)
    • low-rank structure – few data [Sun et al., 2016, Bouchard et al., 2020]:
      R = I_p + H,   H ∈ H⁺_{p,k}
      ⇒ geometry of H⁺_{p,k}: several possibilities, none ideal
      e.g. [Bonnabel and Sepulchre, 2009, Vandereycken et al., 2012, Massart and Absil, 2018]
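
    A direct numpy transcription of this Fisher metric (illustrative sketch; α_F and β_F depend on the density generator g and are taken here as free inputs – for the Gaussian case they reduce to 1 and 0):

```python
import numpy as np

def fisher_inner(R, xi, eta, alpha_f, beta_f):
    """Fisher metric of elliptical distributions on H++_p (slide formula):
    alpha_F tr(R^-1 xi R^-1 eta) + beta_F tr(R^-1 xi) tr(R^-1 eta)."""
    Rinv = np.linalg.inv(R)
    term1 = np.trace(Rinv @ xi @ Rinv @ eta)
    term2 = np.trace(Rinv @ xi) * np.trace(Rinv @ eta)
    return np.real(alpha_f * term1 + beta_f * term2)

# usage with random Hermitian tangent vectors
rng = np.random.default_rng(0)
p = 4
X = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
R = X @ X.conj().T + p * np.eye(p)          # Hermitian positive definite scatter matrix
xi = (X + X.conj().T) / 2                    # Hermitian tangent vector
eta = 1j * (X - X.conj().T) / 2              # another Hermitian tangent vector
print(fisher_inner(R, xi, eta, alpha_f=1.0, beta_f=0.0))
```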


  27. Geometry of low-rank structured covariance matrices
    • H = ϕ(U, Σ) = U Σ U^H with (U, Σ) ∈ M_{p,k} = St_{p,k} × H⁺⁺_k
    • ∀O ∈ U_k, ϕ(UO, O^H Σ O) = ϕ(U, Σ)
      ⇒ H⁺_{p,k} isomorphic to the quotient [Bonnabel and Sepulchre, 2009, Bouchard et al., 2020]
      M_{p,k}/U_k = {π(U, Σ) : (U, Σ) ∈ M_{p,k}}
      where π(U, Σ) = {(UO, O^H Σ O) : O ∈ U_k}
      [Figure: fiber through θ = (U, Σ) in M_{p,k}, i.e. the curve O ↦ (UO, O^H Σ O)]


  28. Quotient M_{p,k}: Riemannian geometry
    • θ = (U, Σ) ∈ M_{p,k}
    • T_θM_{p,k} = {(U Ω_ξ + U_⊥ K_ξ, ξ_Σ) : Ω_ξ ∈ A_k, K_ξ ∈ R^{(p−k)×k}, ξ_Σ ∈ S_k}
      U_⊥ ∈ St_{p,(p−k)} such that U^H U_⊥ = 0
    • metric:
      ⟨ξ, η⟩_θ = ½ trace(Ω_ξ^H Ω_η) + trace(K_ξ^H K_η) + α trace(Σ⁻¹ ξ_Σ Σ⁻¹ η_Σ) + β trace(Σ⁻¹ ξ_Σ) trace(Σ⁻¹ η_Σ)
    • vertical space – tangent space to the equivalence class:
      V_θ = {(U Ω, Σ Ω − Ω Σ) : Ω ∈ A_k}
    • horizontal space – orthogonal complement to V_θ according to ⟨·, ·⟩_θ:
      H_θ = {ξ ∈ T_θM_{p,k} : Ω_ξ = 2α (Σ⁻¹ ξ_Σ − ξ_Σ Σ⁻¹)}
      [Figure: fiber O ↦ (UO, O^H Σ O) through θ, with T_θM_{p,k} split into V_θ and H_θ]


  29. Quotient M_{p,k}: error measure
    • Riemannian distance not known ⇒ alternative error measure
    • alternative horizontal space – complement to V_θ, non-orthogonal [Bonnabel and Sepulchre, 2009]:
      {ξ ∈ T_θM_{p,k} : Ω_ξ = 0} = {(U_⊥ K_ξ, ξ_Σ) : K_ξ ∈ R^{(p−k)×k}, ξ_Σ ∈ S_k}
    • induces a divergence on M_{p,k}:
      d(π(θ), π(θ̂)) = ‖Θ‖²₂ + α ‖log(Σ^{−1/2} O Ô^H Σ̂ Ô O^H Σ^{−1/2})‖²₂ + β log² det(Σ⁻¹ Σ̂)
      where U^H Û = O cos(Θ) Ô^H
      ⇒ err(θ, θ̂) = d(θ, θ̂)
      [Figure: fiber O ↦ (UO, O^H Σ O), with V_θ, the alternative horizontal space and the points θ, θ̂]
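
    A numpy/scipy sketch of this error measure (illustrative; the principal angles Θ are obtained from the SVD of U^H Û, and the Frobenius norm is assumed for the middle term):

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def quotient_divergence(U, Sigma, U_hat, Sigma_hat, alpha, beta):
    """Divergence between pi(U, Sigma) and pi(U_hat, Sigma_hat) as written on the slide,
    with U^H U_hat = O cos(Theta) O_hat^H (principal angles Theta between the subspaces)."""
    O, c, Oh_H = np.linalg.svd(U.conj().T @ U_hat)   # singular values = cos(Theta)
    Theta = np.arccos(np.clip(c, -1.0, 1.0))
    O_hat = Oh_H.conj().T
    Sih = np.linalg.inv(sqrtm(Sigma))
    M = Sih @ O @ O_hat.conj().T @ Sigma_hat @ O_hat @ O.conj().T @ Sih
    t1 = np.sum(Theta ** 2)
    t2 = alpha * np.linalg.norm(logm(M), 'fro') ** 2
    t3 = beta * np.log(np.real(np.linalg.det(np.linalg.inv(Sigma) @ Sigma_hat))) ** 2
    return np.real(t1 + t2 + t3)
```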


  30. Quotient M_{p,k}: performance bound
    E_θ̂[err(θ, θ̂)] = E_θ̂[d(θ, θ̂)] ≥ trace(F⁻¹),   F_ij = ⟨e_i, e_j⟩^{F,M_{p,k}}_θ
    • {e_i}: orthonormal basis of H_θ according to ⟨·, ·⟩_θ:
      {(U_⊥ K_ij, 0), (i U_⊥ K_ij, 0)}_{1≤i≤p−k, 1≤j≤k},
      {(0, (1/√α) Σ^{1/2} H_ij Σ^{1/2} + ((√α − √(α+kβ)) / (k √α √(α+kβ))) trace(H_ij) Σ)}_{1≤j≤i≤k},
      {(0, (1/√α) Σ^{1/2} H′_ij Σ^{1/2})}_{1≤j<i≤k}
    • K_ij ∈ R^{(p−k)×k}: element ij equals 1, 0 elsewhere
    • H_ii ∈ R^{k×k}: element ii equals 1, 0 elsewhere
    • H_ij ∈ R^{k×k}, i ≠ j: elements ij and ji equal 1/√2, 0 elsewhere
    • H′_ij ∈ iR^{k×k}, i ≠ j: elements ij and ji equal i/√2 and −i/√2, 0 elsewhere
    • ⟨·, ·⟩^{F,M_{p,k}}: Fisher metric on M_{p,k}


  31. Quotient M_{p,k}: Fisher metric of elliptical distributions
    • ϕ(θ) = U Σ U^H,   D ϕ(θ)[ξ] = U ξ_Σ U^H + ξ_U Σ U^H + U Σ ξ_U^H
    • log-likelihood: L^{M_{p,k}}_x(θ) = L⁺⁺_x(I_p + ϕ(θ))
    • Fisher metric:
      ⟨ξ, η⟩^{F,M_{p,k}}_θ = ⟨D ϕ(θ)[ξ], D ϕ(θ)[η]⟩^{F,++}_{I_p + ϕ(θ)}
      where ⟨ξ_R, η_R⟩^{F,++}_R = α_F trace(R⁻¹ ξ_R R⁻¹ η_R) + β_F trace(R⁻¹ ξ_R) trace(R⁻¹ η_R)
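
    In code, this pulled-back Fisher metric amounts to applying D ϕ and reusing the ambient metric of slide 26; a hedged numpy sketch, with tangent vectors represented as pairs ξ = (ξ_U, ξ_Σ):

```python
import numpy as np

def dphi(U, Sigma, xi_U, xi_S):
    """Differential of phi(U, Sigma) = U Sigma U^H in the direction (xi_U, xi_S)."""
    return U @ xi_S @ U.conj().T + xi_U @ Sigma @ U.conj().T + U @ Sigma @ xi_U.conj().T

def fisher_inner_pp(R, a, b, alpha_f, beta_f):
    """Ambient Fisher metric on H++_p (slide 26)."""
    Rinv = np.linalg.inv(R)
    return np.real(alpha_f * np.trace(Rinv @ a @ Rinv @ b)
                   + beta_f * np.trace(Rinv @ a) * np.trace(Rinv @ b))

def fisher_inner_quotient(U, Sigma, xi, eta, alpha_f, beta_f):
    """Pulled-back Fisher metric on M_{p,k}: <D phi[xi], D phi[eta]> at R = I_p + phi(theta)."""
    p = U.shape[0]
    R = np.eye(p) + U @ Sigma @ U.conj().T
    return fisher_inner_pp(R, dphi(U, Sigma, *xi), dphi(U, Sigma, *eta), alpha_f, beta_f)
```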


  32. Numerical illustrations
    • covariance matrix: R = I_p + σ U Σ U^H
    • p = 16, k = 8
    • U ∈ St_{p,k}: random orthogonal matrix
    • Σ ∈ H⁺⁺_k: diagonal matrix,
      minimal / maximal elements: 1/√c, √c (c = 20: conditioning)
      other elements: random, uniform distribution between 1/√c and √c
      normalized trace: trace(Σ) = trace(I_k) = k
    • σ = 50: free parameter
    • data:
      • 500 sets of n ∈ [12, 300] random samples
      • t-distribution, with d = 3 (non-Gaussian) and d = 100 (almost Gaussian) degrees of freedom [Ollila et al., 2012]
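
    A hedged numpy sketch of this data-generation setup (illustrative; the exact normalization of Σ and the complex t-sampling convention may differ from the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, c, sigma, d, n = 16, 8, 20.0, 50.0, 3, 100

# Random U in St_{p,k}: orthonormal columns via QR of a complex Gaussian matrix
A = rng.standard_normal((p, k)) + 1j * rng.standard_normal((p, k))
U, _ = np.linalg.qr(A)

# Diagonal Sigma with extreme elements 1/sqrt(c), sqrt(c); the trace normalization
# below slightly perturbs these endpoints
diag = np.concatenate(([1 / np.sqrt(c), np.sqrt(c)],
                       rng.uniform(1 / np.sqrt(c), np.sqrt(c), size=k - 2)))
diag *= k / diag.sum()                            # trace(Sigma) = k
Sigma = np.diag(diag)

R = np.eye(p) + sigma * U @ Sigma @ U.conj().T    # low-rank structured covariance

# n samples from a complex multivariate t-distribution with d degrees of freedom
L = np.linalg.cholesky(R)
z = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
tau = rng.chisquare(d, size=(n, 1)) / d
x = (z / np.sqrt(tau)) @ L.conj().T               # rows are the samples x_i
```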


  33. Numerical illustrations
    [Figure: err(θ, θ̂) in dB versus the number of samples n (from n = p to 300), two panels: d = 3 and d = 100; curves: intrinsic bound trace(F⁻¹_θ), T-RGD, T-RTR and pSCM]


  34. Conclusions and perspectives


  35. Conclusions and perspectives
    • Riemannian geometry:
      • powerful tool for signal processing and machine learning
      • exploits the intrinsic structure of data and parameters
      • quite complete and allows modularity – modeling, optimization, performance analysis
    • when to use Riemannian geometry and optimization?
      • data and/or parameters possess a structure: constraints, invariances (geometrical properties of the model)
      • a metric of particular interest with respect to the model (e.g., information theory)
    • issue: for some manifolds, the geometry is complicated to fully characterize
      e.g., low-rank symmetric positive semidefinite matrices
      ⇒ go beyond Riemannian geometry to provide simple and efficient geometrical objects


  36. Absil, P.-A., Mahony, R., and Sepulchre, R. (2008).
    Optimization Algorithms on Matrix Manifolds.
    Princeton University Press, Princeton, NJ, USA.
    Arsigny, V., Fillard, P., Pennec, X., and Ayache, N. (2007).
    Geometric means in a novel vector space structure on symmetric positive-definite matrices.
    SIAM Journal on Matrix Analysis and Applications, 29(1):328–347.
    Bhatia, R. (2009).
    Positive definite matrices.
    Princeton University Press.
    Bhatia, R., Jain, T., and Lim, Y. (2017).
    On the Bures-Wasserstein distance between positive definite matrices.
    preprint.
    Bonnabel, S. and Sepulchre, R. (2009).
    Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank.
    SIAM Journal on Matrix Analysis and Applications, 31(3):1055–1070.
    Bouchard, F., Breloy, A., Ginolhac, G., Renaux, A., and Pascal, F. (2020).
    A Riemannian framework for low-rank structured elliptical models.
    IEEE Transactions on Signal Processing, submitted. Available on arxiv.
    Boumal, N. (2013).
    On intrinsic Cramér-Rao bounds for Riemannian submanifolds and quotient manifolds.
    IEEE Transactions on Signal Processing, 61(7):1809–1821.
    Breloy, A., Ginolhac, G., Renaux, A., and Bouchard, F. (2018).
    Intrinsic Cramér–Rao bounds for scatter and shape matrices estimation in CES distributions.
    IEEE Signal Processing Letters, 26(2):262–266.
    Cardoso, J.-F. and Souloumiac, A. (1993).
    Blind beamforming for non-Gaussian signals.
    IEEE Proceedings-F, 140(6):362–370.


  37. Chebbi, Z. and Moakher, M. (2012).
    Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function.
    Linear Algebra and its Applications, 436(7):1872–1889.
    Comon, P. and Jutten, C. (2010).
    Handbook of Blind Source Separation: Independent Component Analysis and Applications.
    Academic Press, 1st edition.
    Massart, E. and Absil, P.-A. (2018).
    Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices.
    Technical Report UCL-INMA-2018.06.
    Moreau, E. and Macchi, O. (1994).
    A one stage self-adaptive algorithm for source separation.
    In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., volume 3,
    pages 49–52.
    Ollila, E., Tyler, D. E., Koivunen, V., and Poor, H. V. (2012).
    Complex elliptically symmetric distributions: Survey, new results and applications.
    IEEE Transactions on Signal Processing, 60(11):5597–5625.
    Pham, D.-T. (2000).
    Joint approximate diagonalization of positive definite Hermitian matrices.
    SIAM J. Matrix Anal. Appl., 22(4):1136–1152.
    Skovgaard, L. T. (1984).
    A Riemannian geometry of the multivariate normal model.
    Scandinavian Journal of Statistics, pages 211–223.
    Smith, S. T. (2005).
    Covariance, subspace, and intrinsic Cramér-Rao bounds.
    IEEE Transactions on Signal Processing, 53(5):1610–1630.
    Sra, S. (2013).
    Positive definite matrices and the s-divergence.
    arXiv preprint arXiv:1110.1773.


  38. Sun, Y., Babu, P., and Palomar, D. P. (2016).
    Robust estimation of structured covariance matrix for heavy-tailed elliptical distributions.
    IEEE Transactions on Signal Processing, 64(14):3576–3590.
    Vandereycken, B., Absil, P.-A., and Vandewalle, S. (2012).
    A Riemannian geometry with complete geodesics for the set of positive semidefinite matrices of fixed rank.
    IMA Journal of Numerical Analysis, 33(2):481–514.
    Villani, C. (2008).
    Optimal transport: old and new, volume 338.
    Springer Science & Business Media.
