
Alexandre Hippert-Ferrer

S³ Seminar
September 24, 2021

Alexandre Hippert-Ferrer

(L2S, CentraleSupélec, Université Paris-Saclay)

https://s3-seminar.github.io/seminars/alexandre-hippert

Title — Robust low-rank covariance matrix estimation with missing values and application to classification problems

Abstract — Missing values are inherent to real-world data sets. Statistical learning problems often require the estimation of parameters such as the mean or the covariance matrix (CM). If the data is incomplete, new estimation methodologies need to be designed depending on the data distribution and the missingness pattern (i.e. the pattern describing which values are missing with respect to the observed data). This talk considers robust CM estimation when the data is incomplete. In this perspective, classical statistical estimation methodologies are usually built upon the Gaussian assumption, whereas existing robust ones assume unstructured signal models. The former can be inaccurate in real-world data sets in which heterogeneity causes heavy-tailed distributions, while the latter does not profit from the usual low-rank structure of the signal. Taking advantage of both worlds, a CM estimation procedure is designed on a robust (compound-Gaussian) low-rank model by leveraging the observed-data likelihood function within an expectation-maximization (EM) algorithm. After a validation on simulated data sets with various missingness patterns, the interest of the proposed procedure is shown for CM-based classification and clustering problems with incomplete data. The investigated examples generally show higher classification accuracies with a classifier based on robust estimation than with one based on the Gaussian assumption or on imputed data.


Transcript

  1. Robust low-rank covariance matrix estimation with
    missing values and application to classification problems
    Alexandre Hippert-Ferrer
    Laboratoire des Signaux et Systèmes – CentraleSupélec – Université Paris-Saclay
    Séminaire S³ – September 2021
    Joint work with:
    Mohammed Nabil El Korso (LEME, Université Paris Nanterre)
    Arnaud Breloy (LEME, Université Paris Nanterre)
    Guillaume Ginolhac (LISTIC, Université Savoie Mont Blanc)

  2. Introduction
    θ = {µ, Σ, . . . }
    [Figure: two incomplete data matrices with unknown entries marked “?”:
    (a) Missing completely at random; (b) Missing not at random]

  4. Table of Contents
    1 Problem formulation and data model
    2 EM algorithms for incomplete data under the SG distribution
    A brief introduction to the EM algorithm
    The unstructured case
    The structured case
    Numerical simulations
    3 Application to (non-)supervised learning
    Classification of crop fields
    Image clustering
    Classification of EEG signals

  5. Problem formulation
    Let us consider:
    • Complete data {y_i}_{i=1}^n ⊂ R^p
    • y_i = {y_i^o, y_i^m} (observed and missing parts)
    • Unknown parameter of interest θ ∈ Ω
    • A probabilistic model of the data p(y|θ)
    • Maximum likelihood (ML): θ̂_ML = arg max_{θ ∈ Ω} p(y|θ)
    How to estimate θ from incomplete samples?
    1. Use observed data only;
    2. Impute the data, then estimate θ;
    3. Use the Expectation-Maximization (EM) algorithm → handy iterative
    procedure to find θ̂_ML.
    (Options 1 and 2 are contrasted in the short sketch below.)
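
To make options 1 and 2 concrete, here is a minimal numpy sketch (illustrative only, not from the talk) contrasting observed-data-only estimation of a mean with naive mean imputation; option 3 is developed in the slides that follow.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 5))          # n = 100 complete samples in R^5
Y[rng.random(Y.shape) < 0.2] = np.nan  # ~20% of entries missing at random

# Option 1: use observed data only (available-case estimate)
mu_obs = np.nanmean(Y, axis=0)

# Option 2: impute first (column means here), then estimate
Y_imp = np.where(np.isnan(Y), mu_obs, Y)
mu_imp = Y_imp.mean(axis=0)
```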

  10. Motivations
    • Works on ML estimation of Σ with incomplete data exist, e.g.:
      Gaussian: n > p (C. Liu 1999); p > n (Lounici et al. 2014;
      Städler et al. 2014); rank(Σ) = r < p (Sportisse et al. 2020;
      Aubry et al. 2021)
      non-Gaussian: t-distribution (C. Liu and Rubin 1995; J. Liu et al. 2019);
      GEM (Frahm et al. 2010)
    • Missing data patterns: monotone, general, random.
      Most of the work covers the Gaussian / full-rank Σ case.

  12. Aim of this work
    1. Build generic algorithms that take advantage of both robust
    estimation and low-rank structure;
    2. Handle any pattern (= mechanism) of missing values (monotone,
    general, random);
    3. Apply these procedures to covariance-based
    imputation/classification/clustering problems.

  13. The scaled-Gaussian (SG) distribution
    • Model:
      y_i | τ_i ∼ N(0, τ_i Σ),  Σ ∈ S_{++}^p,  τ_i > 0 (texture parameter)
    • Complete likelihood function:
      L_c({y_i} | Σ, {τ_i}) ∝ ∏_{i=1}^n |τ_i Σ|^{-1} exp(−y_i^T (τ_i Σ)^{-1} y_i)
    • Constraints on Σ: low-rank structure
      Σ = σ^2 I_p + H,  H ⪰ 0,  rank(H) = r,  σ > 0
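
A minimal sketch of sampling from this model (assumptions: the textures τ_i are drawn from a Gamma law purely for illustration, since the compound-Gaussian family leaves their distribution unspecified; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, r, sigma2 = 15, 500, 5, 1.0

# Low-rank-plus-identity covariance: Sigma = sigma^2 I_p + H, rank(H) = r
A = rng.normal(size=(p, r))
H = A @ A.T                              # H >= 0, rank r (almost surely)
Sigma = sigma2 * np.eye(p) + H

# Textures tau_i > 0 (Gamma is an arbitrary illustrative choice)
tau = rng.gamma(shape=2.0, scale=0.5, size=n)

# y_i | tau_i ~ N(0, tau_i * Sigma)
L = np.linalg.cholesky(Sigma)
Y = np.sqrt(tau)[:, None] * (rng.normal(size=(n, p)) @ L.T)
```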

  16. Data incompleteness model
    • Data transformation:
      ỹ_i = P_i y_i = [y_i^o ; y_i^m],  i ∈ [1, n]
      where {P_i}_{i=1}^n ⊂ R^{p×p} is a set of n permutation matrices.
    • Example (n = p = 3): each P_i reorders the entries of y_i so that the
      observed values come first and the missing ones last, e.g.
      (y_11, y_12, y_13) with y_11 missing becomes ỹ_1 = (y_12, y_13, y_11).
    • Covariance matrix:
      Σ_i = [ Σ_i,oo  Σ_i,om ; Σ_i,mo  Σ_i,mm ] = P_i Σ P_i^T
      where Σ_i,oo and Σ_i,mm are the block CMs of y_i^o and y_i^m, and
      Σ_i,om = Σ_i,mo^T is their cross-covariance.
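
A sketch of this transformation, assuming each sample comes with a boolean mask of observed entries (function names are illustrative):

```python
import numpy as np

def permutation_matrix(obs_mask):
    """P_i sending observed entries first and missing entries last."""
    p = obs_mask.size
    order = np.concatenate([np.flatnonzero(obs_mask),
                            np.flatnonzero(~obs_mask)])
    P = np.zeros((p, p))
    P[np.arange(p), order] = 1.0         # (P y)[k] = y[order[k]]
    return P

def partition_cov(Sigma, obs_mask):
    """Blocks of Sigma_i = P_i Sigma P_i^T: (oo, om, mo, mm)."""
    P = permutation_matrix(obs_mask)
    S = P @ Sigma @ P.T
    no = int(obs_mask.sum())             # number of observed entries
    return S[:no, :no], S[:no, no:], S[no:, :no], S[no:, no:]
```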

  18. Table of Contents
    1 Problem formulation and data model
    2 EM algorithms for incomplete data under the SG distribution
    A brief introduction to the EM algorithm
    The unstructured case
    The structured case
    Numerical simulations
    3 Application to (non-)supervised learning
    Classification of crop fields
    Image clustering
    Classification of EEG signals

  19. Introduction to the EM algorithm
    • Iterative scheme for ML estimation in incomplete-data problems;
    • Provides a formal approach to the intuitive ad hoc idea of filling
    in missing values;
    • The EM alternates between making guesses about the complete
    data y (E-step) and finding θ that maximizes Lc(θ|y) over θ
    (M-step);
    • Under some general conditions on the complete data, L(θ̂_EM)
    converges to L(θ̂_ML).

  20. The EM algorithm
    • Initialization: at step t = 0, make an initial estimate θ^(0) (using
    prior knowledge or a sub-optimal existing algorithm).
    • E-step (guess the complete data from the current θ^(t)):
      Q(θ | θ^(t)) = ∫ L_c(θ|y) f(y^m | y^o, θ^(t)) dy^m = E_{y^m | y^o, θ^(t)}[L_c(θ|y)]
    • M-step (update θ from the “guessed” complete data): choose θ^(t+1) with
      Q(θ^(t+1) | θ^(t)) ≥ Q(θ | θ^(t)) for all θ
    Then repeat E- and M-steps until a stopping criterion, such as the distance
    ||θ^(t+1) − θ^(t)||, falls below a pre-defined threshold.
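
In code, the scheme reduces to the following generic skeleton (a sketch: `e_step` and `m_step` are problem-specific placeholders, and θ is assumed packed into a single array so the stopping distance is a plain norm):

```python
import numpy as np

def em(theta0, e_step, m_step, tol=1e-8, max_iter=500):
    """Generic EM loop: e_step builds the statistics of Q(. | theta_t)
    by conditioning on the observed data; m_step maximizes Q."""
    theta = theta0
    for _ in range(max_iter):
        stats = e_step(theta)                         # E-step
        theta_new = m_step(stats)                     # M-step
        if np.linalg.norm(theta_new - theta) < tol:   # stopping criterion
            return theta_new
        theta = theta_new
    return theta
```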

  21. Ingredients for our EM
    1. Transformed data: ỹ_i = P_i y_i = [y_i^o ; y_i^m]
    2. Transformed CM: Σ_i = P_i Σ P_i^T
    3. The (new) complete log-likelihood L_c:
       L_c(θ|Y) ∝ −n log |Σ| − p ∑_{i=1}^n log τ_i − ∑_{i=1}^n ỹ_i^T (τ_i Σ_i)^{-1} ỹ_i
    Two EM variants follow:
    EM 1: no structure on Σ (full-rank);
    EM 2: low-rank structure on Σ (r < p).

  23. EM 1 – No structure on Σ
    Parameters: θ = {Σ, τ_1, . . . , τ_n}.
    E-step: compute the expectation of L_c (a few manipulations are needed):
      Q_i(θ | θ^(t)) = E_{y_i^m | y_i^o, θ^(t)}[L_c(θ | y_i^o, y_i^m)] = τ_i^{-1} tr(B_i^(t) Σ_i^{-1})
    with
      B_i^(t) = [ y_i^o (y_i^o)^T  0 ; 0  E_{y_i^m | y_i^o, θ^(t)}[y_i^m (y_i^m)^T] ].
    M-step: obtain θ^(t+1) as the solution of the maximization problems
      max_θ Q_i(θ | θ^(t))  subject to  Σ ≻ 0,  τ_i > 0 ∀i.
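
The conditional expectation inside B_i^(t) follows from standard Gaussian conditioning: under θ^(t), y_i^m | y_i^o has mean Σ_i,mo Σ_i,oo^{-1} y_i^o (the texture cancels) and covariance τ_i (Σ_i,mm − Σ_i,mo Σ_i,oo^{-1} Σ_i,om). A sketch, assuming the blocks are already extracted in observed-first order:

```python
import numpy as np

def e_step_B(y_obs, Sigma_oo, Sigma_om, Sigma_mm, tau):
    """B_i^(t) = blkdiag(y^o y^oT, E[y^m y^mT | y^o, theta^(t)])."""
    Sigma_mo = Sigma_om.T
    W = Sigma_mo @ np.linalg.inv(Sigma_oo)
    mu_m = W @ y_obs                                 # conditional mean
    E_mm = np.outer(mu_m, mu_m) + tau * (Sigma_mm - W @ Sigma_om)
    no, nm = y_obs.size, Sigma_mm.shape[0]
    B = np.zeros((no + nm, no + nm))
    B[:no, :no] = np.outer(y_obs, y_obs)
    B[no:, no:] = E_mm
    return B
```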

  25. M-step: update parameters with closed-form expressions
    Proposition: closed-form expressions of {τ_i}_{i=1}^n and Σ exist:
      τ_i = tr(B_i^(t) Σ_i^{-1}) / p ;
      Σ = (p/n) ∑_{i=1}^n C_i^(t) / tr(C_i^(t) Σ^{-1}) = H(Σ)          (1)
    with C_i^(t) = P_i^T B_i^(t) P_i, and where Σ_{m+1} = H(Σ_m) is the
    fixed-point algorithm.
    To obtain (1), one needs to:
    1. Differentiate L_c w.r.t. τ_i and solve ∂L_c/∂τ_i = 0;
    2. Differentiate L_c w.r.t. Σ and solve ∂L_c/∂Σ = 0.
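
A sketch of the resulting fixed-point iteration for Σ (the inner loop the algorithms mark as optional):

```python
import numpy as np

def H_update(Sigma, C_list):
    """One application of H: Sigma <- (p/n) sum_i C_i / tr(C_i Sigma^{-1})."""
    p, n = Sigma.shape[0], len(C_list)
    Sigma_inv = np.linalg.inv(Sigma)
    return (p / n) * sum(C / np.trace(C @ Sigma_inv) for C in C_list)

def m_step_Sigma(Sigma0, C_list, tol=1e-10, max_iter=100):
    """Iterate Sigma_{m+1} = H(Sigma_m) until the Frobenius change is small."""
    Sigma = Sigma0
    for _ in range(max_iter):
        Sigma_new = H_update(Sigma, C_list)
        if np.linalg.norm(Sigma_new - Sigma, "fro") < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```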

  27. EM-Tyl: estimation of θ under the CG distribution with missing values
    Require: {y_i}_{i=1}^n ∼ N(0, τ_i Σ), {P_i}_{i=1}^n
    Ensure: Σ, {τ_i}_{i=1}^n
    1: Initialization: Σ^(0) = Σ_Tyl-obs ; τ^(0) = 1_n
    2: repeat (EM loop, t varies)
    3:   Compute B_i^(t) = blkdiag(y_i^o (y_i^o)^T, E_{y_i^m | y_i^o, θ^(t)}[y_i^m (y_i^m)^T])
    4:   Compute C_i^(t) = P_i^T B_i^(t) P_i
    5:   repeat (fixed point, m varies; optional loop)
    6:     Σ_{m+1}^(t) = H(Σ_m^(t))
    7:   until ||Σ_{m+1}^(t) − Σ_m^(t)||_F^2 converges
    8:   Compute τ_i^(t), i = 1, . . . , n
    9:   t ← t + 1
    10: until ||θ^(t+1) − θ^(t)||_F^2 converges

  28. EM 2 – Low-rank structure on Σ
    Parameters: θ = {H, σ^2, τ_1, . . . , τ_n}.
    E-step: compute the expectation of L_c (same as EM 1):
      Q_i(θ | θ^(t)) = E_{y_i^m | y_i^o, θ^(t)}[L_c(θ | y_i^o, y_i^m)] = τ_i^{-1} tr(B_i^(t) Σ_i^{-1})
    with B_i^(t) as in EM 1.
    M-step: obtain θ^(t+1) as the solution of the maximization problems
      max_θ Q_i(θ | θ^(t))  subject to  Σ = σ^2 I_p + H,  rank(H) = r,
      σ > 0,  τ_i > 0 ∀i.

  30. M-step of EM 2
    • The form of the solution is given in (Kang et al. 2014; Sun et al. 2016).
    • Once Σ is updated, eigendecompose it to obtain eigenvectors
    (u_1, . . . , u_p) and eigenvalues (λ_1, . . . , λ_p).
    • Then reconstruct Σ using this set of operations:
      Σ = σ^2 I_p + ∑_{i=1}^r λ̂_i u_i u_i^T = σ^2 I_p + H
      σ^2 = (1/(p − r)) ∑_{i=r+1}^p λ_i
      λ̂_i = λ_i − σ^2,  i = 1, . . . , r
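
A sketch of this reconstruction step (note that numpy's eigh returns eigenvalues in ascending order, hence the flip to get the r leading pairs):

```python
import numpy as np

def lowrank_project(Sigma, r):
    """Rebuild Sigma = sigma^2 I_p + H with rank(H) = r from the EVD
    of the unstructured update, following the operations above."""
    lam, U = np.linalg.eigh(Sigma)
    lam, U = lam[::-1], U[:, ::-1]       # sort eigenvalues descending
    p = lam.size
    sigma2 = lam[r:].mean()              # sigma^2: mean of the p - r tail
    lam_hat = lam[:r] - sigma2           # shrunk leading eigenvalues
    H = (U[:, :r] * lam_hat) @ U[:, :r].T
    return sigma2 * np.eye(p) + H
```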

  31. EM-Tyl-r: low-rank estimation of θ under the CG distribution with
    missing values
    Require: {y_i}_{i=1}^n ∼ N(0, τ_i Σ), {P_i}_{i=1}^n, rank r < p
    Ensure: Σ, {τ_i}_{i=1}^n
    1: Initialize Σ^(0), τ^(0).
    2: repeat (EM loop, t varies)
    3:   Compute B_i^(t) and C_i^(t) as in the previous algorithm
    4:   repeat (fixed point, m varies; optional loop)
    5:     Σ_{m+1}^(t) = H(Σ_m^(t))
    6:     Eigendecompose Σ_{m+1}^(t) = ∑_{i=1}^p λ_i u_i u_i^T
    7:     Compute σ^2, λ̂_i, H and reconstruct Σ_{m+1}^(t)
    8:   until ||Σ_{m+1}^(t) − Σ_m^(t)||_F^2 converges
    9:   Compute τ_i^(t), i = 1, . . . , n
    10:  t ← t + 1
    11: until ||θ^(t+1) − θ^(t)||_F^2 converges

  32. Experiments setup
    • CM model (p = 15, 0 < ρ < 1, σ > 0):
      R_ij = ρ^{|i−j|},  R = U Λ U^T (EVD),  Σ = σ^2 I_p + U U^T
    • Geodesic distance:
      δ^2_{S_{++}^p}(Σ, Σ̂) = || log(Σ^{−1/2} Σ̂ Σ^{−1/2}) ||_2^2
    Comparison with:
    • Tyler’s M-estimator (Tyler 1987) on observed, full and imputed data:
      Σ_Tyler = (p/n) ∑_{i=1}^n y_i y_i^T / (y_i^T Σ_Tyler^{-1} y_i)
    • The Sample Covariance Matrix (SCM) on observed and full data:
      Σ_SCM = (1/n) ∑_{i=1}^n y_i y_i^T
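
A sketch of these building blocks under one plausible reading of the slide (U restricted to the r leading eigenvectors of R so that Σ keeps the low-rank-plus-identity structure; the values of ρ, σ² and r are illustrative; Tyler's fixed point is written without the usual trace normalization):

```python
import numpy as np
from scipy.linalg import toeplitz

p, rho, sigma2, r = 15, 0.9, 1.0, 5

# CM model: R_ij = rho^{|i-j|}; R = U Lam U^T; Sigma = sigma^2 I_p + U U^T
R = toeplitz(rho ** np.arange(p))
_, U = np.linalg.eigh(R)                         # ascending eigenvalues
Sigma_true = sigma2 * np.eye(p) + U[:, -r:] @ U[:, -r:].T

def geodesic_dist2(A, B):
    """delta^2(A, B) = ||log(A^{-1/2} B A^{-1/2})||^2 on S_++^p."""
    lam_a, Ua = np.linalg.eigh(A)
    A_isqrt = (Ua / np.sqrt(lam_a)) @ Ua.T       # A^{-1/2}
    lam = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return np.sum(np.log(lam) ** 2)

def tyler(Y, n_iter=100):
    """Tyler's M-estimator (Tyler 1987) on complete data Y (n x p)."""
    n, p = Y.shape
    S = np.eye(p)
    for _ in range(n_iter):
        q = np.einsum("ij,jk,ik->i", Y, np.linalg.inv(S), Y)  # y^T S^-1 y
        S = (p / n) * (Y / q[:, None]).T @ Y
    return S
```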

  36. Results
    [Figure: δ^2_{S_{++}^p}(Σ, Σ̂) (dB) versus n (10^2 to 10^3), under
    monotone and general missingness patterns, for r = p and r = 5.
    Compared methods: EM-Tyl, EM-SCM, Tyl-clair, Tyl-obs, SCM-clair,
    SCM-obs, RMI, Mean-Tyl.]

  37. Results
    [Figure: δ^2_{S_{++}^p}(Σ, Σ̂) (dB) versus n (10^2 to 10^3), under
    monotone and general missingness patterns, for r = p and r = 5, now
    including the low-rank variants: EM-Tyl, EM-SCM, EM-Tyl-r, EM-SCM-r.]

  38. Table of Contents
    1 Problem formulation and data model
    2 EM algorithms for incomplete data under the SG distribution
    A brief introduction to the EM algorithm
    The unstructured case
    The structured case
    Numerical simulations
    3 Application to (non-)supervised learning
    Classification of crop fields
    Image clustering
    Classification of EEG signals

  39. Application 1: covariance-based classification of crop fields
    Breizhcrops dataset (Rußwurm et al. 2020) – {{y_k^t}_{k=1}^K}_{t=1}^T ⊂ R^p:
    time series of reflectances at field parcels k ∈ [1, K] and timestamps
    t ∈ [1, T] over p spectral bands.
    Problem – classify incomplete y_k^t using a minimum distance to
    Riemannian mean (MDRM) classifier (Barachant et al. 2012):
    • each parcel is encoded by a p × p SPD matrix, {Σ_1, . . . , Σ_K};
    • standard train/test split.
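
A minimal MDRM sketch under the affine-invariant metric (not the talk's implementation; the Riemannian mean is computed with a plain Karcher-mean iteration):

```python
import numpy as np

def _eig_fun(A, f):
    """Apply f to the eigenvalues of a symmetric matrix A."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.T

def riemannian_mean(covs, n_iter=20):
    """Karcher mean of SPD matrices (affine-invariant metric)."""
    G = np.mean(covs, axis=0)
    for _ in range(n_iter):
        G_h = _eig_fun(G, np.sqrt)
        G_ih = _eig_fun(G, lambda x: 1.0 / np.sqrt(x))
        T = np.mean([_eig_fun(G_ih @ C @ G_ih, np.log) for C in covs], axis=0)
        G = G_h @ _eig_fun(T, np.exp) @ G_h
    return G

def mdrm_predict(cov, class_means):
    """Assign cov to the class whose Riemannian mean is closest."""
    d2 = []
    for M in class_means:
        M_ih = _eig_fun(M, lambda x: 1.0 / np.sqrt(x))
        d2.append(np.sum(np.log(np.linalg.eigvalsh(M_ih @ cov @ M_ih)) ** 2))
    return int(np.argmin(d2))
```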

  42. Classification results
    [Figure: Overall Accuracy (%) versus number of missing bands (None to 6;
    one band ∼ 7% of the data), for EM-SCM, EM-Tyl and RSI.]
    • EM-Tyl-based classification handles incompleteness better than the
    EM-SCM-based one;
    • EM-SCM ∼ EM-Tyl for higher missing data ratios.

  43. Application 2: covariance-based image segmentation
    Indian Pines dataset (Baumgardner et al. 2015) – hyperspectral image with
    p = 200 bands and 16 classes partitioning the image.
    [Figure: (a) Ground truth; (b) Simulated sensor failure]

  44. Clustering set-up
    • The mean is subtracted from each image.
    • {Σ_1, . . . , Σ_M} are estimated in a w × w sliding window in each
    image.
    • Clustering task: K-means++ algorithm (Vassilvitskii et al. 2006);
    a seeding sketch follows this slide.
      • Assign each Σ_i to the cluster whose center is the closest
      according to a geodesic distance;
      • Update each class center using a Riemannian gradient descent
      (Collas et al. 2021).
    • Low-rank model with r = 5 (95% of the total cumulative variance).
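
For the seeding step, a generic k-means++ sketch that operates on SPD matrices once a squared geodesic distance is supplied (names are illustrative):

```python
import numpy as np

def kmeans_pp_seed(items, k, dist2, rng):
    """k-means++ seeding (Vassilvitskii et al. 2006): pick each new center
    with probability proportional to the squared distance to the nearest
    center already chosen; dist2 can be the geodesic distance on S_++^p."""
    centers = [items[rng.integers(len(items))]]
    for _ in range(k - 1):
        d2 = np.array([min(dist2(x, c) for c in centers) for x in items])
        centers.append(items[rng.choice(len(items), p=d2 / d2.sum())])
    return centers
```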

  45. Clustering results
    [Figure: Overall Accuracy (%) versus number of incomplete bands (None to
    5), for EM-SCM, EM-SCM-r, EM-Tyl, EM-Tyl-r, RSI and RSI-r.]
    • EM-Tyl-r gives better OA for low missing data ratios.
    • EM-SCM-r gives better OA for high missing data ratios.
    • Low number of runs due to the high runtime.

  46. Application 3: classification of EEG signals [ongoing]
    Joint work with: Florent Bouchard (L2S, Université Paris-Saclay),
    Frédéric Pascal (L2S, Université Paris-Saclay), Ammar Mian (LISTIC,
    Université Savoie Mont Blanc).
    Dataset – K EEG trials over T timestamps and p electrodes.
    • Binary classification task: each trial belongs to the target (T) or
    non-target (T̄) class;
    • MDRM classifier with covariances {Σ_1, . . . , Σ_K};
    • The data is assumed to be Gaussian for now;
    • Some electrodes are set as missing (sensor failure).

  48. Preliminary results
    • Results on 10 subjects;
    • Taking incompleteness into account is better than using only the
    observed values;
    • Results are similar to those of a KNN imputer (a usage sketch follows).
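
For reference, the KNN-imputation baseline can be reproduced with scikit-learn's KNNImputer (a sketch on toy data; missing entries are encoded as np.nan):

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [2.0, 3.0, 5.0],
              [4.0, 5.0, 7.0]])
X_completed = KNNImputer(n_neighbors=2).fit_transform(X)
# Covariance estimation / classification then proceeds on X_completed.
```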

  49. A word to conclude
    To summarize:
    • An EM-based procedure to perform robust low-rank estimation of the
    covariance matrix;
    • It handles missing data with a general pattern;
    • Compared to the Gaussian assumption / unstructured model:
    improvements in terms of CM estimation, supervised classification
    and unsupervised clustering.
    Some perspectives include:
    • Extension to other classes of SG distributions;
    • Considering the joint distribution of the data and the missing data
    mechanism: the E-step will change drastically. This has been done for
    Gaussian and MNAR data (Sportisse et al. 2020);
    • Classification with temporal gaps rather than spectral gaps;
    • EEG signals: take the SG distribution into account and think about a
    variable selection strategy.

  51. Thanks!

  52. References I
    Augusto Aubry et al. “Structured Covariance Matrix Estimation with
    Missing-Data for Radar Applications via Expectation-Maximization”. In:
    arXiv preprint arXiv:2105.03738 (2021).
    Alexandre Barachant et al. “Multiclass Brain–Computer Interface
    Classification by Riemannian Geometry”. In: IEEE Transactions on
    Biomedical Engineering 59.4 (2012), pp. 920–928. doi:
    10.1109/TBME.2011.2172210.
    Marion F Baumgardner, Larry L Biehl, and David A Landgrebe. “220
    Band AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian Pine Test
    Site 3”. In: Purdue University Research Repository 10 (2015).
    Gabriel Frahm and Uwe Jaekel. “A generalization of Tyler’s M-estimators
    to the case of incomplete data”. In: Computational Statistics & Data
    Analysis 54.2 (2010), pp. 374–393.
    Bosung Kang, Vishal Monga, and Muralidhar Rangaswamy.
    “Rank-constrained maximum likelihood estimation of structured
    covariance matrices”. In: IEEE Transactions on Aerospace and Electronic
    Systems 50.1 (2014), pp. 501–515. doi: 10.1109/TAES.2013.120389.

  53. References II
    Chuanhai Liu. “Efficient ML estimation of the multivariate normal
    distribution from incomplete data”. In: Journal of Multivariate Analysis 69
    (1999), pp. 206–217. doi: 10.1006/jmva.1998.1793.
    Karim Lounici et al. “High-dimensional covariance matrix estimation with
    missing observations”. In: Bernoulli 20.3 (2014), pp. 1029–1058.
    Junyan Liu and Daniel P. Palomar. “Regularized robust estimation of
    mean and covariance matrix for incomplete data”. In: Signal Processing
    165 (2019), pp. 278–291. issn: 0165-1684. doi:
    10.1016/j.sigpro.2019.07.009.
    Chuanhai Liu and Donald B Rubin. “ML estimation of the t distribution
    using EM and its extensions, ECM and ECME”. In: Statistica Sinica
    (1995), pp. 19–39.
    Marc Rußwurm et al. “BreizhCrops: A Time Series Dataset for Crop Type
    Mapping”. In: International Archives of the Photogrammetry, Remote
    Sensing and Spatial Information Sciences (2020).

  54. References III
    Aude Sportisse, Claire Boyer, and Julie Josse. “Imputation and low-rank
    estimation with Missing Not At Random data”. In: (2020). arXiv:
    1812.11409 [stat.ML].
    Y. Sun, P. Babu, and D. P. Palomar. “Robust Estimation of Structured
    Covariance Matrix for Heavy-Tailed Elliptical Distributions”. In: IEEE
    Transactions on Signal Processing 64.14 (2016), pp. 3576–3590. doi:
    10.1109/TSP.2016.2546222.
    Nicolas Städler, Daniel J Stekhoven, and Peter Bühlmann. “Pattern
    alternating maximization algorithm for missing data in high-dimensional
    problems”. In: J. Mach. Learn. Res. 15.1 (2014), pp. 1903–1928.
    David E. Tyler. “A Distribution-Free M-Estimator of Multivariate
    Scatter”. In: The Annals of Statistics 15.1 (1987), pp. 234–251.
    Sergei Vassilvitskii and David Arthur. “k-means++: The advantages of
    careful seeding”. In: Proceedings of the eighteenth annual ACM-SIAM
    symposium on Discrete algorithms. 2006, pp. 1027–1035.

  55. Convergence of the EM
    [Figure: NMSE_{EM,Σ} and NMSE_{EM,τ_i} on log-log axes.
    Caption: ||θ^(t+1) − θ^(t)|| versus the number of iterations.]