
Christophe Kervazo (LTCI, Télécom Paris, Institut Polytechnique de Paris)

S³ Seminar, January 27, 2023

Title — Automatic methods for sparse blind source separation

Abstract — Over the last decades, sparse Blind Source Separation (BSS) has become a well-established tool for a wide range of applications. Classical optimization-based sparse BSS methods, such as the Proximal Alternating Linearized Minimization (PALM) algorithm, nevertheless often rely on a cumbersome handcrafted hyper-parameter choice, undermining their practical results and making them difficult to use. In this presentation, we will therefore explore several strategies to bypass this pitfall. We will start by exploring some statistics-based automatic hyper-parameter choice rules, and we will eventually discuss data-driven methods leveraging algorithm unrolling/unfolding. We will furthermore consider an extension of sparse BSS to continuous target extraction in Synthetic Aperture Radar (SAR) images. Overall, we will show that our findings can contribute to a wide range of imaging applications: astrophysics, remote sensing and biomedical imaging, to name only a few.

[1] Kervazo, C., Bobin, J., Chenot, C., & Sureau, F. (2020). Use of PALM for ℓ1 sparse matrix factorization: Difficulty and rationalization of a two-step approach. Digital Signal Processing, 97, 102611.

[2] Fahes, M., Kervazo, C., Bobin, J., & Tupin, F. (2021, September). Unrolling PALM for sparse semi-blind source separation. In International Conference on Learning Representations.

Biography — Christophe Kervazo received the Supélec engineering degree in 2015 and a Master of Science in Electrical and Computer Engineering from the Georgia Institute of Technology (USA) in 2016. From 2016 to 2019, he was a PhD student in the CosmoStat group at CEA Saclay, where he worked on the optimization framework for sparse blind source separation, as well as on non-linear component separation. He then spent one year in Mons (Belgium) as a post-doctoral researcher, working on the extension of Nonnegative Matrix Factorization to Linear-Quadratic mixture unmixing, with mathematical guarantees. He is currently an Assistant Professor at Télécom Paris, in the IMAGES group, where he now mainly works on deep-learning tools for inverse problems, with a focus on algorithm unrolling/unfolding. The main current application of his work is remote sensing imaging, but he also works on biomedical data.


Transcript

  1. Sparse blind source
    separation: from handcrafted
    to automatic methods
    Christophe Kervazo
    Joint work with Elsa Angelini, Jérôme Bobin, Cécile Chenot, Abdelkhalak Chetoui, Jérémy Cohen, Erwan Dereure,
    Mohammad Fahes, Rassim Hadjeres, Florent Sureau, Florence Tupin, and others…


  2. Blind Source Separation (BSS): examples of applications
    Goal: unmix some signals
    Application examples (figure panels): the cocktail party problem*, spectroscopy**, show-through removal, hyperspectral unmixing in remote sensing***, astrophysics****
    Figures courtesy of: * https://cacm.acm.org/news/190656-3d-printed-device-helps-computers-solve-cocktail-party-problem/fulltext; ** Rapin, J. (2014). Décompositions parcimonieuses pour l'analyse avancée de données en spectrométrie pour la Santé; *** Chenot, C. (2017). Parcimonie, diversité morphologique et séparation robuste de sources; **** F. Acero

  3. BSS: linear model [Comon10]
    Goal of BSS: estimate A* and S* from X (up to limited indeterminacies), where

    X = A*S* + N

    - X: observations (m × t), with m rows of observations and t sample columns
    - A*: mixing matrix (m × n)
    - S*: sources (n × t); the sources will be assumed to be sparse
    - N: noise and model imperfections (m × t)
    [Figure: a two-source example, X = A*₁S*₁ + A*₂S*₂ + N, each source plotted in the (x, y) plane]
    (A small numerical sketch of this model is given below.)

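A minimal NumPy sketch of the linear mixing model above. The sizes, sparsity level and distributions are illustrative assumptions, not the ones used in the talk.

```python
# Sketch of the linear mixing model X = A* S* + N (illustrative sizes/distributions).
import numpy as np

rng = np.random.default_rng(0)
m, n, t = 12, 3, 4096                     # observations, sources, samples

S_true = rng.laplace(size=(n, t)) * (rng.random((n, t)) < 0.05)  # sparse sources
A_true = rng.random((m, n))
A_true /= np.linalg.norm(A_true, axis=0)  # unit l2-norm columns (oblique constraint)
N = 0.01 * rng.standard_normal((m, t))    # Gaussian noise / model imperfections

X = A_true @ S_true + N                   # (m x t) observed mixtures
```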

  4. Sparse BSS [Zibulevsky01]
    X = A*S* + N is an ill-posed problem [1]
    ⇒ introduce a prior on S:
    - ICA [Comon10]: Independent Component Analysis
    - NMF [Gillis12]: Non-negative Matrix Factorization
    - SMF [Zibulevsky01]: Sparse Matrix Factorization
    Sparsity: most of the coefficients of the sources are close to 0, either in the direct domain or in a transformed domain Φ_S (e.g. the wavelet domain), i.e. S*Φ_Sᵀ is sparse.
    [Figure: the source S*₁ in the direct domain and its sparse coefficients in the wavelet domain]

  5. BSS as an optimization problem
    Written as an optimization problem [Zibulevsky01, Bobin07]:

    argmin_{A ∈ ℝ^{m×n}, S ∈ ℝ^{n×t}}  (1/2)‖X − AS‖²_F + ‖R_S ⊙ (SΦ_Sᵀ)‖₁ + ι_{∀i∈[1,n]: ‖Aⁱ‖_{ℓ₂}=1}(A)

    with, from left to right, a data-fidelity term, a sparsity term and an oblique constraint, where:
    - R_S: regularization parameters (size n × t)
    - Φ_S: a sparsifying transform
    How to solve sparse BSS in practice? Challenges:
    - Non-smooth (needs advanced optimization tools: proximal operators [Parikh14])
    - Non-convex (non-unique minima)
    - Difficult hyperparameter choice
    ⇒ a difficult optimization problem, requiring delicate tuning from the user
    ⇒ in the following, we will see how to make these choices more automatic

  6. Outline
    I - Optimization strategy for sparse BSS: towards automatic parameter choices
    II - Extension to single-observation complex component separation: continuous target extraction in SAR images
    III - New data-driven methods for sparse BSS through algorithm unrolling

  7. Optimization frameworks
    Three frameworks to minimize

    argmin_{A ∈ ℝ^{m×n}, S ∈ ℝ^{n×t}}  (1/2)‖X − AS‖²_F + ‖R_S ⊙ (SΦ_Sᵀ)‖₁ + ι_{∀i∈[1,n]: ‖Aⁱ‖_{ℓ₂}=1}(A)

    BCD [Tseng01] (gradient based):
      while not converged over A and S do:
        while not converged over S do: S ← 𝒮_{ηR_S}(S − η Aᵀ(AS − X))
        while not converged over A do: A ← Π_{‖·‖₂=1}(A − ξ (AS − X)Sᵀ)

    pALS (least-square based):
      while not stabilized over A and S do:
        S ← 𝒮_{R_S}(A†X)
        A ← Π_{‖·‖₂=1}(XS†)

    PALM [Bolte14] (gradient based):
      while not converged over A and S do:
        S ← 𝒮_{ηR_S}(S − η Aᵀ(AS − X))
        A ← Π_{‖·‖₂=1}(A − ξ (AS − X)Sᵀ)

    with:
    - 𝒮_{ηR_S}(·) the proximal operator of the ℓ1-norm (an element-wise soft-thresholding with parameters ηR_S)
    - Π_{‖·‖₂=1}(·) the projection of the columns of A on the unit ℓ2 sphere
    - η, ξ some gradient step-sizes
    While PALM is in theory the best algorithm, it is often reported to yield poor practical results in BSS. Why? (One PALM iteration is sketched in code below.)

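For concreteness, a minimal NumPy sketch of one PALM iteration for the cost above, assuming Φ_S = Id; the step sizes are taken as the usual inverse Lipschitz constants, and the helper names are mine.

```python
# Sketch of one PALM iteration for sparse BSS (assumes Phi_S = Id).
import numpy as np

def soft_threshold(Z, T):
    """Element-wise soft-thresholding: prox of the weighted l1-norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - T, 0.0)

def palm_iteration(A, S, X, R_S):
    # Proximal gradient step on S, with step size eta = 1 / ||A^T A||_2
    eta = 1.0 / np.linalg.norm(A.T @ A, 2)
    S = soft_threshold(S - eta * (A.T @ (A @ S - X)), eta * R_S)
    # Projected gradient step on A, with step size xi = 1 / ||S S^T||_2
    xi = 1.0 / np.linalg.norm(S @ S.T, 2)
    A = A - xi * ((A @ S - X) @ S.T)
    A /= np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1e-12)  # unit columns
    return A, S
```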

  8. First study: why is PALM difficult to use in sparse BSS?
    Empirical study: sensitivity to the regularization parameters
    - Simulated n = 2 sources (dynamic range = 0.6)
    - Compute C_A = −10 log₁₀(mean(|PÂ†A* − Id|)), with P a matrix correcting the permutation/scale indeterminacies [Bobin15]
    Let us first assume one parameter per source:

    R_S = [λ₁ λ₁ … λ₁; λ₂ λ₂ … λ₂; …; λ_n λ_n … λ_n]

    [Figure: median of C_A as a function of (λ₁, λ₂), with a zoom on the maximum]
    ⇒ difficult choice of R_S:
    - a 3×10⁻⁴ change in (λ₁, λ₂) ⇒ 30 dB drop
    - on the diagonal, a 2×10⁻³ change ⇒ 7 dB drop
    (for reference, GMCA: 36 dB)

  9. Empirical study conclusions: sensitivity of PALM + grid search
    - Low efficiency: for a given X, it is hard to choose R_S
    - Low versatility: a choice of R_S does not generalize to other X
    - Low reliability: sensitive to the initial point
    ⇒ this practical aspect is not well discussed in the literature (grid search, use of the true unknown factorization…)
    ⇒ intractable for real data
    Kervazo, C., Bobin, J., Chenot, C., & Sureau, F. (2020). Use of PALM for ℓ1 sparse matrix factorization: Difficulty and rationalization of a two-step approach. Digital Signal Processing, 97, 102611.

  10. Comparison between sparse BSS algorithms (for the cost function of slide 5)

    |                            | BCD¹      | pALS² / GMCA³ | PALM⁴     |
    |----------------------------|-----------|---------------|-----------|
    | Reference                  | [Tseng01] | [Bobin07]     | [Bolte14] |
    | Fast                       | ✕         | ✓✓            | ✓         |
    | Convergence                | ✓         | ✕             | ✓         |
    | Finds a min.?              | ✓         | ✕             | ✓         |
    | Robust to spurious min.    | ✕         | ✓             | ✕         |
    | Automatic parameter choice | ✕         | ✓             | ✕         |

    In short, pALS/GMCA is easy to use, while BCD and PALM are mathematically sound.
    How to obtain an algorithm that is both easy to use and mathematically sound? ⇒ work on PALM
    ¹ Block Coordinate Descent, ² projected Alternating Least-Squares, ³ Generalized Morphological Component Analysis, ⁴ Proximal Alternating Linearized Minimization

  11. More about the GMCA automatic hyper-parameter choice
    - In GMCA, R_S has one parameter λ_i per source (R_S = [λ₁ … λ₁; …; λ_n … λ_n]), set from the MAD of the current source estimate (a new value at each iteration):

    λ_i = κ MAD([Â†X]_i),   with MAD(s) ≃ 1.48 median(|s − median(s)|)

    - The MAD is a robust estimator of the standard deviation, insensitive to sparse perturbations: MAD(S*) ≃ 0
    - Such a choice is motivated by a fixed-point argument: assume that, at a given iteration, GMCA has perfectly estimated A*; the source update is then given by:

    S ← 𝒮_{R_S}(A*†X) = 𝒮_{R_S}(A*†A*S* + A*†N) = 𝒮_{R_S}(S* + A*†N)

    - Therefore, to make S ← S*, the soft-thresholding should remove the (small) Gaussian noise term A*†N
    ⇒ by thresholding with λ_i = 3 × std([A*†N]_i), 99% of the noise is removed
    ⇒ here, std([A*†N]_i) is estimated using MAD([Â†X]_i), which is insensitive to the sparse term S* (a sketch of this rule is given in code below)

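A sketch of this MAD-based rule in NumPy; κ = 3 follows the slide, and the function names are mine.

```python
# Sketch of the GMCA threshold rule: lambda_i = kappa * MAD([A^† X]_i).
import numpy as np

def mad_std(s):
    """Robust standard-deviation estimate, insensitive to sparse outliers."""
    return 1.48 * np.median(np.abs(s - np.median(s)))

def gmca_thresholds(A_hat, X, kappa=3.0):
    S_ls = np.linalg.pinv(A_hat) @ X   # least-squares source estimate A^† X
    return kappa * np.array([mad_std(row) for row in S_ls])  # one lambda per source
```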

  12. Choosing hyper-parameters in PALM?
    If we assume errors on the sources, Ŝ = S* + εs (in particular, εs is non-zero at initialization), implementing the same heuristic as in GMCA gives:

    λ_i ≃ κ × MAD([A*ᵀN − A*ᵀA*εs]_i)   (PALM)   vs.   λ_i ≃ κ × MAD([A*ᵀN]_i)   (GMCA)

    - If the true A* and S* are found, the interpretation at convergence is similar to GMCA's
    - But the error adds dense interferences A*ᵀA*εs (not present in GMCA)
    ⇒ unadapted threshold choice with the MAD, due to the interferences
    Need to limit the interferences:
    - Good initialization (εs(0) small)
    - Reweighted ℓ1 (adapt the thresholds to the sources)

  13. Algorithm: embed PALM within a 2-step approach
    Step 1 - GMCA, bringing:
    - robustness to the initialization
    - an automatic parameter choice (computed on the good initialization)
    - reweighting from the GMCA estimate S_G: R_S = Λ_{S_G}
    Step 2 - PALM, bringing its theoretical background

  14. Experiment on realistic astrophysics data
    Experiments on simulated Chandra data:
    - n = 3 sources
    - t = 128 × 128 pixels
    - m = 12 observations
    [Figure: true sources S* and estimates Ŝ]
    The 2-step approach is fast, convergent, reliable, finds a minimum, and chooses R_S automatically.

    C_A (in dB) as a function of the SNR:

    | SNR (dB)            | 10   | 15   | 20   | 30   | 60   |
    |---------------------|------|------|------|------|------|
    | 2-step              | 15.0 | 16.3 | 17.4 | 19.7 | 20.9 |
    | PALM                | 11.9 | 13.3 | 13.5 | 14.2 | 14.5 |
    | GMCA                | 13.2 | 14.8 | 15.1 | 17.1 | 18.6 |
    | EFICA [Koldovsky06] | 8.8  | 10.3 | 14.0 | 18.9 | 19.4 |
    | RNA [Zibulevsky03]  | 9.8  | 12.6 | 15.6 | 18.3 | 18.4 |

  15. Recent use: bioluminescence
    - Application in biomedical imaging (real dataset)
    - Modality: optical imaging based on counting the photons emitted by biological tissues through a bioluminescence reaction
    - Goal: separate multiple tumors in mice, based on their temporal bioluminescence activity
    Dereure, E., Kervazo, C., Seguin, J., Garofalakis, A., Mignet, N., Angelini, E., & Olivo-Marin, J.-C. Sparse non-negative matrix factorization for preclinical bioluminescent imaging. Accepted to the ISBI 2023 conference.

  16. Recent use: bioluminescence
    - Non-negative extension of the above work: PALM is used to minimize

    argmin_{A,S}  (1/2)‖X − AS‖²_F + ‖R ⊙ S‖₁ + ι_{·≥0}(A) + ι_{·≥0}(S) + ι_{∀j∈[1,n]: ‖Aʲ‖_{ℓ₂}≤1}(A)

    - A good initialization and some reweighting are used; the thresholds are based on the MAD (the corresponding non-negative prox is sketched in code below)
    - In addition, a gain is used for each source to handle their imbalance
    [Figure: comparison of separations: no sparsity; fixed gain, no reweighting; fixed gain, reweighting; adaptive gain]


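For reference, the proximal operator combining the ℓ1 penalty and the non-negativity constraint on S has a simple closed form; a one-line NumPy sketch (my naming), valid for non-negative thresholds T:

```python
# Prox of T*||.||_1 + nonnegativity constraint: soft-threshold, then clip at 0.
import numpy as np

def nonneg_soft_threshold(Z, T):
    return np.maximum(Z - T, 0.0)
```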

  17. Outline
    I - Optimization strategy for sparse BSS: towards automatic parameter choices
    II - Extension to single-observation complex component separation: continuous target extraction in SAR images
    III - New data-driven methods for sparse BSS through algorithm unrolling

  18. SAR imaging & objective
    - SAR imaging is a remote sensing modality, yielding complex images x ∈ ℂ^{n×n}
    - SAR images are the superposition of:
      - a background n ∈ ℂ^{n×n}
      - some targets t ∈ ℂ^{n×n}
    Here, we will work on the complex images: x = n + t ∈ ℂ^{n×n}
    Goal: retrieve both n and t from x, which can be seen as a complex single-channel source separation problem
    [Figure: amplitude images |x|, |n| and |t|]

  19. Additional information about the problem at hand
    - The targets are assumed to be Diracs in the scene, which are then convolved by the PSF of the SAR imaging system, a cardinal sine
    - If the targets were lying on the image sampling grid, they would also be Diracs in the image
    - However, they have continuous positions in the scene ⇒ they resemble cardinal sines, making their separation harder (we cannot just extract the local maxima of the image)
    Objective - for each target, we want to estimate:
    - its amplitude
    - the localization of its maximum, with subpixelic accuracy

  20. Sparsity as a separation principle
    As finding n and t from x = n + t is ill-posed, we resort to additional assumptions:
    - ℜ{n} and ℑ{n} (background) follow Gaussian laws (of unknown variance)
    - Only a few targets are present in the image
    First (naive) approach:
    - For the sake of clarity, we will consider in the following vectors x ∈ ℝⁿ instead of images
    - The PSF knowledge could be exploited by creating a dictionary Φ containing all the cardinal sines shifted to all the sampling points of the grid
    - For subpixelic precision, we can oversample the dictionary by a factor ρ, so that Φ ∈ ℝ^{n×ρn}
    - To exploit sparsity, instead of using the ℓ0 pseudo-norm, we can relax it:

    argmin_{α ∈ ℝ^{ρn}}  (1/2)‖x − Φα‖²₂ + λ‖α‖₁,

    with λ ∈ ℝ a hyperparameter
    - Note: most existing methods are instead greedy approaches, extracting the targets one at a time

  21. Remark: optimization trick
    - In practice, computing Φα is expensive, as Φ is in ℝ^{n×ρn}
    - Fortunately, Φ can be seen as a circulant matrix, and the cost function can then be rewritten using a convolution:

    argmin_{α ∈ ℝ^{ρn}}  (1/2)‖x − Φα‖²₂ + λ‖α‖₁   becomes   argmin_{α ∈ ℝ^{ρn}}  (1/2)‖x − sinc ∗ α‖²₂ + λ‖α‖₁

    - All the convolutions can then be computed in the Fourier domain; as such, we never need to form Φ explicitly! (See the sketch below.)
    - The minimization can then be performed using, for instance, the ISTA/FISTA algorithms

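A possible NumPy sketch of this trick on a 1-D signal, assuming circular boundary conditions and a sinc kernel sampled on the fine (ρ-times oversampled) grid; Φ maps the fine grid to the n coarse samples and is never formed explicitly.

```python
# Apply Phi and its adjoint as FFT convolutions (circular boundaries assumed).
# kernel_fft = np.fft.fft(sinc_fine), with sinc_fine the PSF on the fine grid.
import numpy as np

def apply_phi(alpha, kernel_fft, rho):
    """alpha on the fine grid (rho*n,) -> the n coarse-grid samples of sinc * alpha."""
    fine = np.fft.ifft(kernel_fft * np.fft.fft(alpha)).real
    return fine[::rho]                    # keep one fine-grid sample out of rho

def apply_phi_adjoint(r, kernel_fft, rho):
    """Adjoint map (n,) -> (rho*n,): zero-insertion upsampling, then correlation."""
    up = np.zeros(rho * r.size)
    up[::rho] = r
    return np.fft.ifft(np.conj(kernel_fft) * np.fft.fft(up)).real
```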

  22. Naive approach: limitations
    - Example of results on a simulation
    [Figure: |x|, |α| and a zoom on |α|]
    - The targets are split across neighboring pixels
    - Too many targets are found compared to the ground truth
    - Two reasons:
      - the dictionary Φ might be ill-conditioned (especially when ρ is large)
      - difficulty to set the hyperparameter λ ∈ ℝ

  23. Better approach: Continuous Basis Pursuit
    - To bypass the fact that the dictionary Φ might be ill-conditioned, we propose to use a CBP approach.
    - Since the positions of the targets are continuous, CBP is based on a Taylor expansion of the sinc function around the grid points iΔ, with l_i the (continuous) displacement of target i with respect to the grid:

    Σ_i α_i sinc(π(t − iΔ + l_i)) ≃ Σ_i α_i sinc(π(t − iΔ)) + (α_i l_i / 2) sinc′(π(t − iΔ))   (for |l_i| ≤ Δ)
                                  = Σ_i α_i sinc(π(t − iΔ)) + δ_i sinc′(π(t − iΔ)),   with δ_i = α_i l_i / 2

  24. CBP cost function
    - Using the above Taylor expansion, we can derive a new cost function:

    argmin_{α ∈ ℂ^{ρn}, δ ∈ ℝ^{ρn}}  (1/2)‖x − Φα − Ψ(δ ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ)

    where:
    - ⊙ denotes the component-wise product
    - the columns of the dictionary Ψ are shifted derivatives of the sinc
    - δ is related to the shift with respect to the regular grid, and is a continuous variable
    - Once α and δ are found, the displacements with respect to the regular grid are easily recovered by shifting the grid coordinates according to δ (with the above parametrization, l_i = 2δ_i / α_i)

  25. CBP: optimization framework
    - In the real case, the CBP cost function

    argmin_{α ∈ ℂ^{ρn}, δ ∈ ℝ^{ρn}}  (1/2)‖x − Φα − Ψ(δ ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ)

    can be rewritten in a convex way [Ekanadham2011].
    - However, in the complex case, it is non-convex.
    - Still, it is multi-convex, and we can therefore use the PALM algorithm to try to minimize it.

    PALM for CBP:
      initialize α with FISTA, and δ with zero shift
      while not converged:
        α ← 𝒮_{λη}(α + η [Φ* + diag(δ)Ψ*](x − Φα − Ψ(α ⊙ δ)))
        δ ← Π_{|·|≤Δ/2}(δ + ξ [diag(ᾱ)Ψ*](x − Φα − Ψ(α ⊙ δ)))
      return α, δ

    All the products with Φ, Φ*, Ψ, Ψ* are computed using convolutions for higher efficiency (the soft-thresholding on complex α is sketched below).

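In the complex case, the soft-thresholding 𝒮 above is the prox of the ℓ1-norm taken as the sum of the moduli: it shrinks the modulus of each entry and keeps its phase. A minimal sketch:

```python
# Prox of T*||.||_1 for complex alpha: shrink the modulus, keep the phase.
import numpy as np

def complex_soft_threshold(alpha, T):
    mod = np.abs(alpha)
    return alpha * np.maximum(mod - T, 0.0) / np.maximum(mod, 1e-12)
```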

  26. Extension to images
    - In the case of images (2-D signals) x ∈ ℂ^{n×n}, the methodology extends easily:

    argmin_{α ∈ ℂ^{ρn×ρn}, δ_h, δ_v ∈ ℝ^{ρn×ρn}}  (1/2)‖x − Φα − Ψ_h(δ_h ⊙ α) − Ψ_v(δ_v ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ_h) + ι_{|·|≤Δ/2}(δ_v)

    where:
    - δ_h, δ_v contain the horizontal and vertical displacements, respectively
    - the columns of Ψ_h are shifted copies of ∂sinc(x, y)/∂x, and the columns of Ψ_v of ∂sinc(x, y)/∂y

  27. Regularization parameter setting
    - Choice of λ in:

    argmin_{α ∈ ℂ^{ρn}, δ ∈ ℝ^{ρn}}  (1/2)‖x − Φα − Ψ(δ ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ)

    In principle:
    - λ should not be constant over the whole image, as it is related to the background level
    ⇒ we can rather use a Λ-map, with one λ value per pixel (Λ ∈ ℝ^{ρn})
    - Similarly to what we did in the previous part, we use a fixed-point argument: if the targets are well separated,

    α* + γ [Φ* + diag(δ)Ψ*](x − Φα* − Ψ(α* ⊙ δ)) ≃ α* + γ [Φ* + diag(δ)Ψ*] n,

    and the noise term γ [Φ* + diag(δ)Ψ*] n, whose modulus follows a Rayleigh law, should be canceled by the proximal operator of the ℓ1-norm
    ⇒ its standard deviation is estimated using the MAD.

  28. Target localization: results
    - Comparison of:
      - Basis Pursuit with a threshold Λ* based on the true noise level
      - Continuous Basis Pursuit with a threshold Λ* based on the true noise level
      - Continuous Basis Pursuit with a threshold based on the MAD rule
    [Figures: median target localization error for the three methods; detection maps (blue: true targets; red: estimated targets) for CBP with thresholds Λ* and CBP with MAD-based thresholds]

  29. Target extraction: conclusions
    - Proposed a new method for target extraction in SAR signals
    - Based on convex relaxation, contrary to most other methods
    - Leveraged Continuous Basis Pursuit to obtain a higher subpixelic accuracy
    - Used the PALM algorithm to minimize the corresponding cost function
    - An automatic threshold choice is used
    Remaining work:
    - Further tests on real datasets
    - New ways of choosing the thresholds (Plug-and-Play methods / unrolling?)
    Kervazo, C., & Ladjal, S. (2022). Extraction des positions continues des cibles dans les signaux RSO. GRETSI conference.

  30. Outline
    I - Optimization strategy for sparse BSS: towards automatic parameter choices
    II - Extension to single-observation complex component separation: continuous target extraction in SAR images
    III - New data-driven methods for sparse BSS through algorithm unrolling

  31. Limitations of tackling sparse BSS using PALM
    • We highlighted the difficulty of tuning BSS methods [Kervazo20]
    • We used a heuristic based on the MAD to find decent hyperparameters
    • But it is still handcrafted, and the MAD might not be the best noise STD estimator
    • In addition, PALM might in any case require several thousand iterations to converge, reducing its applicability
    Contribution:
    • If we have access to a database with examples of mixtures and the corresponding factors A* and S*, can we make PALM work better by introducing some learnt components?
    • The method we use here is algorithm unrolling:
    ⇒ it bypasses the cumbersome hyper-parameter choice
    ⇒ it is much more computationally efficient than PALM
    ⇒ it yields interpretable neural networks

  32. Algorithm unrolling: setting
    • Let us consider sparse BSS on X = A*S* + N, in which (for the moment) we only want to estimate S* from X (once S* is estimated, A* can be quite well estimated by least squares)
    • Let us further assume that we have a_train training datasets ⁱX = ⁱA* ⁱS* + ⁱN, i = 1, …, a_train
    • For each training dataset ⁱX, we have access to the corresponding sources ⁱS*
    • The ⁱA* are "of the same kind as" A* (and similarly the ⁱS* are of the same kind as S*)
    Algorithm unrolling is then a principled method to construct neural network architectures enabling the recovery of S*

  33. Algorithm unrolling: methodology
    • (Classical) iterative method to estimate S*, with θ the algorithm parameters (gradient step sizes…):

      while not converged do:
        S ← 𝒮_{λ/L_S}(S − (1/L_S) Aᵀ(AS − X))
        update A

      It can be sketched as: X → [iterate until convergence] → S
    • Algorithm unrolling truncates this scheme to rewrite it in the form of a neural network with a small number of layers (iterations):

      X → f⁽¹⁾_{θ₁} → f⁽²⁾_{θ₂} → f⁽³⁾_{θ₃} → … → f⁽ᴸ⁾_{θ_L} → S

    • The algorithm parameters θ⁽ᵏ⁾ become trainable on a training set (i.e. they become the weights of the neural network)
    • The number of layers L is usually much smaller than the number of iterations of the original algorithm

  34. Being more specific: the LISTA algorithm
    • [Gregor, LeCun 10] rewrote the gradient-proximal (ISTA) update in the form of a neural network:

    S ← 𝒮_{λ/L_S}(S − (1/L_S) Aᵀ(AS − X)) = 𝒮_{λ/L_S}((I − (1/L_S) AᵀA) S + (1/L_S) AᵀX)

    • It is based on a reparametrization of this update, learning some of its parts (LISTA update):

    S ← 𝒮_{θ}(W₁S + W₂X),   with Θ = {λ/L_S, W₁, W₂} the trainable weights

    • This update is a typical NN update: a linearity followed by a non-linearity
    • If there were no learning, W₁ = I − (1/L_S) AᵀA and W₂ = (1/L_S) Aᵀ (but note that this coupling is not exploited in LISTA, since W₁ and W₂ are learnt independently)
    (A minimal implementation sketch is given below.)

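A minimal PyTorch sketch of a LISTA network with the parametrization above; the initialization from a nominal A and the per-layer thresholds are choices of this sketch, not necessarily those of the original paper.

```python
# Minimal LISTA sketch: L layers of S <- soft_threshold(W1 @ S + W2 @ X, theta).
import torch
import torch.nn as nn

class LISTA(nn.Module):
    def __init__(self, A, L=10, lam=0.1):
        super().__init__()
        n = A.shape[1]
        lip = float(torch.linalg.matrix_norm(A.T @ A, ord=2))  # Lipschitz constant L_S
        # Initialized from the unlearned ISTA update; W1 and W2 are then trained
        # independently (the coupling is not enforced, as noted on the slide).
        self.W1 = nn.Parameter(torch.eye(n) - A.T @ A / lip)
        self.W2 = nn.Parameter(A.T / lip)
        self.theta = nn.Parameter(torch.full((L, 1, 1), lam / lip))
        self.L = L

    def forward(self, X):                   # X: (m, t) mixtures
        S = torch.zeros(self.W1.shape[0], X.shape[1], dtype=X.dtype, device=X.device)
        for k in range(self.L):
            Z = self.W1 @ S + self.W2 @ X
            S = torch.sign(Z) * torch.relu(Z.abs() - self.theta[k])  # soft-threshold
        return S
```

Training would typically minimize the MSE between the network output and S* over the training set.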

  35. Difficulties of applying LISTA to sparse BSS
    • In the original LISTA work, the operator A* is the same over all the samples, which is not the case in BSS, where ⁱX = ⁱA* ⁱS* + ⁱN:

    S ← 𝒮_{θ}(W₁S + W₂X)   (a single pair W₁, W₂ for all the ⁱX)

    Illustration on the introductory dataset: [figure: the columns ⁱA*₁, ⁱA*₂, ⁱA*₃, ⁱA*₄ for i = 1..50, showing the variability of ⁱA* across samples]
    ⇒ LISTA numerically obtains poor results for sparse BSS

  36. LISTA-CP
    • In [Chen18], the authors exploited the theoretical coupling between W₁ and W₂ (W₁ = I − (1/L_S) AᵀA and W₂ = (1/L_S) Aᵀ in the unlearned case) to propose the LISTA-CP algorithm, leading to the update:

    S ← 𝒮_{θ}(S − Wᵀ(AS − X))   (LISTA-CP update)   vs.   S ← 𝒮_{θ}(W₁S + W₂X)   (LISTA update)

    • In LISTA-CP, the matrix A appears explicitly in the update: LISTA-CP is better suited than LISTA when the ⁱA* are not all the same (in LISTA, W₁ and W₂ are the same for all the samples ⁱX, giving no flexibility to the update)
    ☹︎ But LISTA-CP requires an estimate of ⁱA*
    • Therefore, it is important to have an update taking into account the variability of the ⁱA* over the different samples ⁱX
    • To handle large variations of the operator ⁱA* between the datasets ⁱX, we propose to use a LISTA-CP-like update for the ⁱS update, within an alternating structure enabling us to also estimate ⁱA*
    • This corresponds to unrolling PALM in a way that is specifically tailored for blind inverse problems

  37. Solving the BLIND source separation problem
    • The way to unroll PALM was chosen according to the previous remarks and experimental trials:

    Learned-PALM (LPALM):
      initialize A and S with a very generic initialization
      for k from 1 to L do:
        S ← 𝒮_{γ⁽ᵏ⁾}(S − W⁽ᵏ⁾ᵀ(AS − X))   (LISTA-CP update)
        A ← Π_{‖·‖≤1}(A + (1/L_A⁽ᵏ⁾)(X − AS)Sᵀ)   (learned step size)
      end for
      return A, S

    • The loss function is chosen as NMSE(A, A*) + NMSE(S, S*) over the training set (a single LPALM layer is sketched in code below).

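A possible PyTorch sketch of one LPALM layer following the scheme above; shapes, initialization, and the projection implementation are assumptions of this sketch.

```python
# One LPALM layer: LISTA-CP update on S, then a learned-step projected update on A.
import torch
import torch.nn as nn

class LPALMLayer(nn.Module):
    def __init__(self, m, n, theta_init=0.1, step_init=0.1):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(m, n))      # LISTA-CP matrix W^(k)
        self.theta = nn.Parameter(torch.tensor(theta_init))  # threshold gamma^(k)
        self.step = nn.Parameter(torch.tensor(step_init))    # step size 1/L_A^(k)

    def forward(self, A, S, X):
        # LISTA-CP update on S: soft-thresholded residual correction
        Z = S - self.W.t() @ (A @ S - X)
        S = torch.sign(Z) * torch.relu(Z.abs() - self.theta)
        # Gradient step on A with a learned step size, then project ||A_j||_2 <= 1
        A = A + self.step * (X - A @ S) @ S.t()
        A = A / A.norm(dim=0, keepdim=True).clamp(min=1.0)
        return A, S
```

Stacking L such layers and training their parameters with the NMSE(A, A*) + NMSE(S, S*) loss gives the LPALM network sketched above.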

  38. Numerical experiments: datasets
    LPALM is tested on astrophysical simulations of the Cassiopeia A supernova remnant, as observed by the Chandra X-ray telescope. There are n = 4 emissions: synchrotron, thermal, and two red-shifted iron components.
    • Test set: X = A*S* + N, with:
      • A* coming from realistic simulations
      • S* being real supernova maps (reshaped into vectors)
      • N generated Gaussian noise
    • Train set: 750 samples of ⁱX = ⁱA* ⁱS* + ⁱN, with:
      • ⁱA* coming from realistic simulations (quite different from the test set)
      • ⁱS* generated using a Bernoulli Generalized-Gaussian distribution
      • ⁱN generated Gaussian noise
    [Figure: the columns of ⁱA* for i = 1..50, each column represented with a different color]

  39. Numerical results: comparison with PALM
    LPALM is compared with PALM, the PALM parameters being optimized over the train set.
    [Figure - blue, plain and dashed lines: median number of iterations for PALM and LPALM, respectively; red, plain and dashed lines: median NMSE for PALM and LPALM, respectively]
    LPALM largely outperforms PALM, both:
    - in terms of separation quality
    - in terms of number of iterations

  40. Numerical results: comparison with other unrolled methods
    [Figure: true sources S* and the estimates of the compared methods]
    LPALM largely outperforms its competitors:
    - LISTA lacks the flexibility to handle varying matrices ⁱA*
    - DNMF suffers from being trained on the reconstruction error only, ‖ⁱX − ⁱAⁱS‖²₂, which is well known to lead to spurious solutions

  41. Conclusion
    • We started by showing the practical difficulty of choosing hyperparameters in optimization-based BSS methods
    • We have empirically shown that using a MAD heuristic for the hyperparameter choice, based on a fixed-point argument, can lead to good results
    • This extends to complex data in the context of continuous target position extraction in SAR imaging
    • To go a step further, and if a training set with ground truth is available, we have shown the interest of using algorithm unrolling to design neural network architectures performing BSS by mimicking the structure of PALM
    Further research paths currently considered:
    • Performing unrolling on other kinds of BSS algorithms
    • Extending LPALM to remote-sensing datasets in which spectral variabilities can occur even within a single dataset X
    • Studying semi-/unsupervised unrolled methods
    • Exploring new methods for SAR target extraction (based on unrolling and/or PnP)