
Christophe Kervazo (LTCI, Télécom Paris, Institut Polytechnique de Paris)

S³ Seminar, January 27, 2023

Title — Automatic methods for sparse blind source separation

Abstract — Over the last decades, sparse Blind Source Separation (BSS) has become a well-established tool for a wide range of applications. Classical optimization-based sparse BSS methods, such as the Proximal Alternating Linearized Minimization (PALM) algorithm, nevertheless often rely on a cumbersome handcrafted hyper-parameter choice, undermining their practical results and making them difficult to use. In this presentation, we will therefore explore several strategies to bypass this pitfall. We will start by exploring some statistics-based automatic hyper-parameter choice rules, and we will eventually discuss data-driven methods leveraging algorithm unrolling/unfolding. We will furthermore consider an extension of sparse BSS to continuous target extraction in Synthetic Aperture Radar (SAR) images. Overall, we will show that our findings can contribute to a wide range of imaging applications: astrophysics, remote sensing and biomedical imaging, to name only a few.

[1] Kervazo, C., Bobin, J., Chenot, C., & Sureau, F. (2020). Use of PALM for ℓ1 sparse matrix factorization: Difficulty and rationalization of a two-step approach. Digital Signal Processing, 97, 102611.

[2] Fahes, M., Kervazo, C., Bobin, J., & Tupin, F. (2021, September). Unrolling PALM for sparse semi-blind source separation. In International Conference on Learning Representations.

Biography — Christophe Kervazo received the Supélec engineering degree in 2015 and a Master of Science in Electrical and Computer Engineering from the Georgia Institute of Technology (USA) in 2016. From 2016 to 2019, he was a PhD student in the CosmoStat group at CEA Saclay, where he worked on the optimization framework for sparse blind source separation, as well as on non-linear component separation. He then spent one year in Mons (Belgium) as a post-doctoral researcher, working on the extension of Nonnegative Matrix Factorization to Linear-Quadratic mixture unmixing, with mathematical guarantees. He is currently an Assistant Professor at Télécom Paris, in the IMAGES group, where he now mainly works on deep-learning tools for inverse problems, with a focus on algorithm unrolling/unfolding. The main current application of his work is remote sensing imaging, but he also works on biomedical data.


Transcript

  1. Sparse blind source
    separation: from handcrafted
    to automatic methods
    Christophe Kervazo
    Joint work with Elsa Angelini, Jérôme Bobin, Cécile Chenot, Abdelkhalak Chetoui, Jérémy Cohen, Erwan Dereure,
    Mohammad Fahes, Rassim Hadjeres, Florent Sureau, Florence Tupin, and others…


  2. Blind Source Separation (BSS): examples of applications
    Goal: unmix some signals
    Application examples (figure panels): the cocktail party problem*, spectroscopy**, show-through removal, hyperspectral unmixing in remote sensing***, astrophysics****
    Figures courtesy of: * https://cacm.acm.org/news/190656-3d-printed-device-helps-computers-solve-cocktail-party-problem/fulltext; ** Rapin, J. (2014). Décompositions parcimonieuses pour l'analyse avancée de données en spectrométrie pour la Santé; *** Chenot, C. (2017). Parcimonie, diversité morphologique et séparation robuste de sources; **** F. Acero

  3. BSS: linear model [Comon10]
    Goal of BSS: estimate A* and S* from X (up to limited indeterminacies), where

    X = A*S* + N

    - X: observations (m × t), with m rows of observations and t sample columns
    - A*: mixing matrix (m × n)
    - S*: sources (n × t); the sources will be assumed to be sparse
    - N: noise and model imperfections (m × t)
    [Figure: a two-source example, X = A*₁S*₁ + A*₂S*₂ + N, each source plotted in the (x, y) plane]
    (A small numerical sketch of this model is given below.)

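A minimal NumPy sketch of the linear mixing model above. The sizes, sparsity level and distributions are illustrative assumptions, not the ones used in the talk.

```python
# Sketch of the linear mixing model X = A* S* + N (illustrative sizes/distributions).
import numpy as np

rng = np.random.default_rng(0)
m, n, t = 12, 3, 4096                     # observations, sources, samples

S_true = rng.laplace(size=(n, t)) * (rng.random((n, t)) < 0.05)  # sparse sources
A_true = rng.random((m, n))
A_true /= np.linalg.norm(A_true, axis=0)  # unit l2-norm columns (oblique constraint)
N = 0.01 * rng.standard_normal((m, t))    # Gaussian noise / model imperfections

X = A_true @ S_true + N                   # (m x t) observed mixtures
```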

  4. Sparse BSS [Zibulevsky01]
    X = A*S* + N is an ill-posed problem [1]
    ⇒ introduce a prior on S:
    - ICA [Comon10]: Independent Component Analysis
    - NMF [Gillis12]: Non-negative Matrix Factorization
    - SMF [Zibulevsky01]: Sparse Matrix Factorization
    Sparsity: most of the coefficients of the sources are close to 0, either in the direct domain or in a transformed domain Φ_S (e.g. the wavelet domain), i.e. S*Φ_Sᵀ is sparse.
    [Figure: the source S*₁ in the direct domain and its sparse coefficients in the wavelet domain]

  5. BSS as an optimization problem
    Written as an optimization problem [Zibulevsky01, Bobin07]:

    argmin_{A ∈ ℝ^{m×n}, S ∈ ℝ^{n×t}}  (1/2)‖X − AS‖²_F + ‖R_S ⊙ (SΦ_Sᵀ)‖₁ + ι_{∀i∈[1,n]: ‖Aⁱ‖_{ℓ₂}=1}(A)

    with, from left to right, a data-fidelity term, a sparsity term and an oblique constraint, where:
    - R_S: regularization parameters (size n × t)
    - Φ_S: a sparsifying transform
    How to solve sparse BSS in practice? Challenges:
    - Non-smooth (needs advanced optimization tools: proximal operators [Parikh14])
    - Non-convex (non-unique minima)
    - Difficult hyperparameter choice
    ⇒ a difficult optimization problem, requiring delicate tuning from the user
    ⇒ in the following, we will see how to make these choices more automatic

  6. Outline
    I - Optimization strategy for sparse BSS: towards automatic parameter choices
    II - Extension to single-observation complex component separation: continuous target extraction in SAR images
    III - New data-driven methods for sparse BSS through algorithm unrolling

  7. Optimization frameworks
    Three frameworks to minimize

    argmin_{A ∈ ℝ^{m×n}, S ∈ ℝ^{n×t}}  (1/2)‖X − AS‖²_F + ‖R_S ⊙ (SΦ_Sᵀ)‖₁ + ι_{∀i∈[1,n]: ‖Aⁱ‖_{ℓ₂}=1}(A)

    BCD [Tseng01] (gradient based):
      while not converged over A and S do:
        while not converged over S do: S ← 𝒮_{ηR_S}(S − η Aᵀ(AS − X))
        while not converged over A do: A ← Π_{‖·‖₂=1}(A − ξ (AS − X)Sᵀ)

    pALS (least-square based):
      while not stabilized over A and S do:
        S ← 𝒮_{R_S}(A†X)
        A ← Π_{‖·‖₂=1}(XS†)

    PALM [Bolte14] (gradient based):
      while not converged over A and S do:
        S ← 𝒮_{ηR_S}(S − η Aᵀ(AS − X))
        A ← Π_{‖·‖₂=1}(A − ξ (AS − X)Sᵀ)

    with:
    - 𝒮_{ηR_S}(·) the proximal operator of the ℓ1-norm (an element-wise soft-thresholding with parameters ηR_S)
    - Π_{‖·‖₂=1}(·) the projection of the columns of A on the unit ℓ2 sphere
    - η, ξ some gradient step-sizes
    While PALM is in theory the best algorithm, it is often reported to yield poor practical results in BSS. Why? (One PALM iteration is sketched in code below.)

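For concreteness, a minimal NumPy sketch of one PALM iteration for the cost above, assuming Φ_S = Id; the step sizes are taken as the usual inverse Lipschitz constants, and the helper names are mine.

```python
# Sketch of one PALM iteration for sparse BSS (assumes Phi_S = Id).
import numpy as np

def soft_threshold(Z, T):
    """Element-wise soft-thresholding: prox of the weighted l1-norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - T, 0.0)

def palm_iteration(A, S, X, R_S):
    # Proximal gradient step on S, with step size eta = 1 / ||A^T A||_2
    eta = 1.0 / np.linalg.norm(A.T @ A, 2)
    S = soft_threshold(S - eta * (A.T @ (A @ S - X)), eta * R_S)
    # Projected gradient step on A, with step size xi = 1 / ||S S^T||_2
    xi = 1.0 / np.linalg.norm(S @ S.T, 2)
    A = A - xi * ((A @ S - X) @ S.T)
    A /= np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1e-12)  # unit columns
    return A, S
```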

  8. First study: why is PALM difficult to use in sparse BSS?
    Empirical study: sensitivity to the regularization parameters
    - Simulated n = 2 sources (dynamic range = 0.6)
    - Compute C_A = −10 log₁₀(mean(|PÂ†A* − Id|)), with P a matrix correcting the permutation/scale indeterminacies [Bobin15]
    Let us first assume one parameter per source:

    R_S = [λ₁ λ₁ … λ₁; λ₂ λ₂ … λ₂; …; λ_n λ_n … λ_n]

    [Figure: median of C_A as a function of (λ₁, λ₂), with a zoom on the maximum]
    ⇒ difficult choice of R_S:
    - a 3×10⁻⁴ change in (λ₁, λ₂) ⇒ 30 dB drop
    - on the diagonal, a 2×10⁻³ change ⇒ 7 dB drop
    (for reference, GMCA: 36 dB)

  9. Empirical study conclusions: sensitivity of PALM + grid search
    - Low efficiency: for a given X, it is hard to choose R_S
    - Low versatility: a choice of R_S does not generalize to other X
    - Low reliability: sensitive to the initial point
    ⇒ this practical aspect is not well discussed in the literature (grid search, use of the true unknown factorization…)
    ⇒ intractable for real data
    Kervazo, C., Bobin, J., Chenot, C., & Sureau, F. (2020). Use of PALM for ℓ1 sparse matrix factorization: Difficulty and rationalization of a two-step approach. Digital Signal Processing, 97, 102611.

  10. Comparison between sparse BSS algorithms (for the cost function of slide 5)

    |                            | BCD¹      | pALS² / GMCA³ | PALM⁴     |
    |----------------------------|-----------|---------------|-----------|
    | Reference                  | [Tseng01] | [Bobin07]     | [Bolte14] |
    | Fast                       | ✕         | ✓✓            | ✓         |
    | Convergence                | ✓         | ✕             | ✓         |
    | Finds a min.?              | ✓         | ✕             | ✓         |
    | Robust to spurious min.    | ✕         | ✓             | ✕         |
    | Automatic parameter choice | ✕         | ✓             | ✕         |

    In short, pALS/GMCA is easy to use, while BCD and PALM are mathematically sound.
    How to obtain an algorithm that is both easy to use and mathematically sound? ⇒ work on PALM
    ¹ Block Coordinate Descent, ² projected Alternating Least-Squares, ³ Generalized Morphological Component Analysis, ⁴ Proximal Alternating Linearized Minimization

  11. More about the GMCA automatic hyper-parameter choice
    - In GMCA, R_S has one parameter λ_i per source (R_S = [λ₁ … λ₁; …; λ_n … λ_n]), set from the MAD of the current source estimate (a new value at each iteration):

    λ_i = κ MAD([Â†X]_i),   with MAD(s) ≃ 1.48 median(|s − median(s)|)

    - The MAD is a robust estimator of the standard deviation, insensitive to sparse perturbations: MAD(S*) ≃ 0
    - Such a choice is motivated by a fixed-point argument: assume that, at a given iteration, GMCA has perfectly estimated A*; the source update is then given by:

    S ← 𝒮_{R_S}(A*†X) = 𝒮_{R_S}(A*†A*S* + A*†N) = 𝒮_{R_S}(S* + A*†N)

    - Therefore, to make S ← S*, the soft-thresholding should remove the (small) Gaussian noise term A*†N
    ⇒ by thresholding with λ_i = 3 × std([A*†N]_i), 99% of the noise is removed
    ⇒ here, std([A*†N]_i) is estimated using MAD([Â†X]_i), which is insensitive to the sparse term S* (a sketch of this rule is given in code below)

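A sketch of this MAD-based rule in NumPy; κ = 3 follows the slide, and the function names are mine.

```python
# Sketch of the GMCA threshold rule: lambda_i = kappa * MAD([A^† X]_i).
import numpy as np

def mad_std(s):
    """Robust standard-deviation estimate, insensitive to sparse outliers."""
    return 1.48 * np.median(np.abs(s - np.median(s)))

def gmca_thresholds(A_hat, X, kappa=3.0):
    S_ls = np.linalg.pinv(A_hat) @ X   # least-squares source estimate A^† X
    return kappa * np.array([mad_std(row) for row in S_ls])  # one lambda per source
```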

  12. Choosing hyper-parameters in PALM?
    If we assume errors on the sources, Ŝ = S* + εs (in particular, εs is non-zero at initialization), implementing the same heuristic as in GMCA gives:

    λ_i ≃ κ × MAD([A*ᵀN − A*ᵀA*εs]_i)   (PALM)   vs.   λ_i ≃ κ × MAD([A*ᵀN]_i)   (GMCA)

    - If the true A* and S* are found, the interpretation at convergence is similar to GMCA's
    - But the error adds dense interferences A*ᵀA*εs (not present in GMCA)
    ⇒ unadapted threshold choice with the MAD, due to the interferences
    Need to limit the interferences:
    - Good initialization (εs(0) small)
    - Reweighted ℓ1 (adapt the thresholds to the sources)

  13. Algorithm: embed PALM within a 2-step approach
    Step 1 - GMCA, bringing:
    - robustness to the initialization
    - an automatic parameter choice (computed on the good initialization)
    - reweighting from the GMCA estimate S_G: R_S = Λ_{S_G}
    Step 2 - PALM, bringing its theoretical background

  14. Experiment on realistic astrophysics data
    Experiments on simulated Chandra data:
    - n = 3 sources
    - t = 128 × 128 pixels
    - m = 12 observations
    [Figure: true sources S* and estimates Ŝ]
    The 2-step approach is fast, convergent, reliable, finds a minimum, and chooses R_S automatically.

    C_A (in dB) as a function of the SNR:

    | SNR (dB)            | 10   | 15   | 20   | 30   | 60   |
    |---------------------|------|------|------|------|------|
    | 2-step              | 15.0 | 16.3 | 17.4 | 19.7 | 20.9 |
    | PALM                | 11.9 | 13.3 | 13.5 | 14.2 | 14.5 |
    | GMCA                | 13.2 | 14.8 | 15.1 | 17.1 | 18.6 |
    | EFICA [Koldovsky06] | 8.8  | 10.3 | 14.0 | 18.9 | 19.4 |
    | RNA [Zibulevsky03]  | 9.8  | 12.6 | 15.6 | 18.3 | 18.4 |

  15. Recent use: bioluminescence
    - Application in biomedical imaging (real dataset)
    - Modality: optical imaging based on counting the photons emitted by biological tissues through a bioluminescence reaction
    - Goal: separate multiple tumors in mice, based on their temporal bioluminescence activity
    Dereure, E., Kervazo, C., Seguin, J., Garofalakis, A., Mignet, N., Angelini, E., & Olivo-Marin, J.-C. Sparse non-negative matrix factorization for preclinical bioluminescent imaging. Accepted to the ISBI 2023 conference.

  16. Recent use: bioluminescence
    - Non-negative extension of the above work: PALM is used to minimize

    argmin_{A,S}  (1/2)‖X − AS‖²_F + ‖R ⊙ S‖₁ + ι_{·≥0}(A) + ι_{·≥0}(S) + ι_{∀j∈[1,n]: ‖Aʲ‖_{ℓ₂}≤1}(A)

    - A good initialization and some reweighting are used; the thresholds are based on the MAD (the corresponding non-negative prox is sketched in code below)
    - In addition, a gain is used for each source to handle their imbalance
    [Figure: comparison of separations: no sparsity; fixed gain, no reweighting; fixed gain, reweighting; adaptive gain]


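For reference, the proximal operator combining the ℓ1 penalty and the non-negativity constraint on S has a simple closed form; a one-line NumPy sketch (my naming), valid for non-negative thresholds T:

```python
# Prox of T*||.||_1 + nonnegativity constraint: soft-threshold, then clip at 0.
import numpy as np

def nonneg_soft_threshold(Z, T):
    return np.maximum(Z - T, 0.0)
```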

  17. Outline
    I - Optimization strategy for sparse BSS: towards automatic parameter choices
    II - Extension to single-observation complex component separation: continuous target extraction in SAR images
    III - New data-driven methods for sparse BSS through algorithm unrolling

  18. SAR imaging & objective
    - SAR imaging is a remote sensing modality, yielding complex images x ∈ ℂ^{n×n}
    - SAR images are the superposition of:
      - a background n ∈ ℂ^{n×n}
      - some targets t ∈ ℂ^{n×n}
    Here, we will work on the complex images: x = n + t ∈ ℂ^{n×n}
    Goal: retrieve both n and t from x, which can be seen as a complex single-channel source separation problem
    [Figure: amplitude images |x|, |n| and |t|]

  19. Additional information about the problem at hand
    - The targets are assumed to be Diracs in the scene, which are then convolved by the PSF of the SAR imaging system, a cardinal sine
    - If the targets were lying on the image sampling grid, they would also be Diracs in the image
    - However, they have continuous positions in the scene ⇒ they resemble cardinal sines, making their separation harder (we cannot just extract the local maxima of the image)
    Objective - for each target, we want to estimate:
    - its amplitude
    - the localization of its maximum, with subpixelic accuracy

  20. Sparsity as a separation principle
    As finding n and t from x = n + t is ill-posed, we resort to additional assumptions:
    - ℜ{n} and ℑ{n} (background) follow Gaussian laws (of unknown variance)
    - Only a few targets are present in the image
    First (naive) approach:
    - For the sake of clarity, we will consider in the following vectors x ∈ ℝⁿ instead of images
    - The PSF knowledge could be exploited by creating a dictionary Φ containing all the cardinal sines shifted to all the sampling points of the grid
    - For subpixelic precision, we can oversample the dictionary by a factor ρ, so that Φ ∈ ℝ^{n×ρn}
    - To exploit sparsity, instead of using the ℓ0 pseudo-norm, we can relax it:

    argmin_{α ∈ ℝ^{ρn}}  (1/2)‖x − Φα‖²₂ + λ‖α‖₁,

    with λ ∈ ℝ a hyperparameter
    - Note: most existing methods are instead greedy approaches, extracting the targets one at a time

  21. Remark: optimization trick
    - In practice, computing Φα is expensive, as Φ is in ℝ^{n×ρn}
    - Fortunately, Φ can be seen as a circulant matrix, and the cost function can then be rewritten using a convolution:

    argmin_{α ∈ ℝ^{ρn}}  (1/2)‖x − Φα‖²₂ + λ‖α‖₁   becomes   argmin_{α ∈ ℝ^{ρn}}  (1/2)‖x − sinc ∗ α‖²₂ + λ‖α‖₁

    - All the convolutions can then be computed in the Fourier domain; as such, we never need to form Φ explicitly! (See the sketch below.)
    - The minimization can then be performed using, for instance, the ISTA/FISTA algorithms

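A possible NumPy sketch of this trick on a 1-D signal, assuming circular boundary conditions and a sinc kernel sampled on the fine (ρ-times oversampled) grid; Φ maps the fine grid to the n coarse samples and is never formed explicitly.

```python
# Apply Phi and its adjoint as FFT convolutions (circular boundaries assumed).
# kernel_fft = np.fft.fft(sinc_fine), with sinc_fine the PSF on the fine grid.
import numpy as np

def apply_phi(alpha, kernel_fft, rho):
    """alpha on the fine grid (rho*n,) -> the n coarse-grid samples of sinc * alpha."""
    fine = np.fft.ifft(kernel_fft * np.fft.fft(alpha)).real
    return fine[::rho]                    # keep one fine-grid sample out of rho

def apply_phi_adjoint(r, kernel_fft, rho):
    """Adjoint map (n,) -> (rho*n,): zero-insertion upsampling, then correlation."""
    up = np.zeros(rho * r.size)
    up[::rho] = r
    return np.fft.ifft(np.conj(kernel_fft) * np.fft.fft(up)).real
```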

  22. Naive approach: limitations
    - Example of results on a simulation
    [Figure: |x|, |α| and a zoom on |α|]
    - The targets are split across neighboring pixels
    - Too many targets are found compared to the ground truth
    - Two reasons:
      - the dictionary Φ might be ill-conditioned (especially when ρ is large)
      - difficulty to set the hyperparameter λ ∈ ℝ

  23. Better approach: Continuous Basis Pursuit
    - To bypass the fact that the dictionary Φ might be ill-conditioned, we propose to use a CBP approach.
    - Since the positions of the targets are continuous, CBP is based on a Taylor expansion of the sinc function around the grid points iΔ, with l_i the (continuous) displacement of target i with respect to the grid:

    Σ_i α_i sinc(π(t − iΔ + l_i)) ≃ Σ_i α_i sinc(π(t − iΔ)) + (α_i l_i / 2) sinc′(π(t − iΔ))   (for |l_i| ≤ Δ)
                                  = Σ_i α_i sinc(π(t − iΔ)) + δ_i sinc′(π(t − iΔ)),   with δ_i = α_i l_i / 2

  24. CBP cost function
    - Using the above Taylor expansion, we can derive a new cost function:

    argmin_{α ∈ ℂ^{ρn}, δ ∈ ℝ^{ρn}}  (1/2)‖x − Φα − Ψ(δ ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ)

    where:
    - ⊙ denotes the component-wise product
    - the columns of the dictionary Ψ are shifted derivatives of the sinc
    - δ is related to the shift with respect to the regular grid, and is a continuous variable
    - Once α and δ are found, the displacements with respect to the regular grid are easily recovered by shifting the grid coordinates according to δ (with the above parametrization, l_i = 2δ_i / α_i)

  25. CBP: optimization framework
    - In the real case, the CBP cost function

    argmin_{α ∈ ℂ^{ρn}, δ ∈ ℝ^{ρn}}  (1/2)‖x − Φα − Ψ(δ ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ)

    can be rewritten in a convex way [Ekanadham2011].
    - However, in the complex case, it is non-convex.
    - Still, it is multi-convex, and we can therefore use the PALM algorithm to try to minimize it.

    PALM for CBP:
      initialize α with FISTA, and δ with zero shift
      while not converged:
        α ← 𝒮_{λη}(α + η [Φ* + diag(δ)Ψ*](x − Φα − Ψ(α ⊙ δ)))
        δ ← Π_{|·|≤Δ/2}(δ + ξ [diag(ᾱ)Ψ*](x − Φα − Ψ(α ⊙ δ)))
      return α, δ

    All the products with Φ, Φ*, Ψ, Ψ* are computed using convolutions for higher efficiency (the soft-thresholding on complex α is sketched below).

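In the complex case, the soft-thresholding 𝒮 above is the prox of the ℓ1-norm taken as the sum of the moduli: it shrinks the modulus of each entry and keeps its phase. A minimal sketch:

```python
# Prox of T*||.||_1 for complex alpha: shrink the modulus, keep the phase.
import numpy as np

def complex_soft_threshold(alpha, T):
    mod = np.abs(alpha)
    return alpha * np.maximum(mod - T, 0.0) / np.maximum(mod, 1e-12)
```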

  26. Extension to images
    - In the case of images (2-D signals) x ∈ ℂ^{n×n}, the methodology extends easily:

    argmin_{α ∈ ℂ^{ρn×ρn}, δ_h, δ_v ∈ ℝ^{ρn×ρn}}  (1/2)‖x − Φα − Ψ_h(δ_h ⊙ α) − Ψ_v(δ_v ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ_h) + ι_{|·|≤Δ/2}(δ_v)

    where:
    - δ_h, δ_v contain the horizontal and vertical displacements, respectively
    - the columns of Ψ_h are shifted copies of ∂sinc(x, y)/∂x, and the columns of Ψ_v of ∂sinc(x, y)/∂y

  27. Regularization parameter setting
    - Choice of λ in:

    argmin_{α ∈ ℂ^{ρn}, δ ∈ ℝ^{ρn}}  (1/2)‖x − Φα − Ψ(δ ⊙ α)‖²₂ + λ‖α‖₁ + ι_{|·|≤Δ/2}(δ)

    In principle:
    - λ should not be constant over the whole image, as it is related to the background level
    ⇒ we can rather use a Λ-map, with one λ value per pixel (Λ ∈ ℝ^{ρn})
    - Similarly to what we did in the previous part, we use a fixed-point argument: if the targets are well separated,

    α* + γ [Φ* + diag(δ)Ψ*](x − Φα* − Ψ(α* ⊙ δ)) ≃ α* + γ [Φ* + diag(δ)Ψ*] n,

    and the noise term γ [Φ* + diag(δ)Ψ*] n, whose modulus follows a Rayleigh law, should be canceled by the proximal operator of the ℓ1-norm
    ⇒ its standard deviation is estimated using the MAD.

  28. Target localization: results
    - Comparison of:
      - Basis Pursuit with a threshold Λ* based on the true noise level
      - Continuous Basis Pursuit with a threshold Λ* based on the true noise level
      - Continuous Basis Pursuit with a threshold based on the MAD rule
    [Figures: median target localization error for the three methods; detection maps (blue: true targets; red: estimated targets) for CBP with thresholds Λ* and CBP with MAD-based thresholds]

  29. Target extraction: conclusions
    - Proposed a new method for target extraction in SAR signals
    - Based on convex relaxation, contrary to most other methods
    - Leveraged Continuous Basis Pursuit to obtain a higher subpixelic accuracy
    - Used the PALM algorithm to minimize the corresponding cost function
    - An automatic threshold choice is used
    Remaining work:
    - Further tests on real datasets
    - New ways of choosing the thresholds (Plug-and-Play methods / unrolling?)
    Kervazo, C., & Ladjal, S. (2022). Extraction des positions continues des cibles dans les signaux RSO. GRETSI conference.

  30. Outline
    I - Optimization strategy for sparse BSS: towards automatic parameter choices
    II - Extension to single-observation complex component separation: continuous target extraction in SAR images
    III - New data-driven methods for sparse BSS through algorithm unrolling

  31. Limitations of tackling sparse BSS using PALM
    • We highlighted the difficulty of tuning BSS methods [Kervazo20]
    • We used a heuristic based on the MAD to find decent hyperparameters
    • But it is still handcrafted, and the MAD might not be the best noise STD estimator
    • In addition, PALM might in any case require several thousand iterations to converge, reducing its applicability
    Contribution:
    • If we have access to a database with examples of mixtures and the corresponding factors A* and S*, can we make PALM work better by introducing some learnt components?
    • The method we use here is algorithm unrolling:
    ⇒ it bypasses the cumbersome hyper-parameter choice
    ⇒ it is much more computationally efficient than PALM
    ⇒ it yields interpretable neural networks

  32. Algorithm unrolling: setting
    • Let us consider sparse BSS on X = A*S* + N, in which (for the moment) we only want to estimate S* from X (once S* is estimated, A* can be quite well estimated by least squares)
    • Let us further assume that we have a_train training datasets ⁱX = ⁱA* ⁱS* + ⁱN, i = 1, …, a_train
    • For each training dataset ⁱX, we have access to the corresponding sources ⁱS*
    • The ⁱA* are "of the same kind as" A* (and similarly the ⁱS* are of the same kind as S*)
    Algorithm unrolling is then a principled method to construct neural network architectures enabling the recovery of S*

  33. Algorithm unrolling: methodology
    • (Classical) iterative method to estimate S*, with θ the algorithm parameters (gradient step sizes…):

      while not converged do:
        S ← 𝒮_{λ/L_S}(S − (1/L_S) Aᵀ(AS − X))
        update A

      It can be sketched as: X → [iterate until convergence] → S
    • Algorithm unrolling truncates this scheme to rewrite it in the form of a neural network with a small number of layers (iterations):

      X → f⁽¹⁾_{θ₁} → f⁽²⁾_{θ₂} → f⁽³⁾_{θ₃} → … → f⁽ᴸ⁾_{θ_L} → S

    • The algorithm parameters θ⁽ᵏ⁾ become trainable on a training set (i.e. they become the weights of the neural network)
    • The number of layers L is usually much smaller than the number of iterations of the original algorithm

  34. Being more specific: the LISTA algorithm
    • [Gregor, LeCun 10] rewrote the gradient-proximal (ISTA) update in the form of a neural network:

    S ← 𝒮_{λ/L_S}(S − (1/L_S) Aᵀ(AS − X)) = 𝒮_{λ/L_S}((I − (1/L_S) AᵀA) S + (1/L_S) AᵀX)

    • It is based on a reparametrization of this update, learning some of its parts (LISTA update):

    S ← 𝒮_{θ}(W₁S + W₂X),   with Θ = {λ/L_S, W₁, W₂} the trainable weights

    • This update is a typical NN update: a linearity followed by a non-linearity
    • If there were no learning, W₁ = I − (1/L_S) AᵀA and W₂ = (1/L_S) Aᵀ (but note that this coupling is not exploited in LISTA, since W₁ and W₂ are learnt independently)
    (A minimal implementation sketch is given below.)

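A minimal PyTorch sketch of a LISTA network with the parametrization above; the initialization from a nominal A and the per-layer thresholds are choices of this sketch, not necessarily those of the original paper.

```python
# Minimal LISTA sketch: L layers of S <- soft_threshold(W1 @ S + W2 @ X, theta).
import torch
import torch.nn as nn

class LISTA(nn.Module):
    def __init__(self, A, L=10, lam=0.1):
        super().__init__()
        n = A.shape[1]
        lip = float(torch.linalg.matrix_norm(A.T @ A, ord=2))  # Lipschitz constant L_S
        # Initialized from the unlearned ISTA update; W1 and W2 are then trained
        # independently (the coupling is not enforced, as noted on the slide).
        self.W1 = nn.Parameter(torch.eye(n) - A.T @ A / lip)
        self.W2 = nn.Parameter(A.T / lip)
        self.theta = nn.Parameter(torch.full((L, 1, 1), lam / lip))
        self.L = L

    def forward(self, X):                   # X: (m, t) mixtures
        S = torch.zeros(self.W1.shape[0], X.shape[1], dtype=X.dtype, device=X.device)
        for k in range(self.L):
            Z = self.W1 @ S + self.W2 @ X
            S = torch.sign(Z) * torch.relu(Z.abs() - self.theta[k])  # soft-threshold
        return S
```

Training would typically minimize the MSE between the network output and S* over the training set.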

  35. Difficulties of applying LISTA to sparse BSS
    • In the original LISTA work, the operator A* is the same over all the samples, which is not the case in BSS, where ⁱX = ⁱA* ⁱS* + ⁱN:

    S ← 𝒮_{θ}(W₁S + W₂X)   (a single pair W₁, W₂ for all the ⁱX)

    Illustration on the introductory dataset: [figure: the columns ⁱA*₁, ⁱA*₂, ⁱA*₃, ⁱA*₄ for i = 1..50, showing the variability of ⁱA* across samples]
    ⇒ LISTA numerically obtains poor results for sparse BSS

  36. LISTA-CP
    • In [Chen18], the authors exploited the theoretical coupling between W₁ and W₂ (W₁ = I − (1/L_S) AᵀA and W₂ = (1/L_S) Aᵀ in the unlearned case) to propose the LISTA-CP algorithm, leading to the update:

    S ← 𝒮_{θ}(S − Wᵀ(AS − X))   (LISTA-CP update)   vs.   S ← 𝒮_{θ}(W₁S + W₂X)   (LISTA update)

    • In LISTA-CP, the matrix A appears explicitly in the update: LISTA-CP is better suited than LISTA when the ⁱA* are not all the same (in LISTA, W₁ and W₂ are the same for all the samples ⁱX, giving no flexibility to the update)
    ☹︎ But LISTA-CP requires an estimate of ⁱA*
    • Therefore, it is important to have an update taking into account the variability of the ⁱA* over the different samples ⁱX
    • To handle large variations of the operator ⁱA* between the datasets ⁱX, we propose to use a LISTA-CP-like update for the ⁱS update, within an alternating structure enabling us to also estimate ⁱA*
    • This corresponds to unrolling PALM in a way that is specifically tailored for blind inverse problems

  37. Solving the BLIND source separation problem
    • The way to unroll PALM was chosen according to the previous remarks and experimental trials:

    Learned-PALM (LPALM):
      initialize A and S with a very generic initialization
      for k from 1 to L do:
        S ← 𝒮_{γ⁽ᵏ⁾}(S − W⁽ᵏ⁾ᵀ(AS − X))   (LISTA-CP update)
        A ← Π_{‖·‖≤1}(A + (1/L_A⁽ᵏ⁾)(X − AS)Sᵀ)   (learned step size)
      end for
      return A, S

    • The loss function is chosen as NMSE(A, A*) + NMSE(S, S*) over the training set (a single LPALM layer is sketched in code below).

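A possible PyTorch sketch of one LPALM layer following the scheme above; shapes, initialization, and the projection implementation are assumptions of this sketch.

```python
# One LPALM layer: LISTA-CP update on S, then a learned-step projected update on A.
import torch
import torch.nn as nn

class LPALMLayer(nn.Module):
    def __init__(self, m, n, theta_init=0.1, step_init=0.1):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(m, n))      # LISTA-CP matrix W^(k)
        self.theta = nn.Parameter(torch.tensor(theta_init))  # threshold gamma^(k)
        self.step = nn.Parameter(torch.tensor(step_init))    # step size 1/L_A^(k)

    def forward(self, A, S, X):
        # LISTA-CP update on S: soft-thresholded residual correction
        Z = S - self.W.t() @ (A @ S - X)
        S = torch.sign(Z) * torch.relu(Z.abs() - self.theta)
        # Gradient step on A with a learned step size, then project ||A_j||_2 <= 1
        A = A + self.step * (X - A @ S) @ S.t()
        A = A / A.norm(dim=0, keepdim=True).clamp(min=1.0)
        return A, S
```

Stacking L such layers and training their parameters with the NMSE(A, A*) + NMSE(S, S*) loss gives the LPALM network sketched above.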

  38. Numerical experiments: datasets
    LPALM is tested on astrophysical simulations of the Cassiopeia A supernova remnant, as observed by the Chandra X-ray telescope. There are n = 4 emissions: synchrotron, thermal, and two red-shifted iron components.
    • Test set: X = A*S* + N, with:
      • A* coming from realistic simulations
      • S* being real supernova maps (reshaped into vectors)
      • N generated Gaussian noise
    • Train set: 750 samples of ⁱX = ⁱA* ⁱS* + ⁱN, with:
      • ⁱA* coming from realistic simulations (quite different from the test set)
      • ⁱS* generated using a Bernoulli Generalized-Gaussian distribution
      • ⁱN generated Gaussian noise
    [Figure: the columns of ⁱA* for i = 1..50, each column represented with a different color]

  39. Numerical results: comparison with PALM
    LPALM is compared with PALM, the PALM parameters being optimized over the train set.
    [Figure - blue, plain and dashed lines: median number of iterations for PALM and LPALM, respectively; red, plain and dashed lines: median NMSE for PALM and LPALM, respectively]
    LPALM largely outperforms PALM, both:
    - in terms of separation quality
    - in terms of number of iterations

  40. Numerical results: comparison with other unrolled methods
    [Figure: true sources S* and the estimates of the compared methods]
    LPALM largely outperforms its competitors:
    - LISTA lacks the flexibility to handle varying matrices ⁱA*
    - DNMF suffers from being trained on the reconstruction error only, ‖ⁱX − ⁱAⁱS‖²₂, which is well known to lead to spurious solutions

  41. Conclusion
    • We started by showing the practical difficulty of choosing hyperparameters in optimization-based BSS methods
    • We have empirically shown that using a MAD heuristic for the hyperparameter choice, based on a fixed-point argument, can lead to good results
    • This extends to complex data in the context of continuous target position extraction in SAR imaging
    • To go a step further, and if a training set with ground truth is available, we have shown the interest of using algorithm unrolling to design neural network architectures performing BSS by mimicking the structure of PALM
    Further research paths currently considered:
    • Performing unrolling on other kinds of BSS algorithms
    • Extending LPALM to remote-sensing datasets in which spectral variabilities can occur even within a single dataset X
    • Studying semi-/unsupervised unrolled methods
    • Exploring new methods for SAR target extraction (based on unrolling and/or PnP)