Slide 1

Robust multichannel sparse recovery
Esa Ollila
Department of Signal Processing and Acoustics, Aalto University, Finland
SUPELEC, Feb 4th, 2015

Slide 2

1 Introduction
2 Nonparametric sparse recovery
3 Simulation studies
4 Source localization with Sensor Arrays

Slide 3

Single Measurement Vector Model (SMV)

y = Φx + e, where y is M × 1, Φ is M × N, x is N × 1, and e is M × 1.

Φ = [φ_1 · · · φ_N] = [φ_(1) · · · φ_(M)]^H, the known M × N measurement (design, system) matrix
x = (x_1, . . . , x_N)^⊤, the unobserved signal vector
e = (e_1, . . . , e_M)^⊤, the random noise vector.

Objective: recover the K-sparse (or compressible) signal x from y, knowing Φ and K.

NOTE: typically M < N (ill-posed model) and K ≪ N.

Slide 4

Single Measurement Vector Model (SMV)

y = Φx + e, where y is M × 1, Φ is M × N, x is N × 1, and e is M × 1.

If we can determine Γ = supp(x), the problem reduces to a conventional regression problem (more 'responses' than 'predictors'): y = Φ_Γ x_Γ + e, where Φ_Γ collects the columns of Φ indexed by Γ.

Slide 5

Multiple Measurement Vectors Model (MMV)

y_i = Φ x_i + e_i,  i = 1, . . . , Q

In matrix form: Y = ΦX + E, where Y is M × Q, Φ is M × N, X is N × Q, and E is M × Q.

Y = [y_1 · · · y_Q], observed measurement matrix
X = [x_1 · · · x_Q], unobserved signal matrix
E = [e_1 · · · e_Q], noise matrix

Objective: recover the K-sparse (or compressible) signal matrix X knowing Y, Φ, K.

The N × Q matrix X is called K-sparse if ‖X‖_0 = |supp(X)| ≤ K, where
Γ = supp(X) = ∪_{i=1}^{Q} supp(x_i) = {j ∈ {1, . . . , N} : x_jk ≠ 0 for some k}.

Slide 6

Multiple Measurement Vectors Model (MMV)

Key idea: the signals are all predominantly supported on a common support set
=⇒ joint estimation can lead both to computational advantages and to increased reconstruction accuracy.
See [Tropp, 2006, Chen and Huo, 2006, Eldar and Rauhut, 2010, Duarte and Eldar, 2011, Blanchard et al., 2014].

Applications
- EEG/MEG [Gorodnitsky et al., 1995]
- equalization of sparse communications channels [Cotter and Rao, 2002]
- blind source separation [Gribonval and Zibulevsky, 2010]
- source localization using sensor arrays [Malioutov et al., 2005]
- . . .

Slide 7

Matrix norms

The mixed ℓ_{p,q} norm of X for p, q ∈ [1, ∞):
‖X‖_{p,q} = ( Σ_i ( Σ_j |x_ij|^p )^{q/p} )^{1/q} = ( Σ_i ‖x_(i)‖_p^q )^{1/q},
where x_(i) denotes the i-th row of X.

If p = q, the matrix p-norm ‖X‖_{p,p} is written ‖X‖_p. The ℓ_2 (Frobenius) norm ‖·‖_2 is denoted shortly as ‖·‖.

The mixed ℓ_∞-norms of X ∈ C^{N×Q}:
‖X‖_{p,∞} = max_{i ∈ {1,...,N}} ‖x_(i)‖_p
‖X‖_{∞,q} = ( Σ_i ( max_{j ∈ {1,...,Q}} |x_ij| )^q )^{1/q}

The row-ℓ_0 quasi-norm of X:
‖X‖_0 = |supp(X)| = number of nonzero rows of X.
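
As a quick illustration, a minimal NumPy sketch of these norms (the function names are ours, not from the talk):

import numpy as np

def mixed_norm(X, p, q):
    # l_{p,q}: q-norm of the vector of row-wise p-norms, ( sum_i ||x_(i)||_p^q )^(1/q)
    row_p_norms = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    return float(np.sum(row_p_norms ** q) ** (1.0 / q))

def row_l0(X):
    # row l_0 quasi-norm ||X||_0 = number of nonzero rows = |supp(X)|
    return int(np.count_nonzero(np.linalg.norm(X, axis=1)))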

Slide 8

Optimization problem

min_X ‖Y − ΦX‖^2 subject to ‖X‖_0 ≤ K

is combinatorial (NP-hard). More feasible approaches:
- Optimization (convex relaxation)
- Greedy pursuit: Orthogonal Matching Pursuit (OMP), Normalized Iterative Hard Thresholding (NIHT), Normalized Hard Thresholding Pursuit (NHTP), Compressive Sampling MP (CoSaMP)

These are guaranteed to perform very well under suitable conditions (restricted isometry property on Φ, non-impulsive noise e). NIHT is simple and scalable, and convenient for problems where the support is of primary interest.

Slide 10

Convex relaxation

IDEA: replace ‖X‖_0 by an ℓ_{p,1} norm.

[Tropp, 2006] uses p = ∞:
min_X ‖Y − ΦX‖^2 + λ‖X‖_{∞,1}, where ‖X‖_{∞,1} = Σ_i ‖x_(i)‖_∞.

[Malioutov et al., 2005] uses p = 2:
min_X ‖Y − ΦX‖^2 + λ‖X‖_{2,1}, where ‖X‖_{2,1} = Σ_i ‖x_(i)‖.

- good recovery depends on the choice of the regularization parameter λ > 0
- not robust due to the data fidelity term used (LS-loss)
- not as scalable as greedy methods
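
For concreteness, a sketch of one standard way to solve the ℓ_{2,1}-regularized problem above by proximal gradient (ISTA); this is an illustration under the usual 1/2-scaling of the LS term (which only rescales λ), not the authors' implementation:

import numpy as np

def row_soft_threshold(X, tau):
    # prox of tau*||.||_{2,1}: shrink each row towards zero by tau in l_2 length
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))

def ista_l21(Y, Phi, lam, n_iter=500):
    # minimizes 0.5*||Y - Phi X||^2 + lam*||X||_{2,1}
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2)   # 1/L, L = Lipschitz constant of the gradient
    X = np.zeros((Phi.shape[1], Y.shape[1]), dtype=complex)
    for _ in range(n_iter):
        grad = Phi.conj().T @ (Phi @ X - Y)      # gradient of 0.5*||Y - Phi X||^2
        X = row_soft_threshold(X - step * grad, step * lam)
    return X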

Slide 11

1 Introduction
2 Nonparametric sparse recovery
3 Simulation studies
4 Source localization with Sensor Arrays

Slide 12

Nonparametric approach based on mixed norms

min_X ‖Y − ΦX‖_{p,q}^q (data fidelity) subject to ‖X‖_0 ≤ K (sparsity constraint)    (P_{p,q})

=⇒ Mixed ℓ_1 norms provide robustness, i.e., they give larger weight to small residuals and less weight to large residuals.

We consider the cases
(p, q) = (1, 1): ‖Y − ΦX‖_1 = Σ_i Σ_j |y_ij − φ_(i)^H x_j|, a robust norm for errors i.i.d. in space and time
(p, q) = (2, 1): ‖Y − ΦX‖_{2,1} = Σ_i ‖y_(i) − X^H φ_(i)‖, a robust norm for errors i.i.d. in "space" only

and devise a greedy simultaneous NIHT (SNIHT) algorithm.

Slide 13

Normalized iterative hard thresholding (NIHT)

Conventional ℓ_{2,2} approach:
X^{n+1} = H_K( X^n + µ^{n+1} Φ^H (Y − ΦX^n) ),
where H_K(·) is the hard thresholding operator, µ^{n+1} is the step size, and Φ^H(Y − ΦX^n) is the negative gradient ∇_{X*} ‖·‖^2 of the LS data fidelity.

Slide 15

Normalized iterative hard thresholding (NIHT)

In our mixed-norm ℓ_{p,q} approach:
X^{n+1} = H_K( X^n + µ^{n+1} Φ^H ψ_{p,q}(Y − ΦX^n) ),
where H_K(·) is the hard thresholding operator, µ^{n+1} is the step size, and Φ^H ψ_{p,q}(Y − ΦX^n) is the negative gradient ∇_{X*} ‖·‖_{p,q}^q of the data fidelity.

ψ_{2,2}(X) = X  =⇒ conventional ℓ_{2,2} approach
ψ_{1,1}(X) = [sign(x_ij)]  (elementwise operation of sign(·))
ψ_{2,1}(X) = [sign(x_(i))]  (row-wise operation of sign(·))

Spatial sign: sign(x) = x/‖x‖ for x ≠ 0, and 0 for x = 0.
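
A minimal sketch of how the ψ_{p,q} mappings above could be implemented for a residual matrix R (assumed NumPy implementation; the small eps guard is ours):

import numpy as np

def psi(R, p, q, eps=1e-12):
    if (p, q) == (2, 2):
        return R                                       # LS case: identity
    if (p, q) == (1, 1):
        return R / np.maximum(np.abs(R), eps)          # elementwise complex signs
    if (p, q) == (2, 1):
        row_norms = np.linalg.norm(R, axis=1, keepdims=True)
        return R / np.maximum(row_norms, eps)          # row-wise spatial signs
    raise ValueError("unsupported (p, q)")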

Slide 17

SNIHT(p, q) algorithm

input: Y, Φ, sparsity K, mixed-norm indices (p, q)
output: (X^{n+1}, Γ^{n+1}) as estimates of X and Γ = supp(X)
initialize: X^0 = 0, µ^0 = 0, n = 0, Γ^0 = ∅

1  Γ^0 = supp( H_K( Φ^H ψ_{p,q}(Y) ) )
   while halting criterion false do
2    R^n_ψ = ψ_{p,q}(Y − ΦX^n)
3    G^n = Φ^H R^n_ψ
4    µ^{n+1} = CompStepsize(Φ, G^n, Γ^n, µ^n, p, q)
5    X^{n+1} = H_K(X^n + µ^{n+1} G^n)
6    Γ^{n+1} = supp(X^{n+1})
7    n = n + 1
   end
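
A compact, self-contained sketch of the SNIHT(p, q) loop above (not the authors' code); the adaptive CompStepsize routine is replaced here by a crude fixed step, with the step-size rules actually used given on the next slides:

import numpy as np

def _psi(R, p, q, eps=1e-12):
    if (p, q) == (2, 2):
        return R
    if (p, q) == (1, 1):
        return R / np.maximum(np.abs(R), eps)
    if (p, q) == (2, 1):
        return R / np.maximum(np.linalg.norm(R, axis=1, keepdims=True), eps)
    raise ValueError("unsupported (p, q)")

def _hard_threshold_rows(X, K):
    # H_K: keep the K rows of largest l_2 norm, zero out the rest
    keep = np.argsort(np.linalg.norm(X, axis=1))[-K:]
    out = np.zeros_like(X)
    out[keep] = X[keep]
    return out, np.sort(keep)

def sniht(Y, Phi, K, p=2, q=1, n_iter=300):
    N, Q = Phi.shape[1], Y.shape[1]
    X = np.zeros((N, Q), dtype=complex)
    mu = 1.0 / (np.linalg.norm(Phi, 2) ** 2)        # fixed surrogate for the adaptive step
    support = np.array([], dtype=int)
    for _ in range(n_iter):
        G = Phi.conj().T @ _psi(Y - Phi @ X, p, q)  # negative (pseudo-)gradient
        X, support = _hard_threshold_rows(X + mu * G, K)
    return X, support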

Slide 18

Computation of the step size µ^{n+1}

For convergence, it is important that the step size is adaptively controlled at each iteration.

Given that the found support Γ^n is correct, one can choose µ^{n+1} as the minimizer of the objective function along the update direction X^n + µ G^n|_{Γ^n}:
L(µ) = ‖Y − Φ(X^n + µ G^n|_{Γ^n})‖_{p,q}^q = ‖R^n − µB‖_{p,q}^q,
where R^n = Y − ΦX^n and B = Φ G^n_(Γ^n).

For p = q, this is a simple linear regression problem based on the ℓ_p-norm, with response r = vec(R^n) and predictor b = vec(B).
⇒ µ^{n+1} = ‖G^n_(Γ^n)‖^2 / ‖Φ G^n_(Γ^n)‖^2 for p = q = 2.

For p = q = 1 and (p, q) = (2, 1), the minimizer of L(µ) cannot be found in closed form.
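
A sketch of the closed-form p = q = 2 step above (assumed implementation; G is the full gradient matrix and support is the current row support):

import numpy as np

def stepsize_l2(G, Phi, support):
    # mu = ||G_Gamma||^2 / ||Phi G_Gamma||^2 with G restricted to the support rows
    G_s = np.zeros_like(G)
    G_s[support] = G[support]
    return np.linalg.norm(G_s) ** 2 / max(np.linalg.norm(Phi @ G_s) ** 2, 1e-12)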

Slide 19

Computation of the step size µ^{n+1}

When p = q = 1, the solution µ satisfies a fixed-point (FP) equation µ = H(µ), where
H(µ) = ( Σ_{i,j} |r̃_ij|^{-1} |b_ij|^2 )^{-1} Σ_{i,j} |r̃_ij|^{-1} Re(b*_ij r_ij),
with R̃ = R^n − µB (which depends on the unknown µ) and B = Φ G^n_(Γ^n).

We select µ^{n+1} as the one-step FP iterate µ^{n+1} = H(µ^n), where µ^n is the step size of the previous (nth) iteration. This approximation was often within ~1e−3 of the minimizer of L(µ). The same approach is used in the case (p, q) = (2, 1).
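
A sketch of the one-step fixed-point update µ^{n+1} = H(µ^n) above for p = q = 1 (assumed implementation; Rn = Y − ΦX^n and B = Φ G^n restricted to the support):

import numpy as np

def fp_stepsize_l1(Rn, B, mu_prev, eps=1e-12):
    Rtil = Rn - mu_prev * B                       # residual evaluated at the previous step size
    w = 1.0 / np.maximum(np.abs(Rtil), eps)       # IRLS-type weights 1/|r~_ij|
    num = np.sum(w * np.real(np.conj(B) * Rn))
    den = np.sum(w * np.abs(B) ** 2)
    return float(num / max(den, eps))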

Slide 20

1 Introduction
2 Nonparametric sparse recovery
3 Simulation studies
4 Source localization with Sensor Arrays

Slide 21

Simulation study

Y = ΦX + E, where Y is M × Q, Φ is M × N, X is N × Q, and E is M × Q.

Set-up
- Measurement matrix Φ: φ_ij ∼ CN(0, 1), with unit-norm columns
- Γ = supp(X): K elements drawn at random from {1, . . . , N}
- Signal matrix X: |x_ij| = 10 and Arg(x_ij) ∼ Unif(0, 2π) for all i ∈ Γ
- For ℓ = 1, . . . , L Monte Carlo trials (L = 1000), compute the estimates X̂^[ℓ] and Γ̂^[ℓ]

Performance measures
- mean squared error MSE(X̂) = (1/(LQ)) Σ_{ℓ=1}^{L} ‖X̂^[ℓ] − X^[ℓ]‖^2
- probability of exact recovery PER = (1/L) Σ_{ℓ=1}^{L} I(Γ̂^[ℓ] = Γ^[ℓ])
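
A sketch of the data-generation step described above (the problem sizes here are illustrative, not those of the talk):

import numpy as np
rng = np.random.default_rng(0)

M, N, Q, K = 32, 128, 16, 4
Phi = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
Phi /= np.linalg.norm(Phi, axis=0)                    # unit-norm columns
Gamma = rng.choice(N, size=K, replace=False)          # true support Gamma = supp(X)
X = np.zeros((N, Q), dtype=complex)
X[Gamma] = 10.0 * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(K, Q)))
sigma = 1.0                                           # noise scale (sets the SNR)
E = sigma * (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))) / np.sqrt(2)
Y = Phi @ X + E
# Given an estimate (Xhat, Gammahat) from a recovery algorithm:
#   per-trial squared error:   np.linalg.norm(Xhat - X) ** 2 / Q
#   exact-recovery indicator:  set(Gammahat) == set(Gamma.tolist())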

Slide 22

IID Gaussian noise, e_ij ∼ CN(0, σ²)

[Figures: MSE (dB) vs SNR (dB) for Q = 16; PER vs Q (number of vectors) at SNR = 5 dB. Methods: SNIHT(2, 2), SNIHT(2, 1), SNIHT(1, 1), HUB-SNIHT.]

SNIHT(2, 2) is the best here, but SNIHT(2, 1) and HUB-SNIHT (threshold c = 1.517) lose only 0.09 dB in MSE (Q = 16).

Slide 23

IID t-distributed noise, e_ij ∼ Ct_ν(0, σ²), σ² = Med_F(|e_i|²)

[Figures: MSE (dB) vs ν (degrees of freedom) at SNR(σ) = 30 dB; PER vs Q (number of vectors) at SNR(σ) = 10 dB, ν = 3. Methods: SNIHT(2, 2), SNIHT(2, 1), SNIHT(1, 1), HUB-SNIHT.]

⇒ SNIHT(1, 1) performs best.

Slide 24

Row-IID compound Gaussian noise with inverse Gaussian texture, e_(i) ∼ CIG_λ(0, σ²I_Q)

[Figure: MSE (dB) vs SNR (dB). Methods: SNIHT(2, 2), SNIHT(2, 1), SNIHT(1, 1), HUB-SNIHT.]

⇒ SNIHT(2, 1) performs best.

Slide 25

1 Introduction
2 Nonparametric sparse recovery
3 Simulation studies
4 Source localization with Sensor Arrays

Slide 26

Sensor array and narrowband, farfield source model

M sensors, K sources (M > K). Model for the sensor measurements at time instant t:
y(t) = A(θ)x(t) + e(t),
- steering matrix A(θ) = [a(θ_1) . . . a(θ_K)]
- unknown (distinct) directions of arrival (DOAs) θ_1, . . . , θ_K
- known array manifold a(θ) (e.g., ULA)
- unknown signal waveforms x(t) = (x_1(t), . . . , x_K(t))^⊤

Objective: estimate the DOAs of the K sources impinging on the sensor array, given the snapshots y(t_1), . . . , y(t_Q) and the number of sources K.

Slide 27

Multichannel sparse recovery approach to source localization

Construct an M × N overcomplete steering matrix A(θ̃) for a grid θ̃_1, . . . , θ̃_N of all source locations of interest, and suppose that θ̃ contains the true DOAs to some accuracy. The array model in matrix form then becomes
Y = [y(t_1) · · · y(t_Q)] = A(θ̃)X + E,
where X ∈ C^{N×Q} is K-sparse, with the K nonzero rows containing the source waveforms at time instants t_1, . . . , t_Q.

=⇒ a K-sparse MMV model: finding the DOAs is equivalent to identifying Γ = supp(X).

In the multichannel sparse recovery formulation, A(θ̃) is completely known, and we can use the SNIHT methods to identify the support.
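
A sketch of constructing the overcomplete ULA steering matrix A(θ̃) on a grid (one common phase convention is assumed here; the function name is illustrative):

import numpy as np

def ula_steering_matrix(M, theta_grid_deg, d_over_lambda=0.5):
    theta = np.deg2rad(np.asarray(theta_grid_deg))   # grid of candidate DOAs
    m = np.arange(M)[:, None]                        # sensor indices 0..M-1
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta)[None, :])

theta_grid = np.arange(-90, 90 + 1, 2)      # 2-degree grid as in the experiments
A = ula_steering_matrix(20, theta_grid)     # M = 20 sensors, half-wavelength spacing
# Y = A X + E is then a K-row-sparse MMV model; the nonzero rows of X give the DOAs.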

Slide 31

Simulation set-up

- ULA of M = 20 sensors with half-wavelength interelement spacing
- K = 2 independent Gaussian sources at θ_1 = 0° and θ_2 = 8°
- temporally and spatially white Gaussian error terms
- low SNR = 0 dB
- uniform grid θ̃ on [−90°, 90°] with 2° spacing

We compute PER rates and the relative frequencies of the found DOA estimates on the grid over 1000 Monte Carlo runs.

Slide 32

Number of snapshots Q = 4; DOAs at θ_1 = 0° and θ_2 = 8°.

[Figure: relative frequency of the found DOA estimates vs DOA θ in degrees (2° sampling grid), for SNIHT(2, 2), SNIHT(1, 1), SNIHT(2, 1).]

PER rates
1. SNIHT(1, 1) = 81%
2. SNIHT(2, 1) = 73.1%
3. HUB-SNIHT = 42.6%
4. SNIHT = 37.6%

⇒ SNIHT(1, 1) performs best.

Slide 33

Number of snapshots Q = 40; DOAs at θ_1 = 0° and θ_2 = 8°.

[Figure: relative frequency of the found DOA estimates vs DOA θ in degrees (2° sampling grid), for SNIHT(2, 2), SNIHT(1, 1), SNIHT(2, 1).]

PER rates
1. SNIHT(1, 1) = 100%
2. SNIHT(2, 1) = 73.1%
3. HUB-SNIHT = 42.6%
4. SNIHT = 37.6%

⇒ SNIHT(1, 1) performs best.

Slide 37

Conclusions

- Robust and nonparametric approaches for simultaneous sparse recovery.
- Different methods, SNIHT(1, 1), SNIHT(2, 1), HUB-SNIHT, can be selected for the problem at hand (i.i.d./correlated noise, high breakdown, etc.).
- Fast and scalable greedy SNIHT algorithm: the increase in robustness does not imply an increase in computation time!
- Applications in source localization, image denoising, etc.

Slide 38

This talk is based on

E. Ollila (2015a). Nonparametric Simultaneous Sparse Recovery: an Application to Source Localization. EUSIPCO'15, submitted. PDF (arXiv): http://arxiv.org/pdf/1502.02441v1

E. Ollila (2015b). Robust Simultaneous Sparse Approximation. In: A Festschrift in Honor of Hannu Oja, Springer, Sept. 2015.

Slide 39

Blanchard, J. D., Cermak, M., Hanle, D., and Jin, Y. (2014). Greedy algorithms for joint sparse recovery. IEEE Trans. Signal Processing, 62(7).

Chen, J. and Huo, X. (2006). Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Processing, 54(12):4634–4643.

Cotter, S. and Rao, B. (2002). Sparse channel estimation via matching pursuit with application to equalization. IEEE Trans. Comm., 50(3):374–377.

Duarte, M. F. and Eldar, Y. C. (2011). Structured compressed sensing: From theory to applications. IEEE Trans. Signal Processing, 59(9):4053–4085.

Eldar, Y. C. and Rauhut, H. (2010). Average case analysis of multichannel sparse recovery using convex relaxation. IEEE Trans. Inform. Theory, 56(1):505–519.

Gorodnitsky, I., George, J., and Rao, B. (1995). Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm. J. Electroencephalogr. Clin. Neurophysiol., 95(4):231–251.

Slide 40

Gribonval, R. and Zibulevsky, M. (2010). Handbook of Blind Source Separation, chapter Sparse component analysis, pages 367–420. Academic Press, Oxford, UK.

Malioutov, D., Çetin, M., and Willsky, A. S. (2005). A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Processing, 53(8):3010–3022.

Tropp, J. A. (2006). Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Signal Processing, 86:589–602.