
S³ Seminar
February 03, 2015

Esa Ollila

(Aalto University, Finland)

https://s3-seminar.github.io/seminars/esa-ollila

Title — Robust approaches to multichannel sparse recovery

Abstract — We consider the multichannel sparse recovery problem, where the objective is to recover jointly sparse unknown signal vectors from multiple measurement vectors that are different linear combinations of the same known elementary vectors (atoms). The model is thus an extension of the single measurement vector setting used in compressed sensing (CS). Many popular greedy or convex algorithms proposed for the multichannel sparse recovery problem perform poorly under non-Gaussian heavy-tailed noise conditions or in the face of outliers (gross errors), i.e., they are not robust. In this talk, we consider different types of mixed robust norms on the data fidelity (residual matrix) term together with the conventional ℓ0-norm constraint on the signal matrix to promote row-sparsity. We devise algorithms based on normalized iterative hard thresholding (Blumensath and Davies, 2010), which is a simple, computationally efficient and scalable approach for solving the simultaneous sparse approximation problem. A performance assessment conducted on simulated data highlights the effectiveness of the proposed approaches in coping with different noise environments (i.i.d., row-i.i.d., etc.) and outliers. The usefulness of the methods is illustrated in an image denoising problem and in a source localization application with sensor arrays. Finally (if time permits), a (non-robust) Bayesian perspective on the multichannel recovery problem is discussed as well.

Biography — Esa Ollila received the M.Sc. degree in mathematics from the University of Oulu in 1998, the Ph.D. degree in statistics with honors from the University of Jyväskylä in 2002, and the D.Sc. (Tech) degree with honors in signal processing from Aalto University in 2010. From 2004 to 2007 he was a post-doctoral fellow of the Academy of Finland. He has also been a Senior Researcher at Aalto University and a Senior Lecturer at the University of Oulu. Since August 2010 he has been an Academy Research Fellow of the Academy of Finland at the Department of Signal Processing and Acoustics, Aalto University, Finland. He is also an adjunct professor (statistics) at the University of Oulu. During the fall term of 2001 he was a Visiting Researcher with the Department of Statistics, Pennsylvania State University, State College, PA, and he spent the academic year 2010-2011 as a Visiting Research Associate with the Department of Electrical Engineering, Princeton University, Princeton, NJ. His research interests focus on the theory and methods of statistical signal processing, blind source separation, complex-valued signal processing, array and radar signal processing, and robust and non-parametric statistical methods.


Transcript

  1. Robust multichannel sparse recovery. Esa Ollila, Department of Signal
    Processing and Acoustics, Aalto University, Finland. SUPELEC, Feb 4th, 2015
  2. 1 Introduction 2 Nonparametric sparse recovery 3 Simulation studies 4

    Source localization with Sensor Arrays
  3. Single Measurement Vector Model (SMV)

    $\underset{M \times 1}{\mathbf{y}} = \underset{M \times N}{\boldsymbol{\Phi}}\,\underset{N \times 1}{\mathbf{x}} + \underset{M \times 1}{\mathbf{e}}$

    $\boldsymbol{\Phi} = \begin{pmatrix} \boldsymbol{\phi}_1 & \cdots & \boldsymbol{\phi}_N \end{pmatrix} = \begin{pmatrix} \boldsymbol{\phi}_{(1)} & \cdots & \boldsymbol{\phi}_{(M)} \end{pmatrix}^{\mathsf{H}}$, the known M × N measurement (design, system) matrix
    $\mathbf{x} = (x_1, \ldots, x_N)^\top$, unobserved signal vector
    $\mathbf{e} = (e_1, \ldots, e_M)^\top$, random noise vector.

    Objective: recover the K-sparse or compressible signal x from y, knowing Φ and K.
    NOTE: typically M < N (ill-posed model) and K ≪ N.
  4. Single Measurement Vector Model (SMV)

    $\underset{M \times 1}{\mathbf{y}} = \underset{M \times N}{\boldsymbol{\Phi}}\,\underset{N \times 1}{\mathbf{x}} + \underset{M \times 1}{\mathbf{e}}$

    If we can determine Γ = supp(x), then the problem reduces to a conventional
    regression problem (more 'responses' than 'predictors').
    [Figure: the reduced system y = Φ_Γ x_Γ + e]
  5. Multiple Measurement Vectors Model (MMV)

    $\mathbf{y}_i = \boldsymbol{\Phi}\mathbf{x}_i + \mathbf{e}_i, \quad i = 1, \ldots, Q$

    In matrix form: $\underset{M \times Q}{\mathbf{Y}} = \underset{M \times N}{\boldsymbol{\Phi}}\,\underset{N \times Q}{\mathbf{X}} + \underset{M \times Q}{\mathbf{E}}$

    $\mathbf{Y} = (\mathbf{y}_1 \ \cdots \ \mathbf{y}_Q)$, observed measurement matrix
    $\mathbf{X} = (\mathbf{x}_1 \ \cdots \ \mathbf{x}_Q)$, unobserved signal matrix
    $\mathbf{E} = (\mathbf{e}_1 \ \cdots \ \mathbf{e}_Q)$, noise matrix

    Objective: recover the K-sparse (or compressible) signal matrix X knowing Y, Φ, K.
    An N × Q matrix X is called K-sparse if $\|\mathbf{X}\|_0 = |\mathrm{supp}(\mathbf{X})| \le K$, where
    $\Gamma = \mathrm{supp}(\mathbf{X}) = \bigcup_{i=1}^{Q} \mathrm{supp}(\mathbf{x}_i) = \{ j \in \{1, \ldots, N\} : x_{jk} \neq 0 \text{ for some } k \}$
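A minimal NumPy sketch of the MMV data model and of the row-support definition above; the sizes M, N, Q, K and the noise level are arbitrary illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, Q, K = 32, 128, 16, 4

# Complex Gaussian measurement matrix with unit-norm columns
Phi = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
Phi /= np.linalg.norm(Phi, axis=0)

# K-rowsparse signal matrix: the same K nonzero rows shared by all Q columns
Gamma = rng.choice(N, size=K, replace=False)
X = np.zeros((N, Q), dtype=complex)
X[Gamma] = rng.standard_normal((K, Q)) + 1j * rng.standard_normal((K, Q))

E = 0.1 * (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q)))
Y = Phi @ X + E                       # the MMV model Y = Phi X + E

# Row support: indices j with x_{jk} != 0 for some k
supp = np.flatnonzero(np.any(X != 0, axis=1))
assert set(supp) == set(Gamma)
```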
  6. Multiple Measurement Vectors Model (MMV)

    Key idea: the signals are all predominantly supported on a common support set
    ⇒ joint estimation can lead both to computational advantages and increased
    reconstruction accuracy. See [Tropp, 2006, Chen and Huo, 2006, Eldar and
    Rauhut, 2010, Duarte and Eldar, 2011, Blanchard et al., 2014].

    Applications:
    EEG/MEG [Gorodnitsky et al., 1995]
    equalization of sparse communication channels [Cotter and Rao, 2002]
    blind source separation [Gribonval and Zibulevsky, 2010]
    source localization using sensor arrays [Malioutov et al., 2005]
    ...
  7. Matrix norms

    The mixed ℓ_{p,q} norm of X for p, q ∈ [1, ∞):
    $\|\mathbf{X}\|_{p,q} = \Big( \sum_i \Big( \sum_j |x_{ij}|^p \Big)^{q/p} \Big)^{1/q} = \Big( \sum_i \|\mathbf{x}_{(i)}\|_p^q \Big)^{1/q}.$

    If p = q, we obtain the matrix p-norm $\|\mathbf{X}\|_{p,p} = \|\mathbf{X}\|_p$. The ℓ_2 (Frobenius) norm $\|\cdot\|_2$ is denoted shortly as $\|\cdot\|$.

    The mixed ℓ_∞ norms of X ∈ C^{N×Q}:
    $\|\mathbf{X}\|_{p,\infty} = \max_{i \in \{1,\ldots,N\}} \|\mathbf{x}_{(i)}\|_p$
    $\|\mathbf{X}\|_{\infty,q} = \big( \sum_i (\max_{j \in \{1,\ldots,Q\}} |x_{ij}|)^q \big)^{1/q}$

    The row-ℓ_0 quasi-norm of X:
    $\|\mathbf{X}\|_0 = |\mathrm{supp}(\mathbf{X})|$ = number of nonzero rows of X
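The mixed norms above are computed row-wise; a small sketch (the helper name mixed_norm is ours, not from the talk):

```python
import numpy as np

def mixed_norm(X, p, q):
    """Mixed l_{p,q} norm: l_p norm of each row, then l_q norm across rows."""
    row_norms = np.linalg.norm(X, ord=p, axis=1)   # ||x_(i)||_p for each row i
    return np.linalg.norm(row_norms, ord=q)

X = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(mixed_norm(X, 2, 1))                         # ||(5, 0, 1)||_1 = 6.0
print(np.count_nonzero(np.any(X != 0, axis=1)))    # row-l0 quasi-norm = 2
```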
  8. Optimization problem

    $\min_{\mathbf{X}} \|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}\|^2 \quad \text{subject to} \quad \|\mathbf{X}\|_0 \le K$

    is combinatorial (NP-hard). More feasible approaches:
    Optimization (convex relaxation)
    Greedy pursuit: Orthogonal Matching Pursuit (OMP), Normalized Iterative Hard
    Thresholding (NIHT), Normalized Hard Thresholding Pursuit (NHTP), Compressive
    Sampling MP (CoSaMP)
    Guaranteed to perform very well under suitable conditions (restricted isometry
    property on Φ, non-impulsive noise e).
    NIHT is simple and scalable, convenient for problems where the support is of
    primary interest.
  10. Convex relaxation

    IDEA: replace $\|\mathbf{X}\|_0$ by an ℓ_{p,1} norm.
    [Tropp, 2006] uses p = ∞:
    $\min_{\mathbf{X}} \|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}\|^2 + \lambda \|\mathbf{X}\|_{\infty,1}$, where $\|\mathbf{X}\|_{\infty,1} = \sum_i \|\mathbf{x}_{(i)}\|_\infty$.
    [Malioutov et al., 2005] uses p = 2:
    $\min_{\mathbf{X}} \|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}\|^2 + \lambda \|\mathbf{X}\|_{2,1}$, where $\|\mathbf{X}\|_{2,1} = \sum_i \|\mathbf{x}_{(i)}\|$.
    Good recovery depends on the choice of the regularization parameter λ > 0;
    not robust due to the data fidelity term used (LS loss);
    not as scalable as greedy methods.
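The talk does not give an algorithm for these penalized problems, but for context, here is a standard proximal-gradient (ISTA) sketch for the ℓ_{2,1}-penalized formulation of [Malioutov et al., 2005]; the proximal operator of the ℓ_{2,1} norm is row-wise soft thresholding. This is a generic textbook method, not the method of the talk.

```python
import numpy as np

def row_soft_threshold(X, tau):
    """Prox of tau * ||.||_{2,1}: shrink each row's l2 norm by tau."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

def ista_l21(Y, Phi, lam, n_iter=500):
    """min_X ||Y - Phi X||^2 + lam * ||X||_{2,1} by proximal gradient."""
    L = np.linalg.norm(Phi, 2) ** 2            # squared spectral norm of Phi
    X = np.zeros((Phi.shape[1], Y.shape[1]), dtype=Y.dtype)
    for _ in range(n_iter):
        G = Phi.conj().T @ (Y - Phi @ X)       # negative gradient of LS term
        X = row_soft_threshold(X + G / L, lam / (2 * L))
    return X
```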
  11. 1 Introduction 2 Nonparametric sparse recovery 3 Simulation studies 4

    Source localization with Sensor Arrays
  12. Nonparametric approach based on mixed norms

    $\min_{\mathbf{X}} \underbrace{\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}\|_{p,q}^q}_{\text{data fidelity}} \quad \text{subject to} \quad \underbrace{\|\mathbf{X}\|_0 \le K}_{\text{sparsity constraint}} \qquad (P_{p,q})$

    ⇒ mixed ℓ_1 norms provide robustness, i.e., they give larger weight to small
    residuals and less weight to large residuals. We consider the cases
    (p, q) = (1, 1): $\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}\|_1 = \sum_i \sum_j |y_{ij} - \boldsymbol{\phi}_{(i)}^{\mathsf{H}} \mathbf{x}_j|$, a robust norm for errors i.i.d. in space and time;
    (p, q) = (2, 1): $\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}\|_{2,1} = \sum_i \|\mathbf{y}_{(i)} - \mathbf{X}^{\mathsf{H}} \boldsymbol{\phi}_{(i)}\|$, a robust norm for errors i.i.d. in "space" only,
    and devise a greedy simultaneous NIHT (SNIHT) algorithm.
  13. Normalized iterative hard thresholding (NIHT)

    Conventional ℓ_{2,2} approach:
    $\mathbf{X}^{n+1} = \underbrace{H_K}_{\text{hard thresholding}} \Big( \mathbf{X}^n + \underbrace{\mu^{n+1}}_{\text{stepsize}} \underbrace{\boldsymbol{\Phi}^{\mathsf{H}}(\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^n)}_{\text{neg. gradient } \nabla_{\mathbf{X}^*} \|\cdot\|^2} \Big)$
  14. Normalized iterative hard thresholding (NIHT)

    In our mixed norm ℓ_{p,q} approach:
    $\mathbf{X}^{n+1} = \underbrace{H_K}_{\text{hard thresholding}} \Big( \mathbf{X}^n + \underbrace{\mu^{n+1}}_{\text{stepsize}} \underbrace{\boldsymbol{\Phi}^{\mathsf{H}} \psi_{p,q}(\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^n)}_{\text{neg. gradient } \nabla_{\mathbf{X}^*} \|\cdot\|_{p,q}^q} \Big)$

    $\psi_{2,2}(\mathbf{X}) = \mathbf{X}$  ⇒ the conventional ℓ_{2,2} approach
    $\psi_{1,1}(\mathbf{X}) = [\mathrm{sign}(x_{ij})]$  (elementwise operation of sign(·))
    $\psi_{2,1}(\mathbf{X}) = [\mathrm{sign}(\mathbf{x}_{(i)})]$  (rowwise operation of sign(·))

    Spatial sign: $\mathrm{sign}(\mathbf{x}) = \begin{cases} \mathbf{x}/\|\mathbf{x}\|, & \mathbf{x} \neq \mathbf{0} \\ \mathbf{0}, & \mathbf{x} = \mathbf{0} \end{cases}$
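A NumPy sketch of the three ψ_{p,q} score functions above; the small epsilon guarding against division by zero is an implementation detail, not from the slides.

```python
import numpy as np

def psi(R, p, q):
    """psi_{p,q} applied to the residual matrix R (see slide 14)."""
    if (p, q) == (2, 2):
        return R                                      # LS loss: identity score
    if (p, q) == (1, 1):
        absR = np.abs(R)                              # elementwise complex sign
        return np.where(absR > 0, R / np.maximum(absR, 1e-32), 0)
    if (p, q) == (2, 1):
        norms = np.linalg.norm(R, axis=1, keepdims=True)  # rowwise spatial sign
        return np.where(norms > 0, R / np.maximum(norms, 1e-32), 0)
    raise ValueError("unsupported (p, q)")
```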
  17. SNIHT(p, q) algorithm

    input     : Y, Φ, sparsity K, mixed norm indices (p, q)
    output    : (X^(n+1), Γ^(n+1)) as estimates of X and Γ = supp(X)
    initialize: X^0 = 0, µ^0 = 0, n = 0, Γ^0 = ∅
    1  Γ^0 = supp(H_K(Φ^H ψ_{p,q}(Y)))
    while halting criterion false do
    2    R^n_ψ = ψ_{p,q}(Y − Φ X^n)
    3    G^n = Φ^H R^n_ψ
    4    µ^(n+1) = CompStepsize(Φ, G^n, Γ^n, µ^n, p, q)
    5    X^(n+1) = H_K(X^n + µ^(n+1) G^n)
    6    Γ^(n+1) = supp(X^(n+1))
    7    n = n + 1
    end
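A compact NumPy sketch of the SNIHT(p, q) loop above. It reuses psi() from the previous snippet; for brevity, CompStepsize is replaced by the closed-form ℓ_{2,2} step (slide 18) used as an approximation for every (p, q), whereas the talk's robust step-size rule follows on the next slides. The fixed iteration count stands in for the halting criterion.

```python
import numpy as np

def hard_threshold_rows(X, K):
    """H_K: keep the K rows of X with largest l2 norm, zero the rest."""
    out = np.zeros_like(X)
    keep = np.argsort(np.linalg.norm(X, axis=1))[-K:]
    out[keep] = X[keep]
    return out

def sniht(Y, Phi, K, p=2, q=2, n_iter=100):
    """Sketch of SNIHT(p, q); the support is tracked as a boolean row mask."""
    N, Q = Phi.shape[1], Y.shape[1]
    X = np.zeros((N, Q), dtype=complex)
    G0 = hard_threshold_rows(Phi.conj().T @ psi(Y, p, q), K)
    Gamma = np.any(G0 != 0, axis=1)                  # step 1: initial support
    for _ in range(n_iter):
        R = psi(Y - Phi @ X, p, q)                   # step 2: psi of residual
        G = Phi.conj().T @ R                         # step 3: update direction
        GS = np.where(Gamma[:, None], G, 0)          # G restricted to Gamma
        mu = (np.linalg.norm(GS) ** 2 /
              max(np.linalg.norm(Phi @ GS) ** 2, 1e-32))  # step 4 (2,2 form)
        X = hard_threshold_rows(X + mu * G, K)       # step 5: hard threshold
        Gamma = np.any(X != 0, axis=1)               # step 6: new support
    return X, np.flatnonzero(Gamma)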
  18. Computation of the step size µ^(n+1)

    For convergence, it is important that the stepsize is adaptively controlled
    at each iteration. Given that the found support Γ^n is correct, one can choose
    µ^(n+1) as the minimizer of the objective function along the update direction
    $\mathbf{X}^n + \mu \mathbf{G}^n|_{\Gamma^n}$:

    $L(\mu) = \|\mathbf{Y} - \boldsymbol{\Phi}(\mathbf{X}^n + \mu \mathbf{G}^n|_{\Gamma^n})\|_{p,q}^q = \|\mathbf{R}^n - \mu \mathbf{B}\|_{p,q}^q$

    where $\mathbf{R}^n = \mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^n$ and $\mathbf{B} = \boldsymbol{\Phi}\mathbf{G}^n_{(\Gamma^n)}$.

    For p = q: a simple linear regression problem based on the ℓ_p norm, with
    response $\mathbf{r} = \mathrm{vec}(\mathbf{R}^n)$ and predictor $\mathbf{b} = \mathrm{vec}(\mathbf{B})$.
    ⇒ $\mu^{n+1} = \|\mathbf{G}^n_{(\Gamma^n)}\|^2 / \|\boldsymbol{\Phi}\mathbf{G}^n_{(\Gamma^n)}\|^2$ for p = q = 2.
    For p = q = 1 and (p, q) = (2, 1), the minimizer of L(µ) cannot be found in
    closed form.
  19. Computation of the step size µ^(n+1)

    When p = q = 1, the solution µ satisfies a fixed-point (FP) equation µ = H(µ), where

    $H(\mu) = \Big( \sum_{i,j} |\tilde r_{ij}|^{-1} |b_{ij}|^2 \Big)^{-1} \sum_{i,j} |\tilde r_{ij}|^{-1} \mathrm{Re}(b_{ij}^* r_{ij})$,

    $\tilde{\mathbf{R}} = \mathbf{R}^n - \mu \mathbf{B}$ (which depends on the unknown µ) and $\mathbf{B} = \boldsymbol{\Phi}\mathbf{G}^n_{(\Gamma^n)}$.

    We select µ^(n+1) as the 1-step FP iterate µ^(n+1) = H(µ^n), where µ^n is the
    stepsize at the previous, nth, iteration. Often this approximation was within
    ~10⁻³ of the minimizer of L(µ). The same approach is used in the case
    (p, q) = (2, 1).
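A sketch of the one-step FP iterate for (p, q) = (1, 1): H is exactly one weighted least-squares step with weights |r̃_ij|⁻¹ evaluated at the previous step size. The function name and the epsilon guard are ours.

```python
import numpy as np

def fp_stepsize_l1(R, B, mu_prev, eps=1e-12):
    """One fixed-point iterate mu = H(mu_prev) of the slide's equation."""
    Rtilde = R - mu_prev * B                       # residual at previous mu
    w = 1.0 / np.maximum(np.abs(Rtilde), eps)      # weights |rtilde_ij|^{-1}
    num = np.sum(w * np.real(np.conj(B) * R))      # sum w_ij Re(b*_ij r_ij)
    den = np.sum(w * np.abs(B) ** 2)               # sum w_ij |b_ij|^2
    return num / max(den, eps)
```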
  20. 1 Introduction 2 Nonparametric sparse recovery 3 Simulation studies 4

    Source localization with Sensor Arrays
  21. Simulation study

    $\underset{M \times Q}{\mathbf{Y}} = \underset{M \times N}{\boldsymbol{\Phi}}\,\underset{N \times Q}{\mathbf{X}} + \underset{M \times Q}{\mathbf{E}}$

    Set-up:
    Measurement matrix Φ: φ_{ij} ~ CN(0, 1), with unit-norm columns.
    Γ = supp(X): K elements drawn randomly from {1, ..., N}.
    Signal matrix X: |x_{ij}| = 10 and Arg(x_{ij}) ~ Unif(0, 2π) for all i ∈ Γ.
    For ℓ = 1, ..., L Monte Carlo trials (L = 1000), compute X̂^[ℓ] and Γ̂^[ℓ].

    Performance measures:
    mean squared error  $\mathrm{MSE}(\hat{\mathbf{X}}) = \frac{1}{LQ} \sum_{\ell=1}^{L} \|\hat{\mathbf{X}}^{[\ell]} - \mathbf{X}^{[\ell]}\|^2$
    probability of exact recovery  $\mathrm{PER} = \frac{1}{L} \sum_{\ell=1}^{L} I\big(\hat{\Gamma}^{[\ell]} = \Gamma^{[\ell]}\big)$
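The two performance measures are straightforward to accumulate over Monte Carlo trials; a sketch (function name ours), taking lists of the per-trial estimates and ground truths:

```python
import numpy as np

def mse_per(X_hats, Xs, Gamma_hats, Gammas):
    """MSE = (1/(L*Q)) sum_l ||Xhat_l - X_l||^2; PER = exact-support rate."""
    L = len(Xs)
    Q = Xs[0].shape[1]
    mse = sum(np.linalg.norm(Xh - X) ** 2
              for Xh, X in zip(X_hats, Xs)) / (L * Q)
    per = sum(set(Gh) == set(G)
              for Gh, G in zip(Gamma_hats, Gammas)) / L
    return mse, per
```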
  22. IID Gaussian noise, e_{ij} ~ CN(0, σ²)

    [Figure: MSE (dB) vs SNR (dB) for Q = 16; curves: SNIHT(2,2), SNIHT(2,1), SNIHT(1,1), HUB-SNIHT]
    [Figure: PER vs Q (number of vectors) at SNR = 5 dB; same four methods]

    SNIHT(2,2) is best, but SNIHT(2,1) and HUB-SNIHT (threshold c = 1.517) lose
    only 0.09 dB in MSE (Q = 16).
  23. IID t-distributed noise, e_{ij} ~ Ct_ν(0, σ²), σ² = Med_F(|e_i|²)

    [Figure: MSE (dB) vs ν (degrees of freedom) at SNR(σ) = 30 dB; curves: SNIHT(2,2), SNIHT(2,1), SNIHT(1,1), HUB-SNIHT]
    [Figure: PER vs Q (number of vectors) at SNR(σ) = 10 dB, ν = 3; same four methods]

    Best performer: SNIHT(1,1).
  24. Row IID compound Gaussian noise with inverse Gaussian texture, e_{(i)} ~ CIG_λ(0, σ² I_Q)

    [Figure: MSE (dB) vs SNR (dB); curves: SNIHT(2,2), SNIHT(2,1), SNIHT(1,1), HUB-SNIHT]

    Best performer: SNIHT(2,1).
  25. 1 Introduction 2 Nonparametric sparse recovery 3 Simulation studies 4

    Source localization with Sensor Arrays
  26. Sensor array and narrowband, farfield source model

    M sensors, K sources (M > K). Model for the sensor measurements at time instant t:

    $\mathbf{y}(t) = \mathbf{A}(\boldsymbol{\theta})\mathbf{x}(t) + \mathbf{e}(t)$

    - steering matrix $\mathbf{A}(\boldsymbol{\theta}) = \big(\mathbf{a}(\theta_1) \ \cdots \ \mathbf{a}(\theta_K)\big)$
    - unknown (distinct) directions of arrival (DOAs) θ_1, ..., θ_K
    - known array manifold a(θ) (e.g., ULA)
    - unknown signal waveforms $\mathbf{x}(t) = (x_1(t), \ldots, x_K(t))^\top$

    Objective: estimate the DOAs of the sources given the snapshots y(t_1), ..., y(t_Q)
    and the number of sources K impinging on the sensor array.
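For concreteness, a sketch of the ULA array manifold with half-wavelength spacing. The phase sign convention varies across texts; this is one common choice, not necessarily the one used in the talk.

```python
import numpy as np

def ula_steering(theta_deg, M):
    """Steering vector of an M-sensor ULA with d = lambda/2, DOA in degrees."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * np.sin(theta))

# Steering matrix A(theta) for K = 2 sources at 0 and 8 degrees
A = np.column_stack([ula_steering(th, 20) for th in (0.0, 8.0)])
print(A.shape)   # (20, 2)
```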
  27. Multichannel sparse recovery approach to source localization

    Construct an M × N overcomplete steering matrix $\mathbf{A}(\tilde{\boldsymbol{\theta}})$ for a grid of all source
    locations $\tilde\theta_1, \ldots, \tilde\theta_N$ of interest. Suppose that $\tilde{\boldsymbol{\theta}}$ contains the true DOAs to
    some accuracy. The array model in matrix form then rewrites as

    $\mathbf{Y} = \big(\mathbf{y}(t_1) \ \cdots \ \mathbf{y}(t_Q)\big) = \mathbf{A}(\tilde{\boldsymbol{\theta}})\mathbf{X} + \mathbf{E}$,

    where X ∈ C^{N×Q} is K-sparse, with the K non-zero rows containing the source
    waveforms at time instants t_1, ..., t_Q.
    ⇒ a K-sparse MMV model: finding the DOAs is equivalent to identifying Γ = supp(X).
    In the multichannel sparse recovery formulation $\mathbf{A}(\tilde{\boldsymbol{\theta}})$ is completely known,
    and we can use the SNIHT methods to identify the support.
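A sketch of this pipeline, reusing ula_steering() and sniht() from the earlier snippets. The scene (two sources at 0° and 8°, M = 20, a 2° grid) follows the simulation set-up on the next slide; the noise scaling here is only roughly 0 dB SNR and is our choice.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, Q = 20, 2, 40
theta_grid = np.arange(-90, 90, 2)                 # 2-degree candidate grid
A_grid = np.column_stack([ula_steering(th, M) for th in theta_grid])  # M x N

# Simulated snapshots for two unit-power sources at 0 and 8 degrees
S = (rng.standard_normal((K, Q)) + 1j * rng.standard_normal((K, Q))) / np.sqrt(2)
A_src = np.column_stack([ula_steering(th, M) for th in (0.0, 8.0)])
E = (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))) / np.sqrt(2)
Y = A_src @ S + E

# Recover the row support and read the DOA estimates off the grid
X_hat, Gamma_hat = sniht(Y, A_grid, K=K, p=1, q=1)
print(theta_grid[Gamma_hat])                       # ideally [0, 8]
```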
  31. Simulation set-up

    ULA of M = 20 sensors with half-wavelength interelement spacing.
    K = 2 independent Gaussian sources at θ_1 = 0° and θ_2 = 8°.
    Temporally and spatially white Gaussian error terms. Low SNR = 0 dB.
    Uniform grid θ̃ on [−90°, 90°] with 2° spacing.
    We compute PER rates and the relative frequencies of the found DOA estimates
    on the grid over 1000 Monte Carlo runs.
  32. Number of snapshots Q = 4; DOAs at θ_1 = 0° and θ_2 = 8°

    [Figure: relative frequency of DOA estimates vs θ (degrees, 2° sampling grid) for SNIHT(2,2), SNIHT(1,1), SNIHT(2,1)]

    PER rates:
    1. SNIHT(1,1) = 81%
    2. SNIHT(2,1) = 73.1%
    3. HUB-SNIHT = 42.6%
    4. SNIHT = 37.6%
    Best performer: SNIHT(1,1).
  33. Number of snapshots Q = 40; DOAs at θ_1 = 0° and θ_2 = 8°

    [Figure: relative frequency of DOA estimates vs θ (degrees, 2° sampling grid) for SNIHT(2,2), SNIHT(1,1), SNIHT(2,1)]

    PER rates:
    1. SNIHT(1,1) = 100%
    2. SNIHT(2,1) = 73.1%
    3. HUB-SNIHT = 42.6%
    4. SNIHT = 37.6%
    Best performer: SNIHT(1,1).
  37. Conclusions

    Robust and nonparametric approaches for simultaneous sparse recovery.
    Different methods, SNIHT(1,1), SNIHT(2,1), HUB-SNIHT, can be selected for the
    problem at hand (i.i.d./correlated noise, high breakdown, etc.).
    Fast and scalable greedy SNIHT algorithm: the increase in robustness does not
    imply an increase in computation time!
    Applications in source localization and image denoising, etc.
  38. This talk is based on:

    E. Ollila (2015a). Nonparametric Simultaneous Sparse Recovery: an Application
    to Source Localization. PDF (arXiv): http://arxiv.org/pdf/1502.02441v1.
    EUSIPCO'15, submitted.
    E. Ollila (2015b). Robust Simultaneous Sparse Approximation. In: A Festschrift
    in Honor of Hannu Oja, Springer, Sept. 2015.
  39. References

    Blanchard, J. D., Cermak, M., Hanle, D., and Jin, Y. (2014). Greedy algorithms for joint sparse recovery. IEEE Trans. Signal Processing, 62(7).
    Chen, J. and Huo, X. (2006). Theoretical results on sparse representations of multiple-measurement vectors. IEEE Trans. Signal Processing, 54(12):4634–4643.
    Cotter, S. and Rao, B. (2002). Sparse channel estimation via matching pursuit with application to equalization. IEEE Trans. Comm., 50(3):374–377.
    Duarte, M. F. and Eldar, Y. C. (2011). Structured compressed sensing: From theory to applications. IEEE Trans. Signal Processing, 59(9):4053–4085.
    Eldar, Y. C. and Rauhut, H. (2010). Average case analysis of multichannel sparse recovery using convex relaxation. IEEE Trans. Inform. Theory, 56(1):505–519.
    Gorodnitsky, I., George, J., and Rao, B. (1995). Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm. J. Electroencephalogr. Clin. Neurophysiol., 95(4):231–251.
  40. Gribonval, R. and Zibulevsky, M. (2010). Sparse component analysis. In: Handbook of Blind Source Separation, pages 367–420. Academic Press, Oxford, UK.
    Malioutov, D., Çetin, M., and Willsky, A. S. (2005). A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Processing, 53(8):3010–3022.
    Tropp, J. A. (2006). Algorithms for simultaneous sparse approximation. Part II: Convex relaxation. Signal Processing, 86:589–602.