Claire Boyer

(Sorbonne Université)

https://s3-seminar.github.io/seminars/claire-boyer/

Title — Sampling rates for ℓ1 synthesis

Abstract — This work investigates the problem of signal recovery from undersampled noisy sub-Gaussian measurements under the assumption of a synthesis-based sparsity model. Solving the ℓ1-synthesis basis pursuit makes it possible to estimate a coefficient representation and the sought-for signal simultaneously. However, due to linear dependencies among the atoms of a redundant dictionary, it may be impossible to identify a specific representation vector, even though the actual signal is still successfully recovered. We study both estimation problems from a non-uniform, signal-dependent perspective. Using results from linear inverse problems and convex geometry, we identify the sampling rate describing the phase transition of both formulations, and propose a tight upper bound on it. This is joint work with Maximilian März (TU Berlin), Jonas Kahn and Pierre Weiss (CNRS, Toulouse).

Biography — Claire Boyer has been an associate professor (maîtresse de conférences) at Sorbonne Université since 2016, and was an associate member of the Département de Mathématiques et Applications at ENS Ulm from 2017 to 2020. Her research interests lie at the crossroads of compressed sensing, high-dimensional statistics, optimization, inverse problems and machine learning. Over the last four years, she has mainly devoted her research to the reconstruction of objects from a small number of linear observations, small compared to the dimension of the object to be reconstructed. The object of interest could be a finite-dimensional vector (compressed sensing), a matrix with missing entries (matrix completion), or a Radon measure ("off-the-grid" compressed sensing). More recently, she has also taken an interest in statistical learning problems with missing data.

S³ Seminar

March 05, 2021

Transcript

  1. Sampling rates for ℓ1-synthesis
    Maximilian März, Claire Boyer, Jonas Kahn, Pierre Weiss

  2. Joint work with
    Maximilian März (TU Berlin), Jonas Kahn (IMT Toulouse), Pierre Weiss (IMT Toulouse)

  3. Outline
    1. Introduction
    2. A primer on convex geometry
    3. Signal recovery
       Convex gauge for signal recovery
       Sampling rate for signal recovery
    4. Upper Bounds on the Conic Gaussian Width

  4. Summary
    1. Introduction
    2. A primer on convex geometry
    3. Signal recovery
       Convex gauge for signal recovery
       Sampling rate for signal recovery
    4. Upper Bounds on the Conic Gaussian Width

  5. Setting
    Linear Noisy Measurements
    - Signal: $x_0 \in \mathbb{R}^n$
    - Measurements: $y \in \mathbb{R}^m$ of $x_0$ via the linear acquisition model
      $$y = A x_0 + e, \qquad (1)$$
      where
      - $A \in \mathbb{R}^{m \times n}$ is a Gaussian measurement matrix
      - $e \in \mathbb{R}^m$ models measurement noise with $\|e\|_2 \le \eta$ for some $\eta \ge 0$
    Gaussian assumption
    - classical benchmark setup in CS
    - it allows us to determine the sampling rate of a convex program (i.e., the number of measurements required for successful recovery) by computing the so-called Gaussian mean width
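
As a concrete illustration of the acquisition model above, here is a minimal numpy sketch; the dimensions, sparsity level and noise level are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eta = 128, 64, 0.1            # ambient dimension, measurements, noise level (assumed)

x0 = np.zeros(n)                    # a 5-sparse signal x_0 in R^n
x0[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)

A = rng.standard_normal((m, n))     # Gaussian measurement matrix A in R^{m x n}
e = rng.standard_normal(m)
e *= eta / np.linalg.norm(e)        # rescale the noise so that ||e||_2 <= eta

y = A @ x0 + e                      # the linear acquisition model (1)
```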

  6. The signal structure
    As for the signal $x_0$:
    - sparsity is hardly satisfied in any real-world application
    - but sparse representations exist using specific transforms: Gabor dictionaries, wavelet systems or data-adaptive representations
    Synthesis formulation
    There exists a matrix $D \in \mathbb{R}^{n \times d}$ and a low-complexity representation $z_0 \in \mathbb{R}^d$ such that $x_0$ can be "synthesized" as
      $$x_0 = D \cdot z_0.$$
    - $D = [d_1, \ldots, d_d]$ is the dictionary
    - its columns are the dictionary atoms
    - In this work, we focus on the synthesis formulation instead of the analysis one.

  7. Visually
    [figure: illustration of the synthesis model, built up over four slides]

  8. Synthesis basis pursuit for coefficient/signal recovery
    Synthesis basis pursuit for coefficient recovery
      $$\hat{Z} := \operatorname*{argmin}_{z \in \mathbb{R}^d} \|z\|_1 \quad \text{s.t.} \quad \|y - ADz\|_2 \le \eta, \qquad (\mathrm{BP}_{\mathrm{coef}})$$
    with $D \in \mathbb{R}^{n \times d}$.
    - when $n = d$, for instance $D = \mathrm{Id}$ (or any B.O.S.), the classical basis pursuit can recover any $s$-sparse vector $z_0$ w.h.p. if $A$ is sub-Gaussian with
      $$m \gtrsim s \cdot \log(2n/s)$$
    - in practice $n \ll d$ and $D$ is redundant: representations are not necessarily unique, so one cannot expect to recover a specific representation via $(\mathrm{BP}_{\mathrm{coef}})$
    One should instead be interested in:
    Synthesis basis pursuit for signal recovery
      $$\hat{X} := D \cdot \underbrace{\Big( \operatorname*{argmin}_{z \in \mathbb{R}^d} \|z\|_1 \quad \text{s.t.} \quad \|y - ADz\|_2 \le \eta \Big)}_{=: \hat{Z}}. \qquad (\mathrm{BP}_{\mathrm{sig}})$$
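
Both programs are small convex problems. A minimal sketch with cvxpy (the solver choice, the toy dimensions, and the random dictionary are assumptions of this sketch):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, d, m, eta = 32, 64, 24, 0.1
D = rng.standard_normal((n, d))
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
z0 = np.zeros(d); z0[:3] = 1.0          # a 3-sparse representation
x0 = D @ z0                             # the signal x_0 = D z_0
A = rng.standard_normal((m, n))
y = A @ x0                              # noiseless measurements, for simplicity

z = cp.Variable(d)
cp.Problem(cp.Minimize(cp.norm1(z)),
           [cp.norm(y - (A @ D) @ z, 2) <= eta]).solve()

z_hat = z.value     # one element of Z_hat: may differ from z0 (coefficient recovery)
x_hat = D @ z_hat   # the corresponding element of X_hat = D . Z_hat (signal recovery)
```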

  9. Synthesis basis pursuit for coefficient/signal recovery
    In the noiseless case (i.e., when $e = 0$ and $\eta = 0$),
    - it might be the case that $\hat{Z} \neq \{z_0\}$ (coefficient recovery fails)
    - but one may hope that $\hat{X} = D \cdot \hat{Z} = \{x_0\}$ (signal recovery succeeds)
    Some questions addressed in the paper
    1. When does coefficient recovery differ from signal recovery?
    2. How many measurements are required for coefficient recovery? For signal recovery?
    3. In case of coefficient and signal recovery, what about robustness to measurement noise?
    Questions addressed in this talk
    (Q1) What is the sampling rate for signal recovery?
    (Q2) Which tight upper bound can we provide on this sampling rate?

  10. Related works on the synthesis formulation
    [Rauhut, Schnass and Vandergheynst 2008]
    [Casazza, Chen, and Lynch, 2019]
    ✗ Address coefficient recovery and not signal recovery
    ✗ Uniform results over all $s$-sparse coefficient vectors
    ✗ Rely on strong assumptions on $D$: RIP, NSP, incoherence ...
    ✗ Forget about redundant representation systems that are highly coherent and have many linear dependencies
    [figure: phase transitions of coefficient and signal recovery by ℓ1-synthesis]
    Mission statement
    - Sampling rate for signal recovery
    - Need for a local and non-uniform approach: a signal-dependent analysis is crucial for redundant representation systems
    - Avoiding strong assumptions on the dictionary

  11. Summary
    1. Introduction
    2. A primer on convex geometry
    3. Signal recovery
       Convex gauge for signal recovery
       Sampling rate for signal recovery
    4. Upper Bounds on the Conic Gaussian Width

  12. The generalized Basis Pursuit
    Consider the generalized basis pursuit
      $$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad \|y - Ax\|_2 \le \eta, \qquad (\mathrm{BP}_f)$$
    where $f : \mathbb{R}^n \to \mathbb{R}$ is convex, supposed to reflect the "low complexity" of the signal $x_0$.
    - The descent set of $f$ at $x_0$ is given by
      $$\mathcal{D}(f, x_0) := \{ h \in \mathbb{R}^n : f(x_0 + h) \le f(x_0) \}, \qquad (2)$$
    - The descent cone is defined by $\mathcal{D}^\wedge(f, x_0) := \mathrm{cone}(\mathcal{D}(f, x_0))$.
    Definition (Minimum conic singular value)
    Consider $A \in \mathbb{R}^{m \times n}$ and a cone $K \subseteq \mathbb{R}^n$. The minimum conic singular value of $A$ relative to $K$ is:
      $$\lambda_{\min}(A, K) := \inf_{x \in K \cap S^{n-1}} \|Ax\|_2. \qquad (3)$$

  13. A key quantity: Minimum conic singular value
    Consider the generalized basis pursuit
      $$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad \|y - Ax\|_2 \le \eta, \qquad (\mathrm{BP}_f)$$
    where $f : \mathbb{R}^n \to \mathbb{R}$ is convex, supposed to reflect the "low complexity" of the signal $x_0$.
    [Chandrasekaran et al. 2012, Tropp 2015]
    A deterministic error bound for $(\mathrm{BP}_f)$
    (a) If $\eta = 0$: exact recovery of $x_0$ by solving $\mathrm{BP}_f^{\eta=0}$ $\iff$ $\lambda_{\min}(A; \mathcal{D}^\wedge(f, x_0)) > 0$
    (b) In addition, any solution $\hat{x}$ of $(\mathrm{BP}_f)$ satisfies
      $$\|x_0 - \hat{x}\|_2 \le \frac{2\eta}{\lambda_{\min}(A; \mathcal{D}^\wedge(f, x_0))}. \qquad (4)$$
    [hand-drawn sketch: geometry of the descent cone at $x_0$ and the feasible set]
    - $\lambda_{\min}(A; \mathcal{D}^\wedge(f, x_0))$ can be NP-hard to compute
    - But there exists an estimate in the sub-Gaussian case!
    - Through Gordon's Escape Through a Mesh theorem

  14. From the minimum conic singular value to the conic mean width
    Definition (Mean width)
    - The mean width of a set $K \subseteq \mathbb{R}^n$ is
      $$w(K) = \mathbb{E}\Big( \sup_{h \in K} \langle g, h \rangle \Big) \quad \text{with } g \sim \mathcal{N}(0, I_n).$$
    - The conic mean width of a cone $K \subseteq \mathbb{R}^n$ is $w^\wedge(K) = w(K \cap S^{n-1})$.
    Theorem (Generic recovery)
    Assume that $A \in \mathbb{R}^{m \times n}$ is a Gaussian random matrix. Then
      $$\lambda_{\min}(A, K) \ge \sqrt{m - 1} - w^\wedge(K) - u$$
    with probability larger than $1 - e^{-u^2/2}$.
    A sufficient condition for robust recovery is
      $$m \ge w^\wedge(\mathcal{D}^\wedge(f, x_0))^2 + 1.$$
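
For $f = \|\cdot\|_1$ this width can be estimated numerically. A Monte Carlo sketch of the classical bound $w^2_\wedge(\mathcal{D}(\|\cdot\|_1, x_0)) \le \mathbb{E}\,\min_{\tau \ge 0} \mathrm{dist}^2(g, \tau\,\partial\|x_0\|_1)$, in the spirit of Chandrasekaran et al. 2012 (the $\tau$-grid and the Monte Carlo size are assumptions of this sketch):

```python
import numpy as np

def l1_conic_width_sq(n, support, signs, n_mc=2000, seed=0):
    """Monte Carlo estimate of E[min_tau dist^2(g, tau * subdiff ||x0||_1)],
    an upper bound on w_^2 of the l1 descent cone at an s-sparse x0."""
    rng = np.random.default_rng(seed)
    off = np.setdiff1d(np.arange(n), support)     # off-support coordinates
    taus = np.linspace(0.0, 10.0, 200)            # assumed grid for the scaling tau
    vals = np.empty(n_mc)
    for t in range(n_mc):
        g = rng.standard_normal(n)
        # dist^2(g, tau * subdifferential): exact on the support, clipped off it
        d2 = [np.sum((g[support] - tau * signs) ** 2)
              + np.sum(np.maximum(np.abs(g[off]) - tau, 0.0) ** 2)
              for tau in taus]
        vals[t] = min(d2)
    return vals.mean()

# a 5-sparse sign pattern in R^128; compare with the ~ 2 s log(n/s) rule of thumb
print(l1_conic_width_sq(128, np.arange(5), np.ones(5)))
```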

  15. And actually, phase transitions
    [Amelunxen, Lotz, McCoy, Tropp (2014)]
    Theorem (Phase transitions)
    - For $m \ge w^\wedge(\mathcal{D}^\wedge(f, x_0))^2 + 1 - \log(\epsilon)\sqrt{n}$, $\mathrm{BP}_f^{\eta=0}$ succeeds with probability $> 1 - \epsilon$.
    - For $m \le w^\wedge(\mathcal{D}^\wedge(f, x_0))^2 + 1 + \log(\epsilon)\sqrt{n}$, $\mathrm{BP}_f^{\eta=0}$ succeeds with probability $< \epsilon$.
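
This transition is easy to observe empirically. An illustrative sketch for the classical case $D = \mathrm{Id}$, noiseless, sweeping $m$ and recording the exact-recovery frequency (sizes, trial counts and the success tolerance are assumptions of this sketch):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, s, trials = 64, 4, 20
x0 = np.zeros(n); x0[:s] = 1.0               # an s-sparse target

for m in range(4, n + 1, 8):                 # sweep the number of measurements
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        x = cp.Variable(n)
        cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == A @ x0]).solve()
        hits += np.linalg.norm(x.value - x0) <= 1e-3 * np.linalg.norm(x0)
    print(f"m = {m:3d}   empirical success rate = {hits / trials:.2f}")
```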

  16. Take-home messages on the generalized BP
    - Robust signal recovery via the generalized basis pursuit $(\mathrm{BP}_f)$ is characterized by $\lambda_{\min}(A; \mathcal{D}^\wedge(f, x_0))$.
    - The required number of sub-Gaussian random measurements can be determined by the conic mean width of $f$ at $x_0$: $w^2_\wedge(\mathcal{D}(f, x_0))$.
    - $w^2_\wedge(\mathcal{D}(f, x_0))$ gives a phase transition for the recovery success via $\mathrm{BP}_f^{\eta=0}$, in the noiseless case:
      $\mathrm{BP}_f^{\eta=0}$ fails w.h.p. when $m \lesssim w^2_\wedge(\mathcal{D}(f, x_0))$;
      $\mathrm{BP}_f^{\eta=0}$ succeeds w.h.p. when $m \gtrsim w^2_\wedge(\mathcal{D}(f, x_0))$.

  17. Summary
    1. Introduction
    2. A primer on convex geometry
    3. Signal recovery
       Convex gauge for signal recovery
       Sampling rate for signal recovery
    4. Upper Bounds on the Conic Gaussian Width

  18. Convex gauge for signal recovery
    Recall: synthesis basis pursuit for signal recovery
      $$\hat{X} := D \cdot \Big( \operatorname*{argmin}_{z \in \mathbb{R}^d} \|z\|_1 \quad \text{s.t.} \quad \|y - ADz\|_2 \le \eta \Big). \qquad (\mathrm{BP}_{\mathrm{sig}})$$
    Let $p_K(x) := \inf\{ t > 0 : x \in tK \}$ denote the gauge of a convex set $K$.
    Lemma (Gauge formulation)
    Assume that $y = Ax_0 + e$, with $\|e\|_2 \le \eta$. Let $D \in \mathbb{R}^{n \times d}$ be a dictionary. Then
      $$\hat{X} = \operatorname*{argmin}_{x \in \mathbb{R}^n} p_{D \cdot B_1^d}(x) \quad \text{s.t.} \quad \|y - Ax\|_2 \le \eta.$$
    Lemma (Descent cone)
    Let $x_0 \in \mathrm{ran}(D)$. For any $z_{\ell_1} \in Z_{\ell_1}$ (the $\ell_1$-representers of $x_0$ in $D$),
      $$\mathcal{D}^\wedge(p_{D \cdot B_1^d}, x_0) = D \cdot \mathcal{D}^\wedge(\|\cdot\|_1, z_{\ell_1}) \quad \text{and} \quad \mathcal{D}(p_{D \cdot B_1^d}, x_0) = D \cdot \mathcal{D}(\|\cdot\|_1, z_{\ell_1}).$$
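
Since $D \cdot B_1^d$ is a polytope and $p_{D \cdot B_1^d}(x) = \min\{\|z\|_1 : Dz = x\}$, the gauge itself can be evaluated by a small linear program. A sketch, again assuming cvxpy:

```python
import numpy as np
import cvxpy as cp

def gauge_DB1(D, x):
    """Gauge of the polytope D.B_1^d at x: inf{ t > 0 : x in t * D.B_1^d },
    i.e. the minimal l1 norm of a representer of x in D."""
    z = cp.Variable(D.shape[1])
    prob = cp.Problem(cp.Minimize(cp.norm1(z)), [D @ z == x])
    prob.solve()
    return prob.value          # +inf if x is not in ran(D)

# sanity check: for D = Id the gauge is just the l1 norm
print(gauge_DB1(np.eye(3), np.array([1.0, -2.0, 0.0])))   # -> 3.0
```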

  19. Sampling rate for signal recovery
    Theorem (Signal recovery)
    Let $D \in \mathbb{R}^{n \times d}$ be a dictionary with $x_0 \in \mathrm{ran}(D)$ and pick any $z_{\ell_1} \in Z_{\ell_1}$.
    For all $u > 0$, with probability $\ge 1 - e^{-u^2/2}$: if
      $$m > m_0 := \big( w^\wedge(D \cdot \mathcal{D}(\|\cdot\|_1; z_{\ell_1})) + u \big)^2 + 1, \qquad (5)$$
    then any solution $\hat{x}$ to the program $(\mathrm{BP}_{\mathrm{sig}})$ satisfies
      $$\|x_0 - \hat{x}\|_2 \le \frac{2\eta}{\sqrt{m - 1} - \sqrt{m_0 - 1}}. \qquad (6)$$
    (a) $w^2_\wedge(D \cdot \mathcal{D}(\|\cdot\|_1; z_{\ell_1}))$ drives the sampling rate (also true for coefficient recovery)
    (b) But the set of minimal $\ell_1$-representers is not required to be a singleton: the descent cone in the signal space may be evaluated at any possible $z_{\ell_1} \in Z_{\ell_1}$.
    (c) Phase transition of signal recovery at $m_0$.
    (d) Believe me, the robustness of signal and coefficient recovery is different.

  20. Conclusion on these first results
    - Sampling rate for coefficient recovery = sampling rate for signal recovery
    - (Robustness to noise is different)
    - Critical quantity = conic mean width of a linearly transformed cone, $w^2_\wedge(D \cdot \mathcal{D}(\|\cdot\|_1; z_{\ell_1}))$
    Uniform recovery vs. non-uniform recovery
    Compressed sensing started with the Restricted Isometry Property, leading to:
      All $s$-sparse vectors are recovered with probability X if $m > m_{\mathrm{RIP}}$.
    Here:
      A specific vector $z_0$ is recovered with probability Y if $m > m_{z_0}$.
    The Restricted Isometry Property...
    - RIP = far stronger statement.
    - RIP = optimal for orthogonal $D$, super pessimistic otherwise.
    - RIP = useless in 99% of the practical cases.

  21. Summary
    1. Introduction
    2. A primer on convex geometry
    3. Signal recovery
       Convex gauge for signal recovery
       Sampling rate for signal recovery
    4. Upper Bounds on the Conic Gaussian Width

  22. How to evaluate the conic mean width $w^2_\wedge(D \cdot \mathcal{D}(\|\cdot\|_1; z_{\ell_1}))$?
    ✓ Tight and informative upper bounds for simple dictionaries such as orthogonal matrices
    ✗ Involved for general, possibly redundant transforms
    ✗ We cannot use the classical argument based on polarity
    ✗ A bound based on a local condition number is too pessimistic:
      $$w^2_\wedge\big( \underbrace{D \cdot \mathcal{D}(\|\cdot\|_1; z_{\ell_1})}_{=: C} \big) \le \frac{\|D\|^2}{\lambda_{\min}^2(D; \mathcal{D}(\|\cdot\|_1; z_{\ell_1}))} \cdot \big( w^2_\wedge(\mathcal{D}(\|\cdot\|_1; z_{\ell_1})) + 1 \big)$$

  23. A geometric bound instead
    1. Decompose the cone into its lineality and its range, $C = C_L \oplus C_R$:
      $$w^2_\wedge(C) \lesssim w^2_\wedge(C_L) + w^2_\wedge(C_R) + 1$$
    2. The lineality $C_L$ is the largest subspace contained in the cone, so
      $$w^2_\wedge(C_L) \simeq \dim(C_L)$$
    3. The range is finitely generated, line-free, and contained in a circular cone of circumangle $\alpha < \pi/2$
      → new bound on the conic mean width for such cones

  24. Decomposition of the descent cone of the gauge $p_{D \cdot B_1^d}$
    Proposition
    Let $D \in \mathbb{R}^{n \times d}$ be a dictionary and let $x_0 \in \mathrm{ran}(D) \setminus \{0\}$.
    Let $C := \mathcal{D}^\wedge(p_{D \cdot B_1^d}, x_0) = D \cdot \mathcal{D}(\|\cdot\|_1; z_{\ell_1})$ denote the descent cone of the gauge at $x_0$.
    Let $z_{\ell_1} \in \mathrm{ri}(Z_{\ell_1})$ be any minimal $\ell_1$-representer of $x_0$ in $D$ with maximal support, and set $\bar{S} = \mathrm{supp}(z_{\ell_1})$ as well as $\bar{s} = \#\bar{S}$. Assume $\bar{s} < d$. Then we have:
    (a) The lineality space of $C$ has dimension at most $\bar{s} - 1$ and is given by
      $$C_L = \mathrm{span}\big( \bar{s} \cdot \mathrm{sign}(z_{\ell_1,i}) \cdot d_i - D \cdot \mathrm{sign}(z_{\ell_1}) : i \in \bar{S} \big). \qquad (7)$$
    (b) The range of $C$ is a $2(d - \bar{s})$-polyhedral $\alpha$-cone given by:
      $$C_R = \mathrm{cone}\big( r_j^{\pm\perp} : j \in \bar{S}^c \big) \quad \text{with} \quad r_j^{\pm\perp} := P_{C_L^\perp}\big( \pm \bar{s} \cdot d_j - D \cdot \mathrm{sign}(z_{\ell_1}) \big). \qquad (8)$$
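
Part (a) translates directly into code. A numpy sketch that assembles the generators of $C_L$ from (7) and reads off $\dim(C_L)$ as the rank of their span (it assumes $z$ is a maximal-support minimal ℓ1-representer, which the sketch does not verify):

```python
import numpy as np

def lineality_generators(D, z, tol=1e-10):
    """Generators s_bar * sign(z_i) * d_i - D @ sign(z) for i in supp(z), cf. (7)."""
    S = np.flatnonzero(np.abs(z) > tol)        # the support S_bar
    s_bar = len(S)
    w = D @ np.sign(z)                         # D . sign(z), common to all generators
    V = np.column_stack([s_bar * np.sign(z[i]) * D[:, i] - w for i in S])
    dim_CL = np.linalg.matrix_rank(V)          # dim(C_L), at most s_bar - 1
    return V, dim_CL
```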

  25. Circumangle for pointed polyhedral cones
    Proposition (Circumangle and circumcenter of polyhedral cones)
    Let $x_i \in S^{n-1}$ for $i \in [k]$ and let $C = \mathrm{cone}(x_1, \ldots, x_k)$ be a nontrivial pointed polyhedral cone. Finding the circumcenter and circumangle of $C$ amounts to solving the convex problem
      $$\cos(\alpha) = \sup_{\theta \in B_2^n} \inf_{i \in [k]} \langle \theta, x_i \rangle.$$
    ✓ possible to numerically compute the circumangle of pointed polyhedral cones
      (whereas the minimum conic singular value is intractable in general)
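
The convex problem above is a small second-order cone program. A sketch with cvxpy (it assumes the generators are already normalized to the unit sphere):

```python
import numpy as np
import cvxpy as cp

def circumangle(X):
    """Circumangle alpha of cone(x_1, ..., x_k) from unit generators (rows of X),
    via cos(alpha) = sup_{||theta||_2 <= 1} min_i <theta, x_i>."""
    k, n = X.shape
    theta = cp.Variable(n)
    t = cp.Variable()
    prob = cp.Problem(cp.Maximize(t),
                      [X @ theta >= t, cp.norm(theta, 2) <= 1])
    prob.solve()
    return np.arccos(np.clip(prob.value, -1.0, 1.0))

# e.g. the nonnegative quadrant in R^2 has circumangle pi/4
print(circumangle(np.eye(2)))   # -> ~0.7854
```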

  26. Conic mean width for $k$-polyhedral cones contained in $\alpha$-circular cones
    Proposition
    For $k \ge 5$, the conic mean width of a $k$-polyhedral cone contained in an $\alpha$-circular cone $C$ in $\mathbb{R}^n$ is bounded by
      $$W(\alpha, k, n) \le \tan\alpha \cdot \left( \sqrt{2 \log\big(k/\sqrt{2\pi}\big)} + \frac{1}{\sqrt{2 \log\big(k/\sqrt{2\pi}\big)}} \right) + \frac{1}{\sqrt{2\pi}}.$$
    - the bound does not depend on the ambient dimension $n$,
      in contrast to the conic width of a circular cone.
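
The bound is a closed-form expression and trivial to evaluate; a small helper for it (valid for $k \ge 5$, where the logarithm is positive):

```python
import numpy as np

def conic_width_bound(alpha, k):
    """W(alpha, k, n) bound for a k-polyhedral cone inside an alpha-circular
    cone; note that the ambient dimension n does not appear."""
    r = np.sqrt(2.0 * np.log(k / np.sqrt(2.0 * np.pi)))
    return np.tan(alpha) * (r + 1.0 / r) + 1.0 / np.sqrt(2.0 * np.pi)

print(conic_width_bound(np.pi / 4, 100))   # a cone with 100 generators, alpha = 45 degrees
```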

  27. Consequence for the sampling rate
    Theorem
    If $\bar{s} \le d - 3$, we obtain that
      $$w^2_\wedge\big(\mathcal{D}^\wedge(p_{D \cdot B_1^d}, x_0)\big) \le \bar{s} + \left( \tan\alpha \cdot \left( \sqrt{2 \log\!\left( \frac{2(d - \bar{s})}{\sqrt{2\pi}} \right)} + 1 \right) + \frac{1}{\sqrt{2\pi}} \right)^{\!2}.$$
    Corollary
    The critical number of measurements $m_0$ satisfies
      $$m_0 \lesssim \bar{s} + \tan^2\alpha \cdot \log\big(2(d - \bar{s})/\sqrt{2\pi}\big). \qquad (9)$$
    The sampling rate is mainly governed by
    - the sparsity $\bar{s}$ of maximal-support $\ell_1$-representations of $x_0$ in $D$
    - the "narrowness" of the remaining cone $C_R$, which is captured by its circumangle $\alpha \in [0, \pi/2)$
    - the number of dictionary atoms only has a logarithmic influence.
    NB: comparable to the mean width of a convex polytope, which is mainly determined by its diameter and by the logarithm of its number of vertices.
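
Plugging numbers into (9) gives a feel for the rate; an illustrative evaluation in which the circumangle value and the dimensions are assumptions of this sketch:

```python
import numpy as np

def critical_m(s_bar, d, alpha):
    """The rate s_bar + tan(alpha)^2 * log(2 (d - s_bar) / sqrt(2 pi)) of (9),
    up to the absolute constant hidden in the '<~' notation."""
    return s_bar + np.tan(alpha) ** 2 * np.log(2 * (d - s_bar) / np.sqrt(2 * np.pi))

# e.g. a 5-sparse representation in a dictionary with d = 1024 atoms
print(critical_m(5, 1024, np.pi / 4))   # -> ~11.7
```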

  28. Examples
    $D$ | $x_0 \in \mathbb{R}^n$ | $m \gtrsim$
    $D = \mathrm{Id} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$ | $s$-sparse vector | $2s \log\big(2(n - s)/\sqrt{2\pi}\big)$ ✓
    Convolutional dictionary $D = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}$ | 2-sparse, e.g. $x_0 = (1\ 0\ 0\ \ldots\ 0\ 1)^T$ | $2 + 2 \log(4n)$ (new)
    Total gradient variation $D = \nabla^\dagger$ | $s$-gradient sparse | $s \cdot \log^2(n)$ ✓ (numerical evaluation)
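
The convolutional dictionary in the table is a circulant construction; a sketch reproducing the displayed 4 × 8 matrix for general $n$ (scipy assumed):

```python
import numpy as np
from scipy.linalg import circulant

def conv_dictionary(n):
    # rows are circular shifts of the two-tap filter (1, 1); stacking two
    # copies gives an n x 2n redundant dictionary with linear dependencies
    C = circulant(np.r_[1.0, 1.0, np.zeros(n - 2)]).T
    return np.hstack([C, C])

print(conv_dictionary(4))   # reproduces the 4 x 8 matrix shown above
```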

  29. The end
    Contributions
    ✓ Sampling rates for the synthesis problem (coefficient and signal).
    ✓ Decent upper bounds for the conic width of linearly transformed cones.
    ✓ Dissected the descent cone of the $\ell_1$-ball.
    ✗ Quantities are still partly cryptic.
    ✗ Case-by-case study of practical dictionaries is technical.
    More to read in the paper arXiv:2004.07175
    Thank you!