Signal Processing Course: Sparse L1 Recovery


Gabriel Peyré

January 01, 2012

Transcript

  1. Sparse ℓ¹ Recovery. Gabriel Peyré, www.numerical-tours.com

  2. Inverse problem: K : ℝ^(N0) → ℝ^P, P ≪ N0.
     Measurements: y = K f0 + w. Example: regularization of K.
  3. Model: f0 = Ψ x0 is sparse in a dictionary Ψ ∈ ℝ^(N0×N), N ≥ N0.
     Coefficients x0 ∈ ℝ^N, image f0 = Ψ x0 ∈ ℝ^(N0), observations
     y = K f0 + w ∈ ℝ^P, so that y = Φ x0 + w with Φ = K Ψ ∈ ℝ^(P×N).
  4. Sparse recovery: f* = Ψ x* where x* solves
     min_{x ∈ ℝ^N} (1/2)‖Φx − y‖² + λ‖x‖₁   (fidelity + regularization).
  5. Variations and stability. Data: f0 = Ψ x0. Observations: y = Φ x0 + w.
     Recovery: x*_λ ∈ argmin_{x ∈ ℝ^N} (1/2)‖Φx − y‖² + λ‖x‖₁   (P_λ(y)).
  6. As λ → 0⁺ (no noise), this becomes
     x* ∈ argmin_{Φx = y} ‖x‖₁   (P_0(y)).
  7. Questions:
     – behavior of x*_λ with respect to y and λ;
     – criterion to ensure ‖x*_λ − x0‖ = O(‖w‖);
     – criterion to ensure x* = x0 when w = 0 and λ → 0⁺.
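The variational problem (P_λ(y)) can be solved numerically by iterative soft-thresholding. Below is a minimal NumPy sketch; the course's own codes live at www.numerical-tours.com, and the `ista` function, the iteration count and all parameter values here are illustrative assumptions, not the course implementation:

```python
import numpy as np

def ista(Phi, y, lam, n_iter=3000):
    """Iterative soft-thresholding for min_x 1/2 ||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = x - Phi.T @ (Phi @ x - y) / L         # gradient step on the fidelity term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-thresholding
    return x

# Synthetic instance matching the slides: y = Phi x0 + w, x0 s-sparse
rng = np.random.default_rng(0)
P, N, s = 50, 200, 3
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)
lam = 0.05
x_star = ista(Phi, y, lam)
```

At convergence the iterate satisfies the first-order condition of the problem, ‖Φᵀ(Φx* − y)‖_∞ ≤ λ, up to numerical accuracy.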
  8. Numerical illustration: y = Φ x0 + w, ‖x0‖₀ = s, Φ ∈ ℝ^(50×200)
     Gaussian. (Figure: recovered coefficients for s = 3, 6, 13, 25.)
     The mapping λ ↦ x*_λ looks polygonal; if x0 is sparse and λ is well
     chosen, sign(x*_λ) = sign(x0).
  9. Overview • Polytope Noiseless Recovery • Local Behavior of Sparse

    Regularization • Robustness to Small Noise • Robustness to Bounded Noise • Compressed Sensing RIP Theory
  10. Polytope approach: Φ = (φᵢ)ᵢ with φᵢ ∈ ℝ², ℓ¹ ball
      B_α = {x : ‖x‖₁ ≤ α} with α = ‖x0‖₁, and min_{Φx = y} ‖x‖₁.
      Claim: x0 is a solution ⟺ Φ x0 ∈ ∂(Φ B_α).
  11. Same picture for problem (P_0(y)): x0 is a solution of P_0(Φ x0)
      ⟺ Φ x0 lies on the boundary of the projected polytope Φ B_α.
  12. Proof (⟸): suppose x0 is not a solution; show Φ x0 ∈ int(Φ B_α).
      There exists z such that Φ x0 = Φ z and ‖z‖₁ = (1 − δ)‖x0‖₁.
      For any h ∈ Im(Φ) with ‖h‖₁ small enough, write
      Φ x0 + h = Φ(z + Φ⁺h), and
      ‖z + Φ⁺h‖₁ ≤ ‖z‖₁ + ‖Φ⁺h‖₁ ≤ (1 − δ)‖x0‖₁ + ‖Φ⁺‖_{1,1}‖h‖₁ < ‖x0‖₁,
      hence Φ x0 + h ∈ Φ B_α for all such h.
  13. Proof (⟹): suppose Φ x0 ∈ int(Φ B_α). Then there exist δ > 0 and
      z ∈ B_α with Φ x0 = (1 − δ)Φ z; since ‖(1 − δ)z‖₁ < ‖x0‖₁, x0 is
      not a solution of P_0(Φ x0).
  14. Basis-pursuit mapping in 2-D: for a sign pattern s (e.g. s = (0,1,1)),
      K_s = {(αᵢ sᵢ)ᵢ ∈ ℝ³ : αᵢ ≥ 0} is a 2-D quadrant and C_s = Φ K_s a
      2-D cone; these cones determine the mapping y ↦ x*(y).
      (Figure: Φ = (φᵢ)ᵢ, φᵢ ∈ ℝ².)
  15. Basis-pursuit mapping in 3-D: Φ = (φᵢ)ᵢ, φᵢ ∈ ℝ³. Empty spherical
      cap property: the cones C_s induce a Delaunay paving of the sphere
      by spherical triangles. (Figure: mapping y ↦ x*(y).)
  16. Polytope noiseless recovery, counting faces of random polytopes
      [Donoho]: sharp constants, no noise robustness. All x0 such that
      ‖x0‖₀ ≤ C_all(P/N)·P are identifiable; most x0 such that
      ‖x0‖₀ ≤ C_most(P/N)·P are identifiable; C_all(1/4) ≈ 0.065,
      C_most(1/4) ≈ 0.25. (Figure: phase transition curves.)
  17. Overview • Polytope Noiseless Recovery • Local Behavior of Sparse

    Regularization • Robustness to Small Noise • Robustness to Bounded Noise • Compressed Sensing RIP Theory
  18. First-order necessary and sufficient condition: for
      x* ∈ argmin_{x ∈ ℝ^N} E(x) = (1/2)‖Φx − y‖² + λ‖x‖₁, let
      I = {i ∈ {0, …, N−1} : x*ᵢ ≠ 0} be the support of the solution.
      Then x* is a solution of P_λ(y) ⟺ Φ*(Φx* − y) + λ s = 0,
      where s_I = sign(x*_I) and ‖s_{I^c}‖_∞ ≤ 1.
  19. Note: the condition gives s_{I^c} = (1/λ) Φ*_{I^c}(y − Φx*).
      Theorem: x* is a solution of P_λ(y) ⟺ ‖Φ*_{I^c}(Φx* − y)‖_∞ ≤ λ
      (together with Φ*_I(Φx* − y) + λ sign(x*_I) = 0).
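This first-order condition is easy to verify numerically. A minimal NumPy sketch follows; the helper name `is_lasso_solution` and the tolerances are assumptions for illustration. For Φ = Id the minimizer is exactly the soft-thresholding of y, which the check confirms:

```python
import numpy as np

def is_lasso_solution(Phi, y, x, lam, tol=1e-8):
    """Check the first-order condition of P_lam(y):
    Phi^T (y - Phi x) = lam * s with s_I = sign(x_I), ||s_{I^c}||_inf <= 1."""
    c = Phi.T @ (y - Phi @ x)            # correlations of the residual
    I = np.abs(x) > tol                  # support of x
    on_support = np.allclose(c[I], lam * np.sign(x[I]), atol=1e-6)
    off_support = np.all(np.abs(c[~I]) <= lam + 1e-6)
    return on_support and off_support

# Orthogonal case Phi = Id: the minimizer is soft-thresholding of y
N, lam = 8, 0.3
rng = np.random.default_rng(1)
y = rng.standard_normal(N)
x_st = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
ok = is_lasso_solution(np.eye(N), y, x_st, lam)
```

For Φ = Id, the residual on the support is exactly λ·sign(xᵢ) and at most λ off the support, so `ok` is True.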
  20. Local parameterization: if Φ_I has full rank, the implicit equation
      Φ*(Φx* − y) + λ s = 0 gives
      x*_I = Φ_I⁺ y − λ (Φ_I* Φ_I)⁻¹ s_I,   Φ_I⁺ = (Φ_I* Φ_I)⁻¹ Φ_I*.
  21. Given y, compute x*, then (s, I) = (sign(x*), supp(x*)). Define
      x̂_λ'(y')_I = Φ_I⁺ y' − λ' (Φ_I* Φ_I)⁻¹ s_I and x̂_λ'(y')_{I^c} = 0.
      By construction, x̂_λ(y) = x*.
  22. Theorem: for (y, λ) ∉ H, let x* be a solution of P_λ(y) such that
      Φ_I is full rank, with I = supp(x*). Then for (λ', y') close to
      (λ, y), x̂_λ'(y') is a solution of P_λ'(y').
      Remark: the theorem holds outside a union H of hyperplanes.
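The closed-form expression x̂_I = Φ_I⁺ y − λ(Φ_I*Φ_I)⁻¹ s_I can be written directly in NumPy. A minimal sketch (the function name `xhat` is an assumption); for an orthogonal Φ it reproduces soft-thresholding:

```python
import numpy as np

def xhat(Phi, y, lam, s):
    """Candidate solution for a sign pattern s (slide 21):
    xhat_I = (Phi_I^T Phi_I)^{-1} (Phi_I^T y - lam s_I), zero outside I."""
    I = np.flatnonzero(s)
    PhiI = Phi[:, I]
    x = np.zeros(Phi.shape[1])
    x[I] = np.linalg.solve(PhiI.T @ PhiI, PhiI.T @ y - lam * s[I])
    return x

# Orthogonal check: for Phi = Id, xhat reproduces soft-thresholding of y
rng = np.random.default_rng(2)
N, lam = 6, 0.25
y = rng.standard_normal(N)
x_st = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
x_param = xhat(np.eye(N), y, lam, np.sign(x_st))
```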
  23. Full rank condition. Lemma: there exists a solution x* such that
      ker(Φ_I) = {0}. Note: if ker(Φ_I) ≠ {0}, x* is not unique.
  24. Proof: if ker(Φ_I) ≠ {0}, pick η_I ∈ ker(Φ_I), η ≠ 0 (η supported
      on I), and define x_t = x* + t η for t ∈ ℝ.
  25. Let t0 be the smallest |t| such that sign(x_t) ≠ sign(x*).
  26. For all |t| < t0, Φ x_t = Φ x* and sign(x_t) = sign(x*), so ‖x_t‖₁
      is affine in t; since x* is a minimizer, x_t is also a solution.
  27. By continuity, x_{t0} is a solution, and |supp(x_{t0})| < |supp(x*)|.
      Repeating the argument yields a solution whose Φ_I has trivial
      kernel. (Figure: the path t ↦ x_t.)
  28. Proof (of the local parameterization theorem): with I = supp(s) and
      x̂_λ'(y')_I = Φ_I⁺ y' − λ' (Φ_I* Φ_I)⁻¹ s_I, it remains to show that
      for all j ∉ I,  d_j^s(y', λ') = |⟨φ_j, y' − Φ_I x̂_λ'(y')_I⟩| ≤ λ'.
  29. Case 1: d_j^s(y, λ) < λ. The bound holds for (y', λ') close to
      (y, λ) by continuity.
  30. Case 2: d_j^s(y, λ) = λ and φ_j ∈ Im(Φ_I). Then d_j^s(y', λ') = λ'
      identically, so the bound holds.
  31. Case 3: d_j^s(y, λ) = λ and φ_j ∉ Im(Φ_I): exclude this case.
  32. Exclude the hyperplanes H = ⋃ {H_{s,j} : φ_j ∉ Im(Φ_I)}, where
      H_{s,j} = {(y, λ) : d_j^s(y, λ) = λ}.
  33. (Figure: the hyperplane H_{∅,j}, on which x* = 0.)
  34. (Figure: the hyperplanes H_{I,j}.)
  35. Local affine maps: under the uniqueness assumption, λ ↦ x*_λ and
      y ↦ x* are piecewise affine functions, by the local
      parameterization x̂_λ'(y')_I = Φ_I⁺ y' − λ' (Φ_I* Φ_I)⁻¹ s_I.
      Breaking points: change of support of x*_λ. At λ = 0, x*_λ = x0
      (the BP solution); for λ large enough, x*_λ = 0.
      (Figure: path λ ↦ (x1, x2).)
  36. Projector. Proposition: if x1 and x2 both minimize
      E(x) = (1/2)‖Φx − y‖² + λ‖x‖₁, then Φ x1 = Φ x2.
      Corollary: μ(y) = Φ x1 = Φ x2 is uniquely defined.
  37. Proof: x3 = (x1 + x2)/2 is a solution, and if Φ x1 ≠ Φ x2 then
      2‖x3‖₁ ≤ ‖x1‖₁ + ‖x2‖₁ and, by strict convexity of ‖·‖²,
      2‖Φ x3 − y‖² < ‖Φ x1 − y‖² + ‖Φ x2 − y‖², hence
      E(x3) < E(x1) = E(x2): contradiction.
  38. For (y', λ) close to (y, λ) ∉ H:
      μ(y') = P_I(y') − λ d_I with d_I = Φ_I^{+,*} s_I,
      where P_I is the orthogonal projector on {Φx : supp(x) = I}.
  39. Overview • Polytope Noiseless Recovery • Local Behavior of Sparse

    Regularization • Robustness to Small Noise • Robustness to Bounded Noise • Compressed Sensing RIP Theory
  40. Uniqueness sufficient condition, for
      E(x) = (1/2)‖Φx − y‖² + λ‖x‖₁.
  41. Theorem: if Φ_I has full rank and ‖Φ*_{I^c}(Φ x* − y)‖_∞ < λ,
      then x* is the unique minimizer of E.
  42. Proof: let x̃* be a minimizer. Then Φ x̃* = Φ x*, hence
      ‖Φ*_{I^c}(Φ x̃* − y)‖_∞ = ‖Φ*_{I^c}(Φ x* − y)‖_∞ < λ
      ⟹ supp(x̃*) ⊂ I ⟹ x̃*_I − x*_I ∈ ker(Φ_I) = {0} ⟹ x̃* = x*.
  43. Robustness to small noise. Identifiability criterion [Fuchs]: for
      s ∈ {−1, 0, +1}^N, let I = supp(s) (Φ_I is assumed to have full
      rank) and F(s) = ‖Ψ_I s_I‖_∞, where Ψ_I = Φ*_{I^c} Φ_I^{+,*} and
      Φ_I⁺ = (Φ_I* Φ_I)⁻¹ Φ_I* satisfies Φ_I⁺ Φ_I = Id_I.
  44. Theorem: if F(sign(x0)) < 1, set T = min_{i ∈ I} |x0,i|. If ‖w‖/T
      is small enough and λ ∼ ‖w‖, then
      x* = x0 + Φ_I⁺ w − λ (Φ_I* Φ_I)⁻¹ sign(x0,I)
      is the unique solution of P_λ(y); in particular, for ‖w‖ small
      enough, ‖x* − x0‖ = O(‖w‖).
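The Fuchs criterion F(sign(x0)) is directly computable from Φ and a sign pattern. A minimal NumPy sketch (the function name and the test case are illustrative assumptions):

```python
import numpy as np

def fuchs_criterion(Phi, s):
    """F(s) = max_{j not in I} |<phi_j, d_I>|, I = supp(s),
    with d_I = Phi_I (Phi_I^T Phi_I)^{-1} s_I, so that <d_I, phi_i> = s_i on I."""
    I = np.flatnonzero(s)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    PhiI = Phi[:, I]
    dI = PhiI @ np.linalg.solve(PhiI.T @ PhiI, s[I])
    return np.max(np.abs(Phi[:, Ic].T @ dI))

# Orthogonal columns: d_I is orthogonal to every unused atom, so F(s) = 0
s = np.zeros(5)
s[[0, 2]] = [1.0, -1.0]
F0 = fuchs_criterion(np.eye(5), s)
```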
  45. Geometric interpretation:
      F(s) = ‖Ψ_I s_I‖_∞ = max_{j ∉ I} |⟨d_I, φ_j⟩|,
      where d_I is defined by ⟨d_I, φᵢ⟩ = sᵢ for i ∈ I; explicitly
      d_I = Φ_I (Φ_I* Φ_I)⁻¹ s_I = Φ_I^{+,*} s_I.
  46. Condition F(s) < 1: no atom φ_j, j ∉ I, lies inside the spherical
      cap C_s around d_I, i.e. |⟨d_I, φ_j⟩| < 1 for all j ∉ I.
      (Figure: the cap C_s.)
  47. (Figure: atoms φᵢ, φ_j, φ_k relative to the cap C_s.)
  48. Sketch of proof. Local candidate: x̂ = x̂(sign(x0)), where
      x̂(s)_I = Φ_I⁺ y − λ (Φ_I* Φ_I)⁻¹ s_I (implicit equation),
      I = supp(s). To prove: x̂ is the unique solution of P_λ(y).
  49. Sign consistency (C1): sign(x̂) = sign(x0). Since y = Φ x0 + w,
      x̂ = x0 + Φ_I⁺ w − λ (Φ_I* Φ_I)⁻¹ s_I, and
      ‖Φ_I⁺‖_{∞,2} ‖w‖ + λ ‖(Φ_I* Φ_I)⁻¹‖_{∞,∞} < T ⟹ (C1).
  50. First-order conditions (C2): ‖Φ*_{I^c}(Φ x̂ − y)‖_∞ < λ, and
      ‖Φ*_{I^c}(Φ_I Φ_I⁺ − Id)‖_{2,∞} ‖w‖ − λ (1 − F(s)) < 0 ⟹ (C2).
  51. (C1) and (C2) ⟹ x̂ is the solution.
  52. The two constraints delimit a region in the (‖w‖, λ) plane: for
      ‖w‖/T < ε_max, one can choose λ ∝ ‖w‖/T such that x̂ is the
      solution of P_λ(y). (Figure: admissible region between the two
      constraints.)
  53. Finally ‖x̂ − x0‖ ≤ ‖Φ_I⁺ w‖ + λ ‖(Φ_I* Φ_I)⁻¹ s_I‖ = O(‖w‖),
      hence ‖x̂ − x0‖ = O(‖w‖).
  54. Overview • Polytope Noiseless Recovery • Local Behavior of Sparse

    Regularization • Robustness to Small Noise • Robustness to Bounded Noise • Compressed Sensing RIP Theory
  55. Robustness to bounded noise. Exact Recovery Criterion [Tropp]: for
      a support I ⊂ {0, …, N−1} with Φ_I full rank,
      ERC(I) = ‖Ψ_I‖_{∞,∞} where Ψ_I = Φ*_{I^c} Φ_I^{+,*}; equivalently
      ERC(I) = ‖Φ_I⁺ Φ_{I^c}‖_{1,1} = max_{j ∈ I^c} ‖Φ_I⁺ φ_j‖₁
      (using ‖(a_j)_j‖_{1,1} = max_j ‖a_j‖₁).
      Relation with the F criterion: ERC(I) = max_{s : supp(s) ⊂ I} F(s).
  56. Theorem: if ERC(supp(x0)) < 1 and λ ∼ ‖w‖, then x* is unique,
      satisfies supp(x*) ⊂ supp(x0), and ‖x0 − x*‖ = O(‖w‖).
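In the pseudo-inverse form, ERC(I) = max_{j ∈ I^c} ‖Φ_I⁺ φ_j‖₁ is one line of NumPy. A sketch (function name and test case are assumptions):

```python
import numpy as np

def erc(Phi, I):
    """Tropp's criterion ERC(I) = max_{j in I^c} ||Phi_I^+ phi_j||_1."""
    I = np.asarray(I)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    pinvI = np.linalg.pinv(Phi[:, I])
    return np.max(np.sum(np.abs(pinvI @ Phi[:, Ic]), axis=0))

# For an orthobasis, no atom outside I correlates with Phi_I: ERC(I) = 0
E0 = erc(np.eye(6), [0, 3])
```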
  57. Sketch of proof. Restricted recovery:
      x̂ ∈ argmin_{supp(x) ⊂ I} (1/2)‖Φx − y‖² + λ‖x‖₁.
      To prove: x̂ is the unique solution of P_λ(y).
  58. Implicit equation: x̂_I = Φ_I⁺ y − λ (Φ_I* Φ_I)⁻¹ s_I.
      Important: s = sign(x̂) is not equal to sign(x0) in general.
  59. First-order conditions (C2): ‖Φ*_{I^c}(Φ x̂ − y)‖_∞ < λ, and
      ‖Φ*_{I^c}(Φ_I Φ_I⁺ − Id)‖_{2,∞} ‖w‖ − λ (1 − F(s)) < 0 ⟹ (C2).
  60. Since s is arbitrary, ERC(I) < 1 ⟹ F(s) < 1. Hence choosing
      λ ∝ ‖w‖ implies (C2).
  61. Weak Exact Recovery Criterion [Gribonval, Dossal]: write
      Φ = (φᵢ)_{i=0}^{N−1}, φᵢ ∈ ℝ^P. For A = (aᵢ)ᵢ, B = (bᵢ)ᵢ with
      aᵢ, bᵢ ∈ ℝ^P, let μ(A, B) = max_j Σᵢ |⟨aᵢ, b_j⟩| and
      μ(A) = max_i Σ_{j ≠ i} |⟨aᵢ, a_j⟩|. Define
      w-ERC(I) = μ(Φ_I, Φ_{I^c}) / (1 − μ(Φ_I)) if μ(Φ_I) < 1,
      +∞ otherwise.
      Theorem: for I = supp(s),  F(s) ≤ ERC(I) ≤ w-ERC(I).
  62. Proof: ERC(I) = max_{j ∉ I} ‖Φ_I⁺ φ_j‖₁
      ≤ ‖(Φ_I* Φ_I)⁻¹‖_{1,1} · max_{j ∉ I} ‖Φ_I* φ_j‖₁, and
      max_{j ∉ I} ‖Φ_I* φ_j‖₁ = max_{j ∉ I} Σ_{i ∈ I} |⟨φᵢ, φ_j⟩|
      = μ(Φ_I, Φ_{I^c}).
  63. One has Φ_I* Φ_I = Id − H; if ‖H‖_{1,1} < 1,
      (Φ_I* Φ_I)⁻¹ = (Id − H)⁻¹ = Σ_{k ≥ 0} H^k, so
      ‖(Φ_I* Φ_I)⁻¹‖_{1,1} ≤ Σ_{k ≥ 0} ‖H‖_{1,1}^k = 1/(1 − ‖H‖_{1,1}),
      with ‖H‖_{1,1} = max_{i ∈ I} Σ_{j ≠ i} |⟨φᵢ, φ_j⟩| = μ(Φ_I).
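The chain F(s) ≤ ERC(I) ≤ w-ERC(I) can be checked numerically on a random dictionary. A NumPy sketch (the helper `w_erc` and the instance sizes are illustrative assumptions):

```python
import numpy as np

def w_erc(Phi, I):
    """Weak ERC of [Gribonval, Dossal]: mu(Phi_I, Phi_Ic) / (1 - mu(Phi_I)),
    with mu(A, B) = max_j sum_i |<a_i, b_j>|, mu(A) = max_i sum_{j!=i} |<a_i, a_j>|."""
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    C = np.abs(Phi[:, I].T @ Phi[:, I])
    mu_in = np.max(np.sum(C, axis=0) - np.diag(C))
    mu_cross = np.max(np.sum(np.abs(Phi[:, I].T @ Phi[:, Ic]), axis=0))
    return mu_cross / (1 - mu_in) if mu_in < 1 else np.inf

# Random unit-norm dictionary; compare the three criteria on one support
rng = np.random.default_rng(3)
P, N, k = 100, 200, 3
Phi = rng.standard_normal((P, N))
Phi /= np.linalg.norm(Phi, axis=0)
I = np.arange(k)
s_I = np.ones(k)
PhiI = Phi[:, I]
dI = PhiI @ np.linalg.solve(PhiI.T @ PhiI, s_I)
F = np.max(np.abs(Phi[:, k:].T @ dI))                                  # Fuchs F(s)
ERC = np.max(np.sum(np.abs(np.linalg.pinv(PhiI) @ Phi[:, k:]), axis=0))  # Tropp
W = w_erc(Phi, I)
```

By the theorem, the three values come out in increasing order.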
  64. Example: random matrix, P = 200, N = 1000. (Figure: empirical
      probability that F < 1, ERC < 1, w-ERC < 1 and that x* = x0, as a
      function of the sparsity ‖x0‖₀.)
  65. Example: deconvolution, Φx = Σᵢ xᵢ φ(· − iΔ). Increasing the
      spacing Δ reduces correlation (F(s), ERC(I) and w-ERC(I) decrease)
      but reduces resolution. (Figure: x0 and Φ x0.)
  66. Coherence bounds. Mutual coherence: μ(Φ) = max_{i ≠ j} |⟨φᵢ, φ_j⟩|.
      Theorem: F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I| − 1) μ(Φ)).
  67. Theorem: if ‖x0‖₀ < (1/2)(1 + 1/μ(Φ)) and λ ∼ ‖w‖, then
      supp(x*) ⊂ I and ‖x0 − x*‖ = O(‖w‖).
  68. One has μ(Φ) ≥ √((N − P)/(P(N − 1))). For Gaussian matrices,
      μ(Φ) ∼ √(log(PN)/P); optimistic setting: ‖x0‖₀ ≤ O(√P). For
      convolution matrices: useless criterion.
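The Gaussian scaling μ(Φ) ∼ √(log(PN)/P) is easy to observe empirically, here with the slides' sizes P = 200, N = 1000 (the rest of the script is an illustrative assumption):

```python
import numpy as np

def coherence(Phi):
    """Mutual coherence mu(Phi) = max_{i != j} |<phi_i, phi_j>| (unit-norm columns)."""
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(4)
P, N = 200, 1000
Phi = rng.standard_normal((P, N))
Phi /= np.linalg.norm(Phi, axis=0)   # unit-norm columns
mu = coherence(Phi)                  # of order sqrt(log(P*N)/P), roughly 0.25-0.4 here
```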
  69. Coherence, examples. Incoherent pair of orthobases (Diracs/Fourier):
      Ψ1 = {k ↦ δ[k − m]}_m, Ψ2 = {k ↦ N^(−1/2) e^(2iπmk/N)}_m,
      Φ = [Ψ1, Ψ2] ∈ ℝ^(N×2N).
  70. Then
      min_{x ∈ ℝ^(2N)} (1/2)‖y − Φx‖² + λ‖x‖₁
      = min_{x1, x2 ∈ ℝ^N} (1/2)‖y − Ψ1 x1 − Ψ2 x2‖² + λ‖x1‖₁ + λ‖x2‖₁:
      the decomposition y ≈ Ψ1 x1 + Ψ2 x2 (spikes + sinusoids).
  71. μ(Φ) = 1/√N ⟹ ℓ¹ separates up to ∼ √N/2 Diracs + sines.
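For the Diracs/Fourier dictionary, μ(Φ) = 1/√N can be verified directly. A NumPy sketch (N = 16 is an arbitrary choice):

```python
import numpy as np

N = 16
F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # orthonormal Fourier basis
Phi = np.hstack([np.eye(N, dtype=complex), F])  # Diracs + Fourier, N x 2N
G = np.abs(Phi.conj().T @ Phi)                  # Gram matrix magnitudes
np.fill_diagonal(G, 0.0)
mu = G.max()   # = 1/sqrt(N) = 0.25: only the cross Dirac/Fourier terms contribute
```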
  72. Overview • Polytope Noiseless Recovery • Local Behavior of Sparse

    Regularization • Robustness to Small Noise • Robustness to Bounded Noise • Compressed Sensing RIP Theory
  73. CS with RIP. Restricted Isometry Constants: δ_k is such that, for
      all x with ‖x‖₀ ≤ k,
      (1 − δ_k)‖x‖² ≤ ‖Φx‖² ≤ (1 + δ_k)‖x‖².
      ℓ¹ recovery: x* ∈ argmin_{‖Φx − y‖ ≤ ε} ‖x‖₁, where y = Φ x0 + w,
      ‖w‖ ≤ ε.
  74. Theorem [Candès 2009]: if δ_{2k} ≤ √2 − 1, then
      ‖x0 − x*‖ ≤ (C0/√k) ‖x0 − x_k‖₁ + C1 ε,
      where x_k is the best k-term approximation of x0.
  75. Elements of proof: set h = x* − x0 and partition
      {0, …, N−1} = T0 ∪ T1 ∪ … ∪ Tm, with T0 the k largest entries of
      x0 (so x_k = x_{T0}) and T1, T2, … the successive k largest
      entries of h_{T0^c}. The optimality conditions give
      ‖h_{T0^c}‖₁ ≤ ‖h_{T0}‖₁ + 2‖x_{T0^c}‖₁. Explicit constants:
      C0 = 2(1 + ρ)/(1 − ρ), C1 = 2α/(1 − ρ), with
      ρ = √2 δ_{2k}/(1 − δ_{2k}), α = 2√(1 + δ_{2k})/(1 − δ_{2k}).
      Reference: E. J. Candès, CRAS, 2008.
  76. Singular value distributions: the eigenvalues of Φ_I* Φ_I with
      |I| = k are essentially in [a, b], a = (1 − √β)², b = (1 + √β)²,
      β = k/P. When k = βP and P → +∞, the eigenvalue distribution
      tends to f(λ) = (1/(2πβλ)) √((λ − a)₊ (b − λ)₊) [Marcenko-Pastur];
      large deviation inequality [Ledoux]. (Figure: empirical spectra
      for P = 200 and k = 10, 30, 50, against f.)
  77. RIP for Gaussian matrices. Link with coherence:
      δ_k ≤ (k − 1) μ(Φ), δ_2 = μ(Φ), where
      μ(Φ) = max_{i ≠ j} |⟨φᵢ, φ_j⟩|.
  78. For Gaussian matrices, μ(Φ) ∼ √(log(PN)/P).
  79. Stronger result. Theorem: if k ≤ C P / log(N/P), then
      δ_{2k} ≤ √2 − 1 with high probability.
  80. Numerics with RIP. Stability constants of a matrix A: the smallest
      δ1(A), δ2(A) such that
      (1 − δ1(A))‖α‖² ≤ ‖Aα‖² ≤ (1 + δ2(A))‖α‖²,
      given by the smallest/largest eigenvalues of A*A.
  81. Upper/lower Restricted Isometry Constants:
      δ_k^i = max_{|I| = k} δ_i(Φ_I) for i = 1, 2, combined into the
      constant δ_k. Monte-Carlo estimation: sampling random supports
      |I| = k gives lower bounds δ̂_k^i ≤ δ_k^i.
      (Figure: estimated δ̂_k^1, δ̂_k^2 as functions of k.)
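The Monte-Carlo estimation described above can be sketched in NumPy. The function name, trial count and matrix sizes are illustrative assumptions; the estimate is only a lower bound on δ_k, since finitely many supports are sampled:

```python
import numpy as np

def ric_monte_carlo(Phi, k, n_trials=200, seed=0):
    """Monte-Carlo lower bound on the RIC delta_k: extreme eigenvalues of
    Phi_I^T Phi_I over randomly drawn supports |I| = k."""
    rng = np.random.default_rng(seed)
    N = Phi.shape[1]
    d = 0.0
    for _ in range(n_trials):
        I = rng.choice(N, size=k, replace=False)
        ev = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        d = max(d, ev[-1] - 1.0, 1.0 - ev[0])   # deviation of the spectrum from 1
    return d

rng = np.random.default_rng(5)
P, N = 200, 400
Phi = rng.standard_normal((P, N)) / np.sqrt(P)   # E[Phi_I^T Phi_I] = Id
d10 = ric_monte_carlo(Phi, 10)
d50 = ric_monte_carlo(Phi, 50)
```

Consistently with the Marcenko-Pastur picture, the estimate grows with k/P.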
  82. Conclusion. Local behavior: λ ↦ x*_λ is polygonal, y ↦ x* is
      piecewise affine. (Figure: paths for s = 3, 6, 13, 25.)
  83. Noiseless recovery ⟺ geometry of polytopes.
  84. Small noise: sign stability. Bounded noise: support inclusion.
      RIP-based: no support stability, ℓ¹ error bounds.