
Low Complexity Regularization of Inverse Problems

Talk at Curves and Surfaces 2014

Gabriel Peyré

June 12, 2014

Transcript

  1-2. Inverse Problems
    Recovering x0 ∈ ℝ^N from noisy observations y = Φ x0 + w ∈ ℝ^P.
    Examples: inpainting, super-resolution, ...

  3-4. Inverse Problems in Medical Imaging
    Magnetic resonance imaging (MRI): Φ x = ( f̂(ω) )_{ω∈Ω}; projection measurements: Φ x = ( p_{θk} )_{1≤k≤K}.
    Other examples: MEG, EEG, ...

  5-6. Compressed Sensing [Rice Univ.]
    N micro-mirrors, P measures: y[i] = ⟨x0, φ_i⟩.
    Reconstructions x̃0 for P/N = 1, P/N = 0.16, P/N = 0.02.

  7-10. Inverse Problem Regularization
    Observations: y = Φ x0 + w ∈ ℝ^P.
    Example: variational methods. Estimator x_λ(y), depending only on the observations y and the parameter λ:
    x_λ(y) ∈ argmin_{x ∈ ℝ^N} 1/2 ||y − Φ x||² + λ J(x)   (data fidelity + regularity).
    Choice of λ: tradeoff between the noise level ||w|| and the regularity J(x0) of x0.
    No noise: λ → 0⁺, minimize x_λ(y) ∈ argmin_{Φ x = y} J(x).
    Performance analysis: criteria on (x0, ||w||, λ) to ensure ||x_λ(y) − x0|| = O(||w||) and model stability.

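As an illustration (not from the talk), the variational estimator above can be computed for J = ||·||_1 with a proximal-gradient scheme (iterative soft-thresholding, ISTA). The sketch below uses a generic random Φ; the function names and parameter values are choices made here.

```python
import numpy as np

def soft_threshold(u, t):
    """Proximal operator of t*||.||_1 (entrywise soft-thresholding)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Minimize 0.5*||y - Phi x||^2 + lam*||x||_1 by proximal gradient descent."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the data-fidelity gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)         # gradient of 0.5*||y - Phi x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy inverse problem: y = Phi x0 + w with a sparse x0.
rng = np.random.default_rng(0)
N, P, s = 200, 80, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
w = 0.01 * rng.standard_normal(P)
y = Phi @ x0 + w
x_rec = ista(Phi, y, lam=0.02)
print("relative error:", np.linalg.norm(x_rec - x0) / np.linalg.norm(x0))
```
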
  11-14. Union of Models for Data Processing
    Union of models: M ⊂ ℝ^N, subspaces or manifolds.
    Synthesis sparsity: sparse coefficients x of the image.
    Structured sparsity.
    Analysis sparsity: image x, gradient D* x.
    Low-rank: multi-spectral imaging, x_{i,·} = Σ_{j=1}^r A_{i,j} S_{j,·} (sources S_{1,·}, S_{2,·}, S_{3,·}).

  15-17. Subdifferentials and Linear Models
    Regularizer: J : ℝ^N → ℝ convex.
    Sub-differential: ∂J(x) = { η ; ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩ }.
    Example: J(x) = |x|.
    Example: J(x) = ||x||_1 = Σ_i |x_i|, with I = supp(x) = { i ; x_i ≠ 0 }:
    ∂||x||_1 = { η ; η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1 }.
    Linear model: T_x = VectHull(∂J(x))^⊥; for the ℓ1 norm, T_x = { η ; supp(η) ⊆ I }.

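A quick numerical sanity check of this characterization (an illustration added here, not part of the slides): any η that equals sign(x) on the support and has entries in [−1, 1] elsewhere should satisfy the subgradient inequality for J = ||·||_1 at x.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10
x = np.zeros(N); x[[2, 5, 7]] = [1.5, -0.3, 2.0]   # sparse point, I = supp(x)
I = x != 0

# Candidate subgradient eta of ||.||_1 at x: sign(x) on I, anything in [-1, 1] off I.
eta = np.where(I, np.sign(x), rng.uniform(-1, 1, N))

# Check J(z) >= J(x) + <eta, z - x> on random test points z.
ok = all(
    np.abs(z).sum() >= np.abs(x).sum() + eta @ (z - x) - 1e-12
    for z in rng.standard_normal((1000, N))
)
print("subgradient inequality holds on all samples:", ok)
```
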
  18-21. Partly Smooth Functions [Lewis 2003]
    J : ℝ^N → ℝ is partly smooth at x for a manifold M_x if:
    (i) J is C² along M_x around x;
    (ii) VectHull(∂J(x))^⊥ = T_x = Tangent_x(M_x);
    (iii) ∂J is continuous on M_x around x.
    Condition (ii) ⟺ ∀ h ∈ T_x^⊥, t ↦ J(x + t h) is non-smooth at t = 0.
    Important: in general M_x ≠ T_x.
    Illustration: J(x) = max(0, ||x|| − 1).

  22-25. Examples of Partly-smooth Regularizers
    ℓ1 sparsity: J(x) = ||x||_1, M_x = T_x = { z ; supp(z) ⊆ supp(x) }.
    Structured sparsity: J(x) = Σ_b ||x_b||, e.g. J(x) = |x_1| + ||x_{2,3}|| (same M_x).
    Nuclear norm: J(x) = ||x||_*, M_x = { z ; rank(z) = rank(x) }.
    Anti-sparsity: J(x) = ||x||_∞, with I = { i ; |x_i| = ||x||_∞ }, M_x = T_x = { z ; z_I ∝ x_I }.

  26-27. Dual Certificates
    Noiseless recovery: min_{Φ x = Φ x0} J(x)   (P0).
    Dual certificates: D(x0) = Im(Φ*) ∩ ∂J(x0).
    Proposition: x0 is a solution of (P0) ⟺ ∃ η ∈ D(x0).
    Example: J(x) = ||x||_1 with Φ x = x ⋆ φ (convolution):
    D(x0) = { η = Φ* q ; η_I = sign(x_{0,I}), ||η||_∞ ≤ 1 }.

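For J = ||·||_1 the noiseless problem (P0) is a linear program, so the proposition can be probed numerically. The small sketch below (added for illustration; it uses a generic Gaussian Φ rather than the convolution of the example, and scipy's `linprog`) solves basis pursuit and checks that x0 is recovered.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
N, P, s = 60, 30, 4
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = Phi @ x0                                   # noiseless observations

# Basis pursuit (P0): min ||x||_1 s.t. Phi x = y, written as an LP with x = u - v, u, v >= 0.
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_bp = res.x[:N] - res.x[N:]
print("x0 recovered by (P0):", np.allclose(x_bp, x0, atol=1e-5))
```
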
  28-31. Dual Certificates and L2 Stability
    Non-degenerate dual certificate: D̄(x0) = Im(Φ*) ∩ ri(∂J(x0)),
    where ri(E) = relative interior of E = interior for the topology of aff(E).
    Theorem [Fadili et al. 2013]: If ∃ η ∈ D̄(x0), then for λ ∼ ||w|| one has ||x⋆ − x0|| = O(||w||).
    Related: [Grassmair 2012]: J(x⋆ − x0) = O(||w||); [Grassmair, Haltmeier, Scherzer 2010]: J = ||·||_1.
    → The constants depend on N ...

  32-34. Compressed Sensing Setting
    Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
    Theorem [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]: Sparse vectors, J = ||·||_1, s = ||x0||_0.
    If P ≥ 2 s log(N/s), then ∃ η ∈ D̄(x0) with high probability on Φ.
    Theorem [Chandrasekaran et al. 2011]: Low-rank matrices, J = ||·||_*, x0 ∈ ℝ^{N1×N2}, r = rank(x0).
    If P ≥ 3 r (N1 + N2 − r), then ∃ η ∈ D̄(x0) with high probability on Φ.
    → Similar results for ||·||_{1,2}, ||·||_∞.

  35. Phase Transitions
    From [Amelunxen et al. 2013]: phase transitions for linear inverse problems.
    Left: empirical probability that ℓ1 minimization identifies a sparse x0 ∈ ℝ^100 from random linear measurements, as a function of P/N and s/N.
    Right: the analogous transition for low-rank recovery with J = ||·||_*, as a function of P/N and r/√N.

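Such phase-transition curves can be reproduced empirically, at a much smaller scale than in [Amelunxen et al. 2013]. A rough sketch (illustration only; the dimensions, trial counts and the `bp_success` helper are choices made here) estimates the ℓ1 recovery probability as P varies for a fixed sparsity.

```python
import numpy as np
from scipy.optimize import linprog

def bp_success(P, s, N=60, trials=20, rng=np.random.default_rng(3)):
    """Empirical probability that basis pursuit recovers an s-sparse x0 from P Gaussian measures."""
    hits = 0
    for _ in range(trials):
        Phi = rng.standard_normal((P, N)) / np.sqrt(P)
        x0 = np.zeros(N); x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
        A_eq = np.hstack([Phi, -Phi])                    # LP formulation of min ||x||_1 s.t. Phi x = Phi x0
        res = linprog(np.ones(2 * N), A_eq=A_eq, b_eq=Phi @ x0,
                      bounds=(0, None), method="highs")
        x_bp = res.x[:N] - res.x[N:]
        hits += np.linalg.norm(x_bp - x0) < 1e-5 * max(np.linalg.norm(x0), 1)
    return hits / trials

# Sweep P for a fixed sparsity: the success probability jumps from 0 to 1 around the transition.
for P in (10, 20, 30, 40, 50):
    print(f"P = {P:2d}  success = {bp_success(P, s=5):.2f}")
```
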
  36-40. Minimal Norm Certificate
    Minimal-norm certificate: η0 = argmin { ||q|| ; η = Φ* q ∈ ∂J(x0) }.
    ∂J(x0) ⊂ A(x0) = AffHull(∂J(x0)).
    Linearized pre-certificate: η_F = argmin { ||q|| ; η = Φ* q ∈ A(x0) }.
    → η_F is computed by solving a linear system.
    → One does not always have η_F ∈ D(x0).
    Theorem: If ker(Φ) ∩ T = {0}, then η_F ∈ D̄(x0) ⟹ η_F = η0, and η0 ∈ D̄(x0) ⟹ η_F = η0.
    Illustration: case J = ||·||_1, with T = T_{x0}, A(x0) and ∂J(x0).

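For J = ||·||_1, A(x0) fixes η_I = sign(x_{0,I}), so η_F = Φ* Φ_I^{+,*} sign(x_{0,I}) comes from a single least-norm linear system. A minimal sketch of this computation and of the non-degeneracy test ||η_{F,Ic}||_∞ < 1 (illustration added here, assuming the Gaussian compressed-sensing setting of the previous slides):

```python
import numpy as np

def eta_F_l1(Phi, x0):
    """Linearized pre-certificate for J = ||.||_1: eta_F = Phi^* q with q of minimal
    norm such that (Phi^* q)_I = sign(x0_I) on the support I of x0."""
    I = x0 != 0
    q = np.linalg.pinv(Phi[:, I]).T @ np.sign(x0[I])   # least-norm q solving Phi_I^* q = sign(x0_I)
    return Phi.T @ q

rng = np.random.default_rng(4)
N, P, s = 100, 50, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)

etaF = eta_F_l1(Phi, x0)
I = x0 != 0
print("||eta_F||_inf off the support:", np.abs(etaF[~I]).max())   # < 1 means eta_F is non-degenerate
```
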
  41-42. Model Stability
    Theorem [Vaiter et al. 2014]: If η_F ∈ D̄(x0), there exists C such that if max(λ, ||w||/λ) ≤ C,
    the unique solution x⋆ of P_λ(y) for y = Φ x0 + w satisfies x⋆ ∈ M_{x0} and ||x⋆ − x0|| = O(||w||, λ).
    Previous works: [Fuchs 2004]: J = ||·||_1; [Bach 2008]: J = ||·||_{1,2} and J = ||·||_*; [Vaiter et al. 2011]: J = ||D* ·||_1.

  43-46. Compressed Sensing Setting
    Theorem [Wainwright 2009] [Dossal et al. 2011]: Random matrix Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.;
    sparse vectors, J = ||·||_1, s = ||x0||_0. If P ≥ 2 s log(N), then η0 ∈ D̄(x0) with high probability on Φ.
    Phase transitions: L2 stability P ∼ 2 s log(N/s) vs. model stability P ∼ 2 s log(N).
    → Similar results for ||·||_{1,2}, ||·||_*, ||·||_∞.
    → Not using RIP techniques (non-uniform result on x0).

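To get a feel for the gap between the two sample-complexity regimes, a quick computation (an added illustration; N and the sparsity levels are arbitrary) compares 2 s log(N/s) with 2 s log(N):

```python
import numpy as np

N = 10_000
for s in (10, 50, 100, 500):
    # L2-stability scaling vs. model-stability scaling of the number of measurements P.
    print(f"s = {s:3d}   2 s log(N/s) = {2 * s * np.log(N / s):8.0f}   2 s log(N) = {2 * s * np.log(N):8.0f}")
```
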
  47-48. 1-D Sparse Spikes Deconvolution
    J(x) = ||x||_1, Φ x = Σ_i x_i φ(· − Δ i): spikes on a grid of spacing Δ, convolved with a kernel φ.
    Increasing Δ: reduces correlation, reduces resolution.
    Support recovery criterion, with I = { j ; x0(j) ≠ 0 }:
    ||η_{F,Ic}||_∞ < 1 ⟺ η0 = η_F ∈ D̄(x0) ⟺ support recovery.
    [Figure: ||η_{F,Ic}||_∞ against the spacing, for two signals x0.]

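This criterion is cheap to evaluate. The sketch below (an added illustration, assuming a periodic Gaussian kernel φ and two opposite-sign spikes separated by Δ grid points; the kernel and sizes are not from the talk) computes ||η_{F,Ic}||_∞ for several spacings.

```python
import numpy as np

def convolution_matrix(phi, N):
    """Phi x = x * phi on a periodic grid of size N, as an explicit N x N matrix."""
    return np.stack([np.roll(phi, i) for i in range(N)], axis=1)

def etaF_inf_off_support(Phi, x0):
    """||eta_{F,I^c}||_inf for J = ||.||_1 (support recovery criterion: value < 1)."""
    I = x0 != 0
    q = np.linalg.pinv(Phi[:, I]).T @ np.sign(x0[I])
    return np.abs((Phi.T @ q)[~I]).max()

N, sigma = 256, 4.0
t = np.arange(N)
phi = np.exp(-0.5 * (np.minimum(t, N - t) / sigma) ** 2)   # periodic Gaussian kernel
Phi = convolution_matrix(phi, N)

# Two spikes of opposite signs separated by Delta grid points.
for Delta in (2, 4, 8, 16, 32):
    x0 = np.zeros(N); x0[N // 2] = 1.0; x0[N // 2 + Delta] = -1.0
    print(f"Delta = {Delta:2d}  ||eta_F||_inf off support = {etaF_inf_off_support(Phi, x0):.3f}")
```
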
  49-50. Conclusion
    Partial smoothness: encodes models using singularities.
    Performance measures: L2 error ↔ model ↔ different CS guarantees.
    Specific certificates: η0, η_F, ...
    Open problems:
    – CS performance with arbitrary gauges.
    – Infinite dimensional regularizations (BV, ...).
    – Convergence discrete → continuous.