Low Complexity Regularization of Inverse Problems

Talk at Curves and Surfaces 2014

Gabriel Peyré

June 12, 2014

Transcript

  1. Gabriel Peyré, www.numerical-tours.com. Low Complexity Regularization of Inverse Problems. Joint work with Samuel Vaiter and Jalal Fadili.
  2. Inverse Problems: recovering x0 ∈ R^N from the noisy observations y = Φx0 + w ∈ R^P.
  3. Examples: inpainting, super-resolution, . . .
  4. Inverse Problems in Medical Imaging: Φx = (p_{θ_k})_{1 ≤ k ≤ K}.
  5. Magnetic resonance imaging (MRI): Φx = (f̂(ω))_{ω ∈ Ω}.
  6. Other examples: MEG, EEG, . . .
  7. Compressed Sensing [Rice Univ.]: the scene x̃0.
  8. N micro-mirrors, P measures y[i] = ⟨x0, φ_i⟩.
  9. Reconstructions for P/N = 1, P/N = 0.16, P/N = 0.02.
  10. Inverse Problem Regularization. Observations: y = Φx0 + w ∈ R^P. Estimator: x(y) depends only on the observations y and on a parameter λ.
  11. Example: variational methods. x_λ(y) ∈ argmin_{x ∈ R^N} (1/2)||y − Φx||² + λ J(x) (data fidelity + regularity).
  12. Choice of λ: tradeoff between the noise level ||w|| and the regularity J(x0) of x0.
  13. No noise: λ → 0⁺, minimize x(y) ∈ argmin_{Φx = y} J(x).
  14. Performance analysis: criteria on (x0, ||w||, λ) ensuring ||x_λ(y) − x0|| = O(||w||) (L2 stability) and model stability.
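As a concrete instance of this variational estimator, here is a minimal sketch that solves it for J = ||·||_1 with plain proximal gradient descent (ISTA); the solver choice, helper names and toy sizes are ours, not the talk's:

```python
import numpy as np

def soft_threshold(u, t):
    # Proximal operator of t * ||.||_1 (entrywise soft thresholding).
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for  min_x 0.5*||y - Phi x||^2 + lam*||x||_1."""
    N = Phi.shape[1]
    x = np.zeros(N)
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)           # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Tiny usage example with arbitrary dimensions: recover a sparse vector.
rng = np.random.default_rng(0)
P, N, s = 40, 100, 5
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N); x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
w = 0.01 * rng.standard_normal(P)
x_lam = ista(Phi, Phi @ x0 + w, lam=0.05)
```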
  15. Overview: • Low-complexity Convex Regularization • Performance Guarantees: L2 Error • Performance Guarantees: Model Consistency.
  16. Union of Models for Data Processing. Union of models: M ⊂ R^N, subspaces or manifolds. Synthesis sparsity: coefficients x, image Φx.
  17. Structured sparsity.
  18. Analysis sparsity: image x, gradient D*x.
  19. Low-rank: multi-spectral imaging, x_{i,·} = Σ_{j=1}^r A_{i,j} S_{j,·}.
  20. Subdifferentials and Linear Models. Regularizer: J : R^N → R convex. Subdifferential: ∂J(x) = {η ; ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩}. Example: J(x) = |x|.
  21. Example: J(x) = ||x||_1 = Σ_i |x_i|. With I = supp(x) = {i ; x_i ≠ 0}: ∂||x||_1 = {η ; η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1}.
  22. Linear model: T_x = VectHull(∂J(x))^⊥. For J = ||·||_1: T_x = {η ; supp(η) ⊆ I}.
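These two formulas are easy to check numerically in the ℓ1 case; a small sketch, with helper names of our own:

```python
import numpy as np

def in_l1_subdifferential(eta, x, tol=1e-8):
    """Check eta ∈ ∂||x||_1: eta agrees with sign(x) on the support I,
    and |eta_j| <= 1 off the support."""
    I = np.abs(x) > tol
    on_support = np.allclose(eta[I], np.sign(x[I]), atol=tol)
    off_support = np.all(np.abs(eta[~I]) <= 1 + tol)
    return bool(on_support and off_support)

def project_T(z, x, tol=1e-8):
    """Orthogonal projection onto the model space T_x = {z ; supp(z) ⊆ supp(x)}."""
    z_proj = np.array(z, dtype=float)
    z_proj[np.abs(x) <= tol] = 0.0
    return z_proj
```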
  23. Partly Smooth Functions [Lewis 2003]. J : R^N → R is partly smooth at x for a manifold M_x if: (i) J is C² along M_x around x; (illustration: J(x) = max(0, ||x|| − 1), with x, T_x, M_x).
  24. (ii) VectHull(∂J(x))^⊥ = T_x = Tangent_x(M_x), equivalently ∀ h ∈ T_x^⊥, t ↦ J(x + th) is non-smooth at t = 0;
  25. (iii) ∂J is continuous on M_x around x.
  26. Important: in general M_x ≠ T_x.
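A quick numerical illustration of (i) and (ii) on the slide's example J(x) = max(0, ||x|| − 1): along a tangent direction the restriction t ↦ J(x + th) is smooth, along a normal direction it is kinked at t = 0 (the point x and the directions below are our choice):

```python
import numpy as np

def J(x):
    # The slide's example: J(x) = max(0, ||x|| - 1); M_x is the unit sphere when ||x|| = 1.
    return max(0.0, np.linalg.norm(x) - 1.0)

x = np.array([1.0, 0.0])            # a point on the unit sphere
h_tangent = np.array([0.0, 1.0])    # h ∈ T_x (tangent to the sphere at x)
h_normal  = np.array([1.0, 0.0])    # h ∈ T_x^⊥

ts = np.linspace(-0.1, 0.1, 5)
print([J(x + t * h_tangent) for t in ts])  # smooth in t: sqrt(1 + t^2) - 1
print([J(x + t * h_normal) for t in ts])   # kinked at t = 0: max(0, t)
```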
  27. Examples of Partly-Smooth Regularizers. ℓ1 sparsity: J(x) = ||x||_1, M_x = T_x = {z ; supp(z) ⊆ supp(x)}.
  28. Structured sparsity: J(x) = Σ_b ||x_b||, same form of M_x (illustrated with J(x) = |x_1| + ||x_{2,3}||).
  29. Nuclear norm: J(x) = ||x||_*, M_x = {z ; rank(z) = rank(x)}.
  30. Anti-sparsity: J(x) = ||x||_∞, I = {i ; |x_i| = ||x||_∞}, M_x = T_x = {z ; z_I ∝ x_I}.
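For reference, these four regularizers are one-liners numerically; the grouping and the test data below are arbitrary:

```python
import numpy as np

def l1(x):                return np.sum(np.abs(x))                              # sparsity
def group_l1(x, groups):  return sum(np.linalg.norm(x[g]) for g in groups)      # structured sparsity
def nuclear(X):           return np.sum(np.linalg.svd(X, compute_uv=False))     # low rank
def linf(x):              return np.max(np.abs(x))                              # anti-sparsity

x = np.array([1.5, 0.0, -2.0, 0.0])
print(l1(x), group_l1(x, [[0, 1], [2, 3]]), linf(x))
print(nuclear(np.outer(x, x)))   # rank-1 matrix x x^T: nuclear norm = ||x||_2^2
```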
  31. Overview: • Low-complexity Convex Regularization • Performance Guarantees: L2 Error • Performance Guarantees: Model Consistency.
  32. Dual Certificates. Noiseless recovery: min_{Φx = Φx0} J(x). (P0)
  33. Dual certificates: D(x0) = Im(Φ*) ∩ ∂J(x0). Proposition: x0 is a solution of (P0) ⟺ ∃ η ∈ D(x0).
  34. Example: J(x) = ||x||_1: D(x0) = {η = Φ*q ; η_I = sign(x_{0,I}), ||η||_∞ ≤ 1}, where I = supp(x0).
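A minimal sketch of solving (P0) for J = ||·||_1 via the standard linear-programming split (function name and toy sizes are ours; SciPy's linprog is one possible solver):

```python
import numpy as np
from scipy.optimize import linprog

def l1_noiseless_recovery(Phi, y):
    """Solve (P0): min ||x||_1 s.t. Phi x = y, with the split x = x+ - x-."""
    P, N = Phi.shape
    c = np.ones(2 * N)                 # minimize sum(x+) + sum(x-)
    A_eq = np.hstack([Phi, -Phi])      # Phi (x+ - x-) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:N] - z[N:]

# Toy check of the proposition: with enough Gaussian measurements,
# (P0) recovers x0 exactly (equivalently, a certificate eta ∈ D(x0) exists).
rng = np.random.default_rng(0)
P, N, s = 40, 100, 5
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N); x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
x_rec = l1_noiseless_recovery(Phi, Phi @ x0)
print(np.linalg.norm(x_rec - x0))      # typically ≈ 0 for these sizes
```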
  35. Dual Certificates and L2 Stability. Non-degenerate dual certificate: D̄(x0) = Im(Φ*) ∩ ri(∂J(x0)), where ri(E) = relative interior of E = interior for the topology of aff(E).
  36. Theorem [Fadili et al. 2013]: if ∃ η ∈ D̄(x0), then for λ ∼ ||w|| one has ||x⋆ − x0|| = O(||w||).
  37. Related results: [Grasmair, Haltmeier, Scherzer 2010]: J = ||·||_1; [Grasmair 2012]: J(x⋆ − x0) = O(||w||).
  38. → The constants depend on N . . .
  39. Compressed Sensing Setting. Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0,1) i.i.d.
  40. Sparse vectors: J = ||·||_1, s = ||x0||_0. Theorem [Rudelson, Vershynin 2006], [Chandrasekaran et al. 2011]: if P ≥ 2 s log(N/s), then ∃ η ∈ D̄(x0) with high probability on Φ.
  41. Low-rank matrices: J = ||·||_*, x0 ∈ R^{N1×N2}, r = rank(x0). Theorem [Chandrasekaran et al. 2011]: if P ≥ 3 r (N1 + N2 − r), then ∃ η ∈ D̄(x0) with high probability on Φ.
  42. → Similar results for ||·||_{1,2} and ||·||_∞.
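To get a feel for these sample-complexity bounds, here is the arithmetic for arbitrary example sizes (the numbers are illustrative only):

```python
import numpy as np

N, s = 10_000, 100                       # ambient dimension, sparsity
P_l1 = 2 * s * np.log(N / s)             # sparse vectors, J = ||.||_1
print(round(P_l1))                       # ≈ 921 measurements

N1, N2, r = 100, 100, 5                  # matrix size, rank
P_nuc = 3 * r * (N1 + N2 - r)            # low-rank matrices, J = ||.||_*
print(P_nuc)                             # 2925 measurements
```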
  43. Phase transitions, from [Amelunxen et al. 2013] ("The Geometry of Phase Transitions in Convex Optimization"): empirical probability of exact recovery as a function of P/N, against s/N for J = ||·||_1 (recovery of sparse vectors x0 ∈ R^100 from random linear measurements) and against r/√N for J = ||·||_*.
  44. Overview: • Low-complexity Convex Regularization • Performance Guarantees: L2 Error • Performance Guarantees: Model Consistency.
  45. Minimal Norm Certificate. Minimal-norm certificate: η0 = argmin_{η = Φ*q ∈ ∂J(x0)} ||q||.
  46. ∂J(x0) ⊂ A(x0) = AffHull(∂J(x0)). (Illustration for J = ||·||_1: ∂J(x0), A(x0), T = T_{x0}.)
  47. Linearized pre-certificate: η_F = argmin_{η = Φ*q ∈ A(x0)} ||q||.
  48. → η_F is computed by solving a linear system. → One does not always have η_F ∈ D(x0)!
  49. Theorem: if ker(Φ) ∩ T = {0}, then η_F ∈ D̄(x0) ⟹ η_F = η0, and η0 ∈ D̄(x0) ⟹ η_F = η0.
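For J = ||·||_1 this linear system has a closed form (the classical Fuchs pre-certificate); a sketch with our own helper name, assuming Φ restricted to the support has full column rank:

```python
import numpy as np

def linearized_precertificate(Phi, x0):
    """eta_F = Phi^T q with q of minimal norm such that (Phi^T q)_I = sign(x0_I),
    i.e. q = Phi_I (Phi_I^T Phi_I)^{-1} sign(x0_I); assumes ker(Phi) ∩ T = {0}."""
    I = np.flatnonzero(x0)
    Phi_I = Phi[:, I]
    q = Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, np.sign(x0[I]))
    return Phi.T @ q

# eta_F is non-degenerate (eta_F ∈ D̄(x0)) when all off-support entries satisfy |eta_F[j]| < 1:
#   Ic = np.setdiff1d(np.arange(Phi.shape[1]), np.flatnonzero(x0))
#   nondegenerate = np.max(np.abs(eta_F[Ic])) < 1
```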
  50. Model Stability. Theorem [Vaiter et al. 2014]: if η_F ∈ D̄(x0), there exists C such that if max(λ, ||w||/λ) ≤ C, the unique solution x⋆ of P_λ(y) for y = Φx0 + w satisfies x⋆ ∈ M_{x0} and ||x⋆ − x0|| = O(||w||, λ).
  51. Previous works: [Fuchs 2004]: J = ||·||_1; [Bach 2008]: J = ||·||_{1,2} and J = ||·||_*; [Vaiter et al. 2011]: J = ||D* ·||_1.
  52. Compressed Sensing Setting. Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0,1) i.i.d. Sparse vectors: J = ||·||_1, s = ||x0||_0. Theorem [Wainwright 2009], [Dossal et al. 2011]: if P ≥ 2 s log(N), then η0 ∈ D̄(x0) with high probability on Φ.
  53. Phase transitions: L2 stability at P ∼ 2 s log(N/s) vs. model stability at P ∼ 2 s log(N).
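The gap between the two regimes is just the log factor; a quick numerical comparison with arbitrary example sizes:

```python
import numpy as np

N, s = 10_000, 100
print(round(2 * s * np.log(N / s)))   # ≈  921 measurements: L2-stability regime
print(round(2 * s * np.log(N)))       # ≈ 1842 measurements: model-stability (support recovery) regime
```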
  54. → Similar results for ||·||_{1,2}, ||·||_*, ||·||_∞.
  55. → Not using RIP techniques (non-uniform result on x0).
  56. 1-D Sparse Spikes Deconvolution. Φx = Σ_i x_i φ(· − Δ i), J(x) = ||x||_1. Increasing Δ: reduces correlation, but reduces resolution.
  57. Support recovery: with I = {j ; x0(j) ≠ 0}, ||η_{F,I^c}||_∞ < 1 ⟺ η0 = η_F ∈ D̄(x0) ⟺ stable support recovery.
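A rough way to reproduce this kind of experiment numerically, assuming a Gaussian point-spread function on a periodic grid (our choice of filter, sizes and spike pattern, not necessarily the talk's):

```python
import numpy as np

def eta_F_linf_off_support(N, spacing, sigma):
    """Build a circular Gaussian convolution operator Phi, place two unit spikes
    `spacing` samples apart, and return ||eta_F||_inf off the support;
    values < 1 indicate that the support is stably recovered by l1 deconvolution."""
    t = np.arange(N)
    d = np.minimum(t, N - t)                                     # circular distances
    h = np.exp(-d**2 / (2 * sigma**2))                           # Gaussian filter
    Phi = np.stack([np.roll(h, i) for i in range(N)], axis=1)    # columns = shifted filters
    x0 = np.zeros(N); x0[N // 2] = 1.0; x0[N // 2 + spacing] = 1.0
    I = np.flatnonzero(x0)
    Phi_I = Phi[:, I]
    q = Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, np.sign(x0[I])) # Fuchs pre-certificate
    eta_F = Phi.T @ q
    Ic = np.setdiff1d(np.arange(N), I)
    return np.max(np.abs(eta_F[Ic]))

for spacing in (2, 4, 8, 16):
    print(spacing, eta_F_linf_off_support(N=128, spacing=spacing, sigma=3.0))
```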
  58. Conclusion. Partial smoothness: encodes models using singularities.
  59. Performance measures: L2 error vs. model consistency → different CS guarantees.
  60. Specific certificates: η0, η_F, . . .
  61. Open problems: CS performance with arbitrary gauges; infinite-dimensional regularizations (BV, . . . ); convergence discrete → continuous.