600

# Low Complexity Regularization of Inverse Problems

Talk at Curves and Surfaces 2014

June 12, 2014

## Transcript

1. ### Gabriel Peyré www.numerical-tours.com Low Complexity Regularization of Inverse Problems Samuel

Vaiter Jalal Fadili Joint work with: VISI N
2. ### y = x0 + w 2 RP Inverse Problems Recovering

x0 RN from noisy observations
3. ### y = x0 + w 2 RP Examples: Inpainting, super-resolution,

. . . Inverse Problems Recovering x0 RN from noisy observations x0

5. ### Magnetic resonance imaging (MRI): x = ( p✓k )16k6K x

= ( ˆ f ( ! ))!2⌦ Inverse Problems in Medical Imaging ˆ x
6. ### Magnetic resonance imaging (MRI): Other examples: MEG, EEG, . .

. x = ( p✓k )16k6K x = ( ˆ f ( ! ))!2⌦ Inverse Problems in Medical Imaging ˆ x

8. ### P measures N micro-mirrors ˜ x0 y [ i ]

= h x0, 'i i Compressed Sensing [Rice Univ.]
9. ### P/N = 0.16 P/N = 0.02 P/N = 1 P

measures N micro-mirrors ˜ x0 y [ i ] = h x0, 'i i Compressed Sensing [Rice Univ.]
10. ### Inverse Problem Regularization observations y parameter Estimator: x(y) depends only

on Observations: y = x0 + w 2 RP .
11. ### Inverse Problem Regularization observations y parameter Example: variational methods Estimator:

x(y) depends only on x ( y ) 2 argmin x 2RN 1 2 || y x ||2 + J ( x ) Data ﬁdelity Regularity Observations: y = x0 + w 2 RP .
12. ### J ( x0) Regularity of x0 Inverse Problem Regularization observations

y parameter Example: variational methods Estimator: x(y) depends only on x ( y ) 2 argmin x 2RN 1 2 || y x ||2 + J ( x ) Data ﬁdelity Regularity Observations: y = x0 + w 2 RP . Choice of : tradeo ||w|| Noise level
13. ### J ( x0) Regularity of x0 x ( y )

2 argmin x = y J ( x ) Inverse Problem Regularization observations y parameter Example: variational methods Estimator: x(y) depends only on x ( y ) 2 argmin x 2RN 1 2 || y x ||2 + J ( x ) Data ﬁdelity Regularity Observations: y = x0 + w 2 RP . No noise: 0+, minimize Choice of : tradeo ||w|| Noise level
14. ### J ( x0) Regularity of x0 x ( y )

2 argmin x = y J ( x ) Inverse Problem Regularization observations y parameter Example: variational methods Estimator: x(y) depends only on x ( y ) 2 argmin x 2RN 1 2 || y x ||2 + J ( x ) Data ﬁdelity Regularity Observations: y = x0 + w 2 RP . No noise: 0+, minimize Choice of : tradeo ||w|| Noise level ! Criteria on (x0, || w || , ) to ensure ||x(y) x0 || = O(||w||) model stability Performance analysis:
15. ### Overview • Low-complexity Convex Regularization • Performance Guarantees: L2 Error

• Performance Guarantees: Model Consistency
16. ### Coe cients x Image x M Union of Models for

Data Processing Synthesis sparsity: Union of models: M ⇢ RN subspaces or manifolds.
17. ### Coe cients x Image x M Union of Models for

Data Processing Synthesis sparsity: Structured sparsity: Union of models: M ⇢ RN subspaces or manifolds.
18. ### Coe cients x Image x M Union of Models for

Data Processing D Image x Gradient D ⇤ x Synthesis sparsity: Structured sparsity: Analysis sparsity: Union of models: M ⇢ RN subspaces or manifolds.
19. ### Coe cients x Image x Multi-spectral imaging: xi,· = Pr

j=1 Ai,jSj,· M Union of Models for Data Processing D Image x Gradient D ⇤ x Synthesis sparsity: Structured sparsity: Analysis sparsity: Low-rank: S1,· S2,· S3,· x Union of models: M ⇢ RN subspaces or manifolds.
20. ### J ( x ) = | x | Example: J

( x ) = | x |. Subdifferentials and Linear Models @J ( x ) Regularizer: J : RN ! R convex. Sub-di↵erential: @J ( x ) = { ⌘ ; 8 y, J ( y ) > J ( x ) + h ⌘, y x i}
21. ### I = supp( x ) = { i \ xi

6= 0} Example: J ( x ) = || x ||1 = P i | xi |. J ( x ) = | x | Example: J ( x ) = | x |. @J ( x ) Subdifferentials and Linear Models @ || x ||1 = ⇢ ⌘ \ supp( ⌘ ) = I, 8 j / 2 I, | ⌘j | 6 1 x @J ( x ) Regularizer: J : RN ! R convex. Sub-di↵erential: @J ( x ) = { ⌘ ; 8 y, J ( y ) > J ( x ) + h ⌘, y x i}
22. ### I = supp( x ) = { i \ xi

6= 0} Example: J ( x ) = || x ||1 = P i | xi |. J ( x ) = | x | Linear model: Example: J ( x ) = | x |. @J ( x ) T x Subdifferentials and Linear Models T x = {⌘ \ supp(⌘) = I} @ || x ||1 = ⇢ ⌘ \ supp( ⌘ ) = I, 8 j / 2 I, | ⌘j | 6 1 x @J ( x ) Tx = VectHull( @J ( x ))? Regularizer: J : RN ! R convex. Sub-di↵erential: @J ( x ) = { ⌘ ; 8 y, J ( y ) > J ( x ) + h ⌘, y x i}
23. ### [Lewis 2003] (i) J is C 2 along M x

around x ; Partly Smooth Functions J : RN ! R is partly smooth at x for a manifold M x x T x M x J(x) = max(0, || x || 1)
24. ### [Lewis 2003] (i) J is C 2 along M x

around x ; (ii) () 8 h 2 T x ?, t 7! J ( x + th ) is non-smooth at t = 0. (ii) VecHull( @J ( x ))? = Tx = Tangent x (M x ) ; Partly Smooth Functions J : RN ! R is partly smooth at x for a manifold M x x T x M x J(x) = max(0, || x || 1)
25. ### (iii) @J is continuous on M x around x. [Lewis

2003] (i) J is C 2 along M x around x ; (ii) () 8 h 2 T x ?, t 7! J ( x + th ) is non-smooth at t = 0. (ii) VecHull( @J ( x ))? = Tx = Tangent x (M x ) ; Partly Smooth Functions J : RN ! R is partly smooth at x for a manifold M x x T x M x J(x) = max(0, || x || 1)
26. ### (iii) @J is continuous on M x around x. [Lewis

2003] (i) J is C 2 along M x around x ; (ii) () 8 h 2 T x ?, t 7! J ( x + th ) is non-smooth at t = 0. (ii) VecHull( @J ( x ))? = Tx = Tangent x (M x ) ; Important: in general M x 6= T x Partly Smooth Functions J : RN ! R is partly smooth at x for a manifold M x x T x M x J(x) = max(0, || x || 1)
27. ### x M x Examples of Partly-smooth Regularizers J ( x

) = || x ||1 x 0 M x 0 ` 1 sparsity: J ( x ) = || x ||1 M x = Tx ={ z ; supp( z ) ⇢ supp( x )}
28. ### x M x same M x Examples of Partly-smooth Regularizers

J ( x ) = || x ||1 x 0 M x J ( x )=| x1 |+|| x2,3 || x 0 x M x 0 M x 0 ` 1 sparsity: J ( x ) = || x ||1 Structured sparsity: J ( x ) = P b || xb || M x = Tx ={ z ; supp( z ) ⇢ supp( x )}
29. ### x M x M x = { x ; rank(

z ) = rank( x )} same M x Examples of Partly-smooth Regularizers J ( x ) = || x ||1 x 0 M x J ( x ) = || x ||⇤ x M x J ( x )=| x1 |+|| x2,3 || x 0 x M x 0 M x 0 ` 1 sparsity: J ( x ) = || x ||1 Structured sparsity: J ( x ) = P b || xb || Nuclear norm: J ( x ) = || x ||⇤ M x = Tx ={ z ; supp( z ) ⇢ supp( x )}
30. ### x M x M x = { x ; rank(

z ) = rank( x )} same M x I = { i ; | xi | = || x ||1 } M x = Tx = { z ; zI / xI } Examples of Partly-smooth Regularizers J ( x ) = || x ||1 x 0 M x 0 M x J ( x ) = || x ||1 x x 0 M x J ( x ) = || x ||⇤ x M x J ( x )=| x1 |+|| x2,3 || x 0 x M x 0 M x 0 Anti-sparsity: J ( x ) = || x ||1 ` 1 sparsity: J ( x ) = || x ||1 Structured sparsity: J ( x ) = P b || xb || Nuclear norm: J ( x ) = || x ||⇤ M x = Tx ={ z ; supp( z ) ⇢ supp( x )}
31. ### Overview • Low-complexity Convex Regularization • Performance Guarantees: L2 Error

• Performance Guarantees: Model Consistency
32. ### x = x0 Dual Certificates Noiseless recovery: min x =

x0 J ( x ) (P0) x0
33. ### Dual certiﬁcates: x = x0 Proposition: D( x0) = Im(

⇤) \ @J ( x0) x0 solution of ( P0) () 9 ⌘ 2 D (x0) Dual Certificates ⌘ @J ( x0) Noiseless recovery: min x = x0 J ( x ) (P0) x0
34. ### Dual certiﬁcates: x = x0 Proposition: D( x0) = Im(

⇤) \ @J ( x0) x0 solution of ( P0) () 9 ⌘ 2 D (x0) Example: J ( x ) = || x ||1 x = x ? ' D( x0) = { ⌘ = x ? ' ; ⌘i = sign( x0,i) , || ⌘ ||1 6 1} Dual Certificates ⌘ @J ( x0) Noiseless recovery: min x = x0 J ( x ) (P0) x0 ⌘ ⌘
35. ### = interior for the topology of a↵( E ) ri(

E ) = relative interior of E Non degenerate dual certiﬁcate: Dual Certificates and L2 Stability x = x0 ⌘ @J ( x0) x ? ¯ D( x0) = Im( ⇤) \ ri( @J ( x0))
36. ### = interior for the topology of a↵( E ) ri(

E ) = relative interior of E Non degenerate dual certiﬁcate: Dual Certificates and L2 Stability Theorem: [Fadili et al. 2013] If 9 ⌘ 2 ¯ D ( x0), for ⇠ ||w|| one has ||x? x0 || = O ( ||w|| ) x = x0 ⌘ @J ( x0) x ? ¯ D( x0) = Im( ⇤) \ ri( @J ( x0))
37. ### = interior for the topology of a↵( E ) ri(

E ) = relative interior of E Non degenerate dual certiﬁcate: Dual Certificates and L2 Stability [Grassmair 2012]: J(x? x0) = O(||w||). [Grassmair, Haltmeier, Scherzer 2010]: J = || · ||1 . Theorem: [Fadili et al. 2013] If 9 ⌘ 2 ¯ D ( x0), for ⇠ ||w|| one has ||x? x0 || = O ( ||w|| ) x = x0 ⌘ @J ( x0) x ? ¯ D( x0) = Im( ⇤) \ ri( @J ( x0))
38. ### = interior for the topology of a↵( E ) ri(

E ) = relative interior of E Non degenerate dual certiﬁcate: Dual Certificates and L2 Stability [Grassmair 2012]: J(x? x0) = O(||w||). [Grassmair, Haltmeier, Scherzer 2010]: J = || · ||1 . ! The constants depend on N . . . Theorem: [Fadili et al. 2013] If 9 ⌘ 2 ¯ D ( x0), for ⇠ ||w|| one has ||x? x0 || = O ( ||w|| ) x = x0 ⌘ @J ( x0) x ? ¯ D( x0) = Im( ⇤) \ ri( @J ( x0))
39. ### Random matrix: i,j ⇠ N(0, 1), i.i.d. 2 RP ⇥N

, Compressed Sensing Setting
40. ### Theorem: Random matrix: i,j ⇠ N(0, 1), i.i.d. Sparse vectors:

J = || · ||1. Let s = || x0 ||0. If Then 9 ⌘ 2 ¯ D (x0) with high probability on . 2 RP ⇥N , [Chandrasekaran et al. 2011] P > 2 s log ( N/s ) [Rudelson, Vershynin 2006] Compressed Sensing Setting
41. ### Theorem: Then 9 ⌘ 2 ¯ D (x0) with high

probability on . Theorem: Random matrix: i,j ⇠ N(0, 1), i.i.d. Sparse vectors: J = || · ||1. Low-rank matrices: J = || · ||⇤. Let s = || x0 ||0. If Let r = rank( x0). If Then 9 ⌘ 2 ¯ D (x0) with high probability on . 2 RP ⇥N , [Chandrasekaran et al. 2011] P > 2 s log ( N/s ) P > 3r(N1 + N2 r) x0 2 RN1 ⇥N2 [Rudelson, Vershynin 2006] Compressed Sensing Setting [Chandrasekaran et al. 2011]
42. ### Theorem: Then 9 ⌘ 2 ¯ D (x0) with high

probability on . Theorem: Random matrix: i,j ⇠ N(0, 1), i.i.d. Sparse vectors: J = || · ||1. Low-rank matrices: J = || · ||⇤. Let s = || x0 ||0. If Let r = rank( x0). If Then 9 ⌘ 2 ¯ D (x0) with high probability on . 2 RP ⇥N , ! Similar results for || · ||1,2, || · ||1. [Chandrasekaran et al. 2011] P > 2 s log ( N/s ) P > 3r(N1 + N2 r) x0 2 RN1 ⇥N2 [Rudelson, Vershynin 2006] Compressed Sensing Setting [Chandrasekaran et al. 2011]
43. ### From [Amelunxen et al. 2013] Phase Transitions P/N 0 1

s/N r/ p N J = || · ||1 J = || · ||⇤ THE GEOMETRY OF PHASE TRANSITIONS IN CONVEX OPTIMIZATION 0 25 50 75 100 0 25 50 75 100 0 10 20 30 0 300 600 900 FIGURE 2.2: Phase transitions for linear inverse problems. [left] Recovery of sparse vectors. The empiri probability that the `1 minimization problem (2.6) identiﬁes a sparse vector x 0 2 R100 given random line THE GEOMETRY OF PHASE TRANSITIONS IN CONVEX OPTIMIZATION 0 25 50 75 100 0 25 50 75 100 0 10 20 30 0 300 600 900 IGURE 2.2: Phase transitions for linear inverse problems. [left] Recovery of sparse vectors. The empirical robability that the `1 minimization problem (2.6) identiﬁes a sparse vector x 0 2 R100 given random linear 1 0 1 1 P/N
44. ### Overview • Low-complexity Convex Regularization • Performance Guarantees: L2 Error

• Performance Guarantees: Model Consistency
45. ### Minimal Norm Certificate ⌘0 = argmin ⌘ = ⇤ q

2 @J ( x0) ||q|| Minimal-norm certiﬁcate:
46. ### @J ( x0 ) ⇢ A( x0 ) = A↵Hull(

@J ( x0 )) Minimal Norm Certificate ⌘0 = argmin ⌘ = ⇤ q 2 @J ( x0) ||q|| Minimal-norm certiﬁcate: T =T x0 Case J = || · ||1 A( x0) x0 @J ( x0)
47. ### @J ( x0 ) ⇢ A( x0 ) = A↵Hull(

@J ( x0 )) Minimal Norm Certificate ⌘0 = argmin ⌘ = ⇤ q 2 @J ( x0) ||q|| Minimal-norm certiﬁcate: ⌘ F = argmin ⌘ = ⇤ q 2A( x0) ||q|| Linearized pre-certiﬁcate: T =T x0 Case J = || · ||1 A( x0) x0 @J ( x0)
48. ### ! ⌘F is computed by solving a linear system. !

One does not always have ⌘F 2 D( x0 ) ! @J ( x0 ) ⇢ A( x0 ) = A↵Hull( @J ( x0 )) Minimal Norm Certificate ⌘0 = argmin ⌘ = ⇤ q 2 @J ( x0) ||q|| Minimal-norm certiﬁcate: ⌘ F = argmin ⌘ = ⇤ q 2A( x0) ||q|| Linearized pre-certiﬁcate: T =T x0 Case J = || · ||1 A( x0) x0 @J ( x0)
49. ### ! ⌘F is computed by solving a linear system. !

One does not always have ⌘F 2 D( x0 ) ! @J ( x0 ) ⇢ A( x0 ) = A↵Hull( @J ( x0 )) Minimal Norm Certificate ⌘0 = argmin ⌘ = ⇤ q 2 @J ( x0) ||q|| Minimal-norm certiﬁcate: Theorem: If ker( ) \ T = {0}, ⇢ ⌘F 2 ¯ D( x0) =) ⌘F = ⌘0, ⌘0 2 ¯ D( x0) =) ⌘F = ⌘0. ⌘ F = argmin ⌘ = ⇤ q 2A( x0) ||q|| Linearized pre-certiﬁcate: T =T x0 Case J = || · ||1 A( x0) x0 @J ( x0)
50. ### [Vaiter et al. 2014] Model Stability Theorem: the unique solution

x ? of P (y) for y = x0 + w satisﬁes If ⌘F 2 ¯ D (x0), there exists C such that if max ( , ||w||/ ) 6 C x? 2 M x0 and ||x? x0 || = O(||w||, )
51. ### [Fuchs 2004]: J = || · ||1. [Bach 2008]: J

= || · ||1,2 and J = || · ||⇤. Previous works: [Vaiter et al. 2014] Model Stability Theorem: the unique solution x ? of P (y) for y = x0 + w satisﬁes If ⌘F 2 ¯ D (x0), there exists C such that if max ( , ||w||/ ) 6 C [Vaiter et al. 2011]: J = ||D⇤ · ||1. x? 2 M x0 and ||x? x0 || = O(||w||, )
52. ### Compressed Sensing Setting Theorem: Random matrix: i,j ⇠ N(0, 1),

i.i.d. Sparse vectors: J = || · ||1. Let s = || x0 ||0. If 2 RP ⇥N , Then ⌘0 2 ¯ D (x0) with high probability on . [Dossal et al. 2011] [Wainwright 2009] P > 2 s log( N )
53. ### P ⇠ 2 s log( N ) P ⇠ 2

s log( N/s ) L2 stability Model stability Phase transitions: vs. Compressed Sensing Setting Theorem: Random matrix: i,j ⇠ N(0, 1), i.i.d. Sparse vectors: J = || · ||1. Let s = || x0 ||0. If 2 RP ⇥N , Then ⌘0 2 ¯ D (x0) with high probability on . [Dossal et al. 2011] [Wainwright 2009] P > 2 s log( N )
54. ### ! Similar results for || · ||1,2, || · ||⇤,

|| · ||1. P ⇠ 2 s log( N ) P ⇠ 2 s log( N/s ) L2 stability Model stability Phase transitions: vs. Compressed Sensing Setting Theorem: Random matrix: i,j ⇠ N(0, 1), i.i.d. Sparse vectors: J = || · ||1. Let s = || x0 ||0. If 2 RP ⇥N , Then ⌘0 2 ¯ D (x0) with high probability on . [Dossal et al. 2011] [Wainwright 2009] P > 2 s log( N )
55. ### ! Similar results for || · ||1,2, || · ||⇤,

|| · ||1. ! Not using RIP technics (non-uniform result on x0). P ⇠ 2 s log( N ) P ⇠ 2 s log( N/s ) L2 stability Model stability Phase transitions: vs. Compressed Sensing Setting Theorem: Random matrix: i,j ⇠ N(0, 1), i.i.d. Sparse vectors: J = || · ||1. Let s = || x0 ||0. If 2 RP ⇥N , Then ⌘0 2 ¯ D (x0) with high probability on . [Dossal et al. 2011] [Wainwright 2009] P > 2 s log( N )
56. ### ⇥x = i xi (· i) Increasing : reduces correlation.

reduces resolution. J ( x ) = || x ||1 1-D Sparse Spikes Deconvolution x0 x0
57. ### ⇥x = i xi (· i) Increasing : reduces correlation.

reduces resolution. 0 10 2 support recovery. J ( x ) = || x ||1 () I = { j \ x0( j ) 6= 0} ||⌘F,Ic ||1 ||⌘F,Ic ||1 < 1 ⌘0 = ⌘F 2 ¯ D( x0) 1-D Sparse Spikes Deconvolution x0 x0 20 1 ()

59. ### Performance measures L2 error model di↵erent CS guarantees Conclusion Partial

smoothness: encodes models using singularities.
60. ### Performance measures L2 error model di↵erent CS guarantees Speciﬁc certiﬁcate:

⌘0, ⌘F , . . . Conclusion Partial smoothness: encodes models using singularities.
61. ### Performance measures L2 error model di↵erent CS guarantees Speciﬁc certiﬁcate:

– CS performance with arbitrary gauges. – Inﬁnite dimensional regularizations (BV, . . . ) – Convergence discrete ! continuous. ⌘0, ⌘F , . . . Conclusion Open problems: Partial smoothness: encodes models using singularities.