Slide 1

Low Complexity Regularization of Inverse Problems
Gabriel Peyré, www.numerical-tours.com
Joint work with: Samuel Vaiter, Jalal Fadili

Slide 2

Inverse Problems
Recovering x_0 ∈ R^N from noisy observations y = Φ x_0 + w ∈ R^P.

Slide 3

Inverse Problems
Recovering x_0 ∈ R^N from noisy observations y = Φ x_0 + w ∈ R^P.
Examples: inpainting, super-resolution, . . .

Slide 4

Inverse Problems in Medical Imaging
Φ x = (p_{θ_k})_{1 ≤ k ≤ K}

Slide 5

Inverse Problems in Medical Imaging
Φ x = (p_{θ_k})_{1 ≤ k ≤ K}
Magnetic resonance imaging (MRI): Φ x = (f̂(ω))_{ω ∈ Ω}

Slide 6

Inverse Problems in Medical Imaging
Φ x = (p_{θ_k})_{1 ≤ k ≤ K}
Magnetic resonance imaging (MRI): Φ x = (f̂(ω))_{ω ∈ Ω}
Other examples: MEG, EEG, . . .
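For readers who want to experiment, MRI-type measurements amount to sampling the Fourier transform on a frequency set Ω. Below is a minimal numpy sketch of such a measurement operator Φ and its adjoint (not from the slides; the random sampling pattern, sizes and noise level are arbitrary choices made for the illustration).

```python
import numpy as np

def make_fourier_mask(N, P, rng):
    """Pick P frequencies Omega out of N uniformly at random (a hypothetical sampling pattern)."""
    return rng.choice(N, size=P, replace=False)

def Phi(x, omega):
    """Measurements: Fourier coefficients of x restricted to the set omega."""
    return np.fft.fft(x, norm="ortho")[omega]

def Phi_adj(y, omega, N):
    """Adjoint: zero-fill the spectrum outside omega and apply the inverse FFT."""
    spectrum = np.zeros(N, dtype=complex)
    spectrum[omega] = y
    return np.fft.ifft(spectrum, norm="ortho")

# Example: noisy partial Fourier measurements of a sparse signal x0.
rng = np.random.default_rng(0)
N, P, s, sigma = 256, 64, 8, 0.02
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
omega = make_fourier_mask(N, P, rng)
noise = sigma * (rng.standard_normal(P) + 1j * rng.standard_normal(P)) / np.sqrt(2)
y = Phi(x0, omega) + noise
```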

Slide 7

Compressed Sensing [Rice Univ.]
Scene x̃_0.

Slide 8

Compressed Sensing [Rice Univ.]
Scene x̃_0, N micro-mirrors, P measurements: y[i] = ⟨x_0, φ_i⟩.

Slide 9

Compressed Sensing [Rice Univ.]
Scene x̃_0, N micro-mirrors, P measurements: y[i] = ⟨x_0, φ_i⟩.
Example reconstructions at P/N = 1, P/N = 0.16, P/N = 0.02.

Slide 10

Inverse Problem Regularization
Observations: y = Φ x_0 + w ∈ R^P.
Estimator: x(y) depends only on the observations y and a parameter λ.

Slide 11

Inverse Problem Regularization
Observations: y = Φ x_0 + w ∈ R^P.
Estimator: x(y) depends only on the observations y and a parameter λ.
Example: variational methods
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2) ||y − Φ x||² + λ J(x)
    (data fidelity + regularity)

Slide 12

Inverse Problem Regularization
Observations: y = Φ x_0 + w ∈ R^P.
Estimator: x(y) depends only on the observations y and a parameter λ.
Example: variational methods
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2) ||y − Φ x||² + λ J(x)
    (data fidelity + regularity)
Choice of λ: tradeoff between the noise level ||w|| and the regularity J(x_0) of x_0.

Slide 13

Inverse Problem Regularization
Observations: y = Φ x_0 + w ∈ R^P.
Estimator: x(y) depends only on the observations y and a parameter λ.
Example: variational methods
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2) ||y − Φ x||² + λ J(x)
    (data fidelity + regularity)
Choice of λ: tradeoff between the noise level ||w|| and the regularity J(x_0) of x_0.
No noise: λ → 0⁺, minimize x(y) ∈ argmin_{Φ x = y} J(x).

Slide 14

Inverse Problem Regularization
Observations: y = Φ x_0 + w ∈ R^P.
Estimator: x(y) depends only on the observations y and a parameter λ.
Example: variational methods
    x_λ(y) ∈ argmin_{x ∈ R^N} (1/2) ||y − Φ x||² + λ J(x)
    (data fidelity + regularity)
Choice of λ: tradeoff between the noise level ||w|| and the regularity J(x_0) of x_0.
No noise: λ → 0⁺, minimize x(y) ∈ argmin_{Φ x = y} J(x).
Performance analysis: criteria on (x_0, ||w||, λ) to ensure ||x_λ(y) − x_0|| = O(||w||) and model stability.
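The slides keep J generic; as a concrete illustration for J = ||·||_1, the variational problem above can be solved by proximal gradient descent (ISTA). This is only a sketch under assumed choices (solver, step size 1/||Φ||², toy problem sizes), not the method advocated in the talk.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Solve x_lam(y) = argmin_x 0.5*||y - Phi x||^2 + lam*||x||_1
    by proximal gradient descent (ISTA), with step size 1/L, L = ||Phi||_2^2."""
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the data-fidelity gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)           # gradient of 0.5*||y - Phi x||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy example with a random Gaussian Phi and a sparse x0 (all sizes are arbitrary).
rng = np.random.default_rng(0)
N, P, s = 200, 80, 5
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = 1.0
y = Phi @ x0 + 0.05 * rng.standard_normal(P)
x_hat = ista(Phi, y, lam=0.1)
```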

Slide 15

Overview
• Low-complexity Convex Regularization
• Performance Guarantees: L2 Error
• Performance Guarantees: Model Consistency

Slide 16

Union of Models for Data Processing
Union of models: M ⊂ R^N, subspaces or manifolds.
Synthesis sparsity: coefficients x → image.

Slide 17

Union of Models for Data Processing
Union of models: M ⊂ R^N, subspaces or manifolds.
Synthesis sparsity: coefficients x → image.
Structured sparsity.

Slide 18

Union of Models for Data Processing
Union of models: M ⊂ R^N, subspaces or manifolds.
Synthesis sparsity: coefficients x → image.
Structured sparsity.
Analysis sparsity: image x → gradient D* x.

Slide 19

Union of Models for Data Processing
Union of models: M ⊂ R^N, subspaces or manifolds.
Synthesis sparsity: coefficients x → image.
Structured sparsity.
Analysis sparsity: image x → gradient D* x.
Low-rank (multi-spectral imaging): x_{i,·} = Σ_{j=1}^r A_{i,j} S_{j,·}.

Slide 20

Subdifferentials and Linear Models
Regularizer: J : R^N → R convex.
Sub-differential: ∂J(x) = { η ; ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩ }
Example: J(x) = |x|.

Slide 21

Subdifferentials and Linear Models
Regularizer: J : R^N → R convex.
Sub-differential: ∂J(x) = { η ; ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩ }
Example: J(x) = |x|.
Example: J(x) = ||x||_1 = Σ_i |x_i|, with I = supp(x) = { i ; x_i ≠ 0 }:
    ∂||x||_1 = { η ; η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1 }

Slide 22

Subdifferentials and Linear Models
Regularizer: J : R^N → R convex.
Sub-differential: ∂J(x) = { η ; ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩ }
Example: J(x) = |x|.
Example: J(x) = ||x||_1 = Σ_i |x_i|, with I = supp(x) = { i ; x_i ≠ 0 }:
    ∂||x||_1 = { η ; η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1 }
Linear model: T_x = VectHull(∂J(x))^⊥
For J = ||·||_1: T_x = { z ; supp(z) ⊂ I }
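A small numpy illustration (not part of the slides) of these objects for J = ||·||_1: the support I, the projector onto the model tangent space T_x, and a membership test for ∂||x||_1. The numerical tolerance is an implementation choice.

```python
import numpy as np

def l1_model(x, tol=1e-8):
    """Support I and model tangent space T_x = {z ; supp(z) ⊂ I} for J = ||.||_1,
    returned as the index set and the orthogonal projector onto T_x."""
    I = np.flatnonzero(np.abs(x) > tol)
    P_T = np.zeros((x.size, x.size))
    P_T[I, I] = 1.0                       # diagonal projector onto the support
    return I, P_T

def in_l1_subdifferential(eta, x, tol=1e-8):
    """Check eta ∈ ∂||x||_1, i.e. eta_I = sign(x_I) and |eta_j| ≤ 1 off the support."""
    I, _ = l1_model(x, tol)
    on_support = np.allclose(eta[I], np.sign(x[I]), atol=tol)
    off_support = np.all(np.abs(np.delete(eta, I)) <= 1 + tol)
    return on_support and off_support

x = np.array([0.0, 2.0, 0.0, -1.5])
eta = np.array([0.3, 1.0, -0.7, -1.0])
print(in_l1_subdifferential(eta, x))      # True: eta_I = sign(x_I), |eta_j| ≤ 1 elsewhere
```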

Slide 23

Partly Smooth Functions [Lewis 2003]
J : R^N → R is partly smooth at x for a manifold M_x:
(i) J is C² along M_x around x;
[Figure: x, T_x and M_x for J(x) = max(0, ||x|| − 1)]

Slide 24

Partly Smooth Functions [Lewis 2003]
J : R^N → R is partly smooth at x for a manifold M_x:
(i) J is C² along M_x around x;
(ii) VectHull(∂J(x))^⊥ = T_x = Tangent_x(M_x);
(ii) ⟺ ∀ h ∈ T_x^⊥, t ↦ J(x + th) is non-smooth at t = 0.
[Figure: x, T_x and M_x for J(x) = max(0, ||x|| − 1)]

Slide 25

Partly Smooth Functions [Lewis 2003]
J : R^N → R is partly smooth at x for a manifold M_x:
(i) J is C² along M_x around x;
(ii) VectHull(∂J(x))^⊥ = T_x = Tangent_x(M_x);
(ii) ⟺ ∀ h ∈ T_x^⊥, t ↦ J(x + th) is non-smooth at t = 0.
(iii) ∂J is continuous on M_x around x.
[Figure: x, T_x and M_x for J(x) = max(0, ||x|| − 1)]

Slide 26

Partly Smooth Functions [Lewis 2003]
J : R^N → R is partly smooth at x for a manifold M_x:
(i) J is C² along M_x around x;
(ii) VectHull(∂J(x))^⊥ = T_x = Tangent_x(M_x);
(ii) ⟺ ∀ h ∈ T_x^⊥, t ↦ J(x + th) is non-smooth at t = 0.
(iii) ∂J is continuous on M_x around x.
Important: in general M_x ≠ T_x.
[Figure: x, T_x and M_x for J(x) = max(0, ||x|| − 1)]

Slide 27

Examples of Partly-smooth Regularizers
ℓ1 sparsity: J(x) = ||x||_1, M_x = T_x = { z ; supp(z) ⊂ supp(x) }
[Figures: the manifolds M_x and M_{x'}]

Slide 28

Examples of Partly-smooth Regularizers
ℓ1 sparsity: J(x) = ||x||_1, M_x = T_x = { z ; supp(z) ⊂ supp(x) }
Structured sparsity: J(x) = Σ_b ||x_b||  [e.g. J(x) = |x_1| + ||x_{2,3}||, same M_x]
[Figures: the manifolds M_x and M_{x'}]

Slide 29

Examples of Partly-smooth Regularizers
ℓ1 sparsity: J(x) = ||x||_1, M_x = T_x = { z ; supp(z) ⊂ supp(x) }
Structured sparsity: J(x) = Σ_b ||x_b||  [e.g. J(x) = |x_1| + ||x_{2,3}||, same M_x]
Nuclear norm: J(x) = ||x||_*, M_x = { z ; rank(z) = rank(x) }
[Figures: the manifolds M_x and M_{x'}]

Slide 30

Examples of Partly-smooth Regularizers
ℓ1 sparsity: J(x) = ||x||_1, M_x = T_x = { z ; supp(z) ⊂ supp(x) }
Structured sparsity: J(x) = Σ_b ||x_b||  [e.g. J(x) = |x_1| + ||x_{2,3}||, same M_x]
Nuclear norm: J(x) = ||x||_*, M_x = { z ; rank(z) = rank(x) }
Anti-sparsity: J(x) = ||x||_∞, I = { i ; |x_i| = ||x||_∞ }, M_x = T_x = { z ; z_I ∝ x_I }
[Figures: the manifolds M_x and M_{x'}]
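For reference, the four regularizers and the quantity indexing their model M_x (support, active blocks, rank, saturation set) can be evaluated in a few lines of numpy. This is an illustrative sketch (not from the slides); the block structure and the test vector are arbitrary.

```python
import numpy as np

# ℓ1 sparsity: J(x) = ||x||_1, model indexed by the support of x.
def l1(x):
    return np.sum(np.abs(x))

# Structured sparsity: J(x) = sum_b ||x_b||, model indexed by the active blocks.
def group_l1(x, blocks):
    return sum(np.linalg.norm(x[b]) for b in blocks)

# Nuclear norm: J(X) = ||X||_*, model indexed by rank(X).
def nuclear(X):
    return np.sum(np.linalg.svd(X, compute_uv=False))

# Anti-sparsity: J(x) = ||x||_inf, model indexed by the saturation set I = {i ; |x_i| = ||x||_inf}.
def linf(x):
    return np.max(np.abs(x))

x = np.array([0.0, 3.0, -3.0, 0.5])
blocks = [np.array([0, 1]), np.array([2, 3])]     # an arbitrary block structure
X = np.outer(x, x)                                # a rank-1 matrix
print(l1(x), group_l1(x, blocks), nuclear(X), linf(x))
print("support:", np.flatnonzero(x != 0))
print("active blocks:", [k for k, b in enumerate(blocks) if np.linalg.norm(x[b]) > 0])
print("rank:", np.linalg.matrix_rank(X))
print("saturation set:", np.flatnonzero(np.abs(x) == linf(x)))
```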

Slide 31

Overview
• Low-complexity Convex Regularization
• Performance Guarantees: L2 Error
• Performance Guarantees: Model Consistency

Slide 32

Dual Certificates
Noiseless recovery: min_{Φ x = Φ x_0} J(x)   (P_0)

Slide 33

Dual Certificates
Noiseless recovery: min_{Φ x = Φ x_0} J(x)   (P_0)
Dual certificates: D(x_0) = Im(Φ*) ∩ ∂J(x_0)
Proposition: x_0 is a solution of (P_0) ⟺ ∃ η ∈ D(x_0)

Slide 34

Dual Certificates
Noiseless recovery: min_{Φ x = Φ x_0} J(x)   (P_0)
Dual certificates: D(x_0) = Im(Φ*) ∩ ∂J(x_0)
Proposition: x_0 is a solution of (P_0) ⟺ ∃ η ∈ D(x_0)
Example: J(x) = ||x||_1:
    D(x_0) = { η = Φ* q ; η_I = sign(x_{0,I}), ||η||_∞ ≤ 1 }
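To test a candidate certificate numerically in the ℓ1 case, one can form η = Φ*q and check the two conditions above. A minimal sketch (an illustration with arbitrary tolerances, not from the slides); it also reports whether η is non-degenerate in the sense used on the next slides.

```python
import numpy as np

def l1_certificate_status(Phi, q, x0, tol=1e-8):
    """For J = ||.||_1, check whether eta = Phi^T q lies in D(x0)
    (eta_I = sign(x0_I) and ||eta||_inf <= 1) and whether it is
    non-degenerate (|eta_j| < 1 strictly outside the support)."""
    eta = Phi.T @ q
    I = np.flatnonzero(np.abs(x0) > tol)
    Ic = np.setdiff1d(np.arange(x0.size), I)
    off_norm = np.max(np.abs(eta[Ic]), initial=0.0)
    on_support = np.allclose(eta[I], np.sign(x0[I]), atol=1e-6)
    is_certificate = on_support and off_norm <= 1 + tol
    is_nondegenerate = on_support and off_norm < 1 - tol
    return eta, is_certificate, is_nondegenerate
```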

Slide 35

Dual Certificates and L2 Stability
Non-degenerate dual certificate: D̄(x_0) = Im(Φ*) ∩ ri(∂J(x_0))
ri(E) = relative interior of E = interior for the topology of aff(E)

Slide 36

Dual Certificates and L2 Stability
Non-degenerate dual certificate: D̄(x_0) = Im(Φ*) ∩ ri(∂J(x_0))
ri(E) = relative interior of E = interior for the topology of aff(E)
Theorem [Fadili et al. 2013]: If ∃ η ∈ D̄(x_0), then for λ ∼ ||w|| one has ||x^⋆ − x_0|| = O(||w||).

Slide 37

Dual Certificates and L2 Stability
Non-degenerate dual certificate: D̄(x_0) = Im(Φ*) ∩ ri(∂J(x_0))
ri(E) = relative interior of E = interior for the topology of aff(E)
Theorem [Fadili et al. 2013]: If ∃ η ∈ D̄(x_0), then for λ ∼ ||w|| one has ||x^⋆ − x_0|| = O(||w||).
[Grasmair 2012]: J(x^⋆ − x_0) = O(||w||).
[Grasmair, Haltmeier, Scherzer 2010]: J = ||·||_1.

Slide 38

Dual Certificates and L2 Stability
Non-degenerate dual certificate: D̄(x_0) = Im(Φ*) ∩ ri(∂J(x_0))
ri(E) = relative interior of E = interior for the topology of aff(E)
Theorem [Fadili et al. 2013]: If ∃ η ∈ D̄(x_0), then for λ ∼ ||w|| one has ||x^⋆ − x_0|| = O(||w||).
[Grasmair 2012]: J(x^⋆ − x_0) = O(||w||).
[Grasmair, Haltmeier, Scherzer 2010]: J = ||·||_1.
→ The constants depend on N . . .

Slide 39

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.

Slide 40

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]: If P ≥ 2 s log(N/s), then ∃ η ∈ D̄(x_0) with high probability on Φ.

Slide 41

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]: If P ≥ 2 s log(N/s), then ∃ η ∈ D̄(x_0) with high probability on Φ.
Low-rank matrices: J = ||·||_*. Let r = rank(x_0), x_0 ∈ R^{N_1×N_2}.
Theorem [Chandrasekaran et al. 2011]: If P ≥ 3 r (N_1 + N_2 − r), then ∃ η ∈ D̄(x_0) with high probability on Φ.

Slide 42

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]: If P ≥ 2 s log(N/s), then ∃ η ∈ D̄(x_0) with high probability on Φ.
Low-rank matrices: J = ||·||_*. Let r = rank(x_0), x_0 ∈ R^{N_1×N_2}.
Theorem [Chandrasekaran et al. 2011]: If P ≥ 3 r (N_1 + N_2 − r), then ∃ η ∈ D̄(x_0) with high probability on Φ.
→ Similar results for ||·||_{1,2}, ||·||_∞.

Slide 43

Phase Transitions
From [Amelunxen et al. 2013], "The geometry of phase transitions in convex optimization".
[Figure 2.2 of that paper: phase transitions for linear inverse problems; empirical probability of recovery as a function of P/N, against s/N for J = ||·||_1 (sparse vectors x_0 ∈ R^100 from random linear measurements) and against r/√N for J = ||·||_*.]
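The sparse-vector panel of that figure can be reproduced in spirit with a small Monte Carlo experiment: solve the noiseless problem (P_0) for J = ||·||_1 (here via the standard linear-programming reformulation) and record how often x_0 is recovered. The solver, tolerances and problem sizes below are arbitrary choices, not those of [Amelunxen et al. 2013].

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 s.t. Phi x = y via the LP reformulation x = u - v, u, v >= 0."""
    P, N = Phi.shape
    c = np.ones(2 * N)
    A_eq = np.hstack([Phi, -Phi])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

def recovery_probability(N, s, P, trials=20, seed=0):
    """Empirical probability that l1 minimization recovers an s-sparse x0
    from P Gaussian measurements (one point of the phase-transition curve)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        Phi = rng.standard_normal((P, N))
        x0 = np.zeros(N)
        x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
        x_rec = basis_pursuit(Phi, Phi @ x0)
        hits += np.linalg.norm(x_rec - x0) < 1e-4 * np.linalg.norm(x0)
    return hits / trials

print(recovery_probability(N=60, s=4, P=30))
```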

Slide 44

Overview
• Low-complexity Convex Regularization
• Performance Guarantees: L2 Error
• Performance Guarantees: Model Consistency

Slide 45

Minimal Norm Certificate
Minimal-norm certificate: η_0 = argmin_{η = Φ* q ∈ ∂J(x_0)} ||q||

Slide 46

Minimal Norm Certificate
Minimal-norm certificate: η_0 = argmin_{η = Φ* q ∈ ∂J(x_0)} ||q||
∂J(x_0) ⊂ A(x_0) = AffHull(∂J(x_0))
[Figure: case J = ||·||_1, showing ∂J(x_0), A(x_0) and T = T_{x_0}]

Slide 47

Minimal Norm Certificate
Minimal-norm certificate: η_0 = argmin_{η = Φ* q ∈ ∂J(x_0)} ||q||
∂J(x_0) ⊂ A(x_0) = AffHull(∂J(x_0))
Linearized pre-certificate: η_F = argmin_{η = Φ* q ∈ A(x_0)} ||q||
[Figure: case J = ||·||_1, showing ∂J(x_0), A(x_0) and T = T_{x_0}]

Slide 48

Minimal Norm Certificate
Minimal-norm certificate: η_0 = argmin_{η = Φ* q ∈ ∂J(x_0)} ||q||
∂J(x_0) ⊂ A(x_0) = AffHull(∂J(x_0))
Linearized pre-certificate: η_F = argmin_{η = Φ* q ∈ A(x_0)} ||q||
→ η_F is computed by solving a linear system.
→ One does not always have η_F ∈ D(x_0)!
[Figure: case J = ||·||_1, showing ∂J(x_0), A(x_0) and T = T_{x_0}]

Slide 49

Minimal Norm Certificate
Minimal-norm certificate: η_0 = argmin_{η = Φ* q ∈ ∂J(x_0)} ||q||
∂J(x_0) ⊂ A(x_0) = AffHull(∂J(x_0))
Linearized pre-certificate: η_F = argmin_{η = Φ* q ∈ A(x_0)} ||q||
→ η_F is computed by solving a linear system.
→ One does not always have η_F ∈ D(x_0)!
Theorem: If ker(Φ) ∩ T_{x_0} = {0}, then
    η_F ∈ D̄(x_0) ⟹ η_F = η_0,
    η_0 ∈ D̄(x_0) ⟹ η_F = η_0.
[Figure: case J = ||·||_1, showing ∂J(x_0), A(x_0) and T = T_{x_0}]
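For J = ||·||_1 the linear system defining η_F has a classical closed form (the Fuchs pre-certificate): q = Φ_I (Φ_I^* Φ_I)^{-1} sign(x_{0,I}). A minimal numpy sketch, assuming Φ_I has full column rank (i.e. ker(Φ) ∩ T_{x_0} = {0}); it is an illustration, not code from the talk.

```python
import numpy as np

def linearized_precertificate_l1(Phi, x0, tol=1e-9):
    """eta_F = Phi^T q with q of minimal norm such that (Phi^T q)_I = sign(x0_I),
    i.e. q = Phi_I (Phi_I^T Phi_I)^{-1} sign(x0_I).  Non-degeneracy of eta_F
    corresponds to |eta_F| < 1 strictly outside the support I."""
    I = np.flatnonzero(np.abs(x0) > tol)
    Phi_I = Phi[:, I]
    q = Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, np.sign(x0[I]))
    eta_F = Phi.T @ q
    off = np.setdiff1d(np.arange(x0.size), I)
    nondegenerate = np.max(np.abs(eta_F[off])) < 1
    return eta_F, q, nondegenerate
```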

Slide 50

Model Stability
Theorem [Vaiter et al. 2014]: If η_F ∈ D̄(x_0), there exists C such that if max(λ, ||w||/λ) ≤ C, the unique solution x^⋆ of P_λ(y) for y = Φ x_0 + w satisfies
    x^⋆ ∈ M_{x_0}  and  ||x^⋆ − x_0|| = O(||w||, λ).

Slide 51

Model Stability
Theorem [Vaiter et al. 2014]: If η_F ∈ D̄(x_0), there exists C such that if max(λ, ||w||/λ) ≤ C, the unique solution x^⋆ of P_λ(y) for y = Φ x_0 + w satisfies
    x^⋆ ∈ M_{x_0}  and  ||x^⋆ − x_0|| = O(||w||, λ).
Previous works:
[Fuchs 2004]: J = ||·||_1.
[Bach 2008]: J = ||·||_{1,2} and J = ||·||_*.
[Vaiter et al. 2011]: J = ||D* ·||_1.

Slide 52

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Wainwright 2009] [Dossal et al. 2011]: If P ≥ 2 s log(N), then η_0 ∈ D̄(x_0) with high probability on Φ.

Slide 53

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Wainwright 2009] [Dossal et al. 2011]: If P ≥ 2 s log(N), then η_0 ∈ D̄(x_0) with high probability on Φ.
Phase transitions: L2 stability: P ∼ 2 s log(N/s)  vs.  model stability: P ∼ 2 s log(N).

Slide 54

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Wainwright 2009] [Dossal et al. 2011]: If P ≥ 2 s log(N), then η_0 ∈ D̄(x_0) with high probability on Φ.
Phase transitions: L2 stability: P ∼ 2 s log(N/s)  vs.  model stability: P ∼ 2 s log(N).
→ Similar results for ||·||_{1,2}, ||·||_*, ||·||_∞.

Slide 55

Compressed Sensing Setting
Random matrix: Φ ∈ R^{P×N}, Φ_{i,j} ∼ N(0, 1) i.i.d.
Sparse vectors: J = ||·||_1. Let s = ||x_0||_0.
Theorem [Wainwright 2009] [Dossal et al. 2011]: If P ≥ 2 s log(N), then η_0 ∈ D̄(x_0) with high probability on Φ.
Phase transitions: L2 stability: P ∼ 2 s log(N/s)  vs.  model stability: P ∼ 2 s log(N).
→ Similar results for ||·||_{1,2}, ||·||_*, ||·||_∞.
→ Not using RIP techniques (non-uniform result on x_0).

Slide 56

1-D Sparse Spikes Deconvolution
Φ x = Σ_i x_i φ(· − Δ i), J(x) = ||x||_1.
Increasing Δ: reduces correlation, reduces resolution.

Slide 57

1-D Sparse Spikes Deconvolution
Φ x = Σ_i x_i φ(· − Δ i), J(x) = ||x||_1.
Increasing Δ: reduces correlation, reduces resolution.
Support recovery: η_0 = η_F ∈ D̄(x_0) ⟺ ||η_{F,I^c}||_∞ < 1, where I = { j ; x_0(j) ≠ 0 }.
[Plot: ||η_{F,I^c}||_∞ as Δ varies]
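A toy numerical check of this criterion (an illustration only; the Gaussian kernel, its width, the grid and the two-spike configuration are arbitrary choices): compute ||η_{F,I^c}||_∞ for two spikes at increasing separation and watch it fall below 1 once the spikes are far enough apart.

```python
import numpy as np

def convolution_dictionary(n_grid, width):
    """Columns phi_i(t) = exp(-(t - t_i)^2 / (2*width^2)) sampled on a fine grid:
    a toy model of the deconvolution operator Phi, with normalized atoms."""
    t = np.linspace(0, 1, 4 * n_grid)             # fine sampling grid
    centers = np.linspace(0, 1, n_grid)           # spike grid
    Phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
    return Phi / np.linalg.norm(Phi, axis=0)

def precert_offsupport_norm(Phi, support, signs):
    """||eta_F||_inf outside the support, with eta_F the Fuchs pre-certificate."""
    Phi_I = Phi[:, support]
    q = Phi_I @ np.linalg.solve(Phi_I.T @ Phi_I, signs)
    eta_F = Phi.T @ q
    off = np.setdiff1d(np.arange(Phi.shape[1]), support)
    return np.max(np.abs(eta_F[off]))

# Two spikes of equal sign at increasing separation: support recovery is certified
# once the separation is large enough (the off-support norm drops below 1).
Phi = convolution_dictionary(n_grid=100, width=0.03)
for sep in [2, 4, 8, 16]:
    val = precert_offsupport_norm(Phi, support=np.array([50, 50 + sep]), signs=np.array([1.0, 1.0]))
    print(f"separation {sep:2d}: ||eta_F||_inf off support = {val:.3f}")
```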

Slide 58

Conclusion
Partial smoothness: encodes models using singularities.

Slide 59

Conclusion
Partial smoothness: encodes models using singularities.
Performance measures (L2 error, model) → different CS guarantees.

Slide 60

Conclusion
Partial smoothness: encodes models using singularities.
Performance measures (L2 error, model) → different CS guarantees.
Specific certificates: η_0, η_F, . . .

Slide 61

Conclusion
Partial smoothness: encodes models using singularities.
Performance measures (L2 error, model) → different CS guarantees.
Specific certificates: η_0, η_F, . . .
Open problems:
– CS performance with arbitrary gauges.
– Infinite dimensional regularizations (BV, . . .).
– Convergence discrete → continuous.