Slide 1

Parameter Selection for Sparse Regularization of Inverse Problems
Gabriel Peyré, www.numerical-tours.com
Joint work with: Samuel Vaiter, Charles {Dossal, Deledalle}, Jalal Fadili

Slide 2

Overview
• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms

Slide 3

Inverse Problems
Recovering $x_0 \in \mathbb{R}^N$ from noisy observations $Y = \Phi x_0 + W \in \mathbb{R}^P$.
Hypothesis: $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$, $\sigma$ is known.

Slide 4

Inverse Problems
Recovering $x_0 \in \mathbb{R}^N$ from noisy observations $Y = \Phi x_0 + W \in \mathbb{R}^P$.
Hypothesis: $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$, $\sigma$ is known.
Examples: inpainting, super-resolution, compressed sensing.
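
To make the setting concrete, here is a minimal numpy sketch of the observation model $Y = \Phi x_0 + W$; the random compressed-sensing matrix, the sparse ground truth, and all variable names are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Simulate y = Phi x0 + w with a random CS matrix and a sparse x0 (assumed setup).
rng = np.random.default_rng(0)
N, P, sigma = 256, 128, 0.05

Phi = rng.standard_normal((P, N)) / np.sqrt(P)      # measurement operator
x0 = np.zeros(N)
idx = rng.choice(N, size=10, replace=False)
x0[idx] = rng.standard_normal(10)                   # sparse ground truth

w = sigma * rng.standard_normal(P)                  # W ~ N(0, sigma^2 Id_P)
y = Phi @ x0 + w                                    # noisy observations
```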

Slide 5

Inverse Problem in Medical Imaging
Computerized tomography: $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$, the projections of $x_0$ along the angles $\theta_k$.

Slide 6

Inverse Problem in Medical Imaging
Computerized tomography: $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$ $\Longleftrightarrow$ (Fourier slice theorem) samples of $\hat{x}_0(\omega)$.

Slide 7

Inverse Problem in Medical Imaging
Computerized tomography: $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$ $\Longleftrightarrow$ (Fourier slice theorem) samples of $\hat{x}_0(\omega)$.
Magnetic resonance imaging (MRI): $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$.

Slide 8

Inverse Problem in Medical Imaging
Computerized tomography: $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$ $\Longleftrightarrow$ (Fourier slice theorem) samples of $\hat{x}_0(\omega)$.
Magnetic resonance imaging (MRI): $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$.
Other examples: MEG, EEG, ...

Slide 9

Estimators
Measurements: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$.
Estimator: $x_\lambda(y)$ depends only on the observations $y$ and the parameter $\lambda$.

Slide 10

Estimators
Measurements: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$.
Estimator: $x_\lambda(y)$ depends only on the observations $y$ and the parameter $\lambda$.
Example: variational methods
$x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)

Slide 11

Estimators
Measurements: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$.
Estimator: $x_\lambda(y)$ depends only on the observations $y$ and the parameter $\lambda$.
Example: variational methods
$x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)
Goal: set up $\lambda$ to minimize $\mathbb{E}_W(\|x_0 - x_\lambda(Y)\|^2)$.
Method: use risk estimators computed from $y$ alone.

Slide 12

Estimators
Measurements: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$.
Estimator: $x_\lambda(y)$ depends only on the observations $y$ and the parameter $\lambda$.
Example: variational methods
$x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)
Goal: set up $\lambda$ to minimize $\mathbb{E}_W(\|x_0 - x_\lambda(Y)\|^2)$.
Method: use risk estimators computed from $y$ alone.
Theoretical approaches (closed-form expressions).
Numerical approaches (recursive differentiation).

Slide 13

Overview
• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms

Slide 14

Risk Minimization
How to choose the value of the parameter $\lambda$?
Average risk: $R(\lambda) = \mathbb{E}_W(\|x_\lambda(Y) - x_0\|^2)$, the expected quality of $x_\lambda(Y)$ with respect to $x_0$.
The optimal (theoretical) parameter $\lambda^\star = \operatorname{argmin}_\lambda R(\lambda)$; plug-in estimator: $x_{\lambda^\star}(Y)$.
The risk is unknown since it depends on $x_0$: can we estimate it solely from the observations?

Slide 15

Risk Minimization
How to choose the value of the parameter $\lambda$?
Average risk: $R(\lambda) = \mathbb{E}_W(\|x_\lambda(Y) - x_0\|^2)$, the expected quality of $x_\lambda(Y)$ with respect to $x_0$.
The optimal (theoretical) parameter $\lambda^\star = \operatorname{argmin}_\lambda R(\lambda)$; plug-in estimator: $x_{\lambda^\star}(Y)$.
The risk is unknown since it depends on $x_0$: can we estimate it solely from the observations?
But: $\mathbb{E}_W$ is not accessible → use one observation.

Slide 16

Risk Minimization
How to choose the value of the parameter $\lambda$?
Average risk: $R(\lambda) = \mathbb{E}_W(\|x_\lambda(Y) - x_0\|^2)$, the expected quality of $x_\lambda(Y)$ with respect to $x_0$.
The optimal (theoretical) parameter $\lambda^\star = \operatorname{argmin}_\lambda R(\lambda)$; plug-in estimator: $x_{\lambda^\star}(Y)$.
The risk is unknown since it depends on $x_0$: can we estimate it solely from the observations?
But: $\mathbb{E}_W$ is not accessible → use one observation.
But: $x_0$ is not accessible → needs risk estimators.

Slide 17

Prediction Risk Estimation
Prediction: $\mu(y) = \Phi x_\lambda(y)$.
Sensitivity analysis: if $\mu$ is weakly differentiable, $\mu(y + \delta) = \mu(y) + \partial\mu(y)\cdot\delta + O(\|\delta\|^2)$.

Slide 18

Prediction Risk Estimation
Prediction: $\mu(y) = \Phi x_\lambda(y)$.
Sensitivity analysis: if $\mu$ is weakly differentiable, $\mu(y + \delta) = \mu(y) + \partial\mu(y)\cdot\delta + O(\|\delta\|^2)$.
Stein Unbiased Risk Estimator:
$\mathrm{SURE}(y) = \|y - \mu(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}(y)$, with $\mathrm{df}(y) = \operatorname{tr}(\partial\mu(y)) = \operatorname{div}(\mu)(y)$.

Slide 19

Prediction Risk Estimation
Prediction: $\mu(y) = \Phi x_\lambda(y)$.
Sensitivity analysis: if $\mu$ is weakly differentiable, $\mu(y + \delta) = \mu(y) + \partial\mu(y)\cdot\delta + O(\|\delta\|^2)$.
Stein Unbiased Risk Estimator:
$\mathrm{SURE}(y) = \|y - \mu(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}(y)$, with $\mathrm{df}(y) = \operatorname{tr}(\partial\mu(y)) = \operatorname{div}(\mu)(y)$.
Theorem [Stein, 1981]: $\mathbb{E}_W(\mathrm{SURE}(Y)) = \mathbb{E}_W(\|\Phi x_0 - \mu(Y)\|^2)$.

Slide 20

Prediction Risk Estimation
Prediction: $\mu(y) = \Phi x_\lambda(y)$.
Sensitivity analysis: if $\mu$ is weakly differentiable, $\mu(y + \delta) = \mu(y) + \partial\mu(y)\cdot\delta + O(\|\delta\|^2)$.
Stein Unbiased Risk Estimator:
$\mathrm{SURE}(y) = \|y - \mu(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}(y)$, with $\mathrm{df}(y) = \operatorname{tr}(\partial\mu(y)) = \operatorname{div}(\mu)(y)$.
Theorem [Stein, 1981]: $\mathbb{E}_W(\mathrm{SURE}(Y)) = \mathbb{E}_W(\|\Phi x_0 - \mu(Y)\|^2)$.
SURE: requires $\sigma$ (not always available); unbiased, with good practical performance.
Other estimators: GCV, BIC, AIC, ...
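
As an illustration of the SURE formula, here is a hedged numpy sketch for the simplest case $\Phi = \mathrm{Id}$ (denoising by soft thresholding), where $\mathrm{df}(y)$ reduces to the number of nonzero coefficients of the estimate; the function names, the test signal, and the grid of $\lambda$ values are made up for the example.

```python
import numpy as np

def soft_thresh(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_soft_thresh(y, lam, sigma):
    mu = soft_thresh(y, lam)               # mu(y) = x_lam(y), since Phi = Id
    df = np.count_nonzero(mu)              # df(y) = tr(d mu(y)) for soft thresholding
    return np.sum((y - mu) ** 2) - sigma**2 * y.size + 2 * sigma**2 * df

rng = np.random.default_rng(0)
sigma = 0.5
x0 = np.concatenate([3.0 * rng.standard_normal(20), np.zeros(236)])
y = x0 + sigma * rng.standard_normal(x0.size)

lams = np.linspace(0.1, 3.0, 30)
sures = [sure_soft_thresh(y, lam, sigma) for lam in lams]
risks = [np.sum((soft_thresh(y, lam) - x0) ** 2) for lam in lams]
lam_star = lams[int(np.argmin(sures))]     # plug-in choice of lambda from y alone
```

In expectation the SURE curve tracks the true-risk curve, so the minimizer of SURE is used as the plug-in $\lambda^\star$.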

Slide 21

Generalized SURE
Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.

Slide 22

Generalized SURE
Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.
Risk on $\ker(\Phi)^\perp$: $\mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$, where $\Pi = \operatorname{Proj}_{\ker(\Phi)^\perp} = \Phi^*(\Phi\Phi^*)^+\Phi$.

Slide 23

Generalized SURE
Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.
Risk on $\ker(\Phi)^\perp$: $\mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$, where $\Pi = \operatorname{Proj}_{\ker(\Phi)^\perp} = \Phi^*(\Phi\Phi^*)^+\Phi$.
ML estimator: $\hat{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$.   Generalized df: $\mathrm{gdf}(y) = \operatorname{tr}\big((\Phi\Phi^*)^+\,\partial\mu(y)\big)$.
$\mathrm{GSURE}(y) = \|\hat{x}(y) - x_\lambda(y)\|^2 - \sigma^2\operatorname{tr}\big((\Phi\Phi^*)^+\big) + 2\sigma^2\,\mathrm{gdf}(y)$

Slide 24

Generalized SURE
Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.
Risk on $\ker(\Phi)^\perp$: $\mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$, where $\Pi = \operatorname{Proj}_{\ker(\Phi)^\perp} = \Phi^*(\Phi\Phi^*)^+\Phi$.
ML estimator: $\hat{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$.   Generalized df: $\mathrm{gdf}(y) = \operatorname{tr}\big((\Phi\Phi^*)^+\,\partial\mu(y)\big)$.
$\mathrm{GSURE}(y) = \|\hat{x}(y) - x_\lambda(y)\|^2 - \sigma^2\operatorname{tr}\big((\Phi\Phi^*)^+\big) + 2\sigma^2\,\mathrm{gdf}(y)$
Theorem [Eldar '09, Pesquet et al. '09, Vonesch et al. '08]: $\mathbb{E}_W(\mathrm{GSURE}(Y)) = \mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$.
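
A small sketch of the GSURE formula as written above, assuming $\mathrm{gdf}(y)$ has already been estimated (for instance by the Monte Carlo trick described later); the dense pseudo-inverse is only for illustration on small problems, and the function name is made up.

```python
import numpy as np

def gsure(y, x_lam, Phi, gdf, sigma):
    """GSURE(y) following the formula on the slide; gdf is supplied by the caller."""
    gram_pinv = np.linalg.pinv(Phi @ Phi.T)          # (Phi Phi^*)^+
    x_ml = Phi.T @ (gram_pinv @ y)                   # ML estimator Phi^*(Phi Phi^*)^+ y
    return (np.sum((x_ml - x_lam) ** 2)
            - sigma**2 * np.trace(gram_pinv)
            + 2 * sigma**2 * gdf)
```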

Slide 25

Overview
• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms

Slide 26

Sparse Regularizations
Synthesis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - (\Phi\Psi) x\|_2^2 + \lambda\|x\|_1$, with $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).

Slide 27

Sparse Regularizations
Synthesis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - (\Phi\Psi) x\|_2^2 + \lambda\|x\|_1$, with $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).
Analysis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$, with $J(x) = \|D^* x\|_1 = \sum_i |\langle x, d_i\rangle|$ (image $x$, gradient $D^* x$).

Slide 28

Sparse Regularizations
Synthesis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - (\Phi\Psi) x\|_2^2 + \lambda\|x\|_1$, with $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).
Analysis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$, with $J(x) = \|D^* x\|_1 = \sum_i |\langle x, d_i\rangle|$ (image $x$, gradient $D^* x$).
→ Unless $D = \Psi$ is orthogonal, the two formulations produce different results.

Slide 29

Sparse Regularizations
Synthesis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - (\Phi\Psi) x\|_2^2 + \lambda\|x\|_1$, with $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).
Analysis regularization: $\min_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$, with $J(x) = \|D^* x\|_1 = \sum_i |\langle x, d_i\rangle|$ (image $x$, gradient $D^* x$).
→ Unless $D = \Psi$ is orthogonal, the two formulations produce different results.
→ Analysis is more general: synthesis is recovered by using $\Phi\Psi$ as the operator.

Slide 30

Sparse Analysis Local Variations
Estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$
Remark: $x_\lambda(y)$ is not always unique, but $\mu(y) = \Phi x_\lambda(y)$ is always unique.

Slide 31

Sparse Analysis Local Variations
Estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$
Remark: $x_\lambda(y)$ is not always unique, but $\mu(y) = \Phi x_\lambda(y)$ is always unique.
Theorem: $\mu$ is Lipschitz (hence weakly differentiable).

Slide 32

Sparse Analysis Local Variations
Estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$
Remark: $x_\lambda(y)$ is not always unique, but $\mu(y) = \Phi x_\lambda(y)$ is always unique.
Theorem: $\mu$ is Lipschitz (hence weakly differentiable).
Questions:
→ Find a closed-form expression for $\partial\mu(y)$.
→ Prove that this formula is valid almost everywhere.

Slide 33

Heuristic Derivation
Variational estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)
If $J$ is smooth: $\Phi^*(\Phi x_\lambda(y) - y) + \lambda\nabla J(x_\lambda(y)) = 0$.

Slide 34

Heuristic Derivation
Variational estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)
If $J$ is smooth: $\Phi^*(\Phi x_\lambda(y) - y) + \lambda\nabla J(x_\lambda(y)) = 0$.
Implicit function theorem: $\partial\mu(y) = \Phi\,\Gamma(y)^{-1}\Phi^*$, where $\Gamma = \Phi^*\Phi + \lambda\,\partial^2 J$.
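
For a smooth regularizer this formula can be checked directly; a minimal numpy sketch, assuming the ridge penalty $J(x) = \tfrac12\|x\|^2$ (so that $\partial^2 J = \mathrm{Id}$), computes the resulting degrees of freedom. The function name is illustrative.

```python
import numpy as np

def ridge_df(Phi, lam):
    """df(y) = tr(Phi Gamma^{-1} Phi^*) with Gamma = Phi^*Phi + lam*Id (ridge case)."""
    N = Phi.shape[1]
    Gamma = Phi.T @ Phi + lam * np.eye(N)        # Gamma = Phi^*Phi + lam * d^2 J
    return np.trace(Phi @ np.linalg.solve(Gamma, Phi.T))
```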

Slide 35

Heuristic Derivation
Variational estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)
If $J$ is smooth: $\Phi^*(\Phi x_\lambda(y) - y) + \lambda\nabla J(x_\lambda(y)) = 0$.
Implicit function theorem: $\partial\mu(y) = \Phi\,\Gamma(y)^{-1}\Phi^*$, where $\Gamma = \Phi^*\Phi + \lambda\,\partial^2 J$.
Difficulties:
→ What if $J$ is not differentiable?
→ What if $x_\lambda(y)$ is not uniquely defined?
→ What if $\Gamma(y)$ is non-invertible?

Slide 36

Heuristic Derivation
Variational estimator: $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|y - \Phi x\|^2 + \lambda J(x)$   (data fidelity + regularity)
If $J$ is smooth: $\Phi^*(\Phi x_\lambda(y) - y) + \lambda\nabla J(x_\lambda(y)) = 0$.
Implicit function theorem: $\partial\mu(y) = \Phi\,\Gamma(y)^{-1}\Phi^*$, where $\Gamma = \Phi^*\Phi + \lambda\,\partial^2 J$.
Difficulties:
→ What if $J$ is not differentiable?
→ What if $x_\lambda(y)$ is not uniquely defined?
→ What if $\Gamma(y)$ is non-invertible?
⟹ Answers in the case of analysis sparsity.

Slide 37

Polytope of TV in 1-D
$x^\star(y) \in \operatorname{argmin}_{\Phi x = y} \|D^* x\|_1$, with the TV-1D ball $B = \{x : \|D^* x\|_1 \le 1\}$ and
$D^* = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix}$
displayed in the hyperplane $\{x : \langle x, \mathbf{1}\rangle = 0\} \simeq \mathbb{R}^3$. (3-D rendering © Maël.)

Slide 38

Polytope of TV in 1-D
(Figure: the ball $B = \{x : \|D^* x\|_1 \le 1\}$ and the affine set $\{x : y = \Phi x\}$, with the solution $x^\star$.)

Slide 39

Polytope of TV in 1-D
(Figure: the ball $B = \{x : \|D^* x\|_1 \le 1\}$ and the affine set $\{x : y = \Phi x\}$, with the solution $x^\star$.)

Slide 40

Polytope of TV in 1-D
(Figure: the ball $B = \{x : \|D^* x\|_1 \le 1\}$ and the affine set $\{x : y = \Phi x\}$, with the solution $x^\star$.)

Slide 41

Union of Subspaces Model
$x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Support of the solution: $I = \{i : (D^* x_\lambda(y))_i \ne 0\}$, and $J = I^c$.

Slide 42

Union of Subspaces Model
$x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Support of the solution: $I = \{i : (D^* x_\lambda(y))_i \ne 0\}$, and $J = I^c$.
1-D total variation: $D^* x = (x_i - x_{i-1})_i$; $I$ = positions of the discontinuities of $x$.
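
A tiny sketch of this 1-D finite-difference analysis operator and of the support $I$, assuming a cyclic boundary condition as on the earlier polytope slide; the function name and the test signal are illustrative.

```python
import numpy as np

def finite_diff(n):
    """Analysis operator D^* with (D^* x)_i = x_i - x_{i-1}, cyclic wrap-around."""
    Dt = np.eye(n) - np.eye(n, k=-1)
    Dt[0, -1] = -1.0                      # cyclic boundary (assumption)
    return Dt

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])
Dt = finite_diff(x.size)
tv = np.sum(np.abs(Dt @ x))               # 1-D total variation ||D^* x||_1
I = np.flatnonzero(Dt @ x)                # support I = positions of the jumps
```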

Slide 43

Union of Subspaces Model
$x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Support of the solution: $I = \{i : (D^* x_\lambda(y))_i \ne 0\}$, and $J = I^c$.
1-D total variation: $D^* x = (x_i - x_{i-1})_i$; $I$ = positions of the discontinuities of $x$.
Subspace model: $\mathcal{G}_J = \ker(D_J^*) = \operatorname{Im}(D_J)^\perp$. For TV: $\mathcal{G}_J$ = signals with prescribed discontinuities.

Slide 44

Local Sign Stability
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around every $y \notin \mathcal{H}$.
To be understood as: there exists a solution with the same sign.

Slide 45

Local Sign Stability
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around every $y \notin \mathcal{H}$.
To be understood as: there exists a solution with the same sign.
Example: $D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$. (Figure: the transition set; $x^\star = 0$ on $\mathcal{H}_{\emptyset,j}$.)

Slide 46

Local Sign Stability
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around every $y \notin \mathcal{H}$.
To be understood as: there exists a solution with the same sign.
Example: $D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$. (Figure: the transition sets $\mathcal{H}_{\emptyset,j}$ and $\mathcal{H}_{I,j}$; $x^\star = 0$ on $\mathcal{H}_{\emptyset,j}$.)

Slide 47

Local Sign Stability
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around every $y \notin \mathcal{H}$.
To be understood as: there exists a solution with the same sign.
Example: $D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$. (Figure: the transition sets $\mathcal{H}_{\emptyset,j}$ and $\mathcal{H}_{I,j}$; $x^\star = 0$ on $\mathcal{H}_{\emptyset,j}$.)
Linearized problem: with $s_I = \operatorname{sign}(D_I^* x_\lambda(y))$,
$\hat{x}_\lambda(\bar{y}) = \operatorname{argmin}_{x \in \mathcal{G}_J} \tfrac12\|\Phi x - \bar{y}\|^2 + \lambda\langle D_I^* x, s_I\rangle = A^{[J]}(\Phi^*\bar{y} - \lambda D_I s_I)$.

Slide 48

Local Sign Stability
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around every $y \notin \mathcal{H}$.
To be understood as: there exists a solution with the same sign.
Example: $D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$. (Figure: the transition sets $\mathcal{H}_{\emptyset,j}$ and $\mathcal{H}_{I,j}$; $x^\star = 0$ on $\mathcal{H}_{\emptyset,j}$.)
Linearized problem: with $s_I = \operatorname{sign}(D_I^* x_\lambda(y))$,
$\hat{x}_\lambda(\bar{y}) = \operatorname{argmin}_{x \in \mathcal{G}_J} \tfrac12\|\Phi x - \bar{y}\|^2 + \lambda\langle D_I^* x, s_I\rangle = A^{[J]}(\Phi^*\bar{y} - \lambda D_I s_I)$.
Theorem: If $y \notin \mathcal{H}$, then for $\bar{y}$ close to $y$, $\hat{x}_\lambda(\bar{y})$ is a solution of $\mathcal{P}_\lambda(\bar{y})$.

Slide 49

Application to GSURE
For $y \notin \mathcal{H}$, one has locally: $\mu(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.

Slide 50

Application to GSURE
For $y \notin \mathcal{H}$, one has locally: $\mu(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.
Corollary: Let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then $\mathrm{df}(y) = \operatorname{div}(\mu)(y) = \dim(\mathcal{G}_J)$.

Slide 51

Application to GSURE
For $y \notin \mathcal{H}$, one has locally: $\mu(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.
Corollary: Let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then $\mathrm{df}(y) = \operatorname{div}(\mu)(y) = \dim(\mathcal{G}_J)$,
which reduces to $\mathrm{df}(y) = \|x_\lambda(y)\|_0$ for $D = \mathrm{Id}$ (synthesis).

Slide 52

Application to GSURE
For $y \notin \mathcal{H}$, one has locally: $\mu(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.
Corollary: Let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then $\mathrm{df}(y) = \operatorname{div}(\mu)(y) = \dim(\mathcal{G}_J)$,
which reduces to $\mathrm{df}(y) = \|x_\lambda(y)\|_0$ for $D = \mathrm{Id}$ (synthesis), and $\mathrm{gdf}(y) = \operatorname{tr}(\Pi A^{[J]})$.
These are unbiased estimators of df and gdf.

Slide 53

Application to GSURE
For $y \notin \mathcal{H}$, one has locally: $\mu(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.
Corollary: Let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then $\mathrm{df}(y) = \operatorname{div}(\mu)(y) = \dim(\mathcal{G}_J)$,
which reduces to $\mathrm{df}(y) = \|x_\lambda(y)\|_0$ for $D = \mathrm{Id}$ (synthesis), and $\mathrm{gdf}(y) = \operatorname{tr}(\Pi A^{[J]})$.
These are unbiased estimators of df and gdf.
In practice, trick [Ramani et al. '08]: $\operatorname{tr}(A) = \mathbb{E}_z(\langle Az, z\rangle)$, $z \sim \mathcal{N}(0, \mathrm{Id}_P)$.
Proposition: $\mathrm{gdf}(y) = \mathbb{E}_z(\langle \nu(z), \Phi^+ z\rangle)$, where $\nu(z)$ solves
$\begin{pmatrix} \Phi^*\Phi & D_J \\ D_J^* & 0 \end{pmatrix}\begin{pmatrix} \nu(z) \\ \tilde{\nu} \end{pmatrix} = \begin{pmatrix} \Phi^* z \\ 0 \end{pmatrix}$,
so that $\mathrm{gdf}(y) \approx \tfrac{1}{K}\sum_{k=1}^K \langle \nu(z_k), \Phi^+ z_k\rangle$, $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
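
A hedged sketch of this Monte Carlo estimator, building the block linear system explicitly with dense matrices (fine for small problems only); `D_J` stands for the columns of $D$ indexed by the co-support $J$, and the function and variable names are illustrative.

```python
import numpy as np

def gdf_monte_carlo(Phi, D_J, K=10, rng=None):
    """Estimate gdf(y) = (1/K) sum_k <nu(z_k), Phi^+ z_k> via the block system."""
    rng = rng or np.random.default_rng(0)
    P, N = Phi.shape
    m = D_J.shape[1]
    # [ Phi^*Phi  D_J ] [ nu     ]   [ Phi^* z ]
    # [ D_J^*      0  ] [ nu_til ] = [    0    ]
    A = np.block([[Phi.T @ Phi, D_J],
                  [D_J.T, np.zeros((m, m))]])
    Phi_pinv = np.linalg.pinv(Phi)
    acc = 0.0
    for _ in range(K):
        z = rng.standard_normal(P)
        rhs = np.concatenate([Phi.T @ z, np.zeros(m)])
        nu = np.linalg.lstsq(A, rhs, rcond=None)[0][:N]
        acc += nu @ (Phi_pinv @ z)
    return acc / K
```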

Slide 54

Group Lasso Variations
$\ell_1$-$\ell_2$ regularization: $J(x) = \sum_{b \in \mathcal{B}} \|x_b\|$, $x_b = (x_i)_{i \in b}$, for a partition into blocks $\{1,\dots,N\} = \bigcup_{b \in \mathcal{B}} b$.
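
For reference, the proximal operator associated with this $\ell_1$-$\ell_2$ norm is block soft thresholding; it is not stated on this slide, so the following numpy sketch is only an illustrative assumption, with made-up names.

```python
import numpy as np

def prox_group_l1l2(x, lam, blocks):
    """prox of lam * sum_b ||x_b||: block soft thresholding."""
    out = x.copy()
    for b in blocks:
        nrm = np.linalg.norm(x[b])
        out[b] = 0.0 if nrm <= lam else (1.0 - lam / nrm) * x[b]
    return out

blocks = [np.arange(0, 2), np.arange(2, 3)]   # example: J(x) = (x1^2 + x2^2)^(1/2) + |x3|
x = np.array([0.3, 0.4, -2.0])
print(prox_group_l1l2(x, lam=1.0, blocks=blocks))   # first block is killed, third shrunk
```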

Slide 55

Group Lasso Variations
$\ell_1$-$\ell_2$ regularization: $J(x) = \sum_{b \in \mathcal{B}} \|x_b\|$, $x_b = (x_i)_{i \in b}$, for a partition into blocks $\{1,\dots,N\} = \bigcup_{b \in \mathcal{B}} b$.
Example: $J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3|$.
Transition set $\mathcal{H}$: semi-algebraic, with zero measure.
(Figure: $\|x^\star\|_0$ over the observation plane, for $\|\Phi_1\| = \|\Phi_2\| = 1$, $\|\Phi_3\| < \dots$)

Slide 56

Group Lasso Variations
$\ell_1$-$\ell_2$ regularization: $J(x) = \sum_{b \in \mathcal{B}} \|x_b\|$, $x_b = (x_i)_{i \in b}$, for a partition into blocks $\{1,\dots,N\} = \bigcup_{b \in \mathcal{B}} b$.
Example: $J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3|$.
Transition set $\mathcal{H}$: semi-algebraic, with zero measure.
(Figure: $\|x^\star\|_0$ over the observation plane, for $\|\Phi_1\| = \|\Phi_2\| = 1$, $\|\Phi_3\| < \dots$)
Theorem: $\forall\, y \notin \mathcal{H}$, $\partial\mu_I(y) = \Phi_I\,\Gamma_I(x_\lambda(y))^{-1}\Phi_I^*$, where
$\Gamma_I(x) = \Phi_I^*\Phi_I + \lambda\,\delta_x P_x$, with $\delta_x = \operatorname{diag}(\mathrm{Id}_b/\|x_b\|)_{b \in I}$ and $P_x = \mathrm{Id}_I - \operatorname{diag}(x_b x_b^*/\|x_b\|^2)_{b \in I}$.

Slide 57

CT Scan (Tomography)
$\Phi$: sub-sampled Radon transform, #projections = 16. Finite-difference gradient: $D = [\partial_1, \partial_2]$.
(Figure: $x_0$, $\Phi^+ y$, $x_{\lambda^\star}(y)$; quadratic cost as a function of the regularization parameter $\lambda$, comparing the risk, the projection risk, GSURE (MC), and exhaustive search over iterates / final.)

Slide 58

MRI (Fourier Sampling)
$\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$.
(Figure: $\Omega$, $x_0$, $\Phi^+ y$, $x_{\lambda^\star}(y)$; quadratic cost as a function of $\lambda$, comparing the risk, the projection risk, GSURE (MC), and exhaustive search over iterates / final.)

Slide 59

Overview
• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms

Slide 60

From Theory to Practice
Limitations of the theoretical approach:
→ Closed form of $\partial\mu(y)$ not available for arbitrary $J$.
→ Unstable: requires computing $x_\lambda(y)$ with high accuracy.
→ What about non-variational methods?

Slide 61

From Theory to Practice
Limitations of the theoretical approach:
→ Closed form of $\partial\mu(y)$ not available for arbitrary $J$.
→ Unstable: requires computing $x_\lambda(y)$ with high accuracy.
→ What about non-variational methods?
Example: synthesis $\ell_1$ sparsity, $\mathrm{df}(y) = \|x_\lambda(y)\|_0$: one can have $\|x - x_\lambda(y)\|$ small but $|\mathrm{df}(y) - \|x\|_0|$ large,
→ the support is not correctly estimated by iterated thresholdings.

Slide 62

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.

Slide 63

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Recursive differentiation: $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.

Slide 64

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Recursive differentiation: $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
Algorithm:
– For $k = 1,\dots,K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.

Slide 65

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Recursive differentiation: $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
Algorithm:
– For $k = 1,\dots,K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
– For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)[\eta_k^{(\ell)}] + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.

Slide 66

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Recursive differentiation: $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
Algorithm:
– For $k = 1,\dots,K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
– For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)[\eta_k^{(\ell)}] + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.
– Compute $\mathrm{gdf}^{(\ell_0)}(y) = \tfrac{1}{K}\sum_{k=1}^K \langle \eta_k^{(\ell_0)}, \Phi^+ z_k\rangle$.

Slide 67

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Recursive differentiation: $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
Algorithm:
– For $k = 1,\dots,K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
– For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)[\eta_k^{(\ell)}] + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.
– Compute $\mathrm{gdf}^{(\ell_0)}(y) = \tfrac{1}{K}\sum_{k=1}^K \langle \eta_k^{(\ell_0)}, \Phi^+ z_k\rangle$.
→ $\mathrm{gdf}^{(\ell_0)}$ approximates the gdf of $y \mapsto x^{(\ell_0)}(y)$.
→ unbiased estimate of the risk at the current iteration $\ell_0$.

Slide 68

Iterative Estimators
Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Recursive differentiation: $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
Algorithm:
– For $k = 1,\dots,K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
– For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)[\eta_k^{(\ell)}] + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.
– Compute $\mathrm{gdf}^{(\ell_0)}(y) = \tfrac{1}{K}\sum_{k=1}^K \langle \eta_k^{(\ell_0)}, \Phi^+ z_k\rangle$.
→ $\mathrm{gdf}^{(\ell_0)}$ approximates the gdf of $y \mapsto x^{(\ell_0)}(y)$.
→ unbiased estimate of the risk at the current iteration $\ell_0$.
Similar approaches: [Ramani, Blu, Unser '08], [Giryes, Elad, Eldar '11].
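
A compact sketch of this algorithm for a generic iteration $\varphi$; the callables `phi`, `d1_phi`, `d2_phi` (applying $\partial_1\varphi$ and $\partial_2\varphi$ to a direction) and `Phi_pinv` are assumed to be supplied by the user, and all names are illustrative.

```python
import numpy as np

def iterative_gdf(y, x_init, phi, d1_phi, d2_phi, Phi_pinv, n_iter, K=4, rng=None):
    """Iterate x <- phi(x, y) while propagating K random directional derivatives
    eta_k, then return (x^(l0), Monte Carlo estimate of gdf^(l0))."""
    rng = rng or np.random.default_rng(0)
    zs = [rng.standard_normal(y.size) for _ in range(K)]
    x = x_init.copy()
    etas = [np.zeros_like(x) for _ in range(K)]
    for _ in range(n_iter):
        # eta_k^{(l+1)} = d1 phi(x^{(l)}, y)[eta_k^{(l)}] + d2 phi(x^{(l)}, y)[z_k]
        etas = [d1_phi(x, y, eta) + d2_phi(x, y, z) for eta, z in zip(etas, zs)]
        x = phi(x, y)                          # x^{(l+1)} = phi(x^{(l)}, y)
    gdf = np.mean([eta @ (Phi_pinv @ z) for eta, z in zip(etas, zs)])
    return x, gdf
```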

Slide 69

Example: Forward-Backward Splitting
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda J(x)$
Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac12\|x - z\|^2 + \lambda J(z)$.

Slide 70

Example: Forward-Backward Splitting
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda J(x)$
Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac12\|x - z\|^2 + \lambda J(z)$.
Example (soft thresholding): $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\big(0,\, 1 - \tfrac{\lambda}{|x_i|}\big)\,x_i$.

Slide 71

Example: Forward-Backward Splitting
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda J(x)$
Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac12\|x - z\|^2 + \lambda J(z)$.
Example (soft thresholding): $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\big(0,\, 1 - \tfrac{\lambda}{|x_i|}\big)\,x_i$.
Forward-backward: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y)$ with $\varphi(x, y) = \operatorname{prox}_{\tau\lambda J}(u)$, $u = x - \tau\Phi^*(\Phi x - y)$.
If $\tau < 2/\|\Phi\|^2$, then $x^{(\ell)}(y) \to x_\lambda(y)$ as $\ell \to +\infty$.

Slide 72

Example: Forward-Backward Splitting
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda J(x)$
Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac12\|x - z\|^2 + \lambda J(z)$.
Example (soft thresholding): $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\big(0,\, 1 - \tfrac{\lambda}{|x_i|}\big)\,x_i$.
Forward-backward: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y)$ with $\varphi(x, y) = \operatorname{prox}_{\tau\lambda J}(u)$, $u = x - \tau\Phi^*(\Phi x - y)$.
If $\tau < 2/\|\Phi\|^2$, then $x^{(\ell)}(y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Derivatives: $\partial_1\varphi(x, y) = \partial\operatorname{prox}_{\tau\lambda J}(u)\circ(\mathrm{Id} - \tau\Phi^*\Phi)$ and $\partial_2\varphi(x, y) = \tau\,\partial\operatorname{prox}_{\tau\lambda J}(u)\circ\Phi^*$.

Slide 73

Example: Forward-Backward Splitting
$x_\lambda(y) \in \operatorname{argmin}_{x} \tfrac12\|\Phi x - y\|^2 + \lambda J(x)$
Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac12\|x - z\|^2 + \lambda J(z)$.
Example (soft thresholding): $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\big(0,\, 1 - \tfrac{\lambda}{|x_i|}\big)\,x_i$.
Forward-backward: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y)$ with $\varphi(x, y) = \operatorname{prox}_{\tau\lambda J}(u)$, $u = x - \tau\Phi^*(\Phi x - y)$.
If $\tau < 2/\|\Phi\|^2$, then $x^{(\ell)}(y) \to x_\lambda(y)$ as $\ell \to +\infty$.
Derivatives: $\partial_1\varphi(x, y) = \partial\operatorname{prox}_{\tau\lambda J}(u)\circ(\mathrm{Id} - \tau\Phi^*\Phi)$ and $\partial_2\varphi(x, y) = \tau\,\partial\operatorname{prox}_{\tau\lambda J}(u)\circ\Phi^*$.
→ Extension to other splitting schemes (DR, GFB, primal-dual, ...).
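
A hedged sketch of these forward-backward ingredients for the synthesis $\ell_1$ case, where $\partial\operatorname{prox}$ is a diagonal mask on the non-thresholded entries; it plugs into the generic skeleton sketched earlier, and all names (`make_fb`, `step`, ...) are illustrative.

```python
import numpy as np

def make_fb(Phi, lam, tau):
    """Forward-backward map phi(x, y) and its derivative maps for J = ||.||_1."""
    def prox(u):
        return np.sign(u) * np.maximum(np.abs(u) - tau * lam, 0.0)

    def dprox(u, delta):
        # derivative of soft thresholding: keep only the non-thresholded entries
        return (np.abs(u) > tau * lam) * delta

    def step(x, y):
        return x - tau * Phi.T @ (Phi @ x - y)       # u = x - tau Phi^*(Phi x - y)

    def phi(x, y):
        return prox(step(x, y))

    def d1_phi(x, y, delta):                         # d prox(u) o (Id - tau Phi^*Phi)
        return dprox(step(x, y), delta - tau * Phi.T @ (Phi @ delta))

    def d2_phi(x, y, z):                             # tau * d prox(u) o Phi^*
        return dprox(step(x, y), tau * Phi.T @ z)

    return phi, d1_phi, d2_phi

# Illustrative use with the skeleton above (Phi, y from the earlier sketches):
# tau = 1.0 / np.linalg.norm(Phi, 2) ** 2
# phi, d1, d2 = make_fb(Phi, lam=0.1, tau=tau)
# x, gdf = iterative_gdf(y, np.zeros(Phi.shape[1]), phi, d1, d2,
#                        np.linalg.pinv(Phi), n_iter=200)
```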

Slide 74

Example: Isotropic Total Variation
$J(x) = \|\nabla x\|_{1,2}$; $\Phi$: deconvolution and sub-sampling.
→ use either GFB (auxiliary variables) or a primal-dual scheme.
(Figures: $y$, $x_{\lambda^\star}(y)$ at the optimal $\lambda$, the least-squares solution $x_{LS}(y)$, and $x_{B^\star}(y)$ at the optimal block size $B$; quadratic loss versus $\lambda$ and versus the block size $B$, comparing projected risk, GSURE, and true risk, for $G(x) = \|\nabla x\|$ and $G(x) = \|Bx\|$; sub-sampling and random CS matrices with $P/N = 0.5$.)

Slide 75

Example: Matrix Completion
Hypothesis: $x_0 \in \mathbb{R}^{n\times p}$ has low rank.
SVD: $x = V\operatorname{diag}(\rho)\,U^*$

Slide 76

Example: Matrix Completion
Hypothesis: $x_0 \in \mathbb{R}^{n\times p}$ has low rank.
SVD: $x = V\operatorname{diag}(\rho)\,U^*$
Nuclear norm: $\|x\|_* = \sum_i |\rho_i|$, with $\operatorname{prox}_{\lambda\|\cdot\|_*}(x) = V\operatorname{diag}\big(\operatorname{prox}_{\lambda\|\cdot\|_1}(\rho)\big)\,U^*$.

Slide 77

Example: Matrix Completion
Hypothesis: $x_0 \in \mathbb{R}^{n\times p}$ has low rank.
SVD: $x = V\operatorname{diag}(\rho)\,U^*$
Nuclear norm: $\|x\|_* = \sum_i |\rho_i|$, with $\operatorname{prox}_{\lambda\|\cdot\|_*}(x) = V\operatorname{diag}\big(\operatorname{prox}_{\lambda\|\cdot\|_1}(\rho)\big)\,U^*$.
Computing $\partial\operatorname{prox}_{\lambda\|\cdot\|_*}$: see [Deledalle et al., ICML 2012].
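
A minimal sketch of this nuclear-norm proximal operator by soft thresholding the singular values; numpy's `svd` returns the factors of $U\operatorname{diag}(\rho)V^*$, which matches the slide's $V\operatorname{diag}(\rho)U^*$ up to naming, and the function name is illustrative.

```python
import numpy as np

def prox_nuclear(x, lam):
    """prox of lam * ||.||_*: soft-threshold the singular values of x."""
    U, rho, Vh = np.linalg.svd(x, full_matrices=False)
    rho_thr = np.maximum(rho - lam, 0.0)       # prox of lam * ||.||_1 on the spectrum
    return (U * rho_thr) @ Vh                  # rebuild the matrix with shrunk spectrum
```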

Slide 78

Example: Matrix Completion
Hypothesis: $x_0 \in \mathbb{R}^{n\times p}$ has low rank.
SVD: $x = V\operatorname{diag}(\rho)\,U^*$
Nuclear norm: $\|x\|_* = \sum_i |\rho_i|$, with $\operatorname{prox}_{\lambda\|\cdot\|_*}(x) = V\operatorname{diag}\big(\operatorname{prox}_{\lambda\|\cdot\|_1}(\rho)\big)\,U^*$.
Computing $\partial\operatorname{prox}_{\lambda\|\cdot\|_*}$: see [Deledalle et al., ICML 2012].
Inpainting: $\Phi = \operatorname{diag}(\varepsilon_i)_i$, $\varepsilon_i \in \{0, 1\}$.
(Figures: $X_0 \in \mathbb{R}^{1000\times 100}$, $y$ masking 25% of the entries, $X_{\lambda^\star}(y)$ at the optimal $\lambda$; singular values of the true spectrum, the optimal estimate, and the least-squares solution; quadratic cost versus $\lambda$: predicted risk, SURE, true risk, optimal $\lambda$.)

Slide 79

Conclusion
Computing risk estimates $\Longleftrightarrow$ sensitivity of the estimator.

Slide 80

Conclusion
Computing risk estimates $\Longleftrightarrow$ sensitivity of the estimator.
Closed-form expressions: quadratic, $\ell_1$ regularization, ...

Slide 81

Conclusion
Computing risk estimates $\Longleftrightarrow$ sensitivity of the estimator.
Closed-form expressions: quadratic, $\ell_1$ regularization, ...
Open problems:
– Estimate the risk in $\ker(\Phi)$.
– Fast algorithms to optimize a high-dimensional $\lambda$.

Slide 82

Conclusion
Computing risk estimates $\Longleftrightarrow$ sensitivity of the estimator.
Closed-form expressions: quadratic, $\ell_1$ regularization, ...
Open problems:
– Estimate the risk in $\ker(\Phi)$.
– Fast algorithms to optimize a high-dimensional $\lambda$.
SURE?