Parameter Selection for Sparse Regularization of Inverse Problems


Talk at Cambridge


Gabriel Peyré

March 07, 2013

Transcript

  1. Gabriel Peyré, www.numerical-tours.com. Parameter Selection for Sparse Regularization of
    Inverse Problems. Joint work with: Samuel Vaiter, Charles Dossal, Charles Deledalle, Jalal Fadili.
  2. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  3–4. Inverse Problems. Recovering $x_0 \in \mathbb{R}^N$ from noisy observations
    $Y = \Phi x_0 + W \in \mathbb{R}^P$. Hypothesis: $W \sim \mathcal{N}(0, \sigma^2\,\mathrm{Id}_P)$, with $\sigma$ known.
    Examples: inpainting, super-resolution, compressed sensing.
  5–8. Inverse Problem in Medical Imaging. Computed tomography: $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$,
    a set of projections of $x_0$; by the Fourier slice theorem, each projection corresponds to a radial
    slice of the Fourier transform $\hat{x}_0(\omega)$. Magnetic resonance imaging (MRI):
    $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$, samples of the Fourier transform on a set $\Omega$.
    Other examples: MEG, EEG, ...
  9–12. Estimators. Measures: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2\,\mathrm{Id}_P)$.
    Estimator: $x_\lambda(y)$ depends only on the observations $y$ and the parameter $\lambda$.
    Example: variational methods,
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda J(x)$
    (data fidelity + regularity).
    Goal: set up $\lambda$ to minimize $\mathbb{E}_W(\|x_0 - x_\lambda(Y)\|^2)$.
    Method: use risk estimators computed from $y$ alone, either through theoretical approaches
    (closed-form expressions) or numerical approaches (recursive differentiation). A sketch of this
    selection strategy follows below.
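
A minimal sketch (not from the slides) of this selection-by-estimated-risk strategy, assuming a generic, hypothetical `risk_estimate(y, lam)` callable that returns an estimate of the risk computed from y alone:

```python
import numpy as np

def select_lambda(y, risk_estimate, lambdas):
    """Pick the regularization parameter minimizing an estimated risk.

    `risk_estimate(y, lam)` is any estimator of E_W ||x0 - x_lam(Y)||^2
    (or a surrogate such as SURE/GSURE) computed from y alone.
    """
    risks = np.array([risk_estimate(y, lam) for lam in lambdas])
    return lambdas[np.argmin(risks)], risks

# Usage with a dummy risk surrogate (illustration only):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=64)
    dummy = lambda y, lam: (lam - 0.3) ** 2 + 0.01 * np.linalg.norm(y) / y.size
    lam_star, _ = select_lambda(y, dummy, np.linspace(0.01, 1.0, 50))
    print(lam_star)
```
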
  13. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  14–16. Risk Minimization. How to choose the value of the parameter $\lambda$?
    Average risk: $R(\lambda) = \mathbb{E}_W(\|x_\lambda(Y) - x_0\|^2)$, a measure of the expected quality
    of $x_\lambda(y)$ with respect to $x_0$. The (theoretical) optimal parameter
    $\lambda^\star = \operatorname{argmin}_\lambda R(\lambda)$ gives the plug-in estimator $x_{\lambda^\star}(Y)$.
    The risk is unknown since it depends on $x_0$: can we estimate it solely from $x_\lambda(y)$?
    But $\mathbb{E}_W$ is not accessible: use one observation. And $x_0$ is not accessible: this
    requires risk estimators.
  17–20. Prediction Risk Estimation. Prediction: $\mu_\lambda(y) = \Phi x_\lambda(y)$.
    Sensitivity analysis: if $\mu_\lambda$ is weakly differentiable,
    $\mu_\lambda(y + \delta) = \mu_\lambda(y) + \partial\mu_\lambda(y)\cdot\delta + O(\|\delta\|^2)$.
    Stein Unbiased Risk Estimator:
    $\mathrm{SURE}_\lambda(y) = \|y - \mu_\lambda(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}_\lambda(y)$,
    where $\mathrm{df}_\lambda(y) = \operatorname{tr}(\partial\mu_\lambda(y)) = \operatorname{div}(\mu_\lambda)(y)$.
    Theorem [Stein, 1981]: $\mathbb{E}_W(\mathrm{SURE}_\lambda(Y)) = \mathbb{E}_W(\|\Phi x_0 - \mu_\lambda(Y)\|^2)$.
    SURE requires $\sigma$ (not always available) but is unbiased with good practical performance.
    Other estimators: GCV, BIC, AIC, ... A sketch in the simplest case follows below.
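
As a concrete illustration (not in the slides), a sketch of SURE in the simplest setting Φ = Id with soft-thresholding µ_λ(y), for which df_λ(y) = #{i : |y_i| > λ}; the noise level `sigma` is assumed known:

```python
import numpy as np

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_soft_threshold(y, lam, sigma):
    """SURE(y) = ||y - mu(y)||^2 - sigma^2 P + 2 sigma^2 df(y),
    with df(y) = #{i : |y_i| > lam} for soft thresholding (Phi = Id)."""
    mu = soft_threshold(y, lam)
    df = np.count_nonzero(np.abs(y) > lam)
    return np.sum((y - mu) ** 2) - sigma**2 * y.size + 2 * sigma**2 * df

# Sanity check: SURE averaged over noise draws tracks the true prediction risk.
rng = np.random.default_rng(0)
x0 = np.concatenate([np.zeros(90), 3 * np.ones(10)])
sigma, lam = 1.0, 2.0
sure_vals, risks = [], []
for _ in range(200):
    y = x0 + sigma * rng.normal(size=x0.size)
    sure_vals.append(sure_soft_threshold(y, lam, sigma))
    risks.append(np.sum((soft_threshold(y, lam) - x0) ** 2))
print(np.mean(sure_vals), np.mean(risks))  # the two averages should be close
```
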
  21–24. Generalized SURE. Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of
    $\|x_0 - x_\lambda(y)\|$. Risk on $\ker(\Phi)^\perp$: $\mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$, where
    $\Pi = \mathrm{Proj}_{\ker(\Phi)^\perp} = \Phi^*(\Phi\Phi^*)^+\Phi$.
    ML estimator: $\hat{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$. Generalized df:
    $\mathrm{gdf}_\lambda(y) = \operatorname{tr}\big((\Phi\Phi^*)^+\,\partial\mu_\lambda(y)\big)$.
    $\mathrm{GSURE}_\lambda(y) = \|\hat{x}(y) - x_\lambda(y)\|^2 - \sigma^2\operatorname{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{gdf}_\lambda(y)$.
    Theorem: $\mathbb{E}_W(\mathrm{GSURE}_\lambda(Y)) = \mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$
    [Eldar 09, Pesquet et al. 09, Vonesh et al. 08]. A sketch of the formula follows below.
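
A sketch of the GSURE formula above for a small dense matrix `Phi`, taking the gdf term as an input (hypothetical `gdf_estimate` argument); the explicit pseudo-inverse is only practical for toy sizes:

```python
import numpy as np

def gsure(y, x_lam, Phi, sigma, gdf_estimate):
    """GSURE(y) = ||x_ML(y) - x_lam(y)||^2 - sigma^2 tr((Phi Phi*)^+) + 2 sigma^2 gdf,
    where x_ML(y) = Phi* (Phi Phi*)^+ y.  Dense pinv: small problems only."""
    PhiPhiT_pinv = np.linalg.pinv(Phi @ Phi.T)
    x_ml = Phi.T @ (PhiPhiT_pinv @ y)
    return (np.sum((x_ml - x_lam) ** 2)
            - sigma**2 * np.trace(PhiPhiT_pinv)
            + 2 * sigma**2 * gdf_estimate)
```
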
  25. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  26–29. Sparse Regularizations. Synthesis regularization:
    $\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi\Psi x\|_2^2 + \lambda\|x\|_1$, with
    $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).
    Analysis regularization:
    $\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$, with
    $J(x) = \|D^* x\|_1 = \sum_i |\langle x, d_i\rangle|$ (image $x$, gradient $D^* x$).
    → Unless $D = \Psi$ is orthogonal, the two produce different results.
    → Analysis is more general when using $\Phi\Psi$ as the operator.
    A sketch of the two objectives follows below.
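
A small sketch making the two objectives explicit, with a synthesis dictionary `Psi` and an analysis operator `D` assumed given as dense matrices (names are illustrative):

```python
import numpy as np

def synthesis_objective(alpha, y, Phi, Psi, lam):
    # (1/2) ||y - Phi Psi alpha||^2 + lam ||alpha||_1   (optimize over coefficients alpha)
    r = y - Phi @ (Psi @ alpha)
    return 0.5 * np.sum(r**2) + lam * np.sum(np.abs(alpha))

def analysis_objective(x, y, Phi, D, lam):
    # (1/2) ||y - Phi x||^2 + lam ||D* x||_1            (optimize over the image x)
    r = y - Phi @ x
    return 0.5 * np.sum(r**2) + lam * np.sum(np.abs(D.T @ x))
```
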
  30–32. Sparse Analysis Local Variations. Estimator:
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$.
    Remark: $x_\lambda(y)$ is not always unique, but $\mu_\lambda(y) = \Phi x_\lambda(y)$ is always unique.
    Theorem: $\mu_\lambda$ is Lipschitz (hence weakly differentiable).
    Questions: → find a closed-form expression for $\partial\mu_\lambda(y)$;
    → prove that this formula is valid almost everywhere.
  33–36. Heuristic Derivation. Variational estimator:
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda J(x)$
    (data fidelity + regularity). If $J$ is smooth, the first-order condition reads
    $\Phi^*(\Phi x_\lambda(y) - y) + \lambda\nabla J(x_\lambda(y)) = 0$.
    Implicit function theorem: $\partial\mu_\lambda(y) = \Phi\,\Gamma(y)^{-1}\Phi^*$, where
    $\Gamma = \Phi^*\Phi + \lambda\,\partial^2 J$.
    Difficulties: → What if $J$ is not differentiable? → What if $x_\lambda(y)$ is not uniquely defined?
    → What if $\Gamma(y)$ is non-invertible?
    ⇒ Answers in the case of analysis sparsity. A worked smooth example follows below.
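
To make the smooth-J formula concrete (an added example, not from the slides): for quadratic regularization J(x) = ||x||²/2 one has ∂²J = Id, so df_λ(y) = tr(Φ(Φ*Φ + λ Id)⁻¹Φ*), independent of y:

```python
import numpy as np

def tikhonov_df(Phi, lam):
    """df = tr(Phi (Phi*Phi + lam Id)^{-1} Phi*) for J(x) = ||x||^2 / 2."""
    N = Phi.shape[1]
    Gamma = Phi.T @ Phi + lam * np.eye(N)
    return np.trace(Phi @ np.linalg.solve(Gamma, Phi.T))

# Check against the SVD formula df = sum_i s_i^2 / (s_i^2 + lam).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 30))
s = np.linalg.svd(Phi, compute_uv=False)
lam = 0.5
print(tikhonov_df(Phi, lam), np.sum(s**2 / (s**2 + lam)))
```
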
  37–40. Polytope of TV in 1-D. Constrained problem: $x^\star(y) \in \operatorname{argmin}_{\Phi x = y} \|D^* x\|_1$,
    with the cyclic finite-difference operator
    $D = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix}$.
    TV-1D ball: $B = \{x : \|D^* x\|_1 \le 1\}$, displayed in the hyperplane
    $\{x : \langle x, 1\rangle = 0\} \simeq \mathbb{R}^3$ (figure © Mael). The build shows the ball $B$,
    the affine constraint set $\{x : \Phi x = y\}$, and the solution $x^\star$.
  41–43. Union of Subspaces Model.
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$  ($P_\lambda(y)$).
    Support of the solution: $I = \{i : (D^* x_\lambda(y))_i \ne 0\}$, and $J = I^c$.
    1-D total variation: $D^* x = (x_i - x_{i-1})_i$; $I$ = positions of the discontinuities of $x$.
    Subspace model: $G_J = \ker(D_J^*) = \operatorname{Im}(D_J)^\perp$. For TV: $G_J$ = signals with
    prescribed discontinuities.
  44–48. Local Sign Stability. For ($P_\lambda(y)$):
    $x_\lambda(y) \in \operatorname{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$.
    Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around $y \notin \mathcal{H}_\lambda$.
    To be understood: there exists a solution with the same sign.
    (Illustration: $D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$, showing the transition
    sets $\mathcal{H}_{\emptyset,j}$ and $\mathcal{H}_{I,j}$ and the region where $x^\star = 0$.)
    Linearized problem: with $s_I = \operatorname{sign}(D_I^* x_\lambda(y))$,
    $\hat{x}_\lambda(\bar{y}) = \operatorname{argmin}_{x \in G_J} \tfrac{1}{2}\|\Phi x - \bar{y}\|^2 + \lambda\langle D_I^* x, s_I\rangle = A^{[J]}(\Phi^*\bar{y} - \lambda D_I s_I)$.
    Theorem: if $y \notin \mathcal{H}_\lambda$, then for $\bar{y}$ close to $y$, $\hat{x}_\lambda(\bar{y})$ is a solution
    of $P_\lambda(\bar{y})$. (For the synthesis case, a sketch of this local formula follows below.)
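
For the synthesis case D = Id (the Lasso), the linearized problem above reduces to the familiar local affine formula on the support, x_I = (Φ_I*Φ_I)⁻¹(Φ_I* ȳ − λ s_I); a sketch assuming Φ_I has full column rank:

```python
import numpy as np

def lasso_local_solution(y_bar, Phi, I, s_I, lam):
    """Local affine formula for the Lasso (D = Id): on the support I with
    sign vector s_I, x_I = (Phi_I* Phi_I)^{-1} (Phi_I* y_bar - lam * s_I),
    and x = 0 outside I.  Assumes Phi_I has full column rank."""
    Phi_I = Phi[:, I]
    x = np.zeros(Phi.shape[1])
    x[I] = np.linalg.solve(Phi_I.T @ Phi_I, Phi_I.T @ y_bar - lam * s_I)
    return x
```
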
  49–53. Application to GSURE. For $y \notin \mathcal{H}_\lambda$, one has locally:
    $\mu_\lambda(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.
    Corollary: let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then
    $\mathrm{df}_\lambda(y) = \operatorname{div}(\mu_\lambda)(y) = \dim(G_J)$; for $D = \mathrm{Id}$ (synthesis),
    $\mathrm{df}_\lambda(y) = \|x_\lambda(y)\|_0$; and $\mathrm{gdf}_\lambda(y) = \operatorname{tr}(\Pi A^{[J]})$.
    These are unbiased estimators of df and gdf.
    In practice [Ramani et al. '08], use the trick $\operatorname{tr}(A) = \mathbb{E}_z(\langle Az, z\rangle)$,
    $z \sim \mathcal{N}(0, \mathrm{Id}_P)$. Proposition:
    $\mathrm{gdf}_\lambda(y) = \mathbb{E}_z(\langle \nu(z), \Phi^+ z\rangle)$, where $\nu(z)$ solves
    $\begin{pmatrix} \Phi^*\Phi & D_J \\ D_J^* & 0 \end{pmatrix}\begin{pmatrix} \nu(z) \\ \tilde{\nu} \end{pmatrix} = \begin{pmatrix} \Phi^* z \\ 0 \end{pmatrix}$,
    so that $\mathrm{gdf}_\lambda(y) \approx \tfrac{1}{K}\sum_{k=1}^K \langle \nu(z_k), \Phi^+ z_k\rangle$,
    $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$. (A sketch of the trace trick follows below.)
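
A sketch of the trace trick tr(A) = E_z⟨Az, z⟩ used above, as a plain Monte Carlo average over Gaussian probes (only a matrix-vector product for A is needed):

```python
import numpy as np

def mc_trace(apply_A, P, K=100, rng=None):
    """Estimate tr(A) via (1/K) sum_k <A z_k, z_k>, z_k ~ N(0, Id_P).
    `apply_A` only needs to implement the matrix-vector product z -> A z."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(K):
        z = rng.normal(size=P)
        total += np.dot(apply_A(z), z)
    return total / K

# Example: compare with the exact trace of a random matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
print(mc_trace(lambda z: A @ z, 50, K=2000, rng=rng), np.trace(A))
```
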
  54–56. Group Lasso Variations. $\ell^1$–$\ell^2$ regularization: partition into blocks
    $\{1, \ldots, N\} = \bigcup_{b \in B} b$, and $J(x) = \sum_{b \in B} \|x_b\|$ with $x_b = (x_i)_{i \in b}$.
    Example: $J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3|$. Transition set $\mathcal{H}_\lambda$: semi-algebraic with
    zero measure. (Figure: the transition set and the support size $\|x^\star\|_0$ for this example.)
    Theorem: for all $y \notin \mathcal{H}_\lambda$,
    $\partial\mu_\lambda(y) = \Phi_I\,\Delta(x_\lambda(y))^{-1}\Phi_I^*$, where
    $\Delta(x) = \Phi_I^*\Phi_I + \lambda\,\delta_x P_x$, with $\delta_x = \operatorname{diag}(\mathrm{Id}_b/\|x_b\|)_{b \in I}$
    and $P_x = \mathrm{Id}_I - \operatorname{diag}(x_b x_b^*/\|x_b\|^2)_{b \in I}$.
    (Its proximal operator, block soft-thresholding, is sketched below.)
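
The proximal operator of this ℓ¹–ℓ² regularizer is block soft-thresholding, x_b ↦ max(0, 1 − λ/||x_b||) x_b; a short sketch with blocks given as index arrays:

```python
import numpy as np

def block_soft_threshold(x, blocks, lam):
    """prox of lam * sum_b ||x_b||_2: shrink each block towards 0."""
    out = np.zeros_like(x)
    for b in blocks:
        nrm = np.linalg.norm(x[b])
        if nrm > lam:
            out[b] = (1.0 - lam / nrm) * x[b]
    return out

# Example matching the slide's J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3| (0-based indices):
x = np.array([3.0, 4.0, -0.5])
print(block_soft_threshold(x, [np.array([0, 1]), np.array([2])], lam=1.0))
```
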
  57. CT Scan (Tomography). $\Phi$: sub-sampled Radon transform, #projections = 16.
    Finite-differences gradient: $D = [\partial_1, \partial_2]$. Images: $x_0$, $\Phi^+ y$, $x^\star_\lambda(y)$.
    (Plot: quadratic cost versus the regularization parameter $\lambda$, comparing the risk, the
    projection risk, GSURE MC, and exhaustive search over the iterates and at convergence.)
  58. MRI (Fourier Sampling). $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$. Images: $\Omega$, $x_0$,
    $\Phi^+ y$, $x^\star_\lambda(y)$. (Plot: same comparison of quadratic cost versus $\lambda$.)
  59. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  60–61. From Theory to Practice. Limitations of the theoretical approach:
    → A closed form of $\partial\mu_\lambda(y)$ is not available for arbitrary $J$.
    → Unstable: requires computing $x_\lambda(y)$ with high accuracy. Example, synthesis $\ell^1$
    sparsity: $\mathrm{df}_\lambda(y) = \|x_\lambda(y)\|_0$, and one can have $\|x - x_\lambda(y)\|$ small while
    $|\mathrm{df}_\lambda(y) - \|x\|_0|$ is large, because the support is not correctly estimated by
    iterated thresholdings.
    → What about non-variational methods?
  62–68. Iterative Estimators. Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$
    as $\ell \to +\infty$. Recursive differentiation:
    $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
    Algorithm:
    – For $k = 1, \ldots, K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
    – For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)[\eta_k^{(\ell)}] + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.
    – Compute $\mathrm{gdf}^{(\ell_0)}(y) = \tfrac{1}{K}\sum_{k=1}^K \langle \eta_k^{(\ell_0)}, \Phi^+ z_k\rangle$.
    → $\mathrm{gdf}^{(\ell_0)}$ approximates the gdf of $y \mapsto x^{(\ell_0)}(y)$: an unbiased estimate of
    the risk at the current iteration $\ell_0$.
    Similar approaches: [Ramani, Blu, Unser '08], [Giryes, Elad, Eldar '11].
    A sketch of this recursion follows below.
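
A sketch of the algorithm above, assuming the scheme is supplied through hypothetical callables `phi(x, y)` (one iteration), `d1_phi(x, y, eta)` and `d2_phi(x, y, z)` (the two Jacobian-vector products), and `Phi_pinv(z)` applying Φ⁺:

```python
import numpy as np

def iterative_gdf(y, phi, d1_phi, d2_phi, Phi_pinv, n_iter, K=10, rng=None):
    """Run x^{(l+1)} = phi(x^{(l)}, y) together with the differentiated recursion
    eta_k^{(l+1)} = d1_phi(x^{(l)}, y)[eta_k] + d2_phi(x^{(l)}, y)[z_k],
    then return x^{(n_iter)} and gdf ~ (1/K) sum_k <eta_k, Phi^+ z_k>."""
    rng = np.random.default_rng() if rng is None else rng
    zs = [rng.normal(size=y.size) for _ in range(K)]
    x = np.zeros_like(Phi_pinv(y))            # iterate, initialized at 0 in R^N
    etas = [np.zeros_like(x) for _ in range(K)]
    for _ in range(n_iter):
        # update the sensitivities with the current iterate, then the iterate itself
        etas = [d1_phi(x, y, eta) + d2_phi(x, y, z) for eta, z in zip(etas, zs)]
        x = phi(x, y)
    gdf = np.mean([np.dot(eta, Phi_pinv(z)) for eta, z in zip(etas, zs)])
    return x, gdf
```
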
  69–73. Example: Forward-Backward Splitting, for
    $x_\lambda(y) \in \operatorname{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda J(x)$.
    Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac{1}{2}\|x - z\|^2 + \lambda J(z)$.
    Example: $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\!\big(0, 1 - \tfrac{\lambda}{|x_i|}\big)\,x_i$ (soft thresholding).
    Forward-backward: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y)$ with $\varphi(x, y) = \operatorname{prox}_{\tau\lambda J}(u)$,
    $u = x - \tau\Phi^*(\Phi x - y)$. If $\tau < 2/\|\Phi\|^2$, then $x^{(\ell)}(y) \to x_\lambda(y)$ as $\ell \to +\infty$.
    Derivatives: $\partial_1\varphi(x, y) = \partial\operatorname{prox}_{\tau\lambda J}(u)\circ(\mathrm{Id} - \tau\Phi^*\Phi)$ and
    $\partial_2\varphi(x, y) = \tau\,\partial\operatorname{prox}_{\tau\lambda J}(u)\circ\Phi^*$.
    → Extension to other schemes (DR, GFB, primal-dual, ...).
    A sketch for $J = \|\cdot\|_1$ follows below.
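
A sketch of the forward-backward map and its two partial derivatives for J = ||·||₁, where ∂prox is a diagonal 0/1 mask on the entries above the threshold (function names are illustrative):

```python
import numpy as np

def fb_phi(x, y, Phi, lam, tau):
    """One forward-backward step: prox_{tau*lam*||.||_1}(x - tau Phi*(Phi x - y))."""
    u = x - tau * Phi.T @ (Phi @ x - y)
    return np.sign(u) * np.maximum(np.abs(u) - tau * lam, 0.0)

def fb_d1(x, y, Phi, lam, tau, eta):
    """d1 phi(x, y)[eta] = dprox(u) . (Id - tau Phi*Phi) eta."""
    u = x - tau * Phi.T @ (Phi @ x - y)
    mask = (np.abs(u) > tau * lam).astype(float)   # Jacobian of soft thresholding
    return mask * (eta - tau * Phi.T @ (Phi @ eta))

def fb_d2(x, y, Phi, lam, tau, z):
    """d2 phi(x, y)[z] = dprox(u) . (tau Phi*) z."""
    u = x - tau * Phi.T @ (Phi @ x - y)
    mask = (np.abs(u) > tau * lam).astype(float)
    return mask * (tau * Phi.T @ z)
```
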
  74. Example: Isotropic Total Variation, $J(x) = \|\nabla x\|_{1,2}$, with $\Phi$ a deconvolution +
    sub-sampling operator. → Use either GFB (with auxiliary variables) or a primal-dual scheme.
    (Figures: $y$, $x_\lambda(y)$ at the optimal $\lambda$, the least-squares solution $x_{LS}(y)$; plots of the
    quadratic loss, projected risk, GSURE and true risk versus $\lambda$; optimization of $\lambda$ and of a
    block size $B$ for a random CS matrix with $P/N = 0.5$.)
  75–78. Example: Matrix Completion. Hypothesis: $x_0 \in \mathbb{R}^{n \times p}$ has low rank.
    SVD: $x = V\operatorname{diag}(\rho)U^*$. Nuclear norm: $\|x\|_* = \sum_i |\rho_i|$, with
    $\operatorname{prox}_{\lambda\|\cdot\|_*}(x) = V\operatorname{diag}(\operatorname{prox}_{\lambda\|\cdot\|_1}(\rho))U^*$.
    Computing $\partial\operatorname{prox}_{\lambda\|\cdot\|_*}$: see [Deledalle et al., ICML 2012].
    Inpainting: $\Phi = \operatorname{diag}(\varepsilon_i)_i$, $\varepsilon_i \in \{0, 1\}$.
    (Figures: $X_0 \in \mathbb{R}^{1000\times 100}$, $y$ with 25% masking, $X_\lambda(y)$ at the optimal $\lambda$;
    singular-value spectra for the true spectrum, optimal estimate and least squares; quadratic cost
    versus $\lambda$ for the predicted risk, SURE and true risk, with the optimal $\lambda$ marked.)
    A sketch of the nuclear-norm prox follows below.
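
A sketch of the nuclear-norm proximal operator stated above, i.e. soft-thresholding of the singular values (numpy returns the SVD as X = U diag(s) V*, which gives the same shrinkage):

```python
import numpy as np

def prox_nuclear(X, lam):
    """prox_{lam ||.||_*}(X): soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt

# Example: shrinking a noisy low-rank matrix recovers a low-rank estimate.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))   # rank 5
X = X0 + 0.1 * rng.normal(size=X0.shape)
print(np.linalg.matrix_rank(prox_nuclear(X, lam=2.0)))
```
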
  79–82. Conclusion. Computing risk estimates ⟺ sensitivity of the estimator.
    Closed-form expressions: quadratic and $\ell^1$ regularization, ...
    Open problems:
    – Estimate the risk in $\ker(\Phi)$.
    – Fast algorithms to optimize a high-dimensional $\lambda$.
    SURE?