
# Parameter Selection for Sparse Regularization of Inverse Problems

Talk at Cambridge

March 07, 2013

## Transcript

1. ### Parameter Selection for Sparse Regularization of Inverse Problems

Gabriel Peyré (www.numerical-tours.com)
Joint work with: Samuel Vaiter, Charles Dossal, Charles Deledalle, Jalal Fadili
2. ### Overview

• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms
3. ### Inverse Problems

Recovering $x_0 \in \mathbb{R}^N$ from noisy observations $Y = \Phi x_0 + W \in \mathbb{R}^P$.
Hypothesis: $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$, with $\sigma$ known.
4. ### Examples

Inpainting, super-resolution, compressed sensing.
6. ### Inverse Problem in Medical Imaging

Tomography: by the Fourier slice theorem, the projections $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$ correspond to radial slices of the spectrum $\hat{x}_0(\omega)$.
7. ### Magnetic resonance imaging (MRI)

Partial Fourier measurements: $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$.
8. ### Other examples

MEG, EEG, . . .
9. ### Estimators

Measures: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_P)$.
Estimator: $x(y)$ depends only on the observations $y$ and a parameter $\lambda$.
10. ### Example: variational methods

$x_\lambda(y) \in \underset{x \in \mathbb{R}^N}{\operatorname{argmin}} \; \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda J(x)$ (data fidelity plus regularity).
11. ### Goal

Set $\lambda$ to minimize $E_W(\|x_0 - x_\lambda(Y)\|^2)$.
Method: use risk estimators computed from $y$ alone.
12. ### Two routes

Theoretical approaches (closed-form expressions); numerical approaches (recursive differentiation).
13. ### Overview

• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms
14. ### Risk Minimization

How to choose the value of the parameter $\lambda$?
Average risk: $R(\lambda) = E_W(\|x_\lambda(Y) - x_0\|^2)$, a measure of the expected quality of $x_\lambda(Y)$ w.r.t. $x_0$.
The optimal (theoretical) $\lambda^\star = \operatorname{argmin}_\lambda R(\lambda)$ minimizes the risk; plug-in estimator: $x_{\lambda^\star}(Y)$.
The risk is unknown since it depends on $x_0$. Can we estimate it solely from $x_\lambda(y)$?
15. ### But: $E_W$ is not accessible $\to$ use one observation.
16. ### But: $x_0$ is not accessible $\to$ needs risk estimators.
17. ### Prediction Risk Estimation

Prediction: $\mu(y) = \Phi x_\lambda(y)$.
Sensitivity analysis: if $\mu$ is weakly differentiable, $\mu(y + \varepsilon) = \mu(y) + \partial\mu(y) \cdot \varepsilon + O(\|\varepsilon\|^2)$.
18. ### Stein Unbiased Risk Estimator

$\mathrm{SURE}(y) = \|y - \mu(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}(y)$, where $\mathrm{df}(y) = \operatorname{tr}(\partial\mu(y)) = \operatorname{div}(\mu)(y)$.
19. ### Theorem [Stein, 1981]

$E_W(\mathrm{SURE}(Y)) = E_W(\|\Phi x_0 - \mu(Y)\|^2)$.
20. ### Remarks

Other estimators: GCV, BIC, AIC, . . .
SURE requires $\sigma$ (not always available); it is unbiased with good practical performance.
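
In the denoising case ($\Phi = \mathrm{Id}$) with soft thresholding, $\mathrm{df}(y)$ is simply the number of nonzero coefficients (as stated later in the talk for $D = \mathrm{Id}$), so SURE fits in a few lines. A minimal NumPy sketch, not the speaker's code; the signal size, spike amplitudes and $\lambda$ grid are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(y, lam):
    """Soft thresholding: solves min_x 0.5*||y - x||^2 + lam*||x||_1 coordinate-wise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_soft_threshold(y, lam, sigma):
    """SURE(y) = ||y - mu(y)||^2 - sigma^2*P + 2*sigma^2*df(y),
    with df(y) = number of nonzero coefficients of mu(y)."""
    mu = soft_threshold(y, lam)
    df = np.count_nonzero(mu)
    return np.sum((y - mu) ** 2) - sigma**2 * y.size + 2 * sigma**2 * df

# Sparse signal observed in Gaussian noise (denoising: Phi = Id)
P, sigma = 2000, 0.5
x0 = np.zeros(P)
x0[:50] = 5.0
y = x0 + sigma * rng.standard_normal(P)

# SURE tracks the true risk (computable here only because x0 is known)
lams = np.linspace(0.1, 3.0, 30)
sure_vals = np.array([sure_soft_threshold(y, lam, sigma) for lam in lams])
true_risk = np.array([np.sum((soft_threshold(y, lam) - x0) ** 2) for lam in lams])
```

Minimizing `sure_vals` over the grid selects a $\lambda$ close to the minimizer of the (unavailable in practice) true risk, which is exactly the plug-in strategy of slide 14.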
21. ### Generalized SURE

Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.
22. ### Risk on $\ker(\Phi)^\perp$

$E_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$, where $\Pi = \mathrm{Proj}_{\ker(\Phi)^\perp} = \Phi^*(\Phi\Phi^*)^+\Phi$.
23. ### Generalized df

ML estimator: $\hat{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$.
$\mathrm{GSURE}(y) = \|\hat{x}(y) - x_\lambda(y)\|^2 - \sigma^2 \operatorname{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{gdf}(y)$, where $\mathrm{gdf}(y) = \operatorname{tr}((\Phi\Phi^*)^+ \partial\mu(y))$.
24. ### Theorem [Eldar 09, Pesquet et al. 09, Vonesch et al. 08]

$E_W(\mathrm{GSURE}(Y)) = E_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$.
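
The two building blocks of GSURE, the projector $\Pi$ and the ML estimator $\hat{x}(y)$, can be checked numerically. A small sketch with a random underdetermined $\Phi$ (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Underdetermined operator Phi: P < N, so ker(Phi) is nontrivial
P, N = 20, 50
Phi = rng.standard_normal((P, N))

# Projector onto ker(Phi)^perp: Pi = Phi^* (Phi Phi^*)^+ Phi
PhiPhiT_pinv = np.linalg.pinv(Phi @ Phi.T)
Pi = Phi.T @ PhiPhiT_pinv @ Phi

# Maximum-likelihood estimator x_hat(y) = Phi^* (Phi Phi^*)^+ y
y = rng.standard_normal(P)
x_hat = Phi.T @ PhiPhiT_pinv @ y
```

Since a random Gaussian $\Phi$ has full row rank, $\Pi$ is a symmetric idempotent matrix, $\hat{x}(y)$ lies in $\ker(\Phi)^\perp$, and $\Phi \hat{x}(y) = y$.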
25. ### Overview

• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms
26. ### Sparse Regularizations

Synthesis regularization: $\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi\Psi x\|_2^2 + \lambda\|x\|_1$, with $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).
27. ### Analysis regularization

$\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$, with $J(x) = \|D^* x\|_1 = \sum_i |\langle x, d_i\rangle|$ (image $x$, gradient $D^* x$).
28. ### $\to$ Unless $D = \Psi$ is orthogonal, the two produce different results.
29. ### $\to$ Analysis is the more general formulation when using $\Phi\Psi$ as the operator.
30. ### Sparse Analysis Local Variations

Estimator: $x_\lambda(y) \in \underset{x \in \mathbb{R}^N}{\operatorname{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$.
Remark: $x_\lambda(y)$ is not always unique, but $\mu_\lambda(y) = \Phi x_\lambda(y)$ is always unique.
31. ### Theorem: $\mu_\lambda$ is Lipschitz (hence weakly differentiable).
32. ### Questions

$\to$ Find a closed-form expression for $\partial\mu_\lambda(y)$.
$\to$ Prove that this formula is valid almost everywhere.
33. ### Heuristic Derivation

Variational estimator: $x_\lambda(y) \in \underset{x \in \mathbb{R}^N}{\operatorname{argmin}} \; \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda J(x)$ (data fidelity plus regularity).
If $J$ is smooth, the first-order condition reads $\Phi^*(\Phi x_\lambda(y) - y) + \lambda \nabla J(x_\lambda(y)) = 0$.
34. ### Implicit function theorem

$\partial\mu_\lambda(y) = \Phi\,\Delta(y)^{-1}\Phi^*$, where $\Delta = \Phi^*\Phi + \lambda\,\partial^2 J$.
35. ### Difficulties

$\to$ What if $J$ is not differentiable?
$\to$ What if $x_\lambda(y)$ is not uniquely defined?
$\to$ What if $\Delta(y)$ is non-invertible?
36. ### $\Rightarrow$ Answers in the case of analysis sparsity.
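
For a smooth penalty the heuristic can be verified directly. With the illustrative choice $J(x) = \tfrac{1}{2}\|x\|^2$ (ridge, not part of the talk), $\partial^2 J = \mathrm{Id}$, so the formula gives $\partial\mu(y) = \Phi(\Phi^*\Phi + \lambda\,\mathrm{Id})^{-1}\Phi^*$, which a finite-difference check confirms:

```python
import numpy as np

rng = np.random.default_rng(2)
P, N, lam = 15, 10, 0.7
Phi = rng.standard_normal((P, N))

# Smooth case J(x) = 0.5*||x||^2: Hessian d^2 J = Id, so the
# implicit-function-theorem formula gives
#   dmu(y) = Phi (Phi^T Phi + lam*Id)^{-1} Phi^T   (independent of y)
Delta_inv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(N))
dmu = Phi @ Delta_inv @ Phi.T
df = np.trace(dmu)

def mu(y):
    """mu(y) = Phi x(y) for x(y) = argmin 0.5*||y - Phi x||^2 + 0.5*lam*||x||^2."""
    return Phi @ (Delta_inv @ (Phi.T @ y))

# Finite-difference Jacobian of mu, column by column
y = rng.standard_normal(P)
eps = 1e-6
J_fd = np.column_stack([(mu(y + eps * e) - mu(y)) / eps for e in np.eye(P)])
```

Here $\mu$ is linear in $y$, so the finite-difference Jacobian matches the closed form up to floating-point error, and $\mathrm{df} = \operatorname{tr}(\partial\mu) = \sum_i s_i^2/(s_i^2 + \lambda)$ stays below $\min(P, N)$.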
37. ### Polytope of TV in 1-D

$x_\lambda(y) \in \operatorname{argmin}_{\Phi x = y} \|D^* x\|_1$; TV-1D ball $B = \{x : \|D^* x\|_1 \le 1\}$, displayed in $\{x : \langle x, \mathbb{1}\rangle = 0\} \simeq \mathbb{R}^3$ (figure © Mael), with cyclic finite differences
$D^* = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix}.$
38.–40. ### (figure builds: the affine set $\{x : \Phi x = y\}$ and the solution $x^\star$ where it touches the ball $B$)
41. ### Union of Subspaces Model

$x_\lambda(y) \in \underset{x \in \mathbb{R}^N}{\operatorname{argmin}} \; \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \quad (P_\lambda(y))$
Support of the solution: $I = \{i : (D^* x_\lambda(y))_i \ne 0\}$, cosupport $J = I^c$.
42. ### 1-D total variation

$D^* x = (x_i - x_{i-1})_i$; $I$ = positions of the discontinuities of $x$.
43. ### Subspace model

$G_J = \ker(D_J^*) = \operatorname{Im}(D_J)^\perp$. For TV: $G_J$ = signals with prescribed discontinuities.
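
For 1-D TV the subspace $G_J$ is computable explicitly. A small sketch (the signal and sizes are illustrative assumptions) checks that $\dim G_J$ equals the number of constant pieces:

```python
import numpy as np

# 1-D finite differences D^*: (D^* x)_i = x_{i+1} - x_i, for N = 6 samples
N = 6
Dstar = -np.eye(N - 1, N) + np.eye(N - 1, N, k=1)

# Piecewise-constant signal with three constant pieces
x = np.array([1.0, 1.0, 3.0, 3.0, 3.0, 5.0])
dx = Dstar @ x

# Support I = discontinuity positions, cosupport J = I^c
I = np.nonzero(np.abs(dx) > 1e-12)[0]
J = np.setdiff1d(np.arange(N - 1), I)

# G_J = ker(D_J^*): its dimension is N - rank(D_J^*)
DJ = Dstar[J, :]
dim_GJ = N - np.linalg.matrix_rank(DJ)
```

Each row of $D_J^*$ forces two consecutive samples to be equal, so $\ker(D_J^*)$ consists exactly of the signals that are constant on each piece: here two jumps, three pieces, $\dim G_J = 3$.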
44. ### Local Sign Stability

Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around $y \notin \mathcal{H}$.
To be understood: there exists a solution with the same sign.
45. ### Example

$D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$: transition hyperplanes $\mathcal{H}_{\emptyset,j}$ (region where $x^\star = 0$).
46. ### (build) Adds the hyperplanes $\mathcal{H}_{I,j}$.
47. ### Linearized problem

With $s_I = \operatorname{sign}(D_I^* x_\lambda(y))$:
$\hat{x}_\lambda(\bar{y}) = \underset{x \in G_J}{\operatorname{argmin}} \; \tfrac{1}{2}\|\Phi x - \bar{y}\|^2 + \lambda\langle D_I^* x, s_I\rangle = A^{[J]}(\Phi^*\bar{y} - \lambda D_I s_I)$.
48. ### Theorem

If $y \notin \mathcal{H}$, then for $\bar{y}$ close to $y$, $\hat{x}_\lambda(\bar{y})$ is a solution of $P_\lambda(\bar{y})$.
49. ### Application to GSURE

For $y \notin \mathcal{H}$, one has locally: $\mu(y) = \Phi A^{[J]} \Phi^* y + \mathrm{cst}$.
50. ### Corollary

Let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(\mathcal{H}_J)$ holds. Then $\mathrm{df}(y) = \operatorname{div}(\mu)(y) = \dim(G_J)$.
51. ### For $D = \mathrm{Id}$ (synthesis): $\mathrm{df}(y) = \|x_\lambda(y)\|_0$.
52. ### Moreover $\mathrm{gdf}(y) = \operatorname{tr}(\Pi A^{[J]})$; these are unbiased estimators of df and gdf.
53. ### In practice

Trick [Ramani et al.'08]: $\operatorname{tr}(A) = E_z(\langle Az, z\rangle)$ for $z \sim \mathcal{N}(0, \mathrm{Id}_P)$.
Proposition: $\mathrm{gdf}(y) = E_z(\langle \nu(z), \Phi^+ z\rangle) \approx \tfrac{1}{K}\sum_{k=1}^K \langle \nu(z_k), \Phi^+ z_k\rangle$, $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$,
where $\nu(z)$ solves $\begin{pmatrix} \Phi^*\Phi & D_J \\ D_J^* & 0 \end{pmatrix}\begin{pmatrix} \nu(z) \\ \tilde{\nu} \end{pmatrix} = \begin{pmatrix} \Phi^* z \\ 0 \end{pmatrix}$.
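
The trace trick of slide 53 is easy to demonstrate in isolation. A short sketch (the matrix and the probe count $K$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def trace_mc(A, K, rng):
    """Monte Carlo trace estimator: tr(A) = E_z <A z, z> for z ~ N(0, Id),
    approximated by averaging <A z_k, z_k> over K Gaussian probes."""
    n = A.shape[0]
    Z = rng.standard_normal((K, n))        # rows are the probes z_k
    return np.mean(np.sum(Z * (Z @ A.T), axis=1))

# Sanity check on a small explicit matrix
n = 6
A = rng.standard_normal((n, n))
est = trace_mc(A, K=200_000, rng=rng)
```

The point of the trick is that `trace_mc` only needs matrix-vector products $Az$, never $A$ itself, which is what makes the gdf of a large implicit operator tractable.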
54. ### Group Lasso Variations

$\ell^1$-$\ell^2$ regularization: $J(x) = \sum_{b \in \mathcal{B}} \|x_b\|$, $x_b = (x_i)_{i \in b}$, for a partition into blocks $\{1, \dots, N\} = \bigcup_{b \in \mathcal{B}} b$.
55. ### Example

$J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3|$. The transition set $\mathcal{H}$ is semi-algebraic with zero measure. (Figure: transition set and regions of constant $\|x^\star\|_0$.)
56. ### Theorem

$\forall\, y \notin \mathcal{H}$: $\partial\mu_I(y) = \Phi_I\,\Delta(x_\lambda(y))^{-1}\Phi_I^*$, where $\Delta(x) = \Phi_I^*\Phi_I + \lambda\,\delta_x P_x$, with $\delta_x = \operatorname{diag}(\mathrm{Id}_b/\|x_b\|)_{b \in I}$ and $P_x = \mathrm{Id}_I - \operatorname{diag}(x_b x_b^*)_{b \in I}$.
57. ### CT Scan (Tomography)

$\Phi$: sub-sampled Radon transform, #projections = 16. Finite-differences gradient: $D = [\partial_1, \partial_2]$.
(Figure: quadratic cost vs. regularization parameter $\lambda$, comparing risk, projection risk, GSURE MC and exhaustive search (iterates / final); images $x_0$, $\Phi^+ y$ and $x_\lambda^\star(y)$.)
58. ### MRI (Fourier Sampling)

$\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$.
(Figure: same quadratic-cost comparison for the MRI operator; images $x_0$, $\Phi^+ y$ and $x_\lambda^\star(y)$.)
59. ### Overview

• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms
60. ### From Theory to Practice

Limitations of the theoretical approach:
$\to$ Closed form of $\partial\mu(y)$ not available for arbitrary $J$.
$\to$ Unstable: requires computing $x_\lambda(y)$ with high accuracy.
$\to$ Non-variational methods?
61. ### Example: synthesis $\ell^1$ sparsity

$\mathrm{df}(y) = \|x_\lambda(y)\|_0$, but the support is not correctly estimated by iterated thresholdings: one can have $\|x - x_\lambda(y)\|$ small yet $|\mathrm{df}(y) - \|x\|_0|$ large.
62. ### Iterative Estimators

Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$ as $\ell \to +\infty$.
63. ### Recursive differentiation

$\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y) \circ \partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
64. ### Algorithm

– For $k = 1, \dots, K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
65. ### – For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)\,\eta_k^{(\ell)} + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.
66. ### – Compute $\mathrm{gdf}^{(\ell_0)}(y) = \tfrac{1}{K}\sum_{k=1}^K \langle \eta_k^{(\ell_0)}, \Phi^+ z_k\rangle$.
67. ### Remarks

$\to$ $\mathrm{gdf}^{(\ell_0)}$ approximates the gdf of $y \mapsto x^{(\ell_0)}(y)$: an unbiased estimate of the risk at the current iteration $\ell_0$.
68. ### Similar approaches: [Ramani, Blu, Unser'08], [Giryes, Elad, Eldar'11].
69. ### Example: Forward-Backward Splitting

Target: $x_\lambda(y) \in \operatorname{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda J(x)$ for a simple function $J$, i.e. one whose proximal operator $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac{1}{2}\|x - z\|^2 + \lambda J(z)$ is easy to compute.
70. ### Example: soft thresholding

$\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\!\left(0,\, 1 - \tfrac{\lambda}{|x_i|}\right) x_i$.
71. ### Forward-backward

$x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y)$ with $\varphi(x, y) = \operatorname{prox}_{\tau\lambda J}(u)$, $u = x - \tau\Phi^*(\Phi x - y)$.
If $\tau < 2/\|\Phi\|^2$, then $x^{(\ell)}(y) \to x_\lambda(y)$ as $\ell \to +\infty$.
72. ### Derivatives

$\partial_1\varphi(x, y) = \partial\operatorname{prox}_{\tau\lambda J}(u) \circ (\mathrm{Id} - \tau\Phi^*\Phi)$, $\quad \partial_2\varphi(x, y) = \partial\operatorname{prox}_{\tau\lambda J}(u) \circ \tau\Phi^*$.
73. ### $\to$ Extension to other schemes (DR, GFB, primal-dual, . . . ).
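
Putting slides 62–72 together for synthesis $\ell^1$: forward-backward (ISTA) iterations with recursive differentiation of Gaussian probes. This sketch estimates the plain $\mathrm{df}(y) = \operatorname{tr}(\partial\mu(y))$ rather than the gdf (which would additionally apply $\Phi^+$ to the probes); all sizes, $\lambda$, $\tau$ and the probe count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative synthesis-l1 problem
P, N, lam = 40, 80, 0.3
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[:5] = 2.0
y = Phi @ x0 + 0.05 * rng.standard_normal(P)

tau = 1.0 / np.linalg.norm(Phi, 2) ** 2     # satisfies tau < 2/||Phi||^2

def prox_l1(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

K = 40
Z = rng.standard_normal((P, K))             # Gaussian probes z_k
x = np.zeros(N)
eta = np.zeros((N, K))                      # columns eta_k ~ (dx^(l)(y)) z_k

for _ in range(300):
    u = x - tau * (Phi.T @ (Phi @ x - y))
    # d prox of soft thresholding at u: diagonal 0/1 mask on |u| > tau*lam
    mask = (np.abs(u) > tau * lam).astype(float)
    # Recursive differentiation:
    #   eta <- dprox(u) [ (Id - tau Phi^T Phi) eta + tau Phi^T z ]
    eta = mask[:, None] * (eta - tau * (Phi.T @ (Phi @ eta)) + tau * (Phi.T @ Z))
    x = prox_l1(u, tau * lam)

# df(y) = tr(d mu(y)) with mu = Phi x, estimated from the probes
df_est = np.mean(np.sum(Z * (Phi @ eta), axis=0))
```

At convergence the mask stabilizes on the support, so `df_est` tracks the closed-form count $\|x_\lambda(y)\|_0$ of slide 51, up to Monte Carlo error.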
74. ### Example: Isotropic Total Variation

$J(x) = \|\nabla x\|_{1,2}$; deconvolution / sub-sampling operators $\to$ use either GFB (auxiliary variables) or a primal-dual scheme.
(Figures: quadratic loss vs. $\lambda$ and vs. block size $B$ for a random CS matrix and a sub-sampling matrix ($P/N = 0.5$), comparing projected risk, GSURE and true risk; images $y$, $x_{LS}(y)$ and $x_\lambda(y)$ at the optimal $\lambda$.)
75. ### Example: Matrix Completion

Hypothesis: $x_0 \in \mathbb{R}^{n \times p}$ has low rank. SVD: $x = V \operatorname{diag}(\rho)\, U^*$.
76. ### Nuclear norm

$\|x\|_* = \sum_i |\rho_i|$, with $\operatorname{prox}_{\lambda\|\cdot\|_*}(x) = V \operatorname{diag}(\operatorname{prox}_{\lambda\|\cdot\|_1}(\rho))\, U^*$.
77. ### Computing $\partial\operatorname{prox}_{\lambda\|\cdot\|_*}$: see [Deledalle et al., ICML 2012].
78. ### Inpainting

$\Phi = \operatorname{diag}(\varepsilon_i)_i$, $\varepsilon_i \in \{0, 1\}$.
(Figures: $X_0 \in \mathbb{R}^{1000 \times 100}$, $y$ (masking 25%), $X_\lambda(y)$ at the optimal $\lambda$; singular-value spectra (true, optimal estimate, least squares); quadratic cost vs. $\lambda$ with predicted risk, SURE, true risk and optimal $\lambda$.)
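
The nuclear-norm proximal operator of slide 76 in a few lines (the sizes and $\lambda$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def prox_nuclear(X, lam):
    """prox of the nuclear norm: soft-threshold the singular values.
    prox_{lam ||.||_*}(X) = V diag(max(rho - lam, 0)) U^*  with X = V diag(rho) U^*."""
    V, rho, Ut = np.linalg.svd(X, full_matrices=False)
    return V @ np.diag(np.maximum(rho - lam, 0.0)) @ Ut

# Low-rank matrix plus small noise: thresholding the spectrum reduces the rank
n, p, r = 50, 30, 3
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))
X = X0 + 0.01 * rng.standard_normal((n, p))
Xp = prox_nuclear(X, lam=1.0)
```

The noise singular values sit far below the threshold while the three signal singular values sit far above it, so the prox recovers a rank-3 matrix with a strictly smaller nuclear norm.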

80. ### Conclusion

Computing risk estimates $\Leftrightarrow$ sensitivity analysis of the estimator.
Closed-form expressions: quadratic regularization, $\ell^1$ regularization, . . .
81. ### Open problems

– Estimate the risk inside $\ker(\Phi)$.
– Fast algorithms to optimize a high-dimensional $\lambda$.
82. ### SURE?