Parameter Selection for Sparse Regularization of Inverse Problems

Talk at Cambridge

Gabriel Peyré

March 07, 2013

Transcript

  1. Gabriel Peyré (www.numerical-tours.com). Parameter Selection for Sparse Regularization of Inverse Problems. Joint work with: Samuel Vaiter, Charles Deledalle, Charles Dossal, Jalal Fadili.
  2. Overview • Inverse Problems • Stein Unbiased Risk Estimators • Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  3-4. Inverse Problems. Recovering x0 ∈ ℝ^N from the noisy observations Y = Φ x0 + W ∈ ℝ^P. Hypothesis: W ∼ N(0, σ² Id_P), σ is known. Examples: inpainting, super-resolution, compressed sensing.
  5-7. Inverse Problem in Medical Imaging. Tomography: Φ x0 = (p_{θ_k})_{1 ≤ k ≤ K}, which by the Fourier slice theorem amounts to sampling the Fourier transform x̂0(ω) along radial lines. Magnetic resonance imaging (MRI): Φ x0 = (x̂0(ω))_{ω ∈ Ω}. Other examples: MEG, EEG, ...
  8-11. Estimators. Measures: Y = Φ x0 + W, W ∼ N(0, σ² Id_P). Estimator: x_λ(y) depends only on the observations y and the parameter λ. Example: variational methods
     x_λ(y) ∈ argmin_{x ∈ ℝ^N} 1/2 ||y − Φ x||² + λ J(x)
     (data fidelity + regularity). Goal: set λ to minimize the risk E_W(||x0 − x_λ(Y)||²). Method: use risk estimators computed from y alone. Theoretical approaches (closed-form expressions); numerical approaches (recursive differentiation).
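A minimal numerical sketch of this observation model (the dense Phi, the sizes, the sparsity level and the noise level below are illustrative assumptions, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 64, 32, 0.1                         # illustrative sizes and noise level

Phi = rng.standard_normal((P, N)) / np.sqrt(P)    # generic linear measurement operator
x0 = np.zeros(N)
x0[rng.choice(N, 5, replace=False)] = rng.standard_normal(5)   # a sparse ground truth
y = Phi @ x0 + sigma * rng.standard_normal(P)     # observations Y = Phi x0 + W
```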
  12. Overview • Inverse Problems • Stein Unbiased Risk Estimators • Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  13-15. Risk Minimization. How to choose the value of the parameter λ? Average risk: R(λ) = E_W(||x_λ(Y) − x0||²), a measure of the expected quality of x_λ(Y) with respect to x0. The optimal (theoretical) λ⋆ = argmin_λ R(λ) minimizes the risk; plug-in estimator: x_{λ⋆}(Y). The risk is unknown since it depends on x0. Can we estimate it solely from x_λ(y)? But: E_W is not accessible → use one observation; x0 is not accessible → needs risk estimators.
  16-19. Prediction Risk Estimation. Prediction: μ_λ(y) = Φ x_λ(y). Sensitivity analysis: if μ_λ is weakly differentiable, μ_λ(y + δ) = μ_λ(y) + ∂μ_λ(y)·δ + O(||δ||²). Stein Unbiased Risk Estimator:
     SURE_λ(y) = ||y − μ_λ(y)||² − σ² P + 2 σ² df_λ(y),   df_λ(y) = tr(∂μ_λ(y)) = div(μ_λ)(y).
     Theorem [Stein, 1981]: E_W(SURE_λ(Y)) = E_W(||Φ x0 − μ_λ(Y)||²). Other estimators: GCV, BIC, AIC, ... SURE requires σ (not always available), is unbiased, and has good practical performance.
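A sketch of SURE in the simplest setting, Φ = Id (denoising) with soft thresholding, where the divergence term reduces to the number of coefficients above the threshold (illustrative code, not from the talk):

```python
import numpy as np

def soft_threshold(y, lam):
    """Entrywise soft thresholding, the prox of lam * ||.||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_soft_threshold(y, lam, sigma):
    """SURE(y) = ||y - mu(y)||^2 - sigma^2 P + 2 sigma^2 df(y), with df = #{i : |y_i| > lam}."""
    mu = soft_threshold(y, lam)
    df = np.count_nonzero(np.abs(y) > lam)
    return np.sum((y - mu) ** 2) - sigma**2 * y.size + 2 * sigma**2 * df

# select lambda on a grid by minimizing SURE, using the single observation y:
# lam_star = min(np.linspace(0.01, 1.0, 50), key=lambda l: sure_soft_threshold(y, l, sigma))
```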
  20-22. Risk on ker(Φ)^⊥: Generalized SURE. Problem: ||Φ x0 − Φ x_λ(y)|| is a poor indicator of ||x0 − x_λ(y)||. Consider instead E_W(||Π(x0 − x_λ(Y))||²), with Proj_{ker(Φ)^⊥} = Π = Φ* (Φ Φ*)^+ Φ. ML estimator: x̂_ML(y) = Φ* (Φ Φ*)^+ y. Generalized df: gdf_λ(y) = tr((Φ Φ*)^+ ∂μ_λ(y)).
     GSURE_λ(y) = ||x̂_ML(y) − x_λ(y)||² − σ² tr((Φ Φ*)^+) + 2 σ² gdf_λ(y)
     Theorem [Eldar '09, Pesquet et al. '09, Vonesch et al. '08]: E_W(GSURE_λ(Y)) = E_W(||Π(x0 − x_λ(Y))||²).
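A small dense-matrix sketch of these ingredients, following the formula above (pinv-based and purely illustrative; gdf is assumed to be supplied by one of the estimators discussed later):

```python
import numpy as np

def gsure(y, x_lam, gdf, Phi, sigma):
    """GSURE(y) = ||x_ML(y) - x_lam(y)||^2 - sigma^2 tr((Phi Phi*)^+) + 2 sigma^2 gdf."""
    gram_pinv = np.linalg.pinv(Phi @ Phi.T)       # (Phi Phi*)^+
    x_ml = Phi.T @ (gram_pinv @ y)                # ML estimator Phi*(Phi Phi*)^+ y
    return (np.sum((x_ml - x_lam) ** 2)
            - sigma**2 * np.trace(gram_pinv)
            + 2 * sigma**2 * gdf)

# Pi = Phi.T @ np.linalg.pinv(Phi @ Phi.T) @ Phi  # projector onto ker(Phi)^perp
```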
  23. Overview • Inverse Problems • Stein Unbiased Risk Estimators • Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  24-27. Sparse Regularizations. Synthesis regularization (coefficients x, image Ψ x):
     min_{x ∈ ℝ^N} 1/2 ||y − Φ Ψ x||² + λ ||x||₁,   J(x) = ||x||₁ = Σ_i |x_i|.
     Analysis regularization (image x, correlations D* x, e.g. the gradient):
     min_{x ∈ ℝ^N} 1/2 ||y − Φ x||² + λ ||D* x||₁,   J(x) = ||D* x||₁ = Σ_i |⟨x, d_i⟩|.
     → Unless D = Ψ is orthogonal, the two produce different results. → Analysis is the more general formulation: synthesis is recovered with D = Id, using Φ Ψ as the operator.
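The two penalties written out for explicit matrices (Psi and D below are generic placeholders, not operators from the talk):

```python
import numpy as np

def J_synthesis(x):
    """Synthesis penalty J(x) = ||x||_1 on the coefficients x (the image is Psi @ x)."""
    return np.sum(np.abs(x))

def J_analysis(x, D):
    """Analysis penalty J(x) = ||D* x||_1 = sum_i |<x, d_i>| on the image x itself."""
    return np.sum(np.abs(D.T @ x))
```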
  28-30. Sparse Analysis: Local Variations. Estimator: x_λ(y) ∈ argmin_{x ∈ ℝ^N} 1/2 ||Φ x − y||² + λ ||D* x||₁. Remark: x_λ(y) is not always unique, but μ_λ(y) = Φ x_λ(y) is always unique. Theorem: μ_λ is Lipschitz (hence weakly differentiable). Questions: → find a closed-form expression for ∂μ_λ(y); → prove that this formula is valid almost everywhere.
  31-34. Heuristic Derivation. Variational estimator: x(y) ∈ argmin_{x ∈ ℝ^N} 1/2 ||y − Φ x||² + λ J(x) (data fidelity + regularity). If J is smooth, the optimality condition reads Φ*(Φ x(y) − y) + λ ∇J(x(y)) = 0. Implicit function theorem: ∂μ(y) = Φ Γ(y)^{-1} Φ*, where Γ = Φ*Φ + λ ∂²J. Difficulties: → What if J is not differentiable? → What if x(y) is not uniquely defined? → What if Γ(y) is non-invertible? ⇒ Answers in the case of analysis sparsity.
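The implicit-function step, spelled out (assuming Γ(y) is invertible):

```latex
% differentiate the optimality condition with respect to y
\Phi^*(\Phi x(y) - y) + \lambda \nabla J(x(y)) = 0
\;\Longrightarrow\;
\underbrace{\big(\Phi^*\Phi + \lambda\,\partial^2 J(x(y))\big)}_{\Gamma(y)}\;\partial x(y) = \Phi^*
\;\Longrightarrow\;
\partial\mu(y) = \Phi\,\partial x(y) = \Phi\,\Gamma(y)^{-1}\Phi^*.
```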
  35-38. Polytope of TV in 1-D. TV-1D ball: B = {x : ||D* x||₁ ≤ 1}, where D* is the cyclic finite-difference matrix
     D* = [ 1 −1 0 0 ; 0 1 −1 0 ; 0 0 1 −1 ; −1 0 0 1 ],
     displayed in the hyperplane {x : ⟨x, 1⟩ = 0} ≅ ℝ³ (figure credit: Maël). Noiseless recovery: x⋆(y) ∈ argmin_{Φ x = y} ||D* x||₁. The figures show the affine set {x : Φ x = y} together with B: the solution is attained where a scaled copy of B first touches this set, typically on a low-dimensional face, i.e. with sparse differences D* x.
  39-41. Union of Subspaces Model. (P_λ(y)): x_λ(y) ∈ argmin_{x ∈ ℝ^N} 1/2 ||Φ x − y||² + λ ||D* x||₁. Support of the solution: I = {i : (D* x_λ(y))_i ≠ 0}, and J = I^c. 1-D total variation: D* x = (x_i − x_{i−1})_i, so I = positions of the discontinuities. Subspace model: G_J = ker(D*_J) = Im(D_J)^⊥. For TV: G_J = signals with prescribed discontinuities.
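A tiny illustration for 1-D TV: the support I collects the jump positions, and G_J (piecewise-constant signals with those jumps) has dimension |I| + 1, since a signal with |I| jumps has |I| + 1 constant pieces (illustrative code; the tolerance is an assumption):

```python
import numpy as np

def tv_support(x, tol=1e-8):
    """I = positions i where (D* x)_i = x_i - x_{i-1} is nonzero (the jumps of x)."""
    return np.flatnonzero(np.abs(np.diff(x)) > tol)

def dim_GJ_tv(x, tol=1e-8):
    """dim(G_J) for 1-D TV: number of constant pieces = number of jumps + 1."""
    return len(tv_support(x, tol)) + 1

# x = np.array([1.0, 1.0, 3.0, 3.0, 3.0, 0.5])
# tv_support(x) -> array([1, 4]); dim_GJ_tv(x) -> 3
```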
  42-46. Local Sign Stability. Lemma: sign(D* x_λ(y)) is constant around every y ∉ H_λ (to be understood as: there exists a solution with the same sign). Example: D = Id_{3×3}, Φ ∈ ℝ^{2×3}; the transition set H_λ is a union of hyperplanes H_{∅,j}, H_{I,j} in the observation space (the figure shows the region where x⋆ = 0). Linearized problem: with s_I = sign(D*_I x_λ(y)),
     x̂(ȳ) = argmin_{x ∈ G_J} 1/2 ||Φ x − ȳ||² + λ ⟨D*_I x, s_I⟩ = A^[J] (Φ* ȳ − λ D_I s_I).
     Theorem: if y ∉ H_λ, then for ȳ close to y, x̂(ȳ) is a solution of (P_λ(ȳ)).
  47-51. Application to GSURE. For y ∉ H_λ, one has locally μ_λ(y) = Φ A^[J] Φ* y + cst. Corollary: let I = supp(D* x_λ(y)) be such that (H_J) holds; then
     df_λ(y) = div(μ_λ)(y) = dim(G_J)   (for D = Id, synthesis: df_λ(y) = ||x_λ(y)||₀),
     gdf_λ(y) = tr(Π A^[J]),
     and these are unbiased estimators of df and gdf. In practice [Ramani et al. '08], use the trick tr(A) = E_z(⟨A z, z⟩), z ∼ N(0, Id_P). Proposition: gdf_λ(y) = E_z(⟨ν(z), Φ⁺ z⟩), where ν(z) solves
     [ Φ*Φ   D_J ] [ ν(z) ]   [ Φ* z ]
     [ D*_J   0  ] [  ν̃   ] = [   0  ]
     so that gdf_λ(y) ≈ (1/K) Σ_{k=1}^K ⟨ν(z_k), Φ⁺ z_k⟩, z_k ∼ N(0, Id_P).
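A sketch of this trace trick in isolation, usable for any operator available only through matrix-vector products (names and the example are illustrative):

```python
import numpy as np

def mc_trace(apply_A, dim, K=100, rng=None):
    """Estimate tr(A) = E_z <A z, z> by (1/K) sum_k <A z_k, z_k>, z_k ~ N(0, Id)."""
    rng = rng or np.random.default_rng(0)
    acc = 0.0
    for _ in range(K):
        z = rng.standard_normal(dim)
        acc += np.dot(apply_A(z), z)
    return acc / K

# example: A = diag(0, 1, ..., 9), so tr(A) = 45
# A = np.diag(np.arange(10.0))
# mc_trace(lambda z: A @ z, 10, K=5000)   # close to 45
```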
  52-54. Group Lasso Variations. ℓ¹-ℓ² regularization: J(x) = Σ_{b ∈ B} ||x_b||, where x_b = (x_i)_{i ∈ b} and {1, ..., N} = ∪_{b ∈ B} b is a partition into blocks. Example: J(x) = (x₁² + x₂²)^{1/2} + |x₃| (the figure shows the corresponding unit ball in ℝ³). Transition set H_λ: semi-algebraic, with zero measure. Theorem: for all y ∉ H_λ,
     ∂μ_λ(y) = Φ_I Γ_I(x_λ(y))^{-1} Φ*_I,   Γ_I(x) = Φ*_I Φ_I + λ δ_x P_x,
     with δ_x = diag(Id_b / ||x_b||)_{b ∈ I} and P_x = Id_I − diag(x̄_b x̄*_b)_{b ∈ I}, where x̄_b = x_b / ||x_b||.
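The associated proximal operator (not shown on the slide, but the standard block soft thresholding used to solve this problem) shrinks each block by max(0, 1 − λ/||x_b||); a sketch:

```python
import numpy as np

def prox_group_l1l2(x, lam, blocks):
    """prox of lam * sum_b ||x_b||: block-wise soft thresholding."""
    z = x.copy()
    for b in blocks:                      # blocks: list of index arrays partitioning {0, ..., N-1}
        nb = np.linalg.norm(x[b])
        z[b] = 0.0 if nb <= lam else (1.0 - lam / nb) * x[b]
    return z

# blocks matching J(x) = sqrt(x1^2 + x2^2) + |x3|:
# blocks = [np.array([0, 1]), np.array([2])]
```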
  55. CT Scan (Tomography). Φ: sub-sampled Radon transform, #projections = 16. Finite-difference gradient: D = [∂₁, ∂₂]. (Figure: x0, Φ⁺y and x⋆_λ(y); quadratic cost as a function of the regularization parameter λ, comparing the risk, the projection risk, GSURE (MC), and exhaustive search over the iterates / the final iterate.)
  56. MRI (Fourier Sampling). Φ x0 = (x̂0(ω))_{ω ∈ Ω}. (Figure: sampling set Ω, x0, Φ⁺y and x⋆_λ(y); quadratic cost as a function of λ, with the same curves as above.)
  57. Overview • Inverse Problems • Stein Unbiased Risk Estimators • Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  58-59. From Theory to Practice. Limitations of the theoretical approach: → a closed form of ∂μ_λ(y) is not available for arbitrary J; → it is unstable: it requires computing x_λ(y) with high accuracy; → what about non-variational methods? Example: synthesis ℓ¹ sparsity, df_λ(y) = ||x_λ(y)||₀. One can have ||x − x_λ(y)|| small but |df_λ(y) − ||x||₀| large: the support is not correctly estimated by iterated thresholdings.
  60-65. Iterative Estimators. Iterative method: x^(ℓ+1)(y) = φ(x^(ℓ)(y), y), with x^(ℓ)(y) → x_λ(y) as ℓ → +∞. Recursive differentiation: ∂x^(ℓ+1)(y) = ∂₁φ(x^(ℓ)(y), y) ∘ ∂x^(ℓ)(y) + ∂₂φ(x^(ℓ)(y), y). Algorithm:
     – For k = 1, ..., K, generate z_k ∼ N(0, Id_P).
     – For ℓ ≤ ℓ₀, compute η_k^(ℓ+1) = ∂₁φ(x^(ℓ), y)[η_k^(ℓ)] + ∂₂φ(x^(ℓ), y)[z_k].
     – Compute gdf^(ℓ₀)(y) = (1/K) Σ_{k=1}^K ⟨η_k^(ℓ₀), Φ⁺ z_k⟩.
     → gdf^(ℓ₀) approximates the gdf of y ↦ x^(ℓ₀)(y). → This yields an unbiased estimate of the risk at the current iteration ℓ₀. Similar approaches: [Ramani, Blu, Unser '08], [Giryes, Elad, Eldar '11].
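A generic sketch of this algorithm; phi, d1_phi and d2_phi are placeholders for the iteration map and the actions of its two partial Jacobians (an ℓ¹ forward-backward instance is sketched after the next slides):

```python
import numpy as np

def iterative_gdf(phi, d1_phi, d2_phi, y, Phi_pinv, x_init, n_iter, K=1, rng=None):
    """Run x <- phi(x, y) while propagating eta_k <- d1_phi(x, y)[eta_k] + d2_phi(x, y)[z_k];
    return the final iterate and gdf = (1/K) sum_k <eta_k, Phi^+ z_k>."""
    rng = rng or np.random.default_rng(0)
    zs = [rng.standard_normal(y.shape) for _ in range(K)]
    etas = [np.zeros_like(x_init) for _ in range(K)]
    x = x_init.copy()
    for _ in range(n_iter):
        # update the directional derivatives before the iterate, both using x^(l)
        etas = [d1_phi(x, y, eta) + d2_phi(x, y, z) for eta, z in zip(etas, zs)]
        x = phi(x, y)
    gdf = np.mean([np.dot(eta, Phi_pinv @ z) for eta, z in zip(etas, zs)])
    return x, gdf
```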
  66-70. Example: Forward-Backward Splitting. Target: x_λ(y) ∈ argmin_x 1/2 ||Φ x − y||² + λ J(x), with J a simple function, i.e. its prox is easy to compute: prox_{λJ}(x) = argmin_z 1/2 ||x − z||² + λ J(z). Example (soft thresholding): prox_{λ||·||₁}(x)_i = max(0, 1 − λ/|x_i|) x_i. Forward-backward iteration: x^(ℓ+1)(y) = φ(x^(ℓ)(y), y) with φ(x, y) = prox_{τλJ}(u), u = x − τ Φ*(Φ x − y). If τ < 2/||Φ||², then x^(ℓ)(y) → x_λ(y) as ℓ → +∞. Derivatives: ∂₁φ(x, y) = ∂prox_{τλJ}(u) ∘ (Id − τ Φ*Φ), ∂₂φ(x, y) = ∂prox_{τλJ}(u) ∘ τ Φ*. → Extension to other schemes (DR, GFB, primal-dual, ...).
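For J = ||·||₁ the prox is soft thresholding and ∂prox_{τλJ}(u) is a diagonal 0/1 mask on the coefficients with |u_i| > τλ, so the two Jacobian actions above can be sketched as follows and plugged into the iterative_gdf sketch given earlier (dense-matrix, illustrative):

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def make_fb_maps(Phi, lam, tau):
    """Forward-backward map phi and the actions of its partial Jacobians, for J = ||.||_1."""
    def u_of(x, y):
        return x - tau * (Phi.T @ (Phi @ x - y))           # gradient step on the data fidelity
    def phi(x, y):
        return soft_threshold(u_of(x, y), tau * lam)        # proximal step
    def d1_phi(x, y, eta):                                  # dprox(u) o (Id - tau Phi*Phi) [eta]
        mask = (np.abs(u_of(x, y)) > tau * lam).astype(float)
        return mask * (eta - tau * (Phi.T @ (Phi @ eta)))
    def d2_phi(x, y, z):                                    # dprox(u) o (tau Phi*) [z]
        mask = (np.abs(u_of(x, y)) > tau * lam).astype(float)
        return mask * (tau * (Phi.T @ z))
    return phi, d1_phi, d2_phi

# tau = 1.8 / np.linalg.norm(Phi, 2) ** 2   # any tau < 2 / ||Phi||^2
# phi, d1, d2 = make_fb_maps(Phi, lam=0.1, tau=tau)
# x_l0, gdf_l0 = iterative_gdf(phi, d1, d2, y, np.linalg.pinv(Phi), np.zeros(Phi.shape[1]), 200, K=4)
```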
  71. Example: Isotropic Total Variation. J(x) = ||∇x||_{1,2}. Deconvolution + sub-sampling → use either GFB (auxiliary variables) or a primal-dual scheme. (Figure: y, x_λ(y) at the optimal λ, the least-squares x_LS(y), and a block-regularized estimate x_B(y) at the optimal block size B; quadratic loss as a function of λ and of the block size B, comparing projected risk, GSURE and true risk, for a random CS matrix and a sub-sampling matrix with P/N = 0.5; panels from C. Deledalle, "Risk estimation for inverse problems", June 30, 2012.)
  72-75. Example: Matrix Completion. Hypothesis: x0 ∈ ℝ^{n×p} has low rank. SVD: x = V diag(ρ) U*. Nuclear norm: ||x||_* = Σ_i |ρ_i|, and prox_{λ||·||_*}(x) = V diag(prox_{λ||·||₁}(ρ)) U*. Computing ∂prox_{λ||·||_*}: see [Deledalle et al., ICML 2012]. Inpainting operator: Φ = diag(ε_i)_i, ε_i ∈ {0, 1}. (Figure: X0 ∈ ℝ^{1000×100}, y masking 25% of the entries, X_λ(y) at the optimal λ; singular-value spectra of the true matrix, the optimal estimate and the least-squares solution; quadratic cost as a function of λ, comparing the predicted risk, SURE, the true risk and the optimal λ.)
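A sketch of this nuclear-norm prox (singular-value soft thresholding), for a dense matrix:

```python
import numpy as np

def soft_threshold(rho, lam):
    return np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0)

def prox_nuclear(x, lam):
    """prox_{lam ||.||_*}(x) = V diag(prox_{lam ||.||_1}(rho)) U*: soft threshold the singular values."""
    U, rho, Vt = np.linalg.svd(x, full_matrices=False)
    return U @ np.diag(soft_threshold(rho, lam)) @ Vt
```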
  76-78. Conclusion. Computing risk estimates ⟺ sensitivity analysis of the estimator. Closed-form expressions: → quadratic and ℓ¹ regularizations, ... Open problems: – Estimate the risk in ker(Φ). – Fast algorithms to optimize a high-dimensional λ. SURE?