Parameter Selection for Sparse Regularization of Inverse Problems


Talk at Cambridge


Gabriel Peyré

March 07, 2013

Transcript

  1. Gabriel Peyré, www.numerical-tours.com. Parameter Selection for Sparse Regularization of
    Inverse Problems. Joint work with: Samuel Vaiter, Charles Dossal, Charles Deledalle, Jalal Fadili.
  2. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  3–4. Inverse Problems. Recovering $x_0 \in \mathbb{R}^N$ from noisy observations
    $Y = \Phi x_0 + W \in \mathbb{R}^P$. Hypothesis: $W \sim \mathcal{N}(0, \sigma^2\,\mathrm{Id}_P)$, with $\sigma$ known.
    Examples: inpainting, super-resolution, compressed sensing.
  5–8. Inverse Problem in Medical Imaging. Computed tomography: $\Phi x_0 = (p_{\theta_k})_{1 \le k \le K}$,
    a set of projections of $x_0$; by the Fourier slice theorem, each projection corresponds to a radial
    slice of the Fourier transform $\hat{x}_0(\omega)$. Magnetic resonance imaging (MRI):
    $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$, samples of the Fourier transform on a set $\Omega$.
    Other examples: MEG, EEG, ...
  9–12. Estimators. Measures: $Y = \Phi x_0 + W$, $W \sim \mathcal{N}(0, \sigma^2\,\mathrm{Id}_P)$.
    Estimator: $x_\lambda(y)$ depends only on the observations $y$ and the parameter $\lambda$.
    Example: variational methods,
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda J(x)$
    (data fidelity + regularity).
    Goal: set up $\lambda$ to minimize $\mathbb{E}_W(\|x_0 - x_\lambda(Y)\|^2)$.
    Method: use risk estimators computed from $y$ alone, either through theoretical approaches
    (closed-form expressions) or numerical approaches (recursive differentiation). A sketch of this
    selection strategy follows below.
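
A minimal sketch (not from the slides) of this selection-by-estimated-risk strategy, assuming a generic, hypothetical `risk_estimate(y, lam)` callable that returns an estimate of the risk computed from y alone:

```python
import numpy as np

def select_lambda(y, risk_estimate, lambdas):
    """Pick the regularization parameter minimizing an estimated risk.

    `risk_estimate(y, lam)` is any estimator of E_W ||x0 - x_lam(Y)||^2
    (or a surrogate such as SURE/GSURE) computed from y alone.
    """
    risks = np.array([risk_estimate(y, lam) for lam in lambdas])
    return lambdas[np.argmin(risks)], risks

# Usage with a dummy risk surrogate (illustration only):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=64)
    dummy = lambda y, lam: (lam - 0.3) ** 2 + 0.01 * np.linalg.norm(y) / y.size
    lam_star, _ = select_lambda(y, dummy, np.linspace(0.01, 1.0, 50))
    print(lam_star)
```
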
  13. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  14–16. Risk Minimization. How to choose the value of the parameter $\lambda$?
    Average risk: $R(\lambda) = \mathbb{E}_W(\|x_\lambda(Y) - x_0\|^2)$, a measure of the expected quality
    of $x_\lambda(y)$ with respect to $x_0$. The (theoretical) optimal parameter
    $\lambda^\star = \operatorname{argmin}_\lambda R(\lambda)$ gives the plug-in estimator $x_{\lambda^\star}(Y)$.
    The risk is unknown since it depends on $x_0$: can we estimate it solely from $x_\lambda(y)$?
    But $\mathbb{E}_W$ is not accessible: use one observation. And $x_0$ is not accessible: this
    requires risk estimators.
  17–20. Prediction Risk Estimation. Prediction: $\mu_\lambda(y) = \Phi x_\lambda(y)$.
    Sensitivity analysis: if $\mu_\lambda$ is weakly differentiable,
    $\mu_\lambda(y + \delta) = \mu_\lambda(y) + \partial\mu_\lambda(y)\cdot\delta + O(\|\delta\|^2)$.
    Stein Unbiased Risk Estimator:
    $\mathrm{SURE}_\lambda(y) = \|y - \mu_\lambda(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}_\lambda(y)$,
    where $\mathrm{df}_\lambda(y) = \operatorname{tr}(\partial\mu_\lambda(y)) = \operatorname{div}(\mu_\lambda)(y)$.
    Theorem [Stein, 1981]: $\mathbb{E}_W(\mathrm{SURE}_\lambda(Y)) = \mathbb{E}_W(\|\Phi x_0 - \mu_\lambda(Y)\|^2)$.
    SURE requires $\sigma$ (not always available) but is unbiased with good practical performance.
    Other estimators: GCV, BIC, AIC, ... A sketch in the simplest case follows below.
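
As a concrete illustration (not in the slides), a sketch of SURE in the simplest setting Φ = Id with soft-thresholding µ_λ(y), for which df_λ(y) = #{i : |y_i| > λ}; the noise level `sigma` is assumed known:

```python
import numpy as np

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_soft_threshold(y, lam, sigma):
    """SURE(y) = ||y - mu(y)||^2 - sigma^2 P + 2 sigma^2 df(y),
    with df(y) = #{i : |y_i| > lam} for soft thresholding (Phi = Id)."""
    mu = soft_threshold(y, lam)
    df = np.count_nonzero(np.abs(y) > lam)
    return np.sum((y - mu) ** 2) - sigma**2 * y.size + 2 * sigma**2 * df

# Sanity check: SURE averaged over noise draws tracks the true prediction risk.
rng = np.random.default_rng(0)
x0 = np.concatenate([np.zeros(90), 3 * np.ones(10)])
sigma, lam = 1.0, 2.0
sure_vals, risks = [], []
for _ in range(200):
    y = x0 + sigma * rng.normal(size=x0.size)
    sure_vals.append(sure_soft_threshold(y, lam, sigma))
    risks.append(np.sum((soft_threshold(y, lam) - x0) ** 2))
print(np.mean(sure_vals), np.mean(risks))  # the two averages should be close
```
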
  21–24. Generalized SURE. Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of
    $\|x_0 - x_\lambda(y)\|$. Risk on $\ker(\Phi)^\perp$: $\mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$, where
    $\Pi = \mathrm{Proj}_{\ker(\Phi)^\perp} = \Phi^*(\Phi\Phi^*)^+\Phi$.
    ML estimator: $\hat{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$. Generalized df:
    $\mathrm{gdf}_\lambda(y) = \operatorname{tr}\big((\Phi\Phi^*)^+\,\partial\mu_\lambda(y)\big)$.
    $\mathrm{GSURE}_\lambda(y) = \|\hat{x}(y) - x_\lambda(y)\|^2 - \sigma^2\operatorname{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{gdf}_\lambda(y)$.
    Theorem: $\mathbb{E}_W(\mathrm{GSURE}_\lambda(Y)) = \mathbb{E}_W(\|\Pi(x_0 - x_\lambda(Y))\|^2)$
    [Eldar 09, Pesquet et al. 09, Vonesh et al. 08]. A sketch of the formula follows below.
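
A sketch of the GSURE formula above for a small dense matrix `Phi`, taking the gdf term as an input (hypothetical `gdf_estimate` argument); the explicit pseudo-inverse is only practical for toy sizes:

```python
import numpy as np

def gsure(y, x_lam, Phi, sigma, gdf_estimate):
    """GSURE(y) = ||x_ML(y) - x_lam(y)||^2 - sigma^2 tr((Phi Phi*)^+) + 2 sigma^2 gdf,
    where x_ML(y) = Phi* (Phi Phi*)^+ y.  Dense pinv: small problems only."""
    PhiPhiT_pinv = np.linalg.pinv(Phi @ Phi.T)
    x_ml = Phi.T @ (PhiPhiT_pinv @ y)
    return (np.sum((x_ml - x_lam) ** 2)
            - sigma**2 * np.trace(PhiPhiT_pinv)
            + 2 * sigma**2 * gdf_estimate)
```
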
  25. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  26–29. Sparse Regularizations. Synthesis regularization:
    $\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi\Psi x\|_2^2 + \lambda\|x\|_1$, with
    $J(x) = \|x\|_1 = \sum_i |x_i|$ (coefficients $x$, image $\Psi x$).
    Analysis regularization:
    $\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$, with
    $J(x) = \|D^* x\|_1 = \sum_i |\langle x, d_i\rangle|$ (image $x$, gradient $D^* x$).
    → Unless $D = \Psi$ is orthogonal, the two produce different results.
    → Analysis is more general when using $\Phi\Psi$ as the operator.
    A sketch of the two objectives follows below.
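
A small sketch making the two objectives explicit, with a synthesis dictionary `Psi` and an analysis operator `D` assumed given as dense matrices (names are illustrative):

```python
import numpy as np

def synthesis_objective(alpha, y, Phi, Psi, lam):
    # (1/2) ||y - Phi Psi alpha||^2 + lam ||alpha||_1   (optimize over coefficients alpha)
    r = y - Phi @ (Psi @ alpha)
    return 0.5 * np.sum(r**2) + lam * np.sum(np.abs(alpha))

def analysis_objective(x, y, Phi, D, lam):
    # (1/2) ||y - Phi x||^2 + lam ||D* x||_1            (optimize over the image x)
    r = y - Phi @ x
    return 0.5 * np.sum(r**2) + lam * np.sum(np.abs(D.T @ x))
```
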
  30–32. Sparse Analysis Local Variations. Estimator:
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$.
    Remark: $x_\lambda(y)$ is not always unique, but $\mu_\lambda(y) = \Phi x_\lambda(y)$ is always unique.
    Theorem: $\mu_\lambda$ is Lipschitz (hence weakly differentiable).
    Questions: → find a closed-form expression for $\partial\mu_\lambda(y)$;
    → prove that this formula is valid almost everywhere.
  33–36. Heuristic Derivation. Variational estimator:
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda J(x)$
    (data fidelity + regularity). If $J$ is smooth, the first-order condition reads
    $\Phi^*(\Phi x_\lambda(y) - y) + \lambda\nabla J(x_\lambda(y)) = 0$.
    Implicit function theorem: $\partial\mu_\lambda(y) = \Phi\,\Gamma(y)^{-1}\Phi^*$, where
    $\Gamma = \Phi^*\Phi + \lambda\,\partial^2 J$.
    Difficulties: → What if $J$ is not differentiable? → What if $x_\lambda(y)$ is not uniquely defined?
    → What if $\Gamma(y)$ is non-invertible?
    ⇒ Answers in the case of analysis sparsity. A worked smooth example follows below.
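
To make the smooth-J formula concrete (an added example, not from the slides): for quadratic regularization J(x) = ||x||²/2 one has ∂²J = Id, so df_λ(y) = tr(Φ(Φ*Φ + λ Id)⁻¹Φ*), independent of y:

```python
import numpy as np

def tikhonov_df(Phi, lam):
    """df = tr(Phi (Phi*Phi + lam Id)^{-1} Phi*) for J(x) = ||x||^2 / 2."""
    N = Phi.shape[1]
    Gamma = Phi.T @ Phi + lam * np.eye(N)
    return np.trace(Phi @ np.linalg.solve(Gamma, Phi.T))

# Check against the SVD formula df = sum_i s_i^2 / (s_i^2 + lam).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 30))
s = np.linalg.svd(Phi, compute_uv=False)
lam = 0.5
print(tikhonov_df(Phi, lam), np.sum(s**2 / (s**2 + lam)))
```
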
  37–40. Polytope of TV in 1-D. Constrained problem: $x^\star(y) \in \operatorname{argmin}_{\Phi x = y} \|D^* x\|_1$,
    with the cyclic finite-difference operator
    $D = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{pmatrix}$.
    TV-1D ball: $B = \{x : \|D^* x\|_1 \le 1\}$, displayed in the hyperplane
    $\{x : \langle x, 1\rangle = 0\} \simeq \mathbb{R}^3$ (figure © Mael). The build shows the ball $B$,
    the affine constraint set $\{x : \Phi x = y\}$, and the solution $x^\star$.
  41–43. Union of Subspaces Model.
    $x_\lambda(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$  ($P_\lambda(y)$).
    Support of the solution: $I = \{i : (D^* x_\lambda(y))_i \ne 0\}$, and $J = I^c$.
    1-D total variation: $D^* x = (x_i - x_{i-1})_i$; $I$ = positions of the discontinuities of $x$.
    Subspace model: $G_J = \ker(D_J^*) = \operatorname{Im}(D_J)^\perp$. For TV: $G_J$ = signals with
    prescribed discontinuities.
  44–48. Local Sign Stability. For ($P_\lambda(y)$):
    $x_\lambda(y) \in \operatorname{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1$.
    Lemma: $\operatorname{sign}(D^* x_\lambda(y))$ is constant around $y \notin \mathcal{H}_\lambda$.
    To be understood: there exists a solution with the same sign.
    (Illustration: $D = \mathrm{Id}_{3\times 3}$, $\Phi \in \mathbb{R}^{2\times 3}$, showing the transition
    sets $\mathcal{H}_{\emptyset,j}$ and $\mathcal{H}_{I,j}$ and the region where $x^\star = 0$.)
    Linearized problem: with $s_I = \operatorname{sign}(D_I^* x_\lambda(y))$,
    $\hat{x}_\lambda(\bar{y}) = \operatorname{argmin}_{x \in G_J} \tfrac{1}{2}\|\Phi x - \bar{y}\|^2 + \lambda\langle D_I^* x, s_I\rangle = A^{[J]}(\Phi^*\bar{y} - \lambda D_I s_I)$.
    Theorem: if $y \notin \mathcal{H}_\lambda$, then for $\bar{y}$ close to $y$, $\hat{x}_\lambda(\bar{y})$ is a solution
    of $P_\lambda(\bar{y})$. (For the synthesis case, a sketch of this local formula follows below.)
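
For the synthesis case D = Id (the Lasso), the linearized problem above reduces to the familiar local affine formula on the support, x_I = (Φ_I*Φ_I)⁻¹(Φ_I* ȳ − λ s_I); a sketch assuming Φ_I has full column rank:

```python
import numpy as np

def lasso_local_solution(y_bar, Phi, I, s_I, lam):
    """Local affine formula for the Lasso (D = Id): on the support I with
    sign vector s_I, x_I = (Phi_I* Phi_I)^{-1} (Phi_I* y_bar - lam * s_I),
    and x = 0 outside I.  Assumes Phi_I has full column rank."""
    Phi_I = Phi[:, I]
    x = np.zeros(Phi.shape[1])
    x[I] = np.linalg.solve(Phi_I.T @ Phi_I, Phi_I.T @ y_bar - lam * s_I)
    return x
```
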
  49–53. Application to GSURE. For $y \notin \mathcal{H}_\lambda$, one has locally:
    $\mu_\lambda(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}$.
    Corollary: let $I = \operatorname{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then
    $\mathrm{df}_\lambda(y) = \operatorname{div}(\mu_\lambda)(y) = \dim(G_J)$; for $D = \mathrm{Id}$ (synthesis),
    $\mathrm{df}_\lambda(y) = \|x_\lambda(y)\|_0$; and $\mathrm{gdf}_\lambda(y) = \operatorname{tr}(\Pi A^{[J]})$.
    These are unbiased estimators of df and gdf.
    In practice [Ramani et al. '08], use the trick $\operatorname{tr}(A) = \mathbb{E}_z(\langle Az, z\rangle)$,
    $z \sim \mathcal{N}(0, \mathrm{Id}_P)$. Proposition:
    $\mathrm{gdf}_\lambda(y) = \mathbb{E}_z(\langle \nu(z), \Phi^+ z\rangle)$, where $\nu(z)$ solves
    $\begin{pmatrix} \Phi^*\Phi & D_J \\ D_J^* & 0 \end{pmatrix}\begin{pmatrix} \nu(z) \\ \tilde{\nu} \end{pmatrix} = \begin{pmatrix} \Phi^* z \\ 0 \end{pmatrix}$,
    so that $\mathrm{gdf}_\lambda(y) \approx \tfrac{1}{K}\sum_{k=1}^K \langle \nu(z_k), \Phi^+ z_k\rangle$,
    $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$. (A sketch of the trace trick follows below.)
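
A sketch of the trace trick tr(A) = E_z⟨Az, z⟩ used above, as a plain Monte Carlo average over Gaussian probes (only a matrix-vector product for A is needed):

```python
import numpy as np

def mc_trace(apply_A, P, K=100, rng=None):
    """Estimate tr(A) via (1/K) sum_k <A z_k, z_k>, z_k ~ N(0, Id_P).
    `apply_A` only needs to implement the matrix-vector product z -> A z."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(K):
        z = rng.normal(size=P)
        total += np.dot(apply_A(z), z)
    return total / K

# Example: compare with the exact trace of a random matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
print(mc_trace(lambda z: A @ z, 50, K=2000, rng=rng), np.trace(A))
```
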
  54–56. Group Lasso Variations. $\ell^1$–$\ell^2$ regularization: partition into blocks
    $\{1, \ldots, N\} = \bigcup_{b \in B} b$, and $J(x) = \sum_{b \in B} \|x_b\|$ with $x_b = (x_i)_{i \in b}$.
    Example: $J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3|$. Transition set $\mathcal{H}_\lambda$: semi-algebraic with
    zero measure. (Figure: the transition set and the support size $\|x^\star\|_0$ for this example.)
    Theorem: for all $y \notin \mathcal{H}_\lambda$,
    $\partial\mu_\lambda(y) = \Phi_I\,\Delta(x_\lambda(y))^{-1}\Phi_I^*$, where
    $\Delta(x) = \Phi_I^*\Phi_I + \lambda\,\delta_x P_x$, with $\delta_x = \operatorname{diag}(\mathrm{Id}_b/\|x_b\|)_{b \in I}$
    and $P_x = \mathrm{Id}_I - \operatorname{diag}(x_b x_b^*/\|x_b\|^2)_{b \in I}$.
    (Its proximal operator, block soft-thresholding, is sketched below.)
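
The proximal operator of this ℓ¹–ℓ² regularizer is block soft-thresholding, x_b ↦ max(0, 1 − λ/||x_b||) x_b; a short sketch with blocks given as index arrays:

```python
import numpy as np

def block_soft_threshold(x, blocks, lam):
    """prox of lam * sum_b ||x_b||_2: shrink each block towards 0."""
    out = np.zeros_like(x)
    for b in blocks:
        nrm = np.linalg.norm(x[b])
        if nrm > lam:
            out[b] = (1.0 - lam / nrm) * x[b]
    return out

# Example matching the slide's J(x) = (x_1^2 + x_2^2)^{1/2} + |x_3| (0-based indices):
x = np.array([3.0, 4.0, -0.5])
print(block_soft_threshold(x, [np.array([0, 1]), np.array([2])], lam=1.0))
```
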
  57. CT Scan (Tomography). $\Phi$: sub-sampled Radon transform, #projections = 16.
    Finite-differences gradient: $D = [\partial_1, \partial_2]$. Images: $x_0$, $\Phi^+ y$, $x^\star_\lambda(y)$.
    (Plot: quadratic cost versus the regularization parameter $\lambda$, comparing the risk, the
    projection risk, GSURE MC, and exhaustive search over the iterates and at convergence.)
  58. MRI (Fourier Sampling). $\Phi x_0 = (\hat{x}_0(\omega))_{\omega \in \Omega}$. Images: $\Omega$, $x_0$,
    $\Phi^+ y$, $x^\star_\lambda(y)$. (Plot: same comparison of quadratic cost versus $\lambda$.)
  59. Overview • Inverse Problems • Stein Unbiased Risk Estimators •

    Theory: SURE for Sparse Regularization • Practice: SURE for Iterative Algorithms
  60–61. From Theory to Practice. Limitations of the theoretical approach:
    → A closed form of $\partial\mu_\lambda(y)$ is not available for arbitrary $J$.
    → Unstable: requires computing $x_\lambda(y)$ with high accuracy. Example, synthesis $\ell^1$
    sparsity: $\mathrm{df}_\lambda(y) = \|x_\lambda(y)\|_0$, and one can have $\|x - x_\lambda(y)\|$ small while
    $|\mathrm{df}_\lambda(y) - \|x\|_0|$ is large, because the support is not correctly estimated by
    iterated thresholdings.
    → What about non-variational methods?
  62–68. Iterative Estimators. Iterative method: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y) \to x_\lambda(y)$
    as $\ell \to +\infty$. Recursive differentiation:
    $\partial x^{(\ell+1)}(y) = \partial_1\varphi(x^{(\ell)}(y), y)\circ\partial x^{(\ell)}(y) + \partial_2\varphi(x^{(\ell)}(y), y)$.
    Algorithm:
    – For $k = 1, \ldots, K$, generate $z_k \sim \mathcal{N}(0, \mathrm{Id}_P)$.
    – For $\ell \le \ell_0$, compute $\eta_k^{(\ell+1)} = \partial_1\varphi(x^{(\ell)}, y)[\eta_k^{(\ell)}] + \partial_2\varphi(x^{(\ell)}, y)[z_k]$.
    – Compute $\mathrm{gdf}^{(\ell_0)}(y) = \tfrac{1}{K}\sum_{k=1}^K \langle \eta_k^{(\ell_0)}, \Phi^+ z_k\rangle$.
    → $\mathrm{gdf}^{(\ell_0)}$ approximates the gdf of $y \mapsto x^{(\ell_0)}(y)$: an unbiased estimate of
    the risk at the current iteration $\ell_0$.
    Similar approaches: [Ramani, Blu, Unser '08], [Giryes, Elad, Eldar '11].
    A sketch of this recursion follows below.
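
A sketch of the algorithm above, assuming the scheme is supplied through hypothetical callables `phi(x, y)` (one iteration), `d1_phi(x, y, eta)` and `d2_phi(x, y, z)` (the two Jacobian-vector products), and `Phi_pinv(z)` applying Φ⁺:

```python
import numpy as np

def iterative_gdf(y, phi, d1_phi, d2_phi, Phi_pinv, n_iter, K=10, rng=None):
    """Run x^{(l+1)} = phi(x^{(l)}, y) together with the differentiated recursion
    eta_k^{(l+1)} = d1_phi(x^{(l)}, y)[eta_k] + d2_phi(x^{(l)}, y)[z_k],
    then return x^{(n_iter)} and gdf ~ (1/K) sum_k <eta_k, Phi^+ z_k>."""
    rng = np.random.default_rng() if rng is None else rng
    zs = [rng.normal(size=y.size) for _ in range(K)]
    x = np.zeros_like(Phi_pinv(y))            # iterate, initialized at 0 in R^N
    etas = [np.zeros_like(x) for _ in range(K)]
    for _ in range(n_iter):
        # update the sensitivities with the current iterate, then the iterate itself
        etas = [d1_phi(x, y, eta) + d2_phi(x, y, z) for eta, z in zip(etas, zs)]
        x = phi(x, y)
    gdf = np.mean([np.dot(eta, Phi_pinv(z)) for eta, z in zip(etas, zs)])
    return x, gdf
```
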
  69–73. Example: Forward-Backward Splitting, for
    $x_\lambda(y) \in \operatorname{argmin}_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda J(x)$.
    Simple function: $\operatorname{prox}_{\lambda J}(x) = \operatorname{argmin}_z \tfrac{1}{2}\|x - z\|^2 + \lambda J(z)$.
    Example: $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_i = \max\!\big(0, 1 - \tfrac{\lambda}{|x_i|}\big)\,x_i$ (soft thresholding).
    Forward-backward: $x^{(\ell+1)}(y) = \varphi(x^{(\ell)}(y), y)$ with $\varphi(x, y) = \operatorname{prox}_{\tau\lambda J}(u)$,
    $u = x - \tau\Phi^*(\Phi x - y)$. If $\tau < 2/\|\Phi\|^2$, then $x^{(\ell)}(y) \to x_\lambda(y)$ as $\ell \to +\infty$.
    Derivatives: $\partial_1\varphi(x, y) = \partial\operatorname{prox}_{\tau\lambda J}(u)\circ(\mathrm{Id} - \tau\Phi^*\Phi)$ and
    $\partial_2\varphi(x, y) = \tau\,\partial\operatorname{prox}_{\tau\lambda J}(u)\circ\Phi^*$.
    → Extension to other schemes (DR, GFB, primal-dual, ...).
    A sketch for $J = \|\cdot\|_1$ follows below.
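
A sketch of the forward-backward map and its two partial derivatives for J = ||·||₁, where ∂prox is a diagonal 0/1 mask on the entries above the threshold (function names are illustrative):

```python
import numpy as np

def fb_phi(x, y, Phi, lam, tau):
    """One forward-backward step: prox_{tau*lam*||.||_1}(x - tau Phi*(Phi x - y))."""
    u = x - tau * Phi.T @ (Phi @ x - y)
    return np.sign(u) * np.maximum(np.abs(u) - tau * lam, 0.0)

def fb_d1(x, y, Phi, lam, tau, eta):
    """d1 phi(x, y)[eta] = dprox(u) . (Id - tau Phi*Phi) eta."""
    u = x - tau * Phi.T @ (Phi @ x - y)
    mask = (np.abs(u) > tau * lam).astype(float)   # Jacobian of soft thresholding
    return mask * (eta - tau * Phi.T @ (Phi @ eta))

def fb_d2(x, y, Phi, lam, tau, z):
    """d2 phi(x, y)[z] = dprox(u) . (tau Phi*) z."""
    u = x - tau * Phi.T @ (Phi @ x - y)
    mask = (np.abs(u) > tau * lam).astype(float)
    return mask * (tau * Phi.T @ z)
```
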
  74. Example: Isotropic Total Variation, $J(x) = \|\nabla x\|_{1,2}$, with $\Phi$ a deconvolution +
    sub-sampling operator. → Use either GFB (with auxiliary variables) or a primal-dual scheme.
    (Figures: $y$, $x_\lambda(y)$ at the optimal $\lambda$, the least-squares solution $x_{LS}(y)$; plots of the
    quadratic loss, projected risk, GSURE and true risk versus $\lambda$; optimization of $\lambda$ and of a
    block size $B$ for a random CS matrix with $P/N = 0.5$.)
  75–78. Example: Matrix Completion. Hypothesis: $x_0 \in \mathbb{R}^{n \times p}$ has low rank.
    SVD: $x = V\operatorname{diag}(\rho)U^*$. Nuclear norm: $\|x\|_* = \sum_i |\rho_i|$, with
    $\operatorname{prox}_{\lambda\|\cdot\|_*}(x) = V\operatorname{diag}(\operatorname{prox}_{\lambda\|\cdot\|_1}(\rho))U^*$.
    Computing $\partial\operatorname{prox}_{\lambda\|\cdot\|_*}$: see [Deledalle et al., ICML 2012].
    Inpainting: $\Phi = \operatorname{diag}(\varepsilon_i)_i$, $\varepsilon_i \in \{0, 1\}$.
    (Figures: $X_0 \in \mathbb{R}^{1000\times 100}$, $y$ with 25% masking, $X_\lambda(y)$ at the optimal $\lambda$;
    singular-value spectra for the true spectrum, optimal estimate and least squares; quadratic cost
    versus $\lambda$ for the predicted risk, SURE and true risk, with the optimal $\lambda$ marked.)
    A sketch of the nuclear-norm prox follows below.
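
A sketch of the nuclear-norm proximal operator stated above, i.e. soft-thresholding of the singular values (numpy returns the SVD as X = U diag(s) V*, which gives the same shrinkage):

```python
import numpy as np

def prox_nuclear(X, lam):
    """prox_{lam ||.||_*}(X): soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt

# Example: shrinking a noisy low-rank matrix recovers a low-rank estimate.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))   # rank 5
X = X0 + 0.1 * rng.normal(size=X0.shape)
print(np.linalg.matrix_rank(prox_nuclear(X, lam=2.0)))
```
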
  79–82. Conclusion. Computing risk estimates ⟺ sensitivity of the estimator.
    Closed-form expressions: quadratic and $\ell^1$ regularization, ...
    Open problems:
    – Estimate the risk in $\ker(\Phi)$.
    – Fast algorithms to optimize a high-dimensional $\lambda$.
    SURE?