

Convergent Plug & Play schemes for Inverse Problems in Imaging

npapadakis

June 10, 2022



Transcript

  1. Convergent Plug & Play schemes for Inverse Problems in Imaging

    Samuel Hurault, Arthur Leclaire, Nicolas Papadakis
  2. Image Enhancement and Restoration [Thiébaut et al. ’16] [Gustafsson

    et al. ’16] [The Mandalorian, Disney ’20] 1/51
  3. Outline 1. Introduction 2. From Proximal algorithms to Plug &

    Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion
  4. Image Inverse Problems Find x from observation y = Ax

    + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Denoising: [images: y = x + ξ] 2/51
  5. Image Inverse Problems Find x from observation y = Ax

    + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Deblurring: [images: y = kernel ∗ x + ξ] 2/51
  6. Image Inverse Problems Find x from observation y = Ax

    + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Super-resolution: [images: y = (kernel ∗ x) ↓s + ξ] 2/51
  7. Image Inverse Problems Find x from observation y = Ax

    + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Inpainting: [images: y = mask ⊗ x] 2/51
  8. Image Inverse Problems Find x from observation y = Ax

    + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Compressed Sensing: e.g. Magnetic Resonance Imaging (MRI) 2/51
  9. Image Inverse Problems Find x from observation y ∼ p(y|x)

    • y ∈ Rm observation • x ∈ Rn unknown input • p(y|x) forward model Computed Tomography: 2/51
  10. Image Inverse Problems with Plug & Play Find x from

    observation y = A(x) + ξ Supervised learning  Provides state-of-the-art results from paired data (xi , yi )  Requires training a different model for each A 3/51
  11. Image Inverse Problems with Plug & Play Find x from

    observation y = A(x) + ξ Supervised learning  Provides state-of-the-art results from paired data (xi , yi )  Requires training a different model for each A Plug & Play (PnP): • Learn an efficient denoiser • Plug the denoiser in an iterative algorithm • Solve general inverse problems, involving any operator A 3/51
  12. Image Inverse Problems with Plug & Play Find x from

    observation y = A(x) + ξ Supervised learning  Provides state-of-the-art results from paired data (xi , yi )  Requires training a different model for each A Plug & Play (PnP): • Learn an efficient denoiser • Plug the denoiser in an iterative algorithm • Solve general inverse problems, involving any operator A  State-of-the-art image restoration  Convergence of the method 3/51
  13. Maximum A-Posteriori Find x from observation y ∼ p(y|x) with

    an a-priori p(x) on the solution arg max x∈Rn p(x|y) = arg max x∈Rn p(y|x)p(x) p(y) = arg min x∈Rn − log p(y|x) − log p(x) 4/51
  14. Maximum A-Posteriori Find x from observation y ∼ p(y|x) with

    an a-priori p(x) on the solution arg max x∈Rn p(x|y) = arg max x∈Rn p(y|x)p(x) p(y) = arg min x∈Rn − log p(y|x) − log p(x) Maximum A-Posteriori x∗ ∈ arg min x∈Rn f (x) + g(x) ⇐⇒ arg min x∈Rn data-fidelity + regularization f (x) = − log p(y|x) g(x) ∝ − log p(x) 4/51
  15. A variety of data-fidelity terms f Observation y = Ax

    + ξ • Assuming Gaussian noise model ξ ∼ N(0, σ2 Id), f (x) = − log p(y|x) = 1 2σ2 ||Ax − y||2 → convex and smooth f , non-strongly convex in general 5/51
  16. A variety of data-fidelity terms f Observation y = Ax

    + ξ • Assuming Gaussian noise model ξ ∼ N(0, σ2 Id), f (x) = − log p(y|x) = 1 2σ2 ||Ax − y||2 → convex and smooth f , non-strongly convex in general • Less regular cases - Noiseless case: f (x) = ı{x | Ax=y} → non-smooth f - Laplace / Poisson noise model → non-smooth f - Phase retrieval → non-convex f • More complex non-linear models of real physical systems (e.g. X-ray computed tomography, electron microscopy, ...) 5/51
  17. A variety of explicit image priors Design an explicit regularization

    on image features: • Total variation [Rudin et al. ‘92] • Fourier spectrum [Ruderman ‘94] • Wavelet sparsity [Mallat ’09] 6/51
  18. A variety of explicit image priors Design an explicit regularization

    on image features: • Total variation [Rudin et al. ‘92] • Fourier spectrum [Ruderman ‘94] • Wavelet sparsity [Mallat ’09] Learn an explicit prior on image patches: • Dictionary learning [Elad et al. ’06, Mairal et al. ’08] • Gaussian mixture models [Yu et al. ’10], [Zoran, Weiss ‘11] 6/51
  19. A variety of explicit image priors Design an explicit regularization

    on image features: • Total variation [Rudin et al. ‘92] • Fourier spectrum [Ruderman ‘94] • Wavelet sparsity [Mallat ’09] Learn an explicit prior on image patches: • Dictionary learning [Elad et al. ’06, Mairal et al. ’08] • Gaussian mixture models [Yu et al. ’10], [Zoran, Weiss ‘11] Learn an explicit deep prior on full images: (generative models) • Variational Auto-encoders [Kingma & Welling, ‘13] • Normalizing flows [Dinh et al. ‘15] • Adversarial [Lunz et al. ‘18, Mukherjee et al. ‘21] • Score-based models [Song. et al ’21] 6/51
  20. PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x)

    • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] 7/51
  21. PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x)

    • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11]  Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] 7/51
  22. PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x)

    • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11]  Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] 7/51
  23. PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x)

    • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11]  Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] → Step towards the manifold of clean images: implicit prior 7/51
  24. PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x)

    • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11]  Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] → Step towards the manifold of clean images: implicit prior • PnP: From y degraded, iterate: 1. Perform a denoising step with Dσ 2. Enforce data-fidelity 7/51
  25. PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x)

    • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11]  Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] → Step towards the manifold of clean images: implicit prior • PnP: From y degraded, iterate: 1. Perform a denoising step with Dσ 2. Enforce data-fidelity  How many iterations? Convergence when plugging Dσ in an optimization algorithm? 7/51
  26. Outline 1. Introduction 2. From Proximal algorithms to Plug &

    Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion
  27. Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) =

    arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) 8/51
  28. Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) =

    arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) data-fidelity 8/51
  29. Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) =

    arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) data-fidelity prior p(x) 8/51
  30. Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) =

    arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) Example: Inpainting 8/51
  31. Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) =

    arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) Example: Inpainting Log-concave prior p Convex problem f + g 8/51
  32. Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) =

    arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) Example: Inpainting Log-concave prior p Non log-concave prior Convex problem f + g Non-convex problem f + g 8/51
  33. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) Explicit Gradient Descent operator for differentiable functions xk+1 = (Id −τ∇h)(xk ) 9/51
  38. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) Explicit Gradient Descent operator for differentiable functions xk+1 = (Id −τ∇h)(xk ) Fixed point: x∗ = (Id −τ∇h)(x∗) ⇔ Minimizer: ∇h(x∗) = 0 9/51
  39. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) Explicit Gradient Descent operator for differentiable functions xk+1 = (Id −τ∇h)(xk ) Fixed point: x∗ = (Id −τ∇h)(x∗) ⇔ Minimizer: ∇h(x∗) = 0 Convergence guarantee if Id −τ∇h is non-expansive ||(Id −τ∇h)(x) − (Id −τ∇h)(y)|| ≤ ||x − y|| Sufficient condition: if h convex with L-Lipschitz gradient and τL < 2 9/51
  40. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) Implicit proximal operator for (non)differentiable (convex) functions Proxτh (z) ∈ arg min x∈Rn 1 2τ ||x − z||2 + h(x) xk+1 = Proxτh (xk ) = (Id +τ∂h)−1(xk ) 9/51
  47. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) Implicit proximal operator for (non)differentiable (convex) functions Proxτh (z) ∈ arg min x∈Rn 1 2τ ||x − z||2 + h(x) xk+1 = Proxτh (xk ) = (Id +τ∂h)−1(xk ) Fixed point: x∗ = Proxτh (x∗) ⇔ Minimizer: 0 ∈ ∂h(x∗) 9/51
  48. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) Implicit proximal operator for (non)differentiable (convex) functions Proxτh (z) ∈ arg min x∈Rn 1 2τ ||x − z||2 + h(x) xk+1 = Proxτh (xk ) = (Id +τ∂h)−1(xk ) Fixed point: x∗ = Proxτh (x∗) ⇔ Minimizer: 0 ∈ ∂h(x∗) For convex functions, Proxτh is non-expansive for all τ > 0 9/51
  49. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) For γ-strongly convex functions ⟨∇h(x) − ∇h(y), x − y⟩ ≥ γ||x − y||2 non-expansiveness → contractiveness 9/51
  50. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) For γ-strongly convex functions ⟨∇h(x) − ∇h(y), x − y⟩ ≥ γ||x − y||2 non-expansiveness → contractiveness • Proximal operator ||Proxτh (x) − Proxτh (y)|| ≤ 1/(1 + τγ) ||x − y|| • Gradient descent operator, for τ < 2/(L + γ) ||(Id −τ∇h)(x) − (Id −τ∇h)(y)|| ≤ (1 − τγ)||x − y|| 9/51
  51. Proximal optimization algorithms Minimization problem for a proper function h

    : Rn → R x∗ ∈ arg min x∈Rn h(x) To sum up Convexity of function h ⇒ non-expansiveness of gradient/proximal operators ⇒ Convergence of gradient iterations to minimizers of h 9/51
  52. Gradient-based optimization algorithms Minimization problem with explicit prior x∗ ∈

    arg min x∈Rn f (x) + g(x) Proximal algorithms • Proximal Gradient Descent (PGD) xk+1 = TPGD (xk ) = Proxτg ◦(Id −τ∇f )(xk ) • Douglas-Rachford Splitting (DRS) xk+1 = TDRS (xk ) = (2 Proxτf − Id) ◦ (2 Proxτg − Id)(xk ) • Half-Quadratic Splitting (HQS) xk+1 = THQS (xk ) = Proxτf ◦ Proxτg (xk ) 10/51
  53. Gradient-based optimization algorithms Minimization problem with explicit prior x∗ ∈

    arg min x∈Rn f (x) + g(x) Proximal algorithms • Proximal Gradient Descent (PGD) xk+1 = TPGD (xk ) = Proxτg ◦(Id −τ∇f )(xk ) • Douglas-Rachford Splitting (DRS) xk+1 = TDRS (xk ) = (2 Proxτf − Id) ◦ (2 Proxτg − Id)(xk ) • Half-Quadratic Splitting (HQS) xk+1 = THQS (xk ) = Proxτf ◦ Proxτg (xk ) Fixed points of these non-expansive operators ⇔ Minimizers of f + g 10/51
  54. Gradient-based optimization algorithms Minimization problem with explicit prior x∗ ∈

    arg min x∈Rn f (x) + g(x) Proximal algorithms • Proximal Gradient Descent (PGD) xk+1 = TPGD (xk ) = Proxτg ◦(Id −τ∇f )(xk ) • Douglas-Rachford Splitting (DRS) xk+1 = TDRS (xk ) = (2 Proxτf − Id) ◦ (2 Proxτg − Id)(xk ) • Half-Quadratic Splitting (HQS) xk+1 = THQS (xk ) = Proxτf ◦ Proxτg (xk ) Fixed points of these non-expansive operators ⇔ Minimizers of f + g How to use (deep) implicit learned priors? 10/51
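    A minimal numerical sketch of the three splittings above, assuming a toy explicit prior: f (x) = 1/2 ||Ax − y||2 (smooth, closed-form prox) and g(x) = γ||x||1 (soft-thresholding prox). The matrices, step size and iteration counts below are illustrative, not taken from the deck.

    import numpy as np

    # Toy problem: f(x) = 0.5*||Ax - y||^2 (smooth), g(x) = gamma*||x||_1 (explicit prior)
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    y = rng.standard_normal(20)
    gamma = 0.1
    tau = 1.0 / np.linalg.norm(A, 2) ** 2          # tau * Lip(grad f) <= 1

    grad_f = lambda x: A.T @ (A @ x - y)
    prox_f = lambda z: np.linalg.solve(tau * A.T @ A + np.eye(50), tau * A.T @ y + z)
    prox_g = lambda z: np.sign(z) * np.maximum(np.abs(z) - tau * gamma, 0.0)  # soft-thresholding

    x_pgd = np.zeros(50)
    for _ in range(500):                           # PGD: Prox_{tau g} o (Id - tau grad f)
        x_pgd = prox_g(x_pgd - tau * grad_f(x_pgd))

    z = np.zeros(50)
    for _ in range(500):                           # DRS, with the 1/2 averaging used later in the deck
        u = 2.0 * prox_g(z) - z
        z = 0.5 * z + 0.5 * (2.0 * prox_f(u) - u)
    x_drs = prox_g(z)                              # estimate read off through Prox_{tau g}

    x_hqs = np.zeros(50)
    for _ in range(500):                           # HQS: Prox_{tau f} o Prox_{tau g}
        x_hqs = prox_f(prox_g(x_hqs))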
  55. Plug & Play Image Restoration Denoising MAP solution y =

    x̂ + ξ with ξ ∼ N(0, σ2 Idn ): x∗ MAP (y) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y) A denoiser is related to an implicit prior p 11/51
  56. Plug & Play Image Restoration Denoising MAP solution y =

    x̂ + ξ with ξ ∼ N(0, σ2 Idn ): x∗ MAP (y) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y) A denoiser is related to an implicit prior p Plug & Play [Venkatakrishnan et al. ’13, Romano et al. ’17]: Replace the gradient operator of the regularization by an external denoiser Dσ : Rn → Rn learnt for noise level σ Dσ (y) ≈ x∗ MAP (y) = Prox−σ2 log p (y) 11/51
  57. Plug & Play Image Restoration Denoising MAP solution y =

    x̂ + ξ with ξ ∼ N(0, σ2 Idn ): x∗ MAP (y) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y) A denoiser is related to an implicit prior p Plug & Play [Venkatakrishnan et al. ’13, Romano et al. ’17]: Replace the gradient operator of the regularization by an external denoiser Dσ : Rn → Rn learnt for noise level σ Dσ (y) ≈ x∗ MAP (y) = Prox−σ2 log p (y) Example Dσ : DRUNet [Zhang et al. ’21] Remark: State-of-the-art denoisers are not non-expansive 11/51
  58. Plug & Play algorithms Proximal algorithms • PGD: xk+1 =

    Proxτg ◦ (Id −τ∇f )(xk ) • DRS: xk+1 = (2 Proxτf − Id) ◦ (2Proxτg − Id)(xk ) • HQS: xk+1 = Proxτf ◦Proxτg (xk ) Plug & Play algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 12/51
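    Schematically, the PnP iterations above only differ from the proximal algorithms by the substitution Proxτg ← Dσ. A minimal sketch, assuming a pretrained denoiser callable denoiser(x, sigma), a gradient grad_f and a proximal map prox_f(z, tau) for the data term (all placeholder names):

    import torch

    def pnp_pgd(x0, grad_f, denoiser, sigma, tau, n_iter=100):
        # PnP-PGD: x_{k+1} = D_sigma((Id - tau*grad f)(x_k))
        x = x0.clone()
        for _ in range(n_iter):
            x = denoiser(x - tau * grad_f(x), sigma)
        return x

    def pnp_hqs(x0, prox_f, denoiser, sigma, tau, n_iter=100):
        # PnP-HQS: x_{k+1} = Prox_{tau f}(D_sigma(x_k))
        x = x0.clone()
        for _ in range(n_iter):
            x = prox_f(denoiser(x, sigma), tau)
        return x

    def pnp_drs(x0, prox_f, denoiser, sigma, tau, n_iter=100):
        # PnP-DRS: x_{k+1} = (2 Prox_{tau f} - Id)((2 D_sigma - Id)(x_k))
        x = x0.clone()
        for _ in range(n_iter):
            u = 2.0 * denoiser(x, sigma) - x
            x = 2.0 * prox_f(u, tau) - u
        return x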
  59. Plug & Play Image Restoration Pros and cons  High

    image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21] 13/51
  60. Plug & Play Image Restoration Pros and cons  High

    image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21]  Implicit prior → no tractable minimization problem minx f (x)+g(x) 13/51
  61. Plug & Play Image Restoration Pros and cons  High

    image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21]  Implicit prior → no tractable minimization problem minx f (x)+g(x)  Efficient denoisers are not non-expansive → Convergence guarantees 13/51
  62. Plug & Play Image Restoration Pros and cons  High

    image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21]  Implicit prior → no tractable minimization problem minx f (x)+g(x)  Efficient denoisers are not non-expansive → Convergence guarantees  Relevant priors are not log-concave → Non-convex problems 13/51
  63. Convergence of PnP methods Plug & Play algorithms • PnP-PGD:

    xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) Convergence results  Pseudo-linear denoisers Dσ (x) = Kσ (x)x [Sreehari et al. ’16, Romano et al. ’17, Chan et al. ’19, Gavaskar and Chaudhury ’20, Nair et al. ’21]  Deep denoisers 14/51
  64. Related works for deep denoisers Fixed point strategies • Denoisers

    assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y||  non-expansiveness can degrade denoising performances 15/51
  65. Related works for deep denoisers Fixed point strategies • Denoisers

    assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y||  non-expansiveness can degrade denoising performances • non-expansive residual Id −Dσ , L > 1 Lipschitz denoiser [Ryu et al. ’19]  Can only handle strongly convex data terms f xk+1 = Dσ ◦ (Id −τ∇f )(xk ) 15/51
  66. Related works for deep denoisers Fixed point strategies • Denoisers

    assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y||  non-expansiveness can degrade denoising performances • non-expansive residual Id −Dσ , L > 1 Lipschitz denoiser [Ryu et al. ’19]  Can only handle strongly convex data terms f xk+1 = Dσ ◦ (Id −τ∇f )(xk )  Implicit prior: no tractable minimization problem No explicit expression of g such that x∗ ∈ arg minx f (x) + g(x) 15/51
  67. Related works for deep denoisers Fixed point strategies • Denoisers

    assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y||  non-expansiveness can degrade denoising performances • non-expansive residual Id −Dσ , L > 1 Lipschitz denoiser [Ryu et al. ’19]  Can only handle strongly convex data terms f xk+1 = Dσ ◦ (Id −τ∇f )(xk )  Implicit prior: no tractable minimization problem No explicit expression of g such that x∗ ∈ arg minx f (x) + g(x) Our objective:  →  15/51
  68. Overview Find x from observation y = Ax + ξ

    Maximum A-Posteriori arg min x∈Rn f (x) + g(x)  unknown prior p 16/51
  69. Overview Find x from observation y = Ax + ξ

    Maximum A-Posteriori arg min x∈Rn f (x) + g(x)  unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk )  implicit prior  no minimization problem  no convergence guarantees  SOTA restoration 16/51
  70. Outline 1. Introduction 2. From Proximal algorithms to Plug &

    Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion
  71. Gradient step denoiser and PnP Idea [Hurault et al ‘21]:

    • Define the denoiser as a gradient descent step over a differentiable (non-convex) scalar function gσ : Rn → R: Dσ = Id −∇gσ • Proposed PnP algorithm, for τ > 0 xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk ) 17/51
  72. Gradient step denoiser and PnP Idea [Hurault et al ‘21]:

    • Define the denoiser as a gradient descent step over a differentiable (non-convex) scalar function gσ : Rn → R: Dσ = Id −∇gσ • Proposed PnP algorithm, for τ > 0 xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk ) Plugging the Gradient step denoiser Dσ xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) ⇔ Proximal Gradient Descent on the (non-convex) problem min x f (x) + gσ (x) 17/51
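    A minimal PyTorch sketch of this construction, assuming a differentiable scalar potential g_sigma(x) (placeholder callable) whose gradient is obtained by automatic differentiation, and a data-term proximal map prox_f(z, tau):

    import torch

    def gradient_step_denoiser(x, g_sigma, create_graph=False):
        # D_sigma(x) = x - grad g_sigma(x), for a differentiable scalar potential g_sigma
        x = x.detach().requires_grad_(True)
        (grad_g,) = torch.autograd.grad(g_sigma(x), x, create_graph=create_graph)
        return x - grad_g

    def gs_pnp_step(x, prox_f, g_sigma, tau):
        # x_{k+1} = Prox_{tau f}((tau*D_sigma + (1 - tau)*Id)(x_k))
        d = gradient_step_denoiser(x, g_sigma)
        return prox_f(tau * d + (1.0 - tau) * x, tau)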
  73. Overview Find x from observation y = Ax + ξ

    Maximum A-Posteriori arg min x∈Rn f (x) + g(x)  unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk )  implicit prior  no minimization problem  no convergence guarantees  SOTA restoration 18/51
  74. Overview Find x from observation y = Ax + ξ

    Maximum A-Posteriori arg min x∈Rn f (x) + g(x)  unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk )  implicit prior  no minimization problem  no convergence guarantees  SOTA restoration GS-PnP x∗ ∈ arg min x∈Rn f (x) + gσ (x)  explicit prior  minimization problem ? convergence guarantees ? SOTA restoration 18/51
  75. Convergence of non-convex PGD Minimization problem for a proper function

    f min x f (x) + gσ (x) PnP algorithm xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) 19/51
  76. Convergence of non-convex PGD Minimization problem for a proper function

    f min x f (x) + gσ (x) PnP algorithm xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) Function value and residual convergence [Beck and Teboulle ’09] for τL < 1 • gσ differentiable with L-Lipschitz gradient • gσ bounded from below • f bounded from below, may be non-convex 19/51
  77. Convergence of non-convex PGD Minimization problem for a proper function

    f min x f (x) + gσ (x) PnP algorithm xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) Function value and residual convergence [Beck and Teboulle ’09] for τL < 1 • gσ differentiable with L-Lipschitz gradient • gσ bounded from below • f bounded from below, may be non-convex Convergence to stationary points [Attouch, Bolte and Svaiter ’13] • f and gσ satisfy the Kurdyka-Łojasiewicz (KL) property • The iterates xk are bounded 19/51
  78. Design of the potential gσ Gradient Step denoiser Dσ (x)

    = x − ∇gσ (x) Parameterization of gσ to satisfy the required assumptions 20/51
  79. Design of the potential gσ Gradient Step denoiser Dσ (x)

    = x − ∇gσ (x) Parameterization of gσ to satisfy the required assumptions • Convex scalar neural network gσ : Rn → R+ [Cohen et al. ’21]  Limited denoising performances 20/51
  80. Design of the potential gσ Gradient Step denoiser Dσ (x)

    = x − ∇gσ (x) Parameterization of gσ to satisfy the required assumptions • Convex scalar neural network gσ : Rn → R+ [Cohen et al. ’21]  Limited denoising performances • Proposed non-convex potential gσ (x) = 1 2 ||x − Nσ (x)||2 with a neural network Nσ : Rn → Rn Dσ (x) = x − ∇gσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x))  Any efficient denoising architecture Nσ (x) can be reused 20/51
  81. Design of the potential gσ Parameterization of gσ gσ (x)

    = 1 2 ||x − Nσ (x)||2 Nσ : Rn → Rn, neural network with C2 and Lipschitz gradient activations 21/51
  82. Design of the potential gσ Parameterization of gσ gσ (x)

    = 1 2 ||x − Nσ (x)||2 Nσ : Rn → Rn, neural network with C2 and Lipschitz gradient activations Check the required assumptions  gσ is bounded from below, C1 and with L > 1 Lipschitz gradient (L = maxx ||∇2gσ (x)||S ) 21/51
  83. Design of the potential gσ Parameterization of gσ gσ (x)

    = 1 2 ||x − Nσ (x)||2 Nσ : Rn → Rn, neural network with C2 and Lipschitz gradient activations Check the required assumptions  gσ is bounded from below, C1 and with L > 1 Lipschitz gradient (L = maxx ||∇2gσ (x)||S )  Coercivity to bound iterates xk [Laumont et al. ‘21] ĝσ (x) = gσ (x) + 1 2 ||x − ProjC (x)||2 for a large convex compact set C (never activated in practice) 21/51
  84. Interpretation Gradient Step denoiser Dσ (x) = x − ∇

    1 2 ||x − Nσ (x)||2 = Nσ (x) + JNσ (x)T (x − Nσ (x)) • Correct a denoiser Nσ (x) to make it a conservative vector field RED [Romano et al. ’17] • Learn the refitting of the denoiser Nσ (x) CLEAR [Deledalle et al.’17] 22/51
  85. GS-DRUNet Gradient Step denoiser Dσ (x) = Nσ (x) +

    JNσ (x)T (x − Nσ (x)) Architecture for Nσ : light version of DRUNet [Zhang et al. ’21] Training with L2 loss, for σ ∈ [0, 50/255]: L(σ) = Ex∼p,ξσ∼N(0,σ2) ||Dσ (x + ξσ ) − x||2 Training set: Berkeley, Waterloo, DIV2K and Flickr2K 23/51
  86. GS-DRUNet PSNR on the CBSD68 dataset

    σ(./255)       5      15     25     50
    FFDNet         39.95  33.53  30.84  27.54
    DnCNN          39.80  33.55  30.87  27.52
    DRUNet         40.31  33.97  31.32  28.08
    GS-DRUNet      40.26  33.90  31.26  28.01
    DRUNet light   40.19  33.89  31.25  28.00
    24/51
  87. Overview Find x from observation y = Ax + ξ

    Maximum A-Posteriori arg min x∈Rn f (x) + g(x)  unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk )  implicit prior  no minimization problem  no convergence guarantees  SOTA restoration GS-PnP x∗ ∈ arg min x∈Rn f (x) + gσ (x)  explicit prior  minimization problem  convergence guarantees ? SOTA restoration 25/51
  88. General image inverse problems Objective function for any degradation operator

    A and noise level ν f (x) + gσ (x) = λ 2 ||Ax − y||2 + 1 2 ||Nσ (x) − x||2 GS-PnP algorithm xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk ) • Deep denoiser Dσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) • Proximal operator Proxτf (z) = (τλAT A + Id)−1 (τλAT y + z) (sketched below for deblurring) • Backtracking procedure for τ [Hu et al. ‘22] - Rough empirical estimation of the Lipschitz constant L - Not getting stuck at the first local minimum • Automatic tuning via Reinforcement Learning [Wei et al. ‘20] 26/51
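    A minimal sketch of the GS-PnP loop above for deblurring, assuming the blur A is a circular convolution so that the closed-form proximal map diagonalizes in Fourier; kernel_fft, lam, tau and the denoiser callable are placeholders, and the backtracking on τ is omitted:

    import torch
    import torch.fft as fft

    def prox_f_deblur(z, y, kernel_fft, lam, tau):
        # Prox_{tau f}(z) = (tau*lam*A^T A + Id)^{-1}(tau*lam*A^T y + z), evaluated in Fourier
        num = tau * lam * torch.conj(kernel_fft) * fft.fft2(y) + fft.fft2(z)
        den = tau * lam * torch.abs(kernel_fft) ** 2 + 1.0
        return fft.ifft2(num / den).real

    def gs_pnp_deblur(y, kernel_fft, denoiser, sigma, lam, tau=1.0, n_iter=100):
        # x_{k+1} = Prox_{tau f}((tau*D_sigma + (1 - tau)*Id)(x_k))
        x = y.clone()
        for _ in range(n_iter):
            z = tau * denoiser(x, sigma) + (1.0 - tau) * x
            x = prox_f_deblur(z, y, kernel_fft, lam, tau)
        return x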
  89. Deblurring: quantitative results

    PSNR per blur kernel (a)-(j) and on average:
    ν     Method   (a)    (b)    (c)    (d)    (e)    (f)    (g)    (h)    (i)    (j)    Avg
    0.01  EPLL     28.32  28.24  28.36  25.80  29.61  27.15  26.90  26.69  25.84  26.49  27.34
          RED      30.47  30.01  30.29  28.09  31.22  28.92  28.90  28.67  26.66  28.45  29.17
          IRCNN    32.96  32.62  32.53  32.44  33.51  33.62  32.54  32.20  28.11  29.19  31.97
          MMO      32.35  32.06  32.24  31.67  31.77  33.17  32.30  31.80  27.81  29.26  31.44
          DPIR     33.76  33.30  33.04  33.09  34.10  34.34  33.06  32.77  28.34  29.16  32.50
          GS-PnP   33.52  33.07  32.91  32.83  34.07  34.25  32.96  32.54  28.11  29.03  32.33
    0.03  EPLL     25.31  25.12  25.82  23.75  26.99  25.23  25.00  24.59  24.34  25.43  25.16
          RED      25.71  25.32  25.71  24.38  26.65  25.50  25.27  24.99  23.51  25.54  25.26
          IRCNN    28.96  28.65  28.90  28.38  30.03  29.87  28.92  28.52  25.92  27.64  28.58
          DPIR     29.38  29.06  29.21  28.77  30.22  30.23  29.34  28.90  26.19  27.81  28.91
          GS-PnP   29.22  28.89  29.20  28.60  30.32  30.21  29.32  28.92  26.38  27.89  28.90
    0.05  EPLL     24.08  23.91  24.78  22.57  25.68  23.98  23.70  23.19  23.75  24.78  24.04
          RED      22.78  22.54  23.13  21.92  23.78  22.97  22.89  22.67  22.01  23.78  22.84
          IRCNN    27.00  26.74  27.25  26.37  28.29  28.06  27.22  26.81  24.85  26.83  26.94
          DPIR     27.52  27.35  27.73  27.02  28.63  28.46  27.79  27.30  25.25  27.11  27.42
          GS-PnP   27.45  27.28  27.70  26.98  28.68  28.44  27.81  27.38  25.49  27.15  27.44
    28/51
  90. Deblurring: qualitative inspection Motion kernel (b) and ν = 0.03:

    [Images: Clean, Observed (20.97dB), RED (25.92dB), IRCNN (28.66dB), DPIR (29.76dB), GS-PnP (29.90dB); plots of (f + gσ)(xk ) and ||xk+1 − xk ||] 29/51
  91. Deblurring: convergence inspection DPIR: 8 iterations with timestep σ decreasing

    uniformly in log-scale from 50 to ν and τ ∝ σ2 Asymptotic behavior of DPIR: (i) Decreasing timestep along 1000 iterations (ii) Decreasing timestep on the 8 first iterations and constant for the next 992 ones 30/51
  93. Image inverse problems Super-resolution: A = SH ; efficient closed-form

    solution for Proxτf [Zhao et al. ’16] [images: y = (kernel ∗ x) ↓s + ξ] 31/51
  94. Super-resolution: quantitative results

    PSNR for two SR kernels (shown as images on the slide), scales s = 2 and s = 3:
                Method    s = 2                          s = 3                          Avg
                          ν = 0.01  ν = 0.03  ν = 0.05   ν = 0.01  ν = 0.03  ν = 0.05
    Kernel 1    Bicubic   24.85     23.96     22.79      23.14     22.52     21.62      23.15
                RED       28.29     24.65     22.98      26.13     24.02     22.37      24.74
                IRCNN     27.43     26.22     25.86      26.12     25.11     24.79      25.92
                DPIR      28.62     27.30     26.47      26.88     25.96     25.22      26.74
                GS-PnP    28.77     27.54     26.63      26.85     26.05     25.29      26.86
    Kernel 2    Bicubic   23.38     22.71     21.78      22.65     22.08     21.25      22.31
                RED       26.33     23.91     22.45      25.38     23.40     21.91      23.90
                IRCNN     25.83     24.89     24.59      25.36     24.36     23.95      24.83
                DPIR      26.84     25.59     24.89      26.24     24.98     24.32      25.48
                GS-PnP    26.80     25.73     25.03      26.18     25.08     24.31      25.52
    32/51
  95. Super-resolution: qualitative inspection Isotropic kernel, s = 2 and ν

    = 0.03: [Images: Clean, Observed (20.97dB), RED (21.75dB), IRCNN (22.82dB), DPIR (23.97dB), GS-PnP (24.81dB); plots of (f + gσ)(xk ) and ||xk+1 − xk ||] 33/51
  96. Image inverse problems Inpainting (ν = 0): A diagonal, with

    values in {0, 1}, f (x) = ı{x | Ax=y} Proxτf (x) = Π{x | Ax=y} (x) = Ay − Ax + x (sketched below) [images: y = mask ⊗ x] 34/51
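    A one-line sketch of this projection, assuming a binary mask tensor equal to 1 on observed pixels (so that Ax = mask * x):

    import torch

    def prox_f_inpainting(x, y, mask):
        # Projection onto {x | Ax = y}: Ay - Ax + x = mask*y + (1 - mask)*x
        return mask * y + (1.0 - mask) * x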
  97. Overview Find x from observation y = Ax + ξ

    Maximum A-Posteriori arg min x∈Rn f (x) + g(x)  unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk )  implicit prior  no minimization problem  no convergence guarantees  SOTA restoration GS-PnP x∗ ∈ arg min x∈Rn f (x) + gσ (x)  explicit prior  minimization problem  convergence guarantees  SOTA restoration 36/51
  98. Convergence of PnP algorithms PnP algorithms • PnP-PGD: xk+1 =

    Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 3. Gradient step denoiser: PnP-HQS  37/51
  99. Convergence of PnP algorithms PnP algorithms • PnP-PGD: xk+1 =

    Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 3. Gradient step denoiser: PnP-HQS  Equivalent to PGD: xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) 37/51
  100. Convergence of PnP algorithms PnP algorithms • PnP-PGD: xk+1 =

    Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 3. Gradient step denoiser: PnP-HQS  Equivalent to PGD: xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) 4. What about other PnP algorithms? 37/51
  101. Outline 1. Introduction 2. From Proximal algorithms to Plug &

    Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion
  102. Proximal Gradient Step denoiser PnP algorithms • PnP-PGD: xk+1 =

    Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) PGD/DRS rely on implicit gradient steps Dσ = Proxφσ 38/51
  103. Proximal Gradient Step denoiser PnP algorithms • PnP-PGD: xk+1 =

    Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) PGD/DRS rely on implicit gradient steps Dσ = Proxφσ of proximal operators of convex functions φσ  Firmly non-expansive operator: limited denoising performances 38/51
  104. Proximal Gradient Step denoiser PnP algorithms • PnP-PGD: xk+1 =

    Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) PGD/DRS rely on implicit gradient steps Dσ = Proxφσ of proximal operators of convex functions φσ  Firmly non-expansive operator: limited denoising performances Idea [Hurault et al. ‘22] • Build on top of the explicit gradient step denoiser Dσ = Id −∇gσ • Make Id −∇gσ the proximal operator of a non-convex potential φσ 38/51
  105. Characterisation of proximal operators Gradient step denoiser Dσ = Id

    −∇gσ = ∇(1 2 ||x||2 − gσ (x)) =: ∇hσ Proposition [Moreau ’65, Gribonval and Nikolova ’20] If hσ is convex then there exists some function φσ such that Dσ (x) ∈ Proxφσ (x) 39/51
  106. Characterisation of proximal operators Gradient step denoiser Dσ = Id

    −∇gσ = ∇(1 2 ||x||2 − gσ (x)) =: ∇hσ Theorem [Gribonval ’11, Gribonval and Nikolova ’20] If gσ is Ck+1, k ≥ 1 and if ∇gσ is contractive (L < 1 Lipschitz), then • hσ is (1 − L) strongly convex • Dσ = Proxφσ is injective with φσ (x) := gσ (Dσ −1(x)) − 1 2 ||Dσ −1(x) − x||2 + Kσ if x ∈ Im(Dσ ) • φσ (x) ≥ gσ (x) • φσ is L/(L + 1) semi-convex (φσ (x) + L/(2(L+1)) ||x||2 is convex) • φσ is Ck and with L/(1 − L) Lipschitz gradient on Im(Dσ ) 40/51
  107. Proximal Gradient Step denoiser Proposed Proximal gradient step denoiser •

    Gradient step expression Dσ = Id −∇gσ • Contractive residual Id −Dσ = ∇gσ , similar to [Ryu et al ’19]  The denoiser is not non-expansive  Residual condition hard to satisfy exactly 41/51
  108. Proximal Gradient Step denoiser Proposed Proximal gradient step denoiser •

    Gradient step expression Dσ = Id −∇gσ • Contractive residual Id −Dσ = ∇gσ , similar to [Ryu et al ’19]  The denoiser is not non-expansive  Residual condition hard to satisfy exactly If gσ is C2: Id −Dσ contractive ⇔ ||JId −Dσ ||S < 1  Penalization of the spectral norm of the Jacobian [Gulrajani et al ’17, Terris et al. ’20] 41/51
  109. Prox-DRUNet Architecture for Nσ : light version of DRUNet [Zhang

    et al. ’21] Training, for σ ∈ [0, 25], with L2 loss and spectral norm penalization (sketched below): LS (σ) = Ex∼p,ξσ∼N(0,σ2) ||Dσ (x + ξσ ) − x||2 + µ max(||J(Id −Dσ) (x + ξσ )||S , 1 − ε) PSNR on the CBSD68 dataset:
    σ(./255)                    5      15     25
    FFDNet                      39.95  33.53  30.84
    DnCNN                       39.80  33.55  30.87
    DRUNet                      40.31  33.97  31.32
    GS-DRUNet                   40.27  33.92  31.28
    prox-DRUNet (µ = 10−3)      40.12  33.60  30.82
    prox-DRUNet (µ = 10−2)      40.04  33.51  30.64
    nonexp-DRUNet (µ = 10−3)    39.71  33.50  30.89
    nonexp-DRUNet (µ = 10−2)    34.92  31.42  29.42
    42/51
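    A sketch of how the spectral-norm term can be estimated during training, assuming access to the residual Jacobian only through autograd vector-Jacobian products; for the gradient step denoiser this Jacobian is the Hessian of gσ, hence symmetric, so plain power iteration applies. Callable names, the iteration count and the commented loss are placeholders:

    import torch

    def residual_spectral_norm(denoiser, x, sigma, n_power_iters=15, create_graph=False):
        # Power-iteration estimate of ||J_{Id - D_sigma}(x)||_S via vector-Jacobian products
        x = x.detach().requires_grad_(True)
        r = x - denoiser(x, sigma)                 # residual (Id - D_sigma)(x), kept in the graph
        u = torch.randn_like(x)
        u = u / u.norm()
        s = torch.zeros((), device=x.device)
        for _ in range(n_power_iters):
            (v,) = torch.autograd.grad(r, x, grad_outputs=u,
                                       retain_graph=True, create_graph=create_graph)
            s = v.norm()
            u = v / (s + 1e-12)
        return s

    # Sketch of the penalized training loss of the slide (mu, eps, noise are placeholders):
    # loss = ((denoiser(x + noise, sigma) - x) ** 2).mean() \
    #        + mu * torch.clamp(residual_spectral_norm(denoiser, x + noise, sigma,
    #                                                  create_graph=True), min=1.0 - eps)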
  110. Prox-DRUNet Posterior approximation of the Lipschitz constant with power iterations

    • GS-DRUNet: L = maxx ||JId −Dσ(x) ||S • Prox-DRUNet: L = maxx ||JId −Dσ(x) ||S • nonexp-DRUNet: L = maxx ||JNσ(x) ||S Empirical value L on the CBSD68 dataset:
    σ(./255)                    0      5      10     15     20     25
    GS-DRUNet (µ = 0)           0.94   1.26   2.47   1.96   2.50   3.27
    Prox-DRUNet (µ = 10−3)      0.86   0.94   0.97   0.98   0.99   1.19
    Prox-DRUNet (µ = 10−2)      0.87   0.92   0.95   0.99   0.96   0.96
    nonexp-DRUNet (µ = 10−3)    1.06   1.07   1.07   1.10   1.13   1.20
    nonexp-DRUNet (µ = 10−2)    0.97   0.98   0.98   0.98   0.98   0.98
    43/51
  114. PnP algorithms with proximal gradient step denoiser Proximal gradient step

    denoiser Dσ = Id −∇gσ = Proxφσ with φσ (x) := gσ (Dσ −1(x)) − 1 2 ||Dσ −1(x) − x||2 if x ∈ Im(Dσ ), +∞ otherwise Minimization objective min x f (x) + φσ (x) Proximal algorithms • PGD: xk+1 = Proxτφσ ◦ (Id −τ∇f )(xk ) • DRS: xk+1 = (2 Proxτf − Id) ◦ (2 Proxτφσ − Id)(xk ) 44/51
  115. PnP algorithms with proximal gradient step denoiser Proximal gradient step

    denoiser Dσ = Id −∇gσ = Proxφσ with φσ (x) := gσ (Dσ −1(x)) − 1 2 ||Dσ −1(x) − x||2 if x ∈ Im(Dσ ), +∞ otherwise Minimization objective min x f (x) + φσ (x) Proximal algorithms • PGD: xk+1 = Proxτφσ ◦ (Id −τ∇f )(xk ) • DRS: xk+1 = (2 Proxτf − Id) ◦ (2 Proxτφσ − Id)(xk )  Timestep has to be fixed to τ = 1 44/51
  116. Non-convex PGD Minimization objective min x f (x) + φσ

    (x) PnP-PGD, for differentiable f xk+1 = Proxφσ ◦(Id −∇f )(xk ) Convergence (function value and iterates) [Attouch et al. ’13, Li and Lin ’15, Beck ’17] • φσ and f bounded from below • φσ is L/(L+1) semi-convex with L < 1 • f of class C1 with a Lf < (L+2)/(L+1)-Lipschitz gradient 45/51
  117. Non-convex PGD Minimization objective min x f (x) + φσ

    (x) PnP-PGD, for differentiable f xk+1 = Proxφσ ◦(Id −∇f )(xk ) Convergence (function value and iterates) [Attouch et al. ’13, Li and Lin ’15, Beck ’17] • φσ and f bounded from below • φσ is L/(L+1) semi-convex with L < 1 • f of class C1 with a Lf < (L+2)/(L+1)-Lipschitz gradient What about non-differentiable data fidelity terms? 45/51
  118. Non-convex DRS Minimization objective min x f (x) + φσ

    (x) PnP-DRS xk+1 = (2 Proxφσ − Id) ◦ (2 Proxf − Id)(xk ) Reformulation (also equivalent to ADMM): yk+1 = Proxφσ (xk ), zk+1 = Proxf (2yk+1 − xk ), xk+1 = xk + (zk+1 − yk+1 ) 46/51
  119. Non-convex DRS Minimization objective min x f (x) + φσ

    (x) PnP-DRS xk+1 = (2 Proxφσ − Id) ◦ (2 Proxf − Id)(xk ) Reformulation (also equivalent to ADMM): yk+1 = Proxφσ (xk ), zk+1 = Proxf (2yk+1 − xk ), xk+1 = xk + (zk+1 − yk+1 ) Douglas-Rachford Envelope as Lyapunov function FDR (x) = φσ (y) + f (z) + ⟨y − x, y − z⟩ + 1 2 ||y − z||2  Convergence for proper l.s.c. functions f and φσ if one function has a Lipschitz gradient with constant < 1 [Themelis and Patrinos ’20] 46/51
  120. Non-convex DRS Minimization objective min x f (x) + φσ

    (x) PnP-DRS xk+1 = (2 Proxφσ − Id) ◦ (2 Proxf − Id)(xk ) Reformulation (also equivalent to ADMM, sketched below): yk+1 = Proxφσ (xk ), zk+1 = Proxf (2yk+1 − xk ), xk+1 = xk + (zk+1 − yk+1 ) Douglas-Rachford Envelope as Lyapunov function FDR (x) = φσ (y) + f (z) + ⟨y − x, y − z⟩ + 1 2 ||y − z||2  Convergence for proper l.s.c. functions f and φσ if one function has a Lipschitz gradient with constant < 1 [Themelis and Patrinos ’20]  φσ has a L/(1−L) Lipschitz gradient: the residual Id −Dσ must be L < 0.5-Lipschitz 46/51
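    A sketch of this PnP-DRS reformulation, with the proximal gradient step denoiser playing the role of Proxφσ and the residual ||yk − zk || used as a stopping criterion; prox_f and the denoiser callable are placeholders:

    import torch

    def prox_pnp_drs(x0, prox_f, denoiser, sigma, n_iter=200, tol=1e-5):
        # y_{k+1} = D_sigma(x_k); z_{k+1} = Prox_f(2*y_{k+1} - x_k); x_{k+1} = x_k + z_{k+1} - y_{k+1}
        x = x0.clone()
        y = x0
        for _ in range(n_iter):
            y = denoiser(x, sigma)
            z = prox_f(2.0 * y - x)
            x = x + z - y
            if (y - z).norm() < tol:
                break
        return y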
  121. Discussion on assumptions Inverse problem • f (x) = λ

    2 ||Ax − y||2 • Dσ = Id −∇gσ , with (Id −Dσ ) L < 1-Lipschitz 47/51
  122. Discussion on assumptions Inverse problem • f (x) = λ

    2 ||Ax − y||2 • Dσ = Id −∇gσ , with (Id −Dσ ) L < 1-Lipschitz PnP-PGD  ||∇f || = λ||AT A|| < (L+2)/(L+1)  Solved with an α-relaxed PGD [Tseng ‘08, Hurault et al. ‘23] (sketched below): qk+1 = (1 − α)yk + αxk , xk+1 = Dσ (xk − λ∇f (qk+1 )), yk+1 = (1 − α)yk + αxk+1 PnP-DRS  λ free  Can deal with non-differentiable data fidelity terms f  Restricted to denoisers with L < 0.5-Lipschitz residual 47/51
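    A sketch of the α-relaxed iteration written above; grad_f, the denoiser and the constants λ, α are placeholders:

    import torch

    def prox_pnp_alpha_pgd(x0, grad_f, denoiser, sigma, lam, alpha, n_iter=200):
        # q_{k+1} = (1 - alpha)*y_k + alpha*x_k
        # x_{k+1} = D_sigma(x_k - lam*grad_f(q_{k+1}))
        # y_{k+1} = (1 - alpha)*y_k + alpha*x_{k+1}
        x = x0.clone()
        y = x0.clone()
        for _ in range(n_iter):
            q = (1.0 - alpha) * y + alpha * x
            x = denoiser(x - lam * grad_f(q), sigma)
            y = (1.0 - alpha) * y + alpha * x
        return x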
  123. Quantitative evaluation PSNR averaged on CBSD68 over 10 blur kernels

    (deblurring) and 4 kernels and 2 scales (super-resolution)
                       Deblurring                Super-resolution s = 2     Super-resolution s = 3
    Noise level ν      0.01   0.03   0.05        0.01   0.03   0.05         0.01   0.03   0.05
    IRCNN              31.42  28.01  26.40       26.97  25.86  25.45        25.60  24.72  24.38
    DPIR               31.93  28.30  26.82       27.79  26.58  25.83        26.05  25.27  24.66
    GS-PnP-HQS         31.70  28.28  26.86       27.88  26.81  26.01        25.97  25.35  24.74
    Prox-PnP-PGD       30.91  27.97  26.66       27.68  26.57  25.81        25.94  25.20  24.62
    Prox-PnP-DRS       31.54  28.07  26.60       27.93  26.61  25.79        26.13  25.29  24.67
    Prox-PnP-αPGD      31.55  28.03  26.66       27.92  26.61  25.80        26.03  25.26  24.61
    48/51
  124. Outline 1. Introduction 2. From Proximal algorithms to Plug &

    Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion
  125. Conclusion Contributions • Gradient based denoisers Dσ not non-expansive •

    Convergent PnP schemes • Tractable (non-convex) optimization problems • State-of-the-art results for various imaging tasks 51/51
  126. • S. Hurault, A. Leclaire, N. Papadakis - Gradient Step

    Denoiser for convergent Plug-and-Play - ICLR, 2022 • S. Hurault, A. Leclaire, N. Papadakis - Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization - ICML, 2022 • S. Hurault, A. Chambolle, A. Leclaire, N. Papadakis - A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser, 2023.
  127. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers
  128. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x]
  129. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] = arg max x∈Rn p(y|x)p(x) p(y) = arg min x∈Rn − log p(y|x) − log p(x) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y)
  130. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p
  131. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ )
  132. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ ) → Real denoiser Dσ ≈ DMAP σ or Dσ ≈ DMMSE σ
  133. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ ) → Real denoiser Dσ ≈ DMAP σ or Dσ ≈ DMMSE σ → A denoiser provides an implicit prior
  134. Denoising prior Find x from observation y = x +

    ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ ) → Real denoiser Dσ ≈ DMAP σ or Dσ ≈ DMMSE σ → A denoiser provides an implicit prior → State-of-the-art denoisers are not non-expansive
  135. PnP and RED algorithms Proximal algorithms • GD: xk+1 =

    (Id −τ(∇f + λ∇g))(xk ) • HQS: xk+1 = Proxτg ◦ Proxτf (xk ) • PGD: xk+1 = Proxτg ◦(Id −τ∇f )(xk ) • DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2 Proxτg − Id) (xk ) MAP denoiser Dσ (y) = Prox−σ2 log p (y) MMSE denoiser Dσ (y) = (Id +σ2∇ log pσ )(y) PnP algorithms [Venkatakrishnan et al. ‘13] Proxτg ← Dσ RED algorithms [Romano et al. ‘17] ∇g ← Id −Dσ
  136. PnP and RED algorithms Proximal algorithms • GD: xk+1 =

    (Id −τ(∇f + λ∇g))(xk ) • HQS: xk+1 = Proxτg ◦ Proxτf (xk ) • PGD: xk+1 = Proxτg ◦(Id −τ∇f )(xk ) • DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2 Proxτg − Id) (xk ) MAP denoiser Dσ (y) = Prox−σ2 log p (y) MMSE denoiser Dσ (y) = (Id +σ2∇ log pσ )(y) PnP algorithms [Venkatakrishnan et al. ‘13] Proxτg ← Dσ RED algorithms [Romano et al. ‘17] ∇g ← Id −Dσ • PnP-HQS: xk+1 = Dσ ◦ Proxτf (xk ) • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2Dσ − Id)(xk )
  137. PnP and RED algorithms Proximal algorithms • GD: xk+1 =

    (Id −τ(∇f + λ∇g))(xk ) • HQS: xk+1 = Proxτg ◦ Proxτf (xk ) • PGD: xk+1 = Proxτg ◦(Id −τ∇f )(xk ) • DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2 Proxτg − Id) (xk ) MAP denoiser Dσ (y) = Prox−σ2 log p (y) MMSE denoiser Dσ (y) = (Id +σ2∇ log pσ )(y) PnP algorithms [Venkatakrishnan et al. ‘13] Proxτg ← Dσ RED algorithms [Romano et al. ‘17] ∇g ← Id −Dσ We have Id −τ∇g = τDσ + (1 − τ) Id • RED-GD: xk+1 = (τDσ + (1 − τ) Id − τ∇f )(xk ) • RED-PGD: xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk )
  138. Non-convex PGD: function value and residual convergence Theorem [Beck and

    Teboulle ’09] Assuming • f : Rn → R ∪ {+∞} proper, convex, lower semicontinuous • g : Rn → R differentiable with L-Lipschitz gradient • F = f + λg bounded from below Then, for τ < 1/(λL), the iterates xk given by the PGD algorithm xk+1 = Proxτf ◦(Id −λτ∇g)(xk ) verify (i) (F(xk )) is non-increasing and converging. (ii) The residual ||xk+1 − xk || converges to 0 (iii) Cluster points of (xk ) are stationary points of F. Remark: Can be extended to non-convex data terms [Li and Lin ’15]
  139. Non-convex PGD: iterates convergence Theorem [Attouch, Bolte and Svaiter ’13]

    Assuming • Same as before + f and g verify the Kurdyka-Łojasiewicz (KL) property Then, for τ < 1/(λL), if the sequence (xk ) given by xk+1 = Proxτf ◦(Id −λτ∇g)(xk ) is bounded, it converges, with finite length, to a critical point of f + λg. Go back
  140. Non-convex Proximal Gradient Descent Theorem (adapted from [Attouch et al.’13,

    Li and Lin ’15, Beck ’17]) For F = f + φσ , assume • φσ such that Dσ = Id −∇gσ = Proxφσ with gσ : Rn → R ∪ {+∞} of class C2 with L < 1-Lipschitz gradient and bounded from below • f : Rn → R ∪ {+∞} differentiable with Lf < 1-Lipschitz gradient and bounded from below Then the iterates xk+1 = Dσ ◦ (Id −∇f )(xk ) of PnP-PGD satisfy (i) (F(xk )) is non-increasing and converges (ii) The residual ||xk+1 − xk || converges to 0 (iii) Cluster points of (xk ) are stationary points of F (iv) If f and gσ are respectively KL and semi-algebraic, then if (xk ) is bounded, it converges, with finite length, to a stationary point of F Go back
  141. Non-convex DRS Theorem (Adapted from [Themelis and Patrinos ’20]) For

    F = f + φσ , assume • φσ such that Dσ = Id −∇gσ = Proxφσ with gσ : Rn → R ∪ {+∞} of class C2 with L-Lipschitz gradient (such that 2L3 + L2 + 2L − 1 < 0) and bounded from below • f : Rn → R ∪ {+∞} proper l.s.c. and bounded from below Take the Douglas-Rachford Envelope as Lyapunov function FDR (x) = φσ(y) + f (z) + ⟨y − x, y − z⟩ + 1 2 ||y − z||2 Then the iterates of PnP-DRS satisfy (i) (FDR (xk )) is nonincreasing and converges (ii) The residual ||yk − zk || converges to 0 (iii) For any cluster point (y∗, z∗, x∗), y∗ and z∗ are stationary points of F (iv) If f and gσ are KL and semi-algebraic, then if (yk , zk , xk ) is bounded, it converges, and yk and zk converge to the same stationary point of F Go back
  142. Numerical implementation Denoiser: Dσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x))

    Computing JNσ (x)T (x − Nσ (x)):

    N = DRUNet_light(x, sigma)
    torch.autograd.grad(N, x, grad_outputs=x - N, create_graph=True, only_inputs=True)[0]

    Go back
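    Wrapping the snippet above into the full denoiser (DRUNet_light is the same placeholder network handle as in the slide; x must be a leaf tensor requiring gradients):

    import torch

    def prox_gradient_step_denoiser(x, sigma, DRUNet_light):
        # D_sigma(x) = N_sigma(x) + J_{N_sigma}(x)^T (x - N_sigma(x))
        x = x.detach().requires_grad_(True)
        N = DRUNet_light(x, sigma)
        (vjp,) = torch.autograd.grad(N, x, grad_outputs=x - N,
                                     create_graph=True, only_inputs=True)
        return N + vjp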
  143. Fixed points of the denoiser Nσ (x)

    [Images: initialization and fixed points for σ = 0, 0.5, 1, 5, 10, 20, 30, 40]
  146. Learning the Gradient-step and Proximal denoisers Our denoisers write Dσ

    = Id −∇gσ with gσ (x) = 1 2 ||x − Nσ (x)||2 with a C1 or C2 neural network Nσ : Rn → Rn (DRUNet [Zhang et al. ’21] with ELU) Dσ (x) = x − ∇gσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Training loss (GS-Denoiser / Prox-Denoiser): arg min Param(Dσ) Ex,ξσ ||Dσ (x + ξσ ) − x||2 + µ max(||∇2gσ (x + ξσ )||S , 1 − ε)
    σ(./255)        5      15     25
    DRUNet          40.19  33.89  31.25
    GS-Denoiser     40.26  33.90  31.26
    Prox-Denoiser   40.12  33.60  30.82
    Table 1: Denoising PSNR on the CBSD68 dataset, for various σ.
  147. Learning the Gradient-step and Proximal denoisers Our denoisers write Dσ

    = Id −∇gσ with gσ (x) = 1 2 ||x − Nσ (x)||2 with a C1 or C2 neural network Nσ : Rn → Rn (DRUNet [Zhang et al. ’21] with ELU) Dσ (x) = x − ∇gσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Training loss (GS-Denoiser / Prox-Denoiser): arg min Param(Dσ) Ex,ξσ ||Dσ (x + ξσ ) − x||2 + µ max(||∇2gσ (x + ξσ )||S , 1 − ε)
    σ(./255)       5     15    25
    GS-DRUNet      1.26  1.96  3.27
    Prox-DRUNet    0.92  0.99  0.96
    Table 1: maxx ||∇2gσ (x)||S on the CBSD68 dataset, for various σ.
  148. Averaged operator theory [Bauschke & Combettes ‘11] Definition T :

    Rn → Rn and θ ∈ (0, 1). T is θ-averaged if there is a nonexpansive operator R s.t. T = θR + (1 − θ) Id . • This is equivalent to ∀x, y ∈ Rn, ||T(x) − T(y)||2 + (1 − θ)/θ ||(Id −T)(x) − (Id −T)(y)||2 ≤ ||x − y||2. • T θ-averaged =⇒ T nonexpansive. • 1/2-averaged = “firmly nonexpansive”.
  149. Averaged operator theory [Bauschke & Combettes ‘11] Definition T :

    Rn → Rn and θ ∈ (0, 1). T is θ-averaged if there is a nonexpansive operator R s.t. T = θR + (1 − θ) Id . • This is equivalent to ∀x, y ∈ Rn, ||T(x) − T(y)||2 + (1 − θ)/θ ||(Id −T)(x) − (Id −T)(y)||2 ≤ ||x − y||2. • T θ-averaged =⇒ T nonexpansive. • 1/2-averaged = “firmly nonexpansive”. Theorem (Krasnosel’skiĭ-Mann) Let T : Rn → Rn be a θ-averaged operator such that Fix(T) ≠ ∅. Then the sequence xk+1 = T(xk ) converges to a fixed point of T.
  150. Averaged operator theory [Bauschke & Combettes ‘11] Proposition Let T1

    θ1 -averaged and T2 θ2 -averaged, with any θ1 , θ2 ∈ (0, 1). • Then T1 ◦ T2 is θ-averaged with θ = (θ1 + θ2 − 2θ1θ2)/(1 − θ1θ2) ∈ (0, 1). • For α ∈ [0, 1], αT1 + (1 − α) Id is αθ1 -averaged. Proposition If f : Rn → R is convex and L-smooth (i.e. differentiable with L-Lipschitz gradient) • Proxτf is τL/(2(1 + τL))-averaged for any τ > 0. • Id −τ∇f is τL/2-averaged for τ < 2/L.
  151. Averaged operator theory for PnP convergence PnP algorithms: xk+1 =

    TPnP (xk ) with TPnP one of: THQS = Dσ ◦ Proxτf , TPGD = Dσ ◦ (Id −τ∇f ), TDRS = 1/2 Id + 1/2 (2Dσ − Id) ◦ (2 Proxτf − Id) Theorem If f : Rn → R convex, L-smooth and Dσ is θ-averaged, θ ∈ (0, 1) • PnP-HQS converges towards a fixed point of THQS . • If τL < 2, PnP-PGD converges towards a fixed point of TPGD . • If θ ≤ 1/2, PnP-DRS converges towards a fixed point of TDRS .
  152. Averaged operator theory for PnP convergence PnP algorithms: xk+1 =

    TPnP (xk ) with TPnP one of: THQS = Dσ ◦ Proxτf , TPGD = Dσ ◦ (Id −τ∇f ), TDRS = 1/2 Id + 1/2 (2Dσ − Id) ◦ (2 Proxτf − Id) Theorem If f : Rn → R convex, L-smooth and Dσ is θ-averaged, θ ∈ (0, 1) • PnP-HQS converges towards a fixed point of THQS . • If τL < 2, PnP-PGD converges towards a fixed point of TPGD . • If θ ≤ 1/2, PnP-DRS converges towards a fixed point of TDRS .  Does not extend to nonconvex data-fidelity terms f . • If f is L-smooth and strongly convex, Proxτf is contractive, and for τL < 2, Id −τ∇f is contractive → assume Dσ (1 + ε)-Lipschitz [Ryu et al. ’19].
  153. How to build averaged deep denoisers ? Average operator: Dσ

    = θRσ + (1 − θ) Id with Rσ non-expansive. → How to train non-expansive networks? • Spectral normalization [Miyato et al. ‘18, Ryu et al. ‘19]  Lipschitz constant 1 for large networks.  Does not allow skip connections.
  154. How to build averaged deep denoisers ? Average operator: Dσ

    = θRσ + (1 − θ) Id with Rσ non-expansive. → How to train non-expansive networks? • Spectral normalization [Miyato et al. ‘18, Ryu et al. ‘19]  Lipschitz constant 1 for large networks.  Does not allow skip connections. • Deep spline neural networks [Goujon, Neumayer et al. ‘22] • Convolutional Proximal Neural Networks [Hertrich et al. ‘20] • Soft regularization of the training loss [Terris et al. ’21]. • Dσ = Id −∇gσ with gσ Input Convex Neural Network (ICNN) [Meunier et al. ‘21].
  155. How to build averaged deep denoisers ? Average operator: Dσ

    = θRσ + (1 − θ) Id with Rσ non-expansive. → How to train non-expansive networks? • Spectral normalization [Miyato et al. ‘18, Ryu et al. ‘19]  Lipschitz constant 1 for large networks.  Does not allow skip connections. • Deep spline neural networks [Goujon, Neumayer et al. ‘22] • Convolutional Proximal Neural Networks [Hertrich et al. ‘20] • Soft regularization of the training loss [Terris et al. ’21]. • Dσ = Id −∇gσ with gσ Input Convex Neural Network (ICNN) [Meunier et al. ‘21]. Non-expansiveness can harm denoising performance.
  156. Nonexpansive convolutional neural networks [Terris et al., 2021] Idea: •

    Build a nonexpansive convolutional neural network (CNN) D = TM ◦ · · · ◦ T1 with Tm (x) = Rm (Wm x + bm ) where Rm is an (averaged) activation function, Wm a convolution, and bm a bias. • We want to have D = (Id + Q)/2 with Q nonexpansive. • During training, the Lipschitz constant of 2D − Id is penalized.
  157. Convolution proximal neural networks [Hertrich et al., 2022] Idea: Build

    a convolutional proximal neural network (cPNN) Φu = TM ◦ · · · ◦ T1 with Tm (x) = WmT σm (Wm x + bm ) where u = (Wm , σm , bm )1≤m≤M is a collection of parameters. The linear operators Wm (or WmT ) are convolutions lying in a Stiefel manifold St(d, n) = { W ∈ Rn×d | W T W = Id }. The resulting denoiser is then D = Id −γΦu . • Ideally, Φu is a composition of M firmly non-expansive operators, thus averaged. • In practice, Wm is a convolution with limited filter length. • Condition Wm ∈ St is approximated with a term ||WmT Wm − Id||2F in the learning cost. • Φu is verified in practice to be t-averaged with t close to 1/2.
  158. Deep Spline Neural Networks [Goujon, Neumayer et al., 2022] Idea:

    Approximate the proximal operator of a convex-ridge regularizer R(x) = Σp=1..P Σi ψp ((hp ∗ x)(i)) where hp are convolution kernels and ψp are particular C1 convex functions. Given a noisy y, ProxλR (y) = arg min x∈Rn 1 2 ||x − y||2 + λR(x) is approximated with t iterations of the gradient step x → x − α((x − y) + λ∇R(x)). The output after t iterations is denoted by Tt R,λ,α (y). • Tt R,λ,α approximates the prox of a convex function • Linear spline parameterization of ψp justified by a density result