Slide 1

Slide 1 text

Convergent Plug & Play schemes for Inverse Problems in Imaging Samuel Hurault, Arthur Leclaire, Nicolas Papadakis

Slide 2

Slide 2 text

Image Enhancement and Restoration [Thiébaut et al. ’16] [Gustafsson et al. ’16] 1/51

Slide 3

Slide 3 text

Image Enhancement and Restoration [Thiébaut et al. ’16] [Gustafsson et al. ’16] [The Mandalorian, Disney ’20] 1/51

Slide 4

Slide 4 text

Outline 1. Introduction 2. From Proximal algorithms to Plug & Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion

Slide 5

Slide 5 text

1. Introduction

Slide 6

Slide 6 text

Image Inverse Problems Find x from observation y = Ax + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Denoising: y = x + ξ (example images) 2/51

Slide 7

Slide 7 text

Image Inverse Problems Find x from observation y = Ax + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Deblurring: y = h ∗ x + ξ (blur kernel h; example images) 2/51

Slide 8

Slide 8 text

Image Inverse Problems Find x from observation y = Ax + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Super-resolution: y = (h ∗ x) ↓s + ξ (blur and downsampling by factor s; example images) 2/51

Slide 9

Slide 9 text

Image Inverse Problems Find x from observation y = Ax + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Inpainting: y = m ⊗ x (binary mask m; example images) 2/51

Slide 10

Slide 10 text

Image Inverse Problems Find x from observation y = Ax + ξ • x ∈ Rn input ”clean” image • y ∈ Rm degraded image • A ∈ Rm×n degradation matrix • ξ random noise, generally ξ ∼ N(0, σ2 Idm ) Compressed Sensing: e.g. Magnetic Resonance Imaging (MRI) 2/51

Slide 11

Slide 11 text

Image Inverse Problems Find x from observation y ∼ p(y|x) • y ∈ Rm observation • x ∈ Rn unknown input • p(y|x) forward model Computed Tomography: 2/51
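As a concrete illustration of the forward model y = Ax + ξ (not taken from the slides: the shapes, the random matrix A and the noise level below are illustrative stand-ins for the structured operators of the examples above), a minimal Python sketch:

import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 64, 32, 0.05
x = rng.random(n)                               # stand-in for a "clean" image
A = rng.standard_normal((m, n)) / np.sqrt(m)    # stand-in degradation matrix
xi = sigma * rng.standard_normal(m)             # Gaussian noise, xi ~ N(0, sigma^2 Id_m)
y = A @ x + xi                                  # observation y = A x + xi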

Slide 12

Slide 12 text

Image Inverse Problems with Plug & Play Find x from observation y = A(x) + ξ Supervised learning Provides state-of-the-art results from paired data (xi , yi ) Requires training a different model for each A 3/51

Slide 13

Slide 13 text

Image Inverse Problems with Plug & Play Find x from observation y = A(x) + ξ Supervised learning Provides state-of-the-art results from paired data (xi , yi ) Requires training a different model for each A Plug & Play (PnP): • Learn an efficient denoiser • Plug the denoiser in an iterative algorithm • Solve general inverse problems, involving any operator A 3/51

Slide 14

Slide 14 text

Image Inverse Problems with Plug & Play Find x from observation y = A(x) + ξ Supervised learning Provides state-of-the-art results from paired data (xi , yi ) Requires training a different model for each A Plug & Play (PnP): • Learn an efficient denoiser • Plug the denoiser in an iterative algorithm • Solve general inverse problems, involving any operator A State-of-the-art image restoration Convergence of the method 3/51

Slide 15

Slide 15 text

Maximum A-Posteriori Find x from observation y ∼ p(y|x) with an a-priori p(x) on the solution arg max x∈Rn p(x|y) = arg max x∈Rn p(y|x)p(x) / p(y) = arg min x∈Rn − log p(y|x) − log p(x) 4/51

Slide 16

Slide 16 text

Maximum A-Posteriori Find x from observation y ∼ p(y|x) with an a-priori p(x) on the solution arg max x∈Rn p(x|y) = arg max x∈Rn p(y|x)p(x) / p(y) = arg min x∈Rn − log p(y|x) − log p(x) Maximum A-Posteriori x∗ ∈ arg min x∈Rn f (x) + g(x) ⇐⇒ arg min x∈Rn data-fidelity + regularization f (x) = − log p(y|x) g(x) ∝ − log p(x) 4/51

Slide 17

Slide 17 text

A variety of data-fidelity terms f Observation y = Ax + ξ • Assuming Gaussian noise model ξ ∼ N(0, σ2 Id), f (x) = − log p(y|x) = 1 2σ2 ||Ax − y||2 → convex and smooth f , non-strongly convex in general 5/51

Slide 18

Slide 18 text

A variety of data-fidelity terms f Observation y = Ax + ξ • Assuming Gaussian noise model ξ ∼ N(0, σ2 Id), f (x) = − log p(y|x) = 1 2σ2 ||Ax − y||2 → convex and smooth f , non-strongly convex in general • Less regular cases - Noiseless case: f (x) = ı{x | Ax=y} → non-smooth f - Laplace / Poisson noise model → non-smooth f - Phase retrieval → non-convex f • More complex non-linear modeling of real complex physical systems (e.g. X-ray computed tomography, electron-microscopy...) 5/51

Slide 19

Slide 19 text

A variety of explicit image priors Design an explicit regularization on image features: • Total variation [Rudin et al. ‘92] • Fourier spectrum [Ruderman ‘94] • Wavelet sparsity [Mallat ’09] 6/51

Slide 20

Slide 20 text

A variety of explicit image priors Design an explicit regularization on image features: • Total variation [Rudin et al. ‘92] • Fourier spectrum [Ruderman ‘94] • Wavelet sparsity [Mallat ’09] Learn an explicit prior on image patches: • Dictionary learning [Elad et al. ’06, Mairal et al. ’08] • Gaussian mixture models [Yu et al. ’10], [Zoran, Weiss ‘11] 6/51

Slide 21

Slide 21 text

A variety of explicit image priors Design an explicit regularization on image features: • Total variation [Rudin et al. ‘92] • Fourier spectrum [Ruderman ‘94] • Wavelet sparsity [Mallat ’09] Learn an explicit prior on image patches: • Dictionary learning [Elad et al. ’06, Mairal et al. ’08] • Gaussian mixture models [Yu et al. ’10], [Zoran, Weiss ‘11] Learn an explicit deep prior on full images: (generative models) • Variational Auto-encoders [Kingma & Welling, ‘13] • Normalizing flows [Dinh et al. ‘15] • Adversarial [Lunz et al. ‘18, Mukherjee et al. ‘21] • Score-based models [Song. et al ’21] 6/51

Slide 22

Slide 22 text

PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x) • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] 7/51

Slide 23

Slide 23 text

PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x) • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] 7/51

Slide 24

Slide 24 text

PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x) • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] 7/51

Slide 25

Slide 25 text

PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x) • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] → Step towards the manifold of clean images: implicit prior 7/51

Slide 26

Slide 26 text

PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x) • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] → Step towards the manifold of clean images: implicit prior • PnP: From y degraded, iterate: 1. Perform a denoising step with Dσ 2. Enforce data-fidelity 7/51

Slide 27

Slide 27 text

PnP motivations Find x∗ ∈ arg minx∈Rn Data-fidelity(x) + Regularization(x) • Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ‘11, Zoran & Weiss ‘11] Regularization by Image Denoising (easier, well-understood task) [Venkatakrishnan et al. ‘13, Romano et al. ‘17] → State-of-the-art denoisers Dσ without explicit prior Filtering methods [Dabov et al. ‘07, Lebrun et al. ‘13] Deep denoisers [Zhang et al. ‘16,‘17, ‘21, Song et al. ‘19] → Step towards the manifold of clean images: implicit prior • PnP: From y degraded, iterate: 1. Perform a denoising step with Dσ 2. Enforce data-fidelity How many iterations? Convergence when plugging Dσ in an optimization algorithm? 7/51

Slide 28

Slide 28 text

Outline 1. Introduction 2. From Proximal algorithms to Plug & Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion

Slide 29

Slide 29 text

2. From Proximal algorithms to Plug & Play algorithms

Slide 30

Slide 30 text

Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) = arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) 8/51

Slide 31

Slide 31 text

Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) = arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) data-fidelity 8/51

Slide 32

Slide 32 text

Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) = arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) data-fidelity prior p(x) 8/51

Slide 33

Slide 33 text

Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) = arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) Example: Inpainting 8/51

Slide 34

Slide 34 text

Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) = arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) Example: Inpainting Log-concave prior p Convex problem f + g 8/51

Slide 35

Slide 35 text

Image Inverse Problems Maximum A-Posteriori (MAP) x∗ MAP (y) = arg min x∈Rn λ 2 ||Ax − y||2 − log p(x) Explicit prior p: tractable minimization objective x∗ ∈ arg min x∈Rn f (x) + g(x) Example: Inpainting Log-concave prior p Non log-concave prior Convex problem f + g Non-convex problem f + g 8/51

Slide 36

Slide 36 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) 9/51

Slide 37

Slide 37 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) Explicit Gradient Descent operator for differentiable functions xk+1 = (Id −τ∇h)(xk ) 9/51

Slide 42

Slide 42 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) Explicit Gradient Descent operator for differentiable functions xk+1 = (Id −τ∇h)(xk ) Fixed point: x∗ = (Id −τ∇h)(x∗) ⇔ Minimizer: ∇h(x∗) = 0 9/51

Slide 43

Slide 43 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) Explicit Gradient Descent operator for differentiable functions xk+1 = (Id −τ∇h)(xk ) Fixed point: x∗ = (Id −τ∇h)(x∗) ⇔ Minimizer: ∇h(x∗) = 0 Convergence guarantee if Id −τ∇h is non-expansive ||(Id −τ∇h)(x) − (Id −τ∇h)(y)|| ≤ ||x − y|| Sufficient condition: if h convex with L-Lipschitz gradient and τL < 2 9/51
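A minimal gradient-descent sketch for a smooth convex example (not from the slides; the quadratic h, the matrix B and the step size below are illustrative choices satisfying τL < 2):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((20, 10))
c = rng.standard_normal(20)

def grad_h(x):
    # gradient of h(x) = 0.5 * ||B x - c||^2
    return B.T @ (B @ x - c)

L = np.linalg.norm(B.T @ B, 2)   # Lipschitz constant of grad h
tau = 1.0 / L                    # satisfies tau * L < 2
x = np.zeros(10)
for _ in range(500):
    x = x - tau * grad_h(x)      # x_{k+1} = (Id - tau * grad h)(x_k)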

Slide 44

Slide 44 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) Implicit proximal operator for (non)differentiable (convex) functions Proxτh (z) ∈ arg min x∈Rn 1 2τ ||x − z||2 + h(x) xk+1 = Proxτh (xk ) = (Id +τ∂h)−1(xk ) 9/51

Slide 51

Slide 51 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) Implicit proximal operator for (non)differentiable (convex) functions Proxτh (z) ∈ arg min x∈Rn 1 2τ ||x − z||2 + h(x) xk+1 = Proxτh (xk ) = (Id +τ∂h)−1(xk ) Fixed point: x∗ = Proxτh (x∗) ⇔ Minimizer: 0 ∈ ∂h(x∗) 9/51

Slide 52

Slide 52 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) Implicit proximal operator for (non)differentiable (convex) functions Proxτh (z) ∈ arg min x∈Rn 1 2τ ||x − z||2 + h(x) xk+1 = Proxτh (xk ) = (Id +τ∂h)−1(xk ) Fixed point: x∗ = Proxτh (x∗) ⇔ Minimizer: 0 ∈ ∂h(x∗) For convex functions, Proxτh is non-expansive for all τ > 0 9/51
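Two standard closed-form proximal operators, given here as illustrative examples (they are classical cases, not the priors used later in the talk): the l1 norm (soft-thresholding) and a quadratic data-fidelity term.

import numpy as np

def prox_l1(z, tau):
    # Prox_{tau ||.||_1}(z): soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_quadratic(z, tau, y, sigma):
    # Prox_{tau f}(z) with f(x) = 1/(2 sigma^2) ||x - y||^2
    lam = tau / sigma**2
    return (z + lam * y) / (1.0 + lam)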

Slide 53

Slide 53 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) For γ-strongly convex functions ⟨∇h(x) − ∇h(y), x − y⟩ ≥ γ||x − y||2 non-expansiveness → contractiveness 9/51

Slide 54

Slide 54 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) For γ-strongly convex functions ⟨∇h(x) − ∇h(y), x − y⟩ ≥ γ||x − y||2 non-expansiveness → contractiveness • Proximal operator ||Proxτh (x) − Proxτh (y)|| ≤ (1/(1 + τγ)) ||x − y|| • Gradient descent operator, for τ < 2/(L + γ), ||(Id −τ∇h)(x) − (Id −τ∇h)(y)|| ≤ (1 − τγ)||x − y|| 9/51

Slide 55

Slide 55 text

Proximal optimization algorithms Minimization problem for a proper function h : Rn → R x∗ ∈ arg min x∈Rn h(x) To sum up Convexity of function h ⇒ non-expansiveness of gradient/proximal operators ⇒ Convergence of gradient iterations to minimizers of h 9/51

Slide 56

Slide 56 text

Gradient-based optimization algorithms Minimization problem with explicit prior x∗ ∈ arg min x∈Rn f (x) + g(x) Proximal algorithms • Proximal Gradient Descent (PGD) xk+1 = TPGD (xk ) = Proxτg ◦(Id −τ∇f )(xk ) • Douglas-Rachford Splitting (DRS) xk+1 = TDRS (xk ) = (2 Proxτf − Id) ◦ (2 Proxτg − Id)(xk ) • Half-Quadratic-Splitting (HQS) xk+1 = THQS (xk ) = Proxτf ◦ Proxτg (xk ) 10/51

Slide 57

Slide 57 text

Gradient-based optimization algorithms Minimization problem with explicit prior x∗ ∈ arg min x∈Rn f (x) + g(x) Proximal algorithms • Proximal Gradient Descent (PGD) xk+1 = TPGD (xk ) = Proxτg ◦(Id −τ∇f )(xk ) • Douglas-Rachford Splitting (DRS) xk+1 = TDRS (xk ) = (2 Proxτf − Id) ◦ (2 Proxτg − Id)(xk ) • Half-Quadratic-Splitting (HQS) xk+1 = THQS (xk ) = Proxτf ◦ Proxτg (xk ) Fixed points of these non-expansive operators ⇔ Minimizers of f + g 10/51

Slide 58

Slide 58 text

Gradient-based optimization algorithms Minimization problem with explicit prior x∗ ∈ arg min x∈Rn f (x) + g(x) Proximal algorithms • Proximal Gradient Descent (PGD) xk+1 = TPGD (xk ) = Proxτg ◦(Id −τ∇f )(xk ) • Douglas-Rachford Splitting (DRS) xk+1 = TDRS (xk ) = (2 Proxτf − Id) ◦ (2 Proxτg − Id)(xk ) • Half-Quadratic-Splitting (HQS) xk+1 = THQS (xk ) = Proxτf ◦ Proxτg (xk ) Fixed points of these non-expansive operators ⇔ Minimizers of f + g How to use (deep) implicit learned priors? 10/51
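A sketch of Proximal Gradient Descent for the classic illustrative choice f(x) = 0.5 ||Ax − y||^2 and g = lam ||·||_1 (these f and g, and the parameters, are illustrative, not the learned priors discussed next):

import numpy as np

def pgd(A, y, lam, tau, n_iter=200):
    # x_{k+1} = Prox_{tau g}(x_k - tau * grad f(x_k)) with g = lam ||.||_1
    prox_l1 = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad_f = A.T @ (A @ x - y)
        x = prox_l1(x - tau * grad_f, tau * lam)
    return x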

Slide 59

Slide 59 text

Plug & Play Image Restoration Denoising MAP solution y = x̂ + ξ with ξ ∼ N(0, σ2 Idn ): x∗ MAP (y) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y) A denoiser is related to an implicit prior p 11/51

Slide 60

Slide 60 text

Plug & Play Image Restoration Denoising MAP solution y = x̂ + ξ with ξ ∼ N(0, σ2 Idn ): x∗ MAP (y) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y) A denoiser is related to an implicit prior p Plug & Play [Venkatakrishnan et al. ’13, Romano et al. ’17]: Replace the gradient operator of the regularization by an external denoiser Dσ : Rn → Rn learnt for noise level σ Dσ (y) ≈ x∗ MAP (y) = Prox−σ2 log p (y) 11/51

Slide 61

Slide 61 text

Plug & Play Image Restoration Denoising MAP solution y = x̂ + ξ with ξ ∼ N(0, σ2 Idn ): x∗ MAP (y) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y) A denoiser is related to an implicit prior p Plug & Play [Venkatakrishnan et al. ’13, Romano et al. ’17]: Replace the gradient operator of the regularization by an external denoiser Dσ : Rn → Rn learnt for noise level σ Dσ (y) ≈ x∗ MAP (y) = Prox−σ2 log p (y) Example Dσ : DRUNet [Zhang et al. ’21] Remark: State-of-the-art denoisers are not non-expansive 11/51

Slide 62

Slide 62 text

Plug & Play algorithms Proximal algorithms • PGD: xk+1 = Proxτg ◦ (Id −τ∇f )(xk ) • DRS: xk+1 = (2 Proxτf − Id) ◦ (2Proxτg − Id)(xk ) • HQS: xk+1 = Proxτf ◦Proxτg (xk ) Plug & Play algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 12/51
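Schematically, PnP only replaces the proximal (or gradient) step of the prior by a call to the denoiser. A PnP-PGD sketch (the callable `denoiser`, the operator A, the step size and the initialization are placeholders, not from the slides):

import numpy as np

def pnp_pgd(A, y, denoiser, sigma, tau, n_iter=100):
    # x_{k+1} = D_sigma(x_k - tau * grad f(x_k)), with f(x) = 0.5 ||A x - y||^2
    x = A.T @ y
    for _ in range(n_iter):
        z = x - tau * (A.T @ (A @ x - y))   # explicit gradient step on the data term
        x = denoiser(z, sigma)              # denoiser plugged in place of Prox_{tau g}
    return x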

Slide 63

Slide 63 text

Plug & Play Image Restoration Pros and cons High image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21] 13/51

Slide 64

Slide 64 text

Plug & Play Image Restoration Pros and cons High image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21] Implicit prior → no tractable minimization problem minx f (x)+g(x) 13/51

Slide 65

Slide 65 text

Plug & Play Image Restoration Pros and cons High image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21] Implicit prior → no tractable minimization problem minx f (x)+g(x) Efficient denoisers are not non-expansive → Convergence guarantees 13/51

Slide 66

Slide 66 text

Plug & Play Image Restoration Pros and cons High image restoration performance of PnP methods with deep denoisers IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21] Implicit prior → no tractable minimization problem minx f (x)+g(x) Efficient denoisers are not non-expansive → Convergence guarantees Relevant priors are not log-concave → Non-convex problems 13/51

Slide 67

Slide 67 text

Convergence of PnP methods Plug & Play algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) Convergence results Pseudo-linear denoisers Dσ (x) = Kσ (x)x [Sreehari et al. ’16, Romano et al. ’17, Chan et al. ’19, Gavaskar and Chaudhury ’20, Nair et al. ’21] Deep denoisers 14/51

Slide 68

Slide 68 text

Related works for deep denoisers Fixed point strategies • Denoisers assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y|| non-expansiveness can degrade denoising performances 15/51

Slide 69

Slide 69 text

Related works for deep denoisers Fixed point strategies • Denoisers assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y|| non-expansiveness can degrade denoising performances • non-expansive residual Id −Dσ , L > 1 Lipschitz denoiser [Ryu et al. ’19] Can only handle strongly convex data terms f xk+1 = Dσ ◦ (Id −τ∇f )(xk ) 15/51

Slide 70

Slide 70 text

Related works for deep denoisers Fixed point strategies • Denoisers assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y|| non-expansiveness can degrade denoising performances • non-expansive residual Id −Dσ , L > 1 Lipschitz denoiser [Ryu et al. ’19] Can only handle strongly convex data terms f xk+1 = Dσ ◦ (Id −τ∇f )(xk ) Implicit prior: no tractable minimization problem No explicit expression of g such that x∗ ∈ arg minx f (x) + g(x) 15/51

Slide 71

Slide 71 text

Related works for deep denoisers Fixed point strategies • Denoisers assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ‘21, Bohra et al. ‘21] ||Dσ (x) − Dσ (y)|| ≤ ||x − y|| non-expansiveness can degrade denoising performances • non-expansive residual Id −Dσ , L > 1 Lipschitz denoiser [Ryu et al. ’19] Can only handle strongly convex data terms f xk+1 = Dσ ◦ (Id −τ∇f )(xk ) Implicit prior: no tractable minimization problem No explicit expression of g such that x∗ ∈ arg minx f (x) + g(x) Our objective: → 15/51

Slide 72

Slide 72 text

Overview Find x from observation y = Ax + ξ Maximum A-Posteriori arg min x∈Rn f (x) + g(x) unknown prior p 16/51

Slide 73

Slide 73 text

Overview Find x from observation y = Ax + ξ Maximum A-Posteriori arg min x∈Rn f (x) + g(x) unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk ) implicit prior no minimization problem no convergence guarantees SOTA restoration 16/51

Slide 74

Slide 74 text

Outline 1. Introduction 2. From Proximal algorithms to Plug & Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion

Slide 75

Slide 75 text

3. Gradient Step Plug & Play

Slide 76

Slide 76 text

Gradient step denoiser and PnP Idea [Hurault et al ‘21]: • Define the denoiser as a gradient descent step over a differentiable (non-convex) scalar function gσ : Rn → R: Dσ = Id −∇gσ • Proposed PnP algorithm, for τ > 0 xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk ) 17/51

Slide 77

Slide 77 text

Gradient step denoiser and PnP Idea [Hurault et al ‘21]: • Define the denoiser as a gradient descent step over a differentiable (non-convex) scalar function gσ : Rn → R: Dσ = Id −∇gσ • Proposed PnP algorithm, for τ > 0 xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk ) Plugging the Gradient step denoiser Dσ xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) ⇔ Proximal Gradient Descent on the (non-convex) problem min x f (x) + gσ (x) 17/51
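The proposed iteration, written as code (a sketch only; `prox_tau_f` and `denoiser` are placeholder callables standing for Prox_{τf} and the gradient step denoiser Dσ):

def gs_pnp(x0, prox_tau_f, denoiser, tau, n_iter=100):
    # x_{k+1} = Prox_{tau f}(tau * D_sigma(x_k) + (1 - tau) * x_k)
    #         = Prox_{tau f}((Id - tau * grad g_sigma)(x_k)) since D_sigma = Id - grad g_sigma
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_tau_f(tau * denoiser(x) + (1 - tau) * x)
    return x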

Slide 78

Slide 78 text

Overview Find x from observation y = Ax + ξ Maximum A-Posteriori arg min x∈Rn f (x) + g(x) unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk ) implicit prior no minimization problem no convergence guarantees SOTA restoration 18/51

Slide 79

Slide 79 text

Overview Find x from observation y = Ax + ξ Maximum A-Posteriori arg min x∈Rn f (x) + g(x) unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk ) implicit prior no minimization problem no convergence guarantees SOTA restoration GS-PnP x∗ ∈ arg min x∈Rn f (x) + gσ (x) explicit prior minimization problem ? convergence guarantees ? SOTA restoration 18/51

Slide 80

Slide 80 text

Convergence of non-convex PGD Minimization problem for a proper function f min x f (x) + gσ (x) PnP algorithm xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) 19/51

Slide 81

Slide 81 text

Convergence of non-convex PGD Minimization problem for a proper function f min x f (x) + gσ (x) PnP algorithm xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) Function value and residual convergence [Beck and Teboulle ’09] for τL < 1 • gσ differentiable with L-Lipschitz gradient • gσ bounded from below • f bounded from below, may be non-convex 19/51

Slide 82

Slide 82 text

Convergence of non-convex PGD Minimization problem for a proper function f min x f (x) + gσ (x) PnP algorithm xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) Function value and residual convergence [Beck and Teboulle ’09] for τL < 1 • gσ differentiable with L-Lipschitz gradient • gσ bounded from below • f bounded from below, may be non-convex Convergence to stationary points [Attouch, Bolte and Svaiter ’13] • f and gσ satisfy the Kurdyka-Lojasiewicz (KL) property • The iterates xk are bounded 19/51

Slide 83

Slide 83 text

Design of the potential gσ Gradient Step denoiser Dσ (x) = x − ∇gσ (x) Parameterization of gσ to satisfy the required assumptions 20/51

Slide 84

Slide 84 text

Design of the potential gσ Gradient Step denoiser Dσ (x) = x − ∇gσ (x) Parameterization of gσ to satisfy the required assumptions • Convex scalar neural network gσ : Rn → R+ [Cohen et al. ’21] Limited denoising performances 20/51

Slide 85

Slide 85 text

Design of the potential gσ Gradient Step denoiser Dσ (x) = x − ∇gσ (x) Parameterization of gσ to satisfy the required assumptions • Convex scalar neural network gσ : Rn → R+ [Cohen et al. ’21] Limited denoising performances • Proposed non-convex potential gσ (x) = 1 2 ||x − Nσ (x)||2 with a neural network Nσ : Rn → Rn Dσ (x) = x − ∇gσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Any efficient denoising architecture Nσ (x) can be reused 20/51
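A PyTorch sketch of the resulting denoiser, computing Dσ = Id − ∇gσ by automatic differentiation (the network `N` is a placeholder for Nσ; this mirrors the vector-Jacobian product used in the numerical implementation shown at the end of the deck):

import torch

def gs_denoiser(x, sigma, N):
    # g_sigma(x) = 0.5 * ||x - N_sigma(x)||^2 ; D_sigma(x) = x - grad g_sigma(x)
    #            = N_sigma(x) + J_{N_sigma}(x)^T (x - N_sigma(x))
    x = x.requires_grad_(True)
    g = 0.5 * torch.sum((x - N(x, sigma)) ** 2)
    grad_g = torch.autograd.grad(g, x, create_graph=True)[0]
    return x - grad_g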

Slide 86

Slide 86 text

Design of the potential gσ Parameterization of gσ gσ (x) = 1 2 ||x − Nσ (x)||2 Nσ : Rn → Rn, neural network with C2 and Lipschitz gradient activations 21/51

Slide 87

Slide 87 text

Design of the potential gσ Parameterization of gσ gσ (x) = 1 2 ||x − Nσ (x)||2 Nσ : Rn → Rn, neural network with C2 and Lipschitz gradient activations Check the required assumptions gσ is bounded from below, C1 and with L > 1 Lipschitz gradient ||∇2gσ (x)||S 21/51

Slide 88

Slide 88 text

Design of the potential gσ Parameterization of gσ gσ (x) = 1 2 ||x − Nσ (x)||2 Nσ : Rn → Rn, neural network with C2 and Lipschitz gradient activations Check the required assumptions gσ is bounded from below, C1 and with L > 1 Lipschitz gradient ||∇2gσ (x)||S Coercivity to bound iterates xk [Laumont et al. ‘21] ĝσ (x) = gσ (x) + 1 2 ||x − ProjC (x)||2 for a large convex compact set C (never activated in practice) 21/51

Slide 89

Slide 89 text

Interpretation Gradient Step denoiser Dσ (x) = x − ∇ 1 2 ||x − Nσ (x)||2 = Nσ (x) + JNσ (x)T (x − Nσ (x)) • Correct a denoiser Nσ (x) to make it a conservative vector field RED [Romano et al. ’17] • Learn the refitting of the denoiser Nσ (x) CLEAR [Deledalle et al.’17] 22/51

Slide 90

Slide 90 text

GS-DRUNet Gradient Step denoiser Dσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Architecture for Nσ : light version of DRUNet [Zhang et al. ’21] Training with L2 loss, for σ ∈ [0, 50/255]: L(σ) = Ex∼p,ξσ∼N(0,σ2) ||Dσ (x + ξσ ) − x||2 Training set: Berkeley, Waterloo, DIV2K and Flickr2K 23/51

Slide 91

Slide 91 text

GS-DRUNet PSNR on the CBSD68 dataset
σ (./255)       5      15     25     50
FFDNet          39.95  33.53  30.84  27.54
DnCNN           39.80  33.55  30.87  27.52
DRUNet          40.31  33.97  31.32  28.08
GS-DRUNet       40.26  33.90  31.26  28.01
DRUNet light    40.19  33.89  31.25  28.00
24/51

Slide 92

Slide 92 text

Overview Find x from observation y = Ax + ξ Maximum A-Posteriori arg min x∈Rn f (x) + g(x) unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk ) implicit prior no minimization problem no convergence guarantees SOTA restoration GS-PnP x∗ ∈ arg min x∈Rn f (x) + gσ (x) explicit prior minimization problem convergence guarantees ? SOTA restoration 25/51

Slide 93

Slide 93 text

General image inverse problems Objective function for any degradation operator A and noise level ν f (x) + gσ (x) = λ 2 ||Ax − y||2 + 1 2 ||Nσ (x) − x||2 GS-PnP algorithm xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk ) • Deep denoiser Dσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) • Proximal operator Proxτf (z) = (τλAT A + Id)−1 (τλAT y + z) • Backtracking procedure for τ [Hu et al. ‘22] - Rough empirical estimation of the Lipschitz constant L - Not getting stuck at the first local minimum • Automatic tuning via Reinforcement Learning [Wei et al. ‘20] 26/51
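The proximal operator of the data-fidelity term, written as an explicit dense linear solve (a sketch for illustration; for deblurring with periodic boundary conditions the same formula is evaluated pointwise in the Fourier domain, as used on the next slides):

import numpy as np

def prox_data_fidelity(z, A, y, lam, tau):
    # Prox_{tau f}(z) with f(x) = (lambda/2) ||A x - y||^2:
    # solves (tau*lam*A^T A + Id) x = tau*lam*A^T y + z
    n = A.shape[1]
    M = tau * lam * (A.T @ A) + np.eye(n)
    return np.linalg.solve(M, tau * lam * (A.T @ y) + z)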

Slide 94

Slide 94 text

Image inverse problems Deblurring with convolution with periodic boundary conditions: A = H diagonal in the Fourier basis = ∗ + 27/51

Slide 95

Slide 95 text

Deblurring: quantitative results (blur kernels (a)-(j))
ν     Method   (a)    (b)    (c)    (d)    (e)    (f)    (g)    (h)    (i)    (j)    Avg
0.01  EPLL     28.32  28.24  28.36  25.80  29.61  27.15  26.90  26.69  25.84  26.49  27.34
      RED      30.47  30.01  30.29  28.09  31.22  28.92  28.90  28.67  26.66  28.45  29.17
      IRCNN    32.96  32.62  32.53  32.44  33.51  33.62  32.54  32.20  28.11  29.19  31.97
      MMO      32.35  32.06  32.24  31.67  31.77  33.17  32.30  31.80  27.81  29.26  31.44
      DPIR     33.76  33.30  33.04  33.09  34.10  34.34  33.06  32.77  28.34  29.16  32.50
      GS-PnP   33.52  33.07  32.91  32.83  34.07  34.25  32.96  32.54  28.11  29.03  32.33
0.03  EPLL     25.31  25.12  25.82  23.75  26.99  25.23  25.00  24.59  24.34  25.43  25.16
      RED      25.71  25.32  25.71  24.38  26.65  25.50  25.27  24.99  23.51  25.54  25.26
      IRCNN    28.96  28.65  28.90  28.38  30.03  29.87  28.92  28.52  25.92  27.64  28.58
      DPIR     29.38  29.06  29.21  28.77  30.22  30.23  29.34  28.90  26.19  27.81  28.91
      GS-PnP   29.22  28.89  29.20  28.60  30.32  30.21  29.32  28.92  26.38  27.89  28.90
0.05  EPLL     24.08  23.91  24.78  22.57  25.68  23.98  23.70  23.19  23.75  24.78  24.04
      RED      22.78  22.54  23.13  21.92  23.78  22.97  22.89  22.67  22.01  23.78  22.84
      IRCNN    27.00  26.74  27.25  26.37  28.29  28.06  27.22  26.81  24.85  26.83  26.94
      DPIR     27.52  27.35  27.73  27.02  28.63  28.46  27.79  27.30  25.25  27.11  27.42
      GS-PnP   27.45  27.28  27.70  26.98  28.68  28.44  27.81  27.38  25.49  27.15  27.44
28/51

Slide 96

Slide 96 text

Deblurring: qualitative inspection Motion kernel (b) and ν = 0.03: Clean Observed (20.97dB) RED (25.92dB) IRCNN (28.66dB) DPIR (29.76dB) GS-PnP (29.90dB) (f + gσ)(xk ) ||xk+1 − xk || 29/51

Slide 97

Slide 97 text

Deblurring: convergence inspection DPIR: 8 iterations with timestep σ decreasing uniformly in log-scale from 50 to ν and τ ∝ σ2 Asymptotic behavior of DPIR: (i) Decreasing timestep along 1000 iterations (ii) Decreasing timestep on the 8 first iterations and constant for the next 992 ones 30/51

Slide 99

Slide 99 text

Image inverse problems Super-resolution: A = SH ; efficient closed-form solution for Proxτf [Zhao et al. ’16] =         ∗         s + 31/51

Slide 100

Slide 100 text

Super-resolution: quantitative results (two blur kernels, shown as images in the original table)
              s = 2                          s = 3
Method        ν = 0.01  ν = 0.03  ν = 0.05   ν = 0.01  ν = 0.03  ν = 0.05   Avg
Kernel 1
Bicubic       24.85     23.96     22.79      23.14     22.52     21.62      23.15
RED           28.29     24.65     22.98      26.13     24.02     22.37      24.74
IRCNN         27.43     26.22     25.86      26.12     25.11     24.79      25.92
DPIR          28.62     27.30     26.47      26.88     25.96     25.22      26.74
GS-PnP        28.77     27.54     26.63      26.85     26.05     25.29      26.86
Kernel 2
Bicubic       23.38     22.71     21.78      22.65     22.08     21.25      22.31
RED           26.33     23.91     22.45      25.38     23.40     21.91      23.90
IRCNN         25.83     24.89     24.59      25.36     24.36     23.95      24.83
DPIR          26.84     25.59     24.89      26.24     24.98     24.32      25.48
GS-PnP        26.80     25.73     25.03      26.18     25.08     24.31      25.52
32/51

Slide 101

Slide 101 text

Super-resolution: qualitative inspection Isotropic kernel, s = 2 and ν = 0.03: Clean Observed (20.97dB) RED (21.75dB) IRCNN (22.82dB) DPIR (23.97dB) GS-PnP (24.81dB) (f + gσ)(xk ) ||xk+1 − xk || 33/51

Slide 102

Slide 102 text

Image inverse problems Inpainting (ν = 0): A diagonal, with values in {0, 1}, f (x) = ı{x | Ax=y} Proxτf (x) = Π{x | Ax=y} (x) = Ay − Ax + x = ⊗ 34/51

Slide 103

Slide 103 text

Inpainting 35/51

Slide 104

Slide 104 text

Overview Find x from observation y = Ax + ξ Maximum A-Posteriori arg min x∈Rn f (x) + g(x) unknown prior p Plug-and-Play (PnP) xk+1 = Proxτf ◦Dσ (xk ) implicit prior no minimization problem no convergence guarantees SOTA restoration GS-PnP x∗ ∈ arg min x∈Rn f (x) + gσ (x) explicit prior minimization problem convergence guarantees SOTA restoration 36/51

Slide 105

Slide 105 text

Convergence of PnP algorithms PnP algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 3. Gradient step denoiser: PnP-HQS 37/51

Slide 106

Slide 106 text

Convergence of PnP algorithms PnP algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 3. Gradient step denoiser: PnP-HQS Equivalent to PGD: xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) 37/51

Slide 107

Slide 107 text

Convergence of PnP algorithms PnP algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) • PnP-HQS: xk+1 = Proxτf ◦Dσ (xk ) 3. Gradient step denoiser: PnP-HQS Equivalent to PGD: xk+1 = Proxτf ◦(Id −τ∇gσ )(xk ) 4. What about other PnP algorithms? 37/51

Slide 108

Slide 108 text

Outline 1. Introduction 2. From Proximal algorithms to Plug & Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion

Slide 109

Slide 109 text

4. Proximal Gradient Step Plug & Play

Slide 110

Slide 110 text

Proximal Gradient Step denoiser PnP algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) PGD/DRS rely on implicit gradient steps Dσ = Proxφσ 38/51

Slide 111

Slide 111 text

Proximal Gradient Step denoiser PnP algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) PGD/DRS rely on implicit gradient steps Dσ = Proxφσ of proximal operators of convex functions φσ Firmly non-expansive operator: limited denoising performances 38/51

Slide 112

Slide 112 text

Proximal Gradient Step denoiser PnP algorithms • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = (2 Proxτf − Id) ◦ (2Dσ − Id)(xk ) PGD/DRS rely on implicit gradient steps Dσ = Proxφσ of proximal operators of convex functions φσ Firmly non-expansive operator: limited denoising performances Idea [Hurault et al. ‘22] • Build on top of the explicit gradient step denoiser Dσ = Id −∇gσ • Make Id −∇gσ the proximal operator of a non-convex potential φσ 38/51

Slide 113

Slide 113 text

Characterisation of proximal operators Gradient step denoiser Dσ = Id −∇gσ = ∇( 1 2 ||x||2 − gσ (x)) := ∇hσ Proposition [Moreau’65, Gribonval and Nikolova ’20] If hσ is convex then there exists some function φσ such that Dσ (x) ∈ Proxφσ (x) 39/51

Slide 114

Slide 114 text

Characterisation of proximal operators Gradient step denoiser Dσ = Id −∇gσ = ∇( 1 2 ||x||2 − gσ (x)) := ∇hσ Theorem [Gribonval ’11, Gribonval and Nikolova ’20] If gσ is Ck+1, k ≥ 1 and if ∇gσ is contractive (L < 1 Lipschitz), then • hσ is (1 − L) strongly convex • Dσ = Proxφσ is injective with φσ (x) := gσ (Dσ −1(x)) − 1 2 ||Dσ −1(x) − x||2 + Kσ if x ∈ Im(Dσ ) • φσ (x) ≥ gσ (x) • φσ is L/(L + 1) semi-convex (φσ (x) + (L/(2(L + 1))) ||x||2 is convex) • φσ is Ck and with L/(1 − L) Lipschitz gradient on Im(Dσ ) 40/51

Slide 115

Slide 115 text

Proximal Gradient Step denoiser Proposed Proximal gradient step denoiser • Gradient step expression Dσ = Id −∇gσ • Contractive residual Id −Dσ = ∇gσ , similar to [Ryu et al ’19] The denoiser is not non-expansive Residual condition hard to satisfy exactly 41/51

Slide 116

Slide 116 text

Proximal Gradient Step denoiser Proposed Proximal gradient step denoiser • Gradient step expression Dσ = Id −∇gσ • Contractive residual Id −Dσ = ∇gσ , similar to [Ryu et al ’19] The denoiser is not non-expansive Residual condition hard to satisfy exactly If gσ is C2: Id −Dσ contractive ⇔ ||JId −Dσ ||S < 1 Penalization of the spectral norm of the Jacobian [Gulrajani et al ’17, Terris et al. ’20] 41/51

Slide 117

Slide 117 text

Prox-DRUNet Architecture for Nσ : light version of DRUNet [Zhang et al. ’21] Training, for σ ∈ [0, 25], with L2 loss and spectral norm penalization: LS (σ) = Ex∼p,ξσ∼N(0,σ2) ||Dσ (x + ξσ ) − x||2 + µ max(||J(Id −Dσ) (x + ξσ )||S , 1 − ε)
PSNR on the CBSD68 dataset
σ (./255)                   5      15     25
FFDNet                      39.95  33.53  30.84
DnCNN                       39.80  33.55  30.87
DRUNet                      40.31  33.97  31.32
GS-DRUNet                   40.27  33.92  31.28
prox-DRUNet (µ = 10−3)      40.12  33.60  30.82
prox-DRUNet (µ = 10−2)      40.04  33.51  30.64
nonexp-DRUNet (µ = 10−3)    39.71  33.50  30.89
nonexp-DRUNet (µ = 10−2)    34.92  31.42  29.42
42/51

Slide 118

Slide 118 text

Prox-DRUNet Posterior approximation of the Lipschitz constant with power iterations • GS-DRUNet: L = maxx ||JId −Dσ(x) ||S • Prox-DRUNet: L = maxx ||JId −Dσ(x) ||S • nonexp-DRUNet: L = maxx ||JNσ(x) ||S
Empirical value L on the CBSD68 dataset
σ (./255)                   0     5     10    15    20    25
GS-DRUNet (µ = 0)           0.94  1.26  2.47  1.96  2.50  3.27
Prox-DRUNet (µ = 10−3)      0.86  0.94  0.97  0.98  0.99  1.19
Prox-DRUNet (µ = 10−2)      0.87  0.92  0.95  0.99  0.96  0.96
nonexp-DRUNet (µ = 10−3)    1.06  1.07  1.07  1.10  1.13  1.20
nonexp-DRUNet (µ = 10−2)    0.97  0.98  0.98  0.98  0.98  0.98
43/51
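A sketch of the power iteration behind this estimate, applied to the Jacobian of the residual Id − Dσ through automatic differentiation (the callable `residual_fn` is a placeholder for x ↦ x − Dσ(x); the double-backward trick below turns a vector-Jacobian product into a Jacobian-vector product, and everything else is an illustrative implementation choice):

import torch

def jacobian_spectral_norm(x, residual_fn, n_iters=10):
    # Power iteration on J^T J, where J is the Jacobian of residual_fn at x
    x = x.detach().requires_grad_(True)
    f = residual_fn(x)
    u = torch.randn_like(x)
    u = u / u.norm()
    sigma_max = torch.tensor(0.0)
    for _ in range(n_iters):
        w = torch.ones_like(f, requires_grad=True)
        vjp = torch.autograd.grad(f, x, grad_outputs=w, create_graph=True)[0]   # J^T w, as a function of w
        Ju = torch.autograd.grad(vjp, w, grad_outputs=u, retain_graph=True)[0]  # J u
        JtJu = torch.autograd.grad(f, x, grad_outputs=Ju, retain_graph=True)[0] # J^T J u
        sigma_max = JtJu.norm().sqrt()   # ||J^T J u|| tends to sigma_max^2 since ||u|| = 1
        u = JtJu / JtJu.norm()
    return sigma_max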

Slide 122

Slide 122 text

PnP algorithms with proximal gradient step denoiser Proximal gradient step denoiser Dσ = Id −∇gσ = Proxφσ with φσ (x) := gσ (Dσ −1(x)) − 1 2 ||Dσ −1(x) − x||2 if x ∈ Im(Dσ ), +∞ otherwise Minimization objective min x f (x) + φσ (x) Proximal algorithms • PGD: xk+1 = Proxτφσ ◦ (Id −τ∇f )(xk ) • DRS xk+1 = (2 Proxτf − Id) ◦ (2Proxτφσ − Id)(xk ) 44/51

Slide 123

Slide 123 text

PnP algorithms with proximal gradient step denoiser Proximal gradient step denoiser Dσ = Id −∇gσ = Proxφσ with φσ (x) := gσ (Dσ −1(x)) − 1 2 ||Dσ −1(x) − x||2 if x ∈ Im(Dσ ), +∞ otherwise Minimization objective min x f (x) + φσ (x) Proximal algorithms • PGD: xk+1 = Proxτφσ ◦ (Id −τ∇f )(xk ) • DRS xk+1 = (2 Proxτf − Id) ◦ (2Proxτφσ − Id)(xk ) Timestep has to be fixed to τ = 1 44/51

Slide 124

Slide 124 text

Non-convex PGD Minimization objective min x f (x) + φσ (x) PnP-PGD, for differentiable f xk+1 = Proxφσ ◦(Id −∇f )(xk ) Convergence (function value and iterates) [Attouch et al.’13, Li and Lin ’15, Beck ’17] • φσ and f bounded from below • φσ is L/(L + 1) semi-convex with L < 1 • f of class C1 with a Lf < (L + 2)/(L + 1)-Lipschitz gradient 45/51

Slide 125

Slide 125 text

Non-convex PGD Minimization objective min x f (x) + φσ (x) PnP-PGD, for differentiable f xk+1 = Proxφσ ◦(Id −∇f )(xk ) Convergence (function value and iterates) [Attouch et al.’13, Li and Lin ’15, Beck ’17] • φσ and f bounded from below • φσ is L/(L + 1) semi-convex with L < 1 • f of class C1 with a Lf < (L + 2)/(L + 1)-Lipschitz gradient What about non-differentiable data fidelity terms? 45/51

Slide 126

Slide 126 text

Non-convex DRS Minimization objective min x f (x) + φσ (x) PnP-DRS xk+1 = (2 Proxφσ − Id) ◦ (2 Proxf − Id)(xk ) Reformulation (also equivalent to ADMM) yk+1 = Proxφσ (xk ), zk+1 = Proxf (2yk+1 − xk ), xk+1 = xk + (zk+1 − yk+1 ) 46/51

Slide 127

Slide 127 text

Non-convex DRS Minimization objective min x f (x) + φσ (x) PnP-DRS xk+1 = (2 Proxφσ − Id) ◦ (2 Proxf − Id)(xk ) Reformulation (also equivalent to ADMM) yk+1 = Proxφσ (xk ), zk+1 = Proxf (2yk+1 − xk ), xk+1 = xk + (zk+1 − yk+1 ) Douglas-Rachford Envelope as Lyapunov function FDR (x) = φσ (y) + f (z) + ⟨y − x, y − z⟩ + 1 2 ||y − z||2 Convergence for proper l.s.c. functions f and φσ if one function has a Lipschitz gradient with constant < 1 [Themelis and Patrinos ’20] 46/51

Slide 128

Slide 128 text

Non-convex DRS Minimization objective min x f (x) + φσ (x) PnP-DRS xk+1 = (2 Proxφσ − Id) ◦ (2 Proxf − Id)(xk ) Reformulation (also equivalent to ADMM) yk+1 = Proxφσ (xk ), zk+1 = Proxf (2yk+1 − xk ), xk+1 = xk + (zk+1 − yk+1 ) Douglas-Rachford Envelope as Lyapunov function FDR (x) = φσ (y) + f (z) + ⟨y − x, y − z⟩ + 1 2 ||y − z||2 Convergence for proper l.s.c. functions f and φσ if one function has a Lipschitz gradient with constant < 1 [Themelis and Patrinos ’20] φσ has a L/(1 − L) Lipschitz gradient: the residual Id −Dσ must be L < 0.5-Lipschitz 46/51
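The three-step reformulation above, as a short sketch (placeholder callables `prox_f` and `denoiser`, the latter standing for Prox_{φσ} = Dσ):

def pnp_drs(x0, prox_f, denoiser, n_iter=100):
    # y_{k+1} = Prox_{phi_sigma}(x_k); z_{k+1} = Prox_f(2 y_{k+1} - x_k); x_{k+1} = x_k + (z_{k+1} - y_{k+1})
    x = x0.copy()
    for _ in range(n_iter):
        y = denoiser(x)
        z = prox_f(2 * y - x)
        x = x + (z - y)
    return y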

Slide 129

Slide 129 text

Discussion on assumptions Inverse problem • f (x) = λ 2 ||Ax − y||2 • Dσ = Id −∇gσ , with (Id −Dσ ) L < 1-Lipschitz 47/51

Slide 130

Slide 130 text

Discussion on assumptions Inverse problem • f (x) = λ 2 ||Ax − y||2 • Dσ = Id −∇gσ , with (Id −Dσ ) L < 1-Lipschitz PnP-PGD Lipschitz constant of ∇f : λ||AT A|| < (L + 2)/(L + 1) Solved with an α-relaxed PGD [Tseng ‘08, Hurault et al. ‘23] qk+1 = (1 − α)yk + αxk , xk+1 = Dσ (xk − λ∇f (qk+1 )), yk+1 = (1 − α)yk + αxk+1 PnP-DRS λ free Can deal with non-differentiable data fidelity terms f Restricted to denoisers with L < 0.5-Lipschitz residual 47/51

Slide 131

Slide 131 text

Quantitative evaluation PSNR averaged on CBSD68 over 10 blur kernels (deblurring) and 4 kernels and 2 scales (super-resolution)
                   Deblurring               Super-resolution
                                            scale s = 2              scale s = 3
Noise level ν      0.01   0.03   0.05       0.01   0.03   0.05       0.01   0.03   0.05
IRCNN              31.42  28.01  26.40      26.97  25.86  25.45      25.60  24.72  24.38
DPIR               31.93  28.30  26.82      27.79  26.58  25.83      26.05  25.27  24.66
GS-PnP-HQS         31.70  28.28  26.86      27.88  26.81  26.01      25.97  25.35  24.74
Prox-PnP-PGD       30.91  27.97  26.66      27.68  26.57  25.81      25.94  25.20  24.62
Prox-PnP-DRS       31.54  28.07  26.60      27.93  26.61  25.79      26.13  25.29  24.67
Prox-PnP-αPGD      31.55  28.03  26.66      27.92  26.61  25.80      26.03  25.26  24.61
48/51

Slide 132

Slide 132 text

Deblurring Clean Observed Prox-PnP-PGD Prox-PnP-DRS (29.41dB) (29.65dB) mini≤k ||xi+1 − xi ||2 Fλ,σ (xk ) 49/51

Slide 133

Slide 133 text

Super-resolution Clean Observed Prox-PnP-PGD Prox-PnP-DRS (27.27dB) (27.95dB) mini≤k ||xi+1 − xi ||2 Fλ,σ (xk ) 50/51

Slide 134

Slide 134 text

Outline 1. Introduction 2. From Proximal algorithms to Plug & Play algorithms 3. Gradient Step Plug & Play 4. Proximal Gradient Step Plug & Play 5. Conclusion

Slide 135

Slide 135 text

Conclusion

Slide 136

Slide 136 text

Conclusion Contributions • Gradient based denoisers Dσ not non-expansive • Convergent PnP schemes • Tractable (non-convex) optimization problems • State-of-the-art results for various imaging tasks 51/51

Slide 137

Slide 137 text

Thank you

Slide 138

Slide 138 text

• S. Hurault, A. Leclaire, N. Papadakis - Gradient Step Denoiser for convergent Plug-and-Play - ICLR, 2022 • S. Hurault, A. Leclaire, N. Papadakis - Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization - ICML, 2022 • S. Hurault, A. Chambolle, A. Leclaire, N. Papadakis - A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser - 2023

Slide 139

Slide 139 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers

Slide 140

Slide 140 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x]

Slide 141

Slide 141 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] = arg max x∈Rn p(y|x)p(x) / p(y) = arg min x∈Rn − log p(y|x) − log p(x) = arg min x∈Rn 1 2σ2 ||x − y||2 − log p(x) = Prox−σ2 log p (y)

Slide 142

Slide 142 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p

Slide 143

Slide 143 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ )

Slide 144

Slide 144 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ ) → Real denoiser Dσ ≈ DMAP σ or Dσ ≈ DMMSE σ

Slide 145

Slide 145 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ ) → Real denoiser Dσ ≈ DMAP σ or Dσ ≈ DMMSE σ → A denoiser provides an implicit prior

Slide 146

Slide 146 text

Denoising prior Find x from observation y = x + ξ • Input (unknown) distribution of clean image p(x) • Gaussian noise ξ ∼ N(0, σ2 Id) • Noisy observation y ∼ pσ (y) = p ∗ N(0, σ2 Id) → Two optimal denoisers MAP estimator DMAP σ (y) = arg max x p(x|y) MMSE estimator DMMSE σ (y) = Ex∼p(x|y) [x] Bayes (implicit gradient): DMAP σ = Prox−σ2 log p Tweedie (explicit gradient): DMMSE σ = Id −σ2∇(− log pσ ) → Real denoiser Dσ ≈ DMAP σ or Dσ ≈ DMMSE σ → A denoiser provides an implicit prior → State-of-the-art denoisers are not non-expansive

Slide 147

Slide 147 text

PnP and RED algorithms Proximal algorithms • GD: xk+1 = (Id −τ(∇f + λ∇g))(xk ) • HQS: xk+1 = Proxτg ◦ Proxτf (xk ) • PGD: xk+1 = Proxτg ◦(Id −τ∇f )(xk ) • DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2 Proxτg − Id) (xk ) MAP denoiser Dσ (y) = Prox−σ2 log p (y) MMSE denoiser Dσ (y) = (Id +σ2∇ log pσ )(y) PnP algorithms [Venkatakrishnan et al. ‘13] Proxτg ← Dσ RED algorithms [Romano et al. ‘17] ∇g ← Id −Dσ

Slide 148

Slide 148 text

PnP and RED algorithms Proximal algorithms • GD: xk+1 = (Id −τ(∇f + λ∇g))(xk ) • HQS: xk+1 = Proxτg ◦ Proxτf (xk ) • PGD: xk+1 = Proxτg ◦(Id −τ∇f )(xk ) • DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2 Proxτg − Id) (xk ) MAP denoiser Dσ (y) = Prox−σ2 log p (y) MMSE denoiser Dσ (y) = (Id +σ2∇ log pσ )(y) PnP algorithms [Venkatakrishnan et al. ‘13] Proxτg ← Dσ RED algorithms [Romano et al. ‘17] ∇g ← Id −Dσ • PnP-HQS: xk+1 = Dσ ◦ Proxτf (xk ) • PnP-PGD: xk+1 = Dσ ◦ (Id −τ∇f )(xk ) • PnP-DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2Dσ − Id)(xk )

Slide 149

Slide 149 text

PnP and RED algorithms Proximal algorithms • GD: xk+1 = (Id −τ(∇f + λ∇g))(xk ) • HQS: xk+1 = Proxτg ◦ Proxτf (xk ) • PGD: xk+1 = Proxτg ◦(Id −τ∇f )(xk ) • DRS: xk+1 = 1 2 Id +1 2 (2 Proxτf − Id) ◦ (2 Proxτg − Id) (xk ) MAP denoiser Dσ (y) = Prox−σ2 log p (y) MMSE denoiser Dσ (y) = (Id +σ2∇ log pσ )(y) PnP algorithms [Venkatakrishnan et al. ‘13] Proxτg ← Dσ RED algorithms [Romano et al. ‘17] ∇g ← Id −Dσ We have Id −τ∇g = τDσ + (1 − τ) Id • RED-GD: xk+1 = (τDσ + (1 − τ) Id − τ∇f )(xk ) • RED-PGD: xk+1 = Proxτf ◦(τDσ + (1 − τ) Id)(xk )

Slide 150

Slide 150 text

Non-convex PGD: function value and residual convergence Theorem [Beck and Teboulle ’09] Assuming • f : Rn → R ∪ {+∞} proper, convex, lower semicontinuous • g : Rn → R differentiable with L-Lipschitz gradient • F = f + λg bounded from below Then, for τ < 1/(λL), the iterates xk given by the PGD algorithm xk+1 = Proxτf ◦(Id −λτ∇g)(xk ) verify (i) (F(xk )) is non-increasing and converging. (ii) The residual ||xk+1 − xk || converges to 0 (iii) Cluster points of (xk ) are stationary points of F. Remark: Can be extended to non-convex data terms [Li and Lin ’15]

Slide 151

Slide 151 text

Non-convex PGD: iterates convergence Theorem [Attouch, Bolte and Svaiter ’13] Assuming • Same as before + f and g verify the Kurdyka-Lojasiewicz (KL) property Then, for τ < 1/(λL), if the sequence (xk ) given by xk+1 = Proxτf ◦(Id −λτ∇g)(xk ) is bounded, it converges, with finite length, to a critical point of f + λg. Go back

Slide 152

Slide 152 text

Non-convex Proximal Gradient Descent Theorem (adapted from [Attouch et al.’13, Li and Lin ’15, Beck ’17]) For F = f + φσ , assume • φσ such that Dσ = Id −∇gσ = Proxφσ with gσ : Rn → R ∪ {+∞} of class C2 with L < 1-Lipschitz gradient and bounded from below • f : Rn → R ∪ {+∞} differentiable with Lf < 1-Lipschitz gradient and bounded from below Then the iterates xk+1 = Dσ ◦ (Id −∇f )(xk ) of PnP-PGD satisfy (i) (F(xk )) is non-increasing and converges (ii) The residual ||xk+1 − xk || converges to 0 (iii) Cluster points of (xk ) are stationary points of F (iv) If f and gσ are respectively KL and semi-algebraic, then if (xk ) is bounded, it converges, with finite length, to a stationary point of F Go back

Slide 153

Slide 153 text

Non-convex DRS Theorem (Adapted from [Themelis and Patrinos ’20]) For F = f + φσ , assume • φσ such that Dσ = Id −∇gσ = Proxφσ with gσ : Rn → R ∪ {+∞} of class C2 with L-Lipschitz gradient (such that 2L3 + L2 + 2L − 1 < 0) and bounded from below • f : Rn → R ∪ {+∞} proper l.s.c. and bounded from below Take the Douglas-Rachford Envelope as Lyapunov function FDR (x) = φσ (y) + f (z) + ⟨y − x, y − z⟩ + 1 2 ||y − z||2 Then the iterates of PnP-DRS satisfy (i) (FDR (xk )) is nonincreasing and converges (ii) The residual ||yk − zk || converges to 0 (iii) For any cluster point (y∗, z∗, x∗), y∗ and z∗ are stationary points of F (iv) If f and gσ are KL and semi-algebraic, then if (yk , zk , xk ) is bounded, it converges, and yk and zk converge to the same stationary point of F Go back

Slide 154

Slide 154 text

Numerical implementation Denoiser: Dσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Computing JNσ (x)T (x − Nσ (x)):

N = DRUNet_light(x, sigma)
torch.autograd.grad(N, x, grad_outputs=x - N, create_graph=True, only_inputs=True)[0]

Go back

Slide 155

Slide 155 text

Fixed points of the denoiser Nσ (x) Initialization σ = 0 σ = 0.5 σ = 1 σ = 5 σ = 10 σ = 20 σ = 30 σ = 40

Slide 158

Slide 158 text

Learning the Gradient-step and Proximal denoisers Our denoisers take the form Dσ = Id −∇gσ with gσ (x) = 1 2 ||x − Nσ (x)||2 for a C1 or C2 neural network Nσ : Rn → Rn (DRUNet [Zhang et al. ’21] with ELU) Dσ (x) = x − ∇gσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Training loss: GS-Denoiser - Prox-Denoiser arg min Param(Dσ) Ex,ξσ ||Dσ (x + ξσ ) − x||2 + µ max(||∇2gσ (x + ξσ )||S , 1 − ε)
σ (./255)       5      15     25
DRUNet          40.19  33.89  31.25
GS-Denoiser     40.26  33.90  31.26
Prox-Denoiser   40.12  33.60  30.82
Table 1: Denoising PSNR on the CBSD68 dataset, for various σ.

Slide 159

Slide 159 text

Learning the Gradient-step and Proximal denoisers Our denoisers take the form Dσ = Id −∇gσ with gσ (x) = 1 2 ||x − Nσ (x)||2 for a C1 or C2 neural network Nσ : Rn → Rn (DRUNet [Zhang et al. ’21] with ELU) Dσ (x) = x − ∇gσ (x) = Nσ (x) + JNσ (x)T (x − Nσ (x)) Training loss: GS-Denoiser - Prox-Denoiser arg min Param(Dσ) Ex,ξσ ||Dσ (x + ξσ ) − x||2 + µ max(||∇2gσ (x + ξσ )||S , 1 − ε)
σ (./255)     5     15    25
GS-DRUNet     1.26  1.96  3.27
Prox-DRUNet   0.92  0.99  0.96
Table 1: maxx ||∇2gσ (x)||S on the CBSD68 dataset, for various σ.

Slide 160

Slide 160 text

Averaged operator theory [Bauschke & Combettes ‘11] Definition T : Rn → Rn and θ ∈ (0, 1). T is θ-averaged if there is a nonexpansive operator R s.t. T = θR + (1 − θ) Id . • This is equivalent to ∀x, y ∈ Rn, ||T(x) − T(y)||2 + ((1 − θ)/θ) ||(Id −T)(x) − (Id −T)(y)||2 ≤ ||x − y||2. • T θ-averaged =⇒ T nonexpansive. • 1/2-averaged = “firmly nonexpansive”.

Slide 161

Slide 161 text

Averaged operator theory [Bauschke & Combettes ‘11] Definition T : Rn → Rn and θ ∈ (0, 1). T is θ-averaged if there is a nonexpansive operator R s.t. T = θR + (1 − θ) Id . • This is equivalent to ∀x, y ∈ Rn, ||T(x) − T(y)||2 + ((1 − θ)/θ) ||(Id −T)(x) − (Id −T)(y)||2 ≤ ||x − y||2. • T θ-averaged =⇒ T nonexpansive. • 1/2-averaged = “firmly nonexpansive”. Theorem (Krasnosel’skiĭ-Mann) Let T : Rn → Rn be a θ-averaged operator such that Fix(T) ≠ ∅. Then the sequence xk+1 = T(xk ) converges to a fixed point of T.

Slide 162

Slide 162 text

Averaged operator theory [Bauschke & Combettes ‘11] Proposition Let T1 be θ1 -averaged and T2 be θ2 -averaged, with any θ1 , θ2 ∈ (0, 1). • Then T1 ◦ T2 is θ-averaged with θ = (θ1 + θ2 − 2θ1θ2)/(1 − θ1θ2) ∈ (0, 1). • For α ∈ [0, 1], αT1 + (1 − α) Id is αθ1 -averaged. Proposition If f : Rn → R is convex and L-smooth (i.e. differentiable with L-Lipschitz gradient) • Proxτf is τL/(2(1 + τL))-averaged for any τ > 0. • Id −τ∇f is (τL/2)-averaged for τ < 2/L.

Slide 163

Slide 163 text

Averaged operator theory for PnP convergence PnP algorithms: xk+1 = TPnP (xk ) with TPnP =      THQS = Dσ ◦ Proxτf TPGD = Dσ ◦ (Id −τ∇f ) TDRS = 1 2 Id +1 2 (2Dσ − Id) ◦ (2 Proxτf − Id) Theorem If f : Rn→R convex, L-smooth and Dσ isθ-averaged, θ ∈ (0, 1) • PnP-HQS converges towards a fixed point of THQS . • If τL < 2, PnP-PGD converges towards a fixed point of TPGD . • If θ ≤ 1/2, PnP-DRS converges towards a fixed point of TDRS .

Slide 164

Slide 164 text

Averaged operator theory for PnP convergence PnP algorithms: xk+1 = TPnP (xk ) with TPnP =      THQS = Dσ ◦ Proxτf TPGD = Dσ ◦ (Id −τ∇f ) TDRS = 1 2 Id +1 2 (2Dσ − Id) ◦ (2 Proxτf − Id) Theorem If f : Rn→R convex, L-smooth and Dσ isθ-averaged, θ ∈ (0, 1) • PnP-HQS converges towards a fixed point of THQS . • If τL < 2, PnP-PGD converges towards a fixed point of TPGD . • If θ ≤ 1/2, PnP-DRS converges towards a fixed point of TDRS . Does not extend to nonconvex data-fidelity terms f . • If f is L-smooth and strongly convex, Proxτf is contractive, and for τL < 2, Id −τ∇f is contractive → assume Dσ (1 + )-Lipshitz [Ryu et. al, ’19].

Slide 165

Slide 165 text

How to build averaged deep denoisers? Averaged operator: Dσ = θRσ + (1 − θ) Id with Rσ non-expansive. → How to train non-expansive networks? • Spectral normalization [Miyato et al. ‘18, Ryu et al. ‘19] Lipschitz constant 1 for large networks. Does not allow skip connections.

Slide 166

Slide 166 text

How to build averaged deep denoisers? Averaged operator: Dσ = θRσ + (1 − θ) Id with Rσ non-expansive. → How to train non-expansive networks? • Spectral normalization [Miyato et al. ‘18, Ryu et al. ‘19] Lipschitz constant 1 for large networks. Does not allow skip connections. • Deep spline neural networks [Goujon, Neumayer et al. ‘22] • Convolutional Proximal Neural Networks [Hertrich et al. ‘20] • Soft regularization of the training loss [Terris et al. ’21]. • Dσ = Id −∇gσ with gσ an Input Convex Neural Network (ICNN) [Meunier et al. ‘21].

Slide 167

Slide 167 text

How to build averaged deep denoisers? Averaged operator: Dσ = θRσ + (1 − θ) Id with Rσ non-expansive. → How to train non-expansive networks? • Spectral normalization [Miyato et al. ‘18, Ryu et al. ‘19] Lipschitz constant 1 for large networks. Does not allow skip connections. • Deep spline neural networks [Goujon, Neumayer et al. ‘22] • Convolutional Proximal Neural Networks [Hertrich et al. ‘20] • Soft regularization of the training loss [Terris et al. ’21]. • Dσ = Id −∇gσ with gσ an Input Convex Neural Network (ICNN) [Meunier et al. ‘21]. Non-expansiveness can harm denoising performance.

Slide 168

Slide 168 text

Nonexpansive convolutional neural networks [Terris et al., 2021] Idea: • Build a nonexpansive convolutional neural network (CNN) D = TM ◦ · · · ◦ T1 with Tm (x) = Rm (Wm x + bm ) where Rm is an (averaged) activation function, Wm a convolution, and bm a bias. • We want to have D = (Id + Q)/2 with Q nonexpansive. • During training, the Lipschitz constant of 2D − Id is penalized.

Slide 169

Slide 169 text

Convolutional proximal neural networks [Hertrich et al., 2022] Idea: Build a convolutional proximal neural network (cPNN) Φu = TM ◦ · · · ◦ T1 with Tm (x) = WmT σm (Wm x + bm ) where u = (Wm , σm , bm )1≤m≤M is a collection of parameters. The linear operators Wm (or WmT ) are convolutions lying in a Stiefel manifold St(d, n) = { W ∈ Rn×d | WT W = Id }. The resulting denoiser is then D = Id −γΦu . • Ideally, Φu is a composition of M firmly non-expansive operators, thus averaged. • In practice, Wm is a convolution with limited filter length. • The condition Wm ∈ St is approximated with a term ||WmT Wm − Id||2F in the learning cost. • Φu is verified in practice to be t-averaged with t close to 1/2.

Slide 170

Slide 170 text

Deep Spline Neural Networks [Goujon, Neumayer et al., 2022] Idea: Approximate the proximal operator of a convex-ridge regularizer R(x) = Σp=1..P Σi ψp ((hp ∗ x)(i)) where the hp are convolution kernels and the ψp are particular C1 convex functions. Given a noisy y, ProxλR (y) = arg min x∈Rn 1 2 ||x − y||2 + λR(x) is approximated with t iterations of the gradient step x → x − α((x − y) + λ∇R(x)). The output after t iterations is denoted by Tt R,λ,α (y). • Tt R,λ,α approximates the prox of a convex function • Linear spline parameterization of ψp justified by a density result