Observation: $y = A(x) + \xi$
Supervised learning
• Provides state-of-the-art results from paired data $(x_i, y_i)$
• Requires training a different model for each operator $A$
Plug & Play (PnP):
• Learn an efficient denoiser
• Plug the denoiser into an iterative algorithm
• Solve general inverse problems, involving any operator $A$
State-of-the-art image restoration
Convergence of the method
• Decouple data-fidelity and regularization in iterative algorithms [Combettes & Pesquet ’11, Zoran & Weiss ’11]
• Regularization by image denoising (an easier, well-understood task) [Venkatakrishnan et al. ’13, Romano et al. ’17]
→ State-of-the-art denoisers $D_\sigma$ without explicit prior
  - Filtering methods [Dabov et al. ’07, Lebrun et al. ’13]
  - Deep denoisers [Zhang et al. ’16, ’17, ’21, Song et al. ’19]
→ Step towards the manifold of clean images: implicit prior
• PnP: from a degraded observation $y$, iterate:
  1. Perform a denoising step with $D_\sigma$
  2. Enforce data-fidelity
How many iterations? Does the method converge when $D_\sigma$ is plugged into an optimization algorithm?
$\arg\min_{x \in \mathbb{R}^n} \frac{\lambda}{2}\|Ax - y\|^2 - \log p(x)$
Explicit prior $p$: tractable minimization objective
$x^* \in \arg\min_{x \in \mathbb{R}^n} f(x) + g(x)$
Example: Inpainting
• Log-concave prior $p$ → convex problem $f + g$
• Non-log-concave prior → non-convex problem $f + g$
$h : \mathbb{R}^n \to \mathbb{R}$, $x^* \in \arg\min_{x \in \mathbb{R}^n} h(x)$
Explicit gradient descent operator for differentiable functions:
$x_{k+1} = (\mathrm{Id} - \tau\nabla h)(x_k)$
Fixed point: $x^* = (\mathrm{Id} - \tau\nabla h)(x^*)$ ⇔ Minimizer: $\nabla h(x^*) = 0$
Convergence guarantee if $\mathrm{Id} - \tau\nabla h$ is non-expansive:
$\|(\mathrm{Id} - \tau\nabla h)(x) - (\mathrm{Id} - \tau\nabla h)(y)\| \le \|x - y\|$
Sufficient condition: $h$ convex with $L$-Lipschitz gradient and $\tau L < 2$
$h : \mathbb{R}^n \to \mathbb{R}$, $x^* \in \arg\min_{x \in \mathbb{R}^n} h(x)$
To sum up:
Convexity of the function $h$
⇒ non-expansiveness of gradient/proximal operators
⇒ convergence of gradient iterations to minimizers of $h$
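The step-size condition $\tau L < 2$ can be checked on a toy problem. The sketch below is only an illustration under stated assumptions: the objective $h(x) = \frac{L}{2}x^2$ with $L = 4$, the test points and the helper names are hypothetical choices, not taken from the talk.

```python
# Toy check: for h(x) = 0.5 * L * x**2 (convex, L-Lipschitz gradient),
# the map T = Id - tau * grad h is non-expansive when tau * L < 2.

def grad_h(x, L=4.0):
    """Gradient of the hypothetical objective h(x) = 0.5 * L * x**2."""
    return L * x

def gd_step(x, tau, L=4.0):
    """One explicit gradient-descent step x - tau * grad h(x)."""
    return x - tau * grad_h(x, L)

def is_nonexpansive(tau, L=4.0, points=(-3.0, -1.0, 0.5, 2.0)):
    """Check |T(x) - T(y)| <= |x - y| on all pairs of a few test points."""
    return all(
        abs(gd_step(x, tau, L) - gd_step(y, tau, L)) <= abs(x - y) + 1e-12
        for x in points
        for y in points
    )

def run_gd(x0, tau, n_iter=200, L=4.0):
    """Iterate x_{k+1} = (Id - tau * grad h)(x_k) and return the last iterate."""
    x = x0
    for _ in range(n_iter):
        x = gd_step(x, tau, L)
    return x

x_final = run_gd(x0=3.0, tau=0.4)  # tau * L = 1.6 < 2: iterates approach the minimizer 0
```

With `tau=0.6` (so $\tau L = 2.4$) the same map multiplies distances by 1.4, and `is_nonexpansive` returns `False`.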
$y = \hat{x} + \xi$ with $\xi \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_n)$:
$x^*_{MAP}(y) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2\sigma^2}\|x - y\|^2 - \log p(x) = \mathrm{Prox}_{-\sigma^2 \log p}(y)$
A denoiser is related to an implicit prior $p$
Plug & Play [Venkatakrishnan et al. ’13, Romano et al. ’17]:
Replace the gradient operator of the regularization by an external denoiser $D_\sigma : \mathbb{R}^n \to \mathbb{R}^n$ learnt for noise level $\sigma$
$D_\sigma(y) \approx x^*_{MAP}(y) = \mathrm{Prox}_{-\sigma^2 \log p}(y)$
Example of $D_\sigma$: DRUNet [Zhang et al. ’21]
Remark: State-of-the-art denoisers are not non-expansive
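To make the link between a MAP denoiser and a proximal operator concrete, here is a minimal sketch with an explicit prior. It assumes a scalar Laplace prior $p(x) \propto \exp(-|x|/b)$, which is a hypothetical choice for illustration (the talk uses learned, implicit priors); in that case $\mathrm{Prox}_{-\sigma^2 \log p}$ is the classical soft-thresholding with threshold $\sigma^2 / b$, and the brute-force search is only there to check the closed form.

```python
import math

def map_denoiser_laplace(y, sigma, b):
    """Prox of -sigma^2 * log p for p(x) proportional to exp(-|x| / b): soft-thresholding."""
    threshold = sigma ** 2 / b
    return math.copysign(max(abs(y) - threshold, 0.0), y)

def map_objective(x, y, sigma, b):
    """MAP objective ||x - y||^2 / (2 sigma^2) - log p(x), up to an additive constant."""
    return (x - y) ** 2 / (2.0 * sigma ** 2) + abs(x) / b

def brute_force_map(y, sigma, b, lo=-5.0, hi=5.0, n=100001):
    """Minimize the MAP objective on a regular grid (sanity check only)."""
    grid = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return min(grid, key=lambda x: map_objective(x, y, sigma, b))

closed_form = map_denoiser_laplace(1.3, sigma=0.5, b=0.5)  # threshold = 0.5
numerical = brute_force_map(1.3, sigma=0.5, b=0.5)
```

Both values agree up to the grid resolution, and small inputs such as `y = 0.2` are mapped to zero.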
Image restoration performance of PnP methods with deep denoisers: IRCNN [Zhang et al. ’17], DPIR [Zhang et al. ’21]
• Implicit prior → no tractable minimization problem $\min_x f(x) + g(x)$
• Efficient denoisers are not non-expansive → no convergence guarantees
• Relevant priors are not log-concave → non-convex problems
• Denoiser assumed non-expansive [Reehorst and Schniter ’18, Liu et al. ’21], firmly non-expansive [Terris et al. ’20, Sun et al. ’21] or averaged [Sun et al. ’19, Hertrich et al. ’21, Bohra et al. ’21]
  $\|D_\sigma(x) - D_\sigma(y)\| \le \|x - y\|$
  Non-expansiveness can degrade denoising performance
• Non-expansive residual $\mathrm{Id} - D_\sigma$, denoiser with Lipschitz constant $L > 1$ [Ryu et al. ’19]
  $x_{k+1} = D_\sigma \circ (\mathrm{Id} - \tau\nabla f)(x_k)$
  Can only handle strongly convex data terms $f$
• Implicit prior: no tractable minimization problem
  No explicit expression of $g$ such that $x^* \in \arg\min_x f(x) + g(x)$
Our objective:
Maximum A-Posteriori: $\arg\min_{x \in \mathbb{R}^n} f(x) + g(x)$
• unknown prior $p$
Plug-and-Play (PnP): $x_{k+1} = \mathrm{Prox}_{\tau f} \circ D_\sigma(x_k)$
• implicit prior
• no minimization problem
• no convergence guarantees
• SOTA restoration
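• Define the denoiser as a gradient descent step over a differentiable (non-convex) scalar function $g_\sigma : \mathbb{R}^n \to \mathbb{R}$:
  $D_\sigma = \mathrm{Id} - \nabla g_\sigma$
• Proposed PnP algorithm, for $\tau > 0$:
  $x_{k+1} = \mathrm{Prox}_{\tau f} \circ (\tau D_\sigma + (1 - \tau)\mathrm{Id})(x_k)$
• Plugging in the gradient step denoiser $D_\sigma$:
  $x_{k+1} = \mathrm{Prox}_{\tau f} \circ (\mathrm{Id} - \tau\nabla g_\sigma)(x_k)$
  ⇔ Proximal Gradient Descent on the (non-convex) problem $\min_x f(x) + g_\sigma(x)$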
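The equivalence between the relaxed denoiser $\tau D_\sigma + (1 - \tau)\mathrm{Id}$ and the gradient step $\mathrm{Id} - \tau\nabla g_\sigma$ is a one-line identity; the sketch below checks it numerically. The potential $g_\sigma(x) = \frac{1}{2}\sum_i \log(1 + x_i^2)$ is a hypothetical stand-in for the learned potential, chosen only because its gradient is easy to write by hand.

```python
def grad_g(x):
    """Gradient of the toy potential g(x) = 0.5 * sum(log(1 + x_i^2))."""
    return [xi / (1.0 + xi * xi) for xi in x]

def denoiser(x):
    """Gradient step denoiser D = Id - grad g."""
    return [xi - gi for xi, gi in zip(x, grad_g(x))]

def relaxed_denoiser(x, tau):
    """tau * D(x) + (1 - tau) * x, as used in the proposed PnP iteration."""
    return [tau * di + (1.0 - tau) * xi for xi, di in zip(x, denoiser(x))]

def gradient_step(x, tau):
    """x - tau * grad g(x), the explicit step of proximal gradient descent."""
    return [xi - tau * gi for xi, gi in zip(x, grad_g(x))]

x = [0.3, -1.2, 2.0]
lhs = relaxed_denoiser(x, tau=0.7)
rhs = gradient_step(x, tau=0.7)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))  # zero up to rounding
```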
Maximum A-Posteriori: $\arg\min_{x \in \mathbb{R}^n} f(x) + g(x)$
• unknown prior $p$
Plug-and-Play (PnP): $x_{k+1} = \mathrm{Prox}_{\tau f} \circ D_\sigma(x_k)$
• implicit prior
• no minimization problem
• no convergence guarantees
• SOTA restoration
GS-PnP: $x^* \in \arg\min_{x \in \mathbb{R}^n} f(x) + g_\sigma(x)$
• explicit prior
• minimization problem
• convergence guarantees?
• SOTA restoration?
Problem: $\min_x f(x) + g_\sigma(x)$
PnP algorithm: $x_{k+1} = \mathrm{Prox}_{\tau f} \circ (\mathrm{Id} - \tau\nabla g_\sigma)(x_k)$
Function value and residual convergence [Beck and Teboulle ’09] for $\tau L < 1$
• $g_\sigma$ differentiable with $L$-Lipschitz gradient
• $g_\sigma$ bounded from below
• $f$ bounded from below, may be non-convex
Convergence to stationary points [Attouch, Bolte and Svaiter ’13]
• $f$ and $g_\sigma$ satisfy the Kurdyka-Łojasiewicz (KL) property
• The iterates $x_k$ are bounded
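The convergence statement can be observed on a scalar toy problem. This is a sketch under assumptions that are not in the talk: $f(x) = \frac{\lambda}{2}(x - y)^2$ (denoising, $A = \mathrm{Id}$), and a hand-written non-convex potential $g(x) = \log(1 + x^2)$ whose gradient is 2-Lipschitz, in place of the learned $g_\sigma$. With $\tau = 0.4$ we have $\tau L = 0.8 < 1$.

```python
import math

LAM = 1.0  # hypothetical data-fidelity weight lambda
Y = 2.0    # hypothetical scalar observation
L_G = 2.0  # Lipschitz constant of grad g for g(x) = log(1 + x^2)

def f(x):
    """Data-fidelity term f(x) = lambda/2 * (x - y)^2."""
    return 0.5 * LAM * (x - Y) ** 2

def g(x):
    """Toy non-convex potential, bounded from below by 0."""
    return math.log(1.0 + x * x)

def grad_g(x):
    return 2.0 * x / (1.0 + x * x)

def prox_tau_f(z, tau):
    """Closed-form proximal operator of tau * f for the quadratic f above."""
    return (z + tau * LAM * Y) / (1.0 + tau * LAM)

def pnp_pgd(x0, tau, n_iter):
    """Iterate x_{k+1} = Prox_{tau f}(x_k - tau * grad g(x_k)); return all iterates."""
    xs = [x0]
    for _ in range(n_iter):
        xs.append(prox_tau_f(xs[-1] - tau * grad_g(xs[-1]), tau))
    return xs

xs = pnp_pgd(x0=-3.0, tau=0.4, n_iter=200)       # tau * L_G = 0.8 < 1
values = [f(x) + g(x) for x in xs]                # objective along the iterations
stationarity = LAM * (xs[-1] - Y) + grad_g(xs[-1])  # gradient of f + g at the last iterate
```

On this example the objective values decrease monotonically, the residual $|x_{k+1} - x_k|$ vanishes, and the last iterate is (numerically) a stationary point of $f + g$.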
$g_\sigma(x) = \frac{1}{2}\|x - N_\sigma(x)\|^2$
$N_\sigma : \mathbb{R}^n \to \mathbb{R}^n$: neural network whose activations are $C^2$ with Lipschitz gradient
Check the required assumptions:
• $g_\sigma$ is bounded from below, $C^1$, with $L$-Lipschitz gradient, $L > 1$ (bound on the spectral norm $\|\nabla^2 g_\sigma(x)\|_S$)
• Coercivity to bound the iterates $x_k$ [Laumont et al. ’21]:
  $\hat{g}_\sigma(x) = g_\sigma(x) + \frac{1}{2}\|x - \mathrm{Proj}_C(x)\|^2$ for a large convex compact set $C$ (never activated in practice)
$D_\sigma(x) = x - \nabla\left(\frac{1}{2}\|x - N_\sigma(x)\|^2\right) = N_\sigma(x) + J_{N_\sigma}(x)^T (x - N_\sigma(x))$
• Correct a denoiser $N_\sigma(x)$ to make it a conservative vector field: RED [Romano et al. ’17]
• Learn the refitting of the denoiser $N_\sigma(x)$: CLEAR [Deledalle et al. ’17]
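$D_\sigma(x) = N_\sigma(x) + J_{N_\sigma}(x)^T (x - N_\sigma(x))$
Architecture for $N_\sigma$: light version of DRUNet [Zhang et al. ’21]
Training with $L^2$ loss, for $\sigma \in [0, 50/255]$:
$\mathcal{L}(\sigma) = \mathbb{E}_{x \sim p,\, \xi_\sigma \sim \mathcal{N}(0, \sigma^2)} \|D_\sigma(x + \xi_\sigma) - x\|^2$
Training set: Berkeley, Waterloo, DIV2K and Flickr2K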
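The closed form $D_\sigma = N_\sigma + J_{N_\sigma}^T(\mathrm{Id} - N_\sigma)$ follows from the chain rule applied to $g_\sigma = \frac{1}{2}\|\mathrm{Id} - N_\sigma\|^2$. The sketch below checks it against finite differences. The two-dimensional map `N` with weights `W` is a hypothetical toy "network", not the light DRUNet, and the Jacobian is written by hand instead of using automatic differentiation.

```python
import math

W = [[0.5, -0.3], [0.2, 0.8]]  # hypothetical weights of a toy one-layer "network"

def N(x):
    """Toy network N(x) = tanh(W x), applied coordinate-wise."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]

def g(x):
    """Potential g(x) = 0.5 * ||x - N(x)||^2."""
    return 0.5 * sum((xi - ni) ** 2 for xi, ni in zip(x, N(x)))

def denoiser(x):
    """D(x) = N(x) + J_N(x)^T (x - N(x)), with J_N[i][j] = (1 - N_i^2) * W[i][j]."""
    n = N(x)
    r = [xi - ni for xi, ni in zip(x, n)]
    jt_r = [
        sum((1.0 - n[i] ** 2) * W[i][j] * r[i] for i in range(len(n)))
        for j in range(len(x))
    ]
    return [ni + v for ni, v in zip(n, jt_r)]

def denoiser_finite_diff(x, eps=1e-6):
    """x - grad g(x), with grad g approximated by central finite differences."""
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        out.append(x[j] - (g(xp) - g(xm)) / (2.0 * eps))
    return out

x = [0.7, -1.1]
gap = max(abs(a - b) for a, b in zip(denoiser(x), denoiser_finite_diff(x)))
```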
Maximum A-Posteriori: $\arg\min_{x \in \mathbb{R}^n} f(x) + g(x)$
• unknown prior $p$
Plug-and-Play (PnP): $x_{k+1} = \mathrm{Prox}_{\tau f} \circ D_\sigma(x_k)$
• implicit prior
• no minimization problem
• no convergence guarantees
• SOTA restoration
GS-PnP: $x^* \in \arg\min_{x \in \mathbb{R}^n} f(x) + g_\sigma(x)$
• explicit prior
• minimization problem
• convergence guarantees
• SOTA restoration?
$A$ and noise level $\nu$
$f(x) + g_\sigma(x) = \frac{\lambda}{2}\|Ax - y\|^2 + \frac{1}{2}\|N_\sigma(x) - x\|^2$
GS-PnP algorithm: $x_{k+1} = \mathrm{Prox}_{\tau f} \circ (\tau D_\sigma + (1 - \tau)\mathrm{Id})(x_k)$
• Deep denoiser: $D_\sigma(x) = N_\sigma(x) + J_{N_\sigma}(x)^T (x - N_\sigma(x))$
• Proximal operator: $\mathrm{Prox}_{\tau f}(z) = (\tau\lambda A^T A + \mathrm{Id})^{-1} (\tau\lambda A^T y + z)$
• Backtracking procedure for $\tau$ [Hu et al. ’22]
  - Rough empirical estimation of the Lipschitz constant $L$
  - Avoids getting stuck at the first local minimum
• Automatic tuning via Reinforcement Learning [Wei et al. ’20]
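For a diagonal operator the proximal step has a coordinate-wise closed form. The sketch below assumes an inpainting-like setting where $A = \mathrm{diag}(m)$ is a binary mask; the mask, the values of `tau` and `lam`, and the helper names are illustrative assumptions.

```python
def prox_data_term(z, y, mask, tau, lam):
    """Prox_{tau f}(z) for f(x) = lam/2 * ||A x - y||^2 with A = diag(mask).

    For a diagonal A the system (tau*lam*A^T A + Id) x = tau*lam*A^T y + z
    decouples, so each coordinate is solved independently.
    """
    return [
        (tau * lam * m * yi + zi) / (tau * lam * m * m + 1.0)
        for zi, yi, m in zip(z, y, mask)
    ]

def optimality_residual(x, z, y, mask, tau, lam):
    """Gradient of tau * f(x) + 0.5 * ||x - z||^2, which vanishes at the prox."""
    return [
        tau * lam * m * (m * xi - yi) + (xi - zi)
        for xi, zi, yi, m in zip(x, z, y, mask)
    ]

mask = [1.0, 0.0, 1.0, 0.0]  # 1 = observed pixel, 0 = missing pixel
y = [0.8, 0.0, 0.2, 0.0]     # masked observation
z = [0.5, 0.4, 0.3, 0.6]     # current (denoised) iterate
x = prox_data_term(z, y, mask, tau=1.0, lam=4.0)
residual = optimality_residual(x, z, y, mask, tau=1.0, lam=4.0)
```

Missing pixels are left untouched by the data-fidelity step (`x[1] == z[1]`), while observed pixels are pulled towards the observation.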
uniformly in log-scale from 50 to $\nu$ and $\tau \propto \sigma^2$
Asymptotic behavior of DPIR:
(i) Decreasing timestep over 1000 iterations
(ii) Decreasing timestep over the first 8 iterations, then constant for the remaining 992
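A schedule that is uniform in log-scale is a geometric sequence. The sketch below generates one, together with step sizes $\tau \propto \sigma^2$; the end value 2.55, the number of steps and the proportionality constant `c` are hypothetical values chosen for illustration, not the settings used in the experiments.

```python
import math

def log_uniform_schedule(sigma_start, sigma_end, n_steps):
    """Return n_steps values evenly spaced in log-scale from sigma_start to sigma_end."""
    if n_steps == 1:
        return [sigma_start]
    log_a, log_b = math.log(sigma_start), math.log(sigma_end)
    return [
        math.exp(log_a + (log_b - log_a) * k / (n_steps - 1))
        for k in range(n_steps)
    ]

sigmas = log_uniform_schedule(50.0, 2.55, n_steps=8)  # from 50 down to a hypothetical nu
c = 1e-3                                               # hypothetical proportionality constant
taus = [c * s ** 2 for s in sigmas]                    # step size proportional to sigma^2
ratios = [sigmas[k + 1] / sigmas[k] for k in range(len(sigmas) - 1)]  # constant ratio
```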
Maximum A-Posteriori: $\arg\min_{x \in \mathbb{R}^n} f(x) + g(x)$
• unknown prior $p$
Plug-and-Play (PnP): $x_{k+1} = \mathrm{Prox}_{\tau f} \circ D_\sigma(x_k)$
• implicit prior
• no minimization problem
• no convergence guarantees
• SOTA restoration
GS-PnP: $x^* \in \arg\min_{x \in \mathbb{R}^n} f(x) + g_\sigma(x)$
• explicit prior
• minimization problem
• convergence guarantees
• SOTA restoration
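• PnP-PGD: $x_{k+1} = D_\sigma \circ (\mathrm{Id} - \tau\nabla f)(x_k)$
• PnP-DRS: $x_{k+1} = \left(\frac{1}{2}\mathrm{Id} + \frac{1}{2}(2\,\mathrm{Prox}_{\tau f} - \mathrm{Id}) \circ (2D_\sigma - \mathrm{Id})\right)(x_k)$
PGD and DRS rely on implicit gradient steps, i.e. on proximal operators $D_\sigma = \mathrm{Prox}_{\phi_\sigma}$ of convex functions $\phi_\sigma$
Firmly non-expansive operator: limited denoising performance
Idea [Hurault et al. ’22]
• Build on top of the explicit gradient step denoiser $D_\sigma = \mathrm{Id} - \nabla g_\sigma$
• Make $\mathrm{Id} - \nabla g_\sigma$ the proximal operator of a non-convex potential $\phi_\sigma$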
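$D_\sigma = \mathrm{Id} - \nabla g_\sigma = \nabla\left(\frac{1}{2}\|x\|^2 - g_\sigma(x)\right) =: \nabla h_\sigma$
Proposition [Moreau ’65, Gribonval and Nikolova ’20]
If $h_\sigma$ is convex, then there exists some function $\phi_\sigma$ such that $D_\sigma(x) \in \mathrm{Prox}_{\phi_\sigma}(x)$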
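$D_\sigma = \mathrm{Id} - \nabla g_\sigma = \nabla\left(\frac{1}{2}\|x\|^2 - g_\sigma(x)\right) =: \nabla h_\sigma$
Theorem [Gribonval ’11, Gribonval and Nikolova ’20]
If $g_\sigma$ is $C^{k+1}$, $k \ge 1$, and if $\nabla g_\sigma$ is contractive ($L$-Lipschitz with $L < 1$), then
• $h_\sigma$ is $(1 - L)$-strongly convex
• $D_\sigma = \mathrm{Prox}_{\phi_\sigma}$ is injective, with $\phi_\sigma(x) := g_\sigma(D_\sigma^{-1}(x)) - \frac{1}{2}\|D_\sigma^{-1}(x) - x\|^2 + K_\sigma$ if $x \in \mathrm{Im}(D_\sigma)$
• $\phi_\sigma(x) \ge g_\sigma(x)$
• $\phi_\sigma$ is $\frac{L}{L+1}$-semi-convex ($\phi_\sigma(x) + \frac{L}{2(L+1)}\|x\|^2$ is convex)
• $\phi_\sigma$ is $C^k$ with $\frac{L}{1-L}$-Lipschitz gradient on $\mathrm{Im}(D_\sigma)$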
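The formula for $\phi_\sigma$ can be verified by hand in the simplest case. The sketch below assumes a scalar quadratic potential $g_\sigma(x) = \frac{L}{2}x^2$ with $L = 0.4$ and $K_\sigma = 0$ (hypothetical choices), for which $D_\sigma(x) = (1 - L)x$ and $\phi_\sigma(x) = \frac{L}{2(1-L)}x^2$, so that $\nabla\phi_\sigma$ is indeed $\frac{L}{1-L}$-Lipschitz. A grid search then confirms that $D_\sigma$ is the proximal operator of $\phi_\sigma$.

```python
L = 0.4  # hypothetical Lipschitz constant of grad g, must satisfy L < 1

def g(x):
    """Toy potential g(x) = L/2 * x^2, whose gradient is L-Lipschitz."""
    return 0.5 * L * x * x

def denoiser(x):
    """D = Id - grad g, here D(x) = (1 - L) * x."""
    return x - L * x

def denoiser_inverse(x):
    """Inverse of D (D is injective because L < 1)."""
    return x / (1.0 - L)

def phi(x):
    """phi(x) = g(D^{-1}(x)) - 0.5 * (D^{-1}(x) - x)^2, taking K_sigma = 0."""
    u = denoiser_inverse(x)
    return g(u) - 0.5 * (u - x) ** 2

def prox_phi_bruteforce(y, lo=-3.0, hi=3.0, n=60001):
    """argmin_x 0.5 * (x - y)^2 + phi(x), computed on a regular grid (sanity check only)."""
    grid = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return min(grid, key=lambda x: 0.5 * (x - y) ** 2 + phi(x))

prox_value = prox_phi_bruteforce(1.5)  # close to denoiser(1.5) = 0.9
```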
Gradient step expression $D_\sigma = \mathrm{Id} - \nabla g_\sigma$
• Contractive residual $\mathrm{Id} - D_\sigma = \nabla g_\sigma$, similar to [Ryu et al. ’19]
  - The denoiser is not non-expansive
  - Residual condition hard to satisfy exactly
• If $g_\sigma$ is $C^2$: $\mathrm{Id} - D_\sigma$ contractive ⇔ $\|J_{\mathrm{Id} - D_\sigma}\|_S < 1$
  Penalization of the spectral norm of the Jacobian [Gulrajani et al. ’17, Terris et al. ’20]
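A spectral norm is usually estimated by power iteration, which only needs matrix-vector products. The sketch below is a plain-Python illustration on an explicit $2 \times 2$ matrix standing in for the Jacobian $J_{\mathrm{Id} - D_\sigma}(x)$; the matrix, the margin `eps` and the hinge form of the penalty are assumptions made for the example, not the training loss used in the talk.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def spectral_norm(J, n_iter=100):
    """Estimate ||J||_S by power iteration on J^T J."""
    v = [1.0] * len(J[0])  # fixed start for reproducibility; a random start is more robust
    Jt = transpose(J)
    for _ in range(n_iter):
        w = matvec(Jt, matvec(J, v))
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    Jv = matvec(J, v)
    return math.sqrt(sum(x * x for x in Jv))

def jacobian_penalty(J, eps=0.1):
    """Hypothetical hinge penalty, active only when ||J||_S exceeds 1 - eps."""
    return max(spectral_norm(J) - (1.0 - eps), 0.0)

J_contractive = [[0.5, 0.2], [0.2, 0.1]]  # spectral norm = 0.3 + sqrt(0.08), below 1
J_expansive = [[1.2, 0.0], [0.0, 0.3]]    # spectral norm = 1.2, above 1
```

With automatic differentiation, the products with $J$ and $J^T$ would be Jacobian-vector and vector-Jacobian products, so the Jacobian never has to be formed explicitly.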
PnP-PGD, for differentiable $f$: $x_{k+1} = \mathrm{Prox}_{\phi_\sigma} \circ (\mathrm{Id} - \nabla f)(x_k)$
Convergence (function value and iterates) [Attouch et al. ’13, Li and Lin ’15, Beck ’17]
• $\phi_\sigma$ and $f$ bounded from below
• $\phi_\sigma$ is $\frac{L}{L+1}$-semi-convex with $L < 1$
• $f$ of class $C^1$ with $L_f$-Lipschitz gradient, $L_f < \frac{L+2}{L+1}$
What about non-differentiable data-fidelity terms?
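PnP-DRS: $x_{k+1} = \left(\frac{1}{2}\mathrm{Id} + \frac{1}{2}(2\,\mathrm{Prox}_f - \mathrm{Id}) \circ (2\,\mathrm{Prox}_{\phi_\sigma} - \mathrm{Id})\right)(x_k)$
Reformulation (also equivalent to ADMM):
$y_{k+1} = \mathrm{Prox}_{\phi_\sigma}(x_k)$
$z_{k+1} = \mathrm{Prox}_f(2y_{k+1} - x_k)$
$x_{k+1} = x_k + (z_{k+1} - y_{k+1})$
Douglas-Rachford Envelope as Lyapunov function:
$F_{DR}(x) = \phi_\sigma(y) + f(z) + \langle y - x, y - z \rangle + \frac{1}{2}\|y - z\|^2$
Convergence for proper l.s.c. functions $f$ and $\phi_\sigma$ if one function has a Lipschitz gradient with constant $< 1$ [Themelis and Patrinos ’20]
$\phi_\sigma$ has a $\frac{L}{1-L}$-Lipschitz gradient: the residual $\mathrm{Id} - D_\sigma$ must be $L$-Lipschitz with $L < 0.5$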
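The three-step reformulation is easy to run on a scalar example where both proximal operators are known in closed form. The sketch below assumes $f(x) = \frac{1}{2}(x - a)^2$ and a convex quadratic $\phi(x) = \frac{c}{2}x^2$ with $c = 0.5$ (so $\nabla\phi$ is Lipschitz with constant $< 1$); these are hypothetical stand-ins, not the learned non-convex potential $\phi_\sigma$. The minimizer of $f + \phi$ is $a / (1 + c)$.

```python
A_OBS = 3.0  # hypothetical observation a in f(x) = 0.5 * (x - a)^2
C_REG = 0.5  # hypothetical curvature c in phi(x) = 0.5 * c * x^2

def prox_phi(v):
    """Prox of phi(x) = c/2 * x^2, standing in for the proximal denoiser."""
    return v / (1.0 + C_REG)

def prox_f(v):
    """Prox of f(x) = 0.5 * (x - a)^2."""
    return (v + A_OBS) / 2.0

def pnp_drs(x0, n_iter):
    """Douglas-Rachford splitting written as the three-step y, z, x update."""
    x = x0
    y = z = x0
    for _ in range(n_iter):
        y = prox_phi(x)           # y_{k+1} = Prox_phi(x_k)
        z = prox_f(2.0 * y - x)   # z_{k+1} = Prox_f(2 y_{k+1} - x_k)
        x = x + (z - y)           # x_{k+1} = x_k + (z_{k+1} - y_{k+1})
    return x, y, z

x, y, z = pnp_drs(x0=0.0, n_iter=100)
minimizer = A_OBS / (1.0 + C_REG)  # closed-form minimizer of f + phi, equal to 2.0
```

The auxiliary variable `x` converges to a fixed point of the splitting operator (here `3.0`), while `y` and `z` converge to the minimizer of $f + \phi$.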
Denoiser for convergent Plug-and-Play - ICLR, 2022
• S. Hurault, A. Leclaire, N. Papadakis - Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization - ICML, 2022
• S. Hurault, A. Chambolle, A. Leclaire, N. Papadakis - A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser, 2023
[Beck and Teboulle ’09]
Assuming
• $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ proper, convex, lower semicontinuous
• $g : \mathbb{R}^n \to \mathbb{R}$ differentiable with $L$-Lipschitz gradient
• $F = f + \lambda g$ bounded from below
Then, for $\tau < \frac{1}{\lambda L}$, the iterates $x_k$ given by the PGD algorithm $x_{k+1} = \mathrm{Prox}_{\tau f} \circ (\mathrm{Id} - \lambda\tau\nabla g)(x_k)$ verify
(i) $(F(x_k))$ is non-increasing and converges.
(ii) The residual $\|x_{k+1} - x_k\|$ converges to 0.
(iii) Cluster points of $(x_k)$ are stationary points of $F$.
Remark: Can be extended to non-convex data terms [Li and Lin ’15]
Assuming
• Same as before + $f$ and $g$ verify the Kurdyka-Łojasiewicz (KL) property
Then, for $\tau < \frac{1}{\lambda L}$, if the sequence $(x_k)$ given by $x_{k+1} = \mathrm{Prox}_{\tau f} \circ (\mathrm{Id} - \lambda\tau\nabla g)(x_k)$ is bounded, it converges, with finite length, to a critical point of $f + \lambda g$.
[Attouch et al. ’13, Li and Lin ’15, Beck ’17]
For $F = f + \phi_\sigma$, assume
• $\phi_\sigma$ such that $D_\sigma = \mathrm{Id} - \nabla g_\sigma = \mathrm{Prox}_{\phi_\sigma}$, with $g_\sigma : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ of class $C^2$, bounded from below, with $L$-Lipschitz gradient, $L < 1$
• $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ differentiable, bounded from below, with $L_f$-Lipschitz gradient, $L_f < 1$
Then the iterates $x_{k+1} = D_\sigma \circ (\mathrm{Id} - \nabla f)(x_k)$ of PnP-PGD satisfy
(i) $(F(x_k))$ is non-increasing and converges
(ii) The residual $\|x_{k+1} - x_k\|$ converges to 0
(iii) Cluster points of $(x_k)$ are stationary points of $F$
(iv) If $f$ and $g_\sigma$ are respectively KL and semi-algebraic, then if $(x_k)$ is bounded, it converges, with finite length, to a stationary point of $F$
For $F = f + \phi_\sigma$, assume
• $\phi_\sigma$ such that $D_\sigma = \mathrm{Id} - \nabla g_\sigma = \mathrm{Prox}_{\phi_\sigma}$, with $g_\sigma : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ of class $C^2$, bounded from below, with $L$-Lipschitz gradient (such that $2L^3 + L^2 + 2L - 1 < 0$)
• $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ proper, l.s.c. and bounded from below
Take the Douglas-Rachford Envelope as Lyapunov function:
$F_{DR}(x) = \phi_\sigma(y) + f(z) + \langle y - x, y - z \rangle + \frac{1}{2}\|y - z\|^2$
Then the iterates of PnP-DRS satisfy
(i) $(F_{DR}(x_k))$ is non-increasing and converges
(ii) The residual $\|y_k - z_k\|$ converges to 0
(iii) For any cluster point $(y^*, z^*, x^*)$, $y^*$ and $z^*$ are stationary points of $F$
(iv) If $f$ and $g_\sigma$ are KL and semi-algebraic, then if $(y_k, z_k, x_k)$ is bounded, it converges, and $y_k$ and $z_k$ converge to the same stationary point of $F$
Computing $J_{N_\sigma}(x)^T (x - N_\sigma(x))$:
    import torch
    # x must have requires_grad=True so that autograd can differentiate N with respect to x
    N = DRUNet_light(x, sigma)
    torch.autograd.grad(N, x, grad_outputs=x - N, create_graph=True, only_inputs=True)[0]
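Let $T : \mathbb{R}^n \to \mathbb{R}^n$ and $\theta \in (0, 1)$. $T$ is $\theta$-averaged if there is a nonexpansive operator $R$ s.t. $T = \theta R + (1 - \theta)\mathrm{Id}$.
• This is equivalent to: $\forall x, y \in \mathbb{R}^n$, $\|T(x) - T(y)\|^2 + \frac{1-\theta}{\theta}\|(\mathrm{Id} - T)(x) - (\mathrm{Id} - T)(y)\|^2 \le \|x - y\|^2$.
• $T$ $\theta$-averaged ⟹ $T$ nonexpansive.
• 1/2-averaged = “firmly nonexpansive”.
Theorem (Krasnosel’skiĭ-Mann)
Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a $\theta$-averaged operator such that $\mathrm{Fix}(T) \ne \emptyset$. Then the sequence $x_{k+1} = T(x_k)$ converges to a fixed point of $T$.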
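A standard example shows why averagedness matters and nonexpansiveness alone is not enough. In the sketch below, $R$ is the rotation of the plane by 90 degrees (an isometry whose only fixed point is the origin); this operator and the value $\theta = 1/2$ are illustrative choices. Iterating $R$ cycles forever, while iterating $T = \theta R + (1 - \theta)\mathrm{Id}$ converges to the fixed point.

```python
import math

def rotation(x):
    """R: rotation by 90 degrees, nonexpansive with Fix(R) = {0}."""
    return (-x[1], x[0])

def averaged(x, theta=0.5):
    """T = theta * R + (1 - theta) * Id, a theta-averaged operator."""
    r = rotation(x)
    return (theta * r[0] + (1.0 - theta) * x[0],
            theta * r[1] + (1.0 - theta) * x[1])

def iterate(op, x0, n_iter):
    """Apply x_{k+1} = op(x_k) n_iter times and return the last iterate."""
    x = x0
    for _ in range(n_iter):
        x = op(x)
    return x

x0 = (1.0, 0.0)
x_rot = iterate(rotation, x0, 100)  # stays on the unit circle, never converges
x_avg = iterate(averaged, x0, 100)  # norm shrinks by sqrt(1/2) per step, converges to 0
norm_rot = math.hypot(*x_rot)
norm_avg = math.hypot(*x_avg)
```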
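Let $T_1$ be $\theta_1$-averaged and $T_2$ be $\theta_2$-averaged, with any $\theta_1, \theta_2 \in (0, 1)$.
• Then $T_1 \circ T_2$ is $\theta$-averaged with $\theta = \frac{\theta_1 + \theta_2 - 2\theta_1\theta_2}{1 - \theta_1\theta_2} \in (0, 1)$.
• For $\alpha \in [0, 1]$, $\alpha T_1 + (1 - \alpha)\mathrm{Id}$ is $\alpha\theta_1$-averaged.
Proposition
If $f : \mathbb{R}^n \to \mathbb{R}$ is convex and $L$-smooth (i.e. differentiable with $L$-Lipschitz gradient), then
• $\mathrm{Prox}_{\tau f}$ is $\frac{\tau L}{2(1 + \tau L)}$-averaged for any $\tau > 0$.
• $\mathrm{Id} - \tau\nabla f$ is $\frac{\tau L}{2}$-averaged for $\tau < \frac{2}{L}$.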
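These rules can be turned into a small calculator for the averagedness constant of a composed operator such as $D_\sigma \circ (\mathrm{Id} - \tau\nabla f)$. The function names and the numerical values below are illustrative assumptions; the formulas are the ones stated above.

```python
def compose_averaged(theta1, theta2):
    """Averagedness constant of T1 o T2 for theta1- and theta2-averaged operators."""
    return (theta1 + theta2 - 2.0 * theta1 * theta2) / (1.0 - theta1 * theta2)

def gradient_step_constant(tau, L):
    """Averagedness constant of Id - tau * grad f for convex L-smooth f (needs tau < 2/L)."""
    if not 0.0 < tau * L < 2.0:
        raise ValueError("the gradient step is averaged only for 0 < tau * L < 2")
    return tau * L / 2.0

def prox_constant(tau, L):
    """Averagedness constant of Prox_{tau f} for convex L-smooth f and tau > 0."""
    return tau * L / (2.0 * (1.0 + tau * L))

# PnP-PGD operator with a firmly nonexpansive denoiser (theta = 1/2), tau = 1, L = 1.
theta_pgd = compose_averaged(0.5, gradient_step_constant(tau=1.0, L=1.0))
# PnP-HQS operator with the same denoiser and data term.
theta_hqs = compose_averaged(0.5, prox_constant(tau=1.0, L=1.0))
```

For instance, composing two firmly nonexpansive operators gives a 2/3-averaged operator, which is the value of `theta_pgd` here.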
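$x_{k+1} = T_{PnP}(x_k)$ with $T_{PnP}$ one of
• $T_{HQS} = D_\sigma \circ \mathrm{Prox}_{\tau f}$
• $T_{PGD} = D_\sigma \circ (\mathrm{Id} - \tau\nabla f)$
• $T_{DRS} = \frac{1}{2}\mathrm{Id} + \frac{1}{2}(2D_\sigma - \mathrm{Id}) \circ (2\,\mathrm{Prox}_{\tau f} - \mathrm{Id})$
Theorem
If $f : \mathbb{R}^n \to \mathbb{R}$ is convex and $L$-smooth, and $D_\sigma$ is $\theta$-averaged, $\theta \in (0, 1)$:
• PnP-HQS converges towards a fixed point of $T_{HQS}$.
• If $\tau L < 2$, PnP-PGD converges towards a fixed point of $T_{PGD}$.
• If $\theta \le 1/2$, PnP-DRS converges towards a fixed point of $T_{DRS}$.
Does not extend to nonconvex data-fidelity terms $f$.
• If $f$ is $L$-smooth and strongly convex, $\mathrm{Prox}_{\tau f}$ is contractive, and for $\tau L < 2$, $\mathrm{Id} - \tau\nabla f$ is contractive → assume $D_\sigma$ is $(1 + \varepsilon)$-Lipschitz [Ryu et al. ’19].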
$D_\sigma = \theta R_\sigma + (1 - \theta)\mathrm{Id}$ with $R_\sigma$ non-expansive.
→ How to train non-expansive networks?
• Spectral normalization [Miyato et al. ’18, Ryu et al. ’19]
  Lipschitz constant 1 for large networks. Does not allow skip connections.
• Deep spline neural networks [Goujon, Neumayer et al. ’22]
• Convolutional Proximal Neural Networks [Hertrich et al. ’20]
• Soft regularization of the training loss [Terris et al. ’21]
• $D_\sigma = \mathrm{Id} - \nabla g_\sigma$ with $g_\sigma$ an Input Convex Neural Network (ICNN) [Meunier et al. ’21]
Non-expansiveness can harm denoising performance.
Build a nonexpansive convolutional neural network (CNN) $D = T_M \circ \cdots \circ T_1$ with $T_m(x) = R_m(W_m x + b_m)$, where $R_m$ is an (averaged) activation function, $W_m$ a convolution, and $b_m$ a bias.
• We want to have $D = \frac{\mathrm{Id} + Q}{2}$ with $Q$ nonexpansive.
• During training, the Lipschitz constant of $2D - \mathrm{Id}$ is penalized.
a convolutional proximal neural network (cPNN) $\Phi_u = T_M \circ \cdots \circ T_1$ with $T_m(x) = W_m^T \sigma_m(W_m x + b_m)$, where $u = (W_m, \sigma_m, b_m)_{1 \le m \le M}$ is a collection of parameters.
The linear operators $W_m$ (or $W_m^T$) are convolutions lying in a Stiefel manifold $\mathrm{St}(d, n) = \{ W \in \mathbb{R}^{n \times d} \mid W^T W = \mathrm{Id} \}$.
The resulting denoiser is then $D = \mathrm{Id} - \gamma\Phi_u$.
• Ideally, $\Phi_u$ is a composition of $M$ firmly non-expansive operators, thus averaged.
• In practice, $W_m$ is a convolution with limited filter length.
• The condition $W_m \in \mathrm{St}$ is approximated with a term $\|W_m^T W_m - \mathrm{Id}\|_F^2$ in the learning cost.
• $\Phi_u$ is verified in practice to be $t$-averaged with $t$ close to $\frac{1}{2}$.
Approximate the proximal operator of a convex-ridge regularizer
$R(x) = \sum_{p=1}^{P} \sum_i \psi_p((h_p * x)(i))$
where $h_p$ are convolution kernels and $\psi_p$ are particular $C^1$ convex functions.
Given a noisy $y$, $\mathrm{Prox}_{\lambda R}(y) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|x - y\|^2 + \lambda R(x)$ is approximated with $t$ iterations of the gradient step $x \mapsto x - \alpha((x - y) + \lambda\nabla R(x))$.
The output after $t$ iterations is denoted by $T^t_{R,\lambda,\alpha}(y)$.
• $T^t_{R,\lambda,\alpha}$ approximates the prox of a convex function
• Linear spline parameterization of $\psi_p$ justified by a density result
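The unrolled gradient iterations can be sketched on a 1D signal. Everything specific below is an assumption made for illustration: a single finite-difference kernel $h = (-1, 1)$ applied without padding, the profile $\psi(t) = \sqrt{1 + t^2} - 1$ (smooth and convex, in place of a learned linear spline), and the values of `lam`, `alpha` and `t`. Since $\psi'' \le 1$ and $\|h\|_1^2 = 4$, the objective has a $(1 + 4\lambda)$-Lipschitz gradient, so `alpha = 0.3` is a safe step for `lam = 0.5`.

```python
import math

def psi_prime(t):
    """Derivative of the hypothetical profile psi(t) = sqrt(1 + t^2) - 1."""
    return t / math.sqrt(1.0 + t * t)

def grad_R(x):
    """Gradient of R(x) = sum_i psi(x[i+1] - x[i]), a one-kernel convex-ridge regularizer."""
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        s = psi_prime(x[i + 1] - x[i])
        grad[i] -= s      # d/dx[i] of psi(x[i+1] - x[i])
        grad[i + 1] += s  # d/dx[i+1] of psi(x[i+1] - x[i])
    return grad

def approx_prox(y, lam, alpha, t):
    """t gradient steps x <- x - alpha * ((x - y) + lam * grad R(x)), started at y."""
    x = list(y)
    for _ in range(t):
        g_r = grad_R(x)
        x = [xi - alpha * ((xi - yi) + lam * gi) for xi, yi, gi in zip(x, y, g_r)]
    return x

y = [0.0, 1.0, 0.0, 1.0, 0.0]  # oscillating "noisy" signal
x = approx_prox(y, lam=0.5, alpha=0.3, t=500)
# Optimality residual of the prox problem at the output (close to zero after many steps).
residual = [(xi - yi) + 0.5 * gi for xi, yi, gi in zip(x, y, grad_R(x))]
```

After enough iterations the output satisfies the optimality condition of the prox problem, and the oscillations of `y` are damped.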