Slide 1

Recovery Guarantees for Low Complexity Models
Samuel Vaiter
CEREMADE, Univ. Paris-Dauphine
[email protected]
October 24, 2013
Séminaire Image, GREYC, ENSICAEN

Slide 2

People
J. Fadili, G. Peyré, C. Dossal, M. Golbabaee, C. Deledalle
IMB, GREYC, CEREMADE

Slide 3

Papers
• V., M. Golbabaee, M. J. Fadili and G. Peyré, Model Selection with Piecewise Regular Gauges, Tech. report, http://arxiv.org/abs/1307.2342, 2013
• J. Fadili, V. and G. Peyré, Linear Convergence Rates for Gauge Regularization, ongoing work

Slide 4

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 5

Linear Inverse Problem: denoising, inpainting, deblurring.

Slide 6

Linear Inverse Problem: Forward Model
y = Φx0 + w
• y ∈ R^Q: observations
• Φ ∈ R^(Q×N): linear operator
• x0 ∈ R^N: unknown vector
• w ∈ R^Q: noise realization (bounded here)

Slide 7

Linear Inverse Problem: Forward Model
y = Φx0 + w
• y ∈ R^Q: observations
• Φ ∈ R^(Q×N): linear operator
• x0 ∈ R^N: unknown vector
• w ∈ R^Q: noise realization (bounded here)
Objective: recover x0 from y.

Slide 8

Linear Inverse Problem: Forward Model
y = Φx0 + w
• y ∈ R^Q: observations
• Φ ∈ R^(Q×N): linear operator
• x0 ∈ R^N: unknown vector
• w ∈ R^Q: noise realization (bounded here)
Objective: recover x0 from y.
Difficulty: the inversion of Φ is ill-posed.
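
A minimal numerical sketch of this forward model (the dimensions Q, N, the sparsity level s and the noise level are illustrative choices, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, N, s = 32, 128, 4                             # Q < N: underdetermined system
Phi = rng.standard_normal((Q, N)) / np.sqrt(Q)   # random linear operator
x0 = np.zeros(N)                                 # unknown s-sparse vector
x0[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
w = 0.01 * rng.standard_normal(Q)                # a (bounded) noise realization
y = Phi @ x0 + w                                 # observations
```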

Slide 9

Designing an Estimator
x0 → y (degradation)

Slide 10

Designing an Estimator
x0 → y (degradation) → x_M(y) (model M)

Slide 11

Designing an Estimator
x0 → y (degradation) → x_M(y) (model M)
Estimation error: between x_M(y) and x0.

Slide 12

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.

Slide 13

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.
• Data fidelity: ℓ2 loss, logistic, etc. F(y, x) = (1/2) ||y − Φx||_2^2

Slide 14

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.
• Data fidelity: ℓ2 loss, logistic, etc. F(y, x) = (1/2) ||y − Φx||_2^2
• Parameter λ: chosen by hand, or automatically (e.g., SURE).

Slide 15

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.
• Data fidelity: ℓ2 loss, logistic, etc. F(y, x) = (1/2) ||y − Φx||_2^2
• Parameter λ: chosen by hand, or automatically (e.g., SURE).
• Regularization: ?
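
For J = ||·||_1 and the ℓ2 data fidelity, (Pλ(y)) can be solved by proximal gradient descent (ISTA). A standard solver given here as a sketch, not an algorithm from the talk:

```python
import numpy as np

def soft_threshold(u, t):
    # prox of t*||.||_1: shrink every entry towards 0 by t
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    # minimize 0.5*||y - Phi x||_2^2 + lam*||x||_1
    L = np.linalg.norm(Phi, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)     # gradient of the data fidelity
        x = soft_threshold(x - grad / L, lam / L)
    return x
```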

Slide 16

A Zoo...? Block Sparsity, Nuclear Norm, Trace Lasso, Polyhedral, Antisparsity, Total Variation.

Slide 17

Relations to Previous Works
• Fuchs, J. J. (2004). On sparse representations in arbitrary redundant bases.
• Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise.
• Grasmair, M. et al. (2008). Sparse regularization with ℓq penalty term.
• Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning & Consistency of trace norm minimization.
• V. et al. (2011). Robust sparse analysis regularization.
• Grasmair, M. et al. (2011). Necessary and sufficient conditions for linear convergence of ℓ1-regularization.
• Grasmair, M. (2011). Linear convergence rates for Tikhonov regularization with positively homogeneous functionals.
(and more!)

Slide 18

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 19

The Sparse Way
Sparse approximation: most wavelet coefficients are 0.

Slide 20

Back to the Source: Union of Linear Models
A 2-component signal.

Slide 21

Back to the Source: Union of Linear Models
A 2-component signal. T_0: 0 is the only 0-sparse vector.

Slide 22

Back to the Source: Union of Linear Models
A 2-component signal. T_e1 and T_e2: points on the axes are 1-sparse (except 0).

Slide 23

Back to the Source: Union of Linear Models
A 2-component signal. Points of the whole space off the axes are 2-sparse.

Slide 24

ℓ0 to ℓ1
Combinatorial penalty associated to the previous union of models:
J(x) = ||x||_0 = |{i : x_i ≠ 0}|

Slide 25

ℓ0 to ℓ1
Combinatorial penalty associated to the previous union of models:
J(x) = ||x||_0 = |{i : x_i ≠ 0}|
→ non-convex → no regularity → NP-hard regularization
Goal: encode the union of models in a good functional.

Slide 26

ℓ0 to ℓ1
Combinatorial penalty associated to the previous union of models:
J(x) = ||x||_0 = |{i : x_i ≠ 0}|
→ non-convex → no regularity → NP-hard regularization
Goal: encode the union of models in a good functional.
[plots of ||x||_0 and its convex relaxation ||x||_1]
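
The relaxation is concrete at the level of proximal operators: hard thresholding for ℓ0 versus soft thresholding for ℓ1. A sketch (these scalar formulas are standard, not from the slides):

```python
import numpy as np

def prox_l0(u, lam):
    # argmin_x 0.5*(x - u)^2 + lam*1{x != 0}: keep u iff 0.5*u^2 > lam
    return np.where(np.abs(u) > np.sqrt(2.0 * lam), u, 0.0)

def prox_l1(u, lam):
    # argmin_x 0.5*(x - u)^2 + lam*|x|: soft thresholding
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
```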

Slide 27

Union of Linear Models to Regularizations
Union of Models (combinatorial world) ↔ Gauges (functional world)

Slide 28

Gauge
J(x) ≥ 0, J(λx) = λJ(x) for λ ≥ 0, J convex
x ↦ J(x)  ↔  C = {x : J(x) ≤ 1}, a convex set

Slide 29

Gauge
J(x) ≥ 0, J(λx) = λJ(x) for λ ≥ 0, J convex
x ↦ J(x)  ↔  C = {x : J(x) ≤ 1}, a convex set
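
A gauge is the Minkowski functional of its unit ball C, J(x) = inf {t > 0 : x/t ∈ C}. The sketch below evaluates it by bisection from a membership oracle (the oracle approach and all names are illustrative); with C the ℓ1 ball it recovers ||x||_1:

```python
import numpy as np

def gauge(x, in_C, t_max=1e6, n_iter=60):
    # J(x) = inf { t > 0 : x/t in C }, by bisection on t
    lo, hi = 0.0, t_max
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if in_C(x / mid):
            hi = mid          # x/mid lies in C, so J(x) <= mid
        else:
            lo = mid
    return hi

in_l1_ball = lambda z: np.sum(np.abs(z)) <= 1.0
x = np.array([0.5, -1.5, 2.0])
print(gauge(x, in_l1_ball), np.sum(np.abs(x)))   # both close to 4.0
```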

Slide 30

Subdifferential
[graph of f with the point (x, x²)]

Slide 31

Subdifferential
[graph of f with the point (x, x²)]

Slide 32

Subdifferential
[graph of f]

Slide 33

Subdifferential
∂f(x) = {η : f(x′) ≥ f(x) + ⟨η, x′ − x⟩ for all x′}
(the slopes of lines lying below the graph of f)

Slide 34

Some Properties of the Subdifferential
• f bounded ⇒ ∂f(x) is a non-empty convex set
• f Gâteaux-differentiable ⇔ ∂f(x) = {∇f(x)}
• 0 ∈ ∂f(x) ⇔ x is a minimizer of f

Slide 35

Some Properties of the Subdifferential
• f bounded ⇒ ∂f(x) is a non-empty convex set
• f Gâteaux-differentiable ⇔ ∂f(x) = {∇f(x)}
• 0 ∈ ∂f(x) ⇔ x is a minimizer of f
∂|·|(x) = {sign(x)} if x ≠ 0, [−1, 1] if x = 0
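
Together with ∂|·|, the rule 0 ∈ ∂f(x⋆) gives a computable optimality test for (Pλ(y)) with J = ||·||_1: the gradient of the data fidelity must equal −λ sign(x⋆) on the support and be bounded by λ elsewhere. A sketch (function name and tolerance are illustrative):

```python
import numpy as np

def is_l1_optimal(Phi, y, lam, x, tol=1e-6):
    # check 0 in grad F(x) + lam * subdifferential of ||.||_1 at x
    g = Phi.T @ (Phi @ x - y)          # gradient of 0.5*||y - Phi x||_2^2
    I = np.abs(x) > tol                # support of x
    on_supp = np.allclose(g[I], -lam * np.sign(x[I]), atol=tol)
    off_supp = np.all(np.abs(g[~I]) <= lam + tol)
    return on_supp and off_supp
```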

Slide 36

The Model Linear Space
[illustration: a point x]

Slide 37

The Model Linear Space
[illustration: x and ∂J(x)]

Slide 38

The Model Linear Space
[illustration: x and ∂J(x)]

Slide 39

The Model Linear Space
T_x = VectHull(∂J(x))^⊥
[illustration: x, ∂J(x) and T_x]

Slide 40

The Model Linear Space
T_x = VectHull(∂J(x))^⊥
e_x = P_{T_x}(∂J(x))
[illustration: x, ∂J(x), T_x and e_x]

Slide 41

Special Cases
Sparsity: T_x = {η : supp(η) ⊆ supp(x)}, e_x = sign(x)

Slide 42

Special Cases
Sparsity: T_x = {η : supp(η) ⊆ supp(x)}, e_x = sign(x)
(Aniso/Iso)tropic Total Variation: T_x = {η : supp(∇η) ⊆ supp(∇x)}, e_x = sign(∇x) or e_x = ∇x/||∇x||

Slide 43

Special Cases
Sparsity: T_x = {η : supp(η) ⊆ supp(x)}, e_x = sign(x)
(Aniso/Iso)tropic Total Variation: T_x = {η : supp(∇η) ⊆ supp(∇x)}, e_x = sign(∇x) or e_x = ∇x/||∇x||
Trace Norm: SVD x = UΛV*, T_x = {η : U_⊥* η V_⊥ = 0}, e_x = UV*
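
For the trace norm, T_x and e_x are computable from the SVD. A sketch (helper names are illustrative):

```python
import numpy as np

def trace_norm_model(X, tol=1e-10):
    # X = U Lam V*: e_x = U V*, and P_{T_x}(Z) = Z - P_{U_perp} Z P_{V_perp}
    U, svals, Vt = np.linalg.svd(X)
    r = int(np.sum(svals > tol))           # rank of X
    Ur, Vr = U[:, :r], Vt[:r, :].T
    e_x = Ur @ Vr.T                        # generalized sign of X
    P_U, P_V = Ur @ Ur.T, Vr @ Vr.T        # projectors on column/row spans
    def proj_T(Z):
        m, n = Z.shape
        return Z - (np.eye(m) - P_U) @ Z @ (np.eye(n) - P_V)
    return proj_T, e_x
```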

Slide 44

Algebraic Stability
Composition with a linear operator:
• ||∇ · ||_1 — Anisotropic TV
• ||∇ · ||_{1,2} — Isotropic TV
• ||U diag(·)||_* — Trace Lasso

Slide 45

Algebraic Stability
Composition with a linear operator:
• ||∇ · ||_1 — Anisotropic TV
• ||∇ · ||_{1,2} — Isotropic TV
• ||U diag(·)||_* — Trace Lasso
Sum of gauges (composite priors):
• || · ||_1 + ||∇ · ||_1 — Sparse TV
• || · ||_1 + || · ||_2 — Elastic net
• || · ||_1 + || · ||_* — Sparse + Low-rank

Slide 46

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 47

What’s the Robustness?
x0 → y (degradation) → x_M(y) (model M)
Estimation error: between x_M(y) and x0.

Slide 48

What’s the Robustness?
x0 → y (degradation) → x_M(y) (model M)
• Data fidelity loss: ||x⋆ − x0||
• Prediction loss: ||Φx⋆ − Φx0||
• Regularization loss: J(x⋆ − x0), a Taylor/Bregman metric
• Model selection: T_{x⋆} = T_{x0}

Slide 49

Certificate
x⋆ ∈ argmin_{Φx = Φx0} J(x)    (P0(y))
[illustration: the affine set Φx = Φx0]

Slide 50

Certificate
x⋆ ∈ argmin_{Φx = Φx0} J(x)    (P0(y))
[illustration: ∂J(x), the affine set Φx = Φx0, and a certificate η]
Dual certificates: D = Im Φ* ∩ ∂J(x0)

Slide 51

Certificate
x⋆ ∈ argmin_{Φx = Φx0} J(x)    (P0(y))
[illustration: ∂J(x), the affine set Φx = Φx0, and a certificate η]
Dual certificates: D = Im Φ* ∩ ∂J(x0)
Proposition: ∃η ∈ D ⇔ x0 is a solution of (P0(y))
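
For J = ||·||_1, membership in D = Im Φ* ∩ ∂J(x0) is easy to test: η = Φ*q must equal sign(x0) on the support and have sup-norm at most 1. A sketch:

```python
import numpy as np

def is_l1_certificate(Phi, q, x0, tol=1e-8):
    eta = Phi.T @ q                       # eta lies in Im Phi^*
    I = np.abs(x0) > tol                  # support of x0
    return (np.allclose(eta[I], np.sign(x0[I]), atol=tol)
            and np.all(np.abs(eta) <= 1.0 + tol))
```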

Slide 52

Tight Certificate and Restricted Injectivity
Tight dual certificates: D̄ = Im Φ* ∩ ri ∂J(x)

Slide 53

Tight Certificate and Restricted Injectivity
Tight dual certificates: D̄ = Im Φ* ∩ ri ∂J(x)
Restricted injectivity: Ker Φ ∩ T_x = {0}    (RIC_x)

Slide 54

Tight Certificate and Restricted Injectivity
Tight dual certificates: D̄ = Im Φ* ∩ ri ∂J(x)
Restricted injectivity: Ker Φ ∩ T_x = {0}    (RIC_x)
Proposition: ∃η ∈ D̄ and (RIC_x) ⇒ x is the unique solution of (Pλ(y))

Slide 55

ℓ2 Stability
Theorem: If ∃η ∈ D̄ and (RIC_x0) hold, then λ ∼ ||w|| ⇒ ||x⋆ − x0|| = O(||w||)
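
An empirical check of this scaling, reusing Phi, x0, Q and rng from the forward-model sketch and ista from the solver sketch above (λ = ||w|| is one illustrative choice of proportionality):

```python
import numpy as np

for sigma in (1e-3, 1e-2, 1e-1):
    w = sigma * rng.standard_normal(Q)
    x_sol = ista(Phi, Phi @ x0 + w, lam=np.linalg.norm(w))
    print(sigma, np.linalg.norm(x_sol - x0))   # error should scale like ||w||
```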

Slide 56

Minimal-norm Certificate
η ∈ D ⇔ η = Φ*q, η_T = e and J°(η) ≤ 1

Slide 57

Minimal-norm Certificate
η ∈ D ⇔ η = Φ*q, η_T = e and J°(η) ≤ 1
Minimal-norm precertificate: η0 = argmin_{η = Φ*q, η_T = e} ||q||

Slide 58

Minimal-norm Certificate
η ∈ D ⇔ η = Φ*q, η_T = e and J°(η) ≤ 1
Minimal-norm precertificate: η0 = argmin_{η = Φ*q, η_T = e} ||q||
Proposition: If (RIC_x) holds, then η0 = (Φ_T^+ Φ)* e
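
For J = ||·||_1 this closed form specializes to T = {η : supp(η) ⊆ supp(x0)} and e = sign(x0). A sketch computing η0 and the off-support norm that decides whether η0 ∈ D̄:

```python
import numpy as np

def minimal_norm_precertificate(Phi, x0, tol=1e-10):
    I = np.abs(x0) > tol                   # support of x0, defining T
    e = np.sign(x0[I])                     # e = sign(x0) on the support
    q0 = np.linalg.pinv(Phi[:, I]).T @ e   # q0 = Phi_T^{+,*} e (minimal norm)
    eta0 = Phi.T @ q0                      # eta0 = (Phi_T^+ Phi)^* e
    return eta0, np.max(np.abs(eta0[~I]))  # eta0 in D-bar iff this norm < 1
```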

Slide 59

Model Selection
Theorem: If η0 ∈ D̄, the noise-to-signal ratio is low enough and λ ∼ ||w||, then the unique solution x⋆ of (Pλ(y)) satisfies
T_{x⋆} = T_{x0} and ||x⋆ − x0|| = O(||w||)

Slide 60

A Better Certificate?
• With model selection: no

Slide 61

A Better Certificate?
• With model selection: no
• Without: ongoing work (Duval-Peyré: sparse deconvolution; Dossal: sparse tomography)

Slide 62

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 63

Sparse Spike Deconvolution (Dossal, 2005)
[illustration: the spike train x0]

Slide 64

Sparse Spike Deconvolution (Dossal, 2005)
Φx = Σ_i x_i φ(· − Δi),  J(x) = ||x||_1
[plots of x0 and its blurred observation Φx0, with parameter γ]

Slide 65

Sparse Spike Deconvolution (Dossal, 2005)
Φx = Σ_i x_i φ(· − Δi),  J(x) = ||x||_1
[plots of x0 and its blurred observation Φx0, with parameter γ]
η0 ∈ D̄ ⇔ ||Φ_{Ic}^* Φ_I^{+,*} s||_∞ < 1, with I = supp(x0) and s = sign(x0)
[plot: ||η0,Ic||_∞ against γ, crossing the level 1 at γcrit]
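
A sketch of this setting with a Gaussian kernel φ, reusing minimal_norm_precertificate from above to probe the certificate as the spike separation shrinks (grid size, kernel and width are illustrative choices):

```python
import numpy as np

def deconv_operator(n_samples, positions, width):
    # columns are normalized shifted kernels phi(. - Delta_i)
    t = np.arange(n_samples)[:, None]
    Phi = np.exp(-0.5 * ((t - np.asarray(positions)[None, :]) / width) ** 2)
    return Phi / np.linalg.norm(Phi, axis=0)

n = 200
Phi = deconv_operator(n, np.arange(n), width=8.0)   # all integer shifts
for gap in (40, 20, 10, 5):                         # spike separation
    x0 = np.zeros(n)
    x0[80], x0[80 + gap] = 1.0, -1.0                # two opposite spikes
    _, off_norm = minimal_norm_precertificate(Phi, x0)
    print(gap, off_norm)       # tends to cross 1 once the spikes get too close
```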

Slide 66

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i)]

Slide 67

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i), with their jump patterns m_k at levels +1/−1]

Slide 68

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i), with their jump patterns m_k at levels +1/−1]
Left: support stability. Right: no support stability.

Slide 69

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i), with their jump patterns m_k at levels +1/−1]
Left: support stability. Right: no support stability.
Both are ℓ2-stable.
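
The 1D TV model quantities (jump support and jump signs) with forward differences for ∇, as a sketch:

```python
import numpy as np

def tv_model(x, tol=1e-10):
    dx = np.diff(x)                          # discrete gradient
    jumps = np.abs(dx) > tol                 # supp(grad x): where T_x allows jumps
    e = np.where(jumps, np.sign(dx), 0.0)    # e_x = sign(grad x) on the jumps
    return jumps, e

x = np.array([0., 0., 2., 2., 2., 1., 1.])   # staircase signal
print(tv_model(x))                           # jumps at indices 1 and 4
```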

Slide 70

2D TV Denoising
Φ = Id,  J(x) = ||(∇→, ∇↑)x||_1
[images: the horizontal (∇→) and vertical (∇↑) finite differences]

Slide 71

2D TV Denoising
Φ = Id,  J(x) = ||(∇→, ∇↑)x||_1
[images: the horizontal (∇→) and vertical (∇↑) finite differences]

Slide 72

Open Problems
Union of Models (combinatorial world) ↔ Gauges (functional world)

Slide 73

Thanks for your attention!