Slide 1

Recovery Guarantees for Low Complexity Models
Samuel Vaiter
CEREMADE, Univ. Paris-Dauphine
[email protected]
October 24, 2013
Séminaire Image, GREYC, ENSICAEN

Slide 2

People
J. Fadili, G. Peyré, C. Dossal, M. Golbabaee, C. Deledalle
IMB, GREYC, CEREMADE

Slide 3

Papers
• V., M. Golbabaee, M. J. Fadili and G. Peyré, Model Selection with Piecewise Regular Gauges, Tech. report, http://arxiv.org/abs/1307.2342, 2013
• J. Fadili, V. and G. Peyré, Linear Convergence Rates for Gauge Regularization, ongoing work

Slide 4

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 5

Linear Inverse Problem: denoising, inpainting, deblurring.

Slide 6

Linear Inverse Problem: Forward Model
y = Φx0 + w
• y ∈ R^Q: observations
• Φ ∈ R^(Q×N): linear operator
• x0 ∈ R^N: unknown vector
• w ∈ R^Q: noise realization (bounded here)

Slide 7

Linear Inverse Problem: Forward Model
y = Φx0 + w
• y ∈ R^Q: observations
• Φ ∈ R^(Q×N): linear operator
• x0 ∈ R^N: unknown vector
• w ∈ R^Q: noise realization (bounded here)
Objective: recover x0 from y.

Slide 8

Linear Inverse Problem: Forward Model
y = Φx0 + w
• y ∈ R^Q: observations
• Φ ∈ R^(Q×N): linear operator
• x0 ∈ R^N: unknown vector
• w ∈ R^Q: noise realization (bounded here)
Objective: recover x0 from y.
Difficulty: the inversion of Φ is ill-posed.
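
A minimal numerical sketch of this forward model (the dimensions Q, N, the sparsity level s and the noise level are illustrative choices, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, N, s = 32, 128, 4                             # Q < N: underdetermined system
Phi = rng.standard_normal((Q, N)) / np.sqrt(Q)   # random linear operator
x0 = np.zeros(N)                                 # unknown s-sparse vector
x0[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
w = 0.01 * rng.standard_normal(Q)                # a (bounded) noise realization
y = Phi @ x0 + w                                 # observations
```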

Slide 9

Designing an Estimator
x0 → y (degradation)

Slide 10

Designing an Estimator
x0 → y (degradation) → x_M(y) (model M)

Slide 11

Designing an Estimator
x0 → y (degradation) → x_M(y) (model M)
Estimation error: between x_M(y) and x0.

Slide 12

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.

Slide 13

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.
• Data fidelity: ℓ2 loss, logistic, etc. F(y, x) = (1/2) ||y − Φx||_2^2

Slide 14

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.
• Data fidelity: ℓ2 loss, logistic, etc. F(y, x) = (1/2) ||y − Φx||_2^2
• Parameter λ: chosen by hand, or automatically (e.g., SURE).

Slide 15

The Variational Approach
x⋆ ∈ argmin_{x ∈ R^N} F(y, x) + λ J(x)    (Pλ(y))
Trade-off between data fidelity and prior regularization.
• Data fidelity: ℓ2 loss, logistic, etc. F(y, x) = (1/2) ||y − Φx||_2^2
• Parameter λ: chosen by hand, or automatically (e.g., SURE).
• Regularization: ?
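
For J = ||·||_1 and the ℓ2 data fidelity, (Pλ(y)) can be solved by proximal gradient descent (ISTA). A standard solver given here as a sketch, not an algorithm from the talk:

```python
import numpy as np

def soft_threshold(u, t):
    # prox of t*||.||_1: shrink every entry towards 0 by t
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    # minimize 0.5*||y - Phi x||_2^2 + lam*||x||_1
    L = np.linalg.norm(Phi, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)     # gradient of the data fidelity
        x = soft_threshold(x - grad / L, lam / L)
    return x
```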

Slide 16

A Zoo...? Block Sparsity, Nuclear Norm, Trace Lasso, Polyhedral, Antisparsity, Total Variation.

Slide 17

Relations to Previous Works
• Fuchs, J. J. (2004). On sparse representations in arbitrary redundant bases.
• Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise.
• Grasmair, M. et al. (2008). Sparse regularization with ℓq penalty term.
• Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning & Consistency of trace norm minimization.
• V. et al. (2011). Robust sparse analysis regularization.
• Grasmair, M. et al. (2011). Necessary and sufficient conditions for linear convergence of ℓ1-regularization.
• Grasmair, M. (2011). Linear convergence rates for Tikhonov regularization with positively homogeneous functionals.
(and more!)

Slide 18

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 19

The Sparse Way
Sparse approximation: most wavelet coefficients are 0.

Slide 20

Back to the Source: Union of Linear Models
A 2-component signal.

Slide 21

Back to the Source: Union of Linear Models
A 2-component signal. T_0: 0 is the only 0-sparse vector.

Slide 22

Back to the Source: Union of Linear Models
A 2-component signal. T_e1 and T_e2: points on the axes are 1-sparse (except 0).

Slide 23

Back to the Source: Union of Linear Models
A 2-component signal. Points of the whole space off the axes are 2-sparse.

Slide 24

ℓ0 to ℓ1
Combinatorial penalty associated to the previous union of models:
J(x) = ||x||_0 = |{i : x_i ≠ 0}|

Slide 25

ℓ0 to ℓ1
Combinatorial penalty associated to the previous union of models:
J(x) = ||x||_0 = |{i : x_i ≠ 0}|
→ non-convex → no regularity → NP-hard regularization
Goal: encode the union of models in a good functional.

Slide 26

ℓ0 to ℓ1
Combinatorial penalty associated to the previous union of models:
J(x) = ||x||_0 = |{i : x_i ≠ 0}|
→ non-convex → no regularity → NP-hard regularization
Goal: encode the union of models in a good functional.
[plots of ||x||_0 and its convex relaxation ||x||_1]
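
The relaxation is concrete at the level of proximal operators: hard thresholding for ℓ0 versus soft thresholding for ℓ1. A sketch (these scalar formulas are standard, not from the slides):

```python
import numpy as np

def prox_l0(u, lam):
    # argmin_x 0.5*(x - u)^2 + lam*1{x != 0}: keep u iff 0.5*u^2 > lam
    return np.where(np.abs(u) > np.sqrt(2.0 * lam), u, 0.0)

def prox_l1(u, lam):
    # argmin_x 0.5*(x - u)^2 + lam*|x|: soft thresholding
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
```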

Slide 27

Union of Linear Models to Regularizations
Union of Models (combinatorial world) ↔ Gauges (functional world)

Slide 28

Gauge
J(x) ≥ 0, J(λx) = λJ(x) for λ ≥ 0, J convex
x ↦ J(x)  ↔  C = {x : J(x) ≤ 1}, a convex set

Slide 29

Gauge
J(x) ≥ 0, J(λx) = λJ(x) for λ ≥ 0, J convex
x ↦ J(x)  ↔  C = {x : J(x) ≤ 1}, a convex set
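
A gauge is the Minkowski functional of its unit ball C, J(x) = inf {t > 0 : x/t ∈ C}. The sketch below evaluates it by bisection from a membership oracle (the oracle approach and all names are illustrative); with C the ℓ1 ball it recovers ||x||_1:

```python
import numpy as np

def gauge(x, in_C, t_max=1e6, n_iter=60):
    # J(x) = inf { t > 0 : x/t in C }, by bisection on t
    lo, hi = 0.0, t_max
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if in_C(x / mid):
            hi = mid          # x/mid lies in C, so J(x) <= mid
        else:
            lo = mid
    return hi

in_l1_ball = lambda z: np.sum(np.abs(z)) <= 1.0
x = np.array([0.5, -1.5, 2.0])
print(gauge(x, in_l1_ball), np.sum(np.abs(x)))   # both close to 4.0
```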

Slide 30

Subdifferential
[graph of f with the point (x, x²)]

Slide 31

Subdifferential
[graph of f with the point (x, x²)]

Slide 32

Subdifferential
[graph of f]

Slide 33

Subdifferential
∂f(x) = {η : f(x′) ≥ f(x) + ⟨η, x′ − x⟩ for all x′}
(the slopes of lines lying below the graph of f)

Slide 34

Some Properties of the Subdifferential
• f bounded ⇒ ∂f(x) is a non-empty convex set
• f Gâteaux-differentiable ⇔ ∂f(x) = {∇f(x)}
• 0 ∈ ∂f(x) ⇔ x is a minimizer of f

Slide 35

Some Properties of the Subdifferential
• f bounded ⇒ ∂f(x) is a non-empty convex set
• f Gâteaux-differentiable ⇔ ∂f(x) = {∇f(x)}
• 0 ∈ ∂f(x) ⇔ x is a minimizer of f
∂|·|(x) = {sign(x)} if x ≠ 0, [−1, 1] if x = 0
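
Together with ∂|·|, the rule 0 ∈ ∂f(x⋆) gives a computable optimality test for (Pλ(y)) with J = ||·||_1: the gradient of the data fidelity must equal −λ sign(x⋆) on the support and be bounded by λ elsewhere. A sketch (function name and tolerance are illustrative):

```python
import numpy as np

def is_l1_optimal(Phi, y, lam, x, tol=1e-6):
    # check 0 in grad F(x) + lam * subdifferential of ||.||_1 at x
    g = Phi.T @ (Phi @ x - y)          # gradient of 0.5*||y - Phi x||_2^2
    I = np.abs(x) > tol                # support of x
    on_supp = np.allclose(g[I], -lam * np.sign(x[I]), atol=tol)
    off_supp = np.all(np.abs(g[~I]) <= lam + tol)
    return on_supp and off_supp
```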

Slide 36

The Model Linear Space
[illustration: a point x]

Slide 37

The Model Linear Space
[illustration: x and ∂J(x)]

Slide 38

The Model Linear Space
[illustration: x and ∂J(x)]

Slide 39

The Model Linear Space
T_x = VectHull(∂J(x))^⊥
[illustration: x, ∂J(x) and T_x]

Slide 40

The Model Linear Space
T_x = VectHull(∂J(x))^⊥
e_x = P_{T_x}(∂J(x))
[illustration: x, ∂J(x), T_x and e_x]

Slide 41

Special Cases
Sparsity: T_x = {η : supp(η) ⊆ supp(x)}, e_x = sign(x)

Slide 42

Special Cases
Sparsity: T_x = {η : supp(η) ⊆ supp(x)}, e_x = sign(x)
(Aniso/Iso)tropic Total Variation: T_x = {η : supp(∇η) ⊆ supp(∇x)}, e_x = sign(∇x) or e_x = ∇x/||∇x||

Slide 43

Special Cases
Sparsity: T_x = {η : supp(η) ⊆ supp(x)}, e_x = sign(x)
(Aniso/Iso)tropic Total Variation: T_x = {η : supp(∇η) ⊆ supp(∇x)}, e_x = sign(∇x) or e_x = ∇x/||∇x||
Trace Norm: SVD x = UΛV*, T_x = {η : U_⊥* η V_⊥ = 0}, e_x = UV*
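
For the trace norm, T_x and e_x are computable from the SVD. A sketch (helper names are illustrative):

```python
import numpy as np

def trace_norm_model(X, tol=1e-10):
    # X = U Lam V*: e_x = U V*, and P_{T_x}(Z) = Z - P_{U_perp} Z P_{V_perp}
    U, svals, Vt = np.linalg.svd(X)
    r = int(np.sum(svals > tol))           # rank of X
    Ur, Vr = U[:, :r], Vt[:r, :].T
    e_x = Ur @ Vr.T                        # generalized sign of X
    P_U, P_V = Ur @ Ur.T, Vr @ Vr.T        # projectors on column/row spans
    def proj_T(Z):
        m, n = Z.shape
        return Z - (np.eye(m) - P_U) @ Z @ (np.eye(n) - P_V)
    return proj_T, e_x
```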

Slide 44

Algebraic Stability
Composition with a linear operator:
• ||∇ · ||_1 — Anisotropic TV
• ||∇ · ||_{1,2} — Isotropic TV
• ||U diag(·)||_* — Trace Lasso

Slide 45

Algebraic Stability
Composition with a linear operator:
• ||∇ · ||_1 — Anisotropic TV
• ||∇ · ||_{1,2} — Isotropic TV
• ||U diag(·)||_* — Trace Lasso
Sum of gauges (composite priors):
• || · ||_1 + ||∇ · ||_1 — Sparse TV
• || · ||_1 + || · ||_2 — Elastic net
• || · ||_1 + || · ||_* — Sparse + Low-rank

Slide 46

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 47

What’s the Robustness?
x0 → y (degradation) → x_M(y) (model M)
Estimation error: between x_M(y) and x0.

Slide 48

What’s the Robustness?
x0 → y (degradation) → x_M(y) (model M)
• Data fidelity loss: ||x⋆ − x0||
• Prediction loss: ||Φx⋆ − Φx0||
• Regularization loss: J(x⋆ − x0), a Taylor/Bregman metric
• Model selection: T_{x⋆} = T_{x0}

Slide 49

Certificate
x⋆ ∈ argmin_{Φx = Φx0} J(x)    (P0(y))
[illustration: the affine set Φx = Φx0]

Slide 50

Certificate
x⋆ ∈ argmin_{Φx = Φx0} J(x)    (P0(y))
[illustration: ∂J(x), the affine set Φx = Φx0, and a certificate η]
Dual certificates: D = Im Φ* ∩ ∂J(x0)

Slide 51

Certificate
x⋆ ∈ argmin_{Φx = Φx0} J(x)    (P0(y))
[illustration: ∂J(x), the affine set Φx = Φx0, and a certificate η]
Dual certificates: D = Im Φ* ∩ ∂J(x0)
Proposition: ∃η ∈ D ⇔ x0 is a solution of (P0(y))
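
For J = ||·||_1, membership in D = Im Φ* ∩ ∂J(x0) is easy to test: η = Φ*q must equal sign(x0) on the support and have sup-norm at most 1. A sketch:

```python
import numpy as np

def is_l1_certificate(Phi, q, x0, tol=1e-8):
    eta = Phi.T @ q                       # eta lies in Im Phi^*
    I = np.abs(x0) > tol                  # support of x0
    return (np.allclose(eta[I], np.sign(x0[I]), atol=tol)
            and np.all(np.abs(eta) <= 1.0 + tol))
```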

Slide 52

Tight Certificate and Restricted Injectivity
Tight dual certificates: D̄ = Im Φ* ∩ ri ∂J(x)

Slide 53

Tight Certificate and Restricted Injectivity
Tight dual certificates: D̄ = Im Φ* ∩ ri ∂J(x)
Restricted injectivity: Ker Φ ∩ T_x = {0}    (RIC_x)

Slide 54

Tight Certificate and Restricted Injectivity
Tight dual certificates: D̄ = Im Φ* ∩ ri ∂J(x)
Restricted injectivity: Ker Φ ∩ T_x = {0}    (RIC_x)
Proposition: ∃η ∈ D̄ and (RIC_x) ⇒ x is the unique solution of (Pλ(y))

Slide 55

ℓ2 Stability
Theorem: If ∃η ∈ D̄ and (RIC_x0) hold, then λ ∼ ||w|| ⇒ ||x⋆ − x0|| = O(||w||)
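
An empirical check of this scaling, reusing Phi, x0, Q and rng from the forward-model sketch and ista from the solver sketch above (λ = ||w|| is one illustrative choice of proportionality):

```python
import numpy as np

for sigma in (1e-3, 1e-2, 1e-1):
    w = sigma * rng.standard_normal(Q)
    x_sol = ista(Phi, Phi @ x0 + w, lam=np.linalg.norm(w))
    print(sigma, np.linalg.norm(x_sol - x0))   # error should scale like ||w||
```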

Slide 56

Minimal-norm Certificate
η ∈ D ⇔ η = Φ*q, η_T = e and J°(η) ≤ 1

Slide 57

Minimal-norm Certificate
η ∈ D ⇔ η = Φ*q, η_T = e and J°(η) ≤ 1
Minimal-norm precertificate: η0 = argmin_{η = Φ*q, η_T = e} ||q||

Slide 58

Minimal-norm Certificate
η ∈ D ⇔ η = Φ*q, η_T = e and J°(η) ≤ 1
Minimal-norm precertificate: η0 = argmin_{η = Φ*q, η_T = e} ||q||
Proposition: If (RIC_x) holds, then η0 = (Φ_T^+ Φ)* e
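
For J = ||·||_1 this closed form specializes to T = {η : supp(η) ⊆ supp(x0)} and e = sign(x0). A sketch computing η0 and the off-support norm that decides whether η0 ∈ D̄:

```python
import numpy as np

def minimal_norm_precertificate(Phi, x0, tol=1e-10):
    I = np.abs(x0) > tol                   # support of x0, defining T
    e = np.sign(x0[I])                     # e = sign(x0) on the support
    q0 = np.linalg.pinv(Phi[:, I]).T @ e   # q0 = Phi_T^{+,*} e (minimal norm)
    eta0 = Phi.T @ q0                      # eta0 = (Phi_T^+ Phi)^* e
    return eta0, np.max(np.abs(eta0[~I]))  # eta0 in D-bar iff this norm < 1
```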

Slide 59

Model Selection
Theorem: If η0 ∈ D̄, the noise-to-signal ratio is low enough and λ ∼ ||w||, then the unique solution x⋆ of (Pλ(y)) satisfies
T_{x⋆} = T_{x0} and ||x⋆ − x0|| = O(||w||)

Slide 60

A Better Certificate?
• With model selection: no

Slide 61

A Better Certificate?
• With model selection: no
• Without: ongoing work (Duval-Peyré: sparse deconvolution; Dossal: sparse tomography)

Slide 62

Outline
• Variational Estimator
• Gauge and Model Space
• ℓ2 Robustness and Model Selection
• Some Examples

Slide 63

Sparse Spike Deconvolution (Dossal, 2005)
[illustration: the spike train x0]

Slide 64

Sparse Spike Deconvolution (Dossal, 2005)
Φx = Σ_i x_i φ(· − Δi),  J(x) = ||x||_1
[plots of x0 and its blurred observation Φx0, with parameter γ]

Slide 65

Sparse Spike Deconvolution (Dossal, 2005)
Φx = Σ_i x_i φ(· − Δi),  J(x) = ||x||_1
[plots of x0 and its blurred observation Φx0, with parameter γ]
η0 ∈ D̄ ⇔ ||Φ_{Ic}^* Φ_I^{+,*} s||_∞ < 1, with I = supp(x0) and s = sign(x0)
[plot: ||η0,Ic||_∞ against γ, crossing the level 1 at γcrit]
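
A sketch of this setting with a Gaussian kernel φ, reusing minimal_norm_precertificate from above to probe the certificate as the spike separation shrinks (grid size, kernel and width are illustrative choices):

```python
import numpy as np

def deconv_operator(n_samples, positions, width):
    # columns are normalized shifted kernels phi(. - Delta_i)
    t = np.arange(n_samples)[:, None]
    Phi = np.exp(-0.5 * ((t - np.asarray(positions)[None, :]) / width) ** 2)
    return Phi / np.linalg.norm(Phi, axis=0)

n = 200
Phi = deconv_operator(n, np.arange(n), width=8.0)   # all integer shifts
for gap in (40, 20, 10, 5):                         # spike separation
    x0 = np.zeros(n)
    x0[80], x0[80 + gap] = 1.0, -1.0                # two opposite spikes
    _, off_norm = minimal_norm_precertificate(Phi, x0)
    print(gap, off_norm)       # tends to cross 1 once the spikes get too close
```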

Slide 66

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i)]

Slide 67

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i), with their jump patterns m_k at levels +1/−1]

Slide 68

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i), with their jump patterns m_k at levels +1/−1]
Left: support stability. Right: no support stability.

Slide 69

1D TV Denoising (V. et al., 2011)
Φ = Id,  J(x) = ||∇x||_1
[two example piecewise constant signals (x_i against i), with their jump patterns m_k at levels +1/−1]
Left: support stability. Right: no support stability.
Both are ℓ2-stable.
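
The 1D TV model quantities (jump support and jump signs) with forward differences for ∇, as a sketch:

```python
import numpy as np

def tv_model(x, tol=1e-10):
    dx = np.diff(x)                          # discrete gradient
    jumps = np.abs(dx) > tol                 # supp(grad x): where T_x allows jumps
    e = np.where(jumps, np.sign(dx), 0.0)    # e_x = sign(grad x) on the jumps
    return jumps, e

x = np.array([0., 0., 2., 2., 2., 1., 1.])   # staircase signal
print(tv_model(x))                           # jumps at indices 1 and 4
```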

Slide 70

2D TV Denoising
Φ = Id,  J(x) = ||(∇→, ∇↑)x||_1
[images: the horizontal (∇→) and vertical (∇↑) finite differences]

Slide 71

2D TV Denoising
Φ = Id,  J(x) = ||(∇→, ∇↑)x||_1
[images: the horizontal (∇→) and vertical (∇↑) finite differences]

Slide 72

Open Problems
Union of Models (combinatorial world) ↔ Gauges (functional world)

Slide 73

Thanks for your attention!