Clément Elvira

Safe squeezing for antisparse coding
Clément Elvira, joint work with Cédric Herzet
CentraleSupélec, L2S, Inverse Problem Group (GPI), 4 October 2019

Antisparse coding

Inverse / learning problems
Given
• an acquisition matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$,
• an observation $y \simeq A x_0 \in \mathbb{R}^m$,
goal: recover $x_0$ from $\arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|y - Ax\|_2^2$.
Infinite number of solutions → ill-posed problem.
Clément Elvira, SCEE team seminar (Séminaire équipe SCEE), 1/28
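A minimal NumPy sketch of why this problem is ill-posed when m < n: adding any vector from the null space of A to a least-squares solution leaves the data fit unchanged. The sizes, the random Gaussian A and the variable names are arbitrary choices for illustration, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8                          # m < n: fewer observations than unknowns
A = rng.standard_normal((m, n))      # hypothetical random acquisition matrix
y = rng.standard_normal(m)

def data_fit(x):
    """Least-squares term 0.5 * ||y - A x||_2^2."""
    return 0.5 * np.sum((y - A @ x) ** 2)

# Minimum-norm least-squares solution.
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]

# With full row rank (almost sure for a Gaussian A), rows m, ..., n-1 of V^T
# from the full SVD span the null space of A.
_, _, Vt = np.linalg.svd(A)
null_dir = Vt[m]

# Moving along the null space leaves the data fit unchanged.
for t in (0.0, 1.0, 10.0):
    print(t, data_fit(x_ls + t * null_dir))
```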

Penalized problem
$x^\star \in \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|y - Ax\|_2^2 + \mathrm{Reg}(x)$
The choice of Reg should
• reduce the number of solutions,
• favor solutions with desirable properties,
• allow for fast algorithms.
Popular choice of Reg → convex function.

From sparse coding to antisparse coding
[Figure: magnitudes |x(n)| versus index n for a sparse, a ridge, and an antisparse solution]
• $\mathrm{Reg}(x) = \lambda\|x\|_1$ ⇒ sparsity
• $\mathrm{Reg}(x) = \lambda\|x\|_2^2$ ⇒ energy
• $\mathrm{Reg}(x) = \lambda\|x\|_\infty$ ⇒ amplitude
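As a hedged illustration, the three penalized objectives can be written side by side; the function name `penalized_objective` and its `reg` keyword are hypothetical.

```python
import numpy as np

def penalized_objective(x, A, y, lam, reg="linf"):
    """0.5 * ||y - A x||_2^2 + Reg(x) for the three penalties compared above."""
    fit = 0.5 * np.sum((y - A @ x) ** 2)
    if reg == "l1":            # lam * ||x||_1: sparsity
        penalty = lam * np.sum(np.abs(x))
    elif reg == "l2sq":        # lam * ||x||_2^2: energy (ridge)
        penalty = lam * np.sum(x ** 2)
    elif reg == "linf":        # lam * ||x||_inf: amplitude (antisparse)
        penalty = lam * np.max(np.abs(x))
    else:
        raise ValueError(f"unknown regularizer: {reg}")
    return fit + penalty

A = np.eye(2)
y = np.array([1.0, 2.0])
x = np.array([1.0, 1.0])
for reg in ("l1", "l2sq", "linf"):
    print(reg, penalized_objective(x, A, y, 0.5, reg))
```

Because the ℓ∞ penalty only charges the largest amplitude, raising the other entries up to that level adds nothing to the penalty, which is one way to see why its minimizers tend to be flat.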

Application 1 / 3
• Peak-to-average power ratio (PAPR) reduction, Studer & Larsson (2013):
$\forall x \in \mathbb{R}^n,\quad \mathrm{PAPR}(x) = \dfrac{n\|x\|_\infty^2}{\|x\|_2^2}$
[Figure, courtesy of Studer and Larsson: system model with $s_w = H_w x_w$ and $y_w = H_w x_w + n_w$]
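The PAPR formula translates directly into code; a small sketch, assuming a real-valued, nonzero x (the talk's setting is $\mathbb{R}^n$). Since $\|x\|_\infty^2 \le \|x\|_2^2 \le n\|x\|_\infty^2$, the PAPR lies between 1 (all entries of equal magnitude) and n (a single spike), so lowering $\|x\|_\infty$ at a given energy lowers the PAPR.

```python
import numpy as np

def papr(x):
    """Peak-to-average power ratio n * ||x||_inf^2 / ||x||_2^2 (x nonzero)."""
    x = np.asarray(x, dtype=float)
    return x.size * np.max(np.abs(x)) ** 2 / np.sum(x ** 2)

print(papr([1.0, -1.0, 1.0, -1.0]))  # 1.0: constant modulus, minimal PAPR
print(papr([1.0, 0.0, 0.0, 0.0]))    # 4.0: single spike, maximal PAPR (= n)
```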

Application 2 / 3
• Robotics: uniform power allocation, Cadzow (1971)
• Kinematically redundant system
• Uniform spread of electric power
$y = A\,\big[x^{(1)}_0, \dots, x^{(1)}_p, \dots, x^{(t)}_0, \dots, x^{(t)}_p\big]^T$

Application 3 / 3
ML: approximate nearest neighbor search, Jégou, Furon and Fuchs (2012)
Idea: learn a higher-dimensional representation
• x(i) = ±α ⇒ binarization + privacy
• binary distance = XOR ⇒ faster
[Figure: two points x1, x2 and their distance d(x1, x2)]
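A minimal sketch of the binarization and XOR distance, assuming the antisparse code has already been computed so that every entry equals ±α; the helper names `binarize` and `hamming` are hypothetical and not from the authors' toolbox. For such codes, $\|x_1 - x_2\|_2^2 = 4\alpha^2 \times$ (number of differing signs), so the Hamming distance obtained with XOR is a faithful proxy for the Euclidean distance.

```python
import numpy as np

def binarize(x):
    """Map entries x(i) = +/-alpha to bits (1 if positive) packed into bytes."""
    return np.packbits(np.asarray(x) > 0)

def hamming(b1, b2):
    """Number of differing bits: popcount of the XOR of two packed codes."""
    return int(np.unpackbits(np.bitwise_xor(b1, b2)).sum())

alpha = 0.3
x1 = alpha * np.array([1, -1, 1, 1, -1, -1, 1, -1, 1, 1])
x2 = alpha * np.array([1, 1, 1, -1, -1, -1, 1, -1, -1, 1])
print(hamming(binarize(x1), binarize(x2)))  # 3 differing signs
```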

Safe squeezing for antisparse coding

Computing antisparse representations
Optimization problem
$x^\star \in \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_\infty$ → convex, coercive
Optimization methods
• Heuristic to match the optimality conditions [Fuchs, 2011]
  ◦ add / remove entries from the set of saturated entries
• Proximal gradient: FITRA [Studer & Larsson, 2013]
  ◦ $x^{(t+1)} = \mathrm{prox}_{\alpha\lambda\|\cdot\|_\infty}\big(x^{(t)} - \alpha\nabla f(x^{(t)})\big)$, with $f(x) = \tfrac{1}{2}\|y - Ax\|_2^2$
• Bayesian framework [Elvira et al., 2017]
  ◦ Democratic prior $p(x) \propto \exp(-\lambda\|x\|_\infty)$
  ◦ Gibbs sampler / proximal MCMC
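A minimal NumPy sketch of a proximal-gradient scheme for the antisparse problem, in the spirit of FITRA but not claimed to match Studer and Larsson's implementation. The prox of $\lambda\|\cdot\|_\infty$ is obtained from the Moreau decomposition, $\mathrm{prox}_{\lambda\|\cdot\|_\infty}(v) = v - \mathrm{proj}_{\lambda B_1}(v)$ with $B_1$ the unit ℓ1 ball, and the ℓ1-ball projection uses the standard sort-based algorithm. The step size $1/\|A\|_2^2$, the FISTA-type momentum and the fixed iteration count are assumptions.

```python
import numpy as np

def proj_l1_ball(v, radius):
    """Euclidean projection of v onto {z : ||z||_1 <= radius}, radius > 0 (sort-based)."""
    a = np.abs(v)
    if a.sum() <= radius:
        return v.copy()
    u = np.sort(a)[::-1]                       # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - radius) / k > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)    # soft-threshold level
    return np.sign(v) * np.maximum(a - theta, 0.0)

def prox_linf(v, lam):
    """prox of lam * ||.||_inf through the Moreau decomposition."""
    return v - proj_l1_ball(v, lam)

def antisparse_apg(A, y, lam, n_iter=500):
    """Accelerated proximal gradient for 0.5 * ||y - A x||_2^2 + lam * ||x||_inf."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)               # gradient of the data-fit term at z
        x_new = prox_linf(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum (extrapolation)
        x, t = x_new, t_new
    return x
```

For instance, with $A = I$ the problem reduces to a single prox, and `prox_linf([3, 1], 1)` clips the largest entry to give $(2, 1)$.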

Connections with inverse problems involving sparsity
Lasso: find $x^\star \in \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$ → promotes sparsity
• Unused feature ←→ saturation
• Safe screening [El Ghaoui et al., 2010] [Fercoq et al., 2015] [Fraga-Dantas et al., 2018] [Dorfler et al., 2019] ←→ safe squeezing


Take-home message
• It is possible to dynamically detect saturated entries.
• This leads to an equivalent lower-dimensional problem.
• It yields faster algorithms at (almost) no additional cost.
• This is validated experimentally.

Notions of saturation
Recall $x^\star \in \arg\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_\infty$.
Definition (saturated entry): entry i is saturated iff $x^\star(i) = \pm\|x^\star\|_\infty$.
Proposition [Fuchs, 2011]: generically, $x^\star$ has at most m − 1 non-saturated entries.
Similar results hold for "antisparse" basis pursuit.
m − 1 can be small compared to n (n ≫ m).
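A small helper, given as a sketch, that reads the saturation sets off a numerical solution; the function name and the relative tolerance `tol` are assumptions, the tolerance being needed because iterative solvers only reach $\pm\|x\|_\infty$ approximately.

```python
import numpy as np

def saturation_sets(x, tol=1e-9):
    """Return (I_plus, I_minus), the indices where x(i) = +||x||_inf or -||x||_inf.

    For x = 0 every entry is trivially saturated with both signs.
    """
    x = np.asarray(x, dtype=float)
    amp = np.max(np.abs(x))
    thr = amp - tol * max(amp, 1.0)            # relative tolerance on the peak
    I_plus = np.flatnonzero(x >= thr)
    I_minus = np.flatnonzero(x <= -thr)
    return I_plus, I_minus

x = np.array([0.5, -0.5, 0.2, 0.5, -0.1])
I_plus, I_minus = saturation_sets(x)
n_free = x.size - I_plus.size - I_minus.size   # number of non-saturated entries
print(I_plus, I_minus, n_free)
```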

Towards a lower-dimensional problem
Let $I^\star_+ \triangleq \{i \mid x^\star(i) = +\|x^\star\|_\infty\}$ and $I^\star_- \triangleq \{i \mid x^\star(i) = -\|x^\star\|_\infty\}$ (sets of saturated entries).
If we know $I_+ \subset I^\star_+$ and $I_- \subset I^\star_-$, let $I = I_+ \cup I_-$ and define
• $B = A_{I^c}$ (the columns of A indexed by the complement of I),
• $s = \sum_{i \in I_+} a_i - \sum_{i \in I_-} a_i$, where $a_i$ is the i-th column of A.
Equivalent lower-dimensional problem [to appear]:
$(q^\star, w^\star) \in \arg\min_{(q, w) \in \mathbb{R}^{\mathrm{card}(I^c)} \times \mathbb{R}} \tfrac{1}{2}\|y - Bq - ws\|_2^2 + \lambda w \quad \text{s.t.} \quad \|q\|_\infty \le w$
→ Can we (dynamically) detect saturated entries? ←
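A NumPy sketch checking the change of variables behind the reduced problem: if $x(i) = +w$ on $I_+$, $x(i) = -w$ on $I_-$, and q collects the remaining entries with $\|q\|_\infty \le w$, then $Ax = Bq + ws$ and, as soon as I is nonempty, $\lambda\|x\|_\infty = \lambda w$. The random data and index sets are arbitrary; `squeeze_operator` is a hypothetical name.

```python
import numpy as np

def squeeze_operator(A, I_plus, I_minus):
    """Build B = A restricted to the complement of I = I_plus ∪ I_minus, and s."""
    n = A.shape[1]
    I = np.union1d(I_plus, I_minus)
    Ic = np.setdiff1d(np.arange(n), I)
    B = A[:, Ic]
    s = A[:, I_plus].sum(axis=1) - A[:, I_minus].sum(axis=1)
    return B, s, Ic

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 7))
I_plus, I_minus = np.array([0, 5]), np.array([2])   # integer index arrays
B, s, Ic = squeeze_operator(A, I_plus, I_minus)

w = 0.8
q = rng.uniform(-w, w, size=Ic.size)    # feasible: ||q||_inf <= w
x = np.empty(7)
x[I_plus], x[I_minus], x[Ic] = w, -w, q
print(np.allclose(A @ x, B @ q + w * s))  # True: same forward model
```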

Detecting saturated entries
Theorem [to appear]: given a known polytope $U_I$, let $u^\star = \arg\min_{u \in U_I} \tfrac{1}{2}\|y - u\|_2^2$. Then $a_i^T u^\star \neq 0 \implies x^\star(i)$ is saturated, with sign given by $\mathrm{sign}(a_i^T u^\star)$: $x^\star(i) = +\|x^\star\|_\infty$ if $a_i^T u^\star > 0$, and $x^\star(i) = -\|x^\star\|_\infty$ if $a_i^T u^\star < 0$.
[Figure: u⋆ as the projection of y onto the polytope U_I, with the hyperplanes $a_1^T u$, $a_2^T u$, $a_3^T u$]
• Not a heuristic
• Computationally simple

From safe region to safe sphere
Finding u⋆ is (almost) as difficult as finding x⋆.
Idea: perform the test without computing u⋆ → resort to a safe region.
A subset S is called a safe region iff $u^\star \in S$ [El Ghaoui et al., 2010]:
$\min_{u \in S} a_i^T u > 0 \implies a_i^T u^\star > 0 \implies x^\star(i)$ is saturated.
[Figure: a safe region S containing u⋆, the projection of y onto U_I, with the hyperplanes $a_1^T u$, $a_2^T u$, $a_3^T u$]
Safe sphere: $S = B(c, r)$ and $\min_{u \in B(c, r)} a_i^T u = a_i^T c - r\|a_i\|_2$. Closed-form solution!
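The closed-form sphere test is one line per column; a sketch, in which the negative case ($a_i^T c + r\|a_i\|_2 < 0 \implies x^\star(i) = -\|x^\star\|_\infty$) is the symmetric counterpart of the test on the slide, following from the sign statement of the theorem. The function name is hypothetical.

```python
import numpy as np

def sphere_squeezing_test(A, c, r):
    """Safe sphere test on B(c, r); returns boolean masks (positive, negative).

    min_{u in B(c,r)} a_i^T u = a_i^T c - r ||a_i||_2 > 0  =>  x*(i) = +||x*||_inf
    max_{u in B(c,r)} a_i^T u = a_i^T c + r ||a_i||_2 < 0  =>  x*(i) = -||x*||_inf
    """
    corr = A.T @ c                          # a_i^T c for every column
    margin = r * np.linalg.norm(A, axis=0)  # r * ||a_i||_2
    return corr - margin > 0, corr + margin < 0

A = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 0.1])
pos, neg = sphere_squeezing_test(A, c, r=0.5)
print(pos, neg)   # column 0 squeezed to +, column 2 squeezed to -
```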

Safe sphere design
Goal: find c and r such that $u^\star \in B(c, r)$.

Safe sphere design
Dual problem: find $u^\star = \arg\min_{u \in U_I} \|y - u\|_2^2$ → projection onto the convex set $U_I$!
If one knows some $u_0 \in U_I$, then by definition $\|y - u^\star\|_2^2 \le \|y - u_0\|_2^2$ → u⋆ belongs to a sphere!

Safe sphere design: ST1
Goal: find c and r such that $u^\star \in B(c, r)$. Choose $u_0 \in U_I$.
ST1: $c = y$, $r = \|y - u_0\|_2$.
Typical use: computed once and for all before runtime.
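A minimal sketch for the plain problem, before any entry is squeezed ($I = \emptyset$). In that case a standard Fenchel-duality computation, not spelled out in the talk, gives $U_\emptyset = \{u : \|A^T u\|_1 \le \lambda\}$, since the ℓ1 norm is the dual of the ℓ∞ norm. A feasible $u_0$ is then obtained by rescaling y, which is one possible reading of the `dualscal` step; the names `dual_scaling` and `st1_sphere` are hypothetical. Note that $\|A^T y\|_1$ is also the smallest λ for which $x^\star = 0$.

```python
import numpy as np

def dual_scaling(v, A, lam):
    """Rescale v into the assumed dual set U_empty = {u : ||A^T u||_1 <= lam}."""
    norm1 = np.sum(np.abs(A.T @ v))
    return v if norm1 <= lam else (lam / norm1) * v

def st1_sphere(y, u0):
    """ST1 sphere: centre y, radius ||y - u0||_2, valid for any feasible u0."""
    return y.copy(), np.linalg.norm(y - u0)

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 30))
y = rng.standard_normal(10)
lam_max = np.sum(np.abs(A.T @ y))    # smallest lam for which x* = 0
lam = 0.5 * lam_max
u0 = dual_scaling(y, A, lam)          # here u0 = y / 2
c, r = st1_sphere(y, u0)
```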


Visualizing the ST1 sphere
[Figure: polytope U_I, point y, dual solution u⋆, successive dual points u(0), u(1), …, u(t), and the hyperplanes $a_1^T u$, $a_2^T u$, $a_3^T u$]

Safe sphere design: GAP sphere
Goal: find c and r such that $u^\star \in B(c, r)$. Choose $u_0 \in U_I$.
GAP sphere [Fercoq et al., 2015]: $c = u_0$, $r = \sqrt{2\,\mathrm{gap}(x_0, u_0)}$, where gap is the duality gap between the primal objective at $x_0$ and the dual objective at $u_0$.
Typical use:
• dynamically, with $u^{(t)} = \mathrm{proj}_{U_I}(y - Ax^{(t)})$,
• the radius tends to 0.
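A sketch of the GAP sphere for the unsqueezed problem ($I = \emptyset$, with the same assumed dual set $U_\emptyset = \{u : \|A^T u\|_1 \le \lambda\}$). The primal and dual objectives are $P(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_\infty$ and $D(u) = \tfrac12\|y\|_2^2 - \tfrac12\|y - u\|_2^2$; since D is 1-strongly concave and $D(u^\star) \le P(x)$, any feasible u satisfies $\tfrac12\|u - u^\star\|_2^2 \le P(x) - D(u)$, which is where the radius $\sqrt{2\,\mathrm{gap}}$ comes from. The feasible point is obtained here by rescaling the residual rather than by an exact projection onto $U_I$; that substitution, and the function names, are assumptions of the sketch.

```python
import numpy as np

def primal(x, A, y, lam):
    """P(x) = 0.5 * ||y - A x||_2^2 + lam * ||x||_inf"""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.max(np.abs(x))

def dual(u, y):
    """D(u) = 0.5 * ||y||_2^2 - 0.5 * ||y - u||_2^2"""
    return 0.5 * np.sum(y ** 2) - 0.5 * np.sum((y - u) ** 2)

def gap_sphere(x, A, y, lam):
    """GAP sphere for I = empty: centre u (scaled residual), radius sqrt(2 * gap)."""
    res = y - A @ x
    norm1 = np.sum(np.abs(A.T @ res))
    u = res if norm1 <= lam else (lam / norm1) * res   # feasible: ||A^T u||_1 <= lam
    gap = max(primal(x, A, y, lam) - dual(u, y), 0.0)  # clip tiny negative round-off
    return u, np.sqrt(2.0 * gap)
```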

Visualizing the GAP sphere
[Figure: polytope U_I, point y, dual solution u⋆, GAP spheres centered at successive dual points u(0), u(1), …, u(t), with radius driven by the duality gap (starting from gap(x(0), u(0))), and the hyperplanes $a_1^T u$, $a_2^T u$, $a_3^T u$]

Algorithms

Principle of dynamic squeezing
x(0) = 0_n, I(0) = ∅, u(0) = dualscal(y), t = 1    // t: iteration index
repeat
    // Squeezing test
    (c(t), r(t)) = sphere_param(x(t−1), u(t−1), I(t−1))
    I(t−½) = squeezing_test(c(t), r(t))
    I(t) = I(t−½) ∪ I(t−1)
    // Iteration of the optimization procedure
    (x(t), u(t)) = optim_update(x(t−1), u(t−1), I(t))
    // Update the iteration index
    t = t + 1
until the convergence criterion is met
Output: x(t), I(t)
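A minimal runnable sketch of the loop above, under simplifying assumptions that depart from the talk's method: `optim_update` is a plain proximal-gradient step on the full problem (no dimension reduction), `dualscal` and `sphere_param` are realized by rescaling the residual and building the GAP sphere for $I = \emptyset$, and `squeezing_test` is the closed-form sphere test. Because the problem is never reduced, the test stays valid at every iteration and detections are only accumulated; a real implementation would switch to the reduced (q, w) problem once I(t) is nonempty. All names are hypothetical.

```python
import numpy as np

def proj_l1_ball(v, radius):
    """Euclidean projection of v onto {z : ||z||_1 <= radius}, radius > 0."""
    a = np.abs(v)
    if a.sum() <= radius:
        return v.copy()
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / np.arange(1, v.size + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(a - theta, 0.0)

def dynamic_squeezing(A, y, lam, n_iter=2000, tol=1e-10):
    """Proximal gradient on the full problem, with a GAP-sphere squeezing test
    run before every update. Returns x and the detected sets (I_plus, I_minus)."""
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2
    col_norms = np.linalg.norm(A, axis=0)      # ||a_i||_2, computed once
    x = np.zeros(n)
    in_plus = np.zeros(n, dtype=bool)
    in_minus = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        res = y - A @ x
        corr = A.T @ res                       # shared by the test and the gradient step
        norm1 = np.sum(np.abs(corr))
        scale = 1.0 if norm1 <= lam else lam / norm1
        u = scale * res                        # dual scaling: ||A^T u||_1 <= lam
        primal = 0.5 * res @ res + lam * np.max(np.abs(x))
        dual = 0.5 * y @ y - 0.5 * (y - u) @ (y - u)
        r = np.sqrt(2.0 * max(primal - dual, 0.0))   # GAP sphere B(u, r)
        # Squeezing test (A^T u = scale * corr); detections are accumulated.
        in_plus |= scale * corr - r * col_norms > 0
        in_minus |= scale * corr + r * col_norms < 0
        if primal - dual <= tol:
            break
        # Gradient step on 0.5||y - Ax||^2, then prox of (lam/L)||.||_inf (Moreau).
        v = x + corr / L
        x = v - proj_l1_ball(v, lam / L)
    return x, np.flatnonzero(in_plus), np.flatnonzero(in_minus)
```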

A word about the optimization procedure
Optimization problem
$(q^\star, w^\star) \in \arg\min_{(q, w) \in \mathbb{R}^{\mathrm{card}(I^c)} \times \mathbb{R}} \tfrac{1}{2}\|y - Bq - ws\|_2^2 + \lambda w \quad \text{s.t.} \quad +q \le w,\ -q \le w$
FITRA is not suited to this problem.
• Squeezed FITRA
  ◦ projected gradient algorithm
  ◦ ⚠ requires the projection onto a cone
  ◦ ⚠ conditioning → scaled algorithm: $w s \leftarrow w\, s / \|s\|_2$
• Squeezed Frank-Wolfe
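A sketch of the projection onto the cone $K = \{(q, w) : \|q\|_\infty \le w\}$ needed by the projected-gradient step. It is one standard way to compute it, assumed rather than taken from the toolbox: for a fixed w the best q is $q_0$ clipped to $[-w, w]$, and the optimal w solves $w - w_0 = \sum_i (|q_0(i)| - w)_+$, found by scanning the sorted magnitudes, hence an O(n log n) sorting cost. The function name is hypothetical.

```python
import numpy as np

def proj_linf_cone(q0, w0):
    """Euclidean projection of (q0, w0) onto K = {(q, w) : ||q||_inf <= w}."""
    a = np.sort(np.abs(q0))[::-1]          # |q0| in decreasing order
    if w0 + a.sum() <= 0:                  # (q0, w0) lies in the polar cone
        return np.zeros_like(q0), 0.0
    csum, w = 0.0, w0
    for k in range(a.size + 1):            # k = number of clipped entries
        w = (w0 + csum) / (k + 1)          # root of w - w0 = sum of the k clipped excesses
        if k == a.size or a[k] <= w:       # next magnitude is not clipped: done
            break
        csum += a[k]
    return np.clip(q0, -w, w), w

q, w = proj_linf_cone(np.array([3.0, 1.0]), 0.0)
```

For example, projecting $q_0 = (3, 1)$, $w_0 = 0$ clips only the first entry and returns $q = (1.5, 1)$, $w = 1.5$.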

Numerical experiments

Percentage of screened variables per iteration
Setup: $A \in \mathbb{R}^{100 \times 150}$, $A[i, j] \in [0, 1]$, GAP sphere.
[Figure: fraction of screened variables (0 to 1) versus $\log_2(T)$, T being the iteration, with one curve per value of λ/λ_max from 0.1 to 0.8]

Complexity savings
Setup: $A \in \mathbb{R}^{100 \times 150}$, $A[i, j] \in [0, 1]$, GAP sphere.
[Figure: number of operations ($10^5$ to $10^{10}$) versus a $\log_{10}(\lambda/\lambda_{\max})$ scale, for FITRA, Squeezed FITRA, Frank-Wolfe and Squeezed Frank-Wolfe]

Benchmark
Setup: $A \in \mathbb{R}^{100 \times 150}$, $A[i, j] \in [0, 1]$, budget of $10^8$ operations.
[Figure: percentage of runs reaching a dual gap below a given threshold ($10^{-16}$ to $10^{1}$), for λ/λ_max = 0.3 and λ/λ_max = 0.8; curves for FITRA, Squeezed FITRA, Frank-Wolfe and Squeezed Frank-Wolfe]

Squeezing test: at no cost?
• Computing u: dual scaling of the residual vector, O(1) ✓
• Squeezing test, inner products $a_i^T u$: same computation as in the gradient step → already done ✓
• Squeezing test, radius r: from the dual gap → already computed to monitor convergence ✓
• Proximity operator: sorting, O(n log n); since n decreases as entries are squeezed, this can be faster than computing the prox of the ℓ∞-norm, O(n), on the full problem.

Conclusion and prospects
• It is possible to dynamically detect saturated entries.
• This leads to an equivalent lower-dimensional problem.
• We obtain faster algorithms at (almost) no additional cost.
Prospects
• Other safe regions (dome, truncated dome, …)
• Nesterov acceleration?
• Extension to BLasso? Continuous dictionaries?
Stay tuned! https://arxiv.org/abs/1911.07508
Toolbox: https://gitlab.inria.fr/celvira/safe-squeezing

Thank you for your attention!

Ideal test to detect saturated entries
Theorem [to appear]: let $u^\star = \arg\max_{u \in U_I} \tfrac{1}{2}\|y\|_2^2 - \tfrac{1}{2}\|y - u\|_2^2$, with $U_I$ a known polytope. Then $a_i^T u^\star > 0 \implies x^\star(i)$ is saturated, with sign given by $\mathrm{sign}(a_i^T u^\star)$.
Proof sketch:
1. Slater's condition ensures there exists $v^\star_+$ such that $v^\star_+(i)\,(x^\star(i) - w^\star) = 0$ (complementary slackness).
2. The first-order optimality conditions involve $v^\star_+(i) = a_i^T u^\star$.
3. If $v^\star_+(i) \neq 0$, then necessarily $x^\star(i) = +w^\star$.
Moreover, u⋆ cannot be orthogonal to all columns of A.
