Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Clément Elvira

S³ Seminar
October 04, 2019

Clément Elvira

(PANAMA research group, Iniria, CNRS, IRISA - Rennes)

https://s3-seminar.github.io/seminars/clement-elvira

Title — Safe squeezing for antisparse coding

Abstract — Spreading the information over all coefficients of a representation is a desirable property in many applications such as digital communication or machine learning. This so-called antisparse representation can be obtained by solving a convex program involving a $\ell_\infty$-norm penalty combined with a quadratic discrepancy. In this talk, we propose a new methodology, dubbed safe squeezing, to accelerate the computation of antisparse representation. We describe a test that allows to detect saturated entries in the solution of the optimization problem. The contribution of these entries is compacted into a single vector, resulting in a form of dimensionality reduction. We propose two algorithms to solve the latter lower dimensional problem. Numerical experiments show both the effectiveness of the saturation detection tests and that the proposed procedures lead to significant computational gains as compared to existing methods.

Biography — Clément Elvira is a postdoctoral researcher at Inria Rennes - Bretagne atlantique and part of the BECOSE project. He is working under the supervision of Cédric Herzet, Rémi Gribonval and Charles Soussen. He was a PhD student from october, 2014 to november, 2017 at CRIStAL in Lille, France, under the supervision of Pierre Chainais and Nicolas Dobigeon, and he was part of the SigMA group at CRIStAL.

S³ Seminar

October 04, 2019
Tweet

More Decks by S³ Seminar

Other Decks in Research

Transcript

  1. Safe squeezing for antisparse coding Clément Elvira — joint work

    with Cédric Herzet CentraleSupélec, L2S, Inverse Problem Group (GPI) 4 Octobre 2019
  2. Inverse / Learning problems Given • Acquisition matrix A ∈

    Rm×n m < n • Observation y ≃ Ax0 ∈ Rm Goal: recover x0 from arg min x∈Rn 1 2 ∥y − Ax∥2 2 Infinite number of solutions −→ ill posed problem Clément Elvira Séminaire équipe SCEE 1/28 1/28
  3. Penalized problem Penalized problem x⋆ ∈ arg min x∈Rn 1

    2 ∥y − Ax∥2 2 + Reg(x) The choice of Reg should • reduce the number of solutions • favor solutions with desirable properties • allow for fast algorithms Clément Elvira Séminaire équipe SCEE 2/28 2/28
  4. Penalized problem Penalized problem x⋆ ∈ arg min x∈Rn 1

    2 ∥y − Ax∥2 2 + Reg(x) The choice of Reg should • reduce the number of solutions • favor solutions with desirable properties • allow for fast algorithms Popular choice of Reg −→ convex function Clément Elvira Séminaire équipe SCEE 2/28 2/28
  5. From sparse coding to antisparse coding Clément Elvira Séminaire équipe

    SCEE 3/28 3/28 0 20 40 60 80 100 120 0 0.2 |x(n)| Parcimonieux 0 20 40 60 80 100 120 0 0.2 |x(n)| Ridge 0 20 40 60 80 100 120 0 0.2 |x(n)| Antiparcimonieux 20 40 60 80 100 120 indice n 0 0.5 1 Reg(x) = λ∥x∥1 ⇒ sparsity Reg(x) = λ∥x∥2 2 ⇒ energy Reg(x) = λ∥x∥∞ ⇒ amplitude
  6. Application 1 / 3 • Peak to Average Power Ratio

    (PAPR) reduction Studer & Larsson (2013) ∀x ∈ Rn, PAPR(x) = n∥x∥2 ∞ ∥x∥2 2 Courtesy of Studer and Larsson sw = Hw xw yw = Hw xw + nw Clément Elvira Séminaire équipe SCEE 4/28 4/28
  7. Application 2 / 3 • Robotic: Uniform power allocation Cadzow

    (1971) • Cinematic redundant system • Uniform spread of electric power Clément Elvira Séminaire équipe SCEE 5/28 5/28 y = A                 x(1) 0 . . . x(1) p . . . x(t) 0 . . . x(t) p                
  8. Application 2 / 3 • Robotic: Uniform power allocation Cadzow

    (1971) • Cinematic redundant system • Uniform spread of electric power Clément Elvira Séminaire équipe SCEE 5/28 5/28 y = A                 x(1) 0 . . . x(1) p . . . x(t) 0 . . . x(t) p                
  9. Application 3 / 3 ML: Approximate Nearest Neighbor search Jegou,

    Furon and Fuchs (2012) Idea: Learn a higher dimensional representation • x(i) = ±α =⇒ binarization + privacy • binary distance = XOR =⇒ faster Clément Elvira Séminaire équipe SCEE 6/28 6/28 x1 x2 d(x1 , x2 ) x1 x2
  10. Application 3 / 3 ML: Approximate Nearest Neighbor search Jegou,

    Furon and Fuchs (2012) Idea: Learn a higher dimensional representation • x(i) = ±α =⇒ binarization + privacy • binary distance = XOR =⇒ faster Clément Elvira Séminaire équipe SCEE 6/28 6/28 x1 x2 d(x1 , x2 ) x1 x2
  11. Computing antisparse representation Optimization problem x⋆ ∈ arg min x∈Rn

    1 2 ∥y − Ax∥2 2 + λ∥x∥∞ −→ Convex, coercive Optimization methods Clément Elvira Séminaire équipe SCEE 7/28 7/28
  12. Computing antisparse representation Optimization problem x⋆ ∈ arg min x∈Rn

    1 2 ∥y − Ax∥2 2 + λ∥x∥∞ −→ Convex, coercive Optimization methods • Heuristic to match the optimality conditions [Fuchs, 2011] ◦ add / remove entries from the set of saturated entries Clément Elvira Séminaire équipe SCEE 7/28 7/28
  13. Computing antisparse representation Optimization problem x⋆ ∈ arg min x∈Rn

    1 2 ∥y − Ax∥2 2 + λ∥x∥∞ −→ Convex, coercive Optimization methods • Heuristic to match the optimality conditions [Fuchs, 2011] ◦ add / remove entries from the set of saturated entries • Proximal Gradient: FITRA [Studer & Larsson, 2013] ◦ x(t+1) = proxλ∥·∥ ∞ (x(t) − α∇f (x(t))) Clément Elvira Séminaire équipe SCEE 7/28 7/28
  14. Computing antisparse representation Optimization problem x⋆ ∈ arg min x∈Rn

    1 2 ∥y − Ax∥2 2 + λ∥x∥∞ −→ Convex, coercive Optimization methods • Heuristic to match the optimality conditions [Fuchs, 2011] ◦ add / remove entries from the set of saturated entries • Proximal Gradient: FITRA [Studer & Larsson, 2013] ◦ x(t+1) = proxλ∥·∥ ∞ (x(t) − α∇f (x(t))) • Bayesian framework [Elvira et al., 2017] ◦ Democratic prior p(x) ∝ exp ( −λ∥x∥ ∞ ) ◦ Gibbs sampler / Proximal MCMC Clément Elvira Séminaire équipe SCEE 7/28 7/28
  15. Connections with inverse problems involving sparsity Lasso Find x⋆ ∈

    arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥1 −→ Promotes sparsity Unused feature Safe screening [El ghaoui et al., 2010] [Fercoq et al., 2015] [Fraga-Dantas et al., 2018] [Dorfler et al., 2019] Clément Elvira Séminaire équipe SCEE 8/28 8/28
  16. Connections with inverse problems involving sparsity Lasso Find x⋆ ∈

    arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥1 −→ Promotes sparsity Unused feature ←→ Saturation Safe screening ←→ Safe squeezing Clément Elvira Séminaire équipe SCEE 8/28 8/28
  17. From sparse coding to antisparse coding Clément Elvira Séminaire équipe

    SCEE 8/28 8/28 0 20 40 60 80 100 120 0 0.2 |x(n)| Parcimonieux 0 20 40 60 80 100 120 0 0.2 |x(n)| Ridge 0 20 40 60 80 100 120 0 0.2 |x(n)| Antiparcimonieux 20 40 60 80 100 120 indice n 0 0.5 1 Reg(x) = λ∥x∥1 ⇒ sparsity Reg(x) = λ∥x∥2 2 ⇒ energy Reg(x) = λ∥x∥∞ ⇒ amplitude
  18. Take home message • It is possible to dynamically detect

    saturated entries • It leads to consider an equivalent lower dimensional problem • It provides faster algorithm at (almost) no additional cost • It is experimentally validated Clément Elvira Séminaire équipe SCEE 9/28 9/28
  19. Notions of saturation Recall x⋆ ∈ arg min x∈Rn 1

    2 ∥y − Ax∥2 2 + λ∥x∥∞ Definition: saturated entry entry i is saturated iff x⋆(i) = ±∥x⋆∥∞ Clément Elvira Séminaire équipe SCEE 10/28 10/28
  20. Notions of saturation Recall x⋆ ∈ arg min x∈Rn 1

    2 ∥y − Ax∥2 2 + λ∥x∥∞ Definition: saturated entry entry i is saturated iff x⋆(i) = ±∥x⋆∥∞ Proposition [Fuchs,2011] Generically, x⋆ has at most m − 1 non saturated entries Similar results for “antisparse” Basis pursuit m − 1 can be small compared to n (n ≫ m) Clément Elvira Séminaire équipe SCEE 11/28 11/28
  21. Towards a lower dimensional problem Let I⋆ + ≜ {i

    | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries Clément Elvira Séminaire équipe SCEE 12/28 12/28
  22. Towards a lower dimensional problem Let I⋆ + ≜ {i

    | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries If we know I+ ⊂ I⋆ + and I− ⊂ I⋆ − Define • B = AIc • s = ∑ ℓ∈I+ ai − ∑ ℓ∈I− ai Clément Elvira Séminaire équipe SCEE 12/28 12/28
  23. Towards a lower dimensional problem Let I⋆ + ≜ {i

    | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries If we know I+ ⊂ I⋆ + and I− ⊂ I⋆ − Define • B = AIc • s = ∑ ℓ∈I+ ai − ∑ ℓ∈I− ai Equivalent lower dimensional problem [To appear] (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. ∥q∥∞ ≤ w Clément Elvira Séminaire équipe SCEE 12/28 12/28
  24. Towards a lower dimensional problem Let I⋆ + ≜ {i

    | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries If we know I+ ⊂ I⋆ + and I− ⊂ I⋆ − Define • B = AIc • s = ∑ ℓ∈I+ ai − ∑ ℓ∈I− ai Equivalent lower dimensional problem [To appear] (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. ∥q∥∞ ≤ w −→ Can we (dynamically) detect saturated entries? ←− Clément Elvira Séminaire équipe SCEE 12/28 12/28
  25. Detecting saturated entries Theorem [to appear] Given a known polytope

    UI Let u⋆ = arg min u∈UI 1 2 ∥y − u∥2 2 Clément Elvira Séminaire équipe SCEE 13/28 13/28
  26. Detecting saturated entries Theorem [to appear] Given a known polytope

    UI Let u⋆ = arg min u∈UI 1 2 ∥y − u∥2 2 Then aT i u⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(aT i u⋆) Clément Elvira Séminaire équipe SCEE 14/28 14/28
  27. Detecting saturated entries Theorem [to appear] Given a known polytope

    UI Let u⋆ = arg min u∈UI 1 2 ∥y − u∥2 2 Then aT i u⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(aT i u⋆) • Not a heuristic • Computationally simple Clément Elvira Séminaire équipe SCEE 15/28 15/28
  28. From safe region to safe sphere Finding u⋆ is (almost)

    as difficult as finding x⋆ Clément Elvira Séminaire équipe SCEE 16/28 16/28
  29. From safe region to safe sphere Finding u⋆ is (almost)

    as difficult as finding x⋆ Idea: perform the test without computing u⋆ Clément Elvira Séminaire équipe SCEE 16/28 16/28
  30. From safe region to safe sphere Finding u⋆ is (almost)

    as difficult as finding x⋆ Idea: perform the test without computing u⋆ −→ Resort to a safe region A subset S is called Safe region iff u⋆ ∈ S [El Ghaoui et al., 2010] min u∈S ai Tu > 0 =⇒ ai Tu⋆ > 0 =⇒ x⋆(i) is saturated Clément Elvira Séminaire équipe SCEE 16/28 16/28
  31. From safe region to safe sphere Finding u⋆ is (almost)

    as difficult as finding x⋆ Idea: perform the test without computing u⋆ −→ Resort to a safe region A subset S is called Safe region iff u⋆ ∈ S [El Ghaoui et al., 2010] min u∈S ai Tu > 0 =⇒ ai Tu⋆ > 0 =⇒ x⋆(i) is saturated Safe sphere: S = B(c, r) and minu∈B(c,r) ai Tu = ai Tc − r∥ai ∥2 Close form solution! Clément Elvira Séminaire équipe SCEE 17/28 17/28
  32. Safe sphere design Goal Find c and r such that

    u⋆ ∈ B(c, r) Clément Elvira Séminaire équipe SCEE 18/28 18/28
  33. Safe sphere design Dual problem Find u⋆ = arg min

    u∈UI ∥y − u∥2 2 −→ Projection onto the convex set UI ! Clément Elvira Séminaire équipe SCEE 19/28 19/28
  34. Safe sphere design Dual problem Find u⋆ = arg min

    u∈UI ∥y − u∥2 2 −→ Projection onto the convex set UI ! If one knows some u0 ∈ UI , then by definition ∥y − u⋆∥2 2 ≤ ∥y − u0∥2 2 −→ u⋆ belongs to a Sphere! Clément Elvira Séminaire équipe SCEE 19/28 19/28
  35. Safe sphere design Goal Find c and r such that

    u⋆ ∈ B(c, r) Choose u0 ∈ UI ST 1: c = y r = ∥y − u0∥2 Clément Elvira Séminaire équipe SCEE 20/28 20/28 typical use: done once for all before runtime
  36. Safe sphere design Goal Find c and r such that

    u⋆ ∈ B(c, r) Choose u0 ∈ UI ST 1: c = y r = ∥y − u0∥2 GAP sphere: [Fercoq et al, 2015] c = u0 r = √ 2gap(x0, u0) Clément Elvira Séminaire équipe SCEE 21/28 21/28 typical use: done once for all before runtime typical use: • Dynamically • u(t) = projUI (y − Ax(t)) • radius tends to 0
  37. Principle of Dynamic squeezing x(0) = 0n , I(0) =

    ∅ ; u(0) = dualscal(y); t = 1 // iteration index repeat // Iterations of the optimization procedure (x(t), u(t)) = optim_update(x(t−1), u(t−1), I(t)) // Update iteration index t = t + 1 until convergence criterion is met; Output: x(t), I(t) Clément Elvira Séminaire équipe SCEE 22/28 22/28
  38. Principle of Dynamic squeezing x(0) = 0n , I(0) =

    ∅ ; u(0) = dualscal(y); t = 1 // iteration index repeat // Squeezing test (c(t), r(t)) = sphere_param(x(t−1), u(t−1), I(t−1)) ; I(t−½) = squeezing_test(c(t), r(t)) ; I(t) = I(t−½) ∪ I(t−1) ; // Iterations of the optimization procedure (x(t), u(t)) = optim_update(x(t−1), u(t−1), I(t)) // Update iteration index t = t + 1 until convergence criterion is met; Output: x(t), I(t) Clément Elvira Séminaire équipe SCEE 22/28 22/28
  39. A word about optimization procedure Optimization problem (q⋆, w⋆) ∈

    arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted Clément Elvira Séminaire équipe SCEE 23/28 23/28
  40. A word about optimization procedure Optimization problem (q⋆, w⋆) ∈

    arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted • Squeezed Fitra ◦ Projected gradient algorithm ◦ ! △ Require the projection onto a cone ◦ ! △ Conditioning −→ scaled algorithm Clément Elvira Séminaire équipe SCEE 23/28 23/28
  41. A word about optimization procedure Optimization problem (q⋆, w⋆) ∈

    arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted • Squeezed Fitra ◦ Projected gradient algorithm ◦ ! △ Require the projection onto a cone ◦ ! △ Conditioning −→ scaled algorithm Clément Elvira Séminaire équipe SCEE 23/28 23/28 ws ←− w s ∥s∥2
  42. A word about optimization procedure Optimization problem (q⋆, w⋆) ∈

    arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted • Squeezed Fitra ◦ Projected gradient algorithm ◦ ! △ Require the projection onto a cone ◦ ! △ Conditioning −→ scaled algorithm • Squeezed Frank-Wolfe Clément Elvira Séminaire équipe SCEE 23/28 23/28 ws ←− w s ∥s∥2
  43. Percentage of screened variables of iteration A ∈ R100×150 A[i,

    j] ∈ [0, 1] GAP sphere 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 / max 2 4 6 8 10 12 log2 (T) 0.0 0.2 0.4 0.6 0.8 1.0 Clément Elvira Séminaire équipe SCEE 24/28 24/28
  44. Complexity savings A ∈ R100×150 A[i, j] ∈ [0, 1]

    GAP sphere 0.0 0.1 0.2 0.3 0.4 0.5 0.6 log10 ( / max ) 105 106 107 108 109 1010 number of operations • Fitra • Squeezed Fitra • Frank-Wolfe • Squeezed Frank-wolfe Clément Elvira Séminaire équipe SCEE 25/28 25/28
  45. Benchmark A ∈ R100×150 A[i, j] ∈ [0, 1] Budget:

    108 operations 10 16 10 13 10 10 10 7 10 4 10 1 (Dual gap) 0% 20% 40% 60% 80% 100% %run such that gap< / max =0.3 10 16 10 13 10 10 10 7 10 4 10 1 (Dual gap) / max =0.8 • Fitra • Squeezed Fitra • Frank-Wolfe • Squeezed Frank-wolfe Clément Elvira Séminaire équipe SCEE 26/28 26/28
  46. Squeezing test - at no cost? • Computing u: Dual

    scaling of residual vector O(1) ✓ • Squeezing test: inner product aT i u ≡ gradient descent step → already done ✓ • Squeezing test: radius r ≡ dual gap → already computed to monitor convergence ✓ • Proximity operator: sorting O(n log(n)) n is decreasing here can be faster than computing the prox of the ℓ∞ -norm O(n) Clément Elvira Séminaire équipe SCEE 27/28 27/28
  47. Conclusion - prospects • It is possible to dynamically detect

    saturated entries • It leads to an equivalent low dimensional problem • We obtain faster algorithms at (almost) no additional cost Prospects • Other safe regions (dome, truncated dome…) • Nesterov acceleration? • Extension to more BLasso? continuous dictionaries? stay tuned! https://arxiv.org/abs/1911.07508 Toolbox: https://gitlab.inria.fr/celvira/safe-squeezing Clément Elvira Séminaire équipe SCEE 28/28 28/28
  48. Ideal test to detect saturated entries Theorem [to appear] Let

    u⋆ = arg max u∈UI 1 2 ∥y∥2 2 − 1 2 ∥y − u∥2 2 with UI a known polytope Then ai Tu⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(ai Tu⋆) Clément Elvira Séminaire équipe SCEE 1/1 1/1
  49. Ideal test to detect saturated entries Theorem [to appear] Let

    u⋆ = arg max u∈UI 1 2 ∥y∥2 2 − 1 2 ∥y − u∥2 2 with UI a known polytope Then ai Tu⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(ai Tu⋆) 1. Slater conditions involves ∃v⋆ + s.t. v⋆ + (i)(x⋆(i) − w⋆) = 0 Clément Elvira Séminaire équipe SCEE 1/1 1/1
  50. Ideal test to detect saturated entries Theorem [to appear] Let

    u⋆ = arg max u∈UI 1 2 ∥y∥2 2 − 1 2 ∥y − u∥2 2 with UI a known polytope Then ai Tu⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(ai Tu⋆) 1. Slater conditions involves ∃v⋆ + s.t. v⋆ + (i)(x⋆(i) − w⋆) = 0 2. 1st order optimality conditions involve v⋆ + (i) = ai Tu⋆ Clément Elvira Séminaire équipe SCEE 1/1 1/1
  51. Ideal test to detect saturated entries Theorem [to appear] Let

    u⋆ = arg max u∈UI 1 2 ∥y∥2 2 − 1 2 ∥y − u∥2 2 with UI a known polytope Then ai Tu⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(ai Tu⋆) 1. Slater conditions involves ∃v⋆ + s.t. v⋆ + (i)(x⋆(i) − w⋆) = 0 2. 1st order optimality conditions involve v⋆ + (i) = ai Tu⋆ 3. If v⋆ + (i) ̸= 0 then x⋆(i) = +w⋆ necessarily Clément Elvira Séminaire équipe SCEE 1/1 1/1
  52. Ideal test to detect saturated entries Theorem [to appear] Let

    u⋆ = arg max u∈UI 1 2 ∥y∥2 2 − 1 2 ∥y − u∥2 2 with UI a known polytope Then ai Tu⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(ai Tu⋆) 1. Slater conditions involves ∃v⋆ + s.t. v⋆ + (i)(x⋆(i) − w⋆) = 0 2. 1st order optimality conditions involve v⋆ + (i) = ai Tu⋆ 3. If v⋆ + (i) ̸= 0 then x⋆(i) = +w⋆ necessarily Clément Elvira Séminaire équipe SCEE 1/1 1/1 + u⋆ cannot be orthogonal to all columns of A.