Clément Elvira - Speaker Deck

Slide 1

Slide 1 text

Safe squeezing for antisparse coding Clément Elvira — joint work with Cédric Herzet CentraleSupélec, L2S, Inverse Problem Group (GPI) 4 Octobre 2019

Slide 2

Slide 2 text

Antisparse coding

Slide 3

Slide 3 text

Inverse / Learning problems Given • Acquisition matrix A ∈ Rm×n m < n • Observation y ≃ Ax0 ∈ Rm Goal: recover x0 from arg min x∈Rn 1 2 ∥y − Ax∥2 2 Infinite number of solutions −→ ill posed problem Clément Elvira Séminaire équipe SCEE 1/28 1/28

Slide 4

Slide 4 text

Penalized problem Penalized problem x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + Reg(x) The choice of Reg should • reduce the number of solutions • favor solutions with desirable properties • allow for fast algorithms Clément Elvira Séminaire équipe SCEE 2/28 2/28

Slide 5

Slide 5 text

Slide 6

Slide 6 text

From sparse coding to antisparse coding Clément Elvira Séminaire équipe SCEE 3/28 3/28 0 20 40 60 80 100 120 0 0.2 |x(n)| Parcimonieux 0 20 40 60 80 100 120 0 0.2 |x(n)| Ridge 0 20 40 60 80 100 120 0 0.2 |x(n)| Antiparcimonieux 20 40 60 80 100 120 indice n 0 0.5 1 Reg(x) = λ∥x∥1 ⇒ sparsity Reg(x) = λ∥x∥2 2 ⇒ energy Reg(x) = λ∥x∥∞ ⇒ amplitude

Slide 7

Slide 7 text

Application 1 / 3 • Peak to Average Power Ratio (PAPR) reduction Studer & Larsson (2013) ∀x ∈ Rn, PAPR(x) = n∥x∥2 ∞ ∥x∥2 2 Courtesy of Studer and Larsson sw = Hw xw yw = Hw xw + nw Clément Elvira Séminaire équipe SCEE 4/28 4/28

Slide 8

Slide 8 text

Application 2 / 3 • Robotic: Uniform power allocation Cadzow (1971) • Cinematic redundant system • Uniform spread of electric power Clément Elvira Séminaire équipe SCEE 5/28 5/28 y = A                 x(1) 0 . . . x(1) p . . . x(t) 0 . . . x(t) p                

Slide 9

Slide 9 text

Slide 10

Slide 10 text

Application 3 / 3 ML: Approximate Nearest Neighbor search Jegou, Furon and Fuchs (2012) Idea: Learn a higher dimensional representation • x(i) = ±α =⇒ binarization + privacy • binary distance = XOR =⇒ faster Clément Elvira Séminaire équipe SCEE 6/28 6/28 x1 x2 d(x1 , x2 ) x1 x2

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Safe squeezing for antisparse coding

Slide 13

Slide 13 text

Computing antisparse representation Optimization problem x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥∞ −→ Convex, coercive Optimization methods Clément Elvira Séminaire équipe SCEE 7/28 7/28

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Computing antisparse representation Optimization problem x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥∞ −→ Convex, coercive Optimization methods • Heuristic to match the optimality conditions [Fuchs, 2011] ◦ add / remove entries from the set of saturated entries • Proximal Gradient: FITRA [Studer & Larsson, 2013] ◦ x(t+1) = proxλ∥·∥ ∞ (x(t) − α∇f (x(t))) Clément Elvira Séminaire équipe SCEE 7/28 7/28

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Connections with inverse problems involving sparsity Lasso Find x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥1 −→ Promotes sparsity Unused feature Safe screening [El ghaoui et al., 2010] [Fercoq et al., 2015] [Fraga-Dantas et al., 2018] [Dorfler et al., 2019] Clément Elvira Séminaire équipe SCEE 8/28 8/28

Slide 18

Slide 18 text

Connections with inverse problems involving sparsity Lasso Find x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥1 −→ Promotes sparsity Unused feature ←→ Saturation Safe screening ←→ Safe squeezing Clément Elvira Séminaire équipe SCEE 8/28 8/28

Slide 19

Slide 19 text

From sparse coding to antisparse coding Clément Elvira Séminaire équipe SCEE 8/28 8/28 0 20 40 60 80 100 120 0 0.2 |x(n)| Parcimonieux 0 20 40 60 80 100 120 0 0.2 |x(n)| Ridge 0 20 40 60 80 100 120 0 0.2 |x(n)| Antiparcimonieux 20 40 60 80 100 120 indice n 0 0.5 1 Reg(x) = λ∥x∥1 ⇒ sparsity Reg(x) = λ∥x∥2 2 ⇒ energy Reg(x) = λ∥x∥∞ ⇒ amplitude

Slide 20

Slide 20 text

Take home message • It is possible to dynamically detect saturated entries • It leads to consider an equivalent lower dimensional problem • It provides faster algorithm at (almost) no additional cost • It is experimentally validated Clément Elvira Séminaire équipe SCEE 9/28 9/28

Slide 21

Slide 21 text

Notions of saturation Recall x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥∞ Definition: saturated entry entry i is saturated iff x⋆(i) = ±∥x⋆∥∞ Clément Elvira Séminaire équipe SCEE 10/28 10/28

Slide 22

Slide 22 text

Notions of saturation Recall x⋆ ∈ arg min x∈Rn 1 2 ∥y − Ax∥2 2 + λ∥x∥∞ Definition: saturated entry entry i is saturated iff x⋆(i) = ±∥x⋆∥∞ Proposition [Fuchs,2011] Generically, x⋆ has at most m − 1 non saturated entries Similar results for “antisparse” Basis pursuit m − 1 can be small compared to n (n ≫ m) Clément Elvira Séminaire équipe SCEE 11/28 11/28

Slide 23

Slide 23 text

Towards a lower dimensional problem Let I⋆ + ≜ {i | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries Clément Elvira Séminaire équipe SCEE 12/28 12/28

Slide 24

Slide 24 text

Slide 25

Slide 25 text

Towards a lower dimensional problem Let I⋆ + ≜ {i | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries If we know I+ ⊂ I⋆ + and I− ⊂ I⋆ − Define • B = AIc • s = ∑ ℓ∈I+ ai − ∑ ℓ∈I− ai Equivalent lower dimensional problem [To appear] (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. ∥q∥∞ ≤ w Clément Elvira Séminaire équipe SCEE 12/28 12/28

Slide 26

Slide 26 text

Towards a lower dimensional problem Let I⋆ + ≜ {i | x⋆(i) = +∥x⋆∥∞ } and I⋆ − ≜ {i | x⋆(i) = −∥x⋆∥∞ } Sets of saturated entries If we know I+ ⊂ I⋆ + and I− ⊂ I⋆ − Define • B = AIc • s = ∑ ℓ∈I+ ai − ∑ ℓ∈I− ai Equivalent lower dimensional problem [To appear] (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. ∥q∥∞ ≤ w −→ Can we (dynamically) detect saturated entries? ←− Clément Elvira Séminaire équipe SCEE 12/28 12/28

Slide 27

Slide 27 text

Detecting saturated entries Theorem [to appear] Given a known polytope UI Let u⋆ = arg min u∈UI 1 2 ∥y − u∥2 2 Clément Elvira Séminaire équipe SCEE 13/28 13/28

Slide 28

Slide 28 text

UI y u⋆

Slide 29

Slide 29 text

Detecting saturated entries Theorem [to appear] Given a known polytope UI Let u⋆ = arg min u∈UI 1 2 ∥y − u∥2 2 Then aT i u⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(aT i u⋆) Clément Elvira Séminaire équipe SCEE 14/28 14/28

Slide 30

Slide 30 text

UI y u⋆ aT 1 u aT 2 u aT 3 u

Slide 31

Slide 31 text

Detecting saturated entries Theorem [to appear] Given a known polytope UI Let u⋆ = arg min u∈UI 1 2 ∥y − u∥2 2 Then aT i u⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(aT i u⋆) • Not a heuristic • Computationally simple Clément Elvira Séminaire équipe SCEE 15/28 15/28

Slide 32

Slide 32 text

From safe region to safe sphere Finding u⋆ is (almost) as difficult as finding x⋆ Clément Elvira Séminaire équipe SCEE 16/28 16/28

Slide 33

Slide 33 text

From safe region to safe sphere Finding u⋆ is (almost) as difficult as finding x⋆ Idea: perform the test without computing u⋆ Clément Elvira Séminaire équipe SCEE 16/28 16/28

Slide 34

Slide 34 text

From safe region to safe sphere Finding u⋆ is (almost) as difficult as finding x⋆ Idea: perform the test without computing u⋆ −→ Resort to a safe region A subset S is called Safe region iff u⋆ ∈ S [El Ghaoui et al., 2010] min u∈S ai Tu > 0 =⇒ ai Tu⋆ > 0 =⇒ x⋆(i) is saturated Clément Elvira Séminaire équipe SCEE 16/28 16/28

Slide 35

Slide 35 text

UI y u⋆ aT 1 u aT 2 u aT 3 u S

Slide 36

Slide 36 text

From safe region to safe sphere Finding u⋆ is (almost) as difficult as finding x⋆ Idea: perform the test without computing u⋆ −→ Resort to a safe region A subset S is called Safe region iff u⋆ ∈ S [El Ghaoui et al., 2010] min u∈S ai Tu > 0 =⇒ ai Tu⋆ > 0 =⇒ x⋆(i) is saturated Safe sphere: S = B(c, r) and minu∈B(c,r) ai Tu = ai Tc − r∥ai ∥2 Close form solution! Clément Elvira Séminaire équipe SCEE 17/28 17/28

Slide 37

Slide 37 text

Safe sphere design Goal Find c and r such that u⋆ ∈ B(c, r) Clément Elvira Séminaire équipe SCEE 18/28 18/28

Slide 38

Slide 38 text

Safe sphere design Dual problem Find u⋆ = arg min u∈UI ∥y − u∥2 2 −→ Projection onto the convex set UI ! Clément Elvira Séminaire équipe SCEE 19/28 19/28

Slide 39

Slide 39 text

Safe sphere design Dual problem Find u⋆ = arg min u∈UI ∥y − u∥2 2 −→ Projection onto the convex set UI ! If one knows some u0 ∈ UI , then by definition ∥y − u⋆∥2 2 ≤ ∥y − u0∥2 2 −→ u⋆ belongs to a Sphere! Clément Elvira Séminaire équipe SCEE 19/28 19/28

Slide 40

Slide 40 text

Safe sphere design Goal Find c and r such that u⋆ ∈ B(c, r) Choose u0 ∈ UI ST 1: c = y r = ∥y − u0∥2 Clément Elvira Séminaire équipe SCEE 20/28 20/28 typical use: done once for all before runtime

Slide 41

Slide 41 text

UI y u⋆

Slide 42

Slide 42 text

Visualizing the ST1 sphere UI y u⋆ u(0) aT 1 u aT 2 u aT 3 u

Slide 43

Slide 43 text

Visualizing the ST1 sphere UI y u⋆ u(0) u(1) aT 1 u aT 2 u aT 3 u

Slide 44

Slide 44 text

Visualizing the ST1 sphere UI y u⋆ u(0) u(1) … … u(t) aT 1 u aT 2 u aT 3 u

Slide 45

Slide 45 text

Safe sphere design Goal Find c and r such that u⋆ ∈ B(c, r) Choose u0 ∈ UI ST 1: c = y r = ∥y − u0∥2 GAP sphere: [Fercoq et al, 2015] c = u0 r = √ 2gap(x0, u0) Clément Elvira Séminaire équipe SCEE 21/28 21/28 typical use: done once for all before runtime typical use: • Dynamically • u(t) = projUI (y − Ax(t)) • radius tends to 0

Slide 46

Slide 46 text

Visualizing the GAP sphere UI y u⋆ u(0) √ gap(x(0), u(0)) aT 1 u aT 2 u aT 3 u

Slide 47

Slide 47 text

Visualizing the GAP sphere UI y u⋆ u(0) u(1) aT 1 u aT 2 u aT 3 u

Slide 48

Slide 48 text

Visualizing the GAP sphere UI y u⋆ u(0) u(1) … … u(t) aT 1 u aT 2 u aT 3 u

Slide 49

Slide 49 text

Algorithms

Slide 50

Slide 50 text

Principle of Dynamic squeezing x(0) = 0n , I(0) = ∅ ; u(0) = dualscal(y); t = 1 // iteration index repeat // Iterations of the optimization procedure (x(t), u(t)) = optim_update(x(t−1), u(t−1), I(t)) // Update iteration index t = t + 1 until convergence criterion is met; Output: x(t), I(t) Clément Elvira Séminaire équipe SCEE 22/28 22/28

Slide 51

Slide 51 text

Principle of Dynamic squeezing x(0) = 0n , I(0) = ∅ ; u(0) = dualscal(y); t = 1 // iteration index repeat // Squeezing test (c(t), r(t)) = sphere_param(x(t−1), u(t−1), I(t−1)) ; I(t−½) = squeezing_test(c(t), r(t)) ; I(t) = I(t−½) ∪ I(t−1) ; // Iterations of the optimization procedure (x(t), u(t)) = optim_update(x(t−1), u(t−1), I(t)) // Update iteration index t = t + 1 until convergence criterion is met; Output: x(t), I(t) Clément Elvira Séminaire équipe SCEE 22/28 22/28

Slide 52

Slide 52 text

A word about optimization procedure Optimization problem (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted Clément Elvira Séminaire équipe SCEE 23/28 23/28

Slide 53

Slide 53 text

A word about optimization procedure Optimization problem (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted • Squeezed Fitra ◦ Projected gradient algorithm ◦ ! △ Require the projection onto a cone ◦ ! △ Conditioning −→ scaled algorithm Clément Elvira Séminaire équipe SCEE 23/28 23/28

Slide 54

Slide 54 text

Slide 55

Slide 55 text

A word about optimization procedure Optimization problem (q⋆, w⋆) ∈ arg min q,w∈Rcard(Ic )×R 1 2 ∥y − Bq − ws∥2 2 + λw s.t. { +q ≤ w −q ≤ w Fitra is not fitted • Squeezed Fitra ◦ Projected gradient algorithm ◦ ! △ Require the projection onto a cone ◦ ! △ Conditioning −→ scaled algorithm • Squeezed Frank-Wolfe Clément Elvira Séminaire équipe SCEE 23/28 23/28 ws ←− w s ∥s∥2

Slide 56

Slide 56 text

Numerical experiments

Slide 57

Slide 57 text

Percentage of screened variables of iteration A ∈ R100×150 A[i, j] ∈ [0, 1] GAP sphere 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 / max 2 4 6 8 10 12 log2 (T) 0.0 0.2 0.4 0.6 0.8 1.0 Clément Elvira Séminaire équipe SCEE 24/28 24/28

Slide 58

Slide 58 text

Complexity savings A ∈ R100×150 A[i, j] ∈ [0, 1] GAP sphere 0.0 0.1 0.2 0.3 0.4 0.5 0.6 log10 ( / max ) 105 106 107 108 109 1010 number of operations • Fitra • Squeezed Fitra • Frank-Wolfe • Squeezed Frank-wolfe Clément Elvira Séminaire équipe SCEE 25/28 25/28

Slide 59

Slide 59 text

Benchmark A ∈ R100×150 A[i, j] ∈ [0, 1] Budget: 108 operations 10 16 10 13 10 10 10 7 10 4 10 1 (Dual gap) 0% 20% 40% 60% 80% 100% %run such that gap< / max =0.3 10 16 10 13 10 10 10 7 10 4 10 1 (Dual gap) / max =0.8 • Fitra • Squeezed Fitra • Frank-Wolfe • Squeezed Frank-wolfe Clément Elvira Séminaire équipe SCEE 26/28 26/28

Slide 60

Slide 60 text

Squeezing test - at no cost? • Computing u: Dual scaling of residual vector O(1) ✓ • Squeezing test: inner product aT i u ≡ gradient descent step → already done ✓ • Squeezing test: radius r ≡ dual gap → already computed to monitor convergence ✓ • Proximity operator: sorting O(n log(n)) n is decreasing here can be faster than computing the prox of the ℓ∞ -norm O(n) Clément Elvira Séminaire équipe SCEE 27/28 27/28

Slide 61

Slide 61 text

Conclusion - prospects • It is possible to dynamically detect saturated entries • It leads to an equivalent low dimensional problem • We obtain faster algorithms at (almost) no additional cost Prospects • Other safe regions (dome, truncated dome…) • Nesterov acceleration? • Extension to more BLasso? continuous dictionaries? stay tuned! https://arxiv.org/abs/1911.07508 Toolbox: https://gitlab.inria.fr/celvira/safe-squeezing Clément Elvira Séminaire équipe SCEE 28/28 28/28

Slide 62

Slide 62 text

Merci de votre attention! stay tuned! https://arxiv.org/abs/1911.07508 Toolbox: https://gitlab.inria.fr/celvira/safe-squeezing

Slide 63

Slide 63 text

Slide 64

Slide 64 text

Slide 65

Slide 65 text

Ideal test to detect saturated entries Theorem [to appear] Let u⋆ = arg max u∈UI 1 2 ∥y∥2 2 − 1 2 ∥y − u∥2 2 with UI a known polytope Then ai Tu⋆ > 0 =⇒ x⋆(i) is saturated + sign given by sign(ai Tu⋆) 1. Slater conditions involves ∃v⋆ + s.t. v⋆ + (i)(x⋆(i) − w⋆) = 0 2. 1st order optimality conditions involve v⋆ + (i) = ai Tu⋆ Clément Elvira Séminaire équipe SCEE 1/1 1/1