Slide 1

Exact optimization for ℓ0-norm sparse solutions to least-squares problems

Sébastien Bourguignon
Laboratoire des Sciences du Numérique de Nantes, École Centrale de Nantes

S3: Séminaire Signal @ Paris-Saclay, April 9th, 2021

Joint work with Ramzi Ben Mhenni (LS2N-ECN, now LITIS, Université de Rouen), Jordan Ninin (Lab-STICC / ENSTA Bretagne), Marcel Mongeau (Université de Toulouse, ENAC) and Hervé Carfantan (Université de Toulouse, IRAP)

Slide 2

Outline

1 Why?
2 Who?
3 How?
4 Where?

S. Bourguignon, Exact ℓ0-norm optimization, 2 / 23

Slide 3

Outline

1 Why? Exact solutions to ℓ0-norm problems may achieve better estimates.
2 Who? Small- to moderate-size sparse problems can be solved exactly.
3 How? A dedicated Branch-and-Bound strategy.
4 Where? Directions for future work.

Slide 4

Exactness: exact criterion, exact optimization

True, unrelaxed ℓ0-"norm" criterion¹:
$\|x\|_1 = \sum_p |x_p|$,  $\|x\|_q^q = \sum_p |x_p|^q$,  $\|x\|_0 := \mathrm{Card}\{p \mid x_p \neq 0\}$.

[Figure: some sparsity-enhancing functions $\sum_p \varphi(|x_p|)$ and their unit balls.]

Global optimization: optimality guaranteed by the algorithm.

¹ "On (re)lâche rien !" — French pun: no relaxing (and no giving up)!

Slide 5

Low-cardinality least-squares problems

Cardinality-constrained:
$P_{2/0}$: $\min_{x \in \mathbb{R}^P} \tfrac{1}{2}\|y - Ax\|^2$ s.t. $\mathrm{Card}\{i \mid x_i \neq 0\} \le K$
(cardinality-constrained portfolio optimization, subset selection, ...)

Error-constrained:
$P_{0/2}$: $\min_{x \in \mathbb{R}^P} \mathrm{Card}\{i \mid x_i \neq 0\}$ s.t. $\tfrac{1}{2}\|y - Ax\|^2 \le \epsilon$
(sparse approximation, subset selection, ...)

Penalized / scalarized:
$P_{2+0}$: $\min_{x \in \mathbb{R}^P} \tfrac{1}{2}\|y - Ax\|^2 + \lambda\, \mathrm{Card}\{i \mid x_i \neq 0\}$
(inverse problems, compressed sensing, subset selection, ...)

$\mathrm{Card}\{i \mid x_i \neq 0\} = \|x\|_0$: the ℓ0 "norm".

Slide 6

Some sparse inverse problems in signal processing

Deconvolution of spike trains (seismology [Mendel 83], non-destructive testing [Zala 92]):
reflectivity sequence x, instrument response h, data y = h ∗ x + noise.
Atoms: $h(t - nT_s)$, $n = 1, \ldots, N$.

$\hat{x} = \arg\min_x \tfrac{1}{2}\|y - Ax\|^2 + \lambda \|x\|_0$, or $\hat{x} \in \mathrm{Arg\,min}_x \|x\|_0$ s.t. $\tfrac{1}{2}\|y - Ax\|^2 \le \epsilon$.

Slide 7

Some sparse inverse problems in signal processing

Spectral unmixing (hyperspectral imaging) [Iordache et al., 11]:

$\hat{a} = \arg\min_{a \ge 0,\ \sum_i a_i = 1} \tfrac{1}{2}\|s - Sa\|^2$ s.t. $\|a\|_0 \le K$, or $\hat{a} \in \mathrm{Arg\,min}_{a \ge 0,\ \sum_i a_i = 1} \|a\|_0$ s.t. $\tfrac{1}{2}\|s - Sa\|^2 \le \epsilon$.

Atoms: reference spectra $s_{\mathrm{ref}}(\lambda)$.

[Figure: example pixel spectra as sparse mixtures, e.g. $s_1(\lambda) = a_{soil}\, s_{soil}(\lambda) + a_{rock}\, s_{rock}(\lambda)$, $s_2(\lambda) = a_{water}\, s_{water}(\lambda)$, $s_3(\lambda) = a_{tree}\, s_{tree}(\lambda) + a_{soil}\, s_{soil}(\lambda)$.]

Slide 8

Some sparse inverse problems in signal processing

Identification of non-linear models [Tang et al., 13].

Non-linear model: $y \approx \sum_{k=1}^{K} \alpha_k\, M(\theta_k)$ → estimate $\theta$, $\alpha$ and $K$? (e.g. sparse spectral analysis)

Linear model + sparsity: discretize $\theta^d_q = q\,\Delta\theta$, so that $y \approx \sum_{q=1}^{Q} x_q\, M(\theta^d_q) = Mx$ with $x$ sparse:
support $Q = \{q \mid x_q \neq 0\}$, $\theta = \{\theta^d_q\}_{q \in Q}$, $\alpha = \{x_q\}_{q \in Q}$, $K = \mathrm{Card}\, Q$.

Variable selection / subset selection / sparse regression [Miller, 02].

Slide 9

Exactness may be worth it...

Natural formulation for many problems:
$P_{2/0}$: $\min_{x \in \mathbb{R}^P} \tfrac{1}{2}\|y - Ax\|_2^2$ s.t. $\|x\|_0 \le K$;
$P_{0/2}$: $\min_{x \in \mathbb{R}^P} \|x\|_0$ s.t. $\tfrac{1}{2}\|y - Ax\|_2^2 \le \epsilon$;
$P_{2+0}$: $\min_{x \in \mathbb{R}^P} \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_0$.

Global optimum → better solution [Bertsimas et al., 16; Bourguignon et al., 16].

[Figure, results taken from [Bourguignon et al., 16]: deconvolution example. Data and truth: $\|y - H\mathring{x}\|_2^2 = 1.62$; OMP: $\|y - H\hat{x}\|_2^2 = 6.07$; ℓ1 relaxation: 2.36; SBR: 2.22; global optimum: 1.43.]
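The gap between greedy and exact solutions can be reproduced in miniature. The sketch below (illustrative random data, not the talk's instances) compares a textbook Orthogonal Matching Pursuit with exhaustive search over all K-supports; the exhaustive optimum is by construction never worse:

```python
import itertools
import numpy as np

def omp(y, A, K):
    """Greedy orthogonal matching pursuit (textbook version)."""
    support, r = [], y.copy()
    for _ in range(K):
        support.append(int(np.argmax(np.abs(A.T @ r))))  # most correlated atom
        As = A[:, support]
        coef, *_ = np.linalg.lstsq(As, y, rcond=None)    # refit on support
        r = y - As @ coef
    return support, r

def best_subset(y, A, K):
    """Exhaustive search over all K-supports: the global optimum."""
    best = (np.inf, None)
    for s in itertools.combinations(range(A.shape[1]), K):
        As = A[:, s]
        coef, *_ = np.linalg.lstsq(As, y, rcond=None)
        res = np.sum((y - As @ coef) ** 2)
        if res < best[0]:
            best = (res, s)
    return best

rng = np.random.default_rng(1)
A = rng.standard_normal((15, 12))
A /= np.linalg.norm(A, axis=0)                # unit-norm atoms
x0 = np.zeros(12)
x0[[0, 5, 9]] = [2.0, -1.5, 1.0]
y = A @ x0 + 0.1 * rng.standard_normal(15)

_, r_omp = omp(y, A, 3)
res_opt, s_opt = best_subset(y, A, 3)
```

Here exhaustive search costs $\binom{12}{3} = 220$ least-squares fits; Branch-and-Bound aims at reaching the same optimum while avoiding most of them.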


Slide 12

...but exactness has a price

NP-hard: $\|x\|_0 \le K$ gives $\binom{P}{K}$ possible combinations... in the worst-case scenario!

Branch-and-Bound: eliminate (hopefully huge) sets of possible combinations without resorting to their evaluation.

Small-data problems: one-dimensional problems (deconvolution, time-series spectral analysis, spectral unmixing, ...); variable/subset selection in statistics.
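The worst-case combination counts are easy to check exactly with `math.comb`; the values below are the ones quoted later in the deconvolution experiments (P = 100):

```python
from math import comb

# Number of possible K-element supports among P = 100 variables.
counts = {K: comb(100, K) for K in (5, 7, 9)}
# comb(100, 5) = 75_287_520         ~ 7.5e7
# comb(100, 7) = 16_007_560_800     ~ 1.6e10
# comb(100, 9) = 1_902_231_808_400  ~ 1.9e12
```

Enumerating ~10¹² supports, each requiring a least-squares fit, is hopeless; pruning whole subtrees is the only way through.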

Slide 13

Mixed Integer Programming (MIP) reformulation

(see [Bienstock 96, Bertsimas et al., 16, Bourguignon et al., 16])

Big-M assumption: $\forall p, |x_p| \le M$. Then:
$\min_{x \in \mathbb{R}^P} \|y - Ax\|_2^2$ s.t. $\|x\|_0 \le K$ and $\forall p, |x_p| \le M$
⇔
$\min_{b \in \{0,1\}^P,\, x \in \mathbb{R}^P} \|y - Ax\|_2^2$ s.t. $\sum_p b_p \le K$ and $\forall p, |x_p| \le M b_p$.

Can be addressed by MIP solvers (CPLEX, GUROBI, ...), but computation time grows fast and problem sizes remain limited.

Here: no need for the MIP reformulation nor for binary variables — a specific Branch-and-Bound construction for problems $P_{2/0}$, $P_{0/2}$ and $P_{2+0}$.

Slide 14

Specificity of ℓ0-norm problems as a very particular MIP

$P_{2/0}$: $\min_{x \in \mathbb{R}^P,\, b \in \{0,1\}^P} \tfrac{1}{2}\|y - Ax\|^2$ s.t. $\sum_p b_p \le K$ and $\forall p, -M b_p \le x_p \le M b_p$.

- Binary variables appear in a simple (separable) sum.
- One binary variable ↔ one continuous variable.
- No quadratic coupling mixing x and b, nor between $b_i$ and $b_j$.
- Cost function in x / constraints in b (or conversely).
- Finally, no need for binary variables!

Related work:
[Bienstock, 96]: many ideas, few results.
[Jokar & Pfetsch, 08]: MIP mentioned, but too heavy.
[Bertsimas & Shioda, 09]: dedicated MIP solver (#x ≤ #y).
[Karahanoglu et al., 13]: noise-free problems (y = Ax).
[Hazimeh et al., 20]: ℓ0 + ℓ2 penalization.


Slide 18

Branch-and-Bound resolution [Land & Doig, 60]

Decision tree for binary variables. At each node, a lower bound on all subproblems contained in this node: the remaining binary variables are relaxed into [0, 1]. If this bound exceeds the best known solution, the branch is pruned.

[Tree: $P^{(0)}$ branches on $b_{p_0}$ into $P^{(1)}$ ($b_{p_0} = 1$) and $P^{(4)}$ ($b_{p_0} = 0$); $P^{(1)}$ branches on $b_{p_1}$; $P^{(4)}$ branches on $b_{p_4}$.]

Which variable $b_p$ to branch on? The one with the highest relaxed value. Which side to explore first? $b_p = 1$. Which node to explore first? Depth-first search. Computation of relaxed solutions? Related to ℓ1-norm optimization...
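The overall strategy can be sketched in a few dozen lines. Caveats: this is an illustrative sketch, not the talk's algorithm — the lower bound below is a plain least-squares relaxation that simply drops the cardinality constraint on the undecided variables (valid, but weaker than the ℓ1-based relaxation presented next), and branching uses the largest relaxed amplitude rather than the relaxed $b_p$:

```python
import itertools
import numpy as np

def ls_residual(y, A, cols):
    """Least-squares misfit 0.5*||y - A_cols z||^2 restricted to given columns."""
    if not cols:
        return 0.5 * y @ y, np.zeros(0)
    As = A[:, cols]
    coef, *_ = np.linalg.lstsq(As, y, rcond=None)
    r = y - As @ coef
    return 0.5 * r @ r, coef

def branch_and_bound(y, A, K):
    """Depth-first B&B for min 0.5||y - Ax||^2 s.t. ||x||_0 <= K (sketch)."""
    P = A.shape[1]
    best = [np.inf, None]            # incumbent value and support

    def recurse(S1, S0):             # S1: forced in, S0: forced out
        free = [p for p in range(P) if p not in S1 and p not in S0]
        if len(S1) == K or not free: # leaf: fit on the decided support
            f, _ = ls_residual(y, A, sorted(S1))
            if f < best[0]:
                best[0], best[1] = f, sorted(S1)
            return
        cols = sorted(S1 | set(free))
        lb, coef = ls_residual(y, A, cols)  # cardinality dropped: lower bound
        if lb >= best[0]:                   # prune: bound beats nothing
            return
        if len(S1) + len(free) <= K:        # relaxed solution is feasible
            best[0], best[1] = lb, cols
            return
        amp = dict(zip(cols, np.abs(coef)))
        p = max(free, key=lambda q: amp[q]) # branch on largest amplitude
        recurse(S1 | {p}, S0)               # explore b_p = 1 first
        recurse(S1, S0 | {p})               # then b_p = 0
    recurse(set(), set())
    return best[0], best[1]

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 8))
x0 = np.zeros(8)
x0[[1, 6]] = [3.0, -2.0]
y = A @ x0 + 0.05 * rng.standard_normal(12)

f_bb, s_bb = branch_and_bound(y, A, K=2)
f_ex = min(ls_residual(y, A, list(s))[0]
           for s in itertools.combinations(range(8), 2))
```

Any valid lower bound yields a correct B&B; tighter bounds (like the ℓ1 relaxation developed below) prune more and explore fewer nodes.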

Slide 19

MIP continuous relaxation and the ℓ1 norm

$P_{2/0}$: $\min_{x \in \mathbb{R}^P,\, b \in \{0,1\}^P} \tfrac{1}{2}\|y - Ax\|^2$ s.t. $\sum_p b_p \le K$ and $\forall p, -M b_p \le x_p \le M b_p$.

$R_{2/0}$: the same problem with $b \in [0,1]^P$ (continuous relaxation).

We have²
$\min R_{2/0} = \min_{x \in \mathbb{R}^P} \tfrac{1}{2}\|y - Ax\|^2$ s.t. $\|x\|_1 \le MK$ and $\forall p, |x_p| \le M$.

² Proof: at a solution $(x^*, b^*)$ of $R_{2/0}$, $|x^*| = M b^*$...
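The right-hand problem (ℓ1 ball plus box constraints) can be solved with off-the-shelf tools via the classical positive/negative split $x = u - v$, $u, v \ge 0$, which turns the ℓ1 constraint into a linear one. A sketch with SciPy's SLSQP on illustrative data (note $u, v \in [0, M]$ guarantees $|u - v| \le M$ and $\sum |u_i - v_i| \le \sum (u_i + v_i) \le \tau$, so the returned point is always feasible):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 6))
y = A @ np.array([2.0, 0, 0, -1.0, 0, 0]) + 0.1 * rng.standard_normal(10)
M, K = 5.0, 2
tau = M * K                      # l1 radius of the relaxed feasible set
P = A.shape[1]

def obj(z):                      # z = [u; v], x = u - v
    x = z[:P] - z[P:]
    r = y - A @ x
    return 0.5 * r @ r

def grad(z):
    x = z[:P] - z[P:]
    g = -A.T @ (y - A @ x)
    return np.concatenate([g, -g])

cons = [{"type": "ineq",                       # tau - sum(u + v) >= 0
         "fun": lambda z: tau - z.sum(),
         "jac": lambda z: -np.ones(2 * P)}]
bnds = [(0.0, M)] * (2 * P)                    # 0 <= u, v <= M
res = minimize(obj, np.zeros(2 * P), jac=grad, method="SLSQP",
               bounds=bnds, constraints=cons)
x_rel = res.x[:P] - res.x[P:]                  # relaxed-problem solution
```

A generic QP solver like this is convenient for checking, but too slow inside a B&B tree; hence the dedicated homotopy solver of the next slides.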

Slide 20

Continuous relaxation within the Branch-and-Bound procedure

At a given node: $b_{S_0} = 0$ and $x_{S_0} = 0$; $b_{S_1} = 1$ and $|x_{S_1}| \le M$; $b_{\bar{S}}$ free and $|x_{\bar{S}}| \le M b_{\bar{S}}$; $\sum_p b_p = \mathrm{Card}\, S_1 + \sum_{p \in \bar{S}} b_p$.

[Tree: node constraints accumulate along the branches from $P^{(0)}$.]

The relaxed problem at node i reads equivalently:
$R^{(i)}_{2/0}$: $\min_{x_{\bar{S}},\, x_{S_1}} \tfrac{1}{2}\|y - A_{\bar{S}} x_{\bar{S}} - A_{S_1} x_{S_1}\|_2^2$ s.t. $\|x_{\bar{S}}\|_1 \le M(K - \mathrm{Card}\, S_1)$, $\|x_{\bar{S}}\|_\infty \le M$, $\|x_{S_1}\|_\infty \le M$.

Least squares, (partial) ℓ1 norm and box constraints. No binary variables!

Slide 21

Optimization with (partial) ℓ1 norm and box constraints: homotopy continuation principle

Standard case [Osborne et al., 00]:
$\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1$.

With free variables and box constraints:
$\min_{x_{\bar{S}},\, x_{S_1}} \tfrac{1}{2}\|y - A_{\bar{S}} x_{\bar{S}} - A_{S_1} x_{S_1}\|_2^2 + \lambda \|x_{\bar{S}}\|_1$ s.t. $\|x_{\bar{S}}\|_\infty \le M$, $\|x_{S_1}\|_\infty \le M$.

[Figure: piecewise-linear solution paths $\lambda \mapsto x^*(\lambda)$ with breakpoints $\lambda^{(0)}, \lambda^{(1)}, \ldots, \lambda^*$; in the box-constrained case, components saturate at ±M.]
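A full homotopy solver is beyond a short sketch, but the path can be approximated by solving the penalized problem on a grid of λ values with proximal gradient (ISTA) — an illustrative stand-in for homotopy, on illustrative data. The prox of $\lambda |x_i|$ plus the box $|x_i| \le M$ is soft-thresholding followed by clipping:

```python
import numpy as np

def ista_box_l1(y, A, lam, M, iters=3000):
    """Proximal gradient for min 0.5||y-Ax||^2 + lam||x||_1 s.t. ||x||_inf <= M."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / L    # gradient step
        # prox: soft-threshold by lam/L, then clip to the box [-M, M]
        x = np.clip(np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0), -M, M)
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 6))
y = A @ np.array([2.0, 0, 0, -1.0, 0, 0]) + 0.1 * rng.standard_normal(10)
M = 5.0

lam_max = np.max(np.abs(A.T @ y))   # at or above this, the solution is exactly 0
lams = [1.2 * lam_max, 0.5 * lam_max, 0.1 * lam_max, 0.01 * lam_max]
sols = [ista_box_l1(y, A, lam, M) for lam in lams]
```

Tracing the grid from large to small λ mimics the homotopy path: supports grow and the misfit decreases; the exact method instead jumps analytically from breakpoint to breakpoint.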

Slide 22

Homotopy continuation

Similarly solves relaxations for the sparsity-constrained problem:
$R^{(i)}_{2/0}$: $\min_{x_{\bar{S}},\, x_{S_1}} \tfrac{1}{2}\|y - A_{\bar{S}} x_{\bar{S}} - A_{S_1} x_{S_1}\|_2^2$ s.t. $\|x_{\bar{S}}\|_1 \le \tau$, $\|x_{\bar{S}}\|_\infty \le M$, $\|x_{S_1}\|_\infty \le M$

and for the error-constrained problem:
$R^{(i)}_{0/2}$: $\min_{x_{\bar{S}},\, x_{S_1}} \|x_{\bar{S}}\|_1$ s.t. $\tfrac{1}{2}\|y - A_{\bar{S}} x_{\bar{S}} - A_{S_1} x_{S_1}\|_2^2 \le \epsilon$, $\|x_{\bar{S}}\|_\infty \le M$, $\|x_{S_1}\|_\infty \le M$.

[Figure: Pareto curve of the misfit $\tfrac{1}{2}\|y - A_{S_1} x^*_{S_1} - A_{\bar{S}} x^*_{\bar{S}}\|^2$ versus $\|x^*_{\bar{S}}\|_1 \le \tau$, with homotopy breakpoints $\lambda^{(0)}, \ldots, \lambda^{(4)}, \lambda^*$.]

Slide 23

Branch-and-Bound: example (P = 5, K = 2)

Data: $y = Ax + n$, with observed $y = (0.0489,\ 8.1035,\ 8.0727,\ -2.0303)^T$,
$A = \begin{pmatrix} 1 & 5 & 2 & -1 & 1 \\ 3 & 4 & 5 & 0 & -1 \\ 2 & 1 & 4 & 2 & -2 \\ -1 & 0 & -2 & 1 & 0 \end{pmatrix}$ and true $x = (2, 0, 0, 0, -2)^T$.

Solve $\min_{x \in \mathbb{R}^5,\, b \in \{0,1\}^5} \tfrac{1}{2}\|y - Ax\|^2$ s.t. $\sum_p b_p \le 2$ and $-Mb \le x \le Mb$, with M = 5.

Slide 24

Branch-and-Bound: example

Root node $P^{(0)}_R$: $\min_{x \in \mathbb{R}^5,\, b \in [0,1]^5} \tfrac{1}{2}\|y - Ax\|^2$ s.t. $\sum_p b_p \le 2$, $-Mb \le x \le Mb$.
Relaxed solution: f = 0, x = (2.2, 0.16, 0.3, −0.072, −2.2)^T, b = (0.58, 0.24, 0.27, 0.24, 0.56)^T.

Slide 25

Branch-and-Bound: example

Root relaxed solution: f = 0, x = (2.2, 0.16, 0.3, −0.072, −2.2)^T, b = (0.58, 0.24, 0.27, 0.24, 0.56)^T.
Branching on $b_1$ (highest relaxed value): children $b_1 = 1$ and $b_1 = 0$.

Slide 26

Branch-and-Bound: example

Node $P^{(1)}_R$ (relaxation with $b_1 = 1$): f = 0, x = (3.1, −0.05, 0.14, 0.51, −1.2)^T, b = (1, 0.25, 0.23, 0.2, 0.32)^T.

Slide 27

Branch-and-Bound: example

Node $P^{(1)}_R$ ($b_1 = 1$): f = 0, x = (3.1, −0.05, 0.14, 0.51, −1.2)^T, b = (1, 0.25, 0.23, 0.2, 0.32)^T.
Branching on $b_5$: children $b_5 = 1$ and $b_5 = 0$.

Slide 28

Branch-and-Bound: example

Node $P^{(2)}$ ($b_1 = 1$, $b_5 = 1$; b is binary, hence a feasible solution): f = 0.07, x = (3.1, 0, 0, 0, −1.9)^T, b = (1, 0, 0, 0, 1)^T.

Slide 29

Branch-and-Bound: example

Node $P^{(3)}_R$ ($b_1 = 1$, $b_5 = 0$): f = 0, x = (4.2, −0.29, −0.041, 1.2, 0)^T, b = (1, 0.33, 0.3, 0.36, 0)^T.

Slide 30

Branch-and-Bound: example

Node $P^{(3)}_R$ ($b_1 = 1$, $b_5 = 0$): f = 0, x = (4.2, −0.29, −0.041, 1.2, 0)^T, b = (1, 0.33, 0.3, 0.36, 0)^T.
Branching on $b_4$: children $b_4 = 1$ and $b_4 = 0$.

Slide 31

Branch-and-Bound: example

Node $P^{(4)}$ ($b_1 = 1$, $b_5 = 0$, $b_4 = 1$; binary, feasible): f = 0.527, x = (3.7, 0, 0, 1.4, 0)^T, b = (1, 0, 0, 1, 0)^T.


Slide 33

Branch-and-Bound: example

Node $P^{(5)}_R$ ($b_1 = 1$, $b_5 = 0$, $b_4 = 0$): f = 2.52, x = (3, −0.64, 0.95, 0, 0)^T, b = (1, 0.47, 0.53, 0, 0)^T — the bound exceeds the best known value (0.07): branch pruned.


Slide 35

Branch-and-Bound: example

Node $P^{(6)}_R$ ($b_1 = 0$): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)^T, b = (0, 0.24, 0.27, 0.41, 0.99)^T.

Slide 36

Branch-and-Bound: example

Node $P^{(6)}_R$ ($b_1 = 0$): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)^T, b = (0, 0.24, 0.27, 0.41, 0.99)^T.
Branching on $b_5$: children $b_5 = 1$ and $b_5 = 0$.

Slide 37

Branch-and-Bound: example

Node $P^{(7)}_R$ ($b_1 = 0$, $b_5 = 1$): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)^T, b = (0, 0.31, 0.27, 0.42, 1)^T.

Slide 38

Branch-and-Bound: example

Node $P^{(7)}_R$ ($b_1 = 0$, $b_5 = 1$): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)^T, b = (0, 0.31, 0.27, 0.42, 1)^T.
Branching on $b_4$: children $b_4 = 1$ and $b_4 = 0$.

Slide 39

Branch-and-Bound: example

Node $P^{(8)}_R$ ($b_1 = 0$, $b_5 = 1$, $b_4 = 1$): f = 37, x = (0, 0, 0, −1.5, −5)^T, b = (0, 0, 0, 1, 1)^T — far above the best known value (0.07): branch pruned.


Slide 41

Branch-and-Bound: example

Node $P^{(9)}_R$ ($b_1 = 0$, $b_5 = 1$, $b_4 = 0$): f = 0.327, x = (0, 0.19, 1.5, 0, −2.2)^T, b = (0, 0.41, 0.59, 0, 1)^T — the bound exceeds 0.07: branch pruned.


Slide 43

Branch-and-Bound: example

Node $P^{(10)}_R$ ($b_1 = 0$, $b_5 = 0$): f = 0.992, x = (0, −0.27, 2.2, 1, 0)^T, b = (0, 0.47, 0.67, 0.52, 0)^T — the bound exceeds 0.07: branch pruned.


Slide 45

Branch-and-Bound: example

All branches explored or pruned: the solution found at node $P^{(2)}$ is the global optimum: f = 0.07, x = (3.1, 0, 0, 0, −1.9)^T, b = (1, 0, 0, 0, 1)^T.

Slide 46

Sparse deconvolution: computing time

Implementation in C++, UNIX desktop machine (single core). N = 120, P = 100, SNR = 10 dB, 50 instances.

                     Branch-and-Bound              MIP, CPLEX 12.8
Problem    K     Time (s)  Nodes (×10³)  Failed   Time (s)  Nodes (×10³)  Failed
P_{2/0}    5        0.7       1.28         0         3.0       1.71         0
           7       11.6      17.89         0        16.6      21.51         0
           9       43.5      57.37         9        53.8      72.04         6
P_{0/2}    5        0.1       0.21         0        25.7       6.71         0
           7        0.9       2.32         0       114.8      49.54         2
           9        2.5       5.22         0       328.2     101.07        17
P_{2+0}    5        1.8       2.01         0         3.2       1.98         0
           7        7.3      10.20         0         7.4       9.61         0
           9       25.6      31.80         5        17.3      23.74         2

Remark: $\binom{100}{5} \approx 7.5 \times 10^7$, $\binom{100}{7} \approx 1.6 \times 10^{10}$, $\binom{100}{9} \approx 1.9 \times 10^{12}$.

Competitive exploration strategy (despite being simple); efficient node evaluation (especially for small K); performance ≥ CPLEX, and far better than CPLEX for $P_{0/2}$.

Slide 47

Subset selection problems

(framework in [Bertsimas et al., 16]) N = 500, P = 1000, SNR = 8 dB, 50 instances; rows $a_n \sim N(0, \Sigma)$ with $\sigma_{ij} = \rho^{|i-j|}$; $y = Ax^0 + n$, with equispaced non-zero values in $x^0$ equal to 1.

[Figure: exact support recovery (%) versus sparsity degree K, for ρ = 0.8 and ρ = 0.95.]

Better estimation than standard methods (even with limited time); performance far beyond CPLEX.

Slide 48

Subset selection: computing times

N = 500, P = 1000, SNR = 8 dB, ρ = 0.8, 50 instances. T/N: average time per node.

                      Branch-and-Bound                          CPLEX
Problem    K    Time (s)  Nodes (×10³)  T/N (s)  Failed   Time (s)  Nodes (×10³)  T/N (s)  Failed
P_{2/0}    5       0.3       0.01        0.02      0        80.0       0.01        6.07      0
           9       7.4       0.15        0.05      0       265.8       0.18        1.45      0
          13      54.6       0.69        0.08      6       786.0       0.84        0.93     46
          17     624.4       5.82        0.11     43          -          -           -      50
P_{0/2}    5       0.2       0.01        0.02      1       495.7       0.32        1.54     11
           9      16.0       0.21        0.08      0          -          -           -      50
          13     311.5       2.32        0.13     12          -          -           -      50
          17     694.3       4.90        0.14     47          -          -           -      50
P_{2+0}    5       0.3       0.03        0.01      0       235.3       0.04        6.00      0
           9      14.3       0.95        0.02      0       549.1       1.29        0.42      4
          13     855.3      34.52        0.02     32       982.6       3.61        0.27     49
          17     996.9      42.49        0.02     41          -          -           -      50

CPLEX performance drops sharply. Resolution guaranteed up to K ≈ 15, while $\binom{1000}{15} \approx 10^{32}$...


Slide 52

Conclusions

Exact ℓ0 solutions may be worth it.
Specific Branch-and-Bound method ≫ generic MIP resolution.
Cardinality-constrained, error-constrained and penalized problems.
Most computing time = proving optimality → partial exploration already yields competitive solutions.

Perspectives:
Alternate formulations to big-M (SOS, set covering, ...).
More sophisticated branching rules, local heuristics, cutting planes, ...
Improve relaxations with non-convex ones ($1_{x \neq 0}$, $|x|$, $|x|^p$, CEL0(x)).

Slide 53

Thank you.

ANR JCJC program, 2016-2021, MIMOSA: Mixed Integer programming Methods for Optimization of Sparse Approximation criteria.

S. Bourguignon, J. Ninin, H. Carfantan and M. Mongeau. IEEE Transactions on Signal Processing, 2016.
R. Ben Mhenni, S. Bourguignon and J. Ninin. Optimization Methods and Software, submitted.

Matlab and C codes available upon request (Git will come soon!)

Slide 54

Homotopy continuation principle [Osborne et al., 00, Efron et al., 04]

Consider the penalized formulation:
$x^\lambda = \arg\min_x J(x) := \tfrac{1}{2}\|y - Ax\|^2 + \lambda \|x\|_1$.

$x^\lambda$ minimizes J if and only if $0 \in \partial J(x^\lambda)$, with
$\partial J(x^\lambda) = \{-A^T(y - Ax^\lambda) + \lambda z,\ z \in \mathbb{R}^Q\}$, where $z_i = \mathrm{sgn}(x^\lambda_i)$ if $x^\lambda_i \neq 0$ and $z_i \in [-1, 1]$ if $x^\lambda_i = 0$.

With NZ and Z indexing the nonzero and the zero components of $x^\lambda$, this reads:
$|A_Z^T (y - A_{NZ}\, x^\lambda_{NZ})| \le \lambda$ and $-A_{NZ}^T (y - A_{NZ}\, x^\lambda_{NZ}) + \lambda\, \mathrm{sgn}(x^\lambda_{NZ}) = 0$.

If $\lambda \ge \lambda_{\max} := \|A^T y\|_\infty$, then $x^\lambda = 0$. The non-zero components satisfy
$x^\lambda_{NZ} = (A_{NZ}^T A_{NZ})^{-1} (A_{NZ}^T y - \lambda\, \mathrm{sgn}(x^\lambda_{NZ}))$.

The solution path $\lambda \mapsto x^\lambda$ is a piecewise-linear function of λ: it is linear as long as the support configuration $\{NZ, Z, \mathrm{sgn}(x^\lambda_{NZ})\}$ does not change.
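These optimality conditions are easy to verify numerically: solve the penalized problem with any ℓ1 solver (below, a small ISTA loop used as an illustrative stand-in for homotopy, on illustrative data) and check the subgradient conditions on the zero and nonzero components:

```python
import numpy as np

def ista(y, A, lam, iters=20000):
    """Plain proximal-gradient solver for 0.5||y - Ax||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((12, 6))
y = A @ np.array([1.5, 0, 0, -2.0, 0, 0]) + 0.05 * rng.standard_normal(12)

lam = 0.3
x = ista(y, A, lam)
c = A.T @ (y - A @ x)      # correlations A^T(y - A x^lambda)
nz = np.abs(x) > 1e-8      # NZ: support of the solution

# Optimality: |c_i| <= lam on Z, and c_i = lam * sgn(x_i) on NZ.
# Also, lam >= lam_max := ||A^T y||_inf forces x^lambda = 0:
x_zero = ista(y, A, 1.01 * np.max(np.abs(A.T @ y)))
```

The homotopy method exploits the same conditions in closed form: between breakpoints the support and signs are fixed, so $x^\lambda_{NZ}$ follows the linear formula above until a component enters or leaves NZ.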