
Sébastien Bourguignon

(École Centrale de Nantes, LS2N, SIMS team)

https://s3-seminar.github.io/seminars/sebastien-bourguignon/

Title — Global optimization for l0-norm-based sparse solutions to least squares problems

Abstract — Finding solutions to least-squares problems with low cardinality (with small l0 “norm”) has found many applications, including sparsity-enhancing inverse problems in signal processing, subset selection in statistics, and cardinality-constrained portfolio optimization. This problem is NP-hard, so most works propose suboptimal (continuous-relaxation or greedy) methods that are able to address high-dimensional problems. On the other hand, several recent works have studied the global optimization of l0-norm problems through the mixed integer programming (MIP) reformulation of the l0 norm. Resolution is performed via MIP solvers, which essentially implement branch-and-bound methods.

We propose branch-and-bound algorithms specifically built for three possible formulations: cardinality-constrained and cardinality-penalized least squares, and cardinality minimization under a limited error constraint. A specific tree exploration strategy is built, inspired by forward greedy methods in stepwise regression. We then show that the relaxation problems involved at each node of the search tree can be reformulated as l1-norm optimization problems. We propose a dedicated algorithm for such problems, based on the homotopy continuation principle. Branch-and-bound algorithms are then constructed which are guaranteed to find the optimal subset, without resorting to MIP reformulations or binary variables.

The obtained certified solutions are shown to better estimate sparsity patterns than standard methods on simulated problems involving 1,000 unknowns and up to a few tens of non-zero components. For problems with higher complexity, unguaranteed solutions obtained by limiting the computing time are also shown to provide more satisfactory estimates. The resulting global optimization procedure is finally shown to strongly improve over the CPLEX MIP solver in terms of computing time.

Joint work with Ramzi Ben Mhenni and Jordan Ninin.

Biography — Sébastien Bourguignon has been an assistant professor (HDR) at École Centrale de Nantes since 2011. His research focuses on the design of signal and image processing algorithms for the resolution of inverse problems encountered in various fields of science and engineering, including ultrasonic, microwave, and hyperspectral imaging, in particular for non-destructive testing, planetary science, and astronomy. He is particularly interested in exploiting sparsity-based information, both in the physical modeling of data and in the related computational aspects. He is currently the coordinator of the MIMOSA “young researchers” project (Mixed Integer programming Methods for Optimization of Sparse Approximation criteria), funded by the French national research agency ANR (2017-2021).


S³ Seminar

April 09, 2021

Transcript

  1. Exact optimization for ℓ0-norm sparse solutions to least-squares problems

    Sébastien Bourguignon, Laboratoire des Sciences du Numérique de Nantes (LS2N), École Centrale de Nantes. S3: Séminaire Signal @ Paris-Saclay, April 9th, 2021. Joint work with Ramzi Ben Mhenni (LS2N-ECN, now LITIS, Université de Rouen), Jordan Ninin (Lab-STICC / ENSTA Bretagne), Marcel Mongeau (Université de Toulouse, ENAC), and Hervé Carfantan (Université de Toulouse, IRAP).
  2. Outline — 1 Why? 2 Who? 3 How? 4 Where?

    S. Bourguignon, Exact ℓ0-norm optimization
  3. Outline — 1 Why? Exact solutions to ℓ0-norm problems may achieve better estimates. 2 Who? Small- to moderate-size sparse problems can be solved exactly. 3 How? A dedicated Branch-and-Bound strategy. 4 Where? Directions for further work.
  4. Exactness: exact criterion, exact optimization. The true, unrelaxed ℓ0-“norm” criterion:

    ‖x‖₁ = Σ_p |x_p|,   ‖x‖_q^q = Σ_p |x_p|^q,   ‖x‖₀ := Card{p | x_p ≠ 0}.

    [Figure: some sparsity-enhancing functions Σ_p φ(|x_p|) and their unit balls.]
    Global optimization: optimality is guaranteed by the algorithm. (Footnote, a French pun: “On (re)lâche rien!” — we don’t relax anything.)
  5. Low-cardinality least-squares problems.

    Cardinality-constrained — P2/0: min_{x∈R^P} (1/2)‖y − Ax‖² s.t. Card{p | x_p ≠ 0} ≤ K. (CC portfolio optimization, subset selection, …)
    Error-constrained — P0/2: min_{x∈R^P} Card{p | x_p ≠ 0} s.t. (1/2)‖y − Ax‖² ≤ ε. (Sparse approximation, subset selection, …)
    Penalized / scalarized — P2+0: min_{x∈R^P} (1/2)‖y − Ax‖² + λ Card{p | x_p ≠ 0}. (Inverse problems, compressed sensing, subset selection, …)
    Card{p | x_p ≠ 0} = ‖x‖₀: the ℓ0 “norm”.
  6. Some sparse inverse problems in signal processing. Deconvolution of spike trains (seismology [Mendel 83], non-destructive testing [Zala 92]): reflectivity sequence x, instrument response h, data y = h ∗ x + noise. Atoms h(t − nT_s), n = 1, …, N.

    x̂ = arg min_x (1/2)‖y − Ax‖² + λ‖x‖₀   or   x̂ ∈ Arg min_x ‖x‖₀ s.t. (1/2)‖y − Ax‖² ≤ ε.
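The convolution model y = h ∗ x + noise becomes the linear model y = Ax by stacking shifted copies of h as columns of A. A minimal numpy sketch; the function name and sizes are illustrative, not code from the talk:

```python
import numpy as np

def convolution_matrix(h, N):
    """Build A such that A @ x equals np.convolve(h, x) for any x of
    length N: column n of A holds h shifted down by n samples."""
    h = np.asarray(h, dtype=float)
    L = len(h)
    A = np.zeros((N + L - 1, N))
    for n in range(N):
        A[n:n + L, n] = h
    return A
```

With A in hand, the spike positions are exactly the support of the sparse solution x of the ℓ0 problems above.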
  7. Some sparse inverse problems in signal processing. Spectral unmixing (hyperspectral imaging) [Iordache et al., 11]:

    â = arg min_{a ≥ 0, Σ_i a_i = 1} (1/2)‖s − Sa‖² s.t. ‖a‖₀ ≤ K   or   â ∈ Arg min_{a ≥ 0, Σ_i a_i = 1} ‖a‖₀ s.t. (1/2)‖s − Sa‖² ≤ ε.

    [Figure: reference atoms s_ref(λ) and mixed pixels, e.g. s₁(λ) = a_soil s_soil(λ) + a_rock s_rock(λ), s₂(λ) = a_water s_water(λ), s₃(λ) = a_tree s_tree(λ) + a_soil s_soil(λ).]
  8. Some sparse inverse problems in signal processing. Identification of non-linear models [Tang et al., 13].

    Non-linear model: y ≈ Σ_{k=1}^K α_k M(θ_k) → find θ, α and K? (e.g., sparse spectral analysis.)
    Linear model + sparsity: y ≈ Σ_{q=1}^Q x_q M(θ_q^d) = M x on a grid θ_q^d = qΔθ, with x sparse: Q = {q | x_q ≠ 0}, θ = {θ_q^d}_{q∈Q}, α = {x_q}_{q∈Q}, K = Card Q.
    Variable selection / subset selection / sparse regression [Miller, 02].
  9. Exactness may be worth it… A natural formulation for many problems:

    P2/0: min_{x∈R^P} (1/2)‖y − Ax‖₂² s.t. ‖x‖₀ ≤ K;   P0/2: min_{x∈R^P} ‖x‖₀ s.t. (1/2)‖y − Ax‖₂² ≤ ε;   P2+0: min_{x∈R^P} (1/2)‖y − Ax‖₂² + λ‖x‖₀.

    Global optimum → better solution [Bertsimas et al., 16; Bourguignon et al., 16].
    [Figure, results taken from [Bourguignon et al., 16]: data and truth, ‖y − Hx̊‖₂² = 1.62; OMP, 6.07; ℓ1 relaxation, 2.36; SBR, 2.22; global optimum, 1.43.]
  10–11. …but exactness has a price. [Illustration slides.]
  12. …but exactness has a price. NP-hard: ‖x‖₀ ≤ K gives C(P, K) possible combinations… in the worst-case scenario! Branch-and-Bound: eliminate (hopefully huge) sets of possible combinations without resorting to their evaluation. Small-data problems: one-dimensional problems (deconvolution, time-series spectral analysis, spectral unmixing, …), variable/subset selection in statistics.
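For intuition on the C(P, K) combinatorics, a brute-force solver for P2/0 enumerates every support of size at most K and keeps the best least-squares fit; this exhaustive search is exactly what branch-and-bound aims to avoid. A sketch with illustrative names, not code from the talk:

```python
import itertools
import numpy as np

def best_k_subset_ls(y, A, K):
    """Exhaustive solver for min_x 0.5*||y - A x||^2  s.t.  ||x||_0 <= K.
    Enumerates all supports of size <= K: O(C(P, K)) least-squares fits."""
    P = A.shape[1]
    best_val, best_x = 0.5 * y @ y, np.zeros(P)  # empty-support baseline
    for k in range(1, K + 1):
        for support in itertools.combinations(range(P), k):
            AS = A[:, support]
            xs, *_ = np.linalg.lstsq(AS, y, rcond=None)
            r = y - AS @ xs
            val = 0.5 * r @ r
            if val < best_val:
                best_val = val
                best_x = np.zeros(P)
                best_x[list(support)] = xs
            # one LS solve per support: cost explodes with C(P, K)
    return best_x, best_val
```

For P = 100 and K = 9 this already means roughly 1.9 × 10¹² subsets, the figure quoted later in the talk.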
  13. Mixed Integer Programming (MIP) reformulation (see [Bienstock 96, Bertsimas et al., 16, Bourguignon et al., 16]). Big-M assumption: ∀p, |x_p| ≤ M. Then:

    min_{x∈R^P} ‖y − Ax‖₂² s.t. ‖x‖₀ ≤ K, ∀p |x_p| ≤ M   ⇔   min_{x∈R^P, b∈{0,1}^P} ‖y − Ax‖₂² s.t. Σ_p b_p ≤ K, ∀p |x_p| ≤ M b_p.

    Can be addressed by MIP solvers (CPLEX, GUROBI, …), but computation time ↑ / limited to small sizes. Here: no need for the MIP reformulation nor binary variables; a specific Branch-and-Bound construction for problems P2/0, P0/2 and P2+0.
  14. Specificity of ℓ0-norm problems as a very particular MIP.

    P2/0: min_{x∈R^P, b∈{0,1}^P} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ K, ∀p −M b_p ≤ x_p ≤ M b_p.

    Binary variables appear in a simple (separable) sum; one binary variable ↔ one continuous variable; no quadratic coupling mixing x and b, nor between b_i and b_j; cost function in x / constraints in b (or conversely); finally, no need for binary variables!
    [Bienstock, 96]: many ideas, few results. [Jokar & Pfetsch, 08]: MIP mentioned, but too heavy. [Bertsimas & Shioda, 09]: dedicated MIP solver (#x ≤ #y). [Karahanoglu et al., 13]: noise-free problems (y = Ax). [Hazimeh et al., 20]: ℓ0 + ℓ2 penalization.
  15–18. Branch-and-Bound resolution [Land & Doig, 60]. Decision tree on the binary variables. At each node, a lower bound on all subproblems contained in this node is computed, the remaining binary variables being relaxed into [0, 1]. If this bound exceeds the best known solution, the branch is pruned.

    [Tree: P(0) splits on b_{p0} into P(1) and P(4); P(1) splits on b_{p1} into P(2) and P(3); P(4) splits on b_{p4} into P(5) and P(6).]

    Which variable b_p to branch on? → the highest relaxed variable. Which side to explore first? → b_p = 1. Which node to explore first? → depth-first search. Computation of the relaxed solutions? → related to ℓ1-norm optimization…
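The slide's strategy (branch on the largest relaxed variable, explore the b_p = 1 side first, depth-first) can be sketched in a toy branch-and-bound for P2/0. One simplification is an assumption of this sketch, not the talk's method: the node bound is a plain least-squares fit over the columns not yet forced to zero, a weaker but still valid relaxation than the ℓ1-based one derived next.

```python
import numpy as np

def bb_card_ls(y, A, K):
    """Toy branch-and-bound for min_x 0.5*||y - A x||^2 s.t. ||x||_0 <= K.
    Node bound: unconstrained LS on the still-allowed columns (a valid
    lower bound, since every subproblem of the node is more constrained)."""
    P = A.shape[1]
    best = {"val": np.inf, "x": np.zeros(P)}

    def ls_on(cols):
        """Least squares restricted to the given columns."""
        if not cols:
            return 0.5 * y @ y, np.zeros(0)
        xs, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        r = y - A[:, cols] @ xs
        return 0.5 * r @ r, xs

    def explore(S1, S0):          # S1: b_p = 1 fixed, S0: b_p = 0 fixed
        free = [p for p in range(P) if p not in S1 and p not in S0]
        allowed = S1 + free
        lb, xs = ls_on(allowed)
        if lb >= best["val"]:
            return                      # prune this branch
        if len(S1) == K or not free:    # support fully decided: leaf
            val, x_sup = ls_on(S1)
            if val < best["val"]:
                x = np.zeros(P)
                x[S1] = x_sup
                best["val"], best["x"] = val, x
            return
        mag = dict(zip(allowed, np.abs(xs)))
        p = max(free, key=mag.get)      # branch on largest relaxed variable
        explore(S1 + [p], S0)           # b_p = 1 side first
        explore(S1, S0 + [p])           # then b_p = 0 (depth-first)

    explore([], [])
    return best["x"], best["val"]
```

The tighter the node bound, the more branches get pruned, which is why the talk invests in the ℓ1-based relaxation below.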
  19. MIP continuous relaxation and the ℓ1 norm.

    P2/0: min_{x∈R^P, b∈{0,1}^P} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ K, ∀p −M b_p ≤ x_p ≤ M b_p.
    R2/0: the same problem with b relaxed into [0, 1]^P.

    We have: min R2/0 = min_{x∈R^P} (1/2)‖y − Ax‖² s.t. Σ_p |x_p| ≤ MK and ∀p |x_p| ≤ M.
    (Proof sketch: for a solution (x*, b*) of R2/0, |x*_p| = M b*_p …)
  20. Continuous relaxation within the branch-and-bound procedure At a given node:

    bS0 = 0 and xS0 = 0 bS1 = 1 and |xS1 | ≤ M bS free and |xS | ≤ MbS p bp = Card S1 + p∈S bp P(0) P(1) P(4) bp0 = 1 bp0 = 0 P(2) P(3) P(5) P(6) bp1 = 1 bp1 = 0 bp4 = 1 bp4 = 0 The relaxed problem at node i reads equivalently: R(i) 2/0 : min xS , xS1 1 2 y − AS xS − AS1 xS1 2 2 s.t.      xS 1 ≤ M(K − Card S1 ) xS ∞ ≤ M xS1 ∞ ≤ M Least squares, 1 norm (partially) and box constraints. No binary variables! S. Bourguignon Exact 0-norm optimization 14 / 23
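Each node relaxation is thus a least-squares problem under an ℓ1 (and box) constraint. A standard building block for attacking such problems by projected gradient is the Euclidean projection onto the ℓ1 ball; the sort-based routine below is the classical algorithm, shown purely as illustrative context (the talk uses homotopy continuation instead, and the box constraint is ignored in this sketch):

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of v onto {x : ||x||_1 <= tau}.
    Classical sort-and-threshold algorithm, O(P log P)."""
    if np.sum(np.abs(v)) <= tau:
        return v.copy()                      # already feasible
    u = np.sort(np.abs(v))[::-1]             # sorted magnitudes
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)   # optimal soft threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```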
  21. Optimization with (partial) ℓ1 norm and box constraints: the homotopy continuation principle.

    Standard case [Osborne et al., 00]: min_x (1/2)‖y − Ax‖₂² + λ‖x‖₁.
    With free variables and box constraints: min_{x_S̄, x_{S1}} (1/2)‖y − A_S̄ x_S̄ − A_{S1} x_{S1}‖₂² + λ‖x_S̄‖₁ s.t. ‖x_S̄‖∞ ≤ M, ‖x_{S1}‖∞ ≤ M.

    [Figure: piecewise-linear solution paths λ ↦ x*(λ) with breakpoints λ(0), λ(1), …, λ*, in both cases; box constraints saturate components at ±M.]
  22. Homotopy continuation similarly solves the relaxations for the sparsity-constrained problem:

    R(i)2/0: min_{x_S̄, x_{S1}} (1/2)‖y − A_S̄ x_S̄ − A_{S1} x_{S1}‖₂² s.t. ‖x_S̄‖₁ ≤ τ, ‖x_S̄‖∞ ≤ M, ‖x_{S1}‖∞ ≤ M,

    and for the error-constrained problem:

    R(i)0/2: min_{x_S̄, x_{S1}} ‖x_S̄‖₁ s.t. (1/2)‖y − A_S̄ x_S̄ − A_{S1} x_{S1}‖₂² ≤ ε, ‖x_S̄‖∞ ≤ M, ‖x_{S1}‖∞ ≤ M.

    [Figure: Pareto curve of (1/2)‖y − A_{S1} x*_{S1} − A_S̄ x*_S̄‖² versus ‖x*_S̄‖₁, with breakpoints λ(0), …, λ(4) and λ*.]
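The homotopy principle is easiest to see in the special case of orthonormal A (AᵀA = I), where the penalized solution is an explicit soft-thresholding of Aᵀy and the path λ ↦ x^λ is visibly piecewise linear. This closed form is a simplification for illustration only; the general case needs the active-set updates of [Osborne et al., 00]:

```python
import numpy as np

def l1_path_orthonormal(y, A, lambdas):
    """Solutions of min_x 0.5*||y - A x||^2 + lam*||x||_1 for orthonormal A:
    x_lam = soft(A^T y, lam), a piecewise-linear function of lam."""
    c = A.T @ y
    return [np.sign(c) * np.maximum(np.abs(c) - lam, 0.0) for lam in lambdas]
```

Each component enters or leaves the support at its own breakpoint λ = |c_i|, which is exactly the breakpoint structure sketched on the slide.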
  23. Branch-and-Bound: example (P = 5, K = 2).

    y = Ax + ε with, as printed on the slide, the data vector (0.0489, 8.1035, 8.0727, −2.0303)ᵀ, x = (2, 0, 0, 0, −2)ᵀ, and the 4×5 matrix
    A = [ 1 5 2 −1 1 ; 3 4 5 0 −1 ; 2 1 4 2 −2 ; −1 0 −2 1 0 ] (rows separated by semicolons).

    MIP formulation: min_{x∈R⁵, b∈{0,1}⁵} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ 2, −Mb ≤ x ≤ Mb, with M = 5.
  24–45. Branch-and-Bound: example (trace). At each node, the relaxation R: min_{x∈R⁵, b∈[0,1]⁵} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ 2, −Mb ≤ x ≤ Mb is solved with the already-branched b_p fixed; f denotes its optimal value.

    P(0) (root): f = 0, x = (2.2, 0.16, 0.3, −0.072, −2.2)ᵀ, b = (0.58, 0.24, 0.27, 0.24, 0.56)ᵀ → branch on b₁.
    P(1) (b₁ = 1): f = 0, x = (3.1, −0.05, 0.14, 0.51, −1.2)ᵀ, b = (1, 0.25, 0.23, 0.2, 0.32)ᵀ → branch on b₅.
    P(2) (b₁ = 1, b₅ = 1): f = 0.07, x = (3.1, 0, 0, 0, −1.9)ᵀ, b = (1, 0, 0, 0, 1)ᵀ — binary-feasible: best known solution.
    P(3) (b₁ = 1, b₅ = 0): f = 0, x = (4.2, −0.29, −0.041, 1.2, 0)ᵀ, b = (1, 0.33, 0.3, 0.36, 0)ᵀ → branch on b₄.
    P(4) (b₁ = 1, b₅ = 0, b₄ = 1): f = 0.527, x = (3.7, 0, 0, 1.4, 0)ᵀ, b = (1, 0, 0, 1, 0)ᵀ — binary-feasible but worse than 0.07.
    P(5) (b₁ = 1, b₅ = 0, b₄ = 0): f = 2.52 > 0.07 → pruned.
    P(6) (b₁ = 0): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)ᵀ, b = (0, 0.24, 0.27, 0.41, 0.99)ᵀ → branch on b₅.
    P(7) (b₁ = 0, b₅ = 1): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)ᵀ, b = (0, 0.31, 0.27, 0.42, 1)ᵀ → branch on b₄.
    P(8) (b₁ = 0, b₅ = 1, b₄ = 1): f = 37 > 0.07 → pruned.
    P(9) (b₁ = 0, b₅ = 1, b₄ = 0): f = 0.327 > 0.07 → pruned.
    P(10) (b₁ = 0, b₅ = 0): f = 0.992 > 0.07 → pruned.

    Optimal solution: node P(2), f = 0.07, x = (3.1, 0, 0, 0, −1.9)ᵀ, b = (1, 0, 0, 0, 1)ᵀ.
  46. Sparse deconvolution: computing time. Implementation in C++, UNIX desktop machine (single core). N = 120, P = 100, SNR = 10 dB, 50 instances.

                          Branch-and-Bound               MIP, CPLEX 12.8
    Problem         Time (s)  Nodes (×10³)  Failed    Time (s)  Nodes (×10³)  Failed
    P2/0   K = 5       0.7       1.28          0         3.0       1.71          0
           K = 7      11.6      17.89          0        16.6      21.51          0
           K = 9      43.5      57.37          9        53.8      72.04          6
    P0/2   K = 5       0.1       0.21          0        25.7       6.71          0
           K = 7       0.9       2.32          0       114.8      49.54          2
           K = 9       2.5       5.22          0       328.2     101.07         17
    P2+0   K = 5       1.8       2.01          0         3.2       1.98          0
           K = 7       7.3      10.20          0         7.4       9.61          0
           K = 9      25.6      31.80          5        17.3      23.74          2

    Remark: C(100, 5) ≈ 7.5 × 10⁷, C(100, 7) ≈ 1.6 × 10¹⁰, C(100, 9) ≈ 1.9 × 10¹².
    Competitive exploration strategy (despite being simple); efficient node evaluation (especially for small K); performance ≥ CPLEX, and ≫ CPLEX for P0/2.
  47. Subset selection problems (framework of [Bertsimas et al., 16]). N = 500, P = 1,000, SNR = 8 dB, 50 instances; rows a_n ~ N(0, Σ) with Σ_ij = ρ^|i−j|; y = Ax⁰ + ε, with equispaced non-zero values in x⁰ equal to 1.

    [Figure: exact support recovery (%) versus sparsity degree K, for ρ = 0.8 and ρ = 0.95.]

    Better estimation than standard methods (even with limited computing time); performance ≫ CPLEX.
  48. Subset selection: computing times. N = 500, P = 1,000, SNR = 8 dB, ρ = 0.8, 50 instances.

                           Branch-and-Bound                           CPLEX
    Problem         Time (s)  Nodes (×10³)  T/N (s)  Failed    Time (s)  Nodes (×10³)  T/N (s)  Failed
    P2/0   K = 5       0.3       0.01        0.02       0        80.0       0.01        6.07       0
           K = 9       7.4       0.15        0.05       0       265.8       0.18        1.45       0
           K = 13     54.6       0.69        0.08       6       786.0       0.84        0.93      46
           K = 17    624.4       5.82        0.11      43          -          -           -       50
    P0/2   K = 5       0.2       0.01        0.02       1       495.7       0.32        1.54      11
           K = 9      16.0       0.21        0.08       0          -          -           -       50
           K = 13    311.5       2.32        0.13      12          -          -           -       50
           K = 17    694.3       4.90        0.14      47          -          -           -       50
    P2+0   K = 5       0.3       0.03        0.01       0       235.3       0.04        6.00       0
           K = 9      14.3       0.95        0.02       0       549.1       1.29        0.42       4
           K = 13    855.3      34.52        0.02      32       982.6       3.61        0.27      49
           K = 17    996.9      42.49        0.02      41          -          -           -       50

    CPLEX performance degrades sharply. Resolution guaranteed up to K ≈ 15, and C(1000, 15) ≈ 10³²…
  49–52. Conclusions. Exact ℓ0 solutions may be worth it. The specific Branch-and-Bound method ≫ generic MIP resolution, for cardinality-constrained, error-constrained and penalized problems. Most of the computing time goes into proving optimality → partial exploration already yields competitive solutions. Perspectives: alternate formulations to big-M (SOS, set covering, …); more sophisticated branching rules, local heuristics, cutting planes, …; improved relaxations → non-convex ones! [Figure: 1_{x≠0} and the surrogates |x|, |x|^p, CEL0(x).]
  53. Thank you. ANR JCJC program, 2016-2021, MIMOSA: Mixed Integer programming Methods for Optimization of Sparse Approximation criteria.

    S. Bourguignon, J. Ninin, H. Carfantan and M. Mongeau. IEEE Transactions on Signal Processing, 2016.
    R. Ben Mhenni, S. Bourguignon and J. Ninin. Optimization Methods and Software, submitted.
    Matlab and C codes available upon request (a Git repository is coming soon!)
  54. Homotopy continuation principle [Osborne et al., 00, Efron et al., 04]. Consider the penalized formulation:

    x^λ = arg min_x J(x) := (1/2)‖y − Ax‖² + λ‖x‖₁.

    x^λ minimizes J if and only if 0 ∈ ∂J(x^λ), with ∂J(x^λ) = −Aᵀ(y − Ax^λ) + λz, z ∈ R^Q, where z_i = sgn(x^λ_i) if x^λ_i ≠ 0 and z_i ∈ [−1, 1] if x^λ_i = 0.
    With NZ and Z indexing the nonzero and the zero components of x^λ, this reads:
    |A_Zᵀ (y − A_NZ x^λ_NZ)| ≤ λ   and   −A_NZᵀ (y − A_NZ x^λ_NZ) + λ sgn(x^λ_NZ) = 0.
    If λ ≥ λmax := ‖Aᵀy‖∞, then x^λ = 0. The non-zero components satisfy
    x^λ_NZ = (A_NZᵀ A_NZ)⁻¹ (A_NZᵀ y − λ sgn(x^λ_NZ)).
    The solution path λ ↦ x^λ is a piecewise-linear function of λ: it is linear in λ as long as the support configuration {NZ, Z, sgn(x^λ_NZ)} does not change.
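The threshold property λ ≥ λmax := ‖Aᵀy‖∞ ⇒ x^λ = 0 can be checked numerically with a basic proximal-gradient (ISTA) loop; the solver and the random data below are illustrative stand-ins, not part of the homotopy method:

```python
import numpy as np

def ista(y, A, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x + A.T @ (y - A @ x) / L          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # prox (soft threshold)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
y = rng.standard_normal(20)
lam_max = np.max(np.abs(A.T @ y))              # the slide's lambda_max = ||A^T y||_inf
assert np.allclose(ista(y, A, 1.01 * lam_max), 0.0)    # path starts at x = 0
assert np.max(np.abs(ista(y, A, 0.5 * lam_max))) > 0   # support appears below lambda_max
```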