
Sébastien Bourguignon

(École Centrale de Nantes, LS2N, SIMS team)

https://s3-seminar.github.io/seminars/sebastien-bourguignon/

Title — Global optimization for l0-norm-based sparse solutions to least squares problems

Abstract — Finding solutions to least-squares problems with low cardinality (with small l0 “norm”) has found many applications, including sparsity-enhancing inverse problems in signal processing, subset selection in statistics, and cardinality-constrained portfolio optimization. This problem is NP-hard, so most works propose suboptimal (continuous-relaxation or greedy) methods that are able to address high-dimensional problems. On the other hand, several recent works have studied the global optimization of l0-norm problems through the mixed integer programming (MIP) reformulation of the l0 norm. Resolution is performed via MIP solvers, which essentially implement branch-and-bound methods.

We propose branch-and-bound algorithms specifically built for three possible formulations: cardinality-constrained and cardinality-penalized least squares, and cardinality minimization under a limited error constraint. A specific tree exploration strategy is built, inspired by forward greedy methods in stepwise regression. We then show that the relaxation problems involved at each node of the search tree can be reformulated as l1-norm optimization problems. We propose a dedicated algorithm for such problems, based on the homotopy continuation principle. Branch-and-bound algorithms are then constructed which are guaranteed to find the optimal subset, without resorting to MIP reformulations or binary variables.

The obtained certified solutions are shown to better estimate sparsity patterns than standard methods on simulated problems involving 1,000 unknowns and up to a few tens of non-zero components. For problems with higher complexity, unguaranteed solutions obtained by limiting the computing time are also shown to provide more satisfactory estimates. The resulting global optimization procedure is finally shown to strongly improve over the CPLEX MIP solver in terms of computing time.

Joint work with Ramzi Ben Mhenni and Jordan Ninin.

Biography — Sébastien Bourguignon has been an assistant professor (HDR) at École Centrale de Nantes since 2011. His research focuses on the design of signal and image processing algorithms for the resolution of inverse problems encountered in various fields of science and engineering, including ultrasonic, microwave, and hyperspectral imaging, in particular for non-destructive testing, planetary science, and astronomy. He is particularly interested in exploiting sparsity-based information, both in the physical modeling of data and in the related computational aspects. He is currently the coordinator of the MIMOSA “young researchers” project (Mixed Integer programming Methods for Optimization of Sparse Approximation criteria), funded by the French national research agency ANR (2017-2021).


S³ Seminar

April 09, 2021

Transcript

  1. Exact optimization for ℓ0-norm sparse solutions to least-squares problems

    Sébastien Bourguignon, Laboratoire des Sciences du Numérique de Nantes (LS2N), École Centrale de Nantes. S3: Séminaire Signal @ Paris-Saclay, April 9th, 2021. Joint work with Ramzi Ben Mhenni (LS2N-ECN, now LITIS, Université de Rouen), Jordan Ninin (Lab-STICC / ENSTA Bretagne), Marcel Mongeau (Université de Toulouse, ENAC), and Hervé Carfantan (Université de Toulouse, IRAP).
  2. Outline — 1 Why? 2 Who? 3 How? 4 Where?

    S. Bourguignon, Exact ℓ0-norm optimization
  3. Outline — 1 Why? Exact solutions to ℓ0-norm problems may achieve better estimates. 2 Who? Small- to moderate-size sparse problems can be solved exactly. 3 How? A dedicated Branch-and-Bound strategy. 4 Where? Directions for further work.
  4. Exactness: exact criterion, exact optimization. The true, unrelaxed ℓ0-“norm” criterion:

    ‖x‖₁ = Σ_p |x_p|,   ‖x‖_q^q = Σ_p |x_p|^q,   ‖x‖₀ := Card{p | x_p ≠ 0}.

    [Figure: some sparsity-enhancing functions Σ_p φ(|x_p|) and their unit balls.]
    Global optimization: optimality is guaranteed by the algorithm. (Footnote, a French pun: “On (re)lâche rien!” — we don’t relax anything.)
  5. Low-cardinality least-squares problems.

    Cardinality-constrained — P2/0: min_{x∈R^P} (1/2)‖y − Ax‖² s.t. Card{p | x_p ≠ 0} ≤ K. (CC portfolio optimization, subset selection, …)
    Error-constrained — P0/2: min_{x∈R^P} Card{p | x_p ≠ 0} s.t. (1/2)‖y − Ax‖² ≤ ε. (Sparse approximation, subset selection, …)
    Penalized / scalarized — P2+0: min_{x∈R^P} (1/2)‖y − Ax‖² + λ Card{p | x_p ≠ 0}. (Inverse problems, compressed sensing, subset selection, …)
    Card{p | x_p ≠ 0} = ‖x‖₀: the ℓ0 “norm”.
  6. Some sparse inverse problems in signal processing. Deconvolution of spike trains (seismology [Mendel 83], non-destructive testing [Zala 92]): reflectivity sequence x, instrument response h, data y = h ∗ x + noise. Atoms h(t − nT_s), n = 1, …, N.

    x̂ = arg min_x (1/2)‖y − Ax‖² + λ‖x‖₀   or   x̂ ∈ Arg min_x ‖x‖₀ s.t. (1/2)‖y − Ax‖² ≤ ε.
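The convolution model y = h ∗ x + noise becomes the linear model y = Ax by stacking shifted copies of h as columns of A. A minimal numpy sketch; the function name and sizes are illustrative, not code from the talk:

```python
import numpy as np

def convolution_matrix(h, N):
    """Build A such that A @ x equals np.convolve(h, x) for any x of
    length N: column n of A holds h shifted down by n samples."""
    h = np.asarray(h, dtype=float)
    L = len(h)
    A = np.zeros((N + L - 1, N))
    for n in range(N):
        A[n:n + L, n] = h
    return A
```

With A in hand, the spike positions are exactly the support of the sparse solution x of the ℓ0 problems above.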
  7. Some sparse inverse problems in signal processing. Spectral unmixing (hyperspectral imaging) [Iordache et al., 11]:

    â = arg min_{a ≥ 0, Σ_i a_i = 1} (1/2)‖s − Sa‖² s.t. ‖a‖₀ ≤ K   or   â ∈ Arg min_{a ≥ 0, Σ_i a_i = 1} ‖a‖₀ s.t. (1/2)‖s − Sa‖² ≤ ε.

    [Figure: reference atoms s_ref(λ) and mixed pixels, e.g. s₁(λ) = a_soil s_soil(λ) + a_rock s_rock(λ), s₂(λ) = a_water s_water(λ), s₃(λ) = a_tree s_tree(λ) + a_soil s_soil(λ).]
  8. Some sparse inverse problems in signal processing. Identification of non-linear models [Tang et al., 13].

    Non-linear model: y ≈ Σ_{k=1}^K α_k M(θ_k) → find θ, α and K? (e.g., sparse spectral analysis.)
    Linear model + sparsity: y ≈ Σ_{q=1}^Q x_q M(θ_q^d) = M x on a grid θ_q^d = qΔθ, with x sparse: Q = {q | x_q ≠ 0}, θ = {θ_q^d}_{q∈Q}, α = {x_q}_{q∈Q}, K = Card Q.
    Variable selection / subset selection / sparse regression [Miller, 02].
  9. Exactness may be worth it… A natural formulation for many problems:

    P2/0: min_{x∈R^P} (1/2)‖y − Ax‖₂² s.t. ‖x‖₀ ≤ K;   P0/2: min_{x∈R^P} ‖x‖₀ s.t. (1/2)‖y − Ax‖₂² ≤ ε;   P2+0: min_{x∈R^P} (1/2)‖y − Ax‖₂² + λ‖x‖₀.

    Global optimum → better solution [Bertsimas et al., 16; Bourguignon et al., 16].
    [Figure, results taken from [Bourguignon et al., 16]: data and truth, ‖y − Hx̊‖₂² = 1.62; OMP, 6.07; ℓ1 relaxation, 2.36; SBR, 2.22; global optimum, 1.43.]
  10–11. …but exactness has a price. [Illustration slides.]
  12. …but exactness has a price. NP-hard: ‖x‖₀ ≤ K gives C(P, K) possible combinations… in the worst-case scenario! Branch-and-Bound: eliminate (hopefully huge) sets of possible combinations without resorting to their evaluation. Small-data problems: one-dimensional problems (deconvolution, time-series spectral analysis, spectral unmixing, …), variable/subset selection in statistics.
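For intuition on the C(P, K) combinatorics, a brute-force solver for P2/0 enumerates every support of size at most K and keeps the best least-squares fit; this exhaustive search is exactly what branch-and-bound aims to avoid. A sketch with illustrative names, not code from the talk:

```python
import itertools
import numpy as np

def best_k_subset_ls(y, A, K):
    """Exhaustive solver for min_x 0.5*||y - A x||^2  s.t.  ||x||_0 <= K.
    Enumerates all supports of size <= K: O(C(P, K)) least-squares fits."""
    P = A.shape[1]
    best_val, best_x = 0.5 * y @ y, np.zeros(P)  # empty-support baseline
    for k in range(1, K + 1):
        for support in itertools.combinations(range(P), k):
            AS = A[:, support]
            xs, *_ = np.linalg.lstsq(AS, y, rcond=None)
            r = y - AS @ xs
            val = 0.5 * r @ r
            if val < best_val:
                best_val = val
                best_x = np.zeros(P)
                best_x[list(support)] = xs
            # one LS solve per support: cost explodes with C(P, K)
    return best_x, best_val
```

For P = 100 and K = 9 this already means roughly 1.9 × 10¹² subsets, the figure quoted later in the talk.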
  13. Mixed Integer Programming (MIP) reformulation (see [Bienstock 96, Bertsimas et al., 16, Bourguignon et al., 16]). Big-M assumption: ∀p, |x_p| ≤ M. Then:

    min_{x∈R^P} ‖y − Ax‖₂² s.t. ‖x‖₀ ≤ K, ∀p |x_p| ≤ M   ⇔   min_{x∈R^P, b∈{0,1}^P} ‖y − Ax‖₂² s.t. Σ_p b_p ≤ K, ∀p |x_p| ≤ M b_p.

    Can be addressed by MIP solvers (CPLEX, GUROBI, …), but computation time ↑ / limited to small sizes. Here: no need for the MIP reformulation nor binary variables; a specific Branch-and-Bound construction for problems P2/0, P0/2 and P2+0.
  14. Specificity of ℓ0-norm problems as a very particular MIP.

    P2/0: min_{x∈R^P, b∈{0,1}^P} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ K, ∀p −M b_p ≤ x_p ≤ M b_p.

    Binary variables appear in a simple (separable) sum; one binary variable ↔ one continuous variable; no quadratic coupling mixing x and b, nor between b_i and b_j; cost function in x / constraints in b (or conversely); finally, no need for binary variables!
    [Bienstock, 96]: many ideas, few results. [Jokar & Pfetsch, 08]: MIP mentioned, but too heavy. [Bertsimas & Shioda, 09]: dedicated MIP solver (#x ≤ #y). [Karahanoglu et al., 13]: noise-free problems (y = Ax). [Hazimeh et al., 20]: ℓ0 + ℓ2 penalization.
  15–18. Branch-and-Bound resolution [Land & Doig, 60]. Decision tree on the binary variables. At each node, a lower bound on all subproblems contained in this node is computed, the remaining binary variables being relaxed into [0, 1]. If this bound exceeds the best known solution, the branch is pruned.

    [Tree: P(0) splits on b_{p0} into P(1) and P(4); P(1) splits on b_{p1} into P(2) and P(3); P(4) splits on b_{p4} into P(5) and P(6).]

    Which variable b_p to branch on? → the highest relaxed variable. Which side to explore first? → b_p = 1. Which node to explore first? → depth-first search. Computation of the relaxed solutions? → related to ℓ1-norm optimization…
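The slide's strategy (branch on the largest relaxed variable, explore the b_p = 1 side first, depth-first) can be sketched in a toy branch-and-bound for P2/0. One simplification is an assumption of this sketch, not the talk's method: the node bound is a plain least-squares fit over the columns not yet forced to zero, a weaker but still valid relaxation than the ℓ1-based one derived next.

```python
import numpy as np

def bb_card_ls(y, A, K):
    """Toy branch-and-bound for min_x 0.5*||y - A x||^2 s.t. ||x||_0 <= K.
    Node bound: unconstrained LS on the still-allowed columns (a valid
    lower bound, since every subproblem of the node is more constrained)."""
    P = A.shape[1]
    best = {"val": np.inf, "x": np.zeros(P)}

    def ls_on(cols):
        """Least squares restricted to the given columns."""
        if not cols:
            return 0.5 * y @ y, np.zeros(0)
        xs, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        r = y - A[:, cols] @ xs
        return 0.5 * r @ r, xs

    def explore(S1, S0):          # S1: b_p = 1 fixed, S0: b_p = 0 fixed
        free = [p for p in range(P) if p not in S1 and p not in S0]
        allowed = S1 + free
        lb, xs = ls_on(allowed)
        if lb >= best["val"]:
            return                      # prune this branch
        if len(S1) == K or not free:    # support fully decided: leaf
            val, x_sup = ls_on(S1)
            if val < best["val"]:
                x = np.zeros(P)
                x[S1] = x_sup
                best["val"], best["x"] = val, x
            return
        mag = dict(zip(allowed, np.abs(xs)))
        p = max(free, key=mag.get)      # branch on largest relaxed variable
        explore(S1 + [p], S0)           # b_p = 1 side first
        explore(S1, S0 + [p])           # then b_p = 0 (depth-first)

    explore([], [])
    return best["x"], best["val"]
```

The tighter the node bound, the more branches get pruned, which is why the talk invests in the ℓ1-based relaxation below.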
  19. MIP continuous relaxation and the ℓ1 norm.

    P2/0: min_{x∈R^P, b∈{0,1}^P} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ K, ∀p −M b_p ≤ x_p ≤ M b_p.
    R2/0: the same problem with b relaxed into [0, 1]^P.

    We have: min R2/0 = min_{x∈R^P} (1/2)‖y − Ax‖² s.t. Σ_p |x_p| ≤ MK and ∀p |x_p| ≤ M.
    (Proof sketch: for a solution (x*, b*) of R2/0, |x*_p| = M b*_p …)
  20. Continuous relaxation within the branch-and-bound procedure At a given node:

    bS0 = 0 and xS0 = 0 bS1 = 1 and |xS1 | ≤ M bS free and |xS | ≤ MbS p bp = Card S1 + p∈S bp P(0) P(1) P(4) bp0 = 1 bp0 = 0 P(2) P(3) P(5) P(6) bp1 = 1 bp1 = 0 bp4 = 1 bp4 = 0 The relaxed problem at node i reads equivalently: R(i) 2/0 : min xS , xS1 1 2 y − AS xS − AS1 xS1 2 2 s.t.      xS 1 ≤ M(K − Card S1 ) xS ∞ ≤ M xS1 ∞ ≤ M Least squares, 1 norm (partially) and box constraints. No binary variables! S. Bourguignon Exact 0-norm optimization 14 / 23
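Each node relaxation is thus a least-squares problem under an ℓ1 (and box) constraint. A standard building block for attacking such problems by projected gradient is the Euclidean projection onto the ℓ1 ball; the sort-based routine below is the classical algorithm, shown purely as illustrative context (the talk uses homotopy continuation instead, and the box constraint is ignored in this sketch):

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of v onto {x : ||x||_1 <= tau}.
    Classical sort-and-threshold algorithm, O(P log P)."""
    if np.sum(np.abs(v)) <= tau:
        return v.copy()                      # already feasible
    u = np.sort(np.abs(v))[::-1]             # sorted magnitudes
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)   # optimal soft threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```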
  21. Optimization with (partial) ℓ1 norm and box constraints: the homotopy continuation principle.

    Standard case [Osborne et al., 00]: min_x (1/2)‖y − Ax‖₂² + λ‖x‖₁.
    With free variables and box constraints: min_{x_S̄, x_{S1}} (1/2)‖y − A_S̄ x_S̄ − A_{S1} x_{S1}‖₂² + λ‖x_S̄‖₁ s.t. ‖x_S̄‖∞ ≤ M, ‖x_{S1}‖∞ ≤ M.

    [Figure: piecewise-linear solution paths λ ↦ x*(λ) with breakpoints λ(0), λ(1), …, λ*, in both cases; box constraints saturate components at ±M.]
  22. Homotopy continuation similarly solves the relaxations for the sparsity-constrained problem:

    R(i)2/0: min_{x_S̄, x_{S1}} (1/2)‖y − A_S̄ x_S̄ − A_{S1} x_{S1}‖₂² s.t. ‖x_S̄‖₁ ≤ τ, ‖x_S̄‖∞ ≤ M, ‖x_{S1}‖∞ ≤ M,

    and for the error-constrained problem:

    R(i)0/2: min_{x_S̄, x_{S1}} ‖x_S̄‖₁ s.t. (1/2)‖y − A_S̄ x_S̄ − A_{S1} x_{S1}‖₂² ≤ ε, ‖x_S̄‖∞ ≤ M, ‖x_{S1}‖∞ ≤ M.

    [Figure: Pareto curve of (1/2)‖y − A_{S1} x*_{S1} − A_S̄ x*_S̄‖² versus ‖x*_S̄‖₁, with breakpoints λ(0), …, λ(4) and λ*.]
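The homotopy principle is easiest to see in the special case of orthonormal A (AᵀA = I), where the penalized solution is an explicit soft-thresholding of Aᵀy and the path λ ↦ x^λ is visibly piecewise linear. This closed form is a simplification for illustration only; the general case needs the active-set updates of [Osborne et al., 00]:

```python
import numpy as np

def l1_path_orthonormal(y, A, lambdas):
    """Solutions of min_x 0.5*||y - A x||^2 + lam*||x||_1 for orthonormal A:
    x_lam = soft(A^T y, lam), a piecewise-linear function of lam."""
    c = A.T @ y
    return [np.sign(c) * np.maximum(np.abs(c) - lam, 0.0) for lam in lambdas]
```

Each component enters or leaves the support at its own breakpoint λ = |c_i|, which is exactly the breakpoint structure sketched on the slide.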
  23. Branch-and-Bound: example (P = 5, K = 2).

    y = Ax + ε with, as printed on the slide, the data vector (0.0489, 8.1035, 8.0727, −2.0303)ᵀ, x = (2, 0, 0, 0, −2)ᵀ, and the 4×5 matrix
    A = [ 1 5 2 −1 1 ; 3 4 5 0 −1 ; 2 1 4 2 −2 ; −1 0 −2 1 0 ] (rows separated by semicolons).

    MIP formulation: min_{x∈R⁵, b∈{0,1}⁵} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ 2, −Mb ≤ x ≤ Mb, with M = 5.
  24–45. Branch-and-Bound: example (trace). At each node, the relaxation R: min_{x∈R⁵, b∈[0,1]⁵} (1/2)‖y − Ax‖² s.t. Σ_p b_p ≤ 2, −Mb ≤ x ≤ Mb is solved with the already-branched b_p fixed; f denotes its optimal value.

    P(0) (root): f = 0, x = (2.2, 0.16, 0.3, −0.072, −2.2)ᵀ, b = (0.58, 0.24, 0.27, 0.24, 0.56)ᵀ → branch on b₁.
    P(1) (b₁ = 1): f = 0, x = (3.1, −0.05, 0.14, 0.51, −1.2)ᵀ, b = (1, 0.25, 0.23, 0.2, 0.32)ᵀ → branch on b₅.
    P(2) (b₁ = 1, b₅ = 1): f = 0.07, x = (3.1, 0, 0, 0, −1.9)ᵀ, b = (1, 0, 0, 0, 1)ᵀ — binary-feasible: best known solution.
    P(3) (b₁ = 1, b₅ = 0): f = 0, x = (4.2, −0.29, −0.041, 1.2, 0)ᵀ, b = (1, 0.33, 0.3, 0.36, 0)ᵀ → branch on b₄.
    P(4) (b₁ = 1, b₅ = 0, b₄ = 1): f = 0.527, x = (3.7, 0, 0, 1.4, 0)ᵀ, b = (1, 0, 0, 1, 0)ᵀ — binary-feasible but worse than 0.07.
    P(5) (b₁ = 1, b₅ = 0, b₄ = 0): f = 2.52 > 0.07 → pruned.
    P(6) (b₁ = 0): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)ᵀ, b = (0, 0.24, 0.27, 0.41, 0.99)ᵀ → branch on b₅.
    P(7) (b₁ = 0, b₅ = 1): f = 0, x = (0, 0.69, 0.69, −1.5, −4.9)ᵀ, b = (0, 0.31, 0.27, 0.42, 1)ᵀ → branch on b₄.
    P(8) (b₁ = 0, b₅ = 1, b₄ = 1): f = 37 > 0.07 → pruned.
    P(9) (b₁ = 0, b₅ = 1, b₄ = 0): f = 0.327 > 0.07 → pruned.
    P(10) (b₁ = 0, b₅ = 0): f = 0.992 > 0.07 → pruned.

    Optimal solution: node P(2), f = 0.07, x = (3.1, 0, 0, 0, −1.9)ᵀ, b = (1, 0, 0, 0, 1)ᵀ.
  46. Sparse deconvolution: computing time. Implementation in C++, UNIX desktop machine (single core). N = 120, P = 100, SNR = 10 dB, 50 instances.

                          Branch-and-Bound               MIP, CPLEX 12.8
    Problem         Time (s)  Nodes (×10³)  Failed    Time (s)  Nodes (×10³)  Failed
    P2/0   K = 5       0.7       1.28          0         3.0       1.71          0
           K = 7      11.6      17.89          0        16.6      21.51          0
           K = 9      43.5      57.37          9        53.8      72.04          6
    P0/2   K = 5       0.1       0.21          0        25.7       6.71          0
           K = 7       0.9       2.32          0       114.8      49.54          2
           K = 9       2.5       5.22          0       328.2     101.07         17
    P2+0   K = 5       1.8       2.01          0         3.2       1.98          0
           K = 7       7.3      10.20          0         7.4       9.61          0
           K = 9      25.6      31.80          5        17.3      23.74          2

    Remark: C(100, 5) ≈ 7.5 × 10⁷, C(100, 7) ≈ 1.6 × 10¹⁰, C(100, 9) ≈ 1.9 × 10¹².
    Competitive exploration strategy (despite being simple); efficient node evaluation (especially for small K); performance ≥ CPLEX, and ≫ CPLEX for P0/2.
  47. Subset selection problems (framework of [Bertsimas et al., 16]). N = 500, P = 1,000, SNR = 8 dB, 50 instances; rows a_n ~ N(0, Σ) with Σ_ij = ρ^|i−j|; y = Ax⁰ + ε, with equispaced non-zero values in x⁰ equal to 1.

    [Figure: exact support recovery (%) versus sparsity degree K, for ρ = 0.8 and ρ = 0.95.]

    Better estimation than standard methods (even with limited computing time); performance ≫ CPLEX.
  48. Subset selection: computing times. N = 500, P = 1,000, SNR = 8 dB, ρ = 0.8, 50 instances.

                           Branch-and-Bound                           CPLEX
    Problem         Time (s)  Nodes (×10³)  T/N (s)  Failed    Time (s)  Nodes (×10³)  T/N (s)  Failed
    P2/0   K = 5       0.3       0.01        0.02       0        80.0       0.01        6.07       0
           K = 9       7.4       0.15        0.05       0       265.8       0.18        1.45       0
           K = 13     54.6       0.69        0.08       6       786.0       0.84        0.93      46
           K = 17    624.4       5.82        0.11      43          -          -           -       50
    P0/2   K = 5       0.2       0.01        0.02       1       495.7       0.32        1.54      11
           K = 9      16.0       0.21        0.08       0          -          -           -       50
           K = 13    311.5       2.32        0.13      12          -          -           -       50
           K = 17    694.3       4.90        0.14      47          -          -           -       50
    P2+0   K = 5       0.3       0.03        0.01       0       235.3       0.04        6.00       0
           K = 9      14.3       0.95        0.02       0       549.1       1.29        0.42       4
           K = 13    855.3      34.52        0.02      32       982.6       3.61        0.27      49
           K = 17    996.9      42.49        0.02      41          -          -           -       50

    CPLEX performance degrades sharply. Resolution guaranteed up to K ≈ 15, and C(1000, 15) ≈ 10³²…
  49–52. Conclusions. Exact ℓ0 solutions may be worth it. The specific Branch-and-Bound method ≫ generic MIP resolution, for cardinality-constrained, error-constrained and penalized problems. Most of the computing time goes into proving optimality → partial exploration already yields competitive solutions. Perspectives: alternate formulations to big-M (SOS, set covering, …); more sophisticated branching rules, local heuristics, cutting planes, …; improved relaxations → non-convex ones! [Figure: 1_{x≠0} and the surrogates |x|, |x|^p, CEL0(x).]
  53. Thank you. ANR JCJC program, 2016-2021, MIMOSA: Mixed Integer programming Methods for Optimization of Sparse Approximation criteria.

    S. Bourguignon, J. Ninin, H. Carfantan and M. Mongeau. IEEE Transactions on Signal Processing, 2016.
    R. Ben Mhenni, S. Bourguignon and J. Ninin. Optimization Methods and Software, submitted.
    Matlab and C codes available upon request (a Git repository is coming soon!)
  54. Homotopy continuation principle [Osborne et al., 00, Efron et al., 04]. Consider the penalized formulation:

    x^λ = arg min_x J(x) := (1/2)‖y − Ax‖² + λ‖x‖₁.

    x^λ minimizes J if and only if 0 ∈ ∂J(x^λ), with ∂J(x^λ) = −Aᵀ(y − Ax^λ) + λz, z ∈ R^Q, where z_i = sgn(x^λ_i) if x^λ_i ≠ 0 and z_i ∈ [−1, 1] if x^λ_i = 0.
    With NZ and Z indexing the nonzero and the zero components of x^λ, this reads:
    |A_Zᵀ (y − A_NZ x^λ_NZ)| ≤ λ   and   −A_NZᵀ (y − A_NZ x^λ_NZ) + λ sgn(x^λ_NZ) = 0.
    If λ ≥ λmax := ‖Aᵀy‖∞, then x^λ = 0. The non-zero components satisfy
    x^λ_NZ = (A_NZᵀ A_NZ)⁻¹ (A_NZᵀ y − λ sgn(x^λ_NZ)).
    The solution path λ ↦ x^λ is a piecewise-linear function of λ: it is linear in λ as long as the support configuration {NZ, Z, sgn(x^λ_NZ)} does not change.
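The threshold property λ ≥ λmax := ‖Aᵀy‖∞ ⇒ x^λ = 0 can be checked numerically with a basic proximal-gradient (ISTA) loop; the solver and the random data below are illustrative stand-ins, not part of the homotopy method:

```python
import numpy as np

def ista(y, A, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x + A.T @ (y - A @ x) / L          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # prox (soft threshold)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
y = rng.standard_normal(20)
lam_max = np.max(np.abs(A.T @ y))              # the slide's lambda_max = ||A^T y||_inf
assert np.allclose(ista(y, A, 1.01 * lam_max), 0.0)    # path starts at x = 0
assert np.max(np.abs(ista(y, A, 0.5 * lam_max))) > 0   # support appears below lambda_max
```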