
# Descent dynamic(s) for multi-objective optimization

by G. Garrigos

#### GdR MOA 2015

December 02, 2015

## Transcript

1. ### Descent dynamic(s) for multi-objective optimization — Guillaume Garrigos, Hédy Attouch

Istituto Italiano di Tecnologia (Genoa, Italy) & Massachusetts Institute of Technology. Journées du GdR MOA, December 2, 2015. Journées du GdR MOA 2015 - Dijon - Guillaume Garrigos 1/19
3. ### Introduction / Motivation — Multi-objective problem

In engineering and decision sciences, it often happens that several objective functions must be minimized simultaneously: f1, ..., fm. → This calls for appropriate tools: multi-objective optimization.
6. ### The multi-objective optimization problem

Let F = (f1, ..., fm) : H → ℝ^m be locally Lipschitz, with H a Hilbert space. Solve MIN (f1(x), ..., fm(x)) : x ∈ C ⊂ H convex. We consider the usual order(s) on ℝ^m: a ≼ b ⇔ aᵢ ≤ bᵢ for all i = 1, ..., m; a ≺ b ⇔ aᵢ < bᵢ for all i = 1, ..., m. x is a Pareto point if there is no y ∈ C such that F(y) ≼ F(x) with F(y) ≠ F(x); x is a weak Pareto point if there is no y ∈ C such that F(y) ≺ F(x).
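On a finite sample of objective vectors, the two definitions above can be checked directly. A minimal sketch (the function names are ours; only the dominance tests follow the slide's definitions):

```python
import numpy as np

def is_pareto(fx, others):
    """fx = F(x); x is Pareto if no y satisfies F(y) <= F(x)
    componentwise with F(y) != F(x) (at least one strict improvement)."""
    return not any(np.all(fy <= fx) and np.any(fy < fx) for fy in others)

def is_weak_pareto(fx, others):
    """x is weak Pareto if no y improves *every* component strictly."""
    return not any(np.all(fy < fx) for fy in others)
```

Every Pareto point is weak Pareto but not conversely: among the objective vectors (1, 2), (2, 1), (1, 1), the point with value (1, 2) is weak Pareto (nothing strictly improves both components) yet not Pareto, since (1, 1) dominates it.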
9. ### The multi-objective optimization problem

Let F = (f1, ..., fm) : H → ℝ^m be locally Lipschitz. Solve MIN (f1(x), ..., fm(x)) : x ∈ C ⊂ H convex. How to solve it? Genetic algorithms → no theoretical guarantees. Scalarization method: minimize f_θ := Σᵢ₌₁^m θᵢ fᵢ, with θ = (θᵢ)ᵢ₌₁..m ∈ Δ_m, because ⋃_{θ∈Δ_m} argmin_{x∈H} f_θ(x) ⊂ {weak Paretos}.
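The scalarization idea can be sketched on a hypothetical pair of quadratics, chosen here because f_θ = θ‖x − a‖² + (1 − θ)‖x − b‖² has a closed-form minimizer, so each θ in the simplex yields one weak Pareto point:

```python
import numpy as np

def scalarized_minimizers(a, b, n_samples=5):
    """Sample weak Pareto points of f1(x) = |x-a|^2, f2(x) = |x-b|^2
    by minimizing f_theta = theta*f1 + (1-theta)*f2 over a grid of
    theta in [0, 1].  Setting the gradient to zero gives
    argmin f_theta = theta*a + (1-theta)*b."""
    thetas = np.linspace(0.0, 1.0, n_samples)
    return [t * a + (1 - t) * b for t in thetas]
```

For a = 0 and b = 1 in ℝ, the samples trace out the segment [0, 1], which is exactly the Pareto set of this pair of objectives.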
13. ### The multi-objective optimization problem

Let F = (f1, ..., fm) : H → ℝ^m be locally Lipschitz. Solve MIN (f1(x), ..., fm(x)) : x ∈ C ⊂ H convex. We are going to present a method which: generalizes the steepest descent dynamic ẋ(t) + ∇f(x(t)) = 0; is cooperative, i.e. all objective functions decrease simultaneously; is independent of any choice of parameters.
14. ### Towards a descent dynamic for multi-objective optimization — Historical review

Smale (1975).
17. ### Towards a descent dynamic for multi-objective optimization — Historical review

Cornet (1981): s(x) := −[∇f1(x), ∇f2(x)]⁰, which satisfies ⟨s(x), ∇fᵢ(x)⟩ < 0 for i = 1, 2.
21. ### Multi-objective steepest descent

Let F = (f1, ..., fm) : H → ℝ^m be locally Lipschitz, C = H Hilbert. Definition: for all x ∈ H, s(x) := −(co {∂_C fᵢ(x)}ᵢ₌₁,...,ₘ)⁰ is the (common) steepest descent direction at x, where A⁰ denotes the element of minimal norm of A. Remarks in the smooth case: if m = 1 then s(x) = −∇f1(x). At each x, s(x) selects a convex combination: s(x) = −Σᵢ₌₁^m θᵢ(x)∇fᵢ(x) = −∇f_{θ(x)}(x), where f_{θ(x)} = Σᵢ₌₁^m θᵢ(x) fᵢ. s(x) is the steepest descent: s(x)/‖s(x)‖ = argmin_{d∈B_H} maxᵢ₌₁,...,ₘ ⟨∇fᵢ(x), d⟩.
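In the smooth case with m = 2 this definition is easy to compute: co{∇f1(x), ∇f2(x)} is a segment, and projecting the origin onto it gives the minimal-norm element in closed form. A sketch (the helper name is ours):

```python
import numpy as np

def steepest_descent_direction(g1, g2):
    """s(x) = minus the minimal-norm element of the segment [g1, g2],
    where g1 = grad f1(x), g2 = grad f2(x).  Minimizing
    |g1 + theta*(g2 - g1)|^2 over theta in [0, 1] is a clipped
    one-dimensional least-squares problem."""
    d = g2 - g1
    nn = d @ d
    theta = 0.0 if nn == 0.0 else float(np.clip(-(g1 @ d) / nn, 0.0, 1.0))
    return -(g1 + theta * d)
```

When the two gradients point in opposite directions the segment contains the origin, so s(x) = 0: the point is Pareto critical.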
22. ### The (multi-objective) Steepest Descent dynamic

We consider the continuous steepest descent dynamic: (SD) ẋ(t) = s(x(t)), i.e. (SD) ẋ(t) + (co {∂_C fᵢ(x(t))}ᵢ)⁰ = 0. A solution is a trajectory x : [0, +∞) → H, absolutely continuous on bounded intervals, satisfying (SD) for a.e. t ≥ 0. It is the continuous version of the steepest descent algorithm studied by Svaiter, Fliege, Iusem, ...
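A minimal sketch of (SD) via an explicit Euler discretization, on the smooth example used on the next slides (f1(x) = ‖x‖², f2(x) = x₁ on ℝ²); the step size and iteration count are our own arbitrary choices:

```python
import numpy as np

def sd_trajectory(x0, step=0.01, n_iter=2000):
    """Explicit Euler scheme x_{k+1} = x_k + step * s(x_k) for
    f1(x) = |x|^2, f2(x) = x_1.  For two smooth objectives, s(x) is
    minus the minimal-norm point of the segment joining the gradients."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_iter):
        g1, g2 = 2 * x, np.array([1.0, 0.0])   # gradients of f1, f2
        d = g2 - g1
        nn = d @ d
        theta = 0.0 if nn == 0.0 else float(np.clip(-(g1 @ d) / nn, 0.0, 1.0))
        x = x + step * -(g1 + theta * d)        # x' = s(x)
        traj.append(x.copy())
    return np.array(traj)
```

Along the iterates both objectives decrease, reflecting the cooperative behaviour stated later in the talk.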
23. ### The (multi-objective) Steepest Descent dynamic — Example

(SD) ẋ(t) = s(x(t)) with f1(x) = ‖x‖² and f2(x) = x₁.
29. ### The (multi-objective) Steepest Descent dynamic — Example

(SD) ẋ(t) = s(x(t)) with f1(x) = x₁² and f2(x) = x₂².
34. ### The (multi-objective) Steepest Descent dynamic — Main results (Attouch, G., Goudou, 2014)

A cooperative dynamic: let x : ℝ₊ → H be a solution of (SD) ẋ(t) = s(x(t)). Then for all i = 1, ..., m, the function t ↦ fᵢ(x(t)) is decreasing. Convergence in the convex case: assume that the objective functions are convex; then any bounded trajectory weakly converges to a weak Pareto point. Existence in the convex case: suppose that H is finite dimensional; then, for any initial data, there exists a global solution to (SD).
37. ### The (multi-objective) Steepest Descent dynamic — Going further

In case of a convex constraint C ⊂ H: (SD) ẋ(t) + (N_C(x(t)) + co {∂_C fᵢ(x(t))}ᵢ)⁰ = 0. Uniqueness? Yes, if the {∇fᵢ(x(·))}ᵢ₌₁,...,ₘ are affinely independent. Convergence to Pareto points? Guaranteed by endowing ℝ^m with a different order (recall F = (f1, ..., fm) : H → ℝ^m).
39. ### Numerical results — Recovering the Pareto front

f1(x, y) = x + y, f2(x, y) = x² + y² + 1/x + 3e^{−100(x−0.3)²} + 3e^{−100(x−0.6)²}, with (x, y) ∈ C = [0.1, 1]². Plot of F(C), F = (f1, f2) : C → ℝ², and its Pareto front.
40. ### Numerical results — Recovering the Pareto front

Same objectives and constraint set as above. Gradient method (right) vs scalarization method (left), 100 samples.
43. ### Numerical results — Pareto selection with Tikhonov penalization

Can we select, among the weak Paretos (= the zeros of x ↦ s(x)), the one closest to a desired state x_d? → Tikhonov penalization: ẋ(t) − s(x(t)) + ε(t)(x(t) − x_d) = 0, with ε(t) ↓ 0 and ∫₀^{+∞} ε(t) dt = +∞. See the works of Attouch, Cabot, Czarnecki, Peypouquet (...) in the monotone case.
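A hedged sketch of a discretized penalized dynamic on the same illustrative pair f1(x) = ‖x‖², f2(x) = x₁, with ε(t) = 1/(1 + t): this ε decreases to 0 and has divergent integral, as required. The desired state x_d and all numerical parameters are our own choices:

```python
import numpy as np

def tikhonov_trajectory(x0, x_d, step=0.01, n_iter=5000):
    """Euler scheme for x'(t) = s(x(t)) - eps(t)*(x(t) - x_d) with
    eps(t) = 1/(1+t), on f1(x) = |x|^2, f2(x) = x_1."""
    x = np.asarray(x0, dtype=float)
    x_d = np.asarray(x_d, dtype=float)
    for k in range(n_iter):
        g1, g2 = 2 * x, np.array([1.0, 0.0])
        d = g2 - g1
        nn = d @ d
        theta = 0.0 if nn == 0.0 else float(np.clip(-(g1 @ d) / nn, 0.0, 1.0))
        s = -(g1 + theta * d)
        eps = 1.0 / (1.0 + k * step)        # eps(t) at t = k*step
        x = x + step * (s - eps * (x - x_d))
    return x
```

For this pair the weak Paretos form the ray {(x₁, 0) : x₁ ≤ 0}; the vanishing penalty steers the trajectory toward the part of that ray closest to x_d.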
44. ### Numerical results — Pareto selection with Tikhonov penalization (figures)
49. ### Convergence rates: empirical observation

ẋ(t) + ∇f(x(t)) = 0 vs ẍ(t) + γẋ(t) + ∇f(x(t)) = 0. Inertia promotes: faster trajectories (with varying γ(t)), exploratory properties.
50. ### Convergence rates: empirical observation

f1(x) = (Σᵢ₌₁^10 xᵢ² − 10 cos(2πxᵢ) + 10)^{1/4}, f2(x) = (Σᵢ₌₁^10 (xᵢ − 1.5)² − 10 cos(2π(xᵢ − 1.5)) + 10)^{1/4}. Convergence rate of ‖F(xₙ) − F(x_∞)‖_∞: Steepest Descent vs Inertial Steepest Descent.
52. ### Inertial (multi-objective) Steepest Descent

Let f1, ..., fm be smooth, with L-Lipschitz gradient. (ISD) ẍ(t) = −γẋ(t) + s(x(t)). Example: f1(x) = ‖x‖² and f2(x) = x₁.
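A hedged sketch of (ISD) on this example, using a naive semi-implicit Euler scheme for the second-order dynamic; γ = 2 matches γ ≥ L for f1 (whose gradient is 2-Lipschitz), and the remaining parameters are arbitrary:

```python
import numpy as np

def isd_trajectory(x0, gamma=2.0, step=0.01, n_iter=4000):
    """Discretization of x'' = -gamma*x' + s(x) for f1(x) = |x|^2,
    f2(x) = x_1: update the velocity v, then the position x."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                      # velocity x'(t)
    for _ in range(n_iter):
        g1, g2 = 2 * x, np.array([1.0, 0.0])
        d = g2 - g1
        nn = d @ d
        theta = 0.0 if nn == 0.0 else float(np.clip(-(g1 @ d) / nn, 0.0, 1.0))
        s = -(g1 + theta * d)
        v = v + step * (-gamma * v + s)       # v' = -gamma*v + s(x)
        x = x + step * v
    return x
```

The damping term −γẋ dissipates the kinetic energy, so the trajectory still settles near the weak Pareto ray {(x₁, 0) : x₁ ≤ 0} of this pair.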
55. ### Inertial (multi-objective) Steepest Descent — Main results (Attouch, G., 2015)

Let f1, ..., fm be smooth, with L-Lipschitz gradient: (ISD) ẍ(t) = −γẋ(t) + s(x(t)). Assume that γ ≥ L. Existence: suppose that H is finite dimensional; then, for any initial data, there exists a global solution to (ISD). Convergence in the convex case: let f1, ..., fm be convex; then any bounded trajectory weakly converges to a weak Pareto point.
59. ### Conclusion

The steepest descent provides a flexible tool once adapted to multi-objective optimization problems. Open questions: uniqueness of the trajectories for ẋ(t) = s(x(t))? Understanding the asymptotic behaviour of ẋ(t) − s(x(t)) + ε(t)x(t) = 0 (the set of weak Paretos is non-convex)? Obtaining convergence rates for first- and second-order dynamics (the critical values are not unique)?