Mesures de criticalité d'ordres 1 et 2 en recherche directe (From first to second-order criticality measures in direct search)

by C. Royer

GdR MOA 2015

December 02, 2015

Transcript

  1. Mesures de criticalité d'ordres 1 et 2 en recherche directe

    From first to second-order criticality measures in direct search. Clément Royer, ENSEEIHT-IRIT, Toulouse, France. Co-authors: S. Gratton, L. N. Vicente. Journées du GDR MOA, 02/12/15.
  2. Outline

    1. A problem: solving nonconvex problems via second-order methods
    2. A context: direct-search methods
    3. From first to second-order polling
    4. Second-order analysis and numerical behaviour
  3. Introduction

    We are interested in solving an unconstrained optimization problem:
    $\min_{x \in \mathbb{R}^n} f(x)$.

    The objective function $f$:
    - $f$ is bounded from below and $C^2$;
    - $\nabla f$ and $\nabla^2 f$ are Lipschitz continuous;
    - $f$ is nonconvex, so the Hessian matrix is not always positive semidefinite.
  4. Caring about second order

    Our definition of a second-order method: an optimization algorithm that exploits the (negative) curvature information contained in the Hessian matrix to ensure second-order convergence.

    Second-order tools for the analysis:
    - Taylor expansion: $f(x+s) - f(x) \le \nabla f(x)^\top s + \frac{1}{2} s^\top \nabla^2 f(x) s + L_{\nabla^2 f} \|s\|^3$;
    - Directional derivative estimate: $f(x+s) - 2 f(x) + f(x-s) = s^\top \nabla^2 f(x) s + O(\|s\|^3)$.
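The directional derivative estimate on this slide can be checked numerically. The following is a minimal Python sketch, assuming NumPy; the test function `f`, its exact `hessian`, and the helper `second_difference` are hypothetical illustrations and do not come from the talk.

```python
import numpy as np

def second_difference(f, x, s):
    """Central second difference f(x+s) - 2 f(x) + f(x-s)."""
    return f(x + s) - 2.0 * f(x) + f(x - s)

# Hypothetical nonconvex test function (an assumption, not from the talk).
def f(x):
    return x[0] ** 2 - x[1] ** 2 + x[0] ** 4

def hessian(x):
    # Exact Hessian of the test function above.
    return np.array([[2.0 + 12.0 * x[0] ** 2, 0.0], [0.0, -2.0]])

x = np.array([0.5, -0.3])
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)  # unit vector
for t in (1e-1, 1e-2, 1e-3):
    s = t * direction
    estimate = second_difference(f, x, s)  # uses function values only
    exact = s @ hessian(x) @ s             # curvature term s^T H s
    print(t, estimate, exact, abs(estimate - exact))
```

The printed gap shrinks faster than the curvature term itself as the step gets smaller, which is why function values alone can reveal the sign of the curvature along a direction.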
  5. Second-order derivative-based optimization

    - Early treatment in trust-region and (curvilinear) line-search methods;
    - Negative curvature is seldom exploited in a way that provides second-order convergence guarantees;
    - Renewed interest with the emergence of cubic models: Curtis et al. '13, '14, '15; Wong, ISMP '15.

    Main issues:
    - Cost of computing negative curvature directions;
    - Dissociating the contributions from orders 1 and 2;
    - No natural scaling between $\|\nabla f(x)\|$ and $\lambda_{\min}(\nabla^2 f(x))$.
  6. Outline

    1. A problem: solving nonconvex problems via second-order methods
    2. A context: direct-search methods
    3. From first to second-order polling
    4. Second-order analysis and numerical behaviour
  7. Solving the problem without using the derivatives

    We consider a setting in which derivatives of $f$ are unavailable or too expensive to compute.

    Derivative-Free Optimization (DFO) methods:
    - Do not use the derivatives within the algorithm;
    - Two main classes: model-based methods and direct-search methods.

    Reference: A. R. Conn, K. Scheinberg, L. N. Vicente, Introduction to Derivative-Free Optimization (2009).
  10. A simple direct-search framework

    1. Initialization: set $x_0$, $\alpha_0 > 0$, $\theta < 1 \le \gamma$. Set $k = 0$.
    2. Poll step: choose a polling (direction) set of unit vectors. If there exists $d_k$ in the set such that $f(x_k + \alpha_k d_k) - f(x_k) < -\alpha_k^3$, then set $x_{k+1} := x_k + \alpha_k d_k$ and $\alpha_{k+1} := \gamma \alpha_k$. Otherwise, set $x_{k+1} := x_k$ and $\alpha_{k+1} := \theta \alpha_k$.
    3. Set $k = k + 1$ and go back to the poll step.

    Remarks:
    - Performance criterion: number of evaluations of $f$;
    - Theoretical properties mainly depend on polling choices.
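The three steps of this framework can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, always polls the coordinate set [I, -I], and adds a stopping rule on the step size and an evaluation budget (`alpha_tol`, `max_evals`) that the slide does not specify. The function name `direct_search` and the test problem are hypothetical.

```python
import numpy as np

def direct_search(f, x0, alpha0=1.0, theta=0.5, gamma=2.0,
                  alpha_tol=1e-6, max_evals=10_000):
    """Basic direct search with the sufficient decrease condition
    f(x + alpha d) - f(x) < -alpha**3, polling the coordinate set [I, -I]."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    directions = np.vstack([np.eye(n), -np.eye(n)])  # rows are unit vectors
    alpha, fx, evals = alpha0, f(x), 1
    while alpha > alpha_tol and evals < max_evals:
        success = False
        for d in directions:
            trial = x + alpha * d
            f_trial = f(trial)
            evals += 1
            if f_trial - fx < -alpha ** 3:
                x, fx, success = trial, f_trial, True
                break  # stop polling at the first successful direction
        # Expand the step after a success, shrink it after a failure.
        alpha = gamma * alpha if success else theta * alpha
    return x, fx, evals

# Hypothetical smooth test problem with minimizer (1, -2).
x_best, f_best, n_evals = direct_search(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, x0=[0.0, 0.0])
print(x_best, f_best, n_evals)
```

The returned evaluation count matches the performance criterion named in the remarks: the number of calls to `f`.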
  11. Order 2 in derivative-free methods

    - Few practical methods explicitly deal with nonconvexity;
    - For direct search, most results are due to Abramson et al. ('05, '06, '14).

    Issues with the existing direct-search approaches:
    - They study properties of (unknown) convergent subsequences;
    - They rely on density assumptions and on direction sets that depend on one another from one iteration to the next.

    Our objective is to develop a method that exploits second-order properties at the iteration level.
  12. Outline

    1. A problem: solving nonconvex problems via second-order methods
    2. A context: direct-search methods
    3. From first to second-order polling
    4. Second-order analysis and numerical behaviour
  13. Back to the direct-search method

    1. Initialization: set $x_0$, $\alpha_0 > 0$, $\theta < 1 \le \gamma$. Set $k = 0$.
    2. Poll step: choose a polling set of unit vectors. If there exists $d_k$ in the set such that $f(x_k + \alpha_k d_k) - f(x_k) < -\alpha_k^3$, then set $x_{k+1} := x_k + \alpha_k d_k$ and $\alpha_{k+1} := \gamma \alpha_k$. Otherwise, set $x_{k+1} := x_k$ and $\alpha_{k+1} := \theta \alpha_k$.
    3. Set $k = k + 1$ and go back to the poll step.

    How can we define rules to choose the polling sets?
  16. First-order polling quality

    - Typical direct-search methods ensure first-order convergence;
    - The polling sets must provide good approximations of the negative gradient.

    A measure of first-order quality: let $D$ be a set of unit vectors and $v \in \mathbb{R}^n \setminus \{0\}$. Then
    $\mathrm{cm}(D, v) = \max_{d \in D} \frac{d^\top v}{\|v\|}$
    is called the cosine measure of $D$ at $v$.

    If $\mathrm{cm}(D, -\nabla f(x)) > 0$, then $D$ contains a descent direction of $f$ at $x$.
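The cosine measure defined on this slide is cheap to compute for a finite set. A minimal Python sketch follows, assuming NumPy and storing the directions as rows of a matrix; the function name `cosine_measure` is a hypothetical choice for illustration.

```python
import numpy as np

def cosine_measure(D, v):
    """cm(D, v) = max over rows d of D of d.v / ||v||, for unit-norm rows d."""
    D = np.asarray(D, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.max(D @ v) / np.linalg.norm(v))

n = 4
D_coord = np.vstack([np.eye(n), -np.eye(n)])  # coordinate set [I, -I]
v = np.ones(n)                                 # equal weight on every axis
print(cosine_measure(D_coord, v), 1.0 / np.sqrt(n))

# A set that is not positively spanning can miss every descent direction:
print(cosine_measure(np.eye(2), [-1.0, -1.0]))  # negative value
```

For the coordinate set, the vector of all ones gives the value 1/sqrt(n), which is the constant that reappears in the complexity corollary later in the talk.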
  18. Usual polling choice

    Positive Spanning Sets (PSS): $D$ is a PSS if it generates $\mathbb{R}^n$ by nonnegative linear combinations.
    - $D$ is a PSS iff $\forall v \ne 0$, $\mathrm{cm}(D, v) > 0$;
    - A PSS contains at least $n + 1$ vectors;
    - Example: the coordinate set $D_\oplus = [I \; -I]$.

    PSS and first-order convergence rest on two main ideas:
    - Use the Taylor expansion $f(x + \alpha d) - f(x) \le \alpha \nabla f(x)^\top d + L_{\nabla f} \alpha^2$;
    - Assume that for every iteration, $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa$, with $\kappa \in (0, 1)$.
  20. First-order results

    First-order polling strategy:
    1. Poll along a Positive Spanning Set $D_k$.

    Convergence arguments:
    - Independently of $D_k$, $\alpha_k \to 0$;
    - On unsuccessful iterations, $\alpha_k \ge O(\kappa \|\nabla f(x_k)\|)$.

    Theorem (First-order convergence): $\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$.
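The step-size bound on unsuccessful iterations follows from the Taylor bound and the cosine-measure assumption stated earlier in the talk. A sketch of the standard argument, written with the slides' symbols (the intermediate steps are a reconstruction, not a quotation from the talk):

```latex
% Unsuccessful iteration k: no d in D_k passes the sufficient decrease test.
-\alpha_k^3 \;\le\; f(x_k + \alpha_k d) - f(x_k)
            \;\le\; \alpha_k \nabla f(x_k)^\top d + L_{\nabla f}\,\alpha_k^2
            \qquad \text{for all } d \in D_k .
% Pick d attaining the cosine measure at -\nabla f(x_k):
\nabla f(x_k)^\top d \;\le\; -\kappa \,\|\nabla f(x_k)\| .
% Combine and divide by \alpha_k > 0:
\kappa \,\|\nabla f(x_k)\| \;\le\; L_{\nabla f}\,\alpha_k + \alpha_k^2 .
```

Since the step size tends to zero, the right-hand side vanishes along unsuccessful iterations, which is what drives the liminf result.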
  21. A second-order criticality measure

    Definition: given a set of unit vectors $D$ and a symmetric matrix $A$, the Rayleigh measure of $D$ with respect to $A$ is defined by
    $\mathrm{rm}(D, A) = \min_{d \in V(D)} d^\top A d$,
    where $V(D) = \{d \in D \mid -d \in D\}$ is the symmetric part of $D$.

    - The Rayleigh measure is an approximation of the minimum eigenvalue;
    - We want this approximation to be sufficiently good.
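To make the definition concrete, here is a minimal Python sketch of the Rayleigh measure, assuming NumPy and directions stored as rows of a matrix. The function name `rayleigh_measure`, the tolerance `tol` used to match a direction with its opposite, and the choice to return `None` for an empty symmetric part are illustrative assumptions.

```python
import numpy as np

def rayleigh_measure(D, A, tol=1e-12):
    """rm(D, A) = min of d^T A d over the symmetric part V(D) of D.

    Rows of D are unit vectors; d is in V(D) when -d is also a row of D.
    Returns None when V(D) is empty.
    """
    D = np.asarray(D, dtype=float)
    A = np.asarray(A, dtype=float)
    values = [d @ A @ d for d in D
              if np.any(np.all(np.abs(D + d) <= tol, axis=1))]
    return min(values) if values else None

D_coord = np.vstack([np.eye(2), -np.eye(2)])  # coordinate set [I, -I]

A_diag = np.array([[1.0, 0.0], [0.0, -3.0]])
print(rayleigh_measure(D_coord, A_diag), np.linalg.eigvalsh(A_diag)[0])

A_off = np.array([[0.0, 1.0], [1.0, 0.0]])
print(rayleigh_measure(D_coord, A_off), np.linalg.eigvalsh(A_off)[0])
```

In the first case the coordinate directions contain an eigenvector, so the measure equals the minimum eigenvalue. In the second case the negative curvature lies off the axes and the measure only bounds the minimum eigenvalue from above, which illustrates why the approximation quality has to be controlled.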
  22. Rayleigh measure and negative curvature

    In derivative-based methods, if $\lambda_{\min}(\nabla^2 f(x_k)) < 0$, one uses a sufficient negative curvature direction:
    $d^\top \nabla^2 f(x_k) d \le \beta \lambda_{\min}(\nabla^2 f(x_k))$, with $\beta \in (0, 1]$.

    In a direct-search environment:
    - Derivative-free: Hessian eigenvalues cannot be computed;
    - Direct search: the step size goes to zero;
    - We will be ensuring $\mathrm{rm}(D_k, \nabla^2 f(x_k)) \le \beta \lambda_{\min}(\nabla^2 f(x_k)) + O(\alpha_k)$.
  25. A second-order polling strategy for Direct Search

    Second-order polling rules:
    1. Poll along a PSS $D_k$ (first-order rule);
    2. Poll along $-D_k$;
    3. Select a basis $B_k \subset D_k$ and build an approximate Hessian $H_k \approx B_k^\top \nabla^2 f(x_k) B_k$, using function values;
    4. Compute a unit vector $v_k$ such that $H_k v_k = \lambda_{\min}(H_k) v_k$; poll along $v_k$ and $-v_k$.

    The cost of an iteration is at most $O(n^2)$ evaluations.
    The polling stops as soon as it encounters a direction $d$ such that $f(x_k + \alpha_k d) - f(x_k) < -\alpha_k^3$.
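Rules 3 and 4 can be illustrated with a short Python sketch, assuming NumPy. The slide only says the approximate Hessian is built "using function values"; the finite-difference formulas below are one standard choice and are an assumption, as are the function names `approximate_hessian` and `curvature_direction` and the test function. The example takes the basis to be the identity, so no change of basis is needed.

```python
import numpy as np

def approximate_hessian(f, x, B, alpha):
    """Finite-difference estimate of B^T (Hessian of f at x) B.

    Columns of B are the basis directions. Uses O(n^2) evaluations of f.
    """
    n = B.shape[1]
    fx = f(x)
    f_plus = [f(x + alpha * B[:, i]) for i in range(n)]
    f_minus = [f(x - alpha * B[:, i]) for i in range(n)]
    H = np.empty((n, n))
    for i in range(n):
        # Diagonal entry: central second difference along direction i.
        H[i, i] = (f_plus[i] - 2.0 * fx + f_minus[i]) / alpha ** 2
        for j in range(i + 1, n):
            # Off-diagonal entry: mixed forward difference along i and j.
            f_ij = f(x + alpha * (B[:, i] + B[:, j]))
            H[i, j] = H[j, i] = (f_ij - f_plus[i] - f_plus[j] + fx) / alpha ** 2
    return H

def curvature_direction(H):
    """Smallest eigenvalue of H and a unit eigenvector for it."""
    eigenvalues, eigenvectors = np.linalg.eigh(H)  # ascending eigenvalues
    return eigenvalues[0], eigenvectors[:, 0]

# Hypothetical quadratic test function with Hessian [[2, 3], [3, -2]].
def f(x):
    return x[0] ** 2 - x[1] ** 2 + 3.0 * x[0] * x[1]

H = approximate_hessian(f, np.array([0.2, -0.1]), np.eye(2), alpha=1e-2)
lam, v = curvature_direction(H)
print(H, lam, v)
```

In an actual poll step, the values at `x + alpha * b` and `x - alpha * b` would already be available from rules 1 and 2, so only the mixed terms cost extra evaluations; this sketch recomputes everything for clarity.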
  26. Outline

    1. A problem: solving nonconvex problems via second-order methods
    2. A context: direct-search methods
    3. From first to second-order polling
    4. Second-order analysis and numerical behaviour
  27. Second-order convergence

    Assumptions:
    - The $D_k$'s are PSS with $\forall k$, $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa > 0$;
    - There exists $\sigma \in (0, 1]$ such that $\forall k$, $\sigma_{\min}(B_k)^2 \ge \sigma > 0$.

    Minimum eigenvalue estimate: let $k$ be an unsuccessful iteration, and $P_k$ the corresponding polling set. Then
    $\mathrm{rm}(P_k, \nabla^2 f(x_k)) \le v_k^\top \nabla^2 f(x_k) v_k \le \sigma \lambda_{\min}(\nabla^2 f(x_k)) + O(n \alpha_k)$.

    The factors $\sigma$ and $n$ are due to the approximation error.
  29. Second-order convergence (2)

    Convergence arguments:
    - As before, $\alpha_k \to 0$;
    - On an unsuccessful iteration $k$, one has $\alpha_k \ge \max\{ O(\kappa \|\nabla f(x_k)\|), \; O(-\sigma n^{-1} \lambda_{\min}(\nabla^2 f(x_k))) \}$.

    Theorem (Second-order convergence): $\liminf_{k \to \infty} \max\{ \|\nabla f(x_k)\|, \; -\lambda_{\min}(\nabla^2 f(x_k)) \} = 0$.
  30. Second-order worst-case complexity

    We aim to reach an $(\epsilon_g, \epsilon_H)$-second-order critical point, i.e. $\|\nabla f(x_k)\| < \epsilon_g$ and $\lambda_{\min}(\nabla^2 f(x_k)) > -\epsilon_H$.

    Theorem: let $N_{\epsilon_g, \epsilon_H}$ be the number of evaluations of $f$ needed to reach an $(\epsilon_g, \epsilon_H)$-second-order critical point; then
    $N_{\epsilon_g, \epsilon_H} \le O\left( n^2 \max\{ \kappa^{-3} \epsilon_g^{-3}, \; \sigma^{-3} n^3 \epsilon_H^{-3} \} \right)$.

    Corollary: choosing $D_k = [I \; -I]$ yields $\kappa = 1/\sqrt{n}$, $\sigma = 1$, and the complexity bound is $O\left( n^5 \max\{ \epsilon_g^{-3}, \epsilon_H^{-3} \} \right)$.
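The corollary is a direct substitution into the theorem's bound. A short worked version, using the symbols $\epsilon_g$ and $\epsilon_H$ for the two tolerances (the intermediate step is a reconstruction, not taken from the slides):

```latex
% Coordinate set D_k = [I  -I]:  \kappa = n^{-1/2},  \sigma = 1.
n^2 \max\left\{ \kappa^{-3} \epsilon_g^{-3},\; \sigma^{-3} n^{3} \epsilon_H^{-3} \right\}
  = n^2 \max\left\{ n^{3/2} \epsilon_g^{-3},\; n^{3} \epsilon_H^{-3} \right\}
  \le n^{5} \max\left\{ \epsilon_g^{-3},\; \epsilon_H^{-3} \right\} .
```

The second-order term dominates the dependence on the dimension: the gradient part alone would only contribute a factor of $n^{7/2}$.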
  31. Practical insights

    On 60 CUTEst problems with negative curvature:
    - Using symmetric sets generally improves the performance;
    - Second-order rules (solid lines in the figure) solve more problems.
  33. Conclusion

    Our contributions:
    - The definition of a second-order criticality measure;
    - A second-order direct-search method that converges w.r.t. this measure, and its associated complexity;
    - Numerical confirmation of the theoretical findings.

    For more information: S. Gratton, C. W. Royer, L. N. Vicente, A second-order globally convergent direct-search method and its worst-case complexity. To appear in Optimization.
  34. Towards randomization

    - Guaranteeing $P(\mathrm{cm}(D_k, -\nabla f(x_k)) > \kappa) \ge p > 0$ is sufficient for first-order convergence, and we can do it in practice (Gratton, Royer, Vicente and Zhang '14);
    - Can we do the same with second-order properties?