First- and second-order criticality measures in direct search (Mesures de criticalité d'ordres 1 et 2 en recherche directe)

GdR MOA 2015
December 02, 2015

by C. Royer

Transcript

  1. Mesures de criticalité d'ordres 1 et 2 en recherche directe

    From first- to second-order criticality measures in direct search. Clément Royer, ENSEEIHT-IRIT, Toulouse, France. Co-authors: S. Gratton, L. N. Vicente. Journées du GDR MOA, 02/12/15.
  2. Outline

    1. A problem: solving nonconvex problems via second-order methods
    2. A context: direct-search methods
    3. From first- to second-order polling
    4. Second-order analysis and numerical behaviour
  3. Introduction

    We are interested in solving an unconstrained optimization problem: $\min_{x \in \mathbb{R}^n} f(x)$.

    The objective function $f$: $f$ is bounded from below and $C^2$; $\nabla f$ and $\nabla^2 f$ are Lipschitz continuous; $f$ is nonconvex, so the Hessian matrix is not always positive semidefinite.
  4. Caring about second order

    Our definition of a second-order method: an optimization algorithm that exploits the (negative) curvature information contained in the Hessian matrix to ensure second-order convergence.

    Second-order tools for the analysis:
    Taylor expansion: $f(x + s) - f(x) \le \nabla f(x)^\top s + \frac{1}{2} s^\top \nabla^2 f(x)\, s + L_{\nabla^2 f} \|s\|^3$;
    Directional derivative estimate: $f(x + s) - 2 f(x) + f(x - s) = s^\top \nabla^2 f(x)\, s + O(\|s\|^3)$.
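
    The directional derivative estimate above can be checked numerically. The following is a minimal sketch, not taken from the talk: the helper name `curvature_estimate` and the test function `saddle` are illustrative assumptions. It evaluates $f(x+s) - 2f(x) + f(x-s)$ and divides by $\|s\|^2$ to recover the curvature along a coordinate direction.

```python
def curvature_estimate(f, x, s):
    """Return f(x + s) - 2 f(x) + f(x - s), which approximates s^T Hess f(x) s."""
    x_plus = [xi + si for xi, si in zip(x, s)]
    x_minus = [xi - si for xi, si in zip(x, s)]
    return f(x_plus) - 2.0 * f(x) + f(x_minus)


def saddle(x):
    # Hypothetical test function: its Hessian is diag(2, -2) everywhere.
    return x[0] ** 2 - x[1] ** 2


step = 1e-2
point = [0.3, 0.7]
# Dividing by step**2 turns the estimate into a curvature along a unit direction.
along_x1 = curvature_estimate(saddle, point, [step, 0.0]) / step ** 2
along_x2 = curvature_estimate(saddle, point, [0.0, step]) / step ** 2
```

    For this quadratic the estimate is exact up to rounding: `along_x1` is close to 2 and `along_x2` is close to -2, so the second coordinate direction reveals negative curvature using function values only.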
  5. Second-order derivative-based optimization

    Early treatment in trust-region and (curvilinear) line-search methods; negative curvature is seldom handled in a way that provides second-order convergence guarantees; renewed interest with the emergence of cubic models: Curtis et al. '13, '14, '15, Wong ISMP '15.

    Main issues: the cost of computing negative curvature directions; separating the contributions of orders 1 and 2; no natural scaling between $\|\nabla f(x)\|$ and $\lambda_{\min}(\nabla^2 f(x))$.
  8. Solving the problem without using the derivatives

    We consider a setting in which derivatives of $f$ are unavailable or too expensive to compute.

    Derivative-Free Optimization (DFO) methods do not use the derivatives within the algorithm. Two main classes: model-based methods; direct-search methods.

    Reference: A. R. Conn, K. Scheinberg, L. N. Vicente, Introduction to Derivative-Free Optimization (2009).
  10. A simple direct-search framework

    1. Initialization: set $x_0$, $\alpha_0 > 0$, $\theta < 1 \le \gamma$. Set $k = 0$.
    2. Poll step: choose a polling (direction) set of unit vectors. If there exists $d_k$ in the set such that $f(x_k + \alpha_k d_k) - f(x_k) < -\alpha_k^3$, then set $x_{k+1} := x_k + \alpha_k d_k$ and $\alpha_{k+1} := \gamma \alpha_k$. Otherwise, set $x_{k+1} := x_k$ and $\alpha_{k+1} := \theta \alpha_k$.
    3. Set $k = k + 1$ and go back to the poll step.

    Remarks: the performance criterion is the number of evaluations of $f$; the theoretical properties mainly depend on the polling choices.
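
    The framework above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `direct_search`, the fixed polling set, the stopping rule on the step size (`alpha_tol`), the evaluation budget (`max_evals`), the default parameter values, and the test function `shifted_bowl` are all choices made here for the example.

```python
def direct_search(f, x0, directions, alpha0=1.0, theta=0.5, gamma=2.0,
                  alpha_tol=1e-6, max_evals=10_000):
    """Direct search with the sufficient-decrease test f(x + a d) - f(x) < -a**3.

    Assumes 0 < theta < 1 <= gamma and a fixed list of unit directions.
    """
    x, fx, alpha = list(x0), f(x0), alpha0
    evals = 1
    while alpha > alpha_tol and evals < max_evals:
        success = False
        for d in directions:
            trial = [xi + alpha * di for xi, di in zip(x, d)]
            f_trial = f(trial)
            evals += 1
            if f_trial - fx < -alpha ** 3:
                # Successful poll: accept the point and stop polling.
                x, fx, success = trial, f_trial, True
                break
        # Expand the step on success, shrink it otherwise.
        alpha = gamma * alpha if success else theta * alpha
    return x, fx, evals


def shifted_bowl(x):
    # Hypothetical smooth test function with minimizer (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


# Coordinate polling set [I -I] in dimension 2.
coordinate_directions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
x_best, f_best, n_evals = direct_search(shifted_bowl, [0.0, 0.0],
                                        coordinate_directions)
```

    The returned `n_evals` is the quantity the slide names as the performance criterion. Stopping once the step size is small is a practical proxy that the convergence arguments later in the talk justify: a small step on an unsuccessful iteration implies a small gradient.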
  11. Order 2 in derivative-free methods

    Few practical methods explicitly deal with nonconvexity; for direct search, most results are due to Abramson et al. ('05, '06, '14).

    Issues with the existing direct-search approaches: they study properties of (unknown) convergent subsequences; they rely on density assumptions and on direction sets that depend on one another from one iteration to the next.

    Our objective is to develop a method that exploits second-order properties at the iteration level.
  13. Back to the direct-search method

    1. Initialization: set $x_0$, $\alpha_0 > 0$, $\theta < 1 \le \gamma$. Set $k = 0$.
    2. Poll step: choose a polling set of unit vectors. If there exists $d_k$ in the set such that $f(x_k + \alpha_k d_k) - f(x_k) < -\alpha_k^3$, then set $x_{k+1} := x_k + \alpha_k d_k$ and $\alpha_{k+1} := \gamma \alpha_k$. Otherwise, set $x_{k+1} := x_k$ and $\alpha_{k+1} := \theta \alpha_k$.
    3. Set $k = k + 1$ and go back to the poll step.

    How can we define rules to choose the polling sets?
  16. First-order polling quality

    Typical direct-search methods ensure first-order convergence; the polling sets must provide good approximations of the negative gradient.

    A measure of first-order quality: let $D$ be a set of unit vectors and $v \in \mathbb{R}^n \setminus \{0\}$. Then $\mathrm{cm}(D, v) = \max_{d \in D} \frac{d^\top v}{\|v\|}$ is called the cosine measure of $D$ at $v$.

    If $\mathrm{cm}(D, -\nabla f(x)) > 0$, then $D$ contains a descent direction of $f$ at $x$.
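
    As an illustration of the definition above, the cosine measure is a maximum of normalized inner products. The sketch below is not from the talk; the names `cosine_measure` and `coordinate_set` are assumptions made for the example, and directions are stored as plain lists of floats.

```python
import math


def cosine_measure(directions, v):
    """cm(D, v): largest cosine between v and a unit direction d in D."""
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        raise ValueError("v must be nonzero")
    return max(sum(di * vi for di, vi in zip(d, v)) for d in directions) / norm_v


def coordinate_set(n):
    """The columns of [I -I], returned as a list of 2n unit vectors."""
    plus = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    minus = [[-c for c in d] for d in plus]
    return plus + minus


# Worst case for the coordinate set: a vector with equal components.
cm_diagonal = cosine_measure(coordinate_set(4), [1.0, 1.0, 1.0, 1.0])
# Best case: a vector aligned with one of the directions.
cm_aligned = cosine_measure(coordinate_set(2), [-3.0, 0.0])
```

    Here `cm_diagonal` equals $1/\sqrt{4} = 0.5$ and `cm_aligned` equals 1, which matches the value $\kappa = 1/\sqrt{n}$ quoted for the coordinate set in the complexity corollary later in the talk.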
  18. Usual polling choice

    Positive Spanning Sets (PSS): $D$ is a PSS if it generates $\mathbb{R}^n$ by nonnegative linear combinations. $D$ is a PSS iff $\forall v \neq 0$, $\mathrm{cm}(D, v) > 0$; a PSS contains at least $n + 1$ vectors; example: the coordinate set $D_\oplus = [I \; -I]$.

    PSS and first-order convergence, two main ideas: use the Taylor expansion $f(x + \alpha d) - f(x) \le \alpha \nabla f(x)^\top d + L_{\nabla f}\, \alpha^2$; assume that for every iteration, $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa$, $\kappa \in (0, 1)$.
  20. First-order results

    First-order polling strategy: 1. Poll along a Positive Spanning Set $D_k$.

    Convergence arguments: independently of $D_k$, $\alpha_k \to 0$; on unsuccessful iterations, $\alpha_k \ge O(\kappa \|\nabla f(x_k)\|)$.

    Theorem (First-order convergence): $\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$.
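
    The step-size bound on unsuccessful iterations follows from combining the Taylor expansion and the cosine-measure assumption stated on the previous slide. The derivation below is a sketch: the explicit constant $L_{\nabla f} + 1$ relies on the additional assumption $\alpha_k \le 1$, which is made here only to simplify the last step.

```latex
% Unsuccessful iteration: no d in D_k gave sufficient decrease.
\begin{align*}
-\alpha_k^3 &\le f(x_k + \alpha_k d) - f(x_k)
  && \text{(poll failed along } d\text{)} \\
&\le \alpha_k \nabla f(x_k)^\top d + L_{\nabla f}\,\alpha_k^2
  && \text{(Taylor expansion, } \|d\| = 1\text{)} \\
&\le -\alpha_k\,\kappa\,\|\nabla f(x_k)\| + L_{\nabla f}\,\alpha_k^2
  && \text{(choose } d \text{ attaining } \mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa\text{)}
\end{align*}
% Divide by \alpha_k > 0 and use the assumption \alpha_k \le 1:
\[
\kappa\,\|\nabla f(x_k)\| \le L_{\nabla f}\,\alpha_k + \alpha_k^2
\le (L_{\nabla f} + 1)\,\alpha_k
\quad\Longrightarrow\quad
\alpha_k \ge \frac{\kappa}{L_{\nabla f} + 1}\,\|\nabla f(x_k)\| .
\]
```

    Since $\alpha_k \to 0$ forces infinitely many unsuccessful iterations, the gradient norm must become arbitrarily small along them, which gives the lim inf result.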
  21. A second-order criticality measure

    Definition: given a set of unit vectors $D$ and a symmetric matrix $A$, the Rayleigh measure of $D$ with respect to $A$ is defined by $\mathrm{rm}(D, A) = \min_{d \in V(D)} d^\top A\, d$, where $V(D) = \{d \in D \mid -d \in D\}$ is the symmetric part of $D$.

    The Rayleigh measure is an approximation of the minimum eigenvalue; we want this approximation to be sufficiently good.
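
    The definition above translates directly into code. The following is a minimal sketch with assumed names (`symmetric_part`, `rayleigh_measure`) and an assumed tolerance for detecting opposite directions; the matrix `A` is a made-up example with eigenvalues 3 and -1.

```python
import math


def symmetric_part(directions, tol=1e-12):
    """V(D): the directions d in D whose opposite -d also belongs to D."""
    def is_opposite(d, e):
        return all(abs(di + ei) <= tol for di, ei in zip(d, e))
    return [d for d in directions if any(is_opposite(d, e) for e in directions)]


def rayleigh_measure(directions, A):
    """rm(D, A): smallest Rayleigh quotient d^T A d over the symmetric part of D."""
    sym = symmetric_part(directions)
    if not sym:
        raise ValueError("D has an empty symmetric part")

    def quad(d):
        n = len(d)
        return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))

    return min(quad(d) for d in sym)


# Hypothetical symmetric matrix with eigenvalues 3 and -1.
A = [[1.0, 2.0], [2.0, 1.0]]
coordinate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
r = 1.0 / math.sqrt(2.0)
enriched = coordinate + [[r, -r], [-r, r]]

rm_coordinate = rayleigh_measure(coordinate, A)
rm_enriched = rayleigh_measure(enriched, A)
```

    With the coordinate set alone the measure is 1, which misses the negative eigenvalue; adding the pair of directions along the eigenvector $(1, -1)/\sqrt{2}$ brings it down to -1. This is why the quality of the approximation depends on the polling set.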
  22. Rayleigh measure and negative curvature

    In derivative-based methods, if $\lambda_{\min}(\nabla^2 f(x_k)) < 0$, one uses a sufficient negative curvature direction: $d^\top \nabla^2 f(x_k)\, d \le \beta\, \lambda_{\min}(\nabla^2 f(x_k))$, with $\beta \in (0, 1]$.

    In a direct-search environment: derivative-free, so Hessian eigenvalues cannot be computed; direct search, so the step size goes to zero. We will ensure $\mathrm{rm}(D_k, \nabla^2 f(x_k)) \le \beta\, \lambda_{\min}(\nabla^2 f(x_k)) + O(\alpha_k)$.
  25. A second-order polling strategy for Direct Search

    Second-order polling rules:
    1. Poll along a PSS $D_k$ (first-order rule);
    2. Poll along $-D_k$;
    3. Select a basis $B_k \subset D_k$ and build an approximate Hessian $H_k \approx B_k^\top \nabla^2 f(x_k)\, B_k$, using function values;
    4. Compute a unit vector $v_k$ such that $H_k v_k = \lambda_{\min}(H_k)\, v_k$; poll along $v_k$ and $-v_k$.

    The cost of an iteration is at most $O(n^2)$ evaluations. The polling stops as soon as it encounters a direction $d$ such that $f(x_k + \alpha_k d) - f(x_k) < -\alpha_k^3$.
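
    Rule 3 can be illustrated with standard finite differences. The sketch below is not necessarily the scheme used by the authors: it assumes the coordinate basis $B_k = I$ (so that $H_k$ approximates the Hessian itself), and the name `approximate_hessian` and the test function `indefinite_quadratic` are made up for the example.

```python
def approximate_hessian(f, x, alpha):
    """Finite-difference Hessian estimate at x along the coordinate basis.

    Uses 1 + 2n + n(n-1)/2 evaluations of f, i.e. O(n^2) evaluations.
    """
    n = len(x)
    fx = f(x)

    def moved(shifts):
        # Copy of x with alpha * sign added to each listed coordinate.
        y = list(x)
        for i, sign in shifts:
            y[i] += sign * alpha
        return y

    # These values are the ones obtained when polling along D and -D.
    f_plus = [f(moved([(i, 1.0)])) for i in range(n)]
    f_minus = [f(moved([(i, -1.0)])) for i in range(n)]

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: central second difference along e_i.
        H[i][i] = (f_plus[i] - 2.0 * fx + f_minus[i]) / alpha ** 2
        for j in range(i + 1, n):
            # Off-diagonal entry: mixed forward difference along e_i and e_j.
            f_ij = f(moved([(i, 1.0), (j, 1.0)]))
            H[i][j] = H[j][i] = (f_ij - f_plus[i] - f_plus[j] + fx) / alpha ** 2
    return H


def indefinite_quadratic(x):
    # Hypothetical test function with Hessian [[2, 3], [3, -2]].
    return x[0] ** 2 + 3.0 * x[0] * x[1] - x[1] ** 2


H_approx = approximate_hessian(indefinite_quadratic, [0.5, -0.25], 0.5)
```

    For a quadratic the estimate is exact up to rounding. Rule 4 would then take an eigenvector associated with the smallest eigenvalue of the estimate, for instance from a symmetric eigensolver such as `numpy.linalg.eigh`, and poll along it and its opposite.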
  27. Second-order convergence

    Assumptions: the $D_k$'s are PSS with $\forall k$, $\mathrm{cm}(D_k, -\nabla f(x_k)) \ge \kappa > 0$; there exists $\sigma \in (0, 1]$ such that $\forall k$, $\sigma_{\min}(B_k)^2 \ge \sigma > 0$.

    Minimum eigenvalue estimate: let $k$ be an unsuccessful iteration and $P_k$ the corresponding polling set. Then $\mathrm{rm}(P_k, \nabla^2 f(x_k)) \le v_k^\top \nabla^2 f(x_k)\, v_k \le \sigma\, \lambda_{\min}(\nabla^2 f(x_k)) + O(n\, \alpha_k)$. The factors $\sigma$ and $n$ are due to the approximation error.
  29. Second-order convergence (2)

    Convergence arguments: as before, $\alpha_k \to 0$; on an unsuccessful iteration $k$, one has $\alpha_k \ge \max\left\{ O(\kappa \|\nabla f(x_k)\|),\; O(-\sigma\, n^{-1} \lambda_{\min}(\nabla^2 f(x_k))) \right\}$.

    Theorem (Second-order convergence): $\liminf_{k \to \infty} \max\left\{ \|\nabla f(x_k)\|,\; -\lambda_{\min}(\nabla^2 f(x_k)) \right\} = 0$.
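
    The second term of the step-size bound can be recovered from the failed polls along $\pm v_k$, the Taylor bound from the "Caring about second order" slide, and the minimum eigenvalue estimate of the previous slide. The derivation below is a sketch; the constant $C$ standing for the hidden factor in $O(n\,\alpha_k)$ is an assumption introduced here.

```latex
% Unsuccessful iteration: polling failed along both v_k and -v_k.
\begin{align*}
-2\alpha_k^3 &\le f(x_k + \alpha_k v_k) - 2 f(x_k) + f(x_k - \alpha_k v_k)
  && \text{(sum of the two failed tests)} \\
&\le \alpha_k^2\, v_k^\top \nabla^2 f(x_k)\, v_k + 2 L_{\nabla^2 f}\,\alpha_k^3
  && \text{(Taylor bound at } s = \pm\alpha_k v_k\text{; gradient terms cancel)}
\end{align*}
% Divide by \alpha_k^2 and combine with
% v_k^T \nabla^2 f(x_k) v_k \le \sigma \lambda_{\min}(\nabla^2 f(x_k)) + C n \alpha_k:
\[
-2\,(1 + L_{\nabla^2 f})\,\alpha_k \le v_k^\top \nabla^2 f(x_k)\, v_k
\le \sigma\,\lambda_{\min}(\nabla^2 f(x_k)) + C\,n\,\alpha_k
\quad\Longrightarrow\quad
\alpha_k \ge \frac{-\sigma\, n^{-1}\,\lambda_{\min}(\nabla^2 f(x_k))}{2 + 2 L_{\nabla^2 f} + C} .
\]
```

    The last step uses $n \ge 1$ to absorb the constant terms into a multiple of $n\,\alpha_k$.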
  30. Second-order worst-case complexity

    We aim to reach an $(\epsilon_g, \epsilon_H)$-second-order critical point, i.e. $\|\nabla f(x_k)\| < \epsilon_g$ and $\lambda_{\min}(\nabla^2 f(x_k)) > -\epsilon_H$.

    Theorem: let $N_{\epsilon_g, \epsilon_H}$ be the number of evaluations of $f$ needed to reach an $(\epsilon_g, \epsilon_H)$-second-order critical point; then $N_{\epsilon_g, \epsilon_H} \le O\left( n^2 \max\left\{ \kappa^{-3} \epsilon_g^{-3},\; \sigma^{-3} n^3 \epsilon_H^{-3} \right\} \right)$.

    Corollary: choosing $D_k = [I \; -I]$ yields $\kappa = 1/\sqrt{n}$, $\sigma = 1$, and the complexity bound is $O\left( n^5 \max\left\{ \epsilon_g^{-3},\; \epsilon_H^{-3} \right\} \right)$.
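
    The corollary is a direct substitution into the theorem; the short computation below spells it out, using only the values of $\kappa$ and $\sigma$ quoted on the slide.

```latex
% Substitute \kappa = n^{-1/2} and \sigma = 1 into the bound of the theorem.
\begin{align*}
n^2 \max\left\{ \kappa^{-3} \epsilon_g^{-3},\; \sigma^{-3} n^3 \epsilon_H^{-3} \right\}
&= n^2 \max\left\{ n^{3/2} \epsilon_g^{-3},\; n^3 \epsilon_H^{-3} \right\} \\
&\le n^2 \cdot n^3 \max\left\{ \epsilon_g^{-3},\; \epsilon_H^{-3} \right\}
 = n^5 \max\left\{ \epsilon_g^{-3},\; \epsilon_H^{-3} \right\}.
\end{align*}
```

    The second-order term carries the dominant power of $n$: the first-order part alone would contribute only $n^{7/2}\, \epsilon_g^{-3}$.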
  31. Practical insights

    On 60 CUTEst problems with negative curvature: using symmetric sets generally improves the performance; second-order rules (solid lines in the slide's plot) solve more problems.
  33. Conclusion

    Our contributions: the definition of a second-order criticality measure; a second-order direct-search method that converges with respect to this measure, and its associated complexity; numerical confirmation of the theoretical findings.

    For more information: S. Gratton, C. W. Royer, L. N. Vicente, A second-order globally convergent direct-search method and its worst-case complexity. To appear in Optimization.
  34. Towards randomization

    Guaranteeing $P(\mathrm{cm}(D_k, -\nabla f(x_k)) > \kappa) \ge p > 0$ is sufficient for first-order convergence, and we can do it in practice (Gratton, Royer, Vicente and Zhang '14). Can we do the same with second-order properties?
  35. Thank you!

    clement.royer@enseeiht.fr