An introduction to mathematical programming

Gianluca Campanella

April 29, 2017

Transcript

  1. What is mathematical programming?
     • Also known as (mathematical) optimisation
     • Goal is to select the ‘best’ element from some set of available alternatives
     Typically we have an objective function, e.g. f : R^p → R, that:
     • Takes p inputs as a vector, e.g. x ∈ R^p
     • Maps the input to some output value f(x) ∈ R
     We want to find the optimal x⋆ that minimises (or maximises) f
  2. What is mathematical programming?
     • Many ML methods rely on minimisation of cost functions
     Linear regression:
       MSE(β̂) = (1/n) ∑_i (ŷ_i − y_i)²   where ŷ_i = x_i⊺ β̂
     Logistic regression:
       LogLoss(β̂) = −∑_i [y_i log p̂_i + (1 − y_i) log(1 − p̂_i)]   where p̂_i = logit⁻¹(x_i⊺ β̂)
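To make these concrete, here is a minimal NumPy sketch of the two cost functions; the snippet and its variable names are mine, not from the slides.

    import numpy as np

    def mse(y, X, beta):
        # Mean squared error of a linear model with coefficients beta
        y_hat = X @ beta
        return np.mean((y_hat - y) ** 2)

    def log_loss(y, X, beta):
        # Logistic-regression log loss; y holds 0/1 labels
        p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))  # inverse logit
        return -np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))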
  3. Local and global optima
     A function may have multiple optima
     ↓
     Some will be local, some will be global
     [Figure: function with local and global optima (axes x, y)]
  4. Hard optimisation problems
     Consider these three functions:
     f : R^100 → R
     g : [0, 1]^100 → R
     h : {0, 1}^100 → R
     Which one is ‘harder’ to optimise, and why?
  5. Combinatorial optimisation
     Combinatorial problems like optimising h : {0, 1}^100 → R are intrinsically hard
     • Need to try all 2^100 ≈ 1.27 × 10^30 combinations
     • Variable selection is a notable example
     Side note: if h is continuous and we’re actually constraining x ∈ {0, 1}^100, approximate solutions (relaxations) are normally easier to obtain
  6. Numerical optimisation using directional information
     Function is differentiable (analytically or numerically)
     ↓
     Gradient gives a search direction and Hessian can be used to confirm optimality
     [Figure: differentiable function with gradient-based search directions (axes x, y)]
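As an aside (not from the slides), gradient information can be passed directly to off-the-shelf optimisers; a minimal SciPy sketch with a made-up objective and starting point:

    import numpy as np
    from scipy.optimize import minimize

    # Made-up smooth objective with an analytical gradient
    def f(x):
        return np.sum((x - 3.0) ** 2) + np.sum(np.cos(x))

    def grad_f(x):
        return 2.0 * (x - 3.0) - np.sin(x)

    # BFGS uses the gradient as a search direction and builds up curvature information
    result = minimize(f, x0=np.zeros(2), jac=grad_f, method="BFGS")
    print(result.x, result.fun)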
  7. Convex functions
     Function is convex
     ↓
     Any local minimum is also a global minimum
     [Figure: convex function with a single minimum (axes x, y)]
  8. Constrained optimisation
     What about g : [0, 1]^100 → R?
     • Harder than f : R^100 → R… but not much
     • Directional information still useful
     • Need to ensure search strategy doesn’t escape the feasible region
  9. Linear programs
     max_x  c⊺x   s.t.  Ax ≤ b,  x ≥ 0
     • Linear objective, linear constraints
     • Linear objective is convex ⇝ global maximum
     • An optimal solution need not exist:
       • Inconsistent constraints ⇝ infeasible
       • Feasible region unbounded in the direction of the gradient of the objective
  10. Linear programs
      max_{x, y}  3x + 4y
      s.t.  x + 2y ≤ 14
            3x − y ≥ 0
            x − y ≤ 2
      [Figure: feasible region in the (x, y) plane with the optimum marked]
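One way to solve this particular LP (my own sketch, using SciPy's linprog rather than the libraries listed later in the deck):

    from scipy.optimize import linprog

    # linprog minimises, so negate the objective to maximise 3x + 4y
    c = [-3, -4]
    # All constraints rewritten in the form A_ub @ [x, y] <= b_ub
    A_ub = [[ 1,  2],   #   x + 2y <= 14
            [-3,  1],   #  3x -  y >= 0  ->  -3x + y <= 0
            [ 1, -1]]   #   x -  y <= 2
    b_ub = [14, 0, 2]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
    print(res.x, -res.fun)  # optimum at (x, y) = (6, 4) with objective value 34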
  11. Linear programs
      Linear programs can be solved efficiently using:
      • Simplex algorithm
      • Interior-point (barrier) methods
      Performance is generally similar, but might differ drastically for specific problems
  12. Convex quadratic programs
      min_x  (1/2) x⊺Qx + c⊺x
      s.t.  Ax ⪯ b,  x ⪰ 0
      • Quadratic objective, linear constraints
      • Are quadratic objectives always convex?
      • Q must be positive (semi)definite
  13. Convex quadratic programs
      Quadratic programs can be solved efficiently using:
      • Active set method
      • Augmented Lagrangian method
      • Conjugate gradient method
      • Interior-point (barrier) methods
  14. LPs and QPs in Python
      Many Python libraries exist:
      Linear programming
      • PuLP
      • Google Optimization Tools
      • clpy
      Convex quadratic programming
      • CVXOPT
      • CVXPY
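For instance, the general QP from slide 12 can be written almost verbatim in CVXPY; the problem data below are made up purely for illustration.

    import numpy as np
    import cvxpy as cp

    # Small made-up problem data
    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # positive definite
    c = np.array([-1.0, -1.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])

    x = cp.Variable(2)
    objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
    constraints = [A @ x <= b, x >= 0]
    cp.Problem(objective, constraints).solve()
    print(x.value)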
  15. Linear regression
      We can rewrite the least-squares problem
      min_x ‖Ax − b‖₂² = ∑_i ε_i²
      as the convex quadratic objective
      f(x) = x⊺A⊺Ax − 2b⊺Ax + b⊺b
      Side note: setting the gradient to 0 and solving for x recovers the normal equations:
      ∇f = 2A⊺Ax − 2A⊺b = 0  ⇝  A⊺Ax = A⊺b  ⇝  x⋆ = (A⊺A)⁻¹A⊺b
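A quick NumPy check of this derivation on made-up data (solving the normal equations directly is fine here, though QR-based solvers such as lstsq are preferable when A⊺A is ill-conditioned):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 3))                         # made-up design matrix
    b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)       # library solver
    print(np.allclose(x_normal, x_lstsq))                 # True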
  16. Regularised linear regression
      Let’s add a penalisation term:
      min_x ‖Ax − b‖₂² + λ‖x‖₂²
      Our quadratic objective becomes:
      f(x) = x⊺(A⊺A + λI_p)x − 2b⊺Ax + b⊺b
      Side note: this is a good trick to use when the columns of A are collinear, or nearly so
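The same gradient argument as on the previous slide gives the closed form x⋆ = (A⊺A + λI_p)⁻¹A⊺b; a minimal sketch (the function name is mine):

    import numpy as np

    def ridge(A, b, lam):
        # Closed-form ridge solution x = (A'A + lam*I)^-1 A'b, assuming lam > 0
        p = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)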
  17. Constraints on x
      Nonnegativity
      • x ≥ 0
      • Parameters known to be nonnegative, e.g. intensities or rates
      Bounds
      • l ≤ x ≤ u
      • Prior knowledge of permissible values
      Unit sum
      • x ≥ 0 and 1_p⊺ x = 1
      • Useful for proportions and probability distributions
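Such constraints drop straight into a QP formulation; for example, a nonnegative, unit-sum least-squares fit in CVXPY (made-up data, my own variable names):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 4))
    b = rng.normal(size=50)

    x = cp.Variable(4)
    objective = cp.Minimize(cp.sum_squares(A @ x - b))
    constraints = [x >= 0, cp.sum(x) == 1]     # nonnegativity and unit sum
    cp.Problem(objective, constraints).solve()
    print(x.value)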
  18. Least squares vs least absolute deviations
      Why do we minimise squared residuals?
      • Stable, unique, analytical solution
      • Not very robust!
  19. Least squares vs least absolute deviations
      Least absolute deviations
      • Predates least squares by around 50 years (Bošković)
      • Adopted by Laplace, but overshadowed by Legendre and Gauss
      • Robust
      • Possibly multiple solutions
  20. Robust regression
      We can rewrite the LAD problem
      min_x ‖Ax − b‖₁ = ∑_i |ε_i|
      as the linear program
      min_{x,t}  1_n⊺ t   s.t.  −t ≤ Ax − b ≤ t,  t ∈ R^n
      or
      min_{x,u,v}  1_n⊺ u + 1_n⊺ v   s.t.  Ax + u − v = b,  u, v ≥ 0
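A sketch of the second formulation using SciPy's linprog (the function name and data layout are my own):

    import numpy as np
    from scipy.optimize import linprog

    def lad_fit(A, b):
        # LAD as the LP  min 1'u + 1'v  s.t.  Ax + u - v = b,  u, v >= 0
        n, p = A.shape
        c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])
        A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * p + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
        return res.x[:p]       # the fitted coefficients x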
  21. Quantile regression
      Let’s now introduce a weight τ ∈ [0, 1]:
      min_{x,u,v}  τ 1_n⊺ u + (1 − τ) 1_n⊺ v   s.t.  Ax + u − v = b,  u, v ≥ 0
      This is the τth quantile regression problem
      [Figure: scatter plot of y against x illustrating quantile regression]
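Only the cost vector changes relative to the LAD program above; a self-contained sketch (again with my own naming):

    import numpy as np
    from scipy.optimize import linprog

    def quantile_fit(A, b, tau):
        # tau-th quantile regression as the LP
        # min tau*1'u + (1-tau)*1'v  s.t.  Ax + u - v = b,  u, v >= 0
        n, p = A.shape
        c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
        A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * p + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
        return res.x[:p]

Setting τ = 0.5 recovers the LAD (median regression) fit.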
  22. Example
      • Consider these two assets:
        A: equally likely to go up 20% or down 10% in a year
        B: equally likely to go up 20% or down 10% in a year
      • Assume they’re perfectly inversely correlated
      • How would you allocate your money?
  23. Example
      • Consider these two assets:
        A: equally likely to go up 20% or down 10% in a year
        B: equally likely to go up 20% or down 10% in a year
      • Assume they’re perfectly inversely correlated
      • How would you allocate your money?
      The portfolio 50% A + 50% B goes up 5% every year!
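Quick check (my arithmetic, not on the slide): under perfect inverse correlation exactly one asset gains 20% while the other loses 10%, so the portfolio value becomes 0.5 × 1.20 + 0.5 × 0.90 = 1.05 in either state, a certain 5% gain, assuming the portfolio is rebalanced back to 50/50 each year.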
  24. Mean-variance approach of Markowitz
      Given historical ROIs, denoted r_i(t) for asset i at time t ≤ T, we can compute:
      • The reward of asset i:  reward_i = (1/T) ∑_t r_i(t)
      • The risk of asset i:  risk_i = (1/T) ∑_t [r_i(t) − reward_i]²
      We can compute the same quantities for a portfolio x ≥ 0, 1_p⊺ x = 1
  25. Mean-variance approach of Markowitz
      Our objective is to maximise reward and minimise risk
      Instead, we solve
      max_x  reward(x) − µ risk(x)
      for multiple values of the risk aversion parameter µ ≥ 0
      • Linear constraints: x ≥ 0, 1_p⊺ x = 1
      • What about the objective function?
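The portfolio reward is linear in x, and the variance-based risk is x⊺Σx with Σ the sample covariance of the returns, so the problem is a convex QP. A CVXPY sketch on made-up return data (variable names are mine):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.05, scale=0.1, size=(250, 4))   # made-up r_i(t), T x p

    reward = returns.mean(axis=0)            # per-asset reward
    Sigma = np.cov(returns, rowvar=False)    # portfolio risk(x) = x' Sigma x
    risk_aversion = 1.0                      # the parameter mu

    x = cp.Variable(4)
    objective = cp.Maximize(reward @ x - risk_aversion * cp.quad_form(x, Sigma))
    constraints = [x >= 0, cp.sum(x) == 1]
    cp.Problem(objective, constraints).solve()
    print(x.value)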
  26. Mean-variance approach of Markowitz
      Why is variance a reasonable measure of risk?
      • Variance-based measures are not monotonic
      • Quantile-based measures (e.g. VaR) are not subadditive
      • The loss beyond the VaR is ignored
  27. Other risk measures
      Artzner et al. provided a foundation for ‘coherent’ risk measures:
      • Expected shortfall
      • Conditional VaR (CVaR)
      • α-risk
  28. Other risk measures
      Artzner et al. provided a foundation for ‘coherent’ risk measures:
      • Expected shortfall
      • Conditional VaR (CVaR)
      • α-risk
      Linear programming solutions
      • Portfolios with CVaR constraints are linear programs
      • α-risk models are αth quantile regression problems
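The slides do not spell out the formulation; a common choice is the Rockafellar-Uryasev linearisation of CVaR over historical scenarios, sketched here in CVXPY with made-up data and a made-up CVaR limit (all names and numbers are assumptions):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.05, scale=0.1, size=(250, 4))   # scenario returns, T x p
    T, p = returns.shape
    beta, cvar_limit = 0.95, 0.08                              # confidence level and limit

    x = cp.Variable(p)
    alpha = cp.Variable()                  # auxiliary variable (plays the role of VaR)
    z = cp.Variable(T, nonneg=True)        # scenario losses in excess of alpha

    cvar = alpha + cp.sum(z) / ((1 - beta) * T)
    constraints = [z >= -returns @ x - alpha,   # loss in scenario t is -returns[t] @ x
                   cvar <= cvar_limit,
                   x >= 0, cp.sum(x) == 1]
    cp.Problem(cp.Maximize(returns.mean(axis=0) @ x), constraints).solve()
    print(x.value)

Every expression involved is linear in (x, α, z), which is why such CVaR-constrained portfolios are linear programs.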
  29. Recap
      • Optimisation is at the core of what we do!
      • Some problems are much harder than others ⇝ convexity
      • LPs and QPs are ‘easy’, with plenty of tools available
      • Several commonly used regression models are actually LPs or QPs
      • So are some portfolio allocation models!