An introduction to mathematical programming

Gianluca Campanella

April 29, 2017

Transcript

  1. New uses for old tools: An introduction to mathematical programming
    Dr Gianluca Campanella, 29th April 2017
  2. Contents: Mathematical programming · Linear and quadratic programs · Regression problems as LPs and QPs · An application to portfolio theory
  3. Mathematical programming

  4. What is mathematical programming?
    • Also known as (mathematical) optimisation
    • The goal is to select the 'best' element from some set of available alternatives
    Typically we have an objective function, e.g. $f : \mathbb{R}^p \to \mathbb{R}$, that:
    • takes $p$ inputs as a vector, e.g. $x \in \mathbb{R}^p$
    • maps the input to some output value $f(x) \in \mathbb{R}$
    We want to find the optimal $x^\star$ that minimises (or maximises) $f$
  5. What is mathematical programming?
    • Many ML methods rely on minimisation of cost functions
    Linear regression: $\mathrm{MSE}(\hat\beta) = \frac{1}{n} \sum_i (\hat{y}_i - y_i)^2$, where $\hat{y}_i = x_i^\top \hat\beta$
    Logistic regression: $\mathrm{LogLoss}(\hat\beta) = -\sum_i \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right]$, where $\hat{p}_i = \mathrm{logit}^{-1}(x_i^\top \hat\beta)$
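    For concreteness, a minimal NumPy sketch of these two cost functions (the function names are illustrative, not from the slides):

    ```python
    import numpy as np

    def mse(y_hat, y):
        # Mean squared error: (1/n) * sum_i (y_hat_i - y_i)^2
        return np.mean((y_hat - y) ** 2)

    def log_loss(p_hat, y):
        # Negative Bernoulli log-likelihood, y in {0, 1}, p_hat in (0, 1)
        return -np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    ```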
  6. Local and global optima
    A function may have multiple optima ↓ some will be local, some will be global
    [Figure: a function of x with multiple local and global optima]
  7. Hard optimisation problems
    Consider these three functions: $f : \mathbb{R}^{100} \to \mathbb{R}$, $g : [0, 1]^{100} \to \mathbb{R}$, $h : \{0, 1\}^{100} \to \mathbb{R}$
    Which one is 'harder' to optimise, and why?
  8. Combinatorial optimisation
    Combinatorial problems like optimising $h : \{0, 1\}^{100} \to \mathbb{R}$ are intrinsically hard
    • Need to try all $2^{100} \approx 1.27 \times 10^{30}$ combinations
    • Variable selection is a notable example
    Side note: if $h$ is continuous and we're actually constraining $x \in \{0, 1\}^{100}$, approximate solutions (relaxations) are normally easier to obtain
  9. Numerical optimisation using directional information
    Function is differentiable (analytically or numerically) ↓ the gradient gives a search direction, and the Hessian can be used to confirm optimality
    [Figure: a differentiable function of x with gradient directions marked]
  10. Convex functions
    Function is convex ↓ any local minimum is also a global minimum
    [Figure: a convex function of x]
  11. Constrained optimisation
    What about $g : [0, 1]^{100} \to \mathbb{R}$?
    • Harder than $f : \mathbb{R}^{100} \to \mathbb{R}$… but not much
    • Directional information is still useful
    • Need to ensure the search strategy doesn't escape the feasible region
  12. Linear and quadratic programs

  13. Linear programs
    $$\max_x \; c^\top x \quad \text{s.t.} \quad Ax \le b, \; x \ge 0$$
    • Linear objective, linear constraints
    • The linear objective is convex ⇝ global maximum
    • An optimal solution need not exist:
      • Inconsistent constraints ⇝ infeasible
      • Feasible region unbounded in the direction of the gradient of the objective
  14. Linear programs
    $$\max_{x,\, y} \; 3x + 4y \quad \text{s.t.} \quad x + 2y \le 14, \quad 3x - y \ge 0, \quad x - y \le 2$$
    [Figure: the feasible region in the (x, y) plane with the optimum marked]
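    As a worked example, this LP can be solved with PuLP (one of the libraries listed later); a minimal sketch, assuming PuLP is installed:

    ```python
    import pulp

    # Maximise 3x + 4y subject to the three constraints on the slide
    prob = pulp.LpProblem("slide_14_lp", pulp.LpMaximize)
    x = pulp.LpVariable("x")  # free variables, as on the slide
    y = pulp.LpVariable("y")
    prob += 3 * x + 4 * y     # first expression added is the objective
    prob += x + 2 * y <= 14
    prob += 3 * x - y >= 0
    prob += x - y <= 2
    prob.solve()
    print(pulp.value(x), pulp.value(y))  # optimum at x = 6, y = 4 (objective 34)
    ```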
  15. Linear programs
    Linear programs can be solved efficiently using:
    • Simplex algorithm
    • Interior-point (barrier) methods
    Performance is generally similar, but might differ drastically for specific problems
  16. Convex quadratic programs
    $$\min_x \; \tfrac{1}{2} x^\top Q x + c^\top x \quad \text{s.t.} \quad Ax \preceq b, \; x \succeq 0$$
    • Quadratic objective, linear constraints
    • Are quadratic objectives always convex? No: $Q$ must be positive (semi)definite
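    A quick numerical way to check this convexity condition (a sketch; the matrix is illustrative):

    ```python
    import numpy as np

    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

    # (1/2) x'Qx + c'x is convex iff the symmetric Q is positive semidefinite,
    # i.e. all of its eigenvalues are >= 0 (up to numerical tolerance)
    print(np.all(np.linalg.eigvalsh(Q) >= -1e-12))  # True => convex
    ```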
  17. Convex quadratic programs
    Quadratic programs can be solved efficiently using:
    • Active set method
    • Augmented Lagrangian method
    • Conjugate gradient method
    • Interior-point (barrier) methods
  18. LPs and QPs in Python
    Many Python libraries exist:
    Linear programming • PuLP • Google Optimization Tools • clpy
    Convex quadratic programming • CVXOPT • CVXPY
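    For instance, a minimal CVXPY sketch of the convex QP form from slide 16 (the problem data are illustrative):

    ```python
    import numpy as np
    import cvxpy as cp

    Q = np.array([[2.0, 0.5], [0.5, 1.0]])  # positive definite
    c = np.array([-1.0, -1.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])

    x = cp.Variable(2)
    objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
    prob = cp.Problem(objective, [A @ x <= b, x >= 0])
    prob.solve()
    print(x.value)
    ```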
  19. Regression problems as LPs and QPs

  20. Linear regression
    We can rewrite the least-squares problem $\min_x \lVert Ax - b \rVert_2^2 = \sum_i \varepsilon_i^2$ as the convex quadratic objective $f(x) = x^\top A^\top A x - 2 b^\top A x + b^\top b$
    Side note: setting the gradient to 0 and solving for $x$ recovers the normal equations: $\nabla f = 2 A^\top A x - 2 A^\top b = 0 \rightsquigarrow A^\top A x = A^\top b \rightsquigarrow x^\star = (A^\top A)^{-1} A^\top b$
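    A minimal NumPy sketch of the normal equations (the data are synthetic and illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 3))
    b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    # Solve A'A x = A'b directly rather than forming the inverse explicitly
    x_star = np.linalg.solve(A.T @ A, A.T @ b)

    # np.linalg.lstsq solves the same problem with better numerical stability
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    ```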
  21. Regularised linear regression
    Let's add a penalisation term: $\min_x \lVert Ax - b \rVert_2^2 + \lambda \lVert x \rVert_2^2$
    Our quadratic objective becomes: $f(x) = x^\top (A^\top A + \lambda I_p) x - 2 b^\top A x + b^\top b$
    Side note: this is a good trick to use when the columns of $A$ are linearly dependent (or nearly so), since $A^\top A$ is then singular or ill-conditioned
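    The corresponding one-line change to the normal-equations sketch above (the value of λ is illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 3))
    b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    lam = 0.1            # regularisation strength lambda
    p = A.shape[1]

    # Ridge solution: (A'A + lambda * I) is invertible for any lambda > 0
    x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)
    ```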
  22. Constraints on x
    Nonnegativity • $x \ge 0$ • Parameters known to be nonnegative, e.g. intensities or rates
    Bounds • $l \le x \le u$ • Prior knowledge of permissible values
    Unit sum • $x \ge 0$ and $1_p^\top x = 1$ • Useful for proportions and probability distributions
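    As a sketch, the unit-sum case expressed as a constrained least-squares problem in CVXPY (data illustrative):

    ```python
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 4))
    b = rng.normal(size=50)

    x = cp.Variable(4)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)),
                      [x >= 0, cp.sum(x) == 1])  # unit-sum (simplex) constraints
    prob.solve()
    print(x.value)  # a vector of proportions summing to 1
    ```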
  23. Least squares vs least absolute deviations
    Why do we minimise squared residuals?
    • Stable, unique, analytical solution
    • Not very robust!
  24. Least squares vs least absolute deviations
    Least absolute deviations
    • Predates least squares by around 50 years (Bošković)
    • Adopted by Laplace, but overshadowed by Legendre and Gauss
    • Robust
    • Possibly multiple solutions
  25. Robust regression
    We can rewrite the LAD problem $\min_x \lVert Ax - b \rVert_1 = \sum_i |\varepsilon_i|$ as the linear program
    $$\min_{x,\, t} \; 1_n^\top t \quad \text{s.t.} \quad -t \le Ax - b \le t, \; t \in \mathbb{R}^n$$
    or
    $$\min_{x,\, u,\, v} \; 1_n^\top u + 1_n^\top v \quad \text{s.t.} \quad Ax + u - v = b, \; u, v \ge 0$$
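    A sketch of the first formulation in CVXPY, with synthetic data (`cp.norm1` would express the same objective directly):

    ```python
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 3))
    b = rng.normal(size=50)
    n, p = A.shape

    x = cp.Variable(p)
    t = cp.Variable(n)               # per-observation slack bounding |residual|
    residuals = A @ x - b
    prob = cp.Problem(cp.Minimize(cp.sum(t)),        # 1't
                      [residuals <= t, -t <= residuals])
    prob.solve()
    ```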
  26. Quantile regression
    Let's now introduce a weight $\tau \in [0, 1]$:
    $$\min_{x,\, u,\, v} \; \tau \, 1_n^\top u + (1 - \tau) \, 1_n^\top v \quad \text{s.t.} \quad Ax + u - v = b, \; u, v \ge 0$$
    This is the $\tau$th quantile regression problem
    [Figure: scatter plot of y against x illustrating quantile regression]
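    The same LP in CVXPY, parameterised by τ (a sketch with synthetic data):

    ```python
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 2))
    b = rng.normal(size=100)
    n, p = A.shape
    tau = 0.9  # target quantile

    x = cp.Variable(p)
    u = cp.Variable(n, nonneg=True)  # positive parts of the residuals
    v = cp.Variable(n, nonneg=True)  # negative parts of the residuals
    prob = cp.Problem(cp.Minimize(tau * cp.sum(u) + (1 - tau) * cp.sum(v)),
                      [A @ x + u - v == b])
    prob.solve()  # tau = 0.5 recovers LAD (median) regression
    ```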
  27. An application to portfolio theory

  28. Example
    • Consider these two assets:
      A Equally likely to go up 20% or down 10% in a year
      B Equally likely to go up 20% or down 10% in a year
    • Assume they're perfectly inversely correlated
    • How would you allocate your money?
  29. Example
    • Consider these two assets:
      A Equally likely to go up 20% or down 10% in a year
      B Equally likely to go up 20% or down 10% in a year
    • Assume they're perfectly inversely correlated
    • How would you allocate your money?
    The portfolio 50% A + 50% B goes up 5% every year!
  30. Mean-variance approach of Markowitz
    Given historical ROIs, denoted $r_i(t)$ for asset $i$ at time $t \le T$, we can compute:
    • The reward of asset $i$: $\mathrm{reward}_i = \frac{1}{T} \sum_t r_i(t)$
    • The risk of asset $i$: $\mathrm{risk}_i = \frac{1}{T} \sum_t \left[ r_i(t) - \mathrm{reward}_i \right]^2$
    We can compute the same quantities for a portfolio $x \ge 0$, $1_p^\top x = 1$
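    In NumPy, with a T×p matrix R of historical returns (synthetic here), these quantities are (a sketch):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    R = rng.normal(loc=0.05, scale=0.10, size=(250, 4))  # T = 250, p = 4

    reward = R.mean(axis=0)  # reward_i = (1/T) sum_t r_i(t)
    risk = R.var(axis=0)     # risk_i   = (1/T) sum_t [r_i(t) - reward_i]^2

    # For a portfolio x (x >= 0, sum(x) == 1): reward is linear, risk quadratic
    x = np.full(4, 0.25)
    Sigma = np.cov(R.T, bias=True)   # (1/T)-normalised covariance matrix
    print(reward @ x, x @ Sigma @ x)
    ```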
  31. Mean-variance approach of Markowitz
    Our objective is to maximise reward and minimise risk at the same time; instead of this bi-objective problem, we solve $\max_x \; \mathrm{reward}(x) - \mu \, \mathrm{risk}(x)$ for multiple values of the risk aversion parameter $\mu \ge 0$
    • Linear constraints: $x \ge 0$, $1_p^\top x = 1$
    • What about the objective function?
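    Since reward(x) is linear in x and risk(x) = x⊤Σx is convex quadratic (Σ is positive semidefinite), each value of µ gives a convex QP; a CVXPY sketch with synthetic returns:

    ```python
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    R = rng.normal(loc=0.05, scale=0.10, size=(250, 4))
    reward = R.mean(axis=0)
    Sigma = np.cov(R.T, bias=True)   # risk(x) = x' Sigma x

    x = cp.Variable(4)
    for mu in [0.0, 1.0, 10.0]:      # sweep the risk-aversion parameter
        prob = cp.Problem(cp.Maximize(reward @ x - mu * cp.quad_form(x, Sigma)),
                          [x >= 0, cp.sum(x) == 1])
        prob.solve()
        print(mu, np.round(x.value, 3))
    ```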
  32. Mean-variance approach of Markowitz
    Is variance a reasonable measure of risk?
    • Variance-based measures are not monotonic
    • Quantile-based measures (e.g. VaR) are not subadditive
    • The loss beyond the VaR is ignored
  33. Other risk measures
    Artzner et al. provided a foundation for 'coherent' risk measures:
    • Expected shortfall
    • Conditional VaR (CVaR)
    • α-risk
  34. Other risk measures
    Artzner et al. provided a foundation for 'coherent' risk measures:
    • Expected shortfall
    • Conditional VaR (CVaR)
    • α-risk
    Linear programming solutions
    • Portfolios with CVaR constraints are linear programs
    • α-risk models are αth quantile regression problems
  35. Recap
    • Optimisation is at the core of what we do!
    • Some problems are much harder than others ⇝ convexity
    • LPs and QPs are 'easy', with plenty of tools available
    • Many commonly used regression models are actually LPs or QPs
    • So are some portfolio allocation models!