
# An introduction to mathematical programming

April 29, 2017

## Transcript

1. ### New uses for old tools: An introduction to mathematical programming

Dr Gianluca Campanella, 29th April 2017
2. ### Contents

• Mathematical programming
• Linear and quadratic programs
• Regression problems as LPs and QPs
• An application to portfolio theory

4. ### What is mathematical programming?

• Also known as (mathematical) optimisation
• Goal is to select the ‘best’ element from some set of available alternatives

Typically we have an objective function, e.g. f : R^p → R, that:
• Takes p inputs as a vector, e.g. x ∈ R^p
• Maps the input to some output value f(x) ∈ R

We want to find the optimal x⋆ that minimises (or maximises) f
5. ### What is mathematical programming?

• Many ML methods rely on minimisation of cost functions

Linear regression: MSE(β̂) = (1/n) ∑ᵢ (ŷᵢ − yᵢ)², where ŷᵢ = xᵢ⊺β̂

Logistic regression: LogLoss(β̂) = −∑ᵢ [yᵢ log p̂ᵢ + (1 − yᵢ) log(1 − p̂ᵢ)], where p̂ᵢ = logit⁻¹(xᵢ⊺β̂)
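As a quick sketch of these two cost functions in NumPy (the toy data and the coefficient vector `beta` are made up for illustration):

```python
import numpy as np

# Toy design matrix, targets, and a hypothetical coefficient vector
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
beta = np.array([0.5, 0.5])

# Linear regression: MSE = (1/n) sum_i (yhat_i - y_i)^2, with yhat_i = x_i' beta
y_hat = X @ beta
mse = np.mean((y_hat - y) ** 2)

# Logistic regression: LogLoss = -sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
y_bin = np.array([0, 0, 1, 1])
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logit^{-1}(x_i' beta)
log_loss = -np.sum(y_bin * np.log(p_hat) + (1 - y_bin) * np.log(1 - p_hat))
```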
6. ### Local and global optima

A function may have multiple optima: some will be local, some will be global

[Plot: a function of x with several local optima and one global optimum]
7. ### Hard optimisation problems

Consider these three functions:

f : R^100 → R
g : [0, 1]^100 → R
h : {0, 1}^100 → R

Which one is ‘harder’ to optimise, and why?
8. ### Combinatorial optimisation

Combinatorial problems like optimising h : {0, 1}^100 → R are intrinsically hard
• Need to try all 2^100 ≈ 1.27 × 10^30 combinations
• Variable selection is a notable example

Side note: if h is continuous and we’re actually constraining x ∈ {0, 1}^100, approximate solutions (relaxations) are normally easier to obtain
9. ### Numerical optimisation using directional information

If the function is differentiable (analytically or numerically), the gradient gives a search direction and the Hessian can be used to confirm optimality

[Plot: a function of x with a gradient-based search direction indicated]
10. ### Convex functions

If the function is convex, any local minimum is also a global minimum

[Plot: a convex function of x with a single global minimum]
11. ### Constrained optimisation

What about g : [0, 1]^100 → R?
• Harder than f : R^100 → R… but not much
• Directional information still useful
• Need to ensure the search strategy doesn’t escape the feasible region

13. ### Linear programs

max_x c⊺x s.t. Ax ≤ b, x ≥ 0

• Linear objective, linear constraints
• Linear objective is convex ⇝ global maximum
• An optimal solution need not exist:
  • Inconsistent constraints ⇝ infeasible
  • Feasible region unbounded in the direction of the gradient of the objective
14. ### Linear programs

max_{x, y} 3x + 4y s.t. x + 2y ≤ 14, 3x − y ≥ 0, x − y ≤ 2

[Plot: feasible region in the (x, y) plane with the optimal vertex marked]
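This small LP can be checked numerically, e.g. with SciPy's `linprog` (a sketch; `linprog` minimises, so the objective is negated, and ≥ rows are flipped):

```python
from scipy.optimize import linprog

# Maximise 3x + 4y subject to the slide's constraints.
c = [-3, -4]               # negate: linprog minimises
A_ub = [[1, 2],            # x + 2y <= 14
        [-3, 1],           # 3x - y >= 0  ->  -3x + y <= 0
        [1, -1]]           # x - y <= 2
b_ub = [14, 0, 2]

# Both variables are left free (no sign restriction in this example)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
x_opt, y_opt = res.x       # optimal vertex (6, 4), objective value 34
```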
15. ### Linear programs

Linear programs can be solved efficiently using:
• Simplex algorithm
• Interior-point (barrier) methods

Performance is generally similar, but might differ drastically for specific problems
16. ### Convex quadratic programs

min_x (1/2) x⊺Qx + c⊺x s.t. Ax ⪯ b, x ⪰ 0

• Quadratic objective, linear constraints
• Are quadratic objectives always convex? No: Q must be positive (semi)definite
17. ### Convex quadratic programs

Quadratic programs can be solved efficiently using:
• Active set method
• Augmented Lagrangian method
• Conjugate gradient method
• Interior-point (barrier) methods
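A minimal numerical sketch of a small convex QP, here using SciPy's general-purpose SLSQP solver rather than one of the dedicated methods above; `Q`, `c`, and the extra constraint are made-up example values:

```python
import numpy as np
from scipy.optimize import minimize

# Convex QP: minimise 1/2 x'Qx + c'x  s.t.  x >= 0, 1'x <= 2
Q = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive definite -> convex objective
c = np.array([-2.0, -4.0])

def objective(x):
    return 0.5 * x @ Q @ x + c @ x

res = minimize(objective, x0=np.zeros(2), method="SLSQP",
               bounds=[(0, None), (0, None)],
               constraints=[{"type": "ineq", "fun": lambda x: 2 - x.sum()}])
# Unconstrained minimiser is (1, 2); projected onto 1'x <= 2 it is (0.5, 1.5)
```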
18. ### LPs and QPs in Python

Many Python libraries exist:

Linear programming
• PuLP
• Google Optimization Tools
• clpy

Convex quadratic programming
• CVXOPT
• CVXPY

20. ### Linear regression

We can rewrite the least-squares problem

min_x ‖Ax − b‖₂² = ∑ᵢ εᵢ²

as the convex quadratic objective

f(x) = x⊺A⊺Ax − 2b⊺Ax + b⊺b

Side note: setting the gradient to 0 and solving for x recovers the normal equations:

∇f = 2A⊺Ax − 2A⊺b = 0 ⇝ A⊺Ax = A⊺b ⇝ x⋆ = (A⊺A)⁻¹A⊺b
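The normal equations can be checked against a library least-squares solver; a sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Normal equations: solve A'A x = A'b (solving beats inverting explicitly)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver for comparison
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both routes give the same x⋆ when A⊺A is well conditioned; `lstsq` (SVD-based) is the safer default in practice.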
21. ### Regularised linear regression

Let’s add a penalisation term:

min_x ‖Ax − b‖₂² + λ‖x‖₂²

Our quadratic objective becomes:

f(x) = x⊺(A⊺A + λI_p)x − 2b⊺Ax + b⊺b

Side note: this is a good trick to use when the columns of A are not perfectly independent
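Setting the gradient of this objective to zero gives x⋆ = (A⊺A + λI_p)⁻¹A⊺b; a sketch on simulated data with nearly collinear columns (data and λ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 4))
A[:, 3] = A[:, 2] + 1e-6 * rng.normal(size=30)   # nearly collinear columns
b = rng.normal(size=30)

lam = 0.1
p = A.shape[1]
# Ridge solution: (A'A + lambda I_p)^{-1} A'b
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)
x_ols = np.linalg.lstsq(A, b, rcond=None)[0]     # plain least squares
```

The penalty shrinks the coefficients, so the ridge solution never has larger norm than the least-squares one.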
22. ### Constraints on x

Nonnegativity
• x ≥ 0
• Parameters known to be nonnegative, e.g. intensities or rates

Bounds
• l ≤ x ≤ u
• Prior knowledge of permissible values

Unit sum
• x ≥ 0 and 1⊺_p x = 1
• Useful for proportions and probability distributions
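For the nonnegativity case, SciPy ships a dedicated solver, `scipy.optimize.nnls`; a sketch on simulated data:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
A = rng.normal(size=(40, 3))
b = A @ np.array([0.0, 1.5, 2.0]) + 0.05 * rng.normal(size=40)

# Least squares subject to x >= 0
x_nn, rnorm = nnls(A, b)
```

Box bounds l ≤ x ≤ u can be handled similarly with `scipy.optimize.lsq_linear`.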
23. ### Least squares vs least absolute deviations

Why do we minimise squared residuals?
• Stable, unique, analytical solution
• Not very robust!
24. ### Least squares vs least absolute deviations

Least absolute deviations
• Predates least squares by around 50 years (Bošković)
• Adopted by Laplace, but overshadowed by Legendre and Gauss
• Robust
• Possibly multiple solutions
25. ### Robust regression

We can rewrite the LAD problem

min_x ‖Ax − b‖₁ = ∑ᵢ |εᵢ|

as the linear program

min_{x, t} 1⊺_n t s.t. −t ≤ Ax − b ≤ t, t ∈ R^n

or

min_{x, u, v} 1⊺_n u + 1⊺_n v s.t. Ax + u − v = b, u, v ≥ 0
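The second (u, v) formulation maps directly onto `linprog`; a sketch with simulated heavy-tailed data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, p = 30, 2
A = rng.normal(size=(n, p))
b = A @ np.array([2.0, -1.0]) + rng.laplace(scale=0.1, size=n)

# Stacked variables z = [x (p, free), u (n, >= 0), v (n, >= 0)]
c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])
A_eq = np.hstack([A, np.eye(n), -np.eye(n)])     # A x + u - v = b
bounds = [(None, None)] * p + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
x_lad = res.x[:p]                                # LAD coefficient estimates
```

At the optimum, u and v hold the positive and negative parts of each residual, so the objective equals ∑ᵢ |εᵢ|.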
26. ### Quantile regression

Let’s now introduce a weight τ ∈ [0, 1]:

min_{x, u, v} τ1⊺_n u + (1 − τ)1⊺_n v s.t. Ax + u − v = b, u, v ≥ 0

This is the τth quantile regression problem

[Plot: scatter of (x, y) data with fitted quantile regression lines]
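Swapping the unit weights of the LAD program for τ and 1 − τ gives quantile regression; a sketch for τ = 0.9 on simulated data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
A = np.column_stack([np.ones(n), x])             # intercept + slope
tau = 0.9                                        # 90th percentile

p = A.shape[1]
# z = [beta (p, free), u (n, >= 0), v (n, >= 0)]; minimise tau 1'u + (1-tau) 1'v
c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
A_eq = np.hstack([A, np.eye(n), -np.eye(n)])     # A beta + u - v = y
bounds = [(None, None)] * p + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
beta_tau = res.x[:p]
# Empirical check: about tau of the points lie on or below the fitted line
frac_below = np.mean(y <= A @ beta_tau)
```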

28. ### Example

• Consider these two assets:
  A: Equally likely to go up 20% or down 10% in a year
  B: Equally likely to go up 20% or down 10% in a year
• Assume they’re perfectly inversely correlated
• How would you allocate your money?
29. ### Example

• Consider these two assets:
  A: Equally likely to go up 20% or down 10% in a year
  B: Equally likely to go up 20% or down 10% in a year
• Assume they’re perfectly inversely correlated
• How would you allocate your money?

The portfolio 50% A + 50% B goes up 5% every year!
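The punchline is plain arithmetic: since the assets are perfectly inversely correlated, one is up 20% exactly when the other is down 10%, so a 50/50 split earns the same in either state:

```python
# Returns from the slide: up 20% or down 10%, equal weights
up, down = 0.20, -0.10
w = 0.5

state1 = w * up + w * down    # A up, B down
state2 = w * down + w * up    # A down, B up: identical by symmetry
# Either way the portfolio return is 0.5 * 0.20 + 0.5 * (-0.10) = 0.05
```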
30. ### Mean-variance approach of Markowitz

Given historical ROIs, denoted rᵢ(t) for asset i at time t ≤ T, we can compute:
• The reward of asset i: rewardᵢ = (1/T) ∑ₜ rᵢ(t)
• The risk of asset i: riskᵢ = (1/T) ∑ₜ [rᵢ(t) − rewardᵢ]²

We can compute the same quantities for a portfolio x ≥ 0, 1⊺_p x = 1
31. ### Mean-variance approach of Markowitz

Our objective is to maximise reward and minimise risk. Instead, we solve

max_x reward(x) − µ risk(x)

for multiple values of the risk aversion parameter µ ≥ 0
• Linear constraints: x ≥ 0, 1⊺_p x = 1
• What about the objective function?
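A sketch of this optimisation for one value of µ, using simulated returns and SciPy's SLSQP; the data, µ = 1, and the use of the sample covariance for risk(x) are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical historical ROIs r_i(t): T periods, p assets
rng = np.random.default_rng(5)
T, p = 250, 4
returns = rng.normal(loc=[0.05, 0.08, 0.10, 0.12],
                     scale=[0.05, 0.10, 0.15, 0.20], size=(T, p))

reward = returns.mean(axis=0)            # per-asset mean ROI
cov = np.cov(returns, rowvar=False)      # portfolio risk via the covariance

def neg_utility(x, mu=1.0):
    # Maximise reward(x) - mu * risk(x)  ->  minimise the negation
    return -(reward @ x - mu * (x @ cov @ x))

res = minimize(neg_utility, x0=np.full(p, 1 / p), method="SLSQP",
               bounds=[(0, None)] * p,                              # x >= 0
               constraints=[{"type": "eq",
                             "fun": lambda x: x.sum() - 1}])        # 1'x = 1
weights = res.x
```

Sweeping µ traces out the efficient frontier; since the objective is quadratic in x, each solve is a convex QP.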
32. ### Mean-variance approach of Markowitz

Is variance a reasonable measure of risk?
• Variance-based measures are not monotonic
• Quantile-based measures (e.g. VaR) are not subadditive
• The loss beyond the VaR is ignored
33. ### Other risk measures

Artzner et al. provided a foundation for ‘coherent’ risk measures:
• Expected shortfall
• Conditional VaR (CVaR)
• α-risk
34. ### Other risk measures

Artzner et al. provided a foundation for ‘coherent’ risk measures:
• Expected shortfall
• Conditional VaR (CVaR)
• α-risk

Linear programming solutions
• Portfolios with CVaR constraints are linear programs
• α-risk models are αth quantile regression problems
35. ### Recap

• Optimisation is at the core of what we do!
• Some problems are much harder than others ⇝ convexity
• LPs and QPs are ‘easy’, with plenty of tools available
• Different commonly used regression models are actually LPs or QPs
• So are some portfolio allocation models!