An introduction to mathematical programming

Gianluca Campanella

April 29, 2017

Transcript

  1. What is mathematical programming?
     • Also known as (mathematical) optimisation
     • Goal is to select the ‘best’ element from some set of available alternatives
     Typically we have an objective function, e.g. f : R^p → R, that:
     • Takes p inputs as a vector, e.g. x ∈ R^p
     • Maps the input to some output value f(x) ∈ R
     We want to find the optimal x⋆ that minimises (or maximises) f
  2. What is mathematical programming?
     • Many ML methods rely on minimisation of cost functions
     Linear regression:
       MSE(β̂) = (1/n) ∑_i (ŷ_i − y_i)²   where ŷ_i = x_i⊺ β̂
     Logistic regression:
       LogLoss(β̂) = −∑_i [y_i log p̂_i + (1 − y_i) log(1 − p̂_i)]   where p̂_i = logit⁻¹(x_i⊺ β̂)
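To make these concrete, here is a minimal NumPy sketch of the two cost functions; the snippet and its variable names are mine, not from the slides.

    import numpy as np

    def mse(y, X, beta):
        # Mean squared error of a linear model with coefficients beta
        y_hat = X @ beta
        return np.mean((y_hat - y) ** 2)

    def log_loss(y, X, beta):
        # Logistic-regression log loss; y holds 0/1 labels
        p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))  # inverse logit
        return -np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))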
  3. Local and global optima
     A function may have multiple optima
     ↓
     Some will be local, some will be global
     [Figure: function with local and global optima (axes x, y)]
  4. Hard optimisation problems
     Consider these three functions:
     f : R^100 → R
     g : [0, 1]^100 → R
     h : {0, 1}^100 → R
     Which one is ‘harder’ to optimise, and why?
  5. Combinatorial optimisation
     Combinatorial problems like optimising h : {0, 1}^100 → R are intrinsically hard
     • Need to try all 2^100 ≈ 1.27 × 10^30 combinations
     • Variable selection is a notable example
     Side note: if h is continuous and we’re actually constraining x ∈ {0, 1}^100, approximate solutions (relaxations) are normally easier to obtain
  6. Numerical optimisation using directional information
     Function is differentiable (analytically or numerically)
     ↓
     Gradient gives a search direction and Hessian can be used to confirm optimality
     [Figure: differentiable function with gradient-based search directions (axes x, y)]
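As an aside (not from the slides), gradient information can be passed directly to off-the-shelf optimisers; a minimal SciPy sketch with a made-up objective and starting point:

    import numpy as np
    from scipy.optimize import minimize

    # Made-up smooth objective with an analytical gradient
    def f(x):
        return np.sum((x - 3.0) ** 2) + np.sum(np.cos(x))

    def grad_f(x):
        return 2.0 * (x - 3.0) - np.sin(x)

    # BFGS uses the gradient as a search direction and builds up curvature information
    result = minimize(f, x0=np.zeros(2), jac=grad_f, method="BFGS")
    print(result.x, result.fun)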
  7. Convex functions
     Function is convex
     ↓
     Any local minimum is also a global minimum
     [Figure: convex function with a single minimum (axes x, y)]
  8. Constrained optimisation
     What about g : [0, 1]^100 → R?
     • Harder than f : R^100 → R… but not much
     • Directional information still useful
     • Need to ensure search strategy doesn’t escape the feasible region
  9. Linear programs
     max_x  c⊺x   s.t.  Ax ≤ b,  x ≥ 0
     • Linear objective, linear constraints
     • Linear objective is convex ⇝ global maximum
     • An optimal solution need not exist:
       • Inconsistent constraints ⇝ infeasible
       • Feasible region unbounded in the direction of the gradient of the objective
  10. Linear programs
      max_{x, y}  3x + 4y
      s.t.  x + 2y ≤ 14
            3x − y ≥ 0
            x − y ≤ 2
      [Figure: feasible region in the (x, y) plane with the optimum marked]
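One way to solve this particular LP (my own sketch, using SciPy's linprog rather than the libraries listed later in the deck):

    from scipy.optimize import linprog

    # linprog minimises, so negate the objective to maximise 3x + 4y
    c = [-3, -4]
    # All constraints rewritten in the form A_ub @ [x, y] <= b_ub
    A_ub = [[ 1,  2],   #   x + 2y <= 14
            [-3,  1],   #  3x -  y >= 0  ->  -3x + y <= 0
            [ 1, -1]]   #   x -  y <= 2
    b_ub = [14, 0, 2]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
    print(res.x, -res.fun)  # optimum at (x, y) = (6, 4) with objective value 34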
  11. Linear programs
      Linear programs can be solved efficiently using:
      • Simplex algorithm
      • Interior-point (barrier) methods
      Performance is generally similar, but might differ drastically for specific problems
  12. Convex quadratic programs
      min_x  (1/2) x⊺Qx + c⊺x
      s.t.  Ax ⪯ b,  x ⪰ 0
      • Quadratic objective, linear constraints
      • Are quadratic objectives always convex?
      • Q must be positive (semi)definite
  13. Convex quadratic programs
      Quadratic programs can be solved efficiently using:
      • Active set method
      • Augmented Lagrangian method
      • Conjugate gradient method
      • Interior-point (barrier) methods
  14. LPs and QPs in Python
      Many Python libraries exist:
      Linear programming
      • PuLP
      • Google Optimization Tools
      • clpy
      Convex quadratic programming
      • CVXOPT
      • CVXPY
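For instance, the general QP from slide 12 can be written almost verbatim in CVXPY; the problem data below are made up purely for illustration.

    import numpy as np
    import cvxpy as cp

    # Small made-up problem data
    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # positive definite
    c = np.array([-1.0, -1.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])

    x = cp.Variable(2)
    objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
    constraints = [A @ x <= b, x >= 0]
    cp.Problem(objective, constraints).solve()
    print(x.value)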
  15. Linear regression
      We can rewrite the least-squares problem
      min_x ‖Ax − b‖₂² = ∑_i ε_i²
      as the convex quadratic objective
      f(x) = x⊺A⊺Ax − 2b⊺Ax + b⊺b
      Side note: setting the gradient to 0 and solving for x recovers the normal equations:
      ∇f = 2A⊺Ax − 2A⊺b = 0  ⇝  A⊺Ax = A⊺b  ⇝  x⋆ = (A⊺A)⁻¹A⊺b
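A quick NumPy check of this derivation on made-up data (solving the normal equations directly is fine here, though QR-based solvers such as lstsq are preferable when A⊺A is ill-conditioned):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 3))                         # made-up design matrix
    b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)       # library solver
    print(np.allclose(x_normal, x_lstsq))                 # True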
  16. Regularised linear regression
      Let’s add a penalisation term:
      min_x ‖Ax − b‖₂² + λ‖x‖₂²
      Our quadratic objective becomes:
      f(x) = x⊺(A⊺A + λI_p)x − 2b⊺Ax + b⊺b
      Side note: this is a good trick to use when the columns of A are collinear, or nearly so
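The same gradient argument as on the previous slide gives the closed form x⋆ = (A⊺A + λI_p)⁻¹A⊺b; a minimal sketch (the function name is mine):

    import numpy as np

    def ridge(A, b, lam):
        # Closed-form ridge solution x = (A'A + lam*I)^-1 A'b, assuming lam > 0
        p = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)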
  17. Constraints on x
      Nonnegativity
      • x ≥ 0
      • Parameters known to be nonnegative, e.g. intensities or rates
      Bounds
      • l ≤ x ≤ u
      • Prior knowledge of permissible values
      Unit sum
      • x ≥ 0 and 1_p⊺ x = 1
      • Useful for proportions and probability distributions
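Such constraints drop straight into a QP formulation; for example, a nonnegative, unit-sum least-squares fit in CVXPY (made-up data, my own variable names):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 4))
    b = rng.normal(size=50)

    x = cp.Variable(4)
    objective = cp.Minimize(cp.sum_squares(A @ x - b))
    constraints = [x >= 0, cp.sum(x) == 1]     # nonnegativity and unit sum
    cp.Problem(objective, constraints).solve()
    print(x.value)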
  18. Least squares vs least absolute deviations
      Why do we minimise squared residuals?
      • Stable, unique, analytical solution
      • Not very robust!
  19. Least squares vs least absolute deviations
      Least absolute deviations
      • Predates least squares by around 50 years (Bošković)
      • Adopted by Laplace, but overshadowed by Legendre and Gauss
      • Robust
      • Possibly multiple solutions
  20. Robust regression
      We can rewrite the LAD problem
      min_x ‖Ax − b‖₁ = ∑_i |ε_i|
      as the linear program
      min_{x,t}  1_n⊺ t   s.t.  −t ≤ Ax − b ≤ t,  t ∈ R^n
      or
      min_{x,u,v}  1_n⊺ u + 1_n⊺ v   s.t.  Ax + u − v = b,  u, v ≥ 0
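A sketch of the second formulation using SciPy's linprog (the function name and data layout are my own):

    import numpy as np
    from scipy.optimize import linprog

    def lad_fit(A, b):
        # LAD as the LP  min 1'u + 1'v  s.t.  Ax + u - v = b,  u, v >= 0
        n, p = A.shape
        c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])
        A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * p + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
        return res.x[:p]       # the fitted coefficients x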
  21. Quantile regression
      Let’s now introduce a weight τ ∈ [0, 1]:
      min_{x,u,v}  τ 1_n⊺ u + (1 − τ) 1_n⊺ v   s.t.  Ax + u − v = b,  u, v ≥ 0
      This is the τth quantile regression problem
      [Figure: scatter plot of y against x illustrating quantile regression]
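Only the cost vector changes relative to the LAD program above; a self-contained sketch (again with my own naming):

    import numpy as np
    from scipy.optimize import linprog

    def quantile_fit(A, b, tau):
        # tau-th quantile regression as the LP
        # min tau*1'u + (1-tau)*1'v  s.t.  Ax + u - v = b,  u, v >= 0
        n, p = A.shape
        c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
        A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * p + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
        return res.x[:p]

Setting τ = 0.5 recovers the LAD (median regression) fit.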
  22. Example
      • Consider these two assets:
        A: equally likely to go up 20% or down 10% in a year
        B: equally likely to go up 20% or down 10% in a year
      • Assume they’re perfectly inversely correlated
      • How would you allocate your money?
  23. Example
      • Consider these two assets:
        A: equally likely to go up 20% or down 10% in a year
        B: equally likely to go up 20% or down 10% in a year
      • Assume they’re perfectly inversely correlated
      • How would you allocate your money?
      The portfolio 50% A + 50% B goes up 5% every year!
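Quick check (my arithmetic, not on the slide): under perfect inverse correlation exactly one asset gains 20% while the other loses 10%, so the portfolio value becomes 0.5 × 1.20 + 0.5 × 0.90 = 1.05 in either state, a certain 5% gain, assuming the portfolio is rebalanced back to 50/50 each year.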
  24. Mean-variance approach of Markowitz
      Given historical ROIs, denoted r_i(t) for asset i at time t ≤ T, we can compute:
      • The reward of asset i:  reward_i = (1/T) ∑_t r_i(t)
      • The risk of asset i:  risk_i = (1/T) ∑_t [r_i(t) − reward_i]²
      We can compute the same quantities for a portfolio x ≥ 0, 1_p⊺ x = 1
  25. Mean-variance approach of Markowitz
      Our objective is to maximise reward and minimise risk
      Instead, we solve
      max_x  reward(x) − µ risk(x)
      for multiple values of the risk aversion parameter µ ≥ 0
      • Linear constraints: x ≥ 0, 1_p⊺ x = 1
      • What about the objective function?
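The portfolio reward is linear in x, and the variance-based risk is x⊺Σx with Σ the sample covariance of the returns, so the problem is a convex QP. A CVXPY sketch on made-up return data (variable names are mine):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.05, scale=0.1, size=(250, 4))   # made-up r_i(t), T x p

    reward = returns.mean(axis=0)            # per-asset reward
    Sigma = np.cov(returns, rowvar=False)    # portfolio risk(x) = x' Sigma x
    risk_aversion = 1.0                      # the parameter mu

    x = cp.Variable(4)
    objective = cp.Maximize(reward @ x - risk_aversion * cp.quad_form(x, Sigma))
    constraints = [x >= 0, cp.sum(x) == 1]
    cp.Problem(objective, constraints).solve()
    print(x.value)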
  26. Mean-variance approach of Markowitz
      Why is variance a reasonable measure of risk?
      • Variance-based measures are not monotonic
      • Quantile-based measures (e.g. VaR) are not subadditive
      • The loss beyond the VaR is ignored
  27. Other risk measures
      Artzner et al. provided a foundation for ‘coherent’ risk measures:
      • Expected shortfall
      • Conditional VaR (CVaR)
      • α-risk
  28. Other risk measures
      Artzner et al. provided a foundation for ‘coherent’ risk measures:
      • Expected shortfall
      • Conditional VaR (CVaR)
      • α-risk
      Linear programming solutions
      • Portfolios with CVaR constraints are linear programs
      • α-risk models are αth quantile regression problems
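The slides do not spell out the formulation; a common choice is the Rockafellar-Uryasev linearisation of CVaR over historical scenarios, sketched here in CVXPY with made-up data and a made-up CVaR limit (all names and numbers are assumptions):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.05, scale=0.1, size=(250, 4))   # scenario returns, T x p
    T, p = returns.shape
    beta, cvar_limit = 0.95, 0.08                              # confidence level and limit

    x = cp.Variable(p)
    alpha = cp.Variable()                  # auxiliary variable (plays the role of VaR)
    z = cp.Variable(T, nonneg=True)        # scenario losses in excess of alpha

    cvar = alpha + cp.sum(z) / ((1 - beta) * T)
    constraints = [z >= -returns @ x - alpha,   # loss in scenario t is -returns[t] @ x
                   cvar <= cvar_limit,
                   x >= 0, cp.sum(x) == 1]
    cp.Problem(cp.Maximize(returns.mean(axis=0) @ x), constraints).solve()
    print(x.value)

Every expression involved is linear in (x, α, z), which is why such CVaR-constrained portfolios are linear programs.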
  29. Recap
      • Optimisation is at the core of what we do!
      • Some problems are much harder than others ⇝ convexity
      • LPs and QPs are ‘easy’, with plenty of tools available
      • Several commonly used regression models are actually LPs or QPs
      • So are some portfolio allocation models!