Bayesian Hyperparameter Optimisation

Allen Akinkunle, www.allenkunle.me
London PyTorch, December 1, 2020

Talk Outline
• The goal of hyperparameter optimisation
• Grid search and random search
• How Bayesian optimisation works
• Code demo

The process of training a machine learning model is controlled by hyperparameters
• Unlike model parameters, whose values are learned from the training data, hyperparameter values are predefined before the training process begins.
• Choosing appropriate values for the hyperparameters is important because they affect model performance and even training speed.

The goal of hyperparameter optimisation
• We want to find the hyperparameter combination x⋆ in the hyperparameter space X that maximises some performance function f(x) on a holdout dataset.
• Some example performance functions for binary classification problems are accuracy, ROC AUC, precision and recall.

$$x^\star = \arg\max_{x \in X} f(x)$$
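
To make the arg max concrete, here is a minimal sketch over a small, finite stand-in for the hyperparameter space. The function `validation_score` and the candidate learning rates are made up for illustration; in practice f(x) would be a metric measured on the holdout dataset.

```python
import math


def validation_score(learning_rate: float) -> float:
    """Hypothetical stand-in for f(x); it peaks at learning_rate = 0.01."""
    return 1.0 - (math.log10(learning_rate) + 2.0) ** 2 / 10.0


# A finite stand-in for the hyperparameter space X (assumed values).
candidates = [0.0001, 0.001, 0.01, 0.1, 1.0]

# x* = arg max over the candidates of f(x)
best = max(candidates, key=validation_score)
print(best)
```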

The process of hyperparameter optimisation involves three steps
Step 1: Select a hyperparameter combination x from a grid or through a random process.
Step 2: Train a model M_x using the selected hyperparameter combination.
Step 3: Evaluate the performance f(x) of the model M_x on the validation dataset using the performance metric.
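
The three steps can be sketched as a short loop. This is only an illustration under stated assumptions: the data are synthetic, the "model" is a one-feature ridge regression with a single hyperparameter `lam`, and the performance function is negative mean squared error, none of which comes from the talk.

```python
import random

random.seed(0)


def make_data(n):
    """Synthetic data, y = 3x + noise (made up for illustration)."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys


train_x, train_y = make_data(20)
valid_x, valid_y = make_data(20)


def train(lam):
    """Step 2: fit a one-feature ridge regression (no intercept) in closed form."""
    num = sum(x * y for x, y in zip(train_x, train_y))
    den = sum(x * x for x in train_x) + lam
    return num / den


def evaluate(w):
    """Step 3: negative mean squared error on the validation set (higher is better)."""
    errors = [(w * x - y) ** 2 for x, y in zip(valid_x, valid_y)]
    return -sum(errors) / len(errors)


results = []
for _ in range(10):
    lam = 10 ** random.uniform(-3, 2)  # Step 1: pick a value through a random process
    w = train(lam)                     # Step 2: train the model M_x
    results.append((evaluate(w), lam)) # Step 3: record f(x)

best_score, best_lam = max(results)
print(best_lam, best_score)
```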

Grid search and random search are two of the most commonly used approaches…
• Simple and easy to implement.
• Because they treat the evaluation of every hyperparameter combination independently, they are easy to parallelise.
• They work well in simple situations (when it is cheap to evaluate the performance function).
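
A rough sketch of both approaches, and of why they parallelise easily, is shown here. The `score` function, the hyperparameter names (`lr`, `depth`) and their ranges are hypothetical; a thread pool is used only to keep the example small, whereas real training jobs would more likely run in separate processes or on separate machines.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor


def score(params):
    """Hypothetical performance function; best at lr = 0.01, depth = 4."""
    lr, depth = params
    return -abs(lr - 0.01) - 0.01 * abs(depth - 4)


# Grid search: every combination of the listed values.
grid = list(itertools.product([0.001, 0.01, 0.1], [2, 4, 8]))

# Random search: independent draws from each hyperparameter's range.
rng = random.Random(0)
samples = [(10 ** rng.uniform(-3, -1), rng.randint(2, 8)) for _ in range(9)]

# No evaluation depends on another, so they can simply be mapped over a pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    grid_scores = list(pool.map(score, grid))
    random_scores = list(pool.map(score, samples))

best_grid = grid[grid_scores.index(max(grid_scores))]
best_random = samples[random_scores.index(max(random_scores))]
print(best_grid, best_random)
```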

…but they have certain problems
• Grid search suffers from the curse of dimensionality: the number of combinations grows exponentially with the number of hyperparameters.
• Neither method uses past evaluations of the performance function f(x) to intelligently select the next hyperparameter combinations to try.
• We could end up wasting huge amounts of resources when training complex models like neural networks.
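
The curse of dimensionality is easy to see with a little arithmetic. Assuming, purely for illustration, that each hyperparameter is tried at 10 values, a full grid over n hyperparameters needs 10^n training runs:

```python
values_per_hyperparameter = 10  # assumed grid resolution, for illustration only

grid_sizes = {}
for n_hyperparameters in (1, 2, 4, 8):
    # A full grid evaluates every combination, so the count is a power of n.
    grid_sizes[n_hyperparameters] = values_per_hyperparameter ** n_hyperparameters

for n, size in grid_sizes.items():
    print(f"{n} hyperparameters -> {size} training runs")
```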

Using grid search and random search, the three stages of hyperparameter optimisation actually look like this
Step 1: Select a hyperparameter combination x from a grid or through a random process.
Step 2: Train a model M_x using the selected hyperparameter combination.
Step 3: Evaluate the performance f(x) of the model M_x on the validation dataset using the performance metric.
There is no connection between Steps 3 and 1: Step 3 does not guide Step 1.

There should be a way to intelligently search for the next optimal hyperparameter combination
Figure annotations:
• High accuracy values: a good idea to concentrate the search in this area.
• Low accuracy values: probably best not to search this area.

Bayesian Optimisation
• Bayesian optimisation keeps track of past evaluation results, which it uses to build a surrogate probability model of the objective function.
• The surrogate is much easier to optimise than the objective function, so Bayesian optimisation chooses the next set of hyperparameters to evaluate on the actual objective function by selecting the hyperparameters that perform best on the surrogate model.
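
One common way to write this selection rule down is given here as a sketch. The symbols are assumptions rather than notation from the talk: D_t is the set of past evaluations, a is an acquisition function that scores candidates using the surrogate, and the upper confidence bound shown (surrogate mean μ_t plus κ times surrogate standard deviation σ_t) is just one possible choice of a.

```latex
\mathcal{D}_t = \{(x_i, f(x_i))\}_{i=1}^{t}
\qquad
x_{t+1} = \arg\max_{x \in X} a(x \mid \mathcal{D}_t)
\qquad
a_{\mathrm{UCB}}(x \mid \mathcal{D}_t) = \mu_t(x) + \kappa\, \sigma_t(x)
```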

Bayesian Optimisation
• Build a surrogate probability model of the true objective function.
• Find the hyperparameters that perform best on the surrogate function.
• Apply these hyperparameters to the true objective function.
• Update the surrogate model using the new results.
• Repeat the above steps until the maximum number of iterations is reached or the search converges.
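
The loop above can be sketched in a few lines of NumPy. This is a toy under several assumptions that the talk does not specify: the surrogate is a zero-mean Gaussian process with a squared-exponential kernel, "performs best on the surrogate" is taken to mean the highest upper confidence bound (mean plus two standard deviations), the search space is a 1-D grid of candidates, and `objective` is a made-up function standing in for an expensive training run.

```python
import numpy as np


def objective(x):
    """Hypothetical 'expensive' performance function with its maximum at x = 2."""
    return -(x - 2.0) ** 2


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf(x_obs, x_new)
    mean = k_s.T @ np.linalg.solve(k, y_obs)
    var = 1.0 - np.sum(k_s * np.linalg.solve(k, k_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


candidates = np.linspace(0.0, 5.0, 101)  # assumed 1-D hyperparameter space
x_obs = np.array([0.0, 5.0])             # two initial evaluations
y_obs = objective(x_obs)

for _ in range(10):
    # Build (or update) the surrogate from all results so far.
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    # Find the candidate that looks best on the surrogate (UCB rule).
    x_next = candidates[np.argmax(mean + 2.0 * std)]
    # Apply it to the true objective and store the new result.
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(best_x)
```

Only 12 evaluations of `objective` are made in total; each new point is chosen using everything learned from the previous ones, which is the link between Step 3 and Step 1 that grid search and random search lack.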
