
Bayesian Hyperparameter Optimisation

Allen Akinkunle
December 01, 2020

Transcript

  1. Talk Outline • The goal of hyperparameter optimisation • Grid search and random search • How Bayesian optimisation works • Code demo
  2. The process of training a machine learning model is controlled by hyperparameters • Unlike model parameters, whose values are learned from the training data, hyperparameter values are predefined before the training process begins. • Choosing appropriate hyperparameter values is important because they affect model performance and even training speed.
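The parameter/hyperparameter distinction above can be illustrated with a minimal sketch. The model, dataset, and hyperparameter choice here are illustrative, not from the talk:

```python
# Sketch: max_depth is a hyperparameter, fixed *before* fit();
# the tree's split thresholds are parameters learned *during* fit().
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X_data, y_data = make_classification(n_samples=100, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # hyperparameter set by us
clf.fit(X_data, y_data)                                    # parameters learned here
n_learned_nodes = clf.tree_.node_count                     # learned structure
```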
  3. The goal of hyperparameter optimisation • We want to find the hyperparameter combination x⋆ in the hyperparameter space X that maximises some performance function f(x) on a holdout dataset: x⋆ = arg max_{x ∈ X} f(x) • Some example performance functions for binary classification problems are accuracy, ROC AUC, precision, and recall.
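One concrete choice of performance function f(x) is cross-validated accuracy. A minimal sketch, assuming a random forest and a synthetic dataset (both illustrative, not from the talk):

```python
# f(x): mean cross-validated accuracy for a hyperparameter combination x.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X_data, y_data = make_classification(n_samples=200, random_state=0)

def f(x):
    """Performance function: mean 3-fold CV accuracy for hyperparameters x."""
    model = RandomForestClassifier(
        n_estimators=x["n_estimators"], max_depth=x["max_depth"], random_state=0
    )
    return cross_val_score(model, X_data, y_data, cv=3, scoring="accuracy").mean()

score = f({"n_estimators": 50, "max_depth": 3})
```

Any scorer (ROC AUC, precision, recall) can be substituted via the `scoring` argument.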
  4. The process of hyperparameter optimisation involves three steps • Step 1: Select a hyperparameter combination x from a grid or through a random process. • Step 2: Train a model M using the selected hyperparameter combination x. • Step 3: Evaluate the performance of model M on a validation dataset using the performance metric, giving f(x).
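The three steps above can be sketched as a loop. This is a toy illustration: the stand-in score function and the learning-rate search space are assumptions, not from the talk; in practice Step 2/3 would train and cross-validate a real model:

```python
import random

random.seed(0)

def train_and_evaluate(x):
    # Stand-in for Steps 2-3: train model M with x, return f(x).
    # A synthetic score peaking at lr = 0.1; in practice this is CV accuracy.
    return 1.0 - (x["lr"] - 0.1) ** 2

best_x, best_score = None, float("-inf")
for _ in range(20):
    # Step 1: select a hyperparameter combination (a random process here)
    x = {"lr": random.uniform(0.001, 1.0)}
    # Steps 2-3: train and evaluate, giving f(x)
    score = train_and_evaluate(x)
    if score > best_score:
        best_x, best_score = x, score
```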
  5. Grid search and random search are two of the most commonly used approaches… • Simple and easy to implement • Because they treat the evaluation of every hyperparameter combination independently, they are easy to parallelise. • They work really well in not-so-complex situations (when it is cheap to evaluate the performance function)
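In scikit-learn, grid search (and, analogously, `RandomizedSearchCV` for random search) is a few lines. The model, grid, and dataset below are illustrative assumptions; `n_jobs=-1` shows the easy parallelisation mentioned above:

```python
# Grid search: exhaustively evaluate every combination in the grid.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_data, y_data = make_classification(n_samples=150, random_state=0)

grid = {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}
search = GridSearchCV(SVC(), grid, cv=3, scoring="accuracy", n_jobs=-1)
search.fit(X_data, y_data)  # each combination is evaluated independently
best = search.best_params_
```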
  6. …but they have certain problems • Grid search suffers from the curse of dimensionality. • They don't use past evaluations of the performance function to intelligently select the next hyperparameter combination to try. • We could end up wasting huge amounts of resources when training complex models like neural networks.
  7. Using grid search and random search, the three stages of hyperparameter optimisation actually look like this • Step 1: Select a hyperparameter combination x from a grid or through a random process. • Step 2: Train a model M using the selected combination x. • Step 3: Evaluate the performance of model M on a validation dataset, giving f(x). • There is no connection between Steps 3 and 1: Step 3 does not guide Step 1.
  8. There should be a way to intelligently search for the next optimal hyperparameter combination • High accuracy values: good idea to concentrate the search in this area • Low accuracy values: probably best not to search this area
  9. Bayesian Optimisation • Bayesian optimisation keeps track of past evaluation results, which it uses to build a surrogate probability model of the objective function. • The surrogate is much easier to optimise than the objective function, so Bayesian optimisation finds the next set of hyperparameters to evaluate on the actual objective function by selecting the hyperparameters that perform best on the surrogate model.
  10. Bayesian Optimisation • Build a surrogate probability model of the true objective function. • Find the hyperparameters that perform best on the surrogate function. • Apply these hyperparameters to the true objective function. • Update the surrogate model using the new results. • Repeat the above steps until maximum iterations or convergence.
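The loop above can be sketched with a Gaussian-process surrogate. Everything here is an illustrative assumption: the toy 1-D objective stands in for an expensive training-and-evaluation run, and an upper-confidence-bound acquisition over random candidates stands in for "find the hyperparameters that perform best on the surrogate":

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(x):
    # Stand-in for the expensive true objective f(x); maximum 0.5 at x = 0.3.
    return float(-(x - 0.3) ** 2 + 0.5)

# A few initial evaluations of the true objective
X_obs = rng.uniform(0, 1, size=(3, 1))
y_obs = np.array([objective(x[0]) for x in X_obs])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    # 1. Fit the surrogate to all past evaluation results
    gp.fit(X_obs, y_obs)
    # 2. Pick the candidate that performs best on the surrogate (UCB acquisition)
    cand = rng.uniform(0, 1, size=(200, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + 1.96 * sigma)]
    # 3. Apply the chosen point to the true objective
    y_next = objective(x_next[0])
    # 4. Update the surrogate's data with the new result, then repeat
    X_obs = np.vstack([X_obs, x_next[None, :]])
    y_obs = np.append(y_obs, y_next)

best_x = X_obs[np.argmax(y_obs)][0]
```

Libraries such as scikit-optimize or Optuna package this loop, including better acquisition optimisers.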