
A Modern Guide to Hyperparameter Optimization

Richard Liaw
December 04, 2019


Modern deep learning model performance depends heavily on the choice of model hyperparameters, and the tuning process is a major bottleneck in the machine learning pipeline. In this talk, we give an overview of modern methods for hyperparameter tuning and show how Ray Tune, a scalable open source hyperparameter tuning library with cutting-edge tuning methods, can be easily incorporated into everyday workflows. Find Ray Tune on GitHub at https://github.com/ray-project/ray.

This talk was originally given at PyData LA 2019.


Transcript

  1. A Modern Guide to Hyperparameter Optimization Richard Liaw

  2. Deep learning is taking over the world

  3. def train_model():
         model = ConvNet()
         optimizer = Optimizer()
         for batch in Dataset():
             loss, acc = model.train(batch)
             optimizer.update(model, loss)

  4. def train_model():
         model = ConvNet()
         optimizer = Optimizer()
         for batch in Dataset():
             loss, acc = model.train(batch)
             optimizer.update(model, loss)
  5. def train_model():
         model = ConvNet(layers, activations, drop...
         optimizer = Optimizer(lr, momentum, decay...)
         for batch in Dataset(standardize, shift, ...)
             loss, acc = model.train(batch)
             optimizer.update(model, loss)

     Tune this!
  6. Hyperparameters matter!

  7. Goal of hyperparameter tuning: maximize model performance, minimize time spent, minimize money spent.
  8. Overview of hyperparameter tuning techniques: Grid Search, Random Search, Bayesian Optimization, HyperBand (bandits), Population-Based Training. Definition: "trial" = one configuration evaluation.
  9. Grid Search
     tl;dr - Cross-product of all possible configurations.

         for rate in [0.1, 0.01, 0.001]:
             for hidden_layers in [2, 3, 4]:
                 for param in ["a", "b", "c"]:
                     train_model(rate, hidden_layers, param)

     Benefits: 1. Explainable 2. Easily parallelizable
     Problems: Inefficient/expensive ⇐ 27 evaluations! (See the grid search sketch in the code sketches after the transcript.)
  10. Grid Search vs. Random Search

  11. Random Search
      tl;dr - Sample configurations at random.

          for i in range(num_samples):
              train_model(
                  rate=sample(0.001, 0.1),
                  hidden_layers=sample(2, 4),
                  param=sample(["a", "b", "c"]))

      Benefits: 1. Better coverage of important parameters 2. Easily parallelizable 3. Hard to beat in high dimensions
      Problems: Still inefficient/expensive! (See the random search sketch in the code sketches after the transcript.)
  12. What if we used some prior information to guide our tuning process? (photo from github.com/fmfn/bayesianoptimization)
  13. Bayesian Optimization
      Model-based optimization of hyperparameters. Libraries: HyperOpt, Scikit-Optimize.

          opt = Optimizer(
              lr=(0.01, 0.1),
              layers=(2, 5))
          for i in range(9):
              config = opt.ask()
              score = train_model(config)
              opt.tell(config, score)

      Benefits: 1. Can utilize prior information 2. Semi-parallelizable (Kandasamy 2018)
      Still can do better! (See the ask/tell sketch in the code sketches after the transcript.)
  14. We can do better by exploiting structure! Why waste resources on this?
  15. HyperBand/ASHA (early stopping algorithms)

          trial = sample_from(hyperparameter_space)
          while trial.iter < max_epochs:
              trial.run_one_epoch()
              if trial.at_cutoff():
                  if is_top_fraction(trial, trial.iter):
                      trial.extend_cutoff()
                  else:
                      # allow new trials to start
                      trial.pause(); break

      Intuition: 1. Compare relative performance 2. Terminate badly performing trials 3. Continue better trials for a longer period of time
      Notes: 1. Can be combined with Bayesian Optimization 2. Can be easily parallelized
      (See the successive-halving sketch in the code sketches after the transcript.)
  16. But what about dynamic hyperparameters? Changed learning rate!
  17. Population-Based Training
      [Diagram: a population of trials whose hyperparameter values (0.1-0.4) are copied and mutated over time.]
      Main idea: Evaluate a population in parallel. Terminate the lowest performers. Copy the weights of the best performers and mutate their hyperparameters.
      Benefits: 1. Easily parallelizable 2. Can search over "schedules" 3. Terminates bad performers
      (See the population-based training sketch in the code sketches after the transcript.)
  18. Does it really work?

  19. OK, but there's no way I'm going to implement all of these algorithms...
  20. A library for distributed hyperparameter search: tune.io

  21. Tune and many others!

  22. Tune handles hyperparameter search execution. tune.io

  23. Resource-aware scheduling. Framework-agnostic. Tune is built with deep learning as a priority.
  24. Tune Algorithm Offerings
      Search Algorithms Provided: HyperOpt (TPE), Bayesian Optimization, SigOpt, Nevergrad, Scikit-Optimize, Ax/BoTorch (PyTorch BayesOpt)
      Trial Schedulers Provided: Population-Based Training, HyperBand, ASHA, Median Stopping Rule, BOHB
  25. Tune: Powers Many Open Source Projects

  26. Native Integration with TensorBoard HParams

  27. Resources
      PyData Demo: https://github.com/richardliaw/pydata_demo
      Tune Documentation: http://tune.io
      Tune Tutorial: https://github.com/ray-project/tutorial/
  28. live demo

  29. ray.readthedocs.io/en/latest/tune.html

          def train_model(config={}):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
  30. ray.readthedocs.io/en/latest/tune.html

          from ray.tune import run, track

          def train_model(config={}):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
                  track.log(mean_loss=loss)
  31. ray.readthedocs.io/en/latest/tune.html

          def train_model(config={}):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
                  track.log(mean_loss=loss)

          train_model(config={"learning_rate": 0.1})
  32. ray.readthedocs.io/en/latest/tune.html

          def train_model(config={}):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
                  track.log(mean_loss=loss)

          tune.run(train_model, config={"learning_rate": 0.1})
  33. ray.readthedocs.io/en/latest/tune.html

          def train_model(config):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
                  track.log(mean_loss=loss)

          tune.run(train_model,
                   config={"learning_rate": 0.1},
                   num_samples=100)
  34. ray.readthedocs.io/en/latest/tune.html

          def train_model(config):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
                  track.log(mean_loss=loss)

          tune.run(train_model,
                   config={"learning_rate": 0.1},
                   num_samples=100,
                   upload_dir="s3://my_bucket")
  35. ray.readthedocs.io/en/latest/tune.html

          def train_model(config):
              model = ConvNet(config)
              for i in range(steps):
                  loss, acc = model.train()
                  track.log(mean_loss=loss)

          tune.run(train_model,
                   config={"learning_rate": tune.uniform(0.001, 0.1)},
                   num_samples=100,
                   upload_dir="s3://my_bucket",
                   scheduler=AsyncHyperBandScheduler())

      (A consolidated, runnable version of this example appears in the code sketches below.)
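
Code sketches

The slides show pseudocode; the sketches below are editor-added, runnable approximations of each technique, not code from the talk.

Slide 9's grid search, as a minimal sketch. The train_model function here is a placeholder stand-in for a real training run, and the search space values simply mirror the slide.

    import itertools

    def train_model(rate, hidden_layers, param):
        # placeholder: return a validation score for this configuration
        return -abs(rate - 0.01) - abs(hidden_layers - 3)

    search_space = {
        "rate": [0.1, 0.01, 0.001],
        "hidden_layers": [2, 3, 4],
        "param": ["a", "b", "c"],
    }

    # Cross-product of all possible configurations: 3 * 3 * 3 = 27 trials.
    results = []
    for rate, hidden_layers, param in itertools.product(*search_space.values()):
        score = train_model(rate, hidden_layers, param)
        results.append(((rate, hidden_layers, param), score))

    best_config, best_score = max(results, key=lambda r: r[1])
    print(best_config, best_score)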
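Slide 11's random search, under the same assumptions (placeholder train_model, illustrative distributions). The log-uniform sampling of the learning rate is my choice, not stated on the slide.

    import random

    def train_model(rate, hidden_layers, param):
        # placeholder: return a validation score for this configuration
        return -abs(rate - 0.01) - abs(hidden_layers - 3)

    num_samples = 27
    results = []
    for _ in range(num_samples):
        config = {
            "rate": 10 ** random.uniform(-3, -1),       # log-uniform in [0.001, 0.1]
            "hidden_layers": random.randint(2, 4),      # integer in [2, 4]
            "param": random.choice(["a", "b", "c"]),
        }
        results.append((config, train_model(**config)))

    best_config, best_score = max(results, key=lambda r: r[1])
    print(best_config, best_score)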
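Slide 13's ask/tell loop, sketched with scikit-optimize (one of the libraries named on the slide). The objective is a placeholder, and because skopt minimizes, it returns a loss rather than a score.

    from skopt import Optimizer

    def train_model(lr, layers):
        # placeholder: return a validation loss for this configuration
        return (lr - 0.05) ** 2 + (layers - 3) ** 2

    opt = Optimizer(dimensions=[(0.01, 0.1),   # lr (continuous)
                                (2, 5)])       # layers (integer)

    for _ in range(9):
        lr, layers = opt.ask()           # the surrogate model proposes a configuration
        loss = train_model(lr, layers)   # evaluate it
        opt.tell([lr, layers], loss)     # feed the result back to the model

    print(min(opt.yi))  # best loss observed so far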
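Slide 15's early-stopping idea, as a simplified synchronous successive-halving sketch (real HyperBand/ASHA is asynchronous and more involved). sample_config and train_one_epoch are hypothetical stand-ins.

    import random

    def sample_config():
        return {"lr": 10 ** random.uniform(-3, -1)}

    def train_one_epoch(config, state):
        # placeholder: pretend to train and return (new_state, validation_score)
        return state, random.random() + config["lr"]

    num_trials, max_epochs, reduction_factor = 27, 27, 3
    trials = [{"config": sample_config(), "state": None, "score": 0.0}
              for _ in range(num_trials)]

    budget = 1
    while budget <= max_epochs and len(trials) > 1:
        for t in trials:
            for _ in range(budget):  # give each surviving trial `budget` more epochs
                t["state"], t["score"] = train_one_epoch(t["config"], t["state"])
        # keep only the top fraction of trials; they continue for a longer period
        trials.sort(key=lambda t: t["score"], reverse=True)
        trials = trials[:max(1, len(trials) // reduction_factor)]
        budget *= reduction_factor

    print(trials[0]["config"], trials[0]["score"])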
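Slide 17's population-based training loop, as a toy sketch (not Tune's PBT implementation). train_step is a placeholder, the population here is trained sequentially rather than in parallel, and the mutation factors are illustrative.

    import copy
    import random

    def train_step(member):
        # placeholder: update weights and return a validation score
        return random.random() + member["lr"]

    population = [{"lr": random.uniform(0.1, 0.4), "weights": None, "score": 0.0}
                  for _ in range(4)]

    for generation in range(10):
        for member in population:
            member["score"] = train_step(member)
        population.sort(key=lambda m: m["score"], reverse=True)
        top, bottom = population[:2], population[2:]
        for loser, winner in zip(bottom, top):
            loser["weights"] = copy.deepcopy(winner["weights"])     # exploit: copy the best weights
            loser["lr"] = winner["lr"] * random.choice([0.8, 1.2])  # explore: mutate the hyperparameter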
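Finally, slides 29-35 assembled into one self-contained script. This is a sketch using the talk-era Ray Tune API (roughly Ray 0.8); later Ray releases renamed parts of it (track.log became tune.report, for example). ConvNet is replaced by a dummy loop, and the metric/mode arguments and get_best_config call are my additions; slide 34's upload_dir="s3://my_bucket" is left out because it needs cloud credentials.

    import random

    from ray import tune
    from ray.tune import track
    from ray.tune.schedulers import AsyncHyperBandScheduler

    def train_model(config):
        # Dummy stand-in for ConvNet training: loss is smallest near lr ~= 0.02.
        for step in range(10):
            loss = (config["learning_rate"] - 0.02) ** 2 + 0.01 * random.random()
            track.log(mean_loss=loss)  # report a metric back to Tune every step

    analysis = tune.run(
        train_model,
        config={"learning_rate": tune.uniform(0.001, 0.1)},   # sampled per trial (slide 35)
        num_samples=100,                                       # launch 100 trials (slide 33)
        # slide 34 also passes upload_dir="s3://my_bucket" to sync results to S3
        scheduler=AsyncHyperBandScheduler(metric="mean_loss", mode="min"))  # early stopping

    print(analysis.get_best_config(metric="mean_loss", mode="min"))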