
A Modern Guide to Hyperparameter Optimization

Richard Liaw
December 04, 2019

Modern deep learning model performance depends heavily on the choice of hyperparameters, and the tuning process is a major bottleneck in the machine learning pipeline. In this talk, we give an overview of modern methods for hyperparameter tuning and show how Ray Tune, a scalable open source hyperparameter tuning library with cutting-edge tuning methods, can be easily incorporated into everyday workflows. Find Ray Tune on GitHub at https://github.com/ray-project/ray.

This talk was originally given at PyData LA 2019.

Transcript

  1. A Modern Guide to
    Hyperparameter Optimization
    Richard Liaw

  2. Deep learning is taking over the world

  3. def train_model():
         model = ConvNet()
         optimizer = Optimizer()
         for batch in Dataset():
             loss, acc = model.train(batch)
             optimizer.update(model, loss)

  4. def train_model():
         model = ConvNet()
         optimizer = Optimizer()
         for batch in Dataset():
             loss, acc = model.train(batch)
             optimizer.update(model, loss)

  5. def train_model():
         model = ConvNet(layers, activations, drop...)
         optimizer = Optimizer(lr, momentum, decay...)
         for batch in Dataset(standardize, shift, ...):
             loss, acc = model.train(batch)
             optimizer.update(model, loss)
     Tune this!

  6. Hyperparameters matter!

  7. Goal of hyperparameter tuning
     Maximize model performance
     Minimize time spent
     Minimize money spent

  8. Overview of hyperparameter tuning techniques
     Grid Search
     Random Search
     Bayesian Optimization
     HyperBand (Bandits)
     Population-based Training
     Definition: “trial” = “one configuration evaluation”

  9. Grid Search
     tl;dr - Cross-product of all possible configurations.
     for rate in [0.1, 0.01, 0.001]:
         for hidden_layers in [2, 3, 4]:
             for param in ["a", "b", "c"]:
                 train_model(rate, hidden_layers, param)
     Benefits:
     1. Explainable
     2. Easily parallelizable
     Problems:
     Inefficient/expensive ⇐ 27 evaluations!
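
     A minimal runnable sketch of the cross-product above, using itertools.product to
     generate the same 3 × 3 × 3 = 27 configurations (train_model here is a hypothetical
     stand-in for a real training routine):
     from itertools import product

     def train_model(rate, hidden_layers, param):
         # Hypothetical stand-in: a real version would build and fit a model.
         print(f"rate={rate}, hidden_layers={hidden_layers}, param={param}")

     # Cross-product of all values: 3 * 3 * 3 = 27 trials.
     for rate, hidden_layers, param in product(
             [0.1, 0.01, 0.001], [2, 3, 4], ["a", "b", "c"]):
         train_model(rate, hidden_layers, param)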

  10. Grid Search vs. Random Search

  11. Random Search
     tl;dr - Sample configurations at random.
     for i in range(num_samples):
         train_model(
             rate=sample(0.001, 0.1),
             hidden_layers=sample(2, 4),
             param=sample(["a", "b", "c"]))
     Benefits:
     1. Better coverage of important parameters
     2. Easily parallelizable
     3. Hard to beat in high dimensions
     Problems:
     Still inefficient/expensive!
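
     A minimal runnable version of the sampling pseudocode above, using the standard-library
     random module (train_model is the same hypothetical stand-in as in the grid search sketch):
     import random

     num_samples = 27  # same trial budget as the grid search example

     for _ in range(num_samples):
         train_model(
             rate=random.uniform(0.001, 0.1),       # continuous sample
             hidden_layers=random.randint(2, 4),    # integer sample, bounds inclusive
             param=random.choice(["a", "b", "c"]))  # categorical sample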

  12. What if we used some prior information to guide our tuning process?
     (photo from github.com/fmfn/bayesianoptimization)

  13. Bayesian Optimization
     opt = Optimizer(
         lr=(0.01, 0.1),
         layers=(2, 5))
     for i in range(9):
         config = opt.ask()
         score = train_model(config)
         opt.tell(config, score)
     Model-based optimization of hyperparameters.
     Libraries: Hyperopt, Scikit-Optimize
     Benefits:
     1. Can utilize prior information
     2. Semi-parallelizable (Kandasamy 2018)
     Still can do better!
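
     The ask/tell loop above maps onto real libraries; a sketch with scikit-optimize
     (assumes skopt is installed, and uses a toy objective in place of real training):
     from skopt import Optimizer

     # Search space: a continuous learning rate and an integer layer count.
     opt = Optimizer(dimensions=[(0.01, 0.1), (2, 5)])

     for i in range(9):
         lr, layers = opt.ask()             # the surrogate model proposes a configuration
         score = (lr - 0.05) ** 2 + layers  # toy objective; replace with a real training run
         opt.tell([lr, layers], score)      # report the result; skopt minimizes the score

     print(min(opt.yi))  # best observed score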

  14. We can do better by exploiting structure!
     Why waste resources on this?

  15. HyperBand/ASHA (early stopping algorithms)
     trial = sample_from(hyperparameter_space)
     while trial.iter < max_epochs:
         trial.run_one_epoch()
         if trial.at_cutoff():
             if is_top_fraction(trial, trial.iter):
                 trial.extend_cutoff()
             else:
                 # allow new trials to start
                 trial.pause(); break
     Intuition:
     1. Compare relative performance
     2. Terminate badly performing trials
     3. Continue better trials for a longer period of time
     Notes:
     1. Can be combined with Bayesian Optimization
     2. Can be easily parallelized
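
     In Ray Tune this is exposed as a trial scheduler rather than something you implement
     yourself; a sketch assuming the train_model trainable from the later slides (which
     reports mean_loss via track.log) and Ray 0.8-era argument names:
     from ray import tune
     from ray.tune.schedulers import ASHAScheduler

     scheduler = ASHAScheduler(
         metric="mean_loss",  # the value reported by the trainable
         mode="min",
         grace_period=1,      # minimum epochs before a trial can be stopped
         max_t=100)           # maximum epochs per trial

     tune.run(
         train_model,
         config={"learning_rate": tune.uniform(0.001, 0.1)},
         num_samples=100,
         scheduler=scheduler)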

  16. But what about dynamic hyperparameters?
     Changed learning rate!

  17. Population-based training
     [diagram: a population of trials whose learning rates (0.1–0.4) are perturbed over time]
     Main idea:
     Evaluate a population in parallel.
     Terminate the lowest performers.
     Copy the weights of the best performers and mutate their hyperparameters.
     Benefits:
     1. Easily parallelizable
     2. Can search over “schedules”
     3. Terminates bad performers
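
     Ray Tune ships this as a scheduler as well; a sketch assuming the same train_model
     trainable (for the weight copying to work, a real trainable would also need to save
     and restore checkpoints, which is omitted here):
     import random

     from ray import tune
     from ray.tune.schedulers import PopulationBasedTraining

     pbt = PopulationBasedTraining(
         time_attr="training_iteration",
         metric="mean_loss",
         mode="min",
         perturbation_interval=5,  # exploit/explore every 5 iterations
         hyperparam_mutations={
             # how learning rate is resampled when a trial is exploited
             "learning_rate": lambda: random.uniform(0.001, 0.1),
         })

     tune.run(
         train_model,
         config={"learning_rate": 0.01},  # starting point for each population member
         num_samples=8,                   # population size
         scheduler=pbt)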

  18. Does it really work?

  19. OK, but there’s no way I’m going to implement all of these algorithms...

  20. A library for distributed hyperparameter search
    tune.io

  21. Tune
     and many others!

  22. Tune handles hyperparameter search execution.
     tune.io

  23. Resource-aware scheduling
     Framework agnostic
     Tune is built with deep learning as a priority.
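
     “Resource-aware” here means each trial declares what it needs and Tune packs trials
     onto the available CPUs and GPUs accordingly; a sketch using the Ray 0.8-era argument
     name (the resource values are illustrative):
     from ray import tune

     tune.run(
         train_model,
         num_samples=100,
         resources_per_trial={"cpu": 2, "gpu": 1})  # each trial gets 2 CPUs and 1 GPU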

  24. Tune Algorithm Offerings
     Search Algorithms Provided:
     ● HyperOpt (TPE)
     ● Bayesian Optimization
     ● SigOpt
     ● Nevergrad
     ● Scikit-Optimize
     ● Ax/BoTorch (PyTorch BayesOpt)
     Trial Schedulers Provided:
     ● Population-based Training
     ● HyperBand
     ● ASHA
     ● Median-stopping Rule
     ● BOHB
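
     Search algorithms plug into tune.run via the search_alg argument; a hedged sketch of
     the HyperOpt (TPE) integration (module path and constructor arguments follow the
     Ray 0.8-era documentation and may differ in other versions):
     from hyperopt import hp
     from ray import tune
     from ray.tune.suggest.hyperopt import HyperOptSearch

     # The space is defined in HyperOpt's own format.
     space = {"learning_rate": hp.uniform("learning_rate", 0.001, 0.1)}

     algo = HyperOptSearch(space, metric="mean_loss", mode="min")

     tune.run(
         train_model,
         num_samples=100,
         search_alg=algo)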

  25. Tune: Powers Many Open Source Projects

  26. Native Integration with TensorBoard HParams

  27. Resources
     PyData Demo: https://github.com/richardliaw/pydata_demo
     Tune Documentation: http://tune.io
     Tune Tutorial: https://github.com/ray-project/tutorial/

  28. live demo

  29. ray.readthedocs.io/en/latest/tune.html
     def train_model(config={}):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()

  30. ray.readthedocs.io/en/latest/tune.html
     from ray.tune import run, track

     def train_model(config={}):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()
             track.log(mean_loss=loss)

  31. ray.readthedocs.io/en/latest/tune.html
     def train_model(config={}):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()
             track.log(mean_loss=loss)

     train_model(config={"learning_rate": 0.1})

  32. ray.readthedocs.io/en/latest/tune.html
     def train_model(config={}):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()
             track.log(mean_loss=loss)

     tune.run(
         train_model,
         config={"learning_rate": 0.1})

  33. ray.readthedocs.io/en/latest/tune.html
     def train_model(config):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()
             track.log(mean_loss=loss)

     tune.run(
         train_model,
         config={"learning_rate": 0.1},
         num_samples=100)

  34. ray.readthedocs.io/en/latest/tune.html
     def train_model(config):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()
             track.log(mean_loss=loss)

     tune.run(
         train_model,
         config={"learning_rate": 0.1},
         num_samples=100,
         upload_dir="s3://my_bucket")

  35. ray.readthedocs.io/en/latest/tune.html
     def train_model(config):
         model = ConvNet(config)
         for i in range(steps):
             loss, acc = model.train()
             track.log(mean_loss=loss)

     tune.run(
         train_model,
         config={"learning_rate": tune.uniform(0.001, 0.1)},
         num_samples=100,
         upload_dir="s3://my_bucket",
         scheduler=AsyncHyperBandScheduler())
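
     Putting the pieces together, a hedged sketch of the finished script, assuming the
     train_model function defined above (get_best_config lives on the ExperimentAnalysis
     object that recent Ray versions return from tune.run; exact argument names vary by
     version):
     from ray import tune
     from ray.tune.schedulers import AsyncHyperBandScheduler

     analysis = tune.run(
         train_model,
         config={"learning_rate": tune.uniform(0.001, 0.1)},
         num_samples=100,
         scheduler=AsyncHyperBandScheduler(metric="mean_loss", mode="min"))

     # Best hyperparameters found across all trials (availability depends on the Ray version).
     print(analysis.get_best_config(metric="mean_loss", mode="min"))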
