Make hyperparameters great again


Tuning the hyperparameters of machine learning algorithms is computationally expensive, but it is also vital for good predictive performance. Tuning methods range from manual search to more sophisticated procedures such as Bayesian optimization. This talk demonstrates current methods for finding good hyperparameter sets within a fixed time budget for common algorithms such as XGBoost.


MunichDataGeeks

October 07, 2017

Transcript

  1. Make Hyperparameters great again! Daniel Kühn @ DataGeeks Data Day 2017
  2. Intro

  3. XGBoost: http://xgboost.readthedocs.io/en/latest/model.html

  4. XGBoost hyperparameters: eta, min_child_weight, max_depth, gamma, nrounds, subsample, colsample_bytree, colsample_bylevel
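
The parameter list on this slide maps onto a configuration like the following (a dependency-free sketch using the Python-style parameter names; in the R package, nrounds is passed separately as the number of boosting rounds, and the values shown are illustrative, not recommendations):

```python
# Illustrative XGBoost configuration covering the hyperparameters on the slide.
params = {
    "eta": 0.1,                # learning rate: shrinkage applied to each new tree
    "min_child_weight": 1,     # minimum sum of instance weights required in a leaf
    "max_depth": 6,            # maximum depth of each tree
    "gamma": 0.0,              # minimum loss reduction required to split further
    "subsample": 0.8,          # fraction of training rows sampled per tree
    "colsample_bytree": 0.8,   # fraction of columns sampled per tree
    "colsample_bylevel": 1.0,  # fraction of columns sampled per tree level
}
nrounds = 500                  # number of boosting rounds, i.e. trees to fit
```
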
  5. https://www.openml.org/d/151

  6.

  7. Better

  8. How to tune the hyperparameters of XGBoost to get a good result?
  9. Grid search

  10. XGBoost hyperparameters: eta, min_child_weight, max_depth, gamma, nrounds, subsample, colsample_bytree, colsample_bylevel
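
Grid search evaluates every combination of a few hand-picked values per parameter, so its cost multiplies with each parameter added; over all eight parameters above, even three values each would already mean 3^8 = 6561 training runs. A minimal sketch over three of the parameters (the value lists are hypothetical):

```python
import itertools

# Hypothetical grid over three of the XGBoost hyperparameters.
grid = {
    "eta": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
    "subsample": [0.5, 0.8, 1.0],
}

# Every combination would be trained and cross-validated.
names = list(grid)
configs = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]
print(len(configs))  # 3 * 3 * 3 = 27 training runs
```
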
  11.

  12. Better

  13. Random search

  14.

  15. Better

  16. BERGSTRA, BENGIO (2012): "Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. […] Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. […] this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms."
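
The quoted result can be sketched in a few lines: instead of a fixed grid, each trial draws every hyperparameter independently from its range, so a budget of n trials probes n distinct values of every parameter rather than a few per axis (the ranges below are illustrative, not from the talk):

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility

def sample_config():
    # One random-search trial: each hyperparameter is drawn independently.
    return {
        "eta": 10 ** rng.uniform(-3, -0.5),   # log-uniform is common for learning rates
        "max_depth": rng.randint(3, 12),
        "subsample": rng.uniform(0.5, 1.0),
    }

budget = 30  # same budget a 30-point grid would get
configs = [sample_config() for _ in range(budget)]
# Unlike a grid, all 30 trials land on distinct eta values.
print(len({c["eta"] for c in configs}))
```
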
  17. Bayesian optimization

  18. SNOEK, LAROCHELLE, ADAMS (2012): "In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). […] We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks."
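
The GP-based loop the quote describes can be sketched end to end in plain Python: fit a Gaussian-process surrogate to the evaluations so far, pick the next point with an acquisition function (an upper-confidence bound here, rather than the expected improvement most tools default to), evaluate it, and repeat. Everything below is a toy under stated assumptions: a made-up 1-D objective standing in for cross-validated performance, a fixed RBF kernel, and no noise handling; real tools like mlrMBO handle all of this far more carefully.

```python
import math
import random

def objective(x):
    # Hypothetical validation score to maximize (stands in for CV accuracy).
    return math.exp(-((x - 0.3) ** 2) / 0.05) + 0.1 * math.sin(10 * x)

def kernel(a, b, ls=0.15):
    # Squared-exponential (RBF) kernel with length scale ls.
    return math.exp(-((a - b) ** 2) / (2 * ls ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting; fine for tiny systems.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x):
    # Zero-mean GP posterior mean and variance at x, with a small jitter.
    K = [[kernel(a, b) + (1e-6 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [kernel(a, x) for a in xs]
    alpha = solve(K, ys)           # K^{-1} y
    v = solve(K, k)                # K^{-1} k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(kernel(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mean, var

rng = random.Random(0)
xs = [rng.random() for _ in range(3)]   # initial random design
ys = [objective(x) for x in xs]
candidates = [i / 200 for i in range(201)]
for _ in range(10):                     # sequential BO iterations
    def ucb(x):                         # acquisition: mean + 2 * stddev
        m, v = gp_posterior(xs, ys, x)
        return m + 2 * math.sqrt(v)
    nxt = max(candidates, key=ucb)      # most promising point under the surrogate
    xs.append(nxt)
    ys.append(objective(nxt))
best = max(ys)
print(round(xs[ys.index(best)], 2), round(best, 3))
```
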
  19. https://cran.r-project.org/web/packages/mlrMBO/vignettes/mlrMBO.html

  20. https://cran.r-project.org/web/packages/mlrMBO/vignettes/mlrMBO.html

  21. https://cran.r-project.org/web/packages/mlrMBO/vignettes/mlrMBO.html

  22.

  23.

  24. Better

  25. Open Machine Learning bot

  26. https://www.openml.org/u/2702

  27.

  28. Random Forest

  29. PREDICT

      Dataset  ETA    NROUNDS  …  AUC
      1        0.27   903      …  0.84
      1        0.12   2841     …  0.92
      …        …      …        …  …

  30. PREDICT: How to find good values?

      Dataset  ETA    NROUNDS  …  AUC
      1        0.27   903      …  0.84
      1        0.12   2841     …  0.92
      2        …      …        …  …

  31. PREDICT

      Dataset  ETA    NROUNDS  …  AUC
      1        0.27   903      …  0.84
      1        0.12   2841     …  0.92
      2        0.05   1750     …  ?
      2        0.072  2411     …  ?
      2        …      …        …  ?

  32. PREDICT

      Dataset  ETA    NROUNDS  …  AUC
      1        0.27   903      …  0.84
      1        0.12   2841     …  0.92
      2        0.05   1750     …  0.97
      2        0.072  2411     …  0.91
      2        …      …        …  …

  33. PREDICT: Take the best!

      Dataset  ETA    NROUNDS  …  AUC
      1        0.27   903      …  0.84
      1        0.12   2841     …  0.92
      2        0.05   1750     …  0.97
      2        0.072  2411     …  0.91
      2        …      …        …  …
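
The loop on these slides (predict the AUC of unseen configurations from logged runs, then actually evaluate only the most promising one) can be sketched as follows. The talk trains a random forest surrogate on OpenML run data across datasets; to keep this sketch dependency-free, a k-nearest-neighbour average stands in for the forest, a single dataset is assumed, and all numbers are illustrative.

```python
# Logged runs from earlier experiments: (eta, nrounds, observed AUC).
past_runs = [
    (0.27, 903, 0.84),
    (0.12, 2841, 0.92),
    (0.05, 1750, 0.97),
    (0.072, 2411, 0.91),
]

def predict_auc(eta, nrounds, k=2):
    # Surrogate model: mean AUC of the k most similar past runs,
    # with features roughly rescaled before measuring distance.
    def dist(run):
        return ((run[0] - eta) / 0.3) ** 2 + ((run[1] - nrounds) / 3000) ** 2
    nearest = sorted(past_runs, key=dist)[:k]
    return sum(r[2] for r in nearest) / k

# Fresh candidate configurations whose AUC has never been measured.
candidates = [(0.06, 1800), (0.2, 1200), (0.1, 2600)]

# "Take the best": spend the expensive real evaluation only on the
# candidate the surrogate rates highest.
best = max(candidates, key=lambda c: predict_auc(*c))
print(best)  # (0.06, 1800)
```
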
  34. Random Forest

  35. Better

  36. Wrap up