ECML PKDD 2017

GDP Labs
October 25, 2017


Transcript

  1. None
  2. Towards True End-to-End Learning and Optimization
     michell.s.handaka [at] gdplabs.id
     GDP Labs Confidential
  3. None
  4. GDP Labs Confidential

  5. GDP Labs Confidential

  6. Deep Learning learns features from data GDP Labs Confidential

  7. Deep Learning learns features from data GDP Labs Confidential

  8. Deep Learning learns features from data GDP Labs Confidential

  9. Deep Learning end-to-end learning: joint optimization of a single loss
     function GDP Labs Confidential
  10. Deep Learning end-to-end learning: joint optimization of a single loss
      function GDP Labs Confidential
  11. Towards True End-to-End Learning and Optimization: Deep Learning + AutoML.
      In "end-to-end" deep learning, an expert still chooses the architecture &
      hyperparameters of the learning box; AutoML adds meta-level learning and
      optimization on top of it. GDP Labs Confidential
  12. Learning Box can be Any Machine Learning Pipeline (a sketch follows below):
      • data preprocessing
      • feature engineering
      • model selection
      • hyperparameter tuning
      • ensembles
      GDP Labs Confidential
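
    A minimal sketch of such a learning box, assuming scikit-learn; the pipeline
    stages, model, and search ranges below are illustrative choices, not from
    the talk:

      # Illustrative "learning box": a scikit-learn pipeline whose preprocessing,
      # feature engineering, and model hyperparameters form one joint search space.
      from scipy.stats import loguniform
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.svm import SVC
      from sklearn.model_selection import RandomizedSearchCV

      pipe = Pipeline([
          ("scale", StandardScaler()),   # data preprocessing
          ("pca", PCA()),                # feature engineering
          ("clf", SVC()),                # model (model selection would search this slot too)
      ])

      # Hyperparameters of every stage live in a single joint search space.
      space = {
          "pca__n_components": [2, 5, 10],
          "clf__C": loguniform(1e-3, 1e3),
          "clf__gamma": loguniform(1e-4, 1e1),
      }

      search = RandomizedSearchCV(pipe, space, n_iter=20, cv=3)
      # search.fit(X, y) then optimizes the whole pipeline as one black box.
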
  13. Bayesian Optimization: the optimizer proposes a configuration λ and
      observes the objective value f(λ) (a minimal loop is sketched below).
      GDP Labs Confidential
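
    A minimal sketch of this loop on a toy 1-D objective, assuming a Gaussian
    process surrogate (scikit-learn) and an expected-improvement acquisition;
    every specific choice below is illustrative:

      # Bayesian optimization loop: propose lambda, observe f(lambda),
      # refit the surrogate model of f, repeat.
      import numpy as np
      from scipy.stats import norm
      from sklearn.gaussian_process import GaussianProcessRegressor

      def f(lam):                                  # expensive black box (toy stand-in)
          return np.sin(3 * lam) + 0.1 * lam ** 2

      rng = np.random.default_rng(0)
      X = rng.uniform(-2, 2, size=(3, 1))          # initial random configurations
      y = np.array([f(x[0]) for x in X])

      for _ in range(20):
          gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
          cand = np.linspace(-2, 2, 200).reshape(-1, 1)
          mu, sigma = gp.predict(cand, return_std=True)
          best = y.min()
          z = (best - mu) / np.maximum(sigma, 1e-9)
          ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
          lam = cand[np.argmax(ei)]                # most promising next configuration
          X, y = np.vstack([X, lam]), np.append(y, f(lam[0]))

      print("best lambda: %.3f, f: %.3f" % (X[y.argmin()][0], y.min()))

    Each observation refines the surrogate, so evaluations concentrate where the
    model still expects improvement.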

  14. GDP Labs Confidential

  15. GDP Labs Confidential

  16. GDP Labs Confidential

  17. Beyond Black Box Bayesian Optimization
      [Thornton, Hutter, Hoos, Leyton-Brown, KDD 2013]
      [Bardenet et al., ICML 2013; Swersky et al., NIPS 2013;
       Feurer, Springenberg, Hutter, AAAI 2015]
      [Domhan, Springenberg, Hutter, IJCAI 2015]
      [Klein, Bartels, Falkner, Hennig, Hutter, AISTATS 2017]
      GDP Labs Confidential
  18. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks
      by Extrapolation of Learning Curves
      [Domhan, Springenberg, Hutter, IJCAI 2015] GDP Labs Confidential
  19. Typical Learning Curves for Iterative Training with SGD: Markov chain
      Monte Carlo is used to quantify model uncertainty. GDP Labs Confidential
  20. Predictive Termination: P is the predicted probability that the run will
      beat the best model found so far. If P < 5%, terminate; otherwise continue
      training (sketched below). GDP Labs Confidential
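
    A rough sketch of the idea; assumptions: accuracy curves, a single power-law
    family, and bootstrapped fits standing in for the paper's MCMC over a basket
    of parametric models:

      # Predictive termination sketch: extrapolate a partial learning curve and
      # stop training if P(final accuracy beats the best run so far) < 5%.
      import numpy as np
      from scipy.optimize import curve_fit

      def power_law(t, a, b, c):                  # one parametric curve family
          return a - b * t ** (-c)

      def prob_beats_best(acc, t_final, best, n_boot=200, seed=0):
          rng = np.random.default_rng(seed)
          t = np.arange(1, len(acc) + 1, dtype=float)
          acc = np.asarray(acc)
          wins = 0
          for _ in range(n_boot):                 # bootstrap in place of MCMC
              idx = rng.integers(0, len(acc), len(acc))
              try:
                  p, _ = curve_fit(power_law, t[idx], acc[idx],
                                   p0=(1.0, 1.0, 0.5), maxfev=2000)
                  wins += power_law(t_final, *p) > best
              except RuntimeError:                # failed fit counts as no win
                  pass
          return wins / n_boot

      acc = [0.51, 0.60, 0.66, 0.70, 0.72, 0.735]     # partial curve (toy data)
      P = prob_beats_best(acc, t_final=100, best=0.83)
      print("terminate" if P < 0.05 else "continue training", "(P = %.2f)" % P)
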
  21. Qualitative Analysis GDP Labs Confidential

  22. Quantitative Analysis: 2-fold speed-up of deep neural network structure &
      hyperparameter optimization. GDP Labs Confidential
  23. Fast Bayesian Optimization of Machine Learning Hyperparameters on Large
      Datasets [Klein, Bartels, Falkner, Hennig, Hutter, AISTATS 2017]
      GDP Labs Confidential
  24. Problem: training is very slow for large datasets.
      Approach: scale up from subsets of the data.
      E.g. for a Support Vector Machine, computational cost grows quadratically
      in dataset size s, while error shrinks smoothly with dataset size s.
      GDP Labs Confidential
  25. • automatically choose the dataset size for each evaluation
      • entropy search, based on a probability distribution over where the
        maximum lies
      • pick the configuration and dataset size pair that maximally decreases
        entropy per unit of time spent (see the sketch after this list)
      GDP Labs Confidential
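
    A heavily simplified sketch of that selection rule; assumptions: the
    surrogate's predictive variance stands in for the entropy of the
    distribution over the maximum, and cost is modeled as quadratic in the
    subset fraction s (as for SVM training):

      # Pick the (configuration, subset size) pair with the best ratio of
      # predicted information gain to predicted evaluation cost.
      import numpy as np

      def cost(s):
          return s ** 2                            # assumed quadratic cost model

      def surrogate_var(lam):                      # toy predictive variance:
          return np.exp(-(lam - 0.7) ** 2 / 0.02) + 0.05   # least certain near 0.7

      def gain(lam, s, s_max=1.0):
          # Assumption: a subset evaluation is a noisier proxy for the full-data
          # objective, so its usable information shrinks with s / s_max.
          return surrogate_var(lam) * (s / s_max)

      candidates = [(lam, s)
                    for lam in np.linspace(0.0, 1.0, 11)
                    for s in (0.1, 0.25, 0.5, 1.0)]
      lam, s = max(candidates, key=lambda c: gain(*c) / cost(c[1]))
      print("next: lambda = %.1f on %.0f%% of the data" % (lam, 100 * s))
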
  26. Quantitative Analysis:
      • 10 - 500 fold speed-up for optimizing SVMs
      • 5 - 10 fold speed-up for optimizing Convolutional Neural Networks
      GDP Labs Confidential
  27. Bayesian Optimization with Robust Bayesian Neural Networks
      [Springenberg, Klein, Falkner, Hutter, NIPS 2016] GDP Labs Confidential
  28. We can model f(λ, D) ✓, f(λ, t) ✓, and f(λ, s) ✓. Can we model
      f(λ, D, t, s)?
      • a lot of data points
      • expensive black-box evaluations
      • cheap incremental evaluations
      • a Gaussian process model will not scale
      Answer: Bayesian neural networks sampled with Stochastic Gradient
      Hamiltonian Monte Carlo (see the sketch after this list).
      GDP Labs Confidential
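
    For reference, a minimal sketch of a stochastic gradient Hamiltonian Monte
    Carlo update (in the style of Chen et al., ICML 2014); the toy target and
    step sizes are illustrative, not the paper's setup:

      # One SGHMC step: momentum-based sampling of parameters theta, robust to
      # noisy minibatch gradients thanks to the friction term.
      import numpy as np

      rng = np.random.default_rng(0)

      def sghmc_step(theta, v, grad, lr=0.01, friction=0.05):
          """grad is a (possibly noisy minibatch) gradient of the negative log
          posterior at theta; friction compensates for the gradient noise."""
          noise = rng.normal(0.0, np.sqrt(2 * friction * lr), size=theta.shape)
          v = (1 - friction) * v - lr * grad + noise
          return theta + v, v

      # Toy usage: approximately sample a 1-D standard normal,
      # i.e. negative log posterior U(x) = x^2 / 2 with gradient x.
      theta, v = np.array([3.0]), np.zeros(1)
      samples = []
      for _ in range(20000):
          theta, v = sghmc_step(theta, v, grad=theta)
          samples.append(theta[0])
      print("mean ~ %.2f, std ~ %.2f"
            % (np.mean(samples[2000:]), np.std(samples[2000:])))
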
  29. Empirical Evaluation against Scalable Bayesian Optimization Using Deep
      Neural Networks (DNGO) [Snoek et al., ICML 2015], a DNN with Bayesian
      linear regression in the last layer (sketched below):
      • both algorithms are effective; SGHMC is more robust
      • as good as Bayesian optimization with Gaussian processes, but much more
        flexible, e.g. reasoning over many related datasets
      GDP Labs Confidential
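
    A compact sketch of the DNGO-style surrogate: exact Bayesian linear
    regression on last-layer features. Assumption: a fixed random feature map
    stands in for the trained DNN's last hidden layer:

      # Bayesian linear regression on top of a feature map: cheap, exact
      # posterior, and predictive uncertainty for the acquisition function.
      import numpy as np

      rng = np.random.default_rng(0)
      W, b = rng.normal(size=(50, 1)), rng.uniform(0, 2 * np.pi, size=50)

      def features(X):                     # stand-in for the DNN's last hidden layer
          return np.cos(X @ W.T + b)

      def blr_posterior(Phi, y, alpha=1.0, beta=25.0):
          """Prior N(0, alpha^-1 I) on last-layer weights, Gaussian noise with
          precision beta; returns posterior mean and precision matrix."""
          A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
          return beta * np.linalg.solve(A, Phi.T @ y), A

      def predict(Xq, mean, A, beta=25.0):
          Phi = features(Xq)
          var = 1.0 / beta + np.einsum("ij,ij->i", Phi @ np.linalg.inv(A), Phi)
          return Phi @ mean, np.sqrt(var)  # predictive mean and std

      X = rng.uniform(-2, 2, size=(30, 1))
      y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, 30)
      mean, A = blr_posterior(features(X), y)
      mu, std = predict(np.array([[0.5]]), mean, A)
      print("prediction at 0.5: %.2f +/- %.2f" % (mu[0], std[0]))

    Unlike a full Gaussian process, this scales linearly in the number of
    observations (for a fixed feature dimension), which suits the many cheap
    evaluations above.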
  30. Conclusion
      • Bayesian optimization enables true end-to-end learning
      • large speed-ups by going beyond black-box optimization:
        • learning across datasets
        • learning curve extrapolation
        • dataset subsampling
      GDP Labs Confidential
  31. References
      • Domhan et al. Speeding up automatic hyperparameter optimization of deep
        neural networks by extrapolation of learning curves. IJCAI 2015.
      • Klein et al. Fast Bayesian optimization of machine learning
        hyperparameters on large datasets. AISTATS 2017.
      • Springenberg et al. Bayesian optimization with robust Bayesian neural
        networks. NIPS 2016.
      • Snoek et al. Scalable Bayesian optimization using deep neural networks.
        ICML 2015.
      • Hutter. Towards true end-to-end learning and optimization. ECML 2017.
      • Hutter. Black box hyperparameter optimization and AutoML. AutoML 2017.
      • Hutter. Beyond black box optimization. AutoML 2017.
      • http://www.ml4aad.org/
      • ecmlpkdd2017.automl.org/
      • http://ecmlpkdd2017.ijs.si/
      • https://www.extremetech.com/extreme/147940-google-self-driving-cars-in-3-5-years-feds-not-so-fast
      • http://www.techrepublic.com/article/apples-siri-the-smart-persons-guide/
      • https://www.youtube.com/watch?v=g-dKXOlsf98
      • http://aidev.co.kr/general/876?ckattempt=1
      GDP Labs Confidential