ECML PKDD 2017

GDP Labs
October 25, 2017

Transcript

  2. Towards True End-to-End
    Learning and Optimization
    michell.s.handaka [at] gdplabs.id

  6. Deep Learning
    learns features from data

  9. Deep Learning
    end-to-end learning: joint optimization of a single loss function
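
    The "single loss function" idea can be shown in miniature. Below is a
    minimal, hypothetical NumPy sketch (not from the talk): a two-layer network
    whose feature extractor and classifier are updated jointly by
    backpropagating one loss, instead of hand-engineering the features.

```python
# End-to-end learning in miniature: the feature extractor (W1) and the
# classifier (W2) are optimized jointly by backpropagating a single
# binary cross-entropy loss, instead of hand-crafting the features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                     # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)         # toy labels
W1 = rng.normal(size=(4, 8)) * 0.1                # feature extractor
W2 = rng.normal(size=(8,)) * 0.1                  # classifier

for _ in range(500):
    h = np.tanh(X @ W1)                           # learned features
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))           # predicted probabilities
    g = (p - y) / len(y)                          # dLoss/dlogits for BCE
    gW2 = h.T @ g                                 # gradient w.r.t. classifier
    gW1 = X.T @ ((g[:, None] * W2) * (1 - h**2))  # gradient w.r.t. extractor
    W1 -= 1.0 * gW1                               # one joint gradient step
    W2 -= 1.0 * gW2                               # updates both stages at once
print("train accuracy:", ((p > 0.5) == y).mean())
```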

  11. Towards True End-to-End Learning and Optimization
    [Diagram: in today's "end-to-end" deep learning, an expert still chooses
    the architecture & hyperparameters of the learning box; AutoML replaces
    this with meta-level learning and optimization over the box.]

  12. Learning Box can be Any Machine Learning Pipeline
    ● data preprocessing
    ● feature engineering
    ● model selection
    ● hyperparameter tuning
    ● ensembles
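
    To make the "learning box" concrete, here is a minimal sketch assuming
    scikit-learn; the names (build_pipeline, f) and the specific components are
    illustrative, not from the talk. The point is that every stage above is
    driven by one configuration λ, so an outer optimizer can tune them jointly.

```python
# Hypothetical "learning box": the whole pipeline is a function of one
# configuration (lambda), so an outer optimizer can tune everything jointly.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def build_pipeline(config):
    """Map one configuration to a full pipeline: preprocessing,
    feature engineering, model selection, and hyperparameters."""
    if config["model"] == "svm":
        model = SVC(C=config["C"], gamma=config["gamma"])
    else:                                            # tree-ensemble branch
        model = RandomForestClassifier(n_estimators=config["n_estimators"])
    return Pipeline([
        ("scale", StandardScaler()),                 # data preprocessing
        ("pca", PCA(n_components=config["pca_k"])),  # feature engineering
        ("model", model),                            # model selection
    ])

def f(config, X, y):
    """The black-box objective f(lambda): cross-validated error."""
    return 1.0 - cross_val_score(build_pipeline(config), X, y, cv=3).mean()
```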

  13. Bayesian Optimization
    [Diagram: Bayesian optimization proposes a configuration λ, the learning
    box is trained and evaluated to return its loss f(λ), and the loop repeats
    with an updated surrogate model.]
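
    A minimal sketch of that loop, assuming scikit-learn's Gaussian process
    surrogate and the Expected Improvement acquisition (one common choice; the
    slide does not prescribe an acquisition function). bayes_opt and the toy
    objective are illustrative names.

```python
# Minimal Bayesian optimization over a 1-D hyperparameter lambda: fit a
# Gaussian process to the observed (lambda, f(lambda)) pairs, then pick the
# next lambda by maximizing Expected Improvement over a candidate grid.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best):
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, bounds, n_init=3, n_iter=20):
    rng = np.random.default_rng(0)
    X = rng.uniform(*bounds, size=(n_init, 1))       # initial random lambdas
    y = np.array([f(x[0]) for x in X])               # expensive evaluations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                 # update the surrogate
        cand = np.linspace(*bounds, 1000).reshape(-1, 1)
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])                   # evaluate f(lambda) ...
        y = np.append(y, f(x_next[0]))               # ... and record the loss
    return X[np.argmin(y)], y.min()

# Toy objective standing in for validation error; optimum is at lambda = 0.3.
best_lam, best_err = bayes_opt(lambda lam: (lam - 0.3) ** 2, (0.0, 1.0))
print(best_lam, best_err)
```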


  17. Beyond Black Box Bayesian Optimization
    [Thornton, Hutter, Hoos, Leyton-Brown, KDD 2013]
    [Bardenet et al., ICML 2013; Swersky et al., NIPS 2013;
    Feurer, Springenberg, Hutter, AAAI 2015]
    [Domhan, Springenberg, Hutter, IJCAI 2015]
    [Klein, Bartels, Falkner, Hennig, Hutter, AISTATS 2017]

  18. Speeding Up Automatic Hyperparameter Optimization of
    Deep Neural Networks by Extrapolation of Learning Curves
    [Domhan, Springenberg, Hutter, IJCAI 2015]

  19. Typical Learning Curves for Iterative Training with SGD
    a parametric learning-curve model is fit with Markov Chain Monte Carlo
    to quantify model uncertainty

  20. Predictive Termination
    P = predicted probability that the run's final performance
    will exceed the best model found so far
    if P < 5%, terminate the run
    if P ≥ 5%, continue training
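
    A simplified sketch of this rule, with one loud caveat: the paper fits a
    weighted basket of parametric curve models with MCMC, while the sketch
    below bootstraps a single power-law model as a cheap stand-in for the
    predictive distribution. prob_exceeds_best and all constants are
    illustrative.

```python
# Simplified predictive termination: approximate the predictive distribution
# of the final accuracy by bootstrapping fits of a power-law learning curve,
# y(t) = a - b * t**(-c), to the partial curve observed so far.
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    return a - b * np.power(t, -c)

def prob_exceeds_best(epochs, accs, final_epoch, best_so_far, n_boot=200):
    """Estimate P(final accuracy > best_so_far) from a partial curve."""
    rng = np.random.default_rng(0)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(epochs), len(epochs))   # resample points
        try:
            params, _ = curve_fit(power_law, epochs[idx], accs[idx],
                                  p0=(1.0, 1.0, 0.5), maxfev=5000)
            preds.append(power_law(final_epoch, *params))
        except RuntimeError:
            continue                                      # fit failed, skip
    preds = np.array(preds)
    return (preds > best_so_far).mean() if len(preds) else 1.0

# Terminate a run early if it is unlikely (< 5%) to beat the incumbent.
epochs = np.arange(1, 21, dtype=float)
accs = 0.9 - 0.5 * epochs ** -0.7 \
       + np.random.default_rng(1).normal(0.0, 0.01, 20)   # noisy toy curve
if prob_exceeds_best(epochs, accs, final_epoch=300, best_so_far=0.92) < 0.05:
    print("terminate run")
else:
    print("continue training")
```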

  21. Qualitative Analysis

  22. Quantitative Analysis
    2-fold speed-up of deep neural network
    structure & hyperparameter optimization

  23. Fast Bayesian Optimization of
    Machine Learning Hyperparameters on Large Datasets
    [Klein, Bartels, Falkner, Hennig, Hutter, AISTATS 2017]

  24. problem: training is very slow on large datasets
    approach: scale up from subsets of the data
    e.g. Support Vector Machines:
    computational cost grows quadratically with dataset size s,
    while error shrinks smoothly as s grows

  25. ● automatically choose a dataset size for each evaluation
    ● entropy search: maintain a probability distribution over where the optimum lies
    ● pick the (configuration, dataset size) pair that maximally decreases
      entropy per unit time spent (see the simplified sketch below)
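
    The paper selects (configuration, dataset size) pairs by entropy search per
    unit time; as a cheaper, simplified stand-in, the sketch below ranks
    candidates by Expected Improvement on an error surrogate divided by a
    predicted cost, with the cost modeled in log space since it grows steeply
    with subset size s. next_point and all data here are illustrative.

```python
# Simplified stand-in for the selection rule above: rank candidate
# (lambda, s) pairs by Expected Improvement per unit of predicted cost.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ei(mu, sigma, best):
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def next_point(X, y, costs, cand):
    """X: observed (lambda, s) pairs; y: validation errors; costs: seconds."""
    err_gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    err_gp.fit(X, y)                                # surrogate for the error
    cost_gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    cost_gp.fit(X, np.log(costs))                   # cost grows steeply in s
    mu, sigma = err_gp.predict(cand, return_std=True)
    pred_cost = np.exp(cost_gp.predict(cand))
    return cand[np.argmax(ei(mu, sigma, y.min()) / pred_cost)]

# Usage: three observations of (lambda, subset fraction s) -> (error, cost).
X = np.array([[0.2, 0.10], [0.8, 0.25], [0.5, 0.50]])
y = np.array([0.30, 0.25, 0.18])
costs = np.array([1.0, 6.0, 25.0])
lams, fracs = np.meshgrid(np.linspace(0.0, 1.0, 25), [0.1, 0.25, 0.5, 1.0])
cand = np.column_stack([lams.ravel(), fracs.ravel()])
print(next_point(X, y, costs, cand))                # cheap, promising pair
```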

  26. Quantitative Analysis
    ● 10-500 fold speed-up for optimizing SVMs
    ● 5-10 fold speed-up for optimizing Convolutional Neural Networks

  27. Bayesian Optimization with
    Robust Bayesian Neural Networks
    [Springenberg, Klein, Falkner, Hutter, NIPS 2016]

  28. f(λ, D) ✓ f(λ, t) ✓ f(λ, s) ✓
    f(λ, D, t, s)?
    modeling all of these jointly means:
    ● a lot of data points
    ● expensive black-box evaluations
    ● cheap incremental evaluations
    ● a Gaussian Process model will not scale
    solution: a Bayesian neural network trained with
    Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)
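
    For intuition, a minimal SGHMC sampler (Chen et al., ICML 2014) on a toy
    1-D target is sketched below; the paper applies SGHMC to the weights of a
    Bayesian neural network surrogate, which this sketch does not reproduce.

```python
# Minimal SGHMC sketch on a toy 1-D target: a momentum update with friction
# plus injected Gaussian noise samples from exp(-U(theta)) for small steps.
import numpy as np

def sghmc(grad_U, theta0, n_steps=20000, eps=0.01, friction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta, v = theta0, 0.0
    samples = []
    for _ in range(n_steps):
        noise = rng.normal(0.0, np.sqrt(2.0 * friction * eps))
        v = v - eps * grad_U(theta) - friction * v + noise  # momentum update
        theta = theta + v                                   # position update
        samples.append(theta)
    return np.array(samples)

# Toy target: standard normal, U(theta) = theta**2 / 2, so grad_U = theta.
samples = sghmc(lambda th: th, theta0=0.0)
print(samples.mean(), samples.std())                # approx. 0 and 1
```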

  29. Empirical Evaluation
    compared against Scalable Bayesian Optimization Using
    Deep Neural Networks (DNGO) [Snoek et al., ICML 2015]:
    a DNN with Bayesian linear regression in the last layer
    ● both algorithms are effective
    ● SGHMC is more robust
    ● as good as Bayesian optimization with Gaussian Processes,
      but much more flexible, e.g. reasoning over many related datasets

  30. Conclusion
    ● Bayesian optimization enables true end-to-end learning
    ● large speed-ups by going beyond black-box optimization:
        ● learning across datasets
        ● learning curve extrapolation
        ● dataset subsampling

  31. References
    ● Domhan et al. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. IJCAI 2015.
    ● Klein et al. Fast Bayesian optimization of machine learning hyperparameters on large datasets. AISTATS 2017.
    ● Springenberg et al. Bayesian optimization with robust Bayesian neural networks. NIPS 2016.
    ● Snoek et al. Scalable Bayesian optimization using deep neural networks. ICML 2015.
    ● Hutter. Towards true end-to-end learning and optimization. ECML PKDD 2017.
    ● Hutter. Black box hyperparameter optimization and AutoML. AutoML 2017.
    ● Hutter. Beyond black box optimization. AutoML 2017.
    ● http://www.ml4aad.org/
    ● ecmlpkdd2017.automl.org/
    ● http://ecmlpkdd2017.ijs.si/
    ● https://www.extremetech.com/extreme/147940-google-self-driving-cars-in-3-5-years-feds-not-so-fast
    ● http://www.techrepublic.com/article/apples-siri-the-smart-persons-guide/
    ● https://www.youtube.com/watch?v=g-dKXOlsf98
    ● http://aidev.co.kr/general/876?ckattempt=1