Slide 1

No content

Slide 2

Towards True End-to-End Learning and Optimization
michell.s.handaka [at] gdplabs.id

Slide 3

No content

Slide 4

No content

Slide 5

No content

Slide 6

Deep Learning learns features from data

Slide 7

Deep Learning learns features from data

Slide 8

Deep Learning learns features from data

Slide 9

Deep Learning
End-to-end learning: joint optimization of a single loss function
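
A minimal sketch of what joint optimization of a single loss function looks like in code, assuming PyTorch is available; the toy architecture and data are illustrative, not from the talk. The feature extractor and the predictor are composed into one model, and a single backward pass updates both together.

```python
# End-to-end learning sketch: feature extractor and predictor are trained
# jointly by backpropagating one loss through the whole composition.
# Assumes PyTorch; architecture and data here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 10)
y = torch.sin(x.sum(dim=1, keepdim=True))        # toy regression target

model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),                # learned features
    nn.Linear(32, 1),                            # predictor on top of them
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)                  # the single loss function
    loss.backward()                              # gradients flow end to end
    optimizer.step()

print("final loss:", loss.item())
```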

Slide 10

Deep Learning
End-to-end learning: joint optimization of a single loss function

Slide 11

Towards True End-to-End Learning and Optimization
Today's deep learning is "end-to-end" only inside the learning box: an expert still chooses the architecture & hyperparameters. AutoML adds meta-level learning and optimization over the learning box itself.

Slide 12

Learning Box can be Any Machine Learning Pipeline
● data preprocessing
● feature engineering
● model selection
● hyperparameter tuning
● ensembles
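
A minimal sketch of such a learning box, assuming scikit-learn and SciPy are available; the pipeline steps, the hyperparameter ranges, and the use of random search in place of Bayesian optimization are illustrative choices, not the talk's setup.

```python
# A minimal "learning box": a scikit-learn pipeline whose preprocessing,
# model, and hyperparameters are all exposed to an outer optimizer.
# Assumes scikit-learn and SciPy; steps and ranges are illustrative.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_digits
from scipy.stats import loguniform

X, y = load_digits(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # data preprocessing
    ("pca", PCA()),                   # feature engineering
    ("clf", SVC()),                   # model
])

# Joint search space over preprocessing and model hyperparameters.
search_space = {
    "pca__n_components": [8, 16, 32, 64],
    "clf__C": loguniform(1e-3, 1e3),
    "clf__gamma": loguniform(1e-4, 1e0),
}

# Random search stands in here for the Bayesian optimizer discussed next.
search = RandomizedSearchCV(pipeline, search_space, n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```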

Slide 13

Bayesian Optimization
Bayesian optimization proposes a hyperparameter configuration λ, observes the black-box response f(λ), updates its surrogate model, and repeats.
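
A minimal sketch of this black-box loop with a plain NumPy Gaussian-process surrogate and an expected-improvement acquisition; the toy objective f(λ), the kernel, and all constants are assumptions for illustration.

```python
# Minimal Bayesian optimization loop: GP surrogate + expected improvement.
# Pure NumPy/SciPy sketch; the objective f and all constants are illustrative.
import numpy as np
from scipy.stats import norm

def f(lam):                        # expensive black-box objective (to minimize)
    return np.sin(3 * lam) + lam ** 2 - 0.7 * lam

def rbf_kernel(a, b, length=0.2):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, X_star, noise=1e-5):
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_star)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y
    var = 1.0 - np.sum(K_s * (K_inv @ K_s), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=3)            # initial random configurations λ
y = f(X)
candidates = np.linspace(0, 1, 200)

for _ in range(10):                      # Bayesian optimization iterations
    mu, sigma = gp_posterior(X, y, candidates)
    ei = expected_improvement(mu, sigma, y.min())
    lam_next = candidates[np.argmax(ei)]  # propose λ with maximal EI
    X = np.append(X, lam_next)
    y = np.append(y, f(lam_next))         # evaluate the black box f(λ)

print("best λ:", X[np.argmin(y)], "best f(λ):", y.min())
```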

Slide 14

No content

Slide 15

No content

Slide 16

No content

Slide 17

Beyond Black Box Bayesian Optimization [Thornton, Hutter, Hoos, Leyton-Brown, KDD 2013]
[Bardenet et al., ICML 2013; Swersky et al., NIPS 2013; Feurer, Springenberg, Hutter, AAAI 2015] [Domhan, Springenberg, Hutter, IJCAI 2015] [Klein, Bartels, Falkner, Hennig, Hutter, AISTATS 2017]

Slide 18

Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves [Domhan, Springenberg, Hutter, IJCAI 2015]

Slide 19

Typical Learning Curves for Iterative Training with SGD
Markov Chain Monte Carlo is used to quantify model uncertainty about how each curve will continue.

Slide 20

Predictive Termination: if P < 5%, terminate the run; if P ≥ 5%, continue training (P is the predicted probability that the run will beat the best result found so far).
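
A simplified sketch of this predictive-termination rule. Domhan et al. (2015) extrapolate with an MCMC-sampled ensemble of parametric curve models; the sketch below assumes SciPy and substitutes a single power-law fit with a crude residual-based uncertainty, so the 5% rule is only approximated.

```python
# Predictive termination sketch: extrapolate a partial learning curve and
# terminate if the chance of beating the current best is below 5%.
# Simplification of Domhan et al. (2015): one power-law curve fitted with
# scipy.optimize.curve_fit instead of an MCMC ensemble of parametric models.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def power_law(t, a, b, c):
    # Saturating power law: accuracy approaches c as epoch t grows.
    return c - a * t ** (-b)

def prob_exceeds_best(epochs, accuracies, best_so_far, horizon):
    params, _ = curve_fit(power_law, epochs, accuracies,
                          p0=(0.5, 0.5, accuracies[-1] + 0.05), maxfev=10000)
    pred = power_law(horizon, *params)
    # Crude uncertainty: standard deviation of the fit residuals.
    sigma = np.std(accuracies - power_law(epochs, *params)) + 1e-6
    return 1.0 - norm.cdf(best_so_far, loc=pred, scale=sigma)

# Partial learning curve of the current run (illustrative numbers).
epochs = np.arange(1, 11)
accuracies = 0.80 - 0.30 * epochs ** (-0.7) \
    + np.random.default_rng(0).normal(0, 0.005, 10)

p = prob_exceeds_best(epochs, accuracies, best_so_far=0.85, horizon=100)
print("P(final accuracy beats best) =", p)
print("terminate this run" if p < 0.05 else "continue training")
```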

Slide 21

Qualitative Analysis

Slide 22

Quantitative Analysis
2-fold speed-up of Deep Neural Network structure & hyperparameter optimization

Slide 23

Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets [Klein, Bartels, Falkner, Hennig, Hutter, AISTATS 2017]

Slide 24

Problem: training is very slow for large datasets.
Approach: scale up from subsets of the data.
Example: for a Support Vector Machine, computational cost grows quadratically in the dataset size s, while the error shrinks smoothly with s.

Slide 25

● automatically choose the dataset size for each evaluation
● entropy search based on a probability distribution over where the maximum lies
● pick the configuration and dataset size pair that maximally decreases entropy per unit of time spent
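
A small illustration of evaluating one configuration on growing subsets of the data, assuming scikit-learn is available; it shows only the cost/error trade-off that motivates the method, not the entropy-search procedure itself, and the dataset and configuration are illustrative.

```python
# Illustration of the subsampling idea: train one SVM configuration on
# growing subsets and record wall-clock cost and validation error per size.
# Assumes scikit-learn; not the entropy-search procedure of the paper.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=8000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)

config = {"C": 1.0, "gamma": 0.05}                  # one candidate configuration λ
for s in [500, 1000, 2000, 4000, len(X_train)]:     # dataset sizes s
    start = time.time()
    model = SVC(**config).fit(X_train[:s], y_train[:s])
    err = 1.0 - model.score(X_val, y_val)
    print(f"s={s:5d}  cost={time.time() - start:6.2f}s  val error={err:.3f}")

# Cost grows roughly quadratically in s while validation error shrinks
# smoothly, so cheap small-s evaluations are informative about large-s ones.
```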

Slide 26

Quantitative Analysis
● 10-500 fold speed-up for optimizing SVMs
● 5-10 fold speed-up for optimizing Convolutional Neural Networks

Slide 27

Bayesian Optimization with Robust Bayesian Neural Networks [Springenberg, Klein, Falkner, Hutter, NIPS 2016]

Slide 28

Modelling f(λ, D) ✓, f(λ, t) ✓, and f(λ, s) ✓ works; can we model f(λ, D, t, s) jointly?
● a lot of data points
● expensive black-box evaluations
● cheap incremental evaluations
● a Gaussian Process model will not scale
→ use a Bayesian neural network surrogate trained with Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)
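
A minimal sketch of the SGHMC update (Chen et al., 2014) that underlies the Bayesian neural network surrogate; here it samples a toy one-dimensional Gaussian posterior rather than network weights, and the step size and friction constants are illustrative.

```python
# Minimal SGHMC sampler (Chen et al., 2014) on a toy 1-D Gaussian posterior.
# The surrogate in the paper applies this update to neural network weights;
# the target, step size, and friction below are illustrative constants.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad_U(theta):
    # Gradient of the negative log-density of N(0, 1), plus noise that
    # mimics using a minibatch instead of the full dataset.
    return theta + rng.normal(0.0, 0.1)

eta, alpha = 1e-2, 0.1          # step size and friction
theta, v = 5.0, 0.0             # start far from the posterior mode
samples = []
for step in range(20000):
    v = (v - eta * stochastic_grad_U(theta) - alpha * v
         + rng.normal(0.0, np.sqrt(2 * alpha * eta)))
    theta = theta + v
    if step > 5000:             # discard burn-in
        samples.append(theta)

print("posterior mean ~", np.mean(samples), "std ~", np.std(samples))
```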

Slide 29

Empirical Evaluation
Compared against Scalable Bayesian Optimization Using Deep Neural Networks (DNGO) [Snoek et al., ICML 2015], a DNN with Bayesian linear regression in its last layer.
Both algorithms are effective; SGHMC is more robust.
As good as Bayesian optimization with Gaussian Processes, but much more flexible, e.g. reasoning over many related datasets.
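
For contrast with the SGHMC surrogate, a minimal sketch of the DNGO idea of Bayesian linear regression on a fixed last layer; random tanh features stand in for a trained network's last hidden layer, and the prior and noise precisions are illustrative assumptions.

```python
# Sketch of DNGO's last-layer idea: Bayesian linear regression on fixed
# basis functions. Random tanh features stand in for a trained DNN's last
# hidden layer; alpha (weight prior) and beta (noise) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def basis(x, W, b):
    # Fixed nonlinear features phi(x); in DNGO these come from a trained DNN.
    return np.tanh(x[:, None] * W[None, :] + b[None, :])

W, b = rng.normal(0, 3, 50), rng.uniform(-3, 3, 50)
x_train = rng.uniform(0, 1, 20)
y_train = np.sin(6 * x_train) + rng.normal(0, 0.05, 20)   # toy objective values

alpha, beta = 1.0, 400.0                     # prior precision, noise precision
Phi = basis(x_train, W, b)
A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
mean_w = beta * np.linalg.solve(A, Phi.T @ y_train)

# Predictive mean and variance at a few test points.
x_test = np.linspace(0, 1, 5)
Phi_test = basis(x_test, W, b)
mu = Phi_test @ mean_w
var = 1.0 / beta + np.sum(Phi_test * np.linalg.solve(A, Phi_test.T).T, axis=1)
print(np.round(mu, 2), np.round(np.sqrt(var), 3))
```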

Slide 30

Conclusion
● Bayesian optimization enables true end-to-end learning
● large speed-ups by going beyond black-box optimization
● learning across datasets
● learning curve extrapolation
● dataset subsampling

Slide 31

References
● Domhan et al. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. IJCAI 2015.
● Klein et al. Fast Bayesian optimization of machine learning hyperparameters on large datasets. AISTATS 2017.
● Springenberg et al. Bayesian optimization with robust Bayesian neural networks. NIPS 2016.
● Snoek et al. Scalable Bayesian optimization using deep neural networks. ICML 2015.
● Hutter. Towards true end-to-end learning and optimization. ECML 2017.
● Hutter. Black box hyperparameter optimization and AutoML. AutoML 2017.
● Hutter. Beyond black box optimization. AutoML 2017.
● http://www.ml4aad.org/
● ecmlpkdd2017.automl.org/
● http://ecmlpkdd2017.ijs.si/
● https://www.extremetech.com/extreme/147940-google-self-driving-cars-in-3-5-years-feds-not-so-fast
● http://www.techrepublic.com/article/apples-siri-the-smart-persons-guide/
● https://www.youtube.com/watch?v=g-dKXOlsf98
● http://aidev.co.kr/general/876?ckattempt=1