Automating Machine Learning
Andreas Mueller
July 15, 2016
Transcript
Andreas Mueller (NYU Center for Data Science, scikit-learn)
Automatic Machine Learning?
Why?
Issues with current tools (scikit-learn)
Flow chart / selecting model
Selecting Hyper-Parameters
Scikit-learn: Explicit is better than implicit
    make_pipeline(OneHotEncoder(), Imputer(), StandardScaler(), SVC())
What?
    from automl import AutoClassifier
    clf = AutoClassifier().fit(X_train, y_train)
    > Current Accuracy: 70% (AUC .65) LinearSVC(C=1), 10sec
    > Current Accuracy: 76% (AUC .71) RandomForest(n_estimators=20) 30sec
    > Current Accuracy: 80% (AUC .74) RandomForest(n_estimators=500) 30sec
Step 1: Automate Parameter Selection
Step 2: Automate Model Selection
Step 3: Automate Pipeline Selection
How?
Formalizing the Search Space
• Discrete and Continuous Parameters
• Conditional Parameters
Fixed pipeline vs flexible pipeline
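
A minimal sketch of what a formalized search space with discrete, continuous and conditional parameters could look like. The parameter names and ranges are illustrative assumptions, not taken from the talk: `gamma` only exists when the SVC kernel is `rbf`, and the forest parameters only exist when the forest is chosen.

```python
import math
import random


def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_configuration(rng):
    """Sample one configuration, drawing only the parameters that are active."""
    # Discrete choice at the top of the hierarchy (hypothetical model names).
    config = {"classifier": rng.choice(["svc", "random_forest"])}
    if config["classifier"] == "svc":
        config["C"] = log_uniform(rng, 1e-3, 1e3)           # continuous
        config["kernel"] = rng.choice(["linear", "rbf"])    # discrete
        if config["kernel"] == "rbf":
            # Conditional parameter: only meaningful for the rbf kernel.
            config["gamma"] = log_uniform(rng, 1e-4, 1e1)
    else:
        config["n_estimators"] = rng.randint(10, 500)       # discrete integer
        config["max_features"] = rng.uniform(0.1, 1.0)      # continuous
    return config


rng = random.Random(0)
for _ in range(5):
    print(sample_configuration(rng))
```

Keeping inactive parameters out of the configuration means a search method never spends evaluations on settings that cannot change the result.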
Search Methods
Exhaustive Search (Grid Search)
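
As a rough illustration of exhaustive search, the sketch below evaluates every combination on a small grid. The objective `validation_error` is a toy stand-in for a cross-validated error (an assumption for the example, not a real model), and the grid values are arbitrary.

```python
import math
from itertools import product


def validation_error(C, gamma):
    """Toy stand-in for a cross-validated error; lowest at C=10, gamma=0.1."""
    return (math.log10(C) - 1) ** 2 + (math.log10(gamma) + 1) ** 2


param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}


def grid_search(objective, grid):
    """Evaluate the objective on the full Cartesian product of the grid."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    n_evaluations = 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        n_evaluations += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluations


best_params, best_score, n_evaluations = grid_search(validation_error, param_grid)
print(best_params, best_score, n_evaluations)
```

The number of evaluations is the product of the grid sizes, so the cost grows exponentially with the number of parameters.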
Randomized Search
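
A minimal sketch of randomized search under the same kind of assumptions: `validation_error` is a toy stand-in for a cross-validated error, and the log-uniform ranges are chosen only for illustration. Instead of a fixed grid, each trial draws every parameter independently from its distribution.

```python
import math
import random


def validation_error(C, gamma):
    """Toy stand-in for a cross-validated error; lowest at C=10, gamma=0.1."""
    return (math.log10(C) - 1) ** 2 + (math.log10(gamma) + 1) ** 2


def random_search(objective, n_iter, seed=0):
    """Sample n_iter configurations and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {
            "C": 10 ** rng.uniform(-3, 3),       # log-uniform on [1e-3, 1e3]
            "gamma": 10 ** rng.uniform(-4, 1),   # log-uniform on [1e-4, 1e1]
        }
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(validation_error, n_iter=64)
print(best_params, best_score)
```

The budget `n_iter` is set directly and does not depend on how many parameters are searched, which is one practical difference from a grid.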
Bayesian Optimization (SMBO)
Gaussian Processes
Random Forest Based (SMAC)
Non-parametric (TPE)
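
To make the sequential model-based loop concrete, here is a heavily simplified one-dimensional sketch in the spirit of TPE. It is not the Hyperopt implementation: the fixed kernel bandwidth, the 25% split and the toy `objective` are all assumptions made for the example. Each round splits past observations into "good" and "bad", estimates a density for each, and evaluates the candidate where the good density is highest relative to the bad one.

```python
import math
import random


def objective(x):
    """Toy stand-in for a validation error; lowest at x = 2."""
    return (x - 2.0) ** 2


def kde(x, points, bandwidth):
    """Gaussian kernel density estimate at x built from the given points."""
    if not points:
        return 0.0
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))


def tpe_minimize(f, low, high, n_init=5, n_iter=25, n_candidates=20,
                 good_fraction=0.25, seed=0):
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0
    # Start from a few random evaluations.
    history = [(x, f(x)) for x in (rng.uniform(low, high) for _ in range(n_init))]
    for _ in range(n_iter):
        history.sort(key=lambda pair: pair[1])
        n_good = max(1, int(good_fraction * len(history)))
        good = [x for x, _ in history[:n_good]]
        bad = [x for x, _ in history[n_good:]]
        # Draw candidates from the "good" density l(x), clipped to the bounds.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]
        # Keep the candidate with the largest ratio l(x) / g(x).
        chosen = max(
            candidates,
            key=lambda x: kde(x, good, bandwidth) / (kde(x, bad, bandwidth) + 1e-12),
        )
        history.append((chosen, f(chosen)))
    return min(history, key=lambda pair: pair[1])


best_x, best_score = tpe_minimize(objective, -5.0, 5.0)
print(best_x, best_score)
```

The same loop structure applies to the Gaussian process and random forest variants; only the surrogate model and the rule for picking the next point change.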
Warm-starting and Meta-learning
Meta-Learning
[Diagram, built up over four slides: for each of Dataset 1, Dataset 2 and Dataset 3, an optimization run yields Algorithm + Parameters; Meta-Features 1, 2 and 3 of those datasets, together with the results, feed an ML model; for a New Dataset, the ML model outputs Algorithm + Parameters.]
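
One simple way to realize this idea is a nearest-neighbor lookup on meta-features, sketched below. The stored runs, the three meta-features and the configurations are made-up illustrations, and the nearest-neighbor rule stands in for the "ML model" in the diagram; the suggestions would then seed an optimizer rather than replace it.

```python
import math

# Hypothetical record of earlier optimization runs: for each previously seen
# dataset, its meta-features and the best configuration found there.
previous_runs = [
    {"meta": {"n_samples": 150, "n_features": 4, "n_classes": 3},
     "best_config": {"classifier": "svc", "C": 1.0}},
    {"meta": {"n_samples": 60000, "n_features": 784, "n_classes": 10},
     "best_config": {"classifier": "random_forest", "n_estimators": 500}},
    {"meta": {"n_samples": 1000, "n_features": 20, "n_classes": 2},
     "best_config": {"classifier": "svc", "C": 10.0}},
]


def meta_feature_vector(meta):
    """Log-scale the counts so large datasets do not dominate the distance."""
    return [math.log10(meta[key]) for key in ("n_samples", "n_features", "n_classes")]


def distance(a, b):
    """Euclidean distance between two meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_initial_configs(new_meta, runs, k=2):
    """Return the best configs of the k most similar earlier datasets."""
    target = meta_feature_vector(new_meta)
    ranked = sorted(
        runs, key=lambda run: distance(target, meta_feature_vector(run["meta"]))
    )
    return [run["best_config"] for run in ranked[:k]]


new_dataset_meta = {"n_samples": 200, "n_features": 5, "n_classes": 3}
suggestions = suggest_initial_configs(new_dataset_meta, previous_runs)
print(suggestions)
```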
Meta-Features
Existing Approaches
auto-sklearn (Hutter, Feurer, Eggensperger) http://automl.github.io/auto-sklearn/stable/
Auto-WEKA
Hyperopt-sklearn
TPOT
Spearmint https://github.com/HIPS/Spearmint
Scikit-optimize
Within Scikit-learn
• GridSearchCV
• RandomizedSearchCV
• BayesianSearchCV (coming)
• Searching over Pipelines (coming)
• Built-in parameter ranges (coming)
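
For the two search classes that already exist, a short usage sketch follows. The dataset, parameter ranges and budget are arbitrary choices for illustration, and the example assumes a reasonably recent scikit-learn and SciPy (`scipy.stats.loguniform`). It tunes the hyperparameters of a fixed pipeline, not the pipeline structure itself.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC())

# make_pipeline names each step after its lowercased class name, so the SVC
# parameters are addressed as "svc__<parameter>".
grid = GridSearchCV(
    pipe,
    param_grid={"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X_train, y_train)

# Same pipeline, but sampling 20 configurations from continuous distributions.
rand = RandomizedSearchCV(
    pipe,
    param_distributions={
        "svc__C": loguniform(1e-2, 1e3),
        "svc__gamma": loguniform(1e-4, 1e1),
    },
    n_iter=20,
    cv=5,
    random_state=0,
)
rand.fit(X_train, y_train)

print(grid.best_params_, grid.score(X_test, y_test))
print(rand.best_params_, rand.score(X_test, y_test))
```

Because the scaler sits inside the pipeline, it is refit on each training fold, so the cross-validated scores are not contaminated by the held-out fold.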
TODO
Clean separation of:
• Model Search Space
• Pipeline Search Space
• Optimization Method
• Meta-Learning
• Exploit prior knowledge better!
• Usability
• Runtime consideration
• Data subsampling
Criticism
Randomized Search works well
Do we need 100 Classifiers? Do we need Complex pipelines?
I don’t want a black-box!
http://oreilly.com/pub/get/scipy
Material
• Random Search for Hyper-Parameter Optimization (Bergstra, Bengio)
• Efficient and Robust Automated Machine Learning (Feurer et al.) [autosklearn] http://automl.github.io/auto-sklearn/stable/
• Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits (Li et al.) [hyperband] https://arxiv.org/abs/1603.06560
• Scalable Bayesian Optimization Using Deep Neural Networks (Snoek et al.)
@amuellerml @amueller
[email protected]
http://amueller.io Thank you.