PyCon India 2016 keynote
Commodity Machine Learning
Past, present and future
Andreas Mueller
What is machine learning?
Automatic Decision Making
Spam? Yes / No
Spam? Yes / No
Programming
Machine Learning
Machine learning is EVERYWHERE
Science, Engineering, Medicine, ...
Commodity machine learning
past
+
dawn of open source tools...
The age of shell
Documentation? Testing?
Scikit-learn: User centric machine learning
.fit(X, y)
.predict(X)
.transform(X)
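A minimal sketch of the three methods named on this slide. The toy data and the choice of `StandardScaler` and `LogisticRegression` are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): four samples, two features, binary labels.
X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
y = np.array([0, 0, 1, 1])

# A transformer learns statistics in fit() and applies them in transform().
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# A predictor learns from (X, y) in fit() and returns labels from predict().
clf = LogisticRegression().fit(X_scaled, y)
y_pred = clf.predict(X_scaled)
print(y_pred)
```

Every estimator follows the same calling convention, which is what lets them be swapped and combined freely.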
present
Choose your ecosystem.
Open! Documented! Tested!
Usability is key!
ML Frameworks
PyMC, Edward, Stan
theano, tensorflow, keras
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
github.com/scikit-learn-contrib/scikit-learn-contrib
(near) Future
0.18
for the release candidate:
pip install scikit-learn==0.18rc2
sklearn.cross_validation, sklearn.grid_search, sklearn.learning_curve → sklearn.model_selection
results = pd.DataFrame(grid_search.cv_results_)
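A hedged sketch of loading grid-search results into a DataFrame. In released scikit-learn versions the attribute is named `cv_results_`. The iris dataset, the `SVC` estimator and the `C` grid are assumptions chosen only to make the example runnable.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical search space: three values of the SVM regularization parameter.
param_grid = {"C": [0.1, 1, 10]}
grid_search = GridSearchCV(SVC(), param_grid, cv=3)
grid_search.fit(X, y)

# cv_results_ is a dict of arrays, one entry per parameter setting,
# so it converts directly into a table with one row per candidate.
results = pd.DataFrame(grid_search.cv_results_)
print(results[["param_C", "mean_test_score", "rank_test_score"]])
```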
labels → groups
n_folds → n_splits
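Both renames can be seen together in a group-aware splitter. This is a small sketch using `GroupKFold`; the data and group assignments are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: six samples belonging to three groups (e.g. three patients).
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

# n_splits replaces the old n_folds; groups replaces the old labels argument.
cv = GroupKFold(n_splits=3)
splits = list(cv.split(X, y, groups=groups))

for train, test in splits:
    # Samples from one group never appear on both sides of a split.
    print(sorted(set(groups[train])), sorted(set(groups[test])))
```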
from sklearn.cross_validation import KFold
cv = KFold(n_samples, n_folds)
for train, test in cv:
    ...

from sklearn.model_selection import KFold
cv = KFold(n_folds)
for train, test in cv.split(X, y):
    ...
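A runnable version of the new-style loop, as a sketch. The array shapes are assumptions; the point is that the splitter no longer needs the number of samples at construction time and receives the data in `split()` instead.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical data: ten samples with two features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# The splitter is data-independent; only the number of splits is fixed here.
cv = KFold(n_splits=5)

fold_sizes = []
for train, test in cv.split(X, y):
    # train and test are integer index arrays into X and y.
    fold_sizes.append((len(train), len(test)))

print(fold_sizes)
```

Because the splitter does not depend on the data, the same `cv` object can be passed to several searches or reused across datasets of different sizes.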
from sklearn.mixture import GaussianMixture
from sklearn.mixture import BayesianGaussianMixture
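A brief sketch of the two mixture classes in use. The two-blob synthetic data and the component counts are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

# Hypothetical data: two well-separated 2-D blobs of 100 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-5, 1, size=(100, 2)),
    rng.normal(5, 1, size=(100, 2)),
])

# Classical EM-fitted mixture with a fixed number of components.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# Variational Bayesian mixture: n_components is an upper bound, and the
# fitted weights indicate how much each component is actually used.
bgmm = BayesianGaussianMixture(n_components=5, max_iter=500, random_state=0).fit(X)
print(np.round(bgmm.weights_, 3))
```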
PCA() + RandomizedPCA() → PCA()
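With the two classes merged, the randomized algorithm is selected through the `svd_solver` parameter of `PCA`. A minimal sketch, with random data as a stand-in for a real dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 200 samples with 20 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 20))

# svd_solver="randomized" covers what RandomizedPCA used to do;
# the default "auto" picks a solver based on the data shape.
pca = PCA(n_components=5, svd_solver="randomized", random_state=0)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```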
Gaussian Process Rewrite
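The rewritten module exposes `GaussianProcessRegressor` with composable kernel objects. The following is a sketch only: the noisy sine data and the particular kernel (RBF plus white noise) are assumptions, not something shown in the talk.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical data: a noisy sine curve sampled at 30 points.
rng = np.random.RandomState(0)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=30)

# Kernels are objects that can be added and multiplied; their
# hyperparameters are optimized during fit().
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# Predictions come with a per-point standard deviation.
y_mean, y_std = gpr.predict(X, return_std=True)
print(gpr.kernel_)
```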
Isolation Forests
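A short sketch of outlier detection with `IsolationForest`. The synthetic inliers and the two hand-placed outliers are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 200 standard-normal inliers plus two distant points.
rng = np.random.RandomState(0)
X_inliers = rng.normal(size=(200, 2))
X_outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])
X = np.vstack([X_inliers, X_outliers])

forest = IsolationForest(random_state=0).fit(X)

# predict() returns +1 for inliers and -1 for outliers;
# decision_function() gives a score where lower means more anomalous.
pred = forest.predict(X)
scores = forest.decision_function(X)
print(scores[-2:])
```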
Play
from sklearn.neural_network import MLPClassifier
Work
import keras
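For the "play" side, a small sketch of `MLPClassifier` on a toy dataset. The dataset, layer size and iteration limit are assumptions chosen so that the example trains quickly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Neural networks are sensitive to feature scale, so standardize first.
scaler = StandardScaler().fit(X_train)

# One hidden layer of 10 units (hypothetical choice for a small dataset).
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)

accuracy = mlp.score(scaler.transform(X_test), y_test)
print(accuracy)
```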
pipe = Pipeline([('preprocessing', StandardScaler()),
                 ('classifier', SVC())])
param_grid = {'preprocessing': [StandardScaler(), None]}
grid = GridSearchCV(pipe, param_grid)
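The slide's snippet becomes runnable once the imports and some data are added. This is a sketch; the iris dataset and `cv=3` are assumptions. The grid searches over the pipeline step itself, comparing scaling against no preprocessing at all.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("preprocessing", StandardScaler()),
                 ("classifier", SVC())])

# The step name is the parameter: each candidate replaces the whole step.
# None disables the step, so the classifier sees the raw features.
param_grid = {"preprocessing": [StandardScaler(), None]}

grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```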
(further) Future
Feature / Column names
from __future__ import sklearn.plotting
from __future__ import AutoClassifier
More Transparency
amueller.github.io
@amuellerml
@amueller
[email protected]