
Composing Testable and Robust Machine Learning Pipelines

Holger Peters
October 14, 2015


This talk is about Scikit-Learn's Transformer and Estimator interfaces. It explains how machine learning models can be formulated using these interfaces and how this makes them easier to test. The overall aim is to build ML models from composable building blocks.


Transcript

  1. Composing Testable and Robust Machine Learning Pipelines
     Holger Peters, Data Scientist and Software Developer
     @data_hope
     Budapest BI Forum 2015
     http://www.holger-peters.de
     Slides: https://speakerdeck.com/holgerpeters
  2. Supervised Machine Learning: The Problem
     Given a history of data: historic features X, historic target y.
     Estimate for the target: prediction ŷ based on new features X'.
     Scikit-Learn: • Open Source • ML algorithms • "Plumbing" code • Python
  3. Supervised Machine Learning

              Feature 1   Feature 2   Target
     1            8           1          4
     2           11           1          1
     3           17           5          4
     4           18           4          6
     ...
     34123       21           7          ?
     34124       25           0          ?
     34125       15           4          ?
     34126       15           1          ?

     The feature columns form X; the entries marked ? are the target values to be estimated, ŷ.
  4. Supervised Machine Learning
     Training/Fit: given a history of data, historic features X and historic target y.
     Estimation/Predict: prediction ŷ for the target, based on new features X'.
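     (A minimal sketch of the fit/predict contract described on this slide; the LinearRegression estimator and the toy data are illustrative, not from the talk.)

     import numpy as np
     from sklearn.linear_model import LinearRegression

     X = np.array([[1.0], [2.0], [3.0]])   # historic features X
     y = np.array([2.0, 4.0, 6.0])         # historic target y

     model = LinearRegression().fit(X, y)  # Training/Fit
     X_new = np.array([[4.0]])             # new features X'
     y_hat = model.predict(X_new)          # Estimation/Predict -> ŷ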
  5. Substructure of ML Model
     Data Cleanup → Feature building → ML algorithm.
     Data cleanup and feature building are preprocessing for the estimator; only the ML algorithm itself is the "predictive" part. Most steps involve transforming the feature matrix X.
  6. Structuring Predictive Models
     Data Cleanup and Feature building become Transformer 1, Transformer 2, Transformer 3, ..., Transformer n: preprocessing for estimators (StandardScaler, Imputer, LabelEncoder, etc.).
     The ML algorithm becomes the Estimator (Support Vector Machine, Gradient Boosting, Random Forest, etc.).
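     (A hedged sketch of this structure, chaining transformers by hand before the estimator; the particular steps and data are illustrative, using the 2015-era scikit-learn imports to match the talk's vintage.)

     from sklearn.datasets import load_digits
     from sklearn.cross_validation import train_test_split
     from sklearn.preprocessing import Imputer, StandardScaler
     from sklearn.svm import LinearSVC

     digits = load_digits()
     X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)

     imputer, scaler, svc = Imputer(), StandardScaler(), LinearSVC()

     Xt = imputer.fit_transform(X_train)   # Transformer 1
     Xt = scaler.fit_transform(Xt)         # Transformer 2
     svc.fit(Xt, y_train)                  # Estimator

     # Prediction replays the fitted transformations with transform():
     y_pred = svc.predict(scaler.transform(imputer.transform(X_test)))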
  7. Pipelines: Sequential Transformations
     Transformer 1 → Transformer 2 → ... → Transformer n → Estimator.
     We compose transformers and estimators with a pipeline. A pipeline is itself an estimator (a meta-estimator), an instance of the composite pattern. Transformers and the estimator can be tested independently.
  8. Example

     pipe = Pipeline([('pca', PCA(n_components=20)),
                      ('scaler', StandardScaler()),
                      ('svc', LinearSVC())])
     pipe.fit(X_train, y_train)
     y_pred = pipe.predict(X_test)
     score = mean_absolute_error(y_test, y_pred)
  9. Intermediate Summary
     Assemble models using Transformers and Estimators. Write preprocessing using Transformers. Small building blocks make testing easier. Decoupling of ML algorithm, preprocessing, and meta-logic.
  10. Multi-classification Problem
      Problem: Our algorithms are all about binary classification.
      Approach: Turn several binary classifications into a multi-classification.
      Example: Recognise Written Digits.
      [Figure: grids of handwritten digit images illustrating the pairwise and one-vs-rest class splits]
      45 trainings with one-vs-one (one per pair of classes, 10·9/2 = 45); 10 trainings with one-vs-rest (one per class).
  11. Multi-classification Problem
      Problem: Our algorithms are all about binary classification.
      Approach: Turn several binary classifications into a multi-classification.
      Example: Recognise Written Digits.
  12.
      from sklearn.pipeline import Pipeline
      from sklearn.cross_validation import train_test_split
      from sklearn.decomposition import PCA
      from sklearn.svm import LinearSVC, SVC
      from sklearn.preprocessing import StandardScaler
      from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
      from sklearn.datasets import load_digits

      digits = load_digits()
      X, y = digits.data, digits.target
      X_train, X_test, y_train, y_test = train_test_split(X, y)

      pipe = Pipeline([('pca', PCA(n_components=20)),
                       ('scaler', StandardScaler()),
                       ('svc', LinearSVC())])

      # The same pipeline is wrapped in two different multiclass meta-estimators.
      one_vs_one = OneVsOneClassifier(pipe).fit(X_train, y_train)
      score_one_vs_one = one_vs_one.score(X_test, y_test)

      one_vs_rest = OneVsRestClassifier(pipe).fit(X_train, y_train)
      score_one_vs_rest = one_vs_rest.score(X_test, y_test)

      # Score of one_vs_one: 0.97
      # Score of one_vs_rest: 0.94
  13. Learnings
      Use Meta-Estimators to build upon other models. Write logic that works with any estimator/transformer. Be able to exchange the inner model as needed.
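      (A minimal sketch of "logic that works with any estimator"; the helper name is hypothetical, not from the talk. Because every scikit-learn estimator exposes the same fit/score interface, meta-logic can be written once and the inner model swapped freely.)

      def fit_and_score(estimator, X_train, y_train, X_test, y_test):
          # Works for any scikit-learn estimator, including pipelines and
          # meta-estimators, because they all share fit() and score().
          estimator.fit(X_train, y_train)
          return estimator.score(X_test, y_test)

      # fit_and_score(OneVsOneClassifier(pipe), X_train, y_train, X_test, y_test)
      # fit_and_score(OneVsRestClassifier(pipe), X_train, y_train, X_test, y_test)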
  14. A Rough Reimplementation of StandardScaler

      import numpy as np
      from sklearn.base import BaseEstimator, TransformerMixin

      class Scaler(BaseEstimator, TransformerMixin):
          def fit(self, X, y=None):
              # Learn per-column mean and standard deviation from training data.
              self.mean_ = np.mean(X, axis=0)
              self.std_ = np.std(X, axis=0)
              return self

          def transform(self, X):
              X = np.asarray(X, dtype=float).copy()
              X -= self.mean_
              X /= self.std_   # note: divides by zero for a constant column
              return X

      def test_scaler_noop():
          X = np.c_[[-1, 1]]
          s = Scaler()
          Xt = s.fit_transform(X)
          assert Xt is not X
          np.testing.assert_allclose(Xt, np.c_[[-1, 1]])
          np.testing.assert_allclose(np.mean(Xt, axis=0), 0., atol=1e-10)
          np.testing.assert_allclose(np.std(Xt, axis=0), 1., atol=1e-10)

      def test_scaler_simple():
          X = np.c_[np.arange(10.), np.arange(10.)]
          s = Scaler()
          Xt = s.fit_transform(X)
          assert Xt is not X
          np.testing.assert_allclose(np.mean(Xt, axis=0), 0., atol=1e-10)
          np.testing.assert_allclose(np.std(Xt, axis=0), 1., atol=1e-10)

      def test_scaler_with_data_where_one_column_is_of_constant_value():
          X = np.c_[np.ones(10), np.arange(10.)]
          s = Scaler()
          Xt = s.fit_transform(X)
          assert Xt is not X
          np.testing.assert_allclose(np.mean(Xt, axis=0), 0., atol=1e-10)
          np.testing.assert_allclose(np.std(Xt, axis=0), 1., atol=1e-10)
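      (The third test exposes a robustness bug in the rough Scaler: a constant column has std_ == 0, so transform divides by zero and produces NaNs, and the unit-standard-deviation assertion cannot hold for that column; this is the kind of issue small, focused tests surface. A hedged sketch of a more robust transform, which, like scikit-learn's StandardScaler, treats zero scales as 1 so constant columns end up merely centred; the SafeScaler name is hypothetical.)

      class SafeScaler(Scaler):
          def transform(self, X):
              X = np.asarray(X, dtype=float).copy()
              # Guard zero standard deviations to avoid division by zero.
              std = np.where(self.std_ == 0.0, 1.0, self.std_)
              return (X - self.mean_) / std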
  15. Final Advice
      Use composable Transformers and Estimators. Small building blocks make testing easier, and tested data science makes data science easier. A transformer should do one thing (and one thing only); decouple what can be independent.