
Strategies & Tools for Parallel Machine Learning in Python

Olivier Grisel
September 15, 2012

PyConFR 2012 - Paris - scikit-learn - multiprocessing - joblib - IPython - @ogrisel

Transcript

  1. Parts of the Ecosystem
     Multiple Machines with Multiple Cores: ———
     Single Machine with Multiple Cores: ——— multiprocessing
     Sunday, September 16, 2012
  2. The Problem
     Big CPU (Supercomputers - MPI): simulating stuff from models
     Big Data (Google scale - MapReduce): counting stuff in logs / indexing the Web
     Machine Learning? Often somewhere in the middle.
  3. Parallel ML Use Cases
     • Model Evaluation with Cross Validation
     • Model Selection with Grid Search
     • Bagging Models: Random Forests
     • Averaged Models
  4. Embarrassingly Parallel ML Use Cases
     • Model Evaluation with Cross Validation
     • Model Selection with Grid Search
     • Bagging Models: Random Forests
     • Averaged Models
  5. Inter-Process Comm. Use Cases
     • Model Evaluation with Cross Validation
     • Model Selection with Grid Search
     • Bagging Models: Random Forests
     • Averaged Models
  6. Cross Validation
     [Diagram: data split into blocks A, B, C — a subset of the data is used to train the model, a held-out test set is kept for evaluation]
  7. Cross Validation
     [Diagram: three rounds over folds A, B, C — each round trains on two folds and holds out the third for testing]
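The fold bookkeeping in the diagram can be sketched in a few lines of plain Python — a simplified stand-in for scikit-learn's cross-validation iterators, not the actual library code:

```python
# Minimal 3-fold cross-validation index generator: each fold is held out
# once for testing while the model trains on the remaining folds.
def kfold_indices(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    fold_size = n_samples // n_folds
    indices = list(range(n_samples))
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

for train, test in kfold_indices(6, n_folds=3):
    print(train, test)
```

Each (train, test) pair is independent of the others, which is what makes cross validation embarrassingly parallel.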
  8. Model Selection: the hyperparameter hell
     p1 in [1, 10, 100]
     p2 in [1e3, 1e4, 1e5]
     Find the combination of parameters that maximizes the cross-validated score.
  9. Grid Search
                p2=1e3      p2=1e4      p2=1e5
     p1=1      (1, 1e3)    (1, 1e4)    (1, 1e5)
     p1=10     (10, 1e3)   (10, 1e4)   (10, 1e5)
     p1=100    (100, 1e3)  (100, 1e4)  (100, 1e5)
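The 3×3 grid from the slide can be enumerated with the stdlib; each combination would then be scored by cross validation:

```python
from itertools import product

# Enumerate every (p1, p2) combination of the hyperparameter grid.
p1_values = [1, 10, 100]
p2_values = [1e3, 1e4, 1e5]

grid = list(product(p1_values, p2_values))
print(len(grid))  # 9 combinations
```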
  10. Grid Search (continued)
      (1, 1e3)  (10, 1e3)  (100, 1e3)
      (1, 1e4)  (10, 1e4)  (100, 1e4)
      (1, 1e5)  (10, 1e5)  (100, 1e5)
  11. multiprocessing
      >>> from multiprocessing import Pool
      >>> p = Pool(4)
      >>> p.map(type, [1, 2., '3'])
      [int, float, str]
      >>> r = p.map_async(type, [1, 2., '3'])
      >>> r.get()
      [int, float, str]
  12. multiprocessing
      • Part of the standard lib
      • Nice API
      • Cross-Platform support (even Windows!)
      • Some support for shared memory
      • Support for synchronization (Lock)
  13. multiprocessing: limitations
      • No docstrings in a stdlib module? WTF?
      • Tricky / impossible to use the shared memory values with NumPy
      • Bad support for KeyboardInterrupt
  14. joblib
      • Transparent disk-caching of output values and lazy re-evaluation (memoize pattern)
      • Easy simple parallel computing
      • Logging and tracing of the execution
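The disk-caching memoize pattern can be sketched with the stdlib alone — a toy stand-in for what joblib.Memory does transparently and far more robustly (the function and file-naming scheme below are illustrative, not joblib's internals):

```python
import hashlib
import os
import pickle
import tempfile

def disk_memoize(func, cachedir=None):
    """Cache func's return values on disk, keyed by a hash of the pickled
    arguments; later calls reload the result instead of recomputing it."""
    cachedir = cachedir or tempfile.mkdtemp()

    def wrapper(*args):
        key = hashlib.md5(pickle.dumps(args)).hexdigest()
        path = os.path.join(cachedir, func.__name__ + '-' + key + '.pkl')
        if os.path.exists(path):  # cache hit: skip re-evaluation
            with open(path, 'rb') as f:
                return pickle.load(f)
        result = func(*args)
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result

    return wrapper

calls = []

def square(x):
    calls.append(x)  # track how many times the real work actually runs
    return x * x

cached_square = disk_memoize(square)
print(cached_square(3), cached_square(3), len(calls))  # second call hits the cache
```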
  15. joblib.Parallel
      >>> from os.path import join
      >>> from joblib import Parallel, delayed
      >>> Parallel(2)(
      ...     delayed(join)('/etc', s)
      ...     for s in 'abc')
      ['/etc/a', '/etc/b', '/etc/c']
  16. Usage in scikit-learn
      • Cross Validation: cross_val(model, X, y, n_jobs=4, cv=3)
      • Grid Search: GridSearchCV(model, n_jobs=4, cv=3).fit(X, y)
      • Random Forests: RandomForestClassifier(n_jobs=4).fit(X, y)
  17. joblib.Parallel: shared memory
      >>> from joblib import Parallel, delayed
      >>> import numpy as np
      >>> Parallel(2, max_nbytes=1e6)(
      ...     delayed(type)(np.zeros(int(i)))
      ...     for i in [1e4, 1e6])
      [<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]
  18. Shared memory and Grid Search
      (1, 1e3)  (10, 1e3)  (100, 1e3)
      (1, 1e4)  (10, 1e4)  (100, 1e4)
      (1, 1e5)  (10, 1e5)  (100, 1e5)
      Only 3 allocated datasets shared by all the concurrent workers performing the grid search.
  19. Multiple Machines with Multiple Cores
      [Diagram: a grid of machines, each with several cores]
  20. MapReduce?
      [ (k1, v1), (k2, v2), ... ]
        → mapper | mapper | mapper →
      [ (k3, v3), (k4, v4), ... ]
        → reducer | reducer →
      [ (k5, v5), (k6, v6), ... ]
  21. Why MapReduce does not always work
      • Writes a lot of stuff to disk for failover
      • Inefficient for small to medium problems
      • Data and model params as (k, v) pairs? Complex to leverage for iterative algorithms
      [Diagram: [(k, v)] → mapper → [(k, v)] → reducer → [(k, v)]]
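The map / shuffle / reduce phases sketched in the diagram fit in a few lines of in-memory Python — a toy word count, not a distributed implementation:

```python
from collections import defaultdict

# Toy in-memory MapReduce word count: mappers emit (word, 1) pairs,
# a shuffle groups the pairs by key, reducers sum the counts.
def mapper(line):
    return [(word, 1) for word in line.split()]

def reducer(key, values):
    return key, sum(values)

lines = ["big data", "big cpu", "big big data"]

# map phase
pairs = [kv for line in lines for kv in mapper(line)]

# shuffle: group values by key
groups = defaultdict(list)
for k, v in pairs:
    groups[k].append(v)

# reduce phase
counts = dict(reducer(k, vs) for k, vs in groups.items())
print(counts)  # {'big': 4, 'data': 2, 'cpu': 1}
```

A real framework distributes the three phases across machines and checkpoints intermediate pairs to disk — exactly the overhead the slide complains about for iterative ML workloads.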
  22. IPython.parallel
      • Parallel Processing Library
      • Interactive Exploratory Shell
      • Multi Core & Distributed
  23. The AllReduce Pattern
      • Compute an aggregate (average) of active node data
      • Do not clog a single node with incoming data transfer
      • Traditionally implemented in MPI systems
  24. AllReduce 0/3: Initial State
      [Diagram: six nodes holding values 2.0, 0.5, 1.1, 3.2, 0.9, 1.0]
  25. AllReduce 1/3: Spanning Tree
      [Diagram: the same six nodes (2.0, 0.5, 1.1, 3.2, 0.9, 1.0) connected into a spanning tree]
  26. AllReduce 2/3: Upward Averages
      [Diagram: leaves report (value, count) pairs to their parents: 1.1 → (1.1, 1), 3.2 → (3.2, 1), 0.9 → (0.9, 1)]
  27. AllReduce 2/3: Upward Averages (continued)
      [Diagram: intermediate nodes fold in their children's reports: 2.0 with children 1.1 and 3.2 → (2.1, 3); 0.5 with child 0.9 → (0.7, 2)]
  28. AllReduce 2/3: Upward Averages (continued)
      [Diagram: the root folds in all subtree reports and obtains the global aggregate (1.38, 6)]
  29. AllReduce 3/3: Downward Updates
      [Diagram: the root replaces its value with the global average 1.38]
  30. AllReduce 3/3: Downward Updates (continued)
      [Diagram: the average 1.38 propagates down to the intermediate nodes]
  31. AllReduce 3/3: Downward Updates (continued)
      [Diagram: every node now holds 1.38]
  32. AllReduce Final State
      [Diagram: all six nodes hold the global average 1.38]
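The upward and downward phases above can be simulated in plain Python. The tree shape and node names below are illustrative (they loosely follow the diagrams); the computed result is the exact mean of the six values:

```python
# Tree AllReduce sketch: (sum, count) pairs flow up a spanning tree,
# then the global average flows back down to every node.
def allreduce_average(tree, values):
    """tree maps node -> list of children; mutates values in place so
    every node ends up holding the global average, which is returned."""
    def upward(node):
        # combine this node's value with its subtree's (sum, count)
        total, count = values[node], 1
        for child in tree.get(node, []):
            s, c = upward(child)
            total += s
            count += c
        return total, count

    total, count = upward('root')
    average = total / count
    # downward phase: each node overwrites its value with the average
    for node in values:
        values[node] = average
    return average

values = {'root': 1.0, 'a': 2.0, 'b': 0.5, 'c': 1.1, 'd': 3.2, 'e': 0.9}
tree = {'root': ['a', 'b'], 'a': ['c', 'd'], 'b': ['e']}
avg = allreduce_average(tree, values)
print(round(avg, 2))
```

No single node ever receives more than its own children's partial aggregates, which is the point of doing the reduction over a tree.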
  33. Working in the Cloud
      • Launch a cluster of machines in one command:
        starcluster start mycluster -b 0.07
        starcluster sshmaster mycluster
      • Supports spot instances!
      • Ships blas, atlas, numpy, scipy!
      • IPython plugin!
  34. ps aux | grep IPython.parallel \
        | grep -v grep \
        | cut -d ' ' -f 2 | xargs kill
      Or simpler: