Strategies & Tools for
Parallel Machine Learning
in Python
PyConFR 2012 - Paris
Sunday, September 16, 2012
The Story
scikit-learn
The Cloud
stackoverflow & kaggle
...
Parts of the Ecosystem
• Multiple Machines with Multiple Cores
• Single Machine with Multiple Cores: multiprocessing
The Problem
• Big CPU (Supercomputers - MPI): simulating stuff from models
• Big Data (Google scale - MapReduce): counting stuff in logs / indexing the Web
• Machine Learning? Often somewhere in the middle
Parallel ML Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models
Embarrassingly Parallel
ML Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
Inter-Process Comm.
Use Cases
• Averaged Models
Cross Validation
[diagram: the input data and the labels to predict]
[diagram: the data and labels split into three folds A, B, C]
[diagram: folds A and B form the subset of the data used to train the model; the held-out fold C is the test set for evaluation]
[diagram: the train/test roles rotate over the folds: train on A, B and test on C; train on A, C and test on B; train on B, C and test on A]
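This fold rotation is exactly what scikit-learn's K-fold helpers automate. A minimal sketch, not from the slides, assuming a recent scikit-learn where the helpers live in sklearn.model_selection:

# 3-fold cross validation: each fold is held out once for evaluation
# while the other two folds are used to train the model
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# one independent fit per fold; n_jobs=-1 runs the fits in parallel
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, n_jobs=-1)
print(scores, scores.mean())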
Model Selection
the hyperparameter hell
p1 in [1, 10, 100]
p2 in [1e3, 1e4, 1e5]
Find the best combination of parameters
that maximizes the cross-validated score.
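With three values per parameter, the grid above has 9 candidate combinations, and each one gets its own cross-validated score. A minimal sketch of enumerating such a grid (p1 and p2 are the placeholder names from the slide, not a real API):

# every (p1, p2) pair is scored independently, which is what makes
# grid search embarrassingly parallel
from itertools import product

p1_values = [1, 10, 100]
p2_values = [1e3, 1e4, 1e5]

for p1, p2 in product(p1_values, p2_values):   # 9 combinations
    print(p1, p2)   # in practice: compute a cross-validated score here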
Grid Search:
Qualitative Results
Grid Search:
Cross Validated Scores
Enough ML theory!
Let's go shopping^W
parallel computing!
Single Machine
with
Multiple Cores
multiprocessing
>>> from multiprocessing import Pool
>>> p = Pool(4)
>>> p.map(type, [1, 2., '3'])
[int, float, str]
>>> r = p.map_async(type, [1, 2., '3'])
>>> r.get()
[int, float, str]
multiprocessing
• Part of the standard lib
• Nice API
• Cross-Platform support (even Windows!)
• Some support for shared memory
• Support for synchronization (Lock)
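The last two bullets combine naturally: a minimal sketch, not from the slides, of a counter kept in shared memory and protected by a Lock:

# four worker processes increment one shared counter
from multiprocessing import Lock, Process, Value

def work(counter, lock, n):
    for _ in range(n):
        with lock:                 # synchronize access to the shared value
            counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)        # an int living in shared memory
    lock = Lock()
    workers = [Process(target=work, args=(counter, lock, 1000))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)           # 4000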
multiprocessing:
limitations
• No docstrings in a stdlib module? WTF?
• Tricky / impossible to use the shared
memory values with NumPy
• Bad support for KeyboardInterrupt
joblib
• transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
• easy simple parallel computing
• logging and tracing of the execution
joblib.Parallel
>>> from os.path import join
>>> from joblib import Parallel, delayed
>>> Parallel(2)(
...     delayed(join)('/etc', s)
...     for s in 'abc')
['/etc/a', '/etc/b', '/etc/c']
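A slightly more realistic sketch, not from the slides: dispatching an embarrassingly parallel NumPy computation to worker processes, with the number of workers set by n_jobs:

# each call to column_means is independent, so joblib can run them
# on separate worker processes
import numpy as np
from joblib import Parallel, delayed

def column_means(seed, n_samples=10000, n_features=5):
    rng = np.random.RandomState(seed)
    return rng.rand(n_samples, n_features).mean(axis=0)

if __name__ == '__main__':
    results = Parallel(n_jobs=2)(
        delayed(column_means)(seed) for seed in range(8))
    print(np.vstack(results).shape)   # (8, 5)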
Usage in scikit-learn
• Cross Validation
cross_val_score(model, X, y, n_jobs=4, cv=3)
• Grid Search
GridSearchCV(model, param_grid, n_jobs=4, cv=3).fit(X, y)
• Random Forests
RandomForestClassifier(n_jobs=4).fit(X, y)
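A runnable sketch tying the three together, not from the slides, assuming a recent scikit-learn where GridSearchCV and cross_val_score live in sklearn.model_selection (the parameter values are illustrative):

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# cross validation: one independent fit per fold, spread over 4 workers
print(cross_val_score(SVC(), X, y, cv=3, n_jobs=4).mean())

# grid search: one independent fit per (C, gamma, fold) combination
param_grid = {'C': [1, 10, 100], 'gamma': [1e-3, 1e-4, 1e-5]}
search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=4).fit(X, y)
print(search.best_params_, search.best_score_)

# random forest: one independent tree per bootstrap sample
forest = RandomForestClassifier(n_estimators=100, n_jobs=4).fit(X, y)
print(forest.score(X, y))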
joblib.Parallel:
shared memory
>>> from joblib import Parallel, delayed
>>> import numpy as np
>>> Parallel(2, max_nbytes=1e6)(
...     delayed(type)(np.zeros(int(i)))
...     for i in [1e4, 1e6])
[<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]
(1, 1e3) (10, 1e3) (100, 1e3)
(1, 1e4) (10, 1e4) (100, 1e4)
(1, 1e5) (10, 1e5) (100, 1e5)
Only 3 allocated datasets are shared by all the concurrent workers performing the grid search.
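A minimal sketch, not from the slides, of the underlying trick: dump a large array to disk once and memory-map it from every worker instead of copying it (the file path is just an example):

# every worker reads the same memory-mapped file; no per-worker copy
import numpy as np
from joblib import Parallel, delayed, dump, load

def chunk_mean(arr, start, stop):
    return arr[start:stop].mean()   # arr is a read-only numpy.memmap

if __name__ == '__main__':
    data = np.random.rand(int(1e6))
    dump(data, '/tmp/shared_data.joblib')
    shared = load('/tmp/shared_data.joblib', mmap_mode='r')
    results = Parallel(n_jobs=4)(
        delayed(chunk_mean)(shared, i * 250000, (i + 1) * 250000)
        for i in range(4))
    print(results)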
Why MapReduce does
not always work
• Writes a lot of stuff to disk for failover
• Inefficient for small to medium problems
[(k, v)] → mapper → [(k, v)] → reducer → [(k, v)]
• Data and model params as (k, v) pairs?
• Complex to leverage for iterative algorithms
IPython.parallel
• Parallel Processing Library
• Interactive Exploratory Shell
• Multi Core & Distributed
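A minimal sketch of the API used in the demo, assuming a cluster of engines was started beforehand (for example with: ipcluster start -n 4). In today's IPython this functionality lives in the separate ipyparallel package:

# connect to the running controller and map work onto the engines
from IPython.parallel import Client   # from ipyparallel import Client, nowadays

def slow_square(x):
    import time
    time.sleep(0.1)
    return x * x

rc = Client()
print(rc.ids)                          # e.g. [0, 1, 2, 3]
view = rc.load_balanced_view()         # dynamic load balancing across engines
result = view.map_async(slow_square, range(16))
print(result.get())                    # [0, 1, 4, 9, ...]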
Demo!
The AllReduce Pattern
• Compute an aggregate (average) of active
node data
• Do not clog a single node with incoming
data transfer
• Traditionally implemented in MPI systems
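Making the first bullet concrete: a minimal sketch of averaging one value per node with MPI's allreduce, assuming mpi4py is installed and the script is launched with something like: mpiexec -n 4 python allreduce_avg.py (the filename is just an example):

# every process contributes a local value and every process receives
# the global average, without funnelling all the data through one node
from mpi4py import MPI

comm = MPI.COMM_WORLD
local_value = float(comm.rank + 1)      # stand-in for local model state

total = comm.allreduce(local_value, op=MPI.SUM)
average = total / comm.size
print("rank %d: average = %g" % (comm.rank, average))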
AllReduce 0/3
Initial State
[diagram: six nodes holding local values 2.0, 0.5, 1.1, 3.2, 0.9, 1.0]
AllReduce 1/3
Spanning Tree
[diagram: the same six nodes connected into a spanning tree for the reduction]
Working in the Cloud
• Launch a cluster of machines in one cmd:
starcluster start mycluster -b 0.07
starcluster sshmaster mycluster
• Supports spot instances!
• Ships BLAS, ATLAS, NumPy, SciPy!
• IPython plugin!
Perspectives
2012 results by
Stanford / Google