Slide 1

Slide 1 text

Strategies & Tools for Parallel Machine Learning in Python
PyConFR 2012 - Paris
Sunday, September 16, 2012

Slide 2

Slide 2 text

The Story scikit-learn The Cloud stackoverflow & kaggle

Slide 3

Slide 3 text

...

Slide 4

Slide 4 text

Parts of the Ecosystem
Multiple Machines with Multiple Cores
Single Machine with Multiple Cores
multiprocessing

Slide 5

Slide 5 text

The Problem
Big CPU (Supercomputers - MPI): simulating stuff from models
Big Data (Google scale - MapReduce): counting stuff in logs / indexing the Web
Machine Learning? Often somewhere in the middle

Slide 6

Slide 6 text

Parallel ML Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models

Slide 7

Slide 7 text

Embarrassingly Parallel ML Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models

Slide 8

Slide 8 text

Inter-Process Comm. Use Cases
• Model Evaluation with Cross Validation
• Model Selection with Grid Search
• Bagging Models: Random Forests
• Averaged Models

Slide 9

Slide 9 text

Cross Validation Labels to Predict Input Data

Slide 10

Slide 10 text

Cross Validation A B C A B C

Slide 11

Slide 11 text

Cross Validation
A B C   A B C
Subset of the data used to train the model
Held-out test set for evaluation

Slide 12

Slide 12 text

Cross Validation
A B C   A B C
A C B   A C B
B C A   B C A
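The fold rotation above can be sketched in a few lines of plain Python. This is a minimal illustration, not scikit-learn's implementation: `k_fold_indices` is a hypothetical helper, and the six-sample dataset is made up for the example. Each fold is held out once for evaluation while the remaining folds train the model.

```python
def k_fold_indices(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                     # held-out fold
        train = indices[:start] + indices[start + size:]       # everything else
        yield train, test
        start += size


folds = list(k_fold_indices(6, n_folds=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

The three folds do not depend on each other, which is why cross validation is listed among the embarrassingly parallel use cases.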

Slide 13

Slide 13 text

Model Selection: the Hyperparameter Hell
p1 in [1, 10, 100]
p2 in [1e3, 1e4, 1e5]
Find the combination of parameters that maximizes the Cross Validated Score

Slide 14

Slide 14 text

Grid Search (p1 along columns, p2 along rows)
(1, 1e3)  (10, 1e3)  (100, 1e3)
(1, 1e4)  (10, 1e4)  (100, 1e4)
(1, 1e5)  (10, 1e5)  (100, 1e5)

Slide 15

Slide 15 text

(1, 1e3)  (10, 1e3)  (100, 1e3)
(1, 1e4)  (10, 1e4)  (100, 1e4)
(1, 1e5)  (10, 1e5)  (100, 1e5)
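A minimal sketch of how such a grid can be enumerated and searched, assuming a made-up scoring function. `toy_cv_score` is a hypothetical stand-in for a real cross-validated score and was chosen only so that the example has a known optimum.

```python
from itertools import product

p1_values = [1, 10, 100]
p2_values = [1e3, 1e4, 1e5]


def toy_cv_score(p1, p2):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -abs(p1 - 10) - abs(p2 - 1e4) / 1e3


# Every (p1, p2) combination is one independent job.
grid = list(product(p1_values, p2_values))
scores = {params: toy_cv_score(*params) for params in grid}
best_params = max(scores, key=scores.get)
print(len(grid), best_params)
```

Because no grid point needs the result of another, the dictionary comprehension could be replaced by any parallel map without changing the outcome.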

Slide 16

Slide 16 text

Grid Search: Qualitative Results

Slide 17

Slide 17 text

Grid Search: Cross Validated Scores

Slide 18

Slide 18 text

Enough ML theory! Let's go shopping^W parallel computing!

Slide 19

Slide 19 text

Single Machine with Multiple Cores

Slide 20

Slide 20 text

multiprocessing
>>> from multiprocessing import Pool
>>> p = Pool(4)
>>> p.map(type, [1, 2., '3'])
[int, float, str]
>>> r = p.map_async(type, [1, 2., '3'])
>>> r.get()
[int, float, str]
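The same session can be written as a standalone script. This is a sketch assuming Python 3, where `Pool` works as a context manager; `type_name` is a hypothetical helper that returns a string so the result is easy to print and compare.

```python
from multiprocessing import Pool


def type_name(value):
    """Return the name of the value's type; runs inside a worker process."""
    return type(value).__name__


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when
    # they import the main module (needed with the spawn start method).
    with Pool(4) as pool:
        # Blocking map: returns once every item has been processed.
        print(pool.map(type_name, [1, 2.0, "3"]))
        # Non-blocking variant: returns a handle, results fetched with get().
        async_result = pool.map_async(type_name, [1, 2.0, "3"])
        print(async_result.get(timeout=10))
```

The worker function is defined at module level because `multiprocessing` pickles it by name to send it to the workers.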

Slide 21

Slide 21 text

multiprocessing
• Part of the standard lib
• Nice API
• Cross-platform support (even Windows!)
• Some support for shared memory
• Support for synchronization (Lock)

Slide 22

Slide 22 text

multiprocessing: limitations
• No docstrings in a stdlib module? WTF?
• Tricky / impossible to use the shared memory values with NumPy
• Bad support for KeyboardInterrupt

Slide 23

Slide 23 text


Slide 24

Slide 24 text

• transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
• easy simple parallel computing
• logging and tracing of the execution
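The memoize pattern named in the first bullet can be illustrated with the standard library alone. The sketch below is not joblib's implementation (joblib exposes this feature through `joblib.Memory`); `disk_cached` and `slow_square` are hypothetical names, and the cache key is simply a hash of the pickled arguments.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cached(cache_dir):
    """Decorator sketch: persist each result on disk, keyed by the arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the function name and pickled arguments into a file name.
            key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha1(key).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:   # cache hit: skip the computation
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:       # cache miss: compute, then persist
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


calls = []                      # records every real (non-cached) evaluation
cache_dir = tempfile.mkdtemp()  # throwaway cache location for the example


@disk_cached(cache_dir)
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(12)    # computed and written to disk
second = slow_square(12)   # read back from disk; the function body is skipped
print(first, second, calls)
```

Since the cache lives on disk, the stored result would also survive a restart of the interpreter, which an in-memory memoizer cannot offer.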

Slide 25

Slide 25 text

joblib.Parallel
>>> from os.path import join
>>> from joblib import Parallel, delayed
>>> Parallel(2)(
...     delayed(join)('/ect', s)
...     for s in 'abc')
['/ect/a', '/ect/b', '/ect/c']

Slide 26

Slide 26 text

Usage in scikit-learn
• Cross Validation: cross_val_score(model, X, y, n_jobs=4, cv=3)
• Grid Search: GridSearchCV(model, param_grid, n_jobs=4, cv=3).fit(X, y)
• Random Forests: RandomForestClassifier(n_jobs=4).fit(X, y)

Slide 27

Slide 27 text

joblib.Parallel: shared memory
>>> from joblib import Parallel, delayed
>>> import numpy as np
>>> Parallel(2, max_nbytes=1e6)(
...     delayed(type)(np.zeros(int(i)))
...     for i in [1e4, 1e6])
[<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]

Slide 28

Slide 28 text

(1, 1e3)  (10, 1e3)  (100, 1e3)
(1, 1e4)  (10, 1e4)  (100, 1e4)
(1, 1e5)  (10, 1e5)  (100, 1e5)
Only 3 allocated datasets shared by all the concurrent workers performing the grid search.

Slide 29

Slide 29 text

Multiple Machines with Multiple Cores

Slide 30

Slide 30 text

MapReduce?
[ (k1, v1), (k2, v2), ... ]
mapper mapper mapper
[ (k3, v3), (k4, v4), ... ]
reducer reducer
[ (k5, v5), (k6, v6), ... ]
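The key/value flow above can be sketched as a single-process simulation. This is an illustration of the pattern only, not of any MapReduce framework: `map_reduce`, `mapper`, `reducer`, and the sample log lines are all made up for the example, which counts tokens in the spirit of "counting stuff in logs".

```python
from collections import defaultdict


def mapper(key, line):
    """Emit one (word, 1) pair per token in the input line."""
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    """Collapse all the values collected for one key into a single pair."""
    yield word, sum(counts)


def map_reduce(records, mapper, reducer):
    """Run map, group by key (the shuffle step), then reduce, in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


logs = [(0, "GET /index"), (1, "GET /about"), (2, "POST /index")]
counts = map_reduce(logs, mapper, reducer)
print(counts)
```

In a real deployment the mapper calls and the per-key reducer calls run on different machines, and the grouping step is where intermediate data is written out and moved between them.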

Slide 31

Slide 31 text

Why MapReduce does not always work
Write a lot of stuff to disk for failover
Inefficient for small to medium problems
[(k, v)] mapper [(k, v)] reducer [(k, v)]
Data and model params as (k, v) pairs?
Complex to leverage for Iterative Algorithms

Slide 32

Slide 32 text

IPython.parallel
• Parallel Processing Library
• Interactive Exploratory Shell
Multi Core & Distributed

Slide 33

Slide 33 text

Demo!

Slide 34

Slide 34 text

The AllReduce Pattern
• Compute an aggregate (average) of active node data
• Do not clog a single node with incoming data transfer
• Traditionally implemented in MPI systems

Slide 35

Slide 35 text

AllReduce 0/3 Initial State
Value: 2.0 Value: 0.5 Value: 1.1 Value: 3.2 Value: 0.9 Value: 1.0

Slide 36

Slide 36 text

AllReduce 1/3 Spanning Tree
Value: 2.0 Value: 0.5 Value: 1.1 Value: 3.2 Value: 0.9 Value: 1.0

Slide 37

Slide 37 text

AllReduce 2/3 Upward Averages
Value: 2.0 Value: 0.5 Value: 1.1 (1.1, 1) Value: 3.2 (3.2, 1) Value: 0.9 (0.9, 1) Value: 1.0

Slide 38

Slide 38 text

AllReduce 2/3 Upward Averages
Value: 2.0 (2.1, 3) Value: 0.5 (0.7, 2) Value: 1.1 (1.1, 1) Value: 3.2 (3.2, 1) Value: 0.9 (0.9, 1) Value: 1.0

Slide 39

Slide 39 text

AllReduce 2/3 Upward Averages
Value: 2.0 (2.1, 3) Value: 0.5 (0.7, 2) Value: 1.1 (1.1, 1) Value: 3.2 (3.2, 1) Value: 0.9 (0.9, 1) Value: 1.0 (1.45, 6)

Slide 40

Slide 40 text

AllReduce 3/3 Downward Updates
Value: 2.0 (2.1, 3) Value: 0.5 (0.7, 2) Value: 1.1 (1.1, 1) Value: 3.2 (3.2, 1) Value: 0.9 (0.9, 1) Value: 1.45

Slide 41

Slide 41 text

AllReduce 3/3 Downward Updates
Value: 1.45 Value: 1.45 Value: 1.1 (1.1, 1) Value: 3.2 (3.2, 1) Value: 0.9 (0.9, 1) Value: 1.45

Slide 42

Slide 42 text

AllReduce 3/3 Downward Updates
Value: 1.45 Value: 1.45 Value: 1.45 Value: 1.45 Value: 1.45 Value: 1.45

Slide 43

Slide 43 text

AllReduce Final State
Value: 1.45 Value: 1.45 Value: 1.45 Value: 1.45 Value: 1.45 Value: 1.45
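The three phases can be simulated sequentially in plain Python. This is a sketch only: a real implementation would exchange messages between processes (for instance over MPI), and the tree shape used here is an assumption inferred from the (mean, count) pairs on the slides. `Node`, `upward`, `downward`, and `all_values` are hypothetical names.

```python
class Node:
    """One machine in the spanning tree, holding a local value."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def upward(node):
    """Phase 2: return the (mean, count) aggregate of the subtree at node."""
    total, count = node.value, 1
    for child in node.children:
        child_mean, child_count = upward(child)
        total += child_mean * child_count   # undo the child's averaging
        count += child_count
    return total / count, count


def downward(node, value):
    """Phase 3: push the global aggregate back down to every node."""
    node.value = value
    for child in node.children:
        downward(child, value)


def all_values(node):
    """Collect the current value of every node in the tree."""
    yield node.value
    for child in node.children:
        yield from all_values(child)


# Assumed tree: the root holds 1.0 and has two children, 2.0 and 0.5.
root = Node(1.0, [Node(2.0, [Node(1.1), Node(3.2)]),
                  Node(0.5, [Node(0.9)])])
mean, count = upward(root)
downward(root, mean)
print(round(mean, 2), count, sorted(set(all_values(root))))
```

Each node only ever talks to its parent and children, so no single machine receives data from all the others, which is the point of the second bullet on the pattern slide.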

Slide 44

Slide 44 text

AllReduce Implementations
http://mpi4py.scipy.org
IPC directly w/ IPython.parallel
https://github.com/ipython/ipython/tree/master/docs/examples/parallel/interengine

Slide 45

Slide 45 text

Working in the Cloud
• Launch a cluster of machines with one command:
  starcluster start mycluster -b 0.07
  starcluster sshmaster mycluster
• Supports spot instances!
• Ships blas, atlas, numpy, scipy!
• IPython plugin!

Slide 46

Slide 46 text

Perspectives

Slide 47

Slide 47 text

2012 results by Stanford / Google

Slide 48

Slide 48 text

The YouTube Neuron

Slide 49

Slide 49 text

Thanks
• http://j.mp/ogrisel-pyconfr-2012
• http://scikit-learn.org
• http://packages.python.org/joblib
• http://ipython.org
• http://star.mit.edu/cluster/
@ogrisel

Slide 50

Slide 50 text

Goodies Super Ctrl-C to killall nosetests

Slide 51

Slide 51 text

psutil nosetests killer script in Automator

Slide 52

Slide 52 text

Bind killnose to Shift-Ctrl-C

Slide 53

Slide 53 text

ps aux | grep IPython.parallel \
  | grep -v grep \
  | awk '{print $2}' | xargs kill
Or simpler: