
Strategies & Tools for Parallel Machine Learning in Python

PyConFR 2012 - Paris - scikit-learn - multiprocessing - joblib - IPython - @ogrisel

Olivier Grisel

September 15, 2012

Transcript

  1. Parts of the Ecosystem
     Multiple Machines with Multiple Cores ———
     Single Machine with Multiple Cores ——— multiprocessing
  2. The Problem
     • Big CPU (Supercomputers - MPI): simulating stuff from models
     • Big Data (Google scale - MapReduce): counting stuff in logs / indexing the Web
     • Machine Learning? Often somewhere in the middle.
  3. Parallel ML Use Cases
     • Model Evaluation with Cross Validation
     • Model Selection with Grid Search
     • Bagging Models: Random Forests
     • Averaged Models
  4. Embarrassingly Parallel ML Use Cases
     • Model Evaluation with Cross Validation
     • Model Selection with Grid Search
     • Bagging Models: Random Forests
     • Averaged Models
  5. Inter-Process Comm. Use Cases
     • Model Evaluation with Cross Validation
     • Model Selection with Grid Search
     • Bagging Models: Random Forests
     • Averaged Models
  6. Cross Validation
     Diagram: the data is split into blocks A, B, C; a subset of the data is used to
     train the model and a held-out test set is used for evaluation.
  7. Cross Validation
     Diagram: three folds over the blocks A, B, C; each fold trains on two of the
     blocks and evaluates on the remaining held-out block.
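     For reference, a minimal sketch of what each fold computes, written with current
     scikit-learn import paths (which differ from the 2012-era ones); the estimator and
     the number of folds are placeholders. Each iteration is independent of the others,
     which is why this parallelizes trivially.

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.model_selection import KFold
        from sklearn.svm import SVC

        X, y = load_iris(return_X_y=True)

        scores = []
        for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
            model = SVC()                                  # placeholder estimator
            model.fit(X[train_idx], y[train_idx])          # train on two of the three blocks
            scores.append(model.score(X[test_idx], y[test_idx]))  # score on the held-out block

        print(scores, np.mean(scores))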
  8. Model Selection: the Hyperparameters Hell
     p1 in [1, 10, 100]
     p2 in [1e3, 1e4, 1e5]
     Find the combination of parameters that maximizes the Cross Validated Score.
  9. Grid Search
     The 3 x 3 grid of (p1, p2) combinations:
     (1, 1e3)   (10, 1e3)   (100, 1e3)
     (1, 1e4)   (10, 1e4)   (100, 1e4)
     (1, 1e5)   (10, 1e5)   (100, 1e5)
  10. Grid Search (continued)
      The same 3 x 3 grid of (p1, p2) combinations:
      (1, 1e3)   (10, 1e3)   (100, 1e3)
      (1, 1e4)   (10, 1e4)   (100, 1e4)
      (1, 1e5)   (10, 1e5)   (100, 1e5)
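      Each cell of this grid (times each cross-validation fold) is an independent task.
      A tiny sketch of how the combinations can be enumerated, with the hypothetical
      parameters p1 and p2 from slide 8:

        from itertools import product

        p1_values = [1, 10, 100]
        p2_values = [1e3, 1e4, 1e5]

        # The 9 cells of the 3 x 3 grid; each one can be handled by a different worker.
        for p1, p2 in product(p1_values, p2_values):
            print(p1, p2)   # in practice: fit a model with (p1, p2) and compute its CV score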
  11. multiprocessing
      >>> from multiprocessing import Pool
      >>> p = Pool(4)
      >>> p.map(type, [1, 2., '3'])
      [int, float, str]
      >>> r = p.map_async(type, [1, 2., '3'])
      >>> r.get()
      [int, float, str]
  12. multiprocessing
      • Part of the standard lib
      • Nice API
      • Cross-Platform support (even Windows!)
      • Some support for shared memory
      • Support for synchronization (Lock)
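      A small, self-contained sketch of the cross-platform point: on Windows, worker
      processes are spawned rather than forked, so the Pool must be created under an
      if __name__ == '__main__' guard and the mapped function must be importable
      (the square function is just an example):

        from multiprocessing import Pool

        def square(x):
            # Must be a picklable, module-level function: arguments and results
            # travel between processes by pickling.
            return x * x

        if __name__ == '__main__':   # required on Windows, where workers re-import this module
            with Pool(4) as pool:
                print(pool.map(square, range(10)))            # blocking variant
                result = pool.map_async(square, range(10))    # non-blocking variant
                print(result.get())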
  13. multiprocessing: limitations
      • No docstrings in a stdlib module? WTF?
      • Tricky / impossible to use the shared memory values with NumPy
      • Bad support for KeyboardInterrupt
  14. joblib
      • Transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
      • Easy simple parallel computing
      • Logging and tracing of the execution
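      The disk-caching feature in a nutshell, as a hedged sketch: the cache directory
      name is arbitrary, and older joblib releases spelled the first argument cachedir.

        from joblib import Memory

        memory = Memory('./joblib_cache', verbose=1)   # any writable directory works

        @memory.cache
        def expensive(x):
            print('computing', x)
            return x ** 2

        expensive(3)   # computed and written to disk
        expensive(3)   # loaded back from the cache; the function body does not run again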
  15. joblib.Parallel
      >>> from os.path import join
      >>> from joblib import Parallel, delayed
      >>> Parallel(2)(
      ...     delayed(join)('/etc', s)
      ...     for s in 'abc')
      ['/etc/a', '/etc/b', '/etc/c']
  16. Usage in scikit-learn
      • Cross Validation: cross_val(model, X, y, n_jobs=4, cv=3)
      • Grid Search: GridSearchCV(model, n_jobs=4, cv=3).fit(X, y)
      • Random Forests: RandomForestClassifier(n_jobs=4).fit(X, y)
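      A runnable version of the same idea with current scikit-learn names (the deck's
      cross_val corresponds to today's cross_val_score, and GridSearchCV takes an explicit
      param_grid); the dataset, estimators and parameter values are placeholders:

        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV, cross_val_score
        from sklearn.svm import SVC

        X, y = load_iris(return_X_y=True)

        # Cross validation: each fold is fit and scored by a separate joblib worker
        print(cross_val_score(SVC(), X, y, cv=3, n_jobs=4))

        # Grid search: every (C, gamma) cell of every fold is an independent task
        search = GridSearchCV(SVC(), {'C': [1, 10, 100], 'gamma': [1e-3, 1e-4, 1e-5]},
                              cv=3, n_jobs=4).fit(X, y)
        print(search.best_params_)

        # Random forest: individual trees are grown in parallel
        print(RandomForestClassifier(n_estimators=100, n_jobs=4).fit(X, y).score(X, y))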
  17. joblib.Parallel: shared memory
      >>> from joblib import Parallel, delayed
      >>> import numpy as np
      >>> Parallel(2, max_nbytes=1e6)(
      ...     delayed(type)(np.zeros(int(i)))
      ...     for i in [1e4, 1e6])
      [<type 'numpy.ndarray'>, <class 'numpy.core.memmap.memmap'>]
  18. The 3 x 3 grid of (p1, p2) combinations:
      (1, 1e3)   (10, 1e3)   (100, 1e3)
      (1, 1e4)   (10, 1e4)   (100, 1e4)
      (1, 1e5)   (10, 1e5)   (100, 1e5)
      Only 3 allocated datasets, shared by all the concurrent workers performing the grid search.
  19. Multiple Machines with Multiple Cores
      (diagram: several machines, each with multiple cores)
  20. MapReduce?
      [(k1, v1), (k2, v2), ...]  ->  mappers  ->  [(k3, v3), (k4, v4), ...]  ->  reducers  ->  [(k5, v5), (k6, v6), ...]
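      To make the pattern concrete, a toy in-memory sketch of the map, shuffle and reduce
      phases (a word count; real frameworks shuffle the (k, v) pairs across machines and
      spill them to disk):

        from collections import defaultdict

        documents = ["the quick brown fox", "the lazy dog", "the fox"]

        # Map phase: each mapper turns its input into (key, value) pairs
        mapped = [(word, 1) for doc in documents for word in doc.split()]

        # Shuffle phase: group the values by key
        groups = defaultdict(list)
        for key, value in mapped:
            groups[key].append(value)

        # Reduce phase: each reducer aggregates the values of one key
        counts = {key: sum(values) for key, values in groups.items()}
        print(counts)   # e.g. 'the' -> 3, 'fox' -> 2, every other word -> 1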
  21. Why MapReduce does not always work
      • Writes a lot of stuff to disk for failover
      • Inefficient for small to medium problems
      • Data and model params as (k, v) pairs? Complex to leverage for Iterative Algorithms
  22. IPython.parallel
      • Parallel Processing Library
      • Interactive Exploratory Shell
      • Multi Core & Distributed
  23. The AllReduce Pattern
      • Compute an aggregate (average) of active node data
      • Do not clog a single node with incoming data transfer
      • Traditionally implemented in MPI systems
  24. AllReduce 0/3: Initial State
      Diagram: six nodes holding the values 2.0, 0.5, 1.1, 3.2, 0.9, 1.0.
  25. AllReduce 1/3: Spanning Tree
      Diagram: the same nodes (2.0, 0.5, 1.1, 3.2, 0.9, 1.0), now organized into a spanning tree.
  26. AllReduce 2/3: Upward Averages
      Diagram: the leaf nodes start reporting (partial average, count) pairs upward:
      (1.1, 1), (3.1, 1), (0.9, 1); the other nodes still hold 2.0, 0.5 and 1.0.
  27. AllReduce 2/3: Upward Averages
      Diagram: intermediate nodes aggregate their subtrees: the node with value 2.0 now
      holds (2.1, 3) and the node with value 0.5 holds (0.7, 2); the leaves keep
      (1.1, 1), (3.1, 1), (0.9, 1).
  28. AllReduce 2/3: Upward Averages
      Diagram: the root (value 1.0) combines everything into the global aggregate (1.38, 6);
      the other nodes keep their partial aggregates.
  29. AllReduce 3/3: Downward Updates
      Diagram: the root updates its value to the global average 1.38; the other nodes
      still hold their previous values and partial aggregates.
  30. AllReduce 3/3: Downward Updates
      Diagram: the root's children update their values to 1.38; the leaves still hold
      1.1, 3.2 and 0.9.
  31. AllReduce 3/3: Downward Updates
      Diagram: all six nodes now hold the value 1.38.
  32. AllReduce: Final State
      Every node holds the global average: 1.38.
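      A toy, single-process sketch of the two passes over a spanning tree; the tree shape
      and node values below are illustrative placeholders, not the ones from the diagrams.

        # Node 0 is the root of the spanning tree; children is a hypothetical adjacency list.
        values = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0}
        children = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}

        def upward(node):
            """Upward pass: each node reports the (average, count) of its subtree to its parent."""
            total, count = values[node], 1
            for child in children[node]:
                child_avg, child_count = upward(child)
                total += child_avg * child_count
                count += child_count
            return total / count, count

        def downward(node, average):
            """Downward pass: the global average is broadcast back down the tree."""
            values[node] = average
            for child in children[node]:
                downward(child, average)

        global_avg, _ = upward(0)
        downward(0, global_avg)
        print(values)   # every node now holds the same global average (3.5 here)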
  33. Working in the Cloud
      • Launch a cluster of machines in one cmd:
        starcluster start mycluster -b 0.07
        starcluster sshmaster mycluster
      • Supports spot instances!
      • Ships blas, atlas, numpy, scipy!
      • IPython plugin!
  34. ps aux | grep IPython.parallel \
        | grep -v grep \
        | cut -d ' ' -f 2 | xargs kill
      Or simpler: