
Large Scale Non-Linear Learning (Pygotham 2015)


Out of core learning with scikit-learn, and why not to use a cluster.

Andreas Mueller

August 16, 2015

Transcript

  1. Large scale non-linear learning
    on a single CPU
    Andreas Mueller
    NYU / scikit-learn


  2. Large Scale – “Out of core: fits on a hard disk but
    not in RAM” (500GB – 5TB?)

    Non-linear – because real-world problems are not.

    Single CPU – Because parallelization is hard
    (and often unnecessary)


  3.

  4.

    Why not to do out of core learning

    The scikit-learn way

    Hashing trick

    Kernel approximation

    Random neural nets

    Supervised Feature Extraction

    Neural nets

    What else is out there


  5. Why not to do out of core learning.


  6. Your data is not that big!


  7.


  8. "256GB ought to be enough for anybody."
    - me


  9. "256GB ought to be enough for anybody."
    - me
    (for machine learning)


  10. Subsample!

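    Before reaching for out-of-core machinery, a random subsample that fits in
    RAM is often all you need. A minimal sketch, assuming the data sits in a
    binary file readable with numpy.memmap (file name and shape are made up):

        import numpy as np

        n_rows, n_features = 50_000_000, 100          # hypothetical dataset shape
        X = np.memmap("X.dat", dtype="float32", mode="r",
                      shape=(n_rows, n_features))     # rows are read lazily from disk
        rng = np.random.RandomState(0)
        idx = np.sort(rng.choice(n_rows, size=100_000, replace=False))
        X_small = np.asarray(X[idx])                  # only the sampled rows land in RAM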

  11. The scikit-learn way


  12. [Diagram] Data on HDD / network → your for-loop or polling →
    estimator.partial_fit(X_batch, y_batch) → trained scikit-learn estimator

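    In code, the diagram above is just a loop around partial_fit. A minimal
    sketch, assuming a hypothetical read_batches generator that yields NumPy
    arrays from disk or the network:

        from sklearn.linear_model import SGDClassifier

        clf = SGDClassifier()
        classes = [0, 1]            # partial_fit needs all labels declared up front

        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(X_batch, y_batch, classes=classes)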

  13. Linear Classification


  14. Linear Classification


  15. Linear Classification

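    The linear classifiers that support this loop in scikit-learn are the ones
    with a partial_fit method: SGDClassifier, Perceptron,
    PassiveAggressiveClassifier, and the naive Bayes models. A hedged sketch of
    logistic regression via SGD, with the same hypothetical loader as above:

        from sklearn.linear_model import SGDClassifier

        # loss="log" gives logistic regression trained with SGD
        # (newer scikit-learn versions spell it "log_loss")
        clf = SGDClassifier(loss="log", alpha=1e-5)

        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(X_batch, y_batch, classes=[0, 1])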

  16. 1st nonlinear option: Stateless Transformers


  17. Text Classification: Bag of Words
    “This is how you get ants.”
      → tokenizer →
    ['this', 'is', 'how', 'you', 'get', 'ants']
      → build a vocabulary over all documents →
    ['aardvark', 'amsterdam', 'ants', ..., 'you', 'your', 'zyxst']
      → sparse matrix encoding →
    [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0]
              ants       get        you

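    The catch for out-of-core learning is that the vocabulary is state:
    CountVectorizer has to see every document before it can encode any of them.
    A small illustration:

        from sklearn.feature_extraction.text import CountVectorizer

        docs = ["This is how you get ants."]
        vec = CountVectorizer()
        X = vec.fit_transform(docs)     # fit builds the vocabulary over all documents
        print(sorted(vec.vocabulary_))  # ['ants', 'get', 'how', 'is', 'this', 'you']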

  18. Text Classification: Hashing Trick
    “This is how you get ants.”
      → tokenizer →
    ['this', 'is', 'how', 'you', 'get', 'ants']
      → hashing →
    [hash('this'), hash('is'), hash('how'), hash('you'),
     hash('get'), hash('ants')]
    = [832412, 223788, 366226, 81185, 835749, 173092]
      → sparse matrix encoding →
    [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0]


  19. Text Classification: Hashing Trick

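    HashingVectorizer is the stateless counterpart: no vocabulary and no fit,
    so it can encode batches as they stream past. A minimal sketch, assuming a
    hypothetical doc_batches generator of (texts, labels) pairs:

        from sklearn.feature_extraction.text import HashingVectorizer
        from sklearn.linear_model import SGDClassifier

        vec = HashingVectorizer(n_features=2**20)   # stateless: same mapping every batch
        clf = SGDClassifier()

        for texts, labels in doc_batches():         # hypothetical generator
            X = vec.transform(texts)                # sparse matrix, no fit needed
            clf.partial_fit(X, labels, classes=[0, 1])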

  20. Kernel Approximation

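    Kernel approximation here means transformers such as RBFSampler (random
    Fourier features) or Nystroem. RBFSampler's fit only draws a random
    projection, so it pairs naturally with an online linear model. A sketch,
    again with a hypothetical batch loader:

        from sklearn.kernel_approximation import RBFSampler
        from sklearn.linear_model import SGDClassifier

        rbf = RBFSampler(gamma=0.1, n_components=500, random_state=0)
        clf = SGDClassifier()

        fitted = False
        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            if not fitted:
                rbf.fit(X_batch)   # only fixes the random weights, from one batch
                fitted = True
            clf.partial_fit(rbf.transform(X_batch), y_batch, classes=[0, 1])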

  21. Random Neural Nets (not merged yet)

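    This transformer never landed in scikit-learn, but the idea is easy to
    hand-roll: a fixed random hidden layer with a nonlinearity, and a linear
    model trained online on top. A sketch with made-up names (W, b and
    random_hidden are not a library API):

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.RandomState(0)
        n_features, n_hidden = 100, 500     # hypothetical sizes
        W = rng.normal(scale=1.0 / np.sqrt(n_features), size=(n_features, n_hidden))
        b = rng.uniform(-1, 1, size=n_hidden)

        def random_hidden(X):
            return np.maximum(X @ W + b, 0)     # fixed random layer + ReLU

        clf = SGDClassifier()
        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(random_hidden(X_batch), y_batch, classes=[0, 1])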

  22. 2nd nonlinear option: Learn Transformations on Subsets


  23. Random Forests

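    One way to sketch the learn-on-a-subset pattern is RandomTreesEmbedding:
    fit the forest on a subsample that fits in RAM, then stream the full data
    through it into an online linear model (load_subsample and read_batches
    are hypothetical helpers):

        from sklearn.ensemble import RandomTreesEmbedding
        from sklearn.linear_model import SGDClassifier

        X_sub = load_subsample()       # hypothetical: a small in-RAM subset
        embed = RandomTreesEmbedding(n_estimators=100, random_state=0).fit(X_sub)

        clf = SGDClassifier()
        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(embed.transform(X_batch), y_batch, classes=[0, 1])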

  24. 3rd nonlinear option: Online Nonlinear Classification


  25. Neural Networks (MLPs) (not merged yet)


  26. Neural Networks (MLPs) (not merged yet)

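    MLPClassifier was merged after this talk (scikit-learn 0.18), and it
    supports partial_fit, so the same loop gives an online nonlinear model:

        from sklearn.neural_network import MLPClassifier

        mlp = MLPClassifier(hidden_layer_sizes=(100,))

        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            mlp.partial_fit(X_batch, y_batch, classes=[0, 1])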

  27. What Else is Out There?

    Vowpal Wabbit (VW)

    More deep learning

    Hogwild!


  28. CDS is hiring Research Engineers


  29. Thank you!
    (and talk to me if you still think you need a cluster for ML)
    @t3kcit
    @amueller
    [email protected]
