Large Scale Non-Linear Learning (Pygotham 2015)

Out of core learning with scikit-learn, and why not to use a cluster.

Andreas Mueller

August 16, 2015

Transcript

  1. Large scale non-linear learning on a single CPU Andreas Mueller

    NYU / scikit-learn
  2. Andreas Mueller 2 • Large Scale – “Out of core:

    Fits on a hard disk but not in RAM” (500GB – 5TB?) • Non-linear – because real-world problems are not. • Single CPU – Because parallelization is hard (and often unnecessary)
  3. 3

  4. Andreas Mueller 4 • Why not to do out of

    core learning • The scikit-learn way • Hashing trick • Kernel approximation • Random neural nets • Supervised Feature Extraction • Neural nets • What else is out there
  5. Andreas Mueller 5 Why not to do out of core

    learning.
  6. Andreas Mueller 6 Your data is not that big!

  7. Andreas Mueller 7

  8. Andreas Mueller 8 "256GB ought to be enough for anybody."

    - me
  9. Andreas Mueller 9 "256GB ought to be enough for anybody."

    - me (for machine learning)
  10. 12 Subsample!

  11. 13 The scikit-learn way

  12. 14 HDD / network → your for-loop / polling → estimator.partial_fit(X_batch, y_batch) → trained scikit-learn estimator
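
    A minimal sketch of this pattern, assuming a placeholder batch iterator (real code would read successive chunks from disk or over the network):

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        def iter_batches():
            # placeholder: yield (X_batch, y_batch) chunks read from HDD / network
            rng = np.random.RandomState(0)
            for _ in range(100):
                yield rng.rand(1000, 20), rng.randint(2, size=1000)

        clf = SGDClassifier()
        classes = np.array([0, 1])  # partial_fit needs all classes up front
        for X_batch, y_batch in iter_batches():
            clf.partial_fit(X_batch, y_batch, classes=classes)
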
  13. 15 Linear Classification

  14. 16 Linear Classification

  15. 17 Linear Classification

  16. 18 1st nonlinear option: Stateless Transformers

  17. 19 Text Classification: Bag of Words

    “This is how you get ants.” → tokenizer → ['this', 'is', 'how', 'you', 'get', 'ants'] → sparse matrix encoding → [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0] (ones at the vocabulary positions of 'ants', 'get', 'you'); the vocabulary ['aardvark', 'amsterdam', 'ants', …, 'you', 'your', 'zyxst'] is built over all documents.
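
    For contrast, the in-memory version of this encoding is CountVectorizer; its fit step has to see every document to build the vocabulary, which is exactly what does not work out of core. The documents below are made up:

        from sklearn.feature_extraction.text import CountVectorizer

        docs = ["This is how you get ants.",
                "You get ants by leaving food out."]  # made-up documents

        vectorizer = CountVectorizer()
        X = vectorizer.fit_transform(docs)  # builds the vocabulary over all documents
        print(vectorizer.vocabulary_)       # word -> column index
        print(X.toarray())                  # bag-of-words counts
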
  18. 20 Text Classification: Hashing Trick

    “This is how you get ants.” → tokenizer → ['this', 'is', 'how', 'you', 'get', 'ants'] → hashing → [hash('this'), hash('is'), hash('how'), hash('you'), hash('get'), hash('ants')] = [832412, 223788, 366226, 81185, 835749, 173092] → sparse matrix encoding → [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0] (ones at the hashed positions of 'ants', 'get', 'you'; no vocabulary needed).
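
    A minimal out-of-core sketch with HashingVectorizer, which is stateless and needs no fit; the batch generator is a placeholder:

        import numpy as np
        from sklearn.feature_extraction.text import HashingVectorizer
        from sklearn.linear_model import SGDClassifier

        vectorizer = HashingVectorizer(n_features=2 ** 20)  # stateless: no vocabulary to fit
        clf = SGDClassifier()
        classes = np.array([0, 1])

        def iter_text_batches():
            # placeholder: yield (documents, labels) chunks read from disk
            yield ["This is how you get ants.", "no ants here"], np.array([1, 0])

        for docs, y in iter_text_batches():
            X = vectorizer.transform(docs)  # one batch at a time
            clf.partial_fit(X, y, classes=classes)
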
  19. 21 Text Classification: Hashing Trick

  20. 22 Kernel Approximation
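
    Sketch of the idea: approximate an RBF kernel with random Fourier features (RBFSampler) and train an online linear classifier on the transformed batches; the data, gamma and component count below are placeholders:

        import numpy as np
        from sklearn.kernel_approximation import RBFSampler
        from sklearn.linear_model import SGDClassifier

        rbf = RBFSampler(gamma=0.1, n_components=300, random_state=0)
        rbf.fit(np.zeros((1, 20)))  # fit only needs the number of input features
        clf = SGDClassifier()
        classes = np.array([0, 1])

        rng = np.random.RandomState(0)
        for _ in range(100):  # placeholder batch loop
            X_batch = rng.rand(500, 20)
            y_batch = rng.randint(2, size=500)
            clf.partial_fit(rbf.transform(X_batch), y_batch, classes=classes)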

  21. 23 Random Neural Nets (not merged yet)
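
    The random-neural-net transformer was an open pull request at the time and not part of a scikit-learn release; the idea (a fixed random projection pushed through a nonlinearity, with only a linear output layer trained online) can be sketched by hand, with all sizes and data below made up:

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.RandomState(0)
        W = rng.normal(size=(20, 500))  # fixed random hidden weights, never trained
        b = rng.uniform(-1, 1, size=500)

        def random_hidden_layer(X):
            return np.tanh(X @ W + b)  # random features through a nonlinearity

        clf = SGDClassifier()
        classes = np.array([0, 1])
        for _ in range(100):  # placeholder batch loop
            X_batch = rng.rand(500, 20)
            y_batch = rng.randint(2, size=500)
            clf.partial_fit(random_hidden_layer(X_batch), y_batch, classes=classes)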

  22. 24 2nd nonlinear option: Learn Transformations on Subsets

  23. 25 RandomForests
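
    One reading of this option, sketched under assumptions: fit a tree-based transformer (here RandomTreesEmbedding) on a subsample that fits in RAM, then stream the full data set through it in batches into an online linear classifier. Data and sizes below are placeholders:

        import numpy as np
        from sklearn.ensemble import RandomTreesEmbedding
        from sklearn.linear_model import SGDClassifier

        rng = np.random.RandomState(0)
        X_sub = rng.rand(10000, 20)  # placeholder subsample that fits in memory
        trees = RandomTreesEmbedding(n_estimators=100, random_state=0).fit(X_sub)

        clf = SGDClassifier()
        classes = np.array([0, 1])
        for _ in range(100):  # placeholder: stream the full data set in batches
            X_batch = rng.rand(500, 20)
            y_batch = rng.randint(2, size=500)
            clf.partial_fit(trees.transform(X_batch), y_batch, classes=classes)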

  24. 26 3rd nonlinear option: Online Nonlinear Classification

  25. 27 (not merged yet) Neural Networks (MLPs)

  26. 28 (not merged yet) Neural Networks (MLPs)
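
    MLPClassifier had not been merged when this talk was given; in later scikit-learn releases (0.18 and up) it supports partial_fit with the sgd and adam solvers, so the same mini-batch loop applies. A minimal sketch with placeholder data and settings:

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        clf = MLPClassifier(hidden_layer_sizes=(100,), solver="sgd")  # illustrative settings
        classes = np.array([0, 1])

        rng = np.random.RandomState(0)
        for _ in range(100):  # placeholder batch loop
            X_batch = rng.rand(500, 20)
            y_batch = rng.randint(2, size=500)
            clf.partial_fit(X_batch, y_batch, classes=classes)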

  27. 29 What Else is Out There? • Vowpal Wabbit (VW)

    • More deep learning • Hogwild!
  28. 30 CDS is hiring Research Engineers

  29. 31 Thank you! (and talk to me if you still

    think you need a cluster for ML) @t3kcit @amueller t3kcit@gmail.com