Large Scale Non-Linear Learning (Pygotham 2015)

Out of core learning with scikit-learn, and why not to use a cluster.

Andreas Mueller

August 16, 2015

Transcript

  1. • Large Scale – "out of core": fits on a hard disk but not in RAM (500GB – 5TB?)
    • Non-linear – because real-world problems are not linear.
    • Single CPU – because parallelization is hard (and often unnecessary).
  2. 3

  3. • Why not to do out-of-core learning
    • The scikit-learn way
    • Hashing trick
    • Kernel approximation
    • Random neural nets
    • Supervised feature extraction
    • Neural nets
    • What else is out there
  4. Text Classification: Bag of Words

    "This is how you get ants."
    → tokenizer → ['this', 'is', 'how', 'you', 'get', 'ants']
    → build a vocabulary over all documents: ['aardvark', 'amsterdam', 'ants', ... 'you', 'your', 'zyxst']
    → sparse matrix encoding: [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0] (columns run from aardvark to zyxst; the 1s shown sit at ants, get, you)
  5. Text Classification: Hashing Trick

    "This is how you get ants."
    → tokenizer → ['this', 'is', 'how', 'you', 'get', 'ants']
    → hashing → [hash('this'), hash('is'), hash('how'), hash('you'), hash('get'), hash('ants')] = [832412, 223788, 366226, 81185, 835749, 173092]
    → sparse matrix encoding: [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0]
  6. What Else is Out There?

    • Vowpal Wabbit (VW)
    • More deep learning
    • Hogwild!
  7. Thank you! (and talk to me if you still think you need a cluster for ML)

    @t3kcit @amueller [email protected]
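
The two text-classification slides above (bag of words and the hashing trick) can be contrasted with a minimal sketch in plain Python. This is an illustration under stated assumptions, not the code from the talk: the function names (`tokenize`, `build_vocabulary`, `bag_of_words`, `stable_hash`, `hashed_bag_of_words`) and the `n_features` parameter are hypothetical, the tokenizer is deliberately simplistic, and SHA-256 stands in for whatever hash function the slides use, chosen only because it gives the same result on every run.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Bag of words, step 1: one pass over ALL documents to assign column indices."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: col for col, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Bag of words, step 2: sparse row as {column: count}; unknown tokens are dropped."""
    row = {}
    for tok in tokenize(text):
        if tok in vocabulary:
            col = vocabulary[tok]
            row[col] = row.get(col, 0) + 1
    return row


def stable_hash(token):
    """Deterministic integer hash of a token (assumption: SHA-256 as a stand-in)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hashed_bag_of_words(text, n_features=2 ** 20):
    """Hashing trick: the column is hash(token) mod n_features, so no vocabulary is needed."""
    row = {}
    for tok in tokenize(text):
        col = stable_hash(tok) % n_features
        row[col] = row.get(col, 0) + 1  # colliding tokens share a column
    return row


documents = ["This is how you get ants.", "Your aardvark is in Amsterdam."]
vocabulary = build_vocabulary(documents)
print(bag_of_words(documents[0], vocabulary))
print(hashed_bag_of_words(documents[0]))
```

The design difference is what matters for out-of-core work: the vocabulary approach needs a full pass over the data before any document can be encoded, while the hashed version encodes each document independently, at the cost of possible collisions and no way to map a column back to a word. scikit-learn provides `HashingVectorizer` for this purpose.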