
Large Scale Non-Linear Learning (Pygotham 2015)


Out of core learning with scikit-learn, and why not to use a cluster.

Andreas Mueller

August 16, 2015

Transcript

  1. Large scale non-linear learning
    on a single CPU
    Andreas Mueller
    NYU / scikit-learn


  2. Large Scale – “Out of core: fits on a hard disk but
    not in RAM” (500GB – 5TB?)

    Non-linear – because real-world problems are not.

    Single CPU – Because parallelization is hard
    (and often unnecessary)


  3.

  4.

    Why not to do out of core learning

    The scikit-learn way

    Hashing trick

    Kernel approximation

    Random neural nets

    Supervised Feature Extraction

    Neural nets

    What else is out there


  5. Why not to do out of core learning.


  6. Your data is not that big!


  7.


  8. "256GB ought to be enough for anybody."
    - me


  9. "256GB ought to be enough for anybody."
    - me
    (for machine learning)


  10. Subsample!

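    Before reaching for out-of-core machinery, a random subsample that fits in
    RAM is often all you need. A minimal sketch, assuming the data sits in a
    binary file readable with numpy.memmap (file name and shape are made up):

        import numpy as np

        n_rows, n_features = 50_000_000, 100          # hypothetical dataset shape
        X = np.memmap("X.dat", dtype="float32", mode="r",
                      shape=(n_rows, n_features))     # rows are read lazily from disk
        rng = np.random.RandomState(0)
        idx = np.sort(rng.choice(n_rows, size=100_000, replace=False))
        X_small = np.asarray(X[idx])                  # only the sampled rows land in RAM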

  11. The scikit-learn way


  12. [Diagram] Data on HDD / network → your for-loop or polling →
    estimator.partial_fit(X_batch, y_batch) → trained scikit-learn estimator

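    In code, the diagram above is just a loop around partial_fit. A minimal
    sketch, assuming a hypothetical read_batches generator that yields NumPy
    arrays from disk or the network:

        from sklearn.linear_model import SGDClassifier

        clf = SGDClassifier()
        classes = [0, 1]            # partial_fit needs all labels declared up front

        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(X_batch, y_batch, classes=classes)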

  13. Linear Classification


  14. Linear Classification


  15. Linear Classification

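    The linear classifiers that support this loop in scikit-learn are the ones
    with a partial_fit method: SGDClassifier, Perceptron,
    PassiveAggressiveClassifier, and the naive Bayes models. A hedged sketch of
    logistic regression via SGD, with the same hypothetical loader as above:

        from sklearn.linear_model import SGDClassifier

        # loss="log" gives logistic regression trained with SGD
        # (newer scikit-learn versions spell it "log_loss")
        clf = SGDClassifier(loss="log", alpha=1e-5)

        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(X_batch, y_batch, classes=[0, 1])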

  16. 1st nonlinear option: Stateless Transformers


  17. Text Classification: Bag of Words
    “This is how you get ants.”
      → tokenizer →
    ['this', 'is', 'how', 'you', 'get', 'ants']
      → build a vocabulary over all documents →
    ['aardvark', 'amsterdam', 'ants', ..., 'you', 'your', 'zyxst']
      → sparse matrix encoding →
    [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0]
              ants       get        you

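    The catch for out-of-core learning is that the vocabulary is state:
    CountVectorizer has to see every document before it can encode any of them.
    A small illustration:

        from sklearn.feature_extraction.text import CountVectorizer

        docs = ["This is how you get ants."]
        vec = CountVectorizer()
        X = vec.fit_transform(docs)     # fit builds the vocabulary over all documents
        print(sorted(vec.vocabulary_))  # ['ants', 'get', 'how', 'is', 'this', 'you']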

  18. Text Classification: Hashing Trick
    “This is how you get ants.”
      → tokenizer →
    ['this', 'is', 'how', 'you', 'get', 'ants']
      → hashing →
    [hash('this'), hash('is'), hash('how'), hash('you'),
     hash('get'), hash('ants')]
    = [832412, 223788, 366226, 81185, 835749, 173092]
      → sparse matrix encoding →
    [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0]


  19. Text Classification: Hashing Trick

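    HashingVectorizer is the stateless counterpart: no vocabulary and no fit,
    so it can encode batches as they stream past. A minimal sketch, assuming a
    hypothetical doc_batches generator of (texts, labels) pairs:

        from sklearn.feature_extraction.text import HashingVectorizer
        from sklearn.linear_model import SGDClassifier

        vec = HashingVectorizer(n_features=2**20)   # stateless: same mapping every batch
        clf = SGDClassifier()

        for texts, labels in doc_batches():         # hypothetical generator
            X = vec.transform(texts)                # sparse matrix, no fit needed
            clf.partial_fit(X, labels, classes=[0, 1])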

  20. Kernel Approximation

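    Kernel approximation here means transformers such as RBFSampler (random
    Fourier features) or Nystroem. RBFSampler's fit only draws a random
    projection, so it pairs naturally with an online linear model. A sketch,
    again with a hypothetical batch loader:

        from sklearn.kernel_approximation import RBFSampler
        from sklearn.linear_model import SGDClassifier

        rbf = RBFSampler(gamma=0.1, n_components=500, random_state=0)
        clf = SGDClassifier()

        fitted = False
        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            if not fitted:
                rbf.fit(X_batch)   # only fixes the random weights, from one batch
                fitted = True
            clf.partial_fit(rbf.transform(X_batch), y_batch, classes=[0, 1])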

  21. Random Neural Nets (not merged yet)

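    This transformer never landed in scikit-learn, but the idea is easy to
    hand-roll: a fixed random hidden layer with a nonlinearity, and a linear
    model trained online on top. A sketch with made-up names (W, b and
    random_hidden are not a library API):

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.RandomState(0)
        n_features, n_hidden = 100, 500     # hypothetical sizes
        W = rng.normal(scale=1.0 / np.sqrt(n_features), size=(n_features, n_hidden))
        b = rng.uniform(-1, 1, size=n_hidden)

        def random_hidden(X):
            return np.maximum(X @ W + b, 0)     # fixed random layer + ReLU

        clf = SGDClassifier()
        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(random_hidden(X_batch), y_batch, classes=[0, 1])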

  22. 2nd nonlinear option: Learn Transformations on Subsets


  23. Random Forests

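    One way to sketch the learn-on-a-subset pattern is RandomTreesEmbedding:
    fit the forest on a subsample that fits in RAM, then stream the full data
    through it into an online linear model (load_subsample and read_batches
    are hypothetical helpers):

        from sklearn.ensemble import RandomTreesEmbedding
        from sklearn.linear_model import SGDClassifier

        X_sub = load_subsample()       # hypothetical: a small in-RAM subset
        embed = RandomTreesEmbedding(n_estimators=100, random_state=0).fit(X_sub)

        clf = SGDClassifier()
        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            clf.partial_fit(embed.transform(X_batch), y_batch, classes=[0, 1])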

  24. 3rd nonlinear option: Online Nonlinear Classification


  25. Neural Networks (MLPs) (not merged yet)


  26. Neural Networks (MLPs) (not merged yet)

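    MLPClassifier was merged after this talk (scikit-learn 0.18), and it
    supports partial_fit, so the same loop gives an online nonlinear model:

        from sklearn.neural_network import MLPClassifier

        mlp = MLPClassifier(hidden_layer_sizes=(100,))

        for X_batch, y_batch in read_batches("data.csv"):   # hypothetical loader
            mlp.partial_fit(X_batch, y_batch, classes=[0, 1])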

  27. What Else is Out There?

    Vowpal Wabbit (VW)

    More deep learning

    Hogwild!


  28. CDS is hiring Research Engineers


  29. Thank you!
    (and talk to me if you still think you need a cluster for ML)
    @t3kcit
    @amueller
    [email protected]
