Slide 1

Slide 1 text

Large scale non-linear learning on a single CPU
Andreas Mueller, NYU / scikit-learn

Slide 2

Slide 2 text

● Large Scale – “Out of core: fits on a hard disk but not in RAM” (500GB – 5TB?)
● Non-linear – because real-world problems are not linear.
● Single CPU – because parallelization is hard (and often unnecessary).

Slide 3

Slide 3 text


Slide 4

Slide 4 text

● Why not to do out-of-core learning
● The scikit-learn way
● Hashing trick
● Kernel approximation
● Random neural nets
● Supervised feature extraction
● Neural nets
● What else is out there

Slide 5

Slide 5 text

Why not to do out-of-core learning.

Slide 6

Slide 6 text

Your data is not that big!

Slide 7

Slide 7 text


Slide 8

Slide 8 text

"256GB ought to be enough for anybody." - me

Slide 9

Slide 9 text

"256GB ought to be enough for anybody." - me (for machine learning)

Slide 10

Slide 10 text

Subsample!
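Before reaching for out-of-core machinery, subsampling is often enough. A minimal sketch, assuming a large CSV file (data.csv and the row counts are placeholders, not from the slides) from which a random subset that fits in RAM is read:

    import numpy as np
    import pandas as pd

    rng = np.random.RandomState(42)
    n_rows = 10_000_000      # assumed total number of data rows in the file
    n_keep = 1_000_000       # rows that comfortably fit in RAM

    # Randomly pick data rows to skip (row 0 is the header), keeping ~n_keep rows.
    skip = rng.choice(np.arange(1, n_rows + 1), size=n_rows - n_keep, replace=False)
    sample = pd.read_csv("data.csv", skiprows=skip)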

Slide 11

Slide 11 text

The scikit-learn way

Slide 12

Slide 12 text

HDD / network → your for-loop / polling → estimator.partial_fit(X_batch, y_batch) → trained scikit-learn estimator
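A minimal sketch of this loop, assuming a get_batches() helper (not part of scikit-learn) that yields mini-batches read from disk or the network:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    classes = np.array([0, 1])     # partial_fit needs the full class list up front
    clf = SGDClassifier()

    for X_batch, y_batch in get_batches():    # your for-loop / polling
        clf.partial_fit(X_batch, y_batch, classes=classes)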

Slide 13

Slide 13 text

Linear Classification

Slide 14

Slide 14 text

Linear Classification

Slide 15

Slide 15 text

Linear Classification

Slide 16

Slide 16 text

1st nonlinear option: Stateless Transformers

Slide 17

Slide 17 text

Text Classification: Bag of Words
“This is how you get ants.”
→ tokenizer → ['this', 'is', 'how', 'you', 'get', 'ants']
→ build a vocabulary over all documents: ['aardvark', 'amsterdam', 'ants', ..., 'you', 'your', 'zyxst']
→ sparse matrix encoding: [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0] (ones at the vocabulary positions of 'ants', 'get', 'you')
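In scikit-learn this is CountVectorizer; a minimal sketch with the sentence from the slide (note that building the vocabulary requires seeing all documents, which is exactly what breaks in the out-of-core setting):

    from sklearn.feature_extraction.text import CountVectorizer

    vect = CountVectorizer()
    X = vect.fit_transform(["This is how you get ants."])   # builds the vocabulary
    print(vect.vocabulary_)   # {'ants': 0, 'get': 1, 'how': 2, 'is': 3, 'this': 4, 'you': 5}
    print(X.toarray())        # one sparse count row per document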

Slide 18

Slide 18 text

Text Classification: Hashing Trick
“This is how you get ants.”
→ tokenizer → ['this', 'is', 'how', 'you', 'get', 'ants']
→ hashing → [hash('this'), hash('is'), hash('how'), hash('you'), hash('get'), hash('ants')] = [832412, 223788, 366226, 81185, 835749, 173092]
→ sparse matrix encoding: [0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0]
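The stateless equivalent in scikit-learn is HashingVectorizer: a rough sketch showing that no vocabulary and no fit step are needed, so it can be applied to streaming batches:

    from sklearn.feature_extraction.text import HashingVectorizer

    vect = HashingVectorizer(n_features=2 ** 20)          # fixed output width, no state
    X = vect.transform(["This is how you get ants."])     # no fit needed
    print(X.shape)                                        # (1, 1048576), sparse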

Slide 19

Slide 19 text

Text Classification: Hashing Trick

Slide 20

Slide 20 text

Kernel Approximation
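A minimal sketch using RBFSampler (random Fourier features) from sklearn.kernel_approximation: its fit only draws random weights, so it behaves like a stateless transformer and can sit in front of a linear model applied batch-wise. The digits dataset just stands in for a larger problem, and the gamma / n_components values are illustrative:

    from sklearn.datasets import load_digits
    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import SGDClassifier

    X, y = load_digits(return_X_y=True)
    rbf = RBFSampler(gamma=0.01, n_components=500, random_state=0)
    X_feat = rbf.fit_transform(X)        # fit only samples random projection weights
    clf = SGDClassifier().fit(X_feat, y)
    print(clf.score(X_feat, y))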

Slide 21

Slide 21 text

Random Neural Nets (not merged yet)
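The random-neural-net transformer this slide refers to was an open scikit-learn pull request at the time; the following is only a rough numpy sketch of the underlying idea (a fixed random hidden layer feeding a trainable linear model), not that PR's API. Sizes and get_batches() are assumptions:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    n_features, n_hidden = 64, 500                              # illustrative sizes
    W = rng.normal(scale=0.1, size=(n_features, n_hidden))      # random, never trained
    b = rng.normal(scale=0.1, size=n_hidden)

    def random_hidden_layer(X):
        # Stateless nonlinear expansion: project with fixed random weights, squash.
        return np.tanh(X @ W + b)

    # clf = SGDClassifier()
    # for X_batch, y_batch in get_batches():    # hypothetical batch source
    #     clf.partial_fit(random_hidden_layer(X_batch), y_batch, classes=[0, 1])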

Slide 22

Slide 22 text

2nd nonlinear option: Learn Transformations on Subsets

Slide 23

Slide 23 text

Random Forests
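One way to realize this pattern (a sketch, not necessarily the exact recipe from the slide): fit a tree-based transformation such as RandomTreesEmbedding on a subsample that fits in RAM, then stream the remaining data through it into a linear model. X_sub, get_batches(), and the class list are assumed helpers:

    from sklearn.ensemble import RandomTreesEmbedding
    from sklearn.linear_model import SGDClassifier

    embedding = RandomTreesEmbedding(n_estimators=100, random_state=0)
    embedding.fit(X_sub)                     # learned on an in-memory subsample only

    clf = SGDClassifier()
    for X_batch, y_batch in get_batches():   # stream the rest of the data
        clf.partial_fit(embedding.transform(X_batch), y_batch, classes=[0, 1])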

Slide 24

Slide 24 text

3rd nonlinear option: Online Nonlinear Classification

Slide 25

Slide 25 text

Neural Networks (MLPs) (not merged yet)

Slide 26

Slide 26 text

Neural Networks (MLPs) (not merged yet)
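Multi-layer perceptrons were not merged into scikit-learn when this talk was given; in later releases they are available as MLPClassifier, which also supports partial_fit for out-of-core training. A minimal sketch (get_batches() and the class list are again assumptions):

    from sklearn.neural_network import MLPClassifier

    clf = MLPClassifier(hidden_layer_sizes=(100,))
    for X_batch, y_batch in get_batches():
        clf.partial_fit(X_batch, y_batch, classes=[0, 1])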

Slide 27

Slide 27 text

What Else is Out There?
● Vowpal Wabbit (VW)
● More deep learning
● Hogwild!

Slide 28

Slide 28 text

CDS is hiring Research Engineers

Slide 29

Slide 29 text

Thank you! (and talk to me if you still think you need a cluster for ML)
@t3kcit
@amueller
[email protected]