Slide 1

Slide 1 text

Machine Learning
Tom Maiaroto
@shift8creative

Slide 2

Slide 2 text

What is Machine Learning?

Slide 3

Slide 3 text

Algorithms & Approaches
• Decision trees
• Random forests
• Artificial neural networks
• k-NN (nearest neighbour)
• Naive Bayesian classifier

Slide 4

Slide 4 text

Algorithms & Approaches
• Decision trees
• Random forests
• Artificial neural networks
• k-NN (nearest neighbour)
• Naive Bayesian classifier

Slide 5

Slide 5 text

So could machines one day rule the earth?

Slide 6

Slide 6 text

So could machines one day rule the earth?
Maybe (ok, probably not)

Slide 7

Slide 7 text

What can Machine Learning do for Apps?
Spam filtering

Slide 8

Slide 8 text

What can Machine Learning do for Apps?
Auto-tagging

Slide 9

Slide 9 text

What can Machine Learning do for Apps?
All Sorts of Categorization

Slide 10

Slide 10 text

What can Machine Learning do for Apps?
Sentiment Analysis

Slide 11

Slide 11 text

Languages Commonly Used
• Java
  o Java-ML, WEKA, Apache Mahout, many more...
• Python
  o NLTK, scikit-learn, PyML, a good deal more...
• C++
  o libDAI, Armadillo, Orange, tons more...
and then some others...

Slide 12

Slide 12 text

Languages Commonly Used
http://www.mloss.org

Slide 13

Slide 13 text

MongoDB Too!
• Map/Reduce
• Stored JavaScript
• Geo-spatial Indexing
• Replication

Slide 14

Slide 14 text

Geo-spatial Indexing
Did someone say nearest neighbour?

Slide 15

Slide 15 text

Geo-spatial Indexing
Did someone say nearest neighbour?
Design geeks, imagine the visualizations...
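
The nearest-neighbour idea this slide alludes to can be sketched in plain Python, with no database involved. This is a brute-force k-NN over [longitude, latitude] points using the haversine distance; a geo-spatial index exists to avoid exactly this kind of full scan. The function names and the sample places are hypothetical, chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def k_nearest(points, query, k):
    """Return the k points closest to query by scanning every point."""
    return sorted(points, key=lambda p: haversine_km(p["loc"], query))[:k]


# Hypothetical sample data, stored as (longitude, latitude).
places = [
    {"name": "New York", "loc": (-74.006, 40.7128)},
    {"name": "Boston", "loc": (-71.0589, 42.3601)},
    {"name": "Los Angeles", "loc": (-118.2437, 34.0522)},
]

philadelphia = (-75.1652, 39.9526)
nearest = k_nearest(places, philadelphia, k=2)
print([p["name"] for p in nearest])
```

The scan is O(n) per query, which is fine for a demo and is the cost an index is meant to remove.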

Slide 16

Slide 16 text

Replication
• Store massive amounts of data
• Distributed performance benefits
• Dedicated databases for calculations
All the obvious benefits.

Slide 17

Slide 17 text

Map/Reduce
It's the brain.

Slide 18

Slide 18 text

Map/Reduce
It's the brain.
It's not just for aggregation.

Slide 19

Slide 19 text

Map/Reduce
It's the brain.
It's not just for aggregation.
It's faster than you might think.

Slide 20

Slide 20 text

Map/Reduce
It's the brain.
It's not just for aggregation.
It's faster than you might think.
It runs in the database.
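
For readers new to the pattern, the map/reduce flow can be sketched in a few lines of plain Python. This only imitates the shape of the computation (emit key/value pairs, group them by key, reduce each group); it is not MongoDB's API, and `map_reduce`, `map_words`, and `reduce_counts` are hypothetical names.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted values by key,
    then collapse each group with reduce_fn."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_words(doc):
    """Emit (word, 1) for each word in the document's text."""
    for word in doc["text"].lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Sum the emitted counts for one key."""
    return sum(values)


docs = [{"text": "the cat sat"}, {"text": "the dog sat down"}]
counts = map_reduce(docs, map_words, reduce_counts)
print(counts)
```

Swapping in a different `map_fn` and `reduce_fn` changes what is computed without touching the driver, which is what lets the same pattern serve for more than simple aggregation.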

Slide 21

Slide 21 text

Map/Reduce
In the computer...

Slide 22

Slide 22 text

Example Time!
It's simple... Just take this...

Slide 23

Slide 23 text

Example Time!
It's simple... Just take this...

Slide 24

Slide 24 text

Example Time!
Just kidding...
Let's Break Down a Naive Bayes Classifier

Slide 25

Slide 25 text

Classification/Naive Bayes
Training the System

Slide 26

Slide 26 text

Classification/Naive Bayes
Training the System
Simple... $inc

Slide 27

Slide 27 text

Classification/Naive Bayes
Just Keep Count of Words per Category
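
Training as described here amounts to incrementing a counter for each (category, word) pair, which is the job an `$inc` update does on the database side. Below is a minimal in-memory sketch with plain dictionaries standing in for the collection. The `new_model`, `tokenize`, and `train` helpers and the sample sentences are assumptions made for illustration; the category codes `cae` and `cs` are borrowed from the results slide later in the deck.

```python
from collections import defaultdict


def new_model():
    """Create empty counters: per-category word counts, word totals, doc totals."""
    return {
        "words": defaultdict(lambda: defaultdict(int)),
        "totals": defaultdict(int),
        "docs": defaultdict(int),
    }


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,!?").lower() for w in text.split())
    return [t for t in tokens if t]


def train(model, category, text):
    """Increment the counters for every word of one training document."""
    for word in tokenize(text):
        model["words"][category][word] += 1  # the in-memory analogue of $inc
        model["totals"][category] += 1
    model["docs"][category] += 1


model = new_model()
train(model, "cae", "The new film opens this weekend.")
train(model, "cs", "The new telescope found a planet.")
print(dict(model["words"]["cae"]))
```

Because training only ever adds to counters, documents can be fed in any order and in any number of batches.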

Slide 28

Slide 28 text

Classification/Naive Bayes
Reduce:

Slide 29

Slide 29 text

Classification/Naive Bayes
Reduce:

Slide 30

Slide 30 text

Classification/Naive Bayes
Finalize:

Slide 31

Slide 31 text

Classification/Naive Bayes
Finalize:
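
The reduce and finalize code shown on these slides did not survive in this transcript, so what follows is not the speaker's implementation. It is a generic multinomial Naive Bayes scorer with add-one (Laplace) smoothing, written in Python, to show the kind of calculation a finalize step could perform on the per-category word counts. The `score` function and the sample counts are hypothetical.

```python
import math


def score(counts, doc_counts, words):
    """Return a log-probability score per category for a list of words.

    counts:     {category: {word: count}} gathered during training
    doc_counts: {category: number of training documents}
    """
    vocab = {w for per_cat in counts.values() for w in per_cat}
    total_docs = sum(doc_counts.values())
    scores = {}
    for cat, per_cat in counts.items():
        total_words = sum(per_cat.values())
        log_p = math.log(doc_counts[cat] / total_docs)  # category prior
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing the score.
            log_p += math.log((per_cat.get(w, 0) + 1) / (total_words + len(vocab)))
        scores[cat] = log_p
    return scores


# Hypothetical training counts for two categories.
counts = {
    "cae": {"film": 3, "music": 2, "star": 1},
    "cs": {"planet": 3, "star": 2, "telescope": 1},
}
doc_counts = {"cae": 2, "cs": 2}

scores = score(counts, doc_counts, ["star", "planet"])
best = max(scores, key=scores.get)
print(best, scores)
```

Working in log space avoids underflow when a document has many words; the category with the highest score is the prediction.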

Slide 32

Slide 32 text

Classification/Naive Bayes
Call the Command:

Slide 33

Slide 33 text

Classification/Naive Bayes
Results:
Can see total words.
Can also see word counts per category.

Slide 34

Slide 34 text

Classification/Naive Bayes
Results:
...and of course the scores per category...
cae = arts and entertainment
cs = science
...

Slide 35

Slide 35 text

Classification/Naive Bayes
• Accurate even with little training
• MongoDB on a small VM took 1.7 seconds
• Compared to, say, PHP: 33 seconds, and it timed out
• More training data == exponentially faster than PHP

Slide 36

Slide 36 text

Classification/Naive Bayes
• This wasn't even a full map/reduce
• Your mileage will vary based on formula
• You can cache certain values for speed
• Don't forget about stored JavaScript (but use it wisely)

Slide 37

Slide 37 text

Porter Stemming Algorithm
Thank You Martin Porter
http://tartarus.org/martin/PorterStemmer

Slide 38

Slide 38 text

Porter Stemming Algorithm
• Exists for nearly every language
• MongoDB will use JavaScript of course
• Decent execution time
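
The full Porter algorithm works through several rule steps in sequence. As a taste of how it operates, here is only step 1a (plural handling) in Python. This is a partial illustration and not a complete stemmer; the function name `porter_step_1a` is hypothetical, and a real project would use an existing, tested implementation.

```python
def porter_step_1a(word):
    """Apply Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing).

    The rules are tried in order and only the first match is applied.
    """
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The stems are not always dictionary words ("ponies" becomes "poni"); what matters for classification is that related word forms collapse to the same token.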

Slide 39

Slide 39 text

Porter Stemming Algorithm
• About 2.5x faster than PHP class
• 663x faster than a web browser

Slide 40

Slide 40 text

Porter Stemming Algorithm
• About 2.5x faster than PHP class
• 663x faster than a web browser
• 7x slower than PHP PECL extension

Slide 41

Slide 41 text

Real World Application
Social Harvest
Analyzes social data from the internet to determine languages spoken, gender, age, sentiment, and categories.
www.social-harvest.com

Slide 42

Slide 42 text

Real World Application
Social Harvest
Who doesn't like pie charts?

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

Follow Tom
@shift8creative
www.shift8creative.com
www.social-harvest.com
www.union-of-rad.com
Thank You!