Algorithms & Approaches
Decision trees
Random forests
Artificial neural networks
k-NN (nearest neighbour)
Naive Bayesian classifier
So could machines one day rule the earth?
Maybe (OK, probably not)
What can Machine Learning do for Apps?
• Spam filtering
• Auto-tagging
• All sorts of categorization
• Sentiment analysis
Languages Commonly Used
• Java
    o Java-ML, WEKA, Apache Mahout, many more...
• Python
    o NLTK, scikit-learn, PyML, a good deal more...
• C++
    o libDAI, Armadillo, Orange, tons more...
and then some others...
Geo-spatial Indexing
Did someone say nearest neighbour?
Design geeks, imagine the visualizations...
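As a rough illustration of the nearest-neighbour idea behind geo-spatial queries, here is a brute-force sketch in plain JavaScript. The `places` data and the function names `haversineKm` and `nearest` are made up for this example. A geo-spatial index exists so that the database does not have to scan every point the way this code does.

```javascript
// Great-circle distance in kilometres between two [longitude, latitude] points
// (haversine formula, Earth radius taken as 6371 km).
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b[1] - a[1]);
  const dLon = toRad(b[0] - a[0]);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a[1])) * Math.cos(toRad(b[1])) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// Return the k places closest to `origin`, nearest first (linear scan).
function nearest(places, origin, k) {
  return places
    .map((p) => ({ name: p.name, km: haversineKm(origin, p.loc) }))
    .sort((x, y) => x.km - y.km)
    .slice(0, k);
}

// Hypothetical sample data, stored as [longitude, latitude].
const places = [
  { name: "London", loc: [-0.1276, 51.5072] },
  { name: "Paris", loc: [2.3522, 48.8566] },
  { name: "New York", loc: [-74.006, 40.7128] },
];

const closest = nearest(places, [4.9041, 52.3676], 2); // origin: Amsterdam
console.log(closest.map((p) => p.name));
```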
Replication
• Store massive amounts of data
• Distributed performance benefits
• Dedicated databases for calculations
All the obvious benefits.
Map/Reduce
It's the brain.
It's not just for aggregation.
It's faster than you might think.
It runs in the database.
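To make the moving parts concrete, here is a minimal in-memory imitation of the map, reduce and finalize flow in plain JavaScript. It is only a sketch of the idea: `runMapReduce` and the sample documents are invented here, and the real thing runs inside the database rather than in application code.

```javascript
// Tiny in-memory imitation of a map/reduce run.
// map(doc, emit) emits (key, value) pairs; reduce(key, values) folds them.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));

  const results = {};
  for (const [key, values] of groups) {
    // Like the database, skip reduce when a key has only a single value.
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    results[key] = finalize ? finalize(key, reduced) : reduced;
  }
  return results;
}

// Hypothetical documents: posts with tags.
const posts = [
  { title: "a", tags: ["mongodb", "ml"] },
  { title: "b", tags: ["ml"] },
  { title: "c", tags: ["ml", "php"] },
];

const tagCounts = runMapReduce(
  posts,
  (doc, emit) => doc.tags.forEach((tag) => emit(tag, 1)),
  (key, values) => values.reduce((sum, n) => sum + n, 0)
);
console.log(tagCounts);
```

The optional `finalize` argument is where per-key post-processing (for example, turning counts into scores) would go.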
Map/Reduce
In the computer...
Example Time!
It's simple...Just take this...
Just kidding...
Let's Break Down a Naive Bayes Classifier
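The formula shown on the slide is not preserved in this text. For orientation, the standard naive Bayes decision rule picks the category that maximises the prior times the per-word likelihoods. The notation below ($c$ for a category, $w_i$ for the $i$-th word, $n$ for the number of words) is chosen here and is not taken from the slides.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
\quad\Longleftrightarrow\quad
\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right]
```

The log form is the one usually computed in practice, since multiplying many small probabilities underflows quickly.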
Classification/Naive Bayes
Training the System
Simple...
$inc
Just Keep Count of Words per Category
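The counting step can be sketched as follows. This is an assumption about how such counters might be laid out, not the schema from the talk: the field names (`total`, `words.<word>.<category>`) and the helpers `buildTrainingUpdate` and `applyInc` are hypothetical. `applyInc` only mimics the `$inc` operator in memory so that the example runs without a database.

```javascript
// Build a MongoDB-style update document that bumps one counter per word.
// Field layout is hypothetical: words.<word>.<category> plus a running total.
function buildTrainingUpdate(text, category) {
  const inc = { total: 0 };
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    const field = `words.${word}.${category}`;
    inc[field] = (inc[field] || 0) + 1;
    inc.total += 1;
  }
  return { $inc: inc };
}

// In-memory stand-in for $inc with dotted paths: creates missing fields,
// then adds the amount (which is what $inc does on the server).
function applyInc(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (!node[key]) node[key] = {};
      node = node[key];
    }
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const model = {};
applyInc(model, buildTrainingUpdate("The telescope found a new planet", "cs"));
applyInc(model, buildTrainingUpdate("A new film about a planet", "cae"));
console.log(JSON.stringify(model));

// Against a real collection, the same update document could be sent with
// something like: db.<collection>.updateOne(filter, update, { upsert: true })
```

Because `$inc` is applied atomically on the server, training never has to read a counter before writing it.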
Reduce:
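The reduce code from the slide is not preserved here. As a stand-in, the sketch below shows the general shape such a function might take: it merges partial per-category counts for one word. The value layout (`{ cs: 2, cae: 1 }`) and the name `reduceCounts` are assumptions. A reduce function has to return the same shape it receives, because the database may feed its output back in as an input.

```javascript
// Merge partial per-category counts for a single key (here: one word).
// Each value is assumed to look like { <category>: <count>, ... }.
function reduceCounts(key, values) {
  const merged = {};
  for (const partial of values) {
    for (const [category, count] of Object.entries(partial)) {
      merged[category] = (merged[category] || 0) + count;
    }
  }
  return merged;
}

const once = reduceCounts("planet", [{ cs: 2 }, { cae: 1 }, { cs: 1, cae: 3 }]);

// Re-reducing an earlier result together with new values must give the same totals.
const twice = reduceCounts("planet", [
  reduceCounts("planet", [{ cs: 2 }, { cae: 1 }]),
  { cs: 1, cae: 3 },
]);
console.log(once, twice);
```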
Finalize:
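The finalize code is likewise missing from this text. One plausible reading, given the per-category scores shown in the results, is that finalize turns raw counts into a score per category. The sketch below does this with log-probabilities and add-one (Laplace) smoothing. The smoothing, the input layout, the word-share prior, and the name `scoreCategories` are all choices made for this example, not details from the talk.

```javascript
// Score each category for a list of words using naive Bayes in log space.
// counts: { <word>: { <category>: <count> } }   (hypothetical layout)
// totals: { <category>: <total words seen in training> }
function scoreCategories(words, counts, totals) {
  const vocabularySize = Object.keys(counts).length;
  const grandTotal = Object.values(totals).reduce((a, b) => a + b, 0);
  const scores = {};
  for (const [category, total] of Object.entries(totals)) {
    // Prior (an assumption): share of all training words in this category.
    let score = Math.log(total / grandTotal);
    for (const word of words) {
      const seen = (counts[word] && counts[word][category]) || 0;
      // Add-one smoothing keeps unseen words from zeroing the product.
      score += Math.log((seen + 1) / (total + vocabularySize));
    }
    scores[category] = score;
  }
  return scores;
}

const counts = {
  planet: { cs: 3, cae: 1 },
  telescope: { cs: 2 },
  film: { cae: 4 },
};
const totals = { cs: 5, cae: 5 };

const scores = scoreCategories(["telescope", "planet"], counts, totals);
// Pick the category with the highest (least negative) score.
const best = Object.keys(scores).reduce((a, b) => (scores[a] >= scores[b] ? a : b));
console.log(scores, best);
```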
Call the Command:
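The command from the slide is not preserved. For reference, the `mapReduce` database command takes a document along the lines below. The collection name, the function bodies, and the `scope` contents are placeholders, and newer MongoDB releases deprecate `mapReduce` in favour of the aggregation pipeline, so treat this as a sketch of the shape only.

```javascript
// Placeholder functions. On a real server they run inside the database,
// where `emit` and the `Array.sum` helper are provided; they are not called here.
const map = function () { emit(this.category, 1); };
const reduce = function (key, values) { return Array.sum(values); };
const finalize = function (key, value) { return value; };

// Shape of the mapReduce database command (collection name is hypothetical).
const command = {
  mapReduce: "training",
  map: map,
  reduce: reduce,
  finalize: finalize,
  scope: { words: ["telescope", "planet"] }, // variables visible to the functions
  out: { inline: 1 },                         // return results instead of storing them
};

// In the mongo shell this would be sent with: db.runCommand(command)
console.log(Object.keys(command));
```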
Results:
Can see total words.
Can also see word counts per category.
...and of course the scores per category...
cae = arts and entertainment
cs = science
...
• Accurate even with little training
• MongoDB on a small VM: took 1.7 seconds
• Compared to, say, PHP: 33 seconds, and it timed out
• The more training data, the larger the speed advantage over PHP
• This wasn't even a full map/reduce
• Your mileage will vary based on formula
• You can cache certain values for speed
• Don't forget about stored JavaScript (but use it wisely)
Porter Stemming Algorithm
Thank You Martin Porter
http://tartarus.org/martin/PorterStemmer
• Implementations exist for nearly every programming language
• MongoDB will use the JavaScript one, of course
• Decent execution time
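For a feel of what the algorithm does, here is step 1a of the Porter stemmer (plural handling) in plain JavaScript. It is a deliberately partial sketch: the full algorithm has several more steps, and the function name `porterStep1a` is chosen here. Complete implementations are available from the link above.

```javascript
// Step 1a of the Porter stemming algorithm:
//   SSES -> SS, IES -> I, SS -> SS, S -> (removed)
// Only the first matching rule is applied.
function porterStep1a(word) {
  if (word.endsWith("sses")) return word.slice(0, -2);
  if (word.endsWith("ies")) return word.slice(0, -2);
  if (word.endsWith("ss")) return word;
  if (word.endsWith("s")) return word.slice(0, -1);
  return word;
}

for (const w of ["caresses", "ponies", "caress", "cats"]) {
  console.log(w, "->", porterStep1a(w));
}
// caresses -> caress
// ponies -> poni
// caress -> caress
// cats -> cat
```

Stemming like this lets a classifier count "planet" and "planets" as the same word, which matters when training data is small.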
• About 2.5x faster than PHP class
• 663x faster than a web browser
• 7x slower than PHP PECL extension
Real World Application
Social Harvest
Analyzes social data from the internet to determine languages spoken, gender, age, sentiment, and categories.
www.social-harvest.com
Who doesn't like pie charts?
Follow Tom
@shift8creative
www.shift8creative.com
www.social-harvest.com
www.union-of-rad.com
Thank You!