MongoDB & Machine Learning - Thomas Maiaroto, Union of RAD

mongodb
January 03, 2012

MongoSV 2011

Using some clever functions within MongoDB, you can implement various algorithms to create trainable learning systems inside your app's database. You can apply these algorithms to spam filtering, content auto-tagging, social analytics, and other classification applications. Why build these systems using MongoDB? Beyond the performance benefits of Mongo's aggregation systems, you can simplify your workflow and improve the portability of your business logic. Finally, MongoDB offers many tools that your language or toolkit of choice may not.

Transcript

  1. Languages Commonly Used
    • Java
      o Java-ML, WEKA, Apache Mahout, many more...
    • Python
      o NLTK, scikit-learn, PyML, a good deal more...
    • C++
      o libDAI, Armadillo, Orange, tons more...
    and then some others...
  2. Replication
    • Store massive amounts of data
    • Distributed performance benefits
    • Dedicated databases for calculations
    All the obvious benefits.
  3. Map/Reduce
    It's the brain. It's not just for aggregation. It's faster than you might think. It runs in the database.
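The claim on slide 3 that map/reduce is more than aggregation can be illustrated with a training pass that counts words per category. This is only a sketch under stated assumptions: the documents and their `category` and `text` fields are hypothetical, and `runMapReduce` is a hypothetical in-memory stand-in so the example runs outside the database. The `map` and `reduce` functions follow the general shape MongoDB's map/reduce expects (the document is `this`, and output goes through `emit`).

```javascript
// Hypothetical training documents; the field names are assumptions.
const docs = [
  { category: "spam", text: "buy cheap pills now" },
  { category: "spam", text: "cheap pills cheap" },
  { category: "ham", text: "meeting notes for now" },
];

// Map: emit one count per (category, word) pair.
// The current document is `this`; results are passed to emit().
function map() {
  const words = this.text.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    emit(this.category + ":" + word, 1);
  }
}

// Reduce: sum the counts emitted for one key.
// Summation stays correct if reduce is applied again to partial sums.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical stand-in for the database: group emitted values by key,
// then call reduce once per key.
function runMapReduce(documents, mapFn, reduceFn) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of documents) mapFn.call(doc);
  delete globalThis.emit;

  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

const counts = runMapReduce(docs, map, reduce);
console.log(counts);
```

The resulting per-category word counts are the kind of table a classifier can be trained from, which is why the pass is more than a plain aggregate.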
  4. Classification/Naive Bayes
    • Accurate even with little training
    • MongoDB on a small VM: took 1.7 seconds
    • PHP, by comparison: ran for 33 seconds and timed out
    • More training data == exponentially faster than PHP
  5. Classification/Naive Bayes
    • This wasn't even a full map/reduce
    • Your mileage will vary based on formula
    • You can cache certain values for speed
    • Don't forget about stored JavaScript (but use it wisely)
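The slides do not give the formula used, so the following is only a sketch of one common variant: multinomial naive Bayes with add-one smoothing, scored in log space. All data and names (`wordCounts`, `docCounts`, `classify`) are hypothetical. It also shows which values could be cached, as slide 5 suggests: the per-category token totals and the vocabulary size are computed once rather than per classification.

```javascript
// Hypothetical word counts per category, e.g. the output of a training pass.
const wordCounts = {
  spam: { cheap: 3, pills: 2, buy: 1 },
  ham: { meeting: 2, notes: 1, lunch: 1 },
};
// Hypothetical number of training documents per category.
const docCounts = { spam: 2, ham: 2 };

// Values worth caching: token totals per category and the vocabulary size.
const vocabulary = new Set();
const totals = {};
for (const [category, words] of Object.entries(wordCounts)) {
  totals[category] = 0;
  for (const [word, n] of Object.entries(words)) {
    vocabulary.add(word);
    totals[category] += n;
  }
}
const totalDocs = Object.values(docCounts).reduce((a, b) => a + b, 0);

// Look up a count without touching inherited object properties.
function countOf(category, word) {
  const words = wordCounts[category];
  return Object.prototype.hasOwnProperty.call(words, word) ? words[word] : 0;
}

// Score each category as log prior plus summed log likelihoods.
// Add-one smoothing keeps an unseen word from zeroing out a category.
function classify(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  let best = null;
  let bestScore = -Infinity;
  for (const category of Object.keys(wordCounts)) {
    let score = Math.log(docCounts[category] / totalDocs);
    for (const word of words) {
      const smoothed = countOf(category, word) + 1;
      score += Math.log(smoothed / (totals[category] + vocabulary.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = category;
    }
  }
  return best;
}

console.log(classify("cheap pills"));
console.log(classify("meeting notes"));
```

Working in log space turns the product of probabilities into a sum, which avoids floating-point underflow on longer texts.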
  6. Porter Stemming Algorithm
    • Exists for nearly every language
    • MongoDB will use JavaScript of course
    • Decent execution time
  7. Porter Stemming Algorithm
    • About 2.5x faster than a PHP class
    • 663x faster than a web browser
    • 7x slower than the PHP PECL extension
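To give a feel for what a JavaScript stemmer involves, here is a sketch of step 1a of the Porter algorithm only, which handles plural suffixes. It is not the implementation benchmarked in the slides, and the full algorithm has several further steps that are omitted here. The function name `porterStep1a` is hypothetical, and the input is assumed to be a lowercase word.

```javascript
// Step 1a of the Porter stemmer: plural suffixes.
// Rules are tried longest suffix first; only the first match applies.
function porterStep1a(word) {
  if (word.endsWith("sses")) return word.slice(0, -2); // caresses -> caress
  if (word.endsWith("ies")) return word.slice(0, -2);  // ponies -> poni
  if (word.endsWith("ss")) return word;                // caress -> caress
  if (word.endsWith("s")) return word.slice(0, -1);    // cats -> cat
  return word;
}

for (const w of ["caresses", "ponies", "caress", "cats", "data"]) {
  console.log(w + " -> " + porterStep1a(w));
}
```

Because it is plain string handling with no external dependencies, a function like this is the sort of code that could run inside map/reduce or be kept as stored JavaScript.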
  8. Real World Application
    Social Harvest
    Analyzes social data from the internet to determine languages spoken, gender, age, sentiment, and categories.
    www.social-harvest.com