Machine Learning Techniques

Machine Learning Techniques Ben Curtis / ben@honeybadger.io / @stympy

Making Sense of a Bunch of Data

Logging: It's big, it's heavy, it's wood (photo: http://www.stockvault.net/photo/102553/cutdown)

Graylog2, Apache Kafka

Clustering: So happy together (photo: http://www.stockvault.net/photo/115439/yellow-flowers)

K-Means Clustering: group anything that can be plotted as points on an X-Y chart (or, more generally, as points with any number of numeric coordinates)

K-Means Live Demo! http://en.wikipedia.org/wiki/K-means

reddavis/K-Means
require 'k_means'
data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]]
kmeans = KMeans.new(data, :centroids => 2)
kmeans.inspect # Use kmeans.view to get hold of the un-inspected array
# => [[3, 4], [0, 1, 2]] (each inner array lists the indices of one cluster's points)
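The gem hides the algorithm itself. The sketch below shows the standard version of k-means (Lloyd's algorithm) on the same data: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until no point changes cluster. The helper names are hypothetical, and seeding the centroids with the first k points is a simplifying assumption for reproducibility (real implementations usually seed randomly or with k-means++). Note it reports a cluster number per point rather than the gem's lists of indices per cluster.

```ruby
# Minimal k-means (Lloyd's algorithm). Helper names are illustrative, not the k_means gem's API.

# Squared Euclidean distance between two points of equal dimension.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Component-wise mean of a non-empty list of points.
def mean_point(points)
  points.first.each_index.map do |d|
    points.sum { |pt| pt[d] }.to_f / points.size
  end
end

# Returns [centroids, assignments]; assignments[i] is the cluster index of points[i].
def k_means(points, k, max_iterations: 100)
  # Assumed, deterministic initialisation: the first k points become the centroids.
  centroids = points.first(k).map { |pt| pt.map(&:to_f) }
  assignments = []

  max_iterations.times do
    # Assignment step: every point joins its nearest centroid.
    new_assignments = points.map do |pt|
      (0...k).min_by { |c| squared_distance(pt, centroids[c]) }
    end
    break if new_assignments == assignments # converged: no point changed cluster
    assignments = new_assignments

    # Update step: move each centroid to the mean of its members
    # (an empty cluster keeps its previous centroid).
    centroids = (0...k).map do |c|
      members = points.select.with_index { |_, i| assignments[i] == c }
      members.empty? ? centroids[c] : mean_point(members)
    end
  end

  [centroids, assignments]
end

data = [[1, 1], [1, 2], [1, 1], [1000, 1000], [500, 500]]
centroids, assignments = k_means(data, 2)
p assignments # => [0, 0, 0, 1, 1]
```

The grouping matches the gem's output above: points 0, 1 and 2 form one cluster and points 3 and 4 the other. Because the result depends on the starting centroids, k-means is usually run several times from different seeds.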

Deciding: …and that has made all the difference (photo: http://www.flickr.com/photos/47051377@N00/4033866900/)

igrigorik/decisiontree
require 'decisiontree'

attributes = ['Temperature']
training = [
  [36.6, 'healthy'],
  [37, 'sick'],
  [38, 'sick'],
  [36.7, 'healthy'],
  [40, 'sick'],
  [50, 'really sick'],
]

# Instantiate the tree and train it on the data (default classification: 'sick')
dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
dec_tree.train

test = [37, 'sick']
decision = dec_tree.predict(test)
puts "Predicted: #{decision} ... True decision: #{test.last}"
# => Predicted: sick ... True decision: sick
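ID3 chooses each split by information gain: how much a split reduces the entropy of the class labels. For a continuous attribute such as Temperature, one common approach (assumed here; the gem's own handling of continuous attributes may differ) is to try thresholds halfway between neighbouring values and keep the best one. This sketch scores a single split on the training data above; `entropy` and `best_threshold` are hypothetical helpers.

```ruby
# Information gain for a single threshold split on a continuous attribute.
# Helper names are illustrative; this is not the decisiontree gem's internal code.

# Shannon entropy (in bits) of a list of class labels.
def entropy(labels)
  total = labels.size.to_f
  labels.tally.values.sum do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

# examples: [value, label] pairs. Returns [best_threshold, information_gain].
def best_threshold(examples)
  base = entropy(examples.map(&:last))
  values = examples.map(&:first).uniq.sort
  # Candidate thresholds: midpoints between consecutive distinct values.
  candidates = values.each_cons(2).map { |lo, hi| (lo + hi) / 2.0 }

  candidates.map { |t|
    left  = examples.select { |v, _| v <= t }.map(&:last)
    right = examples.select { |v, _| v > t }.map(&:last)
    # Weighted average entropy of the two branches after the split.
    remainder = (left.size * entropy(left) + right.size * entropy(right)) / examples.size
    [t, base - remainder]
  }.max_by { |_, gain| gain }
end

samples = [
  [36.6, 'healthy'], [37, 'sick'], [38, 'sick'],
  [36.7, 'healthy'], [40, 'sick'], [50, 'really sick']
]
threshold, gain = best_threshold(samples)
puts "Split at #{threshold.round(2)} (gain #{gain.round(3)} bits)"
# prints "Split at 36.85 (gain 0.918 bits)"
```

The winning split puts both healthy readings on one side, leaving a purer mix of 'sick' and 'really sick' on the other; a full ID3 tree would then recurse into each branch.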

A shallow magnitude 2.7 earthquake aftershock was reported Monday morning four miles from Westwood, according to the U.S. Geological Survey. The temblor occurred at 7:23 a.m. Pacific time at a depth of 4.3 miles. A magnitude 4.4 earthquake was reported at 6:25 a.m. and was felt over a large swath of Southern California. According to the USGS, the epicenter of the aftershock was five miles from Beverly Hills, six miles from Santa Monica and six miles from West Hollywood. In the last 10 days, there has been one earthquake of magnitude 3.0 or greater centered nearby. This information comes from the USGS Earthquake Notification Service and this post was created by an algorithm written by the author. http://lat.ms/1lTIGqa

Classifying: Everyone starts as a Level 1 human with no class

Bayesian Classifiers: Good/Bad? Spam/Not-spam?

cardmagic/classifier
require 'classifier'

b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
b.train_interesting "here are some good words. I hope you love them"
b.train_uninteresting "here are some bad words, I hate you"
b.classify "I hate bad words and you" # returns 'Uninteresting'
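A Bayesian classifier counts how often each word appears in each category, then picks the category under which the new text is most probable. The sketch below is a textbook multinomial naive Bayes with add-one (Laplace) smoothing, trained on the same two sentences. `TinyBayes` is a hypothetical class, and its simple lowercase word tokenizer and scoring are assumptions; the classifier gem's own tokenization and scoring may differ.

```ruby
require 'set'

# Multinomial naive Bayes with add-one (Laplace) smoothing.
# TinyBayes is a hypothetical class, not the classifier gem's implementation.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    @doc_counts  = categories.to_h { |c| [c, 0] }
    @vocabulary  = Set.new
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary << word
    end
  end

  # Picks the category with the highest log P(category) + sum of log P(word | category).
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @word_counts.keys.max_by do |category|
      counts = @word_counts[category]
      denominator = counts.values.sum + @vocabulary.size
      Math.log(@doc_counts[category] / total_docs) +
        words.sum { |w| Math.log((counts[w] + 1.0) / denominator) }
    end
  end

  private

  # Assumed tokenizer: lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

b = TinyBayes.new('Interesting', 'Uninteresting')
b.train('Interesting', "here are some good words. I hope you love them")
b.train('Uninteresting', "here are some bad words, I hate you")
puts b.classify("I hate bad words and you") # prints "Uninteresting"
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on long texts, and the +1 smoothing keeps an unseen word such as "and" from zeroing out a category.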

Latent Semantic Indexing: analyze text to identify the topic

cardmagic/classifier
require 'classifier'

lsi = Classifier::LSI.new
strings = [ ["This text deals with dogs. Dogs.", :dog],
            ["This text involves dogs too. Dogs! ", :dog],
            ["This text revolves around cats. Cats.", :cat],
            ["This text also involves cats. Cats!", :cat],
            ["This text involves birds. Birds.", :bird ]]
strings.each {|x| lsi.add_item x.first, x.last}

lsi.search("dog", 3)
# returns => ["This text deals with dogs. Dogs.", "This text involves dogs too. Dogs! ",
#             "This text also involves cats. Cats!"]

lsi.find_related(strings[2], 2)
# returns => ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]

lsi.classify "This text is also about dogs!"
# returns => :dog
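LSI builds a term-by-document count matrix and keeps only its strongest few singular directions ("concepts"), so documents that share related vocabulary land close together even when their exact words differ. The sketch below is a hedged illustration, not the classifier gem's implementation: it approximates a rank-k SVD by power iteration on the document Gram matrix AᵀA, places each document in the k-dimensional concept space, and compares documents by cosine similarity. The rank k = 2, the tokenizer, and all helper names are assumptions.

```ruby
# Minimal LSI sketch: term-document counts -> truncated SVD -> cosine similarity
# in the reduced concept space. Helper names are illustrative, not the gem's API.

def lsi_tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def normalize(v)
  norm = Math.sqrt(dot(v, v))
  v.map { |x| x / norm }
end

def cosine(a, b)
  dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b))
end

# Returns one k-dimensional concept vector per document (the rows of V_k * Sigma_k).
def lsi_document_vectors(docs, k, iterations: 200)
  counts = docs.map { |d| lsi_tokenize(d).tally }
  # Gram matrix G = A^T A, where A is the term-by-document count matrix.
  gram = counts.map do |ci|
    counts.map { |cj| ci.sum { |term, count| count * cj.fetch(term, 0) } }
  end

  n = docs.size
  vectors = []
  values = []
  k.times do
    # Non-uniform start vector (an all-ones start can miss eigenvectors on symmetric data).
    v = normalize((1..n).map(&:to_f))
    iterations.times do
      v = gram.map { |row| dot(row, v) } # power step: v <- G v
      vectors.each do |u|                # stay orthogonal to eigenvectors already found
        proj = dot(u, v)
        v = v.zip(u).map { |x, y| x - proj * y }
      end
      v = normalize(v)
    end
    vectors << v
    values << dot(v, gram.map { |row| dot(row, v) }) # eigenvalue of G = sigma^2
  end

  (0...n).map { |i| (0...k).map { |j| vectors[j][i] * Math.sqrt(values[j]) } }
end

docs = [
  "This text deals with dogs. Dogs.",
  "This text involves dogs too. Dogs! ",
  "This text revolves around cats. Cats.",
  "This text also involves cats. Cats!",
  "This text involves birds. Birds."
]
concepts = lsi_document_vectors(docs, 2)
closest = (0...docs.size).reject { |i| i == 2 }.max_by { |i| cosine(concepts[2], concepts[i]) }
puts docs[closest] # prints "This text also involves cats. Cats!"
```

Like `find_related` above, the reduced space pairs the two cat documents. Power iteration is used only to keep the sketch dependency-free; a production LSI would call a proper SVD routine.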

Recommending: If I Knew You Were Comin' I'd've Baked a Cake (photo: http://www.flickr.com/photos/48973657@N00/4556156477/)

Jaccard Index: how similar are sets of data?

francois/jaccard
require 'jaccard'

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:apples", "likes:red"]
c = ["likes:apples", "likes:red"]

# Determines how similar a pair of sets are
Jaccard.coefficient(a, b) #=> 0.25
Jaccard.coefficient(a, c) #=> 0.0
Jaccard.coefficient(b, c) #=> 0.6666666666666666

# According to the input data, b and c have the most similar likes.
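The Jaccard index is the size of the intersection of two sets divided by the size of their union, so it runs from 0.0 (nothing shared) to 1.0 (identical sets). A few lines of plain Ruby reproduce the coefficients above. `jaccard_index` is a hypothetical helper, not the gem's API, and returning 1.0 for two empty sets is an assumed convention (the ratio is otherwise 0/0).

```ruby
require 'set'

# Jaccard index: size of intersection / size of union, from 0.0 to 1.0.
# jaccard_index is a hypothetical helper, not the jaccard gem's API.
def jaccard_index(a, b)
  a = a.to_set
  b = b.to_set
  union = a | b
  return 1.0 if union.empty? # assumed convention: two empty sets count as identical
  (a & b).size.to_f / union.size
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:apples", "likes:red"]
c = ["likes:apples", "likes:red"]

puts jaccard_index(a, b) # 1 shared of 4 distinct => 0.25
puts jaccard_index(a, c) # 0 shared of 4 distinct => 0.0
puts jaccard_index(b, c) # 2 shared of 3 distinct => 0.6666666666666666
```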

davidcelis/recommendable
class User
  recommends :movies, :books, :minerals, :other_things
  # ...
end

>> user.like(movie)
=> true
>> user.liked_movies
=> [#<Movie id: 23, name: "2001: A Space Odyssey">]
>> user.recommended_movies
=> [#<Movie name: "A Clockwork Orange">, ...]
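One simple way to produce recommendations like these is user-based collaborative filtering: measure how much each other user's likes overlap with yours (here with the Jaccard index from the previous slide) and suggest the items that similar users liked. The following is a simplified sketch on made-up data; `likes`, `similarity`, and `recommend` are hypothetical names, and the recommendable gem's actual similarity and prediction logic is not reproduced here.

```ruby
require 'set'

# Hypothetical example data: the set of movies each user has liked.
likes = {
  "alice" => Set["2001: A Space Odyssey", "A Clockwork Orange", "Dr. Strangelove"],
  "bob"   => Set["2001: A Space Odyssey", "A Clockwork Orange", "The Shining"],
  "carol" => Set["Toy Story", "Finding Nemo"],
  "dave"  => Set["2001: A Space Odyssey"]
}

# Jaccard similarity between two users' sets of likes.
def similarity(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# User-based collaborative filtering: score every item the user hasn't liked yet
# by summing the similarity of each other user who liked it.
def recommend(likes, user, limit: 3)
  mine = likes.fetch(user)
  scores = Hash.new(0.0)
  likes.each do |other, theirs|
    next if other == user
    weight = similarity(mine, theirs)
    next if weight.zero? # users with nothing in common contribute nothing
    (theirs - mine).each { |item| scores[item] += weight }
  end
  # Highest score first; ties broken alphabetically for a stable order.
  scores.sort_by { |item, score| [-score, item] }.first(limit).map(&:first)
end

p recommend(likes, "dave")
# => ["A Clockwork Orange", "Dr. Strangelove", "The Shining"]
```

"A Clockwork Orange" ranks first because both users who share dave's taste liked it. Scoring every pair of users like this is fine for a demo but grows quadratically with the user base.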

Practical Machine Learning: Innovations in Recommendation, by Ted Dunning & Ellen Friedman. http://www.mapr.com/practical-machine-learning
