Machine Learning Techniques

A brief introduction to machine learning and machine learning resources for Rubyists.

Benjamin Curtis

March 21, 2014

Transcript

  1. Machine Learning Techniques Ben Curtis / ben@honeybadger.io / @stympy

  2. Making Sense of a Bunch of Data Ben Curtis / ben@honeybadger.io / @stympy
  3. (no text)
  4. Logging It's big, it's heavy, it's wood http://www.stockvault.net/photo/102553/cutdown

  5. Graylog2, Apache Kafka

  6. Clustering So happy together http://www.stockvault.net/photo/115439/yellow-flowers

  7. K-Means Clustering: group anything that can be plotted on an X-Y chart
  8-11. K-Means Live Demo! http://en.wikipedia.org/wiki/K-means

  12. (no text)
  13. (no text)
  14. (no text)
  15. reddavis/K-Means

     require 'k_means'

     data = [[1,1], [1,2], [1,1], [1000, 1000], [500, 500]]
     kmeans = KMeans.new(data, :centroids => 2)
     kmeans.inspect  # Use kmeans.view to get hold of the un-inspected array
     => [[3, 4], [0, 1, 2]]
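
The gem keeps the algorithm itself out of sight. Below is a minimal from-scratch sketch of the usual k-means iteration (Lloyd's algorithm) in plain Ruby; it is not reddavis/K-Means' implementation. The names `k_means`, `squared_distance`, and `centroid_of` are hypothetical, and seeding the centroids with the first k points is an assumption made so the result is repeatable (real implementations typically seed randomly or with k-means++).

```ruby
# Minimal k-means (Lloyd's algorithm) sketch; not the k_means gem's code.
# Assumption: centroids are seeded with the first k points for determinism.

def squared_distance(p, q)
  p.zip(q).sum { |a, b| (a - b)**2 }
end

# Component-wise mean of a non-empty list of points.
def centroid_of(points)
  points.first.each_index.map do |d|
    points.sum { |pt| pt[d] }.to_f / points.size
  end
end

# Returns k clusters, each an array of indexes into +data+.
def k_means(data, k, max_iterations: 100)
  centroids = data.first(k).map { |pt| pt.map(&:to_f) }
  assignments = nil

  max_iterations.times do
    # Assignment step: label every point with its nearest centroid.
    labels = data.map do |pt|
      (0...k).min_by { |c| squared_distance(pt, centroids[c]) }
    end
    break if labels == assignments # no point changed cluster: converged
    assignments = labels

    # Update step: move each centroid to the mean of its members;
    # an empty cluster keeps its previous centroid.
    centroids = centroids.each_index.map do |c|
      members = data.select.with_index { |_, i| assignments[i] == c }
      members.empty? ? centroids[c] : centroid_of(members)
    end
  end

  (0...k).map { |c| assignments.each_index.select { |i| assignments[i] == c } }
end

data = [[1, 1], [1, 2], [1, 1], [1000, 1000], [500, 500]]
p k_means(data, 2) # => [[0, 1, 2], [3, 4]]
```

Cluster order depends on the seeding, which is why this prints the clusters in the opposite order from the gem's `[[3, 4], [0, 1, 2]]`; the grouping itself is the same.
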
  16. Deciding …and that has made all the difference http://www.flickr.com/photos/47051377@N00/4033866900/

  17. igrigorik/decisiontree

     require 'decisiontree'

     attributes = ['Temperature']
     training = [
       [36.6, 'healthy'],
       [37, 'sick'],
       [38, 'sick'],
       [36.7, 'healthy'],
       [40, 'sick'],
       [50, 'really sick'],
     ]

     # Instantiate the tree, and train it based on the data (set default to 'sick')
     dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
     dec_tree.train

     test = [37, 'sick']
     decision = dec_tree.predict(test)
     puts "Predicted: #{decision} ... True decision: #{test.last}"

     # => Predicted: sick ... True decision: sick
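
With `:continuous`, an ID3-style tree has to decide where to cut the Temperature axis. A minimal sketch of that step, assuming the common approach of trying midpoints between neighboring values and keeping the cut with the highest information gain. It illustrates the idea only and is not the decisiontree gem's internal code; `entropy` and `best_split` are hypothetical names.

```ruby
# Sketch of one ID3 step on a continuous attribute: pick the threshold
# that maximizes information gain. Illustrative only; not the
# decisiontree gem's internals.

# Shannon entropy (in bits) of a list of class labels.
def entropy(labels)
  labels.group_by(&:itself).values.sum do |group|
    prob = group.size.to_f / labels.size
    -prob * Math.log2(prob)
  end
end

# rows are [value, label] pairs; returns [threshold, gain].
def best_split(rows)
  base = entropy(rows.map(&:last))
  values = rows.map(&:first).uniq.sort
  # Candidate thresholds: midpoints between neighboring distinct values.
  candidates = values.each_cons(2).map { |a, b| (a + b) / 2.0 }

  scored = candidates.map do |t|
    left, right = rows.partition { |value, _| value <= t }
    # Weighted entropy remaining after the split.
    remainder = [left, right].sum do |side|
      side.size.to_f / rows.size * entropy(side.map(&:last))
    end
    [t, base - remainder]
  end
  scored.max_by { |_, gain| gain }
end

training = [
  [36.6, 'healthy'], [37, 'sick'], [38, 'sick'],
  [36.7, 'healthy'], [40, 'sick'], [50, 'really sick']
]
threshold, gain = best_split(training)
puts "Temperature <= #{threshold.round(2)} (information gain #{gain.round(3)})"
```

On this training set the best cut falls between 36.7 and 37, which separates both healthy readings from the sick ones.
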
  18. A shallow magnitude 2.7 earthquake aftershock was reported Monday morning four miles from Westwood, according to the U.S. Geological Survey. The temblor occurred at 7:23 a.m. Pacific time at a depth of 4.3 miles. A magnitude 4.4 earthquake was reported at 6:25 a.m. and was felt over a large swath of Southern California. According to the USGS, the epicenter of the aftershock was five miles from Beverly Hills, six miles from Santa Monica and six miles from West Hollywood. In the last 10 days, there has been one earthquake of magnitude 3.0 or greater centered nearby. This information comes from the USGS Earthquake Notification Service and this post was created by an algorithm written by the author. http://lat.ms/1lTIGqa
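
Posts like this one can be produced by filling a prose template from a structured event record. The sketch below is hypothetical: the field names, the `quake_report` method, and the rule for calling a quake "shallow" (under 70 km, roughly 43.5 miles) are assumptions for illustration, not the actual bot that wrote the article.

```ruby
# Hypothetical sketch: turning a structured event record into prose with a
# template. Field names and wording rules are assumptions for illustration.

# Assumption: quakes shallower than 70 km (about 43.5 miles) are "shallow",
# a common seismological cutoff.
SHALLOW_LIMIT_MILES = 43.5

def quake_report(q)
  adjective = q[:depth_miles] < SHALLOW_LIMIT_MILES ? 'shallow ' : ''
  "A #{adjective}magnitude #{q[:magnitude]} earthquake was reported " \
    "#{q[:distance_miles]} miles from #{q[:place]}, according to the " \
    "U.S. Geological Survey. The temblor occurred at #{q[:time]} at a " \
    "depth of #{q[:depth_miles]} miles."
end

# Values taken from the article above; the record layout is hypothetical.
event = {
  magnitude: 2.7, distance_miles: 'four', place: 'Westwood',
  time: '7:23 a.m. Pacific time', depth_miles: 4.3
}
puts quake_report(event)
```
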
  19. Classifying Everyone starts as a Level 1 human with no class
  20. Bayesian Classifiers: Good/Bad? Spam/Not-spam?

  21. cardmagic/classifier

     require 'classifier'

     b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
     b.train_interesting "here are some good words. I hope you love them"
     b.train_uninteresting "here are some bad words, I hate you"
     b.classify "I hate bad words and you" # returns 'Uninteresting'
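
A Bayesian text classifier of this kind typically counts words per category and picks the category with the highest posterior probability. Below is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing, trained on the same two sentences. `NaiveBayes` and its methods are hypothetical names, and this is not cardmagic/classifier's implementation.

```ruby
# From-scratch multinomial naive Bayes with Laplace (add-one) smoothing.
# A sketch of the technique, not cardmagic/classifier's implementation;
# the class and method names are hypothetical.
class NaiveBayes
  def initialize(*categories)
    @word_counts = categories.map { |c| [c, Hash.new(0)] }.to_h
    @doc_counts = categories.map { |c| [c, 0] }.to_h
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Picks the category maximizing log P(category) + sum of log P(word | category).
  def classify(text)
    words = tokenize(text)
    vocabulary = @word_counts.values.flat_map(&:keys).uniq.size
    total_docs = @doc_counts.values.sum

    @word_counts.keys.max_by do |category|
      counts = @word_counts[category]
      total_words = counts.values.sum
      log_prior = Math.log(@doc_counts[category].to_f / total_docs)
      log_prior + words.sum { |w| Math.log((counts[w] + 1.0) / (total_words + vocabulary)) }
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

b = NaiveBayes.new('Interesting', 'Uninteresting')
b.train('Interesting', 'here are some good words. I hope you love them')
b.train('Uninteresting', 'here are some bad words, I hate you')
puts b.classify('I hate bad words and you') # => Uninteresting
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps an unseen word such as "and" from zeroing out a category.
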
  22. Latent Semantic Indexing: analyze text to identify the topic

  23. cardmagic/classifier

     require 'classifier'

     lsi = Classifier::LSI.new
     strings = [ ["This text deals with dogs. Dogs.", :dog],
                 ["This text involves dogs too. Dogs! ", :dog],
                 ["This text revolves around cats. Cats.", :cat],
                 ["This text also involves cats. Cats!", :cat],
                 ["This text involves birds. Birds.", :bird ]]
     strings.each {|x| lsi.add_item x.first, x.last}

     lsi.search("dog", 3)
     # returns => ["This text deals with dogs. Dogs.", "This text involves dogs too. Dogs! ",
     #             "This text also involves cats. Cats!"]

     lsi.find_related(strings[2], 2)
     # returns => ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]

     lsi.classify "This text is also about dogs!"
     # returns => :dog
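
LSI starts from a term-document representation: each text becomes a vector of word counts, and texts about the same thing point in similar directions. The sketch below shows only that starting point, plain bag-of-words vectors compared by cosine similarity, and deliberately skips the singular value decomposition that makes it latent semantic indexing. The crude plural-stripping rule and the names `term_vector`, `cosine`, and `nearest_category` are assumptions for illustration.

```ruby
# Bag-of-words vectors compared by cosine similarity: the term-document
# representation LSI starts from. Real LSI then applies a truncated SVD;
# this sketch deliberately stops before that step.

# Crude normalization (an assumption): lowercase, then strip a trailing "s"
# from words longer than three letters so "dogs" matches "dog".
def term_vector(text)
  text.downcase.scan(/[a-z]+/)
      .map { |w| w.length > 3 ? w.sub(/s\z/, '') : w }
      .each_with_object(Hash.new(0)) { |w, counts| counts[w] += 1 }
end

def cosine(a, b)
  dot = a.sum { |term, count| count * b[term] }
  norm_a = Math.sqrt(a.values.sum { |c| c * c })
  norm_b = Math.sqrt(b.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Label a new text with the category of its most similar document.
def nearest_category(docs, text)
  query = term_vector(text)
  docs.max_by { |doc, _| cosine(query, term_vector(doc)) }.last
end

docs = [["This text deals with dogs. Dogs.", :dog],
        ["This text involves dogs too. Dogs! ", :dog],
        ["This text revolves around cats. Cats.", :cat],
        ["This text also involves cats. Cats!", :cat],
        ["This text involves birds. Birds.", :bird]]
p nearest_category(docs, "This text is also about dogs!") # => :dog
```

Without the SVD, texts only match on shared words; LSI's dimensionality reduction is what lets it relate documents whose vocabularies differ but whose terms tend to co-occur.
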
  24. Recommending If I Knew You Were Comin' I'd've Baked a Cake http://www.flickr.com/photos/48973657@N00/4556156477/
  25. Jaccard Index: how similar are sets of data?

  26. francois/jaccard

     require 'jaccard'

     a = ["likes:jeans", "likes:blue"]
     b = ["likes:jeans", "likes:apples", "likes:red"]
     c = ["likes:apples", "likes:red"]

     # Determines how similar a pair of sets is
     Jaccard.coefficient(a, b)
     #=> 0.25
     Jaccard.coefficient(a, c)
     #=> 0.0
     Jaccard.coefficient(b, c)
     #=> 0.6666666666666666

     # According to the input data, b and c have the most similar likes.
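
The coefficient is the size of the intersection divided by the size of the union. A from-scratch sketch using Ruby's standard `Set` reproduces the values on the slide; `jaccard_coefficient` is a hypothetical name, not the jaccard gem's API, and returning 1.0 for two empty sets is an assumed convention.

```ruby
require 'set'

# Jaccard index: size of the intersection divided by size of the union,
# from 0.0 (nothing shared) to 1.0 (identical sets).
# A from-scratch sketch, not the jaccard gem's code.
def jaccard_coefficient(a, b)
  a = a.to_set
  b = b.to_set
  union = a | b
  return 1.0 if union.empty? # assumed convention: two empty sets match
  (a & b).size.to_f / union.size
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:apples", "likes:red"]
c = ["likes:apples", "likes:red"]

p jaccard_coefficient(a, b) # => 0.25
p jaccard_coefficient(a, c) # => 0.0
p jaccard_coefficient(b, c) # => 0.6666666666666666
```
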
  27. davidcelis/recommendable

     class User
       recommends :movies, :books, :minerals, :other_things
       # ...
     end

     >> user.like(movie)
     => true
     >> user.liked_movies
     => [#<Movie id: 23, name: "2001: A Space Odyssey">]
     >> user.recommended_movies
     => [#<Movie name: "A Clockwork Orange">, ...]
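
One common way to build recommendations on top of a similarity measure is user-based collaborative filtering: find users whose likes overlap with yours and suggest what they liked that you have not. A minimal sketch using the Jaccard index from the previous slide; the data, the `recommendations_for` method, and the scoring rule (summing similarities) are hypothetical, and it does not claim to mirror how recommendable computes its results.

```ruby
require 'set'

# User-based collaborative filtering sketch: score each item a user has not
# liked yet by summing the Jaccard similarity of the other users who liked it.
# Illustrative only; not how the recommendable gem computes its results.

def jaccard(a, b)
  union = a | b
  union.empty? ? 0.0 : (a & b).size.to_f / union.size
end

def recommendations_for(user, likes)
  mine = likes.fetch(user)
  scores = Hash.new(0.0)
  likes.each do |other, theirs|
    next if other == user
    similarity = jaccard(mine, theirs)
    next if similarity.zero?
    (theirs - mine).each { |item| scores[item] += similarity }
  end
  # Highest score first; ties broken alphabetically for stable output.
  scores.sort_by { |item, score| [-score, item] }.map(&:first)
end

# Hypothetical likes data.
likes = {
  'alice' => Set['2001: A Space Odyssey', 'A Clockwork Orange', 'Dr. Strangelove'],
  'bob'   => Set['2001: A Space Odyssey', 'A Clockwork Orange'],
  'carol' => Set['Toy Story', 'Finding Nemo'],
  'dave'  => Set['2001: A Space Odyssey']
}
p recommendations_for('dave', likes) # => ["A Clockwork Orange", "Dr. Strangelove"]
```

Dave shares one of two likes with Bob (similarity 0.5) and one of three with Alice (about 0.33), so "A Clockwork Orange", liked by both, outranks "Dr. Strangelove".
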
  28. Practical Machine Learning: Innovations in Recommendation, Ted Dunning & Ellen Friedman http://www.mapr.com/practical-machine-learning