Machine Learning Techniques

A brief introduction to machine learning and machine learning resources for Rubyists.

Benjamin Curtis

March 21, 2014

Transcript

  1. Machine Learning Techniques
    Ben Curtis / [email protected] / @stympy

  2. Making Sense of a Bunch of Data

  4. Logging
    It's big, it's heavy, it's wood
    http://www.stockvault.net/photo/102553/cutdown

  5. Graylog2
    Apache Kafka

  6. Clustering
    So happy together
    http://www.stockvault.net/photo/115439/yellow-flowers

  7. K-Means Clustering
    Group anything that can be plotted on an X-Y chart

  8. K-Means Live Demo!
    http://en.wikipedia.org/wiki/K-means

  9. K-Means Live Demo!
    http://en.wikipedia.org/wiki/K-means

  10. K-Means Live Demo!
    http://en.wikipedia.org/wiki/K-means

  11. K-Means Live Demo!
    http://en.wikipedia.org/wiki/K-means

  15. reddavis/K-Means
    require 'k_means'

    data = [[1, 1], [1, 2], [1, 1], [1000, 1000], [500, 500]]
    kmeans = KMeans.new(data, :centroids => 2)
    kmeans.inspect # Use kmeans.view to get hold of the un-inspected array
    # => [[3, 4], [0, 1, 2]]
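To make the gem's output less of a black box, here is a minimal k-means sketch (Lloyd's algorithm) in plain Ruby. It is an illustration under simplifying assumptions, not the reddavis/K-Means implementation: the method names are hypothetical, and it seeds the centroids with the first k distinct points instead of choosing them at random, so the result is deterministic. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its members, and it stops once no assignment changes.

```ruby
# Minimal k-means (Lloyd's algorithm) in plain Ruby.
# Hypothetical helper names; not the reddavis/K-Means gem's implementation.

# Squared Euclidean distance between two points of equal dimension.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Component-wise mean of a non-empty list of points.
def centroid_of(points)
  points.first.each_index.map { |d| points.sum { |pt| pt[d] }.to_f / points.size }
end

# Returns one array of point indexes per cluster.
def k_means(data, k, max_iterations: 100)
  # Simplistic, deterministic seeding (an assumption of this sketch):
  # use the first k distinct points as the initial centroids.
  centroids = data.uniq.first(k)
  assignments = nil

  max_iterations.times do
    # Assignment step: attach every point to its nearest centroid.
    new_assignments = data.map do |point|
      (0...k).min_by { |c| squared_distance(point, centroids[c]) }
    end
    break if new_assignments == assignments # nothing moved: converged
    assignments = new_assignments

    # Update step: move each centroid to the mean of its members
    # (an empty cluster keeps its previous centroid).
    centroids = (0...k).map do |c|
      members = data.select.with_index { |_, i| assignments[i] == c }
      members.empty? ? centroids[c] : centroid_of(members)
    end
  end

  (0...k).map { |c| data.each_index.select { |i| assignments[i] == c } }
end

data = [[1, 1], [1, 2], [1, 1], [1000, 1000], [500, 500]]
p k_means(data, 2) # prints: [[0, 1, 2], [3, 4]]
```

Cluster numbering is arbitrary, so the gem's `[[3, 4], [0, 1, 2]]` and this sketch's `[[0, 1, 2], [3, 4]]` describe the same grouping. Seeding matters on real data; k-means++ is a common, more robust alternative to the naive choice used here.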

  16. Deciding
    …and that has made all the difference
    http://www.flickr.com/photos/47051377@N00/4033866900/

  17. igrigorik/decisiontree
    require 'decisiontree'

    attributes = ['Temperature']
    training = [
      [36.6, 'healthy'],
      [37, 'sick'],
      [38, 'sick'],
      [36.7, 'healthy'],
      [40, 'sick'],
      [50, 'really sick'],
    ]

    # Instantiate the tree and train it on the data ('sick' is the default label)
    dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
    dec_tree.train

    test = [37, 'sick']
    decision = dec_tree.predict(test)
    puts "Predicted: #{decision} ... True decision: #{test.last}"

    # => Predicted: sick ... True decision: sick
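The tree above splits on a continuous attribute. A rough sketch of the idea behind ID3-style splitting, assuming a single numeric attribute: try a threshold at the midpoint between every pair of adjacent distinct values, keep the one with the largest information gain (the drop in label entropy), and recurse until each leaf holds a single label. This is plain Ruby with hypothetical method names, not the decisiontree gem's code.

```ruby
# ID3-style splitting on one continuous attribute, in plain Ruby.
# Hypothetical names; a sketch, not the decisiontree gem's implementation.

# Shannon entropy (in bits) of a list of class labels.
def entropy(labels)
  return 0.0 if labels.empty?
  labels.tally.values.sum do |count|
    share = count.to_f / labels.size
    -share * Math.log2(share)
  end
end

# Tries the midpoint between each pair of adjacent distinct values and
# returns [threshold, information_gain] for the best one (nil if none).
def best_split(rows)
  base = entropy(rows.map(&:last))
  values = rows.map(&:first).uniq.sort
  candidates = values.each_cons(2).map do |low, high|
    threshold = (low + high) / 2.0
    parts = rows.partition { |value, _| value <= threshold }
    remainder = parts.sum { |part| part.size.to_f / rows.size * entropy(part.map(&:last)) }
    [threshold, base - remainder]
  end
  candidates.max_by { |_, gain| gain }
end

# Builds nested hashes until every leaf is a single label.
def build_tree(rows)
  labels = rows.map(&:last)
  return labels.first if labels.uniq.size == 1
  split = best_split(rows)
  # Mixed labels but identical values: fall back to the majority label.
  return labels.tally.max_by { |_, n| n }.first if split.nil?
  threshold = split.first
  left, right = rows.partition { |value, _| value <= threshold }
  { threshold: threshold, left: build_tree(left), right: build_tree(right) }
end

# Walks the tree until it reaches a leaf label.
def predict(tree, value)
  while tree.is_a?(Hash)
    tree = value <= tree[:threshold] ? tree[:left] : tree[:right]
  end
  tree
end

training = [
  [36.6, 'healthy'], [37, 'sick'], [38, 'sick'],
  [36.7, 'healthy'], [40, 'sick'], [50, 'really sick']
]
tree = build_tree(training)
puts predict(tree, 37) # prints: sick
```

On this training set the root split lands midway between 36.7 and 37, separating the two healthy readings from the rest; a second split midway between 40 and 50 isolates 'really sick'.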

  18. A shallow magnitude 2.7 earthquake aftershock was reported Monday morning
    four miles from Westwood, according to the U.S. Geological Survey. The temblor
    occurred at 7:23 a.m. Pacific time at a depth of 4.3 miles.
    A magnitude 4.4 earthquake was reported at 6:25 a.m. and was felt over a large
    swath of Southern California.
    According to the USGS, the epicenter of the aftershock was five miles from Beverly
    Hills, six miles from Santa Monica and six miles from West Hollywood.
    In the last 10 days, there has been one earthquake of magnitude 3.0 or greater
    centered nearby.
    This information comes from the USGS Earthquake Notification Service and this
    post was created by an algorithm written by the author.
    http://lat.ms/1lTIGqa

  19. Classifying
    Everyone starts as a Level 1 human with no class

  20. Bayesian Classifiers
    Good/Bad? Spam/Not-spam?

  21. cardmagic/classifier
    require 'classifier'

    b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
    b.train_interesting "here are some good words. I hope you love them"
    b.train_uninteresting "here are some bad words, I hate you"
    b.classify "I hate bad words and you" # returns 'Uninteresting'
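A naive Bayes classifier scores each category by its prior probability times the probability of each word given that category, treating words as conditionally independent. The sketch below shows that calculation with add-one (Laplace) smoothing, summing logarithms to avoid floating-point underflow. The class is hypothetical and written for illustration; the classifier gem preprocesses text differently (it stems words, for example), so its internal scores will not match these.

```ruby
require 'set'

# Multinomial naive Bayes with add-one (Laplace) smoothing.
# Hypothetical class for illustration; not the classifier gem's Bayes.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    @doc_counts = Hash.new(0)
    @vocabulary = Set.new
  end

  # Counts one training document's words under +category+.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts.fetch(category)[word] += 1
      @vocabulary << word
    end
  end

  # Picks the category maximizing log P(category) + sum of log P(word | category).
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum
    @word_counts.keys.max_by do |category|
      counts = @word_counts[category]
      denominator = counts.values.sum + @vocabulary.size
      log_prior = Math.log((@doc_counts[category] + 1.0) / (total_docs + @word_counts.size))
      log_prior + words.sum { |word| Math.log((counts[word] + 1.0) / denominator) }
    end
  end

  private

  # Lowercases and splits on anything that is not a letter or apostrophe.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

b = TinyBayes.new('Interesting', 'Uninteresting')
b.train('Interesting', "here are some good words. I hope you love them")
b.train('Uninteresting', "here are some bad words, I hate you")
puts b.classify("I hate bad words and you") # prints: Uninteresting
```

'Uninteresting' wins here because "hate" and "bad" appear only in that category's training text, so their smoothed probabilities are higher there.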

  22. Latent Semantic Indexing
    Analyze text to identify the topic

  23. cardmagic/classifier
    require 'classifier'

    lsi = Classifier::LSI.new
    strings = [["This text deals with dogs. Dogs.", :dog],
               ["This text involves dogs too. Dogs! ", :dog],
               ["This text revolves around cats. Cats.", :cat],
               ["This text also involves cats. Cats!", :cat],
               ["This text involves birds. Birds.", :bird]]
    strings.each { |x| lsi.add_item x.first, x.last }

    lsi.search("dog", 3)
    # returns => ["This text deals with dogs. Dogs.", "This text involves dogs too. Dogs! ",
    #             "This text also involves cats. Cats!"]

    lsi.find_related(strings[2], 2)
    # returns => ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]

    lsi.classify "This text is also about dogs!"
    # returns => :dog
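LSI builds a term-document matrix A (one row per term, one column per document), approximates it with a truncated singular value decomposition A ≈ U_k S_k V_k^T, and compares texts by cosine similarity after projecting them into that k-dimensional space. The sketch below shows the idea in plain Ruby under simplifying assumptions: raw term counts with no weighting, no stemming, and the top singular vectors found by power iteration on A^T A with deflation rather than a library SVD routine. All names are hypothetical; this is not how the classifier gem computes its decomposition.

```ruby
# Latent semantic indexing sketch in plain Ruby (hypothetical names).

def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def normalize(v)
  norm = Math.sqrt(dot(v, v))
  v.map { |x| x / norm }
end

# Top-k eigenpairs of a symmetric matrix (array of rows), largest first,
# by power iteration with deflation.
def top_eigenpairs(matrix, k, iterations: 500)
  (1..k).map do
    v = normalize(Array.new(matrix.size) { |i| i + 1.0 }) # fixed starting vector
    iterations.times { v = normalize(matrix.map { |row| dot(row, v) }) }
    eigenvalue = dot(v, matrix.map { |row| dot(row, v) })
    # Deflation: remove this component so the next pass finds the next one.
    matrix = matrix.each_with_index.map do |row, i|
      row.each_with_index.map { |x, j| x - eigenvalue * v[i] * v[j] }
    end
    [eigenvalue, v]
  end
end

class TinyLSI
  def initialize(docs, rank: 2)
    @docs = docs
    @terms = docs.flat_map { |d| tokenize(d) }.uniq
    columns = docs.map { |d| term_vector(d) }               # columns of A
    gram = columns.map { |a| columns.map { |b| dot(a, b) } } # A^T A
    # Left singular vectors u_i = A v_i / sigma_i, stored in term space.
    @basis = top_eigenpairs(gram, rank).map do |eigenvalue, v|
      sigma = Math.sqrt(eigenvalue)
      @terms.each_index.map { |t| columns.each_index.sum { |j| columns[j][t] * v[j] } / sigma }
    end
  end

  # Coordinates of any text in the rank-k latent space (U_k^T x).
  def project(text)
    x = term_vector(text)
    @basis.map { |u| dot(u, x) }
  end

  # The n stored documents most similar to the query, best first.
  def search(query, n)
    q = project(query)
    @docs.max_by(n) { |doc| cosine(q, project(doc)) }
  end

  private

  # Raw term counts over the corpus vocabulary; unknown words are ignored.
  def term_vector(text)
    counts = tokenize(text).tally
    @terms.map { |term| counts.fetch(term, 0) }
  end

  def cosine(a, b)
    dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b))
  end
end

docs = [
  "This text deals with dogs. Dogs.",
  "This text involves dogs too. Dogs! ",
  "This text revolves around cats. Cats.",
  "This text also involves cats. Cats!",
  "This text involves birds. Birds."
]
lsi = TinyLSI.new(docs)
p lsi.search("dogs", 2)
```

Because this sketch does no stemming, the query has to use the word form that appears in the documents ("dogs", not "dog"); on this corpus the two closest documents are the two about dogs.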

  24. Recommending
    If I Knew You Were Comin' I'd've Baked a Cake
    http://www.flickr.com/photos/48973657@N00/4556156477/

  25. Jaccard Index
    How similar are sets of data?

  26. francois/jaccard
    require 'jaccard'

    a = ["likes:jeans", "likes:blue"]
    b = ["likes:jeans", "likes:apples", "likes:red"]
    c = ["likes:apples", "likes:red"]

    # Determines how similar a pair of sets is
    Jaccard.coefficient(a, b)
    #=> 0.25

    Jaccard.coefficient(a, c)
    #=> 0.0

    Jaccard.coefficient(b, c)
    #=> 0.6666666666666666

    # According to the input data, b and c have the most similar likes.
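The Jaccard index is the size of the intersection divided by the size of the union, J(A, B) = |A ∩ B| / |A ∪ B|, so it ranges from 0 (nothing shared) to 1 (identical sets). For example, a and b share one like out of four distinct likes, giving 1/4 = 0.25. A standard-library sketch that reproduces the numbers above; it is not the jaccard gem's code, and scoring two empty sets as 0.0 is an assumption of this sketch:

```ruby
require 'set'

# Jaccard index: |intersection| / |union|.
# Assumption of this sketch: two empty sets score 0.0.
def jaccard(a, b)
  a = a.to_set
  b = b.to_set
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:apples", "likes:red"]
c = ["likes:apples", "likes:red"]

puts jaccard(a, b) # prints: 0.25
puts jaccard(a, c) # prints: 0.0
puts jaccard(b, c) # prints: 0.6666666666666666

# The most similar pair is the one with the highest coefficient.
best = { "a-b" => [a, b], "a-c" => [a, c], "b-c" => [b, c] }.max_by { |_, (x, y)| jaccard(x, y) }
puts best.first # prints: b-c
```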

  27. davidcelis/recommendable
    class User
      recommends :movies, :books, :minerals, :other_things

      # ...
    end
       
    >> user.like(movie)
    => true
    >> user.liked_movies
    => [#<Movie ...>]
    >> user.recommended_movies
    => [#<Movie ...>, ...]
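One simple way to turn a similarity measure into recommendations is user-based collaborative filtering: find users whose likes overlap with yours, then suggest what they liked that you have not. The sketch below scores each candidate item by summing the Jaccard similarity of every user who liked it. It is a hypothetical illustration with made-up data (the `LIKES` hash and the `movie_*` ids), not a description of how recommendable computes its recommendations.

```ruby
require 'set'

# Hypothetical, made-up data: user => set of liked item ids.
LIKES = {
  "alice" => Set["movie_a", "movie_b", "movie_c"],
  "bob"   => Set["movie_a", "movie_b", "movie_d"],
  "carol" => Set["movie_c", "movie_e"],
  "dave"  => Set["movie_f"]
}

# Jaccard index of two sets (0.0 when both are empty).
def jaccard(a, b)
  union = a | b
  union.empty? ? 0.0 : (a & b).size.to_f / union.size
end

# Scores every item the user hasn't liked by summing the similarity of each
# other user who liked it, and returns the ids of the top n items.
def recommend(user, likes, n = 3)
  mine = likes.fetch(user)
  scores = Hash.new(0.0)
  likes.each do |other, theirs|
    next if other == user
    similarity = jaccard(mine, theirs)
    next if similarity.zero?
    (theirs - mine).each { |item| scores[item] += similarity }
  end
  scores.max_by(n) { |_, score| score }.map(&:first)
end

p recommend("alice", LIKES) # prints: ["movie_d", "movie_e"]
```

Alice shares two of four distinct likes with Bob (similarity 0.5) and one of four with Carol (0.25), so Bob's movie_d outranks Carol's movie_e; Dave shares nothing and contributes no score.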

  28. Practical Machine Learning: Innovations in Recommendation
    Ted Dunning & Ellen Friedman
    http://www.mapr.com/practical-machine-learning
