Slide 1

Machine Learning Techniques
Ben Curtis / [email protected] / @stympy

Slide 2

Making Sense of a Bunch of Data
Ben Curtis / [email protected] / @stympy

Slide 4

Logging
It's big, it's heavy, it's wood
http://www.stockvault.net/photo/102553/cutdown

Slide 5

Graylog2
Apache Kafka

Slide 6

Clustering
So happy together
http://www.stockvault.net/photo/115439/yellow-flowers

Slide 7

K-Means Clustering
Group anything that can be plotted on an X-Y chart
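The algorithm itself fits in a few lines of plain Ruby. This is a hypothetical sketch (the `k_means` helper below is not from the talk or any gem): assign each 2-D point to its nearest centroid, then move each centroid to the mean of its cluster, and repeat.

```ruby
# Minimal k-means sketch (hypothetical helper, not from the talk).
# For simplicity the initial centroids are the first k distinct points,
# which makes this run deterministic; real implementations seed randomly.
def k_means(points, k, iterations: 10)
  centroids = points.uniq.first(k)
  clusters = {}
  iterations.times do
    # Assignment step: group each point under its nearest centroid.
    clusters = points.group_by do |p|
      centroids.min_by { |c| (p[0] - c[0])**2 + (p[1] - c[1])**2 }
    end
    # Update step: move each centroid to the mean of its cluster.
    centroids = clusters.values.map do |pts|
      [pts.sum { |p| p[0] }.to_f / pts.size,
       pts.sum { |p| p[1] }.to_f / pts.size]
    end
  end
  clusters.values
end

data = [[1, 1], [1, 2], [2, 1], [1000, 1000], [998, 1002]]
k_means(data, 2)
# => [[[1, 1], [1, 2], [2, 1]], [[1000, 1000], [998, 1002]]]
```

A fixed iteration count stands in for a proper convergence check; stopping when the assignments no longer change is the usual refinement.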

Slide 8

K-Means Live Demo!
http://en.wikipedia.org/wiki/K-means

Slide 15

reddavis/K-Means

require 'k_means'

data = [[1, 1], [1, 2], [1, 1], [1000, 1000], [500, 500]]
kmeans = KMeans.new(data, :centroids => 2)
kmeans.inspect # Use kmeans.view to get hold of the un-inspected array
# => [[3, 4], [0, 1, 2]]

Slide 16

Deciding
…and that has made all the difference
http://www.flickr.com/photos/47051377@N00/4033866900/

Slide 17

igrigorik/decisiontree

require 'decisiontree'

attributes = ['Temperature']
training = [
  [36.6, 'healthy'],
  [37, 'sick'],
  [38, 'sick'],
  [36.7, 'healthy'],
  [40, 'sick'],
  [50, 'really sick'],
]

# Instantiate the tree, and train it based on the data (set default to '1')
dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
dec_tree.train

test = [37, 'sick']
decision = dec_tree.predict(test)
puts "Predicted: #{decision} ... True decision: #{test.last}"

# => Predicted: sick ... True decision: sick

Slide 18

A shallow magnitude 2.7 earthquake aftershock was reported Monday morning four miles from Westwood, according to the U.S. Geological Survey. The temblor occurred at 7:23 a.m. Pacific time at a depth of 4.3 miles. A magnitude 4.4 earthquake was reported at 6:25 a.m. and was felt over a large swath of Southern California. According to the USGS, the epicenter of the aftershock was five miles from Beverly Hills, six miles from Santa Monica and six miles from West Hollywood. In the last 10 days, there has been one earthquake of magnitude 3.0 or greater centered nearby. This information comes from the USGS Earthquake Notification Service and this post was created by an algorithm written by the author. http://lat.ms/1lTIGqa

Slide 19

Classifying
Everyone starts as a Level 1 human with no class

Slide 20

Bayesian Classifiers
Good/Bad? Spam/Not-spam?

Slide 21

cardmagic/classifier

require 'classifier'

b = Classifier::Bayes.new 'Interesting', 'Uninteresting'
b.train_interesting "here are some good words. I hope you love them"
b.train_uninteresting "here are some bad words, I hate you"
b.classify "I hate bad words and you" # returns 'Uninteresting'

Slide 22

Latent Semantic Indexing
Analyze text to identify the topic

Slide 23

cardmagic/classifier

require 'classifier'

lsi = Classifier::LSI.new
strings = [ ["This text deals with dogs. Dogs.", :dog],
            ["This text involves dogs too. Dogs! ", :dog],
            ["This text revolves around cats. Cats.", :cat],
            ["This text also involves cats. Cats!", :cat],
            ["This text involves birds. Birds.", :bird] ]
strings.each { |x| lsi.add_item x.first, x.last }

lsi.search("dog", 3)
# returns => ["This text deals with dogs. Dogs.", "This text involves dogs too. Dogs! ",
#             "This text also involves cats. Cats!"]

lsi.find_related(strings[2], 2)
# returns => ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]

lsi.classify "This text is also about dogs!"
# returns => :dog

Slide 24

Recommending
If I Knew You Were Comin' I'd've Baked a Cake
http://www.flickr.com/photos/48973657@N00/4556156477/

Slide 25

Jaccard Index
How similar are sets of data?
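The index itself is one line of math: the size of the intersection of two sets divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. A plain-Ruby sketch (the `jaccard` helper is hypothetical, not from the talk; the francois/jaccard gem on the next slide computes the same value):

```ruby
# Jaccard index: |A ∩ B| / |A ∪ B|.
# Ranges from 0.0 (no shared elements) to 1.0 (identical sets).
def jaccard(a, b)
  (a & b).size.to_f / (a | b).size
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:apples", "likes:red"]
jaccard(a, b) #=> 0.25  (1 shared like out of 4 distinct likes)
```

Ruby's `&` and `|` array operators deduplicate, so the helper treats its inputs as sets even if they contain repeats.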

Slide 26

francois/jaccard

require 'jaccard'

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:apples", "likes:red"]
c = ["likes:apples", "likes:red"]

# Determines how similar a pair of sets are
Jaccard.coefficient(a, b)
#=> 0.25

Jaccard.coefficient(a, c)
#=> 0.0

Jaccard.coefficient(b, c)
#=> 0.6666666666666666

# According to the input data, b and c have the most similar likes.

Slide 27

davidcelis/recommendable

class User
  recommends :movies, :books, :minerals, :other_things
  # ...
end

>> user.like(movie)
=> true
>> user.liked_movies
=> [#<Movie ...>]
>> user.recommended_movies
=> [#<Movie ...>, ...]

Slide 28

Practical Machine Learning: Innovations in Recommendation
Ted Dunning & Ellen Friedman
http://www.mapr.com/practical-machine-learning

Slide 29

Machine Learning Techniques
Ben Curtis / [email protected] / @stympy