MakerSquare - Algorithms

sdevani
October 31, 2013

by Elben Shira

Transcript

  1. Why? At Mass Relevance we have to analyze lots of things really fast.
    Millions of hits/minute. 10,000 Tweets/second.
  2. King James Bible
    s = "In the beginning God created the heaven and the earth…"
    Top Words: God 1000, Jesus 300, Good 300, Evil 300
    How do we do this? What data structures can we use? Array, Hash, Set
  3. Brainstorm #1 Let's try an array. Say we have an array like this:
    [["in", 1], ["the", 1], ["beginning", 1], ["god", 1], ["created", 1]]
  4. Brainstorm #1
    [["in", 1], ["the", 1], ["beginning", 1], ["god", 1], ["created", 1]]
    What happens when I increment the word earth?
    [["in", 1], ["the", 1], ["beginning", 1], ["god", 1], ["created", 1], ["earth", 1]]
  5. Brainstorm #1
    [["in", 1], ["the", 1], ["beginning", 1], ["god", 1], ["created", 1], ["earth", 1]]
    What happens when I increment the word beginning?
    [["in", 1], ["the", 1], ["beginning", 2], ["god", 1], ["created", 1], ["earth", 1]]
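The two increments on the slides above can be sketched in Ruby as follows. This is a minimal illustration, not code from the deck: the helper name `increment_pair` is hypothetical. Note that every call scans the array from the front, so each increment takes time proportional to the number of distinct words seen so far.

```ruby
# Increment a word's count in an array of [word, count] pairs.
# `increment_pair` is a hypothetical helper name, not from the deck.
def increment_pair(pairs, word)
  # Linear scan: look at each pair until the word is found.
  pair = pairs.find { |w, _count| w == word }
  if pair
    pair[1] += 1        # word already present: bump its count in place
  else
    pairs << [word, 1]  # new word: append a fresh pair at the end
  end
  pairs
end

pairs = [["in", 1], ["the", 1], ["beginning", 1], ["god", 1], ["created", 1]]
increment_pair(pairs, "earth")      # appends ["earth", 1]
increment_pair(pairs, "beginning")  # changes ["beginning", 1] to ["beginning", 2]
p pairs
```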
  6. Brainstorm #2 Say we have a hash like this:
    {
      "in"        => 1,
      "the"       => 1,
      "beginning" => 1,
      "god"       => 1,
      "created"   => 1,
    }
    Increment the word earth.
  7. Brainstorm #2 Say we have a hash like this:
    {
      "in"        => 1,
      "the"       => 1,
      "beginning" => 1,
      "god"       => 1,
      "created"   => 1,
      "earth"     => 1,
    }
  8. Brainstorm #2 Say we have a hash like this:
    {
      "in"        => 1,
      "the"       => 1,
      "beginning" => 1,
      "god"       => 1,
      "created"   => 1,
      "earth"     => 1,
    }
    Increment the word beginning.
  9. Brainstorm #2 Say we have a hash like this:
    {
      "in"        => 1,
      "the"       => 1,
      "beginning" => 2,
      "god"       => 1,
      "created"   => 1,
      "earth"     => 1,
    }
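The same two increments with a hash might look like the sketch below. Setting a default of 0 is an assumption made here so that a missing word reads as 0 rather than nil; with that in place, adding a new word and bumping an existing one are the same one-line operation, and neither needs a scan.

```ruby
counts = {
  "in"        => 1,
  "the"       => 1,
  "beginning" => 1,
  "god"       => 1,
  "created"   => 1,
}
counts.default = 0        # missing keys now read as 0 instead of nil

counts["earth"] += 1      # new word: 0 + 1, stored under "earth"
counts["beginning"] += 1  # existing word: 1 + 1
p counts
```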
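  10. The code
    s = File.read("king_james_bible.txt")
    # "\s" in a double-quoted string is a single space, so this splits on runs of whitespace.
    words = s.split("\s")
    # Hash.new(0) returns 0 for words we have not seen yet.
    counts = Hash.new(0)
    words.each do |w|
      counts[w] += 1
    end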
  11. The code
    sorted = counts.to_a.sort_by do |word, count|
      -count
    end

    sorted.take(10)
  12. The results
    the    64106
    and    51383
    of     34708
    to     13630
    that   12799
    in     12560
    he     10264
    shall   9840
    unto    8987
    for     8838
    Not very useful! Need stop words.
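The counting and sorting steps can be tried end to end on a single verse instead of the whole file. This is a sketch only: the method name `top_words` is hypothetical, and the alphabetical tie-breaker is an addition not present in the deck, included so that words with equal counts come out in a predictable order.

```ruby
# Return the n most frequent words as [word, count] pairs.
# `top_words` is a hypothetical name; the deck does this inline.
def top_words(text, n)
  counts = Hash.new(0)
  text.downcase.split.each { |w| counts[w] += 1 }
  # Sort by descending count, then alphabetically to break ties.
  counts.sort_by { |word, count| [-count, word] }.take(n)
end

verse = "In the beginning God created the heaven and the earth"
p top_words(verse, 3)
# => [["the", 3], ["and", 1], ["beginning", 1]]
```

Even on one verse the top word is "the", which is the same problem the full results show.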
  13. The code
    require "set"

    stopwords = Set.new(File.read("stop.txt").split(","))

    s = File.read("king_james_bible.txt")
    words = s.split("\s").map(&:downcase)
    counts = Hash.new(0)
    words.each do |w|
      # Count a word only if it is not a stop word.
      counts[w] += 1 unless stopwords.include?(w)
    end
  14. The results
    unto   8987
    thou   5202
    lord   4739
    thy    4600
    ye     3847
    god    2303
    hath   2242
    him,   2034
    came   1991
    man    1978
    Need better stop words!
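The entry "him," in these results keeps its trailing comma because splitting on whitespace leaves punctuation attached to words. One possible refinement, sketched below, is to pull out runs of letters with a regular expression before filtering. This goes beyond what the deck does: the method name `count_words`, the regular expression, and the tiny stop-word list are all assumptions for illustration.

```ruby
require "set"

# Count words in text, skipping stop words.
# `count_words` is a hypothetical helper, not from the deck.
def count_words(text, stopwords)
  counts = Hash.new(0)
  # scan keeps only runs of letters and apostrophes, so "him," becomes "him".
  text.downcase.scan(/[a-z']+/).each do |w|
    counts[w] += 1 unless stopwords.include?(w)
  end
  counts
end

# Illustrative stop-word list; a real one would be much longer.
stopwords = Set.new(%w[and unto there be was let])
text = "And God said unto him, Let there be light: and there was light."
counts = count_words(text, stopwords)
p counts
```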
  15. Project Gutenberg
    Input: Books, categorized into astronomy, physics, religion, philosophy and archeology.
    Output: Some method that takes in a new book we've never seen before and spits out the category it's in.
  16. Tips
    Use arrays, hashes, and sets. Draw your data. Think: word counts, stop words.
    Have fun!
    https://bitbucket.org/elben/makersquare-project-gutenberg/
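One way to combine the tips above into a categorizer is sketched here. It is only a starting point under stated assumptions, not the project's reference solution: it reduces each text to a set of its most frequent non-stop words and picks the category whose training text shares the most words with the new book. The names `top_word_set` and `classify`, the short strings standing in for books, and the stop-word list are all hypothetical.

```ruby
require "set"

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOPWORDS = Set.new(%w[the of and a in is to that it])

# Set of the n most frequent non-stop words in text.
def top_word_set(text, n)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/).each do |w|
    counts[w] += 1 unless STOPWORDS.include?(w)
  end
  counts.sort_by { |word, count| [-count, word] }.take(n).map(&:first).to_set
end

# Pick the category whose top words overlap most with the book's top words.
def classify(book_text, training, n = 5)
  profile = top_word_set(book_text, n)
  best = training.max_by { |_category, text| (top_word_set(text, n) & profile).size }
  best.first
end

# Short strings stand in for whole books here.
training = {
  "astronomy" => "the star and the planet orbit the sun star planet orbit telescope",
  "religion"  => "god and the lord and the prophet prayer god lord faith",
}
puts classify("a telescope shows the planet and its orbit around a star", training)
# prints "astronomy"
```

With real books, a larger n and a proper stop-word list would matter much more than they do in this toy input.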