Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Recommending stuff in Ruby

Recommending stuff in Ruby

Building a simple recommendation engine in Ruby!

Herval Freire

March 14, 2014
Tweet

More Decks by Herval Freire

Other Decks in Programming

Transcript

  1. ARTIFICIAL INTELLIGENCE "the design of systems that perceive their environment

    and take actions to maximize its chances of success" - John McCarthy (1956)
  2. ARTIFICIAL INTELLIGENCE ‣End goal: achieve superhuman intelligence in a machine

    ‣The "golden dream" of the 80's ‣The “holy grail” of the Singularitarians
  3. “BUT WHAT SHOULD I USE AI FOR...?” Genetic algorithms Pick

    the most cost-effective combination of flights between NYC and Tashkent* Natural Language Processing "Siri, call me sexy!" (iPhone) Machine Vision Unlock your Android phone just by looking at it Neural networks "automatically detect and flag NSFW pictures" (Flickr) Classifiers "I got a high fever and my knees hurt. Am I dying?" (WebMD) * that’s in uzbekistan, if you’re asking
  4. But that’s all SO COMPLICATED… Isn’t there anything a bit

    more practical? - some pragmatic programmer
  5. ARTIFICIAL “INTELLIGENCE” ‣Basic techniques ‣Far from “intelligent”, still very useful

    Data clustering “classify blogs in categories based on their content” (Flipboard) Recommendation engines “people who watched The X-Files also liked...”
  6. IN A NUTSHELL... ‣Collect ratings/preferences from users (eg.: reviews, likes)

    ‣Compare people to other people based on what they like ‣Offer stuff similar people already liked
  7. STEP 1: BUILD A DATASET Boil preferences down to numbers*

    ‣ User liked a game = 0.5 point ‣ User purchased a game = 1 point ‣ User played a game = 2 points ‣ User reviewed a game = 3 points ‣ User liked, purchased, played and reviewed a game = 6.5 points * Resist the urge to use 1-5 star ratings
  8. STEP 1: BUILD A DATASET [ Entity.new('Lisa', { 'Prince of

    Persia' => 2.5, 'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Rise of the Triad' => 3.5, 'Commander Keen' => 2.5, 'Duke Nukem' => 3.0 }), Entity.new('Larry', { 'Prince of Persia' => 3.0, 'Doom' => 3.5, 'Castle Wolfenstein' => 1.5, 'Rise of the Triad' => 5.0, 'Duke Nukem' => 3.0, 'Commander Keen' => 3.5 }), Entity.new('Robert', { 'Prince of Persia' => 2.5, 'Doom' => 3.0, 'Rise of the Triad' => 3.5, 'Duke Nukem' => 4.0 }), Entity.new('Claudia', { 'Doom' => 3.5, 'Castle Wolfenstein' => 3.0, 'Duke Nukem' => 4.5, 'Rise of the Triad' => 4.0, 'Commander Keen' => 2.5 }), Entity.new('Mark', { 'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Castle Wolfenstein' => 2.0, 'Rise of the Triad' => 3.0, 'Duke Nukem' => 3.0, 'Commander Keen' => 2.0 }), Entity.new('Jane', { 'Prince of Persia' => 3.0, 'Doom' => 4.0, 'Duke Nukem' => 3.0, 'Rise of the Triad' => 5.0, 'Commander Keen' => 3.5 }), Entity.new('John', { 'Doom' => 4.5, 'Commander Keen' => 1.0, 'Rise of the Triad' => 4.0 }) ]
  9. STEP 2: COMPARE PEOPLE ‣ Compare each person to one

    another, generating a similarity score ‣ Euclidian distance between each pair of ratings: ‣ Many other distance calculation algorithms exist: linear distance, jaccard, manhattan, tanimoto, etc ‣ Better algorithm = more interesting results
  10. STEP 2: COMPARE PEOPLE # Returns the euclidian distance between

    person1 and person2 def distance(person1, person2) rated_by_both = person1.ratings.select { |game| person2.ratings[game] } # if they have no ratings in common, return 0 return 0.0 if rated_by_both.empty? # add up the squares of all the differences sum_of_squares = 0.0 person1.ratings.collect do |game, score| person2_score = person2.ratings[game] next if !person2_score sum_of_squares += ((score - person2_score) ** 2) end 1.0 / (1.0 + sum_of_squares) end
  11. STEP 3: SIMILAR USERS Grab the n users with the

    highest level of similarity (in other words, the people closest to you, according to the distance algorithm from step 2)
  12. STEP 3: SIMILAR USERS # Returns the 5 best matching

    people (most similar preferences) def top_matches(person, all_ratings) other_people = all_ratings.select { |person2| person2.name != person.name } other_people.collect do |other_person| [ other_person, distance(person, other_person) # change this to use other algorithms ] end.sort_by { |sim| sim[1] }.reverse[0..5] end # People similar to John: # Mark (30% match) # Robert (28% match) # Claudia (23% match) # Lisa (22% match) # Jane (11% match) # Larry (10% match)
  13. STEP 3: SIMILAR USERS # Returns the 5 best matching

    people (most similar preferences) def top_matches(person, all_ratings) other_people = all_ratings.select { |person2| person2.name != person.name } other_people.collect do |other_person| [ other_person, distance(person, other_person) # change this to use other algorithms ] end.sort_by { |sim| sim[1] }.reverse[0..5] end ! Achievement unlocked “people you should follow: John, Mary” # People similar to John: # Mark (30% match) # Robert (28% match) # Claudia (23% match) # Lisa (22% match) # Jane (11% match) # Larry (10% match)
  14. STEP 4: RECOMMENDING A GAME ‣Grab each user’s ratings to

    games you haven’t rated ‣Multiply that by how similar the other user is to you (opinions from people similar to you weight more) ‣Grab the highest numbers
  15. # Recommended games for John: # Duke Nukem # Prince

    of Persia # Castle Wolfenstein STEP 4: RECOMMENDING A GAME # Gets recommendations for a person by using a weighted average of every other user's ratings def recommendations(person, other_people) similarities = {} other_people.each do |other_person| similarity = distance(person, other_person) # ignore scores of zero or lower next if similarity <= 0 other_person.ratings.each do |other_person_game, other_person_score| # only score what I haven't rated yet next if person.ratings[other_person_game] similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 } # sum of weighted rating times similarity and total similarity similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity similarity_for_game[:sum] += similarity end end # normalize list and sort by highest scores first similarities.collect do |game_name, score| [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse end
  16. # Recommended games for John: # Duke Nukem # Prince

    of Persia # Castle Wolfenstein STEP 4: RECOMMENDING A GAME # Gets recommendations for a person by using a weighted average of every other user's ratings def recommendations(person, other_people) similarities = {} other_people.each do |other_person| similarity = distance(person, other_person) # ignore scores of zero or lower next if similarity <= 0 other_person.ratings.each do |other_person_game, other_person_score| # only score what I haven't rated yet next if person.ratings[other_person_game] similarity_for_game = similarities[other_person_game] ||= { :weighted => 0, :sum => 0 } # sum of weighted rating times similarity and total similarity similarity_for_game[:weighted] += other_person.ratings[other_person_game] * similarity similarity_for_game[:sum] += similarity end end # normalize list and sort by highest scores first similarities.collect do |game_name, score| [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse end ! Achievement unlocked “games recommended to John: Pac Man, Doom 3”
  17. STEP 5: SIMILAR GAMES ‣ Invert users x preferences, then

    use the exact same algorithm as step 1 to find similar games based solely on people’s interactions (“item-based filtering”). ‣ Cross-compare everything. This might take a very long time for a large number of games… Hint: save this data on a persistent storage will lead to very fast recommendation lookups (that’s what most recommendation engines save, in fact) # User has many ratings "Larry" => { "Prince of Persia" => 3.0, "Doom" => 3.5, "Castle Wolfenstein" => 1.5 }, "Robert" => { "Prince of Persia" => 2.5, "Doom" => 3.0 }, "Jane" => { "Prince of Persia" => 3.0, "Doom" => 4.0 }, "John" => { "Doom" => 4.5 } # Game rated by many users "Prince of Persia" => { "Larry"=>3.0, "Robert"=>2.5, "Jane"=>3.0 }, "Doom" => { "Larry"=>3.5, "Robert"=>3.0, "Jane"=>4.0, "John"=>4.5 }, "Castle Wolfenstein" => { "Larry"=>1.5, "Mark"=>2.0 }
  18. # Similar games: # Prince of Persia: Commander Keen (40%),

    Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%) # Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%) # Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%) # Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%) # Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%) # Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%) STEP 5: SIMILAR GAMES # Create a dictionary of games showing which other games they # are most similar to. This should be run often and cached for reuse def calculate_similar_games(game_ratings) Hash[game_ratings.collect do |game| [ game.name, top_matches(game, game_ratings) ] end] end
  19. # Similar games: # Prince of Persia: Commander Keen (40%),

    Duke Nukem (28%), Castle Wolfenstein (22%), Doom (22%), Rise of the Triad (9%) # Doom: Prince of Persia (22%), Duke Nukem (18%), Rise of the Triad (16%), Castle Wolfenstein (10%), Commander Keen (5%) # Castle Wolfenstein: Prince of Persia (22%), Commander Keen (18%), Duke Nukem (15%), Doom (10%), Rise of the Triad (6%) # Rise of the Triad: Doom (16%), Duke Nukem (10%), Prince of Persia (9%), Castle Wolfenstein (6%), Commander Keen (5%) # Commander Keen: Prince of Persia (40%), Castle Wolfenstein (18%), Duke Nukem (14%), Rise of the Triad (5%), Doom (5%) # Duke Nukem: Prince of Persia (28%), Doom (18%), Castle Wolfenstein (15%), Commander Keen (14%), Rise of the Triad (10%) STEP 5: SIMILAR GAMES ! Achievement unlocked “Doom is similar to Daikatana and Quake” # Create a dictionary of games showing which other games they # are most similar to. This should be run often and cached for reuse def calculate_similar_games(game_ratings) Hash[game_ratings.collect do |game| [ game.name, top_matches(game, game_ratings) ] end] end
  20. BONUS STAGE: FASTER RECOMMENDATIONS ‣A slightly tweaked version of the

    algorithm on step 2: just use the pre-calculated similarities instead of doing distances in the loop ‣Up to 10x faster in a pure Ruby implementation (if you’re still with me)
  21. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me) #

    this is very similar to the recommendations() algorithm, # except we use a pre-calculated similar_games_matrix instead of # calculating distances here def recommended_games(similar_games_matrix, user) similarities = {} user.ratings.each do |game_name, user_rating| # Loop over pre-cached game similarities to the current game similar_games_matrix[game_name].each do |game, similarity| # Ignore if this user has already rated this similar game next if user.ratings[game.name] score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 } # Weighted sum of rating times similarity and sum of similarities score_for_game[:weighted] += similarity * user_rating score_for_game[:sum] += similarity end end # Divide each total score by total weighting to get an average # Return the rankings from highest to lowest similarities.collect do |game_name, score| [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse end
  22. BONUS STAGE: FASTER RECOMMENDATIONS (if you’re still with me) #

    this is very similar to the recommendations() algorithm, # except we use a pre-calculated similar_games_matrix instead of # calculating distances here def recommended_games(similar_games_matrix, user) similarities = {} user.ratings.each do |game_name, user_rating| # Loop over pre-cached game similarities to the current game similar_games_matrix[game_name].each do |game, similarity| # Ignore if this user has already rated this similar game next if user.ratings[game.name] score_for_game = similarities[game.name] ||= { :weighted => 0, :sum => 0 } # Weighted sum of rating times similarity and sum of similarities score_for_game[:weighted] += similarity * user_rating score_for_game[:sum] += similarity end end # Divide each total score by total weighting to get an average # Return the rankings from highest to lowest similarities.collect do |game_name, score| [ game_name, (score[:weighted] / score[:sum]) ] end.sort_by { |sim| sim[1] }.reverse end ! Achievement unlocked EPIC WIN!
  23. Yes, we have RubyGems recommendify top choice according to RubyToolbox

    recommendable “Rails-compatible” acts_as_recommended doesn’t require Redis (not actively maintained)
  24. “RECOMMENDIFY” EXAMPLE class GameRecommender < Recommendify::Base # store only the

    top 10 neighbors per item max_neighbors 10 # define an input data set "game_ratings". we'll add "user_id->game_id" # pairs to this input and use the jaccard coefficient to retrieve a # "users that liked game i1 also liked game i2" list input_matrix :game_ratings, :similarity_func => :jaccard, :weight => 5.0 end recommender = GameRecommender.new # add `order_id->product_id` interactions to the order_item_sim input # you can add data incrementally and call RecommendedItem.process! to update # the similarity matrix at any time. recommender.game_ratings.add_set("John", ["Duke Nukem", "Doom", "Quake"]) recommender.game_ratings.add_set("Mary", ["Prince of Persia", "Doom"]) # Calculate all elements of the similarity matrix recommender.process! # retrieve similar games to "Doom" recommender.for("Doom") => [ <Recommendify::Neighbor item_id:"Duke Nukem" similarity:0.23>, (...) ]
  25. “RECOMMENDABLE” EXAMPLE class User < ActiveRecord::Base recommends :books, :movies, :games

    end >> friend.like(Movie.where(:name => "2001: A Space Odyssey").first) >> friend.like(Book.where(:title => "A Clockwork Orange").first) >> friend.like(Book.where(:title => "Brave New World").first) >> friend.like(Book.where(:title => "One Flew Over the Cuckoo's Next").first) >> user.like(Book.where(:title => "A Clockwork Orange").first) >> user.recommended_books => [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's Nest">] >> user.recommended_movies => [#<Movie name: "A Clockwork Orange">]
  26. CLOSING REMARKS ‣Gems are cool, but you’ll have to dive

    into the code for better results ‣Crossing social filtering with other AI techniques (e.g.: content classification) produces dramatically better results
  27. ZOMG I NEED TO KNOW MORE Code from this presentation:

    https://gist.github.com/herval/4992503 Stuff to read: ‣ AI Application Programming: http://ai-app-prog.rubyforge.org ‣ Programming Collective Intelligence: http://amzn.to/XtANMl (great book for noobs) Ready-to-use Gems ‣ https://github.com/paulasmuth/recommendify ‣ https://github.com/davidcelis/recommendable ‣ https://github.com/Draiken/acts_as_recommended Serious AI algorithms in Ruby ‣ https://github.com/SergioFierens/ai4r ‣ http://web.media.mit.edu/~dustin/papers/ai_ruby_plugins/ ‣ https://github.com/kanwei/algorithms