Building a Recommendation Service in Ruby

Jan Bernacki
December 15, 2012

Transcript

  1. Over 160,000,000 products on Amazon, over 4,500,000 repositories on GitHub, over 10,000,000 books on Goodreads.

     Saturday, December 15, 12
  2. MATHEMATICALLY! SIMILARITY: COSINE MEASURE

     $$\cos(\vec{u}_1, \vec{u}_2) = \frac{\vec{u}_1 \cdot \vec{u}_2}{\|\vec{u}_1\| \, \|\vec{u}_2\|} = \frac{u_{11} u_{21} + \dots + u_{1n} u_{2n}}{\sqrt{u_{11}^2 + \dots + u_{1n}^2} \, \sqrt{u_{21}^2 + \dots + u_{2n}^2}}$$
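
A minimal Ruby sketch of the cosine measure on this slide, assuming each user is an equal-length array of numeric ratings. The method name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions made for illustration; they are not from the talk.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Returns 0.0 when either vector is all zeros (an assumed convention).
def cosine_similarity(u1, u2)
  raise ArgumentError, 'vectors must have the same size' unless u1.size == u2.size

  dot   = u1.zip(u2).sum { |a, b| a * b }
  norm1 = Math.sqrt(u1.sum { |a| a * a })
  norm2 = Math.sqrt(u2.sum { |b| b * b })
  return 0.0 if norm1.zero? || norm2.zero?

  dot / (norm1 * norm2)
end

# Two users with 2 and 3 starred repositories, one of them in common.
puts cosine_similarity([1, 1, 0, 0], [0, 1, 1, 1]).round(3) # prints 0.408
```

For binary vectors like these the result is simply the number of shared items divided by the square root of the product of the two item counts, here 1 / sqrt(2 * 3).
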
  3. SIMILARITY: PEARSON CORRELATION COEFFICIENT

     $$w_{uv} = \frac{\sum_{i \in I} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{ui} - \bar{r}_u)^2} \, \sqrt{\sum_{i \in I} (r_{vi} - \bar{r}_v)^2}}$$
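
The Pearson coefficient on this slide can be sketched in Ruby as follows. This is only an illustration: the `pearson` name, the hash-based input format, and the decision to compute each user's mean over the co-rated items only are assumptions (the slide does not say which ratings the means are taken over).

```ruby
# Pearson correlation between two users' ratings.
# ratings_u / ratings_v: Hash of item => numeric rating.
# Assumption: means are computed over the co-rated items only.
def pearson(ratings_u, ratings_v)
  common = ratings_u.keys & ratings_v.keys
  return 0.0 if common.empty?

  mean_u = common.sum { |i| ratings_u[i] }.to_f / common.size
  mean_v = common.sum { |i| ratings_v[i] }.to_f / common.size

  num   = common.sum { |i| (ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) }
  den_u = Math.sqrt(common.sum { |i| (ratings_u[i] - mean_u)**2 })
  den_v = Math.sqrt(common.sum { |i| (ratings_v[i] - mean_v)**2 })
  return 0.0 if den_u.zero? || den_v.zero?

  num / (den_u * den_v)
end

alice = { a: 5, b: 3, c: 1 }
bob   = { a: 4, b: 2, c: 0, d: 5 } # same taste, shifted one point down
carol = { a: 1, b: 3, c: 5 }       # opposite taste

puts pearson(alice, bob).round(3)   # prints 1.0
puts pearson(alice, carol).round(3) # prints -1.0
```

Unlike the cosine measure, subtracting the means makes the result insensitive to users who rate everything consistently higher or lower.
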
  4. DATA RECONSTRUCTION: WEIGHTED SUM OF RATINGS

     $$P_{ai} = \bar{r}_a + \frac{\sum_{u \in U} (r_{ui} - \bar{r}_u) \, w_{au}}{\sum_{u \in U} w_{au}}$$
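
A small Ruby sketch of the weighted-sum prediction on this slide. The data layout, the helper names `mean` and `predict`, and the rule that only neighbours who rated the item contribute are assumptions for illustration. The denominator sums the weights as written on the slide; many references use the absolute values of the weights instead.

```ruby
# Mean of a list of numeric ratings.
def mean(values)
  values.sum.to_f / values.size
end

# Predict the active user's rating of an item: the user's own mean plus the
# weighted average of neighbours' deviations from their own means.
# ratings: Hash of user => { item => rating }; weights: Hash of user => w_au.
def predict(ratings, weights, active, item)
  base = mean(ratings[active].values)
  neighbours = weights.keys.select { |u| ratings[u].key?(item) }
  denom = neighbours.sum { |u| weights[u] }
  return base if neighbours.empty? || denom.zero?

  numer = neighbours.sum { |u| (ratings[u][item] - mean(ratings[u].values)) * weights[u] }
  base + numer / denom
end

# Hypothetical ratings and precomputed similarities to 'ann'.
ratings = {
  'ann' => { 'x' => 4, 'y' => 2 },
  'bob' => { 'x' => 5, 'y' => 3, 'z' => 4 },
  'cat' => { 'x' => 2, 'y' => 1, 'z' => 3 }
}
weights = { 'bob' => 0.75, 'cat' => 0.25 }

puts predict(ratings, weights, 'ann', 'z') # prints 3.25
```

Here bob rated `z` exactly at his own mean and cat rated it one point above hers, so ann's mean of 3.0 is shifted up by 0.25.
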
  5. [Figure: binary matrix with axes REPOSITORIES and USERS; the filled cells contain 1.]

  6. [Figure: the same REPOSITORIES and USERS matrix, plus a row labelled USER containing two 1s.]

  7. [Figure: the same matrix and USER row, annotated with the values 0.408, 0.353, 0.499 and 0.816.]

  8. [Figure: the same matrix and USER row, annotated with the values 0.408, 0.353, 0.499 and 0.816.]

  9. [Figure: the same matrix annotated with 0.408, 0.353, 0.499 and 0.816; the added row now reads 1, 0.23, 1, 0.53, 0.47 and is labelled USERS.]
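
The matrix itself is not recoverable from the transcript, so the sketch below uses hypothetical star data, chosen so that the similarities land close to the values shown on these slides (about 0.816, 0.5, 0.353 and 0.408). It shows one simple way to go from a binary users and repositories matrix to scores for repositories the target user has not starred. The names `STARS`, `binary_cosine` and `score_unseen`, and the scoring rule itself, are assumptions and not necessarily what the talk used.

```ruby
require 'set'

# Hypothetical star data: user => set of starred repositories.
STARS = {
  'u1' => Set['rails', 'sinatra', 'rack'],
  'u2' => Set['rails', 'django'],
  'u3' => Set['sinatra', 'rack', 'puma', 'thin'],
  'u4' => Set['rack', 'thin', 'unicorn']
}.freeze

# For binary vectors, cosine similarity reduces to |A & B| / sqrt(|A| * |B|).
def binary_cosine(a, b)
  return 0.0 if a.empty? || b.empty?

  (a & b).size / Math.sqrt(a.size * b.size)
end

# Score every repository the target has not starred by the
# similarity-weighted share of users who starred it.
def score_unseen(target, stars)
  sims = stars.transform_values { |repos| binary_cosine(target, repos) }
  total = sims.values.sum
  return {} if total.zero?

  candidates = stars.values.reduce(Set.new, :|) - target
  candidates.to_h do |repo|
    weight = stars.sum { |user, repos| repos.include?(repo) ? sims[user] : 0.0 }
    [repo, (weight / total).round(3)]
  end
end

target = Set['rails', 'rack']
STARS.each { |user, repos| puts "#{user}: #{binary_cosine(target, repos).round(3)}" }
p score_unseen(target, STARS)
```

Repositories starred by the most similar users end up with the highest scores, which is the idea behind filling in the empty cells of the target user's row.
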
  10. QUALITY EVALUATION

      $$\mathrm{MAE} = \frac{\sum_{i,j} |p_{i,j} - r_{i,j}|}{n}$$

      $$\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}$$

      $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i,j} (p_{i,j} - r_{i,j})^2}$$
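
The three metrics on this slide can be sketched in Ruby as below, assuming predictions and actual ratings arrive as two parallel arrays. The method names and the sample numbers are made up for illustration.

```ruby
# Mean absolute error over paired predictions and actual ratings.
def mae(predicted, actual)
  predicted.zip(actual).sum { |pred, real| (pred - real).abs }.to_f / predicted.size
end

# MAE normalised by the width of the rating scale.
def nmae(predicted, actual, r_min, r_max)
  mae(predicted, actual) / (r_max - r_min)
end

# Root mean squared error; penalises large misses more than MAE does.
def rmse(predicted, actual)
  Math.sqrt(predicted.zip(actual).sum { |pred, real| (pred - real)**2 }.to_f / predicted.size)
end

predicted = [4.0, 3.0, 5.0, 2.0]
actual    = [5.0, 3.0, 4.0, 4.0]

puts mae(predicted, actual)            # prints 1.0
puts nmae(predicted, actual, 1, 5)     # prints 0.25
puts rmse(predicted, actual).round(3)  # prints 1.225
```

The single two-point miss raises RMSE above MAE, which is why the two are often reported together.
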
  11. LOADING DATA

          require 'em-synchrony'
          require 'em-synchrony/em-http'
          require 'em-synchrony/fiber_iterator'

          class Fetcher
            def fetch_starred_repos users
              EM.synchrony do
                # Fetch starred repos for up to 10 users concurrently
                EM::Synchrony::FiberIterator.new(users, 10).each do |user|
                  http = EM::HttpRequest.new(user.starred_url).get
                  # store repos
                  # ...
                end
                EM.stop
              end
            end
          end
  12. JRUBY & MAHOUT

          def recommend user_id
            connection = init_connection
            model = org.apache.mahout.cf.taste.impl.model.jdbc.
              PostgreSQLJDBCDataModel.new(connection, 'stars', 'user_id',
                                          'repo_id', 'preference', 'created_at')
            data = org.apache.mahout.cf.taste.impl.model.jdbc.
              ReloadFromJDBCDataModel.new(model)
            similarity = org.apache.mahout.cf.taste.impl.similarity.
              TanimotoCoefficientSimilarity.new(data)
            neighborhood = org.apache.mahout.cf.taste.impl.neighborhood.
              NearestNUserNeighborhood.new(5, similarity, data)
            recommender = org.apache.mahout.cf.taste.impl.recommender.
              GenericBooleanPrefUserBasedRecommender.new(data, neighborhood, similarity)
            recommendations = recommender.recommend user_id, 30
            update_recommendations connection, recommendations, user_id
            connection.close
          end

      JAVA CODE! http://goo.gl/dEAp9
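
The recommender on this slide is configured with a Tanimoto coefficient similarity. As a plain-Ruby illustration of what that coefficient measures (this is not Mahout's implementation, and the `tanimoto` method name is made up here), it is the size of the intersection of two users' item sets divided by the size of their union:

```ruby
require 'set'

# Tanimoto (Jaccard) coefficient for two sets of item ids:
# size of the intersection divided by size of the union.
def tanimoto(a, b)
  union = (a | b).size
  return 0.0 if union.zero?

  (a & b).size.to_f / union
end

alice = Set[1, 2, 3]
bob   = Set[2, 3, 4, 5]

puts tanimoto(alice, bob) # 2 shared out of 5 distinct items, prints 0.4
```

Because it ignores rating values entirely, this measure suits boolean data such as stars, where a preference either exists or does not.
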
  13. STORING THE LINKS

          class WsServer < EventMachine::WebSocket::Connection
            # token => list of channels subscribed under that token
            @@relations = {}

            class << self
              def notify(token, message)
                if channels = @@relations[token]
                  channels.each { |c| c.push message }
                end
              end

              def link(token, channel)
                @@relations[token] ||= []
                @@relations[token] << channel
              end

              def unlink(token, channel)
                @@relations[token].delete(channel)
                @@relations[token].empty? && @@relations.delete(token)
              end
            end
          end
  14. IDENTIFYING THE USER

          class WsServer
            def initialize(options)
              super(options)

              onmessage do |token|
                # `next` rather than `return`: this block runs after
                # initialize has returned, where `return` would raise
                next if @token
                token.chomp!
                @channel = EM::Channel.new
                @channel.subscribe do |message|
                  send(message)
                end
                self.class.link(token, @channel)
                @token = token
              end

              onclose do
                self.class.unlink(@token, @channel) if @token
              end
            end
          end
  15. DELIVERING MESSAGES

          EventMachine.run do
            Signal.trap("INT") { EventMachine.stop }
            Signal.trap("TERM") { EventMachine.stop }

            redis = EM::Hiredis.connect
            redis.psubscribe('messages:*')
            redis.on(:pmessage) do |key, channel, message|
              # The part of the channel name after "messages:" is the user token
              token = channel[/messages:(.*)/, 1]
              WsServer.notify(token, message)
            end

            EventMachine.start_server('0.0.0.0', 1666, WsServer, {})
          end
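
The routing on this slide hinges on pulling the token out of the Redis channel name. A tiny standalone sketch of that step, using the same `String#[]` call with a regexp and a capture index; the `token_from` wrapper and the sample channel names are hypothetical:

```ruby
# Extract the user token from a channel name such as "messages:abc123".
# String#[] with a regexp and a capture index returns that capture group,
# or nil when the pattern does not match.
def token_from(channel)
  channel[/messages:(.*)/, 1]
end

p token_from('messages:abc123') # => "abc123"
p token_from('other:abc123')    # => nil
```

The pattern is unanchored, so it matches `messages:` anywhere in the string. That is harmless here because the subscription pattern `messages:*` already restricts which channels arrive, though anchoring with `\A` would be a stricter choice.
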
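  16. RECOMMENDER

          def recommend user_id
            # ...
          end

          # Block until a user id is pushed onto the "recommendations" list
          while array = $redis.blpop("recommendations")
            id = array.last.to_i
            recommend(id)
            # Tell the WebSocket server that this user's recommendations are ready
            $redis.publish("messages:#{id}", 'iamdone')
            sleep 0.5
          end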
  17. http://gitfm.com

      [Chart by language: Ruby, JavaScript, Python, PHP, Java, Objective-C, C, C++, Shell, VimL, Perl, Clojure, CoffeeScript, C#, Erlang, Scala, Emacs Lisp, Haskell; value axis from 0 to 30,000 in steps of 7,500.]