Jaccard Index & Recommendable

Mike Danko
December 06, 2012

Overview of the Jaccard Index and the Recommendable Gem


Transcript

  1. Have you ever... • Allowed users to bookmark or like things? • Allowed users to block or hide things? • Allowed users to hate things? • Implemented those things yourself?
  2. Collaborative Filtering: Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering has two senses, a narrow one and a more general one.[1] In general, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.[1] Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data including sensing and monitoring data - such as in mineral exploration, environmental sensing over large areas or multiple sensors; financial data - such as financial service institutions that integrate many financial sources; or in electronic commerce and web applications where the focus is on user data, etc. The remainder of this discussion focuses on collaborative filtering for user data, although some of the methods and approaches may apply to the other major applications as well. Source: Wikipedia
  3. Collaborative Filtering: Making predictions about the interests of a user based on the collective preferences of the entire user base.
  4. Easy Mode: Jaccard • Measures the similarity (or, inverted, the distance) between sample sets. • Works on binary systems (likes, dislikes) • Examples: Recommender Systems (Netflix-like), Biological Classifications, any groups really.
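To make the set comparison on this slide concrete, here is a minimal sketch of the classic Jaccard index (intersection size divided by union size) and its complementary distance. The method names and sample data are hypothetical and do not come from the deck or from the Recommendable gem; returning 0.0 for two empty sets is a convention chosen only for this sketch.

```ruby
require "set"

# Jaccard index: |A ∩ B| / |A ∪ B|, a value between 0.0 and 1.0.
def jaccard_index(a, b)
  a = a.to_set
  b = b.to_set
  union = a | b
  # Assumption: treat two empty sets as having no measurable similarity.
  return 0.0 if union.empty?

  (a & b).size.to_f / union.size
end

# Jaccard distance is the complement of the index.
def jaccard_distance(a, b)
  1.0 - jaccard_index(a, b)
end

# Hypothetical "likes" for two users.
joe_likes = %w[alien brazil casablanca]
bob_likes = %w[brazil casablanca dune]

puts jaccard_index(joe_likes, bob_likes) # 2 shared out of 4 distinct = 0.5
```

Because the inputs are plain sets of liked items, this only needs binary data (liked or not), which is what makes it the "easy mode" option for a recommender.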
  5. Jawho? • Paul Jaccard • 19th-century Swiss Botanist & Plant Physiologist • Awesome time for Taxonomy & Classification • Pioneer in Biogeography (Biodiversity Distribution)
  6. 0.2? • Bounds are -1..1 • -1: Your polar opposite • 1: Your long-lost twin • Joe and Bob are more similar than dissimilar
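The transcript does not show how the 0.2 score or the -1..1 bounds are derived. One way to get that range is a signed variant of the Jaccard index that counts agreements (both like, or both dislike) minus disagreements, divided by everything either user has rated. The sketch below is an assumption for illustration, not a confirmed description of what Recommendable computes; `Ratings`, `signed_similarity`, and the sample ratings are hypothetical.

```ruby
require "set"

# Hypothetical container for one user's binary ratings.
Ratings = Struct.new(:likes, :dislikes)

# Signed similarity in -1.0..1.0:
# (agreements - disagreements) / (all items rated by either user)
def signed_similarity(a, b)
  agreements    = (a.likes & b.likes).size + (a.dislikes & b.dislikes).size
  disagreements = (a.likes & b.dislikes).size + (a.dislikes & b.likes).size
  rated         = a.likes | a.dislikes | b.likes | b.dislikes
  # Assumption: users with no ratings at all score a neutral 0.0.
  return 0.0 if rated.empty?

  (agreements - disagreements).to_f / rated.size
end

# Made-up ratings chosen so the score matches the slide's 0.2.
joe = Ratings.new(Set["a", "b", "c"], Set["d"])
bob = Ratings.new(Set["a", "b", "d"], Set["e"])

puts signed_similarity(joe, bob) # (2 agreements - 1 disagreement) / 5 items = 0.2
```

With this formula, identical ratings score 1.0 (the long-lost twin), exactly inverted ratings score -1.0 (the polar opposite), and any positive value means the two users agree more often than they disagree.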
  7. So, Requirements. • Recommendable < 2.0: ActiveRecord • Recommendable >= 2.0: most popular ORMs • Redis • A worker, such as Sidekiq or Resque