MF: Gaussian Vs Poisson

Trying to make sense of matrix factorisation for recommendation systems using Gaussian vs Poisson distributions.

_themessier

June 30, 2018

Transcript

  1. Daily Examples
    • Marks of students in a test
    • Height of people in India
    • Emotional response to a documentary
    • Motion of fluid particles
  2. Poisson Distribution
    A discrete distribution for rare events occurring within a given time or space.
    Conditions:
    1. Each occurrence is independent within the given timeframe
    2. The rate of occurrence is constant within the given timeframe
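For reference, the two conditions on this slide are the ones under which the standard Poisson probability mass function applies. The symbol λ (the average number of occurrences in the interval) is notation assumed here, not taken from the slide:

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

Both the mean and the variance of this distribution equal λ.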
  3. Daily Life Examples
    • In soccer, assuming 2.5 goals per match, what is the probability of `k` goals in the next match?
    • Given that lightning strikes 3 times during 2 hrs of rain, what is the probability of lightning striking `k` times during 2 hrs of rain?
    In most cases you are interested in the occurrences of these events; how many times they did not occur doesn't matter.
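The soccer question can be worked through numerically. This is a minimal sketch, assuming goals per match follow a Poisson distribution with an average rate of 2.5; the function name `poisson_pmf` is made up for illustration.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k occurrences given an average rate."""
    if k < 0:
        return 0.0
    return rate ** k * math.exp(-rate) / math.factorial(k)


# Soccer example from the slide: 2.5 goals per match on average.
goals_rate = 2.5
for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, goals_rate):.4f}")
```

The lightning example works the same way with `rate=3.0`, since the rate and the question both refer to the same 2-hour window.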
  4. Introduction to Matrix Factorization
    Users/Topics Viewed   Topic 1   Topic 2   Topic 3
    User 1                0         1         0
    User 2                1         1         0
    User 3                1         0         0
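To make the idea concrete, the user/topic table above can be approximated as a product of two small factor matrices. This is a rough sketch rather than the method used in the talk: it assumes a squared-error (Gaussian) objective fitted by plain gradient descent, and the rank, learning rate, regularization strength, and function names are all illustrative choices.

```python
import random

# Interaction matrix from the slide: rows are users, columns are topics.
R = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]


def factorize(R, rank=2, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Fit R ~ U * V^T by gradient descent on regularized squared error."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                pred = sum(U[u][f] * V[i][f] for f in range(rank))
                err = R[u][i] - pred
                for f in range(rank):
                    uf, vf = U[u][f], V[i][f]
                    U[u][f] += lr * (err * vf - reg * uf)
                    V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def predict(U, V, u, i):
    """Predicted score for user u and item i: dot product of their factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


U, V = factorize(R)
for u in range(len(R)):
    print([round(predict(U, V, u, i), 2) for i in range(len(R[0]))])
```

Each row of `U` is a user's latent vector and each row of `V` is a topic's latent vector. In a real recommender only the observed entries would be fitted, and the remaining predictions would be used for ranking.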
  5. Why Is Gaussian Famous for MF?
    • Gaussian distributions occur frequently in nature!
    • Can be used with both continuous and discrete observations
    • Ease of standardization/normalization
  6. Why Think About Poisson?
    • Most users either consume an item or not, so the data is mostly discrete
    • Most users' activity is sparse
    • Long-tail problem
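One way to see the difference on sparse count data is to compare the per-entry loss each assumption implies. This is a sketch under stated assumptions: the Gaussian loss assumes unit variance and drops constants, the `row` of interaction counts is invented for illustration, and a single shared prediction stands in for a real model's output.

```python
import math


def gaussian_loss(observed: float, predicted: float) -> float:
    """Squared error: Gaussian negative log-likelihood up to constants (unit variance assumed)."""
    return 0.5 * (observed - predicted) ** 2


def poisson_loss(observed: int, rate: float) -> float:
    """Poisson negative log-likelihood of a count, given a positive rate."""
    return rate - observed * math.log(rate) + math.lgamma(observed + 1)


# Hypothetical sparse interaction row: mostly zeros plus one heavy (long-tail) entry.
row = [0, 0, 0, 0, 1, 0, 0, 12]
rate = sum(row) / len(row)  # one shared prediction, for illustration only

for y in row:
    print(y, round(gaussian_loss(y, rate), 3), round(poisson_loss(y, rate), 3))
```

For a zero entry the Poisson loss reduces to the predicted rate itself, so the many zeros in a sparse matrix contribute a simple term. The squared-error loss treats zeros and non-zeros symmetrically around the prediction.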
  7. Does It Help Our Use Case?
    Recommendation for software packages
    Collected from GitHub:
    • Number of unique Java packages ~ 100K
    • Number of Java dependencies ~ 800K
  8. Some More Examples of Gaussian vs Poisson
    • Recommendations for all users vs active users of Facebook
    • Recommendations for all items on Amazon vs recommendations for phones
    • Adding new users vs improving for an existing user/item pair
  9. When in Doubt, Ring the Bell
    Follow nature, assume Gaussian and be happy*
    • Visualize the data set
    • Take future user/item interactions into consideration
    • Consider which group of users will profit you the most
    * Subject to risks