
MF: Gaussian Vs Poisson

Trying to make sense of matrix factorisation for recommendation systems using Gaussian vs Poisson distributions.



June 30, 2018


  1. Matrix Factorization (Gaussian vs Poisson), Sarah Masud, Red Hat

  2. Gaussian Distribution
    Also known as the normal distribution or the bell curve. In layman's terms: a symmetric, bell-shaped curve in which most values lie close to the mean.
  3. Daily Examples
    • Marks of students in a test
    • Height of people in India
    • Emotional response to a documentary
    • Motion of fluid particles
  4. Formula For Gaussian
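For reference, the standard density of a normal distribution with mean μ and standard deviation σ is f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The short Python sketch below simply evaluates that formula; the function name `gaussian_pdf` is an illustrative choice, not something taken from the deck.

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x.

    f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Evaluate the standard normal at a few points around the mean.
for x in (-2, -1, 0, 1, 2):
    print(x, round(gaussian_pdf(x), 4))
```

The printed values trace the bell shape: the density peaks at the mean and falls off symmetrically on both sides.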

  5. Poisson Distribution
    A discrete distribution for the number of (typically rare) events occurring within a fixed interval of time or space. Conditions:
    1. Each occurrence is independent of the others within the given timeframe
    2. The average rate of occurrence is constant within the given timeframe
  6. Daily Life Examples
    • In soccer, assuming an average of 2.5 goals per match, what is the probability of `k` goals in the next match?
    • Given that lightning strikes on average 3 times during 2 hours of rain, what is the probability of lightning striking `k` times during 2 hours of rain?
    In most such cases you are interested only in the occurrences of these events; it doesn't matter how many times they did not occur.
  7. Daily Life Examples (courtesy Wikipedia)

  8. Formula For Poisson
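The standard Poisson probability mass function with average rate λ is P(X = k) = λ^k e^(-λ) / k!. As a minimal sketch, the Python below evaluates it and applies it to the two examples in item 6 (λ = 2.5 goals per match, λ = 3 lightning strikes per 2 hours of rain). `poisson_pmf` is a hypothetical helper name, and computing in log space is just a design choice to avoid overflow for large k.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean rate lam.

    P(X = k) = lam^k * exp(-lam) / k!
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0
    # log(lam^k * e^-lam / k!) computed with lgamma(k + 1) = log(k!)
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_p)

# Soccer: on average 2.5 goals per match.
for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, 2.5):.3f}")

# Lightning: on average 3 strikes per 2 hours of rain.
print(f"P(no strikes in 2 hours) = {poisson_pmf(0, 3.0):.3f}")
```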

  9. Introduction to Matrix Factorization
    Users/Topics Viewed   Topic 1   Topic 2   Topic 3
    User 1                0         1         0
    User 2                1         1         0
    User 3                1         0         0
  10. Introduction to Matrix Factorization

  11. Rating Matrix
    Rating Matrix = approximation of (User Matrix * Item Matrix)
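To make the idea concrete, here is a minimal sketch of a Gaussian-style factorization of the 3x3 user/topic matrix from item 9, assuming NumPy, rank 2 and a small L2 penalty (all illustrative choices, as is the helper name `als_factorize`). It uses alternating least squares: minimizing squared error corresponds to maximizing a Gaussian likelihood, and every zero is treated as an observed "not viewed" value.

```python
import numpy as np

# User x topic "viewed" matrix from item 9 (rows: users, columns: topics).
R = np.array([
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
], dtype=float)

def als_factorize(R, rank=2, reg=0.1, n_iters=100, seed=0):
    """Factor R ~= U @ V.T by alternating (ridge) least squares."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    penalty = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix V and solve a ridge regression for all user vectors at once.
        U = np.linalg.solve(V.T @ V + penalty, V.T @ R.T).T
        # Fix U and solve a ridge regression for all item vectors at once.
        V = np.linalg.solve(U.T @ U + penalty, U.T @ R).T
    return U, V

U, V = als_factorize(R)
R_hat = U @ V.T  # approximate rating matrix = user matrix * item matrix
print(np.round(R_hat, 2))
```

Because this toy matrix has rank 2, the rank-2 reconstruction lands close to the original entries; the remaining gap comes from the L2 penalty shrinking the factors.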

  12. Why Is Gaussian Popular For MF?
    • Gaussian-distributed quantities occur frequently in nature!
    • It can be used for continuous observations and, as an approximation, for discrete ones
    • Data are easy to standardize/normalize
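One way to see why the Gaussian is such a convenient default: if each rating r_ui is modeled as Gaussian noise around the dot product of a user vector p_u and an item vector q_i (notation chosen here for illustration), the negative log-likelihood reduces to a squared-error loss plus a constant, which is the loss plain MF minimizes. Here N is the number of observed entries and σ² is assumed fixed.

```latex
r_{ui} \sim \mathcal{N}\!\left(\mathbf{p}_u^{\top}\mathbf{q}_i,\ \sigma^2\right)
\quad\Longrightarrow\quad
-\log p(R \mid P, Q)
= \frac{1}{2\sigma^2}\sum_{(u,i)}\left(r_{ui} - \mathbf{p}_u^{\top}\mathbf{q}_i\right)^2
+ \frac{N}{2}\log\!\left(2\pi\sigma^2\right)
```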
  13. Why Think About Poisson?
    • Most users either consume an item or not, so the data is mostly discrete
    • Most users' activity is sparse
    • Long-tail problem: a small number of popular items account for most interactions
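As a contrast, a Poisson model treats each entry as a count. A minimal sketch, assuming NumPy and non-negative count data: the Python below fits Y ≈ W H with Lee and Seung's multiplicative updates for the generalized KL divergence, which matches the Poisson negative log-likelihood up to constants. This simple maximum-likelihood method is swapped in for illustration; it is not taken from the deck or its referenced repository, and the toy matrix `Y` and helper names are hypothetical. Zero entries contribute only their predicted rate to the loss, which is one reason Poisson models suit sparse data.

```python
import numpy as np

def poisson_mf(Y, rank=2, n_iters=200, seed=0, eps=1e-10):
    """Fit Y ~= W @ H (all entries >= 0) under a Poisson likelihood."""
    rng = np.random.default_rng(seed)
    n_users, n_items = Y.shape
    W = rng.uniform(0.1, 1.0, size=(n_users, rank))
    H = rng.uniform(0.1, 1.0, size=(rank, n_items))
    for _ in range(n_iters):
        # Update W: rescale by (observed / predicted) counts, weighted by H.
        ratio = Y / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1) + eps)
        # Update H the same way, weighted by the new W.
        ratio = Y / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
    return W, H

def poisson_nll(Y, mu, eps=1e-10):
    """Poisson negative log-likelihood without the constant log(y!) term."""
    return float(np.sum(mu - Y * np.log(mu + eps)))

# Hypothetical sparse counts: how often each user used each item.
Y = np.array([
    [3, 0, 0, 1],
    [0, 2, 0, 0],
    [4, 0, 1, 0],
], dtype=float)

W, H = poisson_mf(Y)
print("Poisson NLL after fitting:", round(poisson_nll(Y, W @ H), 3))
```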
  14. Does it help our use case?
    Recommendation for software packages, data collected from GitHub:
    • Number of unique Java packages: ~100K
    • Number of Java dependencies: ~800K
  15. Does it help our use case?

  16. Some More Examples of Gaussian vs Poisson
    • Recommendations for all users vs. only active users of Facebook
    • Recommendations for all items on Amazon vs. recommendations for phones only
    • Adding new users vs. improving recommendations for existing user/item pairs
  17. When in Doubt, Ring the Bell
    Follow nature, assume Gaussian and be happy*
    • Visualize the data set
    • Take future user/item interactions into consideration
    • Consider which segment of users will profit you the most
    * Subject to risks
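The "Visualize the data set" advice can be paired with a quick numeric check before committing to a distribution. A minimal sketch, assuming NumPy and a 1-D array of interaction counts per user or per item (the helper name `dispersion_report` and the simulated data are hypothetical): compare the mean with the variance, since they coincide for a Poisson variable, and look at the share of zeros and the gap between mean and median.

```python
import numpy as np

def dispersion_report(counts):
    """Quick numeric summary of per-user (or per-item) interaction counts.

    For a Poisson variable the variance equals the mean, so a
    variance-to-mean ratio well above 1 (overdispersion) suggests a plain
    Poisson is too narrow, while many zeros and a median far below the
    mean signal skew that a symmetric Gaussian will not capture.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)  # sample variance
    return {
        "mean": mean,
        "variance": var,
        "variance_to_mean": var / mean if mean > 0 else float("nan"),
        "fraction_zero": float(np.mean(counts == 0)),
        "median": float(np.median(counts)),
    }

rng = np.random.default_rng(0)
# Simulated data only: Poisson counts vs. a heavier-tailed negative binomial.
poisson_like = rng.poisson(lam=2.0, size=10_000)
long_tail = rng.negative_binomial(n=1, p=0.2, size=10_000)
for name, data in [("poisson_like", poisson_like), ("long_tail", long_tail)]:
    report = dispersion_report(data)
    print(name, {key: round(value, 2) for key, value in report.items()})
```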
  18. References: https://github.com/fabric8-analytics/f8a-hpf-insights