Slide 1

Slide 1 text

Matrix Factorization (Gaussian vs Poisson)
Sarah Masud, Red Hat

Slide 2

Slide 2 text

Gaussian Distribution
Also known as the normal distribution or the bell curve.
In layman's terms: a curve in which most values lie at or around the mean.

Slide 3

Slide 3 text

Daily Examples
● Marks of students in a test
● Height of people in India
● Emotional response to a documentary
● Motion of fluid particles

Slide 4

Slide 4 text

Formula For Gaussian
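
For reference, the standard probability density of a Gaussian is given below. The symbols are the conventional ones and are an assumption here, not taken from the slide: \mu is the mean and \sigma is the standard deviation.

```latex
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```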

Slide 5

Slide 5 text

Poisson Distribution
A discrete distribution for the number of rare events occurring within a given interval of time or space.
Conditions:
1. Each occurrence is independent of the others within the given timeframe
2. The rate of occurrence is constant within the given timeframe

Slide 6

Slide 6 text

Daily Life Examples
● In soccer, assuming an average of 2.5 goals per match, what is the probability of `k` goals in the next match?
● Given that lightning strikes on average 3 times during 2 hrs of rain, what is the probability of lightning striking `k` times during the next 2 hrs of rain?
In most cases you are interested in how often these events occur; how many times they did not occur doesn't matter.
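
The soccer question can be worked through numerically. The sketch below is a minimal illustration using only the Python standard library; the function name `poisson_pmf` and the variable names are hypothetical, and the only figure taken from the slide is the rate of 2.5 goals per match.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

goal_rate = 2.5  # average goals per match, as assumed on the slide

# Probability of seeing 0, 1, ..., 5 goals in the next match.
goal_probs = [poisson_pmf(k, goal_rate) for k in range(6)]
for k, p in enumerate(goal_probs):
    print(f"P({k} goals) = {p:.3f}")
```

The lightning example works the same way with a rate of 3 strikes per 2 hrs of rain.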

Slide 7

Slide 7 text

Daily Life Examples (courtesy: wiki)

Slide 8

Slide 8 text

Formula For Poisson
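
For reference, the standard Poisson probability mass function is given below. The symbols are the conventional ones and are an assumption here, not taken from the slide: \lambda is the expected number of events in the interval and k is the observed count.

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```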

Slide 9

Slide 9 text

Introduction to Matrix Factorization

Users/Topics Viewed   Topic 1   Topic 2   Topic 3
User 1                0         1         0
User 2                1         1         0
User 3                1         0         0

Slide 10

Slide 10 text

Introduction to Matrix Factorization

Slide 11

Slide 11 text

Rating Matrix
Rating Matrix ≈ User Matrix × Item Matrix (the product approximates the rating matrix)
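
To make the approximation concrete, here is a minimal sketch that factorizes the small users-by-topics matrix from the earlier slide by minimizing squared error with stochastic gradient descent. Squared error is the loss that corresponds to assuming Gaussian noise. The function names, rank, learning rate, and step count are illustrative assumptions, not values from the talk.

```python
import random

# Users x topics "viewed" matrix from the introduction slide.
ratings = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]

def squared_error(ratings, users, items):
    """Sum of squared differences between ratings and users * items^T."""
    total = 0.0
    for u, row in enumerate(ratings):
        for i, r in enumerate(row):
            pred = sum(uf * vf for uf, vf in zip(users[u], items[i]))
            total += (r - pred) ** 2
    return total

def factorize(ratings, rank=2, steps=2000, lr=0.05, seed=0):
    """Return (user factors, item factors), one row of length `rank` each."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    users = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                pred = sum(uf * vf for uf, vf in zip(users[u], items[i]))
                err = ratings[u][i] - pred
                for f in range(rank):
                    u_f, i_f = users[u][f], items[i][f]
                    # Gradient step on the squared error for this cell.
                    users[u][f] += lr * err * i_f
                    items[i][f] += lr * err * u_f
    return users, items

users, items = factorize(ratings)
print("squared error after training:", round(squared_error(ratings, users, items), 4))
```

A real recommender would typically add regularization and train only on observed cells; both are left out here to keep the sketch short.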

Slide 12

Slide 12 text

Why Is Gaussian Famous For MF
● Gaussian distributions occur frequently in nature!
● Can be applied to both continuous and discrete observations
● Ease of standardization/normalization

Slide 13

Slide 13 text

Why think about Poisson?
● Most users either consume an item or not, so the data is mostly discrete
● Most users' activity is sparse
● Long tail problem
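
As a rough illustration of what swapping the Gaussian assumption for a Poisson one looks like in practice, the sketch below fits non-negative factors to a small count matrix by reducing the Poisson negative log-likelihood with multiplicative updates (the classic KL-divergence updates for non-negative matrix factorization). This is a swapped-in, simplified technique chosen for brevity; it is not taken from the talk or from the referenced repository, and all names and settings are hypothetical.

```python
import math
import random

EPS = 1e-9  # keeps predicted rates strictly positive

# Same users x topics matrix as in the introduction, read as counts.
counts = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]

def predict(users, items):
    """Expected counts: users (n x rank) times items (rank x m)."""
    rank, m = len(items), len(items[0])
    return [[sum(row[f] * items[f][j] for f in range(rank)) + EPS
             for j in range(m)] for row in users]

def poisson_nll(counts, rates):
    """Negative Poisson log-likelihood without the constant log(k!) term."""
    return sum(rate - c * math.log(rate)
               for c_row, r_row in zip(counts, rates)
               for c, rate in zip(c_row, r_row))

def fit_poisson_mf(counts, rank=2, steps=200, seed=0):
    rng = random.Random(seed)
    n, m = len(counts), len(counts[0])
    users = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    items = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(steps):
        # Update user factors with item factors held fixed.
        rates = predict(users, items)
        for u in range(n):
            for f in range(rank):
                num = sum(items[f][j] * counts[u][j] / rates[u][j] for j in range(m))
                users[u][f] *= num / (sum(items[f]) + EPS)
        # Update item factors with user factors held fixed.
        rates = predict(users, items)
        for f in range(rank):
            col_sum = sum(users[u][f] for u in range(n)) + EPS
            for j in range(m):
                num = sum(users[u][f] * counts[u][j] / rates[u][j] for u in range(n))
                items[f][j] *= num / col_sum
    return users, items

user_factors, item_factors = fit_poisson_mf(counts)
print("Poisson NLL:", round(poisson_nll(counts, predict(user_factors, item_factors)), 4))
```

Note that cells with a zero count only contribute their predicted rate to the loss, which is part of why Poisson models suit sparse data.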

Slide 14

Slide 14 text

Does it help our use case?
Recommendation for software packages
Collected from GitHub:
Number of unique Java packages ~ 100K
Number of Java dependencies ~ 800K

Slide 15

Slide 15 text

Does it help our use case

Slide 16

Slide 16 text

Some more examples of Gaussian vs Poisson
● Recommendations for all users vs active users of Facebook
● Recommendations for all items on Amazon vs recommendations for phones
● Adding new users vs improving for an existing user/item pair

Slide 17

Slide 17 text

When In Doubt - Ring the Bell
Follow nature, assume Gaussian and be happy*
● Visualize the data set
● Take future user/item interactions into consideration
● Ask which segment of users will profit you the most
* Subject to risks

Slide 18

Slide 18 text

References: https://github.com/fabric8-analytics/f8a-hpf-insights