Slide 1

Slide 1 text

LINE DevDay 2019 > Timeline Post Recommender System > Jihong Lee > Machine Learning Engineer, Data Science Dev, LINE Plus

Slide 2

Slide 2 text

Agenda > What Is a Recommender System? > Problems, Solutions, and Lessons Learned • User Embeddings • Feedback Loop • Importance of Evaluation • Model Architecture • Increasing Post Pool

Slide 3

Slide 3 text

What Is a Recommender System?

Slide 6

Slide 6 text

LINE Timeline

Slide 7

Slide 7 text

Background Knowledge > Collaborative filtering and content-based filtering > The two major approaches to recommender systems

Slide 8

Slide 8 text

Collaborative Filtering > User-movie rating matrix (Movies A-D) > Ratings: 5 - Excellent, 4 - Good, 3 - Average, 2 - Not Bad, 1 - Bad

Slide 11

Slide 11 text

Collaborative Filtering > User-movie rating matrix (Movies A-D) > Ratings: 5 - Excellent, 4 - Good, 3 - Average, 2 - Not Bad, 1 - Bad > Recommend!

Slide 12

Slide 12 text

Problem Description > Given: A user and a post with context > GOAL: Predict the probability that the user will click the post

Slide 13

Slide 13 text

Model Training Pipeline > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment

Slide 14

Slide 14 text

Model Training Pipeline > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment

Slide 16

Slide 16 text

Raw Data > User View History Log (User, Post, Author, Time) + User Click History Log (User, Post, Author, Time) → Join!

Slide 17

Slide 17 text

Raw Data → Labeled Data (User, Post, Author, Time, Label) > Labels: 1 - Positive (Clicked), 0 - Negative (Not Clicked)
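
The join that produces the labeled data can be sketched with pandas; the column names and values here are illustrative, not the actual log schema:

```python
import pandas as pd

# Hypothetical view and click logs; the schema mirrors the slides
# (User, Post, Author, Time), with made-up values.
views = pd.DataFrame({
    "user":   ["u1", "u1", "u2"],
    "post":   ["p1", "p2", "p1"],
    "author": ["a1", "a2", "a1"],
    "time":   [100, 101, 102],
})
clicks = pd.DataFrame({"user": ["u1", "u2"], "post": ["p2", "p1"]})

# Views that also appear in the click log get label 1, the rest label 0.
labeled = views.merge(clicks.assign(label=1), on=["user", "post"], how="left")
labeled["label"] = labeled["label"].fillna(0).astype(int)
print(labeled["label"].tolist())  # → [0, 1, 1]
```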

Slide 18

Slide 18 text

Model Training Pipeline > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment

Slide 21

Slide 21 text

Model Training Pipeline > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment > Simple!

Slide 23

Slide 23 text

Recommender System 101: Embeddings > Categorical feature matrix (Item 1: [3 0 2 0 …], Item 2: [2 1 2 4 …], Item 3: [3 4 1 0 …]) > Dimensionality too high (# of columns); categories carry no meaning

Slide 24

Slide 24 text

Recommender System 101: Embeddings > Categorical feature matrix (dimensionality too high, categories carry no meaning) → embedding representation (Item 1: [ -0.242 0.218 0.848 -0.887 … ], Item 2: [ … ], Item 3: [ … ]) > Reduced dimensionality; dimensions carry category meaning

Slide 25

Slide 25 text

Problems & Solutions > User Embeddings

Slide 26

Slide 26 text

Creating Embeddings > User × Post click matrix (0/1 entries) → Post Embedding Vectors (Post 1: [ -0.242 0.218 0.848 -0.887 … ], Post 2: [ 0.581 -0.859 0.006 -0.598 … ], …) and User Embedding Vectors ([ 0.324 -0.192 -0.453 0.004 … ], …)

Slide 27

Slide 27 text

Creating Embeddings > User × Post click matrix (0/1 entries) → Post Embedding Vectors (Post 1: [ -0.242 0.218 0.848 -0.887 … ], …) and User Embedding Vectors > Extremely Sparse!! Density < 0.0001%

Slide 28

Slide 28 text

Mitigate the Issue > Post × Post co-occurrence matrix (e.g. Post 1 row: [3 0 2 0]) → Post Embedding Vectors (Post 1: [ -0.242 0.218 0.848 -0.887 … ], …); User Embedding Vectors = linear combinations of user history > Worked Much Better!
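
The user-embedding construction above can be sketched in a few lines of NumPy; the embedding values and the choice of a plain average as the linear combination are illustrative:

```python
import numpy as np

# Illustrative 4-dimensional post embedding vectors (one row per post).
post_embeddings = np.array([
    [-0.242,  0.218,  0.848, -0.887],  # Post 1
    [ 0.581, -0.859,  0.006, -0.598],  # Post 2
    [ 0.344, -0.834, -0.651,  0.524],  # Post 3
])

# A user who interacted with Posts 1 and 3: the user embedding is a
# linear combination of those posts' vectors (a plain average here;
# weights could instead reflect recency or interaction strength).
history = [0, 2]
user_vector = post_embeddings[history].mean(axis=0)
print(user_vector.shape)  # → (4,)
```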

Slide 29

Slide 29 text

Lesson Learned > Different algorithms are suitable for different types of data > Must understand the nature of your data!

Slide 30

Slide 30 text

Problems & Solutions > Feedback Loop

Slide 31

Slide 31 text

Feedback Loop > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions → … (the pipeline feeds itself)

Slide 32

Slide 32 text

Feedback Loop > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions → … (the pipeline feeds itself) > Problematic!

Slide 33

Slide 33 text

Feedback Loop > Model 0 → recommendations for user → user interacts → train Model 1 > Model 1's training data is biased!! Caused by Model 0

Slide 34

Slide 34 text

Feedback Loop > User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment (looping back into itself) > Caused by: Friends' Shares Not Used in Training!

Slide 35

Slide 35 text

Problems & Solutions > Importance of Evaluation

Slide 36

Slide 36 text

AUROC (Area Under ROC Curve) > ROC curve: True Positive Rate vs. False Positive Rate > True Positive Rate: proportion of positive labels correctly classified > False Positive Rate: proportion of negative labels incorrectly classified as positive > Curves shown: trained classifier vs. random classifier
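
A minimal NumPy sketch of AUROC via the rank-sum (Mann-Whitney U) formulation; this is an illustrative implementation, not the one used in the actual pipeline:

```python
import numpy as np

def auroc(labels, scores):
    # AUROC equals the probability that a randomly chosen positive
    # example outranks a randomly chosen negative one (ties ignored).
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Perfect separation gives 1.0; a random classifier hovers around 0.5.
print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0
```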

Slide 38

Slide 38 text

Global AUROC > One ranked list of all (User, Post) pairs sorted by Pr(Click) (e.g. A 0.992, B 0.981, A 0.977, …), each with its label (1/0) > Calculate AUROC over the entire list

Slide 40

Slide 40 text

Average AUROC per User > One ranked list per user (that user's posts sorted by Pr(Click)), each with its label (1/0) > Calculate AUROC for each user, then average
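
The per-user averaging can be sketched as follows; the data and the rank-based AUROC helper are illustrative:

```python
import numpy as np
import pandas as pd

def auroc(labels, scores):
    # Rank-based (Mann-Whitney) AUROC; no tie handling, illustrative only.
    order = np.argsort(scores)
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical predictions: each user has their own ranked list of posts.
df = pd.DataFrame({
    "user":  ["A", "A", "A", "B", "B", "B"],
    "score": [0.9, 0.8, 0.2, 0.7, 0.6, 0.1],
    "label": [1, 0, 0, 0, 1, 0],
})

# AUROC within each user's list, then a simple average across users.
per_user = [auroc(g["label"].to_numpy(), g["score"].to_numpy())
            for _, g in df.groupby("user")]
print(np.mean(per_user))  # → 0.75
```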

Slide 41

Slide 41 text

Problems & Solutions > Model Architecture

Slide 42

Slide 42 text

Problems With a Sole Ranking Model > Computationally expensive > Concentrated post distribution

Slide 46

Slide 46 text

Pareto Optimality > A state of maximum efficiency in the allocation of resources > Plot of Preference Criterion A vs. Preference Criterion B, showing the Pareto frontier > Any point on the Pareto frontier is Pareto optimal

Slide 48

Slide 48 text

Current State > Plot of Personalization vs. KPI > Points A and B lie inside the Pareto frontier, converging at the KPI's local maxima

Slide 49

Slide 49 text

In Search of the Pareto Frontier > Plot of Personalization vs. KPI > Need a change in model architecture

Slide 50

Slide 50 text

Candidate Generation and Ranking > User Interactions → Raw Data → Preprocess Data → Candidate Generation → Preprocess Data → Ranking → Model Deployment

Slide 52

Slide 52 text

Candidate Generation Training > Post × Post co-occurrence matrix (e.g. Post 1 row: [3 0 2 0]) → train post embeddings → Post Embedding Vectors (Post 1: [ -0.242 0.218 0.848 -0.887 … ], …)
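
A hypothetical sketch of this training step: build the co-occurrence matrix from interaction histories, then factorize it into dense post embeddings (plain truncated SVD here; the talk does not specify the exact factorization):

```python
import numpy as np

# Hypothetical interaction histories (lists of post indices per user).
histories = [
    [0, 2],     # user 1 interacted with Posts 1 and 3
    [0, 1, 2],  # user 2 interacted with Posts 1, 2, 3
    [1, 3],     # user 3 interacted with Posts 2 and 4
]

n_posts = 4
cooc = np.zeros((n_posts, n_posts))
for history in histories:
    for i in history:
        for j in history:
            if i != j:
                cooc[i, j] += 1  # count posts appearing in the same history

# Factorize the (symmetric) co-occurrence matrix into dense post
# embeddings; truncated SVD is used here purely for illustration.
k = 2
U, s, _ = np.linalg.svd(cooc)
post_embeddings = U[:, :k] * np.sqrt(s[:k])
print(post_embeddings.shape)  # → (4, 2)
```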

Slide 53

Slide 53 text

Candidate Generation Inference > user ID → interaction history → user vector = linear combination of the history's post embedding vectors → nearest neighbor search → Candidates (Post 1, Post 2, Post 3, Post 4, …)
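
The inference path can be sketched with brute-force cosine similarity; the embedding values are illustrative, and a production system would use an approximate nearest-neighbor index rather than this exhaustive search:

```python
import numpy as np

# Illustrative post embeddings and one user's interaction history.
post_embeddings = np.array([
    [-0.242,  0.218,  0.848, -0.887],  # Post 1
    [ 0.581, -0.859,  0.006, -0.598],  # Post 2
    [ 0.344, -0.834, -0.651,  0.524],  # Post 3
    [ 0.255,  0.963, -0.127, -0.959],  # Post 4
])
history = [0, 3]  # the user interacted with Posts 1 and 4

# User vector: linear combination (here a plain average) of history vectors.
user_vector = post_embeddings[history].mean(axis=0)

# Brute-force nearest-neighbor search by cosine similarity; production
# systems would use an approximate index (e.g. Annoy, Faiss) instead.
denom = np.linalg.norm(post_embeddings, axis=1) * np.linalg.norm(user_vector)
scores = post_embeddings @ user_vector / denom
candidates = np.argsort(-scores)  # post indices, most similar first
print(candidates[:2])  # the user's own history ranks on top here
```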

Slide 54

Slide 54 text

Inference Pipeline > User ID Query → Candidate Generation → Ranking → Recommendations

Slide 55

Slide 55 text

Problems & Solutions > Increasing Post Pool

Slide 56

Slide 56 text

Aligning Embeddings Trained in Batches > Candidate generation runs per batch: t=0 covers Posts 1-4, t=1 covers Posts 3-6, t=2 covers Posts 5-8 (each with its own Post Embedding Vectors) > Each batch has a different post pool

Slide 57

Slide 57 text

Aligning Embeddings Trained in Batches > Candidate generation runs per batch: t=0 covers Posts 1-4, t=1 covers Posts 3-6, t=2 covers Posts 5-8 (each with its own Post Embedding Vectors) > We want everything > Need to align!

Slide 58

Slide 58 text

Orthogonal Procrustes Problem > Given matrices A and B of the same size, find an orthogonal matrix W such that ∥BW − A∥²_F is minimized
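
The problem has a closed-form solution: take the SVD BᵀA = UΣVᵀ and set W = UVᵀ. A NumPy sketch with synthetic data, where B is constructed as a rotated copy of A so a perfect aligning W is known to exist:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Build B as a rotated copy of A via a random orthogonal matrix Q.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B = A @ Q.T

# Closed-form solution: SVD of BᵀA = UΣVᵀ, then W = UVᵀ.
U, _, Vt = np.linalg.svd(B.T @ A)
W = U @ Vt

print(np.allclose(B @ W, A))  # → True
```

SciPy also ships this solver as `scipy.linalg.orthogonal_procrustes` (mind its argument convention: it minimizes ∥AR − B∥_F).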

Slide 59

Slide 59 text

Orthogonal Procrustes Problem A B

Slide 60

Slide 60 text

Orthogonal Procrustes Problem > Point sets A and B with a 1-to-1 correspondence of points > Find the rotation and/or reflection matrix that maps B onto A

Slide 63

Slide 63 text

Orthogonal Procrustes Problem > Aligned A & B! > How can we use this method?

Slide 64

Slide 64 text

Orthogonal Procrustes Problem > A = t=0 Post Embedding Vectors (Posts 1-7), B = t=1 Post Embedding Vectors (Posts 3-9) > 1-to-1 correspondence via the posts shared by both batches (Posts 3-7) > Find the orthogonal matrix W that maps B into A

Slide 66

Slide 66 text

Orthogonal Procrustes Problem > Using the 1-to-1 correspondence of shared posts between t=0 (Posts 1-7) and t=1 (Posts 3-9), find the orthogonal matrix W that maps B into A > Transform the whole t=1 embedding matrix using W

Slide 67

Slide 67 text

Orthogonal Procrustes Problem > After transforming, t=0 and t=1′ are in the same vector space > Add only the embeddings new in t=1′ (Posts 8, 9) to t=0 > Aligned Embeddings!
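
The full alignment procedure (solve W on the shared posts, rotate the entire t=1 table, then append only the new posts) can be sketched end to end; post IDs and vectors here are synthetic, with t=1 simulated as a rotation of t=0 so the alignment is exact:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4

# Hypothetical t=0 embedding table, keyed by post ID.
emb_t0 = {pid: rng.standard_normal(dim) for pid in [1, 2, 3, 4, 5, 6, 7]}

# Simulate t=1: the shared posts (3-7) live in a rotated space, and
# two new posts (8, 9) have appeared.
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
emb_t1 = {pid: emb_t0.get(pid, rng.standard_normal(dim)) @ Q
          for pid in [3, 4, 5, 6, 7, 8, 9]}

# 1) Solve the Procrustes problem on the posts present in both batches.
shared = sorted(emb_t0.keys() & emb_t1.keys())
A = np.stack([emb_t0[p] for p in shared])
B = np.stack([emb_t1[p] for p in shared])
U, _, Vt = np.linalg.svd(B.T @ A)
W = U @ Vt

# 2) Transform the t=1 vectors with W; 3) append only the new posts.
aligned = dict(emb_t0)
for pid, vec in emb_t1.items():
    if pid not in aligned:
        aligned[pid] = vec @ W

print(sorted(aligned))  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```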

Slide 68

Slide 68 text

Summary > Understand the nature of your data > Dual importance of quantitative and qualitative evaluation > “Perfection is not attainable. But if we chase perfection, we can catch excellence.” - Vince Lombardi > Model architecture is essential > Understand your evaluation metric

Slide 72

Slide 72 text

Thank You