Timeline Post Recommender System

Jihong Lee
LINE Plus Data Science Dev Machine Learning Engineer
https://linedevday.linecorp.com/jp/2019/sessions/D1-2

LINE DevDay 2019

November 20, 2019
Transcript

  1. 1.

    2019 DevDay: Timeline Post Recommender System
    Jihong Lee, LINE Plus Data Science Dev Machine Learning Engineer
  2. 2.

    Agenda
    > What Is a Recommender System?
    > Problems, Solutions, and Lessons Learned
      • User Embeddings
      • Feedback Loop
      • Importance of Evaluation
      • Model Architecture
      • Increasing Post Pool
  3. 8.

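    Collaborative Filtering
    A user × movie ratings matrix (columns: Movie A, Movie B, Movie C, Movie D) in which some cells are unrated; the ratings shown are 5 4 5 2 3 4 4 2 1 1.
    Ratings: 5 - Excellent, 4 - Good, 3 - Average, 2 - Not Bad, 1 - Bad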
  6. 11.

    Collaborative Filtering
    The same ratings matrix (Movie A to Movie D, ratings 1 - Bad to 5 - Excellent), now annotated "Recommend!": a movie the user has not rated is recommended based on the ratings of users with similar tastes.
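A minimal sketch of user-based collaborative filtering. It assumes a small hypothetical ratings table (the slide's exact matrix layout is not recoverable from the transcript) and uses mean-centered cosine similarity, one common choice rather than necessarily the one in the talk. A missing rating is predicted from how similar users deviate from their own average rating.

```python
import math

# Hypothetical toy ratings (1 = Bad ... 5 = Excellent); a missing key means "not rated".
RATINGS = {
    "u1": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "u2": {"Movie A": 5, "Movie B": 5, "Movie C": 1, "Movie D": 5},
    "u3": {"Movie A": 1, "Movie B": 1, "Movie C": 5, "Movie D": 1},
}


def mean(r):
    return sum(r.values()) / len(r)


def similarity(a, b):
    """Mean-centered cosine similarity over the movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    ma, mb = mean(a), mean(b)
    dot = sum((a[m] - ma) * (b[m] - mb) for m in common)
    na = math.sqrt(sum((a[m] - ma) ** 2 for m in common))
    nb = math.sqrt(sum((b[m] - mb) ** 2 for m in common))
    return dot / (na * nb) if na and nb else 0.0


def predict(user, movie, ratings):
    """User's own mean plus the similarity-weighted deviations of other users."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or movie not in r:
            continue
        sim = similarity(ratings[user], r)
        num += sim * (r[movie] - mean(r))
        den += abs(sim)
    base = mean(ratings[user])
    return base + num / den if den else base


def recommend(user, ratings):
    """Return the unrated movie with the highest predicted rating."""
    candidates = {m for r in ratings.values() for m in r} - set(ratings[user])
    return max(candidates, key=lambda m: predict(user, m, ratings))


print(round(predict("u1", "Movie D", RATINGS), 2))  # 4.33
print(recommend("u1", RATINGS))  # Movie D
```

Here u1 agrees with u2 and disagrees with u3, and both signals point the same way on Movie D, so Movie D is predicted above u1's average and gets recommended.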
  7. 12.

    Problem Description
    > Given: a user and a post, with context
    > Goal: predict the probability that the user will click the post
  8. 16.

    Raw Data
    > User View History Log: User, Post, Author, Time
    > User Click History Log: User, Post, Author, Time
    Join!
  9. 17.

    Raw Data → Labeled Data
    Columns: User, Post, Author, Time, Label (example labels: 1, 0, 1, 1, 1, …)
    > 1 - Positive Label (Clicked)
    > 0 - Negative Label (Not Clicked)
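A minimal sketch of the labeling join, assuming hypothetical in-memory log tuples `(user, post, author, time)` and treating any click on a viewed (user, post) pair as a positive. A production pipeline would likely also match clicks to views by time window, which this sketch ignores.

```python
# Hypothetical log records: (user, post, author, time). Values are illustrative only.
VIEW_LOG = [
    ("user1", "post1", "authorA", "2019-11-01T10:00"),
    ("user1", "post2", "authorB", "2019-11-01T10:01"),
    ("user2", "post1", "authorA", "2019-11-01T11:30"),
    ("user2", "post3", "authorC", "2019-11-01T11:31"),
]
CLICK_LOG = [
    ("user1", "post1", "authorA", "2019-11-01T10:02"),
    ("user2", "post3", "authorC", "2019-11-01T11:35"),
]


def label_views(view_log, click_log):
    """Left-join views with clicks on (user, post): 1 = clicked, 0 = viewed but not clicked."""
    clicked = {(user, post) for user, post, _author, _time in click_log}
    return [
        (user, post, author, time, 1 if (user, post) in clicked else 0)
        for user, post, author, time in view_log
    ]


for row in label_views(VIEW_LOG, CLICK_LOG):
    print(row)
```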
  10. 23.

    Recommender System 101: Embeddings
    Raw features: Item 1 [ 3 0 2 0 … ], Item 2 [ 2 1 2 4 … ], Item 3 [ 3 4 1 0 … ]
    > Dimensionality too high! (# of columns)
    > No meaning of categories
  11. 24.

    Recommender System 101: Embeddings
    Raw features: Item 1 [ 3 0 2 0 … ], Item 2 [ 2 1 2 4 … ], Item 3 [ 3 4 1 0 … ]
    > Dimensionality too high! (# of columns)
    > No meaning of categories
    Embedding representation: Item 1 [ -0.242 0.218 0.848 -0.887 … ], …
    > Reduced dimensionality
    > Has category meanings
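A small illustration of the two problems listed above, under hypothetical sizes. A one-hot vector needs one column per item, and any two distinct one-hot vectors are orthogonal, so they say nothing about how related two items are. A dense embedding is far narrower and can express graded similarity. The 3-d vectors below are hand-picked for illustration, not learned.

```python
import math

NUM_ITEMS = 10_000  # hypothetical catalogue size: one one-hot column per item


def one_hot(index, size=NUM_ITEMS):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-picked 3-d embeddings; items 0 and 1 are meant to be "similar", item 2 is not.
EMBEDDINGS = {
    0: [0.9, 0.1, 0.0],
    1: [0.8, 0.2, 0.1],
    2: [-0.7, 0.6, 0.2],
}

# Distinct one-hot vectors are always orthogonal: no notion of similarity.
print(cosine(one_hot(0), one_hot(1)), cosine(one_hot(0), one_hot(2)))  # 0.0 0.0
# Dense embeddings: 3 columns instead of 10,000, and similarity is graded.
print(cosine(EMBEDDINGS[0], EMBEDDINGS[1]) > cosine(EMBEDDINGS[0], EMBEDDINGS[2]))  # True
```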
  12. 26.

    Creating Embeddings
    User × post click matrix (columns: Post 1, Post 2, Post 3, Post 4; entries 0 1 1 1 0 1 0 1 1 …), from which both sets of vectors are learned:
    > Post embedding vectors: Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], Post 3 [ 0.344 -0.834 -0.651 0.524 … ], Post 4 [ 0.255 0.963 -0.127 -0.959 … ], …
    > User embedding vectors: [ 0.324 -0.192 -0.453 0.004 … ], [ -0.187 0.394 -0.225 0.022 … ], [ 0.177 0.718 -0.239 -0.422 … ], [ -0.725 0.090 -0.353 0.228 … ], …
  13. 27.

    Creating Embeddings
    The same user × post click matrix and embedding vectors, annotated: Extremely Sparse!! Density < 0.0001%
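The density figure is observed interactions divided by the number of user × post cells. A quick check of the arithmetic under hypothetical platform sizes (1M users, 10M posts, about 5 clicks per user; these numbers are made up for illustration):

```python
def density(num_observed, num_users, num_posts):
    """Share of user x post cells that hold an observed interaction."""
    return num_observed / (num_users * num_posts)


# Hypothetical scale: 1M users, 10M posts, about 5 clicks per user.
users, posts = 1_000_000, 10_000_000
d = density(5 * users, users, posts)
print(f"{d:.5%}")  # 0.00005%
```

Each user row then holds about 5 nonzero entries out of 10 million columns, which leaves very little signal for learning a separate vector per user.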
  14. 28.

    Mitigate the Issue
    Replace the user × post matrix with a post × post matrix (columns: Post 1, Post 2, Post 3, Post 4):
    > Post 1 [ 3 0 2 0 ]
    > Post 2 [ 1 1 3 2 ]
    > Post 3 [ 1 0 2 1 ]
    > Post 4 [ 0 1 4 1 ]
    Post embedding vectors are learned from it: Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], Post 3 [ 0.344 -0.834 -0.651 0.524 … ], Post 4 [ 0.255 0.963 -0.127 -0.959 … ], …
    Each user embedding vector is a linear combination of the user's history (the embeddings of posts the user interacted with).
    Worked Much Better!
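A minimal sketch of building a post × post co-occurrence matrix from click histories. It assumes "co-occurrence" means "clicked by the same user", which gives a symmetric matrix; the slide's example matrix is not symmetric, so the talk's exact definition may differ. The histories are hypothetical. Each row pools evidence from every user who clicked that post, which is why this matrix is much denser than the user × post matrix.

```python
from itertools import combinations

# Hypothetical click histories, one list of post IDs per user.
HISTORIES = [
    ["post1", "post3"],
    ["post2", "post3", "post4"],
    ["post1", "post3"],
    ["post4"],
]
POSTS = ["post1", "post2", "post3", "post4"]


def cooccurrence(histories, posts):
    """m[i][j] = users who clicked both post i and post j; m[i][i] = users who clicked post i."""
    index = {p: i for i, p in enumerate(posts)}
    m = [[0] * len(posts) for _ in posts]
    for history in histories:
        clicked = sorted({index[p] for p in history if p in index})
        for i in clicked:
            m[i][i] += 1
        for i, j in combinations(clicked, 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


for post, row in zip(POSTS, cooccurrence(HISTORIES, POSTS)):
    print(post, row)
```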
  15. 29.

    Lesson Learned
    > Different algorithms are suitable for different types of data
    > Must understand the nature of your data!
  16. 31.

    Feedback Loop
    Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions → Raw Data → …
  17. 32.

    Feedback Loop
    Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions → Raw Data → …
    Problematic!
  18. 33.

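    Feedback Loop
    Model 0 is trained → it produces recommendations for the user → the user interacts with them → those interactions train Model 1
    Caused by Model 0: Training Data Is Biased!! Model 1 only learns from interactions with the posts that Model 0 chose to show.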
  19. 34.

    Feedback Loop
    Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions → Raw Data → …
    Caused by: Friends' Shares Not Used in Training!
  20. 36.

    AUROC (Area Under ROC Curve)
    Plot: True Positive Rate (y-axis) vs. False Positive Rate (x-axis), for a trained classifier and a random classifier
    > True Positive Rate: proportion of positive labels correctly classified as positive
    > False Positive Rate: proportion of negative labels incorrectly classified as positive
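A minimal sketch of computing the ROC curve and its area from scores and 0/1 labels, using the two rates defined above. It sweeps the decision threshold from the highest score down, records one (FPR, TPR) point per distinct score, and integrates with the trapezoidal rule. The scores are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by sweeping the decision threshold from high to low."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / num_neg, tp / num_pos))
    return points


def auroc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


print(auroc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
print(auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5, no ranking ability (random-classifier level)
```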
  22. 38.

    Global AUROC
    Pool the predictions of all users, sorted by Pr(Click), and calculate a single AUROC:
    User  Pr(Click)  Label
    A     0.992      1
    B     0.981      0
    A     0.977      1
    C     0.964      1
    B     0.951      0
    A     0.924      0
    C     0.918      1
    A     0.908      1
    C     0.905      0
    C     0.900      1
    B     0.898      1
    B     0.891      0
    …     …          …
  24. 40.

    Average AUROC per User
    Calculate AUROC for each user, then average. Each user has their own table of (Post, Pr(Click), Label):
    > A 0.992 1, B 0.981 0, C 0.977 1, …
    > A 0.990 1, C 0.985 0, D 0.972 1, …
    > B 0.991 1, D 0.970 0, C 0.967 1, …
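A sketch contrasting the two metrics on the visible rows of the Global AUROC table, treating its letter column as the user ID (an assumption about the flattened slide). A global AUROC also credits the model for separating users who click often from users who rarely click, while the per-user average only measures how well each user's own posts are ordered. Skipping users with only clicks or only non-clicks is a choice made here, since AUROC is undefined for them.

```python
from collections import defaultdict


def auroc(scores, labels):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Visible rows of the Global AUROC table: (user, Pr(Click), label).
ROWS = [
    ("A", 0.992, 1), ("B", 0.981, 0), ("A", 0.977, 1), ("C", 0.964, 1),
    ("B", 0.951, 0), ("A", 0.924, 0), ("C", 0.918, 1), ("A", 0.908, 1),
    ("C", 0.905, 0), ("C", 0.900, 1), ("B", 0.898, 1), ("B", 0.891, 0),
]


def global_auroc(rows):
    return auroc([s for _, s, _ in rows], [y for _, _, y in rows])


def average_auroc_per_user(rows):
    by_user = defaultdict(lambda: ([], []))
    for user, score, label in rows:
        by_user[user][0].append(score)
        by_user[user][1].append(label)
    # Skip users whose labels are all 1 or all 0 (AUROC undefined).
    per_user = [auroc(s, y) for s, y in by_user.values() if 0 < sum(y) < len(y)]
    return sum(per_user) / len(per_user)


print(round(global_auroc(ROWS), 3))            # 0.543  (= 19/35)
print(round(average_auroc_per_user(ROWS), 3))  # 0.556  (= 5/9)
```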
  25. 46.

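    Pareto Optimality
    Plot axes: Preference Criterion A vs. Preference Criterion B, with the Pareto frontier drawn
    > Pareto optimality: a state of maximum efficiency in the allocation of resources, where neither criterion can be improved without making the other worse
    > Any point on the Pareto frontier is Pareto optimal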
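A minimal sketch of extracting the Pareto frontier from a set of candidate operating points, assuming both preference criteria are "higher is better". The points are hypothetical (criterion A, criterion B) pairs; a point is kept only if no other point is at least as good on both criteria and strictly better on one.

```python
def dominates(p, q):
    """True if p is at least as good as q on both criteria and strictly better on one."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def pareto_frontier(points):
    """Points not dominated by any other point, sorted by criterion A."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


# Hypothetical operating points: (criterion A, criterion B).
POINTS = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.5), (0.3, 0.9), (0.6, 0.1)]
print(pareto_frontier(POINTS))  # [(0.3, 0.9), (0.7, 0.6), (0.9, 0.2)]
```

Every frontier point is Pareto optimal; choosing among them is a trade-off between the two criteria, not a question of one being strictly better.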
  27. 50.

    Candidate Generation and Ranking
    The single model is split into two stages, each fed by its own Preprocess Data step from Raw Data:
    Candidate Generation → Ranking → Model Deployment → User Interactions → Raw Data → …
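A minimal sketch of how the two stages compose at serving time. `toy_generator`, `toy_ranker`, and the Pr(Click) values are hypothetical stand-ins for the real models: a cheap generator narrows the post pool, then a ranker orders only the survivors.

```python
def recommend(user, generate_candidates, score, num_candidates=100, top_k=10):
    """Two-stage recommendation: generate a small candidate set, then rank it by score."""
    candidates = generate_candidates(user, num_candidates)
    ranked = sorted(candidates, key=lambda post: score(user, post), reverse=True)
    return ranked[:top_k]


# Hypothetical stand-ins for the two models.
def toy_generator(user, n):
    return ["post1", "post2", "post3", "post4"][:n]


TOY_PCLICK = {"post1": 0.10, "post2": 0.70, "post3": 0.40, "post4": 0.05}


def toy_ranker(user, post):
    return TOY_PCLICK[post]


print(recommend("user1", toy_generator, toy_ranker, num_candidates=3, top_k=2))  # ['post2', 'post3']
```

Splitting the stages this way lets the ranker afford a heavier model, because it only scores `num_candidates` posts per user instead of the whole pool.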
  29. 52.

    Candidate Generation Training
    Train post embeddings from the post × post co-occurrence matrix (columns: Post 1 … Post 4):
    > Post 1 [ 3 0 2 0 ], Post 2 [ 1 1 3 2 ], Post 3 [ 1 0 2 1 ], Post 4 [ 0 1 4 1 ]
    → Post embedding vectors: Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], Post 3 [ 0.344 -0.834 -0.651 0.524 … ], Post 4 [ 0.255 0.963 -0.127 -0.959 … ], …
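A minimal sketch of turning the slide's co-occurrence matrix into post embeddings with a truncated SVD of the log-scaled counts. This is one common way to factorize a co-occurrence matrix and is an assumption here: the talk does not say which training method was used. Because the slide's matrix is not symmetric, the sketch simply factorizes it as given and uses the scaled left singular vectors as post vectors.

```python
import numpy as np

# Co-occurrence matrix from the slide (rows and columns: Post 1 .. Post 4).
COOC = np.array([
    [3, 0, 2, 0],
    [1, 1, 3, 2],
    [1, 0, 2, 1],
    [0, 1, 4, 1],
], dtype=float)


def post_embeddings(cooc, dim):
    """Rank-`dim` truncated SVD of log(1 + counts); returns one row per post."""
    u, s, _vt = np.linalg.svd(np.log1p(cooc))
    return u[:, :dim] * np.sqrt(s[:dim])


emb = post_embeddings(COOC, dim=2)
print(emb.shape)  # (4, 2)
```

The log scaling keeps a few very popular posts from dominating the factorization; `dim` trades detail for compactness.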
  30. 53.

    Candidate Generation Inference
    User ID → interaction history → linear combination of the history's item embedding vectors → user vector (e.g. [ -0.242 0.218 0.848 -0.887 … ])
    → nearest neighbor search over the item embedding vectors (Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], …) → candidates (Post 1, Post 2, Post 3, Post 4, …)
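A minimal sketch of the inference path above. It assumes the linear combination is a plain average of the history's embeddings (one possible choice of weights), uses cosine similarity, and does a brute-force scan; at production scale an approximate nearest-neighbor index would replace the linear scan. The embeddings are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_candidates(history, post_embeddings, k):
    """User vector = mean of history embeddings; return the k nearest unseen posts."""
    known = [post_embeddings[p] for p in history if p in post_embeddings]
    if not known:
        return []
    dim = len(known[0])
    user_vec = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    seen = set(history)
    scored = sorted(
        ((cosine(user_vec, vec), post) for post, vec in post_embeddings.items() if post not in seen),
        reverse=True,
    )
    return [post for _score, post in scored[:k]]


# Hypothetical 2-d post embeddings.
POST_EMBEDDINGS = {
    "post1": [1.0, 0.0],
    "post2": [0.9, 0.1],
    "post3": [0.0, 1.0],
    "post4": [0.8, 0.3],
    "post5": [-1.0, 0.0],
}
print(generate_candidates(["post1"], POST_EMBEDDINGS, k=2))  # ['post2', 'post4']
```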
  31. 56.

    Aligning Embeddings Trained in Batches
    Candidate generation post embedding vectors from three training batches:
    > t=0: Post 1, Post 2, Post 3, Post 4, …
    > t=1: Post 3, Post 4, Post 5, Post 6, …
    > t=2: Post 5, Post 6, Post 7, Post 8, …
    Each batch has a different post pool.
  32. 57.

    Aligning Embeddings Trained in Batches
    The same three batches (t=0, t=1, t=2), each with its own post pool.
    We want everything. Need to align! Each batch is trained independently, so its vectors live in their own coordinate system and cannot be compared directly across batches.
  33. 58.

    Orthogonal Procrustes Problem
    Given matrices A and B of the same size, find an orthogonal matrix W such that $\lVert BW - A \rVert_F^2$ is minimized.
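The minimizer has a standard closed form via the singular value decomposition. A short derivation using the same A, B, and W; U, Σ, V, and Z are new symbols introduced here for the SVD factors:

```latex
\begin{aligned}
\lVert BW - A \rVert_F^2
  &= \lVert B \rVert_F^2 + \lVert A \rVert_F^2 - 2\,\operatorname{tr}\!\left(W^\top B^\top A\right)
  && \text{since } W W^\top = I \\
B^\top A &= U \Sigma V^\top
  && \text{(singular value decomposition)} \\
\operatorname{tr}\!\left(W^\top U \Sigma V^\top\right) &= \operatorname{tr}(Z \Sigma) \le \operatorname{tr}(\Sigma)
  && Z = V^\top W^\top U \text{ is orthogonal} \\
W^\star &= U V^\top
  && \text{(equality at } Z = I\text{)}
\end{aligned}
```

Only the cross term depends on W, so minimizing the distance is the same as maximizing the trace, and the bound is attained by W = UV^T.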
  34. 60.

    Orthogonal Procrustes Problem
    Given point sets A and B with a 1-to-1 correspondence of points, find the rotation and/or reflection matrix that maps B into A.
  37. 64.

    Orthogonal Procrustes Problem
    > t=0 post embedding vectors: Post 1, Post 2, …, Post 7, …
    > t=1 post embedding vectors: Post 3, Post 4, …, Post 9, …
    Using the 1-to-1 correspondence of points (A: the t=0 rows of the posts present in both batches; B: the same posts' t=1 rows), find the orthogonal matrix W that maps B into A.
  39. 66.

    Orthogonal Procrustes Problem
    With W found from the 1-to-1 correspondence of the posts shared by t=0 and t=1 (W maps B into A), transform the whole t=1 embedding matrix using W.
  40. 67.

    Orthogonal Procrustes Problem
    > t=0: Post 1, Post 2, …, Post 7, …
    > t=1′ (the t=1 matrix transformed by W): Post 3, Post 4, …, Post 9, …
    t=0 and t=1′ are in the same vector space, so add the embeddings that exist only in t=1′ to t=0.
    Aligned Embeddings!
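A minimal NumPy sketch of the alignment step, assuming each batch is a dict from a hypothetical post ID to a 1-D vector of the same dimension. W is fitted on the shared posts with the SVD closed form; posts that exist only in the new batch are mapped into the old space and added, and shared posts keep their old vectors. SciPy also ships `scipy.linalg.orthogonal_procrustes` for the fitting step. On real batches the fit is least-squares rather than exact.

```python
import numpy as np


def procrustes(a, b):
    """Orthogonal W minimizing ||b @ W - a||_F: W = U V^T where b.T @ a = U S V^T."""
    u, _s, vt = np.linalg.svd(b.T @ a)
    return u @ vt


def align_and_merge(old, new):
    """Align batch `new` to batch `old` on their shared posts, then add posts only in `new`."""
    shared = sorted(old.keys() & new.keys())
    a = np.stack([old[p] for p in shared])  # A: t=0 rows of the shared posts
    b = np.stack([new[p] for p in shared])  # B: t=1 rows of the same posts
    w = procrustes(a, b)
    merged = dict(old)  # keep the t=0 vectors for posts present in both batches
    for post, vec in new.items():
        if post not in merged:
            merged[post] = vec @ w  # t=1' vector, expressed in the t=0 space
    return merged


# Usage: simulate a t=1 batch that is a random rotation/reflection of the "true" vectors.
rng = np.random.default_rng(0)
true = {f"post{i}": rng.normal(size=4) for i in range(1, 10)}
q, _r = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix
t0 = {p: true[p] for p in [f"post{i}" for i in range(1, 8)]}
t1 = {p: true[p] @ q for p in [f"post{i}" for i in range(3, 10)]}
aligned = align_and_merge(t0, t1)
print(np.allclose(aligned["post9"], true["post9"]))  # True
```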
  41. 68.

    Summary
    > Understand the nature of your data
    > Dual importance of quantitative and qualitative evaluation
    > "Perfection is not attainable. But if we chase perfection, we can catch excellence." - Vince Lombardi
    > Model architecture is essential
    > Understanding your evaluation metric