Timeline Post Recommender System

Jihong Lee
LINE Plus / Data Science Dev / Machine Learning Engineer
https://linedevday.linecorp.com/jp/2019/sessions/D1-2

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay Timeline Post Recommender System > Jihong Lee >

    LINE Plus Data Science Dev Machine Learning Engineer
  2. Agenda > What Is a Recommender System? > Problems, Solutions,

    and Lessons Learned • User Embeddings • Feedback Loop • Importance of Evaluation • Model Architecture • Increasing Post Pool
  3. What Is a Recommender System?

  4. What Is a Recommender System?

  5. What Is a Recommender System?

  6. LINE Timeline

  7. Background Knowledge > Collaborative filtering and content-based filtering > Two

    big approaches to Recommender Systems
  8. Collaborative Filtering Movie A Movie B Movie C Movie D

    5 4 5 2 3 4 4 2 1 1 Ratings 5 - Excellent 4 - Good 3 - Average 2 - Not Bad 1 - Bad
  9. Collaborative Filtering Movie A Movie B Movie C Movie D

    5 4 5 2 3 4 4 2 1 1 Ratings 5 - Excellent 4 - Good 3 - Average 2 - Not Bad 1 - Bad
  10. Collaborative Filtering Movie A Movie B Movie C Movie D

    5 4 5 2 3 4 4 2 1 1 Ratings 5 - Excellent 4 - Good 3 - Average 2 - Not Bad 1 - Bad
  11. Collaborative Filtering Movie A Movie B Movie C Movie D

    5 4 5 2 3 4 4 2 1 1 Ratings 5 - Excellent 4 - Good 3 - Average 2 - Not Bad 1 - Bad Recommend!
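
The collaborative-filtering idea illustrated above (predict a user's missing rating from similar users, then recommend) can be sketched in a few lines. This is a minimal illustration with made-up ratings and a plain similarity-weighted average, not the talk's implementation:

```python
# Minimal user-based collaborative filtering sketch (illustrative data, not from the talk).
import numpy as np
import pandas as pd

# Rows = users, columns = movies; NaN = not yet rated.
ratings = pd.DataFrame(
    [[5, 4, np.nan, 2],
     [5, np.nan, 3, 4],
     [4, 2, 1, 1]],
    index=["user1", "user2", "user3"],
    columns=["Movie A", "Movie B", "Movie C", "Movie D"],
)

def predict(user, movie):
    """Predict a missing rating as a similarity-weighted average of other users' ratings."""
    others = ratings.drop(index=user).dropna(subset=[movie])
    target = ratings.loc[user]
    weights, values = [], []
    for other, row in others.iterrows():
        common = target.notna() & ratings.loc[other].notna()
        if common.sum() == 0:
            continue
        a, b = target[common].to_numpy(), ratings.loc[other, common].to_numpy()
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        weights.append(sim)
        values.append(row[movie])
    return np.average(values, weights=weights) if weights else np.nan

print(predict("user1", "Movie C"))  # recommend the movie if the predicted rating is high
```
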
  12. Problem Description > Given: A user and a post with

    context > GOAL: Predict the probability that the user will click the post
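
As a minimal illustration of the stated goal only (this is not the production model, and the feature names below are invented placeholders), a binary classifier over user/post/context features can output Pr(click) for ranking:

```python
# Sketch of the ranking objective: predict Pr(click | user, post, context).
# Features and data here are illustrative placeholders, not the real production features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [user_activity_score, post_popularity, author_is_friend, hour_of_day]
X = np.array([[0.9, 0.7, 1, 20],
              [0.2, 0.1, 0, 3],
              [0.5, 0.8, 1, 12],
              [0.1, 0.9, 0, 8]])
y = np.array([1, 0, 1, 0])  # 1 = clicked, 0 = not clicked

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # estimated click probabilities used to rank posts
```
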
  13. Model Training Pipeline > User Interactions Model Deployment Model Training Raw Data Preprocess Data
  14. Model Training Pipeline User Interactions Model Deployment Model Training Raw

    Data Preprocess Data
  15. Model Training Pipeline User Interactions Model Deployment Model Training Raw

    Data Preprocess Data
  16. Raw Data > User View History Log (User, Post, Author, Time) + User Click History Log (User, Post, Author, Time) > Join!
  17. Raw Data → Labeled Data > Columns: User, Post, Author, Time, Label (example labels: 1, 0, 1, 1) > 1 - Positive Label (Clicked), 0 - Negative Label (Not Clicked)
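
A minimal sketch of slides 16–17, assuming the two logs can be loaded as pandas DataFrames (column names follow the slides; the sample rows are invented):

```python
# Sketch: join the view-history log with the click-history log to label impressions.
import pandas as pd

views = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3"],
    "post": ["p1", "p2", "p1", "p3"],
    "author": ["a1", "a2", "a1", "a3"],
    "time": pd.to_datetime(["2019-11-01 10:00", "2019-11-01 10:01",
                            "2019-11-01 11:00", "2019-11-01 12:00"]),
})
clicks = pd.DataFrame({
    "user": ["u1", "u3"],
    "post": ["p1", "p3"],
})

# Left join: every viewed (user, post) pair gets label 1 if a matching click exists, else 0.
labeled = views.merge(clicks.assign(label=1), on=["user", "post"], how="left")
labeled["label"] = labeled["label"].fillna(0).astype(int)
print(labeled)
```
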
  18. Model Training Pipeline User Interactions Model Deployment Model Training Raw

    Data Preprocess Data
  19. Model Training Pipeline User Interactions Model Deployment Model Training Raw

    Data Preprocess Data
  20. Model Training Pipeline User Interactions Model Deployment Model Training Raw

    Data Preprocess Data
  21. Model Training Pipeline Simple! User Interactions Model Deployment Model Training

    Raw Data Preprocess Data
  22. Model Training Pipeline User Interactions Model Deployment Model Training Raw

    Data Preprocess Data
  23. Recommender System 101: Embeddings > Raw feature matrix: Item 1 [ 3 0 2 0 … ], Item 2 [ 2 1 2 4 … ], Item 3 [ 3 4 1 0 … ] > Problems: dimensionality too high (# of columns), no meaning of categories
  24. Recommender System 101: Embeddings > The same raw feature matrix, with each item mapped to an embedding representation, e.g. Item 1 [ -0.242 0.218 0.848 -0.887 … ] > Embeddings have reduced dimensionality and the dimensions carry category meanings
  25. Problems & Solutions > User Embeddings

  26. Creating Embeddings Post 1 Post 2 Post 3 Post 4

    0 1 1 1 0 1 0 1 1 Post Embedding Vectors Post 1 [ -0.242 0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] User Embedding Vectors [ 0.324 -0.192 -0.453 0.004 … ] [ -0.187 0.394 -0.225 0.022 … ] [ 0.177 0.718 -0.239 -0.422 … ] [ -0.725 0.090 -0.353 0.228 … ] … [ … ]
  27. Creating Embeddings Post 1 Post 2 Post 3 Post 4

    0 1 1 1 0 1 0 1 1 Post Embedding Vectors Post 1 [ -0.242 0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] User Embedding Vectors [ 0.324 -0.192 -0.453 0.004 … ] [ -0.187 0.394 -0.225 0.022 … ] [ 0.177 0.718 -0.239 -0.422 … ] [ -0.725 0.090 -0.353 0.228 … ] … [ … ] Extremely Sparse!! Density < 0.0001%
  28. Mitigate the Issue > Post-post co-occurrence matrix (rows and columns = Post 1–4): Post 1 [ 3 0 2 0 ], Post 2 [ 1 1 3 2 ], Post 3 [ 1 0 2 1 ], Post 4 [ 0 1 4 1 ] > Post Embedding Vectors: Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], Post 3 [ 0.344 -0.834 -0.651 0.524 … ], Post 4 [ 0.255 0.963 -0.127 -0.959 … ], … > User Embedding Vectors: each user vector is a linear combination of the embeddings of the posts in that user's history > Worked Much Better!
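
A minimal sketch of the mitigation on slide 28. The factorization method is an assumption (the talk does not name one; truncated SVD is used here) and the history indices are invented; only the post-post matrix values come from the slide:

```python
# Factorize the denser post-post matrix to get post embeddings, then build each user
# vector as a linear combination (here a plain average) of the posts in the user's history.
import numpy as np

# Post-post co-occurrence matrix (rows/columns = Post 1..4), values as on the slide.
cooc = np.array([[3, 0, 2, 0],
                 [1, 1, 3, 2],
                 [1, 0, 2, 1],
                 [0, 1, 4, 1]], dtype=float)

# Truncated SVD gives low-dimensional post embeddings (one row per post).
U, S, _ = np.linalg.svd(cooc)
dim = 2
post_emb = U[:, :dim] * S[:dim]

# A user who interacted with Post 1 and Post 3 (0-based indices, illustrative history):
history = [0, 2]
user_vec = post_emb[history].mean(axis=0)  # linear combination of the user's history
print(user_vec)
```
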
  29. Lesson Learned > Different algorithms are suitable for different types

    of data > Must understand the nature of your data!
  30. Problems & Solutions > Feedback Loop

  31. Feedback Loop User Interactions Model Deployment Model Training Raw Data

    Preprocess Data User Interactions Model Deployment Model Training Raw Data Preprocess Data
  32. Feedback Loop User Interactions Model Deployment Model Training Raw Data

    Preprocess Data User Interactions Model Deployment Model Training Raw Data Preprocess Data Problematic!
  33. Feedback Loop > Model 0 → recommendations for user → user interacts → training data → train Model 1 > Training Data Is Biased!! (it was caused by Model 0)
  34. User Interactions Model Deployment Model Training Preprocess Data Feedback Loop

    User Interactions Model Deployment Model Training Raw Data Preprocess Data Raw Data Caused by Friends’ Shares Not Used in Training!
  35. Problems & Solutions > Importance of Evaluation

  36. AUROC (Area Under ROC Curve) > Plot: True Positive Rate vs. False Positive Rate > True Positive Rate: Proportion of Correctly Classified Positive Labels > False Positive Rate: Proportion of Negative Labels Incorrectly Classified as Positive > Curves: Trained Classifier, Random Classifier
  37. AUROC (Area Under ROC Curve) > Plot: True Positive Rate vs. False Positive Rate > True Positive Rate: Proportion of Correctly Classified Positive Labels > False Positive Rate: Proportion of Negative Labels Incorrectly Classified as Positive > Curves: Trained Classifier, Random Classifier
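
For reference, the standard formulas behind these definitions (not spelled out on the slide): TPR = TP / (TP + FN) and FPR = FP / (FP + TN), where TP, FP, TN, FN count true/false positives/negatives at a given score threshold; the ROC curve traces (FPR, TPR) as that threshold varies, and AUROC is the area under that curve.
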
  38. Global AUROC User Post Pr(Click) A 0.992 B 0.981 A

    0.977 C 0.964 B 0.951 A 0.924 C 0.918 A 0.908 C 0.905 C 0.900 B 0.898 B 0.891 … … … Label 1 0 1 1 0 0 1 1 0 1 1 0 … Calculate AUROC
  39. Global AUROC User Post Pr(Click) A 0.992 B 0.981 A

    0.977 C 0.964 B 0.951 A 0.924 C 0.918 A 0.908 C 0.905 C 0.900 B 0.898 B 0.891 … … … Label 1 0 1 1 0 0 1 1 0 1 1 0 … Calculate AUROC
  40. Average AUROC per User Label 1 0 1 … User

    Post Pr(Click) A 0.992 B 0.981 C 0.977 … … … User Post Pr(Click) A 0.990 C 0.985 D 0.972 … … … User Post Pr(Click) B 0.991 D 0.970 C 0.967 … … … Label 1 0 1 … Label 1 0 1 … Calculate AUROC for each user then average
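
A minimal sketch of the two evaluation schemes with scikit-learn. The scores and labels below are invented; note that per-user AUROC requires each user to have at least one positive and one negative label:

```python
# Contrast global AUROC with average AUROC per user.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({
    "user":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "score": [0.992, 0.977, 0.924, 0.981, 0.951, 0.898, 0.964, 0.918, 0.905],
    "label": [1, 1, 0, 0, 0, 1, 1, 1, 0],
})

# Global AUROC: one ranking over all (user, post) rows.
global_auroc = roc_auc_score(df["label"], df["score"])

# Average AUROC per user: evaluate each user's own ranking, then average.
per_user = [roc_auc_score(g["label"], g["score"]) for _, g in df.groupby("user")]
print(global_auroc, sum(per_user) / len(per_user))
```
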
  41. Problems & Solutions > Model Architecture

  42. Problems With a Sole Ranking Model Computationally expensive Concentrated post

    distribution
  43. Problems With a Sole Ranking Model Concentrated post distribution Computationally

    expensive
  44. Problems With a Sole Ranking Model Computationally expensive Concentrated post

    distribution
  45. Problems With a Sole Ranking Model Computationally expensive Concentrated post

    distribution
  46. Pareto Optimality > A state of maximum efficiency in the allocation of resources > Plot axes: Preference Criterion A vs. Preference Criterion B > Any point on the Pareto frontier is Pareto optimal
  47. Pareto Optimality > A state of maximum efficiency in the allocation of resources > Plot axes: Preference Criterion A vs. Preference Criterion B > Any point on the Pareto frontier is Pareto optimal
  48. Current State > Plot axes: Personalization vs. KPI > Inside the Pareto frontier, converging at the KPI's local maxima (points A, B)
  49. In Search of Pareto Frontier > Plot axes: Personalization vs. KPI > Need a change in model architecture
  50. Candidate Generation and Ranking Raw Data User Interactions Model Deployment

    Candidate Generation Ranking Preprocess Data Preprocess Data
  51. Candidate Generation and Ranking Raw Data User Interactions Model Deployment

    Candidate Generation Ranking Preprocess Data Preprocess Data
  52. Candidate Generation > Candidate Generation Training: co-occurrence matrix (rows and columns = Post 1–4): Post 1 [ 3 0 2 0 ], Post 2 [ 1 1 3 2 ], Post 3 [ 1 0 2 1 ], Post 4 [ 0 1 4 1 ] → train post embeddings → Post Embedding Vectors: Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], Post 3 [ 0.344 -0.834 -0.651 0.524 … ], Post 4 [ 0.255 0.963 -0.127 -0.959 … ], …
  53. Candidate Generation > Candidate Generation Inference: user ID → interaction history → Item Embedding Vectors: Post 1 [ -0.242 0.218 0.848 -0.887 … ], Post 2 [ 0.581 -0.859 0.006 -0.598 … ], Post 3 [ 0.344 -0.834 -0.651 0.524 … ], Post 4 [ 0.255 0.963 -0.127 -0.959 … ], … → linear combination of history → user vector, e.g. [ -0.242 0.218 0.848 -0.887 … ] → nearest neighbor search → Candidates: Post 1, Post 2, Post 3, Post 4, …
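
A minimal sketch of the inference step on slide 53, using brute-force nearest-neighbor search over invented embeddings (a production system would typically use an approximate nearest-neighbor index, which the talk does not specify):

```python
# Build the user vector from interaction history, then retrieve the nearest posts
# as candidates for the ranking model.
import numpy as np

rng = np.random.default_rng(0)
post_emb = rng.normal(size=(1000, 64))             # illustrative post embedding matrix
post_emb /= np.linalg.norm(post_emb, axis=1, keepdims=True)

history = [3, 17, 256]                             # posts the user interacted with (invented)
user_vec = post_emb[history].mean(axis=0)          # linear combination of the history
user_vec /= np.linalg.norm(user_vec)

# Cosine similarity = dot product on unit vectors; take the top-k posts as candidates.
scores = post_emb @ user_vec
k = 10
candidates = np.argsort(-scores)[:k]
print(candidates)
```
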
  54. Inference Pipeline > User ID query → Candidate Generation → Ranking → Recommendations

  55. Problems & Solutions > Increasing Post Pool

  56. Aligning Embeddings Trained in Batches Candidate Generation Post Embedding Vectors

    Post 1 [ -0.242 0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] Candidate Generation Post Embedding Vectors Post 3 [ -0.242 0.218 0.848 -0.887 … ] Post 4 [ 0.581 -0.859 0.006 -0.598 … ] Post 5 [ 0.344 -0.834 -0.651 0.524 … ] Post 6 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] Candidate Generation Post Embedding Vectors Post 5 [ -0.242 0.218 0.848 -0.887 … ] Post 6 [ 0.581 -0.859 0.006 -0.598 … ] Post 7 [ 0.344 -0.834 -0.651 0.524 … ] Post 8 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] t=0 t=1 t=2 Each batch has a different post pool
  57. Aligning Embeddings Trained in Batches Candidate Generation Post Embedding Vectors

    Post 1 [ -0.242 0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] Candidate Generation Post Embedding Vectors Post 3 [ -0.242 0.218 0.848 -0.887 … ] Post 4 [ 0.581 -0.859 0.006 -0.598 … ] Post 5 [ 0.344 -0.834 -0.651 0.524 … ] Post 6 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] Candidate Generation Post Embedding Vectors Post 5 [ -0.242 0.218 0.848 -0.887 … ] Post 6 [ 0.581 -0.859 0.006 -0.598 … ] Post 7 [ 0.344 -0.834 -0.651 0.524 … ] Post 8 [ 0.255 0.963 -0.127 -0.959 … ] … [ … ] t=0 t=1 t=2 We want everything Need to align!
  58. Orthogonal Procrustes Problem > Given matrices A and B of the same size, find an orthogonal matrix W such that ‖BW − A‖²_F is minimized
  59. Orthogonal Procrustes Problem A B

  60. Orthogonal Procrustes Problem A B 1-to-1 correspondence of points find

    rotation and/or reflection matrix that maps B into A
  61. Orthogonal Procrustes Problem 1-to-1 correspondence of points A B find

    rotation and/or reflection matrix that maps B into A
  62. Orthogonal Procrustes Problem 1-to-1 correspondence of points A B find

    rotation and/or reflection matrix that maps B into A
  63. Orthogonal Procrustes Problem Aligned A & B! How can we

    use this method? B A
  64. Orthogonal Procrustes Problem Post Embedding Vectors Post 1 [ -0.242

    0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] Post 5 [ 0.239 -0.646 0.002 -0.702 … ] Post 6 [ -0.612 -0.408 0.052 0.064 … ] Post 7 [ 0.139 0.118 -0.142 -0.157 … ] … [ … ] Post Embedding Vectors Post 3 [ -0.242 0.218 0.848 -0.887 … ] Post 4 [ 0.581 -0.859 0.006 -0.598 … ] Post 5 [ 0.344 -0.834 -0.651 0.524 … ] Post 6 [ 0.255 0.963 -0.127 -0.959 … ] Post 7 [ -0.299 0.808 0.677 -0.604 … ] Post 8 [ 0.992 -0.795 0.062 -0.490 … ] Post 9 [ 0.855 0.622 -0.793 -0.329 … ] … [ … ] t=0 t=1 A B 1-to-1 correspondence of points find orthogonal matrix W that maps B into A
  65. Orthogonal Procrustes Problem Post Embedding Vectors Post 1 [ -0.242

    0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] Post 5 [ 0.239 -0.646 0.002 -0.702 … ] Post 6 [ -0.612 -0.408 0.052 0.064 … ] Post 7 [ 0.139 0.118 -0.142 -0.157 … ] … [ … ] Post Embedding Vectors Post 3 [ -0.242 0.218 0.848 -0.887 … ] Post 4 [ 0.581 -0.859 0.006 -0.598 … ] Post 5 [ 0.344 -0.834 -0.651 0.524 … ] Post 6 [ 0.255 0.963 -0.127 -0.959 … ] Post 7 [ -0.299 0.808 0.677 -0.604 … ] Post 8 [ 0.992 -0.795 0.062 -0.490 … ] Post 9 [ 0.855 0.622 -0.793 -0.329 … ] … [ … ] t=0 t=1 A B 1-to-1 correspondence of points find orthogonal matrix W that maps B into A
  66. Orthogonal Procrustes Problem Post Embedding Vectors Post 1 [ -0.242

    0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] Post 5 [ 0.239 -0.646 0.002 -0.702 … ] Post 6 [ -0.612 -0.408 0.052 0.064 … ] Post 7 [ 0.139 0.118 -0.142 -0.157 … ] … [ … ] Post Embedding Vectors Post 3 [ -0.242 0.218 0.848 -0.887 … ] Post 4 [ 0.581 -0.859 0.006 -0.598 … ] Post 5 [ 0.344 -0.834 -0.651 0.524 … ] Post 6 [ 0.255 0.963 -0.127 -0.959 … ] Post 7 [ -0.299 0.808 0.677 -0.604 … ] Post 8 [ 0.992 -0.795 0.062 -0.490 … ] Post 9 [ 0.855 0.622 -0.793 -0.329 … ] … [ … ] t=0 t=1 1-to-1 correspondence of points find orthogonal matrix W that maps B into A transform whole t=1 embedding matrix using W
  67. Orthogonal Procrustes Problem Post Embedding Vectors Post 1 [ -0.242

    0.218 0.848 -0.887 … ] Post 2 [ 0.581 -0.859 0.006 -0.598 … ] Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] Post 5 [ 0.239 -0.646 0.002 -0.702 … ] Post 6 [ -0.612 -0.408 0.052 0.064 … ] Post 7 [ 0.139 0.118 -0.142 -0.157 … ] … [ … ] Post Embedding Vectors Post 3 [ 0.344 -0.834 -0.651 0.524 … ] Post 4 [ 0.255 0.963 -0.127 -0.959 … ] Post 5 [ 0.239 -0.646 0.002 -0.702 … ] Post 6 [ -0.612 -0.408 0.052 0.064 … ] Post 7 [ 0.139 0.118 -0.142 -0.157 … ] Post 8 [ 0.992 -0.795 0.062 -0.490 … ] Post 9 [ 0.855 0.622 -0.793 -0.329 … ] … [ … ] t=0 t=1′  t=0 and t=1′  are in same vector space add embeddings only in t=1′  to t=0 Aligned Embeddings!
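
A minimal sketch of slides 64–67 using SciPy's orthogonal Procrustes solver. The embeddings below are random placeholders; in practice A and B would be the t=0 and t=1 vectors of the posts present in both batches, in matching row order:

```python
# Align two embedding batches with the orthogonal Procrustes solution.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
dim = 64
emb_t0 = {f"post{i}": rng.normal(size=dim) for i in range(1, 8)}   # posts 1..7 at t=0
emb_t1 = {f"post{i}": rng.normal(size=dim) for i in range(3, 10)}  # posts 3..9 at t=1

# 1-to-1 correspondence: posts present in both batches, in the same row order.
shared = sorted(set(emb_t0) & set(emb_t1))
A = np.stack([emb_t0[p] for p in shared])   # t=0 vectors of the shared posts
B = np.stack([emb_t1[p] for p in shared])   # t=1 vectors of the shared posts

# Find the orthogonal W minimizing ||B @ W - A||_F.
W, _ = orthogonal_procrustes(B, A)

# Transform the t=1 embeddings into the t=0 space; shared posts keep their t=0 vectors,
# and only the posts new in t=1' are added (as on slide 67).
aligned = dict(emb_t0)
for post, vec in emb_t1.items():
    if post not in aligned:
        aligned[post] = vec @ W
print(sorted(aligned))  # posts 1..9 in one shared vector space
```
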
  68. Summary > Understand the nature of your data > Dual

    importance of quantitative and qualitative evaluation > “Perfection is not attainable. But if we chase perfection, we can catch excellence.” - Vince Lombardi > Model architecture is essential > Understanding your evaluation metric
  69. Summary > Understand the nature of your data > Dual

    importance of quantitative and qualitative evaluation > “Perfection is not attainable. But if we chase perfection, we can catch excellence.” - Vince Lombardi > Model architecture is essential > Understanding your evaluation metric
  70. Summary > Understand the nature of your data > Dual

    importance of quantitative and qualitative evaluation > “Perfection is not attainable. But if we chase perfection, we can catch excellence.” - Vince Lombardi > Model architecture is essential > Understanding your evaluation metric
  71. Summary > Understand the nature of your data > Dual

    importance of quantitative and qualitative evaluation > “Perfection is not attainable. But if we chase perfection, we can catch excellence.” - Vince Lombardi > Model architecture is essential > Understanding your evaluation metric
  72. Thank You