Timeline Post Recommender System

by Jihong Lee @ LINE TECHPULSE 2019, https://techpulse.line.me/

line_developers_tw

December 04, 2019

Transcript

  1. None
  2. LINE Timeline Post Recommender System
    > Jihong Lee / Data Science Dev
  3. Agenda
    > What is a Recommender System?
    > Problems, Solutions, and Lessons Learned
      • User Embeddings
      • Feedback Loop
      • Importance of Evaluation
      • Model Architecture
      • Increasing Post Pool Size
  4. What Is a Recommender System?

  7. LINE Timeline

  8. Background Knowledge
    > Collaborative filtering and content-based filtering
    > Two big approaches to Recommender Systems
  9. Collaborative Filtering
    > [Table: users' ratings of Movie A, Movie B, Movie C, Movie D; values 5 4 5 2 3 4 4 2 1 1]
    > Ratings: 5 - Excellent, 4 - Good, 3 - Average, 2 - Not Bad, 1 - Bad
  12. Collaborative Filtering
    > [Table: users' ratings of Movie A, Movie B, Movie C, Movie D; values 5 4 5 2 3 4 4 2 1 1]
    > Ratings: 5 - Excellent, 4 - Good, 3 - Average, 2 - Not Bad, 1 - Bad
    > Recommend!
  13. Problem Description
    > Given: A user and a post with context
    > Goal: Predict the probability that the user will click the post
  14. Model Training Pipeline
    > Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions
  17. Raw Data
    > User View History Log (columns: User, Post, Author, Time)
    > User Click History Log (columns: User, Post, Author, Time)
    > Join!
  18. Raw Data
    > Labeled Data (columns: User, Post, Author, Time, Label)
    > Label values shown: 1, 0, 1, 1
    > 1 - Positive Label (Clicked)
    > 0 - Negative Label (Not Clicked)
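The join that turns the view log and the click log into labeled data might look like the sketch below. It assumes the join key is the (user, post) pair and that a viewed post with no matching click is a negative example; the slides do not spell out the key, and the IDs and timestamps are hypothetical.

```python
# Hypothetical log rows: (user, post, author, time). All IDs are made up.
view_log = [
    ("u1", "p1", "a1", 100),
    ("u1", "p2", "a2", 101),
    ("u2", "p1", "a1", 102),
    ("u2", "p3", "a3", 103),
]
click_log = [
    ("u1", "p1", "a1", 105),
    ("u2", "p3", "a3", 104),
]

def label_views(views, clicks):
    """Left-join views with clicks on (user, post): clicked -> 1, otherwise 0."""
    clicked = {(user, post) for user, post, _author, _time in clicks}
    return [
        (user, post, author, time, 1 if (user, post) in clicked else 0)
        for user, post, author, time in views
    ]

labeled = label_views(view_log, click_log)
```

Every viewed row is kept, so the labeled set has as many rows as the view log.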
  22. Model Training Pipeline
    > Simple!
    > Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions
  24. Recommender System 101: Embeddings
    > Feature representation
      • Item 1 [ 3 0 2 0 … ]
      • Item 2 [ 2 1 2 4 … ]
      • Item 3 [ 3 4 1 0 … ]
    > Feature dimensionality (# of columns) too high!
    > Categories carry no meaning
  25. Recommender System 101: Embeddings
    > Feature representation: dimensionality (# of columns) too high, categories carry no meaning
      • Item 1 [ 3 0 2 0 … ]
      • Item 2 [ 2 1 2 4 … ]
      • Item 3 [ 3 4 1 0 … ]
    > Embedding representation: reduced dimensionality, captures category meaning
      • Item 1 [ -0.242 0.218 0.848 -0.887 … ] (shown three times on the slide)
  26. Problems & Solutions > User Embeddings

  27. Creating Embeddings
    > [Table: user-by-post interaction matrix; columns Post 1, Post 2, Post 3, Post 4; values 0 1 1 1 0 1 0 1 1]
    > Post Embedding Vectors
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
    > User Embedding Vectors
      • [ 0.324 -0.192 -0.453 0.004 … ]
      • [ -0.187 0.394 -0.225 0.022 … ]
      • [ 0.177 0.718 -0.239 -0.422 … ]
      • [ -0.725 0.090 -0.353 0.228 … ]
      • … [ … ]
  28. Creating Embeddings
    > Same interaction matrix and embedding vectors as slide 27
    > Extremely Sparse!! Density < 0.0001%
  29. Mitigate the Issue
    > Post-by-post matrix
              Post 1  Post 2  Post 3  Post 4
      Post 1       3       0       2       0
      Post 2       1       1       3       2
      Post 3       1       0       2       1
      Post 4       0       1       4       1
    > Post Embedding Vectors
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
    > User Embedding Vectors
      • [ linear combination of user history ] for each user
    > Worked Much Better!
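A user vector built as a "linear combination of user history" could be computed along these lines. The slides do not give the weights, so this sketch assumes a plain average unless weights are passed in; `post_embeddings` and its values are hypothetical stand-ins for learned vectors.

```python
# Hypothetical 3-dimensional post embeddings; real vectors would be learned.
post_embeddings = {
    "post_1": [1.0, 0.0, 0.0],
    "post_2": [0.0, 1.0, 0.0],
    "post_3": [0.0, 0.0, 1.0],
}

def user_vector(history, weights=None):
    """Weighted sum of the embeddings of the posts in `history`.

    With weights=None every post gets weight 1/len(history), i.e. an average.
    """
    if not history:
        raise ValueError("history must not be empty")
    if weights is None:
        weights = [1.0 / len(history)] * len(history)
    dim = len(next(iter(post_embeddings.values())))
    vec = [0.0] * dim
    for post, weight in zip(history, weights):
        for i, value in enumerate(post_embeddings[post]):
            vec[i] += weight * value
    return vec

u = user_vector(["post_1", "post_2"])
```

Because the user vector lives in the same space as the post vectors, no separate user embedding has to be trained on the sparse interaction matrix.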
  30. Lesson Learned
    > Different algorithms are suitable for different types of data
    > Must understand the nature of your data!
  31. Problems & Solutions > Feedback Loop

  32. Feedback Loop
    > Pipeline (shown twice): Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions
  33. Feedback Loop
    > Problematic!
    > Pipeline (shown twice): Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions
  34. Feedback Loop
    > Model 0 → recommendations for user → user interacts → train → Model 1
    > Caused by Model 0
    > Training Data Is Biased!!
  35. Feedback Loop
    > Pipeline (shown three times): Raw Data → Preprocess Data → Model Training → Model Deployment → User Interactions
    > Caused by Friends’ Shares
    > Not Used in Training!
  36. Problems & Solutions > Importance of Evaluation

  37. AUROC (Area Under ROC Curve)
    > ROC plot: True Positive Rate (y-axis) against False Positive Rate (x-axis)
    > True Positive Rate: Proportion of Correctly Classified Positive Labels
    > False Positive Rate: Proportion of Negative Labels Incorrectly Classified as Positive
    > Curves: Trained Classifier, Random Classifier
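In symbols, the two rates can be written as below. The counts TP, FN, FP, and TN (true positives, false negatives, false positives, true negatives at a given score threshold) are the usual confusion-matrix names and are not on the slide.

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN}
```

Sweeping the threshold from high to low traces the ROC curve from (0, 0) to (1, 1); a random classifier stays near the diagonal, where the area is 0.5.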
  39. Global AUROC
    > Table columns: User, Post, Pr(Click), Label (User values are not in the transcript)
      Post  Pr(Click)  Label
      A     0.992      1
      B     0.981      0
      A     0.977      1
      C     0.964      1
      B     0.951      0
      A     0.924      0
      C     0.918      1
      A     0.908      1
      C     0.905      0
      C     0.900      1
      B     0.898      1
      B     0.891      0
      …     …          …
    > Calculate AUROC
  41. Average AUROC per User
    > One table per user (columns: Post, Pr(Click), Label)
      • First user: A 0.992 (Label 1), B 0.981 (Label 0), C 0.977 (Label 1), …
      • Second user: A 0.990 (Label 1), C 0.985 (Label 0), D 0.972 (Label 1), …
      • Third user: B 0.991 (Label 1), D 0.970 (Label 0), C 0.967 (Label 1), …
    > Calculate AUROC for each user then average
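The difference between the two metrics can be seen in a small sketch. It computes AUROC as the probability that a random positive outscores a random negative (ties count as one half), which is a standard equivalent definition; the rows are hypothetical, and skipping users who lack either class is an assumption of this sketch, not something the slides state.

```python
from collections import defaultdict

def auroc(scores, labels):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def global_auroc(rows):
    """AUROC over all rows pooled together."""
    return auroc([s for _, s, _ in rows], [y for _, _, y in rows])

def average_auroc_per_user(rows):
    """AUROC computed per user, then averaged over users where it is defined."""
    by_user = defaultdict(list)
    for user, score, label in rows:
        by_user[user].append((score, label))
    values = []
    for pairs in by_user.values():
        value = auroc([s for s, _ in pairs], [y for _, y in pairs])
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None

# Hypothetical predictions: (user, Pr(Click), label).
rows = [
    ("u1", 0.9, 1), ("u1", 0.8, 0),
    ("u2", 0.3, 1), ("u2", 0.2, 0),
]
```

In this toy data each user's own posts are ranked perfectly, so the per-user average is 1.0, while the pooled value drops to 0.75 because the two users' scores sit on different scales.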
  42. Problems & Solutions > Model Architecture

  43. Problems With a Sole Ranking Model
    > Computationally expensive
    > Concentrated post distribution
  47. Pareto Optimality
    > A state of maximum efficiency in the allocation of resources
    > Plot axes: Preference Criterion A, Preference Criterion B
    > Pareto frontier: any point on the Pareto frontier is Pareto optimal
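One way to make the Pareto frontier concrete: a point is on the frontier if no other point is at least as good on both criteria and different from it. The sketch below assumes higher is better on both axes; the candidate scores are hypothetical.

```python
def pareto_frontier(points):
    """Return the points that no other point dominates (higher is better)."""
    def dominates(p, q):
        # p dominates q if it is at least as good on both axes and not identical.
        return p[0] >= q[0] and p[1] >= q[1] and p != q
    return [q for q in points if not any(dominates(p, q) for p in points)]

# Hypothetical (criterion A, criterion B) scores for five candidates.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1)]
frontier = pareto_frontier(candidates)
```

The point (2, 2) lies inside the frontier because (2, 4) and (3, 3) both beat or match it on each criterion.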
  49. Current State
    > Plot axes: Personalization, KPI
    > Inside Pareto Frontier
    > Converging at KPI’s Local Maxima
    > Points shown: A, B
  50. In Search of Pareto Frontier
    > Plot axes: Personalization, KPI
    > Need a change in model architecture
  51. Candidate Generation and Ranking
    > Pipeline stages: Raw Data, Preprocess Data (once for Candidate Generation, once for Ranking), Candidate Generation, Ranking, Model Deployment, User Interactions
  53. Candidate Generation
    > Candidate Generation Training: train post embeddings from the co-occurrence matrix
    > Co-occurrence matrix
              Post 1  Post 2  Post 3  Post 4
      Post 1       3       0       2       0
      Post 2       1       1       3       2
      Post 3       1       0       2       1
      Post 4       0       1       4       1
    > Post Embedding Vectors
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
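A post co-occurrence matrix can be built in several ways, and the slides do not say which one is used. The sketch below shows one common construction, assumed here for illustration only: count how many users have both posts in their history. It yields a symmetric matrix with a zero diagonal, so it will not reproduce the slide's numbers; the histories are hypothetical.

```python
from itertools import combinations

# Hypothetical click histories, one list of post IDs per user.
histories = [
    ["post_1", "post_3"],
    ["post_2", "post_3", "post_4"],
    ["post_1", "post_3", "post_4"],
]

def co_occurrence(histories):
    """Count, for each pair of posts, the users whose history contains both."""
    posts = sorted({post for history in histories for post in history})
    index = {post: i for i, post in enumerate(posts)}
    matrix = [[0] * len(posts) for _ in posts]
    for history in histories:
        for a, b in combinations(sorted(set(history)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return posts, matrix

posts, matrix = co_occurrence(histories)
```

A matrix like this is far denser than the user-by-post interaction matrix, which is one plausible reason to train post embeddings from it.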
  54. Candidate Generation
    > Candidate Generation Inference
    > user ID = interaction history
    > Item Embedding Vectors
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
    > linear combination of history → user vector: User ID [ -0.242 0.218 0.848 -0.887 … ]
    > nearest neighbor search → candidates: Post 1, Post 2, Post 3, Post 4, …
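The inference steps on this slide (history, user vector, nearest neighbor search, candidates) can be sketched as below. It assumes an unweighted average for the linear combination, cosine similarity as the distance, and a brute-force scan that skips posts already in the history; none of these choices are stated on the slide, and the embeddings are hypothetical. At real scale an approximate nearest-neighbor index would likely replace the scan.

```python
from math import sqrt

# Hypothetical 2-dimensional post embeddings.
post_embeddings = {
    "post_1": [1.0, 0.0],
    "post_2": [0.9, 0.1],
    "post_3": [0.0, 1.0],
    "post_4": [-1.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def generate_candidates(history, k):
    """Average the history's embeddings, then return the k nearest unseen posts."""
    dim = len(next(iter(post_embeddings.values())))
    user_vec = [
        sum(post_embeddings[p][i] for p in history) / len(history)
        for i in range(dim)
    ]
    unseen = [p for p in post_embeddings if p not in history]
    ranked = sorted(unseen, key=lambda p: cosine(user_vec, post_embeddings[p]), reverse=True)
    return ranked[:k]

candidates = generate_candidates(["post_1"], k=2)
```

The candidate list is what a separate ranking model would then score, so the expensive model only sees k posts per request.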
  55. Inference Pipeline User ID Query Candidate Generation Ranking Recommendations

  56. Problems & Solutions > Increasing Post Pool

  57. Aligning Embeddings Trained in Batches
    > Candidate Generation Post Embedding Vectors, t=0
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
    > Candidate Generation Post Embedding Vectors, t=1
      • Post 3 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 4 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 5 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 6 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
    > Candidate Generation Post Embedding Vectors, t=2
      • Post 5 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 6 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 7 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 8 [ 0.255 0.963 -0.127 -0.959 … ]
      • … [ … ]
    > Each batch has a different post pool
  58. Aligning Embeddings Trained in Batches
    > Same t=0, t=1, and t=2 embedding vectors as slide 57
    > We want everything
    > Need to align!
  59. Orthogonal Procrustes Problem
    > Given matrices A and B of the same size, find the orthogonal matrix W such that $\lVert BW - A \rVert_F^2$ is minimized
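For reference, this problem has a well-known closed-form solution through the singular value decomposition. The symbols U, Σ, and V below are the SVD factors and do not appear on the slides.

```latex
\begin{aligned}
W^{*} &= \arg\min_{W^{\top} W = I} \lVert BW - A \rVert_F^2 \\
B^{\top} A &= U \Sigma V^{\top} \quad \text{(singular value decomposition)} \\
W^{*} &= U V^{\top}
\end{aligned}
```

The result follows because minimizing the norm is equivalent to maximizing $\operatorname{tr}(W^{\top} B^{\top} A)$, which is largest when $W = U V^{\top}$.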
  60. Orthogonal Procrustes Problem A B

  61. Orthogonal Procrustes Problem
    > Point sets A and B with 1-to-1 correspondence of points
    > Find rotation and/or reflection matrix that maps B into A
  64. Orthogonal Procrustes Problem
    > Aligned A & B!
    > How can we use this method?
  65. Orthogonal Procrustes Problem
    > Post Embedding Vectors, t=0
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • Post 5 [ 0.239 -0.646 0.002 -0.702 … ]
      • Post 6 [ -0.612 -0.408 0.052 0.064 … ]
      • Post 7 [ 0.139 0.118 -0.142 -0.157 … ]
      • … [ … ]
    > Post Embedding Vectors, t=1
      • Post 3 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 4 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 5 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 6 [ 0.255 0.963 -0.127 -0.959 … ]
      • Post 7 [ -0.299 0.808 0.677 -0.604 … ]
      • Post 8 [ 0.992 -0.795 0.062 -0.490 … ]
      • Post 9 [ 0.855 0.622 -0.793 -0.329 … ]
      • … [ … ]
    > A and B: the posts with 1-to-1 correspondence of points in t=0 and t=1
    > Find orthogonal matrix W that maps B into A
  67. Orthogonal Procrustes Problem
    > Same t=0 and t=1 post embedding vectors as slide 65
    > 1-to-1 correspondence of points
    > Find orthogonal matrix W that maps B into A
    > Transform whole t=1 embedding matrix using W
  68. Orthogonal Procrustes Problem
    > Post Embedding Vectors, t=0
      • Post 1 [ -0.242 0.218 0.848 -0.887 … ]
      • Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • Post 5 [ 0.239 -0.646 0.002 -0.702 … ]
      • Post 6 [ -0.612 -0.408 0.052 0.064 … ]
      • Post 7 [ 0.139 0.118 -0.142 -0.157 … ]
      • … [ … ]
    > Post Embedding Vectors, t=1′
      • Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
      • Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
      • Post 5 [ 0.239 -0.646 0.002 -0.702 … ]
      • Post 6 [ -0.612 -0.408 0.052 0.064 … ]
      • Post 7 [ 0.139 0.118 -0.142 -0.157 … ]
      • Post 8 [ 0.992 -0.795 0.062 -0.490 … ]
      • Post 9 [ 0.855 0.622 -0.793 -0.329 … ]
      • … [ … ]
    > t=0 and t=1′ are in the same vector space
    > Add embeddings only in t=1′ to t=0
    > Aligned Embeddings!
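The alignment procedure on these slides might be implemented roughly as follows with NumPy. This is a sketch under stated assumptions: W is computed with the standard SVD solution of the orthogonal Procrustes problem, the posts present in both batches serve as the 1-to-1 correspondence, and existing t=0 vectors are kept unchanged. The function names and the toy 2-D embeddings are hypothetical.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal W minimizing ||B @ W - A||_F, from the SVD of B.T @ A."""
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

def align_batches(emb_t0, emb_t1):
    """Map emb_t1 into emb_t0's space and add the posts only t=1 contains."""
    shared = sorted(set(emb_t0) & set(emb_t1))   # 1-to-1 correspondence
    A = np.array([emb_t0[p] for p in shared])
    B = np.array([emb_t1[p] for p in shared])
    W = procrustes_rotation(A, B)
    merged = dict(emb_t0)                        # keep t=0 vectors as they are
    for post, vec in emb_t1.items():
        if post not in merged:
            merged[post] = np.asarray(vec) @ W   # transformed t=1 vector
    return merged, W

# Hypothetical data: t=1 is t=0 rotated by 90 degrees, plus one new post.
rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])
emb_t0 = {
    "post_1": np.array([1.0, 0.0]),
    "post_2": np.array([0.0, 1.0]),
    "post_3": np.array([1.0, 1.0]),
}
emb_t1 = {p: v @ rotation for p, v in emb_t0.items() if p != "post_1"}
emb_t1["post_4"] = np.array([2.0, 0.0]) @ rotation
merged, W = align_batches(emb_t0, emb_t1)
```

In this toy case W undoes the rotation exactly, so `post_4` lands at the position it would have had in the t=0 space; with independently trained batches the fit would only be approximate.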
  69. Summary
    > Understand the nature of your data
    > Dual importance of quantitative and qualitative evaluation
    > “Perfection is not attainable. But if we chase perfection, we can catch excellence.” - Vince Lombardi
    > Model architecture is essential
    > Understanding your evaluation metric
  73. Thank You