2019 DevDay
Timeline Post Recommender System
> Jihong Lee
> LINE Plus Data Science Dev Machine Learning Engineer
Slide 2
Agenda
> What Is a Recommender System?
> Problems, Solutions, and Lessons Learned
• User Embeddings
• Feedback Loop
• Importance of Evaluation
• Model Architecture
• Increasing Post Pool
Slide 3
What Is a Recommender System?
Slide 6
LINE Timeline
Slide 7
Background Knowledge
> Two big approaches to recommender systems: collaborative filtering and content-based filtering
Slide 8
Collaborative Filtering
[User × movie rating matrix: columns Movie A-D, four users, several ratings missing]
Ratings
5 - Excellent
4 - Good
3 - Average
2 - Not Bad
1 - Bad
Slide 11
Collaborative Filtering
[Same user × movie rating matrix as on the previous slide]
Recommend!
Slide 12
Problem Description
> Given: A user and a post with context
> GOAL: Predict the probability that the user will click the post
Slide 13
Model Training Pipeline
[Diagram: User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment → back to User Interactions]
Slide 16
Raw Data
> User view history log: User, Post, Author, Time
> User click history log: User, Post, Author, Time
> Join!
Slide 17
Raw Data → Labeled Data
[Table: User, Post, Author, Time, Label columns; example labels 1, 0, 1, 1]
1 - Positive Label (Clicked)
0 - Negative Label (Not Clicked)
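To make the preprocessing step concrete, here is a minimal pandas sketch of the join described above; the column values are toy data and the left-join / fill-with-0 logic is an assumption about how the two logs are combined.

```python
import pandas as pd

# User view history log and user click history log, as on the slide.
views = pd.DataFrame({
    "user":   ["u1", "u1", "u2"],
    "post":   ["p1", "p2", "p1"],
    "author": ["a1", "a2", "a1"],
    "time":   ["2019-11-20 10:00", "2019-11-20 10:01", "2019-11-20 11:00"],
})
clicks = pd.DataFrame({
    "user": ["u1", "u2"],
    "post": ["p2", "p1"],
})

# Left-join clicks onto views: a view that also appears in the click log
# becomes a positive example (label 1), otherwise a negative one (label 0).
labeled = views.merge(clicks.assign(label=1), on=["user", "post"], how="left")
labeled["label"] = labeled["label"].fillna(0).astype(int)
print(labeled)
```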
Slide 21
Model Training Pipeline
Simple!
[Diagram: User Interactions → Raw Data → Preprocess Data → Model Training → Model Deployment]
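The deck does not say which model implements the "Model Training" step, so as a hedged stand-in the sketch below fits a simple logistic-regression click model over hashed (user, post, author) features; FeatureHasher, the feature set, and the toy rows are assumptions, not the production setup.

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for the preprocessed data: (user, post, author) with a 0/1 click label.
labeled = pd.DataFrame({
    "user":   ["u1", "u1", "u2", "u2", "u3", "u3"],
    "post":   ["p1", "p2", "p1", "p3", "p2", "p3"],
    "author": ["a1", "a2", "a1", "a3", "a2", "a3"],
    "label":  [1, 0, 0, 1, 1, 0],
})

# Hash the categorical features into a fixed-size sparse vector.
hasher = FeatureHasher(n_features=2**18, input_type="dict")
X = hasher.transform(labeled[["user", "post", "author"]].to_dict("records"))
y = labeled["label"]

# Fit a click-probability model and score each (user, post) pair,
# which matches the stated goal: predict Pr(click).
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X)[:, 1])
```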
Slide 24
Recommender System 101: Embeddings
Raw feature matrix (Item 1: 3 0 2 0 …, Item 2: 2 1 2 4 …, Item 3: 3 4 1 0 …)
• dimensionality too high (# of columns)
• no meaning of categories
Embedding representation (Item 1: [ -0.242 0.218 0.848 -0.887 … ], …)
• reduced dimensionality
• has category meanings
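As a toy illustration of the contrast (not anything from the deck), the snippet below compares a one-hot item representation with a dense embedding lookup; the sizes and random values are made up.

```python
import numpy as np

num_items, embedding_dim = 100_000, 64

# One-hot / raw categorical features: one column per item, so the
# dimensionality equals the item count and items carry no similarity.
one_hot = np.zeros(num_items)
one_hot[42] = 1.0

# Embedding representation: a learned lookup table of small dense vectors.
# Similar items end up close together, so the categories carry meaning.
embeddings = np.random.randn(num_items, embedding_dim).astype(np.float32)
item_vec = embeddings[42]          # 64 numbers instead of 100,000
print(one_hot.shape, item_vec.shape)
```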
Slide 25
Problems & Solutions
> User Embeddings
Slide 27
Creating Embeddings
[User × post interaction matrix with 0/1 entries, factorized into post embedding vectors and user embedding vectors]
Extremely Sparse!! Density < 0.0001%
Slide 28
Mitigate the Issue
Post × post co-occurrence matrix → post embedding vectors
User embedding vectors = [ linear combination of user history ]
Worked Much Better!
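A minimal sketch of the "linear combination of user history" idea: the user vector is a weighted average of the embeddings of the posts the user interacted with. The toy vectors and weights are assumptions.

```python
import numpy as np

# Toy post embeddings (in practice, trained from the post co-occurrence matrix).
post_vectors = {
    "post1": np.array([-0.242, 0.218, 0.848, -0.887]),
    "post2": np.array([ 0.581, -0.859, 0.006, -0.598]),
    "post3": np.array([ 0.344, -0.834, -0.651, 0.524]),
}

# A user's interaction history, optionally weighted (e.g. clicks > views).
history = [("post1", 1.0), ("post3", 2.0)]

# User embedding = linear combination of the post embeddings in the history.
weights = np.array([w for _, w in history])
vectors = np.stack([post_vectors[p] for p, _ in history])
user_vector = (weights[:, None] * vectors).sum(axis=0) / weights.sum()
print(user_vector)
```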
Slide 29
Lesson Learned
> Different algorithms are suitable for different types of data
> Must understand the nature of your data!
Slide 30
Problems & Solutions
> Feedback Loop
Slide 32
Feedback Loop
[Diagram: two copies of the model training pipeline chained together; the deployed model's recommendations drive the user interactions that become the next model's training data]
Problematic!
Slide 33
Feedback Loop
[Diagram: Model 0 → recommendations for user → user interacts → interaction data → train → Model 1]
Training Data Is Biased!! (caused by Model 0)
Slide 34
Feedback Loop
[Diagram: in the training pipeline, the raw data comes from user interactions caused by friends' shares; interactions driven by the model's own recommendations are not used in training]
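One hedged way to implement this, assuming each logged interaction carries a hypothetical `source` field that records whether the post was surfaced by a friend's share or by the recommender (the deck does not show the actual schema):

```python
import pandas as pd

# Hypothetical log: `source` marks what surfaced the post to the user.
interactions = pd.DataFrame({
    "user":   ["u1", "u1", "u2", "u3"],
    "post":   ["p1", "p2", "p3", "p1"],
    "label":  [1, 0, 1, 0],
    "source": ["friend_share", "recommendation", "friend_share", "recommendation"],
})

# Keep only interactions caused by friends' shares; interactions that the
# model itself caused are excluded, which breaks the feedback loop.
training_data = interactions[interactions["source"] == "friend_share"]
print(training_data)
```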
Slide 35
Problems & Solutions
> Importance of Evaluation
Slide 36
AUROC (Area Under ROC Curve)
[Plot: ROC curves (True Positive Rate vs. False Positive Rate) for the trained classifier and a random classifier]
True Positive Rate: proportion of positive labels correctly classified
False Positive Rate: proportion of negative labels incorrectly classified as positive
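A small scikit-learn illustration of the metric on toy labels and scores: `roc_curve` returns the (FPR, TPR) points of the curve and `roc_auc_score` the area under it.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                     # clicked / not clicked
y_score = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.1]   # model's Pr(click)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr.round(2), tpr.round(2))))   # points of the ROC curve
print(roc_auc_score(y_true, y_score))          # a random classifier scores ~0.5
```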
Slide 38
Global AUROC
User  Post  Pr(Click)  Label
A     …     0.992      1
B     …     0.981      0
A     …     0.977      1
C     …     0.964      1
B     …     0.951      0
A     …     0.924      0
C     …     0.918      1
A     …     0.908      1
C     …     0.905      0
C     …     0.900      1
B     …     0.898      1
B     …     0.891      0
…     …     …          …
Calculate AUROC
Slide 40
Average AUROC per User
[Three per-user tables of (User, Post, Pr(Click), Label)]
User 1: Post A 0.992 (label 1), Post B 0.981 (0), Post C 0.977 (1), …
User 2: Post A 0.990 (1), Post C 0.985 (0), Post D 0.972 (1), …
User 3: Post B 0.991 (1), Post D 0.970 (0), Post C 0.967 (1), …
Calculate AUROC for each user, then average
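A sketch of the difference between the two evaluation schemes on toy data: global AUROC pools every prediction into one ranking, while the per-user variant scores each user's own list and then averages.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

scored = pd.DataFrame({
    "user":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "prob":  [0.99, 0.92, 0.90, 0.98, 0.95, 0.89, 0.96, 0.91, 0.90],
    "label": [1, 0, 1, 0, 1, 0, 1, 1, 0],
})

# Global AUROC: rank all (user, post) rows together.
global_auroc = roc_auc_score(scored["label"], scored["prob"])

# Average AUROC per user: rank each user's own candidates, then average.
per_user = [
    roc_auc_score(g["label"], g["prob"])
    for _, g in scored.groupby("user")
]
print(global_auroc, sum(per_user) / len(per_user))
```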
Slide 41
Problems & Solutions
> Model Architecture
Slide 42
Problems With a Sole Ranking Model
> Computationally expensive
> Concentrated post distribution
Slide 46
Pareto Optimality
[Plot: Preference Criterion A vs. Preference Criterion B, showing the Pareto frontier]
> A state of maximum efficiency in the allocation of resources
> Any point on the Pareto frontier is Pareto optimal
Slide 48
Current State
[Plot: KPI vs. Personalization, with points A and B inside the Pareto frontier]
> Inside the Pareto frontier
> Converging at the KPI's local maxima
Slide 49
In Search of the Pareto Frontier
[Plot: KPI vs. Personalization]
> Need a change in model architecture
Slide 50
Candidate Generation and Ranking
[Pipeline diagram: User Interactions → Raw Data → two Preprocess Data steps feeding the Candidate Generation and Ranking models → Model Deployment]
Slide 52
Candidate Generation: Training
Train post embeddings from the post co-occurrence matrix
Co-occurrence matrix (Post 1-4 × Post 1-4):
Post 1: 3 0 2 0
Post 2: 1 1 3 2
Post 3: 1 0 2 1
Post 4: 0 1 4 1
Post Embedding Vectors:
Post 1 [ -0.242 0.218 0.848 -0.887 … ]
Post 2 [ 0.581 -0.859 0.006 -0.598 … ]
Post 3 [ 0.344 -0.834 -0.651 0.524 … ]
Post 4 [ 0.255 0.963 -0.127 -0.959 … ]
… [ … ]
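The deck does not name the embedding algorithm, so as one hedged possibility the sketch below factorizes the co-occurrence matrix from the slide with a truncated SVD to obtain dense post vectors; the dimensionality k is arbitrary.

```python
import numpy as np

# Post-post co-occurrence matrix from the slide (rows/cols: Post 1..4).
cooccurrence = np.array([
    [3, 0, 2, 0],
    [1, 1, 3, 2],
    [1, 0, 2, 1],
    [0, 1, 4, 1],
], dtype=float)

# Truncated SVD: keep the top-k singular directions as the embedding space.
k = 2
U, S, Vt = np.linalg.svd(cooccurrence)
post_embeddings = U[:, :k] * S[:k]        # one k-dimensional vector per post

for i, vec in enumerate(post_embeddings, start=1):
    print(f"Post {i}", np.round(vec, 3))
```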
Slide 53
Candidate Generation: Inference
user ID → interaction history → user vector = linear combination of the item embedding vectors in the history
user vector: User ID [ -0.242 0.218 0.848 -0.887 … ]
nearest neighbor search → candidates (Post 1, Post 2, Post 3, Post 4, …)
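A minimal sketch of this inference step: look up the user's history, average the post embeddings into a user vector, and retrieve candidates by cosine nearest-neighbor search. Plain NumPy brute force stands in for whatever ANN index is used in production; all values are toy data.

```python
import numpy as np

# Post embeddings from candidate-generation training (toy values).
post_ids = ["post1", "post2", "post3", "post4", "post5"]
post_emb = np.random.randn(len(post_ids), 8).astype(np.float32)

# The user's interaction history (looked up by user ID).
history = ["post1", "post3"]
idx = [post_ids.index(p) for p in history]

# User vector = linear combination (here: mean) of the history's post vectors.
user_vec = post_emb[idx].mean(axis=0)

# Nearest-neighbor search by cosine similarity; in production an ANN index
# (e.g. Faiss or Annoy) would replace this brute-force scan.
norms = np.linalg.norm(post_emb, axis=1) * np.linalg.norm(user_vec)
scores = post_emb @ user_vec / norms
candidates = [post_ids[i] for i in np.argsort(-scores) if post_ids[i] not in history]
print(candidates[:3])
```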
Slide 54
Inference Pipeline
User ID (query) → Candidate Generation → Ranking → Recommendations
Slide 55
Problems & Solutions
> Increasing Post Pool
Slide 56
Aligning Embeddings Trained in Batches
[Three candidate-generation batches of post embedding vectors: t=0 covers Posts 1-4 …, t=1 covers Posts 3-6 …, t=2 covers Posts 5-8 …]
Each batch has a different post pool
Slide 57
Aligning Embeddings Trained in Batches
We want everything
Need to align!
Slide 58
Orthogonal Procrustes Problem
Given matrices A and B of the same size, find an orthogonal matrix W such that ∥BW − A∥²_F is minimized
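The problem has a closed-form solution via the SVD: with M = BᵀA and M = UΣVᵀ, the minimizer is W = UVᵀ. A short NumPy sketch with a toy check:

```python
import numpy as np

def procrustes_align(A, B):
    """Return the orthogonal matrix W minimizing ||B @ W - A||_F."""
    # Closed-form solution: M = Bᵀ A, SVD M = U Σ Vᵀ, then W = U Vᵀ.
    # (scipy.linalg.orthogonal_procrustes solves the same problem.)
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

# Toy check: B is A rotated by a random orthogonal matrix; W should undo it.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal matrix
B = A @ Q

W = procrustes_align(A, B)
print(np.linalg.norm(B @ W - A))   # ~0: B has been mapped back onto A
```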
Slide 60
Orthogonal Procrustes Problem
[Two point sets A and B with a 1-to-1 correspondence of points]
Find the rotation and/or reflection matrix that maps B into A
Slide 63
Orthogonal Procrustes Problem
[B rotated onto A: aligned A & B!]
How can we use this method?
Slide 64
Orthogonal Procrustes Problem
[A = post embedding vectors at t=0 (Posts 1-7, …); B = post embedding vectors at t=1 (Posts 3-9, …); the posts present in both batches give a 1-to-1 correspondence of points]
Find the orthogonal matrix W that maps B into A
Slide 66
Orthogonal Procrustes Problem
Using the 1-to-1 correspondence of the shared posts, find the orthogonal matrix W that maps B (t=1) into A (t=0), then transform the whole t=1 embedding matrix using W
Slide 67
Orthogonal Procrustes Problem
[Post embedding vectors at t=0 (Posts 1-7, …) and the transformed batch t=1′ (Posts 3-9, …)]
t=0 and t=1′ are now in the same vector space: add the embeddings that appear only in t=1′ to t=0
Aligned Embeddings!
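Putting the pieces together, a hedged sketch of the whole alignment step: use the posts present in both batches as the 1-to-1 correspondence, solve for W, transform the entire t=1 matrix, and append the posts that are new in t=1′. The dictionary-of-vectors storage and helper names are assumptions for illustration.

```python
import numpy as np

def procrustes_align(A, B):
    # Same closed-form solution as above: W minimizes ||B @ W - A||_F.
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

def align_batches(emb_t0, emb_t1):
    """Map the t=1 embeddings into the t=0 space and merge the post pools."""
    shared = sorted(set(emb_t0) & set(emb_t1))          # posts in both batches
    A = np.stack([emb_t0[p] for p in shared])           # anchors in t=0 space
    B = np.stack([emb_t1[p] for p in shared])           # same posts in t=1 space
    W = procrustes_align(A, B)

    # Transform the WHOLE t=1 matrix with W, then add only the new posts.
    emb_t1_aligned = {p: v @ W for p, v in emb_t1.items()}
    merged = dict(emb_t0)
    for p, v in emb_t1_aligned.items():
        if p not in merged:
            merged[p] = v
    return merged

# Toy batches: posts 3-7 overlap, as in the slides.
rng = np.random.default_rng(1)
emb_t0 = {f"post{i}": rng.normal(size=8) for i in range(1, 8)}
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))            # t=1 lives in a rotated space
emb_t1 = {f"post{i}": emb_t0.get(f"post{i}", rng.normal(size=8)) @ Q
          for i in range(3, 10)}

merged = align_batches(emb_t0, emb_t1)
print(sorted(merged))                                    # post1 .. post9, one space
```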
Slide 68
Summary
> Understand the nature of your data
> Quantitative and qualitative evaluation are equally important
> "Perfection is not attainable. But if we chase perfection, we can catch excellence." - Vince Lombardi
> Model architecture is essential
> Understand your evaluation metric