Slide 22
Experimental Settings Using Real Data
Comparison Methods (Using Real Data)
We evaluated four models using vectors obtained from the following settings:
・Proposed Method (Vectorizer + Guide Task + KLD)
・Without KLD (Vectorizer + Guide Task)
・Without Guide Task (Vectorizer + KLD)
・Baseline (Vectorizer with only word prediction)
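The four settings form a 2x2 ablation over the Guide Task and the KLD term. A minimal sketch of how such a grid could be encoded follows; the class name `AblationConfig`, its fields, and the helper `active_components` are hypothetical names chosen for illustration, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """One experimental setting (hypothetical structure, for illustration)."""
    name: str
    use_guide_task: bool
    use_kld: bool


# The four settings listed on the slide, expressed as on/off switches.
CONFIGS = [
    AblationConfig("Proposed Method", use_guide_task=True, use_kld=True),
    AblationConfig("Without KLD", use_guide_task=True, use_kld=False),
    AblationConfig("Without Guide Task", use_guide_task=False, use_kld=True),
    AblationConfig("Baseline", use_guide_task=False, use_kld=False),
]


def active_components(cfg: AblationConfig) -> list:
    """Return the components enabled for a setting; the Vectorizer is always on."""
    parts = ["Vectorizer"]
    if cfg.use_guide_task:
        parts.append("Guide Task")
    if cfg.use_kld:
        parts.append("KLD")
    return parts
```

Encoding the settings as two boolean switches makes it explicit that each non-proposed setting removes exactly one or both of the added components.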
Dataset
We used the IMDb Review Dataset released on Kaggle.
From this dataset, we selected 1,000 movies that satisfy the following:
・Each movie has at least 50 reviews
・IMDb provides metadata for the movie
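The two selection conditions can be sketched as a simple filter. This is an illustration under assumptions: the input shapes (`reviews` as `(movie_id, text)` pairs, `metadata_ids` as a set) and the function name `select_movies` are hypothetical, and the slide does not say how the 1,000 movies were chosen among eligible ones, so the sort-and-truncate step is only a placeholder.

```python
from collections import defaultdict


def select_movies(reviews, metadata_ids, min_reviews=50, max_movies=1000):
    """Keep movies with at least `min_reviews` reviews and available metadata.

    reviews: iterable of (movie_id, review_text) pairs (assumed format).
    metadata_ids: set of movie IDs for which metadata exists (assumed format).
    """
    by_movie = defaultdict(list)
    for movie_id, text in reviews:
        by_movie[movie_id].append(text)

    eligible = sorted(
        movie_id
        for movie_id, texts in by_movie.items()
        if len(texts) >= min_reviews and movie_id in metadata_ids
    )
    # Placeholder choice: take the first `max_movies` eligible IDs in sorted order.
    return {movie_id: by_movie[movie_id] for movie_id in eligible[:max_movies]}
```

For example, `select_movies(pairs, ids_with_metadata)` returns a dictionary mapping each selected movie ID to its list of reviews.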
Number of Movie IDs: 1,000
Number of Reviews: 50,000
Total Number of Words: 4,673,717
Number of Metadata Categories (Genres): 22
Hyperparameter: Value
Dimensionality of Vector Representation: 50
Batch Size: 800
Negative Sampling: 5
Epochs: 10
Window Size: 5
・Used negative sampling to speed up training.
・Applied a sigmoid annealing scheduler to prevent training collapse in early epochs.
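A sigmoid annealing schedule ramps a loss weight smoothly from near 0 to near 1 over training, so the weighted term has little influence in the early epochs. The sketch below is an assumption-laden illustration: the slide does not state which term is annealed (weighting the KLD term is a common choice) or the schedule's parameters, so `steepness`, `midpoint`, and the function name `sigmoid_anneal` are hypothetical.

```python
import math


def sigmoid_anneal(step, total_steps, steepness=10.0, midpoint=0.5):
    """Return a weight in (0, 1) that rises along a sigmoid curve.

    step / total_steps is the training progress; `steepness` and `midpoint`
    are illustrative values, not those used in the experiments.
    """
    progress = step / total_steps
    return 1.0 / (1.0 + math.exp(-steepness * (progress - midpoint)))


# Assumed usage: scale an auxiliary loss term by the annealed weight.
# total_loss = prediction_loss + sigmoid_anneal(step, total_steps) * kld_loss
```

With these illustrative defaults the weight stays below 0.01 at the start of training, reaches 0.5 at the halfway point, and exceeds 0.99 by the end.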
Dataset Detail