Recsys勉強会2015資料_1

Slide 1

Slide 1 text

Recsysษڧձ2015 [4a-3] Learning Distributed Representations from Reviews for Collaborative Filtering ! A. Almahairi, K. Kastner, K. Cho, A.Courville ٠ాངฏ [email protected] https://www.facebook.com/yohei.kikuta.3  2015/10/17

Slide 5

Slide 5 text

݁Ռ [4a-3] ٠ాངฏ 5/6 4.2 Experimental Setup Data Preparation. We closely follow the procedure from [13] and [12], where the evaluation is done per category. We randomly select 80% of ratings, up to two million samples, as a training set, and split the rest evenly into validation and test sets, for each category. We preprocess reviews only by tokenizing them using a script from Moses6, after which we build a vocabulary of 5000 most frequent words. Evaluation Criteria. We use mean squared error (MSE) of the rating prediction to evaluate each approach. For assessing the performance on review modeling, we use the average negative log-likelihood. Baseline. We compare the two proposed approaches, BoWLF (see Sec. 3.2.1) and LMLF (see Sec. 3.2.2), against three baseline methods; matrix factorization with L 2 regularization (MF, see Eqs. (1)–(2)), the HFT model from [13] (see Sec. 3.3) and the RMR model from [12]. In the case of HFT, we report the performance both by evaluating the model ourselves7 and by reporting the results from [13] directly. For RMR, we only report the results from [12]. Hyper-parameters. Both user u and product i vectors in Eq. (1) are ﬁve dimensional for all the experiments in this section. This choice was made mainly to make the results comparable to the previously reported ones in [13] and [12]. We initialize all the user and product representations by sampling each element from a zero-mean Gaussian distri- bution with its standard deviation set to 0.01. The biases, µ, u and i are all initialized to 0. All the parameters in BoWLF and LMLF are initialized similarly except for the recurrent weights of the RNN-LM in LMLF which were initialized to be orthogonal. Training Procedure. When training MF, BoWLF and LMLF, we use minibatch RMSProp with the learning rate, momentum coe cient and the size of minibatch set to 0.01, 0.9 and 128, respectively. We trained each model at most 200 epochs, while monitoring the validation performance. For HFT, we follow [13] which uses the Expectation Maximization algorithm together with L-BFGS. In all cases, we early-stop each training run based on the validation set performance. In the preliminary experiments, we found the choice of ↵ in Eq. (6), which balances matrix factorization and review modeling, to be important. We searched for the ↵ that max- categories in terms of MSE with the standard error of mean shown in parentheses. From this table, we can see that except for a single category of“Jewelry”, the proposed BoWLF outperforms all the other models with an improvement of 20.29% over MF and 5.64% over HFT across all categories.8 In general, we note better performance of BoWLF and LMLF models over other methods especially as the size of the dataset grows, which is evident from Figs. 1 and 2. Figure 1: Scatterplot showing performance improvement over the number of samples. We see a performance improvement of BoWLF over HFT as dataset size increases. Figure 2: Scatterplot showing performance improve- each category. We preprocess reviews only by tokenizing them using a script from Moses6, after which we build a vocabulary of 5000 most frequent words. Evaluation Criteria. We use mean squared error (MSE) of the rating prediction to evaluate each approach. For assessing the performance on review modeling, we use the average negative log-likelihood. Baseline. We compare the two proposed approaches, BoWLF (see Sec. 3.2.1) and LMLF (see Sec. 3.2.2), against three baseline methods; matrix factorization with L 2 regularization (MF, see Eqs. (1)–(2)), the HFT model from [13] (see Sec. 3.3) and the RMR model from [12]. In the case of HFT, we report the performance both by evaluating the model ourselves7 and by reporting the results from [13] directly. For RMR, we only report the results from [12]. Hyper-parameters. Both user u and product i vectors in Eq. (1) are ﬁve dimensional for all the experiments in this section. This choice was made mainly to make the results comparable to the previously reported ones in [13] and [12]. We initialize all the user and product representations by sampling each element from a zero-mean Gaussian distri- bution with its standard deviation set to 0.01. The biases, µ, u and i are all initialized to 0. All the parameters in BoWLF and LMLF are initialized similarly except for the recurrent weights of the RNN-LM in LMLF which were initialized to be orthogonal. Training Procedure. When training MF, BoWLF and LMLF, we use minibatch RMSProp with the learning rate, momentum coe cient and the size of minibatch set to 0.01, 0.9 and 128, respectively. We trained each model at most 200 epochs, while monitoring the validation performance. For HFT, we follow [13] which uses the Expectation Maximization algorithm together with L-BFGS. In all cases, we early-stop each training run based on the validation set performance. In the preliminary experiments, we found the choice of ↵ in Eq. (6), which balances matrix factorization and review modeling, to be important. We searched for the ↵ that max- imizes the validation performance, in the range of [0.1, 0.01]. We used a CPU cluster of 16 nodes each with 8 cores and 8 16 GB of memory to run experiments on BoWLF, MF, 6 https://github.com/moses-smt/mosesdecoder/ 7 The code was kindly provided by the authors of [13]. grows, which is evident from Figs. 1 and 2. Figure 1: Scatterplot showing performance improvement over the number of samples. We see a performance improvement of BoWLF over HFT as dataset size increases. Figure 2: Scatterplot showing performance improvement over the number of samples. We see a modest performance improvement of LMLF over HFT as dataset size increases. 8 Due to the use of di↵erent splits, the results by HFT reported in [13] and RMR in [12] are not directly comparable. Dataset (a) (b) (c) (d) BoWLF improvement Dataset Size MF HFT BoWLF LMLF over (a) over (b) HFT* RMR** Arts 27K 1.434 (0.04) 1.425 (0.04) 1.413 (0.04) 1.426 (0.04) 2.15% 1.18% 1.388 1.371 Jewelry 58K 1.227 (0.04) 1.208 (0.03) 1.214 (0.03) 1.218 (0.03) 1.24% -0.59% 1.178 1.160 Watches 68K 1.511 (0.03) 1.468 (0.03) 1.466 (0.03) 1.473 (0.03) 4.52% 0.20% 1.486 1.458 Cell Phones 78K 2.133 (0.03) 2.082 (0.02) 2.076 (0.02) 2.077 (0.02) 5.76% 0.66% N/A 2.085 Musical Inst. 85K 1.426 (0.02) 1.382 (0.02) 1.375 (0.02) 1.388 (0.02) 5.12% 0.75% 1.396 1.374 Software 95K 2.241 (0.02) 2.194 (0.02) 2.174 (0.02) 2.203 (0.02) 6.70% 2.06% 2.197 2.173 Industrial 137K 0.360 (0.01) 0.354 (0.01) 0.352 (0.01) 0.356 (0.01) 0.76% 0.24% 0.357 0.362 O ce Products 138K 1.662 (0.02) 1.656 (0.02) 1.629 (0.02) 1.646 (0.02) 3.32% 2.72% 1.680 1.638 Gourmet Foods 154K 1.517 (0.02) 1.486 (0.02) 1.464 (0.02) 1.478 (0.02) 5.36% 2.22% 1.431 1.465 Automotive 188K 1.460 (0.01) 1.429 (0.01) 1.419 (0.01) 1.428 (0.01) 4.17% 1.03% 1.428 1.403 Kindle Store 160K 1.496 (0.01) 1.435 (0.01) 1.418 (0.01) 1.437 (0.01) 7.83% 1.76% N/A 1.412 Baby 184K 1.492 (0.01) 1.437 (0.01) 1.432 (0.01) 1.443 (0.01) 5.95% 0.48% 1.442 N/A Patio 206K 1.725 (0.01) 1.687 (0.01) 1.674 (0.01) 1.680 (0.01) 5.10% 1.24% N/A 1.669 Pet Supplies 217K 1.583 (0.01) 1.554 (0.01) 1.536 (0.01) 1.544 (0.01) 4.74% 1.78% 1.582 1.562 Beauty 252K 1.378 (0.01) 1.373 (0.01) 1.335 (0.01) 1.370 (0.01) 4.33% 3.82% 1.347 1.334 Shoes 389K 0.226 (0.00) 0.231 (0.00) 0.224 (0.00) 0.225 (0.00) 0.23% 0.72% 0.226 0.251 Tools & Home 409K 1.535 (0.01) 1.498 (0.01) 1.477 (0.01) 1.490 (0.01) 5.78% 2.15% 1.499 1.491 Health 428K 1.535 (0.01) 1.509 (0.01) 1.481 (0.01) 1.499 (0.01) 5.35% 2.82% 1.528 1.512 Toys & Games 435K 1.411 (0.01) 1.372 (0.01) 1.363 (0.01) 1.367 (0.01) 4.71% 0.89% 1.366 1.372 Video Games 463K 1.566 (0.01) 1.501 (0.01) 1.481 (0.01) 1.490 (0.01) 8.47% 2.00% 1.511 1.510 Sports 510K 1.144 (0.01) 1.137 (0.01) 1.115 (0.01) 1.127 (0.01) 2.94% 2.19% 1.136 1.129 Clothing 581K 0.339 (0.00) 0.343 (0.00) 0.333 (0.00) 0.344 (0.00) 0.60% 1.01% 0.327 0.336 Amazon Video 717K 1.317 (0.01) 1.239 (0.01) 1.184 (0.01) 1.206 (0.01) 13.33% 5.47% N/A 1.270 Home 991K 1.587 (0.00) 1.541 (0.00) 1.513 (0.00) 1.535 (0.01) 7.41% 2.79% 1.527 1.501 Electronics 1.2M 1.754 (0.00) 1.694 (0.00) 1.671 (0.00) 1.698 (0.00) 8.29% 2.30% 1.724 1.722 Music 6.3M 1.112 (0.00) 0.970 (0.00) 0.920 (0.00) 0.924 (0.00) 19.15% 4.94% 0.969 0.959 Movies & Tv 7.8M 1.379 (0.00) 1.089 (0.00) 0.999 (0.00) 1.022 (0.00) 37.95% 9.01% 1.119 1.120 Books 12.8M 1.272 (0.00) 1.141 (0.00) 1.080 (0.00) 1.110 (0.00) 19.21% 6.12% 1.135 1.113 All categories 35.3M 1.289 1.143 1.086 1.107 20.29% 5.64% Table 1: Prediction Mean Squared Error results on test data. Standard error of mean in parenthesis. Dimensionality of latent factors dim( i ) = 5 for all models. Best results for each dataset in bold. HFT* and RMR** represent original paper results over di↵erent data splits [13, 12]. Interestingly, BoWLF always outperforms LMLF. These results indicate that the complex language model, which the LMLF learns using an LSTM network, does not seem to improve over a simple bag-of-word representation, which the BoWLF learns, in terms of the learned product representations. This can be understood from how the product representa- ਤද͸ݪ࿦จΑΓҾ༻ ࠷΋γϯϓϧͳBoWLF͕ྑ͍݁Ռ → ݴޠϞσϧͷੑೳ͸ॏཁͰͳ͍Մೳੑ BoWLF͕HFTΑΓ্ LMLF͕HFTΑΓ্

Slide 1

Slide 1 text

Slide 2

Slide 2 text

Slide 3

Slide 3 text

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Slide 6

Slide 6 text