RecSys 2023 Paper Reading Group - Augmented Negative Sampling for Collaborative Filtering

© 2023 Wantedly, Inc. Augmented Negative Sampling for Collaborative Filtering
RecSys 2023 Paper Reading Group. Paper authors: Y. Zhao, R. Chen, R. Lai, Q. Han, H. Song, and L. Chen. Oct. 21, 2023 - Presenter: Yudai Hayashi

Self-introduction: Yudai Hayashi (林 悠大)
• Background:
◦ PhD from the Graduate School of Engineering, The University of Tokyo
◦ Joined Wantedly in 2022 as a new-graduate data scientist
• Twitter (X): @python_walker
• Hobbies:
◦ Reading
◦ Listening to music
◦ Whisky

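Short Summary
• Problem addressed:
◦ Negative sampling usually tries to pick samples close to the positives, but the remaining negatives should also carry information that is useful for training
◦ The goal is to make better use of the information in negatives
• Method:
◦ Augment negatives that are not close to the positives so that they move closer to the positives, taking in more information while preserving training efficiency
• Result:
◦ Improved performance of CF models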

Introduction: CF and negative sampling
Matrix Factorization (MF)
[Figure: #User x #Item interaction matrix = User Vector x Item Vector; label: negative example]
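
As a rough illustration of the MF picture on this slide, the sketch below scores every (user, item) pair by the inner product of a user vector and an item vector. The sizes, the random vectors, and the toy interaction matrix are assumptions made up for the example; they are not taken from the paper or the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 3  # toy sizes, chosen only for illustration

user_vecs = rng.normal(size=(n_users, dim))  # one row per user
item_vecs = rng.normal(size=(n_items, dim))  # one row per item

# Predicted score for every (user, item) pair: inner product of the two vectors.
scores = user_vecs @ item_vecs.T  # shape (n_users, n_items)

# Toy implicit feedback: 1 = observed interaction (positive), 0 = unobserved.
interactions = np.zeros((n_users, n_items), dtype=int)
interactions[0, [1, 4]] = 1

# Unobserved items form the pool from which negatives are sampled for user 0.
negative_pool = np.flatnonzero(interactions[0] == 0)
```

Negative sampling is then the question of which items from `negative_pool` to pair with each positive during training.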

Introduction: CF and negative sampling
[Figure: effect of negative sampling on CF performance, from K. Mao et al., CIKM '21]
Negative sampling has long been known to have a large effect on CF performance.

Introduction: negative sampling methods
                      RNS      DNS
Sampling strategy     Random   Select high-scoring negatives
Cost                  Low      High
Quality of negatives  Low      High
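
The two strategies in this comparison can be sketched as follows. This is a minimal sketch, assuming DNS is implemented by drawing a small random candidate set and keeping the candidate the current model scores highest; the function names, `n_candidates`, and the score values are hypothetical and only serve the illustration.

```python
import numpy as np

def sample_rns(negative_pool, rng):
    """RNS: pick one negative uniformly at random (no model call needed)."""
    return int(rng.choice(negative_pool))

def sample_dns(negative_pool, item_scores, rng, n_candidates=5):
    """DNS-style: draw random candidates, keep the one the model scores highest."""
    k = min(n_candidates, len(negative_pool))
    candidates = rng.choice(negative_pool, size=k, replace=False)
    return int(candidates[np.argmax(item_scores[candidates])])

rng = np.random.default_rng(0)
item_scores = np.array([0.1, 0.9, -0.3, 0.7, 0.2, -1.0])  # hypothetical scores for one user
negative_pool = np.array([0, 2, 3, 5])                     # items the user has not interacted with

rns_item = sample_rns(negative_pool, rng)
dns_item = sample_dns(negative_pool, item_scores, rng, n_candidates=4)
```

The extra cost of the DNS-style sampler comes from having to score the candidates with the model at every sampling step, which RNS avoids.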

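Introduction: negative sampling methods
Is it really true that a negative with a high model output score is a good negative?
                      RNS      DNS
Sampling strategy     Random   Select high-scoring negatives
Cost                  Low      High
Quality of negatives  Low      High?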

Motivation: problems with existing methods (1) Ambiguous trap
As training progresses, the score distribution of the negatives shifts toward lower scores, so sampling high-scoring negatives becomes harder and harder.

Motivation: problems with existing methods (2) Information discrimination: do low-scoring negatives carry no useful information?
Hx: number of interactions hit by model x
[Figure: PER(x, y), highlighting PER(DNS, RNS)]

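Motivation: problems with existing methods (2) Information discrimination: do low-scoring negatives carry no useful information?
A large amount of information cannot be learned without using low-scoring negatives.
Hx: number of interactions hit by model x
[Figure: PER(x, y), highlighting PER(RNS, DNS)]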
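
The formula for PER is not recoverable from the transcript, so the sketch below rests on an assumption: PER(x, y) is taken to be the fraction of interactions hit by model x that model y misses. The function name `per` and the hit sets are hypothetical and exist only to show how such a comparison between two samplers could be computed.

```python
def per(hits_x, hits_y):
    """Fraction of interactions hit by model x that model y misses (assumed definition)."""
    hits_x, hits_y = set(hits_x), set(hits_y)
    if not hits_x:
        return 0.0
    return len(hits_x - hits_y) / len(hits_x)

# Hypothetical (user, item) test interactions hit by each model.
hits_dns = {(0, 1), (0, 4), (1, 2), (2, 5)}
hits_rns = {(0, 1), (1, 2), (3, 0)}

per_dns_rns = per(hits_dns, hits_rns)  # hit by DNS, missed by RNS
per_rns_dns = per(hits_rns, hits_dns)  # hit by RNS, missed by DNS
```

Under this reading, a clearly non-zero PER(RNS, DNS) means that a model trained with random negatives hits interactions that the DNS-trained model misses, which is the point the slide makes about low-scoring negatives.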

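Evaluating the disentanglement (t-SNE)
• negative_hard (nh) and positive_hard (ph) lie close to each other
• Training on randomly sampled nh gives performance on par with HNS
The components close to the positives are separated out of the negatives successfully.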

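Discussions (Amazon-Baby dataset)
• With ANS, the phase in which Recall rises quickly lasts longer: good negatives keep being sampled (mitigating the ambiguous trap)
• Overlap between the negatives sampled by DNS and by ANS: ANS uses negatives evenly, including those with low model scores (mitigating information discrimination)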

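Conclusion
• Proposed ANS, a negative sampling method that makes comprehensive use of the information in negatives during training
• ANS generates data closer to the positives from negative samples and uses it for training
• Achieved a substantial performance improvement for CF models over existing negative sampling methods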

References
• Y. Zhao et al., Augmented Negative Sampling for Collaborative Filtering. In Seventeenth ACM Conference on Recommender Systems (RecSys '23), 2023.
• K. Mao et al., SimpleX: A Simple and Strong Baseline for Collaborative Filtering. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), 2021.

Method: ANS (Augmented Negative Sampling): component close to the positives

Method: ANS (Augmented Negative Sampling): move the easy part of a negative toward the positive

Method: ANS (Augmented Negative Sampling): we want negatives close to the positives; the information of samples that originally had low scores ...

Method: ANS (Augmented Negative Sampling): BPR-loss; for separating the vectors ...
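
The slide names BPR-loss, but the transcript is truncated, so how ANS combines it with the vector-separation term cannot be recovered here. As a reference point only, the sketch below shows the standard BPR loss for a positive score and a sampled negative score; the function name and the example scores are assumptions for illustration.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Mean BPR loss: -log(sigmoid(pos - neg)), computed in a numerically stable way."""
    margin = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == logaddexp(0, -m)
    return float(np.mean(np.logaddexp(0.0, -margin)))

loss_tied = bpr_loss([0.0], [0.0])        # positive and negative scored equally
loss_separated = bpr_loss([3.0], [-3.0])  # positive clearly above the negative
```

The loss shrinks as the positive is scored further above the negative, which is why a negative that the model already scores very low contributes little gradient.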

Results: large performance improvements of more than 10% on Top-10 metrics

Appendix: hyperparameter dependence