Introduction to the REVEAL Workshop

REVEAL Workshop
Yuta Saito (齋藤優太)

Outline
• Workshop overview
• Metrics, Engagement, and Recommenders (Invited Talk)
• Marginal Posterior Sampling for Slate Bandit (Oral Presentation)
• A brief look at two papers from the poster session (Poster Presentation)
• RecoGym Challenge (Competition)
• A similar workshop (NeurIPS'19)

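How is RecSys scheduled and structured?
• Main Conference: September 16 to September 18
  ◦ Oral and poster presentations of long and short papers
  ◦ Industry sessions and panel discussions
• Tutorials: morning of September 19
  ◦ Topics such as bandits and graphs in recommendation
• Workshops: afternoon of September 19 through September 20
  ◦ Half-day workshops on the afternoon of September 19
  ◦ Mostly full-day workshops on September 20
  ◦ REVEAL, covered here, was a full-day workshop on the final day of the conference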

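REVEAL Workshop: Overview
• A workshop focused on identifying the biases latent in recommender systems, on methods for removing them, and on the connections to bandits and reinforcement learning
• An all-star lineup of organizers and speakers
• The second edition, following the first in 2018
• 4 invited talks + 7 oral presentations + 23 poster presentations
• From CA, the ADEcon Team presented two posters (my own, and joint work with Prof. Narita of Yale University)
• In addition, 5 posters from Criteo and 2 from Google and Netflix. From Japan, Imai-san of Fujitsu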

Metrics, Engagement and Recommenders (Invited Talk): Overview
• Discusses which metrics are suitable for measuring online user engagement (is the click really good enough?)
• For example, using Spotify case studies: optimizing for the time spent on a page or in an app (dwell time) rather than for clicks can end up improving CTR
• Related main-conference paper: defines labels from post-click behavior and trains on them (Leveraging Post-click Feedback for Content Recommendations)

Metrics, Engagement and Recommenders (Invited Talk)
Deriving User- and Content-specific Rewards for Contextual Bandit (WWW'19)
• How should the reward be defined for Spotify's playlist recommendation?
• The baseline is a bandit policy whose reward is binarized with a fixed threshold

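Metrics, Engagement and Recommenders (Invited Talk)
• However, the distribution of streaming time varies widely with the characteristics of the user and the playlist
[Figure: streaming-time distributions. Sleep playlists (green) have long stream times; jazz listeners (green) have long stream times.]
Deriving User- and Content-specific Rewards for Contextual Bandit (WWW'19)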

Metrics, Engagement and Recommenders (Invited Talk)
• Group users and playlists by co-clustering, and use a different reward definition for each group
Deriving User- and Content-specific Rewards for Contextual Bandit (WWW'19)
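The idea of a group-specific reward definition can be illustrated with a minimal sketch. It assumes the co-clustering step has already assigned every (user, playlist) pair to a cluster, and it uses the per-cluster median stream time as the binarization threshold. The median rule, the cluster names, and the stream times are all hypothetical choices for illustration, not the rule used in the paper.

```python
from statistics import median


def fit_thresholds(stream_times_by_cluster):
    """Return one binarization threshold per cluster (assumption: the median)."""
    return {c: median(ts) for c, ts in stream_times_by_cluster.items()}


def binarize(stream_time, cluster, thresholds):
    """Reward is 1 if the stream time exceeds the cluster's own threshold, else 0."""
    return 1 if stream_time > thresholds[cluster] else 0


# Hypothetical stream times in seconds, keyed by (user group, playlist group).
logs = {
    ("jazz", "sleep"): [1200, 1500, 1800, 2400, 3000],
    ("pop", "workout"): [30, 60, 90, 120, 600],
}

thresholds = fit_thresholds(logs)

# A single global threshold, as in the fixed-threshold baseline.
global_threshold = median(t for ts in logs.values() for t in ts)

# 600 seconds is a long session for the pop/workout cluster,
# yet it falls below the global threshold.
reward_adaptive = binarize(600, ("pop", "workout"), thresholds)
reward_global = 1 if 600 > global_threshold else 0
```

With these toy values the cluster-specific rule counts the 600-second session as a success while the global rule does not, which is the kind of mismatch a single threshold produces when stream-time distributions differ across groups.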

Metrics, Engagement and Recommenders (Invited Talk)
• Introducing adaptive reward definitions and co-clustering improves the stream rate
[Figure panels: effect of the reward definition; effect of clustering]
Deriving User- and Content-specific Rewards for Contextual Bandit (WWW'19)

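Marginal Posterior Sampling for Slate Bandit (Oral Presentation): Overview
• Proposes a new bandit algorithm for the slate setting, where a single reward is given for a combination of multiple arms
• A problem setting that seems to have more applications than the ordinary bandit
• The proposed method improves on existing baselines in both accuracy and the time needed for decision making
• The long version has been accepted as a full paper at IJCAI'19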

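Marginal Posterior Sampling for Slate Bandit (Oral Presentation): Background
• Which combination of slot assignments (slate) maximizes the click probability?
• There are as many actions as there are combinations, which makes efficient learning difficult
• Existing methods learn inefficiently, cannot handle cold-start, or place strong assumptions on the reward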

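Marginal Posterior Sampling for Slate Bandit (Oral Presentation): Existing Method 1: K-armed Bernoulli Bandit
• The simplest approach: treat each slate as a single action and apply Thompson Sampling
• It cannot use trial information from other slates that share some of the same choices, so learning is slow
• One parameter must be sampled per slate, so arm selection is slow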
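The slate-as-arm baseline can be sketched as follows. This is a minimal Beta-Bernoulli Thompson Sampling loop under the assumption that every slot offers the same number of candidate actions; the class and function names are made up for this illustration.

```python
import itertools
import random


def make_slates(num_slots, num_actions):
    """Enumerate every slate: one action index per slot."""
    return list(itertools.product(range(num_actions), repeat=num_slots))


class SlateAsArmTS:
    """Thompson Sampling that treats each whole slate as one independent arm."""

    def __init__(self, slates, seed=0):
        self.slates = slates
        # Beta(1, 1) prior on the click probability of every slate.
        self.alpha = {s: 1.0 for s in slates}
        self.beta = {s: 1.0 for s in slates}
        self.rng = random.Random(seed)

    def select(self):
        # One posterior sample per slate: the cost grows with the number of slates.
        samples = {
            s: self.rng.betavariate(self.alpha[s], self.beta[s])
            for s in self.slates
        }
        return max(samples, key=samples.get)

    def update(self, slate, reward):
        # Only the chosen slate learns; slates sharing some slots learn nothing.
        self.alpha[slate] += reward
        self.beta[slate] += 1 - reward


slates = make_slates(num_slots=3, num_actions=4)  # 4 ** 3 = 64 arms
agent = SlateAsArmTS(slates)
chosen = agent.select()
agent.update(chosen, reward=1)
```

Both weaknesses listed on the slide are visible here: `select` draws one sample per slate, and `update` touches only the posterior of the slate that was shown.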

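Marginal Posterior Sampling for Slate Bandit (Oral Presentation): Existing Method 2: Generalized Linear Bandit
• A bandit model based on the assumption that each slot contributes linearly to the reward
• Trial information from other slates that share some of the same choices is incorporated through learning the model parameters
• A linear reward model is a strong assumption
• Parameters must be sampled as many times as there are slates, so arm selection is slow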

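Marginal Posterior Sampling for Slate Bandit (Oral Presentation): Proposed Method: Marginal Posterior Sampling
• A Thompson Sampling-based algorithm that resolves the issues above
• Selects arms per slot rather than per slate
• Relies on a mild assumption about the reward-generating process
• Makes good use of information from other slates
• Needs fewer parameter samples, so arm selection is also faster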
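To make the per-slot idea concrete, here is a simplified sketch, not the paper's exact algorithm. It assumes each (slot, action) pair keeps its own Beta posterior and that the slate-level reward is credited to every slot-level action in the shown slate. The class name and the update rule are assumptions made for illustration.

```python
import random


class PerSlotTS:
    """Thompson Sampling that picks one action per slot from slot-level posteriors."""

    def __init__(self, num_slots, num_actions, seed=0):
        self.num_slots = num_slots
        self.num_actions = num_actions
        # Beta(1, 1) prior for every (slot, action) pair.
        self.alpha = [[1.0] * num_actions for _ in range(num_slots)]
        self.beta = [[1.0] * num_actions for _ in range(num_slots)]
        self.rng = random.Random(seed)

    def select(self):
        slate = []
        for slot in range(self.num_slots):
            samples = [
                self.rng.betavariate(a, b)
                for a, b in zip(self.alpha[slot], self.beta[slot])
            ]
            # Best sampled action for this slot, independent of the other slots.
            slate.append(max(range(self.num_actions), key=samples.__getitem__))
        return tuple(slate)

    def update(self, slate, reward):
        # Assumption: the single slate-level reward updates every chosen slot-action.
        for slot, action in enumerate(slate):
            self.alpha[slot][action] += reward
            self.beta[slot][action] += 1 - reward


agent = PerSlotTS(num_slots=3, num_actions=4)
slate = agent.select()
agent.update(slate, reward=1)

samples_per_round_per_slot = 3 * 4   # slots * actions
samples_per_round_per_slate = 4 ** 3  # one per slate in the slate-as-arm baseline
```

In this sketch a round costs 12 posterior samples instead of 64, and the gap widens quickly as slots or actions are added. Any two slates that share a slot-level action also share the evidence collected for it.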

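Marginal Posterior Sampling for Slate Bandit (Oral Presentation): Proposed Method: Marginal Posterior Sampling
• In experiments on synthetic data that mimic the slate bandit setting, it outperforms the baselines in both cumulative reward and arm-selection time
• The proposed method is especially strong when the number of choices is large
• 10-70x speedup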

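How Sensitive is Recommendation System's Offline Evaluation to Popularity? (Poster): Overview
• Evaluates how item popularity affects the offline evaluation of recommenders
• Points out that the performance ranking changes when items are stratified by popularity
  ◦ When all items are used, pairwise algorithms such as BPR perform well
  ◦ When the test data is restricted to rare items, pointwise MF can come out stronger
• They presented right next to my poster and drew away a good share of my audience...
[Figure: differences in popularity]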

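How Sensitive is Recommendation System's Offline Evaluation to Popularity? (Poster)
• Over all items, pairwise methods have the higher recommendation accuracy (as is generally believed)
• However, as the test set is narrowed to rare items, MF gradually takes the lead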
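A popularity-stratified evaluation of this kind could look like the sketch below. It assumes popularity is measured by training-set frequency, that items are split into equal-size buckets, and that hit rate is the metric. The bucketing rule, the metric, and all data are hypothetical and not taken from the poster.

```python
from collections import Counter


def popularity_buckets(train_items, num_buckets=2):
    """Map each item to a bucket by training frequency (0 = most popular)."""
    counts = Counter(train_items)
    ranked = [item for item, _ in counts.most_common()]
    size = -(-len(ranked) // num_buckets)  # ceiling division
    return {item: i // size for i, item in enumerate(ranked)}


def hit_rate_by_bucket(test_pairs, recommendations, buckets):
    """Hit rate computed separately for each popularity bucket.

    Assumes every test item also appears in the training data.
    """
    hits, totals = Counter(), Counter()
    for user, item in test_pairs:
        b = buckets[item]
        totals[b] += 1
        hits[b] += item in recommendations.get(user, ())
    return {b: hits[b] / totals[b] for b in totals}


# Toy data: items "a" and "b" are popular, "c" and "d" are rare.
train_items = ["a"] * 5 + ["b"] * 4 + ["c"] * 2 + ["d"]
test_pairs = [("u1", "a"), ("u2", "b"), ("u3", "c"), ("u4", "d")]
recommendations = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["c"], "u4": ["a"]}

buckets = popularity_buckets(train_items)
per_bucket = hit_rate_by_bucket(test_pairs, recommendations, buckets)
```

Running the same function once per algorithm and comparing the per-bucket numbers is one way to check whether a ranking over all items also holds on rare items.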

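Counterfactual Cross-Validation (Poster): Overview
• Proposes a new evaluation metric for models that predict causal effects
• The causal effect is of course unobservable in validation data as well, so estimating the true performance is hard
• So instead of estimating the performance itself, the method aims for the weaker condition of preserving the ranking of performance
Causal effect of an action: e.g., the difference in conversion rate between recommending and not recommending
Predictive performance of a given function (MSE against the causal effect): not observed even in validation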
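The two quantities named above can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the slides: X denotes features, Y(1) and Y(0) the potential outcomes with and without the action, and an estimated metric is written with a hat.

```latex
% Causal effect of the action and the true (unobservable) performance of a predictor
\tau(x) = \mathbb{E}\left[ Y(1) - Y(0) \mid X = x \right]
\qquad
\mathcal{R}(\hat{\tau}) = \mathbb{E}\left[ \left( \tau(X) - \hat{\tau}(X) \right)^2 \right]

% Rank preservation for an observable metric and any two candidate predictors
\mathcal{R}(\hat{\tau}_1) \le \mathcal{R}(\hat{\tau}_2)
\;\Longrightarrow\;
\hat{\mathcal{R}}(\hat{\tau}_1) \le \hat{\mathcal{R}}(\hat{\tau}_2)
```

The second condition is weaker than requiring the estimated metric to equal the true one: any order-preserving distortion of the true performance still selects the same best model.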

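Counterfactual Cross-Validation (Poster)
Experiment 1: Model selection
• Existing evaluation metrics all try to solve the problem of accurately estimating predictive performance from observable variables
• However, model selection and hyperparameter tuning only need the ranking of performance, so the paper proposes a metric specialized for preserving that ranking, with theoretical analysis
• Using only observable information, it assigns scores with a rank correlation of 0.93 with the true performance
Experiment 2: Hyperparameter tuning
• Setting it as the objective in Optuna found better hyperparameter settings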
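Rank correlation between a metric and the true performance can be computed as in the sketch below. It implements the Spearman coefficient under the assumption that there are no ties. The numbers are made-up toy values for four hypothetical candidate models, not results from the paper.

```python
def ranks(values):
    """Rank of each value (0 = smallest), assuming no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy values: true MSE (unobservable in practice) and an observable metric.
true_mse = [0.10, 0.25, 0.40, 0.55]
metric = [1.2, 1.9, 1.7, 3.0]

rho = spearman(true_mse, metric)

# Model selection only needs the argmin to agree.
best_by_metric = min(range(len(metric)), key=metric.__getitem__)
best_by_truth = min(range(len(true_mse)), key=true_mse.__getitem__)
```

The metric values here are far from the true MSE in magnitude, yet the best model is still identified, which is why a rank-preserving metric is enough for model selection and hyperparameter tuning.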

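RecoGym Challenge (Competition): Overview
• The competition runs from 10/01 to 11/30, with a first prize of 3,000 euros (about 350,000 yen)
• Uses RecoGym, implemented by Criteo
• Participants decide which items to recommend with reinforcement learning and compete on CTR
• The provided data was collected by some policy and is therefore biased; how to remove that bias is (presumably) the key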
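One standard way to correct for the bias of a logging policy is inverse propensity scoring (IPS). The sketch below is a generic illustration, not part of RecoGym and not necessarily what the competition expects. It assumes each log record carries the probability with which the logging policy chose the shown action; the record layout and function names are hypothetical.

```python
def ips_ctr(logs, target_policy):
    """IPS estimate of the CTR that target_policy would obtain.

    logs: list of (context, action, click, logging_propensity) tuples.
    target_policy(context, action): probability that the new policy picks action.
    """
    total = 0.0
    for context, action, click, propensity in logs:
        weight = target_policy(context, action) / propensity
        total += weight * click
    return total / len(logs)


def always_item_0(context, action):
    """Deterministic target policy that always recommends item 0."""
    return 1.0 if action == 0 else 0.0


# Toy logs from a uniform-random logging policy over two items (propensity 0.5).
logs = [
    ("u", 0, 1, 0.5),
    ("u", 0, 0, 0.5),
    ("u", 1, 0, 0.5),
    ("u", 1, 0, 0.5),
]

naive_ctr = sum(click for _, _, click, _ in logs) / len(logs)
estimated_ctr = ips_ctr(logs, always_item_0)
```

The naive average over the logs mixes both items, while the reweighted estimate recovers the click rate of item 0 alone. IPS is unbiased only when the logging propensities are known and nonzero for every action the target policy can take.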

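A Similar Workshop
Causal Machine Learning Workshop @NeurIPS'19
• Topics on causal inference and bandits
• Held at NeurIPS for three years in a row, since 2017
• My impression is that many of the accepted papers each year are short versions of work that appears as a full paper at ICML or NeurIPS a year later
• Somewhat more theory-oriented than REVEAL
• Superb invited speakers (this year including Susan Athey)
• I will be presenting here as well!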

Thank you for your attention

Reference
• REVEAL Workshop 2019: https://sites.google.com/view/reveal2019/home
• RecoGym Challenge: https://sites.google.com/view/recogymchallenge/home
• Metrics, Engagement & "Recommenders". Mounia Lalmas. https://www.slideshare.net/mounialalmas/engagement-metrics-and-recommenders
• Marginal Posterior Sampling for the Slate Bandits. Maria Dimakopoulou, Nikos Vlassis, and Tony Jebara. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019.
• Deriving User- and Content-specific Rewards for Contextual Bandits. Paolo Dragone, Rishabh Mehrotra, and Mounia Lalmas. In Proceedings of the International World Wide Web Conference (WWW), 2019.
• How Sensitive is Recommendation System's Offline Evaluation to Popularity? Amir H Jadidinejad, Craig Macdonald, and Iadh Ounis. ACM RecSys Workshop on Reinforcement and Robust Estimators for Recommendation (REVEAL), 2019.
• Counterfactual Cross-Validation. Yuta Saito and Shota Yasui. ACM RecSys Workshop on Reinforcement and Robust Estimators for Recommendation (REVEAL), 2019.
