Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Counterfactual learning to rank: introduction
Search
Daiki Tanaka
May 02, 2020
Research
0
570
Counterfactual learning to rank: introduction
一般的なランキング学習からcounterfactual LTRへの導入
Daiki Tanaka
May 02, 2020
Tweet
Share
More Decks by Daiki Tanaka
See All by Daiki Tanaka
カーネル法概観
daikitanak
0
430
カーネル法:正定値カーネルの理論
daikitanak
0
63
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR STRUCTURED DATA
daikitanak
1
170
[Paper Reading] Attention is All You Need
daikitanak
0
110
Interpretability of Machine Learning : Paper reading (LIME)
daikitanak
0
110
[Paper reading] Local Outlier Detection With Interpretation
daikitanak
0
58
Other Decks in Research
See All in Research
言語処理学会30周年記念事業留学支援交流会@YANS2024:「学生のための短期留学」
a1da4
1
270
移動ビッグデータに基づく地理情報の埋め込みベクトル化
tam1110
0
160
尺度開発における質的研究アプローチ(自主企画シンポジウム7:認知行動療法における尺度開発のこれから)
litalicolab
0
360
メールからの名刺情報抽出におけるLLM活用 / Use of LLM in extracting business card information from e-mails
sansan_randd
2
270
CoRL2024サーベイ
rpc
1
1.1k
Weekly AI Agents News! 9月号 論文のアーカイブ
masatoto
1
150
[依頼講演] 適応的実験計画法に基づく効率的無線システム設計
k_sato
0
180
12
0325
0
200
The Relevance of UX for Conversion and Monetisation
itasohaakhib1
0
120
ダイナミックプライシング とその実例
skmr2348
3
480
[2024.08.30] Gemma-Ko, 오픈 언어모델에 한국어 입히기 @ 머신러닝부트캠프2024
beomi
0
810
Whoisの闇
hirachan
3
170
Featured
See All Featured
What's in a price? How to price your products and services
michaelherold
243
12k
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
VelocityConf: Rendering Performance Case Studies
addyosmani
326
24k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
29
2k
Fontdeck: Realign not Redesign
paulrobertlloyd
82
5.3k
Agile that works and the tools we love
rasmusluckow
328
21k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
6
520
Code Review Best Practice
trishagee
65
17k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
159
15k
How to train your dragon (web standard)
notwaldorf
88
5.7k
RailsConf 2023
tenderlove
29
940
Why Our Code Smells
bkeepers
PRO
335
57k
Transcript
Unbiased Learning to Rank May 7, 2020
Learning to rank ઃఆ Supervised LTR Pointwise loss Pairwise loss
Listtwise loss Counterfactual Learning to Rank Counterfactual Evaluation Inverse Propensity Scoring Propensity-weighted Learning to Rank 2
Learning to rank: ઃఆ ೖྗɿ จॻͷू߹ D ग़ྗɿ จॻͷॱҐ R
= (R1; R2; R3:::) ͨͩ͠ɺ֤จॻʹϞσϧ f„ ʹΑͬͯείΞ͕͍͍ͭͯͯ f„ (R1) – f„ (R2) – f„ (R3) ::: ͱͳ͍ͬͯΔɻ(ߴ͍είΞ͕͚ΒΕΔ΄ͲॱҐ͕ߴ͍) Learning to Rank (LTR) ͷత࠷దͳॱҐΛग़ྗ͢ΔϞσϧ f„ ͷύϥϝʔλ „ Λ σʔλ͔ΒٻΊΔ͜ͱɻ 3
Supervised LTR ڭࢣ͋Γ LTR Ͱɺ › ݕࡧΫΤϦ › จॻू߹ ›
ॱҐͷϥϕϧ ΛؚΉσʔληοτΛͬͯϞσϧύϥϝʔλΛٻΊΔɻ ڭࢣ͋Γ LTR Ͱ༻͍ΒΕΔଛࣦओʹ 3 ͭɿ › Pointwise loss › Pairwise loss › Listwise loss y (d) ʹΑͬͯɺจॻ d ͷݕࡧΫΤϦͷؔ࿈Λද͢ͱ͢Δɻ(େ͖͍΄ͲॱҐͷ্Ґʹ ͖ͯཉ͍͠) 4
Pointwise loss Pointwise loss ɺॱҐͷਪఆΛྨɾճؼͱͯ͠ղ͘ɻྫ͑ɺ௨ৗͷճؼଛࣦ (squared loss) ͱͯ͠ҎԼͷΑ͏ʹ༩͑Δɿ Lpointwise :=
1 N N X i=1 (f„ (di) ` y (di))2 Pointwise loss ͷɺϞσϧͷग़ྗΛॱҐͱͯ͠͏͜ͱΛߟྀʹೖΕ͍ͯͳ͍͜ ͱɻLTR Ͱग़ྗͱͯ͠ಘΒΕΔείΞΛฒͼସ͑ͯಘΒΕΔॱҐʹͷΈؔ৺͕͋Δɻ 5
Pairwise loss Pairwise loss Ͱɺ2 ͭͷจॻؒͷ૬ରతͳείΞͷେখΛߟྀʹ͍ΕΔɻྫ͑ɺҎԼ ͷΑ͏ͳ hinge-loss Λ༩͑Δʀ Lpairwise
:= X y(di)>y(dj) max (0; 1 ` (f„ (di) ` f„ (di))): ॱҐ͕૬ରతʹߴ͍จॻείΞ͕ߴ͘ɺॱҐ͕͍จॻείΞΛ͘͢Δؾ࣋ͪɻ Pairwise loss ͷɺશͯͷهࣄϖΞΛಉ༷ʹѻ͏͜ͱɻ࣮ͦͯ͠༻্ top100 ͱ top10 ޙऀͷํ͕ॏࢹ͞ΕΔ͜ͱɻPairwise loss Ͱ top100 ͷԼͷํͷॱҐΛվળ ͤ͞ΔͨΊʹ্ҐͷॱҐΛ٘ਜ਼ʹ͢Δ͜ͱ͕͋Γ͑ͯ͠·͏ɻ 6
Listwise loss Listwise loss ͰॱҐࢦඪΛ࠷దԽ͢Δɻ՝ɺॱҐࢦඪ͕ඍՄೳͰͳ͍͜ͱɻ ྫ͑ɺDCG ɿ DCG = N
X i=1 y (di) log2 (rank (di) + 1) Ͱ͋Δ͕ɺlog2 (rank (di) + 1) ඍෆՄೳͰ͋Δɻ ͦͷͨΊʹ֬తۙࣅΛ༻͍Δํ๏ (ListNetɺListMLE) ɺώϡʔϦεςΟοΫॱҐ ࢦඪͷόϯυΛ࠷దԽ͢Δख๏͕͋Δɻ(LambdaRankɺLambdaLoss) ྫ͑ɺ LambdaRank ͷଛࣦ DCG ͷόϯυͱͳ͍ͬͯΔɿ LLambdaRank := X y(di)>y(dj) log (1 + exp (f„ (dj) ` f„ (di))) j´DCGj 7
ҼՌධՁ తɿ৽͍͠ϥϯΩϯάؔ f„ ΛɺผͷϥϯΩϯάؔ fdeploy ͷԼͰूΊΒΕͨաڈ ͷσʔλ (ΫϦοΫσʔλͳͲ) ΛͬͯධՁ͍ͨ͠ɻ ҎԼͷ
2 ͭͷ߹ʹ͍ͭͯߟ͑Δɻ › શͯͷจॻʹ͍ͭͯਅͷؔ࿈ y (di) ͕طͰ͋Δ࣌ › y (di) Θ͔Βͳ͍͕ɺΫϦοΫใͳͲͷ҉తͳϑΟʔυόοΫͷΈར༻Մೳͳ࣌ 8
ҼՌධՁɿϥϕϧ͕طͳΒશʹධՁ͕Ͱ͖Δ શͯͷจॻʹ͍ͭͯਅͷϥϕϧ y (di) ͕طͰ͋Δ࣌ɺIR(ใݕࡧ) ࢦඪΛܭࢉͰ͖Δɿ ´ (f„; D; y)
= X di2D – (rank (di j f„; D)) ´ y (di) ͜͜Ͱɺ– ॱҐॏΈ͚ؔͰ͋ͬͯɺྫ͑ɿ APR: – (r) = r DCG: – (r) = 1 log2 (1+r) ͳͲ͕༻͍ΒΕΔɻ 9
ҼՌධՁ y (di) Θ͔Βͳ͍͕ɺΫϦοΫใͳͲͷ҉తͳϑΟʔυόοΫͷΈར༻Մೳͳ࣌ɿ › ͋Δจॻʹର͢ΔΫϦοΫɺͦͷจॻ͕ؔ࿈͍ͯ͠Δ͜ͱΛࣔ͢ɺόΠΞεɾϊΠζ ͖ͭͷࢦඪʹͳ͍ͬͯΔɻ › ΫϦοΫ͞Εͳ͔͔ͬͨΒͱ͍ͬͯͦͷจॻ͕ؔͳ͍Θ͚Ͱͳ͍ɻ(จॻ͕ؔͳ ͍ɾϢʔβ͕จॻΛ؍ଌ͍ͯ͠ͳ͍ɾϥϯμϜཁૉʹΑΔͷ)
ଟ͘ͷ؍ଌσʔλʹ͍ͭͯฏۉΛऔΕϊΠζআڈͰ͖Δͱߟ͑ΒΕΔ͕ɺόΠΞεআ ڈͰ͖ͳ͍ɻ 10
ҼՌධՁɿ؍ଌɾΫϦοΫϞσϧ Ϣʔβͷ؍ଌٴͼจॻͷؔ࿈ͷΈΛߟྀʹೖΕΔͱɺϢʔβͷΫϦοΫҎԼͷΑ͏ʹϞ σϦϯάͰ͖ͦ͏ɿ › ϥϯΩϯά R ʹ͓͍ͯจॻ di ͕؍ଌ͞ΕΔ (oi
= 1 Ͱද͢) ֬ɺ P (oi = 1 j R; di) (؍ଌ͞ΕΔ֬ؔ࿈ʹؔͳ͍ͱԾఆ͍ͯ͠Δɻ) › ؔ࿈ y (di) ͱ؍ଌ oi ͕༩͑ΒΕͨ࣌ͷɺจॻ di ͕ΫϦοΫ͞ΕΔ֬ (ci = 1 Ͱද͢) ɺ P (ci = 1 j oi; y (di)) › ΫϦοΫ؍ଌ͞Εͨจॻʹ͔͠ى͜Βͳ͍ͨΊɺϥϯΩϯά R ʹ͓͍ͯΫϦοΫ͞ ΕΔ֬ɿ P (ci = 1 ^ oi = 1 j y (di) ; R) = P (ci = 1 j oi = 1; y (di)) ´ P (oi = 1 j R; di) 11
ҼՌධՁɿ´ (f„; D; y) ͷφΠʔϒਪఆ ´ (f„; D; y) ΛφΠʔϒʹਪఆ͢ΔʹɺΫϦοΫͷใ
(ci) Λਅͷؔ࿈ϥϕϧ (y (di)) ͷΘΓʹ͑Αͯ͘ɺ ´NAIVE (f„; D; c) := X di2D – (rank (di j f„; D)) ´ ci ͱͳΔɻ ΫϦοΫʹϊΠζ͕͍ͬͯͳ͍࣌ɺͭ·Γ P (ci = 1 j oi = 1; y (di)) = y (di) Ͱ͋Δ࣌Ͱ͑͞ɺφΠʔϒਪఆ؍ଌόΠΞεΛड͚͍ͯΔɿ Eo ˆ´NAIVE (f„; D; c)˜ = Eo 2 4 X di2D – (rank (di j f„; D)) ´ ci 3 5 = Eo 2 6 4 X di:oi=1^y(di)=1 – (rank (di j f„; D)) 3 7 5 = X di:y(di)=1 P (oi = 1 j R; di)– (rank (di j f„; D)) = X di2D P (oi = 1 j R; di)– (rank (di j f„; D)) ´ y (di) 12
ҼՌධՁɿ´ (f„; D; y) ͷφΠʔϒਪఆ φΠʔϒਪఆɿ Eo ˆ´NAIVE (f„; D;
c)˜ = X di:y(di)=1 P (oi = 1 j R; di)– (rank (di j f„; D)) ͰɺͦΕͧΕͷจॻͷɺϩάऩू࣌ͷϥϯΩϯά R Ͱͷ؍ଌ֬ͰॏΈͨ͠ਪఆʹͳͬ ͯ͠·͏ɻ ϥϯΩϯάͰɺߴॱҐͷจॻ΄Ͳ؍ଌ͞Ε͍͢ɿ͜ΕΛ position bias ͱݺͿɻϩάऩ ूͷࡍʹߴॱҐʹදࣔ͞Εͨจॻਅͷؔ࿈ΑΓؔ࿈͕͋ΔɺͱόΠΞεΛड͚ͯ͠· ͏ɻ όΠΞεΛআڈ͢ΔͨΊʹɺP (oi = 1 j R; di) Λਪఆ͠ɺิਖ਼ͯ͋͛͠Εྑͦ͞͏ ! είΞʹΑΔόΠΞεআڈ 13
είΞΛ༻͍ͨόΠΞεআڈ Inverse Propensity Scoring(IPS) ʹΑͬͯόΠΞεΛআڈ͢Δɿ ´IPS (f„; D; c) :=
X di2D – (rank (di j f„; D)) P (oi = 1 j R; di) ´ ci ͜͜ͰɺP (oi = 1 j R; di) ϩάऩूதʹදࣔ͞ΕͨϥϯΩϯά R Ͱจॻ di ͕؍ଌ͞ ΕΔ֬Ͱ͋Δɻ´IPS (f„; D; c) ΫϦοΫϊΠζ͕ͳ͍߹ɺͭ·Γ P (ci = 1 j oi = 1; y (di)) = y (di) Ͱ͋Δ࣌ʹ ´ (f„; D; y) ͷෆภਪఆྔͰ͋Δɿ Eo ˆ´IPS (f„; D; c)˜ = Eo 2 4 X di2D – (rank (di j f„; D)) P (oi = 1 j R; di) ´ ci 3 5 = Eo 2 6 4 X di:oi=1^y(di)=1 – (rank (di j f„; D)) P (oi = 1 j R; di) 3 7 5 = X di:y(di)=1 P (oi = 1 j R; di) ´ – (rank (di j f„; D)) P (oi = 1 j R; di) = X di2D – (rank (di j f„; D)) ´ y (di) = ´ (f„; D; y) : 14
Propensity-weighted LTR IPS ´ (f„; D; y) ͷෆภਪఆͰ͋ͬͨɻΑͬͯɺ࠷దͳϞσϧύϥϝʔλ „
IPS Λ ࠷దԽ͢Δ͜ͱͰٻΊΔ͜ͱ͕Ͱ͖ΔɻIPS Λ࠷దԽ͢ΔࡍɺϥϯΩϯάࢦඪ – (r) ͷඍ ෆՄೳੑʹରॲ͢ΔͨΊɺ– (r) ͷ bound Λར༻͢Δɻ Propensity-weighted LTR ͷྲྀΕɿ › ΫϦοΫͷείΞΛਪఆɿ P (oi = 1 j R; di) › ෆภਪఆྔ ´IPS (f„; D; c) ͷ bound ʹ͍ͭͯඍΛܭࢉɿ „0 = r„ "– (rank (di j f„; D)) P (oi = 1 j R; di) # › ϞσϧύϥϝʔλΛߋ৽ „new „old ` „0 15
References › https://ilps.github.io/webconf2020-tutorial-unbiased-ltr/ 16