Counterfactual learning to rank: introduction
Daiki Tanaka
May 02, 2020
Research
An introduction to counterfactual LTR, starting from general learning to rank.
Transcript
Unbiased Learning to Rank May 7, 2020
Learning to rank: problem setting
Supervised LTR
Pointwise loss
Pairwise loss
Listwise loss
Counterfactual Learning to Rank
Counterfactual Evaluation
Inverse Propensity Scoring
Propensity-weighted Learning to Rank
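Learning to rank: problem setting
Input: a set of documents $D$.
Output: a ranking of the documents $R = (R_1, R_2, R_3, \dots)$, where each document is assigned a score by a model $f_\theta$ such that $f_\theta(R_1) \geq f_\theta(R_2) \geq f_\theta(R_3) \geq \dots$ (the higher the score, the higher the rank).
The goal of Learning to Rank (LTR) is to find, from data, the parameters $\theta$ of the model $f_\theta$ that outputs the optimal ranking.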
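As a small illustration of the setting above, the following sketch turns a list of model scores into a ranking. The function name `rank_documents` and the example scores are hypothetical, chosen only for illustration; ties are broken by document index, which is an assumption the slide does not specify.

```python
def rank_documents(scores):
    """Sort document indices by descending score and return (order, ranks).

    order[k] is the index of the document placed at position k + 1.
    ranks[i] is the 1-based rank of document i.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, doc in enumerate(order, start=1):
        ranks[doc] = position
    return order, ranks


# Hypothetical scores f_theta(d) for three documents.
scores = [0.2, 1.5, 0.7]
order, ranks = rank_documents(scores)
print(order)  # [1, 2, 0]
print(ranks)  # [3, 1, 2]
```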
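Supervised LTR
In supervised LTR, the model parameters are learned from a dataset that contains:
› search queries
› document sets
› ranking labels
Three kinds of loss are mainly used in supervised LTR:
› Pointwise loss
› Pairwise loss
› Listwise loss
Let $y(d)$ denote the relevance of document $d$ to the search query (the larger it is, the higher in the ranking the document should appear).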
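Pointwise loss
A pointwise loss treats rank estimation as a classification or regression problem. For example, it can be given as the usual regression loss (squared loss):
$$L_{\mathrm{pointwise}} := \frac{1}{N} \sum_{i=1}^{N} \left( f_\theta(d_i) - y(d_i) \right)^2$$
The problem with a pointwise loss is that it does not take into account that the model output is used as a ranking. In LTR, we only care about the ranking obtained by sorting the output scores.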
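The squared loss above can be sketched in a few lines. The example values are hypothetical and are chosen to show the weakness the slide describes: a score vector that ranks the documents incorrectly can still have a lower pointwise loss than one that ranks them correctly.

```python
def pointwise_squared_loss(scores, labels):
    """Mean squared error between model scores f_theta(d_i) and labels y(d_i)."""
    n = len(scores)
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / n


labels = [1, 0]                # document 0 is relevant, document 1 is not
wrong_order = [0.45, 0.55]     # ranks the irrelevant document first
right_order = [3.0, 2.0]       # ranks the relevant document first

# The incorrectly ordered scores get the smaller loss.
print(pointwise_squared_loss(wrong_order, labels))
print(pointwise_squared_loss(right_order, labels))
```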
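Pairwise loss
A pairwise loss takes into account the relative order of the scores of two documents. For example, the following hinge loss can be used:
$$L_{\mathrm{pairwise}} := \sum_{y(d_i) > y(d_j)} \max\left(0,\; 1 - \left( f_\theta(d_i) - f_\theta(d_j) \right)\right)$$
The intuition is to give higher scores to documents that should rank relatively higher and lower scores to documents that should rank lower.
The problem with a pairwise loss is that it treats all document pairs equally. In practice, the top 10 matters more than the top 100, yet a pairwise loss may sacrifice the ordering at the top in order to improve the ordering near the bottom of the top 100.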
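A minimal sketch of the pairwise hinge loss, summing over all pairs where document i is more relevant than document j. The example scores and labels are hypothetical.

```python
def pairwise_hinge_loss(scores, labels):
    """Sum of max(0, 1 - (f(d_i) - f(d_j))) over pairs with y(d_i) > y(d_j)."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += max(0.0, 1.0 - (scores[i] - scores[j]))
    return loss


labels = [1, 0, 0]
scores = [2.0, 0.5, 1.5]
# Pair (0, 1) has margin 1.5 and contributes 0; pair (0, 2) has margin 0.5
# and contributes 0.5.
print(pairwise_hinge_loss(scores, labels))  # 0.5
```

Every violating pair contributes in the same way regardless of where the two documents sit in the ranking, which is the limitation noted above.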
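Listwise loss
A listwise loss optimizes a ranking metric. The difficulty is that ranking metrics are not differentiable. For example, DCG is
$$\mathrm{DCG} = \sum_{i=1}^{N} \frac{y(d_i)}{\log_2\left(\mathrm{rank}(d_i) + 1\right)},$$
but $\log_2(\mathrm{rank}(d_i) + 1)$ is not differentiable with respect to the model parameters.
For this reason, there are methods that use a probabilistic approximation (ListNet, ListMLE) and methods that optimize a heuristic bound on the ranking metric (LambdaRank, LambdaLoss). For example, the LambdaRank loss is a bound on DCG:
$$L_{\mathrm{LambdaRank}} := \sum_{y(d_i) > y(d_j)} \log\left(1 + \exp\left(f_\theta(d_j) - f_\theta(d_i)\right)\right) \left|\Delta \mathrm{DCG}\right|$$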
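The two formulas above can be sketched as follows. The slide does not spell out how |ΔDCG| is computed; this sketch assumes the common choice of the absolute change in DCG when documents i and j swap positions, which is an assumption rather than something stated in the deck. Function names and example values are hypothetical.

```python
import math


def ranks_from_scores(scores):
    """1-based rank of each document when sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, doc in enumerate(order, start=1):
        ranks[doc] = position
    return ranks


def dcg(scores, labels):
    """DCG = sum_i y(d_i) / log2(rank(d_i) + 1)."""
    ranks = ranks_from_scores(scores)
    return sum(y / math.log2(r + 1) for y, r in zip(labels, ranks))


def lambdarank_loss(scores, labels):
    """Pairwise logistic loss weighted by |delta DCG| (assumed: swap of i and j)."""
    ranks = ranks_from_scores(scores)
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                discount_i = 1.0 / math.log2(ranks[i] + 1)
                discount_j = 1.0 / math.log2(ranks[j] + 1)
                delta_dcg = abs((labels[i] - labels[j]) * (discount_i - discount_j))
                loss += math.log(1.0 + math.exp(scores[j] - scores[i])) * delta_dcg
    return loss


labels = [1, 0, 0]
print(dcg([2.0, 0.5, 1.5], labels))  # relevant document at rank 1
print(dcg([0.5, 2.0, 1.5], labels))  # relevant document at rank 3
print(lambdarank_loss([2.0, 0.5, 1.5], labels))
print(lambdarank_loss([0.5, 2.0, 1.5], labels))
```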
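Counterfactual evaluation
Goal: evaluate a new ranking function $f_\theta$ using historical data (such as click data) collected under a different ranking function $f_{\mathrm{deploy}}$.
We consider the following two cases:
› the true relevance $y(d_i)$ is known for all documents
› $y(d_i)$ is unknown, and only implicit feedback such as click information is available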
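Counterfactual evaluation: if the labels are known, exact evaluation is possible
When the true label $y(d_i)$ is known for all documents, an IR (information retrieval) metric can be computed:
$$\Delta(f_\theta, D, y) = \sum_{d_i \in D} \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right) \cdot y(d_i)$$
Here $\lambda$ is a rank-weighting function. Commonly used examples are:
› ARP: $\lambda(r) = r$
› DCG: $\lambda(r) = \frac{1}{\log_2(1 + r)}$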
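The metric above is a weighted sum of relevance labels, with the weight determined by each document's rank. A minimal sketch with the two weighting functions listed on the slide follows; the function names and example values are hypothetical.

```python
import math


def arp_weight(rank):
    """Rank weight lambda(r) = r."""
    return rank


def dcg_weight(rank):
    """Rank weight lambda(r) = 1 / log2(1 + r)."""
    return 1.0 / math.log2(1 + rank)


def full_information_metric(ranks, labels, weight):
    """Delta(f, D, y) = sum_i lambda(rank(d_i)) * y(d_i)."""
    return sum(weight(r) * y for r, y in zip(ranks, labels))


ranks = [1, 2, 3]    # rank of each document under the ranker being evaluated
labels = [1, 0, 1]   # true relevance y(d_i)
print(full_information_metric(ranks, labels, arp_weight))  # 4
print(full_information_metric(ranks, labels, dcg_weight))  # 1.5
```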
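Counterfactual evaluation
When $y(d_i)$ is unknown and only implicit feedback such as click information is available:
› A click on a document is a biased and noisy indicator that the document is relevant.
› A document that was not clicked is not necessarily irrelevant. (Possible reasons: the document is irrelevant, the user did not observe the document, or random factors.)
Averaging over many observations can be expected to remove the noise, but it cannot remove the bias.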
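Counterfactual evaluation: observation and click model
If we consider only user observation and document relevance, user clicks can be modeled as follows:
› The probability that document $d_i$ is observed in ranking $R$ (denoted $o_i = 1$) is $P(o_i = 1 \mid R, d_i)$. (We assume that the observation probability does not depend on relevance.)
› The probability that document $d_i$ is clicked (denoted $c_i = 1$), given the relevance $y(d_i)$ and the observation $o_i$, is $P(c_i = 1 \mid o_i, y(d_i))$.
› Since clicks occur only on observed documents, the probability of a click in ranking $R$ is
$$P(c_i = 1 \wedge o_i = 1 \mid y(d_i), R) = P(c_i = 1 \mid o_i = 1, y(d_i)) \cdot P(o_i = 1 \mid R, d_i)$$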
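The factorization above suggests a two-step way to simulate a click: first decide whether the document is observed, then decide whether an observed document is clicked. The sketch below follows that reading; the function names and probabilities are hypothetical.

```python
import random


def click_probability(p_observe, p_click_given_observed):
    """P(c=1 and o=1 | y, R) = P(c=1 | o=1, y) * P(o=1 | R, d)."""
    return p_click_given_observed * p_observe


def sample_click(p_observe, p_click_given_observed, rng):
    """Draw one click: a document can only be clicked if it was observed."""
    observed = rng.random() < p_observe
    if not observed:
        return 0
    return 1 if rng.random() < p_click_given_observed else 0


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 10000
clicks = sum(sample_click(0.5, 0.8, rng) for _ in range(n))
print(click_probability(0.5, 0.8))  # model click probability
print(clicks / n)                   # empirical click rate, close to the value above
```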
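Counterfactual evaluation: naive estimation of $\Delta(f_\theta, D, y)$
To estimate $\Delta(f_\theta, D, y)$ naively, we can use the click information $c_i$ in place of the true relevance labels $y(d_i)$, which gives
$$\Delta_{\mathrm{NAIVE}}(f_\theta, D, c) := \sum_{d_i \in D} \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right) \cdot c_i.$$
Even when the clicks contain no noise, that is, when $P(c_i = 1 \mid o_i = 1, y(d_i)) = y(d_i)$, the naive estimator is affected by observation bias:
$$\begin{aligned}
\mathbb{E}_o\left[\Delta_{\mathrm{NAIVE}}(f_\theta, D, c)\right]
&= \mathbb{E}_o\left[\sum_{d_i \in D} \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right) \cdot c_i\right] \\
&= \mathbb{E}_o\left[\sum_{d_i : o_i = 1 \wedge y(d_i) = 1} \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right)\right] \\
&= \sum_{d_i : y(d_i) = 1} P(o_i = 1 \mid R, d_i)\, \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right) \\
&= \sum_{d_i \in D} P(o_i = 1 \mid R, d_i)\, \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right) \cdot y(d_i)
\end{aligned}$$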
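Counterfactual evaluation: naive estimation of $\Delta(f_\theta, D, y)$
The naive estimator satisfies
$$\mathbb{E}_o\left[\Delta_{\mathrm{NAIVE}}(f_\theta, D, c)\right] = \sum_{d_i : y(d_i) = 1} P(o_i = 1 \mid R, d_i)\, \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right),$$
so each document's contribution is weighted by its observation probability under the ranking $R$ that was shown when the log was collected.
In a ranking, documents at higher positions are more likely to be observed. This is called position bias. The estimate is biased so that documents displayed at high positions during logging appear more relevant than they truly are.
To remove the bias, it seems reasonable to estimate $P(o_i = 1 \mid R, d_i)$ and correct for it.
→ Bias removal with propensity scores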
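A small numeric illustration of the position bias in the naive estimator. The setup is hypothetical: two equally relevant documents, a logging ranking under which document 0 is always observed and document 1 is observed with probability 0.25, and the DCG weight for lambda. The two candidate rankers have the same true metric, but the expected naive estimate favors the one that agrees with the logging ranking.

```python
import math


def dcg_weight(rank):
    """Rank weight lambda(r) = 1 / log2(1 + r)."""
    return 1.0 / math.log2(1 + rank)


def true_metric(new_ranks, labels):
    """Delta(f, D, y) with full relevance information."""
    return sum(dcg_weight(r) * y for r, y in zip(new_ranks, labels))


def expected_naive_estimate(new_ranks, labels, observe_probs):
    """E_o[Delta_NAIVE] = sum_i P(o_i = 1 | R, d_i) * lambda(rank(d_i)) * y(d_i)."""
    return sum(p * dcg_weight(r) * y
               for r, y, p in zip(new_ranks, labels, observe_probs))


labels = [1, 1]               # both documents are relevant
observe_probs = [1.0, 0.25]   # observation probabilities under the logging ranking
ranker_a = [1, 2]             # keeps document 0 on top, like the logging ranking
ranker_b = [2, 1]             # puts document 1 on top

print(true_metric(ranker_a, labels), true_metric(ranker_b, labels))
print(expected_naive_estimate(ranker_a, labels, observe_probs),
      expected_naive_estimate(ranker_b, labels, observe_probs))
```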
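Bias removal with propensity scores
The bias is removed with Inverse Propensity Scoring (IPS):
$$\Delta_{\mathrm{IPS}}(f_\theta, D, c) := \sum_{d_i \in D} \frac{\lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right)}{P(o_i = 1 \mid R, d_i)} \cdot c_i$$
Here $P(o_i = 1 \mid R, d_i)$ is the probability that document $d_i$ is observed in the ranking $R$ that was displayed during log collection.
$\Delta_{\mathrm{IPS}}(f_\theta, D, c)$ is an unbiased estimator of $\Delta(f_\theta, D, y)$ when there is no click noise, that is, when $P(c_i = 1 \mid o_i = 1, y(d_i)) = y(d_i)$:
$$\begin{aligned}
\mathbb{E}_o\left[\Delta_{\mathrm{IPS}}(f_\theta, D, c)\right]
&= \mathbb{E}_o\left[\sum_{d_i \in D} \frac{\lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right)}{P(o_i = 1 \mid R, d_i)} \cdot c_i\right] \\
&= \mathbb{E}_o\left[\sum_{d_i : o_i = 1 \wedge y(d_i) = 1} \frac{\lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right)}{P(o_i = 1 \mid R, d_i)}\right] \\
&= \sum_{d_i : y(d_i) = 1} \frac{P(o_i = 1 \mid R, d_i) \cdot \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right)}{P(o_i = 1 \mid R, d_i)} \\
&= \sum_{d_i \in D} \lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right) \cdot y(d_i) \\
&= \Delta(f_\theta, D, y).
\end{aligned}$$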
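The unbiasedness claim can be checked empirically with a small simulation. This is a sketch under the slide's assumptions (noise-free clicks, known observation probabilities); the documents, probabilities, seed, and session count are hypothetical. Averaged over many simulated sessions, the IPS estimate should approach the true metric.

```python
import math
import random


def dcg_weight(rank):
    """Rank weight lambda(r) = 1 / log2(1 + r)."""
    return 1.0 / math.log2(1 + rank)


def ips_estimate(new_ranks, clicks, observe_probs):
    """Delta_IPS = sum_i lambda(rank(d_i)) / P(o_i = 1 | R, d_i) * c_i."""
    return sum(dcg_weight(r) / p * c
               for r, c, p in zip(new_ranks, clicks, observe_probs))


def simulate_clicks(labels, observe_probs, rng):
    """Noise-free clicks: an observed document is clicked iff it is relevant."""
    return [y if rng.random() < p else 0 for y, p in zip(labels, observe_probs)]


labels = [1, 1]
observe_probs = [1.0, 0.25]   # observation probabilities under the logging ranking
new_ranks = [2, 1]            # ranking produced by the ranker being evaluated
true_value = sum(dcg_weight(r) * y for r, y in zip(new_ranks, labels))

rng = random.Random(0)
n_sessions = 20000
average_ips = sum(
    ips_estimate(new_ranks, simulate_clicks(labels, observe_probs, rng), observe_probs)
    for _ in range(n_sessions)
) / n_sessions

print(true_value)
print(average_ips)  # close to true_value
```

A single session's IPS estimate can be far from the true value, because rarely observed documents receive large weights; only the average is unbiased.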
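Propensity-weighted LTR
IPS gives an unbiased estimate of $\Delta(f_\theta, D, y)$. The optimal model parameters $\theta$ can therefore be found by optimizing the IPS estimate. To deal with the non-differentiability of the ranking metric $\lambda(r)$ during optimization, a bound on $\lambda(r)$ is used.
The flow of propensity-weighted LTR:
› Estimate the propensity scores of the clicks: $P(o_i = 1 \mid R, d_i)$
› Compute the gradient of a bound on the unbiased estimator $\Delta_{\mathrm{IPS}}(f_\theta, D, c)$:
$$\theta' = \nabla_\theta \left[ \frac{\lambda\left(\mathrm{rank}(d_i \mid f_\theta, D)\right)}{P(o_i = 1 \mid R, d_i)} \right]$$
› Update the model parameters: $\theta_{\mathrm{new}} \leftarrow \theta_{\mathrm{old}} - \theta'$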
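One possible instance of this flow is sketched below. The slide does not say which bound is used; this sketch assumes a linear scoring model and the hinge upper bound on the rank, rank(d_i) <= 1 + sum over j != i of max(0, 1 - (f(d_i) - f(d_j))), in the style of Propensity SVM-Rank. The learning rate, feature vectors, and function names are hypothetical, and the propensities are taken as given rather than estimated.

```python
def score(theta, x):
    """Linear scoring model f_theta(d) = theta . x (an assumed model class)."""
    return sum(t * v for t, v in zip(theta, x))


def ips_hinge_loss_and_grad(theta, features, clicks, observe_probs):
    """IPS-weighted hinge bound on the rank of clicked documents, plus a subgradient."""
    loss = 0.0
    grad = [0.0] * len(theta)
    for i, (x_i, c_i, p_i) in enumerate(zip(features, clicks, observe_probs)):
        if c_i == 0:
            continue  # only clicked documents contribute
        weight = 1.0 / p_i  # inverse propensity weight
        for j, x_j in enumerate(features):
            if j == i:
                continue
            margin = 1.0 - (score(theta, x_i) - score(theta, x_j))
            if margin > 0:
                loss += weight * margin
                for k in range(len(theta)):
                    grad[k] += weight * (x_j[k] - x_i[k])
    return loss, grad


def gradient_step(theta, grad, learning_rate):
    """theta_new = theta_old - learning_rate * gradient."""
    return [t - learning_rate * g for t, g in zip(theta, grad)]


features = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical feature vectors of two documents
clicks = [0, 1]                      # only document 1 was clicked
observe_probs = [1.0, 0.25]          # assumed known propensities
theta = [0.0, 0.0]

loss_before, grad = ips_hinge_loss_and_grad(theta, features, clicks, observe_probs)
theta = gradient_step(theta, grad, learning_rate=0.1)
loss_after, _ = ips_hinge_loss_and_grad(theta, features, clicks, observe_probs)
print(loss_before, loss_after)  # the loss decreases after one step
print(theta)                    # document 1 now scores higher than document 0
```

The click on the rarely observed document is weighted by 1 / 0.25 = 4, so it moves the parameters four times as far as a click on an always-observed document would.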
References
› https://ilps.github.io/webconf2020-tutorial-unbiased-ltr/