Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Counterfactual learning to rank: introduction
Search
Daiki Tanaka
May 02, 2020
Research
830
0
Share
Counterfactual learning to rank: introduction
一般的なランキング学習からcounterfactual LTRへの導入
Daiki Tanaka
May 02, 2020
More Decks by Daiki Tanaka
See All by Daiki Tanaka
カーネル法概観
daikitanak
0
680
カーネル法:正定値カーネルの理論
daikitanak
0
73
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR STRUCTURED DATA
daikitanak
1
210
[Paper Reading] Attention is All You Need
daikitanak
0
130
Interpretability of Machine Learning : Paper reading (LIME)
daikitanak
0
170
[Paper reading] Local Outlier Detection With Interpretation
daikitanak
0
77
Other Decks in Research
See All in Research
さくらインターネット研究所テックトーク2026春、研究開発Gr.25年度成果26年度方針
kikuzo
0
140
非試合日の野球場を楽しむためのARホームランボールキャッチ体験システムの開発 / EC79-miyazaki
yumulab
0
180
東京大学工学部計数工学科、計数工学特別講義の説明資料
kikuzo
0
440
通時的な類似度行列に基づく単語の意味変化の分析
rudorudo11
0
290
Model Discovery and Graph Simulation: A Lightweight Gateway to Chaos Engineering
anatolykr
0
180
LLM の Attention 機構まとめ — 数式・計算量・メモリ
puwaer
7
2k
Harness Engineering and Al Agent
kzinmr
3
1.6k
NII S. Koyama's Lab Research Overview AY2026
skoyamalab
0
250
定数整数除算・剰余算最適化再考
herumi
1
120
[チュートリアル] 電波マップ構築入門 :研究動向と課題設定の勘所
k_sato
0
450
論文紹介 "ReSim: Reliable World Simulation for Autonomous Driving"
kogo
0
600
討議:RACDA設立30周年記念都市交通フォーラム2026
trafficbrain
0
910
Featured
See All Featured
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
350
Building Adaptive Systems
keathley
44
3k
Speed Design
sergeychernyshev
33
1.8k
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
190
The innovator’s Mindset - Leading Through an Era of Exponential Change - McGill University 2025
jdejongh
PRO
1
180
How to train your dragon (web standard)
notwaldorf
97
6.6k
Technical Leadership for Architectural Decision Making
baasie
3
380
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
200
Building AI with AI
inesmontani
PRO
1
1k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
9.1k
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
220
Skip the Path - Find Your Career Trail
mkilby
1
130
Transcript
Unbiased Learning to Rank May 7, 2020
Learning to rank ઃఆ Supervised LTR Pointwise loss Pairwise loss
Listtwise loss Counterfactual Learning to Rank Counterfactual Evaluation Inverse Propensity Scoring Propensity-weighted Learning to Rank 2
Learning to rank: ઃఆ ೖྗɿ จॻͷू߹ D ग़ྗɿ จॻͷॱҐ R
= (R1; R2; R3:::) ͨͩ͠ɺ֤จॻʹϞσϧ f„ ʹΑͬͯείΞ͕͍͍ͭͯͯ f„ (R1) – f„ (R2) – f„ (R3) ::: ͱͳ͍ͬͯΔɻ(ߴ͍είΞ͕͚ΒΕΔ΄ͲॱҐ͕ߴ͍) Learning to Rank (LTR) ͷత࠷దͳॱҐΛग़ྗ͢ΔϞσϧ f„ ͷύϥϝʔλ „ Λ σʔλ͔ΒٻΊΔ͜ͱɻ 3
Supervised LTR ڭࢣ͋Γ LTR Ͱɺ › ݕࡧΫΤϦ › จॻू߹ ›
ॱҐͷϥϕϧ ΛؚΉσʔληοτΛͬͯϞσϧύϥϝʔλΛٻΊΔɻ ڭࢣ͋Γ LTR Ͱ༻͍ΒΕΔଛࣦओʹ 3 ͭɿ › Pointwise loss › Pairwise loss › Listwise loss y (d) ʹΑͬͯɺจॻ d ͷݕࡧΫΤϦͷؔ࿈Λද͢ͱ͢Δɻ(େ͖͍΄ͲॱҐͷ্Ґʹ ͖ͯཉ͍͠) 4
Pointwise loss Pointwise loss ɺॱҐͷਪఆΛྨɾճؼͱͯ͠ղ͘ɻྫ͑ɺ௨ৗͷճؼଛࣦ (squared loss) ͱͯ͠ҎԼͷΑ͏ʹ༩͑Δɿ Lpointwise :=
1 N N X i=1 (f„ (di) ` y (di))2 Pointwise loss ͷɺϞσϧͷग़ྗΛॱҐͱͯ͠͏͜ͱΛߟྀʹೖΕ͍ͯͳ͍͜ ͱɻLTR Ͱग़ྗͱͯ͠ಘΒΕΔείΞΛฒͼସ͑ͯಘΒΕΔॱҐʹͷΈؔ৺͕͋Δɻ 5
Pairwise loss Pairwise loss Ͱɺ2 ͭͷจॻؒͷ૬ରతͳείΞͷେখΛߟྀʹ͍ΕΔɻྫ͑ɺҎԼ ͷΑ͏ͳ hinge-loss Λ༩͑Δʀ Lpairwise
:= X y(di)>y(dj) max (0; 1 ` (f„ (di) ` f„ (di))): ॱҐ͕૬ରతʹߴ͍จॻείΞ͕ߴ͘ɺॱҐ͕͍จॻείΞΛ͘͢Δؾ࣋ͪɻ Pairwise loss ͷɺશͯͷهࣄϖΞΛಉ༷ʹѻ͏͜ͱɻ࣮ͦͯ͠༻্ top100 ͱ top10 ޙऀͷํ͕ॏࢹ͞ΕΔ͜ͱɻPairwise loss Ͱ top100 ͷԼͷํͷॱҐΛվળ ͤ͞ΔͨΊʹ্ҐͷॱҐΛ٘ਜ਼ʹ͢Δ͜ͱ͕͋Γ͑ͯ͠·͏ɻ 6
Listwise loss Listwise loss ͰॱҐࢦඪΛ࠷దԽ͢Δɻ՝ɺॱҐࢦඪ͕ඍՄೳͰͳ͍͜ͱɻ ྫ͑ɺDCG ɿ DCG = N
X i=1 y (di) log2 (rank (di) + 1) Ͱ͋Δ͕ɺlog2 (rank (di) + 1) ඍෆՄೳͰ͋Δɻ ͦͷͨΊʹ֬తۙࣅΛ༻͍Δํ๏ (ListNetɺListMLE) ɺώϡʔϦεςΟοΫॱҐ ࢦඪͷόϯυΛ࠷దԽ͢Δख๏͕͋Δɻ(LambdaRankɺLambdaLoss) ྫ͑ɺ LambdaRank ͷଛࣦ DCG ͷόϯυͱͳ͍ͬͯΔɿ LLambdaRank := X y(di)>y(dj) log (1 + exp (f„ (dj) ` f„ (di))) j´DCGj 7
ҼՌධՁ తɿ৽͍͠ϥϯΩϯάؔ f„ ΛɺผͷϥϯΩϯάؔ fdeploy ͷԼͰूΊΒΕͨաڈ ͷσʔλ (ΫϦοΫσʔλͳͲ) ΛͬͯධՁ͍ͨ͠ɻ ҎԼͷ
2 ͭͷ߹ʹ͍ͭͯߟ͑Δɻ › શͯͷจॻʹ͍ͭͯਅͷؔ࿈ y (di) ͕طͰ͋Δ࣌ › y (di) Θ͔Βͳ͍͕ɺΫϦοΫใͳͲͷ҉తͳϑΟʔυόοΫͷΈར༻Մೳͳ࣌ 8
ҼՌධՁɿϥϕϧ͕طͳΒશʹධՁ͕Ͱ͖Δ શͯͷจॻʹ͍ͭͯਅͷϥϕϧ y (di) ͕طͰ͋Δ࣌ɺIR(ใݕࡧ) ࢦඪΛܭࢉͰ͖Δɿ ´ (f„; D; y)
= X di2D – (rank (di j f„; D)) ´ y (di) ͜͜Ͱɺ– ॱҐॏΈ͚ؔͰ͋ͬͯɺྫ͑ɿ APR: – (r) = r DCG: – (r) = 1 log2 (1+r) ͳͲ͕༻͍ΒΕΔɻ 9
ҼՌධՁ y (di) Θ͔Βͳ͍͕ɺΫϦοΫใͳͲͷ҉తͳϑΟʔυόοΫͷΈར༻Մೳͳ࣌ɿ › ͋Δจॻʹର͢ΔΫϦοΫɺͦͷจॻ͕ؔ࿈͍ͯ͠Δ͜ͱΛࣔ͢ɺόΠΞεɾϊΠζ ͖ͭͷࢦඪʹͳ͍ͬͯΔɻ › ΫϦοΫ͞Εͳ͔͔ͬͨΒͱ͍ͬͯͦͷจॻ͕ؔͳ͍Θ͚Ͱͳ͍ɻ(จॻ͕ؔͳ ͍ɾϢʔβ͕จॻΛ؍ଌ͍ͯ͠ͳ͍ɾϥϯμϜཁૉʹΑΔͷ)
ଟ͘ͷ؍ଌσʔλʹ͍ͭͯฏۉΛऔΕϊΠζআڈͰ͖Δͱߟ͑ΒΕΔ͕ɺόΠΞεআ ڈͰ͖ͳ͍ɻ 10
ҼՌධՁɿ؍ଌɾΫϦοΫϞσϧ Ϣʔβͷ؍ଌٴͼจॻͷؔ࿈ͷΈΛߟྀʹೖΕΔͱɺϢʔβͷΫϦοΫҎԼͷΑ͏ʹϞ σϦϯάͰ͖ͦ͏ɿ › ϥϯΩϯά R ʹ͓͍ͯจॻ di ͕؍ଌ͞ΕΔ (oi
= 1 Ͱද͢) ֬ɺ P (oi = 1 j R; di) (؍ଌ͞ΕΔ֬ؔ࿈ʹؔͳ͍ͱԾఆ͍ͯ͠Δɻ) › ؔ࿈ y (di) ͱ؍ଌ oi ͕༩͑ΒΕͨ࣌ͷɺจॻ di ͕ΫϦοΫ͞ΕΔ֬ (ci = 1 Ͱද͢) ɺ P (ci = 1 j oi; y (di)) › ΫϦοΫ؍ଌ͞Εͨจॻʹ͔͠ى͜Βͳ͍ͨΊɺϥϯΩϯά R ʹ͓͍ͯΫϦοΫ͞ ΕΔ֬ɿ P (ci = 1 ^ oi = 1 j y (di) ; R) = P (ci = 1 j oi = 1; y (di)) ´ P (oi = 1 j R; di) 11
ҼՌධՁɿ´ (f„; D; y) ͷφΠʔϒਪఆ ´ (f„; D; y) ΛφΠʔϒʹਪఆ͢ΔʹɺΫϦοΫͷใ
(ci) Λਅͷؔ࿈ϥϕϧ (y (di)) ͷΘΓʹ͑Αͯ͘ɺ ´NAIVE (f„; D; c) := X di2D – (rank (di j f„; D)) ´ ci ͱͳΔɻ ΫϦοΫʹϊΠζ͕͍ͬͯͳ͍࣌ɺͭ·Γ P (ci = 1 j oi = 1; y (di)) = y (di) Ͱ͋Δ࣌Ͱ͑͞ɺφΠʔϒਪఆ؍ଌόΠΞεΛड͚͍ͯΔɿ Eo ˆ´NAIVE (f„; D; c)˜ = Eo 2 4 X di2D – (rank (di j f„; D)) ´ ci 3 5 = Eo 2 6 4 X di:oi=1^y(di)=1 – (rank (di j f„; D)) 3 7 5 = X di:y(di)=1 P (oi = 1 j R; di)– (rank (di j f„; D)) = X di2D P (oi = 1 j R; di)– (rank (di j f„; D)) ´ y (di) 12
ҼՌධՁɿ´ (f„; D; y) ͷφΠʔϒਪఆ φΠʔϒਪఆɿ Eo ˆ´NAIVE (f„; D;
c)˜ = X di:y(di)=1 P (oi = 1 j R; di)– (rank (di j f„; D)) ͰɺͦΕͧΕͷจॻͷɺϩάऩू࣌ͷϥϯΩϯά R Ͱͷ؍ଌ֬ͰॏΈͨ͠ਪఆʹͳͬ ͯ͠·͏ɻ ϥϯΩϯάͰɺߴॱҐͷจॻ΄Ͳ؍ଌ͞Ε͍͢ɿ͜ΕΛ position bias ͱݺͿɻϩάऩ ूͷࡍʹߴॱҐʹදࣔ͞Εͨจॻਅͷؔ࿈ΑΓؔ࿈͕͋ΔɺͱόΠΞεΛड͚ͯ͠· ͏ɻ όΠΞεΛআڈ͢ΔͨΊʹɺP (oi = 1 j R; di) Λਪఆ͠ɺิਖ਼ͯ͋͛͠Εྑͦ͞͏ ! είΞʹΑΔόΠΞεআڈ 13
είΞΛ༻͍ͨόΠΞεআڈ Inverse Propensity Scoring(IPS) ʹΑͬͯόΠΞεΛআڈ͢Δɿ ´IPS (f„; D; c) :=
X di2D – (rank (di j f„; D)) P (oi = 1 j R; di) ´ ci ͜͜ͰɺP (oi = 1 j R; di) ϩάऩूதʹදࣔ͞ΕͨϥϯΩϯά R Ͱจॻ di ͕؍ଌ͞ ΕΔ֬Ͱ͋Δɻ´IPS (f„; D; c) ΫϦοΫϊΠζ͕ͳ͍߹ɺͭ·Γ P (ci = 1 j oi = 1; y (di)) = y (di) Ͱ͋Δ࣌ʹ ´ (f„; D; y) ͷෆภਪఆྔͰ͋Δɿ Eo ˆ´IPS (f„; D; c)˜ = Eo 2 4 X di2D – (rank (di j f„; D)) P (oi = 1 j R; di) ´ ci 3 5 = Eo 2 6 4 X di:oi=1^y(di)=1 – (rank (di j f„; D)) P (oi = 1 j R; di) 3 7 5 = X di:y(di)=1 P (oi = 1 j R; di) ´ – (rank (di j f„; D)) P (oi = 1 j R; di) = X di2D – (rank (di j f„; D)) ´ y (di) = ´ (f„; D; y) : 14
Propensity-weighted LTR IPS ´ (f„; D; y) ͷෆภਪఆͰ͋ͬͨɻΑͬͯɺ࠷దͳϞσϧύϥϝʔλ „
IPS Λ ࠷దԽ͢Δ͜ͱͰٻΊΔ͜ͱ͕Ͱ͖ΔɻIPS Λ࠷దԽ͢ΔࡍɺϥϯΩϯάࢦඪ – (r) ͷඍ ෆՄೳੑʹରॲ͢ΔͨΊɺ– (r) ͷ bound Λར༻͢Δɻ Propensity-weighted LTR ͷྲྀΕɿ › ΫϦοΫͷείΞΛਪఆɿ P (oi = 1 j R; di) › ෆภਪఆྔ ´IPS (f„; D; c) ͷ bound ʹ͍ͭͯඍΛܭࢉɿ „0 = r„ "– (rank (di j f„; D)) P (oi = 1 j R; di) # › ϞσϧύϥϝʔλΛߋ৽ „new „old ` „0 15
References › https://ilps.github.io/webconf2020-tutorial-unbiased-ltr/ 16