Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Counterfactual learning to rank: introduction
Search
Daiki Tanaka
May 02, 2020
Research
0
730
Counterfactual learning to rank: introduction
一般的なランキング学習からcounterfactual LTRへの導入
Daiki Tanaka
May 02, 2020
Tweet
Share
More Decks by Daiki Tanaka
See All by Daiki Tanaka
カーネル法概観
daikitanak
0
570
カーネル法:正定値カーネルの理論
daikitanak
0
65
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR STRUCTURED DATA
daikitanak
1
180
[Paper Reading] Attention is All You Need
daikitanak
0
110
Interpretability of Machine Learning : Paper reading (LIME)
daikitanak
0
150
[Paper reading] Local Outlier Detection With Interpretation
daikitanak
0
64
Other Decks in Research
See All in Research
Creation and environmental applications of 15-year daily inundation and vegetation maps for Siberia by integrating satellite and meteorological datasets
satai
3
260
snlp2025_prevent_llm_spikes
takase
0
160
GPUを利用したStein Particle Filterによる点群6自由度モンテカルロSLAM
takuminakao
0
240
電力システム最適化入門
mickey_kubo
1
910
数理最適化に基づく制御
mickey_kubo
6
730
cvpaper.challenge 10年の軌跡 / cvpaper.challenge a decade-long journey
gatheluck
3
310
論文紹介:Not All Tokens Are What You Need for Pretraining
kosuken
0
170
AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data
satai
1
190
Google Agent Development Kit (ADK) 入門 🚀
mickey_kubo
2
1.7k
とあるSREの博士「過程」 / A Certain SRE’s Ph.D. Journey
yuukit
10
4.2k
[CV勉強会@関東 CVPR2025] VLM自動運転model S4-Driver
shinkyoto
2
480
20250605_新交通システム推進議連_熊本都市圏「車1割削減、渋滞半減、公共交通2倍」から考える地方都市交通政策
trafficbrain
0
750
Featured
See All Featured
Practical Orchestrator
shlominoach
190
11k
GitHub's CSS Performance
jonrohan
1032
460k
A Tale of Four Properties
chriscoyier
160
23k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
840
The Cost Of JavaScript in 2023
addyosmani
53
8.9k
How GitHub (no longer) Works
holman
315
140k
Gamification - CAS2011
davidbonilla
81
5.4k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
The Cult of Friendly URLs
andyhume
79
6.6k
What’s in a name? Adding method to the madness
productmarketing
PRO
23
3.7k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
Transcript
Unbiased Learning to Rank May 7, 2020
Learning to rank ઃఆ Supervised LTR Pointwise loss Pairwise loss
Listtwise loss Counterfactual Learning to Rank Counterfactual Evaluation Inverse Propensity Scoring Propensity-weighted Learning to Rank 2
Learning to rank: ઃఆ ೖྗɿ จॻͷू߹ D ग़ྗɿ จॻͷॱҐ R
= (R1; R2; R3:::) ͨͩ͠ɺ֤จॻʹϞσϧ f„ ʹΑͬͯείΞ͕͍͍ͭͯͯ f„ (R1) – f„ (R2) – f„ (R3) ::: ͱͳ͍ͬͯΔɻ(ߴ͍είΞ͕͚ΒΕΔ΄ͲॱҐ͕ߴ͍) Learning to Rank (LTR) ͷత࠷దͳॱҐΛग़ྗ͢ΔϞσϧ f„ ͷύϥϝʔλ „ Λ σʔλ͔ΒٻΊΔ͜ͱɻ 3
Supervised LTR ڭࢣ͋Γ LTR Ͱɺ › ݕࡧΫΤϦ › จॻू߹ ›
ॱҐͷϥϕϧ ΛؚΉσʔληοτΛͬͯϞσϧύϥϝʔλΛٻΊΔɻ ڭࢣ͋Γ LTR Ͱ༻͍ΒΕΔଛࣦओʹ 3 ͭɿ › Pointwise loss › Pairwise loss › Listwise loss y (d) ʹΑͬͯɺจॻ d ͷݕࡧΫΤϦͷؔ࿈Λද͢ͱ͢Δɻ(େ͖͍΄ͲॱҐͷ্Ґʹ ͖ͯཉ͍͠) 4
Pointwise loss Pointwise loss ɺॱҐͷਪఆΛྨɾճؼͱͯ͠ղ͘ɻྫ͑ɺ௨ৗͷճؼଛࣦ (squared loss) ͱͯ͠ҎԼͷΑ͏ʹ༩͑Δɿ Lpointwise :=
1 N N X i=1 (f„ (di) ` y (di))2 Pointwise loss ͷɺϞσϧͷग़ྗΛॱҐͱͯ͠͏͜ͱΛߟྀʹೖΕ͍ͯͳ͍͜ ͱɻLTR Ͱग़ྗͱͯ͠ಘΒΕΔείΞΛฒͼସ͑ͯಘΒΕΔॱҐʹͷΈؔ৺͕͋Δɻ 5
Pairwise loss Pairwise loss Ͱɺ2 ͭͷจॻؒͷ૬ରతͳείΞͷେখΛߟྀʹ͍ΕΔɻྫ͑ɺҎԼ ͷΑ͏ͳ hinge-loss Λ༩͑Δʀ Lpairwise
:= X y(di)>y(dj) max (0; 1 ` (f„ (di) ` f„ (di))): ॱҐ͕૬ରతʹߴ͍จॻείΞ͕ߴ͘ɺॱҐ͕͍จॻείΞΛ͘͢Δؾ࣋ͪɻ Pairwise loss ͷɺશͯͷهࣄϖΞΛಉ༷ʹѻ͏͜ͱɻ࣮ͦͯ͠༻্ top100 ͱ top10 ޙऀͷํ͕ॏࢹ͞ΕΔ͜ͱɻPairwise loss Ͱ top100 ͷԼͷํͷॱҐΛվળ ͤ͞ΔͨΊʹ্ҐͷॱҐΛ٘ਜ਼ʹ͢Δ͜ͱ͕͋Γ͑ͯ͠·͏ɻ 6
Listwise loss Listwise loss ͰॱҐࢦඪΛ࠷దԽ͢Δɻ՝ɺॱҐࢦඪ͕ඍՄೳͰͳ͍͜ͱɻ ྫ͑ɺDCG ɿ DCG = N
X i=1 y (di) log2 (rank (di) + 1) Ͱ͋Δ͕ɺlog2 (rank (di) + 1) ඍෆՄೳͰ͋Δɻ ͦͷͨΊʹ֬తۙࣅΛ༻͍Δํ๏ (ListNetɺListMLE) ɺώϡʔϦεςΟοΫॱҐ ࢦඪͷόϯυΛ࠷దԽ͢Δख๏͕͋Δɻ(LambdaRankɺLambdaLoss) ྫ͑ɺ LambdaRank ͷଛࣦ DCG ͷόϯυͱͳ͍ͬͯΔɿ LLambdaRank := X y(di)>y(dj) log (1 + exp (f„ (dj) ` f„ (di))) j´DCGj 7
ҼՌධՁ తɿ৽͍͠ϥϯΩϯάؔ f„ ΛɺผͷϥϯΩϯάؔ fdeploy ͷԼͰूΊΒΕͨաڈ ͷσʔλ (ΫϦοΫσʔλͳͲ) ΛͬͯධՁ͍ͨ͠ɻ ҎԼͷ
2 ͭͷ߹ʹ͍ͭͯߟ͑Δɻ › શͯͷจॻʹ͍ͭͯਅͷؔ࿈ y (di) ͕طͰ͋Δ࣌ › y (di) Θ͔Βͳ͍͕ɺΫϦοΫใͳͲͷ҉తͳϑΟʔυόοΫͷΈར༻Մೳͳ࣌ 8
ҼՌධՁɿϥϕϧ͕طͳΒશʹධՁ͕Ͱ͖Δ શͯͷจॻʹ͍ͭͯਅͷϥϕϧ y (di) ͕طͰ͋Δ࣌ɺIR(ใݕࡧ) ࢦඪΛܭࢉͰ͖Δɿ ´ (f„; D; y)
= X di2D – (rank (di j f„; D)) ´ y (di) ͜͜Ͱɺ– ॱҐॏΈ͚ؔͰ͋ͬͯɺྫ͑ɿ APR: – (r) = r DCG: – (r) = 1 log2 (1+r) ͳͲ͕༻͍ΒΕΔɻ 9
ҼՌධՁ y (di) Θ͔Βͳ͍͕ɺΫϦοΫใͳͲͷ҉తͳϑΟʔυόοΫͷΈར༻Մೳͳ࣌ɿ › ͋Δจॻʹର͢ΔΫϦοΫɺͦͷจॻ͕ؔ࿈͍ͯ͠Δ͜ͱΛࣔ͢ɺόΠΞεɾϊΠζ ͖ͭͷࢦඪʹͳ͍ͬͯΔɻ › ΫϦοΫ͞Εͳ͔͔ͬͨΒͱ͍ͬͯͦͷจॻ͕ؔͳ͍Θ͚Ͱͳ͍ɻ(จॻ͕ؔͳ ͍ɾϢʔβ͕จॻΛ؍ଌ͍ͯ͠ͳ͍ɾϥϯμϜཁૉʹΑΔͷ)
ଟ͘ͷ؍ଌσʔλʹ͍ͭͯฏۉΛऔΕϊΠζআڈͰ͖Δͱߟ͑ΒΕΔ͕ɺόΠΞεআ ڈͰ͖ͳ͍ɻ 10
ҼՌධՁɿ؍ଌɾΫϦοΫϞσϧ Ϣʔβͷ؍ଌٴͼจॻͷؔ࿈ͷΈΛߟྀʹೖΕΔͱɺϢʔβͷΫϦοΫҎԼͷΑ͏ʹϞ σϦϯάͰ͖ͦ͏ɿ › ϥϯΩϯά R ʹ͓͍ͯจॻ di ͕؍ଌ͞ΕΔ (oi
= 1 Ͱද͢) ֬ɺ P (oi = 1 j R; di) (؍ଌ͞ΕΔ֬ؔ࿈ʹؔͳ͍ͱԾఆ͍ͯ͠Δɻ) › ؔ࿈ y (di) ͱ؍ଌ oi ͕༩͑ΒΕͨ࣌ͷɺจॻ di ͕ΫϦοΫ͞ΕΔ֬ (ci = 1 Ͱද͢) ɺ P (ci = 1 j oi; y (di)) › ΫϦοΫ؍ଌ͞Εͨจॻʹ͔͠ى͜Βͳ͍ͨΊɺϥϯΩϯά R ʹ͓͍ͯΫϦοΫ͞ ΕΔ֬ɿ P (ci = 1 ^ oi = 1 j y (di) ; R) = P (ci = 1 j oi = 1; y (di)) ´ P (oi = 1 j R; di) 11
ҼՌධՁɿ´ (f„; D; y) ͷφΠʔϒਪఆ ´ (f„; D; y) ΛφΠʔϒʹਪఆ͢ΔʹɺΫϦοΫͷใ
(ci) Λਅͷؔ࿈ϥϕϧ (y (di)) ͷΘΓʹ͑Αͯ͘ɺ ´NAIVE (f„; D; c) := X di2D – (rank (di j f„; D)) ´ ci ͱͳΔɻ ΫϦοΫʹϊΠζ͕͍ͬͯͳ͍࣌ɺͭ·Γ P (ci = 1 j oi = 1; y (di)) = y (di) Ͱ͋Δ࣌Ͱ͑͞ɺφΠʔϒਪఆ؍ଌόΠΞεΛड͚͍ͯΔɿ Eo ˆ´NAIVE (f„; D; c)˜ = Eo 2 4 X di2D – (rank (di j f„; D)) ´ ci 3 5 = Eo 2 6 4 X di:oi=1^y(di)=1 – (rank (di j f„; D)) 3 7 5 = X di:y(di)=1 P (oi = 1 j R; di)– (rank (di j f„; D)) = X di2D P (oi = 1 j R; di)– (rank (di j f„; D)) ´ y (di) 12
ҼՌධՁɿ´ (f„; D; y) ͷφΠʔϒਪఆ φΠʔϒਪఆɿ Eo ˆ´NAIVE (f„; D;
c)˜ = X di:y(di)=1 P (oi = 1 j R; di)– (rank (di j f„; D)) ͰɺͦΕͧΕͷจॻͷɺϩάऩू࣌ͷϥϯΩϯά R Ͱͷ؍ଌ֬ͰॏΈͨ͠ਪఆʹͳͬ ͯ͠·͏ɻ ϥϯΩϯάͰɺߴॱҐͷจॻ΄Ͳ؍ଌ͞Ε͍͢ɿ͜ΕΛ position bias ͱݺͿɻϩάऩ ूͷࡍʹߴॱҐʹදࣔ͞Εͨจॻਅͷؔ࿈ΑΓؔ࿈͕͋ΔɺͱόΠΞεΛड͚ͯ͠· ͏ɻ όΠΞεΛআڈ͢ΔͨΊʹɺP (oi = 1 j R; di) Λਪఆ͠ɺิਖ਼ͯ͋͛͠Εྑͦ͞͏ ! είΞʹΑΔόΠΞεআڈ 13
είΞΛ༻͍ͨόΠΞεআڈ Inverse Propensity Scoring(IPS) ʹΑͬͯόΠΞεΛআڈ͢Δɿ ´IPS (f„; D; c) :=
X di2D – (rank (di j f„; D)) P (oi = 1 j R; di) ´ ci ͜͜ͰɺP (oi = 1 j R; di) ϩάऩूதʹදࣔ͞ΕͨϥϯΩϯά R Ͱจॻ di ͕؍ଌ͞ ΕΔ֬Ͱ͋Δɻ´IPS (f„; D; c) ΫϦοΫϊΠζ͕ͳ͍߹ɺͭ·Γ P (ci = 1 j oi = 1; y (di)) = y (di) Ͱ͋Δ࣌ʹ ´ (f„; D; y) ͷෆภਪఆྔͰ͋Δɿ Eo ˆ´IPS (f„; D; c)˜ = Eo 2 4 X di2D – (rank (di j f„; D)) P (oi = 1 j R; di) ´ ci 3 5 = Eo 2 6 4 X di:oi=1^y(di)=1 – (rank (di j f„; D)) P (oi = 1 j R; di) 3 7 5 = X di:y(di)=1 P (oi = 1 j R; di) ´ – (rank (di j f„; D)) P (oi = 1 j R; di) = X di2D – (rank (di j f„; D)) ´ y (di) = ´ (f„; D; y) : 14
Propensity-weighted LTR IPS ´ (f„; D; y) ͷෆภਪఆͰ͋ͬͨɻΑͬͯɺ࠷దͳϞσϧύϥϝʔλ „
IPS Λ ࠷దԽ͢Δ͜ͱͰٻΊΔ͜ͱ͕Ͱ͖ΔɻIPS Λ࠷దԽ͢ΔࡍɺϥϯΩϯάࢦඪ – (r) ͷඍ ෆՄೳੑʹରॲ͢ΔͨΊɺ– (r) ͷ bound Λར༻͢Δɻ Propensity-weighted LTR ͷྲྀΕɿ › ΫϦοΫͷείΞΛਪఆɿ P (oi = 1 j R; di) › ෆภਪఆྔ ´IPS (f„; D; c) ͷ bound ʹ͍ͭͯඍΛܭࢉɿ „0 = r„ "– (rank (di j f„; D)) P (oi = 1 j R; di) # › ϞσϧύϥϝʔλΛߋ৽ „new „old ` „0 15
References › https://ilps.github.io/webconf2020-tutorial-unbiased-ltr/ 16