$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Offline A/B testing for Recommender Systems
Search
alpicola
November 20, 2018
Technology
0
2.1k
Offline A/B testing for Recommender Systems
alpicola
November 20, 2018
Tweet
Share
More Decks by alpicola
See All by alpicola
商品レコメンドでのexplicit negative feedbackの活用
alpicola
2
910
Recommending What Video to Watch Next: A Multitask Ranking System
alpicola
1
910
Kibanaを用いたアクセスログ調査と解析 / Access Log Analysis Using Kibana
alpicola
0
990
Other Decks in Technology
See All in Technology
AWS re:Invent 2025で見たGrafana最新機能の紹介
hamadakoji
0
430
AI時代のワークフロー設計〜Durable Functions / Step Functions / Strands Agents を添えて〜
yakumo
3
690
Identity Management for Agentic AI 解説
fujie
0
110
品質のための共通認識
kakehashi
PRO
4
370
IAMユーザーゼロの運用は果たして可能なのか
yama3133
2
490
多様なデジタルアイデンティティを攻撃からどうやって守るのか / 20251212
ayokura
0
490
AWS re:Invent 2025~初参加の成果と学び~
kubomasataka
0
130
SREには開発組織全体で向き合う
koh_naga
0
380
AWS Security Agentの紹介/introducing-aws-security-agent
tomoki10
0
320
re:Invent2025 3つの Frontier Agents を紹介 / introducing-3-frontier-agents
tomoki10
0
250
20251218_AIを活用した開発生産性向上の全社的な取り組みの進め方について / How to proceed with company-wide initiatives to improve development productivity using AI
yayoi_dd
0
140
日本Rubyの会: これまでとこれから
snoozer05
PRO
3
150
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Speed Design
sergeychernyshev
33
1.4k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
A designer walks into a library…
pauljervisheath
210
24k
How Software Deployment tools have changed in the past 20 years
geshan
0
29k
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.1k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
Deep Space Network (abreviated)
tonyrice
0
16
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
Stop Working from a Prison Cell
hatefulcrawdad
273
21k
Amusing Abliteration
ianozsvald
0
61
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.3k
Transcript
Offline A/B testing for Recommender Systems ͯͳ ాத (alpicola) @
จಡΈձ 11/19 1
Offline A/B testing for Recommender Systems — CriteoͷWSDM'18ͷจ — SpotifyͷRecSys'18จͰݴٴ
2
Offline A/B testing for Recommender Systems — CriteoͷWSDM'18ͷจ — SpotifyͷRecSys'18จͰݴٴ
— ΫοΫύου։࠵ͷಡΈձͰ͢Ͱʹհ͞Ε͍ͯͨ — ͕ɺվΊͯ۷ΓԼ͕͛ͨͰ͖Εͱࢥ͍·͢ 3
ΦϑϥΠϯABςετ? — ΦϯϥΠϯͰߦ͏ABςετ࣌ؒͱ͕͔͔ۚΔ — ΦϑϥΠϯͰͦΕʹ͍ۙධՁ͕ߦ͑ΕΞϧΰϦζ ϜվળͷαΠΫϧΛߴԽͰ͖Δ — Ͱਫ਼? ! 4
ϩάʹجͮ͘ΦϑϥΠϯධՁͷݚڀ — Counterfactual estimationͱ͔off-policy estimationͱ ݺΕΔ — WSDM'15ͷνϡʔτϦΞϧ — SIGIR'16ͷνϡʔτϦΞϧ
— ධՁ͚ͩͰͳֶ͘शͷతؔʹ͏͜ͱͰ͖Δ — ͜ͷจͰධՁͷΈΛѻ͏ 5
จͷߩݙ — ΦϑϥΠϯABςετͰ༻͍Δใुͷਪఆख๏NCISͷ ͋Δछͷ࠷దੑΛࣔ͢ — ͜ͷݟʹج͍ͮͯNCISͷ֦ுPieceNCISͱ PointNCISΛఏҊ — ΦϯϥΠϯABςετ݁Ռͱͷ૬͕ؔେ্͖͘ 6
ઃఆ — Top-k ϥϯΩϯά — : ϩά — : ίϯςΩετ
— : ΞΫγϣϯ — : ใु 7
ઃఆ — : ίϯςΩετ͔ΒΞΫγϣϯΛબͿϙϦγʔ — : ݱߦͷϙϦγʔ — : ςετ͍ͨ͠ϙϦγʔ
— : ฏۉॲஔޮՌ — ͜ΕΛਪఆ͍ͨ͠ 8
ઃఆ — ΦϯϥΠϯABςετ — ͷݩͰͷϩάͱ ͷݩͰͷϩά͕͋Δ — ඪຊฏۉͰ , ͦΕͧΕਪఆ
— ΦϑϥΠϯABςετ — ͷݩͰͷϩά͔Β ਪఆ ! 9
ैདྷख๏ — Importance sampling (IS) — Normalized importance sampling (NIS)
— Doubly robust estimator (DR) — Capped importance sampling (CIS) — Normalized capped importance sampling (NCIS) ౷ܭϞϯςΧϧϩ๏ͷจ຺Ͱొ 10
Importance sampling (IS) — ! όΠΞε͕ͳ͍ — — " ʹΑΔߴόϦΞϯε
(unbounded) — όϦΞϯε͕େ͖͍ͱ ͱ ΛൺֱͰ͖ͳ͍ 11
Normalized importance sampling (NIS) Λͬͯ Λஔ͖͑ — ! ҰகਪఆྔʹͳΔ —
— " ґવͱͯ͠όϦΞϯεେ 12
Capped importance sampling (CIS) ॏΈͷ࠷େΛ ʹ (max capping) ॏΈ͕ Ҏ্ͷ߲ࣺͯΔ
(zero capping) 13
CISͷόΠΞε 14
CISͷόΠΞε — όΠΞε ͷ࣌ͷ Ͱbound͞ΕΔ — — ใु͕େ͖͍ͱ͜ΖΛऔΕΔΑ͏ʹվળ͍ͨ͠ ͕ͦ͏͢ΔͱόΠΞε͕େ͖͘ͳΔ !
15
CISͷόΠΞε Cappingͷઃఆʹ͍͍τϨʔυΦϑ͕ଘࡏ͠ͳ͍ ! 16
Normalized capped importance sampling (NCIS) NIS, CIS྆ํͷΞΠσΞΛ࣋ͪࠐΉ 17
NCISͱCISͷؔ 18
NCISͱCISͷؔ CIS͕͍࣋ͬͯͨόΠΞε Λୈೋ߲ͰϞσϧ ͍ͯ͠ΔͱݟͳͤΔ 19
NCISͱCISͷؔ (ಛʹzero cappingͷ࣌) 20
NCISͱCISͷؔ (ಛʹzero cappingͷ࣌) — ͳΒۙతʹόΠΞ ε͕ͳ͘ͳΔ ! — ͷ ,
ʹର͢Δґଘ͕খ͍࣌͞ͳͲ 21
NCISͷόΠΞε 22
NCISͷόΠΞε — ͱcappingͷ༗ແʹ૬͕ؔ͋ΔͱόΠΞε͕େ͖͘ ͳΔ ! — ަབྷҼࢠϢʔβʔͷλΠϓͳͲ͕ߟ͑ΒΕΔ (Table 1) 23
NCISͷόΠΞε 24
จͷΞΠσΞ — ͷϞσϦϯάΛάϩʔόϧ㱺ϩʔΧϧʹ — ίϯςΩετ ʹରͯ͠ہॴతͳNCIS — ͱcappingͷ૬ؔΛݮΒ͢ — Piecewise
NCIS: ׂ͞ΕͨྖҬ͝ͱʹNCIS — Pointwise NCIS: ཁૉ͝ͱʹNCIS 25
Piecewise NCIS (PieceNCIS) ίϯςΩετͷू߹ ͷׂ Λߟ͑Δ 26
Piecewise NCIS (PieceNCIS) ׂ֤ʹରͯ͠NCIS 27
ׂͷྫ దͳؔ ΛఆΊͯ ֤ Ͱ ͷ ʹର͢Δґଘ͕খ͘͞ͳΔΑ͏ʹ 28
Pointwise NCIS (PointNCIS) ཁૉ୯ҐͰׂ͢Δ (i.e. ) ಛఆͷίϯςΩετʹର͢Δαϯϓϧ͘͝গͳ͍ͷ ͰૉʹNCISΛద༻Ͱ͖ͳ͍ 29
Pointwise NCIS (PointNCIS) — ΞΫγϣϯʹ͍ͭͯपลԽ͢Δ ͱਖ਼֬ʹٻΊΒΕΔ — ΞΫγϣϯͷ͕ଟ͍ͱܭࢉ͕ߴίετ ! —
ΛαϯϓϦϯάͰٻΊΔ 30
Midzuno-Sen method 1. Λαϯϓϧ 2. Λ ͔Β ͳͷ͕ಘΒΕΔ·Ͱαϯϓϧ 3. Λ
͔Βαϯϓϧ 4. Λฦ͢ ͜͏ͯ͠ಘΒΕΔΛ ͱॻ͘ 31
Pointwise NCIS (PointNCIS) — ͷ͏ͪ ͕ ͷσʔλແࢹͰ͖Δ — ใु͕εύʔεͳ࣌ʹޮతʹܭࢉͰ͖Δ !
32
࣮ݧ — ϓϩϓϥΠΤλϦͷσʔληοτ — 39छɺ߹ܭͰઍԯ݅ͷϩάσʔλ — ΫϦοΫϕʔεͷใु (εύʔε͔ͭࢄେ) — ରCIS,
NCIS, PieceNCIS, PointNCIS ( ) — IS, NISόϦΞϯε͕ߴ͗͢ΔͷͰআ֎ 33
ΦϯϥΠϯʗΦϑϥΠϯABςετͷ૬ؔ 34
ద߹ͱِӄੑ ʮ ͕ ΑΓΑ͍͔Ͳ͏͔ʯͷ2༧ଌͱͯ͠ݟΔ 35
࣮ݧ݁Ռͷ·ͱΊ — CIS૬͕ؔෛ — શମతʹΊͷਪఆ͕ग़͍ͯͨ (Figure 4) — CIS⇒NCISͰେ͖͘վળ —
NCIS⇒PointNCISͰِཅੑ͕͞ΒʹԼ͕Δ — ద߹NCISҎޙͦ͜·ͰΑ͘ͳΒͳ͍ — ࣮ߦʹ͓͍ͯਫ਼ʹ͓͍ͯPointNCIS͕Α͍ 36
Appendix — ͕খ͍͞ͱ ͕ cappingΛ͑Δ͜ͱ — Max cappingͰ ʹͳΔΑ͏ͳ ৽͍͠capping
͕ͱΕΔ (Lemma A.3) 37