Offline A/B testing for Recommender Systems
Slide 1
Slide 1 text
Offline A/B testing for Recommender Systems
Hatena, Tanaka (alpicola) @ paper reading group, 11/19
Slide 2
Slide 2 text
Offline A/B testing for Recommender Systems
- A WSDM'18 paper from Criteo
- Mentioned in Spotify's RecSys'18 paper
Slide 3
Slide 3 text
Offline A/B testing for Recommender Systems
- A WSDM'18 paper from Criteo
- Mentioned in Spotify's RecSys'18 paper
- Already presented at a reading group hosted by Cookpad
- Still, I would like to dig into it once more here
Slide 4
Slide 4 text
Offline A/B testing?
- A/B tests run online cost time and money
- If a comparable evaluation can be done offline, the algorithm improvement cycle gets faster
- But how accurate is it?
Slide 5
Slide 5 text
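Research on log-based offline evaluation
- Known as counterfactual estimation or off-policy estimation
- WSDM'15 tutorial
- SIGIR'16 tutorial
- Usable not only for evaluation but also as a training objective
- This paper deals with evaluation only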
Slide 6
Slide 6 text
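Contributions of the paper
- Shows a certain kind of optimality of NCIS, a reward estimator used for offline A/B testing
- Building on this insight, proposes two extensions of NCIS: PieceNCIS and PointNCIS
- Correlation with online A/B test results improves substantially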
Slide 7
Slide 7 text
Setting
- Top-k ranking
- $S_n = \{(x_i, a_i, r_i)\}_{i=1}^{n}$: logs
- $x$: context
- $a$: action
- $r$: reward
Slide 8
Slide 8 text
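Setting
- $\pi$: a policy that chooses an action given a context
- $\pi_p$: the current (production) policy
- $\pi_t$: the policy we want to test
- $\Delta R(\pi_p, \pi_t)$: the average treatment effect
- This is what we want to estimate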
Slide 9
Slide 9 text
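Setting
- Online A/B test
  - We have logs collected under $\pi_p$ and logs collected under $\pi_t$
  - $R(\pi_p)$ and $R(\pi_t)$ are each estimated by a sample mean
- Offline A/B test
  - $R(\pi_t)$ is estimated from logs collected under $\pi_p$ only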
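The estimand on these setting slides can be written out as follows. This is a sketch in assumed notation: $R(\pi)$ for the expected reward under a policy, and $n_p$, $n_t$ for the sizes of the two online logs are names chosen here for illustration, not taken from the slides.

```latex
R(\pi) = \mathbb{E}_{x,\; a \sim \pi(\cdot \mid x)}[\, r \,], \qquad
\Delta R(\pi_p, \pi_t) = R(\pi_t) - R(\pi_p)

% Online A/B test: sample means over logs collected under each policy
\hat{R}(\pi_p) = \frac{1}{n_p} \sum_{i=1}^{n_p} r_i^{(p)}, \qquad
\hat{R}(\pi_t) = \frac{1}{n_t} \sum_{j=1}^{n_t} r_j^{(t)}
```

In the offline case only the first kind of log exists, so $R(\pi_t)$ has to be estimated by reweighting it.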
Slide 10
Slide 10 text
Existing methods
- Importance sampling (IS)
- Normalized importance sampling (NIS)
- Doubly robust estimator (DR)
- Capped importance sampling (CIS)
- Normalized capped importance sampling (NCIS)
These come from the statistics and Monte Carlo literature
Slide 11
Slide 11 text
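Importance sampling (IS)
$\hat{R}^{IS}(\pi_t) = \frac{1}{n} \sum_{i=1}^{n} w_i r_i$, where $w_i = \pi_t(a_i \mid x_i) / \pi_p(a_i \mid x_i)$
- Pro: unbiased
- Con: high variance caused by the weights $w$ (unbounded)
- When the variance is large, $\pi_p$ and $\pi_t$ cannot be compared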
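A minimal sketch of the IS estimator, assuming each log record has already been reduced to an importance weight (the ratio of test-policy to production-policy probability of the logged action) and a reward. The function name and the toy numbers are made up for illustration.

```python
def is_estimate(weights, rewards):
    """Importance sampling estimate of the test policy's expected reward.

    weights[i] is pi_t(a_i | x_i) / pi_p(a_i | x_i); rewards[i] is r_i.
    """
    n = len(rewards)
    return sum(w * r for w, r in zip(weights, rewards)) / n


# Toy logs (made up): the last record has a very large weight,
# so it alone decides most of the estimate.
weights = [0.5, 2.0, 0.1, 40.0]
rewards = [1.0, 0.0, 1.0, 1.0]
print(is_estimate(weights, rewards))
```

A single heavily weighted record dominating the sum is the high-variance behavior the slide warns about.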
Slide 12
Slide 12 text
Normalized importance sampling (NIS)
Replace the normalizer $n$ with the sum of the weights $\sum_i w_i$
- Pro: a consistent estimator
- Con: the variance is still large
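A sketch of the normalized variant under the same assumed representation (one weight and one reward per log record; the function name is illustrative). Dividing by the sum of weights makes the result a weighted average of the observed rewards, so it can never leave their range.

```python
def nis_estimate(weights, rewards):
    """Normalized importance sampling: divide by the sum of weights, not by n."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all importance weights are zero")
    return sum(w * r for w, r in zip(weights, rewards)) / total_weight


# Same made-up toy logs as one might feed to plain IS.
weights = [0.5, 2.0, 0.1, 40.0]
rewards = [1.0, 0.0, 1.0, 1.0]
print(nis_estimate(weights, rewards))
```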
Slide 13
Slide 13 text
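Capped importance sampling (CIS)
- Max capping: limit each weight to at most $c$, i.e. $\bar{w} = \min(w, c)$
- Zero capping: discard terms whose weight is $c$ or more, i.e. $\bar{w} = w \cdot 1[w < c]$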
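The two capping rules can be sketched as small functions plugged into the IS sum. The names `max_cap`, `zero_cap`, and `cis_estimate` and the toy data are assumptions made for illustration.

```python
def max_cap(w, c):
    """Max capping: weights above c are clipped down to c."""
    return min(w, c)


def zero_cap(w, c):
    """Zero capping: terms whose weight is c or more are dropped."""
    return w if w < c else 0.0


def cis_estimate(weights, rewards, c, cap=max_cap):
    """Capped importance sampling: the IS sum with capped weights, divided by n."""
    n = len(rewards)
    return sum(cap(w, c) * r for w, r in zip(weights, rewards)) / n


weights = [0.5, 2.0, 0.1, 40.0]  # made-up toy logs
rewards = [1.0, 0.0, 1.0, 1.0]
print(cis_estimate(weights, rewards, c=10.0, cap=max_cap))
print(cis_estimate(weights, rewards, c=10.0, cap=zero_cap))
```

Both variants tame the record with weight 40, at the price of removing part of its contribution from the estimate.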
Slide 14
Slide 14 text
Bias of CIS
Slide 15
Slide 15 text
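Bias of CIS
- The bias is bounded by the reward on the capped region (records whose weight exceeds the cap)
- We want to improve the policy so that it reaches regions where the reward is large, but doing so makes the bias larger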
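The direction and size of the CIS bias can be worked out in a few lines. This is a sketch for max capping, assuming non-negative rewards bounded by some $r_{\max}$ and that $\pi_p$ gives positive probability to every action $\pi_t$ can choose (so that plain IS is unbiased); the symbols $\bar{w}$, $(\cdot)^{+}$, and $r_{\max}$ are notation assumed here.

```latex
\mathbb{E}_{\pi_p}\big[\hat{R}^{\mathrm{CIS}}(\pi_t)\big] - R(\pi_t)
  = \mathbb{E}_{\pi_p}\big[(\bar{w} - w)\, r\big]
  = -\,\mathbb{E}_{\pi_p}\big[(w - c)^{+}\, r\big] \;\le\; 0

\big|\text{bias}\big| \;\le\; r_{\max}\; \mathbb{E}_{\pi_p}\big[(w - c)^{+}\big]
```

Under these assumptions CIS can only underestimate, and the gap grows exactly where the test policy puts much more mass than the production policy on rewarding actions.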
Slide 16
Slide 16 text
Bias of CIS
No choice of the capping setting gives a good trade-off
Slide 17
Slide 17 text
Normalized capped importance sampling (NCIS)
Brings together the ideas of both NIS and CIS
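Combining the two ideas gives a short estimator: cap the weights as in CIS, then normalize by the sum of the capped weights as in NIS. The sketch below assumes max capping and the same weight/reward representation as before; the function name and data are illustrative.

```python
def ncis_estimate(weights, rewards, c):
    """Normalized capped IS with max capping: capped sum over capped weight total."""
    capped = [min(w, c) for w in weights]
    total_capped = sum(capped)
    if total_capped == 0:
        raise ValueError("all capped weights are zero")
    return sum(cw * r for cw, r in zip(capped, rewards)) / total_capped


weights = [0.5, 2.0, 0.1, 40.0]  # made-up toy logs
rewards = [1.0, 0.0, 1.0, 1.0]
print(ncis_estimate(weights, rewards, c=10.0))
```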
Slide 18
Slide 18 text
Relationship between NCIS and CIS
Slide 19
Slide 19 text
Relationship between NCIS and CIS
NCIS can be seen as modeling, in its second term, the bias that CIS had
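One way to see this relationship is the algebraic identity below, written in assumed notation ($\bar{w}_i$ for the capped weight of record $i$). It follows directly from the two definitions and is offered as an illustration, not as the slide's exact formula.

```latex
\hat{R}^{\mathrm{CIS}} = \frac{1}{n} \sum_{i} \bar{w}_i r_i, \qquad
\hat{R}^{\mathrm{NCIS}} = \frac{\sum_{i} \bar{w}_i r_i}{\sum_{i} \bar{w}_i}

\hat{R}^{\mathrm{NCIS}}
  = \hat{R}^{\mathrm{CIS}}
  + \hat{R}^{\mathrm{NCIS}} \Big( 1 - \frac{1}{n} \sum_{i} \bar{w}_i \Big)
```

The second term is the estimated reward multiplied by the share of weight removed by capping, so NCIS fills in the capped-out part with a single global reward estimate.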
Slide 20
Slide 20 text
Relationship between NCIS and CIS
(in particular under zero capping)
Slide 21
Slide 21 text
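Relationship between NCIS and CIS
(in particular under zero capping)
- If the expected reward is the same on capped and uncapped records, the bias vanishes asymptotically
- For example, when the reward depends only weakly on $x$ and $a$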
Slide 22
Slide 22 text
Bias of NCIS
Slide 23
Slide 23 text
Bias of NCIS
- The bias grows when the reward is correlated with whether capping occurs
- Possible confounders include the type of user (Table 1)
Slide 24
Slide 24 text
Bias of NCIS
Slide 25
Slide 25 text
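The paper's idea
- Model the bias term locally instead of globally
- Apply NCIS locally with respect to the context $x$
- This reduces the correlation between reward and capping
- Piecewise NCIS: NCIS within each cell of a partition
- Pointwise NCIS: NCIS for each individual element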
Slide 26
Slide 26 text
Piecewise NCIS (PieceNCIS)
Consider a partition of the set of contexts $X$
Slide 27
Slide 27 text
Piecewise NCIS (PieceNCIS)
Apply NCIS within each cell of the partition
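A sketch of the per-cell idea, assuming each log record carries a cell label for its context. How the cells are combined is an assumption made here: each cell's NCIS value is weighted by its share of logged records, on the reasoning that the distribution of contexts does not depend on the policy. The function name, the handling of empty cells, and the toy data are illustrative.

```python
from collections import defaultdict


def piece_ncis_estimate(cells, weights, rewards, c):
    """NCIS computed inside each cell, combined by each cell's share of records.

    cells[i] is the partition cell of context x_i (any hashable label).
    Assumption: cells are weighted by n_k / n; a cell whose capped weights
    sum to zero contributes nothing.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    count = defaultdict(int)
    for k, w, r in zip(cells, weights, rewards):
        capped = min(w, c)  # max capping
        numerator[k] += capped * r
        denominator[k] += capped
        count[k] += 1

    n = len(rewards)
    total = 0.0
    for k in count:
        if denominator[k] > 0:
            total += (count[k] / n) * (numerator[k] / denominator[k])
    return total


cells = ["a", "a", "b", "b"]     # made-up cell labels
weights = [0.5, 2.0, 0.1, 40.0]  # made-up toy logs
rewards = [1.0, 0.0, 1.0, 1.0]
print(piece_ncis_estimate(cells, weights, rewards, c=10.0))
```

With a single cell this reduces to plain NCIS, which is a quick way to sanity-check an implementation.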
Slide 28
Slide 28 text
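Example of a partition
Choose a suitable function and define the partition from it, so that within each cell the reward depends only weakly on the context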
Slide 29
Slide 29 text
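Pointwise NCIS (PointNCIS)
- Partition down to individual elements (i.e. every context is its own cell)
- There are very few samples for any particular context, so NCIS cannot be applied directly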
Slide 30
Slide 30 text
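Pointwise NCIS (PointNCIS)
- The per-context normalizer can be computed exactly by marginalizing over actions
- This is expensive when the number of actions is large
- So it is obtained by sampling instead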
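The exact-versus-sampled trade-off can be sketched for a single context. The quantity computed here, the expected capped weight under the production policy, is an assumed form of the per-context normalizer, and the sampled version is plain Monte Carlo over actions drawn from the production policy, not the Midzuno-Sen procedure. Policies are represented as dictionaries from action to probability; all names and numbers are illustrative.

```python
import random


def exact_normalizer(pi_p, pi_t, c):
    """Sum over actions of pi_p(a) * min(pi_t(a) / pi_p(a), c) for one context."""
    return sum(p * min(pi_t.get(a, 0.0) / p, c) for a, p in pi_p.items() if p > 0)


def sampled_normalizer(pi_p, pi_t, c, num_samples, rng):
    """Plain Monte Carlo estimate of the same quantity, drawing actions from pi_p."""
    actions = [a for a, p in pi_p.items() if p > 0]
    probs = [pi_p[a] for a in actions]
    draws = rng.choices(actions, weights=probs, k=num_samples)
    return sum(min(pi_t.get(a, 0.0) / pi_p[a], c) for a in draws) / num_samples


# Made-up policies for one context with three actions.
pi_p = {"a": 0.5, "b": 0.4, "c": 0.1}
pi_t = {"a": 0.1, "b": 0.1, "c": 0.8}
print(exact_normalizer(pi_p, pi_t, c=2.0))
print(sampled_normalizer(pi_p, pi_t, c=2.0, num_samples=20000, rng=random.Random(0)))
```

Without capping the exact value is 1, so the normalizer measures how much weight the cap leaves in place for that context.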
Slide 31
Slide 31 text
Midzuno-Sen method
1. Sample […]
2. Sample […] from […] until one satisfying […] is obtained
3. Sample […] from […]
4. Return […]
The value obtained this way is written as […]
Slide 32
Slide 32 text
Pointwise NCIS (PointNCIS)
- Among the logged records, those with zero reward can be ignored
- The estimate can therefore be computed efficiently when rewards are sparse
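The sparsity shortcut can be sketched as follows. The estimator form used here (capped weight divided by a per-context normalizer, averaged over all records) is an assumption made for illustration, and `normalizer` is a hypothetical callback standing in for whatever exact or sampled computation is available.

```python
def point_ncis_estimate(logs, normalizer, c):
    """Pointwise-normalized capped IS over (context, weight, reward) records.

    normalizer(context) returns the per-context normalizer (hypothetical
    callback). Records with zero reward add nothing to the sum, so the
    possibly expensive normalizer call is skipped for them.
    """
    n = 0
    total = 0.0
    for context, w, r in logs:
        n += 1
        if r == 0:
            continue
        total += min(w, c) * r / normalizer(context)
    return total / n


# Made-up logs and normalizer values; calls are counted to show the saving.
calls = []


def toy_normalizer(context):
    calls.append(context)
    return {"u1": 0.4, "u2": 1.0}[context]


logs = [("u1", 0.2, 1.0), ("u1", 8.0, 0.0), ("u2", 0.5, 1.0), ("u2", 1.0, 0.0)]
estimate = point_ncis_estimate(logs, toy_normalizer, c=2.0)
print(estimate, len(calls))
```

With click-based rewards most records have zero reward, so the normalizer is evaluated for only a small fraction of the log.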
Slide 33
Slide 33 text
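Experiments
- Proprietary dataset
- 39 of them, with hundreds of billions of log records in total
- Click-based reward (sparse and high variance)
- Methods compared: CIS, NCIS, PieceNCIS, PointNCIS
- IS and NIS are excluded because their variance is too high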
Slide 34
Slide 34 text
Correlation between online and offline A/B tests
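One simple way to quantify this kind of agreement is a correlation coefficient between the offline estimates and the online measurements across a set of A/B tests. The sketch below uses Pearson correlation as an assumed choice; the slide does not say which coefficient is used, and the numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up uplift values for five A/B tests.
offline_deltas = [0.02, -0.01, 0.03, 0.00, -0.02]
online_deltas = [0.01, -0.02, 0.02, 0.01, -0.01]
print(pearson(offline_deltas, online_deltas))
```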
Slide 35
Slide 35 text
Precision and false negative rate
Treat the offline result as a binary prediction of whether $\pi_t$ is better than $\pi_p$
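The binary-prediction view can be sketched by comparing the sign of the offline estimate with the sign of the online result for each A/B test. Using a plain sign comparison, with no significance test, is a simplifying assumption; the function name and numbers are illustrative.

```python
def precision_and_fnr(offline_deltas, online_deltas):
    """Score offline predictions of "pi_t beats pi_p" against online outcomes.

    A positive delta means pi_t looks better than pi_p. Returns
    (precision, false negative rate); either is NaN if undefined.
    """
    tp = fp = fn = 0
    for off, on in zip(offline_deltas, online_deltas):
        predicted, actual = off > 0, on > 0
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual and not predicted:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    return precision, fnr


offline = [0.1, -0.2, 0.3, 0.05]  # made-up offline estimates
online = [0.2, 0.1, -0.1, 0.3]    # made-up online measurements
print(precision_and_fnr(offline, online))
```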
Slide 36
Slide 36 text
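Summary of the experimental results
- CIS has a negative correlation
  - Its estimates were on the low side overall (Figure 4)
- Going from CIS to NCIS brings a large improvement
- Going from NCIS to PointNCIS lowers the false positive rate further
  - Precision does not improve much beyond NCIS
- PointNCIS is the best choice in both execution speed and accuracy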
Slide 37
Slide 37 text
Appendix
- Behavior of capping when […] is small
- With max capping, a new capping satisfying […] can be chosen (Lemma A.3)