Slide 1

Slide 1 text

Offline A/B testing for Recommender Systems
Hatena, Tanaka (alpicola) @ paper reading group, 11/19

Slide 2

Slide 2 text

Offline A/B testing for Recommender Systems
- A WSDM'18 paper from Criteo
- Also mentioned in a RecSys'18 paper from Spotify

Slide 3

Slide 3 text

Offline A/B testing for Recommender Systems
- A WSDM'18 paper from Criteo
- Also mentioned in a RecSys'18 paper from Spotify
- Already presented at a reading group hosted by Cookpad
- Still, I hope to dig into it in more depth here

Slide 4

Slide 4 text

Offline A/B testing?
- A/B tests run online cost time and money
- If a comparable evaluation could be run offline, the algorithm-improvement cycle would speed up
- But how accurate is it?

Slide 5

Slide 5 text

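Research on log-based offline evaluation
- Known as counterfactual estimation or off-policy estimation
- WSDM'15 tutorial
- SIGIR'16 tutorial
- Usable not only for evaluation but also as an objective function for learning
- This paper deals only with evaluation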

Slide 6

Slide 6 text

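Contributions of the paper
- Shows that NCIS, a reward estimator used in offline A/B testing, is optimal in a certain sense
- Building on this insight, proposes two extensions of NCIS: PieceNCIS and PointNCIS
- Greatly improves the correlation with online A/B test results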

Slide 7

Slide 7 text

Problem setting
- Top-k ranking
- [symbol]: logs
- [symbol]: context
- [symbol]: action
- [symbol]: reward

Slide 8

Slide 8 text

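Problem setting
- [symbol]: a policy that selects an action given a context
- [symbol]: the current policy
- [symbol]: the policy to be tested
- [symbol]: the average treatment effect
- This is the quantity we want to estimate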
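As a sketch in symbols: assume \pi_p denotes the current policy, \pi_t the policy to be tested, and R the reward. This notation, and the sign convention of test minus current, are assumptions made for illustration. Under them, the average treatment effect can be written as:

```latex
\Delta R(\pi_p, \pi_t) = \mathbb{E}_{\pi_t}[R] - \mathbb{E}_{\pi_p}[R]
```

A positive value would then mean the test policy earns more reward on average than the current one.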

Slide 9

Slide 9 text

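Problem setting
- Online A/B test
  - Logs collected under the current policy and logs collected under the test policy are both available
  - The expected reward of each policy is estimated by a sample mean
- Offline A/B test
  - The expected reward of the test policy is also estimated from logs collected under the current policy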
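The online case can be sketched in a few lines. This is a minimal illustration, assuming each arm's log is just a list of rewards; the function name `online_ate` and the sample data are hypothetical.

```python
from statistics import mean


def online_ate(rewards_current, rewards_test):
    """Difference of sample means: test arm minus current arm."""
    return mean(rewards_test) - mean(rewards_current)


# Hypothetical click rewards (1 = click, 0 = no click) for each arm.
current_arm = [0, 1, 0, 0]
test_arm = [1, 1, 0, 0]
print(online_ate(current_arm, test_arm))
```

The offline case has no `test_arm` log at all, which is why the estimators on the following slides reweight the current policy's log instead.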

Slide 10

Slide 10 text

Existing methods
- Importance sampling (IS)
- Normalized importance sampling (NIS)
- Doubly robust estimator (DR)
- Capped importance sampling (CIS)
- Normalized capped importance sampling (NCIS)
These appear in the statistics and Monte Carlo literature.

Slide 11

Slide 11 text

Importance sampling (IS)
- Unbiased
- High variance caused by [symbol] (unbounded)
- When the variance is large, [symbol] and [symbol] cannot be compared
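A minimal sketch of the standard IS estimator follows. It assumes a hypothetical record layout where each log entry stores the probability that the current policy and the test policy assign to the logged action, plus the observed reward.

```python
def is_estimate(logs):
    """Importance sampling estimate of the test policy's expected reward.

    Each record is (p_current, p_test, reward): the probability each policy
    assigns to the logged action, and the observed reward (assumed layout).
    """
    total = 0.0
    for p_current, p_test, reward in logs:
        weight = p_test / p_current  # grows without bound as p_current -> 0
        total += weight * reward
    return total / len(logs)


# Hypothetical log collected under the current policy.
logs = [(0.5, 0.25, 1.0), (0.25, 0.75, 0.0), (0.5, 0.5, 1.0)]
print(is_estimate(logs))
```

Because the weight divides by `p_current`, a single record with a rarely chosen action can dominate the sum, which is the variance problem noted above.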

Slide 12

Slide 12 text

Normalized importance sampling (NIS)
Replaces [symbol] using [symbol]
- Becomes a consistent estimator
- Variance is still large
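For illustration, the usual self-normalized form divides by the sum of the importance weights instead of the sample count. The sketch below assumes the same hypothetical record layout of (probability under the current policy, probability under the test policy, reward).

```python
def nis_estimate(logs):
    """Normalized importance sampling: weighted rewards over total weight."""
    weighted_rewards = 0.0
    total_weight = 0.0
    for p_current, p_test, reward in logs:
        weight = p_test / p_current
        weighted_rewards += weight * reward
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("all importance weights are zero")
    return weighted_rewards / total_weight


# Hypothetical log collected under the current policy.
logs = [(0.5, 0.25, 1.0), (0.25, 0.75, 0.0), (0.5, 0.5, 1.0)]
print(nis_estimate(logs))
```

The result is a weighted average of observed rewards, so it always stays within the range of the rewards, but the weights themselves are still unbounded.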

Slide 13

Slide 13 text

Capped importance sampling (CIS)
- Clip each weight to a maximum value [symbol] (max capping)
- Discard terms whose weight is [symbol] or more (zero capping)
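The two capping rules can be sketched as follows. The threshold name `cap`, the `mode` switch, and the record layout (probability under the current policy, probability under the test policy, reward) are assumptions made for this example.

```python
def capped_weight(weight, cap, mode="max"):
    """Apply max capping (clip to cap) or zero capping (drop if >= cap)."""
    if mode == "max":
        return min(weight, cap)
    if mode == "zero":
        return weight if weight < cap else 0.0
    raise ValueError(f"unknown capping mode: {mode}")


def cis_estimate(logs, cap, mode="max"):
    """Capped importance sampling: mean of capped weight times reward."""
    total = 0.0
    for p_current, p_test, reward in logs:
        total += capped_weight(p_test / p_current, cap, mode) * reward
    return total / len(logs)


# Hypothetical log; the second record has weight 3.0, above the cap of 2.0.
logs = [(0.5, 0.25, 1.0), (0.25, 0.75, 1.0), (0.5, 0.5, 0.0)]
print(cis_estimate(logs, cap=2.0, mode="max"))
print(cis_estimate(logs, cap=2.0, mode="zero"))
```

In this toy log the uncapped IS estimate would be 3.5/3, so both capped variants come out lower.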

Slide 14

Slide 14 text

Bias of CIS

Slide 15

Slide 15 text

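Bias of CIS
- The bias is bounded by [symbol] in the case where [symbol]
- We want to improve the test policy so that it selects actions where the reward is large, but doing so increases the bias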
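To make the source of this bias concrete, here is a short derivation under assumed notation (none of these symbols are taken from the slides): w = \pi_t(a|x) / \pi_p(a|x) is the importance weight, \bar{w} \le w its capped version, and R \ge 0 the reward. It also assumes \pi_p > 0 wherever \pi_t > 0, so that \mathbb{E}_{\pi_p}[wR] = \mathbb{E}_{\pi_t}[R].

```latex
\mathbb{E}_{\pi_p}[\bar{w} R] - \mathbb{E}_{\pi_t}[R]
  = \mathbb{E}_{\pi_p}[\bar{w} R] - \mathbb{E}_{\pi_p}[w R]
  = \mathbb{E}_{\pi_p}\big[(\bar{w} - w)\, R\big] \le 0
```

The term (\bar{w} - w) R is nonzero only where capping is active, so the more reward the test policy collects in the capped region, the larger the downward bias.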

Slide 16

Slide 16 text

Bias of CIS
No capping setting offers a good trade-off.

Slide 17

Slide 17 text

Normalized capped importance sampling (NCIS)
Combines the ideas of both NIS and CIS.
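A minimal sketch of that combination: cap the weights as in CIS, then normalize by the sum of the capped weights as in NIS. The record layout (probability under the current policy, probability under the test policy, reward) and the parameter names are assumptions for this example.

```python
def ncis_estimate(logs, cap, mode="max"):
    """Normalized capped importance sampling.

    mode="max" clips weights to cap; mode="zero" drops weights >= cap.
    """
    weighted_rewards = 0.0
    total_weight = 0.0
    for p_current, p_test, reward in logs:
        weight = p_test / p_current
        if mode == "max":
            weight = min(weight, cap)
        elif mode == "zero":
            weight = weight if weight < cap else 0.0
        else:
            raise ValueError(f"unknown capping mode: {mode}")
        weighted_rewards += weight * reward
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("all capped weights are zero")
    return weighted_rewards / total_weight


# Hypothetical log; the second record has weight 3.0, above the cap of 2.0.
logs = [(0.5, 0.25, 1.0), (0.25, 0.75, 1.0), (0.5, 0.5, 0.0)]
print(ncis_estimate(logs, cap=2.0, mode="max"))
print(ncis_estimate(logs, cap=2.0, mode="zero"))
```

Dividing by the total capped weight rescales the estimate upward whenever capping has removed weight mass.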

Slide 18

Slide 18 text

Relationship between NCIS and CIS

Slide 19

Slide 19 text

Relationship between NCIS and CIS
The second term can be viewed as modeling the bias that CIS had.

Slide 20

Slide 20 text

Relationship between NCIS and CIS (especially under zero capping)

Slide 21

Slide 21 text

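Relationship between NCIS and CIS (especially under zero capping)
- If [formula] holds, the bias vanishes asymptotically
- For example, when [symbol] depends only weakly on [symbol] and [symbol]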

Slide 22

Slide 22 text

Bias of NCIS

Slide 23

Slide 23 text

Bias of NCIS
- The bias grows when [symbol] is correlated with whether capping occurs
- Possible confounders include the type of user (Table 1)

Slide 24

Slide 24 text

Bias of NCIS

Slide 25

Slide 25 text

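The paper's idea
- Move the modeling of [symbol] from global to local
- NCIS that is local with respect to the context [symbol]
- Reduce the correlation between [symbol] and capping
- Piecewise NCIS: NCIS on each region of a partition
- Pointwise NCIS: NCIS on each individual element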

Slide 26

Slide 26 text

Piecewise NCIS (PieceNCIS)
Consider a partition [symbol] of the set of contexts [symbol].

Slide 27

Slide 27 text

Piecewise NCIS (PieceNCIS)
Apply NCIS to each part of the partition.
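One way to read "NCIS per part" is sketched below. It rests on several assumptions: a hypothetical function `part_of` maps a context to its part, max capping is used, and the per-part NCIS values are combined by each part's share of the log (as an estimate of how often contexts fall in that part). Parts whose capped weights sum to zero contribute nothing here, which is also a choice made only for this sketch.

```python
from collections import defaultdict


def piece_ncis_estimate(logs, cap, part_of):
    """Per-part NCIS, combined by each part's share of the log.

    Each record is (context, p_current, p_test, reward) (assumed layout).
    """
    weighted_rewards = defaultdict(float)
    total_weight = defaultdict(float)
    count = defaultdict(int)
    for context, p_current, p_test, reward in logs:
        part = part_of(context)
        weight = min(p_test / p_current, cap)  # max capping
        weighted_rewards[part] += weight * reward
        total_weight[part] += weight
        count[part] += 1

    estimate = 0.0
    for part, n_part in count.items():
        if total_weight[part] > 0.0:
            share = n_part / len(logs)
            estimate += share * weighted_rewards[part] / total_weight[part]
    return estimate


# Hypothetical log where the context is simply a user type.
logs = [
    ("new", 0.5, 0.25, 1.0),
    ("new", 0.5, 0.5, 0.0),
    ("returning", 0.25, 0.75, 1.0),
    ("returning", 0.5, 0.5, 0.0),
]
print(piece_ncis_estimate(logs, cap=2.0, part_of=lambda context: context))
```

Normalizing inside each part means capping in one part no longer rescales the estimate for the others.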

Slide 28

Slide 28 text

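Example of a partition
Choose a suitable function [symbol] and set [formula], so that within each [symbol] the dependence of [symbol] on [symbol] is small.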

Slide 29

Slide 29 text

Pointwise NCIS (PointNCIS)
Partition at the level of individual elements (i.e. [formula]).
The number of samples for any specific context is very small, so NCIS cannot be applied naively.

Slide 30

Slide 30 text

Pointwise NCIS (PointNCIS)
- [symbol] can be computed exactly by marginalizing over actions
- The computation is expensive when there are many actions
- Obtain [symbol] by sampling
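The exact, marginalized variant can be sketched as follows. This is one reading and not necessarily the paper's exact formulation: it assumes the quantity being marginalized is the expected capped weight under the current policy for a given context, that each record carries both policies' full action distributions for its context (a hypothetical layout), and that max capping is used.

```python
def expected_capped_weight(pi_current, pi_test, cap):
    """Exact expectation of min(weight, cap) under the current policy,
    summing over every action (cost grows with the number of actions)."""
    total = 0.0
    for p_c, p_t in zip(pi_current, pi_test):
        if p_c > 0.0:
            total += p_c * min(p_t / p_c, cap)
    return total


def point_ncis_estimate(logs, cap):
    """Each record is (pi_current, pi_test, action, reward), where pi_* are
    lists of action probabilities for the record's context (assumed layout)."""
    total = 0.0
    for pi_current, pi_test, action, reward in logs:
        if reward == 0:
            continue  # a zero reward adds nothing to the sum
        weight = min(pi_test[action] / pi_current[action], cap)
        total += weight * reward / expected_capped_weight(pi_current, pi_test, cap)
    return total / len(logs)


# Hypothetical log with two actions per context.
logs = [
    ([0.25, 0.75], [0.75, 0.25], 0, 1.0),
    ([0.5, 0.5], [0.5, 0.5], 1, 0.0),
    ([0.5, 0.5], [0.5, 0.5], 0, 1.0),
]
print(point_ncis_estimate(logs, cap=2.0))
```

The inner sum over all actions is the expensive part when the action space is large; a sampled approximation would replace `expected_capped_weight`.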

Slide 31

Slide 31 text

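Midzuno-Sen method
1. Sample [symbol]
2. Sample [symbol] from [symbol] until one satisfying [formula] is obtained
3. Sample [symbol] from [symbol]
4. Return [symbol]
The value obtained this way is written as [symbol].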

Slide 32

Slide 32 text

Pointwise NCIS (PointNCIS)
- Among the logged data, records whose reward is zero can be ignored
- Efficient to compute when rewards are sparse

Slide 33

Slide 33 text

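Experiments
- Proprietary dataset
- 39 kinds of data, several hundred billion log records in total
- Click-based reward (sparse and high-variance)
- Methods compared: CIS, NCIS, PieceNCIS, PointNCIS ([formula])
- IS and NIS are excluded because their variance is too high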

Slide 34

Slide 34 text

Correlation between online and offline A/B tests
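As an illustration of how such a correlation could be computed, the sketch below assumes a Pearson correlation over paired treatment-effect estimates, one pair per A/B test. The choice of Pearson and the sample numbers are assumptions, not taken from the slides.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical treatment effects measured online and estimated offline.
online_effects = [0.01, -0.02, 0.03]
offline_effects = [0.02, -0.04, 0.06]
print(pearson(online_effects, offline_effects))
```

A value near 1 would mean the offline estimator ranks and scales the tested policies much like the online test does; a negative value would mean it tends to point the wrong way.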

Slide 35

Slide 35 text

Precision and false negative rate
Viewed as a binary prediction of "whether [symbol] is better than [symbol]".
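The binary view can be sketched as a confusion matrix. The sketch assumes that "positive" means a treatment effect above zero, that the online result is the ground truth, and that the offline estimate is the prediction; the sample numbers are hypothetical.

```python
def binary_metrics(online_effects, offline_effects):
    """Precision, false negative rate, and false positive rate of the
    offline sign prediction against the online outcome."""
    tp = fp = tn = fn = 0
    for online, offline in zip(online_effects, offline_effects):
        actual = online > 0
        predicted = offline > 0
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if tp + fp else None,
        "false_negative_rate": fn / (fn + tp) if fn + tp else None,
        "false_positive_rate": fp / (fp + tn) if fp + tn else None,
    }


# Hypothetical treatment effects for five A/B tests.
online_effects = [0.1, 0.2, -0.1, -0.3, 0.05]
offline_effects = [0.2, -0.1, 0.1, -0.2, 0.3]
print(binary_metrics(online_effects, offline_effects))
```

Each rate is reported as `None` when its denominator is empty, so a test set with no positives does not raise a division error.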

Slide 36

Slide 36 text

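Summary of experimental results
- CIS is negatively correlated with the online results
  - Its estimates were low overall (Figure 4)
- Moving from CIS to NCIS brings a large improvement
- Moving from NCIS to PointNCIS lowers the false positive rate further
- Precision does not improve much beyond NCIS
- PointNCIS is the best choice in both execution speed and accuracy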

Slide 37

Slide 37 text

Appendix
- When [symbol] is small, [symbol] can exceed the cap
- Under max capping, a new cap [symbol] can be chosen such that [formula] (Lemma A.3)