Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Offline A/B testing for Recommender Systems

A7fef1ca043d64ac78302bcbee244f02?s=47 alpicola
November 20, 2018

Offline A/B testing for Recommender Systems

A7fef1ca043d64ac78302bcbee244f02?s=128

alpicola

November 20, 2018
Tweet

Transcript

  1. Offline A/B testing for Recommender Systems ͸ͯͳ ాத (alpicola) @

    ࿦จಡΈձ 11/19 1
  2. Offline A/B testing for Recommender Systems — CriteoͷWSDM'18ͷ࿦จ — SpotifyͷRecSys'18࿦จͰ΋ݴٴ

    2
  3. Offline A/B testing for Recommender Systems — CriteoͷWSDM'18ͷ࿦จ — SpotifyͷRecSys'18࿦จͰ΋ݴٴ

    — ΫοΫύου։࠵ͷಡΈձͰ͢Ͱʹ঺հ͞Ε͍ͯͨ — ͕ɺվΊͯ۷ΓԼ͛ͨ࿩͕Ͱ͖Ε͹ͱࢥ͍·͢ 3
  4. ΦϑϥΠϯABςετ? — ΦϯϥΠϯͰߦ͏ABςετ͸࣌ؒͱ͕͔͔ۚΔ — ΦϑϥΠϯͰͦΕʹ͍ۙධՁ͕ߦ͑Ε͹ΞϧΰϦζ ϜվળͷαΠΫϧΛߴ଎ԽͰ͖Δ — Ͱ΋ਫ਼౓͸? ! 4

  5. ϩάʹجͮ͘ΦϑϥΠϯධՁͷݚڀ — Counterfactual estimationͱ͔off-policy estimationͱ ݺ͹ΕΔ — WSDM'15ͷνϡʔτϦΞϧ — SIGIR'16ͷνϡʔτϦΞϧ

    — ධՁ͚ͩͰͳֶ͘शͷ໨తؔ਺ʹ࢖͏͜ͱ΋Ͱ͖Δ — ͜ͷ࿦จͰ͸ධՁͷΈΛѻ͏ 5
  6. ࿦จͷߩݙ — ΦϑϥΠϯABςετͰ༻͍Δใुͷਪఆख๏NCISͷ ͋Δछͷ࠷దੑΛࣔ͢ — ͜ͷ஌ݟʹج͍ͮͯNCISͷ֦ுPieceNCISͱ PointNCISΛఏҊ — ΦϯϥΠϯABςετ݁Ռͱͷ૬͕ؔେ͖͘޲্ 6

  7. ໰୊ઃఆ — Top-k ϥϯΩϯά — : ϩά — : ίϯςΩετ

    — : ΞΫγϣϯ — : ใु 7
  8. ໰୊ઃఆ — : ίϯςΩετ͔ΒΞΫγϣϯΛબͿϙϦγʔ — : ݱߦͷϙϦγʔ — : ςετ͍ͨ͠ϙϦγʔ

    — : ฏۉॲஔޮՌ — ͜ΕΛਪఆ͍ͨ͠ 8
  9. ໰୊ઃఆ — ΦϯϥΠϯABςετ — ͷݩͰͷϩάͱ ͷݩͰͷϩά͕͋Δ — ඪຊฏۉͰ , ͦΕͧΕਪఆ

    — ΦϑϥΠϯABςετ — ͷݩͰͷϩά͔Β ΋ਪఆ ! 9
  10. ैདྷख๏ — Importance sampling (IS) — Normalized importance sampling (NIS)

    — Doubly robust estimator (DR) — Capped importance sampling (CIS) — Normalized capped importance sampling (NCIS) ౷ܭ΍ϞϯςΧϧϩ๏ͷจ຺Ͱొ৔ 10
  11. Importance sampling (IS) — ! όΠΞε͕ͳ͍ — — " ʹΑΔߴόϦΞϯε

    (unbounded) — όϦΞϯε͕େ͖͍ͱ ͱ ΛൺֱͰ͖ͳ͍ 11
  12. Normalized importance sampling (NIS) Λ࢖ͬͯ Λஔ͖׵͑ — ! ҰகਪఆྔʹͳΔ —

    — " ґવͱͯ͠όϦΞϯεେ 12
  13. Capped importance sampling (CIS) ॏΈͷ࠷େ஋Λ ʹ (max capping) ॏΈ͕ Ҏ্ͷ߲͸ࣺͯΔ

    (zero capping) 13
  14. CISͷόΠΞε 14

  15. CISͷόΠΞε — όΠΞε͸ ͷ࣌ͷ Ͱbound͞ΕΔ — — ͸ใु͕େ͖͍ͱ͜ΖΛऔΕΔΑ͏ʹվળ͍ͨ͠ ͕ͦ͏͢ΔͱόΠΞε͕େ͖͘ͳΔ !

    15
  16. CISͷόΠΞε Cappingͷઃఆʹ͍͍τϨʔυΦϑ͕ଘࡏ͠ͳ͍ ! 16

  17. Normalized capped importance sampling (NCIS) NIS, CIS྆ํͷΞΠσΞΛ࣋ͪࠐΉ 17

  18. NCISͱCISͷؔ܎ 18

  19. NCISͱCISͷؔ܎ CIS͕͍࣋ͬͯͨόΠΞε Λୈೋ߲ͰϞσϧ ͍ͯ͠ΔͱݟͳͤΔ 19

  20. NCISͱCISͷؔ܎ (ಛʹzero cappingͷ࣌) 20

  21. NCISͱCISͷؔ܎ (ಛʹzero cappingͷ࣌) — ͳΒ઴ۙతʹόΠΞ ε͕ͳ͘ͳΔ ! — ͷ ,

    ʹର͢Δґଘ౓͕খ͍࣌͞ͳͲ 21
  22. NCISͷόΠΞε 22

  23. NCISͷόΠΞε — ͱcappingͷ༗ແʹ૬͕ؔ͋ΔͱόΠΞε͕େ͖͘ ͳΔ ! — ަབྷҼࢠ͸ϢʔβʔͷλΠϓͳͲ͕ߟ͑ΒΕΔ (Table 1) 23

  24. NCISͷόΠΞε 24

  25. ࿦จͷΞΠσΞ — ͷϞσϦϯάΛάϩʔόϧ㱺ϩʔΧϧʹ — ίϯςΩετ ʹରͯ͠ہॴతͳNCIS — ͱcappingͷ૬ؔΛݮΒ͢ — Piecewise

    NCIS: ෼ׂ͞ΕͨྖҬ͝ͱʹNCIS — Pointwise NCIS: ཁૉ͝ͱʹNCIS 25
  26. Piecewise NCIS (PieceNCIS) ίϯςΩετͷू߹ ͷ෼ׂ Λߟ͑Δ 26

  27. Piecewise NCIS (PieceNCIS) ֤෼ׂʹରͯ͠NCIS 27

  28. ෼ׂͷྫ ద౰ͳؔ਺ ΛఆΊͯ ֤ ಺Ͱ ͷ ʹର͢Δґଘ͕খ͘͞ͳΔΑ͏ʹ 28

  29. Pointwise NCIS (PointNCIS) ཁૉ୯ҐͰ෼ׂ͢Δ (i.e. ) ಛఆͷίϯςΩετʹର͢Δαϯϓϧ਺͸͘͝গͳ͍ͷ Ͱૉ๿ʹNCISΛద༻Ͱ͖ͳ͍ 29

  30. Pointwise NCIS (PointNCIS) — ͸ΞΫγϣϯʹ͍ͭͯपลԽ͢Δ ͱਖ਼֬ʹٻΊΒΕΔ — ΞΫγϣϯͷ਺͕ଟ͍ͱܭࢉ͕ߴίετ ! —

    ΛαϯϓϦϯάͰٻΊΔ 30
  31. Midzuno-Sen method 1. Λαϯϓϧ 2. Λ ͔Β ͳ΋ͷ͕ಘΒΕΔ·Ͱαϯϓϧ 3. Λ

    ͔Βαϯϓϧ 4. Λฦ͢ ͜͏ͯ͠ಘΒΕΔ஋Λ ͱॻ͘ 31
  32. Pointwise NCIS (PointNCIS) — ͷ͏ͪ ͕ ͷσʔλ͸ແࢹͰ͖Δ — ใु͕εύʔεͳ࣌ʹޮ཰తʹܭࢉͰ͖Δ !

    32
  33. ࣮ݧ — ϓϩϓϥΠΤλϦͷσʔληοτ — 39छɺ߹ܭͰ਺ઍԯ݅ͷϩάσʔλ — ΫϦοΫϕʔεͷใु (εύʔε͔ͭ෼ࢄେ) — ର৅͸CIS,

    NCIS, PieceNCIS, PointNCIS ( ) — IS, NIS͸όϦΞϯε͕ߴ͗͢ΔͷͰআ֎ 33
  34. ΦϯϥΠϯʗΦϑϥΠϯABςετͷ૬ؔ 34

  35. ద߹཰ͱِӄੑ཰ ʮ ͕ ΑΓΑ͍͔Ͳ͏͔ʯͷ2஋༧ଌͱͯ͠ݟΔ 35

  36. ࣮ݧ݁Ռͷ·ͱΊ — CIS͸૬͕ؔෛ — શମతʹ௿Ίͷਪఆ͕ग़͍ͯͨ (Figure 4) — CIS⇒NCISͰେ͖͘վળ —

    NCIS⇒PointNCISͰِཅੑ཰͕͞ΒʹԼ͕Δ — ద߹཰͸NCISҎޙͦ͜·ͰΑ͘ͳΒͳ͍ — ࣮ߦ଎౓ʹ͓͍ͯ΋ਫ਼౓ʹ͓͍ͯ΋PointNCIS͕Α͍ 36
  37. Appendix — ͕খ͍͞ͱ ͕ cappingΛ௒͑Δ͜ͱ΋ — Max cappingͰ͸ ʹͳΔΑ͏ͳ ৽͍͠capping

    ͕ͱΕΔ (Lemma A.3) 37