solving of multi-armed bandit problem in advertisement recommendation

yoppi
April 20, 2018

Transcript

  1. import "github.com/yoppi"
     - @yoppiblog, Speee, Inc.
     - UZOU tech lead engineer
     - Builds recommendation engines
     - Builds ad servers
     - Years of experience in the advertising industry (video ads, native ads)
     - Go, JavaScript, Ruby
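  2. Types of ad recommendation
     - Content-based delivery
       ‣ Recommends ads based on the similarity between the media site's articles and the ads
     - Retargeting delivery
       ‣ Recommends an advertiser's product to a user after the user has viewed that product
     - Targeting delivery
       ‣ Infers user attributes from a DMP or similar source and recommends ads with similar attributes
     - Delivery based on past performance
       ‣ Recommends ads so that performance improves, using information on what was previously delivered to that ad slot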
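  3. Types of ad recommendation
     - Content-based delivery
       - Recommends ads based on the similarity between the media site's articles and the ads
     - Retargeting delivery
       - Recommends an advertiser's product to a user after the user has viewed that product
     - Targeting delivery
       - Infers user attributes from a DMP or similar source and recommends ads with similar attributes
     - Delivery based on past performance
       - Recommends ads so that performance improves, using information on what was previously delivered to that ad slot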
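  4. The exploration/exploitation trade-off
     - Exploration
       - Pull another arm on the guess that it might perform better
       - However, repeatedly asking "isn't there a better arm somewhere else?" reduces the number of pulls of the arms that actually perform well, so overall performance suffers
     - Exploitation
       - Keep selecting the arm whose results were best among the arms tried over some period
       - However, deciding "this arm is the best!" and selecting only that arm means a better-performing arm may never be found
     - It is important to design a policy that carries out both exploration and exploitation appropriately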
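The exploitation pitfall on this slide can be illustrated with a small deterministic sketch. Everything here is hypothetical: the function `run_greedy`, the two scripted reward streams, and the round count are made up for illustration and do not come from the deck. The policy pulls every arm once and then always picks the arm with the best observed mean.

```python
from itertools import cycle


def run_greedy(reward_streams, rounds):
    """Pure exploitation: try each arm once, then always pick the best observed mean."""
    k = len(reward_streams)
    pulls = [0] * k
    wins = [0] * k
    for t in range(rounds):
        if t < k:
            arm = t  # initial pass: pull every arm exactly once
        else:
            arm = max(range(k), key=lambda i: wins[i] / pulls[i])
        reward = next(reward_streams[arm])
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


# Hypothetical scripted rewards: arm 0 pays off half the time,
# arm 1 pays off 9 times out of 10 but happens to miss on its first pull.
streams = [cycle([1, 0]), cycle([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])]
pulls, wins = run_greedy(streams, rounds=10)
print(pulls, wins)  # [9, 1] [5, 0]
```

Arm 1 is the better arm in the long run, but one unlucky first pull is enough for the greedy policy to abandon it for good.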
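  5. Thompson Sampling
     - A policy that selects an arm at random according to "the probability that each arm has the highest expected value" (probability matching)
     - Thompson Sampling is this probabilistic framework formulated with Bayesian statistics
     - The expected value µi is assumed to come from some prior distribution; the posterior distribution of the true expected value µi is computed and used to select an arm
     - Taking ads as an example, each outcome is either "clicked" or "not clicked", so the reward follows a Bernoulli distribution (Ber(µ): µ ∈ [0, 1])
     - To keep the computation simple, the Beta distribution, which is the conjugate prior of the Bernoulli distribution, is used as the prior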
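The reason the conjugate prior keeps the computation simple can be written out as a short derivation. The symbols n (number of pulls of an arm) and m (number of rewards equal to 1) are assumed here to match the counters n_i and m_i used in the pseudocode slide; the arm index is dropped for readability.

```latex
% Prior on the click probability
\mu \sim \mathrm{Beta}(\alpha, \beta), \qquad
p(\mu) \propto \mu^{\alpha-1}(1-\mu)^{\beta-1}

% Likelihood of m rewards of 1 in n Bernoulli trials
p(x_1,\dots,x_n \mid \mu) = \mu^{m}(1-\mu)^{n-m}

% Posterior: the product has the same functional form as the prior
p(\mu \mid x_1,\dots,x_n) \propto \mu^{m+\alpha-1}(1-\mu)^{(n-m)+\beta-1}
\;\Longrightarrow\;
\mu \mid x_1,\dots,x_n \sim \mathrm{Beta}\bigl(m+\alpha,\ (n-m)+\beta\bigr)
```

Updating the posterior therefore only requires incrementing two counters per arm.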
  6. Pseudocode for Thompson Sampling
     Parameters: α > 0, β > 0
     for each arm i: n_i ← 0, m_i ← 0
     for t = 1, 2, ..., T do
         for each arm i: sample μ̂_i ~ Beta(m_i + α, (n_i − m_i) + β)
         i ← argmax_{i ∈ {1, 2, ..., K}} μ̂_i
         pull arm i and observe the reward X_i(t) ∈ {0, 1}
         n_i ← n_i + 1, m_i ← m_i + X_i(t)
     end for
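The pseudocode above could be sketched in Python roughly as follows. This is a minimal illustration, not the author's implementation: the function name `thompson_sampling`, the `pull` callback, the simulated reward rates, and the choice α = β = 1 are all assumptions. Sampling uses `betavariate` from Python's standard `random` module.

```python
import random


def thompson_sampling(pull, n_arms, rounds, alpha=1.0, beta=1.0, rng=None):
    """Beta-Bernoulli Thompson Sampling; pull(arm) must return a reward of 0 or 1."""
    rng = rng or random.Random()
    n = [0] * n_arms  # n_i: number of times arm i was pulled
    m = [0] * n_arms  # m_i: number of times arm i returned reward 1
    for _ in range(rounds):
        # Draw one sample per arm from its current posterior.
        samples = [
            rng.betavariate(m[i] + alpha, (n[i] - m[i]) + beta)
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = pull(arm)
        n[arm] += 1
        m[arm] += reward
    return n, m


# Hypothetical simulated environment: two arms with made-up reward rates.
true_rates = [0.1, 0.9]
env_rng = random.Random(0)
pulls, rewards = thompson_sampling(
    pull=lambda arm: int(env_rng.random() < true_rates[arm]),
    n_arms=len(true_rates),
    rounds=2000,
    rng=random.Random(1),
)
print(pulls, rewards)
```

As evidence accumulates, the posterior of the weaker arm concentrates on a low value, so it is sampled as the maximum less and less often while still being tried occasionally.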
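  7. Application to UZOU's ad recommendation
     - The multi-armed bandit problem is applied to UZOU as its ad recommendation algorithm
     - Each arm corresponds to an ad
     - Thompson Sampling is adopted as the policy
     - The prior and posterior are distributions over the ad's CTR
     - For each ad slot, the delivery results (CTR) are used as the prior to compute the posterior, and the ad to deliver is selected from it
     - α and β used when computing the Beta distribution are fixed heuristically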
  8. Pseudocode for TS applied to ads
     Parameters: α > 0, β > 0
     for each ad i: impression_i ← 0, click_i ← 0
     for t = 1, 2, ..., T do
         for each ad i: sample μ̂_i ~ Beta(click_i + α, (impression_i − click_i) + β)
         i ← argmax_{i ∈ {1, 2, …, K}} μ̂_i * CPC_i * 1000
         deliver ad group i and observe the reward X_i(t) ∈ {0, 1}
         impression_i ← impression_i + 1, click_i ← click_i + X_i(t)
     end for
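The selection step of the ad variant, where the sampled CTR is weighted by CPC × 1000 to give an eCPM-style score, might look like the sketch below. The `Ad` dataclass, the function names, the example counts, and the CPC values (in arbitrary currency units) are hypothetical and only serve to illustrate the scoring rule in the pseudocode.

```python
import random
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    cpc: float          # cost per click, arbitrary currency units (assumed)
    impressions: int = 0
    clicks: int = 0


def choose_ad(ads, alpha=1.0, beta=1.0, rng=random):
    """Sample a CTR per ad from its Beta posterior and pick the highest sampled eCPM."""
    def sampled_ecpm(ad):
        ctr = rng.betavariate(ad.clicks + alpha, (ad.impressions - ad.clicks) + beta)
        return ctr * ad.cpc * 1000
    return max(ads, key=sampled_ecpm)


def record(ad, clicked):
    """Update the counters after the ad was delivered once."""
    ad.impressions += 1
    ad.clicks += int(clicked)


rng = random.Random(0)
# Made-up history: ad_a has a CTR near 10%, ad_b near 1%.
ad_a = Ad("ad_a", cpc=50, impressions=100_000, clicks=10_000)
ad_b = Ad("ad_b", cpc=50, impressions=100_000, clicks=1_000)
same_cpc_pick = choose_ad([ad_a, ad_b], rng=rng).name

# With a much higher CPC, the low-CTR ad has the higher expected eCPM.
ad_b.cpc = 1000
high_cpc_pick = choose_ad([ad_a, ad_b], rng=rng).name
print(same_cpc_pick, high_cpc_pick)
```

Weighting by CPC means the policy optimizes expected revenue per thousand impressions rather than raw click probability, which matches the eCPM goal stated in the summary slide.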
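  9. Summary and future work
     - The original goals, reducing the amount of computation and improving eCPM, were achieved
     - Hyperparameter optimization
       - The values of α and β are currently defined heuristically, so the aim is to find the optimal values for each ad slot
     - Optimizing how often the selected arm is updated
       - The arm is currently updated at a fixed time interval for all ad slots, so the aim is to find the optimal update frequency for each slot
     - Arm prediction that takes CVR into account
       - Arms are currently selected using predicted CTR only, so the aim is to select arms based on predicted CVR as well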