Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
solving of multi-armed bandit problem in advertisement recommendation
Search
yoppi
April 20, 2018
2
6.8k
solving of multi-armed bandit problem in advertisement recommendation
yoppi
April 20, 2018
Tweet
Share
More Decks by yoppi
See All by yoppi
Applying oCPC algorithm for production
yoppi
2
600
recommendation system with document similarity
yoppi
0
3k
RailsはRubyだ
yoppi
0
230
Featured
See All Featured
Design by the Numbers
sachag
274
18k
Practical Orchestrator
shlominoach
182
9.7k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
34
8.9k
Typedesign – Prime Four
hannesfritz
36
2.1k
What the flash - Photography Introduction
edds
64
11k
Web development in the modern age
philhawksworth
202
10k
How GitHub Uses GitHub to Build GitHub
holman
468
290k
Agile that works and the tools we love
rasmusluckow
325
20k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
274
13k
Adopting Sorbet at Scale
ufuk
68
8.6k
How to train your dragon (web standard)
notwaldorf
73
5.2k
Docker and Python
trallard
34
2.7k
Transcript
ࠂϨίϝϯυʹ͓͚Δ ଟόϯσΟοτͷద༻ͱղ๏ )JSPLB[V:PTIJEB!ZPQQJCMPH 4QFFF *OD ΦϨγΧφΠτ
import “github.com/yoppi” w !ZPQQJCMPH4QFFF *OD w 6;06UFDIMFBEFOHJOFFS w ϨίϝϯυΤϯδϯ࡞ͬͨΓ w
Ξυαʔό࡞ͬͨΓ w ࠂۀք͘Β͍ʢಈըࠂɺωΠςΟϒΞυʣ w (Pɺ+BWB4DSJQUɺ3VCZ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυ ࠂೖߘ ࠂ৴ ΞυωοτϫʔΫ ʜ ʜ ϝσΟΞ ࠂཧళ
6;06ͷࠂϨίϝϯυ هࣄɾίϯςϯπ ʮ͋ͳͨʹ͓͢͢Ίͷهࣄʯ ϝσΟΞͷهࣄԼ෦ʹ ΟδΣοτͱ͍͏ܗͰ ࠂΛ৴͢Δ
6;06ͷࠂϨίϝϯυ
6;06ͷࠂϨίϝϯυ ࠂ ࠂ ࠂ/ ͋ΔࠂͰԿΛ৴͢ΕΫϦοΫͯ͘͠ΕΔͷ͔ʁ ʜ
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ ‣ ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ ‣ ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ ‣ ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ ‣ ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓճͷϨίϝϯυΤϯδϯͷత w 6;06ͷച্ʢF$1.ʣΛ্͍͛ͨ w ܭࢉྔΛݮΒ͠ޮΑ͘ϨίϝϯυγεςϜΛӡ༻ ͍ͨ͠
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ w ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ w ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ w ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ w ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ଟόϯσΟοτ ͕Kݸ ࣌ؒtʹ͓͍ͯi(t)Λબ͠ ใुXi(t)ΛಘΔ ͦΕΛ܁Γฦͯ͠ใुΛ࠷େ Խ͍ͤͨ͞ i(t) Xi(t)
ଟόϯσΟοτ w બࢶͷू߹͔ΒҰͭΛબͨ͠ͱ͖ʹɺͦͷબࢶʹ ର͢ΔใुΛಘΔ͕ଞͷબࢶͷใुใಘΒΕͳ͍ w ͦͷঢ়گԼͰ܁Γฦ͠બͨ͠ͱ͖ʹใुͷΛ࠷େԽ ͢Δ͜ͱΛࢦ͢ ‣ iͷਅͷظµi Λ͍ͬͯΔͳΒظ͕࠷େʹͳ
ΔΛҾ͖ଓ͚Ε͍͍͕ͦΕલఏ݅ͱͯ͠Ͱ͖ ͳ͍ w ͦΕ·ͰʹಘͨใुΛͱʹબࢶΛܾఆ͢ΔઓུΛํ ࡦʢQPMJDZʣͱݺͿ
ଟόϯσΟοτ ʲྫʳ εϩοτϚγϯΛճ(T= 100) Ҿ͚Δͱ͖ʹ ใुΛ࠷େԽͤ͞Δʹʁ K = 4
ଟόϯσΟοτ K = 4 ʲࡶͳํࡦʳ nճશ෦ͷΛҾ͍ͯՌͷྑ͔ͬͨΛɺ Γͷճ(100-4n) Ҿ͖ଓ͚Δ
ଟόϯσΟοτ K = 4 n = 3 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ͔͔ͨͩճͷ୳ࡧͰਅͷՌΛݕূ͢Δ͜ͱ ͦ͠͏
i* = argmaxi∈{1, 2, 3, 4}µi
ଟόϯσΟοτ K = 4 n = 20 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ใुͷྑ͍Λར༻͢Δճ͕গͳ͘ɺ ใुΛ࠷େԽͤ͞Δ͜ͱ͍͔͠
i* = argmaxi∈{1, 2, 3, 4}µi
୳ࡧͱࣝར༻ͷτϨʔυΦϑ w ୳ࡧʢFYQMPSBUJPOʣ w ଞͷͷ΄͏͕Ռ͕ྑ͍ͷͰͱ༧ଌͯ͠Ҿ͍ͯΈΔ w ͨͩɺʮͬͱଞʹྑ͍͕͋ΔͷͰʁʯΛ܁Γฦ͢ͱɺՌ ͷ͍͍ΛҾ͘ճ͕গͳ͘ͳΓύϑΥʔϚϯεΛͩ͢͜ͱ͕Ͱ͖ ͳ͍ w
ࣝར༻ʢFYQMPJUBUJPOʣ w ͋Δظؒͷ͍͔ͭ͘ͷͷՌ͕࠷ྑ͍ͷΛ͞Βʹબ͠ଓ͚ Δ w ͨͩɺʮ͜ͷ࠷ߴʂʯͱͳͬͯͦΕ͔Γબ͍ͯ͠Δͱɺଞʹ ͬͱύϑΥʔϚϯε͕ྑ͍Λݟ͚ͭΒΕͳ͍ w ୳ࡧͱࣝར༻ΛͦΕͧΕదʹ࣮ߦ͢ΔํࡦΛݕ౼͢Δ͜ͱ͕ॏཁ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w ୯७ͳ૯Γ݁Ռ͔Βྑ͔ͬͨͷΛҾ͖ଓ͚Δ w 6$#ʢ6QQFSDPOpEFODFCPVOEʣ w ඪຊฏۉʹิਖ਼߲ΛՃ͑ͨͷΛείΞͱͯ͠ΛબͿ w
54ʢ5IPNQTPO4BNQMJOHʣ w աڈʹ؍ଌͰ͖ͨ݁Ռ͔ΒϕΠζ౷ܭͰະདྷͷظΛ༧ ଌ͢Δ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w 6$# w 5IPNQTPO4BNQMJOH
5IPNQTPO4BNQMJOH w ʮ֤͕ظ࠷େͱͳΔ֬ʯΛͱʹϥϯμϜͰΛબ͢Δํ ࡦʢ֬Ұக๏ʣ w ͜ͷ֬ͷΈΛϕΠζ౷ܭͰఆࣜԽͨ͠ͷ͕5IPNQTPO 4BNQMJOH w ظµi ͕ԿΒ͔ͷࣄલ͔ΒΓཱ͍ͬͯΔͱ͠ɺਅͷظµi
ͷࣄޙΛٻΊΛબ͢Δ w ྫ͑ࠂΛྫʹڍ͛Δͱɺࠂ͕ΫϦοΫ͞ΕͨɺΫϦοΫ͞ Εͳ͔ͬͨͱ͍͏ʹྨͰ͖ΔͷͰࣄલϕϧψʔΠ ʢBer(µ): µ∈[0, 1]ʣͱͳΔ w ܭࢉΛ؆ૉʹ͢ΔͨΊɺϕϧψʔΠͷڞࣄલͰ͋Δ#FUB Λ༻͍Δ
5IPNQTPO4BNQMJOHͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤iʹ͍ͭͯni ←0, mi ←0 2:
for i = 1, 2,...,T do 3: ^μi = Beta(mi+α, (ni-mi)+β) 4: i ← argmaxi∈{1,2,...,K}^μi 5: iΛͻ͍ͯใुXi(t)∈{0,1}Λ؍ଌ͢Δ 6: ni ← ni + 1, mi ← mi + Xi(t) 7: end for
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυͷద༻ w ଟόϯσΟοτΛࠂϨίϝϯυΞϧΰϦζϜͱͯ͠ 6;06ద༻ w ͕ࠂʹରԠ͢Δ w ํࡦͱͯ͠5IPNQTPO4BNQMJOHΛ࠾༻ w ࣄલͱࣄޙͦͷࠂͷ$53ͷ
w ຖͷ৴݁Ռ $53 Λࣄલͱͯ͠ࣄޙΛܭࢉ͠ɺ ৴͢ΔࠂΛબ͢Δ w #FUBͷܭࢉ࣌ʹ͓͚ΔЋ ЌώϡʔϦεςΟοΫʹܾΊ ଧͪ
5IPNQTPO4BNQMJOHΛબΜͩ༁ w όονߋ৽ʹؤ݈Ͱ͋Δ w ຊ൪ӡ༻͢ΔࡍʹσʔλαΠζ͔Β͘Δܭࢉྔͷ੍ʹΑΓ ใु݁ՌΛϦΞϧλΠϜͰ؍ଌ͢Δ͜ͱ͕͘͠Ԇͯ͠ߋ৽ ͢Δ͜ͱ͕ଟ͍ w ͕ɺͷબ࣌ʹʮظ࠷େͱͳΔ֬ʹ͕ͨͬͯ͠ϥϯμ ϜͰબ͢ΔʯͨΊɺՌͷѱ͍ʹภΔ͜ͱ͕গͳ͍
w ܭࢉྔ͕গͳ͍ w جຊతͳϩδοΫࣄલ͔ΒࣄޙΛ#FUBΛܭࢉ͢ Δ͚ͩͰΑ͍ͷͰܭࢉྔΛ͑ΒΕΔ w ଟ͘ͷϞσϧͰཧݶքΛୡͰ͖Δ͜ͱ͕ΒΕ͍ͯΔ
54Λࠂʹద༻ͨ͠߹ͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤ࠂiʹ͍ͭͯ impressioni ←0, clicki ←0
2: for i = 1, 2,...,T do 3: ^μi = Beta(clicki+α, (impressioni-clicki)+β) 4: i←argmaxi∈{1,2,…,K}^μi*CPCi*1000 5: ࠂάϧʔϓiΛͻ͍ͯใुXi(t)∈{0, 1} Λ؍ଌ͢Δ 6: impressioni ←impressioni+1, clicki ←clicki+Xi(t) 7: end for
Ћ ЌͬͯͲ͏ܾΊΕ͍͍ͷʁ w ࣄલʢ৽نʹೖߘ͞Εͨࠂʣ͕ͳ͍ࠂʹ͓͍ ͯҙຯΛͳ͢ w ؾΛ͚ͭΔ͖ͱ͜ΖฏۉͱࢄͰΦʔμʔ͕มΘ Δ
ϨίϝϯυγεςϜߏ ࠂϚελ ৴ϩά ৴αʔόɾ࣮ੵूܭαʔό ৴σʔλ 0O.FN $BDIF ৴σʔλ࡞όον 0O.FN $BDIF
.FEJB "EWFSUJTFS
(PWT1ZUIPO w ݁ہͷͱ͜Ζ5IPNQTPO4BNQMJOHࣄલ͔Βࣄ ޙΛٻΊΔͱ͖ʹ#FUBΛܭࢉ͢Δ͚ͩͰΑ͍ w ߴͳֶతͳܭࢉٻΊΒΕͳ͍ w ѹతʹ(PͰܭࢉ͢Δ΄͏͕ճͷόονʹΑΔ࣮ߦ ࣌ؒΛ͘Ͱ͖Δ w
όϯσΟοτΛղ͘ͱ͖ʹͷߋ৽ִؒΛͳ Δ͘খ͘͞ͳΔ΄͏͕ྑ͍ w ͦͷͨΊ(PΛ࠾༻࣮ͯ͠
(PWT1ZUIPO ࣮ߦ࣌ؒ T /
(P 1ZUIPO
ద༻݁Ռ w 6;06Λಋೖ͍ͯ͠ΔϝσΟΞʹద༻ w ৴͍ͯ͠Δɺ͔ͭɺJNQSFTTJPO͕ଟ͍ϝσΟΞ w 6;06શମͷF$1.Λ΄Ͳ্Ͱ͖ͨ
·ͱΊͱࠓޙͷ՝ w తͱ͍ͯͨ͠ɺܭࢉྔͷݮٴͼF$1.ͷ্ΛୡͰ͖ͨ w ϋΠύʔύϥϝʔλͷ࠷దԽ w Ћ ЌͷΛࠓώϡʔϦεςΟοΫʹఆ͍ٛͯ͠Δঢ়ଶͳͷͰ ͝ͱͷ࠷దͳΛٻΊ͍ͨ w
બ͢Δͷߋ৽සͷ࠷దԽ w શͯͷʹ͓͍ͯ࣌ؒݻఆͰΛߋ৽͍ͯ͠ΔͷͰ͝ͱͷ࠷ దͳߋ৽සΛٻΊ͍ͨ w $73ΛՃຯͨ͠ͷ༧ଌ w $53ͷΈ༧ଌͨ͠ͷͰΛબΜͰ͍ΔͷͰ$73༧ଌ౿·͑ Λબ͍ͨ͠
ࢀߟจݙ w ྛ ΞυωοτϫʔΫʹ͓͚Δࠂ৴ܭըͷ ࠷దԽ ਓೳֶձࢽ7PM/P w ଟόϯσΟοτͷཧͱΞϧΰϦζϜ IUUQJCJTNMPSHBSDIJWFJCJT JCJT@CBOEJUQEGT
w ຊଟ३ʰόϯσΟοτͷཧͱΞϧΰϦζ Ϝʱߨஊࣾ