Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
solving of multi-armed bandit problem in advert...
Search
yoppi
April 20, 2018
2
7.7k
solving of multi-armed bandit problem in advertisement recommendation
yoppi
April 20, 2018
Tweet
Share
More Decks by yoppi
See All by yoppi
Applying oCPC algorithm for production
yoppi
2
790
recommendation system with document similarity
yoppi
0
3.2k
RailsはRubyだ
yoppi
0
240
Featured
See All Featured
Designing Experiences People Love
moore
142
24k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
8
710
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
130
19k
The World Runs on Bad Software
bkeepers
PRO
70
11k
4 Signs Your Business is Dying
shpigford
184
22k
The Language of Interfaces
destraynor
158
25k
Raft: Consensus for Rubyists
vanstee
140
7k
The Cost Of JavaScript in 2023
addyosmani
51
8.6k
The Straight Up "How To Draw Better" Workshop
denniskardys
235
140k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Transcript
ࠂϨίϝϯυʹ͓͚Δ ଟόϯσΟοτͷద༻ͱղ๏ )JSPLB[V:PTIJEB!ZPQQJCMPH 4QFFF *OD ΦϨγΧφΠτ
import “github.com/yoppi” w !ZPQQJCMPH4QFFF *OD w 6;06UFDIMFBEFOHJOFFS w ϨίϝϯυΤϯδϯ࡞ͬͨΓ w
Ξυαʔό࡞ͬͨΓ w ࠂۀք͘Β͍ʢಈըࠂɺωΠςΟϒΞυʣ w (Pɺ+BWB4DSJQUɺ3VCZ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυ ࠂೖߘ ࠂ৴ ΞυωοτϫʔΫ ʜ ʜ ϝσΟΞ ࠂཧళ
6;06ͷࠂϨίϝϯυ هࣄɾίϯςϯπ ʮ͋ͳͨʹ͓͢͢Ίͷهࣄʯ ϝσΟΞͷهࣄԼ෦ʹ ΟδΣοτͱ͍͏ܗͰ ࠂΛ৴͢Δ
6;06ͷࠂϨίϝϯυ
6;06ͷࠂϨίϝϯυ ࠂ ࠂ ࠂ/ ͋ΔࠂͰԿΛ৴͢ΕΫϦοΫͯ͘͠ΕΔͷ͔ʁ ʜ
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ ‣ ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ ‣ ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ ‣ ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ ‣ ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓճͷϨίϝϯυΤϯδϯͷత w 6;06ͷച্ʢF$1.ʣΛ্͍͛ͨ w ܭࢉྔΛݮΒ͠ޮΑ͘ϨίϝϯυγεςϜΛӡ༻ ͍ͨ͠
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ w ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ w ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ w ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ w ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ଟόϯσΟοτ ͕Kݸ ࣌ؒtʹ͓͍ͯi(t)Λબ͠ ใुXi(t)ΛಘΔ ͦΕΛ܁Γฦͯ͠ใुΛ࠷େ Խ͍ͤͨ͞ i(t) Xi(t)
ଟόϯσΟοτ w બࢶͷू߹͔ΒҰͭΛબͨ͠ͱ͖ʹɺͦͷબࢶʹ ର͢ΔใुΛಘΔ͕ଞͷબࢶͷใुใಘΒΕͳ͍ w ͦͷঢ়گԼͰ܁Γฦ͠બͨ͠ͱ͖ʹใुͷΛ࠷େԽ ͢Δ͜ͱΛࢦ͢ ‣ iͷਅͷظµi Λ͍ͬͯΔͳΒظ͕࠷େʹͳ
ΔΛҾ͖ଓ͚Ε͍͍͕ͦΕલఏ݅ͱͯ͠Ͱ͖ ͳ͍ w ͦΕ·ͰʹಘͨใुΛͱʹબࢶΛܾఆ͢ΔઓུΛํ ࡦʢQPMJDZʣͱݺͿ
ଟόϯσΟοτ ʲྫʳ εϩοτϚγϯΛճ(T= 100) Ҿ͚Δͱ͖ʹ ใुΛ࠷େԽͤ͞Δʹʁ K = 4
ଟόϯσΟοτ K = 4 ʲࡶͳํࡦʳ nճશ෦ͷΛҾ͍ͯՌͷྑ͔ͬͨΛɺ Γͷճ(100-4n) Ҿ͖ଓ͚Δ
ଟόϯσΟοτ K = 4 n = 3 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ͔͔ͨͩճͷ୳ࡧͰਅͷՌΛݕূ͢Δ͜ͱ ͦ͠͏
i* = argmaxi∈{1, 2, 3, 4}µi
ଟόϯσΟοτ K = 4 n = 20 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ใुͷྑ͍Λར༻͢Δճ͕গͳ͘ɺ ใुΛ࠷େԽͤ͞Δ͜ͱ͍͔͠
i* = argmaxi∈{1, 2, 3, 4}µi
୳ࡧͱࣝར༻ͷτϨʔυΦϑ w ୳ࡧʢFYQMPSBUJPOʣ w ଞͷͷ΄͏͕Ռ͕ྑ͍ͷͰͱ༧ଌͯ͠Ҿ͍ͯΈΔ w ͨͩɺʮͬͱଞʹྑ͍͕͋ΔͷͰʁʯΛ܁Γฦ͢ͱɺՌ ͷ͍͍ΛҾ͘ճ͕গͳ͘ͳΓύϑΥʔϚϯεΛͩ͢͜ͱ͕Ͱ͖ ͳ͍ w
ࣝར༻ʢFYQMPJUBUJPOʣ w ͋Δظؒͷ͍͔ͭ͘ͷͷՌ͕࠷ྑ͍ͷΛ͞Βʹબ͠ଓ͚ Δ w ͨͩɺʮ͜ͷ࠷ߴʂʯͱͳͬͯͦΕ͔Γબ͍ͯ͠Δͱɺଞʹ ͬͱύϑΥʔϚϯε͕ྑ͍Λݟ͚ͭΒΕͳ͍ w ୳ࡧͱࣝར༻ΛͦΕͧΕదʹ࣮ߦ͢ΔํࡦΛݕ౼͢Δ͜ͱ͕ॏཁ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w ୯७ͳ૯Γ݁Ռ͔Βྑ͔ͬͨͷΛҾ͖ଓ͚Δ w 6$#ʢ6QQFSDPOpEFODFCPVOEʣ w ඪຊฏۉʹิਖ਼߲ΛՃ͑ͨͷΛείΞͱͯ͠ΛબͿ w
54ʢ5IPNQTPO4BNQMJOHʣ w աڈʹ؍ଌͰ͖ͨ݁Ռ͔ΒϕΠζ౷ܭͰະདྷͷظΛ༧ ଌ͢Δ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w 6$# w 5IPNQTPO4BNQMJOH
5IPNQTPO4BNQMJOH w ʮ֤͕ظ࠷େͱͳΔ֬ʯΛͱʹϥϯμϜͰΛબ͢Δํ ࡦʢ֬Ұக๏ʣ w ͜ͷ֬ͷΈΛϕΠζ౷ܭͰఆࣜԽͨ͠ͷ͕5IPNQTPO 4BNQMJOH w ظµi ͕ԿΒ͔ͷࣄલ͔ΒΓཱ͍ͬͯΔͱ͠ɺਅͷظµi
ͷࣄޙΛٻΊΛબ͢Δ w ྫ͑ࠂΛྫʹڍ͛Δͱɺࠂ͕ΫϦοΫ͞ΕͨɺΫϦοΫ͞ Εͳ͔ͬͨͱ͍͏ʹྨͰ͖ΔͷͰࣄલϕϧψʔΠ ʢBer(µ): µ∈[0, 1]ʣͱͳΔ w ܭࢉΛ؆ૉʹ͢ΔͨΊɺϕϧψʔΠͷڞࣄલͰ͋Δ#FUB Λ༻͍Δ
5IPNQTPO4BNQMJOHͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤iʹ͍ͭͯni ←0, mi ←0 2:
for i = 1, 2,...,T do 3: ^μi = Beta(mi+α, (ni-mi)+β) 4: i ← argmaxi∈{1,2,...,K}^μi 5: iΛͻ͍ͯใुXi(t)∈{0,1}Λ؍ଌ͢Δ 6: ni ← ni + 1, mi ← mi + Xi(t) 7: end for
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυͷద༻ w ଟόϯσΟοτΛࠂϨίϝϯυΞϧΰϦζϜͱͯ͠ 6;06ద༻ w ͕ࠂʹରԠ͢Δ w ํࡦͱͯ͠5IPNQTPO4BNQMJOHΛ࠾༻ w ࣄલͱࣄޙͦͷࠂͷ$53ͷ
w ຖͷ৴݁Ռ $53 Λࣄલͱͯ͠ࣄޙΛܭࢉ͠ɺ ৴͢ΔࠂΛબ͢Δ w #FUBͷܭࢉ࣌ʹ͓͚ΔЋ ЌώϡʔϦεςΟοΫʹܾΊ ଧͪ
5IPNQTPO4BNQMJOHΛબΜͩ༁ w όονߋ৽ʹؤ݈Ͱ͋Δ w ຊ൪ӡ༻͢ΔࡍʹσʔλαΠζ͔Β͘Δܭࢉྔͷ੍ʹΑΓ ใु݁ՌΛϦΞϧλΠϜͰ؍ଌ͢Δ͜ͱ͕͘͠Ԇͯ͠ߋ৽ ͢Δ͜ͱ͕ଟ͍ w ͕ɺͷબ࣌ʹʮظ࠷େͱͳΔ֬ʹ͕ͨͬͯ͠ϥϯμ ϜͰબ͢ΔʯͨΊɺՌͷѱ͍ʹภΔ͜ͱ͕গͳ͍
w ܭࢉྔ͕গͳ͍ w جຊతͳϩδοΫࣄલ͔ΒࣄޙΛ#FUBΛܭࢉ͢ Δ͚ͩͰΑ͍ͷͰܭࢉྔΛ͑ΒΕΔ w ଟ͘ͷϞσϧͰཧݶքΛୡͰ͖Δ͜ͱ͕ΒΕ͍ͯΔ
54Λࠂʹద༻ͨ͠߹ͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤ࠂiʹ͍ͭͯ impressioni ←0, clicki ←0
2: for i = 1, 2,...,T do 3: ^μi = Beta(clicki+α, (impressioni-clicki)+β) 4: i←argmaxi∈{1,2,…,K}^μi*CPCi*1000 5: ࠂάϧʔϓiΛͻ͍ͯใुXi(t)∈{0, 1} Λ؍ଌ͢Δ 6: impressioni ←impressioni+1, clicki ←clicki+Xi(t) 7: end for
Ћ ЌͬͯͲ͏ܾΊΕ͍͍ͷʁ w ࣄલʢ৽نʹೖߘ͞Εͨࠂʣ͕ͳ͍ࠂʹ͓͍ ͯҙຯΛͳ͢ w ؾΛ͚ͭΔ͖ͱ͜ΖฏۉͱࢄͰΦʔμʔ͕มΘ Δ
ϨίϝϯυγεςϜߏ ࠂϚελ ৴ϩά ৴αʔόɾ࣮ੵूܭαʔό ৴σʔλ 0O.FN $BDIF ৴σʔλ࡞όον 0O.FN $BDIF
.FEJB "EWFSUJTFS
(PWT1ZUIPO w ݁ہͷͱ͜Ζ5IPNQTPO4BNQMJOHࣄલ͔Βࣄ ޙΛٻΊΔͱ͖ʹ#FUBΛܭࢉ͢Δ͚ͩͰΑ͍ w ߴͳֶతͳܭࢉٻΊΒΕͳ͍ w ѹతʹ(PͰܭࢉ͢Δ΄͏͕ճͷόονʹΑΔ࣮ߦ ࣌ؒΛ͘Ͱ͖Δ w
όϯσΟοτΛղ͘ͱ͖ʹͷߋ৽ִؒΛͳ Δ͘খ͘͞ͳΔ΄͏͕ྑ͍ w ͦͷͨΊ(PΛ࠾༻࣮ͯ͠
(PWT1ZUIPO ࣮ߦ࣌ؒ T /
(P 1ZUIPO
ద༻݁Ռ w 6;06Λಋೖ͍ͯ͠ΔϝσΟΞʹద༻ w ৴͍ͯ͠Δɺ͔ͭɺJNQSFTTJPO͕ଟ͍ϝσΟΞ w 6;06શମͷF$1.Λ΄Ͳ্Ͱ͖ͨ
·ͱΊͱࠓޙͷ՝ w తͱ͍ͯͨ͠ɺܭࢉྔͷݮٴͼF$1.ͷ্ΛୡͰ͖ͨ w ϋΠύʔύϥϝʔλͷ࠷దԽ w Ћ ЌͷΛࠓώϡʔϦεςΟοΫʹఆ͍ٛͯ͠Δঢ়ଶͳͷͰ ͝ͱͷ࠷దͳΛٻΊ͍ͨ w
બ͢Δͷߋ৽සͷ࠷దԽ w શͯͷʹ͓͍ͯ࣌ؒݻఆͰΛߋ৽͍ͯ͠ΔͷͰ͝ͱͷ࠷ దͳߋ৽සΛٻΊ͍ͨ w $73ΛՃຯͨ͠ͷ༧ଌ w $53ͷΈ༧ଌͨ͠ͷͰΛબΜͰ͍ΔͷͰ$73༧ଌ౿·͑ Λબ͍ͨ͠
ࢀߟจݙ w ྛ ΞυωοτϫʔΫʹ͓͚Δࠂ৴ܭըͷ ࠷దԽ ਓೳֶձࢽ7PM/P w ଟόϯσΟοτͷཧͱΞϧΰϦζϜ IUUQJCJTNMPSHBSDIJWFJCJT JCJT@CBOEJUQEGT
w ຊଟ३ʰόϯσΟοτͷཧͱΞϧΰϦζ Ϝʱߨஊࣾ