Slide 1

Slide 1 text

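Doctoral dissertation: Information Systems That Adapt to Diverse and Continuously Changing Environments
Yusuke Miyake / Department of Advanced Information Technology, Graduate School of Information Science and Electrical Engineering, Kyushu University
2024.08.22, public hearing for the doctoral dissertation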

Slide 2

Slide 2 text

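Contents
1. Introduction
2. Design of an information system that continuously optimizes selection
3. Adaptation to diverse and continuously changing environments
4. Accelerating adaptation
5. Handling nonlinearity
6. Summary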

Slide 3

Slide 3 text

1. Introduction: background, challenges, and objectives of this research

Slide 4

Slide 4 text

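Biography
• After working at a local system integrator (SIer) in Fukuoka, joined paperboy&co., Inc. (now GMO Pepabo, Inc.) in 2012. Developed, operated, and maintained web applications for asset management systems and Internet services.
• Researcher at the same company since 2017, working on topics such as autonomous adaptation of information systems.
• Since October 2020: doctoral program, Graduate School of Information Science and Electrical Engineering, Kyushu University (Department of Advanced Information Technology)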

Slide 5

Slide 5 text

Information systems and environmental change

Slide 6

Slide 6 text

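Information systems and environmental change
• In a diverse and continuously changing environment, an information system keeps functioning only if its configuration and logic are updated to follow the changes
• → Examples: the load on the information system, changes in user behavior
• Until now, operators have carried out this follow-up as part of operations and maintenance work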

Slide 7

Slide 7 text

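Information systems and environmental change
• In a diverse and continuously changing environment, an information system keeps functioning only if its configuration and logic are updated to follow the changes
• → Examples: the load on the information system, changes in user behavior
• Until now, operators have carried out this follow-up as part of operations and maintenance work
• Detecting environmental change and updating the information system by hand introduces a time lag before the system catches up
• The result is lower stability and user satisfaction and a heavier burden on operators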

Slide 8

Slide 8 text

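Toward information systems that adapt to environmental change by themselves
• Conventional operations and maintenance adapts an information system to environmental change only through human rules of thumb and judgment plus partial automation
• → Examples: thresholds set by rule of thumb, decisions based on a quantified achievement rate of user actions
Research concept
• Study adaptive mechanisms in which the steps of human judgment and updating are automated, so that the information system itself captures environmental change and follows it
• Apply these mechanisms to production operation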

Slide 9

Slide 9 text

Information systems that adapt to environmental change by themselves

Slide 10

Slide 10 text

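Approach toward realizing adaptive information systems
• Conventional system development designs a function f that determines the output y for an input x from the user: y = f(x)
• Detecting environmental change and updating the information system by hand introduces a time lag before the system catches up
• The result is lower stability and user satisfaction and a heavier burden on operators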

Slide 11

Slide 11 text

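Approach toward realizing adaptive information systems
• Machine learning automates the "design of the function" y = f(x) (it derives the parameters from data)
① Define the correspondence between inputs and outputs
② Define the deviation of a prediction
③ Minimize the deviation of the predictions on the training data
④ The parameters are determined (the function has been designed from data)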

Slide 12

Slide 12 text

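Approach toward realizing adaptive information systems
• The designed function behaves according to the data available up to a certain point in time (interpolation)
• The environment surrounding a service is considered to change continuously, so the function must be redesigned continuously (extrapolation must be taken into account)
• Move to a mechanism for the whole system that continuously redesigns a function that is effective at one point in time (or replaces it with another function) in response to changes in the service and user environment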

Slide 13

Slide 13 text

Optimizing selection

Slide 14

Slide 14 text

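Challenges toward adaptive information systems, seen through a concrete example
• Recommender systems are introduced to solve the information overload problem in information systems
• A recommender system proposes, out of many options, those a user is likely to be interested in, based on some machine learning model (= recommendation method)
[Figure: Users, Recommendation Method, Products]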

Slide 15

Slide 15 text

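Challenges toward adaptive information systems, seen through a concrete example
• Recommender systems are introduced to solve the information overload problem in information systems
• A recommender system proposes, out of many options, those a user is likely to be interested in, based on some machine learning model (= recommendation method)
• Many recommendation methods have been proposed
[Figure: Users, Recommendation Methods, Products]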

Slide 16

Slide 16 text

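Challenges toward adaptive information systems, seen through a concrete example
• Recommender systems are introduced to solve the information overload problem in information systems
• A recommender system proposes, out of many options, those a user is likely to be interested in, based on some machine learning model (= recommendation method)
• Many recommendation methods have been proposed → effective "selection of the recommendation method" is important
[Figure: Users, Best Recommendation Method, Products]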

Slide 17

Slide 17 text

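Challenges toward adaptive information systems, seen through a concrete example
• Recommender systems are introduced to solve the information overload problem in information systems
• A recommender system proposes, out of many options, those a user is likely to be interested in, based on some machine learning model (= recommendation method)
• Many recommendation methods have been proposed → effective "selection of the recommendation method" is important
Operational challenges
• Which recommendation method is effective depends on the situation
• However, evaluating recommendation methods continuously in the production environment entails opportunity loss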

Slide 18

Slide 18 text

Challenges in optimizing selection

Slide 19

Slide 19 text

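Opportunity loss caused by evaluation in the production environment
• Comparative evaluation in the production environment carries the risk of opportunity loss
• If the candidates are tried evenly, opportunity loss occurs for as long as clearly inferior candidates are in use
• If candidates are discarded too early, a truly effective candidate may be eliminated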

Slide 20

Slide 20 text

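The multi-armed bandit problem
• The problem of maximizing the reward obtained from multiple candidates called "arms"
• In each trial the player selects one arm and receives a reward
• Each arm generates rewards according to its own reward distribution
• The player, however, has to infer this reward distribution from the results of the trials
[Figure: select, reward, infer] The term "arm" comes from the arm of a slot machine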

Slide 21

Slide 21 text

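The multi-armed bandit problem
• The problem of maximizing the reward obtained from multiple candidates called "arms"
• In each trial the player selects one arm and receives a reward
• Each arm generates rewards according to its own reward distribution
• The player, however, has to infer this reward distribution from the results of the trials
• Based on the evaluation of the arms at a given time, the player carries out "exploitation" and "exploration" in parallel
• Various policies have been proposed to resolve this trade-off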

Slide 22

Slide 22 text

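Multi-armed bandit policies
• ε-Greedy, the simplest policy, selects an arm uniformly at random with ratio ε (exploration) and selects the arm with the largest average reward at that time with ratio 1 − ε (exploitation)
[Figure: Bandit vs. A/B testing. The arm $\operatorname{argmax}_{l=1,\dots,L} \hat{y}^{(l)}$ is chosen with ratio $1-\epsilon$; every arm $a \in A$ is chosen with ratio $\epsilon/L$]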
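The ε-Greedy rule above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the dissertation; the function names `epsilon_greedy_select` and `update_mean` are hypothetical.

```python
import random


def epsilon_greedy_select(mean_rewards, epsilon, rng=random):
    """Return the index of the arm to pull.

    mean_rewards: current average reward of each arm (list of floats).
    epsilon: exploration ratio in [0, 1].
    """
    if rng.random() < epsilon:
        # Explore: pick any arm uniformly, so each arm gets ratio epsilon / L.
        return rng.randrange(len(mean_rewards))
    # Exploit: pick the arm with the largest current average reward.
    return max(range(len(mean_rewards)), key=lambda l: mean_rewards[l])


def update_mean(mean, count, reward):
    """Incrementally update one arm's average reward after a new observation."""
    count += 1
    return mean + (reward - mean) / count, count
```

With `epsilon = 0` the rule is purely greedy, and with `epsilon = 1` it behaves like an even A/B split across all arms.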

Slide 23

Slide 23 text

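Adaptive information systems using multi-armed bandit policies
[Figure: User(s), System, Exploitation and Exploration using Multi-armed bandits, Feedback, Estimation, Truth, t-1]
• A multi-armed bandit policy reduces the opportunity loss incurred when selecting machine learning models
• → This removes a barrier to applying adaptive information systems in production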

Slide 24

Slide 24 text

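Adaptive information systems using multi-armed bandit policies
[Figure: User(s), System, Exploitation and Exploration using Multi-armed bandits, Feedback, Estimation, Truth, t-1]
• A multi-armed bandit policy reduces the opportunity loss incurred when selecting machine learning models
• → This removes a barrier to applying adaptive information systems in production
Practical policies that take the characteristics of machine learning models into account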

Slide 25

Slide 25 text

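Characteristic (1): usefulness varies with context and the passage of time
• The usefulness of a machine learning model varies with context and with the passage of time
• The usefulness of a recommendation method changes as time passes
• Usefulness differs for each product category to which the recommendation method is applied
[Figure panels: all products, accessory category, food category]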

Slide 26

Slide 26 text

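Characteristic (2): the time that adaptation takes affects usefulness
• The execution time of the policy is added to the execution time of the machine learning model
• The response time of the machine learning model directly affects the user experience
• If the policy takes long to learn, it cannot follow changes in user preferences
[Figure: Request, MAB Policy, ML Model, Response, up to 100 ms; t=1, t=10, t=100, t=1000]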

Slide 27

Slide 27 text

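Characteristic (3): complex relationships in estimating usefulness
• The variety and volume of data handled by information systems are increasing
• From structured data such as demographic attributes to unstructured data such as images and natural language
• A linear policy cannot adequately handle complex relationships such as nonlinear ones
• [Figure] Examples of reward distributions over context information $x = (x_1, x_2)^\top$
• For the left example a policy assuming linearity ($\hat{y} = x^\top w$) is considered suitable; for the right, one assuming nonlinearity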

Slide 28

Slide 28 text

Objective of this research

Slide 29

Slide 29 text

Overview of this research
[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]

Slide 30

Slide 30 text

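[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]
[48] Yusuke Miyake, Tsunenori Mine, Synapse: A recommender system that continuously optimizes the selection of recommendation methods according to context (in Japanese), IEICE Transactions on Information and Systems (Japanese Edition), Vol.J103-D, No.11, pp.764-775, Nov. 2020.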

Slide 31

Slide 31 text

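[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]
[49] Yusuke Miyake, Tsunenori Mine, Synapse: A meta-recommender system that optimizes the selection of recommendation methods according to context and the passage of time (in Japanese), IEICE Transactions on Information and Systems (Japanese Edition), Vol.J105-D, No.11, pp.641-652, Nov. 2022.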

Slide 32

Slide 32 text

[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]
[50] Yusuke Miyake, Tsunenori Mine, Contextual and Nonstationary Multi-armed Bandits Using the Linear Gaussian State Space Model for the Meta-Recommender System, 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp.3138-3145, Oct. 2023.

Slide 33

Slide 33 text

[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]
[51] Yusuke Miyake, Ryuji Watanabe, Tsunenori Mine, Online Nonstationary and Nonlinear Bandits with Recursive Weighted Gaussian Process, The 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC 2024), pp.11-20, Jul. 2024.

Slide 34

Slide 34 text

2. Design of an information system that continuously optimizes selection

Slide 35

Slide 35 text

Overview of this research
[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]

Slide 36

Slide 36 text

Related work: efforts in the field and the position of this research relative to prior work

Slide 37

Slide 37 text

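The need for online evaluation
• Offline evaluation cannot completely eliminate the gap from evaluation results in the production environment
• At the online accommodation booking site Booking.com, the usefulness of a machine learning model cannot always be judged accurately offline, so offline evaluation is used as a health check for performance degradation [13]
• Adaptive information systems using multi-armed bandit policies
• Many related studies [26,27,28,29] assume a single policy, so they cannot sufficiently reduce the opportunity loss that arises from the characteristics of machine learning models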

Slide 38

Slide 38 text

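AdaptEx [30] [W. Black 2023]
• A platform for personalizing user experiences with multi-armed bandit policies
• Optimizes user experiences (as arms: the content and display position of information) at the online travel site Expedia
• Developers of experiences (arms) can introduce and roll out comparative evaluation with multi-armed bandits without specialist knowledge
• Either basic or contextual multi-armed bandit policies can be chosen
• → Choosing a policy that suits the characteristics of the arms can be expected to reduce opportunity loss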

Slide 39

Slide 39 text

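AdaptEx [30] [W. Black 2023]
• A platform for personalizing user experiences with multi-armed bandit policies
• Either basic or contextual multi-armed bandit policies can be chosen
• → Choosing a policy that suits the characteristics of the arms can be expected to reduce opportunity loss
• These existing policies cannot account for all the characteristics of machine learning models
• → Besides choosing among several policies, the policies themselves have to be developed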

Slide 40

Slide 40 text

Proposed method: Synapse

Slide 41

Slide 41 text

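A meta-recommender system that continuously optimizes the selection of recommendation methods
• To address the opportunity loss caused by evaluation in the production environment, this research designs a meta-recommender system that optimizes the selection of recommendation methods and satisfies the following requirements
1. Selection according to the characteristics of the recommendation methods
2. Any recommendation method can be used as a comparison target
3. Continuous evaluation of the effectiveness of the recommendation methods
4. Reduction of the opportunity loss associated with evaluation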

Slide 42

Slide 42 text

Proposed system (Synapse)

Slide 43

Slide 43 text

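Proposed system (Synapse)
1. Selection according to the characteristics of the recommendation methods
2. Any recommendation method can be used as a comparison target
3. Continuous evaluation of the effectiveness of the recommendation methods
4. Reduction of the opportunity loss associated with evaluation
① Decide the multi-armed bandit policy according to the characteristics of the recommendation methods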

Slide 44

Slide 44 text

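Proposed system (Synapse)
1. Selection according to the characteristics of the recommendation methods
2. Any recommendation method can be used as a comparison target
3. Continuous evaluation of the effectiveness of the recommendation methods
4. Reduction of the opportunity loss associated with evaluation
② Register the recommendation methods: any method can be used as long as it satisfies the common interface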

Slide 45

Slide 45 text

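Proposed system (Synapse)
1. Selection according to the characteristics of the recommendation methods
2. Any recommendation method can be used as a comparison target
3. Continuous evaluation of the effectiveness of the recommendation methods
4. Reduction of the opportunity loss associated with evaluation
③ Select and evaluate the recommendation methods: a multi-armed bandit policy selects the recommendation method. The results of user actions are stored, and the evaluation is updated at fixed intervals.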

Slide 46

Slide 46 text

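A meta-recommender system that continuously optimizes the selection of recommendation methods
• As its standard policy, the proposed system adopts Linear Thompson Sampling (LTS), a conventional contextual policy
$l^*_t = \operatorname{argmax}_{l=1,\dots,L} x_t^\top \tilde{w}^{(l)}_N, \quad \tilde{w}^{(l)}_N \sim \mathcal{N}_D(A_N^{-1} b_N,\ \sigma^2_\epsilon A_N^{-1})$
• Arm selection in LTS (probability matching)
• A random vector is drawn according to the estimated mean reward and a covariance matrix that expresses the uncertainty corresponding to the number of trials; the arm whose draw has the largest inner product with the context information is selected
[Figure: how the generated random numbers change, through learning, toward a distribution that reflects the context information]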
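The probability-matching step can be sketched as follows. This is a simplified illustration under a stated assumption, not the system's implementation: contexts are taken to be one-hot vectors (as in the evaluation later in this chapter), so $A_N$ stays diagonal and each coordinate can be sampled independently. The class `DiagonalLTSArm` and its parameters `sigma2` and `prior_precision` are hypothetical names.

```python
import random


class DiagonalLTSArm:
    """Linear Thompson Sampling statistics for one arm.

    Assumption: contexts are one-hot, so A_N = prior_precision * I + sum(x x^T)
    is diagonal and only its diagonal is stored.
    """

    def __init__(self, dim, sigma2=1.0, prior_precision=1.0):
        self.a = [prior_precision] * dim  # diagonal of A_N
        self.b = [0.0] * dim              # b_N = sum of x * reward
        self.sigma2 = sigma2              # observation noise variance

    def sample_score(self, x, rng):
        """Draw w ~ N(A^-1 b, sigma2 * A^-1) and return x^T w."""
        score = 0.0
        for d, xd in enumerate(x):
            if xd == 0:
                continue  # inactive coordinates do not contribute to x^T w
            mean = self.b[d] / self.a[d]
            std = (self.sigma2 / self.a[d]) ** 0.5
            score += xd * rng.gauss(mean, std)
        return score

    def update(self, x, reward):
        for d, xd in enumerate(x):
            self.a[d] += xd * xd
            self.b[d] += xd * reward


def select_arm(arms, x, rng):
    """Probability matching: pick the arm with the largest sampled score."""
    scores = [arm.sample_score(x, rng) for arm in arms]
    return max(range(len(arms)), key=lambda l: scores[l])
```

As an arm accumulates trials in a context, `a[d]` grows and the sampled scores concentrate around the estimated mean, which is the narrowing of the distribution that the figure on this slide depicts.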

Slide 47

Slide 47 text

Evaluation and discussion

Slide 48

Slide 48 text

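Evaluation viewpoints
• Is the system approach of adopting multiple machine learning models and evaluating them continuously useful?
• Is a design that uses different policies according to the characteristics useful?
• Does the adopted policy reduce opportunity loss? (The effect of the proposed policies is evaluated in each chapter.)
• Can the above be satisfied in an evaluation that reflects the constraints of a real system?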

Slide 49

Slide 49 text

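Evaluation data and recommendation methods
• The effectiveness of the proposed system is evaluated with recorded click-through-rate trends, per product category, of six recommendation methods collected from a real e-commerce site
• Computed from about 2.25 million recommendations between 2019/6/20 and 8/4
• Aggregated hourly for each of the six recommendation methods and 18 product categories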

Slide 50

Slide 50 text

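Evaluation data and recommendation methods
• The effectiveness of the proposed system is evaluated with recorded click-through-rate trends, per product category, of six recommendation methods collected from a real e-commerce site
• The recommendation method with the highest click-through rate differs with the passage of time and with the product category
[Figure panels: all products, accessory category, food category]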

Slide 51

Slide 51 text

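Evaluation method (1/2)
• Simulation of the number of clicks obtained from the selected recommendation method
• The result of the recommendation method selected by the policy is assumed to be clicked according to a Bernoulli distribution with the configured click-through rate
• Each recommendation method has an 18-dimensional parameter $\tilde{w}^{(l)}_t$, equal in size to the number of product categories
• The click-through rate is computed as the inner product of $\tilde{w}^{(l)}_t$ and the context information $x_t$
• The context information $x_t$ is the 1-hot vector of the product category the user is browsing at time $t$
• To match the behavior of the real recommender system, rewards are fed back in a batch once per hour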
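One simulated hour of this setup might look like the sketch below. It is an assumption-laden illustration, not the evaluation code: the names `one_hot`, `click_probability`, and `simulate_hour`, and the `choose_arm` callback that stands in for a bandit policy, are hypothetical.

```python
import random


def one_hot(index, size):
    """Context vector: 1 at the browsed product category, 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]


def click_probability(w, x):
    """Click-through rate as the inner product of the parameter and context."""
    return sum(wi * xi for wi, xi in zip(w, x))


def simulate_hour(ctr_params, choose_arm, categories, rng):
    """Simulate the recommendations of one hour.

    ctr_params[l]: per-category click-through rates of recommendation method l.
    choose_arm(x): stand-in for the bandit policy, returns the index of a method.
    categories: browsed category index for each recommendation in this hour.
    Returns (arm, context, reward) tuples to hand to the policy in one batch.
    """
    feedback = []
    for category in categories:
        x = one_hot(category, len(ctr_params[0]))
        arm = choose_arm(x)
        p = click_probability(ctr_params[arm], x)
        reward = 1 if rng.random() < p else 0  # Bernoulli click
        feedback.append((arm, x, reward))
    return feedback
```

Returning the whole batch rather than updating after every recommendation mirrors the hourly feedback described in the last bullet.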

Slide 52

Slide 52 text

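Evaluation method (2/2)
• Four groups of simulations are run to clarify the respective contributions of considering context and considering the passage of time
• A (context: no, passage of time: no): use the recommendation method that is best at the start of the period consistently over the whole period
• B (context: yes, passage of time: no): use the recommendation method that is best for each context consistently over the whole period
• C (context: no, passage of time: yes): use a bandit to select the highly rated recommendation method at each time
• D (context: yes, passage of time: yes): use a bandit to select the highly rated recommendation method for each context at each time
• The context is the product category being browsed at recommendation time
• The multi-armed bandit policies are ε-Greedy (B), LinUCB (D), and LTS (D)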

Slide 53

Slide 53 text

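Evaluation method (2/2)
• Four groups of simulations are run to clarify the respective contributions of considering context and considering the passage of time
• A (context: no, passage of time: no): use the recommendation method that is best at the start of the period consistently over the whole period
• B (context: yes, passage of time: no): use the recommendation method that is best for each context consistently over the whole period
• C (context: no, passage of time: yes): use a bandit to select the highly rated recommendation method at each time
• D (context: yes, passage of time: yes): use a bandit to select the highly rated recommendation method for each context at each time
Questions: Is continuous online evaluation effective in the first place? Is using different policies according to the characteristics effective?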

Slide 54

Slide 54 text

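Evaluation results: difference in cumulative reward relative to group A
• B (context): the best recommendation method at the start of the period differs by product category, so responding to this improves the result
• C (passage of time): the best recommendation method across all product categories does not change during the period, so exploration turned out to be wasted
• → If the best recommendation method is obvious and does not change, introducing a policy creates opportunity loss instead
• D (context and passage of time): following the changes in each product category improves the result. The size of the effect differed between policies
[Figure series: context only; passage of time only; context and passage of time considered. The standard policy yields an increase of about 2%]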

Slide 55

Slide 55 text

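Evaluation results: breakdown of cumulative reward by product category
• An analysis of the cumulative reward of the standard policy (LTS) in group D shows that the cost of exploration is not recovered in every product category
• However, the policy improves greatly in product categories where the best recommendation method switched during the period, and it suppresses unnecessary exploration elsewhere, so the overall opportunity loss is reduced
• → ε-Greedy, which explores evenly, and LinUCB, whose arm stays fixed in an environment with delayed rewards, were at a disadvantage in this evaluation setting
[Figure: cumulative reward of the group D standard policy (LTS) compared with group B, where the best method at the start of the period is used consistently. * Values in parentheses are the ratios]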

Slide 56

Slide 56 text

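Discussion of the proposal
• The adopted standard policy could follow a change once enough trials had accumulated after the change, but the time lag until it catches up leads to opportunity loss, so a policy that handles the passage of time explicitly is desirable
[Figures: click-through rate of each recommendation method in the Food category over time; share of each recommendation method chosen by LTS over time]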

Slide 57

Slide 57 text

Section summary

Slide 58

Slide 58 text

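Section summary
• To address the opportunity loss caused by evaluation in the production environment, designed a meta-recommender system based on multi-armed bandit policies, taking recommender systems as the subject
• In simulations with recorded click-through-rate trends of recommendation methods obtained from a real e-commerce site, cumulative reward increased by up to 2% compared with policies that consider neither context nor the passage of time
• Selecting an appropriate policy in light of the characteristics of the recommendation methods and the system requirements is effective
• Future work: resolve the remaining issue by handling the passage of time explicitly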

Slide 59

Slide 59 text

3. Adaptation to diverse and continuously changing environments

Slide 60

Slide 60 text

Overview of this research
[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]

Slide 61

Slide 61 text

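Objective and outline of the proposal
• Which machine learning model is better depends on the situation, which consists of many factors (= context), and on the passage of time
• The goal is to use effective models selectively, according to context and the passage of time, without opportunity loss
• The optimization of selection according to context and the passage of time is treated as a nonstationary and contextual multi-armed bandit problem, and a policy based on a state space model and a particle filter is proposed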

Slide 62

Slide 62 text

Related work: efforts in the field and the position of this research relative to prior work

Slide 63

Slide 63 text

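Contextual multi-armed bandit problem
• A multi-armed bandit setting in which each arm has multiple contexts and the reward distribution is determined by the context
• In this research, a context is a state expressed as a combination of the parameters of multiple factors
• → If factor parameters take values in {0,1}, there are $2^d$ context patterns for $d$ factors
• A contextual multi-armed bandit policy predicts the reward in each context by estimating a coefficient for each factor (linear parameters) rather than a probability distribution for each context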

Slide 64

Slide 64 text

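Nonstationary multi-armed bandit problem
• A multi-armed bandit setting in which the reward distribution changes over time even within the same context
• Periodic change can be handled by including it among the factor parameters, but this does not hold for irregular (nonstationary) change
• A nonstationary multi-armed bandit policy predicts the reward in each context by updating the evaluation of the arms quickly, without being bound by rewards observed in the past
• Types: forgetting (discounting), sliding window, change detection, state space model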

Slide 65

Slide 65 text

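State space model type
• Treats the transition process of the state with a state space model and uses the sequentially estimated state
• Handles temporal change in the reward sequence naturally; especially suited to settings with gradual change
• Performance depends on the expressiveness of the process model and on the estimation accuracy
• Because the relationship between a multivariate state and the observations can also be expressed, the contextual setting is handled naturally as well
[Figure: State space model, State, Context: nonstationary and contextual multi-armed bandit policy]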

Slide 66

Slide 66 text

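Challenges in the nonstationary multi-armed bandit problem
• Examples of situations that conventional nonstationary multi-armed bandit policies handle well and poorly
[Figure: the x-axis shows the passage of time; the y-axis shows the usefulness of each arm]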

Slide 67

Slide 67 text

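Challenges in the nonstationary multi-armed bandit problem
• Examples of situations that conventional nonstationary multi-armed bandit policies handle well and poorly
• A "strong" arm becomes weak → the policy switches arms quickly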

Slide 68

Slide 68 text

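Challenges in the nonstationary multi-armed bandit problem
• Examples of situations that conventional nonstationary multi-armed bandit policies handle well and poorly
• A "weak" arm becomes strong → the switch of arms is delayed
• Example: the accuracy of a recommendation algorithm improves as preference information accumulates
• This research calls this situation an "upset"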

Slide 69

Slide 69 text

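The upset problem in multi-armed bandits
• A multi-armed bandit policy exploits most often the arm that is rated highly at a given time
• Judging that the usefulness of an arm has changed requires a certain number of samples from the reward distribution after the change
[Figure: a switch that constitutes an upset]
• Conventional nonstationary and contextual policies include no explicit countermeasure
• Uniformly increasing exploration of all arms in order to detect upsets [37,38] leads to opportunity loss in periods and contexts where nothing changes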

Slide 70

Slide 70 text

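Time-varying Thompson Sampling (TVTP) [42] [C. Zeng 2016]
• A nonstationary and contextual policy of the state space model type
• Proposes Dynamic Context drift Modeling to express variation in the usefulness of the arms
• Uses a particle filter to estimate the complex parameters of the state
• → Addresses the issues of process expressiveness and estimation accuracy
• On the other hand, the share of exploration drops sharply as the number of trials increases
• → It does not follow upsets sufficiently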

Slide 71

Slide 71 text

Proposed method: AE-TVTP

Slide 72

Slide 72 text

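Time-varying Thompson Sampling (TVTP) [42] [C. Zeng 2016]
[Figure: [42] Fig. 2, Graphical model representation for bandit problem]
• State space model
• To handle rewards that are contextual and change nonstationarily, a model that incorporates the variation of the reward is used
$y^{(l)}_t \sim \mathcal{N}\left(x_t^\top (c^{(l)} + \theta^{(l)} \odot \eta^{(l)}_t),\ \sigma^{2(l)}_\epsilon\right)$
Terms: context $x_t$; stationary term $c^{(l)}$; nonstationary term (scale term $\theta^{(l)}$, drift term $\eta^{(l)}_t$); observation error $\sigma^{2(l)}_\epsilon$
* To unify notation with this dissertation, the notation of Fig. 2 (upper right) in the original paper is replaced as follows:
$k \to l,\ c_{w_k} \to c^{(l)},\ \theta_k \to \theta^{(l)},\ \eta_{k,t} \to \eta^{(l)}_t,\ \sigma^2_k \to \sigma^{2(l)}_\epsilon,\ \mu_c \to \mu^{(l)}_w,\ \Sigma_c \to \Sigma^{(l)}_w,\ \alpha \to \alpha^{(l)}_\epsilon,\ \beta \to \beta^{(l)}_\epsilon,\ \mu_\theta \to \mu^{(l)}_\theta,\ \Sigma_\theta \to \Sigma^{(l)}_\theta$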

Slide 73

Slide 73 text

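Time-varying Thompson Sampling (TVTP) [42] [C. Zeng 2016]
[Figure: [42] Fig. 2, Graphical model representation for bandit problem, annotated with "Kalman filter" and "particle filter"]
• State estimation
• A particle filter and a Kalman filter are used to sequentially estimate the posterior distribution of the reward model parameters and the latent state of the drift term
$y^{(l)}_t \sim \mathcal{N}\left(x_t^\top (c^{(l)} + \theta^{(l)} \odot \eta^{(l)}_t),\ \sigma^{2(l)}_\epsilon\right)$
* To unify notation with this dissertation, the notation of Fig. 2 (upper right) in the original paper is replaced as follows:
$k \to l,\ c_{w_k} \to c^{(l)},\ \theta_k \to \theta^{(l)},\ \eta_{k,t} \to \eta^{(l)}_t,\ \sigma^2_k \to \sigma^{2(l)}_\epsilon,\ \mu_c \to \mu^{(l)}_w,\ \Sigma_c \to \Sigma^{(l)}_w,\ \alpha \to \alpha^{(l)}_\epsilon,\ \beta \to \beta^{(l)}_\epsilon,\ \mu_\theta \to \mu^{(l)}_\theta,\ \Sigma_\theta \to \Sigma^{(l)}_\theta$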

Slide 74

Slide 74 text

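Time-varying Thompson Sampling (TVTP) [42] [C. Zeng 2016]
[Figure: [42] Fig. 2, Graphical model representation for bandit problem, annotated with the posterior distributions]
• Probability matching
• Samples drawn from the posterior distribution of the parameters obtained for each arm are used to select the arm, which integrates the model with a multi-armed bandit policy
$y^{(l)}_t \sim \mathcal{N}\left(x_t^\top (c^{(l)} + \theta^{(l)} \odot \eta^{(l)}_t),\ \sigma^{2(l)}_\epsilon\right)$
$l^*_t = \operatorname{argmax}_{l=1,\dots,L} x^\top \bar{w}^{(l)}_{t-1}, \quad \bar{w}^{(l)}_{t-1} \sim \mathcal{N}_D(\bar{\mu}^{(l)}_w, \bar{\Sigma}^{(l)}_w)$
* To unify notation with this dissertation, the notation of Fig. 2 (upper right) in the original paper is replaced as follows:
$k \to l,\ c_{w_k} \to c^{(l)},\ \theta_k \to \theta^{(l)},\ \eta_{k,t} \to \eta^{(l)}_t,\ \sigma^2_k \to \sigma^{2(l)}_\epsilon,\ \mu_c \to \mu^{(l)}_w,\ \Sigma_c \to \Sigma^{(l)}_w,\ \alpha \to \alpha^{(l)}_\epsilon,\ \beta \to \beta^{(l)}_\epsilon,\ \mu_\theta \to \mu^{(l)}_\theta,\ \Sigma_\theta \to \Sigma^{(l)}_\theta$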

Slide 75

Slide 75 text

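Issues with TVTP
• Arm selection becomes biased as the number of trials increases
• The chance of exploring an arm that is rated low at a given time becomes extremely small
• The policy is slow to follow an upset
$l^*_t = \operatorname{argmax}_{l=1,\dots,L} x^\top \bar{w}^{(l)}_{t-1}, \quad \bar{w}^{(l)}_{t-1} \sim \mathcal{N}_D(\bar{\mu}^{(l)}_w, \bar{\Sigma}^{(l)}_w)$
$\bar{\Sigma}^{(l)}_w = \frac{1}{p^2} \sum_{i=1}^{p} \sigma^{2(l,i)}_\epsilon \Sigma^{(l,i)}_w$, where $p$ is the number of particles
Because this covariance decreases rapidly as the number of trials increases, selection becomes fixed on the evaluation at a given time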

Slide 76

Slide 76 text

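Proposed policy: Aggressive Exploration TVTP (AE-TVTP)
• Removes the bias in arm selection that comes with an increasing number of trials
• Provides opportunities to actively explore arms that are rated low at a given time
• Aims to improve tracking in upset situations
$l^*_t = \operatorname{argmax}_{l=1,\dots,L} x^\top \bar{w}^{(l)}_{t-1}, \quad \bar{w}^{(l)}_{t-1} \sim \mathcal{N}_D(\bar{\mu}^{(l)}_w, \bar{\Sigma}^{(l)}_w)$
$\bar{\Sigma}^{(l)}_w = \frac{1}{p^2} \sum_{i=1}^{p} \sigma^{2(l,i)}_\epsilon \Sigma^{(l,i)}_w$, where $p$ is the number of particles
[Annotations on the covariance formula: "mean over the particles"; "only the multiplication within each particle"]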

Slide 77

Slide 77 text

Evaluation and discussion

Slide 78

Slide 78 text

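Evaluation data and recommendation methods
• The effectiveness of the proposed system is evaluated with recorded click-through-rate trends, per product category, of four recommendation methods collected from a real e-commerce site
• The data are the same as in Chapter 3, but the following changes emphasize variation over time
• Click-through-rate aggregation changed from "mean of the cumulative total up to each time" to "exponential moving average over the 3 days before each time"
• Two recommendation methods with markedly few recommendations during the aggregation period were excluded: "collaborative recommendation based on purchase history" and "content-based recommendation based on text"
• → Reason: with the new aggregation their click-through rates would become unstable and add noise to the analysis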

Slide 79

Slide 79 text

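Evaluation data and recommendation methods
• The effectiveness of the proposed system is evaluated with recorded click-through-rate trends, per product category, of four recommendation methods collected from a real e-commerce site
• The data are the same as in Chapter 3, but the following changes emphasize variation over time
• The recommendation method with the highest click-through rate differs with the passage of time and with the product category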

Slide 80

Slide 80 text

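Evaluation method (1/2) (same as Chapter 3)
• Simulation of the number of clicks obtained from the selected recommendation method
• The result of the recommendation method selected by the policy is assumed to be clicked according to a Bernoulli distribution with the configured click-through rate
• Each recommendation method has an 18-dimensional parameter $\tilde{w}^{(l)}_t$, equal in size to the number of product categories
• The click-through rate is computed as the inner product of $\tilde{w}^{(l)}_t$ and the context information $x_t$
• The context information $x_t$ is the 1-hot vector of the product category the user is browsing at time $t$
• To match the behavior of the real recommender system, rewards are fed back in a batch once per hour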

Slide 81

Slide 81 text

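Evaluation method (2/2) (same as Chapter 3 except for the policies)
• Four groups of simulations are run to clarify the respective contributions of considering context and considering the passage of time
• A (context: no, passage of time: no): use the recommendation method that is best at the start of the period consistently over the whole period
• B (context: yes, passage of time: no): use the recommendation method that is best for each context consistently over the whole period
• C (context: no, passage of time: yes): use a bandit to select the highly rated recommendation method at each time
• D (context: yes, passage of time: yes): use a bandit to select the highly rated recommendation method for each context at each time
• The context is the product category being browsed at recommendation time
• The multi-armed bandit policies are LTS (context), TVTP (context / passage of time), and AE-TVTP (context / passage of time)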

Slide 82

Slide 82 text

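Evaluation method (2/2) (same as Chapter 3 except for the policies)
• Four groups of simulations are run to clarify the respective contributions of considering context and considering the passage of time
• A (context: no, passage of time: no): use the recommendation method that is best at the start of the period consistently over the whole period
• B (context: yes, passage of time: no): use the recommendation method that is best for each context consistently over the whole period
• C (context: no, passage of time: yes): use a bandit to select the highly rated recommendation method at each time
• D (context: yes, passage of time: yes): use a bandit to select the highly rated recommendation method for each context at each time
Question: Is a policy that accounts for nonstationarity and upsets effective?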

Slide 83

Slide 83 text

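Evaluation results: difference in cumulative reward relative to group A
• Group B (context): at the start of the period there is almost no difference from group A due to context, so the results do not differ either
• Group C (passage of time): improves by following changes in the effectiveness of the recommendation methods
• Group D (context and passage of time): improves further by following the changes in each product category
[Figure series: context only; passage of time only. Considering context and the passage of time, together with the proposed method's improved tracking of reversals, yields an increase of approximately …]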

Slide 84

Slide 84 text

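Evaluation results: degree of variation in the effectiveness of recommendation methods vs. improvement rate
• An analysis of group D's improvement per context shows that in contexts where the effectiveness of the recommendation methods varies little, no policy recovers the cost of exploration
• In contexts with large variation, AE-TVTP improved greatly thanks to its aggressive exploration. TVTP stayed fixed on one recommendation method and did not improve
• Horizontal axis (size of the variation in effectiveness): for the recommendation method with the highest click-through rate at the start of the period, the difference from the maximum click-through rate at each time, summed over the period
• Vertical axis (improvement rate of cumulative reward): ratio of the cumulative reward of each group D policy to the result of group B, where one recommendation method is used consistently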

Slide 85

Slide 85 text

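Discussion
• In the left column, where the effectiveness of the recommendation methods reverses, the proposed method is effective
• In the right column, cumulative regret keeps increasing throughout the period because of the proposed method's aggressive exploration. The same behavior was observed in about 30% of the product categories
• Next step: research on adaptive exploration methods that reduce opportunity loss even in periods without change
[Figure: cumulative regret over time in a product category with a reversal in the effectiveness of recommendation methods (left) and one without (right). * Cumulative regret is the difference between the largest expected value among the recommendation methods and the expected value of the selected method, summed over the period]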

Slide 86

Slide 86 text

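Discussion of the proposal
• Even with the proposed policy AE-TVTP, the elements of the covariance matrix $\bar{\Sigma}^{(l)}_w$ converge to small values in the long run, and the promotion of exploration applies only to the selected arm, so tracking may degrade during long-term operation
• The particle filter used for state estimation involves several matrix inversions per particle, so computation time grows with the dimensionality of the context information and the number of particles
• → The number of particles affects the estimation accuracy of the posterior distribution, which creates a trade-off between accuracy and execution time
$\bar{\Sigma}^{(l)}_w = \frac{1}{p} \sum_{i=1}^{p} \Sigma^{(l,i)}_w$
$l^*_t = \operatorname{argmax}_{l=1,\dots,L} x^\top \bar{w}^{(l)}_{t-1}, \quad \bar{w}^{(l)}_{t-1} \sim \mathcal{N}_D(\bar{\mu}^{(l)}_w, \bar{\Sigma}^{(l)}_w)$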

Slide 87

Slide 87 text

Section summary

Slide 88

Slide 88 text

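Section summary
• For characteristic (1), the variation of usefulness with context and the passage of time, proposed a nonstationary and contextual multi-armed bandit policy based on a particle filter
• Focused on upset situations and the importance of exploration, and improved on a conventional policy by extending it
• In simulations with data from the same e-commerce site, cumulative reward increased by up to 9.7% compared with policies that consider neither context nor the passage of time
• Future work: exploration schemes with less opportunity loss, and reducing the execution time of the policy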

Slide 89

Slide 89 text

4. Accelerating adaptation

Slide 90

Slide 90 text

Overview of this research
[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]

Slide 91

Slide 91 text

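Objective and outline of the proposal
• Which machine learning model is better depends not only on context and the passage of time but also on responsiveness and tracking ability
• The goal is to use effective models selectively, according to context and the passage of time, without opportunity loss, and to do so without the user having to be aware of execution time
• The optimization of selection according to context and the passage of time is treated as a nonstationary and contextual multi-armed bandit problem, and a policy that solves it quickly, based on a linear Gaussian state space model and a linear Kalman filter, is proposed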

Slide 92

Slide 92 text

Related work: efforts in the field and the position of this research relative to prior work

Slide 93

Slide 93 text

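Issues and countermeasures concerning the execution time of multi-armed bandit policies
• Rewards and context information arrive sequentially, so learning has to be sequential
• Main factors that affect learning time in sequential learning, and their countermeasures (recursive least squares):
1. Recomputation over all training data → a recursive learning mechanism that uses only the result computed up to the $N$-th sample and the observation of the $(N+1)$-th sample
2. Matrix inversion → application of the Woodbury identity within the recursive learning above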
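The two countermeasures combine into the standard recursive least squares update. The sketch below is a generic textbook form, not code from the dissertation; it uses the rank-one case of the Woodbury identity (the Sherman-Morrison formula), and `rls_update` is a hypothetical name.

```python
def rls_update(P, w, x, y):
    """One recursive least squares step for a new sample (x, y).

    P: current inverse of (regularizer + sum of x x^T), as a list of lists.
    w: current weight estimate.
    Only the previous result and the new sample are used, and no matrix is
    inverted: (A + x x^T)^-1 = P - (P x)(P x)^T / (1 + x^T P x) for symmetric P.
    """
    d = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
    P_new = [[P[i][j] - Px[i] * Px[j] / denom for j in range(d)] for i in range(d)]
    error = y - sum(w[i] * x[i] for i in range(d))  # prediction error on the new sample
    gain = [Px[i] / denom for i in range(d)]        # equals P_new x
    w_new = [w[i] + gain[i] * error for i in range(d)]
    return P_new, w_new
```

Each step costs on the order of $d^2$ operations for a $d$-dimensional context, independent of how many samples have already been absorbed.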

Slide 94

Slide 94 text

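Issues in accelerating adaptation
• Execution-time issues when context and nonstationarity are considered at the same time
Approach (schematic): issue
• Forgetting type (regression model with weights $\gamma^0, \gamma^1, \gamma^2, \gamma^3$): recursive least squares with a forgetting factor and regularization loses generalization performance [41]
• Sliding window type (regression model over a window $w$): conventional policies do not show how to apply recursive least squares [69]
• Change detection type (detection model plus regression model): the execution time of the change detection model grows in order to handle multivariate data [43,70,71]
• State space model type (state space model, state): estimation methods for models that can express complex processes have long execution times [42]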

Slide 95

Slide 95 text

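Phased Initial Exploration of System [73] [J. Cornet 2022]
• A nonstationary and contextual policy of the state space model type
• To express variation in the usefulness of the arms, adopts a linear Gaussian state space (LGSS) model, which assumes that the relationships among states and observations are linear and that errors follow a normal distribution
• Uses a lightweight linear Kalman filter to estimate the state parameters
• → Shortens execution time by restricting the degrees of freedom of the state transition
• On the other hand, it assumes that the context information depends on the state at the previous time
• → It cannot handle contextual problems in which the context information is given from outside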

Slide 96

Slide 96 text

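Phased Initial Exploration of System [73] [J. Cornet 2022]
• A nonstationary and contextual policy of the state space model type
• To express variation in the usefulness of the arms, adopts a linear Gaussian state space model, which assumes that the relationships among states and observations are linear and that errors follow a normal distribution
• Uses a lightweight linear Kalman filter to estimate the state parameters
• → Shortens execution time by restricting the degrees of freedom of the state transition
• In addition, it has no mechanism that promotes exploration
• → It does not handle upset situations sufficiently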

Slide 97

Slide 97 text

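KF-MANB [40] [O. Granmo 2010]
• A nonstationary policy of the state space model type
• Adopts a local level model to express variation in the usefulness of the arms
• Uses a lightweight Kalman filter to estimate the state parameters
• Proposes exploration of unselected arms by borrowing missing-value handling
• → Achieves both shorter execution time and less biased exploration
• On the other hand, the policy cannot take context information as input
• → It cannot handle contextual problems
Note: because every arm is updated at each trial, this approach is best used with a lightweight estimation method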

Slide 98

Slide 98 text

Proposed method: LGSS bandits

Slide 99

Slide 99 text

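Reward model based on a linear Gaussian state space model
• The proposed policy assumes that the reward model follows the linear Gaussian state space model below
$r_t = Z_t \alpha_t + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2_\epsilon)$
$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim \mathcal{N}(0, \sigma^2_\eta), \quad t = 1, \dots, \tau$
$\alpha_1 \sim \mathcal{N}_d(\mu_1, \Sigma_1)$
• This expresses that the state $\alpha_t$ depends on the state $\alpha_{t-1}$ at the previous time and that the reward $r_t$ depends on the state $\alpha_t$
[Figure: graphical model with $\alpha_t$, $\alpha_{t+1}$, $r_t$, $Z_t$, $T_t$, $R_t$, $\eta_t$, $\epsilon_t$]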

Slide 100

Slide 100 text

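Estimating the state of an arm with a linear Kalman filter
• The state is estimated continuously by repeating one-step-ahead prediction and filtering with the observed result
• The state $\alpha$ can be represented by its mean $\mu$ and covariance matrix $\Sigma$
State update cycle of the linear Kalman filter:
One-step-ahead observation prediction: $\hat{y}_t = Z \mu_t$
Observation error: $v_t = y_t - \hat{y}_t, \quad F_t = Z \Sigma_t Z^\top + H$
Filtering: $\mu_{t|t} = \mu_t + G v_t = \mu_t + (\Sigma_t Z^\top F_t^{-1}) v_t, \quad \Sigma_{t|t} = \Sigma_t - G F_t G^\top$
One-step-ahead state prediction: $\mu_{t+1} = T \mu_{t|t}, \quad \Sigma_{t+1} = T \Sigma_{t|t} T^\top + R Q R^\top$
Then $t = t + 1$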
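The cycle above can be written out directly for a scalar reward, in which case $F_t$ is a scalar and no matrix inverse is needed. This is a minimal sketch in plain Python, not the dissertation's implementation; `kalman_cycle` and the helper names are hypothetical, and `RQRt` is passed in precomputed.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kalman_cycle(mu, Sigma, y, Z, H, T, RQRt):
    """One filtering + one-step-ahead prediction cycle for a scalar observation y.

    mu, Sigma: predicted state mean (list) and covariance (list of lists).
    Z: observation row vector (list); H: observation noise variance (float).
    T: transition matrix; RQRt: state noise covariance R Q R^T.
    """
    d = len(mu)
    y_hat = sum(z * m for z, m in zip(Z, mu))      # observation prediction
    v = y - y_hat                                  # observation error
    SZ = matvec(Sigma, Z)                          # Sigma Z^T
    F = sum(z * s for z, s in zip(Z, SZ)) + H      # variance of the error
    G = [s / F for s in SZ]                        # gain Sigma Z^T F^-1
    mu_f = [m + g * v for m, g in zip(mu, G)]      # filtered mean
    Sigma_f = [[Sigma[i][j] - G[i] * F * G[j] for j in range(d)] for i in range(d)]
    mu_next = matvec(T, mu_f)                      # state prediction
    T_transposed = [list(col) for col in zip(*T)]
    TST = matmul(matmul(T, Sigma_f), T_transposed)
    Sigma_next = [[TST[i][j] + RQRt[i][j] for j in range(d)] for i in range(d)]
    return mu_next, Sigma_next
```

Filtering shrinks the covariance and the prediction step adds `RQRt` back, so the uncertainty never collapses to zero while state noise is present.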

Slide 101

Slide 101 text

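Updating the evaluation of an arm according to the context
• In the linear Kalman filter, the matrices $Z$ and $R$ determine which elements of the state mean and covariance are updated during filtering and one-step-ahead prediction
State update cycle of the linear Kalman filter:
One-step-ahead observation prediction: $\hat{y}_t = Z \mu_t$
Observation error: $v_t = y_t - \hat{y}_t, \quad F_t = Z \Sigma_t Z^\top + H$ ($H$ adds the error of the observation)
Filtering: $\mu_{t|t} = \mu_t + G v_t = \mu_t + (\Sigma_t Z^\top F_t^{-1}) v_t, \quad \Sigma_{t|t} = \Sigma_t - G F_t G^\top$
One-step-ahead state prediction: $\mu_{t+1} = T \mu_{t|t}, \quad \Sigma_{t+1} = T \Sigma_{t|t} T^\top + R Q R^\top$ ($R Q R^\top$ adds the error of the state transition)
Then $t = t + 1$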

Slide 102

Slide 102 text

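Updating the evaluation of an arm according to the context
One-step-ahead state prediction: $\mu_{t+1} = T \mu_{t|t}, \quad \Sigma_{t+1} = T \Sigma_{t|t} T^\top + R Q R^\top$ ($R Q R^\top$ adds the error of the state transition)
• In the linear Kalman filter, the matrices $Z$ and $R$ determine which elements of the state mean and covariance are updated during filtering and one-step-ahead prediction
• → In a setting with diverse contexts, always updating with the same matrices is not appropriate
$R = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad Q = [\sigma^2_\eta], \quad R Q R^\top = \begin{bmatrix} \sigma^2_\eta & \sigma^2_\eta & \sigma^2_\eta \\ \sigma^2_\eta & \sigma^2_\eta & \sigma^2_\eta \\ \sigma^2_\eta & \sigma^2_\eta & \sigma^2_\eta \end{bmatrix}$
Error then accumulates even in the dimensions of mutually exclusive contexts: an arm that is weak in context A also receives added error for the other contexts through the missing-value handling, so an arm for which about 10 explorations would suffice in context B might be explored 20 times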

Slide 103

Slide 103 text

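Updating the evaluation of an arm according to the context
One-step-ahead state prediction: $\mu_{t+1} = T \mu_{t|t}, \quad \Sigma_{t+1} = T \Sigma_{t|t} T^\top + R Q R^\top$ ($R Q R^\top$ adds the error of the state transition)
• In the linear Kalman filter, the matrices $Z$ and $R$ determine which elements of the state mean and covariance matrix are updated during filtering and one-step-ahead prediction
• → Prepare time-varying matrices that depend on the context
• Prepare $R$ ($= R_t$) according to the context (and likewise for $Z$): $R_t = x_t = (1, 0, 1)^\top$, with $x_t \in \{0,1\}^m$
$R_t = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad Q = [\sigma^2_\eta], \quad R_t Q R_t^\top = \begin{bmatrix} \sigma^2_\eta & 0 & \sigma^2_\eta \\ 0 & 0 & 0 \\ \sigma^2_\eta & 0 & \sigma^2_\eta \end{bmatrix}$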

Slide 104

Slide 104 text

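Coefficient matrices that take the context into account
• The design of the proposed coefficient matrices is based on structural time series models
• Design when the state transition handles only a level component:
$Z_t = x_t^\top, \quad R_t = x_t, \quad T_t = I_D, \quad x_t \in \{0,1\}^D$
• The design extends easily so that the state transition includes a trend component as well as the level component:
$Z_t = (x_t^\top, 0_{1 \times D}), \quad R_t = \begin{bmatrix} x_t & 0_{D \times 1} \\ 0_{D \times 1} & x_t \end{bmatrix}, \quad T_t = \begin{bmatrix} I_D & \operatorname{diag}(x_t) \\ 0_{D \times D} & I_D \end{bmatrix}, \quad x_t \in \{0,1\}^m, \quad \alpha_t[0:D] = w_t$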
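As a concrete reading of the two designs, the sketch below builds the matrices from a context vector as nested lists. It is an illustration only; the function names are hypothetical, and the context dimension is written as `D` for both designs.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def level_only_matrices(x):
    """Z_t = x^T (1 x D), R_t = x (D x 1), T_t = I_D."""
    D = len(x)
    Z = [list(x)]
    R = [[xi] for xi in x]
    T = identity(D)
    return Z, R, T


def level_and_trend_matrices(x):
    """Z_t = (x^T, 0), R_t = blockdiag(x, x), T_t = [[I, diag(x)], [0, I]]."""
    D = len(x)
    Z = [list(x) + [0] * D]
    R = [[x[i], 0] for i in range(D)] + [[0, x[i]] for i in range(D)]
    T = []
    for i in range(D):  # upper block row: [I_D, diag(x)]
        T.append([1 if j == i else 0 for j in range(D)]
                 + [x[i] if j == i else 0 for j in range(D)])
    for i in range(D):  # lower block row: [0, I_D]
        T.append([0] * D + [1 if j == i else 0 for j in range(D)])
    return Z, R, T
```

In both designs, only the state coordinates whose context entry is 1 are observed and receive state noise at that step.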

Slide 105

Slide 105 text

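Proposed policy: LGSS bandits
• Probability matching using the state estimates of the linear Kalman filter
1. For each arm $l$, estimate the mean $\mu^{(l)}_t$ and covariance matrix $\Sigma^{(l)}_t$ of the state $\alpha^{(l)}_t$ ($= w^{(l)}_t$)
2. Draw a random vector from the multivariate normal distribution given by these estimates and take it as the parameter $\tilde{w}^{(l)}_t$ of each arm
3. Select the arm with the largest inner product between the parameter $\tilde{w}^{(l)}_t$ and the context information $x_t$
$l^*_t = \operatorname{argmax}_{l=1,\dots,L} x_t^\top \tilde{w}^{(l)}_t, \quad \tilde{w}^{(l)}_t \sim \mathcal{N}_D(\mu^{(l)}_t, \Sigma^{(l)}_t)$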

Slide 106

Slide 106 text

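Estimating the state of an arm with a linear Kalman filter (repeated)
• The state is estimated continuously by repeating one-step-ahead prediction and filtering with the observed result
• The state $\alpha$ can be represented by its mean $\mu$ and covariance matrix $\Sigma$
State update cycle of the linear Kalman filter:
One-step-ahead observation prediction: $\hat{y}_t = Z \mu_t$
Observation error: $v_t = y_t - \hat{y}_t, \quad F_t = Z \Sigma_t Z^\top + H$
Filtering (makes the covariance smaller): $\mu_{t|t} = \mu_t + G v_t = \mu_t + (\Sigma_t Z^\top F_t^{-1}) v_t, \quad \Sigma_{t|t} = \Sigma_t - G F_t G^\top$
One-step-ahead state prediction (makes the covariance larger): $\mu_{t+1} = T \mu_{t|t}, \quad \Sigma_{t+1} = T \Sigma_{t|t} T^\top + R Q R^\top$
Then $t = t + 1$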

Slide 107

Slide 107 text

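Virtual exploration using missing values
• A linear Kalman filter can treat a time step without an observation as a missing value
• This missing-value handling is adopted as the update operation for the arms that were not selected
Missing-value cycle of the linear Kalman filter:
Filtering: $\mu_{t|t} = \mu_t, \quad \Sigma_{t|t} = \Sigma_t$
One-step-ahead state prediction (makes the covariance larger): $\mu_{t+1} = T \mu_{t|t}, \quad \Sigma_{t+1} = T \Sigma_{t|t} T^\top + R Q R^\top$
Then $t = t + 1$
• Expected effect
• Through the mechanism of probability matching, exploration of arms with few selection opportunities is promoted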
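For an unselected arm, the missing-value cycle reduces to a few lines under the level-only design ($T_t = I$, $R_t = x_t$, $Q = [\sigma^2_\eta]$). The sketch below assumes that design; `skip_observation` is a hypothetical name.

```python
def skip_observation(mu, Sigma, x, sigma_eta2):
    """Missing-value update for an arm that was not selected.

    Assumes the level-only design: T = I and R_t = x (the context vector),
    so the prediction step adds sigma_eta2 * x x^T to the covariance.
    The mean is unchanged because no observation is filtered in.
    """
    d = len(mu)
    mu_next = list(mu)
    Sigma_next = [[Sigma[i][j] + sigma_eta2 * x[i] * x[j] for j in range(d)]
                  for i in range(d)]
    return mu_next, Sigma_next
```

Because the sampled parameter is drawn from $\mathcal{N}(\mu, \Sigma)$, the growing variance in the active context dimensions makes a long-unselected arm increasingly likely to produce the largest draw and be tried again.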

Slide 108

Slide 108 text

Evaluation and discussion

Slide 109

Slide 109 text

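Evaluation data and recommendation methods
• The effectiveness of the proposed system is evaluated with recorded click-through-rate trends, per product category, of four recommendation methods collected from a real e-commerce site
• The data are the same as in Chapter 4, but the target period was shortened because of a limit on evaluation time
• From the recommendation data of 2019/6/20 "to 8/4, about 2.25 million recommendations" to "to 7/22, about 1.49 million recommendations"
[Figure: click-through rate over hours (0 to 600) for the recommendation methods Browsing path, Demographic, LLR, and Similar image. Annotation: the method with the highest click-through rate switches; the typical characteristics of the recommendation methods also appear in this period]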

Slide 110

Slide 110 text

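Evaluation method (1/2) (same as Chapter 4)
• Simulation of the number of clicks obtained from the selected recommendation method
• The result of the recommendation method selected by the policy is assumed to be clicked according to a Bernoulli distribution with the configured click-through rate
• Each recommendation method has an 18-dimensional parameter $\tilde{w}^{(l)}_t$, equal in size to the number of product categories
• The click-through rate is computed as the inner product of $\tilde{w}^{(l)}_t$ and the context information $x_t$
• The context information $x_t$ is the 1-hot vector of the product category the user is browsing at time $t$
• To match the behavior of the real recommender system, rewards are fed back in a batch once per hour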

Slide 111

Slide 111 text

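Evaluation method (2/2)
• Policies used in the simulation
Target: nonstationary and contextual
• LGSS bandits (proposed policy): state space model type
• TVTP [42]: state space model type. Parameter tuning and the experiment did not finish within the evaluation time (24 hours), so only part of the evaluation was carried out
• AdTS [43]: change detection type. Detects change with the bootstrap method on the series of Mahalanobis distances computed from the means and covariance matrices of the split series
• Decay LinUCB [68]: forgetting type
• dLinUCB [70]: change detection type. Detects errors in reward prediction and adds a bandit model for the new reward distribution. Periodically reinitialized when the threshold drops
Target: contextual
• LTS [47]
• LinUCB [33]
• Neural Linear (+ LTS) [76]: the training interval of the neural network is exponential, chosen in view of the evaluation time limit and the ability to follow change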

Slide 112

Slide 112 text

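Evaluation results: impact on response time
Measured elapsed time for arm selection and evaluation update (18 dimensions, 4 arms):
• adaptive thompson sampling: select 0.48 ms per call; update 1.49 s per 2665 calls
• decay linucb: select 0.19 ms; update 0.04 s
• dynamic linucb: select 0.04 ms; update 0.06 s
• LGSS bandits: select 0.24 ms; update 0.09 s
• linear thompson sampling: select 0.29 ms; update 0.04 s
• linucb: select 0.04 ms; update 0.05 s
• neural linear: select 0.27 ms; update 1.82 s
• Arm selection with LGSS bandits takes 0.24 ms
• All measured values are below 0.5 ms, which can be judged sufficient performance with no serious impact
• The evaluation update is sufficiently short compared with the unit time (1 hour)
• TVTP, AdTS, and Neural Linear do not scale with the number of trials
• For reference, TVTP (5 particles) takes 1.32 ms for arm selection and 13.6 s for the evaluation update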

Slide 113

Slide 113 text

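Evaluation results: effectiveness in the nonstationary and contextual setting
[Figure: cumulative regret and best-arm selection rate of each policy (adaptive thompson sampling, decay linucb, dynamic linucb, LGSS bandits, linear thompson sampling, linucb, neural linear, select best arm at first, select random arm) over hours 0 to 600]
[Figure: click-through rate of each recommendation method over time (Browsing path, Demographic, LLR, Similar image)]
• The proposed policy LGSS bandits achieved the highest cumulative reward (the lowest regret)
• The change detection policy AdTS came second
• → The stationary policies LTS and LinUCB came next, which indicates that in this evaluation it was also important to reduce opportunity loss by limiting exploration in stationary periods. LGSS bandits and AdTS reduced opportunity loss not only when the best arm switched but also during stationary periods
• Compared with using the best arm at the start of the period consistently across all product categories, the proposed policy increased cumulative reward by about 6.5%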

Slide 114

Slide 114 text

Evaluation results: effectiveness in the nonstationary and contextual setting
[Figure: cumulative regret and best-arm selection rate of each policy over hours 0 to 600]
[Figure: click-through rate of each recommendation method over time (Browsing path, Demographic, LLR, Similar image)]
• The change detection policy dLinUCB tracked well after detecting a change, but its regret grew in the first half, so its cumulative regret ended up higher than that of the stationary policies
• → This is caused by the periodic addition of bandit models and their re-evaluation
• For the forgetting policy Decay LinUCB, the imbalance in the number of observations across the factors of the context information made it hard to balance exploitation and exploration

Slide 115

Slide 115 text

Evaluation results: effectiveness in the nonstationary and contextual setting
• With virtual exploration, the proposed policy followed changes after they were detected while also reducing opportunity loss in stationary periods
• → AdTS and dLinUCB, which have no virtual exploration, handle only one of the stationary and nonstationary cases, so their overall performance was worse than that of the proposed policy
[Figure: change of click-through rate and best-arm rate for the Stationery and Plants categories, hours 0 to 600, with Periods 1, 2, and 3; recommendation methods Browsing path, Demographic, LLR, Similar image; policies adaptive thompson sampling, dynamic linucb, LGSS bandits, linear thompson sampling]

Slide 116

Slide 116 text

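Discussion of the proposal
• For the observation noise variance $\sigma^2_\epsilon$, a value larger than the maximum likelihood estimate from the Kalman filter was effective for stable state estimation. This should be kept in mind when deploying in production
• → Reason: the missing-value handling keeps the values of the state covariance matrix high, so the state is corrected strongly
• The proposed policy shortens execution time by assuming a linear Gaussian state space model, but in nonlinear contextual settings where this assumption does not hold, lower estimation accuracy and higher opportunity loss are a concern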

Slide 117

Slide 117 text

Section summary

Slide 118

Slide 118 text

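Section summary
• For characteristic (2), the effect of the time that adaptation takes, in addition to characteristic (1), proposed a nonstationary and contextual multi-armed bandit policy based on a linear Kalman filter
• In simulations with recorded click-through-rate trends of recommendation methods obtained from a real e-commerce site, cumulative reward increased by up to 6.5% compared with policies that consider neither context nor the passage of time
• In terms of execution speed, confirmed that the policy does not affect the response time of the intended meta-recommender system; it accounts for varying usefulness and accelerates state estimation at the same time
• Future work: estimating nonlinear relationships between context and reward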

Slide 119

Slide 119 text

5. Handling nonlinearity

Slide 120

Slide 120 text

Overview of this research
[Figure: research map. Title: Adaptable Information systems for Dynamic and Evolving Environment. Labels: Context, Non stationarity; Context, Non stationarity, Online performance; Context (Non-linear), Non stationarity, Online performance; Multi-armed bandit policies. Chapters 3, 4, 5, 6]

Slide 121

Slide 121 text

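Objective and outline of the proposal
• Which machine learning model is better depends not only on context and the passage of time but also on responsiveness and tracking ability. In addition, the relationship between context and reward is assumed to be unknown
• The goal is to use effective models selectively, according to context and the passage of time, without opportunity loss, and to do so without the user having to be aware of execution time or of how the context information is designed
• The optimization of selection according to complex context and the passage of time is treated as a nonstationary and nonlinear contextual multi-armed bandit problem, and a policy that solves it quickly, based on weighted recursive Gaussian process regression, is proposed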

Slide 122

Slide 122 text

Related work: efforts in the field and the position of this research relative to prior work

Slide 123

Slide 123 text

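Contextual multi-armed bandit problem and nonlinearity
Approaches considered for handling nonlinearity between the context information $x_t$ and the reward $r_t$:
1. A linear policy whose input is $\tilde{x}_t$, obtained by a nonlinear transformation with a pretrained model → it is unclear whether results obtained in a different task domain help estimate the reward in the target application
2. A policy using a state space model with nonlinear equations [42,74] → it is difficult to design and supply the state equation and the observation equation explicitly and individually
3. A policy using a nonlinear regression model [76~82] → NN policies that provide neither uncertainty nor a recursive learning mechanism are not suited to bandits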

Slide 124

Slide 124 text

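Contextual multi-armed bandit problem and nonlinearity
Issues with policies that use Gaussian process regression (GP) as the nonlinear regression model
• A GP derives from data a distribution over functions, as a stochastic process
• High expressiveness thanks to a large number of basis functions $\phi: \mathbb{R}^D \to \mathbb{R}$
• With the kernel method, the basis functions need not be specified explicitly
$k(p, q) \triangleq (p^\top q + c)^m, \quad m = 2, \quad p, q \in \mathbb{R}^2$
This polynomial kernel function coincides with the result of transforming the inputs with six basis functions and taking the inner product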
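The six basis functions can be written out explicitly by expanding $(p_1 q_1 + p_2 q_2 + c)^2$. The sketch below checks the identity numerically; the function names are illustrative.

```python
import math


def poly_kernel(p, q, c=1.0):
    """Degree-2 polynomial kernel (p^T q + c)^2 for 2-dimensional inputs."""
    return (p[0] * q[0] + p[1] * q[1] + c) ** 2


def basis(p, c=1.0):
    """Six basis functions whose inner product reproduces poly_kernel."""
    return [
        p[0] ** 2,
        p[1] ** 2,
        math.sqrt(2.0) * p[0] * p[1],
        math.sqrt(2.0 * c) * p[0],
        math.sqrt(2.0 * c) * p[1],
        c,
    ]


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Evaluating the kernel costs one 2-dimensional inner product, whereas the explicit route needs the 6-dimensional features; this gap is what makes the kernel method attractive as the degree and input dimension grow.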

Slide 125

Slide 125 text

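Contextual multi-armed bandit problem and nonlinearity
Issues with policies that use Gaussian process regression (GP) as the nonlinear regression model
• A GP derives from data a distribution over functions, as a stochastic process
• High expressiveness thanks to a large number of basis functions $\phi: \mathbb{R}^D \to \mathbb{R}$
• With the kernel method, the basis functions need not be specified explicitly
• Because it handles both nonlinearity and uncertainty, it fits the multi-armed bandit problem well [88,89,90]
• → It involves computing the kernel matrix $K \in \mathbb{R}^{N \times N}$, with entries $k(x_i, x_j)$, and its inverse $K^{-1}$, so learning time grows steeply with the number of training samples $N$ (a direct inverse costs $O(N^3)$)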

Slide 126

Slide 126 text

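Weighted GP-UCB [95] [Y. Deng 2022]
• A forgetting-type policy for the nonstationary and nonlinear contextual setting
• Uses random Fourier features to compute the predictive distribution of Gaussian process regression in the form of an $R$-dimensional linear regression with $R \lll N$
• Handles nonstationarity flexibly with two weight matrices
• → Avoids the steep growth of learning time as the number of samples $N$ increases
$K \simeq Z Z^\top, \quad K \in \mathbb{R}^{N \times N}, \quad Z \in \mathbb{R}^{N \times R}$; the computation uses $Z^\top Z \in \mathbb{R}^{R \times R}$, which is far smaller than $K$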

Slide 127

Slide 127 text

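Weighted GP-UCB [95] [Y. Deng 2022]
• A forgetting-type policy for the nonstationary and nonlinear contextual setting
• Uses random Fourier features to compute the predictive distribution of Gaussian process regression in the form of an $R$-dimensional linear regression with $R \lll N$
• → Avoids the steep growth of learning time as the number of samples $N$ increases
• No recursive learning mechanism is provided
• → The growth of learning time with the number of samples is only partly solved
$K \simeq Z Z^\top, \quad K \in \mathbb{R}^{N \times N}, \quad Z \in \mathbb{R}^{N \times R}$; the computation uses $Z^\top Z \in \mathbb{R}^{R \times R}$, which is far smaller than $K$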

Slide 128

Slide 128 text

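NysKRLS [41] [T. Zhang 2020]
• A nonstationary kernel recursive least squares method
• Uses the Nyström approximation to compute the predictive distribution of Gaussian process regression in the form of an $R$-dimensional linear regression with $R \lll N$
• Applies recursive least squares
• Handles nonstationarity by introducing a forgetting factor
• → Speeds up sequential learning as the number of samples $N$ increases
[Figure: recursive update of $(Z^\top \Gamma Z + \gamma^M \Lambda)^{-1}$ with forgetting factor $\gamma$, features $z$, and new input $x_N$]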

Slide 129

Slide 129 text

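NysKRLS [41] [T. Zhang 2020]
• A nonstationary kernel recursive least squares method
• Uses the Nyström approximation to compute the predictive distribution of Gaussian process regression in the form of an $R$-dimensional linear regression with $R \lll N$
• → Speeds up sequential learning as the number of samples $N$ increases
• The forgetting operation is applied repeatedly to the regularization term
• → The regularization effect weakens, and overfitting degrades estimation accuracy
• Even the existing countermeasures increase the amount of computation compared with the ordinary method or do not converge to the optimum [98, 46]
[Figure: recursive update of $(Z^\top \Gamma Z + \gamma^M \Lambda)^{-1}$ with forgetting factor $\gamma$, features $z$, and new input $x_N$]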

Slide 130

Slide 130 text

Proposed method: RW-GPB

Slide 131

Slide 131 text

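Gaussian process regression with random Fourier features (1/2)
• Random Fourier features approximate the kernel function $k(x_i, x_j)$ by $\hat{k}(x_i, x_j) = z(x_i)^\top z(x_j)$, using $R' = R/2$ samples from a probability distribution $p(\omega)$
• Here $z(x_i) = \sqrt{1/R'}\ (\cos(\omega_1^\top x_i), \sin(\omega_1^\top x_i), \dots, \cos(\omega_{R'}^\top x_i), \sin(\omega_{R'}^\top x_i))$
• The probability distribution $p(\omega)$ is determined by the type of kernel function
$\phi(x) = (\phi_1(x), \dots, \phi_\infty(x))^\top \in \mathbb{R}^\infty, \quad \phi(x_i)^\top \phi(x_j) = k(x_i, x_j) \approx z(x_i)^\top z(x_j)$
For an individual kernel evaluation the computational cost increases, but the approximation can be seen as decomposing the combined operation of applying the basis functions and taking the inner product
→ The number of parameter dimensions can be fixed at $R$
[Figure: kernel matrix $K \in \mathbb{R}^{N \times N}$ with entries $k(x_i, x_j)$]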
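The approximation can be tried out for one concrete kernel. The sketch below assumes the RBF kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / 2)$, for which $p(\omega)$ is the standard normal distribution; this kernel choice and the function names are assumptions for illustration, since the slide does not fix a kernel.

```python
import math
import random


def sample_frequencies(dim, n_pairs, rng):
    """Draw R' frequency vectors omega ~ N(0, I), matching the RBF kernel."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_pairs)]


def rff(x, omegas):
    """z(x) = sqrt(1/R') (cos(w_1.x), sin(w_1.x), ..., cos(w_R'.x), sin(w_R'.x))."""
    scale = math.sqrt(1.0 / len(omegas))
    z = []
    for omega in omegas:
        proj = sum(o * xi for o, xi in zip(omega, x))
        z.extend([scale * math.cos(proj), scale * math.sin(proj)])
    return z


def rbf_kernel(x, y):
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Here `dot(rff(x, omegas), rff(y, omegas))` averages $\cos(\omega^\top (x - y))$ over the sampled frequencies, so it approaches `rbf_kernel(x, y)` as $R'$ grows, while the feature dimension stays fixed at $R = 2R'$ regardless of the number of training samples.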

Slide 132

Slide 132 text

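Gaussian process regression with random Fourier features (2/2)
• Applying the features yields a form equivalent to a Bayesian linear regression model with a fixed dimension $R$, with $\Lambda = \sigma_w^{-2} I_R$
• → Linear recursive least squares can be applied
• Comparison with the predictive distribution of the original Gaussian process regression model
$k(x_i, x_j) \simeq z(x_i)^\top z(x_j), \quad K \simeq Z Z^\top, \quad K \in \mathbb{R}^{N \times N}, \quad Z \in \mathbb{R}^{N \times R}$; the computation uses $Z^\top Z \in \mathbb{R}^{R \times R}$, which is far smaller than $K$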

Slide 133

Slide 133 text

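Recursive weighted Gaussian process regression model (1/3)
• Introduces a forgetting mechanism into the Gaussian process regression model.
• Formulated so that the variance of the observation-noise distribution grows the further back in time an observation lies: $\Gamma^{-1} = \mathrm{diag}\big((g(n))_{1 \le n \le N}\big) \in \mathbb{R}^{N \times N}$.
• Predictive distribution of weighted Gaussian process regression: in particular, when $\sigma_\epsilon^2 = 1$, the predictive mean $\hat{\mu}''$ coincides with the formulation of linear regression by least squares with a forgetting factor.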
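The stated equivalence can be sketched in the random-Fourier-feature weight space. This is a reconstruction under assumptions, since the slide's own equations are not preserved: the weight prior $w \sim \mathcal{N}(0, \Lambda^{-1})$ and the symbol $\hat{w}$ for the posterior mean of the weights are introduced here for illustration.

```latex
\begin{align*}
y_n \mid w &\sim \mathcal{N}\!\big(w^\top z(x_n),\ \sigma_\epsilon^2\, g(n)\big), \qquad
w \sim \mathcal{N}(0, \Lambda^{-1}), \\
\hat{w} &= \big(Z^\top \Gamma Z + \sigma_\epsilon^2 \Lambda\big)^{-1} Z^\top \Gamma y, \qquad
\Gamma = \mathrm{diag}\big(g(n)^{-1}\big), \\
\hat{w}\,\big|_{\sigma_\epsilon^2 = 1} &= \arg\min_w \sum_{n=1}^{N} g(n)^{-1} \big(y_n - w^\top z(x_n)\big)^2 + w^\top \Lambda w .
\end{align*}
```

If one further assumes $g(n) = \gamma^{-(N-n)}$ with $0 < \gamma \le 1$, the sample weights become $\gamma^{N-n}$, which is the usual exponentially forgetting least-squares objective.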

Slide 134

Slide 134 text

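Recursive weighted Gaussian process regression model (2/3)
• Applies recursive least squares to the weighted Gaussian process regression model.
• To cut computation by avoiding matrix inversion, an update rule linking $P_N$ and $P_{N+1}$ is derived.
• As the number of recursive updates $N$ grows, an error of $(\gamma^N - 1)\Lambda$ accumulates and the regularization effect diminishes.
• → Correcting this error makes it impossible to apply recursive least squares as is.
Note: when $\gamma = 1$ or $\Lambda = \mathrm{diag}(0)$, the term in question vanishes.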
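The decay of the regularizer can be checked numerically with a short sketch. It assumes the standard forgetting-factor recursion $P_{N+1}^{-1} = \gamma P_N^{-1} + z z^\top$ with $P_0 = \Lambda^{-1}$, implemented through the Sherman-Morrison formula; the slide's exact update rule is not preserved in the transcript, so this form is an assumption.

```python
import numpy as np

def rls_forget_step(P, z, gamma):
    """Rank-one update of P = A^{-1} for A_new = gamma * A + z z^T (no inversion)."""
    Pz = P @ z
    return (P - np.outer(Pz, Pz) / (gamma + z @ Pz)) / gamma

rng = np.random.default_rng(0)
R, N, gamma = 4, 50, 0.95
Lam = 2.0 * np.eye(R)                      # regularizer Lambda
Z = rng.normal(size=(N, R))                # feature vectors z_1 .. z_N

P = np.linalg.inv(Lam)                     # start from P_0 = Lambda^{-1}
for z in Z:
    P = rls_forget_step(P, z, gamma)

# Data term with forgetting weights gamma^(N-n): the Z^T Gamma Z part.
weights = gamma ** np.arange(N - 1, -1, -1)
A_data = (Z * weights[:, None]).T @ Z

# What remains of the regularizer after N recursive updates.
regularizer_left = np.linalg.inv(P) - A_data
```

Here `regularizer_left` equals `gamma**N * Lam` rather than `Lam`, so the gap to the intended regularizer is $(\gamma^N - 1)\Lambda$, matching the error term on the slide.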

Slide 135

Slide 135 text

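Recursive weighted Gaussian process regression model (3/3)
• Applies recursive least squares to the weighted Gaussian process regression model.
• A reformulation that introduces $M$, which tracks how many times the error has been incurred, makes both the least-squares update and the error correction recursive.
• The proposed method tolerates some estimation error and applies the correction at an arbitrary interval rather than at every step.
Error correction:
1. Compute the inverse of the maintained matrix.
2. Remove the error relative to $\Lambda$, giving $Z^\top \Gamma Z + \Lambda$.
3. Compute the inverse again: $(Z^\top \Gamma Z + \Lambda)^{-1} = (Z^\top \Gamma Z + \gamma^0 \Lambda)^{-1}$.
This operation involves two inversions of an $R \times R$ matrix and is therefore expensive, so it is run at an arbitrary interval chosen in light of the acceptable estimation accuracy.
Note: the value of the correction is not determined by the past training data, so this error-correction method is also a form of recursive learning.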
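The three correction steps can be sketched as below. The recursion $P^{-1} \leftarrow \gamma P^{-1} + z z^\top$ and the correction term $(1 - \gamma^M)\Lambda$ are assumptions inferred from the expressions that survive on the slides, and the function names are hypothetical.

```python
import numpy as np

def rls_forget_step(P, z, gamma):
    """Rank-one update of P = A^{-1} for A_new = gamma * A + z z^T."""
    Pz = P @ z
    return (P - np.outer(Pz, Pz) / (gamma + z @ Pz)) / gamma

def correct_regularizer(P, Lam, gamma, M):
    """Restore the regularizer from gamma^M * Lam to Lam (two R x R inversions)."""
    A = np.linalg.inv(P)                   # 1. back to Z^T Gamma Z + gamma^M Lam
    A = A + (1.0 - gamma**M) * Lam         # 2. cancel the (gamma^M - 1) Lam error
    return np.linalg.inv(A)                # 3. back to inverse form; M restarts at 0

rng = np.random.default_rng(1)
R, M, gamma = 3, 30, 0.9
Lam = np.eye(R)
Z = rng.normal(size=(M, R))

P = np.linalg.inv(Lam)
for z in Z:                                # M recursive updates since P_0
    P = rls_forget_step(P, z, gamma)
P_corrected = correct_regularizer(P, Lam, gamma, M)

# Batch reference: (Z^T Gamma Z + Lam)^{-1} with forgetting weights gamma^(M-n).
weights = gamma ** np.arange(M - 1, -1, -1)
P_batch = np.linalg.inv((Z * weights[:, None]).T @ Z + Lam)
```

The correction needs only `P`, `Lam`, `gamma`, and the counter `M`; no past samples are revisited, which is why it can be interleaved with the recursive updates at any interval.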

Slide 136

Slide 136 text

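Policy using the recursive weighted Gaussian process regression model
① Prepare a recursive weighted Gaussian process regression (RW-GPR) model for each arm.
② Select an arm with a UCB policy that uses the mean and variance parameters of the predictive distribution of each arm's RW-GPR model.
③ Update the RW-GPR model of the selected arm by recursive learning with forgetting, using the reward observed from that arm.
③' Correct the error every $\tau$ steps.
[Figure: chain of recursive updates $(P_{N-2,M-2}, Q_{N-2,M-2}) \to (P_{N-1,M-1}, Q_{N-1,M-1}) \to (P_{N,M}, Q_{N,M})$, driven by $(x_{N-1}, y_{N-1})$ and $(x_N, y_N)$ through $z$ and $\gamma$, yielding the predictive distributions $p(y_{N-1} \mid x_{N-1}, X, y)$, $p(y_N \mid x_N, X, y)$, and $p(y_* \mid x_*, X, y)$, with $(M = 0)$ annotations]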
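Steps ① to ③' can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the class `RWGPRArm`, the RBF-based feature map, the use of $z^\top P z$ as the variance term, and all default parameter values are hypothetical.

```python
import numpy as np

class RWGPRArm:
    """Per-arm recursive weighted regression on random Fourier features (sketch)."""

    def __init__(self, n_features, gamma=0.99, sigma_w=1.0, tau=50):
        self.gamma, self.tau = gamma, tau
        self.Lam = np.eye(n_features) / sigma_w**2   # Lambda = sigma_w^-2 I_R
        self.P = np.linalg.inv(self.Lam)             # inverse regularized Gram matrix
        self.q = np.zeros(n_features)                # forgetting-weighted sum of y * z
        self.M = 0                                   # updates since the last correction

    def predict(self, z):
        """Mean and (parameter) variance of the prediction; noise term omitted."""
        return z @ self.P @ self.q, max(z @ self.P @ z, 0.0)

    def update(self, z, y):
        Pz = self.P @ z                              # step 3: recursive update with forgetting
        self.P = (self.P - np.outer(Pz, Pz) / (self.gamma + z @ Pz)) / self.gamma
        self.q = self.gamma * self.q + y * z
        self.M += 1
        if self.M >= self.tau:                       # step 3': periodic error correction
            A = np.linalg.inv(self.P) + (1.0 - self.gamma**self.M) * self.Lam
            self.P, self.M = np.linalg.inv(A), 0


def make_features(dim, n_features, length_scale, rng):
    """Random Fourier feature map for a single context vector (assumed RBF kernel)."""
    omega = rng.normal(0.0, 1.0 / length_scale, size=(n_features // 2, dim))

    def z(x):
        proj = omega @ x
        return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(len(omega))

    return z


def select_arm(arms, z, beta=1.0):
    """Step 2: pick the arm with the largest mean + beta * std score."""
    scores = [m + beta * np.sqrt(v) for m, v in (arm.predict(z) for arm in arms)]
    return int(np.argmax(scores))


def run(reward_fn, n_arms, dim, n_trials, n_features=20, seed=0):
    rng = np.random.default_rng(seed)
    z_of = make_features(dim, n_features, length_scale=2.0, rng=rng)
    arms = [RWGPRArm(n_features) for _ in range(n_arms)]   # step 1: one model per arm
    choices = []
    for t in range(n_trials):
        x = rng.uniform(-1.0, 1.0, size=dim)               # hypothetical context source
        z = z_of(x)
        a = select_arm(arms, z)
        arms[a].update(z, reward_fn(a, x, t, rng))         # only the chosen arm learns
        choices.append(a)
    return np.array(choices)
```

A call such as `run(lambda a, x, t, rng: 3.0 if a == 0 else 0.0, n_arms=2, dim=2, n_trials=300)` returns the sequence of selected arms. The per-trial cost depends on `n_features` and `tau` only, not on the number of past trials.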

Slide 137

Slide 137 text

Evaluation and discussion

Slide 138

Slide 138 text

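Analysis of the regression model
Characterizing the regression model's performance, its error, and the correction method
• As a nonstationary, nonlinear regression problem, data were generated from two nonlinear functions treated as the state before and after a change ($f_A$ and $f_B$).
• To assess generalization, the training data range was set to $[-3, 3]$ and predictions were made out to $[-4, 4]$, beyond that range.
• The proposed regression model ($M = 0$) fit the newer data well.
[Figure: Predictive distribution ($\hat{\mu}''$ and $1\sigma$ confidence based on $\hat{\Sigma}''$); x-axis: x, y-axis: y; series: $y_A$, $f_A(x)$, $y_B$, $f_B(x)$, $\hat{\mu}''$, $1\sigma$ confidence]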

Slide 139

Slide 139 text

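Analysis of the regression model
Characterizing the regression model's performance, its error, and the correction method
• When $M = 0$, the mean and variance parameters of the predictive distribution obtained by recursive learning coincide with their optimal values, confirming that the method works correctly.
• When $M$ is large the gap widens, most noticeably outside the training data range.
• → This is because the accumulated error is equivalent to assuming a considerably large variance for the weight parameters of the regression model. Degraded generalization performance was in fact observed.
[Figure: Estimation error of $\hat{\mu}''$ for each $M$ (upper panel) and estimation error of $\hat{\Sigma}''$ for each $M$ (lower panel); x-axis: x; series: M = 0, 200, 400, 600]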

Slide 140

Slide 140 text

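Policy evaluation method
Simulation of a nonstationary, nonlinear contextual multi-armed bandit problem
• Evaluated on nonstationary Wheel bandits, an extension of Wheel bandits [71].
• Each arm's reward follows a normal distribution $\mathcal{N}(\mu, \sigma^2)$, where the mean $\mu$ depends on the context $x_t = (x_{t,d})_{1 \le d \le 2}$ as shown in the slide's figure.
• The bands rotate counterclockwise as time advances (one full rotation over 4000 trials).
• Parameters: $\mu_1 = 0.1$, $\mu_2 = 0.0$, $\mu_3 = 1.0$, $\sigma^2 = 0.01$, $\delta = 0.8$, $\rho = 4000$.
• The mean reward of arm $a^{(1)}$ is always the same, regardless of context and time.
• The means satisfy $\mu_2 < \mu_1 \ll \mu_3$, so expected reward is maximized by selecting the corresponding arm when the context falls inside a band and arm $a^{(1)}$ otherwise.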
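One possible reading of this environment is sketched below. The slide's figure is not preserved, so the geometry is an assumption borrowed from the standard Wheel bandit: five arms, contexts drawn uniformly from the unit disk, and the outer ring beyond radius $\delta$ split into four quadrant bands. Only the parameter values and the counterclockwise rotation with period $\rho$ come from the slide.

```python
import numpy as np

MU1, MU2, MU3, SIGMA2, DELTA, RHO = 0.1, 0.0, 1.0, 0.01, 0.8, 4000

def sample_context(rng):
    """Uniform sample from the unit disk (assumed context distribution)."""
    r = np.sqrt(rng.uniform())
    theta = rng.uniform(0.0, 2.0 * np.pi)
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def mean_rewards(x, t):
    """Mean reward of each of the 5 (assumed) arms for context x at trial t."""
    means = np.full(5, MU2)
    means[0] = MU1                                    # arm a^(1): constant mean
    if np.linalg.norm(x) > DELTA:                     # context lies in the outer ring
        shift = 2.0 * np.pi * t / RHO                 # bands rotate counterclockwise
        angle = (np.arctan2(x[1], x[0]) - shift) % (2.0 * np.pi)
        sector = min(int(angle // (np.pi / 2.0)), 3)  # guard the floating-point edge
        means[1 + sector] = MU3                       # the band's arm pays mu_3
    return means

def pull(arm, x, t, rng):
    """Draw a reward from N(mu, sigma^2) for the chosen arm."""
    return rng.normal(mean_rewards(x, t)[arm], np.sqrt(SIGMA2))
```

Under this reading the best arm for a fixed context in the outer ring changes four times per rotation, while contexts inside radius $\delta$ always favor arm $a^{(1)}$.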

Slide 141

Slide 141 text

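Policy evaluation method
• Policies used in the simulation (properties: nonlinear / nonstationary / recursive learning):
• RW-GPB (Proposal): nonlinear ✓, nonstationary ✓, recursive learning ✓. Note: evaluated with several correction intervals $\tau$ to compare the effect of error correction.
• GP+UCB (Weighted, RFF) [95]: nonlinear ✓, nonstationary ✓. Note: state-of-the-art policy; adopts random Fourier features.
• GP+UCB (Weighted) [95]: nonlinear ✓, nonstationary ✓.
• GP+UCB (Sliding Window) [93]: nonlinear ✓, nonstationary ✓.
• GP+UCB (Restarting) [93]: nonlinear ✓, nonstationary ✓.
• GP+UCB [88~90]: nonlinear ✓.
• Decay LinUCB [68]: nonstationary ✓, recursive learning ✓.
• So that the effectiveness of each policy's regression model can be compared easily, the exploration scaling term was set to $\beta = 1$ for all policies.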

Slide 142

Slide 142 text

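Policy evaluation results
Comparison of the trade-off between cumulative reward and execution time
• The proposed policy achieved higher cumulative reward and shorter execution time than the state-of-the-art policy.
• GP+UCB (Weighted) had the highest cumulative reward and the longest execution time.
• Better approximation accuracy also matters for the policy.
• The other nonstationary methods suffered chronically increased exploration.
• The linear policy could not estimate rewards correctly.
[Figure: Cumulative rewards - Trials per second trade-off; x-axis: Cumulative rewards, y-axis: Trials per second (log scale); RW-GPB plotted for τ = 1, 4, 40, 100, 400, 800, 1600 against GP+UCB (Sliding Window), GP+UCB (Weighted), and GP+UCB (Weighted, RFF)]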

Slide 143

Slide 143 text

Policy evaluation results
Analysis of execution time and effectiveness of the error-correction method
• The proposed policy keeps execution time constant through recursive learning.
• Policies that do not use RFF show steeply growing execution time; even with RFF, training time still increases.
• The accumulated error and the number of corrections affect cumulative reward more than they affect execution time.
• Aggressive error correction is effective.
[Figure: Cumulative execution time; x-axis: Number of trials, y-axis: Cumulative execution time (Sec); series: RW-GPB (τ = 4), GP+UCB (Sliding Window), GP+UCB (Weighted), GP+UCB (Weighted, RFF)]
[Figure: Cumulative rewards - Trials per second trade-off; x-axis: Cumulative rewards, y-axis: Trials per second (log scale); RW-GPB plotted for τ = 1, 4, 40, 100, 400, 800, 1600 against the same three baselines]

Slide 144

Slide 144 text

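Discussion of the proposal
• The proposed policy inherits the following issues of forgetting-based policies, discussed earlier:
• i. Because it is based on a regression model, it cannot adopt the countermeasure against upsets that relies on missing-value handling in state-space models.
• ii. The forgetting rate is hard to tune when the intervals between changes are irregular.
• iii. Issue ii becomes harder when the number of observations is unevenly distributed across context factors.
• This evaluation used a simulation in which these issues have no effect; countermeasures are needed before the method is applied in production.
• → For i, a scheme that applies only the forgetting operation to unselected arms is under consideration.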

Slide 145

Slide 145 text

Section summary

Slide 146

Slide 146 text

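Section summary
• In addition to characteristics (1) and (2), a policy using the recursive weighted Gaussian process regression model was proposed to address characteristic (3): complex relationships in the estimation of usefulness.
• The evaluation of the proposed regression model confirmed that it estimates the predictive distribution accurately on nonstationary, nonlinear regression problems.
• The evaluation of the proposed policy on a nonstationary, nonlinear multi-armed bandit problem showed a 0.3% improvement in cumulative reward and a 92% reduction in execution time compared with the state-of-the-art policy.
• Future work: an adaptive parameter-tuning mechanism, starting with the correction interval, and a virtual exploration mechanism that uses missing values.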

Slide 147

Slide 147 text

6. Conclusion

Slide 148

Slide 148 text

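Conclusion and significance of this research (1/2)
• Toward information systems that adapt by themselves to diverse and continuously changing environments, this research proposed an adaptive information system, and policies for it, that treats the elimination of the opportunity loss accompanying adaptation as a multi-armed bandit problem.
• For the multi-armed bandit problem, this research proposed policies that can take into account three characteristics required in practice. Related work could address each characteristic individually, but simple combinations of existing methods could not adequately handle the new issues that arise when the settings are combined. This research organized the issues arising from such combinations as research problems for the field and also presented example policies that solve them, which we regard as its academic contribution.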

Slide 149

Slide 149 text

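Conclusion and significance of this research (2/2)
• As long as the users of information systems are people, the judgment of usefulness is left to the users. Reducing the opportunity loss of the comparative evaluation in production that this requires will presumably be handled by unambiguous mathematical formulations and solution methods such as the multi-armed bandit problem.
• As support from automated machine learning technologies and the practical use of large language models lower the barrier to developing machine learning models, the delivery of models and interventions will accelerate further. In that setting, we believe the proposed policies can help reduce operators' effort and improve users' satisfaction through the optimization of selection.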

Slide 151

Slide 151 text

Appendix

Slide 152

Slide 152 text

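Research overview: Information systems that adapt to diverse and continuously changing environments

Background and purpose: optimizing selection toward adaptive information systems
1. Information systems and environmental change. For an information system to keep functioning in a diverse and continuously changing environment, the challenge is to realize automated adaptation mechanisms in place of conventional manual operation.
2. Information systems that adapt to environmental change by themselves. Realizing adaptive information systems requires integration with machine learning models, which design behavior dynamically from data, but it is hard to know in advance which machine learning model will be truly effective.
3. Optimizing selection. Selecting machine learning models by evaluation in production carries opportunity loss from short-term evaluation and the risk of overlooking the best model, so a mechanism is needed that optimizes this selection process and reduces opportunity loss.

Issues: issues in optimizing selection
1. Opportunity loss from evaluation in production. Information systems that optimize model selection with conventional multi-armed bandit policies cannot take the characteristics of machine learning models into account and cannot sufficiently suppress opportunity loss.
2. Variation of usefulness with context and the passage of time. The usefulness of a machine learning model varies with the situation of the user and the system, and, even in the same situation, with the passage of time.
3. Effect of adaptation time on usefulness. Introducing a continuous comparative-evaluation mechanism that also accounts for context and the passage of time delays responses.
4. Complex relationships in the estimation of usefulness. The relationship between context and the usefulness of a machine learning model can also be nonlinear.

Results: resolving the issues in optimizing selection
1. Design of an information system that continuously optimizes selection [1]. For issue 1, taking recommender systems as the subject, an adaptive information system platform was designed and evaluated that optimizes selection automatically and continuously, using the multi-armed bandit policies proposed below that account for the characteristics of machine learning models.
2. Adaptation to diverse and continuously changing environments [2]. For issue 2, a contextual and nonstationary multi-armed bandit policy using a particle filter was proposed, improving adaptability to diverse and continuously changing environments.
3. Faster adaptation [3]. For issues 2 and 3, a contextual and nonstationary multi-armed bandit policy using a linear Kalman filter was proposed, achieving faster adaptation.
4. Handling nonlinearity [4]. For issues 2, 3, and 4, a nonstationary and nonlinear contextual multi-armed bandit policy using recursive weighted Gaussian process regression was proposed, handling nonstationarity in nonlinear problem settings and improving processing speed.

References
[1] Yusuke Miyake, Tsunenori Mine, Synapse: A recommender system that continuously optimizes the selection of recommendation methods according to context (in Japanese), IEICE Transactions on Information and Systems (Japanese Edition), Vol.J103-D, No.11, pp.764-775, Nov. 2020.
[2] Yusuke Miyake, Tsunenori Mine, Synapse: A meta-recommender system that optimizes the selection of recommendation methods according to context and the passage of time (in Japanese), IEICE Transactions on Information and Systems (Japanese Edition), Vol.J105-D, No.11, pp.641-652, Nov. 2022.
[3] Yusuke Miyake, Tsunenori Mine, Contextual and Nonstationary Multi-armed Bandits Using the Linear Gaussian State Space Model for the Meta-Recommender System, 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp.3138-3145, Oct. 2023.
[4] Yusuke Miyake, Ryuji Watanabe, Tsunenori Mine, Online Nonstationary and Nonlinear Bandits with Recursive Weighted Gaussian Process, The 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC 2024) (to appear).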