Slide 1

Slide 1 text

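A contextual multi-armed bandit method that handles nonlinearity without sacrificing sequential adaptability, using a fast learning mechanism
Yusuke Miyake 1,2, Tsunenori Mine 3
1. Pepabo R&D Institute, GMO Pepabo, Inc.
2. Department of Advanced Information Technology, Graduate School of Information Science and Electrical Engineering, Kyushu University
3. Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University
2022.09.15 SMASH22 Summer Symposium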

Slide 2

Slide 2 text

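Contents
1. Introduction
2. Related work: integration with NN models and the challenge of sequential adaptability
3. Proposed method: Extreme Neural Linear Bandits
4. Evaluation and discussion
5. Conclusion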

Slide 3

Slide 3 text

1. Introduction

Slide 4

Slide 4 text

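Adaptive systems and communication cost
• To realize an adaptive system, it is important that the system knows the user's situation well
• An e-commerce system, for example, can propose the best products and navigation paths once it understands the user's preferences
• In a production system, communication comes at a cost
• Requirements and preferences are not clear at first (even to the users themselves) and take shape gradually
• The burden and opportunity loss during that period affect sales and similar metrics in both the short and the long term
• In particular, where requirements and preferences change, communication that currently has low value must still be continued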

Slide 5

Slide 5 text

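Optimizing communication cost and the multi-armed bandit
• By restricting communication to proposing options and observing the reactions, the problem of optimizing this cost can be treated as a "multi-armed bandit problem"
• However, with conventional algorithms for this problem, the learning time needed to capture the relationship grows in environments where there is a "complex relationship" between situation and choice (explained under related work)
• By proposing an algorithm that can propose options accurately and quickly in such environments, we aim to advance the practical use of adaptive systems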

Slide 6

Slide 6 text

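The multi-armed bandit problem
• The problem of maximizing the reward obtained from multiple candidates called "arms"
• In each trial the player selects one arm and receives a reward
• Each arm generates rewards according to its own reward distribution
• The player, however, has to infer this reward distribution from the results of the trials
• Based on the current evaluation of the arms, the player carries out "exploitation" and "exploration" in parallel
• Various algorithms have been proposed to resolve this trade-off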

Slide 7

Slide 7 text

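Extending the multi-armed bandit problem
• The basic problem assumes that the reward distribution of each arm is always the same
• → Might the reward distribution differ by situation or attribute?
• Examples: popular products differ by age group; the user recently bought a product in the same category
• The problem has been extended to the "contextual" multi-armed bandit problem
• → Algorithms for it infer the relationship between context information* and reward
• *Context information is the situation and similar factors, converted into a form an information system can handle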

Slide 8

Slide 8 text

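Conventional contextual multi-armed bandit algorithms
• They estimate the reward by assuming a linear relationship between context information and reward
• LinUCB [L. Li 2010], Linear Thompson Sampling [S. Agrawal 2013]
Example: arm selection in LinUCB
$$a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$$
$$\tilde{\theta}^{(k)} = U^{(k)} v^{(k)}, \qquad U^{(k)} = \left( \sum_{i=1}^{N^{(k)}} x_i x_i^\top \right)^{-1}, \qquad v^{(k)} = \sum_{i=1}^{N^{(k)}} x_i y_i$$
• The arm selected is the one with the largest sum of the estimated mean reward and an exploration term, which expresses the uncertainty according to the number of trials
• This is a linear model that assumes the reward is the sum of products of the components of the input context $x$ and the parameter $\theta$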
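The LinUCB selection rule above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `LinUCBArm`, the function `select_arm`, and the `ridge` initialisation of $U$ (so that the inverse exists before any data arrives) are hypothetical choices made for the example.

```python
import math


class LinUCBArm:
    """Per-arm state for LinUCB, using the slide's symbols U, v and theta."""

    def __init__(self, dim, ridge=1.0):
        # Assumption: U starts at (ridge * I)^{-1} so it is defined with zero samples.
        self.U = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                  for i in range(dim)]
        self.v = [0.0] * dim

    def _U_times(self, vec):
        return [sum(u * a for u, a in zip(row, vec)) for row in self.U]

    def score(self, x, alpha):
        theta = self._U_times(self.v)              # theta = U v
        Ux = self._U_times(x)
        mean = sum(a * t for a, t in zip(x, theta))
        width = math.sqrt(sum(a * b for a, b in zip(x, Ux)))
        return mean + alpha * width                # estimate + exploration term

    def update(self, x, y):
        # Sherman-Morrison: (U^{-1} + x x^T)^{-1} = U - (U x)(U x)^T / (1 + x^T U x)
        Ux = self._U_times(x)
        denom = 1.0 + sum(a * b for a, b in zip(x, Ux))
        for i in range(len(x)):
            for j in range(len(x)):
                self.U[i][j] -= Ux[i] * Ux[j] / denom
        self.v = [vi + y * a for vi, a in zip(self.v, x)]


def select_arm(arms, x, alpha=0.1):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.score(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

Keeping $U$ as an already-inverted matrix and updating it with a rank-one correction avoids re-inverting on every trial, which is what lets the linear algorithm update after each observation.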

Slide 9

Slide 9 text

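Growing sophistication of contextual multi-armed bandit problems and nonlinear algorithms
• The variety and volume of data handled by information systems are increasing
• From structured data such as demographic attributes to unstructured data such as images and natural language
• Simple linear algorithms cannot adequately handle complex relationships
[Figure: examples of reward distributions over context information $x = (x_1, x_2)^\top$. The left is suited to an algorithm assuming linearity ($\hat{y} = x^\top w$), the right to one assuming nonlinearity.]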

Slide 10

Slide 10 text

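Growing sophistication of contextual multi-armed bandit problems and nonlinear algorithms
• Methods have appeared that use a Neural Network (NN) to handle nonlinear relationships between context information and reward [R. Allesiardo 2014, C. Riquelme 2018, M. Collier 2018, D. Guo 2020, D. Zhou 2020, S. Sajeev 2021]
• For an NN model to deliver its performance, it needs a large amount of training data and enough training time to fit that data
• This lowers the adaptability to the diverse and changing requests that users issue sequentially (sequential adaptability)
• If the growth in training time is not taken into account, updates to the decision criterion are delayed
• If sequential training is avoided, the latest information cannot be used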

Slide 11

Slide 11 text

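Research goal and outline of the proposal
• Realizing an adaptive system requires a mechanism that makes complex decisions quickly
• We focus on nonlinear algorithms for the contextual multi-armed bandit problem, which formalizes this need
• We want to solve the problem that, in conventional algorithms, growing training time undermines sequential adaptability
• We propose integration with an NN model that needs no iterative training and has a short training time
• In addition, we analyze and discuss the usefulness of that model for multi-armed bandit algorithms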

Slide 12

Slide 12 text

2. Related work: integration with NN models and the challenge of sequential adaptability

Slide 13

Slide 13 text

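Neural Bandit1 [R. Allesiardo 2014]
• An early nonlinear contextual multi-armed bandit algorithm that introduced an NN model
• An arbitrary NN model is used as the reward function that estimates the reward from context information
• Arms are exploited and explored at a fixed ratio by $\epsilon$-Greedy
[Figure: the context $x_t$ enters a neural network that outputs estimated rewards $\hat{y}^{(1)}_t, \hat{y}^{(2)}_t, \dots, \hat{y}^{(K)}_t$. The bandit selects $\operatorname{argmax}_{k=1,\dots,K} \hat{y}^{(k)}$ with probability $1 - \epsilon$, and any arm $a \in A$ with probability $\epsilon / K$.]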
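The fixed-ratio $\epsilon$-Greedy rule described above can be sketched as follows. The function name `epsilon_greedy` and the `rng` parameter are hypothetical, and the per-arm estimates are passed in as a plain list in place of an actual NN's outputs.

```python
import random


def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick an arm index from per-arm reward estimates.

    With probability epsilon every arm is equally likely (epsilon / K each);
    otherwise the arm with the highest estimate is chosen.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                       # explore
    return max(range(len(estimates)), key=estimates.__getitem__)   # exploit
```

Because `epsilon` is fixed, the exploration rate does not shrink as the estimates improve. That is one reason later methods look for an uncertainty measure comparable to the LinUCB exploration term.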

Slide 14

Slide 14 text

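Neural Bandit1 [R. Allesiardo 2014]
• An early nonlinear contextual multi-armed bandit algorithm that introduced an NN model
• An arbitrary NN model is used as the reward function that estimates the reward from context information
• Arms are exploited and explored at a fixed ratio by $\epsilon$-Greedy
• It revealed two challenges in introducing NN models into multi-armed bandit algorithms
1. Accounting for the NN model's uncertainty in exploitation and exploration
2. Ensuring sequential adaptability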

Slide 15

Slide 15 text

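1. Accounting for model uncertainty in exploitation and exploration
• The challenge is how to express, in an NN model, the model uncertainty used for exploiting and exploring arms
• That is, how to express with an NN model a value equivalent to the exploration term of LinUCB:
$$a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$$
• Most conventional algorithms that introduce NN models focus on this challenge
• Algorithms that apply Dropout at prediction time and treat the resulting spread of estimates as uncertainty [C. Riquelme 2018][M. Collier 2018]
• Algorithms that use the Bootstrap method to select stochastically among multiple neural network models and treat the resulting spread of estimates as uncertainty [C. Riquelme 2018][D. Guo 2020]
• An algorithm that expresses uncertainty with the gradient with respect to the parameters, obtained from the difference from the reward estimate [D. Zhou 2020]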

Slide 16

Slide 16 text

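1. Accounting for model uncertainty in exploitation and exploration
• The challenge is how to express, in an NN model, the model uncertainty used for exploiting and exploring arms
• That is, how to express with an NN model a value equivalent to the exploration term of LinUCB:
$$a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$$
• Most conventional algorithms that introduce NN models focus on this challenge
• These algorithms either do not train the NN model sequentially or, when they do, do not take the training time into account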

Slide 17

Slide 17 text

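2. Ensuring sequential adaptability
• Neural Linear [C. Riquelme 2018] addressed the two challenges by separating the NN model from the model of the multi-armed bandit algorithm
• An arbitrary NN model is used as a converter from the original context information to new context information that represents the relationship with the reward well
• Exploitation and exploration of arms use a conventional linear algorithm (addressing the first challenge)
• The training interval of the NN model is decoupled from the sequential update interval of the multi-armed bandit algorithm (easing the second challenge → but the latest information cannot be used)
$$a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$$
[Figure: the context $x_t$ passes through the NN to produce new contexts $\tilde{x}^{(1)}_t, \dots, \tilde{x}^{(K)}_t$, which are fed to the bandit (LinUCB, Linear Thompson Sampling, etc.).]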

Slide 18

Slide 18 text

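2. Ensuring sequential adaptability (conventional approaches in NN models)
• Backpropagation, which is used to train NN models, proceeds iteratively using the gradient obtained from the prediction error
• → As the training data grows, the training time needed until the result converges increases
• Stochastic gradient descent and optimization algorithms for gradient descent [D.P. Kingma 2014] have been proposed to shorten the time to convergence
• → The training time still increases as the training data grows
• Another approach trains additionally on only the most recently obtained training data
• → Catastrophic forgetting [J. Kirkpatrick 2017] occurs, and a drop in accuracy has been reported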

Slide 19

Slide 19 text

3. Proposed method: Extreme Neural Linear Bandits

Slide 20

Slide 20 text

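Toward a nonlinear algorithm that does not sacrifice sequential adaptability
• To adapt to the diverse and continuously changing requests that users issue sequentially, it is effective to integrate with a model that satisfies all of the following:
1. It is an NN model that can handle nonlinear relationships between context and reward
2. It can be trained sequentially
3. The time per training step is short
• We propose a method that uses the Online Sequential Extreme Learning Machine (OS-ELM)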

Slide 21

Slide 21 text

Introduction to the component techniques

Slide 22

Slide 22 text

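Extreme Learning Machine (ELM) [Huang 2006]
• A simple regression model using an NN structure with a single hidden layer
• The weights are estimated by least squares rather than by iterative training with backpropagation, so a shorter training time can be expected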

Slide 23

Slide 23 text

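Structure of the ELM
• An NN model consisting of a single hidden layer with $L$ units (here the output is restricted to a scalar, on the premise of integration with a multi-armed bandit algorithm)
$$x \in \mathbb{R}^d, \quad W \in \mathbb{R}^{L \times d}, \quad b \in \mathbb{R}^L, \quad \beta \in \mathbb{R}^{1 \times L}$$
$$h_1 = \phi\left( \sum_{i=1}^{d} W_{1,i} x_i + b_1 \right), \qquad h(x) = \phi.(Wx + b), \qquad \hat{y} = \beta h(x)$$
• $\phi$ is an arbitrary activation function; $\phi.$ is the operation that applies $\phi$ element-wise
[Figure: network with $d$ input units, $L$ hidden units, and a single output.]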

Slide 24

Slide 24 text

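Training the ELM (input layer to hidden layer)
• The weights $W$ and bias $b$ are initialized with random numbers and are not estimated
• The model can then be regarded as a linear model with a feature function $h(x)$ that makes the input nonlinear
$$W = (w_{i,j})_{1 \le i \le L,\ 1 \le j \le d}, \quad w_{i,j} \sim P(\theta), \qquad b = (b_i)_{1 \le i \le L}, \quad b_i \sim P(\theta)$$
$$x \in \mathbb{R}^d, \quad \beta \in \mathbb{R}^{1 \times L}, \quad h_1 = \phi\left( \sum_{i=1}^{d} W_{1,i} x_i + b_1 \right), \quad h(x) = \phi.(Wx + b), \quad \hat{y} = \beta h(x)$$
• $\phi$ is an arbitrary activation function; $\phi.$ is the operation that applies $\phi$ element-wise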
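As a rough illustration of the ELM forward pass, the sketch below draws $W$ and $b$ once and then only evaluates $h(x)$ and $\hat{y} = \beta h(x)$. The function names are hypothetical, and the uniform distribution on $[-1, 1]$ for $P(\theta)$ and `tanh` for $\phi$ are assumptions, since the slides leave both open.

```python
import math
import random


def init_hidden(d, L, rng):
    """Draw W (L x d) and b (length L) once; they are never trained afterwards."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    return W, b


def hidden(x, W, b, phi=math.tanh):
    """h(x) = phi.(W x + b): apply phi to each hidden unit's pre-activation."""
    return [phi(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def predict(x, W, b, beta):
    """y_hat = beta h(x), a scalar because beta has shape 1 x L."""
    return sum(bj * hj for bj, hj in zip(beta, hidden(x, W, b)))
```

Since `W` and `b` stay fixed, `hidden` acts as a fixed nonlinear feature map, and `beta` is the only quantity left to estimate.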

Slide 25

Slide 25 text

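Training the ELM (hidden layer to output layer)
• Least squares is applied to this linear model to estimate the weights $\beta$
• The estimate $\hat{\beta}^\top$ is the minimizer of the prediction error $\lVert H\beta^\top - y \rVert^2$ on the training data $X$ and $y$
$$\hat{\beta}^\top = (H^\top H)^{-1} H^\top y, \qquad \hat{y} = \hat{\beta} h(x), \quad x \in \mathbb{R}^d$$
Notation
$$H = \phi.(XW^\top + B) \in \mathbb{R}^{N \times L}, \quad X = (x_1, \dots, x_N)^\top \in \mathbb{R}^{N \times d}, \quad y = (y_1, \dots, y_N)^\top \in \mathbb{R}^N, \quad B = (b, \dots, b)^\top \in \mathbb{R}^{N \times L}$$
$$W = (w_{i,j})_{1 \le i \le L,\ 1 \le j \le d}, \quad w_{i,j} \sim P(\theta), \qquad b = (b_i)_{1 \le i \le L}, \quad b_i \sim P(\theta)$$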
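The closed-form estimate $\hat{\beta}^\top = (H^\top H)^{-1} H^\top y$ can be sketched without any numerical library by solving the normal equations directly. The helper names `solve` and `fit_beta` are hypothetical, and the hidden-layer outputs $H$ are assumed to be precomputed. The sketch also assumes $H^\top H$ is invertible, which requires at least $L$ samples.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix [A | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_beta(H, y):
    """Least-squares output weights: solve (H^T H) beta^T = H^T y."""
    L = len(H[0])
    HtH = [[sum(row[i] * row[j] for row in H) for j in range(L)]
           for i in range(L)]
    Hty = [sum(row[i] * yi for row, yi in zip(H, y)) for i in range(L)]
    return solve(HtH, Hty)
```

The whole fit is one linear solve over an $L \times L$ system, with no epochs or learning rate. This is where the training-time advantage over backpropagation comes from.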

Slide 26

Slide 26 text

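Online Sequential ELM (OS-ELM) [Huang 2005]
• When data arrives sequentially, as in the multi-armed bandit problem, the ELM estimation method must recompute over all past data every time new data is obtained (the computational cost grows with the number of trials)
• Online Sequential ELM has been proposed for this setting: given the computed result for $N$ inputs $X$ and $y$, it estimates the weights $\beta$ using only that result and the new $(N+1)$-th data point (the computational cost stays constant regardless of the number of trials)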

Slide 27

Slide 27 text

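Training the OS-ELM
• Recursive least squares is applied to the same linear model as in ELM to estimate the weights $\beta$
• The results computed up to time $N$ can be referenced as sums of values obtained from the feature function and the outputs
$$\hat{y} = \hat{\beta}_{N+1} h(x), \quad x \in \mathbb{R}^d$$
$$\hat{\beta}^\top_{N+1} = (H^\top_{N+1} H_{N+1})^{-1} H^\top_{N+1} y_{N+1} = Q_{N+1} r_{N+1}$$
$$Q_{N+1} = \left( \sum_{i=1}^{N+1} h(x_i) h(x_i)^\top \right)^{-1} = \left( Q_N^{-1} + h(x_{N+1}) h(x_{N+1})^\top \right)^{-1} = Q_N - \frac{Q_N h(x_{N+1}) h(x_{N+1})^\top Q_N}{1 + h(x_{N+1})^\top Q_N h(x_{N+1})}$$
$$r_{N+1} = \sum_{i=1}^{N+1} y_i h(x_i)$$
$$W = (w_{i,j})_{1 \le i \le L,\ 1 \le j \le d}, \quad w_{i,j} \sim P(\theta), \qquad b = (b_i)_{1 \le i \le L}, \quad b_i \sim P(\theta)$$
• The last equality for $Q_{N+1}$ follows from the matrix inversion lemma
• While $L > N$, the inverse matrix $Q$ cannot be computed, so sequential training is not performed (the Boosting period)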
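The matrix inversion lemma step can be checked by direct multiplication. The derivation below is a sketch added for illustration; the abbreviations $h = h(x_{N+1})$ and $c = 1 + h^\top Q_N h$ are introduced here, and the final line is the equivalent recursive form of the weight update, derived from $\hat{\beta}^\top_{N} = Q_N r_N$ and $r_{N+1} = r_N + y_{N+1} h$.

```latex
\begin{aligned}
\left(Q_N^{-1} + h h^\top\right)\left(Q_N - \frac{Q_N h h^\top Q_N}{c}\right)
  &= I + h h^\top Q_N - \frac{h h^\top Q_N + h\,(h^\top Q_N h)\,h^\top Q_N}{c} \\
  &= I + h h^\top Q_N - \frac{h\,(1 + h^\top Q_N h)\,h^\top Q_N}{c} \\
  &= I + h h^\top Q_N - h h^\top Q_N = I, \\[4pt]
\hat{\beta}^\top_{N+1} = Q_{N+1}\left(r_N + y_{N+1} h\right)
  &= Q_{N+1}\left(\left(Q_{N+1}^{-1} - h h^\top\right)\hat{\beta}^\top_N + y_{N+1} h\right) \\
  &= \hat{\beta}^\top_N + Q_{N+1} h \left(y_{N+1} - h^\top \hat{\beta}^\top_N\right).
\end{aligned}
```

The last line shows that each update needs only the previous $Q_N$, the previous weights, and the new sample, so the cost per update does not depend on $N$.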

Slide 28

Slide 28 text

Proposed method

Slide 29

Slide 29 text

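Proposed method: Extreme Neural Linear Bandits
• We adopt the Neural Linear [C. Riquelme 2018] scheme (with a view to swapping and improving the NN and the bandit parts)
• In applying OS-ELM within this scheme, we introduce "1. a transformation function" and "2. a regularization term"
$$a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$$
[Figure: Extreme Neural Linear Bandits. In the neural network part, the context $x_t$ passes through OS-ELM with regularization and the transformation function $\beta^\top \otimes h(x)$ to produce $\tilde{x}^{(1)}_t, \dots, \tilde{x}^{(K)}_t$, which are fed to the bandit part (LinUCB, Linear Thompson Sampling, etc.).]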

Slide 30

Slide 30 text

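Proposed method: transformation function for context information
• To obtain new context information $\tilde{x}$ from the original context information $x$, we use the element-wise product of the hidden layer output $h(x_t)$ and the weights $\beta$ between the hidden layer and the output layer
• Conventional Neural Linear uses the output of the last hidden layer as is
• → In OS-ELM, reflecting the weights that effectively contribute to capturing nonlinearity improves the usefulness of the result as context information
$$\tilde{x}^{(\mathrm{NeuralLinear})}_t = h(x_t), \qquad \tilde{x}^{(\mathrm{ExtremeNeuralLinearBandits})}_t = \beta^\top \otimes h(x_t)$$
[Figure: network diagram annotated "the weights of the output layer are also used".]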
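The two transformation functions can be contrasted in a short sketch. The function names are hypothetical, and `beta` and `h` are plain lists of length $L$.

```python
def neural_linear_context(h):
    """Conventional Neural Linear: use the last hidden layer output as is."""
    return list(h)


def extreme_context(beta, h):
    """Proposed transform: element-wise product of output weights and h(x)."""
    return [b * hj for b, hj in zip(beta, h)]
```

One property worth noting is that the components of `extreme_context(beta, h)` sum to the OS-ELM prediction $\hat{y} = \beta h(x)$. The new context therefore contains the model's own reward estimate, split into per-unit contributions.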

Slide 31

Slide 31 text

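Proposed method: introducing regularization into OS-ELM
• In OS-ELM, while the number of trials $N$ is less than the number of units $L$, training is not possible (the Boosting period), so opportunity loss occurs during this period
• The proposed method applies ridge regression to OS-ELM, so it can train sequentially from the very first trials
• The estimate $\hat{\beta}^\top$ is the minimizer of the original prediction error plus a regularization term, $\lVert H\beta^\top - y \rVert^2 + \lambda_{nn} \lVert \beta^\top \rVert^2$
$$\hat{\beta}^\top_{N+1} = (H^\top_{N+1} H_{N+1} + \lambda_{nn} I)^{-1} H^\top_{N+1} y_{N+1}$$
• The recursion is changed so that it starts from $\lambda_{nn} I$ at $N = 0$
• Because the norm of the parameters is constrained, overfitting is prevented and better generalization can also be expected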
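A regularized recursive update of this kind might look like the sketch below. It is an illustration rather than the authors' code: the class name `RidgeOSELM` is hypothetical, the hidden-layer output `h` is assumed to be computed elsewhere, and the rank-one update is the standard Sherman-Morrison form applied to $Q_0 = (\lambda_{nn} I)^{-1}$.

```python
class RidgeOSELM:
    """Recursive ridge estimate of beta over hidden features h of length L."""

    def __init__(self, L, lam=1.0):
        # Q_0 = (lam * I)^{-1}: the inverse exists before any sample arrives,
        # so no separate initial batch phase is needed.
        self.Q = [[(1.0 / lam if i == j else 0.0) for j in range(L)]
                  for i in range(L)]
        self.beta = [0.0] * L

    def update(self, h, y):
        Qh = [sum(q * hj for q, hj in zip(row, h)) for row in self.Q]
        denom = 1.0 + sum(hj * qh for hj, qh in zip(h, Qh))
        # Q_{N+1} = Q_N - Q_N h h^T Q_N / (1 + h^T Q_N h)
        for i in range(len(h)):
            for j in range(len(h)):
                self.Q[i][j] -= Qh[i] * Qh[j] / denom
        # beta_{N+1} = beta_N + Q_{N+1} h (y - h^T beta_N), with Q_{N+1} h = Q_N h / denom
        err = y - sum(hj * bj for hj, bj in zip(h, self.beta))
        self.beta = [bj + qh / denom * err for bj, qh in zip(self.beta, Qh)]

    def predict(self, h):
        return sum(hj * bj for hj, bj in zip(h, self.beta))
```

After each call to `update`, `beta` equals the batch ridge solution over all samples seen so far, yet each call touches only an $L \times L$ matrix and the newest sample.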

Slide 32

Slide 32 text

4. Evaluation and discussion

Slide 33

Slide 33 text

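Evaluation method
• Wheel bandits [C. Riquelme 2018]: a simulation of a nonlinear multi-armed bandit problem
• At time $t$, for context information $x_t = (x_i)_{1 \le i \le 2}$, $x_i \sim \mathrm{Uniform}(-1, 1)$, the selected arm yields a reward $y_t \sim \mathcal{N}(\mu, \sigma^2)$
• The mean reward $\mu$ is determined per arm as shown in the figure (where $\mu_2 < \mu_1 \ll \mu_3$)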

Slide 34

Slide 34 text

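Evaluation method
• Simulation of a nonlinear multi-armed bandit problem using Wheel bandits
• Simulation parameters: $\mu_1 = 1.2$, $\mu_2 = 1.0$, $\mu_3 = 5.0$, $\sigma^2 = 0.1$, $\delta = 0.7$
• Each simulation has 5000 trials; the results are averages over 50 runs
• The algorithms compared are listed below, with settings aligned so that differences between algorithms are clear
• The training interval is set to every 100 trials, to match the time-consuming Neural Linear (Full)

| Algorithm | NN model | NN hidden layer | NN regularization | Bandit model | Bandit regularization | Exploration rate |
| LinUCB: linear algorithm | - | - | - | LinUCB | λ=1.0 | α=0.1 |
| Neural Linear (Differential): nonlinear, differential training | MLP (Diff) | L=100 | λ=1.0 | LinUCB | λ=1.0 | α=0.1 |
| Neural Linear (Full): nonlinear, full retraining each time | MLP (Full) | L=100 | λ=1.0 | LinUCB | λ=1.0 | α=0.1 |
| Extreme Neural Linear Bandits (proposed) | OS-ELM | L=100 | λ=1.0 | LinUCB | λ=1.0 | α=0.1 |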
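For orientation, a reward generator in the spirit of the wheel bandit can be sketched as follows. The layout is an assumption, because the slide's figure is not reproduced in the text. It follows the commonly described setup from [C. Riquelme 2018]: five arms, where arm 0 always has mean $\mu_1$ and, outside radius $\delta$, one of the other four arms has mean $\mu_3$ depending on the quadrant. The quadrant-to-arm mapping and the function names are hypothetical.

```python
import math
import random


def mean_rewards(x, mu1=1.2, mu2=1.0, mu3=5.0, delta=0.7):
    """Mean reward of each of 5 arms for a 2-D context x (assumed wheel layout)."""
    means = [mu1, mu2, mu2, mu2, mu2]
    if math.hypot(x[0], x[1]) > delta:
        # Outside the inner circle, the quadrant decides which outer arm pays mu3.
        # This quadrant-to-arm assignment is arbitrary.
        if x[0] >= 0 and x[1] >= 0:
            means[1] = mu3
        elif x[0] >= 0:
            means[2] = mu3
        elif x[1] < 0:
            means[3] = mu3
        else:
            means[4] = mu3
    return means


def pull(x, arm, rng, sigma2=0.1):
    """Sample y_t ~ N(mu, sigma^2) for the chosen arm."""
    return rng.gauss(mean_rewards(x)[arm], math.sqrt(sigma2))
```

Under this layout the best arm depends on both the norm and the sign pattern of the context. A model that is linear in $x$ cannot represent that dependence, which is why the benchmark separates linear from nonlinear algorithms.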

Slide 35

Slide 35 text

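Evaluating performance on complex decision-making
• The cumulative reward of the simulation is compared across algorithms
• The linear algorithm (LinUCB) has a low cumulative reward and cannot adequately handle the nonlinear setting
• Neural Linear (Full) has the highest cumulative reward
• Since Differential < Extreme Neural Linear Bandits, the proposed method, as a differential approach aimed at reducing execution time, maintains its performance on the multi-armed bandit problem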

Slide 36

Slide 36 text

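Evaluating sequential adaptability
• The cumulative execution time for updating the arm evaluations is compared across algorithms
• LinUCB, which needs no NN training, is the fastest at 0.05 s (but does not handle nonlinearity adequately)
• Next come the proposed method at 3.0 s, Neural Linear (Differential) at 13.1 s, and Full at 28.3 s
• Reason 1: the proposed method and Differential train on differences, so the training time stays constant regardless of the number of trials
• Reason 2: the proposed method needs no iterative training. Its execution time per training step is 0.07 s against 0.3 s for Differential, about 4.1 times faster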

Slide 37

Slide 37 text

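Evaluating the usefulness of the context information
• For the new context information obtained from the regularized OS-ELM used in the proposed method, we analyze its usefulness for nonlinear multi-armed bandit algorithms from the following viewpoints
1. Accuracy as an NN model
2. Relationship between the NN model and the multi-armed bandit algorithm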

Slide 38

Slide 38 text

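1. Analysis of accuracy as an NN model
• For the NN models of the algorithms evaluated above, accuracy is compared by amount of training data
• The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean absolute prediction error on the test data, summed over the models of all arms
• Solid lines are the results for regularization parameter $\lambda_{nn} = 1.0$, dotted lines for $\lambda_{nn} = 0.0001$

| Algorithm | NN model | Hidden layer | Regularization |
| LinUCB: linear algorithm | - | - | - |
| Neural Linear (Differential): nonlinear, differential | MLP (Diff) | L=100 | λ=1.0 |
| Neural Linear (Full): nonlinear, full retraining each time | MLP (Full) | L=100 | λ=1.0 |
| Extreme Neural Linear Bandits (proposed) | OS-ELM | L=100 | λ=1.0 |

• With MLP (Diff) (iterative training on differential data), the prediction error did not decrease even as an NN model
• With OS-ELM and MLP (Full) (iterative training on all data), the prediction error decreased as the training data grew, especially in the early phase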

Slide 39

Slide 39 text

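1. Analysis of accuracy as an NN model
• For the NN models of the algorithms evaluated above, accuracy is compared by amount of training data
• The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean absolute prediction error on the test data, summed over the models of all arms
• Solid lines are the results for regularization parameter $\lambda_{nn} = 1.0$, dotted lines for $\lambda_{nn} = 0.0001$

| Algorithm | NN model | Hidden layer | Regularization |
| LinUCB: linear algorithm | - | - | - |
| Neural Linear (Differential): nonlinear, differential | MLP (Diff) | L=100 | λ=1.0 |
| Neural Linear (Full): nonlinear, full retraining each time | MLP (Full) | L=100 | λ=1.0 |
| Extreme Neural Linear Bandits (proposed) | OS-ELM | L=100 | λ=1.0 |

• With MLP (Diff) (iterative training on differential data), the prediction error did not decrease even as an NN model
• With OS-ELM and MLP (Full) (iterative training on all data), the prediction error decreased as the training data grew, especially in the early phase
• When regularization is weakened, the prediction error decreases further only for MLP (Full)
• → This contrasts with the cumulative reward in the bandit problem, where stronger regularization gave better results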

Slide 40

Slide 40 text

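1. Analysis of accuracy as an NN model
• For the NN models of the algorithms evaluated above, accuracy is compared by amount of training data
• The distribution of reward values estimated by the NN model tied to arm $a_2$ is visualized (zoomed in around $\mu = \mu_3$)
[Figure: estimated reward distributions for OS-ELM and MLP (Full) next to the Truth, at $D = 100$ and $D = 25\mathrm{k}$, for $\lambda_{nn} = 0.0001$ and $\lambda_{nn} = 1.0$.]
① As the training data increases, the estimate approaches the shape of the true reward distribution
② With weak regularization, the model fits the obtained data more readily (overfitting)
③ Even when overfitting is tolerated, OS-ELM shows a limit in expressive power compared with MLP (Full)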

Slide 41

Slide 41 text

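2. Relationship between the NN model and the multi-armed bandit algorithm
• For the NN models of the algorithms evaluated above, cumulative reward is compared by amount of training data
• An NN model pre-trained on the specified amount of training data is used (that is, only the bandit model is trained during the simulation)
• The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean total reward over 10 simulations
• The result is roughly the inverse of the NN model's prediction error: the higher the accuracy of the NN model, the more reward tends to be obtained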

Slide 42

Slide 42 text

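2. Relationship between the NN model and the multi-armed bandit algorithm
• For the NN models of the algorithms evaluated above, cumulative reward is compared by amount of training data
• An NN model pre-trained on the specified amount of training data is used (that is, only the bandit model is trained during the simulation)
• The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean total reward over 10 simulations
• The result is roughly the inverse of the NN model's prediction error
• For MLP (Full) with $\lambda_{nn} = 1.0$, the prediction error was always larger than with $\lambda_{nn} = 0.0001$, yet the total reward reverses from $D = 5\mathrm{k}$ onward
• In the earlier evaluation as well, $\lambda_{nn} = 1.0$ gave a higher cumulative reward than $\lambda_{nn} = 0.0001$
• Strengthening regularization leads to better reward in the multi-armed bandit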

Slide 43

Slide 43 text

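2. Relationship between the NN model and the multi-armed bandit algorithm
• For the NN models of the algorithms evaluated above, cumulative reward is compared by amount of training data
• An NN model pre-trained on the specified amount of training data is used (that is, only the bandit model is trained during the simulation)
• The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean total reward over 10 simulations
• The result is roughly the inverse of the NN model's prediction error
• Strengthening regularization leads to better reward in the multi-armed bandit
• Stronger regularization improves the generalization of the NN model, so the representation of the context information changes gradually and with a consistent tendency
• → This is thought to stabilize the training of the multi-armed bandit algorithm
• The results suggest that introducing the regularization term is effective in the proposed method, which assumes that the NN model is also trained sequentially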

Slide 44

Slide 44 text

5. Conclusion

Slide 45

Slide 45 text

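Conclusion
• We focused on a challenge in this field that also matters in practical applications: conventional algorithms for nonlinear multi-armed bandit problems lose sequential adaptability because of growing training time
• By integrating OS-ELM, which needs no iterative training, with a conventional algorithm, we proposed a multi-armed bandit algorithm that combines sequential adaptability with the ability to handle nonlinear problems
• In a comparative evaluation against conventional linear and nonlinear multi-armed bandit algorithms, we showed that the proposed algorithm reduces training time while maintaining its performance on nonlinearity
• We analyzed the usefulness, for multi-armed bandit algorithms, of an NN model that needs no iterative training, and examined the requirements for improving the proposed algorithm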
