
A contextual multi-armed bandit method that handles nonlinearity without sacrificing sequential adaptability by using a fast learning mechanism/extreme_neural_linear_bandits

monochromegane
September 15, 2022



Transcript

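  1. A contextual multi-armed bandit method that handles nonlinearity without sacrificing sequential adaptability by using a fast learning mechanism
     Yusuke Miyake (1, 2), Tsunenori Mine (3)
     1. Pepabo R&D Institute, GMO Pepabo, Inc.
     2. Department of Advanced Information Technology, Graduate School of Information Science and Electrical Engineering, Kyushu University
     3. Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University
     2022.09.15 SMASH22 Summer Symposium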
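  2. Contents
     1. Introduction
     2. Related work: integration with NN models and the challenge of sequential adaptability
     3. Proposed method: Extreme Neural Linear Bandits
     4. Evaluation and discussion
     5. Conclusion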
  3. 1. Introduction

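  4. Adaptive systems and communication cost
     • To realize an adaptive system, the system must understand the user's situation well
       • On an e-commerce site, for example, grasping user preferences makes it possible to propose the most suitable products and navigation paths
     • In a production system, communication with users has a cost
       • Requirements and preferences are not explicit (even to the users themselves) and take shape gradually
       • The burden and opportunity loss during that period affect sales and similar metrics in both the short and the long term
       • In particular, when requirements and preferences change over time, communication that currently has low value must still be continued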
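  5. Optimizing communication cost and the multi-armed bandit
     • If communication is restricted to proposing options and observing the responses, optimizing this cost can be treated as a "multi-armed bandit problem"
     • However, with conventional solutions to this problem, the training time needed to capture the relationship grows when there is a "complex relationship" between the situation and the choice (explained in the related work section)
     • By proposing a solution that can suggest options both accurately and quickly in such environments, we aim to advance the practical use of adaptive systems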
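  6. The multi-armed bandit problem
     • The problem of maximizing the reward obtained from multiple candidates called "arms"
       • In each trial the player selects one arm and receives a reward
       • Each arm generates rewards according to some reward distribution
       • The player, however, has to infer this reward distribution from the results of the trials
     • Based on the current evaluation of the arms, the player carries out "exploitation" and "exploration" in parallel
       • Various solutions have been proposed to resolve this trade-off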
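  7. Extending the multi-armed bandit problem
     • The basic problem assumes that the reward distribution of each arm is always the same
       • → Might the reward distribution differ by situation or attribute?
       • Examples: popular products differ by age group; the user recently bought a product in the same category
     • The problem has been extended to the "contextual" multi-armed bandit problem
       • → Solutions to it infer the relationship between context information* and reward
       • * Context information is the situation and similar signals converted into a form an information system can handle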
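  8. Conventional contextual multi-armed bandit solutions
     • They assume a linear relationship between context information and reward
       • LinUCB [L. Li 2010], Linear Thompson Sampling [S. Agrawal 2013]
     • Linear model: the reward is assumed to be the sum of products of the components of the input context $x$ and the parameter $\theta$
     • Example: arm selection in LinUCB. The arm with the largest sum of the estimated mean reward and the exploration term, which expresses uncertainty according to the number of trials, is selected:
       $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$
       $\tilde{\theta}^{(k)} = U^{(k)} v^{(k)}$
       $U^{(k)} = \left( \sum_{i=1}^{N^{(k)}} x_i x_i^\top \right)^{-1}$
       $v^{(k)} = \sum_{i=1}^{N^{(k)}} x_i y_i$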
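The LinUCB selection rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, and the ridge term `lam` (so that `U` exists before any data arrives) is an assumption borrowed from the λ = 1.0 bandit setting listed in the evaluation slides.

```python
import numpy as np


class LinUCBArm:
    """Per-arm statistics for LinUCB (illustrative names, not from the deck)."""

    def __init__(self, dim, lam=1.0):
        # Assumption: start from (lam * I)^{-1} so that U is defined at N = 0.
        self.U = np.eye(dim) / lam      # U = (lam * I + sum_i x_i x_i^T)^{-1}
        self.v = np.zeros(dim)          # v = sum_i x_i y_i

    def score(self, x, alpha):
        theta = self.U @ self.v                          # estimated parameters
        return x @ theta + alpha * np.sqrt(x @ self.U @ x)

    def update(self, x, y):
        # Sherman-Morrison rank-one update of the inverse matrix.
        Ux = self.U @ x
        self.U -= np.outer(Ux, Ux) / (1.0 + x @ Ux)
        self.v += y * x


def select_arm(arms, x, alpha=0.1):
    """Return the index of the arm with the largest upper confidence bound."""
    return int(np.argmax([arm.score(x, alpha) for arm in arms]))
```

A typical loop calls `select_arm`, observes the reward of the chosen arm, and then calls `update` on that arm only.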
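  9. More demanding contextual bandit problems and nonlinear solutions
     • The variety and volume of data handled by information systems are increasing
       • From structured data such as demographic attributes to unstructured data such as images and natural language
     • Simple linear solutions cannot adequately handle complex relationships
     Figure: examples of reward distributions over context information $x = (x_1, x_2)^\top$. For the left example a solution assuming linearity ($\hat{y} = x^\top w$) is considered suitable; for the right example, one assuming nonlinearity.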
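  10. More demanding contextual bandit problems and nonlinear solutions
     • Methods have appeared that use a Neural Network (NN) to handle nonlinear relationships between context information and reward [R. Allesiardo 2014, C. Riquelme 2018, M. Collier 2018, D. Guo 2020, D. Zhou 2020, S. Sajeev 2021]
     • For an NN model to perform well, it needs a large amount of training data and enough training time to fit that data
     • This lowers the ability to adapt to the diverse and changing requests that users issue sequentially (sequential adaptability)
       • If the growth in training time is ignored, updates to the decision criterion are delayed
       • If sequential training is avoided, the latest information cannot be used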
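  11. Research goal and outline of the proposal
     • Realizing adaptive systems requires a mechanism that makes complex decisions quickly
     • We focus on nonlinear solutions to the contextual multi-armed bandit problem, which formalizes this setting
     • We want to resolve the growth in training time that undermines sequential adaptability in conventional solutions
     • We propose integration with an NN model that needs no iterative training and trains quickly
     • In addition, we analyze and discuss how useful this model is for multi-armed bandit solutions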
  12. 2. Related work: integration with NN models and the challenge of sequential adaptability

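  13. Neural Bandit1 [R. Allesiardo 2014]
     • An early nonlinear contextual multi-armed bandit solution that introduced an NN model
     • An arbitrary NN model is used as the reward function that estimates the reward from context information
     • Arms are exploited and explored at a fixed ratio with $\epsilon$-Greedy
     Figure: the Neural Network maps the context $x_t$ to per-arm estimates $\hat{y}^{(1)}_t, \hat{y}^{(2)}_t, \dots, \hat{y}^{(K)}_t$; the Bandit selects $\operatorname{argmax}_{k=1,\dots,K} \hat{y}^{(k)}$ with probability $1 - \epsilon$, and otherwise any arm $a \in A$ with probability $\epsilon / K$ each.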
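The ε-Greedy selection in the figure can be sketched as below. It is a minimal illustration with hypothetical names; the per-arm estimates are assumed to come from whatever NN reward model is in use.

```python
import numpy as np


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an arm from per-arm reward estimates.

    With probability epsilon a uniformly random arm is explored
    (probability epsilon / K per arm); otherwise the arm with the
    highest estimate is exploited.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(estimates)))
    return int(np.argmax(estimates))
```

Because ε is fixed, the exploration rate does not shrink as the estimates improve, which is one reason later methods look for an explicit uncertainty measure instead.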
  14. Neural Bandit1 [R. Allesiardo 2014]
     • This work revealed two challenges in introducing NN models into multi-armed bandit solutions
       1. Accounting for the uncertainty of the NN model in exploitation and exploration
       2. Ensuring sequential adaptability
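  15. 1. Accounting for model uncertainty in exploitation and exploration
     • The challenge is how to express, within an NN model, the model uncertainty used for exploiting and exploring arms
       • That is, how to express with an NN model a value equivalent to the exploration term of LinUCB:
         $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$
     • Most conventional solutions that introduce NN models focus on this challenge
       • Solutions that treat the spread of estimates produced by applying Dropout at prediction time as uncertainty [C. Riquelme 2018][M. Collier 2018]
       • Solutions that treat the spread of estimates produced by stochastically selecting among multiple neural network models built with the Bootstrap method as uncertainty [C. Riquelme 2018][D. Guo 2020]
       • A solution that expresses uncertainty using the gradient with respect to the parameters, obtained from the difference from the estimated reward [D. Zhou 2020]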
  16. 1. Accounting for model uncertainty in exploitation and exploration
     • These solutions either do not train the NN model sequentially or, when they do, do not take training time into account
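  17. 2. Ensuring sequential adaptability
     • Neural Linear [C. Riquelme 2018] addresses the two challenges by separating the NN model from the model of the multi-armed bandit solution
       • An arbitrary NN model is used as a transformer from the original context information to new context information that expresses the relationship with the reward well
       • Exploitation and exploration of arms use a conventional linear solution (addresses the first challenge)
       • The training interval of the NN model is decoupled from the sequential update interval of the bandit solution (mitigates the second challenge → but the latest information cannot be used)
     Figure: the NN transforms $x_t$ into $\tilde{x}^{(1)}_t, \dots, \tilde{x}^{(K)}_t$, which are passed to the Bandit (LinUCB, Linear Thompson Sampling, etc.):
       $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$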
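  18. 2. Ensuring sequential adaptability (conventional approaches on the NN side)
     • Backpropagation, used to train NN models, proceeds iteratively using gradients obtained from the prediction error
       • → As training data grows, the training time needed until the result converges grows as well
     • Stochastic gradient descent and optimization algorithms for gradient descent [D.P. Kingma 2014] have been proposed to shorten the time to convergence
       • → The property that training time grows with the amount of training data still remains
     • Training additively on only the most recently obtained training data
       • → Catastrophic forgetting [J. Kirkpatrick 2017] occurs, and loss of accuracy has been reported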
  19. 3. Proposed method: Extreme Neural Linear Bandits

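  20. Toward a nonlinear solution that does not sacrifice sequential adaptability
     • To adapt to the diverse and continuously changing requests that users issue sequentially, it is effective to integrate a model that satisfies all of the following:
       1. It is an NN model that can handle nonlinear relationships between context and reward
       2. It can be trained sequentially
       3. Each training step takes little time
     • We propose a method that uses the Online Sequential Extreme Learning Machine (OS-ELM)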
  21. Introduction to the component techniques

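  22. Extreme Learning Machine (ELM) [Huang 2006]
     • A simple regression model using an NN structure with a single hidden layer
     • The weights are estimated with least squares rather than iterative training by backpropagation, so shorter training time can be expected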
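  23. Structure of ELM
     • An NN model consisting of a single hidden layer with $L$ units
       (here the output is restricted to a scalar, since integration with a multi-armed bandit solution is assumed)
     $x \in \mathbb{R}^{d}$, $W \in \mathbb{R}^{L \times d}$, $b \in \mathbb{R}^{L}$, $\beta \in \mathbb{R}^{1 \times L}$
     $h_1 = \phi\left( \sum_{i=1}^{d} W_{1,i} x_i + b_1 \right)$
     $h(x) = \phi.(Wx + b)$
     $\hat{y} = \beta h(x)$
     $\phi$ is an arbitrary activation function; $\phi.$ denotes applying $\phi$ element-wise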
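  24. Training ELM (input layer to hidden layer)
     • The weights $W$ and the bias $b$ are initialized with random numbers and are not estimated
       In this case the model can be regarded as a linear model with a feature function $h(x)$ that makes the input nonlinear
     $W = (w_{i,j})_{1 \le i \le L,\, 1 \le j \le d}$, $w_{i,j} \sim P(\theta)$
     $b = (b_i)_{1 \le i \le L}$, $b_i \sim P(\theta)$
     $h(x) = \phi.(Wx + b)$, $\hat{y} = \beta h(x)$, $\beta \in \mathbb{R}^{1 \times L}$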
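  25. Training ELM (hidden layer to output layer)
     • Least squares is applied to this linear model to estimate the weights $\beta$
       The estimate $\hat{\beta}^\top$ is the minimizer of the prediction error $\lVert H\beta^\top - y \rVert^2$ on the training data $X$ and $y$
     $X = (x_1, \dots, x_N)^\top \in \mathbb{R}^{N \times d}$
     $y = (y_1, \dots, y_N)^\top \in \mathbb{R}^{N}$
     $B = (b, \dots, b)^\top \in \mathbb{R}^{N \times L}$
     $H = \phi.(XW^\top + B) \in \mathbb{R}^{N \times L}$
     $\hat{\beta}^\top = (H^\top H)^{-1} H^\top y$
     $\hat{y} = \hat{\beta} h(x)$
     Notation: $W = (w_{i,j})_{1 \le i \le L,\, 1 \le j \le d}$, $w_{i,j} \sim P(\theta)$; $b = (b_i)_{1 \le i \le L}$, $b_i \sim P(\theta)$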
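The two ELM training stages can be sketched in a few lines of NumPy. This is a rough sketch under stated assumptions: the deck leaves the activation $\phi$ and the sampling distribution $P(\theta)$ open, so `tanh` and a standard normal distribution are illustrative choices, and all function names are hypothetical.

```python
import numpy as np


def init_elm(d, L, rng):
    """Draw the fixed random hidden-layer parameters W (L x d) and b (L,)."""
    W = rng.normal(size=(L, d))   # assumption: P(theta) is a standard normal
    b = rng.normal(size=L)
    return W, b


def hidden(X, W, b):
    """H = phi.(X W^T + B) with phi = tanh (assumption); shape (N, L)."""
    return np.tanh(X @ W.T + b)


def fit_elm(X, y, W, b):
    """Estimate the output weights beta by ordinary least squares."""
    H = hidden(X, W, b)
    # lstsq minimizes ||H beta - y||^2 without forming (H^T H)^{-1} explicitly.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return beta


def predict(X, W, b, beta):
    return hidden(X, W, b) @ beta
```

Only `beta` is learned; `W` and `b` stay at their random initial values, which is what removes the iterative backpropagation loop.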
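  26. Online Sequential ELM (OS-ELM) [Huang 2005]
     • With the ELM estimation procedure, in a setting where data arrives sequentially, as in the multi-armed bandit problem, every new data point triggers a recomputation over all past data (the computational cost grows with the number of trials)
     • Online Sequential ELM has been proposed: when the computation results for $N$ inputs $X$ and $y$ are available, it estimates the weights $\beta$ using only those results and the $(N+1)$-th new data point (the computational cost stays constant regardless of the number of trials)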
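The constant-cost update described above can be illustrated with a recursive least-squares step. This is a sketch with hypothetical function names; it takes hidden-layer outputs `h` as given and uses the Sherman-Morrison identity to update the inverse matrix without revisiting past data.

```python
import numpy as np


def oselm_init(H0, y0):
    """Initial batch phase: requires at least L samples so that H0^T H0 is invertible."""
    Q = np.linalg.inv(H0.T @ H0)   # Q_N = (sum_i h(x_i) h(x_i)^T)^{-1}
    r = H0.T @ y0                  # r_N = sum_i y_i h(x_i)
    return Q, r


def oselm_update(Q, r, h, y):
    """Fold one new sample (h, y) into Q and r at O(L^2) cost."""
    Qh = Q @ h
    Q = Q - np.outer(Qh, Qh) / (1.0 + h @ Qh)   # Sherman-Morrison identity
    r = r + y * h
    return Q, r
```

After any number of updates the current weights are `Q @ r`, which matches the batch least-squares solution on all data seen so far.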
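  27. Training OS-ELM
     • Recursive least squares is applied to the same linear model as ELM to estimate the weights $\beta$
       The results up to time $N$ can be referenced as sums of values computed from the feature function and the outputs
     $\hat{y} = \hat{\beta}_{N+1} h(x)$, $x \in \mathbb{R}^{d}$
     $\hat{\beta}^\top_{N+1} = (H^\top_{N+1} H_{N+1})^{-1} H^\top_{N+1} y_{N+1} = Q_{N+1} r_{N+1}$
     $Q_{N+1} = \left( \sum_{i=1}^{N+1} h(x_i) h(x_i)^\top \right)^{-1} = \left( Q_N^{-1} + h(x_{N+1}) h(x_{N+1})^\top \right)^{-1} = Q_N - \frac{Q_N h(x_{N+1}) h(x_{N+1})^\top Q_N}{1 + h(x_{N+1})^\top Q_N h(x_{N+1})}$ (matrix inversion lemma)
     $r_{N+1} = \sum_{i=1}^{N+1} y_i h(x_i)$
     $W = (w_{i,j})_{1 \le i \le L,\, 1 \le j \le d}$, $w_{i,j} \sim P(\theta)$; $b = (b_i)_{1 \le i \le L}$, $b_i \sim P(\theta)$
     Note: while $L > N$, the inverse matrix $Q$ cannot be computed, so sequential training is not performed (Boosting period)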
  28. Proposed method

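  29. Proposed method: Extreme Neural Linear Bandits
     • Adopts the Neural Linear [C. Riquelme 2018] scheme (with a view to swapping or improving the NN and the Bandit independently)
     • To apply OS-ELM in this scheme, two elements are introduced: "1. a transformation function" and "2. a regularization term"
     Figure: the Neural Network (OS-ELM + regularization) transforms $x_t$ with the transformation function $\beta^\top \otimes h(x)$ into $\tilde{x}^{(1)}_t, \dots, \tilde{x}^{(K)}_t$, which are passed to the Bandit (LinUCB, Linear Thompson Sampling, etc.):
       $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$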
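  30. Proposed method: transformation function for context information
     • To obtain the new context information $\tilde{x}$ from the original context information $x$, the element-wise product of the hidden-layer output $h(x_t)$ and the weights $\beta$ between the hidden and output layers is used
       • Conventional Neural Linear uses the output of the final hidden layer as is
       • → In OS-ELM, reflecting the weights that substantially contribute to capturing nonlinearity improves the usefulness of the result as context information
     $\tilde{x}^{(\mathrm{NeuralLinear})}_t = h(x_t)$
     $\tilde{x}^{(\mathrm{ExtremeNeuralLinearBandits})}_t = \beta^\top \otimes h(x_t)$ (the output-layer weights are also used)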
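The difference between the two transformation functions can be made concrete as follows. This is a minimal sketch with hypothetical names; `tanh` stands in for the unspecified activation $\phi$.

```python
import numpy as np


def hidden(x, W, b):
    """Hidden-layer output h(x) for a single context vector (tanh is an assumption)."""
    return np.tanh(W @ x + b)


def neural_linear_context(x, W, b):
    """Conventional Neural Linear: pass the hidden-layer output through unchanged."""
    return hidden(x, W, b)


def extreme_neural_linear_context(x, W, b, beta):
    """Proposed transformation: element-wise product of beta and h(x)."""
    return beta * hidden(x, W, b)
```

One consequence of this choice is that the components of the transformed context sum to the OS-ELM prediction $\hat{y} = \beta h(x)$, so each component carries that unit's contribution to the estimated reward.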
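  31. Proposed method: introducing regularization into OS-ELM
     • In OS-ELM, training is not possible while the number of trials $N$ is less than the number of units $L$ (the Boosting period), so opportunity loss occurs during this period
     • The proposed method applies ridge regression to OS-ELM, which allows sequential training from the very first trials
       • The estimate $\hat{\beta}^\top$ is the minimizer of the original prediction error plus a regularization term, $\lVert H\beta^\top - y \rVert^2 + \lambda_{nn} \lVert \beta^\top \rVert^2$:
         $\hat{\beta}^\top_{N+1} = (H^\top_{N+1} H_{N+1} + \lambda_{nn} I)^{-1} H^\top_{N+1} y_{N+1}$
       • The recursion is changed so that it starts from $\lambda_{nn} I$ at $N = 0$
     • Because the norm of the parameters is constrained, overfitting is prevented and better generalization can also be expected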
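A sketch of the regularized variant is given below. It is an illustration rather than the authors' code: the class name, `tanh` activation, and normal initialization are assumptions. The point it demonstrates is that starting the recursion from $(\lambda_{nn} I)^{-1}$ makes the update usable even when fewer than $L$ samples have been seen.

```python
import numpy as np


class RidgeOSELM:
    """OS-ELM with a ridge term, trainable from the first sample (illustrative)."""

    def __init__(self, d, L, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(L, d))   # fixed random input weights
        self.b = rng.normal(size=L)        # fixed random biases
        self.Q = np.eye(L) / lam           # (lam * I)^{-1}: defined at N = 0
        self.r = np.zeros(L)

    def hidden(self, x):
        return np.tanh(self.W @ x + self.b)   # assumption: phi = tanh

    def update(self, x, y):
        h = self.hidden(x)
        Qh = self.Q @ h
        self.Q -= np.outer(Qh, Qh) / (1.0 + h @ Qh)   # Sherman-Morrison step
        self.r += y * h

    @property
    def beta(self):
        return self.Q @ self.r

    def predict(self, x):
        return float(self.beta @ self.hidden(x))
```

With this initialization, `beta` after N updates equals the closed-form ridge solution on those N samples, including the case N < L where the unregularized inverse does not exist.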
  32. 4. Evaluation and discussion

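  33. Evaluation method
     • Wheel bandits [C. Riquelme 2018]: a simulation of a nonlinear multi-armed bandit problem
     • For the context information at time $t$, $x_t = (x_i)_{1 \le i \le 2}$ with $x_i \sim \mathrm{Uniform}(-1, 1)$, the selected arm yields a reward $y_t \sim \mathcal{N}(\mu, \sigma^2)$
     • The mean reward $\mu$ is determined per arm (where $\mu_2 < \mu_1 \ll \mu_3$)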
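  34. Evaluation method
     • Simulation of a nonlinear multi-armed bandit problem using Wheel bandits
       • Simulation parameters: $\mu_1 = 1.2$, $\mu_2 = 1.0$, $\mu_3 = 5.0$, $\sigma^2 = 0.1$, $\delta = 0.7$
       • 5000 trials per simulation; the average over 50 simulations is used as the result
     • The compared solutions are listed below. Settings are aligned so that the differences between solutions are clear
       • The NN training interval is set to every 100 trials, matching the slow Neural Linear (Full)
     Compared solutions (NN model, hidden layer, NN regularization; Bandit model, Bandit regularization, exploration rate):
     • LinUCB (linear solution): no NN; LinUCB, λ = 1.0, α = 0.1
     • Neural Linear (Differential) (nonlinear, incremental training): MLP (Diff), L = 100, λ = 1.0; LinUCB, λ = 1.0, α = 0.1
     • Neural Linear (Full) (nonlinear, full retraining each time): MLP (Full), L = 100, λ = 1.0; LinUCB, λ = 1.0, α = 0.1
     • Extreme Neural Linear Bandits (proposed): OS-ELM, L = 100, λ = 1.0; LinUCB, λ = 1.0, α = 0.1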
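For orientation, the reward generation of such a simulation might look like the sketch below. The per-arm rule for the mean reward is not recoverable from the transcript, so this follows the wheel bandit definition of Riquelme et al. as an assumption: five arms, arm 0 always pays $\mu_1$, the other arms pay $\mu_2$, and outside the radius $\delta$ exactly one arm per quadrant pays $\mu_3$. The quadrant-to-arm mapping and all names are hypothetical.

```python
import numpy as np

# Parameters taken from the evaluation settings above.
MU1, MU2, MU3, SIGMA2, DELTA = 1.2, 1.0, 5.0, 0.1, 0.7


def mean_rewards(x, delta=DELTA):
    """Mean reward of each of the 5 arms for context x (assumed wheel layout)."""
    mu = np.full(5, MU2)
    mu[0] = MU1                                  # arm 0: context-independent
    if np.linalg.norm(x) > delta:                # outer region of the wheel
        # Assumption: one arm per quadrant, indexed by the signs of x.
        quadrant = (0 if x[0] >= 0 else 1) + (0 if x[1] >= 0 else 2)
        mu[1 + quadrant] = MU3
    return mu


def sample_context(rng):
    """x_t with components drawn from Uniform(-1, 1)."""
    return rng.uniform(-1.0, 1.0, size=2)


def sample_reward(x, arm, rng):
    """y_t ~ N(mu, sigma^2) for the chosen arm."""
    return rng.normal(mean_rewards(x)[arm], np.sqrt(SIGMA2))
```

A linear model cannot represent the dependence on the norm and the quadrant of `x`, which is what makes this setting a test of nonlinear capability.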
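  35. Evaluating performance on complex decision making
     • The cumulative reward of the simulation is compared across solutions
     • The linear solution (LinUCB) has a low cumulative reward and cannot adequately handle the nonlinear setting
     • Neural Linear (Full) has the highest cumulative reward
     • Since Differential < Extreme Neural Linear Bandits, the proposed method, viewed as an incremental scheme aimed at shortening execution time, maintains its performance on the multi-armed bandit problem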
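  36. Evaluating sequential adaptability
     • The cumulative execution time spent updating the arm evaluations is compared across solutions
     • LinUCB, which needs no NN training, is the fastest at 0.05 s (but handles nonlinearity insufficiently)
     • Next come the proposed method at 3.0 s, Neural Linear (Differential) at 13.1 s, and Full at 28.3 s
     • Reason 1: the proposed method and Differential train incrementally, so training time stays constant regardless of the number of trials
     • Reason 2: the proposed method needs no iterative training. Its execution time per training run is 0.07 s versus 0.3 s for Differential, about 4.1 times faster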
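  37. Evaluating the usefulness of the context information
     • The new context information obtained from the regularized OS-ELM used in the proposed method is analyzed for its usefulness to nonlinear multi-armed bandit solutions from the following viewpoints:
       1. Accuracy as an NN model
       2. The relationship between the NN model and the multi-armed bandit solution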

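  38. 1. Analysis of accuracy as an NN model
     • For the NN models of the solutions evaluated above, accuracy is compared by amount of training data
       • The horizontal axis is the amount of training data per arm model; the vertical axis is the mean absolute prediction error on test data, summed over the models of all arms
       • Solid lines show the results for regularization parameter $\lambda_{nn} = 1.0$, dotted lines for $\lambda_{nn} = 0.0001$
     NN models compared (hidden layer, regularization): LinUCB (linear solution): none; Neural Linear (Differential): MLP (Diff), L = 100, λ = 1.0; Neural Linear (Full): MLP (Full), L = 100, λ = 1.0; Extreme Neural Linear Bandits (proposed): OS-ELM, L = 100, λ = 1.0
     • With MLP (Diff) (iterative training on incremental data), the prediction error did not decrease even when viewed purely as an NN model
     • With OS-ELM and MLP (Full) (iterative training on all data), the prediction error decreased as training data increased, especially in the early phase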
  39. 1. Analysis of accuracy as an NN model
     • When regularization is weakened, the prediction error decreases further only for MLP (Full)
       → This contrasts with the cumulative reward in the bandit problem, where stronger regularization gave better results
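  40. 1. Analysis of accuracy as an NN model
     • For the NN models of the solutions evaluated above, accuracy is compared by amount of training data
       • The distribution of estimated reward values of the NN model tied to arm $a_2$ is visualized (zoomed in around the region where $\mu = \mu_3$)
     Figure: panels for $D = 100$ and $D = 25\mathrm{k}$ under $\lambda_{nn} = 0.0001$ and $\lambda_{nn} = 1.0$, showing OS-ELM, MLP (Full), and Truth.
     ① As training data increases, the estimate approaches the shape of the true reward distribution
     ② With weak regularization, the model fits the observed data more readily (overfitting)
     ③ Even when overfitting is tolerated, OS-ELM shows a limit in expressive power compared with MLP (Full)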
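  41. 2. Relationship between the NN model and the multi-armed bandit solution
     • For the NN models of the solutions evaluated above, cumulative reward is compared by amount of training data
       • An NN model pre-trained on the specified amount of training data is used (that is, only the bandit model is trained during the simulation)
       • The horizontal axis is the amount of training data per arm model; the vertical axis is the mean total reward over 10 simulations
     • The total reward is roughly inversely related to the prediction error of the NN model
     • The more accurate the NN model, the larger the reward obtained tends to be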
  42. 2. Relationship between the NN model and the multi-armed bandit solution
     • For MLP (Full) with $\lambda_{nn} = 1.0$, the prediction error was always larger than with $\lambda_{nn} = 0.0001$, yet the total reward reverses from $D = 5\mathrm{k}$ onward
     • In the earlier evaluation as well, $\lambda_{nn} = 1.0$ yielded more cumulative reward than $\lambda_{nn} = 0.0001$
     • Strengthening regularization leads to better reward in the multi-armed bandit
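  43. 2. Relationship between the NN model and the multi-armed bandit solution
     • The results suggest that introducing a regularization term is effective in the proposed method, which assumes that the NN model is also trained sequentially
     • Strengthening regularization leads to better reward in the multi-armed bandit
     • Stronger regularization improves the generalization of the NN model, so the representation of the context information changes gradually and with a consistent tendency
       → This is thought to stabilize the training of the multi-armed bandit solution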
  44. 5. Conclusion

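  45. Conclusion
     • We focused on a challenge in this field that also matters in practical applications: conventional solutions to nonlinear multi-armed bandit problems lose sequential adaptability because of increased training time
     • By integrating OS-ELM, which needs no iterative training, with a conventional solution, we proposed a multi-armed bandit solution that handles nonlinear problems while retaining sequential adaptability
     • Through comparative evaluation against conventional linear and nonlinear multi-armed bandit solutions, we showed that the proposed solution reduces training time while maintaining its performance on nonlinearity
     • We analyzed the usefulness of an NN model that needs no iterative training for multi-armed bandit solutions and examined what is required to improve the proposed solution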