
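A Contextual Multi-Armed Bandit Method That Handles Nonlinearity Without Sacrificing Sequential Adaptability by Using a Rapid Learning Mechanism / extreme_neural_linear_bandits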

monochromegane
September 15, 2022


Transcript

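  1. A Contextual Multi-Armed Bandit Method That Handles Nonlinearity Without Sacrificing Sequential Adaptability by Using a Rapid Learning Mechanism

    Yusuke Miyake 1,2, Tsunenori Mine 3

    1. Pepabo R&D Institute, GMO Pepabo, Inc.
    2. Department of Advanced Information Technology, Graduate School of Information Science and Electrical Engineering, Kyushu University
    3. Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University

    2022.09.15 SMASH22 Summer Symposium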

  2. Contents

    1. Introduction
    2. Related work: integration with NN models and the challenge of sequential adaptability
    3. Proposed method: Extreme Neural Linear Bandits
    4. Evaluation and discussion
    5. Conclusion

  3. 1. Introduction

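  4. Adaptive systems and communication cost

    • To realize an adaptive system, the system must know its users' situations well.
    • An e-commerce system, for example, can propose the most suitable products and navigation paths once it understands a user's preferences.
    • In a production system, communication comes at a cost.
    • Requirements and preferences are not clear-cut (even to the users themselves) and take shape gradually.
    • The burden and opportunity loss incurred during that period affect sales and other outcomes in both the short and the long term.
    • In particular, in environments where requirements and preferences change, even communication that currently has low value must be continued.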

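  5. Optimizing communication cost and the multi-armed bandit

    • By restricting communication to proposing options and observing the response, the problem of optimizing this cost can be treated as a "multi-armed bandit problem".
    • With conventional solutions to this problem, however, the training time needed to grasp the relationship increases in environments where there is a "complex relationship" between the situation and the choice (explained under related work).
    • By proposing a solution that can propose options accurately and quickly in such environments, we aim to advance the practical use of adaptive systems.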

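  6. The multi-armed bandit problem

    • The problem of maximizing the reward obtained from multiple candidates called "arms".
    • In each trial the player selects one arm and receives a reward.
    • Each arm generates rewards according to its own reward distribution.
    • The player, however, has to infer this reward distribution from the outcomes of the trials.
    • Based on the current evaluation of the arms, the player carries out "exploitation" and "exploration" in parallel.
    • Various solutions have been proposed to resolve this trade-off.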

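  7. Extending the multi-armed bandit problem

    • The basic problem assumes that the reward distribution of each arm is always the same.
    • → But might the reward distribution differ by situation or attribute?
    • Examples: popular products differ by age group; the user recently bought a product in the same category.
    • The problem has been extended to the "contextual" multi-armed bandit problem.
    • → Solutions to it infer the relationship between the context information (*) and the reward.
    • (*) Context information is the situation and the like, converted into a form that an information system can handle.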

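  8. Conventional contextual multi-armed bandit methods

    • These methods infer the reward by assuming a linear relationship between the context information and the reward.
    • LinUCB [L. Li 2010], Linear Thompson Sampling [S. Agrawal 2013]

    Example: arm selection in LinUCB

    $$a^{(k^*)} = \operatorname*{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$$

    $$\tilde{\theta}^{(k)} = U^{(k)} v^{(k)}, \qquad U^{(k)} = \left( \sum_{i=1}^{N^{(k)}} x_i x_i^\top \right)^{-1}, \qquad v^{(k)} = \sum_{i=1}^{N^{(k)}} x_i y_i$$

    The arm with the largest sum of the estimated mean reward and the exploration term, which expresses the uncertainty according to the number of trials, is selected.
    The first term is a linear model that assumes the reward is given by the sum of products of the components of the input context information $x$ and the parameter $\theta$.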
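
The LinUCB selection rule above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the class name `LinUCBArm`, the helper `select_arm`, and the ridge term `lam` (which initializes $U$ to $(\lambda I)^{-1}$ so that it exists before any data arrive) are assumptions made here, and $U$ is updated with the Sherman-Morrison identity rather than re-inverted. Plain Python lists are used to keep the sketch dependency-free; an array library would be the natural choice in practice.

```python
import math


class LinUCBArm:
    """One LinUCB arm: U = (lam*I + sum x x^T)^-1 and v = sum x*y (names are illustrative)."""

    def __init__(self, dim, alpha=0.1, lam=1.0):
        self.alpha = alpha
        # Assumption: start from (lam * I)^-1 so U is defined before any data arrive.
        self.U = [[(1.0 / lam if i == j else 0.0) for j in range(dim)] for i in range(dim)]
        self.v = [0.0] * dim

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.U]

    def score(self, x):
        theta = self._matvec(self.v)                      # theta = U v
        Ux = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))     # x^T theta
        bonus = math.sqrt(sum(xi * ui for xi, ui in zip(x, Ux)))  # sqrt(x^T U x)
        return mean + self.alpha * bonus

    def update(self, x, y):
        # Sherman-Morrison rank-one update; U is symmetric, so x^T U = (U x)^T.
        Ux = self._matvec(x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, Ux))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.U[i][j] -= Ux[i] * Ux[j] / denom
        self.v = [vi + xi * y for vi, xi in zip(self.v, x)]


def select_arm(arms, x):
    """Return the index of the arm with the largest upper confidence score."""
    scores = [arm.score(x) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

Each trial would call `select_arm`, observe the reward of the chosen arm, and pass it to that arm's `update`.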

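  9. Growing sophistication of the contextual bandit problem and nonlinear methods

    • The variety and volume of data handled by information systems are increasing.
    • The data are shifting from structured data such as demographic attributes to unstructured data such as images and natural language.
    • Simple linear methods cannot adequately handle complex relationships.

    [Figure: examples of reward distributions over the context information $x = (x_1, x_2)^\top$. A method assuming linearity ($\hat{y} = x^\top w$) is considered suitable for the left panel, and one assuming nonlinearity for the right panel.]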

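  10. Growing sophistication of the contextual bandit problem and nonlinear methods

    • Methods have appeared that use a Neural Network (NN) to handle the nonlinear relationship between context information and reward [R. Allesiardo 2014, C. Riquelme 2018, M. Collier 2018, D. Guo 2020, D. Zhou 2020, S. Sajeev 2021].
    • For an NN model to deliver its performance, it needs a large amount of training data and enough training time to fit them.
    • This lowers the adaptability to the diverse and changing requirements that users issue sequentially (sequential adaptability).
    • If the increase in training time is not taken into account, updates to the decision criteria are delayed.
    • If sequential training is avoided, the latest information cannot be used.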

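  11. Research goal and outline of the proposal

    • Realizing an adaptive system requires a mechanism that makes complex decisions quickly.
    • We focus on nonlinear methods for the contextual multi-armed bandit problem, which formalizes this need.
    • We want to solve the problem in conventional methods that the increase in training time impairs sequential adaptability.
    • We propose integration with an NN model that needs no iterative training and has a short training time.
    • In addition, we analyze and discuss the usefulness of that model for multi-armed bandit methods.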

  12. 2. Related work: integration with NN models and the challenge of sequential adaptability

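  13. Neural Bandit1 [R. Allesiardo 2014]

    • An early nonlinear contextual multi-armed bandit method that introduced an NN model.
    • An arbitrary NN model is used as the reward function that estimates the reward from the context information.
    • Arms are exploited and explored at a fixed ratio by $\epsilon$-Greedy.

    [Figure: a neural network maps the context $x_t$ to per-arm reward estimates $\hat{y}_t^{(1)}, \hat{y}_t^{(2)}, \dots, \hat{y}_t^{(K)}$; the bandit selects $\operatorname{argmax}_{k=1,\dots,K} \hat{y}^{(k)}$ with probability $1 - \epsilon$, and any arm $a \in A$ with probability $\epsilon / K$.]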
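
The fixed-ratio $\epsilon$-Greedy rule described above might look as follows. This is a minimal sketch under the assumption that the per-arm reward estimates have already been produced by some model; the function name `epsilon_greedy` and its `rng` parameter are illustrative and do not come from the cited work.

```python
import random


def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick an arm index given per-arm reward estimates."""
    if rng.random() < epsilon:
        # Explore: every arm is equally likely, i.e. epsilon / K each.
        return rng.randrange(len(estimates))
    # Exploit: with probability 1 - epsilon take the arm with the best estimate.
    return max(range(len(estimates)), key=estimates.__getitem__)
```

Because the ratio is fixed, the rule keeps exploring at the same rate however confident the estimates become, which is the limitation that uncertainty-aware methods such as LinUCB address.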

  14. Neural Bandit1 [R. Allesiardo 2014]

    • The work made clear two challenges in introducing an NN model into a multi-armed bandit method:
    1. Accounting for the uncertainty of the NN model in exploitation and exploration
    2. Ensuring sequential adaptability

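  15. 1. Accounting for model uncertainty in exploitation and exploration

    • The challenge is how to express, in an NN model, the model uncertainty used for exploiting and exploring arms.
    • That is, how to express with an NN model a value corresponding to the exploration term of LinUCB:

    $$a^{(k^*)} = \operatorname*{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$$

    • Most conventional methods that introduce an NN model focus on this challenge.
    • Methods that treat the spread of estimates produced by applying Dropout at prediction time as the uncertainty [C. Riquelme 2018][M. Collier 2018]
    • Methods that treat the spread of estimates produced by probabilistically selecting among multiple neural network models obtained by Bootstrap as the uncertainty [C. Riquelme 2018][D. Guo 2020]
    • A method that expresses the uncertainty using the gradient with respect to the parameters, obtained from the difference from the estimated reward [D. Zhou 2020]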

  16. 1. Accounting for model uncertainty in exploitation and exploration

    • These methods either do not train the NN model sequentially or, when they do, do not take the training time into account.

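  17. 2. Ensuring sequential adaptability

    • Neural Linear [C. Riquelme 2018] addressed the two challenges by separating the NN model from the model of the multi-armed bandit method.
    • An arbitrary NN model is used as a converter from the original context information to new context information that expresses the relationship with the reward well.
    • Arm exploitation and exploration use a conventional linear method (addresses the first challenge).
    • The training interval of the NN model is decoupled from the sequential update interval of the multi-armed bandit method (mitigates the second challenge → the latest information cannot be used).

    $$a^{(k^*)} = \operatorname*{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$$

    [Figure: a neural network converts the context $x_t$ into new contexts $\tilde{x}_t^{(1)}, \dots, \tilde{x}_t^{(K)}$, which are passed to a bandit (LinUCB, Linear Thompson Sampling, etc.).]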

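  18. 2. Ensuring sequential adaptability (conventional approaches in NN models)

    • Backpropagation, which is used to train NN models, proceeds iteratively using gradients obtained from the prediction error.
    • → As the training data grow, the training time needed until the result converges increases.
    • Stochastic gradient descent and optimization algorithms for gradient descent [D.P. Kingma 2014] have been proposed to shorten the time to convergence.
    • → The property that training time increases with the amount of training data still remains.
    • Another approach trains additionally on only the most recently obtained training data.
    • → Catastrophic forgetting [J. Kirkpatrick 2017] occurs, and a drop in accuracy has been reported.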

  19. 3. Proposed method: Extreme Neural Linear Bandits

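  20. Toward a nonlinear method that does not sacrifice sequential adaptability

    • To adapt to the diverse and continuously changing requirements that users issue sequentially, it is effective to integrate a model that satisfies all of the following:
    1. It is an NN model that can handle the nonlinear relationship between context and reward.
    2. It can be trained sequentially.
    3. Each training step takes little time.
    • We propose a method that uses the Online Sequential Extreme Learning Machine (OS-ELM).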

  21. Introduction to the underlying techniques

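  22. Extreme Learning Machine (ELM) [Huang 2006]

    • A simple regression model that uses an NN structure with a single hidden layer.
    • The weights are estimated by the least squares method rather than by iterative training with backpropagation, so a shorter training time can be expected.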

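  23. Structure of the ELM

    • An NN model consisting of a single hidden layer with $L$ units (here the output is restricted to a scalar, on the premise of integration with a multi-armed bandit method).

    $$x \in \mathbb{R}^d, \qquad W \in \mathbb{R}^{L \times d}, \qquad b \in \mathbb{R}^L, \qquad \beta \in \mathbb{R}^{1 \times L}$$

    $$h_1 = \phi\left( \sum_{i=1}^{d} W_{1,i} x_i + b_1 \right), \qquad h(x) = \phi.(Wx + b), \qquad \hat{y} = \beta h(x)$$

    Here $\phi$ is an arbitrary activation function, and $\phi.$ is the operation that applies $\phi$ element-wise.

    [Figure: network with $d$ input units, $L$ hidden units, and a single output.]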

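  24. Training the ELM (input layer to hidden layer)

    • The weights $W$ and the bias $b$ are initialized with random numbers and are not estimated. The model can then be regarded as a linear model with a feature function $h(x)$ that makes the input nonlinear.

    $$W = (w_{i,j})_{1 \le i \le L,\ 1 \le j \le d}, \quad w_{i,j} \sim P(\theta), \qquad b = (b_i)_{1 \le i \le L}, \quad b_i \sim P(\theta)$$

    $$x \in \mathbb{R}^d, \qquad \beta \in \mathbb{R}^{1 \times L}, \qquad h(x) = \phi.(Wx + b), \qquad \hat{y} = \beta h(x)$$

    Here $\phi$ is an arbitrary activation function, and $\phi.$ is the operation that applies $\phi$ element-wise.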
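
The fixed random hidden layer can be illustrated with a short sketch. The distribution $P(\theta)$ and the activation $\phi$ are not specified in the slides, so the uniform distribution on $[-1, 1]$ and the sigmoid used below are assumptions made only for this example, as are the function names.

```python
import math
import random


def make_hidden_layer(d, L, rng):
    """Draw W (L x d) and b (L) once; they are never trained afterwards."""
    # Assumption: P(theta) is taken to be Uniform(-1, 1) for illustration.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    return W, b


def hidden_features(x, W, b):
    """h(x) = phi.(W x + b), with phi assumed to be the sigmoid, applied element-wise."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]


def predict(x, W, b, beta):
    """y_hat = beta h(x): a linear model on the fixed nonlinear features."""
    return sum(bj * hj for bj, hj in zip(beta, hidden_features(x, W, b)))
```

Only `beta` remains to be estimated, which is what reduces training to a linear least squares problem.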

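  25. Training the ELM (hidden layer to output layer)

    • The least squares method is applied to this linear model to estimate the weights $\beta$. For training data $X$ and $y$, the estimate $\hat{\beta}^\top$ is obtained as the minimizer of the prediction error $\lVert H \beta^\top - y \rVert^2$.

    $$\hat{\beta}^\top = (H^\top H)^{-1} H^\top y, \qquad \hat{y} = \hat{\beta} h(x), \quad x \in \mathbb{R}^d$$

    Notation:

    $$H = \phi.(X W^\top + B) \in \mathbb{R}^{N \times L}$$
    $$X = (x_1, \dots, x_N)^\top \in \mathbb{R}^{N \times d}, \qquad y = (y_1, \dots, y_N)^\top \in \mathbb{R}^{N}, \qquad B = (b, \dots, b)^\top \in \mathbb{R}^{N \times L}$$
    $$W = (w_{i,j})_{1 \le i \le L,\ 1 \le j \le d}, \quad w_{i,j} \sim P(\theta), \qquad b = (b_i)_{1 \le i \le L}, \quad b_i \sim P(\theta)$$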
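
The batch estimate $\hat{\beta}^\top = (H^\top H)^{-1} H^\top y$ can be sketched as follows, assuming the hidden-layer outputs $H$ have already been computed. The helper names `solve` and `fit_beta` are illustrative; the linear system is solved by Gaussian elimination instead of forming the inverse explicitly, which is a choice made for this sketch rather than something stated in the slides.

```python
def solve(A, c):
    """Solve A z = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]       # augmented matrix [A | c]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z


def fit_beta(H, y):
    """Least squares weights from the normal equations (H^T H) beta^T = H^T y."""
    L = len(H[0])
    HtH = [[sum(h[i] * h[j] for h in H) for j in range(L)] for i in range(L)]
    Hty = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(L)]
    return solve(HtH, Hty)
```

Note that `fit_beta` touches every row of `H`, so calling it after each new observation costs more as the number of trials grows.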

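  26. Online Sequential ELM (OS-ELM) [Huang 2005]

    • With the ELM estimation procedure, in a setting where data arrive sequentially, as in the multi-armed bandit problem, every new data point triggers a recomputation over all past data (the computational cost grows with the number of trials).
    • Online Sequential ELM has been proposed: when the result computed for $N$ inputs $X$ and $y$ is available, it estimates the weights $\beta$ using only that result and the $(N+1)$-th new data point (the computational cost stays constant regardless of the number of trials).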

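  27. Training the OS-ELM

    • Recursive least squares is applied to the same linear model as in ELM to estimate the weights $\beta$. The results computed up to time $N$ can be referenced as sums of values obtained from the feature function and the outputs.

    $$\hat{y} = \hat{\beta}_{N+1} h(x), \quad x \in \mathbb{R}^d, \qquad \hat{\beta}_{N+1}^\top = (H_{N+1}^\top H_{N+1})^{-1} H_{N+1}^\top y_{N+1}$$

    $$Q_{N+1} = \left( \sum_{i=1}^{N+1} h(x_i) h(x_i)^\top \right)^{-1} = \left( Q_N^{-1} + h(x_{N+1}) h(x_{N+1})^\top \right)^{-1} = Q_N - \frac{Q_N h(x_{N+1}) h(x_{N+1})^\top Q_N}{1 + h(x_{N+1})^\top Q_N h(x_{N+1})}$$

    $$r_{N+1} = \sum_{i=1}^{N+1} y_i h(x_i)$$

    The last equality for $Q_{N+1}$ follows from the matrix inversion lemma. Since $H_{N+1}^\top H_{N+1} = \sum_i h(x_i) h(x_i)^\top$ and $H_{N+1}^\top y_{N+1} = \sum_i y_i h(x_i)$, the estimate is $\hat{\beta}_{N+1}^\top = Q_{N+1} r_{N+1}$.

    $$W = (w_{i,j})_{1 \le i \le L,\ 1 \le j \le d}, \quad w_{i,j} \sim P(\theta), \qquad b = (b_i)_{1 \le i \le L}, \quad b_i \sim P(\theta)$$

    While $L > N$, the inverse matrix $Q$ cannot be obtained, so sequential training is not carried out (the Boosting period).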
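
One step of the recursive update could be written as below. It is a sketch under the assumption that `Q` and `r` already hold valid values from the Boosting period; the function name `os_elm_update` is illustrative, and the weights are recovered as $Q_{N+1} r_{N+1}$, which follows from the definitions of $Q$ and $r$.

```python
def os_elm_update(Q, r, h, y):
    """Fold one sample (h = h(x_{N+1}), y) into Q and r in O(L^2) time."""
    L = len(h)
    Qh = [sum(Q[i][j] * h[j] for j in range(L)) for i in range(L)]   # Q_N h
    denom = 1.0 + sum(hi * qi for hi, qi in zip(h, Qh))               # 1 + h^T Q_N h
    # Q is symmetric, so h^T Q_N equals (Q_N h)^T and the outer product is Qh Qh^T.
    Q_new = [[Q[i][j] - Qh[i] * Qh[j] / denom for j in range(L)] for i in range(L)]
    r_new = [ri + y * hi for ri, hi in zip(r, h)]                     # r_N + y h
    beta = [sum(Q_new[i][j] * r_new[j] for j in range(L)) for i in range(L)]
    return Q_new, r_new, beta
```

The cost of a step depends only on the number of hidden units $L$, not on how many samples have been seen, which is the property the slides rely on for sequential adaptability.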

  28. Proposed method

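  29. Proposed method: Extreme Neural Linear Bandits

    • The Neural Linear [C. Riquelme 2018] scheme is adopted (with a view to swapping and improving the NN and the Bandit independently).
    • To apply OS-ELM within this scheme, "1. a transformation function" and "2. a regularization term" are introduced.

    $$a^{(k^*)} = \operatorname*{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$$

    [Figure: Extreme Neural Linear Bandits. The neural network part (OS-ELM + regularization) converts the context $x_t$ through the transformation function $\beta^\top \otimes h(x)$ into $\tilde{x}_t^{(1)}, \dots, \tilde{x}_t^{(K)}$, which are passed to the bandit part (LinUCB, Linear Thompson Sampling, etc.).]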

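  30. Proposed method: transformation function for the context information

    • To obtain the new context information $\tilde{x}$ from the original context information $x$, the element-wise product of the hidden layer output $h(x_t)$ and the weights $\beta$ between the hidden layer and the output layer is used.
    • Conventional Neural Linear uses the output of the last hidden layer as it is.
    • → In OS-ELM, reflecting even the weights that effectively contribute to capturing nonlinearity improves the usefulness of the result as context information.

    $$\tilde{x}_t^{(\mathrm{NeuralLinear})} = h(x_t)$$
    $$\tilde{x}_t^{(\mathrm{ExtremeNeuralLinearBandits})} = \beta^\top \otimes h(x_t)$$

    The second form makes use of the output layer weights as well.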
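
The two transformation functions differ only in whether the output weights are applied. A minimal sketch, with illustrative function names and `beta` and `h` given as plain lists of length $L$:

```python
def neural_linear_context(h):
    """Conventional Neural Linear: the hidden layer output is used as it is."""
    return list(h)


def extreme_neural_linear_context(beta, h):
    """Proposed transformation: element-wise product of beta and h(x_t)."""
    return [bj * hj for bj, hj in zip(beta, h)]
```

One property worth noting is that the components of the transformed context sum to the OS-ELM prediction $\hat{y} = \beta h(x_t)$, so the bandit receives the per-unit contributions to that prediction rather than the raw random features.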

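  31. Proposed method: introducing regularization into OS-ELM

    • In OS-ELM, while the number of trials $N$ is smaller than the number of units $L$, training is not possible (the Boosting period), so opportunity loss occurs during this period.
    • The proposed method applies ridge regression to OS-ELM, which makes sequential training possible from the very first trials.
    • The estimate $\hat{\beta}^\top$ is obtained as the minimizer of $\lVert H \beta^\top - y \rVert^2 + \lambda_{nn} \lVert \beta^\top \rVert^2$, the original prediction error plus a regularization term.
    • Because a constraint is placed on the norm of the parameters, overfitting is prevented and improved generalization can also be expected.

    $$\hat{\beta}_{N+1}^\top = (H_{N+1}^\top H_{N+1} + \lambda_{nn} I)^{-1} H_{N+1}^\top y_{N+1}$$

    The recursion is changed so that it starts from $\lambda_{nn} I$ at $N = 0$.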
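
Starting the recursion from $\lambda_{nn} I$ can be sketched as follows. The interpretation used here, namely that the inverse is initialized as $Q_0 = (\lambda_{nn} I)^{-1}$ with $r_0 = 0$ and is then updated exactly as in the unregularized case, is an assumption of this example, and the class name `RidgeOSELM` is illustrative.

```python
class RidgeOSELM:
    """Output-weight estimator for OS-ELM with a ridge term (illustrative sketch)."""

    def __init__(self, L, lam=1.0):
        # Assumption: Q_0 = (lam * I)^-1, so no Boosting period is needed.
        self.Q = [[(1.0 / lam if i == j else 0.0) for j in range(L)] for i in range(L)]
        self.r = [0.0] * L
        self.beta = [0.0] * L

    def update(self, h, y):
        """Fold one sample (h = h(x), y) in with a Sherman-Morrison update."""
        L = len(h)
        Qh = [sum(self.Q[i][j] * h[j] for j in range(L)) for i in range(L)]
        denom = 1.0 + sum(hi * qi for hi, qi in zip(h, Qh))
        for i in range(L):
            for j in range(L):
                self.Q[i][j] -= Qh[i] * Qh[j] / denom
        self.r = [ri + y * hi for ri, hi in zip(self.r, h)]
        self.beta = [sum(self.Q[i][j] * self.r[j] for j in range(L)) for i in range(L)]
        return self.beta
```

With this initialization, after any number of updates `Q` equals $(H^\top H + \lambda_{nn} I)^{-1}$, so the estimate matches the batch ridge solution while each step still costs $O(L^2)$.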

  32. 4. Evaluation and discussion

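  33. Evaluation method

    • Wheel bandits [C. Riquelme 2018]: a simulation of a nonlinear multi-armed bandit problem.
    • At time $t$, for the context information $x_t = (x_i)_{1 \le i \le 2},\ x_i \sim \mathrm{Uniform}(-1, 1)$, the selected arm yields a reward $y_t \sim \mathcal{N}(\mu, \sigma^2)$.
    • The mean reward $\mu$ is determined for each arm as shown in the slide's figure (where $\mu_2 < \mu_1 \ll \mu_3$).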

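  34. Evaluation method

    • Simulation of a nonlinear multi-armed bandit problem using Wheel bandits.
    • Simulation parameters: $\mu_1 = 1.2,\ \mu_2 = 1.0,\ \mu_3 = 5.0,\ \sigma^2 = 0.1,\ \delta = 0.7$
    • Each simulation consists of 5000 trials. The average over 50 runs is used as the result.
    • The methods compared are listed below. The settings are aligned so that the differences between methods are clear.
    • The training interval is set to every 100 trials, to match the time-consuming Neural Linear (Full).

    Method                                                  | NN model   | NN hidden layer | NN regularization | Bandit model | Bandit regularization | Exploration rate
    LinUCB (linear method)                                  | -          | -               | -                 | LinUCB       | λ=1.0                 | α=0.1
    Neural Linear (Differential) (nonlinear, incremental)   | MLP (Diff) | L=100           | λ=1.0             | LinUCB       | λ=1.0                 | α=0.1
    Neural Linear (Full) (nonlinear, full retraining)       | MLP (Full) | L=100           | λ=1.0             | LinUCB       | λ=1.0                 | α=0.1
    Extreme Neural Linear Bandits (proposed)                | OS-ELM     | L=100           | λ=1.0             | LinUCB       | λ=1.0                 | α=0.1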
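
A Wheel bandit environment with the parameters listed above might be simulated roughly as follows. The slides give only the parameter values and the ordering $\mu_2 < \mu_1 \ll \mu_3$; the number of arms (five), the rule that the first arm always pays $\mu_1$, and the mapping from quadrants to the remaining arms are assumptions based on the usual description of the Wheel bandit, and the function names are illustrative.

```python
import math
import random


def sample_context(rng):
    """x_t = (x_1, x_2) with each component drawn from Uniform(-1, 1)."""
    return [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]


def mean_reward(arm, x, mu1=1.2, mu2=1.0, mu3=5.0, delta=0.7):
    """Mean reward of arm 0..4 under the assumed wheel layout."""
    if arm == 0:
        return mu1                      # assumed: arm 0 pays mu1 everywhere
    if math.hypot(x[0], x[1]) <= delta:
        return mu2                      # inside radius delta the other arms pay mu2
    # Assumed mapping: outside the radius, one of arms 1..4 is optimal per quadrant.
    quadrant = 1 + (0 if x[0] >= 0 else 1) + (0 if x[1] >= 0 else 2)
    return mu3 if arm == quadrant else mu2


def pull(arm, x, rng, sigma2=0.1):
    """Draw a reward y_t ~ N(mu, sigma^2) for the chosen arm."""
    return rng.gauss(mean_reward(arm, x), math.sqrt(sigma2))
```

Under this layout a linear model cannot tell from $x$ alone when the high-reward arm applies, because the optimal arm depends on both the norm and the signs of the context, which is what makes the setting nonlinear.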

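  35. Evaluation of performance on complex decisions

    • The cumulative reward of the simulation is compared across methods.
    • The linear method (LinUCB) has a low cumulative reward and cannot adequately handle the nonlinear setting.
    • Neural Linear (Full) has the highest cumulative reward.
    • Since Differential < Extreme Neural Linear Bandits, the proposed method, as an incremental scheme aimed at shortening execution time, is shown to maintain its performance on the multi-armed bandit problem.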

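  36. Evaluation of sequential adaptability

    • The cumulative execution time for updating the arm evaluations is compared across methods.
    • LinUCB, which needs no NN training, is the fastest at 0.05 seconds (but its handling of nonlinearity is insufficient).
    • Next come the proposed method at 3.0 seconds, Neural Linear (Differential) at 13.1 seconds, and Full at 28.3 seconds.
    • Reason 1: the proposed method and Differential train incrementally, so the training time is constant regardless of the number of trials.
    • Reason 2: the proposed method needs no iterative training. Its execution time per training step is 0.07 seconds against 0.3 seconds for Differential, about 4.1 times faster.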

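  37. Evaluation of the usefulness of the context information

    • The usefulness, for a nonlinear multi-armed bandit method, of the new context information obtained from the regularized OS-ELM used in the proposed method is analyzed from the following viewpoints:
    1. Accuracy as an NN model
    2. Relationship between the NN model and the multi-armed bandit method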

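  38. 1. Analysis of accuracy as an NN model

    • For the NN models of the methods evaluated above, accuracy is compared by amount of training data.
    • The horizontal axis is the number of training samples of each arm's model; the vertical axis is the mean absolute prediction error on test data, summed over the models of all arms.
    • Solid lines show the results for the regularization parameter $\lambda_{nn} = 1.0$; dotted lines show the results for $\lambda_{nn} = 0.0001$.

    Method                                                  | NN model   | Hidden layer | Regularization
    LinUCB (linear method)                                  | -          | -            | -
    Neural Linear (Differential) (nonlinear, incremental)   | MLP (Diff) | L=100        | λ=1.0
    Neural Linear (Full) (nonlinear, full retraining)       | MLP (Full) | L=100        | λ=1.0
    Extreme Neural Linear Bandits (proposed)                | OS-ELM     | L=100        | λ=1.0

    With MLP (Diff) (iterative training on the incremental data), the prediction error did not decrease even when viewed purely as an NN model.
    With OS-ELM and MLP (Full) (iterative training on all data), the prediction error decreased as the training data grew, especially in the early phase.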

  39. 1. Analysis of accuracy as an NN model

    When regularization is weakened, the prediction error decreases further only for MLP (Full).
    → This contrasts with the cumulative reward in the bandit problem, where stronger regularization gave better results.

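  40. 1. Analysis of accuracy as an NN model

    • For the NN models of the methods evaluated above, accuracy is compared by amount of training data.
    • The distribution of the reward values estimated by the NN model tied to arm $a_2$ is visualized (zoomed in around $\mu = \mu_3$).

    [Figure: estimated reward distributions for OS-ELM and MLP (Full) against the Truth, at $D = 100$ and $D = 25\mathrm{k}$ training samples, for $\lambda_{nn} = 0.0001$ and $\lambda_{nn} = 1.0$.]

    (1) As the training data increase, the estimate approaches the shape of the true reward distribution.
    (2) With weak regularization, the model fits the obtained data more readily (overfitting).
    (3) Even when overfitting is tolerated, OS-ELM shows a limit in expressive power compared with MLP (Full).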

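  41. 2. Relationship between the NN model and the multi-armed bandit method

    • For the NN models of the methods evaluated above, cumulative reward is compared by amount of training data.
    • NN models pretrained on the specified number of training samples are used (that is, only the bandit model is trained during the simulation).
    • The horizontal axis is the number of training samples of each arm's model; the vertical axis is the average total reward over 10 simulations.

    The cumulative reward is roughly in an inverse relationship with the prediction error of the NN model.
    The more accurate the NN model, the more reward tends to be obtained.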

  42. 2. Relationship between the NN model and the multi-armed bandit method

    For MLP (Full) with $\lambda_{nn} = 1.0$, the prediction error was always larger than with $\lambda_{nn} = 0.0001$, yet its total reward overtakes the latter from $D = 5\mathrm{k}$ onward.
    In the earlier evaluation as well, $\lambda_{nn} = 1.0$ gave a higher cumulative reward than $\lambda_{nn} = 0.0001$.
    Strengthening regularization leads to an improvement in the multi-armed bandit reward.

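  43. 2. Relationship between the NN model and the multi-armed bandit method

    Strengthening regularization leads to an improvement in the multi-armed bandit reward.
    Stronger regularization raises the generalization performance of the NN model, so the representation of the context information changes gently and with a consistent tendency.
    → This is thought to stabilize the training of the multi-armed bandit method.
    For the proposed method, which assumes that the NN model is also trained sequentially, the results suggest that introducing the regularization term is effective.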

  44. 5. Conclusion

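  45. Conclusion

    • We focused on a challenge in this field that also matters in practical applications: conventional solutions to the nonlinear multi-armed bandit problem lose sequential adaptability because of increased training time.
    • By integrating OS-ELM, which needs no iterative training, with a conventional method, we proposed a multi-armed bandit method that handles nonlinear problems while retaining sequential adaptability.
    • We compared it with conventional linear and nonlinear multi-armed bandit methods and showed that the proposed method reduces training time while maintaining its performance on nonlinearity.
    • We analyzed the usefulness of an NN model that needs no iterative training for multi-armed bandit methods, and examined the requirements for improving the proposed method.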
