Upgrade to Pro — share decks privately, control downloads, hide ads and more …

バンディット問題の理論とアルゴリズム 第8章 / bandit-8

todesking
December 27, 2019

バンディット問題の理論とアルゴリズム 第8章 / bandit-8

todesking

December 27, 2019
Tweet

More Decks by todesking

Other Decks in Technology

Transcript

  1. όϯσΟοτ໰୊ͷ
    ཧ࿦ͱΞϧΰϦζϜ 8ষ
    @todesking

    View full-size slide

  2. ࿈ଓ࿹όϯσΟοτͱ
    ϕΠζ࠷దԽ
    wબ୒ࢶ͕࣮਺ͷϕΫτϧͰ͋ΔΑ͏ͳ৔߹ͷόϯσΟοτ
    ໰୊
    wऔΕΔߦಈͷू߹
    w࣌ࠁ Ͱͷߦಈ
    wߦಈ ʹର͢Δใुظ଴஋
    w࠷దͳߦಈ
    w ͷܗঢ়͸ະ஌ɺ ͷબ୒ࢶ͸࣮࣭తʹແݶͱ͍͏աࠅ
    ͳঢ়گ
    ⊂ ℝd
    t at

    a f(a)
    a* = argmaxa∈
    f(a)
    f(a) a

    View full-size slide

  3. ࡶԻ
    wࡶԻͳ͠Ϟσϧ
    wࡶԻ͋ΓϞσϧ
    w͜ͷষͰ͸ ͷࡶԻ͕৐ΔϞσϧΛߟ͑Δ
    wࡶԻͳ͠Ϟσϧͷ৔߹ɺ ճͷ୳ࡧͰ࣮֬ʹ ͕Θ͔Δͨ
    Ίɺ ͷ৔߹Λߟ͑Δͷ͕ຊ࣭త
    wྫ.-ϞσϧͷϋΠύʔύϥϝʔλΛ ɺMFBWFPOFPVU$7
    ͷ݁ՌΛ ͱ͢Δɻ
    wਅ໘໨ʹ-00$7͢ΔͳΒޡࠩͳ͠Ϟσϧ
    wҰ෦ͷαϯϓϧ͚ͩͰ$7͢ΔͳΒޡࠩ͋ΓϞσϧ
    Xt
    = f(at
    )
    Xt
    = f(at
    ) + ϵt
    , E[ϵt
    ] = 0
    ϵt
    ∼ (0,σ2)
    || a*
    || = ∞
    a
    f(a)

    View full-size slide

  4. ࿈ଓ࿹όϯσΟοτ໰୊ʹ͓͚Δ
    ϦάϨοτ
    w
    wใु࠷େͷબ୒ࢶΛબͼଓ͚ͨ৔߹ͱ࣮ࡍͷใुͷࠩ
    wྦྷੵใु࠷େԽΛ໨తͱ͢Δ
    w୯७ϦάϨοτ
    w ࣌ࠁ ʹ͓͍ͯ΋ͬͱ΋ใुظ଴஋͕ଟ͍ߦಈ
    w࠷େͷใुͱ࠷ऴతʹબ͹Εͨߦಈͷใुͷࠩ
    wࡶԻͳ͠Ϟσϧͷ৔߹ɺ
    w࠷ద࿹ࣝผΛ໨తͱ͢Δ
    regret(T) =
    T

    t=1
    (f(a*) − f(at
    ))
    Δ(T) = f(a*) − f( ̂
    a*(T))
    ̂
    a*(T) T
    Δ(T) = f(a*) − max
    i∈{1,…,T}
    f(at
    )

    View full-size slide

  5. ظ଴஋ؔ਺ͷΫϥε
    w Ϧϓγοπ࿈ଓͳͲ
    ͸͠͹Β͘ग़ͯ͜ͳ͍ͷͰޙ
    ճ͠

    View full-size slide

  6. ϕΠζ࠷దԽ
    wΨ΢εաఔΛԾఆͯ͠࠷ద࿹ࣝผ໰୊Λղ͘
    w࠷ѱ࣌ͷੑೳʹ͍ͭͯ͸ఘΊͯɺʮฏۉతͳʯέʔεʹ
    ͓͍ͯΑ͍ੑೳΛ༩͑Δํࡦ
    wใुظ଴஋ ͕Ψ΢εաఔʹै͏ͱͯ͠ɺϦάϨοτ
    ΍୯७ϦάϨοτΛ࠷దԽ͢Δํࡦ
    wલఏͱͯ͠Ψ΢εաఔ͕ඞཁͳͷͰઌʹઆ໌͠·͢
    f(a)

    View full-size slide

  7. ༧උ஌ࣝ: Ψ΢εաఔ
    wࠓ·Ͱͷߦಈ͓Αͼใु Λݩʹɺߦಈ ʹΑͬͯಘΒΕΔ
    ใुͷظ଴஋͕ै͏෼෍ ΛٻΊ͍ͨ
    wΨ΢εաఔΛ࢖͏͜ͱͰ͜ͷ෼෍͕ܭࢉՄೳ
    wҎ߱ͷํࡦ (16$#ɺτϯϓιϯɺظ଴վળྔ
    ͷલఏͱͳΔ
    wࢀߟจݙΨ΢εաఔͱػցֶश ػցֶशϓϩϑΣογϣφϧ
    γϦʔζ

    at
    , Xt
    a
    P[f(a) = x|at
    , Xt
    ]

    View full-size slide

  8. Ψ΢εաఔ: ໨త
    w؍ଌ͞Εͨσʔλ͔Βɺະ஌ͷؔ਺ Λਪఆ͍ͨ͠
    wೖྗ
    wݶΒΕͨσʔλ͔Βͷਪఆ݁Ռ͸ෆਖ਼֬
    wˠ৴པ౓Λ֬཰෼෍Ͱද͍ͨ͠
    wؔ਺ͦͷ΋ͷͰ͸ͳ͘ɺͦͷ෼෍
    ΛٻΊΔ
    f(x) : ℝd → ℝ
    X = {x1
    , ⋯, xn
    }, y = {f(x1
    ), ⋯, f(xn
    )}
    P[f |X, y]
    https://www.ism.ac.jp/~daichi/lectures/H26-GaussianProcess/gp-lecture2-daichi.pdf
    ؍ଌσʔλ(ेࣈ)Λݩʹ༧ଌ͞Εͨyͷ෼෍ɻ

    ظ଴஋͕࣮ઢɺ৴པ͕۠ؒ੨͍ྖҬͰࣔ͞Ε͍ͯΔ

    View full-size slide

  9. Ψ΢εաఔ: ʹ͍ͭͯͷԾఆ
    f
    w؍ଌσʔλ͔Β ͷ෼෍ΛٻΊΔʹ͸ԿΒ͔ͷԾఆ͕͍Δ
    wԾఆ ͸جఈؔ਺ ʹΑΔҰൠԽઢܕϞσϧ
    Ͱ͋Γɺࣄલ෼෍ Ͱ͋Δ
    w͋Δ ʹର͢Δ ͷ஋͸ɺ
    Λ࢖ͬͯ ͱॻ͚Δ
    w͜ͷͱ͖ɺ ͸ଟมྔΨ΢ε෼෍ʹै͍ɺ Ͱ͋Δ
    wΧʔωϧؔ਺ Λ༻͍Δͱ
    ͱॻ͚Δ
    f
    f ϕ(x) = (ϕ1
    (x), …, ϕd
    (x)) f(x) = wTϕ(x)
    P(w) = (0, α−1I)
    X = (x1
    , …, xN
    )T y = ( f(x1
    ), …, f(xn
    ))T
    Φ =
    ϕ1
    (x1
    ) … ϕd
    (x1
    )
    ⋮ ⋱ ⋮
    ϕ1
    (xN
    ) … ϕd
    (xN
    )
    y = Φw
    y y = (0, α−1ΦΦT)
    k(xn
    , xm
    ) = α−1ϕ(xn
    )Tϕ(xm
    )
    y = (0,
    k(x1
    , x1
    ) ⋯ k(x1
    , xN
    )
    ⋮ ⋱ ⋮
    k(xN
    , x1
    ) ⋯ k(xN
    , xN
    )
    )

    View full-size slide

  10. Ψ΢εաఔ: ༧ଌ
    w ͕؍ଌ͞Ε͍ͯΔঢ়ଶͰɺ ͷ෼෍Λٻ
    Ί͍ͨ
    wؔ਺ͷ෼෍ͱ͸Կ͔೚ҙͷ ʹରͯ͠ɺ
    ͷಉ࣌෼෍ ͕Θ͔Ε͹ ͷ෼෍͕Θ
    ͔ͬͨ͜ͱʹͳΔ
    wଟมྔΨ΢ε෼෍ͷ৚݅෇͖෼෍ʹΑͬͯٻΊΒΕΔ
    w ͱͯ͠ɺ
    Ͱ͋Δͱ͖ɺ
    ʹͳΔ
    wະ஌ͷ ͷ෼෍Λط஌ͷ Ͱදݱ͢Δ͜ͱ͕Ͱ͖ͨ
    X = {x1
    , ⋯, xn
    }T, y = {f(x1
    ), ⋯, f(xn
    )}T f
    X* = {x*
    1
    , …, x*
    M
    }T
    y* = {f(x*
    1
    ), ⋯, f(x*
    M
    )}T P[y*|X*, X, y] f
    K(n, m) = k(xn
    , xm
    ) k*
    (n, m) = k(xn
    , x*
    m
    ) k**
    (n, m) = k(x*
    n
    , x*
    m
    )
    (
    y
    y*) ∼
    (
    0, (
    K k*
    kT
    *
    k**
    ))
    P[y*|X*, X, y] = (kT
    *
    K−1y, k**
    − kT
    *
    K−1k*)
    y* X*, X, y

    View full-size slide

  11. ϕΠζ࠷దԽ: Χʔωϧ
    wόϯσΟοτຊʹ໭Δ
    wΧʔωϧͷબ୒
    wਖ਼ఆ஋Χʔωϧؔ਺ ʹରͯ͠ ͱ͢Δͷ͕Ұൠ

    w
    wεέʔϧύϥϝʔλ ΛͱΔ
    wΨ΢εΧʔωϧ
    wΨ΢εΧʔωϧͷҰൠԽͰ͋ΔϚλʔϯΧʔωϧ
    wઢܗΧʔωϧ Λ࢖༻ͨ͠৔߹ઢܗόϯσΟοτ ষ

    ͱಉ౳
    g k(a, a′ ) = σ2
    0
    g(∥a − a′ ∥λ
    )
    ∥a∥λ
    =
    d

    i=1
    ai
    /λ2
    i
    λ
    g(z) = exp(−z2/2)
    k(a, a′ ) = σ2
    0
    aTa′

    View full-size slide

  12. ࿈ଓ࿹όϯσΟοτͷํࡦ:
    GP-UCB
    w ͕Ψ΢εաఔʹै͏ͱͨ͠৔߹ɺࠓ·Ͱͷ݁Ռ͔Βߦಈ ͰಘΒΕΔ
    ใु ͷฏۉ ͱ෼ࢄ ͕ܭࢉՄೳ
    wϕΠζ৴པ۠ؒͷ্ݶ͸ ͱͳΔ
    w ৴པ౓
    w֤࣌ࠁͰ Λ࠷େԽ͢Δ ΛબͿํࡦ͕(16$#
    wϦάϨοτΛ࠷খԽͤ͞Δͷ͕໨త͕ͩɺ Λେ͖͘औΔͱ୯७Ϧά
    Ϩοτͷ࠷খԽʹ΋ରԠՄೳ
    w Ψ΢εΧʔωϧ
    w ࣍਺ ͷϚλʔϯΧʔωϧ
    f a
    f(a) μ(a|Xt
    ) σ(a|Xt
    )
    ¯
    μa
    (t) = μ(a|Xt
    ) + αt
    σ(a|Xt
    )
    αt
    ̂
    μa
    (t) a
    αt
    regret(T) = O ( T(log T)d+2
    )
    regret(T) = O (T ν + d(d + 1)
    2ν + d(d + 1) log T) 1 < ν < ∞

    View full-size slide

  13. ࿈ଓ࿹όϯσΟοτͷํࡦ:
    τϯϓιϯநग़
    wΨ΢εաఔΛલఏͱ͍ͯ͠ΔͨΊɺ ͷ෼෍͕ಘΒΕΔ
    wՄೳͳߦಈ Λ཭ࢄԽ͠ɺ༗ݶͷީิ ʹରͯ͠ ͷ஋
    ΛαϯϓϦϯά͠ɺ࠷େͱͳΔߦಈΛબͿ
    wΨ΢εաఔΛ࢖ͬͨํࡦʹڞ௨͢Δಛ௃ͱͯ͠ɺ ࣍ݩߦྻ
    ʹؔ͢Δܭࢉ͕ඞཁࢼߦ͕ଟ͘ͳΔͱܭࢉྔ͕૿͢
    w ͷܭࢉྔφΠʔϒʹ΍ͬͯ
    wۙࣅ͢Ε͹ݮΒͤ͸͢Δ
    w3BUFTPG$POWFSHFODFGPS4QBSTF7BSJBUJPOBM
    (BVTTJBO1SPDFTT3FHSFTTJPO*$.-#FTUQBQFS
    wઢܗόϯσΟοτʹ͓͍ͯ͸͜ͷ໰୊͕ͳ͍͔ΘΓʹɺແݶ
    ࣍ݩΛѻ͑ͳ͍
    f(a)
    a′
    s
    f(a ∈ a′
    s
    )
    t
    K−1 O(N3)

    View full-size slide

  14. ࿈ଓ࿹όϯσΟοτͷํࡦ:
    ظ଴վળྔํࡦ
    wΨ΢εաఔʹ͓͍ͯ୯७ϦάϨοτͷ࠷খԽΛ໨ࢦ͢ํࡦ
    w ճͷߦಈͷதͰҰ൪ྑ͔ͬͨใु
    w ճͷߦಈޙͷ୯७ϦάϨοτ
    w࣌ࠁ5ʹ͓͚Δ࠷దߦಈ
    wظ଴վળྔ
    wߦಈ ʹΑͬͯɺࠓ·Ͱ؍ଌ͞Εͨ࠷େ஋͔ΒͲΕ͚ͩվળ͞ΕΔ͔
    ͷظ଴஋
    w ͕࠷େʹͳΔબ୒ࢶΛબͿ
    T ̂
    f*
    T
    = max{f(aT
    ), ̂
    f*
    T−1
    }
    T Δ(T) = f(a*) − ̂
    f*
    T
    ̂
    aT
    = argmaxa∈
    E[max{f(a), ̂
    f*
    T−1
    }| f(aT−1
    )]
    = argmaxa∈
    E[max{f(a) − ̂
    f*
    T−1
    ,0}| f(aT−1
    )]
    EI(a| f(at
    )) = E[max{f(a) − ̂
    f*
    t
    ,0}| f(at
    )]
    a
    EI(a| f(at
    ))

    View full-size slide

  15. ଟ߲ࣜ࣌ؒͰ࣮ߦՄೳͳํࡦ
    w͚ͩ͜͜Ψ΢εաఔؔ܎ͳ͍ͷͰޙճ͠

    View full-size slide

  16. ڞ෼ࢄؔ਺ͷύϥϝʔλਪఆ
    wΨ΢εաఔΛ࢖͏৔߹ɺڞ෼ࢄؔ਺ʹͲͷΧʔωϧΛ࢖͏͔ɺϋΠ
    ύʔύϥϝʔλΛͲ͏͢Δ͔ͱ͍ͬͨબ୒͕ඞཁ
    wڞ෼ࢄؔ਺Λܾఆ͢ΔͨΊͷύϥϝʔλ εέʔϧύϥϝʔλɺΧʔ
    ωϧͷछྨɺΧʔωϧͷύϥϝʔλɺ؍ଌϊΠζͷ෼ࢄ
    Λڞ෼ࢄύ
    ϥϝʔλ ͱ͢Δ
    w࣌ࠁ Ͱͷ ͷ໬౓͸ɺ ͷ΋ͱͰͷڞ෼ࢄؔ਺ Λ࢖ͬͯ
    ͱͳΔ
    θ
    t θ θ k(θ)
    L(θ; Xt
    ) =
    1
    (2π)ddet(k(θ)(at
    , at
    ) + σ2Id
    )
    exp(
    1
    2
    Xt
    (k(θ)(at
    , at
    ) + σ2Id
    )−1XT
    t
    )

    View full-size slide

  17. ڞ෼ࢄؔ਺ͷύϥϝʔλਪఆ
    w࣌ࠁ ʹ͓͚Δߦಈ Λܾఆ͢Δࡍɺ Λ
    ༻͍Δํࡦ
    wಛʹ ͕খ͍͞͏ͪ͸ɺਅ஋͔Β͔͚཭ΕͨྖҬʹਪఆ஋͕ऩଋ
    ͢Δ৔߹͕͋Δ
    w&*ํࡦͷ৔߹ɺ Ͱ΋࠷ѱ࣌Ͱ͸୯७ϦάϨοτ͕ʹऩଋ
    ͠ͳ͍
    wແݶʹࢼߦͯ͠΋ਖ਼͍͠౴͕͑ಘΒΕͳ͍ʜʜ
    wڞ෼ࢄύϥϝʔλʹؔ͢ΔࣄޙฏۉΛͱΔํࡦ
    wԿΒ͔ͷείΞؔ਺ Λ࠷େԽ͢Δ Λબͼ͍ͨ৔߹
    wࣄલ෼෍ ௨ৗ͸Ұ༷෼෍
    Λஔ͖ɺ Λ࠷େ
    Խ͢Δ ΛબͿ
    t + 1 at+1
    ̂
    θt
    = argmaxθ∈Θ
    L(θ; Xt
    )
    t
    t → ∞

    (a) a
    π(θ) Eθ∼π(θ|Xt
    )
    [uθ
    (a; Xt
    )]
    a

    View full-size slide

  18. ଟ߲ࣜ࣌ؒͰ࣮ߦՄೳͳํࡦ:
    ۭؒ෼ׂʹجͮ͘SOOํࡦ
    wΨ΢εաఔͰ͸ͳ͍΍ͭ
    w؍ଌؔ਺ʹର͢Δଟ߲ࣜ࣌ؒͰ࣮ݱՄೳ͔ͭ୯७Ϧά
    Ϩοτͷऩଋ͕อূՄೳͳํࡦ
    w͜͜ͰΑ͏΍͘׈Β͔͞ͷ੍໿͕ग़ͯ͘Δ
    w
    ࠷ద఺ ɺ͋Δ ͱ͢΂ͯͷ ʹ͓͍
    ͯɺ Λຬͨ͢
    a* c, a > 0 a ∈
    f(a*) − f(a) ≤ c∥a* − a∥α

    View full-size slide

  19. SOOํࡦ
    wՄೳͳߦಈͷۭؒΛ෼ׂ͠ɺͦΕͧΕͷྖҬͷத఺Λߦ
    ಈͱͯ͠બ୒
    wಘͨใुΛݩʹɺΑͦ͞͏ͳ෦෼ۭؒͷީิΛબ୒͠ɺ
    ࠶ؼతʹ෼ׂ͍ͯ͘͠
    wͳΜͰ͏·͍͘͘ͷ͔ཧղͯ͠ͳ͍ʜʜݩؾ͕͋ͬͨΒ
    ޱ಄Ͱ΍Γ·͠ΐ͏

    View full-size slide