todesking
December 27, 2019

# バンディット問題の理論とアルゴリズム (Bandit Problems: Theory and Algorithms), Chapter 8 / bandit-8

## Transcript

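2. ### Continuous-armed bandits and Bayesian optimization

   - A bandit problem in which each choice is a real-valued vector.
   - Set of available actions: $\mathcal{A} \subset \mathbb{R}^d$
   - Action at time $t$: $a_t \in \mathcal{A}$
   - Expected reward of action $a$: $f(a)$
   - Optimal action: $a^* = \operatorname{argmax}_{a \in \mathcal{A}} f(a)$
   - The setting is harsh: the shape of $f(a)$ is unknown, and the set of choices for $a$ is effectively infinite.
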
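3. ### Noise

   - Noise-free model: $X_t = f(a_t)$
   - Noisy model: $X_t = f(a_t) + \epsilon_t$, $\mathbb{E}[\epsilon_t] = 0$
   - This chapter considers the model with Gaussian noise $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$.
   - In the noise-free model, $|\mathcal{A}|$ rounds of exploration are enough to identify $a^*$ with certainty, so the essential case is $|\mathcal{A}| = \infty$.
   - Example: let $a$ be the hyperparameters of an ML model and $f(a)$ the result of leave-one-out CV.
     - Running the full LOOCV gives the noise-free model.
     - Running CV on only a subset of the samples gives the noisy model.
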
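4. ### Regret in the continuous-armed bandit problem

   - Regret: $\mathrm{regret}(T) = \sum_{t=1}^{T} \bigl(f(a^*) - f(a_t)\bigr)$
     - The gap between the reward from always choosing the reward-maximizing action and the reward actually obtained.
     - Used when the goal is to maximize cumulative reward.
   - Simple regret: $\Delta(T) = f(a^*) - f(\hat{a}^*(T))$
     - $\hat{a}^*(T)$: the action estimated at time $T$ to have the highest expected reward.
     - The gap between the maximum reward and the reward of the action finally selected.
     - In the noise-free model, $\Delta(T) = f(a^*) - \max_{t \in \{1, \dots, T\}} f(a_t)$.
     - Used when the goal is best-arm identification.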
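
As a minimal sketch of the two regret definitions, the following computes the cumulative regret and the noise-free simple regret on a toy problem. The reward function `f`, the optimum `0.3`, and the action sequence are hypothetical values chosen for illustration; none of them come from the slides.

```python
def f(a):
    """Hypothetical expected-reward function (an assumption); maximum at a = 0.3."""
    return 1.0 - (a - 0.3) ** 2

def cumulative_regret(f, a_star, actions):
    """regret(T) = sum over t of f(a*) - f(a_t)."""
    return sum(f(a_star) - f(a) for a in actions)

def simple_regret_noise_free(f, a_star, actions):
    """Delta(T) = f(a*) - max_t f(a_t), the noise-free form."""
    return f(a_star) - max(f(a) for a in actions)

# Hypothetical sequence of actions taken at t = 1, 2, 3.
actions = [0.0, 0.5, 0.3]
print(cumulative_regret(f, 0.3, actions))
print(simple_regret_noise_free(f, 0.3, actions))
```

The cumulative regret keeps charging for the early exploratory actions, while the simple regret only looks at the best action found, which is why the two objectives call for different amounts of exploration.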

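7. ### Background: Gaussian processes

   - Given the actions and rewards observed so far, $a_t, X_t$, we want the distribution $P[f(a) = x \mid a_t, X_t]$ followed by the expected reward of an action $a$.
   - A Gaussian process makes this distribution computable.
   - It underlies the policies that follow (GP-UCB, Thompson sampling, expected improvement).
   - Reference: ガウス過程と機械学習 (Gaussian Processes and Machine Learning), 機械学習プロフェッショナルシリーズ (Machine Learning Professional Series).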
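8. ### Gaussian processes: goal

   - Estimate an unknown function $f(x): \mathbb{R}^d \to \mathbb{R}$ from observed data.
   - Input: $X = \{x_1, \dots, x_n\}$, $y = \{f(x_1), \dots, f(x_n)\}$
   - An estimate from limited data is inexact, so we want to express its confidence as a probability distribution.
   - Rather than the function itself, we seek its distribution $P[f \mid X, y]$.
   - Figure (from https://www.ism.ac.jp/~daichi/lectures/H26-GaussianProcess/gp-lecture2-daichi.pdf): the distribution of $y$ predicted from the observed data (crosses). The solid line is the expected value and the blue region is the confidence interval.
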
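9. ### Gaussian processes: assumptions on $f$

   - Obtaining the distribution of $f$ from observed data requires some assumption.
   - Assumption: $f$ is a generalized linear model over basis functions $\phi(x) = (\phi_1(x), \dots, \phi_d(x))$, that is $f(x) = w^T \phi(x)$, with prior $P(w) = \mathcal{N}(0, \alpha^{-1} I)$.
   - For inputs $X = (x_1, \dots, x_N)^T$, the values $y = (f(x_1), \dots, f(x_N))^T$ can be written as $y = \Phi w$, where
     $$\Phi = \begin{pmatrix} \phi_1(x_1) & \cdots & \phi_d(x_1) \\ \vdots & \ddots & \vdots \\ \phi_1(x_N) & \cdots & \phi_d(x_N) \end{pmatrix}$$
   - Then $y$ follows a multivariate Gaussian distribution, $y \sim \mathcal{N}(0, \alpha^{-1} \Phi \Phi^T)$.
   - With the kernel function $k(x_n, x_m) = \alpha^{-1} \phi(x_n)^T \phi(x_m)$, this becomes
     $$y \sim \mathcal{N}\left(0, \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{pmatrix}\right)$$
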
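
The step from the prior on $w$ to the distribution of $y$ can be spelled out using only the definitions on this slide. Since $y = \Phi w$ is a linear map of a zero-mean Gaussian vector, it is itself Gaussian, with mean and covariance:

```latex
\mathbb{E}[y] = \Phi\,\mathbb{E}[w] = 0,
\qquad
\operatorname{Cov}[y] = \mathbb{E}[y y^T]
  = \Phi\,\mathbb{E}[w w^T]\,\Phi^T
  = \alpha^{-1}\,\Phi\Phi^T,
\qquad
\left(\alpha^{-1}\Phi\Phi^T\right)_{nm}
  = \alpha^{-1}\,\phi(x_n)^T \phi(x_m)
  = k(x_n, x_m).
```

The last equality is why the basis functions never need to be evaluated explicitly once a kernel $k$ is chosen.
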
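10. ### Gaussian processes: prediction

    - With $X = \{x_1, \dots, x_n\}^T$ and $y = \{f(x_1), \dots, f(x_n)\}^T$ observed, we want the distribution of $f$.
    - What is a distribution over functions? If, for arbitrary $X^* = \{x^*_1, \dots, x^*_M\}^T$, we know the joint distribution $P[y^* \mid X^*, X, y]$ of $y^* = \{f(x^*_1), \dots, f(x^*_M)\}^T$, then we know the distribution of $f$.
    - It follows from the conditional distribution of a multivariate Gaussian.
    - Let $K(n, m) = k(x_n, x_m)$, $k_*(n, m) = k(x_n, x^*_m)$, $k_{**}(n, m) = k(x^*_n, x^*_m)$. When
      $$\begin{pmatrix} y \\ y^* \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} K & k_* \\ k_*^T & k_{**} \end{pmatrix}\right),$$
      we get
      $$P[y^* \mid X^*, X, y] = \mathcal{N}\left(k_*^T K^{-1} y,\; k_{**} - k_*^T K^{-1} k_*\right).$$
    - The distribution of the unknown $y^*$ is now expressed in terms of the known $X^*, X, y$.
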
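
The conditional formula can be sketched in plain Python for a single test point. This is an illustration under stated assumptions, not the deck's implementation: inputs are scalars, the kernel is a Gaussian kernel with a hypothetical `length` parameter, observations are noise-free, and a small `jitter` is added to the diagonal for numerical stability. The helper names `gauss_kernel`, `solve`, and `gp_posterior` are invented for this example.

```python
import math

def gauss_kernel(x, y, length=1.0):
    """Gaussian kernel for scalar inputs (length scale is an assumption)."""
    return math.exp(-((x - y) ** 2) / (2.0 * length ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gp_posterior(xs, ys, x_star, jitter=1e-9):
    """Mean k_*^T K^{-1} y and variance k_** - k_*^T K^{-1} k_* at x_star."""
    n = len(xs)
    K = [[gauss_kernel(xs[i], xs[j]) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [gauss_kernel(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = gauss_kernel(x_star, x_star) - sum(
        k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)  # clip tiny negative values from round-off

# Hypothetical observations.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.5]
print(gp_posterior(xs, ys, 1.0))   # at an observed point
print(gp_posterior(xs, ys, 10.0))  # far from all observations
```

At an observed input the posterior variance collapses toward zero, and far from the data the posterior falls back to the prior (mean 0, variance $k(x^*, x^*)$).
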
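11. ### Bayesian optimization: kernels

    - Back to the bandit book.
    - Choosing the kernel:
      - For a positive-definite kernel function $g$, it is common to set $k(a, a') = \sigma_0^2\, g(\lVert a - a' \rVert_\lambda)$, where $\lVert a \rVert_\lambda = \sqrt{\sum_{i=1}^{d} a_i^2 / \lambda_i^2}$.
      - $\lambda$ is the scale parameter.
    - Gaussian kernel: $g(z) = \exp(-z^2/2)$
    - Matérn kernel, a generalization of the Gaussian kernel.
    - With the linear kernel $k(a, a') = \sigma_0^2\, a^T a'$, the problem is equivalent to the linear bandit (covered in its own chapter).
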
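
A minimal sketch of these kernels, assuming the scaled norm is the usual per-coordinate rescaled Euclidean norm. The function names are hypothetical and chosen only for this example.

```python
import math

def scaled_norm(a, lam):
    """||a||_lambda = sqrt(sum_i a_i^2 / lambda_i^2)."""
    return math.sqrt(sum((ai / li) ** 2 for ai, li in zip(a, lam)))

def gaussian_kernel(a, b, lam, sigma0=1.0):
    """k(a, a') = sigma0^2 * g(||a - a'||_lambda) with g(z) = exp(-z^2 / 2)."""
    z = scaled_norm([x - y for x, y in zip(a, b)], lam)
    return sigma0 ** 2 * math.exp(-z ** 2 / 2.0)

def linear_kernel(a, b, sigma0=1.0):
    """k(a, a') = sigma0^2 * a^T a'."""
    return sigma0 ** 2 * sum(x * y for x, y in zip(a, b))

# A larger scale parameter makes distant actions look more similar.
print(gaussian_kernel([0.0, 0.0], [1.0, 0.0], lam=[1.0, 1.0]))
print(gaussian_kernel([0.0, 0.0], [1.0, 0.0], lam=[5.0, 5.0]))
```
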
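12. ### Policies for continuous-armed bandits: GP-UCB

    - If $f$ follows a Gaussian process, the posterior mean $\mu(a \mid X_t)$ and standard deviation $\sigma(a \mid X_t)$ of the reward $f(a)$ of action $a$ can be computed from the results so far.
    - The upper end of the Bayesian credible interval is $\bar{\mu}_a(t) = \mu(a \mid X_t) + \alpha_t\, \sigma(a \mid X_t)$.
      - $\alpha_t$: confidence level.
    - GP-UCB is the policy that, at each time step, selects the $a$ maximizing $\bar{\mu}_a(t)$.
    - The aim is to minimize regret, but taking $\alpha_t$ large also makes it suitable for minimizing simple regret.
    - Gaussian kernel: $\mathrm{regret}(T) = O\left(\sqrt{T (\log T)^{d+2}}\right)$
    - Matérn kernel of order $1 < \nu < \infty$: $\mathrm{regret}(T) = O\left(T^{\frac{\nu + d(d+1)}{2\nu + d(d+1)}} \log T\right)$
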
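
The selection rule itself is short once the posterior is available. The sketch below assumes a finite list of candidate actions and takes the posterior means and standard deviations as given inputs; the numbers and the function name `gp_ucb_choice` are hypothetical.

```python
def gp_ucb_choice(candidates, mean, std, alpha_t):
    """Return the candidate maximizing mean + alpha_t * std."""
    scores = [m + alpha_t * s for m, s in zip(mean, std)]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

candidates = [0.0, 0.5, 1.0]
mean = [0.2, 0.5, 0.4]    # hypothetical posterior means
std = [0.05, 0.05, 0.30]  # hypothetical posterior standard deviations

print(gp_ucb_choice(candidates, mean, std, alpha_t=0.1))  # favors the high mean
print(gp_ucb_choice(candidates, mean, std, alpha_t=2.0))  # favors the uncertain arm
```

A small $\alpha_t$ exploits the current estimate, while a large $\alpha_t$ pushes the choice toward poorly explored actions, which matches the remark that a large $\alpha_t$ suits simple-regret minimization.
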
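13. ### Policies for continuous-armed bandits: Thompson sampling

    - Because a Gaussian process is assumed, the distribution of $f(a)$ is available.
    - Discretize the feasible actions $\mathcal{A}$, sample the values $f(a)$ for the finite candidate set $a \in a'_s$, and select the action with the largest sampled value.
    - A feature shared by all Gaussian-process policies: they require computations on a $t$-dimensional matrix, so the cost grows with the number of trials.
      - Computing $K^{-1}$ costs $O(N^3)$ when done naively.
      - Approximations can reduce this: "Rates of Convergence for Sparse Variational Gaussian Process Regression" (ICML Best Paper).
    - Linear bandits do not have this problem, but in exchange they cannot handle infinite dimensions.
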
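
One way to sketch the sampling step over a discretized candidate set is to draw a single joint sample from the posterior with a Cholesky factor. The inputs `mean` and `cov` stand for the posterior mean vector and covariance matrix over the candidates (as given by the prediction formula); the names `cholesky` and `thompson_choice`, the `jitter` value, and the example numbers are assumptions made for illustration.

```python
import math
import random

def cholesky(A, jitter=1e-10):
    """Lower-triangular L with L L^T approximately A (jitter added on the diagonal)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(s + jitter)
            else:
                L[i][j] = s / L[j][j]
    return L

def thompson_choice(mean, cov, rng):
    """Draw one joint sample of f over the candidates and return the argmax index."""
    n = len(mean)
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sample = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
              for i in range(n)]
    return max(range(n), key=lambda i: sample[i])

rng = random.Random(0)
# Hypothetical posterior over three candidate actions.
mean = [0.0, 0.3, 0.1]
cov = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
print(thompson_choice(mean, cov, rng))
```

The Cholesky factorization here is itself cubic in the number of candidates, which is a small-scale view of the $O(N^3)$ cost noted on the slide.
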
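14. ### Policies for continuous-armed bandits: expected improvement

    - A policy that aims to minimize simple regret under a Gaussian process.
    - Best reward among the first $T$ actions: $\hat{f}^*_T = \max\{f(a_T), \hat{f}^*_{T-1}\}$
    - Simple regret after $T$ actions: $\Delta(T) = f(a^*) - \hat{f}^*_T$
    - Optimal action at time $T$:
      $$\hat{a}_T = \operatorname{argmax}_{a \in \mathcal{A}} \mathbb{E}\left[\max\{f(a), \hat{f}^*_{T-1}\} \mid f(a_{T-1})\right] = \operatorname{argmax}_{a \in \mathcal{A}} \mathbb{E}\left[\max\{f(a) - \hat{f}^*_{T-1}, 0\} \mid f(a_{T-1})\right]$$
    - Expected improvement: $\mathrm{EI}(a \mid f(a_t)) = \mathbb{E}\left[\max\{f(a) - \hat{f}^*_t, 0\} \mid f(a_t)\right]$
      - The expected amount by which action $a$ improves on the largest value observed so far.
    - Select the choice that maximizes $\mathrm{EI}(a \mid f(a_t))$.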
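
When the posterior of $f(a)$ is Gaussian with mean $\mu$ and standard deviation $\sigma$, the expectation in EI has the standard closed form $(\mu - \hat{f}^*)\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (\mu - \hat{f}^*)/\sigma$, where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. The sketch below evaluates it; the function name and argument names are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best):
    """E[max(f - best, 0)] for f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

# Hypothetical posterior summaries for two actions, with best-so-far reward 0.5.
print(expected_improvement(mu=0.6, sigma=0.05, best=0.5))
print(expected_improvement(mu=0.4, sigma=0.50, best=0.5))
```

An action with a lower mean can still have a larger EI when its uncertainty is high, which is how the policy trades off exploration against exploitation.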

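16. ### Estimating the parameters of the covariance function

    - Using a Gaussian process requires choices such as which kernel to use as the covariance function and how to set its hyperparameters.
    - Collect the parameters that determine the covariance function (scale parameter, kernel type, kernel parameters, observation-noise variance) into the covariance parameter $\theta$.
    - The likelihood of $\theta$ at time $t$, written with the covariance function $k^{(\theta)}$ under $\theta$, is
      $$L(\theta; X_t) = \frac{1}{\sqrt{(2\pi)^d \det\left(k^{(\theta)}(a_t, a_t) + \sigma^2 I_d\right)}} \exp\left(-\frac{1}{2} X_t \left(k^{(\theta)}(a_t, a_t) + \sigma^2 I_d\right)^{-1} X_t^T\right)$$
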
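
For numerical work the logarithm of this likelihood is usually evaluated instead. The sketch below is an illustration under stated assumptions: it takes the kernel matrix `K` over the observed actions, the noise variance, and the reward vector `x` as inputs, and treats the Gaussian's dimension as the number of observed rewards. The function name `log_likelihood` is hypothetical.

```python
import math

def log_likelihood(K, noise_var, x):
    """Log of the zero-mean Gaussian likelihood with covariance K + noise_var * I."""
    n = len(x)
    C = [[K[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Cholesky factorization C = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: solve L v = x, so that x^T C^{-1} x = v^T v.
    v = []
    for i in range(n):
        v.append((x[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(vi * vi for vi in v)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

# Hypothetical kernel matrix and rewards for two observations.
print(log_likelihood([[2.0, 1.0], [1.0, 2.0]], 0.0, [1.0, 0.0]))
```

Comparing this value across candidate settings of $\theta$ is what the maximum-likelihood choice of covariance parameters amounts to.
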
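17. ### Estimating the parameters of the covariance function

    - Policy that uses the maximum-likelihood estimate $\hat{\theta}_t = \operatorname{argmax}_{\theta \in \Theta} L(\theta; X_t)$ when deciding the action $a_{t+1}$ at time $t+1$:
      - Especially while $t$ is small, the estimate can converge to a region far from the true value.
      - For the EI policy, in the worst case the simple regret does not converge to 0 even as $t \to \infty$.
      - Even infinitely many trials do not yield the right answer.
    - Policy that takes the posterior mean over the covariance parameter:
      - Applies when we want to select the $a$ maximizing some score function $u_\theta(a)$.
      - Place a prior $\pi(\theta)$ (usually uniform) and select the $a$ maximizing $\mathbb{E}_{\theta \sim \pi(\theta \mid X_t)}[u_\theta(a; X_t)]$.
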
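
The posterior-mean policy can be sketched by replacing the expectation with a weighted sum over a finite grid of $\theta$ values. This discretization is an assumption made for illustration, as are the uniform prior on the grid, the toy score function, and the names `posterior_weights` and `averaged_choice`.

```python
import math

def posterior_weights(log_liks):
    """Posterior over a finite grid of theta under a uniform prior."""
    m = max(log_liks)
    w = [math.exp(l - m) for l in log_liks]  # subtract the max for stability
    total = sum(w)
    return [wi / total for wi in w]

def averaged_choice(candidates, thetas, log_liks, score):
    """Pick the action maximizing the posterior-mean score E_theta[u_theta(a)]."""
    weights = posterior_weights(log_liks)

    def mean_score(a):
        return sum(w * score(theta, a) for w, theta in zip(weights, thetas))

    return max(candidates, key=mean_score)

# Hypothetical score: each theta prefers the action closest to itself.
score = lambda theta, a: -(a - theta) ** 2
candidates = [0.0, 0.5, 1.0]
print(averaged_choice(candidates, [0.0, 1.0], [0.0, 0.0], score))    # both theta equally likely
print(averaged_choice(candidates, [0.0, 1.0], [0.0, -50.0], score))  # first theta dominates
```

Unlike plugging in a single $\hat{\theta}_t$, the averaged score hedges across parameter values that the data cannot yet distinguish.
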
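18. ### Policies that run in polynomial time: the SOO policy based on space partitioning

    - This one is not a Gaussian process.
    - A policy that runs in time polynomial in the number of observations and whose simple regret is guaranteed to converge.
    - This is where a smoothness constraint finally appears: for the optimum $a^*$, some $c, \alpha > 0$, and all $a \in \mathcal{A}$,
      $$f(a^*) - f(a) \le c\, \lVert a^* - a \rVert^\alpha.$$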
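
As a quick illustration of the condition (a hypothetical example, not taken from the slides), a quadratic bowl around the optimum satisfies it with equality for $c = 1$ and $\alpha = 2$:

```latex
f(a) = f(a^*) - \lVert a^* - a \rVert^2
\quad\Longrightarrow\quad
f(a^*) - f(a) = \lVert a^* - a \rVert^2 \le c\,\lVert a^* - a \rVert^{\alpha}
\quad\text{with } c = 1,\ \alpha = 2 .
```

The condition only constrains how fast $f$ may fall away from its maximum; it says nothing about the behavior of $f$ between two non-optimal points.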