todesking
December 27, 2019
77

# バンディット問題の理論とアルゴリズム 第8章 / bandit-8

## todesking

December 27, 2019

## Transcript

1. όϯσΟοτ໰୊ͷ
ཧ࿦ͱΞϧΰϦζϜ 8ষ
@todesking

2. ࿈ଓ࿹όϯσΟοτͱ
ϕΠζ࠷దԽ
wબ୒ࢶ͕࣮਺ͷϕΫτϧͰ͋ΔΑ͏ͳ৔߹ͷόϯσΟοτ
໰୊
wऔΕΔߦಈͷू߹
w࣌ࠁ Ͱͷߦಈ
wߦಈ ʹର͢Δใुظ଴஋
w࠷దͳߦಈ
w ͷܗঢ়͸ະ஌ɺ ͷબ୒ࢶ͸࣮࣭తʹແݶͱ͍͏աࠅ
ͳঢ়گ
⊂ ℝd
t at

a f(a)
a* = argmaxa∈
f(a)
f(a) a

3. ࡶԻ
wࡶԻͳ͠Ϟσϧ
wࡶԻ͋ΓϞσϧ
w͜ͷষͰ͸ ͷࡶԻ͕৐ΔϞσϧΛߟ͑Δ
wࡶԻͳ͠Ϟσϧͷ৔߹ɺ ճͷ୳ࡧͰ࣮֬ʹ ͕Θ͔Δͨ
Ίɺ ͷ৔߹Λߟ͑Δͷ͕ຊ࣭త
wྫ.-ϞσϧͷϋΠύʔύϥϝʔλΛ ɺMFBWFPOFPVU\$7
ͷ݁ՌΛ ͱ͢Δɻ
wਅ໘໨ʹ-00\$7͢ΔͳΒޡࠩͳ͠Ϟσϧ
wҰ෦ͷαϯϓϧ͚ͩͰ\$7͢ΔͳΒޡࠩ͋ΓϞσϧ
Xt
= f(at
)
Xt
= f(at
) + ϵt
, E[ϵt
] = 0
ϵt
∼ (0,σ2)
|| a*
|| = ∞
a
f(a)

4. ࿈ଓ࿹όϯσΟοτ໰୊ʹ͓͚Δ
ϦάϨοτ
w
wใु࠷େͷબ୒ࢶΛબͼଓ͚ͨ৔߹ͱ࣮ࡍͷใुͷࠩ
wྦྷੵใु࠷େԽΛ໨తͱ͢Δ
w୯७ϦάϨοτ
w ࣌ࠁ ʹ͓͍ͯ΋ͬͱ΋ใुظ଴஋͕ଟ͍ߦಈ
w࠷େͷใुͱ࠷ऴతʹબ͹Εͨߦಈͷใुͷࠩ
wࡶԻͳ͠Ϟσϧͷ৔߹ɺ
w࠷ద࿹ࣝผΛ໨తͱ͢Δ
regret(T) =
T

t=1
(f(a*) − f(at
))
Δ(T) = f(a*) − f( ̂
a*(T))
̂
a*(T) T
Δ(T) = f(a*) − max
i∈{1,…,T}
f(at
)

5. ظ଴஋ؔ਺ͷΫϥε
w Ϧϓγοπ࿈ଓͳͲ
͸͠͹Β͘ग़ͯ͜ͳ͍ͷͰޙ
ճ͠

6. ϕΠζ࠷దԽ
wΨ΢εաఔΛԾఆͯ͠࠷ద࿹ࣝผ໰୊Λղ͘
w࠷ѱ࣌ͷੑೳʹ͍ͭͯ͸ఘΊͯɺʮฏۉతͳʯέʔεʹ
͓͍ͯΑ͍ੑೳΛ༩͑Δํࡦ
wใुظ଴஋ ͕Ψ΢εաఔʹै͏ͱͯ͠ɺϦάϨοτ
΍୯७ϦάϨοτΛ࠷దԽ͢Δํࡦ
wલఏͱͯ͠Ψ΢εաఔ͕ඞཁͳͷͰઌʹઆ໌͠·͢
f(a)

7. ༧උ஌ࣝ: Ψ΢εաఔ
wࠓ·Ͱͷߦಈ͓Αͼใु Λݩʹɺߦಈ ʹΑͬͯಘΒΕΔ
ใुͷظ଴஋͕ै͏෼෍ ΛٻΊ͍ͨ
wΨ΢εաఔΛ࢖͏͜ͱͰ͜ͷ෼෍͕ܭࢉՄೳ
wҎ߱ͷํࡦ (16\$#ɺτϯϓιϯɺظ଴վળྔ
ͷલఏͱͳΔ
wࢀߟจݙΨ΢εաఔͱػցֶश ػցֶशϓϩϑΣογϣφϧ
γϦʔζ

at
, Xt
a
P[f(a) = x|at
, Xt
]

8. Ψ΢εաఔ: ໨త
w؍ଌ͞Εͨσʔλ͔Βɺະ஌ͷؔ਺ Λਪఆ͍ͨ͠
wೖྗ
wݶΒΕͨσʔλ͔Βͷਪఆ݁Ռ͸ෆਖ਼֬
wˠ৴པ౓Λ֬཰෼෍Ͱද͍ͨ͠
wؔ਺ͦͷ΋ͷͰ͸ͳ͘ɺͦͷ෼෍
ΛٻΊΔ
f(x) : ℝd → ℝ
X = {x1
, ⋯, xn
}, y = {f(x1
), ⋯, f(xn
)}
P[f |X, y]
https://www.ism.ac.jp/~daichi/lectures/H26-GaussianProcess/gp-lecture2-daichi.pdf
؍ଌσʔλ(ेࣈ)Λݩʹ༧ଌ͞Εͨyͷ෼෍ɻ

ظ଴஋͕࣮ઢɺ৴པ͕۠ؒ੨͍ྖҬͰࣔ͞Ε͍ͯΔ

9. Ψ΢εաఔ: ʹ͍ͭͯͷԾఆ
f
w؍ଌσʔλ͔Β ͷ෼෍ΛٻΊΔʹ͸ԿΒ͔ͷԾఆ͕͍Δ
wԾఆ ͸جఈؔ਺ ʹΑΔҰൠԽઢܕϞσϧ
Ͱ͋Γɺࣄલ෼෍ Ͱ͋Δ
w͋Δ ʹର͢Δ ͷ஋͸ɺ
Λ࢖ͬͯ ͱॻ͚Δ
w͜ͷͱ͖ɺ ͸ଟมྔΨ΢ε෼෍ʹै͍ɺ Ͱ͋Δ
wΧʔωϧؔ਺ Λ༻͍Δͱ
ͱॻ͚Δ
f
f ϕ(x) = (ϕ1
(x), …, ϕd
(x)) f(x) = wTϕ(x)
P(w) = (0, α−1I)
X = (x1
, …, xN
)T y = ( f(x1
), …, f(xn
))T
Φ =
ϕ1
(x1
) … ϕd
(x1
)
⋮ ⋱ ⋮
ϕ1
(xN
) … ϕd
(xN
)
y = Φw
y y = (0, α−1ΦΦT)
k(xn
, xm
) = α−1ϕ(xn
)Tϕ(xm
)
y = (0,
k(x1
, x1
) ⋯ k(x1
, xN
)
⋮ ⋱ ⋮
k(xN
, x1
) ⋯ k(xN
, xN
)
)

10. Ψ΢εաఔ: ༧ଌ
w ͕؍ଌ͞Ε͍ͯΔঢ়ଶͰɺ ͷ෼෍Λٻ
Ί͍ͨ
wؔ਺ͷ෼෍ͱ͸Կ͔೚ҙͷ ʹରͯ͠ɺ
ͷಉ࣌෼෍ ͕Θ͔Ε͹ ͷ෼෍͕Θ
͔ͬͨ͜ͱʹͳΔ
wଟมྔΨ΢ε෼෍ͷ৚݅෇͖෼෍ʹΑͬͯٻΊΒΕΔ
w ͱͯ͠ɺ
Ͱ͋Δͱ͖ɺ
ʹͳΔ
wະ஌ͷ ͷ෼෍Λط஌ͷ Ͱදݱ͢Δ͜ͱ͕Ͱ͖ͨ
X = {x1
, ⋯, xn
}T, y = {f(x1
), ⋯, f(xn
)}T f
X* = {x*
1
, …, x*
M
}T
y* = {f(x*
1
), ⋯, f(x*
M
)}T P[y*|X*, X, y] f
K(n, m) = k(xn
, xm
) k*
(n, m) = k(xn
, x*
m
) k**
(n, m) = k(x*
n
, x*
m
)
(
y
y*) ∼
(
0, (
K k*
kT
*
k**
))
P[y*|X*, X, y] = (kT
*
K−1y, k**
− kT
*
K−1k*)
y* X*, X, y

11. ϕΠζ࠷దԽ: Χʔωϧ
wόϯσΟοτຊʹ໭Δ
wΧʔωϧͷબ୒
wਖ਼ఆ஋Χʔωϧؔ਺ ʹରͯ͠ ͱ͢Δͷ͕Ұൠ

w
wεέʔϧύϥϝʔλ ΛͱΔ
wΨ΢εΧʔωϧ
wΨ΢εΧʔωϧͷҰൠԽͰ͋ΔϚλʔϯΧʔωϧ
wઢܗΧʔωϧ Λ࢖༻ͨ͠৔߹ઢܗόϯσΟοτ ষ

ͱಉ౳
g k(a, a′ ) = σ2
0
g(∥a − a′ ∥λ
)
∥a∥λ
=
d

i=1
ai
/λ2
i
λ
g(z) = exp(−z2/2)
k(a, a′ ) = σ2
0
aTa′

12. ࿈ଓ࿹όϯσΟοτͷํࡦ:
GP-UCB
w ͕Ψ΢εաఔʹै͏ͱͨ͠৔߹ɺࠓ·Ͱͷ݁Ռ͔Βߦಈ ͰಘΒΕΔ
ใु ͷฏۉ ͱ෼ࢄ ͕ܭࢉՄೳ
wϕΠζ৴པ۠ؒͷ্ݶ͸ ͱͳΔ
w ৴པ౓
w֤࣌ࠁͰ Λ࠷େԽ͢Δ ΛબͿํࡦ͕(16\$#
wϦάϨοτΛ࠷খԽͤ͞Δͷ͕໨త͕ͩɺ Λେ͖͘औΔͱ୯७Ϧά
Ϩοτͷ࠷খԽʹ΋ରԠՄೳ
w Ψ΢εΧʔωϧ
w ࣍਺ ͷϚλʔϯΧʔωϧ
f a
f(a) μ(a|Xt
) σ(a|Xt
)
¯
μa
(t) = μ(a|Xt
) + αt
σ(a|Xt
)
αt
̂
μa
(t) a
αt
regret(T) = O ( T(log T)d+2
)
regret(T) = O (T ν + d(d + 1)
2ν + d(d + 1) log T) 1 < ν < ∞

13. ࿈ଓ࿹όϯσΟοτͷํࡦ:
τϯϓιϯநग़
wΨ΢εաఔΛલఏͱ͍ͯ͠ΔͨΊɺ ͷ෼෍͕ಘΒΕΔ
wՄೳͳߦಈ Λ཭ࢄԽ͠ɺ༗ݶͷީิ ʹରͯ͠ ͷ஋
ΛαϯϓϦϯά͠ɺ࠷େͱͳΔߦಈΛબͿ
wΨ΢εաఔΛ࢖ͬͨํࡦʹڞ௨͢Δಛ௃ͱͯ͠ɺ ࣍ݩߦྻ
ʹؔ͢Δܭࢉ͕ඞཁࢼߦ͕ଟ͘ͳΔͱܭࢉྔ͕૿͢
w ͷܭࢉྔφΠʔϒʹ΍ͬͯ
wۙࣅ͢Ε͹ݮΒͤ͸͢Δ
w3BUFTPG\$POWFSHFODFGPS4QBSTF7BSJBUJPOBM
(BVTTJBO1SPDFTT3FHSFTTJPO*\$.-#FTUQBQFS
wઢܗόϯσΟοτʹ͓͍ͯ͸͜ͷ໰୊͕ͳ͍͔ΘΓʹɺແݶ
࣍ݩΛѻ͑ͳ͍
f(a)
a′
s
f(a ∈ a′
s
)
t
K−1 O(N3)

14. ࿈ଓ࿹όϯσΟοτͷํࡦ:
ظ଴վળྔํࡦ
wΨ΢εաఔʹ͓͍ͯ୯७ϦάϨοτͷ࠷খԽΛ໨ࢦ͢ํࡦ
w ճͷߦಈͷதͰҰ൪ྑ͔ͬͨใु
w ճͷߦಈޙͷ୯७ϦάϨοτ
w࣌ࠁ5ʹ͓͚Δ࠷దߦಈ
wظ଴վળྔ
wߦಈ ʹΑͬͯɺࠓ·Ͱ؍ଌ͞Εͨ࠷େ஋͔ΒͲΕ͚ͩվળ͞ΕΔ͔
ͷظ଴஋
w ͕࠷େʹͳΔબ୒ࢶΛબͿ
T ̂
f*
T
= max{f(aT
), ̂
f*
T−1
}
T Δ(T) = f(a*) − ̂
f*
T
̂
aT
= argmaxa∈
E[max{f(a), ̂
f*
T−1
}| f(aT−1
)]
= argmaxa∈
E[max{f(a) − ̂
f*
T−1
,0}| f(aT−1
)]
EI(a| f(at
)) = E[max{f(a) − ̂
f*
t
,0}| f(at
)]
a
EI(a| f(at
))

15. ଟ߲ࣜ࣌ؒͰ࣮ߦՄೳͳํࡦ
w͚ͩ͜͜Ψ΢εաఔؔ܎ͳ͍ͷͰޙճ͠

16. ڞ෼ࢄؔ਺ͷύϥϝʔλਪఆ
wΨ΢εաఔΛ࢖͏৔߹ɺڞ෼ࢄؔ਺ʹͲͷΧʔωϧΛ࢖͏͔ɺϋΠ
ύʔύϥϝʔλΛͲ͏͢Δ͔ͱ͍ͬͨબ୒͕ඞཁ
wڞ෼ࢄؔ਺Λܾఆ͢ΔͨΊͷύϥϝʔλ εέʔϧύϥϝʔλɺΧʔ
ωϧͷछྨɺΧʔωϧͷύϥϝʔλɺ؍ଌϊΠζͷ෼ࢄ
Λڞ෼ࢄύ
ϥϝʔλ ͱ͢Δ
w࣌ࠁ Ͱͷ ͷ໬౓͸ɺ ͷ΋ͱͰͷڞ෼ࢄؔ਺ Λ࢖ͬͯ
ͱͳΔ
θ
t θ θ k(θ)
L(θ; Xt
) =
1
(2π)ddet(k(θ)(at
, at
) + σ2Id
)
exp(
1
2
Xt
(k(θ)(at
, at
) + σ2Id
)−1XT
t
)

17. ڞ෼ࢄؔ਺ͷύϥϝʔλਪఆ
w࣌ࠁ ʹ͓͚Δߦಈ Λܾఆ͢Δࡍɺ Λ
༻͍Δํࡦ
wಛʹ ͕খ͍͞͏ͪ͸ɺਅ஋͔Β͔͚཭ΕͨྖҬʹਪఆ஋͕ऩଋ
͢Δ৔߹͕͋Δ
w&*ํࡦͷ৔߹ɺ Ͱ΋࠷ѱ࣌Ͱ͸୯७ϦάϨοτ͕ʹऩଋ
͠ͳ͍
wແݶʹࢼߦͯ͠΋ਖ਼͍͠౴͕͑ಘΒΕͳ͍ʜʜ
wڞ෼ࢄύϥϝʔλʹؔ͢ΔࣄޙฏۉΛͱΔํࡦ
wԿΒ͔ͷείΞؔ਺ Λ࠷େԽ͢Δ Λબͼ͍ͨ৔߹
wࣄલ෼෍ ௨ৗ͸Ұ༷෼෍
Λஔ͖ɺ Λ࠷େ
Խ͢Δ ΛબͿ
t + 1 at+1
̂
θt
= argmaxθ∈Θ
L(θ; Xt
)
t
t → ∞

(a) a
π(θ) Eθ∼π(θ|Xt
)
[uθ
(a; Xt
)]
a

18. ଟ߲ࣜ࣌ؒͰ࣮ߦՄೳͳํࡦ:
ۭؒ෼ׂʹجͮ͘SOOํࡦ
wΨ΢εաఔͰ͸ͳ͍΍ͭ
w؍ଌؔ਺ʹର͢Δଟ߲ࣜ࣌ؒͰ࣮ݱՄೳ͔ͭ୯७Ϧά
Ϩοτͷऩଋ͕อূՄೳͳํࡦ
w͜͜ͰΑ͏΍͘׈Β͔͞ͷ੍໿͕ग़ͯ͘Δ
w
࠷ద఺ ɺ͋Δ ͱ͢΂ͯͷ ʹ͓͍
ͯɺ Λຬͨ͢
a* c, a > 0 a ∈
f(a*) − f(a) ≤ c∥a* − a∥α

19. SOOํࡦ
wՄೳͳߦಈͷۭؒΛ෼ׂ͠ɺͦΕͧΕͷྖҬͷத఺Λߦ
ಈͱͯ͠બ୒
wಘͨใुΛݩʹɺΑͦ͞͏ͳ෦෼ۭؒͷީิΛબ୒͠ɺ
࠶ؼతʹ෼ׂ͍ͯ͘͠
wͳΜͰ͏·͍͘͘ͷ͔ཧղͯ͠ͳ͍ʜʜݩؾ͕͋ͬͨΒ
ޱ಄Ͱ΍Γ·͠ΐ͏