Slide 11
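\[
\begin{aligned}
Q^{\pi}(s, a) &\doteq \mathbb{E}_{\pi}[G_t \mid S_t = s, A_t = a] = \mathbb{E}_{\pi}[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \\
&= \sum_{s'} p(s' \mid s, a)\, \mathbb{E}_{\pi}[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] \\
&= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma\, \mathbb{E}_{\pi}[G_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'] \right) \\
&= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma V^{\pi}(s') \right) \\
&= \sum_{s'} p(s' \mid s, a) \left( r(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a') \right)
\end{aligned}
\]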
Bellman equation for Qπ(s, a)
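The Bellman equation for Qπ(s, a) can be checked numerically by applying its right-hand side repeatedly until Q stops changing. The following is a minimal sketch, not part of the slide: the 2-state, 2-action MDP, its transition probabilities, rewards, policy, the discount factor 0.9, and the names `bellman_backup` and `evaluate_q` are all assumptions made up for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
GAMMA = 0.9          # discount factor (assumed)
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a][s2] = p(s2 | s, a); each inner list sums to 1
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[s][a][s2] = r(s, a, s2)
R = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [0.0, 0.5]],
]
# PI[s][a] = pi(a | s); each row sums to 1
PI = [[0.5, 0.5], [0.3, 0.7]]


def bellman_backup(q):
    """Apply the right-hand side of the Bellman equation for Q^pi once."""
    new_q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for s in STATES:
        for a in ACTIONS:
            total = 0.0
            for s2 in STATES:
                # V^pi(s') = sum over a' of pi(a'|s') * Q^pi(s', a')
                v_next = sum(PI[s2][a2] * q[s2][a2] for a2 in ACTIONS)
                total += P[s][a][s2] * (R[s][a][s2] + GAMMA * v_next)
            new_q[s][a] = total
    return new_q


def evaluate_q(tol=1e-10, max_iters=10_000):
    """Iterate the backup from Q = 0 until the largest change is below tol."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(max_iters):
        new_q = bellman_backup(q)
        delta = max(abs(new_q[s][a] - q[s][a]) for s in STATES for a in ACTIONS)
        q = new_q
        if delta < tol:
            break
    return q


q_pi = evaluate_q()
for s in STATES:
    for a in ACTIONS:
        print(f"Q(s={s}, a={a}) = {q_pi[s][a]:.4f}")
```

Because the discount factor is below 1, the backup is a contraction, so the iteration converges to the unique Q that satisfies the equation. Applying `bellman_backup` to the returned table should leave it unchanged up to the tolerance.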