Slide 82
SARSA
Initialization:
  Initialize Q(s,a)
  Initialize s0
  Choose initial action a0 from π
Repeat until convergence:
  Execute a
  Observe r and s'
  Choose next action a' from π
  Update Q(s,a) with r and Q(s',a')
  Advance: s,a = s',a' (Q(s,a) now refers to the entry Q(s',a'))
Update rule: $Q(s,a) = Q(s,a) + \alpha \left[ r + \gamma Q(s',a') - Q(s,a) \right]$
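The loop and update rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own code: the policy π is assumed to be ε-greedy, `chain_step` is a hypothetical five-state toy environment, the run is split into episodes that restart at s0, and a fixed number of episodes stands in for a convergence test. The values of `alpha`, `gamma`, and `epsilon` are arbitrary choices.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Assumed policy π: explore with probability epsilon, else act greedily on Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    # Break ties at random so an all-zero table does not lock onto one action.
    return rng.choice([a for a in actions if Q[(s, a)] == best])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Q(s,a) = Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


def chain_step(s, a, n_states=5):
    """Hypothetical toy environment: states 0..n_states-1 on a line, a in {-1, +1}.

    Moving into the rightmost state yields reward 1 and ends the episode.
    """
    s_next = min(max(s + a, 0), n_states - 1)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def sarsa(step, actions, s0, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Initialize Q(s,a); every entry starts at 0
    for _ in range(episodes):  # fixed budget stands in for "until convergence"
        s = s0  # Initialize s0
        a = epsilon_greedy(Q, s, actions, epsilon, rng)  # Choose a0 from π
        done = False
        while not done:
            s_next, r, done = step(s, a)  # Execute a; observe r and s'
            a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)  # Choose a'
            # Terminal-state entries are never updated, so they stay at 0.
            sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma)
            s, a = s_next, a_next  # Advance
    return Q


if __name__ == "__main__":
    Q = sarsa(chain_step, actions=(-1, 1), s0=0)
    for s in range(4):
        print(s, round(Q[(s, -1)], 3), round(Q[(s, 1)], 3))
```

Because a' is drawn from the same policy that will actually be followed, the target uses Q(s',a') rather than a maximum over actions; that is what makes the sketch on-policy.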
Thursday, March 1, 12