Slide 5
Key Concepts of Reinforcement Learning
Model-based
▶ Know the model of the environment, e.g., AlphaZero [Silver et al., 2017]
▶ Learn the model of the environment, e.g., Dyna [Sutton, 1991]
▶ Example: Q value iteration:
$$Q_{k+1}(s, a) = \sum_{s'} p(s' \mid s, a)\left(r(s, a) + \gamma \max_{a'} Q_k(s', a')\right)$$
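As a rough illustration of the Q value iteration update, here is a minimal sketch on a tiny MDP. The two-state, two-action environment, the names `P`, `R`, `GAMMA`, and `q_value_iteration`, and the stopping tolerance are all assumptions made up for this example; they do not come from the slides.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state) pairs, i.e. p(s'|s, a).
# R[s][a] is the reward r(s, a).
P = {
    0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 1.0},
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Repeat the model-based backup until Q changes by less than tol."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(max_iters):
        # Q_{k+1}(s,a) = sum_{s'} p(s'|s,a) * (r(s,a) + gamma * max_{a'} Q_k(s',a'))
        Q_next = {
            s: {
                a: sum(
                    p * (R[s][a] + gamma * max(Q[s_next].values()))
                    for p, s_next in P[s][a]
                )
                for a in P[s]
            }
            for s in P
        }
        delta = max(abs(Q_next[s][a] - Q[s][a]) for s in P for a in P[s])
        Q = Q_next
        if delta < tol:
            break
    return Q


Q_star = q_value_iteration(P, R, GAMMA)
for s in Q_star:
    print(s, Q_star[s])
```

In this toy MDP, staying in state 1 with action 1 earns reward 1 forever, so Q(1, 1) should approach 1 / (1 − 0.9) = 10. Note that the update needs the full transition model `P`, which is what makes it model-based.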
Model-free
▶ Solve for the policy directly, without using or learning a model of the environment
▶ Example: Q learning:
$$Q_{k+1}(s, a) = Q_k(s, a) + \alpha\left[r(s, a) + \gamma \max_{a'} Q_k(s', a') - Q_k(s, a)\right]$$
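The Q learning update can be sketched in the same spirit. The toy environment, the uniformly random behavior policy, the constant step size, and the names `step`, `q_learning_update`, and `q_learning` are assumptions for illustration only. The learner never reads the transition table directly; it only sees sampled transitions (s, a, r, s′) returned by `step`.

```python
import random

# Hypothetical 2-state, 2-action environment, invented for illustration.
# The tables below are hidden inside step(); the learner only sees samples.
P = {
    0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 1.0},
}


def step(s, a, rng):
    """Sample one transition: return (reward, next_state)."""
    probs, next_states = zip(*P[s][a])
    s_next = rng.choices(next_states, weights=probs)[0]
    return R[s][a], s_next


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])


def q_learning(n_steps=20_000, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    s = 0
    for _ in range(n_steps):
        a = rng.choice((0, 1))  # uniformly random exploration (assumed)
        r, s_next = step(s, a, rng)
        q_learning_update(Q, s, a, r, s_next, alpha, gamma)
        s = s_next
    return Q


Q_learned = q_learning()
for s in Q_learned:
    print(s, Q_learned[s])
```

With a constant step size the estimates keep fluctuating around the optimal Q values rather than converging exactly, but the greedy action in each state should match the one found by Q value iteration on the same environment.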
Zhao, UW WPO/SPO March 2024 5 / 30