Slide 60
Name | Notation | Intuition | Where used
Policy | π(s) (π* is optimal) | What action do we take in state s? | Policy-based methods (but all RL methods have some kind of policy)
State value function | V^π(s) | How good is state s? (using policy π) | Value-based methods
State-action value function | Q^π(s,a) | In state s, how good is action a? (using policy π) | Q-Learning, DDPG
Advantage function | A^π(s,a) = Q^π(s,a) - V^π(s) | In state s, how much better is action a than the "overall" V^π(s)? (using policy π) | Dueling DQN, Advantage Actor-Critic, A3C
Transition prediction function | P(s′,r|s,a) | In state s, if I take action a, what are the expected next state and reward? | Model-based RL
Reward prediction function | R(s,a) | In state s, if I take action a, what is the expected reward? | Model-based RL
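To make the relationships between these quantities concrete, here is a minimal NumPy sketch on a made-up 2-state, 2-action MDP (the transition probabilities, rewards, policy, and discount factor are illustrative values, not from the deck). It factors the slide's joint P(s′,r|s,a) into a state-transition model P(s′|s,a) and an expected-reward model R(s,a), evaluates a fixed policy π exactly, and then derives V^π, Q^π, and A^π:

import numpy as np

gamma = 0.9  # discount factor (assumed for this example)

# Transition prediction function: P[s, a, s'] = probability of
# landing in state s' after taking action a in state s.
P = np.array([
    [[0.8, 0.2],   # s=0, a=0
     [0.1, 0.9]],  # s=0, a=1
    [[0.5, 0.5],   # s=1, a=0
     [0.0, 1.0]],  # s=1, a=1
])

# Reward prediction function: R[s, a] = expected immediate reward.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

# A fixed stochastic policy: pi[s, a] = pi(a|s).
pi = np.array([
    [0.5, 0.5],
    [0.2, 0.8],
])

# Exact policy evaluation: solve V^pi = r^pi + gamma * P^pi V^pi,
# where r^pi(s) = sum_a pi(a|s) R(s,a)
# and   P^pi(s,s') = sum_a pi(a|s) P(s'|s,a).
r_pi = (pi * R).sum(axis=1)               # shape (S,)
P_pi = np.einsum("sa,sat->st", pi, P)     # shape (S, S)
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# State-action values via the Bellman equation:
# Q^pi(s,a) = R(s,a) + gamma * sum_{s'} P(s'|s,a) V^pi(s').
Q = R + gamma * np.einsum("sat,t->sa", P, V)

# Advantage: how much better is action a than pi's average behavior in s?
A = Q - V[:, None]

print("V^pi:", V)
print("Q^pi:\n", Q)
print("A^pi:\n", A)

A quick sanity check on the output: the π-weighted average of each row of A^π is zero, because averaging Q^π(s,a) over π(a|s) recovers V^π(s). Positive entries of A^π mark actions that beat the policy's "overall" value in that state, which is exactly the signal Dueling DQN and advantage actor-critic methods exploit.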