Slide 44
Name | Notation | Intuition | Where Used
State value function | V(s) | How good is state s? | Value-based methods
State-action value function | Q(s,a) | In state s, how good is action a? | Q-Learning, DDPG
Policy | π(s) | What action do we take in state s? | Policy-based methods (but all RL methods have some kind of policy)
Advantage function | A(s,a) | In state s, how much better is action a than the “average” V(s)? | Dueling DQN, Advantage Actor Critic, A3C
Transition prediction function | P(s′,r|s,a) | In state s, if I take action a, what are the expected next state and reward? | Model-based RL
Reward prediction function | R(s,a) | In state s, if I take action a, what is the expected reward? | Model-based RL
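As a rough illustration of how V, Q, π and A relate, here is a minimal tabular sketch. It assumes a stochastic policy written as pi[s][a] (the slide only writes π(s)), and takes the “average” V(s) to be the policy-weighted mean of Q(s,a), so that A(s,a) = Q(s,a) − V(s). The Q values, the policy probabilities and the function names are all made up for illustration.

```python
# Toy tabular example; every number here is invented for illustration.
# Q[s][a]: state-action values for 2 states and 3 actions.
Q = [
    [1.0, 2.0, 3.0],
    [0.0, 4.0, 2.0],
]

# pi[s][a]: probability of choosing action a in state s
# (assumption: a stochastic policy; each row sums to 1).
pi = [
    [0.25, 0.25, 0.5],
    [0.5, 0.5, 0.0],
]


def state_value(Q, pi, s):
    """V(s) = sum over a of pi(a|s) * Q(s,a): how good state s is under pi."""
    return sum(p * q for p, q in zip(pi[s], Q[s]))


def advantage(Q, pi, s, a):
    """A(s,a) = Q(s,a) - V(s): how much better action a is than the average."""
    return Q[s][a] - state_value(Q, pi, s)


def greedy_action(Q, s):
    """A deterministic policy derived from Q: pick the action with the highest value."""
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


if __name__ == "__main__":
    for s in range(len(Q)):
        advantages = [advantage(Q, pi, s, a) for a in range(len(Q[s]))]
        print(f"s={s}  V={state_value(Q, pi, s)}  A={advantages}  greedy={greedy_action(Q, s)}")
```

Under these assumptions the advantages, weighted by the policy, sum to zero in every state, which is one way to read “better than average”.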
Intro to RL+DQN by Robin Chauhan, Pathway Intelligence Inc. 44