Slide 12
Appendix - DQN and DDPG
DQN
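Approximate the true action value with a neural network: $Q(s, a) \approx Q_\theta(s, a)$
Objective function:
$\mathbb{E}_{(s, a, r, s') \sim \mathcal{B}} \left[ \left( r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a) \right)^2 \right]$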
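A minimal sketch of the DQN objective as a mean squared TD error over a batch of transitions. The names `dqn_loss`, `q`, and `q_table` are illustrative assumptions, not from the slide; a lookup table stands in for the neural network, and common practical additions such as a separate target network and terminal-state handling are left out.

```python
def dqn_loss(batch, q, actions, gamma=0.99):
    """Mean of (r + gamma * max_a' Q(s', a') - Q(s, a))^2 over the batch.

    batch   : list of transitions (s, a, r, s_next) sampled from the buffer
    q       : callable q(s, a), a stand-in for the network Q_theta
    actions : finite set of actions, needed to evaluate the max
    """
    total = 0.0
    for s, a, r, s_next in batch:
        # Bootstrapped target: reward plus discounted best next-state value.
        target = r + gamma * max(q(s_next, a_next) for a_next in actions)
        total += (target - q(s, a)) ** 2
    return total / len(batch)


# Toy stand-in for Q_theta: two states, two actions (hypothetical values).
q_table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.0, (1, 1): 3.0}
q = lambda s, a: q_table[(s, a)]

batch = [(0, 1, 1.0, 1), (1, 0, 0.0, 0)]
loss = dqn_loss(batch, q, actions=[0, 1], gamma=0.5)
print(loss)
```

The explicit loop over `actions` is what makes the max cheap here: it only works because the action set is finite.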
DDPG
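Approximate the action value when the action is continuous
If $a$ is continuous, the $\max_{a'}$ in the DQN target is difficult to compute
Actor: $\mu_\phi(s)$
Critic: $Q_\theta(s, a)$
Objective function:
$\mathbb{E}_{(s, a, r, s') \sim \mathcal{B}} \left[ \left( r + \gamma Q_\theta(s', \mu_\phi(s')) - Q_\theta(s, a) \right)^2 \right]$ for $\theta$
$\mathbb{E}_{s \sim \mathcal{B}} \left[ -Q_\theta(s, \mu_\phi(s)) \right]$ for $\phi$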
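A minimal sketch of the two DDPG objectives, assuming the actor output replaces the max over actions in the target. The names `critic_loss`, `actor_loss`, `q`, and `mu` are illustrative assumptions; the closed-form `q` and `mu` below are toy stand-ins for the critic and actor networks, and target networks and exploration noise are left out.

```python
def critic_loss(batch, q, mu, gamma=0.99):
    """Mean of (r + gamma * Q(s', mu(s')) - Q(s, a))^2; minimized over the critic."""
    total = 0.0
    for s, a, r, s_next in batch:
        # The actor's action replaces the max over a', so no search is needed.
        target = r + gamma * q(s_next, mu(s_next))
        total += (target - q(s, a)) ** 2
    return total / len(batch)


def actor_loss(batch, q, mu):
    """Mean of -Q(s, mu(s)); minimized over the actor with the critic held fixed."""
    return -sum(q(s, mu(s)) for s, _, _, _ in batch) / len(batch)


# Toy stand-ins (hypothetical): the critic prefers a == s, the actor outputs s / 2.
q = lambda s, a: -(a - s) ** 2
mu = lambda s: 0.5 * s

batch = [(1.0, 0.0, 1.0, 2.0), (2.0, 2.0, 0.0, 0.0)]
c_loss = critic_loss(batch, q, mu, gamma=0.5)
a_loss = actor_loss(batch, q, mu)
print(c_loss, a_loss)
```

Keeping the two losses as separate functions mirrors the split on the slide: one is differentiated with respect to the critic parameters, the other with respect to the actor parameters.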