Slide 20
Slide 20 text
$L_{UNREAL} = L_{main} + \sum_c L_Q^{(c)} + L_{VR} + L_{RP}$
$L_{UNREAL} = L_{main} + C_{PC} \sum_c L_Q^{(c)} + C_{VR} L_{VR} + C_{RP} L_{RP}$
Loss of main and auxiliary tasks
[Figure: network architecture with five branches: main task, Pixel Control, Value Function Replay, Reward Prediction, and Auxiliary Selection. Block labels: Environment, Replay Buffer, Conv, FC, LSTM, Last reward, Last action, DeConv, V, Adv. Outputs: V(s) and π(a|s) for the main task; V_AS(s) and π_AS(a|s) for Auxiliary Selection.]
$C_{PC}, C_{VR}, C_{RP}$ (4)
$(C_{PC}, C_{VR}, C_{RP}) = (\{0, 1\}, \{0, 1\}, \{0, 1\})$ (5)
$(C_{PC}, C_{VR}, C_{RP}) = (0, 0, 0)$ to $(1, 1, 1)$ (6)
$L_{AS} = L(\pi_{AS}) + L(V_{AS})$ (7)
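Since each coefficient is either 0 or 1, the set of possible selections can be listed exhaustively. The following is a minimal sketch of that enumeration; the names `TASKS` and `all_selections` are hypothetical and do not come from the slides.

```python
from itertools import product

# Hypothetical labels for the three binary coefficients.
TASKS = ("C_PC", "C_VR", "C_RP")


def all_selections():
    """Return every binary combination (C_PC, C_VR, C_RP),
    ordered from (0, 0, 0) to (1, 1, 1)."""
    return list(product((0, 1), repeat=len(TASKS)))


selections = all_selections()
for sel in selections:
    # Print each combination with its coefficient names.
    print(dict(zip(TASKS, sel)))
```

With three binary coefficients there are 2^3 = 8 combinations, which is the range from (0, 0, 0) to (1, 1, 1).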
• Multiply the loss of each auxiliary task by the corresponding Auxiliary Selection output
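The multiplication can be sketched as a small function. This is an illustration under stated assumptions, not the authors' implementation: losses are treated as plain floats, the function and argument names are hypothetical, and the numeric values are made up for the example.

```python
def unreal_loss(l_main, l_pc_cells, l_vr, l_rp, selection):
    """Combine the main loss with the gated auxiliary losses.

    l_pc_cells: per-cell Pixel Control losses L_Q^(c), summed over c.
    selection:  (C_PC, C_VR, C_RP), each 0 or 1 (Auxiliary Selection output).
    """
    c_pc, c_vr, c_rp = selection
    return l_main + c_pc * sum(l_pc_cells) + c_vr * l_vr + c_rp * l_rp


# Made-up loss values, chosen only to show the gating.
loss = unreal_loss(
    l_main=1.0,
    l_pc_cells=[0.25, 0.25],
    l_vr=0.5,
    l_rp=2.0,
    selection=(1, 0, 1),  # Value Function Replay switched off
)
print(loss)
```

A coefficient of 0 removes that auxiliary task's loss from the total, so the task contributes no gradient for that update; a coefficient of 1 leaves the term unchanged.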
MPRG Work Document November 29, 2018
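Example selection: $\{C_{PC}, C_{VR}, C_{RP}\} = \{0, 1, 1\}$
Auxiliary loss terms: $\sum_c L_Q^{(c)}$ (Pixel Control), $L_{VR}$ (Value Function Replay), $L_{RP}$ (Reward Prediction)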
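As a worked instance of the weighted loss, assume the selection $\{C_{PC}, C_{VR}, C_{RP}\} = \{0, 1, 1\}$ is substituted into the coefficient form of $L_{UNREAL}$. The substitution is an illustration and is not written out on the slide:

```latex
\begin{align*}
L_{UNREAL} &= L_{main} + 0 \cdot \sum_c L_Q^{(c)} + 1 \cdot L_{VR} + 1 \cdot L_{RP} \\
           &= L_{main} + L_{VR} + L_{RP}
\end{align*}
```

Under this selection, Pixel Control is switched off and the other two auxiliary losses pass through unchanged.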