Loss of main and auxiliary tasks

MPRG Work Document, November 29, 2018

[Figure: network architecture. Labels: Environment, Replay Buffer, Conv, FC, LSTM, Last reward, Last action, DeConv (V, Adv); heads: main task, Pixel Control, Value Function Replay, Reward Prediction, Auxiliary Selection]

UNREAL loss:

$L_{UNREAL} = L_{main} + \sum_c L_Q^{(c)} + L_{VR} + L_{RP}$

UNREAL loss with Auxiliary Selection coefficients $C_{PC}, C_{VR}, C_{RP}$:

$L_{UNREAL} = L_{main} + C_{PC} \sum_c L_Q^{(c)} + C_{VR} L_{VR} + C_{RP} L_{RP}$

Auxiliary Selection loss:

$L_{AS} = L(\pi_{AS}) + L(V_{AS})$

Auxiliary Selection outputs:

$(C_{PC}, C_{VR}, C_{RP}) = (\{0, 1\}, \{0, 1\}, \{0, 1\})$, that is, $(C_{PC}, C_{VR}, C_{RP}) = (0, 0, 0) \sim (1, 1, 1)$

Example: $(C_{PC}, C_{VR}, C_{RP}) = (0, 1, 1)$

• Multiply Auxiliary Selection outputs and loss of auxiliary tasks
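
The gating of the auxiliary losses by the Auxiliary Selection outputs can be sketched in a few lines of Python. This is only an illustration of the formula L_UNREAL = L_main + C_PC * sum_c L_Q^(c) + C_VR * L_VR + C_RP * L_RP, not the implementation used in this work. The function name `unreal_as_loss`, its parameter names, and the numeric loss values are all assumptions made for the example.

```python
from itertools import product


def unreal_as_loss(l_main, l_pc_cells, l_vr, l_rp, selection):
    """Combine the main loss with auxiliary losses gated by (C_PC, C_VR, C_RP).

    Hypothetical helper: names and signature are assumptions, not from the slide.
    """
    if any(c not in (0, 1) for c in selection):
        raise ValueError("each selection coefficient must be 0 or 1")
    c_pc, c_vr, c_rp = selection
    l_pc = sum(l_pc_cells)  # sum over cells c of L_Q^(c) (pixel control)
    return l_main + c_pc * l_pc + c_vr * l_vr + c_rp * l_rp


# All eight possible selections, from (0, 0, 0) to (1, 1, 1).
ALL_SELECTIONS = list(product((0, 1), repeat=3))

# Made-up loss values, for illustration only.
loss = unreal_as_loss(1.0, [0.25, 0.25], 0.5, 2.0, selection=(0, 1, 1))
print(loss)  # pixel control is switched off, so this is 1.0 + 0.5 + 2.0
```

With the selection (0, 1, 1), the pixel control term drops out while value function replay and reward prediction still contribute. Iterating over `ALL_SELECTIONS` covers every combination the Auxiliary Selection output can take.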