Uses of Uncertainty in Reinforcement Learning

DQNRP (Deep Q-Network with Randomized Priors)
Ensemble method (epistemic uncertainty)
Ensemble members agree in regions with abundant state-action data
Ensemble members disagree in regions with no training data

UADQN
Distributional RL based on quantile regression
Handles both aleatoric and epistemic uncertainty
Risk aversion: high aleatoric uncertainty lowers the action value

\tilde{\mu}_{s,a_i} = \mu_{s,a_i} - \lambda \sigma_{\mathrm{aleatoric}} ; \quad Q_{s,a_i} \sim \mathcal{N}\left( \tilde{\mu}_{s,a_i}, \beta^2 \sigma^2_{\mathrm{epistemic}} \right)

DQNRP: Osband, I., Aslanides, J., and Cassirer, A. Randomized prior functions for deep reinforcement learning, 2018.
UADQN: Clements, W. R., Delft, B. V., Robaglia, B.-M., Slaoui, R. B., and Toth, S. Estimating risk and uncertainty in deep reinforcement learning, 2020.
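The two ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the default values of `lam` and `beta`, and the choice to sample each action independently (a diagonal covariance) are all assumptions made for the example. Epistemic uncertainty is approximated here by the standard deviation of Q-values across ensemble members, and the action is chosen by sampling around a mean that has been lowered in proportion to the aleatoric uncertainty.

```python
import numpy as np


def ensemble_epistemic_std(member_q_values):
    """Disagreement between ensemble members, per action.

    member_q_values has shape (n_members, n_actions). Members that agree
    (data-rich regions) give a value near 0; members that disagree
    (regions without training data) give a large value.
    """
    return np.std(np.asarray(member_q_values, dtype=float), axis=0)


def risk_adjusted_means(mu, sigma_aleatoric, lam):
    """Lower each action value in proportion to its aleatoric uncertainty."""
    mu = np.asarray(mu, dtype=float)
    sigma_aleatoric = np.asarray(sigma_aleatoric, dtype=float)
    return mu - lam * sigma_aleatoric


def select_action(mu, sigma_aleatoric, sigma_epistemic,
                  lam=0.5, beta=1.0, rng=None):
    """Sample one Q-value per action and act greedily on the samples.

    Assumption: actions are sampled independently, each from a normal
    distribution with mean mu_tilde and standard deviation
    beta * sigma_epistemic.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu_tilde = risk_adjusted_means(mu, sigma_aleatoric, lam)
    scale = beta * np.asarray(sigma_epistemic, dtype=float)
    q_sample = rng.normal(loc=mu_tilde, scale=scale)
    return int(np.argmax(q_sample)), mu_tilde


if __name__ == "__main__":
    # Hypothetical numbers: action 1 has the higher mean but is much noisier.
    mu = [1.0, 1.2]
    sigma_aleatoric = [0.1, 1.0]
    members = [[1.0, 1.1], [1.0, 1.3]]  # two ensemble members, two actions
    sigma_epistemic = ensemble_epistemic_std(members)
    action, mu_tilde = select_action(mu, sigma_aleatoric, sigma_epistemic,
                                     rng=np.random.default_rng(0))
    print("risk-adjusted means:", mu_tilde)
    print("chosen action:", action)
```

With a larger `lam` the agent avoids actions whose returns are inherently noisy, while a larger `beta` widens the sampling distribution and so encourages exploration of actions the ensemble disagrees on.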