Slide 31
The role of uncertainty in safety and exploration
Aleatoric uncertainty: agent safety
Epistemic uncertainty: agent exploration in sparse-reward environments
Safety & Exploration: A Comparative Study of Uses of Uncertainty in Reinforcement Learning
DQNRP (Deep Q Network With Randomized Priors) and UADQN
DQNRP: Osband, I., Aslanides, J., and Cassirer, A. Randomized prior functions for deep reinforcement learning, 2018.
UADQN: Clements, W. R., Delft, B. V., Robaglia, B.-M., Slaoui, R. B., and Toth, S. Estimating risk and uncertainty in deep reinforcement learning, 2020.
Quantile regression based distributional RL
High aleatoric uncertainty lowers the action value
$$\tilde{\mu}_{s,a_i} = \mu_{s,a_i} - \lambda\,\sigma_{\text{aleatoric}}; \qquad Q_{s,a_i} \sim \mathcal{N}\left(\tilde{\mu}_{s,a_i},\ \beta^2 \sigma^2_{\text{epistemic}}\right)$$
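A minimal sketch of the risk-averse action selection this slide describes: the mean action value is lowered by a multiple of the aleatoric standard deviation, and a Q value is then sampled around that shifted mean with a spread set by the epistemic standard deviation. The function name `select_action` and the coefficient names `lam` and `beta` are assumptions made for illustration; the per-action means and uncertainties are taken as given inputs rather than produced by a trained network.

```python
import random


def select_action(mu, sigma_aleatoric, sigma_epistemic, lam=0.5, beta=1.0, rng=None):
    """Return the index of the action with the highest sampled Q value.

    mu, sigma_aleatoric, sigma_epistemic: per-action lists of equal length.
    lam:  assumed risk-aversion weight on aleatoric uncertainty.
    beta: assumed scale on the epistemic standard deviation.
    """
    rng = rng or random.Random()
    samples = []
    for m, s_al, s_ep in zip(mu, sigma_aleatoric, sigma_epistemic):
        # Penalise actions whose returns are inherently noisy (safety).
        shifted_mean = m - lam * s_al
        # Sample around the shifted mean; std = beta * sigma_epistemic,
        # i.e. variance = beta^2 * sigma_epistemic^2 (exploration).
        samples.append(rng.gauss(shifted_mean, beta * s_ep))
    return max(range(len(samples)), key=samples.__getitem__)


if __name__ == "__main__":
    # Action 1 has the higher mean but also much higher aleatoric noise.
    mu = [1.0, 1.2]
    sigma_al = [0.0, 1.0]
    sigma_ep = [0.0, 0.0]  # no epistemic spread, so the choice is deterministic
    print(select_action(mu, sigma_al, sigma_ep, lam=0.5))
    print(select_action(mu, sigma_al, sigma_ep, lam=0.0))
```

With `lam` set to zero the rule reduces to plain Thompson-style sampling on the mean, so the risk penalty can be switched off without changing the exploration term.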
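Ensemble method (epistemic uncertainty)
Ensemble members agree in regions with abundant state-action data
Members disagree in regions with no training data
Handles both aleatoric and epistemic uncertainty
Risk aversion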
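The agreement and disagreement behaviour of an ensemble can be illustrated with a toy sketch. Each hypothetical member below simply memorises the targets it was trained on and otherwise falls back to its own fixed random prior value; this is a deliberate simplification for illustration, not the DQNRP architecture. The names `make_member` and `epistemic_std` are assumptions.

```python
import random
import statistics


def make_member(train_data, rng):
    """Build one toy ensemble member.

    train_data: dict mapping (state, action) -> target Q value.
    Seen pairs return the trained target; unseen pairs return a random
    prior value that is fixed per member once drawn.
    """
    prior = {}

    def q(state, action):
        key = (state, action)
        if key in train_data:
            return train_data[key]
        if key not in prior:
            prior[key] = rng.gauss(0.0, 1.0)
        return prior[key]

    return q


def epistemic_std(members, state, action):
    """Spread of the members' estimates, used as an epistemic-uncertainty proxy."""
    return statistics.pstdev([m(state, action) for m in members])


if __name__ == "__main__":
    rng = random.Random(0)
    train_data = {(0, 0): 1.0}
    members = [make_member(train_data, rng) for _ in range(5)]
    print(epistemic_std(members, 0, 0))  # seen pair: members agree
    print(epistemic_std(members, 3, 1))  # unseen pair: members disagree
```

The spread is zero where all members were trained on the same target and positive where each member relies on its own prior, which is the signal an agent can use to direct exploration.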