Slide 30
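24. Transfer reinforcement learning that adapts to unknown environments
• Aggregate pre-trained source policies with a weighted sum to adapt quickly to an unknown environment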
MULTIPOLAR: Multi-source policy aggregation for transfer
reinforcement learning between diverse environmental dynamics
[Barekatain et al., IJCAI 2020]
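[Figure: MULTIPOLAR architecture]
State: $s_t$
Source policies: $L = \{\mu_1, \dots, \mu_K\}$, whose outputs at $s_t$ are collected in $A_t$
Adaptive aggregation of source policies: $F_{\text{agg}}(s_t; L, \theta_{\text{agg}})$, which weights $A_t$ element-wise ($\odot$) by $\theta_{\text{agg}}$
Auxiliary network for predicting residuals: $F_{\text{aux}}(s_t; \theta_{\text{aux}})$
Combined output: $F(s_t; L, \theta_{\text{agg}}, \theta_{\text{aux}}) = F_{\text{agg}}(s_t; L, \theta_{\text{agg}}) + F_{\text{aux}}(s_t; \theta_{\text{aux}})$
Continuous action space: $\pi_{\text{target}} \equiv \mathcal{N}\big(F(s_t; L, \theta_{\text{agg}}, \theta_{\text{aux}}), \Sigma\big)$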
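The aggregation shown on this slide can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the linear form of the auxiliary network, the 1/K averaging over sources, and the diagonal covariance are all assumptions made for illustration. Only the overall structure (element-wise weighting of source actions, plus a residual term, used as the mean of a Gaussian policy) is taken from the slide.

```python
import random


def aggregate(source_actions, theta_agg):
    """F_agg: weight each source action element-wise by theta_agg, then
    average over the K sources (the 1/K factor is an assumption here)."""
    k = len(source_actions)
    dim = len(source_actions[0])
    return [
        sum(theta_agg[i][j] * source_actions[i][j] for i in range(k)) / k
        for j in range(dim)
    ]


def linear_aux(state, theta_aux):
    """F_aux: hypothetical linear residual predictor, with one weight row
    per action dimension (the slide does not specify the network form)."""
    return [sum(w * s for w, s in zip(row, state)) for row in theta_aux]


def target_mean(state, source_policies, theta_agg, theta_aux):
    """F = F_agg + F_aux, the mean of the target policy."""
    actions = [mu(state) for mu in source_policies]  # rows of A_t
    agg = aggregate(actions, theta_agg)
    aux = linear_aux(state, theta_aux)
    return [a + r for a, r in zip(agg, aux)]


def sample_action(mean, std, rng=random):
    """Sample from N(mean, diag(std^2)); a diagonal Sigma is assumed."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


# Toy example: two hypothetical deterministic source policies, 2-D actions.
source_policies = [
    lambda s: [s[0] + s[1], 1.0],
    lambda s: [s[0] - s[1], -1.0],
]
state = [2.0, 1.0]
theta_agg = [[1.0, 1.0], [1.0, 1.0]]  # all-ones weights: plain average
theta_aux = [[0.0, 0.0], [0.0, 0.0]]  # zero residual
mean = target_mean(state, source_policies, theta_agg, theta_aux)
action = sample_action(mean, [0.1, 0.1])
```

In this sketch only `theta_agg` and `theta_aux` would be trained in the target environment, while the source policies stay fixed. With all-ones weights and a zero residual, the mean reduces to the plain average of the source actions.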