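reinforcement learning between diverse environmental dynamics [Barekatain et al., IJCAI 2020]

[Figure: given the state $s_t$, the source policies $L = \{\mu_1, \dots, \mu_K\}$ produce the source actions $A_t$. Adaptive aggregation of source policies, $F_{\mathrm{agg}}(s_t; L, \theta_{\mathrm{agg}})$, weights them element-wise ($\theta_{\mathrm{agg}} \odot A_t$). An auxiliary network for predicting residuals, $F_{\mathrm{aux}}(s_t; \theta_{\mathrm{aux}})$, is added to give $F(s_t; L, \theta_{\mathrm{agg}}, \theta_{\mathrm{aux}}) = F_{\mathrm{agg}}(s_t; L, \theta_{\mathrm{agg}}) + F_{\mathrm{aux}}(s_t; \theta_{\mathrm{aux}})$. Continuous action space: $\pi_{\mathrm{target}} \equiv \mathcal{N}\big(F(s_t; L, \theta_{\mathrm{agg}}, \theta_{\mathrm{aux}}), \Sigma\big)$.]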
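A minimal pure-Python sketch of the forward pass in the figure follows. It rests on several assumptions: $A_t$ stacks the $K$ source actions as rows of a $K \times D$ matrix; $\theta_{\mathrm{agg}}$ has the same $K \times D$ shape; the weighted rows are averaged over the $K$ sources (the slide shows only $\odot$ and $+$, so the averaging step is an assumption); $F_{\mathrm{aux}}$ is any caller-supplied function standing in for the residual network; and $\Sigma$ is diagonal. The names `aggregate`, `target_mean` and `sample_action` are hypothetical, chosen for illustration only.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def aggregate(source_actions: Sequence[Vector], theta_agg: Sequence[Vector]) -> Vector:
    """F_agg: weight the K x D matrix A_t element-wise by theta_agg, then average over K (assumed)."""
    k = len(source_actions)
    d = len(source_actions[0])
    return [sum(theta_agg[i][j] * source_actions[i][j] for i in range(k)) / k for j in range(d)]


def target_mean(
    state: Vector,
    source_policies: Sequence[Callable[[Vector], Vector]],
    theta_agg: Sequence[Vector],
    f_aux: Callable[[Vector], Vector],
) -> Vector:
    """F = F_agg + F_aux, the mean of the Gaussian target policy."""
    # Row i of A_t is the action of source policy mu_i at state s_t.
    a_t = [mu(state) for mu in source_policies]
    f_agg = aggregate(a_t, theta_agg)
    # f_aux is a hypothetical stand-in for the auxiliary residual network.
    residual = f_aux(state)
    return [g + r for g, r in zip(f_agg, residual)]


def sample_action(mean: Vector, sigma: Vector, rng: random.Random) -> Vector:
    """Draw a ~ N(mean, Sigma), assuming Sigma is diagonal with standard deviations `sigma`."""
    return [rng.gauss(m, s) for m, s in zip(mean, sigma)]


if __name__ == "__main__":
    mu1 = lambda s: [1.0, 2.0]
    mu2 = lambda s: [3.0, -2.0]
    mean = target_mean([0.0], [mu1, mu2], [[1.0, 0.0], [0.0, 1.0]], lambda s: [0.5, 0.5])
    print(mean)  # [1.0, -0.5]
    print(sample_action(mean, [0.1, 0.1], random.Random(0)))
```

With the weights shown, the first action dimension keeps only $\mu_1$ and the second only $\mu_2$, so the aggregated mean is $[0.5, -1.0]$ before the residual $[0.5, 0.5]$ shifts it to $[1.0, -0.5]$. Keeping aggregation and residual as separate terms means the source policies are used through $\theta_{\mathrm{agg}}$ while $F_{\mathrm{aux}}$ can correct for whatever they collectively get wrong in the target dynamics.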