2013. n Tamer and Castro, Policy Gradients with Variance Related Risk Criteria, NeurIPS 2012. n Xie, Liu, Xu, Ghavamzadeh, Chow, and Lyu, A Block Coordinate Algorithm for Mean- Variance Optimazation. NeurIPS 2018 n Bisi, Sabbioni, Vittori, Papini, and Restelli, Risk-Averse Trust Region Optimization for Reward-Volatility Reduction, AAAI 2020. n Zhang, Liu, and Whiteson, Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning, arXiv 2020. n 森村哲郎「強化学習」 n 強化学習の学習アルゴリズムの分類 https://note.com/npaka/n/n5a6bc4825555 41