Yale University Press, 1959.
• Tamar, A., Di Castro, D., and Mannor, S. Policy gradients with variance related risk criteria. In ICML, 2012.
• Prashanth, L. and Ghavamzadeh, M. Actor-critic algorithms for risk-sensitive MDPs. In NeurIPS, 2013.
• Xie, T., Liu, B., Xu, Y., Ghavamzadeh, M., Chow, Y., Lyu, D., and Yoon, D. A block coordinate ascent algorithm for mean-variance optimization. In NeurIPS, 2018.