• Topmoumoute Online Natural Gradient Algorithm (Le Roux et al. 2007)
• Revisiting Natural Gradient for Deep Networks (Pascanu and Bengio 2013)
• Exact Natural Gradient in Deep Linear Networks and Its Application to the Nonlinear Case (Bernacchia 2018)
• Fisher Information and Natural Gradient Learning in Random Deep Networks (Amari et al. 2019)
• Amari, Shun-Ichi. 1998. “Natural Gradient Works Efficiently in Learning.” Neural Computation 10 (2): 251–76.
• Blair, Charles. 1985. “Problem Complexity and Method Efficiency in Optimization (A. S. Nemirovsky and D. B. Yudin).” SIAM Review 27 (2): 264.
• Mertikopoulos, Panayotis, and Mathias Staudigl. 2018. “On the Convergence of Gradient-Like Flows with Noisy Gradient Input.” SIAM Journal on Optimization 28 (1): 163–97.
• Raskutti, G., and S. Mukherjee. 2015. “The Information Geometry of Mirror Descent.” IEEE Transactions on Information Theory 61 (3): 1451–57.
• Martens, James, and Roger Grosse. 2015. “Optimizing Neural Networks with Kronecker-Factored Approximate Curvature.” In International Conference on Machine Learning, 2408–17. PMLR.
• Kingma, Diederik P., and Jimmy Ba. 2015. “Adam: A Method for Stochastic Optimization.” In 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, edited by Yoshua Bengio and Yann LeCun.
• Kunstner, Frederik, Philipp Hennig, and Lukas Balles. 2019. “Limitations of the Empirical Fisher Approximation for Natural Gradient Descent.” In Advances in Neural Information Processing Systems, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, 32:4156–67. Curran Associates, Inc.
• Mishkin, Aaron, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, and Mohammad Emtiyaz Khan. 2018. “SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient.” In Advances in Neural Information Processing Systems, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 31:6245–55. Curran Associates, Inc.
• Duan, Tony, Anand Avati, Daisy Yi Ding, Khanh K. Thai, Sanjay Basu, Andrew Y. Ng, and Alejandro Schuler. 2019. “NGBoost: Natural Gradient Boosting for Probabilistic Prediction.” arXiv [cs.LG]. http://arxiv.org/abs/1910.03225.
• Karakida, Ryo, and Kazuki Osawa. 2020. “Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks.” arXiv [stat.ML]. http://arxiv.org/abs/2010.00879.
• Roux, Nicolas Le, Pierre-Antoine Manzagol, and Yoshua Bengio. 2008. “Topmoumoute Online Natural Gradient Algorithm.” In Advances in Neural Information Processing Systems, edited by J. Platt, D. Koller, Y. Singer, and S. Roweis, 20:849–56. Curran Associates, Inc.
• Pascanu, Razvan, and Yoshua Bengio. 2013. “Revisiting Natural Gradient for Deep Networks.” arXiv [cs.LG]. http://arxiv.org/abs/1301.3584.
• Bernacchia, Alberto, Mate Lengyel, and Guillaume Hennequin. 2018. “Exact Natural Gradient in Deep Linear Networks and Its Application to the Nonlinear Case.” In Advances in Neural Information Processing Systems, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, 31:5941–50. Curran Associates, Inc.
• Amari, Shun-Ichi, Ryo Karakida, and Masafumi Oizumi. 2019. “Fisher Information and Natural Gradient Learning in Random Deep Networks.” In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), edited by Kamalika Chaudhuri and Masashi Sugiyama, 89:694–702. Proceedings of Machine Learning Research. PMLR.
• Kakade, Sham M. 2002. “A Natural Policy Gradient.” In Advances in Neural Information Processing Systems, edited by T. Dietterich, S. Becker, and Z. Ghahramani, 14:1531–38. MIT Press.
• Knight, Ethan, and Osher Lerner. 2018. “Natural Gradient Deep Q-Learning.” arXiv [cs.LG]. http://arxiv.org/abs/1803.07482.
• Pajarinen, Joni, Hong Linh Thai, Riad Akrour, Jan Peters, and Gerhard Neumann. 2019. “Compatible Natural Gradient Policy Search.” Machine Learning 108 (8): 1443–66.
• Amari, Shun-Ichi, Hyeyoung Park, and Kenji Fukumizu. 2000. “Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons.” Neural Computation 12 (6): 1399–1409.
• Park, H., S. I. Amari, and K. Fukumizu. 2000. “Adaptive Natural Gradient Learning Algorithms for Various Stochastic Models.” Neural Networks 13 (7): 755–64.
• Zhao, Junsheng, and Xingjiang Yu. 2015. “Adaptive Natural Gradient Learning Algorithms for Mackey–Glass Chaotic Time Prediction.” Neurocomputing 157 (June): 41–45.