Information Newton flow: second order method in probability space

Wuchen Li
February 13, 2020


Markov chain Monte Carlo (MCMC) methods nowadays play essential roles in machine learning, Bayesian sampling, and inverse problems. To accelerate MCMC methods, we formulate a high-order optimization framework, which can be viewed as Newton's flows in probability space with information metrics, named information Newton's flows. Two information metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. Several examples of information Newton's flows for learning objective/loss functions are provided, including the Kullback-Leibler (KL) divergence, maximum mean discrepancy (MMD), and cross-entropy. Asymptotic convergence results for the proposed Newton's methods are established. It is a known fact that classical MCMC methods, such as overdamped Langevin dynamics, correspond to Wasserstein gradient flows of the KL divergence. Extending this fact to Wasserstein Newton's flows of the KL divergence, we derive Newton's Langevin dynamics. We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families. For the numerical implementation, we design sampling-efficient variational methods to approximate Wasserstein Newton's directions. Numerical examples in Gaussian families and Bayesian logistic regression demonstrate the effectiveness of the proposed method.
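To illustrate the known fact referenced above, here is a minimal sketch of overdamped Langevin dynamics, the classical MCMC method whose evolving particle density follows the Wasserstein gradient flow of KL(rho || pi). The Euler-Maruyama discretization, the Gaussian target N(2, 1), and the step size and particle count are illustrative choices, not taken from the paper.

```python
import numpy as np

def langevin_sample(grad_log_pi, x0, step=0.01, n_steps=5000, rng=None):
    """Euler-Maruyama discretization of dX_t = grad log pi(X_t) dt + sqrt(2) dW_t.

    Running many independent particles approximates the Wasserstein
    gradient flow of KL(rho || pi) at the level of the particle density.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Drift toward high-density regions of pi, plus diffusion.
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * noise
    return x

# Illustrative target pi = N(2, 1), so grad log pi(x) = -(x - 2).
mu = 2.0
samples = langevin_sample(lambda x: -(x - mu), x0=np.zeros(2000), rng=0)
print(samples.mean(), samples.std())  # approximately mu and 1
```

The Newton's Langevin dynamics proposed in the paper accelerate this first-order flow by additionally applying (an approximation of) the inverse Wasserstein Hessian of the KL divergence to the drift; the sketch above only shows the gradient-flow baseline.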
