Recently, optimal transport has found many applications in machine learning. In this talk, we introduce dynamical optimal transport on machine learning models. We propose to study these models as Riemannian manifolds equipped with the Wasserstein metric, a framework we call Wasserstein information geometry. Several developments, in particular the Fokker-Planck equation on learning models, will be introduced, and the entropy production of Shannon entropy will be established. Numerical examples, including restricted Boltzmann machines and generative adversarial networks, will be presented.
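As background for this viewpoint, a standard fact in the density-manifold setting (stated here for densities on $\mathbb{R}^n$, not for the talk's specific models) is that the Fokker-Planck equation is the Wasserstein gradient flow of a free energy, and the dissipation of that free energy along the flow gives the entropy production:

```latex
% Fokker-Planck equation as Wasserstein gradient flow of the free energy
% F(rho) = \int V rho dx + beta^{-1} \int rho log rho dx
\partial_t \rho
  = \nabla\cdot(\rho\nabla V) + \beta^{-1}\Delta\rho
  = \nabla\cdot\Big(\rho\,\nabla\frac{\delta \mathcal{F}}{\delta\rho}\Big),
\qquad
\mathcal{F}(\rho) = \int V\rho\,dx + \beta^{-1}\int \rho\log\rho\,dx,

% Entropy (free-energy) production along the flow:
\frac{d}{dt}\,\mathcal{F}(\rho_t)
  = -\int \rho_t\,\Big|\nabla\frac{\delta \mathcal{F}}{\delta\rho}\Big|^2 dx
  \;\le\; 0.
```

Here $\delta\mathcal{F}/\delta\rho = V + \beta^{-1}(\log\rho + 1)$, so $\rho\,\nabla(\delta\mathcal{F}/\delta\rho) = \rho\nabla V + \beta^{-1}\nabla\rho$, recovering the drift and diffusion terms. The talk's setting replaces the density manifold over $\mathbb{R}^n$ with the parameter space of a learning model equipped with the Wasserstein metric.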