Recently, optimal transport has found many applications in machine learning. In this talk, we introduce dynamical optimal transport on machine learning models. We propose to study these models as Riemannian manifolds equipped with the Wasserstein metric, a framework we call Wasserstein information geometry. Various developments, especially the Fokker-Planck equation and mean-field games on learning models, will be introduced. The entropy production of Shannon entropy in AI models will be established. Numerical examples, including restricted Boltzmann machines and generative adversarial networks, will be presented.
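As a brief sketch of the connection between the Fokker-Planck equation and the Wasserstein metric (this formula does not appear in the abstract itself; it is the standard Jordan-Kinderlehrer-Otto formulation, with potential $V$ and inverse temperature $\beta$ as generic symbols), the Fokker-Planck equation arises as the gradient flow of a free energy with respect to the Wasserstein metric:

```latex
\partial_t \rho
  = \nabla \cdot \big( \rho \, \nabla V \big) + \beta^{-1} \Delta \rho
  = \nabla \cdot \Big( \rho \, \nabla \frac{\delta \mathcal{F}}{\delta \rho} \Big),
\qquad
\mathcal{F}(\rho) = \int V \rho \, dx + \beta^{-1} \int \rho \log \rho \, dx.
```

Along this flow the free energy $\mathcal{F}$ decreases at a rate given by the relative Fisher information, $\frac{d}{dt}\mathcal{F}(\rho_t) = -\int \rho_t \big| \nabla \frac{\delta \mathcal{F}}{\delta \rho} \big|^2 dx \le 0$, a dissipation identity of the kind the abstract's "entropy production" result concerns.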