
Transport information Hessian distances

Wuchen Li
July 23, 2021


We formulate closed-form Hessian distances of information entropies in one-dimensional probability density space embedded with the L2-Wasserstein metric. Some analytical examples are provided.




  1. Transport information Hessian distances
     Wuchen Li, University of South Carolina
     Divergence Statistics, Geometric Science of Information.
  2. History of Statistical Divergences

  3. Examples in Euclidean space
     Given X, Y ∈ R_+, consider a divergence function between them, D: R_+ × R_+ → R_+. Several examples are given below.
     - Squared Euclidean distance: D(X‖Y) = (X − Y)²;
     - Kullback-Leibler (KL) divergence: D(X‖Y) = X log(X/Y);
     - Squared Hellinger distance: D(X‖Y) = 4(√X − √Y)².
     We briefly review them in L² space, and we plan to build their counterparts in optimal transport (Wasserstein) space.
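The three scalar divergences above can be made concrete in a few lines of Python; a minimal sketch (the function names are ours):

```python
import math

def sq_euclidean(x, y):
    # Squared Euclidean distance: D(X||Y) = (X - Y)^2
    return (x - y) ** 2

def kl(x, y):
    # Scalar KL divergence on R_+: D(X||Y) = X log(X/Y)
    return x * math.log(x / y)

def sq_hellinger(x, y):
    # Squared Hellinger distance: D(X||Y) = 4 (sqrt(X) - sqrt(Y))^2
    return 4.0 * (math.sqrt(x) - math.sqrt(y)) ** 2

# All three vanish at X = Y; only the KL divergence is asymmetric.
print(sq_euclidean(2.0, 2.0), kl(2.0, 2.0), sq_hellinger(2.0, 2.0))  # 0.0 0.0 0.0
print(kl(1.0, 2.0), kl(2.0, 1.0))  # two different values
```

Evaluating `kl` in both orders already exhibits the nonsymmetry discussed on the next slides.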
  4. KL divergence
     One important example of a divergence functional is the KL divergence:
     D_KL(p‖q) = ∫_Ω p(x) log(p(x)/q(x)) dx.
     The KL divergence has several notable properties:
     - Nonsymmetry: D_KL(p‖q) ≠ D_KL(q‖p);
     - Separability;
     - Convexity in both variables p and q.
  5. Hessian distance
     In particular, there is a Hessian metric for the KL divergence. Observe that
     D_KL(q + q̇ ‖ q) = (1/2) g_q(q̇, q̇) + o(‖q̇‖²_{L²}),
     where
     g_q(q̇, q̇) = ∫_Ω |q̇(x)|²/q(x) dx
     represents the Hessian operator of the negative entropy ∫_Ω q(x) log q(x) dx in L² space. Here g_q(·, ·) is a Hessian metric, also named the Fisher-Rao information metric.
  6. Hessian distance
     The Hessian metric of the KL divergence induces the following distance function:
     D_H(p, q)² = inf_{γ: [0,1]×Ω→R} { ∫_0^1 g_{γ_t}(∂_t γ_t, ∂_t γ_t) dt : γ_0 = p, γ_1 = q }
                = inf_γ { ∫_0^1 ∫_Ω |∂_t γ(t,x)|²/γ(t,x) dx dt : γ_0 = p, γ_1 = q }
                = inf_γ { ∫_0^1 ∫_Ω |2 ∂_t √γ(t,x)|² dx dt : γ_0 = p, γ_1 = q }
                = 4 ∫_Ω (√p(x) − √q(x))² dx.
     Here D_H is named the Hellinger distance.
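The closed form on the last line is easy to verify numerically. The sketch below (our own illustration, not from the deck) discretizes two Gaussian densities on a grid and compares 4∫(√p − √q)² dx against the exact value obtained from the Gaussian Bhattacharyya coefficient:

```python
import numpy as np

# Grid discretization of two Gaussian densities on a wide interval.
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0
p = np.exp(-((x - m1) ** 2) / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
q = np.exp(-((x - m2) ** 2) / (2 * s2**2)) / (s2 * np.sqrt(2 * np.pi))

# Closed form from the slide: D_H(p,q)^2 = 4 * \int (sqrt(p) - sqrt(q))^2 dx
dh2_grid = 4.0 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx

# Exact Gaussian Bhattacharyya coefficient:
# \int sqrt(p q) dx = sqrt(2 s1 s2/(s1^2+s2^2)) * exp(-(m1-m2)^2 / (4 (s1^2+s2^2)))
bc = np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * np.exp(-((m1 - m2) ** 2) / (4 * (s1**2 + s2**2)))
dh2_exact = 4.0 * (2.0 - 2.0 * bc)  # since \int (sqrt(p)-sqrt(q))^2 = 2 - 2*bc
print(dh2_grid, dh2_exact)
```

The two printed numbers agree to high accuracy, confirming the factor-4 normalization used throughout the deck.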
  7. Optimal transport
     What is the optimal way to move or transport a mountain with shape X and density q(x) to another shape Y with density p(y)? I.e.
     Dist_T(p, q)² = inf_{T: Ω→Ω} { ∫_Ω ‖T(x) − x‖² q(x) dx : T_# q = p }.
     The problem was first introduced by Monge in 1781 and relaxed by Kantorovich in 1940. It induces a metric on the set of probability densities, named the optimal transport distance, Wasserstein metric, or Earth Mover's distance (Ambrosio, Gangbo, McCann, Benamou, Brenier, Villani, Otto, Figalli, et al.). Nowadays, optimal transport distances have been shown to be useful in inference problems and inverse problems (Poggio, Peyré, Yunan Yang, Engquist, Arjovsky, Osher, et al.).
  8. Goals
     We plan to design Hessian distances of information entropies in Wasserstein space.
     Natural questions:
     (i) What are Hessian distances in Wasserstein space?
     (ii) What is the "Hellinger" distance in Wasserstein space?
     Related studies: Amari, Karakida, Oizumi, Cuturi; Guo, Hong, Yang; Leonard Wong, Yang, Zhang; Ay, Felice.
  9. Optimal transport distance
     In one-dimensional sample space, the optimal transport distance has the following closed-form formulations:
     Dist_T(p, q)² = ∫_Ω |T(x) − x|² q(x) dx,
     where T is a monotone mapping function such that p(T(x)) T′(x) = q(x). By some calculations,
     Dist_T(p, q)² = ∫_0^1 |F_p^{-1}(y) − F_q^{-1}(y)|² dy,
     where F_p, F_q are the cumulative distribution functions of p, q, respectively. From now on, we call F_p^{-1} the transport coordinates.
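The inverse-CDF formulation can be checked directly. A small sketch (our own example, using two Gaussians, for which Dist_T² = (m₁ − m₂)² + (σ₁ − σ₂)² is known in closed form):

```python
import numpy as np
from statistics import NormalDist

# In 1D: Dist_T(p,q)^2 = \int_0^1 |F_p^{-1}(y) - F_q^{-1}(y)|^2 dy.
p_dist = NormalDist(mu=0.0, sigma=1.0)
q_dist = NormalDist(mu=1.5, sigma=2.0)

# Midpoint rule on (0,1), avoiding the quantile blow-up at y = 0 and y = 1.
n = 50000
y = (np.arange(n) + 0.5) / n
fp_inv = np.array([p_dist.inv_cdf(t) for t in y])
fq_inv = np.array([q_dist.inv_cdf(t) for t in y])
w2_sq = np.mean((fp_inv - fq_inv) ** 2)

print(w2_sq)  # ~ (0 - 1.5)^2 + (1 - 2)^2 = 3.25
```

Since F_p^{-1}(y) = m + σΦ^{-1}(y) for a Gaussian, the integrand is |Δm + Δσ·Φ^{-1}(y)|², and the integral reduces to Δm² + Δσ², matching the numerical value.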
  10. Hessian metric of entropy in optimal transport space
      Consider the f-entropy
      F(p) = ∫_Ω f(p(x)) dx.
      The Hessian metric of the f-entropy in optimal transport space satisfies
      g_p^T(ṗ, ṗ) = ∫_Ω f″(p) |∇²φ|² p(x)² dx,
      where ṗ = −∇·(p∇φ).
  11. Transport Hessian distances
      Denote a one-dimensional function h: Ω → R by
      h(y) = ∫_y^1 √(f″(1/z)) (1/z)^{3/2} dz.
      Theorem. The squared transport Hessian distance of the f-entropy has the following formulations.
      (i) Inverse CDF formulation:
      Dist_TH(p, q)² = ∫_0^1 |h(∇_y F_p^{-1}(y)) − h(∇_y F_q^{-1}(y))|² dy.
      (ii) Mapping formulation:
      Dist_TH(p, q)² = ∫_Ω |h(∇_x T(x)/q(x)) − h(1/q(x))|² q(x) dx,
      where T is an optimal transport mapping function such that T_# q = p, i.e. T(x) = F_p^{-1}(F_q(x)).
  12. Transport Hellinger distances
      If f(p) = p log p, then h(z) = −log z. Hence
      Dist_TH(p, q)² = ∫_Ω |log ∇_x T(x)|² q(x) dx
                     = ∫_0^1 |log ∇_y F_p^{-1}(y) − log ∇_y F_q^{-1}(y)|² dy.
      In short, the transport Hellinger distance is the distance induced by the Hessian metric of entropy in Wasserstein space.
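For Gaussians this distance has a particularly clean value. Since F_p^{-1}(y) = m₁ + σ₁Φ^{-1}(y), the quantile derivative is σ₁/φ(Φ^{-1}(y)), so the log-difference in the formula above is the constant log(σ₁/σ₂), giving Dist_TH(p, q)² = (log(σ₁/σ₂))². The sketch below (our own worked example, with quantile derivatives taken by finite differences) checks this:

```python
import numpy as np
from statistics import NormalDist

# For p = N(m1, s1^2): d/dy F_p^{-1}(y) = s1 / phi(Phi^{-1}(y)), so the
# log-difference of the two quantile derivatives is the constant log(s1/s2)
# and Dist_TH(p,q)^2 = (log(s1/s2))^2 for any pair of Gaussians.
s1, s2 = 1.0, 2.0
p_dist = NormalDist(0.0, s1)
q_dist = NormalDist(3.0, s2)

n = 2000
y = (np.arange(n) + 0.5) / n
eps = 1e-6  # central finite differences for the quantile derivatives
dFp = np.array([(p_dist.inv_cdf(t + eps) - p_dist.inv_cdf(t - eps)) / (2 * eps) for t in y])
dFq = np.array([(q_dist.inv_cdf(t + eps) - q_dist.inv_cdf(t - eps)) / (2 * eps) for t in y])
dist_th_sq = np.mean((np.log(dFp) - np.log(dFq)) ** 2)
print(dist_th_sq)  # ~ (log(1/2))^2 ≈ 0.4805
```

Note that the means drop out entirely: the transport Hellinger distance between two Gaussians depends only on the ratio of their standard deviations.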
  13. One dimension: TKL vs. KL divergence
      Similarly, we can extend the study of transport Hessian distances to transport Bregman divergences.
      Transport KL divergence:
      D_TKL(p‖q) := ∫_0^1 [ ∇_y F_p^{-1}(y)/∇_y F_q^{-1}(y) − log(∇_y F_p^{-1}(y)/∇_y F_q^{-1}(y)) − 1 ] dy.
      KL divergence:
      D_KL(p‖q) = ∫_Ω ∇_x F_p(x) log(∇_x F_p(x)/∇_x F_q(x)) dx.
      Here F_p(x) = ∫^x p(s) ds and F_q(x) = ∫^x q(s) ds are the cumulative distribution functions of the probability densities p and q, respectively.