# Transport information Hessian distances

We formulate closed-form Hessian distances of information entropies in one-dimensional probability density space embedded with the L2-Wasserstein metric. Some analytical examples are provided.

July 23, 2021

## Transcript

1. ### Transport information Hessian distances

Wuchen Li, University of South Carolina. Divergence Statistics, Geometric Science of Information.

3. ### Examples in Euclidean space

Given $X, Y \in \mathbb{R}_+$, consider a divergence function between them, $D\colon \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$. Several examples are given below. Squared Euclidean distance: $D(X\|Y) = (X - Y)^2$; Kullback-Leibler (KL) divergence: $D(X\|Y) = X \log\frac{X}{Y}$; squared Hellinger distance: $D(X\|Y) = 4(\sqrt{X} - \sqrt{Y})^2$.

We briefly review them in $L^2$ space, and we plan to build their counterparts in optimal transport (Wasserstein) space.
4. ### KL divergence

One important example of a divergence functional is the KL divergence:

$$D_{\mathrm{KL}}(p\|q) = \int_\Omega p(x) \log\frac{p(x)}{q(x)}\,dx.$$

The KL divergence has several useful properties. Nonsymmetry: $D_{\mathrm{KL}}(p\|q) \neq D_{\mathrm{KL}}(q\|p)$ in general; separability; convexity in both variables $p$ and $q$.
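The nonsymmetry can be checked numerically. The following is a minimal sketch, not part of the slides: it assumes $\Omega = [0, 1]$ and a hypothetical pair of densities $p(x) = 2x$, $q(x) = 3x^2$, and approximates the integral with a midpoint rule. The helper name `kl_divergence` is illustrative.

```python
import math

def kl_divergence(p, q, n=100_000):
    """Midpoint-rule approximation of int_0^1 p(x) log(p(x)/q(x)) dx."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n  # midpoint of the i-th cell, never exactly 0
        total += p(x) * math.log(p(x) / q(x))
    return total / n

# Hypothetical densities on Omega = [0, 1] (an assumption for illustration).
def p(x):
    return 2.0 * x

def q(x):
    return 3.0 * x * x

kl_pq = kl_divergence(p, q)  # by hand: log(2/3) + 1/2
kl_qp = kl_divergence(q, p)  # by hand: log(3/2) - 1/3
print(kl_pq, kl_qp)
```

Both values are positive but differ, which is the nonsymmetry stated above.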
5. ### Hessian distance

In particular, there is a Hessian metric for the KL divergence. Observe that, for a perturbation $\dot q$ with zero total mass,

$$D_{\mathrm{KL}}(q + \dot q\,\|\,q) = \tfrac{1}{2}\, g_q(\dot q, \dot q) + o\big(\|\dot q\|_{L^2}^2\big),$$

where

$$g_q(\dot q, \dot q) = \int_\Omega \frac{|\dot q(x)|^2}{q(x)}\,dx$$

is the Hessian operator of the negative entropy $\int_\Omega q(x) \log q(x)\,dx$ in $L^2$ space. Here $g_q(\cdot, \cdot)$ is a Hessian metric, also known as the Fisher-Rao information metric.
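6. ### Hessian distance

The Hessian metric of the KL divergence induces the distance function below.

$$
\begin{aligned}
D_H(p, q)^2 &= \inf_{\gamma\colon [0,1]\times\Omega\to\mathbb{R}} \Big\{ \int_0^1 g_{\gamma_t}(\partial_t \gamma_t, \partial_t \gamma_t)\,dt \colon \gamma_0 = p,\ \gamma_1 = q \Big\} \\
&= \inf_{\gamma\colon [0,1]\times\Omega\to\mathbb{R}} \Big\{ \int_0^1\!\!\int_\Omega \frac{|\partial_t \gamma(t, x)|^2}{\gamma(t, x)}\,dx\,dt \colon \gamma_0 = p,\ \gamma_1 = q \Big\} \\
&= \inf_{\gamma\colon [0,1]\times\Omega\to\mathbb{R}} \Big\{ \int_0^1\!\!\int_\Omega \big|2\,\partial_t \sqrt{\gamma(t, x)}\big|^2\,dx\,dt \colon \gamma_0 = p,\ \gamma_1 = q \Big\} \\
&= 4 \int_\Omega \big(\sqrt{p(x)} - \sqrt{q(x)}\big)^2\,dx.
\end{aligned}
$$

Here $D_H$ is named the Hellinger distance.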
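The variational formula can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the author's code: it takes $\Omega = [0, 1]$ and a hypothetical pair $p(x) = 2x$, $q(x) = 3x^2$, and compares the energy of two candidate paths from $p$ to $q$. One is the straight line in the $\sqrt{\gamma}$ coordinate, the other the straight line in $\gamma$ itself. As in the infimum on the slide, the paths are not required to preserve total mass.

```python
import math

# Hypothetical densities on Omega = [0, 1] (an assumption for illustration).
def p(x):
    return 2.0 * x

def q(x):
    return 3.0 * x * x

def sqrt_path(t, x):
    """Straight line between sqrt(p) and sqrt(q), squared; equals p at t=0 and q at t=1."""
    return ((1.0 - t) * math.sqrt(p(x)) + t * math.sqrt(q(x))) ** 2

def linear_path(t, x):
    """Straight line between p and q; equals p at t=0 and q at t=1."""
    return (1.0 - t) * p(x) + t * q(x)

def path_energy(gamma, n_t=200, n_x=200, eps=1e-4):
    """Midpoint-rule approximation of int_0^1 int_0^1 |d/dt gamma|^2 / gamma dx dt."""
    total = 0.0
    for i in range(n_t):
        t = (i + 0.5) / n_t
        for j in range(n_x):
            x = (j + 0.5) / n_x
            # Central difference in t (exact up to rounding for these two paths).
            dgamma = (gamma(t + eps, x) - gamma(t - eps, x)) / (2.0 * eps)
            total += dgamma * dgamma / gamma(t, x)
    return total / (n_t * n_x)

def hellinger_sq(n_x=200):
    """Midpoint-rule approximation of 4 int_0^1 (sqrt(p) - sqrt(q))^2 dx."""
    total = 0.0
    for j in range(n_x):
        x = (j + 0.5) / n_x
        total += (math.sqrt(p(x)) - math.sqrt(q(x))) ** 2
    return 4.0 * total / n_x

closed_form = hellinger_sq()
energy_sqrt = path_energy(sqrt_path)
energy_linear = path_energy(linear_path)
print(closed_form, energy_sqrt, energy_linear)
```

The square-root path reproduces the closed-form value, while the linear path has strictly larger energy, consistent with the infimum being attained along the straight line in $\sqrt{\gamma}$.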
7. ### Optimal transport

What is the optimal way to move or transport a mountain with shape $X$ and density $q(x)$ to another shape $Y$ with density $p(y)$? That is,

$$\mathrm{Dist}_T(p, q)^2 = \inf_{T\colon \Omega\to\Omega} \Big\{ \int_\Omega \|T(x) - x\|^2 q(x)\,dx \colon T_\# q = p \Big\}.$$

The problem was first introduced by Monge in 1781 and relaxed by Kantorovich in the 1940s. It induces a metric on the set of probability densities, known as the optimal transport distance, Wasserstein metric, or Earth Mover’s distance (Ambrosio, Gangbo, McCann, Benamou, Brenier, Villani, Otto, Figalli, et al.). Optimal transport distances have since proved useful in inference problems and inverse problems (Poggio, Peyré, Yunan, Engquist, Arjovsky, Osher, et al.).
8. ### Goals

We plan to design Hessian distances of information entropies in Wasserstein space.

Natural questions: (i) What are Hessian distances in Wasserstein space? (ii) What is the “Hellinger” distance in Wasserstein space?

Related studies: Amari, Karakida, Oizumi, Cuturi; Guo, Hong, Yang; Leonard Wong, Yang, Zhang; Ay, Felice.
9. ### Optimal transport distance

In a one-dimensional sample space, the optimal transport distance has the following closed-form formulations.

$$\mathrm{Dist}_T(p, q)^2 = \int_\Omega |T(x) - x|^2 q(x)\,dx,$$

where $T$ is a monotone mapping function such that $p(T(x))\,T'(x) = q(x)$. With the change of variables $y = F_q(x)$, for which $T(x) = F_p^{-1}(y)$,

$$\mathrm{Dist}_T(p, q)^2 = \int_0^1 |F_p^{-1}(y) - F_q^{-1}(y)|^2\,dy,$$

where $F_p$, $F_q$ are the cumulative distribution functions of $p$, $q$, respectively. From now on, we call $F_p^{-1}$ the transport coordinates.
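The two formulations can be compared on a concrete pair. This is a minimal sketch under assumptions that are not in the slides: $\Omega = [0, 1]$, hypothetical densities $p(x) = 2x$ and $q(x) = 3x^2$, and a midpoint rule for both integrals. For this pair $F_p^{-1}(y) = y^{1/2}$, $F_q^{-1}(y) = y^{1/3}$ and $T(x) = F_p^{-1}(F_q(x)) = x^{3/2}$, which satisfies $p(T(x))\,T'(x) = q(x)$.

```python
def wasserstein_sq_quantile(inv_cdf_p, inv_cdf_q, n=100_000):
    """Midpoint-rule approximation of int_0^1 |F_p^{-1}(y) - F_q^{-1}(y)|^2 dy."""
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        total += (inv_cdf_p(y) - inv_cdf_q(y)) ** 2
    return total / n

def wasserstein_sq_mapping(transport_map, q, n=100_000):
    """Midpoint-rule approximation of int_0^1 |T(x) - x|^2 q(x) dx."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += (transport_map(x) - x) ** 2 * q(x)
    return total / n

# Hypothetical pair on Omega = [0, 1]: p(x) = 2x, q(x) = 3x^2.
def inv_cdf_p(y):
    return y ** 0.5          # inverse of F_p(x) = x^2

def inv_cdf_q(y):
    return y ** (1.0 / 3.0)  # inverse of F_q(x) = x^3

def q(x):
    return 3.0 * x * x

def transport_map(x):
    return x ** 1.5          # T = F_p^{-1} composed with F_q

w2_quantile = wasserstein_sq_quantile(inv_cdf_p, inv_cdf_q)
w2_mapping = wasserstein_sq_mapping(transport_map, q)
print(w2_quantile, w2_mapping)  # both integrals evaluate by hand to 1/110
```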
10. ### Hessian metric of Entropy in optimal transport space

Consider the $f$-entropy

$$\mathcal{F}(p) = \int_\Omega f(p(x))\,dx.$$

The Hessian metric of the $f$-entropy in optimal transport space satisfies

$$g^T_p(\dot p, \dot p) = \int_\Omega f''(p(x))\,|\nabla^2 \phi(x)|^2\,p(x)^2\,dx,$$

where $\dot p = -\nabla \cdot (p \nabla \phi)$.
11. ### Transport Hessian distances

Denote a one-dimensional function $h\colon \Omega \to \mathbb{R}$ by

$$h(y) = \int_1^y \sqrt{f''\Big(\frac{1}{z}\Big)}\,\frac{1}{z^{3/2}}\,dz.$$

Theorem. The squared transport Hessian distance of the $f$-entropy has the following formulations.

(i) Inverse CDF formulation:

$$\mathrm{Dist}_{TH}(p, q)^2 = \int_0^1 \big| h(\nabla_y F_p^{-1}(y)) - h(\nabla_y F_q^{-1}(y)) \big|^2\,dy.$$

(ii) Mapping formulation:

$$\mathrm{Dist}_{TH}(p, q)^2 = \int_\Omega \Big| h\Big(\frac{\nabla_x T(x)}{q(x)}\Big) - h\Big(\frac{1}{q(x)}\Big) \Big|^2 q(x)\,dx,$$

where $T$ is an optimal transport mapping function such that $T_\# q = p$ and $T(x) = F_p^{-1}(F_q(x))$.
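The inverse CDF formulation lends itself to a direct numerical sketch. The code below is an illustration under stated assumptions, not the author's implementation: it evaluates $h$ by a midpoint rule on $[1, y]$ and the outer integral by a midpoint rule on $[0, 1]$. The function names, the choice $f''(u) = 1/u$ (that is, $f(u) = u \log u$) and the densities $p(x) = 2x$, $q(x) = 3x^2$ on $\Omega = [0, 1]$ are all assumptions made for the example.

```python
import math

def h(y, f_second, m=200):
    """Midpoint-rule approximation of h(y) = int_1^y sqrt(f''(1/z)) z^(-3/2) dz."""
    dz = (y - 1.0) / m  # signed step, so y < 1 yields a negative value
    total = 0.0
    for i in range(m):
        z = 1.0 + (i + 0.5) * dz
        total += math.sqrt(f_second(1.0 / z)) * z ** -1.5
    return total * dz

def transport_hessian_dist_sq(f_second, dquantile_p, dquantile_q, n=1000):
    """Formulation (i): int_0^1 |h(dF_p^{-1}/dy) - h(dF_q^{-1}/dy)|^2 dy."""
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        diff = h(dquantile_p(y), f_second) - h(dquantile_q(y), f_second)
        total += diff * diff
    return total / n

# Hypothetical example (assumptions): f(u) = u log u, p(x) = 2x, q(x) = 3x^2.
def f_second(u):
    return 1.0 / u

def dquantile_p(y):
    return 0.5 * y ** -0.5               # derivative of F_p^{-1}(y) = y^(1/2)

def dquantile_q(y):
    return y ** (-2.0 / 3.0) / 3.0       # derivative of F_q^{-1}(y) = y^(1/3)

dist_sq = transport_hessian_dist_sq(f_second, dquantile_p, dquantile_q)
print(dist_sq)  # by hand: log(3/2)^2 - log(3/2)/3 + 1/18
```

Only differences of $h$ enter the distance, so the lower limit of the integral defining $h$ has no effect on the result.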
12. ### Transport Hellinger distances

If $f(p) = p \log p$, then $h(z) = \log z$. Hence

$$\mathrm{Dist}_{TH}(p, q)^2 = \int_\Omega |\log \nabla_x T(x)|^2 q(x)\,dx = \int_0^1 \big| \log \nabla_y F_p^{-1}(y) - \log \nabla_y F_q^{-1}(y) \big|^2\,dy.$$

In short, the transport Hellinger distance is the distance induced by the Hessian metric of entropy in Wasserstein space.
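As a worked illustration that is not taken from the slides, consider two Gaussian densities on $\Omega = \mathbb{R}$, $p = \mathcal{N}(m_p, \sigma_p^2)$ and $q = \mathcal{N}(m_q, \sigma_q^2)$. The monotone transport map between them is affine, so the mapping formula above can be evaluated by hand:

$$
T(x) = m_p + \frac{\sigma_p}{\sigma_q}(x - m_q), \qquad \nabla_x T(x) = \frac{\sigma_p}{\sigma_q},
$$

$$
\mathrm{Dist}_{TH}(p, q)^2 = \int_{\mathbb{R}} \Big| \log\frac{\sigma_p}{\sigma_q} \Big|^2 q(x)\,dx = (\log \sigma_p - \log \sigma_q)^2.
$$

For the same pair, $\mathrm{Dist}_T(p, q)^2 = (m_p - m_q)^2 + (\sigma_p - \sigma_q)^2$, so in this example the transport Hellinger distance depends only on the ratio of the standard deviations and not on the means.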
13. ### One Dimension: TKL vs KL divergence

Similarly, we can extend the study of transport Hessian distances to transport Bregman divergences.

Transport KL divergence:

$$D_{\mathrm{TKL}}(p\|q) := \int_0^1 \Big( \frac{\nabla_y F_p^{-1}(y)}{\nabla_y F_q^{-1}(y)} - \log\frac{\nabla_y F_p^{-1}(y)}{\nabla_y F_q^{-1}(y)} - 1 \Big)\,dy.$$

KL divergence:

$$D_{\mathrm{KL}}(p\|q) = \int_\Omega \nabla_x F_p(x) \log\frac{\nabla_x F_p(x)}{\nabla_x F_q(x)}\,dx.$$

Here $F_p(x) = \int^x p(s)\,ds$ and $F_q(x) = \int^x q(s)\,ds$ are the cumulative distribution functions of the probability densities $p$, $q$, respectively.
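The transport KL divergence can be evaluated directly from the derivatives of the quantile functions. The following is a minimal sketch, assuming $\Omega = [0, 1]$ and a hypothetical pair $p(x) = 2x$, $q(x) = 3x^2$ that does not appear in the slides; the helper name `transport_kl` is illustrative and the integral is approximated with a midpoint rule.

```python
import math

def transport_kl(dquantile_p, dquantile_q, n=100_000):
    """Midpoint-rule approximation of int_0^1 (r - log r - 1) dy,
    where r(y) is the ratio of the derivatives of F_p^{-1} and F_q^{-1}."""
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        ratio = dquantile_p(y) / dquantile_q(y)
        total += ratio - math.log(ratio) - 1.0
    return total / n

# Hypothetical pair on Omega = [0, 1]: p(x) = 2x, q(x) = 3x^2.
def dquantile_p(y):
    return 0.5 * y ** -0.5          # derivative of F_p^{-1}(y) = y^(1/2)

def dquantile_q(y):
    return y ** (-2.0 / 3.0) / 3.0  # derivative of F_q^{-1}(y) = y^(1/3)

tkl_pq = transport_kl(dquantile_p, dquantile_q)  # by hand: 19/42 - log(3/2)
tkl_qp = transport_kl(dquantile_q, dquantile_p)  # by hand: log(3/2) - 11/30
print(tkl_pq, tkl_qp)
```

The integrand $r - \log r - 1$ is nonnegative and vanishes only at $r = 1$, so both values are positive; for this pair the two directions give different values.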