Free Wasserstein manifold

Wuchen Li, University of South Carolina
Free probability seminar, UC Berkeley
Based on joint work with David Jekel (UCSD) and Dimitri Shlyakhtenko (UCLA).

Entropy, Fisher information and transportation
In recent years, there has been active work connecting entropy, Fisher information, and transportation. These connections now have applications to functional inequalities and to the dynamical behavior of fluid equations. In this talk, we discuss transportation theory in free probability by studying "log-densities" for non-commutative random variables.

Metric in probability space

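Brownian motion and heat equations
Consider the process in $\mathbb{R}^d$ driven by a standard Brownian motion $B_t$ through $dX_t = \sqrt{2}\,dB_t$. Let $\rho(t,x)$ denote the probability density function of $X_t$. Then $\rho$ satisfies the heat equation
$$\frac{\partial \rho_t}{\partial t} = \nabla\cdot(\nabla\rho_t) = \Delta\rho_t,$$
where $\rho_t = \rho(t,x)$, and $\nabla\cdot$, $\nabla$ are the divergence and gradient operators in $\mathbb{R}^d$, respectively.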
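
As a quick sanity check of the heat equation above, the following sketch tests the one-dimensional case $d = 1$, where $X_t$ started at $0$ has the Gaussian density $N(0, 2t)$. The function names and the finite-difference step are illustrative choices, not part of the talk.

```python
import math

def heat_kernel(t, x):
    """Density of X_t = sqrt(2) * B_t started at 0 in one dimension, i.e. N(0, 2t)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def heat_residual(t, x, h=1e-3):
    """Central-difference estimate of (d/dt rho - d^2/dx^2 rho) at (t, x)."""
    d_dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    d2_dx2 = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x)
              + heat_kernel(t, x - h)) / (h * h)
    return d_dt - d2_dx2

for t, x in [(0.5, 0.0), (1.0, 0.7), (2.0, -1.3)]:
    # The residual should vanish up to discretization error.
    print(t, x, heat_residual(t, x))
```

The residual is only zero up to the $O(h^2)$ error of the central differences.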

Entropy dissipation
Consider the negative Boltzmann-Shannon entropy
$$H(\rho) = \int_{\mathbb{R}^d} \rho(x)\log\rho(x)\,dx.$$
Along the heat equation, the dissipation relation holds:
$$\frac{d}{dt}H(\rho_t) = -\int_{\mathbb{R}^d} \|\nabla_x \log\rho_t\|^2\rho_t\,dx = -I(\rho_t),$$
where $I(\rho)$ is the Fisher information functional. There is a formulation behind this relation: $\rho(t,x)$ is a gradient flow of the entropy in optimal transport space.
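
The dissipation relation can be checked numerically in the simplest case. The sketch below assumes $d = 1$ and the heat-kernel solution $\rho_t = N(0, 2t)$; the quadrature window, grid size, and helper names are illustrative assumptions.

```python
import math

def rho(t, x):
    """Heat-kernel density N(0, 2t) in one dimension."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def integrate(g, lo=-30.0, hi=30.0, n=6000):
    """Composite trapezoid rule on [lo, hi]."""
    step = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * step)
    return total * step

def entropy(t):
    """H(rho_t) = int rho log rho dx (the negative Boltzmann-Shannon entropy)."""
    def g(x):
        r = rho(t, x)
        return r * math.log(r) if r > 0.0 else 0.0
    return integrate(g)

def fisher(t):
    """I(rho_t) = int |d/dx log rho|^2 rho dx, using d/dx log rho = -x / (2t)."""
    return integrate(lambda x: (x / (2.0 * t)) ** 2 * rho(t, x))

t, h = 1.0, 1e-4
dH_dt = (entropy(t + h) - entropy(t - h)) / (2.0 * h)
print(dH_dt, -fisher(t))  # the two numbers should agree closely
```

For this Gaussian family both sides equal $-1/(2t)$ in closed form, which gives an independent check on the quadrature.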

Optimal transport
What is the optimal way to move or transport a mountain with shape $X$ and density $\rho_0(x)$ to another shape $Y$ with density $\rho_1(y)$? Consider
$$\mathrm{Dist}_T(\rho_0,\rho_1) = \inf_T \int_{\mathbb{R}^d} \|x - T(x)\|^2\rho_0(x)\,dx,$$
where the infimum is over all transport maps $T$ that transfer $\rho_0$ to $\rho_1$, i.e. $\rho_0(x) = \rho_1(T(x))\det(\nabla T(x))$.
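
For intuition, here is a minimal sketch of the Monge problem in one dimension. It relies on the standard fact that on the real line the optimal map is the monotone rearrangement $T = F_1^{-1}\circ F_0$, so the optimal cost equals $\int_0^1 (F_0^{-1}(u) - F_1^{-1}(u))^2\,du$. The choice of two Gaussians and the midpoint rule are illustrative assumptions.

```python
from statistics import NormalDist

def transport_cost_1d(mu0, mu1, n=20_000):
    """Midpoint-rule approximation of int_0^1 (F0^{-1}(u) - F1^{-1}(u))^2 du."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n  # midpoints avoid the endpoints u = 0 and u = 1
        total += (mu0.inv_cdf(u) - mu1.inv_cdf(u)) ** 2
    return total / n

mu0 = NormalDist(mu=0.0, sigma=1.0)
mu1 = NormalDist(mu=2.0, sigma=3.0)

# For Gaussians the monotone map is affine, T(x) = m1 + (s1 / s0) * (x - m0),
# and the cost has the closed form (m0 - m1)^2 + (s0 - s1)^2.
closed_form = (mu0.mean - mu1.mean) ** 2 + (mu0.stdev - mu1.stdev) ** 2
print(transport_cost_1d(mu0, mu1), closed_form)
```

The numerical value is slightly below the closed form because the midpoint rule truncates the Gaussian tails.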

Overview
The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich in 1942. It introduces a distance on the space of probability distributions, named the optimal transport distance, Wasserstein distance, or Earth Mover's distance. There are many viewpoints on and applications of this distance:
- Linear programming;
- Mapping/Monge-Ampère equation;
- Fluid dynamics;
- Density manifold (Arnold mechanics).
See Ambrosio, Villani, Otto, Gangbo et al., and many others. In the first part of this talk, I focus on the transportation formulation in classical probability. Then David will present the formulation for free probability.

Transport distance formulations
There is a relaxed formulation of the classical optimal transport distance:
$$\inf_\pi \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \|x - y\|^2\pi(x,y)\,dx\,dy,$$
where the infimum is taken over all joint measures (transport plans) $\pi(x,y)$ having $\rho_0(x)$ and $\rho_1(y)$ as marginals, i.e.
$$\int_{\mathbb{R}^d}\pi(x,y)\,dy = \rho_0(x), \quad \int_{\mathbb{R}^d}\pi(x,y)\,dx = \rho_1(y), \quad \pi(x,y) \geq 0.$$
Here $\pi = (x, T(x))_\#\rho_0$ with $T(x) = x + \nabla\Phi(0,x)$.

Dynamical optimal transport

Optimal transport space (Density manifold)
Optimal transport has a variational formulation (Benamou-Brenier 2000):
$$\inf_v \int_0^1 \mathbb{E}_{X_t\sim\rho_t}\|v(t,X_t)\|^2\,dt,$$
where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that
$$\dot X_t = v(t,X_t), \quad X_0\sim\rho_0, \quad X_1\sim\rho_1.$$
Under this metric, the set of probability densities has a Riemannian geometric structure [1].
[1] John D. Lafferty: The density manifold and configuration space quantization, 1988.

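Riemannian metric for optimal transport
Informally speaking, the Wasserstein metric refers to the following bilinear form:
$$\langle\dot\rho_1, G(\rho)\dot\rho_2\rangle = \int \big(\dot\rho_1, (-\Delta_\rho)^{-1}\dot\rho_2\big)\,dx, \qquad \Delta_\rho = \nabla\cdot(\rho\nabla).$$
In other words, denote $\dot\rho_i = -\nabla\cdot(\rho\nabla\Phi_i)$, $i = 1, 2$; then
$$\langle\Phi_1, G(\rho)^{-1}\Phi_2\rangle = \langle\Phi_1, -\nabla\cdot(\rho\nabla)\Phi_2\rangle,$$
where $\rho\in\mathcal{P}(\Omega)$, $\dot\rho_i$ is a tangent vector in $\mathcal{P}(\Omega)$ with $\int\dot\rho_i\,dx = 0$, and $\Phi_i\in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $\rho$. Here $\nabla\cdot$, $\nabla$ are the standard divergence and gradient operators in $\Omega$.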

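Optimal transport gradient flows
The Wasserstein gradient flow of an energy functional $\mathcal{F}(\rho)$ leads to
$$\partial_t\rho = -G(\rho)^{-1}\delta_\rho\mathcal{F}(\rho) = \nabla\cdot(\rho\nabla\delta_\rho\mathcal{F}(\rho)).$$
Example: if $\mathcal{F}(\rho) = \int F(x)\rho(x)\,dx$, then the gradient flow is
$$\partial_t\rho = \nabla\cdot(\rho\nabla F(x)).$$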

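Entropy dissipation revisited
The gradient flow of the negative entropy
$$H(\rho) = \int_{\mathbb{R}^d}\rho(x)\log\rho(x)\,dx$$
with respect to the optimal transport metric satisfies
$$\frac{\partial\rho}{\partial t} = \nabla\cdot(\rho\nabla\log\rho) = \Delta\rho.$$
Here the key trick is that $\rho\nabla\log\rho = \nabla\rho$. In this way, one can study the entropy dissipation by
$$\frac{d}{dt}H(\rho) = \int_{\mathbb{R}^d}\log\rho\,\nabla\cdot(\rho\nabla\log\rho)\,dx = -\int_{\mathbb{R}^d}\|\nabla\log\rho\|^2\rho\,dx.$$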

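Optimal transport Hamiltonian flows
Consider the Lagrangian
$$L(\rho,\partial_t\rho) = \frac{1}{2}\int\Big(\partial_t\rho, \big(-\nabla\cdot(\rho\nabla)\big)^{-1}\partial_t\rho\Big)\,dx - \mathcal{F}(\rho).$$
The Hamiltonian flow satisfies the Euler-Lagrange equation
$$\frac{d}{dt}\delta_{\partial_t\rho}L(\rho,\partial_t\rho) = \delta_\rho L(\rho,\partial_t\rho).$$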

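Optimal transport Hamiltonian flows
The Hamiltonian is obtained by the Legendre transform,
$$\mathcal{H}(\rho,\Phi) = \sup_{\partial_t\rho}\int\partial_t\rho\,\Phi\,dx - L(\rho,\partial_t\rho),$$
and the Hamiltonian system reads
$$\partial_t\rho = \delta_\Phi\mathcal{H}(\rho,\Phi), \quad \partial_t\Phi = -\delta_\rho\mathcal{H}(\rho,\Phi),$$
where $\delta_\rho$, $\delta_\Phi$ are the $L^2$ first variation operators with respect to $\rho$, $\Phi$, respectively, and the density Hamiltonian takes the form
$$\mathcal{H}(\rho,\Phi) = \frac{1}{2}\int\|\nabla\Phi\|^2\rho\,dx + \mathcal{F}(\rho).$$
Here $\rho$ is the "density" state variable and $\Phi$ is the "density" momentum variable.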

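Hamiltonian flows: Compressible Euler equation
More explicitly,
$$\begin{cases}\partial_t\rho + \nabla\cdot(\rho\nabla\Phi) = 0,\\ \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 = -\delta_\rho\mathcal{F}(\rho).\end{cases}$$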

Why optimal transport formalisms?
- Generalized log-Sobolev inequalities and bounds;
- Generalized dynamics: e.g., the Schrödinger equation, the Schrödinger bridge problem, and mean field games;
- Generalized dualities and distances in information theory and AI.

Log density coordinates
As we will see in the second talk, it is more natural in the random matrix and free settings to study the log-density rather than the density. Thus, let us describe what happens in the classical case when we write everything in terms of the log-density, which introduces an alternative coordinate system for the classical manifold of densities:
$$(\rho,\Phi)\to(e^{-V},\Phi), \quad\text{where } \int e^{-V}\,dx = 1.$$
All formalisms of optimal transport need to be adjusted accordingly.

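Log density change of variable
Consider
$$\partial_t\rho(t,x) = -\nabla\cdot(\rho(t,x)\nabla\Phi(t,x)).$$
Denote $\rho(t,x) = e^{-V(t,x)}$. Then
$$\partial_tV = -\partial_t\log\rho = -\frac{\partial_t\rho}{\rho} = \frac{\nabla\cdot(\rho\nabla\Phi)}{\rho} = (\nabla\log\rho,\nabla\Phi) + \Delta\Phi = -(\nabla V,\nabla\Phi) + \Delta\Phi =: L_V\Phi.$$
Here $L_V = -(\nabla V,\nabla) + \Delta$.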

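Log density gradient flows
Similarly, we can formulate the gradient flow in terms of log densities. Consider
$$\partial_t\rho = \nabla\cdot(\rho\nabla\delta_\rho\mathcal{F}(\rho)).$$
Denote $\rho = e^{-V}$. Then
$$\partial_tV = -L_V\,\delta_\rho\mathcal{F}(\rho)\big|_{\rho = e^{-V}}.$$
Example: if $\mathcal{F}(\rho) = \int F(x)\rho(x)\,dx = \int F(x)e^{-V(x)}\,dx$, then the gradient flow is $\partial_tV = -L_VF(x)$.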

Log density entropy dissipation
The gradient flow of the negative entropy
$$H(\rho) = \int\rho\log\rho\,dx = -\int Ve^{-V}\,dx$$
with respect to the optimal transport metric satisfies
$$\partial_tV = L_VV = -\|\nabla V\|^2 + \Delta V.$$
In this log density coordinate, one can study the entropy dissipation by
$$\frac{d}{dt}H(\rho) = -\int_{\mathbb{R}^d}\|\nabla V\|^2e^{-V}\,dx.$$
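
The log-density form of the heat equation can be tested on an explicit solution. The sketch below assumes $d = 1$ and takes $V = -\log\rho$ for the heat kernel $N(0, 2t)$, that is $V(t,x) = x^2/(4t) + \tfrac12\log(4\pi t)$, then checks $\partial_tV = \Delta V - \|\nabla V\|^2$ by finite differences. Names and step sizes are illustrative.

```python
import math

def V(t, x):
    """Minus the log of the 1D heat kernel N(0, 2t)."""
    return x * x / (4.0 * t) + 0.5 * math.log(4.0 * math.pi * t)

def log_density_residual(t, x, h=1e-3):
    """Finite-difference estimate of d/dt V - (V'' - (V')^2) at (t, x)."""
    dV_dt = (V(t + h, x) - V(t - h, x)) / (2.0 * h)
    dV_dx = (V(t, x + h) - V(t, x - h)) / (2.0 * h)
    d2V_dx2 = (V(t, x + h) - 2.0 * V(t, x) + V(t, x - h)) / (h * h)
    return dV_dt - (d2V_dx2 - dV_dx ** 2)

for t, x in [(0.5, 0.2), (1.0, 2.0), (3.0, -1.5)]:
    print(t, x, log_density_residual(t, x))  # should be close to zero
```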

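Log density Hamiltonian flows
Similarly, we can formulate the Hamiltonian flow in terms of log densities:
$$\begin{cases}\partial_tV - L_V\Phi = 0,\\ \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 = -\delta_\rho\mathcal{F}(\rho)\big|_{\rho = e^{-V}}.\end{cases}$$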

The free Wasserstein manifold
David Jekel, Wuchen Li, Dima Shlyakhtenko
February 22, 2021

This work was supported in part by the National Science Foundation.
The talk will focus on the big picture, so precise definitions will only be given when helpful. Before the rigorous statements, there will be several slides of motivation and introduction aimed at people who are familiar with free probability. Feel free to interrupt with questions about notation.

Motivation
The classical Wasserstein manifold $\mathcal{P}(M)$, whose points are smooth positive probability densities, is an infinite-dimensional Riemannian framework that neatly describes entropy, the heat equation, log-Sobolev inequalities, optimal transport, measure-preserving transformations, and more. Because many of these notions have analogs in free probability and random matrix theory, we want to define the tracial non-commutative version of the Wasserstein manifold, in which we use non-commutative laws instead of measures. A big obstacle is that we don't have a direct analog of density in the non-commutative setting. However, there are several indications that the log-density is a better-behaved notion.

Motivation: random matrices
Given some self-adjoint non-commutative polynomial $f$ (with nice enough behavior at $\infty$), we can define a function $V^{(N)}: M_N(\mathbb{C})^d_{sa}\to\mathbb{R}$ by $V^{(N)}(x) = \mathrm{tr}_N(f(x))$, where $x = (x_1,\dots,x_d)$ is a $d$-tuple of self-adjoint $N\times N$ matrices and $\mathrm{tr}_N = (1/N)\mathrm{Tr}$ is the normalized trace on $M_N(\mathbb{C})$. Then we define a probability measure $\mu^{(N)}$ on $M_N(\mathbb{C})^d_{sa}$ by
$$d\mu^{(N)}(x) = \mathrm{constant}(V,N)\,e^{-N^2V^{(N)}(x)}\,dx,$$
where $dx$ is Lebesgue measure. Let $X^{(N)}$ be a random matrix tuple chosen according to $\mu^{(N)}$. A lot of past work has shown, in certain cases, that for every non-commutative polynomial $p$, $\mathrm{tr}_N(p(X^{(N)}))$ converges almost surely to a deterministic limit, which is described by $\tau(p(X))$ for some $d$-tuple $X$ from a tracial von Neumann algebra $(A,\tau)$. Then we might want to say that "$\mathrm{tr}(f)$ is a log-density of the distribution of $X$."
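
To make the random matrix model concrete, here is a minimal sketch for the simplest potential, assuming $d = 1$ and $f(x) = x^2/2$, so that $\mu^{(N)}$ is the Gaussian unitary ensemble with density proportional to $e^{-(N/2)\mathrm{Tr}(x^2)}$. The large-$N$ limits $\tau(X^2) = 1$ and $\tau(X^4) = 2$ are the semicircle moments. The matrix size, seed, and helper names are illustrative, and plain Python lists stand in for a linear algebra library.

```python
import random

def sample_gue(n, rng):
    """Hermitian n x n matrix with density proportional to exp(-(n/2) Tr x^2)."""
    x = [[0j] * n for _ in range(n)]
    off_diag_std = (1.0 / (2.0 * n)) ** 0.5  # real and imaginary parts
    for i in range(n):
        x[i][i] = complex(rng.gauss(0.0, 1.0 / n ** 0.5), 0.0)
        for j in range(i + 1, n):
            z = complex(rng.gauss(0.0, off_diag_std), rng.gauss(0.0, off_diag_std))
            x[i][j] = z
            x[j][i] = z.conjugate()
    return x

def normalized_moments(x):
    """Return (tr_N(x^2), tr_N(x^4)) for a Hermitian matrix x."""
    n = len(x)
    tr2 = sum(abs(x[i][j]) ** 2 for i in range(n) for j in range(n)) / n
    x2 = [[sum(x[i][k] * x[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # x^2 is Hermitian, so Tr(x^4) is the squared Frobenius norm of x^2.
    tr4 = sum(abs(x2[i][j]) ** 2 for i in range(n) for j in range(n)) / n
    return tr2, tr4

tr2, tr4 = normalized_moments(sample_gue(40, random.Random(0)))
print(tr2, tr4)  # expected to be near 1 and 2, with O(1/N) fluctuations
```

A single sample already lands close to the limit, which illustrates the almost sure convergence of $\mathrm{tr}_N(p(X^{(N)}))$.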

Motivation: free score function
This idea is closely related to Voiculescu's idea of a free score function (a.k.a. conjugate variable). The classical score function of a measure with density $\rho$ on $\mathbb{R}^d$ is $-\nabla\log\rho$. If $X$ is a random variable with density $\rho$ and if $\xi = -(\nabla\log\rho)(X)$, then we have the integration-by-parts relation
$$\mathbb{E}[\langle\xi, f(X)\rangle] = \mathbb{E}[\mathrm{Tr}(Df(X))] \quad\text{for all } f\in C_c^\infty(\mathbb{R}^d,\mathbb{R}^d),$$
where $Df$ is the Jacobian matrix. Given a tracial von Neumann algebra $(A,\tau)$ generated by $X\in A^d_{sa}$, we say that $\xi\in A^d_{sa}$ is a free score function for $X$ if
$$\langle\xi, p(X)\rangle_\tau = \tau\otimes\tau\otimes\mathrm{Tr}(\mathcal{J}p)$$
for every non-commutative polynomial $p$, where $\mathcal{J}p$ is the matrix of derivatives of $p$ in the sense of Voiculescu's free difference quotient.
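
The free integration-by-parts relation can be verified exactly in the simplest case. The sketch assumes $d = 1$ and a standard semicircular $X$, whose free score is the well-known $\xi = X$ and whose moments are the Catalan numbers. For $p(x) = x^k$ the free difference quotient is $\sum_{j=0}^{k-1}x^j\otimes x^{k-1-j}$, so the relation reduces to the Catalan recurrence. The helper names are illustrative.

```python
from math import comb

def semicircle_moment(k):
    """tau(X^k) for a standard semicircular X: a Catalan number if k is even, else 0."""
    if k % 2 == 1:
        return 0
    n = k // 2
    return comb(2 * n, n) // (n + 1)

def score_side(k):
    """<xi, p(X)>_tau with xi = X and p(x) = x^k, i.e. tau(X^(k+1))."""
    return semicircle_moment(k + 1)

def difference_quotient_side(k):
    """tau (x) tau applied to sum_{j=0}^{k-1} x^j (x) x^(k-1-j)."""
    return sum(semicircle_moment(j) * semicircle_moment(k - 1 - j) for j in range(k))

for k in range(1, 8):
    print(k, score_side(k), difference_quotient_side(k))
```

Both columns agree for every $k$, which is the one-variable instance of the defining relation for a free score function.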

Motivation: free score function
In the setting of the random matrix $d$-tuples $X^{(N)}$ above given by $V^{(N)}(x) = \mathrm{tr}_N(f(x))$, the free score function becomes very concrete. Since $\rho = \mathrm{const}\cdot e^{-N^2V^{(N)}}$, the classical score function is (up to normalization) $\nabla V^{(N)}(x)$, which after some matrix computations works out to $\mathcal{D}^\circ f(x)$, where $\mathcal{D}^\circ f$ is Voiculescu's cyclic gradient. Classical integration by parts plus some matrix computations tell us that
$$\mathbb{E}[\langle\nabla V^{(N)}(X^{(N)}), p(X^{(N)})\rangle_{\mathrm{tr}_N}] = \mathbb{E}[\mathrm{tr}_N\otimes\mathrm{tr}_N\otimes\mathrm{Tr}_d[\mathcal{J}p(X^{(N)})]].$$
Thus, if $X$ is the $d$-tuple of self-adjoint operators from $(A,\tau)$ describing the large-$N$ limit, then $\xi = \mathcal{D}^\circ f(X)$ is the free score function for $X$. The existence of a free score means that "the gradient of the log-density makes sense as an element of $L^2$." This motivates us to make the log-density the central object of study.

Overview
We're going to define the free Wasserstein manifold $\mathscr{W}(\mathbb{R}^{*d})$ as the space of certain "log-density" functions $V$, which are a generalization of things like $\mathrm{tr}(f)$. More precisely, $V$ will come from a space of tracial non-commutative smooth functions that are defined in terms of trace polynomials. (Similar spaces were defined in Dabrowski, Guionnet, and Shlyakhtenko's 2016 preprint on free transport, but we take a different approach to the norms.) The tangent space of $\mathscr{W}(\mathbb{R}^{*d})$ at $V$ is similarly a space of tracial non-commutative smooth functions $W$, which are viewed as perturbations of $V$.

Overview
In order to obtain a Riemannian metric for the Wasserstein manifold, we must associate a non-commutative law $\mu_V$ to each $V$. This is much trickier than in the classical case (where we would just set $d\mu_V(x) = \mathrm{constant}\cdot e^{-V(x)}\,dx$), but there are two known methods for doing this:
(1) For each $V$, find a (hopefully unique) law $\mu$ that maximizes $\chi(\mu) - \mu(V)$, where $\chi$ is Voiculescu's free microstate entropy. This approach is inspired by Voiculescu and is closely related to the random matrix models discussed before.
(2) Set up the free stochastic differential equation $dX_t = dS_t - (1/2)\nabla V(X_t)\,dt$, where $S_t$ is a free Brownian motion (still a self-adjoint $d$-tuple), and (hopefully) recover $\mu_V$ as the limiting distribution of $X_t$ as $t\to\infty$. This approach was pioneered by Biane and Speicher (1999) and further developed by Guionnet, Shlyakhtenko, and Dabrowski.

Outline
- Background on non-commutative laws.
- Tracial non-commutative smooth functions.
- Free Wasserstein manifold and diffeomorphism group.
- Riemannian metric.
- Strategy to construct smooth transport.
- Inversion of the Laplacian $L_V$ through heat semigroup and SDE.
- Free Gibbs laws through maximization of $\chi(\mu) - \mu(V)$.
- Geodesics and optimal transport.

Operator algebras and laws
Definition. A unital $C^*$-algebra is a subalgebra of $B(H)$ (for some Hilbert space $H$) that is closed under adjoints and limits in operator norm.
Definition. A tracial $C^*$-algebra is a pair $(A,\tau)$ where $A$ is a $C^*$-algebra and $\tau$ is a faithful trace, that is, $\tau(1) = 1$, $\tau(a^*a)\geq 0$ with equality if and only if $a = 0$, and $\tau(ab) = \tau(ba)$.
Remark. We don't need to go into the definition of von Neumann algebras now. But every tracial $C^*$-algebra can be completed to a tracial von Neumann algebra.

Operator algebras and laws
Definition. Let $\mathbb{C}\langle x_1,\dots,x_d\rangle$ be the algebra of non-commutative polynomials equipped with the $*$-operation such that $x_j^* = x_j$. (We typically use $x$ to denote formal or generic variables and $X$ to denote a specific tuple of operators.)
Definition. Let $\Sigma_{d,R}$ be the set of linear functionals $\lambda: \mathbb{C}\langle x_1,\dots,x_d\rangle\to\mathbb{C}$ such that $\lambda(1) = 1$, $\lambda(p^*p)\geq 0$, $\lambda(pq) = \lambda(qp)$, and $|\lambda(x_{i_1}\cdots x_{i_k})|\leq R^k$, equipped with the weak-$*$ topology (as a subset of the dual of $\mathbb{C}\langle x_1,\dots,x_d\rangle$). We call elements of $\Sigma_{d,R}$ non-commutative laws with exponential bound $R$.

Operator algebras and laws
Proposition. If $(A,\tau)$ is a tracial $C^*$-algebra and $X = (X_1,\dots,X_d)\in A^d_{sa}$, then the map $\lambda_X: \mathbb{C}\langle x_1,\dots,x_d\rangle\to\mathbb{C}$, $p\mapsto\tau(p(X))$, is a non-commutative law with exponential bound $\|X\|_\infty = \max_j\|X_j\|$. Conversely, every $\lambda\in\Sigma_{d,R}$ can be realized as $\lambda_X$ for some $(A,\tau)$ and $X\in A^d_{sa}$ with $\|X\|_\infty\leq R$.
The proof is a variant of the GNS construction. The proposition can be interpreted as follows:
1. $\Sigma_{d,R}$ is the space of traces on the $C^*$-universal free product $C([-R,R])^{*d}$.
2. $\Sigma_{d,R}$ is in bijection with isomorphism classes of triples $(A,\tau,X)$, where $(A,\tau)$ is a tracial $C^*$-algebra and $X\in A^d_{sa}$ generates $A$; here isomorphism means a $C^*$-isomorphism that preserves the trace and generators.

Trace polynomials
We next describe non-commutative functions that are modeled on trace polynomials, in a similar spirit to Dabrowski-Guionnet-Shlyakhtenko 2016. A trace polynomial in $(x_1,\dots,x_d)$ is an expression formed through addition, multiplication, and application of a symbol $\mathrm{tr}$, such as
$$f(x_1,x_2,x_3) = \mathrm{tr}(x_1^2x_2)\,\mathrm{tr}(x_3)\,x_1 + \mathrm{tr}(x_2x_3) + 5\,\mathrm{tr}(x_1x_2x_3)\,\mathrm{tr}(x_1)\,x_2x_3^2.$$
These expressions are considered modulo the relations $\mathrm{tr}(pq) = \mathrm{tr}(qp)$ and $\mathrm{tr}(\mathrm{tr}(p)q) = \mathrm{tr}(p)\,\mathrm{tr}(q)$. For any tracial $C^*$-algebra $(A,\tau)$ and $X\in A^d_{sa}$, we can evaluate a trace polynomial $f$ on $X$ by substituting $X_j$ for the formal symbol $x_j$ and $\tau$ for the formal symbol $\mathrm{tr}$. Hence, a trace polynomial $f$ gives rise to a function $f^{A,\tau}: A^d_{sa}\to A$.


Trace polynomials
Trace polynomials have several advantages over non-commutative polynomials.
1. It follows from the work of Procesi (1976) that every function $M_N(\mathbb{C})^d_{sa}\to M_N(\mathbb{C})$ that is entrywise polynomial and invariant under unitary conjugation must be given by a trace polynomial.
2. For each trace polynomial $f$, we can compute the Laplacian of $f^{M_N(\mathbb{C}),\mathrm{tr}_N}$ as a function on $M_N(\mathbb{C})^d_{sa}$ (equipped with the inner product from $\mathrm{tr}_N$). The Laplacian $(1/N^2)\Delta f^{M_N(\mathbb{C}),\mathrm{tr}_N}$ is a trace polynomial, and it converges coefficientwise as $N\to\infty$ to some trace polynomial $Lf$.
We'll define the non-commutative space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ roughly as the functions whose first $k$ derivatives can be approximated on operator-norm balls by trace polynomials.

Description of trace $C^k$ functions
The space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ is described as follows:
- Each $f\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ is a collection of functions $f^{A,\tau}: A^d_{sa}\to A$ for tracial $C^*$-algebras $(A,\tau)$.
- $f^{A,\tau}$ must be a $C^k$ function in the sense of Fréchet differentiation. The derivative $\partial^jf^{A,\tau}(X)$ is a multilinear map $A^d_{sa}\times\dots\times A^d_{sa}\to A$.
- Inspired by the non-commutative Hölder inequality, we define the norm $\|\partial^jf^{A,\tau}(X)\|_{\mathscr{M}^j}$ as the smallest constant such that
$$\|\partial^jf^{A,\tau}(X)[Y_1,\dots,Y_j]\|_p\leq\|\partial^jf^{A,\tau}(X)\|_{\mathscr{M}^j}\|Y_1\|_{p_1}\cdots\|Y_j\|_{p_j}$$
whenever $1/p = 1/p_1 + \dots + 1/p_j$, for $j = 0,\dots,k$.
- Then $\|\partial^jf\|_{\mathscr{M}^j,R}$ is the supremum of $\|\partial^jf^{A,\tau}(X)\|_{\mathscr{M}^j}$ over $(A,\tau)$ and $X\in A^d_{sa}$ with $\|X\|_\infty\leq R$.
- For $R > 0$ and $j\leq k$, we assume that $\|\partial^jf\|_{\mathscr{M}^j,R}$ is finite and that $\partial^jf$ can be approximated in this norm by trace polynomials of $X, Y_1,\dots,Y_j$ that are multilinear in $Y_1,\dots,Y_j$.

Properties of trace $C^k$ functions
There are also spaces $C_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}^j(\mathbb{R}^{*d_1},\dots,\mathbb{R}^{*d_n}))$ of functions where $f^{A,\tau}(X)$ is a multilinear map $A^{d_1}_{sa}\times\dots\times A^{d_n}_{sa}\to A$. The exact definition of the space is less important than the properties:
- These spaces are closed under composition, whenever the composition makes sense, and they satisfy the chain rule.
- There is an inverse function theorem: if $f$ is a self-adjoint $d$-tuple in $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$, and if $\partial f - \mathrm{Id}$ is uniformly bounded by a constant $c < 1$, then $f^{-1}$ is defined and is $C^k_{\mathrm{tr}}$.
- There is a trace map $\mathrm{tr}: C^k_{\mathrm{tr}}(\mathbb{R}^{*d})\to C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ given by $\mathrm{tr}(f)^{A,\tau}(X) = \tau(f^{A,\tau}(X))$. The image $\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ consists of those $f$ which are scalar-valued.

Examples of trace $C^k_{\mathrm{tr}}$ functions
Of course, trace polynomials are $C^\infty_{\mathrm{tr}}$ functions. If $\phi:\mathbb{R}\to\mathbb{R}$ is such that $\int_{\mathbb{R}}|2\pi s|^k|\widehat\phi(s)|\,ds < \infty$, then the function $f^{A,\tau}(X) = \phi(X)$ (defined by functional calculus) is in $C^k_{\mathrm{tr}}(\mathbb{R}^{*1})$ and the $k$th derivative is bounded by $\int_{\mathbb{R}}|2\pi s|^k|\widehat\phi(s)|\,ds$. Together with the chain rule, this shows that there is an abundance of $BC^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ functions, that is, functions in $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ such that
$$\|\partial^jf\|_{\mathscr{M}^j,u} := \sup_{R>0}\|\partial^jf\|_{\mathscr{M}^j,R} < \infty.$$
Imposing certain growth conditions at $\infty$ on a $C_{\mathrm{tr}}(\mathbb{R}^{*d})$ function is not a big restriction. This makes life easier than it would be if we only used trace polynomials.

Differentiation of trace $C^k$ functions
For scalar-valued $g\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$, we can define a gradient $\nabla g\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$. In the case where $g = \mathrm{tr}(p)$ for some non-commutative polynomial $p$, $\nabla g$ is the cyclic gradient of $p$.
The analog of $C^k$ functions from $\mathbb{R}^d$ to $M_d(\mathbb{C})$ is the space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$. This is the space that contains the derivative $\partial f$ when $f\in C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$, as well as the Hessian of $g$ when $g$ is a scalar-valued element of $C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})$.
For $F\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$ and $X\in A^d_{sa}$, the object $F^{A,\tau}(X)$ is a linear transformation $A^d\to A^d$. We define $F\#G$ to be the pointwise composition of these linear transformations. It turns out that $C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$ is a $*$-algebra with respect to the $\#$-multiplication.

Differentiation of trace $C^k$ functions
We can define a trace $\mathrm{Tr}_\#: C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ by
$$(\mathrm{Tr}_\#(F))^{A,\tau}(X) = \langle S, F^{A*B,\tau*\sigma}(X)[S]\rangle_{\tau*\sigma},$$
where $(B,\sigma)$ is the tracial $C^*$-algebra generated by a free semicircular $d$-tuple $S$.
This is the analog of the map $C^k(\mathbb{R}^d, M_d(\mathbb{C}))\to C^k(\mathbb{R}^d)$ defined by pointwise application of the trace $\mathrm{Tr}_d$ on $M_d(\mathbb{C})$. This is because the trace of a matrix $A$ can be expressed as $\mathbb{E}\langle Y, AY\rangle$ where $Y$ is a standard Gaussian random vector in $\mathbb{R}^d$, and the analog of the Gaussian in free probability is the semicircular family.
Another motivating example: if $F\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$ is given by $F^{A,\tau}(X)[Y]_i = \sum_jp_{i,j}^{A,\tau}(X)\,Y_j\,q_{i,j}^{A,\tau}(X)$ for some matrix $(p_{i,j}\otimes q_{i,j})_{i,j}$ of non-commutative polynomials, then
$$\mathrm{Tr}_\#(F)^{A,\tau}(X) = \sum_i\tau(p_{i,i}(X))\,\tau(q_{i,i}(X)).$$
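
A finite-dimensional analog of the last formula can be checked directly. The sketch below assumes $d = 1$ and replaces $(A,\tau)$ by $n\times n$ matrices: the linear map $Y\mapsto PYQ$ on matrices has trace $\mathrm{Tr}(P)\,\mathrm{Tr}(Q)$, which after normalizing by $n^2$ becomes $\mathrm{tr}_n(P)\,\mathrm{tr}_n(Q)$, mirroring $\tau(p)\tau(q)$. The matrices and helper names are hypothetical illustrations, not objects from the talk.

```python
def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_unit(n, a, b):
    """The matrix with a 1 in position (a, b) and 0 elsewhere."""
    return [[1 if (i, j) == (a, b) else 0 for j in range(n)] for i in range(n)]

def trace(x):
    return sum(x[i][i] for i in range(len(x)))

def superoperator_trace(p, q):
    """Trace of the linear map Y -> P Y Q on n x n matrices.

    The matrix units E_ab form a basis; the diagonal coefficient of the map
    at E_ab is the (a, b) entry of P E_ab Q.
    """
    n = len(p)
    total = 0
    for a in range(n):
        for b in range(n):
            image = matmul(matmul(p, matrix_unit(n, a, b)), q)
            total += image[a][b]
    return total

p = [[1, 2, 0], [0, 3, 1], [4, 0, -2]]
q = [[2, 0, 1], [1, 1, 0], [0, 5, 3]]
print(superoperator_trace(p, q), trace(p) * trace(q))  # the two values coincide
```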

Differentiation of trace $C^k$ functions
The trace $\mathrm{Tr}_\#$ allows us to define the divergence operator $\nabla^\dagger: C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})^d\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ as the trace of the Jacobian, $\nabla^\dagger f = \mathrm{Tr}_\#(\partial f)$, as well as the Laplacian $L = \nabla^\dagger\nabla: \mathrm{tr}(C^{k+2}_{\mathrm{tr}}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$. These operators are the limits of the corresponding normalized divergence and Laplacian for functions on $M_N(\mathbb{C})^d_{sa}$.
Furthermore, the trace $\mathrm{Tr}_\#$ gives rise to a Fuglede-Kadison log-|determinant| map $\log\Delta_\#: GL(C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d})))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$.

Free Wasserstein manifold and diffeomorphism group
We'll first set up the manifold formally. Afterwards, we'll describe how to extract a non-commutative law $\mu_V$ from $V$ and hence define the Riemannian metric.
Definition. The free Wasserstein manifold $\mathscr{W}(\mathbb{R}^{*d})$ is the set of $V\in\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ such that $V$ has "quadratic growth at $\infty$" in the sense that for some constants $a, a' > 0$ and $b, b'\in\mathbb{R}$, we have
$$a\sum_j\tau(X_j^2) + b\leq V^{A,\tau}(X)\leq a'\sum_j\tau(X_j^2) + b'.$$
Definition. The free diffeomorphism group $\mathscr{D}(\mathbb{R}^{*d})$ is the set of $f\in C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$ such that $f$ has an inverse function $f^{-1}$ in $C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$, and $\partial f$, $\partial f^{-1}$ are bounded. Note that this is a group under composition.

Tangent vectors
A tangent vector to $\mathscr{W}(\mathbb{R}^{*d})$ at $V$ is an equivalence class of $C^1$ paths $(-\epsilon,\epsilon)\to\mathscr{W}(\mathbb{R}^{*d}): t\mapsto V_t$ with $V_0 = V$, where two paths are equivalent if they have the same $\dot V_0$. For convenience, we assume that $V_t$ satisfies the quadratic growth bounds with $a, a', b, b'$ independent of $t$.
A tangent vector to $\mathscr{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$ is similarly an equivalence class of $C^1$ paths $t\mapsto f_t$ with $f_0 = \mathrm{id}$, and the equivalence is equality of $\dot f_0$. Again, assume that $\partial f_t$ and $\partial f_t^{-1}$ are uniformly bounded.
Here, by "$C^1$ path", we mean that it is continuously differentiable with respect to the Fréchet topology of $C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})$ on the target space (defined by the seminorms of each derivative $\partial^jf$ on each ball of radius $R$).

The transport action
In the classical case, one studies the action of $\mathrm{Diff}(\mathbb{R}^d)$ on $\mathcal{P}(\mathbb{R}^d)$ by push-forward, which is viewed as an infinite-dimensional Lie group acting on an infinite-dimensional Riemannian manifold. If $\mu$ has density $e^{-V}$ and if $f$ is a diffeomorphism, then $f_*\mu$ has density $e^{-(V\circ f^{-1} - \log|\det Df^{-1}|)}$ by the classical change of variables formula. This motivates the following definition.
Definition. We define the transport action $\mathscr{D}(\mathbb{R}^{*d})\curvearrowright\mathscr{W}(\mathbb{R}^{*d})$ by
$$(f,V)\mapsto f_*V := V\circ f^{-1} - \log\Delta_\#(\partial f^{-1}).$$
One can check that this is a well-defined group action.
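
The classical formula that motivates the definition is easy to test in one variable. The sketch below assumes affine maps of the real line and the standard Gaussian potential $V_0(x) = x^2/2 + \tfrac12\log(2\pi)$; it checks that $f_*V_0$ is the potential of $N(b, a^2)$ for $f(x) = ax + b$, and that $(f\circ g)_*V_0 = f_*(g_*V_0)$. All names and parameter values are illustrative.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)

def push_forward(V, f_inv, f_inv_prime):
    """Classical one-variable f_* V = V o f^{-1} - log |(f^{-1})'|."""
    return lambda y: V(f_inv(y)) - math.log(abs(f_inv_prime(y)))

def V0(x):
    """Minus the log-density of the standard Gaussian N(0, 1)."""
    return 0.5 * x * x + 0.5 * LOG_2PI

def gaussian_potential(y, mean, std):
    """Minus the log-density of N(mean, std^2)."""
    return 0.5 * ((y - mean) / std) ** 2 + 0.5 * LOG_2PI + math.log(std)

a, b = 3.0, 2.0    # f(x) = a * x + b
c, e = 0.5, -1.0   # g(x) = c * x + e
f_inv = lambda y: (y - b) / a
g_inv = lambda y: (y - e) / c

# Pushing N(0, 1) forward by f should give N(b, a^2).
V1 = push_forward(V0, f_inv, lambda y: 1.0 / a)

# Group action property: (f o g)_* V0 = f_* (g_* V0).
composed = push_forward(V0, lambda y: g_inv(f_inv(y)), lambda y: 1.0 / (a * c))
iterated = push_forward(push_forward(V0, g_inv, lambda y: 1.0 / c),
                        f_inv, lambda y: 1.0 / a)

for y in (-2.0, 0.0, 1.5):
    print(V1(y), gaussian_potential(y, b, a), composed(y), iterated(y))
```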

Differential of the transport action
The key computation behind transport theory is the description of the differential of the transport action. We define $\nabla_V^*: C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d\to\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ by
$$\nabla_V^*f = -\nabla^\dagger f + \partial V\#f = -\mathrm{Tr}_\#(\partial f) + \langle\nabla V, f\rangle_{\mathrm{tr}}.$$
(This is just notation; it is not actually the adjoint.)
Lemma. Let $V\in\mathscr{W}(\mathbb{R}^{*d})$ and let $t\mapsto f_t$ be a tangent vector to $\mathscr{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$. Then
$$\frac{d}{dt}\Big|_{t=0}(f_t)_*V = -\nabla_V^*\dot f_0.$$
In other words, $-\nabla_V^*$ is the differential at $\mathrm{id}$ of the orbit map $\mathscr{D}(\mathbb{R}^{*d})\to\mathscr{W}(\mathbb{R}^{*d}): f\mapsto f_*V$.
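
The lemma has a classical one-variable counterpart that can be tested by finite differences. The sketch below assumes $f_t(x) = x + t\,h(x)$ on the real line with the hypothetical choices $V(x) = x^4/4 + x^2/2$ and $h = \sin$, and compares $\frac{d}{dt}(f_t)_*V$ at $t = 0$ with $h' - V'h$, which is the classical version of $\mathrm{Tr}_\#(\partial h) - \langle\nabla V, h\rangle$.

```python
import math

def V(x):
    return 0.25 * x ** 4 + 0.5 * x * x

def dV(x):
    return x ** 3 + x

def h(x):
    return math.sin(x)

def dh(x):
    return math.cos(x)

def inverse_flow(t, y, iterations=60):
    """Solve x + t * h(x) = y by fixed-point iteration (a contraction for |t| < 1)."""
    x = y
    for _ in range(iterations):
        x = y - t * h(x)
    return x

def pushed_potential(t, y):
    """((f_t)_* V)(y) = V(f_t^{-1}(y)) - log (f_t^{-1})'(y) for f_t(x) = x + t * h(x)."""
    x = inverse_flow(t, y)
    # (f_t^{-1})'(y) = 1 / f_t'(x), so -log (f_t^{-1})'(y) = log(1 + t * h'(x)).
    return V(x) + math.log(1.0 + t * dh(x))

def finite_difference(y, eps=1e-5):
    """Central difference in t of (f_t)_* V at t = 0."""
    return (pushed_potential(eps, y) - pushed_potential(-eps, y)) / (2.0 * eps)

def predicted_derivative(y):
    """Classical one-variable version of Tr(dh) - <grad V, h>."""
    return dh(y) - dV(y) * h(y)

for y in (-1.0, 0.3, 1.2):
    print(y, finite_difference(y), predicted_derivative(y))
```

The agreement of the two columns is a check on the sign convention for the differential of the orbit map in this classical setting.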

$\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
We saw that the tangent space of $\mathscr{D}(\mathbb{R}^{*d})$ is (a dense subspace of) the space of vector fields $C_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$. Conversely:
Lemma. Given a time-dependent vector field $t\mapsto h_t$ (continuous in $t$) such that $\partial h_t$ is uniformly bounded, there exists a unique path $f_t$ in $\mathscr{D}(\mathbb{R}^{*d})$ such that $f_0 = \mathrm{id}$ and $\dot f_t = h_t\circ f_t$.
The proof is similar to classical ODE theory. If $h$ is independent of $t$, then we get a one-parameter subgroup of $\mathscr{D}(\mathbb{R}^{*d})$. Combining this with our previous observation:
Lemma. Let $h\in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$ with $\partial h$ bounded, and let $f_t$ be the corresponding one-parameter subgroup. Then $(f_t)_*V = V$ for all $t$ if and only if $\nabla_V^*h = 0$.

$\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
By studying the one-parameter subgroups of $\mathscr{D}(\mathbb{R}^{*d})$ as described above, we arrive at the following definition of the Lie bracket, completely analogous to the Lie bracket on vector fields of $\mathbb{R}^d$.
Definition. For two vector fields $h_1, h_2\in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$, let
$$[h_1,h_2] = \partial h_1\#h_2 - \partial h_2\#h_1.$$
This generalizes the definition of Lie brackets for non-commutative polynomials used in Voiculescu's paper "Cyclomorphy."

$\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
For each $V\in\mathscr{W}(\mathbb{R}^{*d})$, its stabilizer $\{f\in\mathscr{D}(\mathbb{R}^{*d}): f_*V = V\}$ is a "Lie subgroup," analogous to a classical group of measure-preserving transformations. By our previous observations, the corresponding Lie subalgebra should be the set of vector fields $h$ with $\nabla_V^*h = 0$. We can verify directly that this is indeed a Lie subalgebra:
Lemma.
$$\nabla_V^*[h_1,h_2] = \partial(\nabla_V^*h_1)\#h_2 - \partial(\nabla_V^*h_2)\#h_1,$$
and in particular $\ker(\nabla_V^*)$ is closed under Lie brackets.

Two ingredients for the Riemannian metric
In order to define the Riemannian metric on the tangent space at $V$, we need two conditions on $V$. We will worry later about checking when these are true.
Condition 1. There exists a unique non-commutative law $\mu_V$ satisfying the Dyson-Schwinger equation $\mu_V[\nabla_V^*f] = 0$ for $f\in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$.
Note that $\nabla_V^*f$ is a scalar-valued function approximated by trace polynomials, and $\mu_V[\nabla_V^*f]$ is evaluated as $(\nabla_V^*f)^{A,\tau}(X)$ for any $X$ with $\lambda_X = \mu_V$.
Condition 2. The operator $L_V = -\nabla_V^*\nabla: \mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ has kernel equal to the constant functions, and it has a continuous pseudo-inverse $\Psi_V: \mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ with $\mu_V(\Psi_Vf) = 0$ and $-\Psi_VL_Vf + \mu_V(f) = f$.

The Riemannian metric
Definition. If $V$ satisfies Conditions 1 and 2, the Riemannian metric on $T_V\mathscr{W}(\mathbb{R}^{*d})$ is given by
$$\langle\dot V_1,\dot V_2\rangle_V = \mu_V[\langle\nabla\Psi_V\dot V_1,\nabla\Psi_V\dot V_2\rangle_{\mathrm{tr}}].$$
Remark. This definition relates to the Riemannian metric for measures on $M_N(\mathbb{C})^d_{sa}$. If $\mu_V^{(N)}$ is the measure with density a constant times $e^{-N^2V^{M_N(\mathbb{C}),\mathrm{tr}_N}}$, then the classical Riemannian metric can be expressed as
$$\int\langle\nabla(L_{V^{(N)}})^{-1}\dot V_1,\nabla(L_{V^{(N)}})^{-1}\dot V_2\rangle_{\mathrm{tr}_N}\,d\mu^{(N)} = -N^2\int\dot V_1(L_{V^{(N)}})^{-1}\dot V_2\,d\mu^{(N)}.$$
The expression on the right-hand side seems simpler, but it is dimension-dependent!

Consequences of Dyson-Schwinger equation
If Conditions 1 and 2 hold for some $V$, then using the formula for $\nabla_V^*[h_1,h_2]$, one can show that $\ker(\nabla_V^*)$ and $\mathrm{Im}(\nabla)$ are orthogonal in $L^2(\mu_V)$. Furthermore, $\nabla\Psi_V\nabla_V^*: C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d\to C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d$ defines a projection onto the space of gradients. The complementary projection is known as the Leray projection.
Remark. In the classical setting, the decomposition of vector fields into $\ker(\nabla_V^*)$ and $\mathrm{Im}(\nabla)$ is an infinitesimal version of Brenier's factorization of a diffeomorphism into an optimal transport map and a $\mu_V$-preserving transformation.

Warnings
Although Condition 1 stipulates that $\mu_V$ is uniquely determined by $V$, there are many cases where $V$ is not uniquely determined by $\mu_V$. For instance, since $\mu_V$ arises from bounded operators (it is "supported on an operator norm ball"), modifying $V$ outside an operator norm ball will often not change $\mu_V$.
Another way in which degeneracy arises is from the use of trace polynomials. If a particular $(A,\tau)$ and $X$ are given, and if $f$ is a trace polynomial, then $f^{A,\tau}(X)$ agrees with $p(X)$ for some non-commutative polynomial $p$. We can easily imagine that many $V$ lead to the same $\mu$ for this reason.
Relatedly, the Riemannian metric on the tangent space could have a very large kernel because when we take the inner product in $L^2(\mu_V)$, all the $\mathrm{tr}(p)$ terms are collapsed to constants.

Construction of transport
Closely related to the previous observation about the differential of the transport action, we have:
Lemma. Suppose that $t\mapsto V_t$ is a $C^1$ path in $\mathscr{W}(\mathbb{R}^{*d})$, for $t$ in some interval containing $0$. Let $h_t$ be a vector field with $\partial h_t$ uniformly bounded and $-\nabla_{V_t}^*h_t = \dot V_t$. Let $f_t$ be the flow along the vector field $h_t$. Then $(f_t)_*V_0 = V_t$.
Suppose we are given the path $t\mapsto V_t$ (perhaps interpolating between some given $V_0$ and $V_1$) and we want to construct $h_t$. If each $V_t$ satisfies Conditions 1 and 2, then we can take $h_t = -\nabla\Psi_{V_t}\dot V_t$.
For $\partial h_t$ to be bounded, we require some concrete estimate on $\Psi_V$. For $h_t$ to depend continuously on $t$, we need some joint continuous dependence of $\Psi_Vf$ on $V$ and $f$, at least for some family of $V$'s that contains our given path. If these conditions are met, then some smooth transport exists.

Construction of transport
The following theorem is similar to previous work such as Guionnet-Shlyakhtenko 2009 and Dabrowski-Guionnet-Shlyakhtenko 2016.
Theorem A. Fix $C_1, C_2, C_3 > 0$ with $C_2 < 1$. Consider $V\in\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ such that $\|\nabla V\|_{BC_{\mathrm{tr}}}\leq C_1$ and $\|\partial\nabla V - \mathrm{Id}\|_{BC_{\mathrm{tr}}}\leq C_2$.
- $V$ satisfies Conditions 1 and 2.
- For such $V$, the map $(V,f)\mapsto\Psi_Vf$ is jointly continuous with respect to the Fréchet topology on $C^\infty_{\mathrm{tr}}$.
- Let $k\geq 0$. If $V$ is as above and furthermore $\partial^jV$ is bounded by some constant $C_j$ for $j\leq k+2$, then $\Psi_V$ maps $BC^k_{\mathrm{tr}}$ into $BC^k_{\mathrm{tr}}$.
The theorem implies that for a path $t\mapsto V_t$, if $\nabla V_t$, $\partial\nabla V_t$, $\partial^2\nabla V_t$, $\nabla\dot V_t$, $\partial\nabla\dot V_t$ are uniformly bounded, with $\|\partial\nabla V_t - \mathrm{Id}\|_{BC_{\mathrm{tr}}}\leq C_2 < 1$, then the above construction of transport works.

Construction of transport
From this result, one immediately gets isomorphisms of $C^*$-algebras associated to the non-commutative laws $\mu_{V_t}$.
Theorem B. For a path $t\mapsto V_t$ satisfying the conditions on the previous slide, there exists a $C^1$ path $t\mapsto f_t$ of diffeomorphisms with $(f_t)_*V_0 = V_t$. These give rise to isomorphisms between the tracial $C^*$-algebras (and the von Neumann algebras) associated to the GNS representations of the non-commutative laws $\mu_{V_t}$. In particular, when $V$ is as in the previous theorem, the $C^*$-algebra of $\mu_V$ is isomorphic to the one generated by a free semicircular family.
There is one thing to check to finish the proof: if $f_*V_0 = V_1$, then does $f_*\mu_{V_0} = \mu_{V_1}$? For the potentials $V_t$ as in the previous slide, this can be checked from the free entropy viewpoint, which will be explained later.

Warnings
These results are not true for arbitrary $V$, even in the one-variable case.
Indeed, as in Biane-Speicher 1999, consider $V^{\mathcal{A},\tau}(X) = \tau(f(X))$ where $f : \mathbb{R} \to \mathbb{R}$ is a "double well" potential. If the wells are deep enough, then in the large $N$ limit the spectral distribution is supported on a disjoint union of two intervals. Hence, the $C^*$-algebra is $C[0,1] \oplus C[0,1]$, which is not isomorphic to the $C^*$-algebra $C[0,1]$ obtained in the semicircular case.
Actually, Condition 1 fails for such a potential because other measures satisfying the Dyson-Schwinger equation are obtained by reweighting the two components.
Relatedly, there are non-constant smooth functions $\phi$ such that $L_V \phi$ vanishes in $L^2(\mu_V)$. Namely, we take $\phi^{\mathcal{A},\tau}(X) = \tau(f(X))$ where $f : \mathbb{R} \to \mathbb{R}$ is smooth and constant on each of the two intervals. On the other hand, $L_V \phi$ is not zero in $\operatorname{tr}(C_{\mathrm{tr}}^\infty(\mathbb{R}^{*d}))$, but the significance of this is unclear.

Inversion of the Laplacian
Theorem A comes out of two sets of tools:
1. The free entropy approach is used to show existence of a non-commutative law $\mu$ satisfying $\mu[\nabla_V^* f] = 0$ for $f \in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$.
2. The heat semigroup is used to show uniqueness of a non-commutative law $\mu$ satisfying $\mu[L_V \phi] = 0$ for $\phi \in \operatorname{tr}(C_{\mathrm{tr}}(\mathbb{R}^{*d}))$, as well as to construct $\Psi_V$.
Let us start with (2). The broad outline is the same as in Dabrowski-Guionnet-Shlyakhtenko 2016, but with different function spaces.

Inversion of the Laplacian
Recalling that $L_V = -\nabla_V^* \nabla$, the heat semigroup is the family of operators $e^{tL_V}$ for $t \ge 0$. The rigorous definition is through free SDE theory. We set
$[e^{tL_V/2} f]^{\mathcal{A},\tau}(X) = E_{\mathcal{A}}[f(X_t(X))]$, where $dX_t(X) = dS_t - \frac{1}{2} \nabla V(X_t(X))\,dt$, $X_0(X) = X$,
and where $S_t$ is a semicircular Brownian motion freely independent of the initial condition $X$.
The assumption that $\|\partial \nabla V - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2 < 1$ implies that $\partial_X X_t$ decays like $e^{-t(1 - C_2)/2}$.
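The decay rate can be motivated by a short Grönwall argument. The following is a heuristic sketch in classical notation, assuming the process starts at $X$, that the noise term does not depend on $X$, and that the norm is submultiplicative; the shorthand $D_t$ is introduced here for illustration and is not from the slides.

```latex
% D_t := \partial_X X_t, with D_0 = Id. Differentiating the SDE in X removes dS_t:
\frac{d}{dt} D_t = -\tfrac{1}{2}\, \partial\nabla V(X_t)\, D_t
                 = -\tfrac{1}{2} D_t - \tfrac{1}{2}\bigl(\partial\nabla V(X_t) - \mathrm{Id}\bigr) D_t .
\\[4pt]
% Multiply by the integrating factor e^{t/2} and take norms:
e^{t/2}\,\|D_t\| \le 1 + \frac{C_2}{2} \int_0^t e^{s/2}\,\|D_s\|\, ds
\quad\Longrightarrow\quad
e^{t/2}\,\|D_t\| \le e^{C_2 t/2}
\quad\Longrightarrow\quad
\|D_t\| \le e^{-t(1 - C_2)/2} .
```

The middle implication is Grönwall's inequality applied to $u(t) = e^{t/2}\|D_t\|$.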

Inversion of the Laplacian
This in turn implies that $\partial[e^{tL_V} f]$ decays like $e^{-t(1 - C_2)}$.
To recover the non-commutative law $\mu_V$ and the pseudo-inverse $\Psi_V$, we argue that
$\mu_V(f) = \lim_{t \to \infty} e^{tL_V} f$ and $\Psi_V f = \int_0^\infty [e^{tL_V} f - \mu_V(f)]\,dt$.
These expressions make sense because of the exponential decay.
The smoothness properties, as well as the continuous dependence of $\Psi_V f$ on $(V, f)$, are proved by studying the smoothness properties of $X_t(X)$ as a function of $X$, with some simpleminded inductive arguments.
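The integral formula for the pseudo-inverse can be illustrated in the classical one-variable Gaussian case, which is only an analog and not the free setting. The sketch below assumes $V(x) = x^2/2$ and $L_V f = f'' - x f'$, whose eigenfunctions are the probabilists' Hermite polynomials with $L_V \mathrm{He}_n = -n\,\mathrm{He}_n$. The integral $\int_0^\infty [e^{tL_V} f - \mu_V(f)]\,dt$ then divides the $n$-th coefficient by $n$. The function names `apply_L` and `apply_Psi` are hypothetical helpers for this illustration.

```python
import numpy as np
from numpy.polynomial import hermite_e as He


def apply_L(c):
    """Apply L_V f = f'' - x f' (assumed V(x) = x^2/2) to HermiteE coefficients c."""
    c = np.asarray(c, dtype=float)
    second = He.hermeder(c, 2)              # coefficients of f''
    drift = He.hermemulx(He.hermeder(c, 1))  # coefficients of x * f'
    return He.hermesub(second, drift)


def apply_Psi(c):
    """Integrate (e^{tL_V} f - mu_V(f)) over t in [0, inf): He_n -> He_n / n for n >= 1."""
    c = np.asarray(c, dtype=float)
    out = np.zeros_like(c)
    n = np.arange(1, len(c))
    out[1:] = c[1:] / n                      # the He_0 component (the mean) is removed
    return out


# f = 2 He_0 - He_1 + 0.5 He_2 + 3 He_3, so mu_V(f) = 2.
f = np.array([2.0, -1.0, 0.5, 3.0])
g = apply_Psi(f)

centered = f.copy()
centered[0] = 0.0                            # f - mu_V(f)

# With these conventions, -L_V Psi_V f = f - mu_V(f), so this residual should vanish.
residual = He.hermeadd(apply_L(g), centered)
print(residual)
```

In this analog the exponential decay of $e^{tL_V}$ on mean-zero functions is what makes the integral converge, mirroring the role of the bound $C_2 < 1$ above.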

Free Gibbs laws: results
A free Gibbs law for $V$ is a non-commutative law $\mu$ that maximizes $\chi_V(\mu) := \chi(\mu) - \mu(V)$, where $\chi$ is the free microstate entropy. We can show the following:
1. If $V \in \mathcal{W}(\mathbb{R}^{*d})$ with $\partial V$ and $\partial^2 V$ bounded, then a free Gibbs law always exists.
2. Due to the change of variables formula for entropy, any free Gibbs law $\mu$ must satisfy the Dyson-Schwinger equation $\mu[\nabla_V^* f] = 0$.
3. Fix $C_1, C_2 > 0$. The set of $V$ which have a unique free Gibbs law is generic in the set $\mathcal{V}_{C_1, C_2}$ of $V$ with $\|\partial V\|_{BC_{\mathrm{tr}}} \le C_1$ and $\|\partial^2 V\|_{BC_{\mathrm{tr}}} \le C_2$, equipped with the subspace topology from $\operatorname{tr}(C_{\mathrm{tr}}(\mathbb{R}^{*d}))$.

Free Gibbs laws: proof with lies
The argument for the existence of free Gibbs laws relies on enlarging the space of laws in order to obtain more compactness. More precisely:
1. We embed the space of non-commutative laws into the dual of a Banach space $\mathcal{C}$ consisting of certain functions with quadratic growth at $\infty$.
2. Letting $\mathcal{E} \subseteq \mathcal{C}^\star$ be the closure of the space of laws, it turns out that the set of elements of $\mathcal{E}$ with "second moment" (not operator norm) bounded by $r$ is compact.
3. $\chi_V$ is upper semi-continuous and it goes to $-\infty$ as the "second moment" of $\mu$ goes to $\infty$, and thus we get a maximizer using compactness.
4. Using the change of variables formula for entropy, we deduce that any maximizer $\nu$ satisfies the Dyson-Schwinger equation (for nice enough test functions).
5. Using the Dyson-Schwinger equation, we show iteratively that the moments of $\nu$ are finite, and ultimately that $\nu \in \Sigma_{d,R}$ for some $R$.

Geodesic equations
Definition. The geodesic equations on $\mathcal{W}(\mathbb{R}^{*d})$ are the pair of equations
$\dot{V}_t = L_{V_t} \phi_t$, $\dot{\phi}_t = -\frac{1}{2} \langle \nabla \phi_t, \nabla \phi_t \rangle_{\mathrm{tr}}$.
These can be obtained formally as the large $N$ limit of the geodesic equations for measures on $M_N(\mathbb{C})_{\mathrm{sa}}^d$.
Thinking about the classical case, one is led to conjecture that nice enough solutions must have the form $V_t = (\mathrm{id} + t \nabla \dot{\phi}_0)_* V_0$.
It is straightforward to check that when $\partial \nabla \dot{\phi}_0$ is bounded, this formula defines a solution for small enough $t$. We do not show rigorously that these are the only solutions.

Towards free optimal transport
However, we can show rigorously that these paths minimize length with respect to the $L^2$-coupling distance when $\partial \nabla \dot{\phi}_0$ is bounded by a constant $C$ and when $t \in (0, 1/C)$. This follows from the more general proposition below.
Definition. For two non-commutative laws $\mu$ and $\nu$, we define $d_W(\mu, \nu)$ as the infimum of $\|X - Y\|_2$ over all tracial $C^*$-algebras $(\mathcal{A}, \tau)$ and $X, Y \in \mathcal{A}_{\mathrm{sa}}^d$ such that $\lambda_X = \mu$ and $\lambda_Y = \nu$.
Proposition. Let $\phi \in \operatorname{tr}(C_{\mathrm{tr}}^2(\mathbb{R}^{*d}))_{\mathrm{sa}}$ be such that $\|\partial \nabla \phi - \mathrm{Id}\|_{BC_{\mathrm{tr}}} < 1$. Then for every $(\mathcal{A}, \tau)$ and $X \in \mathcal{A}_{\mathrm{sa}}^d$, we have
$d_W(\lambda_X, \lambda_{\nabla \phi(X)}) = \|X - \nabla \phi(X)\|_{\tau,2}$.
In other words, $X$ and $\nabla \phi(X)$ are an optimal coupling of their respective laws.

Towards free optimal transport
The proof of the proposition is inspired by the classical Monge-Kantorovich duality.
By the inverse function theorem, $\nabla \phi$ has an inverse function, so define
$\psi^{\mathcal{A},\tau}(Z) = \langle Z, ((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z) \rangle_\tau - (\phi \circ (\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z)$.
Note that $Y = ((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z)$ maximizes the function $\langle Z, Y \rangle_\tau - \phi^{\mathcal{A},\tau}(Y)$, by calculus and by convexity of $\phi$. (So $\psi$ is the Legendre transform of $\phi$.)
Thus, $\phi^{\mathcal{A},\tau}(Y) + \psi^{\mathcal{A},\tau}(Z) \ge \langle Y, Z \rangle_\tau$ for all $Y, Z \in \mathcal{A}_{\mathrm{sa}}^d$.
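The Legendre-transform inequality can be checked numerically in the classical scalar case, which is only a commutative analog of the statement above. The sketch assumes the hypothetical example $\phi(y) = y^2/2 + \tfrac{1}{2}\cos y$, for which $|\phi'' - 1| \le 1/2 < 1$, and inverts $\phi'$ by a fixed-point iteration that contracts with constant $1/2$.

```python
import numpy as np


def phi(y):
    # Assumed example potential: phi'' = 1 - 0.5*cos(y), so |phi'' - 1| <= 0.5.
    return 0.5 * y**2 + 0.5 * np.cos(y)


def grad_phi(y):
    return y - 0.5 * np.sin(y)


def grad_phi_inverse(z, iterations=60):
    # Solve y - 0.5*sin(y) = z via the contraction y -> z + 0.5*sin(y).
    z = np.asarray(z, dtype=float)
    y = z.copy()
    for _ in range(iterations):
        y = z + 0.5 * np.sin(y)
    return y


def psi(z):
    # Legendre transform of phi, evaluated at the maximizer y = (phi')^{-1}(z).
    y = grad_phi_inverse(z)
    return z * y - phi(y)


ys = np.linspace(-4.0, 4.0, 81)
zs = np.linspace(-4.0, 4.0, 81)
Y, Z = np.meshgrid(ys, zs)

# Fenchel-Young gap: phi(y) + psi(z) - y*z should be nonnegative everywhere...
gap = phi(Y) + psi(Z) - Y * Z
# ...and should vanish when z = phi'(y).
equality_gap = phi(ys) + psi(grad_phi(ys)) - ys * grad_phi(ys)

print(gap.min(), np.abs(equality_gap).max())
```

The equality case $z = \phi'(y)$ is exactly what is used for the pair $(X, \nabla \phi(X))$ in the final step of the proof.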

Towards free optimal transport
Fix $(\mathcal{A}, \tau)$ and $X \in \mathcal{A}_{\mathrm{sa}}^d$. If $Y, Z$ is any coupling of $\lambda_X$ and $\lambda_{\nabla \phi(X)}$ on some other tracial $C^*$-algebra $(\mathcal{A}', \tau')$, then
$\langle Y, Z \rangle_{\tau'} \le \phi^{\mathcal{A}',\tau'}(Y) + \psi^{\mathcal{A}',\tau'}(Z) = \phi^{\mathcal{A},\tau}(X) + \psi^{\mathcal{A},\tau}(\nabla \phi^{\mathcal{A},\tau}(X)) = \langle X, \nabla \phi^{\mathcal{A},\tau}(X) \rangle_\tau$,
where the middle equality holds because $\phi$ and $\psi$ depend only on the laws of their arguments, and the last equality follows from the definition of $\psi$.
Thus, $X, \nabla \phi(X)$ is a coupling that maximizes the inner product between the first and second variable, which is equivalent to minimizing the $L^2$ distance (since the $L^2$ norms of the two variables are uniquely determined by their laws).
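The conclusion has a familiar classical one-dimensional counterpart that is easy to test on samples. The sketch below is a commutative illustration only, assuming the hypothetical map $\phi'(x) = x - \tfrac{1}{2}\sin x$ (so $|\phi'' - 1| \le 1/2$). In one dimension the optimal quadratic-cost coupling of two empirical measures is the sorted (monotone) matching, and since $\phi'$ is increasing, pairing each $x_i$ with $\phi'(x_i)$ is already that matching.

```python
import numpy as np

rng = np.random.default_rng(0)


def grad_phi(x):
    # Assumed increasing transport map with |phi'' - 1| <= 0.5.
    return x - 0.5 * np.sin(x)


def coupling_cost(x, y):
    # Mean squared distance of the coupling that pairs x[i] with y[i].
    return np.mean((x - y) ** 2)


x = rng.normal(size=200)
y = grad_phi(x)

# Coupling (X, grad_phi(X)) from the proposition.
identity_cost = coupling_cost(x, y)

# Optimal 1D coupling of the two empirical laws: match order statistics.
sorted_cost = coupling_cost(np.sort(x), np.sort(y))

# Other couplings of the same two empirical laws, obtained by permuting y.
random_costs = np.array([coupling_cost(x, rng.permutation(y)) for _ in range(100)])

print(identity_cost, sorted_cost, random_costs.min())
```

Permuting `y` leaves both marginals unchanged, so each permutation is a competing coupling, and none of them should beat the pairing given by the map.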
