Slide 1

Slide 1 text

Free Wasserstein manifold
Wuchen Li, University of South Carolina
Free probability seminar, UC Berkeley
Based on joint work with David Jekel (UCSD) and Dimitri Shlyakhtenko (UCLA).

Slide 2

Slide 2 text

Entropy, Fisher information and transportation
In recent years, there has been active joint study connecting entropy, Fisher information, and transportation. These connections now have applications in functional inequalities and in the dynamical behavior of fluid dynamics. In this talk, we discuss transportation theory in free probability by studying “log-densities” for non-commutative random variables.

Slide 3

Slide 3 text

Metric in probability space

Slide 4

Slide 4 text

Brownian motion and heat equations
Consider a standard Brownian motion in $\mathbb{R}^d$ given by $dX_t = \sqrt{2}\,dB_t$. Let $\rho(t,x)$ denote the probability density function of $X_t$. Then $\rho$ satisfies the heat equation
$$\frac{\partial \rho_t}{\partial t} = \nabla\cdot(\nabla\rho_t) = \Delta\rho_t,$$
where $\rho_t = \rho(t,x)$, and $\nabla\cdot$, $\nabla$ are the divergence and gradient operators in $\mathbb{R}^d$, respectively.

Slide 5

Slide 5 text

Entropy dissipation
Consider the negative Boltzmann-Shannon entropy
$$H(\rho) = \int_{\mathbb{R}^d} \rho(x)\log\rho(x)\,dx.$$
Along the heat equation, the dissipation relation holds:
$$\frac{d}{dt}H(\rho_t) = -\int_{\mathbb{R}^d} \|\nabla_x \log\rho_t\|^2\rho_t\,dx = -I(\rho_t),$$
where $I(\rho)$ is called the Fisher information functional. There is a formulation behind this relation: $\rho(t,x)$ is a gradient flow of entropy in optimal transport space.
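The dissipation relation can be checked by hand on a Gaussian. The sketch below assumes $d = 1$ and $X_0 \sim N(0, 1)$ (an illustrative choice, not from the talk), so that $\rho_t = N(0, 1 + 2t)$, and compares a finite-difference derivative of $H$ with $-I$. All function names are hypothetical.

```python
import math

def neg_entropy_gaussian(var):
    """H(rho) = integral of rho*log(rho) for rho = N(0, var) on R."""
    return -0.5 * math.log(2.0 * math.pi * math.e * var)

def fisher_information_gaussian(var):
    """I(rho) = integral of |d/dx log rho|^2 * rho for rho = N(0, var)."""
    return 1.0 / var

def variance_at(t, var0=1.0):
    """Variance of X_t when dX_t = sqrt(2) dB_t and X_0 ~ N(0, var0)."""
    return var0 + 2.0 * t

def dissipation_gap(t, h=1e-5):
    """Central finite difference of H along the heat flow, plus I; should be close to 0."""
    dH = (neg_entropy_gaussian(variance_at(t + h))
          - neg_entropy_gaussian(variance_at(t - h))) / (2.0 * h)
    return dH + fisher_information_gaussian(variance_at(t))
```

Here `dissipation_gap(t)` approximates $\frac{d}{dt}H(\rho_t) + I(\rho_t)$, which the relation above says is zero.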

Slide 6

Slide 6 text

Optimal transport
What is the optimal way to move or transport a mountain with shape $X$ and density $\rho_0(x)$ to another shape $Y$ with density $\rho_1(y)$? Consider
$$\mathrm{Dist}_T(\rho_0,\rho_1) = \inf_T \int_{\mathbb{R}^d} \|x - T(x)\|^2\rho_0(x)\,dx,$$
where the infimum is over all transport maps $T$ that transfer $\rho_0(x)$ to $\rho_1(x)$, i.e.
$$\rho_0(x) = \rho_1(T(x))\det(\nabla T(x)).$$

Slide 7

Slide 7 text

Overview
The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich in 1942. It introduces a distance on the space of probability distributions, called the optimal transport distance, Wasserstein distance, or Earth Mover’s distance. There are many viewpoints on and applications of this distance:
- Linear programming;
- Mapping/Monge-Ampère equation;
- Fluid dynamics;
- Density manifold (Arnold mechanics).
See Ambrosio, Villani, Otto and many more (Gangbo et al.).
In the first part of this talk, I focus on the transportation formulation in classical probability. Then David will present the formulation for free probability.

Slide 8

Slide 8 text

Transport distance formulations
There is a relaxed formulation of the classical optimal transport distance:
$$\inf_\pi \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \|x - y\|^2\pi(x,y)\,dx\,dy,$$
where the infimum is taken over all joint measures (transport plans) $\pi(x,y)$ having $\rho_0(x)$ and $\rho_1(y)$ as marginals, i.e.
$$\int_{\mathbb{R}^d}\pi(x,y)\,dy = \rho_0(x), \quad \int_{\mathbb{R}^d}\pi(x,y)\,dx = \rho_1(y), \quad \pi(x,y)\ge 0.$$
Here $\pi = (\mathrm{id}, T)_\#\rho_0$ with $T(x) = x + \nabla\Phi(0,x)$.
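For two empirical measures on the real line with the same number of equally weighted points, the relaxed problem is attained by a permutation, and for the quadratic cost the monotone (sorted) matching is optimal. The sketch below illustrates this in that special case only; the function names are hypothetical and the brute-force search is there purely as a cross-check on tiny inputs.

```python
from itertools import permutations

def transport_cost(xs, ys):
    """Average quadratic cost of matching xs[i] with ys[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def w2_squared_sorted(xs, ys):
    """Cost of the monotone matching between two equal-size samples on R."""
    return transport_cost(sorted(xs), sorted(ys))

def w2_squared_bruteforce(xs, ys):
    """Minimum cost over all assignments; only feasible for very small inputs."""
    return min(transport_cost(xs, perm) for perm in permutations(ys))
```

For example, translating every point by a constant $c$ gives cost $c^2$, and the sorted matching agrees with the exhaustive minimum.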

Slide 9

Slide 9 text

Dynamical optimal transport

Slide 10

Slide 10 text

Optimal transport space (Density manifold)
Optimal transport has a variational formulation (Benamou-Brenier 2000):
$$\inf_v \int_0^1 \mathbb{E}_{X_t\sim\rho_t}\|v(t,X_t)\|^2\,dt,$$
where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that
$$\dot X_t = v(t,X_t), \quad X_0\sim\rho_0, \quad X_1\sim\rho_1.$$
Under this metric, the probability set has a Riemannian geometry structure [1].
[1] John D. Lafferty: The density manifold and configuration space quantization, 1988.

Slide 11

Slide 11 text

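Riemannian metric for optimal transport
Informally speaking, the Wasserstein metric refers to the following bilinear form:
$$\langle\dot\rho_1, G(\rho)\dot\rho_2\rangle = \int \big(\dot\rho_1, (-\Delta_\rho)^{-1}\dot\rho_2\big)\,dx, \quad \Delta_\rho = \nabla\cdot(\rho\nabla).$$
In other words, denote $\dot\rho_i = -\nabla\cdot(\rho\nabla\Phi_i)$, $i = 1, 2$; then
$$\langle\Phi_1, G(\rho)^{-1}\Phi_2\rangle = \langle\Phi_1, -\nabla\cdot(\rho\nabla)\Phi_2\rangle,$$
where $\rho\in P(\Omega)$, $\dot\rho_i$ is a tangent vector in $P(\Omega)$ with $\int\dot\rho_i\,dx = 0$, and $\Phi_i\in C^\infty(\Omega)$ are cotangent vectors in $P(\Omega)$ at the point $\rho$. Here $\nabla\cdot$, $\nabla$ are the standard divergence and gradient operators in $\Omega$.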

Slide 12

Slide 12 text

Optimal transport gradient flows
The Wasserstein gradient flow of an energy functional $F(\rho)$ leads to
$$\partial_t\rho = -G(\rho)^{-1}\delta_\rho F(\rho) = \nabla\cdot(\rho\nabla\delta_\rho F(\rho)).$$
Example: If $F(\rho) = \int F(x)\rho(x)\,dx$, then the gradient flow is
$$\partial_t\rho = \nabla\cdot(\rho\nabla F(x)).$$

Slide 13

Slide 13 text

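Entropy dissipation revisited
The gradient flow of the negative entropy
$$H(\rho) = \int_{\mathbb{R}^d}\rho(x)\log\rho(x)\,dx$$
with respect to the optimal transport metric satisfies
$$\frac{\partial\rho}{\partial t} = \nabla\cdot(\rho\nabla\log\rho) = \Delta\rho.$$
Here the key trick is that $\rho\nabla\log\rho = \nabla\rho$. In this way, one can study the entropy dissipation by
$$\frac{d}{dt}H(\rho) = \int_{\mathbb{R}^d}\log\rho\,\nabla\cdot(\rho\nabla\log\rho)\,dx = -\int_{\mathbb{R}^d}\|\nabla\log\rho\|^2\rho\,dx.$$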

Slide 14

Slide 14 text

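Optimal transport Hamiltonian flows
Consider the Lagrangian
$$L(\rho,\partial_t\rho) = \frac{1}{2}\int\Big(\partial_t\rho, \big(-\nabla\cdot(\rho\nabla)\big)^{-1}\partial_t\rho\Big)\,dx - F(\rho).$$
The Hamiltonian flow satisfies the Euler-Lagrange equation
$$\frac{d}{dt}\delta_{\partial_t\rho}L(\rho,\partial_t\rho) = \delta_\rho L(\rho,\partial_t\rho).$$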

Slide 15

Slide 15 text

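Optimal transport Hamiltonian flows
The Hamiltonian is obtained by the Legendre transform, i.e.
$$H(\rho,\Phi) = \sup_{\partial_t\rho}\int\partial_t\rho\,\Phi\,dx - L(\rho,\partial_t\rho).$$
The Hamiltonian system is then
$$\partial_t\rho = \delta_\Phi H(\rho,\Phi), \quad \partial_t\Phi = -\delta_\rho H(\rho,\Phi),$$
where $\delta_\rho$, $\delta_\Phi$ are the $L^2$ first variation operators with respect to $\rho$, $\Phi$, respectively, and the density Hamiltonian takes the form
$$H(\rho,\Phi) = \frac{1}{2}\int\|\nabla\Phi\|^2\rho\,dx + F(\rho).$$
Here $\rho$ is the “density” state variable and $\Phi$ is the “density” momentum variable.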

Slide 16

Slide 16 text

Hamiltonian flows: Compressible Euler equation
More explicitly,
$$\begin{cases}\partial_t\rho + \nabla\cdot(\rho\nabla\Phi) = 0,\\ \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 = -\delta_\rho F(\rho).\end{cases}$$

Slide 17

Slide 17 text

Why optimal transport formalisms?
- Generalized log-Sobolev inequalities and bounds;
- Generalized dynamics: e.g., the Schrödinger equation, the Schrödinger bridge problem, and mean field games;
- Generalized dualities and distances in information theory and AI.

Slide 18

Slide 18 text

Log density coordinates
As we will see in the second talk, it is more natural in the random matrix and free setting to study the log-density rather than the density. Thus, let us describe what happens in the classical case when we write everything in terms of the log-density, which introduces an alternative coordinate system for the classical manifold of densities:
$$(\rho,\Phi)\to(e^{-V},\Phi), \quad\text{where } \int e^{-V}\,dx = 1.$$
All formalisms of optimal transport need to be adjusted accordingly.

Slide 19

Slide 19 text

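Log density change of variable
Consider
$$\partial_t\rho(t,x) = -\nabla\cdot(\rho(t,x)\nabla\Phi(t,x)).$$
Denote $\rho(t,x) = e^{-V(t,x)}$. Then
$$\partial_t V = -\partial_t\log\rho = -\frac{\partial_t\rho}{\rho} = \frac{\nabla\cdot(\rho\nabla\Phi)}{\rho} = (\nabla\log\rho,\nabla\Phi) + \Delta\Phi = -(\nabla V,\nabla\Phi) + \Delta\Phi =: L_V\Phi.$$
Here $L_V = -(\nabla V,\nabla) + \Delta$.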

Slide 20

Slide 20 text

Log density gradient flows
Similarly, we can formulate the gradient flow in terms of log densities. Consider
$$\partial_t\rho = \nabla\cdot(\rho\nabla\delta_\rho F(\rho)).$$
Denote $\rho = e^{-V}$. Then
$$\partial_t V = -L_V\,\delta_\rho F(\rho)\big|_{\rho = e^{-V}}.$$
Example: If $F(\rho) = \int F(x)\rho(x)\,dx = \int F(x)e^{-V(x)}\,dx$, then the gradient flow is
$$\partial_t V = -L_V F(x).$$

Slide 21

Slide 21 text

Log density entropy dissipation
The gradient flow of the negative entropy
$$H(\rho) = \int\rho\log\rho\,dx = -\int V e^{-V}\,dx$$
with respect to the optimal transport metric satisfies
$$\partial_t V = L_V V = -\|\nabla V\|^2 + \Delta V.$$
In this log density coordinate, one can study the entropy dissipation by
$$\frac{d}{dt}H(\rho) = -\int_{\mathbb{R}^d}\|\nabla V\|^2 e^{-V}\,dx.$$
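As a quick numerical illustration of the log-density heat flow $\partial_t V = \Delta V - \|\nabla V\|^2$, the sketch below takes the one-dimensional Gaussian solution $\rho_t = N(0, 1 + 2t)$ (an assumed test case, not from the talk) and checks the equation with central finite differences. The names `V` and `residual` are hypothetical.

```python
import math

def V(t, x, var0=1.0):
    """V = -log(rho_t) for rho_t = N(0, var0 + 2t), the heat flow started at N(0, var0)."""
    var = var0 + 2.0 * t
    return x * x / (2.0 * var) + 0.5 * math.log(2.0 * math.pi * var)

def residual(t, x, h=1e-4):
    """dV/dt - (V_xx - V_x^2), all by central differences; should be close to 0."""
    dVdt = (V(t + h, x) - V(t - h, x)) / (2.0 * h)
    Vx = (V(t, x + h) - V(t, x - h)) / (2.0 * h)
    Vxx = (V(t, x + h) - 2.0 * V(t, x) + V(t, x - h)) / (h * h)
    return dVdt - (Vxx - Vx * Vx)
```

In this example $V_x = x/\sigma_t^2$ and $V_{xx} = 1/\sigma_t^2$ with $\sigma_t^2 = 1 + 2t$, so both sides equal $1/\sigma_t^2 - x^2/\sigma_t^4$.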

Slide 22

Slide 22 text

Log density Hamiltonian flows
Similarly, we can formulate the Hamiltonian flow in terms of log densities:
$$\begin{cases}\partial_t V - L_V\Phi = 0,\\ \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 = -\delta_\rho F(\rho)\big|_{\rho = e^{-V}}.\end{cases}$$

Slide 23

Slide 23 text

The free Wasserstein manifold David Jekel, Wuchen Li, Dima Shlyakhtenko February 22, 2021

Slide 24

Slide 24 text

This work was supported in part by the National Science Foundation. The talk will focus on the big picture and thus precise definitions will only be given when helpful. Before the rigorous statements, there will be several slides of motivation and introduction aimed at people who are familiar with free probability. Feel free to interrupt with questions about notation.

Slide 25

Slide 25 text

Motivation
The classical Wasserstein manifold $P(M)$, whose points are smooth positive probability densities, is an infinite-dimensional Riemannian framework which nicely describes things such as entropy, the heat equation, log-Sobolev inequalities, optimal transport, measure-preserving transformations, etc. Because many of these notions have analogs in free probability and random matrix theory, we want to define the tracial non-commutative version of the Wasserstein manifold, in which we use non-commutative laws instead of measures. A big obstacle is that we don’t have a direct analog of density in the non-commutative setting. However, there are several indications that the log-density is a better behaved notion.

Slide 26

Slide 26 text

Motivation: random matrices
Given some self-adjoint non-commutative polynomial $f$ (with nice enough behavior at $\infty$), we can define a function $V^{(N)}: M_N(\mathbb{C})^d_{\mathrm{sa}}\to\mathbb{R}$ by $V^{(N)}(x) = \mathrm{tr}_N(f(x))$, where $x = (x_1,\dots,x_d)$ is a $d$-tuple of self-adjoint $N\times N$ matrices and $\mathrm{tr}_N = (1/N)\mathrm{Tr}$ is the normalized trace on $M_N(\mathbb{C})$. Then we define a probability measure $\mu^{(N)}$ on $M_N(\mathbb{C})^d_{\mathrm{sa}}$ by
$$d\mu^{(N)}(x) = \mathrm{constant}(V,N)\,e^{-N^2V^{(N)}(x)}\,dx,$$
where $dx$ is Lebesgue measure. Letting $X^{(N)}$ be a random matrix tuple chosen according to the measure $\mu^{(N)}$, a lot of past work has shown in certain cases that for every non-commutative polynomial $p$, $\mathrm{tr}_N(p(X^{(N)}))$ converges almost surely to some deterministic limit, which is described by $\tau(p(X))$ for some $d$-tuple $X$ from a tracial von Neumann algebra $(A,\tau)$. Then we might want to say that “$\mathrm{tr}(f)$ is a log-density of the distribution of $X$.”
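The simplest instance of this convergence can be simulated directly. The sketch below assumes $d = 1$ and the quadratic potential $f(x) = x^2/2$, for which $\mu^{(N)}$ is the Gaussian unitary ensemble and the limiting law is the semicircle law with second moment 1 and fourth moment 2. It uses only the standard library, and all names are hypothetical.

```python
import math
import random

def sample_gue(n, rng):
    """Hermitian n x n matrix with density proportional to exp(-(n/2) Tr x^2),
    that is, exp(-n^2 * tr_n(f(x))) for the assumed potential f(x) = x^2 / 2."""
    x = [[0j] * n for _ in range(n)]
    off_diag_std = 1.0 / math.sqrt(2.0 * n)
    for i in range(n):
        x[i][i] = complex(rng.gauss(0.0, 1.0) / math.sqrt(n), 0.0)
        for j in range(i + 1, n):
            z = complex(rng.gauss(0.0, off_diag_std), rng.gauss(0.0, off_diag_std))
            x[i][j] = z
            x[j][i] = z.conjugate()
    return x

def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    cols = list(zip(*b))
    return [[sum(a[i][k] * cols[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalized_trace(a):
    """tr_n(a) = (1/n) Tr(a); the trace of a Hermitian matrix is real."""
    return sum(a[i][i] for i in range(len(a))).real / len(a)

def second_and_fourth_moments(n=60, seed=0):
    """Return (tr_n(X^2), tr_n(X^4)) for one GUE sample of size n."""
    x = sample_gue(n, random.Random(seed))
    x2 = matmul(x, x)
    return normalized_trace(x2), normalized_trace(matmul(x2, x2))
```

Already for a single sample of moderate size, the two normalized traces should be close to the semicircular values 1 and 2, with fluctuations of order $1/N$.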

Slide 27

Slide 27 text

Motivation: free score function
This idea is closely related to Voiculescu’s idea of a free score function (a.k.a. conjugate variable). The classical score function of a measure with density $\rho$ on $\mathbb{R}^d$ is $-\nabla\log\rho$. If $X$ is a random variable with density $\rho$ and if $\xi = -(\nabla\log\rho)(X)$, then we have the integration-by-parts relation
$$\mathbb{E}[\langle\xi, f(X)\rangle] = \mathbb{E}[\mathrm{Tr}(Df(X))] \quad\text{for all } f\in C_c^\infty(\mathbb{R}^d,\mathbb{R}^d),$$
where $Df$ is the Jacobian matrix. Given a tracial von Neumann algebra $(A,\tau)$ generated by $X\in A^d_{\mathrm{sa}}$, we say that $\xi\in A^d_{\mathrm{sa}}$ is a free score function for $X$ if
$$\langle\xi, p(X)\rangle_\tau = \tau\otimes\tau\otimes\mathrm{Tr}(\mathcal{J}p)$$
for every non-commutative polynomial $p$, where $\mathcal{J}p$ is the matrix of derivatives of $p$ in the sense of Voiculescu’s free difference quotient.

Slide 28

Slide 28 text

Motivation: free score function
In the setting of the random matrix $d$-tuples $X^{(N)}$ above given by $V^{(N)}(x) = \mathrm{tr}_N(f(x))$, the free score function becomes very concrete. Since $\rho = \mathrm{const}\cdot e^{-N^2V^{(N)}}$, the classical score function is (up to normalization) $\nabla V^{(N)}(x)$, which after some matrix computations works out to $\mathcal{D}f(x)$, where $\mathcal{D}f$ is Voiculescu’s cyclic gradient. Classical integration by parts plus some matrix computations tell us that
$$\mathbb{E}[\langle\nabla V^{(N)}(X^{(N)}), p(X^{(N)})\rangle_{\mathrm{tr}_N}] = \mathbb{E}[\mathrm{tr}_N\otimes\mathrm{tr}_N\otimes\mathrm{Tr}_d[\mathcal{J}p(X^{(N)})]].$$
Thus, if $X$ is the $d$-tuple of self-adjoint operators from $(A,\tau)$ describing the large-$N$ limit, then $\xi = \mathcal{D}f(X)$ is the free score function for $X$. The existence of a free score means that “the gradient of the log-density makes sense as an element of $L^2$.” This motivates us to make the log-density the central object of study.

Slide 29

Slide 29 text

Overview
We’re going to define the free Wasserstein manifold $\mathscr{W}(\mathbb{R}^{*d})$ as the space of certain “log-density” functions $V$, which are a generalization of things like $\mathrm{tr}(f)$. More precisely, $V$ will come from a space of tracial non-commutative smooth functions that are defined in terms of trace polynomials. (Similar spaces were defined in Dabrowski, Guionnet, and Shlyakhtenko’s 2016 preprint on free transport, but we take a different approach to the norms.) The tangent space of $\mathscr{W}(\mathbb{R}^{*d})$ at $V$ is similarly a space of tracial non-commutative smooth functions $W$, which are viewed as perturbations of $V$.

Slide 30

Slide 30 text

Overview
In order to obtain a Riemannian metric for the Wasserstein manifold, we must associate a non-commutative law $\mu_V$ to each $V$. This is much trickier than in the classical case (where we would just set $d\mu_V(x) = \mathrm{constant}\cdot e^{-V(x)}\,dx$), but there are two known methods for doing this:
(1) For each $V$, find a (hopefully unique) law $\mu$ that maximizes $\chi(\mu) - \mu(V)$, where $\chi$ is Voiculescu’s free microstate entropy. This approach is inspired by Voiculescu and is closely related to the random matrix models discussed before.
(2) Set up the free stochastic differential equation $dX_t = dS_t - (1/2)\nabla V(X_t)\,dt$, where $S_t$ is a free Brownian motion (still a self-adjoint $d$-tuple), and (hopefully) recover $\mu_V$ as the limiting distribution of $X_t$ as $t\to\infty$. This approach was pioneered by Biane and Speicher (1999) and further developed by Guionnet, Shlyakhtenko, and Dabrowski.
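Method (2) has a familiar classical counterpart: the Langevin equation $dX_t = dB_t - \tfrac{1}{2}V'(X_t)\,dt$ has stationary density proportional to $e^{-V}$. The sketch below simulates this classical scalar equation, not the free SDE, with an Euler-Maruyama scheme for the assumed test potential $V(x) = x^2/2$, whose stationary law is $N(0,1)$. All names and parameters are hypothetical.

```python
import math
import random

def grad_V(x):
    """Derivative of the assumed test potential V(x) = x^2 / 2."""
    return x

def langevin_sample(steps, dt, rng):
    """Euler-Maruyama for dX = dB - (1/2) V'(X) dt, started at X_0 = 0."""
    x = 0.0
    for _ in range(steps):
        x += -0.5 * grad_V(x) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

def empirical_variance(paths=2000, steps=400, dt=0.02, seed=0):
    """Sample variance of X_T over independent paths, with T = steps * dt."""
    rng = random.Random(seed)
    xs = [langevin_sample(steps, dt, rng) for _ in range(paths)]
    mean = sum(xs) / paths
    return sum((x - mean) ** 2 for x in xs) / paths
```

With these settings the final time is $T = 8$, long enough for the variance to be close to the stationary value 1 up to discretization and sampling error.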

Slide 31

Slide 31 text

Outline
- Background on non-commutative laws.
- Tracial non-commutative smooth functions.
- Free Wasserstein manifold and diffeomorphism group.
- Riemannian metric.
- Strategy to construct smooth transport.
- Inversion of the Laplacian $L_V$ through heat semigroup and SDE.
- Free Gibbs laws through maximization of $\chi(\mu) - \mu(V)$.
- Geodesics and optimal transport.

Slide 32

Slide 32 text

Operator algebras and laws
Definition: A unital C*-algebra is a subalgebra of $B(H)$ (for some Hilbert space $H$) that is closed under adjoints and limits in operator norm.
Definition: A tracial C*-algebra is a pair $(A,\tau)$ where $A$ is a C*-algebra and $\tau$ is a faithful trace, that is, $\tau(1) = 1$, $\tau(a^*a)\ge 0$ with equality if and only if $a = 0$, and $\tau(ab) = \tau(ba)$.
Remark: We don’t need to go into the definition of von Neumann algebras now. But every tracial C*-algebra can be completed to a tracial von Neumann algebra.

Slide 33

Slide 33 text

Operator algebras and laws
Definition: Let $\mathbb{C}\langle x_1,\dots,x_d\rangle$ be the algebra of non-commutative polynomials equipped with the $*$-operation such that $x_j^* = x_j$. (We typically use $x$ to denote formal or generic variables and $X$ to denote a specific tuple of operators.)
Definition: Let $\Sigma_{d,R}$ be the set of linear functionals $\lambda:\mathbb{C}\langle x_1,\dots,x_d\rangle\to\mathbb{C}$ such that
$$\lambda(1) = 1, \quad \lambda(p^*p)\ge 0, \quad \lambda(pq) = \lambda(qp), \quad |\lambda(x_{i_1}\cdots x_{i_k})|\le R^k,$$
equipped with the weak-* topology (as a subset of the dual of $\mathbb{C}\langle x_1,\dots,x_d\rangle$). We call elements of $\Sigma_{d,R}$ non-commutative laws with exponential bound $R$.

Slide 34

Slide 34 text

Operator algebras and laws
Proposition: If $(A,\tau)$ is a tracial C*-algebra and $X = (X_1,\dots,X_d)\in A^d_{\mathrm{sa}}$, then the map $\lambda_X:\mathbb{C}\langle x_1,\dots,x_d\rangle\to\mathbb{C}$, $p\mapsto\tau(p(X))$ is a non-commutative law with exponential bound $\|X\|_\infty = \max_j\|X_j\|$. Conversely, every $\lambda\in\Sigma_{d,R}$ can be realized as $\lambda_X$ for some $(A,\tau)$ and $X\in A^d_{\mathrm{sa}}$ with $\|X\|_\infty\le R$.
The proof is a variant of the GNS construction. The proposition can be interpreted as follows:
1. $\Sigma_{d,R}$ is the space of traces on the C*-universal free product $C([-R,R])^{*d}$.
2. $\Sigma_{d,R}$ is in bijection with isomorphism classes of triples $(A,\tau,X)$, where $(A,\tau)$ is a tracial C*-algebra and $X\in A^d_{\mathrm{sa}}$ generates $A$; here isomorphism means a C*-isomorphism that preserves the trace and generators.

Slide 35

Slide 35 text

Trace polynomials
We next describe non-commutative functions that are modeled on trace polynomials in a similar spirit to Dabrowski-Guionnet-Shlyakhtenko 2016. A trace polynomial in $(x_1,\dots,x_d)$ is an expression formed through addition, multiplication, and application of a symbol $\mathrm{tr}$, such as
$$f(x_1,x_2,x_3) = \mathrm{tr}(x_1^2x_2)\,\mathrm{tr}(x_3)\,x_1 + \mathrm{tr}(x_2x_3) + 5\,\mathrm{tr}(x_1x_2x_3)\,\mathrm{tr}(x_1)\,x_2x_3^2.$$
These expressions are considered modulo the relations $\mathrm{tr}(pq) = \mathrm{tr}(qp)$ and $\mathrm{tr}(\mathrm{tr}(p)q) = \mathrm{tr}(p)\,\mathrm{tr}(q)$. For any tracial C*-algebra $(A,\tau)$ and $X\in A^d_{\mathrm{sa}}$, we can evaluate a trace polynomial $f$ on $X$ by substituting $X_j$ for the formal symbol $x_j$ and $\tau$ for the formal symbol $\mathrm{tr}$. Hence, a trace polynomial $f$ gives rise to a function $f^{A,\tau}: A^d_{\mathrm{sa}}\to A$.

Slide 36

Slide 36 text

Trace polynomials
We next describe non-commutative functions that are modeled on trace polynomials. A trace polynomial in $(x_1,\dots,x_d)$ is an expression formed through addition, multiplication, and application of a symbol $\mathrm{tr}$, such as
$$f(x_1,x_2,x_3) = \mathrm{tr}(x_1^2x_2)\,\mathrm{tr}(x_3)\,x_1 + \mathrm{tr}(x_2x_3) + 5\,\mathrm{tr}(x_1x_2x_3)\,\mathrm{tr}(x_1)\,x_2x_3^2.$$
These expressions are considered modulo the relations $\mathrm{tr}(pq) = \mathrm{tr}(qp)$ and $\mathrm{tr}(\mathrm{tr}(p)q) = \mathrm{tr}(p)\,\mathrm{tr}(q)$. For any tracial C*-algebra $(A,\tau)$ and $X\in A^d_{\mathrm{sa}}$, we can evaluate a trace polynomial $f$ on $X$ by substituting $X_j$ for the formal symbol $x_j$ and $\tau$ for the formal symbol $\mathrm{tr}$. Hence, a trace polynomial $f$ gives rise to a function $f^{A,\tau}: A^d_{\mathrm{sa}}\to A$.
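To make the evaluation rule concrete, the sketch below evaluates the example trace polynomial in $(M_2(\mathbb{C}), \mathrm{tr}_2)$ on real symmetric $2\times 2$ matrices, reading the scalar term $\mathrm{tr}(x_2x_3)$ as a multiple of the identity. It also lets one check that the resulting function commutes with conjugation by a rotation, as any trace polynomial must. The helper names are hypothetical and only the standard library is used.

```python
import math

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def mul(a, b):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def tr(a):
    """Normalized trace tr_2 = (1/2) Tr."""
    return (a[0][0] + a[1][1]) / 2.0

def f(x1, x2, x3):
    """tr(x1^2 x2) tr(x3) x1 + tr(x2 x3) + 5 tr(x1 x2 x3) tr(x1) x2 x3^2."""
    term1 = scale(tr(mul(mul(x1, x1), x2)) * tr(x3), x1)
    term2 = scale(tr(mul(x2, x3)), IDENTITY)
    term3 = scale(5.0 * tr(mul(mul(x1, x2), x3)) * tr(x1), mul(x2, mul(x3, x3)))
    return add(add(term1, term2), term3)

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def conjugate(u, a):
    """u a u^T for a real orthogonal u."""
    ut = [[u[j][i] for j in range(2)] for i in range(2)]
    return mul(mul(u, a), ut)
```

For any rotation `u`, evaluating `f` on the conjugated tuple gives the same matrix as conjugating `f(x1, x2, x3)`, up to rounding.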

Slide 37

Slide 37 text

Trace polynomials
Trace polynomials have several advantages over non-commutative polynomials.
1. It follows from the work of Procesi (1976) that every function $M_N(\mathbb{C})^d_{\mathrm{sa}}\to M_N(\mathbb{C})$ that is entrywise polynomial and is invariant under unitary conjugation must be given by a trace polynomial.
2. For each trace polynomial $f$, we can compute the Laplacian of $f^{M_N(\mathbb{C}),\mathrm{tr}_N}$ as a function on $M_N(\mathbb{C})^d_{\mathrm{sa}}$ (equipped with the inner product from $\mathrm{tr}_N$). The Laplacian $(1/N^2)\Delta f^{M_N(\mathbb{C}),\mathrm{tr}_N}$ is a trace polynomial and it converges coefficientwise as $N\to\infty$ to some trace polynomial $Lf$.
We’ll define the non-commutative space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ roughly as functions such that the first $k$ derivatives can be approximated on operator-norm balls by trace polynomials.

Slide 38

Slide 38 text

Description of trace $C^k$ functions
The space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ is described as follows:
- Each $f\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ is a collection of functions $f^{A,\tau}: A^d_{\mathrm{sa}}\to A$ for tracial C*-algebras $(A,\tau)$.
- $f^{A,\tau}$ must be a $C^k$ function in the sense of Fréchet differentiation. The derivative $\partial^k f^{A,\tau}(X)$ is a multilinear map $A^d_{\mathrm{sa}}\times\cdots\times A^d_{\mathrm{sa}}\to A$.
- Inspired by the non-commutative Hölder’s inequality, we define the norm $\|\partial^j f^{A,\tau}(X)\|_{\mathscr{M}^j}$ as the smallest constant such that
$$\|\partial^j f^{A,\tau}(X)[Y_1,\dots,Y_j]\|_p\le\|\partial^j f^{A,\tau}(X)\|_{\mathscr{M}^j}\,\|Y_1\|_{p_1}\cdots\|Y_j\|_{p_j},$$
where $1/p = 1/p_1 + \cdots + 1/p_j$, and where $j = 0,\dots,k$.
- Then $\|\partial^j f\|_{\mathscr{M}^j,R}$ is the supremum of $\|\partial^j f^{A,\tau}(X)\|_{\mathscr{M}^j}$ over $(A,\tau)$ and $X\in A^d_{\mathrm{sa}}$ with $\|X\|_\infty\le R$.
- For $R > 0$ and $j\le k$, we assume that $\|\partial^j f\|_{\mathscr{M}^j,R}$ is finite and that $\partial^j f$ can be approximated in this norm by trace polynomials of $X, Y_1,\dots,Y_j$ that are multilinear in $Y_1,\dots,Y_j$.

Slide 39

Slide 39 text

Properties of trace $C^k$ functions
There are also spaces $C_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}^j(\mathbb{R}^{*d_1},\dots,\mathbb{R}^{*d_n}))$ of functions where $f^{A,\tau}(X)$ is a multilinear map $A^{d_1}_{\mathrm{sa}}\times\cdots\times A^{d_n}_{\mathrm{sa}}\to A$. The exact definition of the space is less important than the properties:
- These spaces are closed under composition, whenever the composition makes sense, and they satisfy the chain rule.
- There is an inverse function theorem: If $f$ is a self-adjoint $d$-tuple of $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ functions, and if $\partial f - \mathrm{Id}$ is uniformly bounded by a constant $c < 1$, then $f^{-1}$ is defined and is $C^k_{\mathrm{tr}}$.
- There is a trace map $\mathrm{tr}: C^k_{\mathrm{tr}}(\mathbb{R}^{*d})\to C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ given by $\mathrm{tr}(f)^{A,\tau}(X) = \tau(f^{A,\tau}(X))$. The image $\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ consists of those $f$ which are scalar-valued.

Slide 40

Slide 40 text

Examples of $C^k_{\mathrm{tr}}$ functions
Of course, trace polynomials are $C^\infty_{\mathrm{tr}}$ functions. If $\phi:\mathbb{R}\to\mathbb{R}$ is such that $\int_{\mathbb{R}}|2\pi s|^k|\widehat{\phi}(s)|\,ds < \infty$, then the function $f^{A,\tau}(X) = \phi(X)$ (defined by functional calculus) is in $C^k_{\mathrm{tr}}(\mathbb{R}^{*1})$ and the $k$th derivative is bounded by $\int_{\mathbb{R}}|2\pi s|^k|\widehat{\phi}(s)|\,ds$. Together with the chain rule, this shows that there is an abundance of $BC^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ functions, that is, functions in $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ such that
$$\|\partial^j f\|_{\mathscr{M}^j,u} := \sup_{R>0}\|\partial^j f\|_{\mathscr{M}^j,R} < \infty.$$
Imposing certain growth conditions at $\infty$ on a $C_{\mathrm{tr}}(\mathbb{R}^{*d})$ function is not a big restriction. This makes life easier than it would be if we only used trace polynomials.

Slide 41

Slide 41 text

Differentiation of trace $C^k$ functions
For scalar-valued $g\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$, we can define a gradient $\nabla g\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$. In the case where $g = \mathrm{tr}(p)$ for some non-commutative polynomial $p$, $\nabla g$ is the cyclic gradient of $p$.
The analog of $C^k$ functions from $\mathbb{R}^d$ to $M_d(\mathbb{C})$ is the space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$. This is the space that contains the derivative $\partial f$ when $f\in C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{\mathrm{sa}}$, as well as the Hessian of $g$ when $g$ is a scalar-valued element of $C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})$.
For $F\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$ and $X\in A^d_{\mathrm{sa}}$, the object $F^{A,\tau}(X)$ is a linear transformation $A^d\to A^d$. We define $F\#G$ to be the pointwise composition of these linear transformations. It turns out that $C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$ is a $*$-algebra with respect to the $\#$-multiplication.

Slide 42

Slide 42 text

Differentiation of trace $C^k$ functions
We can define a trace $\mathrm{Tr}_\#: C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ by
$$(\mathrm{Tr}_\#(F))^{A,\tau}(X) = \langle S, F^{A*B,\tau*\sigma}(X)[S]\rangle_{\tau*\sigma},$$
where $(B,\sigma)$ is the tracial C*-algebra generated by a free semicircular $d$-tuple $S$.
This is the analog of the map $C^k(\mathbb{R}^d, M_d(\mathbb{C}))\to C^k(\mathbb{R}^d)$ defined by pointwise application of the trace $\mathrm{Tr}_d$ on $M_d(\mathbb{C})$. This is because the trace of a matrix $A$ can be expressed as $\mathbb{E}\langle Y, AY\rangle$ where $Y$ is a standard Gaussian random vector in $\mathbb{R}^d$, and the analog of the Gaussian in free probability is the semicircular family.
Another motivating example is that if $F\in C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))$ is given by $F^{A,\tau}(X)[Y]_i = \sum_j p_{i,j}^{A,\tau}(X)\,Y_j\,q_{i,j}^{A,\tau}(X)$ for some matrix $(p_{i,j}\otimes q_{i,j})_{i,j}$ of non-commutative polynomials, then
$$\mathrm{Tr}_\#(F)^{A,\tau}(X) = \sum_i\tau(p_{i,i}(X))\,\tau(q_{i,i}(X)).$$
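The classical identity behind this definition, $\mathrm{Tr}(A) = \mathbb{E}\langle Y, AY\rangle$ for a standard Gaussian vector $Y$, is easy to test by Monte Carlo. The sketch below does this for an arbitrary $3\times 3$ matrix; the matrix, sample size, and function names are illustrative assumptions.

```python
import random

def quadratic_form(a, y):
    """<y, a y> for a square matrix a (list of rows) and a vector y."""
    d = len(y)
    return sum(y[i] * a[i][j] * y[j] for i in range(d) for j in range(d))

def estimate_trace(a, samples=20000, seed=0):
    """Monte Carlo estimate of E<Y, aY> with Y a standard Gaussian vector."""
    rng = random.Random(seed)
    d = len(a)
    total = 0.0
    for _ in range(samples):
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += quadratic_form(a, y)
    return total / samples
```

Only the diagonal terms survive in expectation because $\mathbb{E}[Y_iY_j] = \delta_{ij}$, so the estimate should approach the ordinary trace as the number of samples grows.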

Slide 43

Slide 43 text

Differentiation of trace $C^k$ functions
The trace $\mathrm{Tr}_\#$ allows us to define the divergence operator $\nabla^\dagger: C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ as the trace of the Jacobian, as well as the Laplacian $L = \nabla^\dagger\nabla:\mathrm{tr}(C^{k+2}_{\mathrm{tr}}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$. These operators are the limits of the corresponding normalized divergence and Laplacian for functions on $M_N(\mathbb{C})^d_{\mathrm{sa}}$.
Furthermore, the trace $\mathrm{Tr}_\#$ gives rise to a Fuglede-Kadison log-|determinant| map
$$\log\Delta_\#: GL(C^k_{\mathrm{tr}}(\mathbb{R}^{*d},\mathscr{M}(\mathbb{R}^{*d})))\to\mathrm{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d})).$$

Slide 44

Slide 44 text

Free Wasserstein manifold and diffeomorphism group
We’ll first set up the manifold formally. Afterwards, we’ll describe how to extract a non-commutative law $\mu_V$ from $V$ and hence define the Riemannian metric.
Definition: The free Wasserstein manifold $\mathscr{W}(\mathbb{R}^{*d})$ is the set of $V\in\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ such that $V$ has “quadratic growth at $\infty$” in the sense that for some constants $a, a' > 0$ and $b, b'\in\mathbb{R}$, we have
$$a\sum_j\tau(X_j^2) + b\le V^{A,\tau}(X)\le a'\sum_j\tau(X_j^2) + b'.$$
Definition: The free diffeomorphism group $\mathscr{D}(\mathbb{R}^{*d})$ is the set of $f\in C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{\mathrm{sa}}$ such that $f$ has an inverse function $f^{-1}$ in $C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{\mathrm{sa}}$, and $\partial f$, $\partial f^{-1}$ are bounded. Note this is a group under composition.

Slide 45

Slide 45 text

Tangent vectors
A tangent vector to $\mathscr{W}(\mathbb{R}^{*d})$ at $V$ is an equivalence class of $C^1$ paths $(-\epsilon,\epsilon)\to\mathscr{W}(\mathbb{R}^{*d}): t\mapsto V_t$ with $V_0 = V$, where two paths are equivalent if they have the same $\dot V_0$. For convenience, we assume that $V_t$ satisfies the quadratic growth bounds with $a, a', b, b'$ independent of $t$.
A tangent vector to $\mathscr{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$ is similarly an equivalence class of $C^1$ paths $t\mapsto f_t$ with $f_0 = \mathrm{id}$, and the equivalence is equality of $\dot f_0$. Again, assume that $\partial f_t$ and $\partial f_t^{-1}$ are uniformly bounded.
Here, by “$C^1$ path”, we mean it is continuously differentiable with respect to the Fréchet topology of $C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})$ on the target space (defined by the seminorms of each derivative $\partial^j f$ on each ball of radius $R$).

Slide 46

Slide 46 text

The transport action
In the classical case, one studies the action of $\mathrm{Diff}(\mathbb{R}^d)$ on $P(\mathbb{R}^d)$ by push-forward, which is viewed as an infinite-dimensional Lie group acting on an infinite-dimensional Riemannian manifold. If $\mu$ has density $e^{-V}$ and if $f$ is a diffeomorphism, then $f_*\mu$ has density
$$e^{-(V\circ f^{-1} - \log|\det Df^{-1}|)}$$
using the classical change of variables formula. This motivates the following definition.
Definition: We define the transport action $\mathscr{D}(\mathbb{R}^{*d})\curvearrowright\mathscr{W}(\mathbb{R}^{*d})$ by
$$(f, V)\mapsto f_*V := V\circ f^{-1} - \log\Delta_\#(\partial f^{-1}).$$
One can check this is a well-defined group action.
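As a sanity check on the classical change of variables formula that motivates this definition, here is a worked one-dimensional example. The choice of a standard Gaussian and an affine map $f(x) = ax + b$ with $a > 0$ is an illustrative assumption, not something taken from the talk.

```latex
% Classical case d = 1: V is the negative log-density of N(0,1).
\begin{align*}
V(x) &= \tfrac{1}{2}x^2 + \tfrac{1}{2}\log(2\pi), \qquad f(x) = ax + b, \quad a > 0, \\
f^{-1}(y) &= \frac{y - b}{a}, \qquad Df^{-1}(y) = \frac{1}{a}, \\
(f_*V)(y) &= V(f^{-1}(y)) - \log\lvert\det Df^{-1}(y)\rvert
           = \frac{(y - b)^2}{2a^2} + \tfrac{1}{2}\log(2\pi) + \log a \\
          &= \frac{(y - b)^2}{2a^2} + \tfrac{1}{2}\log(2\pi a^2).
\end{align*}
```

The last line is the negative log-density of $N(b, a^2)$, which is the law of $aX + b$ for $X\sim N(0,1)$, as expected.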

Slide 47

Slide 47 text

Differential of the transport action
The key computation behind transport theory is the description of the differential of the transport action. We define $\nabla_V^*: C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d\to\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ by
$$\nabla_V^*f = -\nabla^\dagger f + \partial V\#f = -\mathrm{Tr}_\#(\partial f) + \langle\nabla V, f\rangle_{\mathrm{tr}}.$$
(This is just notation; it is not actually the adjoint.)
Lemma: Let $V\in\mathscr{W}(\mathbb{R}^{*d})$ and let $t\mapsto f_t$ be a tangent vector to $\mathscr{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$. Then
$$\frac{d}{dt}\Big|_{t=0}(f_t)_*V = -\nabla_V^*\dot f_0.$$
In other words, $-\nabla_V^*$ is the differential at $\mathrm{id}$ of the orbit map $\mathscr{D}(\mathbb{R}^{*d})\to\mathscr{W}(\mathbb{R}^{*d}): f\mapsto f_*V$.
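The lemma can be made plausible by a formal first-order expansion in the classical setting, where the Fuglede-Kadison log-determinant is replaced by $\log|\det|$ and $\mathrm{Tr}_\#$ by the ordinary trace. This is a heuristic sketch under the assumption $f_t = \mathrm{id} + t\,h + o(t)$ with $h = \dot f_0$, not the proof from the talk.

```latex
\begin{align*}
f_t^{-1} &= \mathrm{id} - t\,h + o(t), \\
V\circ f_t^{-1} &= V - t\,\langle\nabla V, h\rangle + o(t), \\
\log\lvert\det Df_t^{-1}\rvert &= \log\lvert\det(I - t\,Dh)\rvert + o(t) = -t\,\mathrm{Tr}(Dh) + o(t), \\
(f_t)_*V &= V\circ f_t^{-1} - \log\lvert\det Df_t^{-1}\rvert
          = V - t\bigl(-\mathrm{Tr}(Dh) + \langle\nabla V, h\rangle\bigr) + o(t).
\end{align*}
```

The bracket multiplying $-t$ is the classical counterpart of $-\mathrm{Tr}_\#(\partial h) + \langle\nabla V, h\rangle_{\mathrm{tr}}$, so the derivative at $t = 0$ is minus that expression applied to $h = \dot f_0$.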

Slide 48

Slide 48 text

$\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
We saw that the tangent space of $\mathscr{D}(\mathbb{R}^{*d})$ is (a dense subspace of) the space of vector fields $C_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{\mathrm{sa}}$. Conversely:
Lemma: Given a time-dependent vector field $t\mapsto h_t$ (continuous in $t$) such that $\partial h_t$ is uniformly bounded, there exists a unique path $f_t$ in $\mathscr{D}(\mathbb{R}^{*d})$ such that $f_0 = \mathrm{id}$ and $\dot f_t = h_t\circ f_t$.
The proof is similar to classical ODE theory. If $h$ is independent of $t$, then we get a one-parameter subgroup of $\mathscr{D}(\mathbb{R}^{*d})$. Combining this with our previous observation:
Lemma: Let $h\in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{\mathrm{sa}}$ with $\partial h$ bounded, and let $f_t$ be the corresponding one-parameter subgroup. Then $(f_t)_*V = V$ for all $t$ if and only if $\nabla_V^*h = 0$.

Slide 49

Slide 49 text

$\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
By studying the one-parameter subgroups of $\mathscr{D}(\mathbb{R}^{*d})$ as described above, we arrive at the following definition of the Lie bracket, completely analogous to the Lie bracket on vector fields of $\mathbb{R}^d$.
Definition: For two vector fields $h_1, h_2\in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$, let
$$[h_1, h_2] = \partial h_1\#h_2 - \partial h_2\#h_1.$$
This generalizes the definition of Lie brackets for non-commutative polynomials used in Voiculescu’s paper “Cyclomorphy.”
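The classical analog of this bracket on vector fields of $\mathbb{R}^d$, written in the same order as in the definition, is $[h_1, h_2](x) = Dh_1(x)\,h_2(x) - Dh_2(x)\,h_1(x)$. The sketch below computes it with finite-difference directional derivatives; for linear fields $h_i(x) = A_ix$ it reduces to the matrix commutator $(A_1A_2 - A_2A_1)x$. All names are hypothetical.

```python
def jacobian_apply(h, x, v, eps=1e-6):
    """Directional derivative Dh(x)[v] by central differences."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(p - m) / (2.0 * eps) for p, m in zip(h(xp), h(xm))]

def lie_bracket(h1, h2, x):
    """[h1, h2](x) = Dh1(x)[h2(x)] - Dh2(x)[h1(x)]."""
    a = jacobian_apply(h1, x, h2(x))
    b = jacobian_apply(h2, x, h1(x))
    return [ai - bi for ai, bi in zip(a, b)]

def linear_field(m):
    """Vector field x -> m x for a square matrix m given as a list of rows."""
    return lambda x: [sum(row[j] * x[j] for j in range(len(x))) for row in m]
```

For instance, with $A_1 = \begin{pmatrix}0&1\\0&0\end{pmatrix}$ and $A_2 = \begin{pmatrix}0&0\\1&0\end{pmatrix}$ the commutator is $\mathrm{diag}(1,-1)$, so the bracket at $x = (2, 3)$ is $(2, -3)$.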

Slide 50

Slide 50 text

$\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
For each $V\in\mathscr{W}(\mathbb{R}^{*d})$, its stabilizer $\{f\in\mathscr{D}(\mathbb{R}^{*d}): f_*V = V\}$ is a “Lie subgroup,” analogous to a classical group of measure-preserving transformations. By our previous observations, the corresponding Lie subalgebra should be the set of vector fields $h$ with $\nabla_V^*h = 0$. We can verify directly that this is indeed a Lie subalgebra:
Lemma:
$$\nabla_V^*[h_1, h_2] = \partial(\nabla_V^*h_1)\#h_2 - \partial(\nabla_V^*h_2)\#h_1,$$
and in particular $\ker(\nabla_V^*)$ is closed under Lie brackets.

Slide 51

Slide 51 text

Two ingredients for the Riemannian metric
In order to define the Riemannian metric on the tangent space at $V$, we need two conditions on $V$. We will worry later about checking when these are true.
Condition 1: There exists a unique non-commutative law $\mu_V$ satisfying the Dyson-Schwinger equation $\mu_V[\nabla_V^*f] = 0$ for $f\in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$. Note that $\nabla_V^*f$ is a scalar-valued function approximated by trace polynomials, and $\mu_V[\nabla_V^*f]$ is evaluated as $(\nabla_V^*f)^{A,\tau}(X)$ for any $X$ with $\lambda_X = \mu_V$.
Condition 2: The operator $L_V = \nabla_V^*\nabla:\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ has kernel equal to the constant functions, and it has a continuous pseudo-inverse $\Psi_V:\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ with $\mu_V(\Psi_Vf) = 0$ and $\Psi_VL_Vf + \mu_V(f) = f$.

Slide 52

Slide 52 text

The Riemannian metric
Definition: If $V$ satisfies Conditions 1 and 2, the Riemannian metric on $T_V\mathscr{W}(\mathbb{R}^{*d})$ is given by
$$\langle\dot V_1,\dot V_2\rangle_V = \mu_V[\langle\nabla\Psi_V\dot V_1,\nabla\Psi_V\dot V_2\rangle_{\mathrm{tr}}].$$
Remark: This definition relates to the Riemannian metric for measures on $M_N(\mathbb{C})^d_{\mathrm{sa}}$. If $\mu_V^{(N)}$ is the measure with density a constant times $e^{-N^2V^{M_N(\mathbb{C}),\mathrm{tr}_N}}$, then the classical Riemannian metric can be expressed as
$$\int\langle\nabla(L_{V^{(N)}})^{-1}\dot V_1,\nabla(L_{V^{(N)}})^{-1}\dot V_2\rangle_{\mathrm{tr}_N}\,d\mu^{(N)} = N^2\int\dot V_1\,(L_{V^{(N)}})^{-1}\dot V_2\,d\mu^{(N)}.$$
The expression on the right-hand side seems simpler, but it is dimension-dependent.

Slide 53

Slide 53 text

Consequences of Dyson-Schwinger equation
If Conditions 1 and 2 hold for some $V$, then using the formula for $\nabla_V^*[h_1, h_2]$, one can show that $\ker(\nabla_V^*)$ and $\mathrm{Im}(\nabla)$ are orthogonal with respect to $V$, that is, in $L^2(\mu_V)$. Furthermore, $\nabla\Psi_V\nabla_V^*: C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d\to C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d$ defines a projection onto the space of gradients. The complementary projection is known as the Leray projection.
Remark: In the classical setting, the decomposition of vector fields into $\ker(\nabla_V^*)$ and $\mathrm{Im}(\nabla)$ is an infinitesimal version of Brenier’s factorization of a diffeomorphism into an optimal transport map and a $\mu_V$-preserving transformation.

Slide 54

Slide 54 text

Warnings
Although Condition 1 stipulates that $\mu_V$ is uniquely determined by $V$, there are many cases where $V$ is not uniquely determined by $\mu_V$. For instance, since $\mu_V$ arises from bounded operators (it is “supported on an operator norm ball”), often modifying $V$ outside an operator norm ball will not change $\mu_V$.
Another way in which degeneracy arises is from the use of trace polynomials. If a particular $(A,\tau)$ and $X$ are given, and if $f$ is a trace polynomial, then $f^{A,\tau}(X)$ agrees with $p(X)$ for some non-commutative polynomial $p$. We can easily imagine that many $V$ lead to the same $\mu$ for this reason.
Relatedly, the Riemannian metric on the tangent space could have a very large kernel because when we take the inner product in $L^2(\mu_V)$, all the $\mathrm{tr}(p)$ terms are collapsed to constants.

Slide 55

Slide 55 text

Construction of transport
Closely related to the previous observation about the differential of the transport action, we have:
Lemma: Suppose that $t\mapsto V_t$ is a $C^1$ path in $\mathscr{W}(\mathbb{R}^{*d})$, for $t$ in some interval containing 0. Let $h_t$ be a vector field with $\partial h_t$ uniformly bounded and $-\nabla_{V_t}^*h_t = \dot V_t$. Let $f_t$ be the flow along the vector field $h_t$. Then $(f_t)_*V_0 = V_t$.
Suppose we are given the path $t\mapsto V_t$ (perhaps interpolating between some given $V_0$ and $V_1$) and we want to construct $h_t$. If each $V_t$ satisfies Conditions 1 and 2, then we can take $h_t = -\nabla\Psi_{V_t}\dot V_t$.
For $\partial h_t$ to be bounded, we require some concrete estimate on $\Psi_V$. For $h_t$ to depend continuously on $t$, we need some joint continuous dependence of $\Psi_Vf$ on $V$ and $f$, at least for some family of $V$’s that contains our given path. If these conditions are met, then some smooth transport exists.

Slide 56

Slide 56 text

Construction of transport

The following theorem is similar to previous work such as Guionnet-Shlyakhtenko 2009 and Dabrowski-Guionnet-Shlyakhtenko 2016.

Theorem A. Fix $C_1, C_2, C_3 > 0$ with $C_2 < 1$. Consider $V \in \mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ such that $\|\nabla V\|_{BC_{\mathrm{tr}}} \le C_1$ and $\|\partial \nabla V - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2$.
- $V$ satisfies Conditions 1 and 2.
- For such $V$, the map $(V, f) \mapsto \Psi_V f$ is jointly continuous with respect to the Fréchet topology on $C^\infty_{\mathrm{tr}}$.
- Let $k \ge 0$. If $V$ is as above and furthermore $\partial^j V$ is bounded by some constant $C_j$ for $j \le k + 2$, then $\Psi_V$ maps $BC^k_{\mathrm{tr}}$ into $BC^k_{\mathrm{tr}}$.

The theorem implies that for a path $t \mapsto V_t$, if $\nabla V_t$, $\partial \nabla V_t$, $\partial^2 \nabla V_t$, $\nabla \dot{V}_t$, $\partial \nabla \dot{V}_t$ are uniformly bounded, with $\|\partial \nabla V_t - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2 < 1$, then the above construction of transport works.

Slide 57

Slide 57 text

Construction of transport

From this result, one immediately gets isomorphisms of $C^*$-algebras associated to the non-commutative laws $\mu_{V_t}$.

Theorem B. For a path $t \mapsto V_t$ satisfying the conditions on the previous slide, there exists a $C^1$ path $t \mapsto f_t$ of diffeomorphisms with $(f_t)_* V_0 = V_t$. These give rise to isomorphisms between the tracial $C^*$-algebras (and the von Neumann algebras) associated to the GNS representations of the non-commutative laws $\mu_{V_t}$. In particular, when $V$ is as in the previous theorem, the $C^*$-algebra of $\mu_V$ is isomorphic to the one generated by a free semicircular family.

There is one thing to check to finish the proof: if $f_* V_0 = V_1$, does $f_* \mu_{V_0} = \mu_{V_1}$ follow? For the potentials $V_t$ as in the previous slide, this can be checked from the free entropy viewpoint, which will be explained later.

Slide 58

Slide 58 text

Warnings

These results are not true for arbitrary $V$, even in the one-variable case. Indeed, as in Biane-Speicher 1999, consider $V^{\mathcal{A},\tau}(X) = \tau(f(X))$ where $f : \mathbb{R} \to \mathbb{R}$ is a "double well" potential. If the wells are deep enough, then in the large $N$ limit the spectral distribution is supported on a disjoint union of two intervals. Hence, the $C^*$-algebra is $C[0,1] \oplus C[0,1]$, which is not isomorphic to the $C^*$-algebra $C[0,1]$ obtained in the semicircular case.

Actually, Condition 1 fails for such a potential, because other measures satisfying the Dyson-Schwinger equation are obtained by reweighting the two components.

Relatedly, there are non-constant smooth functions $\phi$ such that $L_V \phi$ vanishes in $L^2(\mu_V)$. Namely, we take $\phi(X) = \tau(g(X))$ where $g : \mathbb{R} \to \mathbb{R}$ is smooth and constant on each of the two intervals. On the other hand, $L_V \phi$ is not zero in $\mathrm{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$, but the significance of this is unclear.

Slide 59

Slide 59 text

Inversion of the Laplacian

Theorem A comes out of two sets of tools:
1. The free entropy approach is used to show existence of a non-commutative law $\mu$ satisfying $\mu[\nabla_V^* f] = 0$ for $f \in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$.
2. The heat semigroup is used to show uniqueness of a non-commutative law $\mu$ satisfying $\mu[L_V \phi] = 0$ for $\phi \in \mathrm{tr}(C_{\mathrm{tr}}(\mathbb{R}^{*d}))$, as well as to construct $\Psi_V$.

Let us start with (2). The broad outline is the same as in Dabrowski-Guionnet-Shlyakhtenko 2016, but with different function spaces.

Slide 60

Slide 60 text

Inversion of the Laplacian

Recalling that $L_V = -\nabla_V^* \nabla$, the heat semigroup is the family of operators $e^{tL_V}$ for $t \ge 0$. The rigorous definition is through free SDE theory. We set
$[e^{tL_V/2} f]^{\mathcal{A},\tau}(X) = E_{\mathcal{A}}[f(X_t(X))]$, where $dX_t(X) = dS_t - \tfrac{1}{2} \nabla V(X_t(X))\, dt$, $X_0(X) = X$,
and where $S_t$ is a semicircular Brownian motion freely independent of the initial condition $X$.

The assumption that $\|\partial \nabla V - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2 < 1$ implies that $\partial_X X_t$ decays like $e^{-t(1 - C_2)/2}$.

Slide 61

Slide 61 text

Inversion of the Laplacian

This in turn implies that $\partial[e^{tL_V} f]$ decays like $e^{-t(1 - C_2)}$. To recover the non-commutative law $\mu_V$ and the pseudo-inverse $\Psi_V$, we argue that
$\mu_V(f) = \lim_{t \to \infty} e^{tL_V} f$ and $\Psi_V f = \int_0^\infty [e^{tL_V} f - \mu_V(f)]\, dt$.
These expressions make sense because of the exponential decay.

The smoothness properties, as well as the continuous dependence of $\Psi_V f$ on $(V, f)$, are proved by studying the smoothness properties of $X_t(X)$ as a function of $X$, with some simple-minded inductive arguments.
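The two formulas can be sanity-checked in the classical one-variable Gaussian case, which is only an analogy to the free setting. The sketch assumes $V(x) = x^2/2$, so that $L_V f = f'' - x f'$ (with the sign convention $L_V = -\nabla_V^* \nabla$) and $\mu_V$ is the standard Gaussian. On polynomials of degree at most $n$ the operator is a triangular matrix in the monomial basis, so the limit and the time integral can be evaluated through its eigen-decomposition. The function names are hypothetical.

```python
import numpy as np

def ou_generator(n):
    """Matrix of L f = f'' - x f' on polynomials of degree <= n (monomial basis)."""
    L = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        L[k, k] = -k                      # -x * d/dx sends x^k to -k x^k
        if k >= 2:
            L[k - 2, k] = k * (k - 1)     # d^2/dx^2 sends x^k to k(k-1) x^(k-2)
    return L

def limit_and_pseudo_inverse(L):
    """Return lim_{t->inf} e^{tL} and int_0^inf (e^{tL} - limit) dt as matrices."""
    evals, P = np.linalg.eig(L)
    Pinv = np.linalg.inv(P)
    zero = np.isclose(evals, 0.0)
    limit = (P[:, zero] @ Pinv[zero, :]).real
    # Each mode e^{t*lam} with lam < 0 integrates to -1/lam over [0, inf).
    rates = np.where(zero, 0.0, -1.0 / np.where(zero, 1.0, evals))
    psi = ((P * rates) @ Pinv).real
    return limit, psi

L = ou_generator(4)
mu, Psi = limit_and_pseudo_inverse(L)
f = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # coefficients of f(x) = x^2
```

Here `mu @ f` is the constant polynomial 1, the Gaussian second moment, and `Psi @ f` is $(x^2 - 1)/2$, which satisfies $-L_V \Psi_V f = f - \mu_V(f)$.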

Slide 62

Slide 62 text

Free Gibbs laws: results

A free Gibbs law for $V$ is a non-commutative law $\mu$ that maximizes $\chi_V(\mu) := \chi(\mu) - \mu(V)$, where $\chi$ is the free microstate entropy. We can show the following:
1. If $V \in \mathcal{W}(\mathbb{R}^{*d})$ with $\partial V$ and $\partial^2 V$ bounded, then a free Gibbs law always exists.
2. Due to the change of variables formula for entropy, any free Gibbs law $\mu$ must satisfy the Dyson-Schwinger equation $\mu[\nabla_V^* f] = 0$.
3. Fix $C_1, C_2 > 0$. The set of $V$ which have a unique free Gibbs law is generic in the set $\mathcal{V}_{C_1,C_2}$ of $V$ with $\|\partial V\|_{BC_{\mathrm{tr}}} \le C_1$ and $\|\partial^2 V\|_{BC_{\mathrm{tr}}} \le C_2$, equipped with the subspace topology from $\mathrm{tr}(C_{\mathrm{tr}}(\mathbb{R}^{*d}))$.

Slide 63

Slide 63 text

Free Gibbs laws: proof with lies

The argument for the existence of free Gibbs laws relies on enlarging the space of laws in order to obtain more compactness. More precisely:
1. We embed the space of non-commutative laws into the dual of a Banach space $\mathcal{C}$ consisting of certain functions with quadratic growth at $\infty$.
2. Letting $E \subseteq \mathcal{C}^\star$ be the closure of the space of laws, it turns out that the set of elements of $E$ with "second moment" (not operator norm) bounded by $r$ is compact.
3. $\chi_V$ is upper semi-continuous and it goes to $-\infty$ as the "second moment" of $\mu$ goes to $\infty$, and thus we get a maximizer using compactness.
4. Using the change of variables formula for entropy, we deduce that any maximizer $\nu$ satisfies the Dyson-Schwinger equation (for nice enough test functions).
5. Using the Dyson-Schwinger equation, we show iteratively that moments of $\nu$ are finite, and ultimately that $\nu \in \Sigma_{d,R}$ for some $R$.

Slide 64

Slide 64 text

Geodesic equations

Definition. The geodesic equations on $\mathcal{W}(\mathbb{R}^{*d})$ are the pair of equations
$\dot{V}_t = L_{V_t} \phi_t$,
$\dot{\phi}_t = -\tfrac{1}{2} \langle \nabla \phi_t, \nabla \phi_t \rangle_{\mathrm{tr}}$.

These can be obtained formally as the large $N$ limit of the geodesic equations for measures on $M_N(\mathbb{C})^d_{\mathrm{sa}}$.

Thinking about the classical case, one is led to conjecture that nice enough solutions must have the form $V_t = (\mathrm{id} + t \nabla \phi_0)_* V_0$. It is straightforward to check that when $\partial \nabla \phi_0$ is bounded, this formula defines a solution for small enough $t$. We do not show rigorously that these are the only solutions.
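A classical one-dimensional Gaussian example gives a quick numerical check of the geodesic equations and of the form $(\mathrm{id} + t \nabla \phi_0)_* V_0$. The sketch assumes the classical conventions $L_V \phi = \phi'' - V' \phi'$ and $(f_* V)(y) = V(f^{-1}(y)) + \log f'(f^{-1}(y))$, and takes the hypothetical data $V_0(x) = x^2/2$ and $\phi_0(x) = a x^2/2$. The closed forms for $V_t$ and $\phi_t$ in the code follow from these assumptions, and derivatives are approximated by finite differences.

```python
import numpy as np

a = 0.5  # hypothetical strength of the initial potential phi_0(x) = a x^2 / 2

def V(t, y):
    """Push-forward of V_0(x) = x^2/2 under x -> (1 + a t) x."""
    s = 1.0 + a * t
    return y**2 / (2.0 * s**2) + np.log(s)

def phi(t, y):
    """Solution of the Hamilton-Jacobi equation started at phi_0."""
    return a * y**2 / (2.0 * (1.0 + a * t))

def d_dt(F, t, y, eps=1e-5):
    return (F(t + eps, y) - F(t - eps, y)) / (2.0 * eps)

def d_dy(F, t, y, eps=1e-3):
    return (F(t, y + eps) - F(t, y - eps)) / (2.0 * eps)

def d2_dy2(F, t, y, eps=1e-3):
    return (F(t, y + eps) - 2.0 * F(t, y) + F(t, y - eps)) / eps**2

def residuals(t, y):
    """Residuals of V' = L_V phi and phi' = -|grad phi|^2 / 2 at (t, y)."""
    L_phi = d2_dy2(phi, t, y) - d_dy(V, t, y) * d_dy(phi, t, y)
    r1 = d_dt(V, t, y) - L_phi
    r2 = d_dt(phi, t, y) + 0.5 * d_dy(phi, t, y) ** 2
    return r1, r2
```

Calling `residuals(0.3, np.linspace(-2, 2, 9))` returns two arrays that vanish up to finite-difference error, which supports the sign conventions used in the equations in this classical special case.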

Slide 65

Slide 65 text

Towards free optimal transport

However, we can show rigorously that these paths minimize length with respect to the $L^2$-coupling distance when $\partial \nabla \phi_0$ is bounded by a constant $C$ and when $t \in (0, 1/C)$. This follows from the more general proposition below.

Definition. For two non-commutative laws $\mu$ and $\nu$, we define $d_W(\mu, \nu)$ as the infimum of $\|X - Y\|_2$ over all tracial $C^*$-algebras $(\mathcal{A}, \tau)$ and $X, Y \in \mathcal{A}^d_{\mathrm{sa}}$ such that $\lambda_X = \mu$ and $\lambda_Y = \nu$, where $\lambda_X$ denotes the law of $X$.

Proposition. Let $\phi \in \mathrm{tr}(C^2_{\mathrm{tr}}(\mathbb{R}^{*d}))_{\mathrm{sa}}$ be such that $\|\partial \nabla \phi - \mathrm{Id}\|_{BC_{\mathrm{tr}}} < 1$. Then for every $(\mathcal{A}, \tau)$ and $X \in \mathcal{A}^d_{\mathrm{sa}}$, we have
$d_W(\lambda_X, \lambda_{\nabla \phi(X)}) = \|X - \nabla \phi(X)\|_{\tau,2}$.
In other words, $X$ and $\nabla \phi(X)$ are an optimal coupling of their respective laws.

Slide 66

Slide 66 text

Towards free optimal transport

The proof of the proposition is inspired by the classical Monge-Kantorovich duality. By the inverse function theorem, $\nabla \phi$ has an inverse function, so define
$\psi^{\mathcal{A},\tau}(Z) = \langle Z, ((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z) \rangle_\tau - (\phi \circ (\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z)$.

Note that $Y = ((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z)$ maximizes the function $\langle Z, Y \rangle_\tau - \phi^{\mathcal{A},\tau}(Y)$, by calculus and by convexity of $\phi$. (So $\psi$ is the Legendre transform of $\phi$.) Thus,
$\phi^{\mathcal{A},\tau}(Y) + \psi^{\mathcal{A},\tau}(Z) \ge \langle Y, Z \rangle_\tau$ for all $Y, Z \in \mathcal{A}^d_{\mathrm{sa}}$.

Slide 67

Slide 67 text

Towards free optimal transport

Fix $(\mathcal{A}, \tau)$ and $X \in \mathcal{A}^d_{\mathrm{sa}}$. If $Y, Z$ is any coupling of $X$ and $\nabla \phi(X)$ on some other tracial $C^*$-algebra $(\mathcal{A}', \tau')$, then
$\langle Y, Z \rangle_{\tau'} \le \phi^{\mathcal{A}',\tau'}(Y) + \psi^{\mathcal{A}',\tau'}(Z) = \phi^{\mathcal{A},\tau}(X) + \psi^{\mathcal{A},\tau}(\nabla \phi^{\mathcal{A},\tau}(X)) = \langle X, \nabla \phi^{\mathcal{A},\tau}(X) \rangle_\tau$,
where the middle equality holds because $\phi$ and $\psi$ depend only on the laws of their arguments, and the last equality follows from the definition of $\psi$.

Thus, $X, \nabla \phi(X)$ is a coupling that maximizes the inner product between the first and second variable, which is equivalent to minimizing the $L^2$ distance (since the $L^2$ norms of the two variables are uniquely determined by their laws).
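The optimality statement can be probed numerically in a very restricted setting: a single self-adjoint matrix $X \in M_N(\mathbb{C})$ with the normalized trace, compared only against couplings of the form $(U X U^*, \nabla \phi(X))$ with $U$ unitary. This is a small subset of all couplings, so the sketch illustrates the proposition rather than verifying it. The potential $\phi(x) = x^2/2 - \varepsilon \cos x$ is a hypothetical choice whose scalar second derivative stays within $\varepsilon < 1$ of 1.

```python
import numpy as np

def l2_distance(A, B):
    """Distance in the normalized-trace 2-norm on N x N matrices."""
    N = A.shape[0]
    D = A - B
    return np.sqrt(np.trace(D.conj().T @ D).real / N)

def random_unitary(N, rng):
    """Unitary factor from the QR decomposition of a complex Gaussian matrix."""
    Z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    Q, _ = np.linalg.qr(Z)
    return Q

def grad_phi(X, eps=0.5):
    """Apply x -> x + eps*sin(x), the derivative of phi, via functional calculus."""
    lam, W = np.linalg.eigh(X)
    return (W * (lam + eps * np.sin(lam))) @ W.conj().T

rng = np.random.default_rng(0)
N = 6
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = (H + H.conj().T) / (2.0 * np.sqrt(N))   # self-adjoint test matrix
Y = grad_phi(X)

d_opt = l2_distance(X, Y)                   # coupling (X, grad phi(X))
d_other = [l2_distance(U @ X @ U.conj().T, Y)
           for U in (random_unitary(N, rng) for _ in range(20))]
```

Every entry of `d_other` is at least `d_opt`, as expected: each $U X U^*$ has the same law as $X$, so these are competing couplings of the same pair of laws.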