
# Free Wasserstein manifold

We formulate a free probabilistic analog of the Wasserstein manifold on $\mathbb{R}^d$ (the formal Riemannian manifold of smooth probability densities on $\mathbb{R}^d$). The points of the free Wasserstein manifold are certain smooth tracial non-commutative functions, which correspond to minus the log-density in the classical setting. The manifold structure allows us to formulate and study a number of differential equations that give rise to non-commutative transport maps as well as analogs of measure-preserving transformations. One application of our results is the optimality (in the sense of the Biane-Voiculescu 2-Wasserstein distance) of certain monotone optimal transport maps, which correspond to geodesics in our manifold (joint work by D. Jekel, W. Li and D. Shlyakhtenko).

## Wuchen Li

February 22, 2021

## Transcript

1. Free Wasserstein manifold
Wuchen Li
University of South Carolina
Free probability seminar, UC Berkeley
Based on a joint work with David Jekel (UCSD) and Dimitri
Shlyakhtenko (UCLA).

2. Entropy, Fisher information and Transportation
In recent years, there have been active joint studies connecting entropy, Fisher information and transportation. These connections now have applications in functional inequalities and in the dynamical behavior of fluid dynamics.
In this talk, we discuss transportation theory in free probability by studying “log-densities” for non-commutative random variables.

3. Metric in probability space

4. Brownian motion and heat equations
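Consider a standard Brownian motion in $\mathbb{R}^d$ given by
$$dX_t = \sqrt{2}\, dB_t.$$
Let $\rho(t, x)$ denote the probability density function of $X_t$. Then $\rho$ satisfies the heat equation
$$\frac{\partial \rho_t}{\partial t} = \nabla \cdot (\nabla \rho_t) = \Delta \rho_t,$$
where $\rho_t = \rho(t, x)$, and $\nabla\cdot$, $\nabla$ are the divergence and gradient operators in $\mathbb{R}^d$, respectively.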

5. Entropy dissipation
Consider the negative Boltzmann-Shannon entropy
$$H(\rho) = \int_{\mathbb{R}^d} \rho(x) \log \rho(x)\, dx.$$
Along the heat equation, the dissipation relation holds:
$$\frac{d}{dt} H(\rho_t) = -\int_{\mathbb{R}^d} \|\nabla_x \log \rho_t\|^2 \rho_t\, dx = -I(\rho_t),$$
where $I(\rho)$ is named the Fisher information functional.
There is a formulation behind this relation, namely: $\rho(t, x)$ is a gradient flow of entropy in optimal transport space.
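The dissipation relation can be checked numerically in one dimension. The sketch below assumes a centered Gaussian initial density of variance 1, so that under $dX_t = \sqrt{2}\,dB_t$ the density at time $t$ is Gaussian with variance $1 + 2t$; the function names, grid, and step sizes are illustrative choices, not part of the talk.

```python
import math

def gaussian_density(x, variance):
    """Density of N(0, variance) at x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def integrate(func, lo=-40.0, hi=40.0, n=20000):
    """Midpoint rule on [lo, hi]; the Gaussian tails outside are negligible."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def negative_entropy(variance):
    """H(rho) = int rho log rho dx for rho = N(0, variance)."""
    def integrand(x):
        r = gaussian_density(x, variance)
        return r * math.log(r) if r > 0.0 else 0.0  # convention 0 log 0 = 0
    return integrate(integrand)

def fisher_information(variance):
    """I(rho) = int |d/dx log rho|^2 rho dx, using d/dx log rho = -x / variance."""
    return integrate(lambda x: (x / variance) ** 2 * gaussian_density(x, variance))

def variance_at(t, initial_variance=1.0):
    """Assumed Gaussian initial condition: the heat flow adds variance 2t."""
    return initial_variance + 2.0 * t

def entropy_derivative(t, dt=1e-4):
    """Central finite difference of t -> H(rho_t)."""
    h_plus = negative_entropy(variance_at(t + dt))
    h_minus = negative_entropy(variance_at(t - dt))
    return (h_plus - h_minus) / (2.0 * dt)
```

Comparing `entropy_derivative(t)` with `-fisher_information(variance_at(t))` for a few values of `t` gives agreement up to discretization error, as the relation predicts.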

6. Optimal transport
What is the optimal way to move or transport a mountain with shape $X$ and density $\rho_0(x)$ to another shape $Y$ with density $\rho_1(y)$?
Consider
$$\mathrm{Dist}_T(\rho_0, \rho_1) = \inf_T \int_{\mathbb{R}^d} \|x - T(x)\|^2 \rho_0(x)\, dx,$$
where the infimum is taken among all transport maps $T$ that transfer $\rho_0$ to $\rho_1$, i.e.
$$\rho_0(x) = \rho_1(T(x)) \det(\nabla T(x)).$$
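For two one-dimensional Gaussians the transport constraint can be checked by hand. The sketch below assumes $\rho_0 = N(m_0, s_0^2)$ and $\rho_1 = N(m_1, s_1^2)$ and uses the increasing affine map $T(x) = m_1 + (s_1/s_0)(x - m_0)$, which is the monotone (hence optimal) map in this case; all names and parameters are illustrative.

```python
import math

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def transport_map(x, m0, s0, m1, s1):
    """Increasing affine map pushing N(m0, s0^2) forward to N(m1, s1^2)."""
    return m1 + (s1 / s0) * (x - m0)

def monge_ampere_residual(x, m0, s0, m1, s1):
    """rho_0(x) - rho_1(T(x)) * T'(x); zero when T transports rho_0 to rho_1."""
    jacobian = s1 / s0
    image = transport_map(x, m0, s0, m1, s1)
    return normal_pdf(x, m0, s0) - normal_pdf(image, m1, s1) * jacobian

def transport_cost(m0, s0, m1, s1, n=20000, width=12.0):
    """Midpoint-rule value of int |x - T(x)|^2 rho_0(x) dx."""
    lo, hi = m0 - width * s0, m0 + width * s0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += (x - transport_map(x, m0, s0, m1, s1)) ** 2 * normal_pdf(x, m0, s0)
    return h * total
```

For this map the cost evaluates to $(m_0 - m_1)^2 + (s_0 - s_1)^2$, which the numerical integral reproduces.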

7. Overview
The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich in 1940. It introduces a distance on the space of probability distributions, named the optimal transport distance, Wasserstein distance, or Earth Mover’s distance. There are many viewpoints and applications of this distance:
- Linear programming;
- Mapping/Monge-Ampère equation;
- Fluid dynamics;
- Density manifold (Arnold mechanics).
See Ambrosio, Villani, Otto and many more (Gangbo et al.).
In the first part of this talk, I focus on the transportation formulation in classical probability. Then David will present the

8. Transport distance formulations
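There is a relaxed formulation of the classical optimal transport distance:
$$\inf_{\pi} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \|x - y\|^2 \pi(x, y)\, dx\, dy,$$
where the infimum is taken among all joint measures (transport plans) $\pi(x, y)$ having $\rho_0(x)$ and $\rho_1(y)$ as marginals, i.e.
$$\int_{\mathbb{R}^d} \pi(x, y)\, dy = \rho_0(x), \quad \int_{\mathbb{R}^d} \pi(x, y)\, dx = \rho_1(y), \quad \pi(x, y) \geq 0.$$
Here
$$\pi(x, y) = \big(x,\ T(x) = x + \nabla \Phi(0, x)\big)_{\#} \rho_0.$$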
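A small discrete analog illustrates the relaxed problem. The sketch below assumes two sets of $n$ points on the line with equal weights $1/n$; in that case an optimal plan can be taken to be a permutation (the extreme points of the doubly stochastic matrices are permutation matrices), so a brute-force search over permutations solves the problem and can be compared with the monotone matching. The helper names are hypothetical.

```python
import itertools

def plan_cost(xs, ys, perm):
    """Average squared distance when x_i is sent to y_{perm[i]}."""
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / len(xs)

def brute_force_optimal(xs, ys):
    """Search all permutations (only feasible for small n)."""
    best = min(itertools.permutations(range(len(ys))),
               key=lambda perm: plan_cost(xs, ys, perm))
    return best, plan_cost(xs, ys, best)

def monotone_matching(xs, ys):
    """Pair the i-th smallest x with the i-th smallest y."""
    x_order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_order = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(x_order, y_order):
        perm[i] = j
    return tuple(perm)
```

For distinct points and quadratic cost, the monotone matching returned by `monotone_matching` coincides with the brute-force optimum, which is the discrete counterpart of the monotone transport map in one dimension.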

9. Dynamical optimal transport

10. Optimal transport space (Density manifold)
The optimal transport has a variational formulation (Benamou-Brenier 2000):
$$\inf_{v} \int_0^1 \mathbb{E}_{X_t \sim \rho_t} \|v(t, X_t)\|^2\, dt,$$
where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that
$$\dot{X}_t = v(t, X_t), \quad X_0 \sim \rho_0, \quad X_1 \sim \rho_1.$$
Under this metric, the probability set has a Riemannian geometry structure (John D. Lafferty, The density manifold and configuration space quantization, 1988).

11. Riemannian metric for optimal transport
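Informally speaking, the Wasserstein metric refers to the following bilinear form:
$$\langle \dot{\rho}_1, G(\rho) \dot{\rho}_2 \rangle = \int \big(\dot{\rho}_1, (-\Delta_\rho)^{-1} \dot{\rho}_2\big)\, dx, \qquad \Delta_\rho = \nabla \cdot (\rho \nabla).$$
In other words, denote $\dot{\rho}_i = -\nabla \cdot (\rho \nabla \Phi_i)$, $i = 1, 2$; then
$$\langle \Phi_1, G(\rho)^{-1} \Phi_2 \rangle = \langle \Phi_1, -\nabla \cdot (\rho \nabla) \Phi_2 \rangle,$$
where $\rho \in \mathcal{P}(\Omega)$, $\dot{\rho}_i$ is a tangent vector in $\mathcal{P}(\Omega)$ with
$$\int \dot{\rho}_i\, dx = 0,$$
and $\Phi_i \in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $\rho$. Here $\nabla\cdot$, $\nabla$ are the standard divergence and gradient operators in $\Omega$.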

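The gradient flow of a functional $F(\rho)$ in this metric is
$$\partial_t \rho = -G(\rho)^{-1} \delta_\rho F(\rho) = \nabla \cdot (\rho \nabla \delta_\rho F(\rho)).$$
Example
If $F(\rho) = \int F(x) \rho(x)\, dx$, then the gradient flow follows
$$\partial_t \rho = \nabla \cdot (\rho \nabla F(x)).$$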
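As a short check of why this is called a gradient flow, the functional decreases along solutions. This is a sketch assuming enough decay at infinity to integrate by parts, with $\delta_\rho F$ denoting the $L^2$ first variation:

```latex
\begin{aligned}
\frac{d}{dt} F(\rho_t)
  &= \int \delta_\rho F(\rho_t)\, \partial_t \rho_t \, dx
   = \int \delta_\rho F(\rho_t)\, \nabla \cdot \big(\rho_t \nabla \delta_\rho F(\rho_t)\big)\, dx \\
  &= -\int \big\|\nabla \delta_\rho F(\rho_t)\big\|^2 \rho_t \, dx \;\le\; 0 .
\end{aligned}
```

With $F(\rho) = \int \rho \log \rho\, dx$ one has $\nabla \delta_\rho F = \nabla \log \rho$, and the right-hand side becomes minus the Fisher information.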

13. Entropy dissipation revisited
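The gradient flow of the negative entropy
$$H(\rho) = \int_{\mathbb{R}^d} \rho(x) \log \rho(x)\, dx$$
w.r.t. the optimal transport metric satisfies
$$\frac{\partial \rho}{\partial t} = \nabla \cdot (\rho \nabla \log \rho) = \Delta \rho.$$
Here the major trick is that
$$\rho \nabla \log \rho = \nabla \rho.$$
In this way, one can study the entropy dissipation by
$$\frac{d}{dt} H(\rho) = \int_{\mathbb{R}^d} \log \rho\, \nabla \cdot (\rho \nabla \log \rho)\, dx = -\int_{\mathbb{R}^d} \|\nabla \log \rho\|^2 \rho\, dx.$$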

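14. Optimal transport Hamiltonian flows
Consider the Lagrangian
$$L(\rho, \partial_t \rho) = \frac{1}{2} \int \Big(\partial_t \rho, \big(-\nabla \cdot (\rho \nabla)\big)^{-1} \partial_t \rho\Big)\, dx - F(\rho).$$
The Hamiltonian flow satisfies the Euler-Lagrange equation
$$\frac{d}{dt} \delta_{\partial_t \rho} L(\rho, \partial_t \rho) = \nabla_\rho L(\rho, \partial_t \rho).$$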

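15. Optimal transport Hamiltonian flows
The Hamiltonian is given by the Legendre transform, i.e.
$$H(\rho, \Phi) = \sup_{\partial_t \rho} \int \partial_t \rho\, \Phi\, dx - L(\rho, \partial_t \rho),$$
and the Hamiltonian system follows
$$\partial_t \rho = \delta_\Phi H(\rho, \Phi), \qquad \partial_t \Phi = -\delta_\rho H(\rho, \Phi),$$
where $\delta_\rho$, $\delta_\Phi$ are the $L^2$ first variation operators w.r.t. $\rho$, $\Phi$, respectively, and the density Hamiltonian takes the form
$$H(\rho, \Phi) = \frac{1}{2} \int \|\nabla \Phi\|^2 \rho\, dx + F(\rho).$$
Here $\rho$ is the “density” state variable and $\Phi$ is the “density” momentum variable.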

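16. Hamiltonian flows: Compressible Euler equation
More explicitly,
$$\begin{cases} \partial_t \rho + \nabla \cdot (\rho \nabla \Phi) = 0, \\ \partial_t \Phi + \frac{1}{2} \|\nabla \Phi\|^2 = -\delta_\rho F(\rho). \end{cases}$$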

17. Why optimal transport formalisms?
- Generalized log-Sobolev inequalities and bounds;
- Generalized dynamics: e.g., the Schrödinger equation, the Schrödinger bridge problem and mean field games;
- Generalized dualities and distances in information theory and AI.

18. Log density coordinates
As we will see in the second talk, it is more natural in the random matrix and free setting to study the log-density rather than the density. Thus, let us describe what happens in the classical case when we write everything in terms of the log-density, introducing an alternative coordinate system for the classical manifold of densities:
$$(\rho, \Phi) \to (e^{-V}, \Phi),$$
where $\int e^{-V}\, dx = 1$. All formalisms of optimal transport need to be

19. Log density change of variable
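Consider
$$\partial_t \rho(t, x) = -\nabla \cdot \big(\rho(t, x) \nabla \Phi(t, x)\big).$$
Denote
$$\rho(t, x) = e^{-V(t, x)}.$$
Then
$$\begin{aligned} \partial_t V = -\partial_t \log \rho &= -\frac{\partial_t \rho}{\rho} \\ &= \frac{\nabla \cdot (\rho \nabla \Phi)}{\rho} \\ &= (\nabla \log \rho, \nabla \Phi) + \Delta \Phi \\ &= -(\nabla V, \nabla \Phi) + \Delta \Phi \\ &=: L_V \Phi. \end{aligned}$$
Here
$$L_V = -(\nabla V, \nabla) + \Delta.$$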
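The change of variables can be sanity-checked with finite differences in one dimension. The sketch below assumes the standard Gaussian log-density $V(x) = x^2/2 + \tfrac12 \log 2\pi$ and a hypothetical potential $\Phi(x) = \sin x$; it compares $\nabla \cdot (\rho \nabla \Phi)/\rho$ with $L_V \Phi = -(\nabla V, \nabla \Phi) + \Delta \Phi$.

```python
import math

H = 1e-3  # finite-difference step (an illustrative choice)

def d1(func, x):
    """Central first derivative."""
    return (func(x + H) - func(x - H)) / (2.0 * H)

def d2(func, x):
    """Central second derivative."""
    return (func(x + H) - 2.0 * func(x) + func(x - H)) / (H * H)

def V(x):
    """Log-density coordinate of the standard Gaussian: rho = exp(-V)."""
    return 0.5 * x * x + 0.5 * math.log(2.0 * math.pi)

def Phi(x):
    """A hypothetical smooth potential."""
    return math.sin(x)

def rho(x):
    return math.exp(-V(x))

def dV_dt_from_continuity(x):
    """-(d/dt rho)/rho, where d/dt rho = -div(rho grad Phi)."""
    flux = lambda y: rho(y) * d1(Phi, y)
    return d1(flux, x) / rho(x)

def L_V_Phi(x):
    """L_V Phi = -(grad V, grad Phi) + Laplacian Phi."""
    return -d1(V, x) * d1(Phi, x) + d2(Phi, x)
```

Both functions approximate $-x \cos x - \sin x$ for this choice of $V$ and $\Phi$, so they agree up to discretization error.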

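Similarly, we can formulate the gradient flow in terms of log densities.
Consider
$$\partial_t \rho = \nabla \cdot (\rho \nabla \delta_\rho F(\rho)).$$
Denote $\rho = e^{-V}$. Then
$$\partial_t V = -L_V\, \delta_\rho F(\rho)\big|_{\rho = e^{-V}}.$$
Example
If $F(\rho) = \int F(x) \rho(x)\, dx = \int F(x) e^{-V(x)}\, dx$, then the gradient flow follows
$$\partial_t V = -L_V F(x).$$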

21. Log density entropy dissipation
The gradient flow of the negative entropy
$$H(\rho) = \int \rho \log \rho\, dx = -\int V e^{-V}\, dx$$
w.r.t. the optimal transport metric satisfies
$$\partial_t V = L_V V = -\|\nabla V\|^2 + \Delta V.$$
In this log-density coordinate, one can study the entropy dissipation by
$$\frac{d}{dt} H(\rho) = -\int_{\mathbb{R}^d} \|\nabla V\|^2 e^{-V}\, dx.$$

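22. Log density Hamiltonian flows
Similarly, we can formulate the Hamiltonian flow in terms of log densities:
$$\begin{cases} \partial_t V - L_V \Phi = 0, \\ \partial_t \Phi + \frac{1}{2} \|\nabla \Phi\|^2 = -\delta_\rho F(\rho)\big|_{\rho = e^{-V}}. \end{cases}$$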

23. The free Wasserstein manifold
David Jekel, Wuchen Li, Dima Shlyakhtenko
February 22, 2021
David Jekel, Wuchen Li, Dima Shlyakhtenko The free Wasserstein manifold February 22, 2021 1 / 44

24. This work was supported in part by the National Science Foundation.
The talk will focus on the big picture and thus precise deﬁnitions will only
be given when helpful. Before the rigorous statements, there will be several
slides of motivation and introduction aimed at people who are familiar
with free probability. Feel free to interrupt with questions about notation.

25. Motivation
The classical Wasserstein manifold P(M), whose points are smooth
positive probability densities, is an inﬁnite-dimensional Riemannian
framework which nicely describes things such as entropy, the heat
equation, log-Sobolev inequalities, optimal transport, measure-preserving
transformations, etc.
Because many of these notions have analogs in free probability and
random matrix theory, we want to deﬁne the tracial non-commutative
version of the Wasserstein manifold, in which we use non-commutative
A big obstacle is that we don’t have a direct analog of density in the
non-commutative setting. However, there are several indications that the
log-density is a better behaved notion.

26. Motivation — random matrices
Given some self-adjoint non-commutative polynomial $f$ (with nice enough behavior at $\infty$), we can define a function $V^{(N)} : M_N(\mathbb{C})^d_{sa} \to \mathbb{R}$ by $V^{(N)}(x) = \operatorname{tr}_N(f(x))$, where $x = (x_1, \dots, x_d)$ is a $d$-tuple of self-adjoint $N \times N$ matrices and $\operatorname{tr}_N = (1/N) \operatorname{Tr}$ is the normalized trace on $M_N(\mathbb{C})$.
Then we define a probability measure $\mu^{(N)}$ on $M_N(\mathbb{C})^d_{sa}$ by
$$d\mu^{(N)}(x) = \text{constant}(V, N)\, e^{-N^2 V^{(N)}(x)}\, dx,$$
where $dx$ is Lebesgue measure.
Let $X^{(N)}$ be a random matrix tuple chosen according to the measure $\mu^{(N)}$. A lot of past work has shown in certain cases that for every non-commutative polynomial $p$, $\operatorname{tr}_N(p(X^{(N)}))$ converges almost surely to some deterministic limit, which is described by $\tau(p(X))$ for some $d$-tuple $X$ from a tracial von Neumann algebra $(A, \tau)$. Then we might want to say that “$\operatorname{tr}(f)$ is a log-density of the distribution of $X$.”
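The simplest instance is $d = 1$ and $f(x) = x^2/2$, for which $\mu^{(N)}$ is the Gaussian unitary ensemble and the limiting law is the semicircle law with $\tau(X^2) = 1$ and $\tau(X^4) = 2$. The sketch below samples one such matrix in pure Python and computes normalized traces; the entry variances are derived from the density $e^{-N^2 \operatorname{tr}_N(x^2/2)}$, while the function names and the seed are arbitrary illustrative choices.

```python
import random

def sample_gue(n, rng):
    """Hermitian n x n matrix with density proportional to exp(-n * Tr(x^2) / 2)."""
    x = [[0j] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entries: real Gaussians of variance 1/n.
        x[i][i] = complex(rng.gauss(0.0, 1.0) / n ** 0.5, 0.0)
        for j in range(i + 1, n):
            # Off-diagonal entries: real and imaginary parts of variance 1/(2n).
            re = rng.gauss(0.0, 1.0) / (2.0 * n) ** 0.5
            im = rng.gauss(0.0, 1.0) / (2.0 * n) ** 0.5
            x[i][j] = complex(re, im)
            x[j][i] = complex(re, -im)
    return x

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def normalized_trace(a):
    """tr_N(a) = (1/N) Tr(a); the trace of a Hermitian power is real."""
    return sum(a[i][i] for i in range(len(a))).real / len(a)

def moments(n, seed=0):
    """Return (tr_N(X^2), tr_N(X^4)) for one sample of size n."""
    rng = random.Random(seed)
    x = sample_gue(n, rng)
    x2 = matmul(x, x)
    x4 = matmul(x2, x2)
    return normalized_trace(x2), normalized_trace(x4)
```

Already for moderate `n` the two values returned by `moments(n)` are close to 1 and 2, with fluctuations of order $1/N$.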

27. Motivation — free score function
This idea is closely related to Voiculescu’s idea of a free score function (a.k.a. conjugate variable).
The classical score function of a measure with density $\rho$ on $\mathbb{R}^d$ is $-\nabla \log \rho$. If $X$ is a random variable with density $\rho$ and if $\xi = -(\nabla \log \rho)(X)$, then we have the integration-by-parts relation
$$E[\langle \xi, f(X) \rangle] = E[\operatorname{Tr}(Df(X))]$$
for all $f \in C_c^\infty(\mathbb{R}^d, \mathbb{R}^d)$, where $Df$ is the Jacobian matrix.
Given a tracial von Neumann algebra $(A, \tau)$ generated by $X \in A^d_{sa}$, we say $\xi \in L^2(A, \tau)^d_{sa}$ is a free score function for $X$ if
$$\langle \xi, p(X) \rangle_\tau = \tau \otimes \tau \otimes \operatorname{Tr}(\mathcal{J} p)$$
for every non-commutative polynomial $p$, where $\mathcal{J} p$ is the matrix of derivatives of $p$ in the sense of Voiculescu’s free difference quotient.

28. Motivation — free score function
In the setting of the random matrix $d$-tuples $X^{(N)}$ above given by $V^{(N)}(x) = \operatorname{tr}_N(f(x))$, the free score function becomes very concrete. Since $\rho = \text{const}\, e^{-N^2 V^{(N)}}$, the classical score function is (up to normalization) $\nabla V^{(N)}(x)$, which after some matrix computations works out to $\mathcal{D} f(x)$, where $\mathcal{D} f$ is Voiculescu’s cyclic gradient.
Classical integration by parts plus some matrix computations tell us that
$$E\big[\langle \nabla V^{(N)}(X^{(N)}), p(X^{(N)}) \rangle_{\operatorname{tr}_N}\big] = E\big[\operatorname{tr}_N \otimes \operatorname{tr}_N \otimes \operatorname{Tr}_d[\mathcal{J} p(X^{(N)})]\big].$$
Thus, if $X$ is the $d$-tuple of self-adjoint operators from $(A, \tau)$ describing the large-$N$ limit, then $\xi = \mathcal{D} f(X)$ is the free score function for $X$.
The existence of a free score means that “the gradient of the log-density makes sense as an element of $L^2$.” This motivates us to make the log-density the central object of study.

29. Overview
We’re going to define the free Wasserstein manifold $\mathscr{W}(\mathbb{R}^{*d})$ as the space of certain “log-density” functions $V$, which are a generalization of things like $\operatorname{tr}(f)$.
More precisely, $V$ will come from a space of tracial non-commutative smooth functions that are defined in terms of trace polynomials. (Similar spaces were defined in Dabrowski, Guionnet, and Shlyakhtenko’s 2016 preprint on free transport, but we take a different approach to the norms.)
The tangent space of $\mathscr{W}(\mathbb{R}^{*d})$ at $V$ is similarly a space of tracial non-commutative smooth functions $W$, which are viewed as perturbations of $V$.

30. Overview
In order to obtain a Riemannian metric for the Wasserstein manifold, we must associate a non-commutative law $\mu_V$ to each $V$. This is much trickier than in the classical case (where we would just set $d\mu_V(x) = \text{constant}\, e^{-V(x)}\, dx$), but there are two known methods for doing this:
(1) For each $V$, find a (hopefully unique) law $\mu$ that maximizes $\chi(\mu) - \mu(V)$, where $\chi$ is Voiculescu’s free microstate entropy. This approach is inspired by Voiculescu and is closely related to the random matrix models discussed before.
(2) Set up the free stochastic differential equation $dX_t = dS_t - (1/2) \nabla V(X_t)\, dt$, where $S_t$ is a free Brownian motion (still a self-adjoint $d$-tuple), and (hopefully) recover $\mu_V$ as the limiting distribution of $X_t$ as $t \to \infty$. This approach was pioneered by Biane and Speicher (1999) and further developed by Guionnet, Shlyakhtenko, and Dabrowski.

31. Outline
- Background on non-commutative laws.
- Tracial non-commutative smooth functions.
- Free Wasserstein manifold and diffeomorphism group.
- Riemannian metric.
- Strategy to construct smooth transport.
- Inversion of the Laplacian $L_V$ through heat semigroup and SDE.
- Free Gibbs laws through maximization of $\chi(\mu) - \mu(V)$.
- Geodesics and optimal transport.

32. Operator algebras and laws
Definition
A unital $C^*$-algebra is a subalgebra of $B(H)$ (for some Hilbert space $H$) that is closed under adjoints and limits in operator norm.
Definition
A tracial $C^*$-algebra is a pair $(A, \tau)$ where $A$ is a $C^*$-algebra and $\tau$ is a faithful trace, that is,
- $\tau(1) = 1$,
- $\tau(a^* a) \geq 0$ with equality if and only if $a = 0$,
- $\tau(ab) = \tau(ba)$.
Remark
We don’t need to go into the definition of von Neumann algebras now. But every tracial $C^*$-algebra can be completed to a tracial von Neumann algebra.

33. Operator algebras and laws
Definition
Let $\mathbb{C}\langle x_1, \dots, x_d \rangle$ be the algebra of non-commutative polynomials equipped with the $*$-operation such that $x_j^* = x_j$. (We typically use $x$ to denote formal or generic variables and $X$ to denote a specific tuple of operators.)
Definition
Let $\Sigma_{d,R}$ be the set of linear functionals $\lambda : \mathbb{C}\langle x_1, \dots, x_d \rangle \to \mathbb{C}$ such that
- $\lambda(1) = 1$,
- $\lambda(p^* p) \geq 0$,
- $\lambda(pq) = \lambda(qp)$,
- $|\lambda(x_{i_1} \cdots x_{i_k})| \leq R^k$,
equipped with the weak-$*$ topology (as a subset of the dual of $\mathbb{C}\langle x_1, \dots, x_d \rangle$). We call elements of $\Sigma_{d,R}$ non-commutative laws with exponential bound $R$.

34. Operator algebras and laws
Proposition
If $(A, \tau)$ is a tracial $C^*$-algebra and $X = (X_1, \dots, X_d) \in A^d_{sa}$, then the map
$$\lambda_X : \mathbb{C}\langle x_1, \dots, x_d \rangle \to \mathbb{C}, \quad p \mapsto \tau(p(X))$$
is a non-commutative law with exponential bound $\|X\|_\infty = \max_j \|X_j\|$. Conversely, every $\lambda \in \Sigma_{d,R}$ can be realized as $\lambda_X$ for some $(A, \tau)$ and $X \in A^d_{sa}$ with $\|X\|_\infty \leq R$.
The proof is a variant of the GNS construction. The proposition can be interpreted as follows:
1. $\Sigma_{d,R}$ is the space of traces on the $C^*$-universal free product $C([-R, R])^{*d}$.
2. $\Sigma_{d,R}$ is in bijection with isomorphism classes of triples $(A, \tau, X)$, where $(A, \tau)$ is a tracial $C^*$-algebra and $X \in A^d_{sa}$ generates $A$; here isomorphism means a $C^*$-isomorphism that preserves the trace and generators.

35. Trace polynomials
We next describe non-commutative functions that are modeled on trace polynomials, in a similar spirit to Dabrowski-Guionnet-Shlyakhtenko 2016.
A trace polynomial in $(x_1, \dots, x_d)$ is an expression formed through addition, multiplication, and application of a symbol $\operatorname{tr}$, such as
$$f(x_1, x_2, x_3) = \operatorname{tr}(x_1^2 x_2) \operatorname{tr}(x_3)\, x_1 + \operatorname{tr}(x_2 x_3) + 5 \operatorname{tr}(x_1 x_2 x_3) \operatorname{tr}(x_1)\, x_2 x_3^2.$$
These expressions are considered modulo the relations $\operatorname{tr}(pq) = \operatorname{tr}(qp)$ and $\operatorname{tr}(\operatorname{tr}(p) q) = \operatorname{tr}(p) \operatorname{tr}(q)$.
For any tracial $C^*$-algebra $(A, \tau)$ and $X \in A^d_{sa}$, we can evaluate a trace polynomial $f$ on $X$ by substituting $X_j$ for the formal symbol $x_j$ and $\tau$ for the formal symbol $\operatorname{tr}$. Hence, a trace polynomial $f$ gives rise to a function $f^{A,\tau} : A^d_{sa} \to A$.
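Evaluation of a trace polynomial can be made concrete on matrices, taking $(A, \tau) = (M_N(\mathbb{C}), \operatorname{tr}_N)$. The sketch below uses a hypothetical example $f(x_1, x_2) = \operatorname{tr}(x_1 x_2)\, x_1 + \operatorname{tr}(x_1)\, x_2^2$ (not one from the talk) and checks that evaluation commutes with unitary conjugation, $f(U X U^*) = U f(X) U^*$.

```python
def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def adjoint(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * p for p in row] for row in a]

def tr(a):
    """Normalized trace tr_N, playing the role of the formal symbol tr."""
    return sum(a[i][i] for i in range(len(a))) / len(a)

def trace_poly(x1, x2):
    """Hypothetical example f(x1, x2) = tr(x1 x2) x1 + tr(x1) x2^2."""
    return add(scale(tr(matmul(x1, x2)), x1), scale(tr(x1), matmul(x2, x2)))

def conjugate_by(u, x):
    """u x u^* for a unitary u."""
    return matmul(matmul(u, x), adjoint(u))

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

This equivariance is the property behind the Procesi-type characterization of trace polynomials among unitarily equivariant polynomial functions on matrices.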


37. Trace polynomials
Trace polynomials have several advantages over non-commutative polynomials.
1. It follows from the work of Procesi (1976) that every function $M_N(\mathbb{C})^d_{sa} \to M_N(\mathbb{C})$ that is entrywise polynomial and is invariant under unitary conjugation must be given by a trace polynomial.
2. For each trace polynomial $f$, we can compute the Laplacian of $f^{M_N(\mathbb{C}), \operatorname{tr}_N}$ as a function on $M_N(\mathbb{C})^d_{sa}$ (equipped with the inner product from $\operatorname{tr}_N$). The Laplacian $(1/N^2) \Delta f^{M_N(\mathbb{C}), \operatorname{tr}_N}$ is a trace polynomial and it converges coefficientwise as $N \to \infty$ to some trace polynomial $Lf$.
We’ll define the non-commutative space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ roughly as functions such that the first $k$ derivatives can be approximated on operator-norm balls by trace polynomials.

38. Description of trace $C^k$ functions
The space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ is described as follows:
- Each $f \in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ is a collection of functions $f^{A,\tau} : A^d_{sa} \to A$ for tracial $C^*$-algebras $(A, \tau)$.
- $f^{A,\tau}$ must be a $C^k$ function in the sense of Fréchet differentiation. The derivative $\partial^k f^{A,\tau}(X)$ is a multilinear map $A^d_{sa} \times \cdots \times A^d_{sa} \to A$.
- Inspired by the non-commutative Hölder’s inequality, we define the norm $\|\partial^j f^{A,\tau}(X)\|_{\mathscr{M}^j}$ as the smallest constant such that
$$\|\partial^j f^{A,\tau}(X)[Y_1, \dots, Y_j]\|_p \leq \|\partial^j f^{A,\tau}(X)\|_{\mathscr{M}^j} \|Y_1\|_{p_1} \cdots \|Y_j\|_{p_j},$$
where $1/p = 1/p_1 + \cdots + 1/p_j$, and where $j = 0, \dots, k$.
- Then $\|\partial^j f\|_{\mathscr{M}^j, R}$ is the supremum of $\|\partial^j f^{A,\tau}(X)\|_{\mathscr{M}^j}$ over $(A, \tau)$ and $X \in A^d_{sa}$ with $\|X\|_\infty \leq R$.
- For $R > 0$ and $j \leq k$, we assume that $\|\partial^j f\|_{\mathscr{M}^j, R}$ is finite and that $\partial^j f$ can be approximated in this norm by trace polynomials of $X, Y_1, \dots, Y_j$ that are multilinear in $Y_1, \dots, Y_j$.

39. Properties of trace $C^k$ functions
There are also spaces $C_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}^j(\mathbb{R}^{*d_1}, \dots, \mathbb{R}^{*d_n}))$ of functions where $f^{A,\tau}(X)$ is a multilinear map $A^{d_1}_{sa} \times \cdots \times A^{d_n}_{sa} \to A$.
The exact definition of the space is less important than the properties:
- These spaces are closed under composition, whenever the composition makes sense, and they satisfy the chain rule.
- There is an inverse function theorem: if $f$ is a $C^k_{\mathrm{tr}}$ $d$-tuple, and if $\partial f - \mathrm{Id}$ is uniformly bounded by a constant $c < 1$, then $f^{-1}$ is defined and is $C^k_{\mathrm{tr}}$.
- There is a trace map $\operatorname{tr} : C^k_{\mathrm{tr}}(\mathbb{R}^{*d}) \to C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ given by $\operatorname{tr}(f)^{A,\tau}(X) = \tau(f^{A,\tau}(X))$. The image $\operatorname{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ consists of those $f$ which are scalar-valued.

40. Examples of trace $C^k_{\mathrm{tr}}$ functions
Of course, trace polynomials are $C^\infty_{\mathrm{tr}}$ functions.
If $\varphi : \mathbb{R} \to \mathbb{R}$ is such that $\int_{\mathbb{R}} |2\pi s|^k\, |\widehat{\varphi}(s)|\, ds < \infty$, then the function $f^{A,\tau}(X) = \varphi(X)$ (defined by functional calculus) is in $C^k_{\mathrm{tr}}(\mathbb{R}^{*1})$ and the $k$th derivative is bounded by $\int_{\mathbb{R}} |2\pi s|^k\, |\widehat{\varphi}(s)|\, ds$.
Together with the chain rule, this shows that there is an abundance of $BC^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ functions, that is, functions in $C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$ such that
$$\|\partial^j f\|_{\mathscr{M}^j, u} := \sup_{R > 0} \|\partial^j f\|_{\mathscr{M}^j, R} < \infty.$$
Imposing certain growth conditions at $\infty$ on a $C_{\mathrm{tr}}(\mathbb{R}^{*d})$ function is not a big restriction. This makes life easier than it would be if we only used trace polynomials.

41. Differentiation of trace $C^k$ functions
For scalar-valued $g \in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$, we can define a gradient $\nabla g \in C^k_{\mathrm{tr}}(\mathbb{R}^{*d})$. In the case where $g = \operatorname{tr}(p)$ for some non-commutative polynomial $p$, $\nabla g$ is the cyclic gradient of $p$.
The analog of $C^k$ functions from $\mathbb{R}^d$ to $M_d(\mathbb{C})$ is the space $C^k_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d}))$. This is the space that contains the derivative $\partial f$ when $f \in C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$, as well as the Hessian of $g$ when $g$ is a scalar-valued element of $C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d})$.
For $F \in C^k_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d}))$ and $X \in A^d_{sa}$, the object $F^{A,\tau}(X)$ is a linear transformation $A^d \to A^d$. We define $F \# G$ to be the pointwise composition of these linear transformations. It turns out that $C^k_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d}))$ is a $*$-algebra with respect to the $\#$-multiplication.

42. Differentiation of trace $C^k$ functions
We can define a trace $\operatorname{Tr}_\# : C^k_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$ by
$$(\operatorname{Tr}_\#(F))^{A,\tau}(X) = \langle S, F^{A * B, \tau * \sigma}(X)[S] \rangle_{\tau * \sigma},$$
where $(B, \sigma)$ is the tracial $C^*$-algebra generated by a free semicircular $d$-tuple $S$.
This is the analog of the map $C^k(\mathbb{R}^d, M_d(\mathbb{C})) \to C^k(\mathbb{R}^d)$ defined by pointwise application of the trace $\operatorname{Tr}_d$ on $M_d(\mathbb{C})$. This is because the trace of a matrix $A$ can be expressed as $E\langle Y, AY \rangle$ where $Y$ is a standard Gaussian random vector in $\mathbb{R}^d$, and the analog of the Gaussian in free probability is the semicircular family.
Another motivating example is that if $F \in C^k_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d}))$ is given by
$$F^{A,\tau}(X)[Y]_i = \sum_j p_{i,j}^{A,\tau}(X)\, Y_j\, q_{i,j}^{A,\tau}(X)$$
for some matrix $(p_{i,j} \otimes q_{i,j})_{i,j}$ of non-commutative polynomials, then
$$\operatorname{Tr}_\#(F)^{A,\tau}(X) = \sum_i \tau(p_{i,i}(X))\, \tau(q_{i,i}(X)).$$
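The last formula has an elementary matrix counterpart: the trace of the linear map $Y \mapsto P Y Q$ on $M_N(\mathbb{C})$, computed over the orthonormal basis of matrix units, is $\operatorname{Tr}(P) \operatorname{Tr}(Q)$. The sketch below verifies this for $d = 1$; dividing by $N^2$ is an assumed normalization chosen so that the result reads $\operatorname{tr}_N(P) \operatorname{tr}_N(Q)$, and the helper names are hypothetical.

```python
def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def matrix_unit(n, i, j):
    """E_ij: the n x n matrix with a single 1 in position (i, j)."""
    e = [[0.0] * n for _ in range(n)]
    e[i][j] = 1.0
    return e

def normalized_trace(a):
    return sum(a[i][i] for i in range(len(a))) / len(a)

def superoperator_trace(p, q):
    """(1/N^2) * sum_{i,j} <E_ij, P E_ij Q>, with <E_ij, A> = A[i][j]."""
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(n):
            image = matmul(matmul(p, matrix_unit(n, i, j)), q)
            total += image[i][j]
    return total / (n * n)
```

The sum over matrix units plays the role of the expectation against a Gaussian (or, in the free setting, semicircular) variable in the definition of $\operatorname{Tr}_\#$.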

43. Differentiation of trace $C^k$ functions
The trace $\operatorname{Tr}_\#$ allows us to define the divergence operator
$$\nabla^\dagger : C^{k+1}_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}))$$
as the trace of the Jacobian, as well as the Laplacian
$$L = \nabla^\dagger \nabla : \operatorname{tr}(C^{k+2}_{\mathrm{tr}}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d})).$$
These operators are the limits of the corresponding normalized divergence and Laplacian for functions on $M_N(\mathbb{C})^d_{sa}$.
Furthermore, the trace $\operatorname{Tr}_\#$ gives rise to a log-|determinant| map
$$\log \Delta_\# : GL(C^k_{\mathrm{tr}}(\mathbb{R}^{*d}, \mathscr{M}(\mathbb{R}^{*d}))) \to \operatorname{tr}(C^k_{\mathrm{tr}}(\mathbb{R}^{*d})).$$

44. Free Wasserstein manifold and diffeomorphism group
We’ll first set up the manifold formally. Afterwards, we’ll describe how to extract a non-commutative law $\mu_V$ from $V$ and hence define the Riemannian metric.
Definition
The free Wasserstein manifold $\mathscr{W}(\mathbb{R}^{*d})$ is the set of $V \in \operatorname{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ such that $V$ has “quadratic growth at $\infty$” in the sense that for some constants $a, a' > 0$ and $b, b' \in \mathbb{R}$, we have
$$a \sum_j \tau(X_j^2) + b \leq V^{A,\tau}(X) \leq a' \sum_j \tau(X_j^2) + b'.$$
Definition
The free diffeomorphism group $\mathscr{D}(\mathbb{R}^{*d})$ is the set of $f \in C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$ such that $f$ has an inverse function $f^{-1}$ in $C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$, and $\partial f$, $\partial f^{-1}$ are bounded. Note this is a group under composition.

45. Tangent vectors
A tangent vector to $\mathscr{W}(\mathbb{R}^{*d})$ at $V$ is an equivalence class of $C^1$ paths $(-\epsilon, \epsilon) \to \mathscr{W}(\mathbb{R}^{*d}) : t \mapsto V_t$ with $V_0 = V$, where two paths are equivalent if they have the same $\dot{V}_0$. For convenience, we assume that $V_t$ satisfies the quadratic growth bounds with $a, a', b, b'$ independent of $t$.
A tangent vector to $\mathscr{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$ is similarly an equivalence class of $C^1$ paths $t \mapsto f_t$ with $f_0 = \mathrm{id}$, and the equivalence is equality of $\dot{f}_0$. Again, assume that $\partial f_t$ and $\partial f_t^{-1}$ are uniformly bounded.
Here, by “$C^1$ path”, we mean it is continuously differentiable with respect to the Fréchet topology of $C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})$ on the target space (defined by the seminorms of each derivative $\partial^j f$ on each ball of radius $R$).

46. The transport action
In the classical case, one studies the action of $\mathrm{Diff}(\mathbb{R}^d)$ on $\mathcal{P}(\mathbb{R}^d)$ by push-forward, which is viewed as an infinite-dimensional Lie group acting on an infinite-dimensional Riemannian manifold. If $\mu$ has density $e^{-V}$ and if $f$ is a diffeomorphism, then $f_* \mu$ has density $e^{-(V \circ f^{-1} - \log |\det Df^{-1}|)}$ using the classical change of variables formula. This motivates the following definition.
Definition
We define the transport action $\mathscr{D}(\mathbb{R}^{*d}) \curvearrowright \mathscr{W}(\mathbb{R}^{*d})$ by
$$(f, V) \mapsto f_* V := V \circ f^{-1} - \log \Delta_\#(\partial f^{-1}).$$
One can check this is a well-defined group action.
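The classical one-dimensional version of this action, $f_* V = V \circ f^{-1} - \log |(f^{-1})'|$, is easy to test. The sketch below assumes a Gaussian starting potential and two hypothetical diffeomorphisms, $f(x) = 2x + 1$ and $g(x) = \sinh x$; it checks that the push-forward of $N(0, 1)$ under $f$ is $N(1, 4)$ and that $(g \circ f)_* V = g_*(f_* V)$.

```python
import math

def pushforward(V, f_inv, f_inv_prime):
    """Classical 1-D transport action: f_* V = V o f^{-1} - log |(f^{-1})'|."""
    return lambda y: V(f_inv(y)) - math.log(abs(f_inv_prime(y)))

def gaussian_potential(mean, std):
    """Minus the log-density of N(mean, std^2)."""
    return lambda x: (x - mean) ** 2 / (2.0 * std ** 2) + math.log(std * math.sqrt(2.0 * math.pi))

# f(x) = 2x + 1 (hypothetical example), given through its inverse.
def affine_inv(y):
    return (y - 1.0) / 2.0

def affine_inv_prime(y):
    return 0.5

# g(x) = sinh(x) (hypothetical example), given through its inverse.
def sinh_inv(y):
    return math.asinh(y)

def sinh_inv_prime(y):
    return 1.0 / math.sqrt(1.0 + y * y)

# Inverse of g o f is f^{-1} o g^{-1}; its derivative follows from the chain rule.
def composite_inv(y):
    return affine_inv(sinh_inv(y))

def composite_inv_prime(y):
    return affine_inv_prime(sinh_inv(y)) * sinh_inv_prime(y)
```

The group-action property here is just the chain rule for the Jacobian factor, which is the role played by $\log \Delta_\#$ in the free setting.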

47. Differential of the transport action
The key computation behind transport theory is the description of the differential of the transport action. We define
$$\nabla_V^* : C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d \to \operatorname{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$$
by
$$\nabla_V^* f = -\nabla^\dagger f + \partial V \# f = -\operatorname{Tr}_\#(\partial f) + \langle \nabla V, f \rangle_{\operatorname{tr}}.$$
(This is just notation; it is not actually the adjoint.)
Lemma
Let $V \in \mathscr{W}(\mathbb{R}^{*d})$ and let $t \mapsto f_t$ be a tangent vector to $\mathscr{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$. Then
$$\frac{d}{dt}\Big|_{t=0} (f_t)_* V = -\nabla_V^* \dot{f}_0.$$
In other words, $-\nabla_V^*$ is the differential at $\mathrm{id}$ of the orbit map $\mathscr{D}(\mathbb{R}^{*d}) \to \mathscr{W}(\mathbb{R}^{*d}) : f \mapsto f_* V$.
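A classical one-variable sketch (not the free proof) shows where the two terms and the sign come from. Here $\nabla_V^* h = -h' + V' h$ is the adjoint of $d/dx$ in $L^2(e^{-V} dx)$, and we take $f_t(x) = x + t\,h(x) + O(t^2)$, so that $f_t^{-1}(y) = y - t\,h(y) + O(t^2)$ and $(f_t^{-1})'(y) = 1 - t\,h'(y) + O(t^2)$:

```latex
\begin{aligned}
(f_t)_* V(y)
  &= V\big(f_t^{-1}(y)\big) - \log\big|(f_t^{-1})'(y)\big| \\
  &= V(y) - t\,V'(y)\,h(y) + t\,h'(y) + O(t^2), \\
\frac{d}{dt}\Big|_{t=0} (f_t)_* V
  &= h' - V' h \;=\; -\big(-h' + V' h\big) \;=\; -\nabla_V^* h .
\end{aligned}
```

The first term comes from composing $V$ with the inverse map and the second from the log-Jacobian, mirroring $\partial V \# f$ and $\operatorname{Tr}_\#(\partial f)$ in the free definition.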

48. $\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
We saw that the tangent space of $\mathscr{D}(\mathbb{R}^{*d})$ is (a dense subspace of) the space of vector fields $C_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$. Conversely:
Lemma
Given a time-dependent vector field $t \mapsto h_t$ (continuous in $t$) such that $\partial h_t$ is uniformly bounded, there exists a unique path $f_t$ in $\mathscr{D}(\mathbb{R}^{*d})$ such that
$$f_0 = \mathrm{id}, \qquad \dot{f}_t = h_t \circ f_t.$$
The proof is similar to classical ODE theory. If $h$ is independent of $t$, then we get a one-parameter subgroup of $\mathscr{D}(\mathbb{R}^{*d})$. Combining this with our previous observation:
Lemma
Let $h \in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d_{sa}$ with $\partial h$ bounded, and let $f_t$ be the corresponding one-parameter subgroup. Then $(f_t)_* V = V$ for all $t$ if and only if $\nabla_V^* h = 0$.

49. $\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
By studying the one-parameter subgroups of $\mathscr{D}(\mathbb{R}^{*d})$ as described above, we arrive at the following definition of the Lie bracket, completely analogous to the Lie bracket on vector fields of $\mathbb{R}^d$.
Definition
For two vector fields $h_1, h_2 \in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$, let
$$[h_1, h_2] = \partial h_1 \# h_2 - \partial h_2 \# h_1.$$
This generalizes the definition of Lie brackets for non-commutative polynomials used in Voiculescu’s paper “Cyclomorphy.”

50. $\mathscr{D}(\mathbb{R}^{*d})$ as a Lie group
For each $V \in \mathscr{W}(\mathbb{R}^{*d})$, its stabilizer $\{f \in \mathscr{D}(\mathbb{R}^{*d}) : f_* V = V\}$ is a “Lie subgroup,” analogous to a classical group of measure-preserving transformations.
By our previous observations, the corresponding Lie subalgebra should be the set of vector fields $h$ with $\nabla_V^* h = 0$. We can verify directly that this is indeed a Lie subalgebra:
Lemma
$\nabla_V^* [h_1, h_2] = \partial(\nabla_V^* h_1) \# h_2 - \partial(\nabla_V^* h_2) \# h_1$, and in particular $\ker(\nabla_V^*)$ is closed under Lie brackets.

51. Two ingredients for the Riemannian metric
In order to define the Riemannian metric on the tangent space at $V$, we need two conditions on $V$. We will worry later about checking when these are true.
Condition 1
There exists a unique non-commutative law $\mu_V$ satisfying the Dyson-Schwinger equation $\mu_V[\nabla_V^* f] = 0$ for $f \in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$.
Note that $\nabla_V^* f$ is a scalar-valued function approximated by trace polynomials, and $\mu_V[\nabla_V^* f]$ is evaluated as $(\nabla_V^* f)^{A,\tau}(X)$ for any $X$ with $\lambda_X = \mu_V$.
Condition 2
The operator $L_V = -\nabla_V^* \nabla : \operatorname{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ has kernel equal to the constant functions, and it has a continuous pseudo-inverse $\Psi_V : \operatorname{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d}))$ with $\mu_V(\Psi_V f) = 0$ and
$$-\Psi_V L_V f + \mu_V(f) = f.$$
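Both conditions can be seen at work in the classical one-variable analog with $V(x) = x^2/2$, where $\mu_V$ is the standard Gaussian, $\nabla_V^* f = -f' + x f$, and $L_V g = g'' - x g'$. The sketch below (polynomials as coefficient lists, lowest degree first; all helper names are hypothetical) checks the Dyson-Schwinger equation on monomials and the eigen-relation $L_V \mathrm{He}_n = -n\, \mathrm{He}_n$ for the probabilists' Hermite polynomials. Under the sign convention $-\Psi_V L_V f + \mu_V(f) = f$, the pseudo-inverse would then act as $\Psi_V \mathrm{He}_n = \mathrm{He}_n / n$ for $n \geq 1$.

```python
def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def scale(c, p):
    return trim([c * a for a in p])

def times_x(p):
    return trim([0] + list(p))

def derivative(p):
    return trim([k * p[k] for k in range(1, len(p))] or [0])

def hermite(n):
    """Probabilists' Hermite polynomial via He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, add(times_x(cur), scale(-k, prev))
    return cur

def L_V(p):
    """Classical L_V g = g'' - V' g' with V(x) = x^2 / 2."""
    dp = derivative(p)
    return add(derivative(dp), scale(-1, times_x(dp)))

def grad_star_V(f):
    """Classical nabla_V^* f = -f' + V' f with V'(x) = x."""
    return add(scale(-1, derivative(f)), times_x(f))

def gaussian_moment(k):
    """E[x^k] for the standard Gaussian: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    result = 1
    for m in range(1, k, 2):
        result *= m
    return result

def expectation(p):
    return sum(c * gaussian_moment(k) for k, c in enumerate(p))
```

In this classical model the kernel of $L_V$ on polynomials is exactly the constants, since every non-constant Hermite polynomial has a nonzero eigenvalue.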

52. The Riemannian metric
Definition
If $V$ satisfies Conditions 1 and 2, the Riemannian metric on $T_V \mathscr{W}(\mathbb{R}^{*d})$ is given by
$$\langle \dot{V}_1, \dot{V}_2 \rangle_V = \mu_V\big[\langle \nabla \Psi_V \dot{V}_1, \nabla \Psi_V \dot{V}_2 \rangle_{\operatorname{tr}}\big].$$
Remark
This definition relates to the Riemannian metric for measures on $M_N(\mathbb{C})^d_{sa}$. If $\mu_V^{(N)}$ is the measure with density constant times $e^{-N^2 V^{M_N(\mathbb{C}), \operatorname{tr}_N}}$, then the classical Riemannian metric can be expressed as
$$\int \langle \nabla (L_{V^{(N)}})^{-1} \dot{V}_1, \nabla (L_{V^{(N)}})^{-1} \dot{V}_2 \rangle_{\operatorname{tr}_N}\, d\mu^{(N)} = -N^2 \int \dot{V}_1\, (L_{V^{(N)}})^{-1} \dot{V}_2\, d\mu^{(N)}.$$
The expression on the right-hand side seems simpler, but it is dimension-dependent!

53. Consequences of Dyson-Schwinger equation
If Conditions 1 and 2 hold for some $V$, then using the formula for $\nabla_V^* [h_1, h_2]$, one can show that $\ker(\nabla_V^*)$ and $\operatorname{Im}(\nabla)$ are orthogonal with respect to $\mu_V$.
Furthermore, $\nabla \Psi_V \nabla_V^* : C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d \to C^\infty_{\mathrm{tr}}(\mathbb{R}^{*d})^d$ defines a projection onto the space of gradients. The complementary projection is known as the Leray projection.
Remark
In the classical setting, the decomposition of vector fields into $\ker(\nabla_V^*)$ and $\operatorname{Im}(\nabla)$ is an infinitesimal version of Brenier’s factorization of a diffeomorphism into an optimal transport map and a $\mu_V$-preserving transformation.

54. Warnings
Although Condition 1 stipulates that $\mu_V$ is uniquely determined by $V$, there are many cases where $V$ is not uniquely determined by $\mu_V$. For instance, since $\mu_V$ arises from bounded operators (it is “supported on an operator norm ball”), often modifying $V$ outside an operator norm ball will not change $\mu_V$.
Another way in which degeneracy arises is from the use of trace polynomials. If a particular $(A, \tau)$ and $X$ are given, and if $f$ is a trace polynomial, then $f^{A,\tau}(X)$ agrees with $p(X)$ for some non-commutative polynomial $p$. We can easily imagine that many $V$ lead to the same $\mu$ for this reason.
Relatedly, the Riemannian metric on the tangent space could have a very large kernel because when we take the inner product in $L^2(\mu_V)$, all the $\operatorname{tr}(p)$ terms are collapsed to constants.

55. Construction of transport
Closely related to the previous observation about the differential of the transport action, we have:
Lemma
Suppose that $t \mapsto V_t$ is a $C^1$ path in $\mathscr{W}(\mathbb{R}^{*d})$, for $t$ in some interval containing $0$. Let $h_t$ be a vector field with $\partial h_t$ uniformly bounded and $-\nabla_{V_t}^* h_t = \dot{V}_t$. Let $f_t$ be the flow along the vector field $h_t$. Then $(f_t)_* V_0 = V_t$.
Suppose we are given the path $t \mapsto V_t$ (perhaps interpolating between some given $V_0$ and $V_1$) and we want to construct $h_t$. If each $V_t$ satisfies Conditions 1 and 2, then we can take $h_t = -\nabla \Psi_{V_t} \dot{V}_t$. For $\partial h_t$ to be bounded, we require some concrete estimate on $\Psi_V$. For $h_t$ to depend continuously on $t$, we need some joint continuous dependence of $\Psi_V f$ on $V$ and $f$, at least for some family of $V$’s that contains our given path. If these conditions are met, then some smooth transport exists.

56. Construction of transport
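The following theorem is similar to previous work such as Guionnet-Shlyakhtenko 2009, Dabrowski-Guionnet-Shlyakhtenko 2016.
Theorem A
Fix $C_1, C_2, C_3 > 0$ with $C_2 < 1$. Consider $V \in \operatorname{tr}(C_{\mathrm{tr}}^\infty(\mathbb{R}^{*d}))$ such that $\|\nabla V\|_{BC_{\mathrm{tr}}} \le C_1$ and $\|\partial \nabla V - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2$. Then:
- $V$ satisfies Conditions 1 and 2.
- For such $V$, the map $(V, f) \mapsto \Psi_V f$ is jointly continuous with respect to the Fréchet topology on $C_{\mathrm{tr}}^\infty$.
- Let $k \ge 0$. If $V$ is as above and furthermore $\partial^j V$ is bounded by some constant $C_j$ for $j \le k + 2$, then $\Psi_V$ maps $BC_{\mathrm{tr}}^k$ into $BC_{\mathrm{tr}}^k$.
The theorem implies that for a path $t \mapsto V_t$, if $\nabla V_t$, $\partial \nabla V_t$, $\partial^2 \nabla V_t$, $\nabla \dot{V}_t$, $\partial \nabla \dot{V}_t$ are uniformly bounded, with $\|\partial \nabla V_t - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2 < 1$, then the above construction of transport works.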

57. Construction of transport
From this result, one immediately gets isomorphisms of $C^*$-algebras associated to the non-commutative laws $\mu_{V_t}$.
Theorem B
For a path $t \mapsto V_t$ satisfying the conditions on the previous slide, there exists a $C^1$ path $t \mapsto f_t$ of diffeomorphisms with $(f_t)_* V_0 = V_t$. These give rise to isomorphisms between the tracial $C^*$-algebras (and the von Neumann algebras) associated to the GNS representations of the non-commutative laws $\mu_{V_t}$. In particular, when $V$ is as in the previous theorem, the $C^*$-algebra of $\mu_V$ is isomorphic to the one generated by a free semicircular family.
There is one thing to check to finish the proof: If $f_* V_0 = V_1$, then does $f_* \mu_{V_0} = \mu_{V_1}$? For the potentials $V_t$ as in the previous slide, this can be checked from the free entropy viewpoint, which will be explained later.

58. Warnings
These results are not true for arbitrary $V$, even in the one variable case. Indeed, as in Biane-Speicher 1999, consider $V^{A,\tau}(X) = \tau(f(X))$ where $f : \mathbb{R} \to \mathbb{R}$ is a “double well” potential. If the wells are deep enough, then in the large $N$ limit the spectral distribution is supported on a disjoint union of two intervals. Hence, the $C^*$-algebra is $C[0,1] \oplus C[0,1]$, which is not isomorphic to the $C^*$-algebra $C[0,1]$ which is obtained in the semicircular case.
Actually, Condition 1 fails for such a potential because other measures satisfying the Dyson-Schwinger equation are obtained by reweighting the two components.
Relatedly, there are non-constant smooth functions $\phi$ such that $L_V \phi$ vanishes in $L^2(\mu_V)$. Namely, we take $\phi(X) = \tau(f(X))$ where $f : \mathbb{R} \to \mathbb{R}$ is constant on each of the two intervals and is smooth. On the other hand, $L_V \phi$ is not zero in $\operatorname{tr}(C_{\mathrm{tr}}^\infty(\mathbb{R}^{*d}))$, but the significance of this is unclear.

59. Inversion of the Laplacian
Theorem A comes out of two sets of tools:
1. The free entropy approach is used to show existence of a non-commutative law $\mu$ satisfying $\mu[\nabla_V^* f] = 0$ for $f \in C_{\mathrm{tr}}(\mathbb{R}^{*d})^d$.
2. The heat semigroup is used to show uniqueness of a non-commutative law $\mu$ satisfying $\mu[L_V \phi] = 0$ for $\phi \in \operatorname{tr}(C_{\mathrm{tr}}(\mathbb{R}^{*d}))$, as well as to construct $\Psi_V$.
2016, but with different function spaces.

60. Inversion of the Laplacian
Recalling that $L_V = -\nabla_V^* \nabla$, the heat semigroup is the family of operators $e^{tL_V}$ for $t \ge 0$. The rigorous definition is through free SDE theory. We set
$$[e^{tL_V/2} f]^{A,\tau}(X) = E_A[f(X_t(X))],$$
where
$$dX_t(X) = dS_t - \tfrac{1}{2} \nabla V(X_t(X))\,dt, \qquad X_0(X) = X,$$
where $S_t$ is a semicircular Brownian motion freely independent of the initial condition $X$.
The assumption that $\|\partial \nabla V - \mathrm{Id}\|_{BC_{\mathrm{tr}}} \le C_2 < 1$ implies that $\partial_X X_t$ decays like $e^{-t(1 - C_2)/2}$.
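The decay of the dependence on the initial condition can be illustrated in the classical scalar case. The sketch below is an assumption-laden stand-in, not the free SDE of the talk: it replaces the semicircular Brownian motion by an ordinary Brownian motion, uses the hypothetical potential $V(x) = x^2/2 + C_2 \cos x$ (so that $|V'' - 1| \le C_2$), and discretizes with an Euler scheme. Two trajectories driven by the same noise are compared, so their gap measures the sensitivity to the starting point.

```python
import math
import random

C2 = 0.5  # hypothetical constant: |V''(x) - 1| <= C2 < 1 for the potential below

def grad_V(x):
    # V(x) = x^2 / 2 + C2 * cos(x), so V'(x) = x - C2 * sin(x)
    # and V''(x) - 1 = -C2 * cos(x).
    return x - C2 * math.sin(x)

def euler_pair(x0, y0, t_final, n_steps, seed=0):
    # Euler scheme for dX = dB - (1/2) V'(X) dt, run from two starting points
    # with the SAME Brownian increments, so that the difference isolates the
    # dependence on the initial condition.
    rng = random.Random(seed)
    dt = t_final / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))
        x = x + dB - 0.5 * grad_V(x) * dt
        y = y + dB - 0.5 * grad_V(y) * dt
    return x, y

t_final = 4.0
x_t, y_t = euler_pair(1.0, 1.001, t_final, 4000)
gap = abs(x_t - y_t)
# Predicted envelope: initial gap times exp(-t (1 - C2) / 2).
bound = 0.001 * math.exp(-t_final * (1.0 - C2) / 2.0)
print(gap, bound)
```

Each Euler step multiplies the gap by $1 - \tfrac{1}{2} V''(\xi)\,dt \le 1 - \tfrac{1}{2}(1 - C_2)\,dt$ for some intermediate point $\xi$, which is why the gap stays below the exponential envelope for any realization of the noise.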

61. Inversion of the Laplacian
This in turn implies $\partial[e^{tL_V} f]$ decays like $e^{-t(1 - C_2)}$. To recover the non-commutative law $\mu_V$ and the pseudo-inverse $\Psi_V$, we argue that
$$\mu_V(f) = \lim_{t \to \infty} e^{tL_V} f$$
and
$$\Psi_V f = \int_0^\infty [e^{tL_V} f - \mu_V(f)]\,dt.$$
These expressions make sense because of the exponential decay.
The smoothness properties as well as the continuous dependence of $\Psi_V f$ on $(V, f)$ are proved by studying the smoothness properties of $X_t(X)$ as a function of $X$, with some simpleminded inductive arguments.
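The two formulas (the law as a long-time limit, and the pseudo-inverse as a time integral) can be tried out on a finite-state classical analogue. The sketch below is not from the talk: the $3 \times 3$ matrix `L` is a hypothetical symmetric Markov generator standing in for $L_V$, its stationary law is uniform, and Euler steps stand in for the exact semigroup. Reading the integral formula literally, the resulting operator satisfies $-L \Psi f = f - \mu(f)$, which is what the code checks.

```python
def matvec(L, f):
    # Matrix-vector product for small dense matrices stored as lists of rows.
    return [sum(L[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

# Hypothetical symmetric generator of a 3-state Markov chain (rows sum to zero),
# standing in for L_V; its stationary law is uniform.
L = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [1.0, 2.0, -3.0]]

def semigroup_limit_and_psi(L, f, dt=0.01, n_steps=4000):
    # Approximate e^{tL} f by Euler steps g <- g + dt * L g.
    # mu(f) is read off as the long-time limit, and Psi f is the Riemann sum
    # of (e^{tL} f - mu(f)) over time.
    g = list(f)
    history = []
    for _ in range(n_steps):
        history.append(g)
        Lg = matvec(L, g)
        g = [g[i] + dt * Lg[i] for i in range(len(g))]
    mu_f = g[0]  # after many steps all entries agree with mu(f)
    psi = [dt * sum(h[i] - mu_f for h in history) for i in range(len(f))]
    return mu_f, psi

f = [3.0, 4.0, -1.0]
mu_f, psi_f = semigroup_limit_and_psi(L, f)
minus_L_psi = [-v for v in matvec(L, psi_f)]
print(mu_f, minus_L_psi)
```

For this discretization the Riemann sum is a Neumann series for $(-L)^{-1}$ on mean-zero vectors, so the identity holds up to the (negligible) tail of the series rather than up to a time-step error.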

62. Free Gibbs laws — results
A free Gibbs law for $V$ is a non-commutative law $\mu$ that maximizes $\chi_V(\mu) := \chi(\mu) - \mu(V)$, where $\chi$ is the free microstate entropy.
We can show the following:
1. If $V \in \mathscr{W}(\mathbb{R}^{*d})$ with $\partial V$ and $\partial^2 V$ bounded, then a free Gibbs law always exists.
2. Due to the change of variables formula for entropy, any free Gibbs law $\mu$ must satisfy the Dyson-Schwinger equation $\mu[\nabla_V^* f] = 0$.
3. Fix $C_1, C_2 > 0$. The set of $V$ which have a unique free Gibbs law is generic in the set $\mathcal{V}_{C_1,C_2}$ of $V$ with $\|\partial V\|_{BC_{\mathrm{tr}}} \le C_1$ and $\|\partial^2 V\|_{BC_{\mathrm{tr}}} \le C_2$, equipped with the subspace topology from $\operatorname{tr}(C_{\mathrm{tr}}(\mathbb{R}^{*d}))$.

63. Free Gibbs laws — proof with lies
The argument for the existence of free Gibbs laws relies on enlarging the space of laws in order to obtain more compactness. More precisely:
1. We embed the space of non-commutative laws into the dual of a Banach space $\mathcal{C}$ consisting of certain functions with quadratic growth at $\infty$.
2. Letting $E \subseteq \mathcal{C}^\star$ be the closure of the space of laws, it turns out that the set of elements of $E$ with “second moment” (not operator norm) bounded by $r$ is compact.
3. $\chi_V$ is upper semi-continuous and it goes to $-\infty$ as the “second moment” of $\mu$ goes to $\infty$, and thus we get a maximizer using compactness.
4. Using the change of variables formula for entropy, we deduce that any maximizer $\nu$ satisfies the Dyson-Schwinger equation (for nice enough test functions).
5. Using the Dyson-Schwinger equation, we show iteratively that moments of $\nu$ are finite, and ultimately that $\nu \in \Sigma_{d,R}$ for some $R$.

64. Geodesic equations
Definition
The geodesic equations on $\mathscr{W}(\mathbb{R}^{*d})$ are the pair of equations
$$\begin{cases} \dot{V}_t = L_{V_t} \phi_t, \\ \dot{\phi}_t = -\tfrac{1}{2} \langle \nabla \phi_t, \nabla \phi_t \rangle_{\mathrm{tr}}. \end{cases}$$
These can be obtained formally as the large $N$ limit of the geodesic equations for measures on $M_N(\mathbb{C})^d_{\mathrm{sa}}$.
Thinking about the classical case, one is led to conjecture that nice enough solutions must have the form
$$V_t = (\mathrm{id} + t \nabla \phi_0)_* V_0.$$
It is straightforward to check that when $\partial \nabla \phi_0$ is bounded, this formula defines a solution for small enough $t$. We do not show rigorously that these are the only solutions.
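In the classical one-variable Gaussian case the proposed form of the solution can be verified directly. The sketch below is an illustration under stated assumptions and is not from the talk: it takes $V_0(x) = x^2/2$ and the hypothetical initial potential $\phi_0(x) = a x^2/2$, so that $\mathrm{id} + t \nabla \phi_0$ is the scaling $x \mapsto (1 + ta)x$, and it uses the classical Laplacian $L_V \phi = \phi'' - V' \phi'$. Both equations of the pair are then checked by finite differences.

```python
import math

A = 0.7  # hypothetical constant: phi_0(x) = A * x^2 / 2, so the map is x -> (1 + t*A) * x

def V(t, x):
    # Push-forward of V_0(x) = x^2 / 2 under x -> (1 + t*A) * x (Gaussian case).
    s = 1.0 + t * A
    return x * x / (2.0 * s * s) + math.log(s)

def phi(t, x):
    # Solution of the Hamilton-Jacobi equation with phi(0, x) = A * x^2 / 2.
    return A * x * x / (2.0 * (1.0 + t * A))

def d_dt(F, t, x, eps=1e-5):
    return (F(t + eps, x) - F(t - eps, x)) / (2 * eps)

def d_dx(F, t, x, eps=1e-5):
    return (F(t, x + eps) - F(t, x - eps)) / (2 * eps)

def d2_dx2(F, t, x, eps=1e-4):
    return (F(t, x + eps) - 2 * F(t, x) + F(t, x - eps)) / (eps * eps)

def residuals(t, x):
    # Classical Laplacian L_V phi = phi'' - V' * phi'.
    L_phi = d2_dx2(phi, t, x) - d_dx(V, t, x) * d_dx(phi, t, x)
    r1 = d_dt(V, t, x) - L_phi                        # first geodesic equation
    r2 = d_dt(phi, t, x) + 0.5 * d_dx(phi, t, x) ** 2  # second geodesic equation
    return abs(r1), abs(r2)

worst = max(max(residuals(t, x)) for t in (0.0, 0.4, 1.0) for x in (-1.5, 0.2, 2.0))
print(worst)
```

Both residuals vanish up to finite-difference error, which matches the classical picture of displacement interpolation along $x \mapsto x + t \nabla \phi_0(x)$.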

65. Towards free optimal transport
However, we can show rigorously that these paths minimize length with respect to the $L^2$-coupling distance when $\partial \nabla \phi_0$ is bounded by a constant $C$ and when $t \in (0, 1/C)$. This follows from the more general proposition below.
Definition
For two non-commutative laws $\mu$ and $\nu$, we define $d_W(\mu, \nu)$ as the infimum of $\|X - Y\|_2$ over all tracial $C^*$-algebras $(A, \tau)$ and $X, Y \in A^d_{\mathrm{sa}}$ such that $\lambda_X = \mu$ and $\lambda_Y = \nu$.
Proposition
Let $\phi \in \operatorname{tr}(C_{\mathrm{tr}}^2(\mathbb{R}^{*d}))_{\mathrm{sa}}$ such that $\|\partial \nabla \phi - \mathrm{Id}\|_{BC_{\mathrm{tr}}} < 1$. Then for every $(A, \tau)$ and $X \in A^d_{\mathrm{sa}}$, we have $d_W(\lambda_X, \lambda_{\nabla \phi(X)}) = \|X - \nabla \phi(X)\|_{\tau,2}$. In other words, $X$ and $\nabla \phi(X)$ are an optimal coupling of their respective laws.

66. Towards free optimal transport
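The proof of the proposition is inspired by the classical Monge-Kantorovich duality.
By the inverse function theorem, $\nabla \phi$ has an inverse function, so define
$$\psi^{A,\tau}(Z) = \langle Z, ((\nabla \phi)^{-1})^{A,\tau}(Z) \rangle_\tau - (\phi \circ (\nabla \phi)^{-1})^{A,\tau}(Z).$$
Note that $Y = ((\nabla \phi)^{-1})^{A,\tau}(Z)$ maximizes the function
$$\langle Z, Y \rangle_\tau - \phi^{A,\tau}(Y)$$
by calculus and by convexity of $\phi$. (So $\psi$ is the Legendre transform of $\phi$.) Thus,
$$\phi^{A,\tau}(Y) + \psi^{A,\tau}(Z) \ge \langle Y, Z \rangle_\tau$$
for all $Y, Z \in A^d_{\mathrm{sa}}$.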
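The Legendre transform construction and the resulting inequality can be tested numerically in the classical one-variable case. The sketch below is only an illustration and is not from the talk: it assumes the hypothetical potential $\phi(y) = y^2/2 + \tfrac{1}{2} \cos y$, for which $|\phi'' - 1| \le 1/2 < 1$, inverts $\phi'$ by bisection, and evaluates $\psi(z) = z\,(\phi')^{-1}(z) - \phi((\phi')^{-1}(z))$.

```python
import math

def phi(y):
    # Hypothetical convex potential with |phi'' - 1| <= 0.5 < 1.
    return 0.5 * y * y + 0.5 * math.cos(y)

def grad_phi(y):
    return y - 0.5 * math.sin(y)

def grad_phi_inverse(z, tol=1e-13):
    # grad_phi is increasing with slope in [0.5, 1.5], so bisection on a
    # bracket of half-width 1 around z finds the unique root of grad_phi(y) = z.
    lo, hi = z - 1.0, z + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if grad_phi(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def psi(z):
    # Legendre transform evaluated at the maximizer y = (grad phi)^{-1}(z).
    y = grad_phi_inverse(z)
    return z * y - phi(y)

grid = [k / 4.0 for k in range(-12, 13)]
# Gap phi(y) + psi(z) - y*z over the grid, which should be nonnegative.
min_gap = min(phi(y) + psi(z) - y * z for y in grid for z in grid)
# Equality case z = grad_phi(y), where the gap should vanish.
max_equality_defect = max(
    abs(phi(y) + psi(grad_phi(y)) - y * grad_phi(y)) for y in grid
)
print(min_gap, max_equality_defect)
```

The gap is nonnegative on the whole grid and vanishes exactly along $z = \phi'(y)$, which is the pair of facts the duality argument relies on.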

67. Towards free optimal transport
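Fix $(A, \tau)$ and $X \in A^d_{\mathrm{sa}}$. If $Y, Z$ is any coupling of $\lambda_X$ and $\lambda_{\nabla \phi(X)}$ on some other tracial $C^*$-algebra $(A', \tau')$, then
$$\langle Y, Z \rangle_{\tau'} \le \phi^{A',\tau'}(Y) + \psi^{A',\tau'}(Z) = \phi^{A,\tau}(X) + \psi^{A,\tau}(\nabla \phi^{A,\tau}(X)) = \langle X, \nabla \phi^{A,\tau}(X) \rangle_\tau,$$
where the middle equality holds because the values of the scalar-valued functions $\phi$ and $\psi$ depend only on the laws of their arguments, and the last equality follows by the definition of $\psi$. Thus, $X, \nabla \phi(X)$ is a coupling that maximizes the inner product between the first and second variable, which is equivalent to minimizing the $L^2$ distance (since the $L^2$ norms of the two variables are uniquely determined by their laws).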
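The conclusion can be illustrated in the simplest commutative setting. The sketch below is not from the talk and covers far less than the proposition: it assumes a six-point empirical law on the real line, the hypothetical monotone map $\nabla \phi(x) = x - \tfrac{1}{2} \sin x$, and it only compares couplings given by permutations of the sample. Among these, the pairing of each $x$ with $\nabla \phi(x)$ should maximize the inner product and minimize the mean squared distance.

```python
import itertools
import math

def grad_phi(x):
    # Hypothetical monotone map: derivative of phi(x) = x^2/2 + cos(x)/2.
    return x - 0.5 * math.sin(x)

xs = [-1.3, -0.4, 0.1, 0.9, 1.7, 2.2]
ys = [grad_phi(x) for x in xs]

def cost(perm):
    # Mean squared distance of the coupling that pairs xs[i] with ys[perm[i]].
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / len(xs)

def inner(perm):
    # Inner product between the two variables under the same coupling.
    return sum(x * ys[j] for x, j in zip(xs, perm)) / len(xs)

perms = list(itertools.permutations(range(len(xs))))
identity = tuple(range(len(xs)))
best_by_cost = min(perms, key=cost)
best_by_inner = max(perms, key=inner)
print(best_by_cost, best_by_inner)
```

Both searches return the identity permutation, since the cost and the inner product differ only by the second moments of the two samples, which are the same for every permutation.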