
Free Wasserstein manifold

Wuchen Li
February 22, 2021

We formulate a free probabilistic analog of the Wasserstein manifold on $\mathbb{R}^d$ (the formal Riemannian manifold of smooth probability densities on $\mathbb{R}^d$). The points of the free Wasserstein manifold are certain smooth tracial non-commutative functions which correspond to minus the log-density in the classical setting. The manifold structure allows us to formulate and study a number of differential equations giving rise to non-commutative transport maps, as well as analogs of measure-preserving transformations. One of the applications of our results is the optimality (in the sense of the Biane-Voiculescu 2-Wasserstein distance) of certain monotone optimal transport maps, which correspond to geodesics in our manifold (joint work by D. Jekel, W. Li and D. Shlyakhtenko).


Transcript

  1. Free Wasserstein manifold
    Wuchen Li
    University of South Carolina
    Free probability seminar, UC Berkeley
    Based on joint work with David Jekel (UCSD) and Dimitri
    Shlyakhtenko (UCLA).

  2. Entropy, Fisher information and Transportation
    In recent years, there has been active research connecting entropy, Fisher
    information, and transportation. These connections now have
    applications in functional inequalities and in the dynamical behavior of
    fluids.
    In this talk, we discuss transportation theory in free probability by
    studying “log-densities” for non-commutative random variables.

  3. Metric in probability space

  4. Brownian motion and heat equations
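    Consider a standard Brownian motion in $\mathbb{R}^d$ given by
    $$dX_t = \sqrt{2}\,dB_t.$$
    Let $\rho(t,x)$ denote the probability density function of $X_t$. Then $\rho$ satisfies
    the heat equation
    $$\frac{\partial\rho_t}{\partial t} = \nabla\cdot(\nabla\rho_t) = \Delta\rho_t,$$
    where $\rho_t = \rho(t,x)$, and $\nabla\cdot$, $\nabla$ are the divergence and gradient operators in
    $\mathbb{R}^d$, respectively.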
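    A quick Monte Carlo sketch makes the scaling concrete (it assumes NumPy; the seed, dimension, and sample sizes are arbitrary choices). With $dX_t = \sqrt{2}\,dB_t$ started at 0, the heat kernel $\rho(t,\cdot)$ is the centered Gaussian with covariance $2t\,I$, so each coordinate of $X_t$ should have variance close to $2t$.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n_paths, n_steps, T = 3, 20_000, 100, 1.0
dt = T / n_steps

# Euler-Maruyama for dX_t = sqrt(2) dB_t, starting from X_0 = 0.
X = np.zeros((n_paths, d))
for _ in range(n_steps):
    X += np.sqrt(2.0 * dt) * rng.standard_normal((n_paths, d))

# The heat kernel at time T is N(0, 2T I): compare the empirical variances.
emp_var = X.var(axis=0)
print(emp_var, "vs", 2 * T)
```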

  5. Entropy dissipation
    Consider the negative Boltzmann-Shannon entropy
    $$H(\rho) = \int_{\mathbb{R}^d}\rho(x)\log\rho(x)\,dx.$$
    Along the heat equation, the following dissipation relation holds:
    $$\frac{d}{dt}H(\rho_t) = -\int_{\mathbb{R}^d}\|\nabla_x\log\rho_t\|^2\rho_t\,dx = -I(\rho_t),$$
    where $I(\rho)$ is called the Fisher information functional.
    Behind this relation lies a geometric formulation: $\rho(t,x)$ is the
    gradient flow of the entropy in optimal transport space.
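    As a sanity check of the dissipation relation, here is a minimal sketch in $d = 1$, assuming a Gaussian initial density $N(0, s_0^2)$ with $s_0^2 = 1$. The heat equation keeps $\rho_t = N(0, s_0^2 + 2t)$ Gaussian, so $H$ and $I$ can be computed by quadrature on a grid and compared. The helper names `density`, `entropy`, `fisher` and the grid are illustrative choices.
```python
import numpy as np

# Integration grid; wide enough that the Gaussian tails are negligible.
x = np.linspace(-30.0, 30.0, 200_001)
dx = x[1] - x[0]

def density(t, s0_sq=1.0):
    """Heat-equation solution with Gaussian data: N(0, s0^2 + 2t)."""
    var = s0_sq + 2.0 * t
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def entropy(rho):
    """Negative Boltzmann-Shannon entropy H(rho) = int rho log rho dx."""
    return np.sum(rho * np.log(rho)) * dx

def fisher(rho):
    """Fisher information I(rho) = int |d/dx log rho|^2 rho dx."""
    dlog = np.gradient(np.log(rho), dx)
    return np.sum(dlog**2 * rho) * dx

t, dt = 0.5, 1e-4
dH_dt = (entropy(density(t + dt)) - entropy(density(t - dt))) / (2 * dt)
print(dH_dt, -fisher(density(t)))  # both close to -1/(s0^2 + 2t) = -0.5
```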

  6. Optimal transport
    What is the optimal way to move or transport the mountain with shape
    X and density $\rho_0(x)$ to another shape Y with density $\rho_1(y)$?
    Consider
    $$\mathrm{Dist}_T(\rho_0,\rho_1) = \inf_T\int_{\mathbb{R}^d}\|x - T(x)\|^2\rho_0(x)\,dx,$$
    where the infimum is taken over all transport maps $T$ that transfer $\rho_0(x)$
    to $\rho_1(x)$, i.e.
    $$\rho_0(x) = \rho_1(T(x))\det(\nabla T(x)).$$
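    In one dimension the optimal map is the monotone rearrangement $T = F_1^{-1}\circ F_0$ (a standard fact, not stated on the slide). The sketch below, which uses only the Python standard library, approximates $\mathrm{Dist}_T$ between two Gaussians by matching quantiles; for Gaussians the closed form $(m_0 - m_1)^2 + (s_0 - s_1)^2$ is available for comparison. The function name `w2_squared_1d` and the quantile grid are illustrative.
```python
from statistics import NormalDist

def w2_squared_1d(dist0, dist1, n=10_000):
    """Approximate inf_T int |x - T(x)|^2 rho_0(x) dx in 1D.

    The monotone map T = F1^{-1} o F0 sends the u-quantile of rho_0 to the
    u-quantile of rho_1, so we average the squared quantile gaps.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n  # midpoint quantile levels avoid u = 0 and u = 1
        total += (dist0.inv_cdf(u) - dist1.inv_cdf(u)) ** 2
    return total / n

rho0, rho1 = NormalDist(0.0, 1.0), NormalDist(3.0, 2.0)
approx = w2_squared_1d(rho0, rho1)
exact = (0.0 - 3.0) ** 2 + (1.0 - 2.0) ** 2  # closed form for Gaussians
print(approx, exact)
```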

  7. Overview
    The optimal transport problem was first introduced by Monge in 1781
    and relaxed by Kantorovich in 1940. It defines a distance on the
    space of probability distributions, called the optimal transport distance,
    Wasserstein distance, or Earth Mover’s distance. There are many
    viewpoints on and applications of this distance:
    - Linear programming;
    - Mapping/Monge-Ampère equation;
    - Fluid dynamics;
    - Density manifold (Arnold mechanics).
    See Ambrosio, Villani, Otto, and many others.
    In the first part of this talk, I focus on the transportation
    formulation in classical probability. Then David will present the
    formulation for free probability.
    (Gangbo et al.)

  8. Transport distance formulations
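    There is a relaxed formulation of the classical optimal transport distance:
    $$\inf_\pi\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\|x - y\|^2\,\pi(x,y)\,dx\,dy,$$
    where the infimum is taken over all joint measures (transport plans)
    $\pi(x,y)$ having $\rho_0(x)$ and $\rho_1(y)$ as marginals, i.e.
    $$\int_{\mathbb{R}^d}\pi(x,y)\,dy = \rho_0(x),\qquad\int_{\mathbb{R}^d}\pi(x,y)\,dx = \rho_1(y),\qquad\pi(x,y)\ge 0.$$
    Here the optimal plan is given by
    $$\pi(x,y) = (x, T(x))_\#\rho_0,\qquad T(x) = x + \nabla\Phi(0,x).$$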
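    For two empirical measures with the same number of equally weighted atoms, the Kantorovich problem is a linear program whose optimum is attained at a permutation matrix (Birkhoff-von Neumann), so a brute-force search over matchings is a valid, if exponential, solver for tiny examples. The sketch below (plain Python; the point sets are made up for illustration) compares it with the monotone (sorted) matching, which is optimal for the quadratic cost in 1D.
```python
from itertools import permutations

xs = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]   # atoms of rho_0, each with mass 1/6
ys = [1.1, 4.0, -0.5, 2.2, 0.0, 3.3]    # atoms of rho_1, each with mass 1/6
n = len(xs)

def cost(matching):
    """Average squared distance of the plan x_i -> y_{matching[i]}."""
    return sum((xs[i] - ys[j]) ** 2 for i, j in enumerate(matching)) / n

# Brute force over all n! permutation plans (vertices of the Birkhoff polytope).
best = min(permutations(range(n)), key=cost)

# Monotone rearrangement: the k-th smallest x is sent to the k-th smallest y.
order_x = sorted(range(n), key=lambda i: xs[i])
order_y = sorted(range(n), key=lambda j: ys[j])
monotone = [0] * n
for i, j in zip(order_x, order_y):
    monotone[i] = j

print(cost(best), cost(monotone))  # equal: sorting is optimal in 1D
```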

  9. Dynamical optimal transport

  10. Optimal transport space (Density manifold)
    The optimal transport distance has a variational formulation (Benamou-Brenier
    2000):
    $$\inf_v\int_0^1\mathbb{E}_{X_t\sim\rho_t}\|v(t,X_t)\|^2\,dt,$$
    where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector
    fields $v_t$ such that
    $$\dot X_t = v(t,X_t),\qquad X_0\sim\rho_0,\qquad X_1\sim\rho_1.$$
    Under this metric, the set of probability densities has a Riemannian geometric
    structure.¹
    ¹ John D. Lafferty, The density manifold and configuration space quantization, 1988.
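    To see why straight, constant-speed paths are optimal in the Benamou-Brenier problem, consider paths of the form $X_t = X_0 + s(t)\,(T(X_0) - X_0)$ with $s(0) = 0$, $s(1) = 1$. The action then equals $\int_0^1 s'(t)^2\,dt\cdot\mathbb{E}\|T(X_0) - X_0\|^2$, which by Cauchy-Schwarz is minimized by $s(t) = t$. The sketch below illustrates only this one reduction (it is not a general solver) and evaluates the time factor numerically for two choices of $s$.
```python
import numpy as np

def action_factor(s_prime, n=100_000):
    """Midpoint-rule approximation of int_0^1 s'(t)^2 dt."""
    t = (np.arange(n) + 0.5) / n
    return np.mean(s_prime(t) ** 2)

linear = action_factor(lambda t: np.ones_like(t))   # s(t) = t
quadratic = action_factor(lambda t: 2.0 * t)        # s(t) = t^2
print(linear, quadratic)  # 1.0 and about 4/3: the constant-speed path is cheaper
```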

  11. Riemannian metric for optimal transport
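    Informally speaking, the Wasserstein metric refers to the following bilinear
    form:
    $$\langle\dot\rho_1, G(\rho)\dot\rho_2\rangle = \int\big(\dot\rho_1, (-\Delta_\rho)^{-1}\dot\rho_2\big)\,dx,\qquad\Delta_\rho := \nabla\cdot(\rho\nabla).$$
    In other words, writing $\dot\rho_i = -\nabla\cdot(\rho\nabla\Phi_i)$, $i = 1, 2$, we have
    $$\langle\Phi_1, G(\rho)^{-1}\Phi_2\rangle = \langle\Phi_1, -\nabla\cdot(\rho\nabla)\Phi_2\rangle,$$
    where $\rho\in\mathcal{P}(\Omega)$, $\dot\rho_i$ is a tangent vector in $\mathcal{P}(\Omega)$ with
    $$\int\dot\rho_i\,dx = 0,$$
    and $\Phi_i\in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $\rho$. Here $\nabla\cdot$,
    $\nabla$ are the standard divergence and gradient operators in $\Omega$.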
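    A finite-difference sketch shows the two facts used above, assuming a uniform 1D grid and edge weights obtained by averaging $\rho$ (both illustrative discretization choices, not from the talk). First, $\dot\rho = -\nabla\cdot(\rho\nabla\Phi)$ has zero total mass. Second, summation by parts turns $\langle\Phi_1, -\nabla\cdot(\rho\nabla)\Phi_2\rangle$ into $\int\rho\,\nabla\Phi_1\cdot\nabla\Phi_2\,dx$, the kinetic-energy form of the metric.
```python
import numpy as np

n = 200
x = np.linspace(-4.0, 4.0, n)
h = x[1] - x[0]
rho = np.exp(-x**2 / 2)
rho /= rho.sum() * h                      # discrete probability density

# Forward-difference gradient on edges: (D phi)_e = (phi_{i+1} - phi_i) / h.
D = (np.eye(n, k=1) - np.eye(n))[:-1] / h
rho_edge = 0.5 * (rho[:-1] + rho[1:])     # density interpolated to edges

def rho_dot(phi):
    """Tangent vector -div(rho grad phi); the discrete divergence is -D^T."""
    return D.T @ (rho_edge * (D @ phi))

phi1, phi2 = np.sin(x), x**2
lhs = phi1 @ rho_dot(phi2) * h                          # <Phi_1, -div(rho grad) Phi_2>
rhs = np.sum(rho_edge * (D @ phi1) * (D @ phi2)) * h    # int rho grad Phi_1 . grad Phi_2
print(lhs, rhs, rho_dot(phi2).sum() * h)  # lhs and rhs agree; total mass change is ~0
```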

  12. Optimal transport gradient flows
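    The Wasserstein gradient flow of an energy functional $F(\rho)$ leads to
    $$\partial_t\rho = -G(\rho)^{-1}\frac{\delta}{\delta\rho}F(\rho) = \nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho}F(\rho)\Big).$$
    Example
    If $F(\rho) = \int F(x)\rho(x)\,dx$, then the gradient flow follows
    $$\partial_t\rho = \nabla\cdot(\rho\nabla F(x)).$$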
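    At the particle level, the example $\partial_t\rho = \nabla\cdot(\rho\nabla F)$ is the continuity equation for particles moving by $\dot X_t = -\nabla F(X_t)$. The sketch below assumes the quadratic potential $F(x) = x^2/2$ (our choice, for which $X_t = e^{-t}X_0$ exactly) and a Gaussian initial law. It compares the variance of explicitly integrated particles with the closed form $e^{-2t}\sigma_0^2$; the helper `grad_F` is hypothetical.
```python
import numpy as np

rng = np.random.default_rng(1)
sigma0, T, n_steps = 1.5, 1.0, 1000
dt = T / n_steps

def grad_F(x):
    """Gradient of the illustrative potential F(x) = x^2 / 2."""
    return x

# Particles sampled from rho_0 = N(0, sigma0^2) follow dX/dt = -grad F(X).
X = sigma0 * rng.standard_normal(50_000)
var0 = X.var()
for _ in range(n_steps):
    X = X - dt * grad_F(X)                 # explicit Euler step

print(X.var() / var0, np.exp(-2 * T))  # agree up to the Euler discretization error
```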

  13. Entropy dissipation revisited
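    The gradient flow of the negative entropy
    $$H(\rho) = \int_{\mathbb{R}^d}\rho(x)\log\rho(x)\,dx$$
    with respect to the optimal transport metric satisfies
    $$\frac{\partial\rho}{\partial t} = \nabla\cdot(\rho\nabla\log\rho) = \Delta\rho.$$
    The key identity here is
    $$\rho\nabla\log\rho = \nabla\rho.$$
    In this way, one can study the entropy dissipation by
    $$\frac{d}{dt}H(\rho) = \int_{\mathbb{R}^d}\log\rho\,\nabla\cdot(\rho\nabla\log\rho)\,dx = -\int_{\mathbb{R}^d}\|\nabla\log\rho\|^2\rho\,dx.$$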

  14. Optimal transport Hamiltonian flows
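    Consider the Lagrangian
    $$L(\rho,\partial_t\rho) = \frac12\int\Big(\partial_t\rho,\big(-\nabla\cdot(\rho\nabla)\big)^{-1}\partial_t\rho\Big)\,dx - F(\rho).$$
    The Hamiltonian flow satisfies the Euler-Lagrange equation
    $$\frac{d}{dt}\frac{\delta}{\delta(\partial_t\rho)}L(\rho,\partial_t\rho) = \frac{\delta}{\delta\rho}L(\rho,\partial_t\rho).$$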

  15. Optimal transport Hamiltonian flows
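    The Hamiltonian is obtained by the Legendre transform, i.e.
    $$\mathcal{H}(\rho,\Phi) = \sup_{\partial_t\rho}\int\partial_t\rho\,\Phi\,dx - L(\rho,\partial_t\rho),$$
    and the Hamiltonian system follows
    $$\partial_t\rho = \frac{\delta}{\delta\Phi}\mathcal{H}(\rho,\Phi),\qquad\partial_t\Phi = -\frac{\delta}{\delta\rho}\mathcal{H}(\rho,\Phi),$$
    where $\frac{\delta}{\delta\rho}$, $\frac{\delta}{\delta\Phi}$ are the $L^2$ first variation operators w.r.t. $\rho$, $\Phi$, respectively, and
    the density Hamiltonian is
    $$\mathcal{H}(\rho,\Phi) = \frac12\int\|\nabla\Phi\|^2\rho\,dx + F(\rho).$$
    Here $\rho$ is the “density” state variable and $\Phi$ is the “density” momentum
    variable.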

  16. Hamiltonian flows: Compressible Euler equation
    More explicitly,
    $$\begin{cases}\partial_t\rho + \nabla\cdot(\rho\nabla\Phi) = 0,\\ \partial_t\Phi + \frac12\|\nabla\Phi\|^2 = -\frac{\delta}{\delta\rho}F(\rho).\end{cases}$$

  17. Why optimal transport formalisms?
    - Generalized log-Sobolev inequalities and bounds;
    - Generalized dynamics, e.g., the Schrödinger equation, the Schrödinger
    bridge problem, and mean field games;
    - Generalized dualities and distances in information theory and AI.

  18. Log density coordinates
    As we will see in the second talk, it is more natural in the random matrix
    and free settings to study the log-density rather than the density. Thus,
    let us describe what happens in the classical case when we write
    everything in terms of the log-density, which introduces an alternative
    coordinate system for the classical manifold of densities:
    $$(\rho,\Phi)\mapsto(e^{-V},\Phi),$$
    where $\int e^{-V}dx = 1$. All formalisms of optimal transport need to be
    adjusted accordingly.

  19. Log density change of variable
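    Consider
    $$\partial_t\rho(t,x) = -\nabla\cdot\big(\rho(t,x)\nabla\Phi(t,x)\big).$$
    Denote
    $$\rho(t,x) = e^{-V(t,x)}.$$
    Then
    $$\partial_tV = -\partial_t\log\rho = -\frac{\partial_t\rho}{\rho} = \frac{\nabla\cdot(\rho\nabla\Phi)}{\rho} = (\nabla\log\rho,\nabla\Phi) + \Delta\Phi = -(\nabla V,\nabla\Phi) + \Delta\Phi =: L_V\Phi.$$
    Here
    $$L_V = -(\nabla V,\nabla) + \Delta.$$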
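    The change of variables can be checked on an explicit solution. This minimal sketch assumes the 1D example $\Phi(t,x) = x^2/(2(1+t))$ and $\rho(t,\cdot) = N(0, \sigma_0^2(1+t)^2)$ (our choice: particles move on straight lines $X_t = (1+t)X_0$, so $\partial_t\rho = -\nabla\cdot(\rho\nabla\Phi)$ holds). With $V = -\log\rho$, finite differences of $\partial_tV$ should match $L_V\Phi = -V'\Phi' + \Phi''$. The helpers `d` and `dd` are illustrative difference operators.
```python
import numpy as np

sigma0 = 0.7

def V(t, x):
    """Minus log-density of N(0, s^2) with s = sigma0 (1 + t)."""
    s = sigma0 * (1.0 + t)
    return x**2 / (2 * s**2) + np.log(s * np.sqrt(2 * np.pi))

def Phi(t, x):
    """Velocity potential with grad Phi = x / (1 + t)."""
    return x**2 / (2 * (1.0 + t))

def d(f, y, e=1e-4):
    """Central first difference."""
    return (f(y + e) - f(y - e)) / (2 * e)

def dd(f, y, e=1e-4):
    """Central second difference."""
    return (f(y + e) - 2 * f(y) + f(y - e)) / e**2

t, x = 0.3, np.linspace(-2.0, 2.0, 9)
dV_dt = d(lambda s: V(s, x), t)
L_V_Phi = (-d(lambda y: V(t, y), x) * d(lambda y: Phi(t, y), x)
           + dd(lambda y: Phi(t, y), x))
print(np.max(np.abs(dV_dt - L_V_Phi)))  # small: finite-difference error only
```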

  20. Log density gradient flows
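    Similarly, we can formulate the gradient flow in terms of log-densities.
    Consider
    $$\partial_t\rho = \nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho}F(\rho)\Big).$$
    Denote $\rho = e^{-V}$. Then
    $$\partial_tV = -L_V\frac{\delta}{\delta\rho}F(\rho)\Big|_{\rho=e^{-V}}.$$
    Example
    If $F(\rho) = \int F(x)\rho(x)\,dx = \int F(x)e^{-V(x)}\,dx$, then the gradient flow
    follows
    $$\partial_tV = -L_VF(x).$$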

  21. Log density entropy dissipation
    The gradient flow of the negative entropy
    $$H(\rho) = \int\rho\log\rho\,dx = -\int Ve^{-V}\,dx$$
    with respect to the optimal transport metric satisfies
    $$\partial_tV = L_VV = -\|\nabla V\|^2 + \Delta V.$$
    In these log-density coordinates, one can study the entropy dissipation by
    $$\frac{d}{dt}H(\rho) = -\int_{\mathbb{R}^d}\|\nabla V\|^2e^{-V}\,dx.$$

  22. Log density Hamiltonian flows
    Similarly, we can formulate the Hamiltonian flow in terms of log-densities:
    $$\begin{cases}\partial_tV - L_V\Phi = 0,\\ \partial_t\Phi + \frac12\|\nabla\Phi\|^2 = -\frac{\delta}{\delta\rho}F(\rho)\big|_{\rho=e^{-V}}.\end{cases}$$

  23. The free Wasserstein manifold
    David Jekel, Wuchen Li, Dima Shlyakhtenko
    February 22, 2021
    David Jekel, Wuchen Li, Dima Shlyakhtenko The free Wasserstein manifold February 22, 2021 1 / 44


  24. This work was supported in part by the National Science Foundation.
    The talk will focus on the big picture, and thus precise definitions will only
    be given when helpful. Before the rigorous statements, there will be several
    slides of motivation and introduction aimed at people who are familiar
    with free probability. Feel free to interrupt with questions about notation.

  25. Motivation
    The classical Wasserstein manifold P(M), whose points are smooth
    positive probability densities, is an infinite-dimensional Riemannian
    framework which nicely describes things such as entropy, the heat
    equation, log-Sobolev inequalities, optimal transport, measure-preserving
    transformations, etc.
    Because many of these notions have analogs in free probability and
    random matrix theory, we want to define the tracial non-commutative
    version of the Wasserstein manifold, in which we use non-commutative
    laws instead of measures.
    A big obstacle is that we don’t have a direct analog of density in the
    non-commutative setting. However, there are several indications that the
    log-density is a better-behaved notion.

  26. Motivation — random matrices
    Given some self-adjoint non-commutative polynomial $f$ (with nice enough
    behavior at $\infty$), we can define a function $V^{(N)}: M_N(\mathbb{C})^d_{sa}\to\mathbb{R}$ by
    $V^{(N)}(x) = \mathrm{tr}_N(f(x))$, where $x = (x_1,\dots,x_d)$ is a $d$-tuple of self-adjoint
    $N\times N$ matrices and $\mathrm{tr}_N = (1/N)\mathrm{Tr}$ is the normalized trace on $M_N(\mathbb{C})$.
    Then we define a probability measure $\mu^{(N)}$ on $M_N(\mathbb{C})^d_{sa}$ by
    $$d\mu^{(N)}(x) = \mathrm{constant}(V,N)\,e^{-N^2V^{(N)}(x)}\,dx,$$
    where $dx$ is Lebesgue measure.
    Letting $X^{(N)}$ be a random matrix tuple chosen according to the measure
    $\mu^{(N)}$, a lot of past work has shown in certain cases that for every
    non-commutative polynomial $p$, $\mathrm{tr}_N(p(X^{(N)}))$ converges almost surely to
    some deterministic limit, which is described by $\tau(p(X))$ for some $d$-tuple
    $X$ from a tracial von Neumann algebra $(A,\tau)$. Then we might want to say
    that “$\mathrm{tr}(f)$ is a log-density of the distribution of $X$.”
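    The simplest case is $d = 1$ and $f(x) = x^2/2$ (our choice). Then $\mu^{(N)}$ has density proportional to $e^{-N^2\mathrm{tr}_N(x^2)/2} = e^{-N\mathrm{Tr}(x^2)/2}$, i.e. it is the Gaussian unitary ensemble, and the limit $X$ is a standard semicircular variable with moments $\tau(X^2) = 1$, $\tau(X^4) = 2$ (Catalan numbers). The NumPy sketch below uses an arbitrary matrix size and seed; `sample_gue` is a hypothetical helper name.
```python
import numpy as np

def sample_gue(N, rng):
    """Sample from the density proportional to exp(-N Tr(x^2) / 2) on Hermitian N x N.

    Diagonal entries get variance 1/N and off-diagonal entries E|x_ij|^2 = 1/N.
    """
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / (2.0 * np.sqrt(N))

rng = np.random.default_rng(2)
N = 400
X = sample_gue(N, rng)
eig = np.linalg.eigvalsh(X)
m2, m4 = np.mean(eig**2), np.mean(eig**4)   # tr_N(X^2), tr_N(X^4)
print(m2, m4)  # close to the semicircle moments 1 and 2
```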

  27. Motivation — free score function
    This idea is closely related to Voiculescu’s notion of a free score function
    (a.k.a. conjugate variable).
    The classical score function of a measure with density $\rho$ on $\mathbb{R}^d$ is
    $-\nabla\log\rho$. If $X$ is a random variable with density $\rho$ and if
    $\xi = -(\nabla\log\rho)(X)$, then we have the integration-by-parts relation
    $$\mathbb{E}[\langle\xi, f(X)\rangle] = \mathbb{E}[\mathrm{Tr}(Df(X))]$$
    for all $f\in C_c^\infty(\mathbb{R}^d,\mathbb{R}^d)$, where $Df$ is the Jacobian matrix.
    Given a tracial von Neumann algebra $(A,\tau)$ generated by $X\in A^d_{sa}$, we say
    that $\xi\in A^d_{sa}$ is a free score function for $X$ if
    $$\langle\xi, p(X)\rangle_\tau = \tau\otimes\tau\otimes\mathrm{Tr}(\mathcal{J}p)$$
    for every non-commutative polynomial $p$, where $\mathcal{J}p$ is the matrix of
    derivatives of $p$ in the sense of Voiculescu’s free difference quotient.
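    For a single standard semicircular variable $X$ (taking $d = 1$; our example), the free score function is $\xi = X$ itself. With $p(x) = x^n$, the free difference quotient is $\sum_{k=0}^{n-1}x^k\otimes x^{n-1-k}$, so the defining relation reads $\tau(X^{n+1}) = \sum_{k=0}^{n-1}\tau(X^k)\tau(X^{n-1-k})$, which is the Catalan recursion. The sketch below computes the semicircle moments by quadrature of the density $\sqrt{4-x^2}/(2\pi)$ (after substituting $x = 2\cos\theta$) and checks this relation numerically.
```python
import numpy as np

def semicircle_moment(n, M=4096):
    """tau(X^n) = (1/2pi) int_{-2}^{2} x^n sqrt(4 - x^2) dx, with x = 2 cos(theta).

    This equals (2/pi) int_0^pi (2 cos th)^n sin(th)^2 dth; the midpoint rule
    is exact for trigonometric polynomials of degree < 2M.
    """
    th = (np.arange(M) + 0.5) * np.pi / M
    return 2.0 * np.mean((2.0 * np.cos(th)) ** n * np.sin(th) ** 2)

m = [semicircle_moment(n) for n in range(12)]
for n in range(1, 11):
    lhs = m[n + 1]                                     # <xi, p(X)> with xi = X, p = x^n
    rhs = sum(m[k] * m[n - 1 - k] for k in range(n))   # (tau x tau) of the difference quotient
    assert abs(lhs - rhs) < 1e-9
print([round(v, 6) for v in m[::2]])  # Catalan numbers 1, 1, 2, 5, 14, 42
```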

  28. Motivation — free score function
    In the setting of the random matrix $d$-tuples $X^{(N)}$ above, given by
    $V^{(N)}(x) = \mathrm{tr}_N(f(x))$, the free score function becomes very concrete. Since
    $\rho = \mathrm{const}\cdot e^{-N^2V^{(N)}}$, the classical score function is (up to normalization)
    $\nabla V^{(N)}(x)$, which after some matrix computations works out to $\mathcal{D}f(x)$,
    where $\mathcal{D}f$ is Voiculescu’s cyclic gradient.
    Classical integration by parts plus some matrix computations tell us that
    $$\mathbb{E}\big[\langle\nabla V^{(N)}(X^{(N)}), p(X^{(N)})\rangle_{\mathrm{tr}_N}\big] = \mathbb{E}\big[\mathrm{tr}_N\otimes\mathrm{tr}_N\otimes\mathrm{Tr}_d[\mathcal{J}p(X^{(N)})]\big].$$
    Thus, if $X$ is the $d$-tuple of self-adjoint operators from $(A,\tau)$ describing
    the large-$N$ limit, then $\xi = \mathcal{D}f(X)$ is the free score function for $X$.
    The existence of a free score function means that “the gradient of the log-density
    makes sense as an element of $L^2$.” This motivates us to make the
    log-density the central object of study.
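    The identification $\nabla V^{(N)} = \mathcal{D}f$ can be checked by finite differences. This sketch assumes $d = 2$ and the illustrative polynomial $f(x_1,x_2) = x_1x_2x_1x_2$, whose cyclic derivative in $x_1$ is $\mathcal{D}_1f = 2\,x_2x_1x_2$. The directional derivative of $\mathrm{tr}_N(f(X))$ in $x_1$ along a self-adjoint $Y$ should then equal $\langle\mathcal{D}_1f(X), Y\rangle_{\mathrm{tr}_N} = \mathrm{tr}_N(\mathcal{D}_1f(X)\,Y)$. The helper names are hypothetical.
```python
import numpy as np

def random_hermitian(N, rng):
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / 2

def tr_N(A):
    """Normalized trace (1/N) Tr (real part; the traces used here are real)."""
    return np.trace(A).real / A.shape[0]

def V(X1, X2):
    """V^(N)(x) = tr_N(f(x)) for f = x1 x2 x1 x2."""
    return tr_N(X1 @ X2 @ X1 @ X2)

def cyclic_grad_1(X1, X2):
    """Cyclic derivative of x1 x2 x1 x2 with respect to x1: 2 x2 x1 x2."""
    return 2 * X2 @ X1 @ X2

rng = np.random.default_rng(3)
N, eps = 6, 1e-5
X1, X2, Y = (random_hermitian(N, rng) for _ in range(3))

directional = (V(X1 + eps * Y, X2) - V(X1 - eps * Y, X2)) / (2 * eps)
predicted = tr_N(cyclic_grad_1(X1, X2) @ Y)
print(directional, predicted)  # agree up to round-off (V is quadratic in X1)
```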

  29. Overview
    We’re going to define the free Wasserstein manifold $W(\mathbb{R}^{*d})$ as the space
    of certain “log-density” functions $V$, which generalize things
    like $\mathrm{tr}(f)$.
    More precisely, $V$ will come from a space of tracial non-commutative
    smooth functions that are defined in terms of trace polynomials. (Similar
    spaces were defined in Dabrowski, Guionnet, and Shlyakhtenko’s 2016
    preprint on free transport, but we take a different approach to the norms.)
    The tangent space of $W(\mathbb{R}^{*d})$ at $V$ is similarly a space of tracial
    non-commutative smooth functions $W$, which are viewed as perturbations
    of $V$.

  30. Overview
    In order to obtain a Riemannian metric for the Wasserstein manifold, we
    must associate a non-commutative law $\mu_V$ to each $V$. This is much
    trickier than in the classical case (where we would just set
    $d\mu_V(x) = \mathrm{constant}\cdot e^{-V(x)}\,dx$), but there are two known methods for
    doing this:
    (1) For each $V$, find a (hopefully unique) law $\mu$ that maximizes
    $\chi(\mu) - \mu(V)$, where $\chi$ is Voiculescu’s free microstates entropy. This
    approach is inspired by Voiculescu and is closely related to the random
    matrix models discussed before.
    (2) Set up the free stochastic differential equation
    $dX_t = dS_t - (1/2)\nabla V(X_t)\,dt$, where $S_t$ is a free Brownian motion (still a
    self-adjoint $d$-tuple), and (hopefully) recover $\mu_V$ as the limiting
    distribution of $X_t$ as $t\to\infty$. This approach was pioneered by Biane and
    Speicher (1999) and further developed by Guionnet, Shlyakhtenko, and
    Dabrowski.
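    Approach (2) can be imitated with $N\times N$ matrices, since free Brownian motion is the large-$N$ limit of Hermitian Brownian motion normalized so that $\mathbb{E}\,\mathrm{tr}_N(dH_t^2) = dt$. The sketch below assumes $d = 1$ and $V(x) = x^2/2$, so the SDE is the free Ornstein-Uhlenbeck process and $\mu_V$ should be the standard semicircle law. It runs an Euler-Maruyama scheme; the matrix size, step size, horizon, and helper names are arbitrary illustrative choices.
```python
import numpy as np

def hermitian_increment(N, dt, rng):
    """Hermitian Gaussian matrix with E tr_N(H^2) = dt (entries E|h_ij|^2 = dt/N)."""
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / 2 * np.sqrt(dt / N)

def grad_V(X):
    """Gradient of the illustrative potential V(x) = tr(x^2)/2, i.e. X itself."""
    return X

rng = np.random.default_rng(4)
N, dt, T = 100, 0.02, 8.0
X = np.zeros((N, N), dtype=complex)
for _ in range(int(T / dt)):
    # Euler-Maruyama step for dX_t = dS_t - (1/2) grad V(X_t) dt.
    X = X + hermitian_increment(N, dt, rng) - 0.5 * grad_V(X) * dt

eig = np.linalg.eigvalsh(X)
print(np.mean(eig**2), np.mean(eig**4))  # close to the semicircle moments 1 and 2
```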

  31. Outline
    Background on non-commutative laws.
    Tracial non-commutative smooth functions.
    Free Wasserstein manifold and diffeomorphism group.
    Riemannian metric.
    Strategy to construct smooth transport.
    Inversion of the Laplacian $L_V$ through heat semigroup and SDE.
    Free Gibbs laws through maximization of $\chi(\mu) - \mu(V)$.
    Geodesics and optimal transport.

  32. Operator algebras and laws
    Definition
    A unital C*-algebra is a subalgebra of $B(H)$ (for some Hilbert space $H$)
    that contains the identity and is closed under adjoints and limits in operator norm.
    Definition
    A tracial C*-algebra is a pair $(A,\tau)$ where $A$ is a C*-algebra and $\tau$ is a
    faithful trace, that is,
    $\tau(1) = 1$,
    $\tau(a^*a)\ge 0$ with equality if and only if $a = 0$,
    $\tau(ab) = \tau(ba)$.
    Remark
    We don’t need to go into the definition of von Neumann algebras now.
    But every tracial C*-algebra can be completed to a tracial von Neumann
    algebra.

  33. Operator algebras and laws
    Definition
    Let $\mathbb{C}\langle x_1,\dots,x_d\rangle$ be the algebra of non-commutative polynomials
    equipped with the $*$-operation such that $x_j^* = x_j$. (We typically use $x$ to
    denote formal or generic variables and $X$ to denote a specific tuple of
    operators.)
    Definition
    Let $\Sigma_{d,R}$ be the set of linear functionals $\lambda:\mathbb{C}\langle x_1,\dots,x_d\rangle\to\mathbb{C}$ such that
    $\lambda(1) = 1$,
    $\lambda(p^*p)\ge 0$,
    $\lambda(pq) = \lambda(qp)$,
    $|\lambda(x_{i_1}\cdots x_{i_k})|\le R^k$,
    equipped with the weak-* topology (as a subset of the dual of
    $\mathbb{C}\langle x_1,\dots,x_d\rangle$). We call elements of $\Sigma_{d,R}$ non-commutative laws with
    exponential bound $R$.

  34. Operator algebras and laws
    Proposition
    If $(A,\tau)$ is a tracial C*-algebra and $X = (X_1,\dots,X_d)\in A^d_{sa}$, then the map
    $$\lambda_X:\mathbb{C}\langle x_1,\dots,x_d\rangle\to\mathbb{C},\qquad p\mapsto\tau(p(X))$$
    is a non-commutative law with exponential bound $\|X\|_\infty = \max_j\|X_j\|$.
    Conversely, every $\lambda\in\Sigma_{d,R}$ can be realized as $\lambda_X$ for some $(A,\tau)$ and
    $X\in A^d_{sa}$ with $\|X\|_\infty\le R$.
    The proof is a variant of the GNS construction. The proposition can be
    interpreted as follows:
    1. $\Sigma_{d,R}$ is the space of traces on the C*-universal free product
    $C([-R,R])^{*d}$.
    2. $\Sigma_{d,R}$ is in bijection with isomorphism classes of triples $(A,\tau,X)$,
    where $(A,\tau)$ is a tracial C*-algebra and $X\in A^d_{sa}$ generates $A$; here
    isomorphism means a C*-isomorphism that preserves the trace and
    generators.

  35. Trace polynomials
    We next describe non-commutative functions that are modeled on trace
    polynomials in a similar spirit to Dabrowski, Guionnet, and Shlyakhtenko (2016).
    A trace polynomial in $(x_1,\dots,x_d)$ is an expression formed through
    addition, multiplication, and application of a symbol $\mathrm{tr}$, such as
    $$f(x_1,x_2,x_3) = \mathrm{tr}(x_1^2x_2)\,\mathrm{tr}(x_3)\,x_1 + \mathrm{tr}(x_2x_3) + 5\,\mathrm{tr}(x_1x_2x_3)\,\mathrm{tr}(x_1)\,x_2x_3^2.$$
    These expressions are considered modulo the relations
    $\mathrm{tr}(pq) = \mathrm{tr}(qp)$ and $\mathrm{tr}(\mathrm{tr}(p)q) = \mathrm{tr}(p)\,\mathrm{tr}(q)$.
    For any tracial C*-algebra $(A,\tau)$ and $X\in A^d_{sa}$, we can evaluate a trace
    polynomial $f$ on $X$ by substituting $X_j$ for the formal symbol $x_j$ and $\tau$ for
    the formal symbol $\mathrm{tr}$. Hence, a trace polynomial $f$ gives rise to a function
    $f^{A,\tau}: A^d_{sa}\to A$.
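    Evaluating a trace polynomial on matrices just means replacing $\mathrm{tr}$ by $\mathrm{tr}_N$. The NumPy sketch below evaluates the example $f$ above on random Hermitian matrices (size and seed are arbitrary) and checks the unitary equivariance $f(UXU^*) = Uf(X)U^*$ that every trace polynomial enjoys, since both $\mathrm{tr}_N$ and matrix products are compatible with conjugation. Scalar terms such as $\mathrm{tr}(x_2x_3)$ are interpreted as multiples of the identity.
```python
import numpy as np

def tr_N(A):
    """Normalized trace on M_N(C); values of tr are complex scalars."""
    return np.trace(A) / A.shape[0]

def f(X1, X2, X3):
    """tr(x1^2 x2) tr(x3) x1 + tr(x2 x3) + 5 tr(x1 x2 x3) tr(x1) x2 x3^2."""
    I = np.eye(X1.shape[0])
    return (tr_N(X1 @ X1 @ X2) * tr_N(X3) * X1
            + tr_N(X2 @ X3) * I
            + 5 * tr_N(X1 @ X2 @ X3) * tr_N(X1) * X2 @ X3 @ X3)

def random_hermitian(N, rng):
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / 2

rng = np.random.default_rng(5)
N = 5
X = [random_hermitian(N, rng) for _ in range(3)]
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

lhs = f(*(U @ Xj @ U.conj().T for Xj in X))   # f(U X U*)
rhs = U @ f(*X) @ U.conj().T                  # U f(X) U*
print(np.max(np.abs(lhs - rhs)))              # round-off level: equivariance holds
```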

  37. Trace polynomials
    Trace polynomials have several advantages over non-commutative
    polynomials.
    1. It follows from the work of Procesi (1976) that every function
    $M_N(\mathbb{C})^d_{sa}\to M_N(\mathbb{C})$ that is entrywise polynomial and is equivariant
    under unitary conjugation (that is, $f(UXU^*) = Uf(X)U^*$) must be given by a trace polynomial.
    2. For each trace polynomial $f$, we can compute the Laplacian of
    $f^{M_N(\mathbb{C}),\mathrm{tr}_N}$ as a function on $M_N(\mathbb{C})^d_{sa}$ (equipped with the inner
    product from $\mathrm{tr}_N$). The Laplacian $(1/N^2)\Delta f^{M_N(\mathbb{C}),\mathrm{tr}_N}$ is a trace
    polynomial and it converges coefficientwise as $N\to\infty$ to some trace
    polynomial $Lf$.
    We’ll define the non-commutative space $C^k(\mathbb{R}^{*d})$ roughly as functions
    such that the first $k$ derivatives can be approximated on
    operator-norm balls by trace polynomials.
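    Point 2 can be checked on the simplest nontrivial case. For $d = 1$ and $g(X) = \mathrm{tr}_N(X^4)$, a direct computation (ours, not from the slides) gives $(1/N^2)\Delta g = 8\,\mathrm{tr}_N(X^2) + 4\,\mathrm{tr}_N(X)^2$ for every $N$, which is already the trace polynomial $8\,\mathrm{tr}(x^2) + 4\,\mathrm{tr}(x)^2$. The sketch below computes $\Delta g$ by second differences along an orthonormal basis of $M_N(\mathbb{C})_{sa}$ for the inner product $\langle A,B\rangle = \mathrm{tr}_N(AB)$; the helper names are illustrative.
```python
import numpy as np

def tr_N(A):
    return np.trace(A).real / A.shape[0]

def orthonormal_basis(N):
    """Basis of Hermitian N x N matrices, orthonormal for <A, B> = tr_N(A B)."""
    basis = []
    for i in range(N):
        E = np.zeros((N, N), dtype=complex)
        E[i, i] = 1.0
        basis.append(np.sqrt(N) * E)
        for j in range(i + 1, N):
            S = np.zeros((N, N), dtype=complex)
            S[i, j] = S[j, i] = 1.0
            A = np.zeros((N, N), dtype=complex)
            A[i, j], A[j, i] = 1j, -1j
            basis.append(np.sqrt(N / 2) * S)
            basis.append(np.sqrt(N / 2) * A)
    return basis

def g(X):
    return tr_N(np.linalg.matrix_power(X, 4))

def laplacian(func, X, h=1e-3):
    """Sum of central second differences along the orthonormal basis."""
    return sum((func(X + h * B) - 2 * func(X) + func(X - h * B)) / h**2
               for B in orthonormal_basis(X.shape[0]))

rng = np.random.default_rng(6)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = (A + A.conj().T) / 2

numeric = laplacian(g, X) / N**2
predicted = 8 * tr_N(X @ X) + 4 * tr_N(X) ** 2
print(numeric, predicted)
```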

  38. Description of trace $C^k$ functions
    The space $C^k_{tr}(\mathbb{R}^{*d})$ is described as follows:
    Each $f\in C^k_{tr}(\mathbb{R}^{*d})$ is a collection of functions $f^{A,\tau}: A^d_{sa}\to A$ for
    tracial C*-algebras $(A,\tau)$.
    $f^{A,\tau}$ must be a $C^k$ function in the sense of Fréchet differentiation.
    The derivative $\partial^kf^{A,\tau}(X)$ is a multilinear map $A^d_{sa}\times\cdots\times A^d_{sa}\to A$.
    Inspired by the non-commutative Hölder inequality, we define the
    norm $\|\partial^jf^{A,\tau}(X)\|_{M^j}$ as the smallest constant such that
    $$\|\partial^jf^{A,\tau}(X)[Y_1,\dots,Y_j]\|_p\le\|\partial^jf^{A,\tau}(X)\|_{M^j}\,\|Y_1\|_{p_1}\cdots\|Y_j\|_{p_j}$$
    whenever $1/p = 1/p_1 + \cdots + 1/p_j$, and where $j = 0,\dots,k$.
    Then $\|\partial^jf\|_{M^j,R}$ is the supremum of $\|\partial^jf^{A,\tau}(X)\|_{M^j}$ over $(A,\tau)$ and
    $X\in A^d_{sa}$ with $\|X\|_\infty\le R$.
    For $R > 0$ and $j\le k$, we assume that $\|\partial^jf\|_{M^j,R}$ is finite and that
    $\partial^jf$ can be approximated in this norm by trace polynomials of $X, Y_1, \dots, Y_j$
    that are multilinear in $Y_1,\dots,Y_j$.

  39. Properties of trace Ck functions
    There are also spaces $C_{tr}(\mathbb{R}^{*d}, M^j(\mathbb{R}^{*d_1},\dots,\mathbb{R}^{*d_n}))$ of functions where
    $f^{A,\tau}(X)$ is a multilinear map $A^{d_1}_{sa}\times\cdots\times A^{d_n}_{sa}\to A$.
    The exact definition of the space is less important than the properties:
    These spaces are closed under composition, whenever the composition
    makes sense, and they satisfy the chain rule.
    There is an inverse function theorem: if $f\in C^k_{tr}(\mathbb{R}^{*d})^d_{sa}$ is a self-adjoint
    $d$-tuple and $\partial f - \mathrm{Id}$ is uniformly bounded by a constant $c < 1$,
    then $f^{-1}$ is defined and is in $C^k_{tr}$.
    There is a trace map $\mathrm{tr}: C^k_{tr}(\mathbb{R}^{*d})\to C^k_{tr}(\mathbb{R}^{*d})$ given by
    $\mathrm{tr}(f)^{A,\tau}(X) = \tau(f^{A,\tau}(X))$. The image $\mathrm{tr}(C^k_{tr}(\mathbb{R}^{*d}))$ consists of those
    $f$ which are scalar-valued.
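    The inverse function theorem rests on a contraction argument: writing $f = \mathrm{Id} + g$ with $\|\partial g\|\le c < 1$, the solution of $f(X) = Y$ is the fixed point of $X\mapsto Y - g(X)$. This matrix sketch assumes $d = 1$ and the illustrative choice $g(X) = \frac12\sin(X)$, defined by functional calculus; for $\phi = \sin$ the Fourier-type bound $\int|2\pi s|\,|\hat\phi(s)|\,ds$ equals 1, so $\|\partial g\|\le 1/2$. The helper names are hypothetical.
```python
import numpy as np

def matrix_function(phi, X):
    """Functional calculus phi(X) for a Hermitian matrix X."""
    w, Q = np.linalg.eigh(X)
    return (Q * phi(w)) @ Q.conj().T

def g(X):
    """Illustrative perturbation g(X) = sin(X)/2, so that f = Id + g."""
    return 0.5 * matrix_function(np.sin, X)

def f(X):
    return X + g(X)

def invert(Y, tol=1e-12, max_iter=200):
    """Solve f(X) = Y by the Banach fixed-point iteration X <- Y - g(X)."""
    X = Y.copy()
    for _ in range(max_iter):
        X_new = Y - g(X)
        if np.max(np.abs(X_new - X)) < tol:
            return X_new
        X = X_new
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Y = (A + A.conj().T) / 2
X = invert(Y)
print(np.max(np.abs(f(X) - Y)))  # residual at round-off level
```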

  40. Examples of trace $C^k_{tr}$ functions
    Of course, trace polynomials are $C^\infty_{tr}$ functions.
    If $\phi:\mathbb{R}\to\mathbb{R}$ is such that $\int_{\mathbb{R}}|2\pi s|^k|\hat\phi(s)|\,ds < \infty$, then the function
    $f^{A,\tau}(X) = \phi(X)$ (defined by functional calculus) is in $C^k_{tr}(\mathbb{R}^{*1})$ and
    the $k$th derivative is bounded by $\int_{\mathbb{R}}|2\pi s|^k|\hat\phi(s)|\,ds$.
    Together with the chain rule, this shows that there is an abundance of
    $BC^k_{tr}(\mathbb{R}^{*d})$ functions, that is, functions in $C^k_{tr}(\mathbb{R}^{*d})$ such that
    $$\|\partial^jf\|_{M^j,u} := \sup_{R>0}\|\partial^jf\|_{M^j,R} < \infty.$$
    Imposing certain growth conditions at $\infty$ on a $C_{tr}(\mathbb{R}^{*d})$ function is
    not a big restriction. This makes life easier than it would be if we
    only used trace polynomials.

  41. Differentiation of trace $C^k$ functions
    For scalar-valued $g\in C^k_{tr}(\mathbb{R}^{*d})$, we can define a gradient $\nabla g\in C^k_{tr}(\mathbb{R}^{*d})$.
    In the case where $g = \mathrm{tr}(p)$ for some non-commutative polynomial $p$,
    $\nabla g$ is the cyclic gradient of $p$.
    The analog of $C^k$ functions from $\mathbb{R}^d$ to $M_d(\mathbb{C})$ is the space
    $C^k_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d}))$. This is the space that contains the derivative $\partial f$ when
    $f\in C^{k+1}_{tr}(\mathbb{R}^{*d})^d_{sa}$, as well as the Hessian of $g$ when $g$ is a scalar-valued
    element of $C^{k+1}_{tr}(\mathbb{R}^{*d})$.
    For $F\in C^k_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d}))$ and $X\in A^d_{sa}$, the object $F^{A,\tau}(X)$ is a linear
    transformation $A^d\to A^d$. We define $F\#G$ to be the pointwise
    composition of these linear transformations. It turns out that
    $C^k_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d}))$ is a $*$-algebra with respect to the $\#$-multiplication.

  42. Differentiation of trace $C^k$ functions
    We can define a trace $\mathrm{Tr}_\#: C^k_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{tr}(\mathbb{R}^{*d}))$ by
    $$(\mathrm{Tr}_\#(F))^{A,\tau}(X) = \langle S, F^{A*B,\tau*\sigma}(X)[S]\rangle_{\tau*\sigma},$$
    where $(B,\sigma)$ is the tracial C*-algebra generated by a free semicircular
    $d$-tuple $S$.
    This is the analog of the map $C^k(\mathbb{R}^d, M_d(\mathbb{C}))\to C^k(\mathbb{R}^d)$ defined by
    pointwise application of the trace $\mathrm{Tr}_d$ on $M_d(\mathbb{C})$. This is because the trace
    of a matrix $A$ can be expressed as $\mathbb{E}\langle Y, AY\rangle$ where $Y$ is a standard
    Gaussian random vector in $\mathbb{R}^d$, and the analog of the Gaussian in free
    probability is the semicircular family.
    Another motivating example is that if $F\in C^k_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d}))$ is given by
    $F^{A,\tau}(X)[Y]_i = \sum_jp^{A,\tau}_{i,j}(X)\,Y_j\,q^{A,\tau}_{i,j}(X)$ for some matrix $(p_{i,j}\otimes q_{i,j})_{i,j}$ of
    non-commutative polynomials, then
    $$\mathrm{Tr}_\#(F)^{A,\tau}(X) = \sum_i\tau(p_{i,i}(X))\,\tau(q_{i,i}(X)).$$
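    A random-matrix sketch of the last formula, under stated assumptions: $d = 1$, $F(X)[Y] = P\,Y\,Q$ with illustrative diagonal matrices $P, Q$ standing in for $p(X), q(X)$, and a GUE matrix $S$ (normalized so that $\mathrm{tr}_N(S^2)\approx 1$) standing in for the semicircular variable. For GUE one has exactly $\mathbb{E}\,\mathrm{tr}_N(SPSQ) = \mathrm{tr}_N(P)\,\mathrm{tr}_N(Q)$, so averaging a few samples of $\langle S, F(X)[S]\rangle = \mathrm{tr}_N(SPSQ)$ should reproduce $\tau(p)\tau(q)$.
```python
import numpy as np

def sample_gue(N, rng):
    """Hermitian matrix with E|S_ij|^2 = 1/N, approximating a semicircular variable."""
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (A + A.conj().T) / (2 * np.sqrt(N))

def tr_N(A):
    return np.trace(A).real / A.shape[0]

rng = np.random.default_rng(8)
N = 300
x = np.linspace(-1.0, 1.0, N)
P = np.diag(1.0 + x)         # plays the role of p(X)
Q = np.diag(2.0 + x**2)      # plays the role of q(X)

# <S, F(X)[S]> = tr_N(S P S Q), averaged over independent GUE samples.
samples = [tr_N(S @ P @ S @ Q) for S in (sample_gue(N, rng) for _ in range(10))]
print(np.mean(samples), tr_N(P) * tr_N(Q))  # agree up to random-matrix fluctuations
```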

  43. Differentiation of trace $C^k$ functions
    The trace $\mathrm{Tr}_\#$ allows us to define the divergence operator
    $$\nabla^\dagger: C^{k+1}_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{tr}(\mathbb{R}^{*d}))$$
    as the trace of the Jacobian, as well as the Laplacian
    $$L = \nabla^\dagger\nabla:\mathrm{tr}(C^{k+2}_{tr}(\mathbb{R}^{*d}))\to\mathrm{tr}(C^k_{tr}(\mathbb{R}^{*d})).$$
    These operators are the limits of the corresponding normalized divergence
    and Laplacian for functions on $M_N(\mathbb{C})^d_{sa}$.
    Furthermore, the trace $\mathrm{Tr}_\#$ gives rise to a Fuglede-Kadison
    log-|determinant| map
    $$\log\Delta_\#: GL(C^k_{tr}(\mathbb{R}^{*d}, M(\mathbb{R}^{*d})))\to\mathrm{tr}(C^k_{tr}(\mathbb{R}^{*d})).$$

  44. Free Wasserstein manifold and diffeomorphism group
    We’ll first set up the manifold formally. Afterwards, we’ll describe how to
    extract a non-commutative law $\mu_V$ from $V$ and hence define the
    Riemannian metric.
    Definition
    The free Wasserstein manifold $W(\mathbb{R}^{*d})$ is the set of $V\in\mathrm{tr}(C^\infty_{tr}(\mathbb{R}^{*d}))$
    such that $V$ has “quadratic growth at $\infty$” in the sense that for some
    constants $a, a' > 0$ and $b, b'\in\mathbb{R}$, we have
    $$a\sum_j\tau(X_j^2) + b\le V^{A,\tau}(X)\le a'\sum_j\tau(X_j^2) + b'.$$
    Definition
    The free diffeomorphism group $\mathcal{D}(\mathbb{R}^{*d})$ is the set of $f\in C^\infty_{tr}(\mathbb{R}^{*d})^d_{sa}$ such
    that $f$ has an inverse function $f^{-1}$ in $C^\infty_{tr}(\mathbb{R}^{*d})^d_{sa}$, and $\partial f$, $\partial f^{-1}$ are
    bounded. Note that this is a group under composition.

  45. Tangent vectors
    A tangent vector to $W(\mathbb{R}^{*d})$ at $V$ is an equivalence class of $C^1$ paths
    $(-\epsilon,\epsilon)\to W(\mathbb{R}^{*d}): t\mapsto V_t$ with $V_0 = V$, where two paths are equivalent
    if they have the same $\dot V_0$. For convenience, we assume that $V_t$ satisfies the
    quadratic growth bounds with $a, a', b, b'$ independent of $t$.
    A tangent vector to $\mathcal{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$ is similarly an equivalence class of $C^1$
    paths $t\mapsto f_t$ with $f_0 = \mathrm{id}$, and the equivalence is equality of $\dot f_0$. Again,
    assume that $\partial f_t$ and $\partial f_t^{-1}$ are uniformly bounded.
    Here, by a “$C^1$ path”, we mean a path that is continuously differentiable with respect
    to the Fréchet topology of $C^\infty_{tr}(\mathbb{R}^{*d})$ on the target space (defined by the
    seminorms of each derivative $\partial^jf$ on each ball of radius $R$).

  46. The transport action
    In the classical case, one studies the action of $\mathrm{Diff}(\mathbb{R}^d)$ on $\mathcal{P}(\mathbb{R}^d)$ by
    push-forward, which is viewed as an infinite-dimensional Lie group acting
    on an infinite-dimensional Riemannian manifold. If $\mu$ has density $e^{-V}$ and
    if $f$ is a diffeomorphism, then $f_*\mu$ has density $e^{-(V\circ f^{-1} - \log|\det Df^{-1}|)}$ by
    the classical change of variables formula. This motivates the following
    definition.
    Definition
    We define the transport action $\mathcal{D}(\mathbb{R}^{*d})\curvearrowright W(\mathbb{R}^{*d})$ by
    $$(f, V)\mapsto f_*V := V\circ f^{-1} - \log\Delta_\#(\partial f^{-1}).$$
    One can check that this is a well-defined group action.

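  47. Differential of the transport action
    The key computation behind transport theory is the description of the
    differential of the transport action. We define
    $$\nabla^*_V: C^\infty_{tr}(\mathbb{R}^{*d})^d\to\mathrm{tr}(C^\infty_{tr}(\mathbb{R}^{*d}))$$
    by
    $$\nabla^*_Vf = -\nabla^\dagger f + \partial V\#f = -\mathrm{Tr}_\#(\partial f) + \langle\nabla V, f\rangle_{tr}.$$
    (This is just notation; it is not actually the adjoint.)
    Lemma
    Let $V\in W(\mathbb{R}^{*d})$ and let $t\mapsto f_t$ be a tangent vector to $\mathcal{D}(\mathbb{R}^{*d})$ at $\mathrm{id}$.
    Then
    $$\frac{d}{dt}\Big|_{t=0}(f_t)_*V = -\nabla^*_V\dot f_0.$$
    In other words, $-\nabla^*_V$ is the differential at $\mathrm{id}$ of the orbit map
    $\mathcal{D}(\mathbb{R}^{*d})\to W(\mathbb{R}^{*d}): f\mapsto f_*V$.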
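    The classical version of this computation can be tested numerically. This 1D sketch assumes the illustrative choices $V(x) = x^2/2$ and $h(x) = \sin x + x/2$, with $f_t(x) = x + t\,h(x)$. By the change-of-variables formula, $(f_t)_*V(y) = V(x) + \log f_t'(x)$ where $f_t(x) = y$, and its $t$-derivative at $0$ should equal $h' - V'h$, which is $-\nabla^*_Vh$ for the classical counterpart $\nabla^*_Vh = -h' + V'h$. The helper `pushforward_V` is hypothetical.
```python
import numpy as np

def V(x):
    return 0.5 * x**2

def dV(x):
    return x

def h(x):
    return np.sin(x) + 0.5 * x

def dh(x):
    return np.cos(x) + 0.5

def pushforward_V(t, y, iters=50):
    """(f_t)_* V at y for f_t(x) = x + t h(x): solve f_t(x) = y, then apply
    V o f_t^{-1} - log|det D f_t^{-1}| = V(x) + log f_t'(x)."""
    x = np.array(y, dtype=float)
    for _ in range(iters):                      # Newton's method for x + t h(x) = y
        x = x - (x + t * h(x) - y) / (1.0 + t * dh(x))
    return V(x) + np.log(1.0 + t * dh(x))

y = np.linspace(-3.0, 3.0, 13)
eps = 1e-5
numeric = (pushforward_V(eps, y) - pushforward_V(-eps, y)) / (2 * eps)
predicted = dh(y) - dV(y) * h(y)                # = -(nabla*_V h)(y)
print(np.max(np.abs(numeric - predicted)))      # small: finite-difference error only
```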

  48. $\mathcal{D}(\mathbb{R}^{*d})$ as a Lie group
    We saw that the tangent space of $\mathcal{D}(\mathbb{R}^{*d})$ is (a dense subspace of) the
    space of vector fields $C_{tr}(\mathbb{R}^{*d})^d_{sa}$. Conversely:
    Lemma
    Given a time-dependent vector field $t\mapsto h_t$ (continuous in $t$) such that $\partial h_t$
    is uniformly bounded, there exists a unique path $f_t$ in $\mathcal{D}(\mathbb{R}^{*d})$ such that
    $$f_0 = \mathrm{id},\qquad\dot f_t = h_t\circ f_t.$$
    The proof is similar to classical ODE theory. If $h$ is independent of $t$, then
    we get a one-parameter subgroup of $\mathcal{D}(\mathbb{R}^{*d})$. Combining this with our
    previous observation:
    Lemma
    Let $h\in C_{tr}(\mathbb{R}^{*d})^d_{sa}$ with $\partial h$ bounded, and let $f_t$ be the corresponding
    one-parameter subgroup. Then $(f_t)_*V = V$ for all $t$ if and only if
    $\nabla^*_Vh = 0$.

  49. $D(\mathbb{R}^{*d})$ as a Lie group
    By studying the one-parameter subgroups of $D(\mathbb{R}^{*d})$ as described above,
    we arrive at the following definition of the Lie bracket, completely
    analogous to the Lie bracket of vector fields on $\mathbb{R}^d$.
    Definition
    For two vector fields $h_1, h_2 \in C_{tr}(\mathbb{R}^{*d})^d$, let
    $$[h_1, h_2] = \partial h_1 \# h_2 - \partial h_2 \# h_1.$$
    This generalizes the definition of Lie brackets for non-commutative
    polynomials used in Voiculescu’s paper “Cyclomorphy.”

  50. $D(\mathbb{R}^{*d})$ as a Lie group
    For each $V \in W(\mathbb{R}^{*d})$, its stabilizer $\{f \in D(\mathbb{R}^{*d}) : f_* V = V\}$ is a “Lie
    subgroup,” analogous to a classical group of measure-preserving
    transformations.
    By our previous observations, the corresponding Lie subalgebra should be
    the set of vector fields $h$ with $\nabla_V^* h = 0$. We can verify directly that this is
    indeed a Lie subalgebra:
    Lemma
    $$\nabla_V^*[h_1, h_2] = \partial(\nabla_V^* h_1) \# h_2 - \partial(\nabla_V^* h_2) \# h_1,$$
    and in particular $\ker(\nabla_V^*)$ is
    closed under Lie brackets.
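    As a sanity check of the bracket identity above, here is a one-variable commutative sketch. It assumes that in one commuting variable $\partial h \# k$ reduces to $h' k$ and $\nabla_V^* h$ to $-h' + V' h$. It uses `numpy.polynomial.Polynomial`, so all derivatives are exact; the potential $V$ and the fields $h_1, h_2$ are hypothetical choices.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Hypothetical potential V(x) = x^2/2 + x^4/10 and its derivative V'.
V = P([0.0, 0.0, 0.5, 0.0, 0.1])
dV = V.deriv()


def nabla_star(h):
    """One-variable commutative analogue of nabla_V^* h: -h' + V' h."""
    return -h.deriv() + dV * h


def bracket(h1, h2):
    """[h1, h2] = dh1 # h2 - dh2 # h1, which in one variable is h1' h2 - h2' h1."""
    return h1.deriv() * h2 - h2.deriv() * h1


h1 = P([1.0, 2.0, 0.0, 1.0])   # 1 + 2x + x^3
h2 = P([0.0, 1.0, -1.0])       # x - x^2

lhs = nabla_star(bracket(h1, h2))
rhs = nabla_star(h1).deriv() * h2 - nabla_star(h2).deriv() * h1
print(np.allclose((lhs - rhs).coef, 0.0))  # True
```

Because the identity holds for arbitrary $h_1, h_2$, it shows in this toy case why $\nabla_V^* h_1 = \nabla_V^* h_2 = 0$ forces $\nabla_V^*[h_1, h_2] = 0$.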

  51. Two ingredients for the Riemannian metric
    In order to define the Riemannian metric on the tangent space at V , we
    need two conditions on V . We will worry later about checking when these
    are true.
    Condition 1
    There exists a unique non-commutative law $\mu_V$ satisfying the
    Dyson-Schwinger equation $\mu_V[\nabla_V^* f] = 0$ for $f \in C_{tr}(\mathbb{R}^{*d})^d$.
    Note that $\nabla_V^* f$ is a scalar-valued function approximated by trace
    polynomials, and $\mu_V[\nabla_V^* f]$ is evaluated as $(\nabla_V^* f)^{\mathcal{A},\tau}(X)$ for any $X$ whose
    non-commutative law is $\mu_V$.
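    To see Condition 1 in the simplest case, take $d = 1$ and the illustrative choice $V = \frac{1}{2}\tau(X^2)$, and test the Dyson-Schwinger equation against $f(X) = X^k$. Writing $m_j = \mu(X^j)$ and assuming that, for a monomial, $\mu[\operatorname{Tr}_\#(\partial X^k)] = \sum_{j=0}^{k-1} m_j m_{k-1-j}$, the equation becomes $m_{k+1} = \sum_{j=0}^{k-1} m_j m_{k-1-j}$, which determines every moment from $m_0 = 1$. The sketch runs this recursion and compares it with the Catalan numbers, the even moments of the standard semicircle law.

```python
from math import comb


def moments_from_dyson_schwinger(n_max):
    """Moments m_0..m_{n_max} forced by m_{k+1} = sum_{j<k} m_j m_{k-1-j} with m_0 = 1."""
    m = [1] + [0] * n_max
    for k in range(n_max):
        m[k + 1] = sum(m[j] * m[k - 1 - j] for j in range(k))
    return m


m = moments_from_dyson_schwinger(12)
catalan = [comb(2 * n, n) // (n + 1) for n in range(7)]
print(m[::2])   # [1, 1, 2, 5, 14, 42, 132]
print(m[1::2])  # [0, 0, 0, 0, 0, 0]
```

In this case the recursion leaves no freedom, which is the moment-level reason the law $\mu_V$ is unique.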
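    Condition 2
    The operator $L_V = \nabla_V^* \nabla : \operatorname{tr}(C^\infty_{tr}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^\infty_{tr}(\mathbb{R}^{*d}))$ has kernel
    equal to the constant functions, and it has a continuous pseudo-inverse
    $\Psi_V : \operatorname{tr}(C^\infty_{tr}(\mathbb{R}^{*d})) \to \operatorname{tr}(C^\infty_{tr}(\mathbb{R}^{*d}))$ with $\mu_V(\Psi_V f) = 0$ and
    $$\Psi_V L_V f + \mu_V(f) = f.$$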
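    For intuition about Condition 2, here is a classical one-variable sketch (an assumption, not the free setting). For $V(x) = x^2/2$, the operator $L_V = -\frac{d^2}{dx^2} + x\frac{d}{dx}$ is the Ornstein-Uhlenbeck operator, and the probabilists' Hermite polynomials satisfy $L_V He_n = n He_n$. In that basis, $\mu_V(f)$ is the constant coefficient, the kernel of $L_V$ is the constants, and the pseudo-inverse divides the $n$-th coefficient by $n$. The helper names `L_V`, `mu_V`, `Psi_V` are hypothetical.

```python
import numpy as np
from numpy.polynomial import hermite_e as H


def L_V(c):
    """Apply -f'' + x f' to f = sum c_n He_n (diagonal: He_n -> n He_n)."""
    c = np.asarray(c, dtype=float)
    return np.arange(len(c)) * c


def mu_V(c):
    """Gaussian mean of f: only the He_0 coefficient survives."""
    return float(np.asarray(c, dtype=float)[0])


def Psi_V(c):
    """Pseudo-inverse of L_V with mean-zero output: He_n -> He_n / n, He_0 -> 0."""
    c = np.asarray(c, dtype=float)
    out = np.zeros_like(c)
    out[1:] = c[1:] / np.arange(1, len(c))
    return out


f = np.array([0.7, -1.0, 2.0, 0.5])     # hypothetical f in the He_n basis
x = np.linspace(-2.0, 2.0, 9)

# Check the diagonal formula against -f'' + x f' evaluated directly.
direct = -H.hermeval(x, H.hermeder(f, 2)) + x * H.hermeval(x, H.hermeder(f))
print(np.allclose(H.hermeval(x, L_V(f)), direct))        # True

# Condition 2 identities: mu_V(Psi_V f) = 0 and Psi_V L_V f + mu_V(f) = f.
recovered = Psi_V(L_V(f))
recovered[0] += mu_V(f)
print(mu_V(Psi_V(f)) == 0.0, np.allclose(recovered, f))  # True True
```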

  52. The Riemannian metric
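    Definition
    If $V$ satisfies Conditions 1 and 2, the Riemannian metric on $T_V W(\mathbb{R}^{*d})$ is
    given by
    $$\langle \dot V_1, \dot V_2 \rangle_V = \mu_V\big[\langle \nabla \Psi_V \dot V_1, \nabla \Psi_V \dot V_2 \rangle_{tr}\big].$$
    Remark
    This definition relates to the Riemannian metric for measures on $M_N(\mathbb{C})^d_{sa}$.
    If $\mu_V^{(N)}$ is the measure with density constant times $e^{-N^2 V^{M_N(\mathbb{C}), \operatorname{tr}_N}}$, then the
    classical Riemannian metric can be expressed as
    $$\int \langle \nabla (L_{V^{(N)}})^{-1} \dot V_1, \nabla (L_{V^{(N)}})^{-1} \dot V_2 \rangle_{\operatorname{tr}_N} \, d\mu_V^{(N)} = N^2 \int \dot V_1 \, (L_{V^{(N)}})^{-1} \dot V_2 \, d\mu_V^{(N)}.$$
    The expression on the right-hand side seems simpler, but it is
    dimension-dependent!!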
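    A classical one-variable analogue of the two expressions in the remark, under the illustrative assumption $V(x) = x^2/2$: for mean-zero $a, b$, Gaussian integration by parts gives $\int (\Psi_V a)'(\Psi_V b)'\, d\mu_V = \int a\, \Psi_V b\, d\mu_V$, and in the Hermite basis both equal $\sum_n a_n b_n (n-1)!$. The sketch checks this with Gauss-Hermite quadrature for the weight $e^{-x^2/2}$ (`numpy.polynomial.hermite_e.hermegauss`); the vectors `a`, `b` are hypothetical tangent vectors.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def Psi_V(c):
    """Mean-zero pseudo-inverse of the Ornstein-Uhlenbeck operator: He_n -> He_n / n."""
    c = np.asarray(c, dtype=float)
    out = np.zeros_like(c)
    out[1:] = c[1:] / np.arange(1, len(c))
    return out


def gaussian_mean(values, weights):
    """Expectation under N(0, 1) from hermegauss nodes (the weights sum to sqrt(2*pi))."""
    return float(values @ weights) / math.sqrt(2.0 * math.pi)


nodes, weights = H.hermegauss(20)      # exact for polynomials of degree <= 39

a = np.array([0.0, 1.0, -0.5, 0.25])   # hypothetical mean-zero tangent vectors
b = np.array([0.0, 2.0, 1.0, 0.0, -1.0])

grad_a = H.hermeval(nodes, H.hermeder(Psi_V(a)))
grad_b = H.hermeval(nodes, H.hermeder(Psi_V(b)))
metric = gaussian_mean(grad_a * grad_b, weights)
simple = gaussian_mean(H.hermeval(nodes, a) * H.hermeval(nodes, Psi_V(b)), weights)

# Closed form in the Hermite basis: sum_n a_n b_n (n - 1)!
closed = sum(a[n] * b[n] * math.factorial(n - 1) for n in range(1, min(len(a), len(b))))
print(np.isclose(metric, simple), np.isclose(metric, closed))  # True True
```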

  53. Consequences of Dyson-Schwinger equation
    If Conditions 1 and 2 hold for some $V$, then using the formula for
    $\nabla_V^*[h_1, h_2]$, one can show that $\ker(\nabla_V^*)$ and $\operatorname{Im}(\nabla)$ are orthogonal with
    respect to the inner product $\mu_V[\langle \cdot, \cdot \rangle_{tr}]$ determined by $V$.
    Furthermore, $\nabla \Psi_V \nabla_V^* : C^\infty_{tr}(\mathbb{R}^{*d})^d \to C^\infty_{tr}(\mathbb{R}^{*d})^d$ defines a projection
    onto the space of gradients. The complementary projection is known as
    the Leray projection.
    Remark
    In the classical setting, the decomposition of vector fields into $\ker(\nabla_V^*)$
    and $\operatorname{Im}(\nabla)$ is an infinitesimal version of Brenier’s factorization of a
    diffeomorphism into an optimal transport map and a $\mu_V$-preserving
    transformation.
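    The projection onto gradients has a transparent finite-dimensional analogue, sketched here as an assumption rather than the authors' construction. On a small hypothetical graph, the incidence matrix $D$ plays the role of $\nabla$, positive edge weights $E$ stand in for the inner product given by $\mu_V$, and $P = D (D^T E D)^+ D^T E$ is the discrete version of $\nabla \Psi_V \nabla_V^*$ (a node measure would cancel out of $P$). The complement $I - P$ is the discrete analogue of the Leray projection; its range lies in the kernel of the weighted divergence $D^T E$.

```python
import numpy as np

# Hypothetical graph: a 4-cycle with one chord. (D u)_e = u_j - u_i for edge e = (i, j).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_nodes = 4
D = np.zeros((len(edges), n_nodes))
for e, (i, j) in enumerate(edges):
    D[e, i], D[e, j] = -1.0, 1.0

# Positive edge weights: the inner product <h, k>_E = h^T E k on edge fields.
E = np.diag([1.0, 2.0, 0.5, 1.5, 1.0])

A = D.T @ E @ D                          # weighted graph Laplacian (kernel = constants)
P = D @ np.linalg.pinv(A) @ D.T @ E      # projection onto discrete gradients
Q = np.eye(len(edges)) - P               # complementary ("Leray") projection

rng = np.random.default_rng(0)
u = rng.normal(size=n_nodes)             # a potential on the nodes
h = rng.normal(size=len(edges))          # an arbitrary edge field

print(np.allclose(P @ P, P))                 # idempotent
print(np.allclose(P @ (D @ u), D @ u))       # fixes gradients
print(np.allclose(D.T @ E @ (Q @ h), 0.0))   # Q h is divergence-free
print(np.allclose(E @ P, (E @ P).T))         # P is orthogonal for <., .>_E
```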

  54. Warnings
    Although Condition 1 stipulates that $\mu_V$ is uniquely determined by $V$,
    there are many cases where $V$ is not uniquely determined by $\mu_V$. For
    instance, since $\mu_V$ arises from bounded operators (it is “supported on an
    operator norm ball”), often modifying $V$ outside an operator norm ball will
    not change $\mu_V$.
    Another way in which degeneracy arises is from the use of trace
    polynomials. If a particular $(\mathcal{A}, \tau)$ and $X$ are given, and if $f$ is a trace
    polynomial, then $f^{\mathcal{A},\tau}(X)$ agrees with $p(X)$ for some non-commutative
    polynomial $p$. We can easily imagine that many $V$ lead to the same $\mu$ for
    this reason.
    Relatedly, the Riemannian metric on the tangent space could have a very
    large kernel because when we take the inner product in $L^2(\mu_V)$, all the
    $\operatorname{tr}(p)$ terms are collapsed to constants.

  55. Construction of transport
    Closely related to the previous observation about the differential of the
    transport action, we have:
    Lemma
    Suppose that $t \mapsto V_t$ is a $C^1$ path in $W(\mathbb{R}^{*d})$, for $t$ in some interval
    containing 0. Let $h_t$ be a vector field with $\partial h_t$ uniformly bounded and
    $\nabla_{V_t}^* h_t = \dot V_t$. Let $f_t$ be the flow along the vector field $h_t$. Then
    $(f_t)_* V_0 = V_t$.
    Suppose we are given the path $t \mapsto V_t$ (perhaps interpolating between
    some given $V_0$ and $V_1$) and we want to construct $h_t$. If each $V_t$ satisfies
    Conditions 1 and 2, then we can take $h_t = \nabla \Psi_{V_t} \dot V_t$. For $\partial h_t$ to be
    bounded, we require some concrete estimate on $\Psi_V$. For $h_t$ to depend
    continuously on $t$, we need some joint continuous dependence of $\Psi_V f$ on
    $V$ and $f$, at least for some family of $V$’s that contains our given path. If
    these conditions are met, then some smooth transport exists.
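    A classical one-dimensional illustration of the flow step, assuming Gaussian laws and the usual pushforward of densities. If $\sigma_t$ is a hypothetical path of standard deviations, the flow of the gradient field $h_t(x) = (\dot\sigma_t / \sigma_t)\, x$ is $x \mapsto (\sigma_t / \sigma_0)\, x$, which carries $N(0, \sigma_0^2)$ to $N(0, \sigma_t^2)$. The sketch integrates $\dot x = h_t(x)$ with RK4 and compares with that closed form; it only illustrates solving the flow, not the choice of $h_t$ via $\Psi_{V_t}$.

```python
import math


def sigma(t):
    """Hypothetical path of standard deviations, sigma_t = 1 + t."""
    return 1.0 + t


def sigma_dot(t):
    return 1.0


def h(t, x):
    """Time-dependent gradient field h_t(x) = (sigma_dot_t / sigma_t) x."""
    return sigma_dot(t) / sigma(t) * x


def flow(x0, t1, n_steps=200):
    """RK4 for x'(t) = h(t, x(t)) with x(0) = x0; returns f_{t1}(x0)."""
    x, t = float(x0), 0.0
    dt = t1 / n_steps
    for _ in range(n_steps):
        k1 = h(t, x)
        k2 = h(t + dt / 2, x + dt / 2 * k1)
        k3 = h(t + dt / 2, x + dt / 2 * k2)
        k4 = h(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return x


for x0 in (1.0, -0.3, 2.5):
    # sigma_1 / sigma_0 = 2, so f_1(x0) should be close to 2 * x0.
    print(x0, flow(x0, 1.0))
```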

  56. Construction of transport
    The following theorem is similar to previous work such as
    Guionnet-Shlyakhtenko 2009, Dabrowski-Guionnet-Shlyakhtenko 2016.
    Theorem A
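    Fix $C_1, C_2, C_3 > 0$ with $C_2 < 1$. Consider $V \in \operatorname{tr}(C^\infty_{tr}(\mathbb{R}^{*d}))$ such that
    $\|\nabla V\|_{BC_{tr}} \le C_1$ and $\|\partial \nabla V - \mathrm{Id}\|_{BC_{tr}} \le C_2$. Then:
    $V$ satisfies Conditions 1 and 2.
    For such $V$, the map $(V, f) \mapsto \Psi_V f$ is jointly continuous with respect
    to the Fréchet topology on $C^\infty_{tr}$.
    Let $k \ge 0$. If $V$ is as above and furthermore $\partial^j V$ is bounded by some
    constant $C_j$ for $j \le k + 2$, then $\Psi_V$ maps $BC^k_{tr}$ into $BC^k_{tr}$.
    The theorem implies that for a path $t \mapsto V_t$, if $\nabla V_t$, $\partial \nabla V_t$, $\partial^2 \nabla V_t$, $\nabla \dot V_t$,
    $\partial \nabla \dot V_t$ are uniformly bounded, with $\|\partial \nabla V_t - \mathrm{Id}\|_{BC_{tr}} \le C_2 < 1$, then the
    above construction of transport works.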

  57. Construction of transport
    From this result, one immediately gets isomorphisms of C*-algebras
    associated to the non-commutative laws $\mu_{V_t}$.
    Theorem B
    For a path $t \mapsto V_t$ satisfying the conditions on the previous slide, there
    exists a $C^1$ path $t \mapsto f_t$ of diffeomorphisms with $(f_t)_* V_0 = V_t$. These give
    rise to isomorphisms between the tracial C*-algebras (and the von
    Neumann algebras) associated to the GNS representations of the
    non-commutative laws $\mu_{V_t}$. In particular, when $V$ is as in the previous
    theorem, the C*-algebra of $\mu_V$ is isomorphic to the one generated by a
    free semicircular family.
    There is one thing to check to finish the proof: if $f_* V_0 = V_1$, then does
    $f_* \mu_{V_0} = \mu_{V_1}$? For the potentials $V_t$ as on the previous slide, this can be
    checked from the free entropy viewpoint, which will be explained later.

  58. Warnings
    These results are not true for arbitrary $V$, even in the one-variable case.
    Indeed, as in Biane-Speicher 1999, consider $V^{\mathcal{A},\tau}(X) = \tau(f(X))$ where
    $f : \mathbb{R} \to \mathbb{R}$ is a “double well” potential. If the wells are deep enough, then
    in the large $N$ limit the spectral distribution is supported on a disjoint
    union of two intervals. Hence, the C*-algebra is $C[0,1] \oplus C[0,1]$, which is
    not isomorphic to the C*-algebra $C[0,1]$ obtained in the
    semicircular case.
    Actually, Condition 1 fails for such a potential because other measures
    satisfying the Dyson-Schwinger equation are obtained by reweighting the
    two components.
    Relatedly, there are non-constant smooth functions $\psi$ such that $L_V \psi$
    vanishes in $L^2(\mu_V)$. Namely, we take $\psi(X) = \tau(g(X))$ where $g : \mathbb{R} \to \mathbb{R}$ is
    smooth and constant on each of the two intervals. On the other hand,
    $L_V \psi$ is not zero in $\operatorname{tr}(C^\infty_{tr}(\mathbb{R}^{*d}))$, but the significance of this is unclear.

  59. Inversion of the Laplacian
    Theorem A comes out of two sets of tools:
    1. The free entropy approach is used to show existence of a
    non-commutative law $\mu$ satisfying $\mu[\nabla_V^* f] = 0$ for $f \in C_{tr}(\mathbb{R}^{*d})^d$.
    2. The heat semigroup is used to show uniqueness of a non-commutative law
    $\mu$ satisfying $\mu[L_V \psi] = 0$ for $\psi \in \operatorname{tr}(C_{tr}(\mathbb{R}^{*d}))$, as well as to construct
    $\Psi_V$.
    Let us start with (2). The broad outline is the same as Dab.-Gui.-Shl.
    2016, but with different function spaces.

  60. Inversion of the Laplacian
    Recalling that $L_V = \nabla_V^* \nabla$, the heat semigroup is the family of operators
    $e^{-tL_V}$ for $t \ge 0$. The rigorous definition is through free SDE theory. We set
    $$[e^{-tL_V/2} f]^{\mathcal{A},\tau}(X) = E_{\mathcal{A}}[f(X_t(X))],$$
    where
    $$dX_t(X) = dS_t - \frac{1}{2} \nabla V(X_t(X))\, dt, \qquad X_0(X) = X,$$
    where $S_t$ is a semicircular Brownian motion freely independent of the
    initial condition $X$.
    The assumption that $\|\partial \nabla V - \mathrm{Id}\|_{BC_{tr}} \le C_2 < 1$ implies that $\partial_X X_t$ decays
    like $e^{-t(1-C_2)/2}$.
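    The contraction estimate can be seen in a classical one-dimensional sketch. This assumes the free semicircular Brownian motion is replaced by a classical one, and uses the hypothetical potential $V(x) = x^2/2 + C_2 \cos x$, for which $|V'' - 1| \le C_2$. Two Euler-Maruyama paths driven by the same noise satisfy $\delta_{n+1} = \delta_n (1 - \tfrac{1}{2} V''(\xi_n) \Delta t)$ by the mean value theorem, so their gap obeys the discrete version of the $e^{-t(1-C_2)/2}$ bound.

```python
import numpy as np

C2 = 0.5  # hypothetical constant; V'' = 1 - C2*cos(x) lies in [1 - C2, 1 + C2]


def grad_V(x):
    """V(x) = x^2/2 + C2*cos(x), so V'(x) = x - C2*sin(x)."""
    return x - C2 * np.sin(x)


def coupled_gaps(x0, y0, T=10.0, n_steps=10_000, seed=0):
    """Euler-Maruyama for dX = dB - (1/2) V'(X) dt from two starting points, shared noise."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x, y = float(x0), float(y0)
    gaps = [abs(x - y)]
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))
        x += dB - 0.5 * grad_V(x) * dt
        y += dB - 0.5 * grad_V(y) * dt
        gaps.append(abs(x - y))
    return dt, np.array(gaps)


dt, gaps = coupled_gaps(0.0, 1.0)
times = dt * np.arange(len(gaps))
bound = gaps[0] * np.exp(-(1 - C2) * times / 2)
print(bool(np.all(gaps <= bound * (1 + 1e-9))))  # True
```

The same mechanism, applied to the derivative $\partial_X X_t$ instead of a finite difference, is what produces the exponential decay used on the next slide.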

  61. Inversion of the Laplacian
    This in turn implies that $\partial[e^{-tL_V} f]$ decays like $e^{-t(1-C_2)}$. To recover the
    non-commutative law $\mu_V$ and the pseudo-inverse $\Psi_V$, we argue that
    $$\mu_V(f) = \lim_{t \to \infty} e^{-tL_V} f$$
    and
    $$\Psi_V f = \int_0^\infty [e^{-tL_V} f - \mu_V(f)]\, dt.$$
    These expressions make sense because of the exponential decay.
    The smoothness properties as well as the continuous dependence of $\Psi_V f$
    on $(V, f)$ are proved by studying the smoothness properties of $X_t(X)$ as a
    function of $X$, with some simpleminded inductive arguments.
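    In the classical Ornstein-Uhlenbeck example ($V(x) = x^2/2$, used only as an illustration), $e^{-tL_V}$ multiplies the $He_n$ coefficient by $e^{-nt}$, so both formulas above can be evaluated directly. The sketch truncates the time integral at a hypothetical $T = 40$ and applies the trapezoidal rule; the result should match dividing the $n$-th coefficient by $n$.

```python
import numpy as np


def heat_semigroup(c, t):
    """e^{-t L_V} on f = sum c_n He_n for the Ornstein-Uhlenbeck operator: c_n -> e^{-n t} c_n."""
    c = np.asarray(c, dtype=float)
    return np.exp(-np.arange(len(c)) * t) * c


def mu_V(c):
    """mu_V(f) read off from e^{-t L_V} f at a large t (only He_0 survives)."""
    return heat_semigroup(c, 1e3)[0]


def Psi_V(c, T=40.0, n_steps=40_000):
    """Trapezoidal approximation of int_0^T [e^{-t L_V} f - mu_V(f)] dt."""
    c = np.asarray(c, dtype=float)
    ts = np.linspace(0.0, T, n_steps + 1)
    vals = np.array([heat_semigroup(c, t) for t in ts])
    vals[:, 0] -= mu_V(c)
    dt = ts[1] - ts[0]
    return dt * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))


f = np.array([0.7, -1.0, 2.0, 0.5])      # hypothetical f in the He_n basis
exact = np.concatenate([[0.0], f[1:] / np.arange(1, len(f))])
print(np.allclose(Psi_V(f), exact, atol=1e-6))  # True
```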

  62. Free Gibbs laws — results
    A free Gibbs law for $V$ is a non-commutative law $\mu$ that maximizes
    $\chi_V(\mu) := \chi(\mu) - \mu(V)$, where $\chi$ is the free microstate entropy.
    We can show the following:
    1. If $V \in W(\mathbb{R}^{*d})$ with $\partial V$ and $\partial^2 V$ bounded, then a free Gibbs law
    always exists.
    2. Due to the change of variables formula for entropy, any free Gibbs law
    $\mu$ must satisfy the Dyson-Schwinger equation $\mu[\nabla_V^* f] = 0$.
    3. Fix $C_1, C_2 > 0$. The set of $V$ which have a unique free Gibbs law is
    generic in the set $\mathcal{V}_{C_1,C_2}$ of $V$ with $\|\partial V\|_{BC_{tr}} \le C_1$ and
    $\|\partial^2 V\|_{BC_{tr}} \le C_2$, equipped with the subspace topology from
    $\operatorname{tr}(C_{tr}(\mathbb{R}^{*d}))$.

  63. Free Gibbs laws — proof with lies
    The argument for the existence of free Gibbs laws relies on enlarging the
    space of laws in order to obtain more compactness. More precisely:
    1. We embed the space of non-commutative laws into the dual of a
    Banach space $\mathcal{C}$ consisting of certain functions with quadratic growth
    at $\infty$.
    2. Letting $\mathcal{E} \subseteq \mathcal{C}^\star$ be the closure of the space of laws, it turns out that
    the set of elements of $\mathcal{E}$ with “second moment” (not operator norm)
    bounded by $r$ is compact.
    3. $\chi_V$ is upper semi-continuous and goes to $-\infty$ as the
    “second moment” of $\mu$ goes to $\infty$, and thus we get a maximizer
    using compactness.
    4. Using the change of variables formula for entropy, we deduce that any
    maximizer $\nu$ satisfies the Dyson-Schwinger equation (for nice enough
    test functions).
    5. Using the Dyson-Schwinger equation, we show iteratively that
    moments of $\nu$ are finite, and ultimately that $\nu \in \Sigma_{d,R}$
    for some $R$.

  64. Geodesic equations
    Definition
    The geodesic equations on $W(\mathbb{R}^{*d})$ are the pair of equations
    $$\begin{cases} \dot V_t = L_{V_t} \phi_t \\ \dot \phi_t = \frac{1}{2} \langle \nabla \phi_t, \nabla \phi_t \rangle_{tr}. \end{cases}$$
    These can be obtained formally as the large $N$ limit of the geodesic
    equations for measures on $M_N(\mathbb{C})^d_{sa}$.
    Thinking about the classical case, one is led to conjecture that nice
    enough solutions must have the form
    $$V_t = (\mathrm{id} + t \nabla \phi_0)_* V_0.$$
    It is straightforward to check that when $\partial \nabla \phi_0$ is bounded, this formula
    defines a solution for small enough $t$. We do not show rigorously that
    these are the only solutions.

  65. Towards free optimal transport
    However, we can show rigorously that these paths minimize length with
    respect to the $L^2$-coupling distance when $\partial \nabla \phi_0$ is bounded by a constant
    $C$ and when $t \in (0, 1/C)$. This follows from the more general proposition
    below.
    Definition
    For two non-commutative laws $\mu$ and $\nu$, we define $d_W(\mu, \nu)$ as the
    infimum of $\|X - Y\|_2$ over all tracial C*-algebras $(\mathcal{A}, \tau)$ and $X, Y \in \mathcal{A}^d_{sa}$
    such that $X$ has law $\mu$ and $Y$ has law $\nu$.
    Proposition
    Let $\phi \in \operatorname{tr}(C^2_{tr}(\mathbb{R}^{*d}))_{sa}$ such that $\|\partial \nabla \phi - \mathrm{Id}\|_{BC_{tr}} < 1$. Then for every
    $(\mathcal{A}, \tau)$ and $X \in \mathcal{A}^d_{sa}$, the distance $d_W$ between the laws of $X$ and $\nabla \phi(X)$
    equals $\|X - \nabla \phi(X)\|_{\tau,2}$. In
    other words, $X$ and $\nabla \phi(X)$ are an optimal coupling of their respective
    laws.

  66. Towards free optimal transport
    The proof of the proposition is inspired by the classical
    Monge-Kantorovich duality.
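    By the inverse function theorem, $\nabla \phi$ has an inverse function, so define
    $$\psi^{\mathcal{A},\tau}(Z) = \langle Z, ((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z) \rangle_\tau - \phi^{\mathcal{A},\tau}\big(((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z)\big).$$
    Note that $Y = ((\nabla \phi)^{-1})^{\mathcal{A},\tau}(Z)$ maximizes the function
    $$Y \mapsto \langle Z, Y \rangle_\tau - \phi^{\mathcal{A},\tau}(Y)$$
    by calculus and by convexity of $\phi$. (So $\psi$ is the Legendre transform of $\phi$.)
    Thus,
    $$\phi^{\mathcal{A},\tau}(Y) + \psi^{\mathcal{A},\tau}(Z) \ge \langle Y, Z \rangle_\tau$$
    for all $Y, Z \in \mathcal{A}^d_{sa}$.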
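    A commutative, finite-dimensional sketch of this duality step, under the assumption that $\phi(y) = \frac{1}{2} y^T A y$ on $\mathbb{R}^2$ for a hypothetical symmetric $A$ with $\|A - I\| < 1$ (so $\nabla\phi = A$ is invertible and $\phi$ is convex). Then $\psi(z) = \langle z, A^{-1} z\rangle - \phi(A^{-1} z) = \frac{1}{2} z^T A^{-1} z$, and the sketch checks the inequality $\phi(y) + \psi(z) \ge \langle y, z \rangle$ on random samples, with equality when $z = \nabla\phi(y)$.

```python
import numpy as np

A = np.array([[1.2, 0.3], [0.3, 0.8]])   # hypothetical; eigenvalues of A - I are about +-0.36
A_inv = np.linalg.inv(A)


def phi(y):
    return 0.5 * y @ A @ y


def grad_phi(y):
    return A @ y


def psi(z):
    """Built as on the slide: <z, (grad phi)^{-1} z> - phi((grad phi)^{-1} z)."""
    y = A_inv @ z
    return z @ y - phi(y)


rng = np.random.default_rng(1)
gaps = []
for _ in range(1000):
    y, z = rng.normal(size=2), rng.normal(size=2)
    gaps.append(phi(y) + psi(z) - y @ z)   # Fenchel-Young gap, should be >= 0
y = rng.normal(size=2)
print(min(gaps) >= -1e-12, np.isclose(phi(y) + psi(grad_phi(y)), y @ grad_phi(y)))  # True True
print(np.linalg.norm(A - np.eye(2), 2) < 1)  # True
```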

  67. Towards free optimal transport
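    Fix $(\mathcal{A}, \tau)$ and $X \in \mathcal{A}^d_{sa}$. If $Y, Z$ is any coupling of the laws of $X$ and $\nabla \phi(X)$ on
    some other tracial C*-algebra $(\mathcal{A}', \tau')$, then
    $$\begin{aligned} \langle Y, Z \rangle_{\tau'} &\le \phi^{\mathcal{A}',\tau'}(Y) + \psi^{\mathcal{A}',\tau'}(Z) \\ &= \phi^{\mathcal{A},\tau}(X) + \psi^{\mathcal{A},\tau}(\nabla \phi^{\mathcal{A},\tau}(X)) \\ &= \langle X, \nabla \phi^{\mathcal{A},\tau}(X) \rangle_\tau, \end{aligned}$$
    where the middle equality holds because $\phi$ and $\psi$ are trace functions, whose values depend
    only on the laws, and the last equality follows by the definition of $\psi$. Thus, $X, \nabla \phi(X)$ is
    a coupling that maximizes the inner product between the first and second
    variable, which is equivalent to minimizing the $L^2$ distance (since the $L^2$
    norms of the two variables in any coupling are uniquely determined by the laws).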
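    In one commuting variable the conclusion reduces to a familiar fact that can be checked directly. Assume empirical laws on $n$ equally weighted points and the hypothetical increasing map $\nabla\phi(x) = x + 0.3 \sin x$ (so $|\partial\nabla\phi - 1| \le 0.3 < 1$). Among couplings of the two empirical laws given by permutations, the pairing $x_i \leftrightarrow \nabla\phi(x_i)$ has the largest inner product by the rearrangement inequality, hence the smallest $L^2$ distance; it also agrees with the sorted (monotone) pairing. The sketch compares it with random permutations.

```python
import numpy as np


def grad_phi(x):
    """Hypothetical increasing map; its derivative 1 + 0.3 cos x stays in [0.7, 1.3]."""
    return x + 0.3 * np.sin(x)


def inner(a, b):
    """Normalized inner product, mimicking <., .>_tau."""
    return float(np.mean(a * b))


def dist(a, b):
    """Normalized L^2 distance, mimicking ||. - .||_{tau,2}."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = grad_phi(x)

best = inner(x, y)
others = [inner(x, y[rng.permutation(len(y))]) for _ in range(2000)]
print(max(others) <= best + 1e-12)                            # True
print(np.isclose(dist(x, y), dist(np.sort(x), np.sort(y))))   # True
```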