A user’s guide to optimal transport
Luigi Ambrosio ∗
Nicola Gigli †
Abstract
This text is an expanded version of the lectures given by the first author in the 2009 CIME summer
school of Cetraro. It provides a quick and reasonably detailed account of the classical theory of optimal mass
transportation and of its more recent developments, including the metric theory of gradient flows,
geometric and functional inequalities related to optimal transportation, the first and second order
differential calculus in the Wasserstein space and the synthetic theory of metric measure spaces with
Ricci curvature bounded from below.
∗ [email protected]
† [email protected]

Contents

1 The optimal transport problem
1.1 Monge and Kantorovich formulations of the optimal transport problem
1.2 Necessary and sufficient optimality conditions
1.3 The dual problem
1.4 Existence of optimal maps
1.5 Bibliographical notes

2 The Wasserstein distance W2
2.1 X Polish space
2.2 X geodesic space
2.3 X Riemannian manifold
2.3.1 Regularity of interpolated potentials and consequences
2.3.2 The weak Riemannian structure of (P2(M), W2)
2.4 Bibliographical notes

3 Gradient flows
3.1 Hilbertian theory of gradient flows
3.2 The theory of Gradient Flows in a metric setting
3.2.1 The framework
3.2.2 General l.s.c. functionals and EDI
3.2.3 The geodesically convex case: EDE and regularizing effects
3.2.4 The compatibility of Energy and distance: EVI and error estimates
3.3 Applications to the Wasserstein case
3.3.1 Elements of subdifferential calculus in (P2(Rd), W2)
3.3.2 Three classical functionals
3.4 Bibliographical notes

4 Geometric and functional inequalities
4.1 Brunn-Minkowski inequality
4.2 Isoperimetric inequality
4.3 Sobolev Inequality
4.4 Bibliographical notes

5 Variants of the Wasserstein distance
5.1 Branched optimal transportation
5.2 Different action functional
5.3 An extension to measures with unequal mass
5.4 Bibliographical notes

6 More on the structure of (P2(M), W2)
6.1 "Duality" between the Wasserstein and the Arnold Manifolds
6.2 On the notion of tangent space
6.3 Second order calculus
6.4 Bibliographical notes

7 Ricci curvature bounds
7.1 Convergence of metric measure spaces
7.2 Weak Ricci curvature bounds: definition and properties
7.3 Bibliographical notes
Introduction
The opportunity to write down these notes on Optimal Transport has been the CIME course in Cetraro
given by the first author in 2009. Later on the second author joined to the project, and the initial set of
notes has been enriched and made more detailed, in particular in connection with the differentiable
structure of the Wasserstein space, the synthetic curvature bounds and their analytic implications.
Some of the results presented here have not yet appeared in book form, with the exception of [44].
It is clear that this subject is expanding so quickly that it is impossible to give an account of all
developments of the theory in a few hours, or a few pages. A more modest approach is to give a
quick mention of the many aspects of the theory, stimulating the reader's curiosity and leaving the reader to more detailed treatises such as [6] (mostly focused on the theory of gradient flows) and the monumental book [80] (for a much broader overview of optimal transport).
In Chapter 1 we introduce the optimal transport problem and its formulations in terms of transport maps and transport plans. Then we introduce the basic tools of the theory, namely the duality formula and c-cyclical monotonicity, and discuss the problem of existence of optimal maps in the model case cost = distance².
In Chapter 2 we introduce the Wasserstein distance W2 on the set P2(X) of probability measures with finite quadratic moment, where X is a generic Polish space. This distance naturally arises when considering the optimal transport problem with quadratic cost. The connections between geodesics in P2(X) and geodesics in X and between the time evolution of Kantorovich potentials and the Hopf-Lax semigroup are discussed in detail. Also, when looking at geodesics in this space, and in particular when the underlying metric space X is a Riemannian manifold M, one is naturally led to the so-called time-dependent optimal transport problem, where geodesics are singled out by an action minimization principle. This is the so-called Benamou-Brenier formula, which is the first step in the interpretation of P2(M) as an infinite-dimensional Riemannian manifold, with W2 as Riemannian
distance. We then further exploit this viewpoint following Otto’s seminal work [67].
In Chapter 3 we make a quite detailed introduction to the theory of gradient flows, borrowing
almost all material from [6]. First we present the classical theory, for λ-convex functionals in Hilbert
spaces. Then we present some equivalent formulations that involve only the distance, and therefore
are applicable (at least in principle) to general metric spaces. They involve the derivative of the
distance from a point (the (EVI) formulation) or the rate of dissipation of the energy (the (EDE)
and (EDI) formulations). For all these formulations there is a corresponding discrete version of
the gradient flow formulation given by the implicit Euler scheme. We will then show that there is
convergence of the scheme to the continuous solution as the time discretization parameter tends to
0. The (EVI) formulation is the stronger one, in terms of uniqueness, contraction and regularizing
effects. On the other hand this formulation depends on a compatibility condition between energy
and distance; this condition is fulfilled in Non Positively Curved spaces in the sense of Alexandrov
if the energy is convex along geodesics. Luckily enough, the compatibility condition holds even for
some important model functionals in P2 (Rn ) (sum of the so-called internal, potential and interaction
energies), even though the space is Positively Curved in the sense of Alexandrov.
In Chapter 4 we illustrate the power of optimal transportation techniques in the proof of some
classical functional/geometric inequalities: the Brunn-Minkowski inequality, the isoperimetric inequality and the Sobolev inequality. Recent works in this area have also shown the possibility to
prove by optimal transportation methods optimal effective versions of these inequalities: for instance
we can quantify the closeness of E to a ball with the same volume in terms of how close the isoperimetric ratio of E is to the optimal one.
Chapter 5 is devoted to the presentation of three recent variants of the optimal transport problem, which lead to different notions of Wasserstein distance: the first one deals with variational problems giving rise to branched transportation structures, with a 'Y shaped path' as opposed to the 'V shaped one' typical of the mass splitting occurring in standard optimal transport problems. The second one involves a modification of the action functional on curves arising in the Benamou-Brenier formula: this leads to many different optimal transportation distances, maybe more difficult to describe from the Lagrangian viewpoint, but still with quite useful implications in evolution PDEs and functional inequalities. The last one deals with a transportation distance between measures with unequal mass, a variant useful in modeling problems with Dirichlet boundary conditions.
Chapter 6 deals with a more detailed analysis of the differentiable structure of P2 (Rd ): besides
the analytic tangent space arising from the Benamou-Brenier formula, also the “geometric” tangent
space, based on constant speed geodesics emanating from a given base point, is introduced. We
also present Otto’s viewpoint on the duality between Wasserstein space and Arnold’s manifolds of
measure-preserving diffeomorphisms. A large part of the chapter is also devoted to the second order
differentiable properties, involving curvature. The notions of parallel transport along (sufficiently
regular) geodesics and Levi-Civita connection in the Wasserstein space are discussed in detail.
Finally, Chapter 7 is devoted to an introduction to the synthetic notions of Ricci lower bounds
for metric measure spaces introduced by Lott & Villani and Sturm in recent papers. This notion is
based on suitable convexity properties of a dimension-dependent internal energy along Wasserstein
geodesics. Synthetic Ricci bounds are completely consistent with the smooth Riemannian case and
stable under measured-Gromov-Hausdorff limits. For this reason these bounds, and their analytic
implications, are a useful tool in the description of measured-GH-limits of Riemannian manifolds.
Acknowledgement. Work partially supported by a MIUR PRIN2008 grant.
1 The optimal transport problem

1.1 Monge and Kantorovich formulations of the optimal transport problem
Given a Polish space (X, d) (i.e. a complete and separable metric space), we will denote by P(X)
the set of Borel probability measures on X. By support supp(µ) of a measure µ ∈ P(X) we mean
the smallest closed set on which µ is concentrated.
If X, Y are two Polish spaces, T : X → Y is a Borel map, and µ ∈ P(X) a measure, the
measure T#µ ∈ P(Y), called the push forward of µ through T, is defined by

T#µ(E) = µ(T⁻¹(E)),   ∀E ⊂ Y Borel.

The push forward is characterized by the fact that

∫ f d(T#µ) = ∫ f ◦ T dµ,
for every Borel function f : Y → R ∪ {±∞}, where the above identity has to be understood in the
following sense: one of the integrals exists (possibly attaining the value ±∞) if and only if the other
one exists, and in this case the values are equal.
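For readers who like to experiment numerically, the following is a minimal Python sketch (not part of the original text; all data are hypothetical) of the push forward of a finitely supported measure: atoms are moved by T, masses of atoms with the same image are summed, and the characterizing identity above reduces to a reordering of a finite sum.

import numpy as np

x = np.array([-0.5, 0.0, 0.5, 1.0])    # atoms of mu (hypothetical data)
m = np.array([0.1, 0.2, 0.3, 0.4])     # their masses, summing to 1

T = lambda t: np.abs(t)                 # a Borel map; note T(-0.5) = T(0.5)
f = lambda t: np.cos(t)                 # a bounded test function

# Build T_# mu explicitly: atoms at the distinct values T(x_i), masses summed.
y, inv = np.unique(T(x), return_inverse=True)
m_push = np.bincount(inv, weights=m)

lhs = np.sum(m_push * f(y))             # integral of f against T_# mu
rhs = np.sum(m * f(T(x)))               # integral of f o T against mu
assert np.isclose(lhs, rhs)             # the characterizing identity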
Now fix a Borel cost function c : X × Y → R ∪ {+∞}. The Monge version of the transport
problem is the following:
Problem 1.1 (Monge’s optimal transport problem) Let µ ∈ P(X), ν ∈ P(Y ). Minimize
T 7→ ∫_X c(x, T(x)) dµ(x)
among all transport maps T from µ to ν, i.e. all maps T such that T# µ = ν.
Regardless of the choice of the cost function c, Monge’s problem can be ill-posed because:
• no admissible T exists (for instance if µ is a Dirac delta and ν is not).
• the constraint T# µ = ν is not weakly sequentially closed, w.r.t. any reasonable weak topology.
As an example of the second phenomenon, one can consider the sequence fn (x) := f (nx),
where f : R → R is 1-periodic and equal to 1 on [0, 1/2) and to −1 on [1/2, 1), and the measures
µ := L|[0,1] and ν := (δ−1 + δ1)/2. It is immediate to check that (fn)#µ = ν for every n ∈ N, and yet (fn) weakly converges to the null function f ≡ 0, which satisfies f#µ = δ0 ≠ ν.
A way to overcome these difficulties is due to Kantorovich, who proposed the following way to
relax the problem:
Problem 1.2 (Kantorovich’s formulation of optimal transportation) We minimize
γ 7→ ∫_{X×Y} c(x, y) dγ(x, y)

in the set Adm(µ, ν) of all transport plans γ ∈ P(X × Y) from µ to ν, i.e. the set of Borel probability measures on X × Y such that

γ(A × Y) = µ(A)   ∀A ∈ B(X),      γ(X × B) = ν(B)   ∀B ∈ B(Y).

Equivalently: π^X_#γ = µ, π^Y_#γ = ν, where π^X, π^Y are the natural projections from X × Y onto X and Y respectively.
Transport plans can be thought of as "multivalued" transport maps: γ = ∫ γ_x dµ(x), with γ_x ∈ P({x} × Y). Another way to look at transport plans is to observe that for γ ∈ Adm(µ, ν), the value
of γ(A × B) is the amount of mass initially in A which is sent into the set B.
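When µ and ν are finitely supported, Problem 1.2 becomes a finite linear program: a plan is a nonnegative matrix with prescribed row and column sums (its marginals). The following Python sketch, with hypothetical data and the quadratic cost, is one way to compute a minimizer; it only illustrates the formulation and is not a method used in the text.

import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0])           # support of mu (hypothetical)
y = np.array([0.5, 1.5])                # support of nu (hypothetical)
mu = np.array([0.3, 0.3, 0.4])
nu = np.array([0.6, 0.4])

C = 0.5 * (x[:, None] - y[None, :]) ** 2   # quadratic cost c(x,y) = |x-y|^2/2
n, m = C.shape

# Marginal constraints: row sums equal mu, column sums equal nu.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m):
    A_eq[n + j, j::m] = 1.0
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
gamma = res.x.reshape(n, m)             # an optimal plan
print(res.fun, gamma)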
There are several advantages in the Kantorovich formulation of the transport problem:
• Adm(µ, ν) is always not empty (it contains µ × ν),
• the set Adm(µ, ν) is convex and compact w.r.t. the narrow topology in P(X × Y) (see below for the definition of narrow topology and Theorem 1.5), and γ 7→ ∫ c dγ is linear,
• minima always exist under mild assumptions on c (Theorem 1.5),
• transport plans “include” transport maps, since T# µ = ν implies that γ := (Id × T )# µ
belongs to Adm(µ, ν).
In order to prove existence of minimizers of Kantorovich’s problem we recall some basic notions
concerning analysis over a Polish space. We say that a sequence (µn ) ⊂ P(X) narrowly converges
to µ provided
∫ ϕ dµn → ∫ ϕ dµ,   ∀ϕ ∈ Cb(X),
Cb (X) being the space of continuous and bounded functions on X. It can be shown that the topology
of narrow convergence is metrizable. A set K ⊂ P(X) is called tight provided for every ε > 0 there
exists a compact set Kε ⊂ X such that
µ(X \ Kε ) ≤ ε,
∀µ ∈ K.
The following important result holds.
Theorem 1.3 (Prokhorov) Let (X, d) be a Polish space. Then a family K ⊂ P(X) is relatively
compact w.r.t. the narrow topology if and only if it is tight.
Notice that if K contains only one measure, one recovers Ulam’s theorem: any Borel probability
measure on a Polish space is concentrated on a σ-compact set.
Remark 1.4 The inequality

γ(X × Y \ K1 × K2) ≤ µ(X \ K1) + ν(Y \ K2),    (1.1)

valid for any γ ∈ Adm(µ, ν), shows that if K1 ⊂ P(X) and K2 ⊂ P(Y) are tight, then so is the set

{ γ ∈ P(X × Y) : π^X_#γ ∈ K1, π^Y_#γ ∈ K2 }.
Existence of minimizers for Kantorovich’s formulation of the transport problem now comes from a
standard lower-semicontinuity and compactness argument:
Theorem 1.5 Assume that c is lower semicontinuous and bounded from below. Then there exists a
minimizer for Problem 1.2.
Proof
Compactness Remark 1.4 and Ulam’s theorem show that the set Adm(µ, ν) is tight in P(X × Y ),
and hence relatively compact by Prokhorov theorem.
To get the narrow compactness, pick a sequence (γ n ) ⊂ Adm(µ, ν) and assume that γ n → γ
narrowly: we want to prove that γ ∈ Adm(µ, ν) as well. Let ϕ be any function in Cb (X) and notice
that (x, y) 7→ ϕ(x) is continuous and bounded in X × Y, hence we have

∫ ϕ dπ^X_#γ = ∫ ϕ(x) dγ(x, y) = lim_{n→∞} ∫ ϕ(x) dγ_n(x, y) = lim_{n→∞} ∫ ϕ dπ^X_#γ_n = ∫ ϕ dµ,

so that by the arbitrariness of ϕ ∈ Cb(X) we get π^X_#γ = µ. Similarly we can prove π^Y_#γ = ν, which gives γ ∈ Adm(µ, ν) as desired.
Lower semicontinuity. We claim that the functional γ 7→ ∫ c dγ is l.s.c. with respect to narrow convergence. This is true because our assumptions on c guarantee that there exists an increasing sequence of functions cn : X × Y → R, continuous and bounded, such that c(x, y) = sup_n cn(x, y), so that by monotone convergence it holds

∫ c dγ = sup_n ∫ cn dγ.

Since by construction γ 7→ ∫ cn dγ is narrowly continuous, the proof is complete.
We will denote by Opt (µ, ν) the set of optimal plans from µ to ν for the Kantorovich formulation
of the transport problem, i.e. the set of minimizers of Problem 1.2. More generally, we will say that
a plan is optimal, if it is optimal between its own marginals. Observe that with the notation Opt (µ, ν)
we are losing the reference to the cost function c, which of course affects the set itself, but the context
will always clarify the cost we are referring to.
Once existence of optimal plans is proved, a number of natural questions arise:
• are optimal plans unique?
• is there a simple way to check whether a given plan is optimal or not?
• do optimal plans have any natural regularity property? In particular, are they induced by maps?
• how far is the minimum of Problem 1.2 from the infimum of Problem 1.1?
This latter question is important to understand whether we can really consider Problem 1.2 the relaxation of Problem 1.1 or not. It is possible to prove that if c is continuous and µ is non atomic,
then
inf (Monge) = min (Kantorovich),    (1.2)
so that transporting with plans can’t be strictly cheaper than transporting with maps. We won’t detail
the proof of this fact.
1.2 Necessary and sufficient optimality conditions
To understand the structure of optimal plans, probably the best thing to do is to start with an example.
Let X = Y = Rd and c(x, y) := |x − y|2 /2. Also, assume that µ, ν ∈ P(Rd ) are supported on
finite sets. Then it is immediate to verify that a plan γ ∈ Adm(µ, ν) is optimal if and only if it holds
Σ_{i=1}^N |xi − yi|²/2 ≤ Σ_{i=1}^N |xi − yσ(i)|²/2,

for any N ∈ N, (xi, yi) ∈ supp(γ) and σ permutation of the set {1, . . . , N}. Expanding the squares we get

Σ_{i=1}^N ⟨xi, yi⟩ ≥ Σ_{i=1}^N ⟨xi, yσ(i)⟩,

which by definition means that the support of γ is cyclically monotone. Let us recall the following theorem:
Theorem 1.6 (Rockafellar) A set Γ ⊂ Rd × Rd is cyclically monotone if and only if there exists a
convex and lower semicontinuous function ϕ : Rd → R ∪ {+∞} such that Γ is included in the graph
of the subdifferential of ϕ.
We skip the proof of this theorem, because later on we will prove a much more general version.
What we want to point out here is that under the above assumptions on µ and ν we have that the
following three things are equivalent:
• γ ∈ Adm(µ, ν) is optimal,
• supp(γ) is cyclically monotone,
• there exists a convex and lower semicontinuous function ϕ such that γ is concentrated on the
graph of the subdifferential of ϕ.
The good news is that the equivalence between these three statements holds in a much more
general context (more general underlying spaces, cost functions, measures). The key concepts needed in the analysis are the generalizations of the notions of cyclical monotonicity, convexity and subdifferential which fit a general cost function c.
The definitions below make sense for a general Borel and real valued cost.
Definition 1.7 (c-cyclical monotonicity) We say that Γ ⊂ X × Y is c-cyclically monotone if
(xi , yi ) ∈ Γ, 1 ≤ i ≤ N , implies
Σ_{i=1}^N c(xi, yi) ≤ Σ_{i=1}^N c(xi, yσ(i))

for all permutations σ of {1, . . . , N}.
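For a finite collection of pairs, Definition 1.7 can be checked by brute force over permutations, as in the following Python sketch (an illustration with hypothetical data; note that a permutation fixing some indices reduces the inequality to the corresponding sub-collection, so sub-collections are covered automatically).

from itertools import permutations

def is_c_cyclically_monotone(pairs, c, tol=1e-12):
    """pairs: finite list of (x_i, y_i); c: cost function.
    Checks the defining inequality of Definition 1.7 for this collection."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    base = sum(c(xs[i], ys[i]) for i in range(n))
    for sigma in permutations(range(n)):
        if base > sum(c(xs[i], ys[sigma[i]]) for i in range(n)) + tol:
            return False
    return True

c = lambda x, y: 0.5 * (x - y) ** 2
# Pairs lying on the graph of a nondecreasing map: c-cyclically monotone.
print(is_c_cyclically_monotone([(0.0, 0.1), (1.0, 1.2), (2.0, 1.9)], c))  # True
# "Crossing" pairs violate the condition for the quadratic cost.
print(is_c_cyclically_monotone([(0.0, 2.0), (1.0, 0.0)], c))              # False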
Definition 1.8 (c-transforms) Let ψ : Y → R ∪ {±∞} be any function. Its c+-transform ψ^{c+} : X → R ∪ {−∞} is defined as

ψ^{c+}(x) := inf_{y∈Y} ( c(x, y) − ψ(y) ).

Similarly, given ϕ : X → R ∪ {±∞}, its c+-transform is the function ϕ^{c+} : Y → R ∪ {−∞} defined by

ϕ^{c+}(y) := inf_{x∈X} ( c(x, y) − ϕ(x) ).

The c−-transform ψ^{c−} : X → R ∪ {+∞} of a function ψ on Y is given by

ψ^{c−}(x) := sup_{y∈Y} ( −c(x, y) − ψ(y) ),

and analogously for c−-transforms of functions ϕ on X.

Definition 1.9 (c-concavity and c-convexity) We say that ϕ : X → R ∪ {−∞} is c-concave if there exists ψ : Y → R ∪ {−∞} such that ϕ = ψ^{c+}. Similarly, ψ : Y → R ∪ {−∞} is c-concave if there exists ϕ : X → R ∪ {−∞} such that ψ = ϕ^{c+}.
Symmetrically, ϕ : X → R ∪ {+∞} is c-convex if there exists ψ : Y → R ∪ {+∞} such that ϕ = ψ^{c−}, and ψ : Y → R ∪ {+∞} is c-convex if there exists ϕ : X → R ∪ {+∞} such that ψ = ϕ^{c−}.
Observe that ϕ : X → R ∪ {−∞} is c-concave if and only if ϕ^{c+ c+} = ϕ. This is a consequence of the fact that for any function ψ : Y → R ∪ {±∞} it holds ψ^{c+} = ψ^{c+ c+ c+}; indeed

ψ^{c+ c+ c+}(x) = inf_{ỹ∈Y} sup_{x̃∈X} inf_{y∈Y} ( c(x, ỹ) − c(x̃, ỹ) + c(x̃, y) − ψ(y) ),

and choosing x̃ = x we get ψ^{c+ c+ c+} ≥ ψ^{c+}, while choosing y = ỹ we get ψ^{c+ c+ c+} ≤ ψ^{c+}. Similarly for functions on Y and for the c-convexity.
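On finite grids the c+-transform is a minimum over finitely many points, and the identity ψ^{c+} = ψ^{c+ c+ c+} proved above can be observed directly. The following Python sketch (hypothetical grids, quadratic cost; not from the text) does exactly that.

import numpy as np

X = np.linspace(-1.0, 1.0, 50)              # discretized X (hypothetical grid)
Y = np.linspace(-1.0, 1.0, 60)              # discretized Y
c = 0.5 * (X[:, None] - Y[None, :]) ** 2    # c(x, y) = |x - y|^2 / 2

def c_plus_on_X(psi):
    """psi: array over Y -> its c+-transform, an array over X."""
    return np.min(c - psi[None, :], axis=1)

def c_plus_on_Y(phi):
    """phi: array over X -> its c+-transform, an array over Y."""
    return np.min(c - phi[:, None], axis=0)

psi = np.sin(3 * Y)                          # an arbitrary function on Y
once = c_plus_on_X(psi)                      # psi^{c+}
thrice = c_plus_on_X(c_plus_on_Y(once))      # psi^{c+ c+ c+}
assert np.allclose(once, thrice)             # the triple transform equals the single one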
Definition 1.10 (c-superdifferential and c-subdifferential) Let ϕ : X → R ∪ {−∞} be a c-concave function. The c-superdifferential ∂^{c+}ϕ ⊂ X × Y is defined as

∂^{c+}ϕ := { (x, y) ∈ X × Y : ϕ(x) + ϕ^{c+}(y) = c(x, y) }.

The c-superdifferential ∂^{c+}ϕ(x) at x ∈ X is the set of y ∈ Y such that (x, y) ∈ ∂^{c+}ϕ. A symmetric definition is given for c-concave functions ψ : Y → R ∪ {−∞}.
The definition of the c-subdifferential ∂^{c−}ϕ of a c-convex function ϕ : X → R ∪ {+∞} is analogous:

∂^{c−}ϕ := { (x, y) ∈ X × Y : ϕ(x) + ϕ^{c−}(y) = −c(x, y) }.

Analogous definitions hold for c-concave and c-convex functions on Y.

Remark 1.11 (The base case: c(x, y) = −⟨x, y⟩) Let X = Y = Rd and c(x, y) = −⟨x, y⟩. Then a direct application of the definitions shows that:
• a set is c-cyclically monotone if and only if it is cyclically monotone
• a function is c-convex (resp. c-concave) if and only if it is convex and lower semicontinuous
(resp. concave and upper semicontinuous),
• the c-subdifferential of the c-convex (resp. c-superdifferential of the c-concave) function is the
classical subdifferential (resp. superdifferential),
• the c−-transform is the Legendre transform.
Thus in this situation these new definitions become the classical basic definitions of convex analysis.
Remark 1.12 (For most applications c-concavity is sufficient) There are several trivial relations
between c-convexity, c-concavity and related notions. For instance, ϕ is c-concave if and only if
−ϕ is c-convex, −ϕ^{c+} = (−ϕ)^{c−} and ∂^{c+}ϕ = ∂^{c−}(−ϕ). Therefore, roughly speaking, every statement concerning c-concave functions can be restated as a statement about c-convex ones. Thus, choosing to
work with c-concave or c-convex functions is actually a matter of taste.
Our choice is to work with c-concave functions. Thus all the statements from now on will deal
only with these functions. There is only one, important, part of the theory where the distinction
between c-concavity and c-convexity is useful: in the study of geodesics in the Wasserstein space
(see Section 2.2, and in particular Theorem 2.18 and its consequence Corollary 2.24).
We also point out that the notation used here is different from the one in [80], where a less
symmetric notion (but better fitting the study of geodesics) of c-concavity and c-convexity has been
preferred.
An equivalent characterization of the c-superdifferential is the following: y ∈ ∂^{c+}ϕ(x) if and only if it holds

ϕ(x) = c(x, y) − ϕ^{c+}(y),
ϕ(z) ≤ c(z, y) − ϕ^{c+}(y),   ∀z ∈ X,

or equivalently if

ϕ(x) − c(x, y) ≥ ϕ(z) − c(z, y),   ∀z ∈ X.    (1.3)
A direct consequence of the definition is that the c-superdifferential of a c-concave function is
always a c-cyclically monotone set, indeed if (xi , yi ) ∈ ∂ c+ ϕ it holds
Σ_i c(xi, yi) = Σ_i ( ϕ(xi) + ϕ^{c+}(yi) ) = Σ_i ( ϕ(xi) + ϕ^{c+}(yσ(i)) ) ≤ Σ_i c(xi, yσ(i)),

for any permutation σ of the indexes.
What is important to know is that actually under mild assumptions on c, every c-cyclically monotone set can be obtained as the c-superdifferential of a c-concave function. This result is part of the
following important theorem:
Theorem 1.13 (Fundamental theorem of optimal transport) Assume that c : X × Y → R is
continuous and bounded from below and let µ ∈ P(X), ν ∈ P(Y ) be such that
c(x, y) ≤ a(x) + b(y),    (1.4)
for some a ∈ L1 (µ), b ∈ L1 (ν). Also, let γ ∈ Adm(µ, ν). Then the following three are equivalent:
i) the plan γ is optimal,
ii) the set supp(γ) is c-cyclically monotone,
iii) there exists a c-concave function ϕ such that max{ϕ, 0} ∈ L1 (µ) and supp(γ) ⊂ ∂ c+ ϕ.
Proof Observe that the inequality (1.4) together with

∫ c(x, y) dγ̃(x, y) ≤ ∫ ( a(x) + b(y) ) dγ̃(x, y) = ∫ a(x) dµ(x) + ∫ b(y) dν(y) < ∞,   ∀γ̃ ∈ Adm(µ, ν),

implies that for any admissible plan γ̃ ∈ Adm(µ, ν) the function max{c, 0} is integrable. This, together with the bound from below on c, gives that c ∈ L1(γ̃) for any admissible plan γ̃.
(i) ⇒ (ii) We argue by contradiction: assume that the support of γ is not c-cyclically monotone.
Thus we can find N ∈ N, {(xi , yi )}1≤i≤N ⊂ supp(γ) and some permutation σ of {1, . . . , N } such
that
Σ_{i=1}^N c(xi, yi) > Σ_{i=1}^N c(xi, yσ(i)).

By continuity we can find neighborhoods Ui ∋ xi, Vi ∋ yi with

Σ_{i=1}^N ( c(ui, vσ(i)) − c(ui, vi) ) < 0,   ∀(ui, vi) ∈ Ui × Vi, 1 ≤ i ≤ N.

Our goal is to build a "variation" γ̃ = γ + η of γ in such a way that minimality of γ is violated. To this aim, we need a signed measure η with:
(A) η⁻ ≤ γ (so that γ̃ is nonnegative);
(B) null first and second marginal (so that γ̃ ∈ Adm(µ, ν));
(C) ∫ c dη < 0 (so that γ is not optimal).
Let Ω := Π_{i=1}^N Ui × Vi and P ∈ P(Ω) be defined as the product of the measures (1/mi) γ|_{Ui×Vi}, where mi := γ(Ui × Vi). Denote by π^{Ui}, π^{Vi} the natural projections of Ω to Ui and Vi respectively and define

η := (min_i mi / N) Σ_{i=1}^N [ (π^{Ui}, π^{Vσ(i)})_# P − (π^{Ui}, π^{Vi})_# P ].

It is immediate to verify that η fulfills (A), (B), (C) above, so that the thesis is proven.
(ii) ⇒ (iii) We need to prove that if Γ ⊂ X × Y is a c-cyclically monotone set, then there exists a c-concave function ϕ such that ∂^{c+}ϕ ⊃ Γ and max{ϕ, 0} ∈ L1(µ). Fix (x̄, ȳ) ∈ Γ and observe that, since we want ϕ to be c-concave with a c-superdifferential containing Γ, for any choice of (xi, yi) ∈ Γ, i = 1, . . . , N, we need to have

ϕ(x) ≤ c(x, y1) − ϕ^{c+}(y1) = c(x, y1) − c(x1, y1) + ϕ(x1)
≤ c(x, y1) − c(x1, y1) + c(x1, y2) − ϕ^{c+}(y2)
= c(x, y1) − c(x1, y1) + c(x1, y2) − c(x2, y2) + ϕ(x2)
≤ · · ·
≤ c(x, y1) − c(x1, y1) + c(x1, y2) − c(x2, y2) + · · · + c(xN, ȳ) − c(x̄, ȳ) + ϕ(x̄).

It is therefore natural to define ϕ as the infimum of the above expression as {(xi, yi)}_{i=1,...,N} vary among all N-tuples in Γ and N varies in N. Also, since we are free to add a constant to ϕ, we can neglect the addendum ϕ(x̄) and define

ϕ(x) := inf { c(x, y1) − c(x1, y1) + c(x1, y2) − c(x2, y2) + · · · + c(xN, ȳ) − c(x̄, ȳ) },

the infimum being taken over N ≥ 1 integer and (xi, yi) ∈ Γ, i = 1, . . . , N. Choosing N = 1 and (x1, y1) = (x̄, ȳ) we get ϕ(x̄) ≤ 0. Conversely, from the c-cyclical monotonicity of Γ we have ϕ(x̄) ≥ 0. Thus ϕ(x̄) = 0.
Also, it is clear from the definition that ϕ is c-concave. Choosing again N = 1 and (x1, y1) = (x̄, ȳ), and using (1.4), we get

ϕ(x) ≤ c(x, ȳ) − c(x̄, ȳ) ≤ a(x) + b(ȳ) − c(x̄, ȳ),

which, together with the fact that a ∈ L1(µ), yields max{ϕ, 0} ∈ L1(µ). Thus, we need only to prove that ∂^{c+}ϕ contains Γ. To this aim, choose (x̃, ỹ) ∈ Γ, let (x1, y1) = (x̃, ỹ) and observe that by definition of ϕ(x) we have

ϕ(x) ≤ c(x, ỹ) − c(x̃, ỹ) + inf ( c(x̃, y2) − c(x2, y2) + · · · + c(xN, ȳ) − c(x̄, ȳ) ) = c(x, ỹ) − c(x̃, ỹ) + ϕ(x̃).

By the characterization (1.3), this inequality shows that (x̃, ỹ) ∈ ∂^{c+}ϕ, as desired.
(iii) ⇒ (i). Let γ̃ ∈ Adm(µ, ν) be any transport plan. We need to prove that ∫ c dγ ≤ ∫ c dγ̃. Recall that we have

ϕ(x) + ϕ^{c+}(y) = c(x, y),   ∀(x, y) ∈ supp(γ),
ϕ(x) + ϕ^{c+}(y) ≤ c(x, y),   ∀x ∈ X, y ∈ Y,

and therefore

∫ c(x, y) dγ(x, y) = ∫ ( ϕ(x) + ϕ^{c+}(y) ) dγ(x, y) = ∫ ϕ(x) dµ(x) + ∫ ϕ^{c+}(y) dν(y)
= ∫ ( ϕ(x) + ϕ^{c+}(y) ) dγ̃(x, y) ≤ ∫ c(x, y) dγ̃(x, y).
Remark 1.14 Condition (1.4) is natural in some, but not all, problems. For instance problems with
constraints or in Wiener spaces (infinite-dimensional Gaussian spaces) include +∞-valued costs,
with a “large” set of points where the cost is not finite. We won’t discuss these topics.
An important consequence of the previous theorem is that being optimal is a property that depends only on the support of the plan γ, and not on how the mass is distributed in the support itself: if γ is an optimal plan (between its own marginals) and γ̃ is such that supp(γ̃) ⊂ supp(γ), then γ̃ is optimal as well (between its own marginals, of course). We will see in Proposition 2.5 that one of the important consequences of this fact is the stability of optimality.
Analogous arguments work for maps. Indeed assume that T : X → Y is a map such that
T (x) ∈ ∂ c+ ϕ(x) for some c-concave function ϕ for all x. Then, for every µ ∈ P(X) such that
condition (1.4) is satisfied for ν = T# µ, the map T is optimal between µ and T# µ. Therefore it
makes sense to say that T is an optimal map, without explicit mention of the reference measures.
Remark 1.15 From Theorem 1.13 we know that given µ ∈ P(X), ν ∈ P(Y) satisfying the assumption of the theorem, for every optimal plan γ there exists a c-concave function ϕ such that supp(γ) ⊂ ∂^{c+}ϕ. Actually, a stronger statement holds, namely: if supp(γ) ⊂ ∂^{c+}ϕ for some optimal γ, then supp(γ′) ⊂ ∂^{c+}ϕ for every optimal plan γ′. Indeed, arguing as in the proof of Theorem 1.13 one can see that max{ϕ, 0} ∈ L1(µ) implies max{ϕ^{c+}, 0} ∈ L1(ν), and thus it holds

∫ ϕ dµ + ∫ ϕ^{c+} dν = ∫ ( ϕ(x) + ϕ^{c+}(y) ) dγ′(x, y) ≤ ∫ c(x, y) dγ′(x, y) = ∫ c(x, y) dγ(x, y)
= ∫ ( ϕ(x) + ϕ^{c+}(y) ) dγ(x, y) = ∫ ϕ dµ + ∫ ϕ^{c+} dν,

the third equality following from supp(γ) ⊂ ∂^{c+}ϕ. Thus the inequality must be an equality, which is true if and only if for γ′-a.e. (x, y) it holds (x, y) ∈ ∂^{c+}ϕ; hence, by the continuity of c, we conclude supp(γ′) ⊂ ∂^{c+}ϕ.
1.3 The dual problem

The transport problem in the Kantorovich formulation is the problem of minimizing the linear functional γ 7→ ∫ c dγ with the affine constraints π^X_#γ = µ, π^Y_#γ = ν and γ ≥ 0. It is well known that problems of this kind admit a natural dual problem, where we maximize a linear functional with affine constraints. In our case the dual problem is:

Problem 1.16 (Dual problem) Let µ ∈ P(X), ν ∈ P(Y). Maximize the value of

∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y),

among all functions ϕ ∈ L1(µ), ψ ∈ L1(ν) such that

ϕ(x) + ψ(y) ≤ c(x, y),   ∀x ∈ X, y ∈ Y.    (1.5)

The relation between the transport problem and the dual one consists in the fact that

inf_{γ∈Adm(µ,ν)} ∫ c(x, y) dγ(x, y) = sup_{ϕ,ψ} ( ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y) ),

where the supremum is taken among all ϕ, ψ as in the definition of the problem.
Although the fact that equality holds is an easy consequence of Theorem 1.13 of the previous section (taking ψ = ϕ^{c+}, as we will see), we prefer to start with a heuristic argument which shows "why" duality works. The calculations we are going to do are very common in linear programming and are based on the min-max principle. Observe how the constraint γ ∈ Adm(µ, ν) "becomes" the functional to maximize in the dual problem, and the functional to minimize, γ 7→ ∫ c dγ, "becomes" the constraint in the dual problem.
Start observing that

inf_{γ∈Adm(µ,ν)} ∫ c(x, y) dγ(x, y) = inf_{γ∈M+(X×Y)} ( ∫ c(x, y) dγ + χ(γ) ),    (1.6)

where χ(γ) is equal to 0 if γ ∈ Adm(µ, ν) and +∞ if γ ∉ Adm(µ, ν), and M+(X × Y) is the set of non negative Borel measures on X × Y. We claim that the function χ may be written as

χ(γ) = sup_{ϕ,ψ} { ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y) − ∫ ( ϕ(x) + ψ(y) ) dγ(x, y) },

where the supremum is taken among all (ϕ, ψ) ∈ Cb(X) × Cb(Y). Indeed, if γ ∈ Adm(µ, ν) then χ(γ) = 0, while if γ ∉ Adm(µ, ν) we can find (ϕ, ψ) ∈ Cb(X) × Cb(Y) such that the value between the brackets is different from 0; thus, multiplying (ϕ, ψ) by appropriate real numbers, we see that the supremum is +∞. Thus from (1.6) we have

inf_{γ∈Adm(µ,ν)} ∫ c(x, y) dγ(x, y)
= inf_{γ∈M+(X×Y)} sup_{ϕ,ψ} { ∫ c(x, y) dγ(x, y) + ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y) − ∫ ( ϕ(x) + ψ(y) ) dγ(x, y) }.

Call F(γ, ϕ, ψ) the expression between brackets. Since γ 7→ F(γ, ϕ, ψ) is convex (actually linear) and (ϕ, ψ) 7→ F(γ, ϕ, ψ) is concave (actually linear), the min-max principle holds and we have

inf_{γ∈M+(X×Y)} sup_{ϕ,ψ} F(γ, ϕ, ψ) = sup_{ϕ,ψ} inf_{γ∈M+(X×Y)} F(γ, ϕ, ψ).

Thus we have

inf_{γ∈Adm(µ,ν)} ∫ c(x, y) dγ(x, y)
= sup_{ϕ,ψ} inf_{γ∈M+(X×Y)} { ∫ c(x, y) dγ(x, y) + ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y) − ∫ ( ϕ(x) + ψ(y) ) dγ(x, y) }
= sup_{ϕ,ψ} { ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y) + inf_{γ∈M+(X×Y)} ∫ ( c(x, y) − ϕ(x) − ψ(y) ) dγ(x, y) }.

Now observe the quantity

inf_{γ∈M+(X×Y)} ∫ ( c(x, y) − ϕ(x) − ψ(y) ) dγ(x, y).

If ϕ(x) + ψ(y) ≤ c(x, y) for any (x, y), then the integrand is non-negative and the infimum is 0 (achieved when γ is the null measure). Conversely, if ϕ(x) + ψ(y) > c(x, y) for some (x, y) ∈ X × Y, then choosing γ := nδ_{(x,y)} with n large we get that the infimum is −∞.
Thus, we proved that

inf_{γ∈Adm(µ,ν)} ∫ c(x, y) dγ(x, y) = sup_{ϕ,ψ} ( ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y) ),

where the supremum is taken among continuous and bounded functions (ϕ, ψ) satisfying (1.5).
We now give the rigorous statement and a proof independent of the min-max principle.
Theorem 1.17 (Duality) Let µ ∈ P(X), ν ∈ P(Y) and let c : X × Y → R be a continuous cost function, bounded from below. Assume that (1.4) holds. Then the minimum of the Kantorovich
problem 1.2 is equal to the supremum of the dual problem 1.16.
Furthermore, the supremum of the dual problem is attained, and the maximizing couple (ϕ, ψ) is of
the form (ϕ, ϕc+ ) for some c-concave function ϕ.
Proof Let γ ∈ Adm(µ, ν) and observe that for any couple of functions ϕ ∈ L1(µ) and ψ ∈ L1(ν) satisfying (1.5) it holds

∫ c(x, y) dγ(x, y) ≥ ∫ ( ϕ(x) + ψ(y) ) dγ(x, y) = ∫ ϕ(x) dµ(x) + ∫ ψ(y) dν(y).

This shows that the minimum of the Kantorovich problem is greater than or equal to the supremum of the dual problem.
To prove the converse inequality pick γ ∈ Opt(µ, ν) and use Theorem 1.13 to find a c-concave function ϕ such that supp(γ) ⊂ ∂^{c+}ϕ, max{ϕ, 0} ∈ L1(µ) and max{ϕ^{c+}, 0} ∈ L1(ν). Then, as in the proof of (iii) ⇒ (i) of Theorem 1.13, we have

∫ c(x, y) dγ(x, y) = ∫ ( ϕ(x) + ϕ^{c+}(y) ) dγ(x, y) = ∫ ϕ(x) dµ(x) + ∫ ϕ^{c+}(y) dν(y),

and ∫ c dγ ∈ R. Thus ϕ ∈ L1(µ) and ϕ^{c+} ∈ L1(ν), which shows that (ϕ, ϕ^{c+}) is an admissible couple in the dual problem and gives the thesis.
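In the finitely supported case the dual problem 1.16 is itself a finite linear program, and the absence of a duality gap asserted by Theorem 1.17 can be observed numerically. The Python sketch below (hypothetical data, quadratic cost; an illustration only) solves both the dual and the primal linear programs and compares their values.

import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0]); mu = np.array([0.3, 0.3, 0.4])   # hypothetical data
y = np.array([0.5, 1.5]);      nu = np.array([0.6, 0.4])
C = 0.5 * (x[:, None] - y[None, :]) ** 2
n, m = C.shape

# Dual: variables (phi_1..phi_n, psi_1..psi_m); maximize <phi,mu> + <psi,nu>
# under phi_i + psi_j <= c_ij.  linprog minimizes, so flip the sign.
obj = -np.concatenate([mu, nu])
A_ub = np.zeros((n * m, n + m))
for i in range(n):
    for j in range(m):
        A_ub[i * m + j, i] = 1.0
        A_ub[i * m + j, n + j] = 1.0
dual = linprog(obj, A_ub=A_ub, b_ub=C.ravel(), bounds=(None, None), method="highs")

# Primal (same linear program as in the earlier sketch), for comparison.
A_eq = np.zeros((n + m, n * m))
for i in range(n): A_eq[i, i * m:(i + 1) * m] = 1.0
for j in range(m): A_eq[n + j, j::m] = 1.0
primal = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                 bounds=(0, None), method="highs")
assert np.isclose(-dual.fun, primal.fun)   # no duality gap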
Remark 1.18 Notice that a statement stronger than the one of Remark 1.15 holds, namely: under the assumptions of Theorems 1.13 and 1.17, for any c-concave couple of functions (ϕ, ϕ^{c+}) maximizing the dual problem and any optimal plan γ it holds

supp(γ) ⊂ ∂^{c+}ϕ.

Indeed we already know that for some c-concave ϕ we have ϕ ∈ L1(µ), ϕ^{c+} ∈ L1(ν) and

supp(γ) ⊂ ∂^{c+}ϕ,

for any optimal γ. Now pick another maximizing couple (ϕ̃, ψ̃) for the dual problem 1.16 and notice that ϕ̃(x) + ψ̃(y) ≤ c(x, y) for any x, y implies ψ̃ ≤ ϕ̃^{c+}, and therefore (ϕ̃, ϕ̃^{c+}) is a maximizing couple as well. The fact that ϕ̃^{c+} ∈ L1(ν) follows as in the proof of Theorem 1.17. Conclude noticing that for any optimal plan γ it holds

∫ ϕ̃ dµ + ∫ ϕ̃^{c+} dν = ∫ ϕ dµ + ∫ ϕ^{c+} dν = ∫ ( ϕ(x) + ϕ^{c+}(y) ) dγ(x, y)
= ∫ c(x, y) dγ ≥ ∫ ϕ̃ dµ + ∫ ϕ̃^{c+} dν,

so that the inequality must be an equality.
Definition 1.19 (Kantorovich potential) A c-concave function ϕ such that (ϕ, ϕc+ ) is a maximizing
pair for the dual problem 1.16 is called a c-concave Kantorovich potential, or simply Kantorovich
potential, for the couple µ, ν. A c-convex function ϕ is called c-convex Kantorovich potential if −ϕ
is a c-concave Kantorovich potential.
Observe that c-concave Kantorovich potentials are related to the transport problem in the following two different (but clearly related) ways:
• as c-concave functions whose superdifferential contains the support of optimal plans, according to Theorem 1.13,
• as maximizing functions, together with their c+-transforms, in the dual problem.

1.4 Existence of optimal maps
The problem of existence of optimal transport maps consists in looking for optimal plans γ which are induced by a map T : X → Y, i.e. plans γ which are equal to (Id, T)#µ, for µ := π^X_#γ and some measurable map T. As we discussed in the first section, in general this problem has no answer, as it may very well be the case that, for given µ ∈ P(X), ν ∈ P(Y), there is no transport map at all from µ to ν. Still, since we know that (1.2) holds when µ has no atom, it is possible that under some additional assumptions on the starting measure µ and on the cost function c, optimal transport maps exist.
To formulate the question differently: given µ, ν and the cost function c, is it true that at least one optimal plan γ is induced by a map?
Let us start observing that thanks to Theorem 1.13, the answer to this question relies in a natural
way on the analysis of the properties of c-monotone sets, to see how far they are from being graphs.
Indeed:
Lemma 1.20 Let γ ∈ Adm(µ, ν). Then γ is induced by a map if and only if there exists a γ-measurable set Γ ⊂ X × Y where γ is concentrated, such that for µ-a.e. x there exists only one
y = T (x) ∈ Y such that (x, y) ∈ Γ. In this case γ is induced by the map T .
Proof The if part is obvious. For the only if, let Γ be as in the statement of the lemma. Possibly
removing from Γ a product N × Y , with N µ-negligible, we can assume that Γ is a graph, and denote
by T the corresponding map. By the inner regularity of measures, it is easily seen that we can also
assume Γ = ∪n Γn to be σ-compact. Under this assumption the domain of T (i.e. the projection of Γ
on X) is σ-compact, hence Borel, and the restriction of T to the compact set πX (Γn ) is continuous.
It follows that T is a Borel map. Since y = T(x) for γ-a.e. (x, y) ∈ X × Y we conclude that

∫ φ(x, y) dγ(x, y) = ∫ φ(x, T(x)) dγ(x, y) = ∫ φ(x, T(x)) dµ(x),

so that γ = (Id × T)#µ.
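In the finitely supported case, Lemma 1.20 has a very concrete reading: a plan given by a coupling matrix is induced by a map exactly when every row has at most one nonzero entry, so that each atom of µ sends its whole mass to a single point. A minimal Python sketch (hypothetical data, not from the text):

import numpy as np

def induced_by_a_map(gamma, tol=1e-12):
    """gamma: (n, m) coupling matrix. Returns (True, T) with T[i] the index of the
    image atom when the plan is induced by a map, and (False, None) otherwise."""
    T = np.full(gamma.shape[0], -1)
    for i, row in enumerate(gamma):
        nz = np.flatnonzero(row > tol)
        if len(nz) > 1:          # the i-th atom splits its mass: not induced by a map
            return False, None
        if len(nz) == 1:
            T[i] = nz[0]
    return True, T

gamma_split = np.array([[0.3, 0.0], [0.0, 0.3], [0.3, 0.1]])
print(induced_by_a_map(gamma_split))   # (False, None): the third atom splits
gamma_map = np.array([[0.3, 0.0], [0.0, 0.3], [0.0, 0.4]])
print(induced_by_a_map(gamma_map))     # (True, array([0, 1, 1]))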
Thus the point is the following. We know by Theorem 1.13 that optimal plans are concentrated
on c-cyclically monotone sets; again from Theorem 1.13 we know that c-cyclically monotone sets are
obtained by taking the c-superdifferential of a c-concave function. Hence from the lemma above
what we need to understand is “how often” the c-superdifferential of a c-concave function is single
valued.
There is no general answer to this question, but many particular cases can be studied. Here we
focus on two special and very important situations:
• X = Y = Rd and c(x, y) = |x − y|2 /2,
• X = Y = M , where M is a Riemannian manifold, and c(x, y) = d2 (x, y)/2, d being the
Riemannian distance.
Let us start with the case X = Y = Rd and c(x, y) = |x − y|2 /2. In this case there is a simple
characterization of c-concavity and c-superdifferential:
Proposition 1.21 Let ϕ : Rd → R ∪ {−∞}. Then ϕ is c-concave if and only if x 7→ ϕ̄(x) := |x|²/2 − ϕ(x) is convex and lower semicontinuous. In this case y ∈ ∂^{c+}ϕ(x) if and only if y ∈ ∂⁻ϕ̄(x).
Proof Observe that

ϕ(x) = inf_y ( |x − y|²/2 − ψ(y) )  ⇔  ϕ(x) = inf_y ( |x|²/2 + ⟨x, −y⟩ + |y|²/2 − ψ(y) )
⇔  ϕ(x) − |x|²/2 = inf_y ( ⟨x, −y⟩ + |y|²/2 − ψ(y) )
⇔  ϕ̄(x) = sup_y ( ⟨x, y⟩ − ( |y|²/2 − ψ(y) ) ),

which proves the first claim. For the second observe that

y ∈ ∂^{c+}ϕ(x)  ⇔  ϕ(x) = |x − y|²/2 − ϕ^{c+}(y)  and  ϕ(z) ≤ |z − y|²/2 − ϕ^{c+}(y) ∀z ∈ Rd
⇔  ϕ(x) − |x|²/2 = ⟨x, −y⟩ + |y|²/2 − ϕ^{c+}(y)  and  ϕ(z) − |z|²/2 ≤ ⟨z, −y⟩ + |y|²/2 − ϕ^{c+}(y) ∀z ∈ Rd
⇔  ϕ(z) − |z|²/2 ≤ ϕ(x) − |x|²/2 + ⟨z − x, −y⟩ ∀z ∈ Rd
⇔  −y ∈ ∂⁺( ϕ − | · |²/2 )(x)
⇔  y ∈ ∂⁻ϕ̄(x).

Therefore in this situation being concentrated on the c-superdifferential of a c-concave function means being concentrated on (the graph of) the subdifferential of a convex function.
Remark 1.22 (Perturbations of the identity via smooth gradients are optimal) An immediate
consequence of the above proposition is the fact that if ψ ∈ Cc∞(Rd), then there exists ε̄ > 0 such that Id + ε∇ψ is an optimal map for any |ε| ≤ ε̄. Indeed, it is sufficient to take ε̄ such that −Id ≤ ε̄∇²ψ ≤ Id. With this choice, the map x 7→ |x|²/2 + εψ(x) is convex for any |ε| ≤ ε̄, and thus its gradient is an optimal map.
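A one-dimensional numerical sketch of Remark 1.22 (in Python, with a hypothetical choice of ψ; a smooth rapidly decaying bump stands in for a Cc∞ function): once ε sup|ψ''| ≤ 1 the function x²/2 + εψ(x) is convex, so x + εψ'(x) is an optimal map.

import numpy as np

x = np.linspace(-5.0, 5.0, 4001)
psi = np.exp(-x ** 2)                         # smooth bump standing in for a compactly supported psi
psi_xx = np.gradient(np.gradient(psi, x), x)  # numerical second derivative of psi

eps = 0.9 / np.max(np.abs(psi_xx))            # keeps |eps * psi''| <= 0.9 < 1 on the grid
convexity = 1.0 + eps * psi_xx                # second derivative of x^2/2 + eps*psi on the grid
print(convexity.min() > 0)                    # True: the perturbed potential is convex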
Proposition 1.21 reduces the problem of understanding when optimal maps exist to the convex-analysis problem of understanding what the set of non differentiability points of a convex function looks like. This latter problem has a known answer; in order to state it, we need the following definition:
Definition 1.23 (c − c hypersurfaces) A set E ⊂ Rd is called a c − c hypersurface¹ if, in a suitable system of coordinates, it is the graph of the difference of two real valued convex functions, i.e. if there exist convex functions f, g : Rd−1 → R such that

E = { (y, t) ∈ Rd : y ∈ Rd−1, t ∈ R, t = f(y) − g(y) }.

¹ Here c − c stands for 'convex minus convex' and has nothing to do with the c we used to indicate the cost function.

The following theorem, which we state without proof, then holds:
Theorem 1.24 (Structure of sets of non differentiability of convex functions) Let A ⊂ Rd . Then
there exists a convex function ϕ : Rd → R such that A is contained in the set of points of non
differentiability of ϕ if and only if A can be covered by countably many c − c hypersurfaces.
We give the following definition:
Definition 1.25 (Regular measures on Rd ) A measure µ ∈ P(Rd ) is called regular provided
µ(E) = 0 for any c − c hypersurface E ⊂ Rd .
Observe that absolutely continuous measures and measures which give 0 mass to Lipschitz hypersurfaces are automatically regular (because convex functions are locally Lipschitz, thus a c − c
hypersurface is a locally Lipschitz hypersurface).
Now we can state the result concerning existence and uniqueness of optimal maps:
Theorem 1.26 (Brenier) Let µ ∈ P(Rd) be such that ∫ |x|² dµ(x) is finite. Then the following are equivalent:
i) for every ν ∈ P(Rd) with ∫ |x|² dν(x) < ∞ there exists only one transport plan from µ to ν and this plan is induced by a map T,
ii) µ is regular.
If either (i) or (ii) hold, the optimal map T can be recovered by taking the gradient of a convex
function.
Proof
(ii) ⇒ (i) and the last statement. Take a(x) = b(x) = |x|² in the statement of Theorem 1.13. Then our assumptions on µ, ν guarantee that the bound (1.4) holds. Thus the conclusions of Theorems 1.13 and 1.17 are true as well. Using Remark 1.18 we know that for any c-concave Kantorovich potential ϕ and any optimal plan γ ∈ Opt(µ, ν) it holds supp(γ) ⊂ ∂^{c+}ϕ. Now from Proposition 1.21 we know that ϕ̄ := | · |²/2 − ϕ is convex and that ∂^{c+}ϕ = ∂⁻ϕ̄. Here we use our assumption on µ: since ϕ̄ is convex, we know that the set E of points of non differentiability of ϕ̄ is µ-negligible. Therefore the map ∇ϕ̄ : Rd → Rd is well defined µ-a.e. and every optimal plan must be concentrated on its graph. Hence the optimal plan is unique and induced by the gradient of the convex function ϕ̄.
(i) ⇒ (ii). We argue by contradiction and assume that there is some convex function ϕ : Rd → R such that the set E of points of non differentiability of ϕ has positive µ measure. Possibly modifying ϕ outside a compact set, we can assume that it has linear growth at infinity. Now define the two maps:

T(x) := the element of smallest norm in ∂⁻ϕ(x),
S(x) := the element of biggest norm in ∂⁻ϕ(x),

and the plan

γ := ½ ( (Id, T)#µ + (Id, S)#µ ).

The fact that ϕ has linear growth implies that ν := π^Y_#γ has compact support. Thus in particular ∫ |x|² dν(x) < ∞. The contradiction comes from the fact that γ ∈ Adm(µ, ν) is c-cyclically monotone (because of Proposition 1.21), and thus optimal. However, it is not induced by a map, because T ≠ S on a set of positive µ measure (Lemma 1.20).
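On the real line, convex functions are exactly those with nondecreasing derivative, and a classical consequence of Theorem 1.26 for the quadratic cost is that the optimal map between a regular µ and a ν with finite second moment is the monotone rearrangement T = F_ν^{-1} ◦ F_µ, F denoting the cumulative distribution functions. A minimal Python sketch approximating it from samples (hypothetical measures, not from the text):

import numpy as np

rng = np.random.default_rng(0)
mu_samples = rng.normal(0.0, 1.0, 10000)       # mu ~ N(0, 1), a regular measure
nu_samples = rng.uniform(2.0, 5.0, 10000)      # nu ~ Unif[2, 5]

def optimal_map_1d(x, mu_s, nu_s):
    """Evaluate T = F_nu^{-1}(F_mu(x)) using empirical CDF and quantiles."""
    F_mu_x = np.searchsorted(np.sort(mu_s), x) / len(mu_s)
    return np.quantile(nu_s, np.clip(F_mu_x, 0.0, 1.0))

xs = np.array([-1.0, 0.0, 1.0])
print(optimal_map_1d(xs, mu_samples, nu_samples))   # nondecreasing values in [2, 5]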
The question of regularity of the optimal map is very delicate. In general it is only of bounded
variation (BV in short), since monotone maps always have this regularity property, and discontinuities can occur: just think of the case in which the support of the starting measure is connected,
while the one of the arrival measure is not. It turns out that connectedness is not sufficient to prevent
discontinuities, and that if we want some regularity, we have to impose a convexity restriction on
supp ν. The following result holds:
Theorem 1.27 (Regularity theorem) Assume Ω1, Ω2 ⊂ Rd are two bounded and connected open sets, µ = ρ L^d|_{Ω1}, ν = η L^d|_{Ω2} with 0 < c ≤ ρ, η ≤ C for some c, C ∈ R. Assume also that Ω2 is convex. Then the optimal transport map T belongs to C^{0,α}(Ω1) for some α < 1. In addition, the following implication holds:

ρ ∈ C^{0,α}(Ω1), η ∈ C^{0,α}(Ω2)   ⟹   T ∈ C^{1,α}(Ω1).

The convexity assumption on Ω2 is needed to show that the convex function ϕ whose gradient provides the optimal map T is a viscosity solution of the Monge-Ampère equation

ρ(x) = η(∇ϕ(x)) det(∇²ϕ(x)),

and then the regularity theory for the Monge-Ampère equation, developed by Caffarelli and Urbas, applies.
As an application of Theorem 1.26 we discuss the question of polar factorization of vector fields
on Rd . Let Ω ⊂ Rd be a bounded domain, denote by µΩ the normalized Lebesgue measure on Ω and
consider the space
S(Ω) := { Borel maps s : Ω → Ω : s#µΩ = µΩ }.

The following result provides a (nonlinear) projection onto the (nonconvex) space S(Ω).

Proposition 1.28 (Polar factorization) Let S ∈ L²(µΩ; Rd) be such that ν := S#µΩ is regular (Definition 1.25). Then there exist unique s ∈ S(Ω) and ∇ϕ, with ϕ convex, such that S = (∇ϕ) ◦ s. Also, s is the unique minimizer of

∫ |S − s̃|² dµΩ,

among all s̃ ∈ S(Ω).
Proof By assumption, we know that both µΩ and ν are regular measures with finite second moment. We claim that

inf_{s̃∈S(Ω)} ∫ |S − s̃|² dµΩ = min_{γ∈Adm(µΩ,ν)} ∫ |x − y|² dγ(x, y).    (1.7)

To see why, associate to each s̃ ∈ S(Ω) the plan γ_{s̃} := (s̃, S)#µΩ, which clearly belongs to Adm(µΩ, ν). This gives the inequality ≥. Now let γ be the unique optimal plan and apply Theorem 1.26 twice to get that

γ = (Id, ∇ϕ)#µΩ = (∇ϕ̃, Id)#ν,

for appropriate convex functions ϕ, ϕ̃, which therefore satisfy ∇ϕ ◦ ∇ϕ̃ = Id ν-a.e.. Define s := ∇ϕ̃ ◦ S. Then s#µΩ = µΩ and thus s ∈ S(Ω). Also, S = ∇ϕ ◦ s, which proves the existence of the polar factorization. The identity

∫ |x − y|² dγ_s(x, y) = ∫ |s − S|² dµΩ = ∫ |∇ϕ̃ ◦ S − S|² dµΩ = ∫ |∇ϕ̃ − Id|² dν = min_{γ∈Adm(µΩ,ν)} ∫ |x − y|² dγ(x, y),

shows the inequality ≤ in (1.7), and the uniqueness of the optimal plan ensures that s is the unique minimizer.
To conclude we need to show uniqueness of the polar factorization. Assume that S = (∇ϕ′) ◦ s′ is another factorization and notice that ∇ϕ′_#µΩ = (∇ϕ′ ◦ s′)#µΩ = ν. Thus the map ∇ϕ′ is a transport map from µΩ to ν and is the gradient of a convex function. By Proposition 1.21 and Theorem 1.13 we deduce that ∇ϕ′ is the optimal map. Hence ∇ϕ′ = ∇ϕ and the proof is achieved.
Remark 1.29 (Polar factorization vs Helmholtz decomposition) The classical Helmholtz decomposition of vector fields can be seen as a linearized version of the polar factorization result, which therefore can be thought of as a generalization of the former.
To see why, assume that Ω and all the objects considered are smooth (the arguments hereafter are just formal). Let u : Ω → Rd be a vector field and apply the polar factorization to the map Sε := Id + εu with |ε| small. Then we have Sε = (∇ϕε) ◦ sε and both ∇ϕε and sε will be perturbations of the identity, so that

∇ϕε = Id + εv + o(ε),
sε = Id + εw + o(ε).

The question now is: which information on v, w is carried by the properties of the polar factorization? At the level of v, from the fact that ∇ × (∇ϕε) = 0 we deduce ∇ × v = 0, which means that v is the gradient of some function p. On the other hand, the fact that sε is measure preserving implies that w satisfies ∇ · (wχΩ) = 0 in the sense of distributions: indeed for any smooth f : Rd → R it holds

0 = d/dε|_{ε=0} ∫ f d(sε)#µΩ = d/dε|_{ε=0} ∫ f ◦ sε dµΩ = ∫ ∇f · w dµΩ.

Then from the identity (∇ϕε) ◦ sε = Id + ε(∇p + w) + o(ε) we can conclude that

u = ∇p + w.
We now turn to the case X = Y = M, with M a smooth Riemannian manifold, and c(x, y) = d²(x, y)/2, d being the Riemannian distance on M. For simplicity, we will assume that M is compact and with no boundary, but everything holds in more general situations.
The underlying ideas of the foregoing discussion are very similar to the ones of the case X = Y = Rd, the main difference being that the correspondence between c-concave functions and convex functions given by Proposition 1.21 in the Euclidean case is no longer available. Recall however
that the concepts of semiconvexity (i.e. second derivatives bounded from below) and semiconcavity
make sense also on manifolds, since these properties can be read locally and changes of coordinates
are smooth.
In the next proposition we will use the fact that on a compact and smooth Riemannian manifold,
the functions x 7→ d2 (x, y) are uniformly Lipschitz and uniformly semiconcave in y ∈ M (i.e. the
second derivative along a unit speed geodesic is bounded above by a universal constant depending
only on M , see e.g. the third appendix of Chapter 10 of [80] for the simple proof).
Proposition 1.30 Let M be a smooth, compact Riemannian manifold without boundary. Let ϕ :
M → R ∪ {−∞} be a c-concave function not identically equal to −∞. Then ϕ is Lipschitz,
+
semiconcave and real valued. Also, assume that y ∈ ∂ c+ ϕ(x). Then exp−1
x (y) ⊂ −∂ ϕ(x).
c+
Conversely, if ϕ is differentiable at x, then expx (−∇ϕ(x)) ∈ ∂ ϕ(x).
18
Proof The fact that ϕ is real valued follows from the fact that the cost function d2 (x, y)/2 is uniformly bounded in x, y ∈ M . Smoothness and compactness ensure that the functions d2 (·, y)/2
are uniformly Lipschitz and uniformly semiconcave in y ∈ M , this gives that ϕ is Lipschitz and
semiconcave.
Now pick y ∈ ∂^{c+}ϕ(x) and v ∈ exp_x^{-1}(y). Recall that −v belongs to the superdifferential of d²(·, y)/2 at x, i.e.

d²(z, y)/2 ≤ d²(x, y)/2 + ⟨−v, exp_x^{-1}(z)⟩ + o(d(x, z)).

Thus from y ∈ ∂^{c+}ϕ(x) and (1.3) we have

ϕ(z) − ϕ(x) ≤ d²(z, y)/2 − d²(x, y)/2 ≤ ⟨−v, exp_x^{-1}(z)⟩ + o(d(x, z)),

that is −v ∈ ∂⁺ϕ(x).
To prove the converse implication, it is enough to show that the c-superdifferential of ϕ at x is non empty. To prove this, use the c-concavity of ϕ to find a sequence (yn) ⊂ M such that

ϕ(x) = lim_{n→∞} ( d²(x, yn)/2 − ϕ^{c+}(yn) ),
ϕ(z) ≤ d²(z, yn)/2 − ϕ^{c+}(yn),   ∀z ∈ M, n ∈ N.

By compactness we can extract a subsequence converging to some y ∈ M. Then from the continuity of d²(z, ·)/2 and ϕ^{c+}(·) it is immediate to verify that y ∈ ∂^{c+}ϕ(x).
Remark 1.31 The converse implication in the previous proposition is false if one doesn’t assume ϕ
to be differentiable at x: i.e., it is not true in general that expx (−∂ + ϕ(x)) ⊂ ∂ c+ ϕ(x).
From this proposition, and following the same ideas used in the Euclidean case, we give the
following definition:
Definition 1.32 (Regular measures in P(M )) We say that µ ∈ P(M ) is regular provided it vanishes on the set of points of non differentiability of ψ for any semiconvex function ψ : M → R.
The set of points of non differentiability of a semiconvex function on M can be described as in
the Euclidean case by using local coordinates. For most applications it is sufficient to keep in mind
that absolutely continuous measures (w.r.t. the volume measure) and even measures vanishing on
Lipschitz hypersurfaces are regular.
By Proposition 1.30, we can derive a result about existence and characterization of optimal transport maps in manifolds which closely resembles Theorem 1.26:
Theorem 1.33 (McCann) Let M be a smooth, compact Riemannian manifold without boundary
and µ ∈ P(M ). Then the following are equivalent:
i) for every ν ∈ P(M ) there exists only one transport plan from µ to ν and this plan is induced
by a map T ,
ii) µ is regular.
If either (i) or (ii) hold, the optimal map T can be written as x 7→ exp_x(−∇ϕ(x)) for some c-concave function ϕ : M → R.
Proof
(ii) ⇒ (i) and the last statement. Pick ν ∈ P(M ) and observe that, since d2 (·, ·)/2 is uniformly
bounded, condition (1.4) surely holds. Thus from Theorem 1.13 and Remark 1.15 we get that any
optimal plan γ ∈ Opt (µ, ν) must be concentrated on the c-superdifferential of a c-concave function
ϕ. By Proposition 1.30 we know that ϕ is semiconcave, and thus differentiable µ-a.e. by our assumption on µ. Therefore x 7→ T (x) := expx (−∇ϕ(x)) is well defined µ-a.e. and its graph must
be of full γ-measure for any γ ∈ Opt (µ, ν). This means that γ is unique and induced by T .
(i) ⇒ (ii). Argue by contradiction and assume that there exists a semiconcave function f whose
set of points of non differentiability has positive µ measure. Use Lemma 1.34 below to find ε > 0
such that ϕ := εf is c-concave and satisfies: v ∈ ∂⁺ϕ(x) if and only if exp_x(−v) ∈ ∂^{c+}ϕ(x). Then conclude the proof as in Theorem 1.26.

Lemma 1.34 Let M be a smooth, compact Riemannian manifold without boundary and ϕ : M → R semiconcave. Then for ε > 0 sufficiently small the function εϕ is c-concave and it holds v ∈ ∂⁺(εϕ)(x) if and only if exp_x(−v) ∈ ∂^{c+}(εϕ)(x).
Proof We start with the following claim: there exists ε0 > 0 such that for every x0 ∈ M and every v ∈ ∂⁺ϕ(x0) the function

x 7→ ε0 ϕ(x) − d²(x, exp_{x0}(−ε0 v))/2

has a global maximum at x = x0.
Use the smoothness and compactness of M to find r > 0 such that d²(·, ·)/2 : {(x, y) : d(x, y) < r} → R is C^∞ and satisfies ∇²(d²(·, y)/2) ≥ c Id for every y ∈ M, with c > 0 independent of y. Now observe that since ϕ is semiconcave and real valued, it is Lipschitz. Thus, for ε0 > 0 sufficiently small it holds ε0|v| < r/3 for any v ∈ ∂⁺ϕ(x) and any x ∈ M. Also, since ϕ is bounded, possibly decreasing the value of ε0 we can assume that

ε0 |ϕ(x)| ≤ r²/12.

Fix x0 ∈ M, v ∈ ∂⁺ϕ(x0) and let y0 := exp_{x0}(−ε0 v). We claim that for ε0 chosen as above, the maximum of ε0 ϕ − d²(·, y0)/2 cannot lie outside Br(x0). Indeed, if d(x, x0) ≥ r we have d(x, y0) > 2r/3 and thus:

ε0 ϕ(x) − d²(x, y0)/2 < r²/12 − 2r²/9 = −r²/12 − r²/18 ≤ ε0 ϕ(x0) − d²(x0, y0)/2.

Thus the maximum must lie in Br(x0). Recall that in this ball the function d²(·, y0) is C^∞ and satisfies ∇²(d²(·, y0)/2) ≥ c Id, thus it holds

∇²( ε0 ϕ(·) − d²(·, y0)/2 ) ≤ (ε0 λ − c) Id,

where λ ∈ R is such that ∇²ϕ ≤ λ Id on the whole of M. Thus, decreasing if necessary the value of ε0, we can assume that

∇²( ε0 ϕ(·) − d²(·, y0)/2 ) < 0   on Br(x0),

which implies that ε0 ϕ(·) − d²(·, y0)/2 admits a unique point x ∈ Br(x0) such that 0 ∈ ∂⁺( ε0 ϕ − d²(·, y0)/2 )(x), which is therefore the unique maximum. Since ∇( d²(·, y0)/2 )(x0) = ε0 v ∈ ∂⁺(ε0 ϕ)(x0), we conclude that x0 is the unique global maximum, as claimed.
Now define the function ψ : M → R ∪ {−∞} by

ψ(y) := inf_{x∈M} ( d²(x, y)/2 − ε0 ϕ(x) ),

if y = exp_x(−ε0 v) for some x ∈ M, v ∈ ∂⁺ϕ(x), and ψ(y) := −∞ otherwise. By definition we have

ε0 ϕ(x) ≤ d²(x, y)/2 − ψ(y),   ∀x, y ∈ M,

and the claim proved above ensures that if y0 = exp_{x0}(−ε0 v0) for x0 ∈ M, v0 ∈ ∂⁺ϕ(x0), then the inf in the definition of ψ(y0) is realized at x = x0 and thus

ε0 ϕ(x0) = d²(x0, y0)/2 − ψ(y0).

Hence ε0 ϕ = ψ^{c+} and therefore it is c-concave. Along the same lines one can easily see that for y ∈ exp_x(−ε0 ∂⁺ϕ(x)) it holds

ε0 ϕ(x) = d²(x, y)/2 − (ε0 ϕ)^{c+}(y),

i.e. y ∈ ∂^{c+}(ε0 ϕ)(x). Thus we have ∂^{c+}(ε0 ϕ) ⊃ exp(−∂⁺(ε0 ϕ)). Since the other inclusion has been proved in Proposition 1.30, the proof is finished.
Remark 1.35 With the same notation as in Theorem 1.33, recall that we know that the c-concave function ϕ whose c-superdifferential contains the support of any optimal plan from µ to ν is differentiable µ-a.e. (for regular µ). Fix x0 such that ∇ϕ(x0) exists, let y0 := exp_{x0}(−∇ϕ(x0)) ∈ ∂^{c+}ϕ(x0) and observe that from

d²(x, y0)/2 − d²(x0, y0)/2 ≥ ϕ(x) − ϕ(x0),

we deduce that ∇ϕ(x0) belongs to the subdifferential of d²(·, y0)/2 at x0. Since we know that d²(·, y0)/2 always has a non empty superdifferential, we deduce that it must be differentiable at x0. In particular, there exists only one geodesic connecting x0 to y0. Therefore if µ is regular, not only does there exist a unique optimal transport map T, but also for µ-a.e. x there is only one geodesic connecting x to T(x).
The question of regularity of optimal maps on manifolds is much more delicate than the corresponding question on Rd, even if one wants to get only continuity. We won't enter into the details of the theory; we just give an example showing the difficulties that can arise in a curved setting. The example exhibits a smooth compact manifold, and two measures absolutely continuous with positive and smooth densities, such that the optimal transport map is discontinuous. We remark that similar behaviors occur as soon as M has a point at which some sectional curvature is strictly negative. Also, even if one assumes that the manifold has non negative sectional curvature everywhere, this is not enough to guarantee continuity of the optimal map: what comes into play in
this setting is the Ma-Trudinger-Wang tensor, an object which we will not study.
Example 1.36 Let M ⊂ R3 be a smooth surface which has the following properties:
• M is symmetric w.r.t. the x axis and the y axis,
• M crosses the line (x, y) = (0, 0) at two points, namely O, O0 .
21
• the curvature of M at O is negative.
These assumptions ensure that we can find a, b > 0 such that for some za , zb the points
A := (a, 0, za ),
A0 := (−a, 0, za ),
B := (0, b, zb ),
B 0 := (0, −b, zb ),
belong to M and
d2 (A, B) > d2 (A, O) + d2 (O, B),
d being the intrinsic distance on M . By continuity and symmetry, we can find ε > 0 such that
d2 (x, y) > d2 (x, O) + d2 (O, y),
∀x ∈ Bε (A) ∪ Bε (A0 ), y ∈ Bε (B) ∪ Bε (B 0 ).
(1.8)
Now let f (resp. g)R be a smooth probability densityR everywhere positive and symmetric w.r.t. the
x, y axes such that Bε (A)∪Bε (A0 ) f dvol > 21 (resp. Bε (B)∪Bε (B 0 ) g dvol > 12 ), and let T (resp. T 0 )
be the optimal transport map from f vol to gvol (resp. from gvol to f vol).
We claim that either T or T 0 is discontinuous and argue by contradiction. Suppose that both
are continuous and observe that by the symmetry of the optimal transport problem it must hold
T 0 (x) = T −1 (x) for any x ∈ M . Again by the symmetry of M , f, g, the point T (O) must be
invariant under the symmetries around the x and y axes. Thus it is either T (O) = O or T (O) = O0 ,
and similarly, T 0 (O0 ) ∈ {O, O0 }.
We claim that it must hold T (O) = O. Indeed otherwise either T (O) = O0 and T (O0 ) = O, or
T (O) = O0 and T (O0 ) = O0 . In the first case the two couples (O, O0 ) and (O0 , O) belong to the
support of the optimal plan, and thus by cyclical monotonicity it holds
d2 (O, O0 ) + d2 (O0 , O) ≤ d2 (O, O) + d2 (O0 , O0 ) = 0,
which is absurdum.
In the second case we have T 0 (x) 6= O for all x ∈ M , which, by continuity and compactness
0
implies d(T 0 (M ), O) > 0. This contradicts the fact that f is positive everywhere and T#
(gvol) =
f vol.
Thus it holds T (O) = O. Now observe that by construction there must be some mass transfer
from Bε (A) ∪ Bε (A0 ) to Bε (B) ∪ Bε (B 0 ), i.e. we can find x ∈ Bε (A) ∪ Bε (A0 ) and y ∈ Bε (B) ∪
Bε (B 0 ) such that (x, y) is in the support of the optimal plan. Since (O, O) is the support of the
optimal plan as well, by cyclical monotonicity it must hold
d2 (x, y) + d2 (O, O) ≤ d2 (x, O) + d2 (O, y),
which contradicts (1.8).
1.5
Bibliographical notes
G. Monge’s original formulation of the transport problem ([66]) was concerned with the case X =
Y = Rd and c(x, y) = |x − y|, and L. V. Kantorovich’s formulation appeared first in [49].
The equality (1.2), saying that the infimum of the Monge problem equals the minimum of Kantorovich one, has been proved by W. Gangbo (Appendix A of [41]) and the first author (Theorem 2.1
in [4]) in particular cases, and then generalized by A. Pratelli [68].
22
In [50] L. V. Kantorovich introduced the dual problem, and later L. V. Kantorovich and G. S.
Rubinstein [51] further investigated this duality for the case c(x, y) = d(x, y). The fact that the
study of the dual problem can lead to important informations for the transport problem has been
investigated by several authors, among others M. Knott and C. S. Smith [52] and S. T. Rachev and L.
Rüschendorf [69], [71].
The notions of cyclical monotonicity and its relation with subdifferential of convex function
have been developed by Rockafellar in [70]. The generalization to c-cyclical monotonicity and to csub/super differential of c-convex/concave functions has been studied, among others, by Rüschendorf
[71].
The characterization of the set of non differentiability of convex functions is due to Zajíˇcek ([83],
see also the paper by G. Alberti [2] and the one by G. Alberti and the first author [3])
Theorem 1.26 on existence of optimal maps in Rd for the cost=distance-squared is the celebrated
result of Y. Brenier, who also observed that it implies the polar factorization result 1.28 ([18], [19]).
Brenier’s ideas have been generalized in many directions. One of the most notable one is R. McCann’s theorem 1.33 concerning optimal maps in Riemannian manifolds for the case cost=squared
distance ([64]). R. McCann also noticed that the original hypothesis in Brenier’s theorem, which was
µ Ld , can be relaxed into ‘µ gives 0 mass to Lipschitz hypersurfaces’. In [42] W. Gangbo and
R. McCann pointed out that to get existence of optimal maps in Rd with c(x, y) = |x − y|2 /2 it is
sufficient to ask to the measure µ to be regular in the sense of the Definition 1.25. The sharp version
of Brenier’s and McCann’s theorems presented here, where the necessity of the regularity of µ is also
proved, comes from a paper of the second author of these notes ([46]).
Other extensions of Brenier’s result are:
• Infinite-dimensional Hilbert spaces (the authors and Savaré - [6])
• cost functions induced by Lagrangians, Bernard-Buffoni [13], namely
Z 1
c(x, y) := inf
L(t, γ(t), γ(t))
˙
dt : γ(0) = x, γ(1) = y ;
0
• Carnot groups and sub-Riemannian manifolds, c = d2CC /2: the first author and S. Rigot ([10]),
A. Figalli and L. Rifford ([39]);
• cost functions induced by sub-Riemannian Lagrangians A. Agrachev and P. Lee ([1]).
• Wiener spaces (E, H, γ), D. Feyel- A. S. Üstünel ([36]).
Here E is a Banach space, γ ∈ P(E) is Gaussian and H is its Cameron- Martin space, namely
H := {h ∈ E : (τh )] γ γ} .
In this case

 |x − y|2H
c(x, y) :=
2
+∞
if x − y ∈ H;
otherwise.
The issue of regularity of optimal maps would nowadays require a lecture note in its own. A
rough statement that one should have in mind is that it is rare to have regular (even just continuous)
optimal transport maps. The key Theorem 1.27 is due to L. Caffarelli ([22], [21], [23]).
Example 1.36 is due to G. Loeper ([55]). For the general case of cost=squared distance on a compact Riemannian manifold, it turns out that continuity of optimal maps between two measures with
smooth and strictly positive density is strictly related to the positivity of the so-called Ma-TrudingerWang tensor ([59]), an object defined taking fourth order derivatives of the distance function. The
23
understanding of the structure of this tensor has been a very active research area in the last years, with
contributions coming from X.-N. Ma, N. Trudinger, X.-J. Wang, C. Villani, P. Delanoe, R. McCann,
A. Figalli, L. Rifford, H.-Y. Kim and others.
A topic which we didn’t discuss at all is the original formulation of the transport problem of
Monge: the case c(x, y) := |x − y| on Rd . The situation in this case is much more complicated than
the one with c(x, y) = |x − y|2 /2 as it is typically not true that optimal plans are unique, or that
optimal plans are induced by maps. For example consider on R any two probability measures µ, ν
such that µ is concentrated on the negative numbers and ν on the positive ones. Then one can see
that any admissible plan between them is optimal for the cost c(x, y) = |x − y|.
Still, even in this case there is existence of optimal maps, but in order to find them one has to
use a sort of selection principle. A successful strategy - which has later been applied to a number of
different situation - has been proposed by V. N. Sudakov in [77], who used a disintegration principle
to reduce the d-dimensional problem to a problem on R. The original argument by V. N. Sudakov was
flawed and has been fixed by the first author in [4] in the case of the Euclidean distance. Meanwhile,
different proofs of existence of optimal maps have been proposed by L. C.Evans- W. Gangbo ([34]),
Trudinger and Wang [78], and L. Caffarelli, M. Feldman and R. McCann [24].
Later, existence of optimal maps for the case c(x, y) := kx − yk, k · k being any norm has been
established, at increasing levels of generality, in [9], [28], [27] (containing the most general result,
for any norm) and [25].
2
The Wasserstein distance W2
The aim of this chapter is to describe the properties of the Wasserstein distance W2 on the space
of Borel Probability measures on a given metric space (X, d). This amounts to study the transport
problem with cost function c(x, y) = d2 (x, y).
An important characteristic of the Wasserstein distance is that it inherits many interesting geometric properties of the base space (X, d). For this reason we split the foregoing discussion into three
sections on which we deal with the cases in which X is: a general Polish space, a geodesic space and
a Riemannian manifold.
A word on the notation: when considering product spaces like X n , with π i : X n → X we intend
the natural projection onto the i-th coordinate, i = 1, . . . , n. Thus, for instance, for µ, ν ∈ P(X)
1
2
and γ ∈ Adm(µ, ν) we have π#
γ = µ and π#
γ = ν. Similarly, with π i,j : X n → X 2 we intend the
projection onto the i-th and j-th coordinates. And similarly for multiple projections.
2.1 X Polish space
Let (X, d) be a complete and separable metric space.
The distance W2 is defined as
s
Z
W2 (µ, ν) :=
sZ
=
inf
γ ∈Adm(µ,ν)
d2 (x, y)dγ(x, y)
d2 (x, y)dγ(x, y),
∀γ ∈ Opt (µ, ν).
The natural space to endow with the Wasserstein distance W2 is the space P2 (X) of Borel
24
Probability measures with finite second moment:
Z
n
o
P2 (X) := µ ∈ P(X) :
d2 (x, x0 )dµ(x) < ∞ for some, and thus any, x0 ∈ X .
Notice that if either µ or ν is a Dirac delta, say ν = δx0 , then there exists only one plan γ in
Adm(µ, ν): the plan µ × δx0 , which therefore is optimal. In particular it holds
Z
d2 (x, x0 )dµ(x) = W22 (µ, δx0 ),
that is: the second moment is nothing but the squared Wasserstein distance from the corresponding
Dirac mass.
We start proving that W2 is actually a distance on P2 (X). In order to prove the triangle inequality, we will use the following lemma, which has its own interest:
Lemma 2.1 (Gluing) Let X, Y, Z be three Polish spaces and let γ 1 ∈ P(X ×Y ), γ 2 ∈ P(Y ×Z)
Y 1
Y 2
be such that π#
γ = π#
γ . Then there exists a measure γ ∈ P(X × Y × Z) such that
X,Y
π#
γ = γ1,
Y,Z
π#
γ = γ2.
Y 1
Y 2
Proof Let µ := π#
γ = π#
γ and use the disintegration theorem to write dγ 1 (x, y) =
dµ(y)dγ 1y (x) and dγ 2 (y, z) = dµ(y)dγ 2y (z). Conclude defining γ by
dγ(x, y, z) := dµ(y)d(γ 1y × γ 2y )(x, z).
Theorem 2.2 (W2 is a distance) W2 is a distance on P2 (X).
Proof It is obvious that W2 (µ, µ) = 0 and that W2 (µ, ν) = W2 (ν, µ). To prove
that W2 (µ, ν) = 0
R
implies µ = ν just pick an optimal plan γ ∈ Opt (µ, ν) and observe that d2 (x, y)dγ(x, y) = 0
implies that γ is concentrated on the diagonal of X × X, which means that the two maps π 1 and π 2
1
2
coincide γ-a.e., and therefore π#
γ = π#
γ.
For the triangle inequality, we use the gluing lemma to “compose” two optimal plans. Let
µ1 , µ2 , µ3 ∈ P2 (X) and let γ 21 ∈ Opt (µ1 , µ2 ), γ 32 ∈ Opt (µ2 , µ3 ). By the gluing lemma we
know that there exists γ ∈ P2 (X 3 ) such that
1,2
π#
γ = γ 21 ,
2,3
π#
γ = γ 32 .
1,3
1
3
Since π#
γ = µ1 and π#
γ = µ3 , we have π#
γ ∈ Adm(µ1 , µ3 ) and therefore from the triangle
25
inequality in L2 (γ) it holds
sZ
1,3
d2 (x1 , x3 )dπ#
γ(x1 , x3 ) =
W2 (µ1 , µ3 ) ≤
sZ
≤
sZ
≤
sZ
=
sZ
d2 (x1 , x3 )dγ(x1 , x2 , x3 )
2
d(x1 , x2 ) + d(x2 , x3 ) dγ(x1 , x2 , x3 )
d2 (x1 , x2 )dγ(x1 , x2 , x3 ) +
d2 (x1 , x2 )dγ 21 (x1 , x2 ) +
sZ
sZ
d2 (x2 , x3 )dγ(x1 , x2 , x3 )
d2 (x2 , x3 )dγ 32 (x2 , x3 ) = W2 (µ1 , µ2 ) + W2 (µ2 , µ3 ).
Finally, we need to prove that W2 is real valued. Here we use the fact that we restricted the analysis
to the space P2 (X): from the triangle inequality we have
sZ
sZ
d2 (x, x0 )dµ(x) +
W2 (µ, ν) ≤ W2 (µ, δx0 ) + W2 (ν, δx0 ) =
d2 (x, x0 )dν(x) < ∞.
A trivial, yet very useful inequality is:
W22 (f# µ, g# µ) ≤
Z
d2Y (f (x), g(x))dµ(x),
(2.1)
valid for any couple of metric spaces X, Y , any µ ∈ P(X) and any couple of Borel maps f, g :
X → Y . This inequality follows from the fact that (f, g)# µ is an admissible plan for the measures
f# µ, g# µ, and its cost is given by the right hand side of (2.1).
Observe that there is a natural isometric immersion of (X, d) into (P2 (X), W2 ), namely the map
x 7→ δx .
Now we want to study the topological properties of (P2 (X), W2 ). To this aim, we introduce the
notion of 2-uniform integrability: K ⊂ P2 (X) is 2-uniformly integrable provided for any ε > 0 and
x0 ∈ X there exists Rε > 0 such that
Z
sup
d2 (x, x0 )dµ ≤ ε.
µ∈K
X\BRε (x0 )
Remark 2.3 Let (X,
dX ), (Y, dY ) be Polish and endow X × Y with the product distance
d2 (x1 , y1 ), (x2 , y2 ) := d2X (x1 , x2 ) + d2Y (y1 , y2 ). Then the inequality
Z
Z
Z
d2X (x, x0 )dγ(x, y) =
d2X (x, x0 )dγ(x, y) +
d2X (x, x0 )dγ(x, y)
(BR (x0 )×BR (y0 ))c
(BR (x0 ))c ×Y
Z
≤
BR (x0 )×(BR (y0 ))c
(BR (x0 ))c
Z
≤
Z
d2X (x, x0 )dµ(x) +
R2 dγ(x, y)
X×(BR (y0 ))c
d2X (x, x0 )dµ(x) +
(BR (x0 ))c
Z
(BR (y0 ))c
26
d2Y (y, y0 )dν(y),
valid for any γ ∈ Adm(µ, ν) and the analogous one with the integral of d2Y (y, y0 ) in place of
d2X (x, x0 ), show that if K1 ⊂ P2 (X) and K2 ⊂ P2 (Y ) are 2-uniformly integrable, so is the
set
n
o
X
Y
γ ∈ P(X × Y ) : π#
γ ∈ K1 , π#
γ ∈ K2 .
We say that a function f : X → R has quadratic growth provided
|f (x)| ≤ a(d2 (x, x0 ) + 1),
(2.2)
for some a ∈ R and x0 ∈ X. It is immediate to check that if f has quadratic growth and µ ∈ P2 (X),
then f ∈ L1 (X, µ).
The concept of 2-uniform integrability (in conjunction with tightness) in relation with convergence of integral of functions with quadratic growth, plays a role similar to the one played by tightness in relation with convergence of integral of bounded functions, as shown in the next proposition.
Proposition 2.4 Let (µn ) ⊂ P2 (X) be a sequence narrowly converging to some µ. Then the following 3 properties are equivalent
i) (µn ) is 2-uniformly integrable,
R
R
f dµn → f dµ for any continuous f with quadratic growth,
R
R
iii) d2 (·, x0 )dµn → d2 (·, x0 )dµ for some x0 ∈ X.
ii)
Proof
(i) ⇒ (ii). It is not restrictive to assume f ≥ 0. Since any such f can be written as supremum of a
family of continuous and bounded functions, it clearly holds
Z
Z
f dµ ≤ lim inf f dµn .
n→∞
Thus
we only have to prove the limsup inequality. Fix ε > 0, x0 ∈ X and find Rε > 1 such that
R
d2 (·, x0 )dµn ≤ ε for every n. Now let χ be a function with bounded support, values in
X\BRε (x0 )
[0, 1] and identically 1 on BRε and notice that for every n ∈ N it holds
Z
Z
Z
Z
Z
Z
f dµn = f χdµn + f (1 − χ)dµn ≤ f χdµn +
f dµn ≤ f χdµn + 2aε,
X\BRε
R
R
a being given by (2.2). Since f χ is continuous and bounded we have f χdµn → f χdµ and
therefore
Z
Z
Z
lim
f dµn ≤ f χdµ + 2aε ≤ f dµ + 2aε.
n→∞
Since ε > 0 was arbitrary, this part of the statement is proved.
(ii) ⇒ (iii). Obvious.
(iii) ⇒ (i). Argue by contradiction
and assume that there exist ε > 0 and x
˜0 ∈ X such that for
R
every R > 0 it holds supn∈N X\BR (˜x0 ) d2 (·, x
˜0 )dµn > ε. Then it is easy to see that it holds
Z
d2 (·, x0 )dµn > ε.
lim
n→∞
X\BR (x0 )
27
(2.3)
For every R > 0 let χR be a continuous cutoff function with values in [0, 1] supported on BR (x0 )
and identically 1 on BR/2 (x0 ). Since d2 (·, x0 )χR is continuous and bounded, we have
Z
Z
d2 (·, x0 )χR dµn
d2 (·, x0 )χR dµ = lim
n→∞
Z
Z
d2 (·, x0 )dµn − d2 (·, x0 )(1 − χR )dµn
= lim
n→∞
Z
Z
= d2 (·, x0 )dµ + lim − d2 (·, x0 )(1 − χR )dµn
n→∞
Z
Z
2
d2 (·, x0 )dµn
≤ d (·, x0 )dµ + lim −
n→∞
X\BR (x0 )
Z
Z
2
d2 (·, x0 )dµn
= d (·, x0 )dµ − lim
n→∞ X\B (x )
0
R
Z
≤ d2 (·, x0 )dµ − ε,
having used (2.3) in the last step. Since
Z
Z
Z
d2 (·, x0 )dµ = sup d2 (·, x0 )χR dµ ≤ d2 (·, x0 )dµ − ε,
R
we got a contradiction.
Proposition 2.5 (Stability of optimality) The distance W2 is lower semicontinuous w.r.t. narrow
convergence of measures. Furthermore, if (γ n ) ⊂ P2 (X 2 ) is a sequence of optimal plans which
narrowly converges to γ ∈ P2 (X 2 ), then γ is optimal as well.
Proof Let (µn ), (νn ) ⊂ P2 (X) be two sequences of measures narrowly converging to µ, ν ∈
P2 (X) respectively. Pick γ n ∈ Opt (µn , νn ) and use Remark 1.4 and Prokhorov theorem to get that
(γ n ) admits a subsequence, not relabeled, narrowly converging to some γ ∈ P(X 2 ). It is clear that
1
2
π#
γ = µ and π#
γ = ν, thus it holds
Z
Z
2
2
W2 (µ, ν) ≤ d (x, y)dγ(x, y) ≤ lim
d2 (x, y)dγ n (x, y) = lim W22 (µn , νn ).
n→∞
n→∞
Now we pass to the second part of the statement, that is: we need to prove that with the same
notation just used it holds γ ∈ Opt (µ, ν). Choose a(x) = b(x) = d2 (x, x0 ) for some x0 ∈ X in
the bound (1.4) and observe that since µ, ν ∈ P2 (X) Theorem 1.13 applies, and thus optimality is
equivalent to c-cyclical monotonicity of the support. The same for the plans γ n . Fix N ∈ N and pick
(xi , y i ) ∈ supp(γ), i = 1, . . . , N . From the fact that (γ n ) narrowly converges to γ it is not hard to
infer the existence of (xin , yni ) ∈ supp(γ n ) such that
lim d(xin , xi ) + d(yni , y i ) = 0,
∀i = 1, . . . , N.
n→∞
Thus the conclusion follows from the c-cyclical monotonicity of supp(γ n ) and the continuity of the
cost function.
Now we are going to prove that (P2 (X), W2 ) is a Polish space. In order to enable some constructions, we will use (a version of) Kolmogorov’s theorem, which we recall without proof (see e.g. [31]
§51).
28
Theorem 2.6 (Kolmogorov) Let X be a Polish space and µn ∈ P(X n ), n ∈ N, be a sequence of
measures such that
1,...,n−1
π#
µn = µn−1 ,
∀n ≥ 2.
Then there exists a measure µ ∈ X N such that
1,...,n
π#
µ = µn ,
∀n ∈ N.
Theorem 2.7 (Basic properties of the space (P2 (X), W2 )) Let (X, d) be complete and separable.
Then

 Z
µn → µ
narrowly
Z
W2 (µn , µ) → 0
⇔
(2.4)
2
2
d (·, x0 )dµn → d (·, x0 )dµ for some x0 ∈ X.

Furthermore, the space (P2 (X), W2 ) is complete and separable. Finally, K ⊂ P2 (X) is relatively
compact w.r.t. the topology induced by W2 if and only if it is tight and 2-uniformly integrable.
Proof We start showing implication ⇒ in (2.4). Thus assume that W2 (µn , µ) → 0. Then
sZ
sZ
2
2
d (·, x0 )dµn −
d (·, x0 )dµ = |W2 (µn , δx0 ) − W2 (µ, δx0 )| ≤ W2 (µn , µ) → 0.
To prove narrow convergence, for every n ∈ N choose γ n ∈ Opt (µ, µn ) and2 use repeatedly the
gluing lemma to find, for every n ∈ N, a measure αn ∈ P(X × X n ) such that
0,n
π#
αn = γ n ,
0,1,...,n−1
π#
αn = αn−1 .
Then by Kolmogorov’s theorem we know that there exists a measure α ∈ P(X × X N ) such that
0,1,...,n
π#
α = αn ,
∀n ∈ N.
By construction we have
kd(π 0 , π n )kL2 (X×X N ,α) = kd(π 0 , π n )kL2 (X 2 ,γ n ) = W2 (µ, µn ) → 0.
Thus up to passing to a subsequence, not relabeled, we can assume that π n (x) → π 0 (x) for α-almost
any x ∈ X × X N . Now pick f ∈ Cb (X) and use the dominated convergence theorem to get
Z
Z
Z
Z
n
0
lim
f dµn = lim
f ◦ π dα = f ◦ π dα = f dµ.
n→∞
n→∞
if closed balls in X are compact, the proof greatly simplifies. Indeed in this case the inequality R2 µ(X \ BR (x0 )) ≤
d (·, x0 )dµ and the uniform bound on the
yields that the sequence n 7→ µn is tight. Thus to prove narrow
R second moments
R
convergence it is sufficient to check that f dµn → f dµ for every f ∈ Cc (X). Since Lipschitz functions are dense in
Cc (X) w.r.t. uniform convergence, it is sufficient to check the convergence of the integral only for Lipschitz f ’s. This follows
from the inequality
Z
Z
Z
Z
f dµ − f dµn = f (x) − f (y)dγ n (x, y) ≤ |f (x) − f (y)|dγ n (x, y)
sZ
Z
2
R
2
≤ Lip(f )
d(x, y)dγ n (x, y) ≤ Lip(f )
29
d2 (x, y)dγ n (x, y) = Lip(f )W2 (µ, µn ).
Since the argument does not depend on the subsequence chosen, the claim is proved.
We pass to the converse implication in (2.4). Pick γ n ∈ Opt (µ, µn ) and use Remark 1.4 to
get that the sequence (γ n ) is tight, hence, up to passing to a subsequence, we can assume that it
narrowly
converges to some γ. By Proposition 2.5 we know that γ ∈ Opt (µ, µ), which forces
R 2
d (x, y)dγ(x, y) = 0. By Proposition 2.4 and our assumption on (µn ), µ we know that (µn ) is
2-uniformly integrable, thus by Remark 2.3 again we know that (γ n ) is 2-uniformly integrable as
well. Since the map (x, y) 7→ d2 (x, y) has quadratic growth in X 2 it holds
Z
Z
lim W22 (µn , µ) = lim
d2 (x, y)dγ n (x, y) = d2 (x, y)dγ(x, y) = 0.
n→∞
n→∞
Now we prove that (P2P
(X), W2 ) is complete. Pick a Cauchy sequence (µn ) and assume3 , without loss of generality, that n W2 (µn , µn+1 ) < ∞. For every n ∈ N choose γ n ∈ Opt (µn , µn+1 )
and use repeatedly the gluing lemma to find, for every n ∈ N, a measure β n ∈ P2 (X n ) such that
n,n+1
π#
βn = γ n ,
1,...,n−1
π#
β n = αn−1
1,...,n
By Kolmogorov’s theorem we get the existence of a measure β ∈ P(X N ) such that π#
β = βn
for every n ∈ N. The inequality
∞
X
n=1
kd(π i , π i+1 )kL2 (X N ,β) =
∞
X
kd(π i , π i+1 )kL2 (X 2 ,γ i ) =
n=1
∞
X
W2 (µi , µi+1 ) < ∞,
n=1
shows that n 7→ π n R: X N → X is a Cauchy sequence in L2 (β, X), i.e. the space of maps f :
2
X N → X such that dq
(f (y), x0 )dβ(y) < ∞ for some, and thus every, x0 ∈ X endowed with
R
˜ g) :=
d2 (f (y), g(y))dβ(y). Since X is complete, L2 (β, X) is complete as
the distance d(f,
∞
well, and therefore there exists a limit map π ∞ of the Cauchy sequence (π n ). Define µ := π#
β and
notice that by (2.1) we have
Z
2
W2 (µ, µn ) ≤ d2 (π ∞ , π n )dβ → 0,
so that µ is the limit of the Cauchy sequence (µn ) in (P2 (X), W2 ). The fact that (P2 (X), W2 ) is
separable follows from (2.4) by considering the set of finite convex combinations of Dirac masses
centered at points in a dense countable set in X with rational coefficients. The last claim now follows.
Remark 2.8 (On compactness properties of P2 (X)) An immediate consequence of the above
theorem is the fact that if X is compact, then (P2 (X), W2 ) is compact as well: indeed, in this
case the equivalence (2.4) tells that convergence in P2 (X) is equivalent to weak convergence.
It is also interesting to notice that if X is unbounded, then P2 (X) is not locally compact. Actually, for any measure µ ∈ P2 (X) and any r > 0, the closed ball of radius r around µ is not compact.
To see this, fix x ∈ X and find a sequence (xn ) ⊂ X such that d(xn , x) → ∞. Now define the
3
again, if closed balls in X are compact
R the argument simplifies. Indeed from the uniform bound on the second moments
and the inequality R2 µ(X \ BR (x0 )) ≤ X\B (x0 ) d2 (·, x0 )dµ we get the tightness of the sequence. Hence up to pass to a
R
subsequence we can assume that (µn ) narrowly converges to a limit measure µ, and then using the lower semicontinuity of W2
w.r.t. narrow convergence we can conclude limn W2 (µ, µn ) ≤ limn limm W2 (µm , µn ) = 0
30
measures µn := (1 − εn )µ + εn δxn , where εn is chosen such that εn d2 (x, xn ) = r2 . To bound from
above W22 (µ, µn ), leave fixed (1 − εn )µ, move εn µ to x and then move εn δx into εn δxn , this gives
Z
2
2
2
d (x, x)dµ(x) + d (xn , x) ,
W2 (µ, µn ) ≤ εn
so that lim W2 (µ, µn ) ≤ r. Conclude observing that
Z
Z
Z
2
2
2
lim
d (x, x)dµn = lim (1 − εn ) d (x, x)dµ + εn d (xn , x) = d2 (x, x)dµ + r2 ,
n→∞
n→∞
thus the second moments do not converge. Since clearly (µn ) weakly converges to µ, we proved that
there is no local compactness.
2.2 X geodesic space
In this section we prove that if the base space (X, d) is geodesic, then the same is true also for
(P2 (X), W2 ) and we will analyze the properties of this latter space.
Let us recall that a curve γ : [0, 1] → X is called constant speed geodesic provided
d γt , γs = |t − s|d γ0 , γ1 ,
∀t, s ∈ [0, 1],
(2.5)
or equivalently if ≤ always holds.
Definition 2.9 (Geodesic space) A metric space (X, d) is called geodesic if for every x, y ∈ X
there exists a constant speed geodesic connecting them, i.e. a constant speed geodesic such that
γ0 = x and γ1 = y.
Before entering into the details, let us describe an important example. Recall that X 3 x 7→ δx ∈
P2 (X) is an isometry. Therefore if t 7→ γt is a constant speed geodesic on X connecting x to y, the
curve t 7→ δγt is a constant speed geodesic on P2 (X) which connects δx to δy . The important thing
to notice here is that the natural way to interpolate between δx and δy is given by this - so called displacement interpolation. Conversely, observe that the classical linear interpolation
t 7→ µt := (1 − t)δx + tδy ,
produces a curve which has infinite length as soon as x 6= y (because W2 (µt , µs ) =
p
|t − s|d(x, y)), and thus is unnatural in this setting.
We will denote by Geod(X) the metric space of all constant speed geodesics on X endowed with
the sup norm. With some work it is possible to show that Geod(X) is complete and separable as
soon as X is (we omit the details). The evaluation maps et : Geod(X) → X are defined for every
t ∈ [0, 1] by
et (γ) := γt .
(2.6)
Theorem 2.10 Let (X, d) be Polish and geodesic. Then (P2 (X), W2 ) is geodesic as well. Furthermore, the following two are equivalent:
i) t 7→ µt ∈ P2 (X) is a constant speed geodesic,
ii) There exists a measure µ ∈ P2 (Geod(X)) such that (e0 , e1 )# µ ∈ Opt (µ0 , µ1 ) and
µt = (et )# µ.
31
(2.7)
Proof Choose µ0 , µ1 ∈ P2 (X) and find an optimal plan γ ∈ Opt (µ, ν). By Lemma 2.11 below and classical measurable selection theorems we know that there exists a Borel map GeodSel :
X 2 → Geod(X) such that for any x, y ∈ X the curve GeodSel(x, y) is a constant speed geodesic
connecting x to y. Define the Borel probability measure µ ∈ P(Geod(X)) by
µ := GeodSel# γ,
and the measures µt ∈ P(X) by µt := (et )# µ.
We claim that t 7→ µt is a constant speed geodesic connecting
µ0 to µ1. Consider indeed the map
2
(e0 , e1 ) : Geod(X) → X and observe that from (e0 , e1 ) GeodSel(x, y) = (x, y) we get
(e0 , e1 )# µ = γ.
(2.8)
1
In particular, µ0 = (e0 )# µ = π#
γ = µ0 , and similarly µ1 = µ1 , so that the curve t 7→ µt connects
0
1
µ to µ . The facts that the measures µt have finite second moments and (µt ) is a constant speed
geodesic follow from
Z
(2.7),(2.1)
W22 (µt , µs ) ≤
d2 (et (γ), es (γ))dµ(γ)
Z
(2.5)
2
= (t − s)
d2 (e0 (γ), e1 (γ))dµ(γ)
Z
(2.8)
= (t − s)2 d2 (x, y)dγ(x, y) = (t − s)2 W22 (µ0 , µ1 ).
The fact that (ii) implies (i) follows from the same kind of argument just used. So, we turn to
(i) ⇒ (ii). For n ≥ 0 we use iteratively the gluing Lemma 2.1 and the Borel map GeodSel to build
a measure µn ∈ P(C([0, 1], X)) such that
∀i = 0, . . . , 2n − 1,
ei/2n , e(i+1)/2n # µn ∈ Opt (µi/2n , µ(i+1)/2n ),
and µn -a.e. γ is a geodesic in the intervals [i/2n , (i + 1)/2n ], i = 0, . . . , 2n − 1. Fix n and observe
that for any 0 ≤ j < k ≤ 2n it holds
k−1
X
d ej/2n , ek/2n 2 n ≤ d ei/2n , e(i+1)/2n L (µ )
L2 (µn )
i=j
=
k−1
X
≤
k−1
X
d ei/2n , e(i+1)/2n 2 n
L (µ )
i=j
W2 (µi/2n , µ(i+1)/2n ) = W2 (µj/2n , µk/2n ).
i=j
(2.9)
Therefore it holds
ej/2n , ek/2n
#
µn ∈ Opt (µj/2n , µk/2n ),
∀j, k ∈ {0, . . . , 2n }.
Also, since the inequalities in (2.9) are equalities, it is not hard to see that for µn -a.e. γ the points
γi/2n , i = 0, . . . , 2n , must lie along a geodesic and satisfy d(γi/2n , γ(i+1)/2n ) = d(γ0 , γ1 )/2n ,
i = 0, . . . , 2n − 1. Hence µn -a.e. γ is a constant speed geodesic and thus µn ∈ P(Geod(X)).
Now suppose for a moment that (µn ) narrowly converges - up to pass to a subsequence - to some
µ ∈ P(Geod(X)). Then the continuity of the evaluation maps et yields that for any t ∈ [0, 1] the
32
sequence n 7→ (et )# µn narrowly converges to (et )# µ and this, together with the uniform bound
(2.9), easily implies that µ satisfies (2.7).
Thus to conclude it is sufficient to show that some subsequence of (µn ) has a narrow limit4 . We
will prove this by showing that µn ∈ P2 (Geod(X)) for every n ∈ N and that some subsequence is a
Cauchy sequence in (P2 (Geod(X)), W2 ), W2 being the Wasserstein distance built over Geod(X)
endowed with the sup distance, so that by Theorem 2.7 we conclude.
We know by Remark 1.4, Remark 2.3 and Theorem 2.7 that for every n ∈ N the set of plans
n
n
i
α ∈ P2 (X 2 +1 ) such that π#
α = µi/2n for i = 0, . . . , 2n , is compact in P2 (X 2 +1 ). Therefore
a diagonal argument tells that possibly passing to a subsequence, not relabeled, we may assume that
for every n ∈ N the sequence
2n
Y
m 7→
(ei/2n )# µm
i=0
n
converges to some plan w.r.t. the distance W2 on X 2 +1 .
Now fix n ∈ N and notice that for t ∈ [i/2n , (i + 1)/2n ] and γ, γ˜ ∈ Geod(X) it holds
1
γ0 , γ˜1 ) ,
d γt , γ˜t ≤ d γi/2n , γ˜(i+1)/2n + n d(γ0 , γ1 ) + d(˜
2
and therefore squaring and then taking the sup over t ∈ [0, 1] we get
2
sup d (γt , γ˜t ) ≤ 2
t∈[0,1]
n
2X
−1
d2 γi/2n , γ˜(i+1)/2n +
i=0
1 2n−2
d2 (γ0 , γ1 ) + d2 (˜
γ0 , γ˜1 ) .
(2.10)
Choosing γ˜ to be a constant geodesic and using (2.9), we get that µm ∈ P2 (Geod(X)) for every
˜ ∈ P(Geod(X)), by a gluing argument (Lemma 2.12 below with
m ∈ N. Now, for any given ν, ν
n
˜ in place of ν, ν˜, Y = Geod(X), Z = X 2 +1 ) we can find a plan β ∈ P([Geod(X)]2 ) such
ν, ν
that
1
π#
β = ν,
2
˜,
π#
β=ν
2n
2n
Y
Y
1
2
˜ ),
e0 , . . . , ei/2n , . . . , e1 ◦ π , e0 , . . . , ei/2n , . . . , e1 ◦ π
β ∈ Opt ( (ei/2n )# ν, (ei/2n )# ν
#
Q2n
i=0
i=0
Q2n
˜ is meant w.r.t. the Wasserstein diswhere optimality between i=0 (ei/2n )# ν and i=0 (ei/2n )# ν
n
˜ ) and using (2.10) we get that for
tance on P2 (X 2 +1 ). Using β to bound from above W2 (ν, ν
˜ ∈ P2 (Geod(X)) it holds
every couple of measures ν, ν
˜ ) ≤ 2W22
W22 (ν, ν
2n
Y
n
(ei/2n )# ν,
i=0
+
1
2n−2
Z
2
Y
˜
(ei/2n )# ν
i=0
d2 (γ0 , γ1 )dν(γ) +
4
Z
d2 (˜
γ0 , γ˜1 )dν(˜
γ)
as for Theorem 2.7 everything is simpler if closed balls in X are compact. Indeed, observe that a geodesic connecting two
points in BR (x0 ) lies entirely on the compact set B2R (x0 ), and that the set of geodesics lying on a given compact set is itself
compact in Geod(X), so that the tightness of (µn ) follows directly from the one of {µ0 , µ1 }.
33
0
˜ = µm and recalling that W2
Plugging ν = µm , ν
Q
Q2n
2n
m0
m
i=0 (ei/2n )# µ
i=0 (ei/2n )# µ ,
→ 0 as
0
m, m → +∞ for every n ∈ N we get that
Z
Z
1
2
m
2
m0
2
m
m0
d (γ0 , γ1 )dµ (γ) + d (˜
γ0 , γ˜1 )dµ (˜
γ)
lim W2 (µ , µ ) ≤ n−2
m, m0 →∞
2
1
= n−3 W22 (µ0 , µ1 ).
2
Letting n → ∞ we get that (µm ) ⊂ P2 (Geod(X)) is a Cauchy sequence and the conclusion.
Lemma 2.11 The multivalued map from G : X 2 → Geod(X) which associates to each pair (x, y)
the set G(x, y) of constant speed geodesics connecting x to y has closed graph.
Proof Straightforward.
Lemma 2.12 (A variant of gluing) Let Y, Z be Polish spaces, ν, ν˜ ∈ P(Y ) and f, g : Y → Z be
two Borel maps. Let γ ∈ Adm(f# ν, g# ν˜). Then there exists a plan β ∈ P(Y 2 ) such that
1
π#
β = ν,
2
π#
β = ν˜,
(f ◦ π 1 , g ◦ π 2 )# β = γ.
Proof Let {νz }, {˜
νz˜} be the disintegrations of ν, ν˜ w.r.t. f, g respectively. Then define
Z
β :=
νz × ν˜z˜ dγ(z, z˜).
Z2
Remark 2.13 (The Hilbert case) If X is an Hilbert space, then for every x, y ∈ X there exists only
one constant speed geodesic connecting them: the curve t 7→ (1−t)x+ty. Thus Theorem 2.10 reads
as: t 7→ µt is a constant speed geodesic if and only if there exists an optimal plan γ ∈ Opt (µ0 , µ1 )
such that
µt = (1 − t)π 1 + tπ 2 # γ.
If γ is induced by a map T , the formula further simplifies to
µt = (1 − t)Id + tT # µ0 .
(2.11)
Remark 2.14 A slight modification of the arguments presented in the second part of the proof of
Theorem 2.10 shows that if (X, d) is Polish and (P2 (X), W2 ) is geodesic, then (X, d) is geodesic
as well. Indeed, given x, y ∈ X and a geodesic (µt ) connecting δx to δy , we can build a measure
µ ∈ P(Geod(X)) satisfying (2.7). Then every γ ∈ supp(µ) is a geodesic connecting x to y.
Definition 2.15 (Non branching spaces) A geodesic space (X, d) is said non branching if for any
t ∈ (0, 1) a constant speed geodesic γ is uniquely determined by its initial point γ0 and by the point
γt . In other words, (X, d) is non branching if the map
Geod(X) 3 γ 7→ (γ0 , γt ) ∈ X 2 ,
is injective for any t ∈ (0, 1).
34
Non-branching spaces are interesting from the optimal transport point of view, because for such
spaces the behavior of geodesics in P2 (X) is particularly nice: optimal transport plan from intermediate measures to other measures along the geodesic are unique and induced by maps (it is quite
surprising that such a statement is true in this generality - compare the assumption of the proposition
below with the ones of Theorems 1.26, 1.33). Examples of non-branching spaces are Riemannian
manifolds, Banach spaces with strictly convex norms and Alexandrov spaces with curvature bounded
below. Examples of branching spaces are Banach spaces with non strictly convex norms.
Proposition 2.16 (Non branching and interior regularity) Let (X, d) be a Polish, geodesic, non
branching space. Then (P2 (X), W2 ) is non branching as well. Furthermore, if (µt ) ⊂ P2 (X) is a
constant speed geodesic, then for every t ∈ (0, 1) there exists only one optimal plan in Opt (µ0 , µt )
and this plan is induced by a map from µt . Finally, the measure µ ∈ P(Geod(X)) associated to
(µt ) via (2.7) is unique.
Proof Let (µt ) ⊂ P2 (X) be a constant speed geodesic and fix t0 ∈ (0, 1). Pick γ 1 ∈ Opt (µ0 , µt0 )
and γ 2 ∈ Opt (µt0 , µ1 ). We want to prove that both γ 1 and γ 2 are induced by maps from µt0 . To
this aim use the gluing lemma to find a 3-plan α ∈ P2 (X 3 ) such that
1,2
π#
α = γ1,
2,3
π#
α = γ2,
and observe that since (µt ) is a geodesic it holds
kd(π 1 , π 3 )kL2 (α) ≤ kd(π 1 , π 2 ) + d(π 2 , π 3 )kL2 (α) ≤ kd(π 1 , π 2 )kL2 (α) + kd(π 2 , π 3 )kL2 (α)
= kd(π 1 , π 2 )kL2 (γ 1 ) + kd(π 1 , π 2 )kL2 (γ 2 ) = W2 (µ0 , µt0 ) + W2 (µt0 , µ1 )
= W2 (µ0 , µ1 ),
so that (π 1 , π 3 )# α ∈ Opt (µ0 , µ1 ). Also, since the first inequality is actually an equality, we
have that d(x, y) + d(y, z) = d(x, z) for α-a.e. (x, y, z), which means that x, y, z lie along a
geodesic. Furthermore, since the second inequality is an equality, the functions (x, y, z) 7→ d(x, y)
and (x, y, z) 7→ d(y, z) are each a positive multiple of the other in supp(α). It is then immediate to
verify that for every (x, y, z) ∈ supp(α) it holds
d(x, y) = (1 − t0 )d(x, z),
d(y, z) = t0 d(x, z).
We now claim that for (x, y, z), (x0 , y 0 , z 0 ) ∈ supp(α) it holds (x, y, z) = (x0 , y 0 , z 0 ) if and only if
y = y 0 . Indeed, pick (x, y, z), (x0 , y, z 0 ) ∈ supp(α) and assume, for instance, that z 6= z 0 . Since
(π 1 , π 3 )# α is an optimal plan, by the cyclical monotonicity of its support we know that
2
2
d2 (x, z) + d2 (x0 , z 0 ) ≤ d2 (x, z 0 ) + d2 (x0 , z) ≤ d(x, y) + d(y, z 0 ) + d(x0 , y) + d(y, z)
2
2
= (1 − t0 )d(x, z) + t0 d(x0 , z 0 ) + (1 − t0 )d(x0 , z 0 ) + t0 d(x, z) ,
which, after some manipulation, gives d(x, z) = d(x0 , z 0 ) =: D. Again from the cyclical monotonicity of the support we have 2D2 ≤ d2 (x, z 0 ) + d2 (x0 , z), thus either d(x0 , z) or d(x, z 0 ) is ≥ than
D. Say d(x, z 0 ) ≥ D, so that it holds
D ≤ d(x, z 0 ) ≤ d(x, y) + d(y, z 0 ) = (1 − t0 )D + t0 D = D,
which means that the triple of points (x, y, z 0 ) lies along a geodesic. Since (x, y, z) lies on a geodesic
as well, by the non-branching hypothesis we get a contradiction.
35
Thus the map supp(α) 3 (x, y, z) 7→ y is injective. This means that there exists two maps
f, g : X → X such that (x, y, z) ∈ supp(α) if and only if x = f (y) and z = g(y). This is the same
as to say that γ 1 is induced by f and γ 2 is induced by g.
To summarize, we proved that given t0 ∈ (0, 1), every optimal plan γ ∈ Opt (µ0 , µt0 ) is induced
by a map from µt0 . Now we claim that the optimal plan is actually unique. Indeed, if there are two
of them induced by two different maps, say f and f 0 , then the plan
1
(f, Id)# µµt0 + (f 0 , Id)# µµt0 ,
2
would be optimal and not induced by a map.
It remains to prove that P2 (X) is non branching. Choose µ ∈ P2 (Geod(X)) such that (2.7)
holds, fix t0 ∈ (0, 1) and let γ be the unique optimal plan in Opt (µ0 , µt0 ). The thesis will be proved
if we show that µ depends only on γ. Observe that from Theorem 2.10 and its proof we know that
(e0 , et0 )# µ ∈ Opt (µ0 , µt0 ),
and thus (e0 , et0 )# µ = γ. By the non-branching hypothesis we know that (e0 , et0 ) : Geod(X) →
X 2 is injective. Thus it it invertible on its image: letting F the inverse map, we get
µ = F# γ,
and the thesis is proved.
Theorem 2.10 tells us not only that geodesics exists, but provides also a natural way to “interpolate” optimal plans: once we have the measure µ ∈ P(Geod(X)) satisfying (2.7), an optimal plan
from µt to µs is simply given by (et , es )# µ. Now, we know that the transport problem has a natural
dual problem, which is solved by the Kantorovich potential. It is then natural to ask how to interpolate potentials. In other words, if (ϕ, ϕc+ ) are c−conjugate Kantorovich potentials for (µ0 , µ1 ),
is there a simple way to find out a couple of Kantorovich potentials associated to the couple µt , µs ?
The answer is yes, and it is given - shortly said - by the solution of an Hamilton-Jacobi equation. To
see this, we first define the Hopf-Lax evolution semigroup Hts (which in Rd produces the viscosity
solution of the Hamilton-Jacobi equation) via the following formula:

d2 (x, y)


+ ψ(y),
if t < s,
inf


y∈X s − t





s
ψ(x),
if t = s,
(2.12)
Ht (ψ)(x) :=






d2 (x, y)


+ ψ(y),
if t > s,
 sup −
s−t
y∈X
To fully appreciate the mechanisms behind the theory, it is better to introduce the rescaled costs ct,s
defined by
d2 (x, y)
ct,s (x, y) :=
,
∀t < s, x, y ∈ X.
s−t
Observe that for t < r < s
ct,r (x, y) + cr,s (y, z) ≥ ct,s (x, z),
∀x, y, z ∈ X,
and equality holds if and only if there is a constant speed geodesic γ : [t, s] → X such that x =
t,s
γt , y = γr , z = γs . The notions of ct,s
+ and c− transforms, convexity/concavity and sub/superdifferential are defined as in Section 1.2, Definitions 1.8, 1.9 and 1.10.
The basic properties of the Hopf-Lax formula are collected in the following proposition:
36
Proposition 2.17 (Basic properties of the Hopf-Lax formula) We have the following three properties:
(i) For any t, s ∈ [0, 1] the map Hts is order preserving, that is φ ≤ ψ ⇒ Hts (φ) ≤ Hts (ψ).
(ii) For any t < s ∈ [0, 1] it holds
t,s t,s
Hst Hts (φ) = φc− c− ≤ φ,
t,s t,s
Hts Hst (φ) = φc+ c+ ≥ φ,
(iii) For any t, s ∈ [0, 1] it holds
Hts ◦ Hst ◦ Hts = Hts .
Proof The order preserving property is a straightforward consequence of the definition. To prove
property (ii) observe that
Hst Hts (φ) (x) = sup inf0 φ(x0 ) + ct,s (x0 , y) − ct,s (x, y) ,
y
x
t,s t,s
which gives the equality Hst Hts (φ) = φc− c− : in particular, choosing x0 = x we get the claim
(the proof of the other equation is similar). For the last property assume t < s (the other case is
similar) and observe that by (i) we have
H s ◦ H t ◦Hts ≥ Hts
| t {z }s
≥Id
and
Hts ◦ Hst ◦ Hts ≤ Hts .
| {z }
≤Id
The fact that Kantorovich potentials evolve according to the Hopf-Lax formula is expressed in
the following theorem. We remark that in the statement below one must deal at the same time with
c-concave and c-convex potentials.
Theorem 2.18 (Interpolation of potentials) Let (X, d) be a Polish geodesic space, (µt ) ⊂ P2 (X)
a constant speed geodesic in (P2 (X), W2 ) and ϕ a c = c0,1 -convex Kantorovich potential for the
couple (µ0 , µ1 ). Then the function ϕs := H0s (ϕ) is a ct,s -concave Kantorovich potential for the
couple (µs , µt ), for any t < s.
Similarly, if φ is a c-concave Kantorovich potential for (µ1 , µ0 ), then H1t (φ) is a ct,s -convex
Kantorovich potential for (µt , µs ) for any t < s.
Observe that that for t = 0, s = 1 the theorem reduces to the fact that H01 (ϕ) = (−ϕ)c+ is a cconcave Kantorovich potential for µ1 , µ0 , a fact that was already clear by the symmetry of the dual
problem discussed in Section 1.3.
Proof
We will prove only the first part of the statement, as the second is analogous.
Step 1. We prove that H0s (ψ) is a ct,s -concave function for any t < s and any ψ : X → R ∪ {+∞}.
This is a consequence of the equality
c0,s (x, y) = inf c0,t (z, y) + ct,s (x, z),
z∈X
37
from which it follows
H0s (ψ)(x) = inf c0,s (x, y) + ψ(y) = inf ct,s (x, z) +
y∈X
z∈X
inf c0,t (z, y) + ψ(y) .
y∈X
Step 2. Let µ ∈ P(Geod(X)) be a measure associated to the geodesic (µt ) via equation (2.7). We
claim that for every γ ∈ supp(µ) and s ∈ (0, 1] it holds
ϕs (γs ) = ϕ(γ0 ) + c0,s (γ0 , γs ).
(2.13)
Indeed the inequality ≤ comes directly from the definition by taking x = γ0 . To prove the opposite
one, observe that since (e0 , e1 )# µ ∈ Opt (µ0 , µ1 ) and ϕ is a c-convex Kantorovich potential for
µ0 , µ1 , we have from Theorem 1.13 that
ϕc− (γ1 ) = −c0,1 (γ0 , γ1 ) − ϕ(γ0 ),
thus
ϕ(x) = sup −c0,1 (x, y) − ϕc− (y) ≥ −c0,1 (x, γ1 ) − ϕc− (γ1 )
y∈X
= −c0,1 (x, γ1 ) + c0,1 (γ0 , γ1 ) + ϕ(γ0 ).
Plugging this inequality in the definition of ϕs we get
ϕs (γs ) = inf c0,s (x, γs ) + ϕ(x)
x∈X
≥ inf c0,s (x, γs ) − c0,1 (x, γ1 ) + c0,1 (γ0 , γ1 ) + ϕ(γ0 )
x∈X
s,1
≥ −c
(γs , γ1 ) + c0,1 (γ0 , γ1 ) − ϕ(γ0 ) = c0,s (γ0 , γs ) + ϕ(γ0 ).
Step 3. We know that an optimal transport plan from µt to µs is given by (et , es )# µ, thus to conclude
the proof we need to show that
t,s
ϕs (γs ) + (ϕs )c+ (γt ) = ct,s (γt , γs ),
∀γ ∈ supp(µ),
t,s
where (ϕs )c+ is the ct,s -conjugate of the ct,s -concave function ϕs . The inequality ≤ follows from
the definition of ct,s -conjugate. To prove opposite inequality start observing that
ϕs (y) = inf c0,s (x, y) + ϕ(y) ≤ c0,s (γ0 , y) + ϕ(γ0 )
x∈X
0,t
≤ c (γ0 , γt ) + ct,s (γt , y) + ϕ(γ0 ),
and conclude by
ct,s
ϕs+ (γt ) = inf ct,s (γt , y) − ϕs (y) ≥ −c0,t (γ0 , γt ) − ϕ(γ0 )
y∈X
= −c0,s (γ0 , γs ) + ct,s (γt , γs ) − ϕ(γ0 )
(2.13)
= ct,s (γt , γs ) − ϕs (γs ).
38
We conclude the section studying some curvature properties of (P2 (X), W2 ). We will focus
on spaces positively/non positively curved in the sense of Alexandrov, which are the non smooth
analogous of Riemannian manifolds having sectional curvature bounded from below/above by 0.
Definition 2.19 (PC and NPC spaces) A geodesic space (X, d) is said to be positively curved (PC)
in the sense of Alexandrov if for every constant speed geodesic γ : [0, 1] → X and every z ∈ X the
following concavity inequality holds:
d2 γt , z ≥ (1 − t)d2 γ0 , z + td2 γ1 , z − t(1 − t)d2 γ0 , γ1 .
(2.14)
Similarly, X is said to be non positively curved (NPC) in the sense of Alexandrov if the converse
inequality always holds.
Observe that in an Hilbert space equality holds in (2.14).
The result here is that (P2 (X), W2 ) is PC if (X, d) is, while in general it is not NPC if X is.
Theorem 2.20 ((P2 (X), W2 ) is PC if (X, d) is) Assume that (X, d) is positively curved. Then
(P2 (X), W2 ) is positively curved as well.
Proof Let (µt ) be a constant speed geodesic in P2 (X) and ν ∈ P2 (X). Let µ ∈ P2 (Geod(X))
be a measure such that
µt = (et )# µ,
∀t ∈ [0, 1],
as in Theorem 2.10. Fix t0 ∈ [0, 1] and choose γ ∈ Opt (µt0 , ν). Using a gluing argument (we omit
the details) it is possible to show the existence a measure α ∈ P(Geod(X) × X) such that
Geod(X)
α = µ,
X
α = γ,
π#
et0 , π
#
(2.15)
where π Geod(X) (γ, x) := γ ∈ Geod(X), π X (γ, x) := x ∈ X and et0 (γ, x) := γt0 ∈ X. Then α
satisfies also
e0 , π X # α ∈ Adm(µ0 , ν)
(2.16)
e1 , π X # α ∈ Adm(µ1 , ν),
and therefore it holds
W22 (µt0 , ν)
Z
d2 (et0 (γ), x)dα(γ, x)
Z
(2.14)
≥
(1 − t0 )d2 γ0 , z + t0 d2 γ1 , z − t0 (1 − t0 )d2 γ0 , γ1 dα(γ, x)
Z
Z
(2.15)
= (1 − t0 ) d2 γ0 , z dα(γ, x) + t0 d2 γ1 , z dα(γ, x)
Z
− t0 (1 − t0 ) d2 γ0 , γ1 dµ(γ)
=
(2.16)
≥ (1 − t0 )W22 (µ0 , ν) + t0 W22 (µ1 , ν) − t0 (1 − t0 )W22 (µ0 , µ1 ),
and by the arbitrariness of t0 we conclude.
39
Example 2.21 ((P2 (X), W2 ) may be not NPC if (X, d) is) Let X = R2 with the Euclidean distance. We will prove that (P2 (R2 ), W2 ) is not NPC. Define
µ0 :=
1
(δ(1,1) + δ(5,3) ),
2
µ1 :=
1
(δ(−1,1) + δ(−5,3) ),
2
ν :=
1
(δ(0,0) + δ(0,−4) ),
2
then explicit computations show that W22 (µ0 , µ1 ) = 40 and W22 (µ0 , ν) = 30 = W22 (µ1 , ν). The
unique constant speed geodesic (µt ) from µ0 to µ1 is given by
µt =
1
δ(1−6t,1+2t) + δ(5−6t,3−2t) ,
2
and simple computations show that
40 = W22 (µ1/2 , ν) >
30 30 40
+
− .
2
2
4
2.3 X Riemannian manifold
In this section X will always be a compact, smooth Riemannian manifold M without boundary,
endowed with the Riemannian distance d.
We study two aspects: the first one is the analysis of some important consequences of Theorem
2.18 about the structure of geodesics in P2 (M ), the second one is the introduction of the so called
weak Riemannian structure of (P2 (M ), W2 ).
Notice that since M is compact, P2 (M ) = P(M ). Yet, we stick to the notation P2 (M )
because all the statements we make in this section are true also for non compact manifolds (although,
for simplicity, we prove them only in the compact case).
2.3.1
Regularity of interpolated potentials and consequences
We start observing how Theorem 2.10 specializes to the case of Riemannian manifolds:
Corollary 2.22 (Geodesics in (P2 (M ), W2 )) Let (µt ) ⊂ P2 (M ). Then the following two things
are equivalent:
i) (µt ) is a geodesic in (P2 (M ), W2 ),
ii) there exists a plan γ ∈ P(T M ) (T M being the tangent bundle of M ) such that
Z
|v|2 dγ(x, v) = W22 (µ0 , µ1 ),
Exp(t) # γ = µt ,
(2.17)
Exp(t) : T M → M being defined by (x, v) 7→ expx (tv).
Also, for any µ, ν ∈ P2 (M ) such that µ is a regular measure (Definition 1.32), the geodesic connecting µ to ν is unique.
Notice that we cannot substitute the first equation in (2.17) with (π M , exp)# γ ∈ Opt (µ0 , µ1 ), because this latter condition is strictly weaker (it may be that the curve t 7→ expx (tv) is not a globally
minimizing geodesic from x to expx (v) for some (x, v) ∈ supp γ).
40
Proof The implication (i) ⇒ (ii) follows directly from Theorem 2.10 by taking into account the
fact that t 7→ γt is a constant speed geodesic on M implies that for some (x, v ∈ T M ) it holds
γt = expx (tv) and in this case d(γ0 , γ1 ) = |v|.
For the converse implication, just observe that from the second equation in (2.17) we have
Z
Z
W22 (µt , µs ) ≤ d2 expx (tv), expx (sv) dγ(x, v) ≤ (t−s)2 |v|2 dγ(x, v) = (t−s)2 W22 (µ0 , µ1 ),
having used the first equation in (2.17) in the last step.
To prove the last claim just recall that by Remark 1.35 we know that for µ-a.e. x there exists
a unique geodesic connecting x to T (x), T being the optimal transport map. Hence the conclusion
follows from (ii) of Theorem 2.10.
Now we discuss the regularity properties of Kantorovich potentials which follows from Theorem
2.18.
Corollary 2.23 (Regularity properties of the interpolated potentials) Let ψ be a c−convex potential for (µ0 , µ1 ) and let ϕ := H01 (ψ). Define ψt := H0t (ψ), ϕt := H1t (ϕ) and choose a geodesic
(µt ) from µ0 to µ1 . Then for every t ∈ (0, 1) it holds:
i) ψt ≥ ϕt and both the functions are real valued,
ii) ψt = ϕt on supp(µt ),
iii) ψt and ϕt are differentiable in the support of µt and on this set their gradients coincide.
Proof For (i) we have
ϕt = H1t (ϕ) = (H1t ◦ H01 )(ψ) = (H1t ◦ Ht1 ◦H0t )ψ ≤ H0t (ψ) = ψt .
| {z }
≤Id
Now observe that by definition, ψt (x) < +∞ and ϕt (x) > −∞ for every x ∈ M , thus it holds
+∞ > ψt (x) ≥ ϕt (x) > −∞,
∀x ∈ M.
To prove (ii), let µ be the unique plan associated to the geodesic (µt ) via (2.7) (recall Proposition 2.16 for uniqueness) and pick γ ∈ supp(µ). Recall that it holds
ψt (γt ) = c0,t (γ0 , γt ) + ψ(γ0 ),
ϕt (γt ) = ct,1 (γt , γ1 ) + ϕ(γ1 ).
Thus from ϕ(γ1 ) = c0,1 (γ0 , γ1 ) + ψ(γ0 ) we get that ψt (γt ) = ϕt (γt ). Since µt = (et )# µ, the
compactness of M gives supp(µt ) = {γt }γ∈supp(µ) , so that (ii) follows.
Now we turn to (iii). With the same choice of t 7→ γt as above, recall that it holds
ψt (γt ) = c0,t (γ0 , γt ) + ψ(γ0 )
ψt (x) ≤ c0,t (γ0 , x) + ψ(γ0 ),
∀x ∈ M,
and that the function x 7→ c0,t (γ0 , x)+ψ(γ0 ) is superdifferentiable at x = γt . Thus the function x 7→
ψt is superdifferentiable at x = γt . Similarly, ϕt is subdifferentiable at γt . Choose v1 ∈ ∂ + ψt (γt ),
v2 ∈ ∂ − ϕt (γt ) and observe that
−1
ψt (γt )+ v1 , exp−1
γt (x) +o(D(x, γt )) ≥ ψt (x) ≥ ϕt (x) ≥ ϕt (γt )+ v2 , expγt (x) +o(D(x, γt )),
which gives v1 = v2 and the thesis.
41
Corollary 2.24 (The intermediate transport maps are locally Lipschitz) Let (µt ) ⊂ P2 (M ) a
constant speed geodesic in (P2 (M ), W2 ). Then for every t ∈ (0, 1) and s ∈ [0, 1] there exists only
one optimal transport plan from µt to µs , this transport plan is induced by a map, and this map is
locally Lipschitz.
Note: clearly in a compact setting being locally Lipschitz means being Lipschitz. We wrote ‘locally’
because this is the regularity of transport maps in the non compact situation.
Proof Fix t ∈ (0, 1) and, without loss of generality, let s = 1. The fact that the optimal plan from
is unique and induced by a map is known by Proposition 2.16. Now let v be the vector field defined
on supp(µt ) by v(x) = ∇ϕt = ∇ψt (we are using part (iii) of the above corollary, with the same
notation). The fact that ψt is a c0,t -concave potential for the couple µt , µ0 tells that the optimal
0,t
transport map T satisfies T (x) ∈ ∂ c+ φt (x) for µt -a.e. x. Using Theorem 1.33, the fact that ψt is
differentiable in supp(µt ) and taking into account the scaling properties of the cost, we get that T
may be written as T (x) = expx −v(x). Since the exponential map is C ∞ , the fact that T is Lipschitz
will follow if we show that the vector field v on supp(µt ) is, when read in charts, Lipschitz.
Thus, passing to local coordinates and recalling that d2 (·, y) is uniformly semiconcave, the situation is the following. We have a semiconcave function f : Rd → R and a semiconvex function
g : Rd → R such that f ≥ g on Rd , f = g on a certain closed set K and we have to prove that the
vector field u : K → Rd defined by u(x) = ∇f (x) = ∇g(x) is Lipschitz. Up to rescaling we may
assume that f and g are such that f − | · |2 is concave and g + | · |2 is convex. Then for every x ∈ K
and y ∈ Rd we have
hu(x), y − xi − |x − y|2 ≤ g(y) − g(x) ≤ f (y) − f (x) ≤ hu(x), y − xi + |y − x|2 ,
and thus for every x ∈ K, y ∈ Rd it holds
|f (y) − f (x) − hu(x), y − xi | ≤ |x − y|2 .
Picking x1 , x2 ∈ K and y ∈ Rd we have
f (x2 ) − f (x1 ) − hu(x1 ), x2 − x1 i ≤ |x1 − x2 |2 ,
f (x2 + y) − f (x2 ) − hu(x2 ), yi ≤ |y|2 ,
−f (x2 + y) + f (x1 ) + hu(x1 ), x2 + y − x1 i ≤ |x2 + y − x1 |2 .
Adding up we get
hu(x1 ) − u(x2 ), yi ≤ |x1 − x2 |2 + |y|2 + |x2 + y − x1 |2 ≤ 3(|x1 − x2 |2 + |y|2 ).
Eventually, choosing y = (u(x1 ) − u(x2 ))/6 we obtain
|u(x1 ) − u(x2 )|2 ≤ 36|x1 − x2 |2 .
It is worth stressing the fact that the regularity property ensured by the previous corollary holds
without any assumption on the measures µ0 , µ1 .
Remark 2.25 (A (much) simpler proof in the Euclidean case) The fact that intermediate transport maps are Lipschitz can be proved, in the Euclidean case, via the theory of monotone operators. Indeed if G : Rd → Rd is a - possibly multivalued - monotone map (i.e. satisfies
hy1 − y2 , x1 − x2 i ≥ 0 for every x1 , x2 ∈ Rd , yi ∈ G(xi ), i = 1, 2), then the operator
42
((1 − t)Id + tG)−1 is single valued, Lipschitz, with Lipschitz constant bounded above by 1/(1 − t).
To prove this, pick x1 , x2 ∈ Rd , y1 ∈ G(x1 ), y2 ∈ G(x2 ) and observe that
|(1 − t)x1 + ty1 − (1 − t)x2 + ty2 |2
= (1 − t)2 |x1 − x2 |2 + t2 |y1 − y2 |2 + 2t(1 − t) hx1 − x2 , y1 − y2 i ≥ (1 − t)2 |x1 − x2 |2 ,
which is our claim.
Now pick µ0 , µ1 ∈ P2 (Rd ), an optimal plan γ ∈ Opt (µ0 , µ1 ) and consider the geodesic t 7→
µt := ((1 − t)π 1 + tπ 2 )# γ (recall Remark 2.13). From Theorem 1.26 we know that there exists a
convex function ϕ such that supp(γ) ⊂ ∂ − ϕ. Also, we know that the unique optimal plan from µ0
to µt is given by the formula
π 1 , (1 − t)π 1 + tπ 2 # γ,
which is therefore supported in the graph of (1 − t)Id + t∂ − ϕ. Since the subdifferential of a convex
function is a monotone operator, the thesis follows from the previous claim.
Considering the case in which µ1 is a delta and µ0 is not, we can easily see that the bound
(1 − t)−1 on the Lipschitz constant of the optimal transport map from µt to µ0 is sharp.
An important consequence of Corollary 2.24 is the following proposition:
Proposition 2.26 (Geodesic convexity of the set of absolutely continuous measures) Let M be a
Riemannian manifold, (µt ) ⊂ P2 (M ) a geodesic and assume that µ0 is absolutely continuous w.r.t.
the volume measure (resp. gives 0 mass to Lipschitz hypersurfaces of codimension 1). Then µt is
absolutely continuous w.r.t. the volume measure (resp. gives 0 mass to Lipschitz hypersurfaces of
codimension 1) for every t < 1. In particular, the set of absolutely continuous measures is geodesically convex (and the same for measures giving 0 mass to Lipschitz hypersurfaces of codimension
1).
Proof Assume that µ0 is absolutely continuous, let A ⊂ M be of 0 volume measure, t ∈ (0, 1)
and let Tt be the optimal transport map from µt to µ0 . Then for every Borel set A ⊂ M it holds
Tt−1 (Tt (A)) ⊃ A and thus
µt (A) ≤ µt (Tt−1 (Tt (A))) = µ0 (Tt (A)).
The claims follow from the fact that Tt is locally Lipschitz.
Remark 2.27 (The set of regular measures is not geodesically convex) It is natural to ask
whether the same conclusion of the previous proposition holds for the set of regular measures
(Definitions 1.25 and 1.32). The answer is not: there are examples of regular measures µ0 , µ1 in
P2 (R2 ) such that the middle point of the geodesic connecting them is not regular.
2.3.2
The weak Riemannian structure of (P2 (M ), W2 )
In order to introduce the weak differentiable structure of (P2 (X), W2 ), we start with some heuristic
considerations. Let X = Rd and (µt ) be a constant speed geodesic on P2 (Rd ) induced by some
optimal map T , i.e.:
µt = (1 − t)Id + tT # µ0 .
Then a simple calculation shows that (µt ) satisfies the continuity equation
d
µt + ∇ · (vt µt ) = 0,
dt
43
with vt := (T − Id) ◦ ((1 − t)Id + tT )−1 for every t, in the sense of distributions. Indeed for
φ ∈ Cc∞ (Rd ) it holds
Z
Z
Z
Z
d
d
φdµt =
φ (1−t)Id+tT dµ0 = h∇φ (1−t)Id+tT , T −Idi dµ0 = h∇φ, vt idµt .
dt
dt
Now, the continuity equation describes the link between the motion of the continuum µt and the
instantaneous velocity vt : Rd → Rd of every “atom” of µt . It is therefore natural to think at the
vector field vt as the infinitesimal variation of the continuum µt .
From this perspective, one might expect that the set of “smooth” curves on P2 (Rd ) (and more
generally on P2 (M )) is somehow linked to the set of solutions of the continuity equation. This is
actually the case, as we are going to discuss now.
In order to state the rigorous result, we need to recall the definition of absolutely continuous curve
on a metric space.
˜ be a metric space and let [0, 1] 3 t 7→
Definition 2.28 (Absolutely continuous curve) Let (Y, d)
yt ∈ Y be a curve. Then (yt ) is said absolutely continuous if there exists a function f ∈ L1 (0, 1)
such that
Z s
˜ t , ys ) ≤
d(y
f (r)dr,
∀t < s ∈ [0, 1].
(2.18)
t
We recall that if (yt ) is absolutely continuous, then for a.e. t the metric derivative |y˙ t | exists,
given by
˜ t+h , yt )
d(y
,
(2.19)
|y˙ t | := lim
h→0
|h|
and that |y˙ t | ∈ L1 (0, 1) and is the smallest L1 function (up to negligible sets) for which inequality
(2.18) is satisfied (see e.g. Theorem 1.1.2 of [6] for the simple proof).
The link between absolutely continuous curves in P2 (M ) and the continuity equation is given
by the following theorem:
Theorem 2.29 (Characterization of absolutely continuous curves in (P2 (M ), W2 )) Let M be a
smooth complete Riemannian manifold without boundary. Then the following holds.
(A) For every absolutely continuous curve (µt ) ⊂ P2 (M ) there exists a Borel family of vector fields
vt on M such that kvt kL2 (µt ) ≤ |µ˙ t | for a.e. t and the continuity equation
d
µt + ∇ · (vt µt ) = 0,
dt
(2.20)
holds in the sense of distributions.
(B) If (µt , vt ) satisfies the continuity equation (2.20) in the sense of distributions and
R1
kvt kL2 (µt ) dt < ∞, then up to redefining t 7→ µt on a negligible set of times, (µt ) is an ab0
solutely continuous curve on P2 (M ) and |µ˙ t | ≤ kvt kL2 (µt ) for a.e. t ∈ [0, 1].
Note that we are not assuming any kind of regularity on the µt ’s.
We postpone the (sketch of the) proof of this theorem to the end of the section, for the moment
we analyze its consequences in terms of the geometry of P2 (M ).
The first important consequence is that the Wasserstein distance, which was defined via the
‘static’ optimal transport problem, can be recovered via the following ‘dynamic’ Riemannian-like
formula:
44
Proposition 2.30 (Benamou-Brenier formula) Let µ0 , µ1 ∈ P2 (M ). Then it holds
Z 1
0
1
kvt kµt dt ,
W2 (µ , µ ) = inf
(2.21)
0
where the infimum is taken among all weakly continuous distributional solutions of the continuity
equation (µt , vt ) such that µ0 = µ0 and µ1 = µ1 .
Proof We start with inequality ≤. Let (µt , vt ) be a solution of the continuity equation. Then if ∫_0^1 ‖vt‖L2(µt) dt = +∞ there is nothing to prove. Otherwise we may apply part (B) of Theorem 2.29 to get that (µt ) is an absolutely continuous curve on P2 (M ). The conclusion follows from

W2 (µ⁰, µ¹) ≤ ∫_0^1 |µ̇t | dt ≤ ∫_0^1 ‖vt‖L2(µt) dt,

where in the last step we used part (B) of Theorem 2.29 again.
To prove the converse inequality it is enough to consider a constant speed geodesic (µt ) connecting µ⁰ to µ¹ and apply part (A) of Theorem 2.29 to get the existence of vector fields vt such that the continuity equation is satisfied and ‖vt‖L2(µt) ≤ |µ̇t | = W2 (µ⁰, µ¹) for a.e. t ∈ [0, 1]. Then we have

W2 (µ⁰, µ¹) ≥ ∫_0^1 ‖vt‖L2(µt) dt,

as desired.
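A quick numerical illustration of this proof (not in the original text): in one dimension with Gaussian endpoints, chosen only because the optimal map is explicit, the geodesic velocity has constant L² norm equal to W2(µ⁰, µ¹), so the static and dynamic values in (2.21) coincide.

```python
import numpy as np

# Illustrative check of the Benamou-Brenier formula (2.21); the Gaussian endpoints are assumptions.
rng = np.random.default_rng(1)
m0, s0, m1, s1 = 0.0, 1.0, 3.0, 2.0
x0 = rng.normal(m0, s0, size=200_000)            # samples of mu^0 = N(m0, s0^2)
T = lambda x: m1 + (s1 / s0) * (x - m0)          # monotone, hence optimal, map mu^0 -> mu^1

# Static value: W_2(mu^0, mu^1) = ||T - Id||_{L^2(mu^0)} (= sqrt((m1-m0)^2 + (s1-s0)^2) here).
W2_static = np.sqrt(np.mean((T(x0) - x0) ** 2))

# Dynamic value: along mu_t = ((1-t)Id + tT)_# mu^0 the speed of the particle started at x
# is T(x) - x, independent of t, so ||v_t||_{L^2(mu_t)} is constant in t and the time
# integral in (2.21) equals that common value.
norm_vt = np.sqrt(np.mean((T(x0) - x0) ** 2))    # = ||v_t||_{L^2(mu_t)} for every t
W2_dynamic = norm_vt * 1.0                       # integral over [0,1] of a constant

print(W2_static, W2_dynamic, np.hypot(m1 - m0, s1 - s0))   # all three agree up to sampling error
```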
This proposition strongly suggests that the scalar product in L2 (µ) should be considered as the
metric tensor on P2 (M ) at µ. Now observe that given an absolutely continuous curve (µt ) ⊂
P2 (M ) in general there is no unique choice of vector field (vt ) such that the continuity equation
(2.20) is satisfied. Indeed, if (2.20) holds and wt is a Borel family of vector fields such that ∇ ·
(wt µt ) = 0 for a.e. t, then the continuity equation is satisfied also with the vector fields (vt + wt ). It
is then natural to ask whether there is some natural selection principle to associate uniquely a family
of vector fields (vt ) to a given absolutely continuous curve. There are two possible approaches:
Algebraic approach. The fact that for distributional solutions of the continuity equation the vector
field vt acts only on gradients of smooth functions suggests that the vt ’s should be taken in the set of
gradients as well, or, more rigorously, vt should belong to
the closure in L2 (µt ) of the set {∇ϕ : ϕ ∈ Cc∞ (M )}    (2.22)
for a.e. t ∈ [0, 1].
Variational approach. The fact that the continuity equation is linear in vt and the L2 norm is strictly convex, implies that there exists a unique, up to negligible sets in time, family of vector fields vt ∈ L2 (µt ), t ∈ [0, 1], with minimal norm for a.e. t, among the vector fields compatible with the curve (µt ) via the continuity equation. In other words, for any other vector field (ṽt ) compatible with the curve (µt ) in the sense that (2.20) is satisfied, it holds ‖ṽt‖L2(µt) ≥ ‖vt‖L2(µt) for a.e. t. It is immediate to verify that vt is of minimal norm if and only if it belongs to the set

{ v ∈ L2 (µt ) : ∫ ⟨v, w⟩ dµt = 0, ∀w ∈ L2 (µt ) s.t. ∇ · (wµt ) = 0 }.    (2.23)
The important point here is that the sets defined by (2.22) and (2.23) are the same, as it is easy to
check. Therefore it is natural to give the following
Definition 2.31 (The tangent space) Let µ ∈ P2 (M ). Then the tangent space Tanµ (P2 (M )) to P2 (M ) at µ is defined as

Tanµ (P2 (M )) := the closure in L2 (µ) of {∇ϕ : ϕ ∈ Cc∞ (M )}
              = { v ∈ L2 (µ) : ∫ ⟨v, w⟩ dµ = 0, ∀w ∈ L2 (µ) s.t. ∇ · (wµ) = 0 }.
Thus we now have a definition of tangent space for every µ ∈ P2 (M ) and this tangent space is naturally endowed with a scalar product: the one of L2 (µ). This fact, Theorem 2.29 and Proposition 2.30
are the bases of the so-called weak Riemannian structure of (P2 (M ), W2 ).
We now state, without proof, some other properties of (P2 (M ), W2 ) which resemble those of a
Riemannian manifold. For simplicity, we will deal with the case M = Rd only and we will assume
that the measures we are dealing with are regular (Definition 1.25), but analogous statements hold
for general manifolds and general measures.
In the next three propositions (µt ) is an absolutely continuous curve in P2 (Rd ) such that µt is
regular for every t. Also (vt ) is the unique, up to a negligible set of times, family of vector fields
such that the continuity equation holds and vt ∈ Tanµt (P2 (Rd )) for a.e. t.
Proposition 2.32 (vt can be recovered by infinitesimal displacement) Let (µt ) and (vt ) as above. Also, let Tt^s be the optimal transport map from µt to µs (which exists and is unique by Theorem 1.26, due to our assumptions on µt ). Then for a.e. t ∈ [0, 1] it holds

vt = lim_{s→t} (Tt^s − Id) / (s − t),
the limit being understood in L2 (µt ).
Proposition 2.33 (“Displacement tangency”) Let (µt ) and (vt ) as above. Then for a.e. t ∈ [0, 1]
it holds
lim_{h→0} W2 ( µt+h , (Id + hvt )# µt ) / h = 0.    (2.24)
Proposition 2.34 (Derivative of the squared distance) Let (µt ) and (vt ) as above and ν ∈
P2 (Rd ). Then for a.e. t ∈ [0, 1] it holds
d/dt W2²(µt , ν) = −2 ∫ ⟨vt , Tt − Id⟩ dµt ,

where Tt is the unique optimal transport map from µt to ν (which exists and is unique by Theorem 1.26, due to our assumptions on µt ).
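These propositions can also be probed numerically in simple situations. The sketch below (not in the original text; the one-dimensional Gaussian data are illustrative assumptions) estimates both sides of the identity of Proposition 2.34 from samples, using central differences for the left-hand side.

```python
import numpy as np

# Check of Proposition 2.34 in 1-D: mu_t is the geodesic from N(0,1) to N(4, 2^2), nu = N(1, 1.5^2).
rng = np.random.default_rng(2)
x0 = rng.normal(0.0, 1.0, size=200_000)
T01 = lambda x: 4.0 + 2.0 * x                  # optimal map mu_0 -> mu_1 = N(4, 2^2)
m_nu, s_nu = 1.0, 1.5

def transport_data(t):
    xt = (1 - t) * x0 + t * T01(x0)            # samples of mu_t
    m_t, s_t = 4.0 * t, (1 - t) + 2.0 * t      # mean and standard deviation of mu_t
    Tt = m_nu + (s_nu / s_t) * (xt - m_t)      # monotone (optimal) map mu_t -> nu
    return xt, Tt

def W2sq(t):
    xt, Tt = transport_data(t)
    return np.mean((Tt - xt) ** 2)             # W_2^2(mu_t, nu) = ||T_t - Id||^2_{L^2(mu_t)}

t, h = 0.4, 1e-4
lhs = (W2sq(t + h) - W2sq(t - h)) / (2 * h)    # d/dt W_2^2(mu_t, nu) by central differences
xt, Tt = transport_data(t)
vt = T01(x0) - x0                              # v_t evaluated at the samples of mu_t
rhs = -2.0 * np.mean(vt * (Tt - xt))
print(lhs, rhs)                                # agreement up to O(h^2) + sampling error
```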
We conclude the section with a sketch of the proof of Theorem 2.29.
Sketch of the Proof of Theorem 2.29
Reduction to the Euclidean case Suppose we already know the result for the case Rd and we
want to prove it for a compact and smooth manifold M . Use the Nash embedding theorem to get
the existence of a smooth map i : M → RD whose differential provides an isometry of Tx M
and its image for any x ∈ M . Now notice that the inequality |i(x) − i(y)| ≤ d(x, y) valid for
any x, y ∈ M ensures that W2 (i# µ, i# ν) ≤ W2 (µ, ν) for any µ, ν ∈ P2 (M ). Hence given an
absolutely continuous curve (µt ) ⊂ P2 (M ), the curve (i# µt ) ⊂ P2 (RD ) is absolutely continuous
as well, and there exists a family vector fields vt such that (2.20) is fulfilled with i# µt in place of µt
and ‖vt‖L2(i#µt) ≤ |(i# µt )˙| ≤ |µ̇t | for a.e. t. Testing the continuity equation with functions constant
on i(M ) we get that for a.e. t the vector field vt is tangent to i(M ) for i# µt -a.e. point. Thus the vt ’s
are the (isometric) image of vector fields on M and part (A) is proved.
Viceversa, let (µt ) ⊂ P2 (M ) be a curve and the vt ’s vector fields in M such that ∫_0^1 ‖vt‖L2(µt) dt < ∞ and assume that they satisfy the continuity equation. Then the measures µ̃t := i# µt and the vector fields ṽt := di(vt ) satisfy the continuity equation on RD . Therefore (µ̃t ) is an absolutely continuous curve and it holds |µ̃˙t | ≤ ‖ṽt‖L2(µ̃t) = ‖vt‖L2(µt) for a.e. t. Notice that i is bilipschitz and therefore (µt ) is absolutely continuous as well. Hence to conclude it is sufficient to show that |µ̃˙t | = |µ̇t | for a.e. t. To prove this, one can notice that the fact that i is bilipschitz and the validity of

lim_{r→0} sup_{x,y∈M, d(x,y)<r} d(x, y) / |i(x) − i(y)| = 1,

give that

lim_{r→0} sup_{µ,ν∈P2 (M ), W2 (µ,ν)<r} W2 (µ, ν) / W2 (i# µ, i# ν) = 1.
We omit the details.
Part A. Fix ϕ ∈ Cc∞ (Rd ) and observe that for every γt^s ∈ Opt(µt , µs ) it holds

∫ ϕ dµs − ∫ ϕ dµt = ∫ ϕ(y) dγt^s (x, y) − ∫ ϕ(x) dγt^s (x, y)
  = ∫ ( ϕ(y) − ϕ(x) ) dγt^s (x, y)
  = ∫ ∫_0^1 ⟨∇ϕ(x + λ(y − x)), y − x⟩ dλ dγt^s (x, y)
  = ∫ ⟨∇ϕ(x), y − x⟩ dγt^s (x, y) + Rem(ϕ, t, s)
  ≤ ( ∫ |∇ϕ(x)|² dγt^s (x, y) )^{1/2} ( ∫ |x − y|² dγt^s (x, y) )^{1/2} + Rem(ϕ, t, s)    (2.25)
  = ‖∇ϕ‖L2(µt) W2 (µt , µs ) + Rem(ϕ, t, s),
where the remainder term Rem(ϕ, t, s) can be bounded by

|Rem(ϕ, t, s)| ≤ (Lip(∇ϕ)/2) ∫ |x − y|² dγt^s (x, y) = (Lip(∇ϕ)/2) W2²(µt , µs ).
Thus (2.25) implies that the map t ↦ ∫ ϕ dµt is absolutely continuous for any ϕ ∈ Cc∞ (Rd ).
Now let D ⊂ Cc∞ (Rd ) be a countable set such that {∇ϕ : ϕ ∈ D} is dense in Tanµt (P2 (Rd )) for every t ∈ [0, 1] (the existence of such D follows from the compactness of {µt }t∈[0,1] ⊂ P2 (Rd ), we omit the details). The above arguments imply that there exists a set A ⊂ [0, 1] of full Lebesgue measure such that t ↦ ∫ ϕ dµt is differentiable at t ∈ A for every ϕ ∈ D; we can also assume that the metric derivative |µ̇t | exists for every t ∈ A. Also, by (2.25) we know that for t0 ∈ A the linear functional Lt0 : {∇ϕ : ϕ ∈ D} → R given by

∇ϕ ↦ Lt0 (∇ϕ) := d/dt|_{t=t0} ∫ ϕ dµt
satisfies

|Lt0 (∇ϕ)| ≤ ‖∇ϕ‖L2(µt0) |µ̇t0 |,

and thus it can be uniquely extended to a linear and bounded functional on Tanµt0 (P2 (Rd )). By the Riesz representation theorem there exists a vector field vt0 ∈ Tanµt0 (P2 (Rd )) such that

d/dt|_{t=t0} ∫ ϕ dµt = Lt0 (∇ϕ) = ∫ ⟨∇ϕ, vt0 ⟩ dµt0 ,    ∀ϕ ∈ D,    (2.26)

and whose norm in L2 (µt0 ) is bounded above by the metric derivative |µ̇t | at t = t0 .
prove that the continuity equation is satisfied in the sense of distributions. This is a consequence of
(2.26), see Theorem 8.3.1 of [6] for the technical details.
Part B. Up to a time reparametrization argument, we can assume that ‖vt‖L2(µt) ≤ L for some L ∈ R for a.e. t. Fix a Gaussian family of mollifiers ρε and define

µεt := µt ∗ ρε ,    vtε := ((vt µt ) ∗ ρε ) / µεt .

It is clear that

d/dt µεt + ∇ · (vtε µεt ) = 0.

Moreover, from Jensen's inequality applied to the map (X, z) ↦ z|X/z|² = |X|²/z (with X = vt µt , z = µt ) it follows that

‖vtε‖L2(µεt) ≤ ‖vt‖L2(µt) ≤ L.    (2.27)
This bound, together with the smoothness of vtε , implies that there exists a unique locally Lipschitz map Tε (·, ·) : [0, 1] × Rd → Rd satisfying

d/dt Tε (t, x) = vtε (Tε (t, x)),    ∀x ∈ Rd , a.e. t ∈ [0, 1],
Tε (0, x) = x,    ∀x ∈ Rd .
A simple computation shows that the curve t ↦ µ̃εt := Tε (t, ·)# µε0 solves

d/dt µ̃εt + ∇ · (vtε µ̃εt ) = 0,    (2.28)

which is the same equation solved by (µεt ). It is possible to show that this fact together with the smoothness of the vtε ’s and the equality µε0 = µ̃ε0 gives that µ̃εt = µεt for every t, ε (see Proposition 8.1.7 and Theorem 8.3.1 of [6] for a proof of this fact).
Conclude observing that

W2²(µεt , µεs ) ≤ ∫ |Tε (t, x) − Tε (s, x)|² dµε0 (x) = ∫ | ∫_t^s vrε (Tε (r, x)) dr |² dµε0 (x)
  ≤ |t − s| ∫ ∫_t^s |vrε (Tε (r, x))|² dr dµε0 = |t − s| ∫_t^s ‖vrε (Tε (r, ·))‖²L2(µε0) dr
  = |t − s| ∫_t^s ‖vrε‖²L2(µεr) dr ≤ |t − s|² L²    (by (2.27)),
and that, by the characterization of convergence (2.4), W2 (µεt , µt ) → 0 as ε → 0 for every t ∈ [0, 1].
2.4 Bibliographical notes
To call the distance W2 the ‘Wasserstein distance’ is not quite fair: a much more appropriate name would be ‘Kantorovich distance’. Also, the spelling ‘Wasserstein’ is questionable, as the original one was
‘Vasershtein’. Yet, this terminology is nowadays so common that it would be impossible to change
it.
The equivalence (2.4) has been proven by the authors and G. Savaré in [6]. In the same reference
Remark 2.8 has been first made. The fact that (P2 (X), W2 ) is complete and separable as soon as
(X, d) is belongs to the folklore of the theory, a proof can be found in [6]. Proposition 2.4 was proved
by C. Villani in [79], Theorem 7.12.
The terminology displacement interpolation was introduced by McCann [63] for probability measures in Rd . Theorem 2.10 appears in this form here for the first time: in [58] the theorem was proved
in the compact case, in [80] (Theorem 7.21) this has been extended to locally compact structures and
much more general forms of interpolation. The main source of difficulty when dealing with general
Polish structure is the potential lack of tightness: the proof presented here is strongly inspired by the
work of S. Lisini [54].
Proposition 2.16 and Theorem 2.18 come from [80] (Corollary 7.32 and Theorem 7.36 respectively). Theorem 2.20 and the counterexample 2.21 are taken from [6] (Theorem 7.3.2 and Example
7.3.3 respectively).
The proof of Corollary 2.24 is taken from an argument by A. Fathi [35], the paper being inspired
by Bernard-Buffoni [13]. Remark 2.27 is due to N. Juillet [48].
The idea of looking at the transport problem as a dynamical problem involving the continuity equation is due to J.D. Benamou and Y. Brenier ([12]), while the fact that (P2 (Rd ), W2 ) can be viewed as
a sort of infinite dimensional Riemannian manifold is an intuition by F. Otto [67]. Theorem 2.29 has
been proven in [6] (where also Propositions 2.32, 2.33 and 2.34 were proven) in the case M = Rd ,
the generalization to Riemannian manifolds comes from Nash’s embedding theorem.
3 Gradient flows
The aim of this Chapter is twofold: on one hand we give an overview of the theory of Gradient Flows
in a metric setting, on the other hand we discuss the important application of the abstract theory to
the case of geodesically convex functionals on the space (P2 (Rd ), W2 ).
Let us recall that for a smooth function F : M → R on a Riemannian manifold, a gradient flow
(xt ) starting from x ∈ M is a differentiable curve solving
x′t = −∇F (xt ),    x0 = x.    (3.1)
Observe that there are two necessary ingredients in this definition: the functional F and the metric
on M . The role of the functional is clear. The metric is involved to define ∇F : it is used to identify
the cotangent vector dF with the tangent vector ∇F .
3.1 Hilbertian theory of gradient flows
In this section we quickly recall the main results of the theory of Gradient Flows for λ-convex functionals on Hilbert spaces. This will serve as a guideline for the analysis that we will make later on
of the same problem in a purely metric setting.
Let H be a Hilbert space and λ ∈ R. A λ-convex functional F : H → R ∪ {+∞} is a functional satisfying:

F ((1 − t)x + ty) ≤ (1 − t)F (x) + tF (y) − (λ/2) t(1 − t)|x − y|²,    ∀x, y ∈ H, t ∈ [0, 1],
(this corresponds to ∇2 F ≥ λId for functionals on Rd ). We denote with D(F ) the domain of F , i.e.
D(F ) := {x : F (x) < ∞}.
The subdifferential ∂−F (x) of F at a point x ∈ D(F ) is the set of v ∈ H such that

F (x) + ⟨v, y − x⟩ + (λ/2)|x − y|² ≤ F (y),    ∀y ∈ H.
An immediate consequence of the definition is the fact that the subdifferential of F satisfies the monotonicity inequality:

⟨v − w, x − y⟩ ≥ λ|x − y|²,    ∀v ∈ ∂−F (x), w ∈ ∂−F (y).
We will denote by ∇F (x) the element of minimal norm in ∂−F (x), which exists and is unique as soon as ∂−F (x) ≠ ∅, because ∂−F (x) is closed and convex.
For convex functions a natural generalization of Definition (3.1) of Gradient Flow is possible: we say that (xt ) is a Gradient Flow for F starting from x ∈ H if it is a locally absolutely continuous curve in (0, +∞) such that

x′t ∈ −∂−F (xt )    for a.e. t > 0,
lim_{t↓0} xt = x.    (3.2)
We now summarize without proof the main existence and uniqueness results in this context.
Theorem 3.1 (Gradient Flows in Hilbert spaces - Brezis, Pazy) If F : H → R ∪ {+∞} is λ-convex and lower semicontinuous, then the following statements hold.

(i) Existence and uniqueness For all x̄ ∈ D(F ), (3.2) has a unique solution (xt ).

(ii) Minimal selection and Regularizing effects It holds d⁺/dt xt = −∇F (xt ) for every t > 0 (that is, the right derivative of xt always exists and realizes the element of minimal norm in ∂−F (xt )) and d⁺/dt F ◦ x(t) = −|∇F (x(t))|² for every t > 0. Also

F (xt ) ≤ inf_{v∈D(F )} { F (v) + |v − x̄|²/(2t) },
|∇F (xt )|² ≤ inf_{v∈D(∂F )} { |∇F (v)|² + |v − x̄|²/t² }.

(iii) Energy Dissipation Equality |x′t |, |∇F |(xt ) ∈ L²loc (0, +∞), F (xt ) ∈ ACloc (0, +∞) and the following Energy Dissipation Equality holds:

F (xt ) − F (xs ) = (1/2) ∫_t^s |x′r |² dr + (1/2) ∫_t^s |∇F (xr )|² dr,    0 < t ≤ s < ∞;

(iv) Evolution Variational Inequality and contraction (xt ) is the unique solution of the system of differential inequalities

(1/2) d/dt |x̃t − y|² + F (x̃t ) + (λ/2)|x̃t − y|² ≤ F (y),    ∀y ∈ H, a.e. t,

among all locally absolutely continuous curves (x̃t ) in (0, ∞) converging to x̄ as t → 0. Furthermore, if (yt ) is a solution of (3.2) starting from ȳ, it holds

|xt − yt | ≤ e^{−λt} |x̄ − ȳ|.

(v) Asymptotic behavior If λ > 0 then there exists a unique minimum xmin of F and it holds

F (xt ) − F (xmin ) ≤ ( F (x̄) − F (xmin ) ) e^{−2λt}.

In particular, the pointwise energy inequality

F (x) ≥ F (xmin ) + (λ/2)|x − xmin |²,    ∀x ∈ H

gives

|xt − xmin | ≤ √( 2(F (x̄) − F (xmin )) / λ ) e^{−λt}.

3.2 The theory of Gradient Flows in a metric setting
Here we give an overview of the theory of Gradient Flows in a purely metric framework.
3.2.1 The framework
The first thing we need to understand is the meaning of Gradient Flow in a metric setting. Indeed, the
system (3.2) makes no sense in metric spaces, thus we need to reformulate it so that it has a metric
analogue. There are several ways to do this; below we summarize the most important ones.
For the purpose of the discussion below, we assume that H = Rd and that E : H → R is
λ-convex and of class C 1 .
Let us start observing that (3.2) may be written as: t 7→ xt is locally absolutely continuous in
(0, +∞), converges to x as t ↓ 0 and it holds
d/dt E(xt ) ≤ − (1/2)|∇E|²(xt ) − (1/2)|x′t |²,    a.e. t ≥ 0.    (3.3)
Indeed, along any absolutely continuous curve yt it holds

d/dt E(yt ) = ⟨∇E(yt ), y′t ⟩
  ≥ −|∇E|(yt ) |y′t |    (with equality if and only if −y′t is a positive multiple of ∇E(yt )),
  ≥ − (1/2)|∇E|²(yt ) − (1/2)|y′t |²    (with equality if and only if |y′t | = |∇E(yt )|).    (3.4)
Thus in particular equation (3.3) may be written in the following integral form

E(xs ) + (1/2) ∫_t^s |x′r |² dr + (1/2) ∫_t^s |∇E|²(xr ) dr ≤ E(xt ),    a.e. t < s,    (3.5)

which we call Energy Dissipation Inequality (EDI in the following).
Since the inequality (3.4) shows that d/dt E(yt ) < −(1/2)|∇E|²(yt ) − (1/2)|y′t |² never holds, the system (3.2) may also be written in the form of an Energy Dissipation Equality (EDE in the following) as

E(xs ) + (1/2) ∫_t^s |x′r |² dr + (1/2) ∫_t^s |∇E|²(xr ) dr = E(xt ),    ∀ 0 ≤ t ≤ s.    (3.6)
Notice that the convexity of E does not play any role in this formulation.
A completely different way to rewrite (3.2) comes from observing that if xt solves (3.2) and
y ∈ H is a generic point it holds
(1/2) d/dt |xt − y|² = ⟨xt − y, x′t ⟩ = ⟨y − xt , ∇E(xt )⟩ ≤ E(y) − E(xt ) − (λ/2)|xt − y|²,
where in the last inequality we used the fact that E is λ-convex. Since the inequality

⟨y − x, v⟩ ≤ E(y) − E(x) − (λ/2)|x − y|²,    ∀y ∈ H,
characterizes the elements v of the subdifferential of E at x, we have that an absolutely continuous
curve xt solves (3.2) if and only if
(1/2) d/dt |xt − y|² + (λ/2)|xt − y|² + E(xt ) ≤ E(y),    a.e. t ≥ 0,    (3.7)
holds for every y ∈ H. We will call this system of inequalities the Evolution Variational Inequality
(EVI).
Thus we got three different characterizations of Gradient Flows in Hilbert spaces: the EDI, the
EDE and the EVI. We now want to show that it is possible to formulate these equations also for
functionals E defined on a metric space (X, d).
The object |x′t | appearing in EDI and EDE can be naturally interpreted as the metric speed of the absolutely continuous curve xt as defined in (2.19). The metric analogue of |∇E|(x) is the slope of
E, defined as:
Definition 3.2 (Slope) Let E : X → R ∪ {+∞} and x ∈ X be such that E(x) < ∞. Then the slope |∇E|(x) of E at x is:

|∇E|(x) := limsup_{y→x} (E(x) − E(y))⁺ / d(x, y) = max{ limsup_{y→x} (E(x) − E(y)) / d(x, y) , 0 }.
The three definitions of Gradient Flows in a metric setting that we are going to use are:
Definition 3.3 (Energy Dissipation Inequality definition of GF - EDI) Let E : X → R ∪ {+∞} and let x ∈ X be such that E(x) < ∞. We say that [0, ∞) ∋ t ↦ xt ∈ X is a Gradient Flow in the EDI sense starting at x provided it is a locally absolutely continuous curve, x0 = x and

E(xs ) + (1/2) ∫_0^s |ẋr |² dr + (1/2) ∫_0^s |∇E|²(xr ) dr ≤ E(x),    ∀s ≥ 0,    (3.8)
E(xs ) + (1/2) ∫_t^s |ẋr |² dr + (1/2) ∫_t^s |∇E|²(xr ) dr ≤ E(xt ),    a.e. t > 0, ∀s ≥ t.
Definition 3.4 (Energy Dissipation Equality definition of GF - EDE) Let E : X → R ∪ {+∞} and let x ∈ X be such that E(x) < ∞. We say that [0, ∞) ∋ t ↦ xt ∈ X is a Gradient Flow in the EDE sense starting at x provided it is a locally absolutely continuous curve, x0 = x and

E(xs ) + (1/2) ∫_t^s |ẋr |² dr + (1/2) ∫_t^s |∇E|²(xr ) dr = E(xt ),    ∀ 0 ≤ t ≤ s.    (3.9)
Definition 3.5 (Evolution Variational Inequality definition of GF - EVI) Let E : X → R ∪ {+∞}, x ∈ {E < ∞} and λ ∈ R. We say that (0, ∞) ∋ t ↦ xt ∈ X is a Gradient Flow in the EVI sense (with respect to λ) starting at x provided it is a locally absolutely continuous curve in (0, ∞), xt → x as t → 0 and

E(xt ) + (1/2) d/dt d²(xt , y) + (λ/2) d²(xt , y) ≤ E(y),    ∀y ∈ X, a.e. t > 0.
There are two basic and fundamental things that one needs to understand when studying the problem
of Gradient Flows in a metric setting:
1) Although the formulations EDI, EDE and EVI are equivalent for λ-convex functionals on
Hilbert spaces, they are not equivalent in a metric setting. Shortly said, it holds
EVI    ⇒    EDE    ⇒    EDI

and typically none of the converse implications holds (see Examples 3.15 and 3.23 below). Here
the second implication is clear, for the proof of the first one see Proposition 3.6 below.
2) Whatever definition of Gradient Flow in a metric setting we use, the main problem is to show
existence. The main ingredient in almost all existence proofs is the Minimizing Movements
scheme, which we describe after Proposition 3.6.
Proposition 3.6 (EVI implies EDE) Let E : X → R ∪ {+∞} be a lower semicontinuous functional, x ∈ X a given point, λ ∈ R and assume that (xt ) is a Gradient Flow for E starting from x in
the EVI sense w.r.t. λ. Then equation (3.9) holds.
Proof First we assume that xt is locally Lipschitz. The claim will be proved if we show that t 7→
E(xt ) is locally Lipschitz and it holds
− d/dt E(xt ) = (1/2)|ẋt |² + (1/2)|∇E|²(xt ),    a.e. t > 0.
Let us start observing that the triangle inequality implies
(1/2) d/dt d²(xt , y) ≥ −|ẋt | d(xt , y),    ∀y ∈ X, a.e. t > 0,
thus plugging this bound into the EVI we get
−|ẋt | d(xt , y) + (λ/2) d²(xt , y) + E(xt ) ≤ E(y),    ∀y ∈ X, a.e. t > 0,
which implies
|∇E|(xt ) = limsup_{y→xt} (E(xt ) − E(y))⁺ / d(xt , y) ≤ |ẋt |,    a.e. t > 0.    (3.10)
Fix an interval [a, b] ⊂ (0, ∞), let L be the Lipschitz constant of (xt ) in [a, b] and observe that for
any y ∈ X it holds
(1/2) d/dt d²(xt , y) ≥ −|ẋt | d(xt , y) ≥ −L d(xt , y),    a.e. t ∈ [a, b].
Plugging this bound in the EVI we get
−L d(xt , y) + (λ/2) d²(xt , y) + E(xt ) ≤ E(y),    a.e. t ∈ [a, b],
and by the lower semicontinuity of t 7→ E(xt ) the inequality holds for every t ∈ [a, b]. Taking
y = xs and then exchanging the roles of xt , xs we deduce
E(xt ) − E(xs ) ≤ L d(xt , xs ) − (λ/2) d²(xt , xs ) ≤ L|t − s| ( L + (|λ|/2) L|t − s| ),    ∀t, s ∈ [a, b],
thus the map t 7→ E(xt ) is locally Lipschitz. It is then obvious that it holds
− d/dt E(xt ) = lim_{h→0} (E(xt ) − E(xt+h )) / h = lim_{h→0} [ (E(xt ) − E(xt+h )) / d(xt+h , xt ) ] · [ d(xt+h , xt ) / h ]
  ≤ |∇E|(xt ) |ẋt | ≤ (1/2)|∇E|²(xt ) + (1/2)|ẋt |²,    a.e. t.
Thus to conclude we need only to prove the opposite inequality. Integrate the EVI from t to t + h to
get
( d²(xt+h , y) − d²(xt , y) ) / 2 + ∫_t^{t+h} E(xs ) ds + (λ/2) ∫_t^{t+h} d²(xs , y) ds ≤ h E(y).
Let y = xt to obtain
d²(xt+h , xt ) / 2 ≤ ∫_t^{t+h} ( E(xt ) − E(xs ) ) ds + (|λ|/6) L² h³ = h ∫_0^1 ( E(xt ) − E(xt+hr ) ) dr + (|λ|/6) L² h³.
Now let A ⊂ (0, +∞) be the set of points of differentiability of t ↦ E(xt ) and where |ẋt | exists, choose t ∈ A ∩ (a, b), divide the above inequality by h², let h → 0 and use the dominated convergence theorem to get

(1/2)|ẋt |² ≤ lim_{h→0} ∫_0^1 ( E(xt ) − E(xt+hr ) ) / h dr = − d/dt E(xt ) ∫_0^1 r dr = − (1/2) d/dt E(xt ).
Recalling (3.10) we conclude with
− d/dt E(xt ) ≥ |ẋt |² ≥ (1/2)|ẋt |² + (1/2)|∇E|²(xt ),    a.e. t > 0.
Finally, we see how the local Lipschitz property of (xt ) can be achieved. It is immediate to verify
that the curve t 7→ xt+h is a Gradient Flow in the EVI sense starting from xh for all h > 0. We now
use the fact that the distance between curves satisfying the EVI is contractive up to an exponential
factor (see the last part of the proof of Theorem 3.25 for a sketch of the argument, and Corollary
4.3.3 of [6] for the rigorous proof). We have
d(xs , xs+h ) ≤ e−λ(s−t) d(xt , xt+h ),
∀s > t.
Dividing by h, letting h ↓ 0 and calling B ⊂ (0, ∞) the set where the metric derivative of xt exists,
we obtain
|x˙ s | ≤ |x˙ t |e−λ(s−t) ,
∀s, t ∈ B, s > t.
This implies that the curve (xt ) is locally Lipschitz in (0, +∞).
Let us come back to the case of a convex and lower semicontinuous functional F on a Hilbert space. Pick x ∈ D(F ), fix τ > 0 and define the sequence n ↦ xτ(n) recursively by setting xτ(0) := x and defining xτ(n+1) as a minimizer of

x ↦ F (x) + |x − xτ(n) |² / (2τ).
It is immediate to verify that a minimum exists and that it is unique, thus the sequence n 7→ xτ(n) is
well defined. The Euler-Lagrange equation of xτ(n+1) is:
( xτ(n+1) − xτ(n) ) / τ ∈ −∂−F (xτ(n+1) ),
which is a time discretization of (3.2). It is then natural to introduce the rescaled curve t 7→ xτt by
xτt := xτ([t/τ ]) ,
where [·] denotes the integer part, and to ask whether the curves t 7→ xτt converge in some sense to a
limit curve (xt ) which solves (3.2) as τ ↓ 0. This is the case, and this procedure is actually the heart
of the proof of Theorem 3.1.
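As an illustration of this convergence (not in the original text), one can run the scheme for a simple λ-convex functional on R², where the exact gradient flow is available in closed form; the quadratic functional below is an illustrative assumption, and each proximal step could equally be solved exactly.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Minimizing Movements for F(x) = 0.5 x^T A x on R^2 (lambda = smallest eigenvalue of A).
# The exact gradient flow is x_t = expm(-tA) x_0, which the discrete solutions approach as tau -> 0.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
F = lambda x: 0.5 * x @ A @ x
x_start, T = np.array([1.0, -2.0]), 1.0

def discrete_solution(tau):
    """x_{n+1} minimizes F(x) + |x - x_n|^2 / (2 tau); for this quadratic F the step
    could also be solved exactly as x_{n+1} = (I + tau A)^{-1} x_n."""
    x = x_start.copy()
    for _ in range(int(round(T / tau))):
        obj = lambda y, xn=x: F(y) + np.sum((y - xn) ** 2) / (2 * tau)
        x = minimize(obj, x).x
    return x

exact = expm(-T * A) @ x_start
for tau in [0.2, 0.1, 0.05, 0.025]:
    err = np.linalg.norm(discrete_solution(tau) - exact)
    print(f"tau = {tau:6.3f}   error at time T: {err:.2e}")   # decays roughly like O(tau)
```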
What is important for the discussion we are making now, is that the minimization procedure just
described can be naturally posed in a metric setting for a general functional E : X → R ∪ {+∞}: it
is sufficient to pick x ∈ {E < ∞}, τ > 0, define xτ(0) := x and then recursively
xτ(n+1) ∈ argmin { x ↦ E(x) + d²(x, xτ(n) ) / (2τ) }.    (3.11)
With this we give the following definition:
Definition 3.7 (Discrete solution) Let (X, d) be a metric space, E : X → R ∪ {+∞} lower semicontinuous, x ∈ {E < ∞} and τ > 0. A discrete solution is a map [0, +∞) 3 t 7→ xτt defined
by
xτt := xτ([t/τ ]) ,
where xτ(0) := x and xτ(n+1) satisfies (3.11).
Clearly in a metric context part of the job is the identification of suitable assumptions that ensure that the minimization problem (3.11) admits at least a minimum, so that discrete solutions exist.
We now divide the discussion into three parts, to see under which conditions on the functional E
and the metric space X it is possible to prove existence of Gradient Flows in the EDI, EDE and EVI
formulation.
3.2.2 General l.s.c. functionals and EDI
In this section we will make minimal assumptions on the functional E and show how it is possible,
starting from them, to prove existence of Gradient Flows in the EDI sense.
Basically, there are two “independent” sets of assumptions that we need: those which ensure the
existence of discrete solutions, and those needed to pass to the limit. To better highlight the structure
of the theory, we first introduce the hypotheses we need to guarantee the existence of discrete solution
and see which properties the discrete solutions have. Then, later on, we introduce the assumptions
needed to pass to the limit.
We will denote by D(E) ⊂ X the domain of E, i.e. D(E) := {E < ∞}
Assumption 3.8 (Hypothesis for existence of discrete solutions) (X, d) is a Polish space and E : X → R ∪ {+∞} is a l.s.c. functional bounded from below. Also, we assume that there exists τ̄ > 0 such that for every 0 < τ < τ̄ and x̄ ∈ D(E) there exists at least a minimum of

x ↦ E(x) + d²(x, x̄) / (2τ).    (3.12)
Thanks to our assumptions we know that discrete solutions exist for every starting point x, for
τ sufficiently small. The big problem we have to face now is to show that the discrete solutions
satisfy a discretized version of the EDI suitable to pass to the limit. The key enabler to do this, is the
following result, due to de Giorgi.
Theorem 3.9 (Properties of the variational interpolation) Let X, E satisfy Assumption 3.8. Fix x̄ ∈ X, and for any 0 < τ < τ̄ choose xτ among the minimizers of (3.12). Then the map τ ↦ E(xτ ) + d²(x̄, xτ )/(2τ) is locally Lipschitz in (0, τ̄) and it holds

d/dτ ( E(xτ ) + d²(x̄, xτ )/(2τ) ) = − d²(x̄, xτ ) / (2τ²),    a.e. τ ∈ (0, τ̄).    (3.13)
Proof Observe that from E(xτ0 ) + d²(xτ0 , x̄)/(2τ0 ) ≤ E(xτ1 ) + d²(xτ1 , x̄)/(2τ0 ) we deduce

( E(xτ0 ) + d²(xτ0 , x̄)/(2τ0 ) ) − ( E(xτ1 ) + d²(xτ1 , x̄)/(2τ1 ) ) ≤ ( 1/(2τ0 ) − 1/(2τ1 ) ) d²(xτ1 , x̄) = (τ1 − τ0 )/(2τ0 τ1 ) d²(xτ1 , x̄).

Arguing symmetrically we see that

( E(xτ0 ) + d²(xτ0 , x̄)/(2τ0 ) ) − ( E(xτ1 ) + d²(xτ1 , x̄)/(2τ1 ) ) ≥ (τ1 − τ0 )/(2τ0 τ1 ) d²(xτ0 , x̄).

The last two inequalities show that τ ↦ E(xτ ) + d²(x̄, xτ )/(2τ) is locally Lipschitz and that equation (3.13) holds.
Lemma 3.10 With the same notation and assumptions as in the previous theorem, τ ↦ d(x̄, xτ ) is non decreasing and τ ↦ E(xτ ) is non increasing. Also, it holds

|∇E|(xτ ) ≤ d(xτ , x̄) / τ.    (3.14)

Proof Pick 0 < τ0 < τ1 < τ̄. From the minimality of xτ0 and xτ1 we get
(3.14)
d2 (xτ0 , x)
d2 (xτ1 , x)
≤ E(xτ1 ) +
,
2τ0
2τ0
d2 (xτ1 , x)
d2 (xτ0 , x)
E(xτ1 ) +
≤ E(xτ0 ) +
.
2τ1
2τ1
E(xτ0 ) +
Adding up and using the fact that 1/τ0 − 1/τ1 ≥ 0 we get d(x̄, xτ0 ) ≤ d(x̄, xτ1 ). The fact that τ ↦ E(xτ ) is non increasing now follows from

E(xτ1 ) + d²(xτ0 , x̄)/(2τ1 ) ≤ E(xτ1 ) + d²(xτ1 , x̄)/(2τ1 ) ≤ E(xτ0 ) + d²(xτ0 , x̄)/(2τ1 ).
For the second part of the statement, observe that from

E(xτ ) + d²(xτ , x̄)/(2τ) ≤ E(y) + d²(y, x̄)/(2τ),    ∀y ∈ X,

we get

( E(xτ ) − E(y) ) / d(xτ , y) ≤ ( d²(y, x̄) − d²(xτ , x̄) ) / (2τ d(xτ , y)) = ( d(y, x̄) − d(xτ , x̄) )( d(xτ , x̄) + d(y, x̄) ) / (2τ d(xτ , y)) ≤ ( d(xτ , x̄) + d(y, x̄) ) / (2τ),

where in the last step we used the triangle inequality d(y, x̄) − d(xτ , x̄) ≤ d(xτ , y). Taking the limsup as y → xτ we get the thesis.
By Theorem 3.9 and Lemma 3.10 it is natural to introduce the following variational interpolation in the Minimizing Movements scheme (as opposed to the classical piecewise constant/affine
interpolations used in other contexts):
Definition 3.11 (Variational interpolation) Let X, E be satisfying Assumption 3.8, x ∈ D(E) and 0 < τ < τ̄. We define the map [0, ∞) ∋ t ↦ xτt in the following way:
• xτ0 := x,
• xτ(n+1)τ is chosen among the minimizers of (3.12) with x replaced by xτnτ ,
• xτt with t ∈ (nτ, (n + 1)τ ) is chosen among the minimizers of (3.12) with x and τ replaced by
xτnτ and t − nτ respectively.
For (xτt ) defined in this way, we define the discrete speed Dspτ : [0, +∞) → [0, +∞) and the discrete slope Dslτ : [0, +∞) → [0, +∞) by:

Dspτt := d( xτnτ , xτ(n+1)τ ) / τ,    t ∈ (nτ, (n + 1)τ ),
Dslτt := d( xτt , xτnτ ) / (t − nτ),    t ∈ (nτ, (n + 1)τ ).    (3.15)
Although the object Dslτt does not look like a slope, we chose this name because from (3.14) we
know that |∇E|(xτt ) ≤ Dslτt and because in the limiting process Dslτ will produce the slope term in
the EDI (see the proof of Theorem 3.14).
With this notation we have the following result:
Corollary 3.12 (EDE for the discrete solutions) Let X, E be satisfying Assumption 3.8, x ∈ D(E), 0 < τ < τ̄ and (xτt ) defined via the variational interpolation as in Definition 3.11 above. Then it holds

E(xτs ) + (1/2) ∫_t^s |Dspτr |² dr + (1/2) ∫_t^s |Dslτr |² dr = E(xτt ),    (3.16)

for every t = nτ, s = mτ, n < m ∈ N.
Proof It is just a restatement of equation (3.13) in terms of the notation given in (3.15).
Thus, at the level of discrete solutions, it is possible to get a discrete form of the Energy Dissipation Equality under the quite general Assumptions 3.8. Now we want to pass to the limit as τ ↓ 0. In
order to do this, we need to add some compactness and regularity assumptions on the functional:
Assumption 3.13 (Coercivity and regularity assumptions) Assume that E : X → R ∪ {+∞}
satisfies:
• E is bounded from below and its sublevels are boundedly compact, i.e. {E ≤ c} ∩ Br (x) is
compact for any c ∈ R, r > 0 and x ∈ X,
• the slope |∇E| : D(E) → [0, +∞] is lower semicontinuous,
• E has the following continuity property:
xn → x,  sup_n {|∇E|(xn ), E(xn )} < ∞    ⇒    E(xn ) → E(x).
Under these assumptions we can prove the following result:
Theorem 3.14 (Gradient Flows in EDI formulation) Let (X, d) be a metric space and let E :
X → R ∪ {+∞} be satisfying the Assumptions 3.8 and 3.13. Also, let x ∈ D(E) and for 0 < τ < τ̄
define the discrete solution via the variational interpolation as in Definition 3.11. Then it holds:
• the set of curves {(xτt )}τ is relatively compact in the set of curves in X w.r.t. local uniform
convergence,
• any limit curve (xt ) is a Gradient Flow in the EDI formulation (Definition 3.3).
Sketch of the Proof
Compactness. By Corollary 3.12 we have

d²(xτt , x) ≤ ( ∫_0^T |Dspτr | dr )² ≤ T ∫_0^T |Dspτr |² dr ≤ 2T ( E(x) − inf E ),    ∀t ≤ T,
for any T = nτ . Therefore for any T > 0 the set {xτt }t≤T is uniformly bounded in τ . As this set is
also contained in {E ≤ E(x)}, it is relatively compact. The fact that there is relative compactness
w.r.t. local uniform convergence follows by an Ascoli-Arzelà-type argument based on the inequality
d²(xτt , xτs ) ≤ ( ∫_t^s |Dspτr | dr )² ≤ 2(s − t) ( E(x) − inf E ),    ∀t = nτ, s = mτ, n < m ∈ N.    (3.17)
Passage to the limit. Let τn ↓ 0 be such that (xτt n ) converges to a limit curve xt locally uniformly.
Then by standard arguments based on inequality (3.17) it is possible to check that t 7→ xt is absolutely continuous and satisfies
∫_t^s |ẋr |² dr ≤ liminf_{n→∞} ∫_t^s |Dsp^{τn}_r |² dr,    ∀ 0 ≤ t < s.    (3.18)
By the lower semicontinuity of |∇E| and (3.14) we get

|∇E|(xt ) ≤ liminf_{n→∞} |∇E|(x^{τn}_t ) ≤ liminf_{n→∞} Dsl^{τn}_t ,    ∀t,

thus Fatou’s lemma ensures that for any t < s it holds

∫_t^s |∇E|²(xr ) dr ≤ ∫_t^s liminf_{n→∞} |∇E|²(x^{τn}_r ) dr ≤ liminf_{n→∞} ∫_t^s |Dsl^{τn}_r |² dr ≤ 2T ( E(x) − inf E ).    (3.19)
Now passing to the limit in (3.16) written for t = 0 we get the first inequality in (3.8). Also, from (3.19) we get that the L² norm of f(t) := liminf_{n→∞} |∇E|(x^{τn}_t ) on [0, ∞) is finite. Thus A := {f < ∞} has full Lebesgue measure and for each t ∈ A we can find a subsequence τ_{nk} ↓ 0 such that sup_k |∇E|(x^{τ_{nk}}_t ) < ∞. Then the third assumption in 3.13 guarantees that E(x^{τ_{nk}}_t ) → E(xt ) and the lower semicontinuity of E that E(xs ) ≤ liminf_{k→∞} E(x^{τ_{nk}}_s ) for every s ≥ t. Thus passing to the limit in (3.16) as τ_{nk} ↓ 0 and using (3.18) and (3.19) we get

E(xs ) + (1/2) ∫_t^s |ẋr |² dr + (1/2) ∫_t^s |∇E|²(xr ) dr ≤ E(xt ),    ∀t ∈ A, ∀s ≥ t.
We conclude with an example which shows why in general we cannot hope to have equality in the
EDI. Shortly said, the problem is that we don’t know whether t 7→ E(xt ) is an absolutely continuous
map.
Example 3.15 Let X = [0, 1] with the Euclidean distance, C ⊂ X a Cantor-type set with null Lebesgue measure and f : [0, 1] → [1, +∞] a continuous, integrable function such that f(x) = +∞ for any x ∈ C, which is smooth on the complement of C. Also, let g : [0, 1] → [0, 1] be a “Devil staircase” built over C, i.e. a continuous, non decreasing function satisfying g(0) = 0, g(1) = 1 which is constant in each of the connected components of the complement of C. Define the energies E, Ẽ : [0, 1] → R by

E(x) := −g(x) − ∫_0^x f(y) dy,    Ẽ(x) := − ∫_0^x f(y) dy.
It is immediate to verify that E, Ẽ satisfy all the Assumptions 3.8, 3.13 (the choice of f guarantees that the slopes of E, Ẽ are continuous). Now build a Gradient Flow starting from 0: with some work it is possible to check that the Minimizing Movements scheme converges in both cases to absolutely continuous curves (xt ) and (x̃t ) respectively satisfying

x′t = −|∇E|(xt ),    a.e. t,
x̃′t = −|∇Ẽ|(x̃t ),    a.e. t.

Now, notice that |∇E|(x) = |∇Ẽ|(x) = f(x) for every x ∈ [0, 1], therefore the fact that f ≥ 1 and is smooth on [0, 1] \ C gives that each of these two equations admits a unique solution. Therefore - this is the key point of the example - (xt ) and (x̃t ) must coincide. In other words, the effect of the function g is not seen at the level of Gradient Flows. It is then immediate to verify that there is Energy Dissipation Equality for the energy Ẽ, but there is only the Energy Dissipation Inequality for the energy E.
3.2.3 The geodesically convex case: EDE and regularizing effects
Here we study gradient flows of so called geodesically convex functionals, which are the natural
metric generalization of convex functionals on linear spaces.
Definition 3.16 (Geodesic convexity) Let E : X → R ∪ {+∞} be a functional and λ ∈ R. We say
that E is λ-geodesically convex provided for every x, y ∈ X there exists a constant speed geodesic
γ : [0, 1] → X connecting x to y such that
E(γt ) ≤ (1 − t)E(x) + tE(y) − (λ/2) t(1 − t) d²(x, y),    ∀t ∈ [0, 1].    (3.20)
In this section we will assume that:
Assumption 3.17 (Geodesic convexity hypothesis) (X, d) is a Polish geodesic space, E : X →
R ∪ {+∞} is lower semicontinuous, λ-geodesically convex for some λ ∈ R. Also, we assume that
the sublevels of E are boundedly compact, i.e. the set {E ≤ c} ∩ Br (x) is compact for any c ∈ R,
r > 0, x ∈ X.
What we want to prove is that for X, E satisfying these assumptions there is existence of Gradient
Flows in the formulation EDE (Definition 3.4).
Our first goal is to show that in this setting it is possible to recover the results of the previous
section. We start claiming that it holds:
|∇E|(x) = sup_{y≠x} ( (E(x) − E(y)) / d(x, y) + (λ/2) d(x, y) )⁺,    (3.21)
so that the limsup in the definition of the slope can be replaced by a sup. Indeed, we know that

|∇E|(x) = limsup_{y→x} ( (E(x) − E(y)) / d(x, y) + (λ/2) d(x, y) )⁺ ≤ sup_{y≠x} ( (E(x) − E(y)) / d(x, y) + (λ/2) d(x, y) )⁺.
To prove the opposite inequality fix y ≠ x and a constant speed geodesic γ connecting x to y for which (3.20) holds. Then observe that

|∇E|(x) ≥ limsup_{t↓0} (E(x) − E(γt ))⁺ / d(x, γt ) = limsup_{t↓0} ( (E(x) − E(γt )) / d(x, γt ) )⁺
  ≥ limsup_{t↓0} ( (E(x) − E(y)) / d(x, y) + (λ/2)(1 − t) d(x, y) )⁺ = ( (E(x) − E(y)) / d(x, y) + (λ/2) d(x, y) )⁺,

where the second inequality follows from (3.20) together with d(x, γt ) = t d(x, y).
Using this representation formula we can show that all the assumptions 3.8 and 3.13 hold:
Proposition 3.18 Suppose that Assumption 3.17 holds. Then Assumptions 3.8 and 3.13 hold as well.
Sketch of the Proof From the λ-geodesic convexity and the lower semicontinuity assumption it is possible to deduce (we omit the details) that E has at most quadratic decay at infinity, i.e. there exist x̄ ∈ X, a, b > 0 such that

E(x) ≥ −a − b d(x, x̄) − (λ⁻/2) d²(x, x̄),    ∀x ∈ X.

Therefore from the lower semicontinuity again and the bounded compactness of the sublevels of E we immediately get that the minimization problem (3.12) admits a solution if τ < 1/λ⁻.
The lower semicontinuity of the slope is a direct consequence of (3.21) and of the lower semicontinuity of E. Thus, to conclude we need only to show that

xn → x,  sup_n {|∇E|(xn ), E(xn )} < ∞    ⇒    limsup_{n→∞} E(xn ) ≤ E(x).    (3.22)

From (3.21) with x, y replaced by xn , x respectively we get

E(x) ≥ E(xn ) − |∇E|(xn ) d(x, xn ) + (λ/2) d²(x, xn ),

and the conclusion follows by letting n → ∞.
Thus Theorem 3.14 applies directly also to this case and we get existence of Gradient Flows in
the EDI formulation. To get existence in the stronger EDE formulation, we need the following result,
which may be thought as a sort of weak chain rule (observe that the validity of the proposition below
rules out behaviors like the one described in Example 3.15).
Proposition 3.19 Let E be a λ-geodesically convex and l.s.c. functional. Then for every absolutely
continuous curve (xt ) ⊂ X such that E(xt ) < ∞ for every t, it holds
E(xs ) − E(xt ) ≤ ∫_t^s |ẋr | |∇E|(xr ) dr,    ∀t < s.    (3.23)
Proof We may assume that the right hand side of (3.23) is finite for any t, s ∈ [0, 1], and, by a reparametrization argument, we may also assume that |ẋt | = 1 for a.e. t (in particular (xt ) is 1-Lipschitz), so that t ↦ |∇E|(xt ) is an L¹ function. Notice that it is sufficient to prove that t ↦ E(xt ) is absolutely continuous, as then the inequality

lim_{h↑0} ( E(xt+h ) − E(xt ) ) / h ≤ limsup_{h↑0} ( E(xt ) − E(xt+h ) )⁺ / |h|
  ≤ limsup_{h↑0} [ ( E(xt ) − E(xt+h ) )⁺ / d(xt , xt+h ) ] · [ d(xt , xt+h ) / |h| ] ≤ |∇E|(xt ) |ẋt |,

valid for any t ∈ [0, 1], gives (3.23).
Define the functions f, g : [0, 1] → R by

f(t) := E(xt ),    g(t) := sup_{s≠t} ( f(t) − f(s) )⁺ / |s − t|.
Let D be the diameter of the compact set {xt }t∈[0,1] , use the fact that (xt ) is 1-Lipschitz, formula
(3.21) and the trivial inequality a+ ≤ (a + b)+ + b− (valid for any a, b ∈ R) to get
g(t) ≤ sup_{s≠t} ( E(xt ) − E(xs ) )⁺ / d(xs , xt ) ≤ |∇E|(xt ) + (λ⁻/2) D.
Therefore the thesis will be proved if we show that:

g ∈ L¹(0, 1)    ⇒    |f(s) − f(t)| ≤ ∫_t^s g(r) dr,    ∀t < s.    (3.24)
Fix M > 0 and define f M := min{f, M }. Now fix ε > 0, pick a smooth mollifier ρε : R → R with
support in [−ε, ε] and define fεM , gεM : [ε, 1 − ε] → R by
fεM (t) := f M ∗ ρε (t),    gεM (t) := sup_{s≠t} ( fεM (t) − fεM (s) )⁺ / |s − t|.
Since fεM is smooth and gεM ≥ |(fεM )′| it holds

|fεM (s) − fεM (t)| ≤ ∫_t^s gεM (r) dr.    (3.25)
From the trivial bound ( ∫ h )⁺ ≤ ∫ h⁺ we get

gεM (t) ≤ sup_s [ ∫ ( f M (t − r) − f M (s − r) )⁺ ρε (r) dr ] / |s − t| ≤ sup_s [ ∫ ( f(t − r) − f(s − r) )⁺ ρε (r) dr ] / |s − t|
  = sup_s ∫ [ ( f(t − r) − f(s − r) )⁺ / |(s − r) − (t − r)| ] ρε (r) dr ≤ ∫ g(t − r) ρε (r) dr = g ∗ ρε (t).    (3.26)
Thus the family of functions {gεM }ε is dominated in L¹(0, 1). From (3.25) and (3.26) it follows that the family of functions {fεM } uniformly converges to some function f̃M on [0, 1] as ε ↓ 0 for which it holds

|f̃M (s) − f̃M (t)| ≤ ∫_t^s g(r) dr.
We know that f M = f̃M on some set A ⊂ [0, 1] such that L¹([0, 1] \ A) = 0, and we want to prove that they actually coincide everywhere. Recall that f M is l.s.c. and f̃M is continuous, hence f M ≤ f̃M in [0, 1]. If by contradiction it holds f M (t0 ) < c < C < f̃M (t0 ) for some t0 , c, C, we can find δ > 0 such that f̃M (t) > C for t ∈ [t0 − δ, t0 + δ]. Thus f M (t) > C for t ∈ [t0 − δ, t0 + δ] ∩ A and the contradiction comes from

∫_0^1 g(t) dt ≥ ∫_{[t0 −δ, t0 +δ]∩A} g(t) dt ≥ ∫_{[t0 −δ, t0 +δ]∩A} (C − c) / |t − t0 | dt = +∞.
Thus we proved that if g ∈ L¹(0, 1) it holds

|f M (t) − f M (s)| ≤ ∫_t^s g(r) dr,    ∀t < s ∈ [0, 1], M > 0.

Letting M → ∞ we prove (3.24) and hence the thesis.
This proposition is the key ingredient to pass from existence of Gradient Flows in the EDI formulation to the one in the EDE formulation:
Theorem 3.20 (Gradient Flows in the EDE formulation) Let X, E be satisfying Assumption 3.17
and x ∈ X be such that E(x) < ∞. Then all the results of Theorem 3.14 hold.
Also, any Gradient Flow in the EDI sense is also a Gradient Flow in the EDE sense (Definition
3.4).
Proof The first part of the statement follows directly from Proposition 3.18.
By Theorem 3.14 we know that the limit curve is absolutely continuous and satisfies
E(xs ) + (1/2) ∫_0^s |ẋr |² dr + (1/2) ∫_0^s |∇E|²(xr ) dr ≤ E(x),    ∀s ≥ 0.    (3.27)
In particular, the functions t 7→ |x˙ t | and t 7→ |∇E|(xt ) belong to L2loc (0, +∞). Now we use
Proposition 3.19: we know that for any s ≥ 0 it holds
E(x) − E(xs ) ≤ ∫_0^s |ẋr | |∇E|(xr ) dr ≤ (1/2) ∫_0^s |ẋr |² dr + (1/2) ∫_0^s |∇E|²(xr ) dr.    (3.28)
Therefore t 7→ E(xt ) is locally absolutely continuous and it holds
E(xs ) + (1/2) ∫_0^s |ẋr |² dr + (1/2) ∫_0^s |∇E|²(xr ) dr = E(x),    ∀s ≥ 0.
Subtracting from this last equation the same equality written for s = t we get the thesis.
Remark 3.21 It is important to underline that the hypothesis of λ-geodesic convexity is in general
of no help for what concerns the compactness of the sequence of discrete solutions.
The λ-geodesic convexity hypothesis, ensures various regularity results for the limit curve, which we
state without proof:
Proposition 3.22 Let X, E be satisfying Assumption 3.17 and let (xt ) be any limit of a sequence of
discrete solutions. Then:
i) the limit

|ẋ⁺t | := lim_{h↓0} d(xt+h , xt ) / h

exists for every t > 0,

ii) the equation

d/dt⁺ E(xt ) = −|∇E|²(xt ) = −|ẋ⁺t |² = −|ẋ⁺t | |∇E|(xt )

is satisfied at every t > 0,

iii) the map t ↦ e^{−2λ⁻t} E(xt ) is convex, the map t ↦ e^{λt} |∇E|(xt ) is non increasing, right continuous and satisfies

(t/2) |∇E|²(xt ) ≤ e^{2λ⁻t} ( E(x0 ) − Et (x0 ) ),
t |∇E|²(xt ) ≤ (1 + 2λ⁺t) e^{−2λt} ( E(x0 ) − inf E ),

where Et : X → R is defined as

Et (x) := inf_y { E(y) + d²(x, y)/(2t) },

iv) if λ > 0, then E admits a unique minimum xmin and it holds

(λ/2) d²(xt , xmin ) ≤ E(xt ) − E(xmin ) ≤ e^{−2λt} ( E(x0 ) − E(xmin ) ).
Observe that we didn’t state any result concerning the uniqueness (nor about contractivity) of
the curve (xt ) satisfying the Energy Dissipation Equality (3.9). The reason is that if no further
assumptions are made on either X or E, in general uniqueness fails, as the following simple example
shows:
Example 3.23 (Lack of uniqueness) Let X := R2 endowed with the L∞ norm, E : X → R be
defined by E(x1 , x2 ) := x1 and x := (0, 0). Then it is immediate to verify that |∇E| ≡ 1 and that
any Lipschitz curve t ↦ xt = (x¹t , x²t ) satisfying

x¹t = −t,  ∀t ≥ 0,    |(x²t )′| ≤ 1,  a.e. t > 0,

satisfies also

E(xt ) = −t,    |ẋt | = 1.
This implies that any such (xt ) satisfies the Energy Dissipation Equality (3.9).
3.2.4 The compatibility of Energy and distance: EVI and error estimates
As the last example of the previous section shows, in general we cannot hope to have uniqueness of
the limit curve (xt ) obtained via the Minimizing Movements scheme for a generic λ-geodesically
convex functional. If we want to derive properties like uniqueness and contractivity of the flow, we
need to have some stronger relation between the Energy functional E and the distance d on X: in
this section we will assume the following:
Assumption 3.24 (Compatibility in Energy and distance) (X, d) is a Polish space. E : X → R ∪ {+∞} is a lower semicontinuous functional and for any x0 , x1 , y ∈ X, there exists a curve t ↦ γt such that

E(γt ) ≤ (1 − t)E(x0 ) + tE(x1 ) − (λ/2) t(1 − t) d²(x0 , x1 ),
d²(γt , y) ≤ (1 − t) d²(x0 , y) + t d²(x1 , y) − t(1 − t) d²(x0 , x1 ),    (3.29)
Observe that there is no compactness assumption on the sublevels of E. If X is a Hilbert space (and
more generally a NPC space - Definition 2.19) then the second inequality in (3.29) is satisfied by
geodesics. Hence λ-convex functionals are automatically compatible with the metric.
Following the same lines of the previous section, it is possible to show that this assumption implies both Assumption 3.8 and, if the sublevels of E are boundedly compact, Assumption 3.13, so that
Theorem 3.14 holds. Also it can be shown that formula (3.21) is true and thus that Proposition 3.19
holds also in this setting, so that Theorem 3.20 can be proved as well.
However, if Assumption 3.24 holds, it is better not to follow the general theory as developed
before, but to restart from scratch: indeed, in this situation much stronger statements hold, also at the
level of discrete solutions, which can be proved by a direct use of Assumption 3.24.
We collect the main results achievable in this setting in the following theorem:
Theorem 3.25 (Gradient Flows for compatible E and d: EVI) Assume that X, E satisfy Assumption 3.24. Then the following hold.
• For every x ∈ D(E) and 0 < τ < 1/λ− there exists a unique discrete solution (xτt ) as in
Definition 3.7.
• Let x ∈ D(E) and (xτt ) any family of discrete solutions starting from it. Then (xτt ) converge
locally uniformly to a limit curve (xt ) as τ ↓ 0 (so that the limit curve is unique). Furthermore,
(xt ) is the unique solution of the system of differential inequalities:
(1/2) d/dt d²(x̃t , y) + (λ/2) d²(x̃t , y) + E(x̃t ) ≤ E(y),    a.e. t ≥ 0, ∀y ∈ X,    (3.30)
among all locally absolutely continuous curves (x̃t ) converging to x as t ↓ 0. I.e. (xt ) is a Gradient Flow in the EVI formulation - see Definition 3.5.
• Let x, y ∈ D(E) and (xt ), (yt ) be the two Gradient Flows in the EVI formulation. Then there is λ-exponential contraction of the distance, i.e.:

d²(xt , yt ) ≤ e^{−2λt} d²(x, y).    (3.31)
• Suppose that λ ≥ 0, that x ∈ D(E) and build xτt , xt as above. Then the following a priori error estimate holds:

sup_{t≥0} d(xt , xτt ) ≤ 8 √( τ ( E(x) − E(xt ) ) ).    (3.32)
Sketch of the Proof We will make the following simplifying assumptions: E ≥ 0, λ ≥ 0 and x ∈ D(E). Also we will prove just that the sequence of discrete solutions n ↦ x^{τ/2ⁿ}_t converges to a limit curve as n → ∞ for any given τ > 0.
Existence and uniqueness of the discrete solution. Pick x ∈ X. We have to prove that there
exists a unique minimizer of (3.12). Let I ≥ 0 be the infimum of (3.12). Let (xn ) be a minimizing
sequence for (3.12), fix n, m ∈ N and let γ : [0, 1] → X be a curve satisfying (3.29) for x0 := xn ,
x1 := xm and y := x. Using the inequalities (3.29) at t = 1/2 we get
I ≤ E(γ1/2 ) + d²(γ1/2 , x)/(2τ)
  ≤ (1/2) [ E(xn ) + d²(xn , x)/(2τ) + E(xm ) + d²(xm , x)/(2τ) ] − (1 + λτ)/(8τ) d²(xn , xm ).
Therefore

limsup_{n,m→∞} (1 + λτ)/(8τ) d²(xn , xm ) ≤ limsup_{n,m→∞} { (1/2) [ E(xn ) + d²(xn , x)/(2τ) + E(xm ) + d²(xm , x)/(2τ) ] − I } = 0,

and thus the sequence (xn ) is a Cauchy sequence as soon as 0 < τ < 1/λ⁻. This shows uniqueness, existence follows by the l.s.c. of E.
One step estimates. We claim that the following discrete version of the EVI (3.30) holds: for any x ∈ X,

( d²(xτ , y) − d²(x, y) ) / (2τ) + (λ/2) d²(xτ , y) ≤ E(y) − E(xτ ),    ∀y ∈ X,    (3.33)

where xτ is the minimizer of (3.12). Indeed, pick a curve γ satisfying (3.29) for x0 := xτ , x1 := y and y := x and use the minimality of xτ to get
where xτ is the minimizer of (3.12). Indeed, pick a curve γ satisfying (3.29) for x0 := xτ , x1 := y
and y := x and use the minimality of xτ to get
E(xτ ) +
d2 (x, xτ )
d2 (x, γt )
λ
≤ E(γt ) +
≤ (1 − t)E(xτ ) + tE(y) − t(1 − t)d2 (xτ , y)
2τ
2τ
2
(1 − t)d2 (x, xτ ) + td2 (x, y) − t(1 − t)d2 (xτ , y)
+
.
2τ
Rearranging the terms, dropping the positive addend t d²(x, xτ )/(2τ) and dividing by t > 0 we get

(1 − t) d²(xτ , y)/(2τ) − d²(x, y)/(2τ) + (λ/2)(1 − t) d²(xτ , y) ≤ E(y) − E(xτ ),

so that letting t ↓ 0 we get (3.33).
Now we pass to the discrete version of the error estimate, which will also give the full convergence of the discrete solutions to the limit curve. Given x, y ∈ D(E), and the associated discrete solutions xτt , ytτ , we are going to bound the distance d(x^{τ/2}_τ , y^τ_τ ) in terms of the distance d(x, y). Write the discrete EVI (3.33) two times for τ := τ/2 and y := y: first with x := x, then with x := x^{τ/2}_{τ/2} to get (we use the assumption λ ≥ 0)
( d²(x^{τ/2}_{τ/2} , y) − d²(x, y) ) / τ ≤ E(y) − E(x^{τ/2}_{τ/2} ),
( d²(x^{τ/2}_τ , y) − d²(x^{τ/2}_{τ/2} , y) ) / τ ≤ E(y) − E(x^{τ/2}_τ ).

Adding up these two inequalities and observing that E(x^{τ/2}_τ ) ≤ E(x^{τ/2}_{τ/2} ) we obtain

( d²(x^{τ/2}_τ , y) − d²(x, y) ) / τ ≤ 2 ( E(y) − E(x^{τ/2}_τ ) ).

On the other hand, equation (3.33) with x := y and y := x^{τ/2}_τ reads as

( d²(y^τ_τ , x^{τ/2}_τ ) − d²(y, x^{τ/2}_τ ) ) / τ ≤ 2 ( E(x^{τ/2}_τ ) − E(y^τ_τ ) ).

Adding up these last two inequalities we get

( d²(y^τ_τ , x^{τ/2}_τ ) − d²(x, y) ) / τ ≤ 2 ( E(y) − E(y^τ_τ ) ).    (3.34)
Discrete estimates. Pick t = nτ < mτ = s, write inequality (3.33) for x := xτiτ , i = n, . . . , m − 1 and add everything up to get

( d²(xτs , y) − d²(xτt , y) ) / (2(s − t)) + λτ/(2(s − t)) Σ_{i=n+1}^m d²(xτiτ , y) ≤ E(y) − τ/(s − t) Σ_{i=n+1}^m E(xτiτ ).    (3.35)
Similarly, pick t = nτ, write inequality (3.34) for x := x^{τ/2}_{iτ} and y := y^τ_{iτ} for i = 0, . . . , n − 1 and add everything up to get

( d²(x^{τ/2}_t , y^τ_t ) − d²(x, y) ) / τ ≤ 2 ( E(y) − E(y^τ_t ) ).

Now let y = x to get

d²(x^{τ/2}_t , xτt ) ≤ 2τ ( E(x) − E(xτt ) ) ≤ 2τ E(x),    (3.36)
having used the fact that E ≥ 0.
Conclusion of passage to the limit. Putting τ/2ⁿ instead of τ in (3.36) we get

d²(x^{τ/2^{n+1}}_t , x^{τ/2ⁿ}_t ) ≤ (τ / 2^{n−1}) E(x),

therefore

d²(x^{τ/2ⁿ}_t , x^{τ/2^m}_t ) ≤ τ (2^{2−n} − 2^{2−m}) E(x),    ∀n < m ∈ N,

which tells that n ↦ x^{τ/2ⁿ}_t is a Cauchy sequence for any t ≥ 0. Also, choosing n = 0 and letting m → ∞ we get the error estimate (3.32).
We pass to the EVI. Letting τ ↓ 0 in (3.35) it is immediate to verify that we get

( d²(xs , y) − d²(xt , y) ) / (2(s − t)) + λ/(2(s − t)) ∫_t^s d²(xr , y) dr ≤ E(y) − 1/(s − t) ∫_t^s E(xr ) dr,

which is precisely the EVI (3.30) written in integral form.
Uniqueness and contractivity. It remains to prove that the solution to the EVI is unique and the contractivity (3.31). The heuristic argument is the following: pick (xt ) and (yt ) solutions of the EVI starting from x, y respectively. Choose y = yt in the EVI for (xt ) to get

(1/2) d/ds|_{s=t} d²(xs , yt ) + (λ/2) d²(xt , yt ) + E(xt ) ≤ E(yt ).

Symmetrically we have

(1/2) d/ds|_{s=t} d²(xt , ys ) + (λ/2) d²(xt , yt ) + E(yt ) ≤ E(xt ).
Adding up these two inequalities we get

d/dt d²(xt , yt ) ≤ −2λ d²(xt , yt ),    a.e. t.

The rigorous proof follows this line and uses a doubling of variables argument à la Kruzkhov. Uniqueness and contraction then follow by the Gronwall lemma.
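A small numerical illustration of the contraction property (again, not in the original text): for a λ-convex function on the real line, chosen here as an illustrative assumption, two discrete solutions of the Minimizing Movements scheme stay within a factor e^{−λt} of each other, up to discretization error.

```python
import numpy as np
from scipy.optimize import brentq

# F(x) = x^4/4 + (lam/2) x^2 on R, so F'' >= lam and F is lam-convex.
lam, tau, T = 0.5, 1e-3, 2.0

def step(x):
    # One Minimizing Movements step: y solves y - x + tau * F'(y) = 0 (strictly increasing in y).
    g = lambda y: y - x + tau * (y ** 3 + lam * y)
    return brentq(g, -10.0, 10.0)

x, y = 1.5, -0.5
for n in range(int(round(T / tau))):
    x, y = step(x), step(y)
# |x_T - y_T| should not exceed e^{-lam T} |x_0 - y_0| (up to O(tau)):
print(abs(x - y), np.exp(-lam * T) * abs(1.5 - (-0.5)))
```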
3.3 Applications to the Wasserstein case
The aim of this section is to apply the abstract theory developed in the previous one to the case of
functionals on (P2 (Rd ), W2 ). As we will see, various diffusion equations may be interpreted as
Gradient Flows of appropriate energy functionals w.r.t. the Wasserstein distance, and quantitative
analytic properties of the solutions can be derived by this interpretation.
Most of what we are going to discuss here is valid in the more general contexts of Riemannian
manifolds and Hilbert spaces, but the differences between these latter cases and the Euclidean one
are mainly technical, thus we keep the discussion at a level of Rd to avoid complications that would
just obscure the main ideas.
The section is split in two subsections: in the first one we discuss the definition of subdifferential of a λ-geodesically convex functional on P2 (Rd ), which is based on the interpretation of P2 (Rd ) as
a sort of Riemannian manifold as discussed in Subsection 2.3.2. In the second one we discuss three
by now classical applications, for which the full power of the abstract theory can be used (i.e. we
will have Gradient Flows in the EVI formulation).
Before developing this program, we want to informally discuss a fundamental example.
Let us consider the Entropy functional E : P2 (Rd ) → R ∪ {+∞} defined by

E(µ) := ∫ ρ log(ρ) dLd   if µ = ρLd ,    E(µ) := +∞   otherwise.
We claim that: the Gradient Flow of the Entropy in (P2 (Rd ), W2 ) produces a solution of the Heat
equation. This can be proved rigorously (see Subsection 3.3.2), but for the moment we want to keep
the discussion at the heuristic level.
By what discussed in the previous section, we know that the Minimizing Movements scheme
produces Gradient Flows. Let us apply the scheme to this setting. Fix an absolutely continuous
measure ρ0 (here we will make no distinction between an absolutely continuous measure and its
density), fix τ > 0 and minimize
µ ↦ E(µ) + W2²(µ, ρ0 ) / (2τ).    (3.37)
It is not hard to see that the minimum is attained at some absolutely continuous measure ρτ (actually
the minimum is unique, but this has no importance). Our claim will be “proved” if we show that for
any ϕ ∈ Cc∞ (Rd ) it holds
( ∫ ϕ ρτ − ∫ ϕ ρ0 ) / τ = ∫ ∆ϕ ρτ + o(τ),    (3.38)
because this identity tells us that ρτ is a first order approximation of the distributional solution of the
Heat equation starting from ρ0 and evaluated at time τ .
To prove (3.38), fix ϕ ∈ Cc∞ (Rd ) and perturb ρτ in the following way:
ρε := (Id + ε∇ϕ)# ρτ .
The density of ρε can be explicitly expressed by
ρε (x + ε∇ϕ(x)) = ρτ (x) / det(Id + ε∇²ϕ(x)).
Observe that it holds

E(ρε ) = ∫ ρε log(ρε ) = ∫ ρτ log( ρε ◦ (Id + ε∇ϕ) ) = ∫ ρτ log( ρτ / det(Id + ε∇²ϕ) )
  = E(ρτ ) − ∫ ρτ log det(Id + ε∇²ϕ) = E(ρτ ) − ε ∫ ρτ ∆ϕ + o(ε),    (3.39)

where we used the fact that det(Id + εA) = 1 + ε tr(A) + o(ε).
To evaluate the first variation of the distance squared, let T be the optimal transport map from ρτ to ρ0 , which exists because of Theorem 1.26, and observe that from T# ρτ = ρ0 , (Id + ε∇ϕ)# ρτ = ρε and inequality (2.1) we have

W2²(ρ0 , ρε ) ≤ ‖T − Id − ε∇ϕ‖²L2(ρτ) ,

therefore from the fact that equality holds at ε = 0 we get

W2²(ρ0 , ρε ) − W2²(ρ0 , ρτ ) ≤ ‖T − Id − ε∇ϕ‖²L2(ρτ) − ‖T − Id‖²L2(ρτ) = −2ε ∫ ⟨T − Id, ∇ϕ⟩ ρτ + o(ε).    (3.40)
From the minimality of ρτ for the problem (3.37) we know that

E(ρε ) + W2²(ρε , ρ0 )/(2τ) ≥ E(ρτ ) + W2²(ρτ , ρ0 )/(2τ),    ∀ε,

so that using (3.39) and (3.40), dividing by ε, rearranging the terms and letting ε ↓ 0 and ε ↑ 0 we get the following Euler-Lagrange equation for ρτ :

∫ ρτ ∆ϕ + ∫ ⟨ (T − Id)/τ , ∇ϕ ⟩ ρτ = 0.    (3.41)
Now observe that from T# ρτ = ρ0 we get

( ∫ ϕ ρτ − ∫ ϕ ρ0 ) / τ = − (1/τ) ∫ ( ϕ(T(x)) − ϕ(x) ) ρτ (x) dx
  = − (1/τ) ∫ ∫_0^1 ⟨∇ϕ((1 − t)x + tT(x)), T(x) − x⟩ dt ρτ (x) dx
  = − (1/τ) ∫ ⟨∇ϕ(x), T(x) − x⟩ ρτ (x) dx + Remτ
  = ∫ ∆ϕ ρτ + Remτ    (by (3.41)),
where the remainder term Remτ is bounded by

|Remτ | ≤ (Lip(∇ϕ)/τ) ∫ ∫_0^1 t |T(x) − x|² dt ρτ (x) dx = (Lip(∇ϕ)/(2τ)) W2²(ρ0 , ρτ ).

Since, heuristically speaking, W2 (ρ0 , ρτ ) has the same magnitude as τ, we have Remτ = o(τ) and the “proof” is complete.
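A tiny sanity check of this heuristic (not in the original text) can be done by restricting the minimization (3.37) to centered one-dimensional Gaussians, a simplifying assumption made only for illustration: there the entropy and the squared distance are explicit, and one step of the scheme reproduces the heat flow ρ0 ↦ N(0, σ² + 2τ) up to higher order terms in τ.

```python
import numpy as np

# Restricted to mu = N(0, s^2): E(mu) = -0.5*np.log(2*np.pi*np.e*s**2) and, for rho_0 = N(0, sigma^2),
# W_2^2(mu, rho_0) = (s - sigma)^2. Minimizing E(mu) + W_2^2(mu, rho_0)/(2 tau) over s > 0 gives
# s^2 - sigma*s - tau = 0, i.e. s = (sigma + sqrt(sigma^2 + 4 tau))/2, while the exact heat flow
# at time tau has variance sigma^2 + 2 tau.
sigma = 1.0
for tau in [0.1, 0.01, 0.001]:
    s = 0.5 * (sigma + np.sqrt(sigma ** 2 + 4 * tau))
    jko_var, heat_var = s ** 2, sigma ** 2 + 2 * tau
    print(f"tau = {tau:6.3f}   one-step variance = {jko_var:.6f}   heat variance = {heat_var:.6f}"
          f"   difference = {jko_var - heat_var:+.2e}")   # difference is O(tau^2)
```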
3.3.1 Elements of subdifferential calculus in (P2 (Rd ), W2 )
Recall that we introduced a weak Riemannian structure on the space (P2 (M ), W2 ) in Subsection 2.3.2. Among others, this weak Riemannian structure of (P2 (M ), W2 ) allows the development
of a subdifferential calculus for geodesically convex functionals, in the same spirit (and with many
formal similarities) of the usual subdifferential calculus for convex functionals on an Hilbert space.
To keep the notation and the discussion simpler, we are going to define the subdifferential of a
geodesically convex functional only for the case P2 (Rd ) and for regular measures (Definition 1.25),
but everything can be done also on manifolds (or Hilbert spaces) and for general µ ∈ P2 (M ).
Recall that for a λ-convex functional F on a Hilbert space H, the subdifferential ∂−F (x) at a point x is the set of vectors v ∈ H such that

F (x) + ⟨v, y − x⟩ + (λ/2)|x − y|² ≤ F (y),    ∀y ∈ H.
Definition 3.26 (Subdifferential in (P2 (Rd ), W2 )) Let E : P2 (Rd ) → R ∪ {+∞} be a λ-geodesically convex and lower semicontinuous functional, and µ ∈ P2 (Rd ) be a regular measure such that E(µ) < ∞. The set ∂W E(µ) ⊂ Tanµ (P2 (Rd )) is the set of vector fields v ∈ L2 (µ, Rd ) such that

E(µ) + ∫ ⟨Tµν − Id, v⟩ dµ + (λ/2) W2²(µ, ν) ≤ E(ν),    ∀ν ∈ P2 (Rd ),

where here and in the following Tµν will denote the optimal transport map from the regular measure µ to ν (whose existence and uniqueness is guaranteed by Theorem 1.26).
Observe that the subdifferential of a λ-geodesically convex functional E has the following monotonicity property (which closely resembles the analogous one valid for λ-convex functionals on a Hilbert space):

∫ ⟨v, Tµν − Id⟩ dµ + ∫ ⟨w, Tνµ − Id⟩ dν ≤ −λ W2²(µ, ν),    (3.42)

for every couple of regular measures µ, ν in the domain of E, and v ∈ ∂W E(µ), w ∈ ∂W E(ν). To prove (3.42) just observe that from the definition of subdifferential we have

E(µ) + ∫ ⟨Tµν − Id, v⟩ dµ + (λ/2) W2²(µ, ν) ≤ E(ν),
E(ν) + ∫ ⟨Tνµ − Id, w⟩ dν + (λ/2) W2²(µ, ν) ≤ E(µ),

and add up these inequalities.
The definition of subdifferential leads naturally to the definition of Gradient Flow: it is sufficient
to transpose the definition given with the system (3.2).
Definition 3.27 (Subdifferential formulation of Gradient Flow) Let E be a λ-geodesically convex functional on P2 (Rd ) and µ ∈ P2 (Rd ). Then (µt ) is a Gradient Flow for E starting from µ
provided it is a locally absolutely continuous curve, µt → µ as t → 0 w.r.t. the distance W2 , µt is
regular for t > 0 and it holds
$$-v_t\in\partial^W E(\mu_t),\qquad\text{a.e. }t,$$
where (vt ) is the vector field uniquely identified by the curve (µt ) via
$$\frac{d}{dt}\mu_t+\nabla\cdot(v_t\mu_t)=0,\qquad v_t\in\mathrm{Tan}_{\mu_t}(P_2(\mathbb R^d))\quad\text{a.e. }t,$$
(recall Theorem 2.29 and Definition 2.31).
Thus we have a total of 4 different formulations of Gradient Flows of λ-geodesically convex functionals on P2 (Rd ) based respectively on the Energy Dissipation Inequality, the Energy Dissipation
Equality, the Evolution Variational Inequality and the notion of subdifferential.
The important point is that these 4 formulations are equivalent for λ−geodesically convex functionals:
Proposition 3.28 (Equivalence of the various formulation of GF in the Wasserstein space) Let
E be a λ-geodesically convex functional on P2 (Rd ) and (µt ) a curve made of regular measures.
Then for (µt ) the 4 definitions of Gradient Flow for E (EDI, EDE, EVI and the Subdifferential one)
are equivalent.
Sketch of the Proof
We prove only that the EVI formulation is equivalent to the Subdifferential one. Recall that by
Proposition 2.34 we know that
$$\frac12\frac{d}{dt}W_2^2(\mu_t,\nu)=-\int\big\langle v_t,T_{\mu_t}^\nu-\mathrm{Id}\big\rangle\,d\mu_t,\qquad\text{a.e. }t,$$
where $T_{\mu_t}^\nu$ is the optimal transport map from µt to ν. Then we have
$$-v_t\in\partial^W E(\mu_t),\qquad\text{a.e. }t,$$
$$\Updownarrow$$
$$E(\mu_t)+\int\big\langle-v_t,T_{\mu_t}^\nu-\mathrm{Id}\big\rangle\,d\mu_t+\frac\lambda2W_2^2(\mu_t,\nu)\le E(\nu),\qquad\forall\nu\in P_2(\mathbb R^d),\ \text{a.e. }t,$$
$$\Updownarrow$$
$$E(\mu_t)+\frac12\frac{d}{dt}W_2^2(\mu_t,\nu)+\frac\lambda2W_2^2(\mu_t,\nu)\le E(\nu),\qquad\forall\nu\in P_2(\mathbb R^d),\ \text{a.e. }t.$$
3.3.2
Three classical functionals
We now pass to the analysis of three by now classical examples of Gradient Flows in the Wasserstein space. Recall that in terms of strength, the best theory to use is the one of Subsection 3.2.4, because the compatibility of Energy and distance ensures strong properties both at the level of discrete solutions and for the limit curve obtained. Once we have a Gradient Flow, the Subdifferential formulation will let us understand which PDE is associated to it.
Let us recall (Example 2.21) that the space (P2 (Rd ), W2 ) is not Non Positively Curved in the
sense of Alexandrov, this means that if we want to check whether a given functional is compatible
with the distance or not, we cannot use geodesics to interpolate between points (because we would
violate the second inequality in (3.29)). A priori the choice of the interpolating curves may depend
on the functional, but actually in what comes next we will always use the ones defined by:
Definition 3.29 (Interpolating curves) Let µ, ν0 , ν1 ∈ P2 (Rd ) and assume that µ is regular (Definition 1.25). The interpolating curve (νt ) from ν0 to ν1 with base µ is defined as
νt := ((1 − t)T0 + tT1 )# µ,
where T0 and T1 are the optimal transport maps from µ to ν0 and ν1 respectively. Observe that if
µ = ν0 , the interpolating curve reduces to the geodesic connecting it to ν1 .
Strictly speaking, in order to apply the theory of Section 3.2.4 we should define interpolating
curves having as base any measure µ ∈ P2 (Rd ), and not just regular ones. This is actually possible,
and the foregoing discussion can be applied to the more general definition, but we prefer to avoid
technicalities, and just focus on the main concepts.
For an interpolating curve as in the definition it holds:
$$W_2^2(\mu,\nu_t)\le(1-t)W_2^2(\mu,\nu_0)+tW_2^2(\mu,\nu_1)-t(1-t)W_2^2(\nu_0,\nu_1).\tag{3.43}$$
Indeed the map (1 − t)T0 + tT1 is optimal from µ to νt (because we know that T0 and T1 are the gradients of convex functions ϕ0, ϕ1 respectively, thus (1 − t)T0 + tT1 is the gradient of the convex function (1 − t)ϕ0 + tϕ1, and thus is optimal), and we know by inequality (2.1) that $W_2^2(\nu_0,\nu_1)\le\|T_0-T_1\|^2_{L^2(\mu)}$, thus it holds
$$\begin{aligned}W_2^2(\mu,\nu_t)&=\|(1-t)T_0+tT_1-\mathrm{Id}\|^2_{L^2(\mu)}\\&=(1-t)\|T_0-\mathrm{Id}\|^2_{L^2(\mu)}+t\|T_1-\mathrm{Id}\|^2_{L^2(\mu)}-t(1-t)\|T_0-T_1\|^2_{L^2(\mu)}\\&\le(1-t)W_2^2(\mu,\nu_0)+tW_2^2(\mu,\nu_1)-t(1-t)W_2^2(\nu_0,\nu_1).\end{aligned}$$
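In dimension one the interpolating curve of Definition 3.29 can be built explicitly from monotone rearrangements, which also makes inequality (3.43) easy to test. The following is a minimal sketch; the sample-based discretization and the particular measures are illustrative assumptions, not part of the text.

```python
import numpy as np

# Interpolating curve nu_t = ((1-t) T_0 + t T_1)_# mu in 1D, where T_0, T_1 are
# the monotone (optimal) maps from the base measure mu to nu_0, nu_1.
rng = np.random.default_rng(0)
n = 20000
mu  = rng.normal(0.0, 1.0, n)          # samples of the regular base measure
nu0 = rng.normal(-2.0, 0.5, n)         # samples of nu_0
nu1 = rng.exponential(1.0, n)          # samples of nu_1

def monotone_map(src, dst):
    # the 1D optimal map sends the i-th smallest src point to the i-th smallest dst point
    T = np.empty_like(src)
    T[np.argsort(src)] = np.sort(dst)
    return T

def w2_sq(a, b):
    # squared W_2 between empirical measures with equal weights
    return np.mean((np.sort(a) - np.sort(b)) ** 2)

T0, T1 = monotone_map(mu, nu0), monotone_map(mu, nu1)
t = 0.3
nu_t = (1 - t) * T0 + t * T1           # samples of the interpolating curve

lhs = w2_sq(mu, nu_t)
rhs = (1 - t) * w2_sq(mu, nu0) + t * w2_sq(mu, nu1) - t * (1 - t) * w2_sq(nu0, nu1)
print(lhs, rhs)                        # up to sampling error, lhs <= rhs as in (3.43)
```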
We now pass to the description of the three functionals we want to study.
Definition 3.30 (Potential energy) Let V : Rd → R∪{+∞} be lower semicontinuous and bounded
from below. The potential energy functional V : P2 (Rd ) → R ∪ {+∞} associated to V is defined
by
$$\mathcal V(\mu):=\int V\,d\mu.$$
Definition 3.31 (Interaction energy) Let W : Rd → R ∪ {+∞} be lower semicontinuous, even
and bounded from below. The interaction energy functional W : P2 (Rd ) → R ∪ {+∞} associated
to W is defined by
$$\mathcal W(\mu):=\frac12\int W(x_1-x_2)\,d\mu\times\mu(x_1,x_2).$$
Observe that the definition makes sense also for not even functions W ; however, replacing if necessary the function W (x) with (W (x) + W (−x))/2 we get an even function leaving the value of the
functional unchanged.
Definition 3.32 (Internal energy) Let u : [0, +∞) → R ∪ {+∞} be a convex function bounded
from below such that u(0) = 0 and
$$\lim_{z\to0}\frac{u(z)}{z^\alpha}>-\infty\qquad\text{for some }\alpha>\frac{d}{d+2},\tag{3.44}$$
let $u'(\infty):=\lim_{z\to\infty}u(z)/z$. The internal energy functional E associated to u is
$$\mathcal E(\mu):=\int u(\rho)\,d\mathcal L^d+u'(\infty)\,\mu^s(\mathbb R^d),$$
where µ = ρLd + µs is the decomposition of µ in absolutely continuous and singular parts w.r.t. the
Lebesgue measure.
Condition (3.44) ensures that the negative part of u(ρ) is integrable for µ ∈ P2 (Rd ), so that E
is well defined (possibly +∞). Indeed from (3.44) we have u− (z) ≤ az + bz α for some α < 1
satisfying 2α/(1 − α) > d, and it holds
$$\int\rho^\alpha\,d\mathcal L^d=\int\rho^\alpha(1+|x|)^{2\alpha}(1+|x|)^{-2\alpha}\,d\mathcal L^d\le\Big(\int\rho(x)(1+|x|)^2\,d\mathcal L^d\Big)^{\alpha}\Big(\int(1+|x|)^{-\frac{2\alpha}{1-\alpha}}\,d\mathcal L^d\Big)^{1-\alpha}<\infty.$$
Under appropriate assumptions on V, W and u the above defined functionals are compatible with
the distance W2 . As said before we will use as interpolating curves those given in Definition 3.29.
Proposition 3.33 Let λ ≥ 0. The following holds.
i) The functional V is λ-convex along interpolating curves in (P2 (Rd ), W2 ) if and only if V is
λ-convex.
ii) The functional W is λ-convex along interpolating curves in (P2 (Rd ), W2 ) if W is λ-convex.
iii) The functional E is convex along interpolating curves in (P2 (Rd ), W2 ) provided u satisfies
$$z\ \mapsto\ z^d\,u(z^{-d})\qquad\text{is convex and non increasing on }(0,+\infty).\tag{3.45}$$
Proof Since the second inequality in (3.29) is satisfied by the interpolating curves that we are considering (inequality (3.43)) we need only to check the convexity of the functionals.
Let (νt ) be an interpolating curve with base the regular measure µ, and T0 , T1 the optimal transport maps from µ to ν0 and ν1 respectively.
The ‘only if’ part of (i) follows simply by considering interpolations of deltas. For the ‘if’ part, observe that⁵
$$\begin{aligned}\mathcal V(\nu_t)&=\int V(x)\,d\nu_t(x)=\int V\big((1-t)T_0(x)+tT_1(x)\big)\,d\mu(x)\\&\le(1-t)\int V(T_0(x))\,d\mu(x)+t\int V(T_1(x))\,d\mu(x)-\frac{\lambda}{2}t(1-t)\int|T_0(x)-T_1(x)|^2\,d\mu(x)\\&\le(1-t)\mathcal V(\nu_0)+t\mathcal V(\nu_1)-\frac{\lambda}{2}t(1-t)W_2^2(\nu_0,\nu_1).\end{aligned}\tag{3.46}$$
For (ii) we start claiming that W2²(µ × µ, ν × ν) = 2W2²(µ, ν) for any µ, ν ∈ P2 (Rd ). To prove this, it is enough to check that if γ ∈ Opt (µ, ν) then γ̃ := (π¹, π¹, π², π²)# γ ∈ Opt (µ × µ, ν × ν). To see this, let ϕ : Rd → R ∪ {+∞} be a convex function such that supp(γ) ⊂ ∂⁻ϕ and define the convex function ϕ̃ on R2d by ϕ̃(x, y) = ϕ(x) + ϕ(y). It is immediate to verify that supp(γ̃) ⊂ ∂⁻ϕ̃, so that γ̃ is optimal as well. This argument also shows that if (νt ) is an interpolating curve with base µ, then t ↦ νt × νt is an interpolating curve from ν0 × ν0 to ν1 × ν1 with base µ × µ. Also, (x1, x2) ↦ W (x1 − x2) is λ-convex if W is. The conclusion now follows from case (i).
We pass to (iii). We will make the simplifying assumption that µ ≪ Ld and that T0 and T1 are smooth and satisfy det(∇T0)(x) ≠ 0, det(∇T1)(x) ≠ 0 for every x ∈ supp(µ) (up to an approximation argument it is possible to reduce to this case; we omit the details). Then, writing µ = ρLd, from the change of variable formula we get that νt ≪ Ld and for its density ρ̃t it holds
$$\tilde\rho_t(T_t(x))=\frac{\rho(x)}{\det(\nabla T_t(x))},$$
⁵ The assumption λ ≥ 0 is necessary to have the last inequality in (3.46). If λ < 0, λ-convexity of V along interpolating curves is no longer true, so that we cannot directly apply the results of Subsection 3.2.4. Yet, adapting the arguments, it is possible to show that all the results which we will present hereafter are true for general λ ∈ R.
where we wrote Tt for (1 − t)T0 + tT1. Thus
$$\mathcal E(\nu_t)=\int u(\tilde\rho_t(y))\,d\mathcal L^d(y)=\int u\Big(\frac{\rho(x)}{\det(\nabla T_t)(x)}\Big)\det(\nabla T_t)(x)\,d\mathcal L^d(x).$$
Therefore the proof will be complete if we show that $A\mapsto u\big(\tfrac{\rho(x)}{\det(A)}\big)\det(A)$ is convex on the set of positive definite symmetric matrices for any x ∈ supp(µ). Observe that this map is the composition of the convex and non increasing map $z\mapsto z^d u(\rho(x)/z^d)$ with the map $A\mapsto(\det(A))^{1/d}$. Thus to conclude it is sufficient to show that $A\mapsto(\det(A))^{1/d}$ is concave. To this aim, pick two symmetric and positive definite matrices A0 and A1, notice that
$$\det\big((1-t)A_0+tA_1\big)^{1/d}=\det(A_0)^{1/d}\det(\mathrm{Id}+tB)^{1/d},$$
where $B=A_0^{-1/2}(A_1-A_0)A_0^{-1/2}$, and conclude by
$$\frac{d}{dt}\det(\mathrm{Id}+tB)^{1/d}=\frac1d\det(\mathrm{Id}+tB)^{1/d}\,\mathrm{tr}\big(B(\mathrm{Id}+tB)^{-1}\big),$$
$$\frac{d^2}{dt^2}\det(\mathrm{Id}+tB)^{1/d}=\det(\mathrm{Id}+tB)^{1/d}\Big(\frac1{d^2}\,\mathrm{tr}^2\big(B(\mathrm{Id}+tB)^{-1}\big)-\frac1d\,\mathrm{tr}\big((B(\mathrm{Id}+tB)^{-1})^2\big)\Big)\le0,$$
where in the last step we used the inequality $\mathrm{tr}^2(C)\le d\,\mathrm{tr}(C^2)$ for $C=B(\mathrm{Id}+tB)^{-1}$.
Important examples of functions u satisfying (3.44) and (3.45) are:
$$u(z)=\frac{z^\alpha-z}{\alpha-1},\quad\alpha\ge1-\frac1d,\ \alpha\ne1,\qquad\qquad u(z)=z\log(z).\tag{3.47}$$
Remark 3.34 (A dimension free condition on u) We saw that a sufficient condition on u to ensure
that E is convex along interpolating curves is the fact that the map z 7→ z d u(z −d ) is convex and
non increasing, so the dimension d of the ambient space plays a role in the condition. The fact that
the map is non increasing follows by the convexity of u together with u(0) = 0, while by simple
computations we see that its convexity is equivalent to
$$z^{-1}u(z)-u'(z)+zu''(z)\ \ge\ -\frac{1}{d-1}\,z\,u''(z).\tag{3.48}$$
Notice that the higher d is, the stricter the condition becomes. For applications in infinite dimensional
spaces, it is desirable to have a condition on u ensuring the convexity of E in which the dimension
does not enter. As inequality (3.48) shows, the weakest such condition for which E is convex in any
dimension is:
$$z^{-1}u(z)-u'(z)+zu''(z)\ge0,$$
and some computations show that this is in turn equivalent to the convexity of the map
$$z\ \mapsto\ e^z\,u(e^{-z}).$$
A key example of map satisfying this condition is $z\mapsto z\log(z)$: in this case $e^z u(e^{-z})=-z$, which is affine and hence convex.
Therefore we have the following existence and uniqueness result:
Theorem 3.35 Let λ ≥ 0 and F be either V, W, E (or a linear combination of them with positive coefficients) and λ-convex along interpolating curves. Then for every µ ∈ P2 (Rd ) there exists a unique Gradient Flow (µt ) for F starting from µ in the EVI formulation. The curve (µt ) is locally absolutely continuous on (0, +∞), satisfies µt → µ as t → 0 and, if µt is regular for every t ≥ 0, it holds
$$-v_t\in\partial^W F(\mu_t),\qquad\text{a.e. }t\in(0,+\infty),\tag{3.49}$$
where (vt ) is the velocity vector field associated to (µt ) characterized by
$$\frac{d}{dt}\mu_t+\nabla\cdot(v_t\mu_t)=0,\qquad v_t\in\mathrm{Tan}_{\mu_t}(P_2(\mathbb R^d))\quad\text{a.e. }t.$$
Proof Use the existence Theorem 3.25 and the equivalence of the EVI formulation of Gradient Flow
and the Subdifferential one provided by Proposition 3.28.
It remains to understand which kind of equation is satisfied by the Gradient Flow (µt ). By equation (3.49), this corresponds to identifying the subdifferentials of V, W, E at a generic µ ∈ P2 (Rd ). This is the content of the next three propositions. For simplicity, we state and prove them only under some - unneeded - smoothness assumptions. The underlying idea of all the calculations we are going to do is the following equivalence:
$$v\in\partial^W\mathcal F(\mu)\quad\overset{\approx}{\Longleftrightarrow}\quad\lim_{\varepsilon\to0}\frac{\mathcal F((\mathrm{Id}+\varepsilon\nabla\varphi)_\#\mu)-\mathcal F(\mu)}{\varepsilon}=\int\langle v,\nabla\varphi\rangle\,d\mu,\qquad\forall\varphi\in C_c^\infty(\mathbb R^d),\tag{3.50}$$
valid for any λ-geodesically convex functional, where we wrote $\overset{\approx}{\Leftrightarrow}$ to mean that this equivalence
holds only when everything is smooth. To understand why (3.50) holds, start assuming that v ∈
∂ W F (µ), fix ϕ ∈ Cc∞ (Rd ) and recall that for ε sufficiently small the map Id + ε∇ϕ is optimal
(Remark 1.22). Thus by definition of subdifferential we have
$$\mathcal F(\mu)+\varepsilon\int\langle v,\nabla\varphi\rangle\,d\mu+\frac{\lambda}{2}\varepsilon^2\|\nabla\varphi\|^2_{L^2(\mu)}\le\mathcal F\big((\mathrm{Id}+\varepsilon\nabla\varphi)_\#\mu\big).$$
Subtracting F(µ) on both sides, dividing by ε > 0 and ε < 0 and letting ε → 0 we get the implication
⇒. To “prove” the converse one, pick ν ∈ P2 (Rd ), let T be the optimal transport map from µ to
ν and recall that T is the gradient of a convex function φ. Assume that φ is smooth and define
ϕ(x) := φ(x) − |x|2 /2. The geodesic (µt ) from µ to ν can then be written as
$$\mu_t=\big((1-t)\mathrm{Id}+tT\big)_\#\mu=\big((1-t)\mathrm{Id}+t\nabla\phi\big)_\#\mu=\big(\mathrm{Id}+t\nabla\varphi\big)_\#\mu.$$
From the λ-convexity hypothesis we know that
$$\mathcal F(\nu)\ge\mathcal F(\mu)+\frac{d}{dt}\Big|_{t=0}\mathcal F(\mu_t)+\frac\lambda2 W_2^2(\mu,\nu),$$
therefore, since we know that $\frac{d}{dt}\big|_{t=0}\mathcal F(\mu_t)=\int\langle v,\nabla\varphi\rangle\,d\mu$, from the arbitrariness of ν we deduce $v\in\partial^W\mathcal F(\mu)$.
Proposition 3.36 (Subdifferential of V) Let V : Rd → R be λ-convex and C 1 , let V be as in
Definition 3.30 and let µ ∈ P2 (Rd ) be regular and satisfying V(µ) < ∞. Then ∂ W V(µ) is non
empty if and only if ∇V ∈ L2 (µ), and in this case ∇V is the only element in the subdifferential of V
at µ.
Therefore, if (µt ) is a Gradient Flow of V made of regular measures, it solves
$$\frac{d}{dt}\mu_t=\nabla\cdot(\nabla V\,\mu_t),$$
in the sense of distributions in Rd × (0, +∞).
Sketch of the Proof Fix ϕ ∈ Cc∞ (Rd ) and observe that
$$\lim_{\varepsilon\to0}\frac{\mathcal V((\mathrm{Id}+\varepsilon\nabla\varphi)_\#\mu)-\mathcal V(\mu)}{\varepsilon}=\lim_{\varepsilon\to0}\int\frac{V\circ(\mathrm{Id}+\varepsilon\nabla\varphi)-V}{\varepsilon}\,d\mu=\int\langle\nabla V,\nabla\varphi\rangle\,d\mu.$$
Conclude using the equivalence (3.50).
Proposition 3.37 (Subdifferential of W) Let W : Rd → R be λ-convex, even and C¹, let W be as in Definition 3.31 and let µ be regular and satisfying W(µ) < ∞. Then ∂ W W(µ) ≠ ∅ if and only if (∇W ) ∗ µ belongs to L2 (µ), and in this case (∇W ) ∗ µ is the only element in the subdifferential of W at µ.
Therefore, if (µt ) is a Gradient Flow of W made of regular measures, it solves the non local evolution equation
$$\frac{d}{dt}\mu_t=\nabla\cdot\big((\nabla W*\mu_t)\,\mu_t\big),$$
in the sense of distributions in Rd × (0, +∞).
Sketch of the Proof Fix ϕ ∈ Cc∞ (Rd ), let $\mu^\varepsilon:=(\mathrm{Id}+\varepsilon\nabla\varphi)_\#\mu$ and observe that
$$\begin{aligned}\mathcal W(\mu^\varepsilon)&=\frac12\int W(x-y)\,d\mu^\varepsilon(x)\,d\mu^\varepsilon(y)=\frac12\int W\big(x-y+\varepsilon(\nabla\varphi(x)-\nabla\varphi(y))\big)\,d\mu(x)\,d\mu(y)\\&=\frac12\int W(x-y)\,d\mu(x)\,d\mu(y)+\frac\varepsilon2\int\big\langle\nabla W(x-y),\nabla\varphi(x)-\nabla\varphi(y)\big\rangle\,d\mu(x)\,d\mu(y)+o(\varepsilon).\end{aligned}$$
Now observe that
$$\int\big\langle\nabla W(x-y),\nabla\varphi(x)\big\rangle\,d\mu(x)\,d\mu(y)=\int\Big\langle\int\nabla W(x-y)\,d\mu(y),\nabla\varphi(x)\Big\rangle\,d\mu(x)=\int\big\langle\nabla W*\mu(x),\nabla\varphi(x)\big\rangle\,d\mu(x),$$
and, similarly,
$$\int\big\langle\nabla W(x-y),-\nabla\varphi(y)\big\rangle\,d\mu(x)\,d\mu(y)=\int\big\langle\nabla W*\mu(y),\nabla\varphi(y)\big\rangle\,d\mu(y)=\int\big\langle\nabla W*\mu(x),\nabla\varphi(x)\big\rangle\,d\mu(x).$$
Thus the conclusion follows by applying the equivalence (3.50).
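For atomic measures the nonlocal equation above reduces, at least formally, to a system of ODEs for the particle positions, which gives a quick way to visualize the flow. The sketch below uses this heuristic particle discretization; atomic measures are not regular, so this is only a formal illustration, and the quadratic W and the Euler time stepping are arbitrary choices not taken from the text.

```python
import numpy as np

# Heuristic particle version of  d/dt mu_t = div((grad W * mu_t) mu_t):
# for mu_t = (1/N) sum_i delta_{x_i(t)}, particle i moves with velocity
#   -(grad W * mu_t)(x_i) = -(1/N) sum_j grad W(x_i - x_j).
def grad_W(z):                      # W(z) = |z|^2 / 2, even and 1-convex
    return z

def step(X, dt):
    diffs = X[:, None, :] - X[None, :, :]      # x_i - x_j, shape (N, N, d)
    V = -grad_W(diffs).mean(axis=1)            # velocity of each particle
    return X + dt * V

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
for _ in range(200):
    X = step(X, dt=0.05)

# For this W the flow contracts every particle towards the (conserved) barycenter.
print(np.linalg.norm(X - X.mean(axis=0), axis=1).max())
```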
Proposition 3.38 (Subdifferential of E) Let u : [0, +∞) → R be convex, C 2 on (0, +∞), bounded
from below and satisfying conditions (3.44) and (3.45). Let µ = ρLd ∈ P2 (Rd ) be an absolutely
continuous measure with smooth density. Then ∇(u′(ρ)) is the unique element in ∂ W E(µ).
Therefore, if (µt ) is a Gradient Flow for E and µt is absolutely continuous with smooth density ρt for every t > 0, then t ↦ ρt solves the equation
$$\frac{d}{dt}\rho_t=\nabla\cdot\big(\rho_t\,\nabla(u'(\rho_t))\big).$$
Note: this statement is not perfectly accurate, because we are neglecting the integrability issues.
Indeed a priori we don’t know that ∇(u0 (ρ)) belongs to L2 (µ).
Sketch of the Proof Fix ϕ ∈ Cc∞ (Rd ) and define µε := (Id + ε∇ϕ)# µ. For ε sufficiently small, µε
is absolutely continuous and its density ρε satisfies - by the change of variable formula - the identity
$$\rho^\varepsilon(x+\varepsilon\nabla\varphi(x))=\frac{\rho(x)}{\det(\mathrm{Id}+\varepsilon\nabla^2\varphi(x))}.$$
Using the fact that $\frac{d}{d\varepsilon}\big|_{\varepsilon=0}\det(\mathrm{Id}+\varepsilon\nabla^2\varphi(x))=\Delta\varphi(x)$ we have
$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\mathcal E(\mu^\varepsilon)=\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\int u(\rho^\varepsilon(y))\,dy=\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\int u\Big(\frac{\rho(x)}{\det(\mathrm{Id}+\varepsilon\nabla^2\varphi(x))}\Big)\det(\mathrm{Id}+\varepsilon\nabla^2\varphi(x))\,dx$$
$$=\int\big(-\rho\,u'(\rho)+u(\rho)\big)\Delta\varphi=\int\big\langle\nabla\big(\rho u'(\rho)-u(\rho)\big),\nabla\varphi\big\rangle=\int\big\langle\nabla(u'(\rho)),\nabla\varphi\big\rangle\,\rho,$$
and the conclusion follows by the equivalence (3.50).
As an example, let u(z) := z log(z), and let V be a λ-convex smooth function on Rd. Since u′(z) = log(z) + 1, we have ρ∇(u′(ρ)) = ∇ρ, thus a gradient flow (ρt ) of F = E + V solves the Fokker-Planck equation
$$\frac{d}{dt}\rho_t=\Delta\rho_t+\nabla\cdot(\nabla V\,\rho_t).$$
Also, the contraction property (3.31) in Theorem 3.25 gives that for two gradient flows (ρt ), (ρ̃t ) the contractivity estimate
$$W_2(\rho_t,\tilde\rho_t)\le e^{-\lambda t}\,W_2(\rho_0,\tilde\rho_0)$$
holds.
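The contractivity estimate can be checked by hand in the simplest case. The sketch below assumes d = 1, V(x) = λx²/2 (which is λ-convex) and Gaussian initial data, for which the Fokker-Planck solution stays Gaussian with explicitly computable mean and variance, and W2 between Gaussians has a closed form; all of this is standard but not spelled out in the text, so it is only an illustration.

```python
import numpy as np

# Gaussian solutions of d/dt rho = Lap(rho) + div(grad V rho) with V = lam x^2 / 2:
#   mean(t) = m0 exp(-lam t),  var(t) = s0^2 exp(-2 lam t) + (1 - exp(-2 lam t)) / lam,
# and W_2((m1,s1),(m2,s2)) = sqrt((m1-m2)^2 + (s1-s2)^2) for 1D Gaussians.
lam = 0.7

def evolve(m0, s0, t):
    m = m0 * np.exp(-lam * t)
    s = np.sqrt(s0**2 * np.exp(-2 * lam * t) + (1 - np.exp(-2 * lam * t)) / lam)
    return m, s

def w2(g1, g2):
    return np.hypot(g1[0] - g2[0], g1[1] - g2[1])

g0, h0 = (2.0, 0.5), (-1.0, 1.5)       # two initial Gaussians (mean, std)
for t in (0.5, 1.0, 2.0, 4.0):
    print(w2(evolve(*g0, t), evolve(*h0, t)) <= np.exp(-lam * t) * w2(g0, h0) + 1e-12)
```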
3.4
Bibliographical notes
The content of Section 3.2 is taken from the first part of [6] (we refer to this book for a detailed
bibliographical references on the topic of gradient flows in metric spaces), with the only exception
of Proposition 3.6, whose proof has been communicated to us by Savaré (see also [72], [73]).
The study of geodesically convex functionals in (P2 (Rd ), W2 ) has been introduced by R. McCann in [63], who also proved that conditions (3.44) and (3.45) were sufficient to deduce the geodesic
convexity (called by him displacement convexity) of the internal energy functional.
The study of gradient flows in the Wasserstein space began in the seminal paper by R. Jordan, D.
Kinderlehrer and F. Otto [47], where it was proved that the minimizing movements procedure for the
functional
$$\rho\mathcal L^d\ \mapsto\ \int\rho\log\rho+V\rho\,d\mathcal L^d,$$
on the space (P2 (Rd ), W2 ), produces solutions of the Fokker-Planck equation. Later, F. Otto in [67] showed that the same discretization applied to
$$\rho\mathcal L^d\ \mapsto\ \frac{1}{\alpha-1}\int\rho^\alpha\,d\mathcal L^d,$$
(with the usual meaning for measures with a singular part) produces solutions of the porous medium equation. The impact of Otto's work on the community of optimal transport has been huge: not only was he able to provide concrete consequences (in terms of new estimates for the rate of convergence
of solutions of the porous medium equation) out of optimal transport theory, but he also clearly
described what is now called the ‘weak Riemannian structure’ of (P2 (Rd ), W2 ) (see also Chapter 6
and Subsection 2.3.2).
Otto’s intuitions have been studied and extended by many authors. The rigorous description
of many of the objects introduced by Otto, as well as a general discussion about gradient flows of
λ-geodesically convex functionals on (P2 (Rd ), W2 ) has been done in the second part of [6] (the
discussion made here is taken from this latter reference).
4
Geometric and functional inequalities
In this short Chapter we show how techniques coming from optimal transport can lead to simple
proofs of some important geometric and functional inequalities. None of the results proven here are
new, in the sense that they all were well known before the proofs coming from optimal transport
appeared. Still, it is interesting to observe how the tools described in the previous sections allow to
produce proofs which are occasionally simpler and in any case providing new informations when
compared to the ‘standard’ ones.
4.1
Brunn-Minkowski inequality
Recall that the Brunn-Minkowski inequality in Rd is:
$$\mathcal L^d\Big(\frac{A+B}{2}\Big)^{1/d}\ \ge\ \frac12\Big(\mathcal L^d(A)^{1/d}+\mathcal L^d(B)^{1/d}\Big),$$
and is valid for any couple of compact sets A, B ⊂ Rd .
To prove it, let A, B ⊂ Rd be compact sets and notice that without loss of generality we can
assume that L d (A), L d (B) > 0. Define
$$\mu_0:=\frac{1}{\mathcal L^d(A)}\,\mathcal L^d|_A,\qquad\qquad\mu_1:=\frac{1}{\mathcal L^d(B)}\,\mathcal L^d|_B,$$
and let (µt ) be the unique geodesic in (P2 (Rd ), W2 ) connecting them.
Recall from (3.47) that for $u(z)=-d(z^{1-1/d}-z)$ the functional $\mathcal E(\rho):=\int u(\rho)\,d\mathcal L^d$ is geodesically convex in (P2 (Rd ), W2 ). Also, simple calculations show that $\mathcal E(\mu_0)=-d(\mathcal L^d(A)^{1/d}-1)$ and $\mathcal E(\mu_1)=-d(\mathcal L^d(B)^{1/d}-1)$. Hence we have
$$\mathcal E(\mu_{1/2})\le-\frac d2\Big(\mathcal L^d(A)^{1/d}+\mathcal L^d(B)^{1/d}\Big)+d.$$
Now notice that Theorem 2.10 (see also Remark 2.13) ensures that $\mu_{1/2}$ is concentrated on $\frac{A+B}{2}$, thus letting $\tilde\mu_{1/2}:=\big(\mathcal L^d((A+B)/2)\big)^{-1}\mathcal L^d|_{(A+B)/2}$ and applying Jensen's inequality to the convex function u we get
$$\mathcal E(\mu_{1/2})\ge\mathcal E(\tilde\mu_{1/2})=-d\Big(\mathcal L^d\Big(\frac{A+B}{2}\Big)^{1/d}-1\Big),$$
which concludes the proof.
4.2
Isoperimetric inequality
On Rd the isoperimetric inequality can be written as
$$\mathcal L^d(E)^{1-\frac1d}\ \le\ \frac{P(E)}{d\,\mathcal L^d(B)^{\frac1d}},$$
where E is an arbitrary open set, P(E) its perimeter and B the unit ball.
We will prove this inequality via Brenier's Theorem 1.26, neglecting all the smoothness issues. Let
$$\mu:=\frac{1}{\mathcal L^d(E)}\,\mathcal L^d|_E,\qquad\qquad\nu:=\frac{1}{\mathcal L^d(B)}\,\mathcal L^d|_B,$$
and T : E → B be the optimal transport map (w.r.t. the cost given by the distance squared). The
change of variable formula gives
$$\frac{1}{\mathcal L^d(E)}=\det(\nabla T(x))\,\frac{1}{\mathcal L^d(B)},\qquad\forall x\in E.$$
Since we know that T is the gradient of a convex function, we have that ∇T (x) is a symmetric matrix
with non negative eigenvalues for every x ∈ E. Hence the arithmetic-geometric mean inequality
ensures that
$$\big(\det\nabla T(x)\big)^{1/d}\le\frac{\nabla\cdot T(x)}{d},\qquad\forall x\in E.$$
Coupling the last two equations we get
$$\frac{1}{\mathcal L^d(E)^{\frac1d}}\ \le\ \frac{\nabla\cdot T(x)}{d\,\mathcal L^d(B)^{\frac1d}},\qquad\forall x\in E.$$
Integrating over E and applying the divergence theorem we get
$$\mathcal L^d(E)^{1-\frac1d}\le\frac{1}{d\,\mathcal L^d(B)^{1/d}}\int_E\nabla\cdot T(x)\,dx=\frac{1}{d\,\mathcal L^d(B)^{1/d}}\int_{\partial E}\langle T(x),\nu(x)\rangle\,d\mathcal H^{d-1}(x),$$
where ν : ∂E → Rd is the outer unit normal vector. Since T (x) ∈ B for every x ∈ E, we have
|T (x)| ≤ 1 for x ∈ ∂E and thus hT (x), ν(x)i ≤ 1. We conclude with
$$\mathcal L^d(E)^{1-\frac1d}\ \le\ \frac{1}{d\,\mathcal L^d(B)^{1/d}}\int_{\partial E}\langle T(x),\nu(x)\rangle\,d\mathcal H^{d-1}(x)\ \le\ \frac{P(E)}{d\,\mathcal L^d(B)^{1/d}}.$$
4.3
Sobolev Inequality
The Sobolev inequality in Rd reads as:
$$\Big(\int|f|^{p^*}\Big)^{1/p^*}\le C(d,p)\Big(\int|\nabla f|^{p}\Big)^{1/p},\qquad\forall f\in W^{1,p}(\mathbb R^d),$$
where $1\le p<d$, $p^*:=\frac{dp}{d-p}$ and C(d, p) is a constant which depends only on the dimension d and the exponent p.
We will prove it via a method which closely resembles the one just used for the isoperimetric inequality. Again, we will neglect all the smoothness issues. Fix d, p and observe that without loss of generality we can assume f ≥ 0 and $\int|f|^{p^*}=1$, so that our aim is to prove that
$$\Big(\int|\nabla f|^p\Big)^{1/p}\ge C,\tag{4.1}$$
for some constant C not depending on f. Fix once and for all a smooth, non negative function g : Rd → R satisfying $\int g=1$, define the probability measures
$$\mu:=f^{p^*}\mathcal L^d,\qquad\qquad\nu:=g\,\mathcal L^d,$$
and let T be the optimal transport map from µ to ν (w.r.t. the cost given by the distance squared). The change of variable formula gives
$$g(T(x))=\frac{f^{p^*}(x)}{\det(\nabla T(x))},\qquad\forall x\in\mathbb R^d.$$
Hence we have
$$\int g^{1-\frac1d}=\int g^{-\frac1d}\,g=\int(g\circ T)^{-\frac1d}\,f^{p^*}=\int\det(\nabla T)^{\frac1d}\,(f^{p^*})^{1-\frac1d}.$$
As for the case of the isoperimetric inequality, we know that T is the gradient of a convex function, thus ∇T(x) is a symmetric matrix with non negative eigenvalues and the arithmetic-geometric mean inequality gives $(\det(\nabla T(x)))^{1/d}\le\frac{\nabla\cdot T(x)}{d}$. Thus we get
$$\int g^{1-\frac1d}\le\frac1d\int\nabla\cdot T\,(f^{p^*})^{1-\frac1d}=-\frac{p^*}{d}\Big(1-\frac1d\Big)\int f^{\frac{p^*}{q}}\,T\cdot\nabla f,$$
where $\frac1p+\frac1q=1$. Finally, by Hölder's inequality we have
$$\int g^{1-\frac1d}\le\frac{p^*}{d}\Big(1-\frac1d\Big)\Big(\int f^{p^*}|T|^q\Big)^{\frac1q}\Big(\int|\nabla f|^p\Big)^{\frac1p}=\frac{p^*}{d}\Big(1-\frac1d\Big)\Big(\int g(y)|y|^q\,dy\Big)^{\frac1q}\Big(\int|\nabla f|^p\Big)^{\frac1p}.$$
Since g was a fixed given function, (4.1) is proved.
4.4
Bibliographical notes
The possibility of proving the Brunn-Minkowski inequality via a change of variable is classical. It was McCann, in his PhD thesis [62], who noticed that the use of optimal transport leads to a natural choice of reparametrization. It is interesting to notice that this approach can be generalized to curved and non-smooth spaces having Ricci curvature bounded below, see Proposition 7.14.
The idea of proving the isoperimetric inequality via a change of variable argument is due to Gromov [65]: in Gromov's proof the optimal transport map is not used, but rather the so-called Knothe map. Such a map has the property that its gradient has non negative eigenvalues at every point, and
the reader can easily check that this is all we used of Brenier’s map in our proof, so that the argument
of Gromov is the same we used here. The use of Brenier’s map instead of Knothe’s one makes the
difference when studying the quantitative version of the isoperimetric problem: Figalli, Maggi and
Pratelli in [38], using tools coming from optimal transport, proved the sharp quantitative isoperimetric inequality in Rd endowed with any norm (the sharp quantitative isoperimetric inequality for
the Euclidean norm was proved earlier by Fusco, Maggi and Pratelli in [40] by completely different
means).
The approach used here to prove the Sobolev inequality has been generalized by Cordero-Erausquin, Nazaret and Villani in [30] to provide a new proof of the sharp Gagliardo-Nirenberg-Sobolev inequality together with the identification of the functions realizing the equality.
5
Variants of the Wasserstein distance
In this chapter we make a quick overview of some variants of the Wasserstein distance W2 together
with their applications. No proofs will be reported: our goal here is only to show that concepts
coming from the transport theory can be adapted to cover a broader range of applications.
5.1
Branched optimal transportation
Consider the transport problem with $\mu:=\delta_x$ and $\nu:=\frac12(\delta_{y_1}+\delta_{y_2})$ for the cost given by the distance squared on Rd. Then Theorem 2.10 and Remark 2.13 tell us that the unique geodesic (µt ) connecting µ to ν is given by
$$\mu_t:=\frac12\Big(\delta_{(1-t)x+ty_1}+\delta_{(1-t)x+ty_2}\Big),$$
so that the geodesic produces a ‘V-shaped’ path.
For some applications, this is unnatural: for instance in real life networks, when one wants to transport the good located in x to the destinations y1 and y2, it is preferred to produce a branched structure, where first the good is transported 'on a single truck' to some intermediate point, and only later split into two parts which are delivered to the two destinations. This produces a 'Y-shaped' path.
If we want to model the fact that 'it is convenient to ship things together', we are led to the following construction, due to Gilbert. Say that the starting distribution of mass is given by $\mu=\sum_i a_i\delta_{x_i}$ and that the final one is $\nu=\sum_j b_j\delta_{y_j}$, with $\sum_i a_i=\sum_j b_j=1$. An admissible dynamical transfer is then given by a finite, oriented, weighted graph G, where the weight is a function w : {set of edges of G} → R, satisfying the Kirchhoff rule:
$$\sum_{\text{edges }e\text{ outgoing from }x_i}w(e)\ -\sum_{\text{edges }e\text{ incoming in }x_i}w(e)=a_i,\qquad\forall i,$$
$$\sum_{\text{edges }e\text{ outgoing from }y_j}w(e)\ -\sum_{\text{edges }e\text{ incoming in }y_j}w(e)=-b_j,\qquad\forall j,$$
$$\sum_{\text{edges }e\text{ outgoing from }z}w(e)\ -\sum_{\text{edges }e\text{ incoming in }z}w(e)=0,\qquad\text{for any `internal' node }z\text{ of }G.$$
Then for α ∈ [0, 1] one minimizes
$$\sum_{\text{edges }e\text{ of }G}w^\alpha(e)\cdot\mathrm{length}(e),$$
among all admissible graphs G.
Observe that for α = 0 this problem reduces to the classical Steiner problem, while for α = 1 it
reduces to the classical optimal transport problem for cost = distance.
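The advantage of branching is easy to see on a tiny instance of Gilbert's problem. The sketch below compares the 'V-shaped' graph with 'Y-shaped' graphs for one source and two targets; the geometry and the value of α are arbitrary illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# mu = delta_{(0,0)}, nu = (delta_{(2,1)} + delta_{(2,-1)}) / 2, alpha = 0.5.
alpha = 0.5
y1 = np.array([2.0, 1.0])   # by symmetry the second target gives the same edge length

cost_V = 2 * 0.5**alpha * np.linalg.norm(y1)          # two direct edges of weight 1/2

def cost_Y(b):
    # trunk of weight 1 up to the branch point (b, 0), then two edges of weight 1/2
    branch = np.array([b, 0.0])
    return 1.0**alpha * b + 2 * 0.5**alpha * np.linalg.norm(y1 - branch)

res = minimize_scalar(cost_Y, bounds=(0.0, 2.0), method="bounded")
print(cost_V, res.fun)   # for this geometry and alpha = 0.5 the best Y-shaped graph is cheaper
```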
It is not hard to show the existence of a minimizer for this problem. What is interesting, is that a
‘continuous’ formulation is possible as well, which allows to discuss the minimization problem for
general initial and final measure in P(Rd ).
Definition 5.1 (Admissible continuous dynamical transfer) Let µ, ν ∈ P(Rd ). An admissible
continuous dynamical transfer from µ to ν is given by a countably H1 -rectifiable set Γ, an orientation
on it τ : Γ → S d−1 , and a weight function w : Γ → [0, +∞), such that the Rd valued measure
JΓ,τ,w defined by
$$J_{\Gamma,\tau,w}:=w\,\tau\,\mathcal H^1|_\Gamma,$$
satisfies
$$\nabla\cdot J_{\Gamma,\tau,w}=\nu-\mu,$$
(which is the natural generalization of the Kirchhoff rule).
Given α ∈ [0, 1], the cost function associated to (Γ, τ, w) is defined as
$$E_\alpha(J_{\Gamma,\tau,w}):=\int_\Gamma w^\alpha\,d\mathcal H^1.$$
Theorem 5.2 (Existence) Let µ, ν ∈ P(Rd ) with compact support. Then for all α ∈ [0, 1) there exists a minimizer of the cost in the set of admissible continuous dynamical transfers connecting µ to ν. If $\mu=\delta_z$ and $\nu=\mathcal L^d|_{[0,1]^d}$, the minimal cost is finite if and only if α > 1 − 1/d.
The fact that 1 − 1/d is a limit value to get a finite cost can be heuristically understood by the following calculation. Suppose we want to move a Delta mass δx into the Lebesgue measure on a unit cube whose center is x. Then the first thing one wants to try is: divide the cube into 2^d cubes of side length 1/2, then split the delta into 2^d masses and let them move onto the centers of these 2^d cubes. Repeat the process by dividing each of the 2^d cubes into 2^d cubes of side length 1/4 and so on. The total cost of this dynamical transfer is proportional to:
$$\sum_{i=1}^\infty\ \underbrace{2^{id}}_{\substack{\text{number of segments}\\\text{at the step }i}}\ \cdot\underbrace{\frac{1}{2^{i}}}_{\substack{\text{length of each}\\\text{segment at the step }i}}\cdot\underbrace{\frac{1}{2^{\alpha id}}}_{\substack{\text{weighted mass on each}\\\text{segment at the step }i}}\ =\ \sum_{i=1}^\infty 2^{i(d-1-\alpha d)},$$
which is finite if and only if d − 1 − αd < 0, that is, if and only if α > 1 − 1/d.
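The geometric series above is easy to probe numerically around the threshold; the snippet below (with d = 2 and arbitrary sample values of α) just computes a few partial sums.

```python
import numpy as np

# Partial sums of sum_i 2^(i (d - 1 - alpha d)) for d = 2; the threshold is alpha = 1 - 1/d = 0.5.
d = 2
for alpha in (0.45, 0.5, 0.55, 0.75):
    i = np.arange(1, 41)
    partial = np.cumsum(2.0 ** (i * (d - 1 - alpha * d)))
    print(alpha, partial[9], partial[39])   # grows without bound for alpha <= 0.5, stabilizes above
```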
A regularity result holds for α ∈ (1 − 1/d, 1) which states that far away from the supports of the
starting and final measures, any minimal transfer is actually a finite tree:
Theorem 5.3 (Regularity) Let µ, ν ∈ P(Rd ) with compact support, α ∈ (1 − 1/d, 1) and let
(Γ, τ, w) be a continuous tree with minimal α-cost between µ and ν. Then Γ is locally a finite tree in
Rd \ (supp µ ∪ supp ν).
5.2
Different action functional
Let us recall that the Benamou-Brenier formula (Proposition 2.30) identifies the squared Wasserstein distance between $\mu_0=\rho^0\mathcal L^d,\ \mu_1:=\rho^1\mathcal L^d\in P_2(\mathbb R^d)$ by
$$W_2^2(\mu_0,\mu_1)=\inf\int_0^1\!\!\int|v_t|^2(x)\,\rho_t(x)\,d\mathcal L^d(x)\,dt,$$
where the infimum is taken among all the distributional solutions of the continuity equation
$$\frac{d}{dt}\rho_t+\nabla\cdot(v_t\rho_t)=0,$$
with $\rho_0=\rho^0$ and $\rho_1=\rho^1$.
A natural generalization of the distance W2 comes by considering a new action, modified by
putting a weight on the density, that is: given a smooth function h : [0, ∞) → [0, ∞) we define
$$W_h^2(\rho^0\mathcal L^d,\rho^1\mathcal L^d)=\inf\int_0^1\!\!\int|v_t|^2(x)\,h(\rho_t(x))\,d\mathcal L^d(x)\,dt,\tag{5.1}$$
where the infimum is taken among all the distributional solutions of the non linear continuity equation
$$\frac{d}{dt}\rho_t+\nabla\cdot\big(v_t\,h(\rho_t)\big)=0,\tag{5.2}$$
with $\rho_0=\rho^0$ and $\rho_1=\rho^1$.
The key assumption that leads to the existence of an action minimizing curve is the concavity of h, since this leads to the joint convexity of
$$(\rho,J)\ \mapsto\ \frac{|J|^2}{h(\rho)},$$
so that using this convexity with $J=v\,h(\rho)$, one can prove existence of minima of (5.1). Particularly important is the case given by $h(z):=z^\alpha$ for α < 1, from which we can build the distance $\tilde W_\alpha$ defined by
$$\tilde W_\alpha(\rho^0\mathcal L^d,\rho^1\mathcal L^d):=\inf\Big(\int_0^1\!\!\int|v_t|^2(x)\,\rho_t^\alpha(x)\,d\mathcal L^d(x)\,dt\Big)^{\frac{1}{2-\alpha}},\tag{5.3}$$
the infimum being taken among all solutions of (5.2) with $\rho_0=\rho^0$ and $\rho_1=\rho^1$. The following theorem holds:
Theorem 5.4 Let α > 1 − 1/d. Then the infimum in (5.3) is always reached and, if it is finite, the minimizer is unique. Now fix a measure µ ∈ P(Rd ). The set of measures ν with $\tilde W_\alpha(\mu,\nu)<\infty$ endowed with $\tilde W_\alpha$ is a complete metric space and bounded subsets are narrowly compact.
We remark that the behavior of action minimizing curves in this setting is, in some very rough
sense, “dual” of the behavior of the branched optimal transportation discussed in the previous section.
Indeed, in this problem the mass tends to spread out along an action minimizing curve, rather than to
glue together.
5.3
An extension to measures with unequal mass
Let us come back to the Heat equation seen as Gradient Flow of the entropy functional $E(\rho)=\int\rho\log(\rho)$ with respect to the Wasserstein distance W2, as discussed at the beginning of Section 3.3 and in Subsection 3.3.2. We discussed the topic for arbitrary probability measures in Rd, but actually everything could have been done for probability measures concentrated on some open bounded set Ω ⊂ Rd with smooth boundary, that is: consider the metric space (P(Ω), W2 ) and the entropy functional $E(\rho)=\int\rho\log(\rho)$ for absolutely continuous measures and E(µ) = +∞ for measures with a singular part. Now use the Minimizing Movements scheme to build up a family of discrete solutions ρτt starting from some given measure ρ ∈ P(Ω). It is then possible to see that these discrete families converge as τ ↓ 0 to the solution of the Heat equation with Neumann boundary condition:
$$\begin{cases}\frac{d}{dt}\rho_t=\Delta\rho_t,&\text{in }\Omega\times(0,+\infty),\\ \rho_t\to\rho,&\text{weakly as }t\to0,\\ \nabla\rho_t\cdot\eta=0,&\text{on }\partial\Omega\times(0,\infty),\end{cases}$$
where η is the outward pointing unit vector on ∂Ω.
The fact that the boundary condition is the Neumann’s one, can be heuristically guessed by the
fact that working in P(Ω) enforces the mass to be constant, with no flow of the mass through the
boundary.
It is then natural to ask whether it is possible to modify the transportation distance in order to
take into account measures with unequal masses, and such that the Gradient Flow of the entropy
functional produces solutions of the Heat equation in Ω with Dirichlet boundary conditions. This is
actually doable, as we briefly discuss now.
Let Ω ⊂ Rd be open and bounded. Consider the set M2 (Ω) defined by
$$M_2(\Omega):=\Big\{\text{measures }\mu\text{ on }\Omega\text{ such that }\int d^2(x,\partial\Omega)\,d\mu(x)<\infty\Big\},$$
and for any µ, ν ∈ M2 (Ω) define the set of admissible transfer plans Admb(µ, ν) by: γ ∈ Admb(µ, ν) if and only if γ is a measure on $(\overline\Omega)^2$ such that
$$\pi^1_\#\gamma|_\Omega=\mu,\qquad\qquad\pi^2_\#\gamma|_\Omega=\nu.$$
Notice the difference w.r.t. the classical definition of transfer plan: here we are requiring the first
(respectively, second) marginal to coincide with µ (respectively ν) only inside the open set Ω. This
means that in transferring the mass from µ to ν we are free to take/put as much mass as we want
from/to the boundary. Then one defines the cost C(γ) of a plan γ by
$$C(\gamma):=\int|x-y|^2\,d\gamma(x,y),$$
and then the distance Wb2 by
$$Wb_2(\mu,\nu):=\inf\sqrt{C(\gamma)},$$
where the infimum is taken among all γ ∈ Admb(µ, ν).
The distance W b2 shares many properties with the Wasserstein distance W2 .
Theorem 5.5 (Main properties of W b2 ) The following hold:
• W b2 is a distance on M2 (Ω) and the metric space (M2 (Ω), W b2 ) is Polish and geodesic.
• A sequence (µn ) ⊂ M2 (Ω) converges to µ w.r.t. Wb2 if and only if µn converges weakly to µ in duality with continuous functions with compact support in Ω and $\int d^2(x,\partial\Omega)\,d\mu_n\to\int d^2(x,\partial\Omega)\,d\mu$ as n → ∞.
• Finally, a plan γ ∈ Admb(µ, ν) is optimal (i.e. it attains the minimum cost among admissible plans) if and only if there exists a c-concave function ϕ which is identically 0 on ∂Ω such that supp(γ) ⊂ ∂ c ϕ (here c(x, y) = |x − y|2 ).
Observe that (M2 (Ω), W b2 ) is always a geodesic space (while from Theorem 2.10 and Remark
2.14 we know that (P(Ω), W2 ) is geodesic if and only if Ω is, that is, if and only if Ω is convex).
It makes perfect sense to extend the entropy functional to the whole M2 (Ω): the formula is still $E(\mu)=\int\rho\log(\rho)$ for µ = ρLd |Ω, and E(µ) = +∞ for measures not absolutely continuous.
The Gradient Flow of the entropy w.r.t. W b2 produces solutions of the Heat equation with Dirichlet
boundary conditions in the following sense:
Theorem 5.6 Let µ ∈ M2 (Ω) be such that E(µ) < ∞. Then:
• for every τ > 0 there exists a unique discrete solution ρτt starting from µ and constructed via
the Minimizing Movements scheme as in Definition 3.7.
• As τ ↓ 0, the measures ρτt converge to a unique measure ρt in (M2 (Ω), W b2 ) for any t > 0.
• The map (x, t) ↦ ρt (x) is a solution of the Heat equation
$$\begin{cases}\frac{d}{dt}\rho_t=\Delta\rho_t,&\text{in }\Omega\times(0,+\infty),\\ \rho_t\to\mu,&\text{weakly as }t\to0,\end{cases}$$
subject to the Dirichlet boundary condition ρt (x) = e⁻¹ on ∂Ω for every t > 0 (that is, ρt − e⁻¹ belongs to H¹₀(Ω) for every t > 0).
The fact that the boundary value is given by e−1 can be heuristically guessed by the fact that
the entropy has a global minimum in M2 (Ω): such minimum is given by the measure with constant
density e−1 , i.e. the measure whose density is everywhere equal to the minimum of z 7→ z log(z).
On the bad side, the entropy E is not geodesically convex in (M2 (Ω), W b2 ), and this implies that
it is not clear whether the strong properties of Gradient Flows w.r.t. W2 as described in Section 3.3
- Theorem 3.35 and Proposition 3.38 are satisfied also in this setting. In particular, it is not clear
whether there is contractivity of the distance or not:
Open Problem 5.7 Let ρ¹t, ρ²t be two solutions of the Heat equation with Dirichlet boundary condition ρⁱt = e⁻¹ on ∂Ω for every t > 0, i = 1, 2. Prove or disprove that
$$Wb_2(\rho^1_t,\rho^2_t)\le Wb_2(\rho^1_s,\rho^2_s),\qquad\forall\,t>s.$$
The question is open also for convex and smooth open sets Ω.
5.4
Bibliographical notes
The connection of branched transport and transport problem as discussed in Section 5.1 was first
pointed out by Q. Xia in [81]. An equivalent model was proposed by F. Maddalena, J.-M. Morel and
S. Solimini in [61]. In [81], [60] and [15] the existence of an optimal branched transport (Theorem
5.2) was also provided. Later, this result has been extended in several directions, see for instance the
works A. Brancolini, G. Buttazzo and F. Santambrogio ([16]) and Bianchini-Brancolini [15]. The
interior regularity result (Theorem 5.3) has been proved By Q. Xia in [82] and M. Bernot, V. Caselles
and J.-M. Morel in [14]. Also, we remark that L. Brasco, G. Buttazzo and F. Santambrogio proved a
kind of Benamou-Brenier formula for branched transport in [17].
The content of Section 5.2 comes from J. Dolbeault, B. Nazaret and G. Savaré [33] and [26] of J.
Carrillo, S. Lisini, G. Savaré and D. Slepcev.
Section 5.3 is taken from a work of the second author and A. Figalli [37].
6
More on the structure of (P2 (M ), W2 )
The aim of this Chapter is to give a comprehensive description of the structure of the ‘Riemannian
manifold’ (P2 (Rd ), W2 ), thus the content of this part of the work is the natural continuation of what
we discussed in Subsection 2.3.2. For the sake of simplicity, we are going to stick to the Wasserstein
space on Rd , but the reader should keep in mind that the discussions here can be generalized with
only little effort to the Wasserstein space built over a Riemannian manifold.
6.1
“Duality” between the Wasserstein and the Arnold Manifolds
The content of this section is purely formal and directly comes from the seminal paper of Otto [67].
We won’t even try to provide a rigorous background for the discussion we will do here, as we believe
that dealing with the technical problems would lead the reader far from the geometric intuition. Also,
we will not use the “results” presented here later on: we just think that these concepts are worth of
mention. Thus for the purpose of this section just think that ‘each measure is absolutely continuous
with smooth density’, that ‘each L2 function is C ∞ ’, and so on.
Let us recall the definition of Riemannian submersion. Let M, N be Riemannian manifolds and let f : M → N be a smooth map. f is a submersion provided the map
$$df:\mathrm{Ker}^\perp\big(df(x)\big)\to T_{f(x)}N$$
is a surjective isometry for any x ∈ M . A trivial example of submersion is given in the case M =
N ×L (for some Riemannian manifold L, with M endowed with the product metric) and f : M → N
is the natural projection. More generally, if f is a Riemannian submersion, for each y ∈ N , the set
f −1 (y) ⊂ M is a smooth Riemannian submanifold.
The “duality” between the Wasserstein and the Arnold Manifolds consists in the fact that there
exists a Big Manifold BM which is flat and a natural Riemannian submersion from BM to P2 (Rd )
whose fibers are precisely the Arnold Manifolds.
Let us define the objects we are dealing with. Fix once and for all a reference measure ρ ∈
P2 (Rd ) (recall that we are “assuming” that all the measures are absolutely continuous with smooth
densities - so that we will use the same notation for both the measure and its density).
• The Big Manifold BM is the space L2 (ρ) of maps from Rd to Rd which are L2 w.r.t. the
reference measure ρ. The tangent space at some map T ∈ BM is naturally given by the set of
vector fields belonging to L2 (ρ), where the perturbation of T in the direction of the vector field
u is given by t 7→ T + tu.
• The target manifold of the submersion is the Wasserstein “manifold” P2 (Rd ). We recall that
the tangent space Tanρ (P2 (Rd )) at the measure ρ is the set
$$\mathrm{Tan}_\rho(P_2(\mathbb R^d)):=\big\{\nabla\varphi\ :\ \varphi\in C_c^\infty(\mathbb R^d)\big\},$$
endowed with the scalar product of L2 (ρ) (we neglect to take the closure in L2 (ρ) because we
want to keep the discussion at a formal level). The perturbation of a measure ρ in the direction
of a tangent vector ∇ϕ is given by t 7→ (Id + t∇ϕ)# ρ.
• The Arnold Manifold Arn(ρ) associated to a certain measure ρ ∈ P2 (Rd ) is the set of maps S : Rd → Rd which preserve ρ:
$$\mathrm{Arn}(\rho):=\big\{S:\mathbb R^d\to\mathbb R^d\ :\ S_\#\rho=\rho\big\}.$$
We endow Arn(ρ) with the L2 distance calculated w.r.t. ρ. To understand what the tangent space to Arn(ρ) at a certain map S is, pick a vector field v on Rd and consider the perturbation t ↦ S + tv of S in the direction of v. Then v is a tangent vector if and only if $\frac{d}{dt}\big|_{t=0}(S+tv)_\#\rho=0$. Observing that
$$\frac{d}{dt}\Big|_{t=0}(S+tv)_\#\rho=\frac{d}{dt}\Big|_{t=0}(\mathrm{Id}+tv\circ S^{-1})_\#(S_\#\rho)=\frac{d}{dt}\Big|_{t=0}(\mathrm{Id}+tv\circ S^{-1})_\#\rho=-\nabla\cdot(v\circ S^{-1}\,\rho),$$
we deduce
$$\mathrm{Tan}_S\,\mathrm{Arn}(\rho)=\big\{\text{vector fields }v\text{ on }\mathbb R^d\text{ such that }\nabla\cdot(v\circ S^{-1}\,\rho)=0\big\},$$
which is naturally endowed with the scalar product in L2 (ρ).
We are calling the manifold Arn(ρ) an Arnold Manifold because, if ρ is the Lebesgue measure restricted to some open, smooth and bounded set Ω, this definition reduces to the well known definition of Arnold manifold in fluid mechanics: the geodesic equation in such a space is - formally - the Euler equation for the motion of an incompressible and inviscid fluid in Ω.
• Finally, the “Riemannian submersion” Pf from BM to P2 (Rd ) is the push forward map:
$$\mathrm{Pf}:\ BM\ \to\ P_2(\mathbb R^d),\qquad T\ \mapsto\ T_\#\rho.$$
We claim that Pf is a Riemannian submersion and that the fiber Pf −1 (ρ) is isometric to the manifold
Arn(ρ).
We start considering the fibers. Fix ρ ∈ P2 (Rd ). Observe that
$$\mathrm{Pf}^{-1}(\rho)=\big\{T\in BM\ :\ T_\#\rho=\rho\big\},$$
and that the tangent space $\mathrm{Tan}_T\mathrm{Pf}^{-1}(\rho)$ is the set of vector fields u such that $\frac{d}{dt}\big|_{t=0}(T+tu)_\#\rho=0$, so that from
$$\frac{d}{dt}\Big|_{t=0}(T+tu)_\#\rho=\frac{d}{dt}\Big|_{t=0}(\mathrm{Id}+tu\circ T^{-1})_\#(T_\#\rho)=\frac{d}{dt}\Big|_{t=0}(\mathrm{Id}+tu\circ T^{-1})_\#\rho=-\nabla\cdot(u\circ T^{-1}\,\rho),$$
we have
$$\mathrm{Tan}_T\mathrm{Pf}^{-1}(\rho)=\big\{\text{vector fields }u\text{ on }\mathbb R^d\text{ such that }\nabla\cdot(u\circ T^{-1}\,\rho)=0\big\},$$
and the scalar product between two vector fields in $\mathrm{Tan}_T\mathrm{Pf}^{-1}(\rho)$ is the one inherited from the one in BM, i.e. the scalar product in L2 (ρ).
Now choose a distinguished map $T^\rho\in\mathrm{Pf}^{-1}(\rho)$ and notice that the right composition with $T^\rho$ provides a natural bijective map from Arn(ρ) into $\mathrm{Pf}^{-1}(\rho)$, because
$$S_\#\rho=\rho\qquad\Leftrightarrow\qquad(S\circ T^\rho)_\#\rho=\rho.$$
We claim that this right composition also provides an isometry between the “Riemannian manifolds” Arn(ρ) and $\mathrm{Pf}^{-1}(\rho)$: indeed, if $v\in\mathrm{Tan}_S\mathrm{Arn}(\rho)$, then the perturbed maps S + tv are sent to $S\circ T^\rho+tv\circ T^\rho$, which means that the perturbation v of S is sent to the perturbation $u:=v\circ T^\rho$ of $S\circ T^\rho$ by the differential of the right composition. The conclusion follows from the change of variable formula, which gives
$$\int|v|^2\,d\rho=\int|u|^2\,d\rho.$$
Clearly, the kernel of the differential dPf of Pf at T is given by $\mathrm{Tan}_T\mathrm{Pf}^{-1}\big(\mathrm{Pf}(T)\big)$, thus it remains to prove that its orthogonal is sent isometrically onto $\mathrm{Tan}_{\mathrm{Pf}(T)}(P_2(\mathbb R^d))$ by dPf. Fix T ∈ BM, let ρ := Pf(T ) = T# ρ and observe that
$$\begin{aligned}\mathrm{Tan}_T^\perp\mathrm{Pf}^{-1}(\rho)&=\Big\{\text{vector fields }w\ :\ \int\langle w,u\rangle\,d\rho=0,\ \forall u\text{ s.t. }\nabla\cdot(u\circ T^{-1}\rho)=0\Big\}\\&=\Big\{\text{vector fields }w\ :\ \int\langle w\circ T^{-1},u\circ T^{-1}\rangle\,d\rho=0,\ \forall u\text{ s.t. }\nabla\cdot(u\circ T^{-1}\rho)=0\Big\}\\&=\big\{\text{vector fields }w\ :\ w\circ T^{-1}=\nabla\varphi\text{ for some }\varphi\in C_c^\infty(\mathbb R^d)\big\}.\end{aligned}$$
Now pick $w\in\mathrm{Tan}_T^\perp\mathrm{Pf}^{-1}(\rho)$, let ϕ ∈ Cc∞ (Rd ) be such that $w\circ T^{-1}=\nabla\varphi$ and observe that
$$\frac{d}{dt}\Big|_{t=0}\mathrm{Pf}(T+tw)=\frac{d}{dt}\Big|_{t=0}(T+tw)_\#\rho=\frac{d}{dt}\Big|_{t=0}(\mathrm{Id}+tw\circ T^{-1})_\#(T_\#\rho)=\frac{d}{dt}\Big|_{t=0}(\mathrm{Id}+t\nabla\varphi)_\#\rho,$$
which means, by definition of Tanρ (P2 (Rd )) and the action of tangent vectors, that the differential dPf(T )(w) of Pf calculated at T along the direction w is given by ∇ϕ. The fact that this map is an isometry follows once again by the change of variable formula
$$\int|w|^2\,d\rho=\int|w\circ T^{-1}|^2\,d\rho=\int|\nabla\varphi|^2\,d\rho.$$
6.2
On the notion of tangent space
The aim of this section is to quickly discuss the definition of tangent space of P2 (Rd ) at a certain measure µ from a purely geometric perspective. We will see how this perspective is related to the discussion made in Subsection 2.3.2, where we defined the tangent space as
$$\mathrm{Tan}_\mu(P_2(\mathbb R^d)):=\overline{\big\{\nabla\varphi\ :\ \varphi\in C_c^\infty(\mathbb R^d)\big\}}^{\,L^2(\mathbb R^d,\mathbb R^d;\mu)}.$$
Recall that this definition came from the characterization of absolutely continuous curves on P2 (Rd ) (Theorem 2.29 and the subsequent discussion).
Yet, there is a completely different and purely geometrical approach which leads to a definition of tangent space at µ. The idea is to think of the tangent space at µ as the “space of directions”, or, which is the same, as the set of constant speed geodesics emanating from µ. More precisely, let the set Geod µ be defined by:
$$\mathrm{Geod}_\mu:=\Big\{\text{constant speed geodesics starting from }\mu\text{ and defined on some interval of the kind }[0,T]\Big\}\Big/\approx,$$
where we say that $(\mu_t)\approx(\mu'_t)$ provided they coincide on some right neighborhood of 0. The natural distance D on Geod µ is:
$$D\big((\mu_t),(\mu'_t)\big):=\lim_{t\downarrow0}\frac{W_2(\mu_t,\mu'_t)}{t}.\tag{6.1}$$
The Geometric Tangent space $\mathbf{Tan}_\mu(P_2(\mathbb R^d))$ (written in bold to distinguish it from the space of gradients) is then defined as the completion of Geod µ w.r.t. the distance D.
The natural question here is: what is the relation between the “space of gradients” $\mathrm{Tan}_\mu(P_2(\mathbb R^d))$ and the “space of directions” $\mathbf{Tan}_\mu(P_2(\mathbb R^d))$?
Recall that from Remark 1.22 we know that given ϕ ∈ Cc∞ (Rd ), the map t ↦ (Id + t∇ϕ)# µ is a constant speed geodesic on a right neighborhood of 0. This means that there is a natural map ιµ from the set {∇ϕ : ϕ ∈ Cc∞} into Geod µ, and therefore into $\mathbf{Tan}_\mu(P_2(\mathbb R^d))$, which sends ∇ϕ to the (equivalence class of the) geodesic t ↦ (Id + t∇ϕ)# µ. The main properties of the Geometric Tangent space and of this map are collected in the following theorem, which we state without proof.
Theorem 6.1 (The tangent space) Let µ ∈ P2 (Rd ). Then:
• the lim in (6.1) is always a limit,
• the metric space $(\mathbf{Tan}_\mu(P_2(\mathbb R^d)),D)$ is complete and separable,
• the map $\iota_\mu:\{\nabla\varphi\}\to\mathbf{Tan}_\mu(P_2(\mathbb R^d))$ is an injective isometry, where on the source space we put the L2 distance w.r.t. µ. Thus ιµ always extends to a natural isometric embedding of $\mathrm{Tan}_\mu(P_2(\mathbb R^d))$ into $\mathbf{Tan}_\mu(P_2(\mathbb R^d))$.
Furthermore, the following statements are equivalent:
i) the space $(\mathbf{Tan}_\mu(P_2(\mathbb R^d)),D)$ is a Hilbert space,
ii) the map $\iota_\mu:\mathrm{Tan}_\mu(P_2(\mathbb R^d))\to\mathbf{Tan}_\mu(P_2(\mathbb R^d))$ is surjective,
iii) the measure µ is regular (Definition 1.25).
We comment on the second part of the theorem. The first thing to notice is that the “space of directions” $\mathbf{Tan}_\mu(P_2(\mathbb R^d))$ can be strictly larger than the “space of gradients” $\mathrm{Tan}_\mu(P_2(\mathbb R^d))$. This is actually not surprising if one thinks of the case in which µ is a Dirac mass. Indeed in this situation the space $(\mathbf{Tan}_\mu(P_2(\mathbb R^d)),D)$ coincides with the space (P2 (Rd ), W2 ) (this can be checked directly from the definition), however, the space $\mathrm{Tan}_\mu(P_2(\mathbb R^d))$ is actually isometric to Rd itself, and is therefore much smaller.
The reason is that geodesics are not always induced by maps, that is, they are not always of the
form t 7→ (Id + tu)# µ for some vector field u ∈ L2µ . To some extent, here we are facing the same
problem we had to face when starting the study of the optimal transport problem: maps are typically
not sufficient to produce (optimal) transports. From this perspective, it is not surprising that if the
measure we are considering is regular (that is, if for any ν ∈ P2 (Rd ) there exists a unique optimal
plan, and this plan is induced by a map), then the “space of directions” coincides with the “space of
directions induced by maps”.
6.3
Second order calculus
Now we pass to the description of the second order analysis over P2 (Rd ). The concepts that now
enter into play are: Covariant Derivative, Parallel Transport and Curvature. To some extent, the
situation is similar to the one we discussed in Subsection 2.3.2 concerning the first order structure: the
metric space (P2 (Rd ), W2 ) is not a Riemannian manifold, but if we are careful in giving definitions
and in the regularity requirements of the objects involved we will be able to perform calculations
very similar to those valid in a genuine Riemannian context.
Again, we are restricting the analysis to the Euclidean case only for simplicity: all of what comes
next can be generalized to the analysis over P2 (M ), for a generic Riemannian manifold M .
On a typical course of basic Riemannian geometry, one of the first concepts introduced is that
of Levi-Civita connection, which identifies the only natural (“natural” here means: “compatible with
the Riemannian structure”) way of differentiating vector fields on the manifold. It would therefore
be natural to set up our discussion on the second order analysis on P2 (Rd ) by giving the definition
of Levi-Civita connection in this setting. However, this cannot be done. The reason is that we don’t
have a notion of smoothness for vector fields, therefore not only we don’t know how to covariantly
differentiate vector fields, but we don’t know either which are the vector fields regular enough to be
differentiated. In a purely Riemannian setting this problem does not appear, as a Riemannian manifold is born as a smooth manifold on which we define a scalar product on each tangent space; but the
space P2 (Rd ) does not have a smooth structure (there is no diffeomorphism of a small ball around
the origin in Tanµ (P2 (Rd )) onto a neighborhood of µ in P2 (Rd )). Thus, we have to proceed in a
different way, which we describe now:
Regular curves first of all, we drop the idea of defining a smooth vector field on the whole “manifold”. We will rather concentrate on finding an appropriate definition of smoothness for vector fields
defined along curves. We will see that to do this, we will need to work with a particular kind of
curves, which we call regular, see Definition 6.2.
Smoothness of vector fields. We will then be able to define the smoothness of vector fields defined
along regular curves (Definition 6.5). Among others, a notion of smoothness of particular relevance
is that of absolutely continuous vector fields: for this kind of vector fields we have a natural notion
of total derivative (not to be confused with the covariant one, see Definition 6.6).
Levi-Civita connection. At this point we have all the ingredients we need to define the covariant
derivative and to prove that it is the Levi-Civita connection on P2 (Rd ) (Definiton 6.8 and discussion
thereafter).
Parallel transport. This is the main existence result on this subject: we prove that along regular
curves the parallel transport always exists (Theorem 6.15). We will also discuss a counterexample to
the existence of parallel transport along a non-regular geodesic (Example 6.16). This will show that
the definition of regular curve is not just operationally needed to provide a definition of smoothness
of vector fields, but is actually intrinsically related to the geometry of P2 (Rd ).
Calculus of derivatives. Using the technical tools developed for the study of the parallel transport,
we will be able to explicitly compute the total and covariant derivatives of basic examples of vector
fields.
Curvature. We conclude the discussion by showing how the concepts developed can lead to a rigorous definition of the curvature tensor on P2 (Rd ).
We will write $\|v\|_\mu$ and $\langle v,w\rangle_\mu$ for the norm of the vector field v and the scalar product of the vector fields v, w in the space L2 (µ) (which we will denote by $L^2_\mu$), respectively.
We now start with the definition of regular curve. All the curves we will consider are defined on
[0, 1], unless otherwise stated.
Definition 6.2 (Regular curve) Let (µt ) be an absolutely continuous curve and let (vt ) be its velocity vector field, that is (vt ) is the unique vector field - up to equality for a.e. t - such that
vt ∈ Tanµt (P2 (Rd )) for a.e. t and the continuity equation
$$\frac{d}{dt}\mu_t+\nabla\cdot(v_t\mu_t)=0,$$
holds in the sense of distributions (recall Theorem 2.29 and Definition 2.31). We say that (µt ) is regular provided
$$\int_0^1\|v_t\|^2_{\mu_t}\,dt<\infty,\tag{6.2}$$
and
$$\int_0^1\mathrm{Lip}(v_t)\,dt<\infty.\tag{6.3}$$
Observe that the validity of (6.3) is independent on the parametrization of the curve, thus if it is
fulfilled it is always possible to reparametrize the curve (e.g. with constant speed) in order to let it
satisfy also (6.2).
Now assume that (µt ) is regular. Then by the classical Cauchy-Lipschitz theory we know that
there exists a unique family of maps T(t, s, ·) : supp(µt ) → supp(µs ) satisfying
$$\begin{cases}\frac{d}{ds}\mathbf T(t,s,x)=v_s(\mathbf T(t,s,x)),&\forall t\in[0,1],\ x\in\mathrm{supp}(\mu_t),\ \text{a.e. }s\in[0,1],\\ \mathbf T(t,t,x)=x,&\forall t\in[0,1],\ x\in\mathrm{supp}(\mu_t).\end{cases}\tag{6.4}$$
Also it is possible to check that these maps satisfy the additional properties
$$\mathbf T(r,s,\cdot)\circ\mathbf T(t,r,\cdot)=\mathbf T(t,s,\cdot),\qquad\forall t,r,s\in[0,1],$$
$$\mathbf T(t,s,\cdot)_\#\mu_t=\mu_s,\qquad\forall t,s\in[0,1].$$
We will call this family of maps the flow maps of the curve (µt ). Observe that for any couple
of times t, s ∈ [0, 1], the right composition with T(t, s, ·) provides a bijective isometry from L2µs to
L2µt . Also, notice that from condition (6.2) and the inequalities
$$\|\mathbf T(t,s,\cdot)-\mathbf T(t,s',\cdot)\|^2_{\mu_t}\le\int\Big(\int_s^{s'}\big|v_r(\mathbf T(t,r,x))\big|\,dr\Big)^2\,d\mu_t(x)\le|s'-s|\int_s^{s'}\|v_r\|^2_{\mu_r}\,dr$$
we get that for fixed t ∈ [0, 1], the map s 7→ T(t, s, ·) ∈ L2µt is absolutely continuous.
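When the velocity field is given explicitly, the flow maps of (6.4) can be approximated by any ODE integrator. The following sketch (with an arbitrary smooth velocity field chosen only for illustration) also checks numerically the composition rule stated above.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Flow maps T(t, s, .) of d/ds T(t, s, x) = v_s(T(t, s, x)), T(t, t, x) = x,
# for the illustrative smooth velocity field v_t(x) = -x + t.
def v(t, x):
    return -x + t

def T(t, s, x0):
    sol = solve_ivp(v, (t, s), np.atleast_1d(x0), rtol=1e-9, atol=1e-12)
    return sol.y[:, -1]

x = np.linspace(-2.0, 2.0, 5)
lhs = T(0.3, 0.9, T(0.0, 0.3, x))   # T(r, s, .) o T(t, r, .)
rhs = T(0.0, 0.9, x)                # T(t, s, .)
print(np.max(np.abs(lhs - rhs)))    # close to zero: the flow maps compose as stated
```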
It can be proved that the set of regular curves is dense in the set of absolutely continuous curves
on P2 (Rd ) with respect to uniform convergence plus convergence of length. We omit the technical
proof of this fact and focus instead on the important case of geodesics:
Proposition 6.3 (Regular geodesics) Let (µt ) be a constant speed geodesic on [0, 1]. Then its restriction to any interval [ε, 1 − ε], with ε > 0, is regular. In general, however, the whole curve (µt )
may be not regular on [0, 1].
Proof To prove that (µt ) may be not regular just consider the case of $\mu_0:=\delta_x$ and $\mu_1:=\frac12(\delta_{y_1}+\delta_{y_2})$: it is immediate to verify that for the velocity vector field (vt ) it holds Lip(vt ) = t⁻¹.
For the other part, recall from Remark 2.25 (see also Proposition 2.16) that for t ∈ (0, 1) and s ∈ [0, 1] there exists a unique optimal map $T_t^s$ from µt to µs. It is immediate to verify from formula (2.11) that these maps satisfy
$$\frac{T_t^s-\mathrm{Id}}{s-t}=\frac{T_t^{s'}-\mathrm{Id}}{s'-t},\qquad\forall t\in(0,1),\ s,s'\in[0,1].$$
Thus, thanks to Proposition 2.32, we have that vt is given by
$$v_t=\lim_{s\to t}\frac{T_t^s-\mathrm{Id}}{s-t}=\frac{\mathrm{Id}-T_t^0}{t}.\tag{6.5}$$
Now recall that Remark 2.25 gives $\mathrm{Lip}(T_t^0)\le(1-t)^{-1}$ to obtain
$$\mathrm{Lip}(v_t)\le t^{-1}\big((1-t)^{-1}+1\big)=\frac{2-t}{t(1-t)}.$$
Thus t 7→ Lip(vt ) is integrable on any interval of the kind [ε, 1 − ε], ε > 0.
Definition 6.4 (Vector fields along a curve) A vector field along a curve (µt ) is a Borel map
(t, x) 7→ ut (x) such that ut ∈ L2µt for a.e. t. It will be denoted by (ut ).
Observe that we are considering also non tangent vector fields, that is, we are not requiring
ut ∈ Tanµt (P2 (Rd )) for a.e. t.
To define the (time) smoothness of a vector field (ut ) defined along a regular curve (µt ) we will
make an essential use of the flow maps: notice that the main problem in considering the smoothness
of (ut ) is that for different times, the vectors belong to different spaces. To overcome this obstruction
we will define the smoothness of t 7→ ut ∈ L2µt in terms of the smoothness of t 7→ ut ◦ T(t0 , t, ·) ∈
L2µt0 :
Definition 6.5 (Smoothness of vector fields) Let (µt ) be a regular curve, T(t, s, ·) its flow maps
and (ut ) a vector field defined along it. We say that (ut ) is absolutely continuous (or C 1 , or C n , . . .,
or C ∞ or analytic) provided the map
t 7→ ut ◦ T(t0 , t, ·) ∈ L2µt0
is absolutely continuous (or C 1 , or C n , . . ., or C ∞ or analytic) for every t0 ∈ [0, 1].
Since ut ◦ T(t1 , t, ·) = ut ◦ T(t0 , t, ·) ◦ T(t1 , t0 , ·) and the composition with T(t1 , t0 , ·) provides
an isometry from L2µt0 to L2µt1 , it is sufficient to check the regularity of t 7→ ut ◦ T(t0 , t, ·) for some
t0 ∈ [0, 1] to be sure that the same regularity holds for every t0 .
Definition 6.6 (Total derivative) With the same notation as above, assume that (ut ) is an absolutely
continuous vector field. Its total derivative is defined as:
$$\frac{d}{dt}u_t:=\lim_{h\to0}\frac{u_{t+h}\circ\mathbf T(t,t+h,\cdot)-u_t}{h},$$
where the limit is intended in $L^2_{\mu_t}$.
Observe that we are not requiring the vector field to be tangent, and that the total derivative is in
general a non tangent vector field, even if (ut ) is.
The identity
$$\lim_{h\to0}\frac{u_{t+h}\circ\mathbf T(t,t+h,\cdot)-u_t}{h}=\Big(\lim_{h\to0}\frac{u_{t+h}\circ\mathbf T(t_0,t+h,\cdot)-u_t\circ\mathbf T(t_0,t,\cdot)}{h}\Big)\circ\mathbf T(t,t_0,\cdot)=\Big(\frac{d}{dt}\big(u_t\circ\mathbf T(t_0,t,\cdot)\big)\Big)\circ\mathbf T(t,t_0,\cdot),$$
shows that the total derivative is well defined for a.e. t and that it is an L¹ vector field, in the sense that it holds
$$\int_0^1\Big\|\frac{d}{dt}u_t\Big\|_{\mu_t}\,dt<\infty.$$
Notice also the inequality
$$\|u_s\circ\mathbf T(t,s,\cdot)-u_t\|_{\mu_t}\le\int_t^s\Big\|\frac{d}{dr}\big(u_r\circ\mathbf T(t,r,\cdot)\big)\Big\|_{\mu_t}\,dr=\int_t^s\Big\|\frac{d}{dr}u_r\Big\|_{\mu_r}\,dr.$$
An important property of the total derivative is the Leibnitz rule: for any couple of absolutely continuous vector fields $(u_t^1)$, $(u_t^2)$ along the same regular curve (µt ) the map $t\mapsto\langle u_t^1,u_t^2\rangle_{\mu_t}$ is absolutely continuous and it holds
$$\frac{d}{dt}\langle u_t^1,u_t^2\rangle_{\mu_t}=\Big\langle\frac{d}{dt}u_t^1,u_t^2\Big\rangle_{\mu_t}+\Big\langle u_t^1,\frac{d}{dt}u_t^2\Big\rangle_{\mu_t},\qquad\text{a.e. }t.\tag{6.6}$$
Indeed, from the identity
$$\langle u_t^1,u_t^2\rangle_{\mu_t}=\langle u_t^1\circ\mathbf T(t_0,t,\cdot),u_t^2\circ\mathbf T(t_0,t,\cdot)\rangle_{\mu_{t_0}},$$
it follows the absolute continuity, and the same expression gives
$$\begin{aligned}\frac{d}{dt}\langle u_t^1,u_t^2\rangle_{\mu_t}&=\frac{d}{dt}\langle u_t^1\circ\mathbf T(t_0,t,\cdot),u_t^2\circ\mathbf T(t_0,t,\cdot)\rangle_{\mu_{t_0}}\\&=\Big\langle\frac{d}{dt}\big(u_t^1\circ\mathbf T(t_0,t,\cdot)\big),u_t^2\circ\mathbf T(t_0,t,\cdot)\Big\rangle_{\mu_{t_0}}+\Big\langle u_t^1\circ\mathbf T(t_0,t,\cdot),\frac{d}{dt}\big(u_t^2\circ\mathbf T(t_0,t,\cdot)\big)\Big\rangle_{\mu_{t_0}}\\&=\Big\langle\frac{d}{dt}u_t^1,u_t^2\Big\rangle_{\mu_t}+\Big\langle u_t^1,\frac{d}{dt}u_t^2\Big\rangle_{\mu_t}.\end{aligned}$$
Example 6.7 (The smooth case) Let (x, t) ↦ ξt (x) be a Cc∞ vector field on Rd, (µt ) a regular curve and (vt ) its velocity vector field. Then the inequality
$$\|\xi_s\circ\mathbf T(t,s,\cdot)-\xi_t\|_{\mu_t}\le\|\xi_s-\xi_t\|_{\mu_s}+\|\xi_t\circ\mathbf T(t,s,\cdot)-\xi_t\|_{\mu_t}\le C|s-t|+C'\,\|\mathbf T(t,s,\cdot)-\mathrm{Id}\|_{\mu_t},$$
with $C:=\sup_{t,x}|\partial_t\xi_t(x)|$, $C':=\sup_{t,x}|\nabla\xi_t(x)|$, together with the fact that $s\mapsto\mathbf T(t,s,\cdot)\in L^2(\mu_t)$ is absolutely continuous, gives that (ξt ) is absolutely continuous along (µt ).
Then a direct application of the definition gives that its total derivative is given by
$$\frac{d}{dt}\xi_t=\partial_t\xi_t+\nabla\xi_t\cdot v_t,\qquad\text{a.e. }t,\tag{6.7}$$
which shows that the total derivative is nothing but the convective derivative well known in fluid
dynamics.
For µ ∈ P2 (Rd ), we denote by $P_\mu:L^2_\mu\to\mathrm{Tan}_\mu(P_2(\mathbb R^d))$ the orthogonal projection, and we put $P_\mu^\perp:=\mathrm{Id}-P_\mu$.
Definition 6.8 (Covariant derivative) Let (ut ) be an absolutely continuous and tangent vector field
along the regular curve (µt ). Its covariant derivative is defined as
d
D
ut := Pµt
ut .
(6.8)
dt
dt
The trivial inequality
D ut ≤ d ut dt dt µt
µt
shows that the covariant derivative is an L1 vector field.
In order to prove that the covariant derivative we just defined is the Levi-Civita connection, we
need to prove two facts: compatibiliy with the metric and torsion free identity. Recall that on a
standard Riemannian manifold, these two conditions are respectively given by:
d
hX(γt ), Y (γt )i = (∇γt0 X)(γt ), Y (γt ) + X(γt ), (∇γt0 Y )(γt )
dt
[X, Y ] = ∇X Y − ∇Y X,
where X, Y are smooth vector fields and γ is a smooth curve on M .
The compatibility with the metric follows immediately from the Leibnitz rule (6.6), indeed if
(u1t ), (u2t ) are tangent absolutely continuous vector fields we have:
d 1 2
d 1 2
d
ut , ut µt =
ut , ut
+ u1t , u2t
dt
dt
dt
µt
µ
t d 1
d 2
2
1
= Pµt
u , ut
+ ut , Pµt
u
(6.9)
dt t
dt t
µ
µt
t
D 1 2
D
=
ut , ut
+ u1t , u2t
.
dt
dt
µt
µt
To prove the torsion-free identity, we need first to understand how to calculate the Lie bracket of
two vector fields. To this aim, let µit , i = 1, 2, be two regular curves such that µ10 = µ20 =: µ and let
uit ∈ Tanµit (P2 (Rd )) be two C 1 vector fields satisfying u10 = v02 , u20 = v01 , where vti are the velocity
vector fields of µit . We assume that the velocity fields vti of µit are continuous in time (in the sense
that the map t 7→ vti µit is continuous in the set of vector valued measure with the weak topology and
t 7→ kvti kµit is continuous as well), to be sure that (6.7) holds for all t with vt = vti and the initial
condition makes sense. With these hypotheses, it makes sense to consider the covariant derivative
D 2
2
2
1
1
dt ut along (µt ) at t = 0: for this derivative we write ∇u0 ut . Similarly for (ut ).
92
R
Let us consider vector fields as derivations, and the functional µ 7→ Fϕ (µ) := ϕdµ,
for given
ϕ ∈ Cc∞ (Rd ). By the continuity equation, the derivative of Fϕ along u2t is equal to ∇ϕ, u2t µ2 ,
t
therefore the compatibility with the metric (6.9) gives:
D
E
d ∇ϕ, u2t µ2 |t=0 = ∇2 ϕ · v02 , u20 µ + ∇ϕ, ∇u10 u2t
u1 (u2 (Fϕ ))(µ) =
t
dt
µ
D
E
2
= ∇ ϕ · u10 , u20 µ + ∇ϕ, ∇u10 u2t .
µ
Subtracting the analogous term u2 (u1 (Fϕ ))(µ) and using the symmetry of ∇2 ϕ we get
D
E
[u1 , u2 ](Fϕ )(µ) = ∇ϕ, ∇u10 u2t − ∇u20 u1t .
µ
Given that the set {∇ϕ}ϕ∈Cc∞ is dense in Tanµ (P2 (Rd )), the above equation characterizes [u1 , u2 ]
as:
[u1 , u2 ] = ∇u10 u2t − ∇u20 u1t ,
(6.10)
which proves the torsion-free identity for the covariant derivative.
Example 6.9 (The velocity vector field of a geodesic) Let (µt ) be the restriction to [0, 1] of a
geodesic defined in some larger interval (−ε, 1 + ε) and let (vt ) be its velocity vector field. Then we
know by Proposition 6.3 that (µt ) is regular. Also, from formula (6.5) it is easy to see that it holds
vs ◦ T(t, s, ·) = vt ,
∀t, s ∈ [0, 1],
d
vt = 0 and a fortiori D
and thus (vt ) is absolutely continuous and satisfies dt
dt vt = 0.
Thus, as expected, the velocity vector field of a geodesic has zero convariant derivative, in analogy
with the standard Riemannian case. Actually, it is interesting to observe that not only the covariant
derivative is 0 in this case, but also the total one.
Now we pass to the question of parallel transport. The definition comes naturally:
Definition 6.10 (Parallel transport) Let (µt ) be a regular curve. A tangent vector field (ut ) along
it is a parallel transport if it is absolutely continuous and
D
ut = 0,
dt
93
a.e. t.
It is immediate to verify that the scalar product of two parallel transports is preserved in time,
indeed the compatibility with the metric (6.9) yields
D 1 2
d 1 2
1 D 2
u ,u
+ ut , ut
= 0, a.e. t,
=
u ,u
dt t t µt
dt t t µt
dt
µt
for any couple of parallel transports. In particular, this fact and the linearity of the notion of
parallel transport give uniqueness of the parallel transport itself, in the sense that for any u0 ∈
Tanµ0 (P2 (Rd )) there exists at most one parallel transport (ut ) along (µt ) satisfying u0 = u0 .
Thus the problem is to show the existence. There is an important analogy, which helps understanding the proof, that we want to point out: we already know that the space (P2 (Rd ), W2 ) looks
like a Riemannian manifold, but actually it has also stronger similarities with a Riemannian manifold
M embedded in some bigger space (say, on some Euclidean space RD ), indeed in both cases:
• we have a natural presence of non tangent vectors: elements of L2µ \ Tanµ (P2 (Rd )) for
P2 (Rd ), and vectors in RD non tangent to the manifold for the embedded case.
• The scalar product in the tangent space can be naturally defined also for non tangent vectors:
scalar product in L2µ for the space P2 (Rd ), and the scalar product in RD for the embedded
case. This means in particular that there are natural orthogonal projections from the set of
tangent and non tangent vectors onto the set of tangent vectors: Pµ : L2µ → Tanµ (P2 (Rd ))
for P2 (Rd ) and Px : RD → Tx M for the embedded case.
• The Covariant derivative of a tangent vector field is given by projecting the “time derivative”
onto the tangent space. Indeed, for the space P2 (Rd ) we know that the covariant derivative is
given by formula (6.8), while for the embedded manifold it holds:
d
ut ,
(6.11)
∇γ˙ t ut = Pγt
dt
where t 7→ γt is a smooth curve and t 7→ ut ∈ Tγt M is a smooth tangent vector field.
Given these analogies, we are going to proceed as follows: first we give a proof of the existence
of the parallel transport along a smooth curve in an embedded Riemannian manifold, then we will
see how this proof can be adapted to the Wasserstein case: this approach should help highlighting
what’s the geometric idea behind the construction.
Thus, say that M is a given smooth Riemannian manifold embedded on RD , t 7→ γt ∈ M a
smooth curve on [0, 1] and u0 ∈ Tγ0 M is a given tangent vector. Our goal is to prove the existence
of an absolutely continuous vector field t 7→ ut ∈ Tγt M such that u0 = u0 and
d
ut = 0,
a.e. t.
Pγ t
dt
For any t, s ∈ [0, 1], let trst : Tγt RD → Tγs RD be the natural translation map which takes a
vector with base point γt (tangent or not to the manifold) and gives back the translated of this vector
with base point γs . Notice that an effect of the curvature of the manifold and the chosen embedding
on RD , is that trst (u) may be not tangent to M even if u is. Now define Pts : Tγt RD → Tγs M by
Pts (u) := Pγs (trst (u)),
94
∀u ∈ Tγt RD .
An immediate consequence of the smoothness of M and γ are the two inequalities:
|trst (u) − Pts (u)| ≤ C|u||s − t|,
|Pts (u)|
≤ C|u||s − t|,
∀t, s ∈ [0, 1] and u ∈ Tγt M,
(6.12a)
Tγ⊥t M,
(6.12b)
∀t, s ∈ [0, 1] and u ∈
where Tγ⊥t M is the orthogonal complement of Tγt M in Tγt RD . These two inequalities are all we
need to prove existence of the parallel transport. The proof will be constructive, and is based on the
identity:
∇γt P0t (u)|t=0 = 0, ∀u ∈ Tγ(0) M,
(6.13)
which tells that the vectors P0t (u) are a first order approximation at t = 0 of the parallel transport.
Taking (6.11) into account, (6.13) is equivalent to
|Pt0 (trt0 (u) − P0t (u))| = o(t),
u ∈ Tγ(0) M.
(6.14)
Equation (6.14) follows by applying inequalities (6.12) (note that trt0 (u) − P0t (u) ∈ Tγ⊥t M ):
|Pt0 (trt0 (u) − P0t (u))| ≤ Ct|trt0 (u) − P0t (u)| ≤ C 2 t2 |u|.
Now, let P be the direct set of all the partitions of [0, 1], where, for P, Q ∈ P, P ≥ Q if P is a
refinement of Q. For P = {0 = t0 < t1 < · · · < tN = 1} ∈ P and u ∈ Tγ0 M define P(u) ∈ Tγ1 M
as:
t −1
P(u) := PttNN−1 (PtNN−2
(· · · (P0t1 (u)))).
Our first goal is to prove that the limit P(u) for P ∈ P exists. This will naturally define a curve
t → ut ∈ Tγt M by taking partitions of [0, t] instead of [0, 1]: the final goal is to show that this curve
is actually the parallel transport of u along the curve γ.
The proof is based on the following lemma.
Lemma 6.11 Let 0 ≤ s1 ≤ s2 ≤ s3 ≤ 1 be given numbers. Then it holds:
s
Ps 3 (u) − Pss3 (Pss2 (u)) ≤ C 2 |u||s1 − s2 ||s2 − s3 |, ∀u ∈ Tγs M.
1
2
1
1
Proof From Pss13 (u) = Pγs3 (trss31 (u)) = Pγs3 (trss32 (trss21 (u))) we get
Pss13 (u) − Pss23 (Pss12 (u)) = Pss23 (trss21 (u) − Pss12 (u))
Since u ∈ Tγs1 M and trss21 (u) − Pss12 (u) ∈ Tγ⊥s2 M , the proof follows applying inequalities (6.12).
From this lemma, an easy induction shows that for any 0 ≤ s1 < · · · < sN ≤ 1 and u ∈ Tγs1 M
we have
s
Ps N (u) − PssN (PssN −1 (· · · (Pss2 (u))))
1
1
N −1
N −2
−1
≤ Pss1N (u) − PssNN−1 (Pss1N −1 (u)) + PssNN−1 (Pss1N −1 (u)) − PssNN−1 (PssNN−2
(· · · (Pss12 (u))))
−1
≤ C 2 |u||sN1 − s1 |!sN − sN −1 | + Pss1N −1 (u) − PssNN−2
(· · · (Pss12 (u)))
≤
···
≤
C 2 |u|
N
−1
X
|s1 − si ||si − si+1 | ≤ C 2 |u||s1 − sN |2 .
i=2
With this result, we can prove existence of the limit of P (u) as P varies in P.
95
(6.15)
Theorem 6.12 For any u ∈ Tγ0 M there exists the limit of P(u) as P varies in P.
Proof We have to prove that, given ε > 0, there exists a partition P such that
|P(u) − Q(u)| ≤ |u|ε,
∀Q ≥ P.
(6.16)
P
In order to do so, it is sufficient to find 0 = t0 < t1 < · · · < tN = 1 such that i |ti+1 −ti |2 ≤ ε/C 2 ,
and repeatedly apply equation (6.15) to all partitions induced by Q in the intervals (ti , ti+1 ).
Now, for s ≤ t we can introduce the maps Tts : Tγt M → Tγs M which associate to the vector
u ∈ Tγt M the limit of the process just described taking into account partitions of [s, t] instead of
those of [0, 1].
Theorem 6.13 For any t1 ≤ t2 ≤ t3 ∈ [0, 1] it holds
Ttt23 ◦ Ttt12 = Ttt13 .
(6.17)
Moreover, for any u ∈ Tγ0 M the curve t → ut := T0t (u) ∈ Tγt M is the parallel transport of u
along γ.
Proof For the group property, consider those partitions of [t1 , t3 ] which contain t2 and pass to the
limit first on [t1 , t2 ] and then on [t2 , t3 ]. To prove the second part of the statement, we prove first
that (ut ) is absolutely continuous. To see this, pass to the limit in (6.15) with s1 = t0 and sN = t1 ,
u = ut0 to get
|Ptt01 (ut0 ) − ut1 | ≤ C 2 |ut0 |(t1 − t0 )2 ≤ C 2 |u|(t1 − t0 )2 ,
(6.18)
so that from (6.12a) we get
|trtt10 (ut0 ) − ut1 | ≤ |trtt10 (ut0 ) − Ptt01 (ut0 )| + |Ptt01 (ut0 ) − ut1 | ≤ C|u||t1 − t0 |(1 + C|t1 − t0 |),
which shows the absolute continuity. Finally, due to (6.17), it is sufficient to check that the covariant
derivative vanishes at 0. To see this, put t0 = 0 and t1 = t in (6.18) to get |P0t (u) − ut | ≤ C 2 |u|t2 ,
so that the thesis follows from (6.13).
Now we come back to the Wasserstein case. To follow the analogy with the Riemannian case,
keep in mind that the analogous of the translation map trst is the right composition with T(s, t, ·),
and the analogous of the map Pts is
Pts (u) := Pµs (u ◦ T(s, t, ·)),
which maps L2µt onto Tanµs (P2 (Rd )) We saw that the key to prove the existence of the parallel
transport in the embedded Riemannian case are inequalities (6.12). Thus, given that we want to imitate the approach in the Wasserstein setting, we need to produce an analogous of those inequalities.
This is the content of the following lemma.
d
d
2
We will denote by Tan⊥
µ (P2 (R )) the orthogonal complement of Tanµ (P2 (R )) in Lµ .
Lemma 6.14 (Control of the angles between tangent spaces) Let µ, ν ∈ P2 (Rd ) and T : Rd →
Rd be any Borel map satisfying T# µ = ν. Then it holds:
kv ◦ T − Pµ (v ◦ T )kµ ≤ kvkν Lip(T − Id),
∀v ∈ Tanν (P2 (Rd )),
and, if T is invertible, it also holds
kPµ (w ◦ T )kµ ≤ kwkν Lip(T −1 − Id),
96
d
∀w ∈ Tan⊥
ν (P2 (R )).
Proof We start with the first inequality, which is equivalent to
∀ϕ ∈ Cc∞ (Rd ).
k∇ϕ ◦ T − Pµ (∇ϕ ◦ T )kµ ≤ k∇ϕkν Lip(T − Id),
(6.19)
Let us suppose first that T − Id ∈ Cc∞ (Rd ). In this case the map ϕ ◦ T is in Cc∞ (Rd ), too, and
therefore ∇(ϕ ◦ T ) = ∇T · (∇ϕ) ◦ T belongs to Tanµ (P2 (Rd )). From the minimality properties
of the projection we get:
k∇ϕ ◦ T − Pµ (∇ϕ ◦ T )kµ ≤ k∇ϕ ◦ T − ∇T · (∇ϕ) ◦ T kµ
Z
1/2
=
|(I − ∇T (x)) · ∇ϕ(T (x))|2 dµ(x)
Z
≤
1/2
|∇ϕ(T (x))|2 k∇(Id − T )(x)k2op dµ(x)
≤ k∇ϕkν Lip(T − Id),
where I is the identity matrix and k∇(Id − T )(x)kop is the operator norm of the linear functional
from Rd to Rd given by v 7→ ∇(Id − T )(x) · v.
Now turn to the general case, and we can certainly assume that T is Lipschitz. Then, it is not
hard to see that there exists a sequence (Tn − Id) ⊂ Cc∞ (Rd ) such that Tn → T uniformly on
compact sets and limn Lip(Tn − Id) ≤ Lip(T − Id). It is clear that for such a sequence it holds
kT − Tn kµ → 0, and we have
k∇ϕ ◦ T − Pµ (∇ϕ ◦ T )kµ ≤ k∇ϕ ◦ T − ∇(ϕ ◦ Tn )kµ
≤ k∇ϕ ◦ T − ∇ϕ ◦ Tn kµ + k∇ϕ ◦ Tn − ∇(ϕ ◦ Tn )kµ
≤ Lip(∇ϕ)kT − Tn kµ + k∇ϕ ◦ Tn kµ Lip(Tn − Id).
Letting n → +∞ we get the thesis.
For the second inequality, just notice that
kPµ (w ◦ T )kµ =
sup
v∈Tanµ (P2 (Rd ))
kvkµ =1
=
sup
v∈Tanµ (P2 (Rd ))
kvkµ =1
hw ◦ T, viµ =
sup
w, v ◦ T −1
v∈Tanµ (P2 (Rd ))
kvkµ =1
ν
w, v ◦ T −1 − Pν (v ◦ T −1 ) ν ≤ kwkν Lip(T −1 − Id)
From this lemma and the inequality
Z
Rs
Lip(vr )dr |
|
t
Lip T(s, t, ·) − Id ≤ e
− 1 ≤ C t
s
Lip(vr )dr ,
∀t, s ∈ [0, 1],
R1
(whose simple proof we omit), where C := e 0 Lip(vr )dr − 1, it is immediate to verify that it holds:
Z s
s
ku ◦ T(s, t, ·) − Pt (u)kµs ≤ Ckukµt Lip(vr )dr ,
u ∈ Tanµt (P2 (Rd )),
Zt s
(6.20)
s
d
kPt (u)kµs ≤ Ckukµt Lip(vr )dr ,
u ∈ Tan⊥
(P
(R
)).
2
µt
t
These inequalities are perfectly analogous to the (6.12) (well, the only difference is that here the
bound on the angle is L1 in t, s while for the embedded case it was L∞ , but this does not really
change anything). Therefore the arguments presented before apply also to this case, and we can
derive the existence of the parallel transport along regular curves:
97
Theorem 6.15 (Parallel transport along regular curves) Let (µt ) be a regular curve and u0 ∈
Tanµ0 (P2 (Rd )). Then there exists a parallel transport (ut ) along (µt ) such that u0 = u0 .
Now, we know that the parallel transport exists along regular curves, and we know also that
regular curves are dense, it is therefore natural to try to construct the parallel transport along any
absolutely continuous curve via some limiting argument. However, this cannot be done, as the following counterexample shows:
Example 6.16 (Non existence of parallel transport along a non regular geodesic) Let
Q = [0, 1] × [0, 1] be the unit square in R2 and let Ti , i = 1, 2, 3, 4, be the four open triangles in which Q is divided by its diagonals. Let µ0 := χQ L 2 and define the function v : Q → R2
as the gradient of the convex map max{|x|, |y|}, as in the figure. Set also w = v ⊥ , the rotation by
π/2 of v, in Q and w = 0 out of Q. Notice that ∇ · (wµ0 ) = 0.
Set µt := (Id + tv)# µ0 and observe that, for positive t, the support Qt of µt is made of 4
connected components, each one the translation of one of the sets Ti , and that µt = χQt L 2 .
It is immediate to check that (µt ) is a geodesic in [0, ∞), so that from 6.3 we know that the
restriction of µt to any interval [ε, 1] with ε > 0 is regular. Fix ε > 0 and note that, by construction,
the flow maps of µt in [ε, 1] are given by
T(t, s, ·) = (Id + sv) ◦ (Id + tv)−1 ,
∀t, s ∈ [ε, 1].
Now, set wt := w ◦ T(t, 0, ·) and notice that wt is tangent at µt (because wt is constant in the
connected components of the support of µt , so we can define a Cc∞ function to be affine on each
connected component and with gradient given by wt , and then use the space between the components
d
themselves to rearrange smoothly the function). Since wt+h ◦ T(t, t + h, ·) = wt , we have dt
wt = 0
D
and a fortiori dt wt = 0. Thus (wt ) is a parallel transport in [ε, 1]. Furthermore, since ∇ · (wµ0 ) = 0,
we have w0 = w ∈
/ Tanµ0 (P2 (R2 )). Therefore there is no way to extend wt to a continuous tangent
vector field on the whole [0, 1]. In particular, there is no way to extend the parallel transport up to
t = 0.
Now we pass to the calculus of total and covariant derivatives. Let (µt ) be a fixed regular curve
and let (vt ) be its velocity vector field. Start observing that, if (ut ) is absolutely continuous along
98
(µt ), then (Pµt (ut )) is absolutely continuous as well, as it follows from the inequality
Pµs (us ) ◦ T(t, s, ·) − Pµt (ut ) ≤ Pµs (us ) ◦ T(t, s, ·) − Pµt Pµs (us ) ◦ T(t, s, ·) µt
µt
+ Pµt Pµs (us ) ◦ T(t, s, ·) − Pµt us ◦ T(t, s, ·) µt
+ kPµt (us ◦ T(t, s, ·)) − Pµt (ut )kµt
⊥
≤ Pµt Pµs (us ) ◦ T(t, s, ·) + Pµt P⊥
µs (us ) ◦ T(t, s, ·) µt
µt
+ kus ◦ T(t, s, ·) − ut kµt
Z s
Z s
d ≤ 2SC
Lip(vr )dr +
dr ur dr,
t
t
µr
(6.20)
(6.21)
valid for any t ≤ s, where S := supt kut kµt . Thus (Pµt (ut )) has a well defined covariant derivative
for a.e. t. The question is: can we find a formula to express this derivative?
To compute it, apply the Leibniz rule for the total and covariant derivatives ((6.6) and (6.9)), to
get that for a.e. t ∈ [0, 1] it holds
d
D
D
+ Pµt (ut ), ∇ϕ
,
hPµt (ut ), ∇ϕiµt =
Pµt (ut ), ∇ϕ
dt
dt
dt
µt
µt
d
d
d
+ ut , ∇ϕ
.
hut , ∇ϕiµt =
ut , ∇ϕ
dt
dt
dt
µt
µt
Since ∇ϕ ∈ Tanµt (P2 (Rd )) for any t, it holds hPµt (ut ), ∇ϕiµt = hut , ∇ϕiµt for any t ∈ [0, 1],
and thus the left hand sides of the previous equations are equal for a.e. t. Recalling formula (6.7) we
d
2
have dt
∇ϕ = ∇2 ϕ · vt and D
dt ∇ϕ = Pµt (∇ ϕ · vt ), thus from the equality of the right hand sides
we obtain
D
d
Pµ (ut ), ∇ϕ
=
ut , ∇ϕ
+ ut , ∇2 ϕ · vt µt − Pµt (ut ), Pµt (∇2 ϕ · vt ) µt
dt t
dt
µt
µ
t
d
⊥
2
=
ut , ∇ϕ
+ P⊥
µt (ut ), Pµt (∇ ϕ · vt ) µt .
dt
µt
(6.22)
∞
d
This formula characterizes the scalar product of D
dt Pµt (ut ) with any ∇ϕ when ϕ varies on Cc (R ).
Since the set {∇ϕ} is dense in Tanµt (P2 (Rd )) for any t ∈ [0, 1], the formula actually identifies
D
dt Pµt (ut ).
However, from this expression it is unclear what is the value of D
dt Pµt (ut ), w µt for a general
w ∈ Tanµt (P2 (Rd )), because some regularity of ∇ϕ seems required to compute ∇2 ϕ · vt . In order
to better understand what the value of D
dt Pµt (ut ) is, fix t ∈ [0, 1] and assume for a moment that
vt ∈ Cc∞ (Rd ). Then compute the gradient of x 7→ h∇ϕ(x), vt (x)i to obtain
∇ h∇ϕ, vt i = ∇2 ϕ · vt + ∇vtt · ∇ϕ,
and consider this expression as an equality between vector fields in L2µt . Taking the projection onto
the Normal space we derive
2
⊥
t
P⊥
µt (∇ ϕ · vt ) + Pµt (∇vt · ∇ϕ) = 0.
99
2
Plugging the expression for P⊥
µt (∇ ϕ · vt ) into the formula for the covariant derivative we get
D
d
⊥
t
Pµt (ut ), ∇ϕ
=
ut , ∇ϕ
− P⊥
µt (ut ), Pµt (∇vt · ∇ϕ) µt
dt
dt
µt
µ
t
d
ut , ∇ϕ
− ∇vt · P⊥
=
µt (ut ), ∇ϕ µt ,
dt
µt
which identifies
D
dt Pµt (ut )
as
D
Pµ (ut ) = Pµt
dt t
d
⊥
ut − ∇vt · Pµt (ut ) .
dt
(6.23)
We found this expression assuming that vt was a smooth vector field, but given that we know that
exists for a.e. t, it is realistic to believe that the expression makes sense also for general
Lipschitz vt ’s. The problem is that the object ∇vt may very well be not defined µt -a.e. for arbitrary
µt and Lipschitz vt (Rademacher’s theorem is of no help here, because we are not assuming the
measures µt to be absolutely continuous w.r.t. the Lebesgue measure). To give a meaning to formula
(6.23) we need to introduce a new tensor.
D
dt Pµt (ut )
Definition 6.17 (The Lipschitz non Lipschitz space) Let µ ∈ P2 (Rd ). The set L N Lµ ⊂ [L2µ ]2 is
the set of couples of vector fields (u, v) such that min{Lip(u), Lip(v)} < ∞, i.e. the set of couples
of vectors such that at least one of them is Lipschitz.
We say that a sequence (un , vn ) ∈ L N Lµ converges to (u, v) ∈ L N Lµ provided kun − ukµ → 0,
kvn − vkµ → 0 and
sup min{Lip(un ), Lip(vn )} < ∞.
n
The following theorem holds:
Theorem 6.18 (The Normal tensor) Let µ ∈ P2 (Rd ). The map
Nµ (u, v) : [Cc∞ (Rd , Rd )]2
(u, v)
→
7
→
d
Tan⊥
µ (P2 (R )),
t
⊥
Pµ (∇u · v)
extends uniquely to a sequentially continuous bilinear and antisymmetric map, still denoted by Nµ ,
d
from L N Lµ in Tan⊥
µ (P2 (R )) for which the bound
kNµ (u, v)kµ ≤ min{Lip(u)kvkµ , Lip(v)kukµ },
(6.24)
holds.
Proof For u, v ∈ Cc∞ (Rd , Rd ) we have ∇ hu, vi = ∇ut · v + ∇v t · u so that taking the projections
d
on Tan⊥
µ (P2 (R )) we get
Nµ (u, v) = −Nµ (v, u)
∀u, v ∈ Cc∞ (Rd , Rd ).
In this case, the bound (6.24) is trivial.
To prove existence and uniqueness of the sequentially continuous extension, it is enough to show
that for any given sequence n 7→ (un , vn ) ∈ [Cc∞ (Rd , Rd )]2 converging to some (u, v) ∈ L N Lµ , the
d
sequence n 7→ Nµ (un , vn ) ∈ Tan⊥
µ (P2 (R )) is a Cauchy sequence. Fix such a sequence (un , vn ),
let L := supn min{Lip(un ), Lip(vn )}, I ⊂ N be the set of indexes n such that Lip(un ) ≤ L and fix
two smooth vectors u
˜, v˜ ∈ Cc∞ (Rd , Rd ).
100
Notice that for n, m ∈ I it holds
kNµ (un , vn ) − Nµ (um , vm )kµ ≤ kNµ (un , vn − v˜)kµ + kNµ (un − um , v˜)kµ + kNµ (um , v˜ − vm )kµ
≤ Lkvn − v˜kµ + Lip(˜
v )kun − um kµ + Lkvm − v˜kµ ,
and thus
lim kNµ (un , vn ) − Nµ (um , vm )kµ ≤ 2Lkv − v˜kµ ,
n,m→∞
n,m∈I
(this expression being vacuum if I is finite). If n ∈ I and m ∈
/ I we have Lip(vm ) ≤ L and
kNµ (un , vn ) − Nµ (um , vm )kµ
≤ kNµ (un , vn − v˜)kµ + kNµ (un − u
˜, v˜)kµ + kNµ (˜
u, v˜ − vm )kµ + kNµ (˜
u − um , vm )kµ
≤ Lkvn − v˜kµ + Lip(˜
v )kun − u
˜kµ + Lip(˜
u)k˜
v − vm kµ + Lkum − u
˜kµ ,
which gives
lim
n,m→∞
n∈I, m∈I
/
kNµ (un , vn ) − Nµ (um , vm )kµ ≤ Lkv − v˜kµ + Lku − u
˜kµ .
Exchanging the roles of the u’s and the v’s in these inequalities for the case in which n ∈
/ I we can
conclude
lim kNµ (un , vn ) − Nµ (um , vm )kµ ≤ 2Lkv − v˜kµ + 2Lku − u
˜kµ .
n,m→∞
Since u
˜, v˜ are arbitrary, we can let u
˜ → u and v˜ → v in L2µ and conclude that n 7→ Nµ (un , vn ) is a
Cauchy sequence, as requested.
The other claims follow trivially by the sequential continuity.
Definition 6.19 (The operators Ov (·) and Ov∗ (·)) Let µ ∈ P2 (R)d and v ∈ L2µ with Lip(v) <
∞. Then the operator u 7→ Ov (u) is defined by
Ov (u) := Nµ (v, u).
The operator u 7→ Ov∗ (u) is the adjoint of Ov (·), i.e. it is defined by
hOv∗ (u) , wiµ = hu, Ov (w)iµ ,
∀w ∈ L2µ .
It is clear that the operator norm of Ov (·) and Ov∗ (·) is bounded by Lip(v). Observe that in
writing Ov (u), Ov∗ (u) we are losing the reference to the base measure µ, which certainly plays a
role in the definition; this simplifies the notation and hopefully should create no confusion, as the
measure we are referring to should always be clear from the context. Notice that if v ∈ Cc∞ (Rd , Rd )
these operators read as
t
Ov (u) = P⊥
µ (∇v · u),
Ov∗ (u) = ∇v · P⊥
µ (u).
The introduction of the operators Ov (·) and Ov∗ (·) allows to give a precise meaning to formula (6.23)
for general regular curves:
Theorem 6.20 (Covariant derivative of Pµt (ut )) Let (µt ) be a regular curve, (vt ) its velocity vector field and let (ut ) be an absolutely continuous vector field along it. Then (Pµt (ut )) is absolutely
continuous as well and for a.e. t it holds
D
d
∗
Pµ (ut ) = Pµt
ut − Ovt (ut ) .
(6.25)
dt t
dt
101
Proof The fact that (Pµt (ut )) is absolutely continuous has been proved with inequality (6.21). To
get the thesis, start from equation (6.22) and conclude noticing that for a.e. t it holds Lip(vt ) < ∞
and thus
2
P⊥
µt (∇ ϕ · vt ) = Nµ (∇ϕ, vt ) = −Nµ (vt , ∇ϕ) = −Ovt (∇ϕ) .
Corollary 6.21 (Total derivatives of Pµt (ut ) and P⊥
µt (ut )) Let (µt ) be a regular curve, let (vt ) be
its velocity vector field and let (ut ) be an absolutely continuous vector field along it. Then (P⊥
µt (ut ))
is absolutely continuous and it holds
d
d
Pµt (ut ) = Pµt
ut − Pµt Ov∗t (ut ) − Ovt (Pµt (ut )) ,
dt
dt
(6.26)
d
d ⊥
∗
Pµt (ut ) = P⊥
u
+
P
O
(u
)
+
O
(P
(u
))
.
t
µ
t
v
µ
t
µt
vt
t
t
t
dt
dt
Proof The absolute continuity of (P⊥
µt (ut )) follows from the fact that both (ut ) and (Pµt (ut )) are
absolutely continuous. Similarly, the second formula in (6.26) follows immediately from the first one
d
d
d ⊥
noticing that ut = Pµt (ut ) + P⊥
µt (ut ) yields dt ut = dt Pµt (ut ) + dt Pµt (ut ). Thus we have only
to prove the first equality in (6.26). To this aim, let (wt ) be an arbitrary absolutely continuous vector
field along (µt ) and observe that it holds
d
d
d
hPµt (ut ), wt iµt =
Pµt (ut ), wt
+ Pµt (ut ), wt
,
dt
dt
dt
µt
µt
D
D
d
+ Pµt (ut ), Pµt (wt )
.
hPµt (ut ), Pµt (wt )iµt =
Pµ (ut ), Pµt (wt )
dt
dt t
dt
µt
µt
Since the left hand sides of these expression are equal, the right hand sides are equal as well, thus we
get
D
d
D
d
Pµ (ut ) − Pµt (ut ), wt
= − Pµt (ut ), wt − Pµt (wt )
dt t
dt
dt
dt
µt
µt
d D
= − Pµt (ut ), Pµt
wt − Pµt (wt )
dt
dt
µt
(6.25)
∗
= − Pµt (ut ), Ovt (wt ) µt
= − hOvt (Pµt (ut )) , wt iµt ,
so that the arbitrariness of (wt ) gives
d
D
Pµt (ut ) = Pµt (ut ) − Ovt (Pµt (ut )) ,
dt
dt
and the conclusion follows from (6.25).
Along the same lines, the total derivative of (Nµt (ut , wt )) for given absolutely continuous vector
fields (ut ), (wt ) along the same regular curve (µt ) can be calculated. The only thing the we must take
care of, is the fact that Nµt is not defined on the whole [L2µt ]2 , so that we need to make some assumptions on (ut ), (wt ) to be sure that (Nµt (ut , wt )) is well defined and absolutely continuous. Indeed,
102
observe that from a purely formal point of view, we expect that the total derivative of (Nµt (ut , wt ))
is something like


some tensor - which we may think
d
d
d
.
Nµ (ut , wt ) = Nµt
ut , wt +Nµt ut , wt +  as the derivative of Nµt dt t
dt
dt
applied to the couple (ut , wt )
Forget about the last object and look at the first two addends: given that the domain of definition of
Nµt is not the whole [L2µt ]2 , in order for the above formula to make sense, we should ask that in each
d
d
ut , wt ) and (ut , dt
wt ), at least one vector is Lipschitz. Under the assumption that
of the couples ( dt
R1
R1
d
{ 0 Lip(ut )dt < ∞ and 0 Lip( dt ut )dt < +∞ }, it is possible to prove the following theorem
(whose proof we omit).
Theorem 6.22 Let (µt ) be an absolutely continuous curve, let (vt ) be its velocity vector field and let
R1
(ut ), (wt ) be two absolutely continuous vector fields along it. Assume that 0 Lip(ut )dt < ∞ and
R1
d
ut )dt < +∞. Then (Nµt (ut , wt )) is absolutely continuous and it holds
Lip( dt
0
d
d
d
Nµ (ut , wt ) =Nµt
ut , wt + Nµt ut , wt
dt t
dt
dt
(6.27)
∗
− Ovt (Nµt (ut , wt )) + Pµt Ovt (Nµt (ut , wt )) .
Corollary 6.23 Let (µt ) be a regular curve and assume that its velocity vector field (vt ) satisfies:
Z 1
d
vt dt < ∞.
(6.28)
Lip
dt
0
Then for every absolutely continuous vector field (ut ) both (Ovt (ut )) and (Ov∗t (ut )) are absolutely
continuous and their total derivatives are given by:
d
d
Ovt (ut ) = O d vt (ut ) + Ovt
ut − Ovt (Ovt (ut )) + Pµt Ov∗t (Ovt (ut ))
dt
dt
dt
(6.29)
d
d ∗
Ovt (ut ) = O∗d vt (ut ) + Ov∗t
ut − Ov∗t Ov∗t (ut ) + Ov∗t (Ovt (Pµt (ut )))
dt
dt
dt
Proof The first formula follows directly from Theorem 6.22, the second from the fact that Ov∗t (·) is
the adjoint of Ovt (·).
An important feature of equations (6.27) and (6.29) is that to express the derivatives of
(Nµt (ut , wt )), (Ovt (ut )) and (Ov∗t (ut )) no “new operators appear”. This implies that we can recursively calculate derivatives of any order of the vector fields (Pµt (ut )), (P⊥
µt (ut )), Ovt (ut ) and
Ov∗t (ut ), provided - of course - that we make appropriate regularity assumptions on the vector field
(ut ) and on the velocity vector field (vt ). An example of result which can be proved following this
direction is that the operator t 7→ Pµt (·) is analytic along (the restriction of) a geodesic:
Proposition 6.24 (Analyticity of t 7→ Pµt (·)) Let (µt ) be the restriction to [0, 1] of a geodesic defined in some larger interval [−ε, 1 + ε]. Then the operator t 7→ Pµt (·) is analytic in the following
sense. For any t0 ∈ [0, 1] there exists a sequence of bounded linear operators An : L2µt0 → L2µt0
such that the following equality holds in a neighborhood of t0
Pµt (u) =
X (t − t0 )n
An u ◦ T(t0 , t, ·) ◦ T(t, t0 , ·),
n!
n∈N
103
∀u ∈ L2µt .
(6.30)
Proof From the fact that (µt ) is the restriction of a geodesic we know that L := supt∈[0,1] Lip(vt ) <
d
vt = 0 (recall Example 6.9). In particular condition (6.28) is fulfilled.
∞ and that dt
d
Fix t0 ∈ [0, 1], u ∈ L2µt0 and define ut := u ◦ T(t, t0 , ·), so that dt
ut = 0. From equations
n
d
(6.26) and (6.29) and by induction it follows that (Pµt (ut )) is C ∞ . Also, dt
n Pµt (ut ) is the sum of
addends each of which is the composition of projections onto the tangent or normal space and up
to n operators Ovt (·) and Ov∗t (·), applied to the vector ut . Since the operator norm of Ovt (·) and
Ov∗t (·) is bounded by L, we deduce that
n
d
n Pµ (ut ) ≤ kut kµ Ln = kukµ Ln ,
∀n ∈ N, t ∈ [0, 1].
t
t0
t
dt
µt
Defining the curve t 7→ Ut := Pµt (ut ) ◦ T(t0 , t, ·) ∈ L2µt0 , the above bound can be written as
n d
dtn Ut ≤ kUt0 kµt0 Ln ,
∀n ∈ N, t ∈ [0, 1],
µt0
which implies that the curve t 7→ Ut ∈ L2µt0 is analytic. This means that for t close to t0 it holds
Pµt (ut ) ◦ T(t0 , t, ·) =
X (t − t0 )n dn
(Pµt (ut )).
n!
dtn |t=t0
n≥0
d
Now notice that equations (6.26) and (6.29) and the fact that dt
ut ≡ 0 ensure that
n
d
2
2
dtn |t=t0 (Pµt (ut )) = An (u), where An : Lµt0 → Lµt0 is bounded. Thus the thesis follows by
the arbitrariness of u ∈ L2µt0 .
Now we have all the technical tools we need in order to study the curvature tensor of the “manifold” P2 (Rd ).
Following the analogy with the Riemannian case, we are lead to define the curvature tensor in the
following way: given three vector fields µ 7→ ∇ϕiµ ∈ Tanµ (P2 (Rd )), i = 1, . . . , 3, the curvature
tensor R calculated on them at the measure µ is defined as:
R(∇ϕ1µ , ∇ϕ2µ )(∇ϕ3µ ) := ∇∇ϕ2µ (∇∇ϕ1µ ∇ϕ3µ ) − ∇∇ϕ1µ (∇∇ϕ2µ ∇ϕ3µ ) + ∇[∇ϕ1µ ,∇ϕ2µ ] ∇ϕ3µ ,
where the objects like ∇∇ϕµ (∇ψµ ), are, heuristically speaking, the covariant derivative of the vector
field µ 7→ ∇ψµ along the vector field µ 7→ ∇ϕµ .
However, in order to give a precise meaning to the above formula, we should be sure, at least,
that the derivatives we are taking exist. Such an approach is possible, but heavy: indeed, consider
that we should define what are C 1 and C 2 vector fields, and in doing so we cannot just consider
derivatives along curves. Indeed we would need to be sure that “the partial derivatives have the right
symmetries”, otherwise there won’t be those cancellations which let the above operator be a tensor.
Instead, we adopt the following strategy:
• First we calculate the curvature tensor for some very specific kind of vector fields, for which
we are able to do and justify the calculations. Specifically, we will consider vector fields of the
kind µ 7→ ∇ϕ, where the function ϕ ∈ Cc∞ (M ) does not depend on the measure µ.
• Then we prove that the object found is actually a tensor, i.e. that its value depends only on the
µ−a.e. value of the considered vector fields, and not on the fact that we obtained the formula
assuming that the functions ϕ’s were independent on the measure.
104
• Finally, we discuss the minimal regularity requirements for the object found to be well defined.
Pick ϕ, ψ ∈ Cc∞ (Rd ) and observe that a curve of the kind t 7→ (Id + t∇ϕ)# µ is a regular
geodesic on an interval [−T, T ] for T sufficiently small (Remark 1.22 and Proposition 6.3). It is
then immediate to verify that a vector field of the kind (∇ψ) along it is C ∞ . Its covariant derivative
calculated at t = 0 is given by Pµ (∇2 ψ · ∇ϕ). Thus we write:
∇∇ϕ ∇ψ := Pµ (∇2 ψ · ∇ϕ)
∀ϕ, ψ ∈ Cc∞ (Rd ).
(6.31)
Proposition 6.25 Let µ ∈ P2 (Rd ) and ϕ1 , ϕ2 , ϕ3 ∈ Cc∞ (Rd ). The curvature tensor R in µ
calculated for the 3 vector fields ∇ϕi , i = 1, 2, 3 is given by
∗
R(∇ϕ1 , ∇ϕ2 )∇ϕ3 =Pµ O∇ϕ
(Nµ (∇ϕ1 , ∇ϕ3 ))
2
(6.32)
∗
∗
− O∇ϕ
(N
(∇ϕ
,
∇ϕ
))
+
2O
(N
(∇ϕ
,
∇ϕ
))
.
µ
2
3
µ
1
2
∇ϕ3
1
Proof We start computing the value of ∇∇ϕ2 ∇∇ϕ1 ∇ϕ3 . Let µt := (Id + t∇ϕ2 )# µ and observe,
as just recalled, that (µt ) is a regular geodesic in some symmetric interval [−T, T ]. The vector field
∇2 ϕ3 · ∇ϕ1 is clearly C ∞ along it, thus by Proposition 6.24 also the vector field ut := Pµt (∇2 ϕ3 ·
∇ϕ1 ) = ∇∇ϕ1 ∇ϕ3 (µt ) is C ∞ . The covariant derivative at t = 0 of (ut ) along (µt ) is, by definition,
the value of ∇∇ϕ2 ∇∇ϕ1 ∇ϕ3 at µ. Applying formula (6.25) we get
2
∇∇ϕ2 ∇∇ϕ1 ∇ϕ3 = Pµ ∇(∇2 ϕ3 · ∇ϕ1 ) · ∇ϕ2 − ∇2 ϕ2 · P⊥
(6.33)
µ (∇ ϕ3 · ∇ϕ1 ) .
Symmetrically, it holds
2
∇∇ϕ1 ∇∇ϕ2 ∇ϕ3 = Pµ ∇(∇2 ϕ3 · ∇ϕ2 ) · ∇ϕ1 − ∇2 ϕ1 · P⊥
µ (∇ ϕ3 · ∇ϕ2 ) .
(6.34)
Finally, from the torsion free identity (6.10) we have
[∇ϕ1 , ∇ϕ2 ] = Pµ (∇2 ϕ1 · ∇ϕ2 − ∇2 ϕ2 · ∇ϕ1 ),
and thus
∇[∇ϕ1 ,∇ϕ2 ] ∇ϕ3 = Pµ ∇2 ϕ3 · Pµ (∇2 ϕ1 · ∇ϕ2 − ∇2 ϕ2 · ∇ϕ1 ) .
(6.35)
Subtracting (6.35) and (6.34) from (6.33) and observing that
∇(∇2 ϕ3 · ∇ϕ1 ) · ∇ϕ2 − ∇(∇2 ϕ3 · ∇ϕ2 ) · ∇ϕ1 = ∇2 ϕ3 · ∇2 ϕ1 · ∇ϕ2 − ∇2 ϕ3 · ∇2 ϕ2 · ∇ϕ1 ,
we get the thesis.
Observe that equation (6.32) is equivalent to
hR(∇ϕ1 , ∇ϕ2 )∇ϕ3 , ∇ϕ4 iµ = hNµ (∇ϕ1 , ∇ϕ3 ), Nµ (∇ϕ2 , ∇ϕ4 )iµ
− hNµ (∇ϕ2 , ∇ϕ3 ), Nµ (∇ϕ1 , ∇ϕ4 )iµ
(6.36)
+ 2 hNµ (∇ϕ1 , ∇ϕ2 ), Nµ (∇ϕ3 , ∇ϕ4 )iµ,
for any ϕ4 ∈ Cc∞ (M ). From this formula it follows immediately that the operator R is actually a
tensor:
Proposition 6.26 Let µ ∈ P2 (Rd ). The curvature operator, given by formula (6.36), is a tensor on
[{∇ϕ}]4 , i.e. its value depends only on the µ−a.e. value of the 4 vector fields.
Proof Clearly the left hand side of equation (6.36) is a tensor w.r.t. the fourth entry. The conclusion
follows from the symmetries of the right hand side.
105
We remark that from (6.36) it follows that R has all the expected symmetries.
Concerning the domain of definition of the curvature tensor, the following statement holds, whose
proof follows from the properties of the normal tensor Nµ :
Proposition 6.27 Let µ ∈ P2 (Rd ). Then the curvature tensor, thought as map from [{∇ϕ}]4 to R
given by (6.36), extends uniquely to a sequentially continuous map on the set of 4-ples of vector fields
in L2µ in which at least 3 vector fields are Lipschitz, where we say that (vn1 , vn2 , vn3 , vn4 ) is converging
to (v 1 , v 2 , v 3 , v 4 ) if there is convergence in L2µ on each coordinate and
sup Lip(vni ) < ∞,
n
for at least 3 indexes i.
Thus, in order for the curvature tensor to be well defined we need at least 3 of the 4 vector fields
involved to be Lipschitz. However, for some related notion of curvature the situation simplifies. Of
particular relevance is the case of sectional curvature:
Example 6.28 (The sectional curvature) If we evaluate the curvature tensor R on a 4-ple of vectors
of the kind (u, v, u, v) and we recall the antisymmetry of Nµ we obtain
2
hR(u, v)u, viµ = 3 kNµ (u, v)kµ .
Thanks to the simplification of the formula, the value of hR(u, v)u, viµ is well defined as soon as
either u or v is Lipschitz. That is, hR(u, v)u, viµ is well defined for (u, v) ∈ L N Lµ . In analogy with
the Riemannian case we can therefore define the sectional curvature K(u, v) at the measure µ along
the directions u, v by
K(u, v) :=
2
hR(u, v)u, viµ
2
kuk2µ kvk2µ − hu, viµ
=
3 kNµ (u, v)kµ
2
kuk2µ kvk2µ − hu, viµ
,
∀(u, v) ∈ L N Lµ .
This expression confirms the fact that the sectional curvatures of P2 (Rd ) are positive (coherently
with Theorem 2.20), and provides a rigorous proof of the analogous formula already appeared in [67]
and formally computed using O’Neill formula.
6.4
Bibliographical notes
The idea of looking at the Wasserstein space as a sort of infinite dimensional Riemannian manifold
is due to F. Otto and given in his seminal paper [67]. The whole discussion in Section 6.1 is directly
taken from there.
The fact that the ‘tangent space made of gradients’ Tanµ (P2 (Rd )) was not sufficient to study
all the aspects of the ‘Riemannian geometry’ of (P2 (Rd ), W2 ) has been understood in [6] in connection with the definition of subdifferential of a geodesically convex functional, in particular concerning the issue of having a closed subdifferential. In the appendix of [6] the concept of Geometric
Tangent space discussed in Section 6.2 has been introduced. Further studies on the properties of
Tanµ (P2 (M )) have been made in [43]. Theorem 6.1 has been proved in [46].
The first work in which a description of the covariant derivative and the curvature tensor of
(P2 (M ), W2 ), M being a compact Riemannian manifold has been given (beside the formal calculus
of the sectional curvature via O’Neill formula done already in [67]) is the paper of J. Lott [56]:
rigorous formulas are derived for the computation of such objects on the ‘submanifold’ PC ∞ (M )
106
made of absolutely continuous measures with density C ∞ and bounded away from 0. In the same
paper Lott shows that if M has a Poisson structure, then the same is true for PC ∞ (M ) (a topic which
has not been addressed in these notes).
Independently on Lott’s work, the second author built the parallel transport on (P2 (Rd ), W2 ) in
his PhD thesis [43], along the same lines provided in Section 6.3. The differences with Lott’s work
are the fact that the analysis was carried out on Rd rather than on a compact Riemannian manifold,
that no assumptions on the measures were given, and that both the existence Theorem 6.15 for the
parallel transport along a regular curve and counterexamples to its general existence (the Example
6.16) were provided. These results have been published by the authors of these notes in [5]. Later
on, after having beed aware of Lott’s results, the second author generalized the construction to the
case of Wasserstein space built over a manifold in [44]. Not all the results have been reported here:
we mention that it is possible to push the analysis up show the differentiability properties of the
exponential map and the existence of Jacobi fields.
7
Ricci curvature bounds
Let us start recalling what is the Ricci curvature for a Riemannian manifold M (which we will
always consider smooth and complete). Let R be the Riemann curvature tensor on M , x ∈ M and
u, v ∈ Tx M . Then the Ricci curvature Ric(u, v) ∈ R is defined as
X
Ric(u, v) :=
hR(u, ei )v, ei i ,
i
where {ei } is any orthonormal basis of Tx M . An immediate consequence of the definition and the
symmetries of R is the fact that Ric(u, v) = Ric(v, u).
Another, more geometric, characterization of the Ricci curvature is the following. Pick x ∈ M ,
a small ball B around the origin in Tx M and let µ be the Lebesgue measure on B. The exponential
map expx : B → M is injective and smooth, thus the measure (expx )# µ has a smooth density w.r.t.
the volume measure Vol on M . For any u ∈ B, let f (u) be the density of (expx )# µ w.r.t. Vol at the
point expx (u). Then the function f has the following Taylor expansion:
1
f (u) = 1 + Ric(u, u) + o(|u|2 ).
2
(7.1)
It is said that the Ricci curvature is bounded below by λ ∈ R provided
Ric(u, u) ≥ λ|u|2 ,
for every x ∈ M and u ∈ Tx M .
Several important geometric and analytic inequalities are related to bounds from below on Ricci
curvature, we mention just two of them.
• Brunn-Minkowski. Suppose that M has non negative Ricci curvature, and for any A0 , A1 ⊂ M
compact, let
n
o
At := γt : γ is a constant speed geodesic s.t. γ0 ∈ A0 , γ1 ∈ A1 ,
∀t ∈ [0, 1].
Then it holds
Vol(At )
1/n
≥ (1 − t) Vol(A0 )
1/n
where n is the dimension of M .
107
1/n
+ t Vol(A1 )
,
∀t ∈ [0, 1],
(7.2)
• Bishop-Gromov. Suppose that M has Ricci curvature bounded from below by (n − 1)k, where
˜ be the simply connected, n-dimensional
n is the dimension of M and k a real number. Let M
˜ is a sphere
space with constant curvature, having Ricci curvature equal to (n − 1)k (so that M
if k > 0, a Euclidean space if k = 0 and an hyperbolic space if k < 0). Then for every x ∈ M
˜ the map
and x
˜∈M
Vol(Br (x))
,
(7.3)
(0, ∞) 3 r 7→
g
Vol(Br (˜
x))
˜ respectively.
is non increasing, where Vol and g
Vol are the volume measures on M , M
A natural question is whether it is possible to formulate the notion of Ricci bound from below also
for metric spaces, analogously to the definition of Alexandrov spaces, which are a metric analogous
of Riemannian manifolds with bounded (either from above or from below) sectional curvature. What
became clear over time, is that the correct non-smooth object where one could try to give a notion of
Ricci curvature bound is not a metric space, but rather a metric measure space, i.e. a metric space
where a reference non negative measure is also given. When looking to the Riemannian case, this
fact is somehow hidden, as a natural reference measure is given by the volume measure, which is a
function of the distance.
There are several viewpoints from which one can see the necessity of a reference measure (which
can certainly be the Hausdorff measure of appropriate dimension, if available). A first (cheap) one
is the fact that in most of identities/inequalities where the Ricci curvature appears, also the reference
measures appears (e.g. equations (7.1), (7.2) and (7.3) above). A more subtle point of view comes
from studying stability issues: consider a sequence (Mn , gn ) of Riemannian manifolds and assume
that it converges to a smooth Riemannian manifold (M, g) in the Gromov-Hausdorff sense. Assume
that the Ricci curvature of (Mn , gn ) is uniformly bounded below by some K ∈ R. Can we deduce
that the Ricci curvature of (M, g) is bounded below by K? The answer is no (while the same question
with sectional curvature in place of Ricci one has affirmative answer). It is possible to see that when
Ricci bounds are not preserved in the limiting process, it happens that the volume measures of the
approximating manifolds are not converging to the volume measure of the limit one.
Another important fact to keep in mind is the following: if we want to derive useful analytic/geometric consequences from a weak definition of Ricci curvature bound, we should also known
what is the dimension of the metric measure space we are working with: consider for instance the
Brunn-Minkowski and the Bishop-Gromov inequalities above, both make sense if we know the dimension of M , and not just that its Ricci curvature is bounded from below. This tells that the natural
notion of bound on the Ricci curvature should be a notion speaking both about the curvature and
the dimension of the space. Such a notion exists and is called CD(K, N ) condition, K being the
bound from below on the Ricci curvature, and N the bound from above on the dimension. Let us
tell in advance that we will focus only on two particular cases: the curvature dimension condition
CD(K, ∞), where no upper bound on the dimension is specified, and the curvature-dimension condition CD(0, N ), where the Ricci curvature is bounded below by 0. Indeed, the general case is much
more complicated and there are still some delicate issues to solve before we can say that the theory
is complete and fully satisfactory.
Before giving the definition, let us highlight which are the qualitative properties that we expect
from a weak notion of curvature-dimension bound:
Intrinsicness. The definition is based only on the property of the space itself, that is, is not something
like “if the space is the limit of smooth spaces....”
Compatibility. If the metric-measure space is a Riemannian manifold equipped with the volume
measure, then the bound provided by the abstract definition coincides with the lower bound on the
108
Ricci curvature of the manifold, equipped with the Riemannian distance and the volume measure.
Stability. Curvature bounds are stable w.r.t. the natural passage to the limit of the objects which
define it.
Interest. Geometrical and analytical consequences on the space can be derived from curvaturedimension condition.
In the next section we recall some basic concepts concerning convergence of metric measure
spaces (which are key to discuss the stability issue), while in the following one we give the definition
of curvature-dimension condition and analyze its properties.
All the metric measure spaces (X, d, m) that we will consider satisfy the following assumption:
Assumption 7.1 (X, d) is Polish, the measure m is a Borel probability measure and m ∈ P2 (X).
7.1
Convergence of metric measure spaces
We say that two metric measure spaces (X, dX , mX ) and (Y, dY , mY ) are isomorphic provided
there exists a bijective isometry f : supp(mX ) → supp(mY ) such that f# mX = mY . This is the
same as to say that ‘we don’t care about the behavior of the space (X, dX ) where there is no mass’.
This choice will be important in discussing the stability issue.
Definition 7.2 (Coupling between metric measure spaces) Given two metric measure spaces
(X, dX , mX ), (Y, dY , mY ), we consider the product space (X × Y, DXY ), where DXY is the
distance defined by
q
DXY (x1 , y1 ), (x2 , y2 ) := d2X (x1 , x2 ) + d2Y (y1 , y2 ).
We say that a couple (d, γ) is an admissible coupling between (X, dX , mX ) and (Y, dY , mY ), we
write (d, γ) ∈ Adm((dX , mX ), (dY , mY )) if:
• d is a pseudo distance on supp mX t supp mY (i.e. it may be zero on two different
points) which coincides with dX (resp. dY ) when restricted to supp mX × supp mX (resp.
supp mY × supp mY ).
• a Borel (w.r.t. the Polish structure given by DXY ) measure γ on supp mX × supp mY such
1
2
that π#
γ = mX and π#
γ = mY .
It is not hard to see that the set of admissible couplings is always non empty.
The cost C(d, γ) of a coupling is given by
Z
C(d, γ) :=
d2 (x, y)dγ(x, y).
supp ùscriptsizemX ×supp ùscriptsizemY
The distance D (X, dX , mX ), (Y, dY , mY ) is then defined as
p
D (X, dX , mX ), (Y, dY , mY ) := inf C(d, γ),
(7.4)
the infimum being taken among all couplings (d, γ) of (X, dX , mX ) and (Y, dY , mY ).
˜ d ˜ , m ˜ ) (resp.
A trivial consequence of the definition is that if (X, dX , mX ) and (X,
X
X
˜
(Y, dY , mY ) and (Y , dY˜ , mY˜ )) are isomorphic, then
˜ d ˜ , m ˜ ), (Y˜ , d ˜ , m ˜ ) ,
D (X, dX , mX ), (Y, dY , mY ) = D (X,
X
X
Y
Y
109
so that D is actually defined on isomorphism classes of metric measure spaces.
In the next proposition we collect, without proof, the main properties of D.
Proposition 7.3 (Properties of D) The inf in (7.4) is realized, and a coupling realizing it will be
called optimal.
Also, let X be the set of isomorphism classes of metric measure spaces satisfying Assumption 7.1.
Then D is a distance on X, and in particular D is 0 only on couples of isomorphic metric measure
spaces.
Finally, the space (X, D) is complete, separable and geodesic.
Proof See Section 3.1 of [74].
We will denote by Opt ((dX , mX ), (dY , mY )) the set of optimal couplings between (X, dX , mX )
and (Y, dY , mY ), i.e. the set of couplings where the inf in (7.4) is realized.
Given a metric measure space (X, d, m) we will denote by P2a (X) ⊂ P(X) the set of measures
which are absolutely continuous w.r.t. m.
To any coupling (d, γ) of two metric measure spaces (X, dX , mX ) and (Y, dY , mY ), it is naturally associated a map γ # : P2a (X) → P2a (Y ) defined as follows:
Z
µ = ρmX
7→
γ # µ := ηmY , where η is defined by η(y) := ρ(x)dγ y (x), (7.5)
where {γ y } is the disintegration of γ w.r.t. the projection on Y . Similarly, there is a natural map
a
a
γ −1
# : P2 (Y ) → P2 (X) given by:
ν = ηmY
7→
γ −1
# ν
Z
:= ρmX , where ρ is defined by ρ(x) :=
η(y)dγ x (y),
where, obviously, {γ x } is the disintegration of γ w.r.t. the projection on X.
−1
Notice that γ # mX = mY and γ −1
# mY = mX and that in general γ # γ # µ 6= µ. Also, if γ is
induced by a map T : X → Y , i.e. if γ = (Id, T )# mX , then γ # µ = T# µ for any µ ∈ P2a (X).
D
Our goal now is to show that if (Xn , dn , mn ) → (X, d, m) of the internal energy kind on
Mosco-converge to the corresponding functional on (P2a (X), W2 ). Thus, fix a convex and continuous function u : [0, +∞) → R, define
(P2a (Xn ), W2 )
u0 (∞) := lim
z→+∞
u(z)
,
z
and, for every compact metric space (X, d), define the functional E : [P(X)]2 → R ∪ {+∞} by
Z
E (µ|ν) := u(ρ)dν + u0 (∞)µs (X),
(7.6)
where µ = ρν + µs is the decomposition of µ in absolutely continuous ρν and singular part µs w.r.t.
to ν.
Lemma 7.4 (E decreases under γ # ) Let (X, dX , mX ) and (Y, dY , mY ) be two metric measure
space and (d, γ) a coupling between them. Then it holds
E (γ # µ|mY ) ≤ E (µ|mX ),
∀µ ∈ P2a (X),
E (γ −1
# ν|mX ) ≤ E (ν|mY ),
∀ν ∈ P2a (Y ).
110
Proof Clearly it is sufficient to prove the first inequality. Let µ = ρmX and γ # µ = ηmY , with η
given by (7.5). By Jensen’s inequality we have
Z
Z Z
ρ(x)dγ y (x) dmY (y)
E (γ # µ|mY ) = u(η(y))dmY (y) = u
Z Z
Z
≤
u(ρ(x))dγ y (x)dmY (y) = u(ρ(x))dγ(x, y)
Z
= u(ρ(x))dmX (x) = E (µ|mX )
D
Proposition 7.5 (‘Mosco’ convergence of internal energy functionals) Let (Xn , dn , mn )
→
(X, d, m) and (dn , γ n ) ∈ Opt ((dn , mn ), (d, m)). Then the following two are true:
Weak Γ − lim. For any sequence n 7→ µn ∈ P2a (Xn ) such that n 7→ (γ n )# µn narrowly converges
to some µ ∈ P(X) it holds
lim E (µn |mn ) ≥ E (µ|m).
n→∞
Strong Γ − lim. For any µ ∈ P2a (X) with bounded density there exists a sequence n 7→ µn ∈
P2a (Xn ) such that W2 ((γ n )# µn , µ) → 0 and
lim E (µn |mn ) ≤ E (µ|m).
n→∞
Note: we put the apexes in Mosco because we prove the Γ − lim inequality only for measures with
bounded densities. This will be enough to prove the stability of Ricci curvature bounds (see Theorem
7.12).
Proof For the first statement we just notice that by Lemma 7.4 we have
E (µn |mn ) ≥ E ((γ n )# µn |m),
and the conclusion follows from the narrow lower semicontinuity of E (·|m).
For the second one we define µn := (γ −1
n )# µ. Then applying Lemma 7.4 twice we get
E (µ|m) ≥ E (µn |mn ) ≥ E ((γ n )# µn |m),
from which the Γ − lim inequality follows.
Thus to conclude we need to show that
W2 ((γ n )# µn , µ) → 0. To check this, we use the Wassertein space built over the (pseudo-)metric
˜ n ∈ P(Xn × X) by
space (Xn t X, dn ): let µ = ρmX and for any n ∈ N define the plan γ
˜ n ∈ Adm(µn , µ). Thus
d˜
γ n (y, x) := ρ(x)dγ n (y, x) and notice that γ
sZ
sZ
√ p
W2 (µn , µ) ≤
d2n (x, y)d˜
γ n (y, x) ≤
d2n (x, y)ρ(x)dγ n (y, x) ≤ M C(dn , γ n ),
where M is the essential supremum of ρ. By definition, it is immediate to check that the density ηn
of µn is also bounded above by M . Introduce the plan γ n by dγ n (y, x) := ηn (y)dγ n (y, x) and
notice that γ n ∈ Adm(µn , (γ n )# µn ), so that, as before, we have
sZ
sZ
√ p
d2n (x, y)dγ n (y, x) ≤
d2n (x, y)ηn (y)dγ n (y, x) ≤ M C(dn , γ n ).
W2 (µn , (γ n )# µn ) ≤
In conclusion we have
√ p
W2 (µ, (γ n )# µn ) ≤ W2 (µn , (γ n )# µn ) + W2 (µn , µ) ≤ 2 M C(dn , γ n ),
which gives the thesis.
111
7.2
Weak Ricci curvature bounds: definition and properties
Define the functions uN , N > 1, and u∞ on [0, +∞) as
1
uN (z) := N (z − z 1− N ),
and
u∞ (z) := z log(z).
Then given a metric measure space (X, d, m) we define the functionals EN , E∞ : P(X) →
R ∪ {+∞} by
EN (µ) := E (µ|m),
where E (·|·) is given by formula (7.6) with u := uN ; similarly for E∞ .
The definitions of weak Ricci curvature bounds are the following:
Definition 7.6 (Curvature ≥ K and no bound on dimension - CD(K, ∞)) We say that a metric
measure space (X, d, m) has Ricci curvature bounded from below by K ∈ R provided the functional
E∞ : P(X) → R ∪ {+∞},
is K-geodesically convex on (P2a (X), W2 ). In this case we say that (X, d, m) satisfies the curvature
dimension condition CD(K, ∞) or that (X, d, m) is a CD(K, ∞) space.
Definition 7.7 (Curvature ≥ 0 and dimension ≤ N - CD(0, N )) We say that a metric measure
space (X, d, m) has nonnegative Ricci curvature and dimension bounded from above by N provided
the functionals
EN 0 : P(X) → R ∪ {+∞},
are geodesically convex on (P2a (X), W2 ) for every N 0 ≥ N . In this case we say that (X, d, m)
satisfies the curvature dimension condition CD(0, N ), or that (X, d, m) is a CD(0, N ) space.
Note that N > 1 is not necessarily an integer.
Remark 7.8 Notice that geodesic convexity is required on P2 (supp(mX )) and not on P2 (X).
This makes no difference for what concerns CD(K, ∞) spaces, as E∞ is +∞ on measures having
a singular part w.r.t. m, but is important for the case of CD(0, N ) spaces, as the functional EN has
only real values, and requiring geodesic convexity on the whole P2 (X) would lead to a notion not
invariant under isomorphism of metric measure spaces.
Also, for the CD(0, N ) condition one requires the geodesic convexity of all EN 0 to ensure the
following compatibility condition: if X is a CD(0, N ) space, then it is also a CD(0, N 0 ) space
for any N 0 > N . Using Proposition 2.16 it is not hard to see that such compatibility condition is
automatically satisfied on non branching spaces.
Remark 7.9 (How to adapt the definitions to general bounds on curvature the dimension) It is
pretty natural to guess that the notion of bound from below on the Ricci curvature by K ∈ R
and bound from above on the dimension by N can be given by requiring the functional EN to be
K-geodesically convex on (P(X), W2 ). However, this is wrong, because such condition is not
compatible with the Riemannian case. The hearth of the definition of CD(K, N ) spaces still concerns the properties of EN , but a different and more complicated notion of “convexity” is involved.
112
Let us now check that the definitions given have the qualitative properties that we discussed in
the introduction of this chapter.
Intrinsicness. This property is clear from the definition.
Compatibility. To give the answer we need to do some computations on Riemannian manifolds:
Lemma 7.10 (Second derivative of the internal energy) Let M be a compact and smooth Riemannian manifold, m its normalized volume measure, u : [0, +∞) be convex, continuous and C 2 on
(0, +∞) with u(0) = 0 and define the “pressure” p : [0, +∞) → R by
p(z) := zu0 (z) − u(z),
∀z > 0,
and p(0) := 0. Also, let µ = ρm ∈ P2a (M ) with ρ ∈ C ∞ (M ), pick ϕ ∈ Cc∞ (M ), and define
Tt : M → M by Tt (x) := expx (t∇ϕ(x)). Then it holds:
Z
2 2
d2
0
2
2
∇ ϕ − Ric ∇ϕ, ∇ϕ dm,
E
((T
)
µ)
=
p
(ρ)
ρ
(∆ϕ)
−
p(ρ)
(∆ϕ)
−
t
#
dt2 |t=0
2
where by ∇2 ϕ(x)Pwe mean the trace of the linear map (∇2 ϕ(x))2 : Tx M → Tx M (in coordinates, this reads as ij (∂ij ϕ(x))2 ).
Proof
(Computation of the second derivative). Let Dt (x) := det(∇Tt (x)), µt := (Tt )# µ = ρt Vol. By
compactness, for t sufficiently small Tt is invertible with smooth inverse, so that Dt , ρt ∈ C ∞ (M ).
For small t, the change of variable formula gives
ρt (Tt (x)) =
ρ(x)
ρ(x)
=
.
det(∇Tt (x))
Dt (x)
Thus we have (all the integrals being w.r.t. m):
Z
Z Z
Z ρ
ρ
d
d
ρ ρDt0
ρ
0
u(ρt ) =
u
D
+
u
Dt = −u0
D
=
−
p
Dt0 ,
t
t
2
dt
dt
Dt
Dt Dt
Dt
Dt
and
d2
dt2 |t=0
Z
d
u(ρt ) = − |t=0
dt
Z
p
ρ
Dt
Dt0
Z
=
p0 (ρ)ρ(D00 )2 − p(ρ)D000 ,
having used the fact that D0 ≡ 1.
(Evaluation of D00 and D000 ). We want to prove that
D00 (x) = ∆ϕ(x),
2
D000 (x) = (∆ϕ(x))2 − ∇2 ϕ(x) − Ric ∇ϕ(x), ∇ϕ(x) .
(7.7)
For t ≥ 0 and x ∈ M , let Jt (x) be the operator from Tx M to Texpx (t∇ϕ(x)) M given by:
the value at s = t of the Jacobi field js along the geodesic
Jt (x)(v) :=
s 7→ expx (s∇ϕ(x)), having the initial conditions j0 := v, j00 := ∇2 ϕ · v,
(where here and in the following the apex 0 on a vector/tensor field stands for covariant differentiation), so that in particular we have
J0 = Id,
J00 = ∇2 ϕ.
113
(7.8)
The fact that Jacobi fields are the differential of the exponential map reads, in our case, as:
∇Tt (x) · v = Jt (x) · v,
therefore we have
Dt = det(Jt ).
(7.9)
Also, Jacobi fields satisfy the Jacobi equation, which we write as
Jt00 + At Jt = 0,
(7.10)
where At (x) : Texpx (t∇ϕ(x)) M → Texpx (t∇ϕ(x)) M is the map given by
At (x) · v := R(γ˙ t , v)γ˙ t ,
where γt := expx (t∇ϕ(x)). Recalling the rule (detBt )0 = det(Bt )tr(Bt0 Bt−1 ), valid for a smooth
curve of linear operators, we obtain from (7.9) the validity of
Dt0 = Dt tr(Jt0 Jt−1 ).
(7.11)
Evaluating this identity at t = 0 and using (7.8) we get the first of (7.7). Recalling the rule (Bt−1 )0 =
−Bt−1 Bt0 Bt−1 , valid for a smooth curve of linear operators, and differentiating in time equation (7.11)
we obtain
2
2
Dt00 = Dt tr(Jt0 Jt−1 ) +Dt tr(Jt00 Jt−1 −Jt0 Jt−1 Jt0 Jt−1 ) = Dt tr(Jt0 Jt−1 ) −tr At +Jt0 Jt−1 Jt0 Jt−1 ,
having used the Jacobi equation (7.10). Evaluate this expression at t = 0, use (7.8) and observe that
n
o
tr(A0 ) = tr v 7→ R(∇ϕ, v)∇ϕ = Ric(∇ϕ, ∇ϕ),
to get the second of (7.7).
Theorem 7.11 (Compatibility of weak Ricci curvature bounds) Let M be a compact Riemannian
manifold, d its Riemannian distance and m its normalized volume measure. Then:
i) the functional E∞ is K-geodesically convex on (P2 (M ), W2 ) if and only if M has Ricci
curvature uniformly bounded from below by K.
ii) the functional EN is geodesically convex on (P2 (M ), W2 ) if and only if M has non negative
Ricci curvature and dim(M ) ≤ N .
Sketch of the Proof We will give only a formal proof, neglecting all the issues which arise due to the
potential non regularity of the objects involved.
We start with (i). Assume that Ric(v, v) ≥ K|v|2 for any v. Pick a geodesic (ρt m) ⊂ P2 (M )
and assume that ρt ∈ C ∞ for any t ∈ [0, 1]. By Theorem 1.33 we know that there exists a function
ϕ : M → R differentiable ρ0 m-a.e. such that exp(∇ϕ) is the optimal transport map from ρ0 m to
ρ1 m and
ρt m = exp(t∇ϕ) # ρ0 m.
Assume that ϕ is C ∞ . Then by Lemma 7.10 with u := u∞ we know that
Z Z
2 2
d2
E∞ (ρt m) =
∇ ϕ + Ric(∇ϕ, ∇ϕ) ρ0 dm ≥ K |∇ϕ|2 ρ0 dm.
dt2
114
R
Since |∇ϕ|2 ρ0 dm = W22 (ρ0 , ρ1 ), the claim is proved.
The converse implication follows by an explicit construction: if Ric(v, v) < K|v|² for some x ∈ M and v ∈ TxM, then for ε ≪ δ ≪ 1 define µ0 := c0 m|_{Bε(x)} (c0 being the normalizing constant) and µt := (Tt)#µ0, where Tt(y) := expy(tδ∇ϕ(y)) and ϕ ∈ C∞ is such that ∇ϕ(x) = v and ∇²ϕ(x) = 0. Using Lemma 7.10 again and the hypothesis Ric(v, v) < K|v|² it is not hard to prove that E∞ is not K-geodesically convex along (µt). We omit the details.
Now we turn to (ii). Let (ρt m) and ϕ be as in the first part of the argument above. Assume that M has non negative Ricci curvature and that dim(M) ≤ N. Observe that for u := uN Lemma 7.10 gives
\[
\frac{d^2}{dt^2}\Big|_{t=0} E_N(\rho_t) = \int \Big(1-\frac1N\Big)\rho^{1-\frac1N}(\Delta\varphi)^2 - \rho^{1-\frac1N}\Big((\Delta\varphi)^2 - |\nabla^2\varphi|^2 - \mathrm{Ric}(\nabla\varphi,\nabla\varphi)\Big)\,dm.
\]
Using the hypothesis on M and the fact that (Δϕ)² ≤ N|∇²ϕ|² we get d²/dt²|_{t=0} EN(ρt) ≥ 0, i.e. the geodesic convexity of EN. For the converse implication it is possible to argue as above; we omit the details also in this case.
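The pointwise bound (Δϕ)² ≤ N|∇²ϕ|² used above is just the Cauchy-Schwarz inequality applied to the eigenvalues λ1, ..., λn of ∇²ϕ (with n = dim(M) ≤ N):
\[
(\Delta\varphi)^2 = \Big(\sum_{i=1}^n \lambda_i\Big)^2 \leq n\sum_{i=1}^n \lambda_i^2 = n\,|\nabla^2\varphi|^2 \leq N\,|\nabla^2\varphi|^2.
\]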
Now we pass to the stability:
Theorem 7.12 (Stability of weak Ricci curvature bound) Assume that (Xn, dn, mn) → (X, d, m) with respect to the distance D and that for every n ∈ N the space (Xn, dn, mn) is CD(K, ∞) (resp. CD(0, N)). Then (X, d, m) is a CD(K, ∞) (resp. CD(0, N)) space as well.
Sketch of the Proof Pick µ0, µ1 ∈ P2a(X) and assume they are both absolutely continuous with bounded densities, say µi = ρi m, i = 0, 1. Choose (d̃n, γn) ∈ Opt((dn, mn), (d, m)). Define µ^n_i := (γ_n^{-1})#µi ∈ P2a(Xn), i = 0, 1. Then by assumption there is a geodesic (µ^n_t) ⊂ P2a(Xn) such that
\[
E_\infty(\mu^n_t) \leq (1-t)E_\infty(\mu^n_0) + tE_\infty(\mu^n_1) - \frac K2\,t(1-t)\,W_2^2(\mu^n_0, \mu^n_1).
\tag{7.12}
\]
Now let σtn := (γ n )# µnt ∈ P2a (X), t ∈ [0, 1]. From Proposition 7.5 and its proof we know that
W2 (µi , σin ) → 0 as n → ∞, i = 0, 1. Also, from (7.12) and Lemma 7.4, we know that E∞ (σtn ) is
uniformly bounded in n, t. Thus for every fixed t the sequence n 7→ σtn is tight, and we can extract
a subsequence, not relabeled, such that σtn narrowly converges to some σt ∈ P2 (supp(m)) for
every rational t. By an equicontinuity argument it is not hard to see that then σtn narrowly converges
to some σt for any t ∈ [0, 1] (we omit the details). We claim that (σt ) is a geodesic, and that the
K-convexity inequality is satisfied along it. To check that it is a geodesic just notice that for any
partition {ti } of [0, 1] we have
\[
W_2(\mu_0,\mu_1) = \lim_{n\to\infty} W_2(\sigma_0^n,\sigma_1^n) = \lim_{n\to\infty}\sum_i W_2(\sigma_{t_i}^n,\sigma_{t_{i+1}}^n) \geq \sum_i \lim_{n\to\infty} W_2(\sigma_{t_i}^n,\sigma_{t_{i+1}}^n) \geq \sum_i W_2(\sigma_{t_i},\sigma_{t_{i+1}}).
\]
Passing to the limit in (7.12), recalling Proposition 7.5 to get that E∞ (µni ) → E∞ (µi ), i = 0, 1, and
that limn→∞ E∞ (µnt ) ≥ limn→∞ E∞ (σtn ) ≥ E∞ (σt ) we conclude.
To deal with general µ0 , µ1 , we start recalling that the sublevels of E∞ are tight, indeed using
first the bound z log(z) ≥ −1/e and then Jensen’s inequality we get
\[
\frac1e + C \geq \frac{m(X\setminus E)}{e} + E_\infty(\mu) \geq \int_E \rho\log(\rho)\,dm \geq \mu(E)\log\Big(\frac{\mu(E)}{m(E)}\Big),
\]
for any µ = ρm such that E∞ (µ) ≤ C and any Borel E ⊂ X. This bound gives that if m(En ) → 0
then µ(En ) → 0 uniformly on the set of µ’s such that E∞ (µ) ≤ C. This fact together with the
tightness of m gives the claimed tightness of the sublevels of E∞ .
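One way to quantify the uniformity: if m(E) ≤ δ < 1, then either µ(E) ≤ √δ, or µ(E)/m(E) > 1/√δ and the bound above yields
\[
\mu(E) \leq \frac{C + 1/e}{\log(1/\sqrt\delta)} = \frac{2(C + 1/e)}{\log(1/\delta)},
\]
so in both cases µ(E) → 0 as δ → 0, uniformly among the µ's with E∞(µ) ≤ C.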
Now the conclusion follows by a simple truncation argument using the narrow compactness of
the sublevels of E∞ and the lower semicontinuity of E∞ w.r.t. narrow convergence.
For the stability of the CD(0, N ) condition, the argument is the following: we first deal with
the case of µ0 , µ1 with bounded densities with exactly the same ideas used for E∞ . Then to pass to
the general case we use the fact that if (X, d, m) is a CD(0, N ) space, then (supp(m), d, m) is a
doubling space (Proposition 7.15 below; notice that EN′ ≤ N′ and thus it is not true that sublevels of EN′ are tight) and therefore boundedly compact. Then the inequality
\[
R^2\,\mu\big(\mathrm{supp}(m)\setminus B_R(x_0)\big) \leq \int d^2(\cdot, x_0)\,d\mu,
\]
shows that the set of µ's in P2a(X) with bounded second moment is tight. Hence the conclusion follows, as before, using this narrow compactness together with the lower semicontinuity of EN′ w.r.t. narrow convergence.
It remains to discuss the interest of these notions: from now on we discuss some of the geometric and analytic properties of spaces having a weak Ricci curvature bound.
Proposition 7.13 (Restriction and rescaling) Let (X, d, m) be a CD(K, ∞) space (resp.
CD(0, N ) space). Then:
i) Restriction. If Y ⊂ X is a closed totally convex subset (i.e. every geodesic with endpoints
in Y lies entirely inside Y ) such that m(Y ) > 0, then the space (Y, d, m(Y )−1 m|Y ) is a
CD(K, ∞) space (resp. CD(0, N ) space),
ii) Rescaling. For every α > 0 the space (X, αd, m) is a CD(α⁻²K, ∞) space (resp. CD(0, N) space).
Proof
(i). Pick µ0, µ1 ∈ P(Y) ⊂ P(X) and a constant speed geodesic (µt) ⊂ P(X) connecting them such that
\[
E_\infty(\mu_t) \leq (1-t)E_\infty(\mu_0) + tE_\infty(\mu_1) - \frac K2\,t(1-t)\,W_2^2(\mu_0, \mu_1),
\]
(resp. satisfying the convexity inequality for the functional EN′, N′ ≥ N).
We claim that supp(µt ) ⊂ Y for any t ∈ [0, 1]. Recall Theorem 2.10 and pick a measure
µ ∈ P(Geod(X)) such that
µt = (et )# µ,
where et is the evaluation map defined by equation (2.6). Since supp(µ0 ), supp(µ1 ) ⊂ Y we know
that for any geodesic γ ∈ supp(µ) it holds γ0 , γ1 ∈ Y . Since Y is totally convex, this implies that
γt ∈ Y for any t and any γ ∈ supp(µ), i.e. µt = (et )# µ ∈ P(Y ). Therefore (µt ) is a geodesic
connecting µ0 to µ1 in (Y, d). Conclude noticing that for any µ ∈ P2 (Y ) it holds
\[
\int \frac{d\mu}{dm_Y}\log\Big(\frac{d\mu}{dm_Y}\Big)\,dm_Y = \log\big(m(Y)\big) + \int \frac{d\mu}{dm}\log\Big(\frac{d\mu}{dm}\Big)\,dm,
\]
\[
\int \Big(\frac{d\mu}{dm_Y}\Big)^{1-\frac1{N'}}\,dm_Y = m(Y)^{-\frac1{N'}}\int \Big(\frac{d\mu}{dm}\Big)^{1-\frac1{N'}}\,dm,
\]
where we wrote mY for m(Y)⁻¹ m|Y.
(ii). Fix α > 0, let d̃ := αd and let W̃2 be the Wasserstein distance on P(X) induced by the distance d̃. It is clear that a plan γ ∈ Adm(µ, ν) is optimal for the distance W2 if and only if it is optimal for W̃2, thus W̃2 = αW2. Now pick µ0, µ1 ∈ P(X) and let (µt) ⊂ P(X) be a constant speed geodesic connecting them such that
\[
E_\infty(\mu_t) \leq (1-t)E_\infty(\mu_0) + tE_\infty(\mu_1) - \frac K2\,t(1-t)\,W_2^2(\mu_0, \mu_1),
\]
then it holds
\[
E_\infty(\mu_t) \leq (1-t)E_\infty(\mu_0) + tE_\infty(\mu_1) - \frac{K}{2\alpha^2}\,t(1-t)\,\widetilde W_2^2(\mu_0, \mu_1),
\]
and the proof is complete. A similar argument applies for the case CD(0, N).
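The identity W̃2 = αW2 used in the proof is immediate from the definition of the Wasserstein distance: for any µ, ν ∈ P(X),
\[
\widetilde W_2^2(\mu,\nu) = \inf_{\gamma\in \mathrm{Adm}(\mu,\nu)} \int \tilde d^2(x,y)\,d\gamma(x,y) = \alpha^2 \inf_{\gamma\in \mathrm{Adm}(\mu,\nu)} \int d^2(x,y)\,d\gamma(x,y) = \alpha^2\,W_2^2(\mu,\nu),
\]
and in particular the two infima are attained by the same plans.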
For A0, A1 ⊂ X, we define [A0, A1]t ⊂ X as:
\[
[A_0, A_1]_t := \big\{\gamma(t) : \gamma \text{ is a constant speed geodesic such that } \gamma(0)\in A_0,\ \gamma(1)\in A_1\big\}.
\]
Observe that if A0 , A1 are open (resp. compact) [A0 , A1 ]t is open (resp. compact), hence Borel.
Proposition 7.14 (Brunn-Minkowski) Let (X, d, m) be a metric measure space and A0, A1 ⊂ supp(m) compact subsets. Then:
i) if (X, d, m) is a CD(K, ∞) space it holds:
\[
\log\big(m([A_0, A_1]_t)\big) \geq (1-t)\log\big(m(A_0)\big) + t\log\big(m(A_1)\big) + \frac K2\,t(1-t)\,D_K^2(A_0, A_1),
\tag{7.13}
\]
where DK(A0, A1) is defined as sup_{x0∈A0, x1∈A1} d(x0, x1) if K < 0 and as inf_{x0∈A0, x1∈A1} d(x0, x1) if K > 0.
ii) If (X, d, m) is a CD(0, N) space it holds:
\[
m\big([A_0, A_1]_t\big)^{1/N} \geq (1-t)\,m(A_0)^{1/N} + t\,m(A_1)^{1/N}.
\tag{7.14}
\]
Proof We start with (i). Suppose that A0, A1 are open and satisfy m(A0), m(A1) > 0. Define the measures µi := m(Ai)⁻¹ m|Ai for i = 0, 1 and find a constant speed geodesic (µt) ⊂ P(X) such that
\[
E_\infty(\mu_t) \leq (1-t)E_\infty(\mu_0) + tE_\infty(\mu_1) - \frac K2\,t(1-t)\,W_2^2(\mu_0, \mu_1).
\]
Arguing as in the proof of the previous proposition, it is immediate to see that µt is concentrated on [A0, A1]t for any t ∈ [0, 1].
In particular m([A0, A1]t) > 0, otherwise E∞(µt) would be +∞ and the convexity inequality would fail. Now let νt := m([A0, A1]t)⁻¹ m|[A0,A1]t : an application of Jensen's inequality shows that E∞(µt) ≥ E∞(νt), thus we have
\[
E_\infty(\nu_t) \leq (1-t)E_\infty(\mu_0) + tE_\infty(\mu_1) - \frac K2\,t(1-t)\,W_2^2(\mu_0, \mu_1).
\]
Notice that for a general µ of the form m(A)⁻¹ m|A it holds
\[
E_\infty(\mu) = \log\big(m(A)^{-1}\big) = -\log\big(m(A)\big),
\]
and conclude using the trivial inequality
\[
\inf_{\substack{x_0\in A_0\\ x_1\in A_1}} d^2(x_0, x_1) \leq W_2^2(\mu_0,\mu_1) \leq \sup_{\substack{x_0\in A_0\\ x_1\in A_1}} d^2(x_0, x_1).
\]
The case of A0 , A1 compact now follows by a simple approximation argument by considering the
ε-neighborhood Aεi := {x : d(x, Ai ) < ε}, i = 0, 1, noticing that [A0 , A1 ]t = ∩ε>0 [Aε0 , Aε1 ]t , for
any t ∈ [0, 1] and that m(Aεi ) > 0 because Ai ⊂ supp(m), i = 0, 1.
Part (ii) follows along the same lines taking into account that for a general µ of the form
m(A)−1 m|A it holds
EN (µ) = N (1 − m(A)1/N ),
and that, as before, if m(A0 ), m(A1 ) > 0 it cannot be m([A0 , A1 ]t ) = 0 or we would violate the
convexity inequality.
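As an illustration, at least formally (the reference measure there is infinite rather than a probability measure), on the Euclidean space (R^N, |·|, L^N) constant speed geodesics are segments, so [A0, A1]t = (1−t)A0 + tA1, and (7.14) becomes the classical Brunn-Minkowski inequality
\[
\mathcal L^N\big((1-t)A_0 + tA_1\big)^{1/N} \geq (1-t)\,\mathcal L^N(A_0)^{1/N} + t\,\mathcal L^N(A_1)^{1/N}.
\]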
A consequence of Brunn-Minkowski is the Bishop-Gromov inequality.
Proposition 7.15 (Bishop-Gromov) Let (X, d, m) be a CD(0, N) space. Then it holds
\[
\frac{m(B_r(x))}{m(B_R(x))} \geq \Big(\frac rR\Big)^N, \qquad \forall x \in \mathrm{supp}(m),\ 0 < r \leq R.
\tag{7.15}
\]
In particular, (supp(m), d, m) is a doubling space.
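In particular, taking R = 2r in (7.15) gives the doubling bound
\[
m\big(B_{2r}(x)\big) \leq 2^N m\big(B_r(x)\big), \qquad \forall x \in \mathrm{supp}(m),\ r > 0,
\]
which is the form in which the doubling property will be used below (e.g. in the proof of Proposition 7.20).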
Proof Pick x ∈ supp(m) and assume that m({x}) = 0. Let v(r) := m(Br(x)). Fix R > 0 and apply the Brunn-Minkowski inequality to A0 = {x}, A1 = BR(x), observing that [A0, A1]t ⊂ BtR(x), to get
\[
v^{1/N}(tR) \geq m\big([A_0, A_1]_t\big)^{1/N} \geq t\,v^{1/N}(R), \qquad \forall\, 0 \leq t \leq 1.
\]
Now let r := tR and use the arbitrariness of R, t to get the conclusion.
It remains to deal with the case m({x}) 6= 0. We can also assume supp(m) 6= {x}, otherwise
the thesis would be trivial: under this assumption we will prove that m({x}) = 0 for any x ∈ X.
A simple consequence of the geodesic convexity of EN tested with delta measures is that
supp(m) is a geodesically convex set, therefore it is uncountable. Then there must exist some
x0 ∈ supp(m) such that m({x0 }) = 0. Apply the previous argument with x0 in place of x to get that
\[
\frac{v(r)}{v(R)} \geq \Big(\frac rR\Big)^N, \qquad \forall\, 0 \leq r < R,
\tag{7.16}
\]
where now v(r) is the volume of the closed ball of radius r around x0 . By definition, v is right
continuous; letting r ↑ R we obtain from (7.16) that v is also left continuous. Thus it is continuous,
and in particular the volume of the spheres {y : d(y, x0 ) = r} is 0 for any r ≥ 0. In particular
m({y}) = 0 for any y ∈ X and the proof is concluded.
An interesting geometric consequence of the Brunn-Minkowski inequality in conjunction with the
non branching hypothesis is the fact that the ‘cut-locus’ is negligible.
Proposition 7.16 (Negligible cut-locus) Assume that (X, d, m) is a CD(0, N ) space and that it is
non branching. Then for every x ∈ supp(m) the set of y’s such that there is more than one geodesic
from x to y is m-negligible. In particular, for m × m-a.e. (x, y) there exists only one geodesic γ x,y
from x to y and the map X 2 3 (x, y) 7→ γ x,y ∈ Geod(X) is measurable.
Proof Fix x ∈ supp(m), R > 0 and consider the sets At := [{x}, BR (x)]t . Fix t < 1 and y ∈ At .
We claim that there is only one geodesic connecting it to x. By definition, we know that there is some
z ∈ BR (x) and a geodesic γ from z to x such that γt = y. Now argue by contradiction and assume
that there are 2 geodesics γ 1 , γ 2 from y to x. Then starting from z, following γ for time 1 − t, and
then following each of γ 1 , γ 2 for the rest of the time we find 2 different geodesics from z to x which
agree on the non trivial interval [0, 1 − t]. This contradicts the non-branching hypothesis.
Clearly At ⊂ As ⊂ BR (x) for t ≤ s, thus t 7→ m(At ) is non decreasing. By (7.14) and the
fact that m({x}) = 0 (proved in Proposition 7.15) we know that limt→1 m(At ) = m(BR (x))
which means that m-a.e. point in BR (x) is connected to x by a unique geodesic. Since R and x are
arbitrary, uniqueness is proved.
The measurability of the map (x, y) 7→ γ x,y is then a consequence of uniqueness, of Lemma 2.11
and classical measurable selection results, which ensure the existence of a measurable selection of
geodesics: in our case there is m × m-almost surely no choice, so the unique geodesic selection is
measurable.
Corollary 7.17 (Compactness) Let N, D < ∞. Then the family X (N, D) of (isomorphism classes
of) metric measure spaces (X, d, m) satisfying the condition CD(0, N ), with diameter bounded
above by D is compact w.r.t. the topology induced by D.
Sketch of the Proof Using the Bishop-Gromov inequality with R = D we get that
\[
m\big(B_\varepsilon(x)\big) \geq \Big(\frac\varepsilon D\Big)^N, \qquad \forall (X, d, m) \in X(N, D),\ x \in \mathrm{supp}(m_X).
\tag{7.17}
\]
Thus there exists n(N, D, ε) which does not depend on X ∈ X (N, D), such that we can find at most
n(N, D, ε) disjoint balls of radius ε in X. Thus supp(mX ) can be covered by at most n(N, D, ε)
balls of radius 2ε. This means that the family X (N, D) is uniformly totally bounded, and thus it is
compact w.r.t. Gromov-Hausdorff convergence (see e.g. Theorem 7.4.5 of [20]).
Pick a sequence (Xn, dn, mn) ∈ X(N, D). By what we just proved, up to passing to a subsequence,
not relabeled, we may assume that (supp(mn ), dn ) converges in the Gromov-Hausdorff topology to
some space (X, d). It is well known that in this situation there exists a compact space (Y, dY ) and a
family of isometric embeddings fn : supp(mn ) → Y , f : X → Y , such that the Hausdorff distance
between fn (supp(mn )) and f (X) goes to 0 as n → ∞.
The space (fn(supp(mn)), dY, (fn)# mn) is isomorphic to (Xn, dn, mn) by construction for
every n ∈ N, and (f (X), dY ) is isometric to (X, d), so we identify these spaces with the respective
subspaces of (Y, dY ). Since (Y, dY ) is compact, the sequence (mn ) admits a subsequence, not
relabeled, which weakly converges to some m ∈ P(Y ). It is immediate to verify that actually
m ∈ P(X). Also, again by compactness, weak convergence is equivalent to convergence w.r.t. W2 ,
which means that there exist plans γn ∈ P(Y²) admissible for the couple (m, mn) such that
\[
\int d_Y^2(x, \tilde x)\,d\gamma_n(x, \tilde x) \to 0.
\]
Therefore n 7→ (dY , γ n ) is a sequence of admissible couplings for (X, d, m) and (Xn , dn , mn )
whose cost tends to zero. This concludes the proof.
Now we prove the HWI (which relates the entropy, often denoted by H, the Wasserstein distance
W2 and the Fisher information I) and the log-Sobolev inequalities. To this aim, we introduce the
Fisher information functional I : P(X) → [0, ∞] on a general metric measure space (X, d, m) as
the squared slope of the entropy E∞ :

\[
I(\mu) :=
\begin{cases}
\displaystyle \limsup_{\nu \to \mu} \frac{\big((E_\infty(\mu) - E_\infty(\nu))^+\big)^2}{W_2^2(\mu, \nu)}, & \text{if } E_\infty(\mu) < \infty,\\[1ex]
+\infty, & \text{otherwise.}
\end{cases}
\]
The functional I is called Fisher information because its value on (Rd , | · − · |, Ld ) is given by
\[
I(\rho\,\mathcal L^d) = \int \frac{|\nabla\rho|^2}{\rho}\,d\mathcal L^d,
\]
and the object on the right hand side is called Fisher information on Rd . It is possible to prove that a
formula like the above one is writable and true on general CD(K, ∞) spaces (see [7]), but we won’t
discuss this topic.
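For instance, for the standard Gaussian density ρ(x) = (2π)^{−d/2} e^{−|x|²/2} one has ∇ρ(x) = −xρ(x), so that the formula above gives
\[
I(\rho\,\mathcal L^d) = \int \frac{|x|^2\rho^2(x)}{\rho(x)}\,dx = \int |x|^2\rho(x)\,dx = d.
\]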
Proposition 7.18 (HWI inequality) Let (X, d, m) be a metric measure space satisfying the condition CD(K, ∞). Then
\[
E_\infty(\mu) \leq E_\infty(\nu) + W_2(\mu, \nu)\sqrt{I(\mu)} - \frac K2\,W_2^2(\mu, \nu), \qquad \forall \mu, \nu \in \mathcal P(X).
\tag{7.18}
\]
In particular, choosing ν = m it holds
\[
E_\infty(\mu) \leq W_2(\mu, m)\sqrt{I(\mu)} - \frac K2\,W_2^2(\mu, m), \qquad \forall \mu \in \mathcal P(X).
\tag{7.19}
\]
Finally, if K > 0 the log-Sobolev inequality with constant K holds:
\[
E_\infty \leq \frac{I}{2K}.
\tag{7.20}
\]
Proof Clearly to prove (7.18) it is sufficient to deal with the case E∞(ν), E∞(µ) < ∞. Let (µt) be a constant speed geodesic from µ to ν such that
\[
E_\infty(\mu_t) \leq (1-t)E_\infty(\mu) + tE_\infty(\nu) - \frac K2\,t(1-t)\,W_2^2(\mu, \nu).
\]
Then from √I(µ) ≥ lim_{t↓0} (E∞(µ) − E∞(µt))/W2(µ, µt) we get the thesis.
Equation (7.20) now follows from (7.19) and the trivial inequality
\[
ab - \frac{a^2}{2} \leq \frac{b^2}{2},
\]
valid for any a, b ≥ 0.
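Explicitly, applying this with a := √K W2(µ, m) and b := √(I(µ)/K), inequality (7.19) gives
\[
E_\infty(\mu) \leq W_2(\mu, m)\sqrt{I(\mu)} - \frac K2\,W_2^2(\mu, m) = ab - \frac{a^2}{2} \leq \frac{b^2}{2} = \frac{I(\mu)}{2K},
\]
which is (7.20).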
The log-Sobolev inequality is a notion of global Sobolev-type inequality, and it is known that it
implies a global Poincaré inequality (we omit the proof of this fact). When working on metric
measure spaces, however, it is often important to have at disposal a local Poincaré inequality (see
e.g. the analysis done by Cheeger in [29]).
Our final goal is to show that in non-branching CD(0, N ) spaces a local Poincaré inequality
holds. The importance of the non-branching assumption is due to the following lemma.
Lemma 7.19 Let (X, d, m) be a non branching CD(0, N) space, B ⊂ X a closed ball of positive measure and 2B the closed ball with same center and double radius. Define the measures µ := m(B)⁻¹ m|B and µ := γ^{·,·}_#(µ × µ) ∈ P(Geod(X)), where (x, y) ↦ γ^{x,y} is the map which associates to each x, y the unique geodesic connecting them (such a map is well defined for m × m-a.e. x, y by Proposition 7.16). Then
\[
(e_t)_\#\mu \leq \frac{2^N}{m(B)}\,m|_{2B}, \qquad \forall t \in [0, 1].
\]
Proof Fix x ∈ B, t ∈ (0, 1) and consider the ‘homothety’ map B ∋ y ↦ Hom_t^x(y) := γ_t^{x,y}. By Proposition 7.16 we know that this map is well defined for m-a.e. y and that (using the characterization of geodesics given in Theorem 2.10) t ↦ µ_t^x := (Hom_t^x)#µ is the unique geodesic connecting δx to µ. We have
\[
\mu_t^x(E) = \mu\big((\mathrm{Hom}_t^x)^{-1}(E)\big) = \frac{m\big((\mathrm{Hom}_t^x)^{-1}(E)\big)}{m(B)}, \qquad \forall E \subset X \text{ Borel}.
\]
The non branching assumption ensures that Hom_t^x is invertible, therefore from the fact that [{x}, (Hom_t^x)⁻¹(E)]_t = Hom_t^x((Hom_t^x)⁻¹(E)) = E, the Brunn-Minkowski inequality and the fact that m({x}) = 0 we get
\[
m(E) \geq t^N\,m\big((\mathrm{Hom}_t^x)^{-1}(E)\big),
\]
and therefore µ_t^x(E) ≤ m(E)/(t^N m(B)). Given that E was arbitrary, we deduce
\[
\mu_t^x \leq \frac{m}{t^N\,m(B)}.
\tag{7.21}
\]
Notice that the expression on the right hand side is independent of x.
Now pick µ as in the hypothesis, and define µt := (et )# µ. The equalities
\[
\int_X \varphi\,d\mu_t = \int_{\mathrm{Geod}(X)} \varphi(\gamma_t)\,d\mu(\gamma) = \int_{X^2} \varphi(\gamma_t^{x,y})\,d\mu(x)\,d\mu(y),
\]
\[
\int_X \varphi\,d\mu_t^x = \int_X \varphi(\gamma_t^{x,y})\,d\mu(y),
\]
valid for any ϕ ∈ Cb(X), show that
\[
\mu_t = \int \mu_t^x\,d\mu(x),
\]
and therefore, by (7.21), we have
\[
\mu_t \leq \frac{m}{t^N\,m(B)}.
\]
All these arguments can be repeated symmetrically with 1 − t in place of t (because the push forward of µ via the map which takes γ and gives the geodesic t ↦ γ_{1−t}, is µ itself), thus we obtain
\[
\mu_t \leq \min\Big\{\frac{m}{t^N m(B)},\ \frac{m}{(1-t)^N m(B)}\Big\} \leq \frac{2^N m}{m(B)}, \qquad \forall t \in (0, 1).
\]
To conclude, it is sufficient to prove that µt is concentrated on 2B for all t ∈ (0, 1). But this is
obvious, as µt is concentrated on [B, B]t and a geodesic whose endpoints lie on B cannot leave 2B.
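In the Euclidean model case (say, formally, with m the Lebesgue measure on R^N) the map Hom_t^x is the homothety y ↦ x + t(y − x), which scales volumes by the factor t^N; hence
\[
\mu_t^x = \frac{1}{t^N m(B)}\,m\big|_{\mathrm{Hom}_t^x(B)},
\]
so the bound (7.21) cannot be improved.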
As we said, we will use this lemma (together with the doubling property, which is a consequence
of the Bishop-Gromov inequality) to prove a local Poincaré inequality. For simplicity, we stick
to the case of Lipschitz functions and their local Lipschitz constant, although everything could be
equivalently stated in terms of generic Borel functions and their upper gradients.
For f : X → R Lipschitz, the local Lipschitz constant |∇f| : X → R is defined as
\[
|\nabla f|(x) := \limsup_{y \to x} \frac{|f(x) - f(y)|}{d(x, y)}.
\]
For any ball B such that m(B) > 0, the number ⟨f⟩B is the average value of f on B:
\[
\langle f\rangle_B := \frac{1}{m(B)}\int_B f\,dm.
\]
Proposition 7.20 (Local Poincaré inequality) Assume that (X, d, m) is a non-branching
CD(0, N ) space. Then for every ball B such that m(B) > 0 and any Lipschitz function f : X → R
it holds
\[
\frac{1}{m(B)}\int_B \big|f(x) - \langle f\rangle_B\big|\,dm(x) \leq r\,\frac{2^{2N+1}}{m(2B)}\int_{2B} |\nabla f|\,dm,
\]
r being the radius of B.
Proof Notice that
\[
\frac{1}{m(B)}\int_B \big|f(x) - \langle f\rangle_B\big|\,dm(x) \leq \frac{1}{m(B)^2}\int_{B\times B} |f(x) - f(y)|\,dm(x)\,dm(y)
= \int_{\mathrm{Geod}(X)} |f(\gamma_0) - f(\gamma_1)|\,d\mu(\gamma),
\]
where µ is defined as in the statement of Lemma 7.19. Observe that for any geodesic γ, the map
t 7→ f (γt ) is Lipschitz and its derivative is bounded above by d(γ0 , γ1 )|∇f |(γt ) for a.e. t. Hence,
since any geodesic γ whose endpoints are in B satisfies d(γ0 , γ1 ) ≤ 2r, we have
\[
\int_{\mathrm{Geod}(X)} |f(\gamma_0) - f(\gamma_1)|\,d\mu(\gamma) \leq 2r \int_{\mathrm{Geod}(X)}\int_0^1 |\nabla f|(\gamma_t)\,dt\,d\mu(\gamma) = 2r\int_0^1\int_X |\nabla f|\,d(e_t)_\#\mu\,dt.
\]
By Lemma 7.19 we obtain
\[
2r\int_0^1\int_X |\nabla f|\,d(e_t)_\#\mu\,dt \leq \frac{2^{N+1} r}{m(B)}\int_{2B} |\nabla f|\,dm.
\]
By the Bishop-Gromov inequality we know that m(2B) ≤ 2^N m(B) and thus
\[
\frac{2^{N+1} r}{m(B)}\int_{2B} |\nabla f|\,dm \leq \frac{2^{2N+1} r}{m(2B)}\int_{2B} |\nabla f|\,dm,
\]
which is the conclusion.
7.3 Bibliographical notes
The content of this chapter is taken from the works of Lott and Villani on one side ([58], [57]) and of
Sturm ([74], [75]) on the other.
The first link between K-geodesic convexity of the relative entropy functional in (P2 (M ), W2 )
and the bound from below on the Ricci curvature has been given by Sturm and von Renesse
in [76]. The works [74], [75] and [58] have been developed independently. The main difference
between them is that Sturm provides the general definition of CD(K, N ) bound (which we didn’t
speak about, with the exception of the quick citation in Remark 7.9), while Lott and Villani focused
on the cases CD(K, ∞) and CD(0, N ). Apart from this, the works are strictly related and the
differences are mostly on the technical side. We mention only one of these. In giving the definition of CD(0, N) space we followed Sturm and asked only the functionals ρm ↦ N′∫(ρ − ρ^{1−1/N′})dm, N′ ≥ N, to be geodesically convex. Lott and Villani asked for something more restrictive, namely
they introduced the displacement convexity classes DCN as the set of functions u : [0, ∞) → R continuous, convex and such that
\[
z \mapsto z^N u(z^{-N})
\]
is convex. Notice that u(z) := N′(z − z^{1−1/N′}) belongs to DCN. Then they say that a space is CD(0, N) provided
\[
\rho m \mapsto \int u(\rho)\,dm,
\]
(with the usual modifications for a measure which is not absolutely continuous) is geodesically convex for any u ∈ DCN . This notion is still compatible with the Riemannian case and stable under convergence. The main advantage one has in working with this definition is the fact that for
a CD(0, N ) space in this sense, for any couple of absolutely continuous measures there exists a
geodesic connecting them which is made of absolutely continuous measures.
The distance D that we used to define the notion of convergence of metric measure spaces has
been defined and studied by Sturm in [74]. This is not the only possible notion of convergence
of metric measure spaces: Lott and Villani used a different one, see [58] or Chapter 27 of [80].
A good property of the distance D is that it is pleasantly reminiscent of the Wasserstein distance W2: to
some extent, the relation of D to W2 is the same relation that there is between Gromov-Hausdorff
distance and Hausdorff distance between compact subsets of a given metric space. A bad property
is that it is not suitable to study convergence of metric measure spaces which are endowed with
infinite reference measures (well, the definition can easily be adapted, but it would lead to a too
strict notion of convergence - very much like the Gromov-Hausdorff distance, which is not used to
discuss convergence of non compact metric spaces). The only notion of convergence of Polish spaces
endowed with σ-finite measures that we are aware of, is the one discussed by Villani in Chapter 27 of
[80] (Definition 27.30). It is interesting to remark that this notion of convergence does not guarantee
uniqueness of the limit (which can be thought of as a negative point of the theory), yet bounds from below on the Ricci curvature are stable w.r.t. such convergence (which in turn is a positive point, as it tells that these bounds are ‘even more stable’).
The discussion on the local Poincaré inequality and on Lemma 7.19 is extracted from [57].
There is much more to say about the structure and the properties of spaces with Ricci curvature
bounded below. This is an extremely fast evolving research area, and to give a complete discussion on
the topic one would probably need a book nowadays. Two things are worth mentioning quickly.
The first one is the most important open problem on the subject: is the property of being a
CD(K, N ) space a local notion? That is, suppose we have a metric measure space (X, d, m) and a
finite open cover {Ωi} such that (Ωi, d, m(Ωi)⁻¹ m|Ωi) is a CD(K, N) space for every i. Can we deduce that (X, d, m) is a CD(K, N) space as well? One would like the answer to be affirmative,
as any notion of curvature should be local. For K = 0 or N = ∞, this is actually the case, at least
under some technical assumptions. The general case is still open, and up to now we only know that
the conjecture 30.34 in [80] is false, being disproved by Deng and Sturm in [32] (see also [11]).
The second, and final, thing we want to mention is the case of Finsler manifolds, which are
differentiable manifolds endowed with a norm - possibly not coming from an inner product - on each
tangent space, which varies smoothly with the base point. A simple example of Finsler manifolds
is the space (Rd, ‖·‖), where ‖·‖ is any norm. It turns out that for any choice of the norm, the space (Rd, ‖·‖, Ld) is a CD(0, N) space. Various experts have different opinions about this fact: namely, there is no agreement in the community concerning whether one really wants Finsler geometries to be included in the class of spaces with Ricci curvature bounded below. In any case,
it is interesting to know whether there exists a different, more restrictive, notion of Ricci curvature
bound which rules out the Finsler case. Progress in this direction has been made in [8], where the
notion of spaces with Riemannian Ricci bounded below is introduced: shortly said, these spaces are
the subclass of CD(K, N ) spaces where the heat flow (studied in [45], [53], [7]) is linear.
References
[1] A. AGRACHEV AND P. L EE, Optimal transportation under nonholonomic constraints, Trans.
Amer. Math. Soc., 361 (2009), pp. 6019–6047.
[2] G. A LBERTI, On the structure of singular sets of convex functions, Calc.Var. and Part.Diff.Eq.,
2 (1994), pp. 17–27.
[3] G. A LBERTI AND L. A MBROSIO, A geometrical approach to monotone functions in Rn , Math.
Z., 230 (1999), pp. 259–316.
[4] L. A MBROSIO, Lecture notes on optimal transport problem, in Mathematical aspects of evolving interfaces, CIME summer school in Madeira (Pt), P. Colli and J. Rodrigues, eds., vol. 1812,
Springer, 2003, pp. 1–52.
[5] L. A MBROSIO AND N. G IGLI, Construction of the parallel transport in the Wasserstein space,
Methods Appl. Anal., 15 (2008), pp. 1–29.
[6] L. A MBROSIO , N. G IGLI , AND G. S AVARÉ, Gradient flows in metric spaces and in the space
of probability measures, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, second ed., 2008.
[7] L. Ambrosio, N. Gigli, and G. Savaré, Calculus and heat flows in metric measure spaces with Ricci curvature bounded below, preprint, (2011).
[8] L. Ambrosio, N. Gigli, and G. Savaré, Spaces with Riemannian Ricci curvature bounded below, preprint, (2011).
[9] L. A MBROSIO , B. K IRCHHEIM , AND A. P RATELLI, Existence of optimal transport maps for
crystalline norms, Duke Mathematical Journal, 125 (2004), pp. 207–241.
[10] L. A MBROSIO AND S. R IGOT, Optimal mass transportation in the Heisenberg group, J. Funct.
Anal., 208 (2004), pp. 261–301.
[11] K. BACHER AND K. T. S TURM, Localization and tensorization properties of the curvature-dimension condition for metric measure spaces, J. Funct. Anal., 259 (2010), pp. 28–56.
[12] J.-D. B ENAMOU AND Y. B RENIER, A numerical method for the optimal time-continuous mass
transport problem and related problems, in Monge Ampère equation: applications to geometry
and optimization (Deerfield Beach, FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc.,
Providence, RI, 1999, pp. 1–11.
[13] P. B ERNARD AND B. B UFFONI, Optimal mass transportation and Mather theory, J. Eur. Math.
Soc. (JEMS), 9 (2007), pp. 85–127.
[14] M. B ERNOT, V. C ASELLES , AND J.-M. M OREL, The structure of branched transportation
networks, Calc. Var. Partial Differential Equations, 32 (2008), pp. 279–317.
[15] S. B IANCHINI AND A. B RANCOLINI, Estimates on path functionals over Wasserstein spaces,
SIAM J. Math. Anal., 42 (2010), pp. 1179–1217.
[16] A. B RANCOLINI , G. B UTTAZZO , AND F. S ANTAMBROGIO, Path functionals over Wasserstein
spaces, J. Eur. Math. Soc. (JEMS), 8 (2006), pp. 415–434.
[17] L. B RASCO , G. B UTTAZZO , AND F. S ANTAMBROGIO, A benamou-brenier approach to
branched transport, Accepted paper at SIAM J. of Math. Anal., (2010).
[18] Y. B RENIER, Décomposition polaire et réarrangement monotone des champs de vecteurs, C.
R. Acad. Sci. Paris Sér. I Math., 305 (1987), pp. 805–808.
[19]
, Polar factorization and monotone rearrangement of vector-valued functions, Comm.
Pure Appl. Math., 44 (1991), pp. 375–417.
[20] D. B URAGO , Y. B URAGO , AND S. I VANOV, A course in metric geometry, vol. 33 of Graduate
Studies in Mathematics, American Mathematical Society, Providence, RI, 2001.
[21] L. A. C AFFARELLI, Boundary regularity of maps with convex potentials, Comm. Pure Appl.
Math., 45 (1992), pp. 1141–1151.
[22]
, The regularity of mappings with a convex potential, J. Amer. Math. Soc., 5 (1992),
pp. 99–104.
[23]
, Boundary regularity of maps with convex potentials. II, Ann. of Math. (2), 144 (1996),
pp. 453–496.
[24] L. A. C AFFARELLI , M. F ELDMAN , AND R. J. M C C ANN, Constructing optimal maps for
Monge’s transport problem as a limit of strictly convex costs, J. Amer. Math. Soc., 15 (2002),
pp. 1–26 (electronic).
[25] L. C ARAVENNA, A proof of sudakov theorem with strictly convex norms, Math. Z., to appear.
[26] J. A. C ARRILLO , S. L ISINI , G. S AVARÉ , AND D. S LEPCEV, Nonlinear mobility continuity
equations and generalized displacement convexity, J. Funct. Anal., 258 (2010), pp. 1273–1309.
[27] T. C HAMPION AND L. D E PASCALE, The Monge problem in Rd , Duke Math. J.
[28]
, The Monge problem for strictly convex norms in Rd , J. Eur. Math. Soc. (JEMS), 12
(2010), pp. 1355–1369.
[29] J. C HEEGER, Differentiability of Lipschitz functions on metric measure spaces, Geom. Funct.
Anal., 9 (1999), pp. 428–517.
[30] D. C ORDERO -E RAUSQUIN , B. NAZARET, AND C. V ILLANI, A mass-transportation approach
to sharp Sobolev and Gagliardo-Nirenberg inequalities, Adv. Math., 182 (2004), pp. 307–332.
[31] C. D ELLACHERIE AND P.-A. M EYER, Probabilities and potential, vol. 29 of North-Holland
Mathematics Studies, North-Holland Publishing Co., Amsterdam, 1978.
[32] Q. D ENG AND K. T. S TURM, Localization and tensorization properties of the curvature-dimension condition for metric measure spaces II, Submitted, (2010).
[33] J. D OLBEAULT, B. NAZARET, AND G. S AVARÉ, On the Bakry-Emery criterion for linear
diffusions and weighted porous media equations, Comm. Math. Sci, 6 (2008), pp. 477–494.
[34] L. C. E VANS AND W. G ANGBO, Differential equations methods for the Monge-Kantorovich
mass transfer problem, Mem. Amer. Math. Soc., 137 (1999), pp. viii+66.
[35] A. FATHI AND A. F IGALLI, Optimal transportation on non-compact manifolds, Israel J. Math.,
175 (2010), pp. 1–59.
[36] D. F EYEL AND A. S. Ü STÜNEL, Monge-Kantorovitch measure transportation and MongeAmpère equation on Wiener space, Probab. Theory Related Fields, 128 (2004), pp. 347–385.
[37] A. F IGALLI AND N. G IGLI, A new transportation distance between non-negative measures,
with applications to gradients flows with Dirichlet boundary conditions, J. Math. Pures Appl.
(9), 94 (2010), pp. 107–130.
[38] A. F IGALLI , F. M AGGI , AND A. P RATELLI, A mass transportation approach to quantitative
isoperimetric inequalities, Invent. Math., 182 (2010), pp. 167–211.
[39] A. F IGALLI AND L. R IFFORD, Mass transportation on sub-Riemannian manifolds, Geom.
Funct. Anal., 20 (2010), pp. 124–159.
[40] N. F USCO , F. M AGGI , AND A. P RATELLI, The sharp quantitative isoperimetric inequality,
Ann. of Math. (2), 168 (2008), pp. 941–980.
[41] W. G ANGBO, The Monge mass transfer problem and its applications, in Monge Ampère equation: applications to geometry and optimization (Deerfield Beach, FL, 1997), vol. 226 of Contemp. Math., Amer. Math. Soc., Providence, RI, 1999, pp. 79–104.
[42] W. G ANGBO AND R. J. M C C ANN, The geometry of optimal transportation, Acta Math., 177
(1996), pp. 113–161.
[43] N. G IGLI, On the geometry of the space of probability measures in Rn endowed with the
quadratic optimal transport distance, 2008. Thesis (Ph.D.)–Scuola Normale Superiore.
[44]
, Second order calculus on (P2 (M ), W2 ), Accepted by Memoirs of the AMS, 2009.
[45]
, On the heat flow on metric measure spaces: existence, uniqueness and stability, Calc.
Var. Partial Differential Equations, (2010).
[46]
, On the inverse implication of Brenier-McCann theorems and the structure of
P2 (M ), W2 ), accepted paper Meth. Appl. Anal., (2011).
[47] R. J ORDAN , D. K INDERLEHRER , AND F. OTTO, The variational formulation of the FokkerPlanck equation, SIAM J. Math. Anal., 29 (1998), pp. 1–17 (electronic).
[48] N. J UILLET, On displacement interpolation of measures involved in brenier’s theorem, accepted paper Proc. of the AMS, (2011).
[49] L. V. K ANTOROVICH, On an effective method of solving certain classes of extremal problems,
Dokl. Akad. Nauk. USSR, 28 (1940), pp. 212–215.
[50]
, On the translocation of masses, Dokl. Akad. Nauk. USSR, 37 (1942), pp. 199–201.
English translation in J. Math. Sci. 133, 4 (2006), 1381–1382.
[51] L. V. K ANTOROVICH AND G. S. RUBINSHTEIN, On a space of totally additive functions,
Vestn. Leningrad. Univ. 13, 7 (1958), pp. 52–59.
[52] M. K NOTT AND C. S. S MITH, On the optimal mapping of distributions, J. Optim. Theory
Appl., 43 (1984), pp. 39–49.
[53] K. K UWADA , N. G IGLI , AND S.-I. O HTA, Heat flow on alexandrov spaces, preprint, (2010).
[54] S. L ISINI, Characterization of absolutely continuous curves in Wasserstein spaces, Calc. Var.
Partial Differential Equations, 28 (2007), pp. 85–120.
[55] G. L OEPER, On the regularity of solutions of optimal transportation problems, Acta Math., 202
(2009), pp. 241–283.
[56] J. L OTT, Some geometric calculations on Wasserstein space, Comm. Math. Phys., 277 (2008),
pp. 423–437.
[57] J. L OTT AND C. V ILLANI, Weak curvature conditions and functional inequalities, J. Funct.
Anal., (2007), pp. 311–333.
[58] J. L OTT AND C. V ILLANI, Ricci curvature for metric-measure spaces via optimal transport,
Ann. of Math. (2), 169 (2009), pp. 903–991.
[59] X.-N. M A , N. S. T RUDINGER , AND X.-J. WANG, Regularity of potential functions of the
optimal transportation problem, Arch. Ration. Mech. Anal., 177 (2005), pp. 151–183.
[60] F. M ADDALENA AND S. S OLIMINI, Transport distances and irrigation models, J. Convex
Anal., 16 (2009), pp. 121–152.
[61] F. M ADDALENA , S. S OLIMINI , AND J.-M. M OREL, A variational model of irrigation patterns, Interfaces Free Bound., 5 (2003), pp. 391–415.
[62] R. J. M CCANN, A convexity theory for interacting gases and equilibrium crystals, ProQuest
LLC, Ann Arbor, MI, 1994. Thesis (Ph.D.)–Princeton University.
[63] R. J. M C C ANN, A convexity principle for interacting gases, Adv. Math., 128 (1997), pp. 153–
179.
[64]
, Polar factorization of maps on riemannian manifolds, Geometric and Functional Analysis, 11 (2001), pp. 589–608.
[65] V. D. M ILMAN AND G. S CHECHTMAN, Asymptotic theory of finite-dimensional normed
spaces, vol. 1200 of Lecture Notes in Mathematics, Springer-Verlag, Berlin, 1986. With an
appendix by M. Gromov.
[66] G. M ONGE, Mémoire sur la théorie des déblais et des remblais, Histoire de l'Académie Royale des Sciences de Paris, (1781), pp. 666–704.
[67] F. OTTO, The geometry of dissipative evolution equations: the porous medium equation,
Comm. Partial Differential Equations, 26 (2001), pp. 101–174.
[68] A. P RATELLI, On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation, Annales de l’Institut Henri Poincare (B) Probability and Statistics, 43
(2007), pp. 1–13.
[69] S. T. R ACHEV AND L. R ÜSCHENDORF, Mass transportation problems. Vol. I, Probability and
its Applications, Springer-Verlag, New York, 1998. Theory.
[70] R. T. ROCKAFELLAR, Convex Analysis, Princeton University Press, Princeton, 1970.
[71] L. R ÜSCHENDORF AND S. T. R ACHEV, A characterization of random variables with minimum
L2 -distance, J. Multivariate Anal., 32 (1990), pp. 48–54.
[72] G. S AVARÉ, Gradient flows and diffusion semigroups in metric spaces under lower curvature
bounds, C. R. Math. Acad. Sci. Paris, 345 (2007), pp. 151–154.
[73] G. S AVARÉ, Gradient flows and evolution variational inequalities in metric spaces, In preparation, (2010).
[74] K.-T. S TURM, On the geometry of metric measure spaces. I, Acta Math., 196 (2006), pp. 65–
131.
[75]
, On the geometry of metric measure spaces. II, Acta Math., 196 (2006), pp. 133–177.
[76] K.-T. S TURM AND M.-K. VON R ENESSE, Transport inequalities, gradient estimates, entropy,
and Ricci curvature, Comm. Pure Appl. Math., 58 (2005), pp. 923–940.
[77] V. N. S UDAKOV, Geometric problems in the theory of infinite-dimensional probability distributions, Proc. Steklov Inst. Math., (1979), pp. i–v, 1–178. Cover to cover translation of Trudy
Mat. Inst. Steklov 141 (1976).
[78] N. S. T RUDINGER AND X.-J. WANG, On the Monge mass transfer problem, Calc. Var. Partial
Differential Equations, 13 (2001), pp. 19–31.
[79] C. V ILLANI, Topics in optimal transportation, vol. 58 of Graduate Studies in Mathematics,
American Mathematical Society, Providence, RI, 2003.
[80] C. Villani, Optimal transport, old and new, Springer Verlag, 2008.
[81] Q. X IA, Optimal paths related to transport problems, Commun. Contemp. Math., 5 (2003),
pp. 251–279.
[82]
, Interior regularity of optimal transport paths, Calc. Var. Partial Differential Equations,
20 (2004), pp. 283–299.
[83] L. Zajíček, On the differentiability of convex functions in finite and infinite dimensional
spaces, Czechoslovak Math. J., 29 (1979), pp. 340–348.