
Notes on Markov diffusions

Joe Neeman

November 3, 2016


Contents

1 Markov diffusions
  1.1 Some measure-theoretic preliminaries
  1.2 Markov operators and semigroups
  1.3 The kernel representation
    1.3.1 Density kernels
  1.4 Some examples
  1.5 Generators
  1.6 The carre du champ
    1.6.1 The examples again
  1.7 Self-adjoint operators
    1.7.1 Generators are self-adjoint
    1.7.2 The Friedrichs extension
  1.8 Spectral theory of self-adjoint operators
    1.8.1 The semigroup generated by a self-adjoint operator
  1.9 Compact Markov diffusions
    1.9.1 Dirichlet forms
    1.9.2 Compact Markov diffusion triple
  1.10 Properties of diffusion semigroups
    1.10.1 The Markov property
    1.10.2 Ergodicity
    1.10.3 Examples
  1.11 Γ2, curvature, and dimension
    1.11.1 Examples

2 Poincare inequalities
  2.1 Poincare inequalities under CD(ρ, n)
  2.2 Spectral gap
  2.3 Decay of variance
  2.4 Examples
  2.5 Exponential tails
  2.6 The exponential distribution
  2.7 Log-concave measures

3 Logarithmic Sobolev inequalities
  3.1 The strong gradient bound
  3.2 The logarithmic Sobolev inequality
  3.3 Log-Sobolev inequalities under CD(ρ, n)
    3.3.1 The finite-dimensional improvement
  3.4 Examples
  3.5 Decay of entropy
  3.6 Exponential integrability and Gaussian tails
  3.7 Hypercontractivity

4 Isoperimetric inequalities
  4.1 Surface area
    4.1.1 Minkowski content
  4.2 Isoperimetric inequalities
  4.3 Isoperimetry and expansion
    4.3.1 The coarea formula
  4.4 Gaussian-type isoperimetry under CD(ρ,∞)
  4.5 Bobkov's inequality
    4.5.1 Equality cases for Bobkov's inequality
    4.5.2 From Bobkov to log-Sobolev
    4.5.3 Proof of Bobkov's inequality
  4.6 Exponential-type isoperimetry
    4.6.1 Exponential-type isoperimetry implies Poincare
    4.6.2 Milman's theorem: from integrability to isoperimetry

5 Riemannian manifolds
  5.1 Riemannian manifolds
    5.1.1 Tangent spaces
    5.1.2 The tangent bundle
    5.1.3 The cotangent bundle
    5.1.4 Vector bundles
    5.1.5 Tensors and tensor bundles
    5.1.6 Riemannian metrics
    5.1.7 Connections
    5.1.8 Integration on a manifold
    5.1.9 Markov triple on a compact Riemannian manifold
    5.1.10 Curvature
  5.2 The n-sphere
    5.2.1 The ultraspheric triple

6 Gaussian spaces
  6.1 Infinite-dimensional Gaussian measures
    6.1.1 Integration on Banach spaces
    6.1.2 Gaussian measures
    6.1.3 The Cameron-Martin space
    6.1.4 Gaussian Hilbert spaces
    6.1.5 The white noise representation
    6.1.6 A compact Markov triple


Preface

These notes are a draft, and almost certainly full of mistakes. Please do not distribute them.

These are lecture notes for a course entitled Geometry of Markov Diffusions, given in the summer of 2016 at the University of Bonn. Most of the material contained in these notes comes from the book by Bakry, Gentil, and Ledoux [BGL14]. Compared to that book, these notes aim to be

• less general, less comprehensive, and therefore possibly easier for the uninitiated to digest; and

• read sequentially.

Prerequisites

These notes assume a basic knowledge of probability and functional analysis. For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence. These notes explicitly do not require any background in stochastic processes or stochastic calculus. Although there is a strong connection between diffusion semigroups and diffusion processes, we will not deal with it in these notes.

Outline

The main object of study in these notes is called a compact Markov diffusion triple, which roughly consists of a space, a probability measure on it, and some notion of the norm of the gradient of a function. Its construction, which requires some preliminaries, is the main task of Chapter 1. From these compact Markov diffusion triples we will construct Markov semigroups; if one were so inclined, one could also go on to construct Markov processes. The main point of these notes is that the analysis of these Markov semigroups can reveal non-trivial properties of the underlying probability measure.


In Chapters 2, 3, and 4, we introduce three families of functional inequalities: Poincare inequalities, logarithmic Sobolev inequalities, and isoperimetric inequalities. We will show how these inequalities are related, how they follow from properties of our Markov triples, and some of their geometric consequences.

The final two chapters will expand our repertoire of Markov triples, by showing how to canonically put Markov triples on weighted Riemannian manifolds and infinite-dimensional Gaussian measures.

Acknowledgements

I would like to thank the following people (in alphabetical order) for pointing out errors and suggesting improvements: Claudio Bellani, Mathias Braun, and Mike Parucha.


Chapter 1

Markov diffusions

1.1 Some measure-theoretic preliminaries

A kernel on a measurable space (E, F) is a function p : E × F → [0, 1] such that

• for every x ∈ E, p(x, ·) is a probability measure on (E,F); and

• for every A ∈ F , p(·, A) is a measurable function E → [0, 1].

We say that a measurable space (E, F) has the measure decomposition property if every measure µ on (E × E, F ⊗ F) can be decomposed as µ(dx, dy) = µ1(dx) p(x, dy) for some measure µ1 on E and some kernel p. It is well-known that if E is a Polish space and F is its Borel σ-algebra then (E, F) satisfies the measure decomposition property.

We say that a measurable space (E, F) is good if it satisfies the measure decomposition property and F is countably generated. The measure space (E, F, m) is good if it satisfies the measure decomposition property and F is countably generated up to m-null sets.

We will use the bi-measure theorem [MRN53] to show that Markov operators have kernel representations:

Theorem 1.1.1. Let (E, F) be a good measurable space, and let ν : F × F → [0,∞) be a map such that ν(·, A) and ν(A, ·) are finite measures for any A ∈ F. Then there exists a finite measure ν̄ on (E × E, F ⊗ F) such that ν(A, B) = ν̄(A × B) for all A, B ∈ F.

From now on, all measurable spaces will be assumed to be good and all measures will be assumed to be σ-finite.


1.2 Markov operators and semigroups

Definition 1.2.1. A Markov operator P on (E, F) is a linear operator that sends bounded measurable functions to bounded measurable functions, and also satisfies

(a) P1 = 1, where 1 is the constant function with value 1; and

(b) Pf ≥ 0 whenever f ≥ 0.

If instead of (a) we have P1 ≤ 1 pointwise, then P is called a sub-Markov operator. Note that if P is sub-Markov and f takes values in [0, 1] then Pf ≥ 0 and P(1 − f) ≥ 0, which implies that

0 ≤ Pf ≤ P1 ≤ 1,

and so ‖Pf‖∞ ≤ 1. By linearity, ‖Pf‖∞ ≤ ‖f‖∞ for all bounded, measurable f.

Next, we observe that Markov operators satisfy a Jensen-type inequality:

Proposition 1.2.2. If P is a Markov operator then for every convex function φ and every bounded, measurable f, Pφ(f) ≥ φ(Pf).

Proof. First, consider simple functions f. That is, let A1, . . . , Ak be a measurable partition of E and take f = ∑_{i=1}^k a_i 1_{A_i} for some a_i ∈ R. Then P1_{A_1}, . . . , P1_{A_k} are non-negative and sum to 1. Applying Jensen's inequality pointwise,

Pφ(f) = ∑_{i=1}^k φ(a_i) P1_{A_i} ≥ φ( ∑_{i=1}^k a_i P1_{A_i} ) = φ(Pf).

We may approximate any bounded, measurable f uniformly by simple functions. Since P is continuous in L∞(µ), a standard approximation argument implies that the claimed inequality holds for all bounded, measurable f.
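For readers who like to experiment, here is a minimal numerical illustration of Proposition 1.2.2 (not part of the original notes): on a three-point space a Markov operator is just a row-stochastic matrix, and the Jensen-type inequality can be checked pointwise. The matrix P, the function f, and the convex function φ below are arbitrary choices; numpy is assumed.

```python
import numpy as np

# A Markov operator on a 3-point space: a row-stochastic matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.4, 0.4, 0.2]])

f = np.array([-1.0, 0.5, 2.0])      # a bounded "function" on the 3-point space
phi = lambda x: x ** 2              # a convex function

lhs = P @ phi(f)                    # P(phi(f)), evaluated at each of the 3 points
rhs = phi(P @ f)                    # phi(Pf)

# Proposition 1.2.2 predicts P(phi(f)) >= phi(Pf) pointwise.
assert np.all(lhs >= rhs - 1e-12)
print(lhs, rhs)
```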

Definition 1.2.3. The measure µ is invariant for P if

∫ Pf dµ = ∫ f dµ

for every f ∈ L1(µ) ∩ L∞.


As a consequence of Proposition 1.2.2, whenever µ is invariant for P then P may be extended to a contraction on Lp(µ) for any 1 ≤ p < ∞. Indeed, applying Proposition 1.2.2 to the function φ(x) = |x|^p yields

∫ |Pf|^p dµ ≤ ∫ P(|f|^p) dµ = ∫ |f|^p dµ

for every f ∈ L1(µ) ∩ L∞. Hence ‖Pf‖_p ≤ ‖f‖_p for all f ∈ L∞(µ) ∩ L1(µ). Since this set is dense in Lp(µ), we may extend P to a contraction on every Lp(µ). As a consequence, the requirement that f ∈ L∞ may be removed from Definition 1.2.3.

Definition 1.2.4. The measure µ is reversible for P if

∫ f Pg dµ = ∫ g Pf dµ

for all f, g ∈ L2(µ).

It is easy to see that reversibility implies invariance: if µ is a finite measure then it is enough to just plug in g = 1. In the σ-finite case, one can take a sequence gn of non-negative functions in L2(µ) increasing to 1.
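As another small illustration (again not from the notes), one can build a reversible Markov operator on a finite state space from a symmetric "conductance" matrix and check both Definition 1.2.4 and the fact that reversibility implies invariance. All names below (mu, C, P) are ad hoc choices; numpy is assumed.

```python
import numpy as np
rng = np.random.default_rng(0)

mu = np.array([0.1, 0.2, 0.3, 0.4])          # a probability measure on 4 points
C = rng.random((4, 4)); C = (C + C.T) / 2    # symmetric conductances
np.fill_diagonal(C, 0.0)

# Detailed balance construction: mu_i P_ij = C_ij / Z is symmetric in (i, j).
Z = 4 * C.max() / mu.min()                   # large enough to keep rows sub-stochastic
P = C / (mu[:, None] * Z)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))     # make each row sum to one

f, g = rng.random(4), rng.random(4)

# Reversibility (Definition 1.2.4): sum_i mu_i f_i (Pg)_i = sum_i mu_i g_i (Pf)_i.
lhs = np.sum(mu * f * (P @ g))
rhs = np.sum(mu * g * (P @ f))
# Invariance (Definition 1.2.3) follows by plugging in g = 1, i.e. mu P = mu.
assert abs(lhs - rhs) < 1e-12 and np.allclose(mu @ P, mu)
print(lhs, rhs, mu @ P)
```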

Definition 1.2.5. A symmetric Markov semigroup on (E, F, µ) is a family {Pt : t ≥ 0} of operators satisfying

(a) for every t ≥ 0, Pt is a Markov operator;

(b) P0f = f for all bounded measurable f ;

(c) Pt+s = PtPs for all s, t ≥ 0;

(d) for every t ≥ 0, µ is reversible for Pt; and

(e) for all f ∈ L2(µ), Ptf → f in L2(µ) as t→ 0.

Property (c) of Definition 1.2.5 is the reason for calling Pt a semigroup. Property (e) is sometimes called a strong continuity property, because it is equivalent to requiring that t ↦ Pt is continuous in the strong operator topology for operators on L2(µ). The choice of L2(µ) as the space underlying this convergence will be convenient for us, but other choices are also possible.


1.3 The kernel representation

In the setting of good measurable spaces, every Markov operator is actually an integral operator with respect to a probability kernel.

Proposition 1.3.1. Suppose P is a Markov operator that is continuous on L1(µ). Then there exists a probability kernel p on (E, F) such that for every f ∈ L∞(µ) and µ-almost every x ∈ E,

(Pf)(x) = ∫ f(y) p(x, dy). (1.1)

In particular, a Markov semigroup {Pt} may be represented by a family of probability kernels pt : E × F → [0, 1].

Proof. First, assume that µ is a probability measure and that P is a sub-Markov operator. Consider the map ν : F × F → [0,∞) defined by

ν(A, B) = ∫ 1_A P1_B dµ.

We claim that ν(A, ·) and ν(·, A) are both finite measures. Indeed, only the countable additivity property is non-trivial; to check this property, it suffices to show that if Bk is a decreasing sequence of sets with ⋂ Bk = ∅ then ν(Bk, A) and ν(A, Bk) both converge to zero. The first of these follows from the monotone convergence theorem:

ν(Bk, A) = ∫ 1_{Bk} P1_A dµ,

where the integrand converges pointwise downwards to zero because 1_{Bk} ↓ 0 and P1_A ≥ 0. On the other hand,

0 ≤ ν(A, Bk) = ∫ 1_A P1_{Bk} dµ ≤ ∫ P1_{Bk} dµ.

The right-hand side converges to zero because 1_{Bk} → 0 in L1(µ) and P is continuous on L1(µ).

Now we may apply Theorem 1.1.1: there exists a measure ν̄ on E × E such that ν̄(A × B) = ν(A, B) = ∫ 1_A P1_B dµ. By the measure decomposition property, we may decompose ν̄ as ν̄(dx, dy) = ν1(dx) p(x, dy). That is,

∫_E 1_A P1_B dµ = ∫_E 1_A(x) ∫_E 1_B(y) p(x, dy) ν1(dx). (1.2)


With B = E, we obtain

ν1(A) = ∫_E 1_A P1 dµ ≤ µ(A).

It follows then that ν1 is absolutely continuous with respect to µ, with density dν1/dµ = P1. Since every bounded, measurable function may be approximated by a sum of indicator functions, (1.2) implies that for every bounded, measurable f and every A ∈ F,

∫_E 1_A(x) Pf(x) µ(dx) = ∫_E 1_A(x) ∫_E f(y) p(x, dy) ν1(dx)
                       = ∫_E 1_A(x) ∫_E f(y) p(x, dy) P1(x) µ(dx).

Since A is arbitrary, it follows that

(Pf)(x) = P1(x) ∫_E f(y) p(x, dy)

for µ-almost every x.

Next, we consider the case that µ is σ-finite. Let An be a growing sequence of measurable sets of finite measure whose union is E. Let µn be µ restricted to An, so that µn is a finite measure. If we define Pn by Pn f = P(f 1_{An}) then Pn is a sub-Markov operator that acts continuously on L1(µn). Since µn is finite, we may apply the first part of the proof to find a kernel pn satisfying

(Pn f)(x) = (Pn 1)(x) ∫ f(y) pn(x, dy)

for µn-almost every x. Finally, we may define the kernel p by setting p(x, B) = lim_{n→∞} pn(x, B), where the limit exists because the definition of Pn ensures that pn(x, B) is non-decreasing in n. By the monotone convergence theorem, for every bounded, non-negative, measurable function f,

lim_{n→∞} (Pn f)(x) = lim_{n→∞} (Pn 1)(x) ∫ f(y) p(x, dy).

Now, Pn f = P(1_{An} f), and since 1_{An} f converges to f in L1(µ) it follows that Pn f → Pf in L1(µ); since the convergence is monotone, it must also hold for µ-almost every point x ∈ E. Applying the same reasoning to Pn 1, we conclude that

(Pf)(x) = (P1)(x) ∫ f(y) p(x, dy)


for µ-almost every x. In the case that P is a Markov operator, this is exactly (1.1).

1.3.1 Density kernels

Sufficiently nice Markov operators (i.e., all the ones that we will be interested in) can even be represented by kernels that have densities with respect to some reference measure. From now on, we introduce the notation ‖P‖_{p,q} for the Lp(µ) → Lq(µ) norm of the operator P. (The measure µ is absent from the notation, but it should always be clear from the context.)

Proposition 1.3.2. Suppose P is a Markov operator on (E, F, µ) satisfying ‖P‖_{1,∞} = M < ∞. Then there exists a measurable p : E × E → [0,∞) such that ‖p‖∞ = ‖P‖_{1,∞} and

(Pf)(x) = ∫ f(y) p(x, y) µ(dy)

for all bounded, measurable f and µ-almost every x.

Proof. We consider only the case that µ is a probability measure; as before, the σ-finite case is done by taking limits. The basic idea is to fix an x and to consider the linear functional f ↦ (Pf)(x), which is bounded on L1(µ) by our assumption on P. Then it can be represented by an element of L∞(µ): there is some p_x ∈ L∞(µ) such that ‖p_x‖∞ is bounded by the norm of our functional (which is in turn bounded by M) and such that

(Pf)(x) = ∫ f(y) p_x(y) µ(dy)

for every f. There are at least two problems with this basic idea: the first is that Pf is only defined up to sets of measure zero, and so our linear functional does not make sense. The second is that we would need to somehow combine the functions p_x in such a way that they are measurable in x.

Both of the problems in the previous paragraph can be circumvented by dropping to a finite space: let Fn be a sequence of finite σ-algebras exhausting F, satisfying µ(A) > 0 for all n and all non-empty A ∈ Fn (such a sequence exists because of our assumption that (E, F, µ) is a good measure space). Consider the Markov operator Pn on (E, Fn, µ) defined by Pn f = E(Pf | Fn). Then ‖Pn‖_{1,∞} ≤ M and we may carry out the program of the previous paragraph to find an (Fn ⊗ Fn)-measurable pn(x, y) satisfying ‖pn‖∞ ≤ M and

(Pn f)(x) = ∫ f(y) pn(x, y) µ(dy)


for every Fn-measurable f and every x. This time, there is no problem making sense of the linear functional f ↦ (Pn f)(x), since a measurable function with respect to a finite σ-algebra whose non-empty sets all have positive measure is defined pointwise.

Now we send n → ∞. Since Pn f = E(Pf | Fn) is a bounded martingale, it converges µ-almost surely to Pf. The functions pn also form a bounded martingale with respect to the σ-algebras Fn ⊗ Fn on the product space, because for any A, B ∈ Fn,

∫_{A×B} p_{n+1} d(µ⊗µ) = ∫ 1_A (P_{n+1} 1_B) dµ = ∫ 1_A (Pn 1_B) dµ = ∫_{A×B} pn d(µ⊗µ)

(where the middle equality follows from the definition of Pn). Hence pn converges (µ⊗µ)-almost everywhere to a limit p. Moreover, ‖p‖∞ ≤ M and (by the dominated convergence theorem)

∫ f(y) pn(x, y) µ(dy) → ∫ f(y) p(x, y) µ(dy)

for every f ∈ L∞(µ). Finally, the right hand side above must be equal to Pf = lim_{n→∞} Pn f, µ-almost everywhere.

1.4 Some examples

We give some examples of symmetric Markov semigroups. Note that in none of the examples below do we actually check the properties of Definition 1.2.5 – this may be taken as an exercise.

Example 1.4.1 (The heat semigroup on Rn). Let µ be the Lebesgue measure on Rn, and define

pt(x, y) = (4πt)^{−n/2} e^{−|x−y|²/(4t)}.

That is, for fixed x, pt(x, ·) is the density of a Gaussian random variable with mean x and covariance 2t In (where In denotes the n × n identity matrix). One can easily check that

(Pt f)(x) = ∫_{Rn} f(y) pt(x, y) dy

defines a symmetric Markov semigroup on (Rn, B, Leb), where B is the Borel σ-algebra and Leb is the Lebesgue measure. This semigroup may also be described in terms of Brownian motion: if Bt is a standard Brownian motion on Rn then Pt f(x) = E f(x + √2 Bt).


Example 1.4.2 (The wrapped heat semigroup). Let pt be the heat kernel on R from the previous example, and define the semigroup P^W_t on ([0, 1], B) by

(P^W_t f)(x) = ∫_0^1 f(y) ∑_{k∈Z} pt(x, k + y) dy.

Then this defines a symmetric Markov semigroup with respect to the Lebesgue measure on [0, 1]. In the same way that the heat semigroup on Rn was associated with a Brownian motion, this semigroup is associated with a "wrapped" Brownian motion: let B^W_t = Bt − ⌊Bt⌋, where Bt is a Brownian motion on R. Then P^W_t f(x) = E f(x + √2 B^W_t). This is called a wrapped Brownian motion because it runs like a Brownian motion on [0, 1] until it hits one of the endpoints, at which point it wraps around to the other endpoint.

Example 1.4.3 (The reflected heat semigroup). With pt again the heat kernel on R, define the semigroup P^R_t on ([0, 1], B) by

(P^R_t f)(x) = ∫_0^1 f(y) ∑_{k∈Z} ( pt(x, 2k + y) + pt(x, 2k − y) ) dy.

This also defines a symmetric Markov semigroup with respect to the Lebesgue measure on [0, 1]. This one is associated, however, with a "reflected" Brownian motion

B^R_t = Bt − ⌊Bt⌋ if ⌊Bt⌋ is even, and B^R_t = 1 − Bt + ⌊Bt⌋ if ⌊Bt⌋ is odd,

which runs like a Brownian motion on [0, 1] until it hits the boundary, at which point it bounces off.

Example 1.4.4 (The Ornstein-Uhlenbeck semigroup on Rn). Let γn be the standard Gaussian measure on Rn (i.e., the measure whose density is (2π)^{−n/2} e^{−|x|²/2} with respect to the Lebesgue measure). Then

(Pt f)(x) = ∫_{Rn} f(e^{−t} x + √(1 − e^{−2t}) y) dγn(y)

defines a symmetric Markov semigroup on (Rn, B, γn). Note that the representation above (unlike our previous examples) is not written in terms of a density kernel, but by a change of variables it can be put in that form.
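The reversibility of γn with respect to this semigroup (Definition 1.2.4) can also be checked numerically in dimension one, using Gauss-Hermite quadrature for the Gaussian integrals. This is an illustrative sketch only; the test functions f and g and the quadrature size are arbitrary choices, and numpy is assumed.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

u, w = hermgauss(80)
nodes = np.sqrt(2.0) * u            # quadrature nodes for a standard Gaussian
gw = w / np.sqrt(np.pi)             # weights: E[h(Y)] ~ sum(gw * h(nodes)), Y ~ N(0,1)

def Pt(t, h, x):
    """(P_t h)(x) for the one-dimensional Ornstein-Uhlenbeck semigroup."""
    a, b = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
    return np.sum(gw * h(a * x + b * nodes))

f = lambda x: np.sin(x)
g = lambda x: x ** 3 - x
t = 0.4

lhs = np.sum(gw * np.array([f(xi) * Pt(t, g, xi) for xi in nodes]))  # ∫ f (P_t g) dγ_1
rhs = np.sum(gw * np.array([g(xi) * Pt(t, f, xi) for xi in nodes]))  # ∫ g (P_t f) dγ_1
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-8
```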


1.5 Generators

The generator of a Markov semigroup is a (usually unbounded) operator that describes the semigroup's behavior for infinitesimally small t (essentially, it is the derivative of the semigroup at t = 0). As we will show later, the generator determines the semigroup completely.

Definition 1.5.1. Let {Pt : t ≥ 0} be a symmetric Markov semigroup on (E, F, µ). We define its generator L by

Lf = lim_{t→0} (Pt f − f)/t,

with the domain D(L) being the set of f ∈ L2(µ) such that the limit exists in L2(µ). We may also define L on other domains: let Dp(L) be the set of f ∈ Lp(µ) such that the limit above exists in Lp(µ); then L is also defined as an operator from Dp(L) to Lp(µ). Note that there is no ambiguity in using the letter L for all of these operators, since if a sequence converges in both Lp(µ) and Lq(µ) then the limits coincide.

The semigroup property (Definition 1.2.5, part (c)) may be used to show that for any f ∈ D(L) and t ≥ 0, Pt f ∈ D(L) and

L Pt f = Pt L f = lim_{s→0} (P_{t+s} f − Pt f)/s,

where the limit exists in L2(µ); we leave this as an exercise.

The symmetry (Definition 1.2.5, part (d)) of {Pt : t ≥ 0} may also be seen in the generator:

Proposition 1.5.2. For every f, g ∈ D(L),

∫ f Lg dµ = ∫ g Lf dµ.

For every f ∈ D1(L),

∫ Lf dµ = 0.

Proof. By the symmetry of Pt,

(1/t) ∫ f (Pt g − g) dµ = (1/t) ∫ g (Pt f − f) dµ.


The definition of L ensures that the left hand side converges to ∫ f Lg dµ as t → 0, while the right hand side converges to ∫ g Lf dµ. This proves the first claim.

For the second claim, recall that µ is invariant for Pt, and so ∫ Pt f dµ = ∫ f dµ for all f ∈ L1(µ). Hence,

∫ (Pt f − f)/t dµ = 0

for all t. Recall that f ∈ D1(L) means that the integrand above converges to Lf in L1(µ) as t → 0. The second claim follows.

1.6 The carre du champ

There is another useful way, known as the carre du champ, of characterizing the infinitesimal behavior of a Markov semigroup. Suppose that we are able to find an algebra A contained in D(L) (that is, A is a vector space of functions that is closed under multiplication). For example, if E is a subset of Rn then A might consist of smooth functions compactly supported in E.

Definition 1.6.1. The carre du champ associated to L is the bilinear form Γ on A × A defined by

2Γ(f, g) = L(fg)− fLg − gLf.

We will abbreviate Γ(f, f) by Γ(f).

Note that we have defined Γ in terms of L, but one can also go (at least partly) the other way: if L is any operator satisfying the symmetry and invariance conditions in Proposition 1.5.2 and the formula 2Γ(f, g) = L(fg) − f Lg − g Lf, then

∫_E Γ(f, g) dµ = − ∫_E f Lg dµ

for every f, g ∈ A. If A is dense in L2(µ) then this determines Lg for every g ∈ A. Whether this determines L completely is a slightly more delicate point, which we will investigate later.

Proposition 1.6.2. Suppose that Γ is the carre du champ associated to L, which is itself the generator of a symmetric Markov semigroup. Then Γ(f) ≥ 0 pointwise for every f ∈ A.


Proof. By the definition of Γ, it suffices to show that if f ∈ A then L(f²) ≥ 2f Lf pointwise. Now, Proposition 1.2.2 implies that Pt(f²) ≥ (Pt f)² pointwise, and hence

L(f²) = lim_{t→0} (Pt(f²) − f²)/t ≥ lim_{t→0} ((Pt f)² − f²)/t = d/dt (Pt f)² |_{t=0}.

On the other hand, the product rule yields

d/dt (Pt f)² |_{t=0} = 2f · (d/dt Pt f)|_{t=0} = 2f Lf.

Since Γ is bilinear, the statement Γ(f + λg) ≥ 0 (applied for every λ ∈ R) may be rearranged into a Cauchy-Schwarz inequality for Γ: for all f, g ∈ A,

Γ(f, g) ≤ √( Γ(f) Γ(g) ).

1.6.1 The examples again

We revisit our examples from Section 1.4 in order to compute their generators and carre du champs. In general, it can be tricky to figure out exactly what the domain of the generator is, but it usually suffices to specify the generator on a sufficiently large subset of its domain. We will say more about this later.

Example 1.6.3 (The heat semigroup on Rn). If f : Rn → R is bounded and has two bounded, uniformly continuous derivatives then Taylor's theorem may be used to check that

(Pt f − f)/t → ∆f  in L∞,

where Pt is the heat semigroup of Example 1.4.1 and ∆f = ∑_{i=1}^n ∂²f/∂x_i². Thus, D∞(L) contains all such functions, and Lf = ∆f for them. Since bounded functions of compact support belong to L2(Leb), one can easily check that D(L) contains all twice-differentiable functions of compact support (which we denote by C²_c(Rn)). Since C²_c(Rn) is an algebra, we may define Γ on it. Some algebra reveals that

Γ(f, g) = ⟨∇f, ∇g⟩

for f, g ∈ C²_c(Rn).
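The "some algebra" can be delegated to a computer algebra system. The following sketch (illustrative, not from the notes) checks symbolically, for two concrete functions on R², that 2Γ(f, g) = L(fg) − f Lg − g Lf with L = ∆ reduces to 2⟨∇f, ∇g⟩; sympy is assumed, and the test functions are arbitrary choices.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)

def L(h):                        # the heat-semigroup generator: the Laplacian on R^2
    return sp.diff(h, x1, 2) + sp.diff(h, x2, 2)

f = sp.exp(-x1**2 - x2**2)       # two concrete smooth test functions
g = sp.sin(x1) * sp.cos(x2)

gamma = sp.simplify((L(f * g) - f * L(g) - g * L(f)) / 2)   # the carre du champ Γ(f, g)
grad_inner = sp.diff(f, x1) * sp.diff(g, x1) + sp.diff(f, x2) * sp.diff(g, x2)

print(sp.simplify(gamma - grad_inner))   # prints 0, i.e. Γ(f, g) = <∇f, ∇g>
```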


Example 1.6.4 (The wrapped heat semigroup). To find the generator of the wrapped heat semigroup, we first define the periodic extension f^W : R → R of a function f : [0, 1] → R by f^W(x) = f(x − ⌊x⌋). If f has continuous second derivatives and satisfies f^(k)(0) = f^(k)(1) for k = 0, 1, 2 (where f^(k) means the kth derivative of f, and f^(0) = f) then f^W is C² and bounded on R. Moreover, one can easily check by rearranging the definition of P^W_t that P^W_t f(x) = Pt f^W(x) for all x ∈ [0, 1]. Then we know from the previous example that

(P^W_t f − f)/t = (Pt f^W − f^W)/t → (f^W)'' = f''  in L∞

on [0, 1] for all functions f as above. Since convergence in L∞ implies convergence in L2([0, 1]), it follows that D(L) contains all smooth functions satisfying f^(k)(0) = f^(k)(1) for k = 0, 1, 2. This set of functions is an algebra, and on it we have Γ(f, g) = f'g'.

Example 1.6.5 (The reflected heat semigroup). The reflected heat semigroup can be analyzed in a similar way to the wrapped heat semigroup, but by taking a different extension: for f : [0, 1] → R, define f^R : R → R by

f^R(x) = f(x − ⌊x⌋) if ⌊x⌋ is even, and f^R(x) = f(1 − x + ⌊x⌋) if ⌊x⌋ is odd.

In order to make f^R belong to C²(R), we need f to be in C²([0, 1]) and to satisfy f'(0) = f'(1) = 0. For such functions, we argue as in the previous example that Lf = f'' and Γ(f, g) = f'g'. Note that L and Γ are given by the same formulas in both the wrapped and reflected heat semigroups; the only difference is that their domains might be different (although we haven't shown this yet, since we haven't tried to actually characterize the domains, but only to find subsets of them).

Example 1.6.6 (The Ornstein-Uhlenbeck semigroup on Rn). The generator may be computed in a similar way to the generator of the heat semigroup. Since we left out that computation we will include this one, but only in the case n = 1, and with the simplifying assumption that f has three bounded derivatives (instead of two uniformly continuous derivatives). By Taylor's theorem, if z = (e^{−t} − 1)x + √(1 − e^{−2t}) y then

(Pt f)(x) − f(x) = ∫_R f(e^{−t}x + √(1 − e^{−2t}) y) − f(x) dγ1(y)
                 = ∫_R f'(x) z dγ1(y) + (1/2) ∫_R f''(x) z² dγ1(y) + R_t(x),

where |R_t(x)| ≤ (‖f'''‖∞/6) ∫_R |z|³ dγ1(y). We can compute

∫ z dγ1(y) = (e^{−t} − 1)x,
∫ z² dγ1(y) = (e^{−t} − 1)² x² + 1 − e^{−2t},
∫ |z|³ dγ1(y) = O(t^{3/2}) as t → 0.

Hence,

((Pt f)(x) − f(x))/t = f''(x) − x f'(x) + O(√t)

as t → 0, where the O(√t) term is uniform in x. It follows that the left hand side converges in L∞ (and therefore also in L2(γ1), since γ1 is a probability measure) to f''(x) − x f'(x). That is, the generator of the Ornstein-Uhlenbeck semigroup on R acts as Lf(x) = f''(x) − x f'(x) for all bounded f with three bounded derivatives. An analogous computation gives

Lf(x) = ∆f(x) − ⟨x, ∇f(x)⟩

in higher dimensions. However, the carre du champ operator remains as it did in the case of the heat semigroup: Γ(f, g) = ⟨∇f, ∇g⟩.
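The convergence (Pt f − f)/t → f'' − x f' can also be observed numerically for a concrete bounded f, by evaluating the Mehler formula of Example 1.4.4 with Gauss-Hermite quadrature. This is only an illustrative sketch (the test function, the point x, and the step t are arbitrary choices); numpy is assumed.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

u, w = hermgauss(60)
nodes = np.sqrt(2.0) * u
gw = w / np.sqrt(np.pi)            # E[h(Y)] ~ sum(gw * h(nodes)) for Y ~ N(0,1)

f   = np.sin                        # bounded, with bounded derivatives
fp  = np.cos
fpp = lambda x: -np.sin(x)

def Pt(t, x):
    """(P_t f)(x) for the one-dimensional Ornstein-Uhlenbeck semigroup."""
    a, b = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
    return np.sum(gw * f(a * x + b * nodes))

x, t = 0.7, 1e-4
finite_diff = (Pt(t, x) - f(x)) / t
generator   = fpp(x) - x * fp(x)          # Lf(x) = f''(x) - x f'(x)
print(finite_diff, generator)
assert abs(finite_diff - generator) < 0.05   # they agree up to the O(sqrt(t)) error above
```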

1.7 Self-adjoint operators

Let H be a (separable) Hilbert space. If A and B are two operators defined on subsets of H, then we write A ⊂ B (and say that "B is an extension of A") if D(A) ⊂ D(B) and Af = Bf for all f ∈ D(A).

Definition 1.7.1. An operator A : D(A) → H is symmetric if ⟨Af, g⟩ = ⟨f, Ag⟩ for all f, g ∈ D(A).


Definition 1.7.2. If D(A) is dense in H then the adjoint of A is the operator A∗ defined by

⟨A∗f, g⟩ = ⟨f, Ag⟩

on D(A∗) = {f ∈ H : the linear functional g ↦ ⟨f, Ag⟩ is bounded}. An operator A is self-adjoint if A∗ = A.

In order to make sense of the definition of an adjoint, note that if D(A) is dense in H then for every bounded linear functional ℓ : D(A) → R there is a unique h ∈ H such that ℓ(g) = ⟨h, g⟩ for all g ∈ D(A). Hence, D(A∗) is defined to be exactly the set of f such that the equation

⟨h, g⟩ = ⟨f, Ag⟩ for all g ∈ D(A)

uniquely defines h.

Note that a self-adjoint operator is always symmetric, but a symmetric operator is not necessarily self-adjoint – the issue is that for a symmetric operator A, D(A) and D(A∗) might not coincide.

Exercise 1.7.1. Show that if A is symmetric and densely defined then A ⊂ A∗ and A∗ is closed. (In particular, every self-adjoint operator is closed.)

Example 1.7.3. Take H = L2([0, 1], Leb). Consider the operator A defined by Af = f'' on D(A) = C²_c((0, 1)) (twice continuously differentiable functions whose support is a compact subset of the open interval) and the operator B defined by Bf = f'' on

D(B) = {f ∈ C²([0, 1]) : f(0) = f(1) = 0}.

Note that A ⊂ B, that both are densely defined, and that both are symmetric operators because

∫_0^1 f(x) g''(x) dx = [f g']_0^1 − [f' g]_0^1 + ∫_0^1 f''(x) g(x) dx

for all f, g ∈ C²([0, 1]). Whenever both f and g belong to either D(A) or D(B), the right hand side reduces to ∫_0^1 f'' g dx.

Next, consider the adjoints of A and B: for f ∈ C², consider the linear functional ℓ_f defined by

ℓ_f(g) = ∫_0^1 f g'' dx.

Now, f ∈ D(A∗) if and only if ℓ_f is bounded on D(A). But for g ∈ D(A), we have

ℓ_f(g) = ∫_0^1 f'' g dx ≤ ‖f''‖_2 ‖g‖_2,


and so we see that D(A∗) contains all of C²([0, 1]). In particular, D(A∗) is substantially larger than D(A) and so A is certainly not self-adjoint.

On the other hand, f ∈ D(B∗) if and only if ℓ_f is bounded on D(B). For g ∈ D(B),

ℓ_f(g) = [f g']_0^1 + ∫_0^1 f'' g dx.

Now, if f ∈ D(B) then the first term on the right hand side disappears and so ℓ_f is bounded on D(B). On the other hand, if f ∉ D(B) then f(0) ≠ 0 or f(1) ≠ 0; say f(1) ≠ 0. We may then find a sequence gn ∈ D(B) such that gn → 0 in L² but g'_n(0) = 0 and g'_n(1) = 1 for every n. For this sequence, ℓ_f(gn) → f(1) ≠ 0 and so ℓ_f is not bounded on D(B) (the case f(0) ≠ 0 is symmetric). To summarize, f ∈ C²([0, 1]) belongs to D(B∗) if and only if f ∈ D(B). In some sense, B is much closer to being self-adjoint than A is; the next exercise makes this more precise.

Exercise 1.7.2. In the example above, show that B = B∗ and that B is self-adjoint. (Hint: use Fourier series to characterize their domains.)

1.7.1 Generators are self-adjoint

The entire reason for introducing self-adjoint operators in these notes is to apply their theory to generators of Markov semigroups. For the rest of this section, let {Pt : t ≥ 0} be a symmetric Markov semigroup on (E, F, µ); let L be its generator.

Proposition 1.7.4.

(a) D(L) is dense in L2(µ).

(b) L is closed.

(c) L is self-adjoint.

Proof. For ε > 0, define the operator Aε by Aε f = (1/ε) ∫_0^ε Ps f ds. Note that Aε commutes with Pt, and that Aε f → f as ε → 0 for every f ∈ L2(µ). Define the operator Lt by Lt f = (1/t)(Pt f − f); then Lt f → Lf as t → 0 if and only if f ∈ D(L). Now, for any t and ε,

Lt Aε f = ( ∫_t^{t+ε} Ps f ds − ∫_0^ε Ps f ds ) / (εt) = ( ∫_ε^{t+ε} Ps f ds − ∫_0^t Ps f ds ) / (εt) = Lε At f.


Taking t → 0, Lε At f → Lε f (since At f → f and Lε is bounded for any fixed ε > 0). We conclude that for any ε > 0 and f ∈ L2(µ) we have Aε f ∈ D(L) and L Aε f = Lε f. Since any f ∈ L2(µ) may be approximated arbitrarily well by Aε f ∈ D(L), it follows that D(L) is dense in L2(µ).

To check that L is closed, suppose that fn ∈ D(L) with fn → f and L fn → g. Note that At commutes with Lt, and therefore also with L. Hence,

Lt f = lim_{n→∞} Lt fn = lim_{n→∞} L At fn = lim_{n→∞} At L fn = At g.

Taking t → 0, it follows that Lt f → g, which implies that f ∈ D(L) and Lf = g.

Finally, we check that L is self-adjoint: the only thing to check is that if f ∈ D(L∗) then f ∈ D(L). Suppose, then, that f ∈ D(L∗). Then there is a constant C such that for all g ∈ D(L),

C‖g‖_2 ≥ ∫_E f Lg dµ = lim_{t→0} ∫_E f Lt g dµ = lim_{t→0} ∫_E g Lt f dµ.

In other words, there is a bounded linear functional ℓ on D(L) such that

∫_E g Lt f dµ → ℓ(g).

Since D(L) is dense, it follows that there is some h ∈ L2(µ) such that ℓ(g) = ∫_E h g dµ for all g ∈ D(L), and Lt f ⇀ h weakly. This is almost what we want, except that we need to upgrade the weak convergence to strong convergence.

Recalling that L At f = Lt f ⇀ h and that At f → f as t → 0, we see that the pair (At f, L At f) ⇀ (f, h) weakly in L2(µ) × L2(µ). Since the graph of L is convex and closed, it is also weakly closed. Hence (f, h) belongs to the graph of L, and so f ∈ D(L).

Definition 1.7.5. A self-adjoint operator A on H is non-negative if ⟨Af, f⟩ ≥ 0 for all f ∈ D(A). A is non-positive if −A is non-negative.

Proposition 1.7.6. If L is the generator of a symmetric Markov semigroup then L is non-positive.

Proof. For f ∈ D(L),

∫ f Lf dµ = lim_{t→0} (1/t) ∫ (f Pt f − f²) dµ = lim_{t→0} ( ‖P_{t/2} f‖_2² − ‖f‖_2² ) / t.

For any t > 0, the numerator is non-positive because Pt is a contraction on L2(µ).


To summarize, every generator of a symmetric Markov semigroup is a non-positive self-adjoint operator. It turns out that the converse is partly true: every non-positive self-adjoint operator on L2(µ) is the generator of some strongly continuous contraction semigroup on L2(µ) (we will say what that means later).

1.7.2 The Friedrichs extension

The problem with self-adjoint operators is that it is sometimes difficult to actually specify them. Most of the time (as in Example 1.7.3), we want to define our operators only on a set of sufficiently smooth functions; this will never be a large enough domain for the operator to be self-adjoint. Fortunately, there is a general recipe for finding a self-adjoint extension of a densely-defined, non-negative symmetric operator. Such an extension is not always unique.

Suppose that A is a densely defined, non-negative symmetric operator on a Hilbert space H. Now define the positive, symmetric bilinear form ⟨·, ·⟩_A on D(A) by

⟨f, g⟩_A = ⟨f, g⟩ + ⟨Af, g⟩ = ⟨f, g⟩ + ⟨f, Ag⟩.

Now let H1 ⊆ H be the elements of the form lim_{n→∞} fn, where the limit is with respect to ‖·‖ and fn ∈ D(A) is a sequence that is Cauchy with respect to ‖·‖_A. Since A is non-negative, every sequence that is Cauchy with respect to ‖·‖_A is also Cauchy with respect to ‖·‖, and so it always has a limit in H. One can then extend ‖·‖_A to H1, by defining ‖f‖_A = lim_{n→∞} ‖fn‖_A whenever fn is a sequence as above.

Exercise 1.7.3.

(a) Check that the definition of ‖f‖_A for f ∈ H1 above does not depend on the choice of fn.

(b) Show that H1 is complete with respect to ‖ · ‖A.

Now we define a new operator Ā on H by ⟨Āf, g⟩ = ⟨f, g⟩_A − ⟨f, g⟩ on the domain

D(Ā) = {f ∈ H1 : ∃ C(f) with ⟨f, g⟩_A ≤ C(f)‖g‖ for all g ∈ H1}.

Note that our definition of Ā makes sense because for every f ∈ D(Ā), the functional ⟨f, ·⟩_A − ⟨f, ·⟩ extends uniquely to a bounded linear functional on H.

Exercise 1.7.4. Show that Ā is a self-adjoint extension of A. It is called the Friedrichs extension.


1.8 Spectral theory of self-adjoint operators

Let H be a (separable) Hilbert space.

Definition 1.8.1. A family {Hλ : λ ≥ 0} of closed linear subspaces is a spectral decomposition of H if

(a) Hλ ⊆ Hλ' whenever λ ≤ λ';

(b) Hλ = ⋂_{λ'>λ} Hλ'; and

(c) ⋃_{λ≥0} Hλ is dense in H.

Take a spectral decomposition {Hλ} and consider the orthogonal projection operators Eλ : H → Hλ. For any f ∈ H, the function λ ↦ ⟨Eλ f, f⟩ = ‖Eλ f‖² is non-decreasing, right-continuous, and bounded by ‖f‖². For any f, g ∈ H,

⟨Eλ f, g⟩ = (1/2) ( ⟨Eλ(f + g), f + g⟩ − ⟨Eλ f, f⟩ − ⟨Eλ g, g⟩ ),

and hence the function λ ↦ ⟨Eλ f, g⟩ is right-continuous and has bounded variation. For a bounded, measurable function ψ : R+ → R, define the operator Tψ : H → H by

⟨Tψ f, g⟩ = ψ(0)⟨E0 f, g⟩ + ∫_0^∞ ψ(λ) d⟨Eλ f, g⟩. (1.3)

This defines a bounded operator, because if ‖f‖ = ‖g‖ = 1 then the total variation of the function λ ↦ ⟨Eλ f, g⟩ is at most 3. If ψ is unbounded, we may define Tψ on the domain

D(Tψ) = { f ∈ H : ∫_0^∞ ψ²(λ) d⟨Eλ f, f⟩ < ∞ }

by the same formula (1.3).
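In finite dimensions the whole construction reduces to the eigendecomposition of a symmetric matrix, which makes the properties listed in the next exercise easy to verify numerically. The following sketch is illustrative only (the matrix A and the functions ψ are arbitrary choices); numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T                          # a non-negative, self-adjoint operator on R^5

lam, V = np.linalg.eigh(A)           # spectral decomposition: A = V diag(lam) V^T

def T(psi):
    """The operator T_psi of (1.3), in finite dimensions: psi applied to the eigenvalues."""
    return V @ np.diag(psi(lam)) @ V.T

t, s = 0.3, 0.5
Pt  = T(lambda l: np.exp(-t * l))
Ps  = T(lambda l: np.exp(-s * l))
Pts = T(lambda l: np.exp(-(t + s) * l))

assert np.allclose(T(lambda l: l), A)        # psi(x) = x recovers A
assert np.allclose(Pt @ Ps, Pts)             # T_{phi psi} = T_phi T_psi
assert np.linalg.norm(Pt, 2) <= 1 + 1e-12    # contraction, since |e^{-t lam}| <= 1
print("finite-dimensional functional calculus checks passed")
```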

Exercise 1.8.1. Our definition of Tψ in (1.3) satisfies the following properties:

(a) If φ, ψ : R+ → R are two bounded, measurable functions then Tφψ = Tφ Tψ.

(b) If φ : R+ → R is bounded and measurable then Tφ is symmetric and has norm at most ‖φ‖∞.


(c) If ψn is a sequence of bounded, measurable functions converging either monotonically or uniformly on compact sets to a measurable function ψ then Tψn f → Tψ f for all f ∈ D(Tψ).

(d) For any bounded, measurable ψ and any f, g ∈ D(Tψ),

d〈EλTψf, g〉 = ψ(λ)d〈Eλf, g〉.

(e) For any measurable ψ, Tψ is self-adjoint.

(f) For any non-negative, measurable ψ, Tψ is non-negative.

Theorem 1.8.2 (Spectral theorem for self-adjoint operators). If A : D(A) → H is non-negative and self-adjoint then there exists a spectral decomposition {Hλ} such that when ψ is the identity function x ↦ x then Tψ = A.

We call the spectral decomposition in Theorem 1.8.2 the spectral decomposition of A. In this case, we write ψ(A) instead of Tψ.

Exercise 1.8.2. If ψ is a polynomial then ψ(A) as defined above coincides with the "usual" definition of ψ(A) (e.g. A²f = A(Af), with D(A²) = {f ∈ D(A) : Af ∈ D(A)}).

The spectral decomposition of an operator is closely related to its set of eigenvalues and eigenfunctions. In particular, we will use the following characterization of the kernel of A:

Proposition 1.8.3. If {Hλ} is the spectral decomposition of the non-negative, self-adjoint operator A then ker A = H0.

Proof. First, suppose that f ∈ ker A. Then ⟨Af, f⟩ = 0 and so by Theorem 1.8.2 and (1.3),

0 = ⟨Af, f⟩ = ∫_0^∞ λ d⟨Eλ f, f⟩.

It follows that the measure d⟨Eλ f, f⟩ is identically zero on (0,∞), and so ⟨Eλ f, f⟩ is constant for λ ∈ (0,∞). Right-continuity (Definition 1.8.1, part (b)) of the spectral decomposition implies that ⟨Eλ f, f⟩ → ‖E0 f‖² as λ → 0; exhaustiveness (part (c)) implies that ⟨Eλ f, f⟩ → ‖f‖² as λ → ∞. We conclude that ‖f‖ = ‖E0 f‖, and so f ∈ H0.

On the other hand, if f ∈ H0 then Eλ f is constant for λ ∈ (0,∞) and so the measure d⟨Eλ f, g⟩ is zero on (0,∞) for any g. Theorem 1.8.2 and (1.3) imply that

⟨Af, g⟩ = ∫_0^∞ λ d⟨Eλ f, g⟩ = 0

for any g ∈ D(A). Since D(A) is dense, it follows that Af = 0.


1.8.1 The semigroup generated by a self-adjoint operator

Now suppose that L is a non-positive, self-adjoint operator. Then −L is non-negative. Since the function λ ↦ e^{−tλ} is bounded and non-negative on R+, we may apply the spectral theorem to define Pt = e^{tL}.

Proposition 1.8.4. The family {Pt : t ≥ 0} satisfies the following properties:

(a) Pt+s = PtPs for all s, t ≥ 0;

(b) for every t ≥ 0 and f ∈ L2(µ), ‖Ptf‖2 ≤ ‖f‖2;

(c) for every t ≥ 0, µ is reversible for Pt;

(d) for all f ∈ L2(µ), Ptf → f in L2(µ) as t→ 0;

(e) for every f ∈ D(L), Lf = lim_{t→0} (Pt f − f)/t; and

(f) for any f ∈ L2(µ), every t > 0, and every k ∈ N, Ptf ∈ D(Lk).

Note that Proposition 1.8.4 doesn't assert that Pt is a Markov operator for any t. Using Hille-Yosida theory, it is possible to fully characterize the generators L that give rise to Markov semigroups. We will not present this characterization, however, since we are interested in a special class of generators (diffusion generators) that will turn out to always produce Markov semigroups.

Proof. The semigroup property Pt+s = Pt Ps follows from part (a) of Exercise 1.8.1. The bound ‖Pt f‖_2 ≤ ‖f‖_2 follows from part (b) of Exercise 1.8.1. The reversibility of µ also follows from part (b) of Exercise 1.8.1 (recalling that the inner product here is ⟨f, g⟩ = ∫_E fg dµ). The convergence Pt f → f follows from part (c) of Exercise 1.8.1, since the function λ ↦ e^{−tλ} converges pointwise and monotonically to the function with constant value 1. Since λ ↦ (e^{−tλ} − 1)/t converges monotonically to the function λ ↦ −λ as t → 0, the same part of Exercise 1.8.1 shows that (1/t)(Pt f − f) → Lf if f ∈ D(L).

Finally, note that g ∈ D(L^k) whenever ∫_0^∞ λ^{2k} d⟨Eλ g, g⟩ < ∞. Now, part (d) implies that

d⟨Eλ Pt f, Pt f⟩ = d⟨Eλ P_{2t} f, f⟩ = e^{−2λt} d⟨Eλ f, f⟩

for any f ∈ L2(µ). Hence,

∫_0^∞ λ^{2k} d⟨Eλ Pt f, Pt f⟩ = ∫_0^∞ e^{−2tλ} λ^{2k} d⟨Eλ f, f⟩ < ∞.


1.9 Compact Markov diffusions

In all the examples we've seen so far, L was a second-order differential operator and Γ was a first-order differential operator. From now on, we will be interested only in semigroups of this form. We will encode this property (which we call the diffusion property) as an algebraic assumption on Γ.

Definition 1.9.1 (Diffusion carre du champ). Suppose that A is an algebra of functions E → R that is closed under composition with smooth functions: for every f1, . . . , fk ∈ A and any C∞ function Ψ : Rk → R, the function Ψ(f1, . . . , fk) also belongs to A. We say that a bilinear form Γ : A × A → A is a diffusion carre du champ if for every Ψ as above,

Γ(Ψ(f1, . . . , fk), g) = ∑_{i=1}^k ∂_i Ψ(f1, . . . , fk) Γ(f_i, g).

Definition 1.9.1 may be seen as a sort of chain rule for Γ, which may be seen to imply a sort of second-order chain rule for L: recall that for a symmetric Markov semigroup, its generator L and carre du champ form Γ are related by the formula

2Γ(f, g) = L(fg) − f Lg − g Lf.

If Γ is a diffusion carre du champ then by some manipulations, we observe that

LΨ(f1, . . . , fk) = ∑_{i=1}^k ∂_i Ψ(f1, . . . , fk) L f_i + ∑_{i,j=1}^k ∂_i ∂_j Ψ(f1, . . . , fk) Γ(f_i, f_j). (1.4)

For the special case k = 1, we have

Γ(ψ(f), g) = ψ'(f) Γ(f, g),
Lψ(f) = ψ'(f) Lf + ψ''(f) Γ(f).

In particular, if f ∈ A is a constant function then Lf = 0 and Γ(f, g) = 0 for all g ∈ A.

Definition 1.9.2. An operator L satisfying (1.4) is called a diffusion generator. A symmetric Markov semigroup whose generator is a diffusion generator is called a diffusion semigroup.


Example 1.9.3. Since we have already seen several examples of diffusion semigroups, let us consider an example which is not a diffusion semigroup. Let µ be a symmetric probability measure on R (i.e. invariant under x ↦ −x) such that ∫_R x² dµ(x) < ∞. Define

(Pt f)(x) = e^{−t} ∑_{k≥0} (t^k / k!) ∫_R f(x + y) dµ^{∗k}(y),

where µ^{∗k} denotes the k-fold convolution of µ with itself, and µ^{∗0} is the unit mass at zero. One can check that this defines a symmetric Markov semigroup with respect to the Lebesgue measure (it is usually known as a compound Poisson semigroup). Then

( Pt f(x) − f(x) ) / t = ((e^{−t} − 1)/t) f(x) + e^{−t} ∑_{k≥1} (t^{k−1} / k!) ∫_R f(x + y) dµ^{∗k}(y).

If f ∈ L2(R, Leb) then the function x ↦ ∫ f(x + y) dµ^{∗k}(y) also belongs to L2(R, Leb), and its norm is at most ‖f‖_2. Comparing the expansion above term by term with ∫_R f(· + y) dµ(y) − f, it follows that for every f ∈ L2(R, Leb) and every 0 < t ≤ 1,

‖ (Pt f − f)/t − ( ∫_R f(· + y) dµ(y) − f ) ‖_2 ≤ C t ‖f‖_2

for an absolute constant C, and the right hand side converges to zero as t → 0. Hence D(L) = L2(R, Leb) and

(Lf)(x) = ∫_R f(x + y) dµ(y) − f(x).

This is certainly not a second-order differential operator, and it also fails to be a diffusion operator. One way to see this is that if L is a diffusion operator and ψ is constant on a neighborhood of f(x) then Lψ(f)(x) = 0; this property clearly does not hold for the operator L above.
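A tiny exact computation makes the failure of locality concrete. Take µ = (δ_{−1} + δ_{+1})/2 in the example above (an arbitrary choice, and using the e^{−t} normalization written there), so that (Lf)(x) = (f(x−1) + f(x+1))/2 − f(x). For a diffusion generator, ψ constant near f(x) forces Lψ(f)(x) = 0; here it does not. Any smooth ψ vanishing on [−1/2, 1/2] and equal to 1 at ±1 gives the same three values as the step function used below; numpy is assumed.

```python
import numpy as np

def L(h):
    """Generator for mu = (delta_{-1} + delta_{+1}) / 2 in the example above."""
    return lambda x: 0.5 * (h(x - 1.0) + h(x + 1.0)) - h(x)

f = lambda x: x                                          # f(0) = 0
psi = lambda u: np.where(np.abs(u) <= 0.5, 0.0, 1.0)     # constant (= 0) near f(0)

print(L(lambda x: psi(f(x)))(0.0))   # 1.0, not 0: L is not local, hence not a diffusion
```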

1.9.1 Dirichlet forms

Definition 1.9.4. A bilinear map E : D(E) × D(E) → R is a Dirichlet form if

(a) D(E) is a dense subset of L2(µ) for some measure µ;

(b) E(f, g) = E(g, f) for all f, g ∈ D(E);

(c) E(f) := E(f, f) ≥ 0 for all f ∈ D(E);


(d) D(E) is complete with respect to the inner product ⟨f, g⟩_E = ∫ fg dµ + E(f, g); and

(e) for every f ∈ D(E), (0 ∨ f ∧ 1) ∈ D(E) and E(0 ∨ f ∧ 1) ≤ E(f).

Note that if we have a symmetric, non-negative bilinear form E defined on some dense D ⊂ L2(µ) then we can always try to extend it to the closure of D under the norm f ↦ (‖f‖²_{L2(µ)} + E(f))^{1/2}. This may be done unambiguously if and only if E is closable on D, in the sense that if fn → 0 in L2(µ) and fn is Cauchy with respect to ⟨·, ·⟩_E then E(fn) → 0.

Now suppose that Γ is a diffusion carre du champ defined on an algebra A that is dense in L2(µ). Suppose also that Γ is non-negative, in the sense that Γ(f, f) ≥ 0 pointwise for all f ∈ A. We may then define a symmetric, non-negative bilinear form E on A by

E(f, g) = ∫ Γ(f, g) dµ.

If we suppose for now that this is closable, we may take D(E) to be the completion of A under ⟨·, ·⟩_E and extend E to D(E) × D(E) as above.

We claim that this E is a Dirichlet form; to do this, it suffices to show property (e). For ε > 0, let ψε be a smooth approximation to the function ψ0(x) = 0 ∨ x ∧ 1; specifically, suppose that 0 ≤ ψε ≤ ψ0 and that ψ'ε converges upwards to 1_{(0,1)}. By the diffusion property of Γ, ψε(f) ∈ A for every ε > 0 and

Γ(ψε(f)) = ψ'ε(f)² Γ(f).

In particular, ψε(f)² converges upwards to ψ0(f)² and Γ(ψε(f)) converges upwards to 1_{f∈(0,1)} Γ(f). It follows that ψε(f) is Cauchy with respect to ⟨·, ·⟩_E, and so its limit in L2(µ) (namely, ψ0(f)) belongs to D(E). Finally, ψ'ε ≤ 1 everywhere; hence

E(ψε(f)) = ∫ ψ'ε(f)² Γ(f) dµ ≤ ∫ Γ(f) dµ = E(f).

Taking ε → 0 shows that E(ψ0(f)) ≤ E(f).

As a final remark on Dirichlet forms, note that a Dirichlet form may always be used to define a non-positive, self-adjoint operator L by setting

∫ g Lf dµ = −E(f, g)

on the domain

D(L) = {f ∈ D(E) : ∃ C such that E(f, g) ≤ C‖g‖_2 for all g ∈ D(E)}.


Indeed, such an operator is clearly non-positive, and it is self-adjoint for the same reason that the Friedrichs extension is self-adjoint. Note that D(E) plays the same role here as H1 played in defining the Friedrichs extension.

1.9.2 Compact Markov diffusion triple

The following definition gives the main setting for the rest of this course. The basic idea is to start with a space, a measure, and a carre du champ operator and build everything up from those. For simplicity, we will restrict ourselves initially to "compact" Markov triples. This does not mean that the space need necessarily be compact (indeed, we will never even put a topology on it). The book [BGL14] on which these notes are based goes into much more detail on various less restrictive settings.

Definition 1.9.5. Let (E, F, µ) be a good measure space, where µ is a probability measure. For A ⊂ L2(µ), let Γ : A × A → A be a symmetric, bilinear form. We say that (E, µ, Γ) is a compact Markov diffusion triple if the following conditions are all satisfied:

(a) A is dense in L2(µ).

(b) A is closed under composition with smooth functions Ψ : Rk → R (in particular, it is an algebra and it contains the constant functions).

(c) Γ(f) = Γ(f, f) ≥ 0 pointwise for all f ∈ A.

(d) Γ is a diffusion carre du champ.

(e) If Γ(f) = 0 then f is constant.

Now define E(f, g) = ∫ Γ(f, g) dµ for f, g ∈ A.

(f) For every f ∈ A, there exists a constant C such that E(f, g) ≤ C‖g‖_{L2(µ)} for all g ∈ A.

It follows that E is closable, and so we may extend it to a Dirichlet form. Let L be the self-adjoint operator defined by ∫_E f Lg dµ = −E(f, g) on the domain where this makes sense. Note that (f) implies A ⊂ D(L). Let Pt = e^{tL}.

(g) LA ⊂ A.

(h) PtA ⊂ A.


Most of the requirements in the definition above should be familiar by now. Part (e) is new; it is called the ergodicity condition. In the case that Γ(f, g) = ⟨∇f, ∇g⟩ on some E ⊂ Rn, it is equivalent to assuming that E is connected. Part (h) is also new, but in the concrete cases we consider it will mostly be easy to show.

As we have already mentioned, the book [BGL14] considers several more general settings. Essentially, it turns out to be desirable to weaken assumption (b) and remove assumption (h). For some motivation, suppose that E = Rn. Under assumption (b), A contains constant functions. Hence, the assumption that A ⊂ L2(µ) can only hold if µ is finite; in particular, the heat semigroup is already excluded. If we restricted the smooth functions Ψ to those satisfying Ψ(0) = 0, we could try to take A = C∞_c(Rn). However, the heat semigroup maps compactly supported functions to fully supported functions, so (h) will fail.

To deal with this issue, [BGL14] introduces two algebras: they begin by defining everything on an "inner" algebra A0 (think of C∞_c(E)) and then later extending everything to an "outer" algebra A (think of C∞(E)). Only the inner algebra A0 is assumed to belong to L2(µ), and it is only assumed to be closed under composition with smooth functions vanishing at zero. Part (h) is assumed for A, but not for A0.

Although the setting with two algebras is rather more general than the one that we present, the main ideas may already be developed in the setting of Definition 1.9.5. Essentially all of the interesting results that we prove will extend to the more general setting, but the proofs usually have some extra technicalities (for example, the assumption that A ⊂ L2(µ) will simplify our life substantially). Therefore, we will restrict this presentation to compact Markov diffusion triples (which we will usually abbreviate to compact Markov triples).

1.10 Properties of diffusion semigroups

1.10.1 The Markov property

We originally constructed semigroups Pt from generators L using spectral theory, from which it wasn't particularly clear how to check if the resulting semigroup is a Markov semigroup. One nice consequence of Definition 1.9.5 is that the resulting semigroup is always a Markov semigroup:

Proposition 1.10.1. Let (E, µ, Γ) be a compact Markov diffusion triple and let Pt be its semigroup. Then Pt is a Markov semigroup.


Proof. First, we show that Pt 1 = 1. For this, note that because of Definition 1.9.5 (d), L1 = 0. Letting {Hλ : λ ≥ 0} be the spectral decomposition of −L, it follows (from Proposition 1.8.3) that 1 ∈ H0. Then ⟨Eλ 1, f⟩ = ⟨1, f⟩ = ∫ f dµ for all λ ≥ 0 and all f ∈ L2(µ); hence,

⟨Pt 1, f⟩ = ⟨E0 1, f⟩ + ∫_0^∞ e^{−λt} d⟨Eλ 1, f⟩ = ⟨1, f⟩.

It follows that Pt 1 = 1.

In order to show that Pt preserves non-negativity, we will show the equivalent statement that Pt f ≤ 1 whenever f ≤ 1. First, introduce the resolvent operator Rλ = (λ Id − L)^{−1}, which may be defined using spectral theory as ψλ(−L) where ψλ(x) = 1/(λ + x). Since

∑_{k≥0} (t^k/k!) (λ²/(λ + x))^k = ∑_{k≥0} (t^k/k!) (λ − x/(1 + x/λ))^k = exp( tλ − tx/(1 + x/λ) ),

it follows that

lim_{λ→∞} e^{−tλ} ∑_{k≥0} (t^k/k!) (λ² Rλ)^k = Pt.

Hence, it suffices to show that λ Rλ f ≤ 1 whenever f ≤ 1. Let g = λ Rλ f, so that λg = Lg + λf. Note that such a g is automatically in D(L) (and hence also in D(E)). Now let ψ : R → R be a smooth function satisfying ψ(0) = 0 and ‖ψ'‖∞ ≤ 1. Then

λ ∫ g (g − ψ(g)) dµ = ∫ Lg (g − ψ(g)) dµ + λ ∫ f (g − ψ(g)) dµ.

Now,

∫ Lg (g − ψ(g)) dµ = − ∫ Γ(g, g − ψ(g)) dµ = − ∫ (1 − ψ'(g)) Γ(g, g) dµ ≤ 0.

Hence,

∫ (f − g)(g − ψ(g)) dµ ≥ 0.

Now take ψ to approach the function x ↦ x ∧ 1; we obtain

∫ (1 − g)(g − g ∧ 1) dµ ≥ ∫ (f − g)(g − g ∧ 1) dµ ≥ 0,

and so it must be that g ≤ 1 µ-almost surely.


1.10.2 Ergodicity

Those who are familiar with the notion of ergodicity in other settings may have been surprised that we used the same word to describe part (e) of Definition 1.9.5. However, this assumption does turn out to be closely related to a more standard usage of the term. Indeed, Definition 1.9.5 (e) is the main property driving the convergence in the following proposition:

Proposition 1.10.2. For a compact Markov triple (E, µ, Γ) with semigroup Pt and any f ∈ L2(µ), Pt f → ∫_E f dµ in L2(µ) as t → ∞.

Proof. Let L be the generator of Pt, and recall that Lg = 0 for all constant functions g. Definition 1.9.5 (e) implies that only constant functions belong to ker L. That is, if {Hλ} is the spectral decomposition of −L then (by Proposition 1.8.3) H0 is the space of constant functions. In particular, E0 f is the constant function x ↦ ∫_E f dµ.

By the right-continuity of the spectral decomposition, Eλf → E0f asλ → 0. Fix f and ε > 0, and choose λε > 0 small enough so that ‖Eλεf −E0f‖ ≤ ε. According to the definition of Pt,

〈P2tf, f〉 = 〈E0f, f〉+

∫ ∞0

e−2tλd〈Eλf, f〉.

Since d〈Eλf, f〉 is a positive measure, we may insert the bound e−2tλ ≤ 1for λ ≤ λε and e−2tλ ≤ e−2tλε otherwise. Then

〈P2tf, f〉 ≤ 〈Eλεf, f〉+ e−2tλε(‖f‖2 − 〈Eλεf, f〉) ≤ ‖E0f‖2 + ε+ e−2tλε‖f‖2.

On the other hand, the reversibility of µ with respect to Pt implies that〈P2tf, f〉 = ‖Ptf‖2, and so

lim supt→∞

‖Ptf‖2 ≤ ‖E0f‖2 + ε.

Since ε > 0 was arbitrary, we have lim sup ‖Ptf‖2 ≤ ‖E0f‖2. On the otherhand, the definition of Ptf also implies that

〈Ptf,E0f〉 = 〈E0f,E0f〉+

∫ ∞0

e−tλd〈Eλf,E0f〉,

where the integral is zero because 〈Eλf,E0f〉 is constant for λ > 0. Hence,〈Ptf,E0f〉 = ‖E0f‖2 for all t. It follows that ‖Ptf‖ ≥ ‖E0f‖ for all t; hence,limt→∞ ‖Ptf‖ = ‖E0f‖ and Ptf → E0f .


1.10.3 Examples

Example 1.10.3 (The heat semigroup on Rn). The triple (Rn, Leb, Γ) corresponding to the heat semigroup is not a compact Markov triple because Leb is not a finite measure.

Example 1.10.4 (The wrapped heat semigroup). Take E = [0, 1] and let µ be the Lebesgue measure. Let A be the set of f ∈ C∞[0, 1] satisfying f(0) = f(1), and let Γ(f, g) = f′g′ for f, g ∈ A. Then (E, µ, Γ) is a compact Markov triple. All of the properties are immediate, except possibly for (h), which follows from our earlier explicit form of Pt.

In this example, we can write E and L (on their whole domains) explicitly in terms of Fourier series: write the sine coefficients as f_k and the cosine coefficients as f̃_k, so that

f(x) = f_0 + ∑_{k≥1} f_k sin(2πkx) + f̃_k cos(2πkx),

and similarly for g, where the sum converges in L2. In order to compute the Fourier expansion of f′, note that

∫_0^1 f′(x) cos(2πkx) dx = 2πk ∫_0^1 f(x) sin(2πkx) dx,

and similarly for ∫_0^1 f′(x) sin(2πkx) dx. Then we have

f′(x) = 2π ∑_{k≥1} k f_k cos(2πkx) − k f̃_k sin(2πkx).

Since the terms cos(2πkx) and sin(2πkx) are all orthogonal and have L2 norm 1/√2,

E(f, g) = 2π² ∑_{k≥1} k² f_k g_k + k² f̃_k g̃_k.

It is clear from this expression that the domain of E is the set of f ∈ L2(µ) such that

∑_{k≥1} k² (f_k² + f̃_k²) < ∞.

Recall that D(L) is the set of f ∈ D(E) such that there is a constant C for which |E(f, g)| ≤ C‖g‖2 for all g ∈ D(E). From the expression for E above and the Cauchy-Schwarz inequality, D(L) is the set of f such that

∑_{k≥1} k⁴ (f_k² + f̃_k²) < ∞.


On this set,

Lf(x) = −4π² ∑_{k≥1} k² (f_k sin(2πkx) + f̃_k cos(2πkx)).

Example 1.10.5 (The reflected heat semigroup). The reflected heat semigroup is similar to the wrapped one, except that we let A be the set of f ∈ C∞[0, 1] satisfying f′(0) = f′(1) = 0. This leads to some differences in E and L; for example, the difference in boundary conditions leads to a different relationship between the Fourier expansions of f and f′:

f′(x) = ∑_{k≥1} (f(1) − f(0) + 2πk f_k) cos(2πkx) − 2πk f̃_k sin(2πkx).

Hence, the domain of E becomes the set of f ∈ L2(µ) such that there exists some a ∈ R with

∑_{k≥1} (a + k f_k)² + k² f̃_k² < ∞.

We leave the rest of this example as an exercise.

Example 1.10.6 (The Ornstein-Uhlenbeck semigroup on R). The Ornstein-Uhlenbeck semigroup on R is associated with the triple (R, γ1, Γ), where γ1 is the standard Gaussian measure on R and Γ(f, g) = f′g′. We take A to be the set of smooth functions whose derivatives of all orders decay faster than any polynomial as |x| → ∞. One can check that A is preserved under Pt by using the explicit formula for Pt that we gave earlier. Recall that Lf(x) = f″(x) − xf′(x).

Like the wrapped heat semigroup, the Ornstein-Uhlenbeck semigroup may be studied using explicit spectral expansions. For this, we introduce the Hermite polynomials, defined by

h_k(x) = (−1)^k e^{x²/2} (d^k/dx^k) e^{−x²/2},

or equivalently in terms of the generating function

e^{xt − t²/2} = ∑_{k=0}^∞ h_k(x) t^k / k!.

Note that h_0(x) = 1, h_1(x) = x, h_2(x) = x² − 1, and that in general h_k is a polynomial of degree k. One can check that {h_k : k ≥ 0} is an orthogonal basis of L2(R, γ1), and that ‖h_k‖2² = k!. The h_k are also eigenfunctions of L:

Lh_k(x) = h″_k(x) − x h′_k(x) = −k h_k(x).


It follows that if f = ∑_{k≥0} f_k h_k/√k! then

Ptf = ∑_{k≥0} e^{−tk} f_k h_k/√k!,
Lf = −∑_{k≥0} k f_k h_k/√k!,
E(f) = ∑_{k≥0} k f_k²,

where f ∈ D(L) whenever ∑ k² f_k² < ∞ and f ∈ D(E) whenever ∑ k f_k² < ∞.
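The eigenvalue relation Lh_k = −k h_k can be checked numerically; the following minimal sketch (assuming NumPy is available) uses NumPy's probabilists' Hermite basis hermite_e, which matches the h_k defined above.

```python
# Check that h_k'' - x h_k' = -k h_k at a few sample points, for k = 1, ..., 5.
import numpy as np
from numpy.polynomial import hermite_e as He

x = np.linspace(-3.0, 3.0, 7)
for k in range(1, 6):
    c = np.zeros(k + 1)
    c[k] = 1.0                                 # coefficients of h_k in the He basis
    h = He.hermeval(x, c)                      # h_k(x)
    h1 = He.hermeval(x, He.hermeder(c, 1))     # h_k'(x)
    h2 = He.hermeval(x, He.hermeder(c, 2))     # h_k''(x)
    assert np.allclose(h2 - x * h1, -k * h, atol=1e-8)
print("Lh_k = -k h_k verified for k = 1..5")
```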

1.11 Γ2, curvature, and dimension

Recall that Γ may be written in terms of L as

2Γ(f, g) = L(fg) − fLg − gLf.

We define a bilinear form Γ2 by a very similar formula, but where multiplication is replaced by Γ.

Definition 1.11.1. For f, g ∈ A, define

2Γ2(f, g) = LΓ(f, g) − Γ(f, Lg) − Γ(g, Lf).

By similar formulas, one can define a whole sequence Γk of bilinear forms. It isn't clear how useful this is, although Γ3 has recently received some attention [LNP15]. If Γ is a diffusion carre du champ then one can derive a chain rule for Γ2. We will explore the general case later, but the single-variable version is

Γ2(ψ(f)) = ψ′(f)² Γ2(f) + ψ′(f)ψ″(f) Γ(f, Γ(f)) + ψ″(f)² Γ(f)²

for every smooth ψ : R → R.
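This chain rule can be verified symbolically in a concrete model; the following sketch (assuming SymPy is available) checks it for the one-dimensional Ornstein-Uhlenbeck operator Lf = f″ − xf′ with Γ(f, g) = f′g′ and the particular choice ψ(u) = u³, with f left as a generic smooth function.

```python
# Symbolic check of the single-variable Gamma_2 chain rule in the 1-d OU model.
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)

def L(u):                 # Ornstein-Uhlenbeck generator
    return sp.diff(u, x, 2) - x * sp.diff(u, x)

def Gamma(u, v):          # carre du champ
    return sp.diff(u, x) * sp.diff(v, x)

def Gamma2(u, v):
    return sp.Rational(1, 2) * (L(Gamma(u, v)) - Gamma(u, L(v)) - Gamma(v, L(u)))

# psi(u) = u**3, so psi'(f) = 3 f**2 and psi''(f) = 6 f
lhs = Gamma2(f**3, f**3)
rhs = ((3 * f**2)**2 * Gamma2(f, f)
       + (3 * f**2) * (6 * f) * Gamma(f, Gamma(f, f))
       + (6 * f)**2 * Gamma(f, f)**2)
print(sp.simplify(lhs - rhs))   # prints 0
```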

Definition 1.11.2. For ρ ∈ R and n ∈ [1,∞], a compact Markov triple (E, µ, Γ) is said to satisfy the curvature-dimension condition CD(ρ, n) if

Γ2(f) ≥ ρ Γ(f) + (1/n) (Lf)²

µ-almost everywhere for every f ∈ A.


The motivation for Definition 1.11.2 is probably unclear so far. As we will show later, if (E, µ, Γ) is the compact Markov triple corresponding to the heat semigroup on a compact n-dimensional Riemannian manifold with Ricci curvature bounded below by ρ, then (E, µ, Γ) satisfies CD(ρ, n); this is the motivation for the terminology.

Note that CD(ρ, n) trivially implies CD(ρ′, n′) for every ρ′ ≤ ρ and n′ ≥ n. However, there is not necessarily a best possible ρ or a best possible n for a given Markov triple: it could be that one can increase ρ at the cost of decreasing n, or vice versa.

1.11.1 Examples

Example 1.11.3 (The heat semigroup on Rn). Technically, we defined CD(ρ, n) only for compact Markov triples, and the heat semigroup on Rn is not a compact Markov triple. However, there is no obstacle in extending Definition 1.11.2 to cover this case. Since Γ(f, g) = ⟨∇f,∇g⟩ and Lf = ∆f, we have

2Γ2(f, g) = ∆⟨∇f,∇g⟩ − ⟨∇f,∇∆g⟩ − ⟨∇g,∇∆f⟩ = 2⟨∇²f,∇²g⟩,

where ∇²f denotes the matrix (∂i∂jf)_{i,j=1}^n and ⟨A, B⟩ = tr(A^T B) for n × n matrices A and B. Hence, the CD(ρ, k) condition becomes

‖∇²f‖₂² ≥ ρ‖∇f‖₂² + (1/k)(∆f)².     (1.5)

For any ρ > 0, this condition does not hold for all f ∈ C∞_c (for example, take f to be linear on some neighborhood, so that ∇f is non-zero there but ∇²f and Lf vanish). As for k, we have the inequality ‖A‖₂² ≥ (1/n) tr(A)² for any n × n matrix A. Since Lf = tr ∇²f, it follows that (1.5) is satisfied for ρ = 0 and k = n; hence the heat semigroup on Rn satisfies CD(0, n). Note also that n is the smallest value of k for which (1.5) holds, as one sees by considering a function f satisfying f(x) = |x|² on some neighborhood.

Example 1.11.4 (The Ornstein-Uhlenbeck semigroup on Rn). For the Ornstein-Uhlenbeck semigroup, Γ(f, g) = ⟨∇f,∇g⟩ and Lf = ∆f − ⟨x,∇f⟩. Compared to the computation for the heat semigroup on Rn, the only new contribution comes from the term ⟨x,∇f⟩ in L. Hence,

2Γ2(f, g) = 2⟨∇²f,∇²g⟩ − ⟨x,∇⟨∇f,∇g⟩⟩ + ⟨∇f,∇⟨x,∇g⟩⟩ + ⟨∇g,∇⟨x,∇f⟩⟩ = 2⟨∇²f,∇²g⟩ + 2⟨∇f,∇g⟩.


In particular, Γ2(f) = ‖∇²f‖₂² + Γ(f), and so it is immediately obvious that this satisfies CD(1,∞). By choosing f(x) = x1 (or more precisely, choosing a sequence of functions in A that equal x1 on larger and larger compact sets), we have Γ2(f) = 1, Γ(f) = 1, and Lf = −x1; since Lf is unbounded, the Ornstein-Uhlenbeck semigroup does not satisfy CD(ρ, n) for any finite n: CD(1,∞) is the best possible curvature-dimension condition.
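The identity 2Γ2(f, g) = 2⟨∇²f,∇²g⟩ + 2⟨∇f,∇g⟩ can be confirmed by a short symbolic computation; here is a sketch (assuming SymPy is available) in dimension two, with f and g generic smooth functions.

```python
# Symbolic check of Gamma_2(f,g) = <Hess f, Hess g> + <grad f, grad g> for the
# Ornstein-Uhlenbeck generator L = Laplacian - <x, grad> on R^2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = (x1, x2)
f = sp.Function('f')(x1, x2)
g = sp.Function('g')(x1, x2)

def L(u):
    return sum(sp.diff(u, xi, 2) for xi in X) - sum(xi * sp.diff(u, xi) for xi in X)

def Gamma(u, v):
    return sum(sp.diff(u, xi) * sp.diff(v, xi) for xi in X)

def Gamma2(u, v):
    return sp.Rational(1, 2) * (L(Gamma(u, v)) - Gamma(u, L(v)) - Gamma(v, L(u)))

hess_inner = sum(sp.diff(f, xi, xj) * sp.diff(g, xi, xj) for xi in X for xj in X)
print(sp.simplify(Gamma2(f, g) - (hess_inner + Gamma(f, g))))   # prints 0
```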

Example 1.11.5 (The ultraspheric semigroup on (−1, 1)). For n ≥ 1, consider the operator Ln that acts on C∞[−1, 1] as

(Lnf)(x) = (1 − x²) f″(x) − n x f′(x).

The corresponding carre du champ operator is Γ(f, g) = (1 − x²) f′g′. One can easily see using integration by parts that

∫_{−1}^{1} (1 − x²)^{n/2−1} (Lnf)(x) dx = 0

for every smooth f. Hence, this defines a compact Markov triple with invariant measure dµ(x) = Cn (1 − x²)^{n/2−1} dx, where Cn is a normalizing constant. As we will see later, this triple can be obtained by starting with the heat semigroup on the sphere Sⁿ and projecting it onto an axis. For now, we just consider it as an example of the curvature-dimension condition.

After some slightly tedious computations, one can check that

Γ2(f) = (n − 1) Γ(f) + (1/n) (Lnf)² + (1 − 1/n) (1 − x²)² (f″)².

Hence, this triple satisfies CD(n − 1, n) (and this is the best possible in either parameter).
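The "slightly tedious computations" can be delegated to a computer algebra system; the following sketch (assuming SymPy is available, with n kept symbolic) verifies the displayed Γ2 identity for the ultraspheric operator.

```python
# Symbolic verification of the Gamma_2 identity for L_n f = (1-x^2) f'' - n x f'.
import sympy as sp

x, n = sp.symbols('x n')
f = sp.Function('f')(x)

def L(u):
    return (1 - x**2) * sp.diff(u, x, 2) - n * x * sp.diff(u, x)

def Gamma(u, v):
    return (1 - x**2) * sp.diff(u, x) * sp.diff(v, x)

def Gamma2(u):
    return sp.Rational(1, 2) * (L(Gamma(u, u)) - 2 * Gamma(u, L(u)))

rhs = ((n - 1) * Gamma(f, f)
       + L(f)**2 / n
       + (1 - 1/n) * (1 - x**2)**2 * sp.diff(f, x, 2)**2)
print(sp.simplify(Gamma2(f) - rhs))   # prints 0
```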


Chapter 2

Poincare inequalities

Curvature-dimension conditions turn out to imply many important inequalities. We will begin with the observation that a CD(ρ,∞) condition is equivalent to exponential decay of various quantities along the semigroup Pt.

Proposition 2.0.1. Let Pt be the semigroup of a compact Markov triple. The following are equivalent:

(a) the curvature-dimension condition CD(ρ,∞) holds for some ρ ∈ R;

(b) for every f ∈ A and every t ≥ 0,

Γ(Ptf) ≤ e^{−2ρt} PtΓ(f);

(c) for every f ∈ A and every t ≥ 0,

Pt(f²) − (Ptf)² ≤ ((1 − e^{−2ρt})/ρ) PtΓ(f);

(d) for every f ∈ A and every t ≥ 0,

Pt(f²) − (Ptf)² ≥ ((e^{2ρt} − 1)/ρ) Γ(Ptf).

In the last two conditions, if ρ = 0 then we interpret (1 − e^{−2ρt})/ρ = (e^{2ρt} − 1)/ρ = 2t.


Proof. To see that (a) implies (b), choose f ∈ A and define

Λ(s) = e^{−2ρs} PsΓ(Pt−sf).

Applying the chain rule and the definition of Γ2,

Λ′(s) = 2e^{−2ρs} Ps(Γ2(Pt−sf) − ρΓ(Pt−sf)).

Under the CD(ρ,∞) assumption, Λ′(s) is non-negative pointwise and so Λ(t) ≥ Λ(0), which is equivalent to claim (b).

To prove that (b) implies (c), choose f ∈ A and define

Λ(s) = Ps(Pt−sf)².

By the chain rule and the definition of Γ, Λ′(s) = 2PsΓ(Pt−sf). Hence,

Λ(t) − Λ(0) = 2 ∫_0^t PsΓ(Pt−sf) ds ≤ 2 ∫_0^t e^{−2ρ(t−s)} PtΓ(f) ds = ((1 − e^{−2ρt})/ρ) PtΓ(f),

where the inequality follows from (b). This proves (c).

The proof that (b) implies (d) is very similar, except that instead of the inequality PsΓ(Pt−sf) ≤ e^{−2ρ(t−s)} PtΓ(f) that we used above, we use the inequality PsΓ(Pt−sf) ≥ e^{2ρs} Γ(Ptf). This yields

Λ(t) − Λ(0) ≥ 2 ∫_0^t e^{2ρs} Γ(Ptf) ds = ((e^{2ρt} − 1)/ρ) Γ(Ptf).

Now let us show that (c) implies (a). Note that the definition of L allows us to write

Pth = h + tLh + (t²/2) L²h + o(t²)     (2.1)

as t → 0 for any h ∈ A. Recall that the definition of L involves L2(µ) convergence, which implies pointwise convergence µ-almost everywhere along a subsequence. Hence, the equation above makes sense pointwise µ-almost everywhere. Applying (2.1) for h = f and h = f²,

Pt(f²) − (Ptf)² = tL(f²) + (t²/2) L²(f²) − 2tfLf − t²(Lf)² − t²fL²f + o(t²)
= 2tΓ(f) + (t²/2) L²(f²) − t²(Lf)² − t²fL²f + o(t²).

On the other hand,

((1 − e^{−2ρt})/ρ) PtΓ(f) = 2tΓ(f) − 2ρt²Γ(f) + 2t²LΓ(f) + o(t²).

Assuming (c) and taking t → 0, we have

(1/2) L²(f²) − (Lf)² − fL²f ≤ −2ρΓ(f) + 2LΓ(f).

Now, note that the left hand side above can also be written as 2Γ(f, Lf) + LΓ(f). After rearranging, we conclude that

LΓ(f) − 2Γ(f, Lf) ≥ 2ρΓ(f),

which is exactly the CD(ρ,∞) condition.

The proof that (d) implies (a) is analogous, and we omit it.
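For the Ornstein-Uhlenbeck semigroup (where ρ = 1 and Pt has an explicit Gaussian kernel), the local Poincare inequality in part (c) can be illustrated numerically; here is a minimal sketch (assuming NumPy is available; the test function and the chosen t, x are ours) based on Gauss-Hermite quadrature.

```python
# Check P_t(f^2)(x) - (P_t f)^2(x) <= (1 - e^{-2t}) * P_t(Gamma(f))(x) for the
# 1-d Ornstein-Uhlenbeck semigroup, where Gamma(f) = (f')^2 and rho = 1.
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(80)   # quadrature for exp(-y^2)
y = np.sqrt(2.0) * nodes                               # standard Gaussian nodes
w = weights / np.sqrt(np.pi)                           # standard Gaussian weights

def Pt(h, t, x):
    """(P_t h)(x) for the OU semigroup, via its Gaussian kernel representation."""
    return np.sum(w * h(np.exp(-t) * x + np.sqrt(1 - np.exp(-2 * t)) * y))

f = lambda u: np.sin(u) + 0.3 * u**2
fp = lambda u: np.cos(u) + 0.6 * u                     # f'

t, x = 0.7, 1.3
lhs = Pt(lambda u: f(u)**2, t, x) - Pt(f, t, x)**2
rhs = (1 - np.exp(-2 * t)) * Pt(lambda u: fp(u)**2, t, x)
print(lhs, rhs, lhs <= rhs)    # the local variance sits below the bound
```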

We will study part (c) of Proposition 2.0.1 in much more detail. To begin with, suppose that ρ > 0 and take the limit as t → ∞. By ergodicity (Proposition 1.10.2), Pth → ∫ h dµ as t → ∞, and so

∫ f² dµ − (∫ f dµ)² ≤ (1/ρ) ∫ Γ(f) dµ.

We call such an inequality a Poincare inequality.

Definition 2.0.2. Let µ be a probability measure and E be a Dirichlet form on L2(µ). We say that µ and E satisfy a Poincare inequality with constant C if

∫ f² dµ − (∫ f dµ)² ≤ C E(f)     (2.2)

for every f ∈ D(E). The Poincare constant of µ and E is the smallest C such that (2.2) holds for every f ∈ D(E).

Usually, we will apply Definition 2.0.2 in the context of a compact Markov triple. In this case, since A is dense in D(E) with respect to ⟨·, ·⟩_E, it suffices to check (2.2) for all f ∈ A. We will also sometimes say that µ and Γ satisfy a Poincare inequality, in which case it is implied that E(f) = ∫ Γ(f) dµ.

As we have argued above, if a compact Markov triple satisfies CD(ρ,∞) then it satisfies a Poincare inequality with constant 1/ρ. In fact, for finite t Proposition 2.0.1 gives something more precise, which we will call a local Poincare inequality for the semigroup Pt.


Corollary 2.0.3. The compact Markov triple (E, µ, Γ) satisfies CD(ρ,∞) if and only if for every t ≥ 0 and µ-almost every x ∈ E, the measure pt(x, ·) satisfies a Poincare inequality with constant (1 − e^{−2ρt})/ρ, where pt is the kernel representation of the semigroup Pt.

2.1 Poincare inequalities under CD(ρ, n)

Recall that if n < ∞ then the condition CD(ρ, n) is more restrictive than the condition CD(ρ,∞). It turns out that under CD(ρ, n) for ρ > 0 and n ∈ (1,∞), the finiteness of n lets us get a better Poincare inequality than we did under CD(ρ,∞). When specialized to the case of the heat semigroup on a compact Riemannian manifold, this is known as Lichnerowicz's theorem.

Theorem 2.1.1. Let (E, µ, Γ) be a compact Markov triple. If the curvature-dimension condition CD(ρ, n) holds for some ρ > 0 and n > 1, then µ satisfies a Poincare inequality with constant C = (n − 1)/(ρn).

Lemma 2.1.2. Suppose that the compact Markov triple (E, µ, Γ) satisfies CD(ρ,∞) for some ρ > 0. Then µ satisfies a Poincare inequality with constant C > 0 if and only if

∫_E Γ(f) dµ ≤ C ∫_E (Lf)² dµ     (2.3)

for all f ∈ D(L).

Proof. Fix f ∈ A and consider the function Λ(t) = ∫_E (Ptf)² dµ. Then

Λ′(t) = −2 ∫_E Γ(Ptf) dµ

and

Λ″(t) = −4 ∫_E Γ(Ptf, LPtf) dµ = 4 ∫_E (LPtf)² dµ.

Under the assumption (2.3), we have Λ″(t) ≥ −(2/C) Λ′(t) for all t ≥ 0. Hence,

∫ f² dµ − (∫ f dµ)² = −∫_0^∞ Λ′(t) dt ≤ (C/2) ∫_0^∞ Λ″(t) dt = −(C/2) Λ′(0) = C ∫_E Γ(f) dµ.


Here, we are implicitly using the CD(ρ,∞) condition in order to conclude that Γ(Ptf) decays exponentially in t, so that all of the integrals converge. This proves one of the claimed directions under the assumption that f ∈ A; for f ∈ D(L), we use a standard approximation argument.

For the converse, suppose that µ satisfies a Poincare inequality with constant C, and fix some f ∈ D(L) with mean zero. By the Cauchy-Schwarz inequality,

∫_E Γ(f) dµ = ∫_E f(−Lf) dµ ≤ √(∫_E f² dµ ∫_E (Lf)² dµ) ≤ √(C ∫_E Γ(f) dµ ∫_E (Lf)² dµ);

rearranging gives (2.3).

Proof of Theorem 2.1.1. Integrating the CD(ρ, n) condition yields

∫_E Γ2(f) dµ ≥ ρ ∫_E Γ(f) dµ + (1/n) ∫_E (Lf)² dµ.

Since ∫_E Lh dµ = 0 for every h ∈ A, we have

∫_E Γ2(f) dµ = (1/2) ∫_E LΓ(f) dµ − ∫_E Γ(f, Lf) dµ
= (1/2) ∫_E LΓ(f) dµ − (1/2) ∫_E L(fLf) dµ + (1/2) ∫_E (Lf)² + fL²f dµ
= ∫_E (Lf)² dµ.

Applying this to the integrated CD(ρ, n) condition, we have

((n − 1)/n) ∫_E (Lf)² dµ ≥ ρ ∫_E Γ(f) dµ.

Finally, apply Lemma 2.1.2 to this inequality.

Note that (unlike in the CD(ρ,∞) case) we do not obtain a characterization of CD(ρ, n) in terms of any Poincare inequality. To see that there is no such characterization, recall that the Ornstein-Uhlenbeck semigroup satisfies CD(1,∞) (and hence also a family of local Poincare inequalities), but not CD(ρ, n) for any n < ∞.


2.2 Spectral gap

Definition 2.2.1. Let A be a self-adjoint operator on a Hilbert space H. Its spectrum σ(A) is the set of λ ∈ R such that A − λ Id does not have a bounded inverse.

Usually, one defines the spectrum over the complex numbers for operators acting on complex Hilbert spaces; it is then possible to show that self-adjoint operators have an entirely real spectrum. To save time, we will skip this step and only define the real spectrum for self-adjoint operators.

Exercise 2.2.1. If A is self-adjoint and non-negative then σ(A) ⊂ R+.

One easily sees that every eigenvalue of A belongs to its spectrum: if Af = λf for f ≠ 0 then (A − λ Id)f = 0 and so A − λ Id certainly does not have a bounded inverse. The converse is not true – not every λ ∈ σ(A) is necessarily an eigenvalue – but an approximate converse is true.

Lemma 2.2.2. If A is self-adjoint then λ ∈ σ(A) if and only if there is some sequence fn ∈ D(A) such that ‖fn‖ = 1 for all n but ‖(A − λ Id)fn‖ → 0.

In the case that λ is an eigenvalue of A, one can just take the constant sequence fn = f, where f is a normalized eigenfunction with eigenvalue λ.

Proof. One direction is clear: if there is a sequence fn as in the lemma then any inverse of A − λ Id could not possibly be bounded; hence λ ∈ σ(A).

To show the other direction, fix λ ∈ σ(A) and let R denote the range of A − λ Id. One way that A − λ Id could fail to have an inverse is that it could fail to be injective. In that case, A − λ Id has a non-trivial null space, and so λ is an eigenvalue of A. As we already discussed, this implies the existence of a sequence fn as above.

It remains to consider the case that A − λ Id is injective. In this case, we claim that R is dense in H. Indeed, suppose that h ∈ R⊥. Then for every f ∈ D(A), ⟨h, (A − λ Id)f⟩ = 0. It follows that h ∈ D(A*) = D(A) and that

0 = ⟨h, (A − λ Id)f⟩ = ⟨(A − λ Id)h, f⟩

for every f ∈ D(A); hence, (A − λ Id)h = 0. Since A − λ Id was assumed injective, it follows that h = 0 and R⊥ = {0}. In particular, R is dense in H.

Since A − λ Id is injective, we can certainly define an inverse (A − λ Id)^{−1} : R → H. Suppose first that this inverse is bounded on R. For any g ∈ H, the density of R implies we may choose a sequence gn ∈ R such that gn → g.


Setting fn = (A − λ Id)^{−1}gn, the boundedness of (A − λ Id)^{−1} implies that the sequence fn converges, to f say. Since A is a closed operator and both fn and (A − λ Id)fn converge, it follows that f = lim fn ∈ D(A) and

(A − λ Id)f = lim_{n→∞} (A − λ Id)fn = lim_{n→∞} gn = g.

Hence, g ∈ R and so R = H. In particular, (A − λ Id)^{−1} is a bounded operator on all of H, contradicting our assumption that λ ∈ σ(A).

To conclude, λ ∈ σ(A) implies that the operator (A − λ Id)^{−1} : R → H is unbounded. Hence, there exist gn ∈ R with ‖gn‖ = 1 such that ‖(A − λ Id)^{−1}gn‖ → ∞. Setting fn to be (A − λ Id)^{−1}gn (renormalized to have norm 1) gives the desired sequence.

Proposition 2.2.3. If the compact Markov triple (E, µ, Γ) satisfies a Poincare inequality with constant C then σ(L) is contained in (−∞, −1/C] ∪ {0}.

The conclusion of Proposition 2.2.3 is known as a spectral gap, because it says that there is a gap between the largest and second-largest elements of the spectrum. Note that 0 ∈ σ(L) because L1 = 0.

Proof. Take λ ∈ σ(L) with λ ≠ 0. By Lemma 2.2.2, we may choose a sequence fn ∈ D(L) such that ‖fn‖2 = 1 and ‖Lfn − λfn‖2 → 0. Since ∫_E Lfn dµ = 0 for every n, it follows in particular that ∫_E fn dµ → 0 as n → ∞. Now, the Poincare inequality yields

∫_E fn² dµ − (∫_E fn dµ)² ≤ C ∫_E Γ(fn) dµ = −C ∫_E fnLfn dµ.

As n → ∞, the left hand side converges to 1. On the right hand side, we may replace Lfn in the limit by λfn; hence, the right hand side converges to −Cλ. We conclude that λ ≤ −1/C.

2.3 Decay of variance

For a probability measure µ and a function f ∈ L2(µ), define

Varµ(f) = ∫_E f² dµ − (∫_E f dµ)².

Note that Varµ(f) is the squared distance of f from the constant function with value ∫_E f dµ. Recall from Proposition 1.10.2 that Ptf → ∫_E f dµ in L2(µ) whenever Pt is the semigroup of a compact Markov triple (this was essentially the meaning of ergodicity). A Poincare inequality can equivalently be understood as a rate of convergence in this limit.


Proposition 2.3.1. The compact Markov triple (E, µ, Γ) satisfies a Poincare inequality with constant C if and only if

Varµ(Ptf) ≤ e^{−2t/C} Varµ(f)     (2.4)

for every f ∈ L2(µ) and every t ≥ 0.

Proof. Suppose first that (E, µ, Γ) satisfies a Poincare inequality with constant C, and choose some f ∈ A. Since µ is invariant for Pt, ∫_E Ptf dµ is constant in t. On the other hand,

(d/dt) ∫_E (Ptf)² dµ = 2 ∫_E Ptf LPtf dµ = −2E(Ptf).

Now define Λ(t) = e^{2t/C} Varµ(Ptf). Then

Λ′(t) = e^{2t/C} ((2/C) Varµ(Ptf) − 2E(Ptf)) ≤ 0,

and so Λ(t) ≤ Λ(0) for all t, which proves (2.4). So far we assumed f ∈ A, but any f ∈ L2(µ) may be reached by approximation.

For the other direction, we consider a Taylor expansion of Varµ(Ptf): if f ∈ A satisfies ∫_E f dµ = 0 then Ptf = f + tLf + o(t), and so

Varµ(Ptf) = ∫_E f² dµ + 2t ∫_E fLf dµ + o(t).

On the other hand,

e^{−2t/C} Varµ(f) = (1 − 2t/C + o(t)) Varµ(f).

Under the assumption (2.4), we may send t → 0 to obtain

2 ∫_E fLf dµ ≤ −(2/C) Varµ(f),

which is just a re-arrangement of the Poincare inequality.
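In the Hermite expansion of the Ornstein-Uhlenbeck semigroup from Example 1.10.6, the decay of variance is completely explicit; the following sketch (assuming NumPy is available; the coefficients are random) illustrates (2.4) with C = 1.

```python
# For f = sum_k f_k h_k / sqrt(k!), the OU semigroup gives
#   Var(P_t f) = sum_{k>=1} e^{-2kt} f_k^2 <= e^{-2t} sum_{k>=1} f_k^2 = e^{-2t} Var(f).
import numpy as np

rng = np.random.default_rng(1)
fk = rng.normal(size=8)            # coefficients f_1, ..., f_8 (f_0 is irrelevant)
k = np.arange(1, 9)
for t in (0.1, 0.5, 2.0):
    var_t = np.sum(np.exp(-2 * k * t) * fk**2)
    bound = np.exp(-2 * t) * np.sum(fk**2)
    print(t, var_t, bound, var_t <= bound)
```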

2.4 Examples

Example 2.4.1 (The Ornstein-Uhlenbeck semigroup on Rn). Recall that the Ornstein-Uhlenbeck semigroup on Rn may be realized as the semigroup of a CD(1,∞) compact Markov diffusion with respect to the measure γn. In particular, the Poincare inequality under CD(1,∞) (see the discussion following Proposition 2.0.1) implies that γn satisfies a Poincare inequality with constant 1. The constant 1 is sharp, as may be seen by considering linear functions: the function f(x) = x1 satisfies

Varγn(f) = ∫ x1² dγn(x) = 1 = ∫ 1 dγn = ∫ Γ(f) dγn.

It is of course no coincidence that f(x) = x1 is an eigenfunction of L with eigenvalue −1.

Example 2.4.2 (The ultraspheric semigroup). Recall that the ultraspheric generator (Lnf)(x) = (1 − x²)f″(x) − nxf′(x) on (−1, 1) satisfies the curvature-dimension condition CD(n − 1, n). According to Theorem 2.1.1, it satisfies a Poincare inequality with constant 1/n. This constant is sharp, as can be seen from the example f(x) = x, which is an eigenfunction of Ln with eigenvalue −n.

Example 2.4.3 (The wrapped heat semigroup). The wrapped heat semigroup satisfies CD(0, 1); therefore Theorem 2.1.1 tells us nothing about it. However, one can nevertheless prove a Poincare inequality by using the Fourier expansion developed in Example 1.10.4. In particular, for any f ∈ D(E), we have

Var(f) = (1/2) ∑_{k≥1} f_k² + f̃_k²,
E(f) = 2π² ∑_{k≥1} k² (f_k² + f̃_k²).

In particular, Var(f) ≤ (1/(4π²)) E(f), and so the wrapped heat semigroup satisfies a Poincare inequality with constant 1/(4π²). The example f(x) = cos(2πx) shows that this constant is sharp.

Note that the inequality above applies only to functions f ∈ D(E), which was obtained by starting with smooth functions satisfying f(0) = f(1) and taking the appropriate closure. In particular, it does not hold for all smooth functions f : [0, 1] → R. For example, the function f(x) = cos(πx) satisfies Var(f) = (1/π²) ∫_0^1 (f′)² dx. In order to have an inequality that applies to all smooth enough functions on [0, 1], we introduce a simple reflection transformation: for a smooth f : [0, 1] → R, define g : [0, 1] → R by

g(x) = f(2x) if 0 ≤ x ≤ 1/2,
g(x) = f(2 − 2x) if 1/2 < x ≤ 1.


Then g is smooth except possibly at the point x = 1/2, where it is continuous. Moreover, g(0) = g(1). Hence, by taking smooth approximations of g, one can show that g ∈ D(E). Then

Var(g) ≤ (1/(4π²)) E(g).

To relate this back to f, note that

∫_0^1 g dx = ∫_0^1 f dx,
∫_0^1 g² dx = ∫_0^1 f² dx,
∫_0^1 (g′)² dx = 4 ∫_0^1 (f′)² dx.

Hence,

Var(f) ≤ (1/π²) ∫_0^1 (f′)² dx.

This inequality holds for all smooth f (and easily extends to a larger class). The constant 1/π² is sharp because of the example f(x) = cos(πx).
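Both the inequality and the equality case f(x) = cos(πx) are easy to check numerically; here is a minimal sketch (assuming NumPy is available; the second test function is ours).

```python
# Compare Var(f) with (1/pi^2) * int_0^1 (f')^2 dx on a uniform grid.
import numpy as np

x = np.arange(100000) / 100000.0            # left-endpoint grid on [0, 1)

def check(f, fp):
    var = np.mean(f(x)**2) - np.mean(f(x))**2
    dirichlet_over_pi2 = np.mean(fp(x)**2) / np.pi**2
    return var, dirichlet_over_pi2

# f(x) = cos(pi x): the two numbers are approximately equal (the sharp case).
print(check(lambda u: np.cos(np.pi * u), lambda u: -np.pi * np.sin(np.pi * u)))
# f(x) = x^3: strict inequality.
print(check(lambda u: u**3, lambda u: 3 * u**2))
```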

2.5 Exponential tails

Suppose that Γ is a diffusion carre du champ on an algebra A that is dense in L2(µ), and consider the Dirichlet form E(f) = ∫_E Γ(f) dµ. (Note that we are not assuming here most of the properties of compact Markov triples.)

A remarkable property of Poincare inequalities is that they automatically imply exponential concentration inequalities: if you go "far away" in the space E then the measure µ drops off exponentially. Since we did not begin with any metric structure on E, the correct notion of distance comes from Γ: say that a function f ∈ A is ℓ-Lipschitz if ‖Γ(f)‖∞ ≤ ℓ². Of course, this agrees with the usual notion of Lipschitz if E = Rn and Γ(f) = |∇f|².

Proposition 2.5.1. If (µ, E) satisfy a Poincare inequality with constant C then for every 1-Lipschitz f ∈ A and every 0 < s < √(4/C),

∫_E e^{sf} dµ ≤ ((2 + s√C)/(2 − s√C)) e^{s ∫_E f dµ}.


To develop some intuition about the exponential integrability condition in Proposition 2.5.1, consider the case E = R, Γ(f, g) = f′g′. Let µ be a mean-zero probability measure that satisfies a Poincare inequality with constant C. Then the function f(x) = x is 1-Lipschitz and has mean zero; by Proposition 2.5.1 with s = C^{−1/2},

∫_{−∞}^{∞} e^{x/√C} dµ(x) ≤ 3.

By Markov's inequality, it follows that µ[t,∞) ≤ 3e^{−t/√C} for every t ≥ 0. This is what we mean when we say that µ has exponential tails. For a general E and Γ, the same argument implies that

µ{x : f(x) ≥ ∫_E f dµ + t} ≤ 3e^{−t/√C}

for every 1-Lipschitz function f.

Proof. It is enough to prove the claim for bounded f ∈ A; the case of arbitrary f ∈ A may then be reached by approximation and Fatou's lemma. Now assume that f ∈ A is bounded, and define Z(s) = ∫_E e^{sf} dµ. Note that

Varµ(e^{sf/2}) = Z(s) − Z(s/2)²

and that

E(e^{sf/2}) = ∫_E Γ(e^{sf/2}) dµ = (s²/4) ∫_E e^{sf} Γ(f) dµ ≤ (s²/4) Z(s).

Under the assumption that µ and E satisfy a Poincare inequality with constant C, we therefore have

(1 − Cs²/4) Z(s) ≤ Z(s/2)².

By induction,

Z(s) ≤ Z(s2^{−n})^{2^n} ∏_{ℓ=0}^{n−1} (1 − Cs²/4^{ℓ+1})^{−2^ℓ}     (2.5)

for every n ≥ 1. Now, Z(s) = 1 + s ∫_E f dµ + O(s²) as s → 0; it follows that Z(s2^{−n})^{2^n} → e^{s ∫_E f dµ} as n → ∞. Taking n → ∞ in (2.5), we have

∫_E e^{sf} dµ ≤ e^{s ∫_E f dµ} ∏_{ℓ=0}^{∞} (1 − Cs²/4^{ℓ+1})^{−2^ℓ}.


Finally, we claim that

∏_{ℓ=0}^{∞} (1 − x²/4^ℓ)^{−2^ℓ} ≤ (1 + x)/(1 − x)     (2.6)

for all x ∈ [0, 1); applying this inequality with x = s√C/2 will then complete the proof.

To check (2.6), take logarithms on both sides: on the left hand side,

log ∏_{ℓ=0}^{∞} (1 − x²/4^ℓ)^{−2^ℓ} = −∑_{ℓ=0}^{∞} 2^ℓ log(1 − x²/4^ℓ) = ∑_{ℓ=0}^{∞} 2^ℓ ∑_{k=1}^{∞} x^{2k}/(k 4^{kℓ}) = ∑_{k=1}^{∞} x^{2k}/(k(1 − 2^{1−2k})).

On the other hand, if 0 ≤ x < 1 then

log((1 + x)/(1 − x)) = 2 ∑_{k=1}^{∞} x^{2k−1}/(2k − 1) ≥ 2 ∑_{k=1}^{∞} x^{2k}/(2k − 1).

Then (2.6) follows from comparing these expansions term-by-term and noting that 1/(k(1 − 2^{1−2k})) ≤ 2/(2k − 1) for all k ≥ 1.

2.6 The exponential distribution

One might wonder whether the exponential tail behavior of the previous section is sharp. For example, the Gaussian measure satisfies a Poincare inequality and has tails that decay at the faster rate e^{−x²/2}. To see that exponential tail decay is the best that can be implied by a Poincare inequality, we consider the exponential distribution.

Proposition 2.6.1. Let µ be the distribution dµ(x) = e^{−x} dx on [0,∞). Then

Varµ(f) ≤ 4 ∫_0^∞ (f′)² dµ

for every f : R+ → R with a bounded derivative. In other words, µ satisfies a Poincare inequality with constant 4 with respect to the usual carre du champ operator.


Note that the constant 4 in Proposition 2.6.1 is sharp: indeed, if µ were to satisfy a Poincare inequality with constant C < 4, it would follow from Proposition 2.5.1 that e^x would be integrable with respect to µ (which is clearly not the case). On the other hand, the exponential distribution shows that the range 0 < s < √(4/C) for exponential integrability in Proposition 2.5.1 is sharp.

Proof. Since f′ and Varµ(f) are unchanged when we add constant functions to f, we may assume that f(0) = 0. Then

∫_0^∞ f²(x) dµ(x) = ∫_0^∞ (2 ∫_0^x f(y)f′(y) dy) e^{−x} dx
= 2 ∫_0^∞ f(y)f′(y) (∫_y^∞ e^{−x} dx) dy
= 2 ∫_0^∞ ff′ dµ
≤ 2 √(∫_0^∞ f² dµ ∫_0^∞ (f′)² dµ).

Hence, ∫_0^∞ f² dµ ≤ 4 ∫_0^∞ (f′)² dµ. Since Varµ(f) ≤ ∫_0^∞ f² dµ, this proves the claim.
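Sharpness of the constant 4 can also be seen directly on the functions f_a(x) = e^{ax} with a < 1/2: one can compute that Varµ(f_a)/∫(f_a′)² dµ = 1/(1 − a)², which tends to 4 as a → 1/2. A short sketch (plain Python, using the closed-form integrals for the exponential distribution) illustrates this.

```python
# Ratio Var(f_a) / int (f_a')^2 dmu for f_a(x) = exp(a x) under dmu = exp(-x) dx.
for a in (0.3, 0.45, 0.49, 0.499):
    mean = 1.0 / (1.0 - a)                  # int e^{ax} e^{-x} dx
    second = 1.0 / (1.0 - 2.0 * a)          # int e^{2ax} e^{-x} dx
    dirichlet = a**2 / (1.0 - 2.0 * a)      # int (a e^{ax})^2 e^{-x} dx
    ratio = (second - mean**2) / dirichlet
    print(a, ratio, 1.0 / (1.0 - a)**2)     # ratio = 1/(1-a)^2, approaching 4
```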

2.7 Log-concave measures

Definition 2.7.1. The probability measure µ on Rn defined by dµ(x) = e^{−W(x)} dx is log-concave if W is convex. For ρ > 0, µ is ρ-strongly log-concave if W(x) − (ρ/2)|x|² is convex.

From now on, assume that W is a smooth function, and let Γ(f, g) = ⟨∇f,∇g⟩ be the usual carre du champ on Rn. Integrating by parts, we have

−∫_{Rn} Γ(f, g) dµ = −∫_{Rn} ⟨∇f, e^{−W}∇g⟩ dx = ∫_{Rn} f(∆g − ⟨∇W,∇g⟩) dµ.

In particular, setting Lg = ∆g − ⟨∇W,∇g⟩ gives us the integration by parts formula ∫ fLg dµ = −∫ Γ(f, g) dµ. If all derivatives of W grow at most polynomially fast as |x| → ∞ then (Rn, µ, Γ) is a compact Markov triple when A is the class of smooth, bounded functions whose derivatives all vanish super-polynomially fast. (We leave it as an exercise to check that this defines a compact Markov triple; the main difficulty is to show that PtA0 ⊂ A0, for which one may use regularity estimates for parabolic PDEs.) Of course, if W(x) = (1/2)|x|² + (n/2) log(2π) then we recover the Ornstein-Uhlenbeck triple.

By adapting the computation in Example 1.11.4 (replacing x by ∇W(x)), one computes

Γ2(f, g) = ⟨∇²f,∇²g⟩ + (∇²W)(∇f,∇g),

where for an n × n matrix A, A(v, w) denotes the bilinear form ∑_{i,j} A_{ij} v_i w_j. In particular, the triple (Rn, µ, Γ) satisfies CD(ρ,∞) if and only if µ is ρ-strongly log-concave.
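In one dimension the formula reads Γ2(f) = (f″)² + W″(x)(f′)², which can be confirmed symbolically; here is a sketch assuming SymPy is available, with W and f left as generic smooth functions.

```python
# Symbolic check of Gamma_2(f) = (f'')^2 + W''(x) (f')^2 for Lf = f'' - W'(x) f'.
import sympy as sp

x = sp.symbols('x')
W = sp.Function('W')(x)
f = sp.Function('f')(x)

def L(u):
    return sp.diff(u, x, 2) - sp.diff(W, x) * sp.diff(u, x)

def Gamma(u, v):
    return sp.diff(u, x) * sp.diff(v, x)

def Gamma2(u):
    return sp.Rational(1, 2) * (L(Gamma(u, u)) - 2 * Gamma(u, L(u)))

rhs = sp.diff(f, x, 2)**2 + sp.diff(W, x, 2) * sp.diff(f, x)**2
print(sp.simplify(Gamma2(f) - rhs))   # prints 0
```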

Corollary 2.7.2. Every ρ-strongly log-concave probability measure satisfies a Poincare inequality (with respect to the usual carre du champ) with constant 1/ρ.

Note that Corollary 2.7.2 says nothing about measures that are log-concave but not strongly so. A theorem due to Bobkov states that every log-concave measure satisfies a Poincare inequality, but does not give a sharp bound on the Poincare constant. This is the content of a famous conjecture of Kannan, Lovasz, and Simonovits:

Conjecture 2.7.3. There is a universal constant C such that if µ is a log-concave probability measure on Rn satisfying ∫_{Rn} x dµ(x) = 0 and

∫_{Rn} x x^T dµ(x) = Id,

then µ satisfies a Poincare inequality (with respect to the usual carre du champ) with constant C.

Note that the two conditions ∫_{Rn} x dµ(x) = 0 and ∫_{Rn} x x^T dµ(x) = Id can always be ensured by applying an affine transformation to µ.


Chapter 3

Logarithmic Sobolev inequalities

3.1 The strong gradient bound

When we proved the Poincare inequality under curvature conditions, the "gradient bound" Γ(Ptf) ≤ e^{−2t}PtΓ(f) played a crucial role. To understand this bound better, let us consider the Ornstein-Uhlenbeck semigroup, whose carre du champ is Γ(f, g) = ⟨∇f,∇g⟩ and whose semigroup has the representation

(Ptf)(x) = ∫_{Rn} f(e^{−t}x + √(1 − e^{−2t}) y) dγn(y).

The gradient bound yields |∇Ptf|² ≤ e^{−2t}Pt|∇f|². This is sharp when f is a linear function, because if f(x) = ⟨a, x⟩ + b then (Ptf)(x) = e^{−t}⟨a, x⟩ + b. But in this specific case, we can do better. Passing the gradient under the integral,

(∇Ptf)(x) = ∫_{Rn} e^{−t} (∇f)(e^{−t}x + √(1 − e^{−2t}) y) dγn(y) = e^{−t}(Pt∇f)(x).

Jensen's inequality gives |∇Ptf| ≤ e^{−t}Pt|∇f|. In terms of Γ,

√Γ(Ptf) ≤ e^{−t}Pt√Γ(f).

This is better than our original gradient bound, which follows from this one because x ↦ x² is convex.

Remarkably, the improved gradient bound holds not just for the Ornstein-Uhlenbeck semigroup, but under a CD(ρ,∞) condition. This should be at least a little bit surprising: the CD(ρ,∞) condition is equivalent to the weaker gradient bound, so how can it imply a strictly stronger bound? The key is in the diffusion assumption Γ(Ψ(f), g) = ∑_i (∂iΨ)(f) Γ(fi, g), which played no role in the weaker gradient bound but is crucial for the stronger one.

Theorem 3.1.1 (Strong gradient bound). Let (E, µ, Γ) be a compact Markov triple that satisfies CD(ρ,∞). Then for every f ∈ A and every t ≥ 0,

√Γ(Ptf) ≤ e^{−ρt} Pt√Γ(f).

We begin with a non-rigorous argument that will help motivate the real proof. Fix t and define Λ(s) = Ps√Γ(Pt−sf). If the square root were a smooth function, the chain rule for L would give

L√g = Lg/(2√g) − Γ(g)/(4g^{3/2})

for any g ∈ A. Then we could compute

Λ′(s) = PsL√Γ(Pt−sf) + Ps[ (d/ds)Γ(Pt−sf) / (2√Γ(Pt−sf)) ]
= Ps[ LΓ(Pt−sf)/(2√Γ(Pt−sf)) − Γ(Γ(Pt−sf))/(4Γ(Pt−sf)^{3/2}) − Γ(Pt−sf, LPt−sf)/√Γ(Pt−sf) ]
= Ps[ Γ2(Pt−sf)/√Γ(Pt−sf) − Γ(Γ(Pt−sf))/(4Γ(Pt−sf)^{3/2}) ].

Hence,

(d/ds)(e^{−ρs}Λ(s)) = e^{−ρs}(Λ′(s) − ρΛ(s))
= e^{−ρs} Ps[ (Γ2(Pt−sf)Γ(Pt−sf) − ρΓ(Pt−sf)² − (1/4)Γ(Γ(Pt−sf))) / Γ(Pt−sf)^{3/2} ].

Since the denominator is non-negative and Ps preserves non-negativity, the inequality (d/ds)(e^{−ρs}Λ(s)) ≥ 0 follows from the inequality

Γ(g)(Γ2(g) − ρΓ(g)) ≥ (1/4) Γ(Γ(g)),     (3.1)

applied to g = Pt−sf. Since Γ(g) and Γ(Γ(g)) are both non-negative, (3.1) is stronger than the CD(ρ,∞) condition.

In order to better understand (3.1), consider again the Ornstein-Uhlenbeck example. There, ρ = 1 and Γ2(g) − Γ(g) = ‖∇²g‖₂². To compute Γ(Γ(g)), note that

∂i|∇g|² = ∑_j 2 ∂jg ∂i∂jg = 2(∇²g · ∇g)_i.


Hence, |∇|∇g|²|² = 4|∇²g · ∇g|², and so (3.1) becomes |v|²‖H‖₂² ≥ |Hv|², where H = ∇²g and v = ∇g. Since the Hilbert-Schmidt norm dominates the operator norm, this inequality is indeed true.
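A quick numerical sanity check of this matrix inequality (assuming NumPy is available):

```python
# Verify |v|^2 * ||H||_HS^2 >= |H v|^2 for random symmetric matrices H.
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    A = rng.normal(size=(4, 4))
    H = (A + A.T) / 2            # symmetric, like a Hessian
    v = rng.normal(size=4)
    assert np.dot(v, v) * np.sum(H**2) >= np.dot(H @ v, H @ v) - 1e-9
print("inequality holds on all random trials")
```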

To prove the inequality (3.1) under the CD(ρ,∞) condition, one first needs to develop a chain rule for Γ2. That is, by expanding the definition of Γ2 and applying the chain rules

Γ(Ψ(f), g) = ∑_i (∂iΨ)(f) Γ(fi, g),
LΨ(f) = ∑_i (∂iΨ)(f) Lfi + ∑_{i,j} (∂i∂jΨ)(f) Γ(fi, fj),

one can derive a similar chain rule for Γ2. Unfortunately, the computation is extremely tedious and (in our opinion) not particularly edifying. Therefore we omit it, and only state the result (and that only in the case of the quadratic form, which is the only one we need). Let Xi = (∂iΨ)(f) and Yij = (∂i∂jΨ)(f). Then for any smooth Ψ : R^k → R and any f = (f1, . . . , fk) ∈ A^k,

Γ2(Ψ(f)) = ∑_{i,j} XiXj Γ2(fi, fj)
+ ∑_{i,j,ℓ} XiYjℓ (Γ(fj, Γ(fi, fℓ)) + Γ(fℓ, Γ(fi, fj)) − Γ(fi, Γ(fj, fℓ)))
+ ∑_{i,j,ℓ,m} YijYℓm Γ(fi, fℓ) Γ(fj, fm).     (3.2)

The CD(ρ,∞) condition becomes

0 ≤ ∑_{i,j} XiXj Γ2(fi, fj)
+ ∑_{i,j,ℓ} XiYjℓ (Γ(fj, Γ(fi, fℓ)) + Γ(fℓ, Γ(fi, fj)) − Γ(fi, Γ(fj, fℓ)))
+ ∑_{i,j,ℓ,m} YijYℓm Γ(fi, fℓ) Γ(fj, fm)
− ρ ∑_{i,j} XiXj Γ(fi, fj).

If we fix a single point, then as Ψ ranges over all smooth functions, Xi and Yij range over all combinations of real numbers (subject to the symmetry condition Yij = Yji). In particular, the right hand side above may be viewed as a quadratic form in the variables Xi and Yij.


Consider the case k = 3, X1 = x, Y23 = Y32 = y, with all the other variables zero, and specialize to the functions f = (g, g, h). Then

x²(Γ2(g) − ρΓ(g)) + 2xy Γ(h, Γ(g)) + 2y²(Γ(g)Γ(h) + Γ(g, h)²)

is a non-negative quadratic form in x and y. In particular, its determinant must be non-negative:

Γ(h, Γ(g))² ≤ 2(Γ2(g) − ρΓ(g))(Γ(g)Γ(h) + Γ(g, h)²) ≤ 4(Γ2(g) − ρΓ(g))Γ(g)Γ(h),

where the second inequality follows from the Cauchy-Schwarz inequality for Γ. Setting h = Γ(g) gives (3.1). By proving (3.1), we have also completed the not-quite-proof of Theorem 3.1.1, in which we assumed that the square root function was smooth. The real proof adds an approximation to the square root function.

Proof of Theorem 3.1.1. Fix ε > 0 and let ψ be a smooth function satisfying ψ(x) = √(x + ε) on some open interval containing R+. Note that

ψ′(x) = 1/(2ψ(x)),   ψ″(x) = −1/(4ψ³(x))

for all x ≥ 0. Fix t and define Λ(s) = Psψ(e^{−2ρs}Γ(Pt−sf)). Similarly to our previous computation with the square root function,

(d/ds)Λ(s) = e^{−2ρs} Ps[ (Γ2 − ρΓ)/ψ(e^{−2ρs}Γ) − e^{−2ρs}Γ(Γ)/(4ψ³(e^{−2ρs}Γ)) ],

where we have omitted all the occurrences of "Pt−sf" to save space. (Note that we were able to use our formulas for ψ′ and ψ″ above, because we are always evaluating ψ at something non-negative.) For the second term, we apply the bound ψ²(x) ≥ x in the denominator, obtaining

(d/ds)Λ(s) ≥ e^{−2ρs} Ps[ (Γ2 − ρΓ)/ψ(e^{−2ρs}Γ) − Γ(Γ)/(4Γψ(e^{−2ρs}Γ)) ].

According to (3.1), the right hand side above is non-negative. Hence, Λ(s) is non-decreasing in s, and so

Pt√(e^{−2ρt}Γ(f) + ε) = Λ(t) ≥ Λ(0) = √(Γ(Ptf) + ε).

Taking ε → 0 completes the proof.


3.2 The logarithmic Sobolev inequality

For a probability measure µ on E and a non-negative, measurable function f : E → R+, define the entropy by

Entµ(f) = ∫_E f log f dµ − ∫_E f dµ log ∫_E f dµ,

where we adopt the convention that 0 log 0 = 0. By Jensen's inequality applied to the convex function ψ(x) = x log x, Entµ(f) is always non-negative. Since ψ is strictly convex, Entµ(f) = 0 if and only if f is constant µ-a.s. Note that the entropy is homogeneous in the sense that Entµ(cf) = c Entµ(f) for all c > 0.

Definition 3.2.1. If µ is a probability measure and E is a Dirichlet form, we say that they satisfy a log-Sobolev inequality with constant C if for all f ∈ D(E),

Entµ(f²) ≤ 2C E(f).     (3.3)

The smallest C (which may be infinite) for which µ and E satisfy a log-Sobolev inequality is called the log-Sobolev constant of µ and E.

Of course, it will always suffice to prove a log-Sobolev inequality for a large enough family (usually A) of functions belonging to D(E).

The log-Sobolev inequality is often encountered in an equivalent form: if f is non-negative and bounded away from zero (i.e. there exists some ε > 0 such that f ≥ ε) then we may apply (3.3) to √f, yielding

Entµ(f) ≤ 2C E(√f) = (C/2) ∫_E (Γ(f)/f) dµ,

where the last equality holds in the case that E(g) = ∫_E Γ(g) dµ for a diffusion carre du champ Γ. Once the inequality is established for f ∈ A that are bounded away from zero, it follows immediately for all positive f ∈ A by taking limits. (Note that for an arbitrary positive f ∈ A, the integral of Γ(f)/f may be infinite, but this does not contradict the inequality.)

Definition 3.2.2. If ν is a probability measure that has density f with respect to µ, define the Fisher information of ν with respect to µ by

I(ν|µ) = Iµ(f) = ∫_E (Γ(f)/f) dµ

and the entropy of ν with respect to µ by

H(ν|µ) = Entµ(f).


With this terminology, µ and Γ satisfy a log-Sobolev inequality with constant C if and only if

H(ν|µ) ≤ (C/2) I(ν|µ)

for every probability measure ν that is absolutely continuous with respect to µ. Here, we are allowing both sides to be (positive) infinity; for example, we declare the left hand side to be infinity if dν/dµ does not belong to D(E). Note that the assumption that ν is a probability measure is not important, since the original form (3.3) is homogeneous.

Proposition 3.2.3. If µ and E satisfy a log-Sobolev inequality with constant C then they satisfy a Poincare inequality with constant C.

Proof. Given f ∈ A with mean zero and ε > 0, apply the log-Sobolev inequality (3.3) to 1 + εf. On the right hand side, 2C E(1 + εf) = 2Cε² E(f). On the left hand side, since log(1 + x) = x − (1/2)x² + o(x²), we may write the entropy of (1 + εf)² as

2 ∫_E (1 + εf)²(εf − (ε²/2)f²) dµ − ∫_E (1 + εf)² dµ log ∫_E (1 + εf)² dµ + o(ε²).

Recalling that ∫_E f dµ = 0, this reduces to

Entµ((1 + εf)²) = 2ε² ∫_E f² dµ + o(ε²).

Taking ε → 0, (3.3) implies that

∫_E f² dµ ≤ C E(f),

which is exactly the Poincare inequality with constant C.

3.3 Log-Sobolev inequalities under CD(ρ, n)

The story for log-Sobolev inequalities under a CD(ρ, n) condition is quite similar to the story for Poincare inequalities: there are local log-Sobolev inequalities that are equivalent to CD(ρ,∞), and there is an improved (but non-local) log-Sobolev inequality under CD(ρ, n). We begin with the local log-Sobolev inequalities, which should be compared with Proposition 2.0.1, their Poincare analogue.


Proposition 3.3.1. For a compact Markov triple (E, µ, Γ) with semigroup Pt, the following are equivalent:

(a) the CD(ρ,∞) condition holds for some ρ ∈ R;

(b) for every f ∈ A,

Γ(f)(Γ2(f) − ρΓ(f)) ≥ (1/4) Γ(Γ(f));

(c) for every f ∈ A and every t ≥ 0,

√Γ(Ptf) ≤ e^{−ρt} Pt√Γ(f);

(d) for every positive f ∈ A and every t ≥ 0,

Pt(f log f) − Ptf log Ptf ≤ ((1 − e^{−2ρt})/(2ρ)) Pt(Γ(f)/f);

(e) for every positive f ∈ A and every t ≥ 0,

Pt(f log f) − Ptf log Ptf ≥ ((e^{2ρt} − 1)/(2ρ)) Γ(Ptf)/Ptf.

We call part (d) of Proposition 3.3.1 a local log-Sobolev inequality, since it says that each kernel measure pt(x, ·) satisfies a log-Sobolev inequality with constant (1 − e^{−2ρt})/ρ. Similarly, we call part (e) a local reverse log-Sobolev inequality. Note that by taking t → ∞ and using ergodicity (Proposition 1.10.2), we obtain the following corollary:

Corollary 3.3.2. If a compact Markov triple (E, µ, Γ) satisfies the CD(ρ,∞) condition for some ρ > 0 then µ and Γ satisfy a log-Sobolev inequality with constant 1/ρ.

Proof of Proposition 3.3.1. In proving Theorem 3.1.1, we have already shown that (a) implies (b), which in turn implies (c). On the other hand, (c) and Jensen's inequality together imply that

Γ(Ptf) ≤ e^{−2ρt} PtΓ(f)

for all f ∈ A and t ≥ 0. According to Proposition 2.0.1, this implies the CD(ρ,∞) condition (i.e., (a)).


To go from (c) to (d), fix t > 0 and define

Λ(s) = Ps[Pt−sf log Pt−sf].

Letting ψ(x) = x log x, one computes

Λ′(s) = (d/ds) Psψ(Pt−sf) = Ps[Lψ(Pt−sf) − ψ′(Pt−sf) LPt−sf] = Ps[ψ″(Pt−sf) Γ(Pt−sf)].

(Of course, this computation is valid for a general smooth function ψ.) For our function ψ, ψ″(x) = 1/x. Hence,

Λ′(s) = Ps(Γ(Pt−sf)/Pt−sf) ≤ e^{−2ρ(t−s)} Ps[ (Pt−s√Γ(f))² / Pt−sf ],     (3.4)

where the inequality follows from part (c) of the proposition. Now, if X and Y are positive random variables then the Cauchy-Schwarz inequality may be written as

EX = E[√Y · X/√Y] ≤ √(EY · E[X²/Y]),

or, rearranged, (EX)²/EY ≤ E[X²/Y]. We may apply this to (3.4), with X = √Γ(f), Y = f, and the expectations taken with respect to the kernel measures pt−s(x, ·). This yields

Λ′(s) ≤ e^{−2ρ(t−s)} PsPt−s(Γ(f)/f) = e^{−2ρ(t−s)} Pt(Γ(f)/f).

Integrating out s, we have

Λ(t) − Λ(0) ≤ Pt(Γ(f)/f) ∫_0^t e^{−2ρ(t−s)} ds = ((1 − e^{−2ρt})/(2ρ)) Pt(Γ(f)/f),

thus proving (d).

The proof that (c) implies (e) is very similar, except that instead of applying the inequality in (3.4), we first apply the Cauchy-Schwarz inequality with X = √Γ(Pt−sf), Y = Pt−sf, and with expectations taken according to the measures ps(x, ·). This gives

Λ′(s) ≥ (Ps√Γ(Pt−sf))² / Ptf ≥ e^{2ρs} Γ(Ptf)/Ptf.


Integrating out s implies (e).

To prove that (d) implies (a), recall that the log-Sobolev inequality with constant C implies the Poincare inequality with the same constant. In particular, the local log-Sobolev inequality in (d) implies the local Poincare inequality in Proposition 2.0.1, part (c). By Proposition 2.0.1, this implies CD(ρ,∞).

Finally, the proof that (e) implies (a) is analogous: by applying the local reverse log-Sobolev inequality to a function of the form (1 + εf)², one deduces the local reverse Poincare inequality in Proposition 2.0.1, part (d), which in turn implies CD(ρ,∞).

3.3.1 The finite-dimensional improvement

We saw that a log-Sobolev inequality holds under a CD(ρ,∞) condition. As in the case of the Poincare inequality, the log-Sobolev constant may be sharpened if the dimension is finite:

Theorem 3.3.3. If a compact Markov triple (E, µ, Γ) satisfies the CD(ρ, n) condition for some ρ > 0 and n > 1, then µ and Γ satisfy a log-Sobolev inequality with constant (n − 1)/(nρ).

Lemma 3.3.4. Suppose that

∫_E f Γ(log f) dµ ≤ C ∫_E f Γ2(log f) dµ

for some C > 0 and all positive f ∈ A. Then µ and Γ satisfy a log-Sobolev inequality with constant C.

Proof. Fix some positive f ∈ A and let Λ(t) = ∫_E Ptf log Ptf dµ. As in the proof of Proposition 3.3.1,

Λ′(t) = −∫_E (Γ(Ptf)/Ptf) dµ = −∫_E Ptf Γ(log Ptf) dµ.

For the second derivative,

Λ″(t) = ∫_E [ Γ(Ptf) LPtf/(Ptf)² − 2Γ(Ptf, LPtf)/Ptf ] dµ.     (3.5)

From now on, write g = Ptf. The change of variables formula for L gives

L(Γ(g)/g) = −Γ(g)Lg/g² + LΓ(g)/g − 2Γ(g, Γ(g))/g² + 2Γ(g)²/g³.


Since ∫_E Lh dµ = 0 for any h ∈ D(L),

∫_E (Γ(g)Lg/g²) dµ = ∫_E [ LΓ(g)/g − 2Γ(g, Γ(g))/g² + 2Γ(g)²/g³ ] dµ.

Applying this to (3.5) and recognizing that LΓ(g) − 2Γ(g, Lg) = 2Γ2(g), we have

Λ″(t) = 2 ∫_E [ Γ2(g)/g − Γ(g, Γ(g))/g² + Γ(g)²/g³ ] dµ.

On the other hand, the chain rule (3.2) for Γ2 reveals that the integrand above is just gΓ2(log g).

To recap,

Λ′(t) = −∫_E Ptf Γ(log Ptf) dµ,
Λ″(t) = 2 ∫_E Ptf Γ2(log Ptf) dµ.

Under the assumption of the lemma, −Λ′(t) ≤ (C/2) Λ″(t), and hence Λ′(t) ≥ Λ′(0) e^{−2t/C}. In other words,

∫_E (Γ(Ptf)/Ptf) dµ ≤ e^{−2t/C} ∫_E (Γ(f)/f) dµ.

Finally,

Λ(0) − Λ(t) = −∫_0^t Λ′(s) ds ≤ −Λ′(0) ∫_0^t e^{−2s/C} ds = ((C(1 − e^{−2t/C}))/2) Iµ(f).

Taking t → ∞ completes the proof.

Proof of Theorem 3.3.3. Our main task is to verify the condition of Lemma 3.3.4. We begin by recalling the chain rule for Γ2:

Γ2(ψ(g)) = (ψ′)²(g) Γ2(g) + ψ′(g)ψ″(g) Γ(g, Γ(g)) + (ψ″)²(g) Γ(g)².

With ψ(g) = e^{ag} (for a constant a to be determined), we have

Γ2(e^{ag}) = a²e^{2ag}[Γ2(g) + aΓ(g, Γ(g)) + a²Γ(g)²],     (3.6)
Γ(e^{ag}) = a²e^{2ag}Γ(g),
Le^{ag} = ae^{ag}[Lg + aΓ(g)].


Hence, the CD(ρ, n) condition applied to e^{ag} may be written as

Γ2(g) + aΓ(g, Γ(g)) + a²Γ(g)² ≥ ρΓ(g) + (1/n)[Lg + aΓ(g)]².

Multiplying by e^g and integrating,

∫_E e^g [ Γ2(g) + aΓ(g, Γ(g)) + a²Γ(g)² − ρΓ(g) − (1/n)[Lg + aΓ(g)]² ] dµ ≥ 0.     (3.7)

By various manipulations, we will turn this into an inequality involving only Γ2 and Γ. To that end, note that 4(Le^{g/2})² = e^g[Lg + (1/2)Γ(g)]². On the other hand, ∫_E (Lf)² dµ = ∫_E Γ2(f) dµ for any f. Hence the term involving L in (3.7) may be expanded as follows:

∫_E e^g [Lg + aΓ(g)]² dµ
= ∫_E 4(Le^{g/2})² + e^g [ (2a − 1)Γ(g)Lg + ((4a² − 1)/4)Γ(g)² ] dµ
= ∫_E 4Γ2(e^{g/2}) + e^g [ (2a − 1)Γ(g)Lg + ((4a² − 1)/4)Γ(g)² ] dµ
= ∫_E e^g [ Γ2(g) + (1/2)Γ(g, Γ(g)) + (2a − 1)Γ(g)Lg + a²Γ(g)² ] dµ,     (3.8)

where the last line follows from the chain rule for Γ2 (i.e., (3.6) in the case a = 1/2). To remove the last remaining L, note that

∫_E e^g Γ(g) Lg dµ = −∫_E Γ(g, e^gΓ(g)) dµ = −∫_E e^gΓ(g, Γ(g)) + e^gΓ(g)² dµ.

Plugging this back into (3.8) and the result back into (3.7), we see that everything is written in terms of Γ2(g), Γ(g, Γ(g)), Γ(g) and Γ(g)². It only remains to check the coefficients:

∫_E e^g [ ((n − 1)/n)Γ2(g) + b_n Γ(g, Γ(g)) + c_n Γ(g)² − ρΓ(g) ] dµ ≥ 0,     (3.9)

where b_n = (2an + 4a − 3)/(2n) and c_n = (na² − (a − 1)²)/n. If we choose a = 3/(2n + 4) then b_n = 0 and c_n ≤ 0 for all n ≥ 1. Hence, we can drop the b_n and c_n terms from the inequality above, concluding that

∫_E e^g Γ2(g) dµ ≥ (nρ/(n − 1)) ∫_E e^g Γ(g) dµ

for all bounded g ∈ A. This implies that the condition of Lemma 3.3.4 holds for all bounded, positive f ∈ A that are also bounded away from zero. For a bounded, positive f ∈ A, approximating it by f + ε allows us to verify the condition of Lemma 3.3.4 completely.

3.4 Examples

Example 3.4.1 (The Ornstein-Uhlenbeck semigroup). We have already mentioned that the standard Gaussian measure satisfies the log-Sobolev inequality with constant 1 (since the Ornstein-Uhlenbeck semigroup satisfies CD(1,∞)). This is the sharp constant: this can be seen from the fact that (by Proposition 3.2.3) it yields the sharp Poincare constant, or from the fact that Proposition 3.6.1 yields sharp tail bounds.

Example 3.4.2 (The ultraspheric semigroup). Since the ultraspheric semigroup satisfies CD(n − 1, n), Theorem 3.3.3 implies that it satisfies a log-Sobolev inequality with constant 1/n. This is the sharp constant, since it implies (by Proposition 3.2.3) the sharp Poincare constant of 1/n.

Example 3.4.3 (The wrapped heat semigroup). Theorem 3.3.3 has nothing to say about the wrapped heat semigroup on [0, 1], which satisfies CD(0, 1). However, we already saw that the wrapped heat semigroup satisfies a Poincare inequality (despite Theorem 2.1.1 having nothing to say about it), and so it is perhaps not so surprising that there is a log-Sobolev inequality also. In fact, we can show something a little stronger: under CD(0, 1), a Poincare inequality with constant C is equivalent to a log-Sobolev inequality with constant C. Indeed, under a Poincare inequality with constant C, Lemma 2.1.2 implies that

∫_E Γ(f) dµ ≤ C ∫_E Γ2(f) dµ     (3.10)

for all f ∈ A. We will deduce from this the condition

∫_E e^g Γ(g) dµ ≤ C ∫_E e^g Γ2(g) dµ     (3.11)

of Lemma 3.3.4, which will then lead to a log-Sobolev inequality. We begin by applying (3.10) with f = e^{g/2}. Using the chain rule for Γ2, this leads to

C ∫_E e^g [ Γ2(g) + (1/2)Γ(g, Γ(g)) + (1/4)Γ(g)² ] dµ ≥ ∫_E e^g Γ(g) dµ.


In order to arrive at (3.11) it suffices to show that

∫_E e^g [ (1/2)Γ(g, Γ(g)) + (1/4)Γ(g)² ] dµ ≤ 0     (3.12)

for all g ∈ A. But this follows from a computation that we already carried out in the proof of Theorem 3.3.3: by (3.9) with n = 1 and ρ = 0,

∫_E e^g [ ((6a − 3)/2)Γ(g, Γ(g)) + (2a − 1)Γ(g)² ] dµ ≥ 0

for every a. With a = 1/3,

∫_E e^g [ (1/2)Γ(g, Γ(g)) + (1/3)Γ(g)² ] dµ ≤ 0,

which is stronger than (3.12). Hence, the wrapped heat semigroup on [0, 1] satisfies a log-Sobolev inequality with (sharp) constant 1/(4π²).

3.5 Decay of entropy

Recall that a Poincare inequality was equivalent to exponential decay of the variance under Pt (Proposition 2.3.1). A similar equivalence holds between a log-Sobolev inequality and the decay of entropy. Indeed, the proofs in the Poincare and log-Sobolev settings are analogous: in the case of the Poincare inequality, the derivative of the variance along the semigroup was the negation of (twice) the Dirichlet form, so the Poincare inequality is equivalent to an exponential-type differential inequality for the variance along the semigroup. In this section (and as we have essentially already observed), the derivative of the entropy along the semigroup is the negation of the Fisher information, and the log-Sobolev inequality is hence equivalent to an exponential-type differential inequality for the entropy along the semigroup.

Proposition 3.5.1. The compact Markov triple (E, µ, Γ) satisfies a log-Sobolev inequality with constant C if and only if

Entµ(Ptf) ≤ e^{−2t/C} Entµ(f)

for every t ≥ 0 and every non-negative f ∈ L1(µ) with finite entropy.

Proof. It is enough to consider f ∈ A: fix f ∈ A with finite entropy and define

Λ(t) = Entµ(Ptf) = ∫_E Ptf log Ptf dµ − ∫_E Ptf dµ log ∫_E Ptf dµ.


Note that the second term is actually constant in t. That is, the Λ(t) here and the one from the proof of Lemma 3.3.4 differ only by an additive constant. In particular, we already computed there that

Λ′(t) = −∫_E (Γ(Ptf)/Ptf) dµ = −Iµ(Ptf).

Under the log-Sobolev inequality with constant C, Λ′(t) ≤ −(2/C)Λ(t), and it follows that Λ(t) ≤ e^{−2t/C}Λ(0).

In the other direction,

Λ(t) = Λ(0) + tΛ′(0) + o(t) = Λ(0) − tIµ(f) + o(t).

If Λ(t) ≤ e^{−2t/C}Λ(0) = (1 − 2t/C + o(t))Λ(0) then as t → 0 it follows that

Iµ(f) ≥ (2/C)Λ(0),

which is the log-Sobolev inequality with constant C.

We may reformulate Proposition 3.5.1 in terms of probability distributions: for a probability distribution ν0 that has a density f with respect to µ, let νt be the probability distribution dνt = Ptf dµ. If µ and Γ satisfy a log-Sobolev inequality with constant C, Proposition 3.5.1 implies that

H(νt | µ) ≤ e^{−2t/C} H(ν0 | µ).     (3.13)

In applications, convergence in total variation is often of interest: the total variation distance between probability distributions µ and ν is defined as

dTV(µ, ν) = sup_{A∈F} |µ(A) − ν(A)|.

The well-known Pinsker-Csiszar-Kullback inequality relates total variation distance with entropy:

Proposition 3.5.2. For any probability measures µ and ν on the same space,

dTV(µ, ν)² ≤ (1/2) H(ν | µ)

(where we declare H(ν | µ) to be infinity if ν does not have a density with respect to µ).


Combining Proposition 3.5.2 with (3.13),

dTV(µ, νt)² ≤ (1/2) e^{−2t/C} H(ν0 | µ).

That is, if we start with a probability measure that has finite entropy, then as we apply the semigroup it converges exponentially fast in total variation.

Note that the Poincare inequality may also be used to deduce convergence in total variation: based on the equivalent (at least, when dν/dµ exists) form

dTV(µ, ν) = (1/2) ∫_E |1 − dν/dµ| dµ

for the total variation distance, one can bound dTV(µ, ν)² ≤ (1/4) Varµ(dν/dµ) and then deduce exponential convergence of dTV(µ, νt) from Proposition 2.3.1. The disadvantage of that approach is that one needs to start with dν/dµ ∈ L2(µ), which is a stronger condition than the assumption that ν has finite entropy with respect to µ.

Proof of Proposition 3.5.2. Let f = dν/dµ (if the density does not exist, the claim is trivial). It suffices to show that

(∫_E |1 − f| dµ)² ≤ 2 Entµ(f).

For this, define fs = 1 + s(f − 1) for s ∈ [0, 1] and set

Λ(s) = 2 Entµ(fs) − (∫_E |1 − fs| dµ)² = 2 Entµ(fs) − s² (∫_E |1 − f| dµ)².

Since ∫ fs dµ = 1 for all s, it follows that

(d/ds) Entµ(fs) = (d/ds) ∫_E fs log fs dµ = ∫_E (f − 1)(1 + log fs) dµ

and

(d²/ds²) Entµ(fs) = ∫_E ((f − 1)²/fs) dµ.

In particular, Λ(0) = Λ′(0) = 0 and

Λ″(s) = 2 ∫_E ((f − 1)²/fs) dµ − 2 (∫_E |1 − f| dµ)² ≥ 0,

where the inequality follows from the Cauchy-Schwarz inequality. Hence, Λ(s) ≥ 0 for all s ∈ [0, 1]; taking s = 1 proves the claim.

66

Page 68: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

3.6 Exponential integrability and Gaussian tails

By Proposition 2.5.1, a Poincare inequality implies sub-exponential tail be-havior. As we will see here, a log-Sobolev inequality implies sub-Gaussiantail behavior. Here, we assume that E takes the form E(f) =

∫E Γ(f) dµ

for a diffusion carre du champ Γ. Recall that a function f is said to be`-Lipschitz if Γ(f) ≤ `2 pointwise.

Proposition 3.6.1. If µ and Γ satisfy a log-Sobolev inequality with constantC > 0 then for every 1-Lipschitz function f and every s ∈ R,∫

Eesf dµ ≤ es

∫E f dµ+Cs2/2.

Moreover, for every s < 1√C

,∫Ees

2f2/2 dµ <∞.

Proof. Let f be a bounded, 1-Lipschitz function in A (it suffices to checkthis case). Define Z(s) =

∫E e

sf dµ, which is differentiable and satisfies

Z ′(s) =

∫Efesf dµ.

The log-Sobolev inequality Entµ(esf ) ≤ 2CE(esf/2) may be written in termsof Z: for the entropy,

Entµ(esf ) = s

∫Efesf dµ− Z(s) logZ(s) = sZ ′(s)− Z(s) logZ(s).

On the other hand,

E(esf/2) =s2

4

∫EesfΓ(f) dµ ≤ s2

4Z(s).

Hence, the log-Sobolev inequality implies that

sZ ′(s) ≤ Z(s) logZ(s) +Cs2

2Z(s)

for all s ∈ R. Using the change of variables F (s) = 1s logZ(s) (where we set

F (0) = Z ′(0) =∫E fdµ for continuity), we have F ′(s) ≤ C

2 . Integrating thisinequality, we have

F (s) ≤ F (0) +Cs

2=

∫Ef dµ+

Cs

2

67

Page 69: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

for all s ≥ 0 (and the reverse inequality for negative s). Hence, for all s ∈ R

Z(s) = esF (s) ≤ es∫E f dµ+Cs2/2.

For the second claim, we integrate the first claim against a Gaussianmeasure: since

∫R e

σs dγ1(σ) = es2/2, Fubini’s theorem implies∫

Ees

2f2/2 dµ =

∫E

∫Reσsf dγ1(σ) dµ

=

∫R

∫Eeσsf dµ dγ1(σ)

≤∫Reσs

∫E f dµ+Cσ2s2/2 dγ1(σ).

This last integral is finite whenever Cs2 < 1.

As a consequence of Proposition 3.6.1, measures satisfying the log-Sobolevinequality have the property that Lipschitz functions are concentrated aroundtheir means with sub-Gaussian tails: if f is 1-Lipschitz with mean zero and µ,Γ satisfy a log-Sobolev inequality with constant C then by Propositon 3.6.1and Markov’s inequality,

µ(f ≥ t) = µ(esf ≥ est

)≤ e−st

∫Eesf dµ

≤ eCs2/2−st.

Choosing s = t/C gives µ(f ≥ t) ≤ e−t2

2C . By scaling and shifting, this maybe applied to an `-Lipschitz function of arbitrary mean, yielding

µ

(f ≥

∫Ef dµ+ t

)≤ e−

t2

2C`2 . (3.14)

To test this estimate, consider the Gaussian measure. By Theorem 3.3.3and the fact that the Ornstein-Uhlenbeck semigroup satisfies CD(1,∞),the standard Gaussian measure on R satisfies a log-Sobolev inequality withconstant 1. Choosing f(x) = x (which is 1-Lipschitz), (3.14) implies thatγ1(x ≥ t) ≤ e−t

2/2. On the other hand, classical bounds for the Gaussiantail imply that

t

t2 + 1≤√

2πet2/2γ1(x ≥ t) ≤ 1

t

whenever t > 0. This shows that the exponent in (3.14) is sharp, althoughthere may be some missing polynomial factors.

68

Page 70: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Example 3.6.2 (The exponential measure). Recall the exponential distribu-tion dµ(x) = e−xdx on [0,∞). In Section 2.6, we showed that the Poincareconstant of µ (with respect to the usual carre du champ Γ(f, g) = f ′g′) is 4.On the other hand, Proposition 3.6.1 implies that µ does not have a finitelog-Sobolev constant: if it did, then esx would be integrable against µ forevery s > 0, which is clearly not the case. This is our first example of ameasure that satisfies a Poincare inequality but not a log-Sobolev inequal-ity; in particular, it shows that Proposition 3.2.3 is only a one-directionalimplication.

3.7 Hypercontractivity

One of the first things we proved about Markov semigroups was that theyare contractions: if µ is an invariant measure then Pt : Lp(µ) → Lp(µ)is a contraction for every 1 ≤ p ≤ ∞. One important consequence of alog-Sobolev inequality is that the associated semigroup is hypercontractive,meaning that it even gives contractions from Lp(µ) to Lq(µ) for some q > p.

Theorem 3.7.1. For a compact Markov triple with semigroup Pt, the fol-lowing are equivalent:

(a) the log-Sobolev inequality holds with constant C; and

(b) for some (or every) 1 < p <∞, every t ≥ 0 and every f ∈ Lp(µ),

‖Ptf‖q(t) ≤ ‖f‖p,

where q(t) is the solution of q(t)−1p−1 = e2t/C .

(c) for some (or every) −∞ < p < 1, every t ≥ 0 and every positive,bounded f ∈ A,

‖Ptf‖q(t) ≥ ‖f‖p,

where q(t) is as above.

Part (b) above is called a hypercontractive inequality, while part (b) is areverse hypercontractive inequality.

Proof. Fix 1 < p < ∞ and a bounded, positive f ∈ A. Since A is denseand Pt|f | ≥ |Ptf |, it suffices to check (b) for such functions. By the (usual

69

Page 71: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

calculus) chain rule,

∂q‖f‖qq =

∂q

∫Ef q dµ

=

∫Ef q log f dµ.

The chain rule also gives

d

dqu(q)1/q =

1

qu(q)1/q−1u′(q)− 1

q2u(q)1/q log u(q);

applying this to u(q) = ‖f‖qq gives

∂q‖f‖q =

‖f‖1−qq

q2Entµ(f q).

On the other hand,

∂t‖Ptf‖q =

‖Ptf‖1−qq

q

∂t

∫E

(Ptf)q dµ

= ‖Ptf‖1−qq

∫E

(Ptf)q−1LPtf dµ

= −4(q − 1)‖Ptf‖1−qq

q2

∫E

Γ((Ptf)q/2) dµ.

Now define Λ(t, q) = ‖Ptf‖q. For any function q(t) that is differentiable int, the chain rule yields

d

dtΛ(t, q(t)) =

q′Λ1−q

q2Entµ((Ptf)q)− 4(q − 1)Λ1−q

q2E((Ptf)q/2). (3.15)

Under a log-Sobolev inequality with constant C,

d

dtΛ(t, q(t)) ≤ (2Cq′ − 4(q − 1))

Λ1−q

q2E((Ptf)q/2).

For our choice of q, 2Cq′ = 4(q − 1) and so ddtΛ(t, q(t)) ≤ 0. It follows that

‖Ptf‖q(t) = Λ(t) ≤ Λ(0) = ‖f‖p.

The previous argument may also be used to show that the log-Sobolevinequality implies the reverse hypercontractive inequality: the only differ-ence is that we begin with p < 1, which implies that q′(t) < 0. Hence, the

70

Page 72: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

log-Sobolev inequality applied to (3.15) produces an inequality in the otherdirection.

On the other hand, suppose that the hypercontractive (or reverse hyper-contractive) inequality holds for some p and all t. Then d

dt

∣∣t=0

Λ(t, q(t)) ≤ 0,which by (3.15) implies the log-Sobolev inequality with constant C.

An immediate consequence of Theorem 3.7.1 is that all eigenfunctionsof the generator belong to Lq(µ) for every 2 < q <∞. Indeed, if Lf = −λffor λ > 0 then Ptf = e−λtf . Applying Theorem 3.7.1 with p = 2 andt = C

2 log(q − 1),

e−λt‖f‖q = ‖Ptf‖q ≤ ‖f‖2,

which yields‖f‖q ≤ (q − 1)Cλ/2‖f‖2.

71

Page 73: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Chapter 4

Isoperimetric inequalities

4.1 Surface area

One of our applications of the log-Sobolev inequality was a tail estimate thatwas almost sharp for the Gaussian measure. Since the Gaussian measure isour model space for the CD(1,∞) condition, it is natural to ask whether wecan get an exact tail estimate under this assumption. This is one motivationfor introducing the notion of an isoperimetric inequality.

We begin with the notion of bounded variation: for the remainder of thissection, we fix a compact Markov triple (E,µ,Γ).

Definition 4.1.1. For f ∈ L1(µ), define

‖f‖BV = inffn

lim infn→∞

∫E

√Γ(fn) dµ,

where the infimum ranges over all sequences fn ∈ A satisfying fn → f inL1(µ). Note that ‖f‖BV may be infinite.

This defines a seminorm on f ∈ L1(µ) : ‖f‖BV < ∞ (not a norm,because if f is constant then ‖f‖BV = 0). Indeed, the triangle inequality for‖ · ‖BV follows from the Cauchy-Schwarz inequality for Γ, since

Γ(f + g) = Γ(f) + Γ(g) + 2Γ(f, g) ≤ (√

Γ(f) +√

Γ(g))2.

Taking square roots and integrating proves the triangle inequality for ‖f‖BV.

Definition 4.1.2. For a measurable subset A of E, µ+(A) = ‖1A‖BV. Wecall this the µ-surface area of A.

72

Page 74: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Example 4.1.3. Let E = R, Γ(f, g) = f ′g′ (with A = C∞c (R)), and supposethat µ has a continuous density φ with respect to the Lebesgue measure. Thenfor any a < b and an interval A = [a, b], µ+(A) = φ(a) + φ(b).

To see that µ+(A) ≤ φ(a) +φ(b), it suffices to find a sequence of smoothfε such that fε → 1A as ε→ 0 and

lim infε→0

∫R|f ′ε(x)|dµ(x) ≤ φ(a) + φ(b).

This can be easily done by convolving 1A with compactly supported, smoothmollifiers. For example, in this way one can construct functions fε : R →[0, 1] satisfying

• fε = 0 outside of [a− ε, b+ ε],

• fε = 1 on [a+ ε, b− ε],

• fε is non-decreasing on [a− ε, a+ ε], and

• fε is non-increasing on [b− ε, b+ ε].

For such a function fε, we have

‖fε − 1A‖L1(µ) ≤ µ([a− ε, a+ ε]) + µ([b− ε, b+ ε]),

which converges to zero since µ has a continuous density. On the otherhand, ∫

R|f ′ε| dµ =

∫ a+ε

a−εf ′ε(x)φ(x) dx−

∫ b+ε

b−εf ′ε(x)φ(x) dx.

Since φ is continuous, as ε → 0 one can approximate the right hand sideabove by

φ(a)

∫ a+ε

a−εf ′ε(x) dx− φ(b)

∫ b+ε

b−εf ′ε(x) dx = φ(a) + φ(b).

In order to check that µ+(A) ≥ φ(a)+φ(b), one needs to show that everysequence of smooth functions fn approximating 1A satisfies

lim infn→∞

∫R|f ′ε(x)| dµ(x) ≥ φ(a) + φ(b).

Suppose that φ(a) and φ(b) are both strictly positive (otherwise, a smallmodification of the following argument will also work). Then for any ε > 0there exists a δ > 0 (depending on φ(a), φ(b), and the continuity of φ near

73

Page 75: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

a and b) such that if ‖f − 1A‖L1(µ) ≤ δ then there a− < a+ ∈ [a − ε, a + ε]and b− < b+ ∈ [b− ε, b+ ε] with f(a−), f(b+) ≤ ε and f(a+), f(b−) ≥ 1− ε.Then ∫

R|f ′(x)|φ(x) dx ≥

∫ a+

a−|f ′(x)|φ(x) dx+

∫ b+

b−|f ′(x)|φ(x) dx.

By taking ε small, we can ensure that φ(x) is nearly constant on [a−ε, a+ε]and [b− ε, b+ ε]. In particular,∫

R|f ′(x)|φ(x) dx ≥ φ(a)

∫ a+

a−f ′(x) dx+ φ(b)

∫ b+

b−f ′(x) dx− oε(1)

≥ (1− 2ε)(φ(a) + φ(b))− oε(1),

where oε(1) is some quantity converging to zero as ε→ 0.

Exercise 4.1.1. Extend the previous example to the case where the densityφ is not necessarily continuous.

4.1.1 Minkowski content

When there is a metric on the space E, the Minkowski content is anothercommon method of defining surface area. For simplicity, we restrict here tothe setting E = Rn.

Definition 4.1.4. Let µ be a measure on Rn. For a measurable set A, define

µ+(A) = lim infε→0

µ(Aε)− µ(A)

ε,

where Aε = x ∈ Rn : d(x,A) ≤ ε. We call µ+ the Minkowski content.

Even when Γ(f) = |∇f |2, the Minkowski content and the µ-surface areawe defined above are not always the same. As an example, in R2 with µthe Lebesgue measure, consider the set A = ([0, 1] × [0, 1]) ∪ ([1, 2] × 1).This set looks like a square with a line sticking out of it, and one can easilycheck that µ(Aε) = 6ε+O(ε2). On the other hand, one can approximate Aby a smooth version of the indicator function of [0, 1] × [0, 1] (because thepiece [1, 2]×1 has measure zero. Hence, one sees that µ+(A) ≤ 4. This isactually a demonstration of a more general phenomenon: µ+ is unaffectedby changes of measure zero, whereas µ+ might be.

Nevertheless, there is a relationship between µ+ and µ+.

74

Page 76: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Proposition 4.1.5. If Γ(f) = |∇f |2 then for every measurable A ⊂ Rn,µ+(A) ≤ µ+(A).

Proof. For simplicity, we will assume that A is bounded. The general casemay be attained by an approximation argument.

For ε > 0, let fε = (1 − (ε + 2ε2)−1d(x,Aε2))+. Then fε is not smooth,but it is (ε+2ε2)−1-Lipschitz. Moreover, fε is 1 on Aε2 and fε = 0 outside ofAε−ε2 . By convolving fε with a suitable smooth mollifier, one can constructa smooth function gε : Rn → [0, 1] with the following properties:

• gε = 1 on A and 0 outside Aε,

• gε ∈ C∞c (Rn) and is (ε+ 2ε2)−1-Lipschitz.

From the first property and the fact that gε takes values in [0, 1], ‖gε−1A‖ ≤µ(Aε)−µ(A), which we may assume converges to zero for some sequence ofε → 0 (otherwise µ+(A) = ∞, so there is nothing to prove). On the otherhand,

√Γ(gε) is zero on A and outside Aε; hence,∫

Rn

√Γ(gε) dµ ≤

µ(Aε)− µ(A)

ε+ 2ε2

and so µ+(A) ≤ lim infε→0

∫ √Γ(gε) dµ ≤ µ+(A).

Exercise 4.1.2. Take E = Rn, Γ(f) = |∇f |2, and suppose that µ has asmooth density φ.

(a) Show that

µ+(A) = sup

∫A

div(φv) dx : v ∈ C∞c (Rn;Rn), supx|v(x)| ≤ 1

.

(b) We say that A ⊂ Rn has a smooth boundary if there is some smoothfunction ψ : Rn → R such that A = x : ψ(x) ≤ 0 and ∇ψ(x) 6= 0 on∂A. Show that if A is compact with smooth boundary then µ+(A) =µ+(A).

4.2 Isoperimetric inequalities

An isoperimetric inequality is a lower bound on the surface area of a setin terms of its measure. The most famous isoperimetric inequality is theclassical Euclidean isoperimetric inequality, which says that the Lebesguesurface area of A ⊂ Rn is at least as large as the surface area of a Euclideanball with the same volume.

75

Page 77: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Definition 4.2.1. Given a measure µ and a diffusion carre du champ Γ, anisoperimetric function is a function I : [0, µ(E)] → R+ such that for everymeasurable A,

µ+(A) ≥ I(µ(A)).

The largest possible isoperimetric function is called the isoperimetric pro-file: the isoperimetric profile of µ and Γ is the function I : [0, µ(E)] → R+

defined by

I(a) = infµ+(A) : A ⊂ E measurable, µ(A) = a.

On Rn with µ the Lebesgue measure, the classical Euclidean isoperimet-ric inequality is equivalent to the statement that the isoperimetric profilefor Rn and µ is

In(a) = nµ(B)1/nan−1n ,

where B is the unit Euclidean ball.Here, however, we will focus on Gaussian-type isoperimetric inequalities.

Hence, the isoperimetric profile of the standard Gaussian measure will beof central importance. From now on, let φ : R → R be the density of thestandard Gaussian measure on R, and let Φ be its cumulative distributionfunction:

φ(x) =1√2πe−x

2/2

Φ(x) =

∫ x

−∞φ(y) dy.

We note that both φ and Φ may be naturally extended to ±∞: set φ(±∞) =0 and Φ(∞) = 1, Φ(−∞) = 0. Note also that Φ is strictly increasing, andso it has a inverse function Φ−1 : [0, 1]→ R ∪ ±∞.

Theorem 4.2.2 (Gaussian isoperimetric inequality). The isoperimetric pro-file for γn is I = φ Φ−1.

Based on Theorem 4.2.2, we say that a measure has a Gaussian-typeisoperimetric inequality if it’s isoperimetric function is larger than the Gaus-sian one: we say that µ has Gaussian-type isoperimetry if there is some c > 0such that cφ Φ−1 is an isoperimetric function for µ.

There are of course two directions of Theorem 4.2.2 that need to beproven. The more difficult direction is to show that I is an isoperimetricfunction; we will do this later. In order to show that I is the best possi-ble isoperimetric function, however, it suffices to give an example showing

76

Page 78: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

equality. For this, we begin with the case n = 1: for a ∈ [0, 1], defineA = (−∞,Φ−1(a)]. By the definition of Φ,

γ1(A) =

∫ Φ−1(a)

−∞φ(y) dy = a.

In order to find the surface area of A, we consult Example 4.1.3. Technically,that example only considered the surface area of finite intervals, but it caneasily be extended to the semi-infinite case. Hence, µ+(A) = φ(Φ−1(a)) =I(a). In particular, the isoperimetric profile of γ1 is at most I. For n > 1,it suffices to consider sets of the form A = x : x1 ≤ Φ−1(a); such a set hasµ(A) = a and µ+(A) = I(a). Since γn is rotationally invariant, one can ofcourse also take A = x : 〈x, θ〉 ≤ Φ−1(a) for any θ ∈ Sn−1.

4.3 Isoperimetry and expansion

For now, fix E = Rn and Γ(f) = |∇f |2. Recall that in this setting theMinkowski content µ+ is always at least as large as the µ-surface area µ+,and that an isoperimetric function I is a lower bound on the µ-surface area.Therefore it is also a lower bound on the Minkowski content. Since theMinkowski content is a sort of derivative, we may integrate out this inequal-ity in order to obtain bounds on µ(Ar) (recall that Ar = x : d(x,A) ≤ r).

Proposition 4.3.1. Let I be an isoperimetric function for µ, which weassume to be strictly positive on (0, µ(Rn)). Suppose that g : R→ (0, µ(Rn))solves g′(r) = I(g(r)). Then for every measurable A ⊂ Rn and every r > 0,µ(Ar) ≥ g(g−1(µ(Ar)) + r).

Proof. For the duration of this proof, we write f ′ for the lower derivative off ; i.e., f ′(x) = lim infh→0

f(x+h)−f(x)h . Let f(r) = µ(Ar); then the definition

of Minkowski content implies that f ′(r) = µ+(Ar) (here, we are also usingthe fact that (Ar)s = Ar+s). Since I is an isoperimetric function, f ′(r) ≥I(f(r)) for all r ≥ 0.

Since I is strictly positive, g is strictly increasing. Hence its inverseis defined on (0, µ(Rn)); we may extend it by continuity to [0, µ(Rn)]. Leth = g−1f . By the chain rule (appropriately extended for lower derivatives),

h′(r) =f ′(r)

g′(g−1(f(r)))=

f ′(r)

I(f(r))≥ 1.

It follows that g−1(f(r)) ≥ g−1(f(0)) + r. Since f(0) = µ(A), the claimfollows.

77

Page 79: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Combining Proposition 4.3.1 with Theorem 4.2.2 and the function g(r) =Φ(r), we obtain a lower bound on the Gaussian measure of an expansion.

Corollary 4.3.2. For any measurable A ⊂ Rn, γ(Ar) ≥ Φ(Φ−1(A) + r).

Note that the bound of Corollary 4.3.2 is sharp for sets of the formA = x : x1 ≤ Φ−1(a). Indeed, for this set Ar = x : x1 ≤ Φ−1(a) + r.In one dimension, applying Corollary 4.3.2 to the set A = (−∞, 0] yieldsγ1([t,∞)) ≤ 1−Φ(t) for t ≥ 0. Of course, this is a sharp inequality since it isessentially the definition of Φ. The point, however, is that an isoperimetricinequality can give a sharp bound for tail decay.

4.3.1 The coarea formula

Of course, expansions as we defined them above make sense when we havea metric that is compatible with Γ in some sense. In our general setting,however, it is more natural to work with thresholds of Lipshitz functionsinstead of expansions of sets. To this end, we introduce the co-area formula.From now on, we are in the setting of a compact Markov diffusion (E,µ,Γ).In what follows,

∫ ∗denotes the outer integral∫ ∗

Ef dµ = inf

∫Eg dµ : f ≥ g is measurable

.

Proposition 4.3.3. For any f ∈ A, let Ar(f) ⊂ E be the set x : f(x) > r.Then ∫ ∗

Rµ+(Ar(f)) dr ≤

∫E

√Γ(f) dµ.

In nice settings (e.g. Rn with the usual carre du champ), the inequalityof Proposition 4.3.3 is actually an equality.

Before proving Proposition 4.3.3, it will be convenient to extend Γ fromA to D(E). Indeed, if f ∈ D(E) then there exists a sequence fn ∈ A suchthat fn → f in L2(µ) and fn is Cauchy with respect to E . Since

E(fn−fm) =

∫E

Γ(fn)+Γ(fm)−2Γ(fm, fn) dµ ≥∫E

(√

Γ(fn)−√

Γ(fm))2 dµ,

it follows that√

Γ(fn) is Cauchy (and hence converges) in L2(µ). We maytherefore define

√Γ(f) ∈ L2(µ) to be this limit. Of course, this procedure

also defines Γ(f) ∈ L1(µ).

Exercise 4.3.1. Verify that the chain rule Γ(ψ(f)) = (ψ′(f))2Γ(f) remainsvalid for f ∈ D(E) and for smooth functions ψ with bounded derivatives.

78

Page 80: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Exercise 4.3.2. Recall that if f ∈ D(E) then a∧ f ∨ b ∈ D(E) for all a < b.Show that Γ(a ∧ f ∨ b) = 1a≤f≤bΓ(f) = 1a<f<bΓ(f).

Proof of Proposition 4.3.3. For k ∈ N and j ∈ Z, let

fj,k = 0 ∨ k(f − j

k

)∧ 1.

By Exercise 4.3.2,Γ(fj,k) = Γ(f)1 j

k≤f< j+1

k.

Hence, for every k ∈ N∫E

√Γ(f) dµ =

∑j∈Z

∫E

√Γ(fj,k) dµ.

Now fix some r ∈ R. If f = r has measure zero (which happens for allbut at most countably many r) then as k → ∞, fbrkc,k converges in L1(µ)to 1f>r. It follows by the definition of µ+ that

µ+(f > r) ≤ lim infk→∞

∫E

√Γ(fbrkc,k).

Hence, define the function ψk : R→ R by

ψk(r) =

∫E

√Γ(fbrkc,k) dµ

Then ψk is a measurable function such that µ+(f > r) ≤ lim infk→∞ ψk(r)for all but countably many r. It follows from Fatou’s lemma that∫ ∗

Rµ+(f > r) dr ≤ lim inf

k→∞

∫Rψk(r) dr

= lim infk→∞

∑j∈Z

∫E

√Γ(fj,k) dµ

=

∫E

√Γ(f) dµ.

By a similar argument to the one in Proposition 4.3.3, we get an analogueof Proposition 4.3.1 in the setting of a compact Markov triple.

Corollary 4.3.4. Suppose that I is a positive isoperimetric function for µand Γ, and let g be as in Proposition 4.3.1. Then for any 1-Lipschitz f ∈ Aand any t ∈ R, r > 0,

µ(f > t+ r) ≥ g(g−1(µ(f > t)) + r).

79

Page 81: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Proof. Consider the function ψ(s) = µ(f ≤ s). Then

ψ(t)− ψ(s) =

∫E

1s<f≤t dµ

=

∫E

√Γ(s ∨ f ∧ t) dµ

≥∫ ∗Rµ+(Ar(s ∨ f ∧ t)) dr

≥∫ t

sI(µ(f ≤ r)) dr

=

∫ t

sI(ψ(r)) dr,

where the first inequality follows from Proposition 4.3.3 and the second fromthe fact that I is an isoperimetric function for µ. Hence, ψ′(s) ≥ I(ψ(s)),where ψ′ denotes the lower derivative. We conclude by applying the sameargument as in Proposition 4.3.1.

4.4 Gaussian-type isoperimetry under CD(ρ,∞)

For the rest of this section, let I = φ Φ−1 be the Gaussian isoperimetricprofile. Our goal here is to prove the Gaussian isoperimetric inequality,which we will do by seeing how PsI(Pt−sf) changes with s. We will begin,however, with a bound in the opposite direction (i.e., an upper bound onΓ(f)). This inequality should be compared to the reverse Poincare or reverselog-Sobolev inequalities.

Proposition 4.4.1. Let (E,µ,Γ) be a compact Markov triple that satisfiesCD(ρ,∞) for some ρ ∈ R. For any f ∈ A taking values in (0, 1) and anyt > 0,

Γ(Ptf) ≤ ρ

e2ρt − 1(I2(Ptf)− (PtI(f))2).

Proof. Assume that f takes values in [ε, 1−ε] for some ε > 0 (this assumptionmay be removed at the end by taking limits). Then I is smooth on an openset strictly containing the range of f , and so the following computations arejustified.

Fix t > 0 and define Λ(s) = Pt−sI(Psf). By a direct computation,I ′′ = − 1

I ; hence,

Λ′(s) = Pt−sΓ(Psf)

I(Psf).

80

Page 82: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

By the Cauchy-Schwarz inequality applied in the form Pt−sg2

h ≥(Pt−sg)2

Pt−shand the strong gradient bound,

Λ′(s) ≥(Pt−s

√Γ(Psf))2

Pt−sI(Psf)≥ e2ρ(t−s) Γ(Ptf)

Λ(s).

Then the chain rule yields

d

ds(Λ(s))2 = 2Λ(s)Λ′(s) ≥ 2e2ρ(t−s)Γ(Ptf).

Integrating,

Λ2(t)− Λ2(0) ≥ 2Γ(Ptf)

∫ t

0e2ρ(t−s) ds =

e2ρt − 1

ρΓ(Ptf).

Now we are prepared to prove the Gaussian isoperimetric inequality.Actually, we will prove something stronger: an isoperimetric inequality fordiffusions satisfying CD(ρ,∞).

Theorem 4.4.2. Let (E,µ,Γ) be a compact Markov triple that satisfiesCD(ρ,∞) for some ρ > 0. Then for any f ∈ A with values in (0, 1),

Pt√

Γ(f) ≥√ρ

√1− e−2ρt

(I(Ptf)− PtI(f)).

Before proving Theorem 4.4.2, let us see how it implies an isoperimetricinequality. First, take t → ∞ and integrate Theorem 4.4.2 with respect toµ, yielding ∫

E

√Γ(f) dµ ≥ √ρ

(I(∫

Ef dµ

)−∫I(f) dµ

). (4.1)

For a measurable A, let fn ∈ A be a sequence approaching 1A in L1(µ). Bytruncating, we may assume that fn take values in (0, 1) (because truncatingfn only reduces

∫E

√Γ(fn) dµ). Therefore, we may apply (4.1) to each fn. In

doing so, note that the convergence fn → 1A implies that∫E fn dµ→ µ(A)

and that I(fn)→ 0 in L1(µ). Hence,

lim infn→∞

∫E

√Γ(fn) ≥ √ρI(µ(A)),

which implies that√ρI is an isoperimetric function for µ.

81

Page 83: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Proof of Theorem 4.4.2. Take Λ(s) = Pt−sI(Psf). As in the proof of Propo-

sition 4.4.1, Λ′(s) = Pt−sΓ(Psf)I(Psf) . Now, Proposition 4.4.1 implies that

√Γ(Psf) ≤

√ρ

√1− e−2ρs

I(Psf),

and so

Λ′(s) ≤√ρ

√e2ρs − 1

Pt−s√

Γ(Psf) ≤√ρe−ρs

√e2ρs − 1

Pt√

Γ(f).

Integrating out s,

Λ(t)− Λ(0) ≤ √ρPt√

Γ(f)

∫ t

0

1√e2ρs − 1

ds =

√1− e−2ρt

√ρ

Pt√

Γ(f).

4.5 Bobkov’s inequality

Although we have already investigated Gaussian-type isoperimetric inequal-ties, it turns out that there is a stronger functional inequality than The-orem 4.2.2. This inequality is known as Bobkov’s inequality, and we willinvestigate it in a way that bears strong similarities to the local Poincareand log-Sobolev inequalities.

Theorem 4.5.1. Let (E,µ,Γ) be a compact Markov triple with semigroupPt. Then the following are equivalent:

(a) the curvature-dimension condition CD(ρ,∞) holds for ρ ∈ R; and

(b) for every f ∈ A taking values in [0, 1] and every t ≥ 0,√ρI2(Ptf) + Γ(Ptf) ≤ Pt

√ρI2(f) + Γ(f). (4.2)

If ρ > 0 in Theorem 4.5.1 then we may take t→∞; the gradient boundensures that Γ(Ptf)→ 0, and hence

√ρI(∫

Ef dµ

)≤∫E

√ρI2(f) + Γ(f) dµ. (4.3)

This last equation is known as Bobkov’s inequality, while we call (4.2) alocal Bobkov inequality. Note that (4.3) implies (4.1) via the inequality√a2 + b2 ≤ |a|+ |b|. In particular, it also implies the Gaussian isoperimetric

inequality.

82

Page 84: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

4.5.1 Equality cases for Bobkov’s inequality

Recall that the Gaussian isoperimetric inequality is an equality for the stan-dard Gaussian measure γn and a set of the form x ∈ Rn : 〈x, θ〉 ≤ b.Hence, the functional inequality (4.1) almost achieves equality when f isthe smoothed indicator of a half-space. However, when E = Rn and µ = γnthen (4.1) actually never achieves equality for any smooth, non-constantfunction. This may be deduced from the inequality (4.3), since the inequal-ity√a2 + b2 ≤ |a|+ |b| is strict unless either a or b is zero. In particular, in

order to maintain equality in going from (4.3) to (4.1) we must have

γn(x : f(x) ∈ 0, 1 or ∇f = 0) = 1,

which is incompatible with f being a smooth, non-constant function.On the other hand, there is a family of smooth functions achieving equal-

ity in (4.3). Indeed, consider f : Rn → [0, 1] defined by f(x) = Φ(〈x, a〉+ b)for some a ∈ Rn and b ∈ R. Taking random variables X ∼ γn and Z ∼ γ1,∫

Rnf(x) dγn(x) = EΦ(〈X, a〉+ b)

= Pr(Z ≤ 〈X, a〉+ b)

= Pr(Z√

1 + |a|2 ≤ b)

= Φ

(b√

1 + |a|2

),

where we have used the fact that 〈X, a〉 has the same distribution as |a|Z,and that the sum of independent Gaussian variables is again Gaussian.

Applying (4.3) (with ρ = 1) to the function f above, the left hand sidebecomes

φ

(b√

1 + |a|2

).

On the right hand side, I(f) = φ(〈x, a〉 + b) and |∇f | = |a|φ(〈x, a〉 + b);hence the right hand side becomes√

1 + |a|2∫Rnφ(〈a, x〉+ b) dγn(x) =

√1 + |a|2

∫Rφ(|a|z + b) dγ1(z).

By expanding the definition of φ and completing the square in the exponent,this last integral may be computed explicitly, showing that this function fattains equality in (4.3).

83

Page 85: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

4.5.2 From Bobkov to log-Sobolev

By applying Bobkov’s inequality to functions of the form εf and then sendingε→ 0, we may recover the log-Sobolev inequality with constant 1

ρ . In orderto show this, however, we first consider the asymptotics of the function Inear zero.

Lemma 4.5.2. For all x > 0,

1√2π

(1

x− 1

x3

)e−x

2/2 ≤ 1− Φ(x) ≤ 1√2πx

e−x2/2.

Proof. By definition, 1 − Φ(x) = 1√2π

∫∞x e−y

2/2 dy. Integrating by partstwice,∫ ∞x

e−y2/2 dy =

e−y2/2

y−∫ ∞x

e−y2/2

y2dy =

e−y2/2

y− e−y2/2

y3+

∫ ∞x

e−y2/2

y4dy.

The first equality above gives the claimed upper bound on 1 − Φ(x), whilethe second equality gives the claimed lower bound.

In what follows, we write f(x) ∼ g(x) if f(x)g(x) → 1 as x converges to some

prescribed limit. According to Lemma 4.5.2 (and since 1−Φ(x) = Φ(−x)),Φ(x) ∼ 1

|x|φ(x) as x→ −∞. On the other hand, as x→ 0

Φ(−(1 + ε)

√2 log(1/x)

) x

Φ(−(1− ε)

√2 log(1/x)

) x;

and hence Φ−1(x) ∼ −√

2 log(1/x) as x→ 0.For the asymptotics of I, note that if y = Φ−1(x) then

x = Φ(y) ∼ −1

yφ(y) = − 1

Φ−1(x)I(x)

as x→ 0. In particular, I(x) ∼ −xΦ−1(x) ∼ x√

2 log(1/x) as x→ 0.Now let f ∈ A be bounded above and away from zero. Assume also that∫

E f dµ = 1, and note that in order to establish the log-Sobolev inequalityin general, it suffices to do it for such f . For small enough ε > 0, εf takesvalues in [0, 1] and we may apply (4.3). On the left hand side,

√ρI(∫

Eεf dµ

)=√ρI(ε) ∼ ε

√2ρ log(1/ε)

84

Page 86: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

as ε→ 0. On the right hand side,∫E

√ρI2(εf) + Γ(εf) dµ

∼ ε∫E

√2ρf2 log

1

εf+ Γ(f) dµ

∼ ε∫E

√2ρf2 log

1

ε+ 2ρf2 log

1

f+ Γ(f) dµ

= ε

∫E

√2ρ log(1/ε)f +

2ρf2 log(1/f) + Γ(f)

2√

2ρ log(1/ε)fdµ+ o

(ε√

log(1/ε)

),

where the last line follows from√y + δ =

√y+ δ/(2

√y) + o(δ). Comparing

the asymptotics for the two sides of Bobkov’s inequality, the terms of orderε√

log(1/ε) cancel out. Examining the term of order ε log−1/2(1/ε), we seethat

∫Ef log f dµ ≤

∫E

Γ(f)

fdµ,

which is equivalent to the log-Sobolev inequality with constant 1ρ . Recalling

that the log-Sobolev inequality itself implies the Poincare inequality, we haveestablished a heirarchy of three inequalities: Bobkov implies log-Sobolevimplies Poincare.

4.5.3 Proof of Bobkov’s inequality

We will prove Theorem 4.5.1 in much the same way as we proved the localPoincare and local log-Sobolev inequalities. Note that we already know onedirection: if the local Bobkov inequality holds then the local log-Sobolev in-equality holds, and we already observed that the local log-Sobolev inequalityimplies CD(ρ,∞). Hence, it only remains to prove the local Bobkov inequal-ity under a CD(ρ,∞) condition.

Lemma 4.5.3. For any smooth function on R2, f ∈ A, and t > 0,

d

dsPsΨ(Pt−sf,Γ(Pt−sf)) =

Ps[2∂2ΨΓ2(g) + ∂2

1ΨΓ(g) + 2∂1∂2ΨΓ(g,Γ(g)) + ∂22ΨΓ(Γ(g))

],

where g = Pt−sf and Ψ stands for Ψ(g,Γ(g)).

85

Page 87: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Proof. We begin with

d

dsPsΨ(Pt−sf,Γ(Pt−sf)) = Ps

[LΨ +

d

dsΨ(Pt−sf,Γ(Pt−sf))

]= Ps [LΨ− ∂1ΨLg − 2∂2ΨΓ(g, Lg)] .

The chain rule for L yields

LΨ = ∂1ΨLg + ∂2ΨLΓ(g) + ∂21ΨΓ(f) + 2∂1∂2ΨΓ(g,Γ(g)) + ∂2

2ΨΓ(Γ(g)).

Gathering terms and recalling the definition of Γ2 completes the proof.

Proof of Theorem 4.5.1. Let Ψ(u, v) =√ρI2(u) + v. Then we may com-

pute

∂1Ψ =ρII ′

Ψ

∂2Ψ =1

∂21Ψ = −(ρII ′)2

Ψ3+ ρ

(I ′)2 − 1

Ψ

∂1∂2Ψ = −ρII′

2Ψ3

∂22Ψ = − 1

4Ψ3.

Now fix t and define Λ(s) = PsΨ(Pt−sf,Γ(Pt−sf)). We will aim to showthat Λ is non-decreasing in s; according to Lemma 4.5.3, it is enough tocheck that

(∗) = Ψ3[2∂2ΨΓ2(g) + ∂2

1ΨΓ(g) + 2∂1∂2ΨΓ(g,Γ(g)) + ∂22ΨΓ(Γ(g))

]≥ 0

pointwise for all g ∈ A taking values in [0, 1]. Plugging in the derivatives ofΨ,

(∗) = Ψ2Γ2(g)−(ρII ′)2Γ(g)+Ψ2ρ((I ′)2−1)Γ(g)−ρII ′Γ(g,Γ(g))−1

4Γ(Γ(g)),

where I and I ′ stand for I(g) and I ′(g). Now recall that Ψ2 = ρI + Γ(g).Making this expansion and recalling also the reinforced CD(ρ,∞) conditionΓ(g)Γ2(g)− ρΓ(g)2 ≥ 1

4Γ(Γ(g)), we have

(∗) ≥ ρI2Γ2(g) + ρ(I ′)2Γ(g)2 − ρ2I2Γ(g)− ρII ′Γ(g,Γ(g)). (4.4)

86

Page 88: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

By applying the reinforced CD(ρ,∞) condition and the Cauchy-Schwarzinequality, one can directly show this to be non-negative. Another argumentuses the chain rule for Γ2: since (Φ−1)′ = 1/I and (Φ−1)′′ = Φ−1/I2 =−I ′/I2, we have

I4Γ2(Φ−1 g) = I2Γ2(g)− II ′Γ(g,Γ(g)) + (I ′)2Γ(g)2.

Comparing this to (4.4), we have

(∗) ≥ ρI4Γ2(Φ−1 g)− ρ2I4Γ(Φ−1 g),

which is non-negative by the CD(ρ,∞) condition.

4.6 Exponential-type isoperimetry

Recall the exponential measure dµ(x) = e−xdx on [0,∞). We may define asurface area µ+ by applying our standard definition to the carre du champΓ(f, g) = f ′g′ on the algebra A of smooth functions whose derivatives vanishat infinity. Note that with this definition of surface area, a set [a, b] witha > 0 has surface area e−a + e−b, but the set [0, b] has surface area e−b.

Proposition 4.6.1. The isoperimetric profile of µ is I(a) = mina, 1− a.

Proof. For any f ∈ A and any t ≥ 0, integration by parts yields∫ ∞t

f ′dµ =

∫ ∞t

f ′(x)e−x dx

= −e−tf(t) +

∫ ∞0

f(x)e−x dx

=

∫ ∞0

f dµ− f(0).

Then, by the triangle inequality∫ ∞t

√Γ(f) dµ ≥

∣∣∣∣∫ ∞0

f dµ− e−tf(t)

∣∣∣∣ (4.5)

Now take any measurable A ⊂ [0,∞), and suppose that fn is a sequenceof functions in A converging to 1A in L1(µ). Without loss of generality, fntakes values in [0, 1].

We will consider two cases, depending on whether or not 0 belongs to thesupport of A. If it does, then for any ε > 0, µ(A∩ [0, ε)) > 0. It follows thatfor sufficiently large n, there must be xn ∈ (0, ε) such that fn(xn) ≥ 1− ε.

87

Page 89: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

By (4.5), if n is sufficiently large then∫ ∞0

√Γ(fn) dµ ≥

∫ ∞xn

√Γ(fn) dµ

≥ e−ε(1− ε)−∫ ∞xn

fn dµ

≥ e−ε(1− ε)−∫ ∞

0fn dµ− ε,

where the last line follows because xn ≤ ε and fn ≤ 1. Taking n → ∞ andthen ε→ 0, it follows that

lim infn→∞

∫ ∞0

√Γ(fn) dµ ≥ 1− µ(A),

and so µ+(A) ≥ 1− µ(A).When 0 does not belong to the support of A, the argument is similar

except that fn(xn) ≤ ε. In this case, we obtain

lim infn→∞

∫ ∞0

√Γ(fn) dµ ≥ µ(A).

Putting these two directions together, µ+(A) ≥ minµ(A), 1−µ(A). Hence,I(a) = mina, 1− a is an isoperimetric function for µ.

To show that I is the isoperimetric profile, it suffices to give examples.In the case a ≤ 1

2 , the set [log 1a ,∞) has measure a and surface area a; when

a > 12 , the set [0, log 1

1−a) has measure a and surface area 1− a.

Based on Proposition 4.6.1, we make the following definition:

Definition 4.6.2 (Exponential-type isoperimetry). Say that µ and Γ satisfyan exponential-type isoperimetric inequality if there is a constant c > 0 suchthat a 7→ cmina, 1− a is an isoperimetric function for µ, Γ.

Note that by a change of variables in Proposition 4.6.1, a 7→ cmina, 1−a is the isoperimetric function of the scaled exponential measure dµ(x) =ce−cx dx.

4.6.1 Exponential-type isoperimetry implies Poincare

Proposition 4.6.3. Let µ be a probability measure and Γ be a diffusioncarre du champ. If a 7→ cmina, 1 − a is an isoperimetric function for µand Γ then µ and Γ satisfy a Poincare inequality with constant 4

c2.

88

Page 90: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Note that the constant 4c2

is best possible: the measure dµ(x) = e−x dxhas isoperimetric profile a 7→ mina, 1− a and (by a change of variables inProposition 2.6.1) Poincare constant 4

c2.

Proof. Choose f ∈ A such that Γ(f) is bounded (just to ensure that allintegrals converge – this restriction can be removed at the end by takinglimits). Let m be a median of f , meaning that µ(f ≥ m) ≥ 1

2 andµ(f ≤ m) ≥ 1

2 . Let g+ = (f − m)+ and let g− = (f − m)−. Then(f −m)2 = g2

+ + g2−. By Fubini’s theorem,∫

E(f −m)2 dµ =

∫Eg2

+ dµ+

∫Eg2− dµ

=

∫ ∞0

µ(g2+ ≥ r) dr +

∫ ∞0

µ(g2− ≥ r) dr.

Since cmina, 1 − a is an isoperimetric function and µ(g2+ ≥ r) ≤ 1

2 forall r ≥ 0, we have cµ(g2

+ ≥ r) ≤ µ+(g2+ ≥ r) for all r ≥ 0 (and similarly

for g−). Hence,

c

∫E

(f −m)2 dµ ≤∫ ∗Rµ+(g2

+ ≥ r) dr +

∫ ∗Rµ+(g2

− ≥ r) dr

≤∫E

√Γ(g2

+) dµ+

∫E

√Γ(g2−) dµ, (4.6)

where the second inequality follows from the co-area formula (Proposi-tion 4.3.3). By the diffusion property of Γ and the Cauchy-Schwarz in-equality,∫

E

√Γ(g2

+) dµ = 2

∫Eg+

√Γ(g+) dµ ≤ 2

(∫Eg2

+ dµ

∫E

Γ(g+) dµ

)1/2

.

By the analogous inequality for g− and the fact that (by the Cauchy-Schwarzinequality)

√ab+√cd ≤

√(a+ c)(b+ d) for non-negative a, b, c, d, the right

hand side of (4.6) is bounded by

2

√∫Eg2

+ + g2− dµ

∫E

Γ(g+) + Γ(g−) dµ = 2

√∫E

(f −m)2 dµ

∫E

Γ(f) dµ

Going back to (4.6) and re-arranging, we have∫E

(f −m)2 dµ ≤ 4

c2

∫E

Γ(f) dµ.

Since Varµ(f) ≤∫E(f − a)2 dµ for any a ∈ R, it follows that µ and Γ satisfy

a Poincare inequality with constant 4c2

.

89

Page 91: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

4.6.2 Milman’s theorem: from integrability to isoperimetry

Definition 4.6.4. Let µ be a probability measure and Γ be a diffusion carredu champ. Say that the non-increasing function α : R+ → R+ is a concen-tration function for µ, Γ if for every 1-Lipschitz function f , every medianm of f , and every r ≥ 0,

µ(f ≥ m+ r) ≤ α(r).

Note that the requirement that a concentration function be non-decreas-ing is without loss of generality, since we can always replace α(r) by infα(s) :s ≤ r.

When we were discussing exponential tails (for the Poincare inequality)and exponential-squared tails (for the log-Sobolev inequality), it was moreconvenient to use the mean instead of the median. However, the mean andthe median are basically the same as long as the concentration functiondecays quickly enough.

Exercise 4.6.1.

(a) Suppose that α is a concentration function for µ, Γ. Show that forevery 1-Lipschitz function f and every r ≥ r0,

µ

(f ≥

∫Ef dµ+ r

)≤ α(r − r0),

where r0 =∫∞

0 α(r) dr.

(b) Suppose that µ(f ≥∫E f dµ+ r) ≤ α(r) for every 1-Lipschitz f and

every r ≥ 0. Show that

β(r) =

α(r0) r ≤ r0

α(r − r0) r > r0

is a concentration function for µ, Γ, where

r0 = infr ≥ 0 : α(r) ≤ 1/2.

In particular, if α(r) = Ce−cr or α(r) = Ce−cr2

in Exercise 4.6.1 thenwe can swap the median with the mean in Definition 4.6.4 at the cost ofchanging the constants C and c.

90

Page 92: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Theorem 4.6.5 (Milman’s theorem). Suppose that (E,µ,Γ) is a compactMarkov triple satisfying CD(0,∞), and suppose that α is a concentrationfunction for µ and Γ. Then there exists some c > 0 depending only on αsuch that

I(a) =

c

α−1(a/2)a log 1

a a ≤ 12

cα−1((1−a)/2)

(1− a) log 11−a a > 1

2

is an isoperimetric function for µ and Γ.

By plugging in α(r) = Ce−cr and α(r) = Ce−cr2, we see that under a

CD(0,∞) condition, exponential-type concentration implies an exponential-type isoperimetric inequality while exponential-squared-type concentrationimplies a Gaussian-type isoperimetric inequality.

Before beginning the proof of Theorem 4.6.5, recall the local reverse log-Sobolev inequality of Proposition 3.3.1. In the case that f takes values in(0, 1], we have f log f ≤ 0 and hence

Γ(Ptf) ≤ 1

t(Ptf)2 log

1

Ptf. (4.7)

Taking ψ(x) =√

log(1/x), we have ψ′(x) = (2x√

log(1/x))−1 and so thechain rule for Γ implies that

Γ

(√log

1

Ptf

)≤ 1

4t.

We summarize this as a lemma:

Lemma 4.6.6. Under a CD(0,∞) condition, if f ∈ A takes values in (0, 1]then

√log(1/Ptf) is 1

2√t-Lipschitz.

Under the additional assumption of a concentration function, Lemma 4.6.6may be used to obtain a tail bound on Ptf : let m be a median of Ptf ; since√

log(1/x) is monotonic in x,√

log(1/m) is a median of√

log(1/Ptf). Then

α(r) ≥ µ(√

log(1/Ptf) ≤√

log(1/m)− r

2√t

)= µ

(Ptf ≥ mer

2/(4t))

. (4.8)

Recall from our proof of the log-Sobolev and reverse log-Sobolev inequal-ities that

Pt(f log f)− Ptf logPtf =

∫ t

0Pt−s

Γ(Psf)

Psfds.

91

Page 93: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Now, (4.7) implies that√

Γ(Psf) ≤ Psf√s−1 log(1/ε). Hence,

Ptf log1

Ptf− Pt

(f log

1

f

)≤∫ t

0Pt−s

√s−1 log(1/ε)Γ(Psf) ds

≤∫ t

0Pt√s−1 log(1/ε)Γ(f) ds,

where the second inequality follows from the gradient bound under CD(0,∞).Rearranging this and integrating over E,∫

EPtf log

1

Ptfdµ−

∫Ef log

1

fdµ ≤ 2

√t log

1

ε

∫E

√Γ(f) dµ.

This is going in the right direction for an isoperimetric inequality, since weare lower-bounding

∫E

√Γ(f) dµ in terms of an integral of f . Moreover, if

f is approximating an indicator function then the second term on the lefthand side is very small; the remaining step is to relate

∫Ptf log(1/Ptf) dµ

to a =∫f dµ. For this, take some δ > 0. Then∫

EPt log

1

Ptfdµ ≥ log

1

δ

∫Ptf<δ

Ptf dµ ≥ log1

δ[a− µ(Ptf ≥ δ)] .

If δ ≥ met2/(4t) then we may apply (4.8) to bount µ(Ptf ≥ δ) by α(r).

Combining the argument so far, we have the following lemma.

Lemma 4.6.7. Suppose that f ∈ A takes values in [ε, 1]. Then for anyr, t > 0 and any δ ≥ mer2/(4t) (where m is a median of Ptf),

log1

δ

(∫Ef dµ− α(r)

)−∫Ef log

1

fdµ ≤ 2

√t log

1

ε

∫E

√Γ(f) dµ. (4.9)

To turn this into an isoperimetric inequality, we will apply Lemma 4.6.7to a sequence of functions approaching an indicator function, and we willneed to specify the parameters r, t, ε, and δ. We will consider two cases,depending on how large the measure of the set is.

First, let A ⊂ E be a measurable set. Set, µ(A) = a and assume firstthat 0 ≤ a ≤ η, where η is some constant to be determined. Let fn ∈ A be asequence converging to 1A in L1(µ), and suppose without loss of generalitythat fn take values in [0, 1]. Now take ε = a2 and let gn = ε ∧ fn. Then∫E gn dµ ≥

∫E fn dµ and

∫E

√Γ(gn) dµ ≤

∫E

√Γ(fn) dµ for every n. Choose

r = α−1(a/2), and let t solve r2 = t log(1/ε) = 2t log(1/a). Set

δ = 2aer2/(4t) = 2ae

12

log 1a = 2

√a,

92

Page 94: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

which satisfies the conditions of Lemma 4.6.7 because (by Markov’s inequal-ity) m ≤

∫Ptf dµ = 2a. Plugging all these parameters back into (4.9),

log1

2√a

(∫Efn dµ−

a

2

)−∫Egn log

1

gndµ ≤ 2α−1(a/2)

∫E

√Γ(fn) dµ.

(4.10)Now, gn → a2 ∧ 1A in L1(µ) and hence also pointwise µ-almost everywhere.Since gn log 1

gnis also bounded, it follows that

lim supn→∞

∫Egn log

1

gndµ ≤ a2 log

1

a2.

Since∫E fn dµ→ a, the left hand side of (4.10) is of order a log 1

a for smalla. In particular, there are constants c, η > 0 such that for all A withµ(A) = a ≤ η,

µ+(A) ≥ c

α−1(a/2)a log

1

a. (4.11)

This proves Theorem 4.6.5 in the case that µ(A) is small.For η ≤ µ(A) ≤ 1

2 , we apply essentially the same argument but with somedifferent parameters. First, choose r = α−1(η/2), so that α(r) = η/2 ≤ a/2.Next, note that (4.7) and the fact that x2 log(1/x) ≤ 2 for x ∈ [0, 1] togetherimply that Ptf is

√2/t-Lipschitz. This may be used to find a better bound

on a =∫E f dµ and m, the median of Ptf . Indeed,

a =

∫EPtf dµ ≤ m+

∫E

(f −m)+ dµ

= m+

∫ 1

0µ(f ≥ m+ r) dr

≤ m+ α(√t/2).

For t large enough (depending on α and η), we may ensure that a ≤ 1110m. By

increasing t if necessary, we may also ensure that er2/(4t) ≤ 12

11 . This implies

that δ = 35 ≥

6a5 ≥

1110ae

r2/(4t) satisfies the conditions of Lemma 4.6.7.

Finally, choose ε so that ε log(1/ε) ≤ η log(5/3)4 . Then (4.9) implies that

a log 53

2−∫Ef log

1

fdµ ≤ 2

√t log

1

ε

∫E

√Γ(f)

whenever f is a function of mean a ∈ [η, 1/2] taking values in [ε, 1]. Note thatt and ε depend only on η and α. By the same argument for the isoperimetricinequality above, we see that

µ+(A) ≥ cµ(A)

93

Page 95: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

for every A with η ≤ µ(A) ≤ 12 , where c is a constant depending only on η

and α. Combining this with (4.11) proves Theorem 4.6.5.

94

Page 96: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Chapter 5

Riemannian manifolds

5.1 Riemannian manifolds

This section contains a brief introduction to Riemannian manifolds, in whichwe introduce just enough in order to show how to put compact Markovtriples on compact, connected Riemannian manifolds. Readers who havenot yet seen the main concepts may prefer to consult a real book (for exam-ple, [Lee06]) on the topic. Those who are already familiar with Riemannianmanifolds will probably only care about Section 5.1.9 and Proposition 5.1.29.

Definition 5.1.1. A smooth n-manifold is a topological space M togetherwith a collection of open sets Ui and maps ψi : Ui → Rn satisfying

(a) M is a second countable Hausdorff space;

(b) each ψi is a homeomorphism onto its image;

(c) every p ∈M belongs to some Ui; and

(d) for every Ui and Uj that have a non-empty intersection, the map

ψj ψ−1i : ψi(Ui ∩ Uj)→ Rn

belongs to C∞.

The maps ψi are called charts and the collection of charts is called anatlas.

The simplest example of a smooth n-manifold is Rn itself, with the onlychart being the identity map (whose domain is all of Rn). Similarly, an opensubset of Rn is an n-manifold.

95

Page 97: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

For a slightly less trivial example, consider M = Sn = x ∈ Rn+1 :∑i x

2i = 1. We may equip this with charts defined on open hemispheres.

For example, let U+i = p ∈ M : pi > 0 and let U−i = p ∈ M : pi < 0.

Then let ψ±i : U±i be the map that leaves out the ith coordinate. It is easyto check that this defines an atlas for M .

Usually, when we do a local computation (such as differentiating a func-tion) on a manifold it is more convenient to fix a chart ψ and do the com-putation on the image of ψ (which is a subset of Rn). However, it would beextremely inconvenient to write ψ all the time; therefore, we will usually docomputations on the image of a chart but without explicitly writing downthe chart. We adopt the convention that p denotes a point on the manifoldM and x (or y) denotes its image under some chart. For example, if f is afunction on M then we will write f(x) when we really mean f(ψ−1(x)) forsome chart ψ. We say that x are local coordinates on M . When we need todeal with two intersecting charts (say ψ1 and ψ2) simultaneously, we will letx be the image under ψ1 and y be the image under ψ2. We will then writey(x) for the map ψ2 ψ−1

1 that changes from x coordinates to y coordinates.

Definition 5.1.2. Let M be a smooth m-manifold and N be a smooth n-manifold. A map f : M → N is smooth if it is smooth when viewed in localcoordinates – that is, when the map ψN f ψ−1

M is smooth whenever ψM isa chart for M and ψN is a chart for N .

We write C∞(M,N) for the set of smooth maps from M → N . We willuse the abbreviation C∞(M) = C∞(M,R).

If f is smooth, bijective, and has a smooth inverse then we call it adiffeomorphism.

One of the points of a manifold is that many things that are locally truein Euclidean space are also true on manifolds. In particular, the followinglemma is very useful for imitating Euclidean arguments on manifolds.

Lemma 5.1.3 (Extension lemma). If A ⊂M is a closed set and f : A→ Ris a smooth function (meaning that there exists an open set containing Mon which f can be smoothly extended), then for every open U ⊃ A there isa smooth extension of f to all of M such that f = 0 outside of U .

In particular, Lemma 5.1.3 implies that manifolds have smooth “bump”functions: for every closed A contained in an open U , there is a smoothfunction that is 1 on A and 0 outside of U .

96

Page 98: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

5.1.1 Tangent spaces

Definition 5.1.4. Given a point p ∈ M , a tangent vector to M at p is amap X : C∞(M)→ R such that for every f, g ∈ C∞(M) and a ∈ R,

(a) X(f + g) = X(f) +X(g) and X(af) = aX(f), and

(b) X(fg) = g(p)X(f) + f(p)X(g).

The set of all tangent vectors to M at p is denoted TpM .

The definition of tangent vectors may seem abstract, but it turns out thatthey can be used in a fairly concrete way. The following exercise should helpto make some sense of them.

Exercise 5.1.1. Let M be a smooth n-manifold. Let f and g be smoothfunctions and take X ∈ TpM .

(a) If f is constant then X(f) = 0.

(b) If f and g agree on some open set containing p then X(f) = X(g).(Hint: use a bump function.)

(c) If f(p) = g(p) = 0 then X(fg) = 0.

(d) If f has gradient zero at p in local coordinates (i.e. f ψ−1 has gradientzero for some chart ψ around p) then X(f) = 0. (Hint: use theprevious point and a Taylor expansion of f ψ−1.)

(e) TpM is an n-dimensional vector space. Given local coordinates xaround p, TpM is spanned by

∂x1

∣∣∣∣p

, . . . ,∂

∂xn

∣∣∣∣p

,

where ∂∂xi

∣∣p

(f) is defined in the usual calculus way. (This time I leave

it to you to make sense of this statement in terms of a chart.)

In particular, the last part of Exercise 5.1.1 implies that for every tangentvector X at p (and every choice of local coordinates), there are real numbersX1, . . . , Xn such that

X =

n∑i=1

Xi ∂

∂xi

∣∣∣∣p

.

97

Page 99: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

We will adopt the Einstein summation convention, which says that wheneveran index i appears in both an “upper” position and a “lower” position thenwe implicitly sum over it. In other words, we abbreviate the equation aboveby

X = Xi ∂

∂xi

∣∣∣∣p

.

When there is no ambiguity about the choice of coordinates x or the pointp, we will abbreviate ∂

∂xi

∣∣p

by ∂i.

Note that the representation of X in terms of X1, . . . , Xn depends on thechoice of local coordinates. However, there is a simple formula for changingcoordinates.

Exercise 5.1.2. Let x and x be two sets of local coordinates around thesame point p. Show that

∂xi

∣∣∣∣p

=∂xj

∂xi

∣∣∣∣p

∂xj

∣∣∣∣p

.

(The Einstein convention considers a upper index in the denominator asa lower index, so there is an implicit summation on the right hand sideabove.) As a consequence, if X1, . . . , Xn represent X in the x coordinatesand X1, . . . , Xn represent X in the x coordinates then

Xj =∂xj

∂xi

∣∣∣∣p

Xi.

In other words, the Jacobian matrix of the coordinate change map representsthe change of basis for TpM .

Note that since TpM is a finite-dimensional vector space, it comes witha natural topology.

5.1.2 The tangent bundle

Definition 5.1.5. The tangent bundle TM of M is the disjoint union⊔p∈M TpM .

If M is an n-manifold then we may define a topology and an atlas for TMsuch that TM is a 2n-manifold. Indeed, if (x1, . . . , xn) are local coordinatesfor M on the neighborhood U ⊂M , then we may define a map

⊔p∈U TpM →

Rn × Rn by (vi

∂xi

∣∣∣∣p

)7→ (x1(p), . . . , xn(p), v1, . . . , vn).

98

Page 100: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Exercise 5.1.3. Consider the topology on TM generated by the maps above.Show that these maps and this topology turn TM into a smooth 2n-manifold.

Definition 5.1.6. A vector field on M is a smooth map X : M → TM(which we write as p 7→ Xp) with the property that Xp ∈ TpM for everyp ∈M . The set of vector fields on M is denoted TM .

Note that for every choice of local coordinates around a point p, we maywrite Xp in terms of the standard basis of TpM . This means that locallythere exist real-valued functions X1, . . . , Xn such that

X = Xi∂i.

Also, if f : M → R is any smooth function then we can define anotherfunction Xf : M → R by (Xf)(p) = Xp(f).

Exercise 5.1.4.

(a) The coordinate functions p 7→ Xip are (locally) smooth functions from

M to R.

(b) The function Xf : M → R is smooth.

5.1.3 The cotangent bundle

For any p ∈ M , TpM is a finite-dimensional vector space. Therefore wemay also consider its dual space (i.e. the set of linear functionals on TpM),which be denote by T ∗pM and call the cotangent space. With respect to anylocal coordinates, ∂i form a basis for TpM ; hence, it induces a dual basis forT ∗pM . We denote this dual basis by dx1, . . . , dxn. That is, dxi is defined by

dxi(∂j) = δji .

Definition 5.1.7. The cotangent bundle T ∗M of M is the disjoint union⊔p∈M T ∗pM .

We may endow the cotangent bundle with a smooth manifold structurein the same way we did with the tangent bundle.

Definition 5.1.8. A covector field on M is a smooth map ω : M → T ∗M(which we write as p 7→ ωp) with the property that ωp ∈ T ∗pM for everyp ∈M . The set of covector fields on M is denoted T ∗M .

99

Page 101: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

As we did for vector fields, we may of course write ω locally in terms ofthe basis coordinate covector basis as ω = ωidx

i for smooth functions ωi.There is also a change of coordinates formula for covectors: if ω = ωidx

i =ωjdx

j then

ωi =∂xj

∂xiωj

Note that this change of coordinates formula is different from the one forvectors. This is important, since it gives us a coordinate-independent wayof writing the derivative of a function.

Definition 5.1.9. For a smooth function f : M → R, its differential df isthe covector field defined by df(X) = X(f).

Exercise 5.1.5.

(a) In local coordinates, df = ∂f∂xidxi. In particular, ∂f

∂xidxi is invariant

under changes of coordinates.

(b) Show that ∂f∂xi∂i is not invariant under changes of coordinates.

5.1.4 Vector bundles

The tangent and cotangent bundles share a structure that is useful to gen-eralize: they are manifolds built by “glueing” together in a smooth way avector space at every point of the manifold M . This construction may bemade in a more general way, although we will not give the full definitionhere. (In all the cases we consider, the construction will be analogous to theone for the tangent bundle.) When E is built from M by gluing together k-dimensional vector spaces at each point of M , then we call E a k-dimensionalvector bundle. We equip E with the projection map π : E → M (so thatπ−1(p) is the k-dimensional vector space that we attached to p).

The natural generalization of vector fields and covector fields are calledsections.

Definition 5.1.10. Let E be a vector bundle over M with projection mapπ. A section of E is a smooth function F : M → E such that F (p) ∈ π−1(p)for all p ∈M .

Besides the tangent bundle and the cotangent bundle that we have al-ready seen, one can easily visualize the normal bundle of a manifold embed-ded in Rn; here, we attach to every point the space of vectors that point ina normal direction. A section of the normal bundle (called a normal vectorfield) consists of a choice of a normal vector at every point.

100

Page 102: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

5.1.5 Tensors and tensor bundles

Definition 5.1.11. Let V be a finite-dimensional vector space. A for k and` non-negative integers, a (k, `)-tensor on V is a multilinear map

V k × (V ∗)` → R.

We write T k` (V ) for the set of all (k, `)-tensors on V ; when either k or ` iszero, we will omit it.

Note that there is a natural vector space structure on T k` (V ). Also, ifwe fix a basis e1, . . . , en of V (and let φ1, . . . , φn) be the dual basis) then a(k, `)-tensor F may be represented by a (k + `)-tuple of numbers: define

F i1,...,ikj1,...,j`= F (ei1 , . . . eik , φ

j1 , . . . , φj`).

Since F was assumed to be multilinear, this completely defines F .

Definition 5.1.12. If F is a (k, `)-tensor and G is a (k′, `′)-tensor then thetensor product F ⊗G is the (k + k′, `+ `′)-tensor defined by

(F ⊗G)(x1, . . . , xk+k′ , y1, . . . , y`+`

′)

= F (x1, . . . , xk, y1, . . . , y`)G(xk+1, . . . , xk+k′ , y

`+1, y`+`′).

Some spaces of tensors are already familiar: T 00 (V ) is just R, T 1(V ) =

V ∗, and T1(V ) is canonically isomorphic to V . Moreover, T 11 (V ) is canoni-

cally isomorphic to the space of linear maps from V to itself.

Exercise 5.1.6. Let e1, . . . , en be a basis for V and take φ1, . . . , φn to bethe dual basis for V ∗. Show that

ei1 ⊗ · · · ⊗ eik ⊗ φj1 ⊗ · ⊗ φj` : i1, . . . , ik, j1, . . . , j` = 1, . . . , n

is a basis for T k` (V ). In particular, T k` (V ) has dimension nk+`.

Definition 5.1.13. Let e1, . . . , en be a basis for V and let φ1, . . . , φn beits dual basis for V ∗. The trace, or contraction of a (k, `)-tensor F is the(k − 1, `− 1)-tensor given by

(trF )(X1, . . . , Xk−1, ω1, . . . , ω`−1) = F (ei, X1, . . . , Xk−1, φ

i, ω1, . . . , ω`−1).

Note that the trace of a (1, 1)-tensor is the usual matrix trace.

Exercise 5.1.7. Show that the trace does not depend on the choice of basise1, . . . , en.

101

Page 103: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Definition 5.1.14. Given an n-manifold M , the tensor bundle T k` (M) isthe vector bundle given by ⊔

p∈MT k` (TpM).

A section of a tensor bundle is called a tensor field, and the set of (k, `)-tensor fields is denoted T k` (M).

Note that T1(M) is just the tangent bundle, while T 1(M) is the cotangentbundle. Also, T 0

0 (M) = C∞(M).A tensor field F ∈ T k` (M) naturally induces a multilinear map (TM)k×

(T ∗M)` → C∞(M), defined by simply doing everything pointwise. It is afundamental property of tensor fields that this map is not only linear over R,but also over C∞(M). Specializing to the case F ∈ T 1

1 (M) for brevity, thismeans that for every X ∈ TM , every Y ∈ T ∗M , and every f, g ∈ C∞(M),

F (fX, gY ) = fgF (X,Y ).

In fact, this property characterizes tensor fields.

Lemma 5.1.15 (Tensor field characterization). A function F : (TM)k ×(T ∗M)` → C∞(M) belongs to T k` (M) if and only if it is C∞(M)-linear inevery coordinate.

5.1.6 Riemannian metrics

Definition 5.1.16 (Riemannian metric). A Riemannian metric g on a man-ifold M is a (2, 0)-tensor satisfying

(a) symmetry: for every X,Y ∈ TM , g(X,Y ) = g(Y,X); and

(b) positivity: for every p ∈ M and X ∈ TpM , if X 6= 0 then gp(X,X) >0.

In other words, a Riemannian metric specifies an inner product on everytangent space TpM . We will usually write 〈X,Y 〉 instead of g(X,Y ). As withany (2, 0)-tensor, we may express a Riemannian metric in local coordinatesas

g = gijdxi ⊗ dxj .

The symmetry condition ensures that gij = gji, while the positivity condi-tion ensures that the n× n matrix (gij) is positive definite at every p ∈M .

102

Page 104: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Exercise 5.1.8. Write down the change of coordinates formula for gij.

An inner product on TpM induces an isomorphism between TpM andits dual T ∗pM by mapping X ∈ TpM to the linear functional Y 7→ 〈X,Y 〉.We denote this map by X 7→ X[ and its inverse by ω 7→ ω]. In coordinates,X[ = Xjdx

j where Xj = gijXi and ω] = ωj∂j where ωj = gijωi and gij is

the inverse matrix of gij .We will extend the notation 〈, 〉 to covectors using this isomorphism:

〈ω, φ〉 = 〈ω], φ]〉 = ωiφjgij .

5.1.7 Connections

A connection on a manifold is essentially a local correspondence betweentangent spaces; for example, it allows one to say what it means for twovectors in different tangent spaces to be “parallel.” Formally, we define aconnection to be something that allows us to differentiate vector fields.

Definition 5.1.17 (Connection). A Riemannian connection on a Rieman-nian manifold is a map ∇ : TM × TM → TM , written (X,Y ) 7→ ∇XY ,that satisfies

(a) ∇XY is C∞(M)-linear in X, i.e.

∇fX1+gX2Y = f∇X1Y + g∇X2Y for f, g ∈ C∞(M);

(b) ∇XY is R-linear in Y , i.e.

∇X(aY1 + bY2) = a∇XY1 + b∇XY2 for a, b ∈ R;

(c) for f ∈ C∞(M), we have the product rule

∇X(fY ) = f∇XY + (Xf)Y ;

(d) the connection is compatible with the metric, meaning that

X〈Y,Z〉 = 〈∇XY, Z〉+ 〈Y,∇XZ〉;

(e) the connection is torsion-free, meaning that

(∇XY −∇YX)f = X(Y f)− Y (Xf).

103

Page 105: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

We interpret ∇XY as the derivative of Y along the directions given byX.

Exercise 5.1.9. On Rn with the Euclidean metric, show that ∇XY =(XY i)∂i defines a Riemannian connection.

Theorem 5.1.18 (Levi-Civita theorem). Every Riemannian manifold ad-mits a unique Riemannian connection.

It turns out that knowing how to differentiate functions and vector fieldsgives us a canonical way to differentiate all tensor fields:

• For f ∈ T 00 (M) = C∞(M), define ∇Xf = Xf .

• For Y ∈ T1(M) = T (M), ∇XY is defined by Theorem 5.1.18.

• For ω ∈ T 1(M), ∇Xω is defined by the equation

X(ω(Y )) = ∇X(ω(Y )) = (∇Xω)(Y ) + ω(∇XY ).

• For F ∈ T k` (M), ∇XF is defined by the equation

X(F (Y1, . . . , Yk, ω1, . . . , ω`)) = (∇XF )(Y1, . . . , Yk, ω

1, . . . , ω`)

+∑i

F (. . . ,∇XYi, . . . )

+∑j

F (. . . ,∇Xωj , . . . ).

Exercise 5.1.10. The map ∇ defined above satisfies the following proper-ties:

(a) ∇X(Y [) = (∇XY )[ and ∇X(ω]) = (∇Xω)].

(b) ∇X(F ⊗G) = F ⊗ (∇XG) + (∇XF )⊗G.

(c) ∇X tr(F ) = tr(∇XF ).

(d) For F ∈ T k` (M), the map ∇F : (TM)k+1×(T ∗M)` → C∞(M) definedby

(∇F )(Y1, . . . , Yk, X, ω1, . . . , ω`) = (∇XF )(Y1, . . . , Yk, ω

1, . . . , ω`)

is a (k + 1, `)-tensor field.

104

Page 106: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

(e) For f ∈ C∞(M), ∇f = df .

Definition 5.1.19 (Hessian). The Hessian of f ∈ C∞(M) is the (2, 0)-tensor ∇2f = ∇(∇f).

Note that (∇2f)(X,Y ) is not the same as ∇X(∇Y f).

Exercise 5.1.11. Show that (∇2f)(X,Y ) = Y (Xf)−(∇YX)(f). Concludethat ∇2f is symmetric, in the sense that (∇2f)(X,Y ) = (∇2f)(Y,X).

So far, we defined the trace only for tensors that take both a vector anda covector. However, using the isomorphism induced by the Riemannianmetric between vectors and covectors, we may also define the trace of othertensors (although we will stick to the (2, 0) case because the notation issimpler).

Definition 5.1.20 (Trace). If F is a (2, 0)-tensor, written as Fijdxidxj in

local coordinates, then its trace is trF = F ijgji.

Exercise 5.1.12. Show that the trace defined above is invariant underchanges of coordinates.

Definition 5.1.21 (Laplace-Beltrami operator). The Laplace-Beltrami op-erator ∆ : C∞(M)→ C∞(M) is defined by

∆f = tr(∇2f).

Exercise 5.1.13. Show that the Laplace-Beltrami operator is given in localcoordinates by

∆f =1√|det g|

∂i

(√|det g| · gij∂jf

).

(Hint: first, figure out how to write ∇∂i∂j in terms of gij and its derivatives.)

5.1.8 Integration on a manifold

There are two different ways to define integration on manifolds. The morecommon one defines a “volume form,” which depends on orientation andrequires an oriented manifold. We will instead define a “volume element,”which is more natural for the applications we have in mind.

We define the integral in two steps: first, if f ∈ C∞(M) is supported ona single chart U then we define∫

Mf dV =

∫Uf√|det g| dx.

105

Page 107: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Exercise 5.1.14. If f ∈ C∞(M) is supported on the intersection of twocharts, show that the definition above does not depend on the chart.

Next, if f ∈ C∞(M) has compact support then we write f =∑k

i=1 fiwhere each fi is smooth and supported on a single chart (the fact that thiscan be done is not completely trivial, but it can be done by using a standardtool known as a partition of unity). Then we define∫

Mf dV =

k∑i=1

∫Mfi dV.

Of course, this may be extended beyond smooth functions of compact sup-port by taking limits.

Exercise 5.1.15. Show that the definition above does not depend on thechoice of fi.

The usual integration by parts formulas transfer over from Euclideanspace to manifolds:

Theorem 5.1.22 (Green’s theorem). For any compactly supported f, g ∈C∞(M), ∫

Mf∆g dV = −

∫M〈∇f,∇g〉 dV =

∫Mg∆f dV.

Exercise 5.1.16. Prove Theorem 5.1.22. (Hint: do it first for functionssupported on a single chart.)

5.1.9 Markov triple on a compact Riemannian manifold

Let M be a compact Riemannian manifold and take W ∈ C∞(M). Then∫M e−W dV <∞, and so by adding a constant to W we may assume without

loss of generality that∫M e−W dV = 1. By approximating indicators of open

sets with smooth functions, one can define a probability measure µ on onthe Borel sets of M using the formula

µ(A) =

∫M

1Ae−W dV.

Let A = C∞(M) and define Γ : A×A → A by

Γ(f, g) = 〈∇f,∇g〉 = gij∂if∂jg.

106

Page 108: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Because the Riemannian metric is a non-negative bilinear form, Γ is also anon-negative bilinear form. Moreover, Γ is a diffusion carre du champ in thesense of Definition 1.9.1.

Now, Theorem 5.1.22 implies that for f, g ∈ C∞(M)∫Mf∆g dµ =

∫M

(e−W f)∆g dV

= −∫M

Γ(e−W f, g) dV

= −∫M

Γ(f, g)− fΓ(W, g) dµ.

Setting Lg = ∆g − 〈∇W,∇g〉 for g ∈ C∞(M), we have the integration byparts formula ∫

MfLg dµ = −

∫M

Γ(f, g) dµ.

Hence, we obtain the bound∫M

Γ(f, g) dµ ≤ ‖Lf‖L2(µ)‖g‖L2(µ).

This implies that the Dirichlet form E is closable. All the required propertiesof Definition 1.9.5 follow immediately, except possibly for (h). For this lastproperty, we appeal to a classical result on parabolic regularity:

Theorem 5.1.23 (Parabolic regularity theorem). Suppose that u : M×R→R is a bounded solution to

∂u(x, t)

∂t= ∆u− 〈∇W,∇u〉

u(x, 0) = f(x)(5.1)

where f,W ∈ C∞(M). Then u ∈ C∞(M).

With Theorem 5.1.23 in hand, note that the function u(x, t) = (Ptf)(x)solves (5.1). It follows that for any fixed t, Ptf = u(·, t) ∈ C∞(M) = A. Itfollows now that (M,µ,Γ) is a compact Markov triple.

5.1.10 Curvature

Definition 5.1.24 (Lie bracket). Given X,Y ∈ TM , their Lie bracket isthe vector field [X,Y ] defined by

[X,Y ]f = X(Y f)− Y (Xf).

107

Page 109: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Exercise 5.1.17. Show that [X,Y ] defines a vector field.

Definition 5.1.25 (Riemannian curvature). Define the map R : TM ×TM × TM → TM by

R(X,Y, Z) = ∇X∇Y Z −∇Y∇XZ −∇[X,Y ]Z.

The Riemannian curvature endomorphism is the (3, 1)-tensor defined by

R(X,Y, Z, ω) = ω(R(X,Y, Z)).

The Riemannian curvature tensor is the (4, 0) tensor defined by raising thelast index of R:

Rm(X,Y, Z,W ) = 〈R(X,Y, Z),W 〉.

Exercise 5.1.18. Show that Rm is a (4, 0)-tensor.

Exercise 5.1.19. On Rn with the Euclidean metric, the Riemannian cur-vature is identically zero.

Definition 5.1.26 (Ricci curvature). The Ricci curvature is the (2, 0)-tensor Ric obtained by contracting the first and last positions of the Rie-mannian curvature endomorphism:

Ric(X,Y ) = R(ei, X, Y, φi)

for some basis ei and its dual basis φi.

We write R`ijk for the coordinates of R and Rij for the coordinates ofRic, so that

Rij = Rkkij .

Proposition 5.1.27. The Ricci curvature is a symmetric (2, 0)-tensor, inthe sense that Ric(X,Y ) = Ric(Y,X). Or in coordinates, Rij = Rji.

The symmetry of the Ricci tensor follows from some (not completelytrivial) symmetries of the Riemannian curvature tensor. See, for exam-ple, [Lee06, Proposition 7.4].

When our manifold M comes with a probability density e−W , it turnsout to be natural to consider a modified definition of curvature.

Definition 5.1.28 (Weighted Ricci curvature). Given W ∈ C∞(M), itsweighted Ricci curvature is the (2, 0)-tensor RicW defined by

RicW (X,Y ) = Ric(X,Y ) + (∇2W )(X,Y ).

108

Page 110: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

The connection between curvature and the Γ-calculus is given by Bochner’sformula. We define grad f = (∇f)] and

|∇2f |2 =∑i

|∇Ei∇f |2,

where Ei is an orthonormal basis.

Proposition 5.1.29. Let M be a Riemannian n-manifold and take W ∈C∞(M). Define

Lf = ∆f − 〈∇W,∇g〉Γ(f, g) = 〈∇g,∇g〉

Γ2(f) =1

2LΓ(f)− Γ(f, Lf).

ThenΓ2(f) = RicW (grad f, grad f) + |∇2f |2.

In proving Proposition 5.1.29, the computations will be much less messyif we work in a special set of coordinates known as normal coordinates. At afixed point p, normal coordinates around p have two nice properties at thepoint p:

〈∂i|p, ∂j |p〉 = δij and (∇∂i∂j)|p = 0.

(Note that these properties only hold at exactly the point p, and not ona neighborhood of it.) For a construction of these coordinates, see [Lee06,Chapter 5].

Lemma 5.1.30. In normal coordinates at p, (∆f)(p) =∑n

i=1(∂2i f)(p).

Proof. For this entire proof, everything is evaluated at p. By the definitionof the trace, ∆f = tr∇2f = (∇2f)(∂i, ∂j)g

ij . In normal coordinates, gij isthe identity matrix at p, and hence ∆f =

∑i(∇2f)(∂i, ∂i). Now,

(∇2f)(∂i, ∂i) = (∇∂if)(∂i) = ∂i(∇∂if)−∇f(∇∂i∂i),

where the last equality follows from the definition of what ∇∂i does to thecovector field ∇f . Finally, note that in normal coordinates ∇∂i∂i = 0.Hence,

(∇2f)(∂i, ∂i) = ∂i(∇∂if) = ∂2i f.

109

Page 111: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Proof of Proposition 5.1.29. Fix a point p and fix normal coordinates aroundp. For this entire proof, everything is evaluated at p.

First, we note a simple consequence of the symmetry of Hessians. Bydefinition,

(∇2f)(X,Y ) = (∇X∇f)(Y ) = 〈∇X grad f, Y 〉.

Then the symmetry of ∇2f implies that

〈∇X grad f, Y 〉 = 〈∇Y grad f,X〉.

Beginning with Lemma 5.1.30,

1

2∆|∇f |2 =

1

2

n∑i=1

∂i∂i〈grad f, grad f〉

=

n∑i=1

∂i〈∇∂i grad f, grad f〉

=

n∑i=1

∂i〈∇grad f grad f, ∂i〉

=

n∑i=1

〈∇∂i∇grad f grad f, ∂i〉+ 〈∇grad f ,∇∂i∂i〉

=

n∑i=1

〈∇∂i∇grad f grad f, ∂i〉,

since ∇∂i∂i = 0 at p. We continue by recalling the definition of the Rieman-nian curvature tensor.

1

2∆|∇f |2 =

n∑i=1

Rm(∂i, grad f, grad f, ∂i) (5.2)

+n∑i=1

〈∇grad f∇∂i grad f, ∂i〉 (5.3)

+

n∑i=1

〈∇[∂i,grad f ] grad f, ∂i〉. (5.4)

By definition, (5.2) is Ric(grad f, grad f). To simplify (5.3), note that∇grad f∂i =

110

Page 112: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

0 at p; hence, the compatibility of ∇ with the metric implies that

(5.3) =n∑i=1

(grad f)(〈∇∂i grad f, ∂i〉)

= (grad f)n∑i=1

(∇2f)(∂i, ∂i)

= (grad f)∆f = 〈∇f,∇∆f〉.

For part (5.4), note that because ∇ is torsion-free and we are workingwith normal coordinates, [∂i, grad f ] = ∇∂i grad f −∇grad f∂i = ∇∂i grad f .Hence,

(5.4) =

n∑i=1

〈∇[∂i,grad f ] grad f, ∂i〉

=

n∑i=1

〈∇∇∂i grad f grad f, ∂i〉

=

n∑i=1

〈∇∂i grad f,∇∂i grad f〉

= |∇2f |2.

Putting these computations together,

1

2∆|∇f |2 = Ric(grad f, grad f) + 〈∇f,∇∆f〉+ |∇2f |2. (5.5)

This is in fact Bochner’s original formula, which corresponds to the claimof Proposition 5.1.29 when W is a constant function. As a consequenceof (5.5),

Γ2(f) =1

2L|∇f |2 − 〈∇f,∇Lf〉

= Ric(grad f, grad f) + |∇2f |2 − 1

2〈∇W,∇|∇f |2〉+ 〈∇f,∇〈∇W,∇f〉〉.

To simplify these last two terms, note that

1

2〈∇W,∇|∇f |2〉 =

1

2∇gradW |∇f |2 = 〈∇f,∇gradW f〉,

while

〈∇f,∇〈∇W,∇f〉〉 = ∇grad f 〈∇W,∇f〉= 〈∇grad f∇W,∇f〉+ 〈∇W,∇grad f∇f〉= ∇2W (grad f, grad f) + 〈∇f,∇gradW∇f〉.

111

Page 113: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

It follows then that

Γ2(f) = Ric(grad f, grad f) + |∇2f |2 +∇2W (grad f, grad f)

= RicW (grad f, grad f) + |∇2f |2.

Let (M, g) be a compact Riemannian n-manifold and let dµ = e−WdVbe a probability measure on M , where W ∈ C∞(M). Define Γ(f, g) =〈∇f,∇g〉 on A = C∞(M), so that (M,µ,Γ) is a compact Markov triple. Asconsequence of Proposition 5.1.29, lower bounds on the Ricci tensor in termsof the metric tensor imply curvature-dimension conditions on (M,µ,Γ).

Corollary 5.1.31.

(a) If W is constant and Ric ≥ ρg then (M,µ,Γ) satisfies CD(ρ, n).

(b) If RicW ≥ ρg then (M,µ,Γ) satisfies CD(ρ,∞).

(c) If RicW ≥ ρg and m ∈ (n,∞] satisfies RicW ≥ ρg + 1m−n∇W ⊗∇W

then (M,µ,Γ) satisfies CD(ρ,m).

Proof. Note that the first two points are just special cases of the last point.For the last point, note by Proposition 5.1.29 that

Γ2(f) = RicW (grad f, grad f) + |∇2f |2

≥ ρΓ(f) +1

m− n〈∇W,∇f〉2 + |∇2f |2.

Now, under local coordinates with ∂1, . . . , ∂n an orthogonal basis at p,|∇2f |2 is the sum of squared entries of an n×nmatrix, while ∆f is the sum ofthe diagonal entries. By the Cauchy-Schwarz inequality, |∇2f |2 ≥ 1

n(∆f)2.Hence,

Γ2(f) ≥ ρΓ(f) +1

m− n〈∇W,∇f〉2 +

1

n(∆f)2.

Thanks to the inequality a2

m + b2

m−n ≥1m(a + b)2, the CD(ρ,m) condition

follows.

5.2 The n-sphere

Consider the n-sphere Sn; that is, the set Sn = p ∈ Rn+1 : |p|2 = 1endowed with the metric coming from Rn+1. To be precise, we identifyTpS

n with the subspace p⊥ ⊂ Rn+1; then the metric on TpSn ⊂ Rn+1 is

defined to be the restriction of the usual Euclidean inner product.

112

Page 114: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Clearly Sn is a compact Riemannian manifold, and so we may apply ourearlier machinery in order to construct a compact Markov triple (Sn, µ,Γ),where µ is the normalized Riemannian volume element on Sn. However,in order to apply any of our inequalities to this Markov triple, we mustfirst compute the Ricci tensor on Sn. There are many different methodsof carrying out this computation, although most of them require a morethorough introduction to methods in Riemannian geometry than these notesprovide.

We will do our computation in the coordinates

(p1, . . . , pn+1) = (x1, . . . , xn,√

1− |x|2).

The final step, for simplicitly, will be carried out only at the “north pole,”where x = 0 and p = (0, . . . , 0, 1). The rotational symmetry of Sn, however,will imply that the conclusion applies at every point.

As a first step, we compute the metric in our coordinates. On the onehand, 〈 ∂

∂pi, ∂∂pj〉 = δij . On the other hand,

∂pi

∂xj=

δij i ≤ nxj√

1−|x|2i = n.

From this, it follows that

gij = 〈 ∂∂xi

,∂

∂xj〉 = 〈 ∂

∂pk∂pk

∂xi,∂

∂p`∂p`

∂xj〉 = δij +

xixj

1− |x|2.

It will be convenient also to compute the inverse of the metric: in localcoordinates,

gij = δij − xixj ;

this may be verified by checking that gijgjk = δik. From now on, we write

∂i instead of ∂∂xi

.Next, we will compute the connection in local coordinates. Note that a

connection is determined by by the vector fields ∇∂i∂j : once we know these,∇XY = ∇Xi∂i(Y

i∂i) is determined by properties (a)–(c) of Definition 5.1.17.We claim that the Riemannian connection on Sn is determined by

∇∂i∂j =xixj

1− |x|2xk∂k + δijx

k∂k;

to check this, it suffices to show that the quantity above is torsion-free andcompatible with the metric. The torsion-free property holds because ∇∂i∂j

113

Page 115: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

is symmetric in i and j, and because [∂i, ∂j ] = 0. For compatibility with themetric, we need to check that

∇∂i〈∂j , ∂k〉 = 〈∇∂i∂j , ∂k〉+ 〈∂j ,∇∂i∂k〉.

On the left hand side,

∇∂i〈∂j , ∂k〉 = ∂i

(δjk +

xjxk

1− |x|2

)=

2xixjxk

(1− |x|2)2+δijx

k + δikxj

1− |x|2.

On the other hand,

〈∇∂i∂j , ∂k〉 =

(xixj

1− |x|2+ δij

)x`〈∂`, ∂k〉

=

(xixj

1− |x|2+ δij

)∑`

x`(δk` +

xkx`

1− |x|2

)=

xk

1− |x|2

(xixj

1− |x|2+ δij

).

It follows then that ∇ is compatible with the metric.Next, we will calculate the coordinates of the Riemannian curvature

tensor at the north pole (i.e., where x = 0). Since [∂i, ∂j ] = 0, we have

Rijk` = 〈∂`,∇∂i∇∂j∂k −∇∂j∇∂i∂k〉.

Now,

∇∂i∇∂j∂k = ∇∂i((

xjxkx`

1− |x|2+ δjkx

`

)∂`

)=

(∇∂i

(xjxkx`

1− |x|2+ δjkx

`

))∂` +

(xjxkx`

1− |x|2+ δjkx

`

)∇∂i∂`.

At the point x = 0, various parts of this expression vanish. The secondterm vanishes entirely, since ∇∂i∂` = 0. For the first term, ∇∂i x

jxkx`

1−|x|2 also

vanishes at x = 0. Hence,

∇∂i∇∂j∂k|x=0 = δjk∂i

Since gij = δij at x = 0, we have

Rijk` = 〈∂`, δjk∂i − δik∂j〉 = δjkδi` − δijδj`.

Hence, the components of the Ricci tensor at x = 0 are

Rjk = Rijk`gi` = (n− 1)δjk = (n− 1)gjk.

It follows then that on Sn, Ric = (n− 1)g.

114

Page 116: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

5.2.1 The ultraspheric triple

Recall the ultraspheric Markov triple from Example 1.11.5: A = C∞([−1, 1])with ΓU (f)(x) = (1 − x2)f ′(x)2 and dµn(x) = Cn(1 − x2)n/2−1 dx. To seethe connection between this triple and the compact Markov triple on Sn,suppose that we have a function h on Sn that only depends on the lastcoordinate: h(p) = f(pn+1). Using our earlier computation for ΓSn in localcoordinates,

ΓSn(h) =n∑i=1

(∂ih)2 −n∑

i,j=1

xixj∂ih∂jh.

Now, h(x) = f(√

1− x2) and so ∂ih = f ′(√

1− x2) xi√1−|x|2

. Hence

ΓSn(h) = f ′(√

1− x2)2

[|x|2

1− |x|2− |x|4

1− |x|2

]= f ′(pn+1)2(1− p2

n+1).

In other words, (ΓSn(h))(p) = (ΓU (f))(pn+1). This begins to clarify thecorrespondence between the triples (Sn, dV,ΓSn) and ((−1, 1), µn,ΓU ): theultraspheric carre du champ acts on f in the same way that the sphericalcarre du champ acts on p 7→ f(pn+1).

The same correspondence holds at the level of the measures. Supposethat f is a function supported on (0, 1), so that p 7→ f(pn+1) is supportedon a single chart. If B = x ∈ Rn : |x| ≤ 1 then∫

Snf(pn+1) dV =

∫B

√| det g|f(

√1− x2) dx

=

∫B

f(√

1− |x|2)√1− |x|2

dx,

where we used Sylvester’s determinant theorem to compute

det g = det

(In −

xxT

1− |x|2

)= 1− |x|2

1− |x|2=

1

1− |x|2.

Moving to polar coordinates,∫Snf(pn+1) dV = sn−1

∫ 1

0rnf(√

1− r2)√1− r2

dr

where sn−1 is the surface area of the (n− 1)-dimensional unit sphere. Withthe change of variables t =

√1− r2,∫

Snf(pn+1) dV = sn−1

∫ 1

0(1− t2)n/2−1f(t) dt.

115

Page 117: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

By a similar argument for the lower hemisphere, if f is any smooth functionon [0, 1] then ∫

Snf(pn+1) dV = sn−1

∫ 1

−1(1− t2)n/2−1f(t) dt.

Up to the difference in normalizing constants, this is the same as∫ 1−1 f dµ.

In this sense, the Markov triple ((−1, 1), µn,ΓU ) may be seen as the Markovtriple (Sn, dV,ΓSn) restricted to functions only depending on one coordi-nate.

116

Page 118: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Chapter 6

Gaussian spaces

6.1 Infinite-dimensional Gaussian measures

In this section, we develop another class of spaces on which we can placecompact Markov triples with curvature lower bounds: the class of Gaussianmeasures on Banach spaces. In the end, the construction of such Markovtriples will be essentially trivial – it will essentially be an extension of theOrnstein-Uhlenbeck example. However, the construction of the spaces them-selves is interesting in its own right, and we will give a self-contained intro-duction. For more background on infinite-dimensional Gaussian measures,see [Bog98]; to see more general diffusion processes on infinite-dimensionalspaces, see [DPZ14].

6.1.1 Integration on Banach spaces

Let (E, ‖ · ‖) be a separable Banach space; recall that separable means thatthere exists a countable, dense subset of E. We write E∗ for the dual spaceof E: the set of bounded linear functionals on E with norm

‖φ‖∗ = supφ(x) : x ∈ E, ‖x‖ = 1.

We write B(E) for the Borel σ-algebra on E; from now on, when we say thata X is an E-valued random variable, we mean that it is B(E)-measurable.

Exercise 6.1.1.

(a) If E is separable then there exists a countable set φn ⊂ E such that‖x‖ = supn φn(x) for every x ∈ E.

117

Page 119: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

(b) B(E) is generated by sets of the form x ∈ E : φ(x) ≤ a, for φ ∈ E∗and a ∈ R.

For a random variable X : Ω → E on a probability space (Ω,F , µ), wesay that X is integrable if E‖X‖ <∞, where E‖X‖ =

∫Ω ‖X‖ dµ; note here

that ‖ · ‖ : E → R is a continuous, and hence Borel measurable, function.Hence, ‖X‖ is B(R)-measurable.

When a random variable is integrable, we may define its integral, alsoknown as the Bochner integral. We do this in two stages: first, if X =∑N

i=1 xi1Ai is simple (i.e. takes only finitely many values) then we define

EX =

N∑i=1

xiµ(Ai).

In the general case, we will use the fact that any random variable may beapproximated in norm by simple ones:

Lemma 6.1.1. If E is a separable Banach space and X : Ω → E is arandom variable then there exists a sequence Xn : Ω→ E of simple randomvariables such that ‖X −Xn‖ 0 pointwise.

Proof. Let x1, x2, . . . be a countable dense subset of E. Define

Xn(ω) = argminx∈x1,...,xn ‖X(ω)− x‖,

where if there are multiple xi that achieve the minimum then we choosethe one with the smallest i. Clearly, each Xn is measurable and simple.Moreover,

‖Xn −X‖ = mini=1,...,n

‖X − xi‖,

which is non-increasing in n. Finally, the density of the set xi ensuresthat ‖Xn −X‖ → 0 pointwise.

Given an integrable random variable X, we will use Lemma 6.1.1 todefine

∫E X dµ: let Xn be a sequence of simple random variables such that

‖Xn −X‖ 0 pointwise. Then

‖EXn − EXm‖ ≤ E‖Xn −X‖+ E‖Xm −X‖,

which converges to zero by the dominated convergence theorem and the factthat X is integrable. Hence, EXn is a Cauchy sequence in E; then we define

EX = limn→∞

EXn.

118

Page 120: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Note that this definition does not depend on the choice of approximatingsequence Xn: given another such sequence Xn,

‖EXn − EXn‖ ≤ E‖Xn −X‖+ E‖Xn −X‖ → 0;

hence EXn and EXn have the same limit.

Exercise 6.1.2. The Bochner integral has the usual properties of integrals:if X and Y are integrable E-valued random variables then

(a) for a ∈ R, aX is integrable and∫E aX dµ = a

∫E X dµ;

(b) X + Y is integrable and∫E X + Y dµ =

∫E X dµ+

∫E Y dµ;

(c) ‖EX‖ ≤ E‖X‖; and

(d) if A : E → F is a bounded linear operator between two separableBanach spaces then then AF is integrable and E[AX] = AEX.

6.1.2 Gaussian measures

A Gaussian measure µ on R takes one of two forms: it is either a unitmass at a point, or it has a density of the form 1√

2πσexp(−(x−m)2/(2σ2))

for some m,σ ∈ R. Recall that the push-forward of a measure µ on Eunder a measurable map f : E → F is the measure f∗µ on F defined by(f∗µ)(A) = µ(f−1(A)).

Definition 6.1.2. Let E be a separable Banach space. A (Borel) probabilitymeasure µ on E is Gaussian if for every φ ∈ E∗, φ∗µ is a Gaussian measureon R. If every φ∗µ has mean zero, we say that µ is centered.

In the case that E is a finite-dimensional space, this definition of Gaus-sian measures coincides with the usual one: a probability measure on Rnis Gaussian if it can be written as a linear push-forward of the standardGaussian measure dγn(x) = (2π)−n/2 exp(−|x|2/2) dx.

Exercise 6.1.3.

(a) Prove the claim of the previous paragraph.

(b) Let µ be a Gaussian measure on E. For a finite collection φ1, . . . , φk ∈E∗, show that (φ1, . . . , φk)∗µ is a Gaussian measure on Rk.

It turns out that many well-known facts about finite-dimensional Gaus-sian measures extend to Gaussian measures on Banach spaces. We beginwith a straight-forward one:

119

Page 121: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Lemma 6.1.3. Let µ be a centered Gaussian measure on E, and let X,Y ∼µ be independent. Then X+Y√

2and X−Y√

2are independent and have distribu-

tion µ.

Proof. Let U = X+Y√2

and V = X−Y√2

. First, note that φ(U) + ψ(V ) =1√2(φ(X) + ψ(X) + φ(Y ) − ψ(Y )) for any φ, ψ ∈ E∗. Since φ(X) + ψ(X)

and φ(Y )−ψ(Y ) are independent real-valued Gaussian variables, it followsthat φ(U) + ψ(V ) is also a real-valued Gaussian variable. Hence, U has aGaussian distribution on E. Moreover, for any finite collection of φi and ψj ,the finite-dimensional random variable

Z := (φ1(U), . . . , φk(U), ψ1(V ), . . . , ψ`(V ))

is Gaussian, since any linear functional of Z may be written in the formφ(U) + ψ(V ).

Now, recall that finite-dimensional Gaussian variables are independentif and only if they are uncorrelated. Applying this to X and Y , it followsthat for any φ, ψ ∈ E∗,

E[φ(U)ψ(V )] =E[φ(X)ψ(X)]− E[φ(Y )ψ(Y )]

2= 0.

It follows then that the first k components of Z are independent of the other` components.

Recall (from Exercise 6.1.1) that sets of the form x : φ(x) ≤ a generateB(E) (where φ ∈ E∗). By the monotone class theorem, if we want to showthat U and V are independent, it suffices to show that Pr(U ∈ A, V ∈B) = Pr(U ∈ A) Pr(V ∈ B) for sets A and B of the form x : φ1(x) ≤a1, . . . , φk(x) ≤ ak. However, this follows from the previous paragraph.Hence, U and V are independent.

The following theorem shows that any Gaussian measure on a separableBanach space has strong integrability properties.

Theorem 6.1.4 (Fernique). For every Gaussian random variable X on aseparable Banach space E, there exists some ε > 0 such that E exp(ε‖x‖2) <∞.

Proof. We first consider the case that µ is centered; then let X and Y beindependent random variables with distribution µ. By Lemma 6.1.3,

Pr(‖X‖ ≤ s) Pr(‖Y ‖ > t) = Pr(‖X − Y ‖ ≤√

2s, ‖X + Y ‖ >√

2t)

≤ Pr(√

2‖X‖ ≥ t− s,√

2‖Y ‖ > t− s)= Pr(

√2‖X‖ ≥ t− s)2,

120

Page 122: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

where the inequality follows because 2‖X‖ ≥ ‖X + Y ‖ − ‖X − Y ‖, andsimilarly for Y . SinceX and Y have the same distribution, we may rearrangethis to read

Pr(‖X‖ > t) ≤Pr(‖X‖ ≥ t−s√

2

)2

Pr(‖X‖ ≤ s). (6.1)

Since Pr(‖X‖ < ∞) = 1, there exists some t0 such that Pr(‖X‖ ≤t0) > 1

2 . We then recursively define tn = t0 +√

2tn−1; equivalently, tn =

t0(1 +√

2)(2(n+1)/2 − 1). Define

pn =Pr(‖X‖ > tn)

Pr(‖X‖ ≤ t0).

By (6.1) with s = t0 and t = tn, pn ≤ p2n−1. Then pn ≤ p2n

0 , recalling thatwe chose t0 so that p0 < 1.

Now,

E exp(ε‖X‖2) ≤ exp(εt20) +∞∑n=0

exp(εt2n+1) Pr(tn < ‖X‖ ≤ tn+1)

≤ exp(εt20) +

∞∑n=0

exp(ε36t202n)p2n

0

= exp(εt20) +∞∑n=0

exp(ε36t202n − 2n log(1/p0)),

where we have used the fact that tn+1 ≤ 3t02n+1. If we choose ε > 0 smaller

than log(1/p0)36t20

, then the sum above converges. This proves the claim in the

case that µ is centered.If µ is not centered, then (again with X and Y independent random

variables distributed according to µ) X − Y has a centered Gaussian distri-bution on E. It follows that for some ε > 0, E exp(ε‖X − Y ‖2) < ∞. ByFubini’s theorem, there exists some y ∈ E such that E exp(ε‖X−y‖2) <∞.Since ‖X‖2 ≤ 2‖X − y‖2 + ‖y2‖, it follows that

E exp( ε

2‖X‖2

)≤ eε‖y‖2E exp(ε‖X − y‖2) <∞.

As a corollary of Theorem 6.1.4, we may take means of Gaussian randomvectors. In particular, the following corollary states that every Gaussianmeasure is some shift of a centered Gaussian measure. In the future, we willmainly study centered Gaussian measures.

121

Page 123: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Corollary 6.1.5. Let X be a Gaussian random variable on E.

(a) X is integrable.

(b) For all φ ∈ E∗, Eφ(X) = φ(EX).

(c) X − EX is a centered Gaussian random variable on E.

Example 6.1.6. Let E = C([0, 1]), the space of continuous, real-valuedfunctions [0, 1], with the norm ‖f‖ = supt∈[0,1] |f(t)|. This is a separableBanach space; for example, the set of polynomials with rational coefficients iscountable and dense. The dual space E∗ is the set of finite, signed measureson [0, 1].

Let X be a Brownian motion on [0, 1], starting at 0. We remark that theusual construction of Brownian motion gives a process that is measurableunder the product σ-algebra R[0,1]. However, this coincides with B(E) sincethe closed unit ball f ∈ E : ‖f‖ ≤ 1 may be written as the countableintersection

⋂t∈Qf ∈ E : |f(t)| ≤ 1.

We claim that X is a centered Gaussian random variable on E. Tocheck that it is Gaussian, consider first a signed measure µ ∈ E∗ of theform µ =

∑ni=1 aiδti. Then µ(X) =

∑ni=1 aiX(ti), which has a Gaussian

measure by the definition of Brownian motion. Note that every µ ∈ E∗

may be written as the weak limit of measures µn, where each µn is a finitesum of Dirac measures. Moreover, recall (or prove as an exercise, using theFourier transform) that every pointwise limit of Gaussian random variableson R is Gaussian. Since every µn(X) is Gaussian, it follows that µ(X) isalso Gaussian. Hence, X is Gaussian. To see that X is centered, it sufficesto observe that whenever µ =

∑ni=1 aiδti then

Eµ(X) =n∑i=1

aiEX(ti) = 0.

6.1.3 The Cameron-Martin space

Let γ be a centered Gaussian measure on E. Since every linear functional φ ∈E∗ is a square-integrable function E → R, E∗ naturally embeds into L2(γ),the Hilbert space of (equivalence classes of) square-integrable functions E →R. Let E∗γ denote the closure of E∗ in L2(γ); as a closed, subspace of aHilbert space, E∗γ is also a Hilbert space. We write 〈·, ·〉∗,γ for its innerproduct and | · |∗,γ for its norm. Note that the inclusion from E∗ to E∗γ maynot be injective: for example, if γ is just a Dirac measure then all linearfunctionals φ ∈ E∗ coincide as elements of L2(γ).

122

Page 124: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Now let X be a E-valued random variable with distribution γ. Con-sider the map J : E∗γ → E defined by J(φ) = E[Xφ(X)] (where Xφ(X)is integrable because E‖Xφ(X)‖ = E[‖X‖|φ(X)|], which is finite by theCauchy-Schwarz inequality and Fernique’s theorem). The map J is linearand injective: linearity is obvious; for injectivity, note that for any φ ∈ E∗that is not zero as an element of L2(γ), φ(J(φ)) = E[φ(X)2] 6= 0, and soJ(φ) 6= 0. Since E∗ is dense in E∗γ , it follows that J is injective on E∗γ also.

Let Hγ ⊂ E be the range of J on E∗γ . Since J is injective, we may definean inner product 〈·, ·〉γ on Hγ by 〈x, y〉γ = 〈J−1(h), J−1(y)〉∗,γ . Then Hγ

is a Hilbert space; we call it the Cameron-Martin space for γ. Sometimes,either E∗γ or Hγ is also called the reproducing kernel Hilbert space of γ.

The Cameron-Martin space may also be characterized as the Hilbertspace that induces the L2(γ) norm on E∗:

Lemma 6.1.7. Hγ is the unique Hilbert space H that is continuously in-cluded in E and satisfies

|φ|∗,γ = supφ(h) : h ∈ H, |h|H ≤ 1

for all φ ∈ E∗.

Proof. By definition, |h|γ ≤ 1 if and only if h = J(ψ) for some ψ ∈ E∗γ withE[ψ(X)2] ≤ 1. Hence,

E[φ(X)2] = supE[ψ(X)φ(X)] : ψ ∈ E∗γ ,E[ψ(X)2] ≤ 1= supφ(J(ψ)) : ψ ∈ E∗γ ,E[ψ(X)2] ≤ 1= supφ(h) : h ∈ Hγ , |h|γ ≤ 1.

Moreover, the embedding of Hγ into E is continuous because

‖J(φ)‖2 ≤ E[‖X‖|φ(X)|]2 ≤ E[‖X‖2]|J(φ)|2γ .

Hence, Hγ satisfies the conditions of the lemma.Now suppose that H is another Hilbert space with the same properties.

Then we may define a map J : E∗ → H ⊂ E by

〈J(φ), h〉H = φ(h);

indeed, note that because H is continously embedded in E, the right handside above is a bounded linear functional on H, and hence can be representedas an inner product. Moreover, the assumption of the lemma implies thatfor every φ ∈ E∗,

|φ|∗,γ = sup〈J(φ), h〉H : |h|H ≤ 1 = |J(φ)|H .

123

Page 125: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

That is, J is an isometry from E∗ ⊂ L2(µ) to H ⊂ E. Hence,

ψ(J(φ)) = 〈J(φ), J(ψ)〉H = E[φ(X)ψ(X)] = ψ(J(φ))

for any ψ, φ ∈ E. It follows that J = J |E∗ . Since Hγ is the closure of therange of J |E∗ , if we can show that the range of J is dense in H then it willimply H = Hγ . For this last step, note that if h ∈ H is orthogonal to therange of J then 0 = 〈h, J(φ)〉H = φ(h) for all φ ∈ E∗, so h = 0.

A consequence of Lemma 6.1.7 is that the Cameron-Martin space Hγ isan intrinsic feature of γ, in the sense that it is unchanged if we embed Einto a larger space:

Corollary 6.1.8. Let E1 ⊂ E2 be separable Banach spaces. Let γ1 be aGaussian measure on E1 and let γ2 be its extension to E2 (i.e. γ2(A) =γ1(A ∩ E)). Then Hγ1 = Hγ2.

Proof. Take X ∼ γ2 and φ2 ∈ E∗2 . Let φ1 ∈ E∗1 be φ2 restricted to E1. Notethat X ∈ E1 almost surely, and hence

|φ2|2∗,γ = E[φ2(X)2] = E[φ1(X)2] = |φ1|2∗,γ .

Now, Lemma 6.1.7 implies that

|φ2|∗,γ = |φ1|∗,γ = supφ1(h) : |h|Hγ1 ≤ 1 = supφ2(h) : |h|Hγ1 ≤ 1.

By the uniqueness part of Lemma 6.1.7 applied to E2, it follows that Hγ2 =Hγ1 .

Before proceeding to some examples, we will mention without proof onefundamental property of Cameron-Martin spaces (which, historically, wasthe property that Cameron and Martin were originally interested in). Givena measure γ on E and some h ∈ E, we write γh for the shifted measureγh(A) = γ(A − h). Cameron and Martin were studying the problem ofwhen γh has a density with respect to γ. For finite-dimensional Gaussianmeasures this is easy, and it depends only on the support of γ: assumingfor convenience that γ is centered, γh has a density with respect to γ if andonly if h belongs to the support of γ.

Theorem 6.1.9 (Cameron-Martin formula). Let γ be a Gaussian measureon a separable Banach space E. If h ∈ E \Hγ then γh and γ are mutuallysingular. If h ∈ Hγ then γh and γ are mutually absolutely continuous, and

dγhdγ

= exp

(g(x)− 1

2|h|2γ

),

where g = J−1(h) ∈ X∗γ .

124

Page 126: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Example 6.1.10. Let X be a Brownian motion on [0, 1] and let γ be itsdistribution. We already saw that X is a centered Gaussian random variable,and that if µ ∈ E∗ is a finite, signed measure then

|µ|2∗,γ =

∫ 1

0

∫ 1

0s ∧ t dµ(s) dµ(t).

In order to compute the Cameron-Martin space of γ, we rewrite this quantity:∫ 1

0

∫ 1

0s ∧ t dµ(s) dµ(t). =

∫ 1

0

∫ 1

0

∫ 1

01r≤s1r≤t dr dµ(s) dµ(t)

=

∫ 1

0µ([r, 1])2 dr.

Now let g : [0, 1] → R be a smooth function with g(0) = 0. Integrating byparts, ∫ 1

0g′(r)µ([r, 1]) dr =

∫ 1

0g(r)dµ(r).

Since C∞([0, 1]) is dense in L2([0, 1]),

|µ|∗,γ =

(∫ 1

0µ([r, 1])2 dr

)1/2

= sup

∫ 1

0µ([r, 1])g′(r) dr : g ∈ C∞([0, 1]), g(0) = 0,

∫ 1

0g′(r)2 dr ≤ 1

= sup

∫ 1

0g(r) dµ(r) : g ∈ C∞([0, 1]), g(0) = 0,

∫ 1

0g′(r)2 dr ≤ 1

.

Therefore, let H0γ ⊂ C([0, 1]) be the set of smooth g such that g(0) = 0 and∫ 1

0 g′(r)2 dr <∞, with the pre-Hilbert norm

|g|2γ =

∫ 1

0g′(r)2 dr.

According to Lemma 6.1.7 and the computation above, Hγ is the completionof H0

γ with respect to | · |γ. Equivalently, Hγ consists of all measurablefunctions g such that there exists h ∈ L2([0, 1]) with g(x) =

∫ x0 h(t) dt.

Example 6.1.11. Let Xn : n ≥ 1 be a sequence of independent real-valued Gaussian variables, with E[Xn] = 0 and E[X2

n] = σ2n. Depending on

the properties of the sequence σn, the sequence X = (X1, X2, . . . ) may beviewed as a Gaussian random variable in one of various different spaces.The details are left as an exercise.

125

Page 127: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

• If there is some M ∈ R such that∑

n≥1 exp(−M/σ2n) < ∞ then X

defines an `∞-valued random variable, where `∞ is the set of boundedsequences of real numbers, with the norm ‖x‖∞ = supn |xn|. Note thatthis is not a separable Banach space; there is no difficulty in extendingthe definition of Gaussian measures to non-separable Banach spaces.

• If for every ε > 0,∑

n≥1 exp(−ε/σ2n) <∞ then X defines a c0-valued

random variable, where c0 is the set of sequences of real numbers thatconverge to zero, with the norm ‖x‖∞ = supn |xn|.

• If∑

n≥1 σpn < ∞ for some 1 ≤ p < ∞ then X defines an `p-valued

random variable, where `p is the set of sequences x with

‖x‖pp =∑n≥1

|xn|p <∞.

In each of the cases above, the Cameron-Martin norm is given by

|x|2γ =∑n≥1

x2n

σ2n

and the Cameron-Martin space is the set of x ∈ E (for one of the vari-ous choices of E) such that |x|2γ < ∞. Note that in each of the cases, anysequence with |x|γ < ∞ belongs to E. To check that this is the Cameron-Martin norm, note that for any linear function of the form φ(x) =

∑n≥1 xnan,

|φ(x)|2∗,γ = E[φ(X)2] =∑n≥1

σ2na

2n = sup

∑n≥1

anxn :∑n≥1

x2n

σ2n

≤ 1

.

In c0 and `p for p < ∞, all linear functionals have the form above. In `∞,not all do; however, one can check that linear functionals of the form abovegenerate B(E), and that it is enough to check the condition of Lemma 6.1.7on a set of linear functionals that generate B(E).

6.1.4 Gaussian Hilbert spaces

In the case that γ is a centered Gaussian measure on a Hilbert space E,several nice simplifications occur. In particular, E∗ is canonically isomorphicto E, and so we have the inclusions

Hγ ⊂ E ⊂ E∗γ .

126

Page 128: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Moreover, there is a nice notion of covariance on Gaussian Hilbert spaces:let K : E → E be defined by

〈Kx, y〉E = 〈x, y〉∗,γ = E[〈X,x〉E〈X, y〉E ],

where X ∼ γ and we use 〈·, ·〉E to denote the inner product under which Eis a Hilbert space. By the Cauchy-Schwarz inequality,

E[〈X,x〉E〈X, y〉E ] ≤ |x|E |y|EE[|X|2E ],

and Fernique’s theorem implies that E[|X|2E ] <∞. It follows then that K isa bounded operator. Note that K is also symmetric (and hence self-adjoint)and non-negative.

Recall that a compact operator is one that maps bounded sets to compactsets, or equivalently one that maps weakly converging sequences to stronglyconverging sequences. If xn ∈ E is a sequence that weakly converges to zero,then

|√Kxn|2E = 〈Kxn, xn〉E ≤ E[〈X,xn〉2E ],

which converges to zero by the dominated convergence theorem, since theintegrand is bounded by |X|2E maxn |xn|2E . Then

√K is compact, and so K

is also. Now, every compact operator on a Hilbert space may be decom-posed in terms of its eigenvectors and eigenvalues: there exists a completeorthonormal sequence en and a real sequence λn such that

Kx =∑n≥1

λnen〈en, x〉E .

Moreover, zero is the only possibly accumulation point of the sequence λn.Note also that since X =

∑n≥1 en〈X, en〉E ,

E[|X|2E ] =∑n≥1

E[〈X, en〉2E ] =∑n≥1

〈Ken, en〉E =∑n≥1

λn.

Hence, K is a trace-class operator, meaning an operator such that∑n≥1

〈Ken, en〉E <∞

for some (or equivalently, every) orthonormal basis en. Finally, one cancharacterize Hγ , E, and E∗γ in terms of the orthonormal basis en of K’s

127

Page 129: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

eigenvectors:

E =∑n≥1

anen :∑n≥1

a2n <∞

with norm

∣∣∣∑n≥1

anen

∣∣∣2E

=∑n≥1

a2n

Hγ =∑n≥1

anen :∑n≥1

a2n

λn<∞

with norm

∣∣∣∑n≥1

anen

∣∣∣2γ

=∑n≥1

a2n

λn

E∗γ =∑n≥1

anen :∑n≥1

λna2n <∞

with norm

∣∣∣∑n≥1

anen

∣∣∣2∗,γ

=∑n≥1

λna2n.

As another interpretation, the `2 case of Example 6.1.11 is essentially theonly construction of a Gaussian Hilbert space.

6.1.5 The white noise representation

Theorem 6.1.12. Let γ be a Gaussian measure on the separable Banachspace E. Let e1, e2, . . . be an orthonormal basis for Hγ that belongs to JE∗,and let ξ1, ξ2, . . . be independent standard Gaussian variables on R. Then∑

n≥1 ξiei converges almost surely to a random variable with distribution γ.

Before proceeding to the proof of Theorem 6.1.12, we need to introducea few tools. The first is Prohorov’s theorem, which will help us prove con-vergence in distribution:

Definition 6.1.13 (Tightness). A sequence µn of probability measures istight if for every ε > 0, there exists a compact set K such that for every nµn(K) ≥ 1− ε. A sequence of random variables is tight if their distributionsare tight.

Theorem 6.1.14 (Prohorov). A sequence of probability measures is tightif and only if it is weakly relatively compact (i.e. if every subsequence has afurther subsequence that converges in distribution).

In order to apply Prohorov’s theorem, we first observe that every prob-ability measure on a separable Banach space can be exhausted by compactsets. Equivalently, every singleton set µ is tight. This property is alsosometimes called inner regular : µ is inner regular if µ is tight.

Lemma 6.1.15. For every probability measure µ on a separable Banachspace E, and for every ε > 0, there exists a compact set K with µ(K) ≥ 1−ε.

128

Page 130: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Proof. Let xn be a countable, dense subset of E. Define

Fm,k =

m⋃i=1

B(xi, 1/k),

where B(x, r) is the closed ball of radius r centered at x. Since Fm,k → Eas m→∞, there exists some mk such that µ(Fmk,k) ≥ 1− 2−kε. Note thatFmk,k is closed.

Let K =⋂∞k=1 Fmk,k. Then µ(K) ≥ 1 −

∑k≥1 2−kε = 1 − ε. Moreover,

K is compact because it is closed and totally bounded.

The second tool that we will need for the proof of Theorem 6.1.12 is aso-called maximal inequality. This will help us to upgrade the convergence indistribution (provided by Prohorov’s theorem) into almost sure convergence.

Lemma 6.1.16. Let Xn be a sequence of independent random variableson E with symmetric distributions (i.e. Xn and −Xn have the same distri-bution). Let Sn =

∑ni=1Xn. Then for all t ≥ 0,

Pr(maxi≤n‖Si‖ ≥ t) ≤ 2 Pr(‖Sn‖ ≥ t).

Proof. Let N be the smallest i that ‖Si‖ ≥ t. Then

Pr(maxi≤n‖Si‖ ≥ t) =

n∑i=1

Pr(N = i).

Conditioned on N = i ≤ n, the probability that ‖Sn‖ ≥ t is at least 12 , since

the random variables ‖SN +∑n

i=N+1Xi‖ and ‖SN −∑n

i=N+1Xi‖ have thesame distribution conditioned on N , and at least one of them is at least t.Hence, Pr(N = i) ≤ 2 Pr(N = i, ‖Sn‖ ≥ t), which implies the claim.

Proof of Theorem 6.1.12. Since the claim of the theorem depends only onthe distribution of the sequence ξn, it suffices to prove the theorem ona specific probability space. In particular, we take the probability space(E, γ). Take φn = J−1(en) ∈ E∗, which then form an orthonormal basisfor E∗γ which is dual to the basis en in the sense that φm(en) = δmn. Letξn : E → R be given by ξn(x) = φn(x); since the φn are orthonormal inE∗γ , the ξn are independent real-valued Gaussian variables. Let Sn(x) =∑n

i=1 eiξi(x) and set S(x) = x. Note that S is an E-valued random variablewith distribution γ; our goal will be to show that Sn → S almost surely.

129

Page 131: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

For any m ≤ n, φm(Sn) = φm(x) = φm(S). On the other hand, if m > nthen φm(Sn) = 0. It follows that φm(S − Sn) and φm(S) are independentfor all m (since one of them is always trivial). Hence, φ(S − Sn) and φ(S)are independent for every φ in E∗γ , which is the closed linear span of φm.Since E∗γ generates B(E) up to γ-null sets, it follows that S−Sn and Sn areindependent; let Rn = S − Sn.

Next, we show that Rn is a tight sequence of random variables. Forε > 0, choose a compact set K ⊂ E such that γ(K) ≥ 1− ε/2. Then

γ(K) = Pr(S ∈ K) = Pr(Rn + Sn ∈ K) ≥ 1− ε/2.

By Fubini’s theorem (over the independent random variables Rn and Sn),there exists some x ∈ E such that

Pr(x+Rn ∈ K) ≥ 1− ε/2.

Since Rn is symmetric, we have also that Pr(x−Rn ∈ K) ≥ 1− ε/2. Hence,with probability at least 1 − ε, both x + Rn and x − Rn belong to K. Onthis event, Rn ∈ K −K = y − z : y ∈ K, z ∈ K, which is a compact set.To summarize, Pr(Rn ∈ K −K) ≥ 1− ε, which implies that Rn is tight.

By Prohorov’s theorem, there is a random variable R such that somesubsequence of Rn converges to R in distribution. Since φm(Rn) = φm(S)−φm(Sn)→ 0 as n→∞, it follows that φm(R) = 0 for every m (recalling thatevery φm ∈ E∗ is a continuous function on E). Since the φm generate B(E)up to γ-null sets, every possible subsequential limit R of Rn is the constantrandom variable 0. Hence, S − Sn converges to zero in distribution, andtherefore also in probability. That is, Pr(‖S−Sn‖ > ε)→ 0 for every ε > 0.

Finally, we upgrade the convergence of S − Sn to be almost sure. ByLemma 6.1.16, for every m < n,

Pr( maxm<`≤n

‖S` − Sm‖ ≥ t) ≤ 2 Pr(‖Sn − Sm‖ ≥ t)

≤ 2 Pr(‖Sm − S‖ ≥ t/2) + 2 Pr(‖Sn − S‖ ≥ t/2).

Taking n→∞,

Pr(sup`>m‖S` − Sm‖ ≥ t) ≤ 2 Pr(‖Sm − S‖ ≥ t/2).

Since the right hand side converges to zero as m → ∞, it follows that S`has a limit almost surely (and it can only be zero, since that is the limit indistribution).

130

Page 132: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

6.1.6 A compact Markov triple

Using the orthonormal system of coordinates provided by the Cameron-Martin space Hγ , we will construct compact Markov triples on Gaussianmeasures. Let φn be an orthonormal (in L2(γ)) basis of E∗, and takeen = J(φn). Take A0 to be the set of functions of the form f(x) =g(φ1(x), . . . , φk(x), where g : Rk → R is a smooth, bounded function suchthat all of its derivatives vanish super-polynomially fast near ∞. DefineΓ : A0 ×A0 → A0 by

Γ(f1, f2)(x) = 〈∇g1(φ1(x), . . . , φk(x),∇g2(φ1(x), . . . , φk(x))〉,

where fi(x) = gi(φ1(x), . . . , φk(x)). Note that (since φ1, . . . , φk are orthonor-mal in L2(γ)), (φ1(x), . . . , φk(x)) has a standard Gaussan distribution on Rk.Integrating by parts, we have

Lf(x) = ∆g(φ1(x), . . . , φk(x))−k∑i=1

φi(x)∂ig(φ1(x), . . . , φk(x)).

Moreover, the associated semigroup Pt acts on f ∈ A0 in the same way thatthe Ornstein-Uhlenbeck semigroup acts on functions Rk → R:

(Ptf)(x) = Ef(e−tx+√

1− e−2tX)

=

∫Rkg(e−t(φ1(x), . . . , φk(x)) +

√1− e−2ty) dγn(y)

where X ∼ γ and γn is the standard Gaussian measure on Rk. (The easiestway to see that this agrees with etL is to work backwards: starting from theabove formula for Pt, we have already computed dPt

dt and it agrees with theformula for L above.)

Most of the properties of Definition 1.9.5 are trivial to check. We com-ment briefly on the last two: for f ∈ A0 with f(x) = g(φ1(x), . . . , φk(x)),note that

∆g −k∑i=1

xi∂ig

is still smooth, bounded, and has partial derivatives that vanish polynomi-ally fast at ∞; this shows that LA0 ⊂ A0. On the other hand, the formulaabove for Ptf implies that Ptf has the form Ptf(x) = gt(φ1(x), . . . , φk(x)),and that ∂igt = e−tPt∂ig. Moreover, the formula for Pt shows that if ∂igdecays super-polynomially at infinity then Pt∂ig does also. The same argu-ment applies to repeated derivatives of gt, and hence Ptf ∈ A0.

131

Page 133: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Now we have a compact Markov triple on (E, γ), but in order to getuseful inequalities we need to establish a curvature-dimension condition. Asbefore, this just reduces to the finite-dimensional problem. As we computedalready, the standard Gaussian measure on Rk satisfies CD(1,∞) for any k.Since A0 consists only of “finite-dimensional” functions, the same inequalityapplies for all f ∈ A0; we conclude that (E, γ,Γ) satisfies CD(ρ,∞).

In order to interpret the implications of the CD(ρ,∞) condition, notethat in our setting, there are multiple possible notions of Lipschitz functions:

(a) f is L-Lipschitz with respect to ‖ · ‖ if |f(x) − f(y)| ≤ L‖x − y‖ forevery x, y ∈ E;

(b) f is L-Lipschitz with respect to | · |γ if |f(x) − f(x + h)| ≤ L|h|γ forevery x ∈ E and h ∈ Hγ ; or

(c) f is L-Lipschitz with respect to Γ if f ∈ D(E) and Γ(f) ≤ 1 γ-almostsurely.

Of course, the third of these is the one that we have been using throughoutthese notes, and it is the one that we must use in interpreting consequencesof the CD(1,∞) property. However, these three notions are related:

Proposition 6.1.17. If f is L-Lipschitz with respect to ‖ · ‖ then it is CL-

Lipschitz with respect to | · |γ, where C = sup ‖h‖|h|γ : h ∈ Hγ , h 6= 0 <∞. If

f is L-Lipschitz with respect to | · |γ then it is L-Lipschitz with respect to Γ.

Proof. The first statement follows from the fact that ‖h‖ ≤ C|h|γ ; moreover,we have already seen that C is finite: it is equivalent to the fact that Hγ iscontinuously embedded in E.

To prove the second statement, suppose that f is 1-Lipschitz with respectto |·|γ (it suffices to check the case L = 1). By the white noise decompositionof Theorem 6.1.12, the random variable X =

∑n≥1 ξnen has distribution

γ, where ξn are independent real-valued standard Gaussians. In order toshow that f ∈ D(E), we need to produce an approximating sequence in A0.Therefore, define fn : E → R by

fn(x) = Ef

(n∑i=1

φi(x)ei +

∞∑i=n+1

ξiei

).

One can check directly that fn is the conditional expectation of f withrespect to the σ-algebra generated by φ1, . . . , φn. In particular, fn → f inL2(γ), because the set of all φi spans L2(γ).

132

Page 134: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Next, note that fn(x) depends only on φ1(x), . . . , φn(x). Therefore, letgn : Rn → R be defined by

gn(y) = fn

(n∑i=1

yiei

);

this way, fn(x) = gn(φ1(x), . . . , φn(x)). Then gn is 1-Lipschitz with respectto the Euclidean metric on Rn, since

|gn(y)− gn(y′)| ≤ E

∣∣∣∣∣f(

n∑i=1

yiei +

∞∑i=n+1

ξiei

)− f

(n∑i=1

y′iei +

∞∑i=n+1

ξiei

)∣∣∣∣∣≤

∣∣∣∣∣n∑i=1

yiei +∞∑

i=n+1

ξiei −n∑i=1

y′iei +∞∑

i=n+1

ξiei

∣∣∣∣∣γ

= |y − y′|.

By the usual analysis on Rn (e.g. convolving with smooth mollifiers), gn maybe approximated in L2(γn) by smooth, 1-Lipschitz functions gn,m in such away that |∇gn,m| converges in L1(γn) as m → ∞. We call the limit |∇gn|,and note that it may also be expressed as

|∇gn|(x) = limε→0

sup

|gn(x)− gn(y)||x− y|

: y 6= x, |y − x| ≤ ε. (6.2)

It follows that fn ∈ D(E) and

Γ(fn)(x) = |∇gn|2(φ1(x), . . . , φn(x)) ≤ 1.

Finally, we address the convergence of Γ(fn) as n → ∞. To establishthis convergence, we claim that Γ(fn) is a sub-martingale that is boundedby 1. From this, it will follow that Γ(fn) converges in L1(γ) to a limit thatis also bounded by 1. Then f ∈ D(E), and Γ(f) is equal (by definition) tothe limit of Γ(fn).

It remains to show that Γ(fn) is a sub-martingale (we already know thatit is bounded by 1). By Jensen’s inequality, it suffices to show that

√Γ(fn)

is a sub-martingale, which is equivalent to showing that for every n andevery x ∈ Rn,

|∇gn(x)| ≤ E[|∇gn+1|(x, ξ)],where ξ is a standard normal variable. To check this inequality, note thatgn(x) = E[gn+1(x, ξ)], and so by Jensen’s inequality

|gn(x)− gn(y)||x− y|

≤ E|gn+1(x, ξ)− gn+1(y, ξ)|

|x− y|.

133

Page 135: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Going back to (6.2), we have

|∇gn(x)| ≤ sup

E|gn+1(x, ξ)− gn+1(y, ξ)|

|x− y|: y 6= x, |y − x| ≤ ε

≤ E sup

|gn+1(x, ξ)− gn+1(y′)|

|x− y′|: y′ 6= (x, ξ), |y′ − (x, ξ)| ≤ ε

= E|∇gn+1(x, ξ)|.

134

Page 136: Joe Neeman November 3, 2016 - uni-bonn.de · 2016-11-03 · For example, we will require martingales, the Hahn-Banach theorem, and a working knowledge of various types of convergence

Bibliography

[BGL14] D. Bakry, I. Gentil, and M. Ledoux. Analysis and geometry ofMarkov diffusion operators. Springer, 2014.

[Bog98] V. I. Bogachev. Gaussian measures. American MathematicalSociety, 1998.

[DPZ14] G. Da Prato and J. Zabczyk. Stochastic equations in infinitedimensions. Cambridge University Press, 2014.

[Lee06] J. Lee. Riemannian manifolds: an introduction to curvature.Springer, 2006.

[LNP15] M. Ledoux, I. Nourdin, and G. Peccati. Stein’s method, logarith-mic Sobolev and transport inequalities. Geometric and FunctionalAnalysis, 25(1):256–306, 2015.

[MRN53] E. Marczewski and C. Ryll-Nardzewski. Remarks on the compact-ness and non direct products of measures. Fundamenta Mathe-maticae, 40(1):165–170, 1953.

135