

STOCHASTIC PROCESSES AND APPLICATIONS

G.A. Pavliotis
Department of Mathematics
Imperial College London
London SW7 2AZ, UK

June 9, 2011


Contents

Preface  vii

1 Introduction  1
  1.1 Introduction  1
  1.2 Historical Overview  1
  1.3 The One-Dimensional Random Walk  3
  1.4 Stochastic Modeling of Deterministic Chaos  6
  1.5 Why Randomness  6
  1.6 Discussion and Bibliography  7
  1.7 Exercises  7

2 Elements of Probability Theory  9
  2.1 Introduction  9
  2.2 Basic Definitions from Probability Theory  9
    2.2.1 Conditional Probability  12
  2.3 Random Variables  13
    2.3.1 Expectation of Random Variables  16
  2.4 Conditional Expectation  18
  2.5 The Characteristic Function  19
  2.6 Gaussian Random Variables  20
  2.7 Types of Convergence and Limit Theorems  23
  2.8 Discussion and Bibliography  25
  2.9 Exercises  25

3 Basics of the Theory of Stochastic Processes  29
  3.1 Introduction  29
  3.2 Definition of a Stochastic Process  29
  3.3 Stationary Processes  31
    3.3.1 Strictly Stationary Processes  31
    3.3.2 Second-Order Stationary Processes  32
    3.3.3 Ergodic Properties of Second-Order Stationary Processes  37
  3.4 Brownian Motion  39
  3.5 Other Examples of Stochastic Processes  44
    3.5.1 Brownian Bridge  44
    3.5.2 Fractional Brownian Motion  45
    3.5.3 The Poisson Process  46
  3.6 The Karhunen-Loeve Expansion  46
  3.7 Discussion and Bibliography  51
  3.8 Exercises  51

4 Markov Processes  57
  4.1 Introduction  57
  4.2 Examples  57
  4.3 Definition of a Markov Process  62
  4.4 The Chapman-Kolmogorov Equation  64
  4.5 The Generator of a Markov Process  67
    4.5.1 The Adjoint Semigroup  69
  4.6 Ergodic Markov Processes  70
    4.6.1 Stationary Markov Processes  72
  4.7 Discussion and Bibliography  73
  4.8 Exercises  74

5 Diffusion Processes  77
  5.1 Introduction  77
  5.2 Definition of a Diffusion Process  77
  5.3 The Backward and Forward Kolmogorov Equations  79
    5.3.1 The Backward Kolmogorov Equation  79
    5.3.2 The Forward Kolmogorov Equation  82
  5.4 Multidimensional Diffusion Processes  84
  5.5 Connection with Stochastic Differential Equations  85
  5.6 Examples of Diffusion Processes  86
  5.7 Discussion and Bibliography  86
  5.8 Exercises  87

6 The Fokker-Planck Equation  89
  6.1 Introduction  89
  6.2 Basic Properties of the FP Equation  90
    6.2.1 Existence and Uniqueness of Solutions  90
    6.2.2 The FP Equation as a Conservation Law  92
    6.2.3 Boundary Conditions for the Fokker-Planck Equation  92
  6.3 Examples of Diffusion Processes  94
    6.3.1 Brownian Motion  94
    6.3.2 The Ornstein-Uhlenbeck Process  98
    6.3.3 The Geometric Brownian Motion  102
  6.4 The Ornstein-Uhlenbeck Process and Hermite Polynomials  103
  6.5 Reversible Diffusions  110
    6.5.1 Markov Chain Monte Carlo (MCMC)  115
  6.6 Perturbations of Non-Reversible Diffusions  115
  6.7 Eigenfunction Expansions  116
    6.7.1 Reduction to a Schrodinger Equation  117
  6.8 Discussion and Bibliography  119
  6.9 Exercises  120

7 Stochastic Differential Equations  123
  7.1 Introduction  123
  7.2 The Ito and Stratonovich Stochastic Integrals  124
    7.2.1 The Stratonovich Stochastic Integral  125
  7.3 Stochastic Differential Equations  126
    7.3.1 Examples of SDEs  127
  7.4 The Generator, Ito's Formula and the Fokker-Planck Equation  129
    7.4.1 The Generator  129
    7.4.2 Ito's Formula  129
  7.5 Linear SDEs  131
  7.6 Derivation of the Stratonovich SDE  134
    7.6.1 Ito versus Stratonovich  137
  7.7 Numerical Solution of SDEs  138
  7.8 Parameter Estimation for SDEs  138
  7.9 Noise Induced Transitions  138
  7.10 Discussion and Bibliography  140
  7.11 Exercises  140

8 The Langevin Equation  141
  8.1 Introduction  141
  8.2 The Fokker-Planck Equation in Phase Space (Klein-Kramers Equation)  141
  8.3 The Langevin Equation in a Harmonic Potential  146
  8.4 Asymptotic Limits for the Langevin Equation  155
    8.4.1 The Overdamped Limit  157
    8.4.2 The Underdamped Limit  163
  8.5 Brownian Motion in Periodic Potentials  168
    8.5.1 The Langevin Equation in a Periodic Potential  168
    8.5.2 Equivalence with the Green-Kubo Formula  174
  8.6 The Underdamped and Overdamped Limits of the Diffusion Coefficient  176
    8.6.1 Brownian Motion in a Tilted Periodic Potential  185
  8.7 Numerical Solution of the Klein-Kramers Equation  188
  8.8 Discussion and Bibliography  188
  8.9 Exercises  189

9 The Mean First Passage Time and Exit Time Problems  191
  9.1 Introduction  191
  9.2 Brownian Motion in a Bistable Potential  191
  9.3 The Mean First Passage Time  194
    9.3.1 The Boundary Value Problem for the MFPT  194
    9.3.2 Examples  197
  9.4 Escape from a Potential Barrier  199
    9.4.1 Calculation of the Reaction Rate in the Overdamped Regime  200
    9.4.2 The Intermediate Regime: γ = O(1)  201
    9.4.3 Calculation of the Reaction Rate in the Energy-Diffusion-Limited Regime  202
  9.5 Discussion and Bibliography  202
  9.6 Exercises  203

10 Stochastic Processes and Statistical Mechanics  205
  10.1 Introduction  205
  10.2 The Kac-Zwanzig Model  206
  10.3 The Generalized Langevin Equation  213
  10.4 Open Classical Systems  217
  10.5 Linear Response Theory  218
  10.6 Projection Operator Techniques  218
  10.7 Discussion and Bibliography  219
  10.8 Exercises  219


Preface

The purpose of these notes is to present various results and techniques from the theory of stochastic processes that are useful in the study of stochastic problems in physics, chemistry and other areas. These notes have been used for several years for a course on applied stochastic processes offered to fourth-year and MSc students in applied mathematics at the Department of Mathematics, Imperial College London.

G.A. Pavliotis
London, December 2010


Chapter 1

Introduction

1.1 Introduction

In this chapter we introduce some of the concepts and techniques that we will study in this book. In Section 1.2 we present a brief historical overview of the development of the theory of stochastic processes in the twentieth century. In Section 1.3 we introduce the one-dimensional random walk and use this example to introduce several concepts, such as Brownian motion and the Markov property. In Section 1.4 we discuss the stochastic modeling of deterministic chaos. Some comments on the role of probabilistic modeling in the physical sciences are offered in Section 1.5. Discussion and bibliographical comments are presented in Section 1.6. Exercises are included in Section 1.7.

1.2 Historical Overview

The theory of stochastic processes, at least in terms of its application to physics, started with Einstein's work on the theory of Brownian motion: Concerning the motion, as required by the molecular-kinetic theory of heat, of particles suspended in liquids at rest (1905), and a series of additional papers published in the period 1905-1906. In these fundamental works, Einstein presented an explanation of Brown's observation (1827) that small pollen grains suspended in water are found to be in a very animated and irregular state of motion. In developing his theory Einstein introduced several concepts that still play a fundamental role in the study of stochastic processes and that we will study in this book. Using modern terminology, Einstein introduced a Markov chain model for the motion of


the particle (molecule, pollen grain...). Furthermore, he introduced the idea that it makes more sense to talk about the probability of finding the particle at position x at time t, rather than about individual trajectories.

In his work many of the main aspects of the modern theory of stochastic processes can be found:

• The assumption of Markovianity (no memory) expressed through the Chapman-Kolmogorov equation.

• The Fokker–Planck equation (in this case, the diffusion equation).

• The derivation of the Fokker-Planck equation from the master (Chapman-Kolmogorov) equation through a Kramers-Moyal expansion.

• The calculation of a transport coefficient (the diffusion coefficient) using macroscopic (kinetic theory-based) considerations:

D = \frac{k_B T}{6 \pi \eta a},

where k_B is Boltzmann's constant, T is the temperature, η is the viscosity of the fluid and a is the diameter of the particle.

Einstein's theory is based on the Fokker-Planck equation. Langevin (1908) developed a theory based on a stochastic differential equation. The equation of motion for a Brownian particle is

m \frac{d^2 x}{dt^2} = -6 \pi \eta a \frac{dx}{dt} + \xi,

where ξ is a random force. It can be shown that there is complete agreement between Einstein's theory and Langevin's theory. The theory of Brownian motion was developed independently by Smoluchowski, who also performed several experiments.

The approaches of Langevin and Einstein represent the two main approachesin the theory of stochastic processes:

• Study individual trajectories of Brownian particles. Their evolution is governed by a stochastic differential equation:

\frac{dX}{dt} = F(X) + \Sigma(X) \xi(t),

where ξ(t) is a random force.

• Study the probability ρ(x, t) of finding a particle at position x at time t. This probability distribution satisfies the Fokker-Planck equation:

\frac{\partial \rho}{\partial t} = -\nabla \cdot \big( F(x) \rho \big) + \frac{1}{2} \nabla \nabla : \big( A(x) \rho \big),

where A(x) = \Sigma(x) \Sigma(x)^T.

The theory of stochastic processes was developed during the 20th century by several mathematicians and physicists including Smoluchowski, Planck, Kramers, Chandrasekhar, Wiener, Kolmogorov, Ito and Doob.

1.3 The One-Dimensional Random Walk

We let time be discrete, i.e. t = 0, 1, .... Consider the following stochastic process S_n: S_0 = 0; at each time step it moves to ±1 with equal probability 1/2.

In other words, at each time step we flip a fair coin. If the outcome is heads,we move one unit to the right. If the outcome is tails, we move one unit to the left.

Alternatively, we can think of the random walk as a sum of independent random variables:

S_n = \sum_{j=1}^{n} X_j,

where X_j ∈ {-1, 1} with P(X_j = ±1) = 1/2.

We can simulate the random walk on a computer:

• We need a (pseudo)random number generator to generate n independent random variables which are uniformly distributed in the interval [0, 1].

• If the value of the random variable is > 1/2 then the particle moves to the left, otherwise it moves to the right.

• We then take the sum of all these random moves.

• The sequence {S_n}_{n=1}^{N}, indexed by the discrete time T = {1, 2, ..., N}, is the path of the random walk. We use a linear interpolation (i.e. connect the points (n, S_n) by straight lines) to generate a continuous path.
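The steps above can be sketched in a few lines of Python (a hypothetical illustration, not code from the notes; the function name `random_walk` and the sample sizes are my own choices):

```python
import random

def random_walk(n_steps, seed=None):
    """Simulate one path S_0, S_1, ..., S_n of the random walk."""
    rng = random.Random(seed)
    path = [0]  # S_0 = 0
    for _ in range(n_steps):
        # a uniform draw in [0, 1): > 1/2 means a step to the left
        step = -1 if rng.random() > 0.5 else 1
        path.append(path[-1] + step)
    return path

# Check the statistics E(S_n) = 0 and E(S_n^2) = n over many paths.
n, n_paths = 50, 20000
paths = [random_walk(n, seed=k) for k in range(n_paths)]
mean = sum(p[-1] for p in paths) / n_paths
second_moment = sum(p[-1] ** 2 for p in paths) / n_paths
print(mean, second_moment)  # mean close to 0, second moment close to n = 50
```

Plotting the points (n, S_n) of one path connected by straight lines produces the continuous, piecewise-linear paths shown in the figures below.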


Figure 1.1: Three paths of the random walk of length N = 50.

Figure 1.2: Three paths of the random walk of length N = 1000.


Figure 1.3: Sample Brownian paths U(t) (the mean of 1000 paths together with 5 individual paths).

Every path of the random walk is different: it depends on the outcome of a sequence of independent random experiments. We can compute statistics by generating a large number of paths and computing averages. For example, E(S_n) = 0 and E(S_n^2) = n. The paths of the random walk (without the linear interpolation) are not continuous: the random walk has a jump of size 1 at each time step. This is an example of a discrete time, discrete space stochastic process. The random walk is a time-homogeneous Markov process. If we take a large number of steps, the random walk starts looking like a continuous time process with continuous paths.

We can quantify this observation by introducing an appropriate rescaled process and by taking an appropriate limit. Consider the sequence of continuous time stochastic processes

Z_t^n := \frac{1}{\sqrt{n}} S_{nt}.

In the limit as n → ∞, the sequence {Z_t^n} converges (in some appropriate sense, that will be made precise in later chapters) to a Brownian motion with diffusion coefficient D = \frac{\Delta x^2}{2 \Delta t} = \frac{1}{2}. Brownian motion W(t) is a continuous time stochastic process with continuous paths that starts at 0 (W(0) = 0) and has independent, normally distributed (Gaussian) increments. We can simulate the Brownian


motion on a computer using a random number generator that generates normally distributed, independent random variables. We can write an equation for the evolution of the paths of a Brownian motion X_t with diffusion coefficient D starting at x:

dX_t = \sqrt{2D} \, dW_t, \quad X_0 = x.

This is the simplest example of a stochastic differential equation. The probability of finding X_t at y at time t, given that it was at x at time t = 0, the transition probability density ρ(y, t), satisfies the PDE

\frac{\partial \rho}{\partial t} = D \frac{\partial^2 \rho}{\partial y^2}, \quad \rho(y, 0) = \delta(y - x).

This is the simplest example of the Fokker-Planck equation. The connection between Brownian motion and the diffusion equation was made by Einstein in 1905.
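The SDE dX_t = √(2D) dW_t can be simulated with an Euler-Maruyama discretization (a hypothetical sketch, not code from the notes; for pure Brownian motion the scheme is in fact exact, since every increment is Gaussian with variance 2D dt):

```python
import math
import random

def simulate_sde(D, T, n_steps, x0=0.0, rng=None):
    """Euler-Maruyama discretization of dX_t = sqrt(2D) dW_t.

    Each increment X_{t+dt} - X_t is drawn as a Gaussian with
    mean 0 and variance 2*D*dt, so the scheme is exact here.
    """
    rng = rng or random.Random()
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        x += math.sqrt(2.0 * D * dt) * rng.gauss(0.0, 1.0)
    return x

# For D = 1/2, the random-walk limit above, E[X_T^2] = 2*D*T = T.
rng = random.Random(0)
D, T = 0.5, 1.0
samples = [simulate_sde(D, T, 100, rng=rng) for _ in range(20000)]
mean_sq = sum(x * x for x in samples) / len(samples)
print(mean_sq)  # close to 2*D*T = 1
```

The empirical second moment matches the solution of the diffusion equation, whose variance at time T is 2DT.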

1.4 Stochastic Modeling of Deterministic Chaos

1.5 Why Randomness

Why introduce randomness in the description of physical systems?

• To describe outcomes of a repeated set of experiments. Think of tossing a coin repeatedly or of throwing a die.

• To describe a deterministic system for which we have incomplete informa-tion: we have imprecise knowledge of initial and boundary conditions or ofmodel parameters.

– ODEs with random initial conditions are equivalent to stochastic processes that can be described using stochastic differential equations.

• To describe systems for which we are not confident about the validity of ourmathematical model.

• To describe a dynamical system exhibiting very complicated behavior (chaotic dynamical systems). Determinism versus predictability.


• To describe a high dimensional deterministic system using a simpler, low dimensional stochastic system. Think of the physical model for Brownian motion (a heavy particle colliding with many small particles).

• To describe a system that is inherently random. Think of quantum mechan-ics.

Stochastic modeling is currently used in many different areas ranging from biology to climate modeling to economics.

1.6 Discussion and Bibliography

The fundamental papers of Einstein on the theory of Brownian motion have been reprinted by Dover [11]. The readers of this book are strongly encouraged to study these papers. Other fundamental papers from the early period of the development of the theory of stochastic processes include the papers by Langevin, Ornstein and Uhlenbeck, Doob, Kramers and Chandrasekhar's famous review article [7]. Many of these early papers on the theory of stochastic processes have been reprinted in [10]. Very useful historical comments can be found in the books by Nelson [54] and Mazo [52].

1.7 Exercises

1. Read the papers by Einstein, Ornstein-Uhlenbeck, Doob etc.

2. Write a computer program for generating the random walk in one and two dimensions. Study numerically the Brownian limit and compute the statistics of the random walk.


Chapter 2

Elements of Probability Theory

2.1 Introduction

In this chapter we put together some basic definitions and results from probability theory that will be used later on. In Section 2.2 we give some basic definitions from the theory of probability. In Section 2.3 we present some properties of random variables. In Section 2.4 we introduce the concept of conditional expectation and in Section 2.5 we define the characteristic function, one of the most useful tools in the study of (sums of) random variables. Some explicit calculations for the multivariate Gaussian distribution are presented in Section 2.6. Different types of convergence and the basic limit theorems of the theory of probability are discussed in Section 2.7. Discussion and bibliographical comments are presented in Section 2.8. Exercises are included in Section 2.9.

2.2 Basic Definitions from Probability Theory

In Chapter 1 we defined a stochastic process as a dynamical system whose law of evolution is probabilistic. In order to study stochastic processes we need to be able to describe the outcome of a random experiment and to calculate functions of this outcome. First we need to describe the set of all possible outcomes.

Definition 2.2.1. The set of all possible outcomes of an experiment is called the sample space and is denoted by Ω.

Example 2.2.2. • The possible outcomes of the experiment of tossing a coin are H and T. The sample space is Ω = {H, T}.


• The possible outcomes of the experiment of throwing a die are 1, 2, 3, 4, 5 and 6. The sample space is Ω = {1, 2, 3, 4, 5, 6}.

We define events to be subsets of the sample space. Of course, we would like the unions, intersections and complements of events to also be events. When the sample space Ω is uncountable, technical difficulties arise. In particular, not all subsets of the sample space need to be events. A definition of the collection of subsets of events which is appropriate for finitely additive probability is the following.

Definition 2.2.3. A collection F of subsets of Ω is called a field on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then A^c ∈ F;

iii. if A, B ∈ F then A ∪ B ∈ F.

From the definition of a field we immediately deduce that F is closed under finite unions and finite intersections:

A_1, ..., A_n ∈ F ⇒ ∪_{i=1}^{n} A_i ∈ F and ∩_{i=1}^{n} A_i ∈ F.

When Ω is infinite the above definition is not appropriate, since we need to consider countable unions of events.

Definition 2.2.4. A collection F of subsets of Ω is called a σ-field or σ-algebra on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then A^c ∈ F;

iii. if A_1, A_2, ... ∈ F then ∪_{i=1}^{∞} A_i ∈ F.

A σ-algebra is closed under the operation of taking countable intersections.

Example 2.2.5. • F = {∅, Ω}.

• F = {∅, A, A^c, Ω} where A is a subset of Ω.

• The power set of Ω, denoted by {0, 1}^Ω, which contains all subsets of Ω.


Let F be a collection of subsets of Ω. It can be extended to a σ-algebra (take for example the power set of Ω). Consider all the σ-algebras that contain F and take their intersection, denoted by σ(F): a set A ⊂ Ω belongs to σ(F) if and only if it is in every σ-algebra containing F. σ(F) is a σ-algebra (see Exercise 1). It is the smallest σ-algebra containing F and it is called the σ-algebra generated by F.
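On a finite sample space the generated σ-algebra can be computed explicitly, by repeatedly closing the collection under complements and unions until nothing new appears (a hypothetical sketch; the function name `generate_sigma_algebra` is my own):

```python
def generate_sigma_algebra(omega, collection):
    """Close a collection of subsets of a finite omega under complement
    and union (countable unions reduce to finite ones here)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            if omega - a not in sigma:    # close under complement
                sigma.add(omega - a)
                changed = True
            for b in list(sigma):
                if a | b not in sigma:    # close under union
                    sigma.add(a | b)
                    changed = True
    return sigma

omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}])
print(sorted(tuple(sorted(s)) for s in sigma))
# the sigma-algebra generated by {{1}}: the empty set, {1}, {2,3,4}, Omega
```

By De Morgan's laws the result is automatically closed under intersections as well.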

Example 2.2.6. Let Ω = R^n. The σ-algebra generated by the open subsets of R^n (or, equivalently, by the open balls of R^n) is called the Borel σ-algebra of R^n and is denoted by B(R^n).

Let X be a closed subset of R^n. Similarly, we can define the Borel σ-algebra of X, denoted by B(X).

A sub-σ-algebra is a collection of subsets of a σ-algebra which satisfies the axioms of a σ-algebra.

The σ-field F of a sample space Ω contains all possible outcomes of the experiment that we want to study. Intuitively, the σ-field contains all the information about the random experiment that is available to us.

Now we want to assign probabilities to the possible outcomes of an experiment.

Definition 2.2.7. A probability measure P on the measurable space (Ω, F) is a function P : F → [0, 1] satisfying

i. P(∅) = 0, P(Ω) = 1;

ii. for A_1, A_2, ... with A_i ∩ A_j = ∅, i ≠ j,

P(\cup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i).

Definition 2.2.8. The triple (Ω, F, P), comprising a set Ω, a σ-algebra F of subsets of Ω and a probability measure P on (Ω, F), is called a probability space.

Example 2.2.9. A biased coin is tossed once: Ω = {H, T}, F = {∅, {H}, {T}, Ω}, P : F → [0, 1] such that P(∅) = 0, P(H) = p ∈ [0, 1], P(T) = 1 - p, P(Ω) = 1.

Example 2.2.10. Take Ω = [0, 1], F = B([0, 1]), P = Leb([0, 1]). Then (Ω, F, P) is a probability space.


2.2.1 Conditional Probability

One of the most important concepts in probability is that of the dependence between events.

Definition 2.2.11. A family {A_i : i ∈ I} of events is called independent if

P(\cap_{j \in J} A_j) = \prod_{j \in J} P(A_j)

for all finite subsets J of I.

When two events A, B are dependent it is important to know the probability that the event A will occur, given that B has already happened. We define this to be the conditional probability, denoted by P(A|B). We know from elementary probability that

P(A|B) = \frac{P(A \cap B)}{P(B)}.

A very useful result is the law of total probability.

Definition 2.2.12. A family of events {B_i : i ∈ I} is called a partition of Ω if

B_i ∩ B_j = ∅ for i ≠ j, and ∪_{i ∈ I} B_i = Ω.

Proposition 2.2.13 (Law of total probability). For any event A and any partition {B_i : i ∈ I} we have

P(A) = \sum_{i \in I} P(A|B_i) P(B_i).

The proof of this result is left as an exercise. In many cases the calculation of the probability of an event is simplified by choosing an appropriate partition of Ω and using the law of total probability.
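The law of total probability is easy to verify exactly on a finite probability space. A hypothetical sketch, using exact rational arithmetic and a fair die (the event and partition are my own choices):

```python
from fractions import Fraction

# A finite probability space: one roll of a fair die, Omega = {1,...,6}.
P = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    return sum(P[w] for w in event)

def cond_prob(a, b):
    # P(A|B) = P(A ∩ B) / P(B)
    return prob(a & b) / prob(b)

A = {2, 4, 6}                          # the event "even outcome"
partition = [{1, 2}, {3, 4}, {5, 6}]   # a partition of Omega

# Law of total probability: P(A) = sum_i P(A|B_i) P(B_i).
total = sum(cond_prob(A, B) * prob(B) for B in partition)
print(total, prob(A))  # both equal 1/2
```

Because the arithmetic is exact, the two sides agree identically, not just to floating-point tolerance.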

Let (Ω, F, P) be a probability space and fix B ∈ F. Then P(·|B) defines a probability measure on F. Indeed, we have that

P(∅|B) = 0, \quad P(Ω|B) = 1

and (since A_i ∩ A_j = ∅ implies that (A_i ∩ B) ∩ (A_j ∩ B) = ∅)

P(\cup_{j=1}^{\infty} A_j | B) = \sum_{j=1}^{\infty} P(A_j|B)

for a countable family of pairwise disjoint sets {A_j}_{j=1}^{+∞}. Consequently, (Ω, F, P(·|B)) is a probability space for every B ∈ F.


2.3 Random Variables

We are usually interested in the consequences of the outcome of an experiment, rather than the experiment itself. The function of the outcome of an experiment is a random variable, that is, a map from Ω to R.

Definition 2.3.1. A sample space Ω equipped with a σ-field of subsets F is called a measurable space.

Definition 2.3.2. Let (Ω, F) and (E, G) be two measurable spaces. A function X : Ω → E such that the event

{ω ∈ Ω : X(ω) ∈ A} =: {X ∈ A} (2.1)

belongs to F for arbitrary A ∈ G is called a measurable function or random variable.

When E is R equipped with its Borel σ-algebra, then (2.1) can be replaced with

{X ≤ x} ∈ F for all x ∈ R.

Let X be a random variable (measurable function) from (Ω, F, μ) to (E, G). If E is a metric space then we may define the expectation with respect to the measure μ by

E[X] = \int_{\Omega} X(\omega) \, d\mu(\omega).

More generally, let f : E → R be G-measurable. Then,

E[f(X)] = \int_{\Omega} f(X(\omega)) \, d\mu(\omega).

Let U be a topological space. We will use the notation B(U) to denote the Borel σ-algebra of U: the smallest σ-algebra containing all open sets of U. Every random variable from a probability space (Ω, F, μ) to a measurable space (E, B(E)) induces a probability measure on E:

μ_X(B) = P(X^{-1}(B)) = μ(ω ∈ Ω : X(ω) ∈ B), \quad B ∈ B(E). (2.2)

The measure μ_X is called the distribution (or sometimes the law) of X.

Example 2.3.3. Let I denote a subset of the positive integers. A vector ρ_0 = {ρ_{0,i}, i ∈ I} is a distribution on I if it has nonnegative entries and its total mass equals 1: \sum_{i \in I} ρ_{0,i} = 1.


Consider the case where E = R equipped with the Borel σ-algebra. In this case a random variable is defined to be a function X : Ω → R such that

{ω ∈ Ω : X(ω) ≤ x} ∈ F for all x ∈ R.

We can now define the probability distribution function of X, F_X : R → [0, 1], as

F_X(x) = P({ω ∈ Ω | X(ω) ≤ x}) =: P(X ≤ x). (2.3)

In this case, (R, B(R), F_X) becomes a probability space.

The distribution function F_X(x) of a random variable has the properties that lim_{x→-∞} F_X(x) = 0, lim_{x→+∞} F_X(x) = 1, and it is right continuous.

Definition 2.3.4. A random variable X with values on R is called discrete if it takes values in some countable subset {x_0, x_1, x_2, ...} of R, i.e. P(X = x) ≠ 0 only for x = x_0, x_1, ....

With a discrete random variable we can associate the probability mass function p_k = P(X = x_k). We will consider nonnegative integer valued discrete random variables. In this case p_k = P(X = k), k = 0, 1, 2, ....

Example 2.3.5. The Poisson random variable is the nonnegative integer valued random variable with probability mass function

p_k = P(X = k) = (λ^k / k!) e^{-λ},  k = 0, 1, 2, ...,

where λ > 0.

Example 2.3.6. The binomial random variable is the nonnegative integer valued random variable with probability mass function

p_k = P(X = k) = (N! / (k! (N - k)!)) p^k q^{N-k},  k = 0, 1, 2, ..., N,

where p ∈ (0, 1), q = 1 - p.

Definition 2.3.7. A random variable X with values on R is called continuous if P(X = x) = 0 for all x ∈ R.

Let (Ω, F, P) be a probability space and let X : Ω → R be a random variable with distribution F_X. This is a probability measure on B(R). We will assume that it is absolutely continuous with respect to the Lebesgue measure with density ρ_X: F_X(dx) = ρ_X(x) dx. We will call the density ρ_X(x) the probability density function (PDF) of the random variable X.


Example 2.3.8. i. The exponential random variable has PDF

f(x) = λ e^{-λx} for x > 0, and f(x) = 0 for x < 0,

with λ > 0.

ii. The uniform random variable has PDF

f(x) = 1/(b - a) for a < x < b, and f(x) = 0 for x ∉ (a, b),

with a < b.

Definition 2.3.9. Two random variables X and Y are independent if the events {ω ∈ Ω | X(ω) ≤ x} and {ω ∈ Ω | Y(ω) ≤ y} are independent for all x, y ∈ R.

Let X, Y be two continuous random variables. We can view them as a ran-dom vector, i.e. a random variable fromΩ to R

2. We can then define thejointdistribution function

F (x, y) = P(X 6 x, Y 6 y).

The mixed derivative of the distribution functionfX,Y (x, y) := ∂2F∂x∂y (x, y), if it

exists, is called thejoint PDF of the random vectorX, Y :

FX,Y (x, y) =

∫ x

−∞

∫ y

−∞fX,Y (x, y) dxdy.

If the random variablesX andY are independent, then

FX,Y (x, y) = FX(x)FY (y)

and

fX,Y (x, y) = fX(x)fY (y).

The joint distribution function has the properties

FX,Y (x, y) = FY,X(y, x),

FX,Y (+∞, y) = FY (y), fY (y) =

∫ +∞

−∞fX,Y (x, y) dx.


We can extend the above definition to random vectors of arbitrary finite dimensions. Let X be a random variable from (Ω, F, μ) to (R^d, B(R^d)). The (joint) distribution function F_X : R^d → [0, 1] is defined as

F_X(x) = P(X ≤ x).

Let X be a random variable in R^N with probability density function f_N(x_N), where x_N = {x_1, ..., x_N}. We define the marginal or reduced distribution function f_{N-1}(x_{N-1}) by

f_{N-1}(x_{N-1}) = ∫_R f_N(x_N) dx_N.

We can define other reduced distribution functions:

f_{N-2}(x_{N-2}) = ∫_R f_{N-1}(x_{N-1}) dx_{N-1} = ∫_R ∫_R f_N(x_N) dx_{N-1} dx_N.

2.3.1 Expectation of Random Variables

We can use the distribution of a random variable to compute expectations and probabilities:

E[f(X)] = ∫_R f(x) dF_X(x)    (2.4)

and

P[X ∈ G] = ∫_G dF_X(x),  G ∈ B(E).    (2.5)

The above formulas apply to both discrete and continuous random variables, provided that we define the integrals in (2.4) and (2.5) appropriately.

When E = R^d and a PDF exists, dF_X(x) = f_X(x) dx, we have

F_X(x) := P(X ≤ x) = ∫_{-∞}^{x_1} ... ∫_{-∞}^{x_d} f_X(x) dx.

When E = R^d then by L^p(Ω; R^d), or sometimes L^p(Ω; μ) or even simply L^p(μ), we mean the Banach space of measurable functions on Ω with norm

‖X‖_{L^p} = (E|X|^p)^{1/p}.

Let X be a nonnegative integer valued random variable with probability mass function p_k. We can compute the expectation of an arbitrary function of X using the formula

E(f(X)) = ∑_{k=0}^∞ f(k) p_k.


Let X, Y be random variables. We want to know whether they are correlated and, if they are, to calculate how correlated they are. We define the covariance of the two random variables as

cov(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - EX EY.

The correlation coefficient is

ρ(X, Y) = cov(X, Y) / (√var(X) √var(Y)).    (2.6)

The Cauchy-Schwarz inequality yields that ρ(X, Y) ∈ [-1, 1]. We will say that two random variables X and Y are uncorrelated provided that ρ(X, Y) = 0. It is not true in general that two uncorrelated random variables are independent. This is true, however, for Gaussian random variables (see Exercise 5).
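The correlation coefficient is easy to estimate from samples. The sketch below (plain Python, no libraries; our own toy construction) draws Y = X + independent noise, so that cov(X, Y) = 1, var(X) = 1, var(Y) = 2 and the exact value is ρ = 1/√2:

```python
import random

random.seed(0)
n = 100_000
# X standard normal; Y = X + independent standard normal noise
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]

def corr(a, b):
    # sample version of rho = cov(X,Y) / (sqrt(var X) sqrt(var Y))
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    va = sum((u - ma) ** 2 for u in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return cov / (va ** 0.5 * vb ** 0.5)

rho = corr(xs, ys)
assert abs(rho - 2 ** -0.5) < 0.01   # theoretical value 1/sqrt(2)
assert -1.0 <= rho <= 1.0            # Cauchy-Schwarz bound
```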

Example 2.3.10. • Consider the random variable X : Ω → R with pdf

γ_{σ,b}(x) := (2πσ)^{-1/2} exp(-(x - b)^2 / (2σ)).

Such an X is termed a Gaussian or normal random variable. The mean is

EX = ∫_R x γ_{σ,b}(x) dx = b

and the variance is

E(X - b)^2 = ∫_R (x - b)^2 γ_{σ,b}(x) dx = σ.

• Let b ∈ R^d and Σ ∈ R^{d×d} be symmetric and positive definite. The random variable X : Ω → R^d with pdf

γ_{Σ,b}(x) := ((2π)^d det Σ)^{-1/2} exp(-½⟨Σ^{-1}(x - b), x - b⟩)

is termed a multivariate Gaussian or normal random variable. The mean is

E(X) = b    (2.7)

and the covariance matrix is

E((X - b) ⊗ (X - b)) = Σ.    (2.8)


Since the mean and variance specify completely a Gaussian random variable on R, the Gaussian is commonly denoted by N(m, σ). The standard normal random variable is N(0, 1). Similarly, since the mean and covariance matrix completely specify a Gaussian random variable on R^d, the Gaussian is commonly denoted by N(m, Σ).

Some analytical calculations for Gaussian random variables will be presented in Section 2.6.

2.4 Conditional Expectation

Assume that X ∈ L^1(Ω, F, μ) and let G be a sub-σ-algebra of F. The conditional expectation of X with respect to G is defined to be the function (random variable) E[X|G] : Ω → E which is G-measurable and satisfies

∫_G E[X|G] dμ = ∫_G X dμ    ∀G ∈ G.

We can define E[f(X)|G] and the conditional probability P[X ∈ F|G] = E[I_F(X)|G], where I_F is the indicator function of F, in a similar manner.

We list some of the most important properties of conditional expectation.

Theorem 2.4.1. [Properties of Conditional Expectation]. Let (Ω, F, μ) be a probability space and let G be a sub-σ-algebra of F.

(a) If X is G-measurable and integrable then E(X|G) = X.

(b) (Linearity) If X_1, X_2 are integrable and c_1, c_2 constants, then

E(c_1 X_1 + c_2 X_2 | G) = c_1 E(X_1|G) + c_2 E(X_2|G).

(c) (Order) If X_1, X_2 are integrable and X_1 ≤ X_2 a.s., then E(X_1|G) ≤ E(X_2|G) a.s.

(d) If Y and XY are integrable, and X is G-measurable, then E(XY|G) = X E(Y|G).

(e) (Successive smoothing) If D is a sub-σ-algebra of F, D ⊂ G and X is integrable, then E(X|D) = E[E(X|G)|D] = E[E(X|D)|G].


(f) (Convergence) Let {X_n}_{n=1}^∞ be a sequence of random variables such that, for all n, |X_n| ≤ Z where Z is integrable. If X_n → X a.s., then E(X_n|G) → E(X|G) a.s. and in L^1.

Proof. See Exercise 10.
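For a σ-algebra generated by a finite partition, E[X|G] is simply the cell-by-cell average, and both the defining property and successive smoothing (property (e)) can be checked exactly. A Python sketch with exact rational arithmetic (the sample space and partition are our own toy example):

```python
from fractions import Fraction

# Omega = {0,...,5} with the uniform measure; G generated by the partition
# {0,1}, {2,3}, {4,5}; X(w) = w^2
omega = range(6)
mu = Fraction(1, 6)
partition = [{0, 1}, {2, 3}, {4, 5}]
X = {w: Fraction(w * w) for w in omega}

def cond_exp(X, partition):
    # E[X|G] is constant on each cell of the partition, equal to the cell average
    E = {}
    for cell in partition:
        avg = sum(X[w] for w in cell) / len(cell)
        for w in cell:
            E[w] = avg
    return E

EXG = cond_exp(X, partition)
# defining property: the integrals of E[X|G] and of X agree on every G in G
for cell in partition:
    assert sum(EXG[w] * mu for w in cell) == sum(X[w] * mu for w in cell)
# successive smoothing with D = {empty set, Omega}: E[E[X|G]] = E[X]
assert sum(EXG[w] * mu for w in omega) == sum(X[w] * mu for w in omega)
```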

2.5 The Characteristic Function

Many of the properties of (sums of) random variables can be studied using the Fourier transform of the distribution function. Let F(λ) be the distribution function of a (discrete or continuous) random variable X. The characteristic function of X is defined to be the Fourier transform of the distribution function

φ(t) = ∫_R e^{itλ} dF(λ) = E(e^{itX}).    (2.9)

For a continuous random variable for which the distribution function F has a density, dF(λ) = p(λ) dλ, (2.9) gives

φ(t) = ∫_R e^{itλ} p(λ) dλ.

For a discrete random variable for which P(X = λ_k) = α_k, (2.9) gives

φ(t) = ∑_{k=0}^∞ e^{itλ_k} α_k.

From the properties of the Fourier transform we conclude that the characteristic function determines uniquely the distribution function of the random variable, in the sense that there is a one-to-one correspondence between F(λ) and φ(t). Furthermore, in the exercises at the end of the chapter the reader is asked to prove the following two results.

Lemma 2.5.1. Let X_1, X_2, ..., X_n be independent random variables with characteristic functions φ_j(t), j = 1, ..., n, and let Y = ∑_{j=1}^n X_j with characteristic function φ_Y(t). Then

φ_Y(t) = ∏_{j=1}^n φ_j(t).

Lemma 2.5.2. Let X be a random variable with characteristic function φ(t) and assume that it has finite moments. Then

E(X^k) = (1/i^k) φ^{(k)}(0).
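Lemma 2.5.2 can be sanity-checked by finite differences. For the exponential(λ) distribution the characteristic function φ(t) = λ/(λ - it) is known in closed form, and E X = 1/λ, E X^2 = 2/λ^2; the sketch below (our own numerical check) recovers both moments from derivatives of φ at 0:

```python
lam = 2.0

def phi(t):
    # characteristic function of the exponential(lam) distribution
    return lam / (lam - 1j * t)

h = 1e-5
# E[X] = phi'(0) / i, via a central difference
d1 = (phi(h) - phi(-h)) / (2 * h)
assert abs(d1 / 1j - 1 / lam) < 1e-6
# E[X^2] = phi''(0) / i^2, via a second central difference
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2
assert abs(d2 / 1j ** 2 - 2 / lam ** 2) < 1e-4
```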


2.6 Gaussian Random Variables

In this section we present some useful calculations for Gaussian random variables. In particular, we calculate the normalization constant, the mean and variance, and the characteristic function of multidimensional Gaussian random variables.

Theorem 2.6.1. Let b ∈ R^d and Σ ∈ R^{d×d} a symmetric and positive definite matrix. Let X be the multivariate Gaussian random variable with probability density function

γ_{Σ,b}(x) = (1/Z) exp(-½⟨Σ^{-1}(x - b), x - b⟩).

Then

i. The normalization constant is

Z = (2π)^{d/2} √det(Σ).

ii. The mean vector and covariance matrix of X are given by

EX = b  and  E((X - EX) ⊗ (X - EX)) = Σ.

iii. The characteristic function of X is

φ(t) = e^{i⟨b,t⟩ - ½⟨t,Σt⟩}.

Proof. i. From the spectral theorem for symmetric positive definite matrices we have that there exists a diagonal matrix Λ with positive entries and an orthogonal matrix B such that

Σ^{-1} = B^T Λ^{-1} B.

Let z = x - b and y = Bz. We have

⟨Σ^{-1}z, z⟩ = ⟨B^T Λ^{-1} Bz, z⟩ = ⟨Λ^{-1} Bz, Bz⟩ = ⟨Λ^{-1} y, y⟩ = ∑_{i=1}^d λ_i^{-1} y_i^2.


Furthermore, we have that det(Σ^{-1}) = ∏_{i=1}^d λ_i^{-1}, that det(Σ) = ∏_{i=1}^d λ_i, and that the Jacobian of an orthogonal transformation is J = det(B) = 1. Hence,

∫_{R^d} exp(-½⟨Σ^{-1}(x - b), x - b⟩) dx = ∫_{R^d} exp(-½⟨Σ^{-1}z, z⟩) dz

= ∫_{R^d} exp(-½ ∑_{i=1}^d λ_i^{-1} y_i^2) |J| dy

= ∏_{i=1}^d ∫_R exp(-½ λ_i^{-1} y_i^2) dy_i

= (2π)^{d/2} ∏_{i=1}^d λ_i^{1/2} = (2π)^{d/2} √det(Σ),

from which we get that

Z = (2π)^{d/2} √det(Σ).

In the above calculation we have used the elementary calculus identity

∫_R e^{-αx^2/2} dx = √(2π/α).

ii. From the above calculation we have that

γ_{Σ,b}(x) dx = γ_{Σ,b}(B^T y + b) dy = (1/((2π)^{d/2} √det(Σ))) ∏_{i=1}^d exp(-½ λ_i^{-1} y_i^2) dy_i.

Consequently

EX = ∫_{R^d} x γ_{Σ,b}(x) dx = ∫_{R^d} (B^T y + b) γ_{Σ,b}(B^T y + b) dy

= b ∫_{R^d} γ_{Σ,b}(B^T y + b) dy = b.

We note that, since Σ^{-1} = B^T Λ^{-1} B, we have that Σ = B^T Λ B. Furthermore, z = B^T y. We calculate

E((X_i - b_i)(X_j - b_j)) = ∫_{R^d} z_i z_j γ_{Σ,b}(z + b) dz

= (1/((2π)^{d/2} √det(Σ))) ∫_{R^d} ∑_k B_{ki} y_k ∑_m B_{mj} y_m exp(-½ ∑_ℓ λ_ℓ^{-1} y_ℓ^2) dy

= (1/((2π)^{d/2} √det(Σ))) ∑_{k,m} B_{ki} B_{mj} ∫_{R^d} y_k y_m exp(-½ ∑_ℓ λ_ℓ^{-1} y_ℓ^2) dy

= ∑_{k,m} B_{ki} B_{mj} λ_k δ_{km}

= Σ_{ij}.

iii. Let Y be a multivariate Gaussian random variable with mean 0 and covariance I. Let also C = B^T √Λ. We have that Σ = C C^T and that X = CY + b.

To see this, we first note that X is Gaussian since it is given through a linear transformation of a Gaussian random variable. Furthermore,

EX = b  and  E((X_i - b_i)(X_j - b_j)) = Σ_{ij}.

Now we have:

φ(t) = E e^{i⟨X,t⟩} = e^{i⟨b,t⟩} E e^{i⟨CY,t⟩} = e^{i⟨b,t⟩} E e^{i⟨Y,C^T t⟩}

= e^{i⟨b,t⟩} E e^{i ∑_j (∑_k C_{kj} t_k) y_j} = e^{i⟨b,t⟩} e^{-½ ∑_j |∑_k C_{kj} t_k|^2}

= e^{i⟨b,t⟩} e^{-½⟨C^T t, C^T t⟩} = e^{i⟨b,t⟩} e^{-½⟨t, C C^T t⟩} = e^{i⟨b,t⟩} e^{-½⟨t,Σt⟩}.

Consequently,

φ(t) = e^{i⟨b,t⟩ - ½⟨t,Σt⟩}.
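The representation X = CY + b used in part iii also gives a practical sampling recipe: any factor C with C C^T = Σ works, not only C = B^T √Λ. The Python sketch below uses the Cholesky factor of a 2×2 matrix Σ (the numerical values are our own example) and checks the empirical mean and covariance against b and Σ:

```python
import random

random.seed(1)
# target mean b and covariance Sigma (2x2, symmetric positive definite)
b = (1.0, -2.0)
Sigma = ((2.0, 0.8), (0.8, 1.0))

# lower-triangular Cholesky factor C with C C^T = Sigma, written out for 2x2
c11 = Sigma[0][0] ** 0.5
c21 = Sigma[1][0] / c11
c22 = (Sigma[1][1] - c21 ** 2) ** 0.5

n = 100_000
xs, ys = [], []
for _ in range(n):
    y1, y2 = random.gauss(0, 1), random.gauss(0, 1)   # Y ~ N(0, I)
    xs.append(b[0] + c11 * y1)                        # X = C Y + b
    ys.append(b[1] + c21 * y1 + c22 * y2)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
assert abs(mx - b[0]) < 0.02 and abs(my - b[1]) < 0.02   # E X = b
assert abs(cov - Sigma[0][1]) < 0.03                     # off-diagonal of Sigma
```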


2.7 Types of Convergence and Limit Theorems

One of the most important aspects of the theory of random variables is the study of limit theorems for sums of random variables. The most well known limit theorems in probability theory are the law of large numbers and the central limit theorem. There are various different types of convergence for sequences of random variables. We list the most important types of convergence below.

Definition 2.7.1. Let {Z_n}_{n=1}^∞ be a sequence of random variables. We will say that

(a) Z_n converges to Z with probability one if

P(lim_{n→+∞} Z_n = Z) = 1.

(b) Z_n converges to Z in probability if for every ε > 0

lim_{n→+∞} P(|Z_n - Z| > ε) = 0.

(c) Z_n converges to Z in L^p if

lim_{n→+∞} E[|Z_n - Z|^p] = 0.

(d) Let F_n(λ), n = 1, 2, ..., and F(λ) be the distribution functions of Z_n, n = 1, 2, ..., and Z, respectively. Then Z_n converges to Z in distribution if

lim_{n→+∞} F_n(λ) = F(λ)

for all λ ∈ R at which F is continuous.

Recall that the distribution function F_X of a random variable from a probability space (Ω, F, P) to R induces a probability measure on R and that (R, B(R), F_X) is a probability space. We can show that convergence in distribution is equivalent to weak convergence of the probability measures induced by the distribution functions.

Definition 2.7.2. Let (E, d) be a metric space, B(E) the σ-algebra of its Borel sets, {P_n} a sequence of probability measures on (E, B(E)) and let C_b(E) denote the space of bounded continuous functions on E. We will say that the sequence P_n converges weakly to the probability measure P if, for each f ∈ C_b(E),

lim_{n→+∞} ∫_E f(x) dP_n(x) = ∫_E f(x) dP(x).


Theorem 2.7.3. Let F_n(λ), n = 1, 2, ..., and F(λ) be the distribution functions of Z_n, n = 1, 2, ..., and Z, respectively. Then Z_n converges to Z in distribution if and only if, for all g ∈ C_b(R),

lim_{n→+∞} ∫_R g(x) dF_n(x) = ∫_R g(x) dF(x).    (2.10)

Notice that (2.10) is equivalent to

lim_{n→+∞} E_n g(X_n) = E g(X),

where E_n and E denote the expectations with respect to F_n and F, respectively.

When the sequence of random variables whose convergence we are interested in takes values in R^d or, more generally, a metric space (E, d), then we can use weak convergence of the sequence of probability measures induced by the sequence of random variables to define convergence in distribution.

Definition 2.7.4. A sequence of random variables X_n defined on probability spaces (Ω_n, F_n, P_n) and taking values on a metric space (E, d) is said to converge in distribution if the induced measures F_n(B) = P_n(X_n ∈ B) for B ∈ B(E) converge weakly to a probability measure P.

Let {X_n}_{n=1}^∞ be iid random variables with EX_n = V. Then, the strong law of large numbers states that the average of the sum of the iid random variables converges to V with probability one:

P(lim_{N→+∞} (1/N) ∑_{n=1}^N X_n = V) = 1.

The strong law of large numbers provides us with information about the behavior of a sum of random variables (or, a large number of repetitions of the same experiment) on average. We can also study fluctuations around the average behavior. Indeed, let E(X_n - V)^2 = σ^2. Define the centered iid random variables Y_n = X_n - V. Then, the sequence of random variables (1/(σ√N)) ∑_{n=1}^N Y_n converges in distribution to a N(0, 1) random variable:

lim_{N→+∞} P((1/(σ√N)) ∑_{n=1}^N Y_n ≤ a) = ∫_{-∞}^a (1/√(2π)) e^{-x^2/2} dx.

This is the central limit theorem.
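Both theorems can be observed numerically (this is essentially Exercise 8 of this chapter); here with uniform(0,1) variables, so that V = 1/2 and σ^2 = 1/12. A Python sketch with our own sample-size choices:

```python
import math
import random

random.seed(2)
# iid uniform(0,1) variables: V = 1/2 and sigma^2 = 1/12
V, sigma = 0.5, (1 / 12) ** 0.5

# strong law of large numbers: the sample average approaches V
N1 = 100_000
avg = sum(random.random() for _ in range(N1)) / N1
assert abs(avg - V) < 0.01

# central limit theorem: (1/(sigma sqrt(N))) sum_{n=1}^N Y_n is close to N(0,1),
# so P(... <= a) should match the standard normal CDF
M, N = 4_000, 1_000
def clt_sample():
    return sum(random.random() - V for _ in range(N)) / (sigma * math.sqrt(N))
samples = [clt_sample() for _ in range(M)]
Phi = lambda a: 0.5 * (1 + math.erf(a / math.sqrt(2)))   # standard normal CDF
assert abs(sum(s <= 0 for s in samples) / M - Phi(0)) < 0.03
assert abs(sum(s <= 1 for s in samples) / M - Phi(1)) < 0.03
```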


2.8 Discussion and Bibliography

The material of this chapter is very standard and can be found in many books on probability theory. Well known textbooks on probability theory are [4, 14, 15, 44, 45, 37, 71].

The connection between conditional expectation and orthogonal projections is discussed in [8].

The reduced distribution functions defined in Section 2.3 are used extensively in statistical mechanics. A different normalization is usually used in physics textbooks. See for instance [2, Sec. 4.2].

The calculations presented in Section 2.6 are essentially an exercise in linear algebra. See [42, Sec. 10.2].

Random variables and probability measures can also be defined in infinite dimensions. More information can be found in [?, Ch. 2].

The study of limit theorems is one of the cornerstones of probability theory and of the theory of stochastic processes. A comprehensive study of limit theorems can be found in [33].

2.9 Exercises

1. Show that the intersection of a family of σ-algebras is a σ-algebra.

2. Prove the law of total probability, Proposition 2.2.13.

3. Calculate the mean, variance and characteristic function of the following probability density functions.

(a) The exponential distribution with density

f(x) = λ e^{-λx} for x > 0, and f(x) = 0 for x < 0,

with λ > 0.

(b) The uniform distribution with density

f(x) = 1/(b - a) for a < x < b, and f(x) = 0 for x ∉ (a, b),

with a < b.


(c) The Gamma distribution with density

f(x) = (λ/Γ(α)) (λx)^{α-1} e^{-λx} for x > 0, and f(x) = 0 for x < 0,

with λ > 0, α > 0, and Γ(α) the Gamma function

Γ(α) = ∫_0^∞ ξ^{α-1} e^{-ξ} dξ,  α > 0.

4. Let X and Y be independent random variables with distribution functions F_X and F_Y. Show that the distribution function of the sum Z = X + Y is the convolution of F_X and F_Y:

F_Z(x) = ∫ F_X(x - y) dF_Y(y).

5. Let X and Y be Gaussian random variables. Show that they are uncorrelated if and only if they are independent.

6. (a) Let X be a continuous random variable with characteristic function φ(t). Show that

EX^k = (1/i^k) φ^{(k)}(0),

where φ^{(k)}(t) denotes the k-th derivative of φ evaluated at t.

(b) Let X be a nonnegative random variable with distribution function F(x). Show that

E(X) = ∫_0^{+∞} (1 - F(x)) dx.

(c) Let X be a continuous random variable with probability density function f(x) and characteristic function φ(t). Find the probability density and characteristic function of the random variable Y = aX + b with a, b ∈ R.

(d) Let X be a random variable with uniform distribution on [0, 2π]. Find the probability density of the random variable Y = sin(X).

7. Let X be a discrete random variable taking values on the set of nonnegative integers with probability mass function p_k = P(X = k) with p_k ≥ 0, ∑_{k=0}^{+∞} p_k = 1. The generating function is defined as

g(s) = E(s^X) = ∑_{k=0}^{+∞} p_k s^k.


(a) Show that

EX = g′(1)  and  EX^2 = g″(1) + g′(1),

where the prime denotes differentiation.

(b) Calculate the generating function of the Poisson random variable with

p_k = P(X = k) = e^{-λ} λ^k / k!,  k = 0, 1, 2, ...  and  λ > 0.

(c) Prove that the generating function of a sum of independent nonnegative integer valued random variables is the product of their generating functions.

8. Write a computer program for studying the law of large numbers and the central limit theorem. Investigate numerically the rate of convergence of these two theorems.

9. Study the properties of Gaussian measures on separable Hilbert spaces from [?, Ch. 2].

10. Prove Theorem 2.4.1.


Chapter 3

Basics of the Theory of Stochastic Processes

3.1 Introduction

In this chapter we present some basic results from the theory of stochastic processes and we investigate the properties of some of the standard stochastic processes in continuous time. In Section 3.2 we give the definition of a stochastic process. In Section 3.3 we present some properties of stationary stochastic processes. In Section 3.4 we introduce Brownian motion and study some of its properties. Various examples of stochastic processes in continuous time are presented in Section 3.5. The Karhunen-Loeve expansion, one of the most useful tools for representing stochastic processes and random fields, is presented in Section 3.6. Further discussion and bibliographical comments are presented in Section 3.7. Section 3.8 contains exercises.

3.2 Definition of a Stochastic Process

Stochastic processes describe dynamical systems whose evolution law is of probabilistic nature. The precise definition is given below.

Definition 3.2.1. Let T be an ordered set, (Ω, F, P) a probability space and (E, G) a measurable space. A stochastic process is a collection of random variables X = {X_t; t ∈ T} where, for each fixed t ∈ T, X_t is a random variable from (Ω, F, P) to (E, G). Ω is called the sample space and E is the state space of the stochastic process X_t.


The set T can be either discrete, for example the set of positive integers Z^+, or continuous, T = [0, +∞). The state space E will usually be R^d equipped with the σ-algebra of Borel sets.

A stochastic process X may be viewed as a function of both t ∈ T and ω ∈ Ω. We will sometimes write X(t), X(t, ω) or X_t(ω) instead of X_t. For a fixed sample point ω ∈ Ω, the function X_t(ω) : T → E is called a sample path (realization, trajectory) of the process X.

Definition 3.2.2. The finite dimensional distributions (fdd) of a stochastic process are the distributions of the E^k-valued random variables (X(t_1), X(t_2), ..., X(t_k)) for arbitrary positive integer k and arbitrary times t_i ∈ T, i ∈ {1, ..., k}:

F(x) = P(X(t_i) ≤ x_i, i = 1, ..., k)

with x = (x_1, ..., x_k).

From experiments or numerical simulations we can only obtain information about the finite dimensional distributions of a process. A natural question arises: are the finite dimensional distributions of a stochastic process sufficient to determine a stochastic process uniquely? This is true for processes with continuous paths¹. This is the class of stochastic processes that we will study in these notes.

Definition 3.2.3. We will say that two processes X_t and Y_t are equivalent if they have the same finite dimensional distributions.

Definition 3.2.4. A one dimensional Gaussian process is a continuous time stochastic process for which E = R and all the finite dimensional distributions are Gaussian, i.e. every finite dimensional vector (X_{t_1}, X_{t_2}, ..., X_{t_k}) is a N(μ_k, K_k) random variable for some vector μ_k and a symmetric nonnegative definite matrix K_k, for all k = 1, 2, ... and for all t_1, t_2, ..., t_k.

From the above definition we conclude that the finite dimensional distributions of a Gaussian continuous time stochastic process are Gaussian with PDF

γ_{μ_k,K_k}(x) = (2π)^{-k/2} (det K_k)^{-1/2} exp[-½⟨K_k^{-1}(x - μ_k), x - μ_k⟩],

where x = (x_1, x_2, ..., x_k).

¹In fact, what we need is for the stochastic process to be separable. See the discussion in Section 3.7.


It is straightforward to extend the above definition to arbitrary dimensions. A Gaussian process x(t) is characterized by its mean

m(t) := E x(t)

and the covariance (or autocorrelation) matrix

C(t, s) = E((x(t) - m(t)) ⊗ (x(s) - m(s))).

Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.

3.3 Stationary Processes

3.3.1 Strictly Stationary Processes

In many stochastic processes that appear in applications their statistics remain invariant under time translations. Such stochastic processes are called stationary. It is possible to develop a quite general theory for stochastic processes that enjoy this symmetry property.

Definition 3.3.1. A stochastic process is called (strictly) stationary if all finite dimensional distributions are invariant under time translation: for any integer k and times t_i ∈ T, the distribution of (X(t_1), X(t_2), ..., X(t_k)) is equal to that of (X(s+t_1), X(s+t_2), ..., X(s+t_k)) for any s such that s + t_i ∈ T for all i ∈ {1, ..., k}. In other words,

P(X_{t_1+t} ∈ A_1, X_{t_2+t} ∈ A_2, ..., X_{t_k+t} ∈ A_k) = P(X_{t_1} ∈ A_1, X_{t_2} ∈ A_2, ..., X_{t_k} ∈ A_k),  ∀t ∈ T.

Example 3.3.2. Let Y_0, Y_1, ... be a sequence of independent, identically distributed random variables and consider the stochastic process X_n = Y_n. Then X_n is a strictly stationary process (see Exercise 1). Assume furthermore that EY_0 = μ < +∞. Then, by the strong law of large numbers, we have that

(1/N) ∑_{j=0}^{N-1} X_j = (1/N) ∑_{j=0}^{N-1} Y_j → EY_0 = μ,

almost surely. In fact, Birkhoff's ergodic theorem states that, for any function f such that E f(Y_0) < +∞, we have that

lim_{N→+∞} (1/N) ∑_{j=0}^{N-1} f(X_j) = E f(Y_0),    (3.1)

almost surely. The sequence of iid random variables is an example of an ergodic strictly stationary process.

Ergodic strictly stationary processes satisfy (3.1). Hence, we can calculate the statistics of a stationary stochastic process X_n using a single sample path, provided that it is long enough (N ≫ 1).

Example 3.3.3. Let Z be a random variable and define the stochastic process X_n = Z, n = 0, 1, 2, .... Then X_n is a strictly stationary process (see Exercise 2). We can calculate the long time average of this stochastic process:

(1/N) ∑_{j=0}^{N-1} X_j = (1/N) ∑_{j=0}^{N-1} Z = Z,

which is independent of N and does not converge to the mean of the stochastic process EX_n = EZ (assuming that it is finite), or any other deterministic number. This is an example of a non-ergodic process.
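The contrast between Examples 3.3.2 and 3.3.3 is easy to see numerically: the time average of an iid sequence approaches the ensemble mean, while the time average of X_n = Z stays at the random value Z. A Python sketch (the distribution and parameters are our own choices):

```python
import math
import random

random.seed(3)
N = 100_000

# Example 3.3.2: X_n = Y_n iid (ergodic) -- the time average tends to E Y_0 = 5
iid_avg = sum(random.gauss(5.0, 1.0) for _ in range(N)) / N
assert abs(iid_avg - 5.0) < 0.02

# Example 3.3.3: X_n = Z for a single draw Z (non-ergodic) -- the time average
# equals Z itself, however long the path, not E Z = 5
Z = random.gauss(5.0, 1.0)
const_avg = math.fsum(Z for _ in range(N)) / N
assert abs(const_avg - Z) < 1e-9
```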

3.3.2 Second Order Stationary Processes

Let (Ω, F, P) be a probability space. Let X_t, t ∈ T (with T = R or Z) be a real-valued random process on this probability space with finite second moment, E|X_t|^2 < +∞ (i.e. X_t ∈ L^2(Ω, P) for all t ∈ T). Assume that it is strictly stationary. Then,

E(X_{t+s}) = EX_t,  s ∈ T,    (3.2)

from which we conclude that EX_t is constant, and

E((X_{t_1+s} - μ)(X_{t_2+s} - μ)) = E((X_{t_1} - μ)(X_{t_2} - μ)),  s ∈ T,    (3.3)

from which we conclude that the covariance or autocorrelation or correlation function C(t, s) = E((X_t - μ)(X_s - μ)) depends on the difference between the two times, t and s, i.e. C(t, s) = C(t - s). This motivates the following definition.

Definition 3.3.4. A stochastic process X_t ∈ L^2 is called second-order stationary or wide-sense stationary or weakly stationary if the first moment EX_t is a constant and the covariance function E(X_t - μ)(X_s - μ) depends only on the difference t - s:

EX_t = μ,  E((X_t - μ)(X_s - μ)) = C(t - s).


The constant μ is the expectation of the process X_t. Without loss of generality, we can set μ = 0, since if EX_t = μ then the process Y_t = X_t - μ is mean zero. A mean zero process will be called a centered process. The function C(t) is the covariance (sometimes also called autocovariance) or the autocorrelation function of X_t. Notice that C(t) = E(X_t X_0), whereas C(0) = E(X_t^2), which is finite, by assumption. Since we have assumed that X_t is a real valued process, we have that C(t) = C(-t), t ∈ R.

Remark 3.3.5. Let X_t be a strictly stationary stochastic process with finite second moment (i.e. X_t ∈ L^2). The definition of strict stationarity implies that EX_t = μ, a constant, and E((X_t - μ)(X_s - μ)) = C(t - s). Hence, a strictly stationary process with finite second moment is also stationary in the wide sense. The converse is not true.

Example 3.3.6. Let Y_0, Y_1, ... be a sequence of independent, identically distributed random variables and consider the stochastic process X_n = Y_n. From Example 3.3.2 we know that this is a strictly stationary process, irrespective of whether Y_0 is such that EY_0^2 < +∞. Assume now that EY_0 = 0 and EY_0^2 = σ^2 < +∞. Then X_n is a second order stationary process with mean zero and correlation function R(k) = σ^2 δ_{k0}. Notice that in this case we have no correlation between the values of the stochastic process at different times n and k.

Example 3.3.7. Let Z be a single random variable and consider the stochastic process X_n = Z, n = 0, 1, 2, .... From Example 3.3.3 we know that this is a strictly stationary process irrespective of whether E|Z|^2 < +∞ or not. Assume now that EZ = 0, EZ^2 = σ^2. Then X_n becomes a second order stationary process with R(k) = σ^2. Notice that in this case the values of our stochastic process at different times are strongly correlated.

We will see in Section 3.3.3 that for second order stationary processes, ergodicity is related to fast decay of correlations. In the first of the examples above, there was no correlation between our stochastic processes at different times and the stochastic process is ergodic. On the contrary, in our second example there is very strong correlation between the stochastic process at different times and this process is not ergodic.

Remark 3.3.8. The first two moments of a Gaussian process are sufficient for a complete characterization of the process. Consequently, a Gaussian stochastic process is strictly stationary if and only if it is weakly stationary.

Continuity properties of the covariance function are equivalent to continuity properties of the paths of X_t in the L^2 sense, i.e.

lim_{h→0} E|X_{t+h} - X_t|^2 = 0.

Lemma 3.3.9. Assume that the covariance function C(t) of a second order stationary process is continuous at t = 0. Then it is continuous for all t ∈ R. Furthermore, the continuity of C(t) is equivalent to the continuity of the process X_t in the L^2-sense.

Proof. Fix t ∈ R and (without loss of generality) set EX_t = 0. We calculate:

|C(t+h) - C(t)|^2 = |E(X_{t+h} X_0) - E(X_t X_0)|^2 = |E((X_{t+h} - X_t) X_0)|^2

≤ E(X_0^2) E(X_{t+h} - X_t)^2

= C(0)(EX_{t+h}^2 + EX_t^2 - 2 EX_t X_{t+h})

= 2 C(0)(C(0) - C(h)) → 0,

as h → 0. Thus, continuity of C(·) at 0 implies continuity for all t.

Assume now that C(t) is continuous. From the above calculation we have

E|X_{t+h} - X_t|^2 = 2(C(0) - C(h)),    (3.4)

which converges to 0 as h → 0. Conversely, assume that X_t is L^2-continuous. Then, from the above equation we get lim_{h→0} C(h) = C(0).

Notice that form (3.4) we immediately conclude thatC(0) > C(h), h ∈ R.The Fourier transform of the covariance function of a secondorder stationary

process always exists. This enables us to study second orderstationary processesusing tools from Fourier analysis. To make the link between second order station-ary processes and Fourier analysis we will use Bochner’s theorem, which appliesto all nonnegative functions.

Definition 3.3.10. A function $f : \mathbb{R} \to \mathbb{R}$ is called nonnegative definite if

$$\sum_{i,j=1}^{n} f(t_i - t_j) \, c_i \bar{c}_j \geqslant 0 \qquad (3.5)$$

for all $n \in \mathbb{N}$, $t_1, \dots, t_n \in \mathbb{R}$, $c_1, \dots, c_n \in \mathbb{C}$.


Lemma 3.3.11. The covariance function of a second order stationary process is a nonnegative definite function.

Proof. We will use the notation $X^c_t := \sum_{i=1}^{n} X_{t_i} c_i$. We have:

$$\sum_{i,j=1}^{n} C(t_i - t_j) \, c_i \bar{c}_j = \sum_{i,j=1}^{n} E\big(X_{t_i} X_{t_j}\big) c_i \bar{c}_j = E\Big( \sum_{i=1}^{n} X_{t_i} c_i \sum_{j=1}^{n} X_{t_j} \bar{c}_j \Big) = E\big( X^c_t \, \overline{X^c_t} \big) = E|X^c_t|^2 \geqslant 0.$$

Theorem 3.3.12 (Bochner). Let $C(t)$ be a continuous nonnegative definite function. Then there exists a unique nonnegative measure $\rho$ on $\mathbb{R}$ such that $\rho(\mathbb{R}) = C(0)$ and

$$C(t) = \int_{\mathbb{R}} e^{ixt} \, \rho(dx) \qquad \forall t \in \mathbb{R}. \qquad (3.6)$$

Definition 3.3.13. Let $X_t$ be a second order stationary process with autocorrelation function $C(t)$ whose Fourier transform is the measure $\rho(dx)$. The measure $\rho(dx)$ is called the spectral measure of the process $X_t$.

In the following we will assume that the spectral measure is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ with density $f(x)$, i.e. $\rho(dx) = f(x)\,dx$. The Fourier transform $f(x)$ of the covariance function is called the spectral density of the process:

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} C(t) \, dt.$$

From (3.6) it follows that the autocorrelation function of a mean zero, second order stationary process is given by the inverse Fourier transform of the spectral density:

$$C(t) = \int_{-\infty}^{\infty} e^{itx} f(x) \, dx. \qquad (3.7)$$

There are various cases where the experimentally measured quantity is the spectral density (or power spectrum) of a stationary stochastic process. Conversely, from a time series of observations of a stationary process we can calculate the autocorrelation function and, using (3.7), the spectral density.
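The Fourier pair above can be checked numerically. The sketch below (Python; not part of the notes, and the function name `spectral_density` is ours) approximates $f(x) = \frac{1}{2\pi}\int e^{-itx} C(t)\,dt$ by the trapezoid rule for the exponential covariance $C(t) = (D/\alpha)e^{-\alpha|t|}$, whose spectral density is computed in closed form in Example 3.3.14 below.

```python
import math

def spectral_density(C, x, T=40.0, n=80000):
    # f(x) = (1/2pi) * int e^{-itx} C(t) dt; for an even, real C the
    # imaginary part cancels, so we integrate cos(xt) C(t) with the
    # trapezoid rule on [-T, T] (T chosen large enough that C has decayed).
    h = 2.0 * T / n
    total = 0.5 * (math.cos(x * (-T)) * C(-T) + math.cos(x * T) * C(T))
    for k in range(1, n):
        t = -T + k * h
        total += math.cos(x * t) * C(t)
    return total * h / (2.0 * math.pi)

# Exponential covariance C(t) = (D/alpha) e^{-alpha |t|}; its spectral
# density is the Lorentzian D / (pi (x^2 + alpha^2)).
alpha, D = 1.0, 1.0
C = lambda t: (D / alpha) * math.exp(-alpha * abs(t))
approx = spectral_density(C, 0.5)
exact = D / (math.pi * (0.5**2 + alpha**2))
```

The truncation at $|t| = T$ is harmless here because the covariance decays exponentially.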

The autocorrelation function of a second order stationary process enables us to associate a time scale to $X_t$, the correlation time $\tau_{cor}$:

$$\tau_{cor} = \frac{1}{C(0)} \int_0^{\infty} C(\tau) \, d\tau = \int_0^{\infty} \frac{E(X_\tau X_0)}{E(X_0^2)} \, d\tau.$$

The slower the decay of the correlation function, the larger the correlation time is. Notice that when the correlations do not decay fast enough for $C(t)$ to be integrable, the correlation time is infinite.

Example 3.3.14. Consider a mean zero, second order stationary process with correlation function

$$R(t) = R(0) e^{-\alpha |t|}, \qquad (3.8)$$

where $\alpha > 0$. We will write $R(0) = D/\alpha$, where $D > 0$. The spectral density of this process is:

$$f(x) = \frac{1}{2\pi} \frac{D}{\alpha} \int_{-\infty}^{+\infty} e^{-ixt} e^{-\alpha|t|} \, dt = \frac{1}{2\pi} \frac{D}{\alpha} \left( \int_{-\infty}^{0} e^{-ixt} e^{\alpha t} \, dt + \int_{0}^{+\infty} e^{-ixt} e^{-\alpha t} \, dt \right)$$
$$= \frac{1}{2\pi} \frac{D}{\alpha} \left( \frac{1}{-ix + \alpha} + \frac{1}{ix + \alpha} \right) = \frac{D}{\pi} \, \frac{1}{x^2 + \alpha^2}.$$

This function is called the Cauchy or the Lorentz distribution. The correlation time is (using $R(0) = D/\alpha$)

$$\tau_{cor} = \int_0^{\infty} e^{-\alpha t} \, dt = \alpha^{-1}.$$

A Gaussian process with an exponential correlation function is of particular importance in the theory and applications of stochastic processes.

Definition 3.3.15. A real-valued Gaussian stationary process defined on $\mathbb{R}$ with correlation function given by (3.8) is called the (stationary) Ornstein-Uhlenbeck process.


The Ornstein-Uhlenbeck process is used as a model for the velocity of a Brownian particle. It is of interest to calculate the statistics of the position of the Brownian particle, i.e. of the integral

$$X(t) = \int_0^t Y(s) \, ds, \qquad (3.9)$$

where $Y(t)$ denotes the stationary OU process.

Lemma 3.3.16. Let $Y(t)$ denote the stationary OU process with covariance function (3.8) and set $\alpha = D = 1$. Then the position process (3.9) is a mean zero Gaussian process with covariance function

$$E(X(t)X(s)) = 2\min(t,s) + e^{-\min(t,s)} + e^{-\max(t,s)} - e^{-|t-s|} - 1. \qquad (3.10)$$

Proof. See Exercise 8.

3.3.3 Ergodic Properties of Second-Order Stationary Processes

Second order stationary processes have nice ergodic properties, provided that the correlation between values of the process at different times decays sufficiently fast. In this case, it is possible to show that we can calculate expectations by calculating time averages. An example of such a result is the following.

Theorem 3.3.17. Let $\{X_t\}_{t \geqslant 0}$ be a second order stationary process on a probability space $(\Omega, \mathcal{F}, P)$ with mean $\mu$ and covariance $R(t)$, and assume that $R(t) \in L^1(0, +\infty)$. Then

$$\lim_{T \to +\infty} E \left| \frac{1}{T} \int_0^T X(s) \, ds - \mu \right|^2 = 0. \qquad (3.11)$$

For the proof of this result we will first need an elementary lemma.

Lemma 3.3.18. Let $R(t)$ be an integrable symmetric function. Then

$$\int_0^T \int_0^T R(t-s) \, dt \, ds = 2 \int_0^T (T-s) R(s) \, ds. \qquad (3.12)$$

Proof. We make the change of variables $u = t - s$, $v = t + s$. The domain of integration in the $(t, s)$ variables is $[0, T] \times [0, T]$. In the $(u, v)$ variables it becomes $\{(u, v) : u \in [-T, T], \ v \in [|u|, \, 2T - |u|]\}$; for each $u$, the variable $v$ ranges over an interval of length $2(T - |u|)$. The Jacobian of the transformation is

$$J = \frac{\partial(t,s)}{\partial(u,v)} = \frac{1}{2}.$$


The integral becomes

$$\int_0^T \int_0^T R(t-s) \, dt \, ds = \int_{-T}^{T} \int_{|u|}^{2T - |u|} R(u) \, J \, dv \, du = \int_{-T}^{T} (T - |u|) R(u) \, du = 2 \int_0^T (T - u) R(u) \, du,$$

where the symmetry of the function $R(u)$ was used in the last step.

Proof of Theorem 3.3.17. We use Lemma 3.3.18 to calculate:

$$E \left| \frac{1}{T} \int_0^T X_s \, ds - \mu \right|^2 = \frac{1}{T^2} E \left| \int_0^T (X_s - \mu) \, ds \right|^2 = \frac{1}{T^2} E \int_0^T \int_0^T (X(t) - \mu)(X(s) - \mu) \, dt \, ds$$
$$= \frac{1}{T^2} \int_0^T \int_0^T R(t-s) \, dt \, ds = \frac{2}{T^2} \int_0^T (T - u) R(u) \, du$$
$$= \frac{2}{T} \int_0^T \Big( 1 - \frac{u}{T} \Big) R(u) \, du \leqslant \frac{2}{T} \int_0^{+\infty} |R(u)| \, du \to 0,$$

using the dominated convergence theorem and the assumption $R(\cdot) \in L^1$.

Assume that $\mu = 0$ and define

$$D = \int_0^{+\infty} R(t) \, dt, \qquad (3.13)$$

which, from our assumption on $R(t)$, is a finite quantity.$^2$ The above calculation suggests that, for $T \gg 1$, we have that

$$E \left( \int_0^T X(t) \, dt \right)^2 \approx 2DT.$$

This implies that, at sufficiently long times, the mean square displacement of the integral of the ergodic second order stationary process $X_t$ scales linearly in time, with proportionality coefficient $2D$.

$^2$Notice however that we do not know whether it is nonzero. This requires a separate argument.


Assume that $X_t$ is the velocity of a (Brownian) particle. In this case, the integral of $X_t$,

$$Z_t = \int_0^t X_s \, ds,$$

represents the particle position. From our calculation above we conclude that, for $t \gg 1$,

$$E Z_t^2 \approx 2Dt,$$

where

$$D = \int_0^{\infty} R(t) \, dt = \int_0^{\infty} E(X_t X_0) \, dt \qquad (3.14)$$

is the diffusion coefficient. Thus, one expects that at sufficiently long times and under appropriate assumptions on the correlation function, the time integral of a stationary process will approximate a Brownian motion with diffusion coefficient $D$. The diffusion coefficient is an example of a transport coefficient and (3.14) is an example of the Green-Kubo formula: a transport coefficient can be calculated in terms of the time integral of an appropriate autocorrelation function. In the case of the diffusion coefficient we need to calculate the integral of the velocity autocorrelation function.

Example 3.3.19. Consider the stochastic process with the exponential correlation function from Example 3.3.14, and assume that this stochastic process describes the velocity of a Brownian particle. Since $R(t) \in L^1(0, +\infty)$, Theorem 3.3.17 applies. Furthermore, the diffusion coefficient of the Brownian particle is given by

$$\int_0^{+\infty} R(t) \, dt = R(0) \, \tau_{cor} = \frac{D}{\alpha^2}.$$

3.4 Brownian Motion

The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e. $\omega \in \Omega$ the function $X_t$ is a continuous function of time) process with independent Gaussian increments. A process $X_t$ has independent increments if for every sequence $t_0 < t_1 < \dots < t_n$ the random variables

$$X_{t_1} - X_{t_0}, \ X_{t_2} - X_{t_1}, \ \dots, \ X_{t_n} - X_{t_{n-1}}$$

are independent. If, furthermore, for any $t_1, t_2, s \in T$ and Borel set $B \subset \mathbb{R}$,

$$P(X_{t_2+s} - X_{t_1+s} \in B) = P(X_{t_2} - X_{t_1} \in B),$$

then the process $X_t$ has stationary independent increments.

Definition 3.4.1.

• A one dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}$ is a real valued stochastic process such that

i. $W(0) = 0$.

ii. $W(t)$ has independent increments.

iii. For every $t > s \geqslant 0$, $W(t) - W(s)$ has a Gaussian distribution with mean $0$ and variance $t - s$. That is, the density of the random variable $W(t) - W(s)$ is

$$g(x; t, s) = \big( 2\pi(t-s) \big)^{-1/2} \exp\left( -\frac{x^2}{2(t-s)} \right). \qquad (3.15)$$

• A $d$-dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}^d$ is a collection of $d$ independent one dimensional Brownian motions:

$$W(t) = (W_1(t), \dots, W_d(t)),$$

where $W_i(t)$, $i = 1, \dots, d$, are independent one dimensional Brownian motions. The density of the Gaussian random vector $W(t) - W(s)$ is thus

$$g(x; t, s) = \big( 2\pi(t-s) \big)^{-d/2} \exp\left( -\frac{\|x\|^2}{2(t-s)} \right).$$

Brownian motion is sometimes referred to as the Wiener process.

Brownian motion has continuous paths. More precisely, it has a continuous modification.

Definition 3.4.2. Let $X_t$ and $Y_t$, $t \in T$, be two stochastic processes defined on the same probability space $(\Omega, \mathcal{F}, P)$. The process $Y_t$ is said to be a modification of $X_t$ if $P(X_t = Y_t) = 1$ for all $t \in T$.

Lemma 3.4.3. There is a continuous modification of Brownian motion.

This follows from a theorem due to Kolmogorov.


[Figure 3.1: Brownian sample paths. The plot shows five individual paths of $U(t)$ on $t \in [0, 1]$ together with the mean of 1000 paths.]

Theorem 3.4.4 (Kolmogorov). Let $X_t$, $t \in [0, \infty)$, be a stochastic process on a probability space $(\Omega, \mathcal{F}, P)$. Suppose that there are positive constants $\alpha$ and $\beta$, and for each $T \geqslant 0$ there is a constant $C(T)$ such that

$$E|X_t - X_s|^{\alpha} \leqslant C(T) |t - s|^{1+\beta}, \qquad 0 \leqslant s, t \leqslant T. \qquad (3.16)$$

Then there exists a continuous modification $Y_t$ of the process $X_t$.

The proof of Lemma 3.4.3 is left as an exercise.
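For Brownian motion, the Kolmogorov criterion is typically applied with $\alpha = 4$, $\beta = 1$: since $W_t - W_s \sim \mathcal{N}(0, |t-s|)$, the Gaussian fourth moment gives $E|W_t - W_s|^4 = 3|t-s|^2$. The sketch below (our own Python, not from the notes) confirms this moment identity by Monte Carlo.

```python
import math, random

random.seed(4)

def fourth_moment(dt, M=60000):
    # Monte Carlo estimate of E|W_{t+dt} - W_t|^4; the increment is N(0, dt).
    s = math.sqrt(dt)
    return sum(random.gauss(0.0, s) ** 4 for _ in range(M)) / M

# For a Gaussian increment, E|W_t - W_s|^4 = 3 |t - s|^2, so condition (3.16)
# holds with alpha = 4, beta = 1, C(T) = 3, yielding a continuous modification.
m1 = fourth_moment(1.0)    # close to 3 * 1.0**2 = 3
m2 = fourth_moment(0.5)    # close to 3 * 0.5**2 = 0.75
```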

Remark 3.4.5. Equivalently, we could have defined the one dimensional standard Brownian motion as a stochastic process on a probability space $(\Omega, \mathcal{F}, P)$ with continuous paths for almost all $\omega \in \Omega$, and Gaussian finite dimensional distributions with zero mean and covariance $E(W_{t_i} W_{t_j}) = \min(t_i, t_j)$. One can then show that Definition 3.4.1 follows from the above definition.

It is possible to prove rigorously the existence of the Wiener process (Brownian motion):

Theorem 3.4.6 (Wiener). There exists an almost-surely continuous process $W_t$ with independent increments such that $W_0 = 0$ and, for each $t \geqslant 0$, the random variable $W_t$ is $\mathcal{N}(0, t)$. Furthermore, $W_t$ is almost surely locally Hölder continuous with exponent $\alpha$ for any $\alpha \in (0, \frac{1}{2})$.


Notice that Brownian paths are, almost surely, nowhere differentiable.

We can also construct Brownian motion through the limit of an appropriately rescaled random walk: let $X_1, X_2, \dots$ be iid random variables on a probability space $(\Omega, \mathcal{F}, P)$ with mean $0$ and variance $1$. Define the discrete time stochastic process $S_n$ with $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$, $n \geqslant 1$. Define now a continuous time stochastic process with continuous paths as the linearly interpolated, appropriately rescaled random walk:

$$W^n_t = \frac{1}{\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sqrt{n}} X_{[nt]+1},$$

where $[\cdot]$ denotes the integer part of a number. Then $W^n_t$ converges weakly, as $n \to +\infty$, to a one dimensional standard Brownian motion.
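The rescaled random walk construction is straightforward to implement. The sketch below (our own Python, not from the notes) builds $W^n_t$ from fair $\pm 1$ coin flips and checks that $W^n_1$ has approximately mean $0$ and variance $1$, as it must if it is to converge to $W_1 \sim \mathcal{N}(0, 1)$.

```python
import math, random

random.seed(2)

def W_n(t, steps):
    # Linearly interpolated, rescaled random walk W^n_t, built from n iid
    # mean-0, variance-1 increments (here fair +/-1 coin flips).
    n = len(steps)
    k = int(n * t)                      # integer part [nt]
    partial = sum(steps[:k])            # S_[nt]
    frac = n * t - k
    nxt = steps[k] if k < n else 0.0
    return (partial + frac * nxt) / math.sqrt(n)

n, M = 400, 2000
samples = []
for _ in range(M):
    steps = [random.choice((-1.0, 1.0)) for _ in range(n)]
    samples.append(W_n(1.0, steps))
mean = sum(samples) / M
var = sum((s - mean) ** 2 for s in samples) / (M - 1)
# mean is close to 0 and var close to 1, matching W_1 ~ N(0, 1) in the limit.
```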

Brownian motion is a Gaussian process. For the $d$-dimensional Brownian motion, and for $I$ the $d \times d$ identity matrix, we have (see (2.7) and (2.8))

$$E W(t) = 0 \qquad \forall t \geqslant 0$$

and

$$E\big( (W(t) - W(s)) \otimes (W(t) - W(s)) \big) = (t - s) I. \qquad (3.17)$$

Moreover,

$$E\big( W(t) \otimes W(s) \big) = \min(t, s) \, I. \qquad (3.18)$$

From the formula for the Gaussian density $g(x, t-s)$, eqn. (3.15), we immediately conclude that $W(t) - W(s)$ and $W(t+u) - W(s+u)$ have the same pdf. Consequently, Brownian motion has stationary increments. Notice, however, that Brownian motion itself is not a stationary process. Since $W(t) = W(t) - W(0)$, the pdf of $W(t)$ is

$$g(x, t) = \frac{1}{\sqrt{2\pi t}} e^{-x^2/2t}.$$

We can easily calculate all moments of the Brownian motion:

$$E(W^n(t)) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2t} \, dx = \begin{cases} 1 \cdot 3 \cdots (n-1) \, t^{n/2}, & n \text{ even}, \\ 0, & n \text{ odd}. \end{cases}$$

Brownian motion is invariant under various transformations in time.


Theorem 3.4.7. Let $W_t$ denote a standard Brownian motion in $\mathbb{R}$. Then $W_t$ has the following properties:

i. (Rescaling). For each $c > 0$ define $X_t = \frac{1}{\sqrt{c}} W(ct)$. Then $(X_t, t \geqslant 0) = (W_t, t \geqslant 0)$ in law.

ii. (Shifting). For each $c > 0$, $W_{c+t} - W_c$, $t \geqslant 0$, is a Brownian motion which is independent of $W_u$, $u \in [0, c]$.

iii. (Time reversal). Define $X_t = W_{1-t} - W_1$, $t \in [0, 1]$. Then $(X_t, t \in [0, 1]) = (W_t, t \in [0, 1])$ in law.

iv. (Inversion). Let $X_t$, $t \geqslant 0$, be defined by $X_0 = 0$, $X_t = t W(1/t)$ for $t > 0$. Then $(X_t, t \geqslant 0) = (W_t, t \geqslant 0)$ in law.

We emphasize that the equivalence in the above theorem holds in law and not in a pathwise sense.

Proof. See Exercise 13.

We can also add a drift and change the diffusion coefficient of the Brownian motion: we will define a Brownian motion with drift $\mu$ and variance $\sigma^2$ as the process

$$X_t = \mu t + \sigma W_t.$$

The mean and variance of $X_t$ are

$$E X_t = \mu t, \qquad E(X_t - E X_t)^2 = \sigma^2 t.$$

Notice that $X_t$ satisfies the equation

$$dX_t = \mu \, dt + \sigma \, dW_t.$$

This is the simplest example of a stochastic differential equation.

We can define the OU process through the Brownian motion via a time change.

Lemma 3.4.8. Let $W(t)$ be a standard Brownian motion and consider the process

$$V(t) = e^{-t} W(e^{2t}).$$

Then $V(t)$ is a Gaussian stationary process with mean $0$ and correlation function

$$R(t) = e^{-|t|}. \qquad (3.19)$$


For the proof of this result we first need to show that time changed Gaussian processes are also Gaussian.

Lemma 3.4.9. Let $X(t)$ be a Gaussian stochastic process and let $Y(t) = X(f(t))$, where $f(t)$ is a strictly increasing function. Then $Y(t)$ is also a Gaussian process.

Proof. We need to show that, for all positive integers $N$ and all sequences of times $t_1, t_2, \dots, t_N$, the random vector

$$\big( Y(t_1), Y(t_2), \dots, Y(t_N) \big) \qquad (3.20)$$

is a multivariate Gaussian random variable. Since $f(t)$ is strictly increasing, it is invertible and hence there exist $s_i$, $i = 1, \dots, N$, such that $s_i = f^{-1}(t_i)$. Thus, the random vector (3.20) can be rewritten as

$$\big( X(s_1), X(s_2), \dots, X(s_N) \big),$$

which is Gaussian for all $N$ and all choices of times $s_1, s_2, \dots, s_N$. Hence $Y(t)$ is also Gaussian.

Proof of Lemma 3.4.8. The fact that $V(t)$ is mean zero follows immediately from the fact that $W(t)$ is mean zero. To show that the correlation function of $V(t)$ is given by (3.19), we calculate

$$E(V(t)V(s)) = e^{-t-s} E\big( W(e^{2t}) W(e^{2s}) \big) = e^{-t-s} \min(e^{2t}, e^{2s}) = e^{-|t-s|}.$$

The Gaussianity of the process $V(t)$ follows from Lemma 3.4.9 (notice that the transformation that gives $V(t)$ in terms of $W(t)$ is invertible and we can write $W(s) = s^{1/2} V(\frac{1}{2} \ln s)$).

3.5 Other Examples of Stochastic Processes

3.5.1 Brownian Bridge

Let $W(t)$ be a standard one dimensional Brownian motion. We define the Brownian bridge (from $0$ to $0$) to be the process

$$B_t = W_t - t W_1, \qquad t \in [0, 1]. \qquad (3.21)$$


Notice that $B_0 = B_1 = 0$. Equivalently, we can define the Brownian bridge to be the continuous Gaussian process $\{B_t : 0 \leqslant t \leqslant 1\}$ such that

$$E B_t = 0, \qquad E(B_t B_s) = \min(s, t) - st, \quad s, t \in [0, 1]. \qquad (3.22)$$

Another, equivalent definition of the Brownian bridge is through an appropriate time change of the Brownian motion:

$$B_t = (1 - t) \, W\left( \frac{t}{1 - t} \right), \qquad t \in [0, 1). \qquad (3.23)$$

Conversely, we can write the Brownian motion as a time change of the Brownian bridge:

$$W_t = (t + 1) \, B\left( \frac{t}{1 + t} \right), \qquad t \geqslant 0.$$
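A quick Monte Carlo check of the covariance formula (3.22) (our own Python sketch, not from the notes): build the bridge from a Brownian path sampled at $s$, $t$ and $1$ via independent increments, and estimate $E(B_s B_t)$.

```python
import math, random

random.seed(6)

def bridge_cov(s, t, trials):
    # Monte Carlo estimate of E(B_s B_t) for B_t = W_t - t W_1, 0 < s < t < 1,
    # sampling W at s, t and 1 through independent Gaussian increments.
    acc = 0.0
    for _ in range(trials):
        ws = math.sqrt(s) * random.gauss(0.0, 1.0)
        wt = ws + math.sqrt(t - s) * random.gauss(0.0, 1.0)
        w1 = wt + math.sqrt(1.0 - t) * random.gauss(0.0, 1.0)
        acc += (ws - s * w1) * (wt - t * w1)
    return acc / trials

est = bridge_cov(0.25, 0.5, 20000)
exact = min(0.25, 0.5) - 0.25 * 0.5    # formula (3.22) gives 0.125
```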

3.5.2 Fractional Brownian Motion

Definition 3.5.1. A (normalized) fractional Brownian motion $W^H_t$, $t \geqslant 0$, with Hurst parameter $H \in (0, 1)$ is a centered Gaussian process with continuous sample paths whose covariance is given by

$$E(W^H_t W^H_s) = \frac{1}{2} \big( s^{2H} + t^{2H} - |t - s|^{2H} \big). \qquad (3.24)$$

Proposition 3.5.2. Fractional Brownian motion has the following properties.

i. When $H = \frac{1}{2}$, $W^{1/2}_t$ becomes the standard Brownian motion.

ii. $W^H_0 = 0$, $E W^H_t = 0$, $E(W^H_t)^2 = |t|^{2H}$, $t \geqslant 0$.

iii. It has stationary increments, $E(W^H_t - W^H_s)^2 = |t - s|^{2H}$.

iv. It has the following self similarity property:

$$(W^H_{\alpha t}, t \geqslant 0) = (\alpha^H W^H_t, t \geqslant 0), \qquad \alpha > 0, \qquad (3.25)$$

where the equivalence is in law.

Proof. See Exercise 19.
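Since fractional Brownian motion is specified entirely by the Gaussian covariance (3.24), its finite dimensional distributions can be sampled exactly by factorizing the covariance matrix. The sketch below (our own Python, not from the notes; the Cholesky factorization is written out by hand to stay self-contained) samples $W^H$ on a small grid with $H = 3/4$ and checks property ii, $E(W^H_1)^2 = 1$.

```python
import math, random

random.seed(7)

def fbm_cov(t, s, H):
    # Covariance (3.24) of fractional Brownian motion.
    return 0.5 * (abs(s) ** (2 * H) + abs(t) ** (2 * H) - abs(t - s) ** (2 * H))

def cholesky(A):
    # Plain Cholesky factorization A = L L^T for a small SPD matrix.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

H, m = 0.75, 16
grid = [(i + 1) / m for i in range(m)]                  # grid on (0, 1]
cov = [[fbm_cov(t, s, H) for s in grid] for t in grid]
L = cholesky(cov)

# Sample paths of W^H on the grid as L z with z ~ N(0, I), and estimate
# the variance at t = 1, which should equal 1^{2H} = 1.
M, acc = 4000, 0.0
for _ in range(M):
    z = [random.gauss(0.0, 1.0) for _ in range(m)]
    w_last = sum(L[m - 1][k] * z[k] for k in range(m))  # value at t = 1
    acc += w_last ** 2
var_hat = acc / M
```

For long paths this $O(m^3)$ factorization becomes expensive; specialized methods exist, but they are beyond the scope of this sketch.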


3.5.3 The Poisson Process

Another fundamental continuous time process is the Poisson process:

Definition 3.5.3. The Poisson process with intensity $\lambda$, denoted by $N(t)$, is an integer-valued, continuous time, stochastic process with independent increments satisfying

$$P\big( N(t) - N(s) = k \big) = \frac{e^{-\lambda(t-s)} \big( \lambda(t-s) \big)^k}{k!}, \qquad t > s \geqslant 0, \ k \in \mathbb{N}.$$

The Poisson process does not have a continuous modification. See Exercise 20.
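The Poisson process can be sampled through its exponential inter-arrival times (a standard construction, not stated in the notes above): with intensity $\lambda$, the waiting times between jumps are iid $\mathrm{Exp}(\lambda)$, and the count $N(t)$ then has mean and variance $\lambda t$. A small Python sketch (our own):

```python
import random

random.seed(8)

def poisson_count(lam, t):
    # N(t): number of arrivals by time t, with Exp(lam) inter-arrival times.
    n, s = 0, random.expovariate(lam)
    while s <= t:
        n += 1
        s += random.expovariate(lam)
    return n

lam, t, M = 2.0, 3.0, 20000
counts = [poisson_count(lam, t) for _ in range(M)]
mean = sum(counts) / M                               # close to lam * t = 6
var = sum((c - mean) ** 2 for c in counts) / (M - 1) # also close to 6
```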

3.6 The Karhunen-Loeve Expansion

Let $f \in L^2(\Omega)$ where $\Omega$ is a subset of $\mathbb{R}^d$ and let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in $L^2(\Omega)$. Then, it is well known that $f$ can be written as a series expansion:

$$f = \sum_{n=1}^{\infty} f_n e_n,$$

where

$$f_n = \int_{\Omega} f(x) e_n(x) \, dx.$$

The convergence is in $L^2(\Omega)$:

$$\lim_{N \to \infty} \left\| f(x) - \sum_{n=1}^{N} f_n e_n(x) \right\|_{L^2(\Omega)} = 0.$$

It turns out that we can obtain a similar expansion for an $L^2$ mean zero process which is continuous in the $L^2$ sense:

$$E X_t^2 < +\infty, \qquad E X_t = 0, \qquad \lim_{h \to 0} E|X_{t+h} - X_t|^2 = 0. \qquad (3.26)$$

For simplicity we will take $T = [0, 1]$. Let $R(t, s) = E(X_t X_s)$ be the autocorrelation function. Notice that from (3.26) it follows that $R(t, s)$ is continuous in both $t$ and $s$ (Exercise 21).

Let us assume an expansion of the form

$$X_t(\omega) = \sum_{n=1}^{\infty} \xi_n(\omega) e_n(t), \qquad t \in [0, 1], \qquad (3.27)$$


where $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis in $L^2(0, 1)$. The random variables $\xi_n$ are calculated as

$$\int_0^1 X_t e_k(t) \, dt = \int_0^1 \sum_{n=1}^{\infty} \xi_n e_n(t) e_k(t) \, dt = \sum_{n=1}^{\infty} \xi_n \delta_{nk} = \xi_k,$$

where we assumed that we can interchange the summation and integration. We will assume that these random variables are orthogonal:

$$E(\xi_n \xi_m) = \lambda_n \delta_{nm},$$

where $\{\lambda_n\}_{n=1}^{\infty}$ are positive numbers that will be determined later.

Assuming that an expansion of the form (3.27) exists, we can calculate

$$R(t, s) = E(X_t X_s) = E\left( \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \xi_k e_k(t) \, \xi_\ell e_\ell(s) \right) = \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} E(\xi_k \xi_\ell) \, e_k(t) e_\ell(s) = \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s).$$

Consequently, in order for the expansion (3.27) to be valid we need

$$R(t, s) = \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s). \qquad (3.28)$$

From equation (3.28) it follows that

$$\int_0^1 R(t, s) e_n(s) \, ds = \int_0^1 \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s) e_n(s) \, ds = \sum_{k=1}^{\infty} \lambda_k e_k(t) \int_0^1 e_k(s) e_n(s) \, ds = \sum_{k=1}^{\infty} \lambda_k e_k(t) \delta_{kn} = \lambda_n e_n(t).$$


Hence, in order for the expansion (3.27) to be valid, $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ have to be the eigenvalues and eigenfunctions of the integral operator whose kernel is the correlation function of $X_t$:

$$\int_0^1 R(t, s) e_n(s) \, ds = \lambda_n e_n(t). \qquad (3.29)$$

Hence, in order to prove the expansion (3.27) we need to study the eigenvalue problem for the integral operator $\mathcal{R} : L^2[0,1] \to L^2[0,1]$. It is easy to check that this operator is self-adjoint ($(\mathcal{R}f, h) = (f, \mathcal{R}h)$ for all $f, h \in L^2(0,1)$) and nonnegative ($(\mathcal{R}f, f) \geqslant 0$ for all $f \in L^2(0,1)$). Hence, all its eigenvalues are real and nonnegative. Furthermore, it is a compact operator (if $\{\phi_n\}_{n=1}^{\infty}$ is a bounded sequence in $L^2(0,1)$, then $\{\mathcal{R}\phi_n\}_{n=1}^{\infty}$ has a convergent subsequence). The spectral theorem for compact, self-adjoint operators implies that $\mathcal{R}$ has a countable sequence of eigenvalues tending to $0$. Furthermore, for every $f \in L^2(0,1)$ we can write

$$f = f_0 + \sum_{n=1}^{\infty} f_n e_n(t),$$

where $\mathcal{R} f_0 = 0$, the $e_n(t)$ are the eigenfunctions of $\mathcal{R}$ corresponding to nonzero eigenvalues and the convergence is in $L^2$. Finally, Mercer's Theorem states that for $R(t, s)$ continuous on $[0,1] \times [0,1]$, the expansion (3.28) is valid, where the series converges absolutely and uniformly.

Now we are ready to prove (3.27).

Theorem 3.6.1 (Karhunen-Loeve). Let $X_t$, $t \in [0, 1]$, be an $L^2$ process with zero mean and continuous correlation function $R(t, s)$. Let $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ be the eigenvalues and eigenfunctions of the operator $\mathcal{R}$ defined in (3.35). Then

$$X_t = \sum_{n=1}^{\infty} \xi_n e_n(t), \qquad t \in [0, 1], \qquad (3.30)$$

where

$$\xi_n = \int_0^1 X_t e_n(t) \, dt, \qquad E \xi_n = 0, \qquad E(\xi_n \xi_m) = \lambda_n \delta_{nm}. \qquad (3.31)$$

The series converges in $L^2$ to $X(t)$, uniformly in $t$.

Proof. The fact that $E\xi_n = 0$ follows from the fact that $X_t$ is mean zero. The orthogonality of the random variables $\{\xi_n\}_{n=1}^{\infty}$ follows from the orthogonality of the eigenfunctions of $\mathcal{R}$:

$$E(\xi_n \xi_m) = E \int_0^1 \int_0^1 X_t X_s e_n(t) e_m(s) \, dt \, ds = \int_0^1 \int_0^1 R(t, s) e_n(t) e_m(s) \, ds \, dt = \lambda_n \int_0^1 e_n(s) e_m(s) \, ds = \lambda_n \delta_{nm}.$$

Consider now the partial sum $S_N = \sum_{n=1}^{N} \xi_n e_n(t)$.

$$E|X_t - S_N|^2 = E X_t^2 + E S_N^2 - 2 E(X_t S_N)$$
$$= R(t, t) + E \sum_{k,\ell=1}^{N} \xi_k \xi_\ell e_k(t) e_\ell(t) - 2 E\left( X_t \sum_{n=1}^{N} \xi_n e_n(t) \right)$$
$$= R(t, t) + \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 - 2 E \sum_{k=1}^{N} \int_0^1 X_t X_s e_k(s) e_k(t) \, ds$$
$$= R(t, t) - \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 \to 0,$$

by Mercer's theorem.

Remark 3.6.2. Let $X_t$ be a Gaussian second order process with continuous covariance $R(t, s)$. Then the random variables $\{\xi_k\}_{k=1}^{\infty}$ are Gaussian, since they are defined through the time integral of a Gaussian process. Furthermore, since they are Gaussian and orthogonal, they are also independent. Hence, for Gaussian processes the Karhunen-Loeve expansion becomes:

$$X_t = \sum_{k=1}^{+\infty} \sqrt{\lambda_k} \, \xi_k e_k(t), \qquad (3.32)$$

where $\{\xi_k\}_{k=1}^{\infty}$ are independent $\mathcal{N}(0, 1)$ random variables.

Example 3.6.3 (The Karhunen-Loeve Expansion for Brownian Motion). The correlation function of Brownian motion is $R(t, s) = \min(t, s)$. The eigenvalue problem $\mathcal{R}\psi_n = \lambda_n \psi_n$ becomes

$$\int_0^1 \min(t, s) \psi_n(s) \, ds = \lambda_n \psi_n(t).$$


Let us assume that $\lambda_n > 0$ (it is easy to check that $0$ is not an eigenvalue). Upon setting $t = 0$ we obtain $\psi_n(0) = 0$. The eigenvalue problem can be rewritten in the form

$$\int_0^t s \psi_n(s) \, ds + t \int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n(t).$$

We differentiate this equation once:

$$\int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n'(t).$$

We set $t = 1$ in this equation to obtain the second boundary condition, $\psi_n'(1) = 0$. A second differentiation yields

$$-\psi_n(t) = \lambda_n \psi_n''(t),$$

where primes denote differentiation with respect to $t$. Thus, in order to calculate the eigenvalues and eigenfunctions of the integral operator whose kernel is the covariance function of Brownian motion, we need to solve the Sturm-Liouville problem

$$-\psi_n(t) = \lambda_n \psi_n''(t), \qquad \psi_n(0) = \psi_n'(1) = 0.$$

It is easy to check that the eigenvalues and (normalized) eigenfunctions are

$$\psi_n(t) = \sqrt{2} \sin\left( \frac{1}{2}(2n-1)\pi t \right), \qquad \lambda_n = \left( \frac{2}{(2n-1)\pi} \right)^2.$$

Thus, the Karhunen-Loeve expansion of Brownian motion on $[0, 1]$ is

$$W_t = \sqrt{2} \sum_{n=1}^{\infty} \xi_n \frac{2}{(2n-1)\pi} \sin\left( \frac{1}{2}(2n-1)\pi t \right). \qquad (3.33)$$

We can use the KL expansion in order to study the $L^2$-regularity of stochastic processes. First, let $\mathcal{R}$ be a compact, symmetric positive definite operator on $L^2(0, 1)$ with eigenvalues and normalized eigenfunctions $\{\lambda_k, e_k(x)\}_{k=1}^{+\infty}$, and consider a function $f \in L^2(0, 1)$ with $\int_0^1 f(s) \, ds = 0$. We can define the one parameter family of Hilbert spaces $H^{\alpha}$ through the norm

$$\|f\|_{\alpha}^2 = \|\mathcal{R}^{-\alpha/2} f\|_{L^2}^2 = \sum_k |f_k|^2 \lambda_k^{-\alpha}.$$


The inner product can be obtained through polarization. This norm enables us to measure the regularity of the function $f(t)$.$^3$ Let $X_t$ be a mean zero second order (i.e. with finite second moment) process with continuous autocorrelation function. Define the space $\mathcal{H}^{\alpha} := L^2((\Omega, P), H^{\alpha}(0, 1))$ with (semi)norm

$$\|X_t\|_{\alpha}^2 = E \|X_t\|_{H^{\alpha}}^2 = \sum_k \lambda_k^{1 - \alpha}. \qquad (3.34)$$

Notice that the regularity of the stochastic process $X_t$ depends on the decay of the eigenvalues of the integral operator $\mathcal{R}\cdot := \int_0^1 R(t, s) \, \cdot \, ds$.

As an example, consider the $L^2$-regularity of Brownian motion. From Example 3.6.3 we know that $\lambda_k \sim k^{-2}$. Consequently, from (3.34) we get that, in order for $W_t$ to be an element of the space $\mathcal{H}^{\alpha}$, we need

$$\sum_k k^{-2(1 - \alpha)} < +\infty,$$

from which we obtain that $\alpha < 1/2$. This is consistent with the Hölder continuity of Brownian motion from Theorem 3.4.6.$^4$

3.7 Discussion and Bibliography

The Ornstein-Uhlenbeck process was introduced by Ornstein and Uhlenbeck in 1930 as a model for the velocity of a Brownian particle [73].

The kind of analysis presented in Section 3.3.3 was initiated by G.I. Taylor in [72]. The proof of Bochner's theorem 3.3.12 can be found in [39], where additional material on stationary processes can be found. See also [36].

The spectral theorem for compact, self-adjoint operators which was needed in the proof of the Karhunen-Loeve theorem can be found in [63]. The Karhunen-Loeve expansion is also valid for random fields. See [69] and the references therein.

3.8 Exercises

1. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$.

$^3$Think of $\mathcal{R}$ as being the inverse of the Laplacian with periodic boundary conditions. In this case $H^{\alpha}$ coincides with the standard fractional Sobolev space.

$^4$Notice, however, that Wiener's theorem refers to a.s. Hölder continuity, whereas the calculation presented in this section is about $L^2$-continuity.


(a) Show that $X_n$ is a strictly stationary process.

(b) Assume that $E Y_0 = \mu < +\infty$ and $E Y_0^2 = \sigma^2 < +\infty$. Show that

$$\lim_{N \to +\infty} E \left| \frac{1}{N} \sum_{j=0}^{N-1} X_j - \mu \right| = 0.$$

(c) Let $f$ be such that $E f^2(Y_0) < +\infty$. Show that

$$\lim_{N \to +\infty} E \left| \frac{1}{N} \sum_{j=0}^{N-1} f(X_j) - E f(Y_0) \right| = 0.$$

2. Let $Z$ be a random variable and define the stochastic process $X_n = Z$, $n = 0, 1, 2, \dots$. Show that $X_n$ is a strictly stationary process.

3. Let $A_0, A_1, \dots, A_m$ and $B_0, B_1, \dots, B_m$ be uncorrelated random variables with mean zero and variances $E A_i^2 = \sigma_i^2$, $E B_i^2 = \sigma_i^2$, $i = 0, \dots, m$. Let $\omega_0, \omega_1, \dots, \omega_m \in [0, \pi]$ be distinct frequencies and define, for $n = 0, \pm 1, \pm 2, \dots$, the stochastic process

$$X_n = \sum_{k=0}^{m} \big( A_k \cos(n \omega_k) + B_k \sin(n \omega_k) \big).$$

Calculate the mean and the covariance of $X_n$. Show that it is a weakly stationary process.

4. Let $\{\xi_n : n = 0, \pm 1, \pm 2, \dots\}$ be uncorrelated random variables with $E \xi_n = \mu$, $E(\xi_n - \mu)^2 = \sigma^2$, $n = 0, \pm 1, \pm 2, \dots$. Let $a_1, a_2, \dots$ be arbitrary real numbers and consider the stochastic process

$$X_n = a_1 \xi_n + a_2 \xi_{n-1} + \dots + a_m \xi_{n-m+1}.$$

(a) Calculate the mean, variance and the covariance function of $X_n$. Show that it is a weakly stationary process.

(b) Set $a_k = 1/\sqrt{m}$ for $k = 1, \dots, m$. Calculate the covariance function and study the cases $m = 1$ and $m \to +\infty$.

5. Let $W(t)$ be a standard one dimensional Brownian motion. Calculate the following expectations.

(a) $E e^{iW(t)}$.

(b) $E e^{i(W(t) + W(s))}$, $t, s \in (0, +\infty)$.

(c) $E\big( \sum_{i=1}^{n} c_i W(t_i) \big)^2$, where $c_i \in \mathbb{R}$, $i = 1, \dots, n$, and $t_i \in (0, +\infty)$, $i = 1, \dots, n$.

(d) $E e^{i \sum_{i=1}^{n} c_i W(t_i)}$, where $c_i \in \mathbb{R}$, $i = 1, \dots, n$, and $t_i \in (0, +\infty)$, $i = 1, \dots, n$.

6. Let $W_t$ be a standard one dimensional Brownian motion and define

$$B_t = W_t - t W_1, \qquad t \in [0, 1].$$

(a) Show that $B_t$ is a Gaussian process with

$$E B_t = 0, \qquad E(B_t B_s) = \min(t, s) - ts.$$

(b) Show that, for $t \in [0, 1)$, an equivalent definition of $B_t$ is through the formula

$$B_t = (1 - t) \, W\left( \frac{t}{1 - t} \right).$$

(c) Calculate the distribution function of $B_t$.

7. Let $X_t$ be a mean zero second order stationary process with autocorrelation function

$$R(t) = \sum_{j=1}^{N} \frac{\lambda_j^2}{\alpha_j} e^{-\alpha_j |t|},$$

where $\{\alpha_j, \lambda_j\}_{j=1}^{N}$ are positive real numbers.

(a) Calculate the spectral density and the correlation time of this process.

(b) Show that the assumptions of Theorem 3.3.17 are satisfied and use the argument presented in Section 3.3.3 (i.e. the Green-Kubo formula) to calculate the diffusion coefficient of the process $Z_t = \int_0^t X_s \, ds$.

(c) Under what assumptions on the coefficients $\{\alpha_j, \lambda_j\}_{j=1}^{N}$ can you study the above questions in the limit $N \to +\infty$?

8. Prove Lemma 3.3.16, i.e. formula (3.10).

9. Let $a_1, \dots, a_n$ and $s_1, \dots, s_n$ be positive real numbers. Calculate the mean and variance of the random variable

$$X = \sum_{i=1}^{n} a_i W(s_i).$$


10. Let $W(t)$ be the standard one-dimensional Brownian motion and let $\sigma, s_1, s_2 > 0$. Calculate

(a) $E e^{\sigma W(t)}$.

(b) $E\big( \sin(\sigma W(s_1)) \sin(\sigma W(s_2)) \big)$.

11. Let $W_t$ be a one dimensional Brownian motion, let $\mu, \sigma > 0$ and define

$$S_t = e^{\mu t + \sigma W_t}.$$

(a) Calculate the mean and the variance of $S_t$.

(b) Calculate the probability density function of $S_t$.

12. Use Theorem 3.4.4 to prove Lemma 3.4.3.

13. Prove Theorem 3.4.7.

14. Use Lemma 3.4.8 to calculate the distribution function of the stationary Ornstein-Uhlenbeck process.

15. Calculate the mean and the correlation function of the integral of a standard Brownian motion,

$$Y_t = \int_0^t W_s \, ds.$$

16. Show that the process

$$Y_t = \int_t^{t+1} (W_s - W_t) \, ds, \qquad t \in \mathbb{R},$$

is second order stationary.

17. Let $V_t = e^{-t} W(e^{2t})$ be the stationary Ornstein-Uhlenbeck process. Give the definition and study the main properties of the Ornstein-Uhlenbeck bridge.

18. The autocorrelation function of the velocityY (t) a Brownian particle movingin a harmonic potentialV (x) = 1

2ω20x

2 is

R(t) = e−γ|t|(cos(δ|t|) − 1

δsin(δ|t|)

),

whereγ is the friction coefficient andδ =√ω2

0 − γ2.


(a) Calculate the spectral density of Y(t).

(b) Calculate the mean square displacement E(X(t))² of the position of the Brownian particle X(t) = ∫_0^t Y(s) ds. Study the limit t → +∞.

19. Show the scaling property (3.25) of the fractional Brownian motion.

20. Use Theorem 3.4.4 to show that there does not exist a continuous modification of the Poisson process.

21. Show that the correlation function of a process X_t satisfying (3.26) is continuous in both t and s.

22. Let X_t be a stochastic process satisfying (3.26) and R(t, s) its correlation function. Show that the integral operator R : L²[0, 1] → L²[0, 1],

Rf := ∫_0^1 R(t, s) f(s) ds,  (3.35)

is self-adjoint and nonnegative. Show that all of its eigenvalues are real and nonnegative. Show that eigenfunctions corresponding to different eigenvalues are orthogonal.

23. Let H be a Hilbert space. An operator R : H → H is said to be Hilbert-Schmidt if there exists a complete orthonormal sequence {e_n}_{n=1}^∞ in H such that

∑_{n=1}^∞ ‖R e_n‖² < ∞.

Let R : L²[0, 1] → L²[0, 1] be the operator defined in (3.35) with R(t, s) continuous in both t and s. Show that it is a Hilbert-Schmidt operator.

24. Let X_t be a mean zero second order stationary process defined in the interval [0, T] with continuous covariance R(t) and let {λ_n}_{n=1}^{+∞} be the eigenvalues of the covariance operator. Show that

∑_{n=1}^∞ λ_n = T R(0).

25. Calculate the Karhunen-Loève expansion for a second order stochastic process with correlation function R(t, s) = ts.


26. Calculate the Karhunen-Loève expansion of the Brownian bridge on [0, 1].

27. Let X_t, t ∈ [0, T] be a second order process with continuous covariance and Karhunen-Loève expansion

X_t = ∑_{k=1}^∞ ξ_k e_k(t).

Define the process

Y(t) = f(t) X_{τ(t)},  t ∈ [0, S],

where f(t) is a continuous function and τ(t) a continuous, nondecreasing function with τ(0) = 0, τ(S) = T. Find the Karhunen-Loève expansion of Y(t), in an appropriate weighted L² space, in terms of the KL expansion of X_t. Use this in order to calculate the KL expansion of the Ornstein-Uhlenbeck process.

28. Calculate the Karhunen-Loève expansion of a centered Gaussian stochastic process with covariance function R(s, t) = cos(2π(t − s)).

29. Use the Karhunen-Loève expansion to generate paths of the

(a) Brownian motion on [0, 1].

(b) Brownian bridge on [0, 1].

(c) Ornstein-Uhlenbeck process on [0, 1].

Study computationally the convergence of the KL expansion for these processes. How many terms do you need to keep in the KL expansion in order to calculate accurate statistics of these processes?
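As a starting point for part (a) — a sketch, assuming the standard KL expansion of Brownian motion on [0, 1] with eigenvalues λ_k = ((k − 1/2)π)^{−2} and eigenfunctions e_k(t) = √2 sin((k − 1/2)πt):

```python
import numpy as np

def kl_brownian_paths(n_paths, n_terms, n_grid, rng):
    """Generate Brownian motion paths on [0, 1] from a truncated
    Karhunen-Loeve expansion: W_t = sum_k xi_k sqrt(lam_k) e_k(t),
    lam_k = ((k - 1/2) pi)^(-2), e_k(t) = sqrt(2) sin((k - 1/2) pi t)."""
    t = np.linspace(0.0, 1.0, n_grid)
    k = np.arange(1, n_terms + 1)
    freq = (k - 0.5) * np.pi                       # shape (n_terms,)
    xi = rng.standard_normal((n_paths, n_terms))   # iid N(0, 1) coefficients
    basis = np.sqrt(2.0) * np.sin(np.outer(freq, t)) / freq[:, None]
    return t, xi @ basis                           # paths, shape (n_paths, n_grid)

rng = np.random.default_rng(1)
t, paths = kl_brownian_paths(n_paths=5000, n_terms=100, n_grid=201, rng=rng)

# Compare with Var(W_t) = t at t = 1; truncation bias decays like 1/n_terms.
print(paths[:, -1].var())
```

The same template covers parts (b) and (c) once the corresponding eigenvalue/eigenfunction pairs are substituted, and varying n_terms shows the convergence asked for in the exercise.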


Chapter 4

Markov Processes

4.1 Introduction

In this chapter we will study some of the basic properties of Markov stochastic processes. In Section 4.2 we present various examples of Markov processes, in discrete and continuous time. In Section 4.3 we give the precise definition of a Markov process. In Section 4.4 we derive the Chapman-Kolmogorov equation, the fundamental equation in the theory of Markov processes. In Section 4.5 we introduce the concept of the generator of a Markov process. In Section 4.6 we study ergodic Markov processes. Discussion and bibliographical remarks are presented in Section 4.7 and exercises can be found in Section 4.8.

4.2 Examples

Roughly speaking, a Markov process is a stochastic process that retains no memory of where it has been in the past: only the current state of a Markov process can influence where it will go next. A bit more precisely: a Markov process is a stochastic process for which, given the present, the past and the future are statistically independent.

Perhaps the simplest example of a Markov process is that of a random walk in one dimension. We defined the one-dimensional random walk as the sum of independent, mean zero and variance 1 random variables ξ_i, i = 1, . . . :

X_N = ∑_{n=1}^N ξ_n,  X_0 = 0.


Let i_1, i_2, . . . be a sequence of integers. Then, for all integers n and m we have that

P(X_{n+m} = i_{n+m} | X_1 = i_1, . . . , X_n = i_n) = P(X_{n+m} = i_{n+m} | X_n = i_n).  (4.1)

In words, the probability that the random walk will be at i_{n+m} at time n + m depends only on its current value (at time n) and not on how it got there.¹ The random walk is an example of a discrete time Markov chain:

Definition 4.2.1. A stochastic process {S_n; n ∈ N} with state space S = Z is called a discrete time Markov chain provided that the Markov property (4.1) is satisfied.

Consider now a continuous-time stochastic process X_t with state space S = Z and denote by {X_s, s ≤ t} the collection of values of the stochastic process up to time t. We will say that X_t is a Markov process provided that

P(X_{t+h} = i_{t+h} | X_s, s ≤ t) = P(X_{t+h} = i_{t+h} | X_t = i_t),  (4.2)

for all h ≥ 0. A continuous-time, discrete state space Markov process is called a continuous-time Markov chain.

Example 4.2.2. The Poisson process is a continuous-time Markov chain with

P(N_{t+h} = j | N_t = i) = 0,  if j < i,

P(N_{t+h} = j | N_t = i) = e^{−λh} (λh)^{j−i} / (j − i)!,  if j ≥ i.
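This transition rule is easy to illustrate by simulation — a sketch, with arbitrary parameter values, building the process from iid exponential interarrival times:

```python
import numpy as np
from math import exp, factorial

# Build Poisson-process paths from iid Exp(lam) interarrival times and
# check that the number of jumps in (t, t+h] follows the transition rule
# of Example 4.2.2: P(N_{t+h} - N_t = k) = e^{-lam*h} (lam*h)^k / k!.
# lam, t, h and the path count are arbitrary illustrative values.
rng = np.random.default_rng(2)
lam, t, h, n_paths = 2.0, 1.0, 0.5, 50_000

gaps = rng.exponential(1.0 / lam, size=(n_paths, 64))  # interarrival times
arrival_times = gaps.cumsum(axis=1)
N = lambda s: (arrival_times <= s).sum(axis=1)          # N_s, per path

increments = N(t + h) - N(t)
for k in range(4):
    empirical = np.mean(increments == k)
    exact = exp(-lam * h) * (lam * h) ** k / factorial(k)
    print(k, round(empirical, 4), round(exact, 4))
```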

Similarly, we can define a continuous-time Markov process whose state space is R. In this case, the above definitions become

P(X_{t+h} ∈ Γ | X_s, s ≤ t) = P(X_{t+h} ∈ Γ | X_t = x),  (4.3)

for all Borel sets Γ.

Example 4.2.3. The Brownian motion is a Markov process with conditional probability density

p(y, t | x, s) := p(W_t = y | W_s = x) = (1/√(2π(t − s))) exp( −|x − y|² / (2(t − s)) ).  (4.4)

¹In fact, it is sufficient to take m = 1 in (4.1). See Exercise 1.


Example 4.2.4. The Ornstein-Uhlenbeck process V_t = e^{−t}W(e^{2t}) is a Markov process with conditional probability density

p(y, t | x, s) := p(V_t = y | V_s = x) = (1/√(2π(1 − e^{−2(t−s)}))) exp( −|y − x e^{−(t−s)}|² / (2(1 − e^{−2(t−s)})) ).  (4.5)

To prove (4.5) we use the formula for the distribution function of the Brownian motion to calculate, for t > s,

P(V_t ≤ y | V_s = x) = P(e^{−t}W(e^{2t}) ≤ y | e^{−s}W(e^{2s}) = x)
= P(W(e^{2t}) ≤ e^t y | W(e^{2s}) = e^s x)
= ∫_{−∞}^{e^t y} (1/√(2π(e^{2t} − e^{2s}))) e^{−|z − x e^s|² / (2(e^{2t} − e^{2s}))} dz
= ∫_{−∞}^{y} (1/√(2π e^{2t}(1 − e^{−2(t−s)}))) e^{−|ρ e^t − x e^s|² / (2 e^{2t}(1 − e^{−2(t−s)}))} e^t dρ
= ∫_{−∞}^{y} (1/√(2π(1 − e^{−2(t−s)}))) e^{−|ρ − x e^{−(t−s)}|² / (2(1 − e^{−2(t−s)}))} dρ.

Consequently, the transition probability density for the OU process is given by the formula

p(y, t | x, s) = ∂/∂y P(V_t ≤ y | V_s = x)
= (1/√(2π(1 − e^{−2(t−s)}))) exp( −|y − x e^{−(t−s)}|² / (2(1 − e^{−2(t−s)})) ).

Markov stochastic processes appear in a variety of applications in physics, chemistry, biology and finance. In this and the next chapter we will develop various analytical tools for studying them. In particular, we will see that we can obtain an equation for the transition probability

P(X_{n+1} = i_{n+1} | X_n = i_n),  P(X_{t+h} = i_{t+h} | X_t = i_t),  p(X_{t+h} = y | X_t = x),  (4.6)

which will enable us to study the evolution of a Markov process. This equation will be called the Chapman-Kolmogorov equation.

We will be mostly concerned with time-homogeneous Markov processes, i.e. processes for which the conditional probabilities are invariant under time shifts.


For time-homogeneous discrete-time Markov chains we have

P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i) =: p_ij.

We will refer to the matrix P = {p_ij} as the transition matrix. It is easy to check that the transition matrix is a stochastic matrix, i.e. it has nonnegative entries and ∑_j p_ij = 1. Similarly, we can define the n-step transition matrix P_n = {p_ij(n)} as

p_ij(n) = P(X_{m+n} = j | X_m = i).

We can study the evolution of a Markov chain through the Chapman-Kolmogorov equation:

p_ij(m + n) = ∑_k p_ik(m) p_kj(n).  (4.7)

Indeed, let µ_i^{(n)} := P(X_n = i). The (possibly infinite dimensional) vector µ^{(n)} determines the state of the Markov chain at time n. A simple consequence of the Chapman-Kolmogorov equation is that we can write an evolution equation for the vector µ^{(n)}:

µ^{(n)} = µ^{(0)} P^n,  (4.8)

where P^n denotes the nth power of the matrix P. Hence in order to calculate the state of the Markov chain at time n all we need is the initial distribution µ^{(0)} and the transition matrix P. Componentwise, the above equation can be written as

µ_j^{(n)} = ∑_i µ_i^{(0)} p_ij(n).
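Equation (4.8) is straightforward to verify numerically; the sketch below, with an arbitrary two-state transition matrix, propagates an initial distribution both by repeated one-step updates and by the matrix power P^n:

```python
import numpy as np

# Evolve a discrete-time Markov chain: mu^(n) = mu^(0) P^n, eq. (4.8).
# The two-state transition matrix P below is an arbitrary illustration.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
assert np.allclose(P.sum(axis=1), 1.0)  # stochastic matrix: rows sum to 1

mu0 = np.array([1.0, 0.0])              # start in state 0
n = 25

# Repeated one-step updates ...
mu = mu0.copy()
for _ in range(n):
    mu = mu @ P
# ... agree with the n-th matrix power:
mu_power = mu0 @ np.linalg.matrix_power(P, n)
print(mu, mu_power)  # both approach the invariant distribution (0.8, 0.2)
```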

Consider now a continuous time Markov chain with transition probability

p_ij(s, t) = P(X_t = j | X_s = i),  s ≤ t.

If the chain is homogeneous, then

p_ij(s, t) = p_ij(0, t − s)  for all i, j, s, t.

In particular, p_ij(t) = P(X_t = j | X_0 = i). The Chapman-Kolmogorov equation for a continuous time Markov chain is

dp_ij/dt = ∑_k p_ik(t) g_kj,  (4.9)


where the matrix G is called the generator of the Markov chain. Equation (4.9) can also be written in matrix notation:

dP_t/dt = P_t G.

The generator of the Markov chain is defined as

G = lim_{h→0} (1/h)(P_h − I).

Let now µ_t^i = P(X_t = i). The vector µ_t is the distribution of the Markov chain at time t. We can study its evolution using the equation

µ_t = µ_0 P_t.

Thus, as in the case of discrete time Markov chains, the evolution of a continuous time Markov chain is completely determined by the initial distribution and the transition matrix.
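For a finite state space these formulas can be checked directly: P_t = e^{tG} has rows that are probability distributions and satisfies dP_t/dt = P_t G. A sketch — the two-state generator below is an arbitrary illustration, and expm is a plain Taylor-series stand-in for a library matrix exponential such as scipy.linalg.expm:

```python
import numpy as np

# A two-state continuous-time Markov chain: the generator G has
# nonnegative off-diagonal rates and rows summing to zero, and the
# transition matrix at time t is P_t = exp(t G).
def expm(A, terms=60):
    """Matrix exponential via a truncated Taylor series (small matrices)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

G = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
t, dt = 0.8, 1e-5

P_t = expm(t * G)
row_sums = P_t.sum(axis=1)                 # each row is a distribution
# Finite-difference check of dP_t/dt = P_t G:
dP_dt = (expm((t + dt) * G) - expm((t - dt) * G)) / (2 * dt)
print(row_sums, np.abs(dP_dt - P_t @ G).max())
```

With SciPy available, `from scipy.linalg import expm` can replace the hand-rolled series.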

Consider now the case of a continuous time Markov process with continuous state space and with continuous paths. As we have seen in Example 4.2.3 the Brownian motion is an example of such a process. It is a standard result in the theory of partial differential equations that the conditional probability density of the Brownian motion (4.4) is the fundamental solution of the diffusion equation:

∂p/∂t = (1/2) ∂²p/∂y²,  lim_{t→s} p(y, t | x, s) = δ(y − x).  (4.10)

Similarly, the conditional distribution of the OU process satisfies the initial value problem

∂p/∂t = ∂(yp)/∂y + (1/2) ∂²p/∂y²,  lim_{t→s} p(y, t | x, s) = δ(y − x).  (4.11)

The Brownian motion and the OU process are examples of a diffusion process. A diffusion process is a continuous time Markov process with continuous paths. We will see in Chapter 5 that the conditional probability density p(y, t | x, s) of a diffusion process satisfies the forward Kolmogorov or Fokker-Planck equation

∂p/∂t = −∂/∂y (a(y, t)p) + (1/2) ∂²/∂y² (b(y, t)p),  lim_{t→s} p(y, t | x, s) = δ(y − x),  (4.12)

as well as the backward Kolmogorov equation

−∂p/∂s = a(x, s) ∂p/∂x + (1/2) b(x, s) ∂²p/∂x²,  lim_{t→s} p(y, t | x, s) = δ(y − x),  (4.13)

for appropriate functions a(y, t), b(y, t). Hence, a diffusion process is determined uniquely from these two functions.


4.3 Definition of a Markov Process

In Section 4.1 we gave the definition of a Markov process whose time is either discrete or continuous, and whose state space is the set of integers. We also gave several examples of Markov chains as well as of processes whose state space is the real line. In this section we give the precise definition of a Markov process with t ∈ T, a general index set, and S = E, an arbitrary metric space. We will use this formulation in the next section to derive the Chapman-Kolmogorov equation.

In order to state the definition of a continuous-time Markov process that takes values in a metric space we need to introduce various new concepts. For the definition of a Markov process we need to use the conditional expectation of the stochastic process conditioned on all past values. We can encode all past information about a stochastic process into an appropriate collection of σ-algebras. Our setting will be that we have a probability space (Ω, F, P) and an ordered set T. Let X = X_t(ω) be a stochastic process from the sample space (Ω, F) to the state space (E, G), where E is a metric space (we will usually take E to be either R or R^d). Remember that the stochastic process is a function of two variables, t ∈ T and ω ∈ Ω.

We start with the definition of a σ-algebra generated by a collection of sets.

Definition 4.3.1. Let K be a collection of subsets of Ω. The smallest σ-algebra on Ω which contains K is denoted by σ(K) and is called the σ-algebra generated by K.

Definition 4.3.2. Let X_t : Ω → E, t ∈ T. The smallest σ-algebra σ(X_t, t ∈ T), such that the family of mappings {X_t, t ∈ T} is a stochastic process with sample space (Ω, σ(X_t, t ∈ T)) and state space (E, G), is called the σ-algebra generated by {X_t, t ∈ T}.

In other words, the σ-algebra generated by X_t is the smallest σ-algebra such that X_t is a measurable function (random variable) with respect to it: the set

{ω ∈ Ω : X_t(ω) ≤ x} ∈ σ(X_t, t ∈ T)

for all x ∈ R (we have assumed that E = R).

Definition 4.3.3. A filtration on (Ω, F) is a nondecreasing family {F_t, t ∈ T} of sub-σ-algebras of F: F_s ⊆ F_t ⊆ F for s ≤ t.


We set F_∞ = σ(∪_{t∈T} F_t). The filtration generated by X_t, where X_t is a stochastic process, is

F_t^X := σ(X_s; s ≤ t).

Definition 4.3.4. A stochastic process {X_t; t ∈ T} is adapted to the filtration {F_t} := {F_t, t ∈ T} if for all t ∈ T, X_t is an F_t-measurable random variable.

Definition 4.3.5. Let X_t be a stochastic process defined on a probability space (Ω, F, µ) with values in E and let F_t^X be the filtration generated by {X_t; t ∈ T}. Then {X_t; t ∈ T} is a Markov process if

P(X_t ∈ Γ | F_s^X) = P(X_t ∈ Γ | X_s)  (4.14)

for all t, s ∈ T with t ≥ s, and Γ ∈ B(E).

Remark 4.3.6. The filtration F_s^X is generated by events of the form {ω | X_{s_1} ∈ B_1, X_{s_2} ∈ B_2, . . . , X_{s_n} ∈ B_n}, with 0 ≤ s_1 < s_2 < · · · < s_n ≤ s and B_i ∈ B(E). The definition of a Markov process is thus equivalent to the hierarchy of equations

P(X_t ∈ Γ | X_{t_1}, X_{t_2}, . . . , X_{t_n}) = P(X_t ∈ Γ | X_{t_n})  a.s.

for n ≥ 1, 0 ≤ t_1 < t_2 < · · · < t_n ≤ t and Γ ∈ B(E).

Roughly speaking, the statistics of X_t for t ≥ s are completely determined once X_s is known; information about X_t for t < s is superfluous. In other words: a Markov process has no memory. More precisely: when a Markov process is conditioned on the present state, then there is no memory of the past. The past and future of a Markov process are statistically independent when the present is known.

Remark 4.3.7. A non-Markovian process X_t can be described through a Markovian one Y_t by enlarging the state space: the additional variables that we introduce account for the memory in X_t. This "Markovianization" trick is very useful since there exist many analytical tools for analyzing Markovian processes.

Example 4.3.8. The velocity of a Brownian particle is modeled by the stationary Ornstein-Uhlenbeck process Y_t = e^{−t}W(e^{2t}). The particle position is given by the integral of the OU process (we take X_0 = 0):

X_t = ∫_0^t Y_s ds.


The particle position depends on the past of the OU process and, consequently, is not a Markov process. However, the joint position-velocity process {X_t, Y_t} is. Its transition probability density p(x, y, t | x_0, y_0) satisfies the forward Kolmogorov equation

∂p/∂t = −y ∂p/∂x + ∂/∂y (yp) + (1/2) ∂²p/∂y².
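Example 4.3.8 can be explored by simulation. The sketch below assumes the SDE representation dY = −Y dt + √2 dW of the stationary OU process (consistent with the autocorrelation e^{−|t|} used here) and checks the exact second moments E Y_T² = 1 and E X_T² = 2(T − 1 + e^{−T}):

```python
import numpy as np

# Euler-Maruyama simulation of the joint position-velocity process of
# Example 4.3.8, using the SDE representation dY = -Y dt + sqrt(2) dW
# of the stationary OU process (so that E[Y_t^2] = 1), with X_t the
# time integral of Y_t.  Step size and path count are arbitrary.
rng = np.random.default_rng(3)
n_paths, T, dt = 20_000, 2.0, 1e-3
n_steps = int(T / dt)

Y = rng.standard_normal(n_paths)          # start Y in its stationary law
X = np.zeros(n_paths)                     # X_0 = 0
for _ in range(n_steps):
    X += Y * dt
    Y += -Y * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)

# Exact second moments from the correlation function R(t) = e^{-|t|}:
#   E[Y_T^2] = 1,   E[X_T^2] = 2*(T - 1 + e^{-T}).
print(Y.var(), X.var())
```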

4.4 The Chapman-Kolmogorov Equation

With a Markov process X_t we can associate a function P : T × T × E × B(E) → R^+ defined through the relation

P[X_t ∈ Γ | F_s^X] = P(s, t, X_s, Γ),

for all t, s ∈ T with t ≥ s and all Γ ∈ B(E). Assume that X_s = x. Since P[X_t ∈ Γ | F_s^X] = P[X_t ∈ Γ | X_s], we can write

P(Γ, t | x, s) = P[X_t ∈ Γ | X_s = x].

The transition function P(Γ, t | x, s) is (for fixed t, x, s) a probability measure on E with P(E, t | x, s) = 1; it is B(E)-measurable in x (for fixed t, s, Γ) and satisfies the Chapman-Kolmogorov equation

P(Γ, t | x, s) = ∫_E P(Γ, t | y, u) P(dy, u | x, s)  (4.15)

for all x ∈ E, Γ ∈ B(E) and s, u, t ∈ T with s ≤ u ≤ t. The derivation of the Chapman-Kolmogorov equation is based on the assumption of Markovianity and on properties of the conditional probability. Let (Ω, F, µ) be a probability space, X a random variable from (Ω, F, µ) to (E, G) and let F_1 ⊂ F_2 ⊂ F. Then (see Theorem 2.4.1)

E(E(X | F_2) | F_1) = E(E(X | F_1) | F_2) = E(X | F_1).  (4.16)

Given G ⊂ F we define the function P_X(B | G) = P(X ∈ B | G) for B ∈ F. Assume that f is such that E(f(X)) < ∞. Then

E(f(X) | G) = ∫_R f(x) P_X(dx | G).  (4.17)


Now we use the Markov property, together with equations (4.16) and (4.17) and the fact that s < u ⇒ F_s^X ⊂ F_u^X, to calculate:

P(Γ, t | x, s) := P(X_t ∈ Γ | X_s = x) = P(X_t ∈ Γ | F_s^X)
= E(I_Γ(X_t) | F_s^X) = E(E(I_Γ(X_t) | F_s^X) | F_u^X)
= E(E(I_Γ(X_t) | F_u^X) | F_s^X) = E(P(X_t ∈ Γ | X_u) | F_s^X)
= E(P(X_t ∈ Γ | X_u = y) | X_s = x)
= ∫_R P(Γ, t | X_u = y) P(dy, u | X_s = x)
=: ∫_R P(Γ, t | y, u) P(dy, u | x, s).

I_Γ(·) denotes the indicator function of the set Γ. We have also set E = R. The CK equation is an integral equation and is the fundamental equation in the theory of Markov processes. Under additional assumptions we will derive from it the Fokker-Planck PDE, which is the fundamental equation in the theory of diffusion processes, and will be the main object of study in this course.

Definition 4.4.1. A Markov process is homogeneous if

P(t, Γ | X_s = x) := P(s, t, x, Γ) = P(0, t − s, x, Γ).

We set P(0, t, ·, ·) = P(t, ·, ·). The Chapman-Kolmogorov (CK) equation becomes

P(t + s, x, Γ) = ∫_E P(s, x, dz) P(t, z, Γ).  (4.18)

Let X_t be a homogeneous Markov process and assume that the initial distribution of X_t is given by the probability measure ν(Γ) = P(X_0 ∈ Γ) (for deterministic initial conditions X_0 = x we have that ν(Γ) = I_Γ(x)). The transition function P(t, x, Γ) and the initial distribution ν determine the finite dimensional distributions of X by

P(X_0 ∈ Γ_0, X_{t_1} ∈ Γ_1, . . . , X_{t_n} ∈ Γ_n)
= ∫_{Γ_0} ∫_{Γ_1} · · · ∫_{Γ_{n−1}} P(t_n − t_{n−1}, y_{n−1}, Γ_n) P(t_{n−1} − t_{n−2}, y_{n−2}, dy_{n−1}) · · · × P(t_1, y_0, dy_1) ν(dy_0).  (4.19)

Theorem 4.4.2. ([12, Sec. 4.1]) Let P(t, x, Γ) satisfy (4.18) and assume that (E, ρ) is a complete separable metric space. Then there exists a Markov process X in E whose finite-dimensional distributions are uniquely determined by (4.19).


Let X_t be a homogeneous Markov process with initial distribution ν(Γ) = P(X_0 ∈ Γ) and transition function P(t, x, Γ). We can calculate the probability of finding X_t in a set Γ at time t:

P(X_t ∈ Γ) = ∫_E P(t, x, Γ) ν(dx).

Thus, the initial distribution and the transition function are sufficient to characterize a homogeneous Markov process. Notice that they do not provide us with any information about the actual paths of the Markov process. The transition probability P(Γ, t | x, s) is a probability measure. Assume that it has a density for all t > s:

P(Γ, t | x, s) = ∫_Γ p(y, t | x, s) dy.

Clearly, for t = s we have P(Γ, s | x, s) = I_Γ(x). The Chapman-Kolmogorov equation becomes:

∫_Γ p(y, t | x, s) dy = ∫_R ∫_Γ p(y, t | z, u) p(z, u | x, s) dz dy,

and, since Γ ∈ B(R) is arbitrary, we obtain the equation

p(y, t | x, s) = ∫_R p(y, t | z, u) p(z, u | x, s) dz.  (4.20)

The transition probability density is a function of 4 arguments: the initial position and time x, s and the final position and time y, t.
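Equation (4.20) can be verified numerically for the Brownian transition density (4.4) — an illustrative sketch, with arbitrary choices of the times and states:

```python
import numpy as np

# Numerical check of the Chapman-Kolmogorov equation (4.20) for the
# Brownian transition density (4.4): integrating p(y,t|z,u) p(z,u|x,s)
# over the intermediate state z reproduces p(y,t|x,s).
def p(y, t, x, s):
    """Brownian transition density p(y, t | x, s), equation (4.4)."""
    return np.exp(-(x - y) ** 2 / (2 * (t - s))) / np.sqrt(2 * np.pi * (t - s))

x, s, u, t, y = 0.0, 0.0, 0.7, 1.5, 0.4    # s < u < t, arbitrary values
z = np.linspace(-12.0, 12.0, 4001)          # grid for the intermediate state
dz = z[1] - z[0]

lhs = p(y, t, x, s)
rhs = np.sum(p(y, t, z, u) * p(z, u, x, s)) * dz   # quadrature over z
print(lhs, rhs)  # the two values agree to high accuracy
```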

In words, the CK equation tells us that, for a Markov process, the transition from x, s to y, t can be done in two steps: first the system moves from x to z at some intermediate time u; then it moves from z to y at time t. In order to calculate the probability for the transition from (x, s) to (y, t) we need to sum (integrate) the transitions over all possible intermediate states z. The above description suggests that a Markov process can be described through a semigroup of operators, i.e. a one-parameter family of linear operators with the properties

P_0 = I,  P_{t+s} = P_t ∘ P_s  ∀ t, s ≥ 0.

Indeed, let P(t, x, dy) be the transition function of a homogeneous Markov process. It satisfies the CK equation (4.18):

P(t + s, x, Γ) = ∫_E P(s, x, dz) P(t, z, Γ).


Let X := C_b(E) and define the operator

(P_t f)(x) := E(f(X_t) | X_0 = x) = ∫_E f(y) P(t, x, dy).

This is a linear operator with

(P_0 f)(x) = E(f(X_0) | X_0 = x) = f(x)  ⇒  P_0 = I.

Furthermore:

(P_{t+s} f)(x) = ∫ f(y) P(t + s, x, dy)
= ∫ ∫ f(y) P(s, z, dy) P(t, x, dz)
= ∫ ( ∫ f(y) P(s, z, dy) ) P(t, x, dz)
= ∫ (P_s f)(z) P(t, x, dz)
= (P_t ∘ P_s f)(x).

Consequently: P_{t+s} = P_t ∘ P_s.

4.5 The Generator of a Markov Process

Let (E, ρ) be a metric space and let X_t be an E-valued homogeneous Markov process. Define the one-parameter family of operators P_t through

P_t f(x) = ∫ f(y) P(t, x, dy) = E[f(X_t) | X_0 = x]

for all f(x) ∈ C_b(E) (continuous bounded functions on E). Assume for simplicity that P_t : C_b(E) → C_b(E). Then the one-parameter family of operators P_t forms a semigroup of operators on C_b(E). We define by D(L) the set of all f ∈ C_b(E) such that the strong limit

Lf = lim_{t→0} (P_t f − f)/t

exists.

Definition 4.5.1. The operator L : D(L) → C_b(E) is called the infinitesimal generator of the operator semigroup P_t.


Definition 4.5.2. The operator L : C_b(E) → C_b(E) defined above is called the generator of the Markov process {X_t; t ≥ 0}.

The semigroup property and the definition of the generator of a semigroup imply that, formally at least, we can write:

P_t = exp(Lt).

Consider the function u(x, t) := (P_t f)(x). We calculate its time derivative:

∂u/∂t = d/dt (P_t f) = d/dt (e^{Lt} f) = L(e^{Lt} f) = L P_t f = Lu.

Furthermore, u(x, 0) = P_0 f(x) = f(x). Consequently, u(x, t) satisfies the initial value problem

∂u/∂t = Lu,  u(x, 0) = f(x).  (4.21)

When the semigroup P_t is the transition semigroup of a Markov process X_t, then equation (4.21) is called the backward Kolmogorov equation. It governs the evolution of an observable

u(x, t) = E(f(X_t) | X_0 = x).

Thus, given the generator of a Markov process L, we can calculate all the statistics of our process by solving the backward Kolmogorov equation. In the case where the Markov process is the solution of a stochastic differential equation, the generator is a second order elliptic operator and the backward Kolmogorov equation becomes an initial value problem for a parabolic PDE.

The space C_b(E) is natural in a probabilistic context, but other Banach spaces often arise in applications; in particular, when there is a measure µ on E, the spaces L^p(E; µ) sometimes arise. We will quite often use the space L²(E; µ), where µ is the invariant measure of our Markov process. The generator is frequently taken as the starting point for the definition of a homogeneous Markov process. Conversely, let P_t be a contraction semigroup (let X be a Banach space and T : X → X a bounded operator; then T is a contraction provided that ‖Tf‖_X ≤ ‖f‖_X ∀ f ∈ X), with D(P_t) ⊂ C_b(E), closed. Then, under mild technical hypotheses, there is an E-valued homogeneous Markov process {X_t} associated with P_t defined through

E[f(X(t)) | F_s^X] = P_{t−s} f(X(s))


for all t, s ∈ T with t ≥ s and f ∈ D(P_t).

Example 4.5.3. The Poisson process is a homogeneous Markov process.

Example 4.5.4. The one dimensional Brownian motion is a homogeneous Markov process. The transition function is the Gaussian defined in the example in Lecture 2:

P(t, x, dy) = γ_{t,x}(y) dy,  γ_{t,x}(y) = (1/√(2πt)) exp( −|x − y|² / (2t) ).

The semigroup associated to the standard Brownian motion is the heat semigroup P_t = e^{(t/2) d²/dx²}. The generator of this Markov process is (1/2) d²/dx².

Notice that the transition probability density γ_{t,x} of the one dimensional Brownian motion is the fundamental solution (Green's function) of the heat (diffusion) PDE

∂u/∂t = (1/2) ∂²u/∂x².
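The generator can also be probed numerically. For Brownian motion, P_t f(x) = E f(x + √t Z) with Z ∼ N(0, 1), so (P_t f − f)/t should approach (1/2)f''(x) as t → 0. A sketch evaluating the Gaussian expectation by Gauss-Hermite quadrature, with f = sin as an arbitrary test function:

```python
import numpy as np

# For Brownian motion, P_t f(x) = E[f(x + sqrt(t) Z)], Z ~ N(0, 1), and
# the generator is L = (1/2) d^2/dx^2, so (P_t f - f)/t -> (1/2) f''(x)
# as t -> 0.  The Gaussian expectation is computed with Gauss-Hermite
# quadrature: E[g(Z)] = (1/sqrt(pi)) sum_i w_i g(sqrt(2) z_i).
nodes, weights = np.polynomial.hermite.hermgauss(60)

def P_t(f, t, x):
    return (weights * f(x + np.sqrt(2 * t) * nodes)).sum() / np.sqrt(np.pi)

f = np.sin                     # arbitrary smooth test function; f'' = -sin
x, t = 0.7, 1e-4               # evaluation point and small time, arbitrary

approx_Lf = (P_t(f, t, x) - f(x)) / t
exact_Lf = -0.5 * np.sin(x)    # (1/2) f''(x)
print(approx_Lf, exact_Lf)
```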

4.5.1 The Adjoint Semigroup

The semigroup P_t acts on bounded measurable functions. We can also define the adjoint semigroup P_t^*, which acts on probability measures:

P_t^* µ(Γ) = ∫_R P(X_t ∈ Γ | X_0 = x) dµ(x) = ∫_R p(t, x, Γ) dµ(x).

The image of a probability measure µ under P_t^* is again a probability measure. The operators P_t and P_t^* are adjoint in the L²-sense:

∫_R P_t f(x) dµ(x) = ∫_R f(x) d(P_t^* µ)(x).  (4.22)

We can, formally at least, write

P_t^* = exp(L^* t),

where L^* is the L²-adjoint of the generator of the process:

∫ Lf h dx = ∫ f L^* h dx.

Let µ_t := P_t^* µ. This is the law of the Markov process and µ is the initial distribution. An argument similar to the one used in the derivation of the backward Kolmogorov equation (4.21) enables us to obtain an equation for the evolution of µ_t:

∂µ_t/∂t = L^* µ_t,  µ_0 = µ.

Assuming that µ_t = ρ(y, t) dy, µ = ρ_0(y) dy, this equation becomes:

∂ρ/∂t = L^* ρ,  ρ(y, 0) = ρ_0(y).  (4.23)

This is the forward Kolmogorov or Fokker-Planck equation. When the initial conditions are deterministic, X_0 = x, the initial condition becomes ρ_0 = δ(y − x). Given the initial distribution and the generator of the Markov process X_t, we can calculate the transition probability density by solving the forward Kolmogorov equation. We can then calculate all statistical quantities of this process through the formula

E(f(X_t) | X_0 = x) = ∫ f(y) ρ(t, y; x) dy.

We will derive rigorously the backward and forward Kolmogorov equations for Markov processes that are defined as solutions of stochastic differential equations later on.

We can study the evolution of a Markov process in two different ways: either through the evolution of observables (Heisenberg/Koopman)

∂(P_t f)/∂t = L(P_t f),

or through the evolution of states (Schrödinger/Frobenius-Perron)

∂(P_t^* µ)/∂t = L^*(P_t^* µ).

We can also study Markov processes at the level of trajectories. We will do this after we define the concept of a stochastic differential equation.

4.6 Ergodic Markov processes

A very important concept in the study of limit theorems for stochastic processes is that of ergodicity. This concept, in the context of Markov processes, provides us with information on the long-time behavior of a Markov semigroup.


Definition 4.6.1. A Markov process is called ergodic if the equation

P_t g = g,  g ∈ C_b(E),  ∀ t ≥ 0

has only constant solutions.

Roughly speaking, ergodicity corresponds to the case where the semigroup P_t is such that P_t − I has only constants in its null space, or, equivalently, to the case where the generator L has only constants in its null space. This follows from the definition of the generator of a Markov process.

Under some additional compactness assumptions, an ergodic Markov process has an invariant measure µ with the property that, in the case T = R^+,

lim_{t→+∞} (1/t) ∫_0^t g(X_s) ds = E g,

where E denotes the expectation with respect to µ. This is a physicist's definition of an ergodic process: time averages equal phase space averages.

Using the adjoint semigroup we can define an invariant measure as the solution of the equation

P_t^* µ = µ.

If this measure is unique, then the Markov process is ergodic. Using this, we can obtain an equation for the invariant measure in terms of the adjoint of the generator L^*, which is the generator of the semigroup P_t^*. Indeed, from the definition of the generator of a semigroup and the definition of an invariant measure, we conclude that a measure µ is invariant if and only if

L^* µ = 0

in some appropriate generalized sense ((L^* µ, f) = 0 for every bounded measurable function f). Assume that µ(dx) = ρ(x) dx. Then the invariant density satisfies the stationary Fokker-Planck equation

L^* ρ = 0.

The invariant measure (distribution) governs the long-time dynamics of the Markov process.


4.6.1 Stationary Markov Processes

If X_0 is distributed according to µ, then so is X_t for all t ≥ 0. The resulting stochastic process, with X_0 distributed in this way, is stationary. In this case the transition probability density (the solution of the Fokker-Planck equation) is independent of time: ρ(x, t) = ρ(x). Consequently, the statistics of the Markov process are independent of time.

Example 4.6.2. Consider the one-dimensional Brownian motion. The generator of this Markov process is

L = (1/2) d²/dx².

The stationary Fokker-Planck equation becomes

d²ρ/dx² = 0,  (4.24)

together with the normalization and non-negativity conditions

ρ ≥ 0,  ∫_R ρ(x) dx = 1.  (4.25)

There are no solutions to Equation (4.24), subject to the constraints (4.25).² Thus, the one dimensional Brownian motion is not an ergodic process.

Example 4.6.3. Consider a one-dimensional Brownian motion on [0, 1], with periodic boundary conditions. The generator of this Markov process L is the differential operator L = (1/2) d²/dx², equipped with periodic boundary conditions on [0, 1]. This operator is self-adjoint. The null space of both L and L^* comprises constant functions on [0, 1]. Both the backward Kolmogorov and the Fokker-Planck equation reduce to the heat equation

∂ρ/∂t = (1/2) ∂²ρ/∂x²

with periodic boundary conditions on [0, 1]. Fourier analysis shows that the solution converges to a constant at an exponential rate. See Exercise 6.

Example 4.6.4. The one dimensional Ornstein-Uhlenbeck (OU) process is a Markov process with generator

L = −αx d/dx + D d²/dx².

²The general solution to Equation (4.24) is ρ(x) = Ax + B for arbitrary constants A and B. This function is not normalizable, i.e. there do not exist constants A and B so that ∫_R (Ax + B) dx = 1.


The null space of L comprises constants in x. Hence, it is an ergodic Markov process. In order to calculate the invariant measure we need to solve the stationary Fokker-Planck equation:

L^* ρ = 0,  ρ ≥ 0,  ‖ρ‖_{L¹(R)} = 1.  (4.26)

Let us calculate the L²-adjoint of L. Assuming that f, h decay sufficiently fast at infinity, we have:

∫_R Lf h dx = ∫_R [ (−αx ∂_x f)h + (D ∂_x² f)h ] dx
= ∫_R [ f ∂_x(αx h) + f (D ∂_x² h) ] dx =: ∫_R f L^* h dx,

where

L^* h := d/dx (αx h) + D d²h/dx².

We can calculate the invariant distribution by solving equation (4.26). The invariant measure of this process is the Gaussian measure
\[
\mu(dx) = \sqrt{\frac{\alpha}{2\pi D}} \exp\left( -\frac{\alpha}{2D} x^2 \right) dx.
\]
If the initial condition of the OU process is distributed according to the invariant measure, then the OU process is a stationary Gaussian process.

Let $X_t$ be the 1d OU process and let $X_0 \sim \mathcal{N}(0, D/\alpha)$. Then $X_t$ is a mean-zero, Gaussian, second-order stationary process on $[0,\infty)$ with correlation function
\[
R(t) = \frac{D}{\alpha} e^{-\alpha |t|}
\]
and spectral density
\[
f(x) = \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.
\]
Furthermore, the OU process is the only real-valued, mean-zero, Gaussian, second-order stationary Markov process defined on $\mathbb{R}$.
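These stationarity claims can be checked numerically. The sketch below (my illustration) uses the exact Gaussian conditional law of the OU process rather than a discretization, propagates $X_0 \sim \mathcal{N}(0, D/\alpha)$, and verifies that the variance stays at $D/\alpha$ and that $\mathbb{E}(X_0 X_t)$ matches $R(t) = (D/\alpha)e^{-\alpha t}$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, D = 2.0, 1.0           # generator L = -alpha x d/dx + D d^2/dx^2
sigma2 = D / alpha            # stationary variance D/alpha

def ou_step(x, dt):
    """Exact OU transition: X_{t+dt} | X_t = x is Gaussian."""
    mean = np.exp(-alpha * dt) * x
    var = sigma2 * (1.0 - np.exp(-2.0 * alpha * dt))
    return mean + np.sqrt(var) * rng.standard_normal(x.shape)

n, dt, t_final = 200_000, 0.1, 1.0
x0 = np.sqrt(sigma2) * rng.standard_normal(n)   # X_0 ~ N(0, D/alpha)
xt = x0
for _ in range(round(t_final / dt)):
    xt = ou_step(xt, dt)

var_t = xt.var()                        # should remain D/alpha (stationarity)
corr = np.mean(x0 * xt)                 # empirical E(X_0 X_t)
R = sigma2 * np.exp(-alpha * t_final)   # R(t) = (D/alpha) e^{-alpha |t|}
```

Because the exact transition kernel is used, the only error is Monte Carlo sampling error.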

4.7 Discussion and Bibliography

The study of operator semigroups started in the late 1940s independently by Hille and Yosida. Semigroup theory was developed in the 1950s and 1960s by Feller, Dynkin and others, mostly in connection with the theory of Markov processes. Necessary and sufficient conditions for an operator $L$ to be the generator of a (contraction) semigroup are given by the Hille-Yosida theorem [13, Ch. 7].


4.8 Exercises

1. Let $X_n$ be a stochastic process with state space $S = \mathbb{Z}$. Show that it is a Markov process if and only if, for all $n$,
\[
\mathbb{P}(X_{n+1} = i_{n+1} \,|\, X_1 = i_1, \dots, X_n = i_n) = \mathbb{P}(X_{n+1} = i_{n+1} \,|\, X_n = i_n).
\]

2. Show that (4.4) is the solution of the initial value problem (4.10), as well as of the final value problem
\[
-\frac{\partial p}{\partial s} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2}, \qquad \lim_{s \to t} p(y,t|x,s) = \delta(y - x).
\]

3. Use (4.5) to show that the forward and backward Kolmogorov equations for the OU process are
\[
\frac{\partial p}{\partial t} = \frac{\partial}{\partial y}(y p) + \frac{1}{2} \frac{\partial^2 p}{\partial y^2}
\]
and
\[
-\frac{\partial p}{\partial s} = -x \frac{\partial p}{\partial x} + \frac{1}{2} \frac{\partial^2 p}{\partial x^2}.
\]

4. Let $W(t)$ be a standard one-dimensional Brownian motion, let $Y(t) = \sigma W(t)$ with $\sigma > 0$, and consider the process
\[
X(t) = \int_0^t Y(s)\, ds.
\]
Show that the joint process $\{X(t), Y(t)\}$ is Markovian and write down the generator of the process.

5. Let $Y(t) = e^{-t} W(e^{2t})$ be the stationary Ornstein-Uhlenbeck process and consider the process
\[
X(t) = \int_0^t Y(s)\, ds.
\]
Show that the joint process $\{X(t), Y(t)\}$ is Markovian and write down the generator of the process.

6. Consider a one-dimensional Brownian motion on $[0,1]$ with periodic boundary conditions. The generator of this Markov process is the differential operator $L = \frac{1}{2}\frac{d^2}{dx^2}$, equipped with periodic boundary conditions on $[0,1]$. Show that this operator is self-adjoint. Show that the null space of both $L$ and $L^*$ comprises constant functions on $[0,1]$. Conclude that this process is ergodic. Solve the corresponding Fokker-Planck equation for arbitrary initial conditions $\rho_0(x)$. Show that the solution converges to a constant at an exponential rate.

7. (a) Let $X, Y$ be mean-zero Gaussian random variables with $\mathbb{E}X^2 = \sigma_X^2$, $\mathbb{E}Y^2 = \sigma_Y^2$ and correlation coefficient $\rho$ (the correlation coefficient is $\rho = \frac{\mathbb{E}(XY)}{\sigma_X \sigma_Y}$). Show that
\[
\mathbb{E}(X|Y) = \frac{\rho \sigma_X}{\sigma_Y} Y.
\]

(b) Let $X_t$ be a mean-zero stationary Gaussian process with autocorrelation function $R(t)$. Use the previous result to show that
\[
\mathbb{E}[X_{t+s} \,|\, X_s] = \frac{R(t)}{R(0)} X(s), \qquad s, t > 0.
\]

(c) Use the previous result to show that the only stationary Gaussian Markov process with continuous autocorrelation function is the stationary OU process.

8. Show that a Gaussian process $X_t$ is a Markov process if and only if
\[
\mathbb{E}(X_{t_n} \,|\, X_{t_1} = x_1, \dots, X_{t_{n-1}} = x_{n-1}) = \mathbb{E}(X_{t_n} \,|\, X_{t_{n-1}} = x_{n-1}).
\]



Chapter 5

Diffusion Processes

5.1 Introduction

In this chapter we study a particular class of Markov processes, namely Markov processes with continuous paths. These processes are called diffusion processes, and they appear in many applications in physics, chemistry, biology and finance.

In Section 5.2 we give the definition of a diffusion process. In Section 5.3 we derive the forward and backward Kolmogorov equations for one-dimensional diffusion processes. In Section 5.4 we present the forward and backward Kolmogorov equations in arbitrary dimensions. The connection between diffusion processes and stochastic differential equations is presented in Section 5.5. Discussion and bibliographical remarks are included in Section 5.7. Exercises can be found in Section 5.8.

5.2 Definition of a Diffusion Process

A Markov process consists of three parts: a deterministic drift, a random process, and a jump process. A diffusion process is a Markov process that has continuous sample paths (trajectories); thus, it is a Markov process with no jumps. A diffusion process can be defined by specifying its first two moments:

Definition 5.2.1. A Markov process $X_t$ with transition function $P(\Gamma, t|x,s)$ is called a diffusion process if the following conditions are satisfied.


i. (Continuity.) For every $x$ and every $\varepsilon > 0$,
\[
\int_{|x-y| > \varepsilon} P(dy, t|x,s) = o(t-s), \tag{5.1}
\]
uniformly over $s < t$.

ii. (Definition of the drift coefficient.) There exists a function $a(x,s)$ such that for every $x$ and every $\varepsilon > 0$,
\[
\int_{|y-x| \leq \varepsilon} (y-x) \, P(dy, t|x,s) = a(x,s)(t-s) + o(t-s), \tag{5.2}
\]
uniformly over $s < t$.

iii. (Definition of the diffusion coefficient.) There exists a function $b(x,s)$ such that for every $x$ and every $\varepsilon > 0$,
\[
\int_{|y-x| \leq \varepsilon} (y-x)^2 \, P(dy, t|x,s) = b(x,s)(t-s) + o(t-s), \tag{5.3}
\]
uniformly over $s < t$.

Remark 5.2.2. In Definition 5.2.1 we had to truncate the domain of integration, since we did not know whether the first and second moments exist. If we assume that there exists a $\delta > 0$ such that
\[
\lim_{t \to s} \frac{1}{t-s} \int_{\mathbb{R}^d} |y-x|^{2+\delta} \, P(dy, t|x,s) = 0, \tag{5.4}
\]
then we can extend the integration over the whole of $\mathbb{R}^d$ and use expectations in the definition of the drift and the diffusion coefficient. Indeed, let $k = 0, 1, 2$ and notice that
\begin{align*}
\int_{|y-x| > \varepsilon} |y-x|^k \, P(dy, t|x,s)
&= \int_{|y-x| > \varepsilon} |y-x|^{2+\delta} |y-x|^{k-(2+\delta)} \, P(dy, t|x,s) \\
&\leq \frac{1}{\varepsilon^{2+\delta-k}} \int_{|y-x| > \varepsilon} |y-x|^{2+\delta} \, P(dy, t|x,s) \\
&\leq \frac{1}{\varepsilon^{2+\delta-k}} \int_{\mathbb{R}^d} |y-x|^{2+\delta} \, P(dy, t|x,s).
\end{align*}


Using this estimate together with (5.4) we conclude that
\[
\lim_{t \to s} \frac{1}{t-s} \int_{|y-x| > \varepsilon} |y-x|^k \, P(dy, t|x,s) = 0, \qquad k = 0, 1, 2.
\]
This implies that assumption (5.4) is sufficient for the sample paths to be continuous ($k = 0$) and for the replacement of the truncated integrals in (5.2) and (5.3) by integrals over $\mathbb{R}$ ($k = 1$ and $k = 2$, respectively). The definitions of the drift and diffusion coefficients become:
\[
\lim_{t \to s} \mathbb{E}\left( \frac{X_t - X_s}{t-s} \,\Big|\, X_s = x \right) = a(x,s) \tag{5.5}
\]
and
\[
\lim_{t \to s} \mathbb{E}\left( \frac{|X_t - X_s|^2}{t-s} \,\Big|\, X_s = x \right) = b(x,s). \tag{5.6}
\]
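Definitions (5.5) and (5.6) also suggest how to estimate the coefficients from data: average the increment and the squared increment over a small time step $h$. A minimal sketch (my own illustration, with hypothetical constant coefficients, so the one-step increment is exactly Gaussian and the $o(t-s)$ corrections are of order $h^2$):

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true = 1.5, 0.8      # hypothetical constant drift and diffusion
h, n = 1e-3, 400_000           # small increment t - s, many independent samples

# Increments X_{s+h} - X_s given X_s = x; for constant coefficients these
# are exactly N(a h, b h).
incr = a_true * h + np.sqrt(b_true * h) * rng.standard_normal(n)

a_est = incr.mean() / h        # finite-h estimate of the drift, cf. (5.5)
b_est = (incr**2).mean() / h   # finite-h estimate of the diffusion, cf. (5.6)
```

For state-dependent coefficients one would condition on $X_s$ lying in a small bin around $x$ before averaging.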

5.3 The Backward and Forward Kolmogorov Equations

In this section we show that a diffusion process is completely determined by its first two moments. In particular, we will obtain partial differential equations that govern the evolution of the conditional expectation of an arbitrary function of a diffusion process $X_t$, $u(x,s) = \mathbb{E}(f(X_t)|X_s = x)$, as well as of the transition probability density $p(y,t|x,s)$. These are the backward and forward Kolmogorov equations.

In this section we derive the backward and forward Kolmogorov equations for one-dimensional diffusion processes. The extension to multidimensional diffusion processes is presented in Section 5.4.

5.3.1 The Backward Kolmogorov Equation

Theorem 5.3.1 (Kolmogorov). Let $f(x) \in C_b(\mathbb{R})$ and let
\[
u(x,s) := \mathbb{E}(f(X_t)|X_s = x) = \int f(y) \, P(dy, t|x,s).
\]
Assume furthermore that the functions $a(x,s)$, $b(x,s)$ are continuous in both $x$ and $s$. Then $u(x,s) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$ and it solves the final value problem
\[
-\frac{\partial u}{\partial s} = a(x,s) \frac{\partial u}{\partial x} + \frac{1}{2} b(x,s) \frac{\partial^2 u}{\partial x^2}, \qquad \lim_{s \to t} u(x,s) = f(x). \tag{5.7}
\]


Proof. First we notice that the continuity assumption (5.1), together with the fact that the function $f(x)$ is bounded, imply that
\begin{align*}
u(x,s) &= \int_{\mathbb{R}} f(y) \, P(dy, t|x,s) \\
&= \int_{|y-x| \leq \varepsilon} f(y) \, P(dy, t|x,s) + \int_{|y-x| > \varepsilon} f(y) \, P(dy, t|x,s) \\
&\leq \int_{|y-x| \leq \varepsilon} f(y) \, P(dy, t|x,s) + \|f\|_{L^\infty} \int_{|y-x| > \varepsilon} P(dy, t|x,s) \\
&= \int_{|y-x| \leq \varepsilon} f(y) \, P(dy, t|x,s) + o(t-s).
\end{align*}

We add and subtract the final condition $f(x)$ and use the previous calculation to obtain:
\begin{align*}
u(x,s) = \int_{\mathbb{R}} f(y) \, P(dy, t|x,s)
&= f(x) + \int_{\mathbb{R}} (f(y) - f(x)) \, P(dy, t|x,s) \\
&= f(x) + \int_{|y-x| \leq \varepsilon} (f(y) - f(x)) \, P(dy, t|x,s) + \int_{|y-x| > \varepsilon} (f(y) - f(x)) \, P(dy, t|x,s) \\
&= f(x) + \int_{|y-x| \leq \varepsilon} (f(y) - f(x)) \, P(dy, t|x,s) + o(t-s).
\end{align*}
Now the final condition follows from the fact that $f(x) \in C_b(\mathbb{R})$ and the arbitrariness of $\varepsilon$.

Now we show that $u(x,s)$ solves the backward Kolmogorov equation. We use the Chapman-Kolmogorov equation (4.15) to obtain
\begin{align}
u(x,\sigma) &= \int_{\mathbb{R}} f(z) \, P(dz, t|x,\sigma) \tag{5.8} \\
&= \int_{\mathbb{R}} \int_{\mathbb{R}} f(z) \, P(dz, t|y,\rho) \, P(dy, \rho|x,\sigma) \nonumber \\
&= \int_{\mathbb{R}} u(y,\rho) \, P(dy, \rho|x,\sigma). \tag{5.9}
\end{align}

The Taylor series expansion of the function $u(x,s)$ gives
\[
u(z,\rho) - u(x,\rho) = \frac{\partial u(x,\rho)}{\partial x}(z-x) + \frac{1}{2} \frac{\partial^2 u(x,\rho)}{\partial x^2}(z-x)^2 (1 + \alpha_\varepsilon), \qquad |z-x| \leq \varepsilon, \tag{5.10}
\]
where
\[
\alpha_\varepsilon = \sup_{\rho, \, |z-x| \leq \varepsilon} \left| \frac{\partial^2 u(x,\rho)}{\partial x^2} - \frac{\partial^2 u(z,\rho)}{\partial x^2} \right|.
\]


Notice that, since $u(x,s)$ is twice continuously differentiable in $x$, $\lim_{\varepsilon \to 0} \alpha_\varepsilon = 0$.

We now combine (5.9) with (5.10) to calculate
\begin{align*}
\frac{u(x,s) - u(x,s+h)}{h}
&= \frac{1}{h} \left( \int_{\mathbb{R}} P(dy, s+h|x,s) \, u(y,s+h) - u(x,s+h) \right) \\
&= \frac{1}{h} \int_{\mathbb{R}} P(dy, s+h|x,s) \left( u(y,s+h) - u(x,s+h) \right) \\
&= \frac{1}{h} \int_{|x-y| < \varepsilon} P(dy, s+h|x,s) \left( u(y,s+h) - u(x,s+h) \right) + o(1) \\
&= \frac{\partial u}{\partial x}(x,s+h) \, \frac{1}{h} \int_{|x-y| < \varepsilon} (y-x) \, P(dy, s+h|x,s) \\
&\quad + \frac{1}{2} \frac{\partial^2 u}{\partial x^2}(x,s+h) \, \frac{1}{h} \int_{|x-y| < \varepsilon} (y-x)^2 \, P(dy, s+h|x,s) \, (1 + \alpha_\varepsilon) + o(1) \\
&= a(x,s) \frac{\partial u}{\partial x}(x,s+h) + \frac{1}{2} b(x,s) \frac{\partial^2 u}{\partial x^2}(x,s+h)(1 + \alpha_\varepsilon) + o(1).
\end{align*}
Equation (5.7) follows by taking the limits $\varepsilon \to 0$, $h \to 0$.

Assume now that the transition function has a density $p(y,t|x,s)$. In this case the formula for $u(x,s)$ becomes
\[
u(x,s) = \int_{\mathbb{R}} f(y) \, p(y,t|x,s) \, dy.
\]

Substituting this into the backward Kolmogorov equation we obtain
\[
\int_{\mathbb{R}} f(y) \left( \frac{\partial p(y,t|x,s)}{\partial s} + \mathcal{A}_{s,x} \, p(y,t|x,s) \right) dy = 0, \tag{5.11}
\]
where
\[
\mathcal{A}_{s,x} := a(x,s) \frac{\partial}{\partial x} + \frac{1}{2} b(x,s) \frac{\partial^2}{\partial x^2}.
\]

Since (5.11) is valid for arbitrary functions $f(y)$, we obtain a partial differential equation for the transition probability density:
\[
-\frac{\partial p(y,t|x,s)}{\partial s} = a(x,s) \frac{\partial p(y,t|x,s)}{\partial x} + \frac{1}{2} b(x,s) \frac{\partial^2 p(y,t|x,s)}{\partial x^2}. \tag{5.12}
\]
Notice that the differentiation is with respect to the "backward" variables $x, s$. We will obtain an equation with respect to the "forward" variables $y, t$ in the next section.
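For standard Brownian motion ($a = 0$, $b = 1$) the transition density is the Gaussian heat kernel, and the backward equation can be checked directly by finite differences. A small sketch of that sanity check (my illustration; the step sizes are chosen only to balance truncation against roundoff error):

```python
import numpy as np

def p(y, t, x, s):
    """Transition density of standard Brownian motion, p(y, t | x, s)."""
    return np.exp(-(y - x)**2 / (2.0 * (t - s))) / np.sqrt(2.0 * np.pi * (t - s))

y, t, x, s = 0.7, 2.0, 0.1, 0.5

# Backward equation (5.12) with a = 0, b = 1:  -dp/ds = (1/2) d^2 p / dx^2.
hs, hx = 1e-6, 1e-4
lhs = -(p(y, t, x, s + hs) - p(y, t, x, s - hs)) / (2.0 * hs)
rhs = 0.5 * (p(y, t, x + hx, s) - 2.0 * p(y, t, x, s) + p(y, t, x - hx, s)) / hx**2
```

The two sides agree to many digits, and both are nonzero, so the check is not vacuous.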


5.3.2 The Forward Kolmogorov Equation

In this section we obtain the forward Kolmogorov equation; in the physics literature it is called the Fokker-Planck equation. We assume that the transition function has a density with respect to Lebesgue measure:
\[
P(\Gamma, t|x,s) = \int_{\Gamma} p(y,t|x,s) \, dy.
\]

Theorem 5.3.2 (Kolmogorov). Assume that conditions (5.1), (5.2) and (5.3) are satisfied, and that $p(y,t|\cdot,\cdot)$, $a(y,t)$, $b(y,t) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$. Then the transition probability density satisfies the equation
\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}(a(t,y) \, p) + \frac{1}{2} \frac{\partial^2}{\partial y^2}(b(t,y) \, p), \qquad \lim_{t \to s} p(y,t|x,s) = \delta(x - y). \tag{5.13}
\]

Proof. Fix a function $f(y) \in C_0^2(\mathbb{R})$. An argument similar to the one used in the proof of the backward Kolmogorov equation gives
\[
\lim_{h \to 0} \frac{1}{h} \left( \int f(y) \, p(y, s+h|x,s) \, dy - f(x) \right) = a(x,s) f_x(x) + \frac{1}{2} b(x,s) f_{xx}(x), \tag{5.14}
\]
where subscripts denote differentiation with respect to $x$. On the other hand,
\begin{align*}
\int f(y) \frac{\partial}{\partial t} p(y,t|x,s) \, dy
&= \frac{\partial}{\partial t} \int f(y) \, p(y,t|x,s) \, dy \\
&= \lim_{h \to 0} \frac{1}{h} \int \left( p(y,t+h|x,s) - p(y,t|x,s) \right) f(y) \, dy \\
&= \lim_{h \to 0} \frac{1}{h} \left( \int p(y,t+h|x,s) f(y) \, dy - \int p(z,t|x,s) f(z) \, dz \right) \\
&= \lim_{h \to 0} \frac{1}{h} \left( \int \int p(y,t+h|z,t) \, p(z,t|x,s) f(y) \, dy \, dz - \int p(z,t|x,s) f(z) \, dz \right) \\
&= \lim_{h \to 0} \int p(z,t|x,s) \, \frac{1}{h} \left( \int p(y,t+h|z,t) f(y) \, dy - f(z) \right) dz \\
&= \int p(z,t|x,s) \left( a(z,t) f_z(z) + \frac{1}{2} b(z,t) f_{zz}(z) \right) dz \\
&= \int \left( -\frac{\partial}{\partial z}\big(a(z,t) \, p(z,t|x,s)\big) + \frac{1}{2} \frac{\partial^2}{\partial z^2}\big(b(z,t) \, p(z,t|x,s)\big) \right) f(z) \, dz.
\end{align*}
In the above calculation we used the Chapman-Kolmogorov equation. We have also performed two integrations by parts and used the fact that, since the test function $f$ has compact support, the boundary terms vanish.


Since the above equation is valid for every test function $f(y)$, the forward Kolmogorov equation follows.

Assume now that the initial distribution of $X_t$ is $\rho_0(x)$, and set $s = 0$ (the initial time) in (5.13). Define
\[
p(y,t) := \int p(y,t|x,0) \, \rho_0(x) \, dx. \tag{5.15}
\]

We multiply the forward Kolmogorov equation (5.13) by $\rho_0(x)$ and integrate with respect to $x$ to obtain the equation
\[
\frac{\partial p(y,t)}{\partial t} = -\frac{\partial}{\partial y}\big(a(y,t) \, p(y,t)\big) + \frac{1}{2} \frac{\partial^2}{\partial y^2}\big(b(y,t) \, p(y,t)\big), \tag{5.16}
\]
together with the initial condition
\[
p(y,0) = \rho_0(y). \tag{5.17}
\]

The solution of equation (5.16) provides us with the probability that the diffusion process $X_t$, which initially was distributed according to the probability density $\rho_0(x)$, is equal to $y$ at time $t$. Alternatively, we can think of the solution to (5.13) as the Green's function for the PDE (5.16). Using (5.16) we can calculate the expectation of an arbitrary function of the diffusion process $X_t$:
\[
\mathbb{E}(f(X_t)) = \int \int f(y) \, p(y,t|x,0) \, p(x,0) \, dx \, dy = \int f(y) \, p(y,t) \, dy,
\]
where $p(y,t)$ is the solution of (5.16). Quite often we need to calculate joint probability densities, for example the probability that $X_{t_1} = x_1$ and $X_{t_2} = x_2$. From the properties of conditional expectation we have that
\begin{align*}
p(x_1, t_1, x_2, t_2) &= \mathbb{P}(X_{t_1} = x_1, X_{t_2} = x_2) \\
&= \mathbb{P}(X_{t_1} = x_1 | X_{t_2} = x_2) \, \mathbb{P}(X_{t_2} = x_2) \\
&= p(x_1, t_1 | x_2, t_2) \, p(x_2, t_2).
\end{align*}

Using the joint probability density we can calculate the statistics of a function of the diffusion process $X_t$ at times $t$ and $s$:
\[
\mathbb{E}(f(X_t, X_s)) = \int \int f(y,x) \, p(y,t|x,s) \, p(x,s) \, dx \, dy. \tag{5.18}
\]


The autocorrelation function at times $t$ and $s$ is given by
\[
\mathbb{E}(X_t X_s) = \int \int yx \, p(y,t|x,s) \, p(x,s) \, dx \, dy.
\]
In particular,
\[
\mathbb{E}(X_t X_0) = \int \int yx \, p(y,t|x,0) \, p(x,0) \, dx \, dy.
\]

5.4 Multidimensional Diffusion Processes

Let $X_t$ be a diffusion process in $\mathbb{R}^d$. The drift and diffusion coefficients of a diffusion process in $\mathbb{R}^d$ are defined as
\[
\lim_{t \to s} \frac{1}{t-s} \int_{|y-x| < \varepsilon} (y-x) \, P(dy, t|x,s) = a(x,s)
\]
and
\[
\lim_{t \to s} \frac{1}{t-s} \int_{|y-x| < \varepsilon} (y-x) \otimes (y-x) \, P(dy, t|x,s) = b(x,s).
\]
The drift coefficient $a(x,s)$ is a $d$-dimensional vector field and the diffusion coefficient $b(x,s)$ is a $d \times d$ symmetric matrix (second-order tensor). The generator of a $d$-dimensional diffusion process is
\[
L = a(x,s) \cdot \nabla + \frac{1}{2} b(x,s) : \nabla \nabla
= \sum_{j=1}^d a_j(x,s) \frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i,j=1}^d b_{ij}(x,s) \frac{\partial^2}{\partial x_i \partial x_j}.
\]

Exercise 5.4.1. Derive rigorously the forward and backward Kolmogorov equations in arbitrary dimensions.

Assuming that the first and second moments of the multidimensional diffusion process exist, we can write the formulas for the drift vector and diffusion matrix as
\[
\lim_{t \to s} \mathbb{E}\left( \frac{X_t - X_s}{t-s} \,\Big|\, X_s = x \right) = a(x,s) \tag{5.19}
\]
and
\[
\lim_{t \to s} \mathbb{E}\left( \frac{(X_t - X_s) \otimes (X_t - X_s)}{t-s} \,\Big|\, X_s = x \right) = b(x,s). \tag{5.20}
\]
Notice that from the above definition it follows that the diffusion matrix is symmetric and nonnegative definite.


5.5 Connection with Stochastic Differential Equations

Notice also that the continuity condition can be written in the form
\[
\mathbb{P}(|X_t - X_s| > \varepsilon \,|\, X_s = x) = o(t-s).
\]
Now it becomes clear that this condition implies that the probability of large changes in $X_t$ over short time intervals is small. Notice, on the other hand, that the above condition implies that the sample paths of a diffusion process are not differentiable: if they were, then the right-hand side of the above equation would have to be $0$ when $t - s \ll 1$. The sample paths of a diffusion process have the regularity of Brownian paths. A Markovian process cannot be differentiable: we can define the derivative of a sample path only for processes for which the past and the future are not statistically independent when conditioned on the present.

Let us denote the expectation conditioned on $X_s = x$ by $\mathbb{E}^{s,x}$. Notice that the definitions of the drift and diffusion coefficients (5.5) and (5.6) can be written in the form
\[
\mathbb{E}^{s,x}(X_t - X_s) = a(x,s)(t-s) + o(t-s)
\]
and
\[
\mathbb{E}^{s,x}\big( (X_t - X_s) \otimes (X_t - X_s) \big) = b(x,s)(t-s) + o(t-s).
\]
Consequently, the drift coefficient defines the mean velocity vector for the stochastic process $X_t$, whereas the diffusion coefficient (tensor) is a measure of the local magnitude of fluctuations of $X_t - X_s$ about the mean value. Hence, we can write locally:
\[
X_t - X_s \approx a(s, X_s)(t-s) + \sigma(s, X_s) \, \xi_t,
\]
where $b = \sigma \sigma^T$ and $\xi_t$ is a mean-zero Gaussian process with
\[
\mathbb{E}^{s,x}(\xi_t \otimes \xi_s) = (t-s) I.
\]

Since $W_t - W_s \sim \mathcal{N}(0, (t-s)I)$, we conclude that we can write locally
\[
\Delta X_t \approx a(s, X_s) \, \Delta t + \sigma(s, X_s) \, \Delta W_t,
\]
or, replacing the differences by differentials,
\[
dX_t = a(t, X_t) \, dt + \sigma(t, X_t) \, dW_t.
\]
Hence, the sample paths of a diffusion process are governed by a stochastic differential equation (SDE).
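The local relation $\Delta X_t \approx a \, \Delta t + \sigma \, \Delta W_t$ is exactly what the Euler-Maruyama scheme iterates to sample approximate paths of the SDE. A minimal sketch (illustrative; tested on the OU drift $a(t,x) = -x$, $\sigma = \sqrt{2}$, for which the mean and variance at time $t$ are known in closed form):

```python
import numpy as np

def euler_maruyama(a, sigma, x0, t_final, n_steps, rng):
    """Integrate dX = a(t, X) dt + sigma(t, X) dW with the Euler-Maruyama scheme."""
    dt = t_final / n_steps
    x = np.array(x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + a(t, x) * dt + sigma(t, x) * dw
        t += dt
    return x

rng = np.random.default_rng(3)
x = euler_maruyama(a=lambda t, x: -x,
                   sigma=lambda t, x: np.sqrt(2.0),
                   x0=np.ones(100_000), t_final=1.0, n_steps=1000, rng=rng)

mean, var = x.mean(), x.var()
# Exact values for dX = -X dt + sqrt(2) dW, X_0 = 1:
mean_exact = np.exp(-1.0)          # e^{-t} X_0
var_exact = 1.0 - np.exp(-2.0)     # (sigma^2 / 2)(1 - e^{-2t})
```

The weak error of the scheme is $O(\Delta t)$, so with 1000 steps the statistics match the exact values to well within Monte Carlo noise.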


5.6 Examples of Diffusion Processes

i. The 1-dimensional Brownian motion starting at $x$ is a diffusion process with generator
\[
L = \frac{1}{2} \frac{d^2}{dx^2}.
\]
The drift and diffusion coefficients are, respectively, $a(x) = 0$ and $b(x) = 1$. The corresponding stochastic differential equation is
\[
dX_t = dW_t, \qquad X_0 = x.
\]
The solution of this SDE is
\[
X_t = x + W_t.
\]

ii. The 1-dimensional Ornstein-Uhlenbeck process is a diffusion process with drift and diffusion coefficients, respectively, $a(x) = -\alpha x$ and $b(x) = D$. The generator of this process is
\[
L = -\alpha x \frac{d}{dx} + \frac{D}{2} \frac{d^2}{dx^2}.
\]
The corresponding SDE is
\[
dX_t = -\alpha X_t \, dt + \sqrt{D} \, dW_t.
\]
The solution to this equation is
\[
X_t = e^{-\alpha t} X_0 + \sqrt{D} \int_0^t e^{-\alpha(t-s)} \, dW_s.
\]
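As a quick consistency check (not carried out in the notes), the Itô isometry applied to the stochastic integral in the explicit solution gives the variance of the OU process:

```latex
\operatorname{Var}(X_t)
  = e^{-2\alpha t}\operatorname{Var}(X_0)
    + D \int_0^t e^{-2\alpha (t-s)}\, ds
  = e^{-2\alpha t}\operatorname{Var}(X_0)
    + \frac{D}{2\alpha}\left(1 - e^{-2\alpha t}\right)
  \;\xrightarrow{\,t \to \infty\,}\; \frac{D}{2\alpha}.
```

The limiting value $D/(2\alpha)$ is the stationary variance in this section's convention $b(x) = D$; note that Example 4.6.4, where the generator carries $D$ rather than $D/2$, correspondingly has stationary variance $D/\alpha$.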

5.7 Discussion and Bibliography

The argument used in the derivation of the forward and backward Kolmogorov equations goes back to Kolmogorov's original work. More material on diffusion processes can be found in [26], [32].


5.8 Exercises

1. Prove equation (5.14).

2. Derive the initial value problem (5.16), (5.17).

3. Derive rigorously the backward and forward Kolmogorov equations in arbitrary dimensions.


Chapter 6

The Fokker-Planck Equation

6.1 Introduction

In the previous chapter we derived the backward and forward (Fokker-Planck) Kolmogorov equations and we showed that all statistical properties of a diffusion process can be calculated from the solution of the Fokker-Planck equation.$^1$ In this long chapter we study various properties of this equation, such as existence and uniqueness of solutions, long-time asymptotics, boundary conditions, and spectral properties of the Fokker-Planck operator. We also study in some detail various examples of diffusion processes and of the associated Fokker-Planck equation. We will restrict attention to time-homogeneous diffusion processes, for which the drift and diffusion coefficients do not depend on time.

In Section 6.2 we study various basic properties of the Fokker-Planck equation, including existence and uniqueness of solutions, the formulation of the equation as a conservation law, and boundary conditions. In Section 6.3 we present some examples of diffusion processes and use the corresponding Fokker-Planck equation in order to calculate various quantities of interest such as moments. In Section 6.4 we study the multidimensional Ornstein-Uhlenbeck process and the spectral properties of the corresponding Fokker-Planck operator. In Section 6.5 we study stochastic processes whose drift is given by the gradient of a scalar function, gradient flows. In Section 6.7 we solve the Fokker-Planck equation for a gradient SDE using eigenfunction expansions, and we show how the eigenvalue problem for the Fokker-Planck operator can be reduced to the eigenfunction expansion for a Schrödinger operator. In Section 8.2 we study the Langevin equation and the associated Fokker-Planck equation. In Section 8.3 we calculate the eigenvalues and eigenfunctions of the Fokker-Planck operator for the Langevin equation in a harmonic potential. Discussion and bibliographical remarks are included in Section 6.8. Exercises can be found in Section 6.9.

$^1$In this chapter we call the equation the Fokker-Planck equation, which is more customary in the physics literature, rather than the forward Kolmogorov equation, which is more customary in the mathematics literature.

6.2 Basic Properties of the FP Equation

6.2.1 Existence and Uniqueness of Solutions

Consider a homogeneous diffusion process on $\mathbb{R}^d$ with drift vector $a(x)$ and diffusion matrix $b(x)$. The Fokker-Planck equation is
\[
\frac{\partial p}{\partial t} = -\sum_{j=1}^d \frac{\partial}{\partial x_j}\big(a_j(x) \, p\big) + \frac{1}{2} \sum_{i,j=1}^d \frac{\partial^2}{\partial x_i \partial x_j}\big(b_{ij}(x) \, p\big), \qquad t > 0, \; x \in \mathbb{R}^d, \tag{6.1a}
\]
\[
p(x,0) = f(x), \qquad x \in \mathbb{R}^d. \tag{6.1b}
\]
Since $f(x)$ is the probability density of the initial condition (which is a random variable), we have that
\[
f(x) \geq 0, \qquad \int_{\mathbb{R}^d} f(x) \, dx = 1.
\]

We can also write the equation in non-divergence form:
\[
\frac{\partial p}{\partial t} = \sum_{j=1}^d \tilde{a}_j(x) \frac{\partial p}{\partial x_j} + \frac{1}{2} \sum_{i,j=1}^d b_{ij}(x) \frac{\partial^2 p}{\partial x_i \partial x_j} + c(x) \, p, \qquad t > 0, \; x \in \mathbb{R}^d, \tag{6.2a}
\]
\[
p(x,0) = f(x), \qquad x \in \mathbb{R}^d, \tag{6.2b}
\]
where
\[
\tilde{a}_i(x) = -a_i(x) + \sum_{j=1}^d \frac{\partial b_{ij}}{\partial x_j}, \qquad
c(x) = \frac{1}{2} \sum_{i,j=1}^d \frac{\partial^2 b_{ij}}{\partial x_i \partial x_j} - \sum_{i=1}^d \frac{\partial a_i}{\partial x_i}.
\]

By definition (see equation (5.20)), the diffusion matrix is always symmetric and nonnegative. We will assume that it is actually uniformly positive definite, i.e. we will impose the uniform ellipticity condition:
\[
\sum_{i,j=1}^d b_{ij}(x) \xi_i \xi_j \geq \alpha \|\xi\|^2, \qquad \forall \, \xi \in \mathbb{R}^d. \tag{6.3}
\]


Furthermore, we will assume that the coefficients $\tilde{a}, b, c$ are smooth and that they satisfy the growth conditions
\[
\|b(x)\| \leq M, \qquad \|\tilde{a}(x)\| \leq M(1 + \|x\|), \qquad |c(x)| \leq M(1 + \|x\|^2). \tag{6.4}
\]

Definition 6.2.1. We will call a solution to the Cauchy problem for the Fokker-Planck equation (6.2) a classical solution if:

i. $u \in C^{2,1}(\mathbb{R}^d, \mathbb{R}^+)$.

ii. For every $T > 0$ there exists a $c > 0$ such that
\[
\|u(t, x)\|_{L^\infty(0,T)} \leq c e^{\alpha \|x\|^2}.
\]

iii. $\lim_{t \to 0} u(t,x) = f(x)$.

It is a standard result in the theory of parabolic partial differential equations that, under the regularity and uniform ellipticity assumptions, the Fokker-Planck equation has a unique smooth solution. Furthermore, the solution can be estimated in terms of an appropriate heat kernel (i.e. the solution of the heat equation on $\mathbb{R}^d$).

Theorem 6.2.2. Assume that conditions (6.3) and (6.4) are satisfied, and assume that $|f| \leq c e^{\alpha \|x\|^2}$. Then there exists a unique classical solution to the Cauchy problem for the Fokker-Planck equation. Furthermore, there exist positive constants $K, \delta$ so that
\[
|p|, \; |p_t|, \; \|\nabla p\|, \; \|D^2 p\| \leq K t^{-(n+2)/2} \exp\left( -\frac{1}{2t} \delta \|x\|^2 \right). \tag{6.5}
\]

Notice that from the estimates (6.5) it follows that all moments of a uniformly elliptic diffusion process exist. In particular, we can multiply the Fokker-Planck equation by monomials $x^n$, integrate over $\mathbb{R}^d$, and integrate by parts. No boundary terms will appear, in view of the estimate (6.5).

Remark 6.2.3. The solution of the Fokker-Planck equation is nonnegative for all times, provided that the initial distribution is nonnegative. This follows from the maximum principle for parabolic PDEs.


6.2.2 The FP equation as a conservation law

The Fokker-Planck equation is in fact a conservation law: it expresses the law of conservation of probability. To see this we define the probability current to be the vector whose $i$th component is
\[
J_i := a_i(x) \, p - \frac{1}{2} \sum_{j=1}^d \frac{\partial}{\partial x_j}\big( b_{ij}(x) \, p \big). \tag{6.6}
\]
We use the probability current to write the Fokker-Planck equation as a continuity equation:
\[
\frac{\partial p}{\partial t} + \nabla \cdot J = 0.
\]

Integrating the FP equation over $\mathbb{R}^d$ and integrating by parts on the right-hand side of the equation we obtain
\[
\frac{d}{dt} \int_{\mathbb{R}^d} p(x,t) \, dx = 0.
\]
Consequently:
\[
\|p(\cdot, t)\|_{L^1(\mathbb{R}^d)} = \|p(\cdot, 0)\|_{L^1(\mathbb{R}^d)} = 1. \tag{6.7}
\]
Hence, the total probability is conserved, as expected. Equation (6.7) simply means that
\[
\mathbb{P}(X_t \in \mathbb{R}^d) = 1, \qquad t \geq 0.
\]
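The continuity-equation form is also the right way to discretize the Fokker-Planck equation: a finite-volume update that moves probability between neighboring cells through face fluxes conserves $\sum_i p_i \, \Delta x$ exactly, mirroring (6.7). A sketch (my illustration, with a hypothetical drift field on a periodic grid):

```python
import numpy as np

# Conservative (finite-volume) update for dp/dt = -d/dx(a p) + (1/2) d^2/dx^2 (b p)
# on a periodic grid: p_i <- p_i - (dt/dx) * (J_{i+1/2} - J_{i-1/2}).
n, L = 200, 2.0 * np.pi
dx = L / n
x = np.arange(n) * dx
a = -np.sin(x)                 # hypothetical drift field
b = 0.5                        # constant diffusion coefficient

p = np.exp(-((x - np.pi) ** 2))
p /= p.sum() * dx              # normalize: total probability = 1

dt = 0.2 * dx**2 / b           # stable explicit step for the diffusive part
for _ in range(500):
    ap, q = a * p, b * p
    # face flux J_{i+1/2}: centered advection minus diffusion of q = b p
    flux = 0.5 * (ap + np.roll(ap, -1)) - 0.5 * (np.roll(q, -1) - q) / dx
    p = p - (dt / dx) * (flux - np.roll(flux, 1))

mass = p.sum() * dx            # conserved up to roundoff, as in (6.7)
```

Conservation here is exact by telescoping of the face fluxes, independent of the accuracy of the scheme.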

6.2.3 Boundary conditions for the Fokker–Planck equation

When studying a diffusion process that can take values on the whole of $\mathbb{R}^d$, we study the pure initial value (Cauchy) problem for the Fokker-Planck equation, equation (6.1). The boundary condition was that the solution decays sufficiently fast at infinity. For ergodic diffusion processes this is equivalent to requiring that the solution of the backward Kolmogorov equation is an element of $L^2(\mu)$, where $\mu$ is the invariant measure of the process. There are many applications where it is important to study stochastic processes in bounded domains. In this case it is necessary to specify the value of the stochastic process (or, equivalently, of the solution to the Fokker-Planck equation) on the boundary.


To understand the type of boundary conditions that we can impose on the Fokker-Planck equation, let us consider the example of a random walk on the domain $\{0, 1, \dots, N\}$.$^2$ When the random walker reaches either the left or the right boundary we can either set

i. $X_0 = 0$ or $X_N = 0$, which means that the particle gets absorbed at the boundary;

ii. $X_0 = X_1$ or $X_N = X_{N-1}$, which means that the particle is reflected at the boundary;

iii. $X_0 = X_N$, which means that the particle is moving on a circle (i.e., we identify the left and right boundaries).

Hence, we can have absorbing, reflecting or periodic boundary conditions.

Consider the Fokker-Planck equation posed in $\Omega \subset \mathbb{R}^d$, where $\Omega$ is a bounded domain with smooth boundary. Let $J$ denote the probability current and let $n$ be the unit outward-pointing normal vector to the surface. The above boundary conditions become:

i. The transition probability density vanishes on an absorbing boundary:
\[
p(x,t) = 0 \quad \text{on } \partial\Omega.
\]

ii. There is no net flow of probability on a reflecting boundary:
\[
n \cdot J(x,t) = 0 \quad \text{on } \partial\Omega.
\]

iii. The transition probability density is a periodic function in the case of periodic boundary conditions.

Notice that, using the terminology customary in PDE theory, absorbing boundary conditions correspond to Dirichlet boundary conditions and reflecting boundary conditions correspond to Neumann boundary conditions. Of course, one can consider more complicated, mixed boundary conditions.

$^2$Of course, the random walk is not a diffusion process. However, as we have already seen, Brownian motion can be defined as the limit of an appropriately rescaled random walk. A similar construction exists for more general diffusion processes.


Consider now a diffusion process in one dimension on the interval $[0, L]$. The boundary conditions are
\begin{align*}
p(0,t) = p(L,t) = 0 & \quad \text{(absorbing)}, \\
J(0,t) = J(L,t) = 0 & \quad \text{(reflecting)}, \\
p(0,t) = p(L,t) & \quad \text{(periodic)},
\end{align*}
where the probability current is defined in (6.6). An example of mixed boundary conditions would be absorbing boundary conditions at the left end and reflecting boundary conditions at the right end:
\[
p(0,t) = J(L,t) = 0.
\]
There is a complete classification of boundary conditions in one dimension, the Feller classification: the boundary conditions can be regular, exit, entrance or natural.

6.3 Examples of Diffusion Processes

6.3.1 Brownian Motion

Brownian Motion on R

Set $a(y,t) \equiv 0$, $b(y,t) \equiv 2D > 0$. This diffusion process is the Brownian motion with diffusion coefficient $D$. Let us calculate the transition probability density of this process, assuming that the Brownian particle is at $y$ at time $s$. The Fokker-Planck equation for the transition probability density $p(x,t|y,s)$ is
\[
\frac{\partial p}{\partial t} = D \frac{\partial^2 p}{\partial x^2}, \qquad p(x,s|y,s) = \delta(x - y). \tag{6.8}
\]
The solution to this equation is the Green's function (fundamental solution) of the heat equation:
\[
p(x,t|y,s) = \frac{1}{\sqrt{4\pi D (t-s)}} \exp\left( -\frac{(x-y)^2}{4D(t-s)} \right). \tag{6.9}
\]

Notice that, using the Fokker-Planck equation for the Brownian motion, we can immediately show that the mean squared displacement grows linearly in time. Assuming that the Brownian particle is at the origin at time $t = 0$, we get
\begin{align*}
\frac{d}{dt} \mathbb{E} W_t^2 &= \frac{d}{dt} \int_{\mathbb{R}} x^2 \, p(x,t|0,0) \, dx \\
&= D \int_{\mathbb{R}} x^2 \, \frac{\partial^2 p(x,t|0,0)}{\partial x^2} \, dx \\
&= 2D \int_{\mathbb{R}} p(x,t|0,0) \, dx = 2D,
\end{align*}
where we performed two integrations by parts and used the fact that, in view of (6.9), no boundary terms remain. From this calculation we conclude that
\[
\mathbb{E} W_t^2 = 2Dt.
\]
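The same growth law is immediate in simulation: sum independent Gaussian increments of variance $2D\,\Delta t$ and check the empirical mean squared displacement. A sketch (my illustration, with arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(4)
D, t, n_paths, n_steps = 0.7, 2.0, 200_000, 50
dt = t / n_steps

# Brownian motion with b = 2D: increments are N(0, 2 D dt)
increments = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=(n_paths, n_steps))
w_t = increments.sum(axis=1)        # W_t for each path
msd = np.mean(w_t**2)               # empirical E W_t^2, approximately 2 D t
```

With $2Dt = 2.8$, the Monte Carlo estimate agrees to within sampling error of order $2Dt\sqrt{2/n_{\text{paths}}}$.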

Assume now that the initial condition $W_0$ of the Brownian particle is a random variable with distribution $\rho_0(x)$. To calculate the probability density function (distribution function) of the Brownian particle we need to solve the Fokker-Planck equation with initial condition $\rho_0(x)$. In other words, we need to average the probability density function $p(x,t|y,0)$ over all initial realizations of the Brownian particle. The solution of the Fokker-Planck equation, the distribution function, is
\[
p(x,t) = \int p(x,t|y,0) \, \rho_0(y) \, dy. \tag{6.10}
\]
Notice that the transition probability density depends on $x$ and $y$ only through their difference; thus, we can write $p(x,t|y,0) = p(x-y, t)$. From (6.10) we see that the distribution function is given by the convolution between the transition probability density and the initial condition, as we know from the theory of partial differential equations:
\[
p(x,t) = \int p(x-y, t) \, \rho_0(y) \, dy =: p \star \rho_0.
\]

Brownian motion with absorbing boundary conditions

We can also consider Brownian motion in a bounded domain, with either absorbing, reflecting or periodic boundary conditions. Set D = 1/2 and consider the Fokker-Planck equation (6.8) on [0,1] with absorbing boundary conditions:
\[
\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}, \qquad p(0,t) = p(1,t) = 0. \tag{6.11}
\]


We look for a solution to this equation in the form of a sine Fourier series:
\[
p(x,t) = \sum_{n=1}^{\infty} p_n(t)\sin(n\pi x). \tag{6.12}
\]
Notice that the boundary conditions are automatically satisfied. The initial condition is
\[
p(x,0) = \delta(x - x_0),
\]
where we have assumed that W_0 = x_0. The Fourier coefficients of the initial condition are
\[
p_n(0) = 2\int_0^1 \delta(x - x_0)\sin(n\pi x)\,dx = 2\sin(n\pi x_0).
\]

We substitute the expansion (6.12) into (6.11) and use the orthogonality properties of the Fourier basis to obtain the equations
\[
\dot{p}_n = -\frac{n^2\pi^2}{2}p_n, \qquad n = 1,2,\dots
\]
The solution of this equation is
\[
p_n(t) = p_n(0)\,e^{-\frac{n^2\pi^2}{2}t}.
\]

Consequently, the transition probability density for the Brownian motion on [0,1] with absorbing boundary conditions is
\[
p(x,t|x_0,0) = 2\sum_{n=1}^{\infty} e^{-\frac{n^2\pi^2}{2}t}\sin(n\pi x_0)\sin(n\pi x).
\]
Notice that
\[
\lim_{t\to\infty} p(x,t|x_0,0) = 0.
\]
This is not surprising, since all Brownian particles will eventually get absorbed at the boundary.
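The absorption of probability at the boundary can be observed by truncating the series and integrating numerically. This is a sketch; the initial position x₀ = 0.3 and the truncation/grid sizes are arbitrary choices:

```python
import math

def p_abs(x, t, x0, n_terms=200):
    # truncated series solution on [0,1] with absorbing ends (D = 1/2)
    return 2.0 * sum(
        math.exp(-n * n * math.pi**2 * t / 2)
        * math.sin(n * math.pi * x0) * math.sin(n * math.pi * x)
        for n in range(1, n_terms + 1))

def survival(t, x0=0.3, m=400):
    # trapezoid rule for the survival probability S(t) = integral of p over [0,1]
    vals = [p_abs(i / m, t, x0) for i in range(m + 1)]
    return (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1]) / m

s_early, s_mid, s_late = survival(0.01), survival(0.5), survival(3.0)
```

S(t) starts near 1 and decays monotonically to 0, consistent with every particle being absorbed eventually.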

Brownian Motion with Reflecting Boundary Condition

Consider now Brownian motion on the interval [0,1] with reflecting boundary conditions and set D = 1/2 for simplicity. In order to calculate the transition probability density we have to solve the Fokker-Planck equation, which is the heat equation on [0,1] with Neumann boundary conditions:
\[
\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}, \qquad \partial_x p(0,t) = \partial_x p(1,t) = 0, \qquad p(x,0) = \delta(x - x_0).
\]

The boundary conditions are satisfied by functions of the form cos(nπx). We look for a solution in the form of a cosine Fourier series
\[
p(x,t) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} a_n(t)\cos(n\pi x).
\]

From the initial conditions we obtain

\[
a_n(0) = 2\int_0^1 \cos(n\pi x)\delta(x - x_0)\,dx = 2\cos(n\pi x_0).
\]
We substitute the expansion into the PDE and use the orthonormality of the Fourier basis to obtain the equations for the Fourier coefficients:
\[
\dot{a}_n = -\frac{n^2\pi^2}{2}a_n,
\]
from which we deduce that
\[
a_n(t) = a_n(0)\,e^{-\frac{n^2\pi^2}{2}t}.
\]
Consequently,
\[
p(x,t|x_0,0) = 1 + 2\sum_{n=1}^{\infty} \cos(n\pi x_0)\cos(n\pi x)\,e^{-\frac{n^2\pi^2}{2}t}.
\]

Notice that Brownian motion with reflecting boundary conditions is an ergodic Markov process. To see this, let us consider the stationary Fokker-Planck equation
\[
\frac{\partial^2 p_s}{\partial x^2} = 0, \qquad \partial_x p_s(0) = \partial_x p_s(1) = 0.
\]
The unique normalized solution to this boundary value problem is p_s(x) = 1. Indeed, we multiply the equation by p_s, integrate by parts and use the boundary conditions to obtain
\[
\int_0^1 \left|\frac{dp_s}{dx}\right|^2 dx = 0,
\]


from which it follows that p_s(x) = 1. Alternatively, by taking the limit of p(x,t|x_0,0) as t → ∞ we obtain the invariant distribution:
\[
\lim_{t\to\infty} p(x,t|x_0,0) = 1.
\]

Now we can calculate the stationary autocorrelation function:

\begin{align*}
\mathbb{E}(W(t)W(0)) &= \int_0^1\!\!\int_0^1 x x_0\, p(x,t|x_0,0)\,p_s(x_0)\,dx\,dx_0 \\
&= \int_0^1\!\!\int_0^1 x x_0\left(1 + 2\sum_{n=1}^{\infty}\cos(n\pi x_0)\cos(n\pi x)\,e^{-\frac{n^2\pi^2}{2}t}\right)dx\,dx_0 \\
&= \frac{1}{4} + \frac{8}{\pi^4}\sum_{n=0}^{+\infty}\frac{1}{(2n+1)^4}\,e^{-\frac{(2n+1)^2\pi^2}{2}t}.
\end{align*}
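Two consistency checks on this formula can be automated: at t = 0 it must reduce to the stationary second moment ∫₀¹ x² dx = 1/3 (the stationary density is uniform), and as t → ∞ it must decay to (𝔼W)² = 1/4. A minimal sketch:

```python
import math

def autocorr(t, n_terms=200):
    # E(W(t)W(0)) for reflected Brownian motion on [0,1] (D = 1/2)
    s = sum(math.exp(-(2 * n + 1)**2 * math.pi**2 * t / 2) / (2 * n + 1)**4
            for n in range(n_terms))
    return 0.25 + 8.0 / math.pi**4 * s
```

(The t = 0 identity uses Σ 1/(2n+1)⁴ = π⁴/96.)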

6.3.2 The Ornstein-Uhlenbeck Process

We now set a(x,t) = −αx, b(x,t) = 2D > 0. With this drift and diffusion coefficient the Fokker-Planck equation becomes
\[
\frac{\partial p}{\partial t} = \alpha\frac{\partial (xp)}{\partial x} + D\frac{\partial^2 p}{\partial x^2}. \tag{6.13}
\]
This is the Fokker-Planck equation for the Ornstein-Uhlenbeck process. The corresponding stochastic differential equation is
\[
dX_t = -\alpha X_t\,dt + \sqrt{2D}\,dW_t.
\]

So, in addition to Brownian motion there is a linear force pulling the particle towards the origin. We know that Brownian motion is not a stationary process, since its variance grows linearly in time. By adding a linear damping term, it is reasonable to expect that the resulting process can be stationary. As we have already seen, this is indeed the case.

The transition probability density p_{OU}(x,t|y,s) for an OU particle that is located at y at time s is
\[
p_{OU}(x,t|y,s) = \sqrt{\frac{\alpha}{2\pi D(1 - e^{-2\alpha(t-s)})}}\,\exp\left(-\frac{\alpha\left(x - e^{-\alpha(t-s)}y\right)^2}{2D(1 - e^{-2\alpha(t-s)})}\right). \tag{6.14}
\]

We obtained this formula in Example (4.2.4) (for α = D = 1) by using the fact that the OU process can be defined through a time change of Brownian motion.


We can also derive it by solving equation (6.13). To obtain (6.14), we first take the Fourier transform of the transition probability density with respect to x, solve the resulting first order PDE using the method of characteristics and then take the inverse Fourier transform.³

Notice that from formula (6.14) it immediately follows that in the limit as the friction coefficient α goes to 0, the transition probability of the OU process converges to the transition probability of Brownian motion. Furthermore, by taking the long time limit in (6.14) we obtain (we have set s = 0)
\[
\lim_{t\to+\infty} p_{OU}(x,t|y,0) = \sqrt{\frac{\alpha}{2\pi D}}\,\exp\left(-\frac{\alpha x^2}{2D}\right),
\]

irrespective of the initial position y of the OU particle. This is to be expected, since, as we have already seen, the Ornstein-Uhlenbeck process is an ergodic Markov process with a Gaussian invariant distribution
\[
p_s(x) = \sqrt{\frac{\alpha}{2\pi D}}\,\exp\left(-\frac{\alpha x^2}{2D}\right). \tag{6.15}
\]

Using now (6.14) and (6.15) we obtain the stationary joint probability density

\[
p_2(x,t|y,0) = p(x,t|y,0)\,p_s(y) = \frac{\alpha}{2\pi D\sqrt{1 - e^{-2\alpha t}}}\,\exp\left(-\frac{\alpha\left(x^2 + y^2 - 2xy\,e^{-\alpha t}\right)}{2D(1 - e^{-2\alpha t})}\right).
\]

More generally, we have

\[
p_2(x,t|y,s) = \frac{\alpha}{2\pi D\sqrt{1 - e^{-2\alpha|t-s|}}}\,\exp\left(-\frac{\alpha\left(x^2 + y^2 - 2xy\,e^{-\alpha|t-s|}\right)}{2D(1 - e^{-2\alpha|t-s|})}\right). \tag{6.16}
\]

Now we can calculate the stationary autocorrelation function of the OU process

\begin{align}
\mathbb{E}(X(t)X(s)) &= \int\!\!\int xy\,p_2(x,t|y,s)\,dx\,dy \tag{6.17}\\
&= \frac{D}{\alpha}\,e^{-\alpha|t-s|}. \tag{6.18}
\end{align}

In order to calculate the double integral we need to perform an appropriate change of variables. The calculation is similar to the one presented in Section 2.6. See Exercise 2.
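The double integral can also be checked by brute-force quadrature of (6.16). The sketch below uses illustrative values α = 1, D = 0.5, t = 0.7 and a trapezoid rule on a large box:

```python
import math

alpha, D, t = 1.0, 0.5, 0.7     # assumed parameters

def p2(x, y):
    # stationary joint density (6.16) with s = 0
    e = math.exp(-alpha * t)
    v = 1.0 - e * e
    pref = alpha / (2 * math.pi * D * math.sqrt(v))
    return pref * math.exp(-alpha * (x * x + y * y - 2 * x * y * e) / (2 * D * v))

m, L = 200, 6.0                  # grid resolution and box half-width
h = 2 * L / m
total = 0.0
for i in range(m + 1):
    xi = -L + i * h
    wi = 0.5 if i in (0, m) else 1.0
    for j in range(m + 1):
        yj = -L + j * h
        wj = 0.5 if j in (0, m) else 1.0
        total += wi * wj * xi * yj * p2(xi, yj) * h * h

predicted = D / alpha * math.exp(-alpha * t)   # formula (6.18) at |t - s| = t
```

Trapezoid quadrature is extremely accurate for Gaussian integrands, so the two numbers agree to high precision.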

³This calculation will be presented in Section ?? for the Fokker-Planck equation of a linear SDE in arbitrary dimensions.


Assume that the initial position of the OU particle is a random variable distributed according to a distribution ρ_0(x). As in the case of a Brownian particle, the probability density function (distribution function) is given by the convolution integral
\[
p(x,t) = \int p(x-y,t)\rho_0(y)\,dy, \tag{6.19}
\]
where p(x−y,t) := p(x,t|y,0). When the OU process is distributed initially according to its invariant distribution, ρ_0(x) = p_s(x) given by (6.15), then the Ornstein-Uhlenbeck process becomes stationary. The distribution function is given by p_s(x) at all times and the joint probability density is given by (6.16).

Knowledge of the distribution function enables us to calculate all moments of the OU process using the formula
\[
\mathbb{E}\big((X_t)^n\big) = \int x^n p(x,t)\,dx.
\]
We will calculate the moments by using the Fokker-Planck equation, rather than the explicit formula for the transition probability density. Let M_n(t) denote the nth moment of the OU process,
\[
M_n := \int_{\mathbb{R}} x^n p(x,t)\,dx, \qquad n = 0,1,2,\dots
\]

Let n = 0. We integrate the FP equation over ℝ to obtain:
\[
\int\frac{\partial p}{\partial t}\,dx = \alpha\int\frac{\partial(xp)}{\partial x}\,dx + D\int\frac{\partial^2 p}{\partial x^2}\,dx = 0,
\]
after an integration by parts and using the fact that p(x,t) decays sufficiently fast at infinity. Consequently:
\[
\frac{d}{dt}M_0 = 0 \quad\Rightarrow\quad M_0(t) = M_0(0) = 1.
\]

In other words, since
\[
\frac{d}{dt}\|p\|_{L^1(\mathbb{R})} = 0,
\]
we deduce that
\[
\int_{\mathbb{R}} p(x,t)\,dx = \int_{\mathbb{R}} p(x,0)\,dx = 1,
\]

which means that the total probability is conserved, as we have already shown for the general Fokker-Planck equation in arbitrary dimensions.

Let n = 1. We multiply the FP equation for the OU process by x, integrate over ℝ and perform an integration by parts to obtain:
\[
\frac{d}{dt}M_1 = -\alpha M_1.
\]
Consequently, the first moment converges exponentially fast to 0:
\[
M_1(t) = e^{-\alpha t}M_1(0).
\]

Let now n ≥ 2. We multiply the FP equation for the OU process by x^n and integrate by parts (once for the first term on the RHS and twice for the second) to obtain:
\[
\frac{d}{dt}\int x^n p = -\alpha n\int x^n p + Dn(n-1)\int x^{n-2}p.
\]
Or, equivalently:
\[
\frac{d}{dt}M_n = -\alpha n M_n + Dn(n-1)M_{n-2}, \qquad n \geq 2.
\]
This is a first order linear inhomogeneous differential equation. We can solve it using the variation of constants formula:
\[
M_n(t) = e^{-\alpha n t}M_n(0) + Dn(n-1)\int_0^t e^{-\alpha n(t-s)}M_{n-2}(s)\,ds. \tag{6.20}
\]

We can use this formula, together with the formulas for the first two moments, to calculate all higher order moments in an iterative way. For example, for n = 2 we have
\begin{align*}
M_2(t) &= e^{-2\alpha t}M_2(0) + 2D\int_0^t e^{-2\alpha(t-s)}M_0(s)\,ds \\
&= e^{-2\alpha t}M_2(0) + \frac{D}{\alpha}e^{-2\alpha t}\left(e^{2\alpha t} - 1\right) \\
&= \frac{D}{\alpha} + e^{-2\alpha t}\left(M_2(0) - \frac{D}{\alpha}\right).
\end{align*}

Consequently, the second moment converges exponentially fast to its stationary value D/α. The stationary moments of the OU process are:
\[
\langle x^n\rangle_{OU} := \sqrt{\frac{\alpha}{2\pi D}}\int_{\mathbb{R}} x^n e^{-\frac{\alpha x^2}{2D}}\,dx =
\begin{cases}
1\cdot 3\cdots(n-1)\left(\dfrac{D}{\alpha}\right)^{n/2}, & n \text{ even},\\[4pt]
0, & n \text{ odd}.
\end{cases}
\]
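The moment equations can be integrated numerically to watch this convergence. The sketch below uses illustrative values α = 1, D = 0.5 and a delta initial condition at x₀ = 2 (so M_n(0) = x₀ⁿ), stepping the ODE system with forward Euler:

```python
alpha, D, x0 = 1.0, 0.5, 2.0     # assumed parameters
M = [x0**n for n in range(5)]    # M_0 .. M_4 at t = 0

dt, n_steps = 1e-3, 15000        # integrate to t = 15
for _ in range(n_steps):
    new = list(M)
    for n in range(1, 5):
        # dM_n/dt = -alpha*n*M_n + D*n*(n-1)*M_{n-2}, with M_{-1} := 0
        lower = M[n - 2] if n >= 2 else 0.0
        new[n] = M[n] + dt * (-alpha * n * M[n] + D * n * (n - 1) * lower)
    M = new
```

By the formulas above, M₁ and M₃ decay to 0 while M₂ → D/α and M₄ → 3(D/α)².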


It is not hard to check that (see Exercise 3)

\[
\lim_{t\to\infty} M_n(t) = \langle x^n\rangle_{OU} \tag{6.21}
\]
exponentially fast.⁴ Since we have already shown that the distribution function of the OU process converges to the Gaussian distribution in the limit as t → +∞, it is not surprising that the moments also converge to the moments of the invariant Gaussian measure. What is not so obvious is that the convergence is exponentially fast. In the next section we will prove that the Ornstein-Uhlenbeck process does, indeed, converge to equilibrium exponentially fast. Of course, if the initial conditions of the OU process are stationary, then the moments of the OU process are independent of time and given by their equilibrium values
\[
M_n(t) = M_n(0) = \langle x^n\rangle_{OU}. \tag{6.22}
\]

6.3.3 The Geometric Brownian Motion

We set a(x) = µx, b(x) = σ²x². This is the geometric Brownian motion. The corresponding stochastic differential equation is
\[
dX_t = \mu X_t\,dt + \sigma X_t\,dW_t.
\]
This equation is one of the basic models in mathematical finance. The coefficient σ is called the volatility. The generator of this process is
\[
\mathcal{L} = \mu x\frac{\partial}{\partial x} + \frac{\sigma^2 x^2}{2}\frac{\partial^2}{\partial x^2}.
\]

Notice that this operator is not uniformly elliptic. The Fokker-Planck equation of the geometric Brownian motion is:
\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}(\mu x p) + \frac{\partial^2}{\partial x^2}\left(\frac{\sigma^2 x^2}{2}p\right).
\]

We can easily obtain an equation for the nth moment of the geometric Brownian motion:
\[
\frac{d}{dt}M_n = \left(\mu n + \frac{\sigma^2}{2}n(n-1)\right)M_n, \qquad n \geq 2.
\]

⁴Of course, we need to assume that the initial distribution has finite moments of all orders in order to justify the above calculations.


The solution of this equation is

\[
M_n(t) = e^{\left(\mu + (n-1)\frac{\sigma^2}{2}\right)nt}M_n(0), \qquad n \geq 2,
\]
and
\[
M_1(t) = e^{\mu t}M_1(0).
\]
Notice that the nth moment might diverge as t → ∞, depending on the values of µ and σ. Consider for example the second moment and assume that µ < 0. We have
\[
M_2(t) = e^{(2\mu+\sigma^2)t}M_2(0),
\]
which diverges when σ² + 2µ > 0.
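This growth of the second moment for µ < 0 but 2µ + σ² > 0 can be checked against a Monte Carlo estimate built from the exact solution X_t = x₀ exp((µ − σ²/2)t + σW_t). All parameter values below are illustrative:

```python
import math
import random

random.seed(1)
mu, sigma, x0, t = -0.2, 0.8, 1.0, 1.0   # assumed parameters; 2*mu + sigma^2 > 0
n_paths = 100000

m2 = 0.0
for _ in range(n_paths):
    w = random.gauss(0.0, math.sqrt(t))          # W_t ~ N(0, t)
    x = x0 * math.exp((mu - sigma**2 / 2) * t + sigma * w)
    m2 += x * x
m2 /= n_paths

predicted = math.exp((2 * mu + sigma**2) * t) * x0**2
```

Even though µ < 0 (so the mean decays), the second moment grows, since 2µ + σ² = 0.24 > 0.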

6.4 The Ornstein-Uhlenbeck Process and Hermite Polynomials

The Ornstein-Uhlenbeck process is one of the few stochastic processes for which we can calculate explicitly the solution of the corresponding SDE, the solution of the Fokker-Planck equation, and the eigenfunctions of the generator of the process. In this section we will show that the eigenfunctions of the OU process are the Hermite polynomials. We will also study various properties of the generator of the OU process. In the next section we will show that many of the properties of the OU process (ergodicity, self-adjointness of the generator, exponentially fast convergence to equilibrium, real discrete spectrum) are shared by a large class of diffusion processes, namely those for which the drift term can be written as the gradient of a smooth function.

The generator of the d-dimensional OU process is (we set the drift coefficient equal to 1)
\[
\mathcal{L} = -p\cdot\nabla_p + \beta^{-1}\Delta_p, \tag{6.23}
\]
where β denotes the inverse temperature. We have already seen that the OU process is an ergodic Markov process whose unique invariant measure is absolutely continuous with respect to the Lebesgue measure on ℝ^d, with Gaussian density ρ_β ∈ C^∞(ℝ^d),
\[
\rho_\beta(p) = \frac{1}{(2\pi\beta^{-1})^{d/2}}\,e^{-\beta\frac{|p|^2}{2}}.
\]


The natural function space for studying the generator of the OU process is the L²-space weighted by the invariant measure of the process. This is a separable Hilbert space with norm
\[
\|f\|_\rho^2 := \int_{\mathbb{R}^d} f^2\rho_\beta\,dp
\]
and corresponding inner product
\[
(f,h)_\rho = \int_{\mathbb{R}^d} fh\,\rho_\beta\,dp.
\]
Similarly, we can define weighted L²-spaces involving derivatives, i.e. weighted Sobolev spaces. See the exercises.

The reason why this is the right function space in which to study questions related to convergence to equilibrium is that the generator of the OU process becomes a self-adjoint operator in this space. In fact, L defined in (6.23) has many nice properties that are summarized in the following proposition.

Proposition 6.4.1. The operator L has the following properties:

i. For every f, h ∈ C₀²(ℝ^d) ∩ L²_ρ(ℝ^d),
\[
(\mathcal{L}f,h)_\rho = (f,\mathcal{L}h)_\rho = -\beta^{-1}\int_{\mathbb{R}^d}\nabla f\cdot\nabla h\,\rho_\beta\,dp. \tag{6.24}
\]

ii. L is a non-positive operator on L²_ρ.

iii. Lf = 0 iff f ≡ const.

iv. For every f ∈ C₀²(ℝ^d) ∩ L²_ρ(ℝ^d) with ∫ fρ_β = 0,
\[
(-\mathcal{L}f,f)_\rho \geq \|f\|_\rho^2. \tag{6.25}
\]

Proof. Equation (6.24) follows from an integration by parts:
\begin{align*}
(\mathcal{L}f,h)_\rho &= \int -p\cdot\nabla f\,h\,\rho_\beta\,dp + \beta^{-1}\int\Delta f\,h\,\rho_\beta\,dp \\
&= \int -p\cdot\nabla f\,h\,\rho_\beta\,dp - \beta^{-1}\int\nabla f\cdot\nabla h\,\rho_\beta\,dp + \int p\cdot\nabla f\,h\,\rho_\beta\,dp \\
&= -\beta^{-1}(\nabla f,\nabla h)_\rho.
\end{align*}
Non-positivity of L follows from (6.24) upon setting h = f:
\[
(\mathcal{L}f,f)_\rho = -\beta^{-1}\|\nabla f\|_\rho^2 \leq 0.
\]


Similarly, multiplying the equation Lf = 0 by fρ_β, integrating over ℝ^d and using (6.24) gives
\[
\|\nabla f\|_\rho = 0,
\]
from which we deduce that f ≡ const. The spectral gap follows from (6.24), together with Poincaré's inequality for Gaussian measures:
\[
\int_{\mathbb{R}^d} f^2\rho_\beta\,dp \leq \beta^{-1}\int_{\mathbb{R}^d}|\nabla f|^2\rho_\beta\,dp \tag{6.26}
\]
for every f ∈ H¹(ℝ^d;ρ_β) with ∫ fρ_β = 0. Indeed, upon combining (6.24) with (6.26) we obtain
\[
(\mathcal{L}f,f)_\rho = -\beta^{-1}\|\nabla f\|_\rho^2 \leq -\|f\|_\rho^2.
\]

The spectral gap of the generator of the OU process, which is equivalent to the compactness of its resolvent, implies that L has discrete spectrum. Furthermore, since it is also a self-adjoint operator, its eigenfunctions form a countable orthonormal basis for the separable Hilbert space L²_ρ. In fact, we can calculate the eigenvalues and eigenfunctions of the generator of the OU process in one dimension.⁵

Theorem 6.4.2. Consider the eigenvalue problem for the generator of the OU process in one dimension,
\[
-\mathcal{L}f_n = \lambda_n f_n. \tag{6.27}
\]
Then the eigenvalues of L are the nonnegative integers:
\[
\lambda_n = n, \qquad n = 0,1,2,\dots
\]
The corresponding eigenfunctions are the normalized Hermite polynomials:
\[
f_n(p) = \frac{1}{\sqrt{n!}}H_n\!\left(\sqrt{\beta}\,p\right), \tag{6.28}
\]
where
\[
H_n(p) = (-1)^n e^{\frac{p^2}{2}}\frac{d^n}{dp^n}\left(e^{-\frac{p^2}{2}}\right). \tag{6.29}
\]

⁵The multidimensional problem can be treated similarly by taking tensor products of the eigenfunctions of the one dimensional problem.


For the subsequent calculations we will need some additional properties of Hermite polynomials, which we state here without proof (we use the notation ρ₁ = ρ).

Proposition 6.4.3. For each λ ∈ ℂ, set
\[
H(p;\lambda) = e^{\lambda p - \frac{\lambda^2}{2}}, \qquad p \in \mathbb{R}.
\]

Then

\[
H(p;\lambda) = \sum_{n=0}^{\infty}\frac{\lambda^n}{n!}H_n(p), \qquad p \in \mathbb{R}, \tag{6.30}
\]
where the convergence is both uniform on compact subsets of ℝ×ℂ and, for λ in compact subsets of ℂ, uniform in L²(ρ). In particular, \{f_n(p) := \frac{1}{\sqrt{n!}}H_n(\sqrt{\beta}\,p) : n \in \mathbb{N}\} is an orthonormal basis of L²(ρ_β).

From (6.29) it is clear that H_n is a polynomial of degree n. Furthermore, only odd (even) powers appear in H_n(p) when n is odd (even), and the coefficient multiplying p^n in H_n(p) is always 1. The orthonormality of the normalized Hermite polynomials f_n(p) defined in (6.28) implies that
\[
\int_{\mathbb{R}} f_n(p)f_m(p)\rho_\beta(p)\,dp = \delta_{nm}.
\]

The first few Hermite polynomials and the corresponding rescaled/normalized eigenfunctions of the generator of the OU process are:
\begin{align*}
H_0(p) &= 1, & f_0(p) &= 1,\\
H_1(p) &= p, & f_1(p) &= \sqrt{\beta}\,p,\\
H_2(p) &= p^2 - 1, & f_2(p) &= \frac{1}{\sqrt{2}}\left(\beta p^2 - 1\right),\\
H_3(p) &= p^3 - 3p, & f_3(p) &= \frac{1}{\sqrt{6}}\left(\beta^{3/2}p^3 - 3\sqrt{\beta}\,p\right),\\
H_4(p) &= p^4 - 6p^2 + 3, & f_4(p) &= \frac{1}{\sqrt{24}}\left(\beta^2 p^4 - 6\beta p^2 + 3\right),\\
H_5(p) &= p^5 - 10p^3 + 15p, & f_5(p) &= \frac{1}{\sqrt{120}}\left(\beta^{5/2}p^5 - 10\beta^{3/2}p^3 + 15\beta^{1/2}p\right).
\end{align*}


The proof of Theorem 6.4.2 follows essentially from the properties of the Hermite polynomials. First, notice that by combining (6.28) and (6.30) we obtain
\[
H(\sqrt{\beta}\,p,\lambda) = \sum_{n=0}^{+\infty}\frac{\lambda^n}{\sqrt{n!}}f_n(p).
\]
We differentiate this formula with respect to p to obtain
\[
\lambda\sqrt{\beta}\,H(\sqrt{\beta}\,p,\lambda) = \sum_{n=1}^{+\infty}\frac{\lambda^n}{\sqrt{n!}}\partial_p f_n(p),
\]
since f₀ = 1. From this equation we obtain
\[
H(\sqrt{\beta}\,p,\lambda) = \sum_{n=1}^{+\infty}\frac{\lambda^{n-1}}{\sqrt{\beta}\sqrt{n!}}\partial_p f_n(p) = \sum_{n=0}^{+\infty}\frac{\lambda^n}{\sqrt{\beta}\sqrt{(n+1)!}}\partial_p f_{n+1}(p),
\]
from which we deduce that
\[
\frac{1}{\sqrt{\beta}}\partial_p f_k = \sqrt{k}\,f_{k-1}. \tag{6.31}
\]

Similarly, if we differentiate (6.30) with respect to λ we obtain
\[
(p-\lambda)H(p;\lambda) = \sum_{k=0}^{+\infty}\frac{\lambda^k}{k!}pH_k(p) - \sum_{k=1}^{+\infty}\frac{\lambda^k}{(k-1)!}H_{k-1}(p) = \sum_{k=0}^{+\infty}\frac{\lambda^k}{k!}H_{k+1}(p),
\]
from which we obtain the recurrence relation
\[
pH_k = H_{k+1} + kH_{k-1}.
\]
Upon rescaling, we deduce that
\[
pf_k = \sqrt{\beta^{-1}(k+1)}\,f_{k+1} + \sqrt{\beta^{-1}k}\,f_{k-1}. \tag{6.32}
\]

We combine now equations (6.31) and (6.32) to obtain
\[
\left(\sqrt{\beta}\,p - \frac{1}{\sqrt{\beta}}\partial_p\right)f_k = \sqrt{k+1}\,f_{k+1}. \tag{6.33}
\]
Now we observe that
\begin{align*}
-\mathcal{L}f_n &= \left(\sqrt{\beta}\,p - \frac{1}{\sqrt{\beta}}\partial_p\right)\frac{1}{\sqrt{\beta}}\partial_p f_n \\
&= \left(\sqrt{\beta}\,p - \frac{1}{\sqrt{\beta}}\partial_p\right)\sqrt{n}\,f_{n-1} = nf_n.
\end{align*}

The operators \left(\sqrt{\beta}\,p - \frac{1}{\sqrt{\beta}}\partial_p\right) and \frac{1}{\sqrt{\beta}}\partial_p play the role of creation and annihilation operators. In fact, we can generate all eigenfunctions of the OU operator from the ground state f₀ = 1 through repeated application of the creation operator.

Proposition 6.4.4. Set β = 1 and let a⁻ = ∂_p. Then the L²_ρ-adjoint of a⁻ is
\[
a^+ = -\partial_p + p.
\]
The generator of the OU process can be written in the form
\[
\mathcal{L} = -a^+a^-.
\]
Furthermore, a⁺ and a⁻ satisfy the commutation relation
\[
[a^+,a^-] = -1.
\]
Define now the rescaled creation and annihilation operators on C¹(ℝ) by
\[
S^+ = \frac{1}{\sqrt{n+1}}a^+ \quad\text{and}\quad S^- = \frac{1}{\sqrt{n}}a^-.
\]
Then
\[
S^+f_n = f_{n+1} \quad\text{and}\quad S^-f_n = f_{n-1}. \tag{6.34}
\]
In particular,
\[
f_n = \frac{1}{\sqrt{n!}}(a^+)^n 1 \tag{6.35}
\]
and
\[
1 = \frac{1}{\sqrt{n!}}(a^-)^n f_n. \tag{6.36}
\]


Proof. Let f, h ∈ C¹(ℝ) ∩ L²_ρ. We calculate
\begin{align}
\int \partial_p f\,h\,\rho &= -\int f\,\partial_p(h\rho) \tag{6.37}\\
&= \int f\left(-\partial_p + p\right)h\,\rho, \tag{6.38}
\end{align}
using ∂_pρ = −pρ. Now,
\[
-a^+a^- = -(-\partial_p + p)\partial_p = \partial_p^2 - p\partial_p = \mathcal{L}.
\]
Similarly,
\[
a^-a^+ = -\partial_p^2 + p\partial_p + 1,
\]
and
\[
[a^+,a^-] = -1.
\]

Formulas (6.34) follow from (6.31) and (6.33). Finally, formulas (6.35) and (6.36) are a consequence of (6.31) and (6.33), together with a simple induction argument.

Notice that, upon using (6.35) and (6.36) and the fact that a⁺ is the adjoint of a⁻, we can easily check the orthonormality of the eigenfunctions:
\begin{align*}
\int f_n f_m\,\rho &= \frac{1}{\sqrt{m!}}\int f_n (a^+)^m 1\,\rho \\
&= \frac{1}{\sqrt{m!}}\int \left((a^-)^m f_n\right)\rho \\
&= \delta_{nm},
\end{align*}
since (a⁻)^m f_n is a multiple of f_{n−m} (and vanishes when m > n).

From the eigenfunctions and eigenvalues of L we can easily obtain the eigenvalues and eigenfunctions of L*, the Fokker-Planck operator.

Lemma 6.4.5. The eigenvalues and eigenfunctions of the Fokker-Planck operator
\[
\mathcal{L}^*\cdot = \partial_p^2\cdot + \partial_p(p\,\cdot)
\]
are
\[
\lambda_n^* = -n, \quad n = 0,1,2,\dots \quad\text{and}\quad f_n^* = \rho f_n.
\]


Proof. We have
\[
\mathcal{L}^*(\rho f_n) = f_n\mathcal{L}^*\rho + \rho\mathcal{L}f_n = -n\rho f_n.
\]

An immediate corollary of the above calculation is that the nth eigenfunction of the Fokker-Planck operator is given by
\[
f_n^* = \rho(p)\frac{1}{\sqrt{n!}}(a^+)^n 1.
\]

6.5 Reversible Diffusions

The stationary Ornstein-Uhlenbeck process is an example of a reversible Markov process:

Definition 6.5.1. A stationary stochastic process X_t is time reversible if for every m ∈ ℕ and every t₁, t₂, ..., t_m ∈ ℝ⁺, the joint probability distribution is invariant under time reversals:
\[
p(X_{t_1},X_{t_2},\dots,X_{t_m}) = p(X_{-t_1},X_{-t_2},\dots,X_{-t_m}). \tag{6.39}
\]

In this section we study a more general class (in fact, as we will see later, the most general class) of reversible Markov processes, namely stochastic perturbations of ODEs with a gradient structure.

Let V(x) = ½αx². The generator of the OU process can be written as
\[
\mathcal{L} = -\partial_x V\,\partial_x + \beta^{-1}\partial_x^2.
\]
Consider diffusion processes with a potential V(x), not necessarily quadratic:
\[
\mathcal{L} = -\nabla V(x)\cdot\nabla + \beta^{-1}\Delta. \tag{6.40}
\]
In applications of (6.40) to statistical mechanics the diffusion coefficient is β^{-1} = k_B T, where k_B is Boltzmann's constant and T the absolute temperature. The corresponding stochastic differential equation is
\[
dX_t = -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t. \tag{6.41}
\]


Hence, we have a gradient ODE \dot{X}_t = -\nabla V(X_t) perturbed by noise due to thermal fluctuations. The corresponding FP equation is:
\[
\frac{\partial p}{\partial t} = \nabla\cdot(\nabla V p) + \beta^{-1}\Delta p. \tag{6.42}
\]
It is not possible to calculate the time dependent solution of this equation for an arbitrary potential. We can, however, always calculate the stationary solution, if it exists.

Definition 6.5.2. A potential V will be called confining if lim_{|x|→+∞} V(x) = +∞ and
\[
e^{-\beta V(x)} \in L^1(\mathbb{R}^d) \tag{6.43}
\]
for all β ∈ ℝ⁺.

Gradient SDEs in a confining potential are ergodic:

Proposition 6.5.3. Let V(x) be a smooth confining potential. Then the Markov process with generator (6.40) is ergodic. The unique invariant distribution is the Gibbs distribution
\[
p(x) = \frac{1}{Z}e^{-\beta V(x)}, \tag{6.44}
\]
where the normalization factor Z is the partition function
\[
Z = \int_{\mathbb{R}^d} e^{-\beta V(x)}\,dx.
\]
The fact that the Gibbs distribution is an invariant distribution follows by direct substitution. Uniqueness follows from a PDE argument (see the discussion below). It is more convenient to "normalize" the solution of the Fokker-Planck equation with respect to the invariant distribution.

Theorem 6.5.4. Let p(x,t) be the solution of the Fokker-Planck equation (6.42), assume that (6.43) holds, and let ρ(x) be the Gibbs distribution (6.44). Define h(x,t) through
\[
p(x,t) = h(x,t)\rho(x).
\]
Then the function h satisfies the backward Kolmogorov equation:
\[
\frac{\partial h}{\partial t} = -\nabla V\cdot\nabla h + \beta^{-1}\Delta h, \qquad h(x,0) = p(x,0)\rho^{-1}(x). \tag{6.45}
\]


Proof. The initial condition follows from the definition of h. We calculate the gradient and Laplacian of p:
\[
\nabla p = \rho\nabla h - \rho h\beta\nabla V
\]
and
\[
\Delta p = \rho\Delta h - 2\rho\beta\nabla V\cdot\nabla h - h\beta\Delta V\rho + h|\nabla V|^2\beta^2\rho.
\]
We substitute these formulas into the FP equation to obtain
\[
\rho\frac{\partial h}{\partial t} = \rho\left(-\nabla V\cdot\nabla h + \beta^{-1}\Delta h\right),
\]
from which the claim follows.

Consequently, in order to study properties of solutions to the FP equation, it is sufficient to study the backward equation (6.45). The generator L is self-adjoint in the right function space. We define the weighted L² space L²_ρ:
\[
L^2_\rho = \left\{f \,\Big|\, \int_{\mathbb{R}^d} |f|^2\rho(x)\,dx < \infty\right\},
\]
where ρ(x) is the Gibbs distribution. This is a Hilbert space with inner product
\[
(f,h)_\rho = \int_{\mathbb{R}^d} fh\,\rho(x)\,dx.
\]

Theorem 6.5.5. Assume that V(x) is a smooth potential and assume that condition (6.43) holds. Then the operator
\[
\mathcal{L} = -\nabla V(x)\cdot\nabla + \beta^{-1}\Delta
\]
is self-adjoint in L²_ρ. Furthermore, it is non-positive and its kernel consists of constants.

Proof. Let f, h ∈ C₀²(ℝ^d). We calculate
\begin{align*}
(\mathcal{L}f,h)_\rho &= \int_{\mathbb{R}^d}\left(-\nabla V\cdot\nabla + \beta^{-1}\Delta\right)f\,h\,\rho\,dx \\
&= -\int_{\mathbb{R}^d}(\nabla V\cdot\nabla f)h\rho\,dx - \beta^{-1}\int_{\mathbb{R}^d}\nabla f\cdot\nabla h\,\rho\,dx - \beta^{-1}\int_{\mathbb{R}^d}(\nabla f\cdot\nabla\rho)h\,dx \\
&= -\beta^{-1}\int_{\mathbb{R}^d}\nabla f\cdot\nabla h\,\rho\,dx,
\end{align*}
where we used ∇ρ = −β∇V ρ; from this, self-adjointness follows.


If we set f = h in the above equation we get
\[
(\mathcal{L}f,f)_\rho = -\beta^{-1}\|\nabla f\|_\rho^2,
\]
which shows that L is non-positive.

Clearly, constants are in the null space of L. Assume that f ∈ N(L). Then, from the above equation we get
\[
0 = -\beta^{-1}\|\nabla f\|_\rho^2,
\]
and, consequently, f is a constant.

Remark 6.5.6. The expression (−Lf,f)_ρ is called the Dirichlet form of the operator L. In the case of a gradient flow, it takes the form
\[
(-\mathcal{L}f,f)_\rho = \beta^{-1}\|\nabla f\|_\rho^2. \tag{6.46}
\]
Using the properties of the generator L we can show that the solution of the Fokker-Planck equation converges to the Gibbs distribution exponentially fast. For this we need to use the fact that, under appropriate assumptions on the potential V, the Gibbs measure µ(dx) = Z^{-1}e^{-βV(x)} dx satisfies Poincaré's inequality:

Theorem 6.5.7. Assume that the potential V satisfies the convexity condition
\[
D^2V \geq \lambda I.
\]
Then the corresponding Gibbs measure satisfies the Poincaré inequality with constant λ:
\[
\int_{\mathbb{R}^d} f\rho = 0 \quad\Rightarrow\quad \|\nabla f\|_\rho \geq \sqrt{\lambda}\,\|f\|_\rho. \tag{6.47}
\]

Theorem 6.5.8. Assume that p(x,0) ∈ L²(e^{βV}). Then the solution p(x,t) of the Fokker-Planck equation (6.42) converges to the Gibbs distribution exponentially fast:
\[
\|p(\cdot,t) - Z^{-1}e^{-\beta V}\|_{\rho^{-1}} \leq e^{-\lambda\beta^{-1}t}\|p(\cdot,0) - Z^{-1}e^{-\beta V}\|_{\rho^{-1}}. \tag{6.48}
\]

Proof. We use (6.45), (6.46) and (6.47) to calculate
\begin{align*}
-\frac{d}{dt}\|h-1\|_\rho^2 &= -2\left(\frac{\partial h}{\partial t}, h-1\right)_\rho = -2(\mathcal{L}h, h-1)_\rho \\
&= 2(-\mathcal{L}(h-1), h-1)_\rho = 2\beta^{-1}\|\nabla(h-1)\|_\rho^2 \\
&\geq 2\beta^{-1}\lambda\|h-1\|_\rho^2.
\end{align*}


Our assumption on p(·,0) implies that h(·,0) ∈ L²_ρ. Consequently, the above calculation shows that
\[
\|h(\cdot,t) - 1\|_\rho \leq e^{-\lambda\beta^{-1}t}\|h(\cdot,0) - 1\|_\rho.
\]
This, and the definition of h, p = ρh, lead to (6.48).

Remark 6.5.9. The assumption
\[
\int_{\mathbb{R}^d}|p(x,0)|^2 Z^{-1}e^{\beta V}\,dx < \infty
\]
is very restrictive (think of the case where V = x²). The function space L²(ρ^{-1}) = L²(e^{βV}) in which we prove convergence is not the right space to use. Since p(·,t) ∈ L¹, ideally we would like to prove exponentially fast convergence in L¹. We can prove convergence in L¹ using the theory of logarithmic Sobolev inequalities. In fact, we can also prove convergence in relative entropy:
\[
H(p|\rho_V) := \int_{\mathbb{R}^d} p\ln\left(\frac{p}{\rho_V}\right)dx.
\]
The relative entropy norm controls the L¹ norm:
\[
\|\rho_1 - \rho_2\|_{L^1}^2 \leq C\,H(\rho_1|\rho_2).
\]

Using a logarithmic Sobolev inequality, we can prove exponentially fast convergence to equilibrium, assuming only that the relative entropy of the initial conditions is finite.

A much sharper version of the theorem on exponentially fast convergence to equilibrium is the following:

Theorem 6.5.10. Let p denote the solution of the Fokker-Planck equation (6.42), where the potential is smooth and uniformly convex. Assume that the initial conditions satisfy
\[
H(p(\cdot,0)|\rho_V) < \infty.
\]
Then p converges to the Gibbs distribution exponentially fast in relative entropy:
\[
H(p(\cdot,t)|\rho_V) \leq e^{-\lambda\beta^{-1}t}H(p(\cdot,0)|\rho_V).
\]

Self-adjointness of the generator of a diffusion process is equivalent to time-reversibility.


Theorem 6.5.11. Let X_t be a stationary Markov process in ℝ^d with generator
\[
\mathcal{L} = b(x)\cdot\nabla + \beta^{-1}\Delta
\]
and invariant measure µ. Then the following three statements are equivalent.

i. The process is time-reversible.

ii. The generator of the process is symmetric in L²(ℝ^d; µ(dx)).

iii. There exists a scalar function V(x) such that
\[
b(x) = -\nabla V(x).
\]

6.5.1 Markov Chain Monte Carlo (MCMC)

The Smoluchowski SDE (6.41) has a very interesting application in statistics. Suppose we want to sample from a probability distribution π(x). One method for doing this is to generate dynamics whose invariant distribution is precisely π(x). In particular, we consider the Smoluchowski equation
\[
dX_t = \nabla\ln(\pi(X_t))\,dt + \sqrt{2}\,dW_t. \tag{6.49}
\]
Assuming that −ln(π(x)) is a confining potential, X_t is an ergodic Markov process with invariant distribution π(x). Furthermore, the law of X_t converges to π(x) exponentially fast:
\[
\|\rho_t - \pi\|_{L^1} \leq e^{-\Lambda t}\|\rho_0 - \pi\|_{L^1}.
\]
The exponent Λ is related to the spectral gap of the generator \mathcal{L} = \frac{1}{\pi(x)}\nabla\pi(x)\cdot\nabla + \Delta. This technique for sampling from a given distribution is an example of the Markov Chain Monte Carlo (MCMC) methodology.
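A minimal sketch of this idea: discretize (6.49) with the Euler-Maruyama scheme for a hypothetical target π(x) ∝ exp(−x⁴/4 − x²/2) (so ∇ln π(x) = −x³ − x) and compare the empirical moments of the chain with a quadrature of the target. All numerical parameters here are illustrative:

```python
import math
import random

random.seed(2)

def grad_log_pi(x):
    # hypothetical target pi(x) proportional to exp(-x^4/4 - x^2/2)
    return -x**3 - x

# Euler-Maruyama for dX = grad log pi(X) dt + sqrt(2) dW
dt, n_burn, n_samp = 1e-2, 10000, 200000
x = 0.0
mean_acc = m2_acc = 0.0
for i in range(n_burn + n_samp):
    x += grad_log_pi(x) * dt + math.sqrt(2 * dt) * random.gauss(0.0, 1.0)
    if i >= n_burn:
        mean_acc += x
        m2_acc += x * x
mean, m2 = mean_acc / n_samp, m2_acc / n_samp

# quadrature reference for E_pi[x^2]
grid = 2001
num = den = 0.0
for i in range(grid):
    xi = -6.0 + 12.0 * i / (grid - 1)
    w = math.exp(-xi**4 / 4 - xi**2 / 2)
    num += xi * xi * w
    den += w
m2_exact = num / den
```

Note that the discretized chain samples π only up to an O(dt) bias; adding a Metropolis accept/reject step (the MALA algorithm) would remove it.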

6.6 Perturbations of non-Reversible Diffusions

We can add a non-reversible perturbation to a reversible diffusion without changing the invariant distribution Z^{-1}e^{-βV}.

Proposition 6.6.1. Let V(x) be a confining potential, γ(x) a smooth vector field, and consider the diffusion process
\[
dX_t = \left(-\nabla V(X_t) + \gamma(X_t)\right)dt + \sqrt{2\beta^{-1}}\,dW_t. \tag{6.50}
\]


Then the invariant measure of the process X_t is the Gibbs measure µ(dx) = (1/Z)e^{-βV(x)} dx if and only if γ(x) is divergence-free with respect to the density of this measure:
\[
\nabla\cdot\left(\gamma(x)e^{-\beta V(x)}\right) = 0. \tag{6.51}
\]

6.7 Eigenfunction Expansions

Consider the generator of a gradient stochastic flow with a uniformly convex potential,
\[
\mathcal{L} = -\nabla V\cdot\nabla + D\Delta. \tag{6.52}
\]
We know that L is a non-positive self-adjoint operator on L²_ρ and that it has a spectral gap:
\[
(\mathcal{L}f,f)_\rho \leq -D\lambda\|f\|_\rho^2,
\]
where λ is the Poincaré constant of the potential V (i.e. for the Gibbs measure Z^{-1}e^{-βV(x)} dx). These facts imply that we can study the spectral problem for −L:
\[
-\mathcal{L}f_n = \lambda_n f_n, \qquad n = 0,1,\dots
\]
The operator −L has real, discrete spectrum with
\[
0 = \lambda_0 < \lambda_1 < \lambda_2 < \dots
\]

Furthermore, the eigenfunctions \{f_j\}_{j=1}^{\infty} form an orthonormal basis of L²_ρ: we can express every element of L²_ρ in the form of a generalized Fourier series
\[
\phi = \sum_{n=0}^{\infty}\phi_n f_n, \qquad \phi_n = (\phi,f_n)_\rho, \tag{6.53}
\]
with (f_n,f_m)_ρ = δ_{nm}. This enables us to solve the time-dependent Fokker-Planck equation in terms of an eigenfunction expansion. Consider the backward Kolmogorov equation (6.45). We assume that the initial condition h₀(x) = φ(x) ∈ L²_ρ, so we can expand it in the form (6.53). We look for a solution of (6.45) in the form
\[
h(x,t) = \sum_{n=0}^{\infty} h_n(t)f_n(x).
\]


We substitute this expansion into the backward Kolmogorov equation:

∂h/∂t = ∑_{n=0}^∞ ḣ_n f_n = L( ∑_{n=0}^∞ h_n f_n )   (6.54)
      = ∑_{n=0}^∞ −λ_n h_n f_n.   (6.55)

We multiply this equation by f_m, integrate with respect to the Gibbs measure and use the orthonormality of the eigenfunctions to obtain the sequence of equations

ḣ_n = −λ_n h_n,  n = 0, 1, . . .

The solution is

h_0(t) = φ_0,   h_n(t) = e^{−λ_n t} φ_n,  n = 1, 2, . . .

Notice that

1 = ∫_{R^d} p(x, 0) dx = ∫_{R^d} p(x, t) dx
  = ∫_{R^d} h(x, t) Z^{−1} e^{−βV} dx = (h, 1)_ρ = (φ, 1)_ρ = φ_0.

Consequently, the solution of the backward Kolmogorov equation is

h(x, t) = 1 + ∑_{n=1}^∞ e^{−λ_n t} φ_n f_n.

This expansion, together with the fact that all eigenvalues λ_n, n ≥ 1, are positive, shows that the solution of the backward Kolmogorov equation converges to 1 exponentially fast. The solution of the Fokker–Planck equation is

p(x, t) = Z^{−1} e^{−βV(x)} ( 1 + ∑_{n=1}^∞ e^{−λ_n t} φ_n f_n ).
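The spectral picture can be checked on the simplest gradient flow, the Ornstein–Uhlenbeck generator L = −x∂_x + ∂_x² (the case V(x) = x²/2, D = 1), whose eigenfunctions are the Hermite polynomials with eigenvalues λ_n = n. The following sketch verifies −Lf_n = n f_n by finite differences; the evaluation points and step size are arbitrary choices for illustration.

```python
def L(f, x, h=1e-4):
    # OU generator L f = -x f'(x) + f''(x), via central differences
    fp = (f(x + h) - f(x - h)) / (2 * h)
    fpp = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return -x * fp + fpp

# the first three (monic) Hermite polynomials
hermite = {
    1: lambda x: x,
    2: lambda x: x**2 - 1,
    3: lambda x: x**3 - 3 * x,
}

# check -L f_n = n f_n at a few sample points
max_err = 0.0
for n, f in hermite.items():
    for x in (0.3, 0.7, 1.3, 1.9):
        max_err = max(max_err, abs(-L(f, x) - n * f(x)))
```

The residual is at the level of the finite-difference error, confirming the claimed discrete spectrum 0, 1, 2, . . . for this example.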

6.7.1 Reduction to a Schrödinger Equation

Lemma 6.7.1. The Fokker–Planck operator for a gradient flow can be written in the self-adjoint form

∂p/∂t = D ∇ · ( e^{−V/D} ∇ ( e^{V/D} p ) ).   (6.56)


Define now ψ(x, t) = e^{V/2D} p(x, t). Then ψ solves the PDE

∂ψ/∂t = D∆ψ − U(x)ψ,   U(x) := |∇V|²/(4D) − ∆V/2.   (6.57)

Let H := −D∆ + U. Then L* and H have the same eigenvalues. The nth eigenfunction φ_n of L* and the nth eigenfunction ψ_n of H are associated through the transformation

ψ_n(x) = φ_n(x) exp( V(x)/(2D) ).

Remarks 6.7.2. i. Equation (6.56) shows that the FP operator can be written in the form

L*· = D ∇ · ( e^{−V/D} ∇ ( e^{V/D} · ) ).

ii. The operator that appears on the right-hand side of eqn. (6.57) has the form of a Schrödinger operator:

H = −D∆ + U(x).

iii. The spectral problem for the FP operator can be transformed into the spectral problem for a Schrödinger operator. We can thus use all the available results from quantum mechanics to study the FP equation and the associated SDE.

iv. In particular, the weak-noise asymptotics D ≪ 1 is equivalent to the semiclassical approximation from quantum mechanics.

Proof. We calculate

D ∇ · ( e^{−V/D} ∇ ( e^{V/D} f ) ) = D ∇ · ( e^{−V/D} ( D^{−1} ∇V f + ∇f ) e^{V/D} )
  = ∇ · ( ∇V f + D∇f ) = L* f.

Consider now the eigenvalue problem for the FP operator:

−L* φ_n = λ_n φ_n.

Set φ_n = ψ_n exp( −V/(2D) ). We calculate −L* φ_n:

−L* φ_n = −D ∇ · ( e^{−V/D} ∇ ( e^{V/D} ψ_n e^{−V/2D} ) )
  = −D ∇ · ( e^{−V/D} ( ∇ψ_n + (∇V/(2D)) ψ_n ) e^{V/2D} )
  = ( −D∆ψ_n + ( |∇V|²/(4D) − ∆V/2 ) ψ_n ) e^{−V/2D} = e^{−V/2D} H ψ_n.


From this we conclude that e^{−V/2D} H ψ_n = λ_n ψ_n e^{−V/2D}, from which the equivalence between the two eigenvalue problems follows.
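The equivalence can be checked numerically in the OU case V(x) = x²/2, D = 1, where U(x) = x²/4 − 1/2 and H is (a shifted) harmonic oscillator with eigenvalues λ_n = n. The sketch below tests Hψ_1 = ψ_1 for ψ_1 = φ_1 e^{V/2D} = x e^{−x²/4}; the evaluation points are arbitrary, and the example is ours, not from the notes.

```python
import math

D = 1.0
def V(x): return 0.5 * x**2                 # OU potential
def U(x): return x**2 / (4 * D) - 0.5       # U = |V'|^2/(4D) - V''/2

def H(f, x, h=1e-4):
    # Schrodinger operator H f = -D f'' + U f, via central differences
    fpp = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return -D * fpp + U(x) * f(x)

# first excited state: psi_1 = phi_1 exp(V/2D) with phi_1 = x exp(-x^2/2)
def psi1(x):
    return x * math.exp(-x**2 / (4 * D))

max_err = max(abs(H(psi1, x) - 1.0 * psi1(x)) for x in (0.4, 0.9, 1.5, 2.2))
```

The residual is at finite-difference accuracy, consistent with λ_1 = 1 for both eigenvalue problems.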

Remarks 6.7.3. i. We can rewrite the Schrödinger operator in the form

H = D A* A,   A = ∇ + ∇V/(2D),   A* = −∇ + ∇V/(2D).

ii. These are creation and annihilation operators. They can also be written in the form

A· = e^{−V/2D} ∇ ( e^{V/2D} · ),   A*· = −e^{V/2D} ∇ ( e^{−V/2D} · ).

iii. The forward and backward Kolmogorov operators have the same eigenvalues. Their eigenfunctions are related through

φ^B_n = φ^F_n exp( V/D ),

where φ^B_n and φ^F_n denote the eigenfunctions of the backward and forward operators, respectively.

6.8 Discussion and Bibliography

The proof of existence and uniqueness of classical solutions for the Fokker-Planck equation of a uniformly elliptic diffusion process with smooth drift and diffusion coefficients, Theorem 6.2.2, can be found in [21]. A standard textbook on PDEs, with a lot of material on parabolic PDEs, is [13]; see in particular Chapters 2 and 7 of this book.

It is important to emphasize that the condition that solutions to the Fokker-Planck equation do not grow too fast, see Definition 6.2.1, is necessary to ensure uniqueness. In fact, there are infinitely many solutions of

∂p/∂t = ∆p  in R^d × (0, T),
p(x, 0) = 0.

Each of these solutions, besides the trivial solution p = 0, grows very rapidly as x → +∞. More details can be found in [34, Ch. 7].


The Fokker-Planck equation is studied extensively in Risken's monograph [64]. See also [25] and [32]. The connection between the Fokker-Planck equation and stochastic differential equations is presented in Chapter 7. See also [1, 22, 23].

Hermite polynomials appear very frequently in applications and they also play a fundamental role in analysis. It is possible to prove that the Hermite polynomials form an orthonormal basis for L²(R^d, ρ_β) without using the fact that they are the eigenfunctions of a symmetric operator with compact resolvent.⁶ The proof of Proposition 6.4.1 can be found in [71], Lemma 2.3.4 in particular.

Diffusion processes in one dimension are studied in [48]. The Feller classification for one-dimensional diffusion processes can also be found in [35, 15].

Convergence to equilibrium for kinetic equations (such as the Fokker-Planck equation), both linear and non-linear (e.g., the Boltzmann equation), has been studied extensively. It has been recognized that the relative entropy and logarithmic Sobolev inequalities play an important role in the analysis of the problem of convergence to equilibrium. For more information see [49].

6.9 Exercises

1. Solve equation (6.13) by taking the Fourier transform, using the method of char-acteristics for first order PDEs and taking the inverse Fourier transform.

2. Use the formula for the stationary joint probability density of the Ornstein-Uhlenbeck process, eqn. (6.17), to obtain the stationary autocorrelation function of the OU process.

3. Use (6.20) to obtain formulas for the moments of the OU process. Prove, using these formulas, that the moments of the OU process converge to their equilibrium values exponentially fast.

4. Show that the autocorrelation function of the stationary Ornstein-Uhlenbeck process is

E(X_t X_0) = ∫_R ∫_R x x_0 p_{OU}(x, t|x_0, 0) p_s(x_0) dx dx_0 = (D/(2α)) e^{−α|t|},

where p_s(x) denotes the invariant Gaussian distribution.

⁶In fact, Poincaré's inequality for Gaussian measures can be proved using the fact that the Hermite polynomials form an orthonormal basis for L²(R^d, ρ_β).


5. Let X_t be a one-dimensional diffusion process with drift and diffusion coefficients a(y, t) = −a_0 − a_1 y and b(y, t) = b_0 + b_1 y + b_2 y², where a_i, b_i > 0, i = 0, 1, 2.

(a) Write down the generator and the forward and backward Kolmogorov equations for X_t.

(b) Assume that X_0 is a random variable with probability density ρ_0(x) that has finite moments. Use the forward Kolmogorov equation to derive a system of differential equations for the moments of X_t.

(c) Find the first three moments M_0, M_1, M_2 in terms of the moments of the initial distribution ρ_0(x).

(d) Under what conditions on the coefficients a_i, b_i > 0, i = 0, 1, 2 is M_2 finite for all times?

6. Let V be a confining potential in R^d, β > 0, and let ρ_β(x) = Z^{−1} e^{−βV(x)}. Give the definition of the Sobolev space H^k(R^d; ρ_β) for k a positive integer and study some of its basic properties.

7. Let X_t be a multidimensional diffusion process on [0, 1]^d with periodic boundary conditions. The drift vector is a periodic function a(x) and the diffusion matrix is 2DI, where D > 0 and I is the identity matrix.

(a) Write down the generator and the forward and backward Kolmogorov equations for X_t.

(b) Assume that a(x) is divergence-free (∇ · a(x) = 0). Show that X_t is ergodic and find the invariant distribution.

(c) Show that the probability density p(x, t) (the solution of the forward Kolmogorov equation) converges to the invariant distribution exponentially fast in L²([0, 1]^d). (Hint: use Poincaré's inequality on [0, 1]^d.)

8. The Rayleigh process X_t is a diffusion process that takes values on (0, +∞) with drift and diffusion coefficients a(x) = −ax + D/x and b(x) = 2D, respectively, where a, D > 0.

(a) Write down the generator and the forward and backward Kolmogorov equations for X_t.

(b) Show that this process is ergodic and find its invariant distribution.


(c) Solve the forward Kolmogorov (Fokker-Planck) equation using separation of variables. (Hint: use Laguerre polynomials.)

9. Let {x(t), y(t)} be the two-dimensional diffusion process on [0, 2π]² with periodic boundary conditions, with drift vector a(x, y) = (sin(y), sin(x)) and diffusion matrix b(x, y) with b_{11} = b_{22} = 1, b_{12} = b_{21} = 0.

(a) Write down the generator of the process {x(t), y(t)} and the forward and backward Kolmogorov equations.

(b) Show that the constant function

ρ_s(x, y) = C

is the unique stationary distribution of the process {x(t), y(t)} and calculate the normalization constant.

(c) Let E denote the expectation with respect to the invariant distribution ρ_s(x, y). Calculate

E( cos(x) + cos(y) )  and  E( sin(x) sin(y) ).

10. Let a, D be positive constants and let X(t) be the diffusion process on [0, 1] with periodic boundary conditions and with drift and diffusion coefficients a(x) = a and b(x) = 2D, respectively. Assume that the process starts at x_0: X(0) = x_0.

(a) Write down the generator of the process X(t) and the forward and backward Kolmogorov equations.

(b) Solve the initial/boundary value problem for the forward Kolmogorov equation to calculate the transition probability density p(x, t|x_0, 0).

(c) Show that the process is ergodic and calculate the invariant distribution p_s(x).

(d) Calculate the stationary autocorrelation function

E(X(t)X(0)) = ∫_0^1 ∫_0^1 x x_0 p(x, t|x_0, 0) p_s(x_0) dx dx_0.


Chapter 7

Stochastic Differential Equations

7.1 Introduction

In this part of the course we will study stochastic differential equations (SDEs): ODEs driven by Gaussian white noise.

Let W(t) denote a standard m-dimensional Brownian motion, h : Z → R^d a smooth vector-valued function and γ : Z → R^{d×m} a smooth matrix-valued function (in this course we will take Z = T^d, R^d or R^l ⊕ T^{d−l}). Consider the SDE

dz/dt = h(z) + γ(z) dW/dt,   z(0) = z_0.   (7.1)

We think of the term dW/dt as representing Gaussian white noise: a mean-zero Gaussian process with correlation δ(t − s)I. The function h in (7.1) is sometimes referred to as the drift and γ as the diffusion coefficient. Such a process exists only as a distribution. The precise interpretation of (7.1) is as an integral equation for z(t) ∈ C(R^+, Z):

z(t) = z_0 + ∫_0^t h(z(s)) ds + ∫_0^t γ(z(s)) dW(s).   (7.2)

In order to make sense of this equation we need to define the stochastic integral against W(s).


7.2 The Ito and Stratonovich Stochastic Integral

For the rigorous analysis of stochastic differential equations it is necessary to define stochastic integrals of the form

I(t) = ∫_0^t f(s) dW(s),   (7.3)

where W(t) is a standard one-dimensional Brownian motion. This is not straightforward because W(t) does not have bounded variation. In order to define the stochastic integral we assume that f(t) is a random process, adapted to the filtration F_t generated by the process W(t), and such that

E( ∫_0^T f(s)² ds ) < ∞.

The Ito stochastic integral I(t) is defined as the L²-limit of the Riemann sum approximation of (7.3):

I(t) := lim_{K→∞} ∑_{k=1}^{K−1} f(t_{k−1}) ( W(t_k) − W(t_{k−1}) ),   (7.4)

where t_k = k∆t and K∆t = t. Notice that the function f(t) is evaluated at the left end of each interval [t_{k−1}, t_k] in (7.4). The resulting Ito stochastic integral I(t) is a.s. continuous in t. These ideas are readily generalized to the case where W(s) is a standard d-dimensional Brownian motion and f(s) ∈ R^{m×d} for each s.

The resulting integral satisfies the Ito isometry

E|I(t)|² = ∫_0^t E|f(s)|²_F ds,   (7.5)

where |·|_F denotes the Frobenius norm |A|_F = √(tr(A^T A)). The Ito stochastic integral is a martingale:

EI(t) = 0

and

E[I(t)|F_s] = I(s)  ∀ t ≥ s,

where F_s denotes the filtration generated by W(s).
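The Ito isometry (7.5) can be verified by Monte Carlo for a deterministic integrand, say f(s) = s, for which E|I(T)|² = ∫_0^T s² ds = T³/3. A sketch (all discretization parameters and the seed are illustrative):

```python
import math
import random

rng = random.Random(1)
T, N, M = 1.0, 100, 20_000
dt = T / N

acc = 0.0
for _ in range(M):
    I = 0.0
    t = 0.0
    for k in range(N):
        dW = rng.gauss(0.0, math.sqrt(dt))
        I += t * dW            # integrand f(s) = s, evaluated at the left endpoint
        t += dt
    acc += I * I
second_moment = acc / M        # estimates E|I(T)|^2 = T^3/3 ≈ 0.333
```

The sample second moment matches T³/3 up to Monte Carlo and Riemann-sum error.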


Example 7.2.1. Consider the Ito stochastic integral

I(t) = ∫_0^t f(s) dW(s),

where f, W are scalar-valued. This is a martingale with quadratic variation

⟨I⟩_t = ∫_0^t (f(s))² ds.

More generally, for f, W in arbitrary finite dimensions, the integral I(t) is a martingale with quadratic variation

⟨I⟩_t = ∫_0^t ( f(s) ⊗ f(s) ) ds.

7.2.1 The Stratonovich Stochastic Integral

In addition to the Ito stochastic integral, we can also define the Stratonovich stochastic integral. It is defined as the L²-limit of a different Riemann sum approximation of (7.3), namely

I_strat(t) := lim_{K→∞} ∑_{k=1}^{K−1} ½ ( f(t_{k−1}) + f(t_k) ) ( W(t_k) − W(t_{k−1}) ),   (7.6)

where t_k = k∆t and K∆t = t. Notice that the function f(t) is evaluated at both endpoints of each interval [t_{k−1}, t_k] in (7.6). The multidimensional Stratonovich integral is defined in a similar way. The resulting integral is written as

I_strat(t) = ∫_0^t f(s) ∘ dW(s).

The limit in (7.6) gives rise to an integral which differs from the Ito integral. The situation is more complex than that arising in the standard theory of Riemann integration for functions of bounded variation: in that case the points in [t_{k−1}, t_k] where the integrand is evaluated do not affect the definition of the integral, via a limiting process. In the case of integration against Brownian motion, which does not have bounded variation, the limits differ. When f and W are correlated through an SDE, then a formula exists to convert between them.
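The difference between the two Riemann sums is visible already for f = W: the Stratonovich (trapezoidal) sum telescopes exactly to W(T)²/2, while the Ito (left-endpoint) sum converges to W(T)²/2 − T/2. A quick sketch (step count and seed arbitrary):

```python
import math
import random

rng = random.Random(2)
T, N = 1.0, 200_000
dt = T / N

# one Brownian path on a fine grid
W = [0.0]
for _ in range(N):
    W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))

ito = sum(W[k] * (W[k + 1] - W[k]) for k in range(N))                       # left endpoint
strat = sum(0.5 * (W[k] + W[k + 1]) * (W[k + 1] - W[k]) for k in range(N))  # both endpoints

strat_exact = 0.5 * W[-1] ** 2            # trapezoidal sum telescopes exactly
ito_exact = 0.5 * W[-1] ** 2 - 0.5 * T    # Ito limit: W(T)^2/2 - T/2
```

The Stratonovich sum agrees with W(T)²/2 to rounding error on every path; the Ito sum differs from it by ½∑(∆W)² ≈ T/2.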


7.3 Stochastic Differential Equations

Definition 7.3.1. By a solution of (7.1) we mean a Z-valued stochastic process z(t) on t ∈ [0, T] with the properties:

i. z(t) is continuous and F_t-adapted, where the filtration is generated by the Brownian motion W(t);

ii. h(z(t)) ∈ L¹((0, T)), γ(z(t)) ∈ L²((0, T));

iii. equation (7.1) holds for every t ∈ [0, T] with probability 1.

The solution is called unique if any two solutions x_i(t), i = 1, 2 satisfy

P( x_1(t) = x_2(t), ∀ t ∈ [0, T] ) = 1.

It is well known that existence and uniqueness of solutions for ODEs (i.e. when γ ≡ 0 in (7.1)) holds for globally Lipschitz vector fields h(x). A very similar theorem holds when γ ≠ 0. As for ODEs, the conditions can be weakened when a priori bounds on the solution can be found.

Theorem 7.3.2. Assume that both h(·) and γ(·) are globally Lipschitz on Z and that z_0 is a random variable independent of the Brownian motion W(t) with

E|z_0|² < ∞.

Then the SDE (7.1) has a unique solution z(t) ∈ C(R^+; Z) with

E[ ∫_0^T |z(t)|² dt ] < ∞  ∀ T < ∞.

Furthermore, the solution of the SDE is a Markov process.

The Stratonovich analogue of (7.1) is

dz/dt = h(z) + γ(z) ∘ dW/dt,   z(0) = z_0.   (7.7)

By this we mean that z ∈ C(R^+, Z) satisfies the integral equation

z(t) = z(0) + ∫_0^t h(z(s)) ds + ∫_0^t γ(z(s)) ∘ dW(s).   (7.8)


By using definitions (7.4) and (7.6) it can be shown that z satisfying the Stratonovich SDE (7.7) also satisfies the Ito SDE

dz/dt = h(z) + ½ ∇·( γ(z)γ(z)^T ) − ½ γ(z) ∇·( γ(z)^T ) + γ(z) dW/dt,   (7.9a)
z(0) = z_0,   (7.9b)

provided that γ(z) is differentiable. White noise is, in most applications, an idealization of a stationary random process with short correlation time. In this context the Stratonovich interpretation of an SDE is particularly important because it often arises as the limit obtained by using smooth approximations to white noise. On the other hand the martingale machinery which comes with the Ito integral makes it more important as a mathematical object. It is very useful that we can convert from the Ito to the Stratonovich interpretation of the stochastic integral. There are other interpretations of the stochastic integral, e.g. the Klimontovich stochastic integral.

The definition of Brownian motion implies the scaling property

W(ct) = √c W(t),

where the above should be interpreted as holding in law. From this it follows that, if s = ct, then

dW/ds = (1/√c) dW/dt,

again in law. Hence, if we scale time to s = ct in (7.1), then we get the equation

dz/ds = (1/c) h(z) + (1/√c) γ(z) dW/ds,   z(0) = z_0.

7.3.1 Examples of SDEs

The SDE for Brownian motion is

dX = √(2σ) dW,   X(0) = x.

The solution is X(t) = x + √(2σ) W(t).

The SDE for the Ornstein-Uhlenbeck process is

dX = −αX dt + √(2λ) dW,   X(0) = x.


We can solve this equation using the variation of constants formula:

X(t) = e^{−αt} x + √(2λ) ∫_0^t e^{−α(t−s)} dW(s).

We can use Ito's formula to obtain equations for the moments of the OU process. The generator is

L = −αx∂_x + λ∂_x².

We apply Ito's formula to the function f(x) = x^n to obtain:

dX(t)^n = LX(t)^n dt + √(2λ) ∂_x X(t)^n dW
        = ( −αnX(t)^n + λn(n − 1)X(t)^{n−2} ) dt + n√(2λ) X(t)^{n−1} dW.

Consequently:

X(t)^n = x^n + ∫_0^t ( −αnX(s)^n + λn(n − 1)X(s)^{n−2} ) ds + n√(2λ) ∫_0^t X(s)^{n−1} dW(s).

By taking the expectation in the above equation we obtain the equation for the moments of the OU process that we derived earlier using the Fokker-Planck equation:

M_n(t) = x^n + ∫_0^t ( −αnM_n(s) + λn(n − 1)M_{n−2}(s) ) ds.
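The moment equation can be checked against an Euler–Maruyama simulation of the OU SDE (a discretization scheme not discussed in this section, used here only as a tool); for n = 2 it gives M_2(t) = λ/α + (x² − λ/α)e^{−2αt}. All parameter values below are illustrative.

```python
import math
import random

rng = random.Random(3)
alpha, lam, x0 = 1.0, 0.5, 2.0
T, dt = 1.0, 0.01
n_steps = int(T / dt)
n_paths = 5_000

acc = 0.0
for _ in range(n_paths):
    x = x0
    for _ in range(n_steps):
        # Euler-Maruyama step for dX = -alpha X dt + sqrt(2 lam) dW
        x += -alpha * x * dt + math.sqrt(2 * lam * dt) * rng.gauss(0.0, 1.0)
    acc += x * x
m2_mc = acc / n_paths

# closed form from dM2/dt = -2 alpha M2 + 2 lam:
m2_exact = lam / alpha + (x0**2 - lam / alpha) * math.exp(-2 * alpha * T)
```

The Monte Carlo estimate of E X(T)² agrees with the moment equation up to statistical and discretization error.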

Consider the geometric Brownian motion

dX(t) = μX(t) dt + σX(t) dW(t),   (7.10)

where we use the Ito interpretation of the stochastic differential. The generator of this process is

L = μx∂_x + (σ²x²/2) ∂_x².

The solution to this equation is

X(t) = X(0) exp( (μ − σ²/2)t + σW(t) ).   (7.11)


To derive this formula, we apply Ito's formula to the function f(x) = log(x):

d log(X(t)) = L( log(X(t)) ) dt + σx ∂_x log(X(t)) dW(t)
  = ( μx (1/x) + (σ²x²/2)(−1/x²) ) dt + σ dW(t)
  = ( μ − σ²/2 ) dt + σ dW(t).

Consequently:

log( X(t)/X(0) ) = ( μ − σ²/2 ) t + σW(t),

from which (7.11) follows. Notice that the Stratonovich interpretation of this equation leads to the solution

X(t) = X(0) exp( μt + σW(t) ).
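The exact solution (7.11) provides a convenient test for a numerical scheme: driving an Euler–Maruyama discretization of (7.10) and the closed-form solution with the same Brownian increments, the two should agree pathwise up to the strong discretization error. A sketch with illustrative parameters:

```python
import math
import random

rng = random.Random(4)
mu, sigma, x0 = 0.1, 0.2, 1.0
T, n = 1.0, 10_000
dt = T / n

x_em, W = x0, 0.0
for _ in range(n):
    dW = rng.gauss(0.0, math.sqrt(dt))
    x_em += mu * x_em * dt + sigma * x_em * dW   # Euler-Maruyama (Ito) step
    W += dW

# exact Ito solution (7.11), driven by the same Brownian path
x_exact = x0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * W)
rel_err = abs(x_em - x_exact) / x_exact
```

The pathwise relative error shrinks like √∆t, consistent with the strong order 1/2 of the scheme.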

7.4 The Generator, Ito’s formula and the Fokker-PlanckEquation

7.4.1 The Generator

Given the function γ(z) in the SDE (7.1) we define

Γ(z) = γ(z)γ(z)^T.   (7.12)

The generator L is then defined as

Lv = h · ∇v + ½ Γ : ∇∇v.   (7.13)

This operator, equipped with a suitable domain of definition, is the generator of the Markov process given by (7.1). The formal L²-adjoint operator L* is

L*v = −∇ · (hv) + ½ ∇ · ∇ · (Γv).

7.4.2 Ito’s Formula

The Ito formula enables us to calculate the rate of change in time of functions V : Z → R^n evaluated at the solution of a Z-valued SDE. Formally, we can write:

d/dt ( V(z(t)) ) = LV(z(t)) + ⟨ ∇V(z(t)), γ(z(t)) dW/dt ⟩.


Note that if W were a smooth time-dependent function this formula would not be correct: there is an additional term in LV, proportional to Γ, which arises from the lack of smoothness of Brownian motion. The precise interpretation of the expression for the rate of change of V is in integrated form:

Lemma 7.4.1. (Ito's Formula) Assume that the conditions of Theorem 7.3.2 hold. Let z(t) solve (7.1) and let V ∈ C²(Z, R^n). Then the process V(z(t)) satisfies

V(z(t)) = V(z(0)) + ∫_0^t LV(z(s)) ds + ∫_0^t ⟨ ∇V(z(s)), γ(z(s)) dW(s) ⟩.

Let φ : Z → R and consider the function

v(z, t) = E( φ(z(t)) | z(0) = z ),   (7.14)

where the expectation is with respect to all Brownian driving paths. By averaging in the Ito formula, which removes the stochastic integral, and using the Markov property, it is possible to obtain the backward Kolmogorov equation.

Theorem 7.4.2. Assume that φ is chosen sufficiently smooth so that the backward Kolmogorov equation

∂v/∂t = Lv  for (z, t) ∈ Z × (0, ∞),
v = φ  for (z, t) ∈ Z × {0},   (7.15)

has a unique classical solution v(z, t) ∈ C^{2,1}(Z × (0, ∞)). Then v is given by (7.14), where z(t) solves (7.2).

For a Stratonovich SDE the rules of standard calculus apply: consider the Stratonovich SDE (7.29) and let V(x) ∈ C²(R). Then

dV(X(t)) = (dV/dx)(X(t)) ( f(X(t)) dt + σ(X(t)) ∘ dW(t) ).

Consider the Stratonovich SDE (7.29) on R^d (i.e. f ∈ R^d, σ : R^d → R^{d×n}, W(t) is standard Brownian motion on R^n). The corresponding Fokker-Planck equation is

∂ρ/∂t = −∇ · (fρ) + ½ ∇ · ( σ ∇ · (σρ) ).   (7.16)

Now we can derive rigorously the Fokker-Planck equation.


Theorem 7.4.3. Consider equation (7.2) with z(0) a random variable with density ρ_0(z). Assume that the law of z(t) has a density ρ(z, t) ∈ C^{2,1}(Z × (0, ∞)). Then ρ satisfies the Fokker-Planck equation

∂ρ/∂t = L*ρ  for (z, t) ∈ Z × (0, ∞),   (7.17a)
ρ = ρ_0  for z ∈ Z × {0}.   (7.17b)

Proof. Let E^μ denote averaging with respect to the product measure induced by the measure μ with density ρ_0 on z(0) and the independent driving Wiener measure on the SDE itself. Averaging over random z(0) distributed with density ρ_0(z), we find

E^μ(φ(z(t))) = ∫_Z v(z, t) ρ_0(z) dz = ∫_Z (e^{Lt}φ)(z) ρ_0(z) dz = ∫_Z (e^{L*t}ρ_0)(z) φ(z) dz.

But since ρ(z, t) is the density of z(t) we also have

E^μ(φ(z(t))) = ∫_Z ρ(z, t) φ(z) dz.

Equating these two expressions for the expectation at time t we obtain

∫_Z (e^{L*t}ρ_0)(z) φ(z) dz = ∫_Z ρ(z, t) φ(z) dz.

We use a density argument so that the identity can be extended to all φ ∈ L²(Z). Hence, from the above equation we deduce that

ρ(z, t) = ( e^{L*t}ρ_0 )(z).

Differentiation of the above equation gives (7.17a). Setting t = 0 gives the initial condition (7.17b).

7.5 Linear SDEs

In this section we study linear SDEs in arbitrary finite dimensions. Let A ∈ R^{d×d} be a symmetric positive definite matrix and let D > 0 be a positive constant. We will consider the SDE

dX(t) = −AX(t) dt + √(2D) dW(t)


or, componentwise,

dX_i(t) = −∑_{j=1}^d A_{ij} X_j(t) dt + √(2D) dW_i(t),   i = 1, . . . , d.

The corresponding Fokker-Planck equation is

∂p/∂t = ∇ · (Axp) + D∆p

or

∂p/∂t = ∑_{i,j=1}^d ∂_{x_i}( A_{ij} x_j p ) + D ∑_{j=1}^d ∂²p/∂x_j².

Let us now solve the Fokker-Planck equation with initial condition p(x, 0|x_0, 0) = δ(x − x_0). We take the Fourier transform of the Fokker-Planck equation to obtain

∂p̂/∂t = −Ak · ∇_k p̂ − D|k|² p̂   (7.18)

with

p(x, t|x_0, 0) = (2π)^{−d} ∫_{R^d} e^{ik·x} p̂(k, t|x_0, 0) dk.

The initial condition is

p̂(k, 0|x_0, 0) = e^{−ik·x_0}.   (7.19)

We know that the transition probability density of a linear SDE is Gaussian. Since the Fourier transform of a Gaussian function is also Gaussian, we look for a solution to (7.18) of the form

p̂(k, t|x_0, 0) = exp( −ik · M(t) − ½ k^T Σ(t) k ).

We substitute this into (7.18) and use the symmetry of A to obtain the equations

dM/dt = −AM  and  dΣ/dt = −2AΣ + 2DI,

with initial conditions (which follow from (7.19)) M(0) = x_0 and Σ(0) = 0, where 0 denotes the zero d × d matrix. We can solve these equations using the spectral resolution of A = B^T ΛB. The solutions are

M(t) = e^{−At} M(0)


and

Σ(t) = DA^{−1} − DA^{−1} e^{−2At}.

We calculate now the inverse Fourier transform of p̂ to obtain the fundamental solution (Green's function) of the Fokker-Planck equation

p(x, t|x_0, 0) = (2π)^{−d/2} ( det(Σ(t)) )^{−1/2} exp( −½ ( x − e^{−At}x_0 )^T Σ^{−1}(t) ( x − e^{−At}x_0 ) ).   (7.20)

We note that the generator of the Markov process X_t is of the form

L = −∇V(x) · ∇ + D∆

with V(x) = ½ x^T Ax = ½ ∑_{i,j=1}^d A_{ij} x_i x_j. This is a confining potential and from the theory presented in Section 6.5 we know that the process X_t is ergodic. The invariant distribution is

p_s(x) = (1/Z) e^{−x^T Ax/(2D)}   (7.21)

with Z = ∫_{R^d} e^{−x^T Ax/(2D)} dx = (2πD)^{d/2} √(det(A^{−1})). Using the above calculations, the stationary autocorrelation matrix is given by the formula

E(X_0^T X_t) = ∫∫ x_0^T x p(x, t|x_0, 0) p_s(x_0) dx dx_0.

We substitute the formulas for the transition probability density and the stationary distribution, equations (7.20) and (7.21), into the above equation and do the Gaussian integration to obtain

E(X_0^T X_t) = DA^{−1} e^{−At}.

We now use the variation of constants formula to obtain

X_t = e^{−At} X_0 + √(2D) ∫_0^t e^{−A(t−s)} dW(s).

The matrix exponential can be calculated using the spectral resolution of A:

e^{−At} = B^T e^{−Λt} B.
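The formula for Σ(t) can be sanity-checked by integrating the matrix ODE dΣ/dt = −2AΣ + 2DI directly and comparing with DA^{−1}(I − e^{−2At}). The sketch below does this for an illustrative symmetric 2×2 matrix A, using a Taylor-series matrix exponential (adequate for this small example, though not a robust general-purpose method).

```python
D = 1.0
A = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite, example values
I2 = [[1.0, 0.0], [0.0, 1.0]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(X, Y, a=1.0, b=1.0):
    # returns a*X + b*Y
    return [[a * X[i][j] + b * Y[i][j] for j in range(2)] for i in range(2)]

def expm(X, terms=60):
    # matrix exponential by truncated Taylor series
    S, P = I2, I2
    for k in range(1, terms):
        P = [[v / k for v in row] for row in mat_mul(P, X)]
        S = mat_add(S, P)
    return S

def inv2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

t = 1.0
# closed form: Sigma(t) = D A^{-1} (I - e^{-2At})
E = expm([[-2 * t * A[i][j] for j in range(2)] for i in range(2)])
Sigma_formula = mat_mul([[D * v for v in row] for row in inv2(A)], mat_add(I2, E, 1.0, -1.0))

# integrate dSigma/dt = -2 A Sigma + 2 D I from Sigma(0) = 0 by forward Euler
dt, Sigma = 1e-4, [[0.0, 0.0], [0.0, 0.0]]
for _ in range(int(t / dt)):
    dS = mat_add(mat_mul(A, Sigma), I2, -2.0, 2.0 * D)
    Sigma = mat_add(Sigma, dS, 1.0, dt)

max_diff = max(abs(Sigma[i][j] - Sigma_formula[i][j]) for i in range(2) for j in range(2))
```

The two computations agree entrywise up to the Euler discretization error, and the covariance is symmetric, as it must be for symmetric A.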


7.6 Derivation of the Stratonovich SDE

When white noise is approximated by a smooth process this often leads to the Stratonovich interpretation of stochastic integrals, at least in one dimension. We use multiscale analysis (singular perturbation theory for Markov processes) to illustrate this phenomenon in a one-dimensional example.

Consider the equations

dx/dt = h(x) + (1/ε) f(x) y,   (7.22a)
dy/dt = −(α/ε²) y + √(2D/ε²) dV/dt,   (7.22b)

with V being a standard one-dimensional Brownian motion. We say that the process x(t) is driven by colored noise: the noise that appears in (7.22a) has non-zero correlation time. The correlation function of the colored noise η(t) := y(t)/ε is (we take y(0) = 0)

R(t) = E( η(t)η(s) ) = (1/ε²) (D/α) e^{−(α/ε²)|t−s|}.

The power spectrum of the colored noise η(t) is

f^ε(x) = (1/ε²) (Dε^{−2}/π) · 1/( x² + (αε^{−2})² ) = (D/π) · 1/( ε⁴x² + α² ) → D/(πα²)

and, consequently,

lim_{ε→0} E( (y(t)/ε)(y(s)/ε) ) = (2D/α²) δ(t − s),

which implies the heuristic

lim_{ε→0} y(t)/ε = (√(2D)/α) dV/dt.   (7.23)

Another way of seeing this is by solving (7.22b) for y/ε:

y/ε = (√(2D)/α) dV/dt − (ε/α) dy/dt.   (7.24)


If we neglect the O(ε) term on the right hand side then we arrive, again, at the heuristic (7.23). Both of these arguments lead us to conjecture the limiting Ito SDE

dX/dt = h(X) + (√(2D)/α) f(X) dV/dt.   (7.25)

In fact, as applied, the heuristic gives the incorrect limit. Whenever white noise is approximated by a smooth process, the limiting equation should be interpreted in the Stratonovich sense, giving

dX/dt = h(X) + (√(2D)/α) f(X) ∘ dV/dt.   (7.26)

This is usually called the Wong-Zakai theorem. A similar result is true in arbitrary finite and even infinite dimensions. We will show this using singular perturbation theory.

Theorem 7.6.1. Assume that the initial conditions for y(t) are stationary and that the function f is smooth. Then the solution of eqn (7.22a) converges, in the limit as ε → 0, to the solution of the Stratonovich SDE (7.26).

Remarks 7.6.2. i. It is possible to prove pathwise convergence under very mild assumptions.

ii. The generator of a Stratonovich SDE has the form

L_strat = h(x)∂_x + (D/α²) f(x)∂_x ( f(x)∂_x ).

iii. Consequently, the Fokker-Planck operator of the Stratonovich SDE can be written in divergence form:

L*_strat · = −∂_x ( h(x) · ) + (D/α²) ∂_x ( f(x) ∂_x ( f(x) · ) ).

iv. In most applications in physics the white noise is an approximation of a more complicated noise process with non-zero correlation time. Hence, the physically correct interpretation of the stochastic integral is the Stratonovich one.

v. In higher dimensions an additional drift term might appear due to the non-commutativity of the row vectors of the diffusion matrix. This is related to the Lévy area correction in the theory of rough paths.


Proof of Theorem 7.6.1. The generator of the process (x(t), y(t)) is

L = (1/ε²) ( −αy∂_y + D∂_y² ) + (1/ε) f(x)y∂_x + h(x)∂_x
  =: (1/ε²) L_0 + (1/ε) L_1 + L_2.

The "fast" process is a stationary Markov process with invariant density

ρ(y) = √( α/(2πD) ) e^{−αy²/(2D)}.   (7.27)

The backward Kolmogorov equation is

∂u^ε/∂t = ( (1/ε²) L_0 + (1/ε) L_1 + L_2 ) u^ε.   (7.28)

We look for a solution to this equation in the form of a power series expansion in ε:

u^ε(x, y, t) = u_0 + εu_1 + ε²u_2 + . . .

We substitute this into (7.28) and equate terms of the same power in ε to obtain the following hierarchy of equations:

−L_0 u_0 = 0,
−L_0 u_1 = L_1 u_0,
−L_0 u_2 = L_1 u_1 + L_2 u_0 − ∂u_0/∂t.

The ergodicity of the fast process implies that the null space of the generator L_0 consists only of constants in y. Hence:

u_0 = u(x, t).

The second equation in the hierarchy becomes

−L_0 u_1 = f(x) y ∂_x u.

This equation is solvable since the right hand side is orthogonal to the null space of the adjoint of L_0 (this is the Fredholm alternative). We solve it using separation of variables:

u_1(x, y, t) = (1/α) f(x) ∂_x u · y + ψ_1(x, t).


In order for the third equation to have a solution we need to require that the right hand side is orthogonal to the null space of L_0*:

∫_R ( L_1 u_1 + L_2 u_0 − ∂u_0/∂t ) ρ(y) dy = 0.

We calculate:

∫_R (∂u_0/∂t) ρ(y) dy = ∂u/∂t.

Furthermore:

∫_R L_2 u_0 ρ(y) dy = h(x) ∂_x u.

Finally:

∫_R L_1 u_1 ρ(y) dy = ∫_R f(x) y ∂_x ( (1/α) f(x) ∂_x u · y + ψ_1(x, t) ) ρ(y) dy
  = (1/α) f(x) ∂_x ( f(x) ∂_x u ) ⟨y²⟩ + f(x) ∂_x ψ_1(x, t) ⟨y⟩
  = (D/α²) f(x) ∂_x ( f(x) ∂_x u )
  = (D/α²) f(x) ∂_x f(x) ∂_x u + (D/α²) f(x)² ∂_x² u.

Putting everything together we obtain the limiting backward Kolmogorov equation

∂u/∂t = ( h(x) + (D/α²) f(x) ∂_x f(x) ) ∂_x u + (D/α²) f(x)² ∂_x² u,

from which we read off the limiting Stratonovich SDE

dX/dt = h(X) + (√(2D)/α) f(X) ∘ dV/dt.
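The theorem can be probed numerically in the linear case h(x) = −x, f(x) = x, where the Stratonovich limit dX = −X dt + (√(2D)/α) X ∘ dV predicts E log X(T) = log X(0) − T, while the Ito interpretation (7.25) would predict an extra drift −(D/α²)T in the logarithm. The sketch below simulates (7.22) with a small ε (exact transition updates for the fast OU variable, an Euler step for the slow one); all parameter values are illustrative.

```python
import math
import random

rng = random.Random(5)
alpha, Dn, eps = 1.0, 1.0, 0.1   # Dn plays the role of D in (7.22b)
T, dt = 1.0, 5e-4
n = int(T / dt)
rho = math.exp(-alpha * dt / eps**2)
y_std = math.sqrt(Dn / alpha)    # stationary standard deviation of y

def one_path():
    # dx/dt = -x + (1/eps) x y, i.e. h(x) = -x, f(x) = x
    x = 1.0
    y = rng.gauss(0.0, y_std)    # stationary initial condition for y
    for _ in range(n):
        x += (-x + x * y / eps) * dt                                       # Euler, slow variable
        y = rho * y + y_std * math.sqrt(1 - rho**2) * rng.gauss(0.0, 1.0)  # exact OU step
    return math.log(x)

mean_log = sum(one_path() for _ in range(400)) / 400

strat_pred = -T                              # Stratonovich limit
ito_pred = -T - (Dn / alpha**2) * T          # what the naive Ito heuristic would give
```

The sample mean of log x(T) matches the Stratonovich prediction and is far from the Ito one, consistent with the Wong–Zakai theorem.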

7.6.1 Ito versus Stratonovich

A Stratonovich SDE

dX(t) = f(X(t)) dt + σ(X(t)) ∘ dW(t) (7.29)

can be written as an Ito SDE

dX(t) = ( f(X(t)) + ½ (σ dσ/dx)(X(t)) ) dt + σ(X(t)) dW(t).


Conversely, an Ito SDE

dX(t) = f(X(t)) dt + σ(X(t)) dW(t) (7.30)

can be written as a Stratonovich SDE

dX(t) = ( f(X(t)) − ½ (σ dσ/dx)(X(t)) ) dt + σ(X(t)) ∘ dW(t).

The Ito and Stratonovich interpretations of an SDE can lead to equations with very different properties!

When the diffusion coefficient depends on the solution of the SDE X(t), we will say that we have an equation with multiplicative noise.
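The correction term can be checked numerically. In the sketch below (all parameter values are illustrative), geometric Brownian motion is integrated once with the Heun predictor–corrector scheme, which converges to the Stratonovich solution, and once with Euler–Maruyama applied to the Ito SDE carrying the correction ½σσ′; both should reproduce the Stratonovich mean E X_t = e^{(μ+σ²/2)t}.

```python
import numpy as np

# Geometric Brownian motion with drift mu*X and diffusion sigma*X.
# Heun converges to the Stratonovich solution; Euler-Maruyama applied to the
# Ito SDE with corrected drift (mu + sigma^2/2) X should match it.
# All parameters are illustrative choices.
rng = np.random.default_rng(1)
mu, sigma, T, nsteps, npaths = 0.1, 0.5, 1.0, 1_000, 20_000
dt = T / nsteps

x_heun = np.ones(npaths)   # Stratonovich interpretation via Heun
x_em = np.ones(npaths)     # Ito SDE with the Stratonovich correction
for _ in range(nsteps):
    dW = rng.standard_normal(npaths) * np.sqrt(dt)
    # Heun predictor-corrector step
    xp = x_heun + mu * x_heun * dt + sigma * x_heun * dW
    x_heun += mu * 0.5 * (x_heun + xp) * dt + sigma * 0.5 * (x_heun + xp) * dW
    # Euler-Maruyama with the corrected Ito drift
    x_em += (mu + 0.5 * sigma**2) * x_em * dt + sigma * x_em * dW

# Both sample means should approximate exp((mu + sigma^2/2) T)
print(x_heun.mean(), x_em.mean(), np.exp((mu + 0.5 * sigma**2) * T))
```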

7.7 Numerical Solution of SDEs

7.8 Parameter Estimation for SDEs

7.9 Noise Induced Transitions

Consider the Landau equation:

dX_t/dt = X_t(c − X_t²), X_0 = x. (7.31)

This is a gradient flow for the potential V(x) = ¼x⁴ − ½cx². When c < 0 all solutions are attracted to the single steady state X* = 0. When c > 0 the steady state X* = 0 becomes unstable and X_t → √c if x > 0, X_t → −√c if x < 0. Consider additive random perturbations of the Landau equation:

dX_t/dt = X_t(c − X_t²) + √(2σ) dW_t/dt, X_0 = x. (7.32)

This equation defines an ergodic Markov process on R: there exists a unique invariant distribution

ρ(x) = Z⁻¹ e^{−V(x)/σ}, Z = ∫_R e^{−V(x)/σ} dx, V(x) = ¼x⁴ − ½cx².

ρ(x) is a probability density for all values of c ∈ R. The presence of additive noise in some sense "trivializes" the dynamics. The dependence of various averaged quantities on c resembles the physical situation of a second-order phase transition.
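A minimal Euler–Maruyama experiment (all parameters are illustrative choices) confirming that an ensemble simulated from (7.32) reproduces the second moment of the invariant density ρ(x) ∝ e^{−V(x)/σ}:

```python
import numpy as np

# Euler-Maruyama for dX = X(c - X^2) dt + sqrt(2*sigma) dW (7.32).
# Parameters and step sizes are illustrative.
rng = np.random.default_rng(2)
c, sigma, dt, nsteps, npaths = 1.0, 0.5, 1e-2, 4_000, 5_000

x = rng.standard_normal(npaths)   # arbitrary initial ensemble
for _ in range(nsteps):
    dW = rng.standard_normal(npaths) * np.sqrt(dt)
    x += x * (c - x**2) * dt + np.sqrt(2 * sigma) * dW

# Second moment of rho ~ exp(-V/sigma), V = x^4/4 - c x^2/2,
# computed with a Riemann sum on a uniform grid.
grid = np.linspace(-4, 4, 4001)
w = np.exp(-(grid**4 / 4 - c * grid**2 / 2) / sigma)
m2_exact = np.sum(grid**2 * w) / np.sum(w)
print(np.mean(x**2), m2_exact)
```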


Consider now multiplicative perturbations of the Landau equation:

dX_t/dt = X_t(c − X_t²) + √(2σ) X_t dW_t/dt, X_0 = x, (7.33)

where the stochastic differential is interpreted in the Ito sense. The generator of this process is

L = x(c − x²) ∂_x + σx² ∂_x².

Notice that X_t = 0 is always a solution of (7.33). Thus, if we start with x > 0 (x < 0) the solution will remain positive (negative). We will assume that x > 0. Consider the function Y_t = log(X_t). We apply Ito's formula to this function:

dY_t = L log(X_t) dt + √(2σ) X_t ∂_x log(X_t) dW_t
 = ( X_t(c − X_t²) (1/X_t) − σX_t² (1/X_t²) ) dt + √(2σ) X_t (1/X_t) dW_t
 = (c − σ) dt − X_t² dt + √(2σ) dW_t.

Thus, we have been able to transform (7.33) into an SDE with additive noise:

dY_t = [ (c − σ) − e^{2Y_t} ] dt + √(2σ) dW_t. (7.34)

This is a gradient flow with potential

V(y) = −[ (c − σ)y − ½ e^{2y} ].

The invariant measure, if it exists, is of the form

ρ(y) dy = Z⁻¹ e^{−V(y)/σ} dy.

Going back to the variable x we obtain:

ρ(x) dx = Z⁻¹ x^{c/σ − 2} e^{−x²/(2σ)} dx.

We need to make sure that this distribution is integrable:

Z = ∫_0^{+∞} x^γ e^{−x²/(2σ)} dx < ∞, γ = c/σ − 2.

For this it is necessary that

γ > −1 ⇒ c > σ.

Not all multiplicative random perturbations lead to ergodic behavior. The dependence of the invariant distribution on c is similar to the physical situation of first-order phase transitions.
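The prediction ρ(x) ∝ x^{c/σ−2} e^{−x²/(2σ)} in the ergodic regime c > σ can be tested with a short Euler–Maruyama simulation of (7.33); the parameters below are illustrative.

```python
import numpy as np

# Euler-Maruyama for the Ito SDE (7.33): dX = X(c - X^2) dt + sqrt(2*sigma) X dW,
# in the ergodic regime c > sigma.  Parameters are illustrative.
rng = np.random.default_rng(4)
c, sigma, dt, nsteps, npaths = 2.0, 0.5, 2e-3, 10_000, 4_000

x = np.full(npaths, 1.0)
for _ in range(nsteps):
    dW = rng.standard_normal(npaths) * np.sqrt(dt)
    x += x * (c - x**2) * dt + np.sqrt(2 * sigma) * x * dW

# Second moment of rho(x) ~ x^(c/sigma - 2) exp(-x^2/(2*sigma)) on (0, inf),
# computed with a Riemann sum; here gamma = c/sigma - 2 = 2 and the moment is 3*sigma.
grid = np.linspace(1e-6, 5, 5001)
w = grid**(c / sigma - 2) * np.exp(-grid**2 / (2 * sigma))
m2_exact = np.sum(grid**2 * w) / np.sum(w)
print(np.mean(x**2), m2_exact)
```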


7.10 Discussion and Bibliography

Colored noise. When the noise which drives an SDE has non-zero correlation time we say that we have colored noise. The properties of the SDE (stability, ergodicity etc.) are quite robust under "coloring of the noise". See G. Blankenship and G.C. Papanicolaou, Stability and control of stochastic systems with wide-band noise disturbances. I, SIAM J. Appl. Math., 34(3), 1978, pp. 437–476. Colored noise appears in many applications in physics and chemistry; for a review see P. Hanggi and P. Jung, Colored noise in dynamical systems, Adv. Chem. Phys. 89, 239 (1995).

In the case where there is an additional small time scale in the problem, in addition to the correlation time of the colored noise, it is not clear what the right interpretation of the stochastic integral is (in the limit as both small time scales go to 0). This is usually called the Ito versus Stratonovich problem. Consider, for example, the SDE

τ Ẋ = −X + v(X) η^ε(t),

where η^ε(t) is colored noise with correlation time ε². In the limit where both small time scales go to 0 we can get either Ito or Stratonovich or neither. See [40, 56].

Noise-induced transitions are studied extensively in [32]. The material in Section 7.9 is based on [47]. See also [46].

7.11 Exercises

1. Calculate all moments of the geometric Brownian motion for the Ito and Stratonovich interpretations of the stochastic integral.

2. Study additive and multiplicative random perturbations of the ODE

dx/dt = x(c + 2x² − x⁴).

3. Analyze equation (7.33) for the Stratonovich interpretation of the stochastic integral.


Chapter 8

The Langevin Equation

8.1 Introduction

8.2 The Fokker-Planck Equation in Phase Space (Klein-Kramers Equation)

Consider a diffusion process in two dimensions for the variables q (position) and p (momentum). The generator of this Markov process is

L = p·∇_q − ∇_qV·∇_p + γ(−p·∇_p + DΔ_p). (8.1)

The L²(dpdq)-adjoint is

L*ρ = −p·∇_qρ + ∇_qV·∇_pρ + γ( ∇_p·(pρ) + DΔ_pρ ).

The corresponding FP equation is:

∂ρ/∂t = L*ρ.

The corresponding stochastic differential equation is the Langevin equation

Ẍ_t = −∇V(X_t) − γẊ_t + √(2γD) Ẇ. (8.2)

This is Newton's equation perturbed by dissipation and noise. The Fokker–Planck equation for the Langevin equation, which is sometimes called the Klein–Kramers–Chandrasekhar equation, was first derived by Klein in 1923 and was studied by Kramers in his famous 1940 paper. Notice that L* is not a uniformly elliptic operator: there are second order derivatives only with respect to p and not q. This is


an example of a degenerate elliptic operator. It is, however, hypoelliptic. We can still prove existence, uniqueness and regularity of solutions for the Fokker–Planck equation, and obtain estimates on the solution. It is not possible to obtain the solution of the FP equation for an arbitrary potential. We can, however, calculate the (unique normalized) solution of the stationary Fokker–Planck equation.

Theorem 8.2.1. Let V(x) be a smooth confining potential. Then the Markov process with generator (8.1) is ergodic. The unique invariant distribution is the Maxwell–Boltzmann distribution

ρ(p, q) = (1/Z) e^{−βH(p,q)}, (8.3)

where

H(p, q) = ½‖p‖² + V(q)

is the Hamiltonian, β = (k_B T)⁻¹ is the inverse temperature, and the normalization factor Z is the partition function

Z = ∫_{R^{2d}} e^{−βH(p,q)} dp dq.

It is possible to obtain rates of convergence in either a weighted L²-norm or the relative entropy norm:

H(p(·, t) | ρ) ≤ C e^{−αt}.

The proof of this result is very complicated, since the generator L is degenerate and non-selfadjoint; see the references at the end of the chapter.
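The Maxwell–Boltzmann statistics can be observed in a direct simulation of the Langevin dynamics. The sketch below (assuming, for illustration, the harmonic potential V(q) = q²/2 and unit parameters) checks the equipartition predictions ⟨p²⟩ = β⁻¹ and ⟨q²⟩ = β⁻¹:

```python
import numpy as np

# Langevin dynamics (8.2) for V(q) = q^2/2 with gamma = beta = 1.
# Under (8.3) the stationary marginals are Gaussian with <p^2> = <q^2> = 1/beta.
# All parameter choices are illustrative.
rng = np.random.default_rng(0)
gamma, beta, dt, nsteps, npaths = 1.0, 1.0, 2e-3, 10_000, 5_000

q = rng.standard_normal(npaths)   # arbitrary initial ensemble
p = np.zeros(npaths)
for _ in range(nsteps):
    dW = rng.standard_normal(npaths) * np.sqrt(dt)
    q += p * dt
    p += (-q - gamma * p) * dt + np.sqrt(2 * gamma / beta) * dW

print(np.mean(p**2), np.mean(q**2))   # both should be close to 1/beta
```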

Let ρ(q, p, t) be the solution of the Kramers equation and let ρ_β(q, p) be the Maxwell–Boltzmann distribution. We can write

ρ(q, p, t) = h(q, p, t) ρ_β(q, p),

where h(q, p, t) solves the equation

∂h/∂t = −Ah + γSh, (8.4)

where

A = p·∇_q − ∇_qV·∇_p, S = −p·∇_p + β⁻¹Δ_p.

The operator A is antisymmetric in L²_ρ := L²(R^{2d}; ρ_β(q, p)), whereas S is symmetric.


Let X_i := −∂/∂p_i. The L²_ρ-adjoint of X_i is

X_i* = −βp_i + ∂/∂p_i.

We have that

S = −β⁻¹ Σ_{i=1}^d X_i* X_i.

Consequently, the generator of the Markov process (q(t), p(t)) can be written in Hörmander's "sum of squares" form:

L = A − γβ⁻¹ Σ_{i=1}^d X_i* X_i. (8.5)

We calculate the commutators between the vector fields in (8.5):

[A, X_i] = ∂/∂q_i, [X_i, X_j] = 0, [X_i, X_j*] = βδ_{ij}.

Consequently,

Lie(X_1, …, X_d, [A, X_1], …, [A, X_d]) = Lie(∇_p, ∇_q),

which spans T_{p,q}R^{2d} for all p, q ∈ R^d. This shows that the generator L is a hypoelliptic operator.

Let now Y_i = −∂/∂q_i with L²_ρ-adjoint Y_i* = ∂/∂q_i − β ∂V/∂q_i. We have that

X_i* Y_i − Y_i* X_i = β ( p_i ∂/∂q_i − (∂V/∂q_i) ∂/∂p_i ).

Consequently, the generator can be written in the form

L = β⁻¹ Σ_{i=1}^d ( X_i* Y_i − Y_i* X_i − γ X_i* X_i ). (8.6)

Notice also that

L_V := −∇_qV·∇_q + β⁻¹Δ_q = −β⁻¹ Σ_{i=1}^d Y_i* Y_i.

The phase-space Fokker–Planck equation can be written in the form

∂ρ/∂t + p·∇_qρ − ∇_qV·∇_pρ = Q(ρ, f_B),

where f_B denotes the Maxwellian and the collision operator has the form

Q(ρ, f_B) = D ∇_p·( f_B ∇_p( f_B⁻¹ ρ ) ).

The Fokker–Planck equation has a similar structure to the Boltzmann equation (the basic equation in the kinetic theory of gases), with the difference that the collision operator for the FP equation is linear. Convergence of solutions of the Boltzmann equation to the Maxwell–Boltzmann distribution has also been proved.

We can study the backward and forward Kolmogorov equations for (8.2) by expanding the solution with respect to the Hermite basis. We consider the problem in 1d and set D = 1. The generator of the process is:

L = p ∂_q − V′(q) ∂_p + γ(−p ∂_p + ∂_p²) =: L_1 + γ L_0,

where L_0 := −p ∂_p + ∂_p² and L_1 := p ∂_q − V′(q) ∂_p.

The backward Kolmogorov equation is

∂h/∂t = Lh. (8.7)

The solution should be an element of the weighted L²-space

L²_ρ = { f | ∫_{R²} |f|² Z⁻¹ e^{−βH(p,q)} dp dq < ∞ }.

We notice that the invariant measure of our Markov process is a product measure:

e^{−βH(p,q)} = e^{−β|p|²/2} e^{−βV(q)}.

The space L²(e^{−β|p|²/2} dp) is spanned by the Hermite polynomials. Consequently, we can expand the solution of (8.7) into the Hermite basis:

h(p, q, t) = Σ_{n=0}^∞ h_n(q, t) f_n(p), (8.8)

where f_n(p) = (1/√n!) H_n(p). Our plan is to substitute (8.8) into (8.7) and obtain a sequence of equations for the coefficients h_n(q, t). We have:

L_0 h = L_0 Σ_{n=0}^∞ h_n f_n = −Σ_{n=0}^∞ n h_n f_n.


Furthermore,

L_1 h = −∂_qV ∂_p h + p ∂_q h.

We calculate each term on the right hand side of the above equation separately. For this we will need the formulas

∂_p f_n = √n f_{n−1} and p f_n = √n f_{n−1} + √(n+1) f_{n+1}.

Then

p ∂_q h = p Σ_{n=0}^∞ ∂_q h_n f_n = Σ_{n=0}^∞ ∂_q h_n ( √n f_{n−1} + √(n+1) f_{n+1} )
 = Σ_{n=0}^∞ ( √(n+1) ∂_q h_{n+1} + √n ∂_q h_{n−1} ) f_n,

with h_{−1} ≡ 0. Furthermore,

∂_qV ∂_p h = Σ_{n=0}^∞ ∂_qV h_n ∂_p f_n = Σ_{n=0}^∞ ∂_qV h_n √n f_{n−1} = Σ_{n=0}^∞ ∂_qV h_{n+1} √(n+1) f_n.

Consequently:

Lh = L_1 h + γ L_0 h
 = Σ_{n=0}^∞ ( −γn h_n + √(n+1) ∂_q h_{n+1} + √n ∂_q h_{n−1} − √(n+1) ∂_qV h_{n+1} ) f_n.

Using the orthonormality of the eigenfunctions of L_0 we obtain the following set of equations which determine {h_n(q, t)}_{n=0}^∞:

ḣ_n = −γn h_n + √(n+1) ∂_q h_{n+1} + √n ∂_q h_{n−1} − √(n+1) ∂_qV h_{n+1}, n = 0, 1, …

This set of equations is usually called the Brinkman hierarchy (1956). We can use this approach to develop a numerical method for solving the Klein–Kramers equation. For this we need to expand each coefficient h_n in an appropriate basis with respect to q; the obvious choices are the Hermite basis (for polynomial potentials) or the standard Fourier basis (for periodic potentials). We will do this for the case of periodic potentials. The resulting method is usually called the continued fraction expansion. See [64]. The Hermite expansion of the distribution function with respect to the velocity is used in the study of various kinetic equations (including the Boltzmann equation). It was initiated by Grad in the late 40's. It is quite often used in the approximate calculation of transport coefficients (e.g. the diffusion coefficient). This expansion can be justified rigorously for the Fokker–Planck equation; see [53]. It can also be used in order to solve the Poisson equation −Lφ = f(p, q); see [58].
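The two Hermite-function relations that drive the hierarchy can be verified directly with NumPy's probabilists' Hermite module; a small sanity-check sketch:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt

# Verify, on a grid, the relations used to derive the Brinkman hierarchy:
#   d/dp f_n = sqrt(n) f_{n-1},   p f_n = sqrt(n) f_{n-1} + sqrt(n+1) f_{n+1},
# with f_n(p) = H_n(p)/sqrt(n!) and H_n the probabilists' Hermite polynomials.
p = np.linspace(-3, 3, 101)

def f(n):
    # normalized Hermite function f_n evaluated on the grid
    return He.hermeval(p, [0] * n + [1]) / sqrt(factorial(n))

for n in range(1, 6):
    dfn = He.hermeval(p, He.hermeder([0] * n + [1])) / sqrt(factorial(n))
    assert np.allclose(dfn, sqrt(n) * f(n - 1))
    assert np.allclose(p * f(n), sqrt(n) * f(n - 1) + sqrt(n + 1) * f(n + 1))
print("Hermite recurrences verified")
```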

8.3 The Langevin Equation in a Harmonic Potential

There are very few potentials for which we can solve the Langevin equation or calculate the eigenvalues and eigenfunctions of the generator of the Markov process (q(t), p(t)). One case where we can calculate everything explicitly is that of a Brownian particle in a quadratic (harmonic) potential

V(q) = ½ ω_0² q². (8.9)

The Langevin equation is

q̈ = −ω_0² q − γ q̇ + √(2γβ⁻¹) Ẇ (8.10)

or

q̇ = p, ṗ = −ω_0² q − γp + √(2γβ⁻¹) Ẇ. (8.11)

This is a linear equation that can be solved explicitly. Rather than doing this, we will calculate the eigenvalues and eigenfunctions of the generator, which takes the form

L = p ∂_q − ω_0² q ∂_p + γ(−p ∂_p + β⁻¹ ∂_p²). (8.12)

The corresponding Fokker–Planck operator is

L* = −p ∂_q + ω_0² q ∂_p + γ( ∂_p(p ·) + β⁻¹ ∂_p² ). (8.13)

The process (q(t), p(t)) is an ergodic Markov process with Gaussian invariant measure

ρ_β(q, p) dq dp = (βω_0 / 2π) e^{−β p²/2 − β ω_0² q²/2} dq dp. (8.14)


For the calculation of the eigenvalues and eigenfunctions of the operator L it is convenient to introduce creation and annihilation operators in both the position and momentum variables. We set

a⁻ = β^{−1/2} ∂_p, a⁺ = −β^{−1/2} ∂_p + β^{1/2} p (8.15)

and

b⁻ = ω_0⁻¹ β^{−1/2} ∂_q, b⁺ = −ω_0⁻¹ β^{−1/2} ∂_q + ω_0 β^{1/2} q. (8.16)

We have that

a⁺a⁻ = −β⁻¹ ∂_p² + p ∂_p

and

b⁺b⁻ = −ω_0⁻² β⁻¹ ∂_q² + q ∂_q.

Consequently, the operator

L = −a⁺a⁻ − b⁺b⁻ (8.17)

is the generator of the OU process in two dimensions. The operators a±, b± satisfy the commutation relations

[a⁺, a⁻] = −1, (8.18a)
[b⁺, b⁻] = −1, (8.18b)
[a±, b±] = 0. (8.18c)

See Exercise 3. Using now the operators a± and b± we can write the generator L in the form

L = −γ a⁺a⁻ − ω_0 (b⁺a⁻ − a⁺b⁻), (8.19)

which is a particular case of (8.6). In order to calculate the eigenvalues and eigenfunctions of (8.19) we need to make an appropriate change of variables in order to bring the operator L into the "decoupled" form (8.17). Clearly, this is a linear transformation and can be written in the form

Y = AX,

where X = (q, p) for some 2 × 2 matrix A. It is somewhat easier to make this change of variables at the level of the creation and annihilation operators. In particular, our goal is to find first order differential operators c± and d± so that the operator (8.19) becomes

L = −C c⁺c⁻ − D d⁺d⁻ (8.20)


for some appropriate constants C and D. Since our goal is, essentially, to map L to the two-dimensional OU process, we require that the operators c± and d± satisfy the canonical commutation relations

[c⁺, c⁻] = −1, (8.21a)
[d⁺, d⁻] = −1, (8.21b)
[c±, d±] = 0. (8.21c)

The operators c± and d± should be given as linear combinations of the old operators a± and b±. From the structure of the generator L (8.19), the decoupled form (8.20) and the commutation relations (8.21) and (8.18) we conclude that c± and d± should be of the form

c⁺ = α_{11} a⁺ + α_{12} b⁺, (8.22a)
c⁻ = α_{21} a⁻ + α_{22} b⁻, (8.22b)
d⁺ = β_{11} a⁺ + β_{12} b⁺, (8.22c)
d⁻ = β_{21} a⁻ + β_{22} b⁻. (8.22d)

Notice that c⁻ and d⁻ are not the adjoints of c⁺ and d⁺. If we substitute these equations into (8.20), equate it with (8.19) and use the commutation relations (8.21), we obtain a system of equations for the coefficients {α_{ij}, β_{ij}}. In order to write down the formulas for these coefficients it is convenient to introduce the eigenvalues of the deterministic problem

q̈ = −γ q̇ − ω_0² q.

The solution of this equation is

q(t) = C_1 e^{−λ_1 t} + C_2 e^{−λ_2 t}

with

λ_{1,2} = (γ ± δ)/2, δ = √(γ² − 4ω_0²). (8.23)

The eigenvalues satisfy the relations

λ_1 + λ_2 = γ, λ_1 − λ_2 = δ, λ_1 λ_2 = ω_0². (8.24)


Proposition 8.3.1. Let L be the generator (8.19) and let c±, d± be the operators

c⁺ = (1/√δ) ( √λ_1 a⁺ + √λ_2 b⁺ ), (8.25a)
c⁻ = (1/√δ) ( √λ_1 a⁻ − √λ_2 b⁻ ), (8.25b)
d⁺ = (1/√δ) ( √λ_2 a⁺ + √λ_1 b⁺ ), (8.25c)
d⁻ = (1/√δ) ( −√λ_2 a⁻ + √λ_1 b⁻ ). (8.25d)

Then c±, d± satisfy the canonical commutation relations (8.21), as well as

[L, c±] = ∓λ_1 c±, [L, d±] = ∓λ_2 d±. (8.26)

Furthermore, the operator L can be written in the form

L = −λ_1 c⁺c⁻ − λ_2 d⁺d⁻. (8.27)

Proof. First we check the commutation relations:

[c⁺, c⁻] = (1/δ) ( λ_1 [a⁺, a⁻] − λ_2 [b⁺, b⁻] ) = (1/δ)(−λ_1 + λ_2) = −1.

Similarly,

[d⁺, d⁻] = (1/δ) ( −λ_2 [a⁺, a⁻] + λ_1 [b⁺, b⁻] ) = (1/δ)(λ_2 − λ_1) = −1.

Clearly, we have that

[c⁺, d⁺] = [c⁻, d⁻] = 0.

Furthermore,

[c⁺, d⁻] = (1/δ) ( −√(λ_1 λ_2) [a⁺, a⁻] + √(λ_1 λ_2) [b⁺, b⁻] ) = (1/δ)( √(λ_1 λ_2) − √(λ_1 λ_2) ) = 0.


Finally:

[L, c⁺] = −λ_1 c⁺c⁻c⁺ + λ_1 c⁺c⁺c⁻ = −λ_1 c⁺ ( c⁻c⁺ − c⁺c⁻ ) = −λ_1 c⁺,

using c⁻c⁺ = 1 + c⁺c⁻, and similarly for the other equations in (8.26). Now we calculate, using (8.25),

L = −λ_1 c⁺c⁻ − λ_2 d⁺d⁻
 = −((λ_1² − λ_2²)/δ) a⁺a⁻ + (√(λ_1 λ_2)(λ_1 − λ_2)/δ) a⁺b⁻ − (√(λ_1 λ_2)(λ_1 − λ_2)/δ) b⁺a⁻
 = −γ a⁺a⁻ − ω_0 (b⁺a⁻ − a⁺b⁻),

which is precisely (8.19). In the above calculation we used (8.24).

Using now (8.27) we can readily obtain the eigenvalues and eigenfunctions of L. From our experience with the two-dimensional OU process (or the Schrödinger operator for the two-dimensional quantum harmonic oscillator), we expect that the eigenfunctions should be tensor products of Hermite polynomials. Indeed, we have the following, which is the main result of this section.

Theorem 8.3.2. The eigenvalues and eigenfunctions of the generator of the Markov process (q, p) (8.11) are

λ_{nm} = λ_1 n + λ_2 m = ½γ(n + m) + ½δ(n − m), n, m = 0, 1, … (8.28)

and

φ_{nm}(q, p) = (1/√(n! m!)) (c⁺)ⁿ (d⁺)ᵐ 1, n, m = 0, 1, … (8.29)

Proof. We have

[L, (c⁺)²] = L(c⁺)² − (c⁺)² L = (c⁺L − λ_1 c⁺) c⁺ − (c⁺)² L = c⁺ (c⁺L − λ_1 c⁺) − λ_1 (c⁺)² − (c⁺)² L = −2λ_1 (c⁺)²,

and similarly [L, (d⁺)²] = −2λ_2 (d⁺)². A simple induction argument now shows that (see Exercise 8.3.3)

[L, (c⁺)ⁿ] = −nλ_1 (c⁺)ⁿ and [L, (d⁺)ᵐ] = −mλ_2 (d⁺)ᵐ. (8.30)


We use (8.30) to calculate

L (c⁺)ⁿ (d⁺)ᵐ 1 = (c⁺)ⁿ L (d⁺)ᵐ 1 − nλ_1 (c⁺)ⁿ (d⁺)ᵐ 1
 = (c⁺)ⁿ (d⁺)ᵐ L1 − mλ_2 (c⁺)ⁿ (d⁺)ᵐ 1 − nλ_1 (c⁺)ⁿ (d⁺)ᵐ 1
 = −(nλ_1 + mλ_2) (c⁺)ⁿ (d⁺)ᵐ 1,

from which (8.28) and (8.29) follow.

Exercise 8.3.3. Show that

[L, (c±)ⁿ] = ∓nλ_1 (c±)ⁿ, [L, (d±)ⁿ] = ∓nλ_2 (d±)ⁿ, [c⁻, (c⁺)ⁿ] = n(c⁺)ⁿ⁻¹, [d⁻, (d⁺)ⁿ] = n(d⁺)ⁿ⁻¹. (8.31)

Remark 8.3.4. In terms of the operators a±, b± the eigenfunctions of L are

φ_{nm} = √(n! m!) δ^{−(n+m)/2} λ_1^{n/2} λ_2^{m/2} Σ_{ℓ=0}^{n} Σ_{k=0}^{m} [ 1 / (k!(m−k)! ℓ!(n−ℓ)!) ] (λ_1/λ_2)^{(k−ℓ)/2} (a⁺)^{n+m−k−ℓ} (b⁺)^{ℓ+k} 1.

The first few eigenfunctions are

φ_{00} = 1,
φ_{10} = √β ( √λ_1 p + √λ_2 ω_0 q ) / √δ,
φ_{01} = √β ( √λ_2 p + √λ_1 ω_0 q ) / √δ,
φ_{11} = ( −2√(λ_1 λ_2) + √(λ_1 λ_2) β p² + (λ_1 + λ_2) β ω_0 pq + √(λ_1 λ_2) β ω_0² q² ) / δ,
φ_{20} = ( −λ_1 + β p² λ_1 + 2√(λ_1 λ_2) β ω_0 pq − λ_2 + β ω_0² q² λ_2 ) / (√2 δ),
φ_{02} = ( −λ_2 + β p² λ_2 + 2√(λ_1 λ_2) β ω_0 pq − λ_1 + β ω_0² q² λ_1 ) / (√2 δ).


Notice that the eigenfunctions are not orthonormal. As we already know, the first eigenvalue, corresponding to the constant eigenfunction, is 0:

λ_{00} = 0.

Notice that the operator L is not self-adjoint and consequently we do not expect its eigenvalues to be real. Indeed, whether the eigenvalues are real or not depends on the sign of the discriminant Δ = γ² − 4ω_0². In the underdamped regime, γ < 2ω_0, the eigenvalues are complex:

λ_{nm} = ½γ(n + m) + ½ i √(4ω_0² − γ²)(n − m), γ < 2ω_0.

This is to be expected, since in the underdamped regime the dynamics is dominated by the deterministic Hamiltonian dynamics that give rise to the antisymmetric Liouville operator. We set ω = ½√(4ω_0² − γ²), i.e. δ = 2iω. The eigenvalues can be written as

λ_{nm} = (γ/2)(n + m) + iω(n − m).

In Figure 8.1 we present the first few eigenvalues of L in the underdamped regime. The eigenvalues are contained in a cone in the right half of the complex plane. The cone is determined by

λ_{n0} = (γ/2)n + iωn and λ_{0m} = (γ/2)m − iωm.

The eigenvalues along the diagonal are real:

λ_{nn} = γn.

On the other hand, in the overdamped regime, γ > 2ω_0, all eigenvalues are real:

λ_{nm} = ½γ(n + m) + ½√(γ² − 4ω_0²)(n − m), γ > 2ω_0.

In fact, in the overdamped limit γ → +∞ (which we will study in Section 8.4.1), the eigenvalues of the generator L converge to the eigenvalues of the generator of the OU process:

λ_{nm} = γn − (ω_0²/γ)(n − m) + O(γ⁻³).

This is consistent with the fact that in this limit the solution of the Langevin equation converges to the solution of the OU SDE; see Section 8.4.1.
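The overdamped expansion of the eigenvalues can be checked against the exact formula λ_nm = λ_1 n + λ_2 m; in the sketch below (ω_0 = 1 and the values of γ, n, m are arbitrary choices) the error should shrink by roughly a factor of 8 each time γ doubles, consistent with the O(γ⁻³) remainder:

```python
import numpy as np

# Compare the exact eigenvalues lambda_nm = lambda1*n + lambda2*m with the
# overdamped expansion gamma*n - (omega0^2/gamma)*(n - m).  omega0 = 1; the
# values of gamma, n, m are illustrative.
omega0, n, m = 1.0, 2, 1
errs = []
for gamma in (10.0, 20.0, 40.0):
    delta = np.sqrt(gamma**2 - 4 * omega0**2)
    lam1, lam2 = (gamma + delta) / 2, (gamma - delta) / 2
    exact = lam1 * n + lam2 * m
    approx = gamma * n - (omega0**2 / gamma) * (n - m)
    errs.append(abs(exact - approx))
print(errs)   # roughly third-order decay in 1/gamma
```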


[Figure 8.1: First few eigenvalues of L for γ = ω = 1; horizontal axis Re(λ_{nm}), vertical axis Im(λ_{nm}).]


The eigenfunctions of L do not form an orthonormal basis in L²_β := L²(R², Z⁻¹e^{−βH}), since L is not a selfadjoint operator. Using the eigenfunctions/eigenvalues of L we can easily calculate the eigenfunctions/eigenvalues of the L²_β-adjoint of L. From the calculations presented in Section 8.2 we know that the adjoint operator is

L̂ := −A + γS (8.32)
 = ω_0 (b⁺a⁻ − a⁺b⁻) − γ a⁺a⁻ (8.33)
 = −λ_1 (c⁻)*(c⁺)* − λ_2 (d⁻)*(d⁺)*, (8.34)

where

(c⁺)* = (1/√δ) ( √λ_1 a⁻ + √λ_2 b⁻ ), (8.35a)
(c⁻)* = (1/√δ) ( √λ_1 a⁺ − √λ_2 b⁺ ), (8.35b)
(d⁺)* = (1/√δ) ( √λ_2 a⁻ + √λ_1 b⁻ ), (8.35c)
(d⁻)* = (1/√δ) ( −√λ_2 a⁺ + √λ_1 b⁺ ). (8.35d)

L̂ has the same eigenvalues as L:

−L̂ ψ_{nm} = λ_{nm} ψ_{nm},

where λ_{nm} are given by (8.28). The eigenfunctions are

ψ_{nm} = (1/√(n! m!)) ((c⁻)*)ⁿ ((d⁻)*)ᵐ 1. (8.36)

Proposition 8.3.5. The eigenfunctions of L and L̂ satisfy the biorthonormality relation

∫∫ φ_{nm} ψ_{ℓk} ρ_β dp dq = δ_{nℓ} δ_{mk}. (8.37)

Proof. We will use formulas (8.31). Notice that, using the third and fourth of these equations together with the fact that c⁻1 = d⁻1 = 0, we can conclude that (for n ≥ ℓ)

(c⁻)^ℓ (c⁺)ⁿ 1 = n(n − 1) … (n − ℓ + 1) (c⁺)^{n−ℓ} 1. (8.38)


We have

∫∫ φ_{nm} ψ_{ℓk} ρ_β dp dq = (1/√(n! m! ℓ! k!)) ∫∫ (c⁺)ⁿ (d⁺)ᵐ 1 · ((c⁻)*)^ℓ ((d⁻)*)^k 1 ρ_β dp dq
 = ( n(n−1)…(n−ℓ+1) m(m−1)…(m−k+1) / √(n! m! ℓ! k!) ) ∫∫ (c⁺)^{n−ℓ} (d⁺)^{m−k} 1 ρ_β dp dq
 = δ_{nℓ} δ_{mk},

since all nonconstant eigenfunctions average to 0 with respect to ρ_β.

From the eigenfunctions of L̂ we can obtain the eigenfunctions of the Fokker–Planck operator. Using the formula (see the transformation (8.4))

L*(f ρ_β) = ρ_β L̂ f,

we immediately conclude that the Fokker–Planck operator has the same eigenvalues as those of L and L̂. The eigenfunctions are

ψ*_{nm} = ρ_β ψ_{nm} = ρ_β (1/√(n! m!)) ((c⁻)*)ⁿ ((d⁻)*)ᵐ 1. (8.39)
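The biorthonormality relation (8.37) can be verified numerically for the first nonconstant eigenfunctions using Gauss–Hermite quadrature; the sketch below assumes, for illustration, β = ω_0 = 1 and γ = 3 (overdamped, so δ is real).

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Check (8.37) for (n,m) in {(1,0),(0,1)} with beta = omega0 = 1, gamma = 3.
# The Gaussian rho_beta factorizes, so a tensor Gauss-Hermite rule is exact
# for these polynomial eigenfunctions.
gamma = 3.0
delta = np.sqrt(gamma**2 - 4)
lam1, lam2 = (gamma + delta) / 2, (gamma - delta) / 2

x, w = hermgauss(40)
p = np.sqrt(2) * x[:, None]             # nodes for the standard Gaussian in p
q = np.sqrt(2) * x[None, :]             # ... and in q
W = (w[:, None] * w[None, :]) / np.pi   # weights for the normalized Gaussian

phi = {(1, 0): (np.sqrt(lam1) * p + np.sqrt(lam2) * q) / np.sqrt(delta),
       (0, 1): (np.sqrt(lam2) * p + np.sqrt(lam1) * q) / np.sqrt(delta)}
psi = {(1, 0): (np.sqrt(lam1) * p - np.sqrt(lam2) * q) / np.sqrt(delta),
       (0, 1): (-np.sqrt(lam2) * p + np.sqrt(lam1) * q) / np.sqrt(delta)}

for a in phi:
    for b in psi:
        val = np.sum(phi[a] * psi[b] * W)
        print(a, b, round(val, 6))   # 1 if a == b, 0 otherwise
```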

8.4 Asymptotic Limits for the Langevin Equation

There are very few SDEs/Fokker–Planck equations that can be solved explicitly. In most cases we need to study the problem under investigation either approximately or numerically. In this part of the course we will develop approximate methods for studying various stochastic systems of practical interest. There are many problems of physical interest that can be analyzed using techniques from perturbation theory and asymptotic analysis:

i. Small noise asymptotics at finite time intervals.

ii. Small noise asymptotics/large times (rare events): the theory of large deviations, escape from a potential well, exit time problems.

iii. Small and large friction asymptotics for the Fokker–Planck equation: the Freidlin–Wentzell (underdamped) and Smoluchowski (overdamped) limits.

iv. Large time asymptotics for the Langevin equation in a periodic potential: homogenization and averaging.


v. Stochastic systems with two characteristic time scales: multiscale problems and methods.

We will study various asymptotic limits for the Langevin equation (we have set m = 1)

q̈ = −∇V(q) − γ q̇ + √(2γβ⁻¹) Ẇ. (8.40)

There are two parameters in the problem, the friction coefficient γ and the inverse temperature β. We want to study the qualitative behavior of solutions to this equation (and to the corresponding Fokker–Planck equation). There are various asymptotic limits at which we can eliminate some of the variables of the equation and obtain a simpler equation for fewer variables. In the large temperature limit, β ≪ 1, the dynamics of (8.40) is dominated by diffusion: the Langevin equation (8.40) can be approximated by free Brownian motion:

q̇ = √(2γβ⁻¹) Ẇ.

The small temperature asymptotics, β ≫ 1, is much more interesting and more subtle. It leads to exponential, Arrhenius-type asymptotics for the reaction rate (in the case of a particle escaping from a potential well due to thermal noise) or the diffusion coefficient (in the case of a particle moving in a periodic potential in the presence of thermal noise):

κ = ν exp(−βE_b), (8.41)

where κ can be either the reaction rate or the diffusion coefficient. The small temperature asymptotics will be studied later for the case of a bistable potential (reaction rate) and for the case of a periodic potential (diffusion coefficient).

Assuming that the temperature is fixed, the only parameter that is left is the friction coefficient γ. The large and small friction asymptotics can be expressed in terms of a slow/fast system of SDEs. In many applications (especially in biology) the friction coefficient is large: γ ≫ 1. In this case the momentum is the fast variable which we can eliminate to obtain an equation for the position. This is the overdamped or Smoluchowski limit. In various problems in physics the friction coefficient is small: γ ≪ 1. In this case the position is the fast variable whereas the energy is the slow variable. We can eliminate the position and obtain an equation for the energy. This is the underdamped or Freidlin–Wentzell limit. In both cases we have to look at sufficiently long time scales.


We rescale the solution to (8.40):

q^γ(t) = λ_γ q(t/μ_γ).

This rescaled process satisfies the equation

q̈^γ = −(λ_γ/μ_γ²) ∂_q V(q^γ/λ_γ) − (γ/μ_γ) q̇^γ + √(2γ λ_γ² μ_γ⁻³ β⁻¹) Ẇ. (8.42)

Different choices for these two parameters lead to the overdamped and underdamped limits.

λ_γ = 1, μ_γ = γ⁻¹, γ ≫ 1: in this case equation (8.42) becomes

γ⁻² q̈^γ = −∂_q V(q^γ) − q̇^γ + √(2β⁻¹) Ẇ. (8.43)

Under this scaling, the interesting limit is the overdamped limit, γ ≫ 1. We will see later that in the limit as γ → +∞ the solution to (8.43) can be approximated by the solution to

q̇ = −∂_q V + √(2β⁻¹) Ẇ.

λ_γ = 1, μ_γ = γ, γ ≪ 1:

q̈^γ = −γ⁻² ∇V(q^γ) − q̇^γ + √(2γ⁻² β⁻¹) Ẇ. (8.44)

Under this scaling the interesting limit is the underdamped limit, γ ≪ 1. We will see later that in the limit as γ → 0 the energy of the solution to (8.44) converges to a stochastic process on a graph.

8.4.1 The Overdamped Limit

We consider the rescaled Langevin equation (8.43):

ε² q̈^γ(t) = −∇V(q^γ(t)) − q̇^γ(t) + √(2β⁻¹) Ẇ(t), (8.45)

where we have set ε = γ⁻¹, since we are interested in the limit γ → ∞, i.e. ε → 0. We will show that, in the limit as ε → 0, q^γ(t), the solution of the Langevin equation (8.45), converges to q(t), the solution of the Smoluchowski equation

q̇ = −∇V + √(2β⁻¹) Ẇ. (8.46)

We write (8.45) as a system of SDEs:

q̇ = (1/ε) p, (8.47)
ṗ = −(1/ε) ∇V(q) − (1/ε²) p + √(2/(βε²)) Ẇ. (8.48)


This system of SDEs defines a Markov process in phase space. Its generator is

L_ε = (1/ε²) ( −p·∇_p + β⁻¹Δ_p ) + (1/ε) ( p·∇_q − ∇_qV·∇_p ) =: (1/ε²) L_0 + (1/ε) L_1.

This is a singularly perturbed differential operator. We will derive the Smoluchowski equation (8.46) using a pathwise technique, as well as by analyzing the corresponding Kolmogorov equations.

We apply Ito's formula to p:

dp(t) = L_ε p(t) dt + (1/ε) √(2β⁻¹) ∂_p p(t) dW
 = −(1/ε²) p(t) dt − (1/ε) ∇_q V(q(t)) dt + (1/ε) √(2β⁻¹) dW.

Consequently:

(1/ε) ∫_0^t p(s) ds = −∫_0^t ∇_q V(q(s)) ds + √(2β⁻¹) W(t) + O(ε).

From equation (8.47) we have that

q(t) = q(0) + (1/ε) ∫_0^t p(s) ds.

Combining the above two equations we deduce

q(t) = q(0) − ∫_0^t ∇_q V(q(s)) ds + √(2β⁻¹) W(t) + O(ε),

from which (8.46) follows.

Notice that in this derivation we assumed that

E|p(t)|² ≤ C.

This estimate is true, under appropriate assumptions on the potential V(q) and on the initial conditions. In fact, we can prove a pathwise approximation result:

( E sup_{t∈[0,T]} |q^γ(t) − q(t)|^p )^{1/p} ≤ C ε^{2−κ},

where κ > 0 is arbitrarily small (it accounts for logarithmic corrections).
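The pathwise convergence can be illustrated by driving the rescaled Langevin system (8.47)–(8.48) and the Smoluchowski equation (8.46) with the same Brownian path; in the sketch below the potential V(q) = q²/2, β = 1, ε, and the step size are all illustrative choices (the step size must be much smaller than ε²).

```python
import numpy as np

# Langevin system (8.47)-(8.48) vs Smoluchowski limit (8.46), same noise.
# V(q) = q^2/2, beta = 1; eps and dt are illustrative, with dt << eps^2.
rng = np.random.default_rng(3)
eps, dt, nsteps = 0.1, 1e-4, 10_000    # T = 1

q, p, x = 1.0, 0.0, 1.0                # (q, p): Langevin; x: Smoluchowski
max_diff = 0.0
for _ in range(nsteps):
    dW = rng.standard_normal() * np.sqrt(dt)
    q, p = (q + p / eps * dt,
            p + (-q / eps - p / eps**2) * dt + np.sqrt(2) / eps * dW)
    x += -x * dt + np.sqrt(2) * dW
    max_diff = max(max_diff, abs(q - x))
print(max_diff)   # shrinks as eps -> 0 (with dt refined accordingly)
```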


The pathwise derivation of the Smoluchowski equation implies that the solution of the Fokker–Planck equation corresponding to the Langevin equation (8.45) converges (in some appropriate sense, to be explained below) to the solution of the Fokker–Planck equation corresponding to the Smoluchowski equation (8.46). It is important in various applications to calculate corrections to the limiting Fokker–Planck equation. We can accomplish this by analyzing the Fokker–Planck equation for (8.45) using singular perturbation theory. We will consider the problem in one dimension, mainly to simplify the notation; the multi-dimensional problem can be treated in a very similar way.

The Fokker–Planck equation associated to equations (8.47) and (8.48) is

∂ρ/∂t = L*ρ
 = (1/ε) ( −p ∂_q ρ + ∂_q V(q) ∂_p ρ ) + (1/ε²) ( ∂_p(pρ) + β⁻¹ ∂_p² ρ )
 =: ( (1/ε²) L_0* + (1/ε) L_1* ) ρ. (8.49)

The invariant distribution of the Markov process (q, p), if it exists, is

ρ_β(p, q) = (1/Z) e^{−βH(p,q)}, Z = ∫_{R²} e^{−βH(p,q)} dp dq,

where H(p, q) = ½p² + V(q). We define the function f(p, q, t) through

ρ(p, q, t) = f(p, q, t) ρ_β(p, q). (8.50)

Proposition 8.4.1. The function $f(p,q,t)$ defined in (8.50) satisfies the equation
$$\frac{\partial f}{\partial t} = \left[\frac{1}{\varepsilon^2}\big(-p\partial_p + \beta^{-1}\partial_p^2\big) - \frac{1}{\varepsilon}\big(p\partial_q - \partial_q V(q)\partial_p\big)\right]f =: \left(\frac{1}{\varepsilon^2}\mathcal{L}_0 - \frac{1}{\varepsilon}\mathcal{L}_1\right)f. \qquad (8.51)$$

Remark 8.4.2. This is "almost" the backward Kolmogorov equation, with the difference that we have $-\mathcal{L}_1$ instead of $\mathcal{L}_1$. This is related to the fact that $\mathcal{L}_0$ is a symmetric operator in $L^2(\mathbb{R}^2; Z^{-1}e^{-\beta H(p,q)})$, whereas $\mathcal{L}_1$ is antisymmetric.

Proof. We note that $\mathcal{L}_0^*\rho_0 = 0$ and $\mathcal{L}_1^*\rho_0 = 0$. We use this to calculate:
$$\mathcal{L}_0^*\rho = \mathcal{L}_0^*(f\rho_0) = \partial_p(pf\rho_0) + \beta^{-1}\partial_p^2(f\rho_0) = \rho_0 p\partial_p f + \rho_0\beta^{-1}\partial_p^2 f + f\mathcal{L}_0^*\rho_0 + 2\beta^{-1}\partial_p f\,\partial_p\rho_0 = \big(-p\partial_p f + \beta^{-1}\partial_p^2 f\big)\rho_0 = \rho_0\mathcal{L}_0 f.$$
Similarly,
$$\mathcal{L}_1^*\rho = \mathcal{L}_1^*(f\rho_0) = \big(-p\partial_q + \partial_q V\partial_p\big)(f\rho_0) = \rho_0\big(-p\partial_q f + \partial_q V\partial_p f\big) = -\rho_0\mathcal{L}_1 f.$$
Consequently, the Fokker–Planck equation (8.49) becomes
$$\rho_0\frac{\partial f}{\partial t} = \rho_0\left(\frac{1}{\varepsilon^2}\mathcal{L}_0 f - \frac{1}{\varepsilon}\mathcal{L}_1 f\right),$$
from which the claim follows.

We will assume that the initial conditions for (8.51) depend only on $q$:
$$f(p,q,0) = f_{\mathrm{ic}}(q). \qquad (8.52)$$
Another way of stating this assumption is the following. Let $\mathcal{H} = L^2(\mathbb{R}^{2d}; \rho_\beta(p,q))$ and define the projection operator $P : \mathcal{H} \mapsto L^2(\mathbb{R}^d; \rho_\beta(q))$, with $\rho_\beta(q) = \frac{1}{Z_q}e^{-\beta V(q)}$ and $Z_q = \int_{\mathbb{R}^d}e^{-\beta V(q)}\,dq$:
$$P\,\cdot\; := \frac{1}{Z_p}\int_{\mathbb{R}^d}\cdot\;e^{-\beta|p|^2/2}\,dp, \qquad (8.53)$$
with $Z_p := \int_{\mathbb{R}^d}e^{-\beta|p|^2/2}\,dp$. Then assumption (8.52) can be written as
$$Pf_{\mathrm{ic}} = f_{\mathrm{ic}}.$$

We look for a solution to (8.51) in the form of a truncated power series in $\varepsilon$:
$$f(p,q,t) = \sum_{n=0}^{N}\varepsilon^n f_n(p,q,t). \qquad (8.54)$$
We substitute this expansion into eqn. (8.51) to obtain the following system of equations:
$$\mathcal{L}_0 f_0 = 0, \qquad (8.55a)$$
$$-\mathcal{L}_0 f_1 = -\mathcal{L}_1 f_0, \qquad (8.55b)$$
$$-\mathcal{L}_0 f_2 = -\mathcal{L}_1 f_1 - \frac{\partial f_0}{\partial t}, \qquad (8.55c)$$
$$-\mathcal{L}_0 f_n = -\mathcal{L}_1 f_{n-1} - \frac{\partial f_{n-2}}{\partial t}, \qquad n = 3, 4, \dots, N. \qquad (8.55d)$$


The null space of $\mathcal{L}_0$ consists of constants in $p$. Consequently, from equation (8.55a) we conclude that
$$f_0 = f(q,t).$$
Now we can calculate the right hand side of equation (8.55b):
$$\mathcal{L}_1 f_0 = p\partial_q f.$$
Equation (8.55b) becomes
$$\mathcal{L}_0 f_1 = p\partial_q f.$$
The right hand side of this equation is orthogonal to $\mathcal{N}(\mathcal{L}_0^*)$ and consequently there exists a unique solution. We obtain this solution using separation of variables:
$$f_1 = -p\partial_q f + \psi_1(q,t).$$
Now we can calculate the right hand side of equation (8.55c). We need to calculate $\mathcal{L}_1 f_1$:
$$-\mathcal{L}_1 f_1 = \big(p\partial_q - \partial_q V\partial_p\big)\big(p\partial_q f - \psi_1(q,t)\big) = p^2\partial_q^2 f - p\partial_q\psi_1 - \partial_q V\partial_q f.$$
The solvability condition for (8.55c) is
$$\int_{\mathbb{R}}\left(-\mathcal{L}_1 f_1 - \frac{\partial f_0}{\partial t}\right)\rho_{\mathrm{OU}}(p)\,dp = 0,$$
from which we obtain the backward Kolmogorov equation corresponding to the Smoluchowski SDE:
$$\frac{\partial f}{\partial t} = -\partial_q V\partial_q f + \beta^{-1}\partial_q^2 f, \qquad (8.56)$$
together with the initial condition (8.52).

Now we solve the equation for $f_2$. We use (8.56) to write (8.55c) in the form
$$\mathcal{L}_0 f_2 = \big(\beta^{-1} - p^2\big)\partial_q^2 f + p\partial_q\psi_1.$$
The solution of this equation is
$$f_2(p,q,t) = \frac12\partial_q^2 f(q,t)\,p^2 - \partial_q\psi_1(q,t)\,p + \psi_2(q,t).$$
Now we calculate the right hand side of the equation for $f_3$, equation (8.55d) with $n = 3$. First we calculate
$$\mathcal{L}_1 f_2 = \frac12 p^3\partial_q^3 f - p^2\partial_q^2\psi_1 + p\partial_q\psi_2 - \partial_q V\,\partial_q^2 f\,p + \partial_q V\,\partial_q\psi_1.$$
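The algebra in this hierarchy is mechanical and easy to get wrong by hand. A small symbolic sanity check (our own sketch, using sympy) confirms that the formula for $f_2$ above indeed satisfies $\mathcal{L}_0 f_2 = (\beta^{-1} - p^2)\partial_q^2 f + p\partial_q\psi_1$:

```python
import sympy as sp

# With L0 = -p d/dp + beta^{-1} d^2/dp^2, verify that
#   f2 = (1/2) p^2 f_qq - p psi1_q + psi2
# satisfies  L0 f2 = (beta^{-1} - p^2) f_qq + p psi1_q.
p, q, t, beta = sp.symbols('p q t beta', positive=True)
f = sp.Function('f')(q, t)
psi1 = sp.Function('psi1')(q, t)
psi2 = sp.Function('psi2')(q, t)
L0 = lambda u: -p * sp.diff(u, p) + beta**-1 * sp.diff(u, p, 2)
f2 = sp.Rational(1, 2) * p**2 * sp.diff(f, q, 2) - p * sp.diff(psi1, q) + psi2
lhs = sp.expand(L0(f2))
rhs = sp.expand((beta**-1 - p**2) * sp.diff(f, q, 2) + p * sp.diff(psi1, q))
print(sp.simplify(lhs - rhs))   # 0
```

The same two-line pattern can be reused to check each equation in the hierarchy (8.55).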


The solvability condition is
$$\int_{\mathbb{R}}\left(\frac{\partial\psi_1}{\partial t} + \mathcal{L}_1 f_2\right)\rho_{\mathrm{OU}}(p)\,dp = 0.$$
This leads to the equation
$$\frac{\partial\psi_1}{\partial t} = -\partial_q V\partial_q\psi_1 + \beta^{-1}\partial_q^2\psi_1,$$
together with the initial condition $\psi_1(q,0) = 0$. From the calculations presented in the proof of Theorem 6.5.5, and using Poincaré's inequality for the measure $\frac{1}{Z_q}e^{-\beta V(q)}$, we deduce that
$$\frac12\frac{d}{dt}\|\psi_1\|^2 \leqslant -C\|\psi_1\|^2.$$
We now use Gronwall's inequality to conclude that
$$\psi_1 \equiv 0.$$

Putting everything together, we obtain the first two terms in the $\varepsilon$-expansion of the solution of the Fokker–Planck equation (8.49):
$$\rho(p,q,t) = Z^{-1}e^{-\beta H(p,q)}\big(f + \varepsilon(-p\partial_q f) + \mathcal{O}(\varepsilon^2)\big),$$
where $f$ is the solution of (8.56). Notice that we can rewrite the leading order term of the expansion in the form
$$\rho(p,q,t) = (2\pi\beta^{-1})^{-\frac12}e^{-\beta p^2/2}\rho_V(q,t) + \mathcal{O}(\varepsilon),$$
where $\rho_V = Z^{-1}e^{-\beta V(q)}f$ is the solution of the Smoluchowski Fokker–Planck equation
$$\frac{\partial\rho_V}{\partial t} = \partial_q(\partial_q V\rho_V) + \beta^{-1}\partial_q^2\rho_V.$$

It is possible to expand the $n$-th term in the expansion (8.54) in terms of Hermite functions (the eigenfunctions of the generator of the OU process),
$$f_n(p,q,t) = \sum_{k=0}^{n}f_{nk}(q,t)\phi_k(p), \qquad (8.57)$$
where $\phi_k(p)$ is the $k$-th eigenfunction of $\mathcal{L}_0$:
$$-\mathcal{L}_0\phi_k = \lambda_k\phi_k.$$
We can obtain the following system of equations ($\widehat{\mathcal{L}} = \beta^{-1}\partial_q - \partial_q V$):
$$\widehat{\mathcal{L}}f_{n,1} = 0,$$
$$\sqrt{(k+1)\beta^{-1}}\,\widehat{\mathcal{L}}f_{n,k+1} + \sqrt{k\beta^{-1}}\,\partial_q f_{n,k-1} = -k f_{n+1,k}, \qquad k = 1, 2, \dots, n-1,$$
$$\sqrt{n\beta^{-1}}\,\partial_q f_{n,n-1} = -n f_{n+1,n},$$
$$\sqrt{(n+1)\beta^{-1}}\,\partial_q f_{n,n} = -(n+1) f_{n+1,n+1}.$$
Using this method we can obtain the first three terms in the expansion:
$$\rho(p,q,t) = \rho_\beta(p,q)\bigg(f + \varepsilon\big(-\sqrt{\beta^{-1}}\partial_q f\,\phi_1\big) + \varepsilon^2\Big(\frac{\beta^{-1}}{\sqrt{2}}\partial_q^2 f\,\phi_2 + f_{20}\Big) + \varepsilon^3\Big(-\frac{\sqrt{\beta^{-3}}}{3!}\partial_q^3 f\,\phi_3 + \big(-\sqrt{\beta^{-1}}\widehat{\mathcal{L}}\partial_q^2 f - \sqrt{\beta^{-1}}\partial_q f_{20}\big)\phi_1\Big)\bigg) + \mathcal{O}(\varepsilon^4).$$

8.4.2 The Underdamped Limit

Consider now the rescaling $\lambda_{\gamma,\varepsilon} = 1$, $\mu_{\gamma,\varepsilon} = \gamma$. The Langevin equation becomes
$$\ddot q^{\gamma} = -\gamma^{-2}\nabla V(q^{\gamma}) - \dot q^{\gamma} + \sqrt{2\gamma^{-2}\beta^{-1}}\,\dot W. \qquad (8.58)$$
We write equation (8.58) as a system of two equations:
$$\dot q^{\gamma} = \gamma^{-1}p^{\gamma}, \qquad \dot p^{\gamma} = -\gamma^{-1}V'(q^{\gamma}) - p^{\gamma} + \sqrt{2\beta^{-1}}\,\dot W.$$
This is the equation for an $\mathcal{O}(1/\gamma)$ Hamiltonian system perturbed by $\mathcal{O}(1)$ noise. We expect that, to leading order, the energy is conserved, since it is conserved for the Hamiltonian system. We apply Itô's formula to the Hamiltonian of the system to obtain
$$\dot H = \big(\beta^{-1} - p^2\big) + \sqrt{2\beta^{-1}p^2}\,\dot W,$$
with $p^2 = p^2(H,q) = 2(H - V(q))$.

Thus, in order to study the $\gamma \to 0$ limit we need to analyze the following fast/slow system of SDEs:
$$\dot H = \big(\beta^{-1} - p^2\big) + \sqrt{2\beta^{-1}p^2}\,\dot W, \qquad (8.59a)$$
$$\dot p^{\gamma} = -\gamma^{-1}V'(q^{\gamma}) - p^{\gamma} + \sqrt{2\beta^{-1}}\,\dot W. \qquad (8.59b)$$

The Hamiltonian is the slow variable, whereas the momentum (or the position) is the fast variable. Assuming that we can average over the Hamiltonian dynamics, we obtain the limiting SDE for the Hamiltonian:
$$\dot H = \big(\beta^{-1} - \langle p^2\rangle\big) + \sqrt{2\beta^{-1}\langle p^2\rangle}\,\dot W. \qquad (8.60)$$
The limiting SDE lives on the graph associated with the Hamiltonian system. The domain of definition of the limiting Markov process is defined through appropriate boundary conditions (the gluing conditions) at the interior vertices of the graph.

We identify all points belonging to the same connected component of a level curve $\{x : H(x) = H\}$, $x = (q,p)$. Each point on an edge of the graph corresponds to a trajectory; interior vertices correspond to separatrices. Let $I_i$, $i = 1,\dots,d$ be the edges of the graph. Then $(i, H)$ defines a global coordinate system on the graph.
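The time-scale separation behind this averaging is visible in a quick simulation (our own sketch; the quartic potential and all parameter values are illustrative assumptions): over a window containing several fast oscillations, the momentum sweeps its whole range while the energy barely moves.

```python
import numpy as np

# Simulate  q' = p/gamma,  p' = -V'(q)/gamma - p + sqrt(2/beta) W'
# with the assumed potential V(q) = q^4/4, and compare how much the
# fast variable p and the slow variable H = p^2/2 + V(q) move over a
# short window (a few fast periods).
rng = np.random.default_rng(2)
gamma, beta, dt, T = 0.01, 4.0, 1e-5, 0.1
q, p = 1.0, 0.0
p_vals, H_vals = [], []
for _ in range(int(T / dt)):
    dW = np.sqrt(dt) * rng.standard_normal()
    q += (p / gamma) * dt
    p += (-(q**3) / gamma - p) * dt + np.sqrt(2.0 / beta) * dW
    p_vals.append(p)
    H_vals.append(0.5 * p**2 + 0.25 * q**4)
p_spread = np.ptp(p_vals)   # O(1): p traverses the whole energy shell
H_spread = np.ptp(H_vals)   # much smaller over the same window
print(p_spread, H_spread)
```

This is the numerical picture behind treating $H$ as the slow variable in (8.59).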

We will study the small-$\gamma$ asymptotics by analyzing the corresponding backward Kolmogorov equation using singular perturbation theory. The generator of the process $\{q^{\gamma}, p^{\gamma}\}$ is
$$\mathcal{L}^{\gamma} = \gamma^{-1}\big(p\partial_q - \partial_q V\partial_p\big) - p\partial_p + \beta^{-1}\partial_p^2 = \gamma^{-1}\mathcal{L}_0 + \mathcal{L}_1.$$
Let $u^{\gamma} = \mathbb{E}\big(f(p^{\gamma}(p,q;t), q^{\gamma}(p,q;t))\big)$. It satisfies the backward Kolmogorov equation associated to the process $\{q^{\gamma}, p^{\gamma}\}$:
$$\frac{\partial u^{\gamma}}{\partial t} = \left(\frac{1}{\gamma}\mathcal{L}_0 + \mathcal{L}_1\right)u^{\gamma}. \qquad (8.61)$$
We look for a solution in the form of a power series expansion in $\gamma$:
$$u^{\gamma} = u_0 + \gamma u_1 + \gamma^2 u_2 + \dots$$
We substitute this ansatz into (8.61) and equate equal powers of $\gamma$ to obtain the following sequence of equations:
$$\mathcal{L}_0 u_0 = 0, \qquad (8.62a)$$
$$\mathcal{L}_0 u_1 = -\mathcal{L}_1 u_0 + \frac{\partial u_0}{\partial t}, \qquad (8.62b)$$
$$\mathcal{L}_0 u_2 = -\mathcal{L}_1 u_1 + \frac{\partial u_1}{\partial t}. \qquad (8.62c)$$

Notice that the operator $\mathcal{L}_0$ is the backward Liouville operator of the Hamiltonian system with Hamiltonian
$$H = \frac12 p^2 + V(q).$$
We assume that there are no integrals of motion other than the Hamiltonian. This means that the null space of $\mathcal{L}_0$ consists of functions of the Hamiltonian:
$$\mathcal{N}(\mathcal{L}_0) = \big\{\text{functions of }H\big\}. \qquad (8.63)$$
Let us now analyze equations (8.62). We start with (8.62a); eqn. (8.63) implies that $u_0$ depends on $q, p$ through the Hamiltonian function $H$:
$$u_0 = u(H(p,q), t). \qquad (8.64)$$

Now we proceed with (8.62b). For this we need to find the solvability condition for equations of the form
$$\mathcal{L}_0 u = f. \qquad (8.65)$$
We multiply it by an arbitrary smooth function of $H(p,q)$, integrate over $\mathbb{R}^2$ and use the skew-symmetry of the Liouville operator $\mathcal{L}_0$ to deduce:¹
$$\int_{\mathbb{R}^2}\mathcal{L}_0 u\,F(H(p,q))\,dp\,dq = \int_{\mathbb{R}^2}u\,\mathcal{L}_0^*F(H(p,q))\,dp\,dq = \int_{\mathbb{R}^2}u\big(-\mathcal{L}_0 F(H(p,q))\big)\,dp\,dq = 0, \qquad \forall F \in C_b^{\infty}(\mathbb{R}).$$
This implies that the solvability condition for equation (8.65) is
$$\int_{\mathbb{R}^2}f(p,q)F(H(p,q))\,dp\,dq = 0, \qquad \forall F \in C_b^{\infty}(\mathbb{R}). \qquad (8.66)$$
We use the solvability condition in (8.62b) to obtain
$$\int_{\mathbb{R}^2}\left(\mathcal{L}_1 u_0 - \frac{\partial u_0}{\partial t}\right)F(H(p,q))\,dp\,dq = 0. \qquad (8.67)$$

¹We assume that both $u$ and $F$ decay to $0$ as $|p| \to \infty$, to justify the integration by parts that follows.

To proceed, we need to understand how $\mathcal{L}_1$ acts on functions of $H(p,q)$. Let $\phi = \phi(H(p,q))$. We have that
$$\frac{\partial\phi}{\partial p} = \frac{\partial H}{\partial p}\frac{\partial\phi}{\partial H} = p\frac{\partial\phi}{\partial H}$$
and
$$\frac{\partial^2\phi}{\partial p^2} = \frac{\partial}{\partial p}\left(p\frac{\partial\phi}{\partial H}\right) = \frac{\partial\phi}{\partial H} + p^2\frac{\partial^2\phi}{\partial H^2}.$$
The above calculations imply that, when $\mathcal{L}_1$ acts on functions $\phi = \phi(H(p,q))$, it becomes
$$\mathcal{L}_1 = \big(\beta^{-1} - p^2\big)\partial_H + \beta^{-1}p^2\partial_H^2, \qquad (8.68)$$
where $p^2 = p^2(H,q) = 2(H - V(q))$.

We want to change variables in the integral (8.67) and pass from $(p,q)$ to $(H,q)$. The Jacobian of the transformation is
$$\frac{\partial(p,q)}{\partial(H,q)} = \begin{vmatrix}\frac{\partial p}{\partial H} & \frac{\partial p}{\partial q}\\ \frac{\partial q}{\partial H} & \frac{\partial q}{\partial q}\end{vmatrix} = \frac{\partial p}{\partial H} = \frac{1}{p(H,q)}.$$
We use this, together with (8.68), to rewrite eqn. (8.67) as
$$\int\!\!\int\left(\frac{\partial u}{\partial t} - \big[(\beta^{-1} - p^2)\partial_H + \beta^{-1}p^2\partial_H^2\big]u\right)F(H)\,p^{-1}(H,q)\,dH\,dq = 0.$$
We introduce the notation
$$\langle\cdot\rangle := \int\cdot\,dq.$$
The integration over $q$ can be performed "explicitly":
$$\int\left[\frac{\partial u}{\partial t}\langle p^{-1}\rangle - \Big(\big(\beta^{-1}\langle p^{-1}\rangle - \langle p\rangle\big)\partial_H + \beta^{-1}\langle p\rangle\partial_H^2\Big)u\right]F(H)\,dH = 0.$$
This equation should be valid for every smooth function $F(H)$, and this requirement leads to the differential equation
$$\langle p^{-1}\rangle\frac{\partial u}{\partial t} = \big(\beta^{-1}\langle p^{-1}\rangle - \langle p\rangle\big)\partial_H u + \beta^{-1}\langle p\rangle\partial_H^2 u,$$
or
$$\frac{\partial u}{\partial t} = \big(\beta^{-1} - \langle p^{-1}\rangle^{-1}\langle p\rangle\big)\partial_H u + \beta^{-1}\langle p^{-1}\rangle^{-1}\langle p\rangle\,\partial_H^2 u.$$


Thus, we have obtained the limiting backward Kolmogorov equation for the energy, which is the "slow variable". From this equation we can read off the limiting SDE for the Hamiltonian:
$$\dot H = b(H) + \sigma(H)\dot W, \qquad (8.69)$$
where
$$b(H) = \beta^{-1} - \langle p^{-1}\rangle^{-1}\langle p\rangle, \qquad \sigma(H) = \sqrt{2\beta^{-1}\langle p^{-1}\rangle^{-1}\langle p\rangle}.$$
Notice that the noise that appears in the limiting equation (8.69) is multiplicative, contrary to the additive noise in the Langevin equation.

As is well known from classical mechanics, the action and the frequency are defined as
$$I(E) = \int p(q,E)\,dq$$
and
$$\omega(E) = 2\pi\left(\frac{dI}{dE}\right)^{-1},$$

respectively. Using the action and the frequency we can write the limiting Fokker–Planck equation for the distribution function of the energy in a very compact form.

Theorem 8.4.3. The limiting Fokker–Planck equation for the energy distribution function $\rho(E,t)$ is
$$\frac{\partial\rho}{\partial t} = \frac{\partial}{\partial E}\left(I(E)\Big(1 + \beta^{-1}\frac{\partial}{\partial E}\Big)\Big(\frac{\omega(E)}{2\pi}\rho\Big)\right). \qquad (8.70)$$

Proof. We notice that
$$\frac{dI}{dE} = \int\frac{\partial p}{\partial E}\,dq = \int p^{-1}\,dq,$$
and consequently
$$\langle p^{-1}\rangle^{-1} = \frac{\omega(E)}{2\pi}.$$
Hence the limiting Fokker–Planck equation, $\partial_t\rho = -\partial_E(b\rho) + \beta^{-1}\partial_E^2\big(\frac{I\omega}{2\pi}\rho\big)$, can be written as
$$\frac{\partial\rho}{\partial t} = -\beta^{-1}\frac{\partial\rho}{\partial E} + \frac{\partial}{\partial E}\Big(\frac{I\omega}{2\pi}\rho\Big) + \beta^{-1}\frac{\partial}{\partial E}\Big(\frac{dI}{dE}\,\frac{\omega\rho}{2\pi}\Big) + \beta^{-1}\frac{\partial}{\partial E}\Big(I\frac{\partial}{\partial E}\Big(\frac{\omega\rho}{2\pi}\Big)\Big).$$
Since $\frac{dI}{dE}\frac{\omega}{2\pi} = 1$, the third term on the right hand side equals $\beta^{-1}\partial_E\rho$ and cancels the first, so that
$$\frac{\partial\rho}{\partial t} = \frac{\partial}{\partial E}\Big(\frac{I\omega}{2\pi}\rho\Big) + \beta^{-1}\frac{\partial}{\partial E}\Big(I\frac{\partial}{\partial E}\Big(\frac{\omega\rho}{2\pi}\Big)\Big) = \frac{\partial}{\partial E}\left(I(E)\Big(1 + \beta^{-1}\frac{\partial}{\partial E}\Big)\Big(\frac{\omega(E)}{2\pi}\rho\Big)\right),$$
which is precisely equation (8.70).
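For a concrete Hamiltonian the action $I(E)$ and frequency $\omega(E)$ appearing in (8.70) are easy to evaluate numerically. The sketch below (our own) does this by quadrature for the pendulum $V(q) = -\cos q$ at an energy above the separatrix, checking against the closed form for the action quoted later in this chapter; note that scipy's `ellipe` takes the parameter $m = k^2$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import ellipe

# Action I(E) = \int_{-pi}^{pi} p(q,E) dq,  p = sqrt(2(E - V(q))),
# for V(q) = -cos(q) and E above the separatrix energy E0 = 1,
# versus the closed form 2^{5/2} sqrt(E+1) E(sqrt(2/(E+1))).
def action(E):
    return quad(lambda q: np.sqrt(2.0 * (E + np.cos(q))), -np.pi, np.pi)[0]

def action_closed_form(E):
    return 2.0**2.5 * np.sqrt(E + 1.0) * ellipe(2.0 / (E + 1.0))

E = 2.0
I_num, I_exact = action(E), action_closed_form(E)

# frequency omega(E) = 2 pi (dI/dE)^{-1}, with dI/dE = \int p^{-1} dq
dIdE = quad(lambda q: 1.0 / np.sqrt(2.0 * (E + np.cos(q))), -np.pi, np.pi)[0]
omega = 2.0 * np.pi / dIdE
print(I_num, I_exact, omega)
```

With $I(E)$ and $\omega(E)$ tabulated this way, the coefficients of (8.70) are known explicitly.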

Remarks 8.4.4. i. We emphasize that the above formal procedure does not provide us with the boundary conditions for the limiting Fokker–Planck equation. We will discuss this issue in the next section.

ii. If we rescale back to the original time-scale we obtain the equation
$$\frac{\partial\rho}{\partial t} = \gamma\frac{\partial}{\partial E}\left(I(E)\Big(1 + \beta^{-1}\frac{\partial}{\partial E}\Big)\Big(\frac{\omega(E)}{2\pi}\rho\Big)\right). \qquad (8.71)$$
We will use this equation later on to calculate the rate of escape from a potential barrier in the energy-diffusion-limited regime.

8.5 Brownian Motion in Periodic Potentials

Basic model:
$$m\ddot x = -\gamma\dot x(t) - \nabla V(x(t), f(t)) + y(t) + \sqrt{2\gamma k_B T}\,\xi(t). \qquad (8.72)$$
Goal: calculate the effective drift and the effective diffusion tensor,
$$U_{\mathrm{eff}} = \lim_{t\to\infty}\frac{\langle x(t)\rangle}{t} \qquad (8.73)$$
and
$$D_{\mathrm{eff}} = \lim_{t\to\infty}\frac{\big\langle(x(t) - \langle x(t)\rangle)\otimes(x(t) - \langle x(t)\rangle)\big\rangle}{2t}. \qquad (8.74)$$

8.5.1 The Langevin equation in a periodic potential

We start by studying the underdamped dynamics of a Brownian particle $x(t)\in\mathbb{R}^d$ moving in a smooth periodic potential:
$$\ddot x = -\nabla V(x(t)) - \gamma\dot x(t) + \sqrt{2\gamma k_B T}\,\xi(t), \qquad (8.75)$$
where $\gamma$ is the friction coefficient, $k_B$ the Boltzmann constant and $T$ denotes the temperature; $\xi(t)$ stands for the standard $d$-dimensional white noise process, i.e.
$$\langle\xi_i(t)\rangle = 0 \quad\text{and}\quad \langle\xi_i(t)\xi_j(s)\rangle = \delta_{ij}\delta(t-s), \qquad i,j = 1,\dots,d.$$
The potential $V(x)$ is periodic in $x$, with period 1 in all spatial directions, and satisfies $\|\nabla V(x)\|_{L^\infty} = 1$:
$$V(x + e_i) = V(x), \qquad i = 1,\dots,d,$$
where $\{e_i\}_{i=1}^{d}$ denotes the standard basis of $\mathbb{R}^d$.

Notice that we have already non-dimensionalized eqn. (8.75) in such a way that the non-dimensional particle mass is 1 and the maximum of the (gradient of the) potential is fixed [41]. Hence, the only parameters in the problem are the friction coefficient and the temperature. Notice, furthermore, that the parameter $\gamma$ in (8.75) controls the coupling between the Hamiltonian system $\ddot x = -\nabla V(x)$ and the thermal heat bath: $\gamma \gg 1$ implies that the Hamiltonian system is strongly coupled to the heat bath, whereas $\gamma \ll 1$ corresponds to weak coupling.

Equation (8.75) defines a Markov process in the phase space $\mathbb{T}^d\times\mathbb{R}^d$. Indeed, let us write (8.75) as a first order system:
$$\dot x(t) = y(t), \qquad (8.76a)$$
$$\dot y(t) = -\nabla V(x(t)) - \gamma y(t) + \sqrt{2\gamma k_B T}\,\xi(t). \qquad (8.76b)$$
The process $\{x(t), y(t)\}$ is Markovian with generator
$$\mathcal{L} = y\cdot\nabla_x - \nabla V(x)\cdot\nabla_y + \gamma\big(-y\cdot\nabla_y + D\Delta_y\big).$$
In writing the above we have set $D = k_B T$. This process is ergodic. The unique invariant measure is absolutely continuous with respect to the Lebesgue measure and its density is the Maxwell–Boltzmann distribution
$$\rho(y,x) = \frac{1}{(2\pi D)^{d/2}Z}\,e^{-\frac{1}{D}H(x,y)}, \qquad (8.77)$$
where $Z = \int_{\mathbb{T}^d}e^{-V(x)/D}\,dx$ and $H(x,y)$ is the Hamiltonian of the system,
$$H(x,y) = \frac12 y^2 + V(x).$$
The long time behavior of solutions to (8.75) is governed by an effective Brownian motion. Indeed, the following central limit theorem holds [65, 55].

Theorem 8.5.1. Let $V(x)\in C(\mathbb{T}^d)$. Define the rescaled process
$$x^{\varepsilon}(t) := \varepsilon x(t/\varepsilon^2).$$


Then $x^{\varepsilon}(t)$ converges weakly, as $\varepsilon\to 0$, to a Brownian motion with covariance
$$D_{\mathrm{eff}} = \int_{\mathbb{T}^d\times\mathbb{R}^d}-\mathcal{L}\Phi\otimes\Phi\,\mu(dx\,dy), \qquad (8.78)$$
where $\mu(dx\,dy) = \rho(x,y)\,dx\,dy$ and the vector valued function $\Phi$ is the solution of the Poisson equation
$$-\mathcal{L}\Phi = y. \qquad (8.79)$$

We are interested in analyzing the dependence of $D_{\mathrm{eff}}$ on $\gamma$. We will mostly focus on the one dimensional case. We start by rescaling the Langevin equation (8.75):
$$\ddot x = F(x) - \gamma\dot x + \sqrt{2\gamma\beta^{-1}}\,\dot W, \qquad (8.80)$$
where we have set $F(x) = -\nabla V(x)$. We will assume that the potential is periodic with period $2\pi$ in every direction. Since we expect that at sufficiently long length and time scales the particle performs a purely diffusive motion, we perform a diffusive rescaling of the equations of motion (8.80): $t \to t/\varepsilon^2$, $x \to x/\varepsilon$. Using the fact that $W(ct) = \frac{1}{\sqrt c}W(t)$ in law, we obtain
$$\varepsilon^2\ddot x = \frac{1}{\varepsilon}F\Big(\frac{x}{\varepsilon}\Big) - \gamma\dot x + \sqrt{2\gamma\beta^{-1}}\,\dot W.$$
Introducing $p = \varepsilon\dot x$ and $q = x/\varepsilon$, we write this equation as a first order system:
$$\dot x = \frac{1}{\varepsilon}p, \qquad \dot p = \frac{1}{\varepsilon^2}F(q) - \frac{1}{\varepsilon^2}\gamma p + \frac{1}{\varepsilon}\sqrt{2\gamma\beta^{-1}}\,\dot W, \qquad \dot q = \frac{1}{\varepsilon^2}p, \qquad (8.81)$$
with the understanding that $q\in[-\pi,\pi]^d$ and $x, p\in\mathbb{R}^d$. Our goal now is to eliminate the fast variables $p, q$ and to obtain an equation for the slow variable $x$. We shall accomplish this by studying the corresponding backward Kolmogorov equation using singular perturbation theory for partial differential equations.

Let
$$u^{\varepsilon}(p,q,x,t) = \mathbb{E}\big(f(p(t), q(t), x(t))\,\big|\,p(0) = p,\; q(0) = q,\; x(0) = x\big),$$
where $\mathbb{E}$ denotes the expectation with respect to the Brownian motion $W(t)$ in the Langevin equation and $f$ is a smooth function.² The evolution of the function $u^{\varepsilon}(p,q,x,t)$ is governed by the backward Kolmogorov equation associated to equations (8.81), namely [59]³
$$\frac{\partial u^{\varepsilon}}{\partial t} = \frac{1}{\varepsilon}p\cdot\nabla_x u^{\varepsilon} + \frac{1}{\varepsilon^2}\Big(-\nabla_q V(q)\cdot\nabla_p + p\cdot\nabla_q + \gamma\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big)\Big)u^{\varepsilon} =: \Big(\frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1\Big)u^{\varepsilon}, \qquad (8.82)$$
where
$$\mathcal{L}_0 = -\nabla_q V(q)\cdot\nabla_p + p\cdot\nabla_q + \gamma\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big), \qquad \mathcal{L}_1 = p\cdot\nabla_x.$$
The invariant distribution of the fast process $\{q(t), p(t)\}$ in $\mathbb{T}^d\times\mathbb{R}^d$ is the Maxwell–Boltzmann distribution
$$\rho_\beta(q,p) = Z^{-1}e^{-\beta H(q,p)}, \qquad Z = \int_{\mathbb{T}^d\times\mathbb{R}^d}e^{-\beta H(q,p)}\,dq\,dp,$$
where $H(q,p) = \frac12|p|^2 + V(q)$. Indeed, we can readily check that
$$\mathcal{L}_0^*\rho_\beta(q,p) = 0,$$
where $\mathcal{L}_0^*$ denotes the Fokker–Planck operator, the $L^2$-adjoint of the generator $\mathcal{L}_0$:
$$\mathcal{L}_0^* f = \nabla_q V(q)\cdot\nabla_p f - p\cdot\nabla_q f + \gamma\big(\nabla_p\cdot(pf) + \beta^{-1}\Delta_p f\big).$$
The null space of the generator $\mathcal{L}_0$ consists of constants in $q, p$. Moreover, the equation
$$-\mathcal{L}_0 f = g \qquad (8.83)$$
has a unique (up to constants) solution if and only if
$$\langle g\rangle_\beta := \int_{\mathbb{T}^d\times\mathbb{R}^d}g(q,p)\rho_\beta(q,p)\,dq\,dp = 0. \qquad (8.84)$$
Equation (8.83) is equipped with periodic boundary conditions with respect to $q$, and is such that
$$\int_{\mathbb{T}^d\times\mathbb{R}^d}|f|^2\rho_\beta\,dq\,dp < \infty. \qquad (8.85)$$
These two conditions are sufficient to ensure existence and uniqueness of solutions (up to constants) of equation (8.83) [28, 29, 55].

²In other words, we have that
$$u^{\varepsilon}(p,q,x,t) = \int f(x,v,t;p,q)\,\rho(x,v,t;p,q)\,\mu(p,q)\,dp\,dq\,dx\,dv,$$
where $\rho(x,v,t;p,q)$ is the solution of the Fokker–Planck equation and $\mu(p,q)$ is the initial distribution.

³It is more customary in the physics literature to use the forward Kolmogorov equation, i.e. the Fokker–Planck equation. However, for the calculation presented below it is more convenient to use the backward rather than the forward Kolmogorov equation. The two formulations are equivalent; see [57, Ch. 6] for details.

We assume that the following ansatz for the solution $u^{\varepsilon}$ holds:
$$u^{\varepsilon} = u_0 + \varepsilon u_1 + \varepsilon^2 u_2 + \dots \qquad (8.86)$$
with $u_i = u_i(p,q,x,t)$, $i = 1, 2, \dots$, being $2\pi$-periodic in $q$ and satisfying condition (8.85). We substitute (8.86) into (8.82) and equate equal powers of $\varepsilon$ to obtain the following sequence of equations:
$$\mathcal{L}_0 u_0 = 0, \qquad (8.87a)$$
$$\mathcal{L}_0 u_1 = -\mathcal{L}_1 u_0, \qquad (8.87b)$$
$$\mathcal{L}_0 u_2 = -\mathcal{L}_1 u_1 + \frac{\partial u_0}{\partial t}. \qquad (8.87c)$$
From the first equation in (8.87) we deduce that $u_0 = u_0(x,t)$, since the null space of $\mathcal{L}_0$ consists of functions which are constants in $p$ and $q$. Now the second equation in (8.87) becomes
$$\mathcal{L}_0 u_1 = -p\cdot\nabla_x u_0.$$
Since $\langle p\rangle_\beta = 0$, the right hand side of the above equation is mean-zero with respect to the Maxwell–Boltzmann distribution. Hence, the above equation is well-posed. We solve it using separation of variables:
$$u_1 = \Phi(p,q)\cdot\nabla_x u_0,$$
with
$$-\mathcal{L}_0\Phi = p. \qquad (8.88)$$
This Poisson equation is posed on $\mathbb{T}^d\times\mathbb{R}^d$. The solution is periodic in $q$ and satisfies condition (8.85). Now we proceed with the third equation in (8.87). We apply the solvability condition to obtain
$$\frac{\partial u_0}{\partial t} = \int_{\mathbb{T}^d\times\mathbb{R}^d}\mathcal{L}_1 u_1\,\rho_\beta(p,q)\,dp\,dq = \sum_{i,j=1}^{d}\left(\int_{\mathbb{T}^d\times\mathbb{R}^d}p_i\Phi_j\,\rho_\beta(p,q)\,dp\,dq\right)\frac{\partial^2 u_0}{\partial x_i\partial x_j}.$$


This is the backward Kolmogorov equation which governs the dynamics on large scales. We write it in the form
$$\frac{\partial u_0}{\partial t} = \sum_{i,j=1}^{d}D_{ij}\frac{\partial^2 u_0}{\partial x_i\partial x_j}, \qquad (8.89)$$
where the effective diffusion tensor is
$$D_{ij} = \int_{\mathbb{T}^d\times\mathbb{R}^d}p_i\Phi_j\,\rho_\beta(p,q)\,dp\,dq, \qquad i,j = 1,\dots,d. \qquad (8.90)$$
The calculation of the effective diffusion tensor requires the solution of the boundary value problem (8.88) and the calculation of the integral in (8.90). The limiting backward Kolmogorov equation is well posed since the diffusion tensor is nonnegative. Indeed, let $\xi$ be a unit vector in $\mathbb{R}^d$. We calculate (using the notation $\Phi_\xi = \Phi\cdot\xi$ and $\langle\cdot,\cdot\rangle$ for the Euclidean inner product):
$$\langle\xi, D\xi\rangle = \int(p\cdot\xi)\,\Phi_\xi\,\rho_\beta\,dp\,dq = \int\big(-\mathcal{L}_0\Phi_\xi\big)\Phi_\xi\,\rho_\beta\,dp\,dq = \gamma\beta^{-1}\int\big|\nabla_p\Phi_\xi\big|^2\rho_\beta\,dp\,dq \geqslant 0, \qquad (8.91)$$
where an integration by parts was used.

Thus, from the multiscale analysis we conclude that at large length/time scales a particle which diffuses in a periodic potential performs an effective Brownian motion with a nonnegative diffusion tensor given by formula (8.90).

We mention in passing that the analysis presented above can also be applied to the problem of Brownian motion in a tilted periodic potential. The Langevin equation becomes
$$\ddot x(t) = -\nabla V(x(t)) + F - \gamma\dot x(t) + \sqrt{2\gamma\beta^{-1}}\,\dot W(t), \qquad (8.92)$$
where $V(x)$ is periodic with period $2\pi$ and $F$ is a constant force field. The formulas for the effective drift and the effective diffusion tensor are
$$V = \int_{\mathbb{R}^d\times\mathbb{T}^d}p\,\rho(q,p)\,dq\,dp, \qquad D = \int_{\mathbb{R}^d\times\mathbb{T}^d}(p - V)\otimes\phi\,\rho(p,q)\,dp\,dq, \qquad (8.93)$$
where
$$-\mathcal{L}\phi = p - V, \qquad (8.94a)$$
$$\mathcal{L}^*\rho = 0, \qquad \int_{\mathbb{R}^d\times\mathbb{T}^d}\rho(p,q)\,dp\,dq = 1, \qquad (8.94b)$$
with
$$\mathcal{L} = p\cdot\nabla_q + (-\nabla_q V + F)\cdot\nabla_p + \gamma\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big). \qquad (8.95)$$
We have used $\otimes$ to denote the tensor product between two vectors; $\mathcal{L}^*$ denotes the $L^2$-adjoint of the operator $\mathcal{L}$, i.e. the Fokker–Planck operator. Equations (8.94) are equipped with periodic boundary conditions in $q$. The solution of the Poisson equation (8.94a) is also taken to be square integrable with respect to the invariant density $\rho(q,p)$:
$$\int_{\mathbb{R}^d\times\mathbb{T}^d}|\phi(q,p)|^2\rho(p,q)\,dp\,dq < +\infty.$$
The diffusion tensor is again nonnegative definite: a calculation similar to the one used to derive (8.91) shows that
$$\langle\xi, D\xi\rangle = \gamma\beta^{-1}\int\big|\nabla_p\phi_\xi\big|^2\rho(p,q)\,dp\,dq \geqslant 0 \qquad (8.96)$$
for every vector $\xi$ in $\mathbb{R}^d$. The study of diffusion in a tilted periodic potential, in the underdamped regime and in high dimensions, based on the above formulas for $V$ and $D$, will be the subject of a separate publication.

8.5.2 Equivalence With the Green-Kubo Formula

Let us now show that the formula for the diffusion tensor obtained in the previous section, equation (8.90), is equivalent to the Green–Kubo formula (3.14). To simplify the notation we will prove the equivalence of the two formulas in one dimension; the generalization to arbitrary dimensions is immediate. Let $(x(t;q,p), v(t;q,p))$, with $v = \dot x$ and initial conditions $x(0;q,p) = q$, $v(0;q,p) = p$, be the solution of the Langevin equation
$$\ddot x = -\partial_x V - \gamma\dot x + \xi,$$
where $\xi(t)$ stands for Gaussian white noise in one dimension with correlation function
$$\langle\xi(t)\xi(s)\rangle = 2\gamma k_B T\,\delta(t-s).$$
We assume that the $(x,v)$ process is stationary, i.e. that the initial conditions are distributed according to the Maxwell–Boltzmann distribution
$$\rho_\beta(q,p) = Z^{-1}e^{-\beta H(p,q)}.$$
The velocity autocorrelation function is [9, eq. 2.10]
$$\langle v(t;q,p)v(0;q,p)\rangle = \int v\,p\,\rho(x,v,t;p,q)\,\rho_\beta(p,q)\,dp\,dq\,dx\,dv, \qquad (8.97)$$
where $\rho(x,v,t;p,q)$ is the solution of the Fokker–Planck equation
$$\frac{\partial\rho}{\partial t} = \mathcal{L}^*\rho, \qquad \rho(x,v,0;p,q) = \delta(x-q)\delta(v-p),$$
with
$$\mathcal{L}^*\rho = -v\partial_x\rho + \partial_x V(x)\partial_v\rho + \gamma\big(\partial_v(v\rho) + \beta^{-1}\partial_v^2\rho\big).$$

We rewrite (8.97) in the form
$$\langle v(t;q,p)v(0;q,p)\rangle = \int\!\!\int\left(\int\!\!\int v\,\rho(x,v,t;p,q)\,dv\,dx\right)p\,\rho_\beta(p,q)\,dp\,dq =: \int\!\!\int\overline v(t;p,q)\,p\,\rho_\beta(p,q)\,dp\,dq. \qquad (8.98)$$
The function $\overline v(t;p,q)$ satisfies the backward Kolmogorov equation which governs the evolution of observables [59, Ch. 6]:
$$\frac{\partial\overline v}{\partial t} = \mathcal{L}\overline v, \qquad \overline v(0;p,q) = p. \qquad (8.99)$$
We can write, formally, the solution of (8.99) as
$$\overline v = e^{\mathcal{L}t}p. \qquad (8.100)$$
We now combine equations (8.98) and (8.100) to obtain the following formula for the velocity autocorrelation function:
$$\langle v(t;q,p)v(0;q,p)\rangle = \int\!\!\int p\big(e^{\mathcal{L}t}p\big)\rho_\beta(p,q)\,dp\,dq. \qquad (8.101)$$
We substitute this into the Green–Kubo formula to obtain
$$D = \int_0^{\infty}\langle v(t;q,p)v(0;q,p)\rangle\,dt = \int\left(\int_0^{\infty}e^{\mathcal{L}t}\,dt\,p\right)p\,\rho_\beta\,dp\,dq = \int\big(-\mathcal{L}^{-1}p\big)p\,\rho_\beta\,dp\,dq = \int_{-\infty}^{+\infty}\!\!\int_{-\pi}^{\pi}\phi\,p\,\rho_\beta\,dp\,dq,$$
where $\phi$ is the solution of the Poisson equation (8.88). In the above derivation we have used the formula $-\mathcal{L}^{-1} = \int_0^{\infty}e^{\mathcal{L}t}\,dt$, whose proof can be found in [59, Ch. 11].
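The operator identity $-\mathcal{L}^{-1} = \int_0^{\infty}e^{\mathcal{L}t}\,dt$ (on mean-zero observables) can be checked in a toy finite-state setting, with $\mathcal{L}$ replaced by a Markov-chain generator matrix. The sketch below is our own illustration; the generator and the observable are made up:

```python
import numpy as np
from scipy.linalg import expm

# Toy check of  -L^{-1} f = \int_0^infty e^{Lt} f dt  for an ergodic
# continuous-time Markov chain generator L (rows sum to zero) and an
# observable f that is mean-zero under the stationary distribution pi.
L = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 1.0, -1.0]])
# stationary distribution: pi^T L = 0, components summing to one
A = np.vstack([L.T, np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
f = np.array([1.0, 0.0, -1.0])
f = f - (pi @ f) * np.ones(3)              # center: <f>_pi = 0
# quadrature of \int_0^T e^{Lt} f dt using the semigroup step P = e^{L dt}
dt, T = 1e-3, 60.0
P = expm(L * dt)
u, integral = f.copy(), np.zeros(3)
for _ in range(int(T / dt)):
    integral += u * dt
    u = P @ u
# solve -L phi = f with the mean-zero normalization pi . phi = 0
phi = np.linalg.lstsq(np.vstack([-L, pi]), np.append(f, 0.0), rcond=None)[0]
err = np.max(np.abs(integral - phi))
print(err)   # small discretization/truncation error
```

This is the same identity that reduces the time integral of the velocity autocorrelation function to the solution $\phi$ of the Poisson equation above.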


8.6 The Underdamped and Overdamped Limits of the Diffusion Coefficient

In this section we derive approximate formulas for the diffusion coefficient which are valid in the overdamped ($\gamma \gg 1$) and underdamped ($\gamma \ll 1$) limits. The derivation of these formulas is based on the asymptotic analysis of the Poisson equation (8.88).

The Underdamped Limit

In this subsection we solve the Poisson equation (8.88) in one dimension perturbatively for small $\gamma$. We shall use singular perturbation theory for partial differential equations. The operator $\mathcal{L}_0$ that appears in (8.88) can be written in the form
$$\mathcal{L}_0 = \mathcal{L}_H + \gamma\mathcal{L}_{\mathrm{OU}},$$
where $\mathcal{L}_H$ stands for the (backward) Liouville operator associated with the Hamiltonian $H(p,q)$ and $\mathcal{L}_{\mathrm{OU}}$ for the generator of the OU process:
$$\mathcal{L}_H = p\partial_q - \partial_q V\partial_p, \qquad \mathcal{L}_{\mathrm{OU}} = -p\partial_p + \beta^{-1}\partial_p^2.$$
We expect that the solution of the Poisson equation scales like $\gamma^{-1}$ when $\gamma \ll 1$. Thus, we look for a solution of the form
$$\Phi = \frac{1}{\gamma}\phi_0 + \phi_1 + \gamma\phi_2 + \dots \qquad (8.102)$$
We substitute this ansatz into (8.88) to obtain the sequence of equations
$$\mathcal{L}_H\phi_0 = 0, \qquad (8.103a)$$
$$-\mathcal{L}_H\phi_1 = p + \mathcal{L}_{\mathrm{OU}}\phi_0, \qquad (8.103b)$$
$$-\mathcal{L}_H\phi_2 = \mathcal{L}_{\mathrm{OU}}\phi_1. \qquad (8.103c)$$
From equation (8.103a) we deduce that, since $\phi_0$ is in the null space of the Liouville operator, the first term in the expansion is a function of the Hamiltonian $z(p,q) = \frac12 p^2 + V(q)$:
$$\phi_0 = \phi_0(z(p,q)).$$
Now we want to obtain an equation for $\phi_0$ by using the solvability condition for (8.103b). To this end, we multiply this equation by an arbitrary function of $z$, $g = g(z)$, and integrate over $p$ and $q$ to obtain
$$\int_{-\infty}^{+\infty}\int_{-\pi}^{\pi}\big(p + \mathcal{L}_{\mathrm{OU}}\phi_0\big)\,g(z(p,q))\,dp\,dq = 0.$$
We now change from $(p,q)$ coordinates to $(z,q)$, so that the above integral becomes
$$\int_{E_{\min}}^{+\infty}\int_{-\pi}^{\pi}g(z)\big(p(z,q) + \mathcal{L}_{\mathrm{OU}}\phi_0(z)\big)\frac{1}{p(z,q)}\,dz\,dq = 0,$$
where $J = p^{-1}(z,q)$ is the Jacobian of the transformation. The operator $\mathcal{L}_{\mathrm{OU}}$, when applied to functions of the Hamiltonian, becomes
$$\mathcal{L}_{\mathrm{OU}} = (\beta^{-1} - p^2)\frac{\partial}{\partial z} + \beta^{-1}p^2\frac{\partial^2}{\partial z^2}.$$
Hence, the integral equation for $\phi_0(z)$ becomes
$$\int_{E_{\min}}^{+\infty}\int_{-\pi}^{\pi}g(z)\left[p(z,q) + \left((\beta^{-1} - p^2)\frac{\partial}{\partial z} + \beta^{-1}p^2\frac{\partial^2}{\partial z^2}\right)\phi_0(z)\right]\frac{1}{p(z,q)}\,dz\,dq = 0.$$

Let $E_0$ denote the critical energy, i.e. the energy along the separatrix (homoclinic orbit). We set
$$S(z) = \int_{x_1(z)}^{x_2(z)}p(z,q)\,dq, \qquad T(z) = \int_{x_1(z)}^{x_2(z)}\frac{1}{p(z,q)}\,dq,$$
where Risken's notation [64, p. 301] has been used for $x_1(z)$ and $x_2(z)$. We need to consider the cases $\{z > E_0,\ p > 0\}$, $\{z > E_0,\ p < 0\}$ and $\{E_{\min} < z < E_0\}$ separately.

We consider first the case $\{z > E_0,\ p > 0\}$. In this case $x_1(z) = -\pi$, $x_2(z) = \pi$. We can perform the integration with respect to $q$ to obtain
$$\int_{E_0}^{+\infty}g(z)\left[2\pi + \left(\big(\beta^{-1}T(z) - S(z)\big)\frac{\partial}{\partial z} + \beta^{-1}S(z)\frac{\partial^2}{\partial z^2}\right)\phi_0(z)\right]dz = 0.$$
This equation is valid for every test function $g(z)$, from which we obtain the following differential equation for $\phi_0$:
$$-\mathcal{L}\phi := -\beta^{-1}\frac{S(z)}{T(z)}\phi'' + \left(\frac{S(z)}{T(z)} - \beta^{-1}\right)\phi' = \frac{2\pi}{T(z)}, \qquad (8.104)$$
where primes denote differentiation with respect to $z$ and where the subscript 0 has been dropped for notational simplicity.


A similar calculation shows that in the regions $\{z > E_0,\ p < 0\}$ and $\{E_{\min} < z < E_0\}$ the equation for $\phi_0$ is
$$-\mathcal{L}\phi = -\frac{2\pi}{T(z)}, \qquad z > E_0,\ p < 0, \qquad (8.105)$$
and
$$-\mathcal{L}\phi = 0, \qquad E_{\min} < z < E_0. \qquad (8.106)$$
Equations (8.104), (8.105) and (8.106) are augmented with condition (8.85) and a continuity condition at the critical energy [18]:
$$2\phi_3'(E_0) = \phi_1'(E_0) + \phi_2'(E_0), \qquad (8.107)$$
where $\phi_1, \phi_2, \phi_3$ are the solutions of equations (8.104), (8.105) and (8.106), respectively.

The average of a function $h(q,p) = h(q, p(z,q))$ can be written in the form [64, p. 303]
$$\langle h(q,p)\rangle_\beta := \int_{-\infty}^{\infty}\int_{-\pi}^{\pi}h(q,p)\,\rho_\beta(q,p)\,dq\,dp = Z_\beta^{-1}\int_{E_{\min}}^{+\infty}\int_{x_1(z)}^{x_2(z)}\big(h(q,p(z,q)) + h(q,-p(z,q))\big)\,p(z,q)^{-1}e^{-\beta z}\,dz\,dq,$$
where the partition function is
$$Z_\beta = \sqrt{\frac{2\pi}{\beta}}\int_{-\pi}^{\pi}e^{-\beta V(q)}\,dq.$$

From equation (8.106) we deduce that $\phi_3(z) = 0$. Furthermore, we have that $\phi_1(z) = -\phi_2(z)$. These facts, together with the above formula for averaging with respect to the Boltzmann distribution, yield
$$D = \langle p\Phi(p,q)\rangle_\beta = \frac{1}{\gamma}\langle p\phi_0\rangle_\beta + \mathcal{O}(1) \qquad (8.108)$$
$$= \frac{4\pi}{\gamma}Z_\beta^{-1}\int_{E_0}^{+\infty}\phi_0(z)\,e^{-\beta z}\,dz + \mathcal{O}(1), \qquad (8.109)$$
to leading order in $\gamma$, where $\phi_0(z)$ is the solution of the two point boundary value problem (8.104). We remark that if we start instead from the formula $D = \gamma\beta^{-1}\langle|\partial_p\Phi|^2\rangle_\beta$ for the diffusion coefficient, we obtain the following formula, which is equivalent to (8.109):
$$D = \frac{2}{\gamma\beta}Z_\beta^{-1}\int_{E_0}^{+\infty}S(z)\,|\partial_z\phi_0(z)|^2\,e^{-\beta z}\,dz.$$

−β−1(Sφ′)′ + Sφ′ = 2π.

This equation can be rewritten as

−β−1(e−βzSφ′

)= e−βz.

Condition (8.85) implies that the derivative of the unique solution of (8.104) is

φ′(z) = S−1(z).

We use this in (8.109), together with an integration by parts, to obtain the followingformula for the diffusion coefficient:

D =1

γ8π2Z−1

β β−1

∫ +∞

E0

e−βz

S(z)dz. (8.110)

We emphasize the fact that this formula is exact in the limit as γ → 0 and is validfor all periodic potentials and for all values of the temperature.

Consider now the case of the nonlinear pendulumV (q) = − cos(q). Thepartition function is

Zβ =(2π)3/2

β1/2J0(β),

whereJ0(·) is the modified Bessel function of the first kind. Furthermore, a simplecalculation yields

S(z) = 25/2√z + 1E

(√2

z + 1

)

,

whereE(·) is the complete elliptic integral of the second kind. The formula for thediffusion coefficient becomes

D =1

γ

√π

2β1/2J0(β)

∫ +∞

1

e−βz√z + 1E(

√2/(z + 1))

dz. (8.111)

Page 190: STOCHASTIC PROCESSES AND APPLICATIONS - …lucambio/CE222/2S2011/stoch_proc_notes.pdf · 2 Elements of Probability Theory 9 ... theory of stochastic processes and are useful in the

180 CHAPTER 8. THE LANGEVIN EQUATION

We now use the asymptotic formula $J_0(\beta) \approx (2\pi\beta)^{-1/2} e^{\beta}$, $\beta \gg 1$, and the fact that $E(1) = 1$ to obtain the small-temperature asymptotics of the diffusion coefficient:
$$D = \frac{1}{\gamma}\,\frac{\pi}{2\beta}\, e^{-2\beta}, \qquad \beta \gg 1, \qquad (8.112)$$

which is precisely the formula obtained by Risken.

Unlike the overdamped limit, which is treated in the next section, it is not straightforward to obtain the next-order correction in the formula for the effective diffusivity. This is due to the discontinuity of the solution of the Poisson equation (8.88) along the separatrix. In particular, the next-order correction to $\phi$ when $\gamma \ll 1$ is of $O(\gamma^{-1/2})$, rather than $O(1)$ as suggested by the ansatz (8.102).

Upon combining the formula for the diffusion coefficient with the formula for the hopping rate from Kramers' theory [31, eqn. 4.48(a)], we can obtain a formula for the mean square jump length at low friction. For the cosine potential, and for $\beta \gg 1$, this formula is
$$\langle \ell^2\rangle = \frac{\pi^2}{8\gamma^2\beta^2} \qquad \text{for } \gamma \ll 1,\; \beta \gg 1. \qquad (8.113)$$

The Overdamped Limit

In this subsection we study the large-$\gamma$ asymptotics of the diffusion coefficient. As in the previous case, we use singular perturbation theory, e.g. [32, Ch. 8]. The regularity of the solution of (8.88) when $\gamma \gg 1$ will enable us to obtain the first two terms in the $\frac{1}{\gamma}$ expansion without any difficulty.

We set $\gamma = \frac{1}{\varepsilon}$. The differential operator $\mathcal{L}_0$ becomes
$$\mathcal{L}_0 = \frac{1}{\varepsilon}\mathcal{L}_{OU} + \mathcal{L}_H.$$
We look for a solution of (8.88) in the form of a power series expansion in $\varepsilon$:
$$\Phi = \phi_0 + \varepsilon\phi_1 + \varepsilon^2\phi_2 + \varepsilon^3\phi_3 + \dots \qquad (8.114)$$

We substitute this into (8.88) and obtain the following sequence of equations:
$$-\mathcal{L}_{OU}\phi_0 = 0, \qquad (8.115a)$$
$$-\mathcal{L}_{OU}\phi_1 = p + \mathcal{L}_H\phi_0, \qquad (8.115b)$$
$$-\mathcal{L}_{OU}\phi_2 = \mathcal{L}_H\phi_1, \qquad (8.115c)$$
$$-\mathcal{L}_{OU}\phi_3 = \mathcal{L}_H\phi_2. \qquad (8.115d)$$


The null space of the Ornstein–Uhlenbeck operator $\mathcal{L}_{OU}$ consists of constants in $p$. Consequently, from the first equation in (8.115) we deduce that the first term in the expansion is independent of $p$: $\phi_0 = \phi(q)$. The second equation becomes
$$-\mathcal{L}_{OU}\phi_1 = p\,(1 + \partial_q\phi).$$

Let
$$\nu_\beta(p) = \left(\frac{2\pi}{\beta}\right)^{-\frac12} e^{-\frac{\beta p^2}{2}}$$
be the invariant distribution of the OU process (i.e. $\mathcal{L}^*_{OU}\nu_\beta(p) = 0$). The solvability condition for an equation of the form $-\mathcal{L}_{OU}\phi = f$ requires that the right-hand side average to $0$ with respect to $\nu_\beta(p)$, i.e. that the right-hand side be orthogonal to the null space of the adjoint of $\mathcal{L}_{OU}$. This condition is clearly satisfied for the equation for $\phi_1$. Thus, by the Fredholm alternative, this equation has a solution, which is
$$\phi_1(p,q) = (1 + \partial_q\phi)\,p + \psi_1(q),$$

where the function $\psi_1(q)$ is to be determined. We substitute this into the right-hand side of the third equation to obtain
$$-\mathcal{L}_{OU}\phi_2 = p^2\,\partial_q^2\phi - \partial_qV\,(1 + \partial_q\phi) + p\,\partial_q\psi_1(q).$$

From the solvability condition for this equation we obtain an equation for $\phi(q)$:
$$\beta^{-1}\partial_q^2\phi - \partial_qV\,(1 + \partial_q\phi) = 0, \qquad (8.116)$$
together with periodic boundary conditions. The derivative of the solution of this two-point boundary value problem is
$$\partial_q\phi + 1 = \frac{2\pi}{\int_{-\pi}^{\pi} e^{\beta V(q)}\, dq}\; e^{\beta V(q)}. \qquad (8.117)$$
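That (8.117) solves (8.116) can be verified by substitution: writing $\partial_q\phi + 1 = c\,e^{\beta V(q)}$ with $c = 2\pi/\int_{-\pi}^{\pi}e^{\beta V}dq$, one finds $\beta^{-1}\partial_q(c\,e^{\beta V}) = c\,V'e^{\beta V} = \partial_qV\,(1+\partial_q\phi)$, so the two terms cancel. The short sketch below (illustrative only; the choice $V(q) = \cos q$, $\beta = 2$ is mine) confirms this numerically by checking that the finite-difference residual of (8.116) vanishes.

```python
import math

beta = 2.0
V  = lambda q: math.cos(q)
dV = lambda q: -math.sin(q)

# normalisation constant in (8.117): c = 2*pi / int_{-pi}^{pi} e^{beta V(q)} dq
N = 20000
h = 2.0 * math.pi / N
Zhat = h * sum(math.exp(beta * V(-math.pi + i * h)) for i in range(N))
c = 2.0 * math.pi / Zhat

def dphi(q):
    # derivative of the solution, from (8.117): dphi/dq = c e^{beta V(q)} - 1
    return c * math.exp(beta * V(q)) - 1.0

def residual(q, eps=1e-5):
    # residual of (8.116): beta^{-1} d/dq (dphi) - dV(q) (1 + dphi), central differences
    d2phi = (dphi(q + eps) - dphi(q - eps)) / (2.0 * eps)
    return d2phi / beta - dV(q) * (1.0 + dphi(q))

max_res = max(abs(residual(-math.pi + 100 * i * h)) for i in range(N // 100))
print("max residual of (8.116):", max_res)
```

The residual is at the level of the finite-difference error, as expected from the exact cancellation.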

The first two terms in the large-$\gamma$ expansion of the solution of equation (8.88) are
$$\Phi(p,q) = \phi(q) + \frac{1}{\gamma}(1 + \partial_q\phi)\,p + O\!\left(\frac{1}{\gamma^2}\right),$$

where $\phi(q)$ is the solution of (8.116). Substituting this into the formula for the diffusion coefficient and using (8.117), we obtain
$$D = \int_{-\infty}^{\infty}\!\int_{-\pi}^{\pi} p\,\Phi\,\rho_\beta(p,q)\, dp\, dq = \frac{4\pi^2}{\beta\gamma Z\widehat{Z}} + O\!\left(\frac{1}{\gamma^3}\right),$$


where $Z = \int_{-\pi}^{\pi} e^{-\beta V(q)}\,dq$ and $\widehat{Z} = \int_{-\pi}^{\pi} e^{\beta V(q)}\,dq$. This is, of course, the Lifson–Jackson formula, which gives the diffusion coefficient in the overdamped limit [43]. Continuing in the same fashion, we can also calculate the next two terms in the expansion (8.114); see Exercise 4. From this we can compute the next-order correction to the diffusion coefficient. The final result is
$$D = \frac{4\pi^2}{\beta\gamma Z\widehat{Z}} - \frac{4\pi^2\beta\widehat{Z}_1}{\gamma^3 Z\widehat{Z}^2} + O\!\left(\frac{1}{\gamma^5}\right), \qquad (8.118)$$
where $\widehat{Z}_1 = \int_{-\pi}^{\pi} |V'(q)|^2 e^{\beta V(q)}\, dq$.

In the case of the nonlinear pendulum, $V(q) = \cos(q)$, formula (8.118) gives
$$D = \frac{1}{\gamma\beta}\,J_0^{-2}(\beta) - \frac{\beta}{\gamma^3}\left(\frac{J_2(\beta)}{J_0^3(\beta)} - J_0^{-2}(\beta)\right) + O\!\left(\frac{1}{\gamma^5}\right), \qquad (8.119)$$

where $J_n(\beta)$ is the modified Bessel function of the first kind.

In the multidimensional case, a similar analysis leads to the large-$\gamma$ asymptotics
$$\langle \xi, D\xi\rangle = \frac{1}{\gamma}\langle \xi, D_0\xi\rangle + O\!\left(\frac{1}{\gamma^3}\right),$$
where $\xi$ is an arbitrary unit vector in $\mathbb{R}^d$ and $D_0$ is the diffusion coefficient for the Smoluchowski (overdamped) dynamics:
$$D_0 = Z^{-1}\int_{\mathbb{T}^d}\big(-\mathcal{L}_V\chi\big)\otimes\chi\; e^{-\beta V(q)}\, dq, \qquad (8.120)$$
where
$$\mathcal{L}_V = -\nabla_qV\cdot\nabla_q + \beta^{-1}\Delta_q,$$
and $\chi(q)$ is the solution of the PDE $\mathcal{L}_V\chi = \nabla_qV$ with periodic boundary conditions.
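For the cosine potential the quantities in the Lifson–Jackson formula can be computed explicitly, which gives a convenient numerical check of the leading term in (8.119). The sketch below (illustrative, not part of the notes) evaluates $Z = \int_{-\pi}^{\pi}e^{-\beta\cos q}dq$ and $\widehat{Z} = \int_{-\pi}^{\pi}e^{\beta\cos q}dq$ by trapezoidal quadrature and compares $4\pi^2/(\beta\gamma Z\widehat{Z})$ with $1/(\gamma\beta J_0(\beta)^2)$; the notes write $J_0$ for what is usually denoted $I_0$, evaluated here by its power series.

```python
import math

def bessel_I0(x, terms=60):
    """Modified Bessel function of the first kind: I0(x) = sum (x/2)^(2k) / (k!)^2."""
    s, t = 1.0, 1.0
    for k in range(1, terms):
        t *= (x / 2.0) ** 2 / k ** 2
        s += t
    return s

beta, gamma = 2.0, 1.0
N = 4000
h = 2.0 * math.pi / N
qs = [-math.pi + i * h for i in range(N)]
Z    = h * sum(math.exp(-beta * math.cos(q)) for q in qs)  # = 2 pi I0(beta)
Zhat = h * sum(math.exp( beta * math.cos(q)) for q in qs)  # = 2 pi I0(beta)

D_lifson_jackson = 4.0 * math.pi ** 2 / (beta * gamma * Z * Zhat)
D_bessel         = 1.0 / (gamma * beta * bessel_I0(beta) ** 2)
print("Lifson-Jackson:", D_lifson_jackson, "  Bessel form:", D_bessel)
```

Both expressions agree because $Z = \widehat{Z} = 2\pi I_0(\beta)$ for $V(q) = \cos q$.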

Now we prove several properties of the effective diffusion tensor in the overdamped limit. For this we will need the following integration by parts formula:
$$\int_{\mathbb{T}^d}\big(\nabla_y\chi\big)\rho\, dy = \int_{\mathbb{T}^d}\big(\nabla_y(\chi\rho) - \chi\otimes\nabla_y\rho\big)\, dy = -\int_{\mathbb{T}^d}\big(\chi\otimes\nabla_y\rho\big)\, dy. \qquad (8.121)$$
The proof of this formula is left as an exercise; see Exercise 5.

Theorem 8.6.1. The effective diffusion tensor $D_0$ in (8.120) satisfies the upper and lower bounds
$$\frac{D}{Z\widehat{Z}}\,|\xi|^2 \;\leqslant\; \langle \xi, K\xi\rangle \;\leqslant\; D\,|\xi|^2 \qquad \forall\,\xi\in\mathbb{R}^d, \qquad (8.122)$$


where
$$Z = \int_{\mathbb{T}^d} e^{-V(y)/D}\, dy, \qquad \widehat{Z} = \int_{\mathbb{T}^d} e^{V(y)/D}\, dy.$$
In particular, diffusion is always depleted when compared with molecular diffusivity. Furthermore, the effective diffusivity is symmetric.

Proof. The lower bound follows from the general lower bound on effective diffusivities, together with the formula for the Gibbs measure. To establish the upper bound, we use (8.121) and the definition of the effective diffusivity to obtain
$$\begin{aligned}
K &= DI + 2D\int_{\mathbb{T}^d}(\nabla\chi)^T\rho\, dy + \int_{\mathbb{T}^d} -\nabla_yV\otimes\chi\,\rho\, dy\\
&= DI - 2D\int_{\mathbb{T}^d}\nabla_y\rho\otimes\chi\, dy + \int_{\mathbb{T}^d} -\nabla_yV\otimes\chi\,\rho\, dy\\
&= DI - 2\int_{\mathbb{T}^d} -\nabla_yV\otimes\chi\,\rho\, dy + \int_{\mathbb{T}^d} -\nabla_yV\otimes\chi\,\rho\, dy\\
&= DI - \int_{\mathbb{T}^d} -\nabla_yV\otimes\chi\,\rho\, dy\\
&= DI - \int_{\mathbb{T}^d}\big(-\mathcal{L}_0\chi\big)\otimes\chi\,\rho\, dy\\
&= DI - D\int_{\mathbb{T}^d}\big(\nabla_y\chi\otimes\nabla_y\chi\big)\rho\, dy. \qquad (8.123)
\end{aligned}$$

Hence, for $\chi_\xi = \chi\cdot\xi$,
$$\langle\xi, K\xi\rangle = D|\xi|^2 - D\int_{\mathbb{T}^d}|\nabla_y\chi_\xi|^2\rho\, dy \;\leqslant\; D|\xi|^2.$$
This proves depletion. The symmetry of $K$ follows from (8.123).

The One-Dimensional Case

In one dimension the drift is always in gradient form: $b(y) = -\partial_yV(y)$. Furthermore, in one dimension we can solve the cell problem in closed form and calculate the effective diffusion coefficient explicitly, up to quadratures. We start


with the following calculation concerning the structure of the diffusion coefficient:
$$\begin{aligned}
K &= D + 2D\int_0^1 \partial_y\chi\,\rho\, dy + \int_0^1 -\partial_yV\,\chi\,\rho\, dy\\
&= D + 2D\int_0^1 \partial_y\chi\,\rho\, dy + D\int_0^1 \chi\,\partial_y\rho\, dy\\
&= D + 2D\int_0^1 \partial_y\chi\,\rho\, dy - D\int_0^1 \partial_y\chi\,\rho\, dy\\
&= D\int_0^1 \big(1 + \partial_y\chi\big)\rho\, dy. \qquad (8.124)
\end{aligned}$$

The cell problem in one dimension is
$$D\,\partial_{yy}\chi - \partial_yV\,\partial_y\chi = \partial_yV. \qquad (8.125)$$

We multiply equation (8.125) by $e^{-V(y)/D}$ to obtain
$$\partial_y\big(\partial_y\chi\; e^{-V(y)/D}\big) = -\partial_y\big(e^{-V(y)/D}\big).$$

We integrate this equation once and multiply by $e^{V(y)/D}$ to obtain
$$\partial_y\chi(y) = -1 + c_1 e^{V(y)/D}.$$

Another integration yields
$$\chi(y) = -y + c_1\int_0^y e^{V(z)/D}\, dz + c_2.$$

The periodic boundary conditions imply that $\chi(0) = \chi(1)$, from which we conclude that
$$-1 + c_1\int_0^1 e^{V(y)/D}\, dy = 0.$$

Hence
$$c_1 = \frac{1}{\widehat{Z}}, \qquad \widehat{Z} = \int_0^1 e^{V(y)/D}\, dy.$$
We deduce that
$$\partial_y\chi = -1 + \frac{1}{\widehat{Z}}\, e^{V(y)/D}.$$


We substitute this expression into (8.124) to obtain
$$\begin{aligned}
K &= \frac{D}{Z}\int_0^1\big(1+\partial_y\chi(y)\big)\,e^{-V(y)/D}\, dy\\
&= \frac{D}{Z\widehat{Z}}\int_0^1 e^{V(y)/D}\,e^{-V(y)/D}\, dy\\
&= \frac{D}{Z\widehat{Z}}, \qquad (8.126)
\end{aligned}$$
with
$$Z = \int_0^1 e^{-V(y)/D}\, dy, \qquad \widehat{Z} = \int_0^1 e^{V(y)/D}\, dy. \qquad (8.127)$$

The Cauchy–Schwarz inequality shows that $Z\widehat{Z} \geqslant 1$. Notice that in the one-dimensional case the formula for the effective diffusivity is precisely the lower bound in (8.122). This shows that the lower bound is sharp.

Example 8.6.2. Consider the potential
$$V(y) = \begin{cases} a_1, & y\in[0,\tfrac12],\\ a_2, & y\in(\tfrac12,1],\end{cases} \qquad (8.128)$$
where $a_1, a_2$ are positive constants.⁴

It is straightforward to calculate the integrals in (8.127) to obtain the formula
$$K = \frac{D}{\cosh^2\!\big(\frac{a_1-a_2}{2D}\big)}. \qquad (8.129)$$
In Figure 8.2 we plot the effective diffusivity given by (8.129) as a function of the molecular diffusivity $D$. We observe that $K$ decays exponentially fast in the limit as $D \to 0$.
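The computation in Example 8.6.2 is easy to reproduce: for the two-level potential the integrals in (8.127) are elementary, and the identity $Z\widehat{Z} = \cosh^2\!\big(\frac{a_1-a_2}{2D}\big)$ follows. The sketch below (illustrative, not from the notes; the values of $a_1, a_2$ are mine) checks this and illustrates the exponential depletion of $K$ as $D \to 0$.

```python
import math

def effective_diffusivity(a1, a2, D):
    """K = D / (Z * Zhat) for the piecewise potential (8.128):
    V = a1 on [0, 1/2] and V = a2 on (1/2, 1]."""
    Z    = 0.5 * (math.exp(-a1 / D) + math.exp(-a2 / D))
    Zhat = 0.5 * (math.exp( a1 / D) + math.exp( a2 / D))
    return D / (Z * Zhat)

def closed_form(a1, a2, D):
    """Formula (8.129): K = D / cosh^2((a1 - a2) / (2 D))."""
    return D / math.cosh((a1 - a2) / (2.0 * D)) ** 2

a1, a2 = 1.0, 2.0
for D in (2.0, 1.0, 0.5, 0.25):
    print(D, effective_diffusivity(a1, a2, D), closed_form(a1, a2, D))
```

As $D$ decreases, both expressions decay like $4D\,e^{-|a_1-a_2|/D}$, consistent with the exponential decay seen in Figure 8.2.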

8.6.1 Brownian Motion in a Tilted Periodic Potential

In this subsection we use our method to obtain a formula for the effective diffusion coefficient of an overdamped particle moving in a one-dimensional tilted periodic potential. This formula was first derived and analyzed in [62, 61] without any appeal to multiscale analysis. The equation of motion is
$$\dot{x} = -V'(x) + F + \sqrt{2D}\,\xi, \qquad (8.130)$$

⁴Of course, this potential is not even continuous, let alone smooth, and the theory as developed in this chapter does not apply. It is possible, however, to consider a regularized version of this discontinuous potential; homogenization theory then applies.


Figure 8.2: Effective diffusivity versus molecular diffusivity for the potential (8.128).

where $V(x)$ is a smooth periodic function with period $L$, $F$ and $D > 0$ are constants, and $\xi(t)$ is standard white noise in one dimension. To simplify the notation we have set $\gamma = 1$.

The stationary Fokker–Planck equation corresponding to (8.130) is
$$\partial_x\Big(\big(V'(x) - F\big)\rho(x) + D\,\partial_x\rho(x)\Big) = 0, \qquad (8.131)$$

with periodic boundary conditions. The formula for the effective drift now becomes
$$U_{\mathrm{eff}} = \int_0^L \big(-V'(x) + F\big)\rho(x)\, dx. \qquad (8.132)$$

The solution of eqn. (8.131) is [60, Ch. 9]
$$\rho(x) = \frac{1}{Z}\int_x^{x+L} Z_+(y)\,Z_-(x)\, dy, \qquad (8.133)$$
with
$$Z_\pm(x) := e^{\pm\frac{1}{D}(V(x) - Fx)},$$


and
$$Z = \int_0^L dx\int_x^{x+L} Z_+(y)\,Z_-(x)\, dy. \qquad (8.134)$$

Upon using (8.133) in (8.132) we obtain [60, Ch. 9]
$$U_{\mathrm{eff}} = \frac{DL}{Z}\Big(1 - e^{-\frac{FL}{D}}\Big). \qquad (8.135)$$

Our goal now is to calculate the effective diffusion coefficient. For this we first need to solve the Poisson equation (8.94a), which now becomes
$$\mathcal{L}\chi(x) := D\,\partial_{xx}\chi(x) + \big(-V'(x)+F\big)\partial_x\chi = V'(x) - F + U_{\mathrm{eff}}, \qquad (8.136)$$

with periodic boundary conditions. Then we need to evaluate the integrals in the formula for the effective diffusion coefficient:
$$D_{\mathrm{eff}} = D + \int_0^L \big(-V'(x)+F-U_{\mathrm{eff}}\big)\chi(x)\,\rho(x)\, dx + 2D\int_0^L \partial_x\chi(x)\,\rho(x)\, dx.$$

It will be more convenient for the subsequent calculation to rewrite the above formula for the effective diffusion coefficient in a different form. The fact that $\rho(x)$ solves the stationary Fokker–Planck equation, together with elementary integrations by parts, yields that, for all sufficiently smooth periodic functions $\phi(x)$,
$$\int_0^L \phi(x)\big(-\mathcal{L}\phi(x)\big)\rho(x)\, dx = D\int_0^L \big(\partial_x\phi(x)\big)^2\rho(x)\, dx.$$

Now we have
$$\begin{aligned}
D_{\mathrm{eff}} &= D + \int_0^L \big(-V'(x)+F-U_{\mathrm{eff}}\big)\chi(x)\rho(x)\, dx + 2D\int_0^L \partial_x\chi(x)\,\rho(x)\, dx\\
&= D + \int_0^L \big(-\mathcal{L}\chi(x)\big)\chi(x)\rho(x)\, dx + 2D\int_0^L \partial_x\chi(x)\,\rho(x)\, dx\\
&= D + D\int_0^L \big(\partial_x\chi(x)\big)^2\rho(x)\, dx + 2D\int_0^L \partial_x\chi(x)\,\rho(x)\, dx\\
&= D\int_0^L \big(1+\partial_x\chi(x)\big)^2\rho(x)\, dx. \qquad (8.137)
\end{aligned}$$

Now we solve the Poisson equation (8.136) with periodic boundary conditions. We multiply the equation by $Z_-(x)$ and divide by $D$ to rewrite it in the form
$$\partial_x\big(\partial_x\chi(x)\, Z_-(x)\big) = -\partial_xZ_-(x) + \frac{U_{\mathrm{eff}}}{D}\, Z_-(x).$$


We integrate this equation from $x-L$ to $x$ and use the periodicity of $\chi(x)$ and $V(x)$, together with formula (8.135), to obtain
$$\partial_x\chi(x)\,Z_-(x)\Big(1 - e^{-\frac{FL}{D}}\Big) = -Z_-(x)\Big(1 - e^{-\frac{FL}{D}}\Big) + \frac{L}{Z}\Big(1 - e^{-\frac{FL}{D}}\Big)\int_{x-L}^x Z_-(y)\, dy,$$

from which we immediately get
$$\partial_x\chi(x) + 1 = \frac{L}{Z}\int_{x-L}^x Z_-(y)\,Z_+(x)\, dy.$$

Substituting this into (8.137) and using the formula for the invariant distribution (8.133), we finally obtain
$$D_{\mathrm{eff}} = \frac{DL^2}{Z^3}\int_0^L \big(I_+(x)\big)^2 I_-(x)\, dx, \qquad (8.138)$$
with
$$I_+(x) = \int_{x-L}^x Z_-(y)\,Z_+(x)\, dy \qquad\text{and}\qquad I_-(x) = \int_x^{x+L} Z_+(y)\,Z_-(x)\, dy.$$

Formula (8.138) for the effective diffusion coefficient (formula (22) in [61]) is the main result of this section.
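Formula (8.138) is straightforward to evaluate by nested quadrature. The sketch below (illustrative, not from the notes; the choice $V(x) = \cos x$, $L = 2\pi$, $D = 1$ is mine) computes $D_{\mathrm{eff}}$; as a consistency check, for $F = 0$ the result should reduce, up to quadrature error, to the Lifson–Jackson value $DL^2/(Z_+Z_-)$ with $Z_\pm = \int_0^L e^{\pm V/D}dx$.

```python
import math

L, D = 2.0 * math.pi, 1.0
V = math.cos
N = 400
h = L / N
xs = [i * h for i in range(N)]

def Zm(x, F):  # Z_-(x) = exp(-(V(x) - F x)/D)
    return math.exp(-(V(x) - F * x) / D)

def Zp(x, F):  # Z_+(x) = exp(+(V(x) - F x)/D)
    return math.exp((V(x) - F * x) / D)

def Ip(x, F):  # I_+(x) = Z_+(x) * integral of Z_-(y) over [x - L, x] (midpoint rule)
    return Zp(x, F) * h * sum(Zm(x - L + (j + 0.5) * h, F) for j in range(N))

def Im(x, F):  # I_-(x) = Z_-(x) * integral of Z_+(y) over [x, x + L] (midpoint rule)
    return Zm(x, F) * h * sum(Zp(x + (j + 0.5) * h, F) for j in range(N))

def D_eff(F):
    Zbar = h * sum(Im(x, F) for x in xs)                 # normalisation (8.134)
    num  = h * sum(Ip(x, F) ** 2 * Im(x, F) for x in xs)
    return D * L ** 2 * num / Zbar ** 3                  # formula (8.138)

# F = 0 consistency check against the Lifson-Jackson formula
Zplus  = h * sum(math.exp( V(x) / D) for x in xs)
Zminus = h * sum(math.exp(-V(x) / D) for x in xs)
print("D_eff(F=0)     =", D_eff(0.0))
print("Lifson-Jackson =", D * L ** 2 / (Zplus * Zminus))
```

Both numbers agree, and both are smaller than the molecular diffusivity $D$, consistent with the depletion result of Theorem 8.6.1.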

8.7 Numerical Solution of the Klein-Kramers Equation

8.8 Discussion and Bibliography

The rigorous study of the overdamped limit can be found in [54]. A similar approximation theorem is also valid in infinite dimensions (i.e. for SPDEs); see [5, 6].

More information about the underdamped limit of the Langevin equation can be found in [70, 19, 20].

We also mention in passing that the various formulae for the effective diffusion coefficient that have been derived in the literature [24, 43, 62, 66] can be obtained from the general multiscale formula for the diffusion coefficient: they correspond to cases where the associated Poisson equations can be solved analytically. An example, the calculation of the effective diffusion coefficient of an overdamped Brownian particle in a tilted periodic potential, was presented in Section 8.6.1. Similar calculations yield analytical expressions for all the other exactly solvable models that have been considered in the literature.


8.9 Exercises

1. Let $\mathcal{L}$ be the generator of the two-dimensional Ornstein–Uhlenbeck process (8.17). Calculate the eigenvalues and eigenfunctions of $\mathcal{L}$. Show that there exists a transformation that transforms $\mathcal{L}$ into the Schrödinger operator of the two-dimensional quantum harmonic oscillator.

2. Let $\mathcal{L}$ be the operator defined in (8.34).

(a) Show by direct substitution that $\mathcal{L}$ can be written in the form
$$\mathcal{L} = -\lambda_1(c^-)^*(c^+)^* - \lambda_2(d^-)^*(d^+)^*.$$

(b) Calculate the commutators
$$[(c^+)^*,(c^-)^*],\quad [(d^+)^*,(d^-)^*],\quad [(c^\pm)^*,(d^\pm)^*],\quad [\mathcal{L},(c^\pm)^*],\quad [\mathcal{L},(d^\pm)^*].$$

3. Show that the operators $a^\pm, b^\pm$ defined in (8.15) and (8.16) satisfy the commutation relations
$$[a^+,a^-] = -1, \qquad (8.139a)$$
$$[b^+,b^-] = -1, \qquad (8.139b)$$
$$[a^\pm,b^\pm] = 0. \qquad (8.139c)$$

4. Obtain the second term in the expansion (8.118).

5. Prove formula (8.121).


Chapter 9

The Mean First Passage Time and Exit Time Problems

9.1 Introduction

9.2 Brownian Motion in a Bistable Potential

There are many systems in physics, chemistry and biology that exist in at least two stable states. Among the many applications we mention switching and storage devices in computers. Another example is biological macromolecules, which can exist in many different states. The problems that we would like to solve are:

• How stable are the various states relative to each other?

• How long does it take for a system to switch spontaneously from one state to another?

• How is the transfer made, i.e. through what path in the relevant state space? There is a lot of important current work on this problem by E, Vanden-Eijnden and others.

• How does the system relax to an unstable state?

We can distinguish between the one-dimensional problem, the finite-dimensional problem and the infinite-dimensional problem (SPDEs). We will solve the one-dimensional problem completely and discuss the finite-dimensional problem in some detail. The infinite-dimensional situation is an extremely hard problem and we will


[Figure: time series of the particle position x(t) for the bistable dynamics.]

only make some remarks. The study of bistability and metastability is a very active research area; of particular interest is the development of numerical methods for the calculation of various quantities, such as reaction rates and transition pathways.

We will mostly consider the dynamics of a particle moving in a bistable potential, under the influence of thermal noise, in one dimension:
$$\dot{x} = -V'(x) + \sqrt{2k_BT}\,\dot{\beta}. \qquad (9.1)$$

The class of potentials that we will consider have two local minima and one local maximum, and increase at least quadratically at infinity. This ensures that the state space is "compact", i.e. that the particle cannot escape to infinity. The standard potential that satisfies these assumptions is

$$V(x) = \frac14 x^4 - \frac12 x^2 + \frac14. \qquad (9.2)$$

It is easily checked that this potential has three critical points: a local maximum at $x = 0$ and two local minima at $x = \pm 1$. The values of the potential at these three points are
$$V(\pm 1) = 0, \qquad V(0) = \frac14.$$

We will say that the height of the potential barrier is $\frac14$. The physically (and mathematically!) interesting case is when the thermal fluctuations are weak compared with the potential barrier that the particle has to climb over.
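The stated properties of (9.2) are immediate to verify; the following snippet (illustrative only) evaluates $V$ and $V'(x) = x^3 - x$ at the critical points and the barrier height.

```python
# quartic bistable potential (9.2) and its derivative
V  = lambda x: 0.25 * x ** 4 - 0.5 * x ** 2 + 0.25
dV = lambda x: x ** 3 - x

for x in (-1.0, 0.0, 1.0):
    print(f"x = {x:+.0f}:  V = {V(x):.4f},  V' = {dV(x):.4f}")
print("barrier height:", V(0.0) - V(1.0))
```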


More generally, we assume that the potential has two local minima at the points $a$ and $c$ and a local maximum at $b$. Let us consider the problem of the escape of the particle from the left local minimum $a$. The potential barrier is then defined as
$$\Delta E = V(b) - V(a).$$
Our assumption that the thermal fluctuations are weak can be written as
$$\frac{k_BT}{\Delta E} \ll 1.$$

In this limit, it is intuitively clear that the particle is most likely to be found at either $a$ or $c$, where it will perform small oscillations around either of the local minima. This is a result that we can obtain by studying the small-temperature limit using perturbation theory: locally, the dynamics of the particle is described by appropriate Ornstein–Uhlenbeck processes. Of course, this result is valid only for finite times: at sufficiently long times the particle can escape from one local minimum, $a$ say, surmount the potential barrier and end up at $c$. It will then spend a long time in the neighborhood of $c$ until it escapes again over the potential barrier and ends up at $a$. This is an example of a rare event. The relevant time scale, the exit time or mean first passage time, scales exponentially in $\beta := (k_BT)^{-1}$:
$$\tau = \nu^{-1}\exp(\beta\Delta E).$$

It is more customary to calculate the reaction rate $\kappa := \tau^{-1}$, which gives the rate at which particles escape from a local minimum of the potential:
$$\kappa = \nu\exp(-\beta\Delta E). \qquad (9.3)$$

It is very important to notice that the escape from a local minimum, i.e. from a state of local stability, can happen only at positive temperatures: it is a noise-assisted event. Indeed, consider the case $T = 0$. The equation of motion becomes
$$\dot{x} = -V'(x), \qquad x(0) = x_0.$$
In this case the potential becomes a Lyapunov function:
$$\frac{dV(x)}{dt} = V'(x)\frac{dx}{dt} = -\big(V'(x)\big)^2 < 0$$
whenever $V'(x) \neq 0$.

Hence, depending on the initial condition, the particle will converge either to $a$ or to $c$. The particle cannot escape from either state of local stability.


On the other hand, at high temperatures the particle does not "see" the potential barrier: it essentially jumps freely from one local minimum to another.

To get a better understanding of the dependence of the dynamics on the depth of the potential barrier relative to the temperature, we solve the equation of motion (9.1) numerically and monitor the time series of the particle position. We observe that at small temperatures the particle spends most of its time around $x = \pm 1$, with rapid transitions from $-1$ to $1$ and back.

9.3 The Mean First Passage Time

The Arrhenius-type factor in the formula for the reaction rate, eqn. (9.3), is intuitive, and it was observed experimentally in the late nineteenth century by Arrhenius and others. What is extremely important, both from a theoretical and from an applied point of view, is the calculation of the prefactor $\nu$, the rate coefficient. A systematic approach for the calculation of the rate coefficient, as well as for the justification of the Arrhenius kinetics, is the mean first passage time (MFPT) method. Since this method is of independent interest and is useful in various other contexts, we will present it in a quite general setting and apply it to the problem of escape from a potential barrier in later sections. We will first treat the one-dimensional problem and then extend the theory to arbitrary finite dimensions.

We will restrict ourselves to the case of homogeneous Markov processes. It is not easy to extend the method to non-Markovian processes.

9.3.1 The Boundary Value Problem for the MFPT

Let $X_t$ be a continuous-time diffusion process on $\mathbb{R}^d$ whose evolution is governed by the SDE
$$dX_t^x = b(X_t^x)\, dt + \sigma(X_t^x)\, dW_t, \qquad X_0^x = x. \qquad (9.4)$$
Let $D$ be a bounded subset of $\mathbb{R}^d$ with smooth boundary. Given $x \in D$, we want to know how long it takes for the process $X_t$ to leave the domain $D$ for the first time:
$$\tau_D^x = \inf\big\{t > 0 : X_t^x \notin D\big\}.$$

Clearly, this is a random variable, which is called the first passage time. The average of this random variable is called the mean first passage time (MFPT) or the first exit time:
$$\tau(x) := \mathbb{E}\tau_D^x = \mathbb{E}\Big(\inf\big\{t > 0 : X_t^x \notin D\big\}\,\Big|\, X_0^x = x\Big).$$


We have written the second equality above in order to emphasize the fact that the mean first passage time is defined in terms of a conditional expectation: the MFPT is the expectation of the first time the diffusion process $X_t$ leaves the domain, conditioned on $X_t$ starting at $x \in D$. Consequently, the MFPT is a function of the starting point $x$. Consider now an ensemble of initial conditions distributed according to a distribution $p_0(x)$. The confinement time is defined as
$$\bar\tau = \int_D \tau(x)\,p_0(x)\, dx = \int_D \mathbb{E}\Big(\inf\big\{t > 0 : X_t^x \notin D\big\}\,\Big|\, X_0^x = x\Big)p_0(x)\, dx. \qquad (9.5)$$
We can calculate the MFPT by solving an appropriate boundary value problem; the calculation of the confinement time then follows by evaluating the integral in (9.5).

Theorem 9.3.1. The MFPT is the solution of the boundary value problem
$$-\mathcal{L}\tau = 1, \qquad x \in D, \qquad (9.6a)$$
$$\tau = 0, \qquad x \in \partial D, \qquad (9.6b)$$
where $\mathcal{L}$ is the generator of the SDE (9.4).

The homogeneous Dirichlet boundary conditions correspond to an absorbing boundary: the particles are removed when they reach the boundary. Other choices of boundary conditions are also possible. The rigorous proof of Theorem 9.3.1 is based on Itô's formula.

Proof. Let $\rho(X,x,t)$ be the probability distribution of the particles that have not left the domain $D$ at time $t$. It solves the Fokker–Planck equation with absorbing boundary conditions:
$$\frac{\partial\rho}{\partial t} = \mathcal{L}^*\rho, \qquad \rho(X,x,0) = \delta(X-x), \qquad \rho|_{\partial D} = 0. \qquad (9.7)$$

We can write the solution to this equation in the form
$$\rho(X,x,t) = e^{\mathcal{L}^*t}\delta(X-x),$$
where the absorbing boundary conditions are included in the definition of the semigroup $e^{\mathcal{L}^*t}$. The homogeneous Dirichlet (absorbing) boundary conditions imply that
$$\lim_{t\to+\infty}\rho(X,x,t) = 0.$$


That is, all particles eventually leave the domain. The (normalized) number of particles that are still inside $D$ at time $t$ is
$$S(x,t) = \int_D \rho(X,x,t)\, dX.$$
Notice that this is a decreasing function of time. We can write
$$\frac{\partial S}{\partial t} = -f(x,t),$$

where $f(x,t)$ is the distribution of first passage times. The MFPT is the first moment of the distribution $f(x,t)$:
$$\begin{aligned}
\tau(x) &= \int_0^{+\infty} f(s,x)\,s\, ds = \int_0^{+\infty} -\frac{dS}{ds}\,s\, ds\\
&= \int_0^{+\infty} S(s,x)\, ds = \int_0^{+\infty}\!\int_D \rho(X,x,s)\, dX\, ds\\
&= \int_0^{+\infty}\!\int_D e^{\mathcal{L}^*s}\delta(X-x)\, dX\, ds\\
&= \int_0^{+\infty}\!\int_D \delta(X-x)\big(e^{\mathcal{L}s}1\big)\, dX\, ds = \int_0^{+\infty}\big(e^{\mathcal{L}s}1\big)\, ds.
\end{aligned}$$

We apply $\mathcal{L}$ to the above equation to deduce
$$\mathcal{L}\tau = \int_0^{+\infty}\big(\mathcal{L}e^{\mathcal{L}t}1\big)\, dt = \int_0^{+\infty}\frac{d}{dt}\big(e^{\mathcal{L}t}1\big)\, dt = \big[e^{\mathcal{L}t}1\big]_0^{+\infty} = -1,$$
since $e^{\mathcal{L}t}1 \to 0$ as $t \to +\infty$ by the absorbing boundary conditions.

In the case where part of the boundary is absorbing and part is reflecting, we end up with a mixed boundary value problem for the MFPT:
$$-\mathcal{L}\tau = 1, \qquad x \in D, \qquad (9.8a)$$
$$\tau = 0, \qquad x \in \partial D_A, \qquad (9.8b)$$
$$\eta\cdot J = 0, \qquad x \in \partial D_R. \qquad (9.8c)$$
Here $\partial D_A \cup \partial D_R = \partial D$, where $\partial D_A$ denotes the absorbing part of the boundary and $\partial D_R$ the reflecting part, and $J$ denotes the probability flux.


Figure 9.1: The mean first passage time for Brownian motion with one absorbing and one reflecting boundary.

9.3.2 Examples

In this section we consider a few simple examples for which we can calculate the mean first passage time in closed form.

Brownian motion with one absorbing and one reflecting boundary

We consider the problem of Brownian motion moving in the interval $[a,b]$. We assume that the left boundary is absorbing and the right boundary is reflecting. The boundary value problem for the MFPT becomes
$$-\frac{d^2\tau}{dx^2} = 1, \qquad \tau(a) = 0, \qquad \frac{d\tau}{dx}(b) = 0. \qquad (9.9)$$

The solution of this equation is
$$\tau(x) = -\frac{x^2}{2} + bx + a\Big(\frac{a}{2} - b\Big).$$
The MFPT for Brownian motion with one absorbing and one reflecting boundary in the interval $[-1,1]$ is plotted in Figure 9.1.


Figure 9.2: The mean first passage time for Brownian motion with two absorbing boundaries.

Brownian motion with two absorbing boundaries

Consider again the problem of Brownian motion moving in the interval $[a,b]$, but now with both boundaries absorbing. The boundary value problem for the MFPT becomes
$$-\frac{d^2\tau}{dx^2} = 1, \qquad \tau(a) = 0, \qquad \tau(b) = 0. \qquad (9.10)$$
The solution of this equation is
$$\tau(x) = \frac{(x-a)(b-x)}{2}.$$
The MFPT for Brownian motion with two absorbing boundaries in the interval $[-1,1]$ is plotted in Figure 9.2.

The Mean First Passage Time for a One-Dimensional Diffusion Process

Consider now the mean exit time problem from an interval $[a,b]$ for a general one-dimensional diffusion process with generator
$$\mathcal{L} = a(x)\frac{d}{dx} + \frac12 b(x)\frac{d^2}{dx^2},$$


where the drift and diffusion coefficients are smooth functions and the diffusion coefficient $b(x)$ is strictly positive (uniform ellipticity condition). In order to calculate the mean first passage time we need to solve the differential equation
$$-\Big(a(x)\frac{d}{dx} + \frac12 b(x)\frac{d^2}{dx^2}\Big)\tau = 1, \qquad (9.11)$$

together with appropriate boundary conditions, depending on whether we have one absorbing and one reflecting boundary or two absorbing boundaries. To solve this equation we first define the function $\psi(x)$ through $\psi'(x) = 2a(x)/b(x)$ and write (9.11) in the form
$$\big(e^{\psi(x)}\tau'(x)\big)' = -\frac{2}{b(x)}\,e^{\psi(x)}.$$

The general solution of (9.11) is obtained after two integrations:
$$\tau(x) = -2\int_a^x e^{-\psi(z)}\int_a^z \frac{e^{\psi(y)}}{b(y)}\, dy\, dz + c_1\int_a^x e^{-\psi(y)}\, dy + c_2,$$

where the constants $c_1$ and $c_2$ are determined from the boundary conditions. When both boundaries are absorbing we get
$$\tau(x) = -2\int_a^x e^{-\psi(z)}\int_a^z \frac{e^{\psi(y)}}{b(y)}\, dy\, dz + \frac{2\widehat{Z}}{Z}\int_a^x e^{-\psi(y)}\, dy, \qquad (9.12)$$
where $Z = \int_a^b e^{-\psi(y)}\, dy$ and $\widehat{Z} = \int_a^b e^{-\psi(z)}\int_a^z \frac{e^{\psi(y)}}{b(y)}\, dy\, dz$.

9.4 Escape from a Potential Barrier

In this section we use the theory developed in the previous section to study the long-time/small-temperature asymptotics of solutions to the Langevin equation for a particle moving in a one-dimensional potential of the form (9.2):
$$\ddot{x} = -V'(x) - \gamma\dot{x} + \sqrt{2\gamma k_BT}\,\dot{W}. \qquad (9.13)$$
In particular, we justify the Arrhenius formula for the reaction rate,
$$\kappa = \nu(\gamma)\exp(-\beta\Delta E),$$
and we calculate the prefactor $\nu = \nu(\gamma)$. In particular, we analyze the dependence of the escape rate on the friction coefficient: we will see that we need to distinguish between the cases of large and small friction coefficient.


9.4.1 Calculation of the Reaction Rate in the Overdamped Regime

We consider the Langevin equation (9.13) in the limit of large friction. As we saw in Section 8.4, in the overdamped limit $\gamma \gg 1$ the solution to (9.13) can be approximated by the solution to the Smoluchowski equation (9.1),
$$\dot{x} = -V'(x) + \sqrt{2\beta^{-1}}\,\dot{W}.$$

We want to calculate the rate of escape from the potential barrier in this case. We assume that the particle is initially at $x_0$, near $a$, the left minimum of the potential. Consider the boundary value problem for the MFPT of the one-dimensional diffusion process (9.1) on the interval $(a,b)$:
$$-\beta^{-1}e^{\beta V}\partial_x\big(e^{-\beta V}\partial_x\tau\big) = 1. \qquad (9.14)$$

We choose a reflecting boundary condition at $x = a$ and an absorbing one at $x = b$. We can solve (9.14) with these boundary conditions by quadratures:
$$\tau(x) = \beta\int_x^b dy\; e^{\beta V(y)}\int_a^y dz\; e^{-\beta V(z)}. \qquad (9.15)$$

Now we can solve the problem of the escape from a potential well: the reflecting boundary is at $x = a$, the left local minimum of the potential, and the absorbing boundary is at $x = b$, the local maximum. We can replace the boundary condition at $x = a$ by a decay condition at $x = -\infty$:
$$\tau(x) = \beta\int_x^b dy\; e^{\beta V(y)}\int_{-\infty}^y dz\; e^{-\beta V(z)}.$$

When $\beta E_b \gg 1$, the integral with respect to $z$ is dominated by the value of the potential near $a$. Furthermore, we can replace the upper limit of integration by $+\infty$:
$$\int_{-\infty}^{y} \exp(-\beta V(z))\, dz \approx \int_{-\infty}^{+\infty}\exp(-\beta V(a))\exp\Big(-\frac{\beta\omega_0^2}{2}(z-a)^2\Big)\, dz = \exp(-\beta V(a))\sqrt{\frac{2\pi}{\beta\omega_0^2}},$$

where we have used the Taylor series expansion around the minimum:
$$V(z) = V(a) + \frac12\omega_0^2(z-a)^2 + \dots$$


Similarly, the integral with respect to $y$ is dominated by the value of the potential around the saddle point. We use the Taylor series expansion
$$V(y) = V(b) - \frac12\omega_b^2(y-b)^2 + \dots$$

Assuming that $x$ is close to $a$, the minimum of the potential, we can replace the lower limit of integration by $-\infty$. We finally obtain
$$\int_x^b \exp(\beta V(y))\, dy \approx \int_{-\infty}^b \exp(\beta V(b))\exp\Big(-\frac{\beta\omega_b^2}{2}(y-b)^2\Big)\, dy = \frac12\exp(\beta V(b))\sqrt{\frac{2\pi}{\beta\omega_b^2}}.$$

Putting everything together, we obtain a formula for the MFPT:
$$\tau(x) = \frac{\pi}{\omega_0\omega_b}\exp(\beta E_b).$$

The rate of arrival at $b$ is $1/\tau$. Only half of the particles escape; consequently, the escape rate (or reaction rate) is given by $\frac{1}{2\tau}$:
$$\kappa = \frac{\omega_0\omega_b}{2\pi}\exp(-\beta E_b).$$
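For the quartic bistable potential (9.2) we have $\omega_0^2 = V''(\pm 1) = 2$, $\omega_b^2 = |V''(0)| = 1$ and $E_b = \tfrac14$, so the prediction above is $\kappa = \frac{\sqrt{2}}{2\pi}e^{-\beta/4}$. The sketch below (illustrative, not from the notes; the choices $\beta = 20$ and the integration cutoffs are mine) compares this with the rate $1/(2\tau(a))$ obtained by evaluating the quadrature formula for the MFPT numerically.

```python
import math

beta = 20.0
V = lambda x: 0.25 * x ** 4 - 0.5 * x ** 2 + 0.25   # quartic potential (9.2)

def tau(x, b=0.0, z_lo=-3.0, N=600):
    """MFPT from x to the saddle b = 0:
    tau(x) = beta * int_x^b e^{beta V(y)} int_{z_lo}^y e^{-beta V(z)} dz dy,
    with the lower limit z_lo standing in for -infinity (the integrand is
    negligible below it)."""
    hy = (b - x) / N
    total = 0.0
    for i in range(N):
        y = x + (i + 0.5) * hy
        hz = (y - z_lo) / N
        inner = hz * sum(math.exp(-beta * V(z_lo + (j + 0.5) * hz)) for j in range(N))
        total += math.exp(beta * V(y)) * inner
    return beta * hy * total

kappa_quad = 1.0 / (2.0 * tau(-1.0))
kappa_kramers = math.sqrt(2.0) / (2.0 * math.pi) * math.exp(-beta / 4.0)
print("quadrature:", kappa_quad, "  Kramers:", kappa_kramers)
```

At $\beta = 20$ (barrier five times the temperature) the two rates already agree to within a modest anharmonic correction.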

9.4.2 The Intermediate Regime: $\gamma = O(1)$

• Consider now the problem of escape from a potential well for the Langevin equation
$$\ddot{q} = -\partial_qV(q) - \gamma\dot{q} + \sqrt{2\gamma\beta^{-1}}\,\dot{W}. \qquad (9.16)$$

• The reaction rate depends on the friction coefficient and on the temperature. In the overdamped limit ($\gamma \gg 1$) we retrieve the result of the previous subsection, appropriately rescaled with $\gamma$:
$$\kappa = \frac{\omega_0\omega_b}{2\pi\gamma}\exp(-\beta E_b). \qquad (9.17)$$

• We can also obtain a formula for the reaction rate for $\gamma = O(1)$:
$$\kappa = \frac{\sqrt{\frac{\gamma^2}{4} + \omega_b^2} - \frac{\gamma}{2}}{\omega_b}\;\frac{\omega_0}{2\pi}\exp(-\beta E_b). \qquad (9.18)$$

• Naturally, in the limit as $\gamma \to +\infty$, (9.18) reduces to (9.17).


9.4.3 Calculation of the Reaction Rate in the Energy-Diffusion-Limited Regime

In order to calculate the reaction rate in the underdamped or energy-diffusion-limited regime $\gamma \ll 1$ we need to study the diffusion process for the energy, (8.69) or (8.70). The result is

$$\kappa = \gamma \beta I(E_b) \frac{\omega_0}{2\pi} e^{-\beta E_b}, \tag{9.19}$$

where $I(E_b)$ denotes the action evaluated at $b$.

9.5 Discussion and Bibliography

The calculation of reaction rates and the stochastic modeling of chemical reactions has been a very active area of research since the 1930s. One of the first methods that were developed was that of transition state theory. Kramers developed his theory in his celebrated paper [38]. In this chapter we have based our approach on the calculation of the mean first passage time. Our analysis is based mostly on [25, Ch. 5, Ch. 9], [75, Ch. 4] and the excellent review article [31]. We highly recommend this review article for further information on reaction rate theory. See also [30] and the review article of Melnikov (1991). A formula for the escape rate which is valid for all values of the friction coefficient was obtained by Melnikov and Meshkov (J. Chem. Phys., 85(2):1018-1027, 1986). This formula requires the calculation of integrals and it reduces to (9.17) and (9.19) in the overdamped and underdamped limits, respectively.

There are many applications of interest where it is important to calculate reaction rates for non-Markovian Langevin equations of the form

$$\ddot{x} = -V'(x) - \int_0^t \gamma(t-s) \dot{x}(s) \, ds + \xi(t), \tag{9.20a}$$

$$\langle \xi(t) \xi(0) \rangle = k_B T M^{-1} \gamma(t). \tag{9.20b}$$

We will derive generalized non-Markovian equations of the form (9.20a), together with the fluctuation-dissipation theorem (10.16), in Chapter 10. The calculation of reaction rates for the generalized Langevin equation is presented in [30].

The long time/small temperature asymptotics can be studied rigorously by means of the theory of Freidlin-Wentzell [20]. See also [3]. A related issue is that of the small temperature asymptotics for the eigenvalues (in particular, the first eigenvalue) of the generator of the Markov process $x(t)$ which is the solution of

$$\gamma \dot{x} = -\nabla V(x) + \sqrt{2\gamma k_B T} \, \dot{W}.$$

The theory of Freidlin and Wentzell has also been extended to infinite dimensional problems. This is a very important problem in many applications, such as micromagnetics. We refer to CITE... for more details.

A systematic study of the problem of the escape from a potential well was developed by Matkowsky, Schuss and collaborators [67, 50, 51]. This approach is based on a systematic use of singular perturbation theory. In particular, the calculation of the transition rate which is uniformly valid in the friction coefficient is presented in [51]. This formula is obtained through a careful analysis of the PDE

$$p \partial_q \tau - \partial_q V \partial_p \tau + \gamma(-p\partial_p + k_B T \partial_p^2)\tau = -1$$

for the mean first passage time $\tau$. The PDE is equipped, of course, with the appropriate boundary conditions. Singular perturbation theory is used to study the small temperature asymptotics of solutions to the boundary value problem. The formula derived in this paper reduces to the formulas which are valid at large and small values of the friction coefficient in the appropriate asymptotic limits.

The study of rare transition events between long lived metastable states is a key feature of many systems in physics, chemistry and biology. Rare transition events play an important role, for example, in the analysis of the transition between different conformation states of biological macromolecules such as DNA [68]. The study of rare events is one of the most active research areas in applied stochastic processes. Recent developments in this area involve the transition path theory of W. E and Vanden-Eijnden. Various simple applications of this theory are presented in Metzner, Schütte et al. (2006). As in the mean first passage time approach, transition path theory is also based on the solution of an appropriate boundary value problem for the so-called committor function.

9.6 Exercises


Chapter 10

Stochastic Processes and Statistical Mechanics

10.1 Introduction

In this final chapter we study the connection between stochastic processes and non-equilibrium statistical mechanics. In particular, we derive stochastic equations of evolution for a particle (or, more generally, a low-dimensional deterministic Hamiltonian dynamical system) that is in contact with a heat bath. This derivation provides a justification for the use of stochastic differential equations in physics and chemistry. We also develop some additional tools that are useful in the study of systems far from equilibrium, such as linear response theory and projection operator techniques.

In Section 10.2 we study the Kac-Zwanzig model and derive the generalized Langevin equation (GLE), together with the fluctuation-dissipation theorem. The generalized Langevin equation is studied in Section 10.3. More general classes of models that describe the dynamics of a particle interacting with a heat bath are studied in Section 10.4. Linear response theory, one of the most important techniques used in the study of systems far from equilibrium, is developed in Section 10.5. Projection operator techniques, another extremely useful tool in non-equilibrium statistical mechanics, are studied in Section 10.6. Discussion and bibliographical remarks are included in Section 10.7. Exercises can be found in Section 10.8.


10.2 The Kac-Zwanzig Model

In this section we will study a simple model for the dynamics of a particle (the distinguished or Brownian particle) that interacts, i.e. exchanges energy, with its environment (the heat bath). The dynamics of the particle-heat bath system can be described through a Hamiltonian of the form

$$H(Q, P; q, p) = H_{BP}(Q, P) + H_{HB}(q, p) + H_I(Q, q), \tag{10.1}$$

where $Q, P$ are the coordinates of the Brownian particle and $q, p$ the coordinates of the particles in the heat bath. The last term in the Hamiltonian (10.1), $H_I(Q, q)$, describes the interaction between the particle and the heat bath. The heat bath is assumed to be in equilibrium at temperature $\beta^{-1}$. For this, we need to prepare the system appropriately, i.e. we need to assume that the initial conditions for the particles in the heat bath are random variables distributed according to an appropriate probability distribution, an appropriate Gibbs measure.

For simplicity we will restrict ourselves to the one-dimensional case. We will also consider the simplest possible model for the heat bath as well as the simplest possible coupling between the particle and the heat bath: the heat bath will be taken to consist of $N$ harmonic oscillators and the coupling will be taken to be linear:

$$H(Q_N, P_N, q, p) = \frac{P_N^2}{2} + V(Q_N) + \sum_{n=1}^N \left[ \left( \frac{p_n^2}{2m_n} + \frac{1}{2} m_n \omega_n^2 q_n^2 \right) - \lambda \mu_n q_n Q_N \right], \tag{10.2}$$

where we have introduced the subscript $N$ in the notation for the position and momentum of the distinguished particle, $Q_N$ and $P_N$, to emphasize their dependence on the number $N$ of harmonic oscillators in the heat bath. $V(Q)$ denotes the potential experienced by the Brownian particle. For notational simplicity we have assumed that the Brownian particle has unit mass. Notice also that we have introduced a parameter $\lambda$ that measures the strength of the coupling between the particle and the thermal reservoir, and that we have also introduced a family of constants $\{\mu_n\}_{n=1}^N$.

Hamilton's equations of motion are:

$$\ddot{Q}_N + V'(Q_N) = \lambda \sum_{n=1}^N \mu_n q_n, \tag{10.3a}$$

$$\ddot{q}_n + \omega_n^2 \left( q_n - \frac{\lambda \mu_n}{m_n \omega_n^2} Q_N \right) = 0, \quad n = 1, \dots, N. \tag{10.3b}$$


The equations for the particles in the harmonic heat bath are second-order linear inhomogeneous equations with constant coefficients. Our plan is to solve them and then to substitute the result into the equations of motion for the Brownian particle. We can solve the equations of motion for the heat bath variables using the variation of constants formula. Set $z_n = (q_n \; v_n)^T$ with $v_n = \dot{q}_n$. Then equations (10.3b) can be written as

$$\frac{dz_n}{dt} = A_n z_n + \lambda h_n(t), \tag{10.4}$$

where

$$A_n = \begin{pmatrix} 0 & 1 \\ -\omega_n^2 & 0 \end{pmatrix} \quad \text{and} \quad h_n(t) = \begin{pmatrix} 0 \\ \frac{\mu_n}{m_n} Q_N(t) \end{pmatrix}.$$

The solution of (10.4) is

$$z_n(t) = e^{A_n t} z_n(0) + \lambda \int_0^t e^{A_n (t-s)} h_n(s) \, ds.$$

It is straightforward to calculate the exponential of the matrix $A_n$ (see Exercise 1):

$$e^{A_n t} = \cos(\omega_n t) I + \frac{1}{\omega_n} \sin(\omega_n t) A_n, \tag{10.5}$$

where $I$ stands for the $2 \times 2$ identity matrix. From this we obtain, with $p_n = m_n \dot{q}_n$,

$$q_n(t) = q_n(0) \cos(\omega_n t) + \frac{p_n(0)}{m_n \omega_n} \sin(\omega_n t) + \frac{\lambda \mu_n}{m_n \omega_n} \int_0^t \sin(\omega_n (t-s)) Q_N(s) \, ds. \tag{10.6}$$
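Formula (10.5) for the matrix exponential is easy to verify numerically. In the sketch below (the test values $\omega = 1.7$, $t = 0.9$ are arbitrary choices of ours) we compare a truncated Taylor series for $e^{A_n t}$ with the closed form:

```python
import numpy as np

omega, t = 1.7, 0.9
A = np.array([[0.0, 1.0], [-omega**2, 0.0]])   # the matrix A_n of (10.4)

def expm_taylor(M, terms=40):
    """Matrix exponential by truncated Taylor series (enough terms for small ||M||)."""
    out, term = np.eye(2), np.eye(2)
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

lhs = expm_taylor(A * t)
rhs = np.cos(omega * t) * np.eye(2) + np.sin(omega * t) / omega * A  # formula (10.5)
print(np.allclose(lhs, rhs))  # True
```

The identity holds because $A_n^2 = -\omega_n^2 I$, so the Taylor series of the exponential splits into cosine and sine series.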

Now we can substitute (10.6) into (10.3a) to obtain a closed equation that describes the dynamics of the distinguished particle. However, it is more customary to perform an integration by parts in (10.6) first:

$$\begin{aligned} q_n(t) &= \left( q_n(0) - \frac{\lambda \mu_n}{m_n \omega_n^2} Q_N(0) \right) \cos(\omega_n t) + \frac{p_n(0)}{m_n \omega_n} \sin(\omega_n t) \\ &\quad + \frac{\lambda \mu_n}{m_n \omega_n^2} Q_N(t) - \frac{\lambda \mu_n}{m_n \omega_n^2} \int_0^t \cos(\omega_n (t-s)) \dot{Q}_N(s) \, ds \\ &=: \Gamma_n \cos(\omega_n t) + \Delta_n \sin(\omega_n t) + E_n Q_N(t) - \lambda \int_0^t R_n(t-s) \dot{Q}_N(s) \, ds. \end{aligned}$$

We substitute this into equation (10.3a) to obtain

$$\ddot{Q}_N + V'(Q_N) = \lambda^2 E_N Q_N(t) - \lambda^2 \int_0^t R_N(t-s) \dot{Q}_N(s) \, ds + \lambda F_N(t), \tag{10.7}$$


where

$$E_N = \sum_{n=1}^N \frac{\mu_n^2}{m_n \omega_n^2}, \tag{10.8a}$$

$$R_N(t) = \sum_{n=1}^N \frac{\mu_n^2}{m_n \omega_n^2} \cos(\omega_n t), \tag{10.8b}$$

$$F_N(t) = \sum_{n=1}^N \left[ \left( \mu_n q_n(0) - \frac{\lambda \mu_n^2}{m_n \omega_n^2} Q_N(0) \right) \cos(\omega_n t) + \frac{\mu_n p_n(0)}{m_n \omega_n} \sin(\omega_n t) \right]. \tag{10.8c}$$

It is important to note that equation (10.7), with $E_N$, $R_N(t)$ and $F_N(t)$ given by (10.8), is equivalent to the original Hamiltonian system (10.2): so far no approximation or particular assumption has been made. Notice also that the above calculation is valid for any number of harmonic oscillators in the heat bath, even for $N = 1$!

Equation (10.7) can also be written in the form

$$\ddot{Q}_N + V'_{\mathrm{eff}}(Q_N) = -\lambda^2 \int_0^t R_N(t-s) \dot{Q}_N(s) \, ds + \lambda F_N(t), \tag{10.9}$$

where we have defined the effective potential

$$V_{\mathrm{eff}}(Q) = V(Q) - \frac{\lambda^2 E_N}{2} Q^2. \tag{10.10}$$

Consequently, the effect of the interaction between the Brownian particle and the heat bath is not only to introduce two additional terms into the equations of motion for the Brownian particle, the two terms on the right hand side of (10.9), but also to modify the potential. Notice also that all the dependence on the initial conditions in (10.9) is included in $F_N(t)$. When the initial conditions for the heat bath are random, the case of interest here, $F_N(t)$ becomes a stochastic process, a random forcing term.

The initial conditions of the Brownian particle, $Q_N(0), P_N(0) =: Q_0, P_0$,¹ are taken to be deterministic. As has already been mentioned, the initial conditions for the harmonic heat bath are chosen so that the thermal reservoir is in equilibrium. Here we can make two choices: we can either assume that the heat bath is initially in equilibrium in the absence of the Brownian particle, or that the heat bath is initially in equilibrium in the presence of the distinguished particle, i.e. that the initial positions and momenta of the heat bath particles are distributed according to a Gibbs distribution, conditional on the knowledge of $Q_0, P_0$:

$$\mu_\beta(dq \, dp) = Z^{-1} e^{-\beta H_{\mathrm{eff}}(q, p, Q_N)} \, dq \, dp, \tag{10.11}$$

¹The initial conditions for the distinguished particle are, of course, independent of the number of particles in the heat bath.

where

$$H_{\mathrm{eff}}(q, p, Q_N) = \sum_{n=1}^N \left[ \frac{p_n^2}{2m_n} + \frac{1}{2} m_n \omega_n^2 \left( q_n - \frac{\lambda \mu_n}{m_n \omega_n^2} Q_N \right)^2 \right], \tag{10.12}$$

$\beta$ is the inverse temperature and $Z$ is the normalization constant.² This is a way of introducing the concept of temperature into the system: through the average kinetic energy of the bath particles.

Our assumption that the initial conditions for the heat bath are distributed according to (10.11) implies that

$$q_n(0) = \frac{\lambda \mu_n}{m_n \omega_n^2} Q_0 + \sqrt{\beta^{-1} k_n^{-1}} \, \xi_n, \qquad p_n(0) = \sqrt{m_n \beta^{-1}} \, \eta_n, \tag{10.13}$$

where the $\xi_n, \eta_n$ are mutually independent sequences of i.i.d. $\mathcal{N}(0,1)$ random variables and we have used the notation $k_n = m_n \omega_n^2$. We reiterate that we actually consider the Gibbs measure of an effective Hamiltonian. If we assume that the heat bath is in equilibrium at $t = 0$ in the absence of the distinguished particle, then we have $q_n(0) = \sqrt{\beta^{-1} k_n^{-1}} \, \xi_n$. Our choice of the initial conditions (10.13) ensures that the forcing term in the generalized Langevin equation that we will derive is mean zero (see below).

Now we substitute (10.13) into (10.8c) to obtain

$$F_N(t) = \sqrt{\beta^{-1}} \sum_{n=1}^N \mu_n \sqrt{k_n^{-1}} \left( \xi_n \cos(\omega_n t) + \eta_n \sin(\omega_n t) \right). \tag{10.14}$$

Equation (10.9) is called the generalized Langevin equation, $F_N(t)$ the noise, and

$$R_N(t) = \sum_{n=1}^N \frac{\mu_n^2}{k_n} \cos(\omega_n t) \tag{10.15}$$

the memory kernel. The noise and memory kernel are related. This is not surprising, since the dissipation (i.e. the term $\int_0^t R_N(t-s) \dot{Q}_N(s) \, ds$) and the noise $F_N(t)$ in (10.9) have the same source, namely the interaction between the Brownian particle and the heat bath. In fact, the memory kernel is the autocorrelation function of the noise (times a constant, the temperature). The following proposition summarizes the basic properties of the noise term $F_N(t)$.

²Notice that if we add the quadratic term in $Q$ to the Hamiltonian (10.2), then no correction to the potential $V(Q)$ (eqn. (10.10)) appears.

Proposition 10.2.1. The noise term $F_N(t)$ is a mean zero Gaussian stationary process with autocorrelation function

$$\langle F_N(t) F_N(s) \rangle = \beta^{-1} R_N(t-s). \tag{10.16}$$

In writing the above equation we have used the notation $\langle \cdot \rangle$ to denote the average with respect to the random variables $\{\xi_n, \eta_n\}_{n=1}^N$.

Remark 10.2.2. Equation (10.16) is called the fluctuation-dissipation theorem.

Proof. The fact that $F_N(t)$ is mean zero follows from (10.13). Gaussianity follows from the fact that the $\xi_n, \eta_n$ are mutually independent Gaussian random variables. Stationarity is proved in Exercise 3 of Chapter 3. The proof of (10.16) follows from the formulas $\langle \xi_n \xi_m \rangle = \delta_{nm}$, $\langle \eta_n \eta_m \rangle = \delta_{nm}$, $\langle \xi_n \eta_m \rangle = 0$ for $n, m = 1, \dots, N$, and a simple trigonometric identity:

$$\begin{aligned} \langle F_N(t) F_N(s) \rangle &= \beta^{-1} \sum_{n=1}^N \mu_n^2 k_n^{-1} \left( \cos(\omega_n t) \cos(\omega_n s) + \sin(\omega_n t) \sin(\omega_n s) \right) \\ &= \beta^{-1} R_N(t-s). \end{aligned}$$
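Proposition 10.2.1 can also be checked by direct Monte Carlo sampling of the random Fourier series (10.14). In the sketch below, the frequencies, couplings and sample size are arbitrary test values of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, M = 8, 2.0, 200_000            # oscillators, inverse temperature, sample size
omega = rng.uniform(0.5, 3.0, N)        # frequencies omega_n (arbitrary test values)
mu = rng.uniform(0.5, 1.5, N)           # couplings mu_n
k = rng.uniform(0.5, 2.0, N)            # spring constants k_n = m_n * omega_n**2

def F(t, xi, eta):
    # the random Fourier series (10.14)
    c = np.sqrt(1.0 / beta) * mu / np.sqrt(k)
    return (c * (xi * np.cos(omega * t) + eta * np.sin(omega * t))).sum(axis=-1)

xi = rng.standard_normal((M, N))        # M independent draws of the xi_n and eta_n
eta = rng.standard_normal((M, N))
t, s = 1.3, 0.4
empirical = np.mean(F(t, xi, eta) * F(s, xi, eta))
R = np.sum(mu**2 / k * np.cos(omega * (t - s)))   # memory kernel (10.15) at t - s
print(empirical, R / beta)              # agree up to Monte Carlo error
```

The sample average of $F_N(t) F_N(s)$ matches $\beta^{-1} R_N(t-s)$ to within the $O(M^{-1/2})$ Monte Carlo error, as the fluctuation-dissipation theorem (10.16) predicts.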

By choosing the frequencies $\omega_n$ and spring constants $k_n$ of the heat bath particles appropriately, we can pass to the limit as $N \to +\infty$ and obtain the GLE with different memory kernels $R(t)$ and noise processes $F(t)$.

Let $a \in (0,1)$, $2b = 1 - a$, and set $\omega_n = N^a \zeta_n$, where $\{\zeta_n\}_{n=1}^\infty$ are i.i.d. with $\zeta_1 \sim \mathcal{U}(0,1)$. Furthermore, we choose the spring constants according to

$$k_n = \frac{f^2(\omega_n)}{N^{2b}},$$

where the function $f(\omega)$ decays sufficiently fast at infinity. We can rewrite the dissipation and noise terms in the form

$$R_N(t) = \sum_{n=1}^N f^2(\omega_n) \cos(\omega_n t) \, \Delta\omega$$

and

$$F_N(t) = \sum_{n=1}^N f(\omega_n) \left( \xi_n \cos(\omega_n t) + \eta_n \sin(\omega_n t) \right) \sqrt{\Delta\omega},$$

where∆ω = Na/N . Using now properties of Fourier series with random coeffi-cients/frequencies and of weak convergence of probabilitymeasures we can passto the limit:

RN (t) → R(t) in L1[0, T ],

for a.a.ζn∞n=1 and

FN (t) → F (t) weakly in C([0, T ],R).

The timeT > 0 if finite but arbitrary. The limiting kernel and noise satisfy thefluctuation-dissipation theorem (10.16):

〈F (t)F (s)〉 = β−1R(t− s). (10.17)

QN (t), the solution of (??) converges weakly to the solution of the limiting GLE

Q = −V ′(Q) − λ2

∫ t

0R(t− s)Q(s) ds + λF (t). (10.18)

The properties of the limiting dissipation and noise are determined by the function $f(\omega)$. As an example, consider the Lorentzian function

$$f^2(\omega) = \frac{2\alpha/\pi}{\alpha^2 + \omega^2} \tag{10.19}$$

with $\alpha > 0$. Then $R(t) = e^{-\alpha |t|}$. The noise process $F(t)$ is a mean zero stationary Gaussian process with continuous paths and, from (10.17), exponential correlation function:

$$\langle F(t) F(s) \rangle = \beta^{-1} e^{-\alpha |t-s|}.$$


Hence, $F(t)$ is the stationary Ornstein-Uhlenbeck process

$$\frac{dF}{dt} = -\alpha F + \sqrt{2\beta^{-1}\alpha} \, \frac{dW}{dt}, \tag{10.20}$$

with $F(0) \sim \mathcal{N}(0, \beta^{-1})$. The GLE (10.18) becomes

$$\ddot{Q} = -V'(Q) - \lambda^2 \int_0^t e^{-\alpha|t-s|} \dot{Q}(s) \, ds + \lambda F(t), \tag{10.21}$$

where $F(t)$ is the OU process (10.20). $Q(t)$, the solution of the GLE (10.18), is not a Markov process: the future is not statistically independent of the past when conditioned on the present; the stochastic process $Q(t)$ has memory. We can turn (10.18) into a Markovian SDE by enlarging the dimension of the state space, i.e. by introducing auxiliary variables. In general we might have to introduce infinitely many variables! For the case of the exponential memory kernel, when the noise is given by an OU process, it is sufficient to introduce one auxiliary variable. We can rewrite (10.21) as the system of SDEs

$$\frac{dQ}{dt} = P, \qquad \frac{dP}{dt} = -V'(Q) + \lambda Z, \qquad \frac{dZ}{dt} = -\alpha Z - \lambda P + \sqrt{2\alpha\beta^{-1}} \, \frac{dW}{dt}, \tag{10.23}$$

with $Z(0) \sim \mathcal{N}(0, \beta^{-1})$. The process $\{Q(t), P(t), Z(t)\} \in \mathbb{R}^3$ is Markovian. It is a degenerate Markov process: noise acts directly on only one of the three degrees of freedom.
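The Markovian $(Q, P, Z)$ system above is straightforward to simulate. A minimal Euler-Maruyama sketch follows; the harmonic potential $V(Q) = Q^2/2$ and all parameter values are our own test choices, not from the notes. For a quadratic potential the stationary marginal of $Q$ is Gaussian with variance $\beta^{-1}$, which gives a simple consistency check:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, alpha, beta = 1.0, 1.0, 1.0       # coupling, kernel decay rate, inverse temperature
dt, nsteps = 2e-3, 2_000_000
Vprime = lambda Q: Q                    # harmonic potential V(Q) = Q**2 / 2 (test choice)

Q, P = 0.0, 0.0
Z = rng.normal(0.0, np.sqrt(1.0 / beta))        # Z(0) ~ N(0, beta^-1)
dW = np.sqrt(dt) * rng.standard_normal(nsteps)  # Brownian increments
qs = np.empty(nsteps)
for i in range(nsteps):
    Q += P * dt                                  # semi-implicit Euler for the Hamiltonian part
    P += (-Vprime(Q) + lam * Z) * dt
    Z += (-alpha * Z - lam * P) * dt + np.sqrt(2 * alpha / beta) * dW[i]
    qs[i] = Q

# The stationary distribution of (Q, P, Z) is the Gibbs measure proportional to
# exp(-beta * (P**2/2 + V(Q) + Z**2/2)), so Var(Q) should approach 1/beta.
var_Q = np.var(qs[nsteps // 2:])
print(var_Q)                                     # close to 1.0, up to sampling error
```

Discarding the first half of the trajectory removes the transient; the remaining sample variance of $Q$ agrees with $\beta^{-1}$ up to discretization and Monte Carlo error.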

We can eliminate the auxiliary process $Z$ by taking an appropriate distinguished limit. Set $\lambda = \sqrt{\gamma} \, \varepsilon^{-1}$ and $\alpha = \varepsilon^{-2}$. Equations (10.23) become

$$\frac{dQ}{dt} = P, \qquad \frac{dP}{dt} = -V'(Q) + \frac{\sqrt{\gamma}}{\varepsilon} Z, \qquad \frac{dZ}{dt} = -\frac{1}{\varepsilon^2} Z - \frac{\sqrt{\gamma}}{\varepsilon} P + \frac{\sqrt{2\beta^{-1}}}{\varepsilon} \frac{dW}{dt}.$$


We can use tools from singular perturbation theory for Markov processes to show that, in the limit as $\varepsilon \to 0$, we have

$$\frac{1}{\varepsilon} Z \to \sqrt{2\gamma\beta^{-1}} \, \frac{dW}{dt} - \gamma P.$$

Thus, in this limit we obtain the Markovian Langevin equation ($R(t) = \gamma \delta(t)$)

$$\ddot{Q} = -V'(Q) - \gamma \dot{Q} + \sqrt{2\gamma\beta^{-1}} \, \frac{dW}{dt}. \tag{10.24}$$

10.3 The Generalized Langevin Equation

In the previous section we studied the GLE for the case where the memory kernel decays exponentially fast. We showed that we can represent the GLE as a Markovian process by adding one additional variable, the solution of a linear SDE. A natural question which arises is whether it is always possible to turn the GLE into a Markovian system by adding a finite number of additional variables. This is not always the case. However, there are many applications where the memory kernel decays sufficiently fast so that we can approximate the GLE by a finite dimensional Markovian system.

We introduce the concept of a quasi-Markovian stochastic process.

Definition 10.3.1. We will say that a stochastic process $X_t$ is quasi-Markovian if it can be represented as a Markovian stochastic process by adding a finite number of additional variables: there exists a stochastic process $Y_t$ so that $\{X_t, Y_t\}$ is a Markov process.

In many cases the additional variables $Y_t$ can be expressed in terms of solutions to linear SDEs. This is possible, for example, when the memory kernel consists of a sum of exponential functions, a natural extension of the case considered in the previous section.

Proposition 10.3.2. Consider the generalized Langevin equation

$$\dot{Q} = P, \qquad \dot{P} = -V'(Q) - \int_0^t R(t-s) P(s) \, ds + F(t), \tag{10.25}$$

with a memory kernel of the form

$$R(t) = \sum_{j=1}^n \lambda_j^2 e^{-\alpha_j |t|} \tag{10.26}$$

and $F(t)$ a mean zero stationary Gaussian process, where $R(t)$ and $F(t)$ are related through the fluctuation-dissipation theorem

$$\langle F(t) F(s) \rangle = \beta^{-1} R(t-s). \tag{10.27}$$

Then (10.25) is equivalent to the Markovian SDE

$$\dot{Q} = P, \qquad \dot{P} = -V'(Q) + \sum_{j=1}^n \lambda_j u_j, \qquad \dot{u}_j = -\alpha_j u_j - \lambda_j P + \sqrt{2\alpha_j \beta^{-1}} \, \dot{W}_j, \quad j = 1, \dots, n, \tag{10.28}$$

with $u_j(0) \sim \mathcal{N}(0, \beta^{-1})$, and where the $W_j(t)$ are independent standard one dimensional Brownian motions.

Proof. We solve the equations for $u_j$:

$$u_j(t) = -\lambda_j \int_0^t e^{-\alpha_j(t-s)} P(s) \, ds + e^{-\alpha_j t} u_j(0) + \sqrt{2\alpha_j \beta^{-1}} \int_0^t e^{-\alpha_j(t-s)} \, dW_j(s) =: -\int_0^t R_j(t-s) P(s) \, ds + \eta_j(t).$$

We substitute this into the equation for $P$ to obtain

$$\begin{aligned} \dot{P} &= -V'(Q) + \sum_{j=1}^n \lambda_j u_j = -V'(Q) + \sum_{j=1}^n \lambda_j \left( -\int_0^t R_j(t-s) P(s) \, ds + \eta_j(t) \right) \\ &= -V'(Q) - \int_0^t R(t-s) P(s) \, ds + F(t), \end{aligned}$$

where $R(t)$ is given by (10.26) and the noise process $F(t)$ is

$$F(t) = \sum_{j=1}^n \lambda_j \eta_j(t),$$

with $\eta_j(t)$ being one-dimensional stationary independent OU processes. We readily check that the fluctuation-dissipation theorem is satisfied:

$$\begin{aligned} \langle F(t) F(s) \rangle &= \sum_{i,j=1}^n \lambda_i \lambda_j \langle \eta_i(t) \eta_j(s) \rangle = \beta^{-1} \sum_{i,j=1}^n \lambda_i \lambda_j \delta_{ij} e^{-\alpha_i |t-s|} \\ &= \beta^{-1} \sum_{i=1}^n \lambda_i^2 e^{-\alpha_i |t-s|} = \beta^{-1} R(t-s). \end{aligned}$$

These additional variables are solutions of a linear system of SDEs. This follows from results in approximation theory. Consider now the case where the memory kernel is a bounded analytic function. Its Laplace transform

$$\hat{R}(s) = \int_0^{+\infty} e^{-st} R(t) \, dt$$

can be represented as a continued fraction:

$$\hat{R}(s) = \cfrac{\Delta_1^2}{s + \gamma_1 + \cfrac{\Delta_2^2}{\ddots}}, \qquad \gamma_i > 0. \tag{10.29}$$

Since $R(t)$ is bounded, we have that

$$\lim_{s \to \infty} \hat{R}(s) = 0.$$

Consider an approximation $R_N(t)$ such that the continued fraction representation terminates after $N$ steps. $R_N(t)$ is bounded, which implies that

$$\lim_{s \to \infty} \hat{R}_N(s) = 0.$$

The Laplace transform of $R_N(t)$ is a rational function:

$$\hat{R}_N(s) = \frac{\sum_{j=1}^N a_j s^{N-j}}{s^N + \sum_{j=1}^N b_j s^{N-j}}, \qquad a_j, b_j \in \mathbb{R}. \tag{10.30}$$


This is the Laplace transform of the autocorrelation function of an appropriate linear system of SDEs. Indeed, set

$$\frac{dx_j}{dt} = -b_j x_j + x_{j+1} + a_j \frac{dW_j}{dt}, \quad j = 1, \dots, N, \tag{10.31}$$

with $x_{N+1}(t) = 0$. The process $x_1(t)$ is a stationary Gaussian process with autocorrelation function $R_N(t)$. For $N = 1$ and $b_1 = \alpha$, $a_1 = \sqrt{2\beta^{-1}\alpha}$ we recover the GLE (10.21) with $F(t)$ being the OU process (10.20). Consider now the case $N = 2$ with $b_i = \alpha_i$, $i = 1, 2$, and $a_1 = 0$, $a_2 = \sqrt{2\beta^{-1}\alpha_2}$. The GLE becomes

$$\ddot{Q} = -V'(Q) - \lambda^2 \int_0^t R(t-s) \dot{Q}(s) \, ds + \lambda F_1(t), \qquad \dot{F}_1 = -\alpha_1 F_1 + F_2, \qquad \dot{F}_2 = -\alpha_2 F_2 + \sqrt{2\beta^{-1}\alpha_2} \, \dot{W}_2, \tag{10.33}$$

with $\beta^{-1} R(t-s) = \langle F_1(t) F_1(s) \rangle$. We can write (10.33) as a Markovian system for the variables $\{Q, P, Z_1, Z_2\}$:

$$\dot{Q} = P, \qquad \dot{P} = -V'(Q) + \lambda Z_1, \qquad \dot{Z}_1 = -\alpha_1 Z_1 + Z_2, \qquad \dot{Z}_2 = -\alpha_2 Z_2 - \lambda P + \sqrt{2\beta^{-1}\alpha_2} \, \dot{W}_2.$$

Notice that this diffusion process is "more degenerate" than (10.23): noise acts on fewer degrees of freedom. It is still, however, hypoelliptic (Hörmander's condition is satisfied): there is sufficient interaction between the degrees of freedom $\{Q, P, Z_1, Z_2\}$ so that noise (and hence regularity) is transferred from the degrees of freedom that are directly forced by noise to the ones that are not. The corresponding Markov semigroup has nice regularizing properties, and there exists a smooth density. Stochastic processes that can be written as a Markovian process by adding a finite number of additional variables are called quasi-Markovian. Under appropriate assumptions on the potential $V(Q)$, the solution of the GLE is an ergodic process. It is possible to study the ergodic properties of a quasi-Markovian process by analyzing the spectral properties of the generator of the corresponding Markov process. This leads to the analysis of the spectral properties of hypoelliptic operators.


10.4 Open Classical Systems

When studying the Kac-Zwanzig model we considered a one dimensional Hamiltonian system coupled to a finite dimensional Hamiltonian system with random initial conditions (the harmonic heat bath), and then passed to the thermodynamic limit $N \to \infty$. We can also consider a small Hamiltonian system coupled to its environment, which we model as an infinite dimensional Hamiltonian system with random initial conditions. We then have a coupled particle-field model. The distinguished particle (Brownian particle) is described through the Hamiltonian

$$H_{DP} = \frac{1}{2} p^2 + V(q). \tag{10.34}$$

We will model the environment through a classical linear field theory (i.e. the wave equation) with infinite energy:

$$\partial_t^2 \phi(t,x) = \partial_x^2 \phi(t,x). \tag{10.35}$$

The Hamiltonian of this system is

$$H_{HB}(\phi, \pi) = \int \left( |\partial_x \phi|^2 + |\pi(x)|^2 \right) dx, \tag{10.36}$$

where $\pi(x)$ denotes the conjugate momentum field. The initial conditions are distributed according to the Gibbs measure (which in this case is a Gaussian measure) at inverse temperature $\beta$, which we formally write as

$$\text{``} \mu_\beta = Z^{-1} e^{-\beta H(\phi,\pi)} \, d\phi \, d\pi \text{''}. \tag{10.37}$$

Care has to be taken when defining probability measures in infinite dimensions. Under this assumption on the initial conditions, typical configurations of the heat bath have infinite energy. In this way, the environment can pump enough energy into the system so that non-trivial fluctuations emerge. We will assume linear coupling between the particle and the field:

$$H_I(q, \phi) = q \int \partial_x \phi(x) \, \rho(x) \, dx, \tag{10.38}$$

where the function $\rho(x)$ models the coupling between the particle and the field. This coupling is motivated by the dipole coupling approximation from classical electrodynamics. The Hamiltonian of the particle-field model is

$$H(q, p, \phi, \pi) = H_{DP}(p, q) + H_{HB}(\phi, \pi) + H_I(q, \phi). \tag{10.39}$$


The corresponding Hamiltonian equations of motion are a coupled system of equations for the particle-field model. Now we can proceed as in the case of the finite dimensional heat bath: we can integrate the equations of motion for the heat bath variables and insert the solution into the equations for the Brownian particle to obtain the GLE. The final result is

$$\ddot{q} = -V'(q) - \int_0^t R(t-s) \dot{q}(s) \, ds + F(t), \tag{10.40}$$

with appropriate definitions for the memory kernel and the noise, which are related through the fluctuation-dissipation theorem.

10.5 Linear Response Theory

10.6 Projection Operator Techniques

Consider now the $(N+1)$-dimensional Hamiltonian (particle + heat bath) with random initial conditions. The $(N+1)$-particle probability distribution function $f_{N+1}$ satisfies the Liouville equation

$$\frac{\partial f_{N+1}}{\partial t} + \{f_{N+1}, H\} = 0, \tag{10.41}$$

where $H$ is the full Hamiltonian and $\{\cdot, \cdot\}$ is the Poisson bracket

$$\{A, B\} = \sum_{j=0}^N \left( \frac{\partial A}{\partial q_j} \frac{\partial B}{\partial p_j} - \frac{\partial B}{\partial q_j} \frac{\partial A}{\partial p_j} \right).$$

We introduce the Liouville operator

$$L_{N+1} \cdot = -i \{\cdot, H\},$$

so that the Liouville equation can be written as

$$i \frac{\partial f_{N+1}}{\partial t} = L_{N+1} f_{N+1}. \tag{10.42}$$

We want to obtain a closed equation for the distribution function of the Brownian particle. We introduce a projection operator $P$ which projects onto the distribution function $f$ of the Brownian particle:

$$P f_{N+1} = f, \qquad (I - P) f_{N+1} = h.$$


The Liouville equation becomes

$$i \frac{\partial f}{\partial t} = PL(f + h), \tag{10.43a}$$

$$i \frac{\partial h}{\partial t} = (I - P) L (f + h). \tag{10.43b}$$

We integrate the second equation and substitute into the first to obtain

$$i \frac{\partial f}{\partial t} = PLf - i \int_0^t PL e^{-i(I-P)Ls} (I-P) L f(t-s) \, ds + PL e^{-i(I-P)Lt} h(0). \tag{10.44}$$

In the Markovian limit (large mass ratio) we obtain the Fokker-Planck equation.

10.7 Discussion and Bibliography

The original papers by Kac et al. and by Zwanzig are [17, 74]. See also [16]. The variant of the Kac-Zwanzig model that we have discussed in this chapter was studied in [27]. An excellent discussion on the derivation of the Fokker-Planck equation using projection operator techniques can be found in [52].

Applications of linear response theory to climate modeling can be found in the literature.

10.8 Exercises

1. Prove (10.5). Use this formula to obtain (10.6).


Index

autocorrelation function, 32
Banach space, 16
Brownian motion: scaling and symmetry properties, 43
central limit theorem, 24
conditional expectation, 18
confinement time, 195
correlation coefficient, 17
covariance function, 32
diffusion process: confinement time, 195; mean first passage time, 194; reversible, 110
Dirichlet form, 113
equation: Fokker-Planck, 90; generalized Langevin, 205, 209; kinetic, 120; Klein-Kramers-Chandrasekhar, 141; Langevin, 141
first passage time, 194
fluctuation-dissipation theorem, 205, 210
Fokker-Planck equation, 90, 131; classical solution of, 91
Gaussian stochastic process, 30
generalized Langevin equation, 205, 209
generator, 68, 129
Gibbs distribution, 111
Gibbs measure, 113, 206
Green-Kubo formula, 39
heat bath, 206
inverse temperature, 103
Itô formula, 130
joint probability density, 99
Kac-Zwanzig model, 206
Karhunen-Loève expansion, 46; for Brownian motion, 49
kinetic equation, 120
Kolmogorov equation, 130
Langevin equation, 141
law, 13
law of large numbers: strong, 24
linear response theory, 205
Markov Chain Monte Carlo (MCMC), 115
mean first passage time (MFPT), 194
multiplicative noise, 138
operator: hypoelliptic, 142
Ornstein-Uhlenbeck process: Fokker-Planck equation for, 98
partition function, 111
Poincaré's inequality, 113; for Gaussian measures, 105
quasi-Markovian stochastic process, 216
random variable: Gaussian, 17; uncorrelated, 17
reversible diffusion, 110
spectral density, 35
stationary process, 31; second order stationary, 32; strictly stationary, 31; wide sense stationary, 32
stochastic differential equation, 43
stochastic process: definition, 29; equivalent, 30; Gaussian, 30; quasi-Markovian, 216; second-order stationary, 32; stationary, 31; strictly stationary, 31
transport coefficient, 39
Wiener process, 40


Bibliography

[1] L. Arnold. Stochastic differential equations: theory and applications. Wiley-Interscience [John Wiley & Sons], New York, 1974. Translated from the German.

[2] R. Balescu. Statistical dynamics. Matter out of equilibrium. Imperial College Press, London, 1997.

[3] N. Berglund and B. Gentz. Noise-induced phenomena in slow-fast dynamical systems. Probability and its Applications (New York). Springer-Verlag London Ltd., London, 2006. A sample-paths approach.

[4] L. Breiman. Probability, volume 7 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992. Corrected reprint of the 1968 original.

[5] S. Cerrai and M. Freidlin. On the Smoluchowski-Kramers approximation for a system with an infinite number of degrees of freedom. Probab. Theory Related Fields, 135(3):363–394, 2006.

[6] S. Cerrai and M. Freidlin. Smoluchowski-Kramers approximation for a general class of SPDEs. J. Evol. Equ., 6(4):657–689, 2006.

[7] S. Chandrasekhar. Stochastic problems in physics and astronomy. Rev. Mod. Phys., 15(1):1–89, Jan 1943.

[8] A.J. Chorin and O.H. Hald. Stochastic tools in mathematics and science, volume 1 of Surveys and Tutorials in the Applied Mathematical Sciences. Springer, New York, 2006.

[9] W. Dietrich, I. Peschel, and W.R. Schneider. Diffusion in periodic potentials. Z. Phys., 27:177–187, 1977.


[10] N. Wax (editor). Selected Papers on Noise and Stochastic Processes. Dover, New York, 1954.

[11] A. Einstein. Investigations on the theory of the Brownian movement. Dover Publications Inc., New York, 1956. Edited with notes by R. Fürth, translated by A. D. Cowper.

[12] S.N. Ethier and T.G. Kurtz. Markov processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.

[13] L.C. Evans. Partial Differential Equations. AMS, Providence, Rhode Island, 1998.

[14] W. Feller. An introduction to probability theory and its applications. Vol. I. Third edition. John Wiley & Sons Inc., New York, 1968.

[15] W. Feller. An introduction to probability theory and its applications. Vol. II. Second edition. John Wiley & Sons Inc., New York, 1971.

[16] G. W. Ford and M. Kac. On the quantum Langevin equation. J. Statist. Phys., 46(5-6):803–810, 1987.

[17] G. W. Ford, M. Kac, and P. Mazur. Statistical mechanics of assemblies of coupled oscillators. J. Mathematical Phys., 6:504–515, 1965.

[18] M. Freidlin and M. Weber. A remark on random perturbations of the nonlinear pendulum. Ann. Appl. Probab., 9(3):611–628, 1999.

[19] M. I. Freidlin and A. D. Wentzell. Random perturbations of Hamiltonian systems. Mem. Amer. Math. Soc., 109(523):viii+82, 1994.

[20] M.I. Freidlin and A.D. Wentzell. Random Perturbations of Dynamical Systems. Springer-Verlag, New York, 1984.

[21] A. Friedman. Partial differential equations of parabolic type. Prentice-Hall Inc., Englewood Cliffs, N.J., 1964.

[22] A. Friedman. Stochastic differential equations and applications. Vol. 1. Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1975. Probability and Mathematical Statistics, Vol. 28.


[23] A. Friedman. Stochastic differential equations and applications. Vol. 2. Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1976. Probability and Mathematical Statistics, Vol. 28.

[24] H. Gang, A. Daffertshofer, and H. Haken. Diffusion in periodically forced Brownian particles moving in space-periodic potentials. Phys. Rev. Lett., 76(26):4874–4877, 1996.

[25] C. W. Gardiner. Handbook of stochastic methods. Springer-Verlag, Berlin, second edition, 1985. For physics, chemistry and the natural sciences.

[26] I. I. Gikhman and A. V. Skorokhod. Introduction to the theory of random processes. Dover Publications Inc., Mineola, NY, 1996.

[27] D. Givon, R. Kupferman, and A.M. Stuart. Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity, 17(6):R55–R127, 2004.

[28] M. Hairer and G. A. Pavliotis. From ballistic to diffusive behavior in periodic potentials. J. Stat. Phys., 131(1):175–202, 2008.

[29] M. Hairer and G.A. Pavliotis. Periodic homogenization for hypoelliptic diffusions. J. Statist. Phys., 117(1-2):261–279, 2004.

[30] P. Hanggi. Escape from a metastable state. J. Stat. Phys., 42(1/2):105–140, 1986.

[31] P. Hanggi, P. Talkner, and M. Borkovec. Reaction-rate theory: fifty years after Kramers. Rev. Modern Phys., 62(2):251–341, 1990.

[32] W. Horsthemke and R. Lefever. Noise-induced transitions, volume 15 of Springer Series in Synergetics. Springer-Verlag, Berlin, 1984. Theory and applications in physics, chemistry, and biology.

[33] J. Jacod and A.N. Shiryaev. Limit theorems for stochastic processes, volume 288 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2003.

[34] F. John. Partial differential equations, volume 1 of Applied Mathematical Sciences. Springer-Verlag, New York, fourth edition, 1991.

[35] S. Karlin and H. M. Taylor. A second course in stochastic processes. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1981.


[36] S. Karlin and H.M. Taylor. A first course in stochastic processes. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, 1975.

[37] L. B. Koralov and Y. G. Sinai. Theory of probability and random processes. Universitext. Springer, Berlin, second edition, 2007.

[38] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, 1940.

[39] N. V. Krylov. Introduction to the theory of diffusion processes, volume 142 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1995.

[40] R. Kupferman, G. A. Pavliotis, and A. M. Stuart. Ito versus Stratonovich white-noise limits for systems with inertia and colored multiplicative noise. Phys. Rev. E (3), 70(3):036120, 9, 2004.

[41] A.M. Lacasta, J.M. Sancho, A.H. Romero, I.M. Sokolov, and K. Lindenberg. From subdiffusion to superdiffusion of particles on solid surfaces. Phys. Rev. E, 70:051104, 2004.

[42] P. D. Lax. Linear algebra and its applications. Pure and Applied Mathematics (Hoboken). Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2007.

[43] S. Lifson and J.L. Jackson. On the self-diffusion of ions in polyelectrolytic solution. J. Chem. Phys., 36:2410, 1962.

[44] M. Loève. Probability theory. I. Springer-Verlag, New York, fourth edition, 1977. Graduate Texts in Mathematics, Vol. 45.

[45] M. Loève. Probability theory. II. Springer-Verlag, New York, fourth edition, 1978. Graduate Texts in Mathematics, Vol. 46.

[46] M. C. Mackey. Time's arrow. Dover Publications Inc., Mineola, NY, 2003. The origins of thermodynamic behavior, reprint of the 1992 original [Springer, New York; MR1140408].

[47] M.C. Mackey, A. Longtin, and A. Lasota. Noise-induced global asymptotic stability. J. Statist. Phys., 60(5-6):735–751, 1990.


[48] P. Mandl. Analytical treatment of one-dimensional Markov processes. Die Grundlehren der mathematischen Wissenschaften, Band 151. Academia Publishing House of the Czechoslovak Academy of Sciences, Prague, 1968.

[49] P. A. Markowich and C. Villani. On the trend to equilibrium for the Fokker-Planck equation: an interplay between physics and functional analysis. Mat. Contemp., 19:1–29, 2000.

[50] B. J. Matkowsky, Z. Schuss, and E. Ben-Jacob. A singular perturbation approach to Kramers' diffusion problem. SIAM J. Appl. Math., 42(4):835–849, 1982.

[51] B. J. Matkowsky, Z. Schuss, and C. Tier. Uniform expansion of the transition rate in Kramers' problem. J. Statist. Phys., 35(3-4):443–456, 1984.

[52] R.M. Mazo. Brownian motion, volume 112 of International Series of Monographs on Physics. Oxford University Press, New York, 2002.

[53] J. Meyer and J. Schroter. Comments on the Grad procedure for the Fokker-Planck equation. J. Statist. Phys., 32(1):53–69, 1983.

[54] E. Nelson. Dynamical theories of Brownian motion. Princeton University Press, Princeton, N.J., 1967.

[55] G.C. Papanicolaou and S. R. S. Varadhan. Ornstein-Uhlenbeck process in a random potential. Comm. Pure Appl. Math., 38(6):819–834, 1985.

[56] G. A. Pavliotis and A. M. Stuart. Analysis of white noise limits for stochastic systems with two fast relaxation times. Multiscale Model. Simul., 4(1):1–35 (electronic), 2005.

[57] G. A. Pavliotis and A. M. Stuart. Parameter estimation for multiscale diffusions. J. Stat. Phys., 127(4):741–781, 2007.

[58] G. A. Pavliotis and A. Vogiannou. Diffusive transport in periodic potentials: underdamped dynamics. Fluct. Noise Lett., 8(2):L155–173, 2008.

[59] G.A. Pavliotis and A.M. Stuart. Multiscale methods, volume 53 of Texts in Applied Mathematics. Springer, New York, 2008. Averaging and homogenization.


[60] R. L. Stratonovich. Topics in the theory of random noise. Vol. II. Revised English edition. Translated from the Russian by Richard A. Silverman. Gordon and Breach Science Publishers, New York, 1967.

[61] P. Reimann, C. Van den Broeck, H. Linke, P. Hanggi, J.M. Rubi, and A. Perez-Madrid. Diffusion in tilted periodic potentials: enhancement, universality and scaling. Phys. Rev. E, 65(3):031104, 2002.

[62] P. Reimann, C. Van den Broeck, H. Linke, J.M. Rubi, and A. Perez-Madrid. Giant acceleration of free diffusion by use of tilted periodic potentials. Phys. Rev. Lett., 87(1):010602, 2001.

[63] Frigyes Riesz and Béla Sz.-Nagy. Functional analysis. Dover Publications Inc., New York, 1990. Translated from the second French edition by Leo F. Boron, reprint of the 1955 original.

[64] H. Risken. The Fokker-Planck equation, volume 18 of Springer Series in Synergetics. Springer-Verlag, Berlin, 1989.

[65] H. Rodenhausen. Einstein's relation between diffusion constant and mobility for a diffusion model. J. Statist. Phys., 55(5-6):1065–1088, 1989.

[66] M. Schreier, P. Reimann, P. Hanggi, and E. Pollak. Giant enhancement of diffusion and particle selection in rocked periodic potentials. Europhys. Lett., 44(4):416–422, 1998.

[67] Z. Schuss. Singular perturbation methods in stochastic differential equations of mathematical physics. SIAM Review, 22(2):119–155, 1980.

[68] Ch. Schutte and W. Huisinga. Biomolecular conformations can be identified as metastable sets of molecular dynamics. In Handbook of Numerical Analysis (Computational Chemistry), Vol. X, 2003.

[69] C. Schwab and R.A. Todor. Karhunen-Loève approximation of random fields by generalized fast multipole methods. J. Comput. Phys., 217(1):100–122, 2006.

[70] R.B. Sowers. A boundary layer theory for diffusively perturbed transport around a heteroclinic cycle. Comm. Pure Appl. Math., 58(1):30–84, 2005.


[71] D.W. Stroock. Probability theory, an analytic view. Cambridge University Press, Cambridge, 1993.

[72] G. I. Taylor. Diffusion by continuous movements. London Math. Soc., 20:196, 1921.

[73] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Phys. Rev., 36(5):823–841, Sep 1930.

[74] R. Zwanzig. Nonlinear generalized Langevin equations. J. Stat. Phys., 9(3):215–220, 1973.

[75] R. Zwanzig. Nonequilibrium statistical mechanics. Oxford University Press, New York, 2001.