STOCHASTIC PROCESSES AND APPLICATIONS

G.A. Pavliotis
Department of Mathematics
Imperial College London
London SW7 2AZ, UK

June 9, 2011



Contents

Preface

1 Introduction
  1.1 Introduction
  1.2 Historical Overview
  1.3 The One-Dimensional Random Walk
  1.4 Stochastic Modeling of Deterministic Chaos
  1.5 Why Randomness
  1.6 Discussion and Bibliography
  1.7 Exercises

2 Elements of Probability Theory
  2.1 Introduction
  2.2 Basic Definitions from Probability Theory
    2.2.1 Conditional Probability
  2.3 Random Variables
    2.3.1 Expectation of Random Variables
  2.4 Conditional Expectation
  2.5 The Characteristic Function
  2.6 Gaussian Random Variables
  2.7 Types of Convergence and Limit Theorems
  2.8 Discussion and Bibliography
  2.9 Exercises

3 Basics of the Theory of Stochastic Processes
  3.1 Introduction
  3.2 Definition of a Stochastic Process
  3.3 Stationary Processes
    3.3.1 Strictly Stationary Processes
    3.3.2 Second Order Stationary Processes
    3.3.3 Ergodic Properties of Second-Order Stationary Processes
  3.4 Brownian Motion
  3.5 Other Examples of Stochastic Processes
    3.5.1 Brownian Bridge
    3.5.2 Fractional Brownian Motion
    3.5.3 The Poisson Process
  3.6 The Karhunen-Loeve Expansion
  3.7 Discussion and Bibliography
  3.8 Exercises

4 Markov Processes
  4.1 Introduction
  4.2 Examples
  4.3 Definition of a Markov Process
  4.4 The Chapman-Kolmogorov Equation
  4.5 The Generator of a Markov Process
    4.5.1 The Adjoint Semigroup
  4.6 Ergodic Markov Processes
    4.6.1 Stationary Markov Processes
  4.7 Discussion and Bibliography
  4.8 Exercises

5 Diffusion Processes
  5.1 Introduction
  5.2 Definition of a Diffusion Process
  5.3 The Backward and Forward Kolmogorov Equations
    5.3.1 The Backward Kolmogorov Equation
    5.3.2 The Forward Kolmogorov Equation
  5.4 Multidimensional Diffusion Processes
  5.5 Connection with Stochastic Differential Equations
  5.6 Examples of Diffusion Processes
  5.7 Discussion and Bibliography
  5.8 Exercises

6 The Fokker-Planck Equation
  6.1 Introduction
  6.2 Basic Properties of the FP Equation
    6.2.1 Existence and Uniqueness of Solutions
    6.2.2 The FP Equation as a Conservation Law
    6.2.3 Boundary Conditions for the Fokker-Planck Equation
  6.3 Examples of Diffusion Processes
    6.3.1 Brownian Motion
    6.3.2 The Ornstein-Uhlenbeck Process
    6.3.3 The Geometric Brownian Motion
  6.4 The Ornstein-Uhlenbeck Process and Hermite Polynomials
  6.5 Reversible Diffusions
    6.5.1 Markov Chain Monte Carlo (MCMC)
  6.6 Perturbations of Non-Reversible Diffusions
  6.7 Eigenfunction Expansions
    6.7.1 Reduction to a Schrodinger Equation
  6.8 Discussion and Bibliography
  6.9 Exercises

7 Stochastic Differential Equations
  7.1 Introduction
  7.2 The Ito and Stratonovich Stochastic Integral
    7.2.1 The Stratonovich Stochastic Integral
  7.3 Stochastic Differential Equations
    7.3.1 Examples of SDEs
  7.4 The Generator, Ito's Formula and the Fokker-Planck Equation
    7.4.1 The Generator
    7.4.2 Ito's Formula
  7.5 Linear SDEs
  7.6 Derivation of the Stratonovich SDE
    7.6.1 Ito versus Stratonovich
  7.7 Numerical Solution of SDEs
  7.8 Parameter Estimation for SDEs
  7.9 Noise Induced Transitions
  7.10 Discussion and Bibliography
  7.11 Exercises

8 The Langevin Equation
  8.1 Introduction
  8.2 The Fokker-Planck Equation in Phase Space (Klein-Kramers Equation)
  8.3 The Langevin Equation in a Harmonic Potential
  8.4 Asymptotic Limits for the Langevin Equation
    8.4.1 The Overdamped Limit
    8.4.2 The Underdamped Limit
  8.5 Brownian Motion in Periodic Potentials
    8.5.1 The Langevin Equation in a Periodic Potential
    8.5.2 Equivalence with the Green-Kubo Formula
  8.6 The Underdamped and Overdamped Limits of the Diffusion Coefficient
    8.6.1 Brownian Motion in a Tilted Periodic Potential
  8.7 Numerical Solution of the Klein-Kramers Equation
  8.8 Discussion and Bibliography
  8.9 Exercises

9 The Mean First Passage Time and Exit Time Problems
  9.1 Introduction
  9.2 Brownian Motion in a Bistable Potential
  9.3 The Mean First Passage Time
    9.3.1 The Boundary Value Problem for the MFPT
    9.3.2 Examples
  9.4 Escape from a Potential Barrier
    9.4.1 Calculation of the Reaction Rate in the Overdamped Regime
    9.4.2 The Intermediate Regime: γ = O(1)
    9.4.3 Calculation of the Reaction Rate in the Energy-Diffusion-Limited Regime
  9.5 Discussion and Bibliography
  9.6 Exercises

10 Stochastic Processes and Statistical Mechanics
  10.1 Introduction
  10.2 The Kac-Zwanzig Model
  10.3 The Generalized Langevin Equation
  10.4 Open Classical Systems
  10.5 Linear Response Theory
  10.6 Projection Operator Techniques
  10.7 Discussion and Bibliography
  10.8 Exercises

Preface

The purpose of these notes is to present various results and techniques from the theory of stochastic processes that are useful in the study of stochastic problems in physics, chemistry and other areas. These notes have been used for several years for a course on applied stochastic processes offered to fourth year and to MSc students in applied mathematics at the Department of Mathematics, Imperial College London.

G.A. Pavliotis
London, December 2010

Chapter 1

    Introduction

    1.1 Introduction

In this chapter we introduce some of the concepts and techniques that we will study in this book. In Section 1.2 we present a brief historical overview of the development of the theory of stochastic processes in the twentieth century. In Section 1.3 we introduce the one-dimensional random walk and we use this example in order to introduce several concepts such as Brownian motion and the Markov property. In Section 1.4 we discuss the stochastic modeling of deterministic chaos. Some comments on the role of probabilistic modeling in the physical sciences are offered in Section 1.5. Discussion and bibliographical comments are presented in Section 1.6. Exercises are included in Section 1.7.

    1.2 Historical Overview

The theory of stochastic processes, at least in terms of its application to physics, started with Einstein's work on the theory of Brownian motion: Concerning the motion, as required by the molecular-kinetic theory of heat, of particles suspended in liquids at rest (1905), and in a series of additional papers that were published in the period 1905-1906. In these fundamental works, Einstein presented an explanation of Brown's observation (1827) that when suspended in water, small pollen grains are found to be in a very animated and irregular state of motion. In developing his theory Einstein introduced several concepts that still play a fundamental role in the study of stochastic processes and that we will study in this book. Using modern terminology, Einstein introduced a Markov chain model for the motion of the particle (molecule, pollen grain...). Furthermore, he introduced the idea that it makes more sense to talk about the probability of finding the particle at position x at time t, rather than about individual trajectories.

In his work many of the main aspects of the modern theory of stochastic processes can be found:

- The assumption of Markovianity (no memory), expressed through the Chapman-Kolmogorov equation.

- The Fokker-Planck equation (in this case, the diffusion equation).

- The derivation of the Fokker-Planck equation from the master (Chapman-Kolmogorov) equation through a Kramers-Moyal expansion.

- The calculation of a transport coefficient (the diffusion coefficient) using macroscopic (kinetic theory-based) considerations:

    D = k_B T / (6πηa).

  Here k_B is Boltzmann's constant, T is the temperature, η is the viscosity of the fluid and a is the diameter of the particle.

Einstein's theory is based on the Fokker-Planck equation. Langevin (1908) developed a theory based on a stochastic differential equation. The equation of motion for a Brownian particle is

    m d²x/dt² = −6πηa dx/dt + ξ,

where ξ is a random force. It can be shown that there is complete agreement between Einstein's theory and Langevin's theory. The theory of Brownian motion was developed independently by Smoluchowski, who also performed several experiments.

The approaches of Langevin and Einstein represent the two main approaches in the theory of stochastic processes:

- Study individual trajectories of Brownian particles. Their evolution is governed by a stochastic differential equation:

    dX/dt = F(X) + σ(X)ξ(t),

  where ξ(t) is a random force.

- Study the probability ρ(x, t) of finding a particle at position x at time t. This probability distribution satisfies the Fokker-Planck equation:

    ∂ρ/∂t = ∇·(−F(x)ρ) + ½∇∇:(A(x)ρ),

  where A(x) = σ(x)σ(x)^T.

The theory of stochastic processes was developed during the 20th century by several mathematicians and physicists including Smoluchowski, Planck, Kramers, Chandrasekhar, Wiener, Kolmogorov, Ito and Doob.

    1.3 The One-Dimensional Random Walk

We let time be discrete, i.e. t = 0, 1, . . . . Consider the following stochastic process S_n: S_0 = 0; at each time step it moves to ±1 with equal probability 1/2.

In other words, at each time step we flip a fair coin. If the outcome is heads, we move one unit to the right. If the outcome is tails, we move one unit to the left.

Alternatively, we can think of the random walk as a sum of independent random variables:

    S_n = Σ_{j=1}^{n} X_j,

where X_j ∈ {−1, 1} with P(X_j = ±1) = 1/2. We can simulate the random walk on a computer:

- We need a (pseudo)random number generator to generate n independent random variables which are uniformly distributed in the interval [0, 1].

- If the value of the random variable is greater than 1/2 then the particle moves to the left, otherwise it moves to the right.

- We then take the sum of all these random moves.

The sequence {S_n}_{n=1}^{N} indexed by the discrete time T = {1, 2, . . . , N} is the path of the random walk. We use a linear interpolation (i.e. connect the points {n, S_n} by straight lines) to generate a continuous path.
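The simulation recipe above can be sketched in a few lines (an illustrative sketch, not part of the original notes; NumPy and the function name `random_walk_paths` are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk_paths(n_steps, n_paths):
    """Simulate paths of the one-dimensional random walk S_n.

    Uniform random numbers on [0, 1] are compared with 1/2: a value
    greater than 1/2 moves the particle one unit to the left,
    otherwise it moves one unit to the right.
    """
    u = rng.uniform(size=(n_paths, n_steps))
    steps = np.where(u > 0.5, -1, 1)
    return np.cumsum(steps, axis=1)  # S_n = X_1 + ... + X_n

paths = random_walk_paths(n_steps=50, n_paths=3)
print(paths.shape)  # (3, 50); plotting the rows reproduces Figure 1.1
```

Generating many such paths and averaging gives the statistics E(S_n) = 0 and E(S_n²) = n discussed below.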

Figure 1.1: Three paths of the random walk of length N = 50.

Figure 1.2: Three paths of the random walk of length N = 1000.

Figure 1.3: Sample Brownian paths, U(t) against t (five individual paths together with the mean over 1000 paths).

Every path of the random walk is different: it depends on the outcome of a sequence of independent random experiments. We can compute statistics by generating a large number of paths and computing averages. For example, E(S_n) = 0 and E(S_n²) = n. The paths of the random walk (without the linear interpolation) are not continuous: the random walk has a jump of size 1 at each time step. This is an example of a discrete time, discrete space stochastic process. The random walk is a time-homogeneous Markov process. If we take a large number of steps, the random walk starts looking like a continuous time process with continuous paths.

We can quantify this observation by introducing an appropriate rescaled process and by taking an appropriate limit. Consider the sequence of continuous time stochastic processes

    Z_t^n := (1/√n) S_{⌊nt⌋}.

In the limit as n → ∞, the sequence {Z_t^n} converges (in some appropriate sense, that will be made precise in later chapters) to a Brownian motion with diffusion coefficient D = (Δx)²/(2Δt) = 1/2. Brownian motion W(t) is a continuous time stochastic process with continuous paths that starts at 0 (W(0) = 0) and has independent, normally distributed (Gaussian) increments. We can simulate the Brownian


motion on a computer using a random number generator that generates normally distributed, independent random variables. We can write an equation for the evolution of the paths of a Brownian motion X_t with diffusion coefficient D starting at x:

    dX_t = √(2D) dW_t, X_0 = x.

This is the simplest example of a stochastic differential equation. The probability of finding X_t at y at time t, given that it was at x at time t = 0, i.e. the transition probability density ρ(y, t), satisfies the PDE

    ∂ρ/∂t = D ∂²ρ/∂y², ρ(y, 0) = δ(y − x).

This is the simplest example of the Fokker-Planck equation. The connection between Brownian motion and the diffusion equation was made by Einstein in 1905.
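A minimal sketch of this simulation (my own code, with NumPy assumed): the increments of W over a step dt are independent N(0, dt) random variables, so X advances by √(2D dt) times a standard normal at each step, and the variance Var(X_t − x) = 2Dt can be checked empirically.

```python
import numpy as np

rng = np.random.default_rng(1)

def brownian_paths(D, T, n_steps, n_paths, x0=0.0):
    """Sample paths of dX_t = sqrt(2D) dW_t, X_0 = x0."""
    dt = T / n_steps
    # Independent Gaussian increments of W, each with variance dt.
    dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
    return x0 + np.sqrt(2.0 * D) * np.cumsum(dW, axis=1)

X = brownian_paths(D=0.5, T=1.0, n_steps=500, n_paths=20000)
print(X[:, -1].var())  # close to 2*D*T = 1
```

With D = 1/2 this matches the diffusion coefficient obtained from the rescaled random walk above.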

    1.4 Stochastic Modeling of Deterministic Chaos

    1.5 Why Randomness

    Why introduce randomness in the description of physical systems?

- To describe outcomes of a repeated set of experiments. Think of tossing a coin repeatedly or of throwing a die.

- To describe a deterministic system for which we have incomplete information: we have imprecise knowledge of initial and boundary conditions or of model parameters.

  ODEs with random initial conditions are equivalent to stochastic processes that can be described using stochastic differential equations.

- To describe systems for which we are not confident about the validity of our mathematical model.

- To describe a dynamical system exhibiting very complicated behavior (chaotic dynamical systems). Determinism versus predictability.

- To describe a high-dimensional deterministic system using a simpler, low-dimensional stochastic system. Think of the physical model for Brownian motion (a heavy particle colliding with many small particles).

- To describe a system that is inherently random. Think of quantum mechanics.

Stochastic modeling is currently used in many different areas ranging from biology to climate modeling to economics.

    1.6 Discussion and Bibliography

The fundamental papers of Einstein on the theory of Brownian motion have been reprinted by Dover [11]. The readers of this book are strongly encouraged to study these papers. Other fundamental papers from the early period of the development of the theory of stochastic processes include the papers by Langevin, Ornstein and Uhlenbeck, Doob, Kramers and Chandrasekhar's famous review article [7]. Many of these early papers on the theory of stochastic processes have been reprinted in [10]. Very useful historical comments can be found in the books by Nelson [54] and Mazo [52].

    1.7 Exercises

    1. Read the papers by Einstein, Ornstein-Uhlenbeck, Doob etc.

2. Write a computer program for generating the random walk in one and two dimensions. Study numerically the Brownian limit and compute the statistics of the random walk.


Chapter 2

    Elements of Probability Theory

    2.1 Introduction

In this chapter we put together some basic definitions and results from probability theory that will be used later on. In Section 2.2 we give some basic definitions from the theory of probability. In Section 2.3 we present some properties of random variables. In Section 2.4 we introduce the concept of conditional expectation and in Section 2.5 we define the characteristic function, one of the most useful tools in the study of (sums of) random variables. Some explicit calculations for the multivariate Gaussian distribution are presented in Section 2.6. Different types of convergence and the basic limit theorems of the theory of probability are discussed in Section 2.7. Discussion and bibliographical comments are presented in Section 2.8. Exercises are included in Section 2.9.

    2.2 Basic Definitions from Probability Theory

In Chapter 1 we defined a stochastic process as a dynamical system whose law of evolution is probabilistic. In order to study stochastic processes we need to be able to describe the outcome of a random experiment and to calculate functions of this outcome. First we need to describe the set of all possible experiments.

Definition 2.2.1. The set of all possible outcomes of an experiment is called the sample space and is denoted by Ω.

Example 2.2.2. The possible outcomes of the experiment of tossing a coin are H and T. The sample space is Ω = {H, T}.


The possible outcomes of the experiment of throwing a die are 1, 2, 3, 4, 5 and 6. The sample space is Ω = {1, 2, 3, 4, 5, 6}.

We define events to be subsets of the sample space. Of course, we would like the unions, intersections and complements of events to also be events. When the sample space is uncountable, then technical difficulties arise. In particular, not all subsets of the sample space need to be events. A definition of the collection of subsets of events which is appropriate for finitely additive probability is the following.

Definition 2.2.3. A collection F of subsets of Ω is called a field on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then A^c ∈ F;

iii. if A, B ∈ F then A ∪ B ∈ F.

From the definition of a field we immediately deduce that F is closed under finite unions and finite intersections:

    A_1, . . . , A_n ∈ F  ⇒  ∪_{i=1}^n A_i ∈ F, ∩_{i=1}^n A_i ∈ F.

When Ω is infinite dimensional then the above definition is not appropriate, since we need to consider countable unions of events.

Definition 2.2.4. A collection F of subsets of Ω is called a σ-field or σ-algebra on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then A^c ∈ F;

iii. if A_1, A_2, · · · ∈ F then ∪_{i=1}^∞ A_i ∈ F.

A σ-algebra is closed under the operation of taking countable intersections.

Example 2.2.5.

- F = {∅, Ω}.

- F = {∅, A, A^c, Ω} where A is a subset of Ω.

- The power set of Ω, denoted by {0, 1}^Ω, which contains all subsets of Ω.


Let F be a collection of subsets of Ω. It can be extended to a σ-algebra (take for example the power set of Ω). Consider all the σ-algebras that contain F and take their intersection, denoted by σ(F), i.e. A ∈ σ(F) if and only if it is in every σ-algebra containing F. σ(F) is a σ-algebra (see Exercise 1). It is the smallest σ-algebra containing F and it is called the σ-algebra generated by F.

Example 2.2.6. Let Ω = R^n. The σ-algebra generated by the open subsets of R^n (or, equivalently, by the open balls of R^n) is called the Borel σ-algebra of R^n and is denoted by B(R^n).

Let X be a closed subset of R^n. Similarly, we can define the Borel σ-algebra of X, denoted by B(X).

A sub-σ-algebra is a collection of subsets of a σ-algebra which satisfies the axioms of a σ-algebra.

The σ-field F of a sample space Ω contains all possible outcomes of the experiment that we want to study. Intuitively, the σ-field contains all the information about the random experiment that is available to us.

    Now we want to assign probabilities to the possible outcomes of an experiment.

Definition 2.2.7. A probability measure P on the measurable space (Ω, F) is a function P : F → [0, 1] satisfying

i. P(∅) = 0, P(Ω) = 1;

ii. for A_1, A_2, . . . with A_i ∩ A_j = ∅, i ≠ j,

    P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Definition 2.2.8. The triple (Ω, F, P), comprising a set Ω, a σ-algebra F of subsets of Ω and a probability measure P on (Ω, F), is called a probability space.

Example 2.2.9. A biased coin is tossed once: Ω = {H, T}, F = {∅, H, T, Ω} = {0, 1}^Ω, P : F → [0, 1] such that P(∅) = 0, P(H) = p ∈ [0, 1], P(T) = 1 − p, P(Ω) = 1.

Example 2.2.10. Take Ω = [0, 1], F = B([0, 1]), P = Leb([0, 1]). Then (Ω, F, P) is a probability space.


    2.2.1 Conditional Probability

One of the most important concepts in probability is that of the dependence between events.

Definition 2.2.11. A family {A_i : i ∈ I} of events is called independent if

    P(∩_{j∈J} A_j) = Π_{j∈J} P(A_j)

for all finite subsets J of I.

When two events A, B are dependent it is important to know the probability that the event A will occur, given that B has already happened. We define this to be the conditional probability, denoted by P(A|B). We know from elementary probability that

    P(A|B) = P(A ∩ B) / P(B).

A very useful result is the law of total probability.

Definition 2.2.12. A family of events {B_i : i ∈ I} is called a partition of Ω if B_i ∩ B_j = ∅ for i ≠ j, and ∪_{i∈I} B_i = Ω.

Proposition 2.2.13 (Law of total probability). For any event A and any partition {B_i : i ∈ I} we have

    P(A) = Σ_{i∈I} P(A|B_i) P(B_i).

The proof of this result is left as an exercise. In many cases the calculation of the probability of an event is simplified by choosing an appropriate partition of Ω and using the law of total probability.
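As a concrete check (a toy example of my own, not from the notes), take a fair die, partition Ω = {1, . . . , 6} by parity, and recover P(A) for the event A = {1, 2} exactly using rational arithmetic:

```python
from fractions import Fraction

# Fair die: each outcome has probability 1/6.
P = {k: Fraction(1, 6) for k in range(1, 7)}
even, odd = {2, 4, 6}, {1, 3, 5}   # a partition of the sample space
A = {1, 2}                          # the event "outcome is at most 2"

def prob(event):
    return sum(P[k] for k in event)

def cond(event, given):
    # P(A|B) = P(A and B) / P(B)
    return prob(event & given) / prob(given)

total = cond(A, even) * prob(even) + cond(A, odd) * prob(odd)
print(total == prob(A) == Fraction(1, 3))  # True
```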

Let (Ω, F, P) be a probability space and fix B ∈ F. Then P(·|B) defines a probability measure on F. Indeed, we have that

    P(∅|B) = 0, P(Ω|B) = 1

and (since A_i ∩ A_j = ∅ implies that (A_i ∩ B) ∩ (A_j ∩ B) = ∅)

    P(∪_{j=1}^∞ A_j |B) = Σ_{j=1}^∞ P(A_j |B)

for a countable family of pairwise disjoint sets {A_j}_{j=1}^∞. Consequently, (Ω, F, P(·|B)) is a probability space for every B ∈ F.


    2.3 Random Variables

We are usually interested in the consequences of the outcome of an experiment, rather than the experiment itself. A function of the outcome of an experiment is a random variable, that is, a map from Ω to R.

Definition 2.3.1. A sample space Ω equipped with a σ-field of subsets F is called a measurable space.

Definition 2.3.2. Let (Ω, F) and (E, G) be two measurable spaces. A function X : Ω → E such that the event

    {ω ∈ Ω : X(ω) ∈ A} =: {X ∈ A} (2.1)

belongs to F for arbitrary A ∈ G is called a measurable function or random variable.

When E is R equipped with its Borel σ-algebra, then (2.1) can be replaced with

    {X ≤ x} ∈ F for all x ∈ R.

Let X be a random variable (measurable function) from (Ω, F, μ) to (E, G). If E is a metric space then we may define the expectation with respect to the measure μ by

    E[X] = ∫_Ω X(ω) dμ(ω).

More generally, let f : E → R be G-measurable. Then,

    E[f(X)] = ∫_Ω f(X(ω)) dμ(ω).

Let U be a topological space. We will use the notation B(U) to denote the Borel σ-algebra of U: the smallest σ-algebra containing all open sets of U. Every random variable from a probability space (Ω, F, μ) to a measurable space (E, B(E)) induces a probability measure on E:

    μ_X(B) = P(X^{−1}(B)) = μ(ω ∈ Ω; X(ω) ∈ B), B ∈ B(E). (2.2)

The measure μ_X is called the distribution (or sometimes the law) of X.

Example 2.3.3. Let I denote a subset of the positive integers. A vector ρ_0 = {ρ_{0,i}, i ∈ I} is a distribution on I if it has nonnegative entries and its total mass equals 1: Σ_{i∈I} ρ_{0,i} = 1.


Consider the case where E = R equipped with the Borel σ-algebra. In this case a random variable is defined to be a function X : Ω → R such that

    {ω ∈ Ω : X(ω) ≤ x} ∈ F for all x ∈ R.

We can now define the probability distribution function of X, F_X : R → [0, 1], as

    F_X(x) = P({ω ∈ Ω | X(ω) ≤ x}) =: P(X ≤ x). (2.3)

In this case, (R, B(R), F_X) becomes a probability space.

The distribution function F_X(x) of a random variable has the properties that lim_{x→−∞} F_X(x) = 0, lim_{x→+∞} F_X(x) = 1, and it is right continuous.

Definition 2.3.4. A random variable X with values on R is called discrete if it takes values in some countable subset {x_0, x_1, x_2, . . . } of R, i.e. P(X = x) ≠ 0 only for x = x_0, x_1, . . . .

With a random variable we can associate the probability mass function p_k = P(X = x_k). We will consider nonnegative integer valued discrete random variables. In this case p_k = P(X = k), k = 0, 1, 2, . . . .

Example 2.3.5. The Poisson random variable is the nonnegative integer valued random variable with probability mass function

    p_k = P(X = k) = (λ^k / k!) e^{−λ}, k = 0, 1, 2, . . . ,

where λ > 0.

Example 2.3.6. The binomial random variable is the nonnegative integer valued random variable with probability mass function

    p_k = P(X = k) = N! / (k! (N − k)!) p^k q^{N−k}, k = 0, 1, 2, . . . , N,

where p ∈ (0, 1), q = 1 − p.

Definition 2.3.7. A random variable X with values on R is called continuous if P(X = x) = 0 for all x ∈ R.
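The probability mass functions of Examples 2.3.5 and 2.3.6 are easy to evaluate directly (an illustrative sketch of my own; the binomial is written in terms of k successes out of N trials, as above):

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    # p_k = lambda^k / k! * e^(-lambda)
    return lam**k / factorial(k) * exp(-lam)

def binomial_pmf(k, N, p):
    # p_k = N! / (k! (N-k)!) * p^k * q^(N-k), with q = 1 - p
    return comb(N, k) * p**k * (1 - p)**(N - k)

# Each pmf has nonnegative entries; the binomial sums to 1 exactly
# over k = 0, ..., N, and the Poisson sums to 1 over all k >= 0.
print(sum(binomial_pmf(k, 10, 0.4) for k in range(11)))
print(sum(poisson_pmf(k, 3.0) for k in range(60)))
```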

Let (Ω, F, P) be a probability space and let X : Ω → R be a random variable with distribution F_X. This is a probability measure on B(R). We will assume that it is absolutely continuous with respect to the Lebesgue measure with density ρ_X: F_X(dx) = ρ(x) dx. We will call the density ρ(x) the probability density function (PDF) of the random variable X.

  • 2.3. RANDOM VARIABLES 15

Example 2.3.8. i. The exponential random variable has PDF

    f(x) = λe^{−λx} for x > 0, and f(x) = 0 for x < 0,

with λ > 0.

ii. The uniform random variable has PDF

    f(x) = 1/(b − a) for a < x < b, and f(x) = 0 for x ∉ (a, b),

with a < b.

Definition 2.3.9. Two random variables X and Y are independent if the events {ω ∈ Ω | X(ω) ≤ x} and {ω ∈ Ω | Y(ω) ≤ y} are independent for all x, y ∈ R.

Let X, Y be two continuous random variables. We can view them as a random vector, i.e. a random variable from Ω to R². We can then define the joint distribution function

    F(x, y) = P(X ≤ x, Y ≤ y).

The mixed derivative of the distribution function, f_{X,Y}(x, y) := ∂²F/∂x∂y (x, y), if it exists, is called the joint PDF of the random vector {X, Y}:

    F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(x, y) dx dy.

If the random variables X and Y are independent, then

    F_{X,Y}(x, y) = F_X(x) F_Y(y)

and

    f_{X,Y}(x, y) = f_X(x) f_Y(y).

The joint distribution function has the properties

    F_{X,Y}(x, y) = F_{Y,X}(y, x),

    F_{X,Y}(+∞, y) = F_Y(y), f_Y(y) = ∫_{−∞}^{+∞} f_{X,Y}(x, y) dx.


We can extend the above definition to random vectors of arbitrary finite dimensions. Let X be a random variable from (Ω, F, μ) to (R^d, B(R^d)). The (joint) distribution function F_X : R^d → [0, 1] is defined as

    F_X(x) = P(X ≤ x).

Let X be a random variable in R^d with distribution function f(x_N) where x_N = {x_1, . . . , x_N}. We define the marginal or reduced distribution function f_{N−1}(x_{N−1}) by

    f_{N−1}(x_{N−1}) = ∫_R f_N(x_N) dx_N.

We can define other reduced distribution functions:

    f_{N−2}(x_{N−2}) = ∫_R f_{N−1}(x_{N−1}) dx_{N−1} = ∫_R ∫_R f(x_N) dx_{N−1} dx_N.

    2.3.1 Expectation of Random Variables

We can use the distribution of a random variable to compute expectations and probabilities:

    E[f(X)] = ∫_R f(x) dF_X(x) (2.4)

and

    P[X ∈ G] = ∫_G dF_X(x), G ∈ B(E). (2.5)

The above formulas apply to both discrete and continuous random variables, provided that we define the integrals in (2.4) and (2.5) appropriately.

When E = R^d and a PDF exists, dF_X(x) = f_X(x) dx, we have

    F_X(x) := P(X ≤ x) = ∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_d} f_X(x) dx.

When E = R^d then by L^p(Ω; R^d), or sometimes L^p(Ω; μ) or even simply L^p(μ), we mean the Banach space of measurable functions on Ω with norm

    ‖X‖_{L^p} = (E|X|^p)^{1/p}.

    Let X be a nonnegative integer valued random variable with probability massfunction pk. We can compute the expectation of an arbitrary function of X usingthe formula

    E(f(X)) =

    k=0

    f(k)pk.
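As a quick numerical illustration of this last formula (not part of the original text), the sketch below truncates the series for a hypothetical Poisson($\lambda$) mass function; the known values $\mathbb{E}X = \lambda$ and $\mathbb{E}X^2 = \lambda + \lambda^2$ serve as checks.

```python
import math

def expect_pmf(f, pmf, kmax=100):
    """Approximate E[f(X)] = sum_k f(k) p_k by truncating the series."""
    return sum(f(k) * pmf(k) for k in range(kmax + 1))

# Illustrative choice: Poisson(lam) mass function p_k = e^{-lam} lam^k / k!.
lam = 3.0
poisson_pmf = lambda k: math.exp(-lam) * lam**k / math.factorial(k)

mean = expect_pmf(lambda k: k, poisson_pmf)               # E X = lam
second_moment = expect_pmf(lambda k: k * k, poisson_pmf)  # E X^2 = lam + lam^2
```

The truncation at `kmax = 100` is harmless here because the Poisson tail decays faster than geometrically.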


Let $X, Y$ be random variables; we want to know whether they are correlated and, if they are, to calculate how correlated they are. We define the covariance of the two random variables as
\[
\mathrm{cov}(X,Y) = \mathbb{E}\big[(X - \mathbb{E}X)(Y - \mathbb{E}Y)\big] = \mathbb{E}(XY) - \mathbb{E}X\, \mathbb{E}Y.
\]
The correlation coefficient is
\[
\rho(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)}\sqrt{\mathrm{var}(Y)}}. \tag{2.6}
\]
The Cauchy-Schwarz inequality yields that $\rho(X,Y) \in [-1,1]$. We will say that two random variables $X$ and $Y$ are uncorrelated provided that $\rho(X,Y) = 0$. It is not true in general that two uncorrelated random variables are independent. This is true, however, for Gaussian random variables (see Exercise 5).

Example 2.3.10. Consider the random variable $X : \Omega \to \mathbb{R}$ with pdf
\[
\gamma_{\sigma,b}(x) := (2\pi\sigma)^{-\frac{1}{2}} \exp\left( -\frac{(x-b)^2}{2\sigma} \right).
\]
Such an $X$ is termed a Gaussian or normal random variable. The mean is
\[
\mathbb{E}X = \int_{\mathbb{R}} x\, \gamma_{\sigma,b}(x)\, dx = b
\]
and the variance is
\[
\mathbb{E}(X-b)^2 = \int_{\mathbb{R}} (x-b)^2\, \gamma_{\sigma,b}(x)\, dx = \sigma.
\]
Let $b \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ be symmetric and positive definite. The random variable $X : \Omega \to \mathbb{R}^d$ with pdf
\[
\gamma_{\Sigma,b}(x) := \big( (2\pi)^d \det \Sigma \big)^{-\frac{1}{2}} \exp\left( -\tfrac{1}{2} \langle \Sigma^{-1}(x-b), (x-b) \rangle \right)
\]
is termed a multivariate Gaussian or normal random variable. The mean is
\[
\mathbb{E}(X) = b \tag{2.7}
\]
and the covariance matrix is
\[
\mathbb{E}\big( (X-b) \otimes (X-b) \big) = \Sigma. \tag{2.8}
\]


Since the mean and variance completely specify a Gaussian random variable on $\mathbb{R}$, the Gaussian is commonly denoted by $\mathcal{N}(m, \sigma)$. The standard normal random variable is $\mathcal{N}(0,1)$. Similarly, since the mean and covariance matrix completely specify a Gaussian random variable on $\mathbb{R}^d$, the Gaussian is commonly denoted by $\mathcal{N}(m, \Sigma)$.

Some analytical calculations for Gaussian random variables will be presented in Section 2.6.
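A short numerical companion to Example 2.3.10 (an illustration, not part of the text): draw samples from $\mathcal{N}(b, \sigma)$ with Python's standard library and check the first two moments. The values of $b$ and $\sigma$ are arbitrary choices.

```python
import random, statistics

# Draw samples from N(b, sigma); note sigma denotes the VARIANCE here,
# matching the notation of Example 2.3.10, so the std deviation is sqrt(sigma).
random.seed(0)
b, sigma = 1.0, 4.0
samples = [random.gauss(b, sigma ** 0.5) for _ in range(200_000)]

sample_mean = statistics.fmean(samples)
# Population variance about the known mean b.
sample_var = statistics.pvariance(samples, mu=b)
```

With $2 \times 10^5$ samples the Monte Carlo error in the mean is of order $\sqrt{\sigma/n} \approx 0.005$, so loose tolerances suffice.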

2.4 Conditional Expectation

Assume that $X \in L^1(\Omega, \mathcal{F}, \mu)$ and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. The conditional expectation of $X$ with respect to $\mathcal{G}$ is defined to be the function (random variable) $\mathbb{E}[X | \mathcal{G}] : \Omega \to E$ which is $\mathcal{G}$-measurable and satisfies
\[
\int_G \mathbb{E}[X | \mathcal{G}]\, d\mu = \int_G X\, d\mu \qquad \forall G \in \mathcal{G}.
\]
We can define $\mathbb{E}[f(X)|\mathcal{G}]$ and the conditional probability $\mathbb{P}[X \in F | \mathcal{G}] = \mathbb{E}[I_F(X) | \mathcal{G}]$, where $I_F$ is the indicator function of $F$, in a similar manner.

We list some of the most important properties of conditional expectation.

Theorem 2.4.1 (Properties of Conditional Expectation). Let $(\Omega, \mathcal{F}, \mu)$ be a probability space and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$.

(a) If $X$ is $\mathcal{G}$-measurable and integrable then $\mathbb{E}(X|\mathcal{G}) = X$.

(b) (Linearity) If $X_1, X_2$ are integrable and $c_1, c_2$ constants, then
\[
\mathbb{E}(c_1 X_1 + c_2 X_2 | \mathcal{G}) = c_1 \mathbb{E}(X_1|\mathcal{G}) + c_2 \mathbb{E}(X_2|\mathcal{G}).
\]

(c) (Order) If $X_1, X_2$ are integrable and $X_1 \leq X_2$ a.s., then $\mathbb{E}(X_1|\mathcal{G}) \leq \mathbb{E}(X_2|\mathcal{G})$ a.s.

(d) If $Y$ and $XY$ are integrable, and $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}(XY|\mathcal{G}) = X\, \mathbb{E}(Y|\mathcal{G})$.

(e) (Successive smoothing) If $\mathcal{D}$ is a sub-$\sigma$-algebra of $\mathcal{F}$, $\mathcal{D} \subset \mathcal{G}$ and $X$ is integrable, then $\mathbb{E}(X|\mathcal{D}) = \mathbb{E}[\mathbb{E}(X|\mathcal{G})|\mathcal{D}] = \mathbb{E}[\mathbb{E}(X|\mathcal{D})|\mathcal{G}]$.


(f) (Convergence) Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables such that, for all $n$, $|X_n| \leq Z$ where $Z$ is integrable. If $X_n \to X$ a.s., then $\mathbb{E}(X_n|\mathcal{G}) \to \mathbb{E}(X|\mathcal{G})$ a.s. and in $L^1$.

Proof. See Exercise 10.

2.5 The Characteristic Function

Many of the properties of (sums of) random variables can be studied using the Fourier transform of the distribution function. Let $F(\lambda)$ be the distribution function of a (discrete or continuous) random variable $X$. The characteristic function of $X$ is defined to be the Fourier transform of the distribution function
\[
\phi(t) = \int_{\mathbb{R}} e^{it\lambda}\, dF(\lambda) = \mathbb{E}(e^{itX}). \tag{2.9}
\]
For a continuous random variable for which the distribution function $F$ has a density, $dF(\lambda) = p(\lambda)\, d\lambda$, (2.9) gives
\[
\phi(t) = \int_{\mathbb{R}} e^{it\lambda} p(\lambda)\, d\lambda.
\]
For a discrete random variable for which $\mathbb{P}(X = k) = a_k$, (2.9) gives
\[
\phi(t) = \sum_{k=0}^{\infty} e^{itk} a_k.
\]
From the properties of the Fourier transform we conclude that the characteristic function determines uniquely the distribution function of the random variable, in the sense that there is a one-to-one correspondence between $F(\lambda)$ and $\phi(t)$. Furthermore, in the exercises at the end of the chapter the reader is asked to prove the following two results.

Lemma 2.5.1. Let $X_1, X_2, \dots, X_n$ be independent random variables with characteristic functions $\phi_j(t)$, $j = 1, \dots, n$ and let $Y = \sum_{j=1}^n X_j$ with characteristic function $\phi_Y(t)$. Then
\[
\phi_Y(t) = \prod_{j=1}^n \phi_j(t).
\]

Lemma 2.5.2. Let $X$ be a random variable with characteristic function $\phi(t)$ and assume that it has finite moments. Then
\[
\mathbb{E}(X^k) = \frac{1}{i^k} \phi^{(k)}(0).
\]
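Lemma 2.5.2 is easy to check numerically. The sketch below uses the characteristic function of a Poisson($\lambda$) random variable, $\phi(t) = \exp\big(\lambda(e^{it}-1)\big)$ (a standard example, not derived in this section), and approximates $\phi^{(k)}(0)$ by finite differences.

```python
import cmath

# Characteristic function of a Poisson(lam) random variable (standard fact,
# used here only to illustrate Lemma 2.5.2).
lam = 2.5
phi = lambda t: cmath.exp(lam * (cmath.exp(1j * t) - 1))

h = 1e-5
# E X = phi'(0) / i, via a central finite difference.
first_moment = ((phi(h) - phi(-h)) / (2 * h) / 1j).real
# E X^2 = phi''(0) / i^2, via a second central difference.
second_moment = ((phi(h) - 2 * phi(0) + phi(-h)) / h**2 / (1j ** 2)).real
```

For the Poisson distribution $\mathbb{E}X = \lambda$ and $\mathbb{E}X^2 = \lambda + \lambda^2$, which the finite-difference values reproduce to several digits.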


2.6 Gaussian Random Variables

In this section we present some useful calculations for Gaussian random variables. In particular, we calculate the normalization constant, the mean and variance and the characteristic function of multidimensional Gaussian random variables.

Theorem 2.6.1. Let $b \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d\times d}$ a symmetric and positive definite matrix. Let $X$ be the multivariate Gaussian random variable with probability density function
\[
\gamma_{\Sigma,b}(x) = \frac{1}{Z} \exp\left( -\tfrac{1}{2} \langle \Sigma^{-1}(x-b), x-b \rangle \right).
\]
Then

i. The normalization constant is
\[
Z = (2\pi)^{d/2} \sqrt{\det(\Sigma)}.
\]

ii. The mean vector and covariance matrix of $X$ are given by
\[
\mathbb{E}X = b
\]
and
\[
\mathbb{E}\big( (X - \mathbb{E}X) \otimes (X - \mathbb{E}X) \big) = \Sigma.
\]

iii. The characteristic function of $X$ is
\[
\phi(t) = e^{i\langle b, t\rangle - \frac{1}{2}\langle t, \Sigma t\rangle}.
\]

Proof. i. From the spectral theorem for symmetric positive definite matrices we have that there exists a diagonal matrix $\Lambda$ with positive entries and an orthogonal matrix $B$ such that
\[
\Sigma^{-1} = B^T \Lambda^{-1} B.
\]
Let $z = x - b$ and $y = Bz$. We have
\[
\langle \Sigma^{-1} z, z \rangle = \langle B^T \Lambda^{-1} B z, z \rangle = \langle \Lambda^{-1} B z, B z \rangle = \langle \Lambda^{-1} y, y \rangle = \sum_{i=1}^d \lambda_i^{-1} y_i^2.
\]
Furthermore, we have that $\det(\Sigma^{-1}) = \prod_{i=1}^d \lambda_i^{-1}$, that $\det(\Sigma) = \prod_{i=1}^d \lambda_i$ and that the Jacobian of an orthogonal transformation is $J = \det(B) = 1$. Hence,
\[
\int_{\mathbb{R}^d} \exp\left( -\tfrac12 \langle \Sigma^{-1}(x-b), x-b \rangle \right) dx
= \int_{\mathbb{R}^d} \exp\left( -\tfrac12 \langle \Sigma^{-1} z, z \rangle \right) dz
\]
\[
= \int_{\mathbb{R}^d} \exp\left( -\tfrac12 \sum_{i=1}^d \lambda_i^{-1} y_i^2 \right) |J|\, dy
= \prod_{i=1}^d \int_{\mathbb{R}} \exp\left( -\tfrac12 \lambda_i^{-1} y_i^2 \right) dy_i
\]
\[
= (2\pi)^{d/2} \prod_{i=1}^d \lambda_i^{1/2} = (2\pi)^{d/2} \sqrt{\det(\Sigma)},
\]
from which we get that
\[
Z = (2\pi)^{d/2} \sqrt{\det(\Sigma)}.
\]
In the above calculation we have used the elementary calculus identity
\[
\int_{\mathbb{R}} e^{-\frac{\alpha x^2}{2}}\, dx = \sqrt{\frac{2\pi}{\alpha}}.
\]
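A minimal sanity check of part i in dimension $d = 2$ (not part of the text), with an illustrative $\Sigma$: a midpoint Riemann sum of the unnormalized density over a large box should reproduce $Z = 2\pi\sqrt{\det\Sigma}$.

```python
import math

# Illustrative 2x2 symmetric positive definite Sigma (hypothetical choice).
s11, s12, s22 = 2.0, 0.6, 1.0
det_sigma = s11 * s22 - s12 * s12          # det(Sigma)
# Inverse of a 2x2 matrix, written out by hand.
i11, i12, i22 = s22 / det_sigma, -s12 / det_sigma, s11 / det_sigma

# Midpoint Riemann sum of exp(-<Sigma^{-1} x, x>/2) over [-L, L]^2.
h, L = 0.05, 8.0
n = int(2 * L / h)
total = 0.0
for i in range(n):
    x1 = -L + (i + 0.5) * h
    for j in range(n):
        x2 = -L + (j + 0.5) * h
        q = i11 * x1 * x1 + 2 * i12 * x1 * x2 + i22 * x2 * x2
        total += math.exp(-0.5 * q)
numeric_Z = total * h * h
exact_Z = (2 * math.pi) * math.sqrt(det_sigma)
```

Since the Gaussian and all its derivatives are negligible at the box boundary, the midpoint rule is extremely accurate here.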

ii. From the above calculation we have that
\[
\gamma_{\Sigma,b}(x)\, dx = \gamma_{\Sigma,b}(B^T y + b)\, dy = \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \prod_{i=1}^d \exp\left( -\tfrac12 \lambda_i^{-1} y_i^2 \right) dy_i.
\]
Consequently
\[
\mathbb{E}X = \int_{\mathbb{R}^d} x\, \gamma_{\Sigma,b}(x)\, dx
= \int_{\mathbb{R}^d} (B^T y + b)\, \gamma_{\Sigma,b}(B^T y + b)\, dy
= b \int_{\mathbb{R}^d} \gamma_{\Sigma,b}(B^T y + b)\, dy = b.
\]
We note that, since $\Sigma^{-1} = B^T \Lambda^{-1} B$, we have that $\Sigma = B^T \Lambda B$. Furthermore, $z = B^T y$. We calculate
\[
\mathbb{E}\big( (X_i - b_i)(X_j - b_j) \big) = \int_{\mathbb{R}^d} z_i z_j\, \gamma_{\Sigma,b}(z+b)\, dz
\]
\[
= \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \int_{\mathbb{R}^d} \sum_k B_{ki} y_k \sum_m B_{mj} y_m \exp\left( -\tfrac12 \sum_{\ell} \lambda_\ell^{-1} y_\ell^2 \right) dy
\]
\[
= \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \sum_{k,m} B_{ki} B_{mj} \int_{\mathbb{R}^d} y_k y_m \exp\left( -\tfrac12 \sum_{\ell} \lambda_\ell^{-1} y_\ell^2 \right) dy
\]
\[
= \sum_{k,m} B_{ki} B_{mj} \lambda_k \delta_{km} = \Sigma_{ij}.
\]

iii. Let $Y$ be a multivariate Gaussian random variable with mean $0$ and covariance $I$. Let also $C = B^T \sqrt{\Lambda}$. We have that $\Sigma = CC^T$. We have that
\[
X = CY + b.
\]
To see this, we first note that $X$ is Gaussian since it is given through a linear transformation of a Gaussian random variable. Furthermore,
\[
\mathbb{E}X = b \quad \text{and} \quad \mathbb{E}\big( (X_i - b_i)(X_j - b_j) \big) = \Sigma_{ij}.
\]
Now we have:
\[
\phi(t) = \mathbb{E} e^{i\langle X, t\rangle} = e^{i\langle b,t\rangle}\, \mathbb{E} e^{i\langle CY, t\rangle}
= e^{i\langle b,t\rangle}\, \mathbb{E} e^{i\langle Y, C^T t\rangle}
= e^{i\langle b,t\rangle}\, \mathbb{E} e^{i\sum_j (C^T t)_j y_j}
\]
\[
= e^{i\langle b,t\rangle}\, e^{-\frac12 \sum_j |(C^T t)_j|^2}
= e^{i\langle b,t\rangle}\, e^{-\frac12 \langle C^T t, C^T t\rangle}
= e^{i\langle b,t\rangle}\, e^{-\frac12 \langle t, CC^T t\rangle}
= e^{i\langle b,t\rangle}\, e^{-\frac12 \langle t, \Sigma t\rangle}.
\]
Consequently,
\[
\phi(t) = e^{i\langle b,t\rangle - \frac12 \langle t, \Sigma t\rangle}.
\]
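The identity $X = CY + b$ gives a practical sampling recipe. The sketch below (an illustration, not from the text) uses the Cholesky factor of an illustrative $2\times 2$ $\Sigma$; any $C$ with $CC^T = \Sigma$ works, not only the one built from the spectral decomposition.

```python
import random, math

random.seed(1)

# Illustrative Sigma = [[2.0, 0.6], [0.6, 1.0]] and b = (1, -1).
s11, s12, s22 = 2.0, 0.6, 1.0
b1, b2 = 1.0, -1.0

# Lower-triangular Cholesky factor of the 2x2 Sigma, computed by hand;
# C C^T = Sigma.
c11 = math.sqrt(s11)
c21 = s12 / c11
c22 = math.sqrt(s22 - c21 * c21)

n = 200_000
xs = []
for _ in range(n):
    y1, y2 = random.gauss(0, 1), random.gauss(0, 1)
    # X = C Y + b, written out componentwise.
    xs.append((c11 * y1 + b1, c21 * y1 + c22 * y2 + b2))

mean1 = sum(x[0] for x in xs) / n
cov12 = sum((x[0] - b1) * (x[1] - b2) for x in xs) / n   # should be ~ 0.6
```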


2.7 Types of Convergence and Limit Theorems

One of the most important aspects of the theory of random variables is the study of limit theorems for sums of random variables. The most well known limit theorems in probability theory are the law of large numbers and the central limit theorem. There are various different types of convergence for sequences of random variables. We list the most important types of convergence below.

Definition 2.7.1. Let $\{Z_n\}_{n=1}^\infty$ be a sequence of random variables. We will say that

(a) $Z_n$ converges to $Z$ with probability one if
\[
\mathbb{P}\left( \lim_{n\to+\infty} Z_n = Z \right) = 1.
\]

(b) $Z_n$ converges to $Z$ in probability if for every $\varepsilon > 0$
\[
\lim_{n\to+\infty} \mathbb{P}(|Z_n - Z| > \varepsilon) = 0.
\]

(c) $Z_n$ converges to $Z$ in $L^p$ if
\[
\lim_{n\to+\infty} \mathbb{E}\big[ |Z_n - Z|^p \big] = 0.
\]

(d) Let $F_n(\lambda)$, $n = 1, 2, \dots$, $F(\lambda)$ be the distribution functions of $Z_n$, $n = 1, 2, \dots$ and $Z$, respectively. Then $Z_n$ converges to $Z$ in distribution if
\[
\lim_{n\to+\infty} F_n(\lambda) = F(\lambda)
\]
for all $\lambda \in \mathbb{R}$ at which $F$ is continuous.

Recall that the distribution function $F_X$ of a random variable from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to $\mathbb{R}$ induces a probability measure on $\mathbb{R}$ and that $(\mathbb{R}, \mathcal{B}(\mathbb{R}), F_X)$ is a probability space. We can show that convergence in distribution is equivalent to weak convergence of the probability measures induced by the distribution functions.

Definition 2.7.2. Let $(E, d)$ be a metric space, $\mathcal{B}(E)$ the $\sigma$-algebra of its Borel sets, $P_n$ a sequence of probability measures on $(E, \mathcal{B}(E))$ and let $C_b(E)$ denote the space of bounded continuous functions on $E$. We will say that the sequence $P_n$ converges weakly to the probability measure $P$ if, for each $f \in C_b(E)$,
\[
\lim_{n\to+\infty} \int_E f(x)\, dP_n(x) = \int_E f(x)\, dP(x).
\]


Theorem 2.7.3. Let $F_n(\lambda)$, $n = 1, 2, \dots$, $F(\lambda)$ be the distribution functions of $Z_n$, $n = 1, 2, \dots$ and $Z$, respectively. Then $Z_n$ converges to $Z$ in distribution if and only if, for all $g \in C_b(\mathbb{R})$,
\[
\lim_{n\to+\infty} \int_{\mathbb{R}} g(x)\, dF_n(x) = \int_{\mathbb{R}} g(x)\, dF(x). \tag{2.10}
\]
Notice that (2.10) is equivalent to
\[
\lim_{n\to+\infty} \mathbb{E}_n g(X_n) = \mathbb{E} g(X),
\]
where $\mathbb{E}_n$ and $\mathbb{E}$ denote the expectations with respect to $F_n$ and $F$, respectively.

When the sequence of random variables whose convergence we are interested in takes values in $\mathbb{R}^d$ or, more generally, a metric space $(E, d)$, then we can use weak convergence of the sequence of probability measures induced by the sequence of random variables to define convergence in distribution.

Definition 2.7.4. A sequence of random variables $X_n$ defined on probability spaces $(\Omega_n, \mathcal{F}_n, P_n)$ and taking values on a metric space $(E, d)$ is said to converge in distribution if the induced measures $F_n(B) = P_n(X_n \in B)$ for $B \in \mathcal{B}(E)$ converge weakly to a probability measure $P$.

Let $\{X_n\}_{n=1}^\infty$ be iid random variables with $\mathbb{E}X_n = V$. Then, the strong law of large numbers states that the average of the sum of the iid random variables converges to $V$ with probability one:
\[
\mathbb{P}\left( \lim_{N\to+\infty} \frac{1}{N} \sum_{n=1}^N X_n = V \right) = 1.
\]
The strong law of large numbers provides us with information about the behavior of a sum of random variables (or, a large number of repetitions of the same experiment) on average. We can also study fluctuations around the average behavior. Indeed, let $\mathbb{E}(X_n - V)^2 = \sigma^2$. Define the centered iid random variables $Y_n = X_n - V$. Then, the sequence of random variables $\frac{1}{\sigma\sqrt{N}} \sum_{n=1}^N Y_n$ converges in distribution to a $\mathcal{N}(0,1)$ random variable:
\[
\lim_{N\to+\infty} \mathbb{P}\left( \frac{1}{\sigma\sqrt{N}} \sum_{n=1}^N Y_n \leq a \right) = \int_{-\infty}^a \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}\, dx.
\]
This is the central limit theorem.
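Both theorems are easy to observe numerically (cf. Exercise 8). The sketch below, an illustration not in the text, uses Uniform$(0,1)$ variables (so $V = 1/2$ and $\sigma^2 = 1/12$) and checks that the sample mean approaches $1/2$ and that the normalized sums fall in $[-1.96, 1.96]$ roughly 95% of the time, as the CLT predicts.

```python
import random, math

random.seed(42)

# iid Uniform(0,1): V = 1/2, sigma^2 = 1/12.
V, sigma = 0.5, math.sqrt(1 / 12)
N = 1000          # summands per experiment
M = 2000          # number of repeated experiments

# Law of large numbers: one long average.
running_mean = sum(random.random() for _ in range(100_000)) / 100_000

# Central limit theorem: fraction of normalized sums inside [-1.96, 1.96].
inside = 0
for _ in range(M):
    s = sum(random.random() for _ in range(N))
    z = (s - N * V) / (sigma * math.sqrt(N))   # normalized fluctuation
    if abs(z) <= 1.96:
        inside += 1
coverage = inside / M    # should be close to 0.95
```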


    2.8 Discussion and Bibliography

The material of this chapter is very standard and can be found in many books on probability theory. Well known textbooks on probability theory are [4, 14, 15, 44, 45, 37, 71].

The connection between conditional expectation and orthogonal projections is discussed in [8].

The reduced distribution functions defined in Section 2.3 are used extensively in statistical mechanics. A different normalization is usually used in physics textbooks. See for instance [2, Sec. 4.2].

The calculations presented in Section 2.6 are essentially an exercise in linear algebra. See [42, Sec. 10.2].

Random variables and probability measures can also be defined in infinite dimensions. More information can be found in [?, Ch. 2].

The study of limit theorems is one of the cornerstones of probability theory and of the theory of stochastic processes. A comprehensive study of limit theorems can be found in [33].

    2.9 Exercises

1. Show that the intersection of a family of $\sigma$-algebras is a $\sigma$-algebra.

    2. Prove the law of total probability, Proposition 2.2.13.

3. Calculate the mean, variance and characteristic function of the following probability density functions.

(a) The exponential distribution with density
\[
f(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0, \\ 0 & x < 0, \end{cases}
\]
with $\lambda > 0$.

(b) The uniform distribution with density
\[
f(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b, \\ 0 & x \notin (a, b), \end{cases}
\]
with $a < b$.


(c) The Gamma distribution with density
\[
f(x) = \begin{cases} \dfrac{\lambda}{\Gamma(\alpha)} (\lambda x)^{\alpha-1} e^{-\lambda x} & x > 0, \\ 0 & x < 0, \end{cases}
\]
with $\lambda > 0$, $\alpha > 0$, where $\Gamma(\alpha)$ is the Gamma function
\[
\Gamma(\alpha) = \int_0^\infty \xi^{\alpha-1} e^{-\xi}\, d\xi, \qquad \alpha > 0.
\]

4. Let $X$ and $Y$ be independent random variables with distribution functions $F_X$ and $F_Y$. Show that the distribution function of the sum $Z = X + Y$ is the convolution of $F_X$ and $F_Y$:
\[
F_Z(x) = \int F_X(x - y)\, dF_Y(y).
\]

5. Let $X$ and $Y$ be Gaussian random variables. Show that they are uncorrelated if and only if they are independent.

6. (a) Let $X$ be a continuous random variable with characteristic function $\phi(t)$. Show that
\[
\mathbb{E}X^k = \frac{1}{i^k} \phi^{(k)}(0),
\]
where $\phi^{(k)}(t)$ denotes the $k$-th derivative of $\phi$ evaluated at $t$.

(b) Let $X$ be a nonnegative random variable with distribution function $F(x)$. Show that
\[
\mathbb{E}(X) = \int_0^{+\infty} (1 - F(x))\, dx.
\]

(c) Let $X$ be a continuous random variable with probability density function $f(x)$ and characteristic function $\phi(t)$. Find the probability density and characteristic function of the random variable $Y = aX + b$ with $a, b \in \mathbb{R}$.

(d) Let $X$ be a random variable with uniform distribution on $[0, 2\pi]$. Find the probability density of the random variable $Y = \sin(X)$.

7. Let $X$ be a discrete random variable taking values on the set of nonnegative integers with probability mass function $p_k = \mathbb{P}(X = k)$ with $p_k \geq 0$, $\sum_{k=0}^{+\infty} p_k = 1$. The generating function is defined as
\[
g(s) = \mathbb{E}(s^X) = \sum_{k=0}^{+\infty} p_k s^k.
\]


(a) Show that
\[
\mathbb{E}X = g'(1) \quad \text{and} \quad \mathbb{E}X^2 = g''(1) + g'(1),
\]
where the prime denotes differentiation.

(b) Calculate the generating function of the Poisson random variable with
\[
p_k = \mathbb{P}(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \dots \ \text{and} \ \lambda > 0.
\]

(c) Prove that the generating function of a sum of independent nonnegative integer valued random variables is the product of their generating functions.

8. Write a computer program for studying the law of large numbers and the central limit theorem. Investigate numerically the rate of convergence of these two theorems.

9. Study the properties of Gaussian measures on separable Hilbert spaces from [?, Ch. 2].

10. Prove Theorem 2.4.1.


Chapter 3

Basics of the Theory of Stochastic Processes

    3.1 Introduction

In this chapter we present some basic results from the theory of stochastic processes and we investigate the properties of some of the standard stochastic processes in continuous time. In Section 3.2 we give the definition of a stochastic process. In Section 3.3 we present some properties of stationary stochastic processes. In Section 3.4 we introduce Brownian motion and study some of its properties. Various examples of stochastic processes in continuous time are presented in Section 3.5. The Karhunen-Loeve expansion, one of the most useful tools for representing stochastic processes and random fields, is presented in Section 3.6. Further discussion and bibliographical comments are presented in Section 3.7. Section 3.8 contains exercises.

    3.2 Definition of a Stochastic Process

Stochastic processes describe dynamical systems whose evolution law is of probabilistic nature. The precise definition is given below.

Definition 3.2.1. Let $T$ be an ordered set, $(\Omega, \mathcal{F}, \mathbb{P})$ a probability space and $(E, \mathcal{G})$ a measurable space. A stochastic process is a collection of random variables $X = \{X_t;\ t \in T\}$ where, for each fixed $t \in T$, $X_t$ is a random variable from $(\Omega, \mathcal{F}, \mathbb{P})$ to $(E, \mathcal{G})$. $\Omega$ is called the sample space and $E$ is the state space of the stochastic process $X_t$.



The set $T$ can be either discrete, for example the set of positive integers $\mathbb{Z}^+$, or continuous, $T = [0, +\infty)$. The state space $E$ will usually be $\mathbb{R}^d$ equipped with the $\sigma$-algebra of Borel sets.

A stochastic process $X$ may be viewed as a function of both $t \in T$ and $\omega \in \Omega$. We will sometimes write $X(t)$, $X(t,\omega)$ or $X_t(\omega)$ instead of $X_t$. For a fixed sample point $\omega \in \Omega$, the function $X_t(\omega) : T \to E$ is called a sample path (realization, trajectory) of the process $X$.

Definition 3.2.2. The finite dimensional distributions (fdd) of a stochastic process are the distributions of the $E^k$-valued random variables $(X(t_1), X(t_2), \dots, X(t_k))$ for arbitrary positive integer $k$ and arbitrary times $t_i \in T$, $i \in \{1, \dots, k\}$:
\[
F(x) = \mathbb{P}(X(t_i) \leq x_i,\ i = 1, \dots, k)
\]
with $x = (x_1, \dots, x_k)$.

From experiments or numerical simulations we can only obtain information about the finite dimensional distributions of a process. A natural question arises: are the finite dimensional distributions of a stochastic process sufficient to determine a stochastic process uniquely? This is true for processes with continuous paths.¹ This is the class of stochastic processes that we will study in these notes.

Definition 3.2.3. We will say that two processes $X_t$ and $Y_t$ are equivalent if they have the same finite dimensional distributions.

Definition 3.2.4. A one dimensional Gaussian process is a continuous time stochastic process for which $E = \mathbb{R}$ and all the finite dimensional distributions are Gaussian, i.e. every finite dimensional vector $(X_{t_1}, X_{t_2}, \dots, X_{t_k})$ is a $\mathcal{N}(\mu_k, K_k)$ random variable for some vector $\mu_k$ and a symmetric nonnegative definite matrix $K_k$, for all $k = 1, 2, \dots$ and for all $t_1, t_2, \dots, t_k$.

From the above definition we conclude that the finite dimensional distributions of a Gaussian continuous time stochastic process are Gaussian with PDF
\[
\gamma_{\mu_k, K_k}(x) = (2\pi)^{-k/2} (\det K_k)^{-1/2} \exp\left[ -\tfrac12 \langle K_k^{-1}(x - \mu_k), x - \mu_k \rangle \right],
\]
where $x = (x_1, x_2, \dots, x_k)$.

¹In fact, what we need is for the stochastic process to be separable. See the discussion in Section 3.7.


It is straightforward to extend the above definition to arbitrary dimensions. A Gaussian process $x(t)$ is characterized by its mean
\[
m(t) := \mathbb{E}x(t)
\]
and the covariance (or autocorrelation) matrix
\[
C(t,s) = \mathbb{E}\big( (x(t) - m(t)) \otimes (x(s) - m(s)) \big).
\]
Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.

3.3 Stationary Processes

3.3.1 Strictly Stationary Processes

In many stochastic processes that appear in applications their statistics remain invariant under time translations. Such stochastic processes are called stationary. It is possible to develop a quite general theory for stochastic processes that enjoy this symmetry property.

Definition 3.3.1. A stochastic process is called (strictly) stationary if all finite dimensional distributions are invariant under time translation: for any integer $k$ and times $t_i \in T$, the distribution of $(X(t_1), X(t_2), \dots, X(t_k))$ is equal to that of $(X(s+t_1), X(s+t_2), \dots, X(s+t_k))$ for any $s$ such that $s + t_i \in T$ for all $i \in \{1, \dots, k\}$. In other words,
\[
\mathbb{P}(X_{t_1+t} \in A_1, X_{t_2+t} \in A_2, \dots, X_{t_k+t} \in A_k) = \mathbb{P}(X_{t_1} \in A_1, X_{t_2} \in A_2, \dots, X_{t_k} \in A_k), \qquad \forall t \in T.
\]

Example 3.3.2. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$. Then $X_n$ is a strictly stationary process (see Exercise 1). Assume furthermore that $\mathbb{E}Y_0 = \mu < +\infty$. Then, by the strong law of large numbers, we have that
\[
\frac{1}{N} \sum_{j=0}^{N-1} X_j = \frac{1}{N} \sum_{j=0}^{N-1} Y_j \to \mathbb{E}Y_0 = \mu,
\]
almost surely. In fact, Birkhoff's ergodic theorem states that, for any function $f$ such that $\mathbb{E}|f(Y_0)| < +\infty$, we have that
\[
\lim_{N\to+\infty} \frac{1}{N} \sum_{j=0}^{N-1} f(X_j) = \mathbb{E}f(Y_0), \tag{3.1}
\]


almost surely. The sequence of iid random variables is an example of an ergodic strictly stationary process.

Ergodic strictly stationary processes satisfy (3.1). Hence, we can calculate the statistics of a stochastic process $X_n$ using a single sample path, provided that it is long enough ($N \gg 1$).

Example 3.3.3. Let $Z$ be a random variable and define the stochastic process $X_n = Z$, $n = 0, 1, 2, \dots$. Then $X_n$ is a strictly stationary process (see Exercise 2). We can calculate the long time average of this stochastic process:
\[
\frac{1}{N} \sum_{j=0}^{N-1} X_j = \frac{1}{N} \sum_{j=0}^{N-1} Z = Z,
\]
which is independent of $N$ and does not converge to the mean of the stochastic process $\mathbb{E}X_n = \mathbb{E}Z$ (assuming that it is finite), or to any other deterministic number. This is an example of a non-ergodic process.
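The contrast between Examples 3.3.2 and 3.3.3 is easy to see numerically. In the sketch below (an illustration, with standard Gaussians as a hypothetical choice of distribution), the time average of the iid process concentrates near the mean, while the time average of the frozen process $X_n = Z$ simply reproduces the single random draw $Z$.

```python
import random

random.seed(7)
N = 100_000

# Example 3.3.2: X_n = Y_n iid (here standard Gaussians) -- ergodic;
# the time average approaches E Y_0 = 0.
iid_avg = sum(random.gauss(0, 1) for _ in range(N)) / N

# Example 3.3.3: X_n = Z for a single draw Z -- non-ergodic;
# the time average equals Z, whatever N is.
Z = random.gauss(0, 1)
frozen_avg = sum(Z for _ in range(N)) / N
```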

3.3.2 Second Order Stationary Processes

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $X_t$, $t \in T$ (with $T = \mathbb{R}$ or $\mathbb{Z}$) be a real-valued random process on this probability space with finite second moment, $\mathbb{E}|X_t|^2 < +\infty$ (i.e. $X_t \in L^2(\Omega, \mathbb{P})$ for all $t \in T$). Assume that it is strictly stationary. Then,
\[
\mathbb{E}(X_{t+s}) = \mathbb{E}X_t, \qquad s \in T, \tag{3.2}
\]
from which we conclude that $\mathbb{E}X_t$ is constant, and
\[
\mathbb{E}\big( (X_{t_1+s} - \mu)(X_{t_2+s} - \mu) \big) = \mathbb{E}\big( (X_{t_1} - \mu)(X_{t_2} - \mu) \big), \qquad s \in T, \tag{3.3}
\]
from which we conclude that the covariance or autocorrelation or correlation function $C(t,s) = \mathbb{E}((X_t - \mu)(X_s - \mu))$ depends on the difference between the two times, $t$ and $s$, i.e. $C(t,s) = C(t-s)$. This motivates the following definition.

Definition 3.3.4. A stochastic process $X_t \in L^2$ is called second-order stationary or wide-sense stationary or weakly stationary if the first moment $\mathbb{E}X_t$ is a constant and the covariance function $\mathbb{E}(X_t - \mu)(X_s - \mu)$ depends only on the difference $t - s$:
\[
\mathbb{E}X_t = \mu, \qquad \mathbb{E}\big( (X_t - \mu)(X_s - \mu) \big) = C(t-s).
\]


The constant $\mu$ is the expectation of the process $X_t$. Without loss of generality, we can set $\mu = 0$, since if $\mathbb{E}X_t = \mu$ then the process $Y_t = X_t - \mu$ is mean zero. A mean zero process will be called a centered process. The function $C(t)$ is the covariance (sometimes also called autocovariance) or the autocorrelation function of $X_t$. Notice that $C(t) = \mathbb{E}(X_t X_0)$, whereas $C(0) = \mathbb{E}(X_t^2)$, which is finite, by assumption. Since we have assumed that $X_t$ is a real valued process, we have that $C(t) = C(-t)$, $t \in \mathbb{R}$.

Remark 3.3.5. Let $X_t$ be a strictly stationary stochastic process with finite second moment (i.e. $X_t \in L^2$). The definition of strict stationarity implies that $\mathbb{E}X_t = \mu$, a constant, and $\mathbb{E}((X_t - \mu)(X_s - \mu)) = C(t-s)$. Hence, a strictly stationary process with finite second moment is also stationary in the wide sense. The converse is not true.

Example 3.3.6. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$. From Example 3.3.2 we know that this is a strictly stationary process, irrespective of whether $Y_0$ is such that $\mathbb{E}Y_0^2 < +\infty$. Assume now that $\mathbb{E}Y_0 = 0$ and $\mathbb{E}Y_0^2 = \sigma^2 < +\infty$. Then $X_n$ is a second order stationary process with mean zero and correlation function $R(k) = \sigma^2 \delta_{k0}$. Notice that in this case we have no correlation between the values of the stochastic process at different times $n$ and $k$.

Example 3.3.7. Let $Z$ be a single random variable and consider the stochastic process $X_n = Z$, $n = 0, 1, 2, \dots$. From Example 3.3.3 we know that this is a strictly stationary process irrespective of whether $\mathbb{E}|Z|^2 < +\infty$ or not. Assume now that $\mathbb{E}Z = 0$, $\mathbb{E}Z^2 = \sigma^2$. Then $X_n$ becomes a second order stationary process with $R(k) = \sigma^2$. Notice that in this case the values of our stochastic process at different times are strongly correlated.

We will see in Section 3.3.3 that for second order stationary processes, ergodicity is related to fast decay of correlations. In the first of the examples above, there was no correlation between our stochastic processes at different times and the stochastic process is ergodic. On the contrary, in our second example there is very strong correlation between the stochastic process at different times and this process is not ergodic.

Remark 3.3.8. The first two moments of a Gaussian process are sufficient for a complete characterization of the process. Consequently, a Gaussian stochastic process is strictly stationary if and only if it is weakly stationary.

Continuity properties of the covariance function are equivalent to continuity properties of the paths of $X_t$ in the $L^2$ sense, i.e.
\[
\lim_{h\to 0} \mathbb{E}|X_{t+h} - X_t|^2 = 0.
\]

Lemma 3.3.9. Assume that the covariance function $C(t)$ of a second order stationary process is continuous at $t = 0$. Then it is continuous for all $t \in \mathbb{R}$. Furthermore, the continuity of $C(t)$ is equivalent to the continuity of the process $X_t$ in the $L^2$-sense.

Proof. Fix $t \in \mathbb{R}$ and (without loss of generality) set $\mathbb{E}X_t = 0$. We calculate:
\[
|C(t+h) - C(t)|^2 = |\mathbb{E}(X_{t+h}X_0) - \mathbb{E}(X_t X_0)|^2 = |\mathbb{E}\big( (X_{t+h} - X_t)X_0 \big)|^2
\]
\[
\leq \mathbb{E}(X_0)^2\, \mathbb{E}(X_{t+h} - X_t)^2
= C(0)\big( \mathbb{E}X_{t+h}^2 + \mathbb{E}X_t^2 - 2\mathbb{E}X_t X_{t+h} \big)
= 2C(0)\big( C(0) - C(h) \big) \to 0,
\]
as $h \to 0$. Thus, continuity of $C(\cdot)$ at $0$ implies continuity for all $t$.

Assume now that $C(t)$ is continuous. From the above calculation we have
\[
\mathbb{E}|X_{t+h} - X_t|^2 = 2\big( C(0) - C(h) \big), \tag{3.4}
\]
which converges to $0$ as $h \to 0$. Conversely, assume that $X_t$ is $L^2$-continuous. Then, from the above equation we get $\lim_{h\to 0} C(h) = C(0)$.

Notice that from (3.4) we immediately conclude that $C(0) \geq C(h)$, $h \in \mathbb{R}$.

The Fourier transform of the covariance function of a second order stationary process always exists. This enables us to study second order stationary processes using tools from Fourier analysis. To make the link between second order stationary processes and Fourier analysis we will use Bochner's theorem, which applies to all nonnegative definite functions.

Definition 3.3.10. A function $f(x) : \mathbb{R} \to \mathbb{R}$ is called nonnegative definite if
\[
\sum_{i,j=1}^n f(t_i - t_j) c_i \bar{c}_j \geq 0 \tag{3.5}
\]
for all $n \in \mathbb{N}$, $t_1, \dots, t_n \in \mathbb{R}$, $c_1, \dots, c_n \in \mathbb{C}$.


Lemma 3.3.11. The covariance function of a second order stationary process is a nonnegative definite function.

Proof. We will use the notation $X_t^c := \sum_{i=1}^n X_{t_i} c_i$. We have
\[
\sum_{i,j=1}^n C(t_i - t_j) c_i \bar{c}_j = \sum_{i,j=1}^n \mathbb{E}\big( X_{t_i} X_{t_j} \big) c_i \bar{c}_j
= \mathbb{E}\left( \sum_{i=1}^n X_{t_i} c_i \sum_{j=1}^n X_{t_j} \bar{c}_j \right)
= \mathbb{E}\big( X_t^c \bar{X}_t^c \big) = \mathbb{E}|X_t^c|^2 \geq 0.
\]

Theorem 3.3.12 (Bochner). Let $C(t)$ be a continuous positive definite function. Then there exists a unique nonnegative measure $\rho$ on $\mathbb{R}$ such that $\rho(\mathbb{R}) = C(0)$ and
\[
C(t) = \int_{\mathbb{R}} e^{ixt}\, \rho(dx), \qquad t \in \mathbb{R}. \tag{3.6}
\]

Definition 3.3.13. Let $X_t$ be a second order stationary process with autocorrelation function $C(t)$ whose Fourier transform is the measure $\rho(dx)$. The measure $\rho(dx)$ is called the spectral measure of the process $X_t$.

In the following we will assume that the spectral measure is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ with density $f(x)$, i.e. $\rho(dx) = f(x)\,dx$. The Fourier transform $f(x)$ of the covariance function is called the spectral density of the process:
\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} C(t)\, dt.
\]
From (3.6) it follows that the autocorrelation function of a mean zero, second order stationary process is given by the inverse Fourier transform of the spectral density:
\[
C(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx. \tag{3.7}
\]

There are various cases where the experimentally measured quantity is the spectral density (or power spectrum) of a stationary stochastic process. Conversely, from a time series of observations of a stationary process we can calculate the autocorrelation function and, using (3.7), the spectral density.

The autocorrelation function of a second order stationary process enables us to associate a time scale to $X_t$, the correlation time $\tau_{cor}$:
\[
\tau_{cor} = \frac{1}{C(0)} \int_0^\infty C(\tau)\, d\tau = \int_0^\infty \mathbb{E}(X_\tau X_0)/\mathbb{E}(X_0^2)\, d\tau.
\]
The slower the decay of the correlation function, the larger the correlation time is. Notice that when the correlations do not decay sufficiently fast so that $C(t)$ is integrable, then the correlation time will be infinite.

Example 3.3.14. Consider a mean zero, second order stationary process with correlation function
\[
R(t) = R(0) e^{-\alpha |t|}, \tag{3.8}
\]
where $\alpha > 0$. We will write $R(0) = D/\alpha$ where $D > 0$. The spectral density of this process is:
\[
f(x) = \frac{1}{2\pi} \frac{D}{\alpha} \int_{-\infty}^{+\infty} e^{-ixt} e^{-\alpha|t|}\, dt
= \frac{1}{2\pi} \frac{D}{\alpha} \left( \int_{-\infty}^0 e^{-ixt} e^{\alpha t}\, dt + \int_0^{+\infty} e^{-ixt} e^{-\alpha t}\, dt \right)
\]
\[
= \frac{1}{2\pi} \frac{D}{\alpha} \left( \frac{1}{-ix + \alpha} + \frac{1}{ix + \alpha} \right)
= \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.
\]
This function is called the Cauchy or the Lorentz distribution. The correlation time is (we have that $R(0) = D/\alpha$)
\[
\tau_{cor} = \int_0^\infty e^{-\alpha t}\, dt = \alpha^{-1}.
\]
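The closed-form spectral density above can be checked numerically (an illustration with hypothetical values of $\alpha$ and $D$): since $R$ is even, the Fourier transform reduces to a cosine transform, which a truncated Riemann sum approximates well.

```python
import math

# Exponential correlation function R(t) = (D/alpha) e^{-alpha |t|}
# with illustrative parameter values.
alpha, D = 1.5, 2.0
R = lambda t: (D / alpha) * math.exp(-alpha * abs(t))

def spectral_density(x, T=40.0, h=0.001):
    """f(x) = (1/2pi) int e^{-itx} R(t) dt, computed as a cosine transform
    over [0, T] (R is even, so the imaginary part cancels)."""
    s = sum(math.cos(x * (k + 0.5) * h) * R((k + 0.5) * h)
            for k in range(int(T / h))) * h
    return s / math.pi          # (1/2pi) * 2 * integral over [0, T]

exact = lambda x: D / (math.pi * (x * x + alpha * alpha))
err = max(abs(spectral_density(x) - exact(x)) for x in (0.0, 0.7, 2.0))
```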

A Gaussian process with an exponential correlation function is of particular importance in the theory and applications of stochastic processes.

Definition 3.3.15. A real-valued Gaussian stationary process defined on $\mathbb{R}$ with correlation function given by (3.8) is called the (stationary) Ornstein-Uhlenbeck process.


The Ornstein-Uhlenbeck process is used as a model for the velocity of a Brownian particle. It is of interest to calculate the statistics of the position of the Brownian particle, i.e. of the integral
\[
X(t) = \int_0^t Y(s)\, ds, \tag{3.9}
\]
where $Y(t)$ denotes the stationary OU process.

Lemma 3.3.16. Let $Y(t)$ denote the stationary OU process with covariance function (3.8) and set $\alpha = D = 1$. Then the position process (3.9) is a mean zero Gaussian process with covariance function
\[
\mathbb{E}(X(t)X(s)) = 2\min(t,s) + e^{-\min(t,s)} + e^{-\max(t,s)} - e^{-|t-s|} - 1. \tag{3.10}
\]
Proof. See Exercise 8.

3.3.3 Ergodic Properties of Second-Order Stationary Processes

Second order stationary processes have nice ergodic properties, provided that the correlation between values of the process at different times decays sufficiently fast. In this case, it is possible to show that we can calculate expectations by calculating time averages. An example of such a result is the following.

Theorem 3.3.17. Let $\{X_t\}_{t\geq 0}$ be a second order stationary process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $\mu$ and covariance $R(t)$, and assume that $R(t) \in L^1(0, +\infty)$. Then
\[
\lim_{T\to+\infty} \mathbb{E}\left| \frac{1}{T} \int_0^T X(s)\, ds - \mu \right|^2 = 0. \tag{3.11}
\]
For the proof of this result we will first need an elementary lemma.

Lemma 3.3.18. Let $R(t)$ be an integrable symmetric function. Then
\[
\int_0^T \int_0^T R(t-s)\, dt\, ds = 2 \int_0^T (T-s) R(s)\, ds. \tag{3.12}
\]

Proof. We make the change of variables $u = t - s$, $v = t + s$. The domain of integration in the $t, s$ variables is $[0,T] \times [0,T]$. In the $u, v$ variables it becomes $[-T, T] \times [0, 2(T - |u|)]$. The Jacobian of the transformation is
\[
J = \frac{\partial(t,s)}{\partial(u,v)} = \frac{1}{2}.
\]


The integral becomes
\[
\int_0^T \int_0^T R(t-s)\, dt\, ds = \int_{-T}^T \int_0^{2(T-|u|)} R(u) J\, dv\, du
= \int_{-T}^T (T - |u|) R(u)\, du
= 2 \int_0^T (T - u) R(u)\, du,
\]
where the symmetry of the function $R(u)$ was used in the last step.
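Identity (3.12) is easy to verify numerically; the sketch below uses $R(t) = e^{-|t|}$ and $T = 2$, both illustrative choices, and compares midpoint Riemann sums of the two sides.

```python
import math

# Verify (3.12) for R(t) = e^{-|t|} and T = 2 (illustrative choices).
R = lambda t: math.exp(-abs(t))
T, h = 2.0, 0.01
n = int(T / h)

# Left-hand side: double midpoint Riemann sum of R(t - s) over [0, T]^2.
lhs = sum(R((i + 0.5) * h - (j + 0.5) * h)
          for i in range(n) for j in range(n)) * h * h

# Right-hand side: 2 * int_0^T (T - u) R(u) du, same discretization.
rhs = 2 * sum((T - (k + 0.5) * h) * R((k + 0.5) * h) for k in range(n)) * h
```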

Proof of Theorem 3.3.17. We use Lemma 3.3.18 to calculate:
\[
\mathbb{E}\left| \frac{1}{T}\int_0^T X_s\, ds - \mu \right|^2
= \frac{1}{T^2}\, \mathbb{E}\left| \int_0^T (X_s - \mu)\, ds \right|^2
= \frac{1}{T^2}\, \mathbb{E}\int_0^T \int_0^T (X(t)-\mu)(X(s)-\mu)\, dt\, ds
\]
\[
= \frac{1}{T^2} \int_0^T \int_0^T R(t-s)\, dt\, ds
= \frac{2}{T^2} \int_0^T (T-u) R(u)\, du
\]
\[
\leq \frac{2}{T} \int_0^{+\infty} \left| \left( 1 - \frac{u}{T} \right) R(u) \right| du
\leq \frac{2}{T} \int_0^{+\infty} |R(u)|\, du \to 0,
\]
using the dominated convergence theorem and the assumption $R(\cdot) \in L^1$.

Assume that $\mu = 0$ and define
\[
D = \int_0^{+\infty} R(t)\, dt, \tag{3.13}
\]
which, from our assumption on $R(t)$, is a finite quantity.² The above calculation suggests that, for $T \gg 1$, we have that
\[
\mathbb{E}\left( \int_0^T X(t)\, dt \right)^2 \approx 2DT.
\]
This implies that, at sufficiently long times, the mean square displacement of the integral of the ergodic second order stationary process $X_t$ scales linearly in time, with proportionality coefficient $2D$.

²Notice however that we do not know whether it is nonzero. This requires a separate argument.


Assume that $X_t$ is the velocity of a (Brownian) particle. In this case, the integral of $X_t$,
\[
Z_t = \int_0^t X_s\, ds,
\]
represents the particle position. From our calculation above we conclude that
\[
\mathbb{E}Z_t^2 = 2Dt,
\]
where
\[
D = \int_0^\infty R(t)\, dt = \int_0^\infty \mathbb{E}(X_t X_0)\, dt \tag{3.14}
\]
is the diffusion coefficient. Thus, one expects that at sufficiently long times and under appropriate assumptions on the correlation function, the time integral of a stationary process will approximate a Brownian motion with diffusion coefficient $D$. The diffusion coefficient is an example of a transport coefficient and (3.14) is an example of the Green-Kubo formula: a transport coefficient can be calculated in terms of the time integral of an appropriate autocorrelation function. In the case of the diffusion coefficient we need to calculate the integral of the velocity autocorrelation function.

Example 3.3.19. Consider the stochastic process with an exponential correlation function from Example 3.3.14, and assume that this stochastic process describes the velocity of a Brownian particle. Since $R(t) \in L^1(0,+\infty)$, Theorem 3.3.17 applies. Furthermore, the diffusion coefficient of the Brownian particle is given by
\[
\int_0^{+\infty} R(t) \, dt = R(0) \tau_c = \frac{D}{\alpha^2}.
\]
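The linear scaling $\mathbb{E} Z_t^2 \approx 2Dt$ is easy to check numerically. The following is a minimal Monte Carlo sketch, not taken from the text: we discretize a stationary Ornstein-Uhlenbeck velocity with an Euler-Maruyama scheme (the parameters $\alpha$, $D$, the step size and the use of NumPy are our own choices) and compare the mean square displacement of its time integral with the Green-Kubo prediction.

```python
# Monte Carlo sketch (not from the text): for an OU velocity process
# dX = -alpha X dt + sqrt(2 D) dW started in its stationary distribution,
# R(t) = (D/alpha) e^{-alpha |t|}, so the Green-Kubo diffusion coefficient
# is D_eff = int_0^inf R(t) dt = D / alpha^2 and E Z_T^2 ~ 2 D_eff T.
import numpy as np

rng = np.random.default_rng(0)
alpha, D = 1.0, 1.0
dt, T, n_paths = 0.01, 20.0, 2000
n_steps = int(T / dt)

X = rng.normal(0.0, np.sqrt(D / alpha), size=n_paths)  # stationary initial data
Z = np.zeros(n_paths)                                  # Z_t = int_0^t X_s ds
for _ in range(n_steps):
    Z += X * dt
    X += -alpha * X * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=n_paths)

msd = float(np.mean(Z ** 2))
green_kubo = 2.0 * (D / alpha ** 2) * T
print(msd, green_kubo)  # comparable for T >> 1
```

For finite $T$ the exact answer, $2(D/\alpha^2)\big(T - (1 - e^{-\alpha T})/\alpha\big)$, is slightly below $2 D_{\mathrm{eff}} T$, which the sampled value reflects.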

3.4 Brownian Motion

The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e. $\omega \in \Omega$ the function $X_t^{\omega}$ is a continuous function of time) process with independent Gaussian increments. A process $X_t$ has independent increments if for every sequence $t_0 < t_1 < \ldots < t_n$ the random variables
\[
X_{t_1} - X_{t_0}, \ X_{t_2} - X_{t_1}, \ \ldots, \ X_{t_n} - X_{t_{n-1}}
\]


are independent. If, furthermore, for any $t_1, t_2, s \in T$ and Borel set $B \subseteq \mathbb{R}$,
\[
\mathbb{P}(X_{t_2+s} - X_{t_1+s} \in B) = \mathbb{P}(X_{t_2} - X_{t_1} \in B),
\]
then the process $X_t$ has stationary independent increments.

Definition 3.4.1. A one dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}$ is a real valued stochastic process such that

i. $W(0) = 0$.

ii. $W(t)$ has independent increments.

iii. For every $t > s \geq 0$, $W(t) - W(s)$ has a Gaussian distribution with mean $0$ and variance $t - s$. That is, the density of the random variable $W(t) - W(s)$ is
\[
g(x; t, s) = \big( 2 \pi (t-s) \big)^{-\frac{1}{2}} \exp\left( -\frac{x^2}{2(t-s)} \right). \tag{3.15}
\]

A $d$-dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}^d$ is a collection of $d$ independent one dimensional Brownian motions:
\[
W(t) = (W_1(t), \ldots, W_d(t)),
\]
where $W_i(t)$, $i = 1, \ldots, d$, are independent one dimensional Brownian motions. The density of the Gaussian random vector $W(t) - W(s)$ is thus
\[
g(x; t, s) = \big( 2 \pi (t-s) \big)^{-d/2} \exp\left( -\frac{\|x\|^2}{2(t-s)} \right).
\]

Brownian motion is sometimes referred to as the Wiener process. Brownian motion has continuous paths. More precisely, it has a continuous modification.

Definition 3.4.2. Let $X_t$ and $Y_t$, $t \in T$, be two stochastic processes defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The process $Y_t$ is said to be a modification of $X_t$ if $\mathbb{P}(X_t = Y_t) = 1$ for all $t \in T$.

    Lemma 3.4.3. There is a continuous modification of Brownian motion.

    This follows from a theorem due to Kolmogorov.


[Figure 3.1: Brownian sample paths. The figure shows five individual sample paths $U(t)$, $t \in [0,1]$, together with the mean over 1000 paths.]

Theorem 3.4.4 (Kolmogorov). Let $X_t$, $t \in [0,\infty)$, be a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Suppose that there are positive constants $\alpha$ and $\beta$, and for each $T > 0$ there is a constant $C(T)$ such that
\[
\mathbb{E} |X_t - X_s|^{\alpha} \leq C(T) |t-s|^{1+\beta}, \qquad 0 \leq s, t \leq T. \tag{3.16}
\]
Then there exists a continuous modification $Y_t$ of the process $X_t$.

The proof of Lemma 3.4.3 is left as an exercise.

Remark 3.4.5. Equivalently, we could have defined the one dimensional standard Brownian motion as a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with continuous paths for almost all $\omega \in \Omega$, and Gaussian finite dimensional distributions with zero mean and covariance $\mathbb{E}(W_{t_i} W_{t_j}) = \min(t_i, t_j)$. One can then show that Definition 3.4.1 follows from the above definition.

It is possible to prove rigorously the existence of the Wiener process (Brownian motion):

Theorem 3.4.6 (Wiener). There exists an almost surely continuous process $W_t$ with independent increments and $W_0 = 0$, such that for each $t \geq 0$ the random variable $W_t$ is $\mathcal{N}(0, t)$. Furthermore, $W_t$ is almost surely locally Hölder continuous with exponent $\gamma$ for any $\gamma \in (0, \frac{1}{2})$.


Notice that Brownian paths are not differentiable.

We can also construct Brownian motion through the limit of an appropriately rescaled random walk: let $X_1, X_2, \ldots$ be iid random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $0$ and variance $1$. Define the discrete time stochastic process $S_n$ with $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$, $n \geq 1$. Define now a continuous time stochastic process with continuous paths as the linearly interpolated, appropriately rescaled random walk:
\[
W^n_t = \frac{1}{\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sqrt{n}} X_{[nt]+1},
\]
where $[\cdot]$ denotes the integer part of a number. Then $W^n_t$ converges weakly, as $n \to +\infty$, to a one dimensional standard Brownian motion.
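A minimal numerical sketch of this construction (our code, assuming NumPy; the choice of $\pm 1$ steps is ours): we sample the interpolated walk $W^n_t$ at a fixed time and check that its mean and variance are close to $0$ and $t$, consistent with the $\mathcal{N}(0,t)$ marginal of the limit.

```python
# Sketch of the rescaled, linearly interpolated random walk W^n_t from the text,
# with iid steps X_j = +-1 (mean 0, variance 1). For large n the marginal W^n_t
# is approximately N(0, t).
import numpy as np

def interpolated_walk(steps, t):
    """Evaluate W^n_t for one realization steps = (X_1, ..., X_n), 0 <= t <= 1."""
    n = len(steps)
    S = np.concatenate(([0.0], np.cumsum(steps)))  # S_0, S_1, ..., S_n
    k = min(int(np.floor(n * t)), n - 1)           # [nt], capped so X_{[nt]+1} exists
    return (S[k] + (n * t - k) * steps[k]) / np.sqrt(n)

rng = np.random.default_rng(1)
n, t, n_samples = 1000, 0.5, 4000
samples = np.array([interpolated_walk(rng.choice([-1.0, 1.0], size=n), t)
                    for _ in range(n_samples)])
print(samples.mean(), samples.var())  # approximately 0 and t
```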

Brownian motion is a Gaussian process. For the $d$-dimensional Brownian motion, and for $I$ the $d \times d$ identity matrix, we have (see (2.7) and (2.8))
\[
\mathbb{E} W(t) = 0 \quad \forall \, t \geq 0
\]
and
\[
\mathbb{E} \Big( (W(t) - W(s)) \otimes (W(t) - W(s)) \Big) = (t-s) I. \tag{3.17}
\]
Moreover,
\[
\mathbb{E} \big( W(t) \otimes W(s) \big) = \min(t,s) I. \tag{3.18}
\]
From the formula for the Gaussian density $g(x, t-s)$, eqn. (3.15), we immediately conclude that $W(t) - W(s)$ and $W(t+u) - W(s+u)$ have the same pdf. Consequently, Brownian motion has stationary increments. Notice, however, that Brownian motion itself is not a stationary process. Since $W(t) = W(t) - W(0)$, the pdf of $W(t)$ is
\[
g(x, t) = \frac{1}{\sqrt{2 \pi t}} e^{-x^2 / 2t}.
\]

We can easily calculate all moments of the Brownian motion:
\[
\mathbb{E}(W^n(t)) = \frac{1}{\sqrt{2 \pi t}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2t} \, dx =
\begin{cases}
1 \cdot 3 \cdots (n-1) \, t^{n/2}, & n \text{ even}, \\
0, & n \text{ odd}.
\end{cases}
\]
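The moment formula can be checked by integrating $x^n$ against the $\mathcal{N}(0,t)$ density numerically; a minimal sketch (our code, assuming NumPy; the grid width and the trapezoid rule are our choices):

```python
# Sketch: check E W(t)^n = 1*3*...*(n-1) t^{n/2} (n even), 0 (n odd), by
# integrating x^n against the N(0, t) density on a wide grid (trapezoid rule).
import numpy as np

def moment_numeric(n, t, half_width=12.0, n_points=200001):
    x = np.linspace(-half_width * np.sqrt(t), half_width * np.sqrt(t), n_points)
    g = np.exp(-x ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    y = x ** n * g
    dx = x[1] - x[0]
    return float((y.sum() - 0.5 * (y[0] + y[-1])) * dx)  # trapezoid rule

def moment_formula(n, t):
    if n % 2 == 1:
        return 0.0
    return float(np.prod(np.arange(1, n, 2, dtype=float))) * t ** (n / 2)

t = 2.0
for n in range(1, 7):
    print(n, moment_numeric(n, t), moment_formula(n, t))
```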

    Brownian motion is invariant under various transformations in time.


Theorem 3.4.7. Let $W_t$ denote a standard Brownian motion in $\mathbb{R}$. Then $W_t$ has the following properties:

i. (Rescaling.) For each $c > 0$ define $X_t = \frac{1}{\sqrt{c}} W(ct)$. Then $(X_t, t \geq 0) = (W_t, t \geq 0)$ in law.

ii. (Shifting.) For each $c > 0$, $W_{c+t} - W_c$, $t \geq 0$, is a Brownian motion which is independent of $W_u$, $u \in [0, c]$.

iii. (Time reversal.) Define $X_t = W_{1-t} - W_1$, $t \in [0,1]$. Then $(X_t, t \in [0,1]) = (W_t, t \in [0,1])$ in law.

iv. (Inversion.) Let $X_t$, $t \geq 0$, be defined by $X_0 = 0$ and $X_t = t W(1/t)$ for $t > 0$. Then $(X_t, t \geq 0) = (W_t, t \geq 0)$ in law.

We emphasize that the equivalence in the above theorem holds in law and not in a pathwise sense.

Proof. See Exercise 13.

We can also add a drift and change the diffusion coefficient of the Brownian motion: we will define a Brownian motion with drift $\mu$ and variance $\sigma^2$ as the process
\[
X_t = \mu t + \sigma W_t.
\]
The mean and variance of $X_t$ are
\[
\mathbb{E} X_t = \mu t, \qquad \mathbb{E}(X_t - \mathbb{E} X_t)^2 = \sigma^2 t.
\]
Notice that $X_t$ satisfies the equation
\[
dX_t = \mu \, dt + \sigma \, dW_t.
\]
This is the simplest example of a stochastic differential equation.

    This is the simplest example of a stochastic differential equation.We can define the OU process through the Brownian motion via a time change.

    Lemma 3.4.8. Let W (t) be a standard Brownian motion and consider the process

    V (t) = etW (e2t).

    Then V (t) is a Gaussian stationary process with mean 0 and correlation function

    R(t) = e|t|. (3.19)


For the proof of this result we first need to show that time changed Gaussian processes are also Gaussian.

Lemma 3.4.9. Let $X(t)$ be a Gaussian stochastic process and let $Y(t) = X(f(t))$, where $f(t)$ is a strictly increasing function. Then $Y(t)$ is also a Gaussian process.

Proof. We need to show that, for all positive integers $N$ and all sequences of times $\{t_1, t_2, \ldots, t_N\}$, the random vector
\[
\{Y(t_1), Y(t_2), \ldots, Y(t_N)\} \tag{3.20}
\]
is a multivariate Gaussian random variable. Since $f(t)$ is strictly increasing, it is invertible and hence there exist $s_i$, $i = 1, \ldots, N$, such that $s_i = f^{-1}(t_i)$. Thus, the random vector (3.20) can be rewritten as
\[
\{X(s_1), X(s_2), \ldots, X(s_N)\},
\]
which is Gaussian for all $N$ and all choices of times $s_1, s_2, \ldots, s_N$. Hence $Y(t)$ is also Gaussian.

Proof of Lemma 3.4.8. The fact that $V(t)$ is mean zero follows immediately from the fact that $W(t)$ is mean zero. To show that the correlation function of $V(t)$ is given by (3.19), we calculate
\[
\mathbb{E}(V(t) V(s)) = e^{-t-s} \mathbb{E}(W(e^{2t}) W(e^{2s})) = e^{-t-s} \min(e^{2t}, e^{2s}) = e^{-|t-s|}.
\]
The Gaussianity of the process $V(t)$ follows from Lemma 3.4.9 (notice that the transformation that gives $V(t)$ in terms of $W(t)$ is invertible and we can write $W(s) = s^{1/2} V(\frac{1}{2} \ln s)$).

    3.5 Other Examples of Stochastic Processes

    3.5.1 Brownian Bridge

Let $W(t)$ be a standard one dimensional Brownian motion. We define the Brownian bridge (from $0$ to $0$) to be the process
\[
B_t = W_t - t W_1, \quad t \in [0,1]. \tag{3.21}
\]


Notice that $B_0 = B_1 = 0$. Equivalently, we can define the Brownian bridge to be the continuous Gaussian process $\{B_t : 0 \leq t \leq 1\}$ such that
\[
\mathbb{E} B_t = 0, \qquad \mathbb{E}(B_t B_s) = \min(s, t) - st, \quad s, t \in [0,1]. \tag{3.22}
\]
Another, equivalent definition of the Brownian bridge is through an appropriate time change of the Brownian motion:
\[
B_t = (1-t) \, W\left( \frac{t}{1-t} \right), \quad t \in [0,1). \tag{3.23}
\]
Conversely, we can write the Brownian motion as a time change of the Brownian bridge:
\[
W_t = (t+1) \, B\left( \frac{t}{1+t} \right), \quad t \geq 0.
\]
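The pinning $B_0 = B_1 = 0$ and the covariance (3.22) can be checked by simulating definition (3.21) on a grid; a minimal sketch (our code, assuming NumPy; the discretization parameters are our choices):

```python
# Sketch: sample Brownian bridge paths via B_t = W_t - t W_1 and check the
# endpoint pinning and the covariance min(s, t) - st empirically.
import numpy as np

rng = np.random.default_rng(3)
n_steps, n_paths = 200, 20000
t = np.linspace(0.0, 1.0, n_steps + 1)

dW = np.sqrt(1.0 / n_steps) * rng.normal(size=(n_paths, n_steps))
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)])
B = W - t * W[:, [-1]]                         # B_t = W_t - t W_1

s_idx, t_idx = 50, 150                         # s = 0.25, t = 0.75
emp_cov = float(np.mean(B[:, s_idx] * B[:, t_idx]))
print(emp_cov, min(0.25, 0.75) - 0.25 * 0.75)  # empirical vs exact 0.0625
```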

    3.5.2 Fractional Brownian Motion

Definition 3.5.1. A (normalized) fractional Brownian motion $W^H_t$, $t \geq 0$, with Hurst parameter $H \in (0,1)$ is a centered Gaussian process with continuous sample paths whose covariance is given by
\[
\mathbb{E}(W^H_t W^H_s) = \frac{1}{2} \left( s^{2H} + t^{2H} - |t-s|^{2H} \right). \tag{3.24}
\]

Proposition 3.5.2. Fractional Brownian motion has the following properties.

i. When $H = \frac{1}{2}$, $W^{1/2}_t$ becomes the standard Brownian motion.

ii. $W^H_0 = 0$, $\mathbb{E} W^H_t = 0$, $\mathbb{E}(W^H_t)^2 = |t|^{2H}$, $t \geq 0$.

iii. It has stationary increments, $\mathbb{E}(W^H_t - W^H_s)^2 = |t-s|^{2H}$.

iv. It has the following self similarity property:
\[
(W^H_{\lambda t}, t \geq 0) = (\lambda^H W^H_t, t \geq 0), \quad \lambda > 0, \tag{3.25}
\]
where the equivalence is in law.

Proof. See Exercise 19.
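Because the finite dimensional distributions are Gaussian with covariance (3.24), fractional Brownian motion can be sampled exactly on a grid by factoring the covariance matrix. A minimal sketch (our code, assuming NumPy; the Cholesky approach, the small diagonal jitter and the parameters are our own choices), which also checks the stationary increment property (iii):

```python
# Sketch: exact sampling of fBM on a time grid from the covariance (3.24),
# followed by an empirical check of E (W^H_t - W^H_s)^2 = |t - s|^{2H}.
import numpy as np

def fbm_paths(H, t, n_paths, seed=4):
    """Sample fBM at the positive times t via a Cholesky factorization."""
    C = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
               - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    L = np.linalg.cholesky(C + 1e-12 * np.eye(len(t)))  # jitter for stability
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_paths, len(t))) @ L.T

H = 0.75
t = np.linspace(0.01, 1.0, 100)
X = fbm_paths(H, t, n_paths=20000)
i, j = 20, 80                                # t_i = 0.21, t_j = 0.81
emp = float(np.mean((X[:, j] - X[:, i]) ** 2))
print(emp, abs(t[j] - t[i]) ** (2 * H))      # both close to 0.6^{1.5}
```

Cholesky sampling costs $O(n^3)$ in the number of grid points; for long grids, circulant-embedding methods are the usual alternative.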


    3.5.3 The Poisson Process

Another fundamental continuous time process is the Poisson process:

Definition 3.5.3. The Poisson process with intensity $\lambda$, denoted by $N(t)$, is an integer-valued, continuous time stochastic process with independent increments satisfying
\[
\mathbb{P}[(N(t) - N(s)) = k] = \frac{e^{-\lambda (t-s)} \big( \lambda (t-s) \big)^k}{k!}, \qquad t > s \geq 0, \; k \in \mathbb{N}.
\]

The Poisson process does not have a continuous modification. See Exercise 20.
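A standard way to realize the Poisson process, not discussed in the text, is through iid exponential interarrival times with parameter $\lambda$; the sketch below (our code, assuming NumPy) checks that the count in $[0,t]$ then has mean and variance $\lambda t$, as the definition requires.

```python
# Sketch: Poisson process via iid Exp(lambda) interarrival times; the number of
# arrivals in [0, t] is then Poisson with mean (and variance) lambda * t.
import numpy as np

def poisson_counts(lam, t, n_paths, seed=5):
    rng = np.random.default_rng(seed)
    n_max = int(4 * lam * t) + 100   # far more interarrivals than ever needed
    gaps = rng.exponential(1.0 / lam, size=(n_paths, n_max))
    arrival_times = np.cumsum(gaps, axis=1)
    return np.sum(arrival_times <= t, axis=1)

lam, t = 3.0, 2.0
N = poisson_counts(lam, t, n_paths=50000)
print(N.mean(), N.var())  # both close to lam * t = 6
```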

3.6 The Karhunen-Loève Expansion

Let $f \in L^2(\Omega)$ where $\Omega$ is a subset of $\mathbb{R}^d$, and let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in $L^2(\Omega)$. Then, it is well known that $f$ can be written as a series expansion:
\[
f = \sum_{n=1}^{\infty} f_n e_n,
\]
where
\[
f_n = \int_{\Omega} f(x) e_n(x) \, dx.
\]
The convergence is in $L^2(\Omega)$:
\[
\lim_{N \to \infty} \left\| f(x) - \sum_{n=1}^{N} f_n e_n(x) \right\|_{L^2(\Omega)} = 0.
\]
It turns out that we can obtain a similar expansion for an $L^2$ mean zero process which is continuous in the $L^2$ sense:
\[
\mathbb{E} X_t^2 < +\infty, \qquad \mathbb{E} X_t = 0, \qquad \lim_{h \to 0} \mathbb{E} |X_{t+h} - X_t|^2 = 0. \tag{3.26}
\]
For simplicity we will take $T = [0,1]$. Let $R(t,s) = \mathbb{E}(X_t X_s)$ be the autocorrelation function. Notice that from (3.26) it follows that $R(t,s)$ is continuous in both $t$ and $s$ (Exercise 21).

Let us assume an expansion of the form
\[
X_t(\omega) = \sum_{n=1}^{\infty} \xi_n(\omega) e_n(t), \quad t \in [0,1], \tag{3.27}
\]


where $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis in $L^2(0,1)$. The random variables $\xi_n$ are calculated as
\[
\int_0^1 X_t e_k(t) \, dt = \int_0^1 \sum_{n=1}^{\infty} \xi_n e_n(t) e_k(t) \, dt = \sum_{n=1}^{\infty} \xi_n \delta_{nk} = \xi_k,
\]
where we assumed that we can interchange the summation and integration. We will assume that these random variables are orthogonal:
\[
\mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm},
\]
where $\{\lambda_n\}_{n=1}^{\infty}$ are positive numbers that will be determined later.

    where {n}n=1 are positive numbers that will be determined later.Assuming that an expansion of the form (3.27) exists, we can calculate

    R(t, s) = E(XtXs) = E

    ( k=1

    =1

    kek(t)e(s)

    )

    =k=1

    =1

    E (k) ek(t)e(s)

    =k=1

    kek(t)ek(s).

    Consequently, in order to the expansion (3.27) to be valid we need

    R(t, s) =

    k=1

    kek(t)ek(s). (3.28)

From equation (3.28) it follows that
\begin{align*}
\int_0^1 R(t,s) e_n(s) \, ds &= \int_0^1 \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s) e_n(s) \, ds \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t) \int_0^1 e_k(s) e_n(s) \, ds \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t) \delta_{kn} \\
&= \lambda_n e_n(t).
\end{align*}


Hence, in order for the expansion (3.27) to be valid, $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ have to be the eigenvalues and eigenfunctions of the integral operator whose kernel is the correlation function of $X_t$:
\[
\int_0^1 R(t,s) e_n(s) \, ds = \lambda_n e_n(t). \tag{3.29}
\]
Hence, in order to prove the expansion (3.27) we need to study the eigenvalue problem for the integral operator $\mathcal{R} : L^2[0,1] \to L^2[0,1]$ with kernel $R(t,s)$. It is easy to check that this operator is self-adjoint ($(\mathcal{R}f, h) = (f, \mathcal{R}h)$ for all $f, h \in L^2(0,1)$) and nonnegative ($(\mathcal{R}f, f) \geq 0$ for all $f \in L^2(0,1)$). Hence, all its eigenvalues are real and nonnegative. Furthermore, it is a compact operator (if $\{\phi_n\}_{n=1}^{\infty}$ is a bounded sequence in $L^2(0,1)$, then $\{\mathcal{R}\phi_n\}_{n=1}^{\infty}$ has a convergent subsequence). The spectral theorem for compact, self-adjoint operators implies that $\mathcal{R}$ has a countable sequence of eigenvalues tending to $0$. Furthermore, for every $f \in L^2(0,1)$ we can write
\[
f = f_0 + \sum_{n=1}^{\infty} f_n e_n(t),
\]
where $\mathcal{R} f_0 = 0$, $\{e_n(t)\}$ are the eigenfunctions of $\mathcal{R}$ corresponding to nonzero eigenvalues, and the convergence is in $L^2$. Finally, Mercer's Theorem states that for $R(t,s)$ continuous on $[0,1] \times [0,1]$, the expansion (3.28) is valid, where the series converges absolutely and uniformly.

Now we are ready to prove (3.27).

Theorem 3.6.1 (Karhunen-Loève). Let $\{X_t, t \in [0,1]\}$ be an $L^2$ process with zero mean and continuous correlation function $R(t,s)$. Let $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ be the eigenvalues and eigenfunctions of the integral operator with kernel $R(t,s)$, as in (3.29). Then
\[
X_t = \sum_{n=1}^{\infty} \xi_n e_n(t), \quad t \in [0,1], \tag{3.30}
\]
where
\[
\xi_n = \int_0^1 X_t e_n(t) \, dt, \qquad \mathbb{E} \xi_n = 0, \qquad \mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm}. \tag{3.31}
\]
The series converges in $L^2$ to $X(t)$, uniformly in $t$.

Proof. The fact that $\mathbb{E} \xi_n = 0$ follows from the fact that $X_t$ is mean zero. The orthogonality of the random variables $\{\xi_n\}_{n=1}^{\infty}$ follows from the orthogonality of the eigenfunctions of $\mathcal{R}$:
\begin{align*}
\mathbb{E}(\xi_n \xi_m) &= \mathbb{E} \int_0^1 \int_0^1 X_t X_s e_n(t) e_m(s) \, dt \, ds \\
&= \int_0^1 \int_0^1 R(t,s) e_n(t) e_m(s) \, ds \, dt \\
&= \lambda_n \int_0^1 e_n(s) e_m(s) \, ds \\
&= \lambda_n \delta_{nm}.
\end{align*}

Consider now the partial sum $S_N = \sum_{n=1}^{N} \xi_n e_n(t)$. Then
\begin{align*}
\mathbb{E} |X_t - S_N|^2 &= \mathbb{E} X_t^2 + \mathbb{E} S_N^2 - 2 \mathbb{E}(X_t S_N) \\
&= R(t,t) + \mathbb{E} \sum_{k,\ell=1}^{N} \xi_k \xi_\ell e_k(t) e_\ell(t) - 2 \mathbb{E} \left( X_t \sum_{n=1}^{N} \xi_n e_n(t) \right) \\
&= R(t,t) + \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 - 2 \sum_{k=1}^{N} \int_0^1 R(t,s) e_k(s) e_k(t) \, ds \\
&= R(t,t) - \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 \to 0,
\end{align*}
by Mercer's theorem.

Remark 3.6.2. Let $X_t$ be a Gaussian second order process with continuous covariance $R(t,s)$. Then the random variables $\{\xi_k\}_{k=1}^{\infty}$ are Gaussian, since they are defined through the time integral of a Gaussian process. Furthermore, since they are Gaussian and orthogonal, they are also independent. Hence, for Gaussian processes the Karhunen-Loève expansion becomes
\[
X_t = \sum_{k=1}^{+\infty} \sqrt{\lambda_k} \, \xi_k e_k(t), \tag{3.32}
\]
where $\{\xi_k\}_{k=1}^{\infty}$ are independent $\mathcal{N}(0,1)$ random variables.

Example 3.6.3. The Karhunen-Loève expansion for Brownian motion. The correlation function of Brownian motion is $R(t,s) = \min(t,s)$. The eigenvalue problem $\mathcal{R} \psi_n = \lambda_n \psi_n$ becomes
\[
\int_0^1 \min(t,s) \psi_n(s) \, ds = \lambda_n \psi_n(t).
\]


Let us assume that $\lambda_n \neq 0$ (it is easy to check that $0$ is not an eigenvalue). Upon setting $t = 0$ we obtain $\psi_n(0) = 0$. The eigenvalue problem can be rewritten in the form
\[
\int_0^t s \psi_n(s) \, ds + t \int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n(t).
\]
We differentiate this equation once:
\[
\int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n'(t).
\]
We set $t = 1$ in this equation to obtain the second boundary condition, $\psi_n'(1) = 0$. A second differentiation yields
\[
-\psi_n(t) = \lambda_n \psi_n''(t),
\]
where primes denote differentiation with respect to $t$. Thus, in order to calculate the eigenvalues and eigenfunctions of the integral operator whose kernel is the covariance function of Brownian motion, we need to solve the Sturm-Liouville problem
\[
-\psi_n''(t) = \frac{1}{\lambda_n} \psi_n(t), \qquad \psi(0) = \psi'(1) = 0.
\]
It is easy to check that the eigenvalues and (normalized) eigenfunctions are
\[
\psi_n(t) = \sqrt{2} \sin\left( \tfrac{1}{2} (2n-1) \pi t \right), \qquad \lambda_n = \left( \frac{2}{(2n-1)\pi} \right)^2.
\]
Thus, the Karhunen-Loève expansion of Brownian motion on $[0,1]$ is
\[
W_t = \sqrt{2} \sum_{n=1}^{\infty} \xi_n \frac{2}{(2n-1)\pi} \sin\left( \tfrac{1}{2} (2n-1) \pi t \right). \tag{3.33}
\]
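The eigenpairs above can be verified numerically: by Mercer's theorem the partial sums $\sum_{n \leq N} \lambda_n \psi_n(t) \psi_n(s)$ must converge to $\min(t,s)$. A minimal sketch (our code, assuming NumPy):

```python
# Sketch: Mercer check for the KL eigenpairs of Brownian motion; the partial
# sums of sum_n lambda_n psi_n(t) psi_n(s) should approach min(t, s).
import numpy as np

def kl_kernel(t, s, N):
    n = np.arange(1, N + 1)
    lam = (2.0 / ((2 * n - 1) * np.pi)) ** 2
    psi_t = np.sqrt(2.0) * np.sin(0.5 * (2 * n - 1) * np.pi * t)
    psi_s = np.sqrt(2.0) * np.sin(0.5 * (2 * n - 1) * np.pi * s)
    return float(np.sum(lam * psi_t * psi_s))

for (t, s) in [(0.3, 0.7), (0.5, 0.5), (0.9, 0.2)]:
    print(t, s, kl_kernel(t, s, N=20000), min(t, s))  # approximate vs exact
```

Truncating (3.33) at $N$ terms and replacing $\xi_n$ by $\mathcal{N}(0,1)$ samples gives, by the same computation, an approximate sampler for Brownian motion on $[0,1]$.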

We can use the KL expansion in order to study the $L^2$-regularity of stochastic processes. First, let $\mathcal{R}$ be a compact, symmetric positive definite operator on $L^2(0,1)$ with eigenvalues and normalized eigenfunctions $\{\lambda_k, e_k(x)\}_{k=1}^{+\infty}$, and consider a function $f \in L^2(0,1)$ with $\int_0^1 f(s) \, ds = 0$. We can define the one parameter family of Hilbert spaces $H^{\alpha}$ through the norm
\[
\|f\|_{\alpha}^2 = \|\mathcal{R}^{-\alpha/2} f\|_{L^2}^2 = \sum_k \lambda_k^{-\alpha} |f_k|^2.
\]
The inner product can be obtained through polarization. This norm enables us to measure the regularity of the function $f(t)$.³ Let $X_t$ be a mean zero second order (i.e. with finite second moment) process with continuous autocorrelation function. Define the space $\mathcal{H}^{\alpha} := L^2((\Omega, P), H^{\alpha}(0,1))$ with (semi)norm
\[
\|X_t\|_{\alpha}^2 = \mathbb{E} \|X_t\|_{H^{\alpha}}^2 = \sum_k \lambda_k^{1-\alpha}. \tag{3.34}
\]
Notice that the regularity of the stochastic process $X_t$ depends on the decay of the eigenvalues of the integral operator $\mathcal{R}$ with kernel $R(t,s)$.

As an example, consider the $L^2$-regularity of Brownian motion. From Example 3.6.3 we know that $\lambda_k \sim k^{-2}$. Consequently, from (3.34) we get that, in order for $W_t$ to be an element of the space $\mathcal{H}^{\alpha}$, we need
\[
\sum_k \lambda_k^{1-\alpha} \sim \sum_k k^{-2(1-\alpha)} < +\infty,
\]
from which we obtain that $\alpha < 1/2$. This is consistent with the Hölder continuity of Brownian motion from Theorem 3.4.6.⁴

    3.7 Discussion and Bibliography

The Ornstein-Uhlenbeck process was introduced by Ornstein and Uhlenbeck in 1930 as a model for the velocity of a Brownian particle [73].

The kind of analysis presented in Section 3.3.3 was initiated by G.I. Taylor in [72]. The proof of Bochner's theorem, Theorem 3.3.12, can be found in [39], where additional material on stationary processes can also be found. See also [36].

The spectral theorem for compact, self-adjoint operators, which was needed in the proof of the Karhunen-Loève theorem, can be found in [63]. The Karhunen-Loève expansion is also valid for random fields; see [69] and the references therein.

    3.8 Exercises

1. Let $Y_0, Y_1, \ldots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$.

³ Think of $\mathcal{R}$ as being the inverse of the Laplacian with periodic boundary conditions. In this case $H^{\alpha}$ coincides with the standard fractional Sobolev space.

⁴ Notice, however, that Wiener's theorem refers to a.s. Hölder continuity, whereas the calculation presented in this section is about $L^2$-continuity.


(a) Show that $X_n$ is a strictly stationary process.

(b) Assume that $\mathbb{E} Y_0 = \mu < +\infty$ and $\mathbb{E} Y_0^2 = \sigma^2 < +\infty$. Show that
\[
\lim_{N \to +\infty} \mathbb{E} \left| \frac{1}{N} \sum_{j=0}^{N-1} X_j - \mu \right| = 0.
\]

(c) Let $f$ be such that $\mathbb{E} f^2(Y_0) < +\infty$. Show that
\[
\lim_{N \to +\infty} \mathbb{E} \left| \frac{1}{N} \sum_{j=0}^{N-1} f(X_j) - \mathbb{E} f(Y_0) \right| = 0.
\]

2. Let $Z$ be a random variable and define the stochastic process $X_n = Z$, $n = 0, 1, 2, \ldots$. Show that $X_n$ is a strictly stationary process.

3. Let $A_0, A_1, \ldots, A_m$ and $B_0, B_1, \ldots, B_m$ be uncorrelated random variables with mean zero and variances $\mathbb{E} A_i^2 = \sigma_i^2$, $\mathbb{E} B_i^2 = \sigma_i^2$, $i = 0, \ldots, m$. Let $\omega_0, \omega_1, \ldots, \omega_m \in [0, \pi]$ be distinct frequencies and define, for $n = 0, \pm 1, \pm 2, \ldots$, the stochastic process
\[
X_n = \sum_{k=0}^{m} \big( A_k \cos(n \omega_k) + B_k \sin(n \omega_k) \big).
\]
Calculate the mean and the covariance of $X_n$. Show that it is a weakly stationary process.

4. Let $\{\xi_n : n = 0, \pm 1, \pm 2, \ldots\}$ be uncorrelated random variables with $\mathbb{E} \xi_n = \mu$, $\mathbb{E}(\xi_n - \mu)^2 = \sigma^2$, $n = 0, \pm 1, \pm 2, \ldots$. Let $a_1, a_2, \ldots$ be arbitrary real numbers and consider the stochastic process
\[
X_n = a_1 \xi_n + a_2 \xi_{n-1} + \ldots + a_m \xi_{n-m+1}.
\]

(a) Calculate the mean, variance and the covariance function of $X_n$. Show that it is a weakly stationary process.

(b) Set $a_k = 1/m$ for $k = 1, \ldots, m$. Calculate the covariance function and study the cases $m = 1$ and $m \to +\infty$.

5. Let $W(t)$ be a standard one dimensional Brownian motion. Calculate the following expectations.

(a) $\mathbb{E} e^{i W(t)}$.


(b) $\mathbb{E} e^{i (W(t) + W(s))}$, $t, s \in (0,+\infty)$.

(c) $\mathbb{E} \left( \sum_{i=1}^{n} c_i W(t_i) \right)^2$, where $c_i \in \mathbb{R}$, $i = 1, \ldots, n$, and $t_i \in (0,+\infty)$, $i = 1, \ldots, n$.

(d) $\mathbb{E} e^{i \sum_{i=1}^{n} c_i W(t_i)}$, where $c_i \in \mathbb{R}$, $i = 1, \ldots, n$, and $t_i \in (0,+\infty)$, $i = 1, \ldots, n$.

6. Let $W_t$ be a standard one dimensional Brownian motion and define
\[
B_t = W_t - t W_1, \quad t \in [0,1].
\]

(a) Show that $B_t$ is a Gaussian process with
\[
\mathbb{E} B_t = 0, \qquad \mathbb{E}(B_t B_s) = \min(t,s) - ts.
\]

(b) Show that, for $t \in [0,1)$, an equivalent definition of $B_t$ is through the formula
\[
B_t = (1-t) \, W\left( \frac{t}{1-t} \right).
\]

(c) Calculate the distribution function of $B_t$.

7. Let $X_t$ be a mean zero second order stationary process with autocorrelation function
\[
R(t) = \sum_{j=1}^{N} \frac{\lambda_j^2}{\alpha_j} e^{-\alpha_j |t|},
\]
where $\{\lambda_j, \alpha_j\}_{j=1}^{N}$ are positive real numbers.

(a) Calculate the spectral density and the correlation time of this process.

(b) Show that the assumptions of Theorem 3.3.17 are satisfied and use the argument presented in Section 3.3.3 (i.e. the Green-Kubo formula) to calculate the diffusion coefficient of the process $Z_t = \int_0^t X_s \, ds$.

(c) Under what assumptions on the coefficients $\{\lambda_j, \alpha_j\}_{j=1}^{N}$ can you study the above questions in the limit $N \to +\infty$?

    8. Prove Lemma 3.10.

9. Let $a_1, \ldots, a_n$ and $s_1, \ldots, s_n$ be positive real numbers. Calculate the mean and variance of the random variable
\[
X = \sum_{i=1}^{n} a_i W(s_i).
\]


10. Let $W(t)$ be the standard one-dimensional Brownian motion and let $\sigma, s_1, s_2 > 0$. Calculate

(a) $\mathbb{E} e^{\sigma W(t)}$.

(b) $\mathbb{E} \big( \sin(\sigma W(s_1)) \, \sin(\sigma W(s_2)) \big)$.

11. Let $W_t$ be a one dimensional Brownian motion, let $\mu, \sigma > 0$, and define
\[
S_t = e^{\mu t + \sigma W_t}.
\]

(a) Calculate the mean and the variance of $S_t$.

(b) Calculate the probability density function of $S_t$.

    12. Use Theorem 3.4.4 to prove Lemma 3.4.3.

    13. Prove Theorem 3.4.7.

    14. Use Lemma 3.4.8 to calculate the distribution function of the stationary Ornstein-Uhlenbeck process.

15. Calculate the mean and the correlation function of the integral of a standard Brownian motion,
\[
Y_t = \int_0^t W_s \, ds.
\]

16. Show that the process
\[
Y_t = \int_t^{t+1} (W_s - W_t) \, ds, \quad t \in \mathbb{R},
\]
is second order stationary.

17. Let $V_t = e^{-t} W(e^{2t})$ be the stationary Ornstein-Uhlenbeck process. Give the definition and study the main properties of the Ornstein-Uhlenbeck bridge.

18. The autocorrelation function of the velocity $Y(t)$ of a Brownian particle moving in a harmonic potential $V(x) = \frac{1}{2} \omega_0^2 x^2$ is
\[
R(t) = e^{-\gamma |t|} \left( \cos(\delta |t|) - \frac{\gamma}{\delta} \sin(\delta |t|) \right),
\]
where $\gamma$ is the friction coefficient and $\delta = \sqrt{\omega_0^2 - \gamma^2}$.


(a) Calculate the spectral density of $Y(t)$.

(b) Calculate the mean square displacement $\mathbb{E}(X(t))^2$ of the position of the Brownian particle, $X(t) = \int_0^t Y(s) \, ds$. Study the limit $t \to +\infty$.

19. Show the scaling property (3.25) of the fractional Brownian motion.

20. Use Theorem 3.4.4 to show that there does not exist a continuous modification of the Poisson process.

21. Show that the correlation function of a process $X_t$ satisfying (3.26) is continuous in both $t$ and $s$.

    22. Let Xt be a stochastic process satisfying (3.26)