Convergence of Markov Processes
Amanda Turner, University of Cambridge
Contents

1 Introduction
2 The Space DE[0,∞)
2.1 The Skorohod Topology
3 Convergence of Probability Measures
3.1 The Prohorov Metric
3.2 Examples
3.3 The Skorohod Representation
4 Convergence of Finite Dimensional Distributions
5 Relative Compactness in DE[0,∞)
5.1 Prohorov's Theorem
5.2 Compact Sets in DE[0,∞)
5.3 Some Useful Criteria
6 A Law of Large Numbers
6.1 Preliminaries
6.2 The Fluid Limit
6.3 A Brief Look at the Exit Time
7 A Central Limit Theorem
7.1 Relative Compactness
7.2 Convergence of the Finite Dimensional Distributions
8 Applications
8.1 Epidemics
8.2 Logistic Growth
9 Conclusion
1 Introduction
This essay aims to give an account of the theory and applications of the convergence of stochastic processes, and in particular Markov processes. This is developed as a generalisation of the convergence of real-valued random variables, using ideas mainly due to Prohorov and Skorohod. Sections 2 to 5 cover the general theory, which is applied in Sections 6 to 8.
For random variables taking values in R, there are a number of types of convergence, including almost sure convergence, convergence in probability and convergence in distribution. The first two depend on the random variables being defined on the same probability space and are consequently not sufficiently general. Convergence in distribution is weaker than the other two types of convergence in the sense that if Xn → X almost surely or in probability, then Xn converges to X in distribution. However, if Xn converges to X in distribution, then there exists a probability space on which are defined random variables Yn and Y with the same distributions as Xn and X such that Yn → Y almost surely, and hence in probability. In this way convergence in distribution 'incorporates' the other types of convergence.
The notion of convergence for stochastic processes, that is random variables taking values in some space of functions on [0,∞), is even less straightforward. Once again, there is almost sure convergence and convergence in probability, but one may also say that Xn converges to X if the finite-dimensional distributions converge, that is if for any choice t1, . . . , tk of times, $(X^n_{t_1}, \ldots, X^n_{t_k})$ converges in distribution to $(X_{t_1}, \ldots, X_{t_k})$. It turns out that a direct generalisation of convergence in distribution of random variables in R to stochastic processes in some sense 'incorporates' all these types of convergence.
We restrict our attention to stochastic processes whose sample paths are right continuous functions with left limits at every time point t, also known as cadlag functions. The reason for this is that most processes which arise in applications have this property, and these functions are reasonably well behaved. In order to be able to talk about almost sure convergence and convergence in probability, we need a topology on the space of cadlag functions. It turns out to be extremely difficult to construct a topology with useful properties on this space, and Section 2 contains a very technical discussion of the construction and properties of such a topology, the Skorohod topology.
For each stochastic process X, there is a unique probability measure on the space of cadlag functions that characterises the distribution of X. As this probability measure is independent of the probability space on which X is defined, it is easier to work with the probability measures to obtain results on convergence. In Section 3 we construct a metric on the space of probability measures that induces a topology equivalent to that generated by convergence in distribution of the related stochastic processes. Using this we establish an equivalence between convergence in distribution, almost sure convergence and convergence in probability.
Section 4 looks at the convergence of finite dimensional distributions. This viewpoint is used to establish a key result, due to Prohorov, that states that a sequence of stochastic processes converges if and only if it is relatively compact and the corresponding finite dimensional distributions converge. This gives the required equivalence between convergence in distribution and convergence of finite dimensional distributions.
In order to apply the result from Section 4 in any practical situation, we need to have an understanding of what it means for a sequence of stochastic processes to be relatively compact. Prohorov's Theorem, discussed in Section 5, establishes equivalent conditions in terms of the compact sets of the space of cadlag functions. Characterising these compact sets in a way that can be easily applied, and putting these results together, gives some useful necessary and sufficient conditions for a family of stochastic processes to be relatively compact.
The remainder of the essay applies the general theory that has been built up so far to various cases of interest. In Section 6 the Law of Large Numbers is generalised by showing that under certain conditions a sequence of Markov jump processes can converge to the solution of an ordinary differential equation. (The sense in which this is a generalisation of the Law of Large Numbers is explained at the beginning of the section.)
This idea is developed further in Section 7, by showing that the fluctuations about this limit converge in distribution to the solution of a stochastic differential equation, which generalises the central limit theorem. Here the results from Section 4 and the characterisation of relative compactness from Section 5 are applied to prove the convergence in distribution.
Finally, the applications of the large number and central limit results to some practical situations are discussed. Particular mention is given to the application of these limit theorems to population processes in biology.
The material in Sections 2 to 5 is broadly based on the approach of Ethier and Kurtz [4]. Sections 6 and 7 cover material from the paper of Darling and Norris [3], although the application of the theorems from Sections 4 and 5 is an extension of the results in this paper.
2 The Space DE[0,∞)
Most stochastic processes arising in applications have right and left limits at each time point for almost all sample paths. By convention we assume that the sample paths are in fact right continuous where this can be done without changing the finite-dimensional distributions. For this reason, the space of right continuous functions with left limits is of great importance, and in this section we explore its various properties and define a suitable metric on it. We conclude the section by investigating the Borel σ-algebra that results from this metric.
Although in the applications to be discussed the stochastic processes have sample paths taking values in some subset of (R^d, | · |), where possible we establish results for processes with sample paths taking values in a general metric space. Throughout this essay we shall denote this metric space by (E, r), and define q to be the metric q = r ∧ 1.
Definition 2.1. DE[0,∞) is the space of all right continuous functions x : [0,∞) → E with left limits, i.e. for each t ≥ 0, $\lim_{s \downarrow t} x(s) = x(t)$, and $\lim_{s \uparrow t} x(s) = x(t-)$ exists.
We begin with a result that shows that functions in DE[0,∞) are fairly well behaved.
Lemma 2.2. If x ∈ DE [0,∞), then x has at most countably many points of discontinuity.
Proof. The set of discontinuities of x is given by $\bigcup_{n=1}^\infty A_n$, where $A_n = \{t > 0 : r(x(t), x(t-)) > \frac{1}{n}\}$, so it is enough to show that each An is countable. Suppose we have distinct points t1, t2, . . . ∈ An with tm → t for some t, as m → ∞. By restricting to a subsequence if necessary, we may assume that either tm ↑ t or tm ↓ t. Then $\lim_{m\to\infty} x(t_m) = x(t-) = \lim_{m\to\infty} x(t_m-)$ or $\lim_{m\to\infty} x(t_m) = x(t) = \lim_{m\to\infty} x(t_m-)$, and so $r(x(t_m), x(t_m-)) < \frac{1}{n}$ for large enough m, contradicting tm ∈ An. Therefore An cannot contain any convergent sequence of distinct points. But for each T > 0, every sequence in the interval [0, T] has a convergent subsequence and so there are only a finite number of points of An in [0, T]. Hence An is countable, as required.
2.1 The Skorohod Topology
The results on convergence of probability measures that we shall prove in subsequent sections are most applicable to complete separable metric spaces. For this reason we define a metric on DE[0,∞) under which it is separable and complete if (E, r) is separable and complete. In particular DRd[0,∞) will be separable and complete.
Definition 2.3. Let Λ′ be the collection of strictly increasing functions λ mapping [0,∞) onto [0,∞) (in particular, λ(0) = 0, lim_{t→∞} λ(t) = ∞, and λ is continuous). Let Λ be the set of Lipschitz continuous functions λ ∈ Λ′ such that
$$\gamma(\lambda) = \sup_{0 \le s < t} \left| \log \frac{\lambda(t) - \lambda(s)}{t - s} \right| < \infty.$$

For x, y ∈ DE[0,∞), define

$$d(x, y) = \inf_{\lambda \in \Lambda} \left[ \gamma(\lambda) \vee \int_0^\infty e^{-u} \, d(x, y, \lambda, u) \, du \right]$$

where

$$d(x, y, \lambda, u) = \sup_{t \ge 0} q(x(t \wedge u), y(\lambda(t) \wedge u)).$$

The Skorohod Topology is the topology induced on DE[0,∞) by the metric d.
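Any single choice of λ ∈ Λ gives an upper bound d(x, y) ≤ γ(λ) ∨ ∫₀^∞ e^{−u} d(x, y, λ, u) du. The sketch below (a numerical illustration on a grid, not part of the essay; the integral is truncated to [0, 8]) evaluates this bound for two real-valued step functions, once with the identity time change and once with a λ that aligns their jumps; the second λ trades a nonzero γ(λ) for a much smaller integral term.

```python
import numpy as np

t_grid = np.linspace(0.0, 8.0, 4001)

def q(a, b):
    """q = r ∧ 1, with r the usual metric on R."""
    return np.minimum(np.abs(a - b), 1.0)

def d_lam_u(x, y, lam, u):
    """Grid approximation of d(x, y, λ, u) = sup_t q(x(t ∧ u), y(λ(t) ∧ u))."""
    return np.max(q(x(np.minimum(t_grid, u)), y(np.minimum(lam(t_grid), u))))

def upper_bound(x, y, lam, gamma_lam):
    """γ(λ) ∨ ∫ e^{-u} d(x, y, λ, u) du, integral truncated to [0, 8]."""
    us = np.linspace(0.0, 8.0, 201)
    vals = np.array([np.exp(-u) * d_lam_u(x, y, lam, u) for u in us])
    return max(gamma_lam, float(np.sum(vals) * (us[1] - us[0])))

x = lambda s: (s >= 1.0).astype(float)    # unit jump at time 1
y = lambda s: (s >= 1.25).astype(float)   # unit jump at time 1.25

print(upper_bound(x, y, lambda s: s, 0.0))                 # identity time change
print(upper_bound(x, y, lambda s: 1.25 * s, np.log(1.25)))  # jumps aligned, γ(λ) = log 1.25
```

With the identity the bound is roughly ∫₁^8 e^{−u} du ≈ 0.37, while the aligning time change reduces it to γ(λ) = log 1.25 ≈ 0.22; the true metric takes the infimum over all of Λ.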
Proposition 2.4. The function d, defined above, is a metric on DE [0,∞).
Proof. Suppose (xn)n≥1, (yn)n≥1 are sequences in DE[0,∞). Then limn→∞ d(xn, yn) = 0 if and only if there exists a sequence (λn)n≥1 in Λ such that

$$\lim_{n\to\infty} \gamma(\lambda_n) = 0 \tag{2.1}$$

and

$$\lim_{n\to\infty} \mu\{u \in [0, u_0] : d(x_n, y_n, \lambda_n, u) \ge \varepsilon\} = 0 \tag{2.2}$$

for every ε > 0 and u0 > 0, where µ is Lebesgue measure.
Now for all T > 0,

$$\begin{aligned}
T\left(e^{\gamma(\lambda)} - 1\right)
&= T\left(e^{\sup_{0 \le s < t} \left|\log \frac{\lambda(t)-\lambda(s)}{t-s}\right|} - 1\right)
= \sup_{0 \le s < t} T\left(e^{\left|\log \frac{\lambda(t)-\lambda(s)}{t-s}\right|} - 1\right) \\
&\ge \sup_{0 < t \le T} T\left(e^{\left|\log \frac{\lambda(t)}{t}\right|} - 1\right)
= \sup_{0 < t \le T} T \max\left\{ \frac{\lambda(t) - t}{t}, \frac{t - \lambda(t)}{\lambda(t)} \right\} \\
&\ge \sup_{0 < t \le T} \max\{ \lambda(t) - t, \, t - \lambda(t) \},
\end{aligned}$$

where the first inequality follows from setting s = 0 and bounding t by T, and the next line follows by considering the cases log(λ(t)/t) ≥ 0 and < 0 separately. This gives us

$$\sup_{0 \le t \le T} |\lambda(t) - t| \le T\left(e^{\gamma(\lambda)} - 1\right), \tag{2.3}$$

by which (2.1) implies

$$\lim_{n\to\infty} \sup_{0 \le t \le T} |\lambda_n(t) - t| = 0 \tag{2.4}$$

for all T > 0.
Now suppose d(x, y) = 0. Then setting xn = x and yn = y for all n ∈ N, (2.2) and (2.4) imply that x(t) = y(t) for almost all continuity points t of y. But by Lemma 2.2, y has at most countably many points of discontinuity and so x(t) = y(t) for almost all points t and, as x and y are right continuous, x = y.
Let x, y ∈ DE[0,∞). Then since λ is bijective on [0,∞),

$$\sup_{t \ge 0} q(x(t \wedge u), y(\lambda(t) \wedge u)) = \sup_{t \ge 0} q(x(\lambda^{-1}(t) \wedge u), y(t \wedge u))$$

for all λ ∈ Λ and u ≥ 0, and so d(x, y, λ, u) = d(y, x, λ⁻¹, u). Also
$$\begin{aligned}
\gamma(\lambda) &= \sup_{0 \le s < t} \left| \log \frac{\lambda(t) - \lambda(s)}{t - s} \right|
= \sup_{0 \le s < t} \left| \log \frac{\lambda(\lambda^{-1}(t)) - \lambda(\lambda^{-1}(s))}{\lambda^{-1}(t) - \lambda^{-1}(s)} \right| \\
&= \sup_{0 \le s < t} \left| \log \frac{t - s}{\lambda^{-1}(t) - \lambda^{-1}(s)} \right|
= \sup_{0 \le s < t} \left| \log \frac{\lambda^{-1}(t) - \lambda^{-1}(s)}{t - s} \right|
= \gamma(\lambda^{-1})
\end{aligned}$$

for every λ ∈ Λ, and so d(x, y) = d(y, x).
To show that d is a metric it only remains to check the triangle inequality.
Let x, y, z ∈ DE[0,∞), λ1, λ2 ∈ Λ, and u ≥ 0. Then

$$\begin{aligned}
\sup_{t \ge 0} q(x(t \wedge u), z(\lambda_2(\lambda_1(t)) \wedge u))
&\le \sup_{t \ge 0} q(x(t \wedge u), y(\lambda_1(t) \wedge u)) + \sup_{t \ge 0} q(y(\lambda_1(t) \wedge u), z(\lambda_2(\lambda_1(t)) \wedge u)) \\
&= \sup_{t \ge 0} q(x(t \wedge u), y(\lambda_1(t) \wedge u)) + \sup_{t \ge 0} q(y(t \wedge u), z(\lambda_2(t) \wedge u)),
\end{aligned}$$

i.e. d(x, z, λ2 ∘ λ1, u) ≤ d(x, y, λ1, u) + d(y, z, λ2, u). But since λ2 ∘ λ1 ∈ Λ and

$$\begin{aligned}
\gamma(\lambda_2 \circ \lambda_1)
&= \sup_{0 \le s < t} \left| \log \frac{\lambda_2(\lambda_1(t)) - \lambda_2(\lambda_1(s))}{t - s} \right| \\
&= \sup_{0 \le s < t} \left| \log \frac{\lambda_2(\lambda_1(t)) - \lambda_2(\lambda_1(s))}{\lambda_1(t) - \lambda_1(s)} + \log \frac{\lambda_1(t) - \lambda_1(s)}{t - s} \right| \\
&\le \sup_{0 \le s < t} \left| \log \frac{\lambda_2(\lambda_1(t)) - \lambda_2(\lambda_1(s))}{\lambda_1(t) - \lambda_1(s)} \right| + \sup_{0 \le s < t} \left| \log \frac{\lambda_1(t) - \lambda_1(s)}{t - s} \right|
= \gamma(\lambda_2) + \gamma(\lambda_1),
\end{aligned}$$

we obtain d(x, z) ≤ d(x, y) + d(y, z), as required.
It is not very clear from the definition of the Skorohod topology under which conditions sequences in DE[0,∞) converge. The following two propositions establish some necessary and sufficient conditions for convergence in DE[0,∞) which are slightly easier to get an intuitive grasp of.
Proposition 2.5. Let (xn)n≥1 and x be in DE[0,∞). Then limn→∞ d(xn, x) = 0 if and only if there exists a sequence (λn)n≥1 in Λ such that (2.1) holds and

$$\lim_{n\to\infty} d(x_n, x, \lambda_n, u) = 0 \quad \text{for all continuity points } u \text{ of } x. \tag{2.5}$$

In particular, limn→∞ d(xn, x) = 0 implies that limn→∞ xn(u) = limn→∞ xn(u−) = x(u) for all continuity points u of x.
Proof. By Lemma 2.2, x has only countably many discontinuity points and so, by the Reverse Fatou Lemma and (2.5),

$$\begin{aligned}
\lim_{n\to\infty} \mu\{u \in [0, u_0] : d(x_n, x, \lambda_n, u) \ge \varepsilon\}
&= \limsup_{n\to\infty} \int \mathbf{1}_{\{u \in [0, u_0] : d(x_n, x, \lambda_n, u) \ge \varepsilon\}} \, d\mu \\
&\le \int \mathbf{1}_{\{u \in [0, u_0] : \limsup_{n\to\infty} d(x_n, x, \lambda_n, u) \ge \varepsilon\}} \, d\mu \\
&\le \mu\{u \in [0, u_0] : u \text{ is a discontinuity point of } x\} = 0,
\end{aligned}$$

i.e. (2.2) holds and so the conditions are sufficient.
Conversely, suppose that limn→∞ d(xn, x) = 0 and that u is a continuity point of x. Then there exists a sequence (λn)n≥1 in Λ such that (2.1) holds, and (2.2) holds with yn = x for all n. In particular, there exists an increasing sequence (Nm)m≥1 such that for all n ≥ Nm

$$\mu\left\{ v \in (u, u+1] : d(x_n, x, \lambda_n, v) < \frac{1}{m} \right\} > 0. \tag{2.6}$$

Hence, for each Nm ≤ n < Nm+1, there exists un ∈ (u, u + 1] such that d(xn, x, λn, un) < 1/m. By picking arbitrary values of un ∈ (u, u + 1] for n < N1, we obtain

$$\lim_{n\to\infty} \sup_{t \ge 0} q(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) = \lim_{n\to\infty} d(x_n, x, \lambda_n, u_n) = 0. \tag{2.7}$$
Now

$$\begin{aligned}
d(x_n, x, \lambda_n, u) &= \sup_{t \ge 0} q(x_n(t \wedge u), x(\lambda_n(t) \wedge u)) \\
&\le \sup_{t \ge 0} q(x_n(t \wedge u), x(\lambda_n(t \wedge u) \wedge u_n)) + \sup_{t \ge 0} q(x(\lambda_n(t \wedge u) \wedge u_n), x(\lambda_n(t) \wedge u)).
\end{aligned}$$

But

$$\begin{aligned}
\sup_{t \ge 0} q(x(\lambda_n(t \wedge u) \wedge u_n), x(\lambda_n(t) \wedge u))
&= \sup_{0 \le t \le u} q(x(\lambda_n(t \wedge u) \wedge u_n), x(\lambda_n(t) \wedge u)) \vee \sup_{t > u} q(x(\lambda_n(t \wedge u) \wedge u_n), x(\lambda_n(t) \wedge u)) \\
&= \sup_{0 \le t \le u} q(x(\lambda_n(t) \wedge u_n), x(\lambda_n(t) \wedge u)) \vee \sup_{t > u} q(x(\lambda_n(u) \wedge u_n), x(\lambda_n(t) \wedge u)) \\
&= \sup_{u \le s \le \lambda_n(u) \vee u} q(x(s), x(u)) \vee \sup_{\lambda_n(u) \wedge u < s \le u} q(x(\lambda_n(u) \wedge u_n), x(s)),
\end{aligned}$$

where the third equality is obtained by setting s = λn(t) ∧ un in the first term and s = λn(t) ∧ u in the second term. Hence

$$d(x_n, x, \lambda_n, u) \le \sup_{0 \le t \le u} q(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) + \sup_{u \le s \le \lambda_n(u) \vee u} q(x(s), x(u)) \vee \sup_{\lambda_n(u) \wedge u < s \le u} q(x(\lambda_n(u) \wedge u_n), x(s)) \tag{2.8}$$

for each n. Thus limn→∞ d(xn, x, λn, u) = 0 by (2.7), (2.4), and the continuity of x at u.
Proposition 2.6. Let (xn)n≥1 and x be in DE[0,∞). Then limn→∞ d(xn, x) = 0 if and only if there exists a sequence (λn)n≥1 in Λ such that (2.1) holds and

$$\lim_{n\to\infty} \sup_{0 \le t \le T} r(x_n(t), x(\lambda_n(t))) = 0 \tag{2.9}$$

for all T > 0.
Remark 2.7. The above proposition is equivalent to one with (2.9) replaced by

$$\lim_{n\to\infty} \sup_{0 \le t \le T} r(x_n(\lambda_n(t)), x(t)) = 0. \tag{2.10}$$
Proof. Suppose limn→∞ d(xn, x) = 0. Then there exists a sequence (λn)n≥1 in Λ such that (2.1) holds and (2.2) holds with yn = x for all n. In particular, by (2.6) with u = m, there exists a sequence (un)n≥1 ⊂ (0,∞) with un → ∞ and d(xn, x, λn, un) → 0, i.e.

$$\lim_{n\to\infty} \sup_{t \ge 0} r(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) = 0.$$

But given T > 0, un ≥ T ∨ λn(T) for sufficiently large n (using (2.4)) and so the above equation implies (2.9).
Conversely, suppose there exists a sequence (λn)n≥1 in Λ such that (2.1) and (2.9) hold. Then for every continuity point u of x,

$$\sup_{0 \le t \le u} q(x_n(t \wedge u_n), x(\lambda_n(t) \wedge u_n)) = \sup_{0 \le t \le u} q(x_n(t), x(\lambda_n(t))) \le \sup_{0 \le t \le u} r(x_n(t), x(\lambda_n(t)))$$

for all n large enough that un > λn(u) ∨ u. And so, by (2.8) and the right continuity of x, (2.5) holds. The result follows by Proposition 2.5.
Two simple examples of sequences that converge in (DE[0,∞), d) may give some insight into why the metric d is defined as above:
Example 2.8. Suppose (xn)n≥1 is a sequence in (DE[0,∞), d) such that xn → x locally uniformly for some x ∈ (DE[0,∞), d). Then limn→∞ sup_{0≤t≤T} r(xn(t), x(t)) = 0 and so, taking λn(t) = t for all n in Proposition 2.6, xn → x with respect to d. Hence the Skorohod topology is weaker than the (locally) uniform topology.
Example 2.9. Define a sequence (xn)n≥1 in (DE[0,∞), d) by xn(s) = αn 1_{tn ≤ s}, where αn ∈ E and tn ∈ [0,∞). Suppose that tn → t and αn → α for some t ∈ [0,∞) and α ∈ E. Intuitively one would expect xn → x where x(s) = α 1_{t ≤ s}. However, in the locally uniform topology this is not always the case; for example if t < tn for all n and α ≠ 0, then xn(t) = 0 for all n, whereas x(t) = α. In other words, the locally uniform topology is too strong. The functions λn are introduced to allow small perturbations around the points of discontinuity of x. In this example, if t ≠ 0, then tn > 0 for sufficiently large n and so we may set λn(s) = (t/tn)s. Then

$$\lim_{n\to\infty} \gamma(\lambda_n) = \lim_{n\to\infty} \left| \log\left(\frac{t}{t_n}\right) \right| = 0$$

and

$$\sup_{0 \le s \le T} r(x_n(s), x(\lambda_n(s))) = \sup_{0 \le s \le T} r(\alpha_n 1_{\{t_n \le s\}}, \alpha 1_{\{t_n \le s\}}) \to 0.$$

By Proposition 2.6, xn → x, as intuitively expected.
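Example 2.9 can be checked numerically. The sketch below (an illustration of the example on a time grid, not part of the essay) takes E = R, α = 2, t = 1 and tn = t + 1/n: the locally uniform distance between xn and x stays at α, while after the time change λn(s) = (t/tn)s the distance is 0 and γ(λn) → 0.

```python
import numpy as np

# x has a jump of size alpha at time t; x_n has the same jump at t_n = t + 1/n
# (values of n are chosen so that t_n avoids the grid points).
t, alpha = 1.0, 2.0
s = np.linspace(0.0, 3.0, 3001)

def x(u):
    return alpha * (u >= t)

for n in [3, 7, 77]:
    tn = t + 1.0 / n
    xn = alpha * (s >= tn)
    lam = (t / tn) * s                  # lambda_n(s) = (t / t_n) s
    uniform_dist = np.max(np.abs(xn - x(s)))
    skorohod_dist = np.max(np.abs(xn - x(lam)))
    gamma = abs(np.log(t / tn))         # gamma(lambda_n) = |log(t / t_n)|
    print(n, uniform_dist, skorohod_dist, round(gamma, 4))
# the uniform distance stays at alpha = 2.0, while the time-changed distance
# is 0 and gamma(lambda_n) -> 0, so the sequence converges in the Skorohod sense
```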
We are now ready to prove that (DE[0,∞), d) has the required properties. Note that while separability is a topological property, completeness is a property of the metric.

Theorem 2.10. If E is separable, then DE[0,∞) is separable. If the metric space (E, r) is complete, then (DE[0,∞), d) is complete.
Proof. Since E is separable, there exists a countable dense subset {α1, α2, . . .} ⊂ E. Let Γ be the countable collection of elements of DE[0,∞) of the form

$$y(t) = \begin{cases} \alpha_{i_k}, & t_{k-1} \le t < t_k, \; k = 1, \ldots, n, \\ \alpha_{i_n}, & t \ge t_n, \end{cases} \tag{2.11}$$

where 0 = t0 < t1 < · · · < tn are rationals, i1, . . . , in are positive integers and n ≥ 1. We shall show that Γ is dense in DE[0,∞). Given ε > 0, let x ∈ DE[0,∞) and let T ∈ N be sufficiently large that e^{−T} < ε/2. By Lemma 2.2, x has only finitely many points of discontinuity in the interval (0, T), at s1, . . . , sm say, where 0 = s0 < s1 < · · · < sm < sm+1 = T. Since x has left limits, for each j = 0, . . . , m, x is uniformly continuous on the interval [sj, sj+1) and so there exists some 0 < δj < sj+1 − sj such that if s, t ∈ [sj, sj+1) and |s − t| < δj, then r(x(s), x(t)) < ε/4.

Let n ≥ 3T/(δ0 ∧ · · · ∧ δm) and let tk = kT/n for k = 0, . . . , n. For each k = 0, . . . , n − 1, there exists some positive integer ik such that r(α_{ik}, x(tk)) < ε/4. Define y as in (2.11). For each j = 1, . . . , m with nsj ∉ N, there exists some kj such that sj ∈ (t_{kj}, t_{kj+1}). Let

$$t'_{k_j} = \frac{s_j - e^{-\varepsilon} t_{k_j+1}}{1 - e^{-\varepsilon}} \in (t_{k_j-1}, s_j) \quad\text{and}\quad t'_{k_j+1} = \frac{s_j - e^{\varepsilon} t_{k_j+1}}{1 - e^{\varepsilon}} \in (t_{k_j+1}, t_{k_j+2}).$$

Note that, by the definition of n, |kj − ki| ≥ 3 for all i ≠ j and so the t'_{kj} are strictly increasing. Let λ ∈ Λ be the strictly increasing piecewise linear function joining (t'_{kj}, t'_{kj}) to (t_{kj+1}, sj) to (t'_{kj+1}, t'_{kj+1}) for those j = 1, . . . , m for which nsj ∉ N, with gradient 1 otherwise. The two nonunit gradients are exactly e^{−ε} and e^{ε}, and so

$$\gamma(\lambda) \le \max_j \left( \left|\log \frac{t'_{k_j+1} - s_j}{t'_{k_j+1} - t_{k_j+1}}\right| \vee \left|\log \frac{s_j - t'_{k_j}}{t_{k_j+1} - t'_{k_j}}\right| \right) = \varepsilon,$$

and, if t < T, then r(x(t), y(λ(t))) ≤ r(x(t), x(tk)) + r(x(tk), α_{ik}) < ε/2, for some k with 0 ≤ t − tk < δ. Hence, by the definition of d,

$$d(x, y) \le \varepsilon \vee \left( \frac{\varepsilon}{2} + e^{-T} \right) \le \varepsilon,$$

and so Γ is dense in DE[0,∞).
To prove completeness, suppose that (xn)n≥1 is a Cauchy sequence in DE[0,∞). There exist 1 ≤ N1 < N2 < · · · such that m, n ≥ Nk implies

$$d(x_n, x_m) \le 2^{-k-1} e^{-k}.$$

Then if yk = x_{Nk} for k = 1, 2, . . ., there exist uk > k and λk ∈ Λ such that

$$\gamma(\lambda_k) \vee d(y_k, y_{k+1}, \lambda_k, u_k) \le 2^{-k}.$$
Let µn,k = λ_{k+n} ∘ · · · ∘ λk. Then $\gamma(\mu_{n,k}) \le \sum_{j=k}^{k+n} \gamma(\lambda_j) \le 2^{-k+1}$. By a similar argument to that used to prove (2.3), for any µ, λ ∈ Λ,

$$e^{\gamma(\mu)} \ge \sup_{t \ne \lambda(t)} \left| \frac{\mu(\lambda(t)) - \mu(t)}{\lambda(t) - t} \right|,$$

and so

$$\sup_{0 \le t \le T} |\mu(\lambda(t)) - \mu(t)| \le \sup_{0 \le t \le T} |\lambda(t) - t| \, e^{\gamma(\mu)}.$$

Hence, if n ≥ m, then

$$\begin{aligned}
\sup_{0 \le t \le T} |\mu_{n,k}(t) - \mu_{m,k}(t)| &\le \sup_{0 \le t \le T} |\mu_{n-m-1,\,k+m+1}(t) - t| \, e^{\gamma(\mu_{m,k})} \\
&\le T\left(e^{\gamma(\mu_{n-m-1,\,k+m+1})} - 1\right) e^{2^{-k+1}} \\
&\le T\left(e^{2^{-k-m}} - 1\right) e^{2^{-k+1}} \to 0
\end{aligned}$$
as m → ∞, where the second inequality follows by (2.3). Hence µn,k converges uniformly on compact intervals to a strictly increasing, Lipschitz continuous function µk with γ(µk) ≤ 2^{−k+1}, i.e. µk ∈ Λ. Now
$$\begin{aligned}
\sup_{t \ge 0} q(y_k(\mu_k^{-1}(t) \wedge u_k), y_{k+1}(\mu_{k+1}^{-1}(t) \wedge u_k))
&= \sup_{t \ge 0} q(y_k(\mu_k^{-1}(t) \wedge u_k), y_{k+1}(\lambda_k(\mu_k^{-1}(t)) \wedge u_k)) \\
&= \sup_{t \ge 0} q(y_k(t \wedge u_k), y_{k+1}(\lambda_k(t) \wedge u_k)) \\
&= d(y_k, y_{k+1}, \lambda_k, u_k) \le 2^{-k}
\end{aligned}$$

for k = 1, 2, . . .. Since (E, r) is complete, zk = yk ∘ µk⁻¹ ∈ DE[0,∞) converges uniformly on bounded intervals to a function y : [0,∞) → E. As each zk ∈ DE[0,∞), y ∈ DE[0,∞) (taking locally uniform limits preserves right continuity and the existence of left limits). Now limk→∞ γ(µk⁻¹) = limk→∞ γ(µk) = 0 and
$$\lim_{k\to\infty} \sup_{0 \le t \le T} r(y_k(\mu_k^{-1}(t)), y(t)) = \lim_{k\to\infty} \sup_{0 \le t \le T} r(z_k(t), y(t)) = 0$$

for all T > 0 and so, by Proposition 2.6 (using Remark 2.7), limk→∞ d(yk, y) = 0. Since (xn)n≥1 is Cauchy and the subsequence (yk)k≥1 converges, the whole sequence converges. Hence (DE[0,∞), d) is complete.
In order to study Borel probability measures on DE[0,∞) it is important to know more about SE, the Borel σ-algebra of DE[0,∞). The following result states that SE is just the σ-algebra generated by the coordinate random variables.
Proposition 2.11. For each t ≥ 0, define πt : DE[0,∞) → E by πt(x) = x(t). Then

$$\mathcal{S}_E \supset \mathcal{S}'_E = \sigma(\pi_t : 0 \le t < \infty).$$

If E is separable, then SE = S′E.
Proof. For each ε > 0, t ≥ 0, and f ∈ C(E), the space of real-valued bounded continuous functions, define

$$f_t^\varepsilon(x) = \frac{1}{\varepsilon} \int_t^{t+\varepsilon} f(\pi_s(x)) \, ds.$$
Now suppose (xn)n≥1 is a sequence in DE[0,∞) converging to x, so that there exists a sequence (λn)n≥1 in Λ such that (2.1) and (2.10) hold. Then

$$f_t^\varepsilon(x_n) = \frac{1}{\varepsilon} \int f(x_n(\lambda_n(s))) \, 1_{\{\lambda_n(t) \le s \le \lambda_n(t+\varepsilon)\}} \, \lambda_n'(s) \, ds \to \frac{1}{\varepsilon} \int f(x(s)) \, 1_{\{t \le s \le t+\varepsilon\}} \, ds = f_t^\varepsilon(x),$$

by dominated convergence, since xn(λn(s)) → x(s) uniformly on bounded intervals, f is bounded and continuous, and λn(s) → s uniformly on bounded intervals, implying λ'n(s) → 1 almost everywhere. Hence f^ε_t is a continuous function on DE[0,∞) and so is Borel measurable. As lim_{ε→0} f^ε_t(x) = f(πt(x)) for every x ∈ DE[0,∞), by the right continuity of x, f ∘ πt is Borel measurable for every f ∈ C(E) and hence every bounded measurable function f. In particular,

$$\pi_t^{-1}(\Gamma) = \{x \in D_E[0,\infty) : 1_\Gamma(\pi_t(x)) = 1\} \in \mathcal{S}_E$$

for all Borel subsets Γ ⊂ E, and hence SE ⊃ S′E.
Assume now that E is separable. Let n ≥ 1, let 0 = t0 < t1 < · · · < tn < tn+1 = ∞, and for α0, α1, . . . , αn ∈ E define η(α0, α1, . . . , αn) ∈ DE[0,∞) by

$$\eta(\alpha_0, \alpha_1, \ldots, \alpha_n)(t) = \alpha_i, \quad t_i \le t < t_{i+1}, \; i = 0, 1, \ldots, n.$$
Now

$$d(\eta(\alpha_0, \alpha_1, \ldots, \alpha_n), \eta(\alpha'_0, \alpha'_1, \ldots, \alpha'_n)) \le \max_{0 \le i \le n} r(\alpha_i, \alpha'_i),$$

and so η is a continuous function from E^{n+1} into DE[0,∞). Since E is separable, E^{n+1} is separable and so there exists a countable dense subset A ⊂ E^{n+1}. For fixed z ∈ DE[0,∞) and ε > 0, by the continuity of η,

$$\Gamma = \{a \in E^{n+1} : d(z, \eta(a)) < \varepsilon\} = \bigcup_{a \in A : d(z, \eta(a)) < \varepsilon} \bigcap_{n=1}^{\infty} B\left(a, \frac{1}{n}\right)$$

is a measurable subset of E^{n+1} with respect to the Borel product σ-algebra and so, since each πt is S′E-measurable, d(z, η(π_{t0}, π_{t1}, . . . , π_{tn})) is an S′E-measurable function from DE[0,∞) into R. For m = 1, 2, . . ., define ηm as η but with n = m² and ti = i/m, i = 0, 1, . . .. By an identical argument to that in the proof of the separability of DE[0,∞) in Theorem 2.10, d(x, ηm(π_{t0}(x), π_{t1}(x), . . . , π_{tn}(x))) → 0 as m → ∞ and hence

$$\lim_{m\to\infty} |d(z, \eta_m(\pi_{t_0}(x), \ldots, \pi_{t_n}(x))) - d(z, x)| \le \lim_{m\to\infty} d(x, \eta_m(\pi_{t_0}(x), \ldots, \pi_{t_n}(x))) = 0$$

for every x ∈ DE[0,∞). Therefore d(z, x) = lim_{m→∞} d(z, ηm(π_{t0}(x), . . . , π_{tn}(x))) is S′E-measurable in x for fixed z ∈ DE[0,∞) and, in particular, every open ball B(z, ε) = {x ∈ DE[0,∞) : d(z, x) < ε} belongs to S′E. Since E (and, by Theorem 2.10, DE[0,∞)) is separable, S′E contains all the open sets in DE[0,∞) and hence contains SE.
3 Convergence of Probability Measures
In order to study the convergence of the distributions of stochastic processes, it is necessary to understand the probability measures that characterise these. In this section we construct a metric on the space of probability measures corresponding to the convergence, in distribution, of the stochastic processes. Using this, we establish a relationship between convergence in distribution and convergence in probability of processes defined on a common probability space. This result is applied to sequences of Markov chains and diffusion processes to obtain some simple conditions for convergence.

Where possible, results are proved for probability measures on a general metric space (S, d). However, in practice we generally take S = DE[0,∞), and in particular DRd[0,∞), with the metric d defined in the previous section.
Notation 3.1. For a metric space (S, d):
B(S) is the σ-algebra of Borel subsets of S;
P(S) is the family of Borel probability measures on S;
C(S) is the space of real-valued bounded continuous functions on (S, d), with norm ‖f‖ = sup_{x∈S} |f(x)|;
C is the collection of closed subsets of S;
F^ε = {x ∈ S : inf_{y∈F} d(x, y) < ε}, where F ⊂ S.
Definition 3.2. A sequence (Pn)n≥1 in P(S) is said to converge weakly to P ∈ P(S) (denoted Pn ⇒ P) if

$$\lim_{n\to\infty} \int f \, dP_n = \int f \, dP \quad \text{for all } f \in C(S).$$

The distribution of an S-valued random variable X, denoted by PX⁻¹, is the element of P(S) given by PX⁻¹(B) = P(X ∈ B), where P is the probability measure on the probability space underlying X.
A sequence (Xn)n≥1 of S-valued random variables is said to converge in distribution to the S-valued random variable X if PX_n⁻¹ ⇒ PX⁻¹, or equivalently, if

$$\lim_{n\to\infty} \mathbb{E}(f(X_n)) = \mathbb{E}(f(X)) \quad \text{for all } f \in C(S).$$

This is denoted by Xn ⇒ X.
Remark 3.3. Note that this is a direct generalisation of the definition of convergence in distribution of a sequence of real-valued random variables (Xn)n≥1, where we say that Xn ⇒ X if limn→∞ E(f(Xn)) = E(f(X)) for all f ∈ C(R).
3.1 The Prohorov Metric
We now define a metric ρ on P(S) with the property that a sequence of probability measures converges with respect to ρ if and only if it converges weakly.
Definition 3.4. For P and Q ∈ P(S) the Prohorov metric is defined by

$$\rho(P, Q) = \inf\{\varepsilon > 0 : P(F) \le Q(F^\varepsilon) + \varepsilon \text{ for all } F \in \mathcal{C}\},$$

using the notation defined in 3.1.
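For finitely supported measures the infimum in Definition 3.4 can be found by brute force. The sketch below (an illustration, not part of the essay) works on the real line: for measures with finite support it suffices to test closed sets F that are subsets of the supports, since P(F) = P(F ∩ supp P) and shrinking F only shrinks F^ε; monotonicity of the condition in ε then allows bisection.

```python
from itertools import chain, combinations

def prohorov(P, Q, tol=1e-6):
    """Brute-force Prohorov distance between two finitely supported
    probability measures on R, given as {point: mass} dicts.
    Exponential in the support size, so only usable for tiny examples."""
    support = sorted(set(P) | set(Q))
    subsets = list(chain.from_iterable(
        combinations(support, k) for k in range(1, len(support) + 1)))

    def ok(eps):
        # P(F) <= Q(F^eps) + eps for every candidate closed F;
        # by Lemma 3.5 the symmetric inequality then holds as well
        for F in subsets:
            pF = sum(P.get(x, 0.0) for x in F)
            qFeps = sum(m for y, m in Q.items()
                        if min(abs(y - x) for x in F) < eps)
            if pF > qFeps + eps:
                return False
        return True

    lo, hi = 0.0, 1.0   # eps = 1 always works for probability measures
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi

print(prohorov({0.0: 1.0}, {0.3: 1.0}))          # two point masses: ~0.3
print(prohorov({0.0: 0.5, 1.0: 0.5}, {0.0: 1.0}))  # ~0.5
```

For two point masses δ0 and δa with a < 1 the distance is a, matching the general fact that ρ(δx, δy) = d(x, y) ∧ 1.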
In order to prove that ρ is a metric, the following lemma is needed.
Lemma 3.5. Let P, Q ∈ P(S) and α, β > 0. If

$$P(F) \le Q(F^\alpha) + \beta \tag{3.1}$$

for all F ∈ C, then

$$Q(F) \le P(F^\alpha) + \beta \tag{3.2}$$

for all F ∈ C.
Proof. Suppose F ∈ C. F^α is open, since if x ∈ F^α then there exists y ∈ F such that d(x, y) < α; then d(z, y) ≤ d(z, x) + d(x, y) < α for all z ∈ B(x, α − d(x, y)), and so B(x, α − d(x, y)) ⊂ F^α. Hence, if G = S \ F^α, then G ∈ C. Also F ⊂ S \ G^α, since if x ∈ G^α then there exists some y ∉ F^α such that d(x, y) < α; then d(y, z) ≥ α for all z ∈ F and so d(x, z) ≥ d(y, z) − d(x, y) > 0 for all z ∈ F, i.e. x ∉ F. Substituting G into (3.1) gives

$$P(F^\alpha) = 1 - P(G) \ge 1 - Q(G^\alpha) - \beta \ge Q(F) - \beta.$$
Proposition 3.6. The function ρ, defined above, is a metric on P(S).
Proof. By the above lemma,

$$\begin{aligned}
\rho(P, Q) &= \inf\{\varepsilon > 0 : P(F) \le Q(F^\varepsilon) + \varepsilon \text{ for all } F \in \mathcal{C}\} \\
&= \inf\{\varepsilon > 0 : Q(F) \le P(F^\varepsilon) + \varepsilon \text{ for all } F \in \mathcal{C}\} = \rho(Q, P).
\end{aligned}$$
If ρ(P, Q) = 0, then there exists a sequence (εn)n≥1 with εn → 0 such that P(F) ≤ Q(F^{εn}) + εn for all n. Letting n → ∞ and using the continuity of probability measures gives P(F) ≤ Q(F) for all F ∈ C. By the above symmetry between P and Q, P(F) = Q(F) for all F ∈ C and hence for all F ∈ B(S). Therefore ρ(P, Q) = 0 if and only if P = Q.
Finally, if P, Q, R ∈ P(S), and δ > 0, ε > 0 satisfy ρ(P, Q) < δ and ρ(Q, R) < ε, then

$$P(F) \le Q(F^\delta) + \delta \le R((F^\delta)^\varepsilon) + \delta + \varepsilon \le R(F^{\delta+\varepsilon}) + \delta + \varepsilon$$

for all F ∈ C, so ρ(P, R) ≤ δ + ε, and hence ρ(P, R) ≤ inf_{δ > ρ(P,Q)} δ + inf_{ε > ρ(Q,R)} ε = ρ(P, Q) + ρ(Q, R), as required.
Theorem 3.7. Let (Pn)n≥1 be a sequence in P(S) and P ∈ P(S). If S is separable, then limn→∞ ρ(Pn, P) = 0 if and only if Pn ⇒ P.
Proof. Suppose limn→∞ ρ(Pn, P) = 0. For each n, let εn = ρ(Pn, P) + 1/n. Given f ∈ C(S) with f ≥ 0,

$$\int f \, dP_n = \int_0^{\|f\|} P_n(\{f \ge t\}) \, dt \le \int_0^{\|f\|} P(\{f \ge t\}^{\varepsilon_n}) \, dt + \varepsilon_n \|f\|$$

for every n and so, by dominated convergence,

$$\limsup_{n\to\infty} \int f \, dP_n \le \lim_{n\to\infty} \int_0^{\|f\|} P(\{f \ge t\}^{\varepsilon_n}) \, dt = \int_0^{\|f\|} P(\{f \ge t\}) \, dt = \int f \, dP.$$

Hence, for all f ∈ C(S),

$$\limsup_{n\to\infty} \int (\|f\| + f) \, dP_n \le \int (\|f\| + f) \, dP \quad\text{and}\quad \limsup_{n\to\infty} \int (\|f\| - f) \, dP_n \le \int (\|f\| - f) \, dP.$$

Therefore

$$\begin{aligned}
\int f \, dP &\le \|f\| - \limsup_{n\to\infty} \int (\|f\| - f) \, dP_n = \liminf_{n\to\infty} \int f \, dP_n \\
&\le \limsup_{n\to\infty} \int f \, dP_n = \limsup_{n\to\infty} \int (\|f\| + f) \, dP_n - \|f\| \le \int f \, dP,
\end{aligned}$$

and so we must have equality throughout. Thus limn→∞ ∫ f dPn = ∫ f dP for all f ∈ C(S), i.e. Pn ⇒ P.
Conversely, suppose Pn ⇒ P. We first establish some preliminary results for open and closed subsets of S. Let F ∈ C and for each ε > 0 define fε ∈ C(S) by

$$f_\varepsilon(x) = \left(1 - \frac{d(x, F)}{\varepsilon}\right) \vee 0,$$
where d(x, F) = inf_{y∈F} d(x, y). Then fε ≥ 1_F for all ε > 0 and so

$$\limsup_{n\to\infty} P_n(F) \le \lim_{n\to\infty} \int f_\varepsilon \, dP_n = \int f_\varepsilon \, dP$$

for each ε > 0. Therefore, by dominated convergence,

$$\limsup_{n\to\infty} P_n(F) \le \lim_{\varepsilon\to 0} \int f_\varepsilon \, dP = P(F).$$

If G ⊂ S is open, then

$$\liminf_{n\to\infty} P_n(G) = 1 - \limsup_{n\to\infty} P_n(G^c) \ge 1 - P(G^c) = P(G).$$
Now let ε > 0. Since S is separable (note: this is the only point in the proof where we use separability), there exists a countable dense subset {x1, x2, . . .} ⊂ S. Let Ei = B(xi, ε/4) for i = 1, 2, . . .. Then P(⋃_{i=1}^∞ Ei) = P(S) = 1 and so there exists some smallest integer N such that P(⋃_{i=1}^N Ei) > 1 − ε/2. Now let G be the collection of open sets of the form (⋃_{i∈I} Ei)^{ε/2}, where I ⊂ {1, . . . , N}. Since G is finite, by the above result on open sets, there exists some N0 such that P(G) ≤ Pn(G) + ε/2 for all G ∈ G and n ≥ N0. Given F ∈ C, let

$$F_0 = \bigcup \{E_i : 1 \le i \le N, \; E_i \cap F \ne \emptyset\}.$$

Then F0^{ε/2} ∈ G and so

$$P(F) \le P(F_0^{\varepsilon/2}) + P\left(\bigcup_{i=N+1}^{\infty} E_i\right) \le P(F_0^{\varepsilon/2}) + \frac{\varepsilon}{2} \le P_n(F_0^{\varepsilon/2}) + \varepsilon \le P_n(F^\varepsilon) + \varepsilon$$

for all n ≥ N0, where the first inequality follows from F ⊂ F0 ∪ (⋃_{i=N+1}^∞ Ei), the second from the definition of N, the third from the definition of N0, and the fourth from the diameter of the Ei being at most ε/2. Hence ρ(Pn, P) ≤ ε for each n ≥ N0, i.e. limn→∞ ρ(Pn, P) = 0.
Definition 3.8. Let P, Q ∈ P(S). Define M(P, Q) to be the set of all µ ∈ P(S × S) with marginals P and Q, i.e. µ(A × S) = P(A) and µ(S × A) = Q(A) for all A ∈ B(S).
The following lemma provides a probabilistic interpretation of the Prohorov metric:
Lemma 3.9.

$$\rho(P, Q) \le \inf_{\mu \in M(P,Q)} \inf\{\varepsilon > 0 : \mu(\{(x, y) : d(x, y) \ge \varepsilon\}) \le \varepsilon\}.$$
Proof. If for some ε > 0 and µ ∈ M(P, Q) we have

$$\mu(\{(x, y) : d(x, y) \ge \varepsilon\}) \le \varepsilon,$$

then

$$\begin{aligned}
P(F) &= \mu(F \times S) \\
&\le \mu((F \times S) \cap \{(x, y) : d(x, y) < \varepsilon\}) + \mu(\{(x, y) : d(x, y) \ge \varepsilon\}) \\
&\le \mu(S \times F^\varepsilon) + \varepsilon = Q(F^\varepsilon) + \varepsilon,
\end{aligned}$$

for all F ∈ C, and so ρ(P, Q) ≤ ε. The result follows.
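The right-hand side of Lemma 3.9 can be estimated from samples of a coupling. Given paired samples (xi, yi), the quantity inf{ε > 0 : fraction of pairs with d(xi, yi) ≥ ε is ≤ ε} is computable by bisection, since the defining condition is monotone in ε; it then upper-bounds the Prohorov distance between the empirical marginals. The sketch below (a numerical illustration, not from the essay) couples U[0, 1] with its shift by 0.1 via y = x + 0.1, every pair distance being 0.1.

```python
import numpy as np

def ky_fan(dists, tol=1e-6):
    """inf{eps > 0 : mean(dists >= eps) <= eps}, found by bisection.
    By Lemma 3.9 this bounds the Prohorov distance of the marginals."""
    dists = np.asarray(dists, dtype=float)
    lo, hi = 0.0, max(1.0, float(dists.max()))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # the condition is monotone: if it holds at mid, it holds above mid
        if np.mean(dists >= mid) <= mid:
            hi = mid
        else:
            lo = mid
    return hi

rng = np.random.default_rng(0)
x = rng.uniform(size=10_000)
y = x + 0.1                      # a coupling of U[0,1] and U[0.1, 1.1]
print(ky_fan(np.abs(x - y)))     # ≈ 0.1
```

Every pair is exactly 0.1 apart, so the bound is 0.1, which is consistent with shifting a distribution by 0.1 moving it at most 0.1 in the Prohorov metric.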
In fact, in the case when S is separable, the inequality in the above lemma can be replaced by an equality. (The proof is an immediate consequence of Lemma 3.11.)
Proposition 3.10. Let (S, d) be separable. Suppose that Xn, n = 1, 2, . . ., and X are S-valued random variables defined on the same probability space with distributions Pn, n = 1, 2, . . ., and P respectively. If d(Xn, X) → 0 in probability as n → ∞, then Pn ⇒ P.
Proof. For n = 1, 2, . . ., let µn be the joint distribution of Xn and X. Then

$$\lim_{n\to\infty} \mu_n(\{(x, y) : d(x, y) \ge \varepsilon\}) = \lim_{n\to\infty} \mathbb{P}(d(X_n, X) \ge \varepsilon) = 0,$$

where P is the probability measure on the probability space underlying Xn and X. By Lemma 3.9, limn→∞ ρ(Pn, P) = 0, and since S is separable, the result follows by Theorem 3.7.
3.2 Examples
Proposition 3.10 suggests a method of proving that a sequence of probability measures converges weakly, by constructing random variables with the required distributions on a common probability space and showing that they converge in probability. We illustrate this method by looking at three examples: discrete time Markov chains, continuous time Markov chains and diffusion processes.
3.2.1 Discrete Time Markov Chains
Suppose that E is countable and that (X^N)N≥1 and X are discrete time Markov chains with initial distributions (λ^N)N≥1 and λ, and transition matrices (P^N)N≥1 and P respectively. We will show that if P^N → P and λ^N → λ uniformly, then X^N ⇒ X.

We construct random variables (Y^N)N≥1 and Y with the required distributions on the probability space ([0, 1), B, µ), where B is the Borel σ-algebra and µ is Lebesgue measure. Without loss of generality we may assume E = N (set the relevant probabilities to zero if E is finite).
For each ω ∈ [0, 1), construct a sequence a(ω) = (i_n)_{n≥0} of elements of N inductively as follows:

Since ∑_{i=0}^∞ λ_i = 1, there exists a smallest i_0 ∈ N such that ω < ∑_{i=0}^{i_0} λ_i. Set a(ω)_0 = i_0, and let μ_{i_0} = ∑_{i=0}^{i_0−1} λ_i.

Since μ_{i_0} ≤ ω < μ_{i_0} + λ_{i_0}, and ∑_{i=0}^∞ p_{i_0 i} = 1, there exists some smallest i_1 such that ω < μ_{i_0} + λ_{i_0} ∑_{i=0}^{i_1} p_{i_0 i}. Set a(ω)_1 = i_1, and let μ_{i_0,i_1} = μ_{i_0} + λ_{i_0} ∑_{i=0}^{i_1−1} p_{i_0 i}.

Suppose we have constructed a(ω)_m = i_m and μ_{i_0,...,i_m} for all m < n, such that μ_{i_0,...,i_m} ≤ ω < μ_{i_0,...,i_m} + λ_{i_0} p_{i_0 i_1} · · · p_{i_{m−1} i_m}. Since ∑_{i=0}^∞ p_{i_{n−1} i} = 1, there exists some smallest i_n such that ω < μ_{i_0,...,i_{n−1}} + λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−2} i_{n−1}} ∑_{i=0}^{i_n} p_{i_{n−1} i}. Set a(ω)_n = i_n, and let μ_{i_0,...,i_n} = μ_{i_0,...,i_{n−1}} + λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−2} i_{n−1}} ∑_{i=0}^{i_n−1} p_{i_{n−1} i}.

Define a discrete time process (Y_n)_{n≥0} on ([0, 1), B, μ) by setting Y_n(ω) = a(ω)_n. Then

P(Y_0 = i_0, Y_1 = i_1, . . . , Y_n = i_n) = μ({ω : μ_{i_0,...,i_n} ≤ ω < μ_{i_0,...,i_n} + λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−1} i_n}}) = λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−1} i_n},
and so (Y_n)_{n≥0} is a Markov chain with initial distribution λ and transition matrix P. For each N ∈ N, construct (Y^N_n)_{n≥0} with initial distribution λ^N and transition matrix P^N similarly.

Now μ_{i_0,...,i_n} is a continuous function of finitely many entries of λ and P, and λ^N → λ and P^N → P uniformly. Hence μ^N_{i_0,...,i_n} → μ_{i_0,...,i_n}. Therefore, if μ_{i_0,...,i_n} < ω < μ_{i_0,...,i_n+1}, then there exists some N_0 ∈ N such that N ≥ N_0 implies that

μ^N_{i_0,...,i_n} < ω < μ^N_{i_0,...,i_n+1},

and hence Y^N_m(ω) = Y_m(ω) for all m ≤ n. In other words, provided ω ≠ μ_{i_0,...,i_n} for any i_0, . . . , i_n, Y^N(ω) → Y(ω). But only a countable number of elements of [0, 1) are equal to μ_{i_0,...,i_n} for some i_0, . . . , i_n, and so lim_{N→∞} Y^N = Y almost surely. By Proposition 3.10, X^N ⇒ X as required.
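The inductive construction above is effectively inverse-transform sampling: [0, 1) is split into subintervals whose lengths are the path probabilities, and ω selects one piece at each step. A minimal sketch in Python on a finite state space (the function name and truncation are our own illustrative choices, not part of the essay):

```python
import numpy as np

def chain_from_uniform(omega, lam, P, n_steps):
    """Path a(omega) of the chain driven by a single uniform omega in
    [0, 1): at each step the current subinterval [mu, mu + mass) is split
    in proportion to the transition probabilities, and the piece
    containing omega determines the next state."""
    cum = np.cumsum(lam)
    # smallest i0 with omega < lam_0 + ... + lam_{i0}
    i = int(np.searchsorted(cum, omega, side='right'))
    mu, mass = cum[i] - lam[i], lam[i]
    path = [i]
    for _ in range(n_steps):
        cum = mu + mass * np.cumsum(P[i])
        j = int(np.searchsorted(cum, omega, side='right'))
        mu, mass = cum[j] - mass * P[i][j], mass * P[i][j]
        path.append(j)
        i = j
    return path

# two-state example: lam = (1/2, 1/2), all transition probabilities 1/2
path = chain_from_uniform(0.3, [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], 2)
```

Driving every chain by the same ω is what makes the paths converge pointwise when P^N → P: for all but countably many ω, the selected subintervals eventually coincide, so nearby transition matrices produce identical initial segments of the path.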
3.2.2 Continuous Time Markov Chains
Suppose that E is countable and that (X^N)_{N≥1} and X are continuous time Markov chains with initial distributions (λ^N)_{N≥1} and λ, and generator matrices (Q^N)_{N≥1} and Q respectively. We will show that if Q^N → Q and λ^N → λ uniformly, then X^N ⇒ X.
We construct random variables (Z^N)_{N≥1} and Z with the required distributions on a common probability space (Ω, F, P), using a construction due to Norris [9]. As in the discrete case we shall assume that E = N.
Let (Π^N)_{N≥1} and Π be the jump matrices corresponding to the generator matrices (Q^N)_{N≥1} and Q respectively. Since Q^N → Q uniformly, the corresponding jump matrices Π^N → Π uniformly and, by the discrete case above, there exist discrete time Markov chains (Y^N)_{N≥1} and Y with initial distributions (λ^N)_{N≥1} and λ, and transition matrices (Π^N)_{N≥1} and Π respectively, such that lim_{N→∞} Y^N = Y almost surely. By discarding a set of measure zero if necessary, we may assume that Y^N(ω) → Y(ω) for all ω ∈ Ω. Let T_1, T_2, . . . be independent exponential random variables of parameter 1, independent of (Y^N)_{N≥1} and Y. Defining q(i) = −q_{ii}, set S_n = T_n / q(Y_{n−1}), J_n = S_1 + · · · + S_n, and

Z_t = Y_n if J_n ≤ t < J_{n+1} for some n, and Z_t = ∞ otherwise.

Then, conditionally on Y, the S_n are independent exponential random variables with parameters q(Y_{n−1}), and so Z has the required distribution. Define S^N_n, J^N_n and Z^N similarly for N ≥ 1. Since Y^N(ω) → Y(ω) for all ω ∈ Ω, given ω ∈ Ω, for each n, there exists some N_n such that N ≥ N_n implies that Y^N_m(ω) = Y_m(ω) for all m ≤ n. Then since Q^N → Q, if N ≥ N_n, then

S^N_m(ω) = T_m(ω) / q^N(Y^N_{m−1}(ω)) = T_m(ω) / q^N(Y_{m−1}(ω)) → T_m(ω) / q(Y_{m−1}(ω)) = S_m(ω)

for all m ≤ n + 1, and hence J^N_m → J_m for all m ≤ n + 1. By the same argument as that used to prove that D_E[0,∞) is separable in Theorem 2.10, it follows that d(Z^N(ω), Z(ω)) → 0 as N → ∞, and so lim_{N→∞} Z^N = Z almost surely. By Proposition 3.10, X^N ⇒ X, as required.
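The jump-chain-and-holding-times construction can be sketched numerically as follows. This is an illustrative simulation on a finite state space (names and the absorbing-state convention are ours), not the coupling used in the proof:

```python
import random

def simulate_ctmc(Q, lam, t_max, rng=random.Random(0)):
    """Jump-chain construction of a continuous time Markov chain: run the
    discrete jump chain and hold each state i for an Exp(q(i)) time,
    where q(i) = -Q[i][i].  Returns the (jump_time, state) pairs up to t_max."""
    n = len(Q)
    i = rng.choices(range(n), weights=lam)[0]   # initial state ~ lam
    t, path = 0.0, [(0.0, i)]
    while True:
        q_i = -Q[i][i]
        if q_i == 0.0:                          # absorbing state: no more jumps
            break
        t += rng.expovariate(q_i)               # holding time S_n = T_n / q(i)
        if t >= t_max:
            break
        # jump matrix row: off-diagonal entries of Q[i], renormalised
        weights = [Q[i][j] if j != i else 0.0 for j in range(n)]
        i = rng.choices(range(n), weights=weights)[0]
        path.append((t, i))
    return path

# two-state chain that flips back and forth at rate 1, started in state 0
path = simulate_ctmc([[-1.0, 1.0], [1.0, -1.0]], [1.0, 0.0], 5.0)
```

As in the proof, feeding the same exponential variables T_n and the same jump chain into two nearby generator matrices couples the resulting processes so that their jump times converge together.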
3.2.3 Diffusion Processes
Suppose that (a_n)_{n≥1} and a are bounded symmetric uniformly positive definite Lipschitz functions R^d → R^d ⊗ R^d, and that (b_n)_{n≥1} and b are bounded Lipschitz functions R^d → R^d. Let (X_n)_{n≥1} and X be diffusion processes in R^d with diffusivities (a_n)_{n≥1} and a, and drifts (b_n)_{n≥1} and b respectively, starting from x ∈ R^d. We shall show that if a_n → a and b_n → b uniformly, then X_n ⇒ X.
Let (B_t)_{t≥0} be a Brownian motion in R^d. We shall construct diffusions (Z^n)_{n≥1} and Z, with the required distributions, on the probability space underlying (B_t)_{t≥0}.
Since a(x) is symmetric positive definite for all x ∈ R^d, there exists a unique symmetric positive definite map σ : R^d → R^d ⊗ (R^d)^* such that σ(x)σ(x)^* = a(x). Furthermore, since a is bounded and Lipschitz, σ is bounded and Lipschitz. Similarly, there exist unique symmetric positive definite bounded Lipschitz maps (σ_n)_{n≥1} and, since a_n → a uniformly, σ_n → σ uniformly. Assume that σ, σ_n, b and b_n have Lipschitz constant K, independent of n. Since σ, σ_n, b and b_n are Lipschitz, there exist continuous processes Z and Z^n for n = 1, 2, . . ., adapted to the filtration generated by (B_t)_{t≥0}, satisfying

dZ_t = σ(Z_t) dB_t + b(Z_t) dt,   Z_0 = x,
dZ^n_t = σ_n(Z^n_t) dB_t + b_n(Z^n_t) dt,   Z^n_0 = x.
Z^n and Z have the required distributions. Using (x + y)^2 ≤ 2x^2 + 2y^2,

|Z^n_t − Z_t|^2 = | ∫_0^t (σ_n(Z^n_s) − σ(Z_s)) dB_s + ∫_0^t (b_n(Z^n_s) − b(Z_s)) ds |^2
 ≤ 2 | ∫_0^t (σ_n(Z^n_s) − σ(Z_s)) dB_s |^2 + 2 | ∫_0^t (b_n(Z^n_s) − b(Z_s)) ds |^2.
By Doob's L^2 inequality,

E( sup_{s≤t} | ∫_0^s (σ_n(Z^n_r) − σ(Z_r)) dB_r |^2 ) ≤ 4 E( ∫_0^t |σ_n(Z^n_s) − σ(Z_s)|^2 ds ),
and by the Cauchy–Schwarz inequality,

E( sup_{s≤t} | ∫_0^s (b_n(Z^n_r) − b(Z_r)) dr |^2 ) ≤ t E( ∫_0^t |b_n(Z^n_s) − b(Z_s)|^2 ds ).
Now since K is a Lipschitz constant for σ_n,

|σ_n(Z^n_r) − σ(Z_r)| ≤ |σ_n(Z^n_r) − σ_n(Z_r)| + |σ_n(Z_r) − σ(Z_r)| ≤ K |Z^n_r − Z_r| + ‖σ_n − σ‖,

and similarly

|b_n(Z^n_r) − b(Z_r)| ≤ K |Z^n_r − Z_r| + ‖b_n − b‖.
Hence

E( sup_{s≤t} |Z^n_s − Z_s|^2 ) ≤ 8 E( ∫_0^t |σ_n(Z^n_s) − σ(Z_s)|^2 ds ) + 2t E( ∫_0^t |b_n(Z^n_s) − b(Z_s)|^2 ds )
 ≤ 16 E( ∫_0^t (K^2 |Z^n_s − Z_s|^2 + ‖σ_n − σ‖^2) ds ) + 4t E( ∫_0^t (K^2 |Z^n_s − Z_s|^2 + ‖b_n − b‖^2) ds )
 ≤ 16t ‖σ_n − σ‖^2 + 4t^2 ‖b_n − b‖^2 + (16 + 4t) K^2 ∫_0^t E( sup_{r≤s} |Z^n_r − Z_r|^2 ) ds.
Given ε > 0 and T > 0, set c = (16 + 4T)K^2 and ε′ = εe^{−cT}. Since σ_n → σ and b_n → b uniformly, there exists N ∈ N such that n ≥ N implies ‖σ_n − σ‖ < √(ε′/32T) and ‖b_n − b‖ < √(ε′/8T^2). If f_n(t) = E( sup_{s≤t} |Z^n_s − Z_s|^2 ), then

f_n(t) ≤ ε′ + c ∫_0^t f_n(s) ds

for all n ≥ N and 0 ≤ t ≤ T. By Gronwall's Inequality (Lemma 6.9), f_n(t) ≤ ε′e^{ct} ≤ ε, and so E( sup_{s≤t} |Z^n_s − Z_s|^2 ) → 0 as n → ∞. In particular, sup_{s≤t} |Z^n_s − Z_s| → 0 in probability. Taking λ_n(t) = t for all t, Proposition 2.6 implies that d(Z^n, Z) → 0 in probability. By Proposition 3.10, X_n ⇒ X, as required.
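The pathwise convergence underlying this argument can be illustrated numerically: drive approximations of Z^n and Z by the same Brownian increments via an Euler–Maruyama scheme and watch the sup-distance shrink as the coefficients converge. This is only a hedged one-dimensional illustration with coefficients of our own choosing (σ_n = 1 + 1/n, b_n(x) = −x + 1/n), not part of the proof:

```python
import numpy as np

def euler_maruyama(sigma, b, x0, dB, dt):
    """Euler-Maruyama approximation of dZ = sigma(Z) dB + b(Z) dt in one
    dimension, driven by the supplied Brownian increments dB."""
    z = np.empty(len(dB) + 1)
    z[0] = x0
    for k, db in enumerate(dB):
        z[k + 1] = z[k] + sigma(z[k]) * db + b(z[k]) * dt
    return z

rng = np.random.default_rng(0)
dt = 1e-3
dB = rng.normal(0.0, np.sqrt(dt), size=1000)  # one Brownian path on [0, 1], shared

# limit coefficients sigma = 1, b(x) = -x; the perturbations converge uniformly
z = euler_maruyama(lambda x: 1.0, lambda x: -x, 0.0, dB, dt)
sup_dist = [np.max(np.abs(
    euler_maruyama(lambda x, n=n: 1.0 + 1.0 / n,
                   lambda x, n=n: -x + 1.0 / n, 0.0, dB, dt) - z))
    for n in (1, 10, 100)]
# sup_{t<=1} |Z^n_t - Z_t| shrinks as n grows, for the same driving path
```

Because both processes see the same dB, the difference is controlled exactly as in the Gronwall estimate above, rather than merely in distribution.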
3.3 The Skorohod Representation
A converse to Proposition 3.10 exists, in the form of the Skorohod Representation. Before we can prove this, we need the following lemma.
Lemma 3.11. Let S be separable. Let P, Q ∈ P(S) and ε > 0 satisfy ρ(P, Q) < ε, and let δ > 0. Suppose that E_1, . . . , E_N ∈ B(S) are disjoint with diameters less than δ and that P(E_0) ≤ δ, where E_0 = S \ ⋃_{i=1}^N E_i. Then there exist constants c_1, . . . , c_N ∈ [0, 1] and independent random variables X, Y_0, . . . , Y_N (S-valued) and ξ ([0, 1]-valued) on some probability space (Ω, F, P) such that X has distribution P, ξ is uniformly distributed on [0, 1],

Y = Y_i on {X ∈ E_i, ξ ≥ c_i}, i = 1, . . . , N,
Y = Y_0 on {X ∈ E_0} ∪ ⋃_{i=1}^N {X ∈ E_i, ξ < c_i}

has distribution Q,

{d(X, Y) ≥ δ + ε} ⊂ {X ∈ E_0} ∪ {ξ < max{ε/P(E_i) : P(E_i) > 0}},

and

P(d(X, Y) ≥ δ + ε) ≤ δ + ε.
Proof. This lemma is not proved here as the proof is long and not very illuminating. The interested reader is referred to pp. 97–101 of Ethier and Kurtz [4].
Theorem 3.12 (The Skorohod Representation). Let (S, d) be separable. Suppose P_n, n = 1, 2, . . ., and P in P(S) satisfy P_n ⇒ P. Then there exists a probability space (Ω, F, P) on which are defined S-valued random variables X_n, n = 1, 2, . . ., and X with distributions P_n, n = 1, 2, . . ., and P respectively, such that lim_{n→∞} X_n = X almost surely.
Proof. Let x_1, x_2, . . . be a dense subset of S. Then P(⋃_{i=1}^∞ B(x_i, 2^{−k})) = 1 for each k and so, given ε > 0, there exist integers N_1, N_2, . . . such that

P( ⋃_{i=1}^{N_k} B(x_i, 2^{−k}) ) ≥ 1 − 2^{−k}
for k = 1, 2, . . .. Set E^{(k)}_i = B(x_i, 2^{−k}) and E^{(k)}_0 = S \ ⋃_{i=1}^{N_k} E^{(k)}_i. Assume (without loss of generality) that ε_k = min_{1≤i≤N_k} P(E^{(k)}_i) > 0. Define the sequence (k_n)_{n≥1} by

k_n = 1 ∨ max{ k ≥ 1 : ρ(P_n, P) < ε_k / k }.

Apply Lemma 3.11 with Q = P_n, ε = ε_{k_n}/k_n if k_n > 1 and ε = ρ(P_n, P) + 1/n if k_n = 1, δ = 2^{−k_n}, E_i = E^{(k_n)}_i, and N = N_{k_n}, for n = 1, 2, . . .. Then there exists a probability space (Ω, F, P) on which are defined S-valued random variables Y^{(n)}_0, . . . , Y^{(n)}_{N_{k_n}}, n = 1, 2, . . ., a random variable ξ, uniformly distributed on [0, 1], and an S-valued random variable X with distribution P, all of which are independent, such that if the constants c^{(n)}_1, . . . , c^{(n)}_{N_{k_n}} ∈ [0, 1], n = 1, 2, . . ., are appropriately chosen, then the random variable

X_n = Y^{(n)}_i on {X ∈ E^{(k_n)}_i, ξ ≥ c^{(n)}_i}, i = 1, . . . , N_{k_n},
X_n = Y^{(n)}_0 on {X ∈ E^{(k_n)}_0} ∪ ⋃_{i=1}^{N_{k_n}} {X ∈ E^{(k_n)}_i, ξ < c^{(n)}_i}

has distribution P_n and

{ d(X_n, X) ≥ 2^{−k_n} + ε_{k_n}/k_n } ⊂ {X ∈ E^{(k_n)}_0} ∪ {ξ < 1/k_n}

if k_n > 1, for n = 1, 2, . . ..
Since P_n ⇒ P, by Theorem 3.7, ρ(P_n, P) → 0. Hence, for each k ∈ N, ρ(P_n, P) < ε_k/k for sufficiently large n, and so k_n ≥ k for sufficiently large n. If K_n = min_{m≥n} k_m, then lim_{n→∞} K_n = ∞. However, for K_n > 1,

P( ⋃_{m=n}^∞ { d(X_m, X) ≥ 2^{−k_m} + ε_{k_m}/k_m } ) ≤ ∑_{k=K_n}^∞ P(X ∈ E^{(k)}_0) + P( ξ < 1/K_n ) ≤ 2^{−K_n+1} + 1/K_n → 0.

So lim_{n→∞} X_n = X almost surely.
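For S = R, the representation can be realised much more simply by the quantile transform: if F_n ⇒ F, then X_n = F_n^{−1}(U) and X = F^{−1}(U), for a single uniform U, have distributions P_n and P and converge almost surely. A brief sketch (the particular Gaussian sequence is our own illustrative choice, not the general construction used in the proof):

```python
from statistics import NormalDist

# P_n = N(1/n, (1 + 1/n)^2) converges weakly to P = N(0, 1); couple every
# P_n to P through one uniform sample U via the quantile transform.
u = 0.3                                              # one sample of U
x = NormalDist(0.0, 1.0).inv_cdf(u)                  # X = F^{-1}(U)
xs = [NormalDist(1.0 / n, 1.0 + 1.0 / n).inv_cdf(u)  # X_n = F_n^{-1}(U)
      for n in (1, 10, 100, 1000)]
# |X_n - X| decreases to 0 along the sequence
```

The general proof above is needed precisely because on an abstract metric space there is no quantile function to couple through; Lemma 3.11 plays that role.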
We conclude this section by giving an application of this theorem.
Corollary 3.13 (The Continuous Mapping Theorem). Let (S, d) and (S′, d′) be separable metric spaces and let h : S → S′ be Borel measurable. Suppose that P_n, n = 1, 2, . . ., and P in P(S) satisfy P_n ⇒ P, and define Q_n, n = 1, 2, . . ., and Q in P(S′) by Q_n = P_n h^{−1}, Q = P h^{−1}. (By definition, P h^{−1}(B) = P({s ∈ S : h(s) ∈ B}).) Let C_h be the set of points of S at which h is continuous. If P(C_h) = 1, then Q_n ⇒ Q on S′.
Proof. By Theorem 3.12, there exists a probability space (Ω, F, P) on which are defined S-valued random variables X_n, n = 1, 2, . . ., and X with distributions P_n, n = 1, 2, . . ., and P respectively, such that lim_{n→∞} X_n = X almost surely. Since P(X ∈ C_h) = P(C_h) = 1, we have lim_{n→∞} h(X_n) = h(X) almost surely, and so, by Proposition 3.10, Q_n ⇒ Q in S′.
4 Convergence of Finite Dimensional Distributions
We now get to a key result, due to Prohorov, characterising convergent processes. This states that a sequence of stochastic processes converges if and only if it is relatively compact and the finite dimensional distributions converge.
This is a particularly useful method for checking the convergence of processes where the limit has independent increments, as in this case computing the finite dimensional distributions is relatively straightforward. (See Example 4.4 at the end of the section for an illustration.)
Definition 4.1. Let {X_α} (where α ranges over some index set) be a family of stochastic processes with sample paths in D_E[0,∞), and let {P_α} ⊂ P(D_E[0,∞)) be the family of associated probability distributions (i.e. P_α(B) = P(X_α ∈ B) for all B ∈ S_E, where P is the probability measure on the probability space underlying X_α, and S_E is the Borel σ-algebra of D_E[0,∞)). We say that {X_α} is relatively compact if {P_α} is (i.e. if the closure of {P_α} in P(D_E[0,∞)) is compact).
Lemma 4.2. If X is a process with sample paths in D_E[0,∞), then the complement in [0,∞) of

D(X) = {t ≥ 0 : P(X(t) = X(t−)) = 1}

is at most countable.

Proof. For each ε > 0, δ > 0, and T > 0, let

A(ε, δ, T) = {0 ≤ t ≤ T : P(r(X(t), X(t−)) ≥ ε) ≥ δ}.
Now if A(ε, δ, T) contains a sequence (t_n)_{n≥1} of distinct points, then

P(r(X(t_n), X(t_n−)) ≥ ε infinitely often) = P( ⋂_{n=1}^∞ ⋃_{m=n}^∞ {r(X(t_m), X(t_m−)) ≥ ε} )
 = lim_{n→∞} P( ⋃_{m=n}^∞ {r(X(t_m), X(t_m−)) ≥ ε} )
 ≥ lim sup_{n→∞} P( r(X(t_n), X(t_n−)) ≥ ε )
 ≥ δ > 0,

contradicting the fact that for each x ∈ D_E[0,∞), r(x(t), x(t−)) ≥ ε for at most finitely many t ∈ [0, T] (see the proof of Lemma 2.2). Hence A(ε, δ, T) is finite and so

D(X)^c = {t ≥ 0 : P(r(X(t), X(t−)) > 0) > 0} = ⋃_{n=1}^∞ ⋃_{m=1}^∞ ⋃_{N=1}^∞ A(1/n, 1/m, N)

is at most countable.
Theorem 4.3. Let E be separable and let X_n, n = 1, 2, . . ., and X be processes with sample paths in D_E[0,∞).

(a) If X_n ⇒ X, then

(X_n(t_1), . . . , X_n(t_k)) ⇒ (X(t_1), . . . , X(t_k))   (4.1)

for every finite set {t_1, . . . , t_k} ⊂ D(X). Moreover, for each finite set {t_1, . . . , t_k} ⊂ [0,∞), there exist sequences (t^n_1)_{n≥1} in [t_1,∞), . . . , (t^n_k)_{n≥1} in [t_k,∞) converging to t_1, . . . , t_k, respectively, such that (X_n(t^n_1), . . . , X_n(t^n_k)) ⇒ (X(t_1), . . . , X(t_k)).

(b) If {X_n} is relatively compact and there exists a dense set D ⊂ [0,∞) such that (4.1) holds for every finite set {t_1, . . . , t_k} ⊂ D, then X_n ⇒ X.
Proof. (a) Suppose that X_n ⇒ X. Using the Skorohod Representation (Theorem 3.12), there exists a probability space on which are defined processes Y_n, n = 1, 2, . . ., and Y with sample paths in D_E[0,∞) and with the same distributions as X_n, n = 1, 2, . . ., and X, such that lim_{n→∞} d(Y_n, Y) = 0 almost surely. If {t_1, . . . , t_k} ⊂ D(X) = D(Y), then, using the notation of Proposition 2.11, (π_{t_1}, . . . , π_{t_k}) : D_E[0,∞) → E^k is continuous almost surely with respect to the distribution of Y and so, by the Continuous Mapping Theorem (Corollary 3.13),

lim_{n→∞} (Y_n(t_1), . . . , Y_n(t_k)) = lim_{n→∞} (π_{t_1}, . . . , π_{t_k})(Y_n) = (π_{t_1}, . . . , π_{t_k})(Y) = (Y(t_1), . . . , Y(t_k)) almost surely.

The first conclusion follows by Proposition 3.10.

For the second conclusion, we observe that, by Lemma 4.2, for each finite set {t_1, . . . , t_k} ⊂ [0,∞), there exist sequences (t^n_1)_{n≥1} in [t_1,∞) ∩ D(X), . . . , (t^n_k)_{n≥1} in [t_k,∞) ∩ D(X) converging to t_1, . . . , t_k, respectively. Then, by the above result, (X_n(t^m_1), . . . , X_n(t^m_k)) ⇒ (X(t^m_1), . . . , X(t^m_k)) for each m ∈ N as n → ∞. Since the process X is right continuous, (X(t^m_1), . . . , X(t^m_k)) → (X(t_1), . . . , X(t_k)) almost surely as m → ∞, and so (X_n(t^n_1), . . . , X_n(t^n_k)) ⇒ (X(t_1), . . . , X(t_k)).
(b) Since {X_n} is relatively compact, the closure of {P_n} is compact, and hence every subsequence of (P_n) has a convergent subsequence. It follows that every subsequence of (X_n) has a subsequence converging in distribution, and so it is enough to show that every convergent subsequence of (X_n) converges in distribution to X. Restricting to a subsequence if necessary, suppose that X_n ⇒ Y. We must show that X and Y have the same distribution.
Let {t_1, . . . , t_k} ⊂ D(Y) and f_1, . . . , f_k ∈ C(E), and choose sequences (t^m_1)_{m≥1} in D ∩ [t_1,∞), . . . , (t^m_k)_{m≥1} in D ∩ [t_k,∞) converging to t_1, . . . , t_k, respectively. The map (x_1, . . . , x_k) ↦ ∏_{i=1}^k f_i(x_i) lies in C(E^k), and so (4.1) implies

E( ∏_{i=1}^k f_i(X_n(t^m_i)) ) → E( ∏_{i=1}^k f_i(X(t^m_i)) )

as n → ∞, for each m ≥ 1. Therefore there exist integers n_1 < n_2 < n_3 < . . . such that

| E( ∏_{i=1}^k f_i(X(t^m_i)) ) − E( ∏_{i=1}^k f_i(X_{n_m}(t^m_i)) ) | < 1/m.   (4.2)
Now

| E( ∏_{i=1}^k f_i(X(t_i)) ) − E( ∏_{i=1}^k f_i(Y(t_i)) ) |
 ≤ | E( ∏_{i=1}^k f_i(X(t_i)) ) − E( ∏_{i=1}^k f_i(X(t^m_i)) ) |
 + | E( ∏_{i=1}^k f_i(X(t^m_i)) ) − E( ∏_{i=1}^k f_i(X_{n_m}(t^m_i)) ) |
 + | E( ∏_{i=1}^k f_i(X_{n_m}(t^m_i)) ) − E( ∏_{i=1}^k f_i(X_{n_m}(t_i)) ) |
 + | E( ∏_{i=1}^k f_i(X_{n_m}(t_i)) ) − E( ∏_{i=1}^k f_i(Y(t_i)) ) |

for each m ≥ 1. All four terms on the right tend to zero as m → ∞, the first by the right continuity of X, the second by (4.2), the third by the right continuity of X_{n_m}, and the fourth by (a), using the facts that X_{n_m} ⇒ Y and {t_1, . . . , t_k} ⊂ D(Y). Hence

E( ∏_{i=1}^k f_i(X(t_i)) ) = E( ∏_{i=1}^k f_i(Y(t_i)) )   (4.3)

for all f_1, . . . , f_k ∈ C(E) and all {t_1, . . . , t_k} ⊂ D(Y) (and hence for all {t_1, . . . , t_k} ⊂ [0,∞), by Lemma 4.2 and the right continuity of X and Y). Now let

D = {A ∈ S_E : E(1_{X∈A}) = E(1_{Y∈A})}.

This is clearly a d-system. Also, since 1_A is in the closure of C(E) for all open sets A ⊂ E, (4.3), together with the dominated convergence theorem, implies that D contains the π-system consisting of all finite intersections of {π_t^{−1}(A) : t ∈ [0,∞) and A ⊂ E open}. By Dynkin's π-system Lemma, D contains the σ-algebra generated by the coordinate random variables π_t, and hence, by Proposition 2.11, S_E itself. Thus X and Y have the same distribution.
Example 4.4. Suppose (X_n)_{n≥1} is a relatively compact sequence of processes with sample paths in D_{R^d}[0,∞) having independent increments, and let X be a process in D_{R^d}[0,∞) having independent increments. For simplicity assume that X(0) = X_n(0) = 0 for all n. Now (X_n(t_1), . . . , X_n(t_k)) ⇒ (X(t_1), . . . , X(t_k)) for every finite set {t_1, . . . , t_k} ⊂ D(X) if and only if

E( exp i ∑_{j=1}^k ⟨θ_j, X_n(t_j)⟩ ) → E( exp i ∑_{j=1}^k ⟨θ_j, X(t_j)⟩ )

for every k-tuple (θ_1, . . . , θ_k) ∈ ((R^d)^*)^k. Since X_n has independent increments,

E( exp i ∑_{j=1}^k ⟨θ_j, X_n(t_j)⟩ ) = ∏_{j=1}^k E( exp i ⟨θ′_j, X_n(t_j) − X_n(t_{j−1})⟩ ),

where θ′_j = ∑_{m=j}^k θ_m for j = 1, . . . , k, and t_0 = 0. The same result holds for X and so, if X_n(t) − X_n(s) ⇒ X(t) − X(s) for all s, t, then the finite dimensional distributions of X_n converge in distribution to those of X and hence X_n ⇒ X. Since this condition is clearly necessary, X_n ⇒ X if and only if X_n(t) − X_n(s) ⇒ X(t) − X(s) for all s, t.
5 Relative Compactness in DE[0,∞)
In order to apply Theorem 4.3 in any practical cases, it is necessary to have an understanding of the conditions a family of stochastic processes, or equivalently probability measures, must satisfy to be relatively compact. In this section we establish some necessary and sufficient conditions for relative compactness in D_E[0,∞) which will be useful in later sections.
5.1 Prohorov’s Theorem
Prohorov's Theorem gives a characterisation of the compact subsets of P(S), where (S, d) is the metric space of Section 3, by relating compactness to the notion of tightness.
Definition 5.1. A probability measure P ∈ P(S) is said to be tight if for each ε > 0 there exists a compact set K ⊂ S such that P(K) ≥ 1 − ε.
A family of probability measures M ⊂ P(S) is tight if for each ε > 0 there exists a compact set K ⊂ S such that inf_{P∈M} P(K) ≥ 1 − ε.
Theorem 5.2 (Prohorov's Theorem). Let (S, d) be complete and separable, and let M ⊂ P(S). Then the following are equivalent:
(a) M is tight.
(b) For each ε > 0, there exists a compact set K ⊂ S such that

inf_{P∈M} P(K^ε) ≥ 1 − ε,

where K^ε is defined in Section 3.1.
(c) M is relatively compact.
Before we can prove this, we need two intermediate results.
Theorem 5.3. If S is separable, then P(S) is separable. If in addition (S, d) is complete, then (P(S), ρ) is complete.
Proof. Since S is separable, there exists a countable dense subset {x_1, x_2, . . .} ⊂ S. Let δ_x denote the element of P(S) with unit mass at x ∈ S. Fix x_0 ∈ S. We shall show that the countable set of probability measures of the form ∑_{i=0}^N a_i δ_{x_i}, with N finite, a_i rational and ∑_{i=0}^N a_i = 1, is dense in P(S). Given ε > 0, let P ∈ P(S). Since P(⋃_{i=1}^∞ B(x_i, ε/2)) = 1, there exists some N < ∞ such that P(⋃_{i=1}^N B(x_i, ε/2)) ≥ 1 − ε/2. Set A_i = B(x_i, ε/2) \ ⋃_{j=1}^{i−1} B(x_j, ε/2) for each i ≤ N. Pick some m ∈ N with m ≥ 2N/ε. Let a_i = ⌊mP(A_i)⌋/m ≤ P(A_i) for i = 1, . . . , N, and define a_0 = 1 − ∑_{i=1}^N a_i. Then the a_i are rational and so Q = ∑_{i=0}^N a_i δ_{x_i} is of the required form. If F ⊂ S is closed, then

P(F) ≤ P( ⋃_{F∩A_i≠∅} A_i ) + P( (⋃_{i=1}^N A_i)^c ) ≤ ∑_{F∩A_i≠∅} ⌊mP(A_i)⌋/m + N/m + ε/2 ≤ Q(F^ε) + ε.

Therefore ρ(P, Q) ≤ ε, and so P(S) is separable.
To prove completeness, suppose (P_n)_{n≥1} is a Cauchy sequence in P(S). By restricting to a subsequence if necessary, we may assume that ρ(P_{n−1}, P_n) < 2^{−n} for each n ≥ 2. As in the proof of separability, for each n = 2, 3, . . . there exist some N_n < ∞ and disjoint sets E^{(n)}_1, . . . , E^{(n)}_{N_n} ∈ B(S) with diameters less than 2^{−n} such that P_{n−1}(E^{(n)}_0) ≤ 2^{−n}, where E^{(n)}_0 = S \ ⋃_{i=1}^{N_n} E^{(n)}_i. By applying Lemma 3.11 successively for n = 2, 3, . . ., with P = P_{n−1}, Q = P_n, ε = δ = 2^{−n} and N = N_n, there exists a probability space (Ω, F, P) on which are defined S-valued random variables Y^{(n)}_0, . . . , Y^{(n)}_{N_n}, for n = 2, 3, . . ., [0, 1]-valued random variables ξ^{(n)}, n = 2, 3, . . ., and an S-valued random variable X_1 with distribution P_1, such that if the constants c^{(n)}_1, . . . , c^{(n)}_{N_n} ∈ [0, 1] are appropriately chosen, then the random variable

X_n = Y^{(n)}_i on {X_{n−1} ∈ E^{(n)}_i, ξ^{(n)} ≥ c^{(n)}_i}, i = 1, . . . , N_n,
X_n = Y^{(n)}_0 on {X_{n−1} ∈ E^{(n)}_0} ∪ ⋃_{i=1}^{N_n} {X_{n−1} ∈ E^{(n)}_i, ξ^{(n)} < c^{(n)}_i}

has distribution P_n, and

P(d(X_{n−1}, X_n) ≥ 2^{−n+1}) ≤ 2^{−n+1}.

Then ∑_{n=2}^∞ P(d(X_{n−1}, X_n) ≥ 2^{−n+1}) < ∞ and, by the Borel–Cantelli Lemma, P(d(X_{n−1}, X_n) ≥ 2^{−n+1} infinitely often) = 0. Hence

P( ∑_{n=2}^∞ d(X_{n−1}, X_n) < ∞ ) = 1.

Since (S, d) is complete, lim_{n→∞} X_n exists on this set. Setting X to be the value of the limit where it exists, and equal to some fixed point of S otherwise, lim_{n→∞} X_n = X almost surely and so, by Proposition 3.10, lim_{n→∞} ρ(P_n, P) = 0, where P is the distribution of X.
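The separability half of the proof rests on a simple discretisation: round each mass P(A_i) down to a multiple of 1/m and place the leftover on a fixed point x_0. A small sketch of just that step (helper name ours), which keeps every weight rational and the total equal to one:

```python
import numpy as np

def rational_weights(masses, m):
    """Round the masses P(A_i) down to multiples of 1/m and assign the
    leftover weight a_0 to a fixed point x_0, as in the separability
    proof for P(S)."""
    a = np.floor(m * np.asarray(masses)) / m
    return 1.0 - a.sum(), a

# three sets of mass 0.33, 0.45, 0.20 (mass 0.02 lies outside all of them)
a0, a = rational_weights([0.33, 0.45, 0.20], m=50)
# every a_i is a multiple of 1/m with a_i <= P(A_i), and a_0 + sum a_i = 1
```

Since each a_i undershoots P(A_i) by less than 1/m, the total mass moved to x_0 is at most N/m plus the mass outside the A_i, which is what the estimate P(F) ≤ Q(F^ε) + ε exploits.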
Lemma 5.4. If (S, d) is complete and separable, then each P ∈ P(S) is tight.

Proof. Let x_1, x_2, . . . be a dense subset of S, and let P ∈ P(S). Then P(⋃_{k=1}^∞ B(x_k, 1/n)) = 1 for each n and so, given ε > 0, there exist integers N_1, N_2, . . . such that

P( ⋃_{k=1}^{N_n} B(x_k, 1/n) ) ≥ 1 − ε/2^n

for n = 1, 2, . . .. Let K be the closure of ⋂_{n≥1} ⋃_{k=1}^{N_n} B(x_k, 1/n). Then for each δ > 0, K can be covered by N_n balls of radius δ, where n > 1/δ. Therefore K is totally bounded and hence compact, and

P(K) ≥ 1 − ∑_{n=1}^∞ [ 1 − P( ⋃_{k=1}^{N_n} B(x_k, 1/n) ) ] ≥ 1 − ∑_{n=1}^∞ ε/2^n = 1 − ε.
Proof of Theorem 5.2.
(a ⇒ b) Immediate.
(b ⇒ c) By Theorem 5.3, (P(S), ρ) is complete and hence the closure of M is complete. So it suffices to show that M is totally bounded, i.e. given δ > 0, there exists a finite set N ⊂ P(S) such that M ⊂ ⋃_{P∈N} {Q : ρ(P, Q) < δ}.

Let 0 < ε < δ/2. Then there exists a compact set K ⊂ S such that (b) holds. By the compactness of K, there exists a finite set {x_1, . . . , x_n} ⊂ K such that K^ε ⊂ ⋃_{i=1}^n B_i, where B_i = B(x_i, 2ε). Fix x_0 ∈ S and an integer m ≥ n/ε, and let N be the finite collection of probability measures of the form

P = ∑_{i=0}^n (k_i/m) δ_{x_i},   (5.1)

where the k_i are integers with 0 ≤ k_i ≤ m and ∑_{i=0}^n k_i = m.

Given Q ∈ M, let k_i = ⌊mQ(E_i)⌋ for i = 1, 2, . . . , n, where E_i = B_i \ ⋃_{j=1}^{i−1} B_j, and let k_0 = m − ∑_{i=1}^n k_i. Then, defining P by (5.1), we have

Q(F) ≤ Q( ⋃_{F∩E_i≠∅} E_i ) + Q((K^ε)^c) ≤ ∑_{F∩E_i≠∅} ⌊mQ(E_i)⌋/m + n/m + ε ≤ P(F^{2ε}) + 2ε

for all closed sets F ⊂ S. So ρ(P, Q) ≤ 2ε < δ, as required.
(c ⇒ a) Let ε > 0. Since M is relatively compact, it is totally bounded and hence, for each n ∈ N, there exists a finite subset N_n ⊂ M such that M ⊂ ⋃_{P∈N_n} {Q : ρ(P, Q) < ε/2^{n+1}}. Since N_n is finite, by Lemma 5.4, for each n ∈ N we can choose a compact set K_n ⊂ S such that P(K_n) ≥ 1 − ε/2^{n+1} for all P ∈ N_n. Given Q ∈ M, for each n ∈ N, there exists P_n ∈ N_n such that

Q(K_n^{ε/2^{n+1}}) ≥ P_n(K_n) − ε/2^{n+1} ≥ 1 − ε/2^n.

Letting K be the closure of ⋂_{n≥1} K_n^{ε/2^{n+1}}, K is compact and

Q(K) ≥ 1 − ∑_{n=1}^∞ ε/2^n = 1 − ε.
5.2 Compact Sets in DE [0,∞)
To apply Prohorov's Theorem (Theorem 5.2) to P(D_E[0,∞)), it is necessary to have a characterisation of the compact sets of D_E[0,∞). We first give conditions under which a collection of step functions has compact closure.
Definition 5.5. Given a step function x ∈ D_E[0,∞), define s_0(x) = 0 and, for k = 1, 2, . . ., define s_k(x) = inf{t > s_{k−1}(x) : x(t) ≠ x(t−)} if s_{k−1}(x) < ∞, and s_k(x) = ∞ if s_{k−1}(x) = ∞.
Lemma 5.6. For Γ ⊂ E compact and δ > 0, define A(Γ, δ) to be the set of step functions x ∈ D_E[0,∞) such that x(t) ∈ Γ for all t ≥ 0, and s_k(x) − s_{k−1}(x) > δ for each k ≥ 1 for which s_{k−1}(x) < ∞. Then the closure of A(Γ, δ) is compact.
Proof. It is enough to show that every sequence in A(Γ, δ) has a convergent subsequence. Suppose (x_n)_{n≥1} is a sequence in A(Γ, δ). Either there exists a subsequence (x^{1a}_n)_{n≥1} such that s_1(x^{1a}_n) < ∞ for all n, or there exists a subsequence (x^1_n)_{n≥1} such that s_1(x^1_n) = ∞ for all n. In the first case, there exists a subsequence (x^{1b}_n)_{n≥1} of (x^{1a}_n)_{n≥1} such that, for some t_1 ∈ [δ, ∞], lim_{n→∞} s_1(x^{1b}_n) = t_1. If t_1 < ∞, we can insist further that |log(t_1 / s_1(x^{1b}_n))| < 1/n. Since Γ is compact, the sequence (x^{1b}_n)_{n≥1} has a subsequence (x^1_n)_{n≥1} such that lim_{n→∞} x^1_n(s_1(x^1_n)) = α_1 for some α_1. In this way, we can construct successive subsequences (x^1_n)_{n≥1}, (x^2_n)_{n≥1}, . . ., each a subsequence of the one before, such that, for k = 1, 2, . . ., either

(a) s_k(x^k_n) < ∞ for all n, there exists some t_k ∈ [kδ, ∞] such that lim_{n→∞} s_k(x^k_n) = t_k, and if t_k < ∞, |log((t_k − t_{k−1}) / (s_k(x^k_n) − s_{k−1}(x^k_n)))| < 1/n, and lim_{n→∞} x^k_n(s_k(x^k_n)) = α_k for some α_k, or

(b) s_k(x^k_n) = ∞ for all n.

Let (y_m)_{m≥1} be the diagonal subsequence of (x_n)_{n≥1} defined by y_m = x^m_m, and define y ∈ D_E[0,∞) by y(t) = α_k for t_k ≤ t < t_{k+1}, k = 0, 1, . . ., where we take t_0 = 0. Since s_k(y_m) − s_{k−1}(y_m) > δ for each k ≥ 1 and each m for which s_{k−1}(y_m) < ∞, we may define λ_m ∈ Λ to be the piecewise linear function that joins (s_{k−1}(y_m), t_{k−1}) to (s_k(y_m), t_k) if t_k < ∞, and which has gradient 1 otherwise. Then

γ(λ_m) = sup_k |log((t_k − t_{k−1}) / (s_k(y_m) − s_{k−1}(y_m)))| < 1/m → 0,

and, for each t, there exists a k such that r(y_m(t), y(λ_m(t))) = r(y_m(s_k(y_m)), y(t_k)) → 0. Thus y_m → y by Proposition 2.6.
The conditions for compactness will be stated in terms of the following modulus of continuity.
Definition 5.7. For x ∈ D_E[0,∞), δ > 0, and T > 0, define

w′(x, δ, T) = inf_{{t_i}} max_i sup_{s,t∈[t_{i−1},t_i)} r(x(s), x(t)),

where {t_i} ranges over all partitions of the form 0 = t_0 < t_1 < · · · < t_{n−1} < T ≤ t_n with min_{1≤i≤n}(t_i − t_{i−1}) > δ and n ≥ 1. (The initially strange looking condition t_{n−1} < T ≤ t_n allows us to not have to worry about the length of the final interval, in a partition of [0, T], being smaller than δ. For example, partitions where each interval is the same length, but this length does not divide T, are admissible.)
Note that w′(x, δ, T) is non-decreasing in δ and in T, and that

w′(x, δ, T) ≤ w′(y, δ, T) + 2 sup_{0≤s<T+δ} r(x(s), y(s)).
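For a real-valued step function, the inner quantity in Definition 5.7 is easy to evaluate for any given partition; w′ itself is then the infimum over admissible partitions. A small sketch with r(a, b) = |a − b| (helper names ours): a partition whose points include every jump time gives oscillation zero, while a partition that buries a jump inside an interval picks up the jump size.

```python
def value_at(steps, t):
    """Value at time t of the cadlag step function given as (jump_time,
    value) pairs, with steps[0] = (0.0, initial value)."""
    v = steps[0][1]
    for s, val in steps:
        if s <= t:
            v = val
    return v

def oscillation(steps, partition):
    """max_i sup_{s,t in [t_{i-1}, t_i)} r(x(s), x(t)) for a real-valued
    step function with r(a, b) = |a - b|: the quantity inside the
    infimum defining w'."""
    worst = 0.0
    for a, b in zip(partition, partition[1:]):
        vals = [value_at(steps, a)] + [v for s, v in steps if a < s < b]
        worst = max(worst, max(vals) - min(vals))
    return worst

# jumps at t = 1 and t = 2: aligning partition points with the jumps gives
# oscillation 0, while an interval straddling the jump at t = 1 sees its size
steps = [(0.0, 0.0), (1.0, 5.0), (2.0, 1.0)]
aligned = oscillation(steps, [0.0, 1.0, 2.0, 3.0])
straddling = oscillation(steps, [0.0, 1.5, 3.0])
```

This is exactly why w′(x, δ, T) vanishes as δ → 0 for cadlag x: once δ is smaller than the gap between consecutive large jumps, a partition separating them is admissible.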
Before we can characterise the compact sets, we need to establish a few properties of w′(x, δ, T).
Lemma 5.8. (a) For each x ∈ D_E[0,∞) and T > 0, w′(x, δ, T) is right continuous in δ, and

lim_{δ→0} w′(x, δ, T) = 0.
(b) If (x_n)_{n≥1} is a sequence in D_E[0,∞) and lim_{n→∞} d(x_n, x) = 0, then

lim sup_{n→∞} w′(x_n, δ, T) ≤ w′(x, δ, T + ε)

for every δ > 0, T > 0, and ε > 0.
(c) For each δ > 0 and T > 0, w′(x, δ, T ) is Borel measurable in x.
Proof. (a) The right continuity follows from the fact that any partition 0 = t_0 < t_1 < · · · < t_{n−1} < T ≤ t_n with min_{1≤i≤n}(t_i − t_{i−1}) > δ and n ≥ 1 also satisfies min_{1≤i≤n}(t_i − t_{i−1}) > δ′ for δ′ = (δ + min_{1≤i≤n}(t_i − t_{i−1}))/2 > δ.

Let N ∈ N and define τ^N_0 = 0 and, for k = 1, 2, . . .,

τ^N_k = inf{ t > τ^N_{k−1} : r(x(t), x(τ^N_{k−1})) > 1/N }

if τ^N_{k−1} < ∞, and τ^N_k = ∞ if τ^N_{k−1} = ∞. Note that the sequence (τ^N_k)_{k≥0} is strictly increasing (as long as its terms remain finite) by the right continuity of x, and for 0 < δ < min{τ^N_{k+1} − τ^N_k : τ^N_k < T},

w′(x, δ, T) ≤ max_i sup_{s,t∈[τ^N_i, τ^N_{i+1})} ( r(x(s), x(τ^N_i)) + r(x(τ^N_i), x(t)) ) ≤ 2/N.

Hence lim_{δ→0} w′(x, δ, T) = 0.
(b) Let (x_n)_{n≥1} be a sequence in D_E[0,∞) and x ∈ D_E[0,∞). Suppose δ > 0 and T > 0. If lim_{n→∞} d(x_n, x) = 0, then by Proposition 2.6 there exists a sequence (λ_n)_{n≥1} in Λ such that (2.1) and (2.9) hold when T is replaced by T + δ. For each n, let y_n(t) = x(λ_n(t)) for all t ≥ 0, and let δ_n = sup_{0≤t≤T}(λ_n(t + δ) − λ_n(t)). Then, for every ε > 0,

lim sup_{n→∞} w′(x_n, δ, T) ≤ lim sup_{n→∞} w′(y_n, δ, T) + 2 lim sup_{n→∞} sup_{0≤s<T+δ} r(x_n(s), x(λ_n(s)))
 ≤ lim sup_{n→∞} w′(x, δ_n, λ_n(T))
 ≤ lim sup_{n→∞} w′(x, δ_n ∨ δ, T + ε)
 = w′(x, δ, T + ε),

where the first inequality follows from the remark after Definition 5.7, the second follows by substituting λ_n(s), λ_n(t) for s, t in the definition of w′ and by w′ being non-decreasing in δ, the third follows by w′ being non-decreasing in δ and T and by λ_n(T) → T, and the equality follows by w′ being right continuous in δ (part (a)).
(c) Define w′(x, δ, T+) = lim_{ε↓0} w′(x, δ, T + ε). This exists by the monotonicity of w′ in T. Then if x_n → x, by (b),

lim sup_{n→∞} w′(x_n, δ, T+) ≤ lim_{ε↓0} w′(x, δ, T + 2ε) = w′(x, δ, T+).

So w′(x, δ, T+) is upper semicontinuous and hence Borel measurable in x. The result follows by the observation that w′(x, δ, T) = lim_{ε↓0} w′(x, δ, (T − ε)+) for every x ∈ D_E[0,∞).
Theorem 5.9. Let (E, r) be complete. Then the closure of A ⊂ D_E[0,∞) is compact if and only if the following two conditions hold:

(a) For every rational t ≥ 0, there exists a compact set Γ_t ⊂ E such that x(t) ∈ Γ_t for all x ∈ A.

(b) For each T > 0,

lim_{δ→0} sup_{x∈A} w′(x, δ, T) = 0.
Proof. Suppose A satisfies (a) and (b), and let l ≥ 1. Choose δ_l ∈ (0, 1) such that

sup_{x∈A} w′(x, δ_l, l) ≤ 1/l

and m_l ≥ 2 such that 1/m_l < δ_l. Define Γ^{(l)} = ⋃_{i=0}^{(l+1)m_l} Γ_{i/m_l}. Using the notation of Lemma 5.6, let A^l = A(Γ^{(l)}, δ_l).

Given x ∈ A, there is a partition 0 = t_0 < t_1 < · · · < t_{n−1} < l ≤ t_n < l + 1 < t_{n+1} = ∞ with min_{1≤i≤n}(t_i − t_{i−1}) > δ_l such that

max_{1≤i≤n} sup_{s,t∈[t_{i−1},t_i)} r(x(s), x(t)) ≤ 2/l.

Define x′ ∈ A^l by x′(t) = x((⌊m_l t_i⌋ + 1)/m_l) for t_i ≤ t < t_{i+1}, i = 0, 1, . . . , n. Then, by the definition of m_l, t_i ≤ (⌊m_l t_i⌋ + 1)/m_l ≤ t_i + 1/m_l < t_{i+1}, and sup_{0≤t<l} r(x′(t), x(t)) ≤ 2/l. Hence

d(x′, x) ≤ ∫_0^∞ e^{−u} sup_{t≥0} r(x′(t ∧ u), x(t ∧ u)) ∧ 1 du ≤ 2/l + e^{−l} < 3/l,

and so A ⊂ (A^l)^{3/l}. Now l was arbitrary, and so A ⊂ ⋂_{l≥1} (A^l)^{3/l}. By Lemma 5.6, the closure of A^l is compact for each l ≥ 1, and hence A is totally bounded. It follows that A has compact closure, as required.
Conversely, suppose that A has compact closure. For each rational t ≥ 0, define Γ_t ⊂ E to be the closure of A_t = {x(t) : x ∈ A}. In order to show that Γ_t is compact, it suffices to show that every sequence in A_t has a convergent subsequence. Suppose (x_n(t))_{n≥1} is a sequence in A_t. Since A has compact closure, by restricting to a subsequence if necessary, we may assume that x_n → x for some x ∈ D_E[0,∞). By Proposition 2.6, there exists a sequence (λ_n)_{n≥1} in Λ such that (2.1) and (2.9) hold. There is a subsequence (λ_{n_r}) such that either λ_{n_r}(t) ≥ t for all n_r, or λ_{n_r}(t) < t for all n_r. In the first case, r(x_{n_r}(t), x(t)) ≤ r(x_{n_r}(t), x(λ_{n_r}(t))) + r(x(λ_{n_r}(t)), x(t)), the first term of which converges to 0 by (2.9) and the second term of which converges to 0 by (2.4) and the right continuity of x. In the second case, r(x_{n_r}(t), x(t−)) ≤ r(x_{n_r}(t), x(λ_{n_r}(t))) + r(x(λ_{n_r}(t)), x(t−)), which converges to 0 similarly. Hence (x_n(t))_{n≥1} has a convergent subsequence, and (a) holds.
To see that (b) holds, suppose there exist η > 0, T > 0, and a sequence (x_n)_{n≥1} in A such that w′(x_n, 1/n, T) ≥ η for all n. Since A has compact closure, we may assume that lim_{n→∞} d(x_n, x) = 0 for some x ∈ D_E[0,∞). By Lemma 5.8(b),

η ≤ lim sup_{n→∞} w′(x_n, δ, T) ≤ w′(x, δ, T + 1)

for all δ > 0. Letting δ → 0, by Lemma 5.8(a), the right hand side tends to zero, resulting in a contradiction. Hence (b) holds.
5.3 Some Useful Criteria
We now combine the above characterisation with Prohorov's Theorem (Theorem 5.2) to obtain some useful criteria for relative compactness in D_E[0,∞).
Theorem 5.10. Let (E, r) be complete and separable, and let {X_α} be a family of stochastic processes with sample paths in D_E[0,∞). Then {X_α} is relatively compact if and only if the following two conditions hold:
(a) For every η > 0 and rational t ≥ 0, there exists a compact set Γ_{η,t} ⊂ E such that

inf_α P( X_α(t) ∈ Γ^η_{η,t} ) ≥ 1 − η.

(b) For every η > 0 and T > 0, there exists δ > 0 such that

sup_α P( w′(X_α, δ, T) ≥ η ) ≤ η.
(Note that, by Lemma 5.8, w′(x, δ, T) is Borel measurable in x, and so the event {w′(X_α, δ, T) ≥ η} is measurable in the probability space underlying X_α. Hence it is legitimate to refer to its probability.)
Proof. If {X_α} is relatively compact, then since (D_E[0,∞), d) is complete and separable (Theorem 2.10), we may apply Prohorov's Theorem (Theorem 5.2) to find a compact set K_η ⊂ D_E[0,∞) such that inf_α P(X_α ∈ K_η) ≥ 1 − η. Since K_η is compact, by Theorem 5.9, for every rational t ≥ 0 there exists a compact set Γ_{η,t} such that x(t) ∈ Γ_{η,t} for all x ∈ K_η, i.e.

inf_α P(X_α(t) ∈ Γ_{η,t}) ≥ inf_α P(X_α ∈ K_η) ≥ 1 − η.

Also, by Theorem 5.9, lim_{δ→0} sup_{x∈K_η} w′(x, δ, T) = 0 and so, for each T > 0, there exists δ > 0 such that sup_{x∈K_η} w′(x, δ, T) < η. Then

sup_α P(w′(X_α, δ, T) ≥ η) ≤ sup_α P(X_α ∉ K_η) ≤ 1 − inf_α P(X_α ∈ K_η) ≤ η.

So (a) and (b) hold, and in fact Γ^η_{η,t} can be replaced by Γ_{η,t} in (a).
Conversely, let ε > 0, let T be a positive integer such that e−T < ε2 , and choose δ > 0 such
that (b) holds with η = ε4 . Let m > 1
δ and set Γ =⋃mT
i=0 Γε2−i−2, im
. Then
infα
P(
Xα(i/m) ∈ Γε4 , i = 0, 1, . . . ,mT
)
≥ 1 −mT∑
i=0
(
1 − infαP(
Xα(i/m) ∈ Γε2−i−2
ε2−i−2, im
))
≥ 1 −mT∑
i=0
ε2−i−2
≥ 1 − ε
2.
Using the notation of Lemma 5.6, let $A = A(\Gamma, \delta)$. By the lemma, $A$ has compact closure. Given $x \in D_E[0,\infty)$ with $w'(x, \delta, T) < \varepsilon/4$ and $x(i/m) \in \Gamma^{\varepsilon/4}$ for $i = 0, 1, \dots, mT$, by the definition of $w'$ (Definition 5.7) there exists a partition $0 = t_0 < t_1 < \dots < t_{n-1} < T \leq t_n$ such that $\min_{1 \leq i \leq n}(t_i - t_{i-1}) > \delta$ and
\[
\max_{1 \leq i \leq n} \sup_{s, t \in [t_{i-1}, t_i)} r(x(s), x(t)) < \frac{\varepsilon}{4},
\]
and there exist $y_i \in \Gamma$ such that $r(x(i/m), y_i) < \varepsilon/4$ for $i = 0, 1, \dots, mT$. Define $x' \in A$ by
\[
x'(t) =
\begin{cases}
y_{\lfloor m t_{i-1} \rfloor + 1}, & t_{i-1} \leq t < t_i,\ i = 1, \dots, n-1, \\
y_{\lfloor m t_{n-1} \rfloor + 1}, & t \geq t_{n-1}.
\end{cases}
\]
Then since $m > 1/\delta$, for each $i = 1, \dots, n$, $t_{i-1} \leq (\lfloor m t_{i-1} \rfloor + 1)/m < t_i$, and so if $t_{i-1} \leq t < t_i \wedge T$, then
\[
r(x(t), x'(t)) \leq r\big(x(t), x((\lfloor m t_{i-1} \rfloor + 1)/m)\big) + r\big(x((\lfloor m t_{i-1} \rfloor + 1)/m), y_{\lfloor m t_{i-1} \rfloor + 1}\big) < \frac{\varepsilon}{2},
\]
and hence $d(x, x') \leq \varepsilon/2 + e^{-T} < \varepsilon$, implying that $x \in A^\varepsilon$. Consequently,
\[
\inf_\alpha P(X_\alpha \in A^\varepsilon)
\geq \inf_\alpha P\big(X_\alpha(i/m) \in \Gamma^{\varepsilon/4},\ i = 0, 1, \dots, mT \text{ and } w'(X_\alpha, \delta, T) < \tfrac{\varepsilon}{4}\big)
\geq 1 - \Big(\frac{\varepsilon}{2} + \frac{\varepsilon}{4}\Big)
> 1 - \varepsilon.
\]
Since Theorem 2.10 implies $(D_E[0,\infty), d)$ is complete and separable, by Prohorov's Theorem (Theorem 5.2), $\{X_\alpha\}$ is relatively compact.
Corollary 5.11. Let $(E, r)$ be complete and separable, and let $(X_n)_{n \geq 1}$ be a sequence of stochastic processes with sample paths in $D_E[0,\infty)$. Then $(X_n)_{n \geq 1}$ is relatively compact if and only if the following two conditions hold:

(a) For every $\eta > 0$ and rational $t \geq 0$, there exists a compact set $\Gamma_{\eta,t} \subset E$ such that
\[
\liminf_{n \to \infty} P\big(X_n(t) \in \Gamma_{\eta,t}^\eta\big) \geq 1 - \eta.
\]

(b) For every $\eta > 0$ and $T > 0$, there exists $\delta > 0$ such that
\[
\limsup_{n \to \infty} P\big(w'(X_n, \delta, T) \geq \eta\big) \leq \eta.
\]
Proof. The conditions are necessary as an immediate consequence of Theorem 5.10.
For the sufficiency, fix $\eta > 0$, rational $t \geq 0$, and $T > 0$. For every $n \in \mathbb{N}$, by Lemma 5.4 there exists a compact set $\Gamma_n \subset E$ such that $P(X_n(t) \in \Gamma_n^\eta) \geq 1 - \eta$, and by Lemma 5.8(a) there exists $\delta_n > 0$ such that $P(w'(X_n, \delta_n, T) \geq \eta) \leq \eta$. By conditions (a) and (b) there exist a compact set $\Gamma_0 \subset E$, $\delta_0 > 0$, and a positive integer $n_0$ such that
\[
\inf_{n \geq n_0} P(X_n(t) \in \Gamma_0^\eta) \geq 1 - \eta
\quad\text{and}\quad
\sup_{n \geq n_0} P(w'(X_n, \delta_0, T) \geq \eta) \leq \eta.
\]
We can replace $n_0$ by $1$ in the above relations if we replace $\Gamma_0$ by $\Gamma = \bigcup_{n=0}^{n_0-1} \Gamma_n$ and $\delta_0$ by $\delta = \bigwedge_{n=0}^{n_0-1} \delta_n$. The result follows by Theorem 5.10.
6 A Law of Large Numbers
We now change course slightly to discuss a generalisation of the Law of Large Numbers to a sequence of Markov jump processes. We shall use this result in the next section, when we apply the theory we have built up so far to establishing a generalisation of the Central Limit Theorem to Markov processes.
The Law of Large Numbers essentially demonstrates that the average behaviour of a sequence of independent identically distributed random variables with finite means becomes deterministic as the number of random variables becomes large. By viewing these random variables as the jump sizes in a random walk, the value of
\[
\frac{X_1 + X_2 + \dots + X_N}{N}
\]
can be regarded as the position of the random walk at time 1 if the jump rate is increased and the jump size is decreased, each by a factor of $N$ (see Figure 1). In this setting, the Law of Large Numbers can be interpreted as saying that in the limit as $N \to \infty$ the position of a random walk at time 1 becomes deterministic if the jump rate is increased, and the jump size is decreased, each by a factor of $N$.
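This rescaling is easy to check numerically. The sketch below is an illustration only (the uniform jump distribution is an arbitrary choice, not taken from the text): it takes $N$ jumps of size $X_i/N$ and prints the position at time 1, which concentrates at the mean jump size as $N$ grows.

```python
import random

random.seed(0)

def scaled_walk_position(N, jump_sampler):
    """Position at time 1 of a walk making N jumps of size jump_sampler()/N,
    i.e. (X_1 + ... + X_N)/N."""
    return sum(jump_sampler() for _ in range(N)) / N

# Jumps uniform on {0, 1, 2}, so the mean jump size is 1.
jump = lambda: random.choice([0, 1, 2])
for N in (10, 100, 10000):
    print(N, scaled_walk_position(N, jump))
```

For small $N$ the printed position fluctuates visibly around 1; for $N = 10000$ it is within a few hundredths of the mean.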
Now suppose that instead of a random walk we have a general pure jump Markov process (defined below). We generalise the Law of Large Numbers by showing that, under certain conditions, if we alter the jump rate and jump size as described above then, in the limit as $N \to \infty$, the position of the jump process at time $t$ is deterministic for all $t$. This deterministic limit is known as the fluid limit.
28
Figure 1: Illustration of the scaling process for N = 3
6.1 Preliminaries
We start with a few definitions.
Definition 6.1. A stochastic process $X = (X_t)_{t \geq 0}$ taking values in a subset $I$ of $\mathbb{R}^d$ is a pure jump process if there exist some (possibly infinite) random times $0 = J_0 < J_1 < J_2 < \dots$, $J_n \uparrow \zeta$ (where $J_n = \infty$ is allowed), and some $I$-valued process $(Y_n)_{n \in \mathbb{Z}^+}$ such that
\[
X_t =
\begin{cases}
Y_n & \text{if } J_n \leq t < J_{n+1}, \\
\partial & \text{if } t \geq \zeta,
\end{cases}
\]
where $\zeta$ (which may be infinite) is the explosion time of the process and $\partial$ is some cemetery state.

$X$ is a pure jump Markov process with Lévy kernel $K$ if it is a pure jump process and, for all $n \in \mathbb{N}$,
\[
P(J_n \in dt,\ \Delta X_{J_n} \in dy \mid J_n > t,\ X_{J_{n-1}} = x) = K(x, dy)\,dt,
\]
where $\Delta X_{J_n} = X_{J_n} - X_{J_{n-1}}$ is the displacement of the $n$th jump.
Definition 6.2. The jump measure $\mu$ of $X$ on $(0,\infty) \times \mathbb{R}^d$ is given by
\[
\mu = \sum_{n=1}^\infty \delta_{(J_n,\, \Delta X_{J_n})},
\]
where $\delta_{(t,y)}$ denotes the unit mass at $(t, y)$, i.e. $\int f \, d\mu = \sum_{n=1}^\infty f(J_n, \Delta X_{J_n})$.

We also introduce the random measure $\nu$ on $(0,\infty) \times \mathbb{R}^d$, given by
\[
\nu(dt, dy) = K(X_{t-}, dy)\,dt.
\]
Definition 6.3. The Laplace transform of a pure jump Markov process with Lévy kernel $K$ is given by
\[
m(x, \theta) = \int_{\mathbb{R}^d} e^{\langle \theta, y \rangle} K(x, dy).
\]
We assume that, for some $\eta_0 > 0$,
\[
\sup_{x \in I} \sup_{|\theta| \leq \eta_0} m(x, \theta) \leq C < \infty. \tag{6.1}
\]
The conditions for the fluid limit to exist will be formulated in terms of the Laplace transform, so before we state the main theorem we establish a few preliminary results.

Fix $\eta \in (0, \eta_0)$. We establish bounds on $m''(x, \theta)$ and $m'''(x, \theta)$ for $|\theta| \leq \eta$ in the lemma below, where $'$ denotes differentiation in $\theta$. Although we only use the first bound in the proof of the fluid limit, the second bound will be useful in the next section and it is convenient to prove them together.

Lemma 6.4. There exist $A < \infty$ and $B < \infty$ such that
\[
|m''(x, \theta)| \leq A \quad\text{and}\quad |m'''(x, \theta)| \leq B \tag{6.2}
\]
for all $x \in I$ and $|\theta| \leq \eta$.
Proof. Set $\delta = \eta_0 - |\theta| \geq \eta_0 - \eta > 0$. Note that $(\delta y)^2 \leq e^{\delta y} + e^{-\delta y}$ for all $y \in \mathbb{R}$. Then
\[
|y|^2 = (y_1)^2 + \dots + (y_d)^2 \leq \delta^{-2} \sum_{i=1}^d \big(e^{\delta y_i} + e^{-\delta y_i}\big)
\]
and so
\[
\int_{\mathbb{R}^d} |y|^2 e^{\langle \theta, y \rangle} K(x, dy)
\leq \delta^{-2} \sum_{i=1}^d \Big( \int_{\mathbb{R}^d} e^{\delta y_i} e^{\langle \theta, y \rangle} K(x, dy) + \int_{\mathbb{R}^d} e^{-\delta y_i} e^{\langle \theta, y \rangle} K(x, dy) \Big)
= \delta^{-2} \sum_{i=1}^d \Big( \int_{\mathbb{R}^d} e^{\langle \theta + \delta e_i, y \rangle} K(x, dy) + \int_{\mathbb{R}^d} e^{\langle \theta - \delta e_i, y \rangle} K(x, dy) \Big)
\leq \frac{2Cd}{\delta^2},
\]
since $|\theta \pm \delta e_i| \leq |\theta| + \delta = \eta_0$. We may now differentiate twice under the integral sign to see that $m''(x, \theta)$ exists ($m'(x, \theta)$ exists since $|y| \leq \frac{1}{2}(|y|^2 + 1)$). Then for $|u|, |v| \leq 1$,
\[
\langle u, m''(x, \theta) v \rangle = \int_{\mathbb{R}^d} \langle u, y \rangle \langle y, v \rangle e^{\langle \theta, y \rangle} K(x, dy)
\leq \int_{\mathbb{R}^d} |u| |y|^2 |v| e^{\langle \theta, y \rangle} K(x, dy)
\leq \frac{2Cd}{\delta^2} \leq \frac{2Cd}{(\eta_0 - \eta)^2}.
\]
So, setting $A = \frac{2Cd}{(\eta_0 - \eta)^2}$ gives the required result.
Now, with $\delta$ as above, $(\delta y)^2 \leq \frac{9}{4}\big(e^{\frac{2}{3}\delta y} + e^{-\frac{2}{3}\delta y}\big)$ and so
\[
|y|^3 = \big((y_1)^2 + \dots + (y_d)^2\big)^{\frac{3}{2}}
\leq \frac{27}{8} \delta^{-3} \Big( \sum_{i=1}^d \big(e^{\frac{2}{3}\delta y_i} + e^{-\frac{2}{3}\delta y_i}\big) \Big)^{\frac{3}{2}}
\leq \frac{27}{8} \delta^{-3} (2d)^{\frac{1}{2}} \sum_{i=1}^d \big(e^{\delta y_i} + e^{-\delta y_i}\big),
\]
where the last inequality follows by Jensen's Inequality and the concavity of $x^{\frac{2}{3}}$. Then, as above,
\[
\int_{\mathbb{R}^d} |y|^3 e^{\langle \theta, y \rangle} K(x, dy)
\leq \frac{27}{8} \delta^{-3} (2d)^{\frac{1}{2}} \cdot 2Cd
\leq \frac{27 C d^{\frac{3}{2}}}{2\sqrt{2}\,(\eta_0 - \eta)^3},
\]
and so, by the same reasoning as above, $m'''(x, \theta)$ exists, and taking $B = \frac{27 C d^{3/2}}{2\sqrt{2}\,(\eta_0 - \eta)^3}$ gives the required result.
Proposition 6.5. Let $a : \Omega \times (0,\infty) \times \mathbb{R}^d \to \mathbb{R}$ be a previsible process satisfying
\[
E \int_0^t \int_{\mathbb{R}^d} |a(s, y)| \, \nu(ds, dy) < \infty
\]
for all $t$. Then the following process is a martingale:
\[
\int_0^t \int_{\mathbb{R}^d} a(s, y) \, (\mu - \nu)(ds, dy).
\]
Proof. This result is well known.
In particular, by the proof of Lemma 6.4, $|y| \leq \frac{1}{2}(|y|^2 + 1) \leq \frac{1}{2}\big(\eta_0^{-2} \sum_{i=1}^d (e^{\langle \eta_0 e_i, y \rangle} + e^{\langle -\eta_0 e_i, y \rangle}) + 1\big)$, and so (6.1) gives
\[
E \int_0^t \int_{\mathbb{R}^d} |y| \, \nu(ds, dy) \leq \frac{1}{2}\Big(\frac{2d}{\eta_0^2} + 1\Big) Ct < \infty,
\]
and hence we may take $a(s, y) = y$ to get the martingale
\[
M_t = \int_0^t \int_{\mathbb{R}^d} y \, (\mu - \nu)(ds, dy). \tag{6.3}
\]
For $\theta \in (\mathbb{R}^d)^*$, define
\[
\phi(x, \theta) = \int_{\mathbb{R}^d} \big( e^{\langle \theta, y \rangle} - 1 - \langle \theta, y \rangle \big) K(x, dy).
\]
Then $\phi \geq 0$ and, by the second-order mean value theorem,
\[
\phi(x, \theta) = \int_0^1 \langle \theta, m''(x, r\theta) \theta \rangle (1 - r) \, dr.
\]
Therefore, by Lemma 6.4,
\[
\phi(x, \theta) \leq \int_0^1 |m''(x, r\theta)| \, |\theta|^2 (1 - r) \, dr \leq \frac{1}{2} A |\theta|^2,
\]
for $x \in I$ and $|\theta| \leq \eta$.
Definition 6.6. Let $(\theta_t)_{t \geq 0}$ be a previsible process in $(\mathbb{R}^d)^*$ with $|\theta_t| \leq \eta$ for all $t$. Define
\[
Z_t = Z_t^\theta = \exp\Big\{ \int_0^t \langle \theta_s, dM_s \rangle - \int_0^t \phi(X_s, \theta_s) \, ds \Big\}.
\]

Lemma 6.7. $(Z_t)_{t \geq 0}$ is a martingale.
Proof. Since $|\theta_t| \leq \eta$ for all $t$,
\[
|Z_t| \leq \exp\Big\{ \int_0^t |\theta_s| \, |dM_s| + \int_0^t |\phi(X_s, \theta_s)| \, ds \Big\}
\leq \exp\Big\{ \int_0^t \int_{\mathbb{R}^d} 2\eta |y| \, \nu(ds, dy) + \int_0^t \frac{1}{2} A \eta^2 \, dt \Big\}
\leq \exp\Big\{ \Big(\frac{2d}{\eta_0^2} + 1\Big) \eta C t + \frac{1}{2} A \eta^2 t \Big\},
\]
and so $(Z_t)_{t \geq 0}$ is locally bounded. Now,
\[
Z_t - Z_{t-} = Z_{t-} \Big( \exp\Big\{ \int_{\mathbb{R}^d} \langle \theta_t, y \rangle (\mu - \nu)(dt, dy) - \int_{\mathbb{R}^d} \big( e^{\langle \theta_t, y \rangle} - 1 - \langle \theta_t, y \rangle \big) \nu(dt, dy) \Big\} - 1 \Big)
\]
\[
= Z_{t-} \Big( \sum_{y : (J_n, \Delta X_{J_n}) = (t, y)} \big( e^{\langle \theta_t, y \rangle} - 1 \big) - \int_{\mathbb{R}^d} \big( e^{\langle \theta_t, y \rangle} - 1 \big) \nu(dt, dy) \Big)
= \int_{\mathbb{R}^d} Z_{t-} \big( e^{\langle \theta_t, y \rangle} - 1 \big) (\mu - \nu)(dt, dy).
\]
Hence
\[
Z_t = 1 + \int_0^t \int_{\mathbb{R}^d} Z_{s-} \big( e^{\langle \theta_s, y \rangle} - 1 \big) (\mu - \nu)(ds, dy),
\]
and so $(Z_t)_{t \geq 0}$ is a non-negative local martingale. Therefore there exist stopping times $T_n \to \infty$ such that $(Z_t^{T_n})_{t \geq 0}$ is a martingale and so, by Fatou's Lemma, $E(Z_t) = E(\liminf_{n \to \infty} Z_t^{T_n}) \leq \liminf_{n \to \infty} E(Z_t^{T_n}) = 1$. Therefore
\[
E \int_0^t \int_{\mathbb{R}^d} \big| Z_{s-} \big( e^{\langle \theta_s, y \rangle} - 1 \big) \big| \, \nu(ds, dy)
\leq E \int_0^t Z_{s-} \int_{\mathbb{R}^d} \big( e^{\langle \theta_s, y \rangle} + 1 \big) K(X_{s-}, dy) \, ds
\leq E \int_0^t Z_s \big( m(X_s, \theta_s) + m(X_s, 0) \big) \, ds
\leq 2Ct < \infty.
\]
By Proposition 6.5, $(Z_t)_{t \geq 0}$ is a martingale.
Proposition 6.8 (The Exponential Martingale Inequality). For all $\delta \in (0, A \eta t \sqrt{d}\,]$,
\[
P\Big( \sup_{s \leq t} |M_s| > \delta \Big) \leq 2d \, e^{-\frac{\delta^2}{2Adt}}.
\]
Proof. Fix $\theta \in (\mathbb{R}^d)^*$ with $|\theta| = 1$ and, for any $\delta' > 0$, consider the stopping time
\[
T = \inf\{ t \geq 0 : \langle \theta, M_t \rangle > \delta' \}.
\]
For any $\varepsilon < \eta$, by Lemma 6.7, $(Z_t^{\varepsilon\theta})_{t \geq 0}$ is a martingale, where $\theta_t = \theta$ for all $t$. Therefore, by the Optional Stopping Theorem, $E(Z_{T \wedge t}^{\varepsilon\theta}) = E(Z_0^{\varepsilon\theta}) = 1$. Since on the set $\{T \leq t\}$,
\[
Z_{T \wedge t}^{\varepsilon\theta} \geq \exp\Big\{ \varepsilon \langle \theta, M_T \rangle - \int_0^T \frac{1}{2} A \varepsilon^2 |\theta|^2 \, dt \Big\} \geq e^{\delta' \varepsilon - \frac{1}{2} A t \varepsilon^2}
\]
(using the right continuity of $M$ to get $\langle \theta, M_T \rangle \geq \delta'$),
\[
P\Big( \sup_{s \leq t} \langle \theta, M_s \rangle > \delta' \Big) = P(T \leq t)
\leq P\Big( Z_{T \wedge t}^{\varepsilon\theta} \geq e^{\delta' \varepsilon - \frac{1}{2} A t \varepsilon^2} \Big)
\leq e^{-\delta' \varepsilon + \frac{1}{2} A t \varepsilon^2} E\big( Z_{T \wedge t}^{\varepsilon\theta} \big)
= e^{-\delta' \varepsilon + \frac{1}{2} A t \varepsilon^2},
\]
where the second inequality is by Chebyshev's Inequality. In particular, when $\delta' \leq At\eta$, we can take $\varepsilon = \frac{\delta'}{At}$ to get
\[
P\Big( \sup_{s \leq t} \langle \theta, M_s \rangle > \delta' \Big) \leq e^{-\frac{\delta'^2}{2At}}.
\]
Now, if $\sup_{s \leq t} |M_s| > \delta$, then $\sup_{s \leq t} \langle \theta, M_s \rangle > \frac{\delta}{\sqrt{d}}$ for one of $\theta = \pm e_1, \dots, \pm e_d$. Setting $\delta' = \frac{\delta}{\sqrt{d}} \leq At\eta$ gives
\[
P\Big( \sup_{s \leq t} |M_s| > \delta \Big)
\leq \sum_{i=1}^d P\Big( \sup_{s \leq t} \langle e_i, M_s \rangle > \frac{\delta}{\sqrt{d}} \Big)
+ \sum_{i=1}^d P\Big( \sup_{s \leq t} \langle -e_i, M_s \rangle > \frac{\delta}{\sqrt{d}} \Big)
\leq 2d \, e^{-\frac{\delta^2}{2Adt}}.
\]
Finally we state Gronwall's Inequality, which will be useful in this section and the next.

Lemma 6.9 (Gronwall's Inequality). Let $\mu$ be a Borel measure on $[0,\infty)$, let $\varepsilon \geq 0$ and let $f$ be a Borel measurable function that is bounded on bounded intervals and satisfies
\[
0 \leq f(t) \leq \varepsilon + \int_{[0,t)} f(s) \, \mu(ds)
\]
for $t \geq 0$. Then $f(t) \leq \varepsilon e^{\mu[0,t)}$. In particular, if $M > 0$ and
\[
0 \leq f(t) \leq \varepsilon + M \int_0^t f(s) \, ds
\]
for $t \geq 0$, then $f(t) \leq \varepsilon e^{Mt}$.
Proof. This result is well known.
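The integral form of Gronwall's Inequality can be sanity-checked on a grid, where the integral inequality becomes a recursion met with equality (a minimal sketch; the step count and constants are arbitrary choices, not from the text):

```python
import math

def discrete_gronwall(eps, M, T, n):
    """Iterate f_k = eps + M*dt*sum_{j<k} f_j (a left Riemann sum of the
    integral inequality, met with equality) alongside the bound eps*e^{M t}."""
    dt = T / n
    f, running = [], 0.0
    for k in range(n + 1):
        fk = eps + M * dt * running  # equality case of the hypothesis
        f.append(fk)
        running += fk
    bounds = [eps * math.exp(M * k * dt) for k in range(n + 1)]
    return f, bounds

f, bounds = discrete_gronwall(0.1, 2.0, 1.0, 1000)
print(f[-1], bounds[-1])
```

Here $f_k = \varepsilon(1 + M\Delta t)^k$, which sits just below $\varepsilon e^{Mk\Delta t}$, as the lemma predicts.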
6.2 The Fluid Limit
Let $(X_t^N)_{t \geq 0}$ be a sequence of pure jump Markov processes with Lévy kernels $K^N(x, dy)$, taking values in some subsets $I^N$ of $\mathbb{R}^d$. Let $S$ be an open subset of $\mathbb{R}^d$ and set $S^N = I^N \cap S$. We shall study, under certain conditions, the limiting behaviour of $(X_t^N)_{t \geq 0}$ as $N \to \infty$, on compact time intervals, up to the first time the process leaves $S$.

The introduction of $S$ does not add any additional restrictions, as we are free to take $S = \mathbb{R}^d$. However, in some situations our processes may all stop abruptly on leaving some open set $U$, i.e. $K^N(x, dy) = 0$ for all $x \notin U$, and when this happens it is useful to restrict our attention to what happens before the processes leave $U$. In this case we may choose $S$ to be a subset of $U$. In other cases we may wish to take $S$ to be small, to make it easier to check that the various convergence conditions are satisfied. The only restriction imposed on $S$ is that the conjectured limit path does not leave $S$ in the relevant compact time interval.

Definition 6.10. Fix $t_0 > 0$ and set
\[
T^N = \inf\{ t \geq 0 : X_t^N \notin S \} \wedge t_0.
\]
From now on we shall assume that $X_t^N = X_{t \wedge T^N}^N$ for all $t \geq 0$.
Let $m^N(x, \theta)$ be the Laplace transform corresponding to the Lévy kernel $K^N(x, dy)$, for $x \in S^N$ and $\theta \in (\mathbb{R}^d)^*$ (see Definition 6.3). We assume that there is a limit kernel $K(x, dy)$, defined for $x \in S$, with corresponding Laplace transform $m(x, \theta)$ with the following properties:

(a) There exists a constant $\eta_0 > 0$ such that $m(x, \theta)$ is bounded for all $x \in S$ and $|\theta| \leq \eta_0$.

(b) As $N \to \infty$,
\[
\sup_{x \in S^N} \sup_{|\theta| \leq \eta_0} \Big| \frac{m^N(x, N\theta)}{N} - m(x, \theta) \Big| \to 0. \tag{6.4}
\]
We give some justification of condition (6.4) with the following example.
Example 6.11. Consider the case outlined at the beginning of this chapter, where $X^N$ arises from $X^1$ by increasing the jump rate by a factor of $N$ and decreasing the jump size by a factor of $N$. Then $X_t^N = \frac{1}{N}(X_{Nt}^1 - X_0^1) + X_0^N$. So, setting $K(x, dy) = K^1(x, dy)$, we get
\[
K(x, dy)\,dt = P(J_1 \in dt,\ \Delta X_{J_1} \in dy \mid J_1 > t,\ X_0 = x)
= P\big( N J_1^N \in dt,\ N \Delta X_{J_1^N}^N \in dy \mid N J_1^N > t,\ X_0^N = x \big)
\]
\[
= P\big( J_1^N \in dt/N,\ \Delta X_{J_1^N}^N \in dy/N \mid J_1^N > t/N,\ X_0^N = x \big)
= K^N(x, dy/N) \, dt/N.
\]
Hence $\frac{K^N(x,\, dy/N)}{N} = K(x, dy)$ and
\[
\frac{m^N(x, N\theta)}{N} = \frac{1}{N} \int_{\mathbb{R}^d} e^{\langle N\theta, y \rangle} K^N(x, dy)
= \int_{\mathbb{R}^d} e^{\langle \theta, y' \rangle} K^N(x, dy'/N)/N
= \int_{\mathbb{R}^d} e^{\langle \theta, y' \rangle} K(x, dy')
= m(x, \theta)
\]
(where $y' = Ny$), i.e. (6.4) holds. Hence our fluid limit theorem will apply in this case (subject to a few additional constraints on $K(x, dy)$), and so we do have a generalisation of the Law of Large Numbers as claimed at the beginning of the section.
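For a concrete kernel, the identity $m^N(x, N\theta)/N = m(x, \theta)$ can be checked directly. The toy kernel below is an assumption for illustration only: jumps of size $1$ at rate $\lambda$, so $m(x, \theta) = \lambda e^{\theta}$, while the sped-up, shrunk process has jumps of size $1/N$ at rate $N\lambda$, giving $m^N(x, \theta) = N\lambda e^{\theta/N}$.

```python
import math

lam = 2.0  # rate of the base process (hypothetical toy kernel K = lam * delta_1)

def m(theta):
    # Laplace transform of K(x, dy) = lam * delta_1(dy)
    return lam * math.exp(theta)

def mN(N, theta):
    # Kernel of X^N: jumps of size 1/N at rate N*lam
    return N * lam * math.exp(theta / N)

theta = 0.7
for N in (1, 10, 1000):
    print(N, mN(N, N * theta) / N, m(theta))  # the two columns agree
```

For this kernel the rescaled transform agrees with $m$ exactly for every $N$, which is the equality derived in the example; for more general kernels (6.4) only requires convergence.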
We also assume that there exists some $x_0 \in \bar{S}$, the closure of $S$, such that, for all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P(|X_0^N - x_0| \geq \delta) < 0. \tag{6.5}
\]
What this means is that there exists some $\gamma_\delta > 0$ such that $P(|X_0^N - x_0| \geq \delta) < e^{-\gamma_\delta N}$ for sufficiently large $N$, i.e. $X_0^N$ converges in probability to $x_0$ exponentially fast in $N$. This holds in the simple case $X_0^N = x_0^N$ (deterministic) with $x_0^N \to x_0$ as $N \to \infty$.

Set $b(x) = m'(x, 0)$, where once again $'$ denotes differentiation in $\theta$. We make the final assumptions that $b$ is Lipschitz on $S$ and that $S$ has a Lipschitz boundary, so that $b$ has an extension to a Lipschitz vector field $b$ on $\mathbb{R}^d$. Then there is a unique solution $(x_t)_{t \geq 0}$ to the ordinary differential equation $\dot{x}_t = b(x_t)$ starting from $x_0$.
Theorem 6.12 (Fluid Limit). Under the above assumptions, for all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P\Big( \sup_{t \leq T^N} |X_t^N - x_t| \geq \delta \Big) < 0,
\]
where $T^N$ is defined in Definition 6.10.
Proof. By assumption (6.4), since $m(x, \theta)$ is bounded, there exists some constant $C < \infty$ such that, for all $N \in \mathbb{N}$,
\[
\sup_{x \in S^N} \sup_{|\theta| \leq \eta_0} \frac{m^N(x, N\theta)}{N} \leq C.
\]
Fix $\eta \in (0, \eta_0)$. Then, by the proof of Lemma 6.4,
\[
|m^{N\prime\prime}(x, \theta)| \leq \frac{2NCd}{(N\eta_0 - N\eta)^2} = \frac{A}{N},
\]
for all $x \in S^N$ and $|\theta| \leq N\eta$.

Set $b^N(x) = m^{N\prime}(x, 0)$ and define $(M_t^N)_{t \geq 0}$ by
\[
X_t^N = X_0^N + M_t^N + \int_0^t b^N(X_s^N) \, ds.
\]
Then
\[
M_t^N = X_t^N - X_0^N - \int_0^t \int_{\mathbb{R}^d} y \, K^N(X_s^N, dy) \, ds
= \sum_{J_n^N \leq t} \Delta X_{J_n^N} - \int_0^t \int_{\mathbb{R}^d} y \, \nu^N(ds, dy)
= \int_0^t \int_{\mathbb{R}^d} y \, (\mu^N - \nu^N)(ds, dy),
\]
where $\mu^N$ and $\nu^N$ are defined as in Definition 6.2. $M_t^N$ corresponds to the martingale $M_t$ in (6.3), and hence we may apply Proposition 6.8 to $M_t^N$ to conclude that, if $\varepsilon_0 = \frac{A}{N} \cdot N\eta \cdot t_0 \sqrt{d} = A \eta t_0 \sqrt{d} > 0$ and $C_0 = \max\{2d, 2Adt_0\}$, then, for all $\varepsilon \in (0, \varepsilon_0]$,
\[
P\Big( \sup_{t \leq T^N} |M_t^N| > \varepsilon \Big) \leq 2d \, e^{-\frac{N \varepsilon^2}{2Ad T^N}} \leq C_0 \, e^{-\frac{N \varepsilon^2}{C_0}}. \tag{6.6}
\]
Given $\delta > 0$, set $\varepsilon = \min\{\frac{\delta}{3} e^{-Kt_0}, \varepsilon_0\}$, where $K$ is the Lipschitz constant of $b$. Let
\[
\Omega_N = \Big\{ |X_0^N - x_0| \leq \varepsilon \text{ and } \sup_{t \leq T^N} |M_t^N| \leq \varepsilon \Big\}.
\]
Then by (6.5) and (6.6),
\[
P(\Omega \setminus \Omega_N) = P\Big( |X_0^N - x_0| > \varepsilon \text{ or } \sup_{t \leq T^N} |M_t^N| > \varepsilon \Big)
\leq P(|X_0^N - x_0| > \varepsilon) + P\Big( \sup_{t \leq T^N} |M_t^N| > \varepsilon \Big)
\leq e^{-\gamma_\varepsilon N} + C_0 e^{-\frac{N \varepsilon^2}{C_0}}
\]
for large enough $N$, and, by L'Hopital's Rule,
\[
\limsup_{N \to \infty} N^{-1} \log\big( e^{-\gamma_\varepsilon N} + C_0 e^{-\frac{N \varepsilon^2}{C_0}} \big)
= \limsup_{N \to \infty} \frac{-\gamma_\varepsilon e^{-\gamma_\varepsilon N} - \frac{\varepsilon^2}{C_0} C_0 e^{-\frac{N \varepsilon^2}{C_0}}}{e^{-\gamma_\varepsilon N} + C_0 e^{-\frac{N \varepsilon^2}{C_0}}}
\leq -\min\{\gamma_\varepsilon, \varepsilon^2 / C_0\} < 0.
\]
Hence
\[
\limsup_{N \to \infty} N^{-1} \log P(\Omega \setminus \Omega_N) < 0.
\]
However, by (6.4), there exists $N_1 \in \mathbb{N}$ such that
\[
\sup_{x \in S^N} \sup_{|\theta| \leq \eta_0} \Big| \frac{m^N(x, N\theta)}{N} - m(x, \theta) \Big| \leq \frac{\varepsilon^2}{8At_0^2 d}
\]
for all $N \geq N_1$. Then for each $N \geq N_1$, pick $h \in (\mathbb{R}^d)^*$ in the same direction as $b^N(x) - b(x)$ and with $|h| = \frac{\varepsilon}{2At_0\sqrt{d}}$; note $|h| \leq \eta$ by the definition of $\varepsilon_0$, since $\varepsilon \leq \varepsilon_0$. Then
\[
|b^N(x) - b(x)| = \frac{|\langle h, b^N(x) - b(x) \rangle|}{|h|}
\leq \frac{1}{|h|} \Big( \Big| \frac{\langle Nh, b^N(x) \rangle}{N} - \frac{m^N(x, Nh) - m^N(x, 0)}{N} \Big|
+ \Big| \frac{m^N(x, Nh)}{N} - m(x, h) \Big|
+ \Big| \frac{m^N(x, 0)}{N} - m(x, 0) \Big|
+ \big| m(x, h) - m(x, 0) - \langle h, b(x) \rangle \big| \Big)
\]
\[
\leq \frac{1}{|h|} \Big( \Big| \frac{\phi^N(x, Nh)}{N} \Big| + 2 \sup_{x \in S^N} \sup_{|\theta| \leq \eta_0} \Big| \frac{m^N(x, N\theta)}{N} - m(x, \theta) \Big| + |\phi(x, h)| \Big)
\leq \frac{1}{|h|} \Big( \frac{1}{N} \cdot \frac{1}{2} \cdot \frac{A}{N} \cdot N^2 |h|^2 + \frac{2\varepsilon^2}{8At_0^2 d} + \frac{1}{2} A |h|^2 \Big)
\leq \frac{\varepsilon}{t_0}. \tag{6.7}
\]
We note that
\[
X_t^N - x_t = (X_0^N - x_0) + M_t^N + \int_0^t \big( b^N(X_s^N) - b(X_s^N) \big) ds + \int_0^t \big( b(X_s^N) - b(x_s) \big) ds
\]
for $t \leq T^N \leq t_0$. So, for $N \geq N_1$, on $\Omega_N$,
\[
|X_t^N - x_t| \leq 3\varepsilon + K \int_0^t |X_s^N - x_s| \, ds,
\]
which implies, by Gronwall's Inequality (Lemma 6.9), that $\sup_{t \leq T^N} |X_t^N - x_t| \leq 3\varepsilon e^{Kt_0} \leq \delta$. Hence
\[
\limsup_{N \to \infty} N^{-1} \log P\Big( \sup_{t \leq T^N} |X_t^N - x_t| \geq \delta \Big)
\leq \limsup_{N \to \infty} N^{-1} \log P(\Omega \setminus \Omega_N) < 0,
\]
as required.
Remark 6.13. A corresponding generalisation of the Law of Large Numbers holds for diffusion processes. Suppose $X^1$ is a diffusion process with diffusivity $a^1$ and drift $b^1$, satisfying the conditions of Section 3.2.3. For simplicity assume that $X_0^1 = 0$. Then there exists a Brownian motion $B$ such that
\[
dX_t^1 = \sigma^1(X_t^1) \, dB_t + b^1(X_t^1) \, dt,
\]
where $\sigma^1(x) \sigma^1(x)^* = a^1(x)$. Defining $X^N$ as in Example 6.11, but with $X_0^N = 0$, we get $X_t^N = \frac{1}{N} X_{Nt}^1$ and so
\[
dX_t^N = \frac{\sigma^1(N X_t^N)}{\sqrt{N}} \, dB'_t + b^1(N X_t^N) \, dt,
\]
where $B'_t = \frac{1}{\sqrt{N}} B_{Nt}$ is a Brownian motion. Hence $X^N$ has diffusivity $a^N(x) = \frac{a^1(Nx)}{N}$ and drift $b^N(x) = b^1(Nx)$. Since $a^1$ is bounded, $a^N \to 0$ uniformly as $N \to \infty$. If there exists some Lipschitz function $b$ such that $b^N \to b$ uniformly, let $x_t$ be the solution to the ordinary differential equation $\frac{dx_t}{dt} = b(x_t)$ with $x_0 = 0$. By Section 3.2.3, $X_t^N \to x_t$ in probability, generalising the Law of Large Numbers, as required.
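To see the fluid limit concretely, the sketch below (an illustration only; the linear death process is an assumed toy model, not taken from the text) runs a Gillespie simulation of a process that, from state $x \in N^{-1}\mathbb{Z}$, jumps by $-1/N$ at rate $Nx$. Here $b(x) = -x$, so the fluid limit is $x_t = x_0 e^{-t}$, and the printed sup-distance to the limit path shrinks as $N$ grows.

```python
import math, random

random.seed(1)

def sup_error(N, x0=1.0, t0=2.0):
    """Gillespie simulation of the death process; returns the largest
    deviation from the fluid limit x0*exp(-t) seen at jump times."""
    x, t, err = x0, 0.0, 0.0
    while x > 0:
        t += random.expovariate(N * x)  # exponential holding time
        if t >= t0:
            break
        x -= 1.0 / N
        err = max(err, abs(x - x0 * math.exp(-t)))
    return err

for N in (10, 100, 10000):
    print(N, sup_error(N))
```

The deviations are of order $N^{-1/2}$, consistent with the central limit fluctuations studied in the next section.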
6.3 A Brief Look at the Exit Time
We conclude this section by making some observations about the limiting distribution of the exit time $T^N$. This will enable us to make the statement of Theorem 6.12 more precise, and will also be useful in the next section.
Definition 6.14. Define
\[
\tau = \inf\{ t \geq 0 : x_t \notin \bar{S} \} \wedge t_0,
\qquad
\mathcal{T} = \{ t \in [0, \tau) : x_t \notin S \}.
\]
We aim to show that
\[
\limsup_{N \to \infty} N^{-1} \log P\Big( \inf_{t \in \mathcal{T} \cup \{\tau\}} |T^N - t| > \delta \Big) < 0.
\]
Set
\[
\Omega_\delta^N = \Big\{ \sup_{t \leq T^N} |X_t^N - x_t| < \delta \Big\}
\quad\text{and}\quad
A_\delta = \Big\{ s \in [0, t_0] : \inf_{y \in S^c} |x_s - y| < \delta \Big\}.
\]
Note that since $S$ is open, by the right continuity of $X^N$, either $T^N = t_0$ or $X_{T^N}^N \notin S$. Hence, on the set $\Omega_\delta^N$, $T^N \in A_\delta$ or $T^N = t_0$ (see Figure 2).

Figure 2: Illustration of the set $A_\delta$

Note that if $\tau \neq t_0$ then, for $0 < \varepsilon < t_0 - \tau$, there exists some $\tau_\varepsilon \in [\tau, \tau + \varepsilon)$ such that $x_{\tau_\varepsilon} \notin \bar{S}$. Since $(\bar{S})^c$ is open, there exists $\delta > 0$ such that $B(x_{\tau_\varepsilon}, \delta) \cap S = \emptyset$. Then on $\Omega_\delta^N$, $X_{\tau_\varepsilon}^N \notin S$ and so $T^N \leq \tau_\varepsilon < t_0$. Hence $T^N = t_0$ on $\Omega_\delta^N$ implies that $\tau = t_0$.
Lemma 6.15. $\sup_{s \in A_\delta} \inf_{t \in \mathcal{T} \cup \{\tau\}} |s - t| \to 0$ as $\delta \to 0$.
Proof. Suppose the contrary. Then there exists some $\varepsilon > 0$ such that, for each $n \in \mathbb{N}$, there exists some $s_n \in A_{1/n}$ with $\inf_{t \in \mathcal{T} \cup \{\tau\}} |s_n - t| \geq \varepsilon$. By the definition of $A_{1/n}$, there is a sequence $y_n \in S^c$ with $|x_{s_n} - y_n| < \frac{1}{n}$. Since $s_n$ is in the compact interval $[0, t_0]$ for all $n$, there exists a convergent subsequence $s_{n_r} \to s$ for some $s \leq t_0$. As $x_t$ is continuous, $x_{s_{n_r}} \to x_s$. Hence $y_{n_r} = (y_{n_r} - x_{s_{n_r}}) + x_{s_{n_r}} \to x_s$ and, since $S^c$ is closed, $x_s \in S^c$, i.e. $s \in \mathcal{T} \cup \{\tau\}$, contradicting our choice of $s_n$.
Proposition 6.16. For all $\delta > 0$,
\[
\limsup_{N \to \infty} N^{-1} \log P\Big( \inf_{t \in \mathcal{T} \cup \{\tau\}} |T^N - t| > \delta \Big) < 0.
\]

Proof. By the above lemma, given $\delta > 0$, there exists an $\varepsilon > 0$ such that $\sup_{s \in A_\varepsilon} \inf_{t \in \mathcal{T} \cup \{\tau\}} |s - t| \leq \delta$ and such that, if $\tau \neq t_0$, $B(x_{\tau_\eta}, \varepsilon) \cap S = \emptyset$ for some $0 < \eta < t_0 - \tau$. On $\Omega_\varepsilon^N$, $T^N \in A_\varepsilon$ or $T^N = t_0$. Hence $\inf_{t \in \mathcal{T} \cup \{\tau\}} |T^N - t| \leq \delta$, and so $\{\inf_{t \in \mathcal{T} \cup \{\tau\}} |T^N - t| > \delta\} \subset \Omega \setminus \Omega_\varepsilon^N$. By Theorem 6.12,
\[
\limsup_{N \to \infty} N^{-1} \log P\Big( \inf_{t \in \mathcal{T} \cup \{\tau\}} |T^N - t| > \delta \Big)
\leq \limsup_{N \to \infty} N^{-1} \log P(\Omega \setminus \Omega_\varepsilon^N)
= \limsup_{N \to \infty} N^{-1} \log P\Big( \sup_{t \leq T^N} |X_t^N - x_t| \geq \varepsilon \Big) < 0.
\]
In particular, if $\mathcal{T}$ is empty, then $T^N \to \tau$ in probability, exponentially fast in $N$. Since $X_t^N = X_{t \wedge T^N}^N$, for all $t \geq T^N$,
\[
|X_t^N - x_{t \wedge \tau}| \leq |X_{T^N}^N - x_{T^N}| + |x_{T^N} - x_{t \wedge \tau}|.
\]
Now as $x_t$ is continuous, there exists some $\varepsilon_0 > 0$ such that $|T^N - \tau| \leq \varepsilon_0$ implies that the second term is less than $\frac{\delta}{2}$ for all $t \geq T^N$. Choosing $0 < \varepsilon < \frac{\delta}{2}$ sufficiently small that $\Omega_\varepsilon^N \subset \{|T^N - \tau| \leq \varepsilon_0\}$ gives
\[
\limsup_{N \to \infty} N^{-1} \log P\Big( \sup_{t \leq t_0} |X_t^N - x_{t \wedge \tau}| > \delta \Big)
\leq \limsup_{N \to \infty} N^{-1} \log P(\Omega \setminus \Omega_\varepsilon^N) < 0.
\]
7 A Central Limit Theorem
The Central Limit Theorem states that if we have a sequence $(X_n)_{n \in \mathbb{N}}$ of independent identically distributed random variables with finite mean and variance, then
\[
\sqrt{N} \Big( \frac{X_1 + X_2 + \dots + X_N}{N} - \mu \Big)
\]
converges in distribution to a $N(0, \sigma^2)$ random variable, where $\sigma^2$ is the variance of each random variable, and $\mu$ is the mean of each random variable or, equivalently, the deterministic limit to which $(X_1 + X_2 + \dots + X_N)/N$ converges by the Law of Large Numbers.
As in the previous section, we can view these random variables as the jump sizes in a random walk, and so obtain a corresponding statement about the position of the random walk at time 1 if the jump rate is increased and the jump size is decreased, each by a factor of $N$.
In this section we generalise the Central Limit Theorem to Markov jump processes by showing that, under certain conditions, if we alter the jump rate and jump size as described above to obtain a sequence $(X_t^N)_{t \geq 0}$ of jump processes, then
\[
\sqrt{N} \big( X_t^N - x_t \big)
\]
converges in distribution to a Gaussian process (the analogue of a Gaussian random variable for processes), where $x_t$ is the deterministic fluid limit obtained in the previous section by generalising the Law of Large Numbers.
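This Gaussian behaviour can be observed in a toy model (an illustrative assumption, not the essay's general setting): a linear death process jumping by $-1/N$ at rate $Nx$, with fluid limit $x_t = e^{-t}$, drift $b(x) = -x$ and $a(x) = x$. For this model the limit of $\sqrt{N}(X^N_1 - x_1)$ is centred Gaussian with variance $\int_0^1 e^{-2(1-s)} e^{-s}\, ds = e^{-1} - e^{-2} \approx 0.23$, i.e. standard deviation about $0.48$.

```python
import math, random, statistics

random.seed(2)

def XN_at_time_one(N, x0=1.0):
    # Death process: jump -1/N at rate N*x (toy kernel, for illustration)
    x, t = x0, 0.0
    while x > 0:
        t += random.expovariate(N * x)
        if t >= 1.0:
            break
        x -= 1.0 / N
    return x

N = 400
fluct = [math.sqrt(N) * (XN_at_time_one(N) - math.exp(-1)) for _ in range(2000)]
# Limit for this toy model: N(0, e^{-1} - e^{-2}), sd ~ 0.48
print(statistics.mean(fluct), statistics.stdev(fluct))
```

The empirical mean is near $0$ and the empirical standard deviation near $0.48$, matching the claimed Gaussian limit for this model.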
We prove our result for the more general sequence of Markov jump processes described in the previous section.
Definition 7.1. Define
\[
\gamma_t^N = \sqrt{N} \big( X_t^N - x_{t \wedge T^N} \big),
\]
where $T^N$ is the exit time defined in Definition 6.10 and, as before, $X_t^N = X_{t \wedge T^N}^N$.
In addition to the assumptions of the previous section, we also assume that:

(a) there exists some random variable $\gamma_0$ such that
\[
\gamma_0^N \Rightarrow \gamma_0; \tag{7.1}
\]

(b)
\[
\sup_{x \in S^N} \sqrt{N} \, |b^N(x) - b(x)| \to 0, \tag{7.2}
\]
where, as before, $b^N(x) = m^{N\prime}(x, 0)$ and $b(x) = m'(x, 0)$;

(c) $b$ is $C^1$ on $S$;

(d) $a$, defined by $a(x) = m''(x, 0)$, is Lipschitz on $S$.
Example 7.2. In the case where $X^N$ arises from $X^1$ by increasing the jump rate by a factor of $N$ and decreasing the jump size by a factor of $N$, by Example 6.11 $\frac{m^N(x, N\theta)}{N} = m(x, \theta)$ and so $b^N(x) = b(x)$. Hence assumption (b) holds in this case, and so our convergence theorem will apply (subject to a few additional constraints on $K(x, dy)$); we therefore do have a generalisation of the Central Limit Theorem as claimed at the beginning of the section.
Note that
\[
\langle u, a(x) v \rangle = \int_{\mathbb{R}^d} \langle u, y \rangle \langle y, v \rangle K(x, dy) = \langle v, a(x) u \rangle
\]
and
\[
\langle v, a(x) v \rangle = \int_{\mathbb{R}^d} \langle v, y \rangle^2 K(x, dy) \geq 0,
\]
so $a$ is symmetric and non-negative definite, and hence there exists $\sigma$, unique up to change of sign, such that $\sigma(x) \sigma(x)^* = a(x)$.
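Given a symmetric positive definite $a(x)$, one concrete choice of $\sigma$ is its Cholesky factor. A minimal sketch for a $2 \times 2$ matrix (the sample matrix is an arbitrary assumption; strict positive definiteness is required for the square roots below):

```python
import math

def cholesky_2x2(a):
    """Lower-triangular sigma with sigma * sigma^T = a, for a symmetric
    positive definite 2x2 matrix a given as nested lists."""
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0] / l11
    l22 = math.sqrt(a[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

a = [[4.0, 2.0], [2.0, 3.0]]
s = cholesky_2x2(a)
prod = [[sum(s[i][k] * s[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
print(s)
print(prod)  # recovers a
```

Any other factor differs from this one by an orthogonal transformation; as noted next, the law of the driving Brownian motion makes the resulting process insensitive to the choice.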
Definition 7.3. Let $(\gamma_t)_{t \leq \tau}$ be the unique solution to the linear stochastic differential equation
\[
d\gamma_t = \sigma(x_t) \, dB_t + \nabla b(x_t) \gamma_t \, dt
\]
starting from $\gamma_0$, where $\tau$ is defined in Definition 6.14 and $B$ is a Brownian motion. Note that the distribution of $(\gamma_t)_{t \leq \tau}$ does not depend on the choice of $\sigma$, since $-B$ and $B$ have the same distributions.

As the limiting distribution of $\gamma_t^N$ will depend on the limiting behaviour of $T^N$, we make the final assumptions that $\mathcal{T}$, defined in Definition 6.14, is finite, and that, for all $t \in \mathcal{T}$, $\partial S$ is $C^1$ at $x_t$ with inward normal $n_t$, and $P(\langle n_t, \gamma_t \rangle = 0) = 0$.

Definition 7.4. Define
\[
T = \min\{ t \in \mathcal{T} : \langle n_t, \gamma_t \rangle < 0 \} \wedge \tau.
\]
From now on we assume that $\gamma_t = \gamma_{t \wedge T}$.

We aim to prove that $\gamma_t^N \Rightarrow \gamma_t$ as $N \to \infty$.
7.1 Relative Compactness
As $(\gamma_t)_{t \geq 0}$ arises from a linear stochastic differential equation, it has independent increments, and so checking that the finite dimensional distributions of $\gamma^N$ converge to those of $\gamma$ will be relatively straightforward. Therefore it is convenient to use Theorem 4.3 to prove the convergence in distribution of the above processes.
We establish the relative compactness of the sequence $(\gamma^N)_{N \in \mathbb{N}}$ by checking the necessary and sufficient conditions stated in Corollary 5.11, namely that:

(a) For every $\varepsilon > 0$ and rational $t \geq 0$, there exists a compact set $\Gamma_{\varepsilon,t} \subset \mathbb{R}^d$ such that
\[
\liminf_{N \to \infty} P\big( \gamma_t^N \in \Gamma_{\varepsilon,t}^\varepsilon \big) \geq 1 - \varepsilon.
\]

(b) For every $\varepsilon > 0$ and $T > 0$, there exists $\delta > 0$ such that
\[
\limsup_{N \to \infty} P\big( w'(\gamma^N, \delta, T) \geq \varepsilon \big) \leq \varepsilon,
\]
where $w'$ is defined in Definition 5.7.
By conditioning on $\mathcal{F}_0 = \sigma(\gamma_0)$ if necessary, it is sufficient to consider the case when $\gamma_0$ is non-random. We make this assumption throughout the remainder of this section.
Lemma 7.5. For all $\varepsilon > 0$ there exists $\lambda < \infty$ such that, for all $N \in \mathbb{N}$,
\[
P\Big( \sup_{t \leq t_0} |\gamma_t^N| \geq \lambda \Big) < \varepsilon.
\]
Proof. Given $\varepsilon > 0$, pick $\lambda' \geq \max\big\{ |\gamma_0| + 1,\ \sqrt{C_0 \log \frac{2C_0}{\varepsilon}} \big\}$, where $C_0$ is defined in (6.6). By (7.2), there exists some $N_1 \in \mathbb{N}$ such that $N \geq N_1$ implies
\[
\sqrt{N} \, |b^N(x) - b(x)| \leq \frac{\lambda'}{t_0}
\]
for all $x \in S^N$. By (7.1), there exists some $N_2 \in \mathbb{N}$ such that $N \geq N_2$ implies
\[
P(|\gamma_0^N| > \lambda') \leq P(|\gamma_0^N| > |\gamma_0| + 1) < \frac{\varepsilon}{2}.
\]
By (6.6), there exists some $N_3 \in \mathbb{N}$ such that $N \geq N_3$ implies that
\[
P\Big( \sqrt{N} \sup_{t \leq T^N} |M_t^N| > \lambda' \Big)
\leq P\Big( \sup_{t \leq T^N} |M_t^N| > \sqrt{\frac{C_0 \log \frac{2C_0}{\varepsilon}}{N}} \Big) \leq \frac{\varepsilon}{2}.
\]
($N_3$ is chosen so that $\sqrt{C_0 \log \frac{2C_0}{\varepsilon} / N} \leq \varepsilon_0$, where $\varepsilon_0$ is defined in (6.6).)

Now, by the definition of $M_t^N$,
\[
\gamma_t^N = \gamma_0^N + \sqrt{N} M_t^N + \sqrt{N} \int_0^t \big( b^N(X_s^N) - b(X_s^N) \big) ds + \sqrt{N} \int_0^t \big( b(X_s^N) - b(x_s) \big) ds,
\]
and so, on the set
\[
\Big\{ \sqrt{N} \, |b^N(x) - b(x)| \leq \frac{\lambda'}{t_0},\ |\gamma_0^N| \leq \lambda',\ \sqrt{N} \sup_{t \leq T^N} |M_t^N| \leq \lambda' \Big\},
\]
\[
|\gamma_t^N| \leq \lambda' + \lambda' + t \frac{\lambda'}{t_0} + \sqrt{N} K \int_0^t |X_s^N - x_s| \, ds
\leq 3\lambda' + K \int_0^t |\gamma_s^N| \, ds,
\]
where $K$ is the Lipschitz constant of $b$. Hence, by Gronwall's Inequality (Lemma 6.9),
\[
\sup_{t \leq T^N} |\gamma_t^N| \leq 3\lambda' e^{Kt_0},
\]
and so, if $N_0 = \max\{N_1, N_2, N_3\}$, then $N \geq N_0$ implies that
\[
P\Big( \sup_{t \leq t_0} |\gamma_t^N| \geq 3\lambda' e^{Kt_0} \Big)
\leq P\Big( |\gamma_0^N| > \lambda' \text{ or } \sqrt{N} \sup_{t \leq T^N} |M_t^N| > \lambda' \Big) < \varepsilon.
\]
Pick $\lambda \geq 3\lambda' e^{Kt_0}$ sufficiently large that $P\big( \sup_{t \leq t_0} |\gamma_t^N| \geq \lambda \big) < \varepsilon$ for all $N < N_0$. Then, for all $N \in \mathbb{N}$, $P\big( \sup_{t \leq t_0} |\gamma_t^N| \geq \lambda \big) < \varepsilon$.
For all $t$, take $\Gamma_{\varepsilon,t} = \bar{B}(0, \lambda)$, the closed ball of radius $\lambda$ in $\mathbb{R}^d$. Then by the above lemma,
\[
\liminf_{N \to \infty} P\big( \gamma_t^N \in \Gamma_{\varepsilon,t}^\varepsilon \big) \geq 1 - \varepsilon,
\]
and so condition (a) of Corollary 5.11 holds.
Lemma 7.6. For all $\varepsilon > 0$, there exists $\lambda < \infty$ such that, for all $\delta > 0$, there exists $N_\delta < \infty$ such that, for all $N \geq N_\delta$ and all $t \leq t_0$,
\[
P\Big( \sup_{s \leq t_0,\, t \leq s \leq t + \delta} |\gamma_s^N - \gamma_t^N| > \lambda \sqrt{\delta} \Big) < \varepsilon.
\]
Proof. Given $\varepsilon > 0$, by Lemma 7.5 there exists $\lambda_1 < \infty$ such that
\[
P\Big( |\gamma_t^N| \geq \frac{\lambda_1}{K\sqrt{t_0}} \Big) < \frac{\varepsilon}{2},
\]
for all $t \leq t_0$. By (6.6), setting $\lambda_2 = \sqrt{2Ad \log \frac{4d}{\varepsilon}}$, there exists $N'_\delta < \infty$ such that $N \geq N'_\delta$ implies that
\[
P\Big( \sqrt{N} \sup_{t \leq T^N \wedge \delta} |M_t^N| > \lambda_2 \sqrt{\delta} \Big)
< 2d \exp\Big\{ \frac{-N \big(\lambda_2 \sqrt{\delta / N}\big)^2}{2Ad (T^N \wedge \delta)} \Big\} \leq \frac{\varepsilon}{2},
\]
and hence
\[
P\Big( \sqrt{N} \sup_{s \leq t_0,\, t \leq s \leq t + \delta} |M_s^N - M_t^N| > \lambda_2 \sqrt{\delta} \Big) < \frac{\varepsilon}{2},
\]
where the dependence of $N'_\delta$ on $\delta$ arises from needing $\lambda_2 \sqrt{\delta / N} \leq \varepsilon_0$. Setting $\lambda' = \max\{\lambda_1, \lambda_2\}$ and $\lambda = 3\lambda' e^{Kt_0}$, by (7.2), there exists $N_0 < \infty$ such that $N \geq N_0$ implies
\[
\sqrt{N} \, |b^N(x) - b(x)| \leq \frac{\lambda'}{\sqrt{t_0}}.
\]
Now, by the definition of $M_t^N$, if $s \geq t$,
\[
\gamma_s^N - \gamma_t^N = \sqrt{N} (M_s^N - M_t^N) + \sqrt{N} \int_t^s \big( b^N(X_u^N) - b(X_u^N) \big) du + \sqrt{N} \int_t^s \big( b(X_u^N) - b(x_u) \big) du,
\]
and
\[
\Big| \sqrt{N} \int_t^s \big( b(X_u^N) - b(x_u) \big) du \Big|
\leq K \int_t^s |\gamma_u^N| \, du
\leq K \int_t^s \big( |\gamma_t^N| + |\gamma_u^N - \gamma_t^N| \big) du.
\]
Hence, on the set
\[
\Big\{ \sqrt{N} \, |b^N(x) - b(x)| \leq \frac{\lambda'}{\sqrt{t_0}},\ |\gamma_t^N| \leq \frac{\lambda'}{K\sqrt{t_0}},\ \sqrt{N} \sup_{s \leq t_0,\, t \leq s \leq t + \delta} |M_s^N - M_t^N| \leq \lambda' \sqrt{\delta} \Big\},
\]
if $s \leq t_0$ and $t \leq s \leq t + \delta$,
\[
|\gamma_s^N - \gamma_t^N| \leq \lambda' \sqrt{\delta} + (s - t) \frac{\lambda'}{\sqrt{t_0}} + K (s - t) \frac{\lambda'}{K\sqrt{t_0}} + K \int_t^s |\gamma_u^N - \gamma_t^N| \, du
\leq 3\lambda' \sqrt{\delta} + K \int_t^s |\gamma_u^N - \gamma_t^N| \, du,
\]
where, for the second inequality, $\delta$ may be taken to be at most $t_0$ without changing the admissible values of $s$. By Gronwall's Inequality (Lemma 6.9),
\[
\sup_{s \leq t_0,\, t \leq s \leq t + \delta} |\gamma_s^N - \gamma_t^N| \leq 3\lambda' \sqrt{\delta} \, e^{Kt_0} = \lambda \sqrt{\delta},
\]
and so, if $N_\delta = \max\{N_0, N'_\delta\}$, then $N \geq N_\delta$ implies that
\[
P\Big( \sup_{s \leq t_0,\, t \leq s \leq t + \delta} |\gamma_s^N - \gamma_t^N| \geq \lambda \sqrt{\delta} \Big)
\leq P\Big( |\gamma_t^N| > \frac{\lambda'}{K\sqrt{t_0}} \Big)
+ P\Big( \sqrt{N} \sup_{s \leq t_0,\, t \leq s \leq t + \delta} |M_s^N - M_t^N| > \lambda' \sqrt{\delta} \Big)
\leq \varepsilon.
\]
Given $\varepsilon > 0$ and $T > 0$, set $2\delta = \big( \frac{\varepsilon}{\lambda} \big)^2$, where $\lambda$ is as above. Then there exists a partition $\{t_i\}$ of the form $0 = t_0 < t_1 < \dots < t_{n-1} < T \leq t_n$ with $t_i - t_{i-1} = 2\delta$ for all $i$. Now
\[
w'(\gamma^N, \delta, T) \leq \max_i \sup_{s, t \in [t_{i-1}, t_i)} |\gamma_s^N - \gamma_t^N|
\leq \sup_{s \leq t_0,\, t \leq s \leq t + 2\delta} |\gamma_s^N - \gamma_t^N|,
\]
and so
\[
\limsup_{N \to \infty} P\big( w'(\gamma^N, \delta, T) \geq \varepsilon \big)
\leq \limsup_{N \to \infty} P\Big( \sup_{s \leq t_0,\, t \leq s \leq t + 2\delta} |\gamma_s^N - \gamma_t^N| \geq \lambda \sqrt{2\delta} \Big)
\leq \varepsilon.
\]
Therefore condition (b) of Corollary 5.11 holds, and hence the sequence $(\gamma^N)_{N \in \mathbb{N}}$ is relatively compact.
7.2 Convergence of the Finite Dimensional Distributions
It now remains to prove that, for all finite subsets $\{t_1, \dots, t_k\} \subset [0, \infty)$, the finite dimensional distributions converge: $(\gamma_{t_1}^N, \dots, \gamma_{t_k}^N) \Rightarrow (\gamma_{t_1}, \dots, \gamma_{t_k})$. As $\gamma_t^N$ is stopped at $T^N$ and $\gamma_t$ is stopped at $T$, where $T$ is defined in Definition 7.4, it is necessary to establish a preliminary result about the relationship between $T^N$ and $T$ when $N$ is large.
Lemma 7.7. (a) If $T = 0$, then $P(T^N = 0) \to 1$ as $N \to \infty$.

(b) If $T > 0$ and $\tau_1$ is the smallest non-zero element of $\mathcal{T}$, then for every $t < \tau_1$, $P(T^N > t) \to 1$ as $N \to \infty$.
Proof. (a) By the definition of $T$, $T = 0$ if and only if $x_0 \in \partial S$ and $\langle n_0, \gamma_0 \rangle < 0$, where $n_0$ is the inward normal at $x_0$. Since $\gamma_0^N \to \gamma_0$ in probability ($\gamma_0$ is non-random), $\langle n_0, \gamma_0 \rangle < 0$ implies that $P(\langle n_0, \gamma_0^N \rangle < 0) \to 1$ as $N \to \infty$, and so $P(\langle n_0, X_0^N - x_0 \rangle < 0) \to 1$ as $N \to \infty$ (by the definition of $\gamma_0^N$). If $x_0 \in \partial S$, then this means that $X_0^N \notin S$ provided $|X_0^N - x_0|$ is small enough, and therefore $T^N = 0$. Since $X_0^N \to x_0$ in probability, this implies that $P(T^N = 0) \to 1$ as $N \to \infty$.
(b) If $T > 0$ then, as above, either $x_0 \in S$, or $\langle n_0, \gamma_0 \rangle > 0$ (recall $P(\langle n_0, \gamma_0 \rangle = 0) = 0$). If $x_0 \in S$ then, since $\mathcal{T}$ is finite, the result follows by Proposition 6.16. Suppose that $x_0 \in \partial S$ and $\langle n_0, \gamma_0 \rangle > 0$. Since $\partial S$ is $C^1$ at $x_0$, for all $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that, for all $x \in \bar{S}$ with $|x - x_0| \leq \delta(\varepsilon)$, and all $v \in \mathbb{R}^d$,
\[
|v| \leq \delta(\varepsilon) \ \text{ and } \ \langle n_0, v \rangle \geq \varepsilon |v|
\quad \Rightarrow \quad x + v \in S. \tag{7.3}
\]
We illustrate this condition in Figure 3.

Figure 3: Illustration of the case $x_0 \in \partial S$ and $\langle n_0, \gamma_0 \rangle > 0$
Given $\varepsilon > 0$, let $\lambda_1$ be the $\lambda$ from Lemma 7.5 corresponding to $\frac{\varepsilon}{3}$, and let $\lambda_2$ be the $\lambda$ from Lemma 7.6 corresponding to $\frac{\varepsilon}{3}$. Since $\gamma_0^N \to \gamma_0$ in probability, there exists some $N_1 < \infty$ such that $N \geq N_1$ implies
\[
P\Big( |\gamma_0^N - \gamma_0| > \frac{\langle n_0, \gamma_0 \rangle}{4} \Big) < \frac{\varepsilon}{3}.
\]
Let
\[
\varepsilon_1 = \min\Big\{ \frac{\langle n_0, \gamma_0 \rangle}{2\lambda_1},\ \frac{1}{\lambda_1},\ \Big( \frac{\langle n_0, \gamma_0 \rangle}{4\lambda_2} \Big)^2 \Big\}
\]
and $N_0 = \max\{N_1, N_{\varepsilon_1}\}$, where $N_{\varepsilon_1}$ is defined as in Lemma 7.6. Let $\Omega$ be the set
\[
\Big\{ \sup_{t \leq t_0} |\gamma_t^N| < \min\Big\{ \frac{1}{\varepsilon_1},\ \frac{\langle n_0, \gamma_0 \rangle}{2\varepsilon_1} \Big\},\ \sup_{t \leq t_0 \wedge \varepsilon_1} |\gamma_t^N - \gamma_0^N| \leq \frac{\langle n_0, \gamma_0 \rangle}{4},\ |\gamma_0^N - \gamma_0| \leq \frac{\langle n_0, \gamma_0 \rangle}{4} \Big\}.
\]
Then if $N \geq N_0$ and $t \leq t_0 \wedge \varepsilon_1$, on $\Omega$,
\[
\langle n_0, \gamma_t^N \rangle
\geq \frac{2\varepsilon_1}{\langle n_0, \gamma_0 \rangle} \Big( \langle n_0, \gamma_0 \rangle - |\langle n_0, \gamma_t^N - \gamma_0^N \rangle| - |\langle n_0, \gamma_0^N - \gamma_0 \rangle| \Big) \frac{\langle n_0, \gamma_0 \rangle}{2\varepsilon_1}
\]
\[
\geq \frac{2\varepsilon_1}{\langle n_0, \gamma_0 \rangle} \Big( \langle n_0, \gamma_0 \rangle - |\gamma_t^N - \gamma_0^N| - |\gamma_0^N - \gamma_0| \Big) |\gamma_t^N|
\geq \frac{2\varepsilon_1}{\langle n_0, \gamma_0 \rangle} \Big( \langle n_0, \gamma_0 \rangle - \frac{\langle n_0, \gamma_0 \rangle}{4} - \frac{\langle n_0, \gamma_0 \rangle}{4} \Big) |\gamma_t^N|
= \varepsilon_1 |\gamma_t^N|,
\]
and $|\gamma_t^N| < \frac{1}{\varepsilon_1}$. But
\[
P(\Omega^c) \leq P\Big( \sup_{t \leq t_0} |\gamma_t^N| \geq \lambda_1 \Big)
+ P\Big( \sup_{t \leq t_0 \wedge \varepsilon_1} |\gamma_t^N - \gamma_0^N| > \lambda_2 \sqrt{\varepsilon_1} \Big)
+ P\Big( |\gamma_0^N - \gamma_0| > \frac{\langle n_0, \gamma_0 \rangle}{4} \Big),
\]
which is less than $\varepsilon$. Hence
\[
P\Big( \langle n_0, \gamma_t^N \rangle > \varepsilon_1 |\gamma_t^N| \ \text{ and } \ |\gamma_t^N| < \frac{1}{\varepsilon_1} \Big) \geq 1 - \varepsilon.
\]
By the continuity of $x_t$, there exists $0 < \varepsilon_2 < \varepsilon_1 \wedge \tau$ such that $t \leq \varepsilon_2$ implies $|x_t - x_0| \leq \delta(\varepsilon_1)$. Set $N' \geq \max\{N_0, (\varepsilon_1 \delta(\varepsilon_1))^{-2}\}$. Then, for $N \geq N'$ and $t \leq T^N \wedge \varepsilon_2$: $x_t \in \bar{S}$, $|x_t - x_0| \leq \delta(\varepsilon_1)$, $N^{-\frac{1}{2}} |\gamma_t^N| \leq \frac{N^{-1/2}}{\varepsilon_1} \leq \delta(\varepsilon_1)$ and $\langle n_0, \gamma_t^N \rangle > \varepsilon_1 |\gamma_t^N|$, all with probability exceeding $1 - \varepsilon$. By (7.3), this implies that $X_t^N = x_t + N^{-\frac{1}{2}} \gamma_t^N \in S$, i.e. $T^N \neq t$ for all $t \leq \varepsilon_2$. Hence $P(T^N \leq \varepsilon_2) < \varepsilon$ for all $N \geq N'$, and so $P(T^N = 0) \to 0$ as $N \to \infty$. The result follows by Proposition 6.16.
Lemma 7.8. Suppose $t \leq t_0$. Then $\gamma_t^N \Rightarrow \gamma_t$ as $N \to \infty$.

Proof. If $T = 0$, then by Lemma 7.7, $T^N \to 0$ in probability as $N \to \infty$. If $t > 0$ then, recalling that $\gamma_t^N$ is stopped at $T^N$ and that $\gamma_t$ is stopped at $T$, by Lemma 7.6 and (7.1),
\[
P(|\gamma_t^N - \gamma_t| \leq \varepsilon) = P(|\gamma_{T^N}^N - \gamma_0| \leq \varepsilon)
\geq P\Big( |\gamma_{T^N}^N - \gamma_0^N| \leq \frac{\varepsilon}{2},\ |\gamma_0^N - \gamma_0| < \frac{\varepsilon}{2} \ \text{ and } \ T^N \leq t \Big) \to 1.
\]
So assume $T > 0$. By conditioning on $\sigma(\gamma_{\tau_m})$ if necessary, where $\tau_m \in \mathcal{T}$, we may assume that $t < \tau_1$, where $\tau_1$ is the smallest non-zero element of $\mathcal{T}$.
Define $(\psi_t)_{t \leq \tau}$ in $\mathbb{R}^d \otimes (\mathbb{R}^d)^*$ by
\[
\dot{\psi}_t = \nabla b(x_t) \psi_t, \qquad \psi_0 = \mathrm{id}.
\]
Fix $\theta \in (\mathbb{R}^d)^*$ and set $\theta_t = (\psi_t^*)^{-1} \theta$. Then $\dot{\psi}_t^* \theta_t + \psi_t^* \dot{\theta}_t = 0$ ($\theta$ is constant) and so $\dot{\theta}_t = -(\psi_t^*)^{-1} (\dot{\psi}_t^* \theta_t) = -(\psi_t^*)^{-1} (\psi_t^* \nabla b(x_t)^* \theta_t) = -\nabla b(x_t)^* \theta_t$. By Ito's formula and the definition of $\gamma_t$ (Definition 7.3),
\[
d\langle \theta_t, \gamma_t \rangle = \langle \dot{\theta}_t, \gamma_t \rangle \, dt + \langle \theta_t, d\gamma_t \rangle
= -\langle \nabla b(x_t)^* \theta_t, \gamma_t \rangle \, dt + \langle \theta_t, \sigma(x_t) \, dB_t \rangle + \langle \theta_t, \nabla b(x_t) \gamma_t \rangle \, dt
= \langle \theta_t, \sigma(x_t) \, dB_t \rangle,
\]
and so
\[
\langle \theta_t, \gamma_t \rangle \sim N\Big( \langle \theta, \gamma_0 \rangle,\ \int_0^t \langle \theta_s, a(x_s) \theta_s \rangle \, ds \Big),
\]
i.e. $\gamma$ is a Gaussian process. On the other hand,
\[
d\gamma_t^N = \sqrt{N} \, dM_t^N + \sqrt{N} \big( b^N(X_t^N) - b(x_t) \big) dt,
\]
where $M_t^N$ is defined as in Theorem 6.12. As above,
\[
d\langle \theta_t, \gamma_t^N \rangle = \sqrt{N} \langle \theta_t, dM_t^N \rangle + R_t^{N,\theta} \, dt, \tag{7.4}
\]
where
\[
R_t^{N,\theta} = \sqrt{N} \big\langle \theta_t,\ b^N(X_t^N) - b(x_t) - \nabla b(x_t) (X_t^N - x_t) \big\rangle.
\]
By (7.2),
\[
\sup_{t \leq T^N} \sqrt{N} \big| b^N(X_t^N) - b(X_t^N) \big| \to 0.
\]
Given $\varepsilon > 0$, since $x_t \in S$ for all $t$ in the compact interval $[\varepsilon, \tau_1 - \varepsilon]$, there exists some $\delta' > 0$ such that the compact set $\{ x \in \mathbb{R}^d : \inf_{t \in [\varepsilon, \tau_1 - \varepsilon]} |x - x_t| \leq \delta' \}$ is a subset of $S$. Then since $b$ is $C^1$ on $S$, $\nabla b$ is uniformly continuous on compact sets, and so there exists some $0 < \delta < \delta'$ such that $|x - y| < \delta$ implies that $|\nabla b(x) - \nabla b(y)| \leq \varepsilon$. Now suppose $|x - x_t| \leq \delta$. By the mean value theorem, there exists some $\xi \in B(x_t, |x - x_t|) \subset B(x_t, \delta)$ such that $b(x) - b(x_t) = \nabla b(\xi)(x - x_t)$. Then
\[
|b(x) - b(x_t) - \nabla b(x_t)(x - x_t)| \leq |\nabla b(\xi) - \nabla b(x_t)| \, |x - x_t| \leq \varepsilon |x - x_t|.
\]
Hence, $|X_t^N - x_t| \leq \delta$ and $\varepsilon \leq t \leq \tau_1 - \varepsilon$ imply
\[
\sqrt{N} \big| b(X_t^N) - b(x_t) - \nabla b(x_t)(X_t^N - x_t) \big| \leq \varepsilon |\gamma_t^N|.
\]
$\theta_t$ is continuous and hence bounded, by $C_1$ say, on the compact interval $[0, \tau_1]$. Similarly $\nabla b(x_t)$ is bounded, by $C_2$ say. Given $\eta > 0$, by Lemma 7.5 there exists some $\lambda < \infty$ such that $P(\sup_{t \leq T^N} |\gamma_t^N| > \lambda) < \frac{\eta}{2}$. Given $t' < \tau_1$, pick $\varepsilon$ sufficiently small that $t' \leq \tau_1 - \varepsilon$ and
\[
\varepsilon \big[ C_1 (\varepsilon + (K + C_2)\lambda) + t' C_1 (1 + \lambda) \big] < \eta.
\]
Then, if $\delta > 0$ is as above, on the set
\[
\Omega = \Big\{ \sup_{t \leq T^N} \sqrt{N} |b^N(X_t^N) - b(X_t^N)| < \varepsilon,\ \sup_{t \leq T^N} |X_t^N - x_t| \leq \delta,\ \sup_{t \leq T^N} |\gamma_t^N| \leq \lambda \Big\},
\]
we have
\[
|R_t^{N,\theta}| \leq C_1 \Big( \sup_{t \leq t'} \sqrt{N} |b^N(X_t^N) - b(X_t^N)| + \big( K + \sup_{t \leq t_0} |\nabla b(x_t)| \big) \sup_{t \leq t'} |\gamma_t^N| \Big)
\leq C_1 \big( \varepsilon + (K + C_2) \lambda \big),
\]
and if $\varepsilon \leq t \leq t'$, then
\[
|R_t^{N,\theta}| \leq C_1 \Big( \sup_{t \leq t'} \sqrt{N} |b^N(X_t^N) - b(X_t^N)| + \varepsilon \sup_{t \leq t'} |\gamma_t^N| \Big)
\leq C_1 \varepsilon (1 + \lambda).
\]
Hence
\[
\int_0^{t'} |R_t^{N,\theta}| \, dt = \int_0^\varepsilon |R_t^{N,\theta}| \, dt + \int_\varepsilon^{t'} |R_t^{N,\theta}| \, dt
\leq \varepsilon \big[ C_1 (\varepsilon + (K + C_2)\lambda) \big] + (t' - \varepsilon) C_1 \varepsilon (1 + \lambda) < \eta.
\]
Choosing $N$ sufficiently large that $P(\Omega) \geq 1 - \eta$,
\[
P\Big( \int_0^{t'} |R_t^{N,\theta}| \, dt \leq \eta \Big) \geq 1 - \eta,
\]
i.e. the integral converges to $0$ in probability.

Therefore, returning to (7.4), in order to show $\gamma_t^N \Rightarrow \gamma_t$, it suffices to show, for all $\theta \in (\mathbb{R}^d)^*$ and all $t \leq \tau_1$, that
\[
\sqrt{N} \int_0^t \langle \theta_s, dM_s^N \rangle \to N\Big( 0,\ \int_0^t \langle \theta_s, a(x_s) \theta_s \rangle \, ds \Big)
\]
in distribution. By Levy's Continuity Theorem, it is sufficient to show, for all $\theta \in (\mathbb{R}^d)^*$ and all $t \leq \tau_1$, that $E(E_t^{N,\theta}) \to 1$ as $N \to \infty$, where
\[
E_t^{N,\theta} = \exp\Big\{ i \sqrt{N} \int_0^t \langle \theta_s, dM_s^N \rangle + \frac{1}{2} \int_0^t \langle \theta_s, a(x_s) \theta_s \rangle \, ds \Big\}.
\]
Set $\hat{m}^N(x, \theta) = m^N(x, i\theta)$, $\hat{m}(x, \theta) = m(x, i\theta)$ and
\[
\hat{\phi}^N(x, \theta) = \int_{\mathbb{R}^d} \big( e^{i\langle \theta, y \rangle} - 1 - i\langle \theta, y \rangle \big) K^N(x, dy).
\]
Assuming that (6.4) holds for θ ∈ (C^d)*, by the same argument as (6.7),

sup_{x∈S_N} sup_{|θ|≤η} |m̂^N′(x, Nθ) − m̂′(x, θ)| → 0

as N → ∞, for all η ≤ η0. By the second-order mean value theorem, if h ∈ (R^d)*, then

m̂′′(x, θ)h = m̂′(x, θ + h) − m̂′(x, θ) − ∫_0^1 m̂′′′(x, θ + rh)(h, h)(1 − r) dr,

and similarly for m̂^N′′(x, Nθ). By Lemma 6.4, |m̂′′′(x, θ)| ≤ B for all x ∈ S and |θ| ≤ η0, and N²|m̂^N′′′(x, Nθ)| ≤ N² · NB/N³ = B for all x ∈ S_N and |θ| ≤ η0. Given ε > 0, let δ = min{ (η0 − η)/2, ε/(2B) }. There exists N0 such that N ≥ N0 implies

sup_{x∈S_N} sup_{|θ|≤(η+η0)/2} |m̂^N′(x, Nθ) − m̂′(x, θ)| < εδ/4.
If N ≥ N0, then

|N m̂^N′′(x, Nθ) − m̂′′(x, θ)| = sup_{|h|=δ} (1/|h|) |(N m̂^N′′(x, Nθ) − m̂′′(x, θ))h|
  = sup_{|h|=δ} (1/|h|) | m̂^N′(x, N(θ + h)) − m̂′(x, θ + h) − m̂^N′(x, Nθ) + m̂′(x, θ) + ∫_0^1 ( m̂′′′(x, θ + rh) − N² m̂^N′′′(x, N(θ + rh)) )(h, h)(1 − r) dr |
  ≤ sup_{|h|=δ} (1/|h|) ( 2 sup_{x∈S_N} sup_{|θ|≤(η+η0)/2} |m̂^N′(x, Nθ) − m̂′(x, θ)| + (1/2)·2B|h|² )
  < (1/δ)( 2·εδ/4 + Bδ² )
  ≤ ε.

Hence, for all η < η0, we have

sup_{x∈S_N} sup_{|θ|≤η} |N m̂^N′′(x, Nθ) − m̂′′(x, θ)| → 0.
Since a(x) = m′′(x, 0) = −m̂′′(x, 0),

φ^N(x, √Nθ) + (1/2)⟨θ, a(x)θ⟩ = ∫_0^1 ( N m̂^N′′(x, √N rθ) − m̂′′(x, 0) )(θ, θ)(1 − r) dr,

where the first term arises from the second-order mean value theorem. Hence, for all ρ < ∞,

sup_{x∈S_N} sup_{|θ|≤ρ} |φ^N(x, √Nθ) + (1/2)⟨θ, a(x)θ⟩|
  ≤ (1/2)ρ² sup_{x∈S_N} sup_{|θ|≤ρ} |N m̂^N′′(x, √Nθ) − m̂′′(x, 0)|
  ≤ (1/2)ρ² [ sup_{x∈S_N} sup_{|θ|≤ρ/√N} |N m̂^N′′(x, Nθ) − m̂′′(x, θ)| + sup_{x∈S_N} sup_{|θ|≤ρ} |m̂′′(x, θ/√N) − m̂′′(x, 0)| ]
  → 0,     (7.5)

the second term converging by the continuity of m̂′′(x, θ) as a function of θ.
Write E^{N,θ}_t = E^N_t = Z^N_t A^N_t B^N_t, where

Z^N_t = exp{ i√N ∫_0^t ⟨θ_s, dM^N_s⟩ − ∫_0^t φ^N(X^N_s, √Nθ_s) ds },

A^N_t = exp{ ∫_0^t ( φ^N(X^N_s, √Nθ_s) + (1/2)⟨θ_s, a(X^N_s)θ_s⟩ ) ds },

B^N_t = exp{ ∫_0^t (1/2)⟨θ_s, (a(x_s) − a(X^N_s))θ_s⟩ ds }.
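To see that this factorisation recovers E^{N,θ}_t, note that the exponents cancel in pairs: the ∫ φ^N(X^N_s, √Nθ_s) ds terms between Z^N_t and A^N_t, and the ⟨θ_s, a(X^N_s)θ_s⟩ terms between A^N_t and B^N_t, so that

```latex
\log Z^N_t + \log A^N_t + \log B^N_t
  = i\sqrt{N}\int_0^t \langle\theta_s,\, dM^N_s\rangle
  + \frac{1}{2}\int_0^t \langle\theta_s,\, a(x_s)\theta_s\rangle\, ds
  = \log E^{N,\theta}_t .
```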
(Z^N_{t∧T^N})_{t≤τ} is a martingale as in Lemma 6.7, and so E(Z^N_{t∧T^N}) = 1 for all N. Fix t ≤ τ. Since θ_s is continuous, it is bounded on the compact interval [0, t]. By Lemma 6.4, a(x) is bounded and so, for s ≤ t, by (7.5), φ^N(X^N_s, √Nθ_s), and hence Z^N_{t∧T^N}, is bounded, uniformly in N. Also by (7.5), A^N_{t∧T^N} → 1 uniformly as N → ∞. Since a(x) is bounded, B^N_{t∧T^N} is bounded uniformly in N, and since a is Lipschitz and X^N_t → x_t in probability (Theorem 6.12), B^N_{t∧T^N} converges to 1 in probability. Given ε > 0, suppose |Z^N_{t∧T^N}|, |A^N_{t∧T^N}|, |B^N_{t∧T^N}| ≤ M for some constant M and that N is sufficiently large that P(|A^N_{t∧T^N} B^N_{t∧T^N} − 1| ≥ δ) ≤ ε/(2(M³ + 1)), where δ = ε/(2M). Then
|E(Z^N_{t∧T^N} A^N_{t∧T^N} B^N_{t∧T^N}) − 1| = |E( Z^N_{t∧T^N}(A^N_{t∧T^N} B^N_{t∧T^N} − 1) )|
  ≤ E( |Z^N_{t∧T^N}| |A^N_{t∧T^N} B^N_{t∧T^N} − 1| 1{|A^N_{t∧T^N} B^N_{t∧T^N} − 1| < δ} )
    + E( (|Z^N_{t∧T^N} A^N_{t∧T^N} B^N_{t∧T^N}| + 1) 1{|A^N_{t∧T^N} B^N_{t∧T^N} − 1| ≥ δ} )
  ≤ Mδ P(|A^N_{t∧T^N} B^N_{t∧T^N} − 1| < δ) + (M³ + 1) P(|A^N_{t∧T^N} B^N_{t∧T^N} − 1| ≥ δ)
  < ε.
Hence E(Z^N_{t∧T^N} A^N_{t∧T^N} B^N_{t∧T^N}) → 1 as N → ∞. By Lemma 7.7, P(T^N > t) → 1 for all t < τ1. Since E^N_t = Z^N_t A^N_t B^N_t is bounded uniformly in N, by a similar argument to that above, E(E^N_t) → 1 for all t < τ1, as required.
Lemma 7.9. For all finite subsets {t1, . . . , tk} ⊂ [0, τ], (γ^N_{t1}, . . . , γ^N_{tk}) ⇒ (γ_{t1}, . . . , γ_{tk}).
Proof. The proof is by induction on k. The case k = 1 is proved in Lemma 7.8 above. Suppose we have proved the result for all m < k, where k > 1. Let 0 ≤ t1 < · · · < tk ≤ τ. By Lévy's Continuity Theorem, it is sufficient to show that

E( exp{ i Σ_{j=1}^k ⟨θ_j, γ^N_{t_j}⟩ } ) → E( exp{ i Σ_{j=1}^k ⟨θ_j, γ_{t_j}⟩ } ),
for every k-tuple (θ1, . . . , θk) ∈ ((R^d)*)^k. Let

β^N = Σ_{j=1}^{k−2} ⟨θ_j, γ^N_{t_j}⟩ + ⟨θ_{k−1} + θ_k, γ^N_{t_{k−1}}⟩,

and define β similarly. By our inductive hypothesis, β^N ⇒ β and so, conditional on F_{t_{k−1}} = σ(γ_t : 0 ≤ t ≤ t_{k−1}), β^N → β in probability. Hence

E( |e^{i(β^N − β)} − 1| | F_{t_{k−1}} ) → 0
as N → ∞. Also, by an identical argument to Lemma 7.8,

E( e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} | F_{t_{k−1}} ) → E( e^{i⟨θ_k, γ_{t_k} − γ_{t_{k−1}}⟩} | F_{t_{k−1}} ),
as N → ∞. Hence

|E( e^{iβ^N} e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} ) − E( e^{iβ} e^{i⟨θ_k, γ_{t_k} − γ_{t_{k−1}}⟩} )|
  = |E[ E( e^{iβ^N} e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} − e^{iβ} e^{i⟨θ_k, γ_{t_k} − γ_{t_{k−1}}⟩} | F_{t_{k−1}} ) ]|
  ≤ E| E( e^{i(β^N − β)} e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} − e^{i⟨θ_k, γ_{t_k} − γ_{t_{k−1}}⟩} | F_{t_{k−1}} ) |
  ≤ E| E[ ( e^{i(β^N − β)} − 1 ) e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} | F_{t_{k−1}} ] |
    + E| E( e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} − e^{i⟨θ_k, γ_{t_k} − γ_{t_{k−1}}⟩} | F_{t_{k−1}} ) |
  ≤ E[ E( |e^{i(β^N − β)} − 1| | F_{t_{k−1}} ) ]
    + E| E( e^{i⟨θ_k, γ^N_{t_k} − γ^N_{t_{k−1}}⟩} − e^{i⟨θ_k, γ_{t_k} − γ_{t_{k−1}}⟩} | F_{t_{k−1}} ) |
  → 0.

Therefore (γ^N_{t1}, . . . , γ^N_{tk}) ⇒ (γ_{t1}, . . . , γ_{tk}), completing the inductive step.
Combining these results gives the following theorem.
Theorem 7.10. γ^N ⇒ γ as N → ∞.
Proof. By Lemmas 7.5 and 7.6, the sequence (γ^N)_{N∈N} is relatively compact and, by Lemma 7.9, the finite dimensional distributions of γ^N converge in distribution to those of γ. The result follows by Theorem 4.3.
The basic implication of this theorem is that (X^N_t)_{t≥0} can be approximated (in distribution) by the Gaussian process (x_t + N^{−1/2}γ_t)_{t≥0}. This allows us to use properties of diffusion processes when investigating the behaviour of X^N for large values of N.
8 Applications
The results from the previous two sections can be applied to obtain limiting results in a wide range of areas, from random graphs and stochastic networks to perturbed dynamical systems. We look at the applications to biology and specifically to population processes, discussing two examples: epidemics and logistic growth.
8.1 Epidemics
Suppose we have a population consisting of N individuals. In the population, at any given time t, there are a number of individuals, S^N_t, who are susceptible to a particular disease, and a number of individuals, I^N_t, who are infected by the disease and can pass it on. A susceptible individual encounters diseased individuals at a rate proportional to the fraction of the total population that is diseased, with proportionality constant λ. We assume that diseased individuals recover and become immune independently of each other, at rate µ. Therefore, (S^N_t, I^N_t)_{t≥0} is a continuous-time Markov chain, taking values in (Z+)², where

(s, i) → (s − 1, i + 1) at rate λsi/N,
(s, i) → (s, i − 1) at rate µi.

We are interested in the proportion of susceptible and infected individuals for large values of N, as this is generally the situation under which an actual epidemic occurs.
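This chain is straightforward to simulate, which gives a useful empirical check on the fluid limit derived below. The following sketch (function name and parameter values are ours, chosen for illustration) uses the standard Gillespie algorithm: wait an exponential time with rate equal to the total jump rate, then pick a transition with probability proportional to its rate.

```python
import random

def simulate_epidemic(N, I0, lam, mu, t_max, seed=0):
    """Gillespie simulation of the chain (S, I):
    (s, i) -> (s - 1, i + 1) at rate lam*s*i/N  (infection),
    (s, i) -> (s, i - 1)     at rate mu*i       (recovery)."""
    rng = random.Random(seed)
    s, i, t = N - I0, I0, 0.0
    path = [(t, s, i)]
    while i > 0 and t < t_max:
        infect, recover = lam * s * i / N, mu * i
        total = infect + recover
        t += rng.expovariate(total)        # exponential holding time
        if rng.random() * total < infect:  # choose which transition fires
            s, i = s - 1, i + 1
        else:
            i -= 1
        path.append((t, s, i))
    return path
```

For large N, the path of (S^N_t, I^N_t)/N should stay close to the solution of the differential equations derived below.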
Let X^N_t = (S^N_t, I^N_t)/N be the proportion of susceptibles and infectives at time t. Since I^N_t + S^N_t ≤ N for all t, X^N_t takes values in U_N = (1/N)Z² ∩ U, where U is the unit square [0, 1]². We are particularly interested in the proportion of individuals who have avoided being infected by the time the last infective recovers and, for this reason, we let S be the open set (−1, 2) × (0, 2). (This choice of S is somewhat arbitrary, but has the property that the only way the process can leave S is by the last infective recovering.) Set S_N = U_N ∩ S, and for some fixed t0, let T^N = inf{t ≥ 0 : X^N_t ∉ S} ∧ t0.
∆X^N_t = (−1, 1)/N with rate λS^N_t I^N_t/N = NλX^{N,1}_t X^{N,2}_t,
∆X^N_t = (0, −1)/N with rate µI^N_t = NµX^{N,2}_t,

and so K^N(x, dy) = Nλx1x2 δ_{(−1,1)/N} + Nµx2 δ_{(0,−1)/N}. Then
m^N(x, θ) = ∫_{R²} e^{⟨θ,y⟩} K^N(x, dy) = Nλx1x2 e^{(θ2−θ1)/N} + Nµx2 e^{−θ2/N},

and so there exists a limit kernel K(x, dy) = λx1x2 δ_{(−1,1)} + µx2 δ_{(0,−1)}, with corresponding Laplace transform m(x, θ) = λx1x2 e^{θ2−θ1} + µx2 e^{−θ2}, with the properties:

(a) There exists a constant η0 = 1 such that

sup_{x∈S} sup_{|θ|≤1} m(x, θ) ≤ 4λe² + 2µe < ∞.

(b) sup_{x∈S_N} sup_{|θ|≤η0} |m^N(x, Nθ)/N − m(x, θ)| = 0.
We also assume that there exists some x0 ∈ S̄, the closure of S, such that, for all δ > 0,

lim sup_{N→∞} N^{−1} log P(|X^N_0 − x0| ≥ δ) < 0.

Note that as S^N_0 + I^N_0 = N for all N, x0 = (1 − α, α) for some α ∈ [0, 1]. Set b(x) = m′(x, 0) = (−λx1x2, λx1x2 − µx2). On S,
|λ(x1x2 − y1y2)| ≤ λ(|x1 − y1||x2| + |x2 − y2||y1|) ≤ 2λ(|x1 − y1| + |x2 − y2|),

and

|λ(x1x2 − y1y2) − µ(x2 − y2)| ≤ 2λ|x1 − y1| + (2λ + µ)|x2 − y2| ≤ (2λ + µ)(|x1 − y1| + |x2 − y2|).

Hence

|b(x) − b(y)|² = |λ(x1x2 − y1y2)|² + |λ(x1x2 − y1y2) − µ(x2 − y2)|² ≤ (4λ² + (2λ + µ)²)(|x1 − y1| + |x2 − y2|)² ≤ 2(4λ² + (2λ + µ)²)|x − y|²,
and so b is Lipschitz on S. Therefore, there is a unique solution (x_t)_{t≥0} to the ordinary differential equation ẋ_t = b(x_t) starting from x0, that is,

ẋ^1_t = −λx^1_t x^2_t,    ẋ^2_t = (λx^1_t − µ)x^2_t.
The sequence of jump processes (X^N)_{N≥1} satisfies all the conditions needed to apply the Fluid Limit Theorem (Theorem 6.12) and so, for all δ > 0,

lim sup_{N→∞} N^{−1} log P( sup_{t≤T^N} |X^N_t − x_t| ≥ δ ) < 0.
Let τ = inf{t ≥ 0 : x^2_t = 0}. Then T = {t ≥ 0 : x_t ∉ S} = [τ, ∞) (if x^2_t = 0, then ẋ^1_t = ẋ^2_t = 0), and so x_{t∧τ} = x_t for all t. The argument in Proposition 6.16 then shows that

lim sup_{N→∞} N^{−1} log P( sup_{t≥0} |X^N_t − x_{t∧τ}| > δ ) < 0.
By investigating the properties of x_t, we can get a good idea of the behaviour of X^N_t for large values of N. We introduce a new variable x^3_t, the proportion of individuals that have recovered by time t, so that x^1_t + x^2_t + x^3_t = 1 for all t, and hence ẋ^3_t = µx^2_t. We shall discuss three questions:

(a) What conditions are needed for an epidemic to take place?

(b) What proportion of people will catch the disease and, in particular, will there be any susceptibles left when the epidemic has died out?

(c) What is the time dependence of the epidemic?
The case α = 0, when there are no infectives initially, is not interesting as all the values remain constant over time, so we assume that α > 0. By reversing time in the differential equations, it can be seen that this implies that x^2_t > 0 for all t (and so in fact τ = ∞). An epidemic occurs when the proportion of infectives increases over some time period. Looking at the equation for x^2_t, this happens if and only if λx^1_t > µ for some t. By the equation for x^1_t, x^1_t is a decreasing function of t, and so an epidemic occurs if and only if λx^1_0 > µ, that is, ρ0 = λ(1 − α)/µ > 1. We call ρ0 the reproductive rate.
By the equation for x^3_t, x^3_t is increasing, but bounded above by 1, and so must converge to a limit as t → ∞. Hence ẋ^3_t → 0, and so x^2_t → 0 as t → ∞. Therefore, the epidemic dies out in the limit. Now

dx^1_t/dx^3_t = ẋ^1_t/ẋ^3_t = −(λ/µ)x^1_t,
and so x^1_t = x^1_0 e^{−(λ/µ)x^3_t}. At the end of the epidemic, x^2_∞ = 0 and so x^1_∞ + x^3_∞ = 1. Hence the proportion of susceptibles at the end of the epidemic satisfies

x^1_∞ = (1 − α) e^{−(λ/µ)(1 − x^1_∞)},
which can be solved numerically. Often, the initial number of infectives is very small, so (1 − α) is approximately 1. If we let π = 1 − x^1_∞/(1 − α) be the proportion of susceptibles ultimately infected, then substituting in the above equation gives

π ≈ 1 − e^{−(λ(1−α)/µ)π} = 1 − e^{−ρ0 π}.

For a given value of ρ0, this can be solved using Newton's method.
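A sketch of this Newton iteration (the function name and starting point are ours): we solve f(π) = π − 1 + e^{−ρ0π} = 0 from the initial guess π = 1, which converges monotonically to the positive root when ρ0 > 1, since f is convex with f(1) > 0 and f′(1) > 0.

```python
import math

def final_size(rho0, tol=1e-12):
    """Newton's method for pi = 1 - exp(-rho0*pi): the positive root
    of f(pi) = pi - 1 + exp(-rho0*pi), valid for rho0 > 1."""
    pi = 1.0  # start to the right of the root, where f > 0 and f is convex
    while True:
        f = pi - 1.0 + math.exp(-rho0 * pi)
        fp = 1.0 - rho0 * math.exp(-rho0 * pi)  # f'(pi)
        step = f / fp
        pi -= step
        if abs(step) < tol:
            return pi
```

For example, ρ0 = 2 gives π ≈ 0.797, so roughly 80% of the susceptibles are ultimately infected.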
Using the fact that x^2_t = 1 − x^1_t − x^3_t, and x^1_t = (1 − α)e^{−(λ/µ)x^3_t}, the epidemic curve is given by

ẋ^3_t = µ( 1 − x^3_t − (1 − α)e^{−(λ/µ)x^3_t} ).

For specified values of λ, µ and α, this equation can be solved by numerical methods, which in turn gives values for x^1_t and x^2_t. The graphs in Figure 4 (from [2]) show the course of a model epidemic for λ = 0.001, µ = 0.1 in the cases α = 1/500, i.e. ρ0 = 4.99 > 1, and α = 1/150, i.e. ρ0 = 1.49 > 1.
Figure 4: Graphs of a model epidemic for ρ0 = 4.99 and ρ0 = 1.49
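The epidemic curve is easy to integrate with a simple Euler scheme, as in the sketch below. The rate values used here (λ = 0.5, µ = 0.1, α = 0.002, giving ρ0 ≈ 4.99) are illustrative choices of ours for the normalised model above, not the parameterisation used in [2].

```python
import math

def epidemic_curve(lam, mu, alpha, t_max, dt=0.01):
    """Euler integration of dx3/dt = mu*(1 - x3 - (1-alpha)*exp(-(lam/mu)*x3)),
    recovering x1 = (1-alpha)*exp(-(lam/mu)*x3) and x2 = 1 - x1 - x3 at each step."""
    x3, out = 0.0, []
    for k in range(int(t_max / dt) + 1):
        x1 = (1.0 - alpha) * math.exp(-(lam / mu) * x3)
        x2 = 1.0 - x1 - x3
        out.append((k * dt, x1, x2, x3))
        x3 += dt * mu * (1.0 - x3 - x1)
    return out

curve = epidemic_curve(0.5, 0.1, 0.002, 200.0)
```

At the end of the run, x2 is close to zero and x1 agrees with the final-size equation x^1_∞ = (1 − α)e^{−(λ/µ)(1 − x^1_∞)} derived above.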
8.2 Logistic Growth
We now interpret N as the area of a region occupied by a certain population. If the populationsize is k, then the population density is k
N . For simplicity, we assume that births and deaths occursingly. The rate of births and deaths should be approximately proportional to the populationsize. We assume, however, that crowding affects the birth and death rates, which thereforedepend on the population density. Therefore, the birth and death rates should be of the formλ(k/N)k and µ(k/N)k, respectively, for some functions λ and µ. If a, b, and c are non-negativeconstants, then one of the simplest such models is given by taking λ(x) = a and µ(x) = b+ cx.
Let the Markov chain (X^N_t)_{t≥0} be the population density at time t. Then X^N takes values in I_N = (1/N)Z+, and

∆X^N_t = 1/N at rate NaX^N_t,
∆X^N_t = −1/N at rate NbX^N_t + Nc(X^N_t)².

We are interested in whether the population exceeds a certain density or dies out and so, for fixed R > 0, we set S = (0, R) and S_N = I_N ∩ S. X^N has Lévy kernel K^N(x, dy) = Nax δ_{1/N} + (Nbx + Ncx²) δ_{−1/N}, and so

m^N(x, θ) = Nax e^{θ/N} + (Nbx + Ncx²) e^{−θ/N}.
There exists a limit kernel K(x, dy) = ax δ_1 + (bx + cx²) δ_{−1}, with corresponding Laplace transform m(x, θ) = ax e^θ + (bx + cx²) e^{−θ}, with the required properties. Suppose there exists some x0 ∈ S̄, the closure of S, such that, for all δ > 0,

lim sup_{N→∞} N^{−1} log P(|X^N_0 − x0| ≥ δ) < 0.

Set b(x) = m′(x, 0) = (a − b)x − cx². b is Lipschitz on S and so there is a unique solution (x_t)_{t≥0} to the ordinary differential equation ẋ_t = (a − b)x_t − cx_t² starting from x0. Applying the Fluid Limit Theorem (Theorem 6.12), for all δ > 0,

lim sup_{N→∞} N^{−1} log P( sup_{t≤T^N} |X^N_t − x_t| ≥ δ ) < 0,
where T^N is defined as in Definition 6.10. We can solve the ordinary differential equation explicitly to get

x_t = x0(a − b) / ( (a − b − cx0)e^{(b−a)t} + cx0 ) if a ≠ b, or x_t = x0/(x0ct + 1) if a = b,

from which it is easy to see that, when c ≠ 0, the population dies out as t → ∞ if a ≤ b, and the population density stabilises to (a − b)/c if a > b. If c = 0, then the population dies out if a < b, it explodes (i.e. becomes infinite) if a > b, and the density is stable at x0 if a = b. A graph of the case a > b, c ≠ 0 is sketched in Figure 5.
Figure 5: A graph of the logistic growth model in the case a > b, c 6= 0
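The explicit solution can be checked numerically. In this sketch (function names and parameter values are ours) we compare it with an Euler integration of the differential equation and with the limiting density (a − b)/c.

```python
import math

def x_explicit(t, x0, a, b, c):
    """Explicit solution of dx/dt = (a - b)x - c*x**2 started from x0."""
    if a != b:
        return x0 * (a - b) / ((a - b - c * x0) * math.exp((b - a) * t) + c * x0)
    return x0 / (x0 * c * t + 1.0)

def x_euler(t, x0, a, b, c, dt=1e-4):
    """Euler integration of the same equation, for comparison."""
    x = x0
    for _ in range(int(t / dt)):
        x += dt * ((a - b) * x - c * x * x)
    return x
```

With a = 2, b = 1, c = 1 the density approaches (a − b)/c = 1 from any x0 > 0, as the explicit formula shows.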
We now look at the fluctuations about this result. As in Section 7, let γ^N_t = √N(X^N_t − x_t) and suppose that there exists a random variable γ0 such that γ^N_0 ⇒ γ0. Let a(x) = m′′(x, 0) = (a + b)x + cx². It is easy to check that all the necessary conditions for Theorem 7.10 hold. Hence γ^N ⇒ γ, where γ is the solution to the stochastic differential equation

dγ_t = σ(x_t)dB_t + ∇b(x_t)γ_t dt = √( (a + b)x_t + cx_t² ) dB_t + (a − b − 2cx_t)γ_t dt.
In all of the cases mentioned above, except a > b, c = 0, x_t converges to a finite limit as t → ∞. When this limit is zero, γ_t behaves asymptotically like the solution to the differential equation

dy_t = (a − b)y_t dt.

That is, y_t = y0 e^{(a−b)t} → 0 as t → ∞ when a < b, and so the fluctuations die out as well. When x_t converges to (a − b)/c, a > b and so γ_t behaves asymptotically like the solution to the stochastic differential equation

dY_t = √( 2a(a − b)/c ) dB_t − (a − b)Y_t dt,

i.e. Y is an Ornstein–Uhlenbeck process and so Y_t ∼ N( Y0 e^{−(a−b)t}, (a/c)(1 − e^{−2(a−b)t}) ) ⇒ N(0, a/c)

as t → ∞. Hence the fluctuations are asymptotically stable and X^N_t behaves like a N((a − b)/c, a/(Nc)) random variable for large t and N.
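This limit is easy to check by simulation. The sketch below (function name, parameters and discretisation ours) runs an Euler–Maruyama discretisation of the Ornstein–Uhlenbeck equation and compares the empirical variance at a large time with the stationary value a/c.

```python
import math, random

def ou_variance(a, b, c, n_paths=5000, dt=0.02, t_end=5.0, seed=1):
    """Euler-Maruyama for dY = sqrt(2a(a-b)/c) dB - (a-b) Y dt (requires a > b);
    returns the empirical variance of Y at t_end over n_paths paths started at 0."""
    rng = random.Random(seed)
    kappa = a - b                          # mean-reversion rate
    sq = math.sqrt(2.0 * a * kappa / c) * math.sqrt(dt)  # noise per step
    ys = [0.0] * n_paths
    for _ in range(int(t_end / dt)):
        ys = [y - kappa * y * dt + sq * rng.gauss(0.0, 1.0) for y in ys]
    mean = sum(ys) / n_paths
    return sum((y - mean) ** 2 for y in ys) / (n_paths - 1)
```

With a = 2, b = 1, c = 1 the stationary variance is a/c = 2, and the empirical value should be close to this, up to Monte Carlo and discretisation error.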
9 Conclusion
In this essay, the notion of convergence for a sequence of stochastic processes has been discussed. We followed the route of defining a topology on the space of cadlag functions, and a metric on the space of probability measures that induced the same topology as that generated by weak convergence. This enabled us to establish an equivalence between weak convergence, convergence in probability and almost sure convergence, which we used to prove that, in the simple cases of Markov chains and diffusion processes, the convergence of the generators implied weak convergence. In Section 4, we proved a key result, due to Prohorov, which states that a sequence of stochastic processes converges if and only if it is relatively compact and the corresponding finite dimensional distributions converge. The remainder of the essay was spent establishing a more workable form of this result, and applying it to generalise the Law of Large Numbers and the Central Limit Theorem to Markov jump processes. Applications of these theorems were also discussed.
There are a number of ways in which the key result mentioned above can be developed. One idea is to show that any sequence of Markov processes converges if the corresponding sequence of generators converges. Using the semigroup representation, it follows directly from the Markov property that the finite dimensional distributions converge. The difficulty is showing the relative compactness of the sequence of processes. This can be done using Aldous' Criterion, which is proved by Kallenberg [8]. A full proof of the result, for Feller processes, is given in the same reference.
A shortcoming of Prohorov's result is that in general it is very difficult, and sometimes impossible, to calculate the finite dimensional distributions of a process. Stroock and Varadhan formulated an alternative result which depends on the generators of the processes, and is consequently applicable in many more situations. It relies on the fact that if X is a Markov process with sample paths in a metric space (S, d) and with generator A, then
f(X(t)) − ∫_0^t Af(X(s)) ds
is a martingale for all f ∈ C(S). This can in fact be used to characterise the process X. Stroock and Varadhan's result is stated as follows. Suppose that (X_n)_{n≥1} is a relatively compact
sequence of Markov processes, with generators (A_n)_{n≥1} such that A_n → A. If there exists a Markov process X such that
f(X(t)) − ∫_0^t Af(X(s)) ds
is a martingale for all f ∈ C(S), then X_n ⇒ X. A comprehensive account of this result is given in [4] and in [7].
References
[1] Billingsley, P. Convergence of Probability Measures. Wiley, New York, 1999.
[2] Brown, D. and Rothery, P. Models in Biology: Mathematics, Statistics and Computing. Wiley, Chichester, 1993.
[3] Darling, R.W.R. and Norris, J.R. Vertex Identifiability in Large Random Hypergraphs. In preparation, 2002.
[4] Ethier, S.N. and Kurtz, T.G. Markov Processes: Characterization and Convergence. Wiley, New York, 1986.
[5] Freidlin, M.I. Markov Processes and Differential Equations: Asymptotic Problems. Birkhäuser Verlag, Basel, 1996.
[6] Galambos, J. and Gani, J. Studies in Applied Probability. Applied Probability Trust, Oxford, 1994.
[7] Jacod, J. and Shiryaev, A.N. Limit Theorems for Stochastic Processes. Springer-Verlag, Berlin, 1987.
[8] Kallenberg, O. Foundations of Modern Probability. Springer, New York, 1997.
[9] Norris, J.R. Markov Chains. Cambridge University Press, Cambridge, 1997.
[10] Rogers, L.C.G. and Williams, D. Diffusions, Markov Processes, and Martingales, Vol. I. Cambridge University Press, Cambridge, 2001.