
Path Properties of Rare Events

Jesse Collingwood

Thesis Submitted to the Faculty of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics¹

Department of Mathematics and Statistics
Faculty of Science
University of Ottawa

© Jesse Collingwood, Ottawa, Canada, 2015

¹ The Ph.D. program is a joint program with Carleton University, administered by the Ottawa–Carleton Institute of Mathematics and Statistics.


Abstract

Simulation of rare events can be costly with respect to time and computational resources. For certain processes it may be more efficient to begin at the rare event and simulate a kind of reversal of the process. This approach is particularly well suited to reversible Markov processes, but holds much more generally. This more general result is formulated precisely in the language of stationary point processes, proven, and applied to some examples.

An interesting question is whether this technique can be applied to Markov processes which are substochastic, i.e. processes which may die if a graveyard state is ever reached. First, some of the theory of substochastic processes is developed; in particular, a slightly surprising result is proved about the rate of convergence of the distribution $\pi_n$ at time $n$ of the process conditioned to stay alive to the quasi-stationary distribution, or Yaglom limit. This result is then verified with some illustrative examples.

Next, it is demonstrated with an explicit example that on infinite state spaces the reversal approach to analyzing both the rate of convergence to the Yaglom limit and the likely path of rare events can fail due to transience.


Acknowledgements

Many people have been gracious guides during the writing of my thesis. Jean-Yves Le Boudec kindly provided me with an office to work in and bits of lunchtime wisdom at the École Polytechnique Fédérale de Lausanne, and it was there that I met with the invaluable insight of Nicolas Gast, whose help in obtaining some of my results cannot be overstated. Gail Ivanoff offered encouraging words of support on more than one occasion; for this and for the years of helpful tutelage she imparted to me before this project ever began I am very grateful. Paul Dupuis must be sincerely thanked for his help in overcoming a sticky technical issue. I am indebted to Chris Dionne and Elizabeth Maltais for their support as friends and colleagues, and to several of the graduate students with whom I have had the pleasure of sharing close academic quarters over the past four years, among them Farid Elktaibi, Maryam Sohrabi, Nada Habli, Ibrahim Abdelrazeq, and Jeanseong Park.

As with anyone, the most vital support I received was at home, and consequently my deepest gratitude lies with my own family and with the Lee-Shanok family. If this thesis is a play they are its light crew, make-up artist, special effects technician and one hundred other things that work behind the scenes to make everything possible.

Finally, I must thank my friend and advisor David McDonald. What I have learned under his tutelage intersects many domains, from mathematics to history and politics to where one finds the best eats in Paris. I have learned to take a more intuitive approach to things from being exposed to his thinking style, and I am eternally grateful for the opportunities he has furnished me with.


Dedication

For Murray.



Contents

Abstract
Acknowledgements
Dedication

1 Introduction

2 Rare Excursions for Markov Processes
   2.1 Stationary Point Processes
   2.2 The Markovian Setting
   2.3 Reversals
   2.4 The Folk Theorem
      2.4.1 Folk Theorem for Markov Chains
      2.4.2 Fluid Limits
      2.4.3 An Accounting Network

3 Quasi-Stationary Measures for Sub-Stochastic Chains
   3.1 The Yaglom Limit
   3.2 Deterministic Convergence to $\pi$
   3.3 Conditions on $\pi_0$
   3.4 Comparison with Stochastic Case
   3.5 Continuous Case

4 Rare Events for Substochastic Chains
   4.1 The Sustained Kernel
   4.2 An Accounting Network With Absorption

5 Finding Quasi-Stationary Measures
   5.1 A Representation Theorem
   5.2 The Approximation Lemma

6 Conclusions and Future Work


Chapter 1

Introduction

There are two central ideas upon which this thesis is based. The first is the fact, known already to some, that when attempting to understand what a large deviation path for a process looks like, the prudent course may be to begin at the end and look at things in reverse. This is because straightforward simulation may take forever to yield the desired rare event; however, if the laws governing the time reversal are available, then by beginning at the rare event and "running the film" of its path backwards one selects from the very subset of the forward paths which are of interest.

This general idea is not new, and has obscure origins ([2] is the earliest example of which I am aware, and was never published). Yet like so many folk tales and ballads which come through mysterious beginnings to fixate in our minds, it is a part of the repertoire of many probabilists. It is for this reason that the result has been named the Folk Theorem herein. It has not to my knowledge been given a rigorous formulation and proof for point processes before now. This is done in Section 2.4 after first setting up the notation and framework in Sections 2.1 through 2.3.

Section 2.1 follows closely the standard text [3] and introduces the reader to the theory of stationary point processes. The purpose of Section 2.2 is to introduce processes watched on a set and to arrive at Lemma 2.2.2, which is a new calculation based on a similar formula in [3]. This formula provides a means of calculating certain kinds of Palm probabilities from other known quantities and is subsequently used to help apply the Folk Theorem.

Section 2.3 deals with reversals. The definition of a reversal of a sample path and a stochastic process is given, and all the changes to the corresponding stationary point process are carefully tracked. The main facts needed from this section are Lemma 2.3.3, which simply says that reversing a set of paths does not change its probability under the stationary measure, and Equation (2.3.1), which is a way of rewriting the reversal of a set shifted by a particular exit time and is needed in the proof of the Folk Theorem.

In Section 2.4 a simple coin flipping example is given to illustrate the central idea behind the Folk Theorem, and then the Theorem itself is presented. Its rigorous formulation and proof using the theory of stationary point processes is one of the main results of the thesis. The goal is to apply the Folk Theorem to Markov chains, and so the subsequent subsection provides techniques for computing the Palm probability expressions of the Theorem from the stationary measure and kernel of a given chain. Lemmas 2.4.3 and 2.4.4 are new results which achieve this, and Corollary 2.4.5 uses these two Lemmas to essentially restate the Folk Theorem in the language of Markov chains. Section 2.4.3 applies the Markovian version of the Folk Theorem to an accounting network example taken from [8].

The next idea, also already well known, is that the standard paradigm wherein one approximates the distribution of a Markovian process at time $n$, for large $n$, by the limiting steady state may be an oversimplification in some cases. This is because many processes so modeled are in fact substochastic, i.e. there is a chance that the whole process stops (the machine breaks, the population dies, the universe implodes, etc.). For such processes, analyzing the distribution at time $n$ conditional upon still being alive, and its limiting distribution $\pi$ (when it has one), may be more apt. Thus the main result of Chapter 3 gives a rate of convergence of these conditional distributions to $\pi$ for discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself a rare event.

To set up the exposition, Section 3.1 introduces the concept of a Yaglom limit and states some of the classic results for such objects. Then Theorem 3.3.2 of Section 3.3 is the central new result, giving the rate of convergence of $\pi_n$ to $\pi$ with respect to a chi-square distance. The examples to which this result is then applied (in Section 3.4) illustrate the surprising fact that one of the standard approaches to finding such rates of convergence in the fully stochastic case (given in [11]) fails in the substochastic case. Another important consideration in the substochastic setting is whether the convergence to quasi-stationarity is more rapid than absorption, and in Section 3.4 a simple eigenvalue inequality is given as a test and applied in an example. Section 3.5 restates the ideas of Section 3.3 in the context of continuous time chains.

Bringing these two ideas together to analyze large deviations of substochastic chains is discussed in Chapter 4. The central idea of this chapter is that large deviations require many time steps to occur, and thus the kernel governing the transitions for such paths is the one formed by conditioning on being alive not at the present, but far into the future. In the limit, one considers the sustained kernel of Section 4.1, whose paths are conditioned to live indefinitely. Such paths typically form a set of measure 0 since absorption (i.e. death) is usually assumed to be certain. The measure induced by the sustained kernel appeared in [7].

No rigorous version of the Folk Theorem for substochastic chains is proved. However, a new example based on the network of [8] is given in Section 4.2 which illustrates both that considering the sustained kernel is the justified approach and that the Folk Theorem can fail to hold. This happens because on infinite state spaces conditioning on never dying can cause the sustained kernel to "push" paths away to infinity in order to keep them alive. The trade-off for immortality is exile. Nevertheless, in the example of Section 4.2 the backwards process does adhere to a predictable path for a time, then hits a boundary and flies away. The "fix" for such behavior is to put reflecting bounds around the state space to make it finite.

Finding the quasi-stationary distribution $\pi$ is often more difficult than finding the steady state of a stochastic Markov kernel, and so finally in Chapter 5 two techniques for constructing the quasi-stationary measure are given. First, in Section 5.1 the watched process of Section 2.2 is revisited in the substochastic setting. Here the ideas of Section 4 of [25] are extended to the case where the set $A$ on which the process is being "watched" is more general than a singleton. A new representation theorem (Theorem 5.1.6) shows how quasi-stationary measures can be computed from a 1-invariant measure of a much smaller matrix indexed by elements of the watched set $A$.

In Section 5.2 the Approximation Lemma (Lemma 5.2.1) shows how the quasi-stationary measure can be constructed from an approximate quasi-stationary measure (i.e. a measure which satisfies the necessary equations on a reduction of the state space). There are cases in which finding such measures is easier than finding $\pi$, because parts of the state space on which the transitions are more difficult to work with can be "swept away". This is a new result based on a similar approach given in [8] for stochastic kernels. The Approximation Lemma is applied to an example in which $\pi$ is successfully computed.

Throughout, probability measures on a countable space $S$ will typically be treated as row vectors when left multiplying matrices: $\pi K$. The vectors which contain only 0s or 1s will be treated as column vectors, as will generally any functions to which a matrix will be applied as an operator: $Kh$. It should be clear from the context which form a function on $S$ takes. For a function $g$ on $S$, the expression $\operatorname{diag}(g)$ is used to denote the square matrix whose rows and columns are indexed by $S$ and which has the entries of $g$ along its diagonal, and 0s elsewhere.
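These conventions translate directly into elementary matrix computations. The following sketch (an illustrative Python/NumPy fragment added here for concreteness, not part of the original text; the three-state kernel and all numbers are made up) shows a measure acting as a row vector, a function acting as a column vector, and the $\operatorname{diag}(g)$ construction.

import numpy as np

# A made-up stochastic kernel K on S = {0, 1, 2}.
K = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.6, 0.4]])
pi = np.array([0.3, 0.4, 0.3])   # a probability measure, treated as a row vector
h = np.array([1.0, 0.0, 1.0])    # a 0/1 function, treated as a column vector
g = np.array([2.0, 5.0, 7.0])    # a general function g on S

print(pi @ K)        # pi K: the measure after one step of the kernel (a row vector)
print(K @ h)         # K h: the kernel applied to h as an operator (a column vector)
print(np.diag(g))    # diag(g): the matrix with the entries of g on its diagonal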

Chapter 2

Rare Excursions for Markov Processes

2.1 Stationary Point Processes

The notation established in [3, 4] is followed, and a brief explanation of it is given in this section.

Let $(N, \theta_t, P)$ be a stationary point process on the probability space $(\Omega, \mathcal{F}, P)$; this means that

i) $N$ is a random variable on $(\Omega, \mathcal{F}, P)$ taking values in the space $(M, \mathcal{M})$ of sigma-finite counting measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, with the $\sigma$-field $\mathcal{M}$ generated by sets of the form
$$\{m \in M : m(C) = k\}, \quad k \in \mathbb{N},\ C \in \mathcal{B}(\mathbb{R});$$

ii) $\{\theta_t\}_{t \in \mathbb{R}}$ is a measurable flow on $(\Omega, \mathcal{F})$, i.e.
a) $(t, \omega) \mapsto \theta_t(\omega)$ is $\mathcal{B}(\mathbb{R}) \otimes \mathcal{F}/\mathcal{F}$-measurable,
b) $\theta_t$ is bijective for all $t \in \mathbb{R}$, and
c) $\theta_t \circ \theta_s = \theta_{t+s}$ for all $t, s \in \mathbb{R}$;

iii) $(N, P)$ is compatible with the flow $\{\theta_t\}_{t \in \mathbb{R}}$, i.e. for every $t \in \mathbb{R}$ and every $\omega \in \Omega$, the following equalities of measures hold: $N(\theta_t \omega) = S_t N(\omega)$ and $P \circ \theta_t = P$, where $S_t$ is the shift operator on $(M, \mathcal{M})$ satisfying $S_t m(C) = m(C + t)$.

Note that $\theta_0$ is the identity on $\Omega$ and hence that the inverse of $\theta_t$ is $\theta_t^{-1} = \theta_{-t}$. The intuitive idea here is that the random measure $N$ and the probability $P$ look no different when shifted by the flow $\theta_t$. The use of the term "stationary" is justified since it follows from these definitions that
$$P[N(C_1) = n_1, \ldots, N(C_k) = n_k] = P[N(C_1 + t) = n_1, \ldots, N(C_k + t) = n_k]$$
for any Borel sets $C_1, \ldots, C_k \subset \mathbb{R}$, any nonnegative integers $n_1, \ldots, n_k \in \mathbb{N}$, and any $t \in \mathbb{R}$.

The $n$th point of $N(\omega)$ is denoted $T_n(\omega)$, so that $N(C) = \sum_{n \in \mathbb{Z}} I_C(T_n)$ (the convention here is that the zeroth point of $N(\omega)$ is the largest point not exceeding zero to which $N(\omega)$ gives mass, and that the points are ordered).

A stochastic process $\{X_t\}_{t \in \mathbb{R}}$ on $(\Omega, \mathcal{F}, P)$ is $\theta_t$-compatible if it satisfies $X_t(\omega) = X_0(\theta_t \omega)$ for all $t \in \mathbb{R}$ and $\omega \in \Omega$. Notice that $P \circ \theta_t = P$ implies that such a process is stationary in the usual sense, since
$$P[X_t \in A] = P[X_0 \circ \theta_t \in A] = P \circ \theta_{-t}[X_0 \in A] = P[X_0 \in A]$$
for all $t \in \mathbb{R}$ and all $A \in \mathcal{F}$.

Although taking $(\Omega, \mathcal{F}, P)$ to be any general probability space does not change the results of the discussion herein, more intuitive clarity is achieved (without compromising that generality) by selecting a particular space to deal with. Specifically, let $\Omega$ be the set of cadlag trajectories $\omega : \mathbb{R} \to S$ from $\mathbb{R}$ to some countable, measurable space $(S, \mathcal{S})$ which have only countably many, non-accumulating discontinuities, and let $\mathcal{F} = \sigma(\{X_t^{-1}(G) : G \in \mathcal{S},\ t \in \mathbb{R}\})$, where $\{X_t\}_{t \in \mathbb{R}}$ is the coordinate process given by $X_t(\omega) = \omega(t)$. According to Kolmogorov's Extension Theorem and Theorem 38.1 of [5], it is possible to obtain such a space/process pair corresponding to any collection of finite dimensional distributions one might consistently specify, and this shall be done variously throughout.

Figure 2.1: A sample path $\omega$ and the shifted path $\theta_s \omega$.

Take $\theta_t$ to be the shift satisfying $\theta_t \omega(s) = \omega(s + t)$. It is clear that this defines a measurable flow and that the coordinate process is indeed compatible with $\{\theta_t\}_{t \in \mathbb{R}}$. Let $P$ be such that $P \circ \theta_t = P$ for all $t \in \mathbb{R}$, so that the process $\{X_t\}_{t \in \mathbb{R}}$ is a stationary process. Figure 2.1 depicts a typical path $\omega$ and the shifted path $\theta_s \omega$.

With this setup various point processes can be constructed; for instance, $N$ may count the jump discontinuities of $X$, or the number of entrances into a particular set.

For any random variable $W : \Omega \to \mathbb{R}$ define $\theta_W : \Omega \to \Omega$ by $\theta_W(\omega) := \theta_{W(\omega)}(\omega)$.

It will be helpful to refer later to the following properties: for random variables $U, V : \Omega \to \mathbb{R}$,
$$\theta_U \circ \theta_V = \theta_{V + U \circ \theta_V}. \qquad (2.1.1)$$
Also,
$$T_m \circ \theta_t = \begin{cases} T_{m + N(0,t]} - t & t \geq 0 \\ T_{m - N(t,0]} - t & t < 0 \end{cases} \qquad (2.1.2)$$
holds for all $m \in \mathbb{Z}$, from which
$$T_m \circ \theta_{T_n} = T_{m+n} - T_n, \quad m, n \in \mathbb{Z}, \qquad (2.1.3)$$
follows by taking $t = T_n$. Consequently
$$\theta_{T_n} \circ \theta_{T_{-n}} = \theta_{T_{-n} + T_n \circ \theta_{T_{-n}}} = \theta_{T_{-n} + T_{n-n} - T_{-n}} = \theta_{T_0},$$
and so, for all $n \in \mathbb{N}$,
$$\theta_{T_n}^{-1} = \theta_{T_{-n}} \quad \text{on } \Omega_0 := \{T_0 = 0\}. \qquad (2.1.4)$$

The focus being on stochastic paths, jump processes on a two dimensional integer lattice for example, it is natural to ask questions like "what is the probability of having hit a distant point $(\ell, y) \in \mathbb{N}^2$ after some large time $t$?". In practical examples, such as when the jump process describes the state of a tandem queue, one wishes to do analysis conditional on a jump having occurred at time 0. This is a set of measure 0, however, and so the following construction of a measure which puts all its mass on $\Omega_0$ is handy.

Let $l$ denote the Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and let $\lambda = E[N(0,1]]$ be the intensity of $(N, \theta_t, P)$. Since $P \circ \theta_t = P$, the measure $\mathcal{B}(\mathbb{R}) \ni C \mapsto E[N(C)]$ is translation invariant and therefore proportional to $l$; that is, $E[N(C)] = \lambda \cdot l(C)$. Furthermore, for $A \in \mathcal{F}$ and $t \geq 0$, stationarity and (2.1.2) imply that
$$E\left[\sum_{n \in \mathbb{Z}} I_A(\theta_{T_n}) I_C(T_n)\right] = E\left[\sum_{n \in \mathbb{Z}} I_A(\theta_{T_n} \circ \theta_t) I_C(T_n \circ \theta_t)\right] = E\left[\sum_{n \in \mathbb{Z}} I_A(\theta_{T_{n+N(0,t]}}) I_C(T_{n+N(0,t]} - t)\right] = E\left[\sum_{k \in \mathbb{Z}} I_A(\theta_{T_k}) I_{C+t}(T_k)\right],$$
with the extremes of this equality holding also for $t \leq 0$ (the argument proceeds with $N(0,t]$ replaced by $-N(t,0]$). That is, the measure
$$\mathcal{B}(\mathbb{R}) \ni C \mapsto E\left[\sum_{n \in \mathbb{Z}} I_A(\theta_{T_n}) I_C(T_n)\right]$$
is also translation invariant, hence proportional to $l$, and therefore
$$P_0^N(A) := \frac{1}{\lambda\, l(C)}\, E\left[\sum_{n \in \mathbb{Z}} I_A(\theta_{T_n}) I_C(T_n)\right] = \frac{1}{\lambda\, l(C)}\, E\left[\int_C (I_A \circ \theta_s)\, N(ds)\right] \qquad (2.1.5)$$
has the same value for any $C \in \mathcal{B}(\mathbb{R})$. The resultant measure $P_0^N$ on $\mathcal{F}$ is called the Palm probability of the stationary point process $(N, \theta_t, P)$. As mentioned earlier, it has the appealing property that
$$P_0^N[\Omega_0] = \frac{1}{\lambda\, l(C)}\, E\left[\sum_{n \in \mathbb{Z}} I\{T_0 \circ \theta_{T_n} = 0\}\, I_C(T_n)\right] = 1 \qquad (2.1.6)$$
as $T_0 \circ \theta_{T_n} \equiv 0$.
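As an aside, formula (2.1.5) suggests a simple empirical reading of the Palm probability: average $I_A \circ \theta_{T_n}$ over the points $T_n$ falling in a long window $C$. The following sketch (an illustrative Python fragment added here, not part of the original text; the two-state chain and all parameter values are made up) does this for the event $A = \{X_0 = 1\}$ under the point process of jumps.

import numpy as np

rng = np.random.default_rng(0)

# A stationary two-state Markov jump process with rates q01 and q10, observed
# over the window C = (0, T].
q01, q10, T = 1.0, 2.0, 10_000.0
pi = np.array([q10, q01]) / (q01 + q10)        # time-stationary distribution

t, x = 0.0, rng.choice(2, p=pi)
post_jump_states = []
while True:
    t += rng.exponential(1.0 / (q01 if x == 0 else q10))
    if t >= T:
        break
    x = 1 - x
    post_jump_states.append(x)

# Empirical version of (2.1.5): average I_A over the shifted paths theta_{T_n},
# i.e. the fraction of jump times T_n in C whose post-jump state is 1.
print(np.mean(np.array(post_jump_states) == 1))   # close to 1/2
print(pi[1])                                      # 1/3, the stationary mass of state 1

Every jump of a two-state chain alternates the state, so under the Palm measure of jumps the post-jump state is 1 about half the time, even though the stationary probability of state 1 is only $1/3$; the Palm measure and the stationary measure are genuinely different objects.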

Korolyuk's estimate ((1.5.1) of [3]) says that
$$P[N(0,t] > 1] = o(t), \qquad (2.1.7)$$
and Dobrushin's estimate ((1.5.2) of [3]) says that
$$P[N(0,t] > 0] = \lambda t + o(t);$$
it will be useful to notice that this second limit implies
$$\lim_{t \to 0} \frac{P[T_1 \leq t]}{\lambda t} = 1. \qquad (2.1.8)$$

In the forthcoming Folk Theorem, stationary point processes which count only jumps of a certain sort, namely entrances or exits into sets of particular interest, shall be studied. Let $A \in \mathcal{S}$ be such that the stochastic process $\{I\{X(t) \in A\}\}_{t \in \mathbb{R}}$ is almost surely continuous at all points $t$ except possibly those belonging to a countable set with no accumulation points.

The discontinuities of the process $\{I\{X(t) \in A\}\}_{t \in \mathbb{R}}$ can be categorized as either entrance times, i.e. those values $t \in \mathbb{R}$ for which there is an $a > 0$ with $X(t - \epsilon) \notin A$ and $X(t + \epsilon) \in A$ for all $0 < \epsilon < a$, or exit times, which have the reverse definition.

Let $\{T_n^{\to A}\}_{n \in \mathbb{Z}}$ denote the (non-decreasing) sequence of successive entrance times into $A$ and $\{T_n^{A \to}\}_{n \in \mathbb{Z}}$ the exit times out of $A$, with the same numbering convention used for ordinary point processes. Let $N^{\to A}$ and $N^{A \to}$ be the point processes whose points are these sequences of entrance and exit times respectively. It is straightforward to show that these point processes are $\theta_t$-compatible and assign each singleton a mass of 0 or 1 (that is to say, they are simple).

If, in addition to the above assumption on $A$, it is also true that the point processes $N^{\to A}$ and $N^{A \to}$ have finite and positive intensities, then $A$ is called regular.

If $A, F \in \mathcal{S}$ are disjoint regular sets then define

- $N^{(A \to)F}$ to be the point process whose points consist of the subsequence of the exits $\{T_n^{A \to}\}_{n \in \mathbb{Z}}$ out of $A$ after which $X$ hits $F$ before $A$;
- $N^{(\to A)F}$ to be the point process whose points consist of the subsequence of the entrances $\{T_n^{\to A}\}$ into $A$ after which, once $X$ has left $A$, it hits $F$ before $A$;
- $N^{F(\to A)}$ to be the point process whose points consist of the subsequence of the first entrances into $A$ after $X$ has left $F$.

Figure 2.2 shows a sample path with the point $T_1^{(A \to)F}$ indicated.

The notation $\lambda_N$ will be used to denote the intensity $E[N(0,1]]$ of a general point process $N$, and the notation $\lambda_{A(\to F)}$ will be used in place of the more cumbersome $\lambda_{N^{A(\to F)}}$ in the case that $N$ is of the form $N^{A(\to F)}$, with similar notation for the other entrance and exit point processes defined above.

Figure 2.2: The point $T_1^{(A \to)F}$ for a sample path $\omega$.

When considering more than one stationary point process at a time, such as $N^{F(\to A)}$ and $N^{F \to}$, it is useful to have a method of intelligibly switching between their associated Palm measures. The Neveu Exchange Formula provides a means of doing so:

Theorem 2.1.1 Let $(N, \theta_t, P)$ and $(N', \theta_t, P)$ be two stationary point processes with finite intensities. Then for any non-negative measurable function $f : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$,
$$E_0^N[f] = \frac{E_0^{N'}\left[\int_{(0, T_1']} (f \circ \theta_t)\, N(dt)\right]}{E_0^{N'}[N(0, T_1']]} \qquad (2.1.9)$$
(see (1.3.6) of [3]).

2.2 The Markovian Setting

In this section assume that the process $X$ is an ergodic Markov process with generator $Q$ satisfying $\sum_{j \neq i} q_{ij} < \infty$ for each $i \in S$, stationary distribution $\pi$, and define $\mathcal{F}_t := \sigma(X_u, u \leq t)$. Associate with $X$ the basic point process $N$ which counts the jump discontinuities of $X$ and is given by (1.1.13) of [3]:
$$N(C) := \int_C 1_{S^2 \setminus \operatorname{diag}(S^2)}(X_{s-}, X_s)\, m(ds), \quad C \in \mathcal{B}(\mathbb{R}), \qquad (2.2.1)$$
where $m$ is the counting measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. From this the embedded discrete time chain $Y$ associated with $X$ can be formed as $Y_n := X_{T_n} = X_0 \circ \theta_{T_n}$. This is the chain obtained by winnowing from $X$ all information about its sojourn times.

Let $A, D \in \mathcal{S}$ be disjoint regular sets, let $\tau_n := T_n^{D(\to A)}$, and define the watched process $W$ by $W_n := X_0 \circ \theta_{\tau_n}$. Observing $W$ is equivalent to "watching" the process $X$ only when it returns to $A$ after having left $D$.

Recall the strong Markov property for stopping times $\tau$:
$$P[\theta_\tau^{-1} G \mid \mathcal{F}_\tau] = P_{X_\tau}[G] \quad \text{a.s. on } \{\tau < \infty\} \qquad (2.2.2)$$
for all $G \in \mathcal{F}^+ := \sigma(X_u, u \geq 0)$. (Here $\mathcal{F}_\tau$ is the $\sigma$-field
$$\mathcal{F}_\tau := \sigma(\{E \in \mathcal{F} : E \cap \{\tau \leq t\} \in \mathcal{F}_t\})$$
with respect to which both $\tau$ and $X_\tau$ are measurable.)

Since $X$ is cadlag, the points $\tau_n$, $n \in \mathbb{Z}^+$, are stopping times with respect to $\{\mathcal{F}_t\}_{t \in \mathbb{R}}$. Furthermore, $\tau_n \leq \tau_{n+1}$ holds for all $n$. Accordingly, the sigma fields $\mathcal{F}_{\tau_n}$ form a filtration for $W$.

Note that by (2.1.3), $W_{n+1} = X_0 \circ \theta_{T_{n+1}^{D(\to A)}} = X_0 \circ (\theta_{T_1^{D(\to A)}} \circ \theta_{T_n^{D(\to A)}}) = W_1 \circ \theta_{\tau_n}$ for each $n$. Then for $i_0, \ldots, i_n = i, j \in S$, and $E = \{W_n = i, W_{n-1} = i_{n-1}, \ldots, W_0 = i_0\}$, the strong Markov property implies
$$P[\{W_{n+1} = j\} \cap E] = E[I_E\, E[I_{\{W_{n+1} = j\}} \mid \mathcal{F}_{\tau_n}]] = E[I_E\, E[I_{\theta_{\tau_n}^{-1}\{W_1 = j\}} \mid \mathcal{F}_{\tau_n}]] = E[I_E\, P_{W_n}[W_1 = j]] = P[E]\, P_i[W_1 = j].$$
That is, $W$ is a Markov chain.

Let $K$ be the (embedded) kernel whose entries consist of the instantaneous transition probabilities of $X$:
$$K_{ij} := \frac{q_{ij}}{\sum_{\ell \neq i} q_{i\ell}}, \quad i, j \in S,\ i \neq j. \qquad (2.2.3)$$
This kernel corresponds to the chain $Y$. From it the kernel corresponding to $W$ can be determined: first define the $n$th entrance time $\tau_B(n)$ into an arbitrary set $B$ by $\tau_B(0) \equiv 0$ and
$$\tau_B(n) := N(0, T_n^{\to B}] = k \iff \sum_{i=1}^{k} I\{Y_i \in B, Y_{i-1} \notin B\} = n, \qquad (2.2.4)$$
and for $i, j \in S$, let ${}^{B}K^{(0)}_{ij} := I\{i = j\}$, ${}^{B}K^{(1)}_{ij} = K_{ij}$ and
$${}^{B}K^{(n)}_{ij} := \sum_{k \notin B} {}^{B}K^{(n-1)}_{ik} K_{kj} = P_i[Y_n = j,\ \tau_B(1) \geq n], \quad n \geq 2. \qquad (2.2.5)$$
This is the probability that the chain $Y$, starting from $i$, moves to $j$ via a path that avoids $B$ in the $n-1$ intermediate steps. Now define
$${}^{B}G_{ij} := \sum_{n=1}^{\infty} {}^{B}K^{(n)}_{ij} \qquad (2.2.6)$$
for $i, j \in S$. This specifies the probability of a transition from $i$ to $j$ via a path (of any length) which avoids $B$. In the case $B = \emptyset$ note that ${}^{B}K^{(n)} = K^n$ and so ${}^{B}G_{ij} = \sum_{n=1}^{\infty} K^n_{ij}$ is just the potential series. Note that if $K$ is recurrent then ${}^{B}G$ satisfies
$$\sum_{j \in B} {}^{B}G_{ij} = P_i[\tau_B(1) < \infty] = 1, \quad i \in S.$$

With this notation, the transition matrix for $W$ can be described:

Lemma 2.2.1 For any disjoint, regular sets $A, D \in \mathcal{S}$, the watched process $W_n = X_0 \circ \theta_{T_n^{D(\to A)}}$ is a Markov chain with transition kernel
$$P[W_{n+1} = j \mid W_n = i] = \sum_{k \in D} ({}^{D}G_{ik})({}^{A}G_{kj}), \quad i, j \in A.$$

Proof: It is enough to observe that a one step transition for $W$ from $i \in A$ to $j \in A$ consists of a path for $Y$ which must hit $D$ for a first time and subsequently hit $A$ for a first time.

For each $i \in S$, $q_i := \sum_{j \neq i} q_{ij}$ gives the jump rate out of $i$, and so the rate at which $X$ transitions from $i$, via a path that hits $D$ and subsequently hits $A$ first at $j \in A$, is
$$q^{D(\to A)}_{ij} := q_i \sum_{k \in D} ({}^{D}G_{ik})({}^{A}G_{kj}). \qquad (2.2.7)$$
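For a finite state space, the matrices ${}^{B}K^{(n)}$, ${}^{B}G$ and the watched kernel of Lemma 2.2.1 can be computed directly: since ${}^{B}K^{(n)} = (K\,\Pi_{B^c})^{n-1}K$, where $\Pi_{B^c}$ zeroes the coordinates in $B$, the series (2.2.6) collapses to the linear system ${}^{B}G = (I - K\,\Pi_{B^c})^{-1}K$. The following sketch (an illustrative Python/NumPy fragment added here, not part of the original text; the four-state kernel and the choice $A = \{0,1\}$, $D = \{3\}$ are made up) carries this out.

import numpy as np

def taboo_green(K, B):
    # ^B G of (2.2.6): sum_{n>=1} (K P)^{n-1} K = (I - K P)^{-1} K, where P zeroes
    # the coordinates indexed by B, so intermediate visits to B are forbidden.
    m = K.shape[0]
    P = np.diag([0.0 if i in B else 1.0 for i in range(m)])
    return np.linalg.solve(np.eye(m) - K @ P, K)

# A made-up irreducible embedded kernel K on S = {0, 1, 2, 3}, with disjoint A and D.
K = np.array([[0.0, 0.7, 0.3, 0.0],
              [0.4, 0.0, 0.4, 0.2],
              [0.2, 0.3, 0.0, 0.5],
              [0.1, 0.4, 0.5, 0.0]])
A, D = [0, 1], [3]
GD, GA = taboo_green(K, D), taboo_green(K, A)

# Lemma 2.2.1: the watched kernel on A; its rows sum to 1 since K is recurrent.
W = np.array([[sum(GD[i, k] * GA[k, j] for k in D) for j in A] for i in A])
print(W, W.sum(axis=1))

# (2.2.7) with, say, unit jump rates q_i = 1: the rates q^{D(->A)}_{ij}.
q = np.ones(4)
print(np.array([[q[i] * sum(GD[i, k] * GA[k, j] for k in D) for j in A] for i in A]))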

Therefore, by stationarity,
$$\int\!\!\int_t^{t+h} N^{D(\to A)}(ds)\, dP = E[N^{D(\to A)}(t, t+h]] = P[N^{D(\to A)}(t, t+h] = 1] + o(h)$$
$$= \sum_{i \in S, j \in A} P[X_{t+h} = j,\ N^{D(\to A)}(t, t+h] = 1 \mid X_t = i]\, P[X_t = i] + o(h)$$
$$= \sum_{i \in S, j \in A} q^{D(\to A)}_{ij}\, P[X_t = i]\, h + o(h) = \int\!\!\int_t^{t+h} \sum_{i \in S, j \in A} \pi(i)\, q^{D(\to A)}_{ij}\, ds\, dP + o(h)$$
for $t \in \mathbb{R}$ and $h > 0$ small. It follows from this that
$$N^{D(\to A)}(ds)\, dP = \sum_{i \in S, j \in A} \pi(i)\, q^{D(\to A)}_{ij}\, ds\, dP.$$
Consequently
$$\lambda_{D(\to A)} = E\left[\int_0^1 N^{D(\to A)}(ds)\right] = \sum_{i \in S, j \in A} \pi(i)\, q^{D(\to A)}_{ij}$$
and
$$P_0^{D(\to A)}[X_0 = j] = \frac{1}{\lambda_{D(\to A)}}\, E\left[\int_0^1 I_{\{j\}}(X_s)\, N^{D(\to A)}(ds)\right] = \frac{\sum_{i \in S} \pi(i)\, q^{D(\to A)}_{ij}}{\sum_{i \in S, k \in A} \pi(i)\, q^{D(\to A)}_{ik}}$$
(compare Equation (1.4.8) of [3]). By (2.2.2) and (2.1.5),
$$P_0^{D(\to A)}[H] = \frac{1}{\lambda_{D(\to A)}}\, E\left[\sum_{n \in \mathbb{Z}^+} I_H(\theta_{\tau_n})\, I_{(0,1]}(\tau_n)\right] = \frac{1}{\lambda_{D(\to A)}} \sum_{n \in \mathbb{Z}^+} E\left[E[I_{\theta_{\tau_n}^{-1} H} \mid \mathcal{F}_{\tau_n}]\, I_{(0,1]}(\tau_n)\right] = \frac{1}{\lambda_{D(\to A)}} \sum_{n \in \mathbb{Z}^+} E\left[P_{X_{\tau_n}}[H]\, I_{(0,1]}(\tau_n)\right]$$
for any $H \in \mathcal{F}^+$. So, by the mean value formula ((1.3.3) of [3]) with $g = I_j$,
$$P_0^{D(\to A)}[H] = \frac{\sum_{j \in A} P_j[H]\, E\left[\sum_{n \in \mathbb{Z}^+} I\{X_{\tau_n} = j\}\, I_{(0,1]}(\tau_n)\right]}{E\left[\sum_{n \in \mathbb{Z}^+} I_{(0,1]}(\tau_n)\right]} = \sum_{j \in A} P_j[H]\, P_0^{D(\to A)}[X_0 = j]. \qquad (2.2.8)$$
Therefore:

Lemma 2.2.2 Let $X$ be an ergodic Markov chain with stationary distribution $\pi$ and generator $Q = (q_{ij})_{i,j \in S}$, let $H \in \mathcal{F}^+$, and let $A, D \in \mathcal{S}$ be disjoint regular sets. Then
$$P_0^{D(\to A)}[H] = E_0^{D(\to A)}[P_{X_{\tau_0}}[H]] = \frac{\sum_{i \in S, j \in A} \pi(i)\, q^{D(\to A)}_{ij}\, P_j[H]}{\sum_{i \in S, j \in A} \pi(i)\, q^{D(\to A)}_{ij}}$$
where $q^{D(\to A)}_{ij}$ is given by (2.2.7).

Proof: The first equality is just (2.2.8), and the second equality has been established by the foregoing arguments.

Note that Lemma 2.2.2 can be leveraged for computations of the form $P_0^{D \to}[H]$ or $P_0^{\to A}[H]$, for example, by identifying the associated Palm measures as $P_0^{D(\to D^c)}$ or $P_0^{A^c(\to A)}$ respectively.

2.3 Reversals

The main theorem of Section 2.4 is a statement about how processes behave when the films of their trajectories are "watched" back to front. The implied process here is the backwards or reversed process, which will be denoted $\tilde{X}$. Some technical details will be worked out and some notation made clear.

For $\omega \in \Omega$ and $F \in \mathcal{F}$, define the reversal $\tilde{\omega} : \mathbb{R} \to S$ of $\omega$ (respectively the reversal of $F$) via
$$\tilde{\omega}(t) := \lim_{s \uparrow -t} \omega(s), \qquad \tilde{F} := \{\tilde{\omega} : \omega \in F\}.$$
Thus $\tilde{\omega}$ is the cadlag version of the map $\mathbb{R} \ni t \mapsto \omega(-t)$. Notice that $\omega \mapsto \tilde{\omega}$ defines a bijection of $\Omega$. Define the reversed process $(\tilde{X}_t)_{t \in \mathbb{R}}$ by $\tilde{X}_t(\omega) = \tilde{\omega}(t)$.

The process $\tilde{X}$ is measurable with respect to $\mathcal{F}$. To see this, simply note that since $\Omega$ consists precisely of the cadlag trajectories from $\mathbb{R}$ to $S$, for any Borel set $C$ the set $\tilde{X}_t^{-1}(C)$ is of the form
$$\{\omega \in \Omega : \tilde{\omega}(t) \in C\} = \{\omega \in \Omega : \lim_{s \uparrow -t} \omega(s) \in C\} = \bigcup_{\epsilon \in (0,\infty) \cap \mathbb{Q}}\ \bigcap_{s \in (-t-\epsilon, -t) \cap \mathbb{Q}} \{\omega \in \Omega : \omega(s) \in C\}$$
and hence an element of $\mathcal{F}$. Therefore all cylinder sets corresponding to the process $\tilde{X}$ lie within $\mathcal{F}$ and $\tilde{X}$ is measurable. This also verifies that $\tilde{F} \in \mathcal{F}$ for any $F \in \mathcal{F}$.

Definition 2.3.1 Let $N = \sum_{n \in \mathbb{Z}} \delta_{T_n}$ be a point process on $(\Omega, \mathcal{F}, P)$. Then the time reversal of $N$ is the point process $\tilde{N}$ on $(\Omega, \mathcal{F}, P)$ defined by $\tilde{N} = \sum_{n \in \mathbb{Z}} \delta_{-T_n}$.

Note that the point process $\tilde{N}$ is not compatible with the flow $\{\theta_t\}$ but rather the flow $\tilde{\theta}_t := \theta_{-t}$; this is because, for $t \geq 0$ and $C \in \mathcal{B}(\mathbb{R})$, (2.1.2) implies
$$\tilde{N} \circ \theta_t(C) = \sum_{n \in \mathbb{Z}} \delta_{-T_n \circ \theta_t}(C) = \sum_{n \in \mathbb{Z}} \delta_{-T_{n+N(0,t]} + t}(C) = \sum_{k \in \mathbb{Z}} \delta_{-T_k}(C - t) = \tilde{N}(C - t),$$
with a similar result holding for $t \leq 0$ (see (1.1.4) of [3]). Of course, $P$ is compatible with $\tilde{\theta}_t$, and hence $(\tilde{N}, \tilde{\theta}_t, P)$ is a stationary point process for every stationary point process $(N, \theta, P)$. It should also be clear that the point process corresponding to $\tilde{X}$ is $\tilde{N}$. The Palm probability corresponding to $(\tilde{N}, \tilde{\theta}_t, P)$ is denoted $\tilde{P}_0^N$, and the intensity $E[\tilde{N}(0,1]]$ of the point process $\tilde{N}$ is denoted $\tilde{\lambda}$.

Lemma 2.3.2 For all $\omega \in \Omega$, $t \in \mathbb{R}$ and all random variables $W : \Omega \to \mathbb{R}$,
a) $\widetilde{\theta_t^{-1}} = \tilde{\theta}_t^{-1}$;
b) $\widetilde{\theta_t \omega} = \tilde{\theta}_{-t}\tilde{\omega}$, $\ \widetilde{\theta_W \omega} = \tilde{\theta}_{-W}\tilde{\omega}$;
c) $\widetilde{\theta_t F} = \tilde{\theta}_{-t}\tilde{F}$, $\ \widetilde{\theta_W F} = \tilde{\theta}_{-W}\tilde{F}$, $\ F \in \mathcal{F}$.

Proof: Assertion a) is verified by the observation that
$$\widetilde{\theta_t^{-1}} \circ \tilde{\theta}_t = \widetilde{\theta_{-t}} \circ \tilde{\theta}_t = \theta_t \circ \theta_{-t} = \theta_0.$$
Recall that the basic point process described by (2.2.1) counts the jump discontinuities of $X$, and that these jumps do not accumulate. Thus for any $u \in \mathbb{R}$ there is some $\epsilon > 0$ such that
$$\tilde{\theta}_{-t}\tilde{\omega}(u) = \theta_t \lim_{s \uparrow -u} \omega(s) = \begin{cases} \theta_t\omega(-u) & -u \text{ a continuity point of } \omega \\ \theta_t\omega(-u-\epsilon) & -u \text{ a point of discontinuity of } \omega \end{cases} = \begin{cases} \omega(t-u) & -u \text{ a continuity point of } \omega \\ \omega(t-u-\epsilon) & -u \text{ a point of discontinuity of } \omega \end{cases}$$
$$= \begin{cases} \omega(t-u) & -u-t \text{ a continuity point of } \omega(\cdot + t) \\ \omega(t-u-\epsilon) & -u-t \text{ a point of discontinuity of } \omega(\cdot + t) \end{cases} = \lim_{s \uparrow -u} \omega(s+t) = \lim_{s \uparrow -u} \theta_t\omega(s) = \widetilde{\theta_t\omega}(u).$$
Now, notice that the rule $\tilde{\theta}_t = \theta_{-t}$ remains valid with $t$ replaced by the random variable $W$ (recall the definition $\theta_W(\omega) := \theta_{W(\omega)}(\omega)$ preceding (2.1.1)). Moreover, since the coordinate process $X$ is compatible with $\{\theta_t\}_{t \in \mathbb{R}}$,
$$\theta_W\omega(s) = X_s(\theta_W\omega) = X_0(\theta_s \circ \theta_{W(\omega)}(\omega)) = X_0(\theta_{s+W(\omega)}(\omega)) = \omega(s + W(\omega)),$$
and hence the above arguments go through with $t$ replaced by $W$, verifying b). Assertion c) follows immediately from b).

For $A \subset S$, note that $-T_1^{A\to} = \tilde{T}_{-1}^{\to A}$ on the set $\Omega_0^{A\to} := \{T_0^{A\to} = 0\}$. Applying (2.1.4) and part c) of Lemma 2.3.2 with $W = T_1^{A\to}$ yields
$$\widetilde{\theta^{-1}_{T^{A\to}_{-1}} F} = \widetilde{\theta_{T^{A\to}_{1}} F} = \tilde{\theta}_{-T^{A\to}_{1}} \tilde{F} = \tilde{\theta}_{\tilde{T}^{\to A}_{-1}} \tilde{F} = \tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F} \quad \text{on } \Omega_0^{A\to} = \{T_0^{A\to} = 0\}. \qquad (2.3.1)$$

Let $(\tilde{\mathcal{F}}_t)_{t \in \mathbb{R}}$ denote the filtration corresponding to $\tilde{X}$.

Lemma 2.3.3 If $X$ is a stationary Markov process with steady state $\pi$ then for any $F \in \mathcal{F}$, $P[F] = P[\tilde{F}]$.

Proof: It is sufficient to check the claim for $F$ belonging to a $\pi$-system generating $\mathcal{F}$ (this is because the operations of intersection, union and complement all commute with the reversal operator $\tilde{\ }$). Thus let $F$ be a cylinder set of the form
$$F = \{X_{t_1} \in H_1, \ldots, X_{t_n} \in H_n\}$$
for some $n \in \mathbb{N}$, some real numbers $t_1 < \cdots < t_n$ and some sets $H_1, \ldots, H_n \in \mathcal{S}$. Let $Q$ be the generator for $X$ and define
$$\tilde{q}_{ij} := \frac{\pi(j)\, q_{ji}}{\pi(i)}, \qquad \tilde{Q} := (\tilde{q}_{ij})_{i,j \in S}. \qquad (2.3.2)$$
This is the generator for $\tilde{X}$ and has steady state $\pi$. Let $(P(t))_{t \in \mathbb{R}}$ and $(\tilde{P}(t))_{t \in \mathbb{R}}$ denote the transition semigroups for $X$ and $\tilde{X}$ respectively:
$$P_{ij}(t) = P[X_t = j \mid X_0 = i] = (e^{Qt})_{ij}, \qquad \tilde{P}_{ij}(t) = P[\tilde{X}_t = j \mid \tilde{X}_0 = i] = (e^{\tilde{Q}t})_{ij}.$$
Notice that
$$P_{ij}(t) = \frac{\pi(j)}{\pi(i)}\, \tilde{P}_{ji}(t).$$
Then
$$P[F] = \sum_{(i_1,\ldots,i_n) \in H_1 \times \cdots \times H_n} P[X_{t_1} = i_1] \prod_{k=1}^{n-1} P[X_{t_{k+1}} = i_{k+1} \mid X_{t_k} = i_k] = \sum_{(i_1,\ldots,i_n) \in H_1 \times \cdots \times H_n} \pi(i_1) \prod_{k=1}^{n-1} P_{i_k i_{k+1}}(t_{k+1} - t_k)$$
$$= \sum_{(i_1,\ldots,i_n) \in H_1 \times \cdots \times H_n} \pi(i_n) \prod_{k=1}^{n-1} \tilde{P}_{i_{k+1} i_k}(t_{k+1} - t_k) = P[\tilde{F}].$$

Note that the steady state for $K$ is the measure $p(i) := c\,\pi(i)\sum_{\ell \neq i} q_{i\ell}$, where $c = \left(\sum_{i \in S, \ell \neq i} \pi(i) q_{i\ell}\right)^{-1} = \left(-\sum_{i \in S} \pi(i) q_{ii}\right)^{-1}$ is a normalizing constant, and note that by (2.3.2), $\sum_{\ell \neq i} q_{i\ell} = -q_{ii} = -\tilde{q}_{ii} = \sum_{\ell \neq i} \tilde{q}_{i\ell}$. So defining $\tilde{K}_{ij} := \frac{p(j) K_{ji}}{p(i)}$, $i \neq j$, yields
$$\tilde{K}_{ij} = \frac{\pi(j) \sum_{\ell \neq j} q_{j\ell}}{\pi(i) \sum_{\ell \neq i} q_{i\ell}} \cdot \frac{q_{ji}}{\sum_{\ell \neq j} q_{j\ell}} = \frac{\tilde{q}_{ij}}{\sum_{\ell \neq i} q_{i\ell}} = \frac{\tilde{q}_{ij}}{\sum_{\ell \neq i} \tilde{q}_{i\ell}}. \qquad (2.3.3)$$
This is the kernel corresponding to the embedded chain for $\tilde{X}$. Let ${}^{B}\tilde{K}^{(n)}$ be defined as in (2.2.5) with $K_{ij}$ replaced by $\tilde{K}_{ij}$, let ${}^{B}\tilde{G} := \sum_{n=0}^{\infty} {}^{B}\tilde{K}^{(n)}$, let $\tilde{q}_i := \sum_{\ell \neq i} \tilde{q}_{i\ell}$, and let
$$\tilde{q}^{D(\to A)}_{ij} := \tilde{q}_i\, ({}^{D}\tilde{G}\, {}^{A}\tilde{G})_{ij}, \qquad i \in S,\ j \in A. \qquad (2.3.4)$$
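Equations (2.3.2) and (2.3.3) are straightforward to evaluate numerically. The following sketch (an illustrative Python/NumPy fragment added here, not part of the original text; the three-state generator is made up) builds $\tilde{Q}$ and $\tilde{K}$, checks that $\pi$ is also the steady state of $\tilde{Q}$, and checks that the two expressions for $\tilde{K}$ agree.

import numpy as np

# A made-up irreducible generator Q (off-diagonal rates; rows sum to 0).
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -4.0,  3.0],
              [ 2.0,  2.0, -4.0]])
q = -np.diag(Q)                                   # jump rates q_i

# Stationary distribution pi: the left null vector of Q, normalized.
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmin(np.abs(w))])
pi /= pi.sum()

# (2.3.2): the reversed generator q~_{ij} = pi(j) q_{ji} / pi(i).
Qt = np.diag(1.0 / pi) @ Q.T @ np.diag(pi)

# Embedded kernels K and K~; (2.3.3) says K~_{ij} = q~_{ij} / q_i off the diagonal
# (recall q~_i = q_i), and also K~_{ij} = p(j) K_{ji} / p(i) with p(i) ~ pi(i) q_i.
K = Q / q[:, None]
np.fill_diagonal(K, 0.0)
Kt = Qt / q[:, None]
np.fill_diagonal(Kt, 0.0)
p = pi * q
p /= p.sum()
Kt_alt = np.diag(1.0 / p) @ K.T @ np.diag(p)

print(np.allclose(pi @ Qt, 0.0), np.allclose(Kt, Kt_alt))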

2.4 The Folk Theorem

The Folk Theorem of this section is meant to make precise the following result, which has been applied elsewhere (see [2]): the large deviation path from the origin to a rare event can be obtained by observing the time reversal from the rare event back to the origin. This will be formulated more rigorously with the notation of Section 2.1.

Example 2.4.1 (Gambler's Coin) Consider a simple random walk on $\mathbb{N}$: a gambler repeatedly flips a coin with probability of heads and tails given by $p$ and $q = 1 - p$ respectively, $p < q$. If heads comes up the gambler receives a dollar, and if tails comes up she loses a dollar if she has one to lose (i.e. 0 is a reflecting state). Let $Y_n$ be the amount of money she has in hand after the $n$th coin flip. Thus the transition kernel for the process $\{Y_n\}_{n \geq 0}$ is
$$K = \begin{pmatrix} q & p & 0 & 0 & \cdots \\ q & 0 & p & 0 & \cdots \\ 0 & q & 0 & p & \\ 0 & 0 & q & 0 & \ddots \\ \vdots & & & \ddots & \ddots \end{pmatrix},$$
and the well known stationary distribution $\pi$ for this chain is given by $\pi(n) = \left(1 - \frac{p}{q}\right)\left(\frac{p}{q}\right)^n$. How does the gambler accrue a large fortune?

If we are interested in the path to a large number $\ell$, we might try simulating paths and extracting those which arrive at $\ell$ for analysis. However, if $\ell$ is quite large then this could be quite time consuming and inefficient, since the vast majority of paths would need to be discarded. The revelation of the Folk Theorem is that it is possible to start from the rare event and "watch the film in reverse" to see what rare excursions are likely to look like.

The time reversal of $K$ is the kernel $\tilde{K}$ given by
$$\tilde{K}_{ij} = \frac{\pi(j)}{\pi(i)} K_{ji} = \left(\frac{p}{q}\right)^{j-i} K_{ji}$$
and so $K$ is reversible; that is, $\tilde{K} = K$. Thus when watching the gambler's fortune evolve in reverse as it begins at some large value $\ell$ and drifts back toward 0, one sees roughly $q$ transitions to the left for every $p$ transitions to the right. This means that the path which led to the rare excursion contained roughly $q$ transitions to the right for every $p$ transitions to the left!

The surprising conclusion is that the most likely paths which lead to the rare excursion are ones in which the coin behaved as if the probabilities of its sides were permuted (see Figure 2.3). Here, the simulated reversed paths $\tilde{Y}$ remained close in some sense (to be made precise in Section 2.4.2) to the line segment
$$\tilde{y}(s) = \ell + (p - q)s, \qquad 0 \leq s \leq \frac{\ell}{q - p}. \qquad (2.4.1)$$
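To see this numerically, one can simulate the reversed fortune process directly. The sketch below (an illustrative Python fragment added here, not part of the original text; the values $p = 0.3$ and $\ell = 200$ are made up) runs the reversed chain from the rare level $\ell$ down to 0 — since $\tilde{K} = K$, the same transition rule is used — and measures how far the path strays from the line segment (2.4.1).

import numpy as np

rng = np.random.default_rng(1)
p, q = 0.3, 0.7          # probabilities of heads (win) and tails (lose)
ell = 200                # the rare fortune

def reversed_path(start):
    # Simulate the reversed fortune chain from `start` down to 0.  The chain is
    # reversible (K~ = K), so the reversal uses the same transition rule: up with
    # probability p, down with probability q, reflected at 0.
    x, path = start, [start]
    while x > 0:
        x = x + 1 if rng.random() < p else max(x - 1, 0)
        path.append(x)
    return np.array(path)

path = reversed_path(ell)
s = np.arange(len(path))
segment = ell + (p - q) * s                     # the line segment (2.4.1)
print(len(path))                                # roughly ell / (q - p) steps
print(np.max(np.abs(path - segment)))           # small compared to ell

Read backwards, such a path is a forward excursion that climbs from 0 to $\ell$ at net rate $q - p$ per step, exactly as if the probabilities of heads and tails had been permuted.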

In the following discussion, assume that $A$ and $D_\ell$, $\ell \in \mathbb{N}$, are regular sets for $\{X_t\}_{t \in \mathbb{R}}$ in $\mathcal{S}$ satisfying $A \cap D_\ell = \emptyset$ and
$$\liminf_{\ell \to \infty} \tilde{P}_0^{D_\ell \to}[\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell \to}] > 0. \qquad (2.4.2)$$

Theorem 2.4.2 (Folk Theorem) Let $\{F_\ell\}_{\ell \in \mathbb{N}}$ be a collection of measurable sets satisfying
$$\lim_{\ell \to \infty} \tilde{P}_0^{D_\ell \to}[\tilde{F}_\ell] = 1. \qquad (2.4.3)$$
Then
$$\lim_{\ell \to \infty} P_0^{(A\to)D_\ell}[\theta^{-1}_{T^{A\to}_{-1}} F_\ell] = 1.$$

Proof: Recall that, by (2.3.1), $\widetilde{\theta^{-1}_{T^{A\to}_{-1}} F_\ell} = \tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell$ under $P_0^{(A\to)D_\ell}$. Since
$$P_0^{(A\to)D_\ell}[U] = \tilde{P}_0^{D_\ell(\to A)}[\tilde{U}], \qquad U \in \mathcal{F},$$
it follows by the exchange formula (2.1.9) that
$$P_0^{(A\to)D_\ell}[\theta^{-1}_{T^{A\to}_{-1}} F_\ell] = \tilde{P}_0^{D_\ell(\to A)}[\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell] = \tilde{E}_0^{D_\ell(\to A)}\left[I_{\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell}\right] = \frac{\tilde{E}_0^{D_\ell \to}\left[\int_{[0, \tilde{T}^{D_\ell\to}_{1})} I_{\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell} \circ \tilde{\theta}_t\ \tilde{N}^{D_\ell(\to A)}(dt)\right]}{\tilde{E}_0^{D_\ell \to}\left[\tilde{N}^{D_\ell(\to A)}[0, \tilde{T}^{D_\ell\to}_{1})\right]}.$$
Now, on the set $\{\tilde{T}_0^{D_\ell\to} = 0\}$, $\tilde{T}_1^{D_\ell(\to A)} = \tilde{T}_1^{\to A}$ and so the only point in the interval $[0, \tilde{T}_1^{D_\ell\to})$ to which $\tilde{N}^{D_\ell(\to A)}$ gives mass is $\tilde{T}_1^{\to A}$. Thus the above denominator is $\tilde{P}_0^{D_\ell\to}[\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell\to}]$ and the numerator is
$$\tilde{E}_0^{D_\ell\to}\left[\left(I_{\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell} \circ \tilde{\theta}_{\tilde{T}^{\to A}_{1}}\right) I\{\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell\to}\}\right] = \tilde{E}_0^{D_\ell\to}\left[I_{\tilde{F}_\ell}\, I\{\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell\to}\}\right].$$
This means that
$$P_0^{(A\to)D_\ell}[\theta^{-1}_{T^{A\to}_{-1}} F_\ell] = \frac{\tilde{P}_0^{D_\ell\to}[\tilde{F}_\ell \cap \{\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell\to}\}]}{\tilde{P}_0^{D_\ell\to}[\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell\to}]} \geq \frac{\tilde{P}_0^{D_\ell\to}[A_\ell]}{\tilde{P}_0^{D_\ell\to}[A_\ell] + \tilde{P}_0^{D_\ell\to}[\tilde{F}_\ell^c]}$$
where $A_\ell = \tilde{F}_\ell \cap \{\tilde{T}_1^{\to A} < \tilde{T}_1^{D_\ell\to}\}$. Finally, using (2.4.3) and (2.4.2),
$$\liminf_{\ell \to \infty} P_0^{(A\to)D_\ell}[\theta^{-1}_{T^{A\to}_{-1}} F_\ell] \geq 1.$$

Figure 2.3: Reversal of Coin Flip Trajectories.

To make intuitive sense of the Folk Theorem, treat the set $D_\ell$ as a rare event which gets rarer as $\ell \to \infty$ (perhaps it is a point in the state space of a Markov chain which is moving away), and treat $A$ as the set on which the process $X$ begins. Recall that the Palm measure $\tilde{P}_0^{D_\ell\to}$ concentrates all mass on the subset $\tilde{\Omega}_0^{D_\ell\to} = \{\tilde{T}_0^{D_\ell\to} = 0\}$ of $\Omega$ on which the backwards process $\tilde{X}$ has an exit from the rare set $D_\ell$ at time 0. Thus condition (2.4.3) is the requirement that the backwards process $\tilde{X}$ remains asymptotically within a tube $\tilde{F}_\ell$ of trajectories almost surely. So if one observes $\tilde{X}$ and then "runs the film in reverse", the expectation might be that the resulting forward trajectory will belong to the set of paths one gets by turning everything in $\tilde{F}_\ell$ around in time, i.e. $F_\ell$.

This is inaccurate, since turning everything in $\tilde{F}_\ell$ around in time would yield trajectories which arrive at the rare set $D_\ell$ at time 0. The idea is to start at the set $A$ at time 0 and then arrive at $D_\ell$ at time $T_1^{\to D_\ell}$, the first entrance time into $D_\ell$. Thus each path $\omega \in F_\ell$ must be shifted by the amount of time it took the path $\tilde{\omega}$ to travel from the rare event to $A$, namely $\tilde{T}_1^{\to A} = -T_{-1}^{A\to}$. This is the reason that the conclusion of the Folk Theorem is expressed in terms of $\theta^{-1}_{T^{A\to}_{-1}} F_\ell$ rather than simply $F_\ell$. Finally, the Palm measure $P_0^{(A\to)D_\ell}$ concentrates all its mass on paths which leave the starting set $A$ at time 0 and subsequently reach the rare event before returning to $A$.

The Folk Theorem will be applied to the coin flipping example (Example 2.4.1) in Section 2.4.2. It was seen in that example that the reversed paths $\tilde{Y}$ remained close somehow to the line segment (2.4.1). It will be shown that the sets $\tilde{F}_\ell$ defined in (2.4.24), which are "tubes" of paths around a scaling of this line segment beginning at the rare point, satisfy (2.4.3). Trajectories of the reversed fortune process $\tilde{Y}$, sped up in time and scaled in space, remain asymptotically within these tubes almost surely. Thus the conclusion from the Folk Theorem is that the forward trajectories (properly scaled) remain asymptotically within a tube of trajectories which begin at the initial fortune and stay close to the same line segment.

2.4.1 Folk Theorem for Markov Chains

Working with Palm probabilities is useful for formalizing the key notions of the Folk Theorem, and for proving a general form of the result. However, the notation can be rather cumbersome and resultantly obscure many of the ideas. Since the main motivation is an application to discrete time Markov chains, this subsection will serve to provide computational devices for verifying some of the conditions associated with Theorem 2.4.2, as well as results more readily applied to the setting of interest.

In this subsection suppose that $X$ is an ergodic Markov process with generator $Q$ and stationary measure $\pi$. The following results allow condition (2.4.3) to be checked without appeal to Palm probabilities:

Lemma 2.4.3 Let $\{\tilde{F}_\ell\}_{\ell \in \mathbb{N}}$ be a sequence of measurable sets from $\tilde{\mathcal{F}}^+$, and let $\tau := \tilde{T}_1^{D_\ell\to}$. Then
$$\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = \frac{\tilde{\lambda}}{\tilde{\lambda}_{D_\ell\to}} \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, \pi(i) K_{ij} \qquad (2.4.4)$$
$$= \frac{\sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, \pi(i) K_{ij}}{\sum_{i \in D_\ell} \sum_{j \in D_\ell^c} \pi(i) K_{ij}}. \qquad (2.4.5)$$

Proof: Recall that, with the notation of Section 2.3, $\tilde{\lambda}$ and $\tilde{\lambda}_{D_\ell\to}$ denote the intensities $E[\tilde{N}(0,1]]$ and $E[\tilde{N}^{D_\ell\to}(0,1]]$ of $\tilde{N}$ and $\tilde{N}^{D_\ell\to}$ respectively, and recall that $K$ denotes the kernel associated with the embedded chain. Equation (1.5.3) of [3] implies, with $A = \tilde{F}_\ell = \{\tilde{X} \in \tilde{F}_\ell\}$ and $N = \tilde{N}^{D_\ell\to}$,
$$\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = \lim_{t \to 0} P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell \mid \tau \leq t]. \qquad (2.4.6)$$
Using the strong Markov property (2.2.2) write
$$P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell,\ \tau \leq t] = E\left(P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell \mid \tilde{\mathcal{F}}_\tau]\, I_{\{\tau \leq t\}}\right) = E\left(P_{\tilde{X}_\tau}[\tilde{F}_\ell]\, I_{\{\tau \leq t\}}\right) = E\left(\sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, I_{\{\tilde{X}_{\tau-} = i,\ \tilde{X}_\tau = j,\ \tau \leq t\}}\right) = \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, P[\tilde{X}_{\tau-} = i,\ \tilde{X}_\tau = j,\ \tau \leq t],$$
and note that by (2.1.7),
$$P[\tilde{X}_{\tau-} = i,\ \tilde{X}_\tau = j,\ \tau \leq t] = P[\tilde{X}_{\tau-} = i,\ \tilde{X}_\tau = j,\ \tau \leq t,\ \tilde{N}(0,t] = 1] + o(t) = P[\tilde{X}_{\tilde{T}_0} = i,\ \tilde{X}_{\tilde{T}_1} = j,\ \tilde{T}_1 \leq t] + o(t), \quad i \in D_\ell,\ j \in D_\ell^c.$$
Moreover, by (1.4.27) of [3],
$$P[\tilde{X}_{\tilde{T}_0} = i,\ \tilde{X}_{\tilde{T}_1} = j,\ \tilde{T}_1 \leq t] = P[\tilde{X}_{\tilde{T}_0} = i,\ \tilde{X}_{\tilde{T}_1} = j] - P[\tilde{X}_{\tilde{T}_0} = i,\ \tilde{X}_{\tilde{T}_1} = j,\ \tilde{T}_1 > t]$$
$$= \tilde{\lambda}\pi(i)K_{ij} \int_0^\infty (1 - G_{ij}(s))\, ds - \tilde{\lambda}\pi(i)K_{ij} \int_t^\infty (1 - G_{ij}(s))\, ds = \tilde{\lambda}\pi(i)K_{ij} \int_0^t (1 - G_{ij}(s))\, ds,$$
where $\tilde{\lambda}$ is the intensity of $\tilde{X}$ and
$$G_{ij}(s) := P[\tilde{T}_{n+1} - \tilde{T}_n \leq s \mid \tilde{X}_{\tilde{T}_{n+1}} = j,\ \tilde{X}_{\tilde{T}_n} = i], \quad n \in \mathbb{Z}.$$
Thus, using (2.1.8), (2.4.6) becomes
$$\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = \lim_{t \to 0} \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} \frac{P_j[\tilde{F}_\ell]\, \tilde{\lambda}\pi(i)K_{ij} \int_0^t (1 - G_{ij}(s))\, ds}{\tilde{\lambda}_{D_\ell\to}\, t} = \frac{\tilde{\lambda}}{\tilde{\lambda}_{D_\ell\to}} \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, \pi(i) K_{ij} \lim_{t \to 0} \frac{1}{t}\int_0^t (1 - G_{ij}(s))\, ds$$
$$= \frac{\tilde{\lambda}}{\tilde{\lambda}_{D_\ell\to}} \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, \pi(i) K_{ij}\, (1 - G_{ij}(0+)) = \frac{\tilde{\lambda}}{\tilde{\lambda}_{D_\ell\to}} \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} P_j[\tilde{F}_\ell]\, \pi(i) K_{ij},$$
verifying (2.4.4). In the case that $\tilde{F}_\ell = \tilde{\Omega}$, this equation says
$$\frac{\tilde{\lambda}_{D_\ell\to}}{\tilde{\lambda}} = \sum_{i \in D_\ell} \sum_{j \in D_\ell^c} \pi(i) K_{ij},$$
which verifies (2.4.5).
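The right-hand side of (2.4.5) is a weighted average of the probabilities $P_j[\tilde{F}_\ell]$, with weight proportional to $\pi(i) K_{ij}$ on each exit transition $(i, j)$ out of $D_\ell$. The following sketch (an illustrative Python fragment added here, not part of the original text; the four-state kernel, the set $D = \{3\}$, the toy path event standing in for $\tilde{F}_\ell$, and the use of the embedded kernel itself as the path law are all made-up simplifications) computes those weights and the resulting mixture.

import numpy as np

rng = np.random.default_rng(2)

# A made-up embedded kernel K on S = {0, 1, 2, 3} with unit jump rates, so the
# stationary law of X coincides with that of the embedded chain; D = {3}.
K = np.array([[0.0, 0.8, 0.2, 0.0],
              [0.5, 0.0, 0.4, 0.1],
              [0.3, 0.4, 0.0, 0.3],
              [0.1, 0.3, 0.6, 0.0]])
w, v = np.linalg.eig(K.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
D = [3]

# Weights pi(i) K_{ij} over the exit transitions (i in D, j not in D), normalized
# as in the denominator of (2.4.5).
pairs = [(i, j) for i in D for j in range(4) if j not in D]
wts = np.array([pi[i] * K[i, j] for i, j in pairs])
wts /= wts.sum()

def prob_avoids_D(j, steps=10, n=5_000):
    # A toy path event standing in for F~_l: "no visit to D in the next `steps`
    # steps"; P_j[.] is estimated by simulating the chain from the exit state j.
    hits = 0
    for _ in range(n):
        x = j
        for _ in range(steps):
            x = rng.choice(4, p=K[x])
            if x in D:
                hits += 1
                break
    return 1.0 - hits / n

print(sum(w * prob_avoids_D(j) for (i, j), w in zip(pairs, wts)))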

Lemma 2.4.4 Let $\tau := \tilde{T}_1^{D_\ell\to}$. If $\tilde{F}_\ell$ is such that
$$\lim_{\ell \to \infty} P[\tilde{X} \circ \tilde{\theta}_{\tilde{T}_1} \in \tilde{F}_\ell \mid \tilde{T}_1 = \tau] = 1 \qquad (2.4.7)$$
then
$$\lim_{\ell \to \infty} \tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = 1.$$

Proof: Let
$$\Delta(\ell, t) := \{\tilde{X}_{\tilde{T}_0} \in D_\ell,\ \tilde{X}_{\tilde{T}_1} \in D_\ell^c,\ \tilde{T}_1 \leq t\} = \{\tau = \tilde{T}_1,\ \tau \leq t\}$$
and
$$\Gamma(\ell) := \{\tilde{X} \circ \tilde{\theta}_{\tilde{T}_1} \in \tilde{F}_\ell\}.$$
Using (2.1.8) and (2.1.7) implies
$$\lim_{t \downarrow 0} P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell \mid \tau \leq t] = \lim_{t \downarrow 0} \frac{P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell,\ \tau \leq t]}{P[\tau \leq t]} = \lim_{t \downarrow 0} \frac{P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell,\ \tau \leq t]}{\tilde{\lambda}_{D_\ell\to}\, t} = \lim_{t \downarrow 0} \frac{P[\tilde{X} \circ \tilde{\theta}_{\tilde{T}_1} \in \tilde{F}_\ell,\ \tilde{X}_{\tilde{T}_0} \in D_\ell,\ \tilde{X}_{\tilde{T}_1} \in D_\ell^c,\ \tilde{T}_1 \leq t]}{\tilde{\lambda}_{D_\ell\to}\, t} = \lim_{t \downarrow 0} \frac{P[\Gamma(\ell) \cap \Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t}.$$
Furthermore, Equation (1.5.3) of [3] implies, with $A = \tilde{F}_\ell = \{\tilde{X} \in \tilde{F}_\ell\}$ and $N = \tilde{N}^{D_\ell\to}$,
$$\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = \lim_{t \downarrow 0} P[\tilde{X} \circ \tilde{\theta}_\tau \in \tilde{F}_\ell \mid \tau \leq t].$$
So the above says that
$$\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = \lim_{t \downarrow 0} \frac{P[\Gamma(\ell) \cap \Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t}. \qquad (2.4.8)$$
Note that (2.1.8) also implies
$$\lim_{t \downarrow 0} \frac{\tilde{\lambda}_{D_\ell\to}\, t}{P[\Delta(\ell, t)]} = \lim_{t \downarrow 0} \frac{P[\tau \leq t]}{P[\Delta(\ell, t)]} \cdot \frac{\tilde{\lambda}_{D_\ell\to}\, t}{P[\tau \leq t]} = \lim_{t \downarrow 0} \frac{P[\tau \leq t,\ \tilde{T}_1 \leq t] + o(t)}{P[\Delta(\ell, t)]} = 1.$$
Let $\epsilon \in (0,1)$ be arbitrary. Since $f_\ell := E[I_{\Gamma(\ell)} \mid \tau, \tilde{T}_1]$ is equal to $P[\Gamma(\ell) \mid \tau = \tilde{T}_1]$ on the set $\{\tau = \tilde{T}_1\}$, the hypothesis and the preceding arguments entail that there exists an $L \in \mathbb{N}$ large enough so that $\ell \geq L$ implies
$$1 - \epsilon < f_\ell\, I_{\{\tau = \tilde{T}_1\}} + I_{\{\tau \neq \tilde{T}_1\}} < 1 + \epsilon \qquad (2.4.9)$$
uniformly in $\tilde{\omega}$, and an $s(\ell) > 0$ small enough so that $t \leq s(\ell)$ implies
$$1 - \epsilon < \frac{P[\Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t} < 1 + \epsilon. \qquad (2.4.10)$$
Then, for $\ell \geq L$ and $t \leq s(\ell)$, multiplying both sides of (2.4.9) by $\frac{1}{\tilde{\lambda}_{D_\ell\to} t} I_{\Delta(\ell, t)}$ and taking expectations yields
$$(1 - \epsilon)\, \frac{P[\Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t} < \frac{E[f_\ell\, I_{\Delta(\ell, t)}] + P[\Delta(\ell, t) \cap \{\tau \neq \tilde{T}_1\}]}{\tilde{\lambda}_{D_\ell\to}\, t} < (1 + \epsilon)\, \frac{P[\Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t},$$
after which applying (2.4.10) gives
$$(1 - \epsilon)^2 < \frac{E[f_\ell\, I_{\Delta(\ell, t)}]}{\tilde{\lambda}_{D_\ell\to}\, t} < (1 + \epsilon)^2$$
(note that $\Delta(\ell, t) \subset \{\tau = \tilde{T}_1\}$). Since $\epsilon \in (0,1)$ it follows that
$$\left|\frac{E[f_\ell\, I_{\Delta(\ell, t)}]}{\tilde{\lambda}_{D_\ell\to}\, t} - 1\right| < 3\epsilon$$
whenever $\ell \geq L$ and $t \leq s(\ell)$. Also, observe that $E[f_\ell\, I_{\Delta(\ell, t)}] = P[\Gamma(\ell) \cap \Delta(\ell, t)]$.

According to Equation (2.4.8), $s(\ell) > 0$ can be chosen small enough so that $t \leq s(\ell)$ also implies
$$\left|\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] - \frac{P[\Gamma(\ell) \cap \Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t}\right| < \epsilon.$$
Then $\ell \geq L$ and $t \leq s(\ell)$ imply
$$|\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] - 1| \leq \left|\tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] - \frac{P[\Gamma(\ell) \cap \Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t}\right| + \left|\frac{P[\Gamma(\ell) \cap \Delta(\ell, t)]}{\tilde{\lambda}_{D_\ell\to}\, t} - 1\right| < \epsilon + \left|\frac{E[f_\ell\, I_{\Delta(\ell, t)}]}{\tilde{\lambda}_{D_\ell\to}\, t} - 1\right| < 4\epsilon.$$
The conclusion is that $\lim_{\ell \to \infty} \tilde{P}_0^{D_\ell\to}[\tilde{X} \in \tilde{F}_\ell] = 1$.

Corollary 2.4.5 Suppose that
$$\liminf_{\ell \to \infty} \left(\frac{\sum_{i \in S} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ij}}{\sum_{i \in S, k \in A} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ik}}\right) > 0 \qquad (2.4.11)$$
for $j$ in some nonempty set $C \subset A$. If $\{F_\ell\}_{\ell \in \mathbb{N}}$ is a collection of measurable sets belonging to $\tilde{\mathcal{F}}^+ = \sigma(\tilde{X}_u, u \geq 0)$ and satisfying Equation (2.4.7), then
$$\lim_{\ell \to \infty} P_j[\theta^{-1}_{T^{A\to}_{-1}} F_\ell] = 1 \quad \forall j \in C.$$

Proof: By Lemma 2.4.4, $\tilde{P}_0^{D_\ell\to}[\tilde{F}_\ell]$ converges to 1 as $\ell \to \infty$. Now, for an arbitrary set $A$ the random variables $T_n^{(A\to)D_\ell}$ are not stopping times, and so are not directly amenable to the analysis of Section 2.2. However, $P_0^{(A\to)D_\ell}[U] = \tilde{P}_0^{D_\ell(\to A)}[\tilde{U}]$ for $U \in \mathcal{F}$, and the $\tilde{T}_n^{D_\ell(\to A)}$ are stopping times for each $n \in \mathbb{Z}$. So applying Lemma 2.2.2 with $H = \widetilde{\theta^{-1}_{T^{A\to}_{-1}} F_\ell}$, which by (2.3.1) is equal to $\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell \in \tilde{\mathcal{F}}^+$ under $\tilde{P}_0^{D_\ell(\to A)}$, and $D = D_\ell$ yields
$$P_0^{(A\to)D_\ell}[\theta^{-1}_{T^{A\to}_{-1}} F_\ell] = \tilde{P}_0^{D_\ell(\to A)}[\widetilde{\theta^{-1}_{T^{A\to}_{-1}} F_\ell}] = \frac{\sum_{i \in S, j \in A} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ij}\, P_j[\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell]}{\sum_{i \in S, j \in A} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ij}}.$$
By (2.3.1) and Lemma 2.3.3, $P_j[\tilde{\theta}^{-1}_{\tilde{T}^{\to A}_{1}} \tilde{F}_\ell] = P_j[\theta^{-1}_{T^{A\to}_{-1}} F_\ell]$, and so taking the limit as $\ell \to \infty$ and applying Theorem 2.4.2 gives
$$1 = \lim_{\ell \to \infty} \frac{\sum_{i \in S, j \in A} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ij}\, P_j[\theta^{-1}_{T^{A\to}_{-1}} F_\ell]}{\sum_{i \in S, j \in A} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ij}}.$$
For each $\ell$, let $\nu_\ell$ be the measure on $A$ given by
$$\nu_\ell(j) := \frac{\sum_{i \in S} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ij}}{\sum_{i \in S, k \in A} \pi(i)\, \tilde{q}^{D_\ell(\to A)}_{ik}}, \quad j \in A.$$
According to (2.4.11), the support of the measure $\nu := \liminf_{\ell \to \infty} \nu_\ell$ is $C$, and so
$$1 = \sum_{j \in C} \nu(j)\, \liminf_{\ell \to \infty} P_j[\theta^{-1}_{T^{A\to}_{-1}} F_\ell].$$
Since this is a convex combination of positive terms summing to 1, the result follows.

The Folk Theorem has been applied to a few networks [2, 19, 22].

Theorem 2.4.6 (Large Deviation Folk Theorem) Let $Y$ be the discrete time embedded chain associated with $X$. Let $\{z_\ell\}_{\ell \in \mathbb{N}}$ be a sequence of states satisfying $\pi(z_0) > 0$ and $\lim_{\ell \to \infty} \pi(z_\ell) = 0$, let $\{\tilde{F}_\ell\}_{\ell \in \mathbb{N}}$ be a collection of measurable sets satisfying
$$\lim_{\ell \to \infty} \frac{\sum_{j \neq z_\ell} K_{z_\ell j}\, P_j[\tilde{F}_\ell]}{\sum_{j \neq z_\ell} K_{z_\ell j}} = 1 \qquad (2.4.12)$$
and let $E_\ell := \{T_0^{z_0\to} = 0,\ T_1^{\to z_\ell} < T_1^{\to z_0}\}$. Then
$$\lim_{\ell \to \infty} P_{z_0}[Y \in C_\ell \mid E_\ell] = 1 \qquad (2.4.13)$$
where $C_\ell := \theta^{-1}_{T^{z_0\to}_{-1}} F_\ell$.

Proof: Consider the point process $N$ of jumps out of $z_0$, and let $U := \{Y \in E_\ell\}$ be the set of trajectories which depart from $z_0$ at time zero and subsequently arrive at $z_\ell$ before $z_0$. Equation (1.4.7) of [3] defines the thinned process $N_U$ as follows:
$$N_U(\omega, C) := \int_C 1_U(\theta_t\omega)\, N(\omega, dt), \quad C \in \mathcal{B}(\mathbb{R}).$$
Notice that with $A := \{z_0\}$ and $D_\ell := \{z_\ell\}$, $N_U = N^{(A\to)D_\ell}$. Moreover, notice that $P_{z_0} = P_0^N$ (see Section 1.7 in [3]). Therefore applying Equation (1.4.9) of [3] yields
$$P_{z_0}[Y \in C_\ell \mid E_\ell] = P_0^{N_U}[Y \in C_\ell] = P_0^{(A\to)D_\ell}[Y \in C_\ell].$$
By Lemma 2.4.3 and the hypothesis, $\lim_{\ell \to \infty} \tilde{P}_0^{D_\ell\to}[\tilde{F}_\ell] = 1$, and so by Theorem 2.4.2, $\lim_{\ell \to \infty} P_0^{(A\to)D_\ell}[Y \in C_\ell] = 1$. The result then follows.
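For the coin chain of Example 2.4.1, the content of Theorem 2.4.6 can be checked by brute force: simulate excursions of $Y$ from $z_0 = 0$, keep only those realizing the conditioning event $E_\ell$ (reaching $\ell$ before returning to 0), and look at the accepted paths. The sketch below (an illustrative Python fragment added here, not part of the original text; $p = 0.3$ and $\ell = 8$ are made up, and $\ell$ is kept small because the acceptance probability decays geometrically in $\ell$) does this and confirms that the accepted paths climb at the net rate $q - p$.

import numpy as np

rng = np.random.default_rng(3)
p, q, ell = 0.3, 0.7, 8   # small ell: the acceptance probability decays like (p/q)^ell

def excursion():
    # One excursion of the fortune chain: start at 0, stop on the return to 0.
    x = 1 if rng.random() < p else 0            # 0 is reflecting
    path = [0, x]
    while x != 0:
        x = x + 1 if rng.random() < p else x - 1
        path.append(x)
    return path

# Rejection step: keep excursions realizing E_l, i.e. reaching ell before returning to 0.
accepted, tried = [], 0
while len(accepted) < 20:
    tried += 1
    path = excursion()
    if max(path) >= ell:
        accepted.append(path)

steps_to_ell = np.mean([next(k for k, y in enumerate(pth) if y >= ell)
                        for pth in accepted])
print(tried, steps_to_ell, ell / (q - p))
# steps_to_ell is close to ell/(q - p): conditioned on E_l the walk climbs at net
# rate q - p, as if the probabilities p and q had been swapped.

The large number of discarded excursions printed above is precisely the inefficiency that the reversal approach avoids.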

2.4.2 Fluid Limits

A generalization of a theorem of Kurtz [18] is often instrumental in finding an appropriate set $\tilde{F}_\ell$ with which to apply Theorem 2.4.6. The central idea is that the more "distant" the point in the state space at which the backwards process begins its journey, the more tightly the trajectories back to the origin, scaled in space and time, adhere to a deterministic path called the fluid limit. The following exposition follows [9].

For $T > 0$ let $D([0,T]; \mathbb{R}^n)$ be the space of cadlag trajectories from $[0,T]$ to $\mathbb{R}^n$. Assume that the countable state space $S$ for $X$ satisfies $S \subset \mathbb{R}^n_+$ (typically $S = \mathbb{N}^n$) and that the transition rates and directions for $X$ are constant on each of the sets $D_s := \{x \in \mathbb{R}^n_+ \mid x_i = 0\ \forall i \in s\}$, for $s$ an element of the power set of $\{1, \ldots, n\}$. Thus $D_\emptyset = \{x \in \mathbb{R}^n_+ \mid \min_{i=1,\ldots,n} x_i > 0\}$ is the interior and $D_{\{1\}}$ is the first boundary of $\mathbb{R}^n_+$.

The process $X$ will be assumed to be "nearest neighbor", i.e. $V := \{-1, 0, 1\}^n$ is the set of possible jump directions out of any state. The jump rate out of a state $x$ in direction $v$ will be denoted $\lambda_v(x)$. Thus $\lambda_v(x) = \lambda_v(y)$ whenever $x, y \in D_s$ for any $s$ in the power set of $\{1,\ldots,n\}$. Moreover, since $X$ lies in $S$, $\lambda_v(x) = 0$ whenever $x \in S$ but $x + v \notin S$. The generator $Q$ for $X$ is therefore given by
$$q_{x, x+v} := \lambda_v(x), \quad x \in S,\ v \in V.$$
Here the process $Z_k(t) := \frac{1}{k} X(kt)$, scaled in time and space, will be compared to a deterministic limit. The scaled process has the generator $Q_k$ defined by
$$q_k(x, x + v/k) := k\,\lambda_v(x), \quad x \in S/k,\ v \in V.$$
Thus $Z_k$ makes smaller jumps than those of $X$, at the accelerated rates $k\lambda_v(x)$. It should be clear that the process $Z_k$ is Markovian. For $x \in \mathbb{R}^n_+$ define
$$B(x) := \{i \in \{1, \ldots, n\} \mid x_i = 0\}, \quad \text{so that } B(x) = s \iff x \in D_s.$$
For $s$ in the power set of $\{1,\ldots,n\}$ let $x^s \in \{x \mid B(x) = s\} = D_s$, and for $\beta \in \mathbb{R}^n$ define
$$L(s, \beta) = \sup_{\alpha \in \mathbb{R}^n} \left\{\alpha \cdot \beta - \sum_{v \in V} \lambda_v(x^s)\,[\exp(\alpha \cdot v) - 1]\right\}. \qquad (2.4.14)$$
Notice that $L(s, \beta) \geq 0$ (take $\alpha = 0$ in the above expression). Now for $x \in \mathbb{R}^n_+$, $\beta \in \mathbb{R}^n$ define
$$l(x, \beta) := \inf\left\{\sum_{s \subseteq B(x)} \rho_s\, L(s, \beta_s)\ \Big|\ \sum_{s \subseteq B(x)} \rho_s \beta_s = \beta,\ \rho_s \geq 0,\ \sum_{s \subseteq B(x)} \rho_s = 1\right\} \qquad (2.4.15)$$
and define the functional $I_x$ on functions $\phi \in D([0,T]; \mathbb{R}^n)$ by
$$I_x(\phi) = \begin{cases} \int_0^T l(\phi(s), \phi'(s))\, ds & \phi \text{ absolutely continuous},\ \phi(0) = x \\ +\infty & \text{otherwise}. \end{cases} \qquad (2.4.16)$$
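In one dimension the supremum in (2.4.14) is a scalar optimization and is easy to evaluate numerically. The following sketch (an illustrative Python/SciPy fragment added here, not part of the original text; it uses the coin chain's interior rates $\lambda_{+1} = p$, $\lambda_{-1} = q$ with the made-up value $p = 0.3$) computes $L(\emptyset, \beta)$ and confirms that it vanishes exactly at the interior drift $\beta = p - q$ and is strictly positive elsewhere.

import numpy as np
from scipy.optimize import minimize_scalar

p, q = 0.3, 0.7        # interior rates of the coin chain: lambda_{+1} = p, lambda_{-1} = q

def L(beta):
    # L(s, beta) of (2.4.14) in one dimension: a scalar supremum over alpha.
    def neg(alpha):
        return -(alpha * beta - (p * (np.exp(alpha) - 1.0) + q * (np.exp(-alpha) - 1.0)))
    return -minimize_scalar(neg).fun

drift = p - q                        # the interior drift of (2.4.17) for this chain
print(L(drift))                      # essentially 0: the fluid drift has zero cost
print(L(0.0), L(q - p))              # strictly positive away from the drift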

The following result from [9] will provide tubes $\tilde{F}_\ell$ to which the Folk Theorem can be applied.

Theorem 2.4.7 (Theorem 1.1 (ii) in [9]) There exists a compact set $C \subset \mathbb{R}^n_+$ such that for each closed set $F$ in $D([0,T]; \mathbb{R}^n)$,
$$\limsup_{k \to \infty} \frac{1}{k} \ln P_x[Z_k \in F] \leq -\inf_{\phi \in F} I_x(\phi)$$
uniformly in $x \in C$.

By choosing an appropriate closed set $F \subset D([0,T]; \mathbb{R}^n)$ in Theorem 2.4.7, a Folk tube can be constructed. All such sets considered here will center around an object called the fluid limit associated with $Q$. Let
$$\beta_\emptyset := \sum_{v \in V} \lambda_v(x)\, v, \quad x \in D_\emptyset, \qquad \beta_{\{i\}} := \sum_{v \in V} \lambda_v(x)\, v, \quad x \in D_{\{i\}}. \qquad (2.4.17)$$
These vectors give the drifts in the interior and on the $i$th boundary. The interior fluid limit is a deterministic function $p_\infty$ defined as the solution to the differential system
$$\frac{d}{dt} p_\infty(t) = \beta_\emptyset, \qquad (2.4.18)$$
having initial point $p_\infty(0) \in D_\emptyset$. The definition of the boundary fluid limits depends on a stability condition. Fix $i \in \{1, \ldots, n\}$ and let $x \in D_\emptyset$ be a point in the interior. Let $V_i^+$ and $V_i^-$ be the sets of jump directions $v \in V$ with positive or negative components in the direction of the $i$th axis respectively. Then, if the stability condition
$$\sum_{v \in V_i^+} \lambda_v(x) - \sum_{v \in V_i^-} \lambda_v(x) < 0, \quad x \in D_\emptyset, \qquad (2.4.19)$$
holds, the function $\pi_i : \mathbb{N} \to \mathbb{R}$ given by
$$\pi_i(k) := \pi_i(0) \left(\frac{\sum_{v \in V_i^+} \lambda_v(x)}{\sum_{v \in V_i^-} \lambda_v(x)}\right)^k \qquad (2.4.20)$$
is a well defined probability measure for the appropriate normalizing constant $\pi_i(0)$. Condition (2.4.19) is an assurance that the process does not wander away in the direction of the $i$th axis. When this condition holds, the $i$th boundary fluid limit is the deterministic function $p^i_\infty$ defined as the solution to the differential system
$$\frac{d}{dt} p^i_\infty(t) = \pi_i(0)\, \beta_{\{i\}} + (1 - \pi_i(0))\, \beta_\emptyset =: \beta_{\emptyset,\{i\}}, \qquad (2.4.21)$$
with initial point $p^i_\infty(0) \in D_{\{i\}}$. Note that for $x \in D_s$, $s = \emptyset, \{i\}$,
$$L(s, \beta_s) = \sup_{\alpha \in \mathbb{R}^n} \left\{\sum_{v \in V} \lambda_v(x)\,[\alpha \cdot v - \exp(\alpha \cdot v) + 1]\right\}.$$
As $\alpha$ ranges over $\mathbb{R}^n$, the numbers $u_v := \alpha \cdot v$ range over $\mathbb{R}$, and so the above supremand is a combination, with nonnegative weights $\lambda_v(x)$, of points lying on the graph of $f(u) = 1 + u - e^u$. This function is non-positive, and thus any such combination of the points of $f$ is non-positive. Since $L(s, \beta_s) \geq 0$ it follows that $L(s, \beta_s) = 0$. If $s = \emptyset$ then the only subset of $B(x)$ is $s = \emptyset$ and hence (2.4.15) becomes
$$l(x, \beta_\emptyset) = L(\emptyset, \beta_\emptyset) = 0, \quad x \in D_\emptyset. \qquad (2.4.22)$$
If instead $s = \{i\}$ then, since $\beta_{\emptyset,\{i\}} = \pi_i(0)\beta_{\{i\}} + (1 - \pi_i(0))\beta_\emptyset$, the only convex combination $\rho_\emptyset L(\emptyset, \beta_\emptyset) + \rho_{\{i\}} L(\{i\}, \beta_{\{i\}})$ appearing in (2.4.15) is the one for which $\rho_{\{i\}} = \pi_i(0)$ and $\rho_\emptyset = 1 - \pi_i(0)$. Thus
$$l(x, \beta_{\emptyset,\{i\}}) = \pi_i(0)\, L(\{i\}, \beta_{\{i\}}) + (1 - \pi_i(0))\, L(\emptyset, \beta_\emptyset) = 0, \quad x \in D_{\{i\}}. \qquad (2.4.23)$$
Since the function $l$ is strictly convex in its second coordinate (see [9]), it follows that $\beta_\emptyset$ and $\beta_{\emptyset,\{1\}}$ are the unique vectors satisfying (2.4.22) and (2.4.23).

Example 2.4.8 (Gambler's Coin Continued) Returning to Example 2.4.1, let $\tilde{X}$ be the continuous time Markov process whose instantaneous transition probabilities are given by $\tilde{K}$ and whose holding times have rate 1. Thus $\tilde{Y}$ is the embedded chain associated with $\tilde{X}$. The interior fluid limit $\tilde{p}_\infty$ associated with $\tilde{K}$ can be computed from (2.4.18):
$$\frac{d}{dt} \tilde{p}_\infty(t) = p - q$$
has solution $\tilde{p}_\infty(t) = \tilde{p}_\infty(0) + (p - q)t$. Let $\tilde{p}_\infty(0) = 1$ be the initial condition and let $T = \frac{1}{q - p}$ be the time point at which the fluid limit reaches 0. Then, for fixed $\epsilon > 0$, the closed set
$$\tilde{F} := \left\{\sup_{0 \leq t \leq T} \left|\tilde{Z}_\ell(t) - \tilde{p}_\infty(t)\right| \geq \epsilon\right\} = \left\{\sup_{0 \leq t \leq T} \left|\frac{\tilde{Y}_{\tilde{N}(0,\ell t]}}{\ell} - \tilde{p}_\infty(t)\right| \geq \epsilon\right\} \qquad (2.4.24)$$
satisfies
$$\inf_{\phi \in \tilde{F}} I_x(\phi) > 0$$
for any $x > 0$ (the argument is similar to the arguments of the forthcoming Example). Then by Theorem 2.4.7,
$$\lim_{\ell \to \infty} P_x[\tilde{Z}_\ell \in \tilde{F}^c] = 1$$
for any $x$ in a compact set $C$. Take $C = [1 - \delta, 1 + \delta]$, so that if $\tilde{F}_\ell := \{\tilde{X}(\ell t)/\ell \in \tilde{F}^c\}$ and $z_\ell = \ell$,
$$\lim_{\ell \to \infty} P_{z_\ell}[\tilde{F}_\ell] = 1.$$
If $s = s(t) = \ell t$ is an acceleration of the time parameter by the factor $\ell$, the above says that as $\ell \to \infty$ the backwards path $\tilde{X}(s)$ remains asymptotically within a "tube" of radius $\epsilon\ell$ around the line segment
$$\ell\, \tilde{p}_\infty(t) = \ell + (p - q)s, \qquad 0 \leq s \leq \frac{\ell}{q - p}.$$
This makes precise the reason for (2.4.1).

Condition (2.4.12) holds:

lim`!1

Pj¤z`

Kz`jPj ŒQF`�P

j¤z`Kz`j

D lim`!1

�P`C1Œ QF`�C �P`�1Œ QF`�

�C �D 1;

and so by Theorem 2.4.6

1 D lim`!1

Pz0ŒC` j E`� D lim`!1

Pz0Œ��1

Tz0!

�1

F` j Tz0!0 D 0; T

!z`1 < T

!z01 �:

2.4.3 An Accounting Network

Consider the following example taken from [8]: a small insurance firm has an ac-

counting department consisting of a senior accountant and her assistant the junior

accountant. Operations in the department proceed as follows: all new claims arrive

on the desk of the junior accountant, and do so according to a Poisson process with

rate �. The junior accountant processes the claims at rate �2, which subsequently

either require more complicated calculations (with probability r21) and thus proceed

to the senior accountant, or else leave the system (with probability r20 WD 1 � r21).

The senior accountant processes claims at rate �1 at which point they return to the

desk of the junior accountant.

Identify the senior accountant’s desk as node 1 and the junior accountant’s desk

as node 2 in this 2-node network. Let a point .x; y/ in the state space S WD N2

denote the the number x of claims on the senior accountant’s desk and the number

y of claims on the junior accountant’s desk (see Figure 2.4).

Without loss of generality assume �C �1 C �2 D 1 so that the jump rates can

be interpreted as transition probabilities. Let X.t/ denote the state of the system

at time t ; clearly X is a Markov process. Let Y denote the discrete time embedded

Page 44: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 37

� - y ����2

?�2r20

completed return

-�2r21

x ����1 -

6�

?�1

Figure 2.4: Flow Diagram for the Accountants’ Network

Markov chain which describes the evolution of this network, and K its associated

kernel. The transition diagram for the system is shown in Figure 2.5. The steady

state of this Jackson network is

�.x; y/ D .1 � �1/.�1/x.1 � �2/.�2/

y (2.4.25)

where

�1 WD�1

�1; �2 WD

�2

�2; �1 WD

�r21

r20; �2 WD

r20;

and a simple calculation shows that �.i/

�.j /Kij D Kj i ; i.e. this process is reversible

(K D QK). Since the chain is reversible Figure 2.5 is the transition diagram for the

backwards process as well. The assumptions �1 < �1 and �2 < �2 ensure stability of

the network and will be retained.

It is of interest to study large deviations in the first coordinate, which correspond

to large numbers of returns on the senior accountant’s desk. This simplistic model

with known stationary distribution � facilitates the illustration of several points.

However, the methodology is also applicable in situations where � is unknown. As-

sume

�1 C � > �2: (2.4.26)

Then by stability

�1 > �2 � � > �2 � �2r20 D �2r21I (2.4.27)

i.e. the chain drifts northwest.

Page 45: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 38

-

6

.0; 0/ .`; 0/

�2r21

6�1

�2r20

?

@@I

@@R

6

��1

@@I

�����2

�2r21

6

�2r20?@@R��

���1

Figure 2.5: Transition diagram for the accountants’ chain and its reversal

There are two boundaries for the process QX (the two axes Df1g and Df2g). The

interior fluid limit Qp1 associated with QK satisfies (2.4.18):

d

dtQp1.t/ D ˇ;

D �.0; 1/C �2r21.1;�1/C �2r20.0;�1/C �1.�1; 1/

D .�2r20 � �1; �C �1 � �2/;

the solution of which is

Qp1.t/ D Qp1.0/C tˇ;:

Thus with the initial condition Qp1.0/ D .1; 0/, Qp1 is a line connecting the points

Qp1.0/ D .1; 0/ and Qp1

�1

�1 � �2r21

�D

�0;�C �1 � �2

�1 � �2r21

�:

Let T1 WD1

�1��2r21; this is the time for the interior fluid limit to hit the y axis.

The steady state �1 for QK in the x direction far from the origin is given by

(2.4.20):

�1.k/ D

�1 �

�2r21

�1

���2r21

�1

�kk � 0;

Page 46: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 39

and so (2.4.21) becomes

d

dtQp11.t/ D ˇ;;f1g

D

�1 �

�2r21

�1

�.�2r21; � � �2/C

��2r21

�1

�.�2r21 � �1; �C �1 � �2/

D .0; � � �2r20/

where Qp11 is the boundary fluid limit along the y-axis. The solution to this system is

Qp11.t/ DW Qp11.0/C tˇ;;f1g:

Thus with the initial condition Qp11.0/ D Qp1.T1/; Qp11 is a line connecting the points

Qp11.0/ D

�0;�C �1 � �2

�1 � �2r21

�and Qp11

��2 � � � �1

.�1 � �2r21/.� � �2r20/

�D .0; 0/:

Define T D T2 WD1

�2r20��and

Qz1.t/ WD

8<: Qp1.t/ 0 � t � T1

Qp11.t � T1/ T1 � t � T

:

Thus Qz1.t/ is a piecewise linear function connecting the points

Qz1.0/ D .1; 0/; Qz1.T1/ D

�0;�C �1 � �2

�1 � �2r21

�; and Qz1.T / D .0; 0/:

It follows from (2.4.22) and (2.4.23) that l. Qz1.t/; Qz01.t// D 0 for all t > 0. Therefore

the functional (2.4.16) satisfies Ix. Qz1/ D 0 for the initial point x D Qz1.0/. Define

QF; WD

(� 2 D.Œ0; T �IR2/ j sup

t2Œ0;�� �

k�.t/ � .x C ˇ;t /k > �

);

QF;;f1g WD

(� 2 D.Œ0; T �IR2/ j sup

t2Œ�� ;T �

k�.t/ � .�.��/C ˇ;;f1gt /k > �

);

and

QF WD QF; [ QF;;f1g

Page 47: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 40

where

�� WD infft � 0 j �.t/ 2 Df1gg:

Then

QF` WD f QX.`t/=` 2 QFcg

will be the “Folk tube” within which the scaled backwards process QZ` is increasing

likely to remain as ` ! 1. To demonstrate (2.4.12) Theorem 2.4.7 must be used,

and it must be checked that

inf�2 QF

Ix.�/ > 0: (2.4.28)

There are some technical nuisances involved in verifying this condition: it is clear

that Qz1 2 QF , and that Qz1 is the unique path satisfying Ix. Qz1/ D 0. What remains

is to show that the value of Ix.�/ cannot be arbitrarily small for � 2 QF c, and this

involves handling many cases which needn’t appear here.

Thus by Theorem 2.4.7 there is a compact set C such that, uniformly for x 2 C ,

lim sup`!1

1

`lnPxŒ QZ` 2 F � � � inf

�2FIx.�/:

This implies, by (2.4.28), that

lim`!1

PxŒ QF`� D 1 (2.4.29)

uniformly for x in a compact set containing z`.

This limit says that the time reversal starting from a distant point .`; 0/ will

drift northwest remaining roughly within a tube of radius �` around the line segment

` Qz1.s=`/; s � `T1 until it hits the y axis. It then travels down the y axis until it

arrives at the point .0; 0/. See Figure 2.6 for a path simulated with the parameters

� D 0:3; �1 D 0:7; �2 D 0:6; r21 D 0:2 and ` D 150.

Condition (2.4.12) holds:

lim`!1

Pj¤z`

Kz`jPj ŒQF`�P

j¤z`Kz`j

D lim`!1

�P.`;1/Œ QF`�C �1P.`�1;1/Œ QF`�

�C �1D 1;

Page 48: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 41

Figure 2.6: A simulated path QY and Qp1

Page 49: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 42

and so by Theorem 2.4.6

lim`!1

Pz0ŒY 2 C` j Tz0!0 D 0; T

!z`1 < T

!z01 � D 1:

Hence the large deviation path from .0; 0/ has two segments, the first of which

is a jitter up the y axis to a point at roughly�0; `.�1C���2/

�1��2r21

�. To determine what the

transitions of the forward process look like during this segment, reason as follows: the

backwards process QY begins this segment at roughly�0; `.�1C���2/

�1��2r21

�, after which the

x-component quickly reaches its local steady state �. Thus the proportion of time

spent by QY on any of the rays x D 0; x D 1; : : : will approximately be given by �.x/,

which then gives also the approximate proportion of time spent on these rays by the

forward process Y , when near the y-axis. Therefore the proportion of transitions out

of a point .x; y/, in the north western direction say, is the product of the proportion

of time spent on the vertical ray x�1 with the proportion of south eastern transitions

experienced by QY when at .x � 1; y C 1/:

�.x � 1/ QK.x�1;yC1/.x;y/ D �.x � 1/�2r21 D �1�.x/:

Finally, the probability of that the forward process transitions from .x; y/ to .x �

1; y C 1/ near x D 0 is given by dividing this proportion by the probability of being

at the specified x-value, namely �.x/. This yields a probability �1 of a north western

transition. The probability of other transitions are computed in a similar manner.

The network near x D 0 seen by the accountants for the forward process leading to a

large deviation is as in Figure 2.7. The accountants see new returns arriving at rate

�2r20 and returns exiting the system at rate �. Thus the vertical drift for the large

deviation path is �2r20 � �; this is the negative of the drift associated with K D QK.

Compare this to Example 2.4.1 in which the path to the large fortune was one in

which the coin behaved as if the sides were reversed.

Page 50: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

2.4. The Folk Theorem 43

�2r20 - y ����2

?�

completed return

-�2r21

x ����1 -

6�

?�1

Figure 2.7: Transition rates for the large deviation path as it moves up the y axis

Page 51: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

Chapter 3

Quasi-Stationary Measures for

Sub-Stochastic Chains

3.1 The Yaglom Limit

It has been noted by several authors, perhaps most notably A.M. Yaglom, that many

processes behave in an ergodic way in the short term but are eventually assured of

experiencing a “death” of one kind or another. In many ways, it may be more realistic

to analyze the limiting behaviour conditional on non-absorption rather than assuming

that the governing probabilistic model remains valid indefinitely. In this section some

definitions and notational conventions are laid out so that such analyses can proceed.

LetK be an irreducible, sub-stochastic kernel on the countable state space S[f0g

with graveyard state 0; that is, for each i 2 S , Ki0 D 1�Pj2S Kij . Suppose that K

is the kernel associated with the instantaneous transition probabilities of a Markov

jump process X D fXtgt2RC with jump times fTngn2Z. Let Y D fYngn2N be defined by

Yn WD XTn where fTngn2Z are the points of X . Thus Y is a discrete time Markov chain

on S [ f0g with transition kernel K. Assume that eventual absorption (or transition

44

Page 52: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.1. The Yaglom Limit 45

to 0) is certain: for all i 2 S

Pi Œ�0 <1� D 1 for �0 WD inffn > 0IYn D 0g: (3.1.1)

Given that the chain started in state i 2 S , let �n.�ji/ be the distribution of the

process at time n conditional on non-absorption:

�n.j / D �n.j ji/ WD Pi ŒYn D j jYn 2 S�:

The more concise notation �n.j / will be preferred when i is understood, or unimpor-

tant.

Definition 3.1.1 A Yaglom limit � for Y (or K) exists if, for any i; j 2 S ,

�.j / D limn!1

�n.j ji/: (3.1.2)

Thus, when it exists, the Yaglom limit describes the behaviour of the non-absorbed

process after long time periods and does not depend on the initial state i . There

are interesting situations in which the limit above can exist and depend on i , but

those cases are not treated here. Notice that by the series form of Scheffe’s Theorem

�n.A/! �.A/ for all A � S , and thus that �n ) � (see the Corollary to Theorem

16.12 of [5]).

Lemma 3.1.2 The Yaglom limit is a left eigenvector for K with eigenvalue

˛ WD limn!1

Pi ŒYn 2 S�

Pi ŒYn�1 2 S�; i 2 S: (3.1.3)

Proof: For any i; j 2 S and each n,

�n.j ji/ DXk2S

Pi ŒYn D j; Yn�1 D k�

Pi ŒYn 2 S�

D

Xk2S

Pi ŒYn�1 D k�

Pi ŒYn 2 S�Kkj

DPi ŒYn�1 2 S�

Pi ŒYn 2 S�

Xk2S

�n�1.kji/Kkj ; (3.1.4)

Page 53: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.1. The Yaglom Limit 46

orPi ŒYn 2 S�

Pi ŒYn�1 2 S�D

Pk2S �n�1.kji/Kkj

�n.j ji/:

Since �n) � taking the limit as n!1 yields, by the Portmanteau lemma,

limn!1

Pi ŒYn 2 S�

Pi ŒYn�1 2 S�D

Pk2S �.k/Kkj

�.j /:

Thus the limit in question exists, does not depend on j or i , and clearly is an eigen-

value of K corresponding to the left eigenvector � .

It will be useful to invoke some of the terminology of [17] and [25]:

Definition 3.1.3 A measure � on S is called r-invariant for K if

r�.j / DXi2S

�.i/Kij (3.1.5)

holds for all j . A 1-invariant measure may simply be called invariant, and an ˛-

invariant measure may be called quasi-invariant.

In the terminology of Definition (3.1.3) then, Lemma (3.1.2) says that any Yaglom

limit is ˛-invariant for K with ˛ defined by (3.1.3).

Corollary 3.1.4 If Y has a Yaglom limit � then for all n

�.i/ D P� ŒYn D i jYn 2 S� (3.1.6)

and

˛ DP� ŒYn 2 S�

P� ŒYn�1 2 S�< 1: (3.1.7)

Proof: This follows immediately from the fact that � is a left eigenvector for K:

P� ŒYn D j � DXi2S

�.i/Knij D ˛

n�.j / (3.1.8)

Page 54: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.1. The Yaglom Limit 47

and so P� ŒYn D j jYn 2 S� D˛n�.j /Pk2S ˛

n�.k/D �.j / as well as

P� ŒYn 2 S�

P� ŒYn�1 2 S�D

˛nPj2S �.j /

˛n�1Pj2S �.j /

D ˛:

The fact that ˛ < 1 follows from summing (3.1.8) over all j 2 S and noting that the

limit as n!1 of the resultant sum is 0 by (3.1.1).

There is another characterization of ˛ which may be useful:

Lemma 3.1.5 For any i 2 S ,

˛ D limn!1

Pi Œ�0 > n�1=n:

Proof: Let i 2 S and � > 0 be arbitrary and let L 2 N be large enough so that

n � L impliesˇ˛ �

Pi ŒYnC12S�

Pi ŒYn2S�

ˇ< �. Then for n � L

Pi Œ�0 > n�1=nD

�Pi ŒYn 2 S�

Pi ŒYn�1 2 S�� � �Pi ŒY1 2 S�

Pi ŒY0 2 S�

�1=n2 ..˛ � �/1�L=nT .n/; .˛ C �/1�L=nT .n//

where T .n/ D�Pi ŒYL2S�

Pi ŒYL�12S�� � �

Pi ŒY12S�

Pi ŒY02S�

�1=nis a term going to 1 as n!1. Thus tak-

ing the limit as n ! 1 of the above expression shows that limn!1 Pi Œ�0 > n�1=n 2

Œ˛ � �; ˛ C ��, and since � was arbitrary the result follows.

In this setting, one question is centrally important: will the process be absorbed

before reaching a Yaglom limit? If so, the Yaglom limit is not of much practical use,

and it is important to check this before making pronouncements about what to expect

of the process after long time periods. The following well known result allows us to

begin to answer the question.

Page 55: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.1. The Yaglom Limit 48

Corollary 3.1.6 If the Yaglom limit � exists and is the initial distribution for Y

then �0 is geometrically distributed with parameter ˛:

P� Œ�0 D n� D .1 � ˛/˛n:

Proof: Summing (3.1.8) over all j 2 S gives P� Œ�0 > n� D ˛n, and the result

follows by subtracting P� Œ�0 > nC 1� D ˛nC1.

Corollary (3.1.6) says that �0 is geometrically distributed when the chain is initially

distributed according to the Yaglom limit, whereas Lemma (3.1.5) says that it is

asymptotically so starting from any point i 2 S .

The question of whether a Yaglom limit exists for a given kernel was addressed

in [17] which gave sufficient conditions. These conditions, hereafter referred to as the

Kesten conditions, are very much 1-dimensional restrictions. It is shown in [17] that

when the Kesten conditions hold, when (3.1.1) holds, and when one step absorption

is impossible from large enough states, an R�1-invariant probability measure � exists

for K and satisfies

�.j / D limn!1

KnijP

`2S Kni`

D limn!1

Pi ŒYn D j �

Pi ŒYn 2 S�D lim

n!1�n.j ji/ (3.1.9)

for all i; j 2 S (take i D k and m D 0 in equation (1.13) of that paper). Here R�1

is the value specified by (3.2.17). Thus, by (3.1.2) � is a Yaglom limit for K with

R�1 D ˛.

Example 3.1.7 Let fXtgt2RC be a birth-death process with jump rate 1 D � C �,

0 < � < �, and instantaneous transition probabilities

pij D

8<:� j D i C 1

� j D i � 1; i � 1:

This defines a sub-stochastic continuous time Markov chain with associated embedded

Page 56: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.1. The Yaglom Limit 49

chain Yn D XTn, n � 0; its kernel is

K D

266666640 � 0 0 � � �

� 0 � 0 � � �

0 � 0 �

: : :: : :

: : :

37777775 : (3.1.10)

Seneta and Vere-Jones discuss this kernel in their paper [24] and find the ˛-invariant

probability measure �.i/ WD .1 �p�=�/2i

�q��

�i�1, ˛ D 2

p��, which does not

satisfy the Kesten conditions (it is not aperiodic, for example) but nevertheless is a

Yaglom limit for K. This kernel is central to some forthcoming examples.

Page 57: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.2. Deterministic Convergence to � 50

3.2 Deterministic Convergence to �

Equation (3:1:4) says that the conditional distribution �n at time n satisfies the

recursive relationship

�n D Cn�1�n�1K; n � 1 (3.2.1)

where Cn WDPŒYn2S�

P ŒYnC12S�is a quantity converging to ˛�1 as n!1 (here �n is treated

as a row vector). This suggests an iterative, deterministic procedure for computing

�n and thus for approximating � , should it exist. To examine the rate of convergence

of �n to � , consider the chi-square distance introduced in [11] and given by

�2n WDXi2S

��n.i/

�.i/� 1

�2�.i/; (3.2.2)

and the inner product structure imbued by � on the space of complex-valued functions

over S :

hf; gi� WDXi2S

f .i/g.i/�.i/ for f; g W S ! R: (3.2.3)

The associated �-norm is defined by jjf jj� WDphf; f i� . Then if … D diag.�/ is the

diagonal matrix whose entries are given by � and fn WD …�1�>n � 1, the chi-square

measure of the distance from the nth iterate and � is just the square of the �-norm

of fn:

�2n D jjfnjj2� : (3.2.4)

Note that the adjoint of an operator A on the inner product space F.S/ D .CS ; h�; �i�/,

denoted by QA, is the reversal with respect to � , i.e.

QAij WD�.j /

�.i/Aj i : (3.2.5)

Then QA and AH are similar matrices since QA D …�1AH…, and thus A and QA have

the same eigenvalues (here AH denotes the Hermitian of A).

The stochastic kernel K� on S defined by K�ij WD Kij C Ki0�.j / effectively

resurrects dead chains and distributes them according to � , and satisfies

K�D K C .1 �K1/� D KM C 1� (3.2.6)

Page 58: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.2. Deterministic Convergence to � 51

where

M WD I � 1�: (3.2.7)

Right multiplying both sides of (3.2.1) by 1 gives C�1n�1 D �n�1K1 which implies

�C�1n�1� D � � �n�1K1� � � D �n�1.1 �K1/� � � and so

f >n D �n…�1� 1>

D Cn�1.�n�1K � C�1n�1�/…

�1

D Cn�1.�n�1K C �n�1.1 �K1/� � �/…�1

D Cn�1.�n�1K�� �/…�1

D Cn�1.�n�1…�1. QK�/> � 1>/

or, since 1> D 1>. QK�/>,

fn D Cn�1 QK�fn�1: (3.2.8)

Notice that, for any self adjoint matrix A,

eKnA D …�1.KnA/>… D …�1A>……�1.Kn/>… D A QKn

I (3.2.9)

it is then immediate that

QK�DM QK C 1�: (3.2.10)

Lemma 3.2.1 The matrix M is the projection onto the subspace of F.S/ consisting

of functions orthogonal to 1. Also,

MKM D KM: (3.2.11)

Proof: It is immediate that QM DM and M 2 DM . For any g 2 F.S/

hMg; 1i� D hg � 1�g; 1i� D hg; 1i� � h1�g; 1i� D 0;

and so M is the specified projection. Then MKM D .K � ˛1�/M D KM verifies

the specified equality.

Page 59: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.2. Deterministic Convergence to � 52

Notice that KM1� D 0 and 1�KM D ˛1�M D 0 and so the cross terms that

appear from expanding the powers .K�/n D .KM C 1�/n disappear. Also, 1� is

the projection onto 1 (the so called Perron projection in finite dimensions) and hence

idempotent, and .KM/n D KnM by (3.2.11). Therefore

.K�/n D KnM C 1�; . QK�/n DM QKnC 1�: (3.2.12)

Lemma 3.2.2 For each n, fn ? 1 and

fn D .Cn�1 � � �C0/. QK�/nf0 D .Cn�1 � � �C0/M QKnf0:

Proof: Direct computation shows that

hfn; 1i� D f>n …1 D .�n…

�1� 1>/…1 D �n1 � �1 D 0

since �n and � are measures summing to 1. Then (3.2.8) and (3.2.12) imply

fn D .Cn�1 � � �C0/. QK�/nf0 D .Cn�1 � � �C0/M QKnf0:

Suppose there exists a right eigenvector h for K with strictly positive entries corre-

sponding to the eigenvalue ˛, represented as a column vector. Consider the assump-

tion

hh; 1i� DXi2S

�.i/h.i/ D 1; (3.2.13)

which, without loss of generality, is equivalent to the assumptionPi2S �.i/h.i/ <1.

Lemma 3.2.3 a) The vector 1 is a right eigenvector for QK with eigenvalue ˛;

b) If assumption (3.2.13) holds then the steady state � for 1˛QK is given by

�.i/ WD �.i/h.i/; i 2 S I (3.2.14)

Page 60: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.2. Deterministic Convergence to � 53

c) If K is aperiodic and assumption (3.2.13) holds then for each i 2 S

h.i/ D limn!1

Pi Œ�0 > n�

˛nI (3.2.15)

d) If v 2 ker.K � �I/n for some � ¤ ˛ then v 2 ker.K� � �I/n and v ? 1.

e) If v 2 ker. QK � �I/n for some � ¤ ˛ then v ? h.

Proof: The calculation

QK1 D …�1K>…1 D …�1K>�> D …�1.�K/> D …�1.˛�/> D ˛1

verifies a). Under assumption (3.2.13) � certainly defines a probability measure, and

so

�1

˛QK D

1

˛�…�1K>… D

1

˛h>K>… D

1

˛˛h>… D �

verifies b). If K is aperiodic then 1˛QK is (irreducible and) aperiodic and so�

1

˛QK

�nij

! �.j /; i 2 S

by part b). Now

Pi Œ�0 > n� DXj2S

Knij D

˛n

�.i/

Xj2S

�.j /

�1

˛QK

�nji

;

so by dominated convergence

limn!1

Pi Œ�0 > n�

˛nD

1

�.i/

Xj2S

�.j /�.i/ D h.i/;

verifying c). Now if v 2 ker.K � �I/n then

0 D h.K � �I/nv; 1i�

D

nXiD0

n

i

!.��/ihKn�iv; 1i�

D

nXiD0

n

i

!.��/ihv; QKn�i1i�

Page 61: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.2. Deterministic Convergence to � 54

D

nXiD0

n

i

!.��/i˛n�ihv; 1i�

D .˛ � �/nhv; 1i� :

If � ¤ ˛ then v ? 1 and by (3.2.12)

.K�� �I/nv D

nXiD0

n

i

!.��/i.K�/n�iv

D

nXiD0

n

i

!.��/iKn�iv

D .K � �I/nv

D 0

so that d) holds. For v 2 ker. QK � �I/n the claim hh; vi� D 0 is verified by the

calculation

0 D h. QK � �I/nv; hi�

D

nXiD0

n

i

!.��/ih QKn�iv; hi�

D

nXiD0

n

i

!.��/ihv;Kn�ihi�

D

nXiD0

n

i

!.��/i˛n�ihv; hi�

D .˛ � �/nhv; hi� :

Note the distinction between part c) of Lemma 3.2.3 and Lemma 3.1.5; in par-

ticular, the former is a strictly stronger property.

Conditions for the existence, and uniqueness, of a positive h satisfying (3.2.13) are

given in [25]. A sufficient condition is R-positivity of K, whose formulation requires

Page 62: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.2. Deterministic Convergence to � 55

a well known limit result: define the potential series associated with K to be

Gij .z/ WD

1XnD0

Knij z

n (3.2.16)

where K0ij D �fi D j g. Recall that K is irreducible. A routine application of Fekete’s

subadditivity lemma shows that

R�1 WD limn!1

.Knij /1=n (3.2.17)

is a limit that exists and has the same value for all i; j 2 S .

Lemma 3.2.4 The radius of convergence of the potential series G.z/ is R.

Proof: For z 2 C let yn D .Knij /1=nz. The root test says that R�1jzj D

limn!1 jynj < 1 implies that Gij .z/ is convergent and R�1jzj D limn!1 jynj > 1

implies that Gij .z/ is divergent. The result follows.

The kernel K is called R-recurrent if Gij .R/ D 1, and R-transient otherwise. An

R-recurrent matrix is called R-positive if none of the terms KnijR

n goes to zero in n,

and R-null otherwise. These definitions have well known analogues in the stochastic

case (where R D 1). To see the connection with Lemma 3.1.5, note that by part a)

of Lemma 3.2.3

.Knij /1=nD

��.j /

�.i/

�1=n �QKnji

�1=nD

��.j /

�.i/

�1=n ˛Xk

QKn�1jk

�1

˛QKki

�!1=n

��.j /

�.i/

�1=n ˛Xk

QKn�1jk

!1=nD

��.j /

�.i/

�1=n˛:

Taking n!1 in the above expression shows that R�1 � ˛. The reverse inequality

may not hold in general, but Theorem D. of [25] says that if K is R-positive then

Page 63: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.3. Conditions on �0 56

˛ D R�1 is the maximal eigenvalue of K, and that there exists a unique positive

eigenvector h satisfying (3.2.13).

3.3 Conditions on �0

Here conditions on �0 sufficient to ensure that �2n ! 0 will be given; this will demon-

strate that the iterative approximation �n of � converges in more than just the

pointwise sense of (3.1.2).

Lemma 3.2.2 says that fn is computed from f0 by means of a large power of

the matrix QK� . It is therefore reasonable to expect convergence to an element of the

eigenspace of QK� , and since fn ? 1 for each n, a dominant eigenvalue of QK� not

equal to unity should describe the asymptotics of kfnk� D �n. Retain the following

assumption:

Assumption 3.3.1 The initial distribution �0 can be written as a linear combination

of finitely many generalized (left) eigenvectors of K, among them �: for distinct

eigenvalues �1; : : : ; �m of K ordered so that j�1j > j�2j > � � � > j�mj, there exist

non-zero scalars c; cij , i D 1; : : : ; m, j D 1; : : : ;Di such that

�0 D c� C

mXiD1

DiXjD1

cijwij ;

with w>ij 2 ker.K> � �i/n.i;j / for some n.i; j / 2 N.

In a finite dimensional setting, Di could be taken to be the algebraic multiplicity of the

eigenvalue �i , and wi1; : : : ; wiDi the generalized left eigenvectors of K. In the infinite

dimensional setting, there is no guaranty that an eigenspace ker.K> � �I/ is finite

dimensional, but the assumption here is that finitely many generalized eigenvectors

are sufficient to describe (via linear combination) the initial distribution �0. Since

. QK � �iI /n.i;j /…�1w>ij D …

�1.K> � �iI /n.i;j /w>ij D 0;

Page 64: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.3. Conditions on �0 57

i.e. …�1w>ij 2 ker. QK � �iI /n.i;j / for every i; j , it follows from (3.2.13) and part e) of

Lemma 3.2.3 that

�0h D c�hC

mXiD1

DiXjD1

cijwijh D c C

mXiD1

DiXjD1

cij .…�1w>ij /

>…h D c;

and hence that

�0 D .�0h/� C

mXiD1

DiXjD1

cijwij : (3.3.1)

Thus

f0 D …�1�>0 � 1 D .�0h � 1/1C

mXiD1

DiXjD1

cij…�1w>ij :

Of course,

�0h D h>�>0 D h

>….…�1�>0 � 1/C h>…1 D hf0; hi� C 1;

so

f0 D hf0; hi�1C

mXiD1

DiXjD1

cij…�1w>ij : (3.3.2)

Now for n � n.i/ WD maxj�Di

n.i; j /

QKn

DiXjD1

cij…�1w>ij D

DiXjD1

cij . QK � �iI C �iI /n…�1w>ij

D

DiXjD1

cij

nXkD0

n

k

!�n�ki . QK � �iI /

k…�1w>ij

D �ni

DiXjD1

cij

n.i;j /�1XkD0

n

k

!��ki .

QK � �iI /k…�1w>ij

D

n

n.i/ � 1

!�ni gin

for a vector gin 2 ker. QK � �iI /n.i/ satisfying

gin D ��.n.i/�1/i . QK � �iI /

n.i/�1X

n.i;j /Dn.i/

cij…�1w>ij CO.n

�1/ DW gi CO.n�1/:

Page 65: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.3. Conditions on �0 58

Note that gi 2 ker. QK � �iI /. Thus, for n � maxi�m

n.i/, (3.2.12) and f0 ? 1 implies

. QK�/nf0 DM

mXiD1

n

n.i/ � 1

!�ni gin

DM

n

n.1/ � 1

!�n1

0@g1n C mXiD2

n

n.1/ � 1

!�1 n

n.i/ � 1

!��i

�1

�ngin

1AD

n

n.1/ � 1

!�n1.Mg1 CO.n

�1//:

Note that if n.1/ D 1 the O.n�1/ term can be replaced with O�nrˇ�2�1

ˇ�for some r .

Now, Cn�1 � � �C0 D1

P Œ�0>n�D

1�0Kn1

, so by Lemma 3.2.2

fn D. QK�/nf0

�0Kn1D

�n

n.1/�1

��n1.Mg1 CO.n

�1//

�0Kn1: (3.3.3)

Since (3.3.1) implies

�0Kn1 D

0@.�0h/� C mXiD1

DiXjD1

cijwij

1AKn1

D .�0h/˛nC

mXiD1

DiXjD1

cij ..K>/nw>ij /

>1

D .�0h/˛nC

mXiD1

DiXjD1

cij�ni

n.i;j /�1XkD0

n

k

!��ki ..K

>� �i/

kw>ij />1

D .�0h/˛nC

mXiD1

DiXjD1

cij�ni

n.i;j /�1XkD0

n

k

!��ki ..

QK � �i/k…�1w>ij /

>…1

D .�0h/˛nC

mXiD1

n

n.i/ � 1

!�ni g

>in…1

D .�0h/˛nC

n

n.1/ � 1

!�n1.g

>1 CO.n

�1//…1;

or�0K

n1

˛nD �0hC

n

n.1/ � 1

!��1

˛

�nhg1 CO.n

�1/; 1i� ; (3.3.4)

Page 66: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.4. Comparison with Stochastic Case 59

it follows from (3.3.3) that n

n.1/ � 1

!��1

˛

�n!�1fn D

Mg1 CO.n�1/

�0Kn1=˛nD

Mg1 CO.n�1/

�0hCO��

n

n.1/�1

�� �! Mg1

�0h:

Theorem 3.3.2

Suppose that K is R-positive, and that the initial distribution �0 can be represented as

a linear combination of finitely many generalized left eigenvectors of K, as in (3.3.1),

with n.1/ the largest order of the generalized eigenvectors in this linear combination

corresponding to the dominant eigenvalue j�1j < ˛. Then

a) there exists an eigenvector g1 2 ker. QK � �1I / such that

limn!1

n

n.1/ � 1

!��1

˛

�n!�1fn D

Mg1

�0hI (3.3.5)

b) if kMg1k� <1 then the asymptotic decay rate of �n to 0 is�

n

n.1/�1

� �j�1j

˛

�n.

Proof: Part a) has been argued above, and from (3.3.3) it follows that

�n D

n

n.1/ � 1

!�j�1j

˛

�nkMg1k� CO.n

�1/

�0hCO��

n

n.1/�1

�� ;which verifies b).

Theorem 3.3.2 says that, properly scaled, fn is approaching an eigenvector of QK�

with eigenvalue �1.

3.4 Comparison with Stochastic Case

In the fully stochastic case, an upper bound on the rate of convergence of the distri-

bution �n at time n to the steady state � is described in Section 2 of [11]. Theorem

Page 67: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.4. Comparison with Stochastic Case 60

3.3.2 expresses the subtlety of the approach needed in the sub-stochastic case. The

recursive relationship specified by Lemma 3.2.2 implies

�2nC1 D C2n hQK�fn; QK

�fni� D C2n

�QK� fn

jjfnjj�; QK� fn

jjfnjj�

��

�2n;

and so applying the iterated inequality approach of [11] yields

�2nC1 � C2nˇ1.K

� QK�/�2n � : : : � .Cn : : : C1/2ˇ1.K

� QK�/n�20 (3.4.1)

where

ˇ1.K� QK�/ WD supfhf;K� QK�f i� I jjf jj� D 1; f ¤ 1g:

The matrix K� QK� is the multiplicative reversiblization of K� , and the minimax char-

acterization of the eigenvalues of the real symmetric matrix …1=2K� QK�…�1=2 implies

that ˇ1.K� QK�/ is the second largest eigenvalue of K� QK� (see 2.19 of [11]). Thus

(3.4.1) gives an approximate upper bound of ˇ1.K� QK�/1=2=˛ on the decay rate for

�n.

This bound can be unhelpful in the sub-stochastic case because it may happen

that ˇ1.K� QK�/1=2=˛ > 1, as is shown in the following example:

Example 3.4.1 Let K be the 10 � 10 matrix given by

K D

26666666666664

0 � 0 0 0

� 0 � 0 � � � 0

0 � 0 � 0

: : :: : :

: : :

0 0 � � � � 0 �

0 0 � � � 0 � �

37777777777775(3.4.2)

where � D 0:3, � D 0:7, and let the initial distribution �0 be uniform on f1; : : : ; 10g.

Notice that 0 is an absorbing state. Using matlab to determine the eigenvalues of

K yields ˛ � 0:889 and �1 � 0:877. The asymptotic rate given by Theorem 3.3.2

Page 68: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.4. Comparison with Stochastic Case 61

Figure 3.1: Ratio �2nC1=�2n vs n.

is therefore approximately 0:987n, which was also verified by directly computing the

values of �2nC1=�2n for n D 1; :::; 200 (they converged to 0:973 D .�1=˛/

2, see figure

3.1). However, the second largest eigenvalue of K� QK� is ˇ1.K� QK�/ � 0:856 and

thus the asymptotic rate suggested by (3.4.1) is � 1:042; while a valid upper bound,

this inequality gives no information about whether �n ! 0, which as a matter of fact

does occur here.

The reason that the methodology of [11] is inadequate to prove convergence

is that the inequality (3.4.1) is formed “too early”, and the resulting supremum

ˇ1.K� QK�/ ranges over too expansive a set. The effect of repeated applications of QK�

to f0 cannot be ignored, and they are taken into account in Theorem 3.3.2 to achieve

the crucial stabilization (3.3.5).

As mentioned in the discussion preceding Corollary 3.1.6, it is important to

compare the rate at which substochastic chains are absorbed with the rate at which

Page 69: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.4. Comparison with Stochastic Case 62

they converge to the quasi-stationary distribution so that one has a fair idea of what

to expect at moderately large times n. Notice that when S is finite, Lemma 3.1.5

implies that limn!1 P�0Œ�0 > n�1=n D ˛ for any initial distribution �0. Together with

Theorem 3.3.2 this means that the convergence to quasi-stationarity is more rapid

than absorption if and only if

j�1j < ˛2: (3.4.3)

Example 3.4.2 Let

K D1

10

2666417=2 1=3 1=6

5=8 97=12 1=24

1=4 5=6 101=12

37775 :Then � D Œ1=2; 1=3; 1=6� and K has the following Jordan decomposition K D VJV �1:

K D

266642=3 0 1

1=2 1=20 1

1 �1=10 �5

37775266640:9 0 0

0 0:8 1

0 0 0:8

3777526664

3=4 1=2 1=4

�35=2 65=3 5=6

1=2 �1=3 �1=6

37775 Iso the generalized eigenspace corresponding to �1 D 0:8 is 2-dimensional. Thus the

asymptotic rate of convergence of �n to zero is n�0:80:9

�n� n0:889n. Here, the relation

(3.4.3) holds and so one expects the chain to be distributed approximately according

to � at moderately large values of n. In 500 000 simulations of a chain with kernel K

and uniform initial distribution �0, 2850 survived until time 50 (one expects roughly

2578 � 500 000˛50), and the empirical distribution of those that survived was ap-

proximately Œ0:4681; 0:3537; 0:1782� which deviated from � (with respect to jj � jj�) by

about 0:06392 (an asymptotic upper bound is 0:1385 � 50.0:889/50). This agrees with

the theoretical results.

Now

K�D

266649=10 3=45 3=90

1=8 17=20 1=40

1=20 1=10 17=20

37775 ; QK�D

266649=10 1=12 1=60

1=10 17=20 1=20

1=10 1=20 17=20

37775

Page 70: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.4. Comparison with Stochastic Case 63

and the second largest eigenvalue of K� QK� is ˇ1.K� QK�/ � 0:681. So the rate implied

by (3.4.1) is ˇ1.K� QK�/1=2=˛ � 0:917. This overestimates the asymptotic rate, but

is less egregious than Example 3.4.1 as the bound is sufficient to prove convergence.

However, since ˇ1.K� QK�/=˛ > ˛, using this bound would lead one to infer that

absorption occurs faster than convergence to quasi-stationarity, which is not so.

Page 71: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.5. Continuous Case 64

3.5 Continuous Case

The arguments of Section 3.2 essentially apply in the continuous case, though some

technical aspects change. Consider a sub-stochastic continuous time Markov chain

fXtgt2R, with infinitessimal generator Q D fqij gi;j2S and transition semigroup P D

.P.t//t�0; thus

Pij .t/ D P ŒXsCt D j j Xs D i �; i; j 2 S; s; t 2 R;

and the sub-stochasticity manifests in the condition

qi0 WD �Xj2S

qij � 0; 8i 2 S (3.5.1)

with strict inequality holding for at least one i 2 S (i.e. transition to the graveyard

state 0 is possible from at least one state in S). As before, assume that eventual

absorption is certain:

limt!1

.P.t/1/i D limt!1

Pi ŒXt 2 S� D 0; 8i 2 S: (3.5.2)

Recall that P.t/ D exp.Qt/ is the solution to Kolmogorov’s forward differential sys-

temd

dtP.t/ D P.t/Q: (3.5.3)

As before let

�t.j / D �t.j ji/ WD Pi ŒXt D j jXt 2 S� (3.5.4)

with the notation �t.j / being preferred over �t.j ji/ when mention of i is gratuitous.

Suppose that the Yaglom limit � exists in the sense of definition (3.1.2) with the

discrete index n replaced by the continuous index t :

�.j / D limt!1

�t.j j i/; i; j 2 S: (3.5.5)

Let �0 denote a row vector with 1 at entry i and 0’s in all other entries, where i is

any element of S . Then (3.5.4) says

�t DP>i � .t/Pk2S Pik.t/

D�0P.t/

�0P.t/1: (3.5.6)

Page 72: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.5. Continuous Case 65

Lemma 3.5.1 The Yaglom limit is a left eigenvector for P.s/ with eigenvalue

˛.s/ WD limt!1

Pi ŒXsCt 2 S�

Pi ŒXt 2 S�D lim

t!1

�0P.t C s/1

�0P.t/1(3.5.7)

for all s � 0. Moreover, ˛.s/ D ˛s with ˛ WD ˛.1/ 2 .0; 1/, and � is a left eigenvector

for Q with eigenvalue � D ln˛.

Proof: For s; t � 0, (3.5.6) implies

�tCs D�0P.t C s/

�0P.t C s/1

D�0P.t/P.s/

�0P.t/1

�0P.t/1

�0P.t C s/1

D�0P.t/1

�0P.t C s/1�tP.s/;

or�0P.t C s/1

�0P.t/1D.�tP.s//j

�tCs.j /; 8j 2 S:

For any sequence ftngn2N increasing to 1, Scheffe’s Theorem implies that �tn ) � ,

and hence by the Portmanteau lemma the limit t !1 of the above expression can

be applied to conclude

limt!1

�0P.t C s/1

�0P.t/1D.�P.s//j

�.j /; 8j 2 S:

This verifies that ˛.s/ is the specified eigenvalue. Let ˛ WD ˛.1/. Note ˛ D ˛�1 D

�P1 D P� ŒX1 2 S� > 0. Then for every n 2 N

˛� D �P.1/ D �P

�1

nC � � � C

1

n

�D �P

�1

n

�nD ˛ .1=n/n �;

so ˛.1=n/ D ˛1=n. Then

˛m=n� D ˛.1=n/m� D �P

�1

n

�mD �P

�mn

�D ˛.m=n/�

so that ˛.m=n/ D ˛m=n for all m; n 2 N. Since the semigroup P.s/ is continuous in s

it follows that ˛.s/ is continuous and hence that

˛.s/ D ˛s D e.ln˛/s DW e�s; s � 0:

Page 73: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.5. Continuous Case 66

Thus

˛t� D e�t� D �P.t/;

for all t � 0. Right multiplying by 1 and taking t ! 1 in this expression demon-

strates, in light of (3.5.2), that ˛ < 1, whereas applying (3.5.3) at t D 0 gives

�� D �P.0/Q D �Q:

As in the discrete case, with

ft WD …�1�>t � 1;

the metric

�t WD kftk� Dphft ; fti� (3.5.8)

gives the chi-square distance between the conditional distribution �t and its limit

� . A key ingredient to demonstrating �n ! 0 and obtaining the corresponding

decay rate for discrete chains was the recursive relationship (3.2.8) for fn. No exact

analogue exists in the continuous time case, but (3.5.3) gives rise to something similar.

Differentiating (3.5.6) yields

d

dt�t D

�0P.t/Q � �0P.t/1 � �0P.t/Q1 � �0P.t/

.�0P.t/1/2

D �tQ � �tQ1 � �t

D �tQ.I � 1�t/:

So if

Mt WD I � 1�t (3.5.9)

this means thatd

dt�t D �tQMt D �t.Q � �tQ1I /: (3.5.10)

Note

�tQ1 D�0P.t/Q1

�0P.t/1D

1

�0P.t/1

d

dt�0P.t/1 D

d

dtlnP Œ� > t�

Page 74: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.5. Continuous Case 67

where � is the time to hit the cemetery state. Thus

d

dt�t D �t

�Q �

d

dtlnP Œ� > t�I

�(3.5.11)

(the solution to this system of differential equations is

�t D �0eQt�lnPŒ�>t�I

D�0e

Qt

P Œ� > t�D

�0P.t/

�0P.t/1;

as required by (3.5.6)). Taking t ! 1 in the second equality of (3.5.10) gives

�Q� D 0> where Q� is the generator given by

Q�D Q �Q1� D QM (3.5.12)

with M given by (3.2.7). ThereforeX`2S

�.`/.q j C q`0�.j // D 0 for every j 2 S (3.5.13)

(see (2:1) of [10]).

Lemma 3.5.2 a) Both QQ1 D �1 and QQ�1 D 0 hold;

b) If v 2 ker.Q � ˇI/n for some ˇ ¤ � then v 2 ker.Q� � ˇI/n and v ? 1;

Proof: Assertion a) is verified by the computations

QQ1 D …�1Q>…1 D …�1Q>�> D …�1.�Q/> D �…�1�> D �1

and

QQ�1 D …�1.Q�/>…1 D …�1.�Q�/> D 0:

Assertion b) is proved exactly as in Lemma 3.2.3.

Since

Mft D ft � 1�.…�1�>t � 1/ D ft � 11>�>t C 1 D ft ;

Page 75: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

3.5. Continuous Case 68

it follows from (3.5.11) that

d

dtft D

d

dtMft

DM

�…�1

d

dt�>t � 1

�DM…�1

�Q> �

d

dtlnP Œ� > t�I

��>t

DM

�QQ �

d

dtlnP Œ� > t�I

�…�1�>t :

As M QQ1 D �M1 D 0; the above says that

d

dtft DM

�QQ �

d

dtlnP Œ� > t�I

�ft D

�QQ��d

dtlnP Œ� > t�I

�ft

and the solution to this system of differential equations is

ft DMeQQtf0

P Œ� > t�D

eQQ� tf0

P Œ� > t�(3.5.14)

(compare Lemma 3.2.2). Thus

�t D kftk� D1

P Œ� > t�keQQtf0k� :

The rate at which �t ! 0 can then be determined in a manner similar to the argu-

ments leading to Theorem 3.3.2.

Page 76: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

Chapter 4

Rare Events for Substochastic

Chains

4.1 The Sustained Kernel

The Folk Theorem says that rare paths of a stationary process are described by the

reversal of paths which begin at the rare destination. In practical terms, this suggests

simulating the reversed process from the rare event and waiting until it reaches the

set A of Theorem 2.4.2 (typically the origin). Trying to apply this to substochastic

Markov chains leads to difficulties since, firstly, a substochastic chain can’t very well

be stationary, and secondly, there is some chance that the simulated chain (or its

reversal) will be absorbed. Clearly, the Folk Theorem does not hold.

Nevertheless, the conviction that it is easier to view the chain backward in time

starting from the rare event than to wait for the rare event to occur persists; indeed,

in some respects this approach is all the more advisable in the substochastic setting

since rarity is effectively compounded by the additional requirement to stay alive. To

reconcile these observations one must employ the reversal of a modified kernel, and

the fact that 1˛QK is a fully stochastic kernel (or that 1

˛QQ a fully stochastic generator)

69

Page 77: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

4.1. The Sustained Kernel 70

is suggestive.

Let K be the embedded kernel of the substochastic Markov process X D fXtgt2R

on S , and let Y D fYngn2Z be the corresponding embedded discrete time chain.

Assume that the Yaglom limit � with corresponding eigenvalue ˛ exists. The kernel

which describes transition probabilities conditional upon absorption not occurring

until some time n far into the future is

Pi ŒY1 D j j Yn 2 S� DPi ŒY1 D j; Yn 2 S�

Pi ŒYn 2 S�D Kij

Pj ŒYn�1 2 S�

Pi ŒYn 2 S�: (4.1.1)

Clearly this is a stochastic kernel on S . Under assumption (3.2.13) taking the limit

as n!1 of (4.1.1) gives, by Lemma 3.2.3,

limn!1

Pi ŒY1 D j j Yn 2 S� Dh.j /

˛h.i/Kij D K

hij (4.1.2)

where h is the right eigenvector for K corresponding to ˛ and satisfying hh; 1i� D 1

and

Khij WD

Kijh.j /

˛h.i/(4.1.3)

is the sustained kernel of the chain. If it is known that a rare event will occur at some

point far into the future, so that in particular the chain survives for a long time, the

transition probabilities are described by Kh. It is therefore a chain with this kernel

to which the Folk Theorem should be applied. Note that if one forms the measure on

S given by S � A 7!Pj2AK

hij then the measure Qi of [7] with f D h is recovered.

It is easy to check that the stationary distribution for Kh is given by (3.2.14):

�.i/ D �.i/h.i/; i 2 S:

Therefore, the reversal of Kh (with respect to �) is given by

QKhij D

�.j /

�.i/Khji D

�.j /h.j /

�.i/h.i/

h.i/

˛h.j /Kj i D

�.j /

˛�.i/Kj i :

That is, QKh D1˛QK (it should be clear that QKh is the reversal of Kh with respect to

� whereas QK is the reversal of K with respect to � described in Section 3.2). This

Page 78: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

4.1. The Sustained Kernel 71

justifies the guess one might have been led to by part a) of Lemma 3.2.3 about how

to solve the problem described in the beginning of this section.

Consider the multiplicative reversiblizationKh�1˛QK�

ofKh with respect to its sta-

tionary measure �, and let ˆ WD diag.�/. If C WD ˆ1=2�1˛QK�ˆ�1=2 then Kh

�1˛QK�D

ˆ�1=2C>Cˆ1=2 is similar to the positive semidefinite matrix C>C , and so all eigen-

values of Kh�1˛QK�

are real and nonnegative. Also, forp� WD 1>ˆ1=2, bothp

�C D �

�1

˛

�QKˆ�1=2 D

p�

and

Cp�>

D ˆ1=2�1

˛QK

�1 D

p�>

hold; i.e.p� is both a left and right eigenvector for C corresponding to the maximal

eigenvalue 1. Let .�1; g1/ be the eigenpair for QK of Theorem 3.3.2, and assume

g1 can be normalized so that kg1k� D 1. Then v D ˆ1=2g1 is an eigenvector for

C with eigenvalue �1=˛ and satisfies kvk2 D 1. One findsp�v D 0 by noting

p�v D

p�Cv D .�1=˛/

p�v: Let V0 D spanf

p�>; vg. Suppose that a non-zero

spectral gap exists for Kh�1˛QK�, and hence that it has a second largest eigenvalue

ˇ (this is then the second largest singular value of C ). Then by the Courant-Fisher

minimax principal for compact operators,

ˇ D maxdim.V /D2

minx2V;kxk2D1

kCxk2 � minx2V0;kxk2D1

kCxk2 D kCvk2 D j�1j=˛:

Thus

Lemma 4.1.1 If Kh�1˛QK�

has a non-zero spectral gap and if the eigenvector g1 of

Theorem 3.3.2 satisfies kg1k� < 1 then the asymptotic decay rate of �n to 0 is�n

n.1/�1

�ˇn where ˇ is the second largest eigenvalue of Kh

�1˛QK�.

Page 79: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

4.2. An Accounting Network With Absorption 72

4.2 An Accounting Network With Absorption

In this section it is shown with an explicit example that applying the Folk Theo-

rem to substochastic processes on infinite state spaces can present difficulties. This

essentially occurs because the kernel 1˛QK has a tendency to avoid the set of points

from which absorption is possible, and this tendency can manifest as a drift away

to infinity, precluding arrival at the point z0 which the large deviation path began

from. This transient behavior is stamped out in finite state spaces. The example of

this section also provides a demonstration of what can go wrong in trying to apply

Theorem 3.3.2.

Consider the following modification to the network of Section 2.4.3: A small

insurance firm with an accounting department consisting of a senior accountant and

her assistant the junior accountant. Operations in the department proceed as follows:

all new claims arrive on the desk of the junior accountant, and do so according to

a Poisson process with rate �. The junior accountant processes the claims at rate

�2, which subsequently proceed to the senior accountant who carries out the more

complicated calculations. The senior accountant processes claims at rate �1 at which

point they leave the system.

Identify the senior accountant’s desk as node 1 and the junior accountant’s desk

as node 2 in this 2-node network. Let a point .x; y/ in the state space S WD N2 n

f.x; 0/I x 2 Ng denote the the number x of claims on the senior accountant’s desk

and the number y of claims on the junior accountant’s desk (see Figure 4.1).

Here the graveyard state consists of all points of the form .x; 0/, which means

that the department ceases operations if the junior accountant is ever idle (the junior

accountant is seen as expendable if such a situation arises and is fired, thereby abol-

ishing the network). Let Xt be the state of the network at time t and let X D .Xt/t�0

be the corresponding sub-stochastic Markov chain. Let K be the transition kernel

for the embedded discrete time chain Y D .Yn/n2N. The transition diagram for Y is

Page 80: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

4.2. An Accounting Network With Absorption 73

� - y ����

2 -�2

x ����

1 -�1

Figure 4.1: Substochastic Accountant Network

given in Figure 4.2.

For the stochastic chain corresponding to an open Jackson network (where the

set f.x; 0/I x 2 Ng is included in the state space) the stationary distribution is given

by the well known product formula

.x; y/ 7! .1 � �1/�x1 .1 � �2/�

y2 (4.2.1)

where �1 WD�r21�1

and �2 WD��2

. The sub-stochastic case is not so simple, and the

Yaglom limit � , which will be shown to exist, has no product form.

Let ˛ be the eigenvalue corresponding to � , extend � to S [ f.x; 0/I x 2 Ng

with the definition �.x; 0/ WD 0, and let �x.y/ WDPx�0 �.x; y/ and �y.x/ WDP

y�1 �.x; y/ denote the marginal densities for � . The equation ˛� D �K gives

the following interior and boundary conditions:

˛�.x; y/ D ��.x; y � 1/C �1�.x C 1; y/C �2�.x � 1; y C 1/ (4.2.2)

˛�.0; y/ D �1�.0; y/C ��.0; y � 1/C �1�.1; y/; (4.2.3)

for x � 1; y � 1 (see Figure 4.2).

Summing (4.2.2) over all x � 1 and employing (4.2.3) gives

˛.�y.y/ � �.0; y// D �.�y.y � 1/ � �.0; y � 1//C �1.�y.y/ � �.0; y/ � �.1; y// C

�2�y.y C 1/

or

˛�y.y/ D ��y.y � 1/C �1�y.y/C �2�y.y C 1/: (4.2.4)

Page 81: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

4.2. An Accounting Network With Absorption 74

-

6

y

x„ ƒ‚ …Graveyard

�6

�1�

�2@@@R

�6

�2

@@@R

�1����

Figure 4.2: Transition probabilities for the accountants’ chain

�1

s

�� b ))�2

s

��

aii

b ))�3

s

��

aii

b ))� � �

aii

Figure 4.3: Transition Diagram for Ls

If a WD �2, b WD �, and s WD �1, then this means that the marginal �y corresponds to

the one-dimensional sub-stochastic random walk with loops depicted in Figure 4.3.

The transition matrix Ls corresponding to this chain is given by

Ls D

26666664s b 0 0

a s b 0 � � �

0 a s b

: : :: : :

: : :

37777775 D sI C266666640 b 0 0

a 0 b 0 � � �

0 a 0 b

: : :: : :

: : :

37777775„ ƒ‚ …

L

: (4.2.5)

So if .�v; v/ is an eigenvalue-eigenvector pair for L then .s C �v; v/ is an eigenvalue-

eigenvector pair for Ls. Example 3.1.7 provides the maximal eigenvalue 2pab for L

Page 82: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

4.2. An Accounting Network With Absorption 75

with its corresponding eigenvector, and therefore

˛ D s C 2pab and �y.y/ D .1 �

pb=a/2y

rb

a

y�1

: (4.2.6)

Ifpab < s then one can then check that � is given by

�.x; y/ D c�x1 �y2

�y C .1 � �

y1 /

�x �

�1

1 � �1

��(4.2.7)

where �1 WDpabs

, �2 WDqba, and c is some normalization constant.

Unfortunately, K is not reversible and in fact \tilde K has state dependent transitions:

\tilde K_{(x,y),j} = (\Pi^{-1} K^\top \Pi)_{(x,y),j}

= \begin{cases}
b\rho_2^{-1}\,\dfrac{y-1 + (1-\rho_1^{y-1})(x - \rho_1/(1-\rho_1))}{y + (1-\rho_1^{y})(x - \rho_1/(1-\rho_1))} & j = (x, y-1) \\[2mm]
s\rho_1\,\dfrac{y + (1-\rho_1^{y})(x+1 - \rho_1/(1-\rho_1))}{y + (1-\rho_1^{y})(x - \rho_1/(1-\rho_1))} & j = (x+1, y) \\[2mm]
a\,\dfrac{\rho_2}{\rho_1}\,\dfrac{y+1 + (1-\rho_1^{y+1})(x-1 - \rho_1/(1-\rho_1))}{y + (1-\rho_1^{y})(x - \rho_1/(1-\rho_1))} & j = (x-1, y+1)
\end{cases}

= \begin{cases}
\sqrt{ab}\,\dfrac{y-1 + (1-\rho_1^{y-1})(x - \rho_1/(1-\rho_1))}{y + (1-\rho_1^{y})(x - \rho_1/(1-\rho_1))} & j = (x, y-1) \\[2mm]
\sqrt{ab}\,\dfrac{y + (1-\rho_1^{y})(x+1 - \rho_1/(1-\rho_1))}{y + (1-\rho_1^{y})(x - \rho_1/(1-\rho_1))} & j = (x+1, y) \\[2mm]
s\,\dfrac{y+1 + (1-\rho_1^{y+1})(x-1 - \rho_1/(1-\rho_1))}{y + (1-\rho_1^{y})(x - \rho_1/(1-\rho_1))} & j = (x-1, y+1)
\end{cases}

for x > 0, y > 0, and

\tilde K_{(0,y),j} = \begin{cases}
\sqrt{ab}\,\dfrac{y-1 - (1-\rho_1^{y-1})\rho_1/(1-\rho_1)}{y - (1-\rho_1^{y})\rho_1/(1-\rho_1)} & j = (0, y-1) \\[2mm]
\sqrt{ab}\,\dfrac{y + (1-\rho_1^{y})(1 - \rho_1/(1-\rho_1))}{y - (1-\rho_1^{y})\rho_1/(1-\rho_1)} & j = (1, y) \\[2mm]
s & j = (0, y)
\end{cases}

for y > 0. However, for (x,y) far from the origin the transition rates of \frac{1}{\alpha}\tilde K begin to approach those of the kernel \tilde K_\infty on S \setminus \{(x,0) : x \ge 0\} given by

\tilde K_{\infty,(x,y),j} := \frac{1}{\alpha}\begin{cases}
\sqrt{ab}\,\mathbf{1}\{y > 1\} & j = (x, y-1) \\
\sqrt{ab}\,\mathbf{1}\{y = 1\} & j = (x, 1) \\
\sqrt{ab} & j = (x+1, y) \\
s\,\mathbf{1}\{x > 0\} & j = (x-1, y+1) \\
s\,\mathbf{1}\{x = 0\} & j = (0, y)
\end{cases}    (4.2.8)

(see Figure 4.4).

[Figure 4.4: Transition diagram for \tilde K_\infty -- moves down in y and right in x each with probability \sqrt{ab}/\alpha, and diagonally up-left with probability s/\alpha.]

One rare event of interest for this network is the situation where the desk of the senior accountant is overflowing with claims. This catastrophe arises as an excursion from (0,1) to a distant point (\ell, y) (here the point (\ell, 1) is considered). Since \ell is large, such paths require many intermediate steps in which absorption is impossible, and so in light of (4.1.2) K_h most appropriately describes the transition probabilities of such paths as \ell \to \infty. Using the methodology described in Section 4.1, one would attempt to observe a chain started at (\ell, 1) with kernel \frac{1}{\alpha}\tilde K until it reaches (0,1). This approach breaks down, however, yielding an interesting conclusion.


First notice that \tilde K_\infty is a 2-node Jackson network whose flows into and out of node 2 (the y coordinate) are equal. As in Section 2.4.3, artificially enlarge the state space to S_y = \mathbb{N} \times \mathbb{Z} and let \tilde K_{\infty,y} be the appropriately modified kernel. The interior fluid limit \tilde p_\infty associated with \tilde K_{\infty,y} satisfies (2.4.18):

\frac{d}{dt}\tilde p_\infty(t) = \left( \frac{\sqrt{ab} - s}{\alpha},\ \frac{s - \sqrt{ab}}{\alpha} \right),

the solution of which is \tilde p_\infty(t) = \tilde p_\infty(0) + \frac{s - \sqrt{ab}}{\alpha}\, t\,(-1, 1). Thus with the initial condition \tilde p_\infty(0) = (1, 1/\ell), \tilde p_\infty is a line connecting the points

\tilde p_\infty(0) = (1, 1/\ell) \quad \text{and} \quad \tilde p_\infty\!\left(\frac{\alpha}{s - \sqrt{ab}}\right) = (0, 1 + 1/\ell).

Let T = \frac{\alpha}{s - \sqrt{ab}}. Thus if the process \tilde X_\infty is associated with \tilde K_\infty, for any \epsilon > 0 and large \ell one would expect \tilde X to lie within the "tube"

\tilde F_\ell := \left\{ \sup_{0 \le t \le T} \left\| \frac{\tilde X_\infty(\ell t)}{\ell} - \tilde p_\infty(t) \right\|_2 \le \epsilon \right\}.

This means that paths beginning from (\ell,1) will make their way toward the y axis, hitting it in a region around the point (0, \ell + 1).
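This behaviour can be seen directly by simulating \tilde K_\infty. The sketch below is not from the thesis: it uses the parameter values quoted later for Figure 4.5 and an arbitrary starting level \ell, runs the chain of (4.2.8) from (\ell, 1) until it hits the y axis, and compares the empirical hitting height with the fluid prediction \ell + 1.

    import numpy as np

    # Simulate the limiting backwards kernel (4.2.8) started at (l, 1).
    rng = np.random.default_rng(1)
    b, a, s = 0.1, 0.25, 0.65
    alpha = s + 2 * np.sqrt(a * b)
    p = np.sqrt(a * b) / alpha          # probability of each sqrt(ab)/alpha move
    l = 200

    hits = []
    for _ in range(100):
        x, y = l, 1
        while x > 0:
            u = rng.random()
            if u < p:                    # j = (x, y-1), reflected at y = 1
                y = max(y - 1, 1)
            elif u < 2 * p:              # j = (x+1, y)
                x += 1
            else:                        # j = (x-1, y+1), probability s/alpha
                x, y = x - 1, y + 1
        hits.append(y)
    print(np.mean(hits), l + 1)          # empirical hitting height vs. l + 1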

However, once the backwards process reaches the y axis it is difficult to characterize its behavior. Since the steady state for \tilde K_{\infty,y} in the x direction is (1 - \rho_1)\rho_1^x, x \ge 0, the boundary fluid limit \tilde p^1_\infty associated with \tilde K_{\infty,y} satisfies

\frac{d}{dt}\tilde p^1_\infty(t) = (1 - \rho_1)\frac{\sqrt{ab}}{\alpha}(1, -1) + \rho_1\frac{s - \sqrt{ab}}{\alpha}(-1, 1) = (0, 0)

(see (2.4.21)). Thus, the process \tilde X_\infty has no net drift once it reaches the axis. The situation is worse for \tilde X since it can be shown that there is a slight drift in the positive direction, the magnitude of which tends to 0 as y \to \infty, but is non-negligible at small

values of y. Therefore there should be no expectation that \tilde X will ever reach the point (0,1), and indeed this is verified by simulation.

The process \tilde X_\infty does eventually arrive at the point (0,1) however, and \tilde X can be made to do so by placing reflecting boundaries at x = B and y = B for some value B. Figure 4.5 shows such a path beginning at (\ell,1) and transitioning to (0,1) for \ell = 34, \lambda = b = 0.1, \mu_2 = a = 0.25, and s = 0.65. It can be seen that the process adhered rather closely to the interior fluid limit, and then jittered wildly near the y axis for a time before finally reaching (0,1).

[Figure 4.5: Backwards path for large deviation.]

As mentioned at the beginning of this section, it is the transient behavior of \frac{1}{\alpha}\tilde K (specifically in the y direction) which prevents some version of the Folk Theorem from holding. Now it will be shown that the hypotheses of Theorem 3.3.2 fail to hold for the marginal process in the y direction. Let Y^s = \{Y^s_n\}_{n \in \mathbb{N}} be a substochastic Markov chain with kernel L_s, and let Y = \{Y_n\}_{n \in \mathbb{N}} be a Markov chain with kernel \frac{1}{a+b}L, with L_s, L given by (4.2.5). Figure 4.3 depicts the transitions for Y^s.

Let N^s_1 and N_1 be the number of steps, respectively, Y^s and Y make before arriving at state 1. Let U^s_i be iid random variables having geometric density with


probability of success 1 - s. The chain Y^s will make a geometrically distributed number of loops (with probability 1 - s of success) at each state before transitioning to a new state. Among the remaining transitions the proportion which are of the form i \to i+1 is b' := b/(1-s) and the proportion which are of the form i \to i-1 is a' := a/(1-s), and so one can write N^s_1 \overset{d}{=} \sum_{i=1}^{N_1}(U^s_i + 1). Letting F_{ij}(z) and F^s_{ij}(z) be the series (5.1.1) corresponding to \frac{1}{a+b}L and L_s respectively,

F^s_{21}(z) = E_2[z^{N^s_1}]
 = E_2\!\left[z^{\sum_{i=1}^{N_1}(U^s_i+1)}\right]
 = \sum_{k=1}^{\infty} E_2\!\left[z^{\sum_{i=1}^{k}(U^s_i+1)}\right] P_2[N_1 = k]
 = \sum_{k=1}^{\infty} E_2[z^{U^s_1}]^k z^k P_2[N_1 = k].

Since

E_j[z^{U^s_1}] = \sum_{m=0}^{\infty} z^m s^m (1-s) = \frac{1-s}{1-zs}

for any j \in S if |zs| < 1, it follows that

F^s_{21}(z) = \sum_{k=1}^{\infty} \left(\frac{1-s}{1-zs}\right)^k z^k P_2[N_1 = k] = F_{21}\!\left(\frac{(1-s)z}{1-zs}\right).    (4.2.9)

Using the formula on pp. 428 of [24] then gives

F^s_{21}(z) = \frac{1 - \sqrt{1 - 4a'b'(1-s)^2 z^2/(1-zs)^2}}{2b'(1-s)z/(1-zs)} = \frac{1 - \sqrt{1 - 4ab\,z^2/(1-zs)^2}}{2bz/(1-zs)}.

Since \sum_{n=1}^{\infty} P_1[N^s_1 = n]z^n = zs + \sum_{n=2}^{\infty} P_1[N^s_1 = n]z^n = zs + zb\sum_{n=2}^{\infty} P_2[N^s_1 = n-1]z^{n-1}, i.e. F^s_{11}(z) = zs + zbF^s_{21}(z), one can write

F^s_{11}(z) = zs + \frac{1 - \sqrt{1 - 4ab\,z^2/(1-zs)^2}}{2/(1-zs)}.
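The closed form for F^s_{11} can be checked against a brute-force computation of the first-return probabilities of L_s. The following sketch is not from the thesis; it truncates L_s at a hypothetical level N, accumulates P_1[N^s_1 = n] z^n by forbidding intermediate returns to state 1, and compares the partial sum with the formula at a sample point z inside the radius of convergence.

    import numpy as np

    b, a, s = 0.1, 0.25, 0.65
    z = 0.9
    N, steps = 200, 400                        # truncation level and series length

    Ls = np.zeros((N, N))
    np.fill_diagonal(Ls, s)
    idx = np.arange(N - 1)
    Ls[idx, idx + 1], Ls[idx + 1, idx] = b, a  # up with prob b, down with prob a

    T = Ls.copy()
    T[:, 0] = 0.0                              # taboo: no intermediate visits to state 1
    series = s * z                             # the one-step return (the loop)
    vec = Ls[0, :].copy()
    vec[0] = 0.0
    for n in range(2, steps + 1):
        series += (vec @ Ls[:, 0]) * z**n      # first return to state 1 at step n
        vec = vec @ T
    closed = z*s + (1 - np.sqrt(1 - 4*a*b*z**2 / (1 - z*s)**2)) * (1 - z*s) / 2
    print(series, closed)                      # the two values agree closely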


Note that

1 - 4ab\,z^2/(1-zs)^2 \ge 0 \iff (1-sz)^2 \ge 4ab\,z^2 \iff (s^2 - 4ab)z^2 - 2sz + 1 \ge 0,

which is equivalent to the condition z \le \frac{1}{s + 2\sqrt{ab}} or z \ge \frac{1}{s - 2\sqrt{ab}}. Since F^s_{11}(z) is an increasing function, it reaches its singularity precisely at

R_s := \frac{1}{s + 2\sqrt{ab}},

which must therefore be the radius of convergence corresponding to L_s. Note that R_s is, as one might expect, the reciprocal of the eigenvalue given by (4.2.6), and that

F^s_{11}(R_s) = \frac{s}{s + 2\sqrt{ab}} + \frac{1}{2\big/\!\left(1 - \frac{s}{s + 2\sqrt{ab}}\right)} = \frac{2s}{2(s + 2\sqrt{ab})} + \frac{s + 2\sqrt{ab} - s}{2(s + 2\sqrt{ab})} = \frac{s + \sqrt{ab}}{s + 2\sqrt{ab}}.

If s = 0 this says that F^s_{11}(R_s) = \frac{1}{2}, which is in agreement with [24]. Setting F^s_{11}(R_s) = 1 yields the requirement ab = 0, and so there are no values of s for which the matrix L_s is R_s-recurrent. Therefore, Theorem 3.3.2 cannot be applied to find the rate of convergence to the Yaglom limit for the substochastic kernel L_s.
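The failure of R_s-recurrence can also be observed numerically. The following sketch is not from the thesis: it computes G_{11}(R_s) for a truncated copy of L_s (the truncation level is a hypothetical choice) and recovers F^s_{11}(R_s) = 1 - 1/G_{11}(R_s), which matches the closed form (s + \sqrt{ab})/(s + 2\sqrt{ab}) and stays strictly below 1.

    import numpy as np

    b, a, s = 0.1, 0.25, 0.65
    Rs = 1.0 / (s + 2 * np.sqrt(a * b))

    N = 1000                                  # truncation level (chain states 1,...,N)
    Ls = np.zeros((N, N))
    np.fill_diagonal(Ls, s)
    idx = np.arange(N - 1)
    Ls[idx, idx + 1], Ls[idx + 1, idx] = b, a
    # (from state 1 the downward transition leads to the graveyard and is simply absent)

    G = np.linalg.inv(np.eye(N) - Rs * Ls)    # G_ij(Rs) = sum_n (Ls^n)_ij Rs^n
    F11 = 1 - 1 / G[0, 0]                     # since G_ii(z) = 1/(1 - F_ii(z))
    print(F11, (s + np.sqrt(a * b)) / (s + 2 * np.sqrt(a * b)))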


Chapter 5

Finding Quasi-Stationary Measures

In general, finding a quasi-stationary distribution for a substochastic Markov chain

can be more difficult than finding the steady state of a fully stochastic chain. The

accounting network of Section 4.2 serves as a poignant example of this: the steady

state of the corresponding Jackson network is given by the classic formula (4.2.1)

whereas the quasi-stationary distribution for the chain is complicated, not easy to

arrive at (many thanks to Nicolas Gast for help with this), and only has the nice

closed form (4.2.7) by a stroke of luck!

For finite state spaces with small enough cardinality the problem of finding a

quasi-stationary distribution is a trivial eigenvector problem. The following sections

give techniques for representing or approximating the quasi-stationary distribution

when the state space is of large cardinality or infinite. Theorem 5.1.6 can be used to

construct the desired measure from a 1-invariant measure of a matrix indexed by a

subset A of the state space (in general A may be small, thereby reducing the problem

greatly). The kernel of the watched process of Section 2.2 is used as a tool in this

construction. Much of what appears in Section 5.1 is motivated by [25], and some of

the same ground is covered but with different proofs.

Next, the Approximation Lemma (Lemma 5.2.1) says that the quasi-stationary


measure can be constructed as a function of an approximate R^{-1}-invariant measure (with R given in (3.2.17)). The Lemma is an extension of Lemma 5.1 of [8] to the case of substochastic chains. A situation where this result may be useful is one in which the set A from which absorption is possible can be thought of as a kind of boundary, so that restricted to A^c the chain is nice enough that an R^{-1}-invariant measure can be computed. This is the case in Example 5.2.2 where the Approximation Lemma is applied.

5.1 A Representation Theorem

Let \{Y_n\}_{n \in \mathbb{N}} be a discrete time Markov chain with substochastic kernel K and let

f^{(0)}_{ij} := 0, \qquad f^{(n)}_{ij} := P_i[\tau_j(1) = n], \quad n \ge 1,

be the probability that the first entry to j occurs on the nth step when starting in state i (here \tau_j is given by (2.2.4) with B = \{j\}). Let

F_{ij}(z) := \sum_{n=1}^{\infty} f^{(n)}_{ij} z^n,    (5.1.1)

and let R be the radius of convergence of the potential series (3.2.16). Then for any i, j \in S

G_{ij}(z) - \mathbf{1}\{i = j\} = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} \mathbf{1}\{m \le n\}\, f^{(m)}_{ij} z^m K^{n-m}_{jj} z^{n-m} = \sum_{m=1}^{\infty} f^{(m)}_{ij} z^m \sum_{k=0}^{\infty} K^k_{jj} z^k = F_{ij}(z)G_{jj}(z);

this is true for any z \in \mathbb{C}. For |z| < R these series are convergent (clearly the radius of convergence of F(z) is no less than R) and in particular

G_{ii}(z) = \frac{1}{1 - F_{ii}(z)}.

If K is R-recurrent then taking z \uparrow R yields \infty on the left hand side, and hence F_{ii}(R) = 1 (see [25] for a more detailed discussion).

For A \subset S let W_n = Y_{\tau_A(n)} be the process watched on A, which arises by restricting attention to Y at the entrance times (2.2.4) into A. Lemma 2.2.1 says that W is a Markov chain. Let {}_A K^{(n)} be given by (2.2.5) with B = A, let {}_A f^{(0)}_{ij} = 0, {}_A f^{(1)}_{ij} = K_{ij}, and for n \ge 2 define

{}_A f^{(n)}_{ij} := \sum_{k \notin A \cup \{j\}} {}_A f^{(n-1)}_{ik} K_{kj} = P_i[\tau_A(1) \ge n,\ \tau_j(1) = n];

this is the probability that the chain Y, starting from i, moves to j for the first time via a path that avoids A in the n-1 intermediate steps. Note that if j \in A then {}_A f^{(n)}_{ij} = f^{(n)}_{ij} = {}_A K^{(n)}_{ij}. Let

{}_A F_{ij}(z) := \sum_{n=1}^{\infty} {}_A f^{(n)}_{ij} z^n, \qquad {}_A G_{ij}(z) := \sum_{n=1}^{\infty} {}_A K^{(n)}_{ij} z^n.    (5.1.2)

Of course, the radii of convergence of these series are no less than R.

Lemma 5.1.1 For i \in A and j \in S,

{}_A G_{ij}(z) = zK_{ij} + z\sum_{k \notin A} {}_A G_{ik}(z) K_{kj}.

Proof: Straightforward calculation yields

{}_A G_{ij}(z) = \sum_{n=1}^{\infty} \sum_{k \notin A} {}_A K^{(n-1)}_{ik} K_{kj} z^n
 = zK_{ij} + z\sum_{n=2}^{\infty} \sum_{k \notin A} {}_A K^{(n-1)}_{ik} K_{kj} z^{n-1}
 = zK_{ij} + z\sum_{k \notin A} \sum_{n=1}^{\infty} {}_A K^{(n)}_{ik} z^n K_{kj}
 = zK_{ij} + z\sum_{k \notin A} {}_A G_{ik}(z) K_{kj}.


For i, j \in A define

J_{ij} := {}_A F_{ij}(R) = {}_A G_{ij}(R),    (5.1.3)

let \phi^0_{ij} = \mathbf{1}\{i = j\}, \phi^1_{ij} := J_{ij} and

\phi^n_{ij} := \sum_{k \ne j} \phi^{n-1}_{ik} J_{kj}, \quad n \ge 2.

This is analogous to the definition of {}_A f^{(n)}_{ij}, and so \phi^n_{ij} can be informally thought of as the "probability", starting in i, that the first entry to j occurs on the nth step with kernel J. Strictly speaking however J can be, for instance, superstochastic. Let

\Phi_{ij}(z) := \sum_{n=1}^{\infty} \phi^n_{ij} z^n, \qquad \Xi_{ij}(z) := \sum_{n=0}^{\infty} J^n_{ij} z^n.

Just as before, each of the series \Xi_{ij}(z) shares a common radius of convergence, and for i, j \in A,

\Xi_{ij}(z) - \mathbf{1}\{i = j\} = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} \mathbf{1}\{m \le n\}\,\phi^m_{ij} z^m J^{n-m}_{jj} z^{n-m} = \sum_{m=1}^{\infty} \phi^m_{ij} z^m \sum_{k=0}^{\infty} J^k_{jj} z^k = \Phi_{ij}(z)\Xi_{jj}(z).

In particular,

\Xi_{ii}(z) = \frac{1}{1 - \Phi_{ii}(z)}    (5.1.4)

whenever |z| is less than the radius of convergence of \Xi.

Theorem 5.1.2 Suppose that a measure \nu_A on A is invariant for J, i.e.

\nu_A(j) = \sum_{i \in A} \nu_A(i) J_{ij} \quad \text{for } j \in A.

Define the measure \nu on S via

\nu(k) := \sum_{i \in A} \nu_A(i)\, {}_A G_{ik}(R) \quad \text{for } k \in S.

Then \nu(\ell) = \nu_A(\ell) for all \ell \in A and \nu is R^{-1}-invariant for K.

Proof: It follows from the definition of \nu (together with (5.1.3) and the invariance of \nu_A) that \nu(\ell) = \nu_A(\ell) if \ell \in A. Then for any \ell,

\sum_{k \in S} \nu(k) K_{k\ell} = \sum_{k \in A} \nu(k) K_{k\ell} + \sum_{k \in A^c} \nu(k) K_{k\ell}
 = \sum_{i \in A} \nu_A(i) K_{i\ell} + \sum_{i \in A} \nu_A(i) \sum_{k \in A^c} {}_A G_{ik}(R) K_{k\ell}
 = R^{-1} \sum_{i \in A} \nu_A(i) \left( R K_{i\ell} + R \sum_{k \in A^c} {}_A G_{ik}(R) K_{k\ell} \right)
 = R^{-1} \sum_{i \in A} \nu_A(i)\, {}_A G_{i\ell}(R) \qquad \text{by Lemma 5.1.1}
 = R^{-1} \nu(\ell).

Corollary 5.1.3 If K is R-recurrent then it has an R^{-1}-invariant measure.

Proof: Let A = \{i\} for any i \in S. Then {}_A F_{ii}(z) = F_{ii}(z) and so \nu_A(j) := \mathbf{1}\{i = j\} satisfies

\nu_A(i) J_{ii} = \nu_A(i)\, {}_A F_{ii}(R) = \nu_A(i) F_{ii}(R) = \nu_A(i),

i.e. \nu_A is invariant for J. Then, by Theorem 5.1.2, \nu_A can be extended to a measure \nu which is R^{-1}-invariant for K.

Note that Corollary 5.1.3 is implied by Theorem 4.1 of [25], but has been verified here using different techniques.


Lemma 5.1.4 The matrix J is irreducible, and if K is R-recurrent then J is 1-recurrent.

Proof: Take arbitrary points i, j \in A. For a sequence i = k_0, k_1, \ldots, k_m = j specifying a path from i to j, i.e. satisfying K_{k_0 k_1} \cdots K_{k_{m-1} k_m} > 0, let 0 = n_0 < n_1 < \cdots < n_\ell = m be the subsequence of 0, 1, \ldots, m enumerating the points of A (i.e. k_d \in A if and only if d = n_p for some p). If H is the collection of all paths from i to j of length m having entrances k_{n_1}, \ldots, k_{n_\ell} = j into A at times n_1, \ldots, n_\ell = m, then

{}_A K^{(n_1)}_{k_0 k_{n_1}} \cdots {}_A K^{(m - n_{\ell-1})}_{k_{n_{\ell-1}} k_m} = \sum_{(j_0,\ldots,j_m) \in H} K_{j_0 j_1} \cdots K_{j_{m-1} j_m}    (5.1.5)
 \ge K_{k_0 k_1} \cdots K_{k_{m-1} k_m} > 0,    (5.1.6)

which means {}_A K^{(n_{r+1} - n_r)}_{k_{n_r} k_{n_{r+1}}} > 0 and thus J_{k_{n_r} k_{n_{r+1}}} > 0 for every r \ge 0. Then

J^\ell_{ij} \ge J_{k_0 k_{n_1}} \cdots J_{k_{n_{\ell-1}} k_{n_\ell}} > 0,

so J is irreducible.

Now let i = j, sum the term in (5.1.5) over all possible values of k_{n_1}, \ldots, k_{n_{\ell-1}} \in A and every possible collection of \ell entry times n_1, \ldots, n_{\ell-1}, n_\ell = m, multiply by R^m, then sum over all m. The resulting term on the left is (the rather complicated)

\sum_{m=1}^{\infty} \sum_{n_1 < \cdots < n_\ell = m} \sum_{(k_{n_1},\ldots,k_{n_{\ell-1}})} {}_A K^{(n_1)}_{i k_{n_1}} R^{n_1} \cdots {}_A K^{(m - n_{\ell-1})}_{k_{n_{\ell-1}} i} R^{m - n_{\ell-1}}

= \sum_{(k_{n_1},\ldots,k_{n_{\ell-1}})} \sum_{n_1=1}^{\infty} {}_A K^{(n_1)}_{i k_{n_1}} R^{n_1} \sum_{n_2 = n_1+1}^{\infty} {}_A K^{(n_2 - n_1)}_{k_{n_1} k_{n_2}} R^{n_2 - n_1} \cdots \sum_{n_\ell = n_{\ell-1}+1}^{\infty} {}_A K^{(n_\ell - n_{\ell-1})}_{k_{n_{\ell-1}} i} R^{n_\ell - n_{\ell-1}}

= \sum_{(j_1,\ldots,j_{\ell-1}) \in A^{\ell-1}} \sum_{m_1=1}^{\infty} {}_A K^{(m_1)}_{i j_1} R^{m_1} \sum_{m_2=1}^{\infty} {}_A K^{(m_2)}_{j_1 j_2} R^{m_2} \cdots \sum_{m_\ell=1}^{\infty} {}_A K^{(m_\ell)}_{j_{\ell-1} i} R^{m_\ell}

= \sum_{(j_1,\ldots,j_{\ell-1}) \in A^{\ell-1}} J_{i j_1} J_{j_1 j_2} \cdots J_{j_{\ell-1} i}.

Now summing over all \ell gives

\sum_{\ell=1}^{\infty} \sum_{(j_1,\ldots,j_{\ell-1}) \in A^{\ell-1}} J_{i j_1} J_{j_1 j_2} \cdots J_{j_{\ell-1} i} = \sum_{\ell=1}^{\infty} J^\ell_{ii}.

This is equivalent to summing (5.1.5) over all paths from i to i of length m, multiplying by R^m, and summing over m, which produces \sum_{m=0}^{\infty} K^m_{ii} R^m on the right side of that equation. This proves that

\sum_{m=0}^{\infty} K^m_{ii} R^m = \sum_{\ell=1}^{\infty} J^\ell_{ii}.

Because of R-recurrence the sum on the left hand side is infinite. Therefore \sum_{n=0}^{\infty} J^n_{ii} = \infty. Since \Phi_{ii}(z) is a continuous function of z, the above argument together with (5.1.4) implies that \Phi_{ii}(1) \ge 1.

Now suppose that |z| < 1. Then

\Phi_{ii}(z) = \sum_{n=1}^{\infty} \sum_{(i_1,\ldots,i_{n-1}) \in (A \setminus \{i\})^{n-1}} J_{i i_1} J_{i_1 i_2} \cdots J_{i_{n-1} i}\, z^n
 \le \sum_{n=1}^{\infty} \sum_{(i_1,\ldots,i_{n-1}) \in (A \setminus \{i\})^{n-1}} {}_{\{i\}}F_{i i_1}(R)\, {}_{\{i\}}F_{i_1 i_2}(R) \cdots {}_{\{i\}}F_{i_{n-1} i}(R)\, |z|^n
 < \sum_{n=1}^{\infty} \sum_{(i_1,\ldots,i_{n-1}) \in (A \setminus \{i\})^{n-1}} {}_{\{i\}}F_{i i_1}(R)\, {}_{\{i\}}F_{i_1 i_2}(R) \cdots {}_{\{i\}}F_{i_{n-1} i}(R)
 = F_{ii}(R).

Since F_{ii}(R) = 1 this means that \Phi_{ii}(z) < 1 for z < 1. Since \Phi_{ii}(1) \ge 1, continuity of \Phi implies that the radius of convergence of \Xi_{ii} is 1.

Corollary 5.1.5 If K is R-recurrent there are no 1-subinvariant measures for J which are not 1-invariant, and if a 1-invariant measure (such as \nu_A in Theorem 5.1.2) exists then it is unique up to constant factors.


Proof: By Lemma 5.1.4 J is 1-recurrent and so this is a direct consequence of

Corollary 2 in Section 4 of [25].

Theorem 5.1.6 Suppose K is R-recurrent so that (by Corollary 5.1.3) it has an R^{-1}-invariant measure \pi. Then, for j \in S, \pi(j) = \sum_{i \in A} \pi(i)\, {}_A G_{ij}(R). In particular, \pi restricted to A is a 1-invariant measure for J.

Proof: For n = 1, the equation

\pi(j) = \sum_{m=1}^{n} \sum_{i \in A} \pi(i)\, {}_A K^{(m)}_{ij} R^m + R^n \sum_{i \in A^c} \pi(i)\, {}_A K^{(n)}_{ij}    (5.1.7)

holds by the definition of an R^{-1}-invariant measure. Suppose (5.1.7) holds for some n \ge 1. The last term in (5.1.7) is

R^n \sum_{i \in A^c} \pi(i)\, {}_A K^{(n)}_{ij} = R^n \sum_{i \in A^c} \left( R \sum_{k \in S} \pi(k) K_{ki} \right) {}_A K^{(n)}_{ij}
 = \sum_{k \in A} \pi(k) \sum_{i \in A^c} K_{ki}\, {}_A K^{(n)}_{ij} R^{n+1} + R^{n+1} \sum_{k \in A^c} \pi(k) \sum_{i \in A^c} K_{ki}\, {}_A K^{(n)}_{ij}
 = \sum_{k \in A} \pi(k)\, {}_A K^{(n+1)}_{kj} R^{n+1} + R^{n+1} \sum_{k \in A^c} \pi(k)\, {}_A K^{(n+1)}_{kj},

and so (5.1.7) holds for n+1 also. By induction it holds for all n, and letting n \to \infty gives

\pi(j) \ge \sum_{i \in A} \pi(i)\, {}_A G_{ij}(R) \quad \text{for all } j \in S.    (5.1.8)

Take j \in A so the above implies \pi restricted to A is 1-subinvariant for J. By Corollary 5.1.5 any subinvariant measure must in fact be invariant, so \pi restricted to A is 1-invariant for J. Hence by Theorem 5.1.2, defining

\nu(j) = \sum_{i \in A} \pi(i)\, {}_A G_{ij}(R)

for j \in S yields an R^{-1}-invariant measure \nu for K, and by Corollary 2 in Section 4 of [25] this is the unique measure with this property, up to constant factors. Since \nu has this property and agrees with \pi on A, \nu = \pi and the result follows.

Corollary 5.1.7 Any two R^{-1}-invariant measures \pi and \nu for an R-recurrent kernel K are constant multiples of each other.

Proof: Let A = \{i\}. Without loss of generality assume that each measure is normalized so that \pi(A) = \nu(A) = 1. Hence, by Theorem 5.1.6, \pi(j) = {}_A G_{ij}(R) and \nu(j) = {}_A G_{ij}(R). In other words, before normalization \pi and \nu are multiples.

This is again a result which is implied by Corollary 2 of Section 4 of [25].

To see the power of Theorem 5.1.6, suppose an R-recurrent kernel K is in hand for which the unique R^{-1}-invariant (quasi-stationary) measure \pi is sought. Examples of interest are those in which the state space is infinite or at least of large enough cardinality as to make computerized eigenvalue/eigenvector analyses intractable. Perhaps K exhibits some degree of "regularity" across much of the state space but is irregular with respect to transitions into or out of a set A of manageable size. Then the procedure for finding \pi suggested by Theorem 5.1.6 is to form the small |A| \times |A| matrix J and compute its unique 1-invariant measure \nu_A, a much simpler computation, and then compute \pi as

\pi(j) = \sum_{i \in A} \nu_A(i)\, {}_A G_{ij}(R).
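For a finite state space the whole procedure can be written out in a few lines, which also serves as a check of Theorems 5.1.2 and 5.1.6. The sketch below is not from the thesis: the substochastic kernel is randomly generated, the set A is an arbitrary choice, and {}_A G(R) is computed from a block decomposition consistent with Lemma 5.1.1.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    K = rng.random((n, n))
    K = 0.9 * K / K.sum(axis=1, keepdims=True)     # substochastic rows

    evals, evecs = np.linalg.eig(K.T)              # columns: left eigenvectors of K
    lam = evals.real.max()                         # Perron root; R = 1/lam (finite case)
    R = 1.0 / lam

    A = [0, 1]                                     # the small "irregular" set
    B = [i for i in range(n) if i not in A]

    # AG(R)[i, j] = sum_n AK^(n)_{ij} R^n for i in A: one step, or a step into A^c
    # followed by a path staying in A^c.
    KBinv = np.linalg.inv(np.eye(len(B)) - R * K[np.ix_(B, B)])
    AG = R * K[A, :] + R**2 * K[np.ix_(A, B)] @ KBinv @ K[B, :]

    J = AG[:, A]                                   # the |A| x |A| matrix of (5.1.3)
    w, V = np.linalg.eig(J.T)
    nu_A = np.abs(V[:, np.argmin(np.abs(w - 1))].real)   # 1-invariant measure of J

    pi = nu_A @ AG                                 # extension as in Theorem 5.1.2/5.1.6

    print(np.allclose(pi @ K, pi / R))             # pi is R^{-1}-invariant
    perron = np.abs(evecs[:, np.argmax(evals.real)].real)
    print(np.allclose(pi / pi.sum(), perron / perron.sum()))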

5.2 The Approximation Lemma

Suppose an irreducible R-recurrent kernel K has R^{-1}-invariant measure \pi, and suppose one can find an approximate R^{-1}-invariant measure \sigma for K, i.e. satisfying, for some set A \subset S,

R^{-1}\sigma(i) = \sum_{k \in S} \sigma(k) K_{ki} \quad \text{for } i \in A^c.    (5.2.1)

Define the approximate time reversal of K with respect to \sigma to be the matrix

\hat K_{ij} := \frac{R\,\sigma(j)}{\sigma(i)} K_{ji}.    (5.2.2)

Note that if i \in A^c then \sum_{j \in S} \hat K_{ij} = 1. Thus it is possible to define a sub-stochastic Markov chain \hat Y = \{\hat Y_n\}_{n \in \mathbb{N}} which transitions from state i \in A^c to j \in A^c with probability \hat K_{ij} and is absorbed on the set A, starting from i \in A^c, with probability \sum_{j \in A} \hat K_{ij}. Let \hat\tau = \inf\{n \ge 0 : \hat Y_n \in A\}, which may be infinite.

Lemma 5.2.1 For j \in A^c,

\frac{\pi(j)}{\sigma(j)} = E_j\!\left[ \frac{\pi(\hat Y_{\hat\tau})}{\sigma(\hat Y_{\hat\tau})}\, \mathbf{1}\{\hat\tau < \infty\} \right].    (5.2.3)

Proof: Equation (5.2.2) implies that

{}_A K^{(n)}_{ij} = P_i[Y_1 \notin A, \ldots, Y_{n-1} \notin A, Y_n = j] = \frac{\sigma(j)}{\sigma(i)R^n} P_j[\hat Y_1 \notin A, \ldots, \hat Y_{n-1} \notin A, \hat Y_n = i] = \frac{\sigma(j)}{\sigma(i)R^n}\, {}_A\hat K^{(n)}_{ji}.

According to Theorem 5.1.6, j \in A^c implies

\pi(j) = \sum_{i \in A} \pi(i)\, {}_A G_{ij}(R)
 = \sum_{i \in A} \pi(i) \sum_{n=1}^{\infty} {}_A K^{(n)}_{ij} R^n
 = \sum_{i \in A} \pi(i) \sum_{n=1}^{\infty} \frac{\sigma(j)}{\sigma(i)}\, {}_A\hat K^{(n)}_{ji}
 = \sigma(j) \sum_{n=1}^{\infty} \sum_{i \in A} {}_A\hat K^{(n)}_{ji}\, \frac{\pi(i)}{\sigma(i)}
 = \sigma(j)\, E_j\!\left[ \frac{\pi(\hat Y_{\hat\tau})}{\sigma(\hat Y_{\hat\tau})}\, \mathbf{1}\{\hat\tau < \infty\} \right],

which establishes (5.2.3).
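For a finite chain, Lemma 5.2.1 can be verified directly. In the sketch below (not from the thesis; the kernel and the set A are arbitrary illustrative choices) the values of \sigma on A are chosen arbitrarily, the values on A^c are obtained by solving (5.2.1) as a linear system, and the expectation on the right of (5.2.3) is computed exactly by solving a first-step system for the absorbed chain \hat Y.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 7
    K = rng.random((n, n))
    K = 0.85 * K / K.sum(axis=1, keepdims=True)

    w, V = np.linalg.eig(K.T)
    k = np.argmax(w.real)
    R = 1.0 / w[k].real
    pi = np.abs(V[:, k].real)                      # the R^{-1}-invariant measure

    A = [0, 1]
    B = [i for i in range(n) if i not in A]

    sigma = np.zeros(n)
    sigma[A] = [1.0, 2.0]                          # arbitrary positive values on A
    M = np.eye(len(B)) / R - K[np.ix_(B, B)]
    sigma[B] = sigma[A] @ K[np.ix_(A, B)] @ np.linalg.inv(M)   # enforce (5.2.1) on A^c

    Khat = R * K.T * sigma[None, :] / sigma[:, None]           # the reversal (5.2.2)

    rhs = Khat[np.ix_(B, A)] @ (pi[A] / sigma[A])
    h = np.linalg.solve(np.eye(len(B)) - Khat[np.ix_(B, B)], rhs)

    print(np.allclose(h, pi[B] / sigma[B]))        # the identity (5.2.3)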

Example 5.2.2 Let 0 < \lambda < \mu < 1 be such that \mu = 1 - \lambda and let K be the irreducible, substochastic kernel on S := \mathbb{N} given by

K_{ij} := \begin{cases} \lambda & j = i+1 \\ \mu & j = i-1 \ne 0, \text{ or } i = j = 0 \\ \lambda_0 & i = 1, j = 0 \end{cases};

here \lambda_0 \in (0, \mu) is the probability of surviving a jump from state 1 to state 0. Let X be a process whose transition probabilities are described by K. In [24] it was determined that F_{21}(z) = \frac{1 - \sqrt{1 - 4\lambda\mu z^2}}{2\lambda z}, and since f^{(k)}_{01} = \mu^{k-1}\lambda it follows that

F_{11}(z) = \sum_{n=1}^{\infty} f^{(1)}_{10} f^{(n-1)}_{01} z^n + \sum_{n=1}^{\infty} f^{(1)}_{12} f^{(n-1)}_{21} z^n
 = \lambda_0 z F_{01}(z) + \lambda z F_{21}(z)
 = \lambda_0 z \sum_{n=1}^{\infty} \mu^{n-1}\lambda z^n + \frac{1 - \sqrt{1 - 4\lambda\mu z^2}}{2}
 = \lambda\lambda_0 z^2 \frac{1}{1 - \mu z} + \frac{1 - \sqrt{1 - 4\lambda\mu z^2}}{2}

for |z| < \min\left\{\frac{1}{\mu}, \frac{1}{2\sqrt{\lambda\mu}}\right\}. This minimum is equal to \frac{1}{\mu} precisely when

2\sqrt{\lambda\mu} < \mu, \text{ or } 4\lambda < \mu.    (5.2.4)

Assume that (5.2.4) holds. Then on [0,\infty) the function F_{11}(z) is non-decreasing and encounters a singularity at z = 1/\mu > 1. Recall G_{ii}(z) = \frac{1}{1 - F_{ii}(z)} for |z| < R where


R is the radius of convergence of G_{ii}(z). Thus R \le 1/\mu. Setting F_{11}(R) = 1 yields

2(1 - \mu z) = 2\lambda\lambda_0 z^2 + (1 - \mu z)(1 - \sqrt{1 - 4\lambda\mu z^2}) \iff
1 - \mu z - 2\lambda\lambda_0 z^2 = -(1 - \mu z)\sqrt{1 - 4\lambda\mu z^2} \implies
(1 - \mu z - 2\lambda\lambda_0 z^2)^2 = (1 - \mu z)^2(1 - 4\lambda\mu z^2).

After some simplification this reduces to the polynomial equation

0 = (\lambda_0 - \mu) + \mu(2\mu - \lambda_0)z - (\mu^3 + \lambda\lambda_0^2)z^2.

This will have a real solution R \in \left(1, \frac{1}{\mu}\right) precisely when K is R-recurrent. The subset of the parameter space where this occurs is non-empty; for example taking \mu = 0.6 and \lambda_0 = 0.5 yields R \approx 1.0184. Let R > 1 be the root of this polynomial which gives the radius of convergence of G_{ii}(z). Since F_{11}(R) = 1 the kernel K is R-recurrent.

Let A = \{0, 1\} and

J = \begin{bmatrix} {}_A F_{00}(R) & {}_A F_{01}(R) \\ {}_A F_{10}(R) & {}_A F_{11}(R) \end{bmatrix} = \begin{bmatrix} \mu R & \lambda R \\ \lambda_0 R & \dfrac{1 - \sqrt{1 - 4\lambda\mu R^2}}{2} \end{bmatrix}.

By Lemma 5.1.4 this matrix has 1 as an eigenvalue. Let B = \{0, 1, 2, 3\} and consider

that, since the transition rates into and out of each state i > 1 are equal, {}_B F_{33}(z) = {}_A F_{11}(z). Therefore

F_{31}(z) = \mu^2 z^2 + \lambda\mu z^2 F_{31}(z) + {}_B F_{33}(z) F_{31}(z) = \mu^2 z^2 + \lambda\mu z^2 F_{31}(z) + {}_A F_{11}(z) F_{31}(z),

which means that

F_{31}(z) = \frac{\mu^2 z^2}{1 - \lambda\mu z^2 - {}_A F_{11}(z)}.

Now

{}_A F_{11}(z) = \lambda\mu z^2 + \sum_{n=3}^{\infty} {}_A f^{(n)}_{11} z^n = \lambda\mu z^2 + \lambda^2 z^2 F_{31}(z) = \lambda\mu z^2 + \lambda^2 z^2\, \frac{\mu^2 z^2}{1 - \lambda\mu z^2 - {}_A F_{11}(z)},

and multiplying both sides of this equation by 1 - \lambda\mu z^2 - {}_A F_{11}(z) yields

(1 - \lambda\mu z^2 - {}_A F_{11}(z))\,{}_A F_{11}(z) = \lambda\mu z^2 - \lambda\mu z^2\,{}_A F_{11}(z).

Therefore

{}_A F_{11}(z) = \lambda\mu z^2 + {}_A F_{11}(z)^2.    (5.2.5)

Setting z = R, dividing both sides of this equation by R^2\mu and letting \gamma := \frac{1}{R\mu}\,{}_A F_{11}(R) gives

\frac{1}{R}\gamma = \lambda + \mu\gamma^2.

R�1-invariant measure for K. Thus the Approximation Lemma can be applied. Since �Y �� � 1, the Approximation Lemma says that the quasi-stationary measure � for K

satisfies

�.j / D�.1/

.1/Pj Œ �� <1� .j /; j 2 Ac: (5.2.6)

Note that

�K ij D

R .j /

.i/Kj i D

8<:

��R2

AF11.R/j D i � 1

AF11.R/ j D i C 1

for i 2 Ac. Let �� WD AF11.R/ and �� WD ��R2

AF11.R/. Note that by (5.2.5)

�� C �� D

AF11.R/2 C ��R2

AF11.R/D 1;

as required. Note also that (5.2.4) and R < 1�

imply 0 < 4��R2 < 1. Then

1 � 4��R2 <p1 � 4��R2 H)

.1 �p1 � 4��R2/2 < 4��R2 H)

AF11.R/2 < ��R2 H)

�� < �� :

Page 101: Path Properties of Rare Events - University of Ottawa€¦ · 3 to ˇfor discrete time Markov chains. Notice that a substochastic chain staying alive until a large time is itself

5.2. The Approximation Lemma 94

Therefore Pj Œ �� <1� D 1 and (5.2.6) becomes

�.j / D �.1/�j�1; j > 1:
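The conclusion of Example 5.2.2 can be compared with a direct numerical computation. The sketch below is not from the thesis: it truncates K at a hypothetical level N, computes the left Perron eigenvector of the truncated kernel, and checks that its decay parameter matches 1/R and that its tail ratios match \gamma = {}_A F_{11}(R)/(R\mu).

    import numpy as np

    lam, mu, lam0 = 0.4, 0.6, 0.5          # lambda = 1 - mu, lambda_0 in (0, mu)

    # R > 1 solving (mu^3 + lam*lam0^2) R^2 - mu*(2*mu - lam0) R + (mu - lam0) = 0.
    R = max(np.roots([mu**3 + lam * lam0**2, -mu * (2*mu - lam0), mu - lam0]).real)
    gamma = (1 - np.sqrt(1 - 4 * lam * mu * R**2)) / (2 * mu * R)

    N = 400
    K = np.zeros((N, N))
    K[0, 0], K[0, 1], K[1, 0], K[1, 2] = mu, lam, lam0, lam
    for i in range(2, N - 1):
        K[i, i - 1], K[i, i + 1] = mu, lam
    K[N - 1, N - 2] = mu

    w, V = np.linalg.eig(K.T)              # columns: left eigenvectors of K
    k = np.argmax(w.real)
    pi = np.abs(V[:, k].real)

    print(w[k].real, 1 / R)                # decay parameter, about 0.9819
    print(pi[3] / pi[2], pi[10] / pi[9], gamma)   # tail ratios, about 0.764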


Chapter 6

Conclusions and Future Work

The aim of this thesis has been to make some constructive observations about different

varieties of rare events.

For stationary stochastic processes there is the Folk Theorem (Theorem 2.4.2) which explains nicely what one should expect the typical large deviation path to look like, under suitable conditions. One needs only an understanding of how the reversed process evolves, which is automatic when dealing with reversible and stationary Markov chains, and a "tube" of backward trajectories \tilde F_\ell within which the reversed process remains asymptotically. This theorem places on a rigorous footing the time reversal techniques which have already been in use for some time, in a setting (stationary point processes) which is more general than the Markovian case.

The specialization of this result to discrete time Markov chains in Theorem 2.4.6

links the abstract to the checkable, and the accountants example of Section 2.4.3

demonstrates nicely how the Folk Theorem can be used to find the most likely path

to a snowed in senior accountant. This could allow the senior accountant to forestall

a catastrophe by keeping an eye out for early warning signs that the large deviation

path is underway.


The extension of these reversal techniques to the substochastic Markovian case is not complete. One problem is the lack of stationarity for substochastic Markov processes. If a new Folk Theorem is to be formulated which does not require stationarity then it will likely bear little resemblance to Theorem 2.4.2, and in any case it is not clear how this would be done. Instead, it was argued in Chapter 4 that one should study a fully stochastic transformation of the substochastic kernel (namely the sustained kernel K_h given by (4.1.2)). There are good reasons to consider this kernel, but in the R-transient case it is difficult to describe the tube of backwards trajectories \tilde F_\ell originating from the rare event with this device alone. Some new techniques will need to be developed in a future work.

In the case that a unique Yaglom limit exists for a substochastic chain, it plays a prominent role in the formation of a kernel \frac{1}{\alpha}\tilde K governing the backwards process of the chain starting at a rare initial point. Thus in Chapter 3 this important object (the Yaglom limit \pi) was introduced and a brief review of some standard results was given. It is natural to ask how quickly the Yaglom limit is reached, and under some conditions on the initial distribution and the assumption of R-positivity, this question was successfully answered in Theorem 3.3.2. It was also demonstrated that the straightforward spectral analysis which gives the rate of convergence in the fully stochastic setting can fail to produce correct conclusions in the substochastic setting.

The conditions on the initial distribution required to prove Theorem 3.3.2 are known to hold on all finite state spaces, but the infinite dimensional case is not fully understood. It may be that the infinite dimensional case is very sensitive to the metric which measures the distance between the conditional distribution at time n and the Yaglom limit \pi. This is a difficult topic, but the humble hope is that more headway can be made in the future.

Finally, in Chapter 5 the problem of finding quasi-stationary measures (or Yaglom limits) is addressed in two main results: the Representation Theorem and the

Approximation Lemma. The first is an extension of the work [25] of Vere-Jones, and


the second is an extension of Lemma 5.1 of [8] to the substochastic case. It was shown

in Example 5.2.2 that these techniques can work nicely to yield quasi-stationary dis-

tributions.


Bibliography

[1] I. J. B. F. Adan, R. Foley, D. McDonald, (2009). Exact asymptotics for the

stationary distribution of a Markov chain: a production model. Queueing Syst.,

311–344.

[2] V. Anantharam, P. Heidelberger, and P. Tsoucas, (1990). Analysis of Rare Events in Continuous Time Markov Chains via Time Reversal and Fluid Approximation. Research Report, Computer Science Division, IBM T.J. Watson Center.

[3] F. Baccelli, P. Bremaud, (2003). “Elements of Queueing Theory.” Springer Verlag.

[4] F. Baccelli, D. McDonald, (1997). Rare events for stationary processes. Stochastic Processes and their Applications 89, 141–173.

[5] P. Billingsley, (1995). “Probability and Measure” Wiley Series in Probability and

Mathematical Statistics.

[6] A.A. Borovkov, A.A. Mogul’skii, (2001). Limit theorems in the boundary hitting

problem for a multidimensional random walk. Siberian Mathematical Journal 4,

245–270.

[7] L.A. Breyer, G.O. Roberts (1997). A Quasi-Ergodic Theorem for Evanescent

Processes, preprint.


[8] J. Collingwood, R. Foley and D. McDonald, (2012). Networks with Cascading

Overloads, Journal of Industrial and Management Optimization, 4, 877–894.

[9] P. Dupuis, R. S. Ellis, A. Weiss, (1991). Large Deviations for Markov Processes

with Discontinuous Statistics, I: General Upper Bounds. The Annals of Probability, Vol. 19, No. 3, 1280–1297.

[10] P. A. Ferrari, H. Kesten, S. Martinez, P. Picco, (1995). Existence of Quasi-

Stationary Distributions. A Renewal Dynamical Approach. The Annals of Probability, Vol. 23, No. 2, 501–521.

[11] J. A. Fill, (1991). Eigenvalue Bounds on Convergence to Stationarity for Nonreversible Markov Chains, With an Application to the Exclusion Process. The Annals of Applied Probability, Vol. 1, No. 1, 62–87.

[12] R. Foley, D. McDonald, (2001). Join the shortest queue: stability and exact

asymptotics, Annals of Applied Probability, 11, 569–607.

[13] R. Foley, D. McDonald, (2004). Large deviations of a modified Jackson network: stability and rough asymptotics, Annals of Applied Probability, 15, 519–541.

[14] R. Foley, D. McDonald, (2005). Bridges and Networks: Exact Asymptotics, Annals of Applied Probability, 15, pp. 542–586.

[15] R. Foley, D. McDonald, Constructing a harmonic function for an irreducible non-

negative matrix with convergence parameter R > 1. Accepted subject to revisions

London Mathematical Society.

[16] I. Ignatiouk-Robert, C. Loree, (2010). Martin boundary of a killed random walk on a quadrant. Ann. Probab. 38, 1106–1142.

[17] H. Kesten, (1995). A Ratio Limit Theorem For (Sub) Markov Chains on \{1, 2, \ldots\} With Bounded Jumps. Adv. Appl. Prob. 27, 652–691.


[18] T. Kurtz, (1978). Strong approximation theorems for density dependent Markov

chains, Stochastic Processes and their Applications 6, 223–240.

[19] K. Majewski, K. Ramanan, (2008). How large queue lengths build up in a Jackson network, preprint.

[20] S.P. Meyn, R.L. Tweedie, (1993). Markov chains and stochastic stability.

Springer-Verlag, London. Available at: probability.ca/MT.

[21] M. Miyazawa, Y. Zhao, (2004). The stationary tail asymptotics in the GI/G/1-type queue with countably many background states. Advances in Applied Probability, 36, 1231–1251.

[22] V. Nicola, T. Zaburnenko, (2007). Efficient importance sampling heuristics for the simulation of population overflow in Jackson networks. ACM Transactions on Modeling and Computer Simulation, 17.

[23] A. Schwartz, A. Weiss, (1995). “Large Deviations For Performance Analysis”

Chapman & Hall.

[24] E. Seneta, D. Vere-Jones, (1966). On Quasi-Stationary Distributions in Discrete-

time Markov Chains with a Denumerable Infinity of States. Journal of Applied

Probability, Vol. 3, No. 2, 403–434.

[25] D. Vere-Jones, (1967). Ergodic Properties of Non-Negative Matrices I. Pacific Journal of Mathematics 22, 361–386.