
Advanced Topics in Markov chains

J.M. Swart

May 16, 2012

Abstract This is a short advanced course in Markov chains, i.e., Markov processes with discrete space and time. The first chapter recalls, without proof, some of the basic topics such as the (strong) Markov property, transience, recurrence, periodicity, and invariant laws, as well as some necessary background material on martingales. The main aim of the lecture is to show how topics such as harmonic functions, coupling, Perron-Frobenius theory, Doob transformations and intertwining are all related and can be used to study the properties of concrete chains, both qualitatively and quantitatively. In particular, the theory is applied to the study of first exit problems and branching processes.


Notation

$\mathbb{N}$  natural numbers $0, 1, \ldots$
$\mathbb{N}_+$  positive natural numbers $1, 2, \ldots$
$\overline{\mathbb{N}}$  $\mathbb{N} \cup \{\infty\}$
$\mathbb{Z}$  integers
$\overline{\mathbb{Z}}$  $\mathbb{Z} \cup \{-\infty, \infty\}$
$\mathbb{Q}$  rational numbers
$\mathbb{R}$  real numbers
$\overline{\mathbb{R}}$  extended real numbers $[-\infty, \infty]$
$\mathbb{C}$  complex numbers
$\mathcal{B}(E)$  Borel-$\sigma$-algebra on a topological space $E$
$1_A$  indicator function of the set $A$
$A \subset B$  $A$ is a subset of $B$, which may be equal to $B$
$A^c$  complement of $A$
$A \setminus B$  set difference
$\overline{A}$  closure of $A$
$\mathrm{int}(A)$  interior of $A$
$(\Omega, \mathcal{F}, \mathbb{P})$  underlying probability space
$\omega$  typical element of $\Omega$
$\mathbb{E}$  expectation with respect to $\mathbb{P}$
$\sigma(\ldots)$  $\sigma$-field generated by sets or random variables
$\|f\|_\infty$  supremum norm $\|f\|_\infty := \sup_x |f(x)|$
$\mu \ll \nu$  $\mu$ is absolutely continuous w.r.t. $\nu$
$f_k \ll g_k$  $\lim f_k/g_k = 0$
$f_k \sim g_k$  $\lim f_k/g_k = 1$
$o(n)$  any function such that $o(n)/n \to 0$
$O(n)$  any function such that $\sup_n O(n)/n < \infty$
$\delta_x$  delta measure in $x$
$\mu \otimes \nu$  product measure of $\mu$ and $\nu$
$\Rightarrow$  weak convergence of probability laws


Contents

0 Preliminaries
   0.1 Stochastic processes
   0.2 Filtrations and stopping times
   0.3 Martingales
   0.4 Martingale convergence
   0.5 Markov chains
   0.6 Kernels, operators and linear algebra
   0.7 Strong Markov property
   0.8 Classification of states
   0.9 Invariant laws

1 Harmonic functions
   1.1 (Sub-) harmonicity
   1.2 Random walk on a tree
   1.3 Coupling
   1.4 Convergence in total variation norm

2 Eigenvalues
   2.1 The Perron-Frobenius theorem
   2.2 Subadditivity
   2.3 Spectral radius
   2.4 Existence of the leading eigenvector
   2.5 Uniqueness of the leading eigenvector
   2.6 The Jordan normal form
   2.7 The spectrum of a nonnegative matrix
   2.8 Convergence to equilibrium
   2.9 Conditioning to stay inside a set

3 Intertwining
   3.1 Intertwining of Markov chains
   3.2 Markov functionals
   3.3 Intertwining and coupling
   3.4 First exit problems revisited

4 Branching processes
   4.1 The branching property
   4.2 Generating functions
   4.3 The survival probability
   4.4 First moment formula
   4.5 Second moment formula
   4.6 Supercritical processes
   4.7 Trimmed processes

A Supplementary material
   A.1 The spectral radius


Chapter 0

Preliminaries

0.1 Stochastic processes

Let $I$ be a (possibly infinite) interval in $\mathbb{Z}$. By definition, a stochastic process with discrete time is a collection of random variables $X = (X_k)_{k\in I}$, defined on some underlying probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and taking values in some measurable space $(E, \mathcal{E})$. We call the random function

$I \ni k \mapsto X_k(\omega) \in E$

the sample path of the process $X$. The sample path of a discrete-time stochastic process is in fact itself a random variable $X = (X_k)_{k\in I}$, taking values in the product space $(E^I, \mathcal{E}^I)$, where

$E^I := \{x = (x_k)_{k\in I} : x_k \in E\ \forall k \in I\}$

is the space of all functions $x : I \to E$ and $\mathcal{E}^I$ denotes the product-$\sigma$-field. It is well-known that a probability law on $(E^I, \mathcal{E}^I)$ is uniquely characterized by its finite-dimensional marginals, i.e., even if $I$ is infinite, the law of the sample path $X$ is uniquely determined by the finite dimensional distributions

$\mathbb{P}\big[(X_k, \ldots, X_{k+n}) \in \cdot\,\big] \qquad (\{k, \ldots, k+n\} \subset I)$

of the process. Conversely, if $(E, \mathcal{E})$ is a Polish space equipped with its Borel-$\sigma$-field, then by the Daniell-Kolmogorov extension theorem, any consistent collection of probability measures on the finite-dimensional product spaces $(E^J, \mathcal{E}^J)$, with $J \subset I$ a finite interval, uniquely defines a probability measure on $(E^I, \mathcal{E}^I)$. Polish


spaces include many of the most commonly used spaces, such as countable spaces equipped with the discrete topology, $\mathbb{R}^d$, separable Banach spaces, and much more. Moreover, open or closed subsets of Polish spaces are Polish, as are countable Cartesian products of Polish spaces, equipped with the product topology.

0.2 Filtrations and stopping times

As before, let $I$ be an interval in $\mathbb{Z}$. A discrete filtration is a collection of $\sigma$-fields $(\mathcal{F}_k)_{k\in I}$ such that $\mathcal{F}_k \subset \mathcal{F}_{k+1}$ for all $\{k, k+1\} \subset I$. If $X = (X_k)_{k\in I}$ is a stochastic process, then

$\mathcal{F}^X_k := \sigma(X_j : j \in I,\ j \le k) \qquad (k \in I)$

is a filtration, called the filtration generated by $X$. For any filtration $(\mathcal{F}_k)_{k\in I}$, we set

$\mathcal{F}_\infty := \sigma\big(\bigcup_{k\in I} \mathcal{F}_k\big).$

In particular, $\mathcal{F}^X_\infty = \sigma((X_k)_{k\in I})$.

A stochastic process $X = (X_k)_{k\in I}$ is adapted to a filtration $(\mathcal{F}_k)_{k\in I}$ if $X_k$ is $\mathcal{F}_k$-measurable for each $k \in I$. Then $(\mathcal{F}^X_k)_{k\in I}$ is the smallest filtration that $X$ is adapted to, and $X$ is adapted to a filtration $(\mathcal{F}_k)_{k\in I}$ if and only if $\mathcal{F}^X_k \subset \mathcal{F}_k$ for all $k \in I$.

Let $(\mathcal{F}_k)_{k\in I}$ be a filtration. An $\mathcal{F}_k$-stopping time is a function $\tau : \Omega \to I \cup \{\infty\}$ such that the $\{0,1\}$-valued process $k \mapsto 1_{\{\tau \le k\}}$ is $\mathcal{F}_k$-adapted. Obviously, this is equivalent to the statement that

$\{\tau \le k\} \in \mathcal{F}_k \qquad (k \in I).$

If $(X_k)_{k\in I}$ is an $E$-valued stochastic process and $A \subset E$ is measurable, then the first entrance time of $X$ into $A$

$\tau_A := \inf\{k \in I : X_k \in A\}$

with $\inf \emptyset := \infty$ is an $\mathcal{F}^X_k$-stopping time. More generally, the same is true for the first entrance time of $X$ into $A$ after $\sigma$

$\tau_{\sigma,A} := \inf\{k \in I : k > \sigma,\ X_k \in A\},$

where $\sigma$ is an $\mathcal{F}_k$-stopping time. Deterministic times are stopping times (w.r.t. any filtration). Moreover, if $\sigma, \tau$ are $\mathcal{F}_k$-stopping times, then also

$\sigma \vee \tau, \qquad \sigma \wedge \tau$


are $\mathcal{F}_k$-stopping times. If $f : I \cup \{\infty\} \to I \cup \{\infty\}$ is measurable and $f(k) \ge k$ for all $k \in I$, and $\tau$ is an $\mathcal{F}_k$-stopping time, then also $f(\tau)$ is an $\mathcal{F}_k$-stopping time.

If $X = (X_k)_{k\in I}$ is an $\mathcal{F}_k$-adapted stochastic process and $\tau$ is an $\mathcal{F}_k$-stopping time, then the stopped process

$\omega \mapsto X_{k \wedge \tau(\omega)}(\omega) \qquad (k \in I)$

is also an $\mathcal{F}_k$-adapted stochastic process. If $\tau < \infty$ a.s., then moreover $\omega \mapsto X_{\tau(\omega)}(\omega)$ is a random variable. If $\tau$ is an $\mathcal{F}_k$-stopping time defined on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_k)_{k\in I}, \mathbb{P})$ (with $\mathcal{F}_k \subset \mathcal{F}$ for all $k \in I$), then the $\sigma$-field of events observable before $\tau$ is defined as

$\mathcal{F}_\tau := \big\{A \in \mathcal{F}_\infty : A \cap \{\tau \le k\} \in \mathcal{F}_k\ \forall k \in I\big\}.$

Exercise 0.1 If $(\mathcal{F}_k)_{k\in I}$ is a filtration and $\sigma, \tau$ are $\mathcal{F}_k$-stopping times, then show that $\mathcal{F}_{\sigma \wedge \tau} = \mathcal{F}_\sigma \cap \mathcal{F}_\tau$.

Exercise 0.2 Let $(\mathcal{F}_k)_{k\in I}$ be a filtration, let $X = (X_k)_{k\in I}$ be an $\mathcal{F}_k$-adapted stochastic process and let $\tau$ be an $\mathcal{F}^X_k$-stopping time. Let $Y_k := X_{k \wedge \tau}$ denote the stopped process. Show that the filtration generated by $Y$ is given by

$\mathcal{F}^Y_k = \mathcal{F}^X_{k \wedge \tau} \qquad (k \in I \cup \{\infty\}).$

In particular, since this formula holds also for $k = \infty$, one has

$\mathcal{F}^X_\tau = \sigma\big((X_{k \wedge \tau})_{k\in I}\big),$

i.e., $\mathcal{F}^X_\tau$ is the $\sigma$-algebra generated by the stopped process.

0.3 Martingales

By definition, a real stochastic process $M = (M_k)_{k\in I}$, where $I \subset \mathbb{Z}$ is an interval, is an $\mathcal{F}_k$-submartingale with respect to some filtration $(\mathcal{F}_k)_{k\in I}$ if $M$ is $\mathcal{F}_k$-adapted, $\mathbb{E}[|M_k|] < \infty$ for all $k \in I$, and

$\mathbb{E}[M_{k+1} \mid \mathcal{F}_k] \ge M_k \qquad (\{k, k+1\} \subset I).$   (0.1)

We say that $M$ is a supermartingale if the reverse inequality holds, i.e., if $-M$ is a submartingale, and a martingale if equality holds in (0.1), i.e., $M$ is both a


submartingale and a supermartingale. By induction, it is easy to show that (0.1) holds more generally when $k, k+1$ are replaced by more general times $k, m \in I$ with $k \le m$.

If $M$ is an $\mathcal{F}_k$-submartingale and $(\mathcal{F}'_k)_{k\in I}$ is a smaller filtration (i.e., $\mathcal{F}'_k \subset \mathcal{F}_k$ for all $k \in I$) that $M$ is also adapted to, then

$\mathbb{E}[M_{k+1} \mid \mathcal{F}'_k] = \mathbb{E}\big[\mathbb{E}[M_{k+1} \mid \mathcal{F}_k] \,\big|\, \mathcal{F}'_k\big] \ge \mathbb{E}[M_k \mid \mathcal{F}'_k] = M_k \qquad (\{k, k+1\} \subset I),$

which shows that $M$ is also an $\mathcal{F}'_k$-submartingale. In particular, a stochastic process $M$ is a submartingale with respect to some filtration if and only if it is a submartingale with respect to its own filtration $(\mathcal{F}^M_k)_{k\in I}$. In this case, we simply say that $M$ is a submartingale (resp. supermartingale, martingale).

Let $(\mathcal{F}_k)_{k\in I}$ be a filtration and let $(\mathcal{F}_{k-1})_{k\in I}$ be the filtration shifted one step to the left, where we set $\mathcal{F}_{k-1} := \{\emptyset, \Omega\}$ if $k-1 \notin I$. Let $X = (X_k)_{k\in I}$ be a real $\mathcal{F}_k$-adapted stochastic process such that $\mathbb{E}[|X_k|] < \infty$ for all $k \in I$. By definition, a compensator of $X$ w.r.t. the filtration $(\mathcal{F}_k)_{k\in I}$ is an $\mathcal{F}_{k-1}$-adapted real process $K = (K_k)_{k\in I}$ such that $\mathbb{E}[|K_k|] < \infty$ for all $k \in I$ and $(X_k - K_k)_{k\in I}$ is an $\mathcal{F}_k$-martingale. It is not hard to show that $K$ is a compensator if and only if $K$ is $\mathcal{F}_{k-1}$-adapted, $\mathbb{E}[|K_k|] < \infty$ for all $k \in I$ and

$K_{k+1} - K_k = \mathbb{E}[X_{k+1} \mid \mathcal{F}_k] - X_k \qquad (\{k, k+1\} \subset I).$

It follows that any two compensators must be equal up to an additive $\bigcap_{k\in I} \mathcal{F}_{k-1}$-measurable random constant. In particular, if $I = \mathbb{N}$, then because of the way we have defined $\mathcal{F}_{-1}$, such a constant must be deterministic. In this case, it is customary to put $K_0 := 0$, i.e., we call

$K_n := \sum_{k=1}^n \big(\mathbb{E}[X_k \mid \mathcal{F}_{k-1}] - X_{k-1}\big) \qquad (n \ge 0)$

the (unique) compensator of $X$ with respect to the filtration $(\mathcal{F}_k)_{k\in\mathbb{N}}$. We note that $X$ is a submartingale if and only if its compensator is a.s. nondecreasing.
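As a concrete numerical sanity check (a sketch with our own toy chain, not part of the text): for the simple symmetric random walk $S_k$ with $S_0 = 0$, the process $X_k = S_k^2$ has compensator $K_n = n$, since $\mathbb{E}[S_k^2 \mid \mathcal{F}_{k-1}] = S_{k-1}^2 + 1$. The martingale part $M_k = S_k^2 - k$ should therefore have mean zero for every $k$, which we can check by simulation.

```python
import numpy as np

# Hedged sketch: for simple symmetric random walk S_k, the process X_k = S_k^2
# has compensator K_n = n, so M_k = S_k^2 - k is a martingale with E[M_k] = 0.
rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=(100_000, 50))  # 100000 paths, 50 time steps
S = np.cumsum(steps, axis=1)                     # S_1, ..., S_50 (S_0 = 0)
M = S**2 - np.arange(1, 51)                      # martingale part X_k - K_k
print(np.round(M.mean(axis=0)[[0, 9, 49]], 2))   # all entries close to 0
```

The empirical means fluctuate around zero at every time index, consistent with $\mathbb{E}[M_k] = \mathbb{E}[M_0] = 0$.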

The proof of the following basic fact can be found in, e.g., [Lach12, Thm 2.4].

Proposition 0.3 (Optional stopping) Let $I \subset \mathbb{Z}$ be an interval, $(\mathcal{F}_k)_{k\in I}$ a filtration, let $\tau$ be an $\mathcal{F}_k$-stopping time and let $(M_k)_{k\in I}$ be an $\mathcal{F}_k$-submartingale. Then the stopped process $(M_{k\wedge\tau})_{k\in I}$ is an $\mathcal{F}_k$-submartingale.

The following proposition is a special case of [Lach12, Prop. 2.1].


Proposition 0.4 (Conditioning on events up to a stopping time) Let $I \subset \mathbb{Z}$ be an interval, $(\mathcal{F}_k)_{k\in I}$ a filtration, let $\tau$ be an $\mathcal{F}_k$-stopping time and let $(M_k)_{k\in I}$ be an $\mathcal{F}_k$-submartingale. Then

$\mathbb{E}\big[M_k \,\big|\, \mathcal{F}_{k\wedge\tau}\big] \ge M_{k\wedge\tau} \qquad (k \in I).$

0.4 Martingale convergence

If $\mathcal{F}, \mathcal{F}_k$ ($k \ge 0$) are $\sigma$-fields, then we say that $\mathcal{F}_k \uparrow \mathcal{F}$ if $\mathcal{F}_k \subset \mathcal{F}_{k+1}$ ($k \ge 0$) and $\mathcal{F} = \sigma\big(\bigcup_{k\ge 0} \mathcal{F}_k\big)$. Note that this is the same as saying that $(\mathcal{F}_k)_{k\ge 0}$ is a filtration and $\mathcal{F} = \mathcal{F}_\infty$, as we have defined it above. Similarly, if $\mathcal{F}, \mathcal{F}_k$ ($k \ge 0$) are $\sigma$-fields, then we say that $\mathcal{F}_k \downarrow \mathcal{F}$ if $\mathcal{F}_k \supset \mathcal{F}_{k+1}$ ($k \ge 0$) and $\mathcal{F} = \bigcap_{k\ge 0} \mathcal{F}_k$.

Exercise 0.5 Let $(\mathcal{F}_k)_{k\in\mathbb{N}}$ be a filtration and let $\tau$ be an $\mathcal{F}_k$-stopping time. Show that

$\mathcal{F}_{k\wedge\tau} \uparrow \mathcal{F}_\tau \quad \text{as } k \uparrow \infty.$

The following proposition says that conditional expectations are continuous w.r.t. convergence of $\sigma$-fields. A proof can be found in, e.g., [Lach12, Prop. 4.12], [Chu74, Thm 9.4.8] or [Bil86, Thms 3.5.5 and 3.5.7].

Proposition 0.6 (Continuity in $\sigma$-field) Let $X$ be a real random variable defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{F}_\infty, \mathcal{F}_k \subset \mathcal{F}$ ($k \ge 0$) be $\sigma$-fields. Assume that $\mathbb{E}[|X|] < \infty$ and $\mathcal{F}_k \uparrow \mathcal{F}_\infty$ or $\mathcal{F}_k \downarrow \mathcal{F}_\infty$. Then

$\mathbb{E}[X \mid \mathcal{F}_k] \underset{k\to\infty}{\longrightarrow} \mathbb{E}[X \mid \mathcal{F}_\infty]$ a.s. and in $L^1$-norm.

Note that if $\mathcal{F}_k \uparrow \mathcal{F}$ and $\mathbb{E}[|X|] < \infty$, then $M_k := \mathbb{E}[X \mid \mathcal{F}_k]$ defines a martingale. Proposition 0.6 says that such a martingale always converges. Conversely, we would like to know for which martingales $(M_k)_{k\ge 0}$ there exists a final element $X$ such that $M_k = \mathbb{E}[X \mid \mathcal{F}_k]$. This leads to the problem of martingale convergence. Since each submartingale is the sum of a martingale and a nondecreasing compensator and since nondecreasing functions always converge, we may more or less equivalently ask the same question for submartingales. For a proof of the following fact we refer to, e.g., [Lach12, Thm 4.1].

Proposition 0.7 (Submartingale convergence) Let $(M_k)_{k\in\mathbb{N}}$ be a submartingale such that $\sup_{k\ge 0} \mathbb{E}[M_k \vee 0] < \infty$. Then there exists a random variable $M_\infty$ with $\mathbb{E}[|M_\infty|] < \infty$ such that

$M_k \underset{k\to\infty}{\longrightarrow} M_\infty$ a.s.


In particular, this implies that nonnegative supermartingales converge almost surely. The same is not true for nonnegative submartingales: a counterexample is one-dimensional random walk reflected at the origin.

In general, even if $M$ is a martingale, it need not be true that $\mathbb{E}[M_\infty] = \mathbb{E}[M_0]$ (a counterexample is random walk stopped at the origin). We recall that a collection of random variables $(X_k)_{k\in I}$ is uniformly integrable if

$\lim_{n\to\infty} \sup_{k\in I} \mathbb{E}\big[|X_k|\, 1_{\{|X_k| \ge n\}}\big] = 0.$

Sufficient¹ for this is that $\sup_{k\in I} \mathbb{E}[\psi(|X_k|)] < \infty$, where $\psi : [0,\infty) \to [0,\infty)$ is nonnegative, increasing, convex, and satisfies $\lim_{r\to\infty} \psi(r)/r = \infty$. Possible choices are for example $\psi(r) = r^2$ or $\psi(r) = (1+r)\log(1+r) - r$. It is well-known that uniform integrability and a.s. convergence of a sequence of real random variables imply convergence in $L^1$-norm. For submartingales, the following result is known [Lach12, Thm 4.8].

Proposition 0.8 (Final element) In addition to the assumptions of Proposition 0.7, assume that $(M_k)_{k\in\mathbb{N}}$ is uniformly integrable. Then

$\mathbb{E}\big[|M_k - M_\infty|\big] \underset{k\to\infty}{\longrightarrow} 0,$

and $\mathbb{E}[M_\infty \mid \mathcal{F}_k] \ge M_k$ for all $k \ge 0$. If $M$ is a martingale, then $\mathbb{E}[M_\infty \mid \mathcal{F}_k] = M_k$ for all $k \ge 0$.

Note that in particular, if $M$ is a martingale, then Proposition 0.8 says that $M_k = \mathbb{E}[M_\infty \mid \mathcal{F}_k]$, which shows that all information about the martingale $M$ is hidden in its final element $M_\infty$.

Combining Propositions 0.8 and 0.3, we see that if $\tau$ is an $\mathcal{F}_k$-stopping time such that $\tau < \infty$ a.s., $(M_k)_{k\in\mathbb{N}}$ is an $\mathcal{F}_k$-submartingale, and $(M_{k\wedge\tau})_{k\in\mathbb{N}}$ is uniformly integrable, then $\mathbb{E}[M_\tau] = \lim_{k\to\infty} \mathbb{E}[M_{k\wedge\tau}] \ge \mathbb{E}[M_0]$.

There also exist convergence results for 'backward' martingales $(M_k)_{k\in\{\ldots,-1,0\}}$.

0.5 Markov chains

Proposition 0.9 (Markov property) Let $(E, \mathcal{E})$ be a measurable space, let $I \subset \mathbb{Z}$ be an interval and let $(X_k)_{k\in I}$ be an $E$-valued stochastic process. For each $n \in I$,

¹By the de la Vallée-Poussin theorem, this condition is in fact also necessary.


set $I^-_n := \{k \in I : k \le n\}$ and $I^+_n := \{k \in I : k \ge n\}$, and let $\mathcal{F}^X_n := \sigma((X_k)_{k\in I^-_n})$ be the filtration generated by $X$. Then the following conditions are equivalent.

(i) $\mathbb{P}\big[(X_k)_{k\in I^-_n} \in A,\ (X_k)_{k\in I^+_n} \in B \,\big|\, X_n\big] = \mathbb{P}\big[(X_k)_{k\in I^-_n} \in A \,\big|\, X_n\big]\, \mathbb{P}\big[(X_k)_{k\in I^+_n} \in B \,\big|\, X_n\big]$ a.s. for all $A \in \mathcal{E}^{I^-_n}$, $B \in \mathcal{E}^{I^+_n}$, $n \in I$.

(ii) $\mathbb{P}\big[(X_k)_{k\in I^+_n} \in B \,\big|\, \mathcal{F}^X_n\big] = \mathbb{P}\big[(X_k)_{k\in I^+_n} \in B \,\big|\, X_n\big]$ a.s. for all $B \in \mathcal{E}^{I^+_n}$, $n \in I$.

(iii) $\mathbb{P}\big[X_{n+1} \in C \,\big|\, \mathcal{F}^X_n\big] = \mathbb{P}\big[X_{n+1} \in C \,\big|\, X_n\big]$ a.s. for all $C \in \mathcal{E}$, $\{n, n+1\} \subset I$.

Remarks Property (i) says that the past and future are conditionally independent given the present. Property (ii) says that the future depends on the past only through the present, i.e., after we condition on the present, conditioning on the whole past does not give any extra information. Property (iii) says that it suffices to check (ii) for single time steps.

Proof of Proposition 0.9 Set $\mathcal{G}^X_n := \sigma((X_k)_{k\in I^+_n})$. If (i) holds, then, for any $A \in \mathcal{F}^X_n$ and $B \in \mathcal{G}^X_n$, we have

$\mathbb{E}\big[1_A\, \mathbb{P}[B \mid X_n]\big] = \mathbb{E}\big[\mathbb{E}[1_A\, \mathbb{P}[B \mid X_n] \mid X_n]\big] = \mathbb{E}\big[\mathbb{E}[1_A \mid X_n]\, \mathbb{P}[B \mid X_n]\big] = \mathbb{E}\big[\mathbb{P}[A \mid X_n]\, \mathbb{P}[B \mid X_n]\big] = \mathbb{E}\big[\mathbb{P}[A \cap B \mid X_n]\big] = \mathbb{P}[A \cap B],$

where we have pulled the $\sigma(X_n)$-measurable random variable $\mathbb{P}[B \mid X_n]$ out of the conditional expectation. Since this holds for arbitrary $A \in \mathcal{F}^X_n$ and since $\mathbb{P}[B \mid X_n]$ is $\mathcal{F}^X_n$-measurable, it follows from the definition of conditional expectations that

$\mathbb{P}[B \mid \mathcal{F}^X_n] = \mathbb{P}[B \mid X_n],$

which is just another way of writing (ii). Conversely, if (ii) holds, then for any $C \in \sigma(X_n)$,

$\mathbb{E}\big[\mathbb{P}[A \mid X_n]\, \mathbb{P}[B \mid X_n]\, 1_C\big] = \mathbb{E}\big[\mathbb{P}[B \mid X_n]\, 1_C\, \mathbb{E}[1_A \mid X_n]\big] = \mathbb{E}\big[\mathbb{E}[\mathbb{P}[B \mid X_n]\, 1_C\, 1_A \mid X_n]\big] = \mathbb{E}\big[1_{A \cap C}\, \mathbb{P}[B \mid \mathcal{F}^X_n]\big] = \mathbb{P}[A \cap B \cap C].$

Since this holds for any $C \in \sigma(X_n)$, it follows from the definition of conditional expectations that

$\mathbb{P}[A \cap B \mid X_n] = \mathbb{P}[A \mid X_n]\, \mathbb{P}[B \mid X_n]$ a.s.


To see that (iii) is sufficient for (ii), one first proves by induction that

$\mathbb{P}\big[X_{n+1} \in C_1, \ldots, X_{n+m} \in C_m \,\big|\, \mathcal{F}^X_n\big] = \mathbb{P}\big[X_{n+1} \in C_1, \ldots, X_{n+m} \in C_m \,\big|\, X_n\big],$

and then uses that these sorts of events uniquely determine conditional probabilities of events in $\mathcal{G}^X_n$.

If a process $X = (X_k)_{k\in I}$ satisfies the equivalent conditions of Proposition 0.9, then we say that $X$ has the Markov property. For processes with countable state spaces, there is an easier formulation.

Proposition 0.10 (Markov chains) Let $I \subset \mathbb{Z}$ be an interval and let $X = (X_k)_{k\in I}$ be a stochastic process taking values in a countable space $S$. Then $X$ has the Markov property if and only if for each $\{k, k+1\} \subset I$ there exists a probability kernel $P_{k,k+1}(x,y)$ on $S$ such that

$\mathbb{P}[X_k = x_k, \ldots, X_{k+n} = x_{k+n}] = \mathbb{P}[X_k = x_k]\, P_{k,k+1}(x_k, x_{k+1}) \cdots P_{k+n-1,k+n}(x_{k+n-1}, x_{k+n})$   (0.2)

for all $\{k, \ldots, k+n\} \subset I$, $x_k, \ldots, x_{k+n} \in S$.

Proof See, e.g., [LP11, Thm 2.1].

If $I = \mathbb{N}$, then Proposition 0.10 shows that the finite dimensional distributions, and hence the whole law of a Markov chain $X$, are determined by its initial law $\mathbb{P}[X_0 \in \cdot\,]$ and its transition probabilities $P_{k,k+1}(x,y)$. If the initial law and transition probabilities are given, then it is easy to see that the probability laws defined by (0.2) are consistent, hence by the Daniell-Kolmogorov extension theorem, there exists a Markov chain $X$, unique in distribution, with this initial law and transition probabilities.

We note that conversely, a Markov chain $X$ determines its transition probabilities $P_{k,k+1}(x,y)$ only for a.e. $x \in S$ w.r.t. the law of $X_k$. If it is possible to choose the transition kernels $P_{k,k+1}$ in such a way that they do not depend on $k$, then we say that the Markov chain is homogeneous. We are usually not interested in the problem of finding $P_{k,k+1}$ given $X$; typically, we start with a given probability kernel $P$ on $S$ and are interested in all Markov chains that have $P$ as their transition probability in each time step, and arbitrary initial law. Often, the word Markov chain is used in this more general sense. Thus, we often speak of 'the Markov chain with state space $S$ that jumps from $x$ to $y$ with probability. . . ' without specifying the initial law. From now on, we use the convention that all Markov chains are homogeneous, unless explicitly stated otherwise. Moreover, if we don't mention the initial law, then we mean the process started in an arbitrary initial law.

We note from Proposition 0.9 (i) that the Markov property is symmetric under time reversal, i.e., if $(X_k)_{k\in I}$ has the Markov property and $I' := \{-k : k \in I\}$, then the process $X' = (X'_k)_{k\in I'}$ defined by $X'_k := X_{-k}$ ($k \in I'$) also has the Markov property. It is usually not true, however, that $X'$ is homogeneous if $X$ is. An exception is given by stationary processes, which leads to the useful concept of reversible laws.

Exercise 0.11 (Stopped Markov chain) Let $X = (X_k)_{k\ge 0}$ be a Markov chain with countable state space $S$ and transition kernel $P$, let $A \subset S$ and let $\tau_A := \inf\{k \ge 0 : X_k \in A\}$ be the first entrance time of $A$. Let $Y$ be the stopped process $Y_k := X_{k\wedge\tau_A}$ ($k \ge 0$). Show that $Y$ is a Markov chain and determine its transition kernel. If we replace $\tau_A$ by the second entrance time of $A$, is $Y$ then still Markov?

By definition, a random mapping representation of a probability kernel $P$ on a countable state space $S$ is a probability space $(E, \mathcal{E}, \mu)$ together with a measurable function $f : S \times E \to S$ such that if $Z_1$ is a random variable taking values in $(E, \mathcal{E})$ with law $\mu$, then

$P(x, y) = \mathbb{P}[f(x, Z_1) = y] \qquad (x, y \in S).$

If $(Z_k)_{k\ge 1}$ are i.i.d. random variables with values in $(E, \mathcal{E})$ and common law $\mu$ and $X_0$ is a random variable taking values in $S$, independent of the $(Z_k)_{k\ge 1}$, then setting, inductively,

$X_k := f(X_{k-1}, Z_k) \qquad (k \ge 1)$

defines a Markov chain with transition kernel $P$ and initial state $X_0$. Random mapping representations are essential for simulating Markov chains on a computer. In addition, they have plenty of theoretical applications, for example for coupling Markov chains with different initial states. (See Section 1.3 for an introduction to coupling.) One can show that each probability kernel has a random mapping representation, but such a representation is far from unique. Often, the key to a good proof is choosing the right one.
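To make the simulation recipe concrete, here is a minimal sketch with our own toy example (not from the text): for the random walk on $\{0, \ldots, 10\}$ reflected at the boundary, we may take $(E, \mu) = (\{-1, +1\}, \text{uniform})$ and $f(x, z)$ equal to $x + z$ clipped to the state space; the names `f` and `Z` below simply mirror the notation above.

```python
import numpy as np

# Hedged sketch of a random mapping representation: one deterministic update
# rule f : S x E -> S plus i.i.d. innovations Z_k with law mu generate the
# chain via X_k = f(X_{k-1}, Z_k).
rng = np.random.default_rng(1)

def f(x, z):
    return min(max(x + z, 0), 10)     # walk on {0,...,10}, reflected at 0, 10

Z = rng.choice([-1, 1], size=10_000)  # i.i.d. innovations with law mu
X = [5]                               # X_0 = 5
for z in Z:
    X.append(f(X[-1], z))             # X_k = f(X_{k-1}, Z_k)

# empirical check: from interior states the chain steps up about half the time
moves = [(a, b) for a, b in zip(X, X[1:]) if 0 < a < 10]
up = sum(b == a + 1 for a, b in moves) / len(moves)
print(round(up, 2))                   # close to 0.5
```

Because the innovations are independent of the current position, the one-step empirical frequencies approximate $P(x, \cdot)$, as the definition requires.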

We note that it is in general not true that functions of Markov chains are themselves Markov chains. An exception is the case when

$\mathbb{P}[f(X_{k+1}) \in \cdot \mid \mathcal{F}^X_k]$

depends only on $f(X_k)$. In this case, we say that $f(X)$ is an autonomous Markov chain.


Lemma 0.12 (Autonomous Markov functional) Let $X = (X_k)_{k\in I}$ be a Markov chain with countable state space $S$ and transition kernel $P$. Let $S'$ be a countable set and let $f : S \to S'$ be a function. Assume that there exists a probability kernel $P'$ on $S'$ such that

$P'(x', y') = \sum_{y:\, f(y) = y'} P(x, y) \qquad (x \in S,\ f(x) = x').$

Then $f(X) := (f(X_k))_{k\in I}$ is a Markov chain with state space $S'$ and transition kernel $P'$.

0.6 Kernels, operators and linear algebra

Let $X = (X_k)_{k\in I}$ be a stochastic process taking values in a countable space $S$, and let $P$ be a probability kernel on $S$. Then it is not hard to see that $X$ is a Markov process with transition kernel $P$ (and arbitrary initial law) if and only if

$\mathbb{P}[X_{k+1} = y \mid \mathcal{F}^X_k] = P(X_k, y)$ a.s. $\qquad (y \in S,\ \{k, k+1\} \subset I),$

where $(\mathcal{F}^X_k)_{k\in I}$ is the filtration generated by $X$. More generally, one has

$\mathbb{P}[X_{k+n} = y \mid \mathcal{F}^X_k] = P^n(X_k, y)$ a.s. $\qquad (y \in S,\ n \ge 0,\ \{k, k+n\} \subset I),$

where $P^n$ denotes the $n$-th power of the transition kernel $P$. Here, if $K, L$ are probability kernels on $S$, then we define their product as

$KL(x,z) := \sum_{y\in S} K(x,y)\, L(y,z) \qquad (x, z \in S),$

which is again a probability kernel on $S$. Then $K^n$ is defined as the product of $n$ times $K$ with itself, where $K^0(x,y) := 1_{\{x=y\}}$. We may associate each probability kernel on $S$ with a linear operator, acting on bounded real functions $f : S \to \mathbb{R}$, defined as

$Kf(x) := \sum_{y\in S} K(x,y)\, f(y) \qquad (x \in S).$

Then $KL$ is just the composition of the operators $K$ and $L$, and for each bounded $f : S \to \mathbb{R}$, one has

$\mathbb{E}[f(X_{k+n}) \mid \mathcal{F}^X_k] = P^n f(X_k)$ a.s. $\qquad (n \ge 0,\ \{k, k+n\} \subset I),$   (0.3)


and this formula holds more generally provided the expectations are well-defined (e.g., if $\mathbb{E}[|f(X_{k+n})|] < \infty$ or $f \ge 0$).

If $\mu$ is a probability measure on $S$ and $K$ is a probability kernel on $S$, then we may define a new probability measure $\mu K$ on $S$ by

$\mu K(y) := \sum_{x\in S} \mu(x)\, K(x,y) \qquad (y \in S).$

In this notation, if $X$ is a Markov process with transition kernel $P$ and initial law $\mathbb{P}[X_0 \in \cdot\,] = \mu$, then $\mathbb{P}[X_n \in \cdot\,] = \mu P^n$ is its law at time $n$.

We may view transition kernels as (possibly infinite) matrices that act on row vectors $\mu$ or column vectors $f$ by left and right multiplication, respectively.
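The matrix viewpoint is easy to play with numerically. Here is a sketch with our own toy three-state kernel (not from the text): a law $\mu$ is a row vector acted on from the right by $P$, and a function $f$ is a column vector acted on from the left.

```python
import numpy as np

# Hedged sketch: rows of P are the measures P(x, .); mu P^n gives the law at
# time n, and P f gives the conditional expectation x -> E[f(X_{k+1})|X_k = x].
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
mu = np.array([1.0, 0.0, 0.0])   # start deterministically in state 0
f = np.array([0.0, 1.0, 2.0])    # the function f(x) = x

law_25 = mu @ np.linalg.matrix_power(P, 25)  # P[X_25 = .] = mu P^25
print(np.round(law_25, 3))  # close to the invariant law (0.25, 0.5, 0.25)
print((P @ f)[1])           # E[f(X_{k+1}) | X_k = 1] = 1.0
```

For this reversible birth-death kernel the invariant law $(0.25, 0.5, 0.25)$ can be found from detailed balance, and the iterates $\mu P^n$ converge to it quickly.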

0.7 Strong Markov property

We assume that the reader is familiar with some basic facts about Markov chains, as taught in the lecture [LP11]. In particular, this concerns the strong Markov property, which we formulate now.

Let $X = (X_k)_{k\ge 0}$ be a Markov chain with countable state space $S$ and transition kernel $P$. As usual, it goes without saying that $X$ is homogeneous (i.e., we use the same $P$ in each time step) and when we don't mention the initial law, we mean the process started in an arbitrary initial law. Often, it is notationally convenient to assume that our process $X$ is always the same, while the dependence on the initial law is expressed in the choice of the probability measure on our underlying probability space.

More precisely, we assume that we have a measurable space $(\Omega, \mathcal{F})$ and a collection $X = (X_k)_{k\ge 0}$ of measurable maps $X_k : \Omega \to S$, as well as a collection $(\mathbb{P}_x)_{x\in S}$ of probability measures on $(\Omega, \mathcal{F})$, such that under the measure $\mathbb{P}_x$, the process $X$ is a Markov chain with initial law $\mathbb{P}_x[X_0 = x] = 1$ and transition kernel $P$. In this set-up, if $\mu$ is any probability measure on $S$, then under the law $\mathbb{P} := \sum_{x\in S} \mu(x)\, \mathbb{P}_x$, the process $X$ is distributed as a Markov chain with initial law $\mu$ and transition kernel $P$.

If $X, \mathbb{P}, \mathbb{P}_x$ are as just described and $(\mathcal{F}^X_k)_{k\ge 0}$ is the filtration generated by $X$, then it follows from Proposition 0.9 (ii) and homogeneity that

$\mathbb{P}\big[(X_{n+k})_{k\ge 0} \in \cdot \,\big|\, \mathcal{F}^X_n\big] = \mathbb{P}_{X_n}\big[(X_k)_{k\ge 0} \in \cdot\,\big]$ a.s.   (0.4)


Here, for fixed $n \ge 0$, we consider $(X_{n+k})_{k\ge 0}$ as a random variable taking values in $S^{\mathbb{N}}$ (i.e., this is the process $Y$ defined by $Y_k := X_{n+k}$ ($k \ge 0$)). Since $S^{\mathbb{N}}$ is a nice (in particular Polish) space, we can choose a regular version of the conditional probability on the left-hand side of (0.4), i.e., this is a random probability measure on $S^{\mathbb{N}}$. Since $X_n$ is random, the same is true for the right-hand side. In words, formula (0.4) says that given the process up to time $n$, the process after time $n$ is distributed as the process started in $X_n$. The strong Markov property extends this to stopping times.

Proposition 0.13 (Strong Markov property) Let $X, \mathbb{P}, \mathbb{P}_x$ be as defined above. Then, for any $\mathcal{F}^X_k$-stopping time $\tau$ such that $\tau < \infty$ a.s., one has

$\mathbb{P}\big[(X_{\tau+k})_{k\ge 0} \in \cdot \,\big|\, \mathcal{F}^X_\tau\big] = \mathbb{P}_{X_\tau}\big[(X_k)_{k\ge 0} \in \cdot\,\big]$ a.s.   (0.5)

Proof This follows from [LP11, Thm 2.3].

Remark 1 Even if $\mathbb{P}[\tau = \infty] > 0$, formula (0.5) still holds a.s. on the event $\{\tau < \infty\}$.

Remark 2 Homogeneity is essential for the strong Markov property, at least in the (useful) formulation of (0.5).

Since this is closely related to formula (0.4), we also mention the following useful principle here.

Proposition 0.14 (What can happen must eventually happen) Let $X = (X_k)_{k\ge 0}$ be a Markov chain with countable state space $S$. Let $B \subset S^{\mathbb{N}}$ be measurable and set $\rho(x) := \mathbb{P}_x[(X_k)_{k\ge 0} \in B]$. Then the event

$\big\{(X_{n+k})_{k\ge 0} \in B \text{ for infinitely many } n \ge 0\big\} \cup \big\{\rho(X_n) \underset{n\to\infty}{\longrightarrow} 0\big\}$

has probability one.

Proof Let $A$ denote the event that $(X_{n+k})_{k\ge 0} \in B$ for some $n \ge 0$. Then by Proposition 0.6,

$\rho(X_n) = \mathbb{P}_{X_n}\big[(X_k)_{k\ge 0} \in B\big] = \mathbb{P}\big[(X_{n+k})_{k\ge 0} \in B \,\big|\, \mathcal{F}^X_n\big] \le \mathbb{P}\big[A \,\big|\, \mathcal{F}^X_n\big] \underset{n\to\infty}{\longrightarrow} \mathbb{P}\big[A \,\big|\, \mathcal{F}^X_\infty\big] = 1_A$ a.s.

In particular, this shows that $\rho(X_n) \to 0$ a.s. on the event $A^c$. Applying the same argument to $A_m := \{(X_{n+k})_{k\ge 0} \in B \text{ for some } n \ge m\}$, we see that the event $A_m \cup \{\rho(X_n) \to 0\}$ has probability one for each $m$. Letting $m \uparrow \infty$ and observing that $A_m \downarrow \{(X_{n+k})_{k\ge 0} \in B \text{ for infinitely many } n \ge 0\}$, the claim follows.


0.8 Classification of states

Let $X$ be a Markov chain with countable state space $S$ and transition kernel $P$. For each $x, y \in S$, we write $x \leadsto y$ if $P^n(x,y) > 0$ for some $n \ge 0$. Note that $x \leadsto y \leadsto z$ implies $x \leadsto z$. Two states $x, y$ are called equivalent if $x \leadsto y$ and $y \leadsto x$. It is easy to see that this defines an equivalence relation on $S$. A Markov chain / transition kernel is called irreducible if all states are equivalent.

A state x is called recurrent if

$\mathbb{P}_x[X_k = x \text{ for some } k \ge 1] = 1,$

otherwise it is called transient. If two states are equivalent and one of them is recurrent (resp. transient), then so is the other. Fix $x \in S$, let $\tau_0 := 0$ and let

$\tau_k := \inf\{n > \tau_{k-1} : X_n = x\} \qquad (k \ge 1)$

be the time of the $k$-th visit to $x$ after time zero. Consider the process started in $X_0 = x$. If $x$ is recurrent, then $\tau_1 < \infty$ a.s. It follows from the strong Markov property that $\tau_2 - \tau_1$ is equally distributed with and independent of $\tau_1$. By induction, $(\tau_k - \tau_{k-1})_{k\ge 1}$ are i.i.d. In particular, $\tau_k < \infty$ for all $k \ge 1$, i.e., the process returns to $x$ infinitely often.

On the other hand, if $x$ is transient, then by the same sort of argument we see that the number $N_x := \sum_{k\ge 1} 1_{\{X_k = x\}}$ of returns to $x$ is geometrically distributed:

$\mathbb{P}_x[N_x = n] = \theta^n (1 - \theta), \quad \text{where } \theta := \mathbb{P}_x[X_k = x \text{ for some } k \ge 1].$

In particular, $\mathbb{E}_x[N_x] < \infty$ if and only if $x$ is transient.
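As a numerical illustration (our own example, not from the text): for the biased walk on $\mathbb{Z}$ with $P(k, k+1) = 0.7$ and $P(k, k-1) = 0.3$, every state is transient, and the return probability is known to be $\theta = 2\min(p, 1-p) = 0.6$. The sketch below estimates $\theta$ by Monte Carlo, truncating each path at a finite horizon (a negligible downward bias for a transient walk, which returns early or not at all).

```python
import random

# Hedged sketch: estimate theta = P_0[X_k = 0 for some k >= 1] for the biased
# walk with up-probability p = 0.7; the exact value is 2 * min(p, 1-p) = 0.6.
random.seed(2)

def returns_to_zero(p=0.7, horizon=2000):
    x = 0
    for _ in range(horizon):
        x += 1 if random.random() < p else -1
        if x == 0:
            return True
    return False

trials = 4000
theta_hat = sum(returns_to_zero() for _ in range(trials)) / trials
print(round(theta_hat, 2))  # close to 0.6
```

Paths that return do so quickly, while the drift carries non-returning paths away, so the truncation hardly matters here.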

Lemma 0.15 (Recurrent classes are closed) Let $X$ be a Markov chain with countable state space $S$ and transition kernel $P$. Assume that $S' \subset S$ is an equivalence class of recurrent states. Then $P(x,y) = 0$ for all $x \in S'$, $y \in S \setminus S'$.

Proof Imagine that P (x, y) > 0 for some x ∈ S ′, y ∈ S\S ′. Then, since S ′ isan equivalence class, y 6 x, i.e., the process cannot return from y to x. SinceP (x, y) > 0, this shows that the process started in x has a positive probabilitynever to return to x, a contradiction.

A state x is called positively recurrent if

Ex[inf{n ≥ 1 : Xn = x}] < ∞.



Recurrent states that are not positively recurrent are called null recurrent. If two states are equivalent and one of them is positively recurrent (resp. null recurrent), then so is the other. From this, it is easy to see that a finite equivalence class of states can never be null recurrent.

The following lemma is an easy application of the principle ‘what can happen must happen’ (Proposition 0.14).

Lemma 0.16 (Finite state space) Let X = (Xk)k≥0 be a Markov chain with finite state space S and transition kernel P. Let Spos denote the set of all positively recurrent states. Then P[Xk ∈ Spos for some k ≥ 0] = 1.

By definition, the period of a state x is the greatest common divisor of {n ≥ 1 : P^n(x, x) > 0}. Equivalent states have the same period. States with period one are called aperiodic. If x is an aperiodic state, then there exists an m ≥ 1 such that P^n(x, x) > 0 for all n ≥ m. Irreducible Markov chains are called aperiodic if one, and hence all states have period one. If X = (Xk)k≥0 is an irreducible Markov chain with period n, then X′k := Xkn (k ≥ 0) defines an aperiodic Markov chain X′ = (X′k)k≥0.

The following example is of special importance.

Lemma 0.17 (Recurrence of one-dimensional random walk) The Markov chain X with state space Z and transition kernel P(k, k − 1) = P(k, k + 1) = 1/2 is null recurrent.

Proof Note that this Markov chain is irreducible and has period two, as it takes values alternately in the even and odd integers. Using Stirling’s formula, it is not hard to show that (see [LP11, Example 2.9])

P^{2k}(0, 0) ∼ 1/√(πk)  as k → ∞.

In particular, this shows that the expected number of returns to the origin E0[N0] = ∑_{k=1}^∞ P^{2k}(0, 0) is infinite, hence X is recurrent.

To prove that X is not positively recurrent, we jump a bit ahead. By Theorem 0.18 (a) and (b) below, an irreducible Markov chain is positively recurrent if and only if it has an invariant law. It is not hard to check that any invariant measure for X must be infinite, hence X has no invariant law, so it cannot be positively recurrent.
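The asymptotics used in the proof can be checked numerically from the exact formula P^{2k}(0, 0) = C(2k, k)/4^k, which counts the paths of length 2k that return to the origin:

```python
import math

# Exact two-step return probabilities of simple random walk on Z versus the
# Stirling approximation 1/sqrt(pi k).
for k in (10, 100, 1000):
    exact = math.comb(2 * k, k) / 4 ** k
    approx = 1 / math.sqrt(math.pi * k)
    print(k, exact / approx)          # the ratio tends to 1 as k grows
```

Since the partial sums of 1/√(πk) diverge, so do those of P^{2k}(0, 0), which is the recurrence criterion Ex[Nx] = ∞.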

We will later see that, more generally, random walks on Zd are recurrent in dimensions d = 1, 2 and transient in dimensions d ≥ 3.



0.9 Invariant laws

By definition, an invariant law for a Markov process with transition kernel P and countable state space S is a probability measure µ on S that is invariant under left-multiplication with P, i.e., µP = µ, or, written out per coordinate,

∑_{y∈S} µ(y)P(y, x) = µ(x)  (x ∈ S).

More generally, a (possibly infinite) measure µ on S satisfying this equation is called an invariant measure. A probability measure µ on S is an invariant law if and only if the process (Xk)k≥0 started in the initial law P[X0 ∈ · ] = µ is (strictly) stationary. If µ is an invariant law, then there also exists a stationary process X = (Xk)k∈Z, unique in distribution, such that X is a Markov process with transition kernel P and P[Xk ∈ · ] = µ for all k ∈ Z (including negative times).

A detailed proof of the following theorem can be found in [LP11, Thms 2.10 and 2.26].

Theorem 0.18 (Invariant laws) Let X be a Markov chain with countable state space S and transition kernel P. Then

(a) If µ is an invariant law and x is not positively recurrent, then µ(x) = 0.

(b) If S′ ⊂ S is an equivalence class of positively recurrent states, then there exists a unique invariant law µ of X such that µ(x) > 0 for all x ∈ S′ and µ(x) = 0 for all x ∈ S\S′.

(c) The invariant law µ from part (b) is given by

µ(x) = Ex[inf{k ≥ 1 : Xk = x}]^{−1}.  (0.6)

Sketch of proof For any x ∈ S, define µ(x) as in (0.6), with 1/∞ := 0. Since consecutive return times are i.i.d., it is not hard to prove that

lim_{n→∞} (1/n) ∑_{k=1}^n Px[Xk = x] = µ(x),  (0.7)

i.e., the process started in x spends a µ(x)-fraction of its time in x. As a result, it is not hard to show that if x is transient or null recurrent, then the process started



in any initial law satisfies P[Xn = x] → 0 as n → ∞, hence no invariant law can give positive probability to such a state.

On the other hand, if S′ ⊂ S is an equivalence class of positively recurrent states, then one can check that (0.7) holds more generally for the process started in any initial law on S′. It follows that for any such process, the Cesàro limit

µ = lim_{n→∞} (1/n) ∑_{k=1}^n P[Xk ∈ · ]

exists and does not depend on the initial law. In particular, if we start in an invariant law, then this limit must be µ, which proves uniqueness. It is not hard to check that any such Cesàro limit must be an invariant law, from which we obtain existence.

Remark Using Lemma 0.15, it is not hard to prove that a general invariant law of the process is a convex combination of invariant laws that are concentrated on one equivalence class of positively recurrent states.
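Formula (0.6) can be verified numerically on a small chain: compute µ by solving µP = µ, and compare with the reciprocal of the mean return times, which solve a linear system of their own. The 3-state kernel below is an arbitrary irreducible example:

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])
n = len(P)

# Invariant law: solve mu P = mu together with sum(mu) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
mu = np.linalg.lstsq(A, b, rcond=None)[0]

for x in range(n):
    # m[y] = expected hitting time of x from y != x, solving m = 1 + Q m,
    # where Q is P restricted to the states other than x.
    others = [y for y in range(n) if y != x]
    Q = P[np.ix_(others, others)]
    m = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    ret = 1 + P[x, others] @ m        # E_x[first return time to x]
    print(x, mu[x], 1 / ret)          # the two columns agree, as in (0.6)
```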

Theorem 0.19 (Convergence to invariant law) Let X be an irreducible positively recurrent Markov chain with invariant law µ. Then the process started in any initial law satisfies

(1/n) ∑_{k=0}^{n−1} P[Xk = x] −→ µ(x) as n → ∞  (x ∈ S).

If X is moreover aperiodic, then one also has that

P[Xk = x] −→ µ(x) as k → ∞  (x ∈ S).

On the other hand, if X is a Markov chain such that all states of X are transient or null recurrent, then the process started in any initial law satisfies

P[Xk = x] −→ 0 as k → ∞  (x ∈ S).

Proof See [LP11, Thm 2.26].
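The aperiodic case of the theorem can be illustrated by power iteration on a small kernel (the same kind of arbitrary 3-state example as before; it is aperiodic since P(0, 0) > 0):

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.5, 0.0]])   # irreducible and aperiodic
dist = np.array([1.0, 0.0, 0.0])  # start deterministically in state 0
for _ in range(200):
    dist = dist @ P               # law of X_k after one more step
print(dist)                       # approximately the invariant law mu
```

After 200 steps the law of Xk is numerically indistinguishable from the fixed point of µ ↦ µP.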

If µ is an invariant law and X = (Xk)k∈Z is a stationary process such that P[Xk ∈ · ] = µ for all k ∈ Z, then by the symmetry of the Markov property w.r.t. time reversal, the process X′ = (X′k)k∈Z defined by X′k := X−k (k ∈ Z) is also a



Markov process. By stationarity, X′ is moreover homogeneous, i.e., there exists a transition kernel P′ such that the transition probabilities P′k,k+1 of X′ satisfy P′(x, y) = P′k,k+1(x, y) for a.e. x w.r.t. µ. In general, it will not be true that P′ = P. We say that µ is a reversible law if µ is invariant and in addition the stationary processes X and X′ are equal in law. One can check that this is equivalent to the detailed balance condition

µ(x)P(x, y) = µ(y)P(y, x)  (x, y ∈ S),

which says that the process X started in P[X0 ∈ · ] = µ satisfies P[X0 = x, X1 = y] = P[X0 = y, X1 = x]. More generally, a (possibly infinite) measure µ on S satisfying detailed balance is called a reversible measure. If µ is a reversible measure and we define a (semi-)inner product of real functions f : S → R by

〈f, g〉µ := ∑_{x∈S} f(x)g(x)µ(x),

then P is self-adjoint w.r.t. this inner product:

〈f, Pg〉µ = 〈Pf, g〉µ.
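As a concrete check, random walk on a finite undirected graph (jump to a uniformly chosen neighbour) is reversible with µ(x) proportional to the degree of x. A sketch; the 4-vertex graph is an arbitrary example:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],       # adjacency matrix of a small connected graph
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
deg = A.sum(axis=1)
P = A / deg[:, None]              # jump to a uniformly chosen neighbour
mu = deg / deg.sum()              # reversible law: mu(x) proportional to deg(x)

flux = mu[:, None] * P            # flux[x, y] = mu(x) P(x, y)
print(np.allclose(flux, flux.T))  # detailed balance holds

rng = np.random.default_rng(0)
f, g = rng.standard_normal(4), rng.standard_normal(4)
inner = lambda u, v: np.sum(u * v * mu)
print(np.isclose(inner(f, P @ g), inner(P @ f, g)))   # self-adjointness holds
```

Here detailed balance is immediate: µ(x)P(x, y) = A(x, y)/∑deg, which is symmetric in x and y.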




Chapter 1

Harmonic functions

1.1 (Sub-) harmonicity

Let X be a Markov chain with countable state space S and transition kernel P. As we have seen, an invariant law of X is a vector that is invariant under left-multiplication with P. Harmonic functions¹ are functions that are invariant under right-multiplication with P. More precisely, we will say that a function h : S → R is subharmonic for X if

∑_y P(x, y)|h(y)| < ∞  (x ∈ S),

and

h(x) ≤ ∑_y P(x, y)h(y)  (x ∈ S).

We say that h is superharmonic if −h is subharmonic, and harmonic if it is both subharmonic and superharmonic.

Lemma 1.1 (Harmonic functions and martingales) Assume that h is subharmonic for the Markov chain X = (Xk)k≥0 and that E[|h(Xk)|] < ∞ (k ≥ 0). Then Mk := h(Xk) (k ≥ 0) defines a submartingale M = (Mk)k≥0 w.r.t. the filtration (F^X_k)k≥0 generated by X.

¹Historically, the term harmonic function was first used, and is still commonly used, for a smooth function f : U → R, defined on some open domain U ⊂ Rd, that solves the Laplace equation ∑_{i=1}^d ∂²f/∂x_i²(x) = 0. This is basically the same as our definition, but with our Markov chain X replaced by Brownian motion B = (Bt)t≥0. Indeed, a smooth function f solves the Laplace equation if and only if (f(Bt))t≥0 is a local martingale.




Proof This follows by writing (using (0.3)),

E[h(Xk+1) | F^X_k] = ∑_y P(Xk, y)h(y) ≥ h(Xk)  (k ≥ 0).

We will say that a state x is an absorbing state or trap for a Markov chain X if P(x, x) = 1.

Lemma 1.2 (Trapping probability) Let X be a Markov chain with countable state space S and transition kernel P, and let z ∈ S be a trap. Then the trapping probability

h(x) := Px[Xk = z for some k ≥ 0]

is a harmonic function for X.

Proof Since 0 ≤ h ≤ 1, integrability is not an issue. Now

h(x) = Px[Xk = z for some k ≥ 0]
 = ∑_y Px[Xk = z for some k ≥ 0 | X1 = y] Px[X1 = y]
 = ∑_y P(x, y) Py[Xk = z for some k ≥ 0]
 = ∑_y P(x, y) h(y).

Lemma 1.3 (Trapping estimates) Let X be a Markov chain with countable state space S and transition kernel P, and let T := {z ∈ S : z is a trap}. Assume that the chain gets trapped a.s., i.e., P[∃n ≥ 0 s.t. Xn ∈ T] = 1 (regardless of the initial law). Let z ∈ T and let h : S → [0, 1] be a subharmonic function such that h(z) = 1 and h ≡ 0 on T\{z}. Then

h(x) ≤ Px[Xk = z for some k ≥ 0].

If h is superharmonic, then the same holds with the inequality sign reversed.

Proof Since h is subharmonic, Mk := h(Xk) is a submartingale. Since h is bounded, M is uniformly integrable. Therefore, by Propositions 0.7 and 0.8, Mk → M∞ a.s. and in L1-norm, where M∞ is some random variable such that Ex[M∞] ≥ M0 = h(x). Since the chain gets trapped a.s., we have M∞ = h(Xτ),



where τ := inf{k ≥ 0 : Xk ∈ T} is the trapping time. Since h(z) = 1 and h ≡ 0 on T\{z}, we have M∞ = 1{Xτ=z} and therefore Px[Xτ = z] = Ex[M∞] ≥ h(x). If h is superharmonic, the same holds with the inequality sign reversed.

Remark 1 If S′ ⊂ S is a ‘closed’ set in the sense that P(x, y) = 0 for all x ∈ S′, y ∈ S\S′, then define φ : S → (S\S′) ∪ {∗} by φ(x) := ∗ if x ∈ S′ and φ(x) := x if x ∈ S\S′. Now (φ(Xk))k≥0 is a Markov chain that gets trapped in ∗ if and only if the original chain enters the closed set S′. In this way, Lemma 1.3 can easily be generalized to Markov chains that eventually get ‘trapped’ in one of finitely many equivalence classes of recurrent states. In particular, this applies when S is finite.

Remark 2 Lemma 1.3 tells us in particular that, provided that the chain gets trapped a.s., the function h from Lemma 1.2 is the unique harmonic function satisfying h(z) = 1 and h ≡ 0 on T\{z}. For a more general statement of this type, see Exercise 1.7 below.

Remark 3 Even in situations where it is not feasible to calculate trapping probabilities exactly, Lemma 1.3 can sometimes be used to derive lower and upper bounds for these trapping probabilities.

The following transformation is usually called an h-transform or Doob’s h-transform. Following [LPW09], we will simply call it a Doob transform.²

Lemma 1.4 (Doob transform) Let X be a Markov chain with countable state space S and transition kernel P, and let h : S → [0,∞) be a nonnegative harmonic function. Then setting S′ := {x ∈ S : h(x) > 0} and

P^h(x, y) := P(x, y)h(y)/h(x)  (x, y ∈ S′)

defines a transition kernel P^h on S′.

Proof Obviously P^h(x, y) ≥ 0 for all x, y ∈ S′. Since

∑_{y∈S′} P^h(x, y) = h(x)^{−1} ∑_{y∈S′} P(x, y)h(y) = h(x)^{−1}Ph(x) = 1  (x ∈ S′),

P^h is a transition kernel.

²The term h-transform is somewhat inconvenient for several reasons. First of all, having mathematical symbols in names of chapters or articles causes all kinds of problems for referencing. Secondly, if one performs an h-transform with a function g, then should one speak of a g-transform or an h-transform? The situation becomes even more confusing when there are several functions around, one of which may be called h.
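A concrete sketch (an assumed setting, not an example from the text): symmetric gambler's ruin on {0, . . . , N} with traps at 0 and N has the harmonic function h(x) = x/N = Px[hit N], and the corresponding Doob transform P^h is the walk conditioned to end in N.

```python
import numpy as np

N = 5
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0               # traps at 0 and N
for x in range(1, N):
    P[x, x - 1] = P[x, x + 1] = 0.5

h = np.arange(N + 1) / N              # harmonic: trapping probability at N
S1 = np.arange(1, N + 1)              # S' = {x : h(x) > 0} = {1, ..., N}
Ph = P[np.ix_(S1, S1)] * h[S1][None, :] / h[S1][:, None]
print(Ph)                             # the conditioned walk; rows sum to 1
```

Note how the conditioning pushes the walk upward: from x = 1 the transformed chain moves to 2 with probability 1, since a step to 0 is incompatible with ever hitting N.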



Proposition 1.5 (Conditioning on the future) Let X = (Xk)k≥0 be a Markov chain with countable state space S and transition kernel P, and let z ∈ S be a trap. Set S′ := {y ∈ S : y ⇝ z} and assume that P[X0 ∈ S′] > 0. Then, under the conditional law

P[(Xk)k≥0 ∈ · | Xm = z for some m ≥ 0],

the process X is a Markov process in S′ with Doob-transformed transition kernel P^h, where

h(x) := Px[Xm = z for some m ≥ 0]

satisfies h(x) > 0 if and only if x ∈ S′.

Proof Using the Markov property (in its strong form (0.4)), we observe that

P[Xn+1 = y | (Xk)0≤k≤n = (xk)0≤k≤n, Xm = z for some m ≥ 0]
 = P[Xn+1 = y | (Xk)0≤k≤n = (xk)0≤k≤n, Xm = z for some m ≥ n + 1]
 = P[Xn+1 = y, Xm = z for some m ≥ n + 1 | (Xk)0≤k≤n = (xk)0≤k≤n] / P[Xm = z for some m ≥ n + 1 | (Xk)0≤k≤n = (xk)0≤k≤n]
 = P(xn, y) Py[Xm = z for some m ≥ 0] / Pxn[Xm = z for some m ≥ 1]
 = P^h(xn, y)

for each (xk)0≤k≤n and y such that P[(Xk)0≤k≤n = (xk)0≤k≤n] > 0 and xn, y ∈ S′.

Remark At first sight, it is surprising that conditioning on the future may preserve the Markov property. What is essential here is that being trapped in z is a tail event, i.e., an event measurable w.r.t. the tail σ-algebra

T := ⋂_{k≥0} σ(Xk, Xk+1, . . .).

Similarly, if we condition a Markov chain (Xk)0≤k≤n that is defined on a finite time interval on its final state Xn, then under the conditional law, (Xk)0≤k≤n is still Markov, although no longer homogeneous.

Exercise 1.6 (Sufficient conditions for integrability) Let h : S → R be any function. Assume that E[|h(X0)|] < ∞ and there exists a constant K < ∞ such that ∑_y P(x, y)|h(y)| ≤ K|h(x)|. Show that E[|h(Xk)|] < ∞ (k ≥ 0).

Exercise 1.7 (Boundary conditions) Let X be a Markov chain with countable state space S and transition kernel P, and let T := {z ∈ S : z is a trap}. Assume



that the chain gets trapped a.s., i.e., P[∃n ≥ 0 s.t. Xn ∈ T] = 1 (regardless of the initial law). Show that for each real function φ : T → R there exists a unique bounded harmonic function h : S → R such that h = φ on T. Hint: take h(x) := E[φ(Xτ)], where τ := inf{k ≥ 0 : Xk ∈ T} is the trapping time.

Exercise 1.8 (Conditions for getting trapped) If we do not know a priori that a Markov chain eventually gets trapped, then the following fact is often useful. Let X be a Markov chain with countable state space S and transition kernel P, and let h : S → [0, 1] be a sub- or superharmonic function. Assume that for all ε > 0 there exists a δ > 0 such that

Px[|h(X1) − h(x)| ≥ δ] ≥ δ whenever ε ≤ h(x) ≤ 1 − ε.

Show that limk→∞ h(Xk) ∈ {0, 1} a.s. Hint: use martingale convergence to prove that limk→∞ h(Xk) exists and use the principle ‘what can happen must happen’ (Proposition 0.14) to show that the limit cannot take values in (0, 1).

Exercise 1.9 (Trapping estimate) Let X, S, P and h be as in Exercise 1.8. Assume that h is a submartingale and there is a point z ∈ S such that h(z) = 1 and sup_{x∈S\{z}} h(x) < 1. Show that

h(x) ≤ Px[Xk = z for some k ≥ 0].

Exercise 1.10 (Compensator) Let X = (Xk)k≥0 be a Markov chain with countable state space S and transition kernel P, and let f : S → R be a function such that ∑_y P(x, y)|f(y)| < ∞ for all x ∈ S. Assume that, for some given initial law, the process X satisfies E[|f(Xk)|] < ∞ for all k ≥ 0. Show that the compensator of (f(Xk))k≥0 is given by

Kn = ∑_{k=0}^{n−1} (Pf(Xk) − f(Xk))  (n ≥ 0).

Exercise 1.11 (Expected time till absorption: part 1) Let X be a Markov chain with countable state space S and transition kernel P, and let T := {z ∈ S : z is a trap}. Let τ := inf{k ≥ 0 : Xk ∈ T} and assume that Ex[τ] < ∞ for all x ∈ S. Show that the function

f(x) := Ex[τ]

satisfies Pf(x) − f(x) = −1 (x ∈ S\T) and f ≡ 0 on T.



Exercise 1.12 (Expected time till absorption: part 2) Let X be a Markov chain with countable state space S and transition kernel P, let T := {z ∈ S : z is a trap}, and set τ := inf{k ≥ 0 : Xk ∈ T}. Assume that f : S → [0,∞) satisfies Pf(x) − f(x) ≤ −1 (x ∈ S\T) and f ≡ 0 on T. Show that

Ex[τ] ≤ f(x)  (x ∈ S).

Hint: show that the compensator K of (f(Xk))k≥0 satisfies Kn ≤ −(n ∧ τ).
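The last two exercises suggest a practical recipe: the vector of expected absorption times solves the linear system Pf − f = −1 on S\T with f ≡ 0 on T. A sketch for symmetric gambler's ruin on {0, . . . , N} (an assumed example), whose known closed form is f(x) = x(N − x):

```python
import numpy as np

N = 6
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0               # traps at 0 and N
for x in range(1, N):
    P[x, x - 1] = P[x, x + 1] = 0.5

interior = np.arange(1, N)
Q = P[np.ix_(interior, interior)]     # kernel restricted to S \ T
f = np.linalg.solve(np.eye(N - 1) - Q, np.ones(N - 1))   # (I - Q) f = 1
print(dict(zip(interior.tolist(), f.tolist())))           # f(x) = x (N - x)
```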

Exercise 1.13 (Absorption of random walk) Consider a random walk W = (Wk)k≥0 on Z that jumps from x to x + 1 with probability p and to x − 1 with the remaining probability q := 1 − p, where 0 < p < 1. Fix n ≥ 1 and set τ := inf{k ≥ 0 : Wk ∈ {0, n}}. Calculate, for each 0 ≤ x ≤ n, the probability Px[Wτ = n].

Exercise 1.14 (First occurrence of a pattern: part 1) Let (Xk)k≥0 be i.i.d. Bernoulli random variables with P[Xk = 0] = P[Xk = 1] = 1/2 (k ≥ 0). Set

τ110 := inf{k ≥ 0 : (Xk, Xk+1, Xk+2) = (1, 1, 0)},

and define τ010 similarly. Calculate P[τ010 < τ110]. Hint: Set ⃗Xk := (Xk, Xk+1, Xk+2) (k ≥ 0). Then (⃗Xk)k≥0 is a Markov chain with transition probabilities as in the picture. Now the problem amounts to calculating the trapping probabilities for the chain stopped at τ010 ∧ τ110.

[Figure: transition diagram of the chain of triples, with states 111, 110, 011, 101, 010, 100, 001, 000.]
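One hedged way to sanity-check an answer to this exercise numerically is to build the 8-state chain of triples, make 010 and 110 absorbing, and iterate h ← Ph, which converges to the trapping probabilities h(x) = Px[absorbed in 010]:

```python
import numpy as np
from itertools import product

states = list(product([0, 1], repeat=3))
idx = {s: i for i, s in enumerate(states)}
P = np.zeros((8, 8))
for s in states:
    if s in [(0, 1, 0), (1, 1, 0)]:
        P[idx[s], idx[s]] = 1.0          # the two patterns are traps
    else:
        for bit in (0, 1):               # shift in a fair coin toss
            P[idx[s], idx[(s[1], s[2], bit)]] += 0.5

h = np.zeros(8)
h[idx[(0, 1, 0)]] = 1.0
for _ in range(500):                     # h <- P h converges to the trapping probability
    h = P @ h
print(h.mean())                          # uniform initial triple: P[tau_010 < tau_110]
```

Here the initial triple (X0, X1, X2) is uniform on the 8 states, so averaging h over all states gives the answer.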

Exercise 1.15 (First occurrence of a pattern: part 2) In the set-up of the previous exercise, calculate E[τ110] and E[τ111]. Hints: you need to solve a system



of linear equations. To find the solution, it helps to use Theorem 0.18 (c) and the fact that the uniform distribution is an invariant law. In the case of τ111, it also helps to observe that Ex[τ111] depends only on the number of ones at the end of x.

1.2 Random walk on a tree

In this section, we study random walk on an infinite tree in which every vertex has three neighbors. Such random walks have many interesting properties. At present they are of interest to us because they have many different bounded harmonic functions. As we will see in the next section, the situation for random walks on Zd is quite different.

Let T2 be an infinite tree (i.e., a connected graph without cycles) in which each vertex has degree 3 (i.e., there are three edges incident to each vertex). We will be interested in the Markov chain whose state space is the set of vertices of T2 and that jumps in each step with equal probabilities to one of the three neighboring sites.

We first need a convenient way to label vertices in such a tree. Consider a finitely generated group with generators a, b, c satisfying a = a−1, b = b−1 and c = c−1. More formally, we can construct such a group as follows. Let G be the set of all finite sequences x = x(1) · · · x(n) where n ≥ 0 (we allow for the empty sequence ∅), x(i) ∈ {a, b, c} for all 1 ≤ i ≤ n, and x(i) ≠ x(i + 1) for all 1 ≤ i < n. We define a product on G by concatenation, where we apply the rule that any two a’s, b’s or c’s next to each other cancel each other, inductively, till we obtain an element of G. So, for example,

(abacb)(cab) = abacbcab,  (abacb)(bab) = abacbbab = abacab,

and (abacb)(bcb) = abacbbcb = abaccb = abab.

With these rules, G is a group with unit element ∅, the empty sequence, and inverse (x(1) · · · x(n))−1 = x(n) · · · x(1). Note that G is not abelian, i.e., the group product is not commutative.
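The reduction rule can be implemented with a stack: since each generator is its own inverse, two equal adjacent letters cancel. A small sketch reproducing the examples above:

```python
def reduce(word):
    """Cancel equal adjacent letters (a^2 = b^2 = c^2 = e) until none remain."""
    out = []
    for ch in word:
        if out and out[-1] == ch:
            out.pop()                 # two equal adjacent letters cancel
        else:
            out.append(ch)
    return "".join(out)

def product(x, y):
    """Group product of two reduced words: concatenate, then reduce."""
    return reduce(x + y)

print(product("abacb", "cab"))        # -> "abacbcab"
print(product("abacb", "bab"))        # -> "abacab"
print(product("abacb", "bcb"))        # -> "abab"
```

A single left-to-right pass with a stack suffices because both inputs are already reduced, so new cancellations can only occur at the boundary between them.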

We can make G into a graph by drawing an edge between two elements x, y ∈ G if x = ya, x = yb, or x = yc. It is not hard to see that the resulting graph is an infinite tree in which each vertex has degree 3; see Figure 1.1.³ We let

³This is a special case of a much more general construction. Let G be a finitely generated group and let ∆ ⊂ G be a finite, symmetric (in the sense that a ∈ ∆ implies a−1 ∈ ∆) set of elements that generates G. Draw an edge between two elements a, b ∈ G if a = cb for some



|x| = |x(1) · · · x(n)| := n denote the length of an element x ∈ G. It is not hard to see that this is the same as the graph distance of x to the ‘origin’ ∅, i.e., the length of the shortest path connecting x to ∅.

[Figure 1.1: The regular tree T2, with vertices labeled ∅, a, b, c, ab, ac, ba, bc, ca, cb, aba, abc, aca, acb, bab, bac, bca, bcb, cab, cac, cba, cbc.]

Let X = (Xk)k≥0 be the Markov chain with state space G and transition probabilities

P(x, xa) = P(x, xb) = P(x, xc) = 1/3  (x ∈ G),

i.e., X jumps in each step to a uniformly chosen neighboring vertex in the graph. We call X the nearest neighbor random walk on G.

We observe that if X is such a random walk on G, then |X| = (|Xk|)k≥0 is a

c ∈ ∆ (or equivalently, by the symmetry of ∆, if b = c′a for some c′ ∈ ∆). The resulting graph is called the left Cayley graph associated with G and ∆. This is a very general method of making graphs with some sort of translation-invariant structure.



Markov chain with state space N and transition probabilities given by

Q(n, n − 1) := 1/3 and Q(n, n + 1) := 2/3  (n ≥ 1),

and Q(0, 1) := 1.

For each x = x(1) · · · x(n) ∈ G, let us write x(i) := ∅ if i > n. The following lemma shows that the random walk X is transient and walks away to infinity in a well-defined ‘direction’.

Lemma 1.16 (Transience) Let X be the random walk on G described above, started in any initial law. Then there exists a random variable X∞ ∈ {a, b, c}^{N+} such that

lim_{n→∞} Xn(i) = X∞(i) a.s.  (i ∈ N+).

Proof We may compare |X| to a random walk Z = (Zk)k≥0 on Z that jumps from n to n − 1 or n + 1 with probabilities 1/3 and 2/3, respectively. Such a random walk has independent increments, i.e., (Zk − Zk−1)k≥1 are i.i.d. random variables that take the values −1 and +1 with probabilities 1/3 and 2/3. Therefore, by the strong law of large numbers, (Zn − Z0)/n → 1/3 a.s. and therefore Zn → ∞ a.s. In particular Z visits each state only finitely often, which shows that all states are transient. It follows that the process Z started in Z0 = 0 has a positive probability of not returning to 0. Since Zn → ∞ a.s. and since |X| has the same dynamics as Z as long as it is in N+, this shows that the process started in X0 = a satisfies

Pa[|Xk| ≥ 1 ∀k ≥ 1] = P1[Zk ≥ 1 ∀k ≥ 1] > 0.

This shows that a is a transient state for X. By irreducibility, all states are transient and |Xk| → ∞ a.s., which is easily seen to imply the lemma.
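The comparison in the proof can be illustrated by simulating the distance process |X| directly: away from the root it steps +1 with probability 2/3 and −1 with probability 1/3, so |Xn|/n → 1/3 a.s. A simulation sketch (horizon chosen arbitrarily):

```python
import random

random.seed(1)
n, d = 200_000, 0
for _ in range(n):
    if d == 0:
        d = 1                     # from the root, every step increases the distance
    elif random.random() < 2 / 3:
        d += 1                    # two of the three neighbours are further out
    else:
        d -= 1                    # one neighbour is closer to the root
print(d / n)                      # close to the drift 1/3
```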

We are now ready to prove the existence of many bounded harmonic functions for the Markov chain X. Let

∂G := {x ∈ {a, b, c}^{N+} : x(i) ≠ x(i + 1) ∀i ≥ 1}.

Elements in ∂G correspond to different ways of walking to infinity. Note that ∂G is an uncountable set. In fact, if we identify elements of ∂G with points in [0, 1] written in base 3, then ∂G corresponds to a sort of Cantor set. We equip ∂G with the product σ-field, which we denote by B(∂G). (Indeed, one can check that this is the Borel σ-field associated with the product topology.)



Proposition 1.17 (Bounded harmonic functions) Let φ : ∂G → R be bounded and measurable, let X be the random walk on the tree G described above, and let X∞ be as in Lemma 1.16. Then

h(x) := Ex[φ(X∞)]  (x ∈ G)

defines a bounded harmonic function for X. Moreover, the process started in an arbitrary initial law satisfies

h(Xn) −→ φ(X∞) a.s. as n → ∞.

Proof It follows from the Markov property (in the form (0.4)) that

h(x) = Ex[φ(X∞)] = ∑_y P(x, y) Ey[φ(X∞)] = ∑_y P(x, y) h(y)  (x ∈ G),

which shows that h is harmonic. Since ‖h‖∞ ≤ ‖φ‖∞, the function h is bounded. Moreover, by (0.4) and Proposition 0.6,

h(Xn) = E_{Xn}[φ(X∞)] = E[φ(X∞) | F^X_n] −→ E[φ(X∞) | F^X_∞] = φ(X∞) a.s. as n → ∞.

For example, in Figure 1.2, we have drawn a few values of the harmonic function

h(x) := Px[X∞(1) = a]  (x ∈ G).

Although Proposition 1.17 proves that each bounded measurable function φ on ∂G yields a bounded harmonic function for the process X, we have not actually shown that different φ’s yield different h’s.

Lemma 1.18 (Many bounded harmonics) Let µ be the probability measure on ∂G defined by µ := P∅[X∞ ∈ · ]. Let φ, ψ : ∂G → R be bounded and measurable and let

h(x) := Ex[φ(X∞)] and g(x) := Ex[ψ(X∞)]  (x ∈ G).

Then h = g if and only if φ = ψ a.s. w.r.t. µ.

Proof Let us define more generally µx := Px[X∞ ∈ · ]. Since

µx(A) = ∑_z P^n(x, z) Pz[X∞ ∈ A] ≥ P^n(x, y) µy(A)



[Figure 1.2: A bounded harmonic function. The values shown at the vertices of the tree include 1/3, 2/3, 1/6, 5/6, 1/12, 23/24 and 1/24.]

(x, y ∈ G, n ≥ 0, A ∈ B(∂G)) and P is irreducible, we see that the measures (µx)x∈G are all equivalent. Thus, if φ = ψ a.s. w.r.t. µ, then they are a.s. equal w.r.t. µx for each x ∈ G, and therefore

h(x) = ∫ φ dµx = ∫ ψ dµx = g(x)  (x ∈ G).

On the other hand, if the set {φ ≠ ψ} has positive probability under µ, then by Proposition 1.17

P∅[ lim_{n→∞} h(Xn) ≠ lim_{n→∞} g(Xn) ] > 0,

which shows that there must exist x ∈ G with h(x) ≠ g(x).

Exercise 1.19 (Escape probability) Let Z = (Zk)k≥0 be the Markov chain with state space Z that jumps in each step from n to n − 1 with probability 1/3 and to n + 1 with probability 2/3. Calculate P1[Zk ≥ 1 ∀k ≥ 0]. Hint: find a suitable harmonic function for the process stopped at zero.



Exercise 1.20 (Independent increments) Let (Yk)k≥1 be i.i.d. and uniformly distributed on {a, b, c}. Define (Xn)n≥0 by the random group product (in the group G)

Xn := Y1 · · · Yn  (n ≥ 1),

with X0 := ∅. Show that X is the Markov chain with transition kernel P as defined above.

1.3 Coupling

For any x = (x(1), . . . , x(d)) ∈ Zd, let |x|₁ := ∑_{i=1}^d |x(i)| denote the ‘L1-norm’ of x. Set ∆ := {x ∈ Zd : |x|₁ = 1}. Let (Yk)k≥1 be i.i.d. and uniformly distributed on ∆ and let

Xn := ∑_{k=1}^n Yk  (n ≥ 1),

with X0 := 0. (Here we also use the symbol 0 to denote the origin 0 = (0, . . . , 0) ∈ Zd.) Then, just as in Exercise 1.20, X is a Markov chain that jumps in each time step from its present position x to a uniformly chosen position in x + ∆ = {x + y : y ∈ ∆}. We call X the symmetric nearest neighbor random walk on the integer lattice Zd. Sometimes X is also called simple random walk.

Let P denote its transition kernel. We will be interested in bounded harmonic functions for P. We will show that, in contrast to the random walk on the tree, the random walk on the integer lattice has very few bounded harmonic functions. Indeed, all such functions are constant. We will prove this using coupling, which is a technique of much more general interest, with many applications.

Usually, when we talk about a random variable X (which may be the path of a process X = (Xk)k≥0), we are not so much interested in the concrete probability space (Ω, F, P) that X is defined on. Rather, all that we usually care about is the law P[X ∈ · ] of X. Likewise, when we have in mind two random variables X and Y (for example, one binomially and the other normally distributed, or X and Y may be two Markov chains with possibly different initial states or transition kernels), then we usually do not a priori know what their joint distribution is, even if we know their individual distributions. A coupling of two random variables X and Y, in the most general sense of the word, is a way to construct X and Y together on one underlying probability space (Ω, F, P). More precisely, if X and Y are random variables defined on different underlying probability spaces, then a coupling of X and Y is a pair of random variables (X′, Y′) defined on one



underlying probability space (Ω, F, P), such that X′ is equally distributed with X and Y′ is equally distributed with Y. Equivalently, since the laws of X and Y are all we really care about, we may say that a coupling of two probability laws µ, ν defined on measurable spaces (E, E) and (F, F), respectively, is a probability measure ρ on the product space (E × F, E ⊗ F) such that the first marginal of ρ is µ and its second marginal is ν.

Obviously, a trivial way to couple any two random variables is to make them independent, but this is usually not what we are after. A typical coupling is designed to compare two random variables, for example by showing that they are close, or that one is larger than the other. The next exercise gives a simple example.

Exercise 1.21 (Monotone coupling) Let X be uniformly distributed on [0, λ] and let Y be exponentially distributed with mean λ > 0. Show that X and Y can be coupled such that X ≤ Y a.s. (Hint: note that this says that you have to construct a probability measure on [0, λ] × [0,∞) that is concentrated on {(x, y) : x ≤ y} and has the ‘right’ marginals.) Use your coupling to prove that E[X^α] ≤ E[Y^α] for all α > 0.

Now let ∆ ⊂ Zd be as defined at the beginning of this section and let P be the transition kernel on Zd defined by

P(x, y) := (1/2d) 1{y−x ∈ ∆}  (x, y ∈ Zd).

We are interested in bounded harmonic functions for P, i.e., bounded functions h : Zd → R such that Ph = h. It is somewhat inconvenient that P is periodic.⁴ In light of this, we define a ‘lazy’ modification of our transition kernel by

Plazy(x, y) := ½P(x, y) + ½1{x=y}.

Obviously, Plazy f = ½Pf + ½f, so a function h is harmonic for P if and only if it is harmonic for Plazy.

Proposition 1.22 (Coupling of lazy walks) Let X^x and X^y be two lazy random walks, i.e., Markov chains on Zd with transition kernel Plazy, and initial states X^x_0 = x and X^y_0 = y, x, y ∈ Zd. Then X^x and X^y can be coupled such that

∃n ≥ 0 s.t. X^x_k = X^y_k ∀k ≥ n a.s.

⁴Indeed, the Markov chain with transition kernel P takes values alternately in Z^d_even := {x ∈ Z^d : ∑_{i=1}^d x(i) is even} and Z^d_odd := {x ∈ Z^d : ∑_{i=1}^d x(i) is odd}.


36 CHAPTER 1. HARMONIC FUNCTIONS

Proof We start by choosing a suitable random mapping representation. Let (U_k)_{k≥1}, (I_k)_{k≥1}, and (W_k)_{k≥1} be collections of i.i.d. random variables, each collection independent of the others, such that for each k ≥ 1, U_k is uniformly distributed on {0, 1}, I_k is uniformly distributed on {1, . . . , d}, and W_k is uniformly distributed on {−1, +1}. Let e_i ∈ Z^d be defined as e_i(j) := 1_{i=j}. Then we may construct X^x inductively by setting X^x_0 := x and

X^x_k = X^x_{k−1} + U_k W_k e_{I_k}  (k ≥ 1).

Note that this says that U_k decides if we jump at all, I_k decides which coordinate jumps, and W_k decides whether up or down.

To construct also X^y on the same probability space, we define inductively X^y_0 := y and, for k ≥ 1,

X^y_k = X^y_{k−1} + (1 − U_k) W_k e_{I_k}   if X^y_{k−1}(I_k) ≠ X^x_{k−1}(I_k),
X^y_k = X^y_{k−1} + U_k W_k e_{I_k}          if X^y_{k−1}(I_k) = X^x_{k−1}(I_k).

Note that this says that X^x and X^y always select the same coordinate I_k ∈ {1, . . . , d} that is allowed to move. As long as X^x and X^y differ in this coordinate, they jump at different times, but after the first time they agree in this coordinate, they always increase or decrease this coordinate by the same amount at the same time. In particular, these rules ensure that

X^x_k(i) = X^y_k(i) for all k ≥ τ_i := inf{n ≥ 0 : X^x_n(i) = X^y_n(i)}.

Since (X^x_k, X^y_k)_{k≥0} is defined in terms of the i.i.d. random variables (U_k, I_k, W_k)_{k≥1} by a random mapping representation, the joint process (X^x, X^y) is clearly a Markov chain. We have already seen that X^x, on its own, is also a Markov chain, with the right transition kernel P_lazy. It is straightforward to check that P[X^y_{k+1} = z | (X^x_k, X^y_k)] = P_lazy(X^y_k, z) a.s. In particular, this transition probability depends only on X^y_k, hence by Lemma 0.12, X^y is an autonomous Markov chain with transition kernel P_lazy.

In view of this, our claim will follow provided we show that τ_i < ∞ a.s. for each i = 1, . . . , d. Fix i and define inductively σ_0 := 0 and

σ_k := inf{n > σ_{k−1} : I_n = i}  (k ≥ 1).

Consider the difference process

D_k := X^x_{σ_k}(i) − X^y_{σ_k}(i)  (k ≥ 0).


Then D = (D_k)_{k≥0} is a Markov process on Z that in each step jumps from z to z + 1 or z − 1 with equal probabilities, except when it is in zero, which is a trap. In other words, D is a simple random walk stopped at the first time it hits zero. By Lemma 0.17, there a.s. exists some (random) k ≥ 0 such that D_k = 0, and hence τ_i = σ_k < ∞ a.s.
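The random mapping representation used in this proof can be simulated directly. The following sketch is our illustration (names our own, coordinates indexed from 0), implementing the update rules for X^x and X^y and recording the coalescence time:

```python
import random

def coupled_lazy_walks(x0, y0, d, steps, rng):
    """Coupled lazy walks in the spirit of the proof of Proposition 1.22:
    both walks use the same coordinate I_k and the same sign W_k; they use
    the same 'move or stay' bit U_k once they agree in the chosen
    coordinate, and complementary bits while they still differ in it."""
    x, y = list(x0), list(y0)
    tau = None                        # first time the walks fully coincide
    for k in range(1, steps + 1):
        u = rng.randrange(2)          # U_k uniform on {0, 1}
        i = rng.randrange(d)          # I_k uniform on {0, ..., d-1}
        w = rng.choice((-1, 1))       # W_k uniform on {-1, +1}
        xi_old = x[i]
        x[i] += u * w
        if y[i] != xi_old:
            y[i] += (1 - u) * w       # y moves exactly when x stays put
        else:
            y[i] += u * w             # identical move: the walks stay matched
        if tau is None and x == y:
            tau = k
    return x, y, tau

rng = random.Random(1)
xf, yf, tau = coupled_lazy_walks([1, 1], [0, 0], d=2, steps=500_000, rng=rng)
# each coordinatewise difference is a stopped simple random walk, which hits
# zero a.s.; with half a million steps coalescence is overwhelmingly likely
if tau is not None:
    assert xf == yf   # once coalesced, the walks never separate again
```

Running this with different seeds illustrates that the coalescence time τ is a.s. finite, though its expectation is infinite for the stopped simple random walk.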

As a corollary of Proposition 1.22, we obtain that all bounded harmonic functions for nearest-neighbor random walk on the d-dimensional integer lattice are constant.

Corollary 1.23 (Bounded harmonic functions are constant) Let P(x, y) = (2d)^{−1} 1_{|x−y|_1 = 1} be the transition kernel of nearest-neighbor random walk on the d-dimensional integer lattice Z^d. If h : Z^d → R is bounded and satisfies Ph = h, then h is constant.

Proof Couple X^x and X^y as in Proposition 1.22. Since h is harmonic and bounded, (h(X^x_k))_{k≥0} and (h(X^y_k))_{k≥0} are martingales. It follows that

h(x) − h(y) = E[h(X^x_k)] − E[h(X^y_k)] = E[h(X^x_k) − h(X^y_k)] ≤ 2‖h‖_∞ P[X^x_k ≠ X^y_k] → 0 as k → ∞

for each x, y ∈ Z^d, proving that h is constant.

Remark Actually, a much stronger statement than Corollary 1.23 is true: for nearest-neighbor random walk on Z^d, all nonnegative harmonic functions are constant. This is called the strong Liouville property; see [Woe00, Corollary 25.5]. In general, the problem of finding all positive harmonic functions for a Markov chain leads to the (rather difficult) problem of determining the Martin boundary of the chain.

1.4 Convergence in total variation norm

In this section we turn our attention away from harmonic functions and instead show another application of coupling. We will use coupling to give a proof of the statement in Theorem 0.19 (stated without proof in the Introduction) that any aperiodic, irreducible, positively recurrent Markov chain is ergodic, in the sense that regardless of the initial state, its law at time n converges to the invariant law as n → ∞.


Recall that the total variation distance between two probability measures µ, ν on a countable set S is defined as

‖µ − ν‖_TV := max_{A⊂S} |µ(A) − ν(A)|.

The following lemma gives another formula for ‖ · ‖TV.

Lemma 1.24 (Total variation distance) For any probability measures µ, ν on a countable set S, one has

‖µ − ν‖_TV = ∑_{x: µ(x)≥ν(x)} (µ(x) − ν(x)) = ∑_{x: µ(x)<ν(x)} (ν(x) − µ(x)) = ½ ∑_{x∈S} |µ(x) − ν(x)|.

Proof Set S_+ := {x ∈ S : µ(x) > ν(x)}, S_0 := {x ∈ S : µ(x) = ν(x)}, and S_− := {x ∈ S : µ(x) < ν(x)}. Define a finite measure ρ by ρ(x) := |µ(x) − ν(x)|. Then

µ(A) − ν(A) = ρ(A ∩ S_+) − ρ(A ∩ S_−)  (A ⊂ S).

It follows that

−ρ(S_−) ≤ µ(A) − ν(A) ≤ ρ(S_+),

where either inequality becomes an equality for a suitable choice of A (A = S_− or A = S_+, respectively). Here

ρ(S_+) − ρ(S_−) = ∑_{x∈S} (µ(x) − ν(x)) = 1 − 1 = 0,

so

∑_{x: µ(x)≥ν(x)} (µ(x) − ν(x)) = ρ(S_+) = ρ(S_−) = ∑_{x: µ(x)<ν(x)} (ν(x) − µ(x)).
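For a small finite S, the equalities of Lemma 1.24 can be verified numerically by brute force, comparing the defining maximum over all subsets with the half-sum formula. The helper names below are our own:

```python
from itertools import chain, combinations

def tv_by_definition(mu, nu, S):
    """Max over all subsets A of S of |mu(A) - nu(A)| (feasible for small S)."""
    subsets = chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))
    return max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A)) for A in subsets)

def tv_by_half_sum(mu, nu, S):
    """Lemma 1.24: half the l1-distance between the densities."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in S)

S = range(4)
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
nu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
assert abs(tv_by_definition(mu, nu, S) - tv_by_half_sum(mu, nu, S)) < 1e-12
```

The maximizing subset is exactly {x : µ(x) > ν(x)} (here {2, 3}), in line with the proof.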

Theorem 1.25 (Convergence to invariant law) Let X be an irreducible, aperiodic, positively recurrent Markov chain with transition kernel P, state space S, and invariant law µ. Then the process started in any initial law satisfies

‖P[X_n ∈ ·] − µ‖_TV → 0 as n → ∞.


Proof We take the existence of an invariant law µ as proven; uniqueness will follow from our proof. Let X and X̄ be two independent Markov chains with transition kernel P, where X is started in an arbitrary initial law and P[X̄_0 ∈ ·] = µ. It is easy to see that the joint process (X, X̄) = (X_k, X̄_k)_{k≥0} is a Markov process with state space S × S. Let us denote its transition kernel by P_2, i.e., by independence,

P_2((x, x̄), (y, ȳ)) = P(x, y) P(x̄, ȳ)  (x, x̄, y, ȳ ∈ S).

We claim that P_2 is irreducible. Fix x, x̄, y, ȳ ∈ S. Since P is irreducible and aperiodic, it is not hard to see that there exists an m_1 ≥ 1 such that P^n(x, y) > 0 for all n ≥ m_1. Likewise, there exists an m_2 ≥ 1 such that P^n(x̄, ȳ) > 0 for all n ≥ m_2. Choosing n ≥ m_1 ∨ m_2, we see that

P_2^n((x, x̄), (y, ȳ)) = P^n(x, y) P^n(x̄, ȳ) > 0,

proving that P_2 is irreducible.

By Theorem 0.18 (a) and (b), an irreducible Markov chain is positively recurrent if and only if it has an invariant law. Obviously, the product measure µ ⊗ µ is an invariant law for P_2, so P_2 is positively recurrent. In particular, this proves that the stopping time

τ := inf{k ≥ 0 : X_k = X̄_k}

is a.s. finite and has, in fact, finite expectation. Let X′ = (X′_k)_{k≥0} be the process defined by

X′_k := X_k if k < τ,  and  X′_k := X̄_k if k ≥ τ.

It is not hard to see that X′ is a Markov chain with transition kernel P and initial law P[X′_0 ∈ ·] = P[X_0 ∈ ·], hence X′ is equal in law with X. Now, for each k ≥ 0,

‖P[X_k ∈ ·] − µ‖_TV = sup_{A⊂S} |P[X′_k ∈ A] − µ(A)| = sup_{A⊂S} |P[X′_k ∈ A] − P[X̄_k ∈ A]|
≤ sup_{A⊂S} P[{X′_k ∈ A, X̄_k ∉ A} or {X′_k ∉ A, X̄_k ∈ A}] ≤ P[X′_k ≠ X̄_k] = P[k < τ] → 0 as k → ∞.
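The qualitative content of Theorem 1.25 can be watched numerically on a small example: for an irreducible, aperiodic kernel, the total variation distance between the law at time n and the invariant law is non-increasing and tends to zero. The 3-state kernel below is an arbitrary example of ours:

```python
def step(dist, P):
    """One step of the chain on the level of laws: (dist P)(y)."""
    S = range(len(P))
    return [sum(dist[x] * P[x][y] for x in S) for y in S]

def tv(mu, nu):
    """Total variation distance via the half-sum formula of Lemma 1.24."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

# an irreducible, aperiodic kernel on {0, 1, 2}
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.0, 0.6]]

# approximate the invariant law by iterating long enough
pi = [1.0, 0.0, 0.0]
for _ in range(200):
    pi = step(pi, P)

dist = [0.0, 0.0, 1.0]     # an arbitrary initial law
dists = []
for n in range(30):
    dists.append(tv(dist, pi))
    dist = step(dist, P)
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(dists, dists[1:]))  # TV is non-increasing
assert dists[-1] < 1e-6                                           # and tends to zero
```

The monotone decrease reflects the general fact that one step of any kernel contracts total variation distance; the exponential rate visible here is quantified later in Proposition 2.15.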

Exercise 1.26 (Periodic kernels) Show that the probability kernel P2 in theproof of Theorem 1.25 is not irreducible if P is periodic.


Chapter 2

Eigenvalues

2.1 The Perron-Frobenius theorem

In this section, we recall the classical Perron-Frobenius theorem about the leadingeigenvalue and eigenvector of a nonnegative matrix. This theorem is usually viewedas a part of linear algebra and proved in that context. We will see that the theoremhas close connections with Markov chains and some of its statements can even beproved by probabilistic methods.

Let S be a finite set (S = {1, . . . , n} in the traditional formulation of the Perron-Frobenius theorem) and let A : S × S → C be a function. We view such functions as matrices, equipped with the usual matrix product, or, equivalently, we identify A with the linear operator A : C^S → C^S given by Af(x) := ∑_{y∈S} A(x, y)f(y), where C^S denotes the linear space consisting of all functions f : S → C. We say that A is real if A(x, y) ∈ R, and nonnegative if A(x, y) ≥ 0, for all x, y ∈ S. Note that probability kernels are nonnegative matrices. A nonnegative matrix A is called irreducible if for each x, y ∈ S there exists an n ≥ 1 such that A^n(x, y) > 0. For probability kernels, this coincides with our earlier definition of irreducibility. We let spec(A) denote the spectrum of A, i.e., the collection of (possibly complex) eigenvalues of A, and we let ρ(A) denote its spectral radius

ρ(A) := sup{|λ| : λ ∈ spec(A)}.

If ‖ · ‖ is any norm on C^S, then we define the associated operator norm ‖A‖ of A as

‖A‖ := sup{‖Af‖ : f ∈ C^S, ‖f‖ = 1}.


It is well-known that for any such operator norm

ρ(A) = lim_{n→∞} ‖A^n‖^{1/n}.  (2.1)

The following version of the Perron-Frobenius theorem can be found in [Gan00,Section 8.3] (see also, e.g., [Sen73, Chapter 1]). In this section, we will give ourown proof, with a distinctly probabilistic flavor, of this theorem and the remarkbelow it.

Theorem 2.1 (Perron-Frobenius) Let S be a finite set and let A : C^S → C^S be a linear operator whose matrix is nonnegative and irreducible. Then

(i) There exist a function f : S → [0,∞), unique up to multiplication by a positive constant, and a unique α ∈ R such that Af = αf and f is not identically zero.

(ii) f(x) > 0 for all x ∈ S.

(iii) α = ρ(A) > 0.

(iv) The algebraic multiplicity of α is one. In particular, if A is written in its Jordan normal form, then α corresponds to a block of size 1 × 1.

Remark We define periodicity for nonnegative matrices in the same way as for probability kernels. If A is moreover aperiodic, then there exists some n ≥ 1 such that A^n(x, y) > 0 for all x, y ∈ S (i.e., the same n works for all x, y). Now Perron's theorem [Gan00, Section 8.2] implies that all other eigenvalues λ of A satisfy |λ| < α. If A is not aperiodic, then it is easy to see that this statement fails in general. (This is stated incorrectly in [DZ98, Thm 3.1.1 (b)].)

We call the constant α and function f from Theorem 2.1 the Perron-Frobenius eigenvalue and eigenfunction of A, respectively. We note that if A†(x, y) := A(y, x) denotes the transpose of A, then A† is also nonnegative and irreducible. It is well-known that the spectra of a matrix and its transpose agree: spec(A) = spec(A†), and therefore also ρ(A) = ρ(A†), which implies that the Perron-Frobenius eigenvalues of A and A† are the same. The same is usually not true for the corresponding Perron-Frobenius eigenvectors. We also call eigenvectors of A and A† right and left eigenvectors, respectively, and write φA := A†φ, consistent with earlier notation for probability kernels.


2.2 Subadditivity

By definition, a function f : N_+ → R is subadditive if

f(n + m) ≤ f(n) + f(m)  (n, m ≥ 1).

The following simple lemma has many applications. Proofs can be found in many places, e.g. [Lig99, Thm B.22].

Lemma 2.2 (Fekete’s lemma) If f : N+ → R is subadditive, then the limit

limn→∞

1

nf(n) = inf

n≥1

1

nf(n)

exists in [−∞,∞).

Proof Note that we can always extend f to a subadditive function f : N → R by setting f(0) := 0. Fix m ≥ 1 and for each n ≥ 0 write n = k_m(n)m + r_m(n), where k_m(n) ≥ 0 and 0 ≤ r_m(n) < m, i.e., k_m(n) is n/m rounded down and r_m(n) is the remainder. Setting s_m := sup_{1≤r<m} f(r), we see that

f(n)/n = f(k_m(n)m + r_m(n)) / (k_m(n)m + r_m(n)) ≤ (k_m(n)f(m) + s_m) / (k_m(n)m) → f(m)/m as n → ∞,

which proves that

lim sup_{n→∞} f(n)/n ≤ f(m)/m  (m ≥ 1).

Taking the infimum over m, we conclude that

lim sup_{n→∞} f(n)/n ≤ inf_{m≥1} f(m)/m.

This shows in particular that the limit superior is less than or equal to the limit inferior, hence the limit exists. Moreover, the limit (which equals the limit superior) is given by the infimum. Since f takes values in R, the infimum is clearly less than +∞.
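A quick numerical illustration of Fekete's lemma, with the subadditive function f(n) = n + √n (our own example; subadditivity follows from √(n+m) ≤ √n + √m), whose ratios f(n)/n decrease to the infimum 1:

```python
import math

# f(n) = n + sqrt(n) is subadditive: sqrt(n + m) <= sqrt(n) + sqrt(m)
f = lambda n: n + math.sqrt(n)
for n in range(1, 50):
    for m in range(1, 50):
        assert f(n + m) <= f(n) + f(m) + 1e-12

# Fekete: f(n)/n converges to inf_n f(n)/n, which equals 1 here
# (approached as n grows but never attained)
ratios = [f(n) / n for n in range(1, 100_001)]
assert abs(ratios[-1] - 1.0) < 0.01        # the limit is 1
assert min(ratios) == ratios[-1]           # the ratios decrease toward the infimum
```

This also illustrates that the infimum in Fekete's lemma need not be attained at any finite n.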


2.3 Spectral radius

Let S be a finite set and let L(C^S, C^S) denote the space of all linear operators A : C^S → C^S. Let ‖ · ‖ be any norm on C^S and let ‖A‖ denote the associated operator norm of an operator A ∈ L(C^S, C^S). Then obviously

‖Af‖ = ‖A(f/‖f‖)‖ ‖f‖ ≤ ‖A‖ ‖f‖  (f ∈ C^S),

where the inequality between the left- and right-hand sides holds also for f = 0, even though in this case the intermediate steps are not defined. It follows that ‖ABf‖ ≤ ‖A‖ ‖B‖ ‖f‖, which in turn shows that

‖AB‖ ≤ ‖A‖ ‖B‖  (A, B ∈ L(C^S, C^S)).  (2.2)

Exercise 2.3 (Operator norm induced by supremum norm) If ‖ · ‖_∞ denotes the supremum norm on C^S, show that the associated operator norm on L(C^S, C^S) is given by

‖A‖_∞ = sup_{x∈S} ∑_{y∈S} |A(x, y)|.

Lemma 2.4 (Spectral radius) For each operator A ∈ L(C^S, C^S), the limit

ρ(A) := lim_{n→∞} ‖A^n‖^{1/n}

exists in [0,∞). For each ε > 0 there exists a C_ε < ∞ such that

ρ(A)^n ≤ ‖A^n‖ ≤ C_ε (ρ(A) + ε)^n  (n ≥ 0).  (2.3)

Proof It follows from (2.2) that ‖A^{n+m}‖ ≤ ‖A^n‖ ‖A^m‖. In other words, the function n ↦ log ‖A^n‖ is subadditive. By Lemma 2.2, it follows that the limit

lim_{n→∞} (1/n) log ‖A^n‖ = inf_{n≥1} (1/n) log ‖A^n‖

exists in [−∞,∞). Applying the exponential function to both sides of this equation, we see that the limit

ρ(A) := lim_{n→∞} ‖A^n‖^{1/n} = inf_{n≥1} ‖A^n‖^{1/n}


exists in [0,∞). In particular, this shows that ρ(A) ≤ ‖A^n‖^{1/n} (n ≥ 1), and for each ε > 0 there exists an N ≥ 0 such that

‖A^n‖^{1/n} ≤ ρ(A) + ε  (n ≥ N).

Raising both sides of our inequalities to the n-th power and choosing

C_ε := 1 ∨ sup_{0≤n<N} ‖A^n‖ (ρ(A) + ε)^{−n}

yields the desired statement (where we observe that ρ(A)^0 = 1 = ‖1‖ = ‖A^0‖).

Exercise 2.5 (Choice of the norm) Show that the limit ρ(A) from Lemma 2.4 does not depend on the choice of the norm on C^S. Hint: you can use the fact that on a finite-dimensional space, all norms are equivalent. In particular, if ‖ · ‖ and ‖ · ‖′ are two different norms on the space L(C^S, C^S), then there exist constants 0 < c < C < ∞ such that c‖A‖ ≤ ‖A‖′ ≤ C‖A‖ for all A ∈ L(C^S, C^S).

2.4 Existence of the leading eigenvector

Let A be a nonnegative real matrix with coordinates indexed by a finite set S. The Perron-Frobenius theorem tells us that if A is irreducible, then A has a unique nonnegative eigenvector, with eigenvalue ρ(A). In this section, we will prove the existence of such an eigenvector.

Lemma 2.6 (Existence of eigenvector) Let S be a finite set, let A : C^S → C^S be a linear operator whose matrix is nonnegative, and let ρ(A) be its spectral radius as defined in Lemma 2.4. Then there exists a function h : S → [0,∞) that is not identically zero such that Ah = ρ(A)h.

Proof We will treat the cases ρ(A) = 0 and ρ(A) > 0 separately. We start with the latter. In this case, for each z ∈ [0, 1/ρ(A)), let us define a function f_z : S → [0,∞) by

f_z := ∑_{n=0}^∞ z^n A^n 1,

where 1 denotes the function on S that is identically one. Note that for each z < 1/ρ(A), this series is absolutely summable in any norm on C^S by (2.3). It follows that

A f_z = ∑_{n=0}^∞ z^n A^{n+1} 1 = z^{−1}(f_z − 1)  (0 < z < ρ(A)^{−1}).  (2.4)


By Exercise 2.3 and the nonnegativity of A (and hence of A^n for any n ≥ 0),

‖A^n 1‖_∞ = sup_{x∈S} ∑_{y∈S} A^n(x, y) = ‖A^n‖_∞.

Let ‖f‖_1 := ∑_{x∈S} |f(x)| denote the ℓ¹-norm of a function f : S → C. Then, by nonnegativity and (2.3),

‖f_z‖_1 = ∑_{n=0}^∞ z^n ‖A^n 1‖_1 ≥ ∑_{n=0}^∞ z^n ‖A^n 1‖_∞ = ∑_{n=0}^∞ z^n ‖A^n‖_∞ ≥ ∑_{n=0}^∞ z^n ρ(A)^n.  (2.5)

Let h_z := f_z/‖f_z‖_1. Since h_z takes values in the compact set [0, 1]^S, we can choose z_n ↑ ρ(A)^{−1} such that h_{z_n} → h, where h is a nonnegative function with ‖h‖_1 = 1. Now (2.4) tells us that

A h_{z_n} = z_n^{−1} h_{z_n} − z_n^{−1} ‖f_{z_n}‖_1^{−1} 1.

By (2.5), ‖f_z‖_1 → ∞ as z ↑ ρ(A)^{−1}, so letting n → ∞ we obtain a nonnegative function h such that Ah = ρ(A)h and ‖h‖_1 = 1.

We are left with the case ρ(A) = 0. We will show that this implies A^n = 0 for some n ≥ 1. Indeed, by the finiteness of the state space, if A^{|S|} ≠ 0, then by nonnegativity there must exist x_0, . . . , x_n ∈ S with x_0 = x_n and A(x_{k−1}, x_k) > 0 for each k = 1, . . . , n. But then A^{nm}(x_0, x_0) ≥ η^m, where η := ∏_{k=1}^n A(x_{k−1}, x_k) > 0, which is easily seen to imply that ρ(A) ≥ η^{1/n} > 0. Thus, we see that ρ(A) = 0 implies A^n = 0 for some n. Let m := inf{n ≥ 1 : A^n 1 = 0}. Then h := A^{m−1} 1 ≠ 0 while Ah = 0 = ρ(A)h.

2.5 Uniqueness of the leading eigenvector

In the previous section, we have shown that each nonnegative matrix A has a nonnegative eigenvector with eigenvalue ρ(A). In this section, we will show that if A is irreducible, then such an eigenvector is unique. We will use probabilistic methods. Our proof will be based on the following observation.

Lemma 2.7 (Generalized Doob transform) Let S be a finite set and let A : C^S → C^S be a linear operator whose matrix is nonnegative. Assume that h : S → [0,∞) is not identically zero, α > 0, and Ah = αh. Then setting S′ := {x ∈ S : h(x) > 0} and

A^h(x, y) := A(x, y) h(y) / (α h(x))  (x, y ∈ S′)

defines a transition kernel A^h on S′.


Proof Obviously A^h(x, y) ≥ 0 for all x, y ∈ S′. Since

∑_{y∈S′} A^h(x, y) = (α h(x))^{−1} ∑_{y∈S′} A(x, y) h(y) = (α h(x))^{−1} Ah(x) = 1  (x ∈ S′),

A^h is a transition kernel.
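Lemma 2.7 is easy to sanity-check numerically: given an eigenpair Ah = αh with h > 0, the matrix A^h has rows summing to one. The 2 × 2 matrix below is a hand-picked example of ours with Ah = 2h for h = (1, 2):

```python
def doob_transform(A, h, alpha):
    """Generalized Doob transform of Lemma 2.7, restricted to S' = {h > 0}."""
    Sp = [x for x in range(len(h)) if h[x] > 0]
    return {x: {y: A[x][y] * h[y] / (alpha * h[x]) for y in Sp} for x in Sp}

# a nonnegative matrix with A h = alpha * h for h = (1, 2) and alpha = 2
A = [[1.0, 0.5],
     [2.0, 1.0]]
h = [1.0, 2.0]
alpha = 2.0
for x in range(2):                       # check the eigenvalue equation first
    assert abs(sum(A[x][y] * h[y] for y in range(2)) - alpha * h[x]) < 1e-12

Ah = doob_transform(A, h, alpha)
for x in Ah:                             # rows of A^h sum to one: a transition kernel
    assert abs(sum(Ah[x].values()) - 1.0) < 1e-12
```

Here α = 2 is in fact the Perron-Frobenius eigenvalue of A (the other eigenvalue is 0), so A^h is the Doob transform used in the uniqueness proof below.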

We need one more preparatory lemma.

Lemma 2.8 (Eigenfunctions of positively recurrent chains) Let P be a probability kernel on a finite or countably infinite space S. Assume that the Markov chain X with transition kernel P is irreducible and positively recurrent. Let f : S → (0,∞) be a bounded function and let α > 0 be a constant. Assume that Pf = αf. Then α = 1 and f is constant.

Proof Since P is irreducible and positively recurrent, by Theorem 0.18 (b) it has a unique invariant law µ, which satisfies µ(x) > 0 for all x ∈ S. Now µf = µPf = αµf, which by the fact that µf > 0 proves that α = 1, and hence f is harmonic. Let P_x denote the law of the chain started in X_0 = x, and let E_x denote expectation w.r.t. P_x. Let δ_x denote the delta measure at x, i.e., the probability measure on S such that δ_x(y) := 1_{x=y}. Then

f(x) = (1/n) ∑_{k=0}^{n−1} P^k f(x) = ((1/n) ∑_{k=0}^{n−1} δ_x P^k) f = (1/n) ∑_{k=0}^{n−1} E_x[f(X_k)] → µf as n → ∞  (x ∈ S),

where in the last step we have used Theorem 0.19. In particular, since the limit does not depend on x, the function f is constant.

We can now prove parts (i)–(iii) of the Perron-Frobenius theorem.

Proposition 2.9 (Perron-Frobenius) Let S be a finite set and let A : C^S → C^S be a linear operator whose matrix is nonnegative and irreducible. Then

(i) There exist a function f : S → [0,∞), unique up to multiplication by a positive constant, and a unique α ∈ R such that Af = αf and f is not identically zero.

(ii) f(x) > 0 for all x ∈ S.

(iii) α = ρ(A) > 0.


Proof Assume that f : S → [0,∞) is not identically zero and satisfies Af = αf for some α ∈ R. Since f is nonnegative and not identically zero, we can choose x ∈ S such that f(x) > 0. Since A is nonnegative, Af(x) ≥ 0 and therefore α ≥ 0. In fact, since A is irreducible, we can choose some n ≥ 1 such that A^n(x, x) > 0 and hence A^n f(x) > 0, so we must have α > 0. Moreover, by irreducibility, for each y ∈ S there is an m such that A^m(y, x) > 0 and hence A^m f(y) > 0, which shows that f(y) > 0 for all y ∈ S.

Now let h : S → [0,∞) and β ∈ R be another function and real constant such that h is not identically zero and Ah = βh. We will show that this implies f = ch for some c > 0 and α = β. In particular, choosing h as in Lemma 2.6, this then implies the statements of the proposition.

By what we have already proved, β > 0 and h(y) > 0 for all y ∈ S. Let A^h be the probability kernel on S′ = S (since h > 0 everywhere) defined in Lemma 2.7, with β in the role of α. Note that A^h is irreducible since A is and since h > 0 everywhere. Set g := f/h. We observe that

A^h g(x) = ∑_{y∈S} [A(x, y) h(y) / (β h(x))] [f(y)/h(y)] = Af(x) / (β h(x)) = (α/β) g(x).

Since A^h is positively recurrent by the finiteness of the state space, Lemma 2.8 implies that α/β = 1 and g(x) = c (x ∈ S) for some c > 0.

2.6 The Jordan normal form

In this section, we recall some facts from linear algebra. Let V be a finite-dimensional linear vector space over the complex numbers and let L(V, V) denote the space of linear operators A : V → V. If e_1, . . . , e_d is a basis for V, then each vector φ ∈ V can in a unique way be written as

φ = ∑_{i=1}^d φ(i) e_i,

where the φ(i) are complex numbers, the coordinates of φ with respect to the basis e_1, . . . , e_d. There exist complex numbers A(i, j) such that for any φ ∈ V, the coordinates of Aφ are given by

Aφ(i) = ∑_{j=1}^d A(i, j) φ(j)  (i = 1, . . . , d).


We call (A(i, j))_{1≤i,j≤d} the matrix of A w.r.t. the basis e_1, . . . , e_d. We warn the reader that whether a matrix is nonnegative (i.e., whether its entries A(x, y) are all nonnegative) very much depends on the choice of the basis.¹

The spectrum of A is the set of eigenvalues of A, i.e.,

spec(A) := {λ ∈ C : ∃ 0 ≠ φ ∈ V s.t. Aφ = λφ}.

The function C ∋ z ↦ p_A(z) defined by

p_A(z) := det(A − z)

is the characteristic polynomial of A. We may write

p_A(z) = ∏_{i=1}^n (λ_i − z)^{m_a(λ_i)},

where spec(A) = {λ_1, . . . , λ_n}, with λ_1, . . . , λ_n all different, and m_a(λ) is the algebraic multiplicity of the eigenvalue λ. We need to distinguish the algebraic multiplicity of an eigenvalue λ from its geometric multiplicity m_g(λ), which is the dimension of the corresponding eigenspace {φ ∈ V : Aφ = λφ}. By definition, φ ∈ V is a generalized eigenvector of A with eigenvalue λ if (A − λ)^k φ = 0 for some k ≥ 1. Then the algebraic multiplicity m_a(λ) of λ is the dimension of the generalized eigenspace {φ ∈ V : (A − λ)^k φ = 0 for some k ≥ 1}. Note that m_g(λ) ≤ m_a(λ).

It is possible to find a basis e_1, . . . , e_d such that the matrix of A w.r.t. this basis has the Jordan normal form, i.e., the matrix of A is block diagonal,

A = diag(B_1, . . . , B_m),

where each Jordan block B_k is a square matrix with some λ_k ∈ spec(A) in every diagonal entry, ones on the superdiagonal, and zeros elsewhere:

B_k =
( λ_k  1           )
(      λ_k  ⋱      )
(           ⋱   1  )
(               λ_k ).

We can read off both the algebraic and geometric multiplicity of an eigenvalue λ from the Jordan normal form of A. Indeed, m_a(λ) is simply the number of times

¹The statement that the matrix of a linear operator A is nonnegative should not be confused with the statement that A is nonnegative definite. The latter concept, which is defined only for linear spaces that are equipped with an inner product ⟨φ, ψ⟩, means that ⟨φ, Aφ⟩ is real and nonnegative for all φ. One can show that A is nonnegative definite if and only if there exists an orthonormal basis of eigenvectors of A and all eigenvalues are nonnegative. Operators whose matrix is nonnegative w.r.t. some basis, by contrast, need not be diagonalizable and their eigenvalues can be negative or even complex.


λ occurs on the diagonal, while m_g(λ) is the number of blocks having λ on the diagonal. Note that such a block maps the first basis vector (1, 0, . . . , 0)ᵀ of its invariant subspace to λ(1, 0, . . . , 0)ᵀ.

Each block in the Jordan normal form of A corresponds to an invariant subspace of generalized eigenvectors that contains exactly one eigenvector of A. The subspaces corresponding to all blocks with a given eigenvalue span the generalized eigenspace corresponding to this eigenvalue. To give a concrete example: if A has a Jordan normal form consisting of a 3 × 3 block and a 2 × 2 block with eigenvalue λ_1, a 1 × 1 block with eigenvalue λ_2, and a 2 × 2 block with eigenvalue λ_3, then the eigenspace corresponding to the eigenvalue λ_1 consists of all vectors of the form

(φ(1), 0, 0, φ(4), 0, 0, 0, 0)ᵀ,

while the generalized eigenspace corresponding to λ_1 consists of all vectors of the form

(φ(1), φ(2), φ(3), φ(4), φ(5), 0, 0, 0)ᵀ,

respectively.

Recall that in Lemma 2.4, we defined ρ(A) as ρ(A) := lim_{n→∞} ‖A^n‖^{1/n}. The next lemma relates ρ(A) to the spectrum of A. This fact is well-known, but for completeness we include a proof in Appendix A.1.

Lemma 2.10 (Gelfand’s formula) Fix any norm on V and let ‖A‖ denote theassociated operator norm of an operator A ∈ L(V, V ). Let ρ(A) be defined as in


Lemma 2.4. Then

ρ(A) = sup{|λ| : λ ∈ spec(A)}  (A ∈ L(V, V)).

2.7 The spectrum of a nonnegative matrix

We note that in the proof of Proposition 2.9, which contains the main conclusions of the Perron-Frobenius theorem, we have used very little linear algebra. In fact, we have not very often used the fact that our state space S is finite. We used the finiteness of S several times in the proof of Lemma 2.6 (existence of the leading eigenvector), for example when we used that ‖1‖_1 < ∞ or that any sequence of functions that is bounded in ℓ¹-norm has a convergent subsequence. Later, in the proof of Proposition 2.9, we needed the finiteness of S when we used that any irreducible transition kernel on a finite space is positively recurrent. In both of these places, our arguments can sometimes be replaced by more ad hoc calculations to show that certain infinite-dimensional nonnegative matrices also have a Perron-Frobenius eigenvector that is unique up to a multiplicative constant, in a suitable space of functions.

In the present section, we will use the finiteness of S in a more essential way to prove the missing part (iv) of Theorem 2.1, as well as the remark below it. In particular, we will use that each finite-dimensional square matrix can be brought into its Jordan normal form.

Lemma 2.11 (Spectrum and Doob transform) Let S be a finite set and let A : C^S → C^S be a linear operator whose matrix is nonnegative. Assume that h : S → (0,∞) and α > 0 satisfy Ah = αh, and let A^h be the probability kernel defined in Lemma 2.7. Then A^h is irreducible, resp. aperiodic, if and only if A has these properties. Moreover, a function f : S → C is an eigenvector, resp. generalized eigenvector, of A corresponding to some eigenvalue λ ∈ C if and only if f/h is an eigenvector, resp. generalized eigenvector, of A^h corresponding to the eigenvalue λ/α.

Proof Let us use the symbol h to denote the linear operator on C^S that maps a function f to hf. Similarly, let h^{−1} denote pointwise multiplication with the function 1/h(x). Then Lemma 2.7 says that A^h = α^{−1} h^{−1} A h. Recall that f is an eigenvector, resp. generalized eigenvector, of A corresponding to the eigenvalue λ if and only if (A − λ)f = 0, resp. (A − λ)^k f = 0 for some k ≥ 1. Now


A^h − λ/α = α^{−1} h^{−1} A h − α^{−1} h^{−1} λ h = α^{−1} h^{−1} (A − λ) h, and therefore

(A^h − λ/α)^k = α^{−1} h^{−1} (A − λ) h · · · α^{−1} h^{−1} (A − λ) h  (k factors) = α^{−k} h^{−1} (A − λ)^k h.

It follows that (A^h − λ/α)(f/h) = α^{−1} h^{−1} (A − λ) f is zero if and only if (A − λ)f = 0, and more generally (A^h − λ/α)^k (f/h) = α^{−k} h^{−1} (A − λ)^k f is zero if and only if (A − λ)^k f = 0.

Exercise 2.12 (Non-diagonalizable kernel) Give an example of a probability kernel P that is not diagonalizable. Hint: it is easy to give an example of a nonnegative matrix that is not diagonalizable. Now apply Lemma 2.11.

Remark Regarding the exercise above: I do not know how to give an example such that P is irreducible.

Proposition 2.13 (Spectrum of a nonnegative matrix) Let S be a finite set, let A : C^S → C^S be a linear operator whose matrix is nonnegative and irreducible, and let α = ρ(A) be its Perron-Frobenius eigenvalue. Then the algebraic multiplicity of α is one. If A is moreover aperiodic, then all other eigenvalues λ of A satisfy |λ| < α.

Proof By Lemma 2.11, it suffices to prove the proposition for the case that A is a probability kernel. Writing P instead of A to remind us of this fact, we first observe that since the constant function 1 is nonnegative and satisfies P1 = 1, by Proposition 2.9 this is the Perron-Frobenius eigenfunction of P and its eigenvalue is 1. In particular, α = ρ(P) = 1.

Let µ be the unique invariant law of P, which satisfies µ(x) > 0 for all x ∈ S.² Set V := {f ∈ C^S : µf = 0}. Since f ∈ V implies µPf = µf = 0, we see that P maps the space V into itself, i.e., V is an invariant subspace of P. We observe that for any f ∈ C^S, f − (µf)1 ∈ V. In fact, since f − c1 ∉ V for all c ≠ µf, each function in C^S can in a unique way be written as a linear combination of a function in V and the constant function 1, i.e., the space V has codimension one.

Let P|_V denote the restriction of the linear operator P to the space V. We claim that 1 is not an eigenvalue of P|_V. To prove this, imagine that f ∈ V satisfies

²Note that, actually, µ is the left Perron-Frobenius eigenvector of P.


Pf = f. Then, by the reasoning in Lemma 2.8,

f(x) = (1/n) ∑_{k=0}^{n−1} P^k f(x) → µf = 0 as n → ∞  (x ∈ S),

and hence f = 0, proving the claim.

If P is aperiodic, a stronger conclusion can be drawn. In this case, if 0 ≠ f ∈ V satisfies Pf = λf for some λ ∈ C, then by Theorem 0.19,

λ^n f(x) = P^n f(x) = δ_x P^n f = E_x[f(X_n)] → µf = 0 as n → ∞  (x ∈ S),

which is possible only if |λ| < 1.

Now imagine that f ∈ C^S (not necessarily f ∈ V) is a generalized eigenvector of P corresponding to some eigenvalue λ, i.e., (P − λ)^k f = 0. Then we may write f in a unique way as f = g + c1, where g ∈ V and c ∈ C. Now 0 = (P − λ)^k f = (P − λ)^k g + c(1 − λ)^k 1, where (P − λ)^k g ∈ V by the fact that V is an invariant subspace. Since each function in C^S can in a unique way be written as a linear combination of a function in V and the constant function 1, this implies that both (P − λ)^k g = 0 and c(1 − λ)^k 1 = 0. We now distinguish two cases.

If λ = 1, then since we have just proved that 1 ∉ spec(P|_V), the operator P|_V has no generalized eigenvectors corresponding to the eigenvalue λ, and hence g = 0. This shows that the constant function 1 is (up to constant multiples) the only generalized eigenvector of P corresponding to the eigenvalue 1, i.e., the algebraic multiplicity of the eigenvalue 1 is 1.

If λ ≠ 1, then we must have c = 0 and hence f ∈ V. In this case, if P is aperiodic, then by what we have just proved we must have |λ| < 1.

Exercise 2.14 (Periodic kernel) Let P be the probability kernel on {0, . . . , n−1} defined by

P(x, y) := 1_{y = x+1 mod n}   (0 ≤ x, y < n).

Show that P is irreducible but not aperiodic. Determine the spectrum of P.
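A quick numerical check of the exercise (an illustration using numpy; the size n = 6 is an arbitrary choice): the spectrum of this shift kernel consists of the n-th roots of unity, so every eigenvalue has modulus 1 and the chain is not aperiodic.

```python
import numpy as np

def cyclic_shift_kernel(n):
    """Deterministic kernel P(x, y) = 1_{y = x+1 mod n} on {0, ..., n-1}."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, (x + 1) % n] = 1.0
    return P

P = cyclic_shift_kernel(6)
eig = np.linalg.eigvals(P)
print(np.allclose(np.abs(eig), 1.0))   # True: all eigenvalues lie on the unit circle
```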

2.8 Convergence to equilibrium

Now that the Perron-Frobenius theorem is proved, we can start to reap some of its useful consequences. The next proposition says that for irreducible, aperiodic Markov chains with finite state space, convergence to the invariant law always happens exponentially fast.


Proposition 2.15 (Exponential convergence to equilibrium) Let S be a finite set and let P be an irreducible, aperiodic probability kernel on S. Then the Perron-Frobenius eigenvalue of P is ρ(P) = 1 and its associated eigenvector is the constant function 1. Set

γ := sup{|λ| : λ ∈ spec(P), λ ≠ 1},

and let µ denote the unique invariant law of P. Then γ < 1 and for any norm ‖·‖ on C^S and ε > 0, there exists a constant C_ε < ∞ such that

‖P^n f − (µf)1‖ ≤ C_ε (γ + ε)^n ‖f − (µf)1‖   (n ≥ 0, f ∈ C^S).

Proof Obviously P1 = 1 and 1 is a positive function, so by Proposition 2.9, this is the Perron-Frobenius eigenfunction of P and its eigenvalue is 1. As in the proof of Proposition 2.13, we set V := {f ∈ C^S : µf = 0}. We observe that V is an invariant subspace of P and f − (µf)1 ∈ V for all f ∈ C^S. Let P|_V denote the restriction of the linear operator P to V.

By Lemma 2.4, for any norm on C^S and its associated operator norm, and for any ε > 0, there exists a C_ε < ∞ such that for any f ∈ C^S,

‖P^n f − (µf)1‖ = ‖P^n(f − (µf)1)‖ ≤ ‖P^n|_V‖ ‖f − (µf)1‖ ≤ C_ε (ρ(P|_V) + ε)^n ‖f − (µf)1‖.

As we have seen near the end of the proof of Proposition 2.13, if f is an eigenvector of P corresponding to an eigenvalue λ ≠ 1, then we must have f ∈ V. By Proposition 2.13, the algebraic multiplicity of 1 is 1, hence 1 ∉ spec(P|_V). It follows that

spec(P)\{1} = spec(P|_V),

and therefore γ = ρ(P|_V). By Proposition 2.13, all eigenvalues λ ≠ 1 of P satisfy |λ| < 1, hence γ < 1.

Remark If γ is as in Proposition 2.15, then the positive quantity 1 − γ is called the absolute spectral gap of the operator P. If X = (X_k)_{k≥0} denotes the Markov chain with transition kernel P, then Proposition 2.15 says that the expectation E^x[f(X_n)] of any function f at time n converges to its equilibrium value µf exponentially fast:

|E^x[f(X_n)] − µf| ≤ C_ε e^{−(η − ε)n},   where η := −log γ.

Here C_ε can even be chosen uniformly in x (if we take ‖·‖ to be the supremum norm).
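A minimal numerical sketch of this exponential rate, with an arbitrary 2×2 kernel (not from the text) whose spectrum is {1, 0.7}, so that γ = 0.7; in this two-state case the error contracts by exactly γ per step.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # irreducible, aperiodic; spec(P) = {1, 0.7}
mu = np.array([2/3, 1/3])        # invariant law: mu P = mu
f = np.array([5.0, -1.0])        # an arbitrary function on S

g, err = f.copy(), []
for n in range(30):
    err.append(np.max(np.abs(g - (mu @ f))))   # ||P^n f - (mu f) 1||_infty
    g = P @ g                                  # advance: g = P^{n+1} f

print(err[10] / err[9])   # ≈ 0.7, the value of gamma
```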


2.9 Conditioning to stay inside a set

Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be a finite set of states such that

• P(x, y) > 0 for some x ∈ S′, y ∉ S′, i.e., it is possible to leave S′.

• The restriction of P to S′ is irreducible, i.e., it is possible to go from any state x ∈ S′ to any other state y ∈ S′ without leaving S′.

We will be interested in the process X started in any initial state in S′ and conditioned not to leave S′ till time n, in the limit that n → ∞. Obviously, our assumptions imply that starting from anywhere in S′, there is a positive probability to leave S′ at some time in the future. Since S′ is finite, this probability is uniformly bounded from below, so by the principle 'what can happen must happen' (Proposition 0.14), we have that

P[X_k ∈ S′ ∀k ≥ 0] = 0.

Let P′ := P|_{S′} denote the restriction of P to S′, i.e., the matrix (P(x, y))_{x,y∈S′} and its associated linear operator acting on C^{S′}. Since P′ is nonnegative and irreducible, it has a unique Perron-Frobenius eigenvalue α and associated left and right eigenvectors η and h, unique up to multiplicative constants, i.e., η and h are strictly positive functions on S′ such that

ηP′ = αη   and   P′h = αh.

We choose our normalization of η and h such that

π(x) := η(x)h(x)   (x ∈ S′)

is a probability measure on S′. It turns out that π is the invariant law of the Doob transformed probability kernel (P′)^h defined as in Lemma 2.7.

Lemma 2.16 (Invariant law of Doob transformed kernel) Let S′ be a finite set and let A : C^{S′} → C^{S′} be a linear operator whose matrix is irreducible and nonnegative. Let α, η and h be its Perron-Frobenius eigenvalue and left and right eigenvectors, respectively, normalized such that π(x) := η(x)h(x) (x ∈ S′) is a probability law on S′. Then π is the invariant law of the probability kernel A^h defined in Lemma 2.7.


Proof Note that since A^h is irreducible, its invariant law is unique. Now

πA^h(x) = ∑_{y∈S′} η(y)h(y) A(y, x)h(x)/(αh(y)) = α^{-1}h(x) ∑_{y∈S′} η(y)A(y, x) = α^{-1}(ηA)(x)h(x) = π(x),

which proves that π is an invariant law.

For simplicity, we assume below that P ′ is not only irreducible but also aperiodic.

Theorem 2.17 (Process conditioned not to leave a set) Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be finite. Assume that P′ := P|_{S′} is irreducible and aperiodic. Let α and η, h denote its Perron-Frobenius eigenvalue and left and right eigenvectors, respectively, normalized such that ∑_{x∈S′} η(x) = 1 and ∑_{x∈S′} η(x)h(x) = 1. Set

τ := inf{k ≥ 0 : X_k ∉ S′}.

Then

lim_{n→∞} α^{-n} P^x[n < τ] = h(x)   (x ∈ S′).   (2.6)

Moreover, for each m ≥ 1 and x ∈ S′,

P^x[(X_k)_{0≤k≤m} ∈ · | n < τ] ⟹_{n→∞} P^x[(X^h_k)_{0≤k≤m} ∈ ·],

where X^h denotes the Markov chain with state space S′ and Doob transformed transition kernel

P^h(x, y) := P′(x, y)h(y)/(αh(x))   (x, y ∈ S′).

Proof We observe that, setting x_0 := x,

P^x[n < τ] = ∑_{x_1∈S′} ··· ∑_{x_n∈S′} ∏_{k=1}^n P(x_{k−1}, x_k) = ∑_{x_n∈S′} P′^n(x_0, x_n) = P′^n 1(x).

As in the proof of Lemma 2.11, we let h and h^{-1} also denote the operators that correspond to multiplication with these functions. Then P^h = α^{-1}h^{-1}P′h and therefore

α^{-n} P^x[n < τ] = α^{-n} P′^n 1(x) = α^{-n} (αhP^h h^{-1})^n 1(x) = h(x)((P^h)^n h^{-1})(x) −→_{n→∞} h(x) πh^{-1} = h(x),


where

πh^{-1} = ∑_{x∈S′} π(x)h^{-1}(x) = ∑_{x∈S′} η(x)h(x)h^{-1}(x) = ∑_{x∈S′} η(x) = 1

is the average of the function h^{-1} with respect to the invariant law π of P^h.

To prove the remaining statement of the theorem, we observe that for any 0 ≤ m < n, x_{m+1} ∈ S′, and (x_0, . . . , x_m) ∈ (S′)^{m+1} such that

P[(X_0, . . . , X_m) = (x_0, . . . , x_m)] > 0,

we have

P^{x_0}[X_{m+1} = x_{m+1} | (X_0, . . . , X_m) = (x_0, . . . , x_m), n < τ]

= [∑_{x_{m+2}∈S′} ··· ∑_{x_n∈S′} ∏_{k=1}^n P(x_{k−1}, x_k)] / [∑_{x′_{m+1}∈S′} ··· ∑_{x′_n∈S′} (∏_{k=1}^m P(x_{k−1}, x_k))(∏_{k=m+1}^n P(x′_{k−1}, x′_k))]   (where x′_m := x_m)

= P(x_m, x_{m+1}) P′^{n−m−1}1(x_{m+1}) / P′^{n−m}1(x_m).

This shows that, conditional on the event n < τ, the process (X_0, . . . , X_n) is a time-inhomogeneous Markov chain whose transition kernel in the (m + 1)-th step is given by

P^{(n)}_{m,m+1}(x, y) = P(x, y)P′^{n−m−1}1(y) / P′^{n−m}1(x) = [P(x, y) α^{−(n−m−1)}P′^{n−m−1}1(y)] / [α · α^{−(n−m)}P′^{n−m}1(x)].

Since we have already shown that α^{-n}P′^n 1 → h as n → ∞, the kernels P^{(n)}_{m,m+1}(x, y) converge to P^h(x, y), and the result follows.
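Formula (2.6) can be illustrated numerically. The substochastic matrix below is an arbitrary example of a restriction P′ (not from the text); its missing row mass is the probability of leaving S′ in one step.

```python
import numpy as np

Pp = np.array([[0.4, 0.3, 0.1],
               [0.2, 0.3, 0.3],
               [0.1, 0.4, 0.4]])   # irreducible, aperiodic, row sums < 1

w, V = np.linalg.eig(Pp)
alpha = np.max(w.real)                        # Perron-Frobenius eigenvalue
h = np.abs(V[:, np.argmax(w.real)].real)      # right eigenvector: P'h = alpha h
wl, U = np.linalg.eig(Pp.T)
eta = np.abs(U[:, np.argmax(wl.real)].real)   # left eigenvector: eta P' = alpha eta
eta /= eta.sum()                              # normalization: sum_x eta(x) = 1
h /= eta @ h                                  # normalization: sum_x eta(x)h(x) = 1

# P_x[n < tau] = (P'^n 1)(x); formula (2.6) says alpha^{-n} P'^n 1 -> h.
n = 100
surv = np.linalg.matrix_power(Pp, n) @ np.ones(3)
print(np.allclose(surv / alpha**n, h, rtol=1e-6))   # True
```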

Exercise 2.18 (Behavior at typical times) Let 0 ≤ m_n ≤ n be such that m_n → ∞ and n − m_n → ∞ as n → ∞. Show that for any x ∈ S′,

P^x[X_{m_n} ∈ · | n < τ] ⟹_{n→∞} π,

where π = ηh is the invariant law of the Doob transformed probability kernel P^h.

Exercise 2.18 shows that, conditional on the unlikely event that n < τ where n is large, during most of the time up to n the process X is approximately distributed according to the invariant law π of the Doob transformed probability kernel P^h. We will see below that at the time n itself, the situation is different.


Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be finite. By definition, we say that a probability measure ρ on S′ is a quasi-stationary law³ of P on S′ if

P[X_0 ∈ ·] = ρ implies P[X_1 ∈ · | X_1 ∈ S′] = ρ.

It seems that quasi-stationary laws were first introduced by Darroch and Seneta in [DS67].

Proposition 2.19 (Quasi-stationary law) Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be finite. Assume that P′ := P|_{S′} is irreducible and aperiodic. Let α and η, h denote its Perron-Frobenius eigenvalue and left and right eigenvectors, respectively, normalized such that ∑_{x∈S′} η(x) = 1 and ∑_{x∈S′} η(x)h(x) = 1. Then η is the unique quasi-stationary law of P on S′ and the process started in any initial law satisfies

lim_{n→∞} P[X_n ∈ · | n < τ] = η,

where τ := inf{k ≥ 0 : X_k ∉ S′}.

Proof Define finite measures µ_n on S′ by

µ_n(x) := P[X_n = x, n < τ]   (x ∈ S′, n ≥ 0).

Then

µ_{n+1}(x) = P[X_{n+1} = x, n + 1 < τ] = ∑_{y∈S′} P[X_n = y, n < τ]P(y, x) = µ_nP′(x)

(x ∈ S′), i.e., µ_{n+1} = µ_nP′. As before, we let h and h^{-1} denote the operators that correspond to multiplication with these functions. Then P^h = α^{-1}h^{-1}P′h and therefore

µ_n = µ_0P′^n = µ_0(αhP^h h^{-1})^n = α^n µ_0 h (P^h)^n h^{-1}.   (2.7)

Let us define a probability measure ν on S′ by

ν(x) := (µ_0h)^{-1} µ_0(x)h(x)   (x ∈ S′).

³Usually, the terms 'stationary law' and 'invariant law' can be used interchangeably. In this case, this may lead to confusion, however, since the term 'quasi-invariant measure' is normally used in ergodic theory for a measure that is mapped into an equivalent measure by some transformation.


We observe that

α^n µ_0 h (P^h)^n = α^n (µ_0h) ν(P^h)^n,

where ν(P^h)^n is just the law at time n of a Markov chain with transition kernel P^h and initial law ν. Since P^h is irreducible and aperiodic, ν(P^h)^n converges as n → ∞ to the invariant law π given by π(x) = η(x)h(x). Inserting this into (2.7), we see that

α^{-n} µ_n = µ_0 h (P^h)^n h^{-1} −→_{n→∞} (µ_0h) πh^{-1} = (µ_0h) η.

It follows that

P[X_n ∈ · | n < τ] = µ_n/µ_n1 = α^{-n}µ_n/(α^{-n}µ_n1) ⟹_{n→∞} (µ_0h)η/((µ_0h)η1) = η,

where we have used that η1 = 1 by the choice of our normalization.
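A numerical sketch of Proposition 2.19, using the same kind of arbitrary substochastic example as before (not from the text): iterating µ_{n+1} = µ_nP′ and renormalizing converges to the left eigenvector η.

```python
import numpy as np

Pp = np.array([[0.4, 0.3, 0.1],
               [0.2, 0.3, 0.3],
               [0.1, 0.4, 0.4]])   # arbitrary substochastic restriction P'

wl, U = np.linalg.eig(Pp.T)
eta = np.abs(U[:, np.argmax(wl.real)].real)
eta /= eta.sum()                    # the quasi-stationary law, sum eta = 1

mu = np.array([1.0, 0.0, 0.0])      # X_0 = 0 deterministically
for _ in range(200):
    mu = mu @ Pp                    # mu_n(x) = P[X_n = x, n < tau]

cond = mu / mu.sum()                # law of X_n conditioned on {n < tau}
print(np.allclose(cond, eta, atol=1e-8))   # True
```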


Chapter 3

Intertwining

3.1 Intertwining of Markov chains

In linear algebra, an intertwining relation is a relation between linear operators of the form

AB = BĀ.

In particular, if B is invertible, then this says that Ā = B^{-1}AB, or, in other words, that the matrices A and Ā are similar. In this case, we may associate B with a change of basis, and Ā corresponds to the matrix A written in terms of a different basis. In general, however, B need not be invertible. It is especially in this case that the word intertwining is used.

We will be especially interested in the case that A, Ā and B are (linear operators corresponding to) probability kernels. So let us assume that we have probability kernels P, P̄ on countable spaces S, S̄, respectively, and a probability kernel K from S to S̄ (i.e., S × S̄ ∋ (x, y) ↦ K(x, y)), such that

PK = KP̄.   (3.1)

Note that P, P̄ and K correspond to linear operators P : C^S → C^S, P̄ : C^S̄ → C^S̄, and K : C^S̄ → C^S, so both sides of the equation (3.1) correspond to a linear operator from C^S̄ into C^S. We need some examples to see that this sort of relation between probability kernels can really happen.

Exclusion process. Fix n ≥ 2 and let C_n := Z/n, i.e., C_n = {0, . . . , n − 1} with addition modulo n. (I.e., C_n is the cyclic group with n elements.) Let



S := {0, 1}^{C_n}, i.e., S consists of all finite sequences x = (x(0), . . . , x(n − 1)) indexed by C_n. Let (I_k)_{k≥1} be i.i.d. and uniformly distributed on C_n. For each x ∈ S, we may define a Markov chain X = (X_k)_{k≥0}, started in X_0 = x and with values in S, by setting

X_{k+1}(i) := { X_k(I_{k+1} + 1)  if i = I_{k+1},
                X_k(I_{k+1})      if i = I_{k+1} + 1,
                X_k(i)            otherwise.          (3.2)

In words, this says that in each time step, we choose a uniformly distributed position I ∈ C_n and exchange the values of X in I and I + 1 (where we calculate modulo n). Note that we have described our Markov chain in terms of a random mapping representation. In particular, it is clear from this construction that X is a Markov chain.
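The random mapping representation (3.2) translates directly into code; the sketch below uses numpy, and the initial configuration is an arbitrary choice.

```python
import numpy as np

def exclusion_step(x, rng):
    """One step of (3.2): draw a uniform position I in C_n and swap
    the values of x at positions I and I+1 (mod n)."""
    n = len(x)
    i = rng.integers(n)
    y = x.copy()
    y[i], y[(i + 1) % n] = y[(i + 1) % n], y[i]
    return y

rng = np.random.default_rng(0)
x = np.array([1, 1, 0, 0, 0, 1])
for _ in range(100):
    x = exclusion_step(x, rng)
print(x.sum())   # swaps conserve the number of particles: prints 3
```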

Thinning Let C be any countable set and let S := {0, 1}^C be the set of all x = (x(i))_{i∈C} with x(i) ∈ {0, 1} for all i ∈ C, i.e., S is the set of all sequences of zeros and ones indexed by C. Fix 0 ≤ p ≤ 1, and let (χ_i)_{i∈C} be i.i.d. Bernoulli (i.e., {0, 1}-valued) random variables with P[χ_i = 1] = p. We define a probability kernel K_p from S to S by

K_p(x, ·) = P[(χ_i x(i))_{i∈C} ∈ ·]   (x ∈ S).   (3.3)

Note that (χ_i x(i))_{i∈C} is obtained from x by setting some coordinates of x to zero, independently for each i ∈ C, where each coordinate x(i) that is one has probability p to remain one and probability 1 − p to become a zero. We describe this procedure as thinning the ones with parameter p.

Exercise 3.1 (Thinning of exclusion processes) Let P be the transition kernel of the exclusion process on C_n described above, with state space S = {0, 1}^{C_n}, and for 0 ≤ p ≤ 1, let K_p be the kernel from S to S corresponding to thinning with parameter p. Show that

PK_p = K_pP   (0 ≤ p ≤ 1).

Sum of two processes Let P be a transition kernel on a countable space S and let X(1) = (X_k(1))_{k≥0} and X(2) = (X_k(2))_{k≥0} be two independent Markov chains with transition kernel P and possibly different deterministic initial states X_0(i) = x(i) (i = 1, 2). Then (X(1), X(2)) = (X_k(1), X_k(2))_{k≥0} is a Markov chain with values in the product space S × S. We may view (X(1), X(2)) as two particles,


walking around in the space S. Now maybe we are not interested in which particle is where, but only in how many particles are at which place. In that case, we may look at the process

Y_k(i) := 1_{X_k(1)=i} + 1_{X_k(2)=i}   (i ∈ S, k ≥ 0),

which takes values in the space S̄ consisting of all functions y : S → {0, 1, 2} such that ∑_{i∈S} y(i) = 2. Note that Y just counts how many particles are present on each site i ∈ S. It is not hard to see that Y = (Y_k)_{k≥0} is an autonomous Markov chain.

Exercise 3.2 (Counting process) Let (X(1), X(2)) and Y be the Markov chains with state spaces S × S and S̄ described above. Let P_2 be the transition kernel of (X(1), X(2)) and let P̄ be the transition kernel of Y. For each y ∈ S̄, let K(y, ·) be the uniform distribution on the set

U_y := {(x(1), x(2)) ∈ S × S : y = 1_{x(1)} + 1_{x(2)}},

where 1_x(i) := 1_{x=i}. Note that K is a probability kernel from S̄ to S × S. From y, we can see that there are two particles and where these are, but not which is the first and which is the second particle. All K does is arbitrarily ordering the particles in y. Show that

P̄K = KP_2.

This example can easily be generalized to any number of independent Markovchains (all with the same transition kernel).

Conditioning on the future As a third example, we look at a Markov chain X with finite state space S and transition kernel P. We assume that P is such that all states in S are transient, except for two states z_0 and z_1, which are traps. We set

h_i(x) := P^x[X_k = z_i for some k ≥ 0]   (i = 0, 1).

By Lemma 1.2, we know that h_0 and h_1 are harmonic functions. Since all states except z_0 and z_1 are transient and our state space is finite, we have h_0 + h_1 = 1. By Proposition 1.5, the process X conditioned to be eventually trapped in z_i is itself a Markov chain, with state space {x ∈ S : h_i(x) > 0} and Doob transformed transition kernel P^{h_i}. Set

S̄ := {(x, i) : x ∈ S, i ∈ {0, 1}, h_i(x) > 0}.


We define a probability kernel P̄ on S̄ by

P̄((x, i), (y, j)) := 1_{i=j} P^{h_i}(x, y)   (x, y ∈ S, i, j ∈ {0, 1}).

Note that if (X, I) = (X_k, I_k)_{k≥0} is a Markov chain with transition kernel P̄, then I never changes its value, and depending on whether I_k = 0 for all k ≥ 0 or I_k = 1 for all k ≥ 0, the process X is our original Markov chain X conditioned to be trapped in either z_0 or z_1.

Exercise 3.3 (Conditioning on the future) In the example above, define a probability kernel K from S to S̄ by

K(x, (y, i)) := 1_{x=y} h_i(x)   (x, y ∈ S, i ∈ {0, 1}).

Show that

PK = KP̄.

Returning to our general set-up, we observe that the intertwining relation (3.1) implies that for any probability measure µ on S,

µP^nK = µKP̄^n   (n ≥ 0).

This formula has the following interpretation. If we start the Markov chain with transition kernel P in the initial law µ, run it till time n, and then apply the kernel K to its law, then the result is the same as if we start the Markov chain with transition kernel P̄ in the initial law µK, run it till time n, and look at its law.

More concretely, in our three examples, this says:

• If we start an exclusion process in some initial law, run it till time n, andthen thin it with the parameter p, then the result is the same as when wethin the initial state with p, and then run the process till time n.

• If we start the counting process Y in any initial law, run it to time n, andthen arbitrarily order the particles, then the result is the same as when wefirst arbitrarily order the particles, and then run the process (X(1), X(2))till time n.

• If we run the two-trap Markov chain X till time n, and then assign it a value0 or 1 according to the probability, given its present state Xn, that it willeventually get trapped in z0 or z1, respectively, then the result is the sameas when we assign such values at time zero, and then run the appropriateDoob transformed Markov chain till time n.


None of these statements comes as a big surprise, but we will later see less trivial examples of this phenomenon.

3.2 Markov functionals

We already know that functions of Markov chains usually do not have the Markov property. An exception, as we have seen in Lemma 0.12, is the case when a function of a Markov chain is autonomous. Let us quickly recall what this means. Let X be a Markov chain with countable state space S and transition kernel P, and let ψ : S → S̄ be a function from S into some other countable set S̄. Then we say that (Y_k)_{k≥0} := (ψ(X_k))_{k≥0} is an autonomous Markov chain if

P[ψ(X_{k+1}) = y | X_k = x]

depends on x only through ψ(x). Equivalently, this says that there exists a transition kernel P̄ on S̄ such that

P̄(y, y′) = ∑_{x′: ψ(x′)=y′} P(x, x′)   ∀x ∈ S s.t. ψ(x) = y.   (3.4)

If (3.4) holds, then, regardless of the initial law of X, the process ψ(X) = (ψ(X_k))_{k≥0} is a Markov chain with transition kernel P̄.

The next theorem shows that sometimes a function ψ(X) of a Markov chain X can be a Markov chain itself, even when it is not autonomous. In this case, however, this is usually only true for certain special initial laws of X. It seems this result is due to Rogers and Pitman [RP81]. For better comparison with other results, in the theorem below, it will be convenient to interchange the roles of X and Y, i.e., X will be a function of Y and not the other way around. Note that if X and Y are random variables such that P[Y ∈ · | X] = K(X, ·), then formula (3.5) below says that ψ(Y) = X a.s. Thus, the probability kernel K from S to S̄ is in a sense the 'inverse' of the function ψ : S̄ → S.

Theorem 3.4 (Markov functionals) Let Y be a Markov chain with countable state space S̄ and transition kernel P̄, and let ψ : S̄ → S be a function from S̄ into some other countable set S. Let K be a probability kernel from S to S̄ such that

{y ∈ S̄ : K(x, y) > 0} ⊂ {y ∈ S̄ : ψ(y) = x}   (x ∈ S),   (3.5)

and let P be a transition kernel on S such that

PK = KP̄.   (3.6)


Then

P[Y_0 = y | ψ(Y_0)] = K(ψ(Y_0), y) a.s.   (y ∈ S̄)   (3.7)

implies that

P[Y_n = y | (ψ(Y_0), . . . , ψ(Y_n))] = K(ψ(Y_n), y) a.s.   (y ∈ S̄, n ≥ 0).   (3.8)

Moreover, (3.7) implies that the process ψ(Y) = (ψ(Y_k))_{k≥0}, on its own, is a Markov chain with transition kernel P.

Proof Formula (3.8) says that

P[Y_n = y | (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)] = K(x_n, y) a.s.

for all n ≥ 0, all (x_0, . . . , x_n) ∈ S^{n+1} such that

P[(ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)] > 0,

and all y ∈ S̄ such that ψ(y) = x_n. We will prove this by induction. Assume that the statement holds for some n ≥ 0. We wish to prove the statement for n + 1. Let (x_0, . . . , x_{n+1}) be such that

P[(ψ(Y_0), . . . , ψ(Y_{n+1})) = (x_0, . . . , x_{n+1})] > 0,

and let y ∈ S̄ be such that ψ(y) = x_{n+1}. Then

P[Y_{n+1} = y | (ψ(Y_0), . . . , ψ(Y_{n+1})) = (x_0, . . . , x_{n+1})]

= P[Y_{n+1} = y, ψ(Y_{n+1}) = x_{n+1} | (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)]
  / P[ψ(Y_{n+1}) = x_{n+1} | (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)]

= P[Y_{n+1} = y | (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)]
  / P[ψ(Y_{n+1}) = x_{n+1} | (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)].   (3.9)

We will treat the numerator and denominator separately. To shorten notation, let us denote the conditional law given (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n) by

P_{(x_0,...,x_n)}[·] := P[· | (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)].

Then

P_{(x_0,...,x_n)}[Y_{n+1} = y] = ∑_{y′: ψ(y′)=x_n} P_{(x_0,...,x_n)}[Y_{n+1} = y | Y_n = y′] P_{(x_0,...,x_n)}[Y_n = y′].


Here, by our induction assumption,

P_{(x_0,...,x_n)}[Y_n = y′] = K(x_n, y′)   ∀y′ ∈ S̄ s.t. ψ(y′) = x_n,

while by the Markov property of Y,

P_{(x_0,...,x_n)}[Y_{n+1} = y | Y_n = y′] = P[Y_{n+1} = y | Y_n = y′, (ψ(Y_0), . . . , ψ(Y_n)) = (x_0, . . . , x_n)] = P[Y_{n+1} = y | Y_n = y′] = P̄(y′, y).

Thus, we see that

P_{(x_0,...,x_n)}[Y_{n+1} = y] = ∑_{y′: ψ(y′)=x_n} K(x_n, y′)P̄(y′, y) = ∑_{y′∈S̄} K(x_n, y′)P̄(y′, y) = KP̄(x_n, y) = PK(x_n, y),

where we have used that K(x_n, ·) is concentrated on {y′ : ψ(y′) = x_n} and in the last step we have applied our intertwining assumption.

We observe that by (3.5) and the fact that ψ(y) = x_{n+1},

PK(x_n, y) = ∑_{x∈S} P(x_n, x)K(x, y) = ∑_{x∈S} P(x_n, x)K(x, y)1_{ψ(y)=x} = P(x_n, x_{n+1})K(x_{n+1}, y),

and therefore

P_{(x_0,...,x_n)}[Y_{n+1} = y] = P(x_n, x_{n+1})K(x_{n+1}, y).

It follows that

P_{(x_0,...,x_n)}[ψ(Y_{n+1}) = x_{n+1}] = ∑_{y: ψ(y)=x_{n+1}} P_{(x_0,...,x_n)}[Y_{n+1} = y] = ∑_{y: ψ(y)=x_{n+1}} P(x_n, x_{n+1})K(x_{n+1}, y) = P(x_n, x_{n+1}).   (3.10)

Inserting our last two formulas into (3.9), we obtain

P[Y_{n+1} = y | (ψ(Y_0), . . . , ψ(Y_{n+1})) = (x_0, . . . , x_{n+1})] = P(x_n, x_{n+1})K(x_{n+1}, y) / P(x_n, x_{n+1}) = K(x_{n+1}, y).


This completes our proof of (3.8). Moreover, formula (3.10) shows that ψ(Y ) is aMarkov chain with transition kernel P .

Let us see how Theorem 3.4 relates to the examples developed in the previoussection.

Counting process Let (X(1), X(2)) be two independent Markov processes, where each process takes values in S and has transition kernel P, as in Exercise 3.2. Let Y and S̄ be as defined there and let ψ : S × S → S̄ be the function

ψ(x(1), x(2)) := 1_{x(1)} + 1_{x(2)},

so that Y = ψ(X(1), X(2)). Then the kernel K defined in Exercise 3.2 satisfies

{(x(1), x(2)) : K(y, (x(1), x(2))) > 0} ⊂ {(x(1), x(2)) : ψ(x(1), x(2)) = y}.

Since moreover P̄K = KP_2, all assumptions of Theorem 3.4 are fulfilled, so we find that

P[(X_0(1), X_0(2)) = (x(1), x(2)) | Y_0] = K(Y_0, (x(1), x(2))) a.s.   (3.11)

for all (x(1), x(2)) ∈ S × S implies that

P[(X_n(1), X_n(2)) = (x(1), x(2)) | (Y_0, . . . , Y_n)] = K(Y_n, (x(1), x(2))) a.s.   (3.12)

for all (x(1), x(2)) ∈ S × S and n ≥ 0, and under the same assumption, Y = ψ(X(1), X(2)) is a Markov chain with transition kernel P̄. Note that this latter conclusion in fact holds even without the assumption (3.11), since Y is autonomous. Formula (3.12) tells us that if initially we do not know the order of the particles, then by observing the process Y up to time n, we obtain no information about the order of the particles.

Conditioning on the future Let X and (X, I) be Markov chains with state spaces S and S̄ and transition kernels P and P̄ as in Exercise 3.3. Thus, X is a Markov chain that eventually gets trapped in one of the two traps z_0 and z_1, and (X, I) is the same process where the second coordinate I tells us from the beginning in which of the two traps we will end up.

Define ψ : S̄ → S by

ψ(x, i) := x   ((x, i) ∈ S̄),

i.e., ψ is just the projection on the first coordinate. Then the kernel K from Exercise 3.3 satisfies

{(y, i) ∈ S̄ : K(x, (y, i)) > 0} ⊂ {(y, i) ∈ S̄ : ψ(y, i) = x}   (x ∈ S)


and PK = KP̄, so all assumptions of Theorem 3.4 are fulfilled. We therefore conclude that

P[(X_0, I_0) = (x, i) | X_0] = K(X_0, (x, i)) a.s.   ((x, i) ∈ S̄)

implies that

P[(X_n, I_n) = (x, i) | (X_0, . . . , X_n)] = K(X_n, (x, i)) a.s.

for all n ≥ 0 and (x, i) ∈ S̄. More simply formulated, this says that

P[I_0 = i | X_0] = h_i(X_0) a.s.   (i = 0, 1)

implies that

P[I_n = i | (X_0, . . . , X_n)] = h_i(X_n) a.s.   (n ≥ 0, i = 0, 1).   (3.13)

Moreover, under the same assumption, the process X, on its own, is a Markov chain with transition kernel P. Note that this is more surprising than in the previous example, since in the present set-up, X is not autonomous as part of the joint Markov chain (X, I). Indeed, X evolves according to the transition kernel P^{h_0}, resp. P^{h_1}, depending on whether I = 0 or I = 1.

To understand how it is possible that X has the Markov property even though it is not autonomous, we observe that by (3.13), if we observe the whole path of the process X up to time n, then we do not gain more information about the present state of I than we would get from knowing X_n. As a result,

P[X_{n+1} = x_{n+1} | (X_0, . . . , X_n) = (x_0, . . . , x_n)]
= P[X_{n+1} = x_{n+1} | (X_n, I_n) = (x_n, 0)] h_0(x_n) + P[X_{n+1} = x_{n+1} | (X_n, I_n) = (x_n, 1)] h_1(x_n),

which depends only on x_n and not on (x_0, . . . , x_{n−1}).

Thinning of exclusion processes This example does not satisfy condition (3.5),hence Theorem 3.4 is not applicable. In view of this and other examples, we willprove a more general theorem in the next section.

3.3 Intertwining and coupling

In Theorem 3.4 we have seen how intertwining is related to the problem of Markov functionals, i.e., the question whether certain functions of a Markov chain themselves have the Markov property. The intertwinings in Theorem 3.4 are of a special


kind, because of the condition (3.5). In this section, we will see that intertwining relations between two Markov chains X and Y in general give rise to couplings between X and Y such that (3.8) holds. The following result is due to Diaconis and Fill [DF90, Thm 2.17]; the continuous-time analogue has been proved in [Fil92, Thm. 2].

Theorem 3.5 (Intertwining coupling) Let P and P̄ be probability kernels on countable state spaces S and S̄, respectively. Assume that K is a probability kernel from S to S̄ such that

PK = KP̄.

Let

P_{y′}(x, x′) := P(x, x′)K(x′, y′) / PK(x, y′)   (x, x′ ∈ S, y′ ∈ S̄, PK(x, y′) > 0),   (3.14)

and choose for P_{y′}(x, ·) any probability law on S if PK(x, y′) = 0. Then P_{y′} is a probability kernel on S for each y′ ∈ S̄, and

P̂(x, y; x′, y′) := P̄(y, y′)P_{y′}(x, x′)   ((x, y), (x′, y′) ∈ Ŝ)   (3.15)

defines a probability kernel on Ŝ := {(x, y) ∈ S × S̄ : K(x, y) > 0}, where P̂ does not depend on the freedom in the choice of the P_{y′}. If (X, Y) is the Markov chain with transition kernel P̂ started in an initial law such that

P[Y_0 = y | X_0] = K(X_0, y)   (y ∈ S̄),

then

P[Y_n = y | (X_k)_{0≤k≤n}] = K(X_n, y)   (y ∈ S̄, n ≥ 0),

and X, on its own, is a Markov chain with transition kernel P.

Remark It is clear from (3.15) that in the joint Markov chain (X, Y), the second component Y is autonomous with transition kernel P̄, but X is in general not autonomous, unless P_{y′} can be chosen so that it does not depend on y′.

Proof of Theorem 3.5 We observe that

∑_{x′∈S} P_{y′}(x, x′) = PK(x, y′)/PK(x, y′) = 1   (PK(x, y′) > 0),

which shows that P_{y′} is a probability kernel. To show that the definition of P̂ does not depend on how we define P_{y′}(x, ·) when PK(x, y′) = 0, we observe that


(x, y) ∈ Ŝ and P̄(y, y′) > 0 imply that K(x, y)P̄(y, y′) > 0 and hence PK(x, y′) = KP̄(x, y′) > 0.

To prove the remaining statements, let K̂ be the probability kernel from S to Ŝ defined by

K̂(x, (x′, y′)) := 1_{x=x′}K(x, y′)   (x ∈ S, (x′, y′) ∈ Ŝ),

and define ψ : Ŝ → S by ψ(x, y) := x. If we show that

PK̂ = K̂P̂,

then the remaining claims follow from Theorem 3.4. We observe that

PK̂(x; x′, y′) = ∑_{x″∈S} P(x, x″)1_{x″=x′}K(x″, y′) = P(x, x′)K(x′, y′)

and

K̂P̂(x; x′, y′) = ∑_{(x″,y″)∈Ŝ} 1_{x=x″}K(x, y″)P̂(x″, y″; x′, y′) = ∑_{y″∈S̄} K(x, y″)P̂(x, y″; x′, y′),

so that PK̂ = K̂P̂ can be written coordinatewise as

P(x, x′)K(x′, y′) = ∑_{y″∈S̄} K(x, y″)P̂(x, y″; x′, y′)   (x ∈ S, (x′, y′) ∈ Ŝ).

To check this, we write

∑_{y″∈S̄} K(x, y″)P̂(x, y″; x′, y′) = ∑_{y″∈S̄} K(x, y″)P̄(y″, y′) P(x, x′)K(x′, y′)/PK(x, y′)

= (KP̄(x, y′)/PK(x, y′)) P(x, x′)K(x′, y′) = P(x, x′)K(x′, y′),

where we have used our assumption that KP̄ = PK.
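The construction (3.14)-(3.15) can be carried out concretely for the exclusion/thinning pair of Exercise 3.1, where S = S̄ = {0, 1}^3 and P̄ = P (a sketch under these assumptions; n = 3 and p = 0.4 are arbitrary). It checks that the coupled kernel is stochastic on Ŝ and that Y is autonomous with kernel P̄:

```python
import itertools
import numpy as np

n, p = 3, 0.4
states = list(itertools.product([0, 1], repeat=n))
idx = {s: k for k, s in enumerate(states)}
N = len(states)

# Exclusion kernel P and thinning kernel K on {0,1}^3; by Exercise 3.1, PK = KP.
P = np.zeros((N, N))
for s in states:
    for i in range(n):
        t = list(s)
        t[i], t[(i + 1) % n] = t[(i + 1) % n], t[i]
        P[idx[s], idx[tuple(t)]] += 1 / n
K = np.zeros((N, N))
for s in states:
    for t in states:
        if all(t[i] <= s[i] for i in range(n)):
            K[idx[s], idx[t]] = p**sum(t) * (1 - p)**(sum(s) - sum(t))
Pbar = P
PK = P @ K

# Coupled kernel Phat of (3.14)-(3.15) on Shat = {(x, y) : K(x, y) > 0}.
Shat = [(a, b) for a in range(N) for b in range(N) if K[a, b] > 0]
Phat = {}
for (x, y) in Shat:
    for (xp, yp) in Shat:
        if PK[x, yp] > 0:
            Phat[(x, y), (xp, yp)] = Pbar[y, yp] * P[x, xp] * K[xp, yp] / PK[x, yp]
        else:
            Phat[(x, y), (xp, yp)] = 0.0

# Phat is a probability kernel on Shat ...
rows_ok = all(abs(sum(Phat[s, t] for t in Shat) - 1) < 1e-12 for s in Shat)
# ... and Y is autonomous: summing out x' recovers Pbar(y, y').
auto_ok = all(abs(sum(Phat[(x, y), (xp, yp)] for xp in range(N) if (xp, yp) in Shat)
                  - Pbar[y, yp]) < 1e-12
              for (x, y) in Shat for yp in range(N))
print(rows_ok, auto_ok)   # True True
```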

Thinning of exclusion processes In this example,

Ŝ = {(x, y) : x, y ∈ {0, 1}^{C_n}, y ≤ x},

and the joint evolution of (X, Y) can be described in the same way as in (3.2), using the same random positions (I_k)_{k≥1} for both X and Y. While this example is rather trivial (we could have invented this coupling without knowing anything about intertwining!), we will see in the next section that there exist less trivial examples of intertwinings where the coupling of Theorem 3.5 is much more surprising.


3.4 First exit problems revisited

In this section we return to the set-up of Section 2.9. We will be interested in the behavior of a Markov chain X until it leaves some finite set S′. For simplicity, we will assume that the whole state space of X just consists of S′ plus one additional point z, which is a trap. More generally, the results we derive can be used to study Markov chains stopped at the first time they leave a finite set.

Thus, we let X be a Markov chain with finite state space S = S′ ∪ {z}, where z is a trap. We let P denote the transition kernel of X, we write P′ for the restriction of P to S′ and assume that P′ is irreducible. It follows that P′ has a unique Perron-Frobenius eigenvalue α and associated positive right eigenvector h, which is unique up to a multiplicative constant. If we extend h to S by putting h(z) := 0, then h is an eigenvector of P. Indeed, since h(z) = 0, we have

Ph(x) = ∑_{y∈S} P(x, y)h(y) = ∑_{y∈S′} P′(x, y)h(y) = P′h(x) = αh(x)   (x ∈ S′),

while Ph(z) = ∑_{y∈S} P(z, y)h(y) = h(z) = 0 by the fact that z is a trap.

Proposition 3.6 (Intertwining for chain with one trap) Let X be a Markov chain with finite state space S = S′ ∪ {z} and transition kernel P. Assume that z is a trap and the restriction of P to S′ is irreducible. Let h be the, up to a multiplicative constant unique, right eigenvector of P such that h(z) = 0 and h > 0 on S′, and assume that h is normalized such that sup_{x∈S′} h(x) ≤ 1. Let α be the eigenvalue associated with h and let Y be a Markov process with state space Ŝ := {1, 2} and transition kernel

P̂ = ( P̂(1, 1)  P̂(1, 2) )   ( α   1 − α )
    ( P̂(2, 1)  P̂(2, 2) ) = ( 0     1   ).

Let K be the probability kernel from S to Ŝ defined by

K(x, y) :=  h(x)      if y = 1,
            1 − h(x)  if y = 2.

Then

PK = KP̂.

Proof This follows by writing

PK(x, y) = ∑_{x′∈S} P(x, x′)K(x′, y) =  Ph(x) = αh(x)             if y = 1,
                                        P(1 − h)(x) = 1 − αh(x)   if y = 2,


while

KP̂(x, y) = ∑_{y′∈Ŝ} K(x, y′)P̂(y′, y)

=  K(x, 1)α = αh(x)                                               if y = 1,
   K(x, 2) + K(x, 1)(1 − α) = 1 − h(x) + (1 − α)h(x) = 1 − αh(x)  if y = 2,

which shows that PK = KP̂.
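The intertwining relation of Proposition 3.6 is easy to check numerically. The sketch below is our own illustration, not part of the notes: it assumes numpy is available, picks a hypothetical chain with S′ = {0, 1} and trap z = 2, builds α, h, P̂ and K, and verifies PK = KP̂.

```python
import numpy as np

# A hypothetical chain on S' = {0, 1} plus a trap z = 2 (numbers chosen
# arbitrarily; any irreducible substochastic restriction P' would do).
P = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2],
              [0.0, 0.0, 1.0]])

Pp = P[:2, :2]                      # restriction P' of P to S'
eigvals, eigvecs = np.linalg.eig(Pp)
k = int(np.argmax(eigvals.real))    # Perron-Frobenius eigenvalue is the largest
alpha = eigvals[k].real
h = np.abs(eigvecs[:, k].real)      # positive right eigenvector of P'
h = h / h.max()                     # normalize so that sup h <= 1
h = np.append(h, 0.0)               # extend by h(z) := 0

# Intertwining data of Proposition 3.6.
Phat = np.array([[alpha, 1 - alpha],
                 [0.0,   1.0]])
K = np.column_stack([h, 1 - h])     # K(x, 1) = h(x), K(x, 2) = 1 - h(x)

assert np.allclose(P @ K, K @ Phat)  # the relation PK = K P-hat
```

For this particular matrix the Perron-Frobenius eigenvalue is α = 0.8 with eigenvector h ≡ 1 on S′, but the check works for any choice of irreducible P′.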

By Theorem 3.5 and the remark following the latter, we see that the processes X and Y from Proposition 3.6 can be coupled in such a way that their joint transition probabilities are given by

P̄(x, y; x′, y′) = P̂(y, y′)P_{y′}(x, x′)   ((x, y), (x′, y′) ∈ S̄),

where S̄ := {(x, y) ∈ S × Ŝ : K(x, y) > 0} and P_{y′} is given by

P_{y′}(x, x′) := P(x, x′)K(x′, y′) / PK(x, y′)   (PK(x, y′) > 0).

Filling in the definition of K and using our formulas for PK, we see that

P_1(x, x′) = P(x, x′)K(x′, 1) / PK(x, 1) = P(x, x′)h(x′) / (αh(x)) = P^h(x, x′)   (x, x′ ∈ S′)

is the generalized Doob transform of P′ as defined in Lemma 2.7 and more specifically in Theorem 2.17. Thus, as long as the autonomous process Y stays in 1, the process X jumps according to the generalized Doob transformed transition kernel P^h. Recall that P^h is concentrated on S′, hence X cannot get trapped as long as Y = 1. Starting with the time step when Y jumps to the state 2, and from thereon, the process X jumps according to the transition kernel

P_2(x, x′) = P(x, x′)K(x′, 2) / PK(x, 2) = P(x, x′)(1 − h(x′)) / (1 − αh(x)),

which is well-defined for all x ∈ S since α < 1 and h ≤ 1. Since 1 − h is not an eigenvector of P, this is not a kind of transform we have seen so far.

Remark The coupling of X and Y immediately gives us the estimate

P^x[X_n ∈ S′] ≥ K(x, 1)P^1[Y_n = 1] = α^n h(x).   (3.16)

In view of (2.6), this estimate is quite good, but it is not completely sharp since the function h in Section 2.9 is normalized in a different way than in the present


section. Indeed, in Section 2.9 we chose h such that ∑_{x∈S′} η(x)h(x) = 1, where η is a probability measure, while at present we need sup_{x∈S′} h(x) ≤ 1. Thus (unless h is constant on S′), the h in our present section is always smaller than the one in Section 2.9.

Interesting examples of intertwining relations for birth-and-death processes can be found in [DM09, Swa11]. Lower estimates in the spirit of (3.16) but in the more complicated set-up of hierarchical contact processes have been derived in [AS10]. The ‘evolving set process’ in [LPW09, Thm 17.23] (which can be defined for quite general Markov chains) provides another nontrivial example of an intertwining relation.


Chapter 4

Branching processes

4.1 The branching property

Let S be a countable set and let N(S) be the set of all functions x : S → N such that ∑_{i∈S} x(i) < ∞. It is easy to check that N(S) is a countable set (even though the set of all functions x : S → N is uncountable). We interpret x ∈ N(S) as a collection of finitely many particles or individuals, where x(i) is the number of individuals of type i, where S is the type space, or sometimes also the number of particles at the position i, where S represents physical space. Each x ∈ N(S) can be written as

x = ∑_{β=1}^{|x|} δ_{i_β},   (4.1)

where |x| := ∑_i x(i), i_1, . . . , i_{|x|} ∈ S, and δ_i ∈ N(S) is defined as

δ_i(j) := 1_{i=j}   (i, j ∈ S).

This way of writing x is of course not unique but depends on the way we order the individuals. We will be interested in Markov processes with state space N(S), where in each time step, each individual, independently of the others, is replaced by a finite number of new individuals (its offspring). Let Q be a probability kernel from S to N(S). For a given x ∈ N(S) of the form (4.1), we can construct independent N(S)-valued random variables V^1, . . . , V^{|x|} such that

P[V^β ∈ · ] = Q(i_β, · )   (β = 1, . . . , |x|).


Then

P(x, · ) := P[ ∑_{β=1}^{|x|} V^β ∈ · ]

defines a probability law on N(S). Doing this for each x ∈ N(S) defines a probability kernel P on N(S). By definition, the Markov chain X with state space N(S) and transition kernel P is called the multitype branching process with offspring distribution Q.
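As a small illustration of this construction (our own sketch, not from the notes), one step x ↦ X_1 of the kernel P can be sampled by drawing one offspring configuration V^β per individual. The two-type offspring law below and all names are hypothetical choices:

```python
import random
from collections import Counter

# A hypothetical offspring kernel Q with finite support: for each type i,
# a list of (offspring configuration, probability) pairs; configurations
# are Counters over the type space S = {0, 1}.
Q = {
    0: [(Counter(), 0.25), (Counter({0: 1, 1: 1}), 0.75)],
    1: [(Counter(), 0.5), (Counter({1: 2}), 0.5)],
}

def branching_step(x, Q, rng=random):
    """Sample X_1 from P(x, .): every individual of type i is replaced,
    independently of all others, by a draw V from Q(i, .)."""
    out = Counter()
    for i, n in x.items():
        configs, probs = zip(*Q[i])
        for _ in range(n):               # one offspring draw per individual
            out += rng.choices(configs, probs)[0]
    return out

print(branching_step(Counter({0: 2, 1: 1}), Q))  # one random transition step
```

Iterating `branching_step` produces a trajectory (X_0, X_1, X_2, . . .) of the multitype branching process.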

Lemma 4.1 (Branching property) Let X = (X_k)_{k≥0} and Y = (Y_k)_{k≥0} be independent branching processes with the same type space S and offspring distribution Q. Then Z = (Z_k)_{k≥0}, defined by

Z_k(i) := X_k(i) + Y_k(i)   (k ≥ 0, i ∈ S)   (4.2)

is distributed as a branching process with type space S and offspring distribution Q.

Proof We need to check that

P[Z_{k+1} = z | F^Z_k] = P(Z_k, z) a.s.   (k ≥ 0, z ∈ N(S)),

where (F^Z_k)_{k≥0} is the filtration generated by Z. We will show that actually

P[Z_{k+1} = z | F^{(X,Y)}_k] = P(Z_k, z) a.s.   (k ≥ 0, z ∈ N(S)),

where (F^{(X,Y)}_k)_{k≥0} is the filtration generated by (X, Y) = (X_k, Y_k)_{k≥0}. Since F^Z_k ⊂ F^{(X,Y)}_k and P(Z_k, z) is F^Z_k-measurable, this then implies that

P[Z_{k+1} = z | F^Z_k] = E[ P[Z_{k+1} = z | F^{(X,Y)}_k] | F^Z_k ] = E[ P(Z_k, z) | F^Z_k ] = P(Z_k, z).

Thus, by the Markov property of (X, Y), it suffices to show that

P[Z_{k+1} = z | X_k = x, Y_k = y] = P(x + y, z)   (k ≥ 0, x, y, z ∈ N(S)).

Here

P[Z_{k+1} = z | X_k = x, Y_k = y] = ∑_{x′≤z} P[X_{k+1} = x′, Y_{k+1} = z − x′ | X_k = x, Y_k = y] = ∑_{x′≤z} P(x, x′)P(y, z − x′),


where we have used that by the independence of X and Y, the process (X, Y) is a Markov process with transition kernel P_2(x, y; x′, y′) := P(x, x′)P(y, y′).

Thus, we are left with the task of showing that

P(x + y, z) = ∑_{x′≤z} P(x, x′)P(y, z − x′)   (x, y, z ∈ N(S)).   (4.3)

Let us write

x = ∑_{β=1}^{|x|} δ_{i_β}   and   y = ∑_{γ=1}^{|y|} δ_{j_γ},

and let V^1, . . . , V^{|x|} and W^1, . . . , W^{|y|} be all independent of each other such that

P[V^β ∈ · ] = Q(i_β, · )   (β = 1, . . . , |x|),
P[W^γ ∈ · ] = Q(j_γ, · )   (γ = 1, . . . , |y|).

Then

P(x + y, z) = P[ ∑_{β=1}^{|x|} V^β + ∑_{γ=1}^{|y|} W^γ = z ]

= ∑_{x′≤z} P[ ∑_{β=1}^{|x|} V^β = x′, ∑_{γ=1}^{|y|} W^γ = z − x′ ] = ∑_{x′≤z} P(x, x′)P(y, z − x′),

as required.

Remark We may define the convolution of two probability laws μ, ν on N(S) by

μ ∗ ν(z) := ∑_{x′≤z} μ(x′)ν(z − x′).

Then (4.3) says that

P(x + y, · ) = P(x, · ) ∗ P(y, · )   (x, y ∈ N(S)).   (4.4)

In general, any probability kernel on N(S) with this property is said to have the branching property. It is not hard to see that a Markov process with state space N(S) has the branching property (4.2) if and only if its transition kernel has the branching property (4.4). In particular, (4.4) implies that if

x = ∑_{β=1}^{|x|} δ_{i_β},


then

P(x, · ) = P(δ_{i_1}, · ) ∗ · · · ∗ P(δ_{i_{|x|}}, · ),

where P(δ_i, · ) = Q(i, · ). Thus, each Markov process that has the branching property is a branching process, and its transition probabilities are uniquely characterized by the offspring distribution.

4.2 Generating functions

For each function φ : S → R and x ∈ N(S), let us write

φ^x := ∏_{i∈S} φ(i)^{x(i)} = ∏_{β=1}^{|x|} φ(i_β)   where x = ∑_{β=1}^{|x|} δ_{i_β},

where φ^0 := 1. It is easy to see that

φ^{x+y} = φ^x φ^y   (x, y ∈ N(S), φ : S → R).

Because of all the independence coming from the branching property, the linear operator P associated with the transition kernel of a branching process maps such ‘multiplicative functions’ into multiplicative functions. We will especially be interested in the case that φ takes values in [0, 1]. We let [0, 1]^S denote the space of all functions φ : S → [0, 1].

Lemma 4.2 (Generating operator) Let P denote the transition kernel of a multitype branching process with type space S and offspring distribution Q. Let U be the nonlinear operator defined by

1 − Uφ(i) := ∑_{x∈N(S)} Q(i, x)(1 − φ)^x   (i ∈ S, φ ∈ [0, 1]^S).

Then

P f_φ = f_{Uφ}   (φ ∈ [0, 1]^S),

where for any φ ∈ [0, 1]^S, we define f_φ : N(S) → [0, 1] by f_φ(x) := (1 − φ)^x.

Proof If X is started in X_0 = x with x = ∑_{β=1}^{|x|} δ_{i_β}, then X_1 is equal in distribution to ∑_{β=1}^{|x|} V^β, where the V^β’s are independent with distribution Q(i_β, · ). It follows


that

P f_φ(x) = E[ (1 − φ)^{∑_{β=1}^{|x|} V^β} ] = E[ ∏_{β=1}^{|x|} (1 − φ)^{V^β} ]

= ∏_{β=1}^{|x|} E[ (1 − φ)^{V^β} ] = ∏_{β=1}^{|x|} (1 − Uφ)(i_β) = f_{Uφ}(x).

Remark 1 It would seem that the formulation of the lemma is simpler if we replace φ by 1 − φ everywhere, but as we will see later there are good reasons to formulate things in terms of 1 − φ.

Remark 2 Our assumption that 0 ≤ φ ≤ 1 guarantees that the sum ∑_x Q(i, x)(1 − φ)^x in the definition of Uφ(i) is finite. Under more restrictive assumptions on Q, we can define Uφ also for more general real-valued φ.

We call the nonlinear operator U from Lemma 4.2 the generating operator of the branching process with offspring distribution Q. By induction, Lemma 4.2 shows that P^n f_φ = f_{U^n φ}, or, in other words,

E^x[ (1 − φ)^{X_n} ] = (1 − U^n φ)^x   (n ≥ 0, φ ∈ [0, 1]^S).   (4.5)

The advantage of the operator U is that it acts on functions ‘living’ on the space S, while P acts on functions on the much larger space N(S). The price we pay for this is that U, unlike P, is not linear.

The next lemma shows that U contains, in a sense, ‘all information we need’.

Lemma 4.3 (Generating functions are distribution determining) Let μ, ν be probability measures on N(S) such that

∑_{x∈N(S)} μ(x)(1 − φ)^x = ∑_{x∈N(S)} ν(x)(1 − φ)^x   (φ ∈ [0, 1]^S).

Then μ = ν.

Proof We will prove the statement first under the assumption that S is finite. Let N(S) ∪ {∞} be the one-point compactification of N(S). For each ψ ∈ [0, 1)^S, define g_ψ(x) := ψ^x and g_ψ(∞) := 0. Then the g_ψ’s are continuous functions on the compact space N(S) ∪ {∞}. It is not hard to see that they separate points, i.e., for each x ≠ x′ there exists a ψ ∈ [0, 1)^S such that g_ψ(x) ≠ g_ψ(x′). Since g_ψ g_ψ′ = g_{ψψ′},


the class {g_ψ : ψ ∈ [0, 1)^S} is closed under multiplication. Let H be the space of all linear combinations of functions from this class and the identity function. Then H is an algebra that separates points, hence by the Stone-Weierstrass theorem H is dense in the space of continuous functions on N(S) ∪ {∞}, equipped with the supremum norm. By linearity and because μ and ν are probability measures, ∑_x μ(x)f(x) = ∑_x ν(x)f(x) for all f ∈ H. Since H is dense, it follows that μ = ν.

If S is not finite, then by applying our argument to functions ψ that are zero outside a finite set, we see that the finite-dimensional marginals of μ and ν agree, which shows that μ = ν in general.

There is a nice suggestive way of writing the relation (4.5). Generalizing (3.3), for any φ ∈ [0, 1]^S, we define a probability kernel K_φ from N(S) to N(S) by

K_φ(x, · ) := P[ ∑_{β=1}^{|x|} χ_β δ_{i_β} ∈ · ],   (4.6)

where x = ∑_{β=1}^{|x|} δ_{i_β} and the χ_1, . . . , χ_{|x|} are independent Bernoulli random variables with P[χ_β = 1] = φ(i_β). Thus, if Z is distributed according to the law K_φ(x, · ), then Z is obtained from x by independent thinning, where a particle of type i is kept with probability φ(i) and thrown away with the remaining probability. Note that K_φ has the branching property (4.4) and corresponds in fact to the offspring distribution

Q_φ(i, y) := K_φ(δ_i, y) = φ(i)1_{y=δ_i} + (1 − φ(i))1_{y=0}   (i ∈ S, y ∈ N(S)).

Let Thin_φ(x) denote a random variable with law K_φ(x, · ). Then

P[ Thin_φ(x) = 0 ] = (1 − φ)^x   (x ∈ N(S), φ ∈ [0, 1]^S),

where 0 denotes the configuration in N(S) with no particles. In view of this,

1 − Uφ(i) = ∑_{x∈N(S)} Q(i, x)(1 − φ)^x = δ_i P K_φ({0})

and hence

Uφ(i) = δ_i P K_φ(N(S)\{0}).   (4.7)

Note that this says that if we start with one particle of type i, let it produce offspring, and then thin with φ, then Uφ(i) is the probability that we are left with at least one individual. Likewise, we may rewrite (4.5) in the form

δ_x P^n K_φ({0}) = δ_x K_{U^n φ}({0}),   (4.8)


where δ_x P^n K_φ is the law obtained by starting the branching process in x, running it till time n, and then applying the kernel K_φ, while δ_x K_{U^n φ} is the law obtained by thinning x with U^n φ.

Exercise 4.4 (Repeated thinning) Show that K_φ K_ψ = K_{φψ}   (φ, ψ ∈ [0, 1]^S).

Exercise 4.5 (Thinning characterization) Let μ, ν be probability laws on N(S) such that μK_φ({0}) = νK_φ({0}) for all φ ∈ [0, 1]^S. Show that μ = ν.

4.3 The survival probability

Let X be a branching process with type space S and generating operator U. We observe that

P^{δ_i}[X_n ≠ 0] = 1 − P^{δ_i}[ Thin_1(X_n) = 0 ] = 1 − P^{δ_i}[ Thin_{U^n 1}(δ_i) = 0 ] = 1 − (1 − U^n 1(i)) = U^n 1(i),

where we use the symbol 1 also to denote the function that is constantly one. Since 0 is a trap for any branching process,

P^{δ_i}[X_n ≠ 0] = P^{δ_i}[ X_k ≠ 0 ∀ 0 ≤ k ≤ n ] −→_{n→∞} P^{δ_i}[ X_k ≠ 0 ∀ k ≥ 0 ],

where we have used the continuity of our probability measure with respect to decreasing sequences of events. In view of this, let us write

ρ(i) := lim_{n→∞} U^n 1(i) = P^{δ_i}[ X_k ≠ 0 ∀ k ≥ 0 ]   (4.9)

for the probability to survive starting from a single particle of type i.

Lemma 4.6 (Survival probability) The function ρ in (4.9) is the largest solution (in [0, 1]^S) of the equation

Uρ = ρ,

i.e., ρ solves this equation and any other solutions ρ′ of this equation, if they exist, satisfy ρ′ ≤ ρ.

Proof Since

Uφ(i) = ∑_x Q(i, x)(1 − (1 − φ)^x),


and since φ ↦ 1 − (1 − φ)^x is a nondecreasing function, we see that

φ ≤ ψ implies Uφ ≤ Uψ.   (4.10)

By monotone convergence, we see moreover that

φ_n ↓ φ implies Uφ_n ↓ Uφ.   (4.11)

Since 1 ≥ U1, we see by (4.10) and induction that U^n 1 ≥ U^{n+1} 1 and U^n 1 ↓ ρ, which was in fact also clear from our probabilistic interpretation. By (4.11), it follows that

Uρ = U lim_{n→∞} U^n 1 = lim_{n→∞} U^{n+1} 1 = ρ.

Now if ρ′ ∈ [0, 1]^S is any other fixed point of U, then by (4.10)

ρ′ ≤ 1 implies ρ′ = U^n ρ′ ≤ U^n 1 −→_{n→∞} ρ,

which shows that ρ′ ≤ ρ.
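Lemma 4.6 suggests a practical way to approximate ρ: iterate U on the constant function 1. A minimal sketch (our own example, not from the notes) for a hypothetical two-type offspring law with finite support:

```python
# Hypothetical offspring law Q(i, .) with finite support: for type i, a list
# of (configuration, probability) pairs, where a configuration is written as
# (number of type-0 children, number of type-1 children).
Q = {
    0: [((0, 0), 0.2), ((2, 0), 0.5), ((1, 1), 0.3)],
    1: [((0, 0), 0.4), ((0, 2), 0.6)],
}

def U(phi):
    """Generating operator: 1 - U(phi)(i) = sum_x Q(i, x) (1 - phi)^x."""
    return tuple(
        1 - sum(p * (1 - phi[0]) ** x0 * (1 - phi[1]) ** x1
                for (x0, x1), p in Q[i])
        for i in (0, 1))

phi = (1.0, 1.0)          # start from the constant function 1
for _ in range(2000):     # U^n 1 decreases to the largest fixed point rho
    phi = U(phi)
print(phi)                # approximate survival probabilities (rho(0), rho(1))
```

For this example the second type reproduces autonomously (0 children with probability 0.4, two type-1 children with probability 0.6), so its extinction probability solves q = 0.4 + 0.6q², giving q = 2/3 and ρ(1) = 1/3.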

Exercise 4.7 (Galton-Watson process) Let X be a branching process whose type space S = {1} consists of a single point, and let Q be its offspring distribution. We identify N({1}) ≅ N. Since there is only one type of individual, we only need to know with which probability a single individual produces n offspring (n ≥ 0). Thus, we simply write

Q(n) = Q(1, nδ_1),

which is a probability law on N. Assume that Q has a finite second moment and let

a := ∑_{n=0}^∞ n Q(n)

denote its mean. We identify [0, 1]^S ≅ [0, 1] and let U : [0, 1] → [0, 1] be the generating operator of X, which is now just a (nonlinear) function from [0, 1] to [0, 1].

(a) Show that U is a concave function and that U ′(0) = a.

(b) Assume that Q(1) < 1. Show that

P^1[ X_k ≠ 0 ∀ k ≥ 0 ] > 0

if and only if a > 1.


Remark Single-type branching processes as in Exercise 4.7 are called Galton-Watson processes after the seminal paper (from 1875) [WG75], where they proved that the survival probability ρ solves Uρ = ρ but incorrectly concluded from this that the process dies out for all a ≥ 0, since ‘obviously the solution of this equation is ρ = 0’.

Exercise 4.8 (Spatial branching) Let (I_k)_{k≥1} be i.i.d. with P[I_k = −1] = 1/2 = P[I_k = 1], and let N be a Poisson distributed random variable with mean a, independent of (I_k)_{k≥1}. Let X be the branching process with type space Z and offspring distribution Q given by

Q(i, · ) := P[ ∑_{k=1}^N δ_{i+I_k} ∈ · ]   (i ∈ Z),

which says that a particle at i produces Pois(a) offspring which are independently placed on either i − 1 or i + 1, with equal probabilities. Show that

P^{δ_0}[ X_k ≠ 0 ∀ k ≥ 0 ] > 0

if and only if a > 1.

Exercise 4.9 (Two-type process) Let X be a branching process with type space S = {1, 2} and the following offspring distribution. Individuals of both types produce a Poisson number of offspring, with mean a. If the parent is of type 1, then its offspring are, independently of each other, of type 1 or 2 with probability 1/2 each. All offspring of individuals of type 2 are again of type 2. Starting with a single individual of type 1, for what values of a is there a positive probability that there will be individuals of type 1 at all times?

Exercise 4.10 (Poisson offspring) Let X be a Galton-Watson process where each individual produces a Poisson number of offspring with mean a, and let

ρ_n := P^1[X_n = 0]   (n ≥ 0)

be the probability that the process started with a single individual is extinct after n steps. Prove that ρ_{n+1} = e^{a(ρ_n − 1)}.


4.4 First moment formula

Proposition 4.11 (First moment formula) Let X be a branching process with type space S and offspring distribution Q. Assume that the matrix

A(i, j) := ∑_{x∈N(S)} Q(i, x)x(j)   (i, j ∈ S)   (4.12)

satisfies

sup_{i∈S} ∑_{j∈S} A(i, j) < ∞.   (4.13)

Then

E^x[X_n(j)] = ∑_i x(i)A^n(i, j)   (x ∈ N(S), j ∈ S, n ≥ 0).

Proof We first prove the statement for n = 1. If x = ∑_{β=1}^{|x|} δ_{i_β}, then X_1 is equal in distribution to ∑_{β=1}^{|x|} V^β, where the V^β’s are independent with distribution Q(i_β, · ). Therefore,

E^x[X_1(j)] = E[ ∑_{β=1}^{|x|} V^β(j) ] = ∑_{β=1}^{|x|} E[ V^β(j) ] = ∑_{β=1}^{|x|} ∑_y Q(i_β, y)y(j) = ∑_{β=1}^{|x|} A(i_β, j) = ∑_i x(i)A(i, j).

By induction, it follows that

E^x[X_{n+1}(j)] = ∑_{x′} P^x[X_n = x′] E^x[ X_{n+1}(j) | X_n = x′ ]

= ∑_{x′} P^x[X_n = x′] ∑_i x′(i)A(i, j) = ∑_i A(i, j) ∑_{x′} P^x[X_n = x′] x′(i)

= ∑_i A(i, j) E^x[X_n(i)] = ∑_i A(i, j) ∑_k x(k)A^n(k, i) = ∑_k x(k)A^{n+1}(k, j),

where all expressions are finite by (4.13).
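The n = 1 case of Proposition 4.11 can be checked directly for an offspring law with finite support. The example below (two hypothetical types, numpy assumed; all numbers are ours) enumerates the joint offspring of two independent individuals and compares with xA:

```python
import numpy as np

# Hypothetical two-type offspring law with finite support; configurations are
# vectors over S = {0, 1}.
Q = {
    0: [(np.array([0, 0]), 0.3), (np.array([2, 1]), 0.7)],
    1: [(np.array([1, 0]), 0.5), (np.array([0, 3]), 0.5)],
}

# First moment matrix A(i, j) = sum_x Q(i, x) x(j), as in (4.12).
A = np.array([sum(p * cfg for cfg, p in Q[i]) for i in (0, 1)])

# Start from x = delta_0 + delta_1 and compute E^x[X_1] exactly by enumerating
# the joint offspring of the two (independent) individuals.
x = np.array([1, 1])
EX1 = sum(p0 * p1 * (c0 + c1) for c0, p0 in Q[0] for c1, p1 in Q[1])

assert np.allclose(EX1, x @ A)            # Proposition 4.11 with n = 1
print(x @ np.linalg.matrix_power(A, 4))   # E^x[X_4(j)] from the formula
```

Higher moments in time then come for free from powers of A, without any further enumeration.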

Lemma 4.12 (Subcritical processes) Let X be a branching process with finite type space S and offspring distribution Q. Assume that its first moment matrix A defined in (4.12) is irreducible and satisfies (4.13), and let α be its Perron-Frobenius eigenvalue. If α < 1, then

P^x[ X_k ≠ 0 ∀ k ≥ 0 ] = 0   (x ∈ N(S)).


Proof For any f : S → R, let l_f : N(S) → R denote the ‘linear’ function

l_f(x) := ∑_{i∈S} x(i)f(i)   (x ∈ N(S), f : S → R).

Then by Proposition 4.11,

P^n l_f(x) = E^x[ l_f(X_n) ] = ∑_i f(i) E^x[ X_n(i) ] = ∑_i f(i) ∑_j x(j)A^n(j, i) = ∑_j x(j)A^n f(j) = l_{A^n f}(x).

In particular, if h is the (strictly positive) right eigenvector of A with eigenvalue α, then

P^n l_h = l_{A^n h} = α^n l_h,

which says that

E^x[ l_h(X_n) ] = α^n l_h(x),

which tends to zero by our assumption that α < 1. Since h is strictly positive, it follows that P^x[X_n ≠ 0] → 0.

Proposition 4.13 (Critical processes) Let X be a branching process with finite type space S and offspring distribution Q. Assume that its first moment matrix A defined in (4.12) is irreducible and satisfies (4.13), and let α be its Perron-Frobenius eigenvalue. If α = 1 and there exists some i ∈ S such that Q(i, 0) > 0, then

P^x[ X_k ≠ 0 ∀ k ≥ 0 ] = 0   (x ∈ N(S)).

Proof Let A := {X_n ≠ 0 ∀ n ≥ 0} and let

ρ(i) := P^{δ_i}(A).

By the branching property,

P^x(A^c) = (1 − ρ)^x   (x ∈ N(S)).

By the principle ‘what can happen must happen’ (Proposition 0.14), we have

(1 − ρ)^{X_n} −→_{n→∞} 0 a.s. on the event A.


By irreducibility and the fact that Q(j, 0) > 0 for some j, it is not hard to see that ρ(i) < 1 for all i ∈ S. By the finiteness of S, it follows that sup_{i∈S} ρ(i) < 1 and hence

inf_{x∈N(S), |x|≤N} (1 − ρ)^x > 0   (N ≥ 0).

It follows that

|X_n| −→_{n→∞} ∞ a.s. on the event A,   (4.14)

i.e., the only way for the process to survive is to let the number of particles tend to infinity.

Let h be the (strictly positive) right eigenvector of A with eigenvalue α. Then

E[ l_h(X_{n+1}) | F^X_n ] = (P l_h)(X_n) = l_{αh}(X_n) = α l_h(X_n),

which shows that (provided that E[l_h(X_0)] < ∞, which is satisfied for processes started in deterministic initial states) the process

M_n := α^{−n} ∑_{i∈S} h(i)X_n(i)   (n ≥ 0)   (4.15)

is a nonnegative martingale. By martingale convergence, it follows that there exists a random variable M_∞ such that

M_n −→_{n→∞} M_∞ a.s.   (4.16)

In particular, if α = 1, this proves that

|X_n| ↛ ∞ a.s. as n → ∞,

which by (4.14) implies that P(A) = 0.

4.5 Second moment formula

We start with some general Markov chain theory. For any probability law μ on a countable set S and functions f, g : S → R, let us write

Cov_μ(f, g) := μ(fg) − (μf)(μg)

for the covariance of f and g, whenever this is well-defined. Note that if X is a random variable with law μ, then Cov_μ(f, g) = E[f(X)g(X)] − E[f(X)]E[g(X)], in accordance with the usual formula for the covariance of two functions.


Lemma 4.14 (Covariance formula) Let X be a Markov chain with countable state space S and transition kernel P. Let μ be a probability law on S and let C be a class of functions f : S → R such that

(i) f ∈ C implies Pf ∈ C,

(ii) ∑_{x,y} μ(x)P(x, y)|f(y)g(y)| < ∞ for all f, g ∈ C.

Then, for any f, g ∈ C,

Cov_{μP^n}(f, g) = Cov_μ(P^n f, P^n g) + ∑_{k=1}^n μP^{n−k} Γ(P^{k−1}f, P^{k−1}g),   (4.17)

where

Γ(f, g) := P(fg) − (Pf)(Pg)   (f, g ∈ C).

Remark 1 If X = (X_k)_{k≥0} is the Markov chain with initial law μ and transition kernel P, then

Cov_{μP^n}(f, g) = E[ f(X_n)g(X_n) ] − E[ f(X_n) ] E[ g(X_n) ] =: Cov( f(X_n), g(X_n) ),

and similarly

Cov_μ(P^n f, P^n g) = Cov( (P^n f)(X_0), (P^n g)(X_0) ).

Remark 2 The assumptions of the lemma are trivially fulfilled if we take for C the class of all bounded real functions on S. Often, we also need the lemma for certain unbounded functions, but in this case we need to find a class C satisfying the assumptions of the lemma to ensure that all second moments are finite.

Proof of Lemma 4.14 The statement is trivial for n = 0. Fix n ≥ 1 and define a function H : {0, . . . , n} → R by

H(k) := P^k( (P^{n−k}f)(P^{n−k}g) )   (0 ≤ k ≤ n).

Then

μ( H(n) − H(0) ) = μP^n(fg) − μ( (P^n f)(P^n g) )

= [ μP^n(fg) − (μP^n f)(μP^n g) ] − [ μ( (P^n f)(P^n g) ) − (μP^n f)(μP^n g) ]

= Cov_{μP^n}(f, g) − Cov_μ(P^n f, P^n g).


It follows that

Cov_{μP^n}(f, g) − Cov_μ(P^n f, P^n g) = ∑_{k=1}^n μ[ H(k) − H(k − 1) ]

= ∑_{k=1}^n μ[ P^k( (P^{n−k}f)(P^{n−k}g) ) − P^{k−1}( (P^{n−k+1}f)(P^{n−k+1}g) ) ]

= ∑_{k=1}^n μP^{k−1} Γ(P^{n−k}f, P^{n−k}g).

Changing the summation order (setting k′ := n − k + 1), we arrive at (4.17).
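Since (4.17) is an identity for an arbitrary transition kernel, it can be checked numerically. The sketch below (a hypothetical 3-state chain; numpy assumed, all numbers our own) computes both sides of (4.17) and verifies that they agree:

```python
import numpy as np

# A hypothetical 3-state chain, an initial law, and two bounded test functions.
P = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4]])
mu = np.array([0.2, 0.5, 0.3])
f = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, 1.0, -1.0])

def cov(law, f, g):
    # Cov_law(f, g) = law(fg) - (law f)(law g)
    return law @ (f * g) - (law @ f) * (law @ g)

def Gamma(f, g):
    # Gamma(f, g) = P(fg) - (Pf)(Pg), the one-step term in (4.17)
    return P @ (f * g) - (P @ f) * (P @ g)

n = 6
Pk = np.linalg.matrix_power
lhs = cov(mu @ Pk(P, n), f, g)                     # Cov_{mu P^n}(f, g)
rhs = cov(mu, Pk(P, n) @ f, Pk(P, n) @ g) + sum(
    (mu @ Pk(P, n - k)) @ Gamma(Pk(P, k - 1) @ f, Pk(P, k - 1) @ g)
    for k in range(1, n + 1))
assert np.isclose(lhs, rhs)                        # formula (4.17)
print(lhs, rhs)
```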

We now apply this general formula to branching processes. To simplify matters, we will only look at finite type spaces.

Proposition 4.15 (Second moment formula) Let X be a branching process with finite type space S and offspring distribution Q. Let V^i denote a random variable with law Q(i, · ) and assume that

A(i, j) := E[ V^i(j) ],
C(i; j, k) := Cov( V^i(j), V^i(k) )   (4.18)

are finite for all i, j, k ∈ S. Let A be the linear operator with matrix A(i, j) and for functions f, g : S → R, let C(f, g) : S → R be defined by

C(f, g)(i) := ∑_{j,k∈S} C(i; j, k)f(j)g(k).

For x ∈ N(S) and f : S → R, let xf := ∑_i x(i)f(i). Then, for functions f, g : S → R, one has

E^x[ X_n f ] = xA^n f,

Cov_x( X_n f, X_n g ) = ∑_{k=1}^n xA^{n−k} C(A^{k−1}f, A^{k−1}g),   (4.19)

where Cov_x denotes covariance w.r.t. the law P^x.

Proof The first formula in (4.19) has already been proved in Proposition 4.11. As in the proof of Lemma 4.12, for any real function f on S, let l_f : N(S) → R denote the ‘linear’ function l_f(x) := ∑_i f(i)x(i). Then the first formula in (4.19) says that

P^n l_f = l_{A^n f}   (f ∈ R^S),

which motivates us to take for the class C in Lemma 4.14 the class of ‘linear’ functions l_f with f : S → R any function. Using the fact that C(i; j, k) < ∞ for all i, j, k ∈ S, it is not hard to prove that this class satisfies the assumptions of Lemma 4.14. Let x = ∑_{β=1}^{|x|} δ_{i_β} and let V^1, . . . , V^{|x|} be independent such that V^β is distributed according to Q(i_β, · ). We calculate

Γ(l_f, l_g)(x) = ( P(l_f l_g) − (P l_f)(P l_g) )(x)

= Cov( ∑_{j,β} f(j)V^β(j), ∑_{k,γ} g(k)V^γ(k) )

= ∑_{j,k} f(j)g(k) ∑_β Cov( V^β(j), V^β(k) ) = ∑_{i,j,k} x(i)f(j)g(k)C(i; j, k) = l_{C(f,g)}(x),

where we have used that Cov( V^β(j), V^γ(k) ) = 0 for β ≠ γ by independence. Then

Lemma 4.14 tells us that (note that the term Cov_{δ_x}(P^n l_f, P^n l_g) vanishes, since δ_x is a point mass)

Cov_{δ_x P^n}(l_f, l_g) = ∑_{k=1}^n δ_x P^{n−k} Γ(P^{k−1}l_f, P^{k−1}l_g) = ∑_{k=1}^n δ_x P^{n−k} Γ(l_{A^{k−1}f}, l_{A^{k−1}g})

= ∑_{k=1}^n δ_x P^{n−k} l_{C(A^{k−1}f, A^{k−1}g)} = ∑_{k=1}^n l_{A^{n−k} C(A^{k−1}f, A^{k−1}g)}(x),

which proves the second formula in (4.19).

4.6 Supercritical processes

The aim of this section is to prove that supercritical branching processes survive.

Proposition 4.16 (Supercritical process) Let X be a branching process with finite type space S and offspring distribution Q. Assume that the first and second moments A(i, j) and C(i; j, k) of Q, defined in (4.18), are all finite and that A is irreducible. Assume that the Perron-Frobenius eigenvalue α of A satisfies α > 1. Then

P^x[ X_k ≠ 0 ∀ k ≥ 0 ] > 0   (x ∈ N(S), x ≠ 0).


Proof Let h be the (strictly positive) right eigenvector of A with eigenvalue α. We have already seen in formulas (4.15)–(4.16) in the proof of Proposition 4.13 that M_n = α^{−n}X_n h is a nonnegative martingale that converges to an a.s. limit M_∞. By Proposition 0.8, if we can show that M is uniformly integrable, then

E^x[M_∞] = xh > 0

for all x ≠ 0, which shows that the process survives with positive probability. Thus, it suffices to show that

sup_{n≥0} E^x[M_n²] < ∞.

We write

E^x[M_n²] = ( E^x[α^{−n}X_n h] )² + Var_x(α^{−n}X_n h),

where by the first formula in (4.19)

( E^x[α^{−n}X_n h] )² = (α^{−n}xA^n h)² = (xh)²

is clearly bounded uniformly in n. We observe that since h is strictly positive and S is finite, we can find a constant K < ∞ such that

C(h, h) ≤ Kh.

Therefore, applying the second formula in (4.19), we see that

Var_x(α^{−n}X_n h) = α^{−2n} ∑_{k=1}^n xA^{n−k} C(A^{k−1}h, A^{k−1}h)

= α^{−2n} ∑_{k=1}^n α^{2(k−1)} xA^{n−k} C(h, h) ≤ α^{−2n} K ∑_{k=1}^n α^{2(k−1)} xA^{n−k} h

= α^{−2n} K ∑_{k=1}^n α^{n−k} α^{2(k−1)} xh = Kxh ∑_{k=1}^n α^{k−n−2} ≤ Kxh α^{−2} ∑_{k=0}^∞ α^{−k} < ∞,

uniformly in n ≥ 1, where we have used that α > 1.
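The martingale M_n = α^{−n}X_n h behind this proof is easy to observe in simulation. For a single-type process, h ≡ 1 and α equals the offspring mean a, so E^x[M_n] = x for all n. A Monte Carlo sketch of ours, with Poisson(1.5) offspring (an arbitrary supercritical choice):

```python
import math
import random

random.seed(1)
a = 1.5                          # offspring mean; alpha = a > 1 (supercritical)
n = 10
runs = 5000

def poisson(lam, rng):
    """Sample Pois(lam) via products of uniforms (Knuth); fine for small lam."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

total = 0.0
for _ in range(runs):
    x = 1                        # one individual; single type, so h = 1
    for _ in range(n):
        x = sum(poisson(a, random) for _ in range(x))
    total += x / a ** n          # M_n = alpha^{-n} X_n h
print(total / runs)              # E^x[M_n] = xh = 1, so this should be near 1
```

The empirical mean stays near 1 while individual trajectories either die out (M_n → 0) or grow like a^n, which is exactly the dichotomy that uniform integrability of M turns into a survival proof.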

4.7 Trimmed processes

Let X be a multitype branching process with countable type space S and offspring distribution Q. Recall from Lemma 4.6 that

ρ(i) := P^{δ_i}[ X_k ≠ 0 ∀ k ≥ 0 ]   (i ∈ S)


is the largest solution of the equation Uρ = ρ, where U is the generating operator of X. Let ρ ∈ [0, 1]^S be any solution of Uρ = ρ. The aim of the present section is to prove a result similar to conditioning on the future (as in Proposition 1.5) or intertwining of processes with one trap (Proposition 3.6), but now on the level of the individuals in a branching process. More precisely, we will divide the population into two ‘sorts’ of individuals, with probabilities (depending on the type) ρ(i) and 1 − ρ(i). In particular, if ρ is the survival probability, then we divide the population at each time k into those individuals which we know are going to survive (or, more precisely, that have living descendants at all times), and those whose descendants are going to die out completely.

To this aim, let x = ∑_{β=1}^{|x|} δ_{i_β} ∈ N(S) and let χ_1, . . . , χ_{|x|} be independent Bernoulli random variables with P[χ_β = 1] = ρ(i_β). Doing this for any x ∈ N(S), we define a probability kernel L_ρ from N(S) to N(S × {0, 1}) by

L_ρ(x, · ) := P[ ∑_{β=1}^{|x|} δ_{(i_β, χ_β)} ∈ · ].

Another way to describe L_ρ is to note that L_ρ has the branching property (4.4) and

L_ρ(δ_i, y) = ρ(i)1_{y=δ_{(i,1)}} + (1 − ρ(i))1_{y=δ_{(i,0)}}   (i ∈ S, y ∈ N(S × {0, 1})).

Note that this is very similar to the thinning kernel Kρ defined in (4.6).

We set

S̄ := { (i, σ) : i ∈ S, σ ∈ {0, 1}, L_ρ(δ_i, δ_{(i,σ)}) > 0 },

i.e.,

S̄ := S̄_0 ∪ S̄_1, where S̄_0 := { (i, 0) : 1 − ρ(i) > 0 } and S̄_1 := { (i, 1) : ρ(i) > 0 }.

Then L_ρ is in effect a probability kernel from N(S) to N(S̄).

For any x ∈ N(Ŝ) and Ŝ′ ⊂ Ŝ, let us write

x|_{Ŝ′} := (x(i))_{i∈Ŝ′} ∈ N(Ŝ′)

for the restriction of x to Ŝ′ (and similarly for subsets of S). With this notation, if Y is a random variable with law L_ρ(x, · ), then Y|_{Ŝ_1} (resp. Y|_{Ŝ_0}) is a thinning of x with the function ρ (resp. 1 − ρ).


Recall that the offspring distribution Q of X is a probability kernel from S to N(S). Let

E := {y ∈ N(Ŝ) : y|_{Ŝ_1} = 0}.

Observe that QL_ρ is a probability kernel from S to N(Ŝ). For given i ∈ S, the probability of E under QL_ρ(i, · ) is given by

QL_ρ(i, E) = Σ_{x∈N(S)} Q(i, x) (1 − ρ)^x = 1 − Uρ(i) = 1 − ρ(i),   (4.20)

where we have used that Uρ = ρ. We define a probability kernel Q̂ from Ŝ to N(Ŝ) by

Q̂(i, σ; · ) := QL_ρ(i, · | E) if σ = 0, and Q̂(i, σ; · ) := QL_ρ(i, · | Eᶜ) if σ = 1,

where we are conditioning the probability law QL_ρ(i, · ) on the event E and its complement, respectively. By (4.20), these conditional probabilities are well-defined, i.e., QL_ρ(i, E) > 0 for all (i, 0) ∈ Ŝ_0 and QL_ρ(i, Eᶜ) > 0 for all (i, 1) ∈ Ŝ_1.

In words, our definition of Q̂ says that an individual of type (i, 0) ∈ Ŝ_0 produces offspring in the following manner. First, we produce offspring according to the law Q(i, · ) of our original branching process. Next, we assign to these individuals independent 'signs' 0, 1 with probabilities depending on the type of the individual through the function ρ. Finally, we condition on producing only offspring with sign 0. Individuals of type (i, 1) ∈ Ŝ_1 reproduce similarly, but in this case we condition on producing at least one offspring with sign 1. Note that individuals of a type in Ŝ_1 always produce at least one offspring in Ŝ_1, and possibly also offspring in Ŝ_0. Individuals in Ŝ_0 produce only offspring in Ŝ_0, and possibly no offspring at all.
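This verbal description translates directly into a rejection sampler (a sketch; the single-type offspring law and the function ρ below are hypothetical examples, not from the text):

```python
import random

def sample_Q_hat(i, sigma, sample_Q, rho):
    """Sample offspring of an individual of type (i, sigma): draw offspring
    from Q(i, .), attach independent signs via rho, and accept only if the
    event E (no sign-1 children) occurred when sigma == 0, or its
    complement occurred when sigma == 1."""
    while True:
        children = sample_Q(i)                       # list of types, law Q(i, .)
        signed = [(j, 1 if random.random() < rho[j] else 0) for j in children]
        has_sign1 = any(s == 1 for _, s in signed)
        if (sigma == 1) == has_sign1:                # accept iff event matches sigma
            return signed

# hypothetical single-type example: 0 or 2 children, each with probability 1/2
sample_Q = lambda i: ["a"] * random.choice([0, 2])
rho = {"a": 0.5}
random.seed(2)
offspring = sample_Q_hat("a", 1, sample_Q, rho)
print(offspring)   # contains at least one sign-1 child
```

By (4.20), both acceptance events have positive probability for types in Ŝ, so the loop terminates almost surely.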

Theorem 4.17 (Distinguishing surviving particles) Let X = (X_k)_{k≥0} be a branching process with finite type space S and irreducible offspring distribution Q. Assume that X survives (with positive probability). Let ρ, L_ρ, Ŝ, and Q̂ be as defined above. Then X can be coupled to a branching process Y with type space Ŝ and offspring distribution Q̂, in such a way that

P[Y_n = y | (X_k)_{0≤k≤n}] = L_ρ(X_n, {y})   (y ∈ N(Ŝ), n ≥ 0).   (4.21)

Proof We apply Theorem 3.5. In fact, for our present purpose Theorem 3.4 is sufficient, where the function ψ occurring there is given by

ψ(y)(i) := y(i, 0) + y(i, 1)   (y ∈ N(Ŝ), i ∈ S).


Let P and P̂ denote the transition kernels of X and Y, respectively. We need to check that

P L_ρ = L_ρ P̂.

Since P, L_ρ, and P̂ have the branching property (4.4), it suffices to check that

δ_i P L_ρ = δ_i L_ρ P̂   (i ∈ S).

Indeed, by our definition of Q̂,

δ_i L_ρ P̂ = (1 − ρ(i)) Q̂(i, 0; · ) + ρ(i) Q̂(i, 1; · )
         = (1 − ρ(i)) QL_ρ(i, · | E) + ρ(i) QL_ρ(i, · | Eᶜ) = QL_ρ(i, · ) = δ_i P L_ρ,

where we have used (4.20).

Proposition 4.18 (Trimmed process) Let X and Y be the branching processes with type spaces S and Ŝ in Theorem 4.17, let ρ be as in that theorem and let U be the generating operator of X. Then

(Y_k|_{Ŝ_1})_{k≥0}

is a branching process with type space Ŝ_1 and generating operator U^ρ given by

U^ρ φ(i) = ρ(i)^{−1} U(ρφ)(i)   (i ∈ Ŝ_1, φ ∈ [0, 1]^{Ŝ_1}),

where ρφ denotes the pointwise product of ρ and φ, which is extended to a function on S by setting ρφ(j) := 0 for all j ∈ S \ Ŝ_1.

Proof Since individuals in Ŝ_0 never produce offspring in Ŝ_1, it is clear that the restriction of Y to Ŝ_1 is a branching process. Let Q′ be the offspring distribution of the restricted process. Then Q′(i, · ) is just QK_ρ(i, · ) conditioned on producing at least one offspring, where K_ρ is the thinning kernel defined in (4.6). Then, setting G := N(Ŝ_1) \ {0}, we have by (4.7)

U^ρ φ(i) = δ_i Q′ K_φ(G) = δ_i Q K_ρ K_φ(G) / δ_i Q K_ρ(G) = δ_i Q K_{ρφ}(G) / δ_i Q K_ρ(G) = U(ρφ)(i) / Uρ(i) = ρ(i)^{−1} U(ρφ)(i),

where we have used that Uρ = ρ.
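For a single-type Galton-Watson process the generating operator reduces to Uφ = 1 − f(1 − φ), where f is the offspring probability generating function, and the formula U^ρφ = ρ^{−1}U(ρφ) can be checked numerically. A sketch with a hypothetical offspring law (not from the text):

```python
# Hypothetical single-type Galton-Watson: P[k children] = p[k], mean 1.25 > 1.
p = [0.25, 0.25, 0.5]

def f(s):                                  # offspring generating function
    return sum(pk * s**k for k, pk in enumerate(p))

def U(phi):                                # generating operator on [0, 1]
    return 1.0 - f(1.0 - phi)

# survival probability: largest fixed point of U, found by iteration from 1
rho = 1.0
for _ in range(200):
    rho = U(rho)

def U_rho(phi):                            # trimmed generating operator
    return U(rho * phi) / rho

print(rho, U_rho(1.0))                     # -> approximately 0.5 and 1.0
```

Here f(s) = 1/4 + s/4 + s²/2 has fixed points s = 1/2 and s = 1, so ρ = 1/2; and U^ρ(1) = U(ρ)/ρ = 1 reflects that individuals in Ŝ_1 always produce at least one child in Ŝ_1.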

Remark In particular, if ρ is the survival probability, then the branching process Y from Proposition 4.18 has been called the trimmed tree of X in [FS04], which deals with continuous type spaces and continuous time. Similar constructions have been used in branching theory long before this paper, but it is hard to find a good reference.


Exercise 4.19 (Nonbranching process) Let S be a countable set, let P′ be a probability kernel on S, and let Q be the offspring distribution defined by

Q(i, {δ_j}) := P′(i, j)   (i, j ∈ S),

and Q(i, {x}) := 0 if |x| ≠ 1. In other words: a particle at position i produces exactly one offspring, at a position distributed according to P′(i, · ). Let X be the branching process with type space S and offspring distribution Q and let U be its generating operator. Show that U = P′. In particular, this says that a function ρ ∈ [0, 1]^S solves Uρ = ρ if and only if ρ is harmonic for P′, and that U^ρ = (P′)^ρ is just the classical Doob transform of P′.
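The Doob transform (P′)^ρ(i, j) = ρ(i)^{−1} P′(i, j) ρ(j) is easy to inspect numerically; its rows sum to one precisely because ρ is harmonic. A sketch with a hypothetical absorbing nearest-neighbour walk (not an example from the text):

```python
import numpy as np

# Hypothetical nearest-neighbour walk on {0, 1, 2, 3}, absorbed at 0 and 3.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

rho = np.array([0.0, 1/3, 2/3, 1.0])   # P[hit 3 before 0]; harmonic: P rho = rho

# Doob transform on the states where rho > 0: P^rho(i,j) = rho(j) P(i,j) / rho(i)
idx = rho > 0
Pr = (P * rho)[np.ix_(idx, idx)] / rho[idx, None]
print(Pr)   # transition kernel of the walk conditioned to be absorbed at 3
```

From state 1 the conditioned walk moves to 2 with probability one, since stepping to 0 is incompatible with the conditioning.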

If X is a multitype branching process with type space S and offspring distribution Q, then let us write i → j if Q(i, {x : x(j) > 0}) > 0 and i ⇝ j if there exist i = i_0 → · · · → i_n = j. We say that Q is irreducible if i ⇝ j for all i, j ∈ S. In particular, if Q has finite first moments, then this is equivalent to irreducibility of the matrix A(i, j) in (4.18). We also define aperiodicity of Q in the obvious way, i.e., an irreducible Q is aperiodic if the greatest common divisor of {n ≥ 1 : Pⁿ(δ_i, {x : x ≥ δ_i}) > 0} is one for some, and hence for all, i ∈ S. If Q is irreducible and

ρ(i) = P^{δ_i}[X_k ≠ 0 ∀ k ≥ 0]   (i ∈ S),   (4.22)

then it is not hard to see that either ρ(i) > 0 for all i ∈ S, or ρ(i) = 0 for all i ∈ S. In the first case, we say that X survives, while in the second case we say that X dies out.

Exercise 4.20 (Immortal process) Let X be a branching process with finite type space S, offspring distribution Q, and generating operator U. Assume that Q is irreducible and aperiodic and that Q(i, {0}) = 0 for each i ∈ S, i.e., each individual always produces at least one offspring. Assume also that Q(i, {x : |x| ≥ 2}) > 0 for at least one i ∈ S. Then it is not hard to show that

P^{δ_i}[X_n(j) ≥ N] −→ 1 as n → ∞   (i, j ∈ S, N < ∞).

Use this to show that for any φ ∈ [0, 1]^S that is not identically zero,

Uⁿφ(i) −→ 1 as n → ∞   (i ∈ S).

In particular, this shows that the equation Uρ = ρ has only two solutions: ρ = 0 and ρ = 1.


Exercise 4.21 (Fixed points of generating operator) Let X be a branching process with finite type space S, offspring distribution Q, and generating operator U. Assume that Q is irreducible and aperiodic and that Q(i, {x : |x| ≥ 2}) > 0 for at least one i ∈ S. Assume that the survival probability ρ in (4.22) is positive for some, and hence for all, i ∈ S. Show that for any φ ∈ [0, 1]^S that is not identically zero,

Uⁿφ(i) −→ ρ(i) as n → ∞   (i ∈ S).

In particular, this shows that ρ is the only nonzero solution of the equation Uρ = ρ. Hint: use Proposition 4.18 to reduce the problem to the set-up of Exercise 4.20.
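The convergence Uⁿφ → ρ can be observed directly in the single-type case, where U reduces to Uφ = 1 − f(1 − φ) in terms of the offspring generating function f. A sketch with a hypothetical offspring law (not from the text):

```python
# Hypothetical single-type offspring law: P[k children] = p[k], mean 1.25.
p = [0.25, 0.25, 0.5]
f = lambda s: sum(pk * s**k for k, pk in enumerate(p))
U = lambda phi: 1.0 - f(1.0 - phi)

phi = 0.01                 # any phi in (0, 1] that is not identically zero
for n in range(500):
    phi = U(phi)

# extinction probability solves f(q) = q, giving q = 0.5, so rho = 0.5
print(phi)                 # -> approximately 0.5
```

Near 0 the iteration expands (U′(0) = f′(1) = 1.25 > 1) and near ρ it contracts (U′(ρ) = f′(1 − ρ) = 0.75 < 1), so the iterates are driven away from the trivial fixed point and onto ρ.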

Exercise 4.22 (Exponential growth) In the set-up of Exercise 4.21, assume that Q has finite first moments. Let α be the Perron-Frobenius eigenvalue of the first moment matrix A defined in (4.12) and let h > 0 be the associated right eigenvector. Assume that α > 0, let M = (M_n)_{n≥0} be the martingale defined in (4.15), and let M_∞ := lim_{n→∞} M_n. Set

ρ′(i) := P^{δ_i}[M_∞ > 0].

Prove that ρ′ = ρ, where ρ is the survival probability defined in (4.22). Hint: show that Uρ′ = ρ′.
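A Monte Carlo sanity check of ρ′ = ρ in the single-type case (a sketch; the offspring law is a hypothetical example with mean α = 1.25 and extinction probability 1/2): on the event of survival, M_n = α^{−n} X_n typically stays positive, so the fraction of simulated populations still alive at a moderate horizon approximates ρ = P[M_∞ > 0].

```python
import random

# Hypothetical single-type offspring law: P[0]=1/4, P[1]=1/4, P[2]=1/2.
p = [0.25, 0.25, 0.5]

def step(x):
    # total number of children of x independent individuals
    return sum(random.choices(range(len(p)), weights=p, k=x)) if x else 0

random.seed(3)
runs, horizon, survived = 2000, 25, 0
for _ in range(runs):
    x = 1
    for _ in range(horizon):
        x = step(x)
        if x == 0:
            break
    survived += x > 0

print(survived / runs)   # close to the survival probability rho = 0.5
```

The bias from using a finite horizon is of order q_25 − q, which is already below 10⁻³ here.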


Appendix A

Supplementary material

A.1 The spectral radius

Proof of Lemma 2.10 Suppose that λ ∈ spec(A) and let f be an associated eigenvector. Then ‖Aⁿf‖ = |λ|ⁿ‖f‖, which shows that ‖Aⁿ‖^{1/n} ≥ |λ| and hence ρ(A) ≥ |λ|.

To complete the proof, it suffices to show that ρ(A) ≤ λ₊, where λ₊ := sup{|λ| : λ ∈ spec(A)}. We start with the case that A can be diagonalized, i.e., there exists a basis e_1, …, e_d of eigenvectors with associated eigenvalues λ_1, …, λ_d. By Exercise 2.5 the choice of our norm on V is irrelevant. We choose the ℓ¹-norm with respect to the basis e_1, …, e_d, i.e.,

‖φ‖ := Σ_{i=1}^d |φ(i)|,

where φ(1), …, φ(d) are the coordinates of φ w.r.t. this basis. Then

‖Aⁿφ‖ = ‖Σ_{i=1}^d φ(i) λ_iⁿ e_i‖ = Σ_{i=1}^d |φ(i)| |λ_i|ⁿ ≤ λ₊ⁿ ‖φ‖,

which proves that ‖Aⁿ‖ ≤ λ₊ⁿ for each n ≥ 1 and hence ρ(A) ≤ λ₊.
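This is easy to probe numerically (a sketch; the matrix below is an arbitrary diagonalizable example): ‖Aⁿ‖^{1/n} approaches λ₊ = max|λ| as n grows. Note that in an arbitrary norm the convergence still holds, though the inequality ‖Aⁿ‖ ≤ λ₊ⁿ itself requires the adapted basis norm used in the proof.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 0.0]])          # eigenvalues 3 and -2, so lambda_+ = 3

lam_plus = max(abs(np.linalg.eigvals(A)))
n = 30
gelfand = np.linalg.norm(np.linalg.matrix_power(A, n), 1) ** (1 / n)
print(lam_plus, gelfand)            # gelfand is already close to 3
```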

In general, A need not be diagonalizable, but we can choose a basis such that the matrix of A w.r.t. this basis has a Jordan normal form. Then we may write A = D + E, where D is the diagonal part of A and E has ones only on some places just above the diagonal and zeroes elsewhere. One can check that E is nilpotent, i.e., Eᵐ = 0 for some m ≥ 1. Moreover, E commutes with D and ‖E‖ ≤ 1 if we choose the ℓ¹-norm with respect to the basis e_1, …, e_d. Now

Aⁿ = (D + E)ⁿ = Σ_{k=0}^{m−1} (n choose k) D^{n−k} E^k

(since E^k = 0 for k ≥ m) and therefore

‖Aⁿ‖ ≤ Σ_{k=0}^{m−1} (n choose k) ‖D^{n−k}‖ ≤ Σ_{k=0}^{m−1} (n choose k) λ₊^{n−k} = λ₊ⁿ Σ_{k=0}^{m−1} (n choose k) λ₊^{−k} =: q(n) λ₊ⁿ,

where

q(n) = 1 + n λ₊^{−1} + ½ n(n−1) λ₊^{−2} + · · ·

is a polynomial in n of degree m − 1. In particular, this shows that ‖Aⁿ‖ = O((λ₊ + ε)ⁿ) as n → ∞ for all ε > 0, which again yields that ρ(A) ≤ λ₊.
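The polynomial correction is visible numerically (a sketch with a hypothetical 2×2 Jordan block): ‖Aⁿ‖^{1/n} exceeds λ₊ for every finite n but converges to it as n → ∞.

```python
import numpy as np

# Hypothetical 2x2 Jordan block: not diagonalizable, single eigenvalue lam.
lam = 2.0
A = np.array([[lam, 1.0],
              [0.0, lam]])

# A^n = lam^n I + n lam^{n-1} E, so ||A^n|| grows like q(n) lam^n with a
# polynomial factor q(n) -- yet ||A^n||^{1/n} still tends to lam = 2.
for n in (10, 100, 1000):
    print(n, np.linalg.norm(np.linalg.matrix_power(A, n), 1) ** (1 / n))
```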


Bibliography

[AS10] S.R. Athreya and J.M. Swart. Survival of contact processes on the hierarchical group. Probab. Theory Relat. Fields 147(3), 529–563, 2010.

[Bil86] P. Billingsley. Probability and Measure. Wiley, New York, 1986.

[Chu74] K.L. Chung. A Course in Probability Theory, 2nd ed. Academic Press,Orlando, 1974.

[DF90] P. Diaconis and J. Fill. Strong stationary times via a new form of duality. Ann. Probab. 18(4), 1483–1522, 1990.

[DM09] P. Diaconis and L. Miclo. On times to quasi-stationarity for birth anddeath processes. J. Theor. Probab. 22, 558–586, 2009.

[DS67] J.N. Darroch and E. Seneta. On quasi-stationary distributions in absorbing continuous-time finite Markov chains. J. Appl. Probab. 4, 192–196, 1967.

[DZ98] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, 2nd edition. Applications of Mathematics 38. Springer, New York, 1998.

[Fil92] J.A. Fill. Strong stationary duality for continuous-time Markov chains. I.Theory. J. Theor. Probab. 5(1), 45–70, 1992.

[FS04] K. Fleischmann and J.M. Swart. Trimmed trees and embedded particle systems. Ann. Probab. 32(3A), 2179–2221, 2004.

[Gan00] F.R. Gantmacher. The Theory of Matrices, Vol. 2. AMS, Providence RI,2000.

[Lach12] P. Lachout. Diskrétní martingaly (Discrete Martingales). Lecture notes, MFF UK, 2012.


[Lig10] T.M. Liggett. Continuous Time Markov Processes. AMS, Providence,2010.

[Lig99] T.M. Liggett. Stochastic interacting systems: contact, voter and exclusionprocesses. Springer-Verlag, Berlin, 1999.

[LL10] G.F. Lawler and V. Limic. Random Walk: A Modern Introduction. Cambridge Studies in Advanced Mathematics 123. Cambridge University Press, Cambridge, 2010.

[LP11] P. Lachout and Z. Prášková. Základy náhodných procesů I (Foundations of Random Processes I). Lecture notes, MFF UK, 2011.

[LPW09] D.A. Levin, Y. Peres and E.L. Wilmer. Markov Chains and MixingTimes. AMS, Providence, 2009.

[RP81] L.C.G. Rogers and J.W. Pitman. Markov functions. Ann. Probab. 9(4),573–582, 1981.

[Sen73] E. Seneta. Non-Negative Matrices: An Introduction to Theory and Appli-cations. George Allen & Unwin, London, 1973.

[Swa11] J.M. Swart. Intertwining of birth-and-death processes. Kybernetika 47(1),1–14, 2011.

[WG75] H.W. Watson and F. Galton. On the Probability of the Extinction ofFamilies. Journal of the Anthropological Institute of Great Britain 4, 138–144, 1875.

[Woe00] W. Woess. Random walks on infinite graphs and groups. CambridgeTracts in Mathematics 138. Cambridge University Press, Cambridge, 2000.

[Woe09] W. Woess. Denumerable Markov chains. Generating functions, boundarytheory, random walks on trees. EMS Textbooks in Mathematics. EMS,Zurich, 2009.


Index

absorbing state, 24
adaptedness, 6
algebraic multiplicity, 49
autonomous Markov chain, 13, 65

Bernoulli random variable, 62
branching process, 76
branching property, 77

characteristic polynomial, 49
codimension, 52
compensator, 8
convergence of σ-fields, 9
convolution, 77
coupling, 34

detailed balance, 21
Doob transform, 25
  generalized, 46, 56

eigenspace, 49
eigenvector
  left or right, 42
equivalence of states, 17
escape probability, 33
extinction, 94

Fekete's lemma, 43
filtered probability space, 7
filtration, 6
first entrance time, 6

Galton-Watson process, 83
Gelfand's formula, 50
generalized eigenspace, 49
generalized eigenvector, 49
generating operator, 79
geometric multiplicity, 49

h-transform, 25
harmonic function, 23
homogeneous Markov chain, 12

increments
  independent, 31
integer lattice, 34
intertwining, 61
invariant
  law, 19
  measure, 19
invariant subspace, 52
irreducibility, 17, 41, 94

Jordan normal form, 49

Markov
  chain, 12
  functional, 13
  property, 10, 15
  strong property, 16
martingale, 7
  convergence, 9
multiplicity
  algebraic, 49
  geometric, 49

nonnegative definite, 49
nonnegative matrix, 41
null recurrence, 18

offspring distribution, 76
operator norm, 41
optional stopping, 8

period of a state, 18
periodicity, 18
Perron-Frobenius theorem, 42
positive recurrence, 17, 19

quasi-stationary law, 58

random mapping representation, 13
random walk
  on integer lattice, 34
  on tree, 30
real matrix, 41
recurrence, 17
  null, 18
  positive, 17, 19
reversibility, 21

sample path, 5
self-adjoint transition kernel, 21
similar matrices, 61
spectral gap
  absolute, 54
spectral radius, 41
spectrum, 41
stochastic process, 5
stopped
  Markov chain, 13
  stochastic process, 7
  submartingale, 8
stopping time, 6
strong Markov property, 16
subadditivity, 43
subharmonic function, 23
submartingale, 7
  convergence, 9
superharmonic function, 23
supermartingale, 7
  convergence, 9
survival, 94

thinning, 62, 80
total variation distance, 38
transience, 17
transition
  kernel, 12
  probabilities, 12
transposed matrix, 42
trap, 24
type space, 75

uniformly integrable, 10