
Part C: Interacting Particle Systems

Paul Chleboun

Last updated: May 15, 2019


Contents

1 Introduction and Notation
  1.1 Introduction
  1.2 Notation
  1.3 Brief recap

2 Electrical networks
  2.1 Set-up
  2.2 Electrickery
  2.3 Effective Resistance and Energy
  2.4 Some further probabilistic interpretations

3 Polya’s Theorem
  3.1 Network reduction
  3.2 Polya’s Theorem and proof

4 Uniform Spanning Tree and Wilson’s algorithm
  4.1 Wilson’s algorithm
  4.2 Electrical interpretations and negative association
  4.3 Weak limits
    4.3.1 A Topological Aside
    4.3.2 Back to the UST measure

5 Percolation
  5.1 Model definition
  5.2 The phase transition

6 Percolation toolbox
  6.1 Harris-FKG inequality
  6.2 BK inequality
  6.3 Margulis-Russo formula

7 Further percolation
  7.1 Exponential decay in the subcritical regime
  7.2 Kesten’s theorem
  7.3 The Russo-Seymour-Welsh theory

8 Exclusion, contact and voter models
  8.1 Semigroups and generators
  8.2 Contact process and graphical construction
    8.2.1 Monotonicity
    8.2.2 Duality
    8.2.3 Stationary measures and percolation
  8.3 Exclusion process
    8.3.1 ASEP and blocking measures
    8.3.2 SSEP stationary measure, coupling and duality

9 Ising, Potts and Random Cluster models
  9.1 Gibbs States
  9.2 Ising and Potts Models
  9.3 Random-cluster model
    9.3.1 Coupling with the Potts model
    9.3.2 Potts model two point correlations
    9.3.3 Basic properties of the FK model
    9.3.4 Infinite volume and the phase transition

10 Stochastic Ising model and Mixing times
  10.1 Phase transition
  10.2 Large deviations of the empirical magnetization
  10.3 Mixing time
    10.3.1 Bottleneck
    10.3.2 Path coupling


PART C: INTERACTING PARTICLE SYSTEMS

16 lectures MT 2018

Aims

The aim is to introduce fundamental probabilistic and combinatorial tools, as well as key models, in the theory of discrete disordered systems. We will examine the large-scale behavior of systems containing many interacting components, subject to some random noise. Models of this type have a wealth of applications in statistical physics, biology and beyond, and we will see several key examples in the course. Many of the tools we will discuss are also of independent theoretical interest, and have far-reaching applications. For example, we will study the amount of time it takes for these processes to reach the stationary distribution (the mixing time). This concept is also important in many statistical applications, such as studying the run time of MCMC methods.

Prerequisites:

Discrete and continuous time Markov processes on countable state spaces, as covered for example in Part A A8 Probability and Part B SB3a Applied Probability.

Synopsis:

• Percolation and phase transitions.

• Uniform spanning trees, loop-erased random walks, and Wilson’s algorithm.

• Random walks on graphs and electrical networks (discrete potential theory).

• Important models: Stochastic Ising model (Glauber dynamics), Random-cluster model, Contact process, Exclusion process, Hard-core model.

• Important tools: Monotonicity, coupling, duality, FKG inequality.

• Gibbs measures and a discussion of phase transitions in this context.

• Mixing times and the spectral gap.

Reading

• G. Grimmett, Probability on Graphs, 2nd edition, Cambridge University Press (2018)

• R. Lyons, Y. Peres, Probability on Trees and Networks, available online.

• D. A. Levin, Y. Peres, E. L. Wilmer, Markov Chains and Mixing Times, American Mathematical Society (2009)

• T. Liggett, Continuous time Markov Processes, Graduate studies in mathematics (2010)

• B. Bollobas, O. Riordan, Percolation, Cambridge University Press (2009)

• P. Doyle, J. L. Snell, Random Walks and Electrical Networks, Cambridge University Press (2012)


Classes and problem sheets

There will be 16 hours of lectures, L1-L16. There will be 4 problem classes and assignment sheets. The first class will be shorter than usual, covering material from the first week of lectures; it will run in Week 2. The second and third will be ‘standard’ and run in Weeks 4-5 and Weeks 6-8 respectively. Class 4 will take place in Week 1 of HT. The final sheet will be longer, and will include more stretching material: the sort of problems that would be nice to ask but for which there is no time during term. Full solutions will be provided to the students.


Chapter 1

Introduction and Notation

1.1 Introduction

Interacting particle systems (IPS) are Markov processes, in continuous or discrete time, which describe ‘particles’ moving in some underlying discrete space, subject to some random noise and interactions. Such models often arise naturally in theoretical physics (statistical mechanics), and there are now numerous applications of these systems across the natural and social sciences, including population genetics, epidemiology, computer science, electrical engineering, and economics. A few examples are given below and on the associated slides. Interacting particle systems also provide a natural framework to study fundamental phenomena which occur in these applications, such as phase transitions, metastability and relaxation to equilibrium. One of the main goals in this field is to understand and predict emergent behaviour on macroscopic (large) scales, as a result of the microscopic dynamics and interactions of the individual components. In the past decades this field has grown in importance and established itself as one of the most active branches of probability theory.

The microscopic dynamics of interacting particle systems are often very simple to describe; however, they are frequently very challenging to analyse, in particular on large or infinite scales. On the other hand, they are often simple enough to be open to detailed mathematical (rigorous) analysis while still capturing the main features of interest in applications. Qualitative changes in large-scale macroscopic behaviour depending on the system parameters are often known as ‘collective phenomena’ or phase transitions, and are of particular interest. A couple of real-world examples [Figures on lecture slides]:

1. Figure 1 shows colour-coded data for car speed on the M25 as a function of space and time, as recorded by sensors under the road. The striped patterns of low speed (red) correspond to stop-and-go waves during rush hour. Often there is no obvious external cause such as an accident, so the pattern is interpreted as an intrinsic collective phenomenon emerging from the interactions of cars on a busy road. A minimal mathematical description of this situation in terms of IPS would be to take a one-dimensional lattice (or a subset thereof), and at each site denote the presence or absence of a car with an occupation number 1 or 0 respectively. In terms of dynamics, we simplify the system so that we can determine if the observed phenomena of interest are a result of relatively simple interactions. We only want to model normal traffic on a single-lane road without car crashes or overtaking. So cars are allowed to proceed one lattice site to the right, say, with rate 1, provided that the site in front is not occupied by another car. The rate may depend on the surrounding configuration of cars (e.g. the number of empty sites ahead), and relatively simple choices depending only on three or four neighbouring sites can already lead to interesting patterns and the emergence of stop-and-go waves. The defining features of this process in terms of IPS are that no particles are created or destroyed (conservation of the number of particles) and that there is at most one particle per site (exclusion rule). We will see the most basic example of a model in this class during the course, called the simple exclusion process, which not only has many applications (including in biological transport systems) but has attracted a great deal of attention in the mathematical literature and is at the heart of many very important results. (A minimal simulation sketch of this dynamics is given after this list.)


2. Figure 2 shows segregation patterns of a microbial species (E. coli) when grown in a nutrient-rich medium from a mixed initial population of circular shape. Each individual appears either red or green as a result of a neutral genetic marker that only affects the colour. A possible IPS model of this system might evolve on a two-dimensional lattice (say Z^2 for simplicity), where a state is described by assigning to each vertex one of three colours. The dynamics can be modelled by letting each individual split into two with a given rate, and then place the offspring on an empty (white) neighbouring site. If there is no empty neighbouring site the reproduction rate is zero (or equivalently the offspring is immediately killed). Therefore we have two equivalent species competing for the same resource (empty sites), and spatial segregation is a result of the fact that once the red particles have died out in a certain region due to fluctuations, all the offspring descend from green ancestors. Note that in contrast to the first example, the number of particles in this model is not conserved. The Eden model is a well-known model of pure growth in this way. The simplest such process to model extinction or survival of a single species is called the contact process.

3. Figure 3 shows a snapshot of a simplified model of magnetic materials, called the Ising model. This is one of the main prototypical models in statistical physics for exploring phase transitions, and one of the most studied lattice spin models. Many important developments have been driven by studying this model (and slight generalisations), including the interplay between static (equilibrium) properties and dynamical features. In the Ising model each vertex of Z^2 (or some subset) is assigned a value of either +1 (associated with spin-up) or −1 (associated with spin-down). The model is then defined by a probability measure on this space, which is parametrized by the system temperature and an external magnetic field. Natural Markovian dynamics can be associated with the model, and a particular special class called the Glauber dynamics has been extensively studied. The figure shows the system switching from one metastable state (mostly −1) to a stable one (mostly +1) by droplet growth. Metastable dynamics of stochastic systems is currently a rich and diverse research area in its own right.

4. Figure 4 shows a snapshot from a kinetically constrained model under Glauber-type dynamics, relaxing to equilibrium. Kinetically constrained models were introduced in the physics literature as minimal models for the dynamics of glassy (or amorphous) materials, such as supercooled liquids. These models have recently attracted significant attention in the mathematics community. Developments here are occurring simultaneously in the more applied and theoretical communities, and there is a great deal of interdisciplinary research in the area of glassy dynamics.
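The traffic model in example 1 is straightforward to simulate. Below is a minimal sketch (not from the notes; all parameter names are illustrative) of the totally asymmetric version on a ring, where each car attempts to hop one site to the right at rate 1, suppressed by the exclusion rule:

```python
import random

def tasep_ring(n_sites=100, n_cars=40, t_max=1000.0):
    """Simulate a totally asymmetric simple exclusion process on a ring.

    Each particle attempts to jump one site to the right at rate 1; the jump
    is suppressed if the target site is occupied (exclusion rule).  Returns
    the final configuration as a list of 0/1 occupation numbers.
    """
    eta = [1] * n_cars + [0] * (n_sites - n_cars)
    random.shuffle(eta)
    t = 0.0
    while True:
        # n_cars independent exponential rate-1 clocks, so the next
        # attempted jump happens after an Exp(n_cars) waiting time.
        t += random.expovariate(n_cars)
        if t > t_max:
            return eta
        # Pick a uniformly random particle and try to move it right.
        k = random.choice([i for i, x in enumerate(eta) if x == 1])
        if eta[(k + 1) % n_sites] == 0:
            eta[k], eta[(k + 1) % n_sites] = 0, 1
```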

In these notes we will cover the basic definitions, concepts, and tools that are used in the study of interacting particle systems. Along the way we will introduce some classical examples of these systems. We will then focus mainly on the large-scale behaviour, such as phase transitions, and dynamical properties such as mixing and relaxation. Many of the tools we will discuss are also of independent theoretical interest, and have far-reaching applications. For example, we will study the amount of time it takes for these processes to reach the stationary distribution (mixing time). This concept is also important in many statistical applications, such as studying the run time of MCMC methods.

1.2 Notation

I will attempt to stick to the following notation and terminology throughout the notes. There may be some need to stray from this now and again to improve clarity.


We will typically denote the state space (i.e. the set of all possible configurations of the system) by Ω, and configurations will typically be denoted by small Greek letters η, σ, . . . ∈ Ω. For interacting particle systems Ω will be of the form S^Λ, where S ⊂ Z is called the local state space, which will typically be finite (e.g. S = {0, 1}, corresponding to the presence or absence of a particle at a given site, for exclusion-type models, and S = {−1, +1} for the Ising model). Λ will be a countable set which represents the lattice on which the ‘particles’ live. When S is finite, Ω is compact and metrizable with respect to the product topology. We will try to always denote vertices (or sites) in the lattice Λ by small Latin letters, x, y, z ∈ Λ. For η ∈ Ω and x ∈ Λ the local occupation number is denoted by η(x) ∈ S. So a configuration η ∈ Ω can be thought of as a vector (η(x))_{x∈Λ} indexed by the lattice Λ, or as a function η : Λ → S. Clearly if S and Λ are both finite then so is Ω; if S is countable and Λ is finite then Ω is countable; we will typically focus on these two cases. Often when discussing the ‘large-scale’ behaviour we will be considering a sequence of increasingly large systems Ω_n = S^{Λ_n}, where |Λ_n| → ∞ as n → ∞. As soon as |S| > 1 and Λ is infinite the state space Ω is uncountable. This is often the case in models in statistical mechanics which are defined on the entire lattice Z^d. For a measurable structure in this case we will take the Borel σ-algebra generated by open cylinders, denoted by B(Ω). We will denote the set of probability measures on Ω by P(Ω). Probability measures on Ω will typically be called µ, ν or π ∈ P(Ω).

When the state space is relatively low-dimensional, e.g. Ω ⊂ Z, we may use capital letters to denote the process, e.g. (X_t)_{t≥0}.

We will consider chains in continuous time,

t ∈ R_+ = [0, ∞),

and also sometimes in discrete time,

n ∈ N_0 = {0, 1, . . .}.

The letters n, m, k will typically denote integers, while we reserve t and s for real numbers (times).

We write (η_t)_{t≥0} for a continuous time process and (η_n)_{n≥0} for a discrete time process, where it is hopefully clear from context which one we are using. We will typically denote the probability measure associated with the process (on path space) by P. In continuous time the canonical path space is given by

D[0, ∞) = {η_· : [0, ∞) → Ω càdlàg};

we will be more specific about the probability space etc. when we look at specific models. The associated expectation of a random variable f will be written as E[f].

For a graph G = (V, E) we will write σ ∼ η ⟺ {σ, η} ∈ E.

1.3 Brief recap

This section gives a very brief recap of discrete and continuous time Markov processes on countable state spaces, and some important concepts such as reversibility. For more detail see for example the Part A Probability and Part B Applied Probability notes, or the book ‘Markov Chains’ by J. R. Norris.

Let Ω be a countable set. Let P = (P(η, σ))_{η,σ∈Ω} be a stochastic matrix on Ω, i.e.

P(η, σ) ≥ 0 ∀η, σ ∈ Ω and ∑_{σ∈Ω} P(η, σ) = 1 ∀η ∈ Ω.


We say that the Ω-valued random variable (X_n)_{n≥0} is a Markov chain with transition matrix P and initial distribution ν if its law, denoted by P_ν, is determined by the following: for all n ≥ 1, all ξ, ζ, σ_0, . . . , σ_{n−1} ∈ Ω, and all events H_{n−1} = ∩_{i=0}^{n−1} {X_i = σ_i} satisfying P_ν(H_{n−1} ∩ {X_n = ξ}) ≠ 0, we have

(i) P_ν(X_0 = ζ) = ν(ζ);

(ii) P_ν(X_{n+1} = ζ | H_{n−1} ∩ {X_n = ξ}) = P(ξ, ζ).

Equation (ii), often called the Markov property, means that the conditional probability of proceeding from state ξ to state ζ is the same no matter what sequence of states preceded the current state ξ, i.e. the past and future are conditionally independent given the present. If ν = δ_σ then we write P_σ.

Lemma 1.1. A discrete-time random process (X_n)_{n≥0} is a Markov chain with initial distribution ν and transition matrix P if and only if for all n ≥ 0 and σ_0, . . . , σ_n ∈ Ω,

P(X_0 = σ_0, X_1 = σ_1, . . . , X_n = σ_n) = ν(σ_0) P(σ_0, σ_1) P(σ_1, σ_2) · · · P(σ_{n−1}, σ_n).

Proof. Exercise.

We denote the distribution at time n started from ν by ν_n, i.e.

ν_n(σ) := P_ν(X_n = σ) = (ν_{n−1} P)(σ) = ∑_{ξ∈Ω} ν_{n−1}(ξ) P(ξ, σ),

or ν_n = ν_{n−1} P, so ν_n = ν_0 P^n for all n ≥ 0.

The transition matrix acts on functions (observables), which by convention we think of as column vectors, as

Pf(σ) = ∑_{ξ∈Ω} P(σ, ξ) f(ξ) = ∑_{ξ∈Ω} f(ξ) P_σ(X_1 = ξ) = E_σ[f(X_1)],

and in general P^n f(σ) = E_σ[f(X_n)] for n ≥ 1. That is, the σth entry of Pf tells us the expected value of the function f after one time step, given that the initial distribution was δ_σ.
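In matrix terms, distributions act on P from the left (as row vectors) and observables from the right (as column vectors). A minimal numerical sketch (the 3-state chain is a made-up example):

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

nu = np.array([1.0, 0.0, 0.0])  # initial distribution: point mass at state 0
f = np.array([0.0, 1.0, 2.0])   # an observable f : Omega -> R

nu_10 = nu @ np.linalg.matrix_power(P, 10)  # nu_n = nu_0 P^n at n = 10
Pf = P @ f                                  # Pf(sigma) = E_sigma[f(X_1)]
print(nu_10, Pf)
```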

Definition 1.2 (Stationary distribution). A probability distribution π on Ω is called stationary for the chain if πP = π, equivalently

π(σ) = ∑_{ξ∈Ω} π(ξ) P(ξ, σ) ∀σ ∈ Ω.

Clearly if π is stationary and ν_0 = π then ν_n = π for all n ≥ 0.
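On a finite state space a stationary distribution can be computed as a left eigenvector of P with eigenvalue 1. A sketch using numpy (continuing the made-up chain above):

```python
import numpy as np

def stationary(P):
    """Return a distribution pi with pi P = pi, via the left eigenvector
    of P for the eigenvalue closest to 1, normalised to sum to 1."""
    w, V = np.linalg.eig(P.T)      # right eigenvectors of P^T = left eigenvectors of P
    k = np.argmin(np.abs(w - 1.0))
    pi = np.real(V[:, k])
    return pi / pi.sum()

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = stationary(P)
print(pi, np.allclose(pi @ P, pi))   # pi = (0.25, 0.5, 0.25); pi P = pi
```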

The Markov chain is said to be irreducible if it is possible to go from any state to any other using only transitions of positive probability.

Definition 1.3 (Irreducible). A chain P is irreducible if for any σ, ξ ∈ Ω there exists an n ∈ N such that P^n(σ, ξ) > 0.

Recall the definition of recurrence and positive recurrence.

Theorem 1.4. Let P be an irreducible Markov transition matrix. Then

(i) P has a stationary distribution if and only if P is positive recurrent.


(ii) In that case, the stationary distribution is unique and is given by π(σ) = 1/E_σ[τ_σ^+], where τ_σ^+ := min{n ≥ 1 | X_n = σ} is the first return time to state σ.

Proof. Seen previously, see e.g. Prop 1.14 and Corollary 1.17 in [2].

We will call a chain ergodic if it has a unique stationary distribution.

Definition 1.5 (Aperiodic). A discrete time Markov chain is called aperiodic if for all σ ∈ Ω, P^n(σ, σ) is eventually positive, i.e.

there exists N_σ ∈ N such that P^n(σ, σ) > 0 for all n ≥ N_σ.

Theorem 1.6 (Convergence to equilibrium). An irreducible, aperiodic Markov chain with finite state space is ergodic, i.e. it has a unique stationary distribution π and

P^n(σ, ξ) = P_σ(X_n = ξ) → π(ξ) as n → ∞, ∀σ, ξ ∈ Ω.

Theorem 1.7 (Ergodic Theorem). Let f be a real-valued function on Ω and (X_n)_{n≥0} an irreducible, positive recurrent Markov chain with stationary distribution π. Then for any initial distribution ν,

(1/n) ∑_{m=0}^{n−1} f(X_m) → E_π[f] a.s. as n → ∞.

In words, “time averages converge to space averages.”

Proof. See e.g. [2], Section 4.3.

A Markov chain’s past and future are independent given the present. However, consider an ergodic chain (irreducible and aperiodic) starting from a point mass on a single state: convergence to equilibrium demonstrates asymmetric behaviour, i.e. increasing entropy (or uncertainty). On the other hand, a Markov chain in equilibrium (started from a stationary distribution) run backward is again a Markov chain, although possibly not the same one.

Theorem 1.8. Suppose (X_n)_{n≥0} is a Markov chain with irreducible (positive recurrent) transition matrix P and initial distribution π, the unique stationary distribution. Fix N > 0; then (X̂_n)_{n=0}^{N} given by X̂_n = X_{N−n} is a Markov chain with initial distribution π and transition matrix

P̂(σ, ξ) = π(ξ) P(ξ, σ) / π(σ).

Proof. See [Thm 1.9.1 in Markov Chains, Norris].

Definition 1.9. A Markov chain is called reversible with respect to π if the forward and reversed chains have the same distribution, i.e. P̂ = P.

Definition 1.10. A transition matrix P satisfies the detailed balance condition with respect to a measure π ∈ P(Ω) if

π(σ) P(σ, ξ) = π(ξ) P(ξ, σ) ∀σ, ξ ∈ Ω.

Proposition 1.11. If P satisfies the detailed balance condition with respect to a measure π, then π is a stationary distribution for the Markov chain generated by P. A Markov chain is reversible with respect to π if and only if its transition matrix satisfies the detailed balance condition with respect to π.

Proof. Exercises.
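Detailed balance is a purely local, mechanically checkable condition; a sketch for the (reversible, birth-death) example chain used above:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])   # its stationary distribution

def satisfies_detailed_balance(P, pi, tol=1e-12):
    """Check pi(sigma) P(sigma, xi) = pi(xi) P(xi, sigma) for all pairs."""
    F = pi[:, None] * P            # F[s, x] = pi(s) P(s, x)
    return np.allclose(F, F.T, atol=tol)

print(satisfies_detailed_balance(P, pi))   # True: the chain is reversible
```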


Remark 1.12. A Markov chain with transition matrix P is reversible with respect to π if and only if P is self-adjoint in ℓ^2(π), with inner product ⟨f, g⟩_π = ∑_{σ∈Ω} f(σ) g(σ) π(σ). That is, for all square integrable random variables f, g ∈ ℓ^2(π),

⟨Pf, g⟩_π := ∑_{σ∈Ω} π(σ) Pf(σ) g(σ) = ∑_{σ∈Ω} π(σ) g(σ) ∑_{ξ∈Ω} P(σ, ξ) f(ξ) = ∑_{σ,ξ∈Ω} π(ξ) P(ξ, σ) g(σ) f(ξ) = ⟨f, Pg⟩_π .

[Figure 1.1: A sample path η_· ∈ D[0, ∞) on a countable state space Ω, showing the jump times J_0 = 0, J_1, J_2, . . . and the holding times S_1, S_2, . . .]

You should revise/be familiar with the analogous results for continuous time chains on countable state space, in particular in terms of the ‘Q-matrix’ or (infinitesimal) generator of the process, and the correspondence with the jump-chain/holding-time description (see Fig. 1.1).

We will often denote the generator (Q-matrix) of a continuous time process on countable state space as either Q = (q(σ, η))_{σ,η∈Ω} or by L. Recall that if the state space is finite then the continuous time transition matrix (Markov semigroup) can be expressed as P_t = e^{tL}. These satisfy the backward and forward equations

P_t′ = L P_t = P_t L with P_0 = I. (1.3.1)
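On a finite state space the semigroup is literally a matrix exponential, so (1.3.1) can be checked numerically. A sketch with scipy and a made-up 2-state generator:

```python
import numpy as np
from scipy.linalg import expm

# A hypothetical 2-state Q-matrix: non-negative off-diagonal rates, rows sum to 0.
L = np.array([[-2.0, 2.0],
              [1.0, -1.0]])

t, h = 0.7, 1e-6
Pt = expm(t * L)                            # P_t = e^{tL}
dPt = (expm((t + h) * L) - Pt) / h          # finite-difference approximation to P_t'
print(np.allclose(dPt, L @ Pt, atol=1e-4))  # backward equation P_t' = L P_t
print(np.allclose(dPt, Pt @ L, atol=1e-4))  # forward equation P_t' = P_t L
print(Pt.sum(axis=1))                       # each row of P_t sums to 1
```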


Chapter 2

Electrical networks

Any Markov process on a countable state space can be thought of as a random walk on a (directed) graph, possibly with self-loops, where the vertices are the states of the process and edges represent possible transitions. It turns out that there is an intimate connection between reversible random walks on graphs and electrical networks. This connection gives the discrete analogue of the deep connection between continuous potential theory and Brownian motion. The purpose of this chapter is to give an introduction to this discrete potential theory. In the next lecture we will see a nice application of this theory to transience and recurrence of random walks on Z^d, summarised in the beautiful theorem of Polya.

On countable or finite state spaces the electrical network analogy looks essentially the same in continuous and discrete time (as a consequence of the jump-chain/holding-time construction). We will write things predominantly in terms of a discrete time chain, but try to make it clear how things would work similarly in continuous time as we go. When the state space is uncountable there are some extra subtleties that we will not discuss here. A nice and much more detailed account of this subject is given by Doyle and Snell, Random Walks and Electric Networks (available on arXiv).

2.1 Set-up

To any reversible Markov chain on a finite state space Ω, with transition matrix P and reversible distribution π, we can associate an (undirected) graph G = (Ω, E) and weights c : E → (0, ∞). We will drop the usual assumption that G is irreflexive; that is, E may contain ‘loops’, i.e. edges of the form {η, η} for η ∈ Ω.

The edge set is given by {η, σ} ∈ E if and only if P(η, σ) > 0 (or analogously, in continuous time, q(η, σ) > 0). Note that the definition of the edge set makes sense since P(η, σ) > 0 if and only if P(σ, η) > 0, by reversibility (the detailed balance condition). We put a weight function on the edges, c : E → (0, ∞), specified by c({η, σ}) = π(η) P(η, σ). This is symmetric with respect to switching η and σ, again by detailed balance. We think of c({η, σ}) as the conductance of the edge {η, σ}; equivalently the edge has a resistor on it with resistance r({η, σ}) = 1/c({η, σ}). With a slight abuse of notation we will write c(σ, η). It is straightforward to check that π(η) = c(η) := ∑_{σ∼η} c(η, σ). We will call a graph together with a weight function of this kind a network.

We could also start with a finite graph and associated conductances c : E → (0, ∞), and use these to construct a reversible Markov process via the relation P(η, σ) = c(η, σ)/c(η). In this case it turns out that the chain is reversible with respect to the probability distribution given by π(η) = c(η)/c_Ω, where c_Ω = ∑_{η∈Ω} c(η).
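Both directions of this correspondence are mechanical. A sketch (hypothetical 3-vertex network) going from conductances to the chain and its reversible measure:

```python
import numpy as np

def chain_from_conductances(C):
    """Given a symmetric conductance matrix C (C[x, y] > 0 iff x ~ y),
    return P(x, y) = c(x, y)/c(x) and pi(x) = c(x)/c_Omega."""
    c = C.sum(axis=1)              # c(x) = sum over y ~ x of c(x, y)
    P = C / c[:, None]
    pi = c / c.sum()
    return P, pi

# A path 0 - 1 - 2 with unequal conductances (made-up numbers).
C = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
P, pi = chain_from_conductances(C)
F = pi[:, None] * P
print(np.allclose(F, F.T))         # detailed balance: pi(x)P(x,y) = pi(y)P(y,x)
```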

For A ⊂ Ω define the hitting time and first return time as

τ_A = min{n ≥ 0 : X_n ∈ A},  τ_A^+ = min{n ≥ 1 : X_n ∈ A}.

In problems such as the gambler’s ruin we are often interested in the probability of reaching some point of the state space before another. Consider the probability that the chain started from η ∈ Ω hits a set A before the set B, where A ∩ B = ∅, and let

f(η) = P_η(τ_A < τ_B).


Then f|_A ≡ 1, f|_B ≡ 0, and by the Markov property

f(η) = ∑_{σ∼η} P(η, σ) f(σ).

In words, this says that the value of the function at a point is equal to its local average, f(η) = E_η[f(X_1)] (notice that this does not require the chain to be reversible). This property turns out to be very important, and f is called harmonic on (A ∪ B)^c with respect to P. These harmonic functions share many of the properties that you may be familiar with from the non-discrete setting (i.e. harmonic functions on R), such as the mean value property above and the following theorems. We introduce some notation for the external boundary of a set A ⊂ Ω in G:

∂A = {σ ∈ A^c : σ ∼ η for some η ∈ A}.

The following are stated for finite graphs, but with weak conditions they can be easily extended to countable graphs.

Theorem 2.1 (Maximum Principle). Let A ⊂ Ω be a connected component of G. If f : Ω → R is harmonic on A and attains its supremum at η_0 ∈ A, then f is constant on A ∪ ∂A.

Proof. Let K := {σ ∈ Ω : f(σ) = f(η_0)}. If η ∈ A ∩ K and P(η, σ) > 0 then σ ∈ K, since f is harmonic at η and takes its largest value at η. We can proceed by iteration.

Theorem 2.2 (Uniqueness Principle). Let G be connected, and A a proper subset of Ω. If f, g : Ω → R are harmonic on A and f(η) = g(η) for all η ∈ A^c, then f ≡ g.

Proof. The proof follows from the Maximum Principle, exercise.

Theorem 2.3 (Existence Principle). Let A be a proper subset of Ω. If f_0 : Ω \ A → R, then there exists an f : Ω → R such that f|_{A^c} ≡ f_0 and f is harmonic on A.

Proof. Check that f(η) = E_η[f_0(X_{τ_{A^c}})] gives such a function by using the usual one-step analysis.
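On a finite graph the harmonic extension of the Existence Principle is just a linear solve: on each interior state impose f(η) = ∑_σ P(η, σ) f(σ), and fix f on the boundary. A sketch, with gambler’s-ruin boundary values as a test case:

```python
import numpy as np

def harmonic_extension(P, boundary_values):
    """Solve for f with f = f0 on the boundary and f harmonic elsewhere.

    boundary_values maps state index -> prescribed value.  On the remaining
    states we impose the mean value property f(x) = sum_y P(x, y) f(y).
    """
    n = P.shape[0]
    A = np.eye(n) - P              # interior rows encode f - Pf = 0
    b = np.zeros(n)
    for x, v in boundary_values.items():
        A[x, :] = 0.0
        A[x, x] = 1.0
        b[x] = v
    return np.linalg.solve(A, b)

# SRW on the path 0-1-2-3; f(k) = P_k(tau_0 < tau_3) should be (3 - k)/3.
P = np.diag([0.5] * 3, 1) + np.diag([0.5] * 3, -1)  # boundary rows are overwritten below
f = harmonic_extension(P, {0: 1.0, 3: 0.0})
print(f)                           # approximately [1.0, 0.667, 0.333, 0.0]
```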

2.2 Electrickery

We will now define potential differences (voltages) and current flows on the network described in the previous section, and give some probabilistic interpretations. Several nice results then follow from physical principles of electrical networks. We will denote the set of oriented edges by E⃗ = {(η, σ) ∈ Ω^2 : P(η, σ) > 0}.

We may think of real valued functions on Ω as potentials. To any potential function ϕ : Ω → R we may associate an induced current j : E⃗ → R (and vice versa) according to Ohm’s Law, which states that the current is the conductance times the gradient of the potential.

Ohm’s Law: If η ∼ σ then the current j(η, σ) from η to σ is given by

j(η, σ) = c(η, σ)(ϕ(η) − ϕ(σ)).

Note that j is always anti-symmetric, i.e. j(η, σ) = −j(σ, η), since the conductances are symmetric. We will find the following definition very useful in this context.

Definition 2.4. Let A and B be disjoint subsets of Ω. We call an anti-symmetric function θ : E⃗ → R a flow from A to B if

• div θ(η) := ∑_{σ∼η} θ(η, σ) = 0 for all η ∉ A ∪ B (“flow in is equal to flow out”);

• div θ(σ) ≥ 0 for all σ ∈ A;

• div θ(σ) ≤ 0 for all σ ∈ B.

The strength of a flow is |θ| = ∑_{σ∈A} div θ(σ), and if |θ| = 1 we call θ a unit flow.

Exercise (uniqueness): check that if i and j are both flows from A = {a} to B = {b} with strengths |i| = |j|, then i ≡ j.

If we attach a battery to the graph such that the vertices in the set A are at a positive voltage φ, and those in B at zero, then an equilibrium current will flow which satisfies Kirchhoff’s Node (or Current) Law.

Kirchhoff’s Current Law: The equilibrium current, i^φ_{A,B} : E⃗ → R, associated with a fixed voltage φ across A and B, is a flow from A to B.

There are several interpretations of these laws. Firstly, you could take Ohm’s law as a definition of the current given a potential. Then Kirchhoff’s law is a property of any current given by a potential which is fixed on A and B and harmonic on (A ∪ B)^c. Alternatively, you could take Kirchhoff’s law together with Ohm’s law as a definition of the equilibrium current and voltage. We describe this more precisely next.

Suppose we attach a battery with voltage φ on A and 0 on B. Then by Kirchhoff’s current law this induces some current which is a flow, called i^φ_{A,B} : E⃗ → R. By Ohm’s law this current must be the gradient of some potential; we assume that this potential is φ on A and 0 on B, and it turns out that this uniquely determines the potential and the current flow. We denote by v^φ_{A,B} : Ω → R the potential induced by this voltage φ, i.e. by its restriction to the sets A and B, together with Kirchhoff’s current law and Ohm’s law. Ohm’s law together with Kirchhoff’s current law imply that the voltage must be a harmonic function on (A ∪ B)^c, and then its restriction to A and B determines it uniquely. This immediately gives us a probabilistic interpretation of the voltage.

Theorem 2.5. The voltage (potential function) v^φ_{A,B} : Ω → R induced by a voltage φ applied to sets A and B (that is, v^φ_{A,B}|_A ≡ φ, v^φ_{A,B}|_B ≡ 0, and v^φ_{A,B} satisfies Ohm’s law with respect to the current flow i^φ_{A,B}) is harmonic on (A ∪ B)^c. Furthermore v^φ_{A,B}(η) = φ P_η(τ_A < τ_B).

Proof. Let U = (A ∪ B)^c, and suppose v satisfies the conditions of the theorem. Then by Kirchhoff’s current law and Ohm’s law,

div i(η) = ∑_{σ∼η} i(η, σ) = ∑_{σ∼η} c(η, σ)(v(η) − v(σ)) = 0 for η ∈ U.

Rearranging the final equality and using the fact that the conductances are symmetric we find

v(η) = (1/c(η)) ∑_{σ∼η} c(η, σ) v(σ),

i.e. v is harmonic on U. So, by the uniqueness theorem, v^φ_{A,B} is the unique harmonic function on U satisfying the boundary conditions v|_A ≡ φ and v|_B ≡ 0. Since v′(·) = v^φ_{A,B}(·)/φ is harmonic on U and satisfies v′|_A ≡ 1 and v′|_B ≡ 0 (simply check), by uniqueness v′(η) = P_η(τ_A < τ_B), i.e. v(η) = φ P_η(τ_A < τ_B).

There are also other interpretations of the voltage, as well as probabilistic and combinatorial interpretations of the current flow. We will see some more of these in the next section. We now give a connection between unit flows and spanning trees on G. We will see spanning trees again later. Note that the flow associated with the voltage v above is not necessarily of unit strength (we will also return to this point).

Theorem 2.6. Fix ξ, ζ ∈ Ω and let T be a tree on G chosen uniformly from the set 𝒯 of spanning trees. Then

j(η, σ) = P(the unique path from ξ to ζ in T contains (η, σ)) − P(the unique path from ξ to ζ in T contains (σ, η))

is a unit flow from ξ to ζ.

Proof. See Grimmett, Probability on Graphs, Theorem 1.16.

2.3 Effective Resistance and Energy

The strength of the current, i, that flows when we apply a voltage φ on A and 0 on B, as in Theorem 2.5, can be calculated as follows:

div i^φ_{A,B}(a) = ∑_{σ∼a} i^φ_{A,B}(a, σ) = ∑_{σ∼a} c(a, σ)[φ − φ P_σ(τ_A < τ_B)]
 = φ ∑_{σ∼a} c(a, σ)[1 − P_σ(τ_A < τ_B)] = φ ∑_{σ∼a} c(a, σ) P_σ(τ_A > τ_B)
 = φ π(a) ∑_{σ∼a} P(a, σ) P_σ(τ_A > τ_B) = φ π(a) P_a(τ_A^+ > τ_B).

It follows that

|i^φ_{A,B}| = ∑_{a∈A} div i^φ_{A,B}(a) = φ ∑_{a∈A} π(a) P_a(τ_A^+ > τ_B). (2.3.1)

Now, in analogy with Ohm’s law, we define the effective conductance as

C(A, B) := ∑_{a∈A} π(a) P_a(τ_A^+ > τ_B),

which is independent of the applied voltage. The effective resistance is R(A, B) = 1/C(A, B). Notice that for a given voltage φ over the sets A and B, the strength of the associated current flow is φ C(A, B); equivalently, an applied voltage equal to R(A, B) induces a unit current flow. This holds for the original network and the reduced network G′ = ({A, B}, {{A, B}}) with c(A, B) = C(A, B). The associated voltage that induces this unit flow is denoted v_{A,B}(·) := R(A, B) v^1_{A,B}(·). We will call the corresponding (equilibrium) unit current flow i_{A,B}; furthermore it is simple to check that i_{A,B}(·) ≡ i^φ_{A,B}(·) R(A, B)/φ.

We now define the energy dissipated in a flow, and observe that the energy dissipated in the flow associated with potential φ over A and B is the same on G as on a single resistor with resistance R(A, B) (further motivation for calling it effective resistance).
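On a finite network both R(A, B) and the equilibrium voltage can be computed with one linear solve, using the graph Laplacian (Lv)(x) = ∑_σ c(x, σ)(v(x) − v(σ)). A sketch for singletons A = {a}, B = {b} (the example network is made up):

```python
import numpy as np

def effective_resistance(C, a, b):
    """R(a, b) for a network with symmetric conductance matrix C.

    Solve for the voltage v with v(a) = 1, v(b) = 0 and v harmonic
    elsewhere; the strength of the current out of a is C(a, b) = 1/R(a, b).
    """
    n = C.shape[0]
    A = np.diag(C.sum(axis=1)) - C     # Laplacian: (Av)(x) = sum_y c(x,y)(v(x) - v(y))
    rhs = np.zeros(n)
    for x, val in ((a, 1.0), (b, 0.0)):
        A[x, :] = 0.0
        A[x, x] = 1.0
        rhs[x] = val
    v = np.linalg.solve(A, rhs)
    strength = C[a, :] @ (v[a] - v)    # div i(a) = sum_s c(a, s)(v(a) - v(s))
    return 1.0 / strength

# Two unit resistors in series: R(0, 2) = 2, i.e. C(0, 2) = 1/2.
C = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
print(effective_resistance(C, 0, 2))   # 2.0
```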

Definition 2.7. The energy of a flow θ on G is given by

D(θ) = ∑_{e∈E} θ(e)^2 r(e) = (1/2) ∑_{e∈E⃗} θ(e)^2 r(e).

Note the first sum is over unoriented edges (which makes sense because we square an anti-symmetric function) and the second is over all oriented edges in the graph. Consider the energy dissipated by a current I_{A,B} flowing across a single edge with resistance R(A, B) when a voltage of φ is applied, given by D(I_{A,B}) = φ^2 C(A, B)^2 R(A, B) = φ|I_{A,B}|. We observe that this is the same as the energy of the equilibrium flow i^φ_{A,B} on G described above (suppressing the sub- and superscripts for clarity):

D(i^φ_{A,B}) = (1/2) ∑_{(η,σ)∈E⃗} i^φ_{A,B}(η, σ)(v(η) − v(σ)) = (1/2) [ ∑_{η∈Ω} v(η) div i(η) + ∑_{σ∈Ω} v(σ) div i(σ) ]
 = φ ∑_{η∈A} div i(η) = φ|i| = φ|I_{A,B}| ,

since the potential (voltage) on B is zero and div i vanishes off A ∪ B, and where in the final equality we have used the fact that the strength of the induced flow is equal to φ C(A, B) on both networks.

We may give an equivalent definition of the effective resistance in terms of the energy.

Theorem 2.8 (Thomson’s principle). Let A and B be disjoint subsets of a finite connected (weighted) graph G. Then

R(A, B) = 1/C(A, B) = inf{D(θ) : θ is a unit flow from A to B},

and the unique minimum is attained by i_{A,B} (the unit-strength equilibrium flow defined above).

Proof. It is clear from the calculation above that D(i_{A,B}) = R(A, B). Let j be a unit flow from A to B and define δ = j − i_{A,B}. It is straightforward to check that δ is a strength-zero flow from A to B, i.e.

∑_{a∈A} div δ(a) = ∑_{b∈B} div δ(b) = div δ(η) = 0 for each η ∉ A ∪ B.

Also,

D(j) = D(i) + D(δ) + ∑_{e∈E⃗} δ(e) i(e) r(e)
 = D(i) + D(δ) + ∑_{(η,σ)∈E⃗} (v(η) − v(σ)) δ(η, σ)
 = D(i) + D(δ),

where the cross term vanishes because δ has zero divergence off A ∪ B, zero strength, and v is constant on A and on B. It follows that D(j) > D(i) as soon as δ ≢ 0.

We can also define the energy dissipated by a function f : Ω → R acting as a potential (voltage). With a slight abuse of notation we use the same letter, since it will be clear from context whether the argument is a ‘voltage’ or a current flow.

Definition 2.9. The energy of a function f : Ω → R on the weighted graph G, also called the Dirichlet form of f, is given by

D(f) = (1/2) ∑_{(η,σ)∈E⃗} c(η, σ)(f(η) − f(σ))^2.

The Dirichlet form is often expressed in terms of the inner product on ℓ^2(π); this will be explored a little on the first assignment sheet. Note that, from Ohm’s law, D(i^φ_{A,B}) = D(v^φ_{A,B}), so it makes sense to use related notation.


Theorem 2.10 (Dirichlet’s principle). Let A and B be disjoint subsets of a finite connected (weighted) graph G. Then

C(A, B) = inf{D(f) : f : Ω → R, f|_A ≡ 1 and f|_B ≡ 0},

and the minimum is attained at v^1_{A,B}.

Note that the effective conductance C(A, B) is often also called the capacity. In our ‘dimensionless’ electrical network setting it turns out that these two physical concepts coincide.

Proof. The proof is essentially the same as for Thomson’s principle.

To see that C(A, B) = D(v^1_{A,B}) we appeal to Thomson’s principle. By Eq. (2.3.1) we have |i^1_{A,B}| = C(A, B), i.e. i^1_{A,B} = C(A, B) i_{A,B}, so D(v^1_{A,B}) = D(i^1_{A,B}) = C(A, B)^2/C(A, B) = C(A, B) by Thomson’s principle.

Fix a function f : Ω → R such that f|_A ≡ 1 and f|_B ≡ 0, and define h := f − v^1_{A,B}. A simple calculation gives

D(f) = D(v^1_{A,B}) + D(h) + ∑_{(η,σ)∈E⃗} c(η, σ)(v^1_{A,B}(η) − v^1_{A,B}(σ))(h(η) − h(σ))
 = D(v^1_{A,B}) + D(h) + ∑_{(η,σ)∈E⃗} (h(η) − h(σ)) i^1_{A,B}(η, σ)
 = D(v^1_{A,B}) + D(h),

where the last equality follows since h|_A ≡ h|_B ≡ 0 and div i^1_{A,B}(η) = 0 for η ∉ A ∪ B. It follows that D(f) > D(v^1_{A,B}) whenever h ≢ 0.

Together, Thomson’s principle and Dirichlet’s principle give two variational principles for the effective conductance: the former can be used to get lower bounds on the effective conductance by considering ‘test flows’ on G, and the latter can give upper bounds by considering ‘test functions’ on Ω.
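To illustrate the Dirichlet side on the series network used above (where C(0, 2) = 1/2): any test function with the correct boundary values gives an upper bound D(f) ≥ C(A, B), with equality for the harmonic minimiser. A sketch:

```python
import numpy as np

def dirichlet_energy(C, f):
    """D(f) = (1/2) * sum over ordered pairs of c(x, y)(f(x) - f(y))^2."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(C * diff ** 2)

C = np.array([[0.0, 1.0, 0.0],        # the two-unit-resistor series network again
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
f_test = np.array([1.0, 0.25, 0.0])   # admissible but not harmonic
f_harm = np.array([1.0, 0.5, 0.0])    # the harmonic minimiser v^1
print(dirichlet_energy(C, f_test))    # 0.625 >= C(0, 2) = 0.5
print(dirichlet_energy(C, f_harm))    # 0.5, the effective conductance
```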

2.4 Some further probabilistic interpretations

The Green’s function, G_τ(a, η), for a random walk started from a and stopped at a stopping time τ is defined as the expected number of visits to η before time τ, i.e.

G_τ(a, η) := E_a[ ∑_{n=0}^{τ} 1(X_n = η, n < τ) ].

The following interpretation of the effective resistance is a simple consequence of the calculations in the previous section.

Lemma 2.11. Let (X_n) be the random walk on a weighted graph G, and fix B ⊂ Ω and a ∉ B. Then

G_{τ_B}(a, a) = π(a) R(a, B),

and more generally (recall v_{a,B} is the voltage that induces a unit current flow from a to B)

G_{τ_B}(a, η) = π(η) v_{a,B}(η).

Sketch proof. The number of visits to a before visiting B, started from a, has a geometric distribution with parameter P_a(τ_B < τ_a^+); by the definition of the effective resistance this completes the proof of the first part.

For the second part we observe that it is sufficient to show that G_{τ_B}(a, η)/π(η) is harmonic on ({a} ∪ B)^c. This follows from one-step analysis together with the identity G_{τ_B}(a, η) π(a) = G_{τ_B}(η, a) π(η) from Problem Sheet 1.


We give another probabilistic interpretation of the current flow, which is particularly appealing if we take a naive view of electrical current as charged particles moving on the network. If a unit current flows from a to b, then the current flow on an edge (η, σ) is equal to the expected net number of times that the process started from a will cross (η, σ) before it reaches b. As we have already observed, the unit current is proportional to i^1_{a,b}, and the constant of proportionality is the effective resistance.

Lemma 2.12. Let (X_n) be a random walk on G that is absorbed at b, and for (η, σ) ∈ E⃗ let S_{η,σ} be the number of transitions from η to σ before (X_n) is absorbed. Then E_a[S_{η,σ} − S_{σ,η}] = i_{a,b}(η, σ).

Proof. The proof follows by a simple calculation using Lemma 2.11, and is left as an exercise.


Chapter 3

Polya’s Theorem

Up to this point we have focused on finite networks. In this chapter we will see how the electrical network analogy can help us to establish whether processes on countable graphs are transient or recurrent, in terms of limits of finite graphs.

3.1 Network reduction

There are several ways of simplifying electrical networks without changing the quantities of interest, which come in very handy when trying to do calculations. This is one reason that the electrical network analogy can be helpful. The simplest, and most familiar, of these are the Series and Parallel laws. For the following, fix a graph G = (Ω, E) with conductances c : E → R (resistances r), and A, B ⊂ Ω disjoint with potential difference φ applied across A and B.

Lemma 3.1 (Series Law). Two resistors r_1 and r_2 in series are equivalent to a single resistor with resistance r_1 + r_2. That is, if η ∈ (A ∪ B)^c is a vertex of degree 2 with neighbours σ_1, σ_2, and we replace the edges {η, σ_1}, {η, σ_2} with a single edge {σ_1, σ_2} with resistance r(η, σ_1) + r(η, σ_2), then all the potentials and currents in G \ {η} are unchanged, and the current flowing from σ_1 to σ_2 equals i(σ_1, η).

Note that, since this network reduction does not change the potentials or current flows on any of the vertices or edges in the reduced graph, all the effective resistances calculated between sets that do not contain η must also be the same (we define them in terms of the strengths of the currents and potentials). In particular, if we can reduce the entire network in this way to a single resistor between the sets we are interested in, then the value of the resistor must be equal to the effective resistance.

Example 3.2. Consider a random walk on Z that moves right with probability p and left with probability q = 1 − p. For the usual gambler’s ruin problem we would be interested in P_k(τ_0 < τ_N). From detailed balance we have c(i, i + 1) = (p/q)^i (e.g. take as reversible measure π(i) ∝ (p/q)^i, which can be normalised to a probability measure on N_0 if p < q, but we don’t mind if it is not normalised). We may now apply the series law ‘on either side’ of the initial state k. The resistance between 0 and k is R(0, k) = ∑_{i=0}^{k−1} (q/p)^i, and the resistance between k and N is R(k, N) = ∑_{i=k}^{N−1} (q/p)^i. By our previous observations v^1_{0,N}(k) = P_k(τ_0 < τ_N). Since potentials and current flows are unchanged by the application of the series law, the strength of the flow i^1_{0,N} is equal in both networks, and is C(0, N) by definition. By Ohm’s law on the reduced network, v^1_{0,N}(0) − v^1_{0,N}(k) = 1 − v^1_{0,N}(k) = C(0, N) R(0, k); therefore it follows that the probability of ruin is [(p/q)^{N−k} − 1]/[(p/q)^N − 1].

Lemma 3.3 (Parallel Law). Two conductors c_1 and c_2 in parallel are equivalent to a single conductor with conductance c_1 + c_2. That is, if e_1 and e_2 are both edges that join η, σ ∈ Ω, they can be replaced with a single edge {η, σ} with conductance c(e_1) + c(e_2), and all the potentials and currents in G \ {e_1, e_2} are unchanged, and the current flowing from η to σ equals i(e_1) + i(e_2).


Proof. The proofs of the above lemmas just require checking Kirchhoff’s and Ohm’s laws with the respective current flows, and that this does not change the voltage function. Or, equivalently, check that the new voltage function on the reduced network is still harmonic on the appropriate set.

Lemma 3.4 (Gluing). If two neighbouring vertices have the same voltage, then no current flows between them, and we can glue them into a single vertex while keeping all other edges, without changing the potential or current flow.

Thomson’s principle immediately gives us another useful tool for bounding resistances.

Theorem 3.5 (Rayleigh monotonicity principle). The effective resistance R(A, B) is a non-decreasing function of the edge resistances (r(e))_{e∈E}.

Proof. Let i_{A,B} and i′_{A,B} be the two unit currents associated with a voltage across A and B, with resistances (r(e))_{e∈E} and (r′(e))_{e∈E} respectively (we drop the bar). Suppose r(e) ≤ r′(e) for all e ∈ E. By Thomson’s principle we have

R(A, B) = (1/2) ∑_{e∈E⃗} (i_{A,B}(e))^2 r(e) ≤ (1/2) ∑_{e∈E⃗} (i′_{A,B}(e))^2 r(e).

Now by the assumption on the edge resistances and Thomson’s principle again,

R(A, B) ≤ (1/2) ∑_{e∈E⃗} (i′_{A,B}(e))^2 r(e) ≤ (1/2) ∑_{e∈E⃗} (i′_{A,B}(e))^2 r′(e) = R′(A, B).

Corollary 3.6. Adding edges that are not incident with A can only increase the probability of escaping A.

3.2 Polya’s Theorem and proof

We now apply the tools we have developed to the problem of identifying for which dimensions d the simple random walk on Z^d is transient. Slightly more generally, we consider an infinite connected graph G = (Ω, E) with finite vertex degrees and conductances (c(e))_{e∈E}, and a distinguished vertex 0 ∈ Ω called the origin. Let d(·, ·) denote the graph distance, i.e. the length of the shortest path between two vertices, and let

Ω_n = {η ∈ Ω : d(0, η) ≤ n} and ∂Ω_n = {η ∈ Ω : d(0, η) = n},

and let G_n = (Ω_n, E_n) be the sub-graph of G given by all the edges between vertices in Ω_n. For each n we construct a graph G*_n = (Ω*_n, E*_n) by gluing all the vertices in Ω \ Ω_n together, i.e. Ω*_n = Ω_n ∪ {ζ_n} and E*_n = E_n ∪ {{η, ζ_n} : η ∈ ∂Ω_n}. The weight function on G*_n is equal to c on all edges present in G, and on the edges that connect with ζ_n we put c(η, ζ_n) = ∑_{ζ∈Ω_n^c} c(η, ζ). Let R_eff(n) = R(0, ζ_n); it follows from Rayleigh’s monotonicity principle that the limit

R_eff = lim_{n→∞} R_eff(n)

exists, and therefore

P_0(τ_0^+ = ∞) = lim_{n→∞} P_0(τ_{ζ_n} < τ_0^+) = lim_{n→∞} 1/(π(0) R_eff(n)) = 1/(π(0) R_eff). (3.2.1)

We define a flow from 0 to ∞ on G as an anti-symmetric function θ satisfying div θ(η) = 0 for all η ≠ 0.


Proposition 3.7. The process (X_n)_{n≥0} associated with G = (Ω, E), with finite vertex degrees and conductances (c(e))_{e∈E}, is:

1. recurrent if and only if R_eff = ∞;

2. transient if and only if there exists a non-zero flow i from 0 to ∞ on G satisfying D(i) < ∞.

Proof. The first part follows directly from (3.2.1), since the walk is recurrent if and only if P_0(τ_0^+ = ∞) = 0; for more details see Grimmett, Probability on Graphs.

For 2. we first assume that the chain is recurrent, i.e. R_eff = ∞, and let i be a non-zero flow from 0 to ∞ on G; by dividing by its strength we may assume (wlog) that it is a unit flow. By Thomson’s principle there exists a unit flow i_n on G*_n such that D(i_n) = R_eff(n). It is simple to check that the obvious restriction of i to G*_n is a unit flow from 0 to ζ_n, so by Thomson’s principle

D(i_n) ≤ ∑_{e∈E*_n} i(e)^2/c(e) ≤ D(i).

Hence D(i) ≥ lim_{n→∞} D(i_n) = R_eff = ∞.

Suppose, conversely, that the chain is transient, i.e. R_eff < ∞. There exists a subsequence n_k such that i_{n_k}(e) → j(e) for every e ∈ E (sometimes called diagonal selection). Since each i_{n_k} is a unit flow from the origin, j must be a unit flow from 0 to ∞. Now,

D(i_{n_k}) = ∑_{e∈E*_{n_k}} i_{n_k}(e)^2/c(e) ≥ ∑_{e∈E_m} i_{n_k}(e)^2/c(e)

for m ≤ n_k, so

∞ > R_eff = lim_{k→∞} D(i_{n_k}) ≥ lim_{k→∞} ∑_{e∈E_m} i_{n_k}(e)^2/c(e) = ∑_{e∈E_m} j(e)^2/c(e) → D(j) as m → ∞;

therefore j is a flow with the required property.

Corollary 3.8. If G ⊂ G′ and G is transient, then G′ is also transient. If G′ is recurrent then G isalso recurrent.

Proof. This follows immediately from the previous proposition and Rayleigh’s monotonicity principle.

Theorem 3.9 (Polya’s Theorem). The simple random walk on Z^d is recurrent if d = 1, 2 and transient if d ≥ 3.

In this case we are applying the tools we have developed so far to G = (Z^d, E = {{x, y} : |x − y|_1 = 1}), with conductances c(e) = 1 for all e ∈ E. Note that in this case the reversible measure π is uniform on Z^d, with π(x) = 2d for all x; all our theory so far applies equally well whether or not π is normalised. In general, for a countable graph with finite vertex degrees, we call any process associated with edge conductances which are all constant the random walk on the graph.

Proof. It is clear by the previous corollary that we can restrict to the cases d = 2 and d = 3.

Firstly, for d = 2 we aim to show that R_eff = ∞, and we therefore only need a lower bound on R_eff(n) that diverges with n. By Rayleigh’s monotonicity principle we reduce the effective resistance by adding edges (formally this corresponds to reducing the resistance between states from ∞). In this case we connect all the states in each ∂Ω_i together with edges that have zero resistance, or equivalently ∞ conductance. These vertices must be at the same voltage, and are all neighbours in the new graph, so we may glue them. We are left with a one-dimensional graph that we can reduce using the series/parallel laws, so

R_eff(n) ≥ ∑_{i=1}^{n−1} 1/(4(2i − 1)),

and hence R_eff(n) ≥ C log n → ∞ as n → ∞. Alternatively we can construct Nash-Williams cut sets using the same sets (see Lyons and Peres, Probability on Trees and Networks).

Now suppose d = 3; we wish to construct a non-zero flow j from 0 to ∞ such that D(j) < ∞. There are many ways to do this of course, but it makes sense to make the most of the symmetry and take inspiration from the obvious flow field on R^3 given by a unit source at the origin. We construct such a flow as follows. Let u be a uniform random point on the unit sphere in R^3 (with associated measure P), and let L_u be the unique straight half-line from 0 to ∞ that passes through the point u. Let γ^u be the (almost surely) unique path in G = (Z^3, E = {{x, y} : |x − y|_1 = 1}) from 0 to ∞ that is closest to L_u in the Hausdorff metric, thought of as a sequence of directed edges. Clearly γ^u stays within a distance of 4 from L_u. Now define the flow by

j(e) = ∑_{n≥0} [P(γ^u_n = e) − P(γ^u_n = −e)].

You should check that this defines a flow (in fact a unit flow) from 0 to ∞; this trick of defining unit flows as the expectation of random unit flows is a more general one. We now show that j has finite energy. By construction, for any given edge e ∈ E whose midpoint is a distance R from the origin, there is a constant A (independent of R) such that P(e ∈ γ^u) ≤ A/R^2. Also, there is a constant B such that there are at most Bn^2 edges at distance between n and n + 1 from the origin. It follows that D(j) ≤ ∑_{n=1}^{∞} A^2 n^{−4} · B n^2, which is finite, and the result follows from part 2 of Proposition 3.7.
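Polya’s theorem can be seen emerging in simulation; a minimal Monte Carlo sketch (truncating at a finite time horizon, so this only illustrates the dichotomy rather than proving anything):

```python
import random

def return_probability(d, n_steps=10_000, n_trials=2_000):
    """Estimate P_0(tau_0^+ <= n_steps) for SRW on Z^d."""
    hits = 0
    for _ in range(n_trials):
        x = [0] * d
        for _ in range(n_steps):
            i = random.randrange(d)          # pick a coordinate direction
            x[i] += random.choice((-1, 1))   # step +/- 1 in that direction
            if not any(x):                   # back at the origin
                hits += 1
                break
    return hits / n_trials

print(return_probability(2))   # creeps up towards 1 as n_steps grows (recurrence)
print(return_probability(3))   # stabilises near 0.34 (transience)
```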


Chapter 4

Uniform Spanning Tree and Wilson’s algorithm

We have already seen the idea of uniform spanning trees (USTs) appear briefly in Chapter 2 (see Theorem 2.6). It turns out that the connection between USTs and random walks on graphs (and the electrical network analogy) goes much deeper. These objects also turn out to have connections with our next topic, percolation.

In this chapter we will see our first correlation inequality; these turn out to be extremely useful, and we will revisit such concepts in more detail later. In large graphs there are typically a huge number of spanning trees, and it is not obvious how to generate one in an efficient algorithmic way. We will see one very effective method, called Wilson’s algorithm, which is efficient and theoretically very important, with links to loop-erased random walks. At the end of the section we will touch on the idea of weak limits of finite-volume measures, a concept that will also be important later.

A spanning tree of an undirected graph will be composed of undirected edges; to connect with the previous electrical network picture we will also consider directed graphs and directed spanning trees. It should be clear from context which case we are dealing with.

Definition 4.1. Let G = (Ω, E) be a finite connected graph. A spanning tree of G is a sub-graph of G that is connected, contains all vertices of G, and is a tree (has no cycles). Let 𝒯 denote the set of all spanning trees of G.

We call T a uniform spanning tree, abbreviated UST, if it is distributed according to the uniform measure on 𝒯, i.e.

P(T = ω) = 1/|𝒯|, ω ∈ 𝒯.

We may think of T as being a random subset of all the edges E (satisfying the constraints to be a spanning tree); in this sense we can identify T with a special subset of Ω_E = {0, 1}^E, that is, for ω ∈ 𝒯 we may think of ω(e) = 1 if the edge e ∈ E is present in the tree and ω(e) = 0 if the edge does not belong to the tree.

4.1 Wilson’s algorithm

There are many ways to generate a UST of G. The following algorithm highlights the close connection between USTs and random walks, and exploits some hidden independence in Markov chains. In fact we will present the algorithm in sufficient generality that it can sample not only USTs but also other weighted random trees.

We will consider the directed, and connected, graph G generated by a finite, irreducible Markov process with transition matrix P. For e = (η, σ) ∈ E⃗ we call e^− = η the tail of the edge, and e^+ = σ the head.

Definition 4.2. We call a sub-graph of G, with one vertex r ∈ Ω marked as a root, a (directed androoted) spanning tree (sometimes called a spanning arborescence) if it includes every vertex, thereis no cycle, and every vertex other than the root is the tail of exactly one edge.


The root r will be arbitrary but fixed from this point onwards. The edges in the directed spanning tree all point towards the root. It is clear that if we ignore the root and the edge directions then we obtain a spanning tree. Wilson's algorithm will select spanning trees proportional to the weights

Ψ(T) = ∏_{e∈T} P(e−, e+) ,

by explicitly using the process with transition matrix P. If the process is reversible, recall c(e) = π(e−)P(e−, e+), so that

Ψ(T) = ∏_{e∈T} c(e) / ∏_{η≠r} π(η) ∝ ∏_{e∈T} c(e) ,

since the root is fixed. In particular for the simple random walk on G we have P(η, σ) = 1/deg(η) if η ∼ σ, and all other entries are zero, and the reversible stationary measure satisfies π(η) ∝ deg(η); i.e. in this case we will select a UST (when we ignore the direction and root).

To describe Wilson's algorithm we must first define the important concept of loop-erasure (meaning cycle erasure, since we have already used 'loop' in the context of the graph having self-loops). For any path Γ = (η0, η1, . . . , ηk) on the graph G, meaning (ηi, η_{i+1}) ∈ E for 0 ≤ i < k, which can be thought of as a connected set of edges, we can construct a non-intersecting sub-path, which we call LE(Γ), by removing cycles in the order they appear. Note that paths can be thought of as a sequence of edges and induce a sub-graph in the obvious way. More precisely, let J = min{j ≥ 1 : ηj = ηi for some i < j} be the first time the path re-visits a vertex, and let I be the index of the point it hits, i.e. the unique value of i satisfying i < J and ηI = ηJ. Let Γ′ = (η0, . . . , ηI, η_{J+1}, . . . , ηk) be the sub-path obtained after removing the first cycle. This procedure is iterated until we arrive at a path with no cycles, which is denoted LE(Γ) (pictured in lectures).

Wilson’s algorithm: First let (σ1, σ2, . . . , σn) be an arbitrary fixed ordering of Ω \ r, andlet T0 = r.

1. If Ti spans G we are done and the algorithm stops. Otherwise pick the smallest element ofΩ that belongs to T ci , call this ηi, and run the process with transition matrix P on G startedfrom ηi until it hits Ti, call the path it followed Γi.

2. Let Ti+1 = Ti ∪ LE(Γi) be the union of the two edge disjoint paths.

It is clear that each stage of the algorithm gives a sub-tree of G, directed toward the root, alsothe final tree must be a spanning tree since it contains every element of Ω. Marvellously, therandom tree grown in this way has the desired distribution.
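As a concrete illustration, here is a hedged Python sketch of Wilson's algorithm for the simple random walk on a small grid (our own encoding of the graph; no attempt at efficiency). Forgetting the root and the orientations, the output is a UST by Theorem 4.3 below.

import random

def loop_erase(path):
    erased = []
    for v in path:
        if v in erased:
            erased = erased[: erased.index(v) + 1]  # erase the cycle just closed
        else:
            erased.append(v)
    return erased

def wilson(vertices, neighbours, root):
    # Grow a spanning tree rooted at `root`: run a random walk from the
    # next vertex not yet in the tree, loop-erase the trajectory, attach it.
    in_tree, parent = {root}, {}
    for start in vertices:              # arbitrary fixed ordering of the vertices
        if start in in_tree:
            continue
        path = [start]
        while path[-1] not in in_tree:  # walk until the current tree is hit
            path.append(random.choice(neighbours[path[-1]]))
        branch = loop_erase(path)
        for u, v in zip(branch, branch[1:]):
            parent[u] = v               # edges directed towards the root
        in_tree.update(branch)
    return parent

V = [(x, y) for x in range(3) for y in range(3)]  # 3x3 grid graph
nbrs = {v: [w for w in V if abs(v[0] - w[0]) + abs(v[1] - w[1]) == 1] for v in V}
print(wilson(V, nbrs, (0, 0)))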

Theorem 4.3. Wilson's algorithm yields a random spanning tree rooted at r with distribution proportional to Ψ(·).

Note that, in the reversible case, if we forget the root and the orientation then Wilson's algorithm will construct a weighted spanning tree of G where each tree has probability proportional to ∏_{e∈T} c(e).

In order to prove this result we will construct a Markov process with transition matrix P in a special way and remove cycles in a procedure called 'cycle popping.' We therefore have some setting up to do, and first prove a deterministic result at the heart of the algorithm. We think of fixing the randomness of the process in advance, as follows. At each state we assign an infinite sequence of moves (a stack of instructions) with the correct distribution. That is, for each η ∈ Ω \ {r} let S_i^η, i ≥ 1, be independent random variables with laws

P(S_i^η = σ) = P(η, σ) ,  for σ ∈ Ω .


Since we only care about the process up to the first time it hits the root, we put an empty stack on the root. Then the process which moves according to the next instruction in the stack each time it visits any state, and then discards that instruction, is clearly a Markov process with the given transition matrix (absorbed at the root).

The tops of these stacks, or visible moves, always determine a directed graph via the edges (η, S_{k(η)}^η), η ∈ Ω \ {r}, where S_{k(η)}^η denotes the instruction currently on top of the stack at η; we call this the visible graph. If this visible graph contains no directed cycles then it is a spanning tree rooted at r. Otherwise we will proceed by removing visible cycles in a certain order, called 'cycle popping'. When a cycle is popped, the moves in that cycle get removed from the tops of the stacks. It turns out the actual order in which visible cycles are popped does not matter.

Lemma 4.4. The order in which cycles are popped is irrelevant. Either every order pops an infinite number of cycles, or every order pops the same finite set of cycles, thus leaving the same tree on the top of the stacks at the end.

Proof. We will track the position of the moves in the stacks: we say that the edge (η, S_i^η) has colour i. A coloured cycle is a visible cycle all of whose edges have an associated colour; the edges in a coloured cycle do not all have to be of the same colour. Although the same cycle could be popped several times, a coloured cycle can be popped at most once. We will show that if C is any coloured cycle that can be popped, i.e. there exists a sequence C1, C2, . . . , Cn = C that may be popped in that order, but we instead choose to pop some other coloured cycle C′ ≠ C1 first, then either (1) C = C′, or else (2) C can still be popped from the stacks after C′. Showing this completes the proof, since we either never stop popping or every cycle that can be popped will be popped.

If all the vertices of C′ are disjoint from those of C1, . . . , Cn then C can obviously still be popped. Otherwise, let Ck be the first of these cycles with at least one vertex in common with C′. Fix η ∈ C′ ∩ Ck; since all edges in C′ have colour 1 by assumption, and η ∉ C1, . . . , C_{k−1}, the edge with tail η in Ck must also have colour 1. It follows that the edges with tail η in C′ and in Ck must both point to the same state, let's call it σ. We can now repeat the same argument for the state σ, and it follows that C′ = Ck. Therefore C′ = C or we can pop C in the order C′, C1, C2, . . . , C_{k−1}, C_{k+1}, . . . , Cn.

We are now in a position to prove Theorem 4.3.

Proof of Theorem 4.3. It is clear that Wilson's algorithm stops with probability 1; it also pops visible cycles in some order, and so by the previous lemma the output is independent of the choice of order in the implementation.

We now show that the final distribution is the desired one. We think of the stacks (up to the point we stop popping) as defining a finite set of coloured cycles (loops) O = (C1, C2, . . . , Cn) lying over a non-coloured spanning tree T, and let Π be the set of all such pairs we can construct. If (O, T) ∈ Π then so is (O, T′) for any spanning tree T′, since the final tree is dictated by the values on the bottom of the stacks after popping all the cycles, and the values of the preceding elements in the stacks impart no information about these. So Π = O × T_r, where O is the set of all sequences O that can occur, and T_r is the set of directed spanning trees rooted at r. By construction the probability of seeing a pair (O, T) ∈ Π is given by

(∏_{C∈O} ∏_{e∈C} P(e−, e+)) Ψ(T) .

Since the probability factorises, the random sequence of popped cycles and the final tree are independent, and the marginal on the tree is the desired one.

The consequences of Wilson's algorithm are far-reaching; we state some consequences of Theorem 4.3 here.

Corollary 4.5. Given two states η and σ in Ω (finite), the distribution of the path in the weighted spanning tree from η to σ is equal to the distribution of the loop-erased path taken by the process from η to σ.


The next corollary is somewhat less trivial, and although it can be proved in many ways it also follows from Theorem 4.3.

Corollary 4.6. The number of (un-rooted) labelled trees on n vertices is n^{n−2}.

One proof is to examine the USTs of the complete graph on {1, 2, . . . , n}, and in particular calculate the probability that the UST is given by the single path (1, 2, . . . , n), which occurs with probability 1/n^{n−2}, as required.
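For small n the corollary can be checked by brute force. The following illustrative Python snippet (our own, not part of the notes) counts the (n − 1)-edge subsets of the complete graph that are spanning trees.

from itertools import combinations

def count_spanning_trees(n):
    # A subset of n-1 edges of K_n is a spanning tree iff it is acyclic,
    # which we test with a small union-find structure.
    edges = list(combinations(range(n), 2))
    def is_tree(subset):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                x = parent[x] = parent[parent[x]]
            return x
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False        # adding this edge would close a cycle
            parent[ru] = rv
        return True
    return sum(is_tree(s) for s in combinations(edges, n - 1))

for n in range(2, 7):
    assert count_spanning_trees(n) == n ** (n - 2)   # 1, 3, 16, 125, 1296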

4.2 Electrical interpretations and negative association

In this section we will discuss some important theoretical consequences of the connection between random walks and spanning trees on finite networks, and in particular the connection again with electrical networks.

It turns out that knowing any given edge is present in the tree can never increase the probability that another edge is also in the tree. The most immediate way to express this property of negative association is as follows.

Theorem 4.7. For e1, e2 ∈ E such that e1 ≠ e2,

P(e2 ∈ T | e1 ∈ T) ≤ P(e2 ∈ T) .   (4.2.1)

The proof makes use of the link between the current flow and the probability of the random walk crossing given edges, in particular the following result, part of which you have already proved on the first example sheet.

Lemma 4.8 (Kirchhoff's Effective Resistance Formula). Let T be an un-rooted weighted spanning tree (with law P) of a finite network G (associated with transition matrix P, i.e. edge weights c(e) = π(e−)P(e−, e+)) and let e ∈ E. Then

P(e ∈ T) = P_{e−}(first hit e+ by travelling along e) = i(e) = c(e)R(e−, e+) ,

where i is the unit current flow from e− to e+.

Proof. Exercise. Hint: use the 'current as edge-crossing probabilities' result from Sheet 1. Note that this result also follows immediately from Theorem 2.6.
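The formula is easy to sanity-check numerically on a small example. The sketch below (our own illustration, unit conductances on the complete graph K4) computes R(e−, e+) from the pseudoinverse of the graph Laplacian and compares it with the proportion of spanning trees containing the edge, obtained by enumeration; both equal 1/2.

import numpy as np
from itertools import combinations

n = 4
edges = list(combinations(range(n), 2))
L = np.zeros((n, n))                       # graph Laplacian of K_4
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1
    L[u, v] -= 1; L[v, u] -= 1
Lp = np.linalg.pinv(L)
R01 = Lp[0, 0] + Lp[1, 1] - 2 * Lp[0, 1]   # effective resistance R(0,1)

def is_tree(subset):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x] = parent[parent[x]]
        return x
    for u, v in subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

trees = [t for t in combinations(edges, n - 1) if is_tree(t)]
frac = sum((0, 1) in t for t in trees) / len(trees)   # P(e in T) by enumeration
print(frac, R01)                                      # both 0.5, since c(e) = 1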

Proof of Theorem 4.7. Denote the dependence of the random tree on the graph by T_G. The contraction of the graph G along the edge e, denoted G/e, is defined by removing the edge e and identifying its end points (gluing the vertices connected via e), leaving the other edges and vertices the same. Note that even if G is a simple graph the contraction may be a multi-graph. There is a one-to-one correspondence between spanning trees of G containing the edge e and spanning trees of G/e. Furthermore, the distribution of T_G given e ∈ T_G corresponds, under this bijection, to that of T_{G/e}. It follows from the previous lemma that

P(f ∈ T | e ∈ T) = P(f ∈ T_{G/e}) = c(f)R_{G/e}(f−, f+) ≤ c(f)R_G(f−, f+) = P(f ∈ T) ,

where the key inequality follows from Rayleigh's monotonicity principle.

The negative association in Theorem 4.7 was generalised by Feder and Mihail in 1992 to more general 'increasing' events and random variables, as follows. Recall we can think of sub-graphs of G as elements of {0, 1}^E, on which there is an obvious partial ordering: ω ≤ ω′ if ω(e) ≤ ω′(e) for all e ∈ E (this type of partial ordering will come up again later on our typical state space for IPS, S^Λ). An event A ⊂ {0, 1}^E is called increasing if for all ω, ω′ ∈ {0, 1}^E with ω ≤ ω′ we have that ω′ ∈ A whenever ω ∈ A, i.e. adding any edge to any configuration (subgraph) in A results in another configuration (subgraph) in A. Examples of increasing events include {the edge e is in the sub-graph} and {there is a path connecting x to y in the sub-graph}. A random variable X : {0, 1}^E → R is increasing if X(ω) ≤ X(ω′) whenever ω ≤ ω′.


Theorem 4.9. Let G be a finite network and A an increasing event that ignores the edge e ∈ E; then P(A | e ∈ T) ≤ P(A). Furthermore, if X and Y are increasing random variables then E[XY] ≤ E[X]E[Y].

For proofs see Chapter 3, Section 4.2 in Lyons and Peres, Probability on Trees and Networks. Later we will see an extremely useful counterpart to this (with the inequality the other way), called 'positive association' or positive correlations, which will be called the FKG inequality.

4.3 Weak limits

It turns out that Wilson’s algorithm can also be run using recurrent, irreducible Markov processeson countable graphs, provided we start the loop-erased walks appropriately so that each vertexbelongs to the final tree. Can this object on the countable graph be interpreted in terms ofweighted spanning trees? Usually the weighted spanning tree on the d-dimensional cubic latticeis not defined directly on the infinite graph, but instead on finite subgraphs Gn which increaseto the cubic lattice as n→∞. Then the probability of observing an event that only depends onfinitely many edges converges to the probability of observing this event on the infinite lattice asGn (Zd, E). Also, it turns out that these local probabilities determine the distribution of theinfinite tree, that is the infinite object locally looks like a weighted spanning tree. To make thisdiscussion more precise we will examine weak limits of probability measures.

4.3.1 A Topological Aside

The state space for random trees viewed as subsets of {0, 1}^E is of the more general form that will be of interest later for IPS and was discussed in the intro, i.e. S^Λ for some finite S and countable Λ, so the discussion here will also be relevant later in the course. We endow this space with the product topology, i.e. the coarsest topology with respect to which all the canonical projections, p_e : {0, 1}^E → {0, 1} given by p_e(ω) = ω(e), are continuous. Equivalently this is the topology generated by countable unions and finite intersections of sets of the form

p_e^{−1}(U) = {ω ∈ {0, 1}^E : ω(e) ∈ U} for U ⊂ {0, 1} ,

called open cylinders. Note that since we take the discrete topology on {0, 1} (or more generally on finite S) every subset is clopen. Finite intersections of open cylinders, of the form

{ω ∈ {0, 1}^E : ω(e1) ∈ U1, . . . , ω(en) ∈ Un} for Ui ⊂ {0, 1}, 1 ≤ i ≤ n ,

are called cylinder sets; in words these are exactly the events that depend on only finitely many coordinates. Since {0, 1} is compact (more generally if S is finite), by Tychonoff's Theorem {0, 1}^E is compact. It is also metrisable by

d(ω, ω′) = ∑_{i=1}^∞ 2^{−i} |ω(ei) − ω′(ei)| ,  ω, ω′ ∈ {0, 1}^E ,

and moreover this gives us a complete separable metric space.

For a measurable structure on {0, 1}^E we take the product σ-algebra F of {0, 1}^E generated by the cylinder sets. Equivalently this is just the Borel σ-algebra with respect to the topology discussed above.

We would like to discuss convergence of probability measures; as such we will take the topology of weak convergence on the set M1({0, 1}^E) of probability measures on our state space. We say µn converges weakly to µ, written µn → µ as n → ∞, if

µn(f)→ µ(f) as n→∞ ,


for all bounded continuous functions f : {0, 1}^E → R. Recall µ(f) denotes the expectation of the function f with respect to µ. Since Ω is compact, M1 is also compact with respect to this topology, and every family of probability measures on ({0, 1}^E, F) is relatively compact (this turns out to be very useful). Since all cylinder events are both open and closed, and they generate F, we also have µn → µ if and only if lim_{n→∞} µn(C) = µ(C) for all cylinder sets C.

4.3.2 Back to the UST measure

We make a slight shift in notation in line with the preceding topological discussion. Let µn be the distribution of the UST on the d-dimensional box Λn = [−n, n]^d, with edge set given by the usual ℓ¹ neighbours (i.e. the sub-graph of the d-dimensional lattice). As above, we think of this as a measure on Ωn = {0, 1}^{E_n^d}, where E_n^d are the edges contained in Λn (we should be careful with our change of notation since we previously called the vertex set of the graph Ω).

Theorem 4.10. The weak limit µ = lim_{n→∞} µn exists; it is a translation-invariant and (shift-)ergodic probability measure. It is supported on the set of forests of (Z^d, E) (i.e. no cycles but possibly not connected) with no bounded component.

For x ∈ Z^d let θx be the shift operator, i.e. θx(y) = y + x for y ∈ Z^d. This induces an obvious action on the edge set and on Ω, by θx(e) = (θx(e−), θx(e+)) and θx(ω) = (ω(θ−x(e)))_{e∈E^d}. In this way the translation by x is a graph automorphism of the d-dimensional lattice.

An event A ∈ F is called shift-invariant if A = θx(A) := {θx(ω) : ω ∈ A} for each x ∈ Z^d. A probability measure ν is called ergodic (with respect to the shift) if every shift-invariant event has probability 0 or 1. In words, the shift mixes things up with respect to ν: no event with non-trivial measure is left fixed by shifts. A probability measure ν is said to be supported on an event F ∈ F if ν(F) = 1.

Proof of Theorem 4.10. In the interest of time we skip this proof, but it may be found for example in Grimmett, Probability on Graphs.

An important question now is: when is this spanning forest actually a tree? We will not prove the following result; see for example Pemantle (1991).

Theorem 4.11. The limit measure µ is supported on the set of spanning trees if and only if d ≤ 4.

For a proof of a similar result for d ≤ 2 (in fact more generally when the random walk is recurrent), using current flows, see Lyons and Peres, Probability on Trees and Networks.

Another important question is: does the limit in Theorem 4.10 depend on what we did with the boundary of the set Λn? As it stands we did not insist on anything special happening on the boundary (a sort of 'free' boundary condition), and so µn is sometimes called the free uniform spanning tree measure. However we could have imposed other boundary conditions; one natural thing would be to glue all the points outside Λn into a single vertex connected to the boundary of Λn (as we did earlier). This gives rise to the so-called wired uniform spanning tree measure. These turn out to give rise to the same weak limit on the d-dimensional integer lattice, and the key reason why the limits are the same is that these graphs are edge amenable, meaning that the ratio of the size of the edge boundary and the volume of large boxes goes to zero,

lim_{n→∞} |∂_E Λn| / |Λn| = 0 ,

where Λn ↑ Z^d and ∂_E Λn is the set of edges that connect to Λn from outside. We could then ask whether the limiting objects are the same; it turns out in this case that they are, but we will return to this in more generality later when we look at Gibbs states.


Chapter 5

Percolation

Percolation theory was introduced over 50 years ago as a mathematical framework for studying random spatial processes such as the flow of liquids through disordered porous media. It gave rise to a huge amount of work in both applied and pure mathematics. These models, relatively simple to define, have given rise to a spectacular array of challenging mathematical problems, with many deep results (with surprising connections to other areas of mathematics) now proven, and many more still conjectured. One of the central features of these models, which we will be particularly interested in during this course, is the concept of a phase transition. This is a point in parameter space where the behaviour of the model changes in some abrupt way (we will be more precise later). More generally, these models give rise to very rich large-scale random geometries.

5.1 Model definition

We will focus on independent bond percolation on Z^d, as opposed to site percolation. We fix the underlying graph G = (Z^d, E), where E = {{x, y} : |x − y|₁ = 1}, and a parameter p ∈ [0, 1]. To each edge e ∈ E we assign a state 0 or 1, corresponding to closed or open respectively. Our probability space will therefore be Ω = {0, 1}^E endowed with the Borel σ-algebra generated by the finite-dimensional cylinder sets that we discussed at the end of the last chapter. For a probability measure we take the product measure Pp, parametrised by p, under which each edge is independently open with probability p. We think of an open edge as being open to the passage of some material, such as a liquid through a porous medium.

It turns out that the following construction, which simultaneously couples bond percolation on G for all parameter values p ∈ [0, 1] on the same probability space, is often useful. The idea of coupling is an important one; the concept is deceptively simple looking and extremely general, but it is a very handy tool. By coupling, we simply mean that the random variables are defined on a single probability space, with a single measure, such that the marginal on each of the random variables is the desired one. In this way there is a huge amount of freedom in the specification of the joint law. Suppose (Ue : e ∈ E) is a family of independent uniform random variables on [0, 1], then define

ωp(e) = 1 if Ue < p ,  and ωp(e) = 0 if Ue ≥ p ,

and we call the associated probability measure P (without the p subscript). Then the marginal law of ωp ∈ Ω for a fixed p ∈ [0, 1] is bond percolation on G with parameter p, i.e. (ωp(e) : e ∈ E) is independent in its components and P(ωp(e) = 1) = Pp(ω(e) = 1) = p.
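In code the coupling amounts to drawing one uniform variable per edge and thresholding it at each p; the sketch below (illustrative names only, not from the notes) exhibits the resulting pointwise monotonicity in p.

import random

def coupled_configurations(edges, ps, seed=0):
    # One uniform U_e per edge; omega_p(e) = 1 iff U_e < p. All the
    # configurations live on the same probability space.
    rng = random.Random(seed)
    U = {e: rng.random() for e in edges}
    return {p: {e: int(U[e] < p) for e in edges} for p in ps}

edges = [(i, i + 1) for i in range(10)]
cfg = coupled_configurations(edges, ps=[0.3, 0.6])
assert all(cfg[0.3][e] <= cfg[0.6][e] for e in edges)   # p <= q gives omega_p <= omega_q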

We are typically interested in the geometry of the open sub-graph, in particular the connected components, or clusters, of this sub-graph; the cluster containing a given vertex x is called Cx. For x, y ∈ V we write x ↔ y if there exists a path of open edges connecting x and y, and so

Cx = {y ∈ V : x ↔ y} .

We also write x ↔ ∞ if |Cx| = ∞. Since the graph G is translation invariant (vertex transitive, i.e. for each pair x, y ∈ V there exists a graph automorphism that takes x to y), it is clear that |Cx| has the same distribution as |C0|, and so we will focus on the latter.


One of the principal concerns is whether or not there exists an infinite open cluster.

Definition 5.1 (Percolation probability and the critical probability). The percolation probability as a function of the parameter p is defined as

θ(p) = Pp(|C0| = ∞) .

The critical probability is defined by

pc(d) = sup{p ∈ [0, 1] : θ(p) = 0} .

By the coupling given above we immediately get the following lemma.

Lemma 5.2. The percolation probability, θ(p), is an increasing function of p, and

θ(p) = 0 if p < pc ,  θ(p) > 0 if p > pc .

5.2 The phase transition

In one dimension it is easy to check that pc = 1, and there is no non-trivial phase transition. However it turns out that for all dimensions d ≥ 2 there is a non-trivial phase transition.

Theorem 5.3. For all d ≥ 2 we have pc ∈ (0, 1).

It turns out that the percolation probability θ(p) is a C∞ function of p for p ∈ (pc, 1]; by definition it is 0 for p < pc, and by the previous coupling it must be increasing (this gives the typical picture for the function θ(p) in the figure given in lectures). However, it is still a significant outstanding problem to show that there is continuity at pc.

Conjecture 1. The percolation probability θ(p) is continuous at pc for each d ≥ 3.

The result is known to hold for d = 2, and in high dimensions (currently for d ≥ 11), but not in between; in particular it is not known for d = 3.

Proof of Theorem 5.3. By considering the obvious embedding it is clear that pc(d + 1) ≤ pc(d). Therefore it is sufficient to show that pc(d) > 0 for all d ≥ 2 and pc(2) < 1. We do the former by a path-counting argument, and the latter using a technique which is often referred to as a Peierls contour argument, after Peierls, who used a similar argument applied to the two-dimensional Ising model.

We first show that θ(p) = 0 for p sufficiently close to zero, i.e. pc(d) > 0 for d ≥ 2. We do this by counting self-avoiding paths. A self-avoiding walk (SAW) is a path in the graph that visits each site at most once. Let Nn be the (random) number of SAWs of length n, starting from the origin, in the percolation sub-graph (using only open edges). Then

θ(p) = Pp(|C0| = ∞) = Pp(Nn ≥ 1 for all n ≥ 1) = lim_{n→∞} Pp(Nn ≥ 1) ≤ lim_{n→∞} Ep(Nn) ,   (5.2.1)

where the last inequality follows from Markov's inequality. We upper bound the expectation by a crude upper bound on all possible SAWs in Z^d, using the independence structure. Let σn denote the total number of SAWs of length n, starting from the origin, in the d-dimensional lattice (a deterministic quantity). There are 2d choices for the first edge, and at most 2d − 1 choices for each subsequent edge, since they certainly can't backtrack. Since each edge is open independently with probability p it follows that

Ep(Nn) ≤ σn p^n ≤ 2d(2d − 1)^{n−1} p^n = (2d/(2d − 1)) ((2d − 1)p)^n .

Combining with Inequality (5.2.1) we find that pc ≥ 1/(2d − 1).

We now use planar duality and a Peierls argument in two dimensions to show that θ(p) > 0 for p sufficiently large. We first define the dual graph (see figure in lectures). The (planar) dual of a planar graph G (i.e. a graph embedded in R² without edges crossing) is described as follows. Each face of G (with respect to the edges of G) is associated with a vertex of the dual, and there is an edge between each pair of these vertices whose faces have an edge in common (share a boundary). The dual of the Z² lattice is isomorphic to the Z² lattice, i.e. it is self-dual, and in the obvious embedding the dual is just shifted by (1/2, 1/2). There is a 1-to-1 correspondence between edges of the dual and edges of the original (primal) graph, since each edge of the dual crosses a unique edge of the primal graph. The percolation measure on the primal graph induces a percolation on the dual by calling each edge of the dual open if and only if it crosses an open edge.

Suppose that |C0| < ∞; then there exists a closed cycle in the dual percolation which encloses C0. The converse also holds: if the origin lies inside a closed loop in the dual then |C0| < ∞. The proof of this fact is straightforward but tedious to write out, so we take it as given (see the figure in lectures); it was originally proved by Kesten in '82. We now count again. Let Mn be the number of closed loops in the dual of length n containing the origin; then

1 − θ(p) = Pp(|C0| < ∞) = Pp(∑_{n≥1} Mn ≥ 1)   (5.2.2)
≤ Ep[∑_{n≥1} Mn] = ∑_{n≥1} Ep[Mn] ≤ ∑_{n≥1} ρn (1 − p)^n ,   (5.2.3)

where ρn is the (deterministic) number of loops (closed paths) of length n containing the origin in the dual lattice. Every loop of length n which contains the origin must pass through a dual vertex of the form (k + 1/2, 1/2) for some 0 ≤ k < n (with the obvious embedding in R²), and so by applying the previous bound on the number of SAWs we have ρn ≤ n σ_{n−1} ≤ n 4^n. So by taking p sufficiently close to 1 we can make the sum on the right-hand side of (5.2.3) smaller than 1, as required.
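Although no substitute for the proof, the phase transition is easy to observe by simulation. The following Monte Carlo sketch (our own, d = 2, modest box size and sample count chosen purely for illustration) estimates Pp(0 ↔ ∂Λn) below and above 1/2.

import random
from collections import deque

def reaches_boundary(n, p, rng):
    # One sample of the event {0 <-> boundary of [-n, n]^2}: BFS from the
    # origin, drawing each edge's state the first time it is examined.
    open_edge = {}
    def is_open(e):
        if e not in open_edge:
            open_edge[e] = rng.random() < p
        return open_edge[e]
    seen, queue = {(0, 0)}, deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        if max(abs(x), abs(y)) == n:
            return True
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (x + dx, y + dy)
            e = tuple(sorted([(x, y), v]))
            if v not in seen and is_open(e):
                seen.add(v)
                queue.append(v)
    return False

rng = random.Random(1)
for p in (0.4, 0.6):
    est = sum(reaches_boundary(20, p, rng) for _ in range(500)) / 500
    print(p, est)   # small for p = 0.4, bounded away from 0 for p = 0.6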

From the self-duality in two dimensions you might guess that pc(2) = 1/2. That is, closed edges in the dual form a percolation model where each edge is present independently with probability 1 − p; if the origin belongs to an infinite cluster in the primal graph then, due to the constrained geometry, it seems unlikely that there is simultaneously an infinite closed cluster in the dual. On the other hand, for p < pc we may picture many finite 'islands' of open edges in the primal graph, surrounded by a sea of closed edges in the dual, and presumably this sea contains some infinite cluster. Surprisingly this result was not made rigorous until Kesten in 1980, building on the work of Russo, Seymour and Welsh.

Lemma 5.4. Let A∞ be the event that there exists an infinite cluster. Then the following dichotomy holds:

(a) If θ(p) = 0 then Pp(A∞) = 0.

(b) If θ(p) > 0 then Pp(A∞) = 1.

Proof. This result also follows from the fact that Pp turns out to be shift-ergodic, which we will see later. For now we can deduce the result immediately from Kolmogorov's zero-one law.

(a) Suppose θ(p) = 0; the result follows by a simple union bound and translation invariance,

Pp(A∞) ≤ ∑_{x∈Z^d} Pp(|Cx| = ∞) = ∑_{x∈Z^d} θ(p) = 0 .


(b) Recall Komogorov’s 0-1 law, if (Xi)i∈N are i.i.d. r.v.s, Fn = σ(Xk : k > n) and F∞ =∩n > 0Fn is the asymptotic σ-algebra, then F∞ is trivial, i.e. for all A ∈ F∞ either P(A) = 1 orP(A) = 0. Now suppose θ(p) > 0. Order the edges of Zd then (w(ei))i∈N are i.i.d., and A∞ ∈F∞ since changing finitely many edges does not change the occurrence of A∞. FurthermorePp(A∞) > θ(p) > 0, and so Pp(A∞) = 1.


Chapter 6

Percolation toolbox

In this chapter we present some of the fundamental tools that are often applied in the study of percolation, and more broadly in other areas of probability, for example in the study of random graphs and interacting particle systems.

Recall the natural partial order on Ω and the definition of increasing events and functions. Note that the events {|C0| = ∞} and {x ↔ y} are increasing. We can take the consequences of the partial order on the state space yet another step, in that it also induces a partial order on probability measures on Ω (we do not always make explicit reference to the probability space, but it is always the one we discussed in Section 4.3).

Definition 6.1. Given two probability measures µ1, µ2 ∈ M1(Ω) we say µ2 stochastically dominates µ1, written µ1 ≤ µ2, if

µ1(f) ≤ µ2(f) for all increasing functions f .

(Equivalently we could formulate this in terms of increasing events.)

The following result, which states that if µ1 ≤ µ2 then the two measures can be coupled in a 'pointwise monotone' way, is often called Strassen's theorem.

Theorem 6.2 (Strassen's Theorem). Let µ1 and µ2 be probability measures on Ω. Then a necessary and sufficient condition for µ1 ≤ µ2 is that there exists a probability measure ν on Ω² such that

ν({(η, σ) : η ≤ σ}) = 1 ,

and whose marginals are µ1 and µ2.

Proof. Non examinable.

This should be compared with the increasing coupling we gave in the previous chapter for bond percolation. The coupling has the following immediate consequences (which we have already used in the context of θ(p)).

Proposition 6.3. If f is an increasing random variable then

E_{p1}(f) ≤ E_{p2}(f) whenever p1 ≤ p2 ,

so long as the expectation exists. If A is an increasing event then

P_{p1}(A) ≤ P_{p2}(A) whenever p1 ≤ p2 .

Proof. Exercise.


6.1 Harris-FKG inequality

We have previously seen negative association for weighted spanning trees; we will now see an extremely useful counterpart to this, positive association. Often we wish to calculate the probability of certain intersections, or conditional expectations of certain random variables. Unfortunately, when the events (random variables) are not independent this is often difficult or impossible to do directly. However, the following inequalities can give very useful bounds for a certain class of events (or random variables).

Theorem 6.4 (Harris inequality). Let A and B be two increasing events; then

Pp(A ∩ B) ≥ Pp(A)Pp(B) , i.e. Pp(A | B) ≥ Pp(A) .

More generally, if f and g are increasing, bounded functions then

Ep[fg] ≥ Ep[f]Ep[g] .

Remark 6.5. Intuitively the meaning of the inequality is that conditioning on an increasing event increases the chance that edges are open, and therefore increases the probability of observing another increasing event.

Example 6.6. For bond percolation on any infinite graph G with countable edge set, the critical probability is independent of the vertex you pick. Let pc^x = sup{p : θx(p) = 0}, where θx(p) = Pp(|Cx| = ∞); then pc^x = pc^y for all x, y ∈ V, since

θx(p) ≥ Pp({x ↔ y} ∩ {y ↔ ∞}) ≥ Pp(x ↔ y) θy(p) ,

so pc^x ≤ pc^y. By the same argument we can reverse the inequality.

Proof of the Harris inequality. The second inequality is more general than the first (take f = 1_A and g = 1_B), so we prove the second. Since E is countable we let E = {ei}_{i≥1} and let ωi = ω(ei). Fix f and g bounded and increasing. We first show that it is sufficient to consider functions that depend on only a finite number of the edges. Let fn := Ep[f | ω1, . . . , ωn] and gn := Ep[g | ω1, . . . , ωn], which are functions only of the first n edge variables. If Gn = σ(ω1, . . . , ωn) then {Gn}_{n≥1} is a filtration, and trivially we have

Ep[f_{n+1} | Gn] = Ep[Ep[f | G_{n+1}] | Gn] = Ep[f | Gn] = fn ,

since Gn ⊂ G_{n+1}, i.e. (fn)_{n≥1} is a martingale. So by the martingale convergence theorem we have fn → f and gn → g, Pp-a.s. and in L²(Pp). We also have Ep[fg] = lim_{n→∞} Ep[fn gn] since

Ep|fn gn − fg| ≤ Ep[|(fn − f)gn|] + Ep[|f(gn − g)|]
≤ (Ep[(fn − f)²] Ep[gn²])^{1/2} + (Ep[(gn − g)²] Ep[f²])^{1/2} → 0 as n → ∞ ,

where we use the triangle inequality in the first line and the Cauchy-Schwarz inequality in the second. It therefore suffices to show that Ep[fn gn] ≥ Ep[fn]Ep[gn] for each n ≥ 1. We do this by induction on n.

For n = 1 we have f1, g1 : {0, 1} → R, and f1(ω1) − f1(ω2) ≥ 0 if ω1 ≥ ω2 and vice versa, and the same for g1; therefore

(f1(ω1) − f1(ω2))(g1(ω1) − g1(ω2)) ≥ 0 for all ω1, ω2 ∈ {0, 1} .

It follows that

0 ≤ ∑_{a,b∈{0,1}} Pp(ω1 = a)Pp(ω2 = b)(f1(a) − f1(b))(g1(a) − g1(b)) = 2 (Ep[f1 g1] − Ep[f1]Ep[g1]) ,


i.e. the required inequality holds.

Now assume that the required inequality holds for any functions depending on k < n edges. The following gets notationally a little nasty. We first use the tower property,

Ep[fn gn] = Ep[Ep[fn gn | ω1, . . . , ω_{n−1}]]
= Ep[E_{ωn}[fn(ω1, . . . , ω_{n−1}, ·) gn(ω1, . . . , ω_{n−1}, ·)]] ,

where E_{ωn}[fn(ω1, . . . , ω_{n−1}, ·)] is understood as the expected value with respect to the Bernoulli random variable on edge en with ω1, . . . , ω_{n−1} fixed; therefore fn(ω1, . . . , ω_{n−1}, ·) is understood as a function of a single edge configuration, and we may therefore apply the 'k = 1 case'. It follows that

Ep[fn gn] ≥ Ep[E_{ωn}[fn(ω1, . . . , ω_{n−1}, ·)] E_{ωn}[gn(ω1, . . . , ω_{n−1}, ·)]]
= Ep[Ep[fn | ω1, . . . , ω_{n−1}] Ep[gn | ω1, . . . , ω_{n−1}]]
≥ Ep[fn]Ep[gn] ,

where the final inequality follows from the 'k = n − 1 case'. This completes the proof.

The Harris inequality has been extended to measures more general than the product measure on Ω, by Fortuin, Kasteleyn and Ginibre (1971). The resulting inequality is often called the FKG inequality and is very useful in many interacting particle systems. We first give a precursor, from which the FKG inequality follows. For ω1, ω2 ∈ Ω we define the (pointwise) maximum and minimum by

(ω1 ∨ ω2)(e) = max{ω1(e), ω2(e)} ,
(ω1 ∧ ω2)(e) = min{ω1(e), ω2(e)} ,

for e ∈ E. A probability measure µ on Ω is called positive if it assigns strictly positive probability to each point ω ∈ Ω, i.e. µ(ω) > 0.

Theorem 6.7 (Holley inequality). Suppose µ1 and µ2 are two positive probability measures on Ω. If

µ1(ω1 ∧ ω2) µ2(ω1 ∨ ω2) ≥ µ1(ω1) µ2(ω2) , for all ω1, ω2 ∈ Ω ,

then µ1 ≤ µ2.

Proof. The typical proof is via stochastic monotonicity, which we will come to later, constructing suitable coupled processes and applying Strassen's theorem.

The FKG inequality follows as a consequence of the Holley inequality.

Theorem 6.8 (FKG inequality). If µ is a positive probability measure on Ω such that

µ(ω1 ∧ ω2) µ(ω1 ∨ ω2) ≥ µ(ω1) µ(ω2) , for all ω1, ω2 ∈ Ω ,

then µ is positively associated (has positive correlations), i.e.

µ(fg) ≥ µ(f)µ(g) for all increasing f, g : Ω → R .


6.2 BK inequality

In this section we will see a partial converse to the FKG inequality which holds in the special case of product measures on Ω. The inequality, typically called the BK inequality after van den Berg and Kesten, obviously can't hold in general for events of the form A ∩ B for increasing A and B, so we have to replace the intersection with something else (but closely related) called the 'disjoint occurrence' or 'disjoint intersection'. In words, we will define A ∘ B to be the event that A and B occur on disjoint edge sets.

Suppose A and B are increasing events that depend on finitely many edges {ei}_{i=1}^n; each ω = (ω1, . . . , ωn) is specified uniquely by the set of edges on which it is non-zero, K(ω) = {ei : ω(ei) = 1}. We define A ∘ B to be the set of all ω such that there exists a subset H of K(ω) with ω′ ∈ A whenever K(ω′) = H, and ω′′ ∈ B whenever K(ω′′) = K(ω) \ H. In other words, if

ωH(e) = ω(e) for e ∈ H ,  and ωH(e) = 0 for e ∉ H ,

then

A ∘ B = {ω : there exists H(ω) ⊂ E s.t. ωH ∈ A, ω_{E\H} ∈ B} .

In words, there are disjoint edge sets (possibly depending on ω) such that the configuration being open on the first is sufficient to belong to A, and the configuration being open on the second is sufficient to belong to B. Note that the operation ∘ is both commutative and associative.

Example 6.9. Suppose G is a finite sub-graph of the d-dimensional lattice, and let A_G(x, y) = {there is an open path in G connecting x and y} = {x ↔_G y}; then A_G(u, v) ∘ A_G(x, y) is the event that there exist two edge-disjoint open paths in G, one connecting u to v and one connecting x to y. It is then eminently plausible that

P(A_G(u, v) ∘ A_G(x, y) | A_G(u, v)) ≤ P(A_G(x, y)) ,

since the disjoint occurrence can't "re-use" the edges used for the event A_G(u, v).

This conclusion is the assertion of the BK inequality.

Theorem 6.10 (BK inequality). If A and B are increasing events that depend on finitely many edges, then

Pp(A ∘ B) ≤ Pp(A)Pp(B) .

Proof. Guided exercise on the final sheet.

The restriction to events that depend on only finitely many edges is important, but it can be dropped in practice for some important applications (with a little extra work).
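Since the definition of A ∘ B quantifies over subsets H of K(ω), it can be computed by brute force when the edge set is small. The sketch below (our own encoding of events as sets of configurations) does this and checks the BK inequality in one instance.

from itertools import product, combinations

def disjoint_occurrence(A, B, n):
    # omega is in A o B iff some subset H of its open edges is a witness
    # for A while the remaining open edges witness B (brute force).
    result = set()
    for w in product((0, 1), repeat=n):
        K = [i for i in range(n) if w[i]]
        for r in range(len(K) + 1):
            for H in map(set, combinations(K, r)):
                wH = tuple(int(i in H) for i in range(n))
                wR = tuple(int(i in K and i not in H) for i in range(n))
                if wH in A and wR in B:
                    result.add(w)
    return result

def prob(S, p, n):
    return sum(p ** sum(w) * (1 - p) ** (n - sum(w)) for w in S)

n = 2
A = {w for w in product((0, 1), repeat=n) if w[0] or w[1]}   # increasing
AoB = disjoint_occurrence(A, A, n)   # = {(1, 1)}: needs two disjoint open edges
p = 0.5
assert prob(AoB, p, n) <= prob(A, p, n) ** 2                 # BK inequality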

It was conjectured by van den Berg and Kesten that the inequality should hold in general without the restriction to increasing events; this was shown later, in 1997, by Reimer. The following result actually contains both the BK and Harris inequalities, but it has so far not seen any application that was not one of these two special cases.

For ω ∈ Ω and F ⊂ E we define the cylinder event C(ω, F) generated by ω on F as

C(ω, F) = {ω′ ∈ Ω : ω′(e) = ω(e) for all e ∈ F} .

We define A □ B as the set of all ω such that there exists a K(ω) ⊂ E with C(ω, K) ⊂ A and C(ω, K^c) ⊂ B. Alternatively, if we write ω_I for the restriction of ω to I ⊂ E, then for ω ∈ A we say I is a witness to the event A for ω if ω′ ∈ A whenever ω′_I = ω_I. Then, in words, A □ B is the set of configurations ω such that there exist two disjoint witnesses I(ω) and J(ω) of A and B for ω.


Remark 6.11. The disjoint occurrence satisfies:

• A □ B ⊂ A ∩ B.

• If A and B are increasing then so is A □ B, and A □ B = A ∘ B.

• If A is increasing and B is decreasing then A □ B = A ∩ B.

Theorem 6.12 (Reimer's inequality). If A and B are two events that depend on finitely many edges then

Pp(A □ B) ≤ Pp(A)Pp(B) .

Corollary 6.13. Consider a finite set S ⊂ Z^d containing the origin and x ∉ S. Then

Pp(0 ↔ x) ≤ ∑_{y∈∂S} Pp(0 ↔_S y) Pp(y ↔ x) ,

where ∂S = {z ∈ S : there exists y ∈ S^c s.t. {z, y} ∈ E} is the internal boundary of S, and ↔_S denotes connection using only edges with both ends in S.

Proof. For n ≥ 1 let Λn = [−n, n]^d. Fix n large enough such that S ⊂ Λn and x ∈ Λn. If ω ∈ {0 ↔_{Λn} x} then there is a SAW of open edges in ω from 0 to x that must leave S; let y be the first vertex in ∂S that this path visits. Then necessarily ω ∈ {0 ↔_S y} ∘ {y ↔_{Λn} x}, since the set of edges in the path up to y is a witness to {0 ↔_S y} for ω and the remainder of the path is a witness to {y ↔_{Λn} x}. It follows that {0 ↔_{Λn} x} ⊂ ∪_{y∈∂S} {0 ↔_S y} ∘ {y ↔_{Λn} x}, and so by a union bound we have

Pp(0 ↔_{Λn} x) ≤ ∑_{y∈∂S} Pp({0 ↔_S y} ∘ {y ↔_{Λn} x}) .

Then the BK inequality implies that

Pp(0 ↔_{Λn} x) ≤ ∑_{y∈∂S} Pp(0 ↔_S y) Pp(y ↔_{Λn} x) ,

and the claim follows by letting n tend to infinity.

6.3 Margulis-Russo formula

We have already seen that Pp(A) is non-decreasing in p if A is an increasing event. The final tool we will examine in this chapter helps to control the rate at which this probability changes as a function of p.

Consider our increasing coupling using the i.i.d. uniform r.v.s {Ue}_{e∈E} (recall we denote the measure by P). For an increasing event A that depends on only finitely many edges, using this coupling, we have

P_{p+δ}(A) − Pp(A) = P(η_{p+δ} ∈ A, ηp ∉ A) .   (6.3.1)

If η_{p+δ} ∈ A and ηp ∉ A we know there must exist at least one edge e ∈ E with p ≤ Ue < p + δ; denote the set of all such edges by E_{p,δ}. Since the edges have independent weights it is clear that P(|E_{p,δ}| ≥ 2) = o(δ), i.e. with high probability there is a single edge with Ue ∈ [p, p + δ). If e is the unique edge satisfying Ue ∈ [p, p + δ) then this edge must be 'essential' for the event A, in the sense that ηp ∉ A but ηp^e ∈ A, where in general for ω ∈ Ω and e ∈ E we define

ω^e(f) = ω(f) if f ≠ e ,  and ω^e(f) = 1 − ω(f) if f = e ,


to be the configuration 'flipped' at edge e. Note that whether this particular edge is 'essential' could depend on the rest of ηp, but not on the specific value of ηp(e). It follows that this specific edge contributes to (6.3.1) an amount P(Ue ∈ [p, p + δ)) Pp(e is 'essential' for A) = δ Pp(e is 'essential' for A). Now summing over all such contributions and taking δ ↓ 0 we have

(d/dp) Pp(A) = ∑_{e∈E} Pp(e is 'essential' for A) .

It turns out that this heuristic is essentially valid; we now make the argument precise. Fix A an increasing event depending on only finitely many edges and ω ∈ Ω. We call the edge e ∈ E pivotal for the pair (A, ω) if 1_A(ω) ≠ 1_A(ω^e), and define the event

piv_e(A) = {e is pivotal for A} = {ω ∈ Ω : e is pivotal for the pair (A, ω)} .

Example 6.14. If A = {0 ↔ ∂Λn} (in two dimensions) then

piv_e(A) = {ω : if e is open then 0 ↔ ∂Λn, and if e is closed then 0 ↮ ∂Λn} ,

which is all the ω ∈ Ω such that e− ↔ 0, e+ ↔ ∂Λn, and there exists a closed loop in the dual surrounding 0 (except for the edge crossing e).

Theorem 6.15 (Russo's formula). If A is an increasing event that depends on only finitely many edges, then

(d/dp) Pp(A) = ∑_{e∈E} Pp(piv_e(A)) = Ep[N(A)] ,

where N(A)(ω) = |{e ∈ E : e is pivotal for (A, ω)}|.

Again, it is important that A depends on only finitely many edges; consider for example θ(p), which we expect not to be differentiable at the critical point.

Proof. We start by showing that if A depends on finitely many edges then f(p) = Pp(A) is differentiable. In fact (with a slight abuse of notation, since really we average with respect to the variables on E \ {e1, . . . , en}),

f(p) = Ep[1_A] = ∑_{ω1,...,ωn} 1_A(ω) p^{|ω|} (1 − p)^{n−|ω|} ,

where |ω| = ∑_{i=1}^n ωi. So in particular f(p) is a polynomial in p. We can simply calculate the derivative and collect terms to find (check),

f′(p) = (1/(p(1 − p))) ∑_{i=1}^n ∑_{ω1,...,ωn} (ωi − p) 1_A(ω) p^{|ω|} (1 − p)^{n−|ω|}
= (1/(p(1 − p))) ∑_{i=1}^n Ep[1_A(ω)(ωi − p)] = (1/(p(1 − p))) ∑_{i=1}^n Covp[1_A(ω), ωi] .   (6.3.2)

This is already in a form which may come in handy later, but to express it in the form stated we make a couple more observations. Firstly, if ω ∈ A ∩ piv_{ei}(A)^c then ω^{ei} ∈ A by definition of piv_{ei}(A), and ω^{ei} ∈ piv_{ei}(A)^c since piv_{ei}(A) does not depend on ω(ei); therefore

Ep[1_A(ω)(ωi − p) 1_{piv_{ei}(A)^c}(ω)] = Ep[(ωi − p)] Ep[1_{A ∩ piv_{ei}(A)^c}(ω)] = 0 .

It follows by the law of total expectation that

Ep[1_A(ω)(ωi − p)] = Ep[(ωi − p) 1_A(ω) 1_{piv_{ei}(A)}(ω)] ,

and since A is increasing, if ω ∈ piv_{ei}(A) then 1_A(ω) = 0 if ωi = 0 and 1 if ωi = 1. Therefore

Ep[1_A(ω)(ωi − p)] = (1 − p) Pp(piv_{ei}(A) ∩ {ωi = 1}) = (1 − p) p Pp(piv_{ei}(A)) ,

which completes the proof.
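As a concrete illustration, Russo's formula can be checked numerically on a toy event, comparing a finite-difference derivative of the polynomial Pp(A) with the sum of pivotal probabilities; the sketch below (our own, three edges, A = {edges 1 and 2 open} ∪ {edge 3 open}) does exactly this.

from itertools import product

n = 3
in_A = lambda w: int((w[0] and w[1]) or w[2])    # an increasing event

def P(p):
    # P_p(A), a polynomial in p.
    return sum(in_A(w) * p ** sum(w) * (1 - p) ** (n - sum(w))
               for w in product((0, 1), repeat=n))

def pivotal_sum(p):
    # sum over edges of P_p(piv_e(A)) = E_p[N(A)].
    total = 0.0
    for w in product((0, 1), repeat=n):
        weight = p ** sum(w) * (1 - p) ** (n - sum(w))
        for e in range(n):
            we = w[:e] + (1 - w[e],) + w[e + 1:]   # configuration flipped at e
            if in_A(w) != in_A(we):
                total += weight
    return total

p, h = 0.4, 1e-6
print((P(p + h) - P(p - h)) / (2 * h), pivotal_sum(p))   # both equal 1.32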


Chapter 7

Further percolation

We start this chapter with some results on ergodicity (with respect to shifts) and some important facts about the infinite cluster. Recall the discussion in Section 4.3, where we gave the details of the measurable structure of Ω and defined shift-ergodicity.

Lemma 7.1. The measure Pp is invariant under translations and is shift-ergodic.

Proof. We use repeatedly that any measurable event A ∈ F can be approximated by an event that depends on finitely many edges, in the sense that for all ε > 0 there exists an event B depending on finitely many edges with Pp(A∆B) < ε. It is clear that for any event B depending on finitely many edges Pp(B) = Pp(τx B), since Pp is a homogeneous product measure; hence Pp is invariant under shifts. Fix A ∈ F translation invariant; we wish to show that Pp(A) ∈ {0, 1}. Fix ε > 0, and B depending on finitely many edges with Pp(A∆B) < ε. Since B depends on only finitely many edges, there exists an x such that B and τx B depend on disjoint edges, and are therefore independent, so

Pp(B ∩ τxB) = Pp(B)2 .

It follows that

Pp(A) = Pp(A ∩ τx A) ≤ Pp(B ∩ τx B) + 2ε ≤ Pp(A)² + 4ε ,

where the first equality followed from translation invariance; since ε was arbitrary, Pp(A) ≤ Pp(A)², and hence Pp(A) ∈ {0, 1} as required.

Corollary 7.2. If p > pc then there is an infinite cluster Pp-a.s.

Note we already saw one proof using Kolmogorov's zero-one law earlier, but it is also an immediate consequence of the previous lemma. It turns out that the infinite cluster (if it exists) is also unique.

Theorem 7.3. If p > pc then there is a unique infinite cluster Pp-a.s.

This result was first proved by Aizenman, Kesten and Newman in 1987, and there have been various alternative proofs since; see for example Burton and Keane (1989).

7.1 Exponential decay in the subcritical regime

The arguments in this section and the next section, on Kesten's theorem, are taken from Duminil-Copin and Tassion (2016). It turns out that (strictly) inside the subcritical region, not only is there almost surely no infinite cluster, but the probability that the cluster containing the origin is large decays exponentially quickly in the diameter. More precisely:

Theorem 7.4 (Exponential decay in the diameter). Fix d ≥ 2. For every p ∈ (0, pc) there exists cp > 0 such that for all n ≥ 1 we have Pp(0 ↔ ∂Λn) ≤ e^{−cp n}. Furthermore, there exists a c > 0 such that for p > pc, θ(p) ≥ c(p − pc).

This theorem was first proved by Aizenman and Barsky in 1987, but we present the more recent proof by Duminil-Copin and Tassion from 2016. Clearly Pp(0 ↔ ∂Λn) is at least p^n = e^{−(log 1/p) n}, since this is the probability that the straight path from the origin up to the boundary is open.


Proof. Let θn(p) = Pp(0 ↔ ∂Λn), denote the edge boundary of S ⊂ Z^d by ∆S = {e ∈ E : e− ∈ S, e+ ∉ S}, and finally define

ϕp(S) = p ∑_{e∈∆S} Pp(0 ↔_S e−) ,

which is the expected number of open edges in ∆S whose tail is connected to 0 within S. Clearly if 0 ∉ S then ϕp(S) = 0. The proof now relies on the following claims (we postpone their proofs to the end):

Claim 7.5. If there exists a finite set S containing 0 such that ϕp(S) < 1, then there exists cp > 0 such that for all n ≥ 1 we have θn(p) ≤ e^{−cp n}.

For n ≥ 1 define the random subset of Λn,

𝒮 = {x ∈ Λn : x ↮ ∂Λn} .

The boundary of 𝒮 is the 'outermost blocking surface' from the boundary of Λn; you can 'discover' 𝒮 by exploring all the sites you can reach following open paths started from the boundary. This leaves 'unexplored' islands of sites which can't be reached from the boundary (belonging to 𝒮), in a 'sea' of the connected component of the boundary. Thus the event {𝒮 = S} is measurable with respect to the edges which have at least one end outside of S.

Claim 7.6. For every p ∈ (0, 1) we have

θn′(p) = (1/(p(1 − p))) Ep[ϕp(𝒮)] .   (7.1.1)

Remark 7.7. Since the edges in the boundary of 𝒮 must be closed (or else the vertex attached by an open edge would also be in 𝒮), we can interpret ϕp(𝒮) (up to constant factors) as the expected number of closed pivotal edges of {0 ↔ ∂Λn}, conditioned on the connected component of the boundary being 𝒮^c.

We now complete the proof in light of these claims. This is achieved through a new definition of the critical point that turns out to be equivalent; let

p̃c := sup{p ∈ [0, 1] : ∃ a finite set S containing 0 such that ϕp(S) < 1} .

By Claim 7.5, for p < p̃c there exists cp > 0 such that θn(p) ≤ e^{−cp n} for every n ≥ 1; in particular θ(p) = 0. Therefore p ≤ pc, so p̃c ≤ pc. We now show p̃c ≥ pc, i.e. θ(p) > 0 for all p > p̃c. Fix p > p̃c, so by definition ϕp(S) ≥ 1 for all finite S containing the origin. Claim 7.6 then gives

θn′(p) = (1/(p(1 − p))) Ep[ϕp(𝒮)] ≥ (1/(p(1 − p))) Pp(0 ∈ 𝒮) = (1/(p(1 − p))) (1 − θn) .

Some basic re-arranging then gives

(log(1/(1 − θn)))′ ≥ (log(p/(1 − p)))′ ,

which by integrating between p̃c and p implies that for all n ≥ 1, θn(p) ≥ (p − p̃c)/(p(1 − p̃c)). Letting n tend to infinity we obtain a positive lower bound on θ(p) since p > p̃c. This completes the proof that pc = p̃c, and together with Claim 7.5 this completes the proof of the first part of the theorem. Also, substituting p̃c = pc into the previous bound on θ(p) gives θ(p) ≥ c(p − pc) as required (the so-called mean-field bound).


Proof of Claim 7.5. A sketch could be helpful here. Fix S such that ϕp(S) < 1; since S is finite there exists an L > 0 such that S ⊂ Λ_{L−1}. Now fix k ∈ N and consider the 'big box' Λ_{kL}; then by the same argument as in the proof of Corollary 6.13, using the BK inequality,

θ_{kL}(p) ≤ p ∑_{e∈∆S} Pp(0 ↔_S e−) Pp(e+ ↔ ∂Λ_{kL}) ≤ ϕp(S) θ_{(k−1)L}(p) ,

where in the second inequality we used translation invariance and the fact that e+ must be at distance at least (k − 1)L from ∂Λ_{kL}. [The proof of the above inequality is left as an exercise on Sheet 3.] The claim follows by iterating over k, which gives θ_{kL}(p) ≤ ϕp(S)^k, i.e. exponential decay of θn(p) in n.

Proof of Claim 7.6. Here we use Russo's formula. Let A = {0 ↔ ∂Λn}; then note that

(1 − p) Pp(e is pivotal for A) = Pp(e is pivotal for A, ω(e) = 0)
= Pp(e is pivotal for A, 0 ↮ ∂Λn) ,

since (as discussed previously) {e is pivotal for A} does not depend on the configuration of the edge e. Then by Russo's formula,

θn′(p) = (1/(1 − p)) ∑_{e∈En} Pp(e is pivotal for A, 0 ↮ ∂Λn)
= (1/(1 − p)) ∑_{e∈En} Pp(0 ↔ e−, e+ ↔ ∂Λn, 0 ↮ ∂Λn) ,   (7.1.2)

where En are the edges between vertices both in Λn. Now we partition on {𝒮 = S}, and observe that on the event {𝒮 = S} the event on the right-hand side above can be interpreted nicely,

{0 ↔ e−, e+ ↔ ∂Λn, 0 ↮ ∂Λn} ∩ {𝒮 = S} = {0 ↔_S e−, e+ ∉ S, 0 ∈ S} ∩ {𝒮 = S} .

It follows immediately that we have

θn′(p) = (1/(p(1 − p))) ∑_{S⊂Λn : 0∈S} ∑_{e∈∆S} p Pp(0 ↔_S e−, 𝒮 = S)
= (1/(p(1 − p))) ∑_{S⊂Λn : 0∈S} (∑_{e∈∆S} p Pp(0 ↔_S e−)) Pp(𝒮 = S)
= (1/(p(1 − p))) Ep[ϕp(𝒮)] ,

where in the second equality we used that {0 ↔_S e−} depends only on edges strictly inside S, while {𝒮 = S} can be determined from all the other edges (exploring from the boundary using open paths).
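For small S the quantity ϕp(S) can be computed exactly by enumerating the configurations of the edges inside S. The sketch below (our own illustration, S = Λ1 ⊂ Z² with 12 interior edges) should output a value below 1 at p = 0.25, which by Claim 7.5 certifies exponential decay there, and hence 0.25 ≤ pc(2).

from itertools import product

V = [(x, y) for x in range(-1, 2) for y in range(-1, 2)]     # S = Lambda_1
E = [(u, v) for u in V for v in V
     if u < v and abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1]  # 12 interior edges
n_out = {v: sum((v[0] + dx, v[1] + dy) not in V
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))) for v in V}

def phi(p):
    # phi_p(S) = p * sum over boundary edges of P_p(0 <->_S tail), exactly.
    conn = {v: 0.0 for v in V}
    for w in product((0, 1), repeat=len(E)):
        weight = p ** sum(w) * (1 - p) ** (len(E) - sum(w))
        comp, stack = {(0, 0)}, [(0, 0)]
        while stack:                 # component of 0 using open interior edges
            u = stack.pop()
            for i, (a, b) in enumerate(E):
                if w[i] and u in (a, b):
                    v = b if a == u else a
                    if v not in comp:
                        comp.add(v)
                        stack.append(v)
        for v in comp:
            conn[v] += weight
    return p * sum(n_out[v] * conn[v] for v in V)

print(phi(0.25))   # expected to be < 1, certifying 0.25 <= p_c(2)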

7.2 Kesten’s theorem

We will rely heavily on the self-duality of the Z² lattice previously discussed. In light of this self-duality there are many heuristic arguments for why pc(2) = 1/2; arguably the most informative is in terms of 'box crossings'. Let Rn = [0, n] × [0, n − 1] and

Hn = {ω ∈ Ω : {0} × [0, n − 1] ↔ {n} × [0, n − 1]} ,


be the event that there is a left-to-right crossing of Rn. It turns out that studying crossing probabilities, of boxes and more general shapes, is instrumental in understanding the random geometry of percolation and related models, in particular at criticality. The technology used to study this is generally called Russo-Seymour-Welsh theory.

Notice that the complement of Hn is the event that there exists a closed path in the dual percolation 'from top to bottom' of the rectangle Rn under a rotation of π/2. Checking this remark carefully, it follows immediately that

P_{1/2}(Hn) = 1/2 for all n ≥ 1 .   (7.2.1)

If p < pc we know from the previous section that clusters are typically small, so that the probability of crossing Rn goes to zero as n → ∞; therefore pc ≤ 1/2. On the other hand, if p > pc then it is believable that the infinite cluster is omnipresent, in the sense that if you look at any large-enough box you will see part of the infinite cluster, and therefore if n is sufficiently big then the probability of crossing Rn from left to right is close to one (i.e. Pp(Hn) → 1 as n → ∞ for p > pc), and hence pc ≥ 1/2. In order to prove that pc = 1/2 we will make both of these arguments rigorous; the former follows almost immediately from the exponential decay of cluster size at sub-criticality, and we prove the latter now using uniqueness of the infinite cluster.

Proposition 7.8. If θ(p) > 0 then limn→∞ Pp(Hn) = 1.

Proof. Fix n > k ≥ 1. If Λk = [−k, k]² is a box of 'size' k inside Λn = [−n, n]², then a path from Λk to ∂Λn ends up on either the top, bottom, left or right side of Λn, so by the Harris inequality it follows that

Pp(Λk ↔ the left side of Λn) ≥ 1 − Pp(Λk ↮ ∞)^{1/4} ;   (7.2.2)

this is a special case of the 'square-root trick' that will be part of the third example sheet.

Now set n′ = ⌊(n − 1)/2⌋, and let An be the event that the box of size k shifted to the (left) centre of Rn, given by Λk^l = Λk + (n′, n′), is connected to the left side of Rn, and the box of size k shifted to the (right) centre of Rn, given by Λk^r = Λk + (n′ + 2, n′), is connected to the right side of Rn (notice that we are a little careful about what we mean by 'centred' so that the bounds are obvious). Then from the bound above (7.2.2) we have

Pp(An) ≥ 1 − 2 Pp(Λk ↮ ∞)^{1/4} .

If An occurs but not Hn, then there are two distinct clusters from the boxes in the centre of Rn to the boundary. The intersection over n ≥ 1 of this event is included in the event that there are two distinct infinite clusters. It follows that Pp(An \ Hn) → 0 as n → ∞. Therefore,

lim inf_{n→∞} Pp(Hn) ≥ lim inf_{n→∞} Pp(An) ≥ 1 − 2 Pp(Λk ↮ ∞)^{1/4} .

Taking the limit as k tends to infinity we deduce that Pp(Hn)→ 1 as required.

Combining this proposition with Eq. (7.2.1) we have the following corollary.

Corollary 7.9. There is no infinite cluster at p = 1/2, i.e. θ(1/2) = 0; in particular pc ≥ 1/2.

To complete Kesten’s theorem we apply the subcritical exponential bounds.

Theorem 7.10 (Kesten's theorem). For Bernoulli percolation on the Z² lattice, pc(2) = 1/2, and furthermore there is no infinite cluster at pc.


Proof. It remains to show that if p < pc then Pp(Hn) → 0 as n → ∞. Fix p < pc; then by Theorem 7.4 there exists cp > 0 such that

Pp(0 ↔ ∂Λn) ≤ e^{−cp n} .

We use a simple union bound over the left-hand edge of Rn to get

Pp(Hn) ≤ ∑_{k=0}^{n−1} Pp((0, k) ↔ {n} × [0, n − 1]) ≤ n Pp(0 ↔ ∂Λn) ≤ n e^{−cp n} ,

where in the second inequality we used the obvious inclusion (i.e. Rn ⊂ τ_{(0,k)}Λn) and translation invariance. Again by (7.2.1), this implies that pc ≤ 1/2.

7.3 The Russo-Seymour-Welsh theory

In the previous section we saw estimates of the probability of crossing a box at the critical point(in particular P1/2(Hn) = 1/2). It turns out that understanding the probabilities of crossing morecomplicated shapes is extremely informative about the structure of the connected components.Estimates of rectangle crossing probabilities can be used to estimate the probability of crossingmore complicated structure, or for example the probability of an open cycle in an annulus, itturn these estimates have been pivotal for the proof of Cardy’s formula, conformal invariance,and scaling limits.

We did not have time to cover this in lectures, but we might explore some of these concepts on the longer final example sheet (Sheet 4 Question 2).


Chapter 8

Exclusion, contact and voter models

We will look at some classical interacting particle systems, in the sense of Markov processes on Ω = {0, 1}^Λ. Along the way we will develop some very general tools which turn out to be surprisingly powerful, such as graphical constructions, coupling of processes (along with stochastic monotonicity), and duality. We start with some general setup.

8.1 Semigroups and generators

Throughout this section we will consider a compact state space Ω, i.e. finite local state space, with the topological and measurable structure discussed previously.

Let C(Ω) = {f : Ω → R continuous}, regarded as a Banach space with respect to the sup-norm

‖f‖_∞ = sup_{η∈Ω} |f(η)| .

Since Ω is compact, ‖f‖_∞ < ∞ for all f ∈ C(Ω).

Definition 8.1. A (homogeneous) Markov process on Ω is a collection {P_ζ : ζ ∈ Ω} of probability measures on D[0,∞) with the following properties:

(a) P_ζ[{η. ∈ D[0,∞) : η_0 = ζ}] = 1 for all ζ ∈ Ω, i.e. P_ζ is normalized on all paths with initial condition η_0 = ζ.

(b) The mapping ζ ↦ P_ζ[A] is measurable for every A ∈ F.

(c) P_ζ[η_{t+.} ∈ A | F_t] = P_{η_t}[A], P_ζ-a.s. for all ζ ∈ Ω, A ∈ F and t ≥ 0. (Markov property)

Suppose {P_η : η ∈ Ω} is a Markov process on Ω. For f ∈ C(Ω) we define

S(t)f(η) := E_η[f(η_t)] = ∫_{D[0,∞)} f(η_t) dP_η .   (8.1.1)

Recall P^n f(η) = E_η[f(η_n)] in discrete time, and P(t)f(η) = E_η[f(η_t)] for countable state continuous time Markov processes; in this sense S(t) is the continuous time analogue of the transition matrix.

Definition 8.2. A Markov process is called a Feller process if S(t)f ∈ C(Ω) for every f ∈ C(Ω) and t ≥ 0.

Proposition 8.3. (and Definition of Markov semigroup) Suppose {P_η : η ∈ Ω} is a Feller process on Ω. Then the collection of linear operators (S(t) : t ≥ 0) on C(Ω) is a Markov semigroup, i.e.

(a) S(0) = Id, (identity operator at t = 0)

(b) t ↦ S(t)f from [0,∞) to C(Ω) is right-continuous for all f ∈ C(Ω), (right-continuity)

(c) S(t + s)f = S(t)S(s)f for all f ∈ C(Ω), s, t ≥ 0, (semigroup/Markov property)

(d) S(t)1 = 1 for all t ≥ 0, (conservation of probability)


(e) S(t)f ≥ 0 for all non-negative f ∈ C(Ω). (positivity)

Proof. Part (a) is equivalent to (a) of Definition 8.1: S(0)f(ζ) = E_ζ[f(η_0)] = f(ζ) since η_0 = ζ. For fixed η ∈ Ω the right-continuity of t ↦ S(t)f(η) follows directly from right-continuity of η_t and continuity of f. The proof of uniformity in η ∈ Ω, required to show (b), is more involved and can be found in Section 1 of Chapter IX of Yosida (1980).

Part (c) follows from the Markov property of η_t, Definition 8.1 (c):

S(t + s)f(ζ) = E_ζ f(η_{t+s}) = E_ζ[ E_ζ( f(η_{t+s}) | F_t ) ] = E_ζ[ E_{η_t} f(η_s) ] = E_ζ[ (S(s)f)(η_t) ] = S(t)S(s)f(ζ) .

Part (d): S(t)1 = E_η(1) = E_η( 1_Ω(η_t) ) = 1 since η_t ∈ Ω for all t ≥ 0 (conservation of probability). Similarly, for all constant functions c we must have S(t)c = c.

Part (e) is immediate from the definition.

Remark 8.4. Note that (b) implies in particular S(t)f → f as t → 0 for all f ∈ C(Ω), which is usually called strong continuity of the semigroup (see e.g. [5], Section 19). Furthermore, S(t) is also contractive, i.e. for all f ∈ C(Ω)

‖S(t)f‖_∞ ≤ ‖S(t)|f|‖_∞ ≤ ‖f‖_∞ ‖S(t)1‖_∞ = ‖f‖_∞ ,   (8.1.2)

which follows directly from conservation of probability (d). Strong continuity and contractivity imply that t ↦ S(t)f is actually uniformly continuous for all t ≥ 0. Using also the semigroup property (c) we have for all t ≥ ε > 0 and f ∈ C(Ω)

‖S(t)f − S(t − ε)f‖_∞ = ‖S(t − ε)(S(ε)f − f)‖_∞ ≤ ‖S(ε)f − f‖_∞ ,   (8.1.3)

which vanishes as ε → 0 and implies left-continuity in addition to right-continuity (b).

The importance of Markov semigroups lies in the fact that each one corresponds to a Markovprocess, as can be seen from the following inverse of Proposition 8.3.

Theorem 8.5. Suppose (S(t))_{t≥0} is a Markov semigroup on C(Ω). Then there exists a unique Markov process {P_η : η ∈ Ω} such that

S(t)f(η) = E_η[f(η_t)]

for all f ∈ C(Ω), η ∈ Ω, and t ≥ 0.

Proof. Proofs can be found in Chapter I of Blumenthal and Getoor (1968) and Chapter I of Gihman and Skorohod (1975).

Let P(Ω) denote the set of all probability measures on Ω, with the topology of weak convergence: µ_n → µ in P(Ω) iff ∫ f dµ_n → ∫ f dµ for all f ∈ C(Ω). Note that P(Ω) is compact since Ω is. Recall that if µ ∈ P(Ω) then the Markov process started from initial distribution µ is given by

P_µ = ∫_Ω P_η µ(dη) .

Thus,

E_µ[f(η_t)] = ∫_Ω (S(t)f)(ζ) µ(dζ) = ∫_Ω S(t)f dµ for all f ∈ C(Ω) .   (8.1.4)

This motivates the following definition.


Definition 8.6. Suppose (S(t))_{t≥0} is a Markov semigroup on C(Ω). Given an initial distribution µ ∈ P(Ω) we denote the distribution at time t by µS(t) ∈ P(Ω), defined by

∫_Ω f d[µS(t)] = ∫_Ω S(t)f dµ  ∀ f ∈ C(Ω) .   (8.1.5)

Note that we will sometimes write µ_t = µS(t). This is reminiscent of π_n = πP^n in the discrete-time countable state space case. The fact that µS(t) is uniquely determined by (8.1.5) is a consequence of the Riesz representation theorem (see for example [6, Theorem 2.14]).

Since (S(t))_{t≥0} has the semigroup property we expect it to have the form of an exponential generated by the linearization S′(0):

"S(t) = e^{t S′(0)} = Id + S′(0) t + o(t) as t → 0, where S(0) = Id".

In countable state space this makes sense in terms of matrix exponentials of the Q-matrix. Otherwise we make it precise as follows.

Definition 8.7. The generator L : D_L → C(Ω) for the process (S(t))_{t≥0} is given by

Lf := lim_{t↓0} (S(t)f − f)/t  for all f ∈ D_L ,   (8.1.6)

where the domain D_L ⊆ C(Ω) is the set of all functions for which the limit above exists. Note that the limit is with respect to the sup-norm ‖·‖_∞.

Proposition 8.8. (and Definition of Markov generator) L as defined above in Definition 8.7 is a Markov generator, i.e.

(a) 1 ∈ D_L and L1 = 0,

(b) for f ∈ D_L and λ ≥ 0: min_{ζ∈Ω} f(ζ) ≥ min_{ζ∈Ω} (f − λLf)(ζ),

(c) D_L is dense in C(Ω) and the range R(Id − λL) = C(Ω) for all λ > 0 sufficiently small.

Theorem 8.9. (Hille-Yosida) There is a one-to-one correspondence between Markov generators and Markov semigroups on C(Ω), given by Definition 8.7 and

S(t)f := lim_{n→∞} (Id − (t/n) L)^{−n} f  ∀ f ∈ C(Ω), t ≥ 0 .

Furthermore, for f ∈ D_L we have S(t)f ∈ D_L and

(d/dt) S(t)f = S(t)Lf = L S(t)f ,

called the forward and backward equations respectively.
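On a finite state space the generator is just a Q-matrix and S(t) = e^{tL}, so the Hille-Yosida resolvent formula can be checked directly. The following Python sketch (our own illustration; the 3-state rates are arbitrary choices) compares (Id − (t/n)L)^{−n} with the exponential series and verifies that S(t)L = LS(t).

    # Hille-Yosida on a finite state space: L is a Q-matrix (rows sum to 0)
    # and S(t) = e^{tL}; the resolvent approximation converges to it.
    import math
    import numpy as np

    L = np.array([[-1.0,  1.0,  0.0],
                  [ 0.5, -1.5,  1.0],
                  [ 0.0,  2.0, -2.0]])      # L1 = 0: conservation of probability
    t, I = 0.7, np.eye(3)

    # S(t) via the exponential series (safe for a 3x3 matrix)
    S = sum(np.linalg.matrix_power(t * L, k) / math.factorial(k) for k in range(40))
    for n in (1, 10, 100, 1000):
        approx = np.linalg.matrix_power(np.linalg.inv(I - (t / n) * L), n)
        print(n, np.abs(approx - S).max())  # error shrinks as n grows
    print(np.allclose(S @ L, L @ S))        # d/dt S(t) = S(t)L = L S(t)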

Remark 8.10. We see that the forward equation corresponds, in words, to differentiating with respect to a small change in time at the end of the path, whereas the backward equation corresponds to a derivative in the initial time. Formally,

S(t)Lf(η) = S(t)g(η) , where g(η) = (d/ds) E_η[f(η_s)] |_{s=0} , so that S(t)Lf(η) = E_η[ (d/ds) E_{η_t}[f(η_s)] |_{s=0} ] ,

and

L S(t)f(η) = Lh(η) , where h(η) = E_η[f(η_t)] , so that L S(t)f(η) = (d/ds) E_η[ E_{η_s}[f(η_t)] ] |_{s=0} .


Remark 8.11. It may appear that the forward and backward equations are one and the same. In general this is not the case. While L and S(t) formally commute, the domains of definition of the operators are not necessarily identical. The difference between the forward and backward equations becomes significant, for instance, when dealing with certain boundary conditions where there is instantaneous return from boundary points (or points at infinity) to another state. However, if the generator L on a countable state space has the property that the absolute values of the diagonal entries satisfy a uniform bound, then the forward and backward equations have the same solution. In general the backward equation has more solutions than the forward equation, and its minimal solution is also the solution of the forward equation. A full discussion of such matters demands more technical analysis which is not relevant for the rest of this course.

Remark 8.12. The Hille-Yosida Theorem is usually used as follows. The process is first described in terms of the jump rates, as we did for the examples. These are used to define a generator, formally written

Lf(η) = Σ_{η′ ≠ η} c(η, η′) (f(η′) − f(η)) .   (8.1.7)

Note that the sum on the right hand side doesn't technically make sense for uncountable state space. For the processes we have introduced it makes sense for sufficiently smooth functions, specifically for f ∈ D(Ω) = {f ∈ C(Ω) : Σ_{x∈Λ} sup_{η∈Ω} |f(η^x) − f(η)| < ∞}. Given reasonable conditions on the jump rates, the closure of the operator L on D(Ω) is a generator on C(Ω) (in this case D(Ω) is called a core of the generator). For more details see [3, Chapter 1] or [2, Chapter 3 and 4].

An interacting particle system on Ω = {0, 1}^Λ is called a reaction-diffusion process if the only possible transitions are those of the form η → η^x or η → η^{x,y} for some x, y ∈ Λ.

Definition 8.13. The jump rates of a reaction-diffusion process on {0, 1}^Λ are said to be of finite range R > 0 if for all x ∈ Λ there exists a finite ∆_x ⊆ Λ with |∆_x| ≤ R such that

c(η^z, (η^z)^x) = c(η, η^x) and c(η^z, (η^z)^{x,y}) = c(η, η^{x,y})  ∀ η ∈ Ω and z ∉ ∆_x ,

and furthermore

c(η, η^{x,y}) = 0 whenever |x − y| > R .

That is, the rate at which the state changes at a site x only depends on the configuration in at most R lattice sites, and a particle can move to at most R sites.

Proposition 8.14. For a finite range reaction-diffusion process we have ‖Lf‖_∞ < ∞ for each f ∈ D(Ω), and the Markov generator L is uniquely determined by its action on D(Ω); furthermore

Lf(η) = Σ_{x∈Λ} c(η, η^x) (f(η^x) − f(η)) + Σ_{x,y∈Λ} c(η, η^{x,y}) (f(η^{x,y}) − f(η))  ∀ f ∈ D(Ω) .

Proof. The proof that ‖Lf‖_∞ < ∞ for all f ∈ D(Ω) is left as an exercise. The second part is more technical, see for example [3, Theorem 3.9].

Definition 8.15. A measure π on Ω is called stationary (or invariant) if πS(t) = π for all t ≥ 0. We denote the set of all stationary probability measures by I.

Proposition 8.16. Suppose D(Ω) is a core for the generator L of a Markov semigroup S(t). Then

π ∈ I ⟺ π(Lf) = ∫_Ω Lf(η) π(dη) = 0  ∀ f ∈ D(Ω) .

Theorem 8.17. For every Feller process on compact Ω:


1. I is non-empty, compact and convex,

2. If the weak limit π = lim_{t→∞} µS(t) exists for some probability measure µ, then π ∈ I.

Proof. Convexity of I follows from the following two basic facts:

a) A convex combination of probability measures is again a probability measure: if µ_1, µ_2 ∈ P(Ω) then

ν = λµ_1 + (1 − λ)µ_2 ∈ P(Ω)  ∀ λ ∈ [0, 1] .

b) The stationarity condition is linear: if µ_1, µ_2 ∈ I then ν ∈ I, since

ν(Lf) = λµ_1(Lf) + (1 − λ)µ_2(Lf) = 0  ∀ f ∈ D_L .

So if µ_1, µ_2 ∈ I then by the two properties above ν := λµ_1 + (1 − λ)µ_2 ∈ I.

If µ_n ∈ I is a sequence satisfying µ_n → µ weakly, then µ_n(Lf) = 0 for all n ∈ N and Lf ∈ C(Ω), hence µ(Lf) = lim_{n→∞} µ_n(Lf) = 0. Therefore I is a closed subset of a compact space, and hence compact.

The proof of part 2. is omitted.

Definition 8.18. A Feller process is called ergodic if

1. I = {π} is a singleton, and

2. lim_{t→∞} µS(t) = π for every probability measure µ on Ω.

8.2 Contact process and graphical construction

We may think of the contact process as a simple model for the spread of infection on a countable graph G = (Λ, E). The state space is again Ω = {0, 1}^Λ with the usual topological and measurable structure. We think of a site x ∈ Λ being in state 0 as 'healthy' and 1 as infected. We define a Markov process in which each infected site recovers independently at rate 1 and infects its neighbours (independently) at rate λ while it is infected, i.e. for η ∈ Ω

η → η^x at rate:  1 if η(x) = 1 (recovery);  λ Σ_{y∼x} η(y) if η(x) = 0 (infection).

Note that in this model the number of particles (i.e. infections) is not conserved, and η ≡ 0 is absorbing, so if G is finite the process is a.s. absorbed at 0. We can follow the construction described very briefly in the previous section with

Lf(η) = Σ_{z∈Λ} ( η(z) + λ (1 − η(z)) Σ_{y∼z} η(y) ) ( f(η^z) − f(η) )  ∀ f ∈ D(Ω) .

However, it turns out that there is an equivalent definition on a much larger probability space which has a lot of useful features, namely the graphical construction. The rigorous proof that the graphical construction gives rise to the same process goes back to Harris in the late 70's.

For each site x ∈ Λ we draw a vertical 'time line' [0,∞), and on each line {x} × [0,∞) we place a Poisson point process R^x (equivalently the arrival times of a Poisson process (R^x_t)_{t≥0}) with intensity 1, independently in x, which we associate with possible recovery events. To each pair x, y ∈ Λ such that x ∼ y we associate an independent Poisson point process B^{x,y} with intensity


Figure 8.1: Contact process on {0, 1}^Z — graphical construction started from two (ordered) initial conditions.

λ, which we associate with the possible passing of infection from x to y (see Figure 8.1). This defines the probability space and measure P_λ.

The contact process is constructed on the space above in terms of (directed) paths. We say there exists a directed path from (x, s) to (y, t) if there is a path which is increasing in time, does not cross any recovery events, and may follow the infection arrows. More precisely, there exists a directed path from (x, s) to (y, t) if there exist (x, s) = (x_0, t_0), (x_0, t_1), (x_1, t_1), (x_1, t_2), . . . , (x_n, t_{n+1}) = (y, t) with t_0 ≤ t_1 ≤ . . . ≤ t_{n+1} such that:

1. each interval {x_i} × [t_i, t_{i+1}] contains no points of R^{x_i},

2. t_i ∈ B^{x_{i−1}, x_i} for i = 1, 2, . . . , n.

If such a path exists we say that an infection at x at time s can give rise to an infection at y at time t.

For an initial infection η_0 ∈ Ω we define η_t ∈ Ω, for t ∈ [0,∞), by η_t(y) = 1 if and only if there exists an x ∈ Λ such that η_0(x) = 1 and there is a directed path from (x, 0) to (y, t). Then (η_t)_{t≥0} is a contact process.
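On a finite piece of the lattice the graphical construction translates directly into a simulation: sample the recovery marks and infection arrows up to time T, then scan the events forwards in time. The sketch below is our own illustrative Python code (the segment {0, . . . , N − 1} with free boundary, and all names, are our choices, not part of the notes). Running two ordered initial conditions against the same realization of the marks exhibits the monotonicity discussed in the next subsection.

    # Contact process on {0, ..., N-1} via the graphical construction:
    # rate-1 recovery marks on each time line, rate-lambda arrows x -> y
    # between neighbours, processed in increasing time order.
    import random

    def contact_process(eta0, lam, T, rng):
        N, events = len(eta0), []
        for x in range(N):
            t = rng.expovariate(1.0)                 # recovery marks R^x
            while t < T:
                events.append((t, "recover", x, x))
                t += rng.expovariate(1.0)
            for y in (x - 1, x + 1):                 # arrow processes B^{x,y}
                if 0 <= y < N:
                    t = rng.expovariate(lam)
                    while t < T:
                        events.append((t, "arrow", x, y))
                        t += rng.expovariate(lam)
        eta = list(eta0)
        for t, kind, x, y in sorted(events):
            if kind == "recover":
                eta[x] = 0
            elif eta[x] == 1:                        # arrow passes infection
                eta[y] = 1
        return eta

    rng = random.Random(0)
    print(sum(contact_process([1] * 50, lam=2.0, T=5.0, rng=rng)), "infected at time T")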

8.2.1 Monotonicity

The contact process belongs to an important class of IPS called spin systems. We call a Feller process a spin system if Ω = {0, 1}^Λ or {−1, 1}^Λ, i.e. the local state space has cardinality 2, and the dynamics are such that at most one site can change state at any time (single 'spin-flips').

Definition 8.19 (Stochastic monotonicity). A Feller process (η_t)_{t≥0} with semigroup S(t) is called monotone if it preserves stochastic order, i.e. if f ∈ C(Ω) is an increasing function then S(t)f is an increasing function for all t ≥ 0. Equivalently, µ ≤ ν implies µS(t) ≤ νS(t) for all t ≥ 0.

It turns out, and we will prove on example sheet 3, that for spin systems this property can be identified in a rather simple way in terms of the spin-flip rates. In particular we call the spin system attractive if the presence of more 1s in a configuration cannot decrease the rate of flipping to 1, i.e. the local configuration is 'attracted' by the state around it. More precisely:


Definition 8.20 (Attractive). A spin system is called attractive if and only if η ≤ σ implies

c(η, η^x) ≤ c(σ, σ^x) if η(x) = σ(x) = 0 ,
c(η, η^x) ≥ c(σ, σ^x) if η(x) = σ(x) = 1 .

It turns out that for spin systems these two concepts are equivalent.

Theorem 8.21. A spin system is attractive if and only if it is monotone.

Proof. See Sheet 3.

Lemma 8.22. The contact process is monotone.

Proof. The fact that the contact process is monotone follows immediately from the graphical construction.

In fact the graphical construction immediately gives rise to an even stronger property.

Lemma 8.23. The graphical construction is additive: using the usual bijection between Ω and the power set 2^Λ given by identifying a configuration with the subset of Λ of occupied sites (i.e. we identify η ∈ Ω with {x ∈ Λ : η(x) = 1} ⊂ Λ), if η_0 = A we write (η^A_t)_{t≥0}; then η^{A∪B}_t = η^A_t ∪ η^B_t.

There is a further monotonicity present in the contact process which is evident in the graphical construction, in particular monotonicity in λ.

Lemma 8.24. Let (η_t)_{t≥0}, (σ_t)_{t≥0} be two contact processes with semigroups S_{λ_1} and S_{λ_2}, and infection rates λ_1 ≤ λ_2 respectively. Then

µ_{λ_1} ≤ µ_{λ_2} ⟹ µ_{λ_1} S_{λ_1}(t) ≤ µ_{λ_2} S_{λ_2}(t)  ∀ t ≥ 0 .

Proof. We provide a Markov coupling of the two processes such that if η_0 ≤ σ_0 then η_t ≤ σ_t for all t ≥ 0 with probability one. The result then follows from Strassen's theorem (Theorem ??). The coupling is given by the graphical construction with infection rate λ_2; to construct the contact process with the smaller infection rate λ_1 we thin the infection processes (B^{x,y}), independently retaining each arrow with probability λ_1/λ_2 (i.e. removing it with probability 1 − λ_1/λ_2). It is clear that this gives a coupling such that if η_0 = σ_0 then η_t ≤ σ_t for all t ≥ 0. The result follows by stochastic monotonicity of the process.

8.2.2 Duality

Another extremely useful tool for studying interacting particle systems is duality. This property allows us to examine the probability of certain key events under one process by relating them to events under a hopefully simpler Markov process.

Definition 8.25. Two Markov processes (η_t)_{t≥0} on Ω and (ζ_t)_{t≥0} on Ω̃, with path measures P_η and P̃_ζ respectively, are said to be dual with respect to a duality function H : Ω × Ω̃ → R if H is non-negative and continuous, and

E_η[H(η_t, ζ)] = Ẽ_ζ[H(η, ζ_t)] ,

equivalently

S(t)H(·, ζ)(η) = S̃(t)H(η, ·)(ζ)  ∀ η ∈ Ω, ζ ∈ Ω̃ .   (8.2.1)

If Ω̃ = Ω and P̃_η = P_η then (η_t)_{t≥0} is called self-dual.

The following result is extremely useful for checking, and recognising, duality conditions.

Page 51: Part C: Interacting Particle Systemschleboun/PartC-IPSnotes-2018.pdf · observed phenomena of interest is a result of relatively simple interactions. We only want to model normal

8.2. CONTACT PROCESS AND GRAPHICAL CONSTRUCTION 50

Proposition 8.26. In the context of the previous definition, if (η_t)_{t≥0} (respectively (ζ_t)_{t≥0}) has Markov generator L (respectively L̃), then the processes are dual if and only if

LH(·, ζ)(η) = L̃H(η, ·)(ζ)  ∀ η ∈ Ω, ζ ∈ Ω̃ ,

provided that LH(·, ζ) and L̃H(η, ·) are well defined.

Proof. Omitted, see Liggett Interacting Particle Systems.

Example 8.27. Birth-death process on N = {0, 1, . . .} with a trap at 0:

n → n + 1 at rate β(n) > 0  ∀ n ≥ 1, β(0) = 0 ,
n → n − 1 at rate δ(n) > 0  ∀ n ≥ 1, δ(0) = 0 .

Let H(η, ζ) = 1{η ≤ ζ}; we shall see that this defines a duality function and that the dual process is 'simpler' in the sense that it is irreducible (no absorbing state).

LH(·, ζ)(η) = β(η)[H(η + 1, ζ) − H(η, ζ)] + δ(η)[H(η − 1, ζ) − H(η, ζ)] ,

where the first bracket equals −1 if η = ζ and 0 otherwise, and the second equals 1 if η = ζ + 1 and 0 otherwise; hence

LH(·, ζ)(η) = β(ζ)[H(η, ζ − 1) − H(η, ζ)] + δ(ζ + 1)[H(η, ζ + 1) − H(η, ζ)] =: L̃H(η, ·)(ζ) ,

where

L̃f(ζ) = β(ζ)(f(ζ − 1) − f(ζ)) + δ(ζ + 1)(f(ζ + 1) − f(ζ))

is the Markov generator of a process for which n → n + 1 at rate δ(n + 1) and n → n − 1 at rate β(n). Notice that the chain described by L̃ is irreducible, and the roles of β and δ are reversed. Now in terms of the Markov semigroup

S(t)H(·, ζ)(η) = S(t)1{· ≤ ζ}(η) = E_η[1{η_t ≤ ζ}] = P_η(η_t ≤ ζ) = S̃(t)1{η ≤ ·}(ζ) = P̃_ζ(ζ_t ≥ η) .

So taking the limit t → ∞ we see that η_t has a positive probability of escaping to ∞ if and only if ζ_t is positive recurrent, and in this case P_η(η_t is absorbed at 0) = Σ_{ζ ≥ η} π(ζ), where π is the stationary distribution of (ζ_t)_{t≥0}.
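The generator identity above is easy to sanity-check numerically. The following Python sketch (our own illustration; the constant rates and the truncation level M are arbitrary choices) builds L and L̃ as Q-matrices on the truncated state space {0, . . . , M} and verifies LH(·, ζ)(η) = L̃H(η, ·)(ζ) away from the cutoff.

    # Finite truncation check of the birth-death duality: with
    # H(eta, zeta) = 1{eta <= zeta}, the matrices L H and H L~^T agree
    # on states below the cutoff M.
    import numpy as np

    M = 30
    beta = lambda n: 2.0 if n >= 1 else 0.0     # illustrative rates
    delta = lambda n: 1.5 if n >= 1 else 0.0

    def q_matrix(up, down):
        Q = np.zeros((M + 1, M + 1))
        for n in range(M + 1):
            if n < M:
                Q[n, n + 1] = up(n)
            if n > 0:
                Q[n, n - 1] = down(n)
            Q[n, n] = -Q[n].sum()
        return Q

    L = q_matrix(beta, delta)                    # original chain, trap at 0
    Ld = q_matrix(lambda n: delta(n + 1), beta)  # dual chain, irreducible
    H = np.array([[1.0 if e <= z else 0.0 for z in range(M + 1)]
                  for e in range(M + 1)])

    lhs = L @ H         # (L H(., zeta))(eta)
    rhs = H @ Ld.T      # (L~ H(eta, .))(zeta)
    print(np.abs(lhs[:M, :M] - rhs[:M, :M]).max())   # exactly 0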

Theorem 8.28. The contact process is self-dual (with a contact process with finitely many infected sites) with respect to the duality function

H(η, A) = Π_{x∈A} (1 − η(x)) = 1(η ≡ 0 on A) , for η ∈ Ω and A ⊂ Λ finite,

where (as usual) we identify subsets of Λ with configurations in Ω. That is, for A, B ⊂ Λ,

P_A(η^A_t ∩ B ≠ ∅) = P_B(η^B_t ∩ A ≠ ∅) ,

where (η^A_t)_{t≥0} denotes the process started from infection in A ⊂ Λ (there is some obvious redundancy in the notation P_A at this point).

Proof. Using the graphical construction we see that the event {η^A_t ∩ B ≠ ∅} is the union over all a ∈ A and b ∈ B of the event that there is a directed path from (a, 0) to (b, t). If we reverse the direction of time and all the arrows in the graphical construction, the probability measure induced on path space started from time t and run back to 0 is the same as for the process started from time zero and run forwards. Given these two observations, the event {η^A_t ∩ B ≠ ∅} occurs with the same probability as {η^B_t ∩ A ≠ ∅}.

As an alternative proof, you could mirror the 'style' of the calculation in the previous example.


8.2.3 Stationary measures and percolation

To simplify the presentation we will now focus on the contact process on the Z^d lattice. We would like to characterise the stationary measures of the process; by Theorem 8.17 it is sufficient to find all the extremal measures. Let δ_0 be the probability measure that puts unit mass on the configuration "everyone is healthy", i.e. δ_0(η ≡ 0) = 1, and δ_1 is defined by δ_1(η ≡ 1) = 1. Clearly δ_0 is invariant (since no infection can "appear out of nowhere"), and δ_0 ≤ µ for every probability measure µ on Ω; δ_0 is called the minimal (invariant) measure.

The maximal (invariant) measure is constructed as the weak limit of the contact process started from "everyone infected".

Proposition 8.29. We have δ_1 S(t) ≤ δ_1 S(s) for all 0 ≤ s ≤ t, and hence the weak limit ν̄_λ = lim_{t→∞} δ_1 S(t) exists; furthermore ν̄_λ ∈ I_e. We call ν̄_λ the upper invariant measure.

Proof Sketch. µ_r := δ_1 S(r) ≤ δ_1 by construction. Let t = s + r; then

µ_t = δ_1 S(r) S(s) = µ_r S(s) ≤ δ_1 S(s) = µ_s ,

where the inequality follows from stochastic monotonicity. We now use the fact that the set of probability measures over Ω is relatively compact (see Section 4.3.1). The fact that ν̄_λ ∈ I_e follows from the next proposition.

Proposition 8.30. We have δ_0 ≤ ν ≤ ν̄_λ for all ν ∈ I.

Proof. Apply stochastic monotonicity and take weak limits.

To complete the proof of the previous proposition, suppose ν̄_λ = αµ_1 + (1 − α)µ_2 for µ_1, µ_2 ∈ I and α ∈ (0, 1). Then for any increasing function f we have, by Proposition 8.30, µ_1(f) ≤ ν̄_λ(f) and µ_2(f) ≤ ν̄_λ(f). Together with ν̄_λ(f) = αµ_1(f) + (1 − α)µ_2(f) these imply ν̄_λ = µ_1 = µ_2, as required. Since δ_1 is clearly translation invariant, and ν̄_λ is defined as the weak limit of δ_1 S(t) as t → ∞, we also have that ν̄_λ is translation invariant.

As a consequence of the results above, the contact process is ergodic if and only if ν̄_λ = δ_0. Trying to identify when this is the case reduces to a percolation-type problem. In analogy with the percolation probability we define

θ(λ) = P_0(η^0_t ≠ ∅ for all t ≥ 0) ;   (8.2.2)

this is exactly the probability that the origin (0, 0) is 'connected to ∞' by a directed path in the graphical construction.

Proposition 8.31. The density of infected sites under ν̄_λ is equal to θ(λ), i.e.

θ(λ) = ν̄_λ(η(x) = 1) , x ∈ Z^d ,

which is clearly independent of x.

Proof. We use the duality result, Theorem 8.28, and translation invariance. Clearly {η^0_t ≠ ∅} ⊆ {η^0_s ≠ ∅} for t ≥ s, hence

θ(λ) = lim_{t→∞} P_0(η^0_t ≠ ∅) .

Now applying the self-duality, Theorem 8.28, with B = Z^d we have

P_0(η^0_t ≠ ∅) = P_{δ_1}(η_t(0) = 1) ,

and by construction

P_{δ_1}(η_t(0) = 1) = δ_1 S(t)(η(0) = 1) → ν̄_λ(η(0) = 1) as t → ∞ ;

the claim follows by translation invariance of ν̄_λ.


By the monotonicity of the contact process in the infection rate, Lemma 8.24, θ(λ) must be non-decreasing in λ. We define the critical value of λ by

λ_c = λ_c(d) = sup{λ ≥ 0 : θ(λ) = 0} .

Then (c.f. percolation)

θ(λ) = 0 if λ < λ_c ,  θ(λ) > 0 if λ > λ_c ,

and by Proposition 8.31,

ν̄_λ = δ_0 (ergodic) if λ < λ_c ,  ν̄_λ ≠ δ_0 (not ergodic) if λ > λ_c .

In fact the connections between the contact process and percolation go much deeper than we have time to go into here; see for example Grimmett Probability on Graphs and Liggett Interacting Particle Systems (the chapter on the contact process) for an introduction to this area and further references.

8.3 Exclusion process

The exclusion process is a prototypical IPS in which the particle number is locally conserved. Roughly speaking, particles perform independent random walks until they meet each other, and then they interact simply by exclusion (as the name suggests), meaning that at most one particle can occupy any site. The simplest forms of this process have been studied intensely from a theoretical point of view and have given rise to a whole host of rich and beautiful results, also with deep connections to many other areas of probability theory. Many variants have also been studied and applied to modelling systems in biology and physics.

Again, let Ω = {0, 1}^Λ and for η ∈ Ω we say there is a particle at site x ∈ Λ if η(x) = 1, and otherwise the site is empty. Fix p(x, y), a transition matrix of a (discrete time) irreducible random walk on Λ such that

sup_y Σ_{x∈Λ} p(x, y) < ∞ ;

this technical condition is just a simple way of ensuring we construct a Feller process. Then η → η^{x,y} at rate p(x, y) if η(x) = 1 and η(y) = 0, where

η^{x,y}(z) = η(z) if z ∉ {x, y} ;  1 − η(z) if z ∈ {x, y} .

Correspondingly, the generator of the process is given by

Lf(η) = Σ_{x,y∈Λ} p(x, y) η(x)(1 − η(y)) ( f(η^{x,y}) − f(η) ) , for f ∈ D_L .   (8.3.1)

Here we may take the core D_L of the generator to be the set of functions which depend on only finitely many sites (cylinder functions).

Remark 8.32. Unlike the 'spin-flip' dynamics we saw in the contact process, in the exclusion process the particle number is conserved, i.e. particles move but are never created or destroyed.

Our main aim in this section will again be to characterise the stationary measures of the process. With this in mind we will find the following definition rather useful.


Definition 8.33. For a given density profile ρ : Λ → [0, 1] the probability measure ν_ρ is called the product measure on Ω if for all k ∈ N and x_1, . . . , x_k ∈ Λ, mutually different, and n_1, . . . , n_k ∈ {0, 1},

ν_ρ[η(x_1) = n_1, η(x_2) = n_2, . . . , η(x_k) = n_k] = Π_{i=1}^k ν_ρ[η(x_i) = n_i] = Π_{i=1}^k ρ(x_i)^{n_i} (1 − ρ(x_i))^{1−n_i} .

With a slight abuse of notation, if ρ ∈ [0, 1] is a constant then by ν_ρ we mean ν_φ with φ(x) = ρ for all x ∈ Λ.

Observe that the above simply says that under ν_ρ we have η(x) ∼ Ber(ρ(x)) independently in x ∈ Λ.

Theorem 8.34. (a) Suppose p(x, y) is doubly stochastic, i.e.

Σ_{y′∈Λ} p(x, y′) = Σ_{x′∈Λ} p(x′, y) = 1  ∀ x, y ∈ Λ ;

then ν_ρ ∈ I for all constant ρ ∈ [0, 1] (uniform density stationary measures).

(b) If λ(x) > 0 and λ(x)p(x, y) = λ(y)p(y, x) for all x, y ∈ Λ (i.e. p(x, y) is reversible), then

ν_ρ ∈ I where ρ(x) = λ(x)/(1 + λ(x))  ∀ x ∈ Λ .

Proof. By Proposition 8.16, we need to check

ν_ρ(Lf) = 0 , for all f ∈ D_L .

By linearity of this condition it is sufficient to check that ν_ρ(Lf_∆) = 0 for the simple functions

f_∆(η) = 1 if η(x) = 1 ∀ x ∈ ∆ ;  0 otherwise ,

where ∆ ⊂ Λ is finite. The rest of the proof is a calculation:

ν_ρ(Lf_∆) = Σ_{x,y∈Λ} p(x, y) ∫_Ω η(x)(1 − η(y)) ( f_∆(η^{x,y}) − f_∆(η) ) dν_ρ .   (8.3.2)

For x ≠ y (we take p(x, x) = 0 for all x ∈ Λ) the integral terms in the sum look like

∫_Ω f_∆(η) η(x)(1 − η(y)) dν_ρ = 0 if y ∈ ∆ ;  (1 − ρ(y)) Π_{u∈∆∪{x}} ρ(u) if y ∉ ∆ ,

∫_Ω f_∆(η^{x,y}) η(x)(1 − η(y)) dν_ρ = 0 if x ∈ ∆ ;  (1 − ρ(y)) Π_{u∈(∆∪{x})\{y}} ρ(u) if x ∉ ∆ .   (8.3.3)

This follows from the fact that the integrands take values only in {0, 1}, so the right-hand side is the probability of the integrand being 1. Re-arranging the sum we get

ν_ρ(Lf_∆) = Σ_{x∈∆, y∉∆} [ ρ(y)(1 − ρ(x)) p(y, x) − ρ(x)(1 − ρ(y)) p(x, y) ] Π_{u∈∆\{x}} ρ(u) .   (8.3.4)


The assumption of (b) is equivalent to

(ρ(x)/(1 − ρ(x))) p(x, y) = (ρ(y)/(1 − ρ(y))) p(y, x) ,   (8.3.5)

so the square bracket vanishes for all x, y in the sum (8.3.4). For ρ(x) ≡ ρ in (a) we get

ν_ρ(Lf_∆) = ρ^{|∆|}(1 − ρ) Σ_{x∈∆, y∉∆} [ p(y, x) − p(x, y) ] = 0 ,   (8.3.6)

due to p(·, ·) being doubly stochastic.
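For a small finite system the calculation above can be verified by brute force. The following Python sketch (our own illustration; the ring size, jump rates and density are arbitrary choices) builds the full generator of an asymmetric exclusion process on the ring Z_6, whose translation-invariant kernel is doubly stochastic, and checks that the homogeneous product measure ν_ρ is stationary.

    # Brute-force check of Theorem 8.34(a) on the ring Z_6: nu_rho L = 0
    # for the product Bernoulli(rho) measure when p(., .) is doubly stochastic.
    import itertools
    import numpy as np

    N, pr, ql, rho = 6, 0.7, 0.3, 0.4            # p(x, x+1) = pr, p(x, x-1) = ql
    states = list(itertools.product((0, 1), repeat=N))
    index = {s: i for i, s in enumerate(states)}

    Q = np.zeros((len(states), len(states)))
    for s in states:
        i = index[s]
        for x in range(N):
            for y, rate in (((x + 1) % N, pr), ((x - 1) % N, ql)):
                if s[x] == 1 and s[y] == 0:      # particle at x jumps to y
                    t = list(s); t[x], t[y] = 0, 1
                    Q[i, index[tuple(t)]] += rate
                    Q[i, i] -= rate

    nu = np.array([rho ** sum(s) * (1 - rho) ** (N - sum(s)) for s in states])
    print(np.abs(nu @ Q).max())                  # ~ 1e-16: stationarity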

We have found sufficient conditions for stationary measures; in fact they turn out to be close to necessary. In the next two sections we focus on two different special cases.

8.3.1 ASEP and blocking measures

Now we consider an asymmetric simple exclusion process on Λ = Z (ASEP) with p(x, x + 1) = p and p(x, x − 1) = q, where p > q. The underlying RW is transient, but λ(x) = c(p/q)^x satisfies the detailed balance condition (it is certainly not a stationary probability measure for the RW since it cannot be normalized). For c > 0 it blows up exponentially as x → ∞. By Theorem 8.34 we know ν_ρ is stationary for this ASEP, where

ρ(x) = c(p/q)^x / (1 + c(p/q)^x) = (p/q)^{x−x_0} / (1 + (p/q)^{x−x_0}) ,

and x_0 = log_{p/q}(1/c). See picture from lectures. Note, by the Borel-Cantelli Lemma, ν_ρ concentrates on configurations with only finitely many particles on the left and only finitely many vacancies on the right, i.e.

ν_ρ[ η ∈ Ω : Σ_{x<0} η(x) < ∞, Σ_{x≥0} (1 − η(x)) < ∞ ] = 1 .

These are called blocking measures (there is a blockage, in this case on the right), and it turns out that for p ≠ q they are reversible. Let

Θ^{x_0} = { η ∈ Θ : Σ_{x<x_0} η(x) = Σ_{x≥x_0} (1 − η(x)) } ,

where Θ = {η ∈ Ω : Σ_{x<0} η(x) < ∞, Σ_{x≥0} (1 − η(x)) < ∞}. Then the process starting from a point η_0 ∈ Θ^{x_0} remains in Θ^{x_0} for all times. This holds since, when a particle crosses the bond (x_0 − 1, x_0), a hole simultaneously goes the other way. So (η_t)_{t≥0} is an irreducible, countable state Markov chain on Θ^{x_0}, which is positive recurrent since π^{x_0}[·] = ν_ρ[· | Θ^{x_0}] is stationary (note π^{x_0} is independent of ρ). T. M. Liggett proved that if p ≠ q then I_e = {ν_ρ : ρ ∈ [0, 1]} ∪ {π^n : n ∈ Z}.

8.3.2 SSEP stationary measure, coupling and duality

We again focus on Λ = Z, but now we assume the underlying RW is symmetric, i.e. p(x, y) = p(0, y − x) = p(y, x); the resulting process is called the symmetric simple exclusion process (SSEP).

Theorem 8.35. For the SSEP on Λ = Z, with p(x, y) = p(0, y − x) = p(y, x) the transition probabilities of a symmetric irreducible discrete time random walk on Z:


(a) I_e = {ν_ρ : 0 ≤ ρ ≤ 1}.

(b) If µ is translation invariant and shift-ergodic, then lim_{t→∞} µS(t) = ν_ρ where ρ = µ{η ∈ Ω : η(0) = 1} is the initial density.

The proof of this result will take up the remainder of this section; we will use coupling (stochastic monotonicity), self-duality, and de Finetti's Theorem.

Proposition 8.36. The SSEP on Ω is self-dual with a finite exclusion process. More precisely, for η ∈ Ω and A ⊂ Λ finite define

H(η, A) = Π_{x∈A} η(x) = 1{η ≡ 1 on A} ;

then

S(t)H(·, A)(η) = S̃(t)H(η, ·)(A) for all η ∈ Ω and A ⊂ Λ finite.

Here again we identify subsets of Λ with elements of Ω.

Proof. We extend our notation for configurations after a jump from x to y to finite sets as follows: for a finite A ⊂ Λ let

A^{x,y} = A if x, y ∈ A or x, y ∉ A ;  (A \ {x}) ∪ {y} if x ∈ A and y ∉ A ;  (A ∪ {x}) \ {y} if x ∉ A and y ∈ A .

Now observe that

H(η^{x,y}, A) = H(η, A^{x,y}) ;

please check this calculation yourself. We now examine how the generator of the symmetric exclusion process acts on H as a function of its first variable (and mirror the calculation we did in the birth-death chain example),

LH(·, A)(η) = Σ_{x,y∈Λ} p(x, y) η(x)(1 − η(y)) ( H(η^{x,y}, A) − H(η, A) )
= Σ_{x,y∈Λ} p(x, y) η(x)(1 − η(y)) ( H(η, A^{x,y}) − H(η, A) ) .

There are only two cases in the sum above for which the argument is non-zero: either x ∈ A and y is not, or vice versa. So, also applying the obvious identities η(y)(1 − η(y)) = 0 and η(x)η(x) = η(x),

LH(·, A)(η) = Σ_{x∉A, y∈A} p(x, y)(1 − η(y)) H(η, A^{x,y}) − Σ_{x∈A, y∉A} p(x, y)(1 − η(y)) H(η, A)
= Σ_{x∈A, y∉A} [ p(y, x)(1 − η(x)) H(η, A^{x,y}) − p(x, y)(1 − η(y)) H(η, A) ]
= Σ_{x∈A, y∉A} [ p(y, x) H(η, A^{x,y}) − p(x, y) H(η, A) + (p(x, y) − p(y, x)) H(η, A ∪ {y}) ] .

Now since p(x, y) = p(y, x) we have

LH(·, A)(η) = Σ_{x∈A, y∉A} p(x, y) ( H(η, A^{x,y}) − H(η, A) )
= Σ_{x,y∈Λ} p(x, y) 1_A(x)(1 − 1_A(y)) ( H(η, A^{x,y}) − H(η, A) ) =: L̃H(η, ·)(A) ,


where L̃ is the generator of a process for which

A → A^{x,y} at rate p(x, y) 1_A(x)(1 − 1_A(y)) .

This is just an exclusion process with a finite number of particles in different notation. That is, we may associate a finite configuration with the set A via ζ(x) = 1_A(x), or equivalently identify η ∈ Ω such that Σ_x η(x) < ∞ with a finite subset of Λ by A = {x ∈ Λ : η(x) = 1} (see pictures from the lectures).

The self-duality is useful since it allows us to write the probability that a certain finite subset A of the lattice is completely occupied, at time t, by a possibly infinite exclusion process, in terms of the probability that a finite exclusion process, started from A, occupies sites which were occupied by the initial configuration of the process we were originally interested in. Much more succinctly,

S(t)H(·, A)(η) = E_η[1{η_t ≡ 1 on A}] = P_η[η_t ≡ 1 on A] = S̃(t)H(η, ·)(A) = P̃_A[η ≡ 1 on A_t] .

The basic coupling of two copies of the exclusion process is the process (η_t, ζ_t)_{t≥0} with the following transitions, each at rate p(x, y):

(η, ζ) → (η^{x,y}, ζ^{x,y}) if η(x) = ζ(x) = 1 and η(y) = ζ(y) = 0,

(η, ζ) → (η^{x,y}, ζ) if η(x) = 1, η(y) = 0 and (ζ(x) = 0 or ζ(y) = 1),

(η, ζ) → (η, ζ^{x,y}) if ζ(x) = 1, ζ(y) = 0 and (η(x) = 0 or η(y) = 1).

In other words, particles move together whenever possible. We say there is a discrepancy at site x ∈ Λ if η(x) ≠ ζ(x). An important property of the coupling is that, although discrepancies can move and be destroyed, they cannot be created. Isolated discrepancies move according to a continuous time random walk on Λ with rates p(x, y). Also, if η_0 ≤ ζ_0 then η_t ≤ ζ_t for all t ≥ 0 under the coupling, so the process is attractive. In lectures we drew some pictures; these are very helpful here, and may appear in time in these printed notes, but in the meantime it is worth drawing your own. For some important consequences see for example the Liggett notes on Interacting Particle Systems - An introduction [4].

For the proof of Theorem 8.35 we will use a coupling of two symmetric exclusion processes with the same finite number of particles, which initially have two discrepancies (the minimal number). We will use the finite subset representation of configurations described previously. Consider two initial conditions A_0 and B_0 such that they both contain n particles, |A_0| = |B_0| = n, and which agree everywhere except for two particles (again, draw pictures). Write A_t = C_t ∪ {X_t} and B_t = C_t ∪ {Y_t} where C_t = A_t ∩ B_t. The two discrepancies (X_t)_{t≥0} and (Y_t)_{t≥0} perform independent random walks until they meet, at which point they annihilate each other; from this random time onwards the two processes A_t and B_t move together, and they are said to be coupled. We call this random time the coupling time, τ_couple := inf{t ≥ 0 : A_t = B_t} = inf{t ≥ 0 : X_t = Y_t}. For t < τ_couple the basic coupling can be re-written as follows (in terms of the overlap (C_t) and the discrepancies (X_t) and (Y_t)): for u ∈ C and v ∉ C ∪ {x, y},

(C, x, y) →
(C^{u,v}, x, y) at rate p(u, v),
(C, v, y) at rate p(x, v),
(C, x, v) at rate p(y, v),
(C, x, x) at rate p(y, x),
(C, y, y) at rate p(x, y),
(C^{u,x}, u, y) at rate p(u, x),
(C^{u,y}, x, u) at rate p(u, y).


For t ≥ τ_couple the two processes move together, i.e. A_t = B_t. Since we construct this coupling only in the symmetric case p(x, y) = p(y, x), we can check that the marginal processes truly move with the rates we described initially, that is: the pair (X_t, Y_t) has the transition rates of two independent random walks with transition rates p(·, ·) until the first time they meet, at which point they move together, and C_t evolves as a single exclusion process. This is just the basic coupling, but we are careful to keep track of the locations of the two discrepancies, and when they meet they annihilate each other. A small simulation sketch of the basic coupling follows.
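The following Python sketch is our own illustration (the ring, the event loop and all names are our choices): it runs the basic coupling of two SSEP configurations differing by one discrepancy pair, driven by the same Poisson clocks, and checks along the way that discrepancies are never created.

    # Basic coupling of two SSEP copies on a ring: the same clock on each
    # directed bond drives both copies, so they move together when possible.
    import random

    def basic_coupling(A, B, T, rng):
        N = len(A)
        bonds = [(x, (x + d) % N) for x in range(N) for d in (1, -1)]
        t = 0.0
        while t < T and A != B:
            t += rng.expovariate(len(bonds))     # each bond rings at rate 1
            x, y = bonds[rng.randrange(len(bonds))]
            for eta in (A, B):                   # shared randomness
                if eta[x] == 1 and eta[y] == 0:
                    eta[x], eta[y] = 0, 1
            # discrepancies move or annihilate but are never created
            assert sum(a != b for a, b in zip(A, B)) <= 2
        return t if A == B else None

    rng = random.Random(3)
    A = [1, 1, 0, 0, 0, 0, 0, 0]                 # particles at 0, 1
    B = [1, 0, 1, 0, 0, 0, 0, 0]                 # particles at 0, 2
    print("coupling time:", basic_coupling(A, B, T=1000.0, rng=rng))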

We will use this coupling to give a neat proof of a Maximum Principle (c.f. Theorem 2.1) in this particular setting (which is countable, so Theorem 2.1 does not apply directly). We will prove that the finite symmetric exclusion process, restricted to an irreducible component, has no non-constant bounded harmonic functions. Recall that a bounded function f is called harmonic for the process if E_η[f(η_t)] = f(η).

Lemma 8.37. A bounded function f is harmonic for the finite exclusion process, i.e.

Lf(A) = 0  ∀ A ⊂ Λ finite,

if and only if f is constant on {A ⊂ Λ : |A| = n} for all n ≥ 1.

Proof. Since the exclusion process conserves particle number, we immediately observe that any function which is constant on {A ⊂ Λ : |A| = n} for each n ≥ 1 is harmonic.

We will present the converse only in the case that the random walk p(·, ·) is recurrent. Suppose f is a bounded harmonic function and fix A_0 and B_0 such that |A_0| = |B_0| = n. Let m = n − |A_0 ∩ B_0|; since A_0 and B_0 both have cardinality n, there exists a sequence of sets A_0 = A^{(1)}, A^{(2)}, . . . , A^{(m)} = B_0 such that |A^{(i)}| = n and |A^{(i)} ∩ A^{(i+1)}| = n − 1 (i.e. we can reduce to exactly the case we constructed the coupling for above). So

|f(A_0) − f(B_0)| = |E_{A_0}[f(A_t)] − E_{B_0}[f(B_t)]| ≤ m max_{1≤i≤m−1} | E_{A^{(i)}}[f(A^{(i)}_t)] − E_{A^{(i+1)}}[f(A^{(i+1)}_t)] | .

We denote the path measure and expectation with respect to the coupling of two finite exclusion processes started from just two discrepancies, constructed above, by P and E. Now the right hand side above can be written in terms of the coupled process, so

|f(A_0) − f(B_0)| ≤ m max_{1≤i≤m−1} E_{(A^{(i)}, A^{(i+1)})} | f(A^{(i)}_t) − f(A^{(i+1)}_t) | ≤ 2m ‖f‖_∞ max_{1≤i≤m−1} P_{(A^{(i)}, A^{(i+1)})} [ A^{(i)}_t ≠ A^{(i+1)}_t ] .

Since the two discrepancies move as independent random walks with transition rates p(·, ·) until the first time they meet, τ_couple, by the recurrence assumption on p(·, ·) (in particular (X_t − Y_t)_{t≥0} is a random walk with rates 2p(·, ·)) we have

P_{(A^{(i)}, A^{(i+1)})} [τ_couple < ∞] = 1 for i ∈ {1, . . . , m − 1} ,

and so the right hand side above tends to zero as t → ∞. Since the left hand side is independent of t, this implies f(A_0) = f(B_0), and so f is constant on {A ⊂ Λ : |A| = n}.

If the random walk is not recurrent then one can proceed by comparison with independent random walks (for which the result still holds), using a different coupling [4].

To complete the proof of Theorem 8.35 we need one more standard result; in particular we will use de Finetti's Theorem. To this end we have to introduce the concept of exchangeable random variables, which you may have come across before in different settings. For more details see for example [1, Section VII.4].


Definition 8.38. The random variables (η(x))_{x∈Λ} with probability distribution µ are exchangeable if all finite dimensional marginals are the same under finite permutations of the coordinates, i.e. for each n ≥ 1, x_1, . . . , x_n ∈ Λ mutually different and ε_1, . . . , ε_n ∈ S,

µ[η(x_1) = ε_1, . . . , η(x_n) = ε_n] = µ[η(x_{π_1}) = ε_1, . . . , η(x_{π_n}) = ε_n]

for every permutation π ∈ S_n.

Remark 8.39. On Ω = {0, 1}^Λ the measure µ is exchangeable if and only if µ[η ∈ Ω : η ≡ 1 on A] depends on A only through its cardinality |A|.

Theorem 8.40 (de Finetti). Every exchangeable measure µ on {0, 1}^Λ is a mixture of homogeneous (flat) product Bernoulli measures, i.e. there exists a measure γ on [0, 1] such that

µ = ∫_0^1 ν_α γ(dα) .

Equivalently,

µ[η(x_1) = 1, . . . , η(x_k) = 1, η(x_{k+1}) = 0, . . . , η(x_n) = 0] = ∫_0^1 α^k (1 − α)^{n−k} γ(dα) .

We are now in a position to complete the proof of Theorem 8.35.

Proof of Theorem 8.35 (a). By de Finetti's theorem it is sufficient to show that a measure is stationary if and only if it is exchangeable. Since linear combinations of functions of the form H(·, A), for A ⊂ Λ finite, are dense in C(Ω), any probability measure on Ω is uniquely determined by its values on sets of the form {η : η ≡ 1 on A} for A ⊂ Λ finite. Fix an initial probability measure µ on Ω, and let µ_t = µS(t); applying the duality (Proposition 8.36) we have

µ_t({η : η ≡ 1 on A}) = ∫_Ω P_η(η_t ≡ 1 on A) µ(dη) = ∫_Ω P̃_A(η ≡ 1 on A_t) µ(dη) = Σ_{B⊂Λ, |B|=|A|} P̃_A(A_t = B) µ({η : η ≡ 1 on B}) .   (8.3.7)

If we assume that µ is exchangeable, then the last term on the right hand side is just µ({η : η ≡ 1 on A}), and so µ is stationary. Now suppose µ is stationary, so µ_t = µ, and let f(A) := µ({η : η ≡ 1 on A}); then (8.3.7) implies

f(A) = µ_t({η : η ≡ 1 on A}) = Σ_{B⊂Λ, |B|=|A|} P̃_A(A_t = B) f(B) = S̃(t)f(A) ,

where the first equality is due to stationarity. So f is harmonic, and by Lemma 8.37 it can only depend on A through its cardinality; therefore µ is exchangeable. This completes the proof.


Chapter 9

Ising, Potts and Random Cluster models

9.1 Gibbs States

The large scale time-stationary, or equilibrium, behaviour of spatial random systems on lattices (or more generally graphs) is described in terms of the corresponding 'infinite volume' probability distributions. In classical equilibrium statistical physics this is made precise in terms of Gibbs states (which are in fact probability measures). These are defined starting from certain potentials, or energy functions, in finite volume, which broadly speaking describe how the total energy can be calculated by summing over interactions between components of the system. The Gibbs measures are then given by maximum entropy measures with fixed average energy. It turns out that the infinite volume Gibbs measures, at certain parameter values, may not be unique; this gives rise to a rigorous way of describing phase transitions in such systems. Also, this non-uniqueness is intimately linked to ergodicity of natural dynamics, for which the Gibbs measures are reversible, and which often turn out to also be physically relevant. This very general approach turns out to be extremely powerful; although the details were worked out for models in condensed matter physics, the applications are now numerous across physics, biology, economics and more. For more details than we will have time to cover in this chapter see Liggett IPS (Chapter IV) and Georgii Gibbs Measures and Phase Transitions.

We take Ω = S^Λ where S is finite, and for now we also assume Λ is finite. To begin we need some definitions. Given (η(x))_{x∈Λ} with distribution µ, we write x ⊥ y if η(x) and η(y) are conditionally independent given (η(z))_{z∈Λ\{x,y}} (clearly ⊥ is symmetric). This property gives rise to the dependency graph G = (Λ, E) where E = {{x, y} : x ⊥̸ y}. A fully connected sub-graph of G is called a clique, and we denote the set of all cliques by K ∋ ∅. A clique K is called maximal if no strict superset of K is in K (i.e. is itself a clique). The set of all maximal cliques is denoted by M.

Theorem 9.1. Let π be a positive probability measure on Ω (i.e. π(η) > 0 for all η ∈ Ω). Then there exists a family of functions f_K : Ω_K → [0,∞), for K ∈ M, such that

π(η) = Π_{K∈M} f_K(η_K) .

Proof. The result follows from Brook’s Lemma, see for example Grimmett Probability on Graphsfor further details.

We now introduce a type of spatial analogue of the Markov property. This very natural property arises often in random geometry, in particular in statistical mechanics and statistics.

Definition 9.2. A positive probability measure π on Ω is called a Markov random field if it satisfies

π(η_W = s_W | η_{Λ\W} = s_{Λ\W}) = π(η_W = s_W | η_{∂W} = s_{∂W}) ,   (9.1.1)

for all s ∈ Ω and W ⊂ Λ, where ∂W = {x ∈ W^c : x ∼ y for some y ∈ W} is the external boundary of W.


It turns out that Markov random fields can be written in terms of potential functions, i.e. as Gibbs states. For simplicity in the presentation we restrict to Ω = {0, 1}^Λ with Λ finite, so that we have the usual bijection between configurations and subsets of Λ.

Definition 9.3. A probability measure π on Ω is called a Gibbs random field if there exists a potential function φ : 2^Λ → R satisfying φ(C) = 0 if C ∉ K and

π(B) = exp( Σ_{K⊆B} φ(K) ) , for B ⊂ Λ ,

i.e. there exists a family of functions f_K : {0, 1}^K → R, K ∈ K, such that

π(σ) = exp( β Σ_{K∈K} f_K(σ_K) ) , for σ ∈ Ω ,

for some β > 0.

The motivation behind Gibbs measures comes from statistical mechanics; in particular the total energy of the system is a random variable E(σ) which is written as a sum over the contributions of each of the components, E(σ) = −Σ_{K∈K} f_K(σ_K). The Gibbs measures are then exactly the maximum (Shannon) entropy measures subject to the constraint that the mean of E is fixed. The parameter β is the Lagrange multiplier (or conjugate parameter) which fixes the mean of E. In the physics context β is called the inverse temperature.

Theorem 9.4. A positive probability measure on Ω is a Markov random field if and only if it is a Gibbs random field. The potential function corresponding to a Markov random field π is given by

φ(K) = Σ_{L⊆K} (−1)^{|K\L|} log π(L) , K ∈ K .

A measure which satisfies Eq. 9.1.1 for all s ∈ Ω and W ⊂ Λ is said to satisfy the global Markov property. It turns out that it is sufficient to check the condition on singleton sets W = {w}. A positive probability measure is said to satisfy the local Markov property if Eq. 9.1.1 holds for all s ∈ Ω and all singleton sets W = {w}, w ∈ Λ. It is clear that the global property implies the local one; however, the converse also holds.

Proposition 9.5. Let π be a positive probability measure on Ω. Then the following three statements are equivalent:

(a) π satisfies the global Markov property.

(b) π satisfies the local Markov property.

(c) For all A ⊆ Λ and any pair x, y ∈ Λ with x ∉ A, y ∈ A and x ≁ y,

π(A ∪ {x}) / π(A) = π((A ∪ {x}) \ {y}) / π(A \ {y}) .   (9.1.2)

Proof. The proof will be a guided exercise on the final problem sheet.

Proof of Theorem 9.4. The Theorem follows from an application of the previous proposition.


9.2 Ising and Potts Models

The Ising model is a surprisingly effective 'toy' model of magnetic materials, and is extremely well studied. By way of motivation, imagine applying a strong external magnetic field to a piece of iron and then turning the field off. At sufficiently low temperature, if T < T_c where T_c is called the Curie temperature, the sample will remain magnetic. However, if the temperature is increased past the critical point, it will lose its magnetisation. This is an example of a phase transition (in analogy with changes of phase, for example as a liquid is heated and becomes a gas), i.e. over a very small region of the parameter T the physical properties of the material change abruptly.

In statistical physics such phase transitions are often described in terms of uniqueness/non-uniqueness transitions of infinite volume Gibbs measures (or states). This is also why phase transitions of the IPS we have seen previously are sometimes described in terms of the set I_e of extremal stationary measures.

Let G = (Λ, E) be a finite graph and Ω = {−1, +1}^Λ, where, for σ ∈ Ω, we say that the state at site x ∈ Λ is a down-spin if σ(x) = −1 and an up-spin if σ(x) = 1.

We define the finite volume Gibbs measures (which are just a family of probability measures on Ω) in terms of a Hamiltonian (energy function). Fix a coupling constant J ∈ R_+ and an external field h ∈ R, then define the Hamiltonian H : Ω → R by

H(σ) = −J Σ_{e∈E} σ(e−)σ(e+) − h Σ_{x∈Λ} σ(x) .   (9.2.1)

Then, for a fixed inverse temperature β, we define the finite volume Gibbs measures on Λ by

µ_{Λ,β}(σ) = (1/Z(β)) e^{−βH(σ)} , for all σ ∈ Ω ,   (9.2.2)

where Z(β) = Σ_{σ∈Ω} e^{−βH(σ)} is the normalization which ensures we have a probability measure, called in the physics jargon the partition function. Since J > 0, these measures favour configurations where the spins are aligned (ferromagnetic). How strongly these are energetically favourable is controlled by the inverse temperature β, and there is clearly a competition between energy and entropy, in the sense that there are few configurations with extremely low energy.

One obvious extension of the Ising model is to make the local state space larger (still finite); this is called the Potts model. Let G be as above, and Ω_q = {1, 2, . . . , q}^Λ, so that the local state can take any of q colours (the model reduces to the Ising model, with slightly different parameters which can be calculated, when q = 2). For e ∈ E let δ_e(σ) = 1(σ(e−) = σ(e+)); then the Hamiltonian of the Potts model is given by

H_q(σ) = −Σ_{e∈E} δ_e(σ) ,   (9.2.3)

noting that in this case there is no external field (we only consider the case h = 0 here). The associated measure is then given by

π^{Λ,β}_q(σ) = (1/Z_q(β)) e^{−βH_q(σ)} , for all σ ∈ Ω_q .   (9.2.4)

These are two of the most well studied models in statistical mechanics. Notice that the models are so far described only in terms of measures on Ω; at this point there are no dynamics. The study of such models is often called equilibrium statistical mechanics, although there is now some ambiguity in the literature, with non-equilibrium sometimes being reserved for non-reversible dynamics. We will come to dynamics later.


9.3 Random-cluster model

The random-cluster model, developed by Fortuin and Kasteleyn and sometimes called FK-percolation, brings together percolation, the Ising model (more generally the Potts model), electrical networks and USTs.

As we did when we studied percolation, let G = (Λ, E) be a finite connected graph, and Ω = {0, 1}^E (with the usual measurable structure). Let k(ω) be the number of connected components (including isolated vertices) in ω ∈ Ω. For p ∈ [0, 1] and q ∈ (0,∞) the random cluster model is a probability measure on Ω defined by

φ_{p,q}(ω) = (1/Z_{p,q}) ( Π_{e∈E} p^{ω(e)}(1 − p)^{1−ω(e)} ) q^{k(ω)} , ω ∈ Ω .   (9.3.1)

Note that q = 1 is exactly Bernoulli bond percolation on G. In the limit p, q → 0 with p/q → 0 we recover the UST measure. It also turns out that connections in FK-percolation correspond in some way to the correlation structure in the Ising and Potts models.

9.3.1 Coupling with the Potts model

In this section Ω_q = {1, . . . , q}^Λ and Ω = {0, 1}^E. Given an FK realisation ω ∈ Ω, independently colour each component with a colour chosen uniformly from {1, . . . , q}. It turns out that this gives a realisation of the Potts model with the correct distribution. We now make this more precise.

Definition 9.6 (FK and Potts coupling). Let F ⊂ Ω_q × Ω be the set given by

F = {(σ, ω) : σ(e−) = σ(e+) ∀ e ∈ E such that ω(e) = 1} ,   (9.3.2)

i.e. all pairs of compatible Potts and FK configurations (the configurations are called compatible if every connected component is monochromatic). Then we define the coupling by

µ(σ, ω) ∝ φ_{p,1}(ω) 1_F(σ, ω) , (σ, ω) ∈ Ω_q × Ω .   (9.3.3)

We now observe that the two marginals of µ are as desired.

Marginal on Ω_q: Fix σ ∈ Ω_q and sum over ω ∈ Ω; we find:

µ(σ) ∝ Σ_{ω∈Ω} φ_{p,1}(ω) 1( ω(e) = 0 if σ(e−) ≠ σ(e+) )
∝ Σ_{ω∈Ω} Π_{e∈E} p^{ω(e)}(1 − p)^{1−ω(e)} 1( ω(e) = 0 if σ(e−) ≠ σ(e+) )
∝ Π_{e∈E} (1 − p)^{1−δ_e(σ)} ∝ e^{β Σ_{e∈E} δ_e(σ)} ∝ π^{Λ,β}_q(σ) , where p = 1 − e^{−β} ,

where in the first ∝ of the third line we used that for each e satisfying σ(e−) = σ(e+) there is no constraint on the value of ω(e). In particular, if we fix the value of ω on all edges except one monochromatic edge, then we see that by summing over ω(e) ∈ {0, 1} the contribution to the product from this edge becomes p + (1 − p) = 1. After we have done this on all monochromatic edges, each non-monochromatic edge has ω(e) = 0, which gives the expression.

Marginal on Ω: For a given ω ∈ Ω we sum over all the σ ∈ Ω_q which are constant on open clusters. There are q^{k(ω)} such configurations (simply by picking a colour in {1, 2, . . . , q} for each cluster), and µ(σ, ω) = µ(σ′, ω) for any two such colourings σ and σ′. It follows that

µ(ω) ∝ ( Π_{e∈E} p^{ω(e)}(1 − p)^{1−ω(e)} ) q^{k(ω)} ∝ φ_{p,q}(ω) .

Conditional measures:

(i) µ(σ | ω) is obtained by putting independent uniform random colours on each component of ω.

(ii) µ(ω | σ) is obtained by flipping a Ber(p) coin for each monochromatic edge of σ to decide whether it is open; all non-monochromatic edges are necessarily closed.
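Applied alternately, the two conditional measures above are exactly one sweep of the Swendsen-Wang dynamics for the q-state Potts model. The following Python sketch (our own illustration; the grid size, q, β and all names are our choices) implements one such sweep on a small square grid.

    # One Swendsen-Wang sweep: (ii) open each monochromatic edge with
    # probability p = 1 - e^{-beta}, then (i) recolour each open cluster
    # with an independent uniform colour from {1, ..., q}.
    import math
    import random

    def sw_sweep(sigma, edges, p, q, rng):
        parent = list(range(len(sigma)))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        for u, v in edges:                       # step (ii): mu(omega | sigma)
            if sigma[u] == sigma[v] and rng.random() < p:
                parent[find(u)] = find(v)        # open edge: merge clusters
        colour = {}
        for u in range(len(sigma)):              # step (i): mu(sigma | omega)
            r = find(u)
            if r not in colour:
                colour[r] = rng.randrange(1, q + 1)
            sigma[u] = colour[r]
        return sigma

    side, q, beta = 8, 3, 1.2
    p = 1 - math.exp(-beta)
    edges = [(i * side + j, i * side + j + 1) for i in range(side) for j in range(side - 1)]
    edges += [(i * side + j, (i + 1) * side + j) for i in range(side - 1) for j in range(side)]
    rng = random.Random(7)
    sigma = [rng.randrange(1, q + 1) for _ in range(side * side)]
    for _ in range(100):
        sigma = sw_sweep(sigma, edges, p, q, rng)
    print("colour counts:", [sigma.count(c) for c in range(1, q + 1)])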

9.3.2 Potts model two point correlations

We may think of the local state σ(x) ∈ {1, 2, . . . , q} as a vector (0, . . . , 0, 1, 0, . . . , 0) with a single 1 in the σ(x)'th position. This is useful for concepts such as average colour, when the order of the colours does not matter. Since we will not use this representation much beyond motivating the following definition of the two-point correlation, we will not discuss it in any more detail here; you may simply take the right-hand side of the following display as the definition:

τ_{β,q}(x, y) = Cov(σ(x), σ(y)) = E[σ(x) · σ(y)] − E[σ(x)] · E[σ(y)] = π^β_q[σ(x) = σ(y)] − 1/q ,   (9.3.4)

for x, y ∈ Λ (where all expectations are with respect to the Potts measure π^β_q).

Theorem 9.7. For q ∈ {2, 3, . . .}, β > 0 and p = 1 − e^{−β},

τ_{β,q}(x, y) = (1 − 1/q) φ_{p,q}(x ↔ y) .   (9.3.5)

Proof. We first write τ_{β,q}(x, y) = E[1{σ(x) = σ(y)} − 1/q] and then use µ(σ | ω):

τ_{β,q}(x, y) = Σ_{σ,ω} ( 1{σ(x) = σ(y)} − 1/q ) µ(σ, ω)
= Σ_ω φ_{p,q}(ω) Σ_σ µ(σ | ω) ( 1{σ(x) = σ(y)} − 1/q )
= Σ_ω φ_{p,q}(ω) ( (1 − 1/q) 1_{x↔y}(ω) + 0 · 1_{x↮y}(ω) )
= (1 − 1/q) φ_{p,q}(x ↔ y) .

9.3.3 Basic properties of the FK model

Theorem 9.8. The measure φ_{p,q} has positive correlations, i.e. it satisfies the FKG lattice condition (or Holley criterion); see Theorem 6.8.

Proof. The result is clear for p = 0, 1. Fix p ∈ (0, 1); the Holley inequality (the condition of Theorem 6.8) is equivalent to

k(ω ∨ ω′) + k(ω ∧ ω′) ≥ k(ω) + k(ω′) , ω, ω′ ∈ Ω .

(The equivalence and the proof of the inequality above are left as an exercise.)


Theorem 9.9 (Comparison - monotonicity in p and q). The following comparisons hold:

• φ_{p′,q′} ≤ φ_{p,q} if p′ ≤ p, q′ ≥ q, q′ ≥ 1.

• φ_{p′,q′} ≥ φ_{p,q} if p′/(q′(1 − p′)) ≥ p/(q(1 − p)) and q′ ≥ q, q′ ≥ 1.

The BK inequality does not hold for φ_{p,q}, due to the long range dependencies which are present. These act, in a sense, through the 'boundaries' of simple regions, c.f. the spatial Markov property.

9.3.4 ∞-volume and the phase transition

See Section 8.3 in Grimmett, Probability on Graphs. We construct the ∞-volume measure by taking weak limits. There are alternative constructions using the Dobrushin-Lanford-Ruelle (DLR) formalism. The two approaches are not quite equivalent, in ways we will not get into here (Add ref). For clarity we consider subsets of the usual integer lattice Z^d with associated edge set E^d, and finite subsets Λ ⊂ Z^d with associated edge set E_Λ given by the set of all nearest neighbour pairs (in ℓ^1).

Let Ω^τ_Λ = {ω ∈ {0, 1}^{E^d} : ω(e) = τ for all e ∉ E_Λ}, where τ ∈ {0, 1} is considered to be the boundary condition imposed on the finite region Λ. If τ = 1 we call the boundary wired (every edge outside the finite region is open, or wired together), and if τ = 0 we call the boundary condition free. Notice that the set Ω^τ_Λ is finite since the configuration is fixed on all but finitely many edges. It turns out that these boundary conditions can have an interesting effect on the large volume limit.

On Ω^τ_Λ we define the random cluster measure with boundary condition τ, and p ∈ [0, 1], q ∈ (0, ∞), as

φ^{Λ,τ}_{p,q}(ω) = (1/Z^{Λ,τ}_{p,q}) ∏_{e∈E_Λ} p^{ω(e)} (1 − p)^{1−ω(e)} q^{k(ω,Λ)} ,   ω ∈ Ω^τ_Λ ,

where k(ω, Λ) denotes the number of clusters of ω that intersect Λ.

The next theorem states that the infinite volume measure exists, see Section 4.3.

Theorem 9.10. Let q ≥ 1. The weak limits

φ^τ_{p,q} = lim_{Λ↑Z^d} φ^{Λ,τ}_{p,q} ,   τ ∈ {0, 1} ,

exist and are translation invariant and (shift) ergodic.

The infinite volume limit is often called the 'thermodynamic limit', in analogy with physical models where the relevant limits often take the volume diverging while other parameters, such as pressure or temperature, are kept constant.

It turns out that the measures φ^0_{p,q} and φ^1_{p,q} are extremal in the following sense:

φ^0_{p,q} ≤ φ^τ_{p,q} ≤ φ^1_{p,q} for more general τ ,

where φ^τ_{p,q} is the obvious generalization of φ^0_{p,q} or φ^1_{p,q} obtained by taking other 'wirings' outside of the finite region Λ. In particular, if φ^0_{p,q} = φ^1_{p,q} then the infinite volume measure is unique. By weak and very general arguments (using convexity of the free energy function) it can be shown that φ^1_{p,q} = φ^0_{p,q} for all but countably many values of p for fixed q; that is, the infinite volume measure is unique for all but countably many p. However, there is a percolation transition, which translates into a uniqueness/non-uniqueness phase transition for the associated Potts (Ising) model.

For τ ∈ {0, 1} let θ^τ(p, q) = φ^τ_{p,q}(0 ↔ ∞) and p^τ_c(q) = sup{p : θ^τ(p, q) = 0}. By the discussion in the previous paragraph, p^0_c = p^1_c =: p_c(q), where we dropped the τ dependence in the notation on the right hand side since it is not required.


Theorem 9.11. Fix q ≥ 1, d ≥ 1. Then:

• For p < p_c the ∞-volume FK measure has no ∞-cluster (a.s.).

• For p > p_c the ∞-volume FK measure has an ∞-cluster (a.s.).

In terms of the associated Potts model (when q ∈ N) this implies the following. Let β_c be given by 1 − e^{−β_c} = p_c; then (by the argument of Theorem 9.7) we have

π^{Λ,1}_{β,q}(σ_0 = 1) − 1/q = (1 − q^{−1}) φ^{Λ,1}_{p,q}(0 ↔ ∂Λ) ,

so

lim_{Λ↑Z^d} ( π^{Λ,1}_{β,q}(σ_0 = 1) − 1/q ) = { 0 if β < β_c ,  m∗ > 0 if β > β_c ,

where m∗ is (1 − q^{−1}) times the probability that 0 belongs to the ∞-cluster in the associated FK model. The value m∗ is often called the magnetization, in analogy with the q = 2 case which can be mapped to the Ising model. Following the calculation above,

m∗ = lim_{Λ↑Z^d} ( π^{Λ,1}_{β,q}(σ_0 = 1) − 1/q ) ,

which measures the effect of the boundary condition 1 on the spin at the origin. If β < β_c (small inverse temperature, i.e. high temperature) the origin does not 'feel' the boundary condition in the limit. However, if β > β_c (low temperature) then the origin prefers to be aligned with the boundary even as the distance to the boundary diverges. The limiting infinite volume Potts measure must then depend on which boundary conditions were chosen, i.e. the infinite volume measure is not unique.


Chapter 10

Stochastic Ising model and Mixing times

Mixing times and relaxation times of Markov processes were explored further on the vacation assignment sheet (Sheet 4).

It turns out that phase transitions in the sense of the previous section are often associated with "dynamic" transitions for very natural (local) associated dynamics, that is, dynamics that have the measures of interest as invariant measures. The dynamics are then used to model how the system reaches its equilibrium, which is characterized by the stationary measure. When there is non-uniqueness of the ∞-volume measure, as described previously, the system 'feels' the effect of boundaries even when they are arbitrarily far away. These long range effects typically make it difficult (slow) to mix (reach equilibrium).

Here we will consider local 'spin-flip' type dynamics (meaning the configuration can change at most at one site at a time). The dynamics can be defined immediately in infinite volume (see for example [3]), but we will be interested in how characteristic times associated with the dynamics grow as the system size increases.

Recall the Ising measure on finite volume Λ ⊂ Z^d with edge set E_Λ taken to be all the edges in the integer lattice that have both ends in Λ (note we could include boundary conditions by including in E_Λ all edges that have at least one end in Λ and following the formalism of the previous section; what we do here is sometimes called free boundary conditions for the Ising model),

π^Λ_β(σ) = (1/Z^Λ_β) e^{−β H_Λ(σ)} ,   for σ ∈ Ω_Λ ,

where H_Λ(σ) = −∑_{e∈E_Λ} σ(e−)σ(e+) (note we take zero external field, i.e. h = 0). To simplify notation we will drop the Λ and β dependence, i.e.

π(σ) = (1/Z) e^{(β/2) ∑_{x∼y} σ(x)σ(y)} .

Note the re-parametrisation of the sum above means that we include each edge twice, hence the extra factor of 1/2 in the exponent.

For simplicity of the presentation we will consider discrete time dynamics, though similar analyses hold also for continuous time dynamics (where each site updates at the arrival times of i.i.d. Poisson processes). The discrete time Markov dynamics are defined as follows: at each step pick a vertex x ∈ Λ uniformly at random and then sample a new local state from π conditioned on the configuration outside of x, i.e. from π(· | σ↾_{Λ\{x}}) (independent of the current state, so this may be a 'lazy' jump). This description gives rise to the following transition matrix elements (check!)

P(σ, η) = (1/|Λ|) ∑_{x∈Λ} 1(σ(y) = η(y) for y ≠ x) · e^{(β/2) η(x) S(σ,x)} / ( e^{(β/2) S(σ,x)} + e^{−(β/2) S(σ,x)} ) ,   (10.0.1)

where S(σ, x) = ∑_{z∼x} σ(z). Using standard identities we can write the term under the sum as

e^{(β/2) η(x) S(σ,x)} / ( e^{(β/2) S(σ,x)} + e^{−(β/2) S(σ,x)} ) = (1/2) ( 1 + tanh( (β/2) η(x) S(σ, x) ) ) .
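As an illustration (a minimal sketch, ours, written with the complete graph in mind so that S(σ, x) is simply the sum of all other spins), a single step of these dynamics reads:

```python
import math, random

def glauber_step(sigma, beta):
    """One discrete-time heat-bath update of a list of +/-1 spins
    on the complete graph, following (10.0.1)."""
    n = len(sigma)
    x = random.randrange(n)                    # uniform vertex
    S = sum(sigma) - sigma[x]                  # S(sigma, x): sum over z != x
    p_plus = 0.5 * (1 + math.tanh(0.5 * beta * S))
    sigma[x] = 1 if random.random() < p_plus else -1
    return sigma

# e.g. a short run at high temperature, beta * n < 1
# (see the mean field discussion below)
n = 100
spins = [random.choice((-1, 1)) for _ in range(n)]
for _ in range(10 * n):
    glauber_step(spins, beta=0.5 / n)
```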


Clearly P is reversible with respect to π, so for fixed boundary conditions (or free) we have a finite state, irreducible and reversible process. It turns out that if we had defined things appropriately on the infinite lattice then the set of Gibbs states would have coincided with the set of reversible infinite volume invariant measures (which are not necessarily unique if there is a phase transition).

For simplicity of the analysis, from this point on we will focus on the complete graph rather than the integer lattice. In physics language this is called the mean field approximation. That is, from now on G = (Λ, E) where E is the set of all pairs of vertices in Λ. In the mean field case each term in the sum in the Hamiltonian is of order n, that is, each site has (n − 1) neighbours. It is therefore helpful to parameterize by setting α = βn.

10.1 Phase transition

Using results developed on random graphs, such as G(n, p), i.e. Bernoulli percolation on the complete graph (this will be covered in more detail in the Probabilistic Combinatorics course), Bollobás, Grimmett and Janson (1995) showed that the FK model on the complete graph undergoes a percolation transition at

λ_c(q) = { q                              if 0 < q ≤ 2 ,
         { 2 ((q − 1)/(q − 2)) log(q − 1)  if q > 2 ,       where p = λ/n .   (10.1.1)
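As a consistency check (ours, not in the notes), the two branches of (10.1.1) agree at q = 2: writing q = 2 + ε and using log(1 + ε) = ε + O(ε²),

lim_{q↓2} 2 ((q − 1)/(q − 2)) log(q − 1) = lim_{ε↓0} 2 (1 + ε) log(1 + ε) / ε = 2 = λ_c(2) ,

so λ_c is continuous at q = 2.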

There is therefore a corresponding phase transition in the associated Potts model at β_c = −log(1 − p_c).

It turns out that when q = 2 (i.e. the Ising case) the mean field Potts (or Ising) model is exactly solvable, meaning that it is possible to calculate many things explicitly. These exact methods were used to fully characterize the phase transition by Wu (1982), and Kesten and Schonmann (1990). It can be shown that the critical value is β_c = 1/n, i.e. α_c = β_c n = 1. Note this is consistent with the FK result on the complete graph (recalling that σ(x)σ(y) = 2 · 1{σ(x) = σ(y)} − 1, so the Ising inverse temperature β corresponds to 2β in the Potts convention).

If α < 1 (i.e. β < 1/n) then there is a unique ∞-volume Gibbs measure. Whereas if α > 1 then there are exactly two extremal measures: one corresponds to taking + boundary conditions, i.e. the weak limit of π^{Λ,+}_β, and one corresponding to − boundary conditions, i.e. the weak limit of π^{Λ,−}_β. Clearly the average magnetization of any single fixed vertex under π^{Λ,+}_β will be positive, and negative under π^{Λ,−}_β. We will denote these average magnetizations by

π^+_β(σ(0)) = lim_{Λ↑Z^d} π^{Λ,+}_β(σ(0)) = m_+ > 0   and   π^−_β(σ(0)) = lim_{Λ↑Z^d} π^{Λ,−}_β(σ(0)) = m_− < 0 .

On a finite lattice with free boundary conditions there is an obvious symmetry with respect to flipping all the spins. As we will see in the next section, if there is no boundary condition or external field to break the symmetry then at low temperature, β > β_c, the stationary measure concentrates on configurations that have magnetization close to m_+ or m_−.

10.2 Large deviations of the empirical magnetization

We now focus exclusively on the Ising model on the complete graph of n vertices with free boundary conditions, i.e.

π(σ) ∝ e^{(β/2) ∑_{x∈Λ} ∑_{y≠x} σ(x)σ(y)} .


Recall from (10.1.1) that if β = α/n then the critical point is specified by α_c = 1. The weak limit (thermodynamic limit) as |Λ| → ∞ of π when α > 1 is a mixture of two measures, π^+ and π^−, which concentrate, respectively, on positive and negative (empirical) mean magnetisation

m_n(σ) = (1/n) ∑_{x∈Λ} σ(x) .

Figure 10.1: The energy function (large deviation rate function) associated with the empirical magnetisation m_n, described by (10.2.1).

We can define an effective energy landscape under the stationary measure as follows (called a rate function in the language of large deviation theory),

ψ^{(n)}_α(m) = −(1/n) log π_α(m_n(σ) = m) ,   m ∈ {−1, −1 + 2/n, . . . , 1 − 2/n, 1} .   (10.2.1)

Writing n_+ = n(1 + m)/2 for the number of up spins in a configuration with m_n(σ) = m, and C(n, k) for the binomial coefficient, counting the configurations in each magnetisation class gives

ψ^{(n)}_α(m) = −(1/n) log [ C(n, n_+) exp{ (α/n) ( C(n_+, 2) + C(n − n_+, 2) − n_+(n − n_+) ) } ] + (1/n) log Z(α) ,

since each aligned pair of spins contributes +1 and each discordant pair −1 to ∑_{x<y} σ(x)σ(y), and β = α/n.

Notice that the last term on the right hand side is the normalisation and is independent of m; it turns out that it converges as n → ∞, and we call its limit −f(α). This is often called the free energy in the physics literature. By applying Stirling's approximation, and writing u = n_+/n = (1 + m)/2 for the fraction of up spins, we find

ψ^{(n)}_α(m) → ψ_α(m) := u log u + (1 − u) log(1 − u) − (α/2)(1 − 2u)^2 − f(α)   as n → ∞ ,

and note that (1 − 2u)^2 = m^2.

The first two terms are related to the binary entropy function. Taking derivatives in u shows that, at the symmetric point u = 1/2 (i.e. m = 0),

ψ′_α = 0 , and ψ″_α = 4(1 − α) ,

hence m = 0 is a critical point of ψ_α, and is a local maximum or minimum depending on the value of α (see Fig. 10.1). When m = 0 is a local maximum it can be interpreted as an 'energy barrier', in a sense that we will now discuss in more detail. Intuitively this is because under the dynamics the value of m_n can only make 'local moves' (in fact in this case it performs a random walk with step size 2/n), and to go from one region with large probability, m_n < 0, to another, m_n > 0, the process must cross a state with very low stationary probability. This is often referred to as a bottleneck.
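A quick numerical look (ours) at the limiting landscape confirms the picture of Fig. 10.1: in terms of the up-spin fraction u = (1 + m)/2, and dropping the m-independent constant f(α), ψ_α has a single well for α < 1 and a double well for α > 1:

```python
import math

def psi(u, alpha):
    """psi_alpha as a function of the up-spin fraction u, up to f(alpha)."""
    entropy = u * math.log(u) + (1 - u) * math.log(1 - u)
    return entropy - 0.5 * alpha * (1 - 2 * u) ** 2

for alpha in (0.5, 1.5):
    us = [i / 100 for i in range(1, 100)]
    vals = [psi(u, alpha) for u in us]
    n_minima = sum(1 for i in range(1, len(vals) - 1)
                   if vals[i] < vals[i - 1] and vals[i] < vals[i + 1])
    print(f"alpha = {alpha}: {n_minima} local minimum/minima")  # 1, then 2
```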

Let

S = {σ ∈ Ω_Λ : m_n(σ) < 0} ,

then by symmetry π_α(S) ≤ 1/2, and to go from S to S^c the process has to pass through the 'unlikely' set ∂S = {σ : m_n(σ) = 1 − (2/n)⌊n/2⌋}.


10.3 Mixing time

The ideas of mixing and relaxation times were explored more on the final problem sheet; see also Levin, Peres and Wilmer, Markov Chains and Mixing Times (2008), available online.

Definition 10.1 (Mixing time). The total variation mixing time is defined as the first time that the distribution of the chain is close to the stationary distribution, uniformly in the initial condition. Fix ε > 0,

T_mix(ε) = inf{ t ≥ 0 : max_{η∈Ω} ‖P^t(η, ·) − π‖_TV < ε } ,   (10.3.1)

where

‖ν − µ‖_TV = (1/2) ∑_{η∈Ω} |ν(η) − µ(η)| = (1/2) sup_{f : ‖f‖_∞ ≤ 1} |µ(f) − ν(f)|
           = inf{ P(σ ≠ η) : (σ, η) ∼ P is a coupling of µ and ν } .

To simplify notation we let T_mix := T_mix(1/4).
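For concreteness (an illustrative sketch, ours), the first formula is one line of code, and on a small state space it can be checked against the standard equivalent form sup_{A⊆Ω} |µ(A) − ν(A)|:

```python
from itertools import chain, combinations

def tv(mu, nu):
    """Total variation distance: half the l1 distance of two dicts."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support)

mu = {'a': 0.5, 'b': 0.3, 'c': 0.2}
nu = {'a': 0.25, 'b': 0.25, 'c': 0.5}

support = sorted(set(mu) | set(nu))
subsets = chain.from_iterable(combinations(support, r)
                              for r in range(len(support) + 1))
best_event = max(abs(sum(mu.get(x, 0) for x in A) - sum(nu.get(x, 0) for x in A))
                 for A in subsets)
assert abs(tv(mu, nu) - best_event) < 1e-12
```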

10.3.1 Bottleneck

We use the existence of the bottleneck discussed above to show that the system is slow mixing (i.e. the mixing time is exponentially large in the system size) when α > 1 (low temperature).

Theorem 10.2 (Slow mixing). If α > 1 then there exist constants r(α) > 0 and C > 0 such that

T_mix ≥ C e^{r(α) n} .

Proof. Recall the definition of the conductance c(η, σ) = π(η)P(η, σ), and let Q(A, B) = ∑_{η∈A, σ∈B} c(η, σ) denote the probability of going from A to B in one step, starting from the stationary measure π. We measure the perimeter of a set S ⊂ Ω by

Φ(S) = 2 Q(S, S^c) / π(S) .

Then define the bottleneck constant (also known as the Cheeger or isoperimetric constant) by

Φ∗ = min_{S : π(S) ≤ 1/2} Φ(S) .

We proved the following lemma on the final exercise sheet (Sheet 4).

Lemma 10.3.

T_mix ≥ 1/Φ∗ .

To complete the proof we simply show that S = {σ ∈ Ω_Λ : m_n(σ) < 0} gives rise to an exponentially narrow bottleneck. Roughly the argument proceeds as follows (and can be made precise with only a little effort). To cross from S to S^c in a single jump of the process, the starting configuration must have a magnetization close to 0, plus or minus something of order 1/n. It follows that

Q(S, S^c) ≤ (⌈n/2⌉/n) π(m_n ≈ 0) ≲ e^{−n ψ^{(n)}_α(0)} ,

and π(S) ≥ π(m_n = m_−), so

Φ(S) ≲ e^{−n ψ^{(n)}_α(0)} / e^{−n ψ^{(n)}_α(m_−)} ≲ e^{−n (ψ_α(0) − ψ_α(m_−))} .

To conclude we observe that ψ_α(0) − ψ_α(m_−) > 0. (A numerical illustration for small n is given below.)
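The bottleneck is easy to see numerically (an illustration, ours, not part of the proof): summing the unnormalised stationary weights per magnetisation class, the weight of the m_n = 0 class is exponentially small relative to π(S) when α > 1:

```python
from math import comb, exp

def class_weight(n, k, alpha):
    """Unnormalised pi-weight of the class {n_+ = k}, i.e. m_n = (2k - n)/n."""
    pair_sum = comb(k, 2) + comb(n - k, 2) - k * (n - k)  # sum_{x<y} s_x s_y
    return comb(n, k) * exp(alpha * pair_sum / n)         # beta = alpha / n

alpha = 1.5
for n in (20, 40, 80):                  # n even, so m_n = 0 is a single class
    w = [class_weight(n, k, alpha) for k in range(n + 1)]
    pi_S = sum(w[:n // 2]) / sum(w)     # classes with m_n < 0
    barrier = w[n // 2] / sum(w)        # the m_n = 0 class
    print(n, barrier / pi_S)            # ratio decays exponentially in n
```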


10.3.2 Path coupling

To derive upper bounds on the mixing time of the stochastic Ising model we will use a method called path coupling.

Since the magnetisation of the stochastic Ising model on the complete graph is Markov (this can be checked) and completely characterizes the full state (by symmetry), we could reduce the problem to the mixing time of a simple random walk. However, we will take a more general approach that can be applied, with only slightly more technicalities, to more general models of this type.

Theorem 10.4. The Glauber dynamics for the Ising model on the complete graph is fast mixing if α < 1 (at high temperature),

T_mix(ε) ≤ ⌈ n (log n + log(1/ε)) / (1 − α) ⌉ .

The proof relies on the following more general path coupling result. Let

d(t) = max_{η∈Ω} ‖P^t(η, ·) − π‖_TV .

Since π is stationary it follows that d(t) ≤ max_{η,σ∈Ω} ‖P^t(η, ·) − P^t(σ, ·)‖_TV. In particular, if P_{η,σ} is a Markov coupling of (η_t)_{t≥0} and (σ_t)_{t≥0}, such that both have transition matrix P(·, ·) and initial conditions η_0 = η, σ_0 = σ (a.s.) respectively, then d(t) ≤ max_{η,σ} P_{η,σ}(η_t ≠ σ_t) (this follows from the coupling characterization of the total variation norm). The aim, as we have seen previously, is to construct a Markov coupling of the process started from different points such that the two trajectories move towards each other as much as possible.

Let ℓ : E → R_+ be a positive function defined on a set E of edges on the state space, which we consider to be edge lengths, such that ℓ(η, σ) ≥ 1 for each (η, σ) ∈ E. Then we define an associated path metric by

ρ(σ, η) = min{ |γ| : γ a path from σ to η } ,   (10.3.2)

where we define the length of a path by |γ| = ∑_{e∈γ} ℓ(e). Observe that ρ(σ, η) ≥ 1(η ≠ σ), so

P_{η,σ}(η_t ≠ σ_t) ≤ E_{η,σ}[ρ(η_t, σ_t)] ,

for any coupling. We define the transportation metric on probability measures, with respect to a path metric ρ, by

d_ρ(µ, ν) = inf{ E[ρ(X, Y)] : (X, Y) is a coupling of µ and ν } .

Theorem 10.5 (Bubley and Dyer). Let ρ be a path metric and suppose that for each edge (η, σ) ∈ E (in particular each pair with P(η, σ) > 0) there exists a coupling of P(η, ·) and P(σ, ·) such that

E_{η,σ}[ρ(η_1, σ_1)] ≤ ρ(η, σ) e^{−c} = ℓ(η, σ) e^{−c} .

Then for all probability measures µ, ν on Ω,

d_ρ(µP, νP) ≤ e^{−c} d_ρ(µ, ν) .

Corollary 10.6.

d(t) ≤ e^{−ct} diam(Ω) ,

and hence

T_mix(ε) ≤ ⌈ ( log(1/ε) + log diam(Ω) ) / c ⌉ ,

where diam(Ω) is the diameter of Ω with respect to ρ.


Sketch proof of Theorem 10.5. Choose η, σ ∈ Ω, pick a shortest path between them, and apply the assumption together with the triangle inequality for d_ρ along the path to get

d_ρ(P(η, ·), P(σ, ·)) ≤ e^{−c} ρ(η, σ) .

Finally pick the optimal coupling.

We now apply these results to the stochastic Ising model to complete the proof of Theorem 10.4.

Let ρ(η, σ) = (1/2) ∑_{x∈Λ} |η(x) − σ(x)| (which counts the number of sites at which the two configurations differ). Fix η, σ ∈ Ω with ρ(η, σ) = 1, and without loss of generality let x ∈ Λ be the vertex at which they differ, with η(x) = 1, σ(x) = −1. We now describe a one-step coupling that on average reduces the number of differences between the two configurations. Pick the same vertex z to update in both configurations. If we pick z = x ∈ Λ then the neighbours of z have the same spin in both configurations and so we can update to the same value. Otherwise, if z ≠ x, we use 'the same randomness' to update both configurations: let U ∼ Unif[0, 1], independent of everything else, and let

η_1(z) = { +1 if U ≤ P(η, η^{z,+}) ,  −1 if U > P(η, η^{z,+}) ,
σ_1(z) = { +1 if U ≤ P(σ, σ^{z,+}) ,  −1 if U > P(σ, σ^{z,+}) ,

where η^{z,+} denotes the configuration η with the spin at z set to +1, and P(η, η^{z,+}) is shorthand for the probability that the heat-bath update at z produces spin +1.

Thus on the event z = x we have ρ(η_1, σ_1) = 0 (which occurs with probability 1/n). If z ≠ x and P(σ, σ^{z,+}) < U ≤ P(η, η^{z,+}) then ρ(η_1, σ_1) = 2; otherwise the two configurations move together and ρ(η_1, σ_1) = 1. It follows that

E_{η,σ}[ρ(η_1, σ_1)] ≤ 1 − 1/n + ((n − 1)/n) ( P(η, η^{z,+}) − P(σ, σ^{z,+}) ) ≤ 1 − 1/n + ((n − 1)/n) tanh(β) .

So, since tanh(β) ≤ β and α = βn,

E_{η,σ}[ρ(η_1, σ_1)] ≤ 1 − (1 − α)/n ≤ e^{−c(β)/n} with c(β) = 1 − α > 0 , if β < 1/n .

Applying Corollary 10.6, with diam(Ω) = n, completes the proof of Theorem 10.4. (A sketch of this coupled update is given below.)
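The coupled update used in the proof is summarised by the following sketch (ours; both chains share the vertex z and the uniform variable U, which is exactly what produces the contraction):

```python
import math, random

def p_plus(sigma, z, beta):
    """Heat-bath probability that the updated spin at z equals +1
    (complete graph, so S(sigma, z) is the sum of the other spins)."""
    S = sum(sigma) - sigma[z]
    return 0.5 * (1 + math.tanh(0.5 * beta * S))

def coupled_step(eta, sigma, beta):
    n = len(eta)
    z = random.randrange(n)             # same vertex in both chains
    U = random.random()                 # same randomness in both chains
    eta[z] = 1 if U < p_plus(eta, z, beta) else -1
    sigma[z] = 1 if U < p_plus(sigma, z, beta) else -1
    return eta, sigma
```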


Bibliography

[1] W. Feller. An Introduction to Probability Theory and Its Applications. Wiley, 2008.

[2] T. M. Liggett. Continuous Time Markov Processes: An Introduction. American Mathematical Society. ISBN 9780821849491.

[3] T. M. Liggett. Interacting Particle Systems. Springer. ISBN 3540226176.

[4] T. M. Liggett. Interacting Particle Systems - An Introduction. ICTP Lecture Notes Series, 17, 2004.

[5] O. Kallenberg. Foundations of Modern Probability. Springer, 1997.

[6] W. Rudin. Real and Complex Analysis. McGraw-Hill, 1987. ISBN 0-07-054234-1.
