
arXiv:1210.1180v2 [math.PR] 16 Jan 2014

The Annals of Applied Probability

2014, Vol. 24, No. 1, 337–377. DOI: 10.1214/13-AAP926. © Institute of Mathematical Statistics, 2014

ERROR BOUNDS FOR METROPOLIS–HASTINGS ALGORITHMS APPLIED TO PERTURBATIONS OF GAUSSIAN MEASURES IN HIGH DIMENSIONS

By Andreas Eberle

Universität Bonn

The Metropolis-adjusted Langevin algorithm (MALA) is a Metropolis–Hastings method for approximate sampling from continuous distributions. We derive upper bounds for the contraction rate in Kantorovich–Rubinstein–Wasserstein distance of the MALA chain with semi-implicit Euler proposals applied to log-concave probability measures that have a density w.r.t. a Gaussian reference measure. For sufficiently "regular" densities, the estimates are dimension-independent, and they hold for sufficiently small step sizes $h$ that do not depend on the dimension either. In the limit $h\downarrow 0$, the bounds approach the known optimal contraction rates for overdamped Langevin diffusions in a convex potential.

A similar approach also applies to Metropolis–Hastings chains with Ornstein–Uhlenbeck proposals. In this case, the resulting estimates are still independent of the dimension but less optimal, reflecting the fact that MALA is a higher order approximation of the diffusion limit than Metropolis–Hastings with Ornstein–Uhlenbeck proposals.

1. Introduction. The performance of Metropolis–Hastings (MH) methods [16, 23, 27] for sampling probability measures on high-dimensional continuous state spaces has attracted growing attention in recent years. The pioneering works by Roberts, Gelman and Gilks [28] and Roberts and Rosenthal [29] show in particular that for product measures $\pi^d$ on $\mathbb{R}^d$, the average acceptance probabilities for the Random Walk Metropolis algorithm (RWM) and the Metropolis-adjusted Langevin algorithm (MALA) converge to a strictly positive limit as $d\to\infty$ only if the step sizes $h$ go to zero of order $O(d^{-1})$, $O(d^{-1/3})$, respectively. In this case, a diffusion limit as $d\to\infty$ has

Received November 2012; revised February 2013.
AMS 2000 subject classifications. Primary 60J22; secondary 60J05, 65C40, 65C05.
Key words and phrases. Metropolis algorithm, Markov chain Monte Carlo, Langevin diffusion, Euler scheme, coupling, contractivity of Markov kernels.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2014, Vol. 24, No. 1, 337–377. This reprint differs from the original in pagination and typographic detail.


been derived, leading to an optimal scaling of the step sizes maximizing the speed of the limiting diffusion, and an asymptotically optimal acceptance probability.

Recently, the optimal scaling results for RWM and MALA have been extended significantly to targets that are not of product form but have a sufficiently regular density w.r.t. a Gaussian measure; cf. [22, 26]. On the other hand, it has been pointed out [3, 4, 8, 14] that for corresponding perturbations of Gaussian measures, the acceptance probability has a strictly positive limit as $d\to\infty$ for small step sizes that do not depend on the dimension, provided the random walk or Euler proposals in RWM and MALA are replaced by Ornstein–Uhlenbeck or semi-implicit ("preconditioned") Euler proposals, respectively; cf. also below. Pillai, Stuart and Thiery [25] show that in this case, the Metropolis–Hastings algorithm can be realized directly on an infinite-dimensional Hilbert space arising in the limit as $d\to\infty$, and the corresponding Markov chain converges weakly to an infinite-dimensional overdamped Langevin diffusion as $h\downarrow 0$.

Mixing properties and convergence to equilibrium of Langevin diffusions have been studied intensively [1, 2, 9, 15, 24, 31]. In particular, it is well known that contractivity and exponential convergence to equilibrium in Wasserstein distance can be quantified if the stationary distribution is strictly log-concave [7, 35]; cf. also [11] for a recent extension to the non-log-concave case. Because of the diffusion limit results, one might expect that the approximating Metropolis–Hastings chains have similar convergence properties. However, this heuristic may also be wrong, since the convergence of the Markov chains to the diffusion is known only in a weak and nonquantitative sense.

Although there is a huge number of results quantifying the speed of convergence to equilibrium for Markov chains on discrete state spaces (cf. [18, 32] for an overview), there are relatively few quantitative results on Metropolis–Hastings chains on $\mathbb{R}^d$ when $d$ is large. The most remarkable exceptions are the well-known works [10, 17, 19–21], which prove an upper bound for the mixing time that is polynomial in the dimension for Metropolis chains with ball walk proposals for uniform measures on convex sets and more general log-concave measures.

Below, we develop an approach to quantify Wasserstein contractivity and convergence to equilibrium in a dimension-independent way for the Metropolis–Hastings chains with Ornstein–Uhlenbeck and semi-implicit Euler proposals. Our approach applies in the strictly log-concave case (or, more generally, if the measure is strictly log-concave on an appropriate ball) and yields bounds for small step sizes that are very precise. The results for semi-implicit Euler proposals require less restrictive assumptions than those for Ornstein–Uhlenbeck proposals, reflecting the fact that the corresponding Markov chain is a higher order approximation of the diffusion.


Our results are closely related and complementary to the recent work [13], and to the dimension-dependent geometric ergodicity results in [5]. In particular, in [13], Hairer, Stuart and Vollmer apply related methods to establish exponential convergence to equilibrium in Wasserstein distance for Metropolis–Hastings chains with Ornstein–Uhlenbeck proposals in a less quantitative way, but without assuming log-concavity. In the context of probability measures on function spaces, the techniques developed here are applied in the PhD thesis of Gruhlke [12].

We now recall some basic facts on Metropolis–Hastings algorithms and describe our setup and the main results. Sections 2 and 3 contain basic results on Wasserstein contractivity of Metropolis–Hastings kernels, and contractivity of the proposal kernels. In Sections 4 and 5, we prove bounds quantifying rejection probabilities and the dependence of the rejection event on the current state for Ornstein–Uhlenbeck and semi-implicit Euler proposals. These bounds, combined with an upper bound for the exit probability of the corresponding Metropolis–Hastings chains from a given ball derived in Section 6, are crucial for the proof of the main results in Section 7.

1.1. Metropolis–Hastings algorithms. Let $U:\mathbb{R}^d\to\mathbb{R}$ be a lower bounded measurable function such that
$$Z=\int_{\mathbb{R}^d}\exp(-U(x))\,dx<\infty,$$
and let $\mu$ denote the probability measure on $\mathbb{R}^d$ with density proportional to $\exp(-U)$. We use the same letter $\mu$ for the measure and its density, that is,
$$\mu(dx)=\mu(x)\,dx=Z^{-1}\exp(-U(x))\,dx.\qquad(1.1)$$

Below, we view the measure $\mu$ defined by (1.1) as a perturbation of the standard normal distribution $\gamma^d$ in $\mathbb{R}^d$; that is, we decompose
$$U(x)=\tfrac12|x|^2+V(x),\qquad x\in\mathbb{R}^d,\qquad(1.2)$$
with a measurable function $V:\mathbb{R}^d\to\mathbb{R}$, and obtain the representation
$$\mu(dx)=\tilde Z^{-1}\exp(-V(x))\,\gamma^d(dx)\qquad(1.3)$$
with normalization constant $\tilde Z=Z/(2\pi)^{d/2}$. Here $|\cdot|$ denotes the Euclidean norm.

Note that in $\mathbb{R}^d$, any probability measure with a strictly positive density can be represented as an absolutely continuous perturbation of $\gamma^d$ as in (1.3). In an infinite-dimensional limit, however, the density may degenerate. Nevertheless, also on infinite-dimensional spaces, absolutely continuous perturbations of Gaussian measures form an important and widely used class of models.


Example 1.1 (Transition path sampling). We briefly describe a typical application; cf. [14] and [12] for details. Suppose that we are interested in sampling a trajectory of a diffusion process in $\mathbb{R}^\ell$ conditioned to a given endpoint $b$ at time $t=1$. We assume that the unconditioned diffusion process $(Y_t,P)$ satisfies a stochastic differential equation of the form
$$dY_t=-\nabla H(Y_t)\,dt+dB_t,\qquad(1.4)$$
where $(B_t)$ is an $\ell$-dimensional Brownian motion, and $H\in C^2(\mathbb{R}^\ell)$ is bounded from below. Then, by Girsanov's theorem and Itô's formula, a regular version of the law of the conditioned process satisfying $Y_0=a$ and $Y_1=b$ on the path space $E=\{y\in C([0,1],\mathbb{R}^\ell):y_0=a,\,y_1=b\}$ is given by
$$\mu(dy)=C^{-1}\exp(-V(y))\,\gamma(dy),\qquad(1.5)$$
where $\gamma$ is the law of the Brownian bridge from $a$ to $b$,
$$V(y)=\frac12\int_0^1\phi(y_s)\,ds\quad\text{with }\phi(x)=|\nabla H(x)|^2-\Delta H(x),\qquad(1.6)$$

and $C=\exp(H(b)-H(a))$; cf. [31]. In order to obtain finite-dimensional approximations of the measure $\mu$ on $E$, we consider the Wiener–Lévy expansion
$$y_t=e_t+\sum_{n=0}^{\infty}\sum_{k=0}^{2^n-1}\sum_{i=1}^{\ell}x_{n,k,i}\,e^{n,k,i}_t,\qquad t\in[0,1],\qquad(1.7)$$
of a path $y\in E$ in terms of the basis functions $e_t=(1-t)a+tb$ and $e^{n,k,i}_t=2^{-n/2}g(2^nt-k)e_i$ with $g(s)=\min(s,1-s)^+$. Here the coefficients $x_{n,k,i}$, $n\ge0$, $0\le k<2^n$, $1\le i\le\ell$, are real numbers. Recall that truncating the series at $n=m-1$ corresponds to taking the polygonal interpolation of the path $y$ adapted to the dyadic partition $D_m=\{k2^{-m}:k=0,1,\dots,2^m\}$ of the interval $[0,1]$. Now fix $m\in\mathbb{N}$, let $d=(2^m-1)\ell$ and let

$$x^d=(x_{n,k,i}:0\le n<m,\,0\le k<2^n,\,1\le i\le\ell)\in\mathbb{R}^d$$

denote the vector consisting of the first $d$ components in the basis expansion of a path $y\in E$. Then the image of the Brownian bridge measure $\gamma$ under the projection $\pi_d:E\to\mathbb{R}^d$ that maps $y$ to $x^d$ is the $d$-dimensional standard normal distribution $\gamma^d$; for example, cf. [33]. Therefore, a natural finite-dimensional approximation to the infinite-dimensional sampling problem described above consists in sampling from the probability measure
$$\mu_d(dx)=Z_d^{-1}\exp(-V_d(x))\,\gamma^d(dx)\qquad(1.8)$$

on $\mathbb{R}^d$ where $Z_d$ is a normalization constant, and
$$V_d(x)=2^{-m-1}\Biggl(\frac12\phi(y_0)+\sum_{k=1}^{2^m-1}\phi(y_{k2^{-m}})+\frac12\phi(y_1)\Biggr),\qquad(1.9)$$


with $y=e+\sum_{n<m}\sum_k\sum_i x_{n,k,i}e^{n,k,i}$ denoting the polygonal path corresponding to $x^d=(x_{n,k,i})\in\mathbb{R}^d$.
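The following is a minimal numerical sketch of this finite-dimensional approximation for scalar paths ($\ell=1$); the potential $H$, the endpoints $a,b$ and the truncation level $m$ are illustrative choices, not values from the paper.

```python
import numpy as np

m = 6                                      # truncation level, d = 2**m - 1
a, b = 0.0, 1.0                            # endpoints of the bridge
phi = lambda z: np.sin(z)**2 + np.cos(z)   # phi = |H'|^2 - H'' for H(z) = cos(z)

t = np.arange(2**m + 1) / 2**m             # dyadic grid D_m
g = lambda s: np.clip(np.minimum(s, 1 - s), 0.0, None)   # g(s) = min(s, 1-s)^+

def path(x):
    """Polygonal path y_t = e_t + sum_{n<m,k} x_{n,k} e^{n,k}_t on the grid."""
    y = (1 - t) * a + t * b                # mean path e_t
    idx = 0
    for n in range(m):
        for k in range(2**n):
            y = y + x[idx] * 2.0**(-n / 2) * g(2**n * t - k)
            idx += 1
    return y

def V_d(x):
    """Trapezoidal discretization (1.9) of V(y) = (1/2) int_0^1 phi(y_s) ds."""
    v = phi(path(x))
    return 2.0**(-m - 1) * (0.5 * v[0] + v[1:-1].sum() + 0.5 * v[-1])

x = np.random.default_rng(0).normal(size=2**m - 1)   # coefficients x^d ~ gamma_d
print(V_d(x))
```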

Returning to our general setup, suppose that $p(x,dy)=p(x,y)\,dy$ is an absolutely continuous transition kernel on $\mathbb{R}^d$ with strictly positive densities $p(x,y)$. Let
$$\alpha(x,y)=\min\biggl(\frac{\mu(y)p(y,x)}{\mu(x)p(x,y)},1\biggr),\qquad x,y\in\mathbb{R}^d.\qquad(1.10)$$

Note that $\alpha(x,y)$ does not depend on $Z$. The Metropolis–Hastings algorithm with proposal kernel $p$ is the following Markov chain Monte Carlo method for approximate sampling and Monte Carlo integration w.r.t. $\mu$:

(1) Choose an initial state $X_0$.
(2) For $n:=0,1,2,\dots$:
• Sample $Y_n\sim p(X_n,dy)$ and $U_n\sim\mathrm{Unif}(0,1)$ independently.
• If $U_n<\alpha(X_n,Y_n)$, then accept the proposal and set $X_{n+1}:=Y_n$; else reject the proposal and set $X_{n+1}:=X_n$.
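In code, one transition of this chain can be sketched as follows; here `propose` and `log_alpha` are placeholders for a concrete proposal sampler and the logarithm of the acceptance probability (1.10), so this is a generic skeleton rather than a specific method from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, propose, log_alpha):
    """One Metropolis-Hastings transition: propose, then accept/reject."""
    y = propose(x)
    # U < alpha(x, y) is checked on the log scale for numerical stability
    if np.log(rng.uniform()) < log_alpha(x, y):
        return y          # accept: X_{n+1} = Y_n
    return x              # reject: X_{n+1} = X_n

def mh_chain(x0, n_steps, propose, log_alpha):
    """Run the chain for n_steps transitions starting at x0."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(mh_step(xs[-1], propose, log_alpha))
    return np.array(xs)
```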

The algorithm generates a time-homogeneous Markov chain $(X_n)_{n=0,1,2,\dots}$ with initial state $X_0$ and transition kernel
$$q(x,dy)=\alpha(x,y)p(x,y)\,dy+r(x)\cdot\delta_x(dy).\qquad(1.11)$$

Here
$$r(x)=1-q(x,\mathbb{R}^d\setminus\{x\})=1-\int_{\mathbb{R}^d}\alpha(x,y)p(x,y)\,dy\qquad(1.12)$$
is the average rejection probability for the proposal when the Markov chain is at $x$. Note that $q(x,dy)$ restricted to $\mathbb{R}^d\setminus\{x\}$ is again absolutely continuous with density
$$q(x,y)=\alpha(x,y)p(x,y).$$

Since
$$\mu(x)q(x,y)=\alpha(x,y)\mu(x)p(x,y)=\min(\mu(y)p(y,x),\mu(x)p(x,y))$$
is a symmetric function in $x$ and $y$, the kernel $q(x,dy)$ satisfies the detailed balance condition
$$\mu(dx)q(x,dy)=\mu(dy)q(y,dx).\qquad(1.13)$$
In particular, $\mu$ is a stationary distribution for the Metropolis–Hastings chain, and the chain with initial distribution $\mu$ is reversible. Therefore, under appropriate ergodicity assumptions, the distribution of $X_n$ will converge to $\mu$ as $n\to\infty$.


To analyze Metropolis–Hastings algorithms it is convenient to introduce the function
$$G(x,y)=\log\frac{\mu(x)p(x,y)}{\mu(y)p(y,x)}=U(y)-U(x)+\log\frac{p(x,y)}{p(y,x)}.\qquad(1.14)$$
For any $x,y\in\mathbb{R}^d$,
$$\alpha(x,y)=\exp(-G(x,y)^+).\qquad(1.15)$$

In particular, for any $x,y,\tilde x,\tilde y\in\mathbb{R}^d$,
$$1-\alpha(x,y)\le G(x,y)^+,\qquad(1.16)$$
$$(\alpha(x,y)-\alpha(\tilde x,\tilde y))^+\le(G(x,y)-G(\tilde x,\tilde y))^-\quad\text{and}\qquad(1.17)$$
$$(\alpha(x,y)-\alpha(\tilde x,\tilde y))^-\le(G(x,y)-G(\tilde x,\tilde y))^+.\qquad(1.18)$$

The function $G(x,y)$ defined by (1.14) can also be represented in terms of $V$: Indeed, since
$$\log\frac{\gamma^d(x)}{\gamma^d(y)}=\frac12(|y|^2-|x|^2),$$
we have
$$G(x,y)=V(y)-V(x)+\log\frac{\gamma^d(x)p(x,y)}{\gamma^d(y)p(y,x)},\qquad(1.19)$$
where $\gamma^d(x)=(2\pi)^{-d/2}\exp(-|x|^2/2)$ denotes the standard normal density in $\mathbb{R}^d$.

1.2. Metropolis–Hastings algorithms with Gaussian proposals. We aim at proving contractivity of Metropolis–Hastings kernels w.r.t. appropriate Kantorovich–Rubinstein–Wasserstein distances. For this purpose, we are looking for a proposal kernel that has adequate contractivity properties and sufficiently small rejection probabilities. The rejection probability is small if the proposal kernel approximately satisfies the detailed balance condition w.r.t. $\mu$.

1.2.1. Ornstein–Uhlenbeck proposals. A straightforward approach would be to use a proposal density that satisfies the detailed balance condition
$$\gamma^d(x)p(x,y)=\gamma^d(y)p(y,x)\qquad\text{for any }x,y\in\mathbb{R}^d\qquad(1.20)$$
w.r.t. the standard normal distribution. In this case,
$$G(x,y)=V(y)-V(x).\qquad(1.21)$$


The simplest form of proposal distributions satisfying (1.20) are the transition kernels of AR(1) (discrete Ornstein–Uhlenbeck) processes given by
$$p_h^{OU}(x,dy)=N\biggl(\Bigl(1-\frac h2\Bigr)x,\Bigl(h-\frac{h^2}4\Bigr)I_d\biggr)\qquad(1.22)$$

for some constant $h\in(0,2)$. If $Z$ is a standard normally distributed $\mathbb{R}^d$-valued random variable, then the random variables
$$Y_h^{OU}(x):=\Bigl(1-\frac h2\Bigr)x+\sqrt{h-\frac{h^2}4}\,Z,\qquad x\in\mathbb{R}^d,\qquad(1.23)$$
have distributions $p_h^{OU}(x,dy)$. Note that by (1.21), the acceptance probabilities
$$\alpha^{OU}(x,y)=\exp(-G^{OU}(x,y)^+)=\exp(-(V(y)-V(x))^+)\qquad(1.24)$$
for Ornstein–Uhlenbeck proposals do not depend on $h$.
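A short sketch of the proposal (1.23), together with a Monte Carlo check that it leaves the standard normal distribution invariant, which is the content of the detailed balance condition (1.20); the dimension and step size below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 10, 0.3

def Y_OU(x):
    """Ornstein-Uhlenbeck proposal (1.23)."""
    return (1 - h/2) * x + np.sqrt(h - h**2/4) * rng.normal(size=x.shape)

# If x ~ N(0, I_d), then Y_OU(x) is Gaussian with variance
# (1 - h/2)^2 + h - h^2/4 = 1, so gamma_d is preserved exactly.
x = rng.normal(size=(100_000, d))
print(Y_OU(x).var())   # ~ 1.0, for every h in (0,2)
```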

1.2.2. Euler proposals. In continuous time, under appropriate regularity and growth conditions on $V$, detailed balance w.r.t. $\mu$ is satisfied exactly by the transition functions of the diffusion process solving the overdamped Langevin stochastic differential equation
$$dX_t=-\frac12X_t\,dt-\frac12\nabla V(X_t)\,dt+dB_t,\qquad(1.25)$$
because the generator
$$\mathcal{L}=\frac12\Delta-\frac12x\cdot\nabla-\frac12\nabla V\cdot\nabla=\frac12(\Delta-\nabla U\cdot\nabla)$$
is a self-adjoint operator on an appropriate dense subspace of $L^2(\mathbb{R}^d;\mu)$; cf. [31]. Although we cannot compute and sample from the transition functions exactly, we can use approximations as proposals in a Metropolis–Hastings algorithm. A corresponding MH algorithm where the proposals are obtained from a discretization scheme for the SDE (1.25) is called a Metropolis-adjusted Langevin algorithm (MALA); cf. [27, 30].

In this paper, we focus on the MALA scheme with proposal kernel
$$p_h(x,\cdot)=N\biggl(\Bigl(1-\frac h2\Bigr)x-\frac h2\nabla V(x),\Bigl(h-\frac{h^2}4\Bigr)\cdot I_d\biggr)\qquad(1.26)$$
for some constant $h\in(0,2)$; that is, $p_h(x,\cdot)$ is the distribution of
$$Y_h(x)=x-\frac h2\nabla U(x)+\sqrt{h-\frac{h^2}4}\,Z=\Bigl(1-\frac h2\Bigr)x-\frac h2\nabla V(x)+\sqrt{h-\frac{h^2}4}\,Z,\qquad(1.27)$$
where $Z\sim\gamma^d$ is a standard normal random variable with values in $\mathbb{R}^d$.
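A minimal sketch of the proposal map (1.27); `grad_U` stands for a user-supplied gradient of $U$ and is an assumption of this snippet.

```python
import numpy as np

rng = np.random.default_rng(2)

def Y_h(x, grad_U, h):
    """Semi-implicit Euler proposal (1.27): the drift uses the full gradient
    of U = |x|^2/2 + V, and the noise variance is h - h^2/4 instead of h."""
    z = rng.normal(size=x.shape)
    return x - (h/2) * grad_U(x) + np.sqrt(h - h**2/4) * z
```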


Note that if $h-h^2/4$ is replaced by $h$, then (1.27) is a standard Euler discretization step for the SDE (1.25). Replacing $h$ by $h-h^2/4$ ensures that detailed balance is satisfied exactly for $V\equiv0$. Alternatively, (1.27) can be viewed as a semi-implicit Euler discretization step for (1.25):

Remark 1.2 (Euler schemes). The explicit Euler discretization of the overdamped Langevin equation (1.25) with time step size $h>0$ is given by
$$X_{n+1}=\Bigl(1-\frac h2\Bigr)X_n-\frac h2\nabla V(X_n)+\sqrt h\,Z_{n+1},\qquad n=0,1,2,\dots,\qquad(1.28)$$
where $Z_n$, $n\in\mathbb{N}$, are i.i.d. random variables with distribution $\gamma^d$. The process $(X_n)$ defined by (1.28) is a time-homogeneous Markov chain with transition kernel
$$p_h^{\mathrm{Euler}}(x,\cdot)=N\biggl(\Bigl(1-\frac h2\Bigr)x-\frac h2\nabla V(x),\,h\cdot I_d\biggr).\qquad(1.29)$$

Even for $V\equiv0$, the measure $\mu$ is not a stationary distribution for the kernel $p_h^{\mathrm{Euler}}$. A semi-implicit Euler scheme for (1.25) with time-step size $\varepsilon>0$ is given by
$$X_{n+1}-X_n=-\frac\varepsilon2\cdot\frac{X_{n+1}+X_n}2-\frac\varepsilon2\nabla V(X_n)+\sqrt\varepsilon\,Z_{n+1}\qquad(1.30)$$
with $Z_n$ i.i.d. with distribution $\gamma^d$; cf. [14]. Note that the scheme is implicit only in the linear part of the drift but explicit in $\nabla V$. Solving for $X_{n+1}$ in (1.30) and substituting $h=\varepsilon/(1+\frac\varepsilon4)$ with $h\in(0,2)$ yields the equivalent equation
$$X_{n+1}=\Bigl(1-\frac h2\Bigr)X_n-\frac h2\nabla V(X_n)+\sqrt{h-\frac{h^2}4}\,Z_{n+1}.\qquad(1.31)$$
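To spell out the algebra behind this substitution: solving (1.30) for $X_{n+1}$ gives
$$\Bigl(1+\frac\varepsilon4\Bigr)X_{n+1}=\Bigl(1-\frac\varepsilon4\Bigr)X_n-\frac\varepsilon2\nabla V(X_n)+\sqrt\varepsilon\,Z_{n+1},$$
and with $h=\varepsilon/(1+\frac\varepsilon4)$ one checks directly that
$$\frac{1-\varepsilon/4}{1+\varepsilon/4}=1-\frac h2,\qquad\frac{\varepsilon/2}{1+\varepsilon/4}=\frac h2,\qquad\frac{\varepsilon}{(1+\varepsilon/4)^2}=h-\frac{h^2}4,$$
which together yield (1.31).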

We call the Metropolis–Hastings algorithm with proposal kernel $p_h(x,\cdot)$ a semi-implicit MALA scheme with step size $h$.

Proposition 1.3 (Acceptance probabilities for semi-implicit MALA). Let $V\in C^1(\mathbb{R}^d)$ and $h\in(0,2)$. Then the acceptance probabilities for the Metropolis-adjusted Langevin algorithm with proposal kernels $p_h$ are given by $\alpha_h(x,y)=\exp(-G_h(x,y)^+)$ with
$$G_h(x,y)=V(y)-V(x)-\frac{y-x}2\cdot(\nabla V(y)+\nabla V(x))\qquad(1.32)$$
$$+\frac h{8-2h}\bigl[(y+x)\cdot(\nabla V(y)-\nabla V(x))+|\nabla V(y)|^2-|\nabla V(x)|^2\bigr].$$


For explicit Euler proposals with step size $h>0$, a corresponding representation holds with
$$G_h^{\mathrm{Euler}}(x,y)=V(y)-V(x)-\frac{y-x}2\cdot(\nabla V(y)+\nabla V(x))\qquad(1.33)$$
$$+\frac h8\bigl[|y+\nabla V(y)|^2-|x+\nabla V(x)|^2\bigr].$$

The proof of the proposition is given in Section 4 below.
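Formula (1.32) can also be checked numerically against the definition (1.14) of $G_h$ by evaluating the Gaussian proposal densities (1.26) directly; the quartic $V$ below is an arbitrary test case, not one from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, h = 3, 0.5

V = lambda x: 0.25 * np.sum(x**4)          # test perturbation
grad_V = lambda x: x**3
U = lambda x: 0.5 * np.dot(x, x) + V(x)

def log_p(x, y):
    """Log density of the semi-implicit proposal kernel (1.26)."""
    mean = (1 - h/2) * x - (h/2) * grad_V(x)
    return multivariate_normal.logpdf(y, mean=mean, cov=(h - h**2/4) * np.eye(d))

def G_direct(x, y):   # definition (1.14): U(y) - U(x) + log p(x,y) - log p(y,x)
    return U(y) - U(x) + log_p(x, y) - log_p(y, x)

def G_formula(x, y):  # closed form (1.32)
    lead = V(y) - V(x) - 0.5 * np.dot(y - x, grad_V(y) + grad_V(x))
    corr = np.dot(y + x, grad_V(y) - grad_V(x)) \
           + np.dot(grad_V(y), grad_V(y)) - np.dot(grad_V(x), grad_V(x))
    return lead + h / (8 - 2*h) * corr

x, y = rng.normal(size=d), rng.normal(size=d)
assert np.isclose(G_direct(x, y), G_formula(x, y))
```

Combined with the proposal sketch above, $\alpha_h=\exp(-G_h^+)$ from (1.15) yields a complete semi-implicit MALA step.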

Remark 1.4. For explicit Euler proposals, the $O(h)$ correction term in (1.33) does not vanish for $V\equiv0$. More significantly, this term goes to infinity as $|y-x|\to\infty$, and the variance of $y-x$ w.r.t. the proposal distribution is of order $O(d)$.

1.3. Bounds for rejection probabilities. We fix a norm $\|\cdot\|_-$ on $\mathbb{R}^d$ such that
$$\|x\|_-\le|x|\qquad\text{for any }x\in\mathbb{R}^d.\qquad(1.34)$$
We assume that $V$ is sufficiently smooth w.r.t. $\|\cdot\|_-$ with derivatives growing at most polynomially:

Assumption 1.5. The function $V$ is in $C^4(\mathbb{R}^d)$, and for any $n\in\{1,2,3,4\}$, there exist finite constants $C_n\in[0,\infty)$, $p_n\in\{0,1,2,\dots\}$ such that
$$|(\partial^n_{\xi_1,\dots,\xi_n}V)(x)|\le C_n\max(1,\|x\|_-)^{p_n}\|\xi_1\|_-\cdots\|\xi_n\|_-$$
holds for any $x\in\mathbb{R}^d$ and $\xi_1,\dots,\xi_n\in\mathbb{R}^d$.

For discretizations of infinite-dimensional models, $\|\cdot\|_-$ will typically be a finite-dimensional approximation of a norm that is almost surely finite w.r.t. the limit measure in infinite dimensions.

Example 1.6 (Transition path sampling). Consider the situation of Example 1.1, and assume that $H$ is in $C^6(\mathbb{R}^\ell)$. Then by (1.9) and (1.6), $V_d$ is $C^4$. For $n\le4$ and $x,\xi_1,\dots,\xi_n\in\mathbb{R}^d$, the directional derivatives of $V_d$ are given by
$$\partial^n_{\xi_1\cdots\xi_n}V_d(x)=2^{-m-1}\sum_{k=0}^{2^m}w_kD^n\phi(y_{k2^{-m}})[\eta_{1,k2^{-m}},\dots,\eta_{n,k2^{-m}}],\qquad(1.35)$$
where $y,\eta_1,\dots,\eta_n$ are the polygonal paths in $E$ corresponding to $x,\xi_1,\dots,\xi_n$, respectively, $w_k=1$ for $k=1,\dots,2^m-1$ and $w_0=w_{2^m}=1/2$. Assuming $\|D^4\phi(z)\|=O(|z|^r)$ for some integer $r\ge0$ as $|z|\to\infty$, we can estimate
$$|\partial^n_{\xi_1\cdots\xi_n}V_d(x)|\le C_n\max(1,\|y\|_{L^q})^{p_n}\|\eta_1\|_{L^q}\cdots\|\eta_n\|_{L^q},$$


where $q=r+4$, $p_n=r+(4-n)$, $\|y\|_{L^q}=(2^{-m}\sum_{k=0}^{2^m}w_k|y_{k2^{-m}}|^q)^{1/q}$ is a discrete $L^q$ norm of the polygonal path $y$, and $C_1,\dots,C_4$ are finite constants that do not depend on the dimension $d$. One could now choose for the minus norm the norm on $\mathbb{R}^d$ corresponding to the discrete $L^q$ norm on polygonal paths. However, it is more convenient to choose a norm coming from an inner product. To this end, we consider the norms

$$\|y\|_\alpha=\Bigl(\sum_{n,k,i}2^{-2\alpha n}x_{n,k,i}^2\Bigr)^{1/2},\qquad y=e+\sum x_{n,k,i}e^{n,k,i},$$

on path space $E$, and the induced norms
$$\|x\|_\alpha=\Bigl(\sum_{n<m}\sum_{k,i}2^{-2\alpha n}x_{n,k,i}^2\Bigr)^{1/2},\qquad x\in\mathbb{R}^d,$$

on $\mathbb{R}^d$ where $d=(2^m-1)\ell$. One can show that for $\alpha<1/2+1/q$, the $L^q$ norm can be bounded from above by $\|\cdot\|_\alpha$ independently of the dimension; cf. [12]. On the other hand, if $\alpha>1/2$, then $\|y\|_\alpha<\infty$ for $\gamma$-almost every path $y$ of the Brownian bridge. This property will be crucial when restricting to balls w.r.t. $\|\cdot\|_\alpha$. For $\|\cdot\|_-=\|\cdot\|_\alpha$ with $\alpha\in(1/2,1/2+1/q)$, both requirements are satisfied, and Assumption 1.5 holds with constants that do not depend on the dimension.

The next proposition yields in particular an upper bound for the average rejection probability w.r.t. both Ornstein–Uhlenbeck and semi-implicit Euler proposals at a given position $x\in\mathbb{R}^d$; cf. [6] for an analogous result:

Proposition 1.7 (Upper bounds for MH rejection probabilities). Suppose that Assumption 1.5 is satisfied and let $k\in\mathbb{N}$. Then there exist polynomials $P_k^{OU}:\mathbb{R}\to\mathbb{R}_+$ and $P_k:\mathbb{R}^2\to\mathbb{R}_+$ of degrees $p_1+1$, $\max(p_3+3,2p_2+2)$, respectively, such that for any $x\in\mathbb{R}^d$ and $h\in(0,2)$,
$$E[(1-\alpha^{OU}(x,Y_h^{OU}(x)))^k]^{1/k}\le P_k^{OU}(\|x\|_-)\cdot h^{1/2}\quad\text{and}$$
$$E[(1-\alpha_h(x,Y_h(x)))^k]^{1/k}\le P_k(\|x\|_-,\|\nabla U(x)\|_-)\cdot h^{3/2}.$$

The result is a consequence of Proposition 1.3. The proof is given in Section 4 below.

Remark 1.8. (1) The polynomials $P_k^{OU}$ and $P_k$ in Proposition 1.7 are explicit; cf. the proof below. They depend only on the values $C_n$, $p_n$ in Assumption 1.5 for $n=1$, $n=2,3$, respectively, and on the moments
$$m_n=E[\|Z\|_-^n],\qquad n\le k\cdot(p_1+1),\ n\le k\cdot\max(p_3+3,2p_2+2),$$
respectively, but they do not depend on the dimension $d$. For semi-implicit Euler proposals, the upper bound in Proposition 1.7 is stated in explicit form for the case $k=1$ and $p_2=p_3=0$ in (4.6) below.

(2) For explicit Euler proposals, corresponding estimates hold with $m_n$ replaced by $\bar m_n=E[|Z|^n]$; cf. Remark 4.3 below. Note, however, that $\bar m_n\to\infty$ as $d\to\infty$.

Our next result is a bound of order $O(h^{1/2})$, $O(h^{3/2})$, respectively, for the average dependence of the acceptance event on the current state w.r.t. Ornstein–Uhlenbeck and semi-implicit Euler proposals. Let $\|\cdot\|_+$ denote the dual norm of $\|\cdot\|_-$ on $\mathbb{R}^d$, that is,
$$\|\xi\|_+=\sup\{\xi\cdot\eta\,|\,\eta\in\mathbb{R}^d\text{ with }\|\eta\|_-\le1\}.$$
Note that
$$\|\xi\|_-\le|\xi|\le\|\xi\|_+\qquad\forall\xi\in\mathbb{R}^d.$$

For a function $F\in C^1(\mathbb{R}^d)$,
$$|F(y)-F(x)|=\biggl|\int_0^1(y-x)\cdot\nabla F((1-t)x+ty)\,dt\biggr|\le\|y-x\|_-\cdot\sup_{z\in[x,y]}\|\nabla F(z)\|_+,$$
that is, the plus norm of $\nabla F$ determines the Lipschitz constant w.r.t. the minus norm.

Proposition 1.9 (Dependence of rejection on the current state). Suppose that Assumption 1.5 is satisfied, and let $k\in\mathbb{N}$. Then there exist polynomials $Q_k^{OU}:\mathbb{R}\to\mathbb{R}_+$ and $Q_k:\mathbb{R}^2\to\mathbb{R}_+$ of degrees $p_2+1$, $\max(p_4+3,p_3+p_2+2,3p_2+1)$, respectively, such that for any $x,\tilde x\in\mathbb{R}^d$ and $h\in(0,2)$,
$$E[\|\nabla_xG^{OU}(x,Y_h^{OU}(x))\|_+^k]^{1/k}\le Q_k^{OU}(\|x\|_-)\cdot h^{1/2},\qquad(1.36)$$
$$E[\|\nabla_xG_h(x,Y_h(x))\|_+^k]^{1/k}\le Q_k(\|x\|_-,\|\nabla U(x)\|_-)\cdot h^{3/2},\qquad(1.37)$$
$$E[|\alpha^{OU}(x,Y_h^{OU}(x))-\alpha^{OU}(\tilde x,Y_h^{OU}(\tilde x))|^k]^{1/k}\le Q_k^{OU}(\max(\|x\|_-,\|\tilde x\|_-))\cdot\|x-\tilde x\|_-\cdot h^{1/2}\quad\text{and}\qquad(1.38)$$
$$E[|\alpha_h(x,Y_h(x))-\alpha_h(\tilde x,Y_h(\tilde x))|^k]^{1/k}\le Q_k\Bigl(\max(\|x\|_-,\|\tilde x\|_-),\sup_{z\in[x,\tilde x]}\|\nabla U(z)\|_-\Bigr)\cdot\|x-\tilde x\|_-\cdot h^{3/2},\qquad(1.39)$$
where $[x,\tilde x]$ denotes the line segment between $x$ and $\tilde x$.

The proof of the proposition is given in Section 5 below.


Remark 1.10. Again, the polynomials $Q_k^{OU}$ and $Q_k$ are explicit. They depend only on the values $C_n$, $p_n$ in Assumption 1.5 for $n=1,2$, $n=2,3,4$, respectively, and on the moments $m_n=E[\|Z\|_-^n]$ for $n\le k\cdot(p_2+1)$, $n\le k\cdot\max(p_4+3,p_3+p_2+2,2p_2+1)$, respectively, but they do not depend on the dimension $d$. For semi-implicit Euler proposals, the upper bound in Proposition 1.9 is made explicit for the case $k=1$ and $p_2=p_3=p_4=0$ in (5.18) below.

For Ornstein–Uhlenbeck proposals, it will be useful to state the bounds in Propositions 1.7 and 1.9 more explicitly for the case $p_2=0$, that is, when the second derivatives of $V$ are uniformly bounded w.r.t. the minus norm:

Proposition 1.11. Suppose that Assumption 1.5 is satisfied for $n=1,2$ with $p_2=0$. Then for any $x,\tilde x\in\mathbb{R}^d$ and $h\in(0,2)$,
$$E[1-\alpha^{OU}(x,Y_h^{OU}(x))]\le m_1(C_1+C_2\|x\|_-)\cdot h^{1/2}+\tfrac12(2m_2C_2+C_1\|x\|_-+C_2\|x\|_-^2)\cdot h+\tfrac12m_1C_2\|x\|_-\cdot h^{3/2},$$
and
$$E[|\alpha^{OU}(x,Y_h^{OU}(x))-\alpha^{OU}(\tilde x,Y_h^{OU}(\tilde x))|^2]^{1/2}\le\bigl(m_2^{1/2}C_2\cdot h^{1/2}+\tfrac12(C_1+2C_2\max(\|x\|_-,\|\tilde x\|_-))\cdot h\bigr)\cdot\|x-\tilde x\|_-.$$

The proof is given in Sections 4 and 5 below. Again, corresponding bounds also hold for $L^k$ norms for $k\ne1,2$.

1.4. Wasserstein contractivity. The bounds in Propositions 1.7, 1.9 and 1.11 can be applied to study contractivity properties of Metropolis–Hastings transition kernels. Recall that the Kantorovich–Rubinstein or $L^1$-Wasserstein distance of two probability measures $\mu$ and $\nu$ on the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^d)$ w.r.t. a given metric $d$ on $\mathbb{R}^d$ is defined by
$$\mathcal{W}(\mu,\nu)=\inf_{\eta\in\Pi(\mu,\nu)}\int d(x,\tilde x)\,\eta(dx\,d\tilde x),$$
where $\Pi(\mu,\nu)$ consists of all couplings $\eta$ of $\mu$ and $\nu$, that is, all probability measures $\eta$ on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$; cf., for example, [34]. Recall that a coupling of $\mu$ and $\nu$ can be realized by random variables $W$ and $\tilde W$ defined on a joint probability space such that $W\sim\mu$ and $\tilde W\sim\nu$.
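As a quick one-dimensional illustration of this distance (using SciPy's empirical $W^1$ routine for the metric $d(x,\tilde x)=|x-\tilde x|$): for two Gaussians with equal variance, the optimal coupling is a shift, so the distance equals the difference of the means.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
mu_samples = rng.normal(0.0, 1.0, size=100_000)
nu_samples = rng.normal(0.5, 1.0, size=100_000)
# Empirical L1-Wasserstein distance between the two sample clouds:
print(wasserstein_distance(mu_samples, nu_samples))   # ~ 0.5
```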

In order to derive upper bounds for the distances $\mathcal{W}(\mu q_h,\nu q_h)$ and, more generally, $\mathcal{W}(\mu q_h^n,\nu q_h^n)$, $n\in\mathbb{N}$, we define a coupling of the MALA transition probabilities $q_h(x,\cdot)$, $x\in\mathbb{R}^d$, by setting
$$W_h(x):=\begin{cases}Y_h(x),&\text{if }U\le\alpha_h(x,Y_h(x)),\\ x,&\text{if }U>\alpha_h(x,Y_h(x)).\end{cases}$$
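In code, one step of this coupling for two initial points can be sketched as follows: both proposals share the same Gaussian noise $Z$, and the accept/reject decisions share the same uniform variable $U$; `grad_U` and `alpha_h` are assumed to be supplied by the user.

```python
import numpy as np

rng = np.random.default_rng(4)

def coupled_mala_step(x, x_t, grad_U, alpha_h, h):
    """One step of the basic coupling of q_h(x,.) and q_h(x_t,.)."""
    z = rng.normal(size=x.shape)   # common noise for both proposals (1.27)
    u = rng.uniform()              # common uniform for both accept/reject steps
    y = x - (h/2) * grad_U(x) + np.sqrt(h - h**2/4) * z
    y_t = x_t - (h/2) * grad_U(x_t) + np.sqrt(h - h**2/4) * z
    w = y if u <= alpha_h(x, y) else x
    w_t = y_t if u <= alpha_h(x_t, y_t) else x_t
    return w, w_t
```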


Here $Y_h(x)$, $x\in\mathbb{R}^d$, is the basic coupling of the proposal distributions $p_h(x,\cdot)$ defined by (1.27) with $Z\sim\gamma^d$, and the random variable $U$ is uniformly distributed in $(0,1)$ and independent of $Z$.

Correspondingly, we define a coupling of the Metropolis–Hastings transition kernels $q_h^{OU}$ based on Ornstein–Uhlenbeck proposals by setting
$$W_h^{OU}(x):=\begin{cases}Y_h^{OU}(x),&\text{if }U\le\alpha^{OU}(x,Y_h^{OU}(x)),\\ x,&\text{if }U>\alpha^{OU}(x,Y_h^{OU}(x)).\end{cases}$$

Let
$$B_R^-:=\{x\in\mathbb{R}^d:\|x\|_-<R\}$$
denote the centered ball of radius $R$ w.r.t. $\|\cdot\|_-$. As a consequence of Proposition 1.11 above, we obtain the following upper bound for the Kantorovich–Rubinstein–Wasserstein distance of $q_h^{OU}(x,\cdot)$ and $q_h^{OU}(\tilde x,\cdot)$ w.r.t. the metric $d(x,\tilde x)=\|x-\tilde x\|_-$:

Theorem 1.12 (Contractivity of MH transitions based on OU proposals). Suppose that Assumption 1.5 is satisfied for $n=1,2$ with $p_2=0$. Then for any $h\in(0,2)$, $R\in(0,\infty)$, and $x,\tilde x\in B_R^-$,
$$E[\|W_h^{OU}(x)-W_h^{OU}(\tilde x)\|_-]\le c_h^{OU}(R)\cdot\|x-\tilde x\|_-,$$
where
$$c_h^{OU}(R)=1-\tfrac12h+m_2C_2h+A(1+R)(1+h^{1/2}R)h^{3/2}$$
with an explicit constant $A$ that only depends on the values $m_1,m_2,C_1$ and $C_2$.

The proof is given in Section 7 below.

Theorem 1.12 shows that Wasserstein contractivity holds on the ball $B_R^-$ provided $2m_2C_2<1$ and $h$ is chosen sufficiently small depending on $R$ [with $h^{1/2}=O(R^{-1})$]. In this case, the contraction constant $c_h^{OU}(R)$ depends on the dimension only through the values of the constants $C_1,C_2,m_1$ and $m_2$. On the other hand, the following one-dimensional example shows that for $m_2C_2>1$, the acceptance-rejection step may destroy the contraction properties of the OU proposals:

Example 1.13. Suppose that $d=1$ and $\|\cdot\|_-=|\cdot|$. If $V(x)=bx^2/2$ with a constant $b\in(-1/2,1/2)$, then by Theorem 1.12, Wasserstein contractivity holds for the Metropolis–Hastings chain with Ornstein–Uhlenbeck proposals on the interval $(-R,R)$ provided $h$ is chosen sufficiently small. On the other hand, if $V(x)=bx^2/2$ for $|x|\le1$ with a constant $b<-1$, then the logarithmic density
$$U(x)=V(x)+x^2/2=(b+1)\cdot x^2/2$$
is strictly concave for $|x|\le1$, and it can be easily seen that Wasserstein contractivity on $(-1,1)$ does not hold for the MH chain with OU proposals if $h$ is sufficiently small.

A disadvantage of the result for Ornstein–Uhlenbeck proposals stated above is that not only a lower bound on the second derivative of $V$ is required (this would be a fairly natural condition as the example indicates), but also an upper bound of the same size. For semi-implicit Euler proposals, we can derive a better result that requires only a strictly positive lower bound on the second derivative of $U(x)=V(x)+|x|^2/2$ and Assumption 1.5 with arbitrary constants to be satisfied. For this purpose we assume that
$$\|\cdot\|_-=\langle\cdot,\cdot\rangle^{1/2}$$
for an inner product $\langle\cdot,\cdot\rangle$ on $\mathbb{R}^d$, and we make the following assumption on $U$:

Assumption 1.14. There exists a strictly positive constant $K\in(0,1]$ such that
$$\langle\eta,\nabla^2U(x)\cdot\eta\rangle\ge K\langle\eta,\eta\rangle\qquad\text{for any }x,\eta\in\mathbb{R}^d.\qquad(1.40)$$

Of course, Assumption 1.14 is still restrictive, and it will often be satisfied only in a suitable ball around a local minimum of $U$. Most of the results below are stated on a given ball $B_R^-$ w.r.t. the minus norm. In this case it is enough to assume that Assumption 1.14 holds on that ball. If $\|\cdot\|_-$ coincides with the Euclidean norm $|\cdot|$, then the assumption is equivalent to convexity of $U(x)-K|x|^2/2$. Moreover, since $\nabla^2U(x)=I_d+\nabla^2V(x)$, a sufficient condition for (1.40) to hold is
$$\|\nabla^2V(x)\cdot\eta\|_-\le(1-K)\|\eta\|_-\qquad\text{for any }x,\eta\in\mathbb{R}^d.\qquad(1.41)$$

As a consequence of Propositions 1.7 and 1.9 above, we obtain the following upper bound for the Kantorovich–Rubinstein–Wasserstein distance of $q_h(x,\cdot)$ and $q_h(\tilde x,\cdot)$ w.r.t. the metric $d(x,\tilde x)=\|x-\tilde x\|_-$:

Theorem 1.15 (Contractivity of semi-implicit MALA transitions). Suppose that Assumptions 1.5 and 1.14 are satisfied. Then for any $h\in(0,2)$, $R\in(0,\infty)$ and $x,\tilde x\in B_R^-$,
$$E[\|W_h(x)-W_h(\tilde x)\|_-]\le c_h(R)\cdot\|x-\tilde x\|_-,$$
where
$$c_h(R)=1-\tfrac12Kh+\bigl(\tfrac18M(R)^2+\gamma(R)\bigr)h^2+\bigl(K\beta(R)+\tfrac12\delta(R)\bigr)h^{5/2}$$
with
$$M(R)=\sup\{\|\nabla^2U(z)\cdot\eta\|_-:\eta\in B_1^-,z\in B_R^-\},$$
$$\beta(R)=\sup\{P_1(\|z\|_-,\|\nabla U(z)\|_-):z\in B_R^-\},$$
$$\gamma(R)=m_2^{1/2}\cdot\sup\{Q_2(\|z\|_-,\|\nabla U(z)\|_-):z\in B_R^-\},$$
$$\delta(R)=\sup\{Q_2(\|z\|_-,\|\nabla U(z)\|_-)\|\nabla U(z)\|_-:z\in B_R^-\}.$$

The proof is given in Section 7 below.

Remark 1.16. Theorem 1.15 shows in particular that under Assumptions 1.5 and 1.14, there exist constants $C,q\in(0,\infty)$ such that the contraction
$$E[\|W_h(x)-W_h(\tilde x)\|_-]\le\Bigl(1-\frac K4h\Bigr)\|x-\tilde x\|_-$$
holds for $x,\tilde x\in B_R^-$ whenever $h^{-1}\ge C\cdot(1+R^q)$.

Example 1.17 (Transition path sampling). In the situation of Examples 1.1 and 1.6 above, condition (1.41) and (hence) Assumption 1.14 are satisfied on a ball $B_R^-$ with $K$ independent of $d$ provided $\|D^2\phi(x)\|\le1-K$ for any $x\in B_R^-$; cf. (1.35). More generally, by modifying the metric in a suitable way if necessary, one may expect Assumption 1.14 to hold uniformly in the dimension in neighborhoods of local minima of $U$.

1.5. Conclusions. For $R\in(0,\infty)$, we denote by $\mathcal{W}_R$ the Kantorovich–Rubinstein–Wasserstein distance based on the distance function
$$d_R(x,\tilde x):=\min(\|x-\tilde x\|_-,2R).\qquad(1.42)$$
Note that $d_R$ is a bounded metric that coincides with the distance function induced by the minus norm on $B_R^-$. The bounds resulting from Theorems 1.15 and 1.12 can be iterated to obtain estimates for the KRW distance $\mathcal{W}_R$ between the distributions of the corresponding Metropolis–Hastings chains after $n$ steps w.r.t. two different initial distributions.

Corollary 1.18. Suppose that Assumptions 1.5 and 1.14 are satisfied, and let $h\in(0,2)$ and $R\in(0,\infty)$. Then for any $n\in\mathbb{N}$, and for any probability measures $\mu,\nu$ on $\mathcal{B}(\mathbb{R}^d)$,
$$\mathcal{W}_R(\mu q_h^n,\nu q_h^n)\le c_h(R)^n\,\mathcal{W}_R(\mu,\nu)+2R\cdot(P_\mu[\exists k<n:X_k\notin B_R^-]+P_\nu[\exists k<n:X_k\notin B_R^-]).$$


Here $c_h(R)$ is the constant in Theorem 1.15, and $(X_n,P_\mu)$ and $(X_n,P_\nu)$ are Markov chains with transition kernel $q_h$ and initial distributions $\mu$, $\nu$, respectively. A corresponding result with $c_h$ replaced by $c_h^{OU}$ holds for the Metropolis–Hastings chain with Ornstein–Uhlenbeck proposals.

Since the joint law of $W_h(x)$ and $W_h(\tilde x)$ is a coupling of $q_h(x,\cdot)$ and $q_h(\tilde x,\cdot)$ for any $x,\tilde x\in\mathbb{R}^d$, Corollary 1.18 is a direct consequence of Theorems 1.15, 1.12, respectively, and Theorem 2.3 below. The corollary can be used to quantify the Wasserstein distance between the distribution of the Metropolis–Hastings chain after $n$ steps w.r.t. two different initial distributions. For this purpose, one can estimate the exit probabilities from the ball $B_R^-$ via an argument based on a Lyapunov function. For semi-implicit Euler proposals we eventually obtain the following main result:

Theorem 1.19 (Quantitative convergence bound for semi-implicit MALA). Suppose that Assumptions 1.5 and 1.14 are satisfied. Then there exist constants $C,D,q\in(0,\infty)$ such that the estimate
$$\mathcal{W}_{2R}(\nu q_h^n,\pi q_h^n)\le\Bigl(1-\frac K4h\Bigr)^n\mathcal{W}_{2R}(\nu,\pi)+DR\exp\Bigl(-\frac{KR^2}8\Bigr)nh$$
holds for any $n\in\mathbb{N}$, $h,R\in(0,\infty)$ such that $h^{-1}\ge C\cdot(1+R)^q$, and for any probability measures $\nu,\pi$ on $\mathbb{R}^d$ with support in $B_R^-$. The constants $C$, $D$ and $q$ can be made explicit. They depend only on the values of the constants in Assumptions 1.5 and 1.14 and on the moments $m_k$, $k\in\mathbb{N}$, w.r.t. the minus norm, but they do not depend explicitly on the dimension.

The proof of Theorem 1.19 is given in Section 7 below.

Let $\mu_R(A)=\mu(A|B_R^-)$ denote the conditional probability measure given $B_R^-$. Recalling that $\mu$ is a stationary distribution for the kernel $q_h$, we can apply Theorem 1.19 to derive a bound for the Wasserstein distance between the distribution of the MALA chain after $n$ steps and $\mu_R$:

Theorem 1.20. Suppose that Assumptions 1.5 and 1.14 are satisfied. Then there exist constants $C$, $D$, $q\in(0,\infty)$ that do not depend explicitly on the dimension such that the estimate
$$\mathcal{W}_{2R}(\nu q_h^n,\mu_R)\le58R\Bigl(1-\frac K4h\Bigr)^n+DR\exp\Bigl(-\frac{KR^2}{33}\Bigr)nh$$
holds for any $n\in\mathbb{N}$, $h,R\in(0,\infty)$ such that $h^{-1}\ge C\cdot(1+R)^q$, and for any probability measure $\nu$ on $\mathbb{R}^d$ with support in $B_R^-$.

The proof is given in Section 7.


Given an error bound $\varepsilon\in(0,\infty)$ for the Kantorovich–Rubinstein–Wasserstein distance, we can now determine how many steps of the MALA chain are required such that
$$\mathcal{W}_{2R}(\nu q_h^n,\mu_R)<\varepsilon\qquad\text{for any }\nu\text{ with support in }B_R^-.\qquad(1.43)$$
Assuming
$$nh\ge\frac4K\log\Bigl(\frac{116R}\varepsilon\Bigr),\qquad(1.44)$$

we have $58R(1-Kh/4)^n\le\varepsilon/2$. Hence (1.43) holds provided the assumptions in Theorem 1.20 are satisfied, and
$$DR\exp(-KR^2/33)nh<\varepsilon/2.\qquad(1.45)$$
For a minimal choice of $n$, all conditions are satisfied if $R$ is of order $(\log\varepsilon^{-1})^{1/2}$ up to a log log correction, and the inverse step size $h^{-1}$ is of order $(\log\varepsilon^{-1})^{q/2}$ up to a log log correction. Hence if Assumption 1.14 holds on $\mathbb{R}^d$, then a number $n$ of steps that is polynomial in $\log\varepsilon^{-1}$ is sufficient to bound the error by $\varepsilon$ independently of the dimension.
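To make the bookkeeping concrete, here is a small sketch evaluating the minimal total time $nh$ from (1.44); the values of $K$, $R$ and $\varepsilon$ below are purely hypothetical and serve only to illustrate the logarithmic dependence on $\varepsilon$.

```python
import numpy as np

K, R, eps = 0.5, 5.0, 1e-3                 # hypothetical constants for illustration
nh_min = (4 / K) * np.log(116 * R / eps)   # condition (1.44)
print(nh_min)   # minimal n*h; halving eps only adds (4/K)*log(2) to it
# The number of steps is then n = nh_min / h, with h constrained by
# h^{-1} >= C (1+R)^q from Theorem 1.20 (C, q not computed here).
```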

On the other hand, if Assumption 1.14 is satisfied only on a ball $B_R^-$ of given radius $R$, then a given error bound $\varepsilon$ is definitely achieved only provided (1.45) holds with the minimal choice for $nh$ satisfying (1.44), that is, if
$$8DK^{-1}\log(116R\varepsilon^{-1})R\exp(-KR^2/33)<\varepsilon.\qquad(1.46)$$
If $\varepsilon$ is chosen smaller, then the chain may leave the ball $B_R^-$ before sufficient mixing on $B_R^-$ has taken place.

2. Wasserstein contractivity of Metropolis–Hastings kernels. In this section, we first consider an arbitrary stochastic kernel $q:S\times\mathcal{B}(S)\to[0,1]$ on a metric space $(S,d)$. Further below, we will choose $S=\mathbb{R}^d$ and $d(x,y)=\|x-y\|_-\wedge R$ for some constant $R\in(0,\infty]$, and we will assume that $q$ is the transition kernel of a Metropolis–Hastings chain.

The Kantorovich–Rubinstein or $L^1$-Wasserstein distance of two probability measures $\mu$ and $\nu$ on the Borel $\sigma$-algebra $\mathcal{B}(S)$ w.r.t. the metric $d$ is defined by
$$\mathcal{W}_d(\mu,\nu)=\inf_\eta\int d(x,\tilde x)\,\eta(dx\,d\tilde x),$$
where the infimum is over all couplings $\eta$ of $\mu$ and $\nu$, that is, over all probability measures $\eta$ on $S\times S$ with marginals $\mu$ and $\nu$; cf., for example, [34]. In order to derive upper bounds for the Kantorovich distances $\mathcal{W}_d(\mu q,\nu q)$, and more generally, $\mathcal{W}_d(\mu q^n,\nu q^n)$, $n\in\mathbb{N}$, we construct couplings between the measures $q(x,\cdot)$ for $x\in S$, and we derive bounds for the distances $\mathcal{W}_d(q(x,\cdot),q(\tilde x,\cdot))$, $x,\tilde x\in S$.


Definition 2.1. A Markovian coupling of the probability measures $q(x,\cdot)$, $x\in S$, is a stochastic kernel $c$ on the product space $(S\times S,\mathcal{B}(S\times S))$ such that for any $x,\tilde x\in S$, the distribution of the first and second component under $c((x,\tilde x),dy\,d\tilde y)$ is $q(x,dy)$ and $q(\tilde x,d\tilde y)$, respectively.

Example 2.2. (1) Suppose that $(\Omega,\mathcal{A},P)$ is a probability space, and let $(x,\tilde x,\omega)\mapsto Y(x,\tilde x)(\omega)$, $(x,\tilde x,\omega)\mapsto\tilde Y(x,\tilde x)(\omega)$ be product measurable functions from $S\times S\times\Omega$ to $S$ such that $Y(x,\tilde x)\sim q(x,\cdot)$ and $\tilde Y(x,\tilde x)\sim q(\tilde x,\cdot)$ w.r.t. $P$ for any $x,\tilde x\in S$. Then the joint distributions
$$c((x,\tilde x),\cdot)=P\circ(Y(x,\tilde x),\tilde Y(x,\tilde x))^{-1},\qquad x,\tilde x\in S,$$
define a Markovian coupling of the measures $q(x,\cdot)$, $x\in S$.

(2) In particular, if $(x,\omega)\mapsto Y(x)(\omega)$ is a product measurable function from $S\times\Omega$ to $S$ such that $Y(x)\sim q(x,\cdot)$ for any $x\in S$, then
$$c((x,\tilde x),\cdot)=P\circ(Y(x),Y(\tilde x))^{-1}$$
is a Markovian coupling of the measures $q(x,\cdot)$, $x\in S$.

Suppose that $(X_n,\tilde X_n)$ on $(\Omega,\mathcal{A},P)$ is a Markov chain with values in $S\times S$ and transition kernel $c$, where $c$ is a Markovian coupling w.r.t. the kernel $q$. Then the components $(X_n)$ and $(\tilde X_n)$ are Markov chains with transition kernel $q$ and initial distributions given by the marginals of the initial distribution of $(X_n,\tilde X_n)$, that is, $(X_n,\tilde X_n)$ is a coupling of these Markov chains. We will apply the following general theorem to quantify the deviation from equilibrium after $n$ steps of the Markov chain with transition kernel $q$:

Theorem 2.3. Let $\gamma\in(0,1)$, and let $c((x,\tilde x),dy\,d\tilde y)$ be a Markovian coupling of the probability measures $q(x,\cdot)$, $x\in S$. Suppose that $O$ is an open subset of $S$, and assume that the metric $d$ is bounded. Let
$$\Delta:=\operatorname{diam}S=\sup\{d(x,\tilde x):x,\tilde x\in S\}.$$
If the contractivity condition
$$\int d(y,\tilde y)\,c((x,\tilde x),dy\,d\tilde y)\le\gamma\cdot d(x,\tilde x)\qquad(2.1)$$
holds for any $x,\tilde x\in O$, then
$$\mathcal{W}_d(\mu q^n,\nu q^n)\le\gamma^n\mathcal{W}_d(\mu,\nu)+\Delta\cdot(P_\mu[\exists k<n:X_k\notin O]+P_\nu[\exists k<n:X_k\notin O])\qquad(2.2)$$
for any $n\in\mathbb{N}$ and for any probability measures $\mu,\nu$ on $\mathcal{B}(S)$. Here $(X_n,P_\mu)$ and $(X_n,P_\nu)$ are Markov chains with transition kernel $q$ and initial distributions $\mu$, $\nu$, respectively.


Proof. Suppose that $\mu$ and $\nu$ are probability measures on $\mathcal{B}(S)$ and $\eta(dx\,d\tilde x)$ is a coupling of $\mu$ and $\nu$. We consider the coupling chain $(X_n,\tilde X_n)$ on $(\Omega,\mathcal{A},P)$ with initial distribution $\eta$ and transition kernel $c$. Since $(X_n)$ and $(\tilde X_n)$ are Markov chains with transition kernel $q$ and initial distributions $\mu$ and $\nu$, we have $P\circ X_n^{-1}=\mu q^n$ and $P\circ\tilde X_n^{-1}=\nu q^n$ for any $n\in\mathbb{N}$. Moreover, by (2.1),
$$E[d(X_n,\tilde X_n);(X_k,\tilde X_k)\in O\times O\ \forall k<n]=E\Bigl[\int d(x_n,\tilde x_n)\,c((X_{n-1},\tilde X_{n-1}),dx_n\,d\tilde x_n);(X_k,\tilde X_k)\in O\times O\ \forall k<n\Bigr]$$
$$\le\gamma\,E[d(X_{n-1},\tilde X_{n-1});(X_k,\tilde X_k)\in O\times O\ \forall k<n-1].$$
Therefore, by induction,
$$\mathcal{W}_d(\mu q^n,\nu q^n)\le E[d(X_n,\tilde X_n)]=E[d(X_n,\tilde X_n);(X_k,\tilde X_k)\in O\times O\ \forall k<n]+E[d(X_n,\tilde X_n);\exists k<n:(X_k,\tilde X_k)\notin O\times O]$$
$$\le\gamma^nE[d(X_0,\tilde X_0)]+\Delta\cdot P[\exists k<n:(X_k,\tilde X_k)\notin O\times O],$$
and taking the infimum over all couplings $\eta$ of $\mu$ and $\nu$ implies (2.2).

Remark 2.4. Theorem 2.3 may also be useful for studying local equilibration of a Markov chain within a metastable state. In fact, if $O$ is a region of the state space where the process stays with high probability for a long time, and if a contractivity condition holds on $O$, then the result can be used to bound the Kantorovich–Rubinstein–Wasserstein distance between the distribution after a finite number of steps and the stationary distribution conditioned to $O$.

From now on, we assume that we are given a Markovian coupling of the proposal distributions $p(x,\cdot)$, $x\in\mathbb{R}^d$, of a Metropolis–Hastings algorithm which is realized by product measurable functions $(x,\tilde x,\omega)\mapsto Y(x,\tilde x)(\omega)$, $\tilde Y(x,\tilde x)(\omega)$ on a probability space $(\Omega,\mathcal{A},P)$ such that
$$Y(x,\tilde x)\sim p(x,\cdot)\quad\text{and}\quad\tilde Y(x,\tilde x)\sim p(\tilde x,\cdot)\qquad\text{for any }x,\tilde x\in\mathbb{R}^d.$$
Let $\alpha(x,y)$ and $q(x,dy)$ again denote the acceptance probabilities and the transition kernel of the Metropolis–Hastings chain with stationary distribution $\mu$; cf. (1.10), (1.11) and (1.12). Moreover, suppose that $U$ is a uniformly distributed random variable with values in $(0,1)$ that is independent of $\{Y(x,\tilde x),\tilde Y(x,\tilde x):x,\tilde x\in\mathbb{R}^d\}$. Then the functions $(x,\tilde x,\omega)\mapsto W(x,\tilde x)(\omega)$, $\tilde W(x,\tilde x)(\omega)$


defined by
$$W(x,\tilde x):=\begin{cases}Y(x,\tilde x),&\text{if }U\le\alpha(x,Y(x,\tilde x)),\\ x,&\text{if }U>\alpha(x,Y(x,\tilde x)),\end{cases}$$
$$\tilde W(x,\tilde x):=\begin{cases}\tilde Y(x,\tilde x),&\text{if }U\le\alpha(\tilde x,\tilde Y(x,\tilde x)),\\ \tilde x,&\text{if }U>\alpha(\tilde x,\tilde Y(x,\tilde x)),\end{cases}$$
realize a Markovian coupling between the Metropolis–Hastings transition functions $q(x,\cdot)$, $x\in\mathbb{R}^d$, that is,
$$W(x,\tilde x)\sim q(x,\cdot)\quad\text{and}\quad\tilde W(x,\tilde x)\sim q(\tilde x,\cdot)$$
for any $x,\tilde x\in\mathbb{R}^d$. This coupling is optimal in the acceptance step in the sense that it minimizes the probability that a proposed move from $x$ to $Y(x,\tilde x)$ is accepted and the corresponding proposed move from $\tilde x$ to $\tilde Y(x,\tilde x)$ is rejected, or vice versa.

Lemma 2.5 (Basic contractivity lemma for MH kernels). For any $x,\tilde x\in\mathbb{R}^d$,
$$E[d(W(x,\tilde x),\tilde W(x,\tilde x))]\le E[d(Y(x,\tilde x),\tilde Y(x,\tilde x))]$$
$$+E[(d(x,\tilde x)-d(Y(x,\tilde x),\tilde Y(x,\tilde x)))\times\max(1-\alpha(x,Y(x,\tilde x)),1-\alpha(\tilde x,\tilde Y(x,\tilde x)))]$$
$$+E[d(x,Y(x,\tilde x))\cdot(\alpha(x,Y(x,\tilde x))-\alpha(\tilde x,\tilde Y(x,\tilde x)))^+]$$
$$+E[d(\tilde x,\tilde Y(x,\tilde x))\cdot(\alpha(x,Y(x,\tilde x))-\alpha(\tilde x,\tilde Y(x,\tilde x)))^-].$$

Proof. By the definition of $W$ and by the triangle inequality, we obtain the estimate
$$E[d(W(x,\tilde x),\tilde W(x,\tilde x))]\le E[d(Y(x,\tilde x),\tilde Y(x,\tilde x));U<\min(\alpha(x,Y(x,\tilde x)),\alpha(\tilde x,\tilde Y(x,\tilde x)))]$$
$$+d(x,\tilde x)\cdot P[U\ge\min(\alpha(x,Y(x,\tilde x)),\alpha(\tilde x,\tilde Y(x,\tilde x)))]$$
$$+E[d(x,Y(x,\tilde x));\alpha(\tilde x,\tilde Y(x,\tilde x))\le U<\alpha(x,Y(x,\tilde x))]$$
$$+E[d(\tilde x,\tilde Y(x,\tilde x));\alpha(x,Y(x,\tilde x))\le U<\alpha(\tilde x,\tilde Y(x,\tilde x))].$$
The assertion now follows by conditioning on $Y$ and $\tilde Y$.

Remark 2.6. (1) Note that the upper bound in Lemma 2.5 is close to an equality. Indeed, the only estimate in the proof is the triangle inequality that has been applied to bound $d(\tilde x,Y)$ by $d(x,\tilde x)+d(x,Y)$ and $d(x,\tilde Y)$ by $d(x,\tilde x)+d(\tilde x,\tilde Y)$.


(2) For the couplings and distances considered in this paper, $d(Y,\tilde Y)$ will always be deterministic. Therefore, the upper bound in the lemma simplifies to
$$E[d(W,\tilde W)]\le d(Y,\tilde Y)+(d(x,\tilde x)-d(Y,\tilde Y))\cdot E[\max(1-\alpha(x,Y),1-\alpha(\tilde x,\tilde Y))]\qquad(2.3)$$
$$+E[d(x,Y)(\alpha(x,Y)-\alpha(\tilde x,\tilde Y))^++d(\tilde x,\tilde Y)(\alpha(x,Y)-\alpha(\tilde x,\tilde Y))^-].$$
Here $E[\max(1-\alpha(x,Y),1-\alpha(\tilde x,\tilde Y))]$ is the probability that at least one of the proposals is rejected.

(3) If the metric $d$ is bounded with diameter $\Delta$, then the last two expectations in the upper bound in Lemma 2.5 can be estimated by $\Delta$ times the probability $E[|\alpha(x,Y)-\alpha(\tilde x,\tilde Y)|]$ that one of the proposals is rejected and the other one is accepted. Alternatively (and usually more efficiently), these terms can be estimated by Hölder's inequality.

3. Contractivity of the proposal step. In this section we assume $V\in C^2(\mathbb{R}^d)$. We study contractivity properties of the Metropolis–Hastings proposals defined in (1.23) and (1.27).

Note first that the Ornstein–Uhlenbeck proposals do not depend on $V$. For $h\in(0,2)$, the contractivity condition
$$\|Y_h^{OU}(x)-Y_h^{OU}(\tilde x)\|=\|(1-h/2)(x-\tilde x)\|=(1-h/2)\|x-\tilde x\|\qquad(3.1)$$
holds pointwise for any $x,\tilde x\in\mathbb{R}^d$ w.r.t. an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^d$. For the semi-implicit Euler proposals
$$Y_h(x)=x-\frac h2\nabla U(x)+\sqrt{h-\frac{h^2}4}\,Z,\qquad Z\sim\gamma^d,$$
Wasserstein contractivity does not necessarily hold. Close to optimal sufficient conditions for contractivity w.r.t. the minus norm can be obtained in a straightforward way by considering the derivative of $Y_h$ w.r.t. $x$.

Lemma 3.1. Let $h\in(0,2)$, and let $C$ be a convex subset of $\mathbb{R}^d$. If there exists a constant $\lambda\in(0,\infty)$ such that
$$\Bigl\|\Bigl(I_d-\frac h2\nabla^2U(x)\Bigr)\cdot\eta\Bigr\|_-\le\lambda\|\eta\|_-\qquad\text{for any }\eta\in\mathbb{R}^d,\ x\in C,\qquad(3.2)$$
then
$$\|Y_h(x)-Y_h(\tilde x)\|_-\le\lambda\|x-\tilde x\|_-\qquad\text{for any }x,\tilde x\in C.$$

Proof. If (3.2) holds, then
$$\|\partial_\eta Y_h(x)\|_-=\Bigl\|\eta-\frac h2\nabla^2U(x)\cdot\eta\Bigr\|_-\le\lambda\|\eta\|_-$$
for any $x\in C$ and $\eta\in\mathbb{R}^d$. Hence
$$\|Y_h(x)-Y_h(\tilde x)\|_-=\Bigl\|\int_0^1\frac d{dt}Y_h(tx+(1-t)\tilde x)\,dt\Bigr\|_-\le\int_0^1\|\partial_{x-\tilde x}Y_h(tx+(1-t)\tilde x)\|_-\,dt\le\lambda\|x-\tilde x\|_-$$
for $x,\tilde x\in C$.

Remark 3.2. (1) Note that condition (3.2) requires a bound on $\nabla^2U$ in both directions. This is in contrast to the continuous time case where a lower bound by a strictly positive constant is sufficient to guarantee contractivity of the derivative flow.

(2) Condition (3.2) is equivalent to
$$\xi\cdot\eta-\frac h2\partial^2_{\xi\eta}U(x)\le\lambda\|\xi\|_+\|\eta\|_-\qquad\text{for any }x\in C,\ \xi,\eta\in\mathbb{R}^d.\qquad(3.3)$$

Recall that for $R\in(0,\infty]$,
$$M(R)=\sup\{\|\nabla^2U(x)\cdot\eta\|_-:\eta\in B_1^-,x\in B_R^-\}.\qquad(3.4)$$
Hence $M(R)$ bounds the second derivative of $U$ on $B_R^-$ in both directions, whereas the constant $K$ in Assumption 1.14 is a strictly positive lower bound for the second derivative. We also define
$$N(R)=\sup\{\|\nabla^2V(x)\cdot\eta\|_-:\eta\in B_1^-,x\in B_R^-\}.\qquad(3.5)$$
Note that $M(R)\le1+N(R)$. As a consequence of Lemma 3.1 we obtain:

Proposition 3.3. For any $h\in(0,2)$ and $x,\tilde x\in B_R^-$,
$$\|Y_h(x)-Y_h(\tilde x)\|_-\le\Bigl(1-\frac{1-N(R)}2h\Bigr)\cdot\|x-\tilde x\|_-.\qquad(3.6)$$
Moreover, if Assumption 1.14 holds, then
$$\|Y_h(x)-Y_h(\tilde x)\|_-\le\Bigl(1-\frac K2h+\frac{M(R)^2}8h^2\Bigr)\cdot\|x-\tilde x\|_-.\qquad(3.7)$$

Proof. Note that for $z\in[x,\tilde x]$ and $\eta\in\mathbb{R}^d$,
$$\Bigl(I-\frac h2\nabla^2U(z)\Bigr)\cdot\eta=\Bigl(1-\frac h2\Bigr)\eta-\frac h2\nabla^2V(z)\cdot\eta.\qquad(3.8)$$
Therefore, by (3.5),
$$\Bigl\|\Bigl(I-\frac h2\nabla^2U(z)\Bigr)\cdot\eta\Bigr\|_-\le\Bigl(1-\frac h2\Bigr)\|\eta\|_-+\frac h2N(R)\cdot\|\eta\|_-.$$
Inequality (3.6) now follows by Lemma 3.1.


Moreover, if Assumption 1.14 holds, then for $z\in[x,\tilde x]$ and $\eta\in\mathbb{R}^d$,
$$\Bigl\|\Bigl(I-\frac h2\nabla^2U(z)\Bigr)\cdot\eta\Bigr\|_-^2=\|\eta\|_-^2-h\langle\eta,\nabla^2U(z)\cdot\eta\rangle+\frac{h^2}4\|\nabla^2U(z)\cdot\eta\|_-^2$$
$$\le(1-Kh+M(R)^2h^2/4)\|\eta\|_-^2\le(1-Kh/2+M(R)^2h^2/8)^2\|\eta\|_-^2.$$
Inequality (3.7) again follows by Lemma 3.1.
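A quick numerical sanity check of the contraction factor in (3.7) in the Euclidean case: if the (symmetric) Hessian of $U$ has all eigenvalues in $[K,M]$, the operator norm of $I-\frac h2\nabla^2U$ is at most $1-Kh/2+M^2h^2/8$. The constants below are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(5)
K, M, h, d = 0.3, 2.0, 0.25, 20

lam = rng.uniform(K, M, size=d)            # eigenvalues of the Hessian of U
factor = np.abs(1 - h * lam / 2).max()     # operator norm of I - (h/2) Hess U
bound = 1 - K * h / 2 + M**2 * h**2 / 8    # right-hand side of (3.7)
assert factor <= bound
print(factor, bound)
```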

4. Upper bounds for rejection probabilities. In this section we derive the upper bounds for the MH rejection probabilities stated in Proposition 1.7. As a first step we prove the explicit formula for the MALA acceptance probabilities w.r.t. explicit and semi-implicit Euler proposals stated in Proposition 1.3:

Proof of Proposition 1.3. For explicit Euler proposals with given step size $h>0$,
$$-\log\gamma^d(x)p_h^{\mathrm{Euler}}(x,y)=\frac12|x|^2+\frac1{2h}\Bigl|y-\Bigl(1-\frac h2\Bigr)x+\frac h2\nabla V(x)\Bigr|^2+C$$
$$=\frac1{2h}\Bigl(|y-x|^2+hx\cdot y+h(y-x)\cdot\nabla V(x)+\frac{h^2}4|x|^2+\frac{h^2}2x\cdot\nabla V(x)+\frac{h^2}4|\nabla V(x)|^2\Bigr)+C$$
$$=S(x,y)+\frac12(y-x)\cdot\nabla V(x)+\frac h8|x+\nabla V(x)|^2+C$$
with a normalization constant $C$ that does not depend on $x$ and $y$, and a symmetric function $S:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$. Therefore, by (1.19),
$$G_h^{\mathrm{Euler}}(x,y)=V(y)-V(x)+\log\gamma^d(x)p_h^{\mathrm{Euler}}(x,y)-\log\gamma^d(y)p_h^{\mathrm{Euler}}(y,x)$$
$$=V(y)-V(x)-(y-x)\cdot(\nabla V(y)+\nabla V(x))/2+h(|y+\nabla V(y)|^2-|x+\nabla V(x)|^2)/8.$$
Similarly, for semi-implicit Euler proposals we obtain
$$-\log\gamma^d(x)p_h(x,y)=\frac12|x|^2+\frac12\Bigl|y-\Bigl(1-\frac h2\Bigr)x+\frac h2\nabla V(x)\Bigr|^2\Big/\Bigl(h-\frac{h^2}4\Bigr)+C$$
$$=\frac12\Bigl(\Bigl(h-\frac{h^2}4\Bigr)|x|^2+\Bigl|y-\Bigl(1-\frac h2\Bigr)x+\frac h2\nabla V(x)\Bigr|^2\Bigr)\Big/\Bigl(h-\frac{h^2}4\Bigr)+C$$
$$=S(x,y)+\frac12(y-x)\cdot\nabla V(x)+\frac12\cdot\frac h{4-h}\bigl[(y+x)\cdot\nabla V(x)+|\nabla V(x)|^2\bigr]+C,$$
with a possibly different symmetric function $S(x,y)$, and, therefore,
$$G_h(x,y)=V(y)-V(x)-(y-x)\cdot(\nabla V(y)+\nabla V(x))/2$$
$$+\frac h{8-2h}\bigl[(y+x)\cdot(\nabla V(y)-\nabla V(x))+|\nabla V(y)|^2-|\nabla V(x)|^2\bigr].$$

From now on we assume that Assumption 1.5 holds. We will derive upper bounds for the functions $G_h(x,y)$ computed in Proposition 1.3. By (1.16), these directly imply corresponding upper bounds for the MALA rejection probabilities.

Let $\partial^n_{\xi_1,\dots,\xi_n}V(z)$ denote the $n$th-order directional derivative of the function $V$ at $z$ in directions $\xi_1,\dots,\xi_n$. By $\partial^nV$ we denote the $n$th-order differential of $V$, that is, the $n$-form $(\xi_1,\dots,\xi_n)\mapsto\partial^n_{\xi_1,\dots,\xi_n}V$. For $x,\tilde x\in\mathbb{R}^d$ and $n=1,2,3,4$ let
$$L_n(x,\tilde x)=\sup\{(\partial^n_{\xi_1,\dots,\xi_n}V)(z):z\in[x,\tilde x],\ \xi_1,\dots,\xi_n\in B_1^-\}.\qquad(4.1)$$
In other words,
$$L_n(x,\tilde x)=\sup_{z\in[x,\tilde x]}\|(\partial^nV)(z)\|_-^*,$$
where $\|\cdot\|_-^*$ is the dual norm on $n$-forms defined by
$$\|l\|_-^*=\sup\{l(\xi_1,\dots,\xi_n):\xi_1,\dots,\xi_n\in B_1^-\}.$$
In particular,
$$L_1(x,\tilde x)=\sup_{z\in[x,\tilde x]}\|\nabla V(z)\|_+.$$
By Assumption 1.5,
$$L_n(x,\tilde x)\le C_n\cdot\max(1,\|x\|_-,\|\tilde x\|_-)^{p_n}\qquad\forall x,\tilde x\in\mathbb{R}^d,\ n=1,2,3,4.\qquad(4.2)$$


We now derive upper bounds for the terms in the expression for $G_h(x,y)$ given in Proposition 1.3. We first express the leading order term in terms of third derivatives of $V$:

Lemma 4.1. For $x,y\in\mathbb{R}^d$,
$$V(y)-V(x)-\frac{y-x}2\cdot(\nabla V(y)+\nabla V(x))=-\frac12\int_0^1t(1-t)\,\partial^3_{y-x}V((1-t)x+ty)\,dt.$$

Proof. A second-order expansion for $f(t)=V(x+t(y-x))$, $t\in[0,1]$, yields
$$V(y)-V(x)=\int_0^1\partial_{y-x}V(x+t(y-x))\,dt=(y-x)\cdot\nabla V(x)+\int_0^1\int_0^t\partial^2_{y-x}V(x+s(y-x))\,ds\,dt$$
$$=(y-x)\cdot\nabla V(x)+\int_0^1(1-s)\,\partial^2_{y-x}V(x+s(y-x))\,ds,$$
and, similarly,
$$V(y)-V(x)=(y-x)\cdot\nabla V(y)-\int_0^1\int_t^1\partial^2_{y-x}V(x+s(y-x))\,ds\,dt=(y-x)\cdot\nabla V(y)-\int_0^1s\,\partial^2_{y-x}V(x+s(y-x))\,ds.$$
By averaging both equations, we obtain
$$V(y)-V(x)-\frac{y-x}2\cdot(\nabla V(y)+\nabla V(x))=\frac12\int_0^1(1-2s)\,\partial^2_{y-x}V(x+s(y-x))\,ds$$
$$=-\frac12\int_0^1t(1-t)\,\partial^3_{y-x}V(x+t(y-x))\,dt.$$
Here we have used that for any function $g\in C^1([0,1])$,
$$\int_0^1(1-2s)g(s)\,ds=\int_0^1(1-2s)\int_0^sg'(t)\,dt\,ds=\int_0^1\int_t^1(1-2s)\,ds\,g'(t)\,dt=-\int_0^1t(1-t)g'(t)\,dt.$$


Lemma 4.2. For $x,y\in\mathbb{R}^d$, the following estimates hold:
(1) $|V(y)-V(x)|\le L_1(x,y)\cdot\|y-x\|_-$;
(2) $|V(y)-V(x)-\frac{y-x}2\cdot(\nabla V(y)+\nabla V(x))|\le\frac1{12}L_3(x,y)\cdot\|y-x\|_-^3$;
(3) $|(\nabla U(y)+\nabla U(x))\cdot(\nabla V(y)-\nabla V(x))|\le L_2(x,y)\cdot\|\nabla U(y)+\nabla U(x)\|_-\cdot\|y-x\|_-$;
(4) $\|\nabla U(y)+\nabla U(x)\|_-\le2\|\nabla U(x)\|_-+(1+L_2(x,y))\cdot\|y-x\|_-$.

Remark 4.3. The estimates in Lemma 4.2 provide a bound for the terms in the expression (1.32) for $G_h(x,y)$ in the case of semi-implicit Euler proposals. For explicit Euler proposals, one also has to bound the term
$$|\nabla U(y)|^2-|\nabla U(x)|^2=|y+\nabla V(y)|^2-|x+\nabla V(x)|^2.$$
Note that even when $V$ vanishes, this term cannot be controlled in terms of $\|\cdot\|_-$ in general. A valid upper bound is
$$|\nabla U(y)+\nabla U(x)|\cdot|y-x|+L_2(x,y)\|\nabla U(y)+\nabla U(x)\|_-\cdot\|y-x\|_-.$$

Proof of Lemma 4.2. By Lemma 4.1 and by definition of $L_n(x,y)$, we obtain
$$|V(y)-V(x)|\le\sup_{z\in[x,y]}|\partial_{y-x}V(z)|\le L_1(x,y)\cdot\|y-x\|_-,$$
$$\Bigl|V(y)-V(x)-\frac{y-x}2\cdot(\nabla V(y)+\nabla V(x))\Bigr|\le\frac12\int_0^1t(1-t)\,dt\,\sup_{z\in[x,y]}|\partial^3_{y-x}V(z)|\le\frac1{12}L_3(x,y)\cdot\|y-x\|_-^3,$$
$$(\nabla U(y)+\nabla U(x))\cdot(\nabla V(y)-\nabla V(x))=\partial_{\nabla U(y)+\nabla U(x)}V(y)-\partial_{\nabla U(y)+\nabla U(x)}V(x)$$
$$=\int_0^1\partial^2_{\nabla U(y)+\nabla U(x),\,y-x}V((1-t)x+ty)\,dt\le L_2(x,y)\cdot\|\nabla U(y)+\nabla U(x)\|_-\cdot\|y-x\|_-.$$
Moreover, the estimate
$$\|\nabla U(y)+\nabla U(x)\|_-\le2\|\nabla U(x)\|_-+\|\nabla U(y)-\nabla U(x)\|_-\le2\|\nabla U(x)\|_-+\|y-x\|_-+\|\nabla V(y)-\nabla V(x)\|_-$$
$$\le2\|\nabla U(x)\|_-+(1+L_2(x,y))\cdot\|y-x\|_-$$
holds by definition of $L_2(x,y)$ and since
$$\|\nabla V(y)-\nabla V(x)\|_-\le|\nabla V(y)-\nabla V(x)|=\sup_{|\xi|=1}(\partial_\xi V(y)-\partial_\xi V(x))\le\sup_{\|\xi\|_-\le1}(\partial_\xi V(y)-\partial_\xi V(x)).$$


Recalling the definitions of $Y_h^{OU}(x)$ and $Y_h(x)$ from (1.23) and (1.27), we obtain:

Lemma 4.4. For $x\in\mathbb{R}^d$, $h\in(0,2)$ and $n\in\{1,2,3,4\}$ with $p_n\ge1$, we have:
(1) $\|Y_h^{OU}(x)-x\|_-\le\frac h2\|x\|_-+\sqrt h\|Z\|_-$;
(2) $\|Y_h(x)-x\|_-\le\frac h2\|\nabla U(x)\|_-+\sqrt h\|Z\|_-$;
(3) $\|Y_h^{OU}(x)\|_-\le(1-\frac h2)\|x\|_-+\sqrt h\|Z\|_-$;
(4) $\|Y_h(x)\|_-\le\|x\|_-+\frac h2\|\nabla U(x)\|_-+\sqrt h\|Z\|_-$;
(5) $L_n(x,Y_h^{OU}(x))\le C_n2^{p_n-1}(\max(1,\|x\|_-)^{p_n}+h^{p_n/2}\|Z\|_-^{p_n})$;
(6) $L_n(x,Y_h(x))\le C_n3^{p_n-1}(\max(1,\|x\|_-)^{p_n}+(\frac h2)^{p_n}\|\nabla U(x)\|_-^{p_n}+h^{p_n/2}\|Z\|_-^{p_n})$.

Proof. Estimates (1)–(4) are direct consequences of the triangle inequality. Moreover, by (3) and (4),
$$\max(1,\|x\|_-,\|Y_h^{OU}(x)\|_-)\le\max(1,\|x\|_-)+\sqrt h\|Z\|_-$$
and
$$\max(1,\|x\|_-,\|Y_h(x)\|_-)\le\max(1,\|x\|_-)+\frac h2\|\nabla U(x)\|_-+\sqrt h\|Z\|_-.$$
Estimates (5) and (6) now follow from (4.2) and Hölder's inequality.

We now combine the estimates in Lemmas 4.2 and 4.4 in order to prove Proposition 1.7 and the first part of Proposition 1.11:

Proof of Proposition 1.7. By (1.16) and Proposition 1.3, for $h \in (0,2)$,
\[
E[(1-\alpha_h(x,Y_h(x)))^k]^{1/k} \le \|G_h(x,Y_h(x))^+\|_{L^k} \le I + \frac{h}{4}\,II, \tag{4.3}
\]

where
\[
I = \Bigl\|V(Y_h(x)) - V(x) - \frac{Y_h(x)-x}{2}\cdot(\nabla V(Y_h(x)) + \nabla V(x))\Bigr\|_{L^k},
\]
\[
II = \|(\nabla U(Y_h(x)) + \nabla U(x))\cdot(\nabla V(Y_h(x)) - \nabla V(x))\|_{L^k}.
\]

By Lemma 4.2,
\[
I \le \frac{1}{12}\,E[L_3(x,Y_h(x))^k\cdot\|Y_h(x)-x\|_-^{3k}]^{1/k} \quad\text{and} \tag{4.4}
\]
\[
II \le E[L_2(x,Y_h(x))^k\cdot\|Y_h(x)-x\|_-^k\cdot(2\|\nabla U(x)\|_- + (1+L_2(x,Y_h(x)))\cdot\|Y_h(x)-x\|_-)^k]^{1/k}. \tag{4.5}
\]
The assertion of Proposition 1.7 for semi-implicit Euler proposals is now a direct consequence of the estimates (2) and (6) in Lemma 4.4. The assertion for Ornstein–Uhlenbeck proposals follows similarly from (1.21), the estimate (1) in Lemma 4.2 and (1), (3) and (5) in Lemma 4.4.

It is possible to write down the polynomial in Proposition 1.7 explicitly. For semi-implicit Euler proposals, we illustrate this in the case $k=1$ and $p_2 = p_3 = 0$. Here, by (4.4) and (4.5), we obtain
\[
I \le \frac{C_3}{12}\,E[(h\|\nabla U(x)\|_-/2 + \sqrt{h}\,\|Z\|_-)^3]
\le \frac{C_3}{4}(h^3\|\nabla U(x)\|_-^3/8 + h^{3/2}m_3),
\]
\[
II \le C_2\,E[(h\|\nabla U(x)\|_-/2 + \sqrt{h}\,\|Z\|_-)\cdot(2\|\nabla U(x)\|_- + (1+C_2)(h\|\nabla U(x)\|_-/2 + \sqrt{h}\,\|Z\|_-))]
\le C_2\Bigl(h\|\nabla U(x)\|_-^2 + 2\sqrt{h}\,\|\nabla U(x)\|_-m_1 + (1+C_2)\Bigl(\frac{h^2}{2}\|\nabla U(x)\|_-^2 + 2hm_2\Bigr)\Bigr).
\]
Hence by (4.3),
\[
E[1-\alpha_h(x,Y_h(x))] \le h^{3/2}\cdot\Bigl(\frac14 C_3m_3 + \frac12 C_2m_1\|\nabla U(x)\|_-\Bigr)
+ h^2\cdot\Bigl(\frac14 C_2\|\nabla U(x)\|_-^2 + \frac12 C_2(1+C_2)m_2\Bigr)
+ h^3\cdot\Bigl(\frac{1}{32}C_3\|\nabla U(x)\|_-^3 + \frac18 C_2(1+C_2)\|\nabla U(x)\|_-^2\Bigr). \tag{4.6}
\]
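The leading $h^{3/2}$ term in (4.6) can be observed numerically. The sketch below (not part of the paper) estimates the mean rejection probability of the MALA chain with semi-implicit Euler proposals for a hypothetical one-dimensional example $U(x) = x^2/2 + \log\cosh x$, using the Gaussian proposal density in the Metropolis–Hastings ratio; the printed ratios $E[1-\alpha_h]/h^{3/2}$ should stay roughly constant as $h$ decreases:

```python
# Monte Carlo illustration (not from the paper): decay of the mean rejection
# probability E[1 - alpha_h(x, Y_h(x))] for MALA with semi-implicit Euler
# proposals Y_h(x) = x - (h/2) U'(x) + sqrt(h - h^2/4) Z, Z ~ N(0,1).
# Hypothetical 1d example: U(x) = x^2/2 + log cosh(x).
import numpy as np

rng = np.random.default_rng(1)
U = lambda x: 0.5 * x**2 + np.log(np.cosh(x))
dU = lambda x: x + np.tanh(x)

def mean_rejection(x, h, n=200_000):
    s2 = h - h**2 / 4                                  # proposal variance
    y = x - 0.5 * h * dU(x) + np.sqrt(s2) * rng.normal(size=n)
    fwd = -(y - (x - 0.5 * h * dU(x)))**2 / (2 * s2)   # log p_h(x, y)
    bwd = -(x - (y - 0.5 * h * dU(y)))**2 / (2 * s2)   # log p_h(y, x)
    return np.mean(1 - np.minimum(1.0, np.exp(U(x) - U(y) + bwd - fwd)))

x = 0.5
for h in [0.4, 0.2, 0.1, 0.05]:
    r = mean_rejection(x, h)
    print(f"h = {h:4.2f}   E[1 - alpha] = {r:.2e}   ratio to h^1.5 = {r / h**1.5:.3f}")
```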

For Ornstein–Uhlenbeck proposals, we derive the explicit bound for the rejection probabilities stated in Proposition 1.11 in the case $k=1$ and $p_2 = 0$.

Proof of Proposition 1.11, first part. If $p_2 = 0$, then for any $x \in \mathbb{R}^d$,
\[
\|\nabla V(x)\|_+ \le \|\nabla V(0)\|_+ + \|\nabla V(x) - \nabla V(0)\|_+ \le C_1 + C_2\cdot\|x\|_-. \tag{4.7}
\]
Therefore, for any $x, y \in \mathbb{R}^d$,
\[
|V(y) - V(x)| \le (C_1 + C_2\cdot\max(\|x\|_-, \|y\|_-))\cdot\|y-x\|_-.
\]
Hence, by (1.21) and by (1) and (3) in Lemma 4.4,
\[
E[1-\alpha^{\mathrm{OU}}(x, Y^{\mathrm{OU}}_h(x))] \le E[(V(Y^{\mathrm{OU}}_h(x)) - V(x))^+]
\le E[(C_1 + C_2\cdot\max(\|x\|_-, \|Y^{\mathrm{OU}}_h(x)\|_-))\cdot\|Y^{\mathrm{OU}}_h(x) - x\|_-]
\le E[(C_1 + C_2\cdot(\|x\|_- + \sqrt{h}\,\|Z\|_-))\cdot(h\|x\|_-/2 + \sqrt{h}\,\|Z\|_-)]
= m_1(C_1 + C_2\|x\|_-)\cdot h^{1/2} + \frac12(2m_2C_2 + C_1\|x\|_- + C_2\|x\|_-^2)\cdot h + \frac12 m_1C_2\|x\|_-\cdot h^{3/2}.
\]
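The $h^{1/2}$ leading order is easy to see in simulation. Since $(1-h/2)^2 + (h-h^2/4) = 1$, the Ornstein–Uhlenbeck proposal $Y^{\mathrm{OU}}_h(x) = (1-h/2)x + \sqrt{h-h^2/4}\,Z$ preserves the Gaussian reference measure, so that, assuming the Metropolis form $\alpha = \min(1, e^{-G})$ with $G^{\mathrm{OU}}(x,y) = V(y)-V(x)$ from (1.21), the acceptance probability reduces to $\min(1, \exp(V(x)-V(y)))$. A minimal sketch (not part of the paper, hypothetical one-dimensional example):

```python
# Sketch (not from the paper) of Metropolis-Hastings with Ornstein-Uhlenbeck
# proposals for a hypothetical 1d target mu proportional to exp(-V) gamma,
# gamma = N(0,1), V(x) = log cosh(x). The proposal preserves gamma, so the
# acceptance probability is min(1, exp(V(x) - V(y))); the mean rejection
# probability decays only like h^{1/2}.
import numpy as np

rng = np.random.default_rng(2)
V = lambda x: np.log(np.cosh(x))

def mean_rejection_ou(x, h, n=200_000):
    y = (1 - h / 2) * x + np.sqrt(h - h**2 / 4) * rng.normal(size=n)
    return np.mean(1 - np.minimum(1.0, np.exp(V(x) - V(y))))

x = 0.5
for h in [0.4, 0.2, 0.1, 0.05]:
    r = mean_rejection_ou(x, h)
    print(f"h = {h:4.2f}   E[1 - alpha_OU] = {r:.2e}   ratio to h^0.5 = {r / h**0.5:.3f}")
```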

5. Dependence of rejection on the current state. We now derive estimates for the derivatives of the functions
\[
F_h(x,w) = G_h\Bigl(x, x - \frac{h}{2}\nabla U(x) + w\Bigr), \tag{5.1}
\]
\[
F^{\mathrm{OU}}_h(x,w) = G^{\mathrm{OU}}\Bigl(x, x - \frac{h}{2}x + w\Bigr), \qquad (x,w) \in \mathbb{R}^d\times\mathbb{R}^d, \tag{5.2}
\]

w.r.t. $x$. Since
\[
G_h(x, Y_h(x)) = F_h\Bigl(x, \sqrt{h - \frac{h^2}{4}}\,Z\Bigr) \quad\text{with } Z\sim\gamma^d, \text{ and} \tag{5.3}
\]
\[
G^{\mathrm{OU}}(x, Y^{\mathrm{OU}}_h(x)) = F^{\mathrm{OU}}_h\Bigl(x, \sqrt{h - \frac{h^2}{4}}\,Z\Bigr) \quad\text{with } Z\sim\gamma^d, \tag{5.4}
\]
these estimates can then be applied to control the dependence of rejection on the current state $x$.

For Ornstein–Uhlenbeck proposals, by (1.21), we immediately obtain
\[
\nabla_x F^{\mathrm{OU}}_h(x,w) = \Bigl(1-\frac{h}{2}\Bigr)(\nabla V(y) - \nabla V(x)) - \frac{h}{2}\nabla V(x), \tag{5.5}
\]
where $y := (1-\frac{h}{2})x + w$.

For semi-implicit Euler proposals, the formula for the derivative is more involved. To simplify the notation, we set, for $x\in\mathbb{R}^d$ and fixed $h\in(0,2)$,
\[
x' := x - \frac{h}{2}\nabla U(x). \tag{5.6}
\]
In the sequel, we use the conventions
\[
v\cdot w = \sum_{i=1}^d v_iw_i,\qquad (v\cdot T)_j = \sum_{i=1}^d v_iT_{i,j},\qquad (T\cdot v)_i = \sum_{j=1}^d T_{i,j}v_j,\qquad (S\cdot T)_{i,k} = \sum_{j=1}^d S_{i,j}T_{j,k}
\]
for vectors $v,w\in\mathbb{R}^d$ and $(2,0)$ tensors $S,T\in\mathbb{R}^d\otimes\mathbb{R}^d$. In particular,
\[
v\cdot(S\cdot T) = (v\cdot S)\cdot T,
\]
that is, the brackets may be omitted. We now give an explicit expression for the derivative of $F_h(x,w)$ w.r.t. the first variable:


Proposition 5.1. Suppose $V\in C^2(\mathbb{R}^d)$. Then for any $x,w\in\mathbb{R}^d$,
\[
\nabla_x F_h(x,w) = \nabla V(y) - \nabla V(x) - \frac{y-x}{2}\cdot(\nabla^2V(y) + \nabla^2V(x))
+ \frac{h}{4}(y-x)\cdot\nabla^2V(y)\cdot(I_d + \nabla^2V(x))
+ \frac{h}{8-2h}(\nabla V(y) - \nabla V(x) + \nabla U(y) + \nabla U(x))\cdot(\nabla^2V(y) - \nabla^2V(x))
- \frac{h^2}{16-4h}(\nabla V(y) - \nabla V(x) + \nabla U(y) + \nabla U(x))\cdot\nabla^2V(y)\cdot(I_d + \nabla^2V(x))
\]
with $y := x' + w = x - \frac{h}{2}\nabla U(x) + w$.

Proof. Let
\[
W(x) := \nabla V(x) = \nabla U(x) - x, \qquad x\in\mathbb{R}^d.
\]
By Proposition 1.3,
\[
F_h(x,w) = A_h(x,w) + \frac{h}{8-2h}B_h(x,w) \tag{5.7}
\]
for any $x,w\in\mathbb{R}^d$, where
\[
A_h(x,w) := V(x'+w) - V(x) - \frac{x'+w-x}{2}\cdot(\nabla V(x'+w) + \nabla V(x)) \quad\text{and}
\]
\[
B_h(x,w) := (\nabla U(x'+w) + \nabla U(x))\cdot(\nabla V(x'+w) - \nabla V(x)).
\]

Noting that by (5.6),
\[
x - x' = \frac{h}{2}\nabla U(x) = \frac{h}{2}x + \frac{h}{2}\nabla V(x), \tag{5.8}
\]
\[
\nabla_x(x - x') = \frac{h}{2}\nabla^2U(x) = \frac{h}{2}I_d + \frac{h}{2}\nabla^2V(x) \quad\text{and} \tag{5.9}
\]
\[
\nabla_x x' = I_d - \frac{h}{2}\nabla^2U(x) = \Bigl(1-\frac{h}{2}\Bigr)I_d - \frac{h}{2}\nabla^2V(x), \tag{5.10}
\]

we obtain, with $y = x'+w$,
\[
\nabla_x A_h(x,w) = W(x'+w)\cdot\Bigl(I_d - \frac{h}{2}\nabla^2U(x)\Bigr) - W(x)
- \frac{x'+w-x}{2}\cdot\Bigl(\nabla W(x'+w)\cdot\Bigl(I_d - \frac{h}{2}\nabla^2U(x)\Bigr) + \nabla W(x)\Bigr)
+ \frac{h}{4}(W(x'+w) + W(x))\cdot\nabla^2U(x)
\]
\[
= W(y) - W(x) - \frac{y-x}{2}\cdot(\nabla W(y) + \nabla W(x))
- \frac{h}{4}(W(y) - W(x))\cdot\nabla^2U(x) + \frac{h}{4}(y-x)\cdot(\nabla W(y)\cdot\nabla^2U(x)),
\]
\[
\nabla_x B_h(x,w) = (W(x'+w) - W(x))\cdot\Bigl(\nabla^2U(x'+w)\cdot\Bigl(I_d - \frac{h}{2}\nabla^2U(x)\Bigr) + \nabla^2U(x)\Bigr)
+ (\nabla U(x'+w) + \nabla U(x))\cdot\Bigl(\nabla W(x'+w)\cdot\Bigl(I_d - \frac{h}{2}\nabla^2U(x)\Bigr) - \nabla W(x)\Bigr)
\]
\[
= (W(y) - W(x))\cdot(\nabla^2U(y) + \nabla^2U(x)) + (\nabla U(y) + \nabla U(x))\cdot(\nabla W(y) - \nabla W(x))
- \frac{h}{2}(W(y) - W(x))\cdot(\nabla^2U(y)\cdot\nabla^2U(x)) - \frac{h}{2}(\nabla U(y) + \nabla U(x))\cdot(\nabla W(y)\cdot\nabla^2U(x)).
\]

In total, we obtain
\[
\nabla_x F_h(x,w) = \nabla_x A_h(x,w) + \frac{h}{8-2h}\nabla_x B_h(x,w)
\]
\[
= W(y) - W(x) - \frac{y-x}{2}\cdot(\nabla W(y) + \nabla W(x))
+ \frac{h}{8-2h}(W(y) - W(x))\cdot(\nabla^2U(y) - \nabla^2U(x))
+ \frac{h}{4}(y-x)\cdot\nabla W(y)\cdot\nabla^2U(x)
+ \frac{h}{8-2h}(\nabla U(y) + \nabla U(x))\cdot(\nabla W(y) - \nabla W(x))
+ \Bigl(\frac{2h}{8-2h} - \frac{h}{4}\Bigr)(W(y) - W(x))\cdot\nabla^2U(x)
- \frac{h^2}{16-4h}(W(y) - W(x))\cdot\nabla^2U(y)\cdot\nabla^2U(x)
- \frac{h^2}{16-4h}(\nabla U(y) + \nabla U(x))\cdot\nabla W(y)\cdot\nabla^2U(x)
\]
\[
= W(y) - W(x) - \frac{y-x}{2}\cdot(\nabla W(y) + \nabla W(x))
+ \frac{h}{4}(y-x)\cdot\nabla W(y)\cdot\nabla^2U(x)
+ \frac{h}{8-2h}(W(y) - W(x) + \nabla U(y) + \nabla U(x))\cdot(\nabla W(y) - \nabla W(x))
- \frac{h^2}{16-4h}((W(y) - W(x))\cdot(\nabla^2U(y) - I_d) + (\nabla U(y) + \nabla U(x))\cdot\nabla W(y))\cdot\nabla^2U(x),
\]
where in the last step we have used that $\frac{2h}{8-2h} - \frac{h}{4} = \frac{h^2}{16-4h}$ and that
\[
\nabla^2U = I_d + \nabla^2V = I_d + \nabla W.
\]
The assertion follows by applying this identity to the remaining $\nabla^2U$ terms as well.
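The resulting formula can be verified numerically against a difference quotient. The following sketch (not part of the paper) does this in dimension one for the hypothetical choice $V(x) = \cos x$, implementing $F_h$ via (5.7):

```python
# Numerical check (not from the paper) of Proposition 5.1 in dimension one,
# for the hypothetical choice V(x) = cos(x), so U(x) = x^2/2 + cos(x).
import numpy as np

V, dV, d2V = np.cos, lambda x: -np.sin(x), lambda x: -np.cos(x)
dU = lambda x: x + dV(x)
h = 0.5

def F(x, w):
    # F_h(x, w) = A_h(x, w) + h/(8 - 2h) * B_h(x, w), cf. (5.7)
    y = x - 0.5 * h * dU(x) + w
    A = V(y) - V(x) - 0.5 * (y - x) * (dV(y) + dV(x))
    B = (dU(y) + dU(x)) * (dV(y) - dV(x))
    return A + h / (8 - 2 * h) * B

def dF(x, w):
    # right-hand side of Proposition 5.1 (all products are scalar in 1d)
    y = x - 0.5 * h * dU(x) + w
    s = dV(y) - dV(x) + dU(y) + dU(x)
    return (dV(y) - dV(x) - 0.5 * (y - x) * (d2V(y) + d2V(x))
            + h / 4 * (y - x) * d2V(y) * (1 + d2V(x))
            + h / (8 - 2 * h) * s * (d2V(y) - d2V(x))
            - h**2 / (16 - 4 * h) * s * d2V(y) * (1 + d2V(x)))

x, w, eps = 0.3, 0.7, 1e-6
fd = (F(x + eps, w) - F(x - eps, w)) / (2 * eps)  # central difference in x
print(fd, dF(x, w))  # the two numbers should agree to roughly 1e-9
```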

Similarly to Lemma 4.2 above, we now derive bounds for the individual summands in the expressions for $\nabla_x F^{\mathrm{OU}}_h$ and $\nabla_x F_h$ in (5.5) and Proposition 5.1.

Lemma 5.2. For $V\in C^4(\mathbb{R}^d)$ and $x,y\in\mathbb{R}^d$, the following estimates hold:

(1) $\|\nabla V(y)-\nabla V(x)\|_+ \le L_2(x,y)\cdot\|y-x\|_-$;
(2) $\|\nabla V(y)-\nabla V(x) - \frac{y-x}{2}\cdot(\nabla^2V(y)+\nabla^2V(x))\|_+ \le L_4(x,y)\cdot\|y-x\|_-^3/12$;
(3) $\|(y-x)\cdot\nabla^2V(y)\cdot(I_d+\nabla^2V(x))\|_+ \le L_2(y,y)\cdot(1+L_2(x,x))\cdot\|y-x\|_-$;
(4) $\|(\nabla V(y)-\nabla V(x)+\nabla U(y)+\nabla U(x))\cdot(\nabla^2V(y)-\nabla^2V(x))\|_+ \le L_3(x,y)\cdot(L_2(x,y)\|y-x\|_- + \|\nabla U(y)+\nabla U(x)\|_-)\cdot\|y-x\|_-$;
(5) $\|(\nabla V(y)-\nabla V(x)+\nabla U(y)+\nabla U(x))\cdot\nabla^2V(y)\cdot(I_d+\nabla^2V(x))\|_+ \le L_2(y,y)\cdot(L_2(x,y)\|y-x\|_- + \|\nabla U(y)+\nabla U(x)\|_-)\cdot(1+L_2(x,x))$.

Proof. (1) For any $\xi\in\mathbb{R}^d$, we have
\[
|\partial_\xi V(y) - \partial_\xi V(x)| \le \sup_{z\in[x,y]}|\partial^2_{y-x,\xi}V(z)| \le L_2(x,y)\|x-y\|_-\|\xi\|_-.
\]
This proves (1) by the definition of $\|\cdot\|_+$.

(2) By Lemma 4.1 applied to $\partial_\xi V$,
\[
\Bigl|\partial_\xi V(y) - \partial_\xi V(x) - \frac{y-x}{2}\cdot(\nabla\partial_\xi V(y) + \nabla\partial_\xi V(x))\Bigr|
\le \frac12\int_0^1 t(1-t)\,dt\cdot\sup_{z\in[x,y]}|\partial^3_{y-x}\partial_\xi V(z)|
\le \frac{1}{12}L_4(x,y)\|x-y\|_-^3\|\xi\|_-.
\]

(3) For $v,w\in\mathbb{R}^d$, we have
\[
|v\cdot\nabla^2V(y)\cdot w| = |\partial^2_{v,w}V(y)| \le L_2(x,y)\|v\|_-\|w\|_-. \tag{5.11}
\]
Since $\|\cdot\|_-$ is weaker than the Euclidean norm, we obtain
\[
|(y-x)\cdot\nabla^2V(y)\cdot(I+\nabla^2V(x))\cdot\xi| \le L_2(y,y)\|y-x\|_-\|(I+\nabla^2V(x))\cdot\xi\|_-
\le L_2(y,y)\|y-x\|_-(1+L_2(x,x))\|\xi\|_-.
\]

(4), (5) For $v,w\in\mathbb{R}^d$,
\[
|v\cdot(\nabla^2V(y)-\nabla^2V(x))\cdot w| = \Bigl|\int_0^1 \partial^3_{y-x,v,w}V((1-t)x+ty)\,dt\Bigr| \le L_3(x,y)\|y-x\|_-\|v\|_-\|w\|_-.
\]
Therefore,
\[
|(\nabla V(y)-\nabla V(x)+\nabla U(y)+\nabla U(x))\cdot(\nabla^2V(y)-\nabla^2V(x))\cdot\xi|
\le L_3(x,y)\|y-x\|_-\cdot(\|\nabla V(y)-\nabla V(x)\|_- + \|\nabla U(y)+\nabla U(x)\|_-)\cdot\|\xi\|_-
\le L_3(x,y)\|y-x\|_-\cdot(L_2(x,y)\|y-x\|_- + \|\nabla U(y)+\nabla U(x)\|_-)\cdot\|\xi\|_-,
\]
and, correspondingly,
\[
|(\nabla V(y)-\nabla V(x)+\nabla U(y)+\nabla U(x))\cdot\nabla^2V(y)\cdot(I+\nabla^2V(x))\cdot\xi|
\le L_2(y,y)\cdot(L_2(x,y)\|y-x\|_- + \|\nabla U(y)+\nabla U(x)\|_-)\cdot(1+L_2(x,x))\|\xi\|_-.
\]

By combining Proposition 5.1 with the estimates in Lemma 5.2 and Lemma 4.4, we will now prove Proposition 1.9.

Proof of Proposition 1.9. Fix $h\in(0,2)$. By (1.17) and (1.18), for any $x,\tilde{x}\in\mathbb{R}^d$,
\[
\|\alpha_h(x,Y_h(x)) - \alpha_h(\tilde{x},Y_h(\tilde{x}))\|_{L^k}
\le \|G_h(x,Y_h(x)) - G_h(\tilde{x},Y_h(\tilde{x}))\|_{L^k}
\le \|x-\tilde{x}\|_-\cdot\sup_{z\in[x,\tilde{x}]}\|\|\nabla_xG_h(z,Y_h(z))\|_+\|_{L^k}. \tag{5.12}
\]

Moreover, by (5.3) and Proposition 5.1,
\[
\|\|\nabla_xG_h(x,Y_h(x))\|_+\|_{L^k} = \|\|\nabla_xF_h(x,\sqrt{h-h^2/4}\,Z)\|_+\|_{L^k}
\le I + \frac{h}{4}\cdot II + \frac{h}{8-2h}\cdot III + \frac{h^2}{16-4h}\cdot IV, \tag{5.13}
\]
where
\[
I = E[\|\nabla V(Y_h(x)) - \nabla V(x) - \tfrac12(Y_h(x)-x)\cdot(\nabla^2V(Y_h(x)) + \nabla^2V(x))\|_+^k]^{1/k},
\]
\[
II = E[\|(Y_h(x)-x)\cdot\nabla^2V(Y_h(x))\cdot(I+\nabla^2V(x))\|_+^k]^{1/k},
\]
\[
III = E[\|(\nabla V(Y_h(x))-\nabla V(x)+\nabla U(Y_h(x))+\nabla U(x))\cdot(\nabla^2V(Y_h(x))-\nabla^2V(x))\|_+^k]^{1/k},
\]
\[
IV = E[\|(\nabla V(Y_h(x))-\nabla V(x)+\nabla U(Y_h(x))+\nabla U(x))\cdot\nabla^2V(Y_h(x))\cdot(I+\nabla^2V(x))\|_+^k]^{1/k}.
\]

By applying the estimates from Lemma 5.2 and Lemma 4.2(4), we obtain
\[
I \le \frac{1}{12}E[L_4(x,Y_h(x))^k\|Y_h(x)-x\|_-^{3k}]^{1/k}, \tag{5.14}
\]
\[
II \le (1+L_2(x,x))\cdot E[L_2(Y_h(x),Y_h(x))^k\|Y_h(x)-x\|_-^k]^{1/k}, \tag{5.15}
\]
\[
III \le E[L_3(x,Y_h(x))^k\|Y_h(x)-x\|_-^k\cdot((1+2L_2(x,Y_h(x)))\|Y_h(x)-x\|_- + 2\|\nabla U(x)\|_-)^k]^{1/k}, \tag{5.16}
\]
\[
IV \le (1+L_2(x,x))\cdot E[L_2(Y_h(x),Y_h(x))^k\cdot((1+L_2(x,Y_h(x)))\|Y_h(x)-x\|_- + 2\|\nabla U(x)\|_-)^k]^{1/k}. \tag{5.17}
\]

The assertion for semi-implicit Euler proposals is now a direct consequence of the estimates in Lemma 4.4, (4.2) and (5.12). The assertion for Ornstein–Uhlenbeck proposals follows in a similar way from (5.5), Lemma 5.2(1) and the estimates in Lemma 4.4.

Again, it is possible to write down the polynomial in Proposition 1.9 explicitly. For semi-implicit Euler proposals, we illustrate this in the case $k=1$ and $p_2=p_3=p_4=0$. Here, by (5.14), (5.15), (5.16) and (5.17), we obtain
\[
I \le \frac{C_4}{12}E\Bigl[\Bigl(\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,\|Z\|_-\Bigr)^3\Bigr]
\le \frac{C_4}{4}\Bigl(\frac{h^3}{8}\|\nabla U(x)\|_-^3 + h^{3/2}m_3\Bigr),
\]
\[
II \le (C_2+C_2^2)\,E\Bigl[\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,\|Z\|_-\Bigr]
= (C_2+C_2^2)\Bigl(\frac{h}{2}\|\nabla U(x)\|_- + h^{1/2}m_1\Bigr),
\]
\[
III \le C_3\Bigl(2\|\nabla U(x)\|_-\cdot E\Bigl[\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,\|Z\|_-\Bigr]
+ (1+2C_2)\,E\Bigl[\Bigl(\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,\|Z\|_-\Bigr)^2\Bigr]\Bigr)
\le 2C_3\|\nabla U(x)\|_-\Bigl(\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,m_1\Bigr)
+ C_3(1+2C_2)\Bigl(\frac{h^2}{2}\|\nabla U(x)\|_-^2 + 2hm_2\Bigr),
\]
\[
IV \le (1+C_2)C_2\Bigl((1+C_2)\,E\Bigl[\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,\|Z\|_-\Bigr] + 2\|\nabla U(x)\|_-\Bigr)
\le 2(1+C_2)C_2\|\nabla U(x)\|_- + (1+C_2)^2C_2\Bigl(\frac{h}{2}\|\nabla U(x)\|_- + \sqrt{h}\,m_1\Bigr).
\]
Hence by (5.13), for $h\in(0,2)$,
\[
E[\|\nabla_xG_h(x,Y_h(x))\|_+]
\le \tfrac14 h^{3/2}(C_4m_3 + (1+C_2)C_2m_1 + 2C_3\|\nabla U(x)\|_-m_1)
+ \tfrac18 h^2(4C_3(1+2C_2)m_2 + 3C_2(1+C_2)\|\nabla U(x)\|_- + 2C_3\|\nabla U(x)\|_-^2)
+ \tfrac{1}{16}h^{5/2}C_2(1+C_2)^2(2m_1 + h^{1/2}\|\nabla U(x)\|_-)
+ \tfrac{1}{32}h^3(4C_3(1+2C_2)\|\nabla U(x)\|_-^2 + C_4\|\nabla U(x)\|_-^3). \tag{5.18}
\]

For Ornstein–Uhlenbeck proposals, we prove the explicit bound on the dependence of the rejection probabilities on the current state stated in Proposition 1.11 in the case $k=2$ and $p_2=0$.

Proof of Proposition 1.11, second part. If $p_2=0$, then by (5.4), (5.5) and (4.7),
\[
\|\nabla_xG^{\mathrm{OU}}(x,Y^{\mathrm{OU}}_h(x))\|_+ \le \|\nabla V(Y^{\mathrm{OU}}_h(x)) - \nabla V(x)\|_+ + \frac{h}{2}\|\nabla V(x)\|_+
\le C_2\|Y^{\mathrm{OU}}_h(x) - x\|_- + (C_1 + C_2\|x\|_-)h/2
\le C_2\|Z\|_-h^{1/2} + (C_1 + 2C_2\|x\|_-)h/2
\]
for any $x\in\mathbb{R}^d$. Therefore,
\[
E[\|\nabla_xG^{\mathrm{OU}}(x,Y^{\mathrm{OU}}_h(x))\|_+^2]^{1/2} \le C_2m_2^{1/2}h^{1/2} + (C_1 + 2C_2\|x\|_-)h/2.
\]
The assertion now follows similarly to (5.12).


6. Upper bound for exit probabilities. In this section, we prove an upper bound for the exit probabilities of the MALA chain from the ball $B^-_R$ that is required in the proof of Theorem 1.19; cf. [12] for a detailed proof of a more general result. Let
\[
f(x) := \exp(K\|x\|_-^2/16). \tag{6.1}
\]
The following lemma shows that $f$ acts as a Lyapunov function for the MALA transition kernel on $B^-_R$:

Lemma 6.1. Suppose that Assumptions 1.5 and 1.14 hold. Then there exist constants $C_1, C_2, \rho_1\in(0,\infty)$ such that
\[
q_hf \le f^{1-Kh/4}e^{C_2h} \quad\text{on } B^-_R \tag{6.2}
\]
for any $R,h\in(0,\infty)$ such that $h^{-1}\ge C_1(1+R)^{\rho_1}$.

Proof. We first observe that a corresponding bound holds for the proposal kernel $p_h$. Indeed, by (1.27), and since $\|v\|_-^2 = v\cdot Gv$ with a nonnegative definite symmetric matrix $G\le I$, an explicit computation yields
\[
(p_hf)(x) = E\Bigl[\exp\Bigl(K\Bigl\|x - \frac{h}{2}\nabla U(x) + \sqrt{h-h^2/4}\,Z\Bigr\|_-^2\Big/16\Bigr)\Bigr]
\le \exp\Bigl(K(1+Kh/4)\Bigl\|x - \frac{h}{2}\nabla U(x)\Bigr\|_-^2\Big/16\Bigr).
\]
Moreover, by Assumption 1.14,
\[
\Bigl\|x - \frac{h}{2}\nabla U(x)\Bigr\|_-^2 \le \Bigl(1 - \frac{Kh}{2}\Bigr)\|x\|_-^2 + \frac{h}{2K}\|\nabla U(0)\|_-^2 + \frac{h^2}{4}\|\nabla U(x)\|_-^2.
\]
Hence by Assumption 1.5, there exist constants $C_3, C_4, \rho_2\in(0,\infty)$ such that
\[
(p_hf)(x) \le f(x)^{1-Kh/4}e^{C_3h}
\]
for any $x\in\mathbb{R}^d$ and $h\in(0,\infty)$ such that $h^{-1}\ge C_4(1+\|x\|_-)^{\rho_2}$. By the upper bound for the rejection probabilities in Proposition 1.7, we conclude that there exists a polynomial $s$ such that the corresponding upper bound
\[
q_hf \le f^{1-Kh/4}e^{C_3h} + s(R)h^{3/2}f = f^{1-Kh/4}(e^{C_3h} + s(R)h^{3/2}f^{Kh/4}) \le f^{1-Kh/4}e^{(C_3+1)h}
\]
holds on $B^-_R$ whenever both $h^{-1}\ge C_4(1+R)^{\rho_2}$ and $s(R)h^{1/2}f^{Kh/4}\le 1$. The assertion follows, since the second condition is satisfied if $K^2hR^2/64\le 1$ and $s(R)eh^{1/2}\le 1$.


Now consider the first exit time
\[
T_R := \inf\{n\ge 0 : X_n\notin B^-_R\},
\]
where $(X_n, P_x)$ is the Markov chain with transition kernel $q_h$ and initial condition $X_0 = x$ $P_x$-a.s. We can estimate $T_R$ by constructing a supermartingale based on the Lyapunov function $f$:

Theorem 6.2. If Assumptions 1.5 and 1.14 hold, then there exist constants $C, \rho, D\in(0,\infty)$ such that the upper bound
\[
P_x[T_R\le n] \le Dnh\exp[K(\|x\|_-^2 - R^2)/24] \tag{6.3}
\]
holds for any $n\ge 0$, $R,h\in(0,\infty)$ such that $h^{-1}\ge C(1+R)^\rho$, and $x\in B^-_R$.

Proof. Fix $n\in\mathbb{N}$, choose $C_1, C_2, \rho_1$ as in the lemma above, and let
\[
M_j := f(X_j)^{(1-Kh/4)^{n-j}}\exp\Bigl(-C_2h\sum_{i=0}^{j}(1-Kh/4)^{n-i}\Bigr)
\]
for $j = 0,1,\ldots,n$. If $h^{-1}\ge C_1(1+R)^{\rho_1}$, then by Jensen's inequality and (6.2),
\[
E_x[M_{j+1}|\mathcal{F}_j] \le ((q_hf)(X_j))^{(1-Kh/4)^{n-j-1}}\exp\Bigl(-C_2h\sum_{i=0}^{j+1}(1-Kh/4)^{n-i}\Bigr)
\le M_j \quad\text{on } \{X_j\in B^-_R\} \text{ for any } j<n.
\]
Hence the stopped process $(M_{j\wedge T_R})_{0\le j\le n}$ is a supermartingale, and thus
\[
E_x[M_{T_R};\, n-m\le T_R\le n] \le E_x[M_0] \quad\text{for any } 0\le m\le n.
\]
Noting that $M_0 \le f(x)^{(1-Kh/4)^n} = \exp((1-Kh/4)^nK\|x\|_-^2/16)$, and that, since $C_2h\sum_{i=0}^{T_R}(1-Kh/4)^{n-i}\le (4C_2/K)(1-Kh/4)^{n-T_R}$,
\[
M_{T_R} \ge (f(X_{T_R})\exp(-4C_2/K))^{(1-Kh/4)^{n-T_R}}
\ge \exp\Bigl[\Bigl(\frac{K}{16}R^2 - 4C_2/K\Bigr)\cdot(1-Kh/4)^{n-T_R}\Bigr],
\]
we obtain the bound
\[
P_x[n-m\le T_R\le n]
\le \exp\Bigl[\Bigl(1-\frac{Kh}{4}\Bigr)^m\Bigl(\frac{K}{16}\Bigl(\Bigl(1-\frac{Kh}{4}\Bigr)^{n-m}\|x\|_-^2 - R^2\Bigr) + \frac{4C_2}{K}\Bigr)\Bigr]
\]
for any $0\le m\le n$, provided $R^2\ge 64C_2/K^2$. In particular, if $mKh/2\le\log(3/2)$, then $(1-Kh/4)^m\ge\exp(-mKh/2)\ge 2/3$, and hence
\[
P_x[n-m\le T_R\le n] \le \exp(4C_2/K)\cdot\exp(K(\|x\|_-^2 - R^2)/24).
\]
The assertion follows by partitioning $\{0,1,\ldots,n\}$ into blocks of length at most $m$, where $m = \lfloor 2\log(3/2)K^{-1}h^{-1}\rfloor$; the number of blocks required is of order $nh$, which yields the factor $Dnh$ in (6.3).
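The qualitative content of (6.3), namely that exiting a ball of radius $R$ within $n$ steps is exponentially unlikely in $R^2$ with a prefactor linear in $nh$, can be explored by simulation. A minimal sketch (not part of the paper; hypothetical one-dimensional example, with the Euclidean norm in place of $\|\cdot\|_-$):

```python
# Monte Carlo sketch (not from the paper): estimate P_0[T_R <= n] for a 1d
# MALA chain with semi-implicit Euler proposals, U(x) = x^2/2 + log cosh(x).
import numpy as np

rng = np.random.default_rng(3)
U = lambda x: 0.5 * x**2 + np.log(np.cosh(x))
dU = lambda x: x + np.tanh(x)

def exits_before(R, h, n, x=0.0):
    s2 = h - h**2 / 4
    for _ in range(n):
        y = x - 0.5 * h * dU(x) + np.sqrt(s2) * rng.normal()
        fwd = -(y - (x - 0.5 * h * dU(x)))**2 / (2 * s2)
        bwd = -(x - (y - 0.5 * h * dU(y)))**2 / (2 * s2)
        if np.log(rng.uniform()) < U(x) - U(y) + bwd - fwd:
            x = y                      # accept; otherwise the chain stays put
        if abs(x) > R:                 # first exit from the ball of radius R
            return True
    return False

h, n, trials = 0.1, 200, 2000
for R in [2.0, 2.5, 3.0]:
    p = np.mean([exits_before(R, h, n) for _ in range(trials)])
    print(f"R = {R}:  estimated P_0[T_R <= {n}] = {p:.4f}")
```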


7. Proof of the main results. In this section, we combine the results above in order to derive the contraction properties for Metropolis–Hastings transition kernels stated in Theorems 1.15, 1.12 and 1.19, and we finally prove Theorem 1.20. Note that for $x,\tilde{x}\in\mathbb{R}^d$, the distances
\[
\|Y^{\mathrm{OU}}_h(x) - Y^{\mathrm{OU}}_h(\tilde{x})\|_- = (1-h/2)\|x-\tilde{x}\|_- \quad\text{and} \tag{7.1}
\]
\[
\|Y_h(x) - Y_h(\tilde{x})\|_- = \|x-\tilde{x} - (\nabla U(x)-\nabla U(\tilde{x}))h/2\|_- \tag{7.2}
\]
are deterministic. We now combine Lemma 2.5 with the estimates in Propositions 1.7 and 1.9:

Proof of Theorem 1.15. We fix $h\in(0,2)$, $R\in(0,\infty)$ and $x,\tilde{x}\in B^-_R$. By the basic contractivity Lemma 2.5 and by (2.3), respectively,
\[
E[\|W_h(x) - W_h(\tilde{x})\|_-] \le \|x-\tilde{x}\|_-
- (1 - E[\max(1-\alpha_h(x,Y_h(x)),\, 1-\alpha_h(\tilde{x},Y_h(\tilde{x})))])\cdot(\|x-\tilde{x}\|_- - \|Y_h(x)-Y_h(\tilde{x})\|_-)
+ E[\max(\|x-Y_h(x)\|_-, \|\tilde{x}-Y_h(\tilde{x})\|_-)^2]^{1/2}\cdot E[(\alpha_h(x,Y_h(x)) - \alpha_h(\tilde{x},Y_h(\tilde{x})))^2]^{1/2}.
\]
By Proposition 3.3,
\[
\|Y_h(x) - Y_h(\tilde{x})\|_- \le (1 - Kh/2 + M(R)^2h^2/8)\cdot\|x-\tilde{x}\|_-,
\]
and by Lemma 4.4(2),
\[
E[\max(\|x-Y_h(x)\|_-, \|\tilde{x}-Y_h(\tilde{x})\|_-)^2]^{1/2} \le m_2^{1/2}h^{1/2} + \max(\|\nabla U(x)\|_-, \|\nabla U(\tilde{x})\|_-)h/2.
\]
The assertion of Theorem 1.15 follows by combining these estimates with the bounds for the acceptance probabilities in Propositions 1.7 and 1.9.
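The coupling underlying Lemma 2.5 is easy to simulate: run two copies of the MALA chain driven by the same proposal noise $Z$ and the same acceptance uniforms, and monitor their distance. A sketch (not part of the paper; hypothetical one-dimensional example):

```python
# Synchronous-coupling sketch (not from the paper): two 1d MALA chains share
# the proposal noise and the acceptance uniforms; the average coupling
# distance E|X_n - X'_n| contracts geometrically, as in Theorem 1.15.
import numpy as np

rng = np.random.default_rng(4)
U = lambda x: 0.5 * x**2 + np.log(np.cosh(x))
dU = lambda x: x + np.tanh(x)
h = 0.1
s2 = h - h**2 / 4

def mala_step(x, z, u):
    y = x - 0.5 * h * dU(x) + np.sqrt(s2) * z
    fwd = -(y - (x - 0.5 * h * dU(x)))**2 / (2 * s2)
    bwd = -(x - (y - 0.5 * h * dU(y)))**2 / (2 * s2)
    return y if np.log(u) < U(x) - U(y) + bwd - fwd else x

trials, steps = 500, 80
dist = np.zeros(steps + 1)
for _ in range(trials):
    x, xp = 2.0, -2.0
    dist[0] += abs(x - xp)
    for k in range(1, steps + 1):
        z, u = rng.normal(), rng.uniform()   # shared randomness
        x, xp = mala_step(x, z, u), mala_step(xp, z, u)
        dist[k] += abs(x - xp)
for k in (0, 20, 40, 80):
    print(f"n = {k:3d}   mean coupling distance = {dist[k] / trials:.4f}")
```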

The corresponding bound for Ornstein–Uhlenbeck proposals follows similarly from Lemma 2.5 and Proposition 1.11:

Proof of Theorem 1.12. We again fix $h\in(0,2)$, $R\in(0,\infty)$, and $x,\tilde{x}\in B^-_R$. Since $Y^{\mathrm{OU}}_h(x) - Y^{\mathrm{OU}}_h(\tilde{x}) = (1-h/2)(x-\tilde{x})$ and $\|x - Y^{\mathrm{OU}}_h(x)\|_- \le \|x\|_-h/2 + \|Z\|_-\sqrt{h}$, the basic contractivity Lemma 2.5 implies
\[
E[\|W^{\mathrm{OU}}_h(x) - W^{\mathrm{OU}}_h(\tilde{x})\|_-]
\le \Bigl(1-\frac{h}{2}\Bigr)\|x-\tilde{x}\|_-
+ \frac{h}{2}\|x-\tilde{x}\|_-\,E[\max(1-\alpha^{\mathrm{OU}}(x,Y^{\mathrm{OU}}_h(x)),\, 1-\alpha^{\mathrm{OU}}(\tilde{x},Y^{\mathrm{OU}}_h(\tilde{x})))]
+ \Bigl(\frac{h}{2}\max(\|x\|_-,\|\tilde{x}\|_-) + \sqrt{h}\,m_2^{1/2}\Bigr)\cdot E[(\alpha^{\mathrm{OU}}(x,Y^{\mathrm{OU}}_h(x)) - \alpha^{\mathrm{OU}}(\tilde{x},Y^{\mathrm{OU}}_h(\tilde{x})))^2]^{1/2}.
\]
The assertion of Theorem 1.12 follows by combining this estimate with the bounds for the acceptance probabilities in Proposition 1.11.

Proof of Theorem 1.19. Noting that
\[
\|x\|_-^2 - (2R)^2 \le -3R^2 \quad\text{for any } x\in B^-_R,
\]
the assertion is a direct consequence of Corollary 1.18 and Theorem 6.2 applied with $R$ replaced by $2R$.

Let $\mu_R = \mu(\cdot|B^-_R)$ denote the conditional measure on $B^-_R$. The fact that $\mu$ is a stationary distribution for the Metropolis–Hastings transition kernel $q_h$ can be used to bound the Wasserstein distance between $\mu_Rq^n_h$ and $\mu_R$:

Lemma 7.1. For any $R\ge 0$ and $a\in(0,1)$,
\[
W^{2R}(\mu_Rq^n_h, \mu_R) \le 8R\,(\mu_Rq^n_h)(\mathbb{R}^d\setminus B^-_R)
\le 8(1-a)^{-1}W^{2R}(\mu_Rq^n_h, \delta_0q^n_h) + 8R\,(\delta_0q^n_h)(\mathbb{R}^d\setminus B^-_{aR}).
\]

Proof. The distance induced by the total variation norm $\|\cdot\|_{\mathrm{TV}}$ is the Wasserstein distance w.r.t. the metric $d(x,y) = I_{x\ne y}$. Since $d^{2R}(x,y)\le 4R\,d(x,y)$, we obtain
\[
W^{2R}(\mu_Rq^n_h, \mu_R) \le 4R\|\mu_Rq^n_h - \mu_R\|_{\mathrm{TV}} = 8R\|(\mu_Rq^n_h - \mu_R)^+\|_{\mathrm{TV}} \le 8R\,(\mu_Rq^n_h)(\mathbb{R}^d\setminus B^-_R). \tag{7.3}
\]
Here we have used in the last step that $\mu q_h = \mu$, whence $\mu_Rq^n_h\le\mu(B^-_R)^{-1}\mu q^n_h = \mu(B^-_R)^{-1}\mu$, and hence
\[
(\mu_Rq^n_h)(A) \le (\mu_Rq^n_h)(A\cap B^-_R) + (\mu_Rq^n_h)(\mathbb{R}^d\setminus B^-_R) \le \mu_R(A) + (\mu_Rq^n_h)(\mathbb{R}^d\setminus B^-_R)
\]
for any Borel set $A\subseteq\mathbb{R}^d$. Moreover, for $a\in(0,1)$,
\[
W^{2R}(\mu_Rq^n_h, \delta_0q^n_h) \ge (R-aR)\cdot((\mu_Rq^n_h)(\mathbb{R}^d\setminus B^-_R) - (\delta_0q^n_h)(\mathbb{R}^d\setminus B^-_{aR})). \tag{7.4}
\]
Indeed, for any coupling $\eta(dx\,d\tilde{x})$ of the two measures,
\[
\eta(d^{2R}(x,\tilde{x})\ge R-aR) \ge (\mu_Rq^n_h)(\mathbb{R}^d\setminus B^-_R) - (\delta_0q^n_h)(\mathbb{R}^d\setminus B^-_{aR}).
\]
The assertion follows by combining the estimates in (7.3) and (7.4).

Proof of Theorem 1.20. By combining the estimates in Theorem 1.19, Lemma 7.1 with $a = 6/7$, and Theorem 6.2, we obtain
\[
W^{2R}(\nu q^n_h, \mu_R) \le W^{2R}(\nu q^n_h, \mu_Rq^n_h) + W^{2R}(\mu_Rq^n_h, \mu_R)
\]
\[
\le \Bigl(1-\frac{K}{4}h\Bigr)^nW^{2R}(\nu,\mu_R) + DR\exp(-KR^2/8)nh
+ 56\cdot\Bigl(1-\frac{K}{4}h\Bigr)^nW^{2R}(\mu_R,\delta_0) + 56DR\exp(-KR^2/8)nh
+ 8R\,P_0[T_{6R/7}\le n]
\]
\[
\le 58R\cdot\Bigl(1-\frac{K}{4}h\Bigr)^n + 57DR\exp(-KR^2/8)nh + 8DR\exp(-KR^2/33)nh.
\]

Acknowledgments. I would like to thank in particular Daniel Gruhlke and Sebastian Vollmer for many fruitful discussions related to the contents of this paper.

REFERENCES

[1] Bakry, D. (1994). L'hypercontractivité et son utilisation en théorie des semigroupes. In Lectures on Probability Theory (Saint-Flour, 1992). Lecture Notes in Math. 1581 1–114. Springer, Berlin. MR1307413
[2] Bakry, D., Cattiaux, P. and Guillin, A. (2008). Rate of convergence for ergodic continuous Markov processes: Lyapunov versus Poincaré. J. Funct. Anal. 254 727–759. MR2381160
[3] Beskos, A., Roberts, G., Stuart, A. and Voss, J. (2008). MCMC methods for diffusion bridges. Stoch. Dyn. 8 319–350. MR2444507
[4] Beskos, A. and Stuart, A. (2009). MCMC methods for sampling function space. In ICIAM 07—6th International Congress on Industrial and Applied Mathematics 337–364. Eur. Math. Soc., Zürich. MR2588600
[5] Bou-Rabee, N., Hairer, M. and Vanden-Eijnden, E. (2010). Non-asymptotic mixing of the MALA algorithm. Available at arXiv:1008.3514.
[6] Bou-Rabee, N. and Vanden-Eijnden, E. (2010). Pathwise accuracy and ergodicity of metropolized integrators for SDEs. Comm. Pure Appl. Math. 63 655–696. MR2583309
[7] Chen, M. F. and Li, S. F. (1989). Coupling methods for multidimensional diffusion processes. Ann. Probab. 17 151–177. MR0972776
[8] Cotter, S., Roberts, G., Stuart, A. and White, D. (2012). MCMC methods for functions: Modifying old algorithms to make them faster. Available at arXiv:1202.0709.
[9] Da Prato, G. and Zabczyk, J. (1996). Ergodicity for Infinite-Dimensional Systems. London Mathematical Society Lecture Note Series 229. Cambridge Univ. Press, Cambridge. MR1417491
[10] Dyer, M., Frieze, A. and Kannan, R. (1991). A random polynomial-time algorithm for approximating the volume of convex bodies. J. Assoc. Comput. Mach. 38 1–17. MR1095916
[11] Eberle, A. (2011). Reflection coupling and Wasserstein contractivity without convexity. C. R. Math. Acad. Sci. Paris 349 1101–1104. MR2843007
[12] Gruhlke, D. (2013). Convergence of multilevel MCMC methods on path spaces. Ph.D. thesis, Univ. Bonn.
[13] Hairer, M., Stuart, A. and Vollmer, S. (2011). Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions. Available at arXiv:1112.1392.
[14] Hairer, M., Stuart, A. and Voss, J. (2011). Signal processing problems on function space: Bayesian formulation, stochastic PDEs and effective MCMC methods. In The Oxford Handbook of Nonlinear Filtering 833–873. Oxford Univ. Press, Oxford. MR2884617
[15] Has'minskiĭ, R. Z. (1980). Stochastic Stability of Differential Equations. Monographs and Textbooks on Mechanics of Solids and Fluids: Mechanics and Analysis 7. Sijthoff & Noordhoff, Alphen aan den Rijn. MR0600653
[16] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
[17] Kannan, R., Lovász, L. and Simonovits, M. (1997). Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures Algorithms 11 1–50. MR1608200
[18] Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). Markov Chains and Mixing Times. Amer. Math. Soc., Providence, RI. MR2466937
[19] Lovász, L. and Vempala, S. (2006). Hit-and-run from a corner. SIAM J. Comput. 35 985–1005 (electronic). MR2203735
[20] Lovász, L. and Vempala, S. (2006). Simulated annealing in convex bodies and an O*(n^4) volume algorithm. J. Comput. System Sci. 72 392–417. MR2205290
[21] Lovász, L. and Vempala, S. (2007). The geometry of logconcave functions and sampling algorithms. Random Structures Algorithms 30 307–358. MR2309621
[22] Mattingly, J. C., Pillai, N. S. and Stuart, A. M. (2012). Diffusion limits of the random walk Metropolis algorithm in high dimensions. Ann. Appl. Probab. 22 881–930. MR2977981
[23] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087–1092.
[24] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London. MR1287609
[25] Pillai, N. S., Stuart, A. M. and Thiéry, A. H. (2013). Gradient flow from a random walk in Hilbert space. Available at arXiv:1108.1494v3.
[26] Pillai, N. S., Stuart, A. M. and Thiéry, A. H. (2012). Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22 2320–2356. MR3024970
[27] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York. MR2080278
[28] Roberts, G. O., Gelman, A. and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120. MR1428751
[29] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 255–268. MR1625691
[30] Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363. MR1440273
[31] Royer, G. (2007). An Initiation to Logarithmic Sobolev Inequalities. SMF/AMS Texts and Monographs 14. Amer. Math. Soc., Providence, RI. MR2352327
[32] Saloff-Coste, L. (1997). Lectures on finite Markov chains. In Lectures on Probability Theory and Statistics (Saint-Flour, 1996). Lecture Notes in Math. 1665 301–413. Springer, Berlin. MR1490046
[33] Steele, J. M. (2001). Stochastic Calculus and Financial Applications. Applications of Mathematics (New York) 45. Springer, New York. MR1783083
[34] Villani, C. (2009). Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften 338. Springer, Berlin. MR2459454
[35] von Renesse, M.-K. and Sturm, K.-T. (2005). Transport inequalities, gradient estimates, entropy, and Ricci curvature. Comm. Pure Appl. Math. 58 923–940. MR2142879

Institut für Angewandte Mathematik
Universität Bonn
Endenicher Allee 60
53115 Bonn
Germany
E-mail: [email protected]