
Bayesian estimation of discretely observed multi-dimensional diffusion processes using guided proposals

Frank van der Meulen and Moritz Schauer

Delft Institute of Applied Mathematics (DIAM)
Delft University of Technology
Mekelweg 4
2628 CD Delft
The Netherlands
e-mail: [email protected]
e-mail: [email protected]

Abstract. Bayesian estimation of parameters of a diffusion based on discrete time observations poses a difficult problem due to the lack of a closed form expression for the likelihood. Data-augmentation has been proposed for obtaining draws from the posterior distribution of the parameters. Within this approach, the discrete time observations are augmented with diffusion bridges connecting these observations. This poses two challenges: (i) efficiently generating diffusion bridges; (ii) if unknown parameters appear in the diffusion coefficient, then direct implementation of data-augmentation results in an induced Markov chain which is reducible. In this paper we show how both challenges can be addressed in continuous time (before discretisation) by using guided proposals. These are Markov processes with dynamics described by the stochastic differential equation of the diffusion process with an additional term added to the drift coefficient to guide the process to hit the right end point of the bridge. The form of these proposals naturally provides a mapping that decouples the dependence between the diffusion coefficient and diffusion bridge using the driving Brownian motion of the proposals. As the guiding term has a singularity at the right end point, care is needed when discretisation is applied for implementation purposes. We show that this problem can be dealt with by appropriately time changing and scaling of the guided proposal process. In two examples we illustrate the performance of the algorithms we propose. The second of these concerns a diffusion approximation of a chemical reaction network with a four-dimensional diffusion driven by an eight-dimensional Brownian motion.

Keywords: Discretely observed diffusion process; multidimensional diffusion bridge; data augmentation; linear processes; innovation process; Chemical Langevin Equation; reparametrisation.

Primary 62M05, 60J60; secondary 62F15, 65C05.

1. Introduction

In this article we discuss a novel approach for estimating an unknown parameter $\theta \in \Theta$ of the drift and the diffusion coefficient of a diffusion process

\[ dX_t = b_\theta(t, X_t)\,dt + \sigma_\theta(t, X_t)\,dW_t, \qquad X_0 = u, \tag{1.1} \]

which is observed discretely in time. Here $b_\theta : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$ denotes the drift function, $a_\theta = \sigma_\theta \sigma_\theta'$ is the diffusion function, where $\sigma_\theta : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^{d \times d'}$, and $W$ is a $d'$-dimensional Wiener process. The observation times will be denoted by $t_0 = 0 < t_1 < \cdots < t_n = T$ and the corresponding observations by $x_i = X_{t_i}$.

Van der Meulen and Schauer/Bayesian estimation for diffusions using guided proposals

Estimation of $\theta$ in this setting has attracted much attention during the past decade. Here we restrict attention to estimation within the Bayesian paradigm. From a theoretical perspective, results on posterior consistency have been proved in Van der Meulen and Van Zanten (2013) and Gugushvili and Spreij (2012). The associated computational problem is the object of study here. Two review articles that include many references on this topic are Van Zanten (2013) and Sørensen (2004).

The main difficulty in estimation for discretely observed diffusion processes is the lack of a closed form expression for transition densities, making the likelihood intractable. If the diffusion path is observed continuously, then estimation becomes easier, as for a fully observed diffusion path the likelihood is available in closed form (and parameters appearing in the diffusion coefficient can be determined from the quadratic variation of the process). This naturally suggests studying the computational problem within a missing data framework, treating the unobserved path segments between two successive observation times as missing data. This setup dates back to at least Pedersen (1995), who used it to obtain simulated maximum likelihood estimates for $\theta$. Within the Bayesian computational problem, the resulting Markov chain Monte Carlo algorithm is known as data-augmentation and was introduced in this context by Eraker (2001), Elerian et al. (2001) and Roberts and Stramer (2001). This algorithm is a special form of the Metropolis-Hastings (MH) algorithm which iterates the following steps:

1. draw missing segments, conditional on $\theta$ and the observed discrete time data;
2. draw from the distribution of $\theta$, conditional on the “full data”.

Here, by “full data” we mean the path formed by the drawn segments joined at the observation times. The algorithm can be initialised by either interpolating the discrete time data or choosing an initial value for $\theta$. Since in both steps it is usually not possible to draw directly from the distribution of interest, each step can be replaced by generating a proposal which is then accepted with probability defined by the MH-algorithm. There are two challenges with this approach:

Challenge 1: generating “good” proposals for the missing segments. The problem of simulating diffusion bridges has received a lot of attention in the past. Vastly different techniques have been proposed, including (i) single site Gibbs updating of the missing segments locally on a discrete grid (Eraker (2001)), (ii) independent Metropolis-Hastings steps using as a proposal a Laplace approximation to the conditional distribution obtained by Euler approximation (Elerian et al. (2001)), (iii) forward simulated processes derived from representations of the Brownian bridge in discrete time (Durham and Gallant (2002)), (iv) coupling arguments (Bladt and Sørensen (2012)), (v) a constrained sequential Monte Carlo algorithm with a resampling scheme guided by backward pilots (Lin et al. (2010)), and (vi) exact simulation (Beskos et al. (2006)).

Delyon and Hu (2006) extended the work of Durham and Gallant to a continuous time setup and derived an innovative proposal process taking the drift of the target diffusion into account. In case the diffusion coefficient is constant, this proposal was proposed earlier in Clark (1990). The basic idea consists of superimposing an additional term on the drift of the unconditioned diffusion to guide the process towards the endpoint. Such proposals are termed guided proposals. The proposals of this type have the drawback of a possible mismatch between drift and guiding term, which prevents their use for routine practice. Instead a variant of the proposal without the target drift, also due to Delyon and Hu (2006), is used. When discretised, the latter is closely related to the Modified Brownian Bridge proposed in Durham and Gallant (2002). In Schauer et al. (2013) a general class of proposal processes for simulating diffusion bridges was introduced. The proposals in Schauer et al. (2013) do take the drift of the target diffusion into account, but in a way different from Delyon and Hu (2006). As a result, these proposals substantially reduce the mismatch of drift and guiding, because they allow for more flexibility in choosing an appropriate guiding term to pull the process towards the endpoint in the right manner. An example of the advantage of this approach is given in the introduction of Schauer et al. (2013). Further, this method only requires forward simulation and also applies to diffusion processes which cannot be transformed to have unit diffusion coefficient.

This comes with another challenge, as in any implementation, ultimately all proposals have to be evaluated on a finite number of grid points. Now the pulling term added to the drift for guided proposals has a singularity near the endpoint. This renders direct Euler discretisation inaccurate. Furthermore, integrals that appear in the acceptance probability of bridges potentially suffer from this problem as well.

Challenge 2: handling unknown parameters appearing in the diffusion coefficient. As pointed out by Roberts and Stramer (2001), the data augmentation algorithm degenerates if $\theta$ appears in the diffusion coefficient, as the quadratic variation of the full data, $\int_0^T a_\theta(t, X_t)\,dt$, forces the conditional distribution for the next iterate for $\theta$ to be degenerate at the current value. Hence, iterates of $\theta$ remain stuck at their initial value. From a second perspective, diffusion bridges corresponding to different $\theta$ have mutually singular measures. For illuminating figures that illustrate this problem, we refer to Roberts and Stramer (2001). This difficulty can be overcome when the laws of the bridge proposals can be understood as parametrised push forwards of the law of an underlying random process common to all models with different parameters $\theta$. This is naturally the case for proposals defined as solutions of stochastic differential equations, and the driving Brownian motion can be taken as such an underlying random process. If $X^\star$ denotes a missing segment given that the parameter equals $\theta$, the main idea consists of finding a map $g$ and a process $Z^\star$ such that $X^\star = g(\theta, Z^\star)$. The process $Z^\star$ will be called the “innovation process”, as was done in related approaches (cf. Chib et al. (2004) and Golightly and Wilkinson (2010)). Defining $g$ and $Z^\star$ appropriately is rather subtle and we postpone a detailed discussion to Sections 2 and 4. To be successful, the tight dependence between $\theta$ and $X^\star$ should be decoupled. In a more general set-up, decouplings of similar forms are discussed under the keyword non-centered parameterisation (Papaspiliopoulos et al. (2003)). It turns out that likelihood ratios between diffusion bridges with different parameters driven by the same innovations $Z^\star$ can be defined and used to update the diffusion parameter. In an elementary form, where $\sigma_\theta(t, x) = \theta$, this technique was used by Roberts and Stramer (2001). Chib et al. (2004) and Golightly and Wilkinson (2010) extended this technique to general diffusions in a discretised setting. A construction for time homogeneous diffusions based on Delyon and Hu (2006) was explored by Fuchs (2013) (in particular section 7.4). Papaspiliopoulos et al. (2013) have an approach where the missing data is initially considered in continuous time using Delyon and Hu (2006) bridge proposals, but the degeneracy problem is tackled after discretisation.

1.1. Contribution

In this article we show how the guided proposals introduced in Schauer et al. (2013) address the outlined challenges and can be used to define an efficient MCMC procedure for generating draws from the posterior distribution of $\theta$ given discrete time data. The procedure can be seen as an extension and unification of previous approaches within a continuous time framework. Specific features of our approach include:

• We use in each data augmentation step “adapted” bridge proposals which take the drift and the value of $\theta$ at that particular iteration into account. Hence, at each iteration, the pulling term depends on $\theta$, a feature which is unavailable using proposals as in Delyon and Hu (2006). Especially in the multivariate case, the additional freedom in devising good proposals is crucial for obtaining a feasible MCMC procedure. The possibility to exploit special features of the drift function to achieve high acceptance rates makes this approach interesting for practitioners. This is illustrated with a practical example in Section 7.2. This example concerns a diffusion approximation of a chemical reaction network and it turns out that our methods are often well suited for estimation of diffusions obtained in this way.

• The algorithm does not suffer from the degeneracy problem in case unknown parameters appear in the diffusion coefficient, not even in the continuous time setup.

• The innovation process is defined using the proposal process. As a result, in our algorithm (cf. Algorithm 1), somewhat surprisingly, the innovations actually never need to be computed. This implies that our method can also cope with the case where $\sigma$ is not a square matrix.

• Though we derive all our results in a continuous time setup, for implementation purposes integrals in likelihood ratios and solutions to stochastic differential equations need to be approximated on a finite grid. As the drift of our proposal bridges has a singularity near its endpoint, we introduce a time-change and space-scaling of the proposal process that allows for numerically accurate discretisation and evaluation of the likelihood.

1.2. Outline

In Section 2 we clarify the aforementioned difficulties in a toy example. Here, we set some general notation, introduce some key ideas used throughout and further motivate the advantage of using guided proposals. In Section 3 we recap some key results from Schauer et al. (2013) on guided proposals. The core of this paper is Section 4, where we precisely state our algorithm. In Section 5 we relate our approach to earlier work by Golightly and Wilkinson (2010), Fuchs (2013) and Papaspiliopoulos et al. (2013). In Section 6 we introduce a time-change and space-scaling of the proposal process to deal with the aforementioned numerical discretisation issues. Next, in Section 7 we discuss two examples. In the second example we estimate the parameters in a prokaryotic auto-regulation network example of Golightly and Wilkinson (2010). The appendices contain a few postponed proofs and some implementation details.

2. A toy problem

In this section we consider a toy example used to illustrate some key ideas to solve the aforementioned problems with a simple data-augmentation algorithm. The type of reparameterisation introduced shortly is not new, and has appeared for example in Roberts and Stramer (2001). The goal here is to introduce the main idea and point out some of its potential shortcomings in more complex problems. Furthermore, later on we will deal with more difficult cases and this toy example allows us to sequentially build up an appropriate framework for that. We consider the diffusion process

\[ dX_t = b(X_t)\,dt + \theta\,dW_t, \qquad X_0 = u, \quad t \in [0, T], \]

where $b$ is a known drift function and $\theta \in \Theta$ an unknown scaling parameter. We assume $\theta$ is equipped with a prior distribution $\pi_0(\theta)$ and only one observation $X_T = v$ at time $T$ is available. We aim to draw from the posterior $\pi(\theta \mid X_T)$. The diffusion process conditioned on $X_T = v$ is a diffusion process itself. Denote by $X^\star$ the conditioned diffusion path $(X_t,\ t \in (0, T))$ (conditional on $X_T = v$). Suppose we wish to iterate a data-augmentation algorithm and the current iterate is given by $(X^\star, \theta)$.

Updating $X^\star$: For almost all choices of $b$, there is no direct way of simulating $X^\star$. Instead, one can first generate a proposal bridge $X^\circ$ and accept it with the MH-acceptance probability. As an easily tractable example we choose to take

\[ X^\circ_t = u(1 - t/T) + v\,t/T + \theta Z^\circ_t, \tag{2.1} \]

where $Z^\circ$ is a standard Brownian bridge on $[0, T]$ (i.e. $Z^\circ_0 = 0$ and $Z^\circ_T = 0$). This is the bridge process corresponding to the unconditioned diffusion process $d\tilde X_t = \theta\,dW_t$, $\tilde X_0 = u$.
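As a rough illustration of (2.1), the following Python sketch draws the proposal bridge on a time grid. It assumes NumPy; the function names `brownian_bridge` and `g`, the grid and the numerical values are illustrative choices and do not come from the paper. The Brownian bridge is built from a Brownian motion $W$ via $Z^\circ_t = W_t - (t/T) W_T$.

```python
import numpy as np

def brownian_bridge(t, rng):
    """Standard Brownian bridge on the grid t, with t[0] = 0 and t[-1] = T."""
    dW = rng.normal(0.0, np.sqrt(np.diff(t)))      # Brownian increments
    W = np.concatenate(([0.0], np.cumsum(dW)))     # Brownian motion on the grid
    return W - t / t[-1] * W[-1]                   # pin the path to 0 at time T

def g(theta, Z, t, u, v):
    """The map of (2.1): straight line from u to v plus theta times the bridge Z."""
    T = t[-1]
    return u * (1 - t / T) + v * t / T + theta * Z

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)     # illustrative grid on [0, T] with T = 1
Z = brownian_bridge(t, rng)        # its law does not depend on theta
X = g(0.5, Z, t, u=0.0, v=1.0)     # proposal bridge from u = 0 to v = 1
```

Because `Z` is drawn without reference to `theta`, the same `Z` can be pushed through `g` for different parameter values, which is the property exploited later in this section.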

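Denoting the laws of $X^\star$ and $X^\circ$ by $\mathbb{P}^\star_\theta$ and $\mathbb{P}^\circ_\theta$ respectively, we have

\[ \frac{d\mathbb{P}^\star_\theta}{d\mathbb{P}^\circ_\theta}(X^\circ) = \frac{\tilde p_\theta(0, u; T, v)}{p_\theta(0, u; T, v)}\, \Psi_\theta(X^\circ), \tag{2.2} \]

with

\[ \Psi_\theta(X^\circ) = \exp\left( \theta^{-2} \int_0^T b(X^\circ_s)\,dX^\circ_s - \frac{1}{2}\theta^{-2} \int_0^T b(X^\circ_s)^2\,ds \right). \]

Here, $p$ and $\tilde p$ denote the transition densities of the processes $X$ and $\tilde X$ respectively, where $\tilde X$ is the driftless process $d\tilde X_t = \theta\,dW_t$ underlying (2.1). Absolute continuity is a consequence of Girsanov's theorem applied to the unconditioned processes and the abstract Bayes' formula. Now the MH-step consists of generating a proposal $X^\circ$ and accepting it with probability $1 \wedge \Psi_\theta(X^\circ)/\Psi_\theta(X^\star)$ (the ratio of transition densities just acts as a proportionality constant here).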
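A discretised version of this MH-step might look as follows. This is only a sketch under stated assumptions: the scalar case, left-point sums for both the Itô and the Lebesgue integral in $\Psi_\theta$, and NumPy. The names `log_psi`, `mh_accept` and `bridge`, the drift $b = \sin$ and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def log_psi(X, t, theta, b):
    """Left-point approximation of log Psi_theta(X) for a path X on the grid t."""
    bX = b(X[:-1])
    ito = np.sum(bX * np.diff(X))        # approximates int b(X_s) dX_s (Ito sum)
    leb = np.sum(bX**2 * np.diff(t))     # approximates int b(X_s)^2 ds
    return (ito - 0.5 * leb) / theta**2

def mh_accept(log_psi_prop, log_psi_cur, rng):
    """Accept with probability min(1, Psi(proposal) / Psi(current))."""
    return np.log(rng.uniform()) <= log_psi_prop - log_psi_cur

def bridge(theta, t, u, v, rng):
    """Proposal (2.1): straight line from u to v plus theta times a Brownian bridge."""
    W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(np.diff(t))))))
    return u + (v - u) * t / t[-1] + theta * (W - t / t[-1] * W[-1])

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 501)
theta, u, v = 0.5, 0.0, 1.0
b = np.sin                                  # illustrative drift (an assumption)
X_cur = bridge(theta, t, u, v, rng)         # stands in for the current bridge
X_prop = bridge(theta, t, u, v, rng)        # fresh independent proposal
if mh_accept(log_psi(X_prop, t, theta, b), log_psi(X_cur, t, theta, b), rng):
    X_cur = X_prop
```

Working with $\log \Psi_\theta$ rather than $\Psi_\theta$ is a numerical convenience; the ratio of transition densities never has to be evaluated in this step.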


Updating $\theta$: As explained in the introduction, taking the missing segment as missing data renders the Metropolis-Hastings algorithm reducible. To deal with this problem, note that

\[ X^\circ_t = g(\theta, Z^\circ_t) \quad \text{for} \quad g(\theta, x) = u(1 - t/T) + v\,t/T + \theta x. \]

Define the process $Z^\star$ by the relation

\[ X^\star_t = g(\theta, Z^\star_t). \tag{2.3} \]

This construction can be viewed as follows. First, we choose a process $Z^\circ$ whose law does not depend on $\theta$ (this is crucial, as we will shortly explain). Here, we have chosen $Z^\circ$ to be a standard Brownian bridge on $(0, T)$. Second, setting $X^\circ = g(\theta, Z^\circ)$ defines the mapping $g$. Third, equation (2.3) defines the process $Z^\star$.

Now that $Z^\star$ is defined, rather than drawing from the distribution of $\theta$ conditional on $(X_0 = u, X_T = v, X^\star)$ we will sample from the distribution of $\theta$ conditional on $(X_0 = u, X_T = v, Z^\star)$. This means that we augment the discrete time observations with $Z^\star$ instead of $X^\star$. Denote the laws of $Z^\star$ and $Z^\circ$ by $\mathbb{Q}^\star_\theta$ and $\mathbb{Q}^\circ$ respectively. Suppose the current iterate is $(\theta, Z^\star)$, where $Z^\star$ can be extracted from $\theta$ and $X^\star$ by means of equation (2.3). The following diagram summarises the notation introduced:

\[
\begin{array}{lcccccc}
\text{Process} & Z^\star & \xrightarrow{\,g(\theta,\cdot)\,} & X^\star & Z^\circ & \xrightarrow{\,g(\theta,\cdot)\,} & X^\circ \\
\text{Measure} & \mathbb{Q}^\star_\theta & & \mathbb{P}^\star_\theta & \mathbb{Q}^\circ & & \mathbb{P}^\circ_\theta
\end{array}
\tag{2.4}
\]

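For updating $\theta$ we propose a value $\theta^\circ$ from some proposal distribution $q(\cdot \mid \theta)$ and accept the proposal with probability $\min(1, A)$, where

\[ A = \frac{\pi_0(\theta^\circ)}{\pi_0(\theta)}\, \frac{p_{\theta^\circ}(0, u; T, v)}{p_\theta(0, u; T, v)}\, \frac{d\mathbb{Q}^\star_{\theta^\circ}}{d\mathbb{Q}^\star_\theta}(Z^\star)\, \frac{q(\theta \mid \theta^\circ)}{q(\theta^\circ \mid \theta)}. \tag{2.5} \]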

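Here, we have implicitly assumed that $\mathbb{Q}^\star_{\theta^\circ}$ and $\mathbb{Q}^\star_\theta$ are equivalent, which is indeed the case as we have

\[ \frac{d\mathbb{Q}^\star_{\theta^\circ}}{d\mathbb{Q}^\star_\theta}(Z^\star) = \frac{d\mathbb{Q}^\star_{\theta^\circ}}{d\mathbb{Q}^\circ}(Z^\star) \bigg/ \frac{d\mathbb{Q}^\star_\theta}{d\mathbb{Q}^\circ}(Z^\star) = \frac{d\mathbb{P}^\star_{\theta^\circ}}{d\mathbb{P}^\circ_{\theta^\circ}}\bigl(g(\theta^\circ, Z^\star)\bigr) \bigg/ \frac{d\mathbb{P}^\star_\theta}{d\mathbb{P}^\circ_\theta}\bigl(g(\theta, Z^\star)\bigr), \]

and equivalence thus results from absolute continuity of $\mathbb{P}^\star_\theta$ and $\mathbb{P}^\circ_\theta$. Being able to relate the likelihood ratio of $\mathbb{Q}^\star_{\theta^\circ}$ and $\mathbb{Q}^\star_\theta$ in this way is possible as $\mathbb{Q}^\circ$ does not depend on $\theta$. By equation (2.2), we now get

\[ \frac{d\mathbb{Q}^\star_{\theta^\circ}}{d\mathbb{Q}^\star_\theta}(Z^\star) = \frac{p_\theta(0, u; T, v)}{p_{\theta^\circ}(0, u; T, v)}\, \frac{\tilde p_{\theta^\circ}(0, u; T, v)}{\tilde p_\theta(0, u; T, v)}\, \frac{\Psi_{\theta^\circ}(g(\theta^\circ, Z^\star))}{\Psi_\theta(g(\theta, Z^\star))}. \]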

Substituting this expression into equation (2.5) yields

\[ A = \frac{\pi_0(\theta^\circ)}{\pi_0(\theta)}\, \frac{\tilde p_{\theta^\circ}(0, u; T, v)}{\tilde p_\theta(0, u; T, v)}\, \frac{\Psi_{\theta^\circ}(g(\theta^\circ, Z^\star))}{\Psi_\theta(g(\theta, Z^\star))}\, \frac{q(\theta \mid \theta^\circ)}{q(\theta^\circ \mid \theta)} \tag{2.6} \]

and all terms containing the unknown transition density $p$ cancel.

The efficiency of this algorithm is crucially determined by the choice of the transition kernel $q$ and the proposal process $X^\circ$. We focus on an appropriate choice of $X^\circ$, though in section 4.3 we give guidelines on an appropriate choice of $q$ if the drift is of a specific structure with respect to $\theta$. In the toy example, $X^\circ$ is defined in equation (2.1). This is a convenient choice, as it easily yields absolute continuity of $\mathbb{P}^\star_\theta$ and $\mathbb{P}^\circ_\theta$. Moreover, generating an exact realisation of $X^\circ$ at any set of points in $[0, T]$ is straightforward. Nevertheless, there are at least two disadvantages to this approach:

1. The proposal does not take the drift $b$ into account. Depending on the precise form of $b$, this means that proposal bridges can look completely different from the bridge we wish to draw. Figures that illustrate this phenomenon in case $dX_t = \sin(X_t - \theta)\,dt + \theta\,dW_t$ can be found in Lin et al. (2010) (figure 4). This results in low MH-acceptance probabilities.

2. If the dispersion coefficient takes the more general form $\sigma_\theta(t, X_t)$, then $\mathbb{P}^\star_\theta$ and $\mathbb{P}^\circ_\theta$ are singular. Taking $X^\circ$ to be the process defined by $d\tilde X_t = \sigma_\theta(t, \tilde X_t)\,dW_t$, conditioned on starting in $u$ and ending in $v$ at time $T$, may seem to solve this problem. Unfortunately this introduces another problem, as for almost all choices of $\sigma_\theta$ it will be impossible to generate realisations from $X^\circ$ conditioned on hitting the endpoint $v$ at time $T$. Moreover, the transition densities $\tilde p$ will be completely intractable in this case.
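Returning to the $\theta$-update of the toy example, the acceptance ratio (2.6) can be evaluated on a grid roughly as sketched below. The sketch assumes NumPy, the scalar case and left-point sums; it uses that for the driftless process with dispersion $\theta$ the transition density $\tilde p_\theta(0, u; T, v)$ is the $N(u, \theta^2 T)$ density at $v$. The function names, the argument `log_q_ratio` (the logarithm of $q(\theta \mid \theta^\circ)/q(\theta^\circ \mid \theta)$) and the prior passed as `log_prior` are hypothetical.

```python
import numpy as np

def log_ptilde(theta, u, v, T):
    """log of the N(u, theta^2 T) density at v: transition density of dX~ = theta dW."""
    var = theta**2 * T
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - u) ** 2 / var

def log_psi(X, t, theta, b):
    """Left-point approximation of log Psi_theta(X) on the grid t."""
    bX = b(X[:-1])
    return (np.sum(bX * np.diff(X)) - 0.5 * np.sum(bX**2 * np.diff(t))) / theta**2

def log_A(theta, theta_new, Z, t, u, v, b, log_prior, log_q_ratio=0.0):
    """log of the acceptance ratio (2.6), with the innovation path Z held fixed."""
    T = t[-1]
    line = u * (1 - t / T) + v * t / T
    X_cur = line + theta * Z            # g(theta, Z)
    X_new = line + theta_new * Z        # g(theta_new, Z): same Z, new parameter
    return (log_prior(theta_new) - log_prior(theta)
            + log_ptilde(theta_new, u, v, T) - log_ptilde(theta, u, v, T)
            + log_psi(X_new, t, theta_new, b) - log_psi(X_cur, t, theta, b)
            + log_q_ratio)
```

The proposal $\theta^\circ$ would then be accepted if $\log U \le \log A$ for $U \sim \mathcal{U}(0, 1)$. Note that only $\tilde p$ appears; the intractable density $p$ has cancelled, as in (2.6).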

3. Guided proposals

A flexible class of proposal processes was developed and studied in Schauer et al. (2013). We will use this framework throughout and provide a recap of the relevant results in this section. For precise statements of these results we refer the reader to Schauer et al. (2013).

Our starting point is that under weak assumptions the target diffusion bridge $X^\star$ from $u$ at time $t = 0$ to $v$ at time $t = T$ is characterised as the solution to the SDE

\[ dX^\star_t = b^\star(t, X^\star_t)\,dt + \sigma(t, X^\star_t)\,dW_t, \qquad X^\star_0 = u, \quad t \in [0, T), \tag{$\star$} \]

where

\[ b^\star(t, x) = b(t, x) + a(t, x)\nabla_x \log p(t, x; T, v) \tag{$\star\star$} \]

and $a(t, x) = \sigma(t, x)\sigma'(t, x)$. As the true transition density $p(t, x; T, v)$ of the diffusion process (this is the density of the process starting in $x$ at time $t$, ending in $v$ at time $T$) is generally intractable, we propose to replace it by the transition density of a diffusion process $\tilde X$ for which it is known in closed form. Denote the transition density of $\tilde X$ by $\tilde p(s, x; T, v)$. Define the process $X^\circ$ as the solution of the SDE

\[ dX^\circ_t = b^\circ(t, X^\circ_t)\,dt + \sigma(t, X^\circ_t)\,dW_t, \qquad X^\circ_0 = u, \tag{$\circ$} \]

where

\[ b^\circ(t, x) = b(t, x) + a(t, x)\nabla_x \log \tilde p(t, x; T, v). \tag{$\circ\circ$} \]

A process $X^\circ$ constructed in this way is referred to as a guided proposal (a guiding term is superimposed on the drift to ensure the process hits $v$ at time $T$).

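Denote the laws of $X^\circ$ and $X^\star$ (viewed as Borel measures on $C([0, T], \mathbb{R}^d)$) by $\mathbb{P}^\circ$ and $\mathbb{P}^\star$ respectively. We reduce notation by writing $\tilde p(s, x)$ for $\tilde p(s, x; T, v)$. Define

\[ \tilde R(s, x) = \log \tilde p(s, x), \qquad \tilde r(s, x) = \nabla \tilde R(s, x), \qquad \tilde H(s, x) = -\Delta \tilde R(s, x), \tag{3.1} \]

where $\nabla$ and $\Delta$ denote the gradient and the Hessian matrix with respect to $x$. In Schauer et al. (2013) sufficient conditions for absolute continuity of $\mathbb{P}^\star$ with respect to $\mathbb{P}^\circ$ are established together with a closed form expression for the Radon-Nikodym derivative. It turns out that

\[ \frac{d\mathbb{P}^\star}{d\mathbb{P}^\circ}(X^\circ) = \frac{\tilde p(0, u; T, v)}{p(0, u; T, v)} \exp\left( \int_0^T D(s, X^\circ_s)\,ds \right), \tag{3.2} \]

where, with $\tilde b$ and $\tilde a$ denoting the drift and diffusion function of $\tilde X$, $D$ is given by

\[ D(s, x) = \bigl(b(s, x) - \tilde b(s, x)\bigr)' \tilde r(s, x) - \frac{1}{2} \operatorname{tr}\Bigl( \bigl[a(s, x) - \tilde a(s, x)\bigr] \bigl[\tilde H(s, x) - \tilde r(s, x)\tilde r(s, x)'\bigr] \Bigr) \]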

and hence does not depend on any intractable objects. The class of linear processes,

\[ d\tilde X_t = B(t)\tilde X_t\,dt + \beta(t)\,dt + \tilde\sigma(t)\,dW_t, \tag{3.3} \]

is a flexible class with known transition densities, and its induced guided proposals satisfy the conditions for absolute continuity derived in Schauer et al. (2013) under weak conditions on $B$, $\beta$ and $\tilde\sigma$. Proposal processes $X^\circ$ derived by choosing a linear process as in (3.3) will be referred to as linear guided proposals. One key requirement for absolute continuity of the laws of $X^\star$ and $X^\circ$ is that $\tilde\sigma$ is such that $\tilde a(T) = (\tilde\sigma\tilde\sigma')(T) = a(T, v)$. Another requirement is the existence of an $\varepsilon > 0$ such that for all $s \in [0, T]$, $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^d$ we have $\min\bigl(y'a(s, x)y,\ y'\tilde a(s)y\bigr) \ge \varepsilon\|y\|^2$ (uniform ellipticity). Therefore, a simple type of guided proposal is obtained upon choosing $d\tilde X_t = \beta\,dt + \sigma(T, v)\,dW_t$. For this particular choice

\[ dX^\circ_t = \left( b(t, X^\circ_t) + a(t, X^\circ_t)\,a(T, v)^{-1} \left[ \frac{v - X^\circ_t}{T - t} - \beta \right] \right) dt + \sigma(t, X^\circ_t)\,dW_t, \qquad X^\circ_0 = u. \]

Depending on the precise form of $b$ and $\sigma$ it can nevertheless be advantageous to use guided proposals induced by nonzero $B$.
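To make the simple guided proposal above concrete, here is a naive Euler sketch in Python for the scalar case. It is an illustration under assumptions, not the discretisation advocated in the paper: as noted in the introduction, direct Euler discretisation is inaccurate near the singularity at $T$, which the paper handles by a time-change and scaling in Section 6. The function name `guided_proposal`, the choice to pin the final grid point to $v$, and the example drift and dispersion are hypothetical.

```python
import numpy as np

def guided_proposal(t, u, v, b, sigma, dW, beta=0.0):
    """Naive Euler scheme for the scalar guided proposal with guiding term
    a(t, x) a(T, v)^{-1} [(v - x) / (T - t) - beta]; dW holds the Brownian
    increments on the grid t. The final grid point is set to v by hand."""
    T = t[-1]
    aT = sigma(T, v) ** 2
    X = np.empty(len(t))
    X[0] = u
    for i in range(len(t) - 2):                    # stop one step before T
        dt = t[i + 1] - t[i]
        s = sigma(t[i], X[i])
        guide = s**2 / aT * ((v - X[i]) / (T - t[i]) - beta)
        X[i + 1] = X[i] + (b(t[i], X[i]) + guide) * dt + s * dW[i]
    X[-1] = v                                      # endpoint of the bridge
    return X

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 201)
dW = rng.normal(0.0, np.sqrt(np.diff(t)))
b = lambda s, x: -x                                # illustrative drift
sigma = lambda s, x: 0.5 + 0.1 * np.cos(x)         # illustrative dispersion
X = guided_proposal(t, u=0.0, v=1.0, b=b, sigma=sigma, dW=dW)
```

With zero noise, zero drift and constant dispersion the scheme reproduces the straight line from $u$ to $v$ on a uniform grid, which is a quick sanity check on the guiding term.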

For later reference, we write

\[ \Psi(X^\circ) = \exp\left( \int_0^T D(s, X^\circ_s)\,ds \right) \tag{3.4} \]

and $\Psi_\theta(X^\circ)$ in the case where $b$ or $\sigma$ depend on a parameter $\theta$.


4. Proposed MCMC algorithm

4.1. Innovation process

Using the notation introduced in the previous section, the dynamics of the proposal process can be described by the stochastic differential equation

\[ dX^\circ_t = \bigl(b_\theta(t, X^\circ_t) + a_\theta(t, X^\circ_t)\tilde r_\theta(t, X^\circ_t)\bigr)\,dt + \sigma_\theta(t, X^\circ_t)\,dW_t, \qquad X^\circ_0 = u. \tag{4.1} \]

Here $\tilde r_\theta$ is determined by $\beta$, $B$ and $\tilde\sigma$ appearing in equation (3.3). Although we conceptually follow the same approach as in Section 2, we now take $Z^\circ$ to be the driving Brownian motion instead of a Brownian bridge and set $Z^\circ = W$. Define the mapping $g$ by

\[ X^\circ = g(\theta, Z^\circ). \tag{4.2} \]

Such a mapping exists if $X^\circ$ is a strong solution to equation (4.1) (cf. chapter V-10, p. 127 of Rogers and Williams (2000)). Sufficient conditions for existence of a strong solution are detailed in Schauer et al. (2013). Define the process $Z^\star$ by

\[ Z^\star_t = W_t - \int_0^t \sigma_\theta'(s, X^\star_s)\bigl(\tilde r_\theta(s, X^\star_s) - r_\theta(s, X^\star_s)\bigr)\,ds, \tag{4.3} \]

where $r_\theta(s, x) = \nabla_x \log p_\theta(s, x; T, v)$. It is now easily verified that $X^\star = g(\theta, Z^\star)$.
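For completeness, the verification can be sketched as follows. Write $r_\theta = \nabla_x \log p_\theta$ for the intractable counterpart of $\tilde r_\theta$, so that by ($\star\star$) the bridge $X^\star$ has drift $b_\theta + a_\theta r_\theta$, and use $a_\theta = \sigma_\theta\sigma_\theta'$ together with $dZ^\star_t = dW_t - \sigma_\theta'(\tilde r_\theta - r_\theta)(t, X^\star_t)\,dt$ from (4.3):

```latex
\begin{align*}
dX^\star_t
  &= \bigl(b_\theta + a_\theta r_\theta\bigr)(t, X^\star_t)\,dt
     + \sigma_\theta(t, X^\star_t)\,dW_t \\
  &= \bigl(b_\theta + a_\theta \tilde r_\theta\bigr)(t, X^\star_t)\,dt
     + \sigma_\theta(t, X^\star_t)
       \Bigl[ dW_t + \sigma_\theta'(t, X^\star_t)
              \bigl(r_\theta - \tilde r_\theta\bigr)(t, X^\star_t)\,dt \Bigr] \\
  &= \bigl(b_\theta + a_\theta \tilde r_\theta\bigr)(t, X^\star_t)\,dt
     + \sigma_\theta(t, X^\star_t)\,dZ^\star_t .
\end{align*}
```

Hence $X^\star$ solves equation (4.1) with $W$ replaced by $Z^\star$, which is the statement $X^\star = g(\theta, Z^\star)$.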

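We refer to the process $Z^\star$ as the innovation process corresponding to $X^\circ$ (by analogy with the terminology of Golightly and Wilkinson (2010) and Chib et al. (2004)). From equations (4.2) and (4.3) it follows that $X^\star$ is related to $Z^\star$ just like $X^\circ$ is related to $Z^\circ$. Note however that while the law of $Z^\circ$ does not depend on $\theta$, that of $Z^\star$ does. Denote the laws of $X^\circ$, $X^\star$, $Z^\circ$ and $Z^\star$ by $\mathbb{P}^\circ_\theta$, $\mathbb{P}^\star_\theta$, $\mathbb{Q}^\circ$, $\mathbb{Q}^\star_\theta$ respectively and recall the diagram in (2.4). We have

\[ \frac{d\mathbb{Q}^\star_\theta}{d\mathbb{Q}^\circ}(Z^\star) = \frac{d\mathbb{P}^\star_\theta}{d\mathbb{P}^\circ_\theta}\bigl(g(\theta, Z^\star)\bigr) = \frac{\tilde p_\theta(0, u; T, v)}{p_\theta(0, u; T, v)}\, \Psi_\theta\bigl(g(\theta, Z^\star)\bigr). \tag{4.4} \]

Here, the final equality follows from equation (3.2), which takes over the role of (2.2) in the toy example.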

If we had taken a different proposal process from (4.1), then the mapping $g$ would change according to (4.2). At this point we want to stress that while different choices for the map $g$ formally yield valid algorithms, updating $\theta$ to $\theta^\circ$ given $Z^\star$ is difficult if $g(\theta^\circ, Z^\star)$ does not resemble a draw from $X^\star$ conditional on $\theta^\circ$. Ideally, one would take $g = g_{\mathrm{opt}}$, where $g_{\mathrm{opt}}$ is defined by the relation $X^\star = g_{\mathrm{opt}}(\theta, W)$. As this is not feasible, using $g$ as derived from ($\circ$) instead of ($\star$) is a very reasonable choice, see figure 1.

4.2. Algorithm

In this section we present an algorithm to sample from the posterior of $\theta$ given the discrete observations $\mathcal{D} = \{X_0 = u, X_{t_1} = x_1, \ldots, X_{t_n} = x_n\}$. Denote the prior density on $\theta$ by $\pi_0$. The idea is to define a Metropolis–Hastings sampler on $(\theta, Z^\star)$ instead of $(\theta, X^\star)$, where $Z^\star$ is the innovation process from the previous section. More precisely, we construct a Markov chain for $(\theta, (Z^\star_i)_{1 \le i \le n})$, where each $Z^\star_i$ is an innovation process corresponding to the bridge $X^\star_i$ connecting observation $x_{i-1}$ to $x_i$.

FIGURE 1. Cartoon: $g(\theta^\circ, Z^\star)$ should resemble a draw from $X^\star$ conditional on $\theta^\circ$.

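Algorithm 1.

1. Initialisation. Choose a starting value for $\theta$, sample independent Wiener processes $W_i$ for $i = 1, \ldots, n$ and set $Z^\star_i = W_i$.

2. Update $Z^\star \mid (\theta, \mathcal{D})$. Independently, for $1 \le i \le n$ do

(a) Sample a Wiener process $Z^\circ_i$.

(b) Sample $U \sim \mathcal{U}(0, 1)$. Compute

\[ A_1 = \frac{\Psi_\theta(g(\theta, Z^\circ_i))}{\Psi_\theta(g(\theta, Z^\star_i))}. \]

Set

\[ Z^\star_i := \begin{cases} Z^\circ_i & \text{if } U \le A_1, \\ Z^\star_i & \text{if } U > A_1. \end{cases} \]

3. Update $\theta \mid (Z^\star, \mathcal{D})$.

(a) Sample $\theta^\circ \sim q(\cdot \mid \theta)$.

(b) Sample $U \sim \mathcal{U}(0, 1)$. Compute

\[ A_2 = \prod_{i=1}^n \frac{\tilde p_{\theta^\circ}(t_{i-1}, x_{i-1}; t_i, x_i)}{\tilde p_\theta(t_{i-1}, x_{i-1}; t_i, x_i)}\, \frac{\Psi_{\theta^\circ}(g(\theta^\circ, Z^\star_i))}{\Psi_\theta(g(\theta, Z^\star_i))} \times \frac{q(\theta \mid \theta^\circ)}{q(\theta^\circ \mid \theta)}\, \frac{\pi_0(\theta^\circ)}{\pi_0(\theta)}. \]

Set

\[ \theta := \begin{cases} \theta^\circ & \text{if } U \le A_2, \\ \theta & \text{if } U > A_2. \end{cases} \]

4. Repeat steps (2) and (3).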

Note that in none of these steps do we need to compute the innovations $Z^\star$ from $X^\star$. This is a consequence of adapting the definition of the innovations to the bridge proposals being used. It is essential, as $Z^\star$ depends on $r$, which is intractable (see equation (4.3)).
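The steps of Algorithm 1 can be sketched in Python for a deliberately simple special case. All of the following are assumptions made for illustration and are not taken from the paper: a scalar diffusion $dX_t = b(X_t)\,dt + \theta\,dW_t$; the auxiliary process $d\tilde X_t = \theta\,dW_t$, so that $\tilde a = a$ and $D(s, x) = b(x)(v - x)/(\theta^2 (T - s))$; naive Euler and left-point sums on each segment; a random walk on $\log\theta$ as proposal $q$, for which $q(\theta \mid \theta^\circ)/q(\theta^\circ \mid \theta) = \theta^\circ/\theta$. The innovations are stored as Brownian increments on the grid, and the function names are hypothetical.

```python
import numpy as np

def g(theta, dZ, t, u, v, b):
    """Euler sketch of X = g(theta, Z): guided proposal from u to v driven by
    the increments dZ, with guiding term (v - x) / (T - t)."""
    T, X = t[-1], np.empty(len(t))
    X[0] = u
    for i in range(len(t) - 2):
        dt = t[i + 1] - t[i]
        X[i + 1] = X[i] + (b(X[i]) + (v - X[i]) / (T - t[i])) * dt + theta * dZ[i]
    X[-1] = v
    return X

def log_psi(theta, X, t, v, b):
    """Left-point approximation of int_0^T D(s, X_s) ds for this special case."""
    D = b(X[:-1]) * (v - X[:-1]) / (theta**2 * (t[-1] - t[:-1]))
    return np.sum(D * np.diff(t))

def log_ptilde(theta, x0, x1, dt):
    """log N(x1; x0, theta^2 dt): transition density of the auxiliary process."""
    var = theta**2 * dt
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x1 - x0) ** 2 / var

def algorithm1(obs_t, obs_x, b, log_prior, theta0, n_iter, m=50, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(obs_t) - 1
    grids = [np.linspace(0.0, obs_t[i + 1] - obs_t[i], m + 1) for i in range(n)]
    draw = lambda i: rng.normal(0.0, np.sqrt(np.diff(grids[i])))
    def lpsi(th, dZ, i):
        X = g(th, dZ, grids[i], obs_x[i], obs_x[i + 1], b)
        return log_psi(th, X, grids[i], obs_x[i + 1], b)
    theta, Z = theta0, [draw(i) for i in range(n)]          # step 1
    chain = np.empty(n_iter)
    for k in range(n_iter):
        for i in range(n):                                  # step 2
            Zo = draw(i)
            if np.log(rng.uniform()) <= lpsi(theta, Zo, i) - lpsi(theta, Z[i], i):
                Z[i] = Zo
        theta_new = theta * np.exp(step * rng.normal())     # step 3(a)
        logA = log_prior(theta_new) - log_prior(theta) + np.log(theta_new / theta)
        for i in range(n):                                  # step 3(b)
            dt = obs_t[i + 1] - obs_t[i]
            logA += log_ptilde(theta_new, obs_x[i], obs_x[i + 1], dt)
            logA -= log_ptilde(theta, obs_x[i], obs_x[i + 1], dt)
            logA += lpsi(theta_new, Z[i], i) - lpsi(theta, Z[i], i)
        if np.log(rng.uniform()) <= logA:
            theta = theta_new
        chain[k] = theta
    return chain
```

A call such as `algorithm1(np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.3, -0.1]), lambda x: -x, lambda th: -th, 1.0, 1000)` would return a chain of $\theta$ values; the data and prior in that call are made up. Only `g` and `log_psi` are ever applied to the stored increments, so the innovations are never reconstructed from a bridge path.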


Step 2 constitutes a step of an MH-sampler with independent proposals. The expression for $A_1$ follows directly from equation (3.2). The expression for $A_2$ in step 3 follows in exactly the same way as equation (2.6) was established in the toy example (cf. Section 2).

4.3. Partially conjugate series prior for the drift

In this subsection we study the specific case for which

$$b_\vartheta(x) = \sum_{i=1}^{N} \vartheta_i \varphi_i(x), \tag{4.5}$$

where $\vartheta = (\vartheta_1, \ldots, \vartheta_N)$ is an unknown parameter and $\varphi_1, \ldots, \varphi_N$ are known functions on $R^d$. We assume the diffusion coefficient is parametrised by the parameter $\gamma$. We denote the vector of all unknown parameters by $\theta = (\vartheta, \gamma)$ and assume these are assigned independent priors. With slight abuse of notation we use $\pi_0(\vartheta)$ and $\pi_0(\gamma)$ to denote the priors on $\vartheta$ and $\gamma$ respectively (the argument in parentheses will clarify which prior is meant). In this case it is convenient to choose a conjugate Gaussian prior for the coefficients

$$\vartheta_i \sim N(0, \xi_i^2)$$

for positive scaling constants $\xi_i$. Priors for the drift obtained by specifying a prior distribution on $\vartheta$ were previously considered in Van der Meulen et al. (2014). Upon completing the square, it follows that the distribution of $\vartheta$ conditional on $\gamma$ and the full path $Y$ of the diffusion is multivariate normal with mean vector $W_\gamma^{-1}\mu_\gamma$ and covariance matrix $W_\gamma^{-1}$. Here we define, for $k, \ell \in \{1, \ldots, N\}$,

$$\mu_\gamma[k] = \int_0^T \varphi_k(Y_t)' a_\gamma^{-1}(Y_t) \, dY_t,$$

$$\Sigma_\gamma[k, \ell] = \int_0^T \varphi_k(Y_t)' a_\gamma^{-1}(Y_t) \varphi_\ell(Y_t) \, dt,$$

$$W_\gamma = \Sigma_\gamma + \mathrm{diag}(\xi_1^{-2}, \ldots, \xi_N^{-2}).$$

(For a vector $x \in R^n$ we denote the $i$-th element by $x[i]$. To emphasise the dependence on $Y$ we sometimes also write $\mu_\gamma(Y)$, $W_\gamma(Y)$ etc.) This leads to a natural adaptation of algorithm 1 from section 4.2.
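For a path that is only available on a discrete time grid, the integrals defining $\mu_\gamma$ and $\Sigma_\gamma$ have to be approximated. The following sketch uses left-endpoint (Itô-type) sums on a regular grid, which is our assumption and not a prescription from the text; the function names `conjugate_drift_posterior` and `sample_theta` and their arguments are hypothetical.

```python
import numpy as np

def conjugate_drift_posterior(path, dt, phis, a_inv, xi):
    """Approximate mu and W from a discretised path Y.

    path  : array (M+1, d), values of Y on a regular grid with step dt
    phis  : list of N functions R^d -> R^d (the known basis functions)
    a_inv : function x -> (d, d) array, inverse diffusion matrix at x
    xi    : length-N sequence of prior standard deviations
    """
    n_basis = len(phis)
    mu = np.zeros(n_basis)
    sigma_mat = np.zeros((n_basis, n_basis))
    for y, dy in zip(path[:-1], np.diff(path, axis=0)):
        p = np.stack([phi(y) for phi in phis])   # (N, d); row k is phi_k(y)
        pa = p @ a_inv(y)                        # row k is phi_k(y)' a^{-1}(y)
        mu += pa @ dy                            # Ito sum, left endpoint
        sigma_mat += (pa @ p.T) * dt             # Riemann sum
    w = sigma_mat + np.diag(1.0 / np.asarray(xi, dtype=float) ** 2)
    return mu, w

def sample_theta(mu, w, rng):
    """Draw from N(W^{-1} mu, W^{-1}) using a Cholesky factor of W."""
    mean = np.linalg.solve(w, mu)
    chol = np.linalg.cholesky(w)                 # W = L L'
    # L'^{-1} z has covariance (L L')^{-1} = W^{-1} for standard normal z.
    return mean + np.linalg.solve(chol.T, rng.standard_normal(len(mu)))
```

Working with a Cholesky factor of $W_\gamma$ avoids forming $W_\gamma^{-1}$ explicitly.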

Algorithm 2. Steps 1, 2 and 4 as in algorithm 1. Assume that $\sigma$ is invertible. Step 3 is given by

3.1 Update $\gamma \mid (\vartheta, Z^\star, D)$.

(a) Sample $\gamma^\circ \sim q(\cdot \mid \gamma)$.

(b) Sample $U \sim U(0, 1)$. Compute

$$A_3 = \prod_{i=1}^{n} \left[ \frac{\tilde p_{(\gamma^\circ, \vartheta)}(t_{i-1}, x_{i-1}; t_i, x_i)}{\tilde p_{(\gamma, \vartheta)}(t_{i-1}, x_{i-1}; t_i, x_i)} \, \frac{\Psi_{(\gamma^\circ, \vartheta)}(g((\gamma^\circ, \vartheta), Z^\star_i))}{\Psi_{(\gamma, \vartheta)}(g((\gamma, \vartheta), Z^\star_i))} \right] \frac{q(\gamma \mid \gamma^\circ)}{q(\gamma^\circ \mid \gamma)} \, \frac{\pi_0(\gamma^\circ)}{\pi_0(\gamma)}.$$

Set

$$\gamma := \begin{cases} \gamma^\circ & \text{if } U \le A_3, \\ \gamma & \text{if } U > A_3. \end{cases}$$

3.2 Update ϑ | (γ, Z?,D).

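(a) Compute $\mu_\gamma = \mu_\gamma(g((\vartheta, \gamma), Z^\star))$ and $W_\gamma = W_\gamma(g((\vartheta, \gamma), Z^\star))$.

(b) Sample $\vartheta^\circ \sim N(W_\gamma^{-1}\mu_\gamma, W_\gamma^{-1})$.

(c) Compute $Z^\circ$ such that $g((\vartheta^\circ, \gamma), Z^\circ) = g((\vartheta, \gamma), Z^\star)$. Set $\vartheta = \vartheta^\circ$ and $Z^\star = Z^\circ$.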

Note that computation of Z◦ in step 3.2(c) requires invertibility of σ.

Proof. Suppose $(\vartheta, \gamma, Z^\star) \sim \pi$, where $\pi$ denotes the posterior distribution. Consider the map $f : (\vartheta, \gamma, Z^\star) \mapsto (\vartheta, \gamma, X^\star)$, where $X^\star = g((\vartheta, \gamma), Z^\star)$. Here $X^\star$ solves $(\star)$ by (4.3). We show that step 3.2 preserves $\pi$. The distribution of $(\vartheta, \gamma, X^\star)$ is the image measure of the posterior distribution $\pi$ of the tuple $(\vartheta, \gamma, Z^\star)$ under $f$ and coincides with the posterior distribution of $(\vartheta, \gamma, X^\star)$. Denote the image measure of $\pi$ under $f$ by $\pi \circ f^{-1}$. In steps 3.2(a) and 3.2(b) we apply the mapping $f$, followed by a Gibbs step in which we draw $\vartheta^\circ$ conditional on $(\gamma, X^\star)$. The latter preserves $\pi \circ f^{-1}$. Hence $(\vartheta^\circ, \gamma, X^\star) \sim \pi \circ f^{-1}$. In step 3.2(c) we compute $(\vartheta^\circ, \gamma, Z^\circ)$ as the pre-image of $(\vartheta^\circ, \gamma, X^\star)$ under $f$ (this is possible as we assume $\sigma$ to be invertible). Hence $(\vartheta^\circ, \gamma, Z^\circ) \sim \pi$.

A variation of this algorithm is obtained in case the drift is of the form specified in equation (4.5) and the diffusion coefficient depends on both $\vartheta$ and $\gamma$. In this case we can update $\gamma$ just as in algorithm 2. Updating $\vartheta$ can be done using a random walk type proposal of the form

$$q(\vartheta^\circ \mid \vartheta) \sim N(\vartheta, \alpha V),$$

with $\alpha$ a positive tuning parameter. Motivated by the posterior covariance matrix exploited in the case of partial conjugacy, we propose to replace $V$ by $W_\vartheta^{-1}$. By this choice, if two components $\vartheta_i$ and $\vartheta_j$ are strongly correlated, the proposed local random walk proposals have the same correlation structure, which can improve mixing of the chain.

Algorithm 3. The same algorithm as algorithm 2, without the invertibility assumption and with step 3.2 replaced by

3.2' Update $\vartheta \mid (\gamma, Z^\star, D)$.

(a) Set $X^\star = g(\vartheta, Z^\star)$.

(b) Compute $W_{(\vartheta, \gamma)}$.

(c) Sample $\vartheta^\circ \sim N(\vartheta, \alpha^2 W^{-1}_{(\vartheta, \gamma)})$.

(d) Compute $W_{(\vartheta^\circ, \gamma)}$.

(e) Sample $U \sim U(0, 1)$. Compute

$$A_4 = \prod_{i=1}^{n} \left[ \frac{\tilde p_{(\gamma, \vartheta^\circ)}(t_{i-1}, x_{i-1}; t_i, x_i)}{\tilde p_{(\gamma, \vartheta)}(t_{i-1}, x_{i-1}; t_i, x_i)} \, \frac{\Psi_{(\gamma, \vartheta^\circ)}(g((\gamma, \vartheta^\circ), Z^\star_i))}{\Psi_{(\gamma, \vartheta)}(g((\gamma, \vartheta), Z^\star_i))} \right] \frac{\pi_0(\vartheta^\circ)}{\pi_0(\vartheta)} \times \frac{|W_{\vartheta^\circ}|^{1/2}}{|W_{\vartheta}|^{1/2}} \exp\left( -\frac{1}{2\alpha^2} (\vartheta^\circ - \vartheta)' (W_{\vartheta^\circ} - W_{\vartheta}) (\vartheta^\circ - \vartheta) \right).$$

Set

$$\vartheta := \begin{cases} \vartheta^\circ & \text{if } U \le A_4, \\ \vartheta & \text{if } U > A_4. \end{cases}$$

The following argument gives some guidance in the choice of $\alpha$. If the target distribution is a $d$-dimensional Gaussian distribution $N_d(\mu, \Sigma)$ and the proposal is of the form $\vartheta^\circ \sim q(\cdot \mid \vartheta) = N_d(\vartheta, \alpha^2 \Sigma_q)$, then optimal choices for $\alpha$ and $\Sigma_q$ are given by $\Sigma_q = \Sigma$ and $\alpha = 2.38/\sqrt{d}$ (cf. Rosenthal (2011)). Hence, we will choose $\alpha = 2.38/\sqrt{\dim(\vartheta)}$, which corresponds to an average acceptance probability of 0.234.
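The proposal in step 3.2'(c) and the proposal-density ratio that appears as the last factor of $A_4$ can be sketched as follows. This is a minimal illustration under the assumption that the matrices $W_\vartheta$ and $W_{\vartheta^\circ}$ are already available as NumPy arrays; the function names are hypothetical.

```python
import numpy as np

def rw_scale(dim):
    """Scaling alpha = 2.38 / sqrt(dim) for the random-walk proposal."""
    return 2.38 / np.sqrt(dim)

def propose(theta, w, rng):
    """Draw theta_new ~ N(theta, alpha^2 W^{-1}) via a Cholesky factor of W."""
    alpha = rw_scale(len(theta))
    chol = np.linalg.cholesky(w)                       # W = L L'
    step = np.linalg.solve(chol.T, rng.standard_normal(len(theta)))
    return theta + alpha * step

def log_q_ratio(theta, theta_new, w, w_new):
    """log q(theta | theta_new) - log q(theta_new | theta).

    Because the proposal covariance depends on the current state, the
    proposal is not symmetric and this term does not cancel in A_4.
    """
    alpha = rw_scale(len(theta))
    diff = theta_new - theta
    _, logdet_new = np.linalg.slogdet(w_new)
    _, logdet_old = np.linalg.slogdet(w)
    quad = diff @ (w_new - w) @ diff
    return 0.5 * (logdet_new - logdet_old) - quad / (2.0 * alpha ** 2)
```

If `w_new` equals `w` the ratio is zero on the log scale, which recovers the usual symmetric random walk.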

5. Discussion on related work

In this section we point out similarities and differences of this paper with respect to related approaches in Golightly and Wilkinson (2006), Golightly and Wilkinson (2010), Delyon and Hu (2006), Fuchs (2013) and Papaspiliopoulos et al. (2013). Delyon and Hu (2006) introduced proposals with dynamics governed by the SDE

$$dX^\circ_t = \left( \lambda b(t, X^\circ_t) + \frac{v - X^\circ_t}{T - t} \right) dt + \sigma(t, X^\circ_t) \, dW_t, \qquad X^\circ_0 = u, \tag{$\circ'$}$$

where $\lambda \in \{0, 1\}$. Sufficient conditions for absolute continuity for both cases are given in Delyon and Hu (2006). The pulling term $(v - X^\circ_t)/(T - t)$ in the SDE forces $X^\circ$ to hit $v$ at time $T$, and the case $\lambda = 1$ takes the drift of the process into account when proposing bridges. However, for both $\lambda = 0$ and $\lambda = 1$ the resulting bridges may not resemble true bridges.

Without loss of generality we may assume we have one observation $X_T = v$. If $\lambda = 0$ in $(\circ')$, a slight adjustment of the Euler discretisation of the bridge (introduced by Durham and Gallant (2002)) sets

$$X^\circ_{t_i} - X^\circ_{t_{i-1}} \approx \frac{v - X^\circ_{t_{i-1}}}{T - t_{i-1}} \, \delta_i + \frac{T - t_i}{T - t_{i-1}} \, \sigma(t_{i-1}, X^\circ_{t_{i-1}}) (W_{t_i} - W_{t_{i-1}}),$$

where $0 = t_0 < t_1 < \cdots < t_N = T$ and $\delta_i = t_i - t_{i-1}$ (the adjustment being the addition of the term $(T - t_i)/(T - t_{i-1})$ in front of $\sigma$). If $\delta := \max_{1 \le i \le N} \delta_i$ is small, the right hand side defines a discrete time Markov chain approximation $X^\circ_i$ to $X^\circ_{t_i}$ termed the Modified Brownian Bridge. We can define a mapping $g$ by the relation

$$(\Delta X^\circ_1, \ldots, \Delta X^\circ_N) = g(\theta, \Delta W_1, \ldots, \Delta W_N),$$

where $\Delta X^\circ_i = X^\circ_i - X^\circ_{i-1}$ and $\Delta W_i = W_{t_i} - W_{t_{i-1}}$. This implies that parameters in the diffusion coefficient can be updated conditional on $(\Delta W_1, \ldots, \Delta W_N)$ instead of $(\Delta X^\circ_1, \ldots, \Delta X^\circ_N)$, thereby preventing degeneracy of the data-augmentation algorithm. This is the approach of Golightly and Wilkinson (2010). Chib et al. (2004) use the same algorithm using Euler discretisation of $X^\circ$ in case $\lambda = 0$ (so no correction in the diffusion coefficient appears in the discretisation). Note that there is no formal proof that these methods work as $\delta \downarrow 0$.

Fuchs (2013) explores how to use the driving Brownian motion in $(\circ')$ as innovation process within a continuous time setup. MH-acceptance probabilities are obtained using results in Delyon and Hu (2006), who provide expressions for the likelihood ratio of the laws of $X^\star$ and $X^\circ$. However, the absolute continuity results of Delyon and Hu (2006) do not provide the proportionality constants in the derived likelihood ratio. For generating diffusion bridges using a MH-sampler these constants are irrelevant. However, as these constants do depend on $\theta$, they cannot be neglected when $\theta$ is updated.

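We briefly review the derivation of this constant in the one-dimensional case if $\lambda = 0$ (the argument for $\lambda = 1$ is similar). Let $t < T$ and denote the restrictions of the measures $\mathbb{P}^\star$ and $\mathbb{P}^\circ$ by $\mathbb{P}^\star_t$ and $\mathbb{P}^\circ_t$ respectively. By equation (4.1) in Schauer et al. (2013) and Girsanov's theorem,

$$\frac{d\mathbb{P}^\star_t}{d\mathbb{P}^\circ_t}(X^\circ) = \frac{p(t, X^\circ_t; T, v)}{p(0, u; T, v)} \times G_t(X^\circ) \times \exp\left( -\int_0^t \kappa(s, X^\circ_s) a^{-1}(s, X^\circ_s) \, dX^\circ_s + \frac{1}{2} \int_0^t \kappa(s, X^\circ_s)^2 a^{-1}(s, X^\circ_s) \, ds \right),$$

where $\kappa(s, x) = (v - x)/(T - s)$ and

$$G_t(X^\circ) = \exp\left( \int_0^t b(s, X^\circ_s) a^{-1}(s, X^\circ_s) \, dX^\circ_s - \frac{1}{2} \int_0^t b(s, X^\circ_s)^2 a^{-1}(s, X^\circ_s) \, ds \right).$$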

A careful argument by Delyon and Hu (2006) (also outlined in Papaspiliopoulos et al. (2013)) shows that this likelihood ratio can be rewritten as

$$G_t(X^\circ) \times \exp\left( -\frac{1}{2} \int_0^t (T - s) \kappa(s, X^\circ_s)^2 \diamond da^{-1}(s, X^\circ_s) \right) \times \frac{p(t, X^\circ_t; T, v)}{\varphi(v; X^\circ_t, (T - t) a(t, X^\circ_t))} \, \frac{\alpha}{\sqrt{a(t, X^\circ_t)}}. \tag{5.1}$$

Here $\varphi(x; \mu, a)$ denotes the value of the normal density with mean $\mu$ and variance $a$, evaluated at $x$, and $\alpha$ is a constant given by

$$\alpha = \frac{1}{p(0, u; T, v)} \, \frac{1}{\sqrt{2\pi T}} \exp\left( -\frac{1}{2T} a(0, u)^{-1} (v - u)^2 \right).$$

The $\diamond$-integral is obtained as the limit of sums where the integrand is computed at the right limit of each time interval, as opposed to the left limit used in the definition of the Itô integral. Delyon and Hu (2006) provide sufficient conditions such that for solutions to $(\circ')$ there exists an almost surely finite random variable $C$ such that

$$|v - X^\circ_t|^2 \le C (T - t) \log\log(1/(T - t) + e). \tag{5.2}$$


This ensures that the second term in equation (5.1) is well defined in the limit $t \uparrow T$. Intuitively, when $t$ is close to $T$ and $X^\circ_t$ is close to $v$, $p(t, X^\circ_t; T, v)$ and $\varphi(v; X^\circ_t, (T - t) a(t, X^\circ_t))$ cancel each other. This is true under the expectation. For completeness we give the result in A.2, as to the best of our knowledge a rigorous derivation of the constant appearing in the likelihood ratio for the Delyon and Hu (2006) type proposals is missing in the literature. In this derivation, we naturally assume the conditions stated in section 4 of Delyon and Hu (2006).

Papaspiliopoulos et al. (2013) extend the reparametrisation of the diffusion bridges in Roberts and Stramer (2001) (cf. section 2) to reducible diffusions and otherwise decouple parameter and bridge by first discretising and then using the transformation formula in Euclidean coordinates for the discretised bridges. The use of the driving Brownian motion of the continuous time bridge proposal $(\circ')$ as innovation process was proposed by Fuchs (2013).

6. Time changing and scaling of linear guided proposals

Simulation of $X^\circ$ and numerical evaluation of $\Psi$ as defined in (3.4) is numerically cumbersome since the drift of $X^\circ$ and the integrand $D$ explode for $s$ near the endpoint $T$. As an example, suppose $\sigma$ is constant and we take $d\tilde X_t = \sigma \, dW_t$. Then we have $\tilde r(s, x) = a^{-1}(v - x)/(T - s)$ and

$$\log \Psi(X^\circ) = \int_0^T D(s, X^\circ_s) \, ds = \int_0^T b(s, X^\circ_s)' \tilde r(s, X^\circ_s) \, ds.$$

In this section we explain how these numerical problems can be dealt with using a time change and scaling of the proposal process.

For the particular example just given, Clark (1990) proposed to perform a time change and scaling of the proposal process to remove the singularities. Define $\tau_C : [0, \infty) \to [0, T)$ by $\tau_C(s) = T(1 - e^{-s})$ and $U^C_s = e^{s/2}(v - X^\circ_{\tau_C(s)})$. Then $U^C$ satisfies the stochastic differential equation

$$dU^C_s = -T e^{-s/2} b(T(1 - e^{-s}), v - e^{-s/2} U^C_s) \, ds - \frac{1}{2} U^C_s \, ds - \sqrt{T} \sigma \, dW_s,$$

which behaves as a mean-reverting Ornstein-Uhlenbeck process as $s \to \infty$. Furthermore,

$$\log \Psi(X^\circ) = \int_0^\infty e^{-s/2} b(\tau_C(s), v - e^{-s/2} U^C_s)' \, T a^{-1} U^C_s \, ds$$

(note that there are some minor typographical errors in Clark (1990)). Clearly, if $b$ is bounded, this removes the singularity near $T$, but at the cost of having to deal with an infinite integration interval. For this reason, we propose a different time change and scaling.

The intuition about our choice of the time change is as follows: as shown in Schauer et al. (2013), up to a logarithmic term $\|v - X^\circ_s\| \sim \sqrt{T - s}$ for $s$ close to $T$. Therefore, $\|\tilde r(s, X^\circ_s)\| \sim \left\| \frac{v - X^\circ_s}{T - s} \right\| \sim 1/\sqrt{T - s}$. Now if we wish to remove the singularity in the integral $\int_0^T (T - s)^{-1/2} \, ds$, we can substitute $s = u(2 - u/T)$ to get

$$\int_0^T (T - s)^{-1/2} \, ds = 2T^{-1/2} \int_0^T 1 \, du = 2\sqrt{T}.$$
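The effect of the substitution can be checked numerically. The short sketch below (variable names are ours, and $T = 3$ is an arbitrary choice) evaluates the transformed integrand $(T - \tau(u))^{-1/2} \tau'(u)$ at grid midpoints and confirms that it is the constant $2/\sqrt{T}$, so that the singularity at the endpoint has disappeared.

```python
import numpy as np

def tau(s, T):
    """Time change tau(s) = s (2 - s / T)."""
    return s * (2.0 - s / T)

def dtau(s, T):
    """Derivative of the time change, tau'(s) = 2 (1 - s / T)."""
    return 2.0 * (1.0 - s / T)

T = 3.0
n = 1000
u = (np.arange(n) + 0.5) * T / n            # midpoints of an equidistant grid

# Integrand after substituting s = tau(u): (T - tau(u))^{-1/2} * tau'(u).
integrand = dtau(u, T) / np.sqrt(T - tau(u, T))

# Midpoint rule; the integrand is constant, so the rule is exact up to rounding.
integral = np.sum(integrand) * T / n
```

Since $T - \tau(u) = (T - u)^2/T$, the factor $\tau'(u) = 2(T - u)/T$ cancels the singular term exactly.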

To make this intuitive idea more precise, we need a few more results from Schauer et al. (2013). If $\tilde X$ is a linear process (satisfying equation (3.3)), then

$$\tilde r(s, x) = \tilde H(s)(v(s) - x), \tag{6.1}$$

where

$$v(s) = \Phi(s, T) v - \int_s^T \Phi(s, \tau) \tilde\beta(\tau) \, d\tau \tag{6.2}$$

($\tilde r$ and $\tilde H$ were defined in equation (3.1)). Here $\Phi(t, s) = \Psi(t)\Psi(s)^{-1}$ with $\Psi$ the fundamental $d \times d$ matrix that satisfies

$$\Psi(t) = I + \int_0^t \tilde B(\tau) \Psi(\tau) \, d\tau.$$

Define the time change $\tau : [0, T] \to [0, T]$ by

$$\tau(s) = s(2 - s/T)$$

and define the process $U$ by

$$U_s := \frac{1}{2} \, \frac{v(\tau(s)) - X^\circ_{\tau(s)}}{T - \tau(s)} \, \tau'(s) = \frac{v(\tau(s)) - X^\circ_{\tau(s)}}{T - s}. \tag{6.3}$$

This implies

$$X^\circ_{\tau(s)} = v(\tau(s)) - (T - s) U_s =: G(s, U_s). \tag{6.4}$$

In the following, we denote the derivatives of $v$ and $\tau$ by $\dot v$ and $\dot\tau$ respectively.

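Proposition 6.1. The time changed process $U = (U_s, s \in [0, T))$ satisfies the stochastic differential equation

$$dU_s = \frac{2}{T} \dot v(\tau(s)) \, ds - \frac{2}{T} b(\tau(s), G(s, U_s)) \, ds + \frac{1}{T - s} \left( I - 2a(\tau(s), G(s, U_s)) J(s) \right) U_s \, ds - \sqrt{\frac{2}{T}} \, \frac{1}{\sqrt{T - s}} \, \sigma(\tau(s), G(s, U_s)) \, dW_s, \qquad U_0 = \frac{v - u}{T}, \tag{6.5}$$

where

$$J(s) = \tilde H(\tau(s))(T - s)^2/T. \tag{6.6}$$

Moreover, $\lim_{s \uparrow T} \tilde a(s) J(s) = I$.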

If we simulate $U$ on an equidistant grid we can recover $X^\circ$ on a nonequidistant grid from equation (6.4). This implies $X^\circ$ is evaluated on an increasingly finer grid as $s$ increases to $T$. In our implementation, all computations are done in the time-changed/scaled domain, and the mapping $g$ is in fact defined by setting $U_s = g(\theta, Z^\circ_s)$, where $Z^\circ$ is the driving Brownian motion for $U$.
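The grid refinement can be made concrete with a small sketch. The helper names `time_changed_grid` and `recover_x` are hypothetical, and the example assumes that the values $v(\tau(s))$ are supplied by the caller (for the driftless auxiliary process they are simply the endpoint $v$).

```python
import numpy as np

def tau(s, T):
    """Time change tau(s) = s (2 - s / T)."""
    return s * (2.0 - s / T)

def time_changed_grid(T, m):
    """Equidistant grid in s and the induced nonequidistant grid tau(s)."""
    s = np.linspace(0.0, T, m + 1)
    return s, tau(s, T)

def recover_x(u, s, T, v_tau):
    """Equation (6.4): X_{tau(s)} = v(tau(s)) - (T - s) U_s.

    u     : array (m+1, d), values of U on the s-grid
    s     : array (m+1,), the equidistant grid
    v_tau : array (d,) or (m+1, d), values v(tau(s)); assumed given
    """
    return v_tau - (T - s)[:, None] * u

# Example with T = 2 and m = 4: the spacing in original time shrinks
# from 0.875 at the start to h^2 / T = 0.125 in the last step.
s_grid, t_grid = time_changed_grid(2.0, 4)
increments = np.diff(t_grid)
```

At $s = T$ the factor $(T - s)$ vanishes, so the recovered path ends at $v(T) = v$ whatever the value of $U_T$.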


At first, it may seem that the SDE given by equation (6.5) has not resolved the singularity problem at time $T$. However, note that if $b$ and $\sigma$ are smooth, then for $s \approx T$,

$$dU_s \approx -\frac{2}{T} b(T, v) \, ds - \frac{1}{T - s} U_s \, ds - \sqrt{\frac{2}{T}} \, \frac{1}{\sqrt{T - s}} \, \sigma(T, v) \, dW_s. \tag{6.7}$$

In the following subsection we study discretisation of this SDE and show that no discretisation error is made at the final discretisation step near $T$.

Finally we give the time changed version of formula (3.2), which follows from the same elementary, if tedious, calculations as in the proof of Proposition 6.1:

$$\int_0^T D(s, X^\circ_s) \, ds = 2 \int_0^T (b - \tilde b)'(\tau(s), G(s, U_s)) J(s) U_s \, ds + 2 \int_0^T \mathrm{tr}\left[ \frac{(a - \tilde a)(\tau(s), G(s, U_s))}{T - s} J(s) \right] ds + 2T \int_0^T \mathrm{tr}\left[ \frac{(a - \tilde a)(\tau(s), G(s, U_s))}{T - s} J(s) U_s U_s' J(s) \right] ds.$$

6.1. Discretisation error

In this section we show in a simple setting the benefits of discretising $U$ instead of $X^\circ$. To this end, we consider the diffusion $dX_s = \sigma \, dW_s$. In this case it is natural to take $d\tilde X_s = \sigma \, dW_s$, which yields proposals satisfying the stochastic differential equation (SDE)

$$dX^\circ_s = \frac{v - X^\circ_s}{T - s} \, ds + \sigma \, dW_s, \qquad X^\circ_0 = u$$

(note the resemblance with equation (6.7)). Of course, for this particular example it is trivial to obtain an exact realisation at a finite set of points in $[0, T]$, but suppose for a moment we are not aware of this fact. Now there are at least two approaches for obtaining an approximate discrete skeleton of the process:

1. Apply the Euler discretisation directly to the SDE for $X^\circ$.

2. First transform $X^\circ$ to the process $U$ as defined in (6.3), then apply Euler discretisation on that process and subsequently transform this discretisation back to $X^\circ$.

In this section we will show that the second approach yields improved accuracy. We assume in both cases we wish to obtain a skeleton at $m - 1$ (with $m \ge 2$) time points in $(0, T)$. Let $h = T/m$.

Approach 1. Direct application of Euler discretisation to $X^\circ$ gives that the law of $X^\circ_{s+h}$ conditional on $X^\circ_s = x$ is approximately normal with mean $x + \frac{v - x}{T - s} h$ and covariance $C_{\mathrm{Euler}} = ha$. While the mean is exact, the true covariance equals $C_{\mathrm{true}} = h \frac{T - s - h}{T - s} a$. Therefore, for $s \in [0, T - h]$,

$$d(s) := \|C_{\mathrm{true}} - C_{\mathrm{Euler}}\| = \frac{h^2}{T - s} \|a\| \le h \|a\|.$$

Note that the largest difference is attained for $s = T - h$.

Approach 2. Straightforward calculus gives that $U$ satisfies the SDE

$$dU_s = -\frac{1}{T - s} U_s \, ds + \sqrt{\frac{2}{T}} \, \frac{1}{\sqrt{T - s}} \, \sigma \, dW_s, \qquad U_0 = \frac{v - u}{T}.$$

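If we denote the Euler discretisation applied to this equation by $\bar U$, then

$$\bar U_{s+h} \mid \bar U_s = c \sim N\left( \frac{T - s - h}{T - s} c, \; \frac{2}{T} \, \frac{h}{T - s} a \right).$$

As $X^\circ_{\tau(s)} = v - (T - s) U_s$, we define $\bar X^\circ_{\tau(s)} = v - (T - s) \bar U_s$. Using the result of the previous display, some tedious calculations reveal that

$$\bar X^\circ_{\tau(s+h)} \mid \bar X^\circ_{\tau(s)} = x \sim N\left( x + 2h \frac{v - x}{T - s} - h^2 \frac{v - x}{(T - s)^2}, \; C'_{\mathrm{Euler}} \right),$$

where

$$C'_{\mathrm{Euler}} = \frac{2h}{T} \, \frac{(T - s - h)^2}{T - s} a.$$

The true law of $X^\circ_{\tau(s+h)}$, conditional on $X^\circ_{\tau(s)} = x$, has exactly the same mean, while the true covariance equals

$$C'_{\mathrm{true}} = \frac{2h}{T} \, \frac{(T - s - h)^2}{T - s} a - \frac{h^2}{T} \, \frac{(T - s - h)^2}{(T - s)^2} a.$$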

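Therefore,

$$d'(s) := \|C'_{\mathrm{true}} - C'_{\mathrm{Euler}}\| = \frac{h^2}{T} \left( 1 - \frac{h}{T - s} \right)^2 \|a\| \le \frac{h^2}{T} \left( \frac{T - h}{T} \right)^2 \|a\| \le \frac{h^2}{T} \|a\| = \frac{h}{m} \|a\|.$$

Note that this difference is zero if $s = T - h$, while its maximal value is attained for $s = 0$.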

Comparing both approaches we see that

$$R_m(i) := \frac{d'((i - 1)h)}{d((i - 1)h)} = \frac{(m - i)^2}{m(m - i + 1)}, \qquad i = 1, \ldots, m.$$

This ratio is always smaller than 1 and $i \mapsto R_m(i)$ decreases roughly linearly to zero. This implies that it is always advantageous in this example to use the second approach; the closer we come to the endpoint $T$, the more we gain. As we apply the discretisation iteratively over all points it is important that the ratio is uniformly smaller than 1. Further, from figure 2 it is apparent that the relative gain of approach 2 is higher for small values of $m$. Simulation results indicate that this advantage also holds more generally, though a proof for this is not available yet.
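The two error expressions and their ratio are easy to evaluate numerically. The sketch below (function names are ours) computes $d$, $d'$ and $R_m(i)$ on the grid $s = (i - 1)h$ and can be compared against the closed form $(m - i)^2 / (m(m - i + 1))$.

```python
import numpy as np

def d_direct(s, h, T, a_norm=1.0):
    """Covariance error of Euler applied directly to X: h^2 / (T - s) * ||a||."""
    return h ** 2 / (T - s) * a_norm

def d_time_changed(s, h, T, a_norm=1.0):
    """Covariance error of Euler applied to U: h^2/T * (1 - h/(T - s))^2 * ||a||."""
    return h ** 2 / T * (1.0 - h / (T - s)) ** 2 * a_norm

def ratio(m, T=1.0):
    """R_m(i) for i = 1, ..., m, evaluated at s = (i - 1) h with h = T / m."""
    h = T / m
    i = np.arange(1, m + 1)
    s = (i - 1) * h
    return d_time_changed(s, h, T) / d_direct(s, h, T)

def ratio_closed_form(m):
    """Closed form (m - i)^2 / (m (m - i + 1))."""
    i = np.arange(1, m + 1)
    return (m - i) ** 2 / (m * (m - i + 1))
```

The ratio does not depend on $T$ or $\|a\|$, only on $m$ and $i$, which is why figure 2 can be drawn against $i/m$ alone.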


FIGURE 2. Behaviour of Rm(i) with respect to m and i.

7. Examples

7.1. Example for one-dimensional diffusion

In this section we consider an example. The goal is twofold: (i) to show that the proposed algorithm does not deteriorate when increasing the number of imputed points; (ii) to show that the time change and scaling proposed in the previous section reduces discretisation error.

We consider the diffusion process driven by the SDE

dXt = (α arctan(Xt) + β) dt+ σ dWt.

Assume that we observe $X$ at time points $t = 0, 0.3, 0.6, \ldots, T = 30$ and wish to estimate $(\alpha, \beta, \sigma)$. As true values we took $\alpha = -2$, $\beta = 0$ and $\sigma = 0.75$. For generating the discrete time data we simulated the process on $[0, T]$ at 400 001 equidistant time points using the Euler scheme and took a subsample. See figure 3.

For $\alpha$ and $\beta$ we chose a priori independent $N(0, \xi^2)$-distributions with variance $\xi^2 = 5$. For $\log\sigma$ we used an uninformative flat prior. We applied algorithm 2 with random walk proposals for $q(\sigma^\circ \mid \sigma)$ of the form

log σ◦ := log σ + u, u ∼ U(−0.1, 0.1).

We initialised the sampler with $\alpha = -0.1$, $\beta = -0.1$ and $\sigma = 2$ and varied the number of imputed points over $m = 10$, 100 and 1000.

Figures 4 and 5 illustrate the results of running the MCMC chain for 10 000 iterations using $m = 10, 100, 1000$ imputed points respectively for each bridge (including endpoints), both with time change and without. Two things stand out: firstly, increasing the number of imputed points $m$ does not worsen the mixing of the chain, and secondly, the bias is vastly reduced when using discretisation of $U$ (especially when $m$ is small, as we anticipated in section 6.1).

FIGURE 3. Diffusion path and discrete observations (small circles).

7.2. Prokaryotic auto-regulation

7.2.1. Model and data

In this section we apply our methods to an example presented in section 4 of Golightly and Wilkinson (2010) on prokaryotic auto-regulation. We briefly introduce the example and focus on estimation; the interested reader can consult Golightly and Wilkinson (2010) for more details and background on this example. A gene is expressed into a protein P in several steps, including transcription of the DNA encoding the protein into mRNA by RNA-polymerase and synthesis of the protein in the ribosome as specified by the mRNA. In the auto-regulation model, pairs of the protein may reversibly form dimers P2 which bind to certain regions of the DNA, resulting in bound DNA·P2 which is not transcribed into RNA for P, thus providing an auto-regulation mechanism for the synthesis of this particular protein P. The total amount of bound and free DNA is constant, DNA + DNA·P2 = K, so the quantities of RNA, P, P2 and DNA describe the state of the system. These reactions can be modelled as a Markov jump process. While it is straightforward to exactly simulate sample paths of the jump process with values (RNA, P, P2, DNA) $\in N^4$, estimation can be carried out by exploiting a diffusion approximation to this process. This approximating diffusion process with values in $R^4$ solves the Chemical Langevin Equation

$$dX_t = S h_\theta(X_t) \, dt + S \, \mathrm{diag}\left(\sqrt{h_\theta(X_t)}\right) dW_t \tag{7.1}$$


FIGURE 4. Panels comparing different numbers of imputed points (m = 10, 100, 1000). Left: without time change. Right: with time change. Top: first 500 iterates. Middle: iterates 501-10 000. Bottom: ACF-plots based on iterates 501-10 000.


FIGURE 5. Kernel density estimates of the parameters based on iterates 501-10 000. Top: non time-changed. Bottom: time-changed. The bias from using only a small number of imputed points (m = 10, red curve) is clearly smaller for the time changed process.

driven by an $R^8$-valued Brownian motion, where

$$S = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 2 & 0 & -1 \\ -1 & 1 & 0 & 0 & 1 & -1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

is the stoichiometry matrix of the system describing the chemical reactions and

$$h_\theta(x) = \theta \circ h(x)$$

is a function describing the hazard for a particular reaction to happen. Here $\circ$ denotes the Hadamard (or entrywise) product of two vectors and

$$\theta = [\theta_1, \theta_2, \ldots, \theta_8]'$$

is the vector of the unknown rates of each of the 8 distinct reactions, which are known to happen proportionally to the amount of

$$h(x) = [x_3 x_4, \; K - x_4, \; x_4, \; x_1, \; \tfrac{1}{2} x_2 (x_2 - 1), \; x_3, \; x_1, \; x_2]'$$

(respectively) in the system. Note that this is a natural situation in which both drift and diffusion coefficient depend on the same parameters. Golightly and Wilkinson (2010) proceed from (7.1) to a SDE driven by a 4-dimensional Brownian motion with $\sigma(x) = (S \, \mathrm{diag}(h_\theta(x)) \, S')^{1/2}$. In our case, except for algorithm 2 which does not apply here, this is not necessary.
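The coefficients of the Chemical Langevin Equation (7.1) translate directly into code. The following sketch is an illustration only; the function names (`hazards`, `drift`, `dispersion`, `euler_step`) are ours, and the single Euler step is included merely to show how the $4 \times 8$ dispersion matrix acts on an eight-dimensional Brownian increment. It assumes all hazards are nonnegative at the state where it is evaluated.

```python
import numpy as np

# Stoichiometry matrix S (4 species x 8 reactions), as given in the text.
S = np.array([
    [0, 0, 1, 0, 0, 0, -1, 0],
    [0, 0, 0, 1, -2, 2, 0, -1],
    [-1, 1, 0, 0, 1, -1, 0, 0],
    [-1, 1, 0, 0, 0, 0, 0, 0],
], dtype=float)

def hazards(x, K):
    """h(x) for the state x = (RNA, P, P2, DNA)."""
    x1, x2, x3, x4 = x
    return np.array([x3 * x4, K - x4, x4, x1,
                     0.5 * x2 * (x2 - 1.0), x3, x1, x2])

def drift(x, theta, K):
    """b_theta(x) = S (theta o h(x))."""
    return S @ (theta * hazards(x, K))

def dispersion(x, theta, K):
    """sigma_theta(x) = S diag(sqrt(theta o h(x))), a 4 x 8 matrix.

    Requires nonnegative hazards; otherwise the square root is undefined.
    """
    return S @ np.diag(np.sqrt(theta * hazards(x, K)))

def euler_step(x, theta, K, dt, rng):
    """One Euler-Maruyama step driven by an 8-dimensional Brownian increment."""
    dw = rng.normal(scale=np.sqrt(dt), size=8)
    return x + drift(x, theta, K) * dt + dispersion(x, theta, K) @ dw
```

Note that no matrix square root of $S \, \mathrm{diag}(h_\theta(x)) \, S'$ is needed when the non-square dispersion matrix is used directly.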


FIGURE 6. Prokaryotic auto-regulation: Quantities of (RNA, P, P2, DNA) at integer times (simulated from discrete model).

Golightly and Wilkinson (2010) simulated the continuous time Markov jump process with parameter values $\theta = (0.1, 0.7, 0.35, 0.2, 0.1, 0.9, 0.3, 0.1)$ and $K = 10$ and recorded the quantities of each of (RNA, P, P2, DNA) at integer times $0, 1, \ldots, 49$, see figure 6.

For inference, Golightly and Wilkinson (2010) use independent $U[-5, 5]$ priors on $\log\theta_i$ and a Gaussian random walk proposal updating the components of $\theta$ jointly. They find that the few observations taken allow for remarkably good recovery of the true parameters.

7.2.2. Choice of guided proposals

We perform an analysis of the same data, this time with the guided innovation scheme using the time change from section 6, using (7.1) as model. We choose $\tilde B$ and $\tilde\beta$ to depend on $\theta$ (but not on time) so that $\tilde B x + \tilde\beta$ approximates $b_\theta(x)$. While it is possible to take different approximations specifically tailored for each bridge segment, it is computationally advantageous to work with a global approximation to $b_\theta$. To this end, we replace $h$ by a linear approximation $\tilde h$ which allows for obtaining $\tilde B_\theta$ and $\tilde\beta_\theta$ from the equation

$$\tilde B_\theta x + \tilde\beta_\theta = S(\theta \circ \tilde h(x)).$$

Except for its first and fifth entry, $x \mapsto h(x)$ is linear. Therefore, for $i \in \{2, 3, 4, 6, 7, 8\}$ we set $\tilde h_i = h_i$. For $i = 1$, we take $\tilde h_1 = c_1 + u_{1,3} x_3 + u_{1,4} x_4$. Values for $c_1$, $u_{1,3}$ and $u_{1,4}$ are obtained from a weighted linear regression of $x_3 x_4$ on $x_3$ and $x_4$, with weights proportional to $x_3 x_4$. Similarly, for $i = 5$, we take $\tilde h_5 = c_5 + u_{5,2} x_2$. Values for $c_5$ and $u_{5,2}$ are obtained from a weighted linear regression of $\frac{1}{2} x_2 (x_2 - 1)$ on $x_2$, with weights proportional to $x_2 (x_2 - 1)$.

We take a weighted regression in this way because for a good proposal the error matters more if the corresponding dispersion component is small. This automatic procedure gives $u_{5,2} = 18$, $c_5 = -82$, $u_{1,3} = 5$, $u_{1,4} = 6$, and $c_1 = -32$ for the data obtained.
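A weighted linear regression of this kind can be sketched with a standard weighted least squares solve. The helper `weighted_linear_fit` is hypothetical and the observation array in the usage note is a placeholder; which points enter the regression and how the weights are normalised are assumptions on our part, so the sketch is not expected to reproduce the coefficients quoted above.

```python
import numpy as np

def weighted_linear_fit(X, y, w):
    """Weighted least squares fit of y on an intercept and the columns of X.

    Minimises sum_k w[k] * (y[k] - c - X[k] @ u)^2 and returns (c, u_1, ...).
    X : array (n, p) of regressors, y : array (n,), w : array (n,) of
    nonnegative weights.
    """
    design = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary LS.
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef
```

For the first hazard one would call, for an array `obs` of observed states with columns (RNA, P, P2, DNA), something like `weighted_linear_fit(obs[:, [2, 3]], obs[:, 2] * obs[:, 3], obs[:, 2] * obs[:, 3])` to obtain $(c_1, u_{1,3}, u_{1,4})$.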

7.2.3. Rejection of nonpositive bridges

Contrary to the underlying jump process, the chemical Langevin approximation may have positive probability of leaving the positive cone, even before Euler discretisation (Szpruch and Higham (2009/10)). This is also the case for this example. To see this intuitively, note for example that $a_{11}(x) = \theta_3 x_4 + \theta_7 x_1$. So even if the first coordinate of $X$ approaches 0 at time $t$, the diffusivity of $X$ in that direction does not vanish if $X^{(4)}_t > 0$, so $X^{(1)}_{t+h} < 0$ with positive probability.

As we are naturally not interested in values of $\theta$ which explain the data via bridges $X^\star$ violating the condition $X^\star \ge 0$, we resample proposals with $X^\star_s < 0$ for some $s$. In this way we effectively draw from the conditional distribution of $\theta$, conditional on the discrete time data and $X^\star_s \ge 0$ for all values of $s$ that correspond to imputed points.

7.2.4. Results from applying algorithm 1

As priors for $\log\theta_i$ we use independent $U[-7, 7]$ distributions. We used algorithm 1 with step 2(a) replaced with

• Sample a Wiener process $Z^\circ_i$, conditional on $g(\theta, Z^\circ_i) \ge 0$ (using rejection sampling).

and step 3(a) replaced with

• Sample $\theta^\circ \sim q(\cdot \mid \theta)$, conditional on $g(\theta^\circ, Z^\circ_i) \ge 0$ for all $1 \le i \le n$ (using rejection sampling).

We took random-walk type proposals for $\theta$ on the log scale, where conditional on the current value $\theta$ a new value $\theta^\circ$ is sampled from

log θ◦ ∼ N(log θ, (0.12)2).

We choose $\theta = (0.05, \ldots, 0.05)'$ as first iterate. A run of 100 000 iterations using $m = 20, 50$ imputed points (including endpoints) for each bridge yields the estimates for $\theta_1, \ldots, \theta_8$ given in the upper panel of table 1.

In case $m = 50$, on average 76% of the sampled bridges for a segment were accepted in step 2(b) and the average acceptance probability for $\theta^\circ$ in step 3 was 24%.

7.2.5. Results from applying algorithm 3

In this example one can clearly see from figure 8 that $\theta_1$ and $\theta_2$, as well as $\theta_4$ and $\theta_5$, are highly correlated. This worsens mixing of the Markov chain for these parameters.


Alg. 1   Mean     Standard dev.   Autocorrelation time
θ1       0.0957   0.0277          180.05
θ2       0.7523   0.2187          176.88
θ3       0.3777   0.1120          171.18
θ4       0.2342   0.0516           75.18
θ5       0.0726   0.0206          187.59
θ6       0.7016   0.1958          169.24
θ7       0.3095   0.0901          169.36
θ8       0.1381   0.0297           87.62

Alg. 3   Mean     Standard dev.   Autocorrelation time
θ1       0.0976   0.0289           97.23 (54%)
θ2       0.7664   0.2268          100.29 (57%)
θ3       0.3811   0.1039           20.44 (12%)
θ4       0.2343   0.0523           15.47 (21%)
θ5       0.0731   0.0215          127.90 (68%)
θ6       0.7048   0.2032          121.05 (72%)
θ7       0.3130   0.0845           21.30 (13%)
θ8       0.1376   0.0295           15.74 (18%)

TABLE 1. Prokaryotic auto-regulation: Estimated parameters of the posterior distribution using algorithms 1 and 3. The estimated autocorrelation time was computed using R package coda.

Since

$$b_\theta(x) = S(\theta \circ h(x)) = \sum_j \theta_j \varphi_j(x) \quad \text{with} \quad \varphi_j(x) = (s_{ij} h_j(x))_i \tag{7.2}$$

we see that $b$ is of the form (4.5). Therefore, we can apply algorithm 3. We make a slight adjustment, in which step 3.2'(c) is replaced with

• Sample $\theta^\circ \sim N(\theta, \alpha^2 W_\theta^{-1})$ under the condition $g(\theta^\circ, Z^\star_i) \ge 0$ for all $1 \le i \le n$ (using rejection sampling).

A run of 100 000 iterations using $m = 20, 50$ imputed points (including endpoints) for each bridge yields the estimates for $\theta_1, \ldots, \theta_8$ given in the lower panel of table 1. We took the same priors as in subsection 7.2.4. The scaling parameter of the random walk proposal was chosen according to the remark after algorithm 3. The corresponding acceptance probability for the proposals for $\theta$ was 26% (for $m = 50$). This compares well with the anticipated value of 0.234.

From table 1 we see that indeed application of algorithm 3 greatly reduces autocor-relation times over algorithm 1.

Page 26: Bayesian estimation of discretely observed multi-dimensional ...meulen/innovations...onDelyon and Hu(2006) was explored byFuchs(2013) (in particular section 7.4). Van der Meulen and

Van der Meulen and Schauer/Bayesian estimation for diffusions using guided proposals 26

[Figure: trace plots of θ1, ..., θ8 against the iterate (0 to 100000), one column of panels for m = 20 and one for m = 50.]

FIGURE 7. Prokaryotic auto-regulation: Iterates of the MCMC chain for m = 20, 50 using 3.a. Note the shift in the distribution of θ5 and θ6.

[Figure: two panels, m = 20 and m = 50, each showing a scatterplot of theta5 against theta6 with marginal densities. Reported correlations: 0.884 (m = 20) and 0.922 (m = 50).]

FIGURE 8. Scatterplots of θ5, θ6 and marginal densities for m = 20, m = 50 (including burn in).


[Figure: autocorrelation function (acf) against lag (0 to 30) for θ1, θ3 and θ7, with columns of panels for m = 20 and m = 50.]

FIGURE 9. ACF plots of the thinned samples of θ1, θ3, θ7 (taking every 50th iterate after burn in). Left: algorithm 1, right: algorithm 3.

Appendix A: Proofs

A.1. Proof of Proposition 6.1

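For ease of notation we will write $\tau$ and $\dot\tau$ instead of $\tau(s)$ and $\dot\tau(s)$. If $X$ satisfies the SDE

$$dX_s = b(s, X_s)\,ds + \sigma(s, X_s)\,dW_s$$

and we are given a smooth function $\tau = \tau(s)$, $\tau\colon [0, T) \to \mathbb{R}_+$, with positive derivative $\dot\tau$, then

$$dX_\tau = \dot\tau\, b(\tau, X_\tau)\,ds + \sqrt{\dot\tau}\,\sigma(\tau, X_\tau)\,dW_s. \tag{A.1}$$

Applying this to $X^\circ$ (defined in equations (◦) and (◦◦)) gives

$$dX^\circ_\tau = 2(1 - s/T)\,\bigl[b(\tau, X^\circ_\tau) + a(\tau, X^\circ_\tau)\, r(\tau, X^\circ_\tau)\bigr]\,ds + \sqrt{2(1 - s/T)}\,\sigma(\tau, X^\circ_\tau)\,dW_s.$$

By Itô's formula,

$$
\begin{aligned}
dU_s = d\left(\frac{v(\tau) - X^\circ_\tau}{T - s}\right)
&= \frac{2}{T}\,\dot v(\tau)\,ds + \frac{v(\tau)}{(T - s)^2}\,ds - \frac{X^\circ_\tau}{(T - s)^2}\,ds \\
&\quad - \frac{2}{T}\,\bigl[b(\tau, X^\circ_\tau) + a(\tau, X^\circ_\tau)\, r(\tau, X^\circ_\tau)\bigr]\,ds - \frac{\sqrt{2/T}}{\sqrt{T - s}}\,\sigma(\tau, X^\circ_\tau)\,dW_s \\
&= \frac{2}{T}\,\dot v(\tau)\,ds + \frac{U_s}{T - s}\,ds - \frac{2}{T}\,\bigl[b(\tau, X^\circ_\tau) + a(\tau, X^\circ_\tau)\, r(\tau, X^\circ_\tau)\bigr]\,ds \\
&\quad - \frac{\sqrt{2/T}}{\sqrt{T - s}}\,\sigma(\tau, X^\circ_\tau)\,dW_s.
\end{aligned}
$$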


By equation (6.1),

$$r(\tau, X^\circ_\tau) = H(\tau)\,(v(\tau) - X^\circ_\tau) = H(\tau)\,(T - s)\,U_s = J(s)\,\frac{T}{T - s}\,U_s.$$

The result now follows from substituting this expression and using the relation $X^\circ_\tau = G(U_s)$. The final statement follows from $\lim_{t \to T} H(t)(T - t) = a^{-1}$, see (Schauer et al., 2013, Lemma 8).
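The coefficients appearing in the proof can be checked numerically. The sketch below assumes the time change τ(s) = s(2 − s/T), which is the choice with τ(0) = 0 whose derivative is the factor 2(1 − s/T) used above; the value of T and all function names are illustrative.

```python
import math

T = 2.0  # illustrative end time

def tau(s):
    # Assumed time change with tau(0) = 0 and tau(s) -> T as s -> T
    return s * (2.0 - s / T)

def tau_dot(s):
    # Derivative of tau, the factor 2(1 - s/T) in the proof
    return 2.0 * (1.0 - s / T)

def check(s, h=1e-6):
    """Return discrepancies for the identities used in the proof at time s."""
    numeric_derivative = (tau(s + h) - tau(s - h)) / (2.0 * h)
    return {
        # tau_dot really is the derivative of tau
        "derivative": abs(numeric_derivative - tau_dot(s)),
        # remaining time on the original clock: T - tau(s) = T (1 - s/T)^2
        "remaining": abs((T - tau(s)) - T * (1.0 - s / T) ** 2),
        # drift factor after dividing by (T - s): tau_dot / (T - s) = 2 / T
        "drift": abs(tau_dot(s) / (T - s) - 2.0 / T),
        # noise factor: sqrt(tau_dot) / (T - s) = sqrt(2/T) / sqrt(T - s)
        "noise": abs(math.sqrt(tau_dot(s)) / (T - s)
                     - math.sqrt(2.0 / T) / math.sqrt(T - s)),
    }

for s in [0.0, 0.5, 1.0, 1.5, 1.9]:
    print(s, check(s))
```

All four discrepancies stay at the level of floating-point rounding, which is consistent with the factors 2/T and √(2/T)/√(T − s) in the expression for dU_s.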

A.2. Proportionality constant for the Delyon-Hu type proposal

Denote the product in equation (5.1) by $\xi_t(X^\circ)$. Delyon and Hu (2006) provide a rigorous proof that the laws of $X^\circ$ and $X^\star$ are absolutely continuous on $[0, T]$ with density $c\,\xi_T(X^\circ)$ for some (not specified) constant $c > 0$. Hence $E\,\xi_T(X^\circ) = 1/c$.

For $t < T$ we have

$$E\left[\frac{\varphi\bigl(v; X^\star_t, (T - t)\,a(t, X^\star_t)\bigr)}{p(t, X^\star_t; T, v)}\,\sqrt{a(t, X^\star_t)}\right] = \alpha\,E\,[\xi_t(X^\circ)].$$

The right-hand side of this expression converges to $\alpha\,E\,[\xi_T(X^\circ)] = \alpha/c$. The left-hand side equals

$$\int \frac{1}{\sqrt{2\pi (T - t)}} \exp\left(-\frac{(v - x)^2}{2\,a(t, x)\,(T - t)}\right) \frac{p(0, u; t, x)}{p(0, u; T, v)}\,dx.$$

From here, we can use the substitution $y = (x - v)/\sqrt{T - t}$ in the integral and use the same arguments as in the proof of Lemma 7 in Delyon and Hu (2006) to establish that this converges to $\sqrt{a(T, v)}$ as $t \uparrow T$. We conclude that $c = \alpha/\sqrt{a(T, v)}$ is the proportionality constant.
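The limit of the integral can be illustrated in a case where everything is explicit. The sketch below assumes a scalar scaled Brownian motion X = u + √a W with constant a, so that p(0, u; t, x) is the N(u, at) density; for this special case the integral equals √a for every t < T, not only in the limit. The function names and parameter values are illustrative.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def lhs_integral(t, T, u, v, a, n=20001, width=10.0):
    """Trapezoidal approximation of the integral for X = u + sqrt(a) * W."""
    lo = min(u, v) - width * math.sqrt(a * T)
    hi = max(u, v) + width * math.sqrt(a * T)
    h = (hi - lo) / (n - 1)
    denom = normal_pdf(v, u, a * T)  # p(0, u; T, v)
    total = 0.0
    for k in range(n):
        x = lo + k * h
        # Gaussian factor of the integrand, without the 1/sqrt(a) normalisation
        kernel = (math.exp(-(v - x) ** 2 / (2.0 * a * (T - t)))
                  / math.sqrt(2.0 * math.pi * (T - t)))
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * kernel * normal_pdf(x, u, a * t) / denom
    return total * h

a = 0.5
for t in (0.5, 0.9, 0.99):
    print(t, lhs_integral(t, T=1.0, u=0.0, v=1.0, a=a), math.sqrt(a))
```

For a state-dependent diffusion coefficient the integral is no longer constant in t, and only the limiting value √a(T, v) is available.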

Appendix B: Implementation details

The source code of the examples will be available online.¹ It is written in the programming language Julia (Bezanson et al., 2012). In this section we provide some computational details in case X is a time-homogeneous process with linear drift b(x) = Bx + β and constant diffusion coefficient a. Simulation of U as defined in equation (6.5) requires evaluation of both $\dot v$ and J, where v and J are defined in equations (6.2) and (6.6) respectively. If B ≡ 0, this is straightforward. Otherwise, if we let λ be the solution to the continuous-time Lyapunov (matrix) equation

$$B\lambda + \lambda B' + a = 0,$$

then

$$J(s) = \frac{(T - s)^2}{T}\left(e^{-B\,T(1 - s/T)^2}\,\lambda\,e^{-B'\,T(1 - s/T)^2} - \lambda\right)^{-1},$$

$$\dot v(\tau(s)) = e^{-B\,T(1 - s/T)^2}\,(Bv + \beta).$$
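In one dimension these formulas reduce to elementary expressions. The sketch below is an illustration under stated assumptions, not the code from the repository: B, β, a and v are scalars with B ≠ 0, so the Lyapunov equation becomes 2Bλ + a = 0, and all function names are hypothetical.

```python
import math

def lyapunov_scalar(B, a):
    # Scalar version of B*lam + lam*B' + a = 0, i.e. 2*B*lam + a = 0
    return -a / (2.0 * B)

def J(s, T, B, a):
    """Scalar J(s) computed from the Lyapunov solution (requires B != 0)."""
    delta = T * (1.0 - s / T) ** 2  # exponent T (1 - s/T)^2 from the formula
    lam = lyapunov_scalar(B, a)
    e = math.exp(-B * delta)
    # Mirrors the matrix expression exp(-B d) lam exp(-B' d) - lam
    return (T - s) ** 2 / T / (e * lam * e - lam)

def v_dot(s, T, B, beta, v):
    """Scalar version of the formula for the derivative of v at tau(s)."""
    delta = T * (1.0 - s / T) ** 2
    return math.exp(-B * delta) * (B * v + beta)

T, B, beta, a, v = 2.0, -0.8, 0.3, 0.5, 1.0
for s in (0.0, 0.7, 1.5, 1.99):
    print(s, J(s, T, B, a), v_dot(s, T, B, beta, v))
```

In this scalar setting J(s) approaches 1/a as s tends to T, in line with the limit of H(t)(T − t) quoted in Appendix A.1. In the matrix case the exponentials would be matrix exponentials and the division a matrix inverse.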

¹ See https://github.com/mschauer/ChemicalLangevin.jl.


Note that neither J nor $\dot v$ depends on x. This implies that these functions only need to be computed once on the grid of a single bridge in each iteration. If B does not depend on θ, then these functions can be precomputed on a grid before the MCMC algorithm is run. As computing matrix exponentials is expensive, this reduces computing time substantially.

Acknowledgement

We thank Andrew Golightly for sharing the simulated data used in the prokaryotic auto-regulation example with us.

References

Beskos, A., Papaspiliopoulos, O., Roberts, G. O. and Fearnhead, P. (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 68(3), 333–382. With discussions and a reply by the authors.

Bezanson, J., Karpinski, S., Shah, V. B. and Edelman, A. (2012). Julia: A fast dynamic language for technical computing. CoRR abs/1209.5145.

Bladt, M. and Sørensen, M. (2012). Simple simulation of diffusion bridges with application to likelihood inference for diffusions. Preprint.

Chib, S., Pitt, M. K. and Shephard, N. (2004). Likelihood based inference for diffusion driven models. Economics Papers 2004-W20, Economics Group, Nuffield College, University of Oxford.

Clark, J. M. C. (1990). The simulation of pinned diffusions. In Decision and Control, 1990., Proceedings of the 29th IEEE Conference on, pp. 1418–1420. IEEE.

Delyon, B. and Hu, Y. (2006). Simulation of conditioned diffusion and application to parameter estimation. Stochastic Processes and their Applications 116(11), 1660–1675.

Durham, G. B. and Gallant, A. R. (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. J. Bus. Econom. Statist. 20(3), 297–338. With comments and a reply by the authors.

Elerian, O., Chib, S. and Shephard, N. (2001). Likelihood inference for discretely observed nonlinear diffusions. Econometrica 69(4), 959–993.

Eraker, B. (2001). MCMC analysis of diffusion models with application to finance. J. Bus. Econom. Statist. 19(2), 177–191.

Fuchs, C. (2013). Inference for diffusion processes. Springer, Heidelberg. With applications in life sciences. With a foreword by Ludwig Fahrmeir.

Golightly, A. and Wilkinson, D. J. (2006). Bayesian sequential inference for nonlinear multivariate diffusions. Stat. Comput. 16(4), 323–338.

Golightly, A. and Wilkinson, D. J. (2010). Learning and Inference in Computational Systems Biology, chapter Markov chain Monte Carlo algorithms for SDE parameter estimation, pp. 253–276. MIT Press.

Gugushvili, S. and Spreij, P. (2012). Parametric inference for stochastic differential equations: a smooth and match approach. ALEA Lat. Am. J. Probab. Math. Stat. 9(2), 609–635.


Lin, M., Chen, R. and Mykland, P. (2010). On generating Monte Carlo samples of continuous diffusion bridges. J. Amer. Statist. Assoc. 105(490), 820–838.

Papaspiliopoulos, O., Roberts, G. O. and Skold, M. (2003). Non-centered parameterizations for hierarchical models and data augmentation. In Bayesian statistics, 7 (Tenerife, 2002), pp. 307–326. Oxford Univ. Press, New York. With a discussion by Alan E. Gelfand, Ole F. Christensen and Darren J. Wilkinson, and a reply by the authors.

Papaspiliopoulos, O., Roberts, G. O. and Stramer, O. (2013). Data Augmentation for Diffusions. J. Comput. Graph. Statist. 22(3), 665–688.

Pedersen, A. R. (1995). Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes. Bernoulli 1(3), 257–279.

Roberts, G. O. and Stramer, O. (2001). On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika 88(3), 603–621.

Rogers, L. C. G. and Williams, D. (2000). Diffusions, Markov processes, and martingales. Vol. 2. Cambridge Mathematical Library. Cambridge University Press, Cambridge. Itô calculus, Reprint of the second (1994) edition.

Rosenthal, J. S. (2011). Handbook of Markov Chain Monte Carlo (Chapman & Hall/CRC Handbooks of Modern Statistical Methods), chapter Optimal proposal distributions and adaptive MCMC. Chapman and Hall/CRC, 1 edition.

Schauer, M. R., Van der Meulen, F. H. and Van Zanten, J. H. (2013). Guided proposals for simulating multi-dimensional diffusion bridges. ArXiv e-prints.

Sørensen, H. (2004). Parametric Inference for Diffusion Processes Observed at Discrete Points in Time: a Survey. Internat. Statist. Rev. 72, 337–354.

Szpruch, L. and Higham, D. J. (2009/10). Comparing hitting time behavior of Markov jump processes and their diffusion approximations. Multiscale Model. Simul. 8(2), 605–621.

Van der Meulen, F. H., Schauer, M. R. and Van Zanten, J. H. (2014). Reversible jump MCMC for nonparametric drift estimation for diffusion processes. Comput. Statist. Data Anal. 71, 615–632.

Van der Meulen, F. H. and Van Zanten, J. H. (2013). Consistent nonparametric Bayesian inference for discretely observed scalar diffusions. Bernoulli 19(1), 44–63.

Van Zanten, J. H. (2013). Nonparametric Bayesian methods for one-dimensional diffusion models. Mathematical Biosciences 243(2), 215–222.