
Scandinavian Journal of Statistics, Vol. 40: 669-684, 2013

doi: 10.1111/sjos.12017

© 2013 Board of the Foundation of the Scandinavian Journal of Statistics. Published by Wiley Publishing Ltd.

Fast Covariance Estimation for Innovations Computed from a Spatial Gibbs Point Process

JEAN-FRANÇOIS COEURJOLLY
Laboratory Jean Kuntzmann, Grenoble University

EGE RUBAK
Department of Mathematical Sciences, Aalborg University

ABSTRACT. In this paper, we derive an exact formula for the covariance of two innovations computed from a spatial Gibbs point process and suggest a fast method for estimating this covariance. We show how this methodology can be used to estimate the asymptotic covariance matrix of the maximum pseudo-likelihood estimator of the parameters of a spatial Gibbs point process model. This allows us to construct asymptotic confidence intervals for the parameters. We illustrate the efficiency of our procedure in a simulation study for several classical parametric models. The procedure is implemented in the statistical software R, and it is included in spatstat, which is an R package for analyzing spatial point patterns.

Key words: confidence intervals, exponential family models, Georgii–Nguyen–Zessin formula, innovation process, maximum pseudo-likelihood

1. Introduction

Spatial point patterns are datasets containing the random locations of some event of interest. Such datasets appear in many scientific fields such as biology, epidemiology, geography, astrophysics, physics and economics. The stochastic mechanism generating such a dataset is modelled as a spatial point process; general references covering both theoretical and practical aspects of this topic are, for example, Møller & Waagepetersen (2004), Stoyan et al. (1995) and Illian et al. (2008). The reference spatial point process model is the Poisson process, which models complete spatial randomness in the sense that points appear uniformly and independently of each other. In many applications, there is dependence (or interaction) between the points, and the Poisson point process model cannot be applied. In this case, Gibbs (or Markov) point processes constitute one of the main alternatives to the Poisson process, and they allow for both repulsive and attractive interaction between points. Gibbs point processes are typically defined through the so-called Papangelou conditional intensity, and a parametric class of Gibbs point process models is obtained by defining a parametric class of Papangelou conditional intensities. For the sake of simplicity, this paper deals with exponential family models, meaning that the Papangelou conditional intensity is log-linear in terms of the parameters. However, extensions to nonlinear models may be undertaken on the basis of this paper.

In the literature, several methods for estimating parameters of Gibbs point process models have been suggested, and we refer to Møller & Waagepetersen (2007) for a recent overview of this problem. One of the most widely used methods is the maximum pseudo-likelihood estimator (MPLE), originally suggested by Besag (1975). Theoretical aspects of the MPLE for stationary Gibbs point processes have been considered in, for example, the works of Jensen & Møller (1991), Jensen & Künsch (1994) and Billiot et al. (2008), whereas practical aspects were tackled in the work of Baddeley & Turner (2000). The popularity of this procedure is mainly due to its computational simplicity compared with the classical maximum likelihood method, and it is the default method for estimating parameters of spatial Gibbs point processes in the R package spatstat (Baddeley and Turner, 2005).

Typically, the uncertainty of the MPLE is assessed by parametric bootstrap methods. This is computationally expensive because it requires both Monte Carlo simulations of the fitted model and computation of the MPLE for each realization. As an alternative, Billiot et al. (2008) proved the asymptotic normality of the MPLE and derived a formula for the asymptotic covariance matrix as well as an estimator of this matrix. However, this estimator is also computationally expensive because of numerical approximation of several integrals. In this paper, we express the entries of the covariance matrix as covariances between certain spatial point process innovations as defined by Baddeley et al. (2005). We prove an exact formula for the covariance between two innovations and derive a consistent estimator of this covariance. The proposed estimator does not involve any integration, making it very fast compared with the alternative methods.

The rest of the paper is organized as follows. Section 2 introduces relevant notation and background material on spatial point processes including some known asymptotic results for the MPLE. Section 3 contains the main results of the paper. Here, we study the covariance between two innovations and suggest an estimator of the asymptotic covariance matrix for the MPLE. Section 4 illustrates the performance and efficiency of the developed methodology through a simulation study. Finally, auxiliary results and proofs are deferred to Appendix A.

2. Gibbs point processes and pseudo-likelihood

2.1. Definition of (Gibbs) point processes

A point process $X$ in $\mathbb{R}^d$ is a locally finite random subset of $\mathbb{R}^d$, meaning that the restriction of $X$ to any bounded Borel set is finite. The elements of $X$ are referred to as points, and we think of them as locations of some objects or events of interest. In applications, these may be locations of trees, mineral deposits, disease cases, galaxies, and so on.

In this paper, we keep measure-theoretical details to a minimum and only introduce the necessary notation and terminology. The point process $X$ takes values in the set $\Omega$ consisting of all locally finite subsets of $\mathbb{R}^d$. Thus, the distribution of $X$ is a probability measure $P$ on an appropriate $\sigma$-algebra consisting of subsets of $\Omega$. If the distribution of $X$ is translation invariant, we say that $X$ is stationary.

Often the points of a point process have extra information attached to them, for example, the size of a tree or the type of a disease. Such information is called a mark, taking values in a mark space $\mathbb{M}$, which is equipped with a reference mark distribution $\mu^m$. In this case, $X$ is called a marked point process with state space $S = \mathbb{R}^d \times \mathbb{M}$, and a typical element of $S$ is denoted $u := (u, m)$. We will often need to consider a point located at the origin with mark $m$, and in this particular case, we write $0^m := (0, m)$. The mark space $\mathbb{M}$ may be quite general, but the reader will miss no fundamental concepts by letting $\mathbb{M}$ be $\mathbb{R}^m$ or a countable set. The state space $S$ is equipped with the product measure $\mu^d \otimes \mu^m$, where $\mu^d$ is the Lebesgue measure on $\mathbb{R}^d$, and with a slight abuse of notation, we let $du := \mu^d \otimes \mu^m(u, m) = \mu^d(u)\,\mu^m(m)$. We call a marked point process stationary if the point process on $\mathbb{R}^d$ induced by discarding the marks is stationary. For marked point processes, we let $\Omega$ denote the set of locally finite subsets of $S$.

Throughout the paper, $\Lambda$ is exclusively used to denote bounded Borel sets of $\mathbb{R}^d$, and $|\Lambda|$ denotes the volume of such a set. For $x \in \Omega$, we let $x_\Lambda := x \cap (\Lambda \times \mathbb{M})$, and $n(x)$ denotes the number of points in $x$. For brevity, we say that '$X$ is observed in $\Lambda$' for some $\Lambda$ when the locations of $X$ are in $\Lambda$ and the marks are in $\mathbb{M}$.

In this paper, we work with stationary (marked) Gibbs point process models, which may be defined through a parametric family of Papangelou conditional intensities $\lambda_\theta : S \times \Omega \to \mathbb{R}^+$, $\theta \in \Theta$, where $\theta$ is the parameter vector and $\Theta$ is the parameter space. Heuristically, the Papangelou conditional intensity has the interpretation that $\lambda_\theta(u, X)\,du$ is the conditional probability of observing a marked point in a ball of volume $du$ around $u$ given that the rest of the point process is $X$ (see, for example, Møller & Waagepetersen (2004)). We will not discuss how to consistently specify the Papangelou conditional intensity to ensure the existence of a Gibbs point process on $S$, but rather we simply assume we are given a well-defined Gibbs point process. The reader interested in a deeper presentation of Gibbs point processes and the existence problem is referred to Ruelle (1969), Preston (1976) or Dereudre et al. (2012). In Section 2.2, we give several examples of Gibbs point processes.

Throughout the paper, we will often use the following two concepts for a function $f : S \times \Omega \to \mathbb{R}$:

(i) $f$ has finite interaction range $R \geq 0$, that is,

$$f(u, x) = f(u, x_{B(u,R)}) \tag{2.1}$$

where $B(u, R)$ is the Euclidean ball centred at $u$ with radius $R$.

(ii) $f$ is translation invariant, that is,

$$f((u, m), x) = f((0, m), \tau_u x) \tag{2.2}$$

where $\tau_u x = \{(v - u, m) \mid (v, m) \in x\}$ is the translation of the locations of $x$ by the vector $-u$.

In the remainder of the paper, we will assume the following general model assumption:

[Model] For any $u \in S$ and $x \in \Omega$, let $v(u, x) = (v_1(u, x), \ldots, v_p(u, x))^T$, where $v_i : S \times \Omega \to \mathbb{R}$ for $i = 1, \ldots, p$. For $\theta \in \Theta \subset \mathbb{R}^p$, let $\lambda_\theta : S \times \Omega \to \mathbb{R}^+$ be a function of the form

$$\lambda_\theta(u, x) = \exp(\theta^T v(u, x)) \tag{2.3}$$

satisfying (2.1) and (2.2). Let $P_\theta$ denote the distribution of a (well-defined) stationary hereditary marked Gibbs point process with Papangelou conditional intensity $\lambda_\theta$, and let $X \sim P_{\theta^\star}$.

Under this assumption, the Papangelou conditional intensity completely characterizes the Gibbs point process in terms of the Georgii–Nguyen–Zessin (GNZ) formula (see Papangelou (2009) and Zessin (2009) for historical comments and Georgii (1976) or Nguyen & Zessin (1979a) for a general presentation).

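Lemma 2.1 (Georgii–Nguyen–Zessin formula). For any measurable function $h : S \times \Omega \to \mathbb{R}$ such that the following quantities are defined and finite, then

$$E\Big( \sum_{u \in X} h(u, X \setminus u) \Big) = E\Big( \int_{\mathbb{R}^d \times \mathbb{M}} h(u, X)\, \lambda_{\theta^\star}(u, X)\, du \Big) \tag{2.4}$$

where $E$ denotes the expectation with respect to $P_{\theta^\star}$.

On the basis of this formula, Baddeley et al. (2005) defined the concept of $h$-innovation of a spatial point process (for a function $h : S \times \Omega \to \mathbb{R}$). The $h$-innovation computed in a bounded domain $\Lambda$ is the centred random variable defined by

$$I_\Lambda(X, h) := \sum_{u \in X_\Lambda} h(u, X \setminus u) - \int_{\Lambda \times \mathbb{M}} h(u, X)\, \lambda_{\theta^\star}(u, X)\, du. \tag{2.5}$$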



Baddeley et al. (2005) proposed to replace $\theta^\star$ in (2.5) by a consistent estimator to obtain residuals for spatial point processes. Such residuals can be used as a diagnostic tool for goodness-of-fit, and they have also been considered by Coeurjolly & Lavancier (2012) and Baddeley et al. (2011) both from a theoretical and practical point of view.
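To make definition (2.5) concrete, the following minimal Python sketch evaluates an $h$-innovation for an unmarked pattern on a square window. It is an illustration only and not the spatstat implementation: the function name `innovation`, its arguments, and the midpoint-rule quadrature on a regular grid are all assumptions made for this example.

```python
def innovation(points, h, cond_intensity, lo=0.0, hi=1.0, grid=50):
    """Sketch of the h-innovation (2.5) on the square window [lo, hi]^2.

    points         : list of (x, y) tuples (the observed pattern, unmarked)
    h              : function h(u, pattern) -> float
    cond_intensity : function lambda(u, pattern) -> float
    The integral term is approximated by a midpoint rule (an assumption
    of this sketch, chosen only for simplicity).
    """
    # Sum over the data points in the window; each point is evaluated
    # against the pattern with that point removed.
    total = 0.0
    for i, u in enumerate(points):
        if lo <= u[0] <= hi and lo <= u[1] <= hi:
            rest = points[:i] + points[i + 1:]
            total += h(u, rest)

    # Midpoint-rule approximation of the integral of h * lambda.
    step = (hi - lo) / grid
    integral = 0.0
    for a in range(grid):
        for b in range(grid):
            u = (lo + (a + 0.5) * step, lo + (b + 0.5) * step)
            integral += h(u, points) * cond_intensity(u, points) * step * step

    return total - integral
```

For a homogeneous Poisson process with constant conditional intensity and $h \equiv 1$, the innovation reduces to the number of points in the window minus the intensity times the window area, which gives a quick sanity check of the sketch.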

2.2. Examples of Gibbs point processes

In this section, we present some classical examples of parametric point process models (see, for example, Møller & Waagepetersen (2004) for more details). In particular, these examples will be used in the simulation study in Section 4 to assess the methodology proposed in this paper. Let $u \in S$ and $x \in \Omega$. Most of the examples presented hereafter are not marked, and in these cases, we omit the mark notation and simply write $u$.

(i) Poisson point process. Basic example for which the log-Papangelou conditional intensity is a constant, that is, $\log \lambda_\theta(u, x) = \theta$. The assumption [Model] is satisfied for any value of $\theta$.

(ii) Strauss point process. Defined by

$$\log \lambda_\theta(u, x) = \theta_1 + \theta_2\, n_{[0,R]}(u, x)$$

where $n_{[0,R]}(u, x) = \sum_{v \in x} \mathbf{1}(\|v - u\| \leq R)$ is the number of $R$-close neighbours of $u$ in $x$. This process has range of interaction $R$, and assumption [Model] is satisfied if $R < \infty$ and $\theta_2 \leq 0$.

(iii) Piecewise Strauss point process. Generalization of the Strauss point process obtained by substituting the indicator function with a step function. It is defined by

$$\log \lambda_\theta(u, x) = \theta_1 + \sum_{j=1}^{p} \theta_j\, n_{(R_{j-1}, R_j]}(u, x)$$

where $n_{(R_{j-1}, R_j]}(u, x) = \sum_{v \in x} \mathbf{1}(R_{j-1} < \|v - u\| \leq R_j)$ for $R_0 = 0 < R_1 < \cdots < R_p$. This process has range of interaction $R_p$, and assumption [Model] is satisfied if $R_p < \infty$ and $\theta_2, \ldots, \theta_p \leq 0$.

(iv) Geyer saturation point process (Geyer, 1999). The saturation point process, with interaction radius $R$, saturation threshold $s$, and parameters $\beta$ and $\gamma$, is the point process in which each point $u$ in the pattern $x$ contributes a factor $\beta \gamma^{\min(s,\, n_{[0,R]}(u, x \setminus u))}$ to the probability density. When $s = 1$, the log-Papangelou conditional intensity corresponds to

$$\log \lambda_\theta(u, x) = \theta_1 + \theta_2 \Big( \sum_{v \in x \cup u} \mathbf{1}(d(v, x \cup u \setminus v) \leq R) - \sum_{v \in x} \mathbf{1}(d(v, x \setminus v) \leq R) \Big)$$

where $d(u, x) = \min_{w \in x} \|w - u\|$ is the distance from $u$ to the nearest point of $x$. This process has range of interaction $2R$, and assumption [Model] is satisfied if $R < \infty$.

(v) Multi-type Strauss point process. This is a marked point process with $m$ discrete marks ($\mathbb{M} = \{1, \ldots, m\}$). It is defined by

$$\log \lambda_\theta((u, j), x) = \theta_j + \sum_{k=1}^{m} \theta_{jk}\, n_{[0, R_{jk})}((u, j), x_k), \quad j = 1, \ldots, m$$

where $\theta_{jk} = \theta_{kj}$ and $R_{jk} = R_{kj}$. Here, $n_{[0, R_{jk})}((u, j), x_k)$ denotes the number of points in $x$ of type $k$ that are $R_{jk}$-close neighbours to the point $(u, j)$ of type $j$. The process has range of interaction $R = \max R_{jk}$, and assumption [Model] is satisfied when $R < \infty$ and $\theta_{jk} \leq 0$, for all $j, k \in \{1, \ldots, m\}$.
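As an illustration of how such a log-linear conditional intensity is evaluated in practice, the Strauss case (ii) can be sketched in a few lines of Python. The function names `close_neighbours` and `strauss_log_cond_intensity` are hypothetical and chosen only for this example; points are assumed to be planar coordinate tuples.

```python
import math


def close_neighbours(u, x, R):
    """Number of points of the pattern x within distance R of location u."""
    return sum(1 for v in x if math.dist(u, v) <= R)


def strauss_log_cond_intensity(u, x, theta1, theta2, R):
    """Log-Papangelou conditional intensity of the Strauss model:
    theta1 + theta2 * (number of R-close neighbours of u in x)."""
    return theta1 + theta2 * close_neighbours(u, x, R)
```

With $\theta_1 = \log(200)$ and $\theta_2 = \log(0.5)$, a location with two $R$-close neighbours has conditional intensity $200 \times 0.5^2 = 50$, which shows how each close neighbour multiplies the intensity by $\gamma = e^{\theta_2}$.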



2.3. Maximum pseudo-likelihood estimator

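Assume we observe $X_{\Lambda^+}$, where $\Lambda^+ \subset \mathbb{R}^d$ is bounded, and let $\Lambda = \Lambda^+ \ominus R$ be the erosion of $\Lambda^+$ by $R$, that is,

$$\Lambda = \Lambda^+ \ominus R = \{u \in \Lambda^+ \mid B(u, R) \subseteq \Lambda^+\}. \tag{2.6}$$

The MPLE is the value $\theta = \widehat{\theta}$ that maximizes the pseudo-likelihood

$$PL_\Lambda(X; \theta) = \prod_{u \in X_\Lambda} \lambda_\theta(u, X \setminus u)\, \exp\Big( -\int_{\Lambda \times \mathbb{M}} \lambda_\theta(u, X)\, du \Big).$$

This maximum is attained at the root of the score function with $j$th component

$$\frac{\partial}{\partial \theta_j} \log PL_\Lambda(X; \theta) = \sum_{u \in X_\Lambda} v_j(u, X \setminus u) - \int_{\Lambda \times \mathbb{M}} v_j(u, X)\, \lambda_\theta(u, X)\, du$$

for $j = 1, \ldots, p$.

Because $X$ is observed in $\Lambda^+$ and because we assume the finite range assumption (2.1), the pseudo-likelihood can be effectively computed in $\Lambda$, with $X_{\Lambda^+ \setminus \Lambda}$ playing the role of a border correction.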

To detail the asymptotic properties of the MPLE, we now let $\Lambda = \Lambda_n$ depend on an index $n$. We assume that $(\Lambda_n)_{n \geq 1}$ is a sequence of increasing cubes such that $\Lambda_n \to \mathbb{R}^d$ as $n \to \infty$. Furthermore, we need the following technical assumption:

[MPLE] The parameter space $\Theta \subset \mathbb{R}^p$ is compact, $\theta^\star \in \mathring{\Theta}$ and for any $\theta \neq \theta^\star$, the following identifiability condition holds

$$P_{\theta^\star}\big( (\theta - \theta^\star)^T v(0^m, X) \neq 0 \big) > 0.$$

Furthermore, for all $u \in S$ and $x \in \Omega$, there exists a constant $\kappa \geq 0$ such that one of the following two assumptions is satisfied:

$$\theta_i \leq 0 \quad \text{and} \quad -\kappa \leq v_i(0^m, x) \leq \kappa\, n(x_{B(0,R)}) \tag{2.7}$$

or

$$-\kappa \leq v_i(0^m, x) \leq \kappa \tag{2.8}$$

where $R$ is the range of interaction defined in (2.1). Recall that $0^m := (0, m)$.

Billiot et al. (2008) extended the results in Jensen & Møller (1991) and Jensen & Künsch (1994) and obtained consistency and asymptotic normality of the MPLE for a large class of models including the examples presented in Section 2.2. We now state the central limit theorem for the MPLE.

Proposition 2.1 (Billiot et al. (2008)). Assume that the distribution of $X$ is ergodic and that [MPLE] is satisfied. Then, for $n \to \infty$, the MPLE is strongly consistent and satisfies the following central limit theorem

$$|\Lambda_n|^{1/2} (\widehat{\theta}_n - \theta^\star) \xrightarrow{d} \mathcal{N}(0, U^{-1} \Sigma U^{-1}),$$

where $U$ and $\Sigma$ are $(p, p)$ matrices with entries

$$U_{jk} = E\big[ v_j(0^M, X)\, v_k(0^M, X)\, \lambda_{\theta^\star}(0^M, X) \big] \tag{2.9}$$

$$\Sigma_{jk} = \lim_{n \to \infty} |\Lambda_n|^{-1}\, \mathrm{Cov}\Big( \frac{\partial}{\partial \theta_j} \log PL_{\Lambda_n}(X; \theta^\star),\; \frac{\partial}{\partial \theta_k} \log PL_{\Lambda_n}(X; \theta^\star) \Big) \tag{2.10}$$

where $M$ is a random variable with distribution $\mu^m$.

To propose a computationally efficient way of estimating the asymptotic covariance matrix for the MPLE, the key point is to note that

$$\frac{\partial}{\partial \theta_j} \log PL_\Lambda(X; \theta^\star) = I_\Lambda(X, v_j). \tag{2.11}$$

Thus, from (2.10), we need to be able to estimate the covariance between innovations, which we detail in the following section.

3. Covariance of innovations

Several properties of the innovations are established in Baddeley et al. (2005) and Baddeley et al. (2008). In particular, Proposition 4 in Baddeley et al. (2005) presents a formula for the variance of $I_\Lambda(X, h)$. We first extend this result by providing a formula for the covariance between two innovations $I_\Lambda(X, g)$ and $I_\Lambda(X, h)$. Then, we study the asymptotic covariance between innovations. In particular, we propose a consistent estimator of this covariance that requires no numerical integration. Finally, the results are applied to estimate the asymptotic covariance matrix of the MPLE, which allows us to quantify the uncertainty of the MPLE much faster than previously possible.

To obtain the asymptotic results in this section, we need the second-order Papangelou conditional intensity

$$\lambda_\theta(\{u, v\}, X) = \lambda_\theta(u, X \cup v)\, \lambda_\theta(v, X) = \lambda_\theta(v, X \cup u)\, \lambda_\theta(u, X), \quad u, v \in S. \tag{3.1}$$

Also, for any $v \in S$, we define the difference operator $\Delta_v$ applied to a function $h : S \times \Omega \to \mathbb{R}$ as

$$\Delta_v h(u, X) := h(u, X \cup v) - h(u, X). \tag{3.2}$$

We now present our first result.
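As a worked illustration of (3.1) and (3.2), consider the unmarked Strauss model of Section 2.2, for which $v(u, x) = (1, n_{[0,R]}(u, x))^T$. The following short computation is a sketch added for orientation; it assumes $v \notin X$, which holds almost surely.

```latex
\begin{align*}
\Delta_v v_1(u, X) &= 1 - 1 = 0, \\
\Delta_v v_2(u, X) &= n_{[0,R]}(u, X \cup v) - n_{[0,R]}(u, X)
                    = \mathbf{1}(\|u - v\| \leq R), \\
\frac{\lambda_\theta(u, X)\,\lambda_\theta(v, X)}{\lambda_\theta(\{u, v\}, X)}
  &= \frac{\lambda_\theta(u, X)}{\lambda_\theta(u, X \cup v)}
   = \exp\bigl(-\theta_2\, \mathbf{1}(\|u - v\| \leq R)\bigr).
\end{align*}
```

In particular, for two points at distance at most $R$ the ratio equals $e^{-\theta_2}$, and it equals 1 otherwise, so only $R$-close pairs contribute to second-order terms for this model.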

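Lemma 3.1. Assume $g, h : S \times \Omega \to \mathbb{R}$ such that the following quantities are defined and finite, then

$$\mathrm{Cov}\big( I_\Lambda(X, g), I_\Lambda(X, h) \big) = \widetilde{A}_{1,\Lambda}(g, h) + \widetilde{A}_{2,\Lambda}(g, h) + \widetilde{A}_{3,\Lambda}(g, h)$$

with

$$\widetilde{A}_{1,\Lambda}(g, h) = E\Big[ \int_{\Lambda \times \mathbb{M}} g(u, X)\, h(u, X)\, \lambda_{\theta^\star}(u, X)\, du \Big]$$

$$\widetilde{A}_{2,\Lambda}(g, h) = E\Big[ \int_{(\Lambda \times \mathbb{M})^2} g(u, X)\, h(v, X) \big( \lambda_{\theta^\star}(u, X)\, \lambda_{\theta^\star}(v, X) - \lambda_{\theta^\star}(\{u, v\}, X) \big)\, du\, dv \Big]$$

$$\widetilde{A}_{3,\Lambda}(g, h) = E\Big[ \int_{(\Lambda \times \mathbb{M})^2} \Delta_v g(u, X)\, \Delta_u h(v, X)\, \lambda_{\theta^\star}(\{u, v\}, X)\, du\, dv \Big].$$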



Now, we study the normalized covariance of innovations

$$C_{\Lambda_n}(g, h) := |\Lambda_n|^{-1}\, \mathrm{Cov}\big( I_{\Lambda_n}(X, g), I_{\Lambda_n}(X, h) \big)$$

where $(\Lambda_n)_{n \geq 1}$ is a sequence of increasing cubes such that $\Lambda_n \to \mathbb{R}^d$ as $n \to \infty$. For this, we need certain conditions on the functions $g$ and $h$, as detailed in the following assumption:

[H(g,h)] The functions $g, h : S \times \Omega \to \mathbb{R}$ satisfy (2.1) and (2.2). Furthermore, there exists an open neighbourhood $V$ of $\theta^\star$ such that for any $\theta \in V$, the random variables $I_1, I_2, I_3$ given by

$$I_1(g, h) := \big| g(0^M, X)\, h(0^M, X)\, \lambda_{\theta^\star}(0^M, X) \big| \tag{3.3}$$

$$I_2(g, h) := \int_{B(0,R) \times \mathbb{M}} \Big| g(0^M, X)\, h(v, X)\, \lambda_{\theta^\star}(\{0^M, v\}, X) \Big( \frac{\lambda_\theta(0^M, X)\, \lambda_\theta(v, X)}{\lambda_\theta(\{0^M, v\}, X)} - 1 \Big) \Big|\, dv \tag{3.4}$$

$$I_3(g, h) := \int_{B(0,R) \times \mathbb{M}} \big| \Delta_v g(0^M, X)\, \Delta_{0^M} h(v, X)\, \lambda_{\theta^\star}(\{0^M, v\}, X) \big|\, dv \tag{3.5}$$

have finite expectation.

Note that [Model] implies that $\lambda_\theta(\{u, v\}, X)$ is almost surely positive for any $u, v \in S$ and any $\theta \in \Theta$. In particular, the ratio in (3.4) is therefore well defined. For the result in the succeeding text, the neighbourhood $V$ appearing in [H(g,h)] could be replaced by $\{\theta^\star\}$.

Proposition 3.2. Assume [H(g,h)] and let $M$ be a random variable with distribution $\mu^m$. Then, as $n \to \infty$,

$$C_{\Lambda_n}(g, h) \to C(g, h) = A_1(g, h) + A_2(g, h) + A_3(g, h)$$

where

$$A_1(g, h) = E\big[ g(0^M, X)\, h(0^M, X)\, \lambda_{\theta^\star}(0^M, X) \big]$$

$$A_2(g, h) = \int_{B(0,R) \times \mathbb{M}} E\Big[ g(0^M, X)\, h(v, X) \big( \lambda_{\theta^\star}(0^M, X)\, \lambda_{\theta^\star}(v, X) - \lambda_{\theta^\star}(\{0^M, v\}, X) \big) \Big]\, dv$$

$$A_3(g, h) = \int_{B(0,R) \times \mathbb{M}} E\big[ \Delta_v g(0^M, X)\, \Delta_{0^M} h(v, X)\, \lambda_{\theta^\star}(\{0^M, v\}, X) \big]\, dv.$$

The following main result of this paper establishes a strongly consistent and computationally fast estimator of $C(g, h)$. The idea behind our result is to combine a consistent estimator of $\theta^\star$ with estimators of the terms $A_i(g, h)$, $i = 1, 2, 3$, of Proposition 3.2.

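Theorem 3.3. Let $g_\theta, h_\theta : S \times \Omega \to \mathbb{R}$ be parametric families of functions, which are (almost surely) continuous in $\theta$. Assume there exists an open neighbourhood $V$ of $\theta^\star$ such that for all $\theta \in V$, the assumption [H($g_\theta, h_\theta$)] holds, and let $\widehat{\theta} = \widehat{\theta}_n(X)$ be a strongly consistent estimator of $\theta^\star$. Then, as $n \to \infty$, we have the following almost sure convergence

$$\widehat{C}(g_{\widehat{\theta}}, h_{\widehat{\theta}}) := \widehat{A}_1(g_{\widehat{\theta}}, h_{\widehat{\theta}}) + \widehat{A}_2(g_{\widehat{\theta}}, h_{\widehat{\theta}}) + \widehat{A}_3(g_{\widehat{\theta}}, h_{\widehat{\theta}}) \to C(g_{\theta^\star}, h_{\theta^\star})$$

where

$$\widehat{A}_1(g_{\widehat{\theta}}, h_{\widehat{\theta}}) = \frac{1}{|\Lambda_n|} \sum_{u \in X_{\Lambda_n}} g_{\widehat{\theta}}(u, X \setminus u)\, h_{\widehat{\theta}}(u, X \setminus u)$$

$$\widehat{A}_2(g_{\widehat{\theta}}, h_{\widehat{\theta}}) = \frac{1}{|\Lambda_n|} \sum_{\substack{u, v \in X_{\Lambda_n} \\ u \neq v,\ \|u - v\| \leq R}} g_{\widehat{\theta}}(u, X \setminus \{u, v\})\, h_{\widehat{\theta}}(v, X \setminus \{u, v\}) \Big( \frac{\lambda_{\widehat{\theta}}(u, X \setminus \{u, v\})\, \lambda_{\widehat{\theta}}(v, X \setminus \{u, v\})}{\lambda_{\widehat{\theta}}(\{u, v\}, X \setminus \{u, v\})} - 1 \Big)$$

$$\widehat{A}_3(g_{\widehat{\theta}}, h_{\widehat{\theta}}) = \frac{1}{|\Lambda_n|} \sum_{\substack{u, v \in X_{\Lambda_n} \\ u \neq v,\ \|u - v\| \leq R}} \Delta_v g_{\widehat{\theta}}(u, X \setminus \{u, v\})\, \Delta_u h_{\widehat{\theta}}(v, X \setminus \{u, v\}).$$

From Proposition 3.2 and (2.9)–(2.11), we have $U_{jk} = A_1(v_j, v_k)$ and $\Sigma_{jk} = C(v_j, v_k)$. Then, Corollary 3.4 follows by combining Proposition 2.1 with Theorem 3.3.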

Corollary 3.4. Let the matrices $\widehat{A}_i(v_j, v_k)$, $i = 1, 2, 3$, be as in Theorem 3.3 with $\widehat{\theta}$ given by the MPLE. Under the assumption [MPLE], the $(p, p)$ matrices $\widehat{U}$ and $\widehat{\Sigma}$ with entries $\widehat{U}_{jk} = \widehat{A}_1(v_j, v_k)$ and $\widehat{\Sigma}_{jk} = \widehat{C}(v_j, v_k) = \widehat{A}_1(v_j, v_k) + \widehat{A}_2(v_j, v_k) + \widehat{A}_3(v_j, v_k)$ are strongly consistent estimators of $U$ and $\Sigma$. Moreover, if $\Sigma$ is positive definite, we have the following convergence in distribution as $n \to \infty$

$$|\Lambda_n|^{1/2}\, \widehat{\Sigma}^{-1/2}\, \widehat{U}\, (\widehat{\theta}_n - \theta^\star) \xrightarrow{d} \mathcal{N}(0, I_p). \tag{3.6}$$

We point out that (3.6) does not require the ergodicity of $P_{\theta^\star}$, and it therefore applies even if a phase transition occurs (see Jensen & Künsch (1994) for a proof of this). Furthermore, we refer to Billiot et al. (2008) for a proof of the positive definiteness of the matrix $\Sigma$ for a large class of models (including the ones presented in this paper).

4. Applications

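In this section, we describe how the theory of Sections 2 and 3 is applied in practice (for $d = 2$). In Section 4.1, we detail the methodology for a Strauss point process. Section 4.2 describes a simulation study involving the models presented in Section 2.2.

We assume that we are given a realization $x^+$ of $X_{\Lambda^+}$, and we let $x = x^+_\Lambda$ denote the realization of $X_\Lambda$, where $\Lambda$ is given by (2.6). This means that $x^+ \setminus x$ is used as a border correction. Let $\widehat{\theta}$ denote the MPLE based on $X_{\Lambda^+}$.

From Corollary 3.4, we use the approximation $\widehat{\theta} \approx \mathcal{N}(\theta^\star, \widehat{\Sigma}_{\mathrm{MPLE}})$, where $\widehat{\Sigma}_{\mathrm{MPLE}} = |\Lambda|^{-1}\, \widehat{U}^{-1} \widehat{\Sigma}\, \widehat{U}^{-1}$. If $\widehat{s}_i^{\,2}$ denotes the $i$th diagonal element of $\widehat{\Sigma}_{\mathrm{MPLE}}$, then the approximate 95 per cent confidence interval for $\theta_i^\star$ is $[\widehat{\theta}_i - 1.96\, \widehat{s}_i,\ \widehat{\theta}_i + 1.96\, \widehat{s}_i]$, $i = 1, \ldots, p$. The approximate 95 per cent confidence region for $\theta^\star$ is $\{\theta : (\widehat{\theta} - \theta)^T\, \widehat{\Sigma}_{\mathrm{MPLE}}^{-1}\, (\widehat{\theta} - \theta) \leq q_{95\%}\}$, where $q_{95\%}$ is the 95 per cent quantile of a $\chi^2_p$ distribution.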
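The interval construction above amounts to a one-line computation once an estimate of $\widehat{\Sigma}_{\mathrm{MPLE}}$ is available. The Python sketch below illustrates it; the function name `confidence_intervals` and the nested-list representation of the covariance matrix are assumptions of this example, not part of spatstat.

```python
import math


def confidence_intervals(theta_hat, cov, z=1.96):
    """Approximate 95% intervals theta_i -/+ z * s_i, where s_i^2 is the
    i-th diagonal entry of the estimated covariance matrix `cov`
    (given as a list of rows)."""
    intervals = []
    for i, t in enumerate(theta_hat):
        s = math.sqrt(cov[i][i])  # standard error of the i-th parameter
        intervals.append((t - z * s, t + z * s))
    return intervals
```

For instance, a standard error of 0.2 gives a half-width of $1.96 \times 0.2 = 0.392$. Only the diagonal of the covariance matrix enters the individual intervals; the off-diagonal entries matter for the joint confidence region.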

4.1. Strauss point process

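When $X$ is a Strauss point process, the formulas for $\widehat{A}_1$, $\widehat{A}_2$ and $\widehat{A}_3$ defining $\widehat{\Sigma}_{\mathrm{MPLE}}$ simplify considerably, and we detail these in the following to underline the computational simplicity of $\widehat{\Sigma}_{\mathrm{MPLE}}$. Let $n = n(x)$ be the number of points in $x = (x_1, \ldots, x_n)$. We denote by $T$ (respectively $T^+$) the vector of length $n$ (respectively $n(x^+)$) with $i$th component given by the number of $R$-close neighbours of $x_i$ in $x \setminus x_i$ (respectively $R$-close neighbours of $x_i$ in $x^+ \setminus x_i$). Then,

$$\widehat{A}_1 = |\Lambda|^{-1} \begin{pmatrix} n & \sum_i T_i^+ \\ \sum_i T_i^+ & \sum_i (T_i^+)^2 \end{pmatrix}$$

$$\widehat{A}_2 = |\Lambda|^{-1} (e^{-\widehat{\theta}_2} - 1) \begin{pmatrix} \sum_i T_i & \sum_i T_i (T_i^+ - 1) \\ \sum_i T_i (T_i^+ - 1) & \sum_{I_R} (T_i^+ - 1)(T_j^+ - 1) \end{pmatrix}$$

$$\widehat{A}_3 = |\Lambda|^{-1} \begin{pmatrix} 0 & 0 \\ 0 & \sum_i T_i \end{pmatrix}$$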

where $I_R = \{i, j = 1, \ldots, n : \|x_i - x_j\| \leq R,\ x_i \neq x_j\}$.

As an example, consider a realization in $\Lambda^+ = [-R, 1 + R]^2$ of a Strauss point process with interaction range $R = 0.05$ and parameters $\theta_1^\star = \log(\beta) = \log(200) \approx 5.3$ and $\theta_2^\star = \log(\gamma) = \log(0.5) \approx -0.69$. Such a realization is generated via a perfect simulation algorithm in spatstat as follows:

> X <- rStrauss(beta=200, gamma=0.5, R=0.05, W=square(c(-0.05,1.05)))

In this case, a point pattern with 204 points, shown in Fig. 1(a), was generated. Then, the MPLE of the parameters of a Strauss point process model with interaction range $R = 0.05$ is calculated via:

> fit <- ppm(X, interaction=Strauss(0.05))

The result fit contains relevant information about the fitted model and the MPLE, which was $(\widehat{\theta}_1, \widehat{\theta}_2) = (5.23, -0.78)$ in this case. The approximate covariance matrix of the MPLE is estimated using the aforementioned formulas via

> sigma <- vcov(fit)

The result is simply the estimated covariance matrix of the MPLE. From this, we can calculate the approximate 95 per cent confidence region, shown in Fig. 1(b), and the individual confidence intervals, which in this case were $[4.91, 5.55]$ and $[-1.23, -0.33]$ for $\theta_1^\star$ and $\theta_2^\star$, respectively.
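To underline how little computation the Strauss formulas require, they can be transcribed almost literally. The Python sketch below is an independent illustration and not the code behind vcov in spatstat: the function name `strauss_vcov`, the square window $[lo, hi]^2$ for $\Lambda$, and the choice to evaluate $T^+$ only at the points of $x$ (which is all the sums over $i = 1, \ldots, n$ use) are assumptions of this example.

```python
import math


def strauss_vcov(x_plus, theta2_hat, R, lo=0.0, hi=1.0):
    """Sketch of the Strauss formulas for A1, A2, A3 and of
    Sigma_MPLE = |Lambda|^{-1} U^{-1} Sigma U^{-1} (2 x 2 case).

    x_plus : list of (x, y) tuples observed in Lambda^+ (window plus border)
    Lambda : assumed here to be the square [lo, hi]^2
    """
    x = [p for p in x_plus if lo <= p[0] <= hi and lo <= p[1] <= hi]
    n, area = len(x), (hi - lo) ** 2

    def count(p, pattern):
        # R-close neighbours of p in the pattern, p itself excluded
        return sum(1 for q in pattern if q != p and math.dist(p, q) <= R)

    T = [count(p, x) for p in x]         # neighbours within x
    Tp = [count(p, x_plus) for p in x]   # neighbours within x^+
    cross = sum(t * (tp - 1) for t, tp in zip(T, Tp))
    pairs = sum((Tp[i] - 1) * (Tp[j] - 1)
                for i in range(n) for j in range(n)
                if i != j and math.dist(x[i], x[j]) <= R)
    c = math.exp(-theta2_hat) - 1.0

    A1 = [[n, sum(Tp)], [sum(Tp), sum(tp * tp for tp in Tp)]]
    A2 = [[c * sum(T), c * cross], [c * cross, c * pairs]]
    A3 = [[0.0, 0.0], [0.0, float(sum(T))]]

    U = [[A1[i][j] / area for j in range(2)] for i in range(2)]
    S = [[(A1[i][j] + A2[i][j] + A3[i][j]) / area for j in range(2)]
         for i in range(2)]

    det = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    Ui = [[U[1][1] / det, -U[0][1] / det], [-U[1][0] / det, U[0][0] / det]]

    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    sandwich = mul(mul(Ui, S), Ui)
    return [[sandwich[i][j] / area for j in range(2)] for i in range(2)]
```

The double loop over pairs is quadratic in $n$ and is kept only for readability; a practical implementation would restrict attention to $R$-close pairs found by a spatial search. No integral is evaluated anywhere, which is the point of the estimator.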

[Figure 1: two panels, (a) and (b); in panel (b), $\theta_1$ is on the horizontal axis and $\theta_2$ on the vertical axis.]

Fig. 1. (a) Realization of a Strauss point process. (b) Approximate 95 per cent confidence region for the maximum pseudo-likelihood estimate.



Note that the procedure vcov is not specific to the Strauss model but works for any point process model implemented in spatstat.

We have repeated the instructions above 500 times for $\Lambda^+ = [-R, \ell + R]^2$, $\ell = 1/3, 1/2, 3/4, 1, 2$, to have a rough idea of how large the observation window (or more precisely the average number of points in the observation window) should be to rely on asymptotic results. Table 1 shows the empirical coverage rates obtained from the percentage of confidence ellipses (with nominal level 95 per cent) that contained the true parameter values as well as the empirical intensity of the process. Overall, the coverage rates are acceptably close to the nominal level except in the case of the smallest observation window $\ell = 1/3$, where the average number of points in $\Lambda$ is only approximately $n = 123/9 \approx 14$.
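The coverage computation described above can be sketched as follows for the two-parameter case. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, covariance matrices are passed as nested lists, and the quantile uses the closed form $q = -2 \log(1 - \text{level})$, which is exact for a $\chi^2$ distribution with 2 degrees of freedom only.

```python
import math


def in_confidence_region(theta_hat, theta, cov, level=0.95):
    """True if theta lies in the confidence ellipse centred at theta_hat
    (two parameters only)."""
    # Exact chi-square quantile for 2 degrees of freedom.
    q = -2.0 * math.log(1.0 - level)
    d0 = theta_hat[0] - theta[0]
    d1 = theta_hat[1] - theta[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form d^T cov^{-1} d, using the explicit 2 x 2 inverse.
    m = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return m <= q


def coverage_rate(estimates, covs, theta):
    """Percentage of replications whose ellipse contains the true theta."""
    hits = sum(in_confidence_region(e, theta, c)
               for e, c in zip(estimates, covs))
    return 100.0 * hits / len(estimates)
```

Each replication contributes one estimate and one estimated covariance matrix, so the empirical coverage is simply the share of ellipses that contain the true parameter.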

We can use the empirical covariance matrix based on the 500 parameter estimates for each observation window as an estimate of the true covariance matrix and compare this with the 500 estimated covariance matrices based on our formulas. Rather than quantifying the discrepancy between the empirical covariance matrix and our estimates in terms of a matrix norm, we believe it is much more illustrative to simply compare the confidence ellipses (translated to the origin), which is carried out in Fig. 2(a) for $\ell = 1/3, 1/2, 1, 2$ (the window size $\ell = 3/4$ is omitted to save space). Furthermore, Fig. 2(b) shows the parameter estimates (which are the centres of the confidence ellipses), and there is no apparent bias in the location of the ellipses relative to the true parameter value.

Table 1. Empirical coverage rates (%) for the Strauss process as the window size increases

n/|Λ|    ℓ = 1/3    ℓ = 1/2    ℓ = 3/4    ℓ = 1    ℓ = 2
123      93.8       94.6       95.0       95.0     94.6

The first column is the empirical intensity of the process. The results are based on 500 independent realizations for each window size.

[Figure 2: two groups of four panels, (a) and (b), one panel for each of $\ell = 1/3$, $\ell = 1/2$, $\ell = 1$ and $\ell = 2$.]

Fig. 2. (a) A total of 500 estimated confidence ellipses (grey) compared with the empirical confidence ellipse (black). (b) A total of 500 parameter estimates (grey) compared with the true value (black). In both figures, $\theta_1$ is the abscissa, $\theta_2$ is the ordinate and $\Lambda^+ = [-R, \ell + R]^2$, $\ell = 1/3, 1/2, 1, 2$ (notice the different axis scales in the individual plots).



Table 2. Results for different simulated Gibbs point process models based on 500 replications with Λ = [0, ℓ]², ℓ = 1/2, 1, 2

                 Coverage (%)                   One-dimensional coverage (%)
Model   n/|Λ|    ℓ = 1/2   ℓ = 1   ℓ = 2       ℓ = 1/2        ℓ = 1          ℓ = 2
S1      100      90.6      96.2    96.0        91.4 – 95.0    96.0 – 96.4    96.2 – 97.2
S2      123      94.6      95.0    94.6        95.4 – 96.4    96.2 – 97.2    94.2 – 94.8
S3      157      97.2      95.0    94.8        97.0 – 98.2    96.0 – 96.0    94.6 – 96.0
P1      57       85.5      89.8    96.2        94.2 – 96.1    93.6 – 96.6    94.4 – 96.8
P2      77       85.8      91.2    94.6        94.6 – 97.1    93.2 – 95.0    93.8 – 96.6
G1      120      92.8      94.0    95.4        93.0 – 93.6    93.2 – 93.8    95.2 – 96.2
G2      84       91.8      94.0    95.0        94.2 – 95.8    94.2 – 94.4    96.4 – 96.6
M1      182      86.3      92.8    95.6        94.1 – 96.7    93.8 – 96.2    95.0 – 97.0
M2      186      81.5      90.8    93.8        93.4 – 97.3    94.0 – 95.8    94.4 – 96.2

From left to right: empirical intensity, empirical coverage rates, range of empirical one-dimensional coverage rates.

4.2. Simulation study

In this section, we present a simulation study using the following models:

• Strauss point processes with $R = 0.05$ and $\theta_1^\star = \log(200)$, where models S1, S2 and S3, respectively, have $\theta_2^\star = \log(0.8)$, $\theta_2^\star = \log(0.5)$ and $\theta_2^\star = \log(0.2)$.

• Piecewise Strauss point processes with $R_1 = 0.05$, $R_2 = 0.1$ and $\theta_1^\star = \log(200)$, where models P1 and P2, respectively, have $(\theta_2^\star, \theta_3^\star) = (\log(0.8), \log(0.2))$ and $(\theta_2^\star, \theta_3^\star) = (\log(0.2), \log(0.8))$.

• Geyer point processes with $R = 0.05$, saturation threshold $s = 1$ and $\theta_1^\star = \log(100)$, where models G1 and G2, respectively, have $\theta_2^\star = \log(1.2)$ and $\theta_2^\star = \log(0.8)$.

• Multi-type Strauss point processes with two types, $R_{11} = R_{22} = R_{12} = 0.05$ and $\theta_1^\star = \theta_2^\star = \log(200)$, where models M1 and M2, respectively, have $\theta_{11}^\star = \theta_{22}^\star = \theta_{12}^\star = \log(0.5)$, and $\theta_{11}^\star = \theta_{22}^\star = \log(0.8)$, $\theta_{12}^\star = \log(0.2)$.

For each model, 500 realizations were generated using the Metropolis–Hastings algorithm with birth, death and shift proposals as detailed in Geyer & Møller (1994) (except for the Strauss point processes, which were generated using the perfect simulation algorithm of Berthelsen & Møller (2002); Berthelsen & Møller (2003)). For all the models, $\Lambda^+ = [-R, \ell + R]^2$, $\ell = 1/2, 1, 2$, where $R$ is the interaction range of each model. On the basis of these simulations, we calculated the approximate 95 per cent confidence region (respectively confidence intervals for each parameter) and checked whether it covered $\theta^\star$ (respectively $\theta_j^\star$).

The results given in Table 2 show that the coverage rates are close to the nominal 95 per cent for all the models when $\ell = 2$, whereas the results vary from model to model for the smaller window sizes. In particular, the models with many parameters (M1 and M2) and the models with a low number of points (P1 and P2) have low coverage rates for the smaller window sizes.

Acknowledgements

This research was supported by Joseph Fourier University of Grenoble (project 'SpaComp'), by Centre for Stochastic Geometry and Advanced Bioimaging, funded by a grant from the Villum Foundation, by the Danish Natural Science Research Council, grant 09-072331, 'Point process modelling and statistical inference' and by l'Institut Français du Danemark. A part of this research was done while the first author was visiting the Department of Mathematical Sciences at Aalborg University. He would like to thank the members of the department for their kind hospitality.



References

Baddeley, A. & Turner, R. (2000). Practical maximum pseudolikelihood for spatial point patterns (with discussion). Aust. Nz. J. Stat. 42, 283–322.

Baddeley, A. & Turner, R. (2005). Modelling spatial point patterns in R. J. Stat. Softw. 12, 1–42.

Baddeley, A., Turner, R., Møller, J. & Hazelton, M. (2005). Residual analysis for spatial point processes (with discussion). J. R. Stat. Soc. 67, 1–35.

Baddeley, A., Møller, J. & Pakes, A.G. (2008). Properties of residuals for spatial point processes. Ann. I. Stat. Math. 60, 627–649.

Baddeley, A., Rubak, E. & Møller, J. (2011). Score, pseudo-score and residual diagnostics for spatial point process models. Stat. Sci. 26, 613–646.

Berthelsen, K.K. & Møller, J. (2002). A primer on perfect simulation for spatial point processes. B. Braz. Math. Soc. 33, 351–367.

Berthelsen, K.K. & Møller, J. (2003). Likelihood and non-parametric Bayesian MCMC inference for spatial point processes based on perfect simulation and path sampling. Scand. J. Stat. 30, 549–564.

Bertin, E., Billiot, J.-M. & Drouilhet, R. (2008). R-local Delaunay inhibition model. J. Stat. Phys. 132, 649–667.

Besag, J. (1975). Statistical analysis of Non-Lattice data. J. R. Stat. Soc. 24, 179–195.

Billiot, J.-M., Coeurjolly, J.-F. & Drouilhet, R. (2008). Maximum pseudolikelihood estimator for exponential family models of marked Gibbs point processes. Electron. J. Stat. 2, 234–264.

Coeurjolly, J.-F. & Lavancier, F. (2012). Residuals and goodness-of-fit tests for stationary marked Gibbs point processes. J. R. Stat. Soc.: Ser. B 75, (2), 247–276.

Coeurjolly, J.-F., Dereudre, D., Drouilhet, R. & Lavancier, F. (2012). Takacs-Fiksel method for stationary marked Gibbs point processes. Scand. J. Stat. 49, 416–443.

Dereudre, D., Drouilhet, R. & Georgii, H.O. (2012). Existence of Gibbsian point processes with geometry-dependent interactions. J. R. Stat. Soc. 153, (3-4), 643–670.

Georgii, H.O. (1976). Canonical and grand canonical Gibbs states for continuum systems. Commun. Math. Phys. 48, 31–51.

Georgii, H.O. (1988). Gibbs measures and phase transitions, de Gruyter, Berlin.

Geyer, C.J. (1999). Likelihood inference for spatial point processes. In Stochastic geometry: Likelihood and computation, number 80, chapter 3 (eds Kendall, W.S., Barndorff-Nielsen, O.E. & Van Lieshout, M.N.M.), Monographs on Statistics and Applied Probability, Chapman and Hall/CRC, Boca Raton, Florida; 79–140.

Geyer, C.J. & Møller, J. (1994). Simulation procedures and likelihood inference for spatial point processes. Scand. J. Stat. 21, 359–373.

Illian, J., Penttinen, A., Stoyan, H. & Stoyan, D. (2008). Statistical analysis and modelling of spatial point patterns, Wiley-Interscience, Chichester.

Jensen, J.L. & Künsch, H.R. (1994). On asymptotic normality of pseudolikelihood estimates of pairwise interaction processes. Ann. I Stat. Math. 46, 475–486.

Jensen, J.L. & Møller, J. (1991). Pseudolikelihood for exponential family models of spatial point processes. Ann. Appl. Probab. 1, 445–461.

Møller, J. & Waagepetersen, R. (2004). Statistical inference and simulation for spatial point processes, Chapman and Hall/CRC, Boca Raton.

Møller, J. & Waagepetersen, R.P. (2007). Modern statistics for spatial point processes. Scand. J. Stat. 34, 643–684.

Nguyen, X. & Zessin, H. (1979a). Integral and differential characterizations of Gibbs processes. Math. Nachr. 88, 105–115.

Nguyen, X.X. & Zessin, H. (1979b). Ergodic theorems for spatial processes. Z. Wahrscheinlichkeit 48, 133–158.

Papangelou, F. (2009). The Armenian Connection: Reminiscences from a time of interactions. J. Contemp. Math. Anal. 44, 14–19.

Preston, C.J. (1976). Random fields, Springer Verlag, Berlin.

Ruelle, D. (1969). Statistical mechanics, Benjamin, New York-Amsterdam.

Stoyan, D., Kendall, W.S. & Mecke, J. (1995). Stochastic geometry and its applications, John Wiley and Sons, Chichester.

Zessin, H. (2009). Der papangelou prozess. J. Contemp. Math. Anal. 44, 36–44.



Received April 2012, in final form January 2013

Ege Rubak, Department of Mathematical Sciences, Aalborg University, Fredrik Bajers Vej 7G, Aalborg, Denmark.

E-mail: [email protected]

Appendix A:

A.1 Proof of Lemma 3.1

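Proof. From the GNZ formula (2.4), $\mathrm{E}[I_\Lambda(X,g)] = \mathrm{E}[I_\Lambda(X,h)] = 0$. Now, we decompose the covariance into four terms

$$\mathrm{E}[I_\Lambda(X,g)\,I_\Lambda(X,h)] = T_1 + T_2 + T_3 + T_4.$$

These terms are defined and simplified, using the GNZ formula again, as follows:

$$T_1 = \mathrm{E}\left[\int_{(\Lambda\times\mathbb{M})^2} g(u,X)\,\lambda_{\theta^\star}(u,X)\,h(v,X)\,\lambda_{\theta^\star}(v,X)\,du\,dv\right] \tag{3.7}$$

$$\begin{aligned}
T_2 &= -\mathrm{E}\left[\int_{\Lambda\times\mathbb{M}} g(u,X)\,\lambda_{\theta^\star}(u,X)\,du \sum_{v\in X_\Lambda} h(v,X\setminus v)\right] \\
&= -\mathrm{E}\left[\sum_{v\in X_\Lambda}\left(h(v,X\setminus v)\int_{\Lambda\times\mathbb{M}} g(u,X)\,\lambda_{\theta^\star}(u,X)\,du\right)\right] \\
&= -\mathrm{E}\left[\int_{(\Lambda\times\mathbb{M})^2} h(v,X)\,g(u,X\cup v)\,\lambda_{\theta^\star}(u,X\cup v)\,\lambda_{\theta^\star}(v,X)\,du\,dv\right]
\end{aligned} \tag{3.8}$$

$$\begin{aligned}
T_3 &= -\mathrm{E}\left[\int_{\Lambda\times\mathbb{M}} h(v,X)\,\lambda_{\theta^\star}(v,X)\,dv \sum_{u\in X_\Lambda} g(u,X\setminus u)\right] \\
&= -\mathrm{E}\left[\int_{(\Lambda\times\mathbb{M})^2} g(u,X)\,h(v,X\cup u)\,\lambda_{\theta^\star}(\{u,v\},X)\,du\,dv\right]
\end{aligned} \tag{3.9}$$

and

$$\begin{aligned}
T_4 &= \mathrm{E}\left[\sum_{u,v\in X_\Lambda} g(u,X\setminus u)\,h(v,X\setminus v)\right] \\
&= \mathrm{E}\left[\sum_{\substack{u,v\in X_\Lambda \\ u\neq v}} g(u,X\setminus u)\,h(v,X\setminus v)\right] + \mathrm{E}\left[\sum_{u\in X_\Lambda} g(u,X\setminus u)\,h(u,X\setminus u)\right] \\
&= \mathrm{E}\left[\int_{(\Lambda\times\mathbb{M})^2} g(u,X\cup v)\,h(v,X\cup u)\,\lambda_{\theta^\star}(\{u,v\},X)\,du\,dv\right] + \mathrm{E}\left[\int_{\Lambda\times\mathbb{M}} g(u,X)\,h(u,X)\,\lambda_{\theta^\star}(u,X)\,du\right].
\end{aligned} \tag{3.10}$$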

Rearranging (3.7)–(3.10) leads to the result.
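As a sanity check of this decomposition, one may look at the simplest special case, which is an illustration and not a case singled out in the proof: an unmarked homogeneous Poisson process with intensity $\beta$ on the unit square, with $g$ and $h$ not depending on $X$. There the conditional intensity is constant, $\lambda(u,X)=\beta$ and $\lambda(\{u,v\},X)=\beta^2$, so the double integrals in $T_1,\dots,T_4$ cancel and only the single integral in (3.10) remains, giving $\mathrm{E}[I_\Lambda(X,g)I_\Lambda(X,h)]=\beta\int_\Lambda g(u)h(u)\,du$. The minimal Python sketch below checks this by Monte Carlo. The function names, the choices $g(u)=u_1$, $h(u)=1$ and the value $\beta=50$ are assumptions made only for this example.

```python
import numpy as np


def innovation(points, g, integral_g, beta):
    """Innovation sum_{u in X} g(u) - beta * int g(u) du for a homogeneous
    Poisson process with intensity beta on the unit square."""
    return g(points).sum() - beta * integral_g


def monte_carlo_covariance(beta=50.0, n_rep=20000, seed=0):
    """Return the empirical means of I(g), I(h) and the empirical E[I(g) I(h)]."""
    rng = np.random.default_rng(seed)
    g = lambda pts: pts[:, 0]            # g(u) = first coordinate, int g = 1/2
    h = lambda pts: np.ones(len(pts))    # h(u) = 1, int h = 1
    ig = np.empty(n_rep)
    ih = np.empty(n_rep)
    for r in range(n_rep):
        # Poisson number of points, then uniform locations in [0, 1]^2.
        pts = rng.random((rng.poisson(beta), 2))
        ig[r] = innovation(pts, g, 0.5, beta)
        ih[r] = innovation(pts, h, 1.0, beta)
    return ig.mean(), ih.mean(), np.mean(ig * ih)


if __name__ == "__main__":
    mean_g, mean_h, cov_gh = monte_carlo_covariance()
    # Theory for this special case: both means are 0 and the covariance
    # is beta * int g h du = 50 * 1/2 = 25.
    print(mean_g, mean_h, cov_gh)
```

Both empirical means should be close to zero and the empirical covariance close to $\beta/2$, up to Monte Carlo error.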

A.2 Proof of Proposition 3.2

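Proof. From Lemma 3.1, we just have to prove that $|\Lambda_n|^{-1}\widetilde{A}_{i,\Lambda_n}(g,h) \to A_i(g,h)$ for $i=1,2,3$. The stationarity of the point process is sufficient for $i=1$ because $|\Lambda_n|^{-1}\widetilde{A}_{1,\Lambda_n} = A_1(g,h)$. For the other terms, let $u,v\in S$ be such that $\|u-v\|\geq R$. Then, for any function $f: S\times\Omega\to\mathbb{R}$ satisfying (2.1), we have $f(u,x\cup v)=f(u,x)$, which implies

$$\lambda_{\theta^\star}(\{u,v\},X) = \lambda_{\theta^\star}(u,X\cup v)\,\lambda_{\theta^\star}(v,X) = \lambda_{\theta^\star}(u,X)\,\lambda_{\theta^\star}(v,X)$$

and

$$\Delta_v g(u,X) = g(u,X\cup v) - g(u,X) = 0.$$

Then, we focus on the convergence of the second term (the third one follows by similar arguments). Let us decompose $\widetilde{A}_{2,\Lambda_n}(g,h) = \widetilde{A}^1_{2,\Lambda_n}(g,h) + \widetilde{A}^2_{2,\Lambda_n}(g,h)$ where

$$\widetilde{A}^1_{2,\Lambda_n}(g,h) := \mathrm{E}\left[\int_{(\Lambda_n\ominus R)\times\mathbb{M}}\int_{(B(u,R)\cap\Lambda_n)\times\mathbb{M}} f(u,v,X)\,dv\,du\right]$$

$$\widetilde{A}^2_{2,\Lambda_n}(g,h) := \mathrm{E}\left[\int_{(\Lambda_n\setminus(\Lambda_n\ominus R))\times\mathbb{M}}\int_{(B(u,R)\cap\Lambda_n)\times\mathbb{M}} f(u,v,X)\,dv\,du\right],$$

and $f(u,v,X) := g(u,X)\,h(v,X)\left(\lambda_{\theta^\star}(u,X)\,\lambda_{\theta^\star}(v,X) - \lambda_{\theta^\star}(\{u,v\},X)\right)$. From the stationarity of $X$ and because $f$ satisfies (2.2), we have

$$|\Lambda_n|^{-1}\widetilde{A}^1_{2,\Lambda_n}(g,h) = |\Lambda_n|^{-1}\,\mathrm{E}\left[\int_{(\Lambda_n\ominus R)\times\mathbb{M}}\int_{B(u,R)\times\mathbb{M}} f(u,v,X)\,dv\,du\right] = \frac{|\Lambda_n\ominus R|}{|\Lambda_n|}\,A_2(g,h) \to A_2(g,h)$$

and

$$|\Lambda_n|^{-1}\,|\widetilde{A}^2_{2,\Lambda_n}(g,h)| \leq |\Lambda_n|^{-1}\,\mathrm{E}\left[\int_{(\Lambda_n\setminus(\Lambda_n\ominus R))\times\mathbb{M}}\int_{B(u,R)\times\mathbb{M}} |f(u,v,X)|\,dv\,du\right] = \frac{|\Lambda_n\setminus(\Lambda_n\ominus R)|}{|\Lambda_n|}\,\mathrm{E}\left[\int_{B(0,R)\times\mathbb{M}} |f(0^M,v,X)|\,dv\right] \to 0$$

as $n\to\infty$.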
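The last step relies on the boundary layer of width $R$ having vanishing relative volume. To make the rate concrete, assume purely for illustration that $\Lambda_n=[-n,n]^d$ (the proof itself does not fix the shape of the domains). The eroded domain is then $[-n+R,n-R]^d$ and the ratio $|\Lambda_n\setminus(\Lambda_n\ominus R)|/|\Lambda_n|$ equals $1-(1-R/n)^d$, which is of order $dR/n$. The short Python sketch below evaluates it; the function name `edge_fraction` is hypothetical.

```python
def edge_fraction(n: float, R: float, d: int) -> float:
    """Fraction of the cube [-n, n]^d lying within distance R of its boundary,
    i.e. 1 - |eroded cube| / |cube|, assuming n >= R."""
    if n < R:
        raise ValueError("the eroded cube is empty when n < R")
    return 1.0 - (1.0 - R / n) ** d


# The edge fraction decreases to 0 as the cube grows (here d = 2, R = 1).
for n in (10, 100, 1000):
    print(n, edge_fraction(n, R=1.0, d=2))
```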

A.3 Proof of Theorem 3.3

Assumption [Model] asserts the existence of at least one stationary Gibbs measure. If this measure is unique, it is ergodic. Otherwise, it can be represented as a mixture of ergodic measures (see Georgii (1988), Theorem 14.10). Therefore, we can assume, for this proof, that $P_{\theta^\star}$ is ergodic.

Proof. For $j=1,2,3$, let us denote by $\widehat{A}_j(\theta)$ the quantity $\widehat{A}_j(g,h)$ where $\hat\theta$ is replaced by $\theta$, for $\theta\in\mathcal{V}$. In the following, the general ergodic theorem for spatial point processes obtained by Nguyen & Zessin (1979b) (see also Lemma 2 in Coeurjolly et al. (2012)) combined with the GNZ formula (2.4) will be widely used (as $n\to\infty$). These uses are justified by the assumptions [Model] and [H($g_\theta,h_\theta$)]. Using the aforementioned arguments, we immediately obtain the following almost sure convergence:

$$\widehat{A}_1(\theta) \to A_1(\theta) := \mathrm{E}\left[g_\theta(0^M,X)\,h_\theta(0^M,X)\,\lambda_{\theta^\star}(0^M,X)\right]. \tag{5.11}$$



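As in the proof of Proposition 3.2, we focus on the convergence of the second term $\widehat{A}_2(\theta)$ (the third one follows by similar arguments). Let us decompose $\widehat{A}_2(\theta) = \widehat{A}_2^1(\theta) + \widehat{A}_2^2(\theta)$ where

$$\widehat{A}_2^1(\theta) = \frac{1}{|\Lambda_n|}\sum_{u\in X_{\Lambda_n\ominus R}}\ \sum_{v\in X_{B(u,R)}\setminus u} g_\theta(u,X\setminus\{u,v\})\,h_\theta(v,X\setminus\{u,v\})\left(\frac{\lambda_\theta(u,X\setminus\{u,v\})\,\lambda_\theta(v,X\setminus\{u,v\})}{\lambda_\theta(\{u,v\},X\setminus\{u,v\})} - 1\right)$$

$$\widehat{A}_2^2(\theta) = \frac{1}{|\Lambda_n|}\sum_{u\in X_{\Lambda_n\setminus(\Lambda_n\ominus R)}}\ \sum_{v\in X_{\Lambda_n\cap B(u,R)}\setminus u} g_\theta(u,X\setminus\{u,v\})\,h_\theta(v,X\setminus\{u,v\})\left(\frac{\lambda_\theta(u,X\setminus\{u,v\})\,\lambda_\theta(v,X\setminus\{u,v\})}{\lambda_\theta(\{u,v\},X\setminus\{u,v\})} - 1\right).$$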

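This decomposition allows us to apply an ergodic theorem to the term $\widehat{A}_2^1(\theta)$. The term $\widehat{A}_2^2(\theta)$, which deals with edge effects, will be proved to be negligible. Because $|\Lambda_n\ominus R|/|\Lambda_n|\sim 1$ as $n\to\infty$, we obtain the following almost sure convergence

$$\begin{aligned}
\widehat{A}_2^1(\theta) &\to \mathrm{E}\left[\sum_{v\in X_{B(0,R)}} g_\theta(0^M,X\setminus v)\,h_\theta(v,X\setminus v)\times\left(\frac{\lambda_\theta(0^M,X\setminus v)\,\lambda_\theta(v,X\setminus v)}{\lambda_\theta(\{0^M,v\},X\setminus v)} - 1\right)\lambda_{\theta^\star}(0^M,X)\right] \\
&= \mathrm{E}\left[\int_{B(0,R)\times\mathbb{M}} g_\theta(0^M,X)\,h_\theta(v,X)\left(\frac{\lambda_\theta(0^M,X)\,\lambda_\theta(v,X)}{\lambda_\theta(\{0^M,v\},X)} - 1\right)\underbrace{\lambda_{\theta^\star}(0^M,X\cup v)\,\lambda_{\theta^\star}(v,X)}_{=\lambda_{\theta^\star}(\{0^M,v\},X)}\,dv\right] =: A_2(\theta).
\end{aligned}$$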

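Now, there exists $n_0\in\mathbb{N}$ such that for all $n\geq n_0$, the following holds almost surely

$$\begin{aligned}
|\widehat{A}_2^2(\theta)| &\leq \frac{1}{|\Lambda_n|}\sum_{u\in X_{\Lambda_n\setminus(\Lambda_n\ominus R)}}\ \sum_{v\in X_{B(u,R)}\setminus u}\left| g_\theta(u,X\setminus\{u,v\})\,h_\theta(v,X\setminus\{u,v\})\left(\frac{\lambda_\theta(u,X\setminus\{u,v\})\,\lambda_\theta(v,X\setminus\{u,v\})}{\lambda_\theta(\{u,v\},X\setminus\{u,v\})} - 1\right)\right| \\
&\leq 2\,\frac{|\Lambda_n\setminus(\Lambda_n\ominus R)|}{|\Lambda_n|}\,I_2(g_\theta,h_\theta) \to 0.
\end{aligned}$$

In the previous equations, $I_2(g_\theta,h_\theta)$ is given by (3.4). With similar arguments, we may prove that $\widehat{A}_3(\theta)\to A_3(\theta)$ where

$$A_3(\theta) := \mathrm{E}\left[\int_{B(0,R)\times\mathbb{M}} \Delta_v g_\theta(0^M,X)\,\Delta_{0^M} h_\theta(v,X)\,\lambda_{\theta^\star}(\{0^M,v\},X)\,dv\right].$$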

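For any $\theta\in\mathcal{V}$, $\widehat{C}(\theta) := \sum_{j=1}^{3}\widehat{A}_j(\theta)$ converges $P_{\theta^\star}$-almost surely towards $C(\theta) := \sum_{j=1}^{3} A_j(\theta)$ as $n\to\infty$. Under the assumption [H($g_\theta,h_\theta$)], $\widehat{C}(\theta)$ and $C(\theta)$ are continuous functions of $\theta$, which implies $\widehat{C}(g_{\hat\theta},h_{\hat\theta})\to C(\theta^\star)$. The proof is therefore finished because $C(\theta^\star) = C(g_{\theta^\star},h_{\theta^\star})$.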
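To illustrate how the double sum defining $\widehat{A}_2(\theta)=\widehat{A}_2^1(\theta)+\widehat{A}_2^2(\theta)$ can be evaluated on data, here is a minimal Python sketch. It is not the spatstat implementation. It assumes an unmarked pattern, treats the observed points as the whole configuration (points outside the window are ignored), and uses a Strauss-type conditional intensity $\beta\gamma^{s(u,x)}$, with $s(u,x)$ the number of points of $x$ within distance $R$ of $u$, purely as an example model. All function names are hypothetical. The pair intensity is computed as $\lambda(\{u,v\},x)=\lambda(u,x)\lambda(v,x\cup u)$.

```python
import numpy as np


def strauss_intensity(u, x, beta, gamma, R):
    """Example conditional intensity beta * gamma**s(u, x), where s(u, x)
    counts the points of x within distance R of u (illustrative model)."""
    s = np.sum(np.linalg.norm(x - u, axis=1) <= R)
    return beta * gamma ** s


def a2_hat(points, lam, g, h, R, area):
    """Sum over ordered pairs (u, v) of distinct points at distance <= R of
    g(u, rest) * h(v, rest) * (lam(u, rest) * lam(v, rest) / lam({u, v}, rest) - 1),
    with rest = X minus {u, v}, divided by the window area."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(n):
            if i == j or np.linalg.norm(points[i] - points[j]) > R:
                continue
            u, v = points[i], points[j]
            rest = np.delete(points, [i, j], axis=0)       # X without u and v
            lam_u = lam(u, rest)
            lam_v = lam(v, rest)
            lam_uv = lam_u * lam(v, np.vstack([rest, u]))  # lam(u, x) * lam(v, x with u)
            total += g(u, rest) * h(v, rest) * (lam_u * lam_v / lam_uv - 1.0)
    return total / area


if __name__ == "__main__":
    pts = np.array([[0.10, 0.10], [0.15, 0.10], [0.80, 0.80],
                    [0.82, 0.85], [0.50, 0.50]])
    lam = lambda u, x: strauss_intensity(u, x, beta=100.0, gamma=0.5, R=0.1)
    one = lambda u, x: 1.0
    print(a2_hat(pts, lam, one, one, R=0.1, area=1.0))
```

For this example model the ratio inside the parentheses equals $1/\gamma$ for every close pair, so with $g=h=1$ the result is $(1/\gamma-1)$ times the number of ordered close pairs divided by the window area.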



A.4 Proof of Corollary 3.4

Proof. Because the MPLE is a strongly consistent estimate of $\theta^\star$ (Proposition 2.1), the only thing to prove is that for all $j,k=1,\dots,p$, the assumption [H($v_j,v_k$)] is fulfilled. In particular, we have to verify that the variables $I_i(v_j,v_k)$, $i=1,2,3$, defined by (3.3)–(3.5) have finite expectation. We note that [MPLE] implies the local stability property, that is, there exists $\tilde\lambda<\infty$ such that for any $u,v\in S$, $x\in\Omega$ and $\theta\in\Theta$, we have $\lambda_\theta(u,x)\leq\tilde\lambda$ and $\lambda_\theta(\{u,v\},x)=\lambda_\theta(u,x)\,\lambda_\theta(v,x\cup u)\leq\tilde\lambda^2$. For ease of presentation, we assume in the following that $v_i(u,x)$ satisfies (2.7) for $i=1,\dots,p$. Similar arguments can be used when some of the $v_i(u,x)$, $i=1,\dots,p$, satisfy (2.8). Then, for any $u,v\in S$ such that $\|u-v\|\leq R$, we have $1/\lambda_\theta(u,x)\leq\exp\left(\tilde\theta\,n(x_{B(u,R)})\right)$, where $\tilde\theta=\sup_\theta\left(-\kappa\sum_{i=1}^{p}\theta_i\right)>0$ and

$$\frac{1}{\lambda_\theta(\{u,v\},x)} \leq e^{\tilde\theta\,\left(n(x_{B(u,R)})+n(x_{B(v,R)})\right)} \leq e^{\tilde\theta\,n(x_{B(u,2R)})}.$$

Then, we derive

$$I_1(v_j,v_k) \leq \tilde\lambda\,\kappa^2\,n(X_{B(0,R)})^2,$$

$$\begin{aligned}
I_2(v_j,v_k) &\leq 2\tilde\lambda^3\kappa^2\,e^{\tilde\theta\,n(X_{B(0,2R)})}\int_{B(0,R)} n(X_{B(0,R)})\,n(X_{B(v,R)})\,dv \\
&\leq 2\tilde\lambda^4\kappa^2\,|B(0,R)|\,n(X_{B(0,R)})\,n(X_{B(0,2R)})\,e^{\tilde\theta\,n(X_{B(0,2R)})},
\end{aligned}$$

$$I_3(v_j,v_k) \leq 4\tilde\lambda^2\kappa^2\,|B(0,R)|\left(1+2n(X_{B(0,R)})\right)\left(1+2n(X_{B(0,2R)})\right).$$

The result is therefore proved because we recall that for any spatial Gibbs point process satisfying a local stability property, we have in particular $\mathrm{E}\left[n(X_A)^k\,e^{c\,n(X_A)}\right]<\infty$ for any integer $k$, constant $c$ and bounded Borel set $A$ (see, for example, Bertin et al. (2008) [Proposition 11]).
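For intuition about this moment condition, the Poisson case can be worked out by hand. This is only an illustrative special case and not the argument of Bertin et al. (2008). Assume $N=n(X_A)$ is Poisson distributed with mean $\mu$ (the symbols $N$ and $\mu$ are introduced here for the example). The moment generating function is finite for every $c$, and differentiating it in $c$ gives the mixed moments:

```latex
\mathrm{E}\left[e^{cN}\right] = \exp\left(\mu\,(e^{c}-1)\right) < \infty,
\qquad
\mathrm{E}\left[N e^{cN}\right] = \frac{d}{dc}\,\mathrm{E}\left[e^{cN}\right]
  = \mu\,e^{c}\exp\left(\mu\,(e^{c}-1)\right),
\qquad
\mathrm{E}\left[N^{k} e^{cN}\right] = \frac{d^{k}}{dc^{k}}\exp\left(\mu\,(e^{c}-1)\right) < \infty .
```

Each derivative is a polynomial in $\mu e^{c}$ times $\exp(\mu(e^{c}-1))$, so all these moments are finite in the Poisson case.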

