
Biometrika (1998), 85, 2, pp. 251-267
Printed in Great Britain

Poisson/gamma random field models for spatial statistics

BY ROBERT L. WOLPERT
Institute of Statistics and Decision Sciences, Duke University, Durham, North Carolina 27708-0251, U.S.A.
[email protected]

AND KATJA ICKSTADT
Department of Statistics, University of North Carolina, Chapel Hill, North Carolina 27599-3260, U.S.A.
[email protected]

SUMMARY

Doubly stochastic Bayesian hierarchical models are introduced to account for uncertainty and spatial variation in the underlying intensity measure for point process models. Inhomogeneous gamma process random fields and, more generally, Markov random fields with infinitely divisible distributions are used to construct positively autocorrelated intensity measures for spatial Poisson point processes; these in turn are used to model the number and location of individual events. A data augmentation scheme and Markov chain Monte Carlo numerical methods are employed to generate samples from Bayesian posterior and predictive distributions. The methods are developed in both continuous and discrete settings, and are applied to a problem in forest ecology.

Some key words: Bayesian mixture model; Bioabundance; Cox process; Data augmentation; Lévy process; Markov chain Monte Carlo; Simulation.

1. INTRODUCTION

Having observed counts and some collateral information in some region we would like to make inferences about the probability mechanism underlying these counts, exploring their dependence and relation to covariates. In many applications the data will include spatial information, such as the exact locations of the observations. Statistical methods and models that reflect spatial variation explicitly are used commonly now in many areas of applied statistics, including geostatistics, disease mapping and image restoration; see Cressie (1993) or Diggle (1983) for excellent overviews. In this paper we present a new class of models for exploring the possibility that counts from nearby locations are similar, in being positively correlated or exhibiting spatial dependence in some other way.

A number of modelling approaches have been proposed for studying this question. Positive association is exhibited by inhomogeneous Poisson processes with continuously varying intensity, including doubly-stochastic or Cox processes and Poisson cluster models (Cox, 1955; Cox & Isham, 1980, Ch. 6; Neyman & Scott, 1958). Although some common Markov point processes (Strauss, 1975) are, like the auto-Poisson processes of Besag (1974), purely inhibitory (Kelly & Ripley, 1976), others do permit positive association (Baddeley & van Lieshout, 1995; Ripley, 1977; Ripley & Kelly, 1977).


Perhaps the most common approach is to model either the counts themselves, after a variance-stabilising transformation, or the intensity for an underlying Poisson model, after a logarithmic transformation, as a Gaussian Markov random field with a partially uncertain covariance structure (Besag, 1974; Cressie, 1993, p. 402; Cressie & Chan, 1989; Møller, Syversveen & Waagepetersen, 1998). Unfortunately, Gaussian models fail to reflect the counts' discrete nature, while a log-normal approach (Bernardinelli et al., 1997; Besag, York & Mollié, 1991; Clayton & Kaldor, 1987) does not scale properly under aggregation and refinement of partitions: the logarithmic structure leads to products rather than sums for the Poisson means for unions of neighbouring sets.

Traditionally, inference in spatial statistics has been based on a combination of ad hoc nonparametric techniques, such as distance-based methods and second-order methods, kernel smoothing (Silverman, 1986, Ch. 4), maximum-likelihood parameter estimates and associated clever approximations, and simulation-based estimation and testing. The emergence of Markov chain Monte Carlo algorithms (Tierney, 1994; Gilks, Richardson & Spiegelhalter, 1996) has made it practicable, and increasingly common, to apply Bayesian statistical methods to problems in spatial statistics, particularly for Markov random field models where the complete conditional distributions required for the Gibbs sampling approach are usually available (Gelfand & Smith, 1990).

In this paper we introduce a natural and flexible class of Bayesian hierarchical models for analysing spatially dependent count data without any arbitrary discretisation of the data. The models are doubly stochastic Poisson processes whose intensities are convolutions or mixtures of inhomogeneous, infinitely divisible random fields; as such they generalise the conjugate Poisson/gamma hierarchical models of Clayton & Kaldor (1987), and may be viewed as limits in probability of Poisson cluster processes. The present methods extend, and in special cases reduce to, those based on mixtures of Dirichlet processes (Antoniak, 1974; Ferguson, 1973). Inference is based on simulations from the joint posterior distribution of all quantities of interest.

At the first, that is the lowest, stage of the hierarchy the number of points in each set $A$ in some region $\mathcal{X}$ of Euclidean space $\mathbb{R}^d$ is modelled as a Poisson-distributed random variable $N(A)$, whose mean $\lambda(A)$ is the unobserved value of a random measure $\Lambda(A)$.

Doubly stochastic models in which the 'increments' $\Lambda(A_i)$ assigned to disjoint sets $A_i$ are independent, such as the gamma process of Ferguson (1974), would necessarily feature independent counts $N(A_i)$ as well, and so could not be used to study possible spatial dependence. Instead at the second stage of the hierarchy we construct the intensity measure as a kernel mixture $\Lambda(A) = \int_{\mathcal{S}} k(A, s)\,\Gamma(ds)$ for an independent-increment infinitely divisible random measure $\Gamma(ds)$ on an auxiliary space $\mathcal{S}$. In our examples in § 5, $\mathcal{S}$ is a subset of Euclidean space or a discrete set. For exposition we take $\Gamma(ds)$ to have the gamma process distribution $\Gamma(ds) \sim \mathrm{Ga}\{\alpha(ds), \beta(s)^{-1}\}$, with shape measure $\alpha(ds)$ and inverse scale function $\beta(s) > 0$.

To express uncertainty about the kernel and some aspects of the distribution of $\Gamma(ds)$ within the Bayesian paradigm we introduce at the third stage of the hierarchy a prior probability distribution $\pi(d\theta)$ on a parameter space $\Theta$, and indicate by superscripts the dependence on $\theta \in \Theta$ of $k(x, s) = k^\theta(x, s)$, $\alpha(ds) = \alpha^\theta(ds)$ and $\beta(s) = \beta^\theta(s)$. Interest centres on the uncertain kernel $k^\theta(x, s)$, on the unobserved Poisson mean $\Lambda(x)$, on features of the distributions or even the specific values of the underlying process $\Gamma(ds)$, or on derived quantities such as linear trends and other measures of systematic spatial variation.

Section 2 introduces a new and very efficient scheme, the Inverse Lévy Measure algorithm, for sampling from inhomogeneous gamma random fields and other random measures


assigning independent infinitely divisible random variables to disjoint sets. We believe that this algorithm, based on ideas of Lévy (1937) and generalising more recent work of several authors (Bondesson, 1982; Ferguson & Klass, 1972; Laud, Smith & Damien, 1996) to our inhomogeneous spatial setting, will be useful in a variety of other problems. In § 3 the Poisson/gamma random field models are introduced, and in § 4 the Inverse Lévy Measure algorithm becomes the key to implementing a hybrid Metropolis/Gibbs approach with data augmentation (Tanner & Wong, 1987) to evaluate posterior distributions.

In § 5 the methods are illustrated with an example from forest ecology, studying the spatial distribution of hickory trees in a portion of Duke Forest in Durham, North Carolina. We analyse the data twice, first in a continuous setting and then in a discrete one. The discrete form is appropriate for applications in which census counts for subregions are given instead of exact locations; this is typical in disease mapping, for example, where data are customarily aggregated and reported by hospital, county or state to preserve confidentiality. The methods are then discussed in § 6. All but one-line proofs are deferred to an Appendix.

2. POISSON AND GAMMA RANDOM FIELDS

The simplest way to specify or identify the probability distribution of nonnegative random measures such as $N(A)$ is to give the moment generating function for the associated random field $N[\phi] = \int_{\mathcal{X}} \phi(x)\,N(dx)$, defined in the obvious way by $N[\phi] = \sum_i \phi_i N(A_i)$ for nonnegative simple functions $\phi(x) = \sum_i \phi_i 1_{A_i}(x)$, where $1_A(x)$ denotes the indicator function equal to one if $x \in A$ and zero otherwise, and extended by continuity to a wider class. This is usually done by giving the Laplace exponent, that is the negative logarithm $\mathcal{L}_N(\phi) = -\log[E\{\exp(-N[\phi])\}]$ (Karr, 1991, p. 9).

For example, the distributions of random measures assigning independent Poisson random variables $N(A) \sim \mathrm{Po}\{\lambda(A)\}$, with mean $\lambda(A)$ for some measure $\lambda(dx)$ on $\mathcal{X}$, and gamma random variables $\Gamma(B) \sim \mathrm{Ga}\{\alpha(B), \beta^{-1}\}$, with inverse scale parameter $\beta > 0$ and shape measure $\alpha(ds)$ on some set $\mathcal{S}$, are determined by the Laplace exponents

$$\mathcal{L}_N(\phi) = \int_{\mathcal{X}} \{1 - e^{-\phi(x)}\}\,\lambda(dx), \qquad \mathcal{L}_\Gamma(\phi) = \int_{\mathcal{S}} \log\{1 + \phi(s)/\beta\}\,\alpha(ds). \tag{2.1}$$

From Lévy (1937, Ch. VII, §§ 53-55) and Jacod & Shiryaev (1987, Ch. II, § 4c) it follows that any random field, including the Poisson and gamma fields, that assigns independent infinitely divisible random variables to disjoint sets and satisfies appropriate regularity conditions must have a Laplace exponent of the form $\mathcal{L}(\phi) = \int\!\!\int (1 - e^{-u\phi(s)})\,\nu(du, ds)$, generalising that of the Poisson. The gamma random field, for example, has 'Lévy measure' $\nu_\Gamma(du, ds) = e^{-u\beta}u^{-1}\,du\,\alpha(ds)$. This has the following two important consequences for us.

(i) It suggests that the gamma process can be extended to nonconstant $\beta(s)$, with Laplace exponent $\mathcal{L}_\Gamma(\phi) = \int_{\mathcal{S}} \log\{1 + \phi(s)/\beta(s)\}\,\alpha(ds)$ and Lévy measure $\nu_\Gamma(du, ds) = e^{-u\beta(s)}u^{-1}\,du\,\alpha(ds)$. This process was introduced by Dykstra & Laud (1981), who called it the Extended Gamma process, for the special case of $\mathcal{S} = \mathbb{R}_+$; an explicit but approximate construction appears in Laud et al. (1996) under the restriction of right-continuity for $\beta(s)$. We will see below that the restrictions on $\mathcal{S}$ and $\beta(s)$ are unnecessary.

(ii) It suggests that the gamma process can be constructed from a Poisson process $H(du, ds)$ on $\mathbb{R}_+ \times \mathcal{S}$ with mean $E\{H(du, ds)\} = \nu(du, ds)$ by setting

$$\Gamma[\phi] = \int\!\!\int_{\mathbb{R}_+ \times \mathcal{S}} u\,\phi(s)\,H(du, ds)$$

or, equivalently,

$$\Gamma[\phi] = \sum_{m<\infty} v_m\,\phi(\sigma_m) \quad\text{or}\quad \Gamma(A) = \sum_{\sigma_m \in A} v_m, \tag{2.2}$$

where the sums extend over the random countable support $\{(v_m, \sigma_m)\}$ of $H(du, ds)$.

Both suggestions (i) and (ii) above are borne out by the following result, in which $E_1(t) = \int_t^\infty e^{-u}u^{-1}\,du$ denotes the exponential integral function (Abramowitz & Stegun, 1964, p. 228).

THEOREM 1. Let $\alpha(s) \ge 0$ and $\beta(s) > 0$ be measurable functions on a space $\mathcal{S}$. Let $\{\sigma_m\}$ be independent identically distributed draws from any probability distribution $\Pi(ds)$ on $\mathcal{S}$, and let $\tau_m \ge 0$ be the successive jump times of a standard Poisson process. Set $\tau(u, s) = E_1\{u\beta(s)\}\,\alpha(s)$ and $v_m = \sup\{u \ge 0 : \tau(u, \sigma_m) > \tau_m\}$, that is,

$$v_m = E_1^{-1}\{\tau_m/\alpha(\sigma_m)\}\,\beta(\sigma_m)^{-1}, \tag{2.3}$$

or $v_m = 0$ if $\alpha(\sigma_m) = 0$. Then the random field defined by $\Gamma[\phi] = \sum_{m<\infty} v_m\,\phi(\sigma_m)$ for bounded measurable $\phi(s)$ has the gamma process distribution $\Gamma(ds) \sim \mathrm{Ga}\{\alpha(ds), \beta(s)^{-1}\}$ for the measure $\alpha(ds) = \alpha(s)\,\Pi(ds)$.

COROLLARY 1. The gamma random field $\Gamma(ds) \sim \mathrm{Ga}\{\alpha(ds), \beta(s)^{-1}\}$ can be approximated in distribution to arbitrarily high accuracy by the following.

INVERSE LÉVY MEASURE ALGORITHM

Step 1. Fix a large integer $M$ and choose any convenient distribution $\Pi(ds)$ on $\mathcal{S}$ from which it is easy to draw samples and such that the shape measure $\alpha(ds)$ has a density $\alpha(s) = \alpha(ds)/\Pi(ds)$; see guidance on choices below.

Step 2. Generate $M$ independent identically distributed draws $\{\sigma_m\}_{m\le M}$ from $\Pi(ds)$.

Step 3. Generate the first $M$ jumps $\{\tau_m\}_{m\le M}$ of a standard Poisson process, for example by adding successive independent exponential random variables.

Step 4. Set $v_m = E_1^{-1}\{\tau_m/\alpha(\sigma_m)\}\,\beta(\sigma_m)^{-1}$.

Step 5. Set $\Gamma_M(ds) = \sum_{m\le M} v_m\,\delta_{\sigma_m}(ds) \approx \Gamma(ds)$, $\Gamma_M[\phi] = \sum_{m\le M} v_m\,\phi(\sigma_m) \approx \Gamma[\phi]$.
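A minimal sketch of Steps 1-5 in Python may help fix ideas; it is not the authors' software. It assumes SciPy's `exp1` for $E_1$, inverts $E_1$ numerically with `brentq`, and takes a user-supplied sampler `sample_pi` for $\Pi(ds)$; all function names here are hypothetical.

```python
import numpy as np
from scipy.special import exp1          # exponential integral E_1
from scipy.optimize import brentq

def inv_exp1(target):
    """Numerically invert the strictly decreasing exponential integral E_1."""
    lo, hi = 1e-300, 700.0
    if exp1(lo) <= target:
        return 0.0                      # jump smaller than machine precision
    if exp1(hi) >= target:
        return hi                       # target underflows; essentially never reached
    return brentq(lambda u: exp1(u) - target, lo, hi)

def inverse_levy_measure_gamma(M, sample_pi, alpha, beta, rng=None):
    """Draw the first M atoms (v_m, sigma_m) of Gamma(ds) ~ Ga{alpha(s) Pi(ds), beta(s)^{-1}}
    by Steps 1-5 of the Inverse Levy Measure algorithm.

    sample_pi(rng, M) : M i.i.d. locations from Pi(ds)             (user supplied)
    alpha(s), beta(s) : shape density w.r.t. Pi and inverse scale  (user supplied)
    """
    rng = np.random.default_rng(rng)
    sigma = sample_pi(rng, M)                       # Step 2
    tau = np.cumsum(rng.exponential(size=M))        # Step 3: Poisson jump times
    v = np.zeros(M)
    for m in range(M):                              # Step 4: v_m = E_1^{-1}{tau_m/alpha}/beta
        a = alpha(sigma[m])
        if a > 0:
            v[m] = inv_exp1(tau[m] / a) / beta(sigma[m])
    return v, sigma                                 # Step 5: Gamma_M = sum_m v_m delta_{sigma_m}

# Usage: uniform Pi on S = [0,140]^2 with constant shape density and inverse scale.
if __name__ == "__main__":
    unif = lambda rng, M: rng.uniform(0.0, 140.0, size=(M, 2))
    v, sigma = inverse_levy_measure_gamma(500, unif,
                                          alpha=lambda s: 5.0, beta=lambda s: 2.0, rng=0)
    print(v[:5], v.sum())   # jumps arrive in decreasing order here (constant alpha, beta)
```

In the usage example the jump sizes come out in strictly decreasing order because $\alpha(s)$ and $\beta(s)$ are constant; in general the compensation described below reorders them.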

COROLLARY 2. The joint distribution of the jump sizes and locations $\{(v_m, \sigma_m)\}_{m\le M}$ in the Inverse Lévy Measure algorithm has a density with respect to the product measure $\prod_{m\le M} dv_m\,\Pi(d\sigma_m)$ proportional to

$$\exp\bigl[-E_1\{v_M\beta(\sigma_M)\}\,\alpha(\sigma_M)\bigr]\prod_{m\le M} v_m^{-1}\,\alpha(\sigma_m)\,e^{-v_m\beta(\sigma_m)}$$

on the set of $\{(v_m, \sigma_m)\}_{m\le M}$ for which the function

$$\tau(u, s) = E_1\{u\beta(s)\}\,\alpha(s)$$

satisfies $\tau(v_m, \sigma_m) < \tau(v_{m+1}, \sigma_{m+1})$ for every $m < M$.

Proof. This requires simply a change of variables (2.3) from $\tau_m = \tau(v_m, \sigma_m)$ to $v_m$. □


Remark 1. The probability distribution $\Pi$ can be written uniquely as the sum of continuous and discrete parts $\Pi(ds) = \Pi_c(ds) + \sum_n \pi_n\,\delta_{s_n}(ds)$, leading to similar representations for the shape measure

$$\alpha(ds) = \alpha(s)\,\Pi_c(ds) + \sum_n \pi_n\,\alpha(s_n)\,\delta_{s_n}(ds)$$

and random field $\Gamma(ds) = \Gamma_c(ds) + \sum_n v_n\,\delta_{s_n}(ds)$, with $v_n \sim \mathrm{Ga}\{\pi_n\alpha(s_n), \beta(s_n)^{-1}\}$ independent of the gamma random field $\Gamma_c \sim \mathrm{Ga}\{\alpha(s)\Pi_c(ds), \beta(s)^{-1}\}$; often it is more efficient to use the Inverse Lévy Measure algorithm for $\Gamma_c$ while sampling directly the independent random variables $v_n$ from gamma distributions. Upon relabelling we recover representation (2.2).

The remarkable feature of Theorem 1 is the freedom to choose any convenient sampling distribution $\Pi(ds)$ that dominates $\alpha(ds)$, and any measurable function $\beta(s) > 0$. The Inverse Lévy Measure algorithm compensates for oversampling, where $\alpha(s) = \alpha(ds)/\Pi(ds)$ is small, or undersampling, where $\alpha(s)$ is large, by generating small or large jumps $v_m$, respectively. Naturally some choices of $\Pi(ds)$ will sample more efficiently than others, so that $\Gamma(ds)$ will be better approximated by the partial sum $\Gamma_M(ds)$ for modest $M$.

Appropriate choices for $M$ and $\Pi(ds)$ can be guided by considering the expected truncation error. From the estimate $E_1(z) = -\gamma - \log(z) + O(z)$ (Abramowitz & Stegun, 1964, § 5.1.11) and $\tau_M \approx M$ we find that $v_m \approx \exp\{-\gamma - m/\alpha(\sigma_m)\}/\beta(\sigma_m)$; for nearly constant $\alpha(s) \approx \alpha$, that is, for $\Pi(ds)$ nearly proportional to $\alpha(ds)$, and $\beta(s) \approx \beta$, this is approximately a geometric series, with relative truncation error about $\exp(-M/\alpha)$. For $\alpha(\mathcal{S}) < \infty$ and constant $\beta(s) = \beta$ the choice $\Pi(ds) = \alpha(ds)/\alpha(\mathcal{S})$ makes $\alpha(s) = \alpha(\mathcal{S})$ constant too, and the Inverse Lévy Measure algorithm will generate the $v_m$ in strictly decreasing order starting at the largest mass point of $\Gamma(ds)$. In this case one might prefer to select a small $\varepsilon > 0$ and sample a random number $M_\varepsilon$ of times until $\varepsilon$ exceeds the smallest jump included and hence exceeds each of the jumps omitted, to get an expected truncation error that obeys the simple bound $E\{\Gamma(A) - \Gamma_{M_\varepsilon}(A)\} \le \varepsilon\,\alpha(A)$. In this special case, in one dimension, $\mathcal{S} = (0, 1) \subset \mathbb{R}^1$, with Lebesgue measure for $\alpha(ds)$ and $\Pi(ds)$, the present method reduces to that of Bondesson (1982) and generalises the method of Ferguson & Klass (1972).
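The $\varepsilon$-threshold variant just described is easy to express in code under the stated restrictions of constant shape density and inverse scale; a sketch only, reusing the hypothetical `inv_exp1` helper above.

```python
def ilm_gamma_until_eps(eps, sample_pi, alpha_const, beta_const, rng=None, max_jumps=100_000):
    """Variant for constant alpha(s) = alpha_const and beta(s) = beta_const, so the jumps
    arrive in strictly decreasing order: stop at the first jump below eps, making the
    expected omitted mass within any set A at most eps * alpha(A)."""
    rng = np.random.default_rng(rng)
    v, sigma, tau = [], [], 0.0
    for _ in range(max_jumps):
        tau += rng.exponential()
        jump = inv_exp1(tau / alpha_const) / beta_const
        if jump < eps:
            break                       # every later jump is smaller still
        v.append(jump)
        sigma.append(sample_pi(rng, 1)[0])
    return np.array(v), np.array(sigma)
```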

Our simulation is approximate only in that we cannot draw infinitely many points $(v_m, \sigma_m)$; the points we do draw are from exactly the right distribution, without approximation. Other published methods for simulating draws from some of these distributions (Damien, Laud & Smith, 1995; Laud et al., 1996) do entail approximations and moreover impose the restrictions that $\mathcal{S}$ be one-dimensional and that the Lévy measure $\nu(du, ds)$ have a product density with respect to Lebesgue measure $du\,ds$. Also, since they do not generate the largest jumps first, they require larger samples to ensure the same fraction of the total mass $\Gamma(\mathcal{S})$.

The Inverse Lévy Measure algorithm extends easily to any other infinitely divisible independent-increment process with Lévy measure $\nu(du, ds) = \nu(u, s)\,du\,\Pi(ds)$, simply by replacing $\tau(u, s) = E_1\{u\beta(s)\}\,\alpha(s)$ with $\tau(u, s) = \int_u^\infty \nu(v, s)\,dv$; hence the name 'Inverse Lévy Measure'. The one-sided stable process $S[\phi]$ of index $\xi(s) \in (0, 1)$ and intensity measure $\alpha(ds) = \alpha(s)\,\Pi(ds)$, for example, has Lévy measure $\nu(du, ds) = \xi(s)\,u^{-1-\xi(s)}\,du\,\alpha(ds)$ and so $\tau(u, s) = u^{-\xi(s)}\alpha(s)$ leads to a sampling scheme with $v_m = \{\alpha(\sigma_m)/\tau_m\}^{1/\xi(\sigma_m)}$; note that permitting the stable index $\xi(s)$ to be nonconstant entails no additional difficulty.
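For the one-sided stable case the tail measure inverts in closed form, so no numerical inversion is needed; a short hypothetical sketch in the same style as the gamma sampler above:

```python
def ilm_one_sided_stable(M, sample_pi, alpha, xi, rng=None):
    """First M atoms of a one-sided stable random field: tau(u, s) = u^{-xi(s)} alpha(s)
    inverts in closed form, giving v_m = {alpha(sigma_m)/tau_m}^{1/xi(sigma_m)}."""
    rng = np.random.default_rng(rng)
    sigma = sample_pi(rng, M)
    tau = np.cumsum(rng.exponential(size=M))
    v = np.array([(alpha(s) / t) ** (1.0 / xi(s)) for s, t in zip(sigma, tau)])
    return v, sigma
```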

3. MODELS

3.1. The Poisson/gamma model

At the top, that is the third, level of the hierarchy let $\pi(d\theta)$ be a probability distribution on a Borel measurable space $\Theta$.


At the middle, second stage of the hierarchy, given $\theta \in \Theta$, let $\Gamma(ds)$ be the gamma random field $\Gamma(ds) \sim \mathrm{Ga}\{\alpha^\theta(ds), \beta^\theta(s)^{-1}\}$ constructed in Theorem 1. Define an integrable function $\Lambda(x)$ on a measure space $\{\mathcal{X}, w(dx)\}$ by $\Lambda(x) = \int_{\mathcal{S}} k^\theta(x, s)\,\Gamma(ds)$ for some nonnegative integral kernel $k^\theta(x, s)$ satisfying the integrability condition

$$\int\!\!\int\!\!\int k^\theta(x, s)\,w(dx)\,\beta^\theta(s)^{-1}\alpha^\theta(ds)\,\pi(d\theta) < \infty;$$

this condition ensures that the random measure $\Lambda(dx) = \Lambda(x)\,w(dx)$ and integral kernel $k^\theta(dx, s) = k^\theta(x, s)\,w(dx)$ will be well defined.

At the bottom, first stage of the hierarchy, given $\theta$ and $\Gamma(ds)$, take $N(dx)$ to be a Poisson measure with conditional mean $\Lambda(dx)$, assigning conditionally independent $\mathrm{Po}\{\Lambda(A_i)\}$ random variables $N(A_i)$ to disjoint Borel sets $A_i \subset \mathcal{X}$; the unconditional dependence of the $\{N(A_i)\}$ arises from that of the $\{\Lambda(A_i)\}$. Multiple points are possible if $\Lambda(dx)$ has atoms; see the example in § 5.3. Thus $N(dx)$ is modelled as a doubly stochastic Poisson process with hierarchical representation:

$$\theta \sim \pi(d\theta), \qquad \Gamma(ds) \mid \theta \sim \mathrm{Ga}\{\alpha^\theta(ds), \beta^\theta(s)^{-1}\},$$
$$\Lambda(dx) = \int_{\mathcal{S}} k^\theta(dx, s)\,\Gamma(ds), \qquad N(dx) \mid \theta, \Gamma \sim \mathrm{Po}\{\Lambda(dx)\}. \tag{3.1}$$

The representation, which depends upon $k^\theta(x, s)$ and $\beta^\theta(s)$ only through their ratio $k^\theta(x, s)/\beta^\theta(s)$, could be made unique by requiring $\beta^\theta(s) = 1$ or $k^\theta(\mathcal{X}, s) = 1$, but the present over-parameterisation facilitates Bayesian updating, which affects only $\beta^\theta(s)$ (see § 4), and the inclusion of covariates, which affects only $k^\theta(x, s)$ (see § 6).
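The hierarchy (3.1) is easy to simulate forwards once the atoms of $\Gamma(ds)$ are in hand, which also makes the Poisson-cluster reading of the model mentioned in § 1 concrete. The sketch below is illustrative only; the two kernel helpers are hypothetical and depend on the kernel chosen, and the atoms could come, for instance, from `inverse_levy_measure_gamma` above.

```python
def simulate_poisson_gamma_points(v, sigma, kbar_X, sample_k, rng=None):
    """Forward draw of the point process N(dx) in (3.1), given the atoms (v_m, sigma_m)
    of Gamma(ds).

    kbar_X(s)           : integral of k(x, s) w(dx) over the observation window X
    sample_k(rng, s, n) : n points from the density k(x, s) w(dx) / kbar_X(s)
    Both helpers are user supplied and kernel specific."""
    rng = np.random.default_rng(rng)
    points = []
    for vm, sm in zip(v, sigma):
        n = rng.poisson(vm * kbar_X(sm))    # Po{v_m * kbar(X, sigma_m)} points from atom m
        if n:
            points.append(sample_k(rng, sm, n))
    return np.vstack(points) if points else np.empty((0, 2))
```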

The conditional mean and covariance of $N(dx)$, given $\theta$, are easily computed:

$$E^\theta\{N(A)\} = \int_{\mathcal{S}} k^\theta(A, s)\,\beta^\theta(s)^{-1}\alpha^\theta(ds),$$

$$\mathrm{cov}^\theta\{N(A), N(B)\} = \int_{\mathcal{S}} k^\theta(A \cap B, s)\,\beta^\theta(s)^{-1}\alpha^\theta(ds) + \int_{\mathcal{S}} k^\theta(A, s)\,k^\theta(B, s)\,\beta^\theta(s)^{-2}\alpha^\theta(ds).$$

We now seek $k^\theta(x, s)$, $\alpha^\theta(ds)$ and $\beta^\theta(s)$ flexible enough to approximate the anticipated point process means and covariances in an illustrative example.

3.2. An illustration

In § 5 we study an example from forest ecology in which $N(dx)$ counts the number and locations of hickory trees at least 10 m from the boundary of the 140 m × 140 m Bormann research plot in Duke Forest in Durham, North Carolina. We represent these regions by

$$\mathcal{X} \equiv [10, 130] \times [10, 130] \subset \mathcal{S} = [0, 140] \times [0, 140] \subset \mathbb{R}^2.$$

For $\theta \in \mathbb{R}^2 \times \mathbb{R}_+$ let $\Gamma(ds)$ be a gamma random field with uniform shape measure $\alpha^\theta(ds) = e^{\theta_1}\,ds$ and constant scale $\beta^\theta(s)^{-1} = e^{\theta_2}$. Let $k^\theta(x, s)$ be a Gaussian kernel density $k^\theta(x, s) = (\pi\theta_3^2)^{-1}\exp(-|x - s|^2/\theta_3^2)$ with respect to the reference measure $w(dx) = 10^{-4}\,dx$ on $\mathcal{X}$, area measured in hectares, normalised by the condition $\int_{\mathbb{R}^2} k^\theta(x, s)\,ds = 1$, supporting


a range of possible correlation distances. With these choices the approximate means and variances become

$$E^\theta\{N(A)\} \approx e^{\theta_1+\theta_2}\,w(A),$$

$$\mathrm{var}^\theta\{N(A)\} \approx e^{\theta_1+\theta_2}\,w(A) + e^{\theta_1+2\theta_2}\int_A\!\!\int_A (2\pi\theta_3^2)^{-1}e^{-|x-y|^2/2\theta_3^2}\,w(dx)\,w(dy).$$

The first two approximations are quite close for $A \subset \mathcal{X}$ whenever $\theta_3 \ll 10$ m and would be exact for $\mathcal{S} = \mathcal{X} = \mathbb{R}^2$, where the spatial point process $N(dx)$ would be stationary and isotropic with intensity $\lambda = e^{\theta_1+\theta_2}10^{-4}$ and Ripley's function

$$K(t) = \pi t^2 + e^{-\theta_1}(1 - e^{-t^2/2\theta_3^2})$$

(Diggle, 1983, § 4.1). The expected number of trees within distance $t$ of an arbitrary point and of an arbitrary tree would then be $\lambda\pi t^2$ and

$$\lambda K(t) = \lambda\pi t^2 + 10^{-4}e^{\theta_2}(1 - e^{-t^2/2\theta_3^2}),$$

respectively. Evidently $e^{\theta_1+\theta_2}$ is the mean density, in trees per hectare; $10^{-4}e^{\theta_2}$ is the large-scale overdispersion or, alternatively, the excess number of trees close to a randomly selected tree; and $\theta_3$ is a measure of the distance within $\mathcal{X}$ over which any spatial interaction extends. The overdispersion is smaller for small sets $A$, and is nearly proportional to $w(A)$. A prior distribution $\pi(d\theta)$ is selected in § 5 to express prior belief about these features. For future use note that $\bar k^\theta(\mathcal{X}, s) = \int_{\mathcal{X}} k^\theta(x, s)\,w(dx)$ in this example may be written in terms of the standard normal distribution function as

$$\bar k^\theta(\mathcal{X}, s) = 10^{-4}\prod_{i=1}^{2}\bigl[\Phi\{\sqrt{2}(130 - s_i)/\theta_3\} - \Phi\{\sqrt{2}(10 - s_i)/\theta_3\}\bigr], \tag{3.2}$$

which, if $\theta_3$ is small, is approximately equal to $10^{-4}$ for $s$ in the interior of $\mathcal{X}$ but near zero for $s$ far from $\mathcal{X}$; in particular it is a nonconstant function of $s = (s_1, s_2) \in \mathcal{S}$.
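Equation (3.2), as reconstructed here, is straightforward to evaluate; a short hypothetical helper, which also supplies the Bayesian update $\beta^\theta(s) + \bar k^\theta(\mathcal{X}, s)$ used in § 4:

```python
import numpy as np
from scipy.stats import norm

def kbar_X(s, theta3):
    """Evaluate (3.2): the Gaussian kernel of Section 3.2 integrated over X = [10,130]^2
    against w(dx) = 1e-4 dx, written with the standard normal distribution function."""
    z = lambda a, si: norm.cdf(np.sqrt(2.0) * (a - si) / theta3)
    return 1e-4 * ((z(130.0, s[0]) - z(10.0, s[0])) * (z(130.0, s[1]) - z(10.0, s[1])))
```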

4. THE COMPUTATIONAL ALGORITHM

4.1. Introduction

An exact likelihood function for $\theta$ may be evaluated as a convolution from the hierarchical mixture representation (3.1), but the computations are tedious and numerically unstable. Instead we follow a Markov chain Monte Carlo computational approach based on (3.1) to evaluate the joint posterior distribution of the uncertain quantities $\theta$ and $\Gamma = \Gamma(ds)$ and functions thereof by simulating steps $(\theta^t, \Gamma^t)$ from an ergodic Markov chain constructed to have as a stationary distribution the posterior distribution of $\theta$ and $\Gamma$ given the observed count data $N(dx)$. The key computational requirements are the complete


conditional distributions of each uncertain quantity given all the others, evaluated as follows.

4.2. Augmentation and the complete conditional distributions

First consider the conditional distribution of $\Gamma$ given $\theta$ and $N(dx)$. The point distribution $N(dx)$ is a finite integer-valued measure on $\mathcal{X}$, and so can be represented as the sum of a random number $N_{\mathcal{X}} \sim \mathrm{Po}\{\Lambda(\mathcal{X})\}$ of unit point masses at not-necessarily-distinct points $X_n = x_n$, drawn conditionally independently from the mixture distribution

$$X_n \mid \theta, \Gamma \sim \Lambda(dx_n)/\Lambda(\mathcal{X}) = \int_{\mathcal{S}} k^\theta(dx_n, s)\,\Gamma(ds)/\Lambda(\mathcal{X}).$$

For each $n \le N_{\mathcal{X}}$ resolve the mixture by drawing an auxiliary random variable $S_n = s_n \in \mathcal{S}$ from the distribution $S_n \mid \theta, X, N \sim k^\theta(x_n, s_n)\,\Gamma(ds_n)/\Lambda(x_n)$, and let

$$Z(dx\,ds) = \sum_{n\le N_{\mathcal{X}}} \delta_{(x_n, s_n)}(dx\,ds)$$

be the random measure on $\mathcal{X} \times \mathcal{S}$ assigning unit mass to each pair $(x_n, s_n)$. Of course the unaugmented data may be recovered as the first marginal measure $N(dx) = Z_1(dx) = Z(dx \times \mathcal{S})$, while the augmentation points $\{S_n = s_n\}$ can be recovered as the second marginal, $Z_2(ds) = Z(\mathcal{X} \times ds)$. With this data augmentation we have the following conjugacy property.

THEOREM 2. The conditional distribution of $\Gamma(ds)$, given $\theta$ and $Z(dx\,ds)$, is given by

$$\Gamma(ds) \mid \theta, Z \sim \mathrm{Ga}\bigl[(\alpha^\theta + Z_2)(ds),\ \{\beta^\theta(s) + \bar k^\theta(\mathcal{X}, s)\}^{-1}\bigr]. \tag{4.1}$$

Note that, even if we begin with a constant $\beta^\theta(s) \equiv \beta^\theta$, we are forced by (3.2) and (4.1) to consider nonconstant functions $\beta^\theta(s)$. Now turn to the conditional distribution of $Z(dx\,ds)$.

THEOREM 3. Conditional on $\theta$, $\Gamma$ and $N(dx)$, the random measure $Z(dx\,ds)$ is distributed as the sum of $N_{\mathcal{X}} = N(\mathcal{X})$ unit point masses at points $(x_n, s_n) \in \mathcal{X} \times \mathcal{S}$, where $x_n$ ranges over the support of $N(dx)$, counted according to multiplicity, and where $\{s_n\}$ are the realised values of conditionally independent random variables $\{S_n\}_{n\le N_{\mathcal{X}}}$ with discrete distributions

$$\mathrm{pr}(S_n = \sigma_m \mid \theta, \Gamma, N) = v_m\,k^\theta(x_n, \sigma_m)/\Lambda(x_n)$$

for the sequence $\{(v_m, \sigma_m)\}$ of (2.2).

Proof. The representation $\Lambda(x) = \sum_{m<\infty} v_m\,k^\theta(x, \sigma_m)$ of the Poisson mean follows from (2.2) and (3.1), and the definitions of $S_n$ and $Z(dx\,ds)$ complete the proof. □
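Theorem 3 is the basis of the Gibbs update of the augmentation variables in § 4.3; a minimal sketch (illustrative names, not the authors' code) of that draw is as follows.

```python
def gibbs_draw_labels(x_points, v, sigma, kernel, rng=None):
    """Gibbs update of the augmentation variables (Theorem 3): for each observed point x_n,
    draw the index of its latent atom with probability proportional to v_m * k(x_n, sigma_m).
    `kernel(x, s)` evaluates the kernel density k^theta(x, s)."""
    rng = np.random.default_rng(rng)
    labels = np.empty(len(x_points), dtype=int)
    for n, x in enumerate(x_points):
        w = v * np.array([kernel(x, s) for s in sigma])   # unnormalised probabilities
        labels[n] = rng.choice(len(v), p=w / w.sum())
    return labels                                         # s_n^* = sigma[labels[n]]
```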

The complete conditional distribution of $\theta$, given $\Gamma$ and $Z(dx\,ds)$, depends on the arbitrary prior distribution $\pi(d\theta)$ and so cannot in general lead to a simple sampling scheme; nevertheless it will have a density function, if $\pi(d\theta)$ does, that will be needed for a Metropolis step (Hastings, 1970) in the calculations to follow.

THEOREM 4. Let $\alpha^\theta(ds)$ have a density function $\alpha^\theta(s)$ with respect to a fixed nonatomic probability measure $\Pi(ds)$ on $\mathcal{S}$, and $\pi(d\theta)$ a density function $\pi(\theta)$ with respect to some reference measure $d\theta$ on $\Theta$. Then for $M > N_{\mathcal{X}} = N(\mathcal{X})$ the conditional density of $\theta$, given $Z(dx\,ds)$ and the truncated gamma measure $\Gamma_M(ds)$ constructed in Corollary 1, is proportional to

$$\pi(\theta \mid Z, \Gamma_M) \propto \pi(\theta)\exp\Bigl[-\int_{\mathcal{S}}\log\{1 + \bar k^\theta(\mathcal{X}, s)/\beta^\theta(s)\}\,\alpha^\theta(ds) - \sum_{m\le M} v_m\,\beta_*^\theta(\sigma_m) - E_1\{v_M\beta_*^\theta(\sigma_M)\}\,\alpha^\theta(\sigma_M)\Bigr]$$
$$\times \prod_{n\le N_{\mathcal{X}}} k^\theta(x_n, s_n)\prod_{m\le M}\{\alpha^\theta(\sigma_m)\}^{i_m},$$

where $\beta_*^\theta(s) = \beta^\theta(s) + \bar k^\theta(\mathcal{X}, s)$ and $i_m = 1$ if $\sigma_m \notin \{s_n : n < m\}$, otherwise $i_m = 0$.

Theorem 4 needs only a small adjustment to accommodate measures $\Pi(ds)$ with atoms: simply replace $\alpha^\theta(\sigma_m)^{i_m}$ in the final product by $\{\pi_m\alpha^\theta(\sigma_m) + \zeta_m\}$ for atoms $\sigma_m$ with $\pi_m = \Pi(\{\sigma_m\}) > 0$, where $\zeta_m$ is the cumulative multiplicity $\#\{n < m : s_n = \sigma_m\}$, but Remark 1 offers a more efficient way to accommodate atoms.

4.3. The Markov chain Monte Carlo scheme

Theorems 2-4 provide all the conditional distributions needed to implement a Markov chain Monte Carlo scheme for sampling from the joint posterior distribution of the uncertain quantities from the augmented model,

$$\theta \sim \pi(d\theta), \qquad \Gamma(ds) \mid \theta \sim \mathrm{Ga}\{\alpha^\theta(ds), \beta^\theta(s)^{-1}\}, \qquad Z(dx\,ds) \mid \theta, \Gamma \sim \mathrm{Po}\{k^\theta(dx, s)\,\Gamma(ds)\}.$$

Given a prior density function $\pi(\theta)$ on a parameter space $\Theta$, a reference measure $w(dx)$ on the observation space $\mathcal{X}$, a probability measure $\Pi(ds)$ on an auxiliary space $\mathcal{S}$, a measurable kernel function $k^\theta(x, s)$, a gamma process shape density $\alpha^\theta(s)$ and inverse scale function $\beta^\theta(s)$ as in § 3.1, a Markov transition density $Q(\theta, \theta^*)$ on $\Theta$, an integer $M \gg N = N_{\mathcal{X}}$ and initial points $\theta^0 \in \Theta$ and $\{(v_m^0, \sigma_m^0)\}_{m\le M} \subset \mathbb{R}_+ \times \mathcal{S}$, successive points can be generated by the following hybrid Gibbs/Metropolis scheme starting at iteration $t = 1$.

1. Gibbs step to update the augmentation points. Given $\theta^{t-1}$ and $\Gamma^{t-1}$,
(a) generate independent $S^* = \{s_n^*\}_{n\le N}$ with $\mathrm{pr}(S_n^* = \sigma_m^{t-1}) \propto v_m^{t-1}\,k^{\theta^{t-1}}(x_n, \sigma_m^{t-1})$;
(b) (optional) move $S^*$ with a Metropolis/Hastings step; see note below.

2. Gibbs step to update the gamma process. Given $\theta^{t-1}$ and $S^* = \{s_n^*\}_{n\le N}$,
(a) set $\sigma_m^t = s_m^*$ $(1 \le m \le N)$ and generate independent $\sigma_m^t \sim \Pi(ds)$ for $N < m \le M$;
(b) set $\alpha_m^t = \alpha^{\theta^{t-1}}(\sigma_m^t)$ and $\beta_m^t = \beta^{\theta^{t-1}}(\sigma_m^t) + \bar k^{\theta^{t-1}}(\mathcal{X}, \sigma_m^t)$ for $1 \le m \le M$, and put $i_m^t = 0$ if $s_n^* = \sigma_m^t$ for some $n < m$, otherwise $i_m^t = 1$;
(c) generate successive jumps $\{\tau_m\}_{m\le M}$ of a standard Poisson process;
(d) set

$$v_m^t = \begin{cases} (\tau_m - \tau_{m-1})/\beta_m^t & \text{for } 1 \le m \le N, \text{ see Remark 1},\\ E_1^{-1}\{(\tau_m - \tau_N)/\alpha_m^t\}/\beta_m^t & \text{for } N < m \le M, \text{ see Corollary 1};\end{cases}$$

3. Metropolis/Hastings step to update the parameter $\theta$. Given $\theta^{t-1}$, $\Gamma^t$ and $S^*$,
(a) set $\theta^- = \theta^{t-1}$ and generate $\theta^+ \sim Q(\theta^-, \theta^+)$;
(b) set $k_n^- = k^{\theta^-}(x_n, s_n^*)$, $k_n^+ = k^{\theta^+}(x_n, s_n^*)$;
(c) set $\alpha_m^- = \alpha^{\theta^-}(\sigma_m^t)$, $\alpha_m^+ = \alpha^{\theta^+}(\sigma_m^t)$;
(d) set $\beta_m^- = \beta^{\theta^-}(\sigma_m^t) + \bar k^{\theta^-}(\mathcal{X}, \sigma_m^t)$, $\beta_m^+ = \beta^{\theta^+}(\sigma_m^t) + \bar k^{\theta^+}(\mathcal{X}, \sigma_m^t)$;
(e) set $P_t$ equal to the ratio of the conditional densities of Theorem 4 evaluated at $\theta^+$ and at $\theta^-$, computed from the quantities in (b)-(d), multiplied by the proposal ratio $Q(\theta^+, \theta^-)/Q(\theta^-, \theta^+)$;
(f) set $\theta^t = \theta^+$ with probability $(1 \wedge P_t)$, otherwise set $\theta^t = \theta^- = \theta^{t-1}$.

4. Set $t \leftarrow t + 1$ and return to step 1.
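One sweep of this scheme can be sketched in code as follows. The sketch is schematic only: `model` is a hypothetical container supplying the problem-specific quantities, `log_posterior_theta` stands for the Theorem 4 log-density, and the helpers `inv_exp1` and `gibbs_draw_labels` come from the earlier sketches.

```python
def mcmc_sweep(theta, v, sigma, x_points, model, M, rng):
    """One sweep of the hybrid Gibbs/Metropolis scheme of Section 4.3 (schematic only).
    `model` supplies kernel(theta, x, s), kbar_X(theta, s), alpha(theta, s), beta(theta, s),
    sample_pi(rng, n), propose(rng, theta) -> (theta_new, log_q_ratio) and
    log_posterior_theta(theta, v, sigma, x_points, s_star), the Theorem 4 log-density."""
    N = len(x_points)
    # Step 1: Gibbs update of the augmentation points s_n^*.
    labels = gibbs_draw_labels(x_points, v, sigma,
                               lambda x, s: model.kernel(theta, x, s), rng)
    s_star = sigma[labels]
    # Step 2: Gibbs update of the gamma random field given theta and s^*.
    sigma = np.concatenate([s_star, model.sample_pi(rng, M - N)])
    beta_star = np.array([model.beta(theta, s) + model.kbar_X(theta, s) for s in sigma])
    alpha_s = np.array([model.alpha(theta, s) for s in sigma])
    tau = np.cumsum(rng.exponential(size=M))
    v = np.empty(M)
    v[:N] = np.diff(np.concatenate([[0.0], tau[:N]])) / beta_star[:N]       # Remark 1
    v[N:] = [inv_exp1((tau[m] - tau[N - 1]) / alpha_s[m]) / beta_star[m]    # Corollary 1
             for m in range(N, M)]
    # Step 3: Metropolis/Hastings update of theta via the density of Theorem 4.
    theta_new, log_q_ratio = model.propose(rng, theta)
    log_P = (model.log_posterior_theta(theta_new, v, sigma, x_points, s_star)
             - model.log_posterior_theta(theta, v, sigma, x_points, s_star)
             + log_q_ratio)
    if np.log(rng.uniform()) < log_P:
        theta = theta_new
    return theta, v, sigma
```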

The approach of this Markov chain to its equilibrium distribution can be slow, with some of the same augmentation points $s_n^*$ recurring for long periods; convergence can be accelerated by adding another Metropolis step within Gibbs step 1(b), in which a small move is proposed for each $s_n^*$ and is accepted or not following the Hastings rule (Tierney, 1994, p. 1704). A similar phenomenon affecting sampling from Dirichlet process distributions was recognised and overcome by Bush & MacEachern (1996) and MacEachern (1994).

5. EXAMPLES

5.1. Description of the data

As an illustration we analyse the density and spatial correlation of hickory trees in the Bormann research plot discussed in § 3.2, seeking answers to a number of questions. Is the tree distribution consistent with a homogeneous Poisson distribution? If the intensity is not constant, are intensities at nearby points positively associated? How might 'nearby' be determined?

The underlying dataset consists of a complete census count of every tree in the Bormann plot with diameter at height 1.5 m of at least 2.5 cm, including its diameter, to within 0.1 cm, its spatial coordinates, to within 0.1 m, and an indicator of its condition. The data, concerning 7859 trees of 38 species, were initially collected in 1951-1952 by Bormann and updated in 1974, 1982, 1989 and 1993 by Christensen, Peet and members of the Duke University and University of North Carolina botany departments in a continuing study of forest maturation (Bormann, 1953; Christensen, 1977). We group the four hickory species present and base our inference on the locations, shown as dots in Fig. 1(a), of the 85 hickory trees located at least 10 m from the boundary of the 140 m × 140 m plot with diameter at least 2.5 cm in the 1974 census, making no further use of the trees' species, diameters or conditions; see § 6 for possible extensions.

5.2. Example: the continuous case

We continue now with the model introduced as an illustration in § 3.2, with independent prior distributions, normal for $\theta_1$ and $\theta_2$ and log-normal for $\theta_3$, chosen to give a prior mean density of about 60 trees per hectare, which is typical for this area, a prior expected overdispersion of 50% and a prior expected interaction distance of 5 m. Thus for $\theta \in \Theta = \mathbb{R}^2 \times \mathbb{R}_+$,

$$\pi(d\theta) = (2\pi)^{-3/2}\,\theta_3^{-1}\exp\{-(\theta_1 + 4.4)^2/2 - (\theta_2 - 8.5)^2/2 - \log(\theta_3/5.0)^2/2\}\,d\theta_1\,d\theta_2\,d\theta_3.$$

Fig. 1. Hickory tree data: locations on the 140 metre × 140 metre plot. Image (a) and perspective (b) plots of posterior mean tree density $\Lambda(x)$.

Figure 1 shows the posterior mean Poisson intensity $E\{\Lambda(x)\}$, as an image plot in Fig. 1(a) with darker regions indicating higher intensity and dots indicating the trees, and as a perspective plot in Fig. 1(b). Twenty-five thousand steps of the computational scheme of § 4.3 appear more than adequate for convergence, since time series plots of all the features we monitored had stabilised by 5000 steps; we used symmetric Gaussian random walks for both Metropolis steps 3(a) and 1(b), with step sizes chosen to ensure about 40% acceptance of the proposed steps.

Figure 2 shows the prior density, as a curve, and estimated posterior density, as a histogram, for the overall tree density $e^{\theta_1+\theta_2}$, in trees per hectare, the overdispersion $10^{-4}e^{\theta_2}$, as a percentage, and the interaction distance $\theta_3$, in metres, with the posterior means 76.2 trees/ha, 178.5% and 4.6 m indicated by vertical lines. This analysis suggests that there is substantially more overdispersion than anticipated and spatial interaction with positive association in tree density at distances of up to about 5 m.

Fig. 2. Hickory tree data. Prior density functions and posterior histograms for (a) overall density (trees/hectare), (b) overdispersion (%), and (c) interaction distance (metres). Vertical lines indicate posterior means.

5.3. Example: the discrete case

When both $\mathcal{X} = \{x_i\}_{i\in I}$ and $\mathcal{S} = \{s_j\}_{j\in J}$ are finite or countable, the models can be expressed in terms of vectors and matrices $N_i = N(\{x_i\})$, $w_i = w(\{x_i\})$, $\Lambda_i = \Lambda(x_i)$, $\alpha_j^\theta = \alpha^\theta(\{s_j\})$, $\beta_j^\theta = \beta^\theta(s_j)$, $\Gamma_j = \Gamma(\{s_j\})$, $Z_{ij} = Z(\{x_i\} \times \{s_j\})$ and $k_{ij}^\theta = k^\theta(x_i, s_j)$ in the simpler form

$$\theta \sim \pi(d\theta), \qquad \Gamma_j \mid \theta \sim \mathrm{Ga}\{\alpha_j^\theta, (\beta_j^\theta)^{-1}\}, \qquad \Lambda_i = \sum_{j\in J} k_{ij}^\theta\,\Gamma_j, \qquad N_i \mid \theta, \Gamma \sim \mathrm{Po}(\Lambda_i w_i).$$


Again data augmentation, now from independent multinomial distributions

$$\{Z_{i\cdot}\}_{i\in I} \mid \theta, \Gamma, N \sim \mathrm{MN}(N_i, p_{i\cdot}), \qquad p_{ij} = k_{ij}^\theta\,\Gamma_j/\Lambda_i,$$

makes the model conjugate, but now sampling from the gamma distribution is routine and does not require the Inverse Lévy Measure algorithm to sample from a gamma random field as in Theorem 1; see Remark 1.
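In the discrete setting the whole Gibbs sweep reduces to a multinomial split of each count followed by conjugate gamma draws, as implied by Theorem 2; the following sketch (our illustration under those assumptions, not the authors' code) makes this explicit.

```python
def discrete_gibbs_sweep(N_counts, w, K, alpha, beta, Gamma, rng=None):
    """One Gibbs sweep for the discrete model of Section 5.3.
    N_counts[i] : observed count in cell i, taken as Po(Lambda_i * w_i)
    w[i]        : reference mass w({x_i});   K[i, j] = k_ij
    alpha[j], beta[j] : gamma shape and inverse scale;   Gamma[j] : current field values"""
    rng = np.random.default_rng(rng)
    Lambda = K @ Gamma                          # Lambda_i = sum_j k_ij Gamma_j
    Z = np.zeros_like(K)
    for i, n_i in enumerate(N_counts):          # multinomial augmentation, p_ij = k_ij Gamma_j / Lambda_i
        if n_i:
            p = K[i] * Gamma
            Z[i] = rng.multinomial(n_i, p / p.sum())
    shape = alpha + Z.sum(axis=0)               # alpha_j + sum_i Z_ij
    rate = beta + K.T @ w                       # beta_j + sum_i k_ij w_i
    return rng.gamma(shape, 1.0 / rate), Z      # conjugate gamma update of Gamma
```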

We now study the same data, but this time we divide the 120 m × 120 m area into a 6 × 6 lattice of 20 m × 20 m quadrats and aggregate the counts $N_i$ of hickory trees in each quadrat. The empirical density plot Fig. 3(a) indicates these counts numerically and through shading.

Fig. 3. Hickory tree data. Empirical (a) and posterior (b) image plots of intensity $\Lambda(x)$ on 6 × 6 quadrats in the 120 metre × 120 metre area.

Let $I$ be a set indexing the 36 quadrats $\{x_i\}$ and view $\mathcal{X} = \{x_i\}_{i\in I}$ as a nearest neighbour adjacency graph; set $J_1 = I$ and let $J_2$ be the set of unordered pairs $(i, i')$ indexing the 60


edges or nearest neighbour pairs $\{x_i, x_{i'}\}$ that comprise the dual graph $\mathcal{X}'$; and let $\Gamma$ be a 96-dimensional gamma-distributed random vector indexed by $J = (J_1 \cup J_2)$. We model the Poisson intensity $\Lambda_i$ at $x_i$ as the sum of a part associated with only that quadrat, $k_{ii}^\theta\Gamma_i = \theta_1\Gamma_i$, and up to four edge-terms $k_{ij}^\theta\Gamma_j = \theta_2\Gamma_j$, $j = (i, i') \in J_2$, shared with nearest neighbours. Figure 3(b) shows the estimated posterior mean intensity $E(\Lambda_i)$ for each quadrat after 10 000 Markov chain Monte Carlo steps, both numerically and by shading, based on independent standard log-normal prior distributions for $\theta_1$ and $\theta_2$. The spatial pattern apparent in the empirical density plot Fig. 3(a) remains, but is smoothed considerably in the posterior plot Fig. 3(b). Related discrete models with more conventional nearest neighbour and distance-based kernels are studied in more detail at a variety of resolutions in Ickstadt & Wolpert (1997).

6. DISCUSSION

We now return to the questions posed at the beginning of § 5.1. In the Example of § 5.2 we see substantially more overdispersion than anticipated, in that the posterior probability of at least 100% overdispersion is 0.929, while the log-normal prior accords only probability 0.239 to the event; evidently the hickory trees are not spread uniformly throughout $\mathcal{X}$. The interaction distance $\theta_3$ has posterior mean 4.59 m, with 90% of the mass in the range 3.24 m $< \theta_3 <$ 6.23 m, consonant with the prior distribution, with median 5.0 m, but much more concentrated in that the prior probability of this interval is only 0.255; in particular, the posterior probability of interaction at distances exceeding 3 m (respectively 2 m) is 0.976 (respectively > 0.999), offering overwhelming evidence of spatial interaction.

The models can be extended in a variety of ways. The kernel $k^\theta(x, s)$ and hence the Poisson intensity $\Lambda(x)$ may be modelled as uncertain functions of observed collateral information $C(x)$, such as location, soil type, elevation or aspect at sites $x \in \mathcal{X}$, to avoid mistaking covariate dependence or a spatial trend for spatial dependency; one convenient choice is the log-linear kernel $k^\theta(x, s) = e^{C(x)\theta_C}k_0^\theta(x, s)$, with baseline kernel $k_0^\theta(x, s)$ (see the sketch below). Individual-specific covariates, such as the species, diameter and condition of each tree, may be included by treating the observations as a marked point process, a random measure on the Cartesian product $(\mathcal{X} \times \mathcal{A})$ of the observation space $\mathcal{X}$ with some space $\mathcal{A}$ of possible attributes, or 'marks', so that the intensities and relative abundances for the several tree species or hickory subspecies can be considered simultaneously in a spatial study of biodiversity. Temporal trends may be studied by treating the parameter $\theta = \theta(t) \in \Theta$ as a time series or stochastic process, to permit the study of evolving spatial trends; the surveys from 1952, 1974, 1982, 1989 and 1993 might be used together to make inference about the time-dependence of $\theta(t)$ and so reveal trends and support predictions. Some of these possibilities are pursued by Ickstadt & Wolpert (1997) in a discrete setting.
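The log-linear covariate kernel mentioned above is a one-line modification of any baseline kernel; a hypothetical sketch, with `C` and `k0` supplied by the user:

```python
def loglinear_kernel(theta_C, C, k0):
    """Covariate-modulated kernel k(x, s) = exp{C(x) theta_C} k0(x, s) from the Discussion;
    C(x) is an observed covariate (scalar or vector) and k0 a baseline kernel."""
    return lambda x, s: np.exp(np.dot(C(x), theta_C)) * k0(x, s)
```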

The latent random field $\Gamma(ds)$ may itself be of independent interest. A mechanistic model, in which $\Gamma$ represents the unobserved distribution of soil nutrients which diffuse over time to sites $x \in \mathcal{X}$, would be quite similar to the present model, in that the Gaussian kernel $k^\theta(x, s)$ is also the Fickian diffusion kernel (Crank, 1975, p. 28). This suggests that a more elaborate analysis might include eliciting features of nonconstant densities $\alpha^\theta(s)$ and scales $\beta^\theta(s)^{-1}$, reflecting the anticipated distribution of nutrients, minerals, water and other growth requirements.

The doubly stochastic Bayesian hierarchical models introduced here are applicable in a wide range of problems, both continuous and discrete, in any number of dimensions, that feature correlated count data: disease mapping, where the counts are cases,


bioabundance, where they are individuals, network analysis, where they are packets passing a node, and survival analysis, where they are failure times and $\mathcal{X} = \mathbb{R}_+$ represents time. In ecological and epidemiological applications it is common to study count data aggregated within political units while environmental covariates are reported at different levels of aggregation; the existence of a continuous random field model underlying the discrete version of the present models enables one to accommodate data and covariates reported at varying, and sometimes incommensurate, spatial levels without any further aggregation.

The Inverse Lévy Measure algorithm and sampling method of Theorem 1, with its completely arbitrary location-sampling distribution $\Pi(ds)$ and its easy extensibility to arbitrary spatial Lévy processes, and the hybrid Metropolis/Gibbs approach, with a completely arbitrary prior distribution $\pi(d\theta)$, free the modeller from the need to choose conventional or merely convenient prior distributions and intensity measures. Software and data are available on request from the authors.

ACKNOWLEDGEMENT

The authors would like to thank Norm Christensen and Robert Peet for sharing data and insight, Jürgen Lehn for support and encouragement and Merlise Clyde, Noel Cressie, Michael Lavine and Philip Protter for helpful conversations. We are grateful to the Editor, Associate Editor and two referees for their careful reading and valuable suggestions. This work was supported by the United States Environmental Protection Agency, United States National Science Foundation and the Deutsche Forschungsgemeinschaft and was performed while the second author was visiting Duke University.

APPENDIX

Proofs

Proof of Theorem 1. First consider simple functions $\phi(s)$, $\alpha(s)$ and $\beta(s)$, all constant on elements of a fixed partition $\{B_j\}_{j\in J}$ of $\mathcal{S}$ with values $\phi_j \ge 0$, $\alpha_j \ge 0$ and $\beta_j > 0$, respectively. For each $j \in J$ set $p_j = \Pi(B_j)$ and $P_t^j = \sum_{m : p_j\tau_m \le t} 1_{B_j}(\sigma_m)$. Then $\{P_t^j\}_{j\in J}$ are independent standard Poisson processes, each a time-changed Bernoulli-thinned version of $P_t = \sum_{m : \tau_m \le t} 1$, with jump times $\tau_m^j$, and

$$\Gamma(B_j) = \sum_{m : \sigma_m \in B_j} v_m = \sum_{m<\infty} E_1^{-1}\{\tau_m^j/(p_j\alpha_j)\}\,\beta_j^{-1}$$

are independent gamma-distributed $\mathrm{Ga}(p_j\alpha_j, \beta_j^{-1})$ random variables, so

$$-\log E(e^{-\Gamma[\phi]}) = -\log\prod_{j\in J}\Bigl(1 + \frac{\phi_j}{\beta_j}\Bigr)^{-p_j\alpha_j} = \int_{\mathcal{S}}\log\Bigl\{1 + \frac{\phi(s)}{\beta(s)}\Bigr\}\,\alpha(ds)$$

is the Laplace exponent for the gamma process. A limiting argument completes the proof for any nonnegative measurable $\phi(s)$ integrable with respect to $\beta(s)^{-1}\alpha(ds)$, while $\Gamma[\phi] = \infty$ almost surely, and $E(e^{-\Gamma[\phi]}) = 0$, for any $\phi$ not so integrable. □

Proof of Corollary 1. For $\phi(s) \ge 0$ the conditional distribution, given $\Gamma_M$, of the necessarily positive truncation error $\Gamma[\phi] - \Gamma_M[\phi] = \sum_{m>M} v_m\phi(\sigma_m)$ depends only on $\tau_M$, with conditional expectation

$$E\{\Gamma[\phi] - \Gamma_M[\phi] \mid \tau_M\} = \int_{\mathcal{S}}\phi(s)\,\beta(s)^{-1}\bigl(1 - \exp\bigl[-E_1^{-1}\{\tau_M/\alpha(s)\}\bigr]\bigr)\,\alpha(ds),$$

which tends to zero as $\tau_M \to \infty$; a quantitative bound follows from the inequality $e^xE_1(x) < \log(1 + 1/x)$ of Abramowitz & Stegun (1964, § 5.1.20). □

Proof of Theorem 2. Conditional on $\theta$ and $\Gamma$, $Z(dx\,ds)$ is a Poisson measure on $\mathcal{X} \times \mathcal{S}$ with intensity measure $k^\theta(dx, s)\,\Gamma(ds)$; thus $(X_n, S_n) \mid \theta, \Gamma \sim k^\theta(dx_n, s_n)\,\Gamma(ds_n)/\Lambda(\mathcal{X})$. First consider simple functions $\beta^\theta(s)$ and $k^\theta(dx, s)$, constant on a fixed partition $\{B_j\}_{j\in J}$ of $\mathcal{S}$; fix any $s_j \in B_j$, and set $\alpha_j^\theta = \alpha^\theta(B_j)$, $\beta_j^\theta = \beta^\theta(s_j)$, $\Gamma_j = \Gamma(B_j)$, with realisations $\gamma_j$, and $z_j = Z_2(B_j)$. Let $\phi(s)$ be nonnegative and simple as well, with constant value $\phi_j$ on each $B_j$; note that $\Lambda(\mathcal{X}) = \sum_{j\in J}\bar k^\theta(\mathcal{X}, s_j)\Gamma_j$ and $\Gamma[\phi] = \sum_{j\in J}\phi_j\Gamma_j$. The joint distribution is then proportional to

$$p(\theta, \Gamma, Z) \propto \pi(d\theta)\prod_{j\in J}\frac{(\beta_j^\theta)^{\alpha_j^\theta}}{\Gamma(\alpha_j^\theta)}\,\gamma_j^{\alpha_j^\theta-1}e^{-\beta_j^\theta\gamma_j}\,d\gamma_j \times e^{-\Lambda(\mathcal{X})}\prod_{n\le N_{\mathcal{X}}}\Bigl\{k^\theta(x_n, s_n)\,w(dx_n)\sum_{j\in J}1_{B_j}(s_n)\,\gamma_j\,\Pi(ds_n \mid s_n \in B_j)\Bigr\}, \tag{A.1}$$

so that

$$E^\theta(e^{-\Gamma[\phi]} \mid \theta, Z) \propto \prod_{j\in J}\int_0^\infty e^{-\phi_j\gamma_j}\,\gamma_j^{\alpha_j^\theta+z_j-1}\,e^{-\gamma_j\beta_j^\theta}\,e^{-\bar k^\theta(\mathcal{X}, s_j)\gamma_j}\,d\gamma_j \propto \prod_{j\in J}\{\phi_j + \beta_j^\theta + \bar k^\theta(\mathcal{X}, s_j)\}^{-\alpha_j^\theta-z_j}.$$

The proportionality constant is determined by the relation $e^{-\mathcal{L}(0)} = 1$, giving

$$\mathcal{L}(\phi) = \int_{\mathcal{S}}\log\bigl[1 + \phi(s)/\{\beta^\theta(s) + \bar k^\theta(\mathcal{X}, s)\}\bigr]\,(\alpha^\theta + Z_2)(ds),$$

the Laplace exponent given in (2.1) for the gamma process claimed in (4.1). A routine passage to the limit, refining the partition $\{B_j\}$, completes the proof. □

Proof of Theorem 4. Integrate (A.1) with respect to $\{\gamma_j\}_{j\in J}$ to find

$$p(\theta, Z) \propto \pi(d\theta)\prod_{j\in J}\Bigl[\{1 + \bar k^\theta(\mathcal{X}, s_j)/\beta_j^\theta\}^{-\alpha_j^\theta}\{\beta_j^\theta + \bar k^\theta(\mathcal{X}, s_j)\}^{-z_j}\Bigr]\prod_{j\in J}\frac{\Gamma(\alpha_j^\theta + z_j)}{\Gamma(\alpha_j^\theta)}\prod_{n\le N_{\mathcal{X}}}k^\theta(x_n, s_n)\,w(dx_n)\,\Pi(ds_n \mid s_n \in B_{j(n)}).$$

As the partition $\{B_j\}$ is refined, the first product converges to

$$\exp\Bigl[-\int_{\mathcal{S}}\log\{1 + \bar k^\theta(\mathcal{X}, s)/\beta^\theta(s)\}\,\alpha^\theta(ds)\Bigr]\prod_{n\le N_{\mathcal{X}}}\beta_*^\theta(s_n)^{-1},$$

and the second, by nonatomicity, to $\mathrm{const} \times \prod_{n\in N_0}\{\alpha^\theta(s_n)\,\Pi(ds_n)\}$, where the product extends only over a set $N_0$ indexing the distinct augmentation points $s_n$. After terms are combined, the joint density with respect to the product measure $d\theta\prod_{n\le N_{\mathcal{X}}}w(dx_n)\prod_{n\le N_{\mathcal{X}}}\Pi(ds_n)$ is proportional to

$$\pi(\theta)\exp\Bigl[-\int_{\mathcal{S}}\log\{1 + \bar k^\theta(\mathcal{X}, s)/\beta^\theta(s)\}\,\alpha^\theta(ds)\Bigr]\prod_{n\le N_{\mathcal{X}}}k^\theta(x_n, s_n)\,\beta_*^\theta(s_n)^{-1}\prod_{n\in N_0}\alpha^\theta(s_n).$$

By Theorem 2 and Remark 1 the conditional distribution of $\Gamma(ds)$ may be written in form (2.2) with $\sigma_m = s_m$ and $v_m = (\tau_m - \tau_{m-1})\,\beta_*^\theta(\sigma_m)^{-1}$ for $m \le N_{\mathcal{X}}$, and with $\sigma_m \sim \Pi(ds)$ and

$$v_m = E_1^{-1}\{(\tau_m - \tau_{N_{\mathcal{X}}})/\alpha^\theta(\sigma_m)\}\,\beta_*^\theta(\sigma_m)^{-1}$$

for $m > N_{\mathcal{X}}$, for jumps $\tau_m$ of a standard unit-rate Poisson process; by Corollary 2, the $\theta$-dependent portion of the density function of the first $M$ points $(v_m, \sigma_m)$ is proportional to

$$\prod_{m\le N_{\mathcal{X}}}\beta_*^\theta(\sigma_m)\prod_{N_{\mathcal{X}}<m\le M}\alpha^\theta(\sigma_m)\exp\Bigl[-\sum_{m\le M}v_m\,\beta_*^\theta(\sigma_m) - E_1\{v_M\beta_*^\theta(\sigma_M)\}\,\alpha^\theta(\sigma_M)\Bigr].$$

Combining the last two equations completes the proof. □

REFERENCES

ABRAMOWITZ, M. & STEGUN, I. A. (Ed.) (1964). Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, Applied Mathematics Series, 55. Washington, DC: National Bureau of Standards.
ANTONIAK, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2, 1152-74.
BADDELEY, A. J. & VAN LIESHOUT, M. N. M. (1995). Area-interaction point processes. Ann. Inst. Statist. Math. 47, 601-19.
BERNARDINELLI, L., PASCUTTO, C., BEST, N. G. & GILKS, W. R. (1997). Disease mapping with errors in covariates. Statist. Med. 16, 741-52.
BESAG, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with Discussion). J. R. Statist. Soc. B 36, 192-236.
BESAG, J., YORK, J. & MOLLIÉ, A. (1991). Bayesian image restoration, with two applications in spatial statistics (with Discussion). Ann. Inst. Statist. Math. 43, 1-59.
BONDESSON, L. (1982). On simulation from infinitely divisible distributions. Adv. Appl. Prob. 14, 855-69.
BORMANN, F. H. (1953). The statistical efficiency of sample plot size and shape in forest ecology. Ecology 34, 474-87.
BUSH, C. A. & MACEACHERN, S. N. (1996). A semi-parametric Bayesian model for randomised block designs. Biometrika 83, 275-85.
CHRISTENSEN, N. L. (1977). Changes in structure, pattern and diversity associated with climax forest maturation in Piedmont, North Carolina. Am. Midland Naturalist 97, 176-88.
CLAYTON, D. G. & KALDOR, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 43, 671-81.
COX, D. R. (1955). Some statistical methods connected with series of events (with Discussion). J. R. Statist. Soc. B 17, 129-64.
COX, D. R. & ISHAM, V. (1980). Point Processes. New York: Chapman and Hall.
CRANK, J. (1975). The Mathematics of Diffusion, 2nd ed. Oxford: Oxford University Press.
CRESSIE, N. A. C. (1993). Statistics for Spatial Data. New York: Wiley.
CRESSIE, N. A. C. & CHAN, N. H. (1989). Spatial modeling of regional variables. J. Am. Statist. Assoc. 84, 393-401.
DAMIEN, P., LAUD, P. W. & SMITH, A. F. M. (1995). Approximate random variate generation from infinitely divisible distributions with applications to Bayesian inference. J. R. Statist. Soc. B 57, 547-63.
DIGGLE, P. J. (1983). Statistical Analysis of Spatial Point Patterns. New York: Academic Press.
DYKSTRA, R. L. & LAUD, P. W. (1981). A Bayesian nonparametric approach to reliability. Ann. Statist. 9, 356-67.
FERGUSON, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-30.
FERGUSON, T. S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 615-29.
FERGUSON, T. S. & KLASS, M. J. (1972). A representation of independent increment processes without Gaussian components. Ann. Math. Statist. 43, 1634-43.
GELFAND, A. E. & SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
GILKS, W. R., RICHARDSON, S. & SPIEGELHALTER, D. J. (1996). Markov Chain Monte Carlo in Practice. New York: Chapman and Hall.
HASTINGS, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109.
ICKSTADT, K. & WOLPERT, R. L. (1997). Multiresolution assessment of forest inhomogeneity. In Case Studies in Bayesian Statistics, Volume 3, Lecture Notes in Statistics, 121, Ed. C. Gatsonis, J. S. Hodges, R. E. Kass, R. McCulloch, P. Rossi and N. D. Singpurwalla, pp. 371-86. New York: Springer-Verlag.
JACOD, J. & SHIRYAEV, A. N. (1987). Limit Theorems for Stochastic Processes, Grundlehren der mathematischen Wissenschaften, 288. Berlin: Springer-Verlag.
KARR, A. F. (1991). Point Processes and their Statistical Inference, 2nd ed., Probability, Pure and Applied, 7. New York: Marcel Dekker.
KELLY, F. P. & RIPLEY, B. D. (1976). A note on Strauss's model for clustering. Biometrika 63, 357-60.
LAUD, P. W., SMITH, A. F. M. & DAMIEN, P. (1996). Monte Carlo methods for approximating a posterior hazard rate process. Statist. Comp. 6, 77-83.
LÉVY, P. (1937). Théorie de l'Addition des Variables Aléatoires. Paris: Gauthier-Villars.
MACEACHERN, S. N. (1994). Estimating normal means with a conjugate style Dirichlet process prior. Commun. Statist. B 23, 727-41.
MØLLER, J., SYVERSVEEN, A. R. & WAAGEPETERSEN, R. P. (1998). Log Gaussian Cox processes. Scand. J. Statist. 25. To appear.
NEYMAN, J. & SCOTT, E. L. (1958). Statistical approach to problems of cosmology (with Discussion). J. R. Statist. Soc. B 20, 1-43.
RIPLEY, B. D. (1977). Modelling spatial patterns (with Discussion). J. R. Statist. Soc. B 39, 172-212.
RIPLEY, B. D. & KELLY, F. P. (1977). Markov point processes. J. Lond. Math. Soc. 15, 188-92.
SILVERMAN, B. W. (1986). Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability. New York: Chapman and Hall.
STRAUSS, D. J. (1975). A model for clustering. Biometrika 62, 467-76.
TANNER, M. A. & WONG, W. H. (1987). The calculation of posterior distributions by data augmentation. J. Am. Statist. Assoc. 82, 528-50.
TIERNEY, L. (1994). Markov chains for exploring posterior distributions (with Discussion). Ann. Statist. 22, 1701-62.

[Received January 1996. Revised September 1997]