
Spatial Modelling of Count Data: A Case Study in Modelling Breeding Bird Survey Data on Large Spatial Domains

Christopher K. Wikle, University of Missouri-Columbia


0.1 Introduction

The North American Breeding Bird Survey (BBS) is conducted each breeding season by volunteer observers (e.g., Robbins et al. 1986). The observers count the number of various species of birds along specified routes. The collected data are used for several purposes, including the study of the range of bird species, and the variation of the range and abundance over time (e.g., Link and Sauer, 1998). Such studies usually require spatial maps of relative abundance. Traditional methods for producing such maps are somewhat ad hoc (e.g., inverse distance methods) and do not always account for the special discrete, positive nature of the count data (e.g., Sauer et al. 1995). In addition, corresponding prediction uncertainties for maps produced in this fashion are not typically available. Providing such uncertainties is critical as the prediction maps are often used as “data” in other studies and for the design of auxiliary sampling plans.

We consider the BBS modeling problem from a hierarchical perspective, modeling the count data as Poisson, conditional on a spatially-varying intensity process. The intensities are then assumed to follow a log-normal distribution with fixed effects and with spatial and non-spatial random effects. Model-based geostatistical methods for generalized linear mixed models (GLMMs) of this type have been available since the seminal work of Diggle et al. (1998). However, implementation is problematic when there are large data sets and prediction is desired over large domains. We show that by utilizing spectral representations of the spatial random effects process, Bayesian spatial prediction can easily be carried out on very large data sets over extensive prediction domains.

The BBS sampling unit is a roadside route 39.2 km in length. Over each route, an observer makes 50 stops, at which birds are counted by sight and sound for a period of 3 minutes. Over 4000 routes have been included in the North American survey, but not all routes are available each year. As might be expected given the subjectivity involved in counting birds by sight and sound, and the varying experience and expertise of the volunteer observers, there is substantial observer error in the BBS (e.g., Sauer et al. 1994).

In this study, we are concerned with the relative abundance of the House Finch (Carpodacus mexicanus). Figure 0.1 shows the location of the sampling route midpoints and observed counts over the continental United States (U.S.) for the 1999 House Finch BBS. The size of the circle radius is proportional to the number of birds observed at each site. This figure suggests that the House Finch is more prevalent in the Eastern and Western U.S. than in the interior. Indeed, this species is native to the Western U.S. and Mexico. The Eastern population is a result of a 1940 release of caged birds in New York. The birds were being sold illegally in New York City as “Hollywood Finches” and were supposedly released by dealers in an attempt to avoid prosecution. Within three years there were reports of the birds breeding in the New York area. Because the birds are prolific breeders and their juveniles disperse over long distances, the House Finch quickly expanded to the west (Elliott and Arbib, 1953). Simultaneously, as the human population on the west coast expanded eastward (and correspondingly, changed the environment) the House Finch expanded eastward as well. By the late 1990s, the two populations met in the Central Plains of North America.

Figure 0.1 Observation locations for 1999 BBS of House Finch (Carpodacus mexicanus). Radius and color are proportional to the observed counts.

From Figure 0.1 it is clear that there are many regions of the U.S. that were not sampled in the 1999 House Finch BBS. Our interest here is to predict abundance over a relatively dense network of spatial locations, every quarter degree of latitude and longitude. The network of prediction grid locations includes 228 points in the longitudinal and 84 in the latitudinal direction, for a total of 19,152 prediction grid locations.

0.2 The Poisson Random Effects Model

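Consider the model for the count process $y(\mathbf{x})$ given a spatially varying mean process $\lambda(\mathbf{x})$:

$$y(\mathbf{x}) \mid \lambda(\mathbf{x}) \sim \text{Poisson}(\lambda(\mathbf{x})). \tag{0.1}$$

The log of the spatial mean process is given by:

$$\log(\lambda(\mathbf{x})) = \mu + z(\mathbf{x}) + \eta(\mathbf{x}), \tag{0.2}$$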

where $\mu$ is a deterministic mean component, $z(\mathbf{x})$ is a spatially correlated random component, and $\eta(\mathbf{x})$ is an uncorrelated spatial random component. In general, the fixed component $\mu$ might be related to spatially-varying covariates (such as habitat) and could include “regression” terms. We will consider the simple constant mean formulation in this application.


The correlated process, $z(\mathbf{x})$, is necessary in this application because we have substantial prior belief that the counts at “nearby” routes are correlated. From a scientific point of view, this is likely due (at least in part) to the fact that the birds are attracted to specific habitats, and we know that habitat is correlated in space. Typically, one can view the $z$-process as accounting for the effects of “unknown” covariates, since it induces spatial structure in the $\lambda$-process, and thus the observed counts. In that sense, maps of the $z$-process may be interesting and lead to greater understanding as to the preferred habitat of the modeled bird species (e.g., Royle et al., 2001). The random component $\eta(\mathbf{x})$ accounts for observer effects. A major concern in the analysis of BBS data is the known observer bias, as discussed previously. Typically, because the observers produce counts on different routes, we can assume that the observer effects are independent in space.

The above discussion suggests that we might model $z(\mathbf{x})$ as a Gaussian random field with zero mean and covariance given by $c_\theta(\mathbf{x}, \mathbf{x}')$, where $\theta$ represents parameters (possibly vector-valued) of the covariance function $c$. In addition, we assume $\eta(\mathbf{x}) \sim N(0, \sigma_\eta^2)$, where $\text{cov}(\eta(\mathbf{x}), \eta(\mathbf{x}')) = 0$ if $\mathbf{x} \neq \mathbf{x}'$.
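To make the model in (0.1) and (0.2) concrete, the following is a minimal simulation sketch rather than the implementation used in this study. It assumes NumPy, an exponential correlation function with unit variance for the z-process, and a dense Cholesky factorization, which is only practical for a small number of locations. The function name `simulate_counts`, the random locations, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_counts(coords, mu, theta1, sigma2_eta, rng):
    """Draw one realization of the Poisson random effects model (0.1)-(0.2)."""
    m = len(coords)
    # Pairwise Euclidean distances between the m locations (coords is m x 2).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed exponential correlation for z(x), with unit variance.
    corr = np.exp(-theta1 * dist)
    # A tiny diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(m))
    z = chol @ rng.standard_normal(m)                  # spatially correlated effect
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), m)      # uncorrelated effect
    lam = np.exp(mu + z + eta)                         # equation (0.2)
    y = rng.poisson(lam)                               # equation (0.1)
    return y, lam, z

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(50, 2))           # hypothetical route midpoints
y, lam, z = simulate_counts(coords, mu=0.7, theta1=15.0, sigma2_eta=0.8, rng=rng)
```

The dense factorization costs on the order of m cubed operations, which is the scaling problem that motivates the spectral formulation in the next section.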

As presented, the Poisson spatial model follows the framework for generalized geostatistical prediction formulated in Diggle et al. (1998). An example of this approach applied to the BBS problem can be found in Royle et al. (2001). However, implementation in that case was concerned with relatively small data sets and limited geographical regions. The Gaussian random field-based Bayesian hierarchical approach becomes increasingly difficult to implement as the dimensionality of the data and the number of prediction locations increase. Consequently, such an approach is not feasible at the continental scale and high resolution that we require in the present application. However, as outlined in Royle and Wikle (2001), one can still use the Bayesian GLMM methodology in these high-dimensional settings if one makes use of spectral representations. This approach is summarized in the next section.

0.2.1 Spectral Formulation

Let $\{\mathbf{x}_i\}_{i=1}^{m}$ be the set of data locations, at which counts $y(\mathbf{x}_i)$ were observed. Further, let $\{\mathbf{x}_j\}_{j=1}^{n}$ be the set of prediction locations, which may, but need not, include some or all of the $m$ data locations. We now rewrite the mean-process model (0.2):

$$\log(\lambda(\mathbf{x}_i)) = \mu + \mathbf{k}_i'\mathbf{z}_n + \eta(\mathbf{x}_i), \tag{0.3}$$

where $\mathbf{z}_n$ is an $n \times 1$ vector representation of the $z$-process at the prediction locations, and the vector $\mathbf{k}_i$ relates the log-mean process at observation location $\mathbf{x}_i$ to one or more elements of the $z$-process at prediction locations (e.g., Wikle et al. 1998; Wikle et al. 2001). We then assume:

$$\mathbf{z}_n = \boldsymbol{\Psi}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}, \tag{0.4}$$

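where $\boldsymbol{\Psi}$ is an $n \times p$ matrix, fixed and known, $\boldsymbol{\alpha}$ is a $p \times 1$ vector of coefficients with $\boldsymbol{\alpha} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_\alpha)$, and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_\varepsilon^2\mathbf{I})$. We let $\boldsymbol{\Psi}$ consist of spectral basis functions $[\psi_{j,k}]_{j=1,k=1}^{n,p}$ that are orthogonal. That is, if $\boldsymbol{\psi}_k \equiv [\psi_{1,k}, \ldots, \psi_{n,k}]'$ then $\boldsymbol{\psi}_k'\boldsymbol{\psi}_j = 0$ if $k \neq j$ and 1, otherwise. In this case, we say that $\boldsymbol{\alpha}$ are spectral coefficients. From a hierarchical perspective, we can write:

$$\mathbf{z}_n \mid \boldsymbol{\alpha}, \sigma_\varepsilon^2 \sim N(\boldsymbol{\Psi}\boldsymbol{\alpha}, \sigma_\varepsilon^2\mathbf{I}) \tag{0.5}$$

and

$$\boldsymbol{\alpha} \mid \boldsymbol{\Sigma}_\alpha \sim N(\mathbf{0}, \boldsymbol{\Sigma}_\alpha). \tag{0.6}$$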

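In general, the covariance function for the $\alpha$-process depends on some parameters $\theta$; we denote this covariance by $\boldsymbol{\Sigma}_\alpha(\theta)$. The modeling motivation for the hierarchy is apparent if we note that the random $z$-process can be written $\mathbf{z}_n \sim N(\mathbf{0}, \boldsymbol{\Sigma}_z(\theta) + \sigma_\varepsilon^2\mathbf{I})$, where $\sigma_\varepsilon^2$ accounts for the “nugget effect” due to small scale variability. Given (0.4), the covariance function for the $z$-process can be written $\boldsymbol{\Sigma}_z(\theta) = \boldsymbol{\Psi}\boldsymbol{\Sigma}_\alpha(\theta)\boldsymbol{\Psi}'$.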
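As a short worked step, the marginal covariance of the z-process follows from (0.4) by taking the variance of both sides, assuming (as the hierarchy in (0.5) and (0.6) implies) that the spectral coefficients and the small-scale error term are independent:

$$\text{Var}(\mathbf{z}_n) = \boldsymbol{\Psi}\,\text{Var}(\boldsymbol{\alpha})\,\boldsymbol{\Psi}' + \text{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi}\boldsymbol{\Sigma}_\alpha(\theta)\boldsymbol{\Psi}' + \sigma_\varepsilon^2\mathbf{I} = \boldsymbol{\Sigma}_z(\theta) + \sigma_\varepsilon^2\mathbf{I}.$$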

In principle, any set of orthogonal spectral basis functions could be used for $\boldsymbol{\Psi}$. For example, one could use the leading variance modes of the covariance matrix $\boldsymbol{\Sigma}_z$. Such modes are just the eigenvectors that diagonalize the spatial covariance matrix and thus are just principal components. These spatial principal components are known as Empirical Orthogonal Functions (EOFs) in the geostatistical literature (e.g., Obled and Creutin, 1986; Wikle and Cressie, 1999). Such a formulation is advantageous because it allows for non-stationary spatial correlation and dimension reduction ($p \ll n$).

Another possibility would be to use Fourier basis functions in $\boldsymbol{\Psi}$. This could apply if the prediction locations were defined in continuous space or on a grid. However, as we will demonstrate, if we choose a grid implementation, one need not actually form the matrix $\boldsymbol{\Psi}$, which would be problematic for grid sizes of order $10^5$ as we consider here. That is, the operation $\boldsymbol{\Psi}\boldsymbol{\alpha}$ is actually an inverse Fourier transform operation on the vector $\boldsymbol{\alpha}$. On a discrete lattice, one can use Fast Fourier Transform (FFT) procedures to efficiently implement this transform without having to make or store the matrix of basis functions. In this case, $p = n$. If the $z$-process is stationary, the use of Fourier basis functions suggests that the matrix $\boldsymbol{\Sigma}_\alpha(\theta)$ is diagonal (asymptotically).

For situations where it is more appropriate to assume that the process is nonstationary and the prediction locations can be thought of as a discrete grid, one could consider a wavelet basis function for $\boldsymbol{\Psi}$. In this case, the operation $\boldsymbol{\Psi}\boldsymbol{\alpha}$ is just an inverse discrete wavelet transform of $\boldsymbol{\alpha}$; again, $\boldsymbol{\Psi}$ need not be constructed directly. Depending on the class of wavelets chosen, the matrix $\boldsymbol{\Sigma}_\alpha(\theta)$ may be diagonal (asymptotically) or nearly so.
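The claim that the basis matrix never has to be formed can be checked numerically on a small example. The sketch below assumes NumPy and a one-dimensional complex exponential basis with unitary scaling; these are illustrative choices, not details given in the text. On a two-dimensional lattice the analogous calls would be `np.fft.ifft2` and `np.fft.fft2`, and a real-valued field would require conjugate-symmetric coefficients or paired sine and cosine terms.

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
# Hypothetical spectral coefficients (complex, for the complex exponential basis).
alpha = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Explicit unitary inverse-DFT matrix: Psi[j, k] = exp(2*pi*i*j*k/n) / sqrt(n).
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
Psi = np.exp(2j * np.pi * j * k / n) / np.sqrt(n)

# Psi @ alpha computed two ways: with the stored matrix and with the inverse FFT.
z_matrix = Psi @ alpha
z_fft = np.fft.ifft(alpha, norm="ortho")

# Orthogonality check (conjugate transpose of Psi times Psi is the identity),
# and the forward FFT as the implicit product of that conjugate transpose with z.
gram = Psi.conj().T @ Psi
alpha_back = np.fft.fft(z_fft, norm="ortho")
```

Storing the matrix needs on the order of n squared entries, whereas the FFT route needs only the length-n vector, which is what makes grids with tens of thousands of cells workable.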


In the hierarchical implementation, the parameterization of $\boldsymbol{\Sigma}_\alpha(\theta)$ is especially critical. For example, with wavelet basis functions, we might assume a fractional scaling behavior in the variance of the different wavelet scales. This is particularly useful when the process is known to exhibit such behavior, such as turbulence examples in atmospheric science (e.g., Wikle et al. 2001). Alternatively, we might assume a common stationary class for the $z$-process, such as the Matérn class of covariance functions,

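$$c(d_{ij}) = \phi\,(\theta_1 d_{ij})^{\theta_2} K_{\theta_2}(\theta_1 d_{ij}), \quad \phi > 0,\ \theta_1 > 0,\ \theta_2 > 0, \tag{0.7}$$

where $d_{ij}$ is the distance between two spatial locations, $K_{\theta_2}$ is the modified Bessel function, $\theta_2$ is related to the degree of smoothness of the spatial process, $\theta_1$ is related to the correlation range, and $\phi$ is proportional to the variance of the process (e.g., Stein 1999, p. 48). The corresponding spatial spectral density function at frequency $\omega$ is,

$$f(\omega; \theta_1, \theta_2, \phi, g) = \frac{2^{\theta_2 - 1}\,\phi\,\Gamma(\theta_2 + g/2)\,\theta_1^{2\theta_2}}{\pi^{g/2}\,(\theta_1^2 + \omega^2)^{\theta_2 + g/2}}, \tag{0.8}$$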

where $g$ is the dimension of the spatial process (e.g., Stein 1999, p. 49). Thus, if one chooses Fourier basis functions for $\boldsymbol{\Psi}$ and assumes the Matérn class, then $\boldsymbol{\Sigma}_\alpha(\theta)$ should be diagonal (asymptotically) with diagonal elements corresponding to $f$ given by (0.8). If the parameters $\theta$ and $\phi$ are not known, one must specify prior distributions for them at the next level of the model hierarchy.
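The spectral density in (0.8) is simple to evaluate directly. The function below is a minimal sketch using only the Python standard library; the name `matern_spectral_density`, the default of two spatial dimensions, and the example frequencies are assumptions made for illustration, and the mapping from lattice indices to frequencies is not specified in the text.

```python
import math

def matern_spectral_density(omega, theta1, theta2, phi, g=2):
    """Evaluate the Matern spectral density of equation (0.8).

    omega is the magnitude of the spatial frequency, theta1 the range-related
    parameter, theta2 the smoothness, phi the variance scaling, and g the
    spatial dimension.
    """
    numerator = (2.0 ** (theta2 - 1.0) * phi
                 * math.gamma(theta2 + g / 2.0) * theta1 ** (2.0 * theta2))
    denominator = (math.pi ** (g / 2.0)
                   * (theta1 ** 2 + omega ** 2) ** (theta2 + g / 2.0))
    return numerator / denominator

# Exponential case (theta2 = 1/2) at a few hypothetical frequency magnitudes.
frequencies = [0.0, 0.5, 1.0, 2.0, 4.0]
diag_elements = [matern_spectral_density(w, theta1=1.0, theta2=0.5, phi=1.0)
                 for w in frequencies]
```

Values computed this way would populate the diagonal of the coefficient covariance, decaying with frequency so that small-scale modes receive less prior variance.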

0.2.2 Model Implementation and Prediction

The hierarchical Poisson model with a spectral spatial component is summarized as follows. The joint likelihood for all observations $\mathbf{y}$ (an $m \times 1$ vector) is

$$[\mathbf{y} \mid \boldsymbol{\lambda}] = \prod_{i=1}^{m} \text{Poisson}(\lambda(\mathbf{x}_i)), \tag{0.9}$$

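where $\boldsymbol{\lambda}$ is an $m \times 1$ vector, corresponding to the locations of the vector $\mathbf{y}$. The joint prior distribution for $\log(\lambda(\mathbf{x}_i))$ is:

$$[\log(\boldsymbol{\lambda}) \mid \mu, \gamma, \mathbf{z}_n, \sigma_\eta^2] = N(\mu\mathbf{1} + \gamma\mathbf{K}\mathbf{z}_n, \sigma_\eta^2\mathbf{I}), \tag{0.10}$$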

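where $\mathbf{1}$ is an $m \times 1$ vector of ones, $\log(\boldsymbol{\lambda})$ is the $m \times 1$ vector with elements $\log(\lambda(\mathbf{x}_i))$, $\mathbf{K}$ is an $m \times n$ matrix with rows $\mathbf{k}_i'$, and $\gamma$ is a scaling coefficient (introduced for computational reasons as discussed below). Then, let

$$[\mathbf{z}_n \mid \boldsymbol{\alpha}, \sigma_e^2] = N(\boldsymbol{\Psi}\boldsymbol{\alpha}, \sigma_e^2\mathbf{I}), \tag{0.11}$$

and allow the spectral coefficients to have distribution,

$$[\boldsymbol{\alpha} \mid \mathbf{R}_\alpha(\theta_1)] = N(\mathbf{0}, \mathbf{R}_\alpha(\theta_1)), \tag{0.12}$$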

where $\mathbf{R}_\alpha(\theta_1)$ is a diagonal matrix. For the BBS illustration presented here, we let $\theta_2 = 1/2$ in (0.7) (i.e., we assume the covariance model is exponential) but assume the dependence parameter $\theta_1$ is random. Note that as a consequence of including the $\gamma$ parameter in (0.10) we are able to specify the conditional covariance of $\boldsymbol{\alpha}$ as the diagonalization of a correlation matrix rather than a covariance matrix (see discussion below). Finally, to complete the model hierarchy, we assume the remaining parameters are independent and specify the following prior distributions:

$$\mu \sim N(\mu_0, \sigma_\mu^2), \quad \sigma_\eta^2 \sim IG(q_\eta, r_\eta), \quad \gamma \sim U[0, b], \tag{0.13}$$

$$\sigma_e^2 \sim IG(q_e, r_e), \quad \theta_1 \sim U[u_1, u_2], \tag{0.14}$$

where $IG(\,)$ refers to an inverse gamma distribution, and $U[\,]$ a uniform distribution. For the BBS House Finch data we select $q_\eta = 0.5$, $q_e = 1$, $r_\eta = 2$, $r_e = 10$, $\mu_0 = 0$, $\sigma_\mu^2 = 10$, $b = 100$, $u_1 = 1$, and $u_2 = 100$ (note, our parameterization of the exponential is $r(d) \propto \exp(-\theta_1 d)$, where $d$ is the distance). These hyperparameters correspond to rather vague proper priors.

The alternative to specifying $\gamma$ in (0.10) is to let the conditional covariance of $\boldsymbol{\alpha}$ be $\sigma_\alpha^2\mathbf{R}_\alpha(\theta_1)$. However, as is often the case for Bayesian spatial models that are deep in the model hierarchy (and thus, relatively far from the data), the MCMC implementation has difficulty converging because of the tradeoff between the spatial process variance, $\sigma_\alpha^2$, and the dependence parameter, $\theta_1$. By allowing the $z$-process to have unit variance, as in the above formulation, we need not estimate $\sigma_\alpha^2$ (which is 1 in this case). The variance in the spatial process is then achieved through $\gamma$. In situations where the implied assumption of homogeneous variance is unrealistic, a more complicated reparameterization would be required. Note that the $\gamma$ parameterization also affects the interpretation of the variance of the $z$-process (i.e., $\sigma_e^2 = \sigma_\varepsilon^2/\gamma^2$).

Our goal is the estimation of the joint posterior distribution,

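$$\begin{aligned}
[\log(\boldsymbol{\lambda}), \mathbf{z}_n, \boldsymbol{\alpha}, \theta_1, \gamma, \sigma_\eta^2, \sigma_e^2, \mu \mid \mathbf{y}] \propto{}& [\mathbf{y} \mid \log(\boldsymbol{\lambda})]\,[\log(\boldsymbol{\lambda}) \mid \mu, \gamma, \mathbf{z}_n, \sigma_\eta^2]\,[\mathbf{z}_n \mid \boldsymbol{\alpha}, \sigma_e^2] \\
&\times [\boldsymbol{\alpha} \mid \theta_1]\,[\theta_1]\,[\gamma]\,[\mu]\,[\sigma_\eta^2]\,[\sigma_e^2].
\end{aligned}$$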

Although this distribution cannot be analyzed directly, we are able to use MCMC approaches as suggested by Diggle et al. (1998) to draw samples from this posterior and appropriate marginals. In particular, we utilized a Gibbs sampler with Metropolis-Hastings sampling of $\log(\boldsymbol{\lambda})$ and $\theta_1$ (e.g., see Royle et al. 2001). Perhaps more importantly, we would like estimates from the posterior distribution of $\boldsymbol{\lambda}_n$, the $\lambda$-process at prediction grid locations. The key difficulty in the traditional (non-spectral) geostatistical formulation is the dimensionality of the full-conditional update for the $z$-process given all other parameters. As we show below, this is no longer a serious problem if we make use of the spectral representation.
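To illustrate the Metropolis-Hastings component for a single element of the log-intensity vector, the sketch below combines the Poisson likelihood (0.9) with the normal prior (0.10). The text does not state the proposal distribution that was used, so the Gaussian random-walk proposal, the step size, the function names, and the numerical values here are all assumptions chosen for illustration.

```python
import math
import random

def log_target(ell, y, prior_mean, sigma2_eta):
    """Unnormalized log full conditional of ell = log(lambda(x_i)).

    Poisson log-likelihood term y*ell - exp(ell), plus the normal prior
    with mean prior_mean (mu + gamma * k_i' z_n) and variance sigma2_eta.
    """
    return y * ell - math.exp(ell) - 0.5 * (ell - prior_mean) ** 2 / sigma2_eta

def mh_step(ell, y, prior_mean, sigma2_eta, step_sd, rng):
    """One random-walk Metropolis-Hastings update; returns (value, accepted)."""
    proposal = ell + rng.gauss(0.0, step_sd)
    log_ratio = (log_target(proposal, y, prior_mean, sigma2_eta)
                 - log_target(ell, y, prior_mean, sigma2_eta))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return ell, False

# Hypothetical single-site example: observed count 3, prior mean 0.7.
rng = random.Random(0)
ell, chain, n_accept = 0.0, [], 0
for _ in range(5000):
    ell, accepted = mh_step(ell, y=3, prior_mean=0.7, sigma2_eta=0.8,
                            step_sd=0.5, rng=rng)
    chain.append(ell)
    n_accept += accepted
accept_rate = n_accept / len(chain)
```

Because the elements of the log-intensity vector are conditionally independent given the other parameters, updates of this kind could be applied to all observation locations at once in a vectorized implementation.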


Selected Full-Conditional Distributions

As mentioned above, for the most part the full-conditional distributions follow those outlined generally in Diggle et al. (1998) and specifically, those in Royle et al. (2001). However, the spectral representation allows simpler forms for the $\mathbf{z}_n$ and $\boldsymbol{\alpha}$ full-conditionals.

The full-conditional distribution for $\mathbf{z}_n$ can be shown to be:

$$\mathbf{z}_n \mid \cdot \sim N(\mathbf{S}_z^{-1}\mathbf{a}_z, \mathbf{S}_z^{-1}), \tag{0.15}$$

where $\mathbf{S}_z = \mathbf{I}/\sigma_e^2 + \mathbf{K}'\mathbf{K}\gamma^2/\sigma_\eta^2$ and $\mathbf{a}_z = \boldsymbol{\Psi}\boldsymbol{\alpha}/\sigma_e^2 + \mathbf{K}'(\log(\boldsymbol{\lambda}) - \mu\mathbf{1})\gamma/\sigma_\eta^2$. In our case, $\mathbf{K}$ is an incidence matrix (a matrix of ones and zeros) such that each observation is only associated with one prediction grid location (a reasonable assumption at the resolution presented here). Thus, $\mathbf{K}'\mathbf{K}$ can be shown to be a diagonal matrix with 1's and 0's along the diagonal. Although the matrix $\mathbf{S}_z$ is very high-dimensional (order $10^5 \times 10^5$), it is diagonal and trivial to invert. In addition, $\boldsymbol{\Psi}\boldsymbol{\alpha}$ can be calculated by the inverse FFT function (a fast operation) and $\mathbf{z}_n$ is updated as simple univariate normal distributions. In practice, we update these simultaneously in a matrix language implementation.
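The diagonal structure of (0.15) can be exploited without ever forming the incidence matrix. The following is a sketch assuming NumPy, with the incidence matrix represented by a hypothetical integer array `obs_index` giving the grid cell of each observation. It uses cell counts on the diagonal, which reduces to the ones and zeros described above when no two observations share a cell. The function name and all numerical values are illustrative.

```python
import numpy as np

def sample_z(psi_alpha, log_lam, obs_index, mu, gamma, sigma2_e, sigma2_eta, rng):
    """Draw z_n from its full conditional (0.15) using only elementwise operations."""
    n = psi_alpha.shape[0]
    # Diagonal of K'K: number of observations assigned to each grid cell.
    ktk_diag = np.bincount(obs_index, minlength=n).astype(float)
    # K'(log(lambda) - mu 1): residuals accumulated into their grid cells.
    kt_resid = np.bincount(obs_index, weights=log_lam - mu, minlength=n)
    s_diag = 1.0 / sigma2_e + ktk_diag * gamma ** 2 / sigma2_eta   # diagonal of S_z
    a = psi_alpha / sigma2_e + kt_resid * gamma / sigma2_eta       # vector a_z
    mean = a / s_diag
    sd = np.sqrt(1.0 / s_diag)
    return mean + sd * rng.standard_normal(n), mean, sd

# Small hypothetical example: 12 grid cells, 5 observations.
rng = np.random.default_rng(2)
n, m = 12, 5
obs_index = np.array([0, 3, 3, 7, 11])
psi_alpha = rng.standard_normal(n)   # stand-in for the inverse FFT of alpha
log_lam = rng.standard_normal(m)
z, z_mean, z_sd = sample_z(psi_alpha, log_lam, obs_index, mu=0.7, gamma=1.4,
                           sigma2_e=0.2, sigma2_eta=0.8, rng=rng)
```

Grid cells without observations simply revert to the prior mean and variance implied by the spectral level, which is how predictions are obtained in unsampled regions.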

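Similarly,

$$\boldsymbol{\alpha} \mid \cdot \sim N(\mathbf{S}_\alpha^{-1}\mathbf{a}_\alpha, \mathbf{S}_\alpha^{-1}), \tag{0.16}$$

where $\mathbf{S}_\alpha = (\boldsymbol{\Psi}'\boldsymbol{\Psi}/\sigma_e^2 + \mathbf{R}_\alpha(\theta_1)^{-1})$ and $\mathbf{a}_\alpha = \boldsymbol{\Psi}'\mathbf{z}_n/\sigma_e^2$. At first glance, this appears problematic due to the $\boldsymbol{\Psi}'\boldsymbol{\Psi}$ and $\mathbf{R}_\alpha(\theta_1)^{-1}$ terms in the full-conditional variance. However, since the spectral operators are orthogonal, $\boldsymbol{\Psi}'\boldsymbol{\Psi} = \mathbf{I}$ and the matrix $\mathbf{R}_\alpha(\theta_1)^{-1}$ is diagonal as discussed previously. Furthermore, $\boldsymbol{\Psi}'\mathbf{z}_n$ is just the FFT operation on $\mathbf{z}_n$ and is very fast. Thus,

$$\boldsymbol{\alpha} \mid \cdot \sim N\big((\mathbf{I}/\sigma_e^2 + \mathbf{R}_\alpha(\theta_1)^{-1})^{-1}\boldsymbol{\Psi}'\mathbf{z}_n/\sigma_e^2,\ (\mathbf{I}/\sigma_e^2 + \mathbf{R}_\alpha(\theta_1)^{-1})^{-1}\big) \tag{0.17}$$

and can be sampled as individual univariate normals, or easily in a block update.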
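The block update in (0.17) reduces to elementwise arithmetic once the transformed field is available. The sketch below assumes NumPy and, to keep the check real-valued and small, uses an explicit orthonormal matrix obtained from a QR factorization as a stand-in for the Fourier basis; in a gridded implementation the product of the transposed basis with the field would instead come from an FFT. The function name and values are hypothetical.

```python
import numpy as np

def sample_alpha(psi_t_z, r_diag, sigma2_e, rng):
    """Draw alpha from (0.17) given Psi'z_n and the diagonal of R_alpha(theta1)."""
    precision = 1.0 / sigma2_e + 1.0 / r_diag      # diagonal of S_alpha
    mean = (psi_t_z / sigma2_e) / precision
    sd = np.sqrt(1.0 / precision)
    return mean + sd * rng.standard_normal(mean.shape), mean, sd

rng = np.random.default_rng(3)
n = 6
# Real orthonormal stand-in for the spectral basis (an assumption for this check).
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))
z = rng.standard_normal(n)
r_diag = rng.uniform(0.5, 2.0, size=n)   # hypothetical diagonal of R_alpha(theta1)
sigma2_e = 0.2
alpha, alpha_mean, alpha_sd = sample_alpha(Psi.T @ z, r_diag, sigma2_e, rng)
```

Comparing `alpha_mean` against a dense solve with the full precision matrix on a small case like this is a convenient way to confirm that the diagonal shortcut matches (0.16).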

Prediction

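To obtain predictions of $\boldsymbol{\lambda}_n$, the $\lambda$-process at the prediction grid locations, we sample from

$$[\log(\boldsymbol{\lambda}_n^{(t)}) \mid \mathbf{z}_n^{(t)}, \gamma^{(t)}, \mu^{(t)}, \sigma_\eta^{2\,(t)}] = N(\mu^{(t)}\mathbf{1} + \gamma^{(t)}\mathbf{z}_n^{(t)}, \sigma_\eta^{2\,(t)}\mathbf{I}), \tag{0.18}$$

where $\mathbf{1}$ is $n \times 1$ and $\mu^{(t)}$, $\gamma^{(t)}$, $\mathbf{z}_n^{(t)}$, $\sigma_\eta^{2\,(t)}$ are the $t$-th samples from the MCMC simulation. We obtain $\boldsymbol{\lambda}_n^{(t)}$ by simply exponentiating these samples.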
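The prediction step in (0.18) can be sketched as follows, assuming NumPy and assuming the MCMC output has been stored as arrays with one row or entry per retained iteration. The draws generated here are synthetic placeholders, not results from the BBS analysis, and the function name is hypothetical.

```python
import numpy as np

def predict_lambda(z_draws, mu_draws, gamma_draws, sigma2_eta_draws, rng):
    """Sample lambda_n for each MCMC draw via (0.18) and summarize by grid cell."""
    T, n = z_draws.shape
    # Independent normal noise with draw-specific variance sigma2_eta^(t).
    noise = rng.standard_normal((T, n)) * np.sqrt(sigma2_eta_draws)[:, None]
    log_lam = mu_draws[:, None] + gamma_draws[:, None] * z_draws + noise
    lam = np.exp(log_lam)                        # exponentiate each sample
    return lam.mean(axis=0), lam.std(axis=0, ddof=1)

# Synthetic stand-ins for stored MCMC output: T draws on an n-cell grid.
rng = np.random.default_rng(4)
T, n = 200, 6
z_draws = rng.standard_normal((T, n))
mu_draws = 0.7 + 0.1 * rng.standard_normal(T)
gamma_draws = 1.4 + 0.1 * rng.standard_normal(T)
sigma2_eta_draws = rng.uniform(0.7, 1.0, size=T)
post_mean, post_sd = predict_lambda(z_draws, mu_draws, gamma_draws,
                                    sigma2_eta_draws, rng)
```

For a large grid, running means and sums of squares could be accumulated during the MCMC run rather than storing every draw.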

Implementation

The MCMC simulation must be run long enough to achieve precise estimation of model parameters and predictions. For the BBS House Finch data, the MCMC simulation was run for 200,000 iterations after a 50,000-iteration burn-in period. For the sake of comparison, the algorithm took approximately 0.5 seconds per iteration with a MATLAB implementation on a 500 MHz Pentium III processor running Linux. Considering there are nearly 20,000 prediction locations and relatively strong spatial structure, this is quite fast. We examined many shorter runs to establish burn-in time and to evaluate model sensitivity to the fixed parameters and starting values. The model does not seem overly sensitive to these parameters.

0.3 Results

The posterior mean and posterior standard deviation for the scalar parameters are shown in Table 0.1.

Table 0.1 Posterior mean and standard deviation of univariate model parameters.

Parameter          Posterior Mean    Posterior Standard Deviation
$\mu$              0.74              0.105
$\gamma$           1.41              0.138
$\sigma_\eta^2$    0.84              0.100
$\sigma_e^2$       0.23              0.064
$\theta_1$         14.78             4.605

Figure 0.2 shows the posterior mean for the gridded $z$-process. We note the agreement with the data shown in Figure 0.1. One might examine this map to identify possible habitat covariates that are represented by the spatial random field. Possibilities in this case might be elevation and population, both of which are thought to be associated with the prevalence of the House Finch.

We note that the prediction grid extends beyond the continental United States. Clearly, estimates over ocean regions are meaningless with regard to House Finch data. These estimates are a result of the large-scale Fourier coefficients in the model. Fortunately, the map of posterior standard deviations for this process, shown in Figure 0.3, indicates that predictions in these no-data regions are highly suspect. This is also true of the northern plains region, which has few observations. Of course, having the prediction grid extend over the ocean is not ideal in this case, but the FFT-based algorithm requires rectangular grids. We could control for the land-sea effect by having an indicator covariate or possibly, a regime-specific model. Such modifications would be simple to implement in the hierarchical Bayesian framework presented here. However, simulation studies have shown that these are not necessary and if desired, one could simply mask the water portions of the map for presentation.

Figure 0.2 Posterior mean of $\mathbf{z}_n$ for the 1999 BBS House Finch data.

Figure 0.3 Posterior standard deviation of $\mathbf{z}_n$ for the 1999 BBS House Finch data.

Finally, Figure 0.4 and Figure 0.5 show the posterior mean and standard deviation of the $\lambda$-process on the prediction grid. These plots show clearly that the posterior standard deviation is proportional to the predicted mean, as expected with Poisson count data. In addition, the standard errors are also high in data sparse regions, as we expect.
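One way to see why the standard deviation tracks the mean is through the log-normal form of the intensity. As an illustrative simplification that is not taken from the analysis itself, suppose the log-intensity at a grid cell is normal with mean m and variance s2. The standard log-normal moment formulas then give a standard deviation equal to the mean times a factor that depends only on s2. The sketch below uses the Python standard library; the function name and the values of m are hypothetical, and the variance 0.84 is borrowed from Table 0.1 purely as a plausible magnitude.

```python
import math

def lognormal_mean_sd(m, s2):
    """Mean and standard deviation of exp(X) when X ~ N(m, s2)."""
    mean = math.exp(m + 0.5 * s2)
    sd = mean * math.sqrt(math.exp(s2) - 1.0)
    return mean, sd

s2 = 0.84
ratios = []
for m in (-1.0, 0.0, 1.0, 2.0):
    mean, sd = lognormal_mean_sd(m, s2)
    ratios.append(sd / mean)   # the same value for every m
```

Under this simplification the ratio is constant across locations with the same log-scale variance, so maps of the mean and of the standard deviation share the same spatial pattern, with departures where the log-scale uncertainty itself varies, as in data sparse regions.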

0.4 Conclusion

In summary, we have demonstrated how Bayesian geostatistical GLMM Poisson spatial models can be implemented in problems with very large numbers of prediction locations. By utilizing relatively simple spectral transforms and their associated orthogonality and decorrelation properties, we are able to implement the modeling approach very efficiently in general MCMC algorithms.


Figure 0.4 Posterior mean of gridded $\boldsymbol{\lambda}_n$ for the 1999 BBS House Finch data.

Figure 0.5 Posterior standard deviation of $\boldsymbol{\lambda}_n$ for the 1999 BBS House Finch data.

Acknowledgement

This research has been supported by a grant from the U.S. Environmental Protection Agency’s Science to Achieve Results (STAR) program, Assistance Agreement No. R827257-01-0. The author would like to thank Andy Royle for providing the BBS data and for helpful discussions.

References

Diggle, P.J., J.A. Tawn, and R.A. Moyeed. 1998. Model-based geostatistics (with discussion). Applied Statistics 47:299-350.

Elliott, J.J., and R.S. Arbib. 1953. Origin and status of the house finch in the eastern United States. Auk 70:31-37.

Link, W.A., and J.R. Sauer. 1998. Estimating population change from count data: application to the North American Breeding Bird Survey. Ecological Applications 8:258-268.

Obled, C., and J.D. Creutin. 1986. Some developments in the use of empirical orthogonal functions for mapping meteorological fields. J. Climate and Applied Meteorology 25:1189-1204.

Robbins, C.S., D.A. Bystrak, and P.H. Geissler. 1986. The Breeding Bird Survey: its first fifteen years, 1965-1979. USDOI, Fish and Wildlife Service Resource Publication 157. Washington, D.C.

Royle, J.A., W.A. Link, and J.R. Sauer. 2001. Statistical mapping of count survey data. In Predicting Species Occurrences: Issues of Scale and Accuracy, (Scott, J.M., P.J. Heglund, M. Morrison, M. Raphael, J. Haufler, B. Wall, editors). Island Press. Covelo, CA. (to appear)

Royle, J.A., and C.K. Wikle. 2001. Large-scale spatial modeling of breeding bird survey data. Under review.

Sauer, J.R., B.G. Peterjohn, and W.A. Link. 1994. Observer differences in the North American Breeding Bird Survey. Auk 111:50-62.

Sauer, J.R., G.W. Pendleton, and S. Orsillo. 1995. Mapping of bird distributions from point count surveys. Pages 151-160 in C.J. Ralph, J.R. Sauer, and S. Droege, eds. Monitoring Bird Populations by Point Counts, USDA Forest Service, Pacific Southwest Research Station, General Technical Report PSW-GTR-149.

Stein, M. 1999. Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag: New York.

Wikle, C.K., L.M. Berliner, and N. Cressie. 1998. Hierarchical Bayesian space-time models. Journal of Environmental and Ecological Statistics 5:117-154.

Wikle, C.K., and N. Cressie. 1999. A dimension reduction approach to space-time Kalman filtering. Biometrika 86:815-829.

Wikle, C.K., R.F. Milliff, D. Nychka, and L.M. Berliner. 2001. Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association 96:382-397.
