Propriety of posterior in Bayesian space varying parameter models with normal data



Statistics and Probability Letters 78 (2008) 2408–2411


journal homepage: www.elsevier.com/locate/stapro

Alexandre Rodrigues a, Renato Assunção b,∗

a Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, United Kingdom
b Universidade Federal de Minas Gerais, Departamento de Estatística, 31270-901 Belo Horizonte, MG, Brazil

Article info

Article history:
Received 12 October 2007
Accepted 1 March 2008
Available online 15 March 2008

Abstract

In Bayesian spatially varying parameter models the covariates’ coefficients in a regression model are allowed to change smoothly in space. A Markov random field is adopted as an improper prior distribution for the area-specific spatial effects. We demonstrate that the posterior distribution is a proper probability distribution.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Bayesian methods have been one of the major approaches to dealing with spatial data. In fact, Bayesian methods have become the method of choice for almost any complex model involving hierarchically structured random effects or complicated dependence structures.

One of the most successful areas of application is that of modelling area-based data (Banerjee et al., 2003). For this type of data, Besag et al. (1991) introduced a particular spatial hierarchical model that generated much interest. This model is still the main building block for further extensions and generalizations such as space-time models (Bernadinelli et al., 1995; Knorr-Held and Besag, 1998; Waller et al., 1997; Assunção et al., 2001) and spatially varying regression coefficients (Assunção, 2003; Congdon, 2003a; Gelfand et al., 2003).

In the Besag, York and Mollié model, a Gaussian Markov random field (GMRF) is used as a prior distribution for spatially varying random effects. For a thorough review of GMRFs, see Rue and Held (2005). The prior can be either improper or proper, depending on whether a linear constraint is imposed on the parameters. If the improper prior is used, assessing the propriety of the posterior becomes a relevant problem. With a generalized linear model as the likelihood model for the data, Ghosh et al. (1998) and Sun et al. (1999, 2001) found conditions under which the posterior distribution is proper if the spatially structured random effects have an improper prior.

Another model that is gaining wide acceptance is the Bayesian spatially varying parameter model (Assuncao et al., 2002; Congdon, 2003b; Gelfand et al., 2003; Fahrmeir et al., 2004; Brezger et al., 2007; Waller et al., 2007). A review of spatially varying models is given in Assunção (2003). In these models, rather than introducing spatially structured random effects, the covariates’ coefficients vary smoothly as one moves on a geographical map. That is, the effects of the covariates are not constant but rather vary spatially.

The typical inference approach in these spatially varying models is Bayesian, and they have been called Bayesian spatially varying parameter (BSVP) models. The prior distribution for the random covariates’ coefficients is an improper Markov random field. The important issue of the propriety of the posterior distribution in these models has not been addressed. Considering normally distributed response variables, the aim of this paper is to prove that the posterior distribution of the BSVP models is proper when the prior distribution is improper. The next section presents the BSVP model and shows that we obtain a proper prior if we impose a linear constraint on the area-specific vector coefficients. More relevant for applications of the BSVP model is the material in Section 3, where we present our main result concerning the propriety of the posterior distribution when no restrictions are added to the spatially varying parameters.

∗ Corresponding author. E-mail addresses: [email protected] (A. Rodrigues), [email protected] (R. Assunção).

0167-7152/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2008.03.004

2. BSVP model for normally distributed data

The situation where the response variable has a normal distribution is important for two reasons: because there are applications where this is a reasonable assumption, and because it is a situation where we can make explicit calculations that highlight important aspects of the model. In particular, this is a situation where Gibbs sampling can be used, dispensing with a more complicated Metropolis–Hastings procedure.

Suppose that we have $n$ areas and that, in each of them, we record a normally distributed response variable $Y_i$. Suppose also that $q$ covariates $x_{i1}, \ldots, x_{iq}$, potentially associated with $Y_i$, are measured. We assume that

$$y_i = b_{0i} + b_{1i}x_{1i} + b_{2i}x_{2i} + \cdots + b_{qi}x_{qi} + \varepsilon_i, \tag{1}$$

conditionally on the parameters $b_{ij}$ ($i = 1, \ldots, n$, $j = 0, \ldots, q$) and on $\phi$, where $\phi$ is the precision associated with the vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$ of normally distributed random variables with mean zero and covariance matrix $\phi^{-1}I_n$. The parameters $b_{ij}$ represent the average effect of the $j$-th covariate on the response associated with area $i$.

In model (1) we can decompose $b_{ij}$ into a sum of two random effects. One of them is the average (or global) effect of the $j$-th covariate on the response. The other is the local, or area-specific, effect on the response. We can rewrite the model as

$$y_i = (\alpha_0 + \beta_{0i}) + (\alpha_1 + \beta_{1i})x_{1i} + \cdots + (\alpha_q + \beta_{qi})x_{qi} + \varepsilon_i. \tag{2}$$
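As an illustration of the decomposition in (2), the following minimal sketch simulates responses from global effects plus area-specific deviations. All numerical values, the variable names, and the use of independent normal draws for the local effects are assumptions made for the example; the local effects are not drawn from the Markov random field prior here.

```python
import numpy as np

rng = np.random.default_rng(0)

n, q = 5, 2        # hypothetical: 5 areas, 2 covariates
phi = 4.0          # hypothetical error precision

# Row i of X is (1, x_i1, ..., x_iq); the leading 1 carries the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])

alpha = np.array([1.0, 0.5, -0.3])              # global effects alpha_0..alpha_q
beta = rng.normal(scale=0.2, size=(n, q + 1))   # local effects, one row per area

# Area-specific coefficients: global effect plus local deviation.
b = alpha + beta

# y_i = sum_j b_ij * x_ij + eps_i with eps_i ~ N(0, 1/phi).
eps = rng.normal(scale=phi ** -0.5, size=n)
y = np.einsum("ij,ij->i", X, b) + eps
```

Setting every row of `beta` to zero recovers an ordinary regression with constant coefficients `alpha`.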

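Let $\beta_{i\bullet}$ be defined as the $(q+1) \times 1$ vector $(\beta_{i0}, \ldots, \beta_{iq})$ and $\beta_{\bullet\bullet}$ as the $n(q+1)$-dimensional vector $(\beta_{1\bullet}, \ldots, \beta_{n\bullet})$. The parameter vector $\beta_{\bullet\bullet}$ has a spatial structure induced by a neighborhood graph in which each site $i$ has a set of neighbors denoted by $\partial i$. Its joint prior density is given by a Markov random field:

$$f(\beta_{\bullet\bullet} \mid \Psi) = f(\beta_{1\bullet}, \ldots, \beta_{n\bullet} \mid \Psi) \propto |\Psi|^{n/2} \exp\left\{ -\frac{1}{2} \sum_{i \sim j} (\beta_{i\bullet} - \beta_{j\bullet})^t \, \Psi \, (\beta_{i\bullet} - \beta_{j\bullet}) \right\}, \tag{3}$$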

where $i \sim j$ means that site $i$ is a neighbor of site $j$, and $\Psi$ is a $(q+1) \times (q+1)$ positive definite precision matrix.

It is easy to show that (3) is equivalent to specifying the conditional distributions as $(q+1)$-variate normal distributions given by

$$(\beta_{i\bullet} \mid \{\beta_{j\bullet}, j \neq i\}, \Psi) \sim N_{q+1}\left(\bar{\beta}_{i\bullet}, (n_i\Psi)^{-1}\right), \tag{4}$$

where $n_i = \#\,\partial i$ and $\bar{\beta}_{i\bullet}$ is the vector mean of the set $\{\beta_{j\bullet} ; j \in \partial i\}$.

To complete the prior distribution specification, we assume that $\alpha_{\bullet} = (\alpha_0, \alpha_1, \ldots, \alpha_q)$ has a multivariate Gaussian prior distribution, $\phi$ has a Gamma prior distribution, and the precision matrix $\Psi$ has a Wishart prior distribution with $\upsilon$ degrees of freedom and a symmetric positive definite $(q+1) \times (q+1)$ matrix $M$, which implies that it has density proportional to $|\Psi|^{(\upsilon - q)/2} \exp\left(-\mathrm{tr}(M\Psi)/2\right)$, denoted as $W(\upsilon/2, M/2)$.

Let $1_k$ and $0_k$ be the $k$-dimensional vectors $(1, \ldots, 1)$ and $(0, \ldots, 0)$, respectively. Since

$$f(\beta_{1\bullet}, \ldots, \beta_{n\bullet} \mid \Psi) = f(\beta_{1\bullet} - c1_{q+1}, \ldots, \beta_{n\bullet} - c1_{q+1} \mid \Psi),$$

the prior distribution (3) is invariant under translation of $(\beta_{1\bullet}, \ldots, \beta_{n\bullet})$ by a constant $c$ added to each vector entry. Therefore, the prior distribution is improper.
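The translation invariance can be checked numerically. The sketch below evaluates the exponent of (3) before and after shifting every entry by a constant; the chain-shaped neighborhood graph, the matrix `Psi`, and the helper name `mrf_exponent` are hypothetical choices for the illustration.

```python
import numpy as np

def mrf_exponent(beta, Psi, edges):
    """Sum over neighboring pairs of (b_i - b_j)^t Psi (b_i - b_j)."""
    total = 0.0
    for i, j in edges:
        d = beta[i] - beta[j]
        total += d @ Psi @ d
    return total

rng = np.random.default_rng(1)
n, q = 4, 1
edges = [(0, 1), (1, 2), (2, 3)]            # hypothetical chain of 4 areas
Psi = np.array([[2.0, 0.5], [0.5, 1.0]])    # hypothetical precision matrix
beta = rng.normal(size=(n, q + 1))          # row i holds beta_i.

c = 7.3                                      # arbitrary shift
q_original = mrf_exponent(beta, Psi, edges)
q_shifted = mrf_exponent(beta - c, Psi, edges)   # subtracts c from every entry
```

Because only differences between neighbors enter the exponent, `q_original` and `q_shifted` agree up to rounding, so the density is constant along the direction of the shift and cannot integrate to a finite value.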

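If a linear constraint such as $\sum_{i=1}^{n} \beta_{i\bullet} = 0_{q+1}$ is imposed, then (3) is integrable and the prior distribution becomes proper. In fact, if we denote the Kronecker product by $\otimes$, we can write

$$Q = \sum_{i \sim j} (\beta_{i\bullet} - \beta_{j\bullet})^t \, \Psi \, (\beta_{i\bullet} - \beta_{j\bullet}) = \beta_{\bullet\bullet}^t (A \otimes \Psi) \beta_{\bullet\bullet},$$

where $A$ is a symmetric $n \times n$ matrix with diagonal elements $a_{ii} = n_i$ and off-diagonal elements $a_{ij} = -1$ if $i \sim j$, and $0$ otherwise.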
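The identity between the pairwise sum and the Kronecker form can be verified on a small example. This is a sketch under stated assumptions: the sum over neighbors is read as running once over each neighboring pair, the 4-cycle graph and `Psi` are made up, and `adjacency_to_A` is a hypothetical helper name.

```python
import numpy as np

def adjacency_to_A(n, edges):
    """Build A with a_ii = n_i and a_ij = -1 when i ~ j (each pair listed once)."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, i] += 1.0
        A[j, j] += 1.0
        A[i, j] = -1.0
        A[j, i] = -1.0
    return A

rng = np.random.default_rng(2)
n, q = 4, 1
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]    # hypothetical 4-cycle of areas
Psi = np.array([[2.0, 0.5], [0.5, 1.0]])    # hypothetical precision matrix
beta = rng.normal(size=(n, q + 1))          # row i holds beta_i.

# Left-hand side: explicit sum over neighboring pairs.
Q_pairs = sum((beta[i] - beta[j]) @ Psi @ (beta[i] - beta[j]) for i, j in edges)

# Right-hand side: quadratic form in the stacked vector (beta_1., ..., beta_n.).
A = adjacency_to_A(n, edges)
beta_vec = beta.reshape(-1)
Q_kron = beta_vec @ np.kron(A, Psi) @ beta_vec
```

The row-major `reshape` stacks the area vectors one after another, which matches the block layout produced by `np.kron(A, Psi)`.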

Denote the rank of a matrix $C$ by $r(C)$. By the elementary properties of the Kronecker product, $r(A \otimes \Psi)$ is the product of the ranks of $A$ and $\Psi$. The matrix $\Psi$ has rank $q + 1$. Assuncao et al. (2002) showed that the rank of $A$ is $n - 1$. Therefore,

$$r(A \otimes \Psi) = r(A)\, r(\Psi) = (n - 1)(q + 1) = n(q + 1) - (q + 1).$$

As a consequence, $A \otimes \Psi$ has $q + 1$ eigenvectors associated with the eigenvalue zero. These eigenvectors are the columns of the matrix $1_n \otimes I_{q+1}$, where $I_{q+1}$ is the identity matrix of order $q + 1$.

Since $A \otimes \Psi$ is a symmetric matrix, we can write $A \otimes \Psi = P^t D P$, where $D = \mathrm{diag}(0, \ldots, 0, \lambda_{q+2}, \ldots, \lambda_{n(q+1)})$ and $P^t$ is an $n(q + 1) \times n(q + 1)$ matrix whose first $q + 1$ columns are given by $1_n \otimes I_{q+1}$ (rescaled to unit length). The remaining columns of $P^t$ complete an orthonormal basis of eigenvectors of $A \otimes \Psi$.
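The rank and null-space statements can be illustrated numerically. The sketch below assumes a connected neighborhood graph (here a hypothetical chain of four areas) and a made-up positive definite `Psi`; it is a check on one example, not a proof.

```python
import numpy as np

n, q = 4, 1
p = q + 1

# A for a hypothetical chain 0 - 1 - 2 - 3: a_ii = n_i, a_ij = -1 for neighbors.
A = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])
Psi = np.array([[2.0, 0.5], [0.5, 1.0]])    # hypothetical precision matrix

K = np.kron(A, Psi)
rank_K = np.linalg.matrix_rank(K)

# Columns of 1_n (x) I_{q+1}: candidate eigenvectors for the eigenvalue zero.
N = np.kron(np.ones((n, 1)), np.eye(p))
residual = K @ N

# Eigenvalues of the symmetric matrix K, in ascending order.
eigvals = np.linalg.eigvalsh(K)
n_zero = int(np.sum(np.abs(eigvals) < 1e-10))
```

For this example `rank_K` is `(n - 1) * (q + 1)` and exactly `q + 1` eigenvalues vanish, in line with the rank computation above.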


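By assumption, the $(q + 1)$-dimensional vector $0_{q+1}$ satisfies

$$0_{q+1} = \sum_{i} \beta_{i\bullet} = (1_n \otimes I_{q+1})^t \beta_{\bullet\bullet}.$$

Therefore,

$$P\beta_{\bullet\bullet} = (0, \ldots, 0, x_{q+2}, \ldots, x_{n(q+1)})^t = x,$$

for arbitrarily chosen $x_{q+2}, \ldots, x_{n(q+1)}$. Denote $(x_{q+2}, \ldots, x_{n(q+1)})^t$ by $x_{-(q+1)}$. We then have

$$Q = \beta_{\bullet\bullet}^t (A \otimes \Psi) \beta_{\bullet\bullet} = \beta_{\bullet\bullet}^t P^t D P \beta_{\bullet\bullet} = x^t D x = x_{-(q+1)}^t D^{*} x_{-(q+1)},$$

where $D^{*} = \mathrm{diag}(\lambda_{q+2}, \ldots, \lambda_{n(q+1)})$. Hence, if $\sum_i \beta_{i\bullet} = 0_{q+1}$, we have

$$f(\beta_{\bullet\bullet} \mid \Psi) \propto \exp\left(-\frac{1}{2} Q\right) = \exp\left(-\frac{1}{2} x_{-(q+1)}^t D^{*} x_{-(q+1)}\right),$$

where $x_{-(q+1)}$ belongs to $\mathbb{R}^{n(q+1)-(q+1)}$. The function $\exp\left(-\frac{1}{2} x_{-(q+1)}^t D^{*} x_{-(q+1)}\right)$ is integrable in $\mathbb{R}^{n(q+1)-(q+1)}$ because it is proportional to an $[n(q + 1) - (q + 1)]$-dimensional multivariate normal density. Therefore, $f(\beta_{\bullet\bullet} \mid \Psi)$ is a proper prior distribution if $\sum_i \beta_{i\bullet} = 0_{q+1}$.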

3. Propriety of posterior distribution

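We now prove that $f(\beta_{\bullet\bullet}, \alpha_{\bullet}, \Psi, \phi \mid Y)$ is a proper posterior distribution even if one does not add the linear restriction $\sum_i \beta_{i\bullet} = 0_{q+1}$. The posterior distribution is given by

$$f(\beta_{\bullet\bullet}, \alpha_{\bullet}, \Psi, \phi \mid Y) \propto \exp\left\{ -\frac{1}{2} \left[ \beta_{\bullet\bullet}^t \left( \phi W^t W + (A \otimes \Psi) \right) \beta_{\bullet\bullet} - 2 \beta_{\bullet\bullet}^t \phi W^t Z + \phi Z^t Z \right] \right\} f(\alpha_{\bullet}) \, \phi^{n/2} f(\phi) \, |\Psi|^{n/2} f(\Psi), \tag{5}$$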

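where $Z = Y - X\alpha_{\bullet}$, $X$ is the $n \times (q + 1)$ design matrix whose $i$-th row is $x_i = (1, x_{i1}, \ldots, x_{iq})$, and $W$ is an $n \times n(q + 1)$ matrix defined as

$$W = \begin{pmatrix}
1 \;\; x_{11} \;\; \cdots \;\; x_{1q} & 0 \;\; \cdots \;\; 0 & \cdots & 0 \;\; \cdots \;\; 0 \\
0 \;\; \cdots \;\; 0 & 1 \;\; x_{21} \;\; \cdots \;\; x_{2q} & \cdots & 0 \;\; \cdots \;\; 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 \;\; \cdots \;\; 0 & 0 \;\; \cdots \;\; 0 & \cdots & 1 \;\; x_{n1} \;\; \cdots \;\; x_{nq}
\end{pmatrix}.$$

Hence, $W^t W$ is a block-diagonal matrix with the $i$-th block given by the $(q + 1) \times (q + 1)$ matrix $x_i^t x_i$.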
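The structure of the matrix W and of its Gram matrix can be made concrete with a small construction. The covariate values below are random placeholders and the variable names are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 3, 2
p = q + 1

# Row i of X is x_i = (1, x_i1, ..., x_iq); values are placeholders.
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])

# Row i of W carries x_i in the i-th block of p columns and zeros elsewhere.
W = np.zeros((n, n * p))
for i in range(n):
    W[i, i * p:(i + 1) * p] = X[i]

WtW = W.T @ W

# The i-th diagonal block should equal the outer product x_i^t x_i.
blocks_ok = all(
    np.allclose(WtW[i * p:(i + 1) * p, i * p:(i + 1) * p], np.outer(X[i], X[i]))
    for i in range(n)
)

# Everything outside the diagonal blocks should vanish.
block_mask = np.kron(np.eye(n), np.ones((p, p))).astype(bool)
off_diag_zero = np.allclose(WtW[~block_mask], 0.0)
```

Each diagonal block is a rank-one outer product, so the Gram matrix of W on its own has rank at most n and is singular whenever q is at least 1.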

If we prove that $R = \phi W^t W + (A \otimes \Psi)$ is a positive definite matrix, then $f(\beta_{\bullet\bullet}, \alpha_{\bullet}, \Psi, \phi \mid Y)$ is integrable with respect to $\beta_{\bullet\bullet}$ because (5), as a function of $\beta_{\bullet\bullet}$, is proportional to a multivariate normal density.

The matrix $A \otimes \Psi$ is positive semidefinite. Since $r(A) = n - 1$ (Assuncao et al., 2002) and $r(\Psi) = q + 1$, the rank of $A \otimes \Psi$ is $n(q + 1) - (q + 1)$. Therefore, we have $q + 1$ orthogonal eigenvectors associated with the eigenvalue 0. These eigenvectors, as seen before, are the columns of the matrix $1_n \otimes I_{q+1}$.

The $i$-th column of the matrix $W^t W (1_n \otimes I_{q+1})$ is $0_{n(q+1)}$ if and only if the covariate $i$ is equal to 0 for all areas. Therefore, the columns of $1_n \otimes I_{q+1}$ do not belong to the null space of $W^t W$.
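A small numerical check may help fix ideas about how the two terms of R complement each other. The sketch assumes a connected chain graph, a made-up `Psi` and `phi`, and random covariates whose design matrix has full column rank; these are assumptions of the example, not conditions stated in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 4, 1
p = q + 1
phi = 1.5                                   # hypothetical error precision

X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
W = np.zeros((n, n * p))
for i in range(n):
    W[i, i * p:(i + 1) * p] = X[i]

# A for a hypothetical chain 0 - 1 - 2 - 3.
A = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])
Psi = np.array([[2.0, 0.5], [0.5, 1.0]])    # hypothetical precision matrix

# Null-space basis of A (x) Psi.
N = np.kron(np.ones((n, 1)), np.eye(p))

# W maps the null-space basis onto the design matrix X.
WN = W @ N

# No column of W^t W (1_n (x) I) vanishes when no covariate is identically zero.
col_norms = np.linalg.norm(W.T @ W @ N, axis=0)

K = np.kron(A, Psi)
R = phi * (W.T @ W) + K
min_eig_K = np.linalg.eigvalsh(K).min()     # numerically zero: K is singular
min_eig_R = np.linalg.eigvalsh(R).min()     # strictly positive in this example
```

Since `W @ N` equals `X`, a null-space vector built from coefficients c has quadratic form equal to the squared norm of `X @ c`, which is positive for nonzero c when X has full column rank.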

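Take any $x \in \mathbb{R}^{n(q+1)}$ with $x \neq 0_{n(q+1)}$. If $x$ belongs to the subspace generated by the columns of $1_n \otimes I_{q+1}$, then $x^t (W^t W) x > 0$. Otherwise, $x^t (A \otimes \Psi) x > 0$. Therefore, for any non-null $x \in \mathbb{R}^{n(q+1)}$, we have

$$x^t (\phi W^t W + A \otimes \Psi) x = \phi x^t (W^t W) x + x^t (A \otimes \Psi) x > 0$$

and $R = \phi W^t W + (A \otimes \Psi)$ is a positive definite matrix.

Integrating out with respect to $\beta_{\bullet\bullet}$, we have that $f(\alpha_{\bullet}, \Psi, \phi \mid Y)$ is proportional to

$$|R|^{-1/2} \exp\left\{ -\frac{1}{2} \left( \phi Z^t Z - a^t R a \right) \right\} f(\alpha_{\bullet}) \, \phi^{n/2} f(\phi) \, |\Psi|^{n/2} f(\Psi), \tag{6}$$

where $a = \phi R^{-1} W^t Z$.

Let $\Lambda_C$ denote the diagonal matrix with the eigenvalues $\lambda^i_C$ of a matrix $C$. We can write

$$R = P^t (\phi \Lambda_{W^t W} + \Lambda_{A \otimes \Psi}) P,$$

where the first $q + 1$ columns of $P^t$ are given by $1_n \otimes I_{q+1}$ and the remaining columns of $P^t$ complete an orthonormal basis of eigenvectors of $R$.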

By the properties of the Kronecker product, the eigenvalues of $A \otimes \Psi$ are given by $\lambda^i_A \lambda^j_{\Psi}$ ($i = 1, \ldots, n$; $j = 1, \ldots, q + 1$). As the first $q + 1$ eigenvalues of $A \otimes \Psi$ are equal to 0, we have that

$$|R| = |\phi \Lambda_{W^t W} + \Lambda_{A \otimes \Psi}| = \prod_{i=1}^{n(q+1)} \left( \phi \lambda^i_{W^t W} + \lambda^i_{A \otimes \Psi} \right) > \prod_{i=1}^{q+1} \phi \lambda^i_{W^t W} \prod_{i=q+2}^{n(q+1)} \lambda^i_{A \otimes \Psi} = \phi^{q+1} c_1 |\Psi|^{n-1} c_2, \tag{7}$$

where $c_1$ and $c_2$ are constants that do not depend on $\phi$ and $\Psi$. Hence,

$$|R|^{-1/2} < \phi^{-(q+1)/2} \, c_1^{-1/2} \, |\Psi|^{-(n-1)/2} \, c_2^{-1/2}.$$

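We also have that $(Z - Wa)^t \phi (Z - Wa) \geq 0$, and then $a^t \phi W^t W a \geq 2 \phi Z^t W a - \phi Z^t Z$. Furthermore, we can write

$$a^t R a = a^t \phi W^t W a + a^t (A \otimes \Psi) a \geq 2 \phi Z^t W a - \phi Z^t Z.$$

Since $a^t R a = \phi Z^t W a$, we have that $\phi Z^t Z - a^t R a \geq 0$. Therefore,

$$|R|^{-1/2} \exp\left\{ -\frac{1}{2} \left( \phi Z^t Z - a^t R a \right) \right\} f(\alpha_{\bullet}) \, \phi^{n/2} f(\phi) \, |\Psi|^{n/2} f(\Psi) < c_1^{-1/2} c_2^{-1/2} f(\alpha_{\bullet}) \, \phi^{(n-(q+1))/2} f(\phi) \, |\Psi|^{1/2} f(\Psi). \tag{8}$$

If $f(\alpha_{\bullet})$ is a proper prior distribution and $f(\phi)$ and $f(\Psi)$ are Gamma and Wishart densities, respectively, then $f(\beta_{\bullet\bullet}, \alpha_{\bullet}, \Psi, \phi \mid Y)$ is integrable and therefore it is a proper posterior distribution.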
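The quantities used in the last steps of the proof can be evaluated on a toy example. The sketch below is an illustration under assumptions: the graph, `Psi`, `phi`, the covariates, and the data are all made up. It computes the vector a, checks the identity and the inequality used above, and draws one sample of the stacked local effects from the normal distribution with mean a and precision R that (5) implies when the other parameters are held fixed, which is the kind of update a Gibbs sampler would use.

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 4, 1
p = q + 1
phi = 2.0                                   # hypothetical error precision

X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
W = np.zeros((n, n * p))
for i in range(n):
    W[i, i * p:(i + 1) * p] = X[i]

# A for a hypothetical chain 0 - 1 - 2 - 3.
A = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])
Psi = np.array([[2.0, 0.5], [0.5, 1.0]])    # hypothetical precision matrix

alpha = rng.normal(size=p)                  # placeholder global effects
Y = rng.normal(size=n)                      # placeholder responses
Z = Y - X @ alpha

R = phi * (W.T @ W) + np.kron(A, Psi)
a = phi * np.linalg.solve(R, W.T @ Z)       # a = phi R^{-1} W^t Z

quad = a @ R @ a                            # a^t R a
cross = phi * (Z @ W @ a)                   # phi Z^t W a
gap = phi * (Z @ Z) - quad                  # phi Z^t Z - a^t R a

# One draw from N(a, R^{-1}) using the Cholesky factor R = L L^t.
L = np.linalg.cholesky(R)
z = rng.normal(size=n * p)
beta_draw = a + np.linalg.solve(L.T, z)
```

`np.linalg.cholesky` raises `LinAlgError` when its argument is not positive definite, so the factorization succeeding is itself a check on R for this example.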

References

Assunção, R., 2003. Space varying coefficient models for small area data. Environmetrics 14, 453–473.
Assuncao, R.M., Potter, J.E., Cavenaghi, S.M., 2002. A Bayesian space varying parameter model applied to estimating fertility schedules. Statistics in Medicine 21, 2057–2076.
Assunção, R., Reis, I.A., Oliveira, C.L., 2001. Diffusion and prediction of Leishmaniasis in a large metropolitan area in Brazil with a Bayesian space-time model. Statistics in Medicine 20, 2319–2335.
Banerjee, S., Carlin, B.P., Gelfand, A.E., 2003. Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall/CRC, Boca Raton.
Bernadinelli, L., Clayton, D., Pascutto, C., Montomoli, C., Ghislandi, M., Songini, M., 1995. Bayesian analysis of space-time variation in disease risk. Statistics in Medicine 14, 2433–2443.
Besag, J., York, J., Mollié, A., 1991. A Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43, 1–20.
Brezger, A., Fahrmeir, L., Hennerfeind, A., 2007. Adaptive Gaussian Markov random fields with applications in human brain mapping. Journal of the Royal Statistical Society: Series C 56, 327–345.
Congdon, P., 2003a. A model for non-parametric spatially varying regression effects. Computational Statistics & Data Analysis 50, 422–445.
Congdon, P., 2003b. Modelling spatially varying impacts of socioeconomic predictors on mortality outcomes. Journal of Geographical Systems 5, 161–184.
Fahrmeir, L., Kneib, T., Lang, S., 2004. Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica 14, 731–761.
Gelfand, A.E., Kim, H.J., Sirmans, C.F., Banerjee, S., 2003. Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association 98, 387–396.
Ghosh, M., Natarajan, K., Stroud, T.W.F., Carlin, B.P., 1998. Generalized linear models for small-area estimation. Journal of the American Statistical Association 93, 273–282.
Rue, H., Held, L., 2005. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall/CRC, Boca Raton.
Sun, D., Tsutakawa, R.K., Speckman, P.L., 1999. Bayesian inference for CAR (1) models with noninformative priors. Biometrika 86, 341–350.
Sun, D., Tsutakawa, R.K., He, Z., 2001. Propriety of posteriors with improper priors in hierarchical linear mixed models. Statistica Sinica 11, 77–95.
Waller, L.A., Carlin, B.P., Xia, H., Gelfand, A.E., 1997. Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association 92, 607–617.
Waller, L.A., Zhu, L., Gotway, C.A., Gorman, D.M., Gruenewald, P.J., 2007. Quantifying geographic variations in associations between alcohol distribution and violence: A comparison of geographically weighted regression and spatially varying coefficient models. Stochastic Environmental Research and Risk Assessment 21, 573–588.