
SPDE based modeling of large space-time data sets

Fabio Sigrist, Hans R. Künsch, Werner A. Stahel
Seminar for Statistics, Department of Mathematics, ETH Zürich

8092 Zürich, Switzerland

October 29, 2018

Abstract

Increasingly larger data sets of processes in space and time ask for statistical models and methods that can cope with such data. We show that solutions of stochastic advection-diffusion partial differential equations (SPDEs) provide a flexible model class for spatio-temporal processes which is computationally feasible also for large data sets. The solution of the SPDE has in general a nonseparable covariance structure. Furthermore, its parameters can be physically interpreted as explicitly modeling phenomena such as transport and diffusion that occur in many natural processes in diverse fields ranging from environmental sciences to ecology. In order to obtain computationally efficient statistical algorithms, we use spectral methods to solve the SPDE. This has the advantage that approximation errors do not accumulate over time, and that in the spectral space the computational cost grows linearly with the dimension, the total computational costs of Bayesian or frequentist inference being dominated by the fast Fourier transform. The proposed model is applied to postprocessing of precipitation forecasts from a numerical weather prediction model for northern Switzerland. In contrast to the raw forecasts from the numerical model, the postprocessed forecasts are calibrated and quantify prediction uncertainty. Moreover, they outperform the raw forecasts.

Keywords: spatio-temporal model, Gaussian process, physics based model, advection-diffusion equation, spectral methods, numerical weather prediction


arXiv:1204.6118v4 [stat.ME] 20 Nov 2012

1 Introduction

Space-time data arise in many applications; see Cressie and Wikle (2011) for an introduction and an overview. Increasingly larger space-time data sets are obtained, for instance, from remote sensing satellites or deterministic physical models such as numerical weather prediction (NWP) models. Statistical models are needed that can cope with such data.

As Wikle and Hooten (2010) point out, there are two basic paradigms for constructing spatio-temporal models. The first approach is descriptive and follows the traditional geostatistical paradigm, using joint space-time covariance functions (Cressie and Huang 1999; Gneiting 2002; Ma 2003; Wikle 2003; Stein 2005; Paciorek and Schervish 2006). The second approach is dynamic and combines ideas from time-series and spatial statistics (Solna and Switzer 1996; Wikle and Cressie 1999; Huang and Hsu 2004; Xu et al. 2005; Gelfand et al. 2005; Johannesson et al. 2007; Sigrist et al. 2012).

Even for purely spatial data, developing methodology which can handle large data sets is an active area of research. Banerjee et al. (2004) refer to this as the "big n problem". Factorizing large covariance matrices is not possible without assuming a special structure or using approximate methods. Using low rank matrices is one approach (Nychka et al. 2002; Banerjee et al. 2008; Cressie and Johannesson 2008; Stein 2008; Wikle 2010). Other proposals include using Gaussian Markov random fields (GMRFs) (Rue and Tjelmeland 2002; Rue and Held 2005; Lindgren et al. 2011) or applying tapering (Furrer et al. 2006), so that one obtains sparse precision or covariance matrices, respectively, for which calculations can be done efficiently. Another proposed solution is to approximate the likelihood so that it can be evaluated faster (Vecchia 1988; Stein et al. 2004; Fuentes 2007; Eidsvik et al. 2012). Royle and Wikle (2005) and Paciorek (2007) use Fourier functions to reduce computational costs.

In a space-time setting, the situation is the same, if not worse: one runs into a computational bottleneck with high dimensional data since the computational costs to factorize dense NT × NT covariance matrices are O((NT)³), N and T being the number of points in space and time, respectively. Moreover, specifying flexible and realistic space-time covariance functions is a nontrivial task.

In this paper, we follow the dynamic approach and study models which are defined through a stochastic advection-diffusion partial differential equation (SPDE). This has the advantage of providing physically motivated parametrizations of space-time covariances. We show that when solving the SPDE using Fourier functions, one can do computationally efficient statistical inference. In the spectral space, computational costs for the Kalman filter and backward sampling algorithms are of order O(NT). As we show, roughly speaking, this computational efficiency is due to the temporal Markov property, the fact that Fourier functions are eigenfunctions of the spatial differential operators, and the use of some matrix identities. The overall computational costs are then determined by those of the fast Fourier transform (FFT) (Cooley and Tukey 1965), which are O(TN log N).

Defining Gaussian processes through stochastic differential equations has a long history in statistics, going back to early works such as Whittle (1954), Heine (1955), and Whittle (1962). Later works include Jones and Zhang (1997) and Brown et al. (2000). Recently, Lindgren et al. (2011) showed how a certain class of SPDEs can be solved using finite elements to obtain parametrizations of spatial GMRFs.

Spectral methods for solving partial differential equations are well established in the numerical mathematics community (see, e.g., Gottlieb and Orszag (1977), Folland (1992), or Haberman (2004)). In contrast, statistical models have different requirements and goals, since the (hyper-)parameters of an (S)PDE are not known a priori and need to be estimated. Spectral methods have also been used in spatio-temporal statistics, mostly for approximating or solving deterministic integro-difference equations (IDEs) or PDEs. Wikle and Cressie (1999) introduce a dynamic spatio-temporal model obtained from an IDE that is approximated using a reduced-dimensional spectral basis. Extending this work, Wikle (2002) and Xu et al. (2005) propose parametrizations of spatio-temporal processes based on IDEs. Modeling tropical ocean surface winds, Wikle et al. (2001) present a physics based model based on the shallow-water equations. See also Cressie and Wikle (2011, Chapter 7) for an overview of basis function expansions in spatio-temporal statistics.

The novel features of our work are the following. While spectral methods have been used for approximating deterministic IDEs and PDEs in the statistical literature, there is no article, to our knowledge, that explicitly shows how one obtains a space-time Gaussian process by solving an advection-diffusion SPDE using the real Fourier transform. Moreover, we present computationally efficient algorithms for doing statistical inference which use the fast Fourier transform and the Kalman filter. The computational burden can be additionally alleviated by applying dimension reduction. We also give a bound on the accuracy of the approximate solution. In the application, we postprocess precipitation forecasts, explicitly modeling spatial and temporal variation. The idea is that the spatio-temporal model not only accounts for dependence, but it also captures and extrapolates dynamically an error term of the NWP model in space and time.

The remainder of this paper is organized as follows. Section 2 introduces the continuous space-time Gaussian process defined through the advection-diffusion SPDE. In Section 3, it is shown how the solution of the SPDE can be approximated using the two-dimensional real Fourier transform, and we give convergence rates for the approximation. Next, in Section 4, we show how one can do computationally efficient inference. In Section 5, the spatio-temporal model is used as part of a hierarchical Bayesian model, which we then apply for postprocessing of precipitation forecasts.

All the methodology presented in this article is implemented in the R package spate (see Sigrist et al. (2012)).

2 A Continuous Space-Time Model: The Advection-Diffusion SPDE

In one dimension, a fundamental process is the Ornstein-Uhlenbeck process, which is governed by a relatively simple stochastic differential equation (SDE). The process has an exponential covariance function, and its discretized version is the famous AR(1) model. In the two-dimensional spatial case, Whittle (1954) argues convincingly that the process with a Whittle correlation function is an "elementary" process (see Section 2.2 for further discussion). If the time dimension is added, we think that the process defined through the stochastic partial differential equation (SPDE) in (1) has properties that make it a good candidate for an "elementary" spatio-temporal process. It is a linear equation that explicitly models phenomena such as transport and diffusion that occur in many natural processes ranging from environmental sciences to ecology. This means that, if desired, the parameters can be given a physical interpretation. Furthermore, if some parameters equal zero (no advection and no diffusion), its covariance structure reduces to a separable one with an AR(1) structure over time and a certain covariance structure over space.


The advection-diffusion SPDE, also called transport-diffusion SPDE, is given by

∂ξ(t, s)/∂t = −µ·∇ξ(t, s) + ∇·Σ∇ξ(t, s) − ζξ(t, s) + ε(t, s),   (1)

with s = (x, y)′ ∈ R², where ∇ = (∂/∂x, ∂/∂y)′ is the gradient operator and, for a vector field F = (F^x, F^y)′, ∇·F = ∂F^x/∂x + ∂F^y/∂y is the divergence operator. ε(t, s) is a Gaussian process that is temporally white and spatially colored. See Section 2.2 for a discussion of the choice of the spatial covariance function. The SPDE has the following interpretation. Heuristically, an SPDE specifies what happens locally at each point in space during a small time step. The first term, −µ·∇ξ(t, s), models transport effects (called advection in weather applications), µ = (µx, µy)′ being a drift or velocity vector. The second term, ∇·Σ∇ξ(t, s), is a diffusion term that can incorporate anisotropy. If Σ is the identity matrix, this term reduces to the divergence (∇·) of the gradient (∇), which is the ordinary Laplace operator ∇·∇ = ∆ = ∂²/∂x² + ∂²/∂y². The third term, −ζξ(t, s), diminishes ξ(t, s) at a constant rate and thus accounts for damping. Finally, ε(t, s) is a source-sink or stochastic forcing term, also called an innovation term, that can be interpreted as describing, amongst others, convective phenomena in precipitation modeling applications.

Concerning the diffusion matrix Σ, we suggest the following parametrization:

Σ⁻¹ = (1/ρ₁²) AᵀA,  where A is the 2 × 2 matrix with rows (cos α, sin α) and (−γ sin α, γ cos α).   (2)

The parameters are interpreted as follows. ρ₁ acts as a range parameter and controls the amount of diffusion. The parameters γ and α control the amount and the direction of anisotropy. With γ = 1, isotropic diffusion is obtained.
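The parametrization in (2) is easy to check numerically. The following sketch (our illustration, not code from the paper; the function name and example values are ours) builds Σ from (ρ₁, γ, α) and confirms that γ = 1 yields isotropic diffusion, Σ = ρ₁² I:

```python
import numpy as np

def diffusion_matrix(rho1, gamma, alpha):
    # A has rows (cos a, sin a) and (-g sin a, g cos a), as in (2)
    A = np.array([[np.cos(alpha), np.sin(alpha)],
                  [-gamma * np.sin(alpha), gamma * np.cos(alpha)]])
    Sigma_inv = (A.T @ A) / rho1**2
    return np.linalg.inv(Sigma_inv)

# with gamma = 1, A is orthogonal, so Sigma = rho1^2 * identity
print(np.allclose(diffusion_matrix(0.5, 1.0, 0.7), 0.25 * np.eye(2)))  # True
```

For γ ≠ 1, the same function returns an anisotropic Σ whose principal axes are rotated by α.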

Heine (1955) and Whittle (1963) introduced and analyzed SPDEs of similar form as in (1). Further, Jones and Zhang (1997) also investigated SPDE based models. Brown et al. (2000) obtained such an advection-diffusion SPDE as a limit of stochastic integro-difference equation models. Without giving any concrete details, Lindgren et al. (2011) suggested that this SPDE can be used in connection with their GMRF method.

Figure 1 illustrates the SPDE in (1) and the corresponding PDE without the stochastic innovation term. The top row shows a solution to the PDE, which corresponds to the deterministic part of the SPDE that is obtained when there is no stochastic term ε(t, s). The figure shows how the initial state in the top-left plot gets propagated forward in time. The drift vector points from north-east to south-west, and the diffusive part exhibits anisotropy in the same direction. A 100 × 100 grid is used, and the PDE is solved in the spectral domain using the method described below in Section 3. There is a fundamental difference between the deterministic PDE and the probabilistic SPDE. In the first case, a deterministic process is modeled directly. In the second case, the SPDE defines a stochastic process. Since the operator is linear and the input Gaussian, this process is a Gaussian process whose covariance function is implicitly defined by the SPDE. The bottom row of Figure 1 shows one sample from this Gaussian process. The same initial state as in the deterministic example is used, i.e., we use a fixed initial state. For the innovations ε(t, s), we choose a Gaussian process that is temporally independent and spatially structured according to the Matérn covariance function with smoothness parameter 1. Again, the drift vector points from north-east to south-west, and the diffusive part exhibits anisotropy in the same direction.


Figure 1. Illustration of the SPDE in (1) and the corresponding PDE. The top row illustrates a solution to the PDE, which corresponds to the deterministic part of the SPDE without the stochastic term ε(t, s). The bottom row shows one sample from the distribution specified by the SPDE with a fixed initial condition. The drift vector points from north-east to south-west, and the diffusive part exhibits anisotropy in the same direction.

Note that the use of this spatio-temporal Gaussian process is not restricted to situations where it is a priori known that phenomena such as transport and diffusion occur. In the one-dimensional case, it is common to use the AR(1) process in situations where it is not a priori clear whether the modeled process follows the dynamic of the Ornstein-Uhlenbeck SDE. In two dimensions, the same holds true for the process with the Whittle covariance function, not to mention the process having an exponential covariance structure. Having this in mind, even though the SPDE in (1) is physically motivated, it can be used as a general spatio-temporal model. As the case may be, the interpretation of the parameters can be more or less straightforward.

2.1 Spectral Density and Covariance Function

As can be shown using the Fourier transform (see, e.g., Whittle (1963)), if the innovation process ε(t, s) is stationary with spectral density f(k), the spectrum of the stationary solution ξ(t, s) of the SPDE (1) is

f(ω, k) = f(k) (1/(2π)) ((k′Σk + ζ)² + (ω + µ′k)²)⁻¹,   (3)

where k and ω are spatial wavenumbers and temporal frequencies. The covariance function C(t, s) of ξ(t, s) is then given by

C(t, s) = ∫ f(ω, k) exp(itω) exp(is′k) dk dω
        = ∫ f(k) [exp(−iµ′kt − (k′Σk + ζ)|t|) / (2(k′Σk + ζ))] exp(is′k) dk,   (4)

where the integration over the temporal frequencies ω follows from the calculation of the characteristic function of the Cauchy distribution (Abramowitz and Stegun 1964). The spatial integral above has no closed form solution but can be computed approximately by numerical integration.
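As an illustration of such a numerical integration (our sketch, with made-up parameter values and the Whittle innovation spectrum of (6) below), the marginal variance C(0, 0) = ∫ f(k)/(2(k′Σk + ζ)) dk can be approximated by a Riemann sum on a truncated wavenumber grid:

```python
import numpy as np

# made-up example parameters
sigma2, rho0 = 1.0, 0.2                    # innovation spectrum parameters
Sigma = np.array([[0.1, 0.0], [0.0, 0.1]]) # diffusion matrix
zeta = 0.5                                 # damping

g = np.linspace(-60, 60, 601)              # truncated wavenumber grid
dk = g[1] - g[0]
kx, ky = np.meshgrid(g, g)
f = sigma2 / (2 * np.pi) ** 2 * (kx**2 + ky**2 + 1 / rho0**2) ** -2
c = Sigma[0, 0] * kx**2 + 2 * Sigma[0, 1] * kx * ky + Sigma[1, 1] * ky**2 + zeta

var = (f / (2 * c)).sum() * dk**2          # Riemann sum for C(0, 0)
print(np.isfinite(var) and var > 0)        # True
```

Evaluating the integrand at t ≠ 0 or s ≠ 0 works the same way, with the extra oscillatory factors of (4).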

Since, in general, the spectrum does not factorize into a temporal and a spatial component, we see that ξ(t, s) has a nonseparable covariance function (see Gneiting et al. (2007b) for a definition of separability). The model reduces to a separable one, though, when there is no advection and diffusion, i.e., when both µ and Σ are zero. In this case, the covariance function is given by C(t, s) = (1/(2ζ)) exp(−ζ|t|) C(s), where C(s) denotes the spatial covariance function of the innovation process.

2.2 Specification of the Innovation Process

As mentioned before, it is assumed that the innovation process is white in time and spatially colored. In principle, one can choose any spatial covariance function such that the covariance function in (4) is finite at zero. Note that if f(k) is integrable, then f(ω, k) is also integrable. Similarly as Lindgren et al. (2011), we opt for the most commonly used covariance function in spatial statistics: the Matérn covariance function (see Handcock and Stein (1993), Stein (1999)). Since in many applications the smoothness parameter is not estimable, we further restrict ourselves to the Whittle covariance function. This covariance function is of the form σ² (d/ρ₀) K₁(d/ρ₀), with d being the Euclidean distance between two points and K₁(d/ρ₀) being the modified Bessel function of order 1. It is named after Whittle (1954), who introduced it and argued convincingly that it "may be regarded as the 'elementary' correlation in two dimensions, similar to the exponential in one dimension." It can be shown that the stationary solution of the SPDE

(∇·∇ − 1/ρ₀²) ε(t, s) = W(t, s),   (5)

where W(t, s) is a zero mean Gaussian white noise field with variance σ², has the Whittle covariance function in space. From this, it follows that the spectrum of the process ε(t, s) is given by

f(k) = (σ²/(2π)²) (k′k + 1/ρ₀²)⁻².   (6)

2.3 Relation to an Integro-Difference Equation

Assuming discrete time steps with lag ∆, Brown et al. (2000) consider the following integro-difference equation (IDE):

ξ(t, s) = exp(−∆ζ) ∫_{R²} h(s − s′) ξ(t − ∆, s′) ds′ + ε(t, s),  s ∈ R²,   (7)

with a Gaussian redistribution kernel

h(s − s′) = (2π)⁻¹ |2∆Σ|^{−1/2} exp(−(s − s′ − ∆µ)ᵀ (2∆Σ)⁻¹ (s − s′ − ∆µ)/2),

ε(t, s) being temporally independent and spatially dependent. They show that in the limit ∆ → 0, the solution of the IDE and the one of the SPDE in (1) coincide. The IDE is interpreted as follows: the convolution kernel h(s − s′) determines the weight or the amount of influence that a location s′ at the previous time t − ∆ has on the point s at the current time t. This IDE representation provides an alternative way of interpreting the SPDE model and its parameters. Storvik et al. (2002) show under which conditions a dynamic model determined by an IDE as in (7) can be represented using a parametric joint space-time covariance function, and vice versa. Based on the IDE in (7), Sigrist et al. (2012) construct a spatio-temporal model for irregularly spaced data and apply it to obtain short term predictions of precipitation.
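As a sanity check on the redistribution kernel h (our sketch with made-up values of ∆, µ, and Σ), evaluating it on a grid of displacements confirms that it is a Gaussian density with mean shift ∆µ and covariance 2∆Σ, i.e., its total mass is approximately one:

```python
import numpy as np

# made-up example parameters
delta = 0.5
mu = np.array([0.2, -0.1])
Sigma = np.array([[0.3, 0.05], [0.05, 0.2]])

C = 2 * delta * Sigma                         # kernel covariance 2*Delta*Sigma
Cinv = np.linalg.inv(C)
norm = 1 / (2 * np.pi * np.sqrt(np.linalg.det(C)))

g = np.linspace(-5, 5, 201)                   # grid of displacements u = s - s'
du = g[1] - g[0]
U = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1) - delta * mu
quad = np.einsum("...i,ij,...j->...", U, Cinv, U)
h = norm * np.exp(-quad / 2)

print(abs(h.sum() * du**2 - 1.0) < 1e-6)      # True: kernel mass is ~1
```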

3 Solution in the Spectral Space

Solutions ξ(t, s) of the SPDE (1) are defined in continuous space and time. In practice, one needs to discretize both space and time. The resulting vector of NT space-time points is in general of large dimension. This makes statistical inference, be it frequentist or Bayesian, computationally difficult to impossible. However, as we show in the following, solving the SPDE in the spectral space alleviates the computational burden considerably and allows for dimension reduction, if desired.

Heuristically speaking, spectral methods (Gottlieb and Orszag (1977); Cressie and Wikle (2011, Chapter 7)) approximate the solution ξ(t, s) by a linear combination of deterministic spatial functions φj(s) with random coefficients αj(t) that evolve dynamically over time:

ξK(t, s) = ∑_{j=1}^K αj(t) φj(s) = φ(s)′α(t),   (8)

where φ(s) = (φ1(s), . . . , φK(s))′ and α(t) = (α1(t), . . . , αK(t))′. As we argue below, it is expedient to use Fourier functions

φj(s) = exp(ik′j s),   (9)

where i denotes the imaginary unit, i² = −1, and kj = (k^x_j, k^y_j)′ is a spatial wavenumber.

For solving the SPDE in (1), Fourier functions have several advantages. First, differentiation in the physical space corresponds to multiplication in the spectral space. In other words, Fourier functions are eigenfunctions of the spatial differential operator. Instead of approximating the differential operator in the physical space and then worrying about approximation errors, one just has to multiply in the spectral space, and there is no approximation error of the operator. Further, the solution of the SPDE is exact over time, given the frequencies included; see Proposition 1 below. In contrast to finite differences, one does not accumulate errors over time. The approximation error only depends on the number of spectral terms and not on the temporal discretization. This is related to the fact that there is no need for numerical stability conditions. For statistical applications, where the parameters are not known a priori, this is particularly useful. Finally, one can use the FFT for efficiently transforming from the physical to the spectral space, and vice versa. However, since Fourier terms are global functions, stationarity in space but not in time is a necessary assumption.
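The first advantage, differentiation becoming multiplication by ik in the spectral space, can be demonstrated in a few lines (our one-dimensional illustration, not code from the paper):

```python
import numpy as np

# periodic grid on [0, 2*pi) and a band-limited test function
n = 64
L = 2 * np.pi
x = np.arange(n) * L / n
u = np.sin(3 * x)

# angular wavenumbers matching numpy's FFT ordering
k = np.fft.fftfreq(n, d=L / n) * 2 * np.pi
du = np.fft.ifft(1j * k * np.fft.fft(u)).real  # spectral derivative

# compare with the exact derivative d/dx sin(3x) = 3 cos(3x)
err = np.max(np.abs(du - 3 * np.cos(3 * x)))
print(err < 1e-10)  # True: exact up to floating point for band-limited u
```

The same idea in two dimensions turns the operators µ·∇ and ∇·Σ∇ into multiplications, which is what Proposition 1 below exploits.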

The following result shows that if the initial condition and the innovation process are in the space spanned by a finite number of Fourier functions, then the solution of the SPDE (1) remains in this space for all times and can be given in explicit form.

Proposition 1. Assume that the initial state and the innovation terms are of the form

ξK(0, s) = φ(s)′α(0),  εK(t, s) = φ(s)′ε(t),   (10)

where φ(s) = (φ1(s), . . . , φK(s))′, φj(s) is given in (9), and ε(t) is a K-dimensional Gaussian white noise independent of α(0) with

Cov(ε(t), ε(t′)) = δ_{t,t′} diag(f(kj)),   (11)

where f is a spectral density. Then the process ξK(t, s) = φ(s)′α(t), where the components αj(t) are given by

αj(t) = exp(hj t) αj(0) + ∫₀ᵗ exp(hj(t − s)) εj(s) ds,   (12)

with hj = −iµ′kj − k′jΣkj − ζ, is a solution of the SPDE in (1). For t → ∞, the influence of the initial condition, exp(hj t)αj(0), converges to zero, and the process ξK(t, s) converges to a stationary Gaussian process with mean zero and

Cov(ξK(t + ∆t, s), ξK(t, s′)) = φ(s)′ diag(−exp(hj ∆t) f(kj) / (hj + h*j)) φ(s′)*,

where ·* stands for complex conjugation.

Proof of Proposition 1. By (12), we have

∂ξK(t, s)/∂t = ∑_{j=1}^K α̇j(t) φj(s) = ∑_{j=1}^K (hj αj(t) + εj(t)) φj(s).

On the other hand, since the functions φj(s) = exp(ik′j s) are Fourier terms, differentiation in the physical space corresponds to multiplication in the spectral space:

µ·∇φj(s) = iµ′kj φj(s)   (13)

and

∇·Σ∇φj(s) = −k′jΣkj φj(s).   (14)

Therefore, by the definition of hj,

(−µ·∇ + ∇·Σ∇ − ζ) ∑_{j=1}^K αj(t) φj(s) = ∑_{j=1}^K hj αj(t) φj(s).

Together, we have

∂ξK(t, s)/∂t = (−µ·∇ + ∇·Σ∇ − ζ) ξK(t, s) + εK(t, s),

which proves the first part of the proposition. Since the real part of hj is negative, exp(hj t) → 0 for t → ∞. Moreover,

lim_{t→∞} Cov(αj(t + ∆t), αl(t)) = lim_{t→∞} exp(hj ∆t) δ_{j,l} f(kj) ∫₀ᵗ exp(−(hj + h*l)(t − s)) ds
                                 = −(exp(hj ∆t) / (hj + h*j)) δ_{j,l} f(kj),   (15)

and thus the last statement follows.

We assume that the forcing term ε(t, ·), and consequently also the solution ξ(t, ·), is stationary in space. Recall the Cramér representation for a stationary field ε(t, ·):

ε(t, s) = ∫ exp(ik′s) dεt(k),

where εt has orthogonal increments, Cov(dεt(k), dεt′(l)) = δ_{t,t′} δ_{k,l} f(k), and f is the spectral density of ε(t, ·) (see, e.g., Cramér and Leadbetter (1967)). This implies that we can approximate any stationary field, in particular also the one with a Whittle covariance function, by a finite linear combination of complex exponentials, and the covariance of ε(t) is a diagonal matrix as required in the proposition.

3.1 Approximation Bound

By passing to the limit K → ∞ such that both the wavenumbers kj cover the entire domain R² and the distance between neighboring wavenumbers goes to zero, we obtain from (8) the stationary (in space and time) solution with spectral density as in (3). In practice, if one uses the discrete Fourier transform (DFT), or its fast variant, the FFT, the wavenumbers are regularly spaced and the distance between them is fixed for all K (see below). This implies that the covariance function of an approximate solution is periodic, which is equivalent to assuming a rectangular domain being wrapped around a torus. Since in most applications the domain is fixed anyway, this is a reasonable assumption.

Based on the above considerations, we assume, in the following, that s ∈ [0, 1]² with periodic boundary condition, i.e., that [0, 1]² is wrapped on a torus. In practice, to avoid spurious periodicity, we can apply what is called "padding". This means that we take s ∈ [0, 0.5]² and then embed it in [0, 1]². As in the discrete Fourier transform, if we choose s ∈ [0, 1]², it follows that the spatial wavenumbers kj lie on the n × n grid given by Dn = {2π · (i, j) : −(n/2 − 1) ≤ i, j ≤ n/2} = {−2π(n/2 − 1), . . . , 2πn/2}², with n² = N = K, n being an even natural number. We then have the following convergence result.
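For illustration (ours, not code from the paper), numpy's FFT enumerates the same integer frequency grid in the equivalent form {−n/2, . . . , n/2 − 1}, which coincides with the set used above modulo n:

```python
import numpy as np

n = 4
# integer frequencies for n points on [0, 1]; numpy's FFT ordering
j = np.fft.fftfreq(n, d=1.0 / n)
k = 2 * np.pi * j                   # spatial wavenumbers 2*pi*j

print(sorted(j))                    # [-2.0, -1.0, 0.0, 1.0]
```

In two dimensions, the full grid is the Cartesian product of this set with itself, giving n² wavenumbers.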

Proposition 2. The approximation ξN(t, s) converges in law to the solution ξ(t, s) of the SPDE (1) with s ∈ [0, 1]² wrapped on a torus, and we have the bound

|C(t, s) − CN(t, s)| ≤ σ²_ξ − σ²_{ξN},   (16)

where C(t, s) and CN(t, s) denote the covariance functions of ξ(t, s) and ξN(t, s), respectively, and where σ²_ξ = C(0, 0) and σ²_{ξN} = CN(0, 0) denote the marginal variances of these two processes.

Proof of Proposition 2. Similarly as in (4), and due to k ∈ 2π · Z², it follows that the covariance function of ξ(t, s) is given by

C(t, s) = ∑_{k∈2π·Z²} ∫ f(ω, k) exp(itω) dω exp(is′k)
        = ∑_{k∈2π·Z²} f(k) (−exp(hk t) / (hk + h*k)) exp(is′k),   (17)

where hk = −iµ′k − k′Σk − ζ. From Proposition 1, we know that the approximate solution ξN(t, s) has the covariance function

CN(t, s) = ∑_{k∈Dn} f(k) (−exp(hk t) / (hk + h*k)) exp(is′k).   (18)

It follows that

|C(t, s) − CN(t, s)| = |∑_{k∈2π·Z²} f(k) (exp(hk t) / (hk + h*k)) (1 − 1_{k∈Dn}) exp(is′k)|
                     ≤ ∑_{k∈2π·Z²} f(k) (−1 / (hk + h*k)) (1 − 1_{k∈Dn})
                     = σ²_ξ − σ²_{ξN}.   (19)

Not surprisingly, this result tells us that the rate of convergence essentially depends on the smoothness properties of the process ξ(t, s), i.e., on how fast the spectrum decays. The smoother ξ(t, s), that is, the more variation is explained by low frequencies, the faster is the convergence of the approximation.
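The bound (16) is the spectral mass of the omitted wavenumbers. The following sketch (ours, using the Whittle spectrum (6) with arbitrary parameter values) shows the retained mass growing, and hence the bound shrinking, as the wavenumber grid is enlarged:

```python
import numpy as np

sigma2, rho0 = 1.0, 0.1   # made-up Whittle spectrum parameters

def kept_mass(n):
    # spectral mass over the n x n wavenumber grid {-(n/2 - 1), ..., n/2}
    j = np.arange(-n // 2 + 1, n // 2 + 1)
    kx, ky = np.meshgrid(2 * np.pi * j, 2 * np.pi * j)
    f = sigma2 / (2 * np.pi) ** 2 * (kx**2 + ky**2 + 1 / rho0**2) ** -2
    return f.sum()

m8, m32 = kept_mass(8), kept_mass(32)
print(m32 > m8)  # True: larger grids keep more mass, so the bound shrinks
```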

Note that there is a conceptual difference between the stationary solution of the SPDE (1) with s ∈ R² and the periodic one with s ∈ [0, 1]² wrapped on a torus. For the sake of notational simplicity, we have denoted both of them by ξ(t, s). The finite dimensional solution ξN(t, s) is an approximation to both of the above infinite dimensional solutions. The above convergence result, though, only holds true for the solution on the torus.

3.2 Real Fourier Functions and Discretization in Time and Space

In order to apply the model to real data, we have to discretize it. In the following, we consider the process ξ(t, s) on a regular grid of n × n = N spatial locations s1, . . . , sN in [0, 1]² and at equidistant time points t1, . . . , tT with ti − ti−1 = ∆. Note that these two assumptions can easily be relaxed, i.e., one can have irregular spatial observation locations and non-equidistant time points. The former can be achieved by adopting a data augmentation approach (see, for instance, Sigrist et al. (2012)) or by using an incidence matrix (see Section 4.2). The latter can be done by taking a time varying ∆.

For the sake of illustration, we have stated the results in the previous section using complex Fourier functions. However, when discretizing the model, one obtains a linear Gaussian state space model with a propagator matrix G that contains complex numbers, due to (13). To avoid this, we replace the complex terms exp(ik′j s) with real cos(k′j s) and sin(k′j s) functions. In other words, we use the real instead of the complex Fourier transform. The above results then still hold true, since for real valued data the real Fourier transform is equivalent to the complex one. For notational simplicity, we will drop the superscript "K" from ξK(t, s). The distinction between the approximation and the true solution is clear from the context.
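That the real transform loses nothing for real-valued data can be illustrated with numpy, whose rfft2 stores only the non-redundant half of the complex coefficients (our sketch, not code from the paper):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(8, 8))  # real-valued field

xc = np.fft.fft2(x)    # complex transform: 8 x 8 complex coefficients
xr = np.fft.rfft2(x)   # real-data transform: 8 x 5 complex coefficients

# the reduced set of coefficients reconstructs the field exactly
print(np.allclose(np.fft.irfft2(xr, s=x.shape), x))  # True
print(xr.shape)        # (8, 5)
```

The remaining coefficients of the full complex transform carry no extra information; they are determined by conjugate symmetry.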

Proposition 3. On the above specified discretized spatial and temporal domain and using the real Fourier transform, a stationary solution of the SPDE (1) is of the form

ξ(t_{i+1}) = Φ α(t_{i+1}),   (20)

α(t_{i+1}) = G α(t_i) + ε(t_{i+1}),  ε(t_{i+1}) ∼ N(0, Q),   (21)

with stacked vectors ξ(ti) = (ξ(ti, s1), . . . , ξ(ti, sN))′ and cosine and sine coefficients

α(ti) = (α^(c)_1(ti), . . . , α^(c)_4(ti), α^(c)_5(ti), α^(s)_5(ti), . . . , α^(c)_{K/2+2}(ti), α^(s)_{K/2+2}(ti))′,

where Φ applies the discrete, real Fourier transformation, G is a block diagonal matrix with 2 × 2 blocks, and Q is a diagonal matrix. These three matrices are defined as follows.

• Φ = [φ(s1), . . . , φ(sN)]′, where

φ(sl) = (φ^(c)_1(sl), . . . , φ^(c)_4(sl), φ^(c)_5(sl), φ^(s)_5(sl), . . . , φ^(c)_{K/2+2}(sl), φ^(s)_{K/2+2}(sl))′,

φ^(c)_j(sl) = cos(k′j sl),  φ^(s)_j(sl) = sin(k′j sl),

• [G]_{1:4,1:4} = diag(exp(−∆(k′jΣkj + ζ))),

[G]_{5:K,5:K} = diag(exp(−∆(k′jΣkj + ζ)) (cos(∆µ′kj) 1₂ − sin(∆µ′kj) J₂)), where

1₂ = (1 0; 0 1),  J₂ = (0 1; −1 0),   (22)

• Q = diag(f(kj) (1 − exp(−2∆(k′jΣkj + ζ))) / (2(k′jΣkj + ζ))).

In summary, at each time point t and spatial point s, the solution ξ(t, s) is the discrete real Fourier transform of the random coefficients α(t),

ξ(t, s) = ∑_{j=1}^{4} α^(c)_j(t) φ^(c)_j(s) + ∑_{j=5}^{K/2+2} (α^(c)_j(t) φ^(c)_j(s) + α^(s)_j(t) φ^(s)_j(s)) = φ(s)′α(t),   (23)

and the Fourier coefficients α(t) evolve dynamically over time according to the vector autoregression in (21). The above equations (20) and (21) form a linear Gaussian state space model with parametric propagator matrix G and innovation covariance matrix Q, the parametrization being determined by the corresponding SPDE.

Proof of Proposition 3. Using

µ·∇φ^(c)_j(s) = −µ′kj φ^(s)_j(s),  µ·∇φ^(s)_j(s) = µ′kj φ^(c)_j(s),

∇·Σ∇φ^(c)_j(s) = −k′jΣkj φ^(c)_j(s),  ∇·Σ∇φ^(s)_j(s) = −k′jΣkj φ^(s)_j(s),

and the same arguments as in the proof of Proposition 1, it follows that the continuous time solution is of the form (23). For each pair of cosine and sine coefficients αj(t) = (α^(c)_j(t), α^(s)_j(t))′, we have

αj(t) = e^{Hj t} αj(0) + ∫₀ᵗ e^{Hj(t−s)} εj(s) ds,   (24)

where

Hj = ( −k′jΣkj − ζ   −µ′kj
        µ′kj          −k′jΣkj − ζ ).

Now Hj can be written as

Hj = (−k′jΣkj − ζ) 1₂ − µ′kj J₂,

where

1₂ = (1 0; 0 1),  J₂ = (0 1; −1 0).

Since 1₂ and J₂ commute, we have

e^{Hj t} = exp(−t(k′jΣkj + ζ) 1₂) exp(−t µ′kj J₂)
         = exp(−t(k′jΣkj + ζ)) (cos(tµ′kj) 1₂ − sin(tµ′kj) J₂).

Further, the covariance of the stochastic integral in (24) is given by

∫₀ᵗ e^{Hj(t−s)} f(kj) e^{H′j(t−s)} ds = ∫₀ᵗ f(kj) exp(−2(k′jΣkj + ζ)(t − s)) 1₂ ds
                                      = f(kj) ((1 − exp(−2(k′jΣkj + ζ)t)) / (2(k′jΣkj + ζ))) 1₂.

Discretizing in time and space then leads to the result in (20) and (21).

Figure 2. Illustration of spatial wavenumbers for the two-dimensional discrete real Fourier transform with $n^2 = 400$ grid points. The axes show $k_x/2\pi$ and $k_y/2\pi$.

The discrete complex Fourier transform uses $n^2$ different wavenumbers $k_j$, each having a corresponding Fourier term $\exp(ik_j's)$. The real Fourier transform, on the other hand, uses $n^2/2 + 2$ different wavenumbers, where four of them have only a cosine term and the others each have sine and cosine terms. This follows from the fact that, for real data, certain coefficients of the complex transform are the complex conjugates of other coefficients. For technical details on the real Fourier transform, we refer to Dudgeon and Mersereau (1984), Borgman et al. (1984), Royle and Wikle (2005), and Paciorek (2007). Figure 2 illustrates an example of the spatial wavenumbers, with $n^2 = 20 \times 20 = 400$ grid points. The dots with a circle represent the wavenumbers actually used in the real Fourier transform, and the red crosses mark the wavenumbers having only a cosine term. Note that in (23) we choose to order the spatial wavenumbers such that the first four spatial wavenumbers correspond to the cosine-only terms.

To get an idea of how the basis functions $\cos(k_j's)$ and $\sin(k_j's)$ look, we plot in Figure 3 twelve low-frequency basis functions corresponding to the six spatial frequencies closest to the origin 0 (see Figure 2).

Figure 3. Illustration of two-dimensional Fourier basis functions used in the discrete real Fourier transform with $n^2 = 400$. On the x- and y-axis are the coordinates of $s$.

Further, Figure 4 shows an example of a propagator matrix G when n = 4, i.e., when sixteen ($4^2$) spatial basis functions are used. The upper left 4 × 4 diagonal matrix corresponds to the cosine-only frequencies. The following 2 × 2 blocks correspond to wavenumbers with cosine-sine pairs.

Figure 4. Illustration of propagator matrix G. 16 real Fourier functions are used (n = 4).


3.3 Remarks on Finite Differences

Another approach to solving PDEs or SPDEs such as the one in (1) consists of using a discretization such as finite differences. Stroud et al. (2010) use finite differences to solve an advection-diffusion PDE. Other examples are Wikle (2003), Xu and Wikle (2007), Duan et al. (2009), Malmberg et al. (2008), and Zheng and Aukema (2010). The finite difference approximation, however, has several disadvantages. First, each spatial discretization effectively implies an interaction structure between temporal and spatial correlation. In other words, as Xu et al. (2005) state, the discretization effectively suggests a knowledge of the scale of interaction, lagged in time. Usually, however, this space-time covariance interaction structure is not known. Furthermore, numerical stability conditions need to be fulfilled in order for the approximate solution to be meaningful. Since these conditions depend on the values of the unknown parameters, one can run into problems.

In addition, computational tractability is an issue. In fact, we have tried to solve the SPDE in (1) using finite differences as described in the following. A finite difference approximation of (1) leads to a vector autoregressive model with a sparse propagator matrix determined by the discretization. The innovation term ε can be approximated using a Gaussian Markov random field with a sparse precision matrix (see Lindgren et al. (2011)). Even though the propagator and the precision matrices of the innovations are sparse, we have run into a computational bottleneck when using the Forward Filtering Backward Sampling (FFBS) algorithm (Carter and Kohn 1994; Frühwirth-Schnatter 1994) for fitting the model. The basic problem is that the Kalman gain is eventually a dense matrix. Alternative sampling schemes like the information filter (see, e.g., Anderson and Moore (1979) and Vivar and Ferreira (2009)) did not solve the problem either. However, future research on this topic might come up with solutions.

4 Computationally Efficient Statistical Inference

As said in the introduction, when using a naive approach, the computational costs for a spatio-temporal model with T time points and N spatial points equal O((NT)³). Using the Kalman filter or the Forward Filtering Backward Sampling (FFBS) algorithm (Carter and Kohn 1994; Frühwirth-Schnatter 1994), depending on what is needed, these costs are reduced to O(TN³), which, generally, is still too high for large data sets. In the following, we show how Bayesian and frequentist inference can be done efficiently in O(TN log N) operations. In the spectral space, the costs of the algorithms grow linearly in the dimension TN, which means that the total computational costs are dominated by the costs of the fast Fourier transform (FFT) (Cooley and Tukey 1965), which are O(TN log N).

As is often done in a statistical model, we add a non-structured Gaussian term ν(t_{i+1}, s) ∼ N(0, τ²), iid, to (20) to account for small-scale variation and/or measurement errors. In geostatistics, this term is called the nugget effect. Denoting the observations at time t_i by w(t_i), we then have the following linear Gaussian state space model:

$$w(t_{i+1}) = \Phi\alpha(t_{i+1}) + \nu(t_{i+1}), \qquad \nu(t_{i+1}) \sim N(0, \tau^2\mathbf{1}_N), \qquad (25)$$
$$\alpha(t_{i+1}) = G\alpha(t_i) + \varepsilon(t_{i+1}), \qquad \varepsilon(t_{i+1}) \sim N(0, Q).$$

Note that ξ(t_{i+1}) = Φα(t_{i+1}). As mentioned before, irregular spatial data can be modeled by adopting a data augmentation approach (see Sigrist et al. (2012)) or by using an incidence matrix (see Section 4.2). For the sake of simplicity, a zero mean was assumed. Extending the model by including covariates in a regression term is straightforward. Furthermore, we assume normality. The model can be easily generalized to allow for data not following a Gaussian distribution. For instance, this can be done by including it in a hierarchical Bayesian model (HBM) (Wikle et al. 1998) and specifying a non-Gaussian distribution for w|ξ. A frequentist approach is then typically no longer feasible. But approximate posterior probabilities can still be computed using, for instance, simulation based methods such as Markov chain Monte Carlo (MCMC) (see, e.g., Gilks et al. (1996) or Robert and Casella (2004)). An additional advantage of HBMs is that these models can be extended, for instance, to account for temporal nonstationarity by letting one or several parameters vary over time.

4.1 Kalman Filtering and Backward Sampling in the Spectral Space

In a frequentist context, it is crucial that one is able to evaluate the likelihood with a reasonable computational effort, and when doing Bayesian inference, one needs to be able to simulate efficiently from the full conditional of the latent process [ξ|·] or, equivalently, of the Fourier coefficients [α|·]. Below, we show how both these tasks can be done in the spectral space in linear time, i.e., using O(TN) operations. For transforming between the physical and the spectral space, one can use the FFT, which requires O(TN log N) operations. We start with the spectral version of the Kalman filter. Its output is used both for evaluating the log-likelihood and for simulating from the full conditional of the coefficients α.

Algorithm 1 Spectral Kalman filter

Input: T, w, G, τ², Q, H
Output: forecast and filter means $m_{t_i|t_{i-1}}$, $m_{t_i|t_i}$ and covariance matrices $R_{t_i|t_i}$, $R_{t_i|t_{i-1}}$, i = 1, …, T

  $m_{t_0|t_0} = 0$
  $R_{t_0|t_0} = Q$
  for i = 1 to T do
    $m_{t_i|t_{i-1}} = G\,m_{t_{i-1}|t_{i-1}}$
    $R_{t_i|t_{i-1}} = Q + R_{t_{i-1}|t_{i-1}}H$
    $R_{t_i|t_i} = \left(\tau^{-2}\mathbf{1}_N + R_{t_i|t_{i-1}}^{-1}\right)^{-1}$
    $m_{t_i|t_i} = m_{t_i|t_{i-1}} + \tau^{-2}R_{t_i|t_i}\left(w(t_i) - m_{t_i|t_{i-1}}\right)$
  end for

Algorithm 1 shows the Kalman filter in the spectral space. For the sake of simplicity, we assume that the initial distribution equals the innovation distribution. The spectral Kalman filter has as input the Fourier transform of w = (w(t₁)′, …, w(t_T)′)′, the diagonal matrix H given by
$$[H]_{1:4,\,1:4} = \mathrm{diag}\left(\exp\left(-2\Delta(k_j'\Sigma k_j + \zeta)\right)\right), \qquad [H]_{5:N,\,5:N} = \mathrm{diag}\left(\exp\left(-2\Delta(k_j'\Sigma k_j + \zeta)\right)\mathbf{1}_2\right), \qquad (26)$$

and other parameters that characterize the SPDE model. It returns forecast and filter means $m_{t_i|t_{i-1}}$ and $m_{t_i|t_i}$ and covariance matrices $R_{t_i|t_i}$ and $R_{t_i|t_{i-1}}$, i = 1, …, T, respectively. Note that we follow the notation of Künsch (2001). Since the matrices Q and H are diagonal, the covariance matrices $R_{t_i|t_i}$ and $R_{t_i|t_{i-1}}$ are also diagonal. Note that the matrix notation in Algorithm 1 is used solely for illustrative purposes. In practice, matrix-vector products ($G\,m_{t_{i-1}|t_{i-1}}$), matrix multiplications ($R_{t_{i-1}|t_{i-1}}H$), and matrix inversions are not calculated with general purpose algorithms but elementwise, since all matrices are diagonal. It follows that the computational costs for this algorithm are O(TN).

The derivation of this algorithm follows from the classical Kalman filter (see, e.g., Künsch (2001)) using $\Phi'\Phi = \mathbf{1}_N$, $GR_{t_{i-1}|t_{i-1}}G' = R_{t_{i-1}|t_{i-1}}GG'$, and the fact that $GG' = H$. The first equation holds true due to the orthonormality of the discrete Fourier transform. The second equation follows from the fact that G is 2 × 2 block diagonal and that $R_{t_{i-1}|t_{i-1}}$ is diagonal with the diagonal entries being equal for each cosine-sine pair. The last equation holds true as shown in the following. Being obvious for the first four frequencies, we consider the 2 × 2 diagonal blocks of cosine-sine pairs:

$$[G]_{(2l-5):(2l-4),\,(2l-5):(2l-4)}\,[G]'_{(2l-5):(2l-4),\,(2l-5):(2l-4)}$$
$$= \exp\left(-2\Delta(k_j'\Sigma k_j + \zeta)\right)\left(\cos(\Delta\mu'k_j)\,\mathbf{1}_2 - \sin(\Delta\mu'k_j)\,J_2\right)\left(\cos(\Delta\mu'k_j)\,\mathbf{1}_2 - \sin(\Delta\mu'k_j)\,J_2\right)'$$
$$= \exp\left(-2\Delta(k_j'\Sigma k_j + \zeta)\right)\left(\cos(\Delta\mu'k_j)^2 + \sin(\Delta\mu'k_j)^2\right)\mathbf{1}_2,$$
$l = 5, \ldots, N/2+2$, which equals (26). In the last equation we have used $J_2' = -J_2$ and $J_2^2 = -\mathbf{1}_2$.
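The identity used in the last step can be checked numerically. The following small sketch (our own illustration, not from the paper) verifies that a damped-rotation block times its transpose is a multiple of the identity:

```python
import numpy as np

# Check that (cos(a) 1_2 - sin(a) J_2)(cos(a) 1_2 - sin(a) J_2)' = 1_2,
# which relies on J_2' = -J_2 and J_2^2 = -1_2.
I2 = np.eye(2)
J2 = np.array([[0.0, 1.0], [-1.0, 0.0]])
a = 0.7  # arbitrary value playing the role of Delta * mu'k_j
B = np.cos(a) * I2 - np.sin(a) * J2
print(np.allclose(B @ B.T, I2))  # True
```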

Based on the Kalman filter, the log-likelihood is calculated as (see, e.g., Shumway and Stoffer (2000))
$$\ell = -\frac{1}{2}\sum_{i=1}^{T}\left(\log\left|R_{t_i|t_{i-1}} + \tau^2\mathbf{1}_N\right| + \left(w(t_i) - m_{t_i|t_{i-1}}\right)'\left(R_{t_i|t_{i-1}} + \tau^2\mathbf{1}_N\right)^{-1}\left(w(t_i) - m_{t_i|t_{i-1}}\right)\right) - \frac{TN}{2}\log(2\pi).$$
Since the forecast covariance matrices $R_{t_i|t_{i-1}}$ are diagonal, calculation of their determinants and their inverses is trivial, and the computational costs are again O(TN).
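Because all covariance matrices are diagonal, the filter recursions and the log-likelihood need only elementwise operations plus one block-diagonal matrix-vector product per time step. The following is a minimal sketch under these assumptions (diagonal Q and H, identity observation matrix in the spectral space); the function and variable names are our own, not the paper's code.

```python
import numpy as np

def spectral_kalman_loglik(w_tilde, G, q, h, tau2):
    """Spectral Kalman filter (Algorithm 1) with log-likelihood evaluation.

    Diagonal matrices are stored as vectors: `q` = diag(Q), `h` = diag(H)
    = diag(GG'). `w_tilde` is the T x N array of Fourier-transformed
    observations. G is block diagonal, so G @ m costs O(N) in principle
    (dense here for simplicity). Illustrative sketch only.
    """
    T, N = w_tilde.shape
    m = np.zeros(N)          # filter mean m_{t0|t0}
    r = q.copy()             # filter variances diag(R_{t0|t0}) = diag(Q)
    loglik = 0.0
    for i in range(T):
        m_pred = G @ m                       # m_{ti|ti-1}
        r_pred = q + r * h                   # diag(R_{ti|ti-1})
        resid = w_tilde[i] - m_pred
        s = r_pred + tau2                    # innovation variances
        loglik += -0.5 * (np.sum(np.log(s)) + np.sum(resid ** 2 / s)
                          + N * np.log(2 * np.pi))
        r = 1.0 / (1.0 / tau2 + 1.0 / r_pred)    # diag(R_{ti|ti})
        m = m_pred + r * resid / tau2            # m_{ti|ti}
    return loglik
```

In a frequentist setting this function would be handed to a numerical optimizer over the SPDE parameters; in the Bayesian algorithm it supplies the likelihood values needed for the Metropolis-Hastings acceptance ratio.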

Algorithm 2 Spectral backward sampling

Input: T, G, Q, H, $m_{t_i|t_{i-1}}$, $m_{t_i|t_i}$, $R_{t_i|t_i}$, $R_{t_i|t_{i-1}}$, i = 1, …, T
Output: a sample α*(t₁), …, α*(t_T) from [α|·]

  $\alpha^*(t_T) = m_{t_T|t_T} + \left(R_{t_T|t_T}\right)^{1/2}n_T$, $\quad n_T \sim N(0, \mathbf{1}_N)$
  for i = T − 1 to 1 do
    $\tilde{m}_{t_i} = m_{t_i|t_i} + R_{t_i|t_i}\,R_{t_{i+1}|t_i}^{-1}\,G'\left(\alpha^*(t_{i+1}) - m_{t_{i+1}|t_i}\right)$
    $\tilde{R}_{t_i} = \left(Q^{-1}H + R_{t_i|t_i}^{-1}\right)^{-1}$
    $\alpha^*(t_i) = \tilde{m}_{t_i} + \left(\tilde{R}_{t_i}\right)^{1/2}n_i$, $\quad n_i \sim N(0, \mathbf{1}_N)$
  end for

As said, in a Bayesian context, the main difficulty consists in simulating from the full conditional of the latent coefficients [α|·]. After running the Kalman filter, this can be done with a backward sampling step. Together, these two algorithms are known as Forward Filtering Backward Sampling (FFBS). Again, backward sampling is computationally very efficient in the spectral space, with costs being O(TN). Algorithm 2 shows the backward sampling algorithm in the spectral space. The backward covariance matrices are diagonal, which makes their Cholesky decomposition trivial.
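The backward sampling step can be sketched in the same elementwise style. The inputs `means_pred`, `means_filt`, `r_pred`, `r_filt` are assumed to come from a spectral Kalman filter pass; the names and array layout are our own illustration, not the paper's code.

```python
import numpy as np

def spectral_backward_sample(G, q, h, means_pred, means_filt,
                             r_pred, r_filt, rng):
    """Backward sampling step of the spectral FFBS (Algorithm 2) in O(TN).

    `means_pred[i]`, `r_pred[i]` hold m_{ti|ti-1} and diag(R_{ti|ti-1});
    `means_filt[i]`, `r_filt[i]` hold m_{ti|ti} and diag(R_{ti|ti}).
    `q` = diag(Q), `h` = diag(H). Illustrative sketch only.
    """
    T, N = means_filt.shape
    alpha = np.empty((T, N))
    # draw the last state from its filter distribution
    alpha[-1] = means_filt[-1] + np.sqrt(r_filt[-1]) * rng.standard_normal(N)
    for i in range(T - 2, -1, -1):
        # backward mean: m_{ti|ti} + R_{ti|ti} R^{-1}_{ti+1|ti} G'(α* - m_{ti+1|ti})
        gain = r_filt[i] / r_pred[i + 1]
        m = means_filt[i] + gain * (G.T @ (alpha[i + 1] - means_pred[i + 1]))
        # backward variances: diag((Q^{-1}H + R^{-1}_{ti|ti})^{-1})
        r = 1.0 / (h / q + 1.0 / r_filt[i])
        alpha[i] = m + np.sqrt(r) * rng.standard_normal(N)
    return alpha
```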


4.2 Dimension Reduction and Incidence Matrix

If desired, the total computational costs can be additionally alleviated by using a reduced-dimensional Fourier basis with K ≪ N. This means that one includes only certain frequencies, typically low ones. The spectral filtering and sampling algorithms then require O(KT) operations. For using the FFT, the frequencies being excluded are simply set to zero. When the observation process is low-dimensional, as is the case in our application, one can include an incidence matrix I that relates the process on the grid to the observation locations. Instead of (25), the model is then
$$w(t_{i+1}) = I\Phi\alpha(t_{i+1}) + \nu(t_{i+1}), \qquad \nu(t_{i+1}) \sim N(0, \tau^2\mathbf{1}_N). \qquad (27)$$
The FFT cannot be used anymore, and the total computational costs are O(K³T) due to the traditional FFBS.
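An incidence matrix can be as simple as a 0/1 matrix whose rows pick out the grid cells corresponding to the observation stations. A minimal sketch (the nearest-grid-point assignment and the function name are our own illustrative choices):

```python
import numpy as np

def incidence_matrix(grid_points, station_points):
    """Build a 0/1 incidence matrix I relating a process on a grid to
    irregular station locations: row r selects the grid point nearest to
    station r. Illustrative sketch only."""
    I = np.zeros((len(station_points), len(grid_points)))
    for r, s in enumerate(station_points):
        j = np.argmin(np.sum((grid_points - s) ** 2, axis=1))
        I[r, j] = 1.0
    return I
```

With such an I, the observation equation (27) maps the K-dimensional spectral state to the (much smaller) number of stations via IΦα.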

5 Postprocessing Precipitation Forecasts

Nowadays, numerical weather prediction (NWP) models are capable of producing predictive fields at high spatial and temporal resolution. Statistical postprocessing serves two purposes. First, probabilistic predictions are obtained in cases where only deterministic ones are available. Further, even if "probabilistic" forecasts in the form of ensembles (Palmer 2002; Gneiting and Raftery 2005) are available, they are typically not calibrated, i.e., they are often underdispersed (Hamill and Colucci 1997). The goal of postprocessing is then to obtain calibrated and sharp predictive distributions (see Gneiting et al. (2007a) for a definition of calibration and sharpness). In the case of precipitation, the need for postprocessing is particularly strong since, despite their importance, precipitation forecasts are still not as accurate as forecasts for other meteorological quantities (Applequist et al. 2002; Stensrud and Yussouf 2007).

Several approaches for postprocessing precipitation forecasts have been proposed, including linear regression (Antolik 2000), logistic regression (Hamill et al. 2004), quantile regression (Bjørnar Bremnes 2004; Friederichs and Hense 2007), hierarchical models based on prior climatic distributions (Krzysztofowicz and Maranzano 2006), neural networks (Ramírez et al. 2005), and binning techniques (Yussouf and Stensrud 2006). Sloughter et al. (2007) propose a two-stage model to postprocess precipitation forecasts. Berrocal et al. (2008) extended the model of Sloughter et al. (2007) by accounting for spatial correlation. Kleiber et al. (2011) present a similar model that includes ensemble predictions and accounts for spatial correlation.

Except for the last two references, spatial correlation is typically not modeled in postprocessing precipitation forecasts, and none of the aforementioned models explicitly accounts for spatio-temporal dependencies. However, for temporally and spatially highly resolved data, it is necessary to account for correlation in space and time. First, spatio-temporal correlation is important, for instance, for predicting precipitation accumulation over space and time with accurate estimates of precision. Further, it is likely that errors of NWP models exhibit structured behaviour over space and time. A spatio-temporal model can capture the structured dynamics of this error term and extrapolate a spatial error over time.


Figure 5. Locations of grid points at which predictions are obtained (50 × 100 grid of small dots) and observation stations (bold dots). Both axes are in km using the Swiss coordinate system (CH1903).

5.1 Data

We have precipitation predictions from an NWP model called COSMO-2, a high-resolution model with a grid spacing of 2.2 km that is run by MeteoSwiss as part of the Consortium for Small-scale Modelling (COSMO) (see, e.g., Steppeler et al. 2003). The NWP model produces deterministic forecasts once a day starting at 0:00 UTC. Predictions are made for eight consecutive time periods corresponding to 24 h ahead. In the following, let y_F(t, s) denote the forecast for time t and site s made at 0:00 UTC of the same day. We consider a rectangular region in northern Switzerland shown in Figure 5. The grid at which predictions are made is of size 50 × 100. Precipitation is observed at 32 stations over northern Switzerland. Figure 5 also shows the locations of the observation stations. We use three-hourly data from the beginning of December 2008 until the end of March 2009. To illustrate the data, in Figure 6, observed precipitation at one station and the equally weighted areal average precipitation are plotted versus time. We will use the first three months, containing 720 time points, for fitting, and the last month is left aside for evaluation.

The NWP model forecasts are deterministic, and ensembles are not available in our case. However, the extension to using an ensemble instead of just one member is straightforward. One can include all the ensemble members in the regression part of the model, or, in the case of exchangeable members, one can use the location and the spread of the ensemble.

5.2 Precipitation Model for Postprocessing

The model presented in the following is a Bayesian hierarchical model (BHM). It uses the spatio-temporal process presented above at the process level. At the data stage, a model adapted to the nature of precipitation is used.

Figure 6. Precipitation versus time, for one station (Wädenswil) and averaged over all stations.

A characteristic feature of precipitation is that its distribution consists of a discrete component, indicating occurrence of precipitation, and a continuous one, determining the amount (see Figure 6). As a consequence, there are two basic statistical modeling approaches. The continuous and the discrete part are either modeled separately (Coe and Stern 1982; Wilks 1999) or together (Bell 1987; Wilks 1990; Bardossy and Plate 1992; Hutchinson 1995; Sanso and Guenni 2004). See, e.g., Sigrist et al. (2012) for a more extensive overview of precipitation models and for further details on the data model used below. Originally, the approach presented in the following goes back to Tobin (1958), who analyzed household expenditure on durable goods. For modeling precipitation, Stidd (1973) took up this idea and modified it by including a power transformation for the non-zero part so that the model can account for skewness.

It is assumed that rainfall y(t, s) at time t at site s ∈ ℝ² depends on a latent Gaussian variable w(t, s) through
$$y(t, s) = \begin{cases} 0, & \text{if } w(t, s) \le 0, \\ w(t, s)^{\lambda}, & \text{if } w(t, s) > 0, \end{cases} \qquad (28)$$
where λ > 0. A power transformation is needed since precipitation amounts are skewed and do not follow a truncated normal distribution. The latent Gaussian process w(t, s) is interpreted as a precipitation potential.

The mean of the Gaussian process w(t, s) is assumed to depend linearly on spatio-temporal covariates x(t, s) ∈ ℝᵏ. As shown below, this mean term basically consists of the NWP forecasts. Variation that is not explained by the linear term is modeled using the Gaussian process ξ(t, s) and the unstructured term ν(t, s) for microscale variability and measurement errors. The spatio-temporal process ξ(t, s) has two functions. First, it captures systematic errors of the NWP in space and time and can extrapolate them over time. Second, it accounts for structured variability so that the postprocessed forecast is probabilistic and its distribution sharp and calibrated.
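The censoring-and-power-transform in (28) is straightforward to express in code. A minimal sketch (the function name and the simulation setup are our own illustration):

```python
import numpy as np

def tobit_transform(w, lam):
    """Transformed Tobit link (28): rainfall is zero when the latent
    potential w is non-positive, and w^lambda otherwise (lambda > 0)."""
    return np.maximum(w, 0.0) ** lam

# A latent Gaussian potential yields a point mass at zero (dry periods)
# and a right-skewed continuous part (wet periods).
rng = np.random.default_rng(1)
w = rng.normal(loc=-0.2, scale=1.0, size=100_000)
y = tobit_transform(w, lam=1.4)
```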

To be more specific concerning the covariates, similarly as in Berrocal et al. (2008), we include a transformed variable $y_F(t, s)^{1/\lambda}$ and an indicator variable $\mathbf{1}_{\{y_F(t,s)=0\}}$, which equals 1 if $y_F(t, s) = 0$ and 0 otherwise. λ is determined by fitting the transformed Tobit model as in (28) to the marginal distribution of the rain data, ignoring any spatio-temporal correlation. In doing so, we obtain λ ≈ 1.4. $y_F(t, s)^{1/\lambda}$ is centered around zero by subtracting its overall mean $\overline{y_F^{1/\lambda}}$ in order to reduce posterior correlations. Thus, w(t, s) equals
$$w(t, s) = b_1\left(y_F(t, s)^{1/\lambda} - \overline{y_F^{1/\lambda}}\right) + b_2\,\mathbf{1}_{\{y_F(t,s)=0\}} + \xi(t, s) + \nu(t, s). \qquad (29)$$
An intercept is not included since the first Fourier term is constant in space. In our case, including an intercept term results in weak identifiability, which slows down the convergence of the MCMC algorithm used for fitting. Note that in situations where the mean is large, it is advisable to include an intercept, since the coefficient of the first Fourier term is constrained by the joint prior on α. Further, unidentifiability is unlikely to be a problem in these cases.

Concerning the spatio-temporal process ξ(t, s), we apply padding. This means that we embed the 50 × 100 grid in a rectangular 200 × 200 grid. A brief prior investigation showed that the range parameters are relatively large in comparison to the spatial domain, and padding is therefore used in order to avoid spurious correlations due to periodicity. Further, we use an incidence matrix I as in (27) to relate the process on the grid to the observation stations. As argued in Section 4.2, this then requires that we use a reduced-dimensional Fourier expansion, i.e., instead of using N = 200² basis functions, we only use K ≪ N low-frequency Fourier terms. This is done for the following reasons. First, the observation stations are relatively scarce, which means that there is probably no information on high spatial frequencies of the NWP error and, consequently, the low frequencies are the important ones. Further, the covariates, i.e., the NWP forecasts, are not available on the extended 200 × 200 domain, which they would need to be if we did not use an incidence matrix and modeled the process on the full grid.

Concerning prior distributions, we assign weakly informative, and in some cases improper, priors to the parameters θ = (ρ₀, σ², ζ, ρ₁, γ, α, µₓ, µᵧ, τ²)′. b and λ have improper, locally uniform priors on ℝ and ℝ₊, respectively. Based on Gelman (2006), we use improper priors for σ² and τ² that are uniform on the standard deviation scales σ and τ, respectively. Further, the drift parameters µₓ and µᵧ have uniform priors on [−0.5, 0.5], and α has a uniform prior on [0, π/2]. The anisotropy parameter γ has a uniform prior on the log scale on the interval [0.1, 10]. γ is restricted to [0.1, 10] since stronger anisotropy does not seem reasonable. The range parameters ρ₀ and ρ₁ as well as the damping parameter ζ are assigned improper, locally uniform priors on ℝ₊. In summary,
$$P[b, \lambda, \boldsymbol{\theta}] \propto \frac{1}{\sqrt{\sigma^2}\sqrt{\tau^2}\,\gamma}\,\mathbf{1}_{\{-0.5\le\mu_x,\mu_y\le0.5\}}\,\mathbf{1}_{\{0\le\alpha\le\pi/2\}}\,\mathbf{1}_{\{\lambda,\rho_0,\rho_1,\zeta,\sigma^2,\tau^2\ge0\}}\,\mathbf{1}_{\{0.1\le\gamma\le10\}}.$$

5.3 Fitting

Markov chain Monte Carlo (MCMC) is used to sample from the posterior distribution. To be more specific, we apply what Neal and Roberts (2006) call a Metropolis-within-Gibbs algorithm, which alternates between blocked Gibbs (Gelfand and Smith 1990) and Metropolis (Metropolis et al. 1953; Hastings 1970) sampling steps. For the random walk Metropolis steps, we use an adaptive algorithm (Roberts and Rosenthal 2009), meaning that the proposal covariances are successively estimated such that an optimal scaling is obtained with an acceptance rate between 0.2 and 0.3. See Roberts and Rosenthal (2001) for more information on optimal scaling for Metropolis-Hastings algorithms.

Our goal is to simulate from the posterior distribution [b, λ, θ, α, w|y], where y denotes the set of all observations. Note that since the latent process ξ is the Fourier transform of the coefficients α, knowing the posterior of α is equivalent to knowing the one of ξ. In the following, we use the notation "[y|·]" and "P[y|·]" to denote conditional distributions and densities, respectively. A Gibbs step is used to sample from [w|·]. To be more specific, w is only sampled for those values whose corresponding observations are censored at zero or missing, the full conditionals being truncated and regular Gaussian distributions, respectively. See Sigrist et al. (2012) for more details on this type of data augmentation approach. Next, we sample jointly from [b, θ, α|·] in a Metropolis-Hastings step. A proposal (b*, θ*, α*) is obtained by first sampling (b*, θ*) from a Gaussian distribution with the mean equaling the last value and an adaptively estimated proposal covariance matrix, and then sampling α* from [α|θ*, b*, ·] using the forward filtering backward sampling (FFBS) algorithm (Carter and Kohn 1994; Frühwirth-Schnatter 1994). Joint sampling is preferable to iteratively and separately sampling from b*, θ*, and α*, since these variables can be strongly dependent, which results in slow mixing. In particular, α and θ, as well as the fixed and random effects determined by b and α, respectively, can be strongly dependent. This problem is similar to the one observed when doing inference for diffusion models (see, e.g., Roberts and Stramer (2001) and Golightly and Wilkinson (2008)). It can be shown that the acceptance ratio for the joint proposal is
$$\min\left(1,\ \frac{P[b^*, \boldsymbol{\theta}^*\,|\,w^{(i)}]}{P[b^{(i)}, \boldsymbol{\theta}^{(i)}\,|\,w^{(i)}]}\right), \qquad (30)$$
where P[b, θ|w⁽ⁱ⁾] denotes the likelihood of (b, θ) given the current w⁽ⁱ⁾, and where (b*, θ*) and (b⁽ⁱ⁾, θ⁽ⁱ⁾) denote the proposed and the last values, respectively. We see that this acceptance ratio does not depend on α. Thus, the parameters b and θ are allowed to move faster in their parameter space. We recall that the value of the likelihood P[b, θ|w⁽ⁱ⁾] is obtained as a side product of the Kalman filter in the FFBS. Finally, for sampling from the full conditional [λ|·], a random walk Metropolis step is used. After a burn-in of 5,000 iterations, we use 100,000 samples from the Markov chain to characterize the posterior distribution. Convergence is monitored by inspecting trace plots.

5.4 Model Selection and Results

As said, we use a reduced-dimensional approach. The number of Fourier functions is determined based on the predictive performance for the 240 time points that were set aside. We start with models including only low spatial frequencies and successively add higher frequencies. In doing so, we only consider models that have the same resolution in each direction. For instance, we do not consider models that have higher-frequency spatial basis functions in the horizontal direction than in the vertical one.

In order to assess the performance of the predictions and to choose the number of basis functions to include, we use the continuous ranked probability score (CRPS) (Matheson and Winkler 1976). The CRPS is a strictly proper scoring rule (Gneiting and Raftery 2007) that assigns a numerical value to probabilistic forecasts and assesses calibration and sharpness simultaneously (Gneiting et al. 2007a). It is defined as
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty}\left(F(x) - \mathbf{1}_{\{y \le x\}}\right)^2 dx, \qquad (31)$$

where F is the predictive cumulative distribution, y is the observed realization, and 1 denotes an indicator function. If a sample y⁽¹⁾, …, y⁽ᵐ⁾ from F is available, the CRPS can be approximated by
$$\frac{1}{m}\sum_{i=1}^{m}\left|y^{(i)} - y\right| - \frac{1}{2m^2}\sum_{i,j=1}^{m}\left|y^{(i)} - y^{(j)}\right|. \qquad (32)$$
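The sample approximation (32) is a one-liner in code; a minimal sketch (the function name is our own):

```python
import numpy as np

def crps_sample(sample, y):
    """Approximate CRPS(F, y) from a sample y^(1), ..., y^(m) of F via (32);
    lower values indicate better probabilistic forecasts."""
    sample = np.asarray(sample, dtype=float)
    m = sample.size
    term1 = np.mean(np.abs(sample - y))
    term2 = np.sum(np.abs(sample[:, None] - sample[None, :])) / (2 * m ** 2)
    return term1 - term2

# For the two-point sample {0, 1} and observation 0.5, (32) gives
# 0.5 - 0.25 = 0.25.
print(crps_sample([0.0, 1.0], 0.5))  # 0.25
```

In practice one would feed the MCMC draws from the predictive distribution into `sample`; the O(m²) pairwise term is the price of the simple estimator.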

Ideally, one would run the full MCMC algorithm at each time point t ≥ 720, including all data up to that point, and obtain predictive distributions from this. Since this is rather time consuming, we make the following approximation. We assume that the posterior distribution of the "primary" parameters θ, b, and λ given y₁:ₜ = {y₁, …, yₜ} is the same for all t ≥ 720. That is, we neglect the additional information that the observations in March give about the primary parameters. Thus, the posterior distributions of the primary parameters are calculated only once, namely on the data set from December 2008 to February 2009. This assumption that the posterior of the primary parameters does not change with additional data may be questionable over longer time periods and when one moves away from the time period from which data is used to obtain the posterior distribution. But since all our data lies in the winter season, we think that this assumption is reasonable. If longer time periods are considered, one could use sliding training windows or model the primary parameters as non-stationary using a temporal evolution.

For each time point t ≥ 720, we make up to 8-step-ahead forecasts corresponding to 24 hours, i.e., we sample from the predictive distribution of y*ₜ₊ₖ, k = 1, …, 8, given y₁:ₜ = {y₁, …, yₜ}.

Figure 7. Comparison of different statistical models using the continuous ranked probability score (CRPS). On the left are CRPSs of station-specific forecasts ("Stationwise") and on the right are CRPSs of areal forecasts ("Areal"). K denotes the number of basis functions used in the model (K = 5, 9, 13, 21, 25, 29, 37, 45, 49). "Sep" denotes the separable model with K = 29. The unit of the CRPS is mm.

In Figure 7, the average CRPSs of the pointwise predictions and of the areal predictions are shown for the different statistical models. In the left plot, the mean is taken over all stations and lead times, whereas the areal version is an average over all lead times. This is done for the models with different numbers of basis functions as well as for a separable model, which is obtained by setting µ = 0 and Σ⁻¹ = 0. Models including only a few low-frequency Fourier terms perform worse. Then the CRPS decreases successively. The model including K = 29 Fourier functions performs best. After this, adding higher frequencies results in lower predictive performance. We interpret this result in the way that the observation data does not allow for resolving high frequencies. The separable model, with K = 29 Fourier terms, clearly performs worse than the model with a nonseparable covariance structure. Based on these findings, we decide to use the model with 29 cosine and sine functions.

Table 1. Posterior medians and 95% credible intervals.

                      Median     2.5 %      97.5 %
  ρ₀                  25.4       18.8       32.4
  σ²                  0.838      0.727      0.994
  ζ                   0.00655    0.000395   0.0156
  ρ₁                  48.8       42.1       57.1
  γ                   4.33       3.34       6.01
  α                   0.557      0.49       0.617
  µₓ                  6.73       0.688      12.9
  µᵧ                  -4.19      -8.55      -0.435
  τ²                  0.307      0.288      0.327
  y_F(t,s)^{1/λ}      0.448      0.414      0.481
  1{y_F(t,s)=0}       -0.422     -0.5       -0.344
  λ                   1.67       1.64       1.7

Table 1 shows posterior medians as well as 95% credible intervals for the different parameters. Note that the range parameters ρ₀ and ρ₁ as well as the drift parameters µₓ and µᵧ have been transformed back from the unit [0, 1] scale to the original km scale. The posterior median of the variance σ² of the innovations of the spatio-temporal process is around 0.8. Compared to this, the nugget variance of about 0.3 is smaller. For the innovation range parameter ρ₀, we obtain a value of about 25 km, and the range parameter ρ₁, which controls the amount of diffusion or, in other words, the amount of spatio-temporal interaction, is approximately 49 km. With γ and α being around 4 and 0.6, respectively, we observe anisotropy in the south-west to north-east direction. This is in line with the orography of the region, as the majority of the grid points lies between two mountain ranges: the Jura to the north-west and the Alps to the south-east. The drift points to the south-east, both parameters being rather small, though. Further, the damping parameter ζ has a posterior median of about 0.01.

Next, we compare the performance of the postprocessed forecasts with the ones fromthe NWP model. In addition to the temporal cross-validation, we do the following cross-validation in space and time. We first remove six randomly selected stations from the data,fit the latent process to the remaining stations, and evaluate the forecasts at the stations leftout. Concerning the primary parameters, i.e., all parameters except the latent process, we usethe posterior obtained from the full data including all stations. This is done for computationalsimplicity and since this posterior is not very sensitive when excluding a few stations. Sincethe NWP produces 8 step ahead predictions once a day, we only consider statistical forecasts

23

Table 2. Comparison of the NWP model and statistically postprocessed forecasts ('Stat PP') using the mean absolute error (MAE). 'Static' denotes the constant forecast obtained by using the most recently observed data. The unit of the MAE is mm.

              Stat PP   NWP     Static

Stationwise   0.359     0.485   0.594
Areal         0.303     0.387   0.489

starting at 0:00 UTC. This is in contrast to the above comparison of the different statistical models, for which 8 step ahead predictions were made at all time points and not just once per day. We use the mean absolute error (MAE) for evaluating the NWP forecasts. To be consistent, we also generate point forecasts from the statistical predictive distributions by using medians, and then calculate the MAE for these point forecasts. The results are reported in Table 2. For comparison, we also give the score for the static forecast obtained by using the most recently observed data. The postprocessed forecasts clearly perform better than the raw NWP forecasts. In addition, the postprocessed forecasts have the advantage that they provide probabilistic forecasts quantifying prediction uncertainty.
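The evaluation protocol above (leave six stations out, point forecasts via predictive medians, MAE against a static baseline) can be sketched as follows. All data here are synthetic stand-ins; the array shapes, the seed, and the way the ensemble is generated are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stations, n_times, n_draws = 30, 200, 500

# Synthetic "observations" and a synthetic predictive ensemble per time/station
# (in the paper these draws come from the fitted model's predictive distribution).
obs = rng.gamma(shape=1.0, scale=1.0, size=(n_times, n_stations))
pred_draws = obs[None] + rng.normal(0.0, 0.5, size=(n_draws, n_times, n_stations))

# Leave-stations-out split: six randomly selected stations held out for evaluation.
held_out = rng.choice(n_stations, size=6, replace=False)

# Point forecasts: medians of the predictive distribution.
point = np.median(pred_draws, axis=0)

# Static baseline: carry the most recently observed value forward one step.
static = np.vstack([obs[:1], obs[:-1]])

def mae(forecast, truth, stations):
    """Mean absolute error at the held-out stations (first step dropped)."""
    return np.mean(np.abs(forecast[1:, stations] - truth[1:, stations]))

print("Stat PP MAE:", mae(point, obs, held_out))
print("Static  MAE:", mae(static, obs, held_out))
```

Note that the median is the optimal point forecast under absolute-error loss, which is why it is paired with the MAE here.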

The statistical model produces a joint spatio-temporal predictive distribution that is spatially highly resolved. To illustrate the use of the model, we show several quantities in Figure 8. We consider the time point t = 760 and calculate predictive distributions over the next 24 hours. Predicted fields for the period t = 761, . . . , 768 from the NWP are shown in the top left corner, with pointwise medians obtained from the statistical forecasts to the right of them. This is a period during which the NWP predicts too much rainfall compared to the observed data (results not shown), and the figure shows how the statistical model corrects for this. For illustration, we also show one sample from the predictive distribution. To quantify prediction uncertainty, the difference between the third quartile and the median of the predictive distribution is plotted. These plots again show the growing uncertainty with increasing lead time. Other quantities of interest (not shown here) that can easily be obtained include probabilities of precipitation occurrence or various quantiles of the distribution.
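Given a sample-based predictive distribution on a grid, the panels of Figure 8 and the probabilities mentioned above all reduce to pointwise statistics of the draws. A short sketch with synthetic draws; the grid size, the distribution of the draws, and the 0.1 mm occurrence threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic draws from a predictive distribution on a ny-by-nx grid (in mm).
n_draws, ny, nx = 1000, 20, 25
draws_mm = np.maximum(rng.normal(loc=1.0, scale=1.5, size=(n_draws, ny, nx)), 0.0)

median_field = np.median(draws_mm, axis=0)                 # pointwise medians (panel b)
one_sample = draws_mm[0]                                   # one joint sample (panel c)
q3_minus_med = np.percentile(draws_mm, 75, axis=0) - median_field  # uncertainty (panel d)
p_precip = np.mean(draws_mm > 0.1, axis=0)                 # P(precipitation > 0.1 mm)

print(median_field.shape, one_sample.shape)
```

The single joint sample differs from the median field in that it preserves the spatial correlation structure of the predictive distribution, which pointwise summaries smooth away.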

6 CONCLUSION

We have presented a spatio-temporal model and corresponding efficient algorithms for doing statistical inference for large data sets. Instead of specifying a covariance function, we have proposed to use a Gaussian process defined through an SPDE. The SPDE was solved using Fourier functions, and we have given a bound on the precision of the approximate solution. In the spectral space, one can use computationally efficient statistical algorithms whose computational costs grow linearly with the dimension, the total computational costs being dominated by the fast Fourier transform. The space-time Gaussian process defined through the advection-diffusion SPDE has a nonseparable covariance structure and can be physically motivated. The model was applied to postprocessing of precipitation forecasts for northern Switzerland. The postprocessed forecasts clearly outperform the raw NWP predictions and, in addition, have the advantage that they quantify prediction uncertainty.
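The cost structure summarized above can be illustrated schematically: for a linear SPDE with constant coefficients, the propagator is diagonal in the Fourier basis, so one time step is an elementwise multiplication of the spectral coefficients, and only the transforms between physical and spectral space require the FFT. The propagator `G` below is a generic stable diagonal operator chosen for illustration, not the paper's exact advection-diffusion propagator.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 64  # grid points per axis; total dimension n * n
field = rng.normal(size=(n, n))

# Transform to spectral space: O(n^2 log n) via the FFT.
coeffs = np.fft.fft2(field)

# A generic stable diagonal propagator (illustrative; |G| <= 1 everywhere).
fx = np.fft.fftfreq(n)[:, None]
fy = np.fft.fftfreq(n)[None, :]
G = np.exp(-0.05 * (fx**2 + fy**2))

# One time step in spectral space: elementwise multiply, i.e. linear cost
# in the n^2 spectral coefficients.
coeffs_next = G * coeffs

# Back to physical space: again O(n^2 log n), so the FFTs dominate.
field_next = np.real(np.fft.ifft2(coeffs_next))
print(field_next.shape)
```

This is why, as stated above, the per-dimension cost of the statistical algorithms is linear while the overall cost is governed by the FFT.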

An interesting direction for further research would be to extend the SPDE based model to allow for spatial non-stationarity. Whether this can be done using some form of basis function


[Figure 8: four panels, each showing eight 3-hourly precipitation fields for 2010-03-06 from 00:00 to 21:00 UTC: (a) NWP, (b) Median, (c) One sample, (d) Quartile difference. Color scales in mm.]

Figure 8. Illustration of postprocessed spatio-temporal precipitation fields for the period t = 761, . . . , 768. The figure shows the NWP forecasts (a), pointwise medians of the predictive distribution (b), one sample from the predictive distribution (c), and the differences between the third quartile and the median of the predictive distribution (d). All quantities are in mm. Note that the color scales differ between panels.


expansion approach or by relying on a discretization method such as finite differences or finite volumes remains an open question. Since the operators of the SPDE are local, one can define the SPDE on general manifolds and, in particular, on the sphere (see, e.g., Lindgren et al. (2011)). It would be interesting to investigate to which extent spectral methods can still be used.

ACKNOWLEDGMENTS

We thank Vanessa Stauch from MeteoSchweiz for providing the data and for inspiring discussions.

References

Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions. Dover Publications, New York.

Anderson, B. D. O. and Moore, J. B. (1979). Optimal Filtering. Prentice-Hall, Englewood Cliffs, N.J.

Antolik, M. S. (2000). An overview of the National Weather Service's centralized statistical quantitative precipitation forecasts. Journal of Hydrology, 239(1-4):306–337.

Applequist, S., Gahrs, G. E., Pfeffer, R. L., and Niu, X.-F. (2002). Comparison of methodologies for probabilistic quantitative precipitation forecasting. Weather and Forecasting, 17:783–799.

Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Monographs on Statistics and Applied Probability. Chapman and Hall/CRC.

Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society Series B, 70(4):825–848.

Bárdossy, A. and Plate, E. (1992). Space-time model for daily rainfall using atmospheric circulation patterns. Water Resources Research, 28(5):1247–1259.

Bell, T. (1987). A space-time stochastic model of rainfall for satellite remote-sensing studies. Journal of Geophysical Research, 92(D8):9631–9643.

Berrocal, V. J., Raftery, A. E., and Gneiting, T. (2008). Probabilistic quantitative precipitation field forecasting using a two-stage spatial model. Annals of Applied Statistics, 2(4):1170–1193.

Bjørnar Bremnes, J. (2004). Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Monthly Weather Review, 132:338–347.

Borgman, L., Taheri, M., and Hagan, R. (1984). Three-dimensional frequency-domain simulations of geological variables. In Verly, G., editor, Geostatistics for Natural Resources Characterization, pages 517–541. D. Reidel.


Brown, P. E., Kåresen, K. F., Roberts, G. O., and Tonellato, S. (2000). Blur-generated non-separable space-time models. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 62(4):847–860.

Carter, C. K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81(3):541–553.

Coe, R. and Stern, R. (1982). Fitting models to daily rainfall data. Journal of Applied Meteorology, 21(7):1024–1031.

Cooley, J. W. and Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297–301.

Cramér, H. and Leadbetter, M. R. (1967). Stationary and Related Stochastic Processes. Sample Function Properties and Their Applications. John Wiley & Sons Inc., New York.

Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association, 94(448):1330–1340.

Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):209–226.

Cressie, N. and Wikle, C. K. (2011). Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.

Duan, J. A., Gelfand, A. E., and Sirmans, C. (2009). Modeling space-time data using stochastic differential equations. Bayesian Analysis, 4(4):733–758.

Dudgeon, D. E. and Mersereau, R. M. (1984). Multidimensional Digital Signal Processing. Prentice-Hall.

Eidsvik, J., Shaby, B., Reich, B., Wheeler, M., and Niemi, J. (2012). Estimation and prediction in spatial models with block composite likelihoods using parallel computing. Technical report, Department of Mathematical Sciences, NTNU. http://www.math.ntnu.no/~joeid/ESRWN.pdf.

Folland, G. B. (1992). Fourier Analysis and Its Applications. The Wadsworth & Brooks/Cole Mathematics Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.

Friederichs, P. and Hense, A. (2007). Statistical downscaling of extreme precipitation events using censored quantile regression. Monthly Weather Review, 135:2365–2378.

Frühwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. Journal of Time Series Analysis, 15:183–202.

Fuentes, M. (2007). Approximate likelihood for large irregularly spaced spatial data. Journal of the American Statistical Association, 102:321–331.

Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523.


Gelfand, A. E., Banerjee, S., and Gamerman, D. (2005). Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics, 16(5):465–479.

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398–409.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1:1–19.

Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall, London.

Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data. Journal of the American Statistical Association, 97(458):590–600.

Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007a). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B, 69(2):243–268.

Gneiting, T., Genton, M. G., and Guttorp, P. (2007b). Geostatistical space-time models, stationarity, separability and full symmetry. In Finkenstädt, B., Held, L., and Isham, V., editors, Statistical Methods for Spatio-Temporal Systems, volume 107 of Monographs on Statistics and Applied Probability, pages 151–175. Chapman & Hall/CRC, Boca Raton.

Gneiting, T. and Raftery, A. E. (2005). Weather forecasting with ensemble methods. Science, 310(5746):248–249.

Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102:359–378.

Golightly, A. and Wilkinson, D. (2008). Bayesian inference for nonlinear multivariate diffusion models observed with error. Computational Statistics & Data Analysis, 52(3):1674–1693.

Gottlieb, D. and Orszag, S. A. (1977). Numerical Analysis of Spectral Methods: Theory and Applications. Society for Industrial and Applied Mathematics, Philadelphia, Pa.

Haberman, R. (2004). Applied Partial Differential Equations: with Fourier Series and Boundary Value Problems. Pearson Prentice Hall.

Hamill, T. M. and Colucci, S. J. (1997). Verification of Eta RSM short-range ensemble forecasts. Monthly Weather Review, 125:1312–1327.

Hamill, T. M., Whitaker, J. S., and Wei, X. (2004). Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Monthly Weather Review, 132:1434–1447.

Handcock, M. S. and Stein, M. L. (1993). A Bayesian analysis of kriging. Technometrics, 35(4):403–410.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109.

Heine, V. (1955). Models for two-dimensional stationary stochastic processes. Biometrika, 42(1/2):170–178.


Huang, H.-C. and Hsu, N.-J. (2004). Modeling transport effects on ground-level ozone using a non-stationary space-time model. Environmetrics, 15(3):251–268.

Hutchinson, M. (1995). Stochastic space-time weather models from ground-based data. Agricultural and Forest Meteorology, 73(3-4):237–264.

Johannesson, G., Cressie, N., and Huang, H.-C. (2007). Dynamic multi-resolution spatial models. Environmental and Ecological Statistics, 14:5–25.

Jones, R. and Zhang, Y. (1997). Models for continuous stationary space-time processes. In Gregoire, T., Brillinger, D. R., Diggle, P. J., Russek-Cohen, E., Warren, W. G., and Wolfinger, R., editors, Modelling Longitudinal and Spatially Correlated Data, volume 122 of Lecture Notes in Statistics, pages 289–298. Springer-Verlag, New York.

Kleiber, W., Raftery, A. E., and Gneiting, T. (2011). Geostatistical model averaging for locally calibrated probabilistic quantitative precipitation forecasting. Journal of the American Statistical Association, 106(496):1291–1303.

Krzysztofowicz, R. and Maranzano, C. J. (2006). Bayesian processor of output for probabilistic quantitative precipitation forecasting. Technical report, Department of Systems Engineering and Department of Statistics, University of Virginia.

Künsch, H. R. (2001). State space and hidden Markov models. In Complex Stochastic Systems (Eindhoven, 1999), volume 87 of Monogr. Statist. Appl. Probab., pages 109–173. Chapman & Hall/CRC, Boca Raton, FL.

Lindgren, F., Rue, H., and Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(4):423–498.

Ma, C. (2003). Families of spatio-temporal stationary covariance models. Journal of Statistical Planning and Inference, 116(2):489–501.

Malmberg, A., Arellano, A., Edwards, D. P., Flyer, N., Nychka, D., and Wikle, C. (2008). Interpolating fields of carbon monoxide data using a hybrid statistical-physical model. The Annals of Applied Statistics, 2(4):1231–1248.

Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Management Science, 22(10):1087–1096.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092.

Neal, P. and Roberts, G. (2006). Optimal scaling for partially updating MCMC algorithms. The Annals of Applied Probability, 16(2):475–515.

Nychka, D., Wikle, C., and Royle, J. A. (2002). Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling, 2(4):315–331.

Paciorek, C. J. (2007). Bayesian smoothing with Gaussian processes using Fourier basis functions in the spectralGP package. Journal of Statistical Software, 19(2):1–38.


Paciorek, C. J. and Schervish, M. J. (2006). Spatial modelling using a new class of nonstationary covariance functions. Environmetrics, 17(5):483–506.

Palmer, T. N. (2002). The economic value of ensemble forecasts as a tool for risk assessment: From days to decades. Quarterly Journal of the Royal Meteorological Society, 128(581):747–774.

Ramírez, M. C. V., de Campos Velho, H. F., and Ferreira, N. J. (2005). Artificial neural network technique for rainfall forecasting applied to the São Paulo region. Journal of Hydrology, 301(1-4):146–162.

Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer, New York, second edition.

Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science, 16(4):351–367.

Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349–367.

Roberts, G. O. and Stramer, O. (2001). On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika, 88(3):603–621.

Royle, J. A. and Wikle, C. K. (2005). Efficient statistical mapping of avian count data. Environmental and Ecological Statistics, 12:225–243.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications, volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, FL.

Rue, H. and Tjelmeland, H. (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scandinavian Journal of Statistics, 29(1):31–49.

Sansó, B. and Guenni, L. (2004). A Bayesian approach to compare observed rainfall data to deterministic simulations. Environmetrics, 15(6):597–612.

Shumway, R. H. and Stoffer, D. S. (2000). Time Series Analysis and Its Applications. Springer Texts in Statistics. Springer-Verlag, New York.

Sigrist, F., Künsch, H. R., and Stahel, W. A. (2012). A dynamic non-stationary spatio-temporal model for short term prediction of precipitation. Annals of Applied Statistics (to appear), 6(4).

Sigrist, F., Künsch, H. R., and Stahel, W. A. (2012). spate: an R package for statistical modeling with SPDE based spatio-temporal Gaussian processes. Technical report, Seminar for Statistics, ETH Zurich. Preprint soon to appear at http://stat.ethz.ch/people/sigrist/spate.pdf.

Sloughter, J. M., Raftery, A. E., Gneiting, T., and Fraley, C. (2007). Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Monthly Weather Review, 135(9):3209–3220.


Sølna, K. and Switzer, P. (1996). Time trend estimation for a geographic region. Journal of the American Statistical Association, 91(434):577–589.

Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. Springer-Verlag, New York.

Stein, M. L. (2005). Space-time covariance functions. Journal of the American Statistical Association, 100:310–321.

Stein, M. L. (2008). A modeling approach for large spatial datasets. Journal of the Korean Statistical Society, 37(1):3–10.

Stein, M. L., Chi, Z., and Welty, L. J. (2004). Approximating likelihoods for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(2):275–296.

Stensrud, D. J. and Yussouf, N. (2007). Reliable probabilistic quantitative precipitation forecasts from a short-range ensemble forecasting system. Weather and Forecasting, 22:2–17.

Steppeler, J., Doms, G., Schättler, U., Bitzer, H. W., Gassmann, A., Damrath, U., and Gregoric, G. (2003). Meso-gamma scale forecasts using the nonhydrostatic model LM. Meteorology and Atmospheric Physics, 82:75–96.

Stidd, C. K. (1973). Estimating the precipitation climate. Water Resources Research, 9:1235–1241.

Storvik, G., Frigessi, A., and Hirst, D. (2002). Stationary space-time Gaussian fields and their time autoregressive representation. Statistical Modelling, 2(2):139–161.

Stroud, J. R., Stein, M. L., Lesht, B. M., Schwab, D. J., and Beletsky, D. (2010). An ensemble Kalman filter and smoother for satellite data assimilation. Journal of the American Statistical Association, 105(491):978–990.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26:24–36.

Vecchia, A. V. (1988). Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society. Series B (Methodological), 50(2):297–312.

Vivar, J. C. and Ferreira, M. A. R. (2009). Spatiotemporal models for Gaussian areal data. Journal of Computational and Graphical Statistics, 18(3):658–674.

Whittle, P. (1954). On stationary processes in the plane. Biometrika, 41(3/4):434–449.

Whittle, P. (1962). Topographic correlation, power-law covariance functions, and diffusion. Biometrika, 49(3/4):305–314.

Whittle, P. (1963). Stochastic processes in several dimensions. Bull. Inst. Internat. Statist., 40:974–994.


Wikle, C. (2010). Low-rank representations for spatial processes. In Handbook of Spatial Statistics, pages 107–118. Chapman & Hall/CRC Handbooks of Modern Statistical Methods.

Wikle, C. and Hooten, M. (2010). A general science-based framework for dynamical spatio-temporal models. TEST, 19:417–451.

Wikle, C. K. (2002). A kernel-based spectral model for non-Gaussian spatio-temporal processes. Statistical Modelling, 2(4):299–314.

Wikle, C. K. (2003). Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology, 84(6):1382–1394.

Wikle, C. K., Berliner, L. M., and Cressie, N. (1998). Hierarchical Bayesian space-time models. Environmental and Ecological Statistics, 5:117–154.

Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach to space-time Kalman filtering. Biometrika, 86(4):815–829.

Wikle, C. K., Milliff, R. F., Nychka, D., and Berliner, L. M. (2001). Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association, 96(454):382–397.

Wilks, D. (1990). Maximum likelihood estimation for the gamma distribution using data containing zeros. Journal of Climate, 3(12):1495–1501.

Wilks, D. (1999). Multisite downscaling of daily precipitation with a stochastic weather generator. Climate Research, 11(2):125–136.

Xu, K. and Wikle, C. K. (2007). Estimation of parameterized spatio-temporal dynamic models. Journal of Statistical Planning and Inference, 137(2):567–588.

Xu, K., Wikle, C. K., and Fox, N. I. (2005). A kernel-based spatio-temporal dynamical model for nowcasting weather radar reflectivities. Journal of the American Statistical Association, 100(472):1133–1144.

Yussouf, N. and Stensrud, D. J. (2006). Prediction of near-surface variables at independent locations from a bias-corrected ensemble forecasting system. Monthly Weather Review, 134:3415–3424.

Zheng, Y. and Aukema, B. H. (2010). Hierarchical dynamic modeling of outbreaks of mountain pine beetle using partial differential equations. Environmetrics, 21(7-8):801–816.
