christel faes - soc.kuleuven.be · spatio-temporal data.location and time.whereandwhenmatters.to...


Spatial Statistics

Christel Faes

Interuniversity Institute for Biostatistics and statistical Bioinformatics

Hasselt University

[email protected]


Contents

1 What and why spatial statistics?
1.1 Introduction
1.2 Areas of applications
1.3 Features of Spatial Analysis
1.4 Hierarchical Statistical Modeling
1.5 Spatial Statistics Books
1.6 Classes of Spatial Data

2 Geostatistical (Point-Referenced) Data
2.1 Spatial Geostatistical Models
2.2 Application: Cadmium Concentration

3 Area (Lattice) Data
3.1 Spatial Autocorrelated Models
3.2 Application: Kidney-Cancer Rates
3.3 Application: South Africa Poverty Data

4 Spatial Point Processes

5 Spatio-Temporal Processes
5.1 Application: Surveillance data for space-time outbreak prediction
5.2 Application: Bluetongue in cattle
5.3 Conclusion

Chapter 1

What and why spatial statistics?


1.1 Introduction

• Spatial data

. Location, location, location

. Where matters

. To study the ’lay of the land’

. Spatial data do not have a temporal dimension

∗ e.g. a snapshot in time, an aggregation over time, or a process not evolving in time

• Spatio-temporal data

. Location and time

. Where and when matters

. To infer cause-effect relationships (’why’)


1.2 Areas of applications

Researchers in diverse areas such as climatology, ecology, environmental health, and real estate marketing are increasingly faced with the task of analysing data that are:

. highly multivariate, with many important predictors and response variables,

. geographically referenced, and often presented as maps,

. and temporally correlated, as in longitudinal or other time series structures.


Cholera Epidemic

Dr. John Snow’s study of London’s cholera epidemic in 1854


Kidney-Cancer Rates

Kidney cancer rates in Limburg (2000-2009)


Ebola Outbreak

Cumulative number of cases from August till October 2014


Cadmium Concentration


1.3 Features of Spatial Analysis


Non-Spatial Analysis

. spatial (geographical) data are analyzed using conventional statistical methods

. the geographical coordinates are excluded from the computational procedures

. the results are independent of the spatial arrangement of the geographical entities

. observations or entities are assumed to be independent and identically distributed, or on some occasions temporal dependence is also explored

                          ATTRIBUTE
            Variable 1      Variable 2      ...   Variable n
Entity 1    attribute_11    attribute_12    ...   attribute_1n
Entity 2    attribute_21    attribute_22    ...   attribute_2n
...         ...             ...             ...   ...
Entity m    attribute_m1    attribute_m2    ...   attribute_mn


Spatial Analysis

. spatial (geographical) data are analyzed using spatial statistical methods

. the geographical coordinates are included in the computational procedures

. the results depend on the spatial arrangement of the geographical entities

. it can also include temporal dependence

            Geographical
            Coordinate      ATTRIBUTE
            X      Y        Variable 1      Variable 2      ...   Variable n
Entity 1    X_1    Y_1      attribute_11    attribute_12    ...   attribute_1n
Entity 2    X_2    Y_2      attribute_21    attribute_22    ...   attribute_2n
...         ...    ...      ...             ...             ...   ...
Entity m    X_m    Y_m      attribute_m1    attribute_m2    ...   attribute_mn


Nearby things tend to be more alike...

• Spatial (and temporal) dependence is the rule

. Nearby (in space and time) observations tend to be more alike than those far apart

. E.g. spatial interaction, contagion, spill-overs, copycatting

. Competition: opposite may happen

. Physical barriers can affect what is meant by ’nearby’ or ’neighbouring’ (e.g. rivers, mountains)

• Spatio-temporal data should not be modeled as being statistically independent

• Tobler (1970) called this the first law of geography: everything depends on everything else, but closer things more


Spatial Statistics

• Spatial statistical model: incorporate spatial dependence

• Relatively new field in statistics

. Fisher (1935): used randomisation to neutralise the effect of spatial dependence

. Whittle (1954), Besag (1974): first use of spatial statistical models

• Spatial statistical models yield more efficient inference and hence shorter, less costly experiments


Space is different from time

• We can visit the same place over and over

• We can go north, south, east, and west, up and down

• But, we can only ever go forward from past to present to future.

• Modeling spatio-temporal phenomena needs to respect this difference


1.4 Hierarchical Statistical Modeling

• Data model : expresses the distribution of the data given a hidden process

• Process model : expresses uncertainty in the hidden (’true’) process through a probability distribution of the phenomenon of interest

• Characterised by conditional probability distributions

• Unknown parameters

. Parameter model : Bayesian hierarchical model

. Estimate parameters using the data: Empirical hierarchical model
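The levels above can be summarised as a factorisation of the joint distribution. The following is a sketch in bracket notation, where $[A \mid B]$ denotes the conditional distribution of $A$ given $B$. The symbols are assumptions chosen for illustration: $Y$ for the data, $Z$ for the hidden process, and $\theta$ for the unknown parameters.

```latex
% Data model times process model, given the parameters
[Y, Z \mid \theta] = [Y \mid Z, \theta]\,[Z \mid \theta]

% Bayesian hierarchical model: add a parameter model [\theta]
[Z, \theta \mid Y] \propto [Y \mid Z, \theta]\,[Z \mid \theta]\,[\theta]

% Empirical hierarchical model: plug in an estimate \hat{\theta} obtained from the data
[Z \mid Y, \hat{\theta}] \propto [Y \mid Z, \hat{\theta}]\,[Z \mid \hat{\theta}]
```

In both variants, inference on the hidden process is based on its conditional distribution given the data; the two differ only in how the parameters are handled.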


1.5 Spatial Statistics Books

Pioneer book: Paelinck and Klaassen (1979), which focused the attention of regional scientists on the need for specialized econometric methods to deal with estimation and specification problems caused by spatial data.

Books in the field:

. Cressie (1990, 1993): the legendary “bible” of spatial statistics, but

∗ rather high mathematical level

∗ lacks modern hierarchical modeling/computing

. Wackernagel (1998): terse; only geostatistics

. Chiles and Delfiner (1999): only geostatistics

. Stein (1999a): theoretical treatise on kriging


More descriptive presentations:

. Bailey and Gatrell (1996)

∗ focuses on description of pattern, tests of hypotheses, and interpolation,

∗ the authors continually stress the role of visualization in understanding spatial phenomena.

. Fotheringham and Rogerson (1994): deals with the integration of GIS and spatial analysis,

. Haining (1990): includes data description, map interpolation, exploratory and explanatory analyses.


More recent books:

. Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2004) Hierarchical Modeling and Analysis for Spatial Data, CRC Press

. Waller, L.A., and Gotway, C.A. (2004) Applied Spatial Statistics for Public Health, Wiley & Sons

. Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V. (2008) Applied Spatial Data Analysis with R, Springer

. Lawson, A. (2009) Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology, Chapman & Hall

. Gelfand, A.E., Diggle, P., Fuentes, M. and Guttorp, P. (2011) Handbook of Spatial Statistics, CRC Press

. Cressie, N. and Wikle, C. (2011) Statistics for spatio-temporal data.


1.6 Classes of Spatial Data

As outlined in Cressie’s book, spatial data generally fall into one of three categories:

• Spatially Continuous (Geostatistical or point-referenced) Data

. $R$ is a fixed subset of the plane of positive area (2-D) or volume (3-D).

. $Y(s)$ is a random variable at each of the infinitely many locations $s \in R$.

• Area (Lattice) Data

. $R = \{s_1, s_2, \ldots, s_n\}$ is a fixed regular or irregular lattice on the plane.

. $Y(s_i)$ is a random variable at each location $s_i$, $i = 1, \ldots, n$.

• Spatial Point Process Data

. $R = \{s_1, s_2, \ldots, s_n\}$ is a random collection of points on the plane.

. $Y(s)$ is not specified, or is a random variable at a location $s \in R$ (marked point process).


Chapter 2

Geostatistical (Point-Referenced) Data

• The term “geostatistics” was coined by Matheron (1962, 1963) to describe the statistical methodology for examining ore reserves from spatially distributed data in an ore body.

• More generally, geostatistics refers to data from a random process $\{Y(s) : s \in R\}$, where $R \subset \mathbb{R}^n$ is fixed, and $s$ is allowed to vary continuously over $R$.

• For example, if $R \subset \mathbb{R}^n$ is an agricultural field, and we are measuring the magnesium content in the soil ($Y$), then we can measure $Y$ at any point $s$ within the field $R$.


Examples: Meuse River Zinc

[Figure: zinc concentrations at the Meuse River sampling locations; legend classes 113, 198, 326, 674.5, 1839]

Research Question

. interest focuses on modeling continuous spatial variation across space

. spatial interpolation

. estimating how spatial dependence varies with distance


2.1 Spatial Geostatistical Models

• Methods of Analysis (for geostatistical data)

. linear interpolation

. variogram

. kriging

. splines


Geostatistical Process

• Data model: $Y(s_i) = Z(s_i) + \varepsilon(s_i)$

with

∗ $\varepsilon(s_i)$ a white-noise process, $N(0, \sigma_\varepsilon^2)$

• Process model: $Z(s_i) = X(s_i)\beta + \delta(s_i)$

with

∗ $X(s_i)$ known covariates,

∗ $\beta$ fixed effect parameters

∗ $\delta(s_i)$ a (spatial) random effect for which $\mathrm{Var}(\delta) = \left(C_\gamma(s_i - s_j)\right)$ is a positive-definite matrix
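To make the two levels concrete, the following is a minimal simulation sketch of the data and process models, not the method used in these slides. It assumes a single covariate, an exponential covariance function with hypothetical sill and range values, and helper names (`exp_cov`, `cholesky`, `simulate`) that are introduced here for illustration only.

```python
import math
import random

def exp_cov(h, sill=1.0, range_=2.0):
    """Assumed exponential covariance: C(h) = sill * exp(-h / range_)."""
    return sill * math.exp(-h / range_)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate(sites, x, beta, sigma_eps, rng):
    """Draw the hidden process Z and the data Y at the given sites."""
    n = len(sites)
    cov = [[exp_cov(math.dist(sites[i], sites[j])) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # delta = L u has covariance matrix L L^T = cov (the spatial random effect)
    delta = [sum(low[i][k] * u[k] for k in range(i + 1)) for i in range(n)]
    z = [x[i] * beta + delta[i] for i in range(n)]           # process model
    y = [z[i] + rng.gauss(0.0, sigma_eps) for i in range(n)]  # data model
    return z, y

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
z, y = simulate(sites, x=[1.0, 2.0, 3.0], beta=0.5, sigma_eps=0.1,
                rng=random.Random(1))
print(z, y)
```

The white-noise term is added independently at each site, while the spatial random effect is drawn jointly so that nearby sites receive correlated values.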


Variogram and covariogram function

• Spatial dependence expressed in variogram or covariogram function

• Variogram: $2\gamma_Z(h) = \mathrm{var}(Z(s + h) - Z(s))$

. Variogram is valid if all model-based variances are nonnegative

. If $2\gamma_Z(h)$ is a function of $\|h\|$ only: isotropic (otherwise anisotropic)

. Example:
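As an illustration of how the semivariogram can be estimated from data, here is a minimal sketch of the classical method-of-moments (Matheron) estimator, which averages half the squared differences over pairs of sites whose separation falls in the same distance bin. The function name and the binning scheme are assumptions made for this sketch, and isotropy is assumed.

```python
import math

def empirical_semivariogram(sites, values, bin_width, n_bins):
    """Estimate gamma(h) per distance bin: sum of squared differences / (2 * pairs)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(sites)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(sites[i], sites[j])
            b = int(h // bin_width)   # bin b covers [b * bin_width, (b + 1) * bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    # Bins without any pair are reported as None
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Four sites on a line with a linear trend in the measured values
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
print(empirical_semivariogram(sites, values, bin_width=1.0, n_bins=4))
# prints [None, 0.5, 2.0, 4.5]
```

A parametric model (for instance exponential or spherical) would then typically be fitted to these binned estimates.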


• Covariogram: $C_Z(h) = \mathrm{cov}(Z(s + h), Z(s))$

. $C_Z(h)$ is valid if all model-based variances are nonnegative.

. There is a relationship between $C_Z(h)$ and $\gamma_Z(h)$

. But some processes with a valid variogram do not have a valid covariogram.

. Example:
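A short worked derivation of the relationship, assuming second-order stationarity (constant mean, and a covariance that depends only on the lag $h$), followed by a standard one-dimensional example of a process that has a valid variogram but no covariogram:

```latex
% Relationship under second-order stationarity
2\gamma_Z(h) = \mathrm{var}(Z(s+h)) + \mathrm{var}(Z(s)) - 2\,\mathrm{cov}(Z(s+h), Z(s))
             = 2\left(C_Z(0) - C_Z(h)\right)
\quad\Longrightarrow\quad
\gamma_Z(h) = C_Z(0) - C_Z(h)

% Brownian motion W(t), t \ge 0, with W(0) = 0:
\mathrm{var}(W(t+h) - W(t)) = |h|
\qquad\text{but}\qquad
\mathrm{var}(W(t)) = t
```

For Brownian motion the variance of an increment depends only on the lag, so the variogram $2\gamma(h) = |h|$ is well defined, whereas the variance of the process itself changes with $t$, so no function $C(h)$ of the lag alone exists.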


Kriging

• Spatial-prediction method

• Linear predictor $\hat{Y}(s_0) = \sum_i \lambda_i Y(s_i)$, with weights specified by the semivariogram

• Optimal in terms of minimising mean squared prediction error

• The optimal spatial predictor $E[Y(s_0) \mid Z, \beta, C_Y(\cdot, \cdot), \sigma_\varepsilon^2]$ from the hierarchical model is equivalent to the simple-kriging predictor
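The following is a minimal sketch of a simple-kriging predictor, under the assumptions of a known constant mean and a known covariance function (here a hypothetical exponential one). The weights solve the linear system $C\lambda = c_0$, where $C$ holds the covariances between observed sites and $c_0$ the covariances between the observed sites and the prediction site. All function names are introduced for illustration.

```python
import math

def exp_cov(h, sill=1.0, range_=1.0):
    """Assumed exponential covariance function."""
    return sill * math.exp(-h / range_)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def simple_kriging(sites, values, s0, mean, cov, nugget=0.0):
    """Return the prediction at s0 and the kriging weights lambda."""
    n = len(sites)
    big_c = [[cov(math.dist(sites[i], sites[j])) + (nugget if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    c0 = [cov(math.dist(s, s0)) for s in sites]
    lam = solve(big_c, c0)
    pred = mean + sum(l * (v - mean) for l, v in zip(lam, values))
    return pred, lam

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [1.0, 2.0, 3.0]
pred, lam = simple_kriging(sites, values, (0.5, 0.5), mean=2.0, cov=exp_cov)
print(pred, lam)
```

Without a nugget the predictor interpolates exactly at observed sites, and far from all observations it falls back to the assumed mean.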


Examples: Meuse River Zinc

[Figure: universal kriging]

. exponential variogram

. universal kriging

. log(zinc) significantly related to $\sqrt{\text{dist}}$

. prediction in grid (high-resolution)


Remarks/issues

• Can become computer-intensive.

• Non-Gaussian geostatistical processes can be defined by replacing the data process with an exponential-family data model

• Closed-form expressions are no longer available for complex problems, and MCMC is commonly used

• If covariates are included, universal kriging corresponds to the hierarchical model (with an improper uniform distribution on the fixed effect parameters)

• Change of support problem

• Multivariate geostatistical processes


Splines

• Splines can be used as a smoother of the surface of interest

• There is a connection between splines and kriging (can be shown via dual kriging equations)

• Radial basis splines correspond to a valid variogram function

• Standard use of splines corresponds to pre-selecting a particular semivariogram, without letting the data decide which semivariogram gives the best fit (in contrast to kriging)

• However, estimation of the spline basis function leads to optimal prediction (similar to kriging)


2.2 Application: Cadmium Concentration

• Investigate 4 models with different covariogram functions

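M1: $\log(\text{Cadmium}_i) = \beta_0 + \beta_1 \text{time}_i + S(x_i, y_i) + \varepsilon_i$

M2: $\log(\text{Cadmium}_i) = \beta_0 + f(\text{time}_i) + S(x_i, y_i) + \varepsilon_i$

M3: $\log(\text{Cadmium}_i) = \beta_0 + f(\text{time}_i) + \beta_1 \text{ecoregion}_i + S(x_i, y_i) + \varepsilon_i$

M4: $\log(\text{Cadmium}_i) = \beta_0 + b_i + f(\text{time}_i) + \beta_1 \text{ecoregion}_i + S(x_i, y_i) + \varepsilon_i$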

• AIC values

      Gaussian   Matern   Circular   Spherical   Exponential   Thin plate
M1    1398.7     1402.2   1408.6     1405.5      1409.4        1440.3
M2    1392.5     1395.4   1400.7     1398.5      1402.5        1432.0
M3    1326.7     1326.0   1330.5     1328.8      1326.7        1357.8
M4    1056.5     1056.8   1065.8     1061.8      1060.4        1081.4
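The comparison in the table can be read off programmatically. The sketch below simply stores the AIC values reported above and picks the smallest one per model and overall; the variable and function names are introduced here for illustration.

```python
# AIC values copied from the table above (rows: models, columns: covariogram functions)
AIC = {
    "M1": {"Gaussian": 1398.7, "Matern": 1402.2, "Circular": 1408.6,
           "Spherical": 1405.5, "Exponential": 1409.4, "Thin plate": 1440.3},
    "M2": {"Gaussian": 1392.5, "Matern": 1395.4, "Circular": 1400.7,
           "Spherical": 1398.5, "Exponential": 1402.5, "Thin plate": 1432.0},
    "M3": {"Gaussian": 1326.7, "Matern": 1326.0, "Circular": 1330.5,
           "Spherical": 1328.8, "Exponential": 1326.7, "Thin plate": 1357.8},
    "M4": {"Gaussian": 1056.5, "Matern": 1056.8, "Circular": 1065.8,
           "Spherical": 1061.8, "Exponential": 1060.4, "Thin plate": 1081.4},
}

def best_covariogram(row):
    """Covariogram with the smallest AIC within one model."""
    return min(row, key=row.get)

best_per_model = {model: best_covariogram(row) for model, row in AIC.items()}
overall = min(((m, c) for m in AIC for c in AIC[m]),
              key=lambda mc: AIC[mc[0]][mc[1]])

print(best_per_model)
print(overall, AIC[overall[0]][overall[1]])
```

The mean structure (M1 to M4) changes the AIC far more than the choice of covariogram, and within M4 the Gaussian and Matern covariograms differ by only 0.3.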


Final Model

• Model yielding lowest AIC with Gaussian spatial process

$\log(\text{Cadmium}_i) = \beta_0 + b_i + f(\text{time}_i) + \beta_1 \text{ecoregion}_i + S(x_i, y_i) + \varepsilon_i$

• $b_i \sim N(0, \tau^2)$ ($\tau^2$ = 0.1172) random intercept per location

• 150 knots used for S(xi, yi)

• Estimated spatial process:


Time component


Spatial component


Chapter 3

Area (Lattice) Data

• Under the assumption that we have data from the spatial process $\{Y(s) : s \in R\}$, lattice data refers to the case where $R$ is some countable collection of spatial sites.

• In other words, data can only be observed at the sites in $R$, and all subsequent inference applies only to those sites.

• Important concepts: Spatial Weights and Spatial Autocorrelation


Kidney-Cancer Rates

Kidney cancer rates in Limburg (1996-2005)

. Limburg Cancer Registry

. All new cancers

. Consists of 44 towns

. Largest population in middle of province

. Elevated levels of chemicals and heavy metals


3.1 Spatial Autocorrelated Models

• Tests for Spatial Autocorrelation: In these discrete-space data, a common test is to assess whether there is any spatial autocorrelation present in the data. Common tools for studying autocorrelation include Geary’s C and Moran’s I statistics, and their corresponding randomization tests.

• Nearest-Neighbor Models: These models use a spatial Markovian assumption to model the data (i.e. the value of the random variable at a given site depends only on the values at a specified set of neighboring sites). These types of models lead to a number of so-called “auto”-models (Autonormal, AutoPoisson, Autologistic, etc.), where the random variable at a given site depends on the same variable at its nearest neighbors.
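As an illustration of the first point, here is a minimal sketch of Moran's I, $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$ with $S_0 = \sum_i \sum_j w_{ij}$, together with a simple one-sided randomization test that permutes the values over the sites. The binary adjacency weights and the function names are assumptions made for this sketch.

```python
import random

def morans_i(values, weights):
    """Moran's I for values on n sites with an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def randomization_pvalue(values, weights, n_perm, rng):
    """One-sided p-value for positive autocorrelation by permuting values."""
    observed = morans_i(values, weights)
    perm = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if morans_i(perm, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Four areas in a row; each area is adjacent to its direct neighbours
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], w))     # smooth trend: positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], w))   # alternating values: negative I
print(randomization_pvalue([1.0, 2.0, 3.0, 4.0], w, 99, random.Random(0)))
```

Values near zero are expected under no spatial autocorrelation; Geary's C could be computed analogously from squared differences between neighbouring values.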


Markov-Type model

• Gaussian (autonormal) model

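$Z_i \mid Z_j, j \neq i \sim N\left(\sum_j b_{ij} Z_j, \tau_i^2\right)$

• Does not guarantee existence of the joint distribution for $Z$

• If $\Sigma_Z = (I - B)^{-1} D$ is positive definite,

$Z \sim N\left(0, (I - B)^{-1} D\right)$

where $B = \{b_{ij}\}$ and $D$ is diagonal with $D_{ii} = \tau_i^2$

• $D^{-1}(I - B)$ symmetric requires $\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2}$ for all $i, j$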
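A brief sketch of where the symmetry condition comes from, assuming $b_{ii} = 0$ for all $i$. The precision matrix of $Z$ is $\Sigma_Z^{-1} = D^{-1}(I - B)$, and a valid precision matrix must be symmetric:

```latex
% Joint density implied by the conditionals, when it exists
p(z) \propto \exp\left\{ -\tfrac{1}{2}\, z^{\top} D^{-1}(I - B)\, z \right\}

% Off-diagonal entries of the precision matrix (i \neq j)
\left[ D^{-1}(I - B) \right]_{ij} = -\frac{b_{ij}}{\tau_i^2},
\qquad
\left[ D^{-1}(I - B) \right]_{ji} = -\frac{b_{ji}}{\tau_j^2}

% Symmetry of the precision matrix therefore requires
\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2}
```

Symmetry alone is not sufficient; the matrix must also be positive definite for the joint distribution to be proper.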


Conditional Autoregressive Model

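• Data model: $Y(s_i) \sim \mathrm{Poi}(E_i \theta_i)$

with

∗ $E_i$ expected number in area $i$

• Process model: $\log(\theta_i) = X_i \beta + Z_i$

with

∗ $X_i$ known covariates,

∗ $[Z_i \mid Z_j, i \neq j, \tau_Z^2] \sim N(\bar{Z}_i, \tau_i^2)$ random effect parameters

∗ $\bar{Z}_i = \frac{1}{\sum_j \omega_{ij}} \sum_j Z_j \omega_{ij}$

∗ $\tau_i^2 = \frac{\tau_Z^2}{\sum_j \omega_{ij}}$

∗ $\omega_{ij} = 1$ if areas $i$ and $j$ are adjacent (and 0 otherwise)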
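The conditional specification of the random effects can be sketched as follows: the conditional mean of an area's effect is the average of its neighbours' effects, and the conditional variance shrinks with the number of neighbours. This is a minimal illustration with a hypothetical function name and a toy adjacency matrix, assuming every area has at least one neighbour.

```python
def car_conditional(i, z, adjacency, tau2):
    """Conditional mean and variance of Z_i given the other Z_j under the CAR model."""
    neighbours = [j for j, w in enumerate(adjacency[i]) if w == 1 and j != i]
    m = len(neighbours)  # sum_j omega_ij for binary adjacency weights
    if m == 0:
        raise ValueError("area %d has no neighbours" % i)
    mean = sum(z[j] for j in neighbours) / m
    return mean, tau2 / m

# Three areas in a row: area 1 is adjacent to areas 0 and 2
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
z = [0.2, -0.1, 0.4]
print(car_conditional(1, z, adjacency, tau2=0.5))  # mean of z[0] and z[2], variance 0.5 / 2
print(car_conditional(0, z, adjacency, tau2=0.5))  # single neighbour: mean z[1], variance 0.5
```

Areas with many neighbours are therefore smoothed more strongly towards the local average than areas on the edge of the map.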


Comments /Issues

• The CAR model is commonly used in disease mapping

• The idea: ’borrow strength’ from areas that are more precise (larger $E_i$); this is also called shrinkage or super-population modelling.

• Ecological fallacy: relationships seen at aggregated levels could be different from those at individual level

• Modifiable areal unit problem/Change-of-support problem: how inference varies according to level of aggregation

• Alternative models: simultaneous autoregressive models, spatial moving average model


3.2 Application: Kidney-Cancer Rates

Kidney cancer rates in Limburg (1996-2005)

. Best model:

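∗ $\log(\theta_i) = X_i \beta + Z_i + V_i$

∗ $[Z_i \mid Z_j, i \neq j, \tau_Z^2] \sim N(\bar{Z}_i, \tau_i^2)$

∗ $V_i \sim N(0, \tau_V^2)$

. CH > UH (correlated heterogeneity exceeds uncorrelated heterogeneity)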


3.3 Application: South Africa Poverty Data

. Census data in South Africa: 2001, 2011

. Number of households with access to piped water

. % of 20+ year-olds without any schooling

. Around 5% of population sampled

. Is the number of households with access to piped water related to education?


South Africa Neighborhood Structure


Proportion of households with access to piped water


Proportion of 20+ year-olds without schooling


Binomial Model

This can be modelled using a binomial likelihood:

$y_i \sim \mathrm{Binomial}(n_i, p_i)$

$\mathrm{logit}(p_i) = \alpha + \beta x_i + u_i + v_i$

• $\alpha$ is an overall level of the relative risk

• $\beta$ is the covariate effect

• $v_i$ is the uncorrelated heterogeneity

$v_i \sim N(0, \sigma_v^2)$

• $u_i$ is the correlated heterogeneity

. a spatial correlation structure is used

. estimation of the risk in any area depends on neighboring areas

$[u_i \mid u_j, i \neq j, \tau_u^2] \sim N(\bar{u}_i, \sigma_i^2)$


Predicted Proportions


CAR (CH) component


UH component


Is % no-schooling significant?

node     mean         sd         LL          median      UL          start   sample
alpha     0.2169000   0.021360    0.172900    0.216900    0.260800   10001   60000
beta     -7.1140000   0.589400   -8.305000   -7.121000   -5.889000   10001   60000

An effect at the aggregate level cannot be interpreted on the individual level: ecological fallacy!
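To see what the posterior means imply at the area level, the sketch below back-transforms the linear predictor with the inverse logit. It assumes that the covariate $x_i$ is the proportion (between 0 and 1) of 20+ year-olds without schooling, and it sets the random effects $u_i$ and $v_i$ to zero by default; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

ALPHA = 0.2169   # posterior mean of alpha, from the table above
BETA = -7.114    # posterior mean of beta, from the table above

def inv_logit(eta):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_proportion(x, u=0.0, v=0.0):
    """Area-level proportion with piped water for covariate value x."""
    return inv_logit(ALPHA + BETA * x + u + v)

def odds_ratio(delta_x):
    """Multiplicative change in the area-level odds when x increases by delta_x."""
    return math.exp(BETA * delta_x)

for x in (0.0, 0.1, 0.2):
    print(x, round(predicted_proportion(x), 3))
print(round(odds_ratio(0.1), 3))
```

Because the whole 95% interval for beta lies below zero, the area-level odds of access to piped water decrease as the proportion without schooling increases; this remains a statement about areas, not about individual households.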


Chapter 4

Spatial Point Processes

• For this third type of spatial process $\{Y(s) : s \in R\}$, $R$ is again considered to be a random index set for the locations of the process, but here, the data do not consist of realizations of some random variable at a given site. The data are the locations of the sites themselves, and the collection of all the sites is the event of interest. In this way, a measure such as the count of the number of items over any subset of $R$ might be the key variable.

• If a spatial point process $\{Y(s) : s \in R\}$ also consists of measurements $\{Z(s) : s \in R\}$ taken at the locations indicated by $Y$, the process is known as a marked point process, where the measurements in $Z$ are known as the “marks”.


Examples:

[Excerpt headed "7.2 Packages for the Analysis of Spatial Point Patterns", page 157. Figure: three point patterns in panels CELLS, JAPANESE and REDWOOD; axes x and y, each from 0.0 to 1.0.]

Fig. 7.1. Example of three point patterns re-scaled to fit in the unit square. On the left, spatial distribution of the location of cell centres (Ripley, 1977); in the middle, Japanese black pine saplings (Numata, 1961); and on the right, saplings of California redwood trees (Strauss, 1975)

> library(spatstat)

> data(japanesepines)

> summary(japanesepines)

Planar point pattern: 65 points

Average intensity 65 points per square unit (one unit = 5.7 metres)

Window: rectangle = [0, 1] x [0, 1] units

Window area = 1 square unit

Unit of length: 5.7 metres

The summary shows the average intensity in the region of interest; this region, known as a window, is also reported in the summary; windows are stored in objects of class owin. In this case, the points have been scaled to the unit square already, but the size of the sampling square can be used to retrieve the actual measurements. Note that spatstat windows may be of several forms; here the window is a rectangle. When we coerce a ppp object with a rectangular window to a SpatialPoints object, the point coordinates will by default be re-scaled to their original values.

> library(maptools)

> spjpines <- as(japanesepines, "SpatialPoints")

> summary(spjpines)

Object of class SpatialPoints

Coordinates:

min max

[1,] 0 5.7

[2,] 0 5.7


Research Question

• Interest focuses on detecting departures from spatial randomness (cluster statistics)

• clustered points vs dispersed points

• Methods of Analysis:

. Quadrat Methods: These methods involve counting the number of events (trees) in subsets of the region of study R, and comparing the observed frequencies to the expected frequencies under a Poisson process. The quadrats themselves are usually taken to be rectangular, but can be any shape desired.

. Kernel Estimation: Letting λ(·) represent the intensity function of the Poisson process, kernel estimation can be used to estimate λ.

. Distance Methods: These methods take advantage of the precise distances between points, and often use nearest-neighbor methods to examine the distribution of sites (trees) in a given area. Such methods include Ripley’s K- and L-functions, as well as the F- and G-functions.
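To make the quadrat and distance methods concrete, here is a minimal base-R sketch on simulated data. It is not the analysis from the slides: the point pattern, the 4 x 4 grid and the distance range are all assumed for illustration, the number of points is fixed (a binomial process), and no edge correction is applied.

```r
set.seed(1)

# Simulate complete spatial randomness (CSR) on the unit square.
# The number of points n is fixed here, an assumption of this sketch.
n <- 200
x <- runif(n)
y <- runif(n)

# Quadrat method: counts in a k x k grid of equal squares.
k <- 4
breaks <- seq(0, 1, length.out = k + 1)
counts <- table(cut(x, breaks, include.lowest = TRUE),
                cut(y, breaks, include.lowest = TRUE))
expected <- n / k^2                      # same expected count in every quadrat
chi2 <- sum((counts - expected)^2 / expected)
# Upper-tail p-value: small values point towards clustering.
p_value <- pchisq(chi2, df = k^2 - 1, lower.tail = FALSE)

# Distance method: empirical G-function, i.e. the distribution
# function of nearest-neighbour distances.
d <- as.matrix(dist(cbind(x, y)))
diag(d) <- Inf                           # ignore the distance of a point to itself
nn <- apply(d, 1, min)
r <- seq(0, 0.15, by = 0.01)
G_hat <- sapply(r, function(ri) mean(nn <= ri))
# Theoretical G under CSR with intensity n per unit area.
G_csr <- 1 - exp(-n * pi * r^2)

print(c(chi2 = chi2, p_value = p_value))
print(round(cbind(r, G_hat, G_csr), 3))
```

An empirical G that rises faster than the CSR curve suggests clustering, while a slower rise suggests regular spacing. In practice a dedicated package such as spatstat handles windows and edge corrections.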


Poisson Point Process Model

• Data model: the point process $Y(s_i)$ is an inhomogeneous Poisson point process with intensity $\lambda(s_i)$

• Process model: $\log(\lambda(s_i)) = X(s_i)\beta + \delta(s_i)$

with

∗ $X(s_i)$ known covariates,

∗ $\delta(s_i)$ a Gaussian process with mean 0 and a spatial covariance matrix.
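The two model layers can be illustrated by simulation. The sketch below discretises the unit square into cells and draws Poisson counts per cell, which approximates the point process on a grid. The covariate, the coefficients and the exponential covariance with parameters sigma2 and phi are all assumptions chosen for illustration; none of them come from the slides.

```r
set.seed(42)

# Grid of cell centres on the unit square (20 x 20 cells).
m <- 20
centres <- (seq_len(m) - 0.5) / m
grid <- expand.grid(x = centres, y = centres)
cell_area <- 1 / m^2

# Hypothetical covariates and coefficients (illustrative values only).
X <- cbind(1, grid$x)                    # intercept plus a west-east gradient
beta <- c(4, 1.5)

# Gaussian process delta with mean 0 and exponential covariance
# sigma2 * exp(-distance / phi); sigma2 and phi are assumed values.
sigma2 <- 0.5
phi <- 0.2
D <- as.matrix(dist(grid))
Sigma <- sigma2 * exp(-D / phi)
L <- chol(Sigma)                         # upper triangular, t(L) %*% L = Sigma
delta <- as.vector(t(L) %*% rnorm(nrow(grid)))

# Process model: log-intensity at each cell centre.
lambda <- exp(as.vector(X %*% beta) + delta)

# Data model: cell counts are Poisson with mean lambda * cell area.
counts <- rpois(nrow(grid), lambda * cell_area)
print(sum(counts))
```

Fitting such a model reverses this direction: the counts are observed and beta and delta are estimated, typically in a Bayesian hierarchical framework.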


Chapter 5

Spatio-Temporal Processes

• In addition to the spatial process, take the time component into account

• Spatio-temporal dynamic model


5.1 Application: Surveillance data for space-time outbreak prediction

• Passive surveillance useful for early detection/warning of emerging diseases

• Develop dynamic spatio-temporal model for outbreak detection

• Use mortality of cattle over different areas and years


Spatio-temporal model

$$y_{it} \sim \mathrm{NB}(m_i \mu_{it})$$

$$\log(\mu_{it}) = \alpha_0 + x_{it}\gamma_x + b_t + c_i + d_{it}$$

with

. $b_t$: temporal term including a linear and a seasonal term

. $c_i = u_i + v_i$: spatial term decomposed into a spatially structured (CAR) and an unstructured effect

. $d_{it}$: space-time interaction, random time process spatially structured
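A small simulation shows how the terms of the linear predictor combine. This is a simplified sketch, not the fitted model: all numeric values are assumed, $m_i$ is read here as the number of animals at risk in area i, the covariate term is dropped, the CAR part $u_i$ is omitted, and the structured interaction $d_{it}$ is replaced by independent noise.

```r
set.seed(7)

n_area <- 5
n_time <- 24                               # e.g. 24 months

# Assumed values for illustration; none come from the slides.
alpha0 <- -6
m <- c(800, 1200, 500, 1500, 1000)         # m_i, read as animals at risk per area
nb_size <- 10                              # NB dispersion parameter, assumed

t_idx <- seq_len(n_time)
b <- 0.01 * t_idx + 0.3 * sin(2 * pi * t_idx / 12)   # b_t: linear + seasonal term
c_i <- rnorm(n_area, 0, 0.2)               # c_i: unstructured part only (CAR part omitted)
d <- matrix(rnorm(n_area * n_time, 0, 0.1),
            n_area, n_time)                # d_it: independent stand-in for the interaction

# log(mu_it) = alpha0 + b_t + c_i + d_it (covariate term omitted)
log_mu <- alpha0 + outer(c_i, b, "+") + d
mu <- exp(log_mu)

# y_it ~ NB with mean m_i * mu_it; m recycles down the rows of mu.
y <- matrix(rnbinom(n_area * n_time, size = nb_size, mu = m * mu),
            n_area, n_time)
print(y[, 1:6])
```

Rows are areas and columns are time points, so each row of y is one simulated mortality series.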


Results


Results


Results


5.2 Application: Bluetongue in cattle

• Bluetongue is an infectious vector-borne disease of ruminants

• First case of BTV in Belgium in 2006

• Risk factors: climatic condition, land composition, animal movement

• Spatio-temporal transmission model: interaction within and between areas, movement between areas


Bluetongue outbreak


Environmental risk factors


Spatio-temporal dynamic model

• Susceptible - Infected model:

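$$S_{i,t} = N_i - I_{i,t}$$

$$I_{i,t} = \sum_{k=1}^{t} Y_{i,k}$$

$$Y_{i,t} \sim \mathrm{Bin}(S_{i,t-1}, p_{i,t})$$

$$\mathrm{logit}(p_{i,t}) = \beta_1 \mathrm{Ind}(I_{i,t-1} = 0) + \beta_2 \mathrm{Ind}(I_{i,t-1} > 0) + X\beta_{int} + (X_W \beta_W) I_{i,t-1} + (X_B \beta_B) \sum_{j=1}^{N} b_{i,j} I_{j,t-1}$$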

• Model takes into account:

. background risk

. within-municipality transmission

. neighbourhood between-municipality transmission
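The recursion can be sketched as a forward simulation. This is a simplified illustration under stated assumptions: four areas on a line, made-up herd sizes and coefficients, no covariates (so the covariate-by-coefficient products reduce to the scalars beta_w and beta_b), and one infected animal seeded at time 0 so that the cumulative count includes that seed.

```r
set.seed(3)

n_area <- 4
n_time <- 30
N <- c(50, 80, 60, 40)                 # N_i: animals per area (assumed)

# b_ij: neighbourhood indicator for areas arranged on a line 1-2-3-4 (assumed).
B <- matrix(0, n_area, n_area)
B[cbind(1:3, 2:4)] <- 1
B <- B + t(B)

# Assumed coefficients on the logit scale.
beta1 <- -6                            # background risk while the area is infection-free
beta2 <- -3                            # baseline risk once the area is infected
beta_w <- 0.05                         # within-area transmission
beta_b <- 0.02                         # between-area (neighbour) transmission

I <- matrix(0, n_area, n_time + 1)     # column 1 holds time 0
Y <- matrix(0, n_area, n_time)         # new infections per time step
I[1, 1] <- 1                           # seed one infected animal in area 1

for (tt in seq_len(n_time)) {
  I_prev <- I[, tt]
  S_prev <- N - I_prev                 # susceptibles at the previous step
  eta <- beta1 * (I_prev == 0) + beta2 * (I_prev > 0) +
    beta_w * I_prev + beta_b * as.vector(B %*% I_prev)
  p <- plogis(eta)                     # inverse logit
  Y[, tt] <- rbinom(n_area, size = S_prev, prob = p)
  I[, tt + 1] <- I_prev + Y[, tt]      # cumulative infected
}

print(I[, c(1, 11, 21, 31)])
```

Setting beta_b to 0 removes the neighbourhood term, which gives a simple way to compare scenarios with and without between-area spread.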


5.3 Conclusion

• Spatial and spatio-temporal data are of interest for answering several research questions

• Spatio-temporal analysis is of interest in demography

• Geographical differences

• Temporal differences in different areas

• Level of aggregation: micro- or macro-scale analysis

• Predictions and simulations to investigate scenarios
