
Page 1: Statistical Methods for Spatial and Spatio-Temporal Health ...faculty.washington.edu/jonno/Newcastlematerial/Newcastle-Spatial... · Statistical Methods for Spatial and Spatio-Temporal

Course Details Spatial Epi Background Epidemiology GIS Spatial Objects in R Maps Importing Shapefiles References

Statistical Methods for Spatial and Spatio-Temporal Health Data:

Introduction and Overview

Jon Wakefield

Departments of Statistics and Biostatistics, University of Washington


Outline

Course Details

Spatial Epidemiology

Background Epidemiology

Geographical Information Systems

Spatial Objects in R

CRS, Projections and Maps

Importing Shapefiles


Texts

• Spatial Books:
  • Elliott, P., Wakefield, J., Best, N. and Briggs, D. (2000). Spatial Epidemiology: Methods and Applications, Oxford University Press.
  • Waller, L.A. and Gotway, C.A. (2004). Applied Spatial Statistics for Public Health Data, Wiley, New York.

• Epidemiology Books:
  • Breslow, N.E. and Day, N.E. (1980). Statistical Methods in Cancer Research. Volume I: The Analysis of Case-Control Studies, IARC Scientific Publications No. 32, Lyon.
  • Breslow, N.E. and Day, N.E. (1987). Statistical Methods in Cancer Research. Volume II: The Design and Analysis of Cohort Studies, IARC Scientific Publications No. 82, Lyon.
  • Rothman, K. and Greenland, S. (1998). Modern Epidemiology, Second Edition, Lippincott-Raven.


Supplementary Texts

• R Computing Environment:
  • Bivand, R.S., Pebesma, E.J. and Gomez-Rubio, V. (2013). Applied Spatial Data Analysis with R, Second Edition, Springer.
  • Hills, M., Plummer, M. and Carstensen, B. (2006). Statistical Practice in Epidemiology with R.
  • Krause, A. and Olson, M. (2005). The Basics of S-Plus, Fourth Edition, Springer-Verlag.


Supplementary Texts

Additional Spatial Books:

• Banerjee, S., Gelfand, A.E. and Carlin, B.P. (2004). Hierarchical Modeling and Analysis for Spatial Data, CRC Press.

• Diggle, P.J. (2013). Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, CRC Press.

• Diggle, P.J. and Ribeiro, P.J. (2007). Model-Based Geostatistics, Springer.

• Gelfand, A., Diggle, P., Guttorp, P. and Fuentes, M., editors (2010). Handbook of Spatial Statistics, CRC Press.

• Lawson, A.B. (2006). Statistical Methods in Spatial Epidemiology, 2nd Edition, John Wiley and Sons.

• Lawson, A.B., Browne, W.J. and Rodeiro, C.L.V. (2003). Disease Mapping with WinBUGS and MLwiN, John Wiley and Sons.

• Schabenberger, O. and Gotway, C.A. (2004). Statistical Methods for Spatial Data Analysis, CRC Press.

• Stein, M.L. (1999). Interpolation of Spatial Data: Some Theory for Kriging, Springer.


Course Outline

• Introduction: Motivation of the need for spatial epidemiology; types of spatial study and examples; overview of epidemiological framework; overview of statistical techniques.

• Disease Mapping: Non-spatial and spatial smoothing models; Bayesian inference and computation (INLA software); statistical models; space-time modeling; prevalence mapping; geostatistics; examples.

• Clustering and Cluster Detection: Distance/adjacency methods; moving window methods; risk surface estimation; examples.

• Infectious Disease Data: Spatio-temporal models for count data; SIR models; examples.

• Small Area Estimation: Introduction to survey data; weighted estimators; spatial smoothing; examples.


Course Schedule

• Monday 11.00–12.30: Motivation, Types of Data, GIS, Spatial Analysis in R.

• Monday 2.00–3.30: Disease Mapping: Spatial Count Data.

• Monday 4.00–5.30: Disease Mapping: Spatial Count Data, Continued.

• Tuesday 9.00–10.30: Disease Mapping: Spatio-Temporal Count Data.

• Tuesday 11.00–12.30: Disease Mapping: Geostatistical Approaches.

• Tuesday 2.00–3.30: Clustering and Cluster Detection: Count Data.

• Tuesday 4.00–5.30: Clustering and Cluster Detection: Point Data.

• Wednesday 9.00–10.30: Infectious Disease Data.

• Wednesday 11.00–12.30: Small Area Estimation.

Lectures will be broken into teaching portions and practical sessions.


Motivation: Spatial Epidemiology

• Epidemiology: The study of the distribution, causes and control of diseases in human populations.

• Disease risk depends on the classic epidemiological triad of person (genetics/behavior), place and time; spatial epidemiology focuses on the second of these.

• Place is a surrogate for exposures present at that location, e.g. environmental exposures in water/air/soil, or the lifestyle characteristics of those living in particular areas.

• Time, which may be measured on different scales (age/period/cohort), is also a surrogate for aging processes and exposures/experiences accrued.


Types of Data

• Key Point: People are not uniformly distributed in space. We therefore need information on the background spatial distribution of the population at risk in order to infer whether the spatial distribution of cases differs.

• An important distinction is whether the data arise as:
  • Point data, in which “exact” residential locations exist for cases and non-cases, or
  • Count data, in which aggregation (typically over administrative units) has been carried out. These data are ecological in nature, in that they are collected across groups; in spatial studies the groups are geographical areas.


Need for Spatial Methods

• All epidemiological studies are spatial!

• But often the study area is small and/or there is abundant individual-level information, and so spatial location is not acting as a surrogate for risk factors.

• When do we consider the spatial component?
  • When we are explicitly interested in the spatial pattern of disease incidence, e.g. disease mapping, cluster detection.
  • When we want to leverage spatial dependence in rates to improve estimation, e.g. small area estimation.
  • When the clustering is a nuisance quantity that we wish to acknowledge but are not explicitly interested in, e.g. in spatial regression we want to get appropriate standard errors.

• If we are interested in the spatial pattern then, if the data are not a complete enumeration, we would clearly prefer the data to be “randomly sampled in space”, i.e. not subject to selection bias that has a spatial component.


Motivation

Growing interest in spatial epidemiology due to:

• Public interest in effects of environmental “pollution”, e.g. Sellafield, UK (Gardner, 1992), Three Mile Island, US.

• Acknowledgment that many environmental/man-made risk factors may be detrimental to human health.

• Development of statistical/epidemiological methods for investigating disease “clusters”.

• Epidemiological interest in the existence of large/medium spread in chronic disease rates across different areas.

• Data availability: collection of health, population and exposure data at different geographical scales.

• Increase in computing power and tools such as Geographical Information Systems (GIS).


Overview of Epidemiological Framework

Definition of Measures of Disease Occurrence:

• The incidence (proportion) measures the proportion of people who develop the disease during a specified period.

• The prevalence (proportion) is the proportion of people with the disease at a certain time.

• The risk is the probability of developing the disease within a specified time interval; it can be estimated by the incidence proportion (note: it is a probability, so it lies between 0 and 1).

• The relative risk is the ratio of risks under two exposure distributions (e.g., exposed and not exposed).

Precise definitions of the outcomes and exposures under study are required.

The majority of epidemiological studies are observational in nature. In contrast, an intervention study provides an example of an experimental study.
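As a small illustration of the definitions above, the following R sketch computes incidence proportions (risk estimates) and a relative risk. The counts are invented for illustration and are not taken from the course material.

```r
# Hypothetical cohort followed for a fixed period (all numbers invented).
cases   <- c(exposed = 30, unexposed = 10)      # new cases during the period
at_risk <- c(exposed = 1000, unexposed = 1000)  # disease-free at the start

# The incidence proportion estimates the risk in each exposure group.
risk <- cases / at_risk

# Relative risk: ratio of the risks under the two exposure conditions.
relative_risk <- unname(risk["exposed"] / risk["unexposed"])

print(risk)
print(relative_risk)
```

Both risk estimates are proportions and so lie between 0 and 1, whereas the relative risk is a ratio and can take any non-negative value.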


Observational Study Types

• Cohort studies select a study population and obtain exposure information. The population is subsequently followed over time to determine incidence. This requires large numbers of individuals (since diseases are usually statistically rare) and a long study duration (for most exposures/diseases).

• Case-control studies begin by identifying “cases” of the disease and a set of “controls”; exposure is then determined retrospectively. Although subject to selection bias, they can overcome the difficulties of cohort studies.

• Matched case-control studies are case-control studies in which cases are matched with controls on the basis of confounders.
  • Efficiency considerations lead to the conclusion that 3–5 controls to each case is usually sufficient.
  • Frequency matched studies ensure that the ratio of controls to cases is roughly constant within broad confounder bands (e.g. 10-year age bands). Individually matched studies carry out precise matching.
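The “3–5 controls per case” guideline can be motivated by a standard approximation that is not stated on the slide and is assumed here: under the null hypothesis, the efficiency of using m controls per case, relative to an unlimited number of controls, is roughly m/(m+1). A minimal R sketch under that assumption:

```r
# Assumed approximation (not from the slides): relative efficiency of
# m controls per case versus unlimited controls is about m / (m + 1).
m <- 1:10
rel_eff <- m / (m + 1)

# Marginal gain from adding one more control per case.
gain <- c(NA, diff(rel_eff))

print(data.frame(controls_per_case       = m,
                 relative_efficiency     = round(rel_eff, 3),
                 gain_from_extra_control = round(gain, 3)))
```

Under this approximation the gains shrink quickly: beyond five controls per case, each additional control adds less than 0.03 to the relative efficiency.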


Observational Study Types

• Cross-sectional studies determine the exposure and disease outcome on a sample of individuals at a particular point in time.

• The nested case-control study starts with a cohort and identifies cases that have already occurred, or as they occur. For each case, a specified number of controls is selected from among those in the cohort who have not developed the disease by the time of disease occurrence in the case¹. Some form of time-matching is carried out.

• A case-cohort study is a variant on the nested case-control study without matching. Hence, the same sub-cohort (a random sample of the complete cohort) can be used for multiple disease outcomes.

• Ecological studies use data on groups, which are areas in a spatial setting. There is no direct linkage between individual disease and exposures/confounders.

• Semi-ecological studies collect individual-level data on disease outcome and confounders, and supplement these with ecological exposure information.

¹ Theoretically every case-control study takes place within a cohort, but identifying the cohort is often difficult.


Survival and Poisson Models

• Suppose we observe an area containing $N$ people for a period $[0,P]$ and observe $Y$ cases of disease.

• For such count data, throughout the course, a starting point for analysis is

$$Y \mid \theta \sim \text{Poisson}(N\theta), \tag{1}$$

where $\theta$ is the risk, i.e. the probability of disease for an individual in the area in the time frame $[0,P]$.

• We show this model is related to an underlying survival model.

• Let $T > 0$ be the survival time, i.e. time until disease develops, for an individual in the area.

• The survivor function is defined as $S(t) = \Pr(T \geq t)$, and the probability density function is

$$f(t) = -\frac{dS(t)}{dt}.$$


Survival Modeling

• The hazard function is defined as

$$\lambda(t) = -\frac{S'(t)}{S(t)} = \frac{f(t)}{S(t)},$$

i.e. the negative of the relative rate of change of the survivor function.

• The hazard rate is the limiting “instantaneous probability” of failure.

• Interpretation: Consider a small interval of time $\partial t$; then

$$\Pr(\text{failure in } [t, t + \partial t) \mid \text{survival until } t) = \Pr(t \leq T < t + \partial t \mid T \geq t) \approx \lambda(t) \times \partial t.$$

• Note: $\lambda(t)$ is a rate (a dynamic quantity), not a probability (a static quantity).
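The interpretation of the hazard can be checked numerically. The sketch below assumes a Weibull survival time, with shape and scale chosen arbitrarily for illustration, and compares the exact conditional failure probability over a short interval with the hazard multiplied by the interval length.

```r
# Illustrative Weibull survival model; parameter values are arbitrary.
shape <- 1.5
scale <- 10

S      <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)  # survivor function
f      <- function(t) dweibull(t, shape, scale)                      # density
hazard <- function(t) f(t) / S(t)                                    # lambda(t) = f(t) / S(t)

t0 <- 5     # time already survived
dt <- 0.01  # length of the small interval

# Exact Pr(t0 <= T < t0 + dt | T >= t0) and its hazard-based approximation.
exact_prob  <- (S(t0) - S(t0 + dt)) / S(t0)
approx_prob <- hazard(t0) * dt

print(c(exact = exact_prob, approximation = approx_prob))
```

Shrinking `dt` brings the two values closer together, while `hazard(t0)` itself is unchanged and is not restricted to lie below 1.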


Survival Modeling

• Suppose we have a constant hazard (i.e., an exponential survival model) so that $\lambda(t) = \lambda$.

• The survivor function is $S(t) = \exp(-\lambda t)$ and so

$$\Pr(\text{failure in } [0,P]) = 1 - \exp(-\lambda P).$$

• Hence, assuming independence of each of the $N$ individuals, we have

$$Y \mid \lambda \sim \text{Binomial}\left[N,\ 1 - \exp(-\lambda P)\right].$$


Survival Modeling

• If $\lambda P$ is small,

$$1 - \exp(-\lambda P) \approx \lambda P,$$

and so the failure probability (risk) is approximated by the cumulative rate.

• We then have the model

$$Y \mid \lambda \sim \text{Binomial}(N, \lambda P).$$

• For a rare disease,

$$Y \mid \lambda \sim \text{Poisson}(N\lambda P).$$

• Note that $NP$ approximates the person years at risk.

• Hence, in our original Poisson model (1) we have $\theta = \lambda P$.
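The two approximations in this chain (risk by cumulative rate, Binomial by Poisson) can be inspected numerically. The values of `N`, `lambda` and `P` below are assumptions chosen to represent a rare disease; they do not come from the course.

```r
# Illustrative values (assumptions): a rare disease in a moderately large area.
N      <- 5000   # people in the area
lambda <- 0.002  # constant hazard, per person-year
P      <- 5      # length of the observation period in years

risk_exact  <- 1 - exp(-lambda * P)  # exact failure probability in [0, P]
risk_approx <- lambda * P            # cumulative rate, valid when lambda * P is small

# Compare the Binomial model with its Poisson approximation over a range of counts.
y <- 0:150
binom_pmf <- dbinom(y, size = N, prob = risk_exact)
pois_pmf  <- dpois(y, lambda = N * risk_approx)

print(c(risk_exact = risk_exact, risk_approx = risk_approx))
print(max(abs(binom_pmf - pois_pmf)))  # largest pointwise difference in probability
```

Increasing `lambda * P` (a common disease or a long period) makes both approximations visibly worse, which is why the Poisson starting point is tied to rare outcomes.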


Survival Analysis

• For general survival models the cumulative hazard function is defined as

$$\Lambda(t) = \int_0^t \lambda(s)\,ds.$$

• Since

$$\frac{d}{dt}\Lambda(t) = \lambda(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt}\log S(t),$$

we can express the survivor function as

$$S(t) = \exp\{-\Lambda(t)\}.$$


Survival Modeling

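• The failure probability in $[0,P]$ is

$$1 - S(P) = \Pr(\text{failure in } [0,P]) = 1 - \exp\left\{-\int_0^P \lambda(t)\,dt\right\} = 1 - \exp\{-\Lambda(P)\} \approx \Lambda(P),$$

where the approximation holds when $\Lambda(P)$ is small, leading to

$$Y \mid \Lambda(P) \sim \text{Poisson}\left[N\Lambda(P)\right],$$

so that in this case, in the Poisson model (1), $\theta$ represents the risk and approximates the cumulative rate $\Lambda(P)$.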
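For a hazard that is not constant, the cumulative hazard can be obtained by numerical integration. The sketch below assumes a linearly increasing hazard, purely for illustration, and compares the resulting risk with the cumulative rate.

```r
# Illustrative increasing hazard (an assumption): lambda(t) = a + b * t per year.
a <- 0.001
b <- 0.0004
hazard <- function(t) a + b * t

P <- 5  # observation period in years

# Cumulative hazard Lambda(P) by numerical integration, with a closed-form check.
Lambda_P      <- integrate(hazard, lower = 0, upper = P)$value
Lambda_closed <- a * P + b * P^2 / 2

risk <- 1 - exp(-Lambda_P)  # failure probability 1 - S(P)

N <- 10000  # hypothetical population size
print(c(Lambda_P = Lambda_P, risk = risk, poisson_mean = N * Lambda_P))
```

Here the risk and the cumulative rate agree to about four decimal places, so the Poisson mean `N * Lambda_P` is a close stand-in for `N * risk`.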


Survival Modeling

• Models for $\lambda(t)$ can be complex, since damaging processes can be occurring on different temporal scales.

• $\lambda(t)$ can vary with different rates in different years; these are sometimes known as period effects.

• The other obvious extension is to allow different hazard rates at different ages.

• Finally, there may be different rates for different cohorts, i.e. individuals born at different times.

• Combining the three different time scales gives age-period-cohort models (Carstensen, 2007; Kuang et al., 2008).


Risks and Rates

• We can estimate the probability of failure (the risk) in $[0,P]$ (years) by

$$p = \frac{Y}{N}.$$

• The rate is estimated by

$$\lambda = \frac{p}{P} = \frac{Y}{NP},$$

i.e. the number of events divided by the person years.

• Note that a rate does not need to lie between 0 and 1.

• The rate is sometimes expressed per some number of person years. For example, if the units of $P$ are years, the rate per 1000 person years is

$$1000 \times \lambda = 1000 \times \frac{p}{P} = 1000 \times \frac{Y}{NP}.$$
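A short worked example of these estimates in R, using invented values of `Y`, `N` and `P`:

```r
# Hypothetical data: Y cases among N people observed for P years.
Y <- 45
N <- 12000
P <- 3

p      <- Y / N        # estimated risk over [0, P]
lambda <- Y / (N * P)  # estimated rate per person year
rate_per_1000 <- 1000 * lambda

print(c(risk = p, rate = lambda, rate_per_1000_person_years = rate_per_1000))
```

The risk refers to the whole period, whereas the rate is per unit of person time, so changing the units of `P` (say from years to months) changes the rate but not the risk.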


Standardization

• We have so far not considered different confounder (e.g. age) groups in each area.

• We will almost always have to account for age in the analysis, since different disease risks in different areas may reflect differences in the age distribution of the population.

• There are a number of ways to control for confounding; common methods are direct and indirect standardization.

• Let
  • $Y_{ij}$ denote the number of cases, within some specified period, in area $i$ and confounder stratum $j$, and
  • $N_{ij}$ be the corresponding population at risk, $i = 1, \dots, m$, $j = 1, \dots, J$.
  • Let $Z_j$ denote the number of cases in stratum $j$ of a “reference”, or “standard”, population.
  • Let $M_j$ be the population in stratum $j$ of this “reference”, or “standard”, population.


Spatial Context of Risks and Rates

• The risk of disease in confounder stratum $j$ in area $i$, over the time period $[0,P]$ (in years, say), is

$$p_{ij} = \frac{Y_{ij}}{N_{ij}}.$$

• The rate of disease per 1000 person years is

$$1000 \times \lambda_{ij} = \frac{1000 \times Y_{ij}}{P \times N_{ij}}.$$

• The crude rate in area $i$ per 1000 person years is

$$\frac{1000 \times Y_i}{P \times N_i},$$

where $Y_i = \sum_{j=1}^J Y_{ij}$ is the total number of cases and $N_i = \sum_{j=1}^J N_{ij}$ is the total population in area $i$.


Standardization

• The directly standardized rate, per 1000 person years, in area $i$ is

$$1000 \times \sum_{j=1}^J \lambda_{ij} w_j,$$

where

$$w_j = \frac{M_j}{\sum_{j'=1}^J M_{j'}}$$

is the proportion of the reference population in stratum $j$ (these weights may be based on the world, or a uniform, population).

• The directly standardized rate is a weighted average of the stratum-specific rates, and corresponds to a counterfactual argument in which the estimated rates within the study region are applied to the standard population.
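The weighted-average form of the directly standardized rate can be sketched in R as follows. All counts, populations and the standard population are invented for illustration; the object names are not from the course.

```r
# Hypothetical data for m = 2 areas and J = 3 age strata (all numbers invented).
P <- 5                                                    # years of observation
Y <- matrix(c(2, 10, 40,
              1, 12, 30), nrow = 2, byrow = TRUE)         # cases Y_ij
N <- matrix(c(4000, 3000, 1000,
              2000, 3000, 3000), nrow = 2, byrow = TRUE)  # populations N_ij
M <- c(50000, 30000, 20000)                               # standard population M_j

lambda <- Y / (P * N)  # stratum-specific rates lambda_ij
w      <- M / sum(M)   # weights w_j, summing to 1

crude  <- 1000 * rowSums(Y) / (P * rowSums(N))  # crude rate per 1000 person years
direct <- 1000 * as.vector(lambda %*% w)        # directly standardized rate

print(rbind(crude = crude, directly_standardized = direct))
```

In this invented example the second area has an older population, so its crude rate is close to that of the first area even though its stratum-specific rates are mostly lower; standardizing to a common age structure separates the two areas more clearly.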


Standardization

• If

$$\psi_j = \frac{Z_j}{P \times M_j}$$

is a standard disease rate in stratum $j$ then the comparative mortality/morbidity figure (CMF) for area $i$ is:

$$\text{CMF}_i = \frac{\sum_{j=1}^J \lambda_{ij} w_j}{\sum_{j=1}^J \psi_j w_j}.$$

• In small-area studies in particular the CMF is rarely used since it is very unstable, due to small counts $Y_{ij}$ in stratum $j$ in area $i$.

• Hence, for data aggregated over areas we will concentrate on the Standardized Mortality/Morbidity Ratio (SMR).
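The CMF, and its sensitivity to small stratum counts, can be illustrated with a small R sketch. All numbers are invented; the second calculation adds a single case to the sparsest stratum to show how much the ratio moves.

```r
# Hypothetical inputs (invented): one small study area with J = 3 strata.
P   <- 5
Y_i <- c(1, 3, 8)              # cases in area i by stratum
N_i <- c(400, 300, 100)        # population in area i by stratum
Z   <- c(100, 600, 2000)       # cases in the standard population
M   <- c(50000, 30000, 20000)  # standard population by stratum

lambda_i <- Y_i / (P * N_i)  # area-specific stratum rates lambda_ij
psi      <- Z / (P * M)      # standard stratum rates psi_j
w        <- M / sum(M)       # weights w_j

CMF_i <- sum(lambda_i * w) / sum(psi * w)

# Instability: one extra case in the first stratum of the area.
Y_alt   <- Y_i + c(1, 0, 0)
CMF_alt <- sum(Y_alt / (P * N_i) * w) / sum(psi * w)

print(c(CMF = CMF_i, CMF_with_one_extra_case = CMF_alt))
```

A single additional case changes the CMF by several percent here, because each stratum-specific rate is estimated from only a handful of events.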


Indirect Standardization

• The method of indirect standardization produces the standardized mortality/morbidity ratio (SMR):

$$\text{SMR}_i = \frac{Y_i}{\sum_{j=1}^J N_{ij} q_j},$$

where $q_j$ is a reference risk.

• The indirectly standardized rate compares the total number of cases in an area to the number that would result if the risks in the reference population were applied to the population of area $i$.

• Which reference rates should be used?

• In a regression setting it is dangerous to use internal standardization, in which

$$q_j = \frac{Y_j}{N_j}$$

is computed from the study data themselves, since we might distort the effect of the exposure of interest.

• External standardization uses risks/rates from another region.
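A minimal R sketch of indirect standardization with external reference risks follows. The observed counts, populations and reference risks `q` are all invented for illustration.

```r
# Hypothetical data: m = 3 areas, J = 2 strata (all numbers invented).
Y <- c(12, 30, 7)                                  # observed cases Y_i
N <- matrix(c(1000,  500,
              2000, 1500,
               800,  200), nrow = 3, byrow = TRUE) # populations N_ij
q <- c(0.005, 0.02)                                # external reference risks q_j

# Cases expected if the reference risks applied to each area's population.
E   <- as.vector(N %*% q)
SMR <- Y / E

print(data.frame(area = 1:3, observed = Y, expected = E, SMR = SMR))
```

Only the area totals `Y` are needed on the observed side, which is what makes the SMR usable when stratum-specific counts within each area are sparse.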


Expected Numbers

• We have $E[Y_{ij}] = N_{ij} p_{ij}$, where $p_{ij}$ is the risk in area $i$ and stratum $j$.

• In a general model we would try to estimate all parameters $p_{ij}$ which, for small areas in particular, is not possible.

• As an alternative we can assume the proportionality model

$$p_{ij} = \theta_i \times q_j,$$

so that $\theta_i$ is the relative risk associated with area $i$.

• We then have $E[Y_{ij}] = N_{ij}\theta_i \times q_j$. Summing over strata:

$$E[Y_i] = E\left[\sum_{j=1}^J Y_{ij}\right] = \theta_i \sum_{j=1}^J N_{ij} q_j = \theta_i E_i,$$

where the expected numbers are defined as

$$E_i = \sum_{j=1}^J N_{ij} q_j.$$

• This assumption removes the need to estimate $J$ risks in each area.


Expected Numbers

• The SMR in area $i$ is

$$\text{SMR}_i = \frac{Y_i}{E_i}$$

and is an estimate of $\theta_i$.

• If incidence is measured then it is also known as the Standardized Incidence Ratio (SIR).

• Control for confounding may also be carried out using regression modeling.

• If there is a concern with confounding then we can choose not to collapse the data over strata, and instead fit models to the $Y_{ij}$ (and so estimate the $q_j$ along with other parameters in the model).
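One way to fit a model to the uncollapsed counts is a Poisson log-linear regression with the stratum populations as an offset. The sketch below simulates data under the proportionality model and fits such a regression with `glm`; the sizes, risks and relative risks are assumptions made for illustration, and this is only one possible formulation rather than the course's own analysis.

```r
set.seed(1)
m <- 20  # number of areas (illustrative)
J <- 4   # number of strata (illustrative)

dat   <- expand.grid(area = factor(1:m), stratum = factor(1:J))
dat$N <- 5000                         # population N_ij, constant for simplicity
q     <- c(0.001, 0.003, 0.01, 0.03)  # assumed stratum risks q_j
theta <- exp(rnorm(m, sd = 0.3))      # assumed area relative risks theta_i

# Simulate Y_ij ~ Poisson(N_ij * theta_i * q_j).
mu    <- dat$N * theta[as.integer(dat$area)] * q[as.integer(dat$stratum)]
dat$Y <- rpois(nrow(dat), mu)

# Log-linear model: log E[Y_ij] = log N_ij + area effect + stratum effect.
fit <- glm(Y ~ area + stratum + offset(log(N)), family = poisson, data = dat)

# Exponentiated stratum effects estimate q_j / q_1.
est_ratio <- exp(coef(fit)[paste0("stratum", 2:J)])
print(round(cbind(estimated = est_ratio, truth = q[-1] / q[1]), 2))
```

The stratum effects are estimated from the data rather than fixed in advance, which is the sense in which the $q_j$ are estimated along with the other parameters.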


Data Quality Issues

• In routinely carried out investigations the constituent data are often subject to errors; local knowledge is invaluable for understanding/correcting these errors.

• Wakefield and Elliott (1999) contains more discussion of these aspects.

Population data

• Population registers are the gold standard, but counts from the census are those that are typically routinely available.

• Census counts should be treated as estimates, however, since inaccuracies, in particular underenumeration, are common.

• For intercensal years, as well as births and deaths, migration must also be considered.

• The geography, that is, the geographical areas of the study variables, may also change across censuses, which causes complications.


Data Quality Issues

Health data

• For any health event there is always the possibility of diagnostic error or misclassification.

• For other events such as cancers, case registrations may be subject to double counting and under-registration.

Exposure data

• Exposure misclassification is always a problem in epidemiological studies.

• Often the exposure variable is measured at distinct locations within the study region, and some value is imputed for all of the individuals/areas in the study.

• A measure of uncertainty in the exposure variable for each individual/area is invaluable as an aid to examining the sensitivity of the observed relative risks.

Combining the population, health and exposure data is easiest if such data are nested, that is, the geographical units are non-overlapping.


Geographical Information Systems (GIS)

• A GIS is a computer-based set of tools for collecting, editing, storing, integrating, displaying and analyzing spatially referenced data.

• A GIS allows linkage and querying of geographically indexed information.

• So, for example, for a set of geographical residential locations a GIS can be used to retrieve characteristics of the neighborhood within which each location lies (e.g. census-based measures such as population characteristics and distributions of income/education), and the proximity to point (e.g. incinerator) and line (e.g. road) sources of pollution.

• Buffering: a specific type of spatial query in which an area is defined within a specific distance of a particular point, line or area.
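A buffer query around point sources can be sketched in base R when the coordinates are projected (planar, in metres). The locations below are simulated and the 1 km radius is an arbitrary choice for illustration; a GIS would additionally construct the buffer polygon itself and handle line and area sources.

```r
# Hypothetical projected coordinates in metres (invented for illustration).
set.seed(42)
homes   <- cbind(x = runif(200, 0, 10000), y = runif(200, 0, 10000))
sources <- cbind(x = c(2500, 7000), y = c(3000, 8000))  # e.g. two incinerators
radius  <- 1000                                          # buffer distance: 1 km

# Euclidean distance from every home to every source (rows: homes, columns: sources).
d <- sqrt(outer(homes[, "x"], sources[, "x"], "-")^2 +
          outer(homes[, "y"], sources[, "y"], "-")^2)

in_buffer    <- d <= radius             # membership of each source's buffer
near_source  <- rowSums(in_buffer) > 0  # inside at least one buffer
dist_nearest <- apply(d, 1, min)        # proximity to the nearest source

print(colSums(in_buffer))  # number of homes within 1 km of each source
print(sum(near_source))    # number of homes within 1 km of any source
```

Straight-line distance on raw longitude/latitude values would not be appropriate; the sketch assumes the coordinates have already been projected.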


Geographical Information Systems (GIS)

• Time activity modeling of exposures: we may trace the pathway of an individual, or simulate the movements of a population group through a particular space-time concentration field, in order to obtain an integrated exposure.

• In this course, we will not use any GIS tools, but will use capabilities within R and WinBUGS/GeoBUGS to display maps.

Examples

• Figure 1 shows a map of Washington state with various features superimposed; this was created with the Maptitude GIS.

• Figure 2 shows smoothed relative risk estimates for bladder cancer.

• Figure 3 shows 16 monitor sites in London; a GIS was used to extract mortality and population data within 1km of the monitors, and the association with SO2 was estimated.


Figure 1 : Features of Washington state, created using a GIS.


Figure 2 : Smoothed relative risk estimates for bladder cancer in 1990–2000 for counties of Washington state.


Figure 3 : Air pollution monitor sites in London.


R for Spatial Analysis

• R has extensive spatial capabilities, see the Spatial task view:

http://cran.r-project.org/web/views/Spatial.html

• Chris Fowler’s R GIS class site:

http://csde.washington.edu/services/gis/workshops/SPATIALR.shtml

• Other GIS resources:

http://geostat-course.org/node

• Some of the notes that follow are strongly based on Roger Bivand’s notes taken from this site, and these are based on Bivand et al. (2013), which is the reference book!


Representing Spatial Data

• Spatial classes were defined to represent and handle spatial data, so that data can be exchanged between different classes.

• The sp library is the workhorse for representing spatial data.

• The most basic spatial object is a 2D or 3D point: a set of coordinates may be used to define a SpatialPoints object.

• From the help function:

    SpatialPoints(coords, proj4string=CRS(as.character(NA)),
                  bbox = NULL)

• PROJ.4 is a library for performing conversions between cartographic projections.

• The points in a SpatialPoints object may be associated with a set of attributes to give a SpatialPointsDataFrame object.


Representing Spatial Data

• The splancs package predates sp and so does not use Spatial objects:

    library(sp); library(splancs)
    data(southlancs); summary(southlancs)
    ##        x                y                cc
    ##  Min.   :346475   Min.   :412437   Min.   :0.00000
    ##  1st Qu.:353031   1st Qu.:417358   1st Qu.:0.00000
    ##  Median :355870   Median :421900   Median :0.00000
    ##  Mean   :355526   Mean   :421518   Mean   :0.05852
    ##  3rd Qu.:358202   3rd Qu.:425984   3rd Qu.:0.00000
    ##  Max.   :364435   Max.   :428987   Max.   :1.00000

    # Coordinates only
    SpPtsObj <- SpatialPoints(southlancs[, c("x", "y")])
    summary(SpPtsObj)
    ## Object of class SpatialPoints
    ## Coordinates:
    ##      min    max
    ## x 346475 364435
    ## y 412437 428987
    ## Is projected: NA
    ## proj4string : [NA]
    ## Number of points: 974

    # Coordinates plus the case/control indicator cc as an attribute
    SpPtsDFObj <- SpatialPointsDataFrame(coords=SpPtsObj,
                      data=as.data.frame(southlancs$cc))


Spatial Lines and Polygons

• A Line object is just a collection of 2D coordinates, while a Polygon object is a Line object with equal first and last coordinates.

• A Lines object is a list of Line objects, such as all the contours at a single elevation; the same relationship holds between a Polygons object and a list of Polygon objects, such as islands belonging to the same county.

• SpatialLines and SpatialPolygons objects are made using lists of Lines and Polygons objects, respectively.

• SpatialLinesDataFrame and SpatialPolygonsDataFrame objects are defined using SpatialLines and SpatialPolygons objects and standard data frames; the ID fields are required to match the data frame row names.

• For data on rectangular grids (oriented N-S, E-W) there are two representations: SpatialPixels and SpatialGrid.

• Spatial*DataFrame family objects usually behave like data frames, so most data frame techniques work with the spatial versions, e.g. [] or $.
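The class hierarchy above can be sketched with a toy example. The two squares, the IDs "A" and "B", and the pop values are invented for illustration; only the sp constructors are taken from the package.

```r
library(sp)

# Two unit squares (made-up geometry). Each ring repeats its first vertex
# as its last, and is listed clockwise.
sq1 <- cbind(x = c(0, 0, 1, 1, 0), y = c(0, 1, 1, 0, 0))
sq2 <- cbind(x = c(2, 2, 3, 3, 2), y = c(0, 1, 1, 0, 0))

# Polygon -> Polygons (one per area, carrying an ID) -> SpatialPolygons
areaA <- Polygons(list(Polygon(sq1, hole = FALSE)), ID = "A")
areaB <- Polygons(list(Polygon(sq2, hole = FALSE)), ID = "B")
areas <- SpatialPolygons(list(areaA, areaB))

# Attribute table: the row names must match the Polygons IDs
attrs   <- data.frame(pop = c(120, 340), row.names = c("A", "B"))
areasDF <- SpatialPolygonsDataFrame(areas, attrs, match.ID = TRUE)

areasDF$pop                         # data-frame style access with $
big <- areasDF[areasDF$pop > 200, ] # row subsetting with []
row.names(big@data)
```

If the row names of attrs did not match the IDs, SpatialPolygonsDataFrame would stop with an error rather than silently misalign attributes and polygons.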


Visualizing Spatial Data

• We demonstrate how points and polygons can be plotted on the same graph.

• Note that the default is for axes not to be included.

    data(meuse)
    coords <- SpatialPoints(meuse[, c("x", "y")])
    summary(coords)
    ## Object of class SpatialPoints
    ## Coordinates:
    ##      min    max
    ## x 178605 181390
    ## y 329714 333611
    ## Is projected: NA
    ## proj4string : [NA]
    ## Number of points: 155


Visualizing Spatial Data

    meuse1 <- SpatialPointsDataFrame(coords, meuse)
    data(meuse.riv)
    # Wrap the river outline as Polygon -> Polygons -> SpatialPolygons
    river_polygon <- Polygons(list(Polygon(meuse.riv)), ID="meuse")
    rivers <- SpatialPolygons(list(river_polygon))
    summary(rivers)
    ## Object of class SpatialPolygons
    ## Coordinates:
    ##        min      max
    ## x 178304.0 182331.5
    ## y 325698.5 337684.8
    ## Is projected: NA
    ## proj4string : [NA]

    plot(as(meuse1, "Spatial"), axes=TRUE)  # sets up the plot region with axes
    plot(meuse1, add=TRUE)                  # sampling points
    plot(rivers, add=TRUE)                  # river outline


Figure 4 : The Meuse river and sampling points.


Spatial Pixels and Grids

• For data on rectangular grids (oriented N-S, E-W) there are two representations: SpatialPixels and SpatialGrid.

• SpatialPixels are like SpatialPoints objects, but the coordinates have to be regularly spaced. Coordinates and grid indices are stored.

• SpatialPixelsDataFrame objects only store attribute data where it is present, but need to store the coordinates and grid indices of those grid cells.

• SpatialGridDataFrame objects do not need to store coordinates, because they fill the entire defined grid, but they need to store NA values where attribute values are missing.

• Spatial*DataFrame family objects usually behave like data frames, so most data frame techniques work with the spatial versions, e.g. [] or $.


Visualizing Spatial Data

• Spatial data can be plotted in a variety of ways, see Chapter 3 of Bivand et al. (2013).

• The most obvious is to use the regular plotting functions, by converting Spatial data frames to regular data frames, for example using as.data.frame.

• Trellis graphics (which produce conditional plots) are particularly useful for plotting maps over time.
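Both routes can be sketched with the meuse data from sp. The object names meuse_sp and meuse_df are chosen here for illustration, and the two-panel spplot call (zinc and lead) is only a stand-in for a set of maps over time, since meuse has no time dimension.

```r
library(sp)

data(meuse)
meuse_sp <- meuse
coordinates(meuse_sp) <- c("x", "y")  # promote to SpatialPointsDataFrame

# Route 1: back to an ordinary data frame, then base graphics.
meuse_df <- as.data.frame(meuse_sp)   # attributes plus the x and y columns
plot(meuse_df$x, meuse_df$y, asp = 1, pch = 19, cex = 0.5,
     xlab = "x", ylab = "y")

# Route 2: trellis (lattice) graphics via spplot, one panel per variable.
print(spplot(meuse_sp, c("zinc", "lead")))
```

With repeated measurements stored as one column per period, the same spplot call would give one map panel per period on a common colour scale.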


Visualizing Spatial Data

• We construct a SpatialPixelsDataFrame object for the Meuse river grid data provided.

    data(meuse.grid)  # a grid with 40 m x 40 m spacing that covers
                      # the Meuse study area
    coords <- SpatialPixels(SpatialPoints(meuse.grid[, c("x", "y")]))
    meuseg1 <- SpatialPixelsDataFrame(coords, meuse.grid)
    plot(rivers, axes=TRUE, col="azure1", xlim=c(176000, 182000),
         ylim=c(329400, 334000))
    box()
    plot(meuseg1, add=TRUE, col="grey60", cex=0.15)


Figure 5 : A grid next to the Meuse river.


Visualizing Spatial Data

• Now we plot a continuous variable using a particular class interval style.

• The “Fisher-Jenks” style uses the “natural breaks” of class intervals, based on minimizing the within-class variance.

    library(classInt)
    library(RColorBrewer)
    pal <- brewer.pal(3, "Blues")
    fj5 <- classIntervals(meuse1$zinc, n=5, style="fisher")
    plot(fj5, pal=pal)                   # empirical CDF with class breaks
    fj5cols <- findColours(fj5, pal)     # one colour per observation
    plot(as(meuse1, "Spatial"), axes=TRUE)
    plot(meuse1, col=fj5cols, pch=19, add=TRUE)
    legend("topleft", fill=attr(fj5cols, "palette"),
           legend=names(attr(fj5cols, "table")), bty="n")
    library(lattice)
    bubble(meuse1, "zinc", maxsize=1.8, key.entries=100*2^(0:4),
           main="zinc levels at Meuse")
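The "minimizing the within-class variance" property can be checked numerically. The sketch below compares Fisher breaks with quantile breaks on the built-in islands data, which is skewed like zinc but unrelated to the Meuse example; the helper wss() is written here for illustration and is not part of classInt.

```r
library(classInt)

# Total within-class sum of squares for a given set of break points.
wss <- function(x, brks) {
  cls <- cut(x, breaks = unique(brks), include.lowest = TRUE)
  sum(tapply(x, cls, function(v) sum((v - mean(v))^2)), na.rm = TRUE)
}

x <- as.numeric(datasets::islands)  # landmass areas, strongly right-skewed

fisher_brks   <- classIntervals(x, n = 5, style = "fisher")$brks
quantile_brks <- classIntervals(x, n = 5, style = "quantile")$brks

wss_fisher   <- wss(x, fisher_brks)
wss_quantile <- wss(x, quantile_brks)
c(fisher = wss_fisher, quantile = wss_quantile)
```

The Fisher value should never exceed the quantile value, because the Fisher breaks are chosen to minimize exactly this quantity; on skewed data the gap is typically large.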

Figure 6 : Illustration of Fisher-Jenks natural breaks with five classes. [Plot: ecdf(x$var), with Fn(x) from 0 to 1 plotted against x from 0 to 2000.]

Figure 7 : Zinc levels plotted using Fisher-Jenks breaks. [Legend classes: [113,307.5), [307.5,573), [573,869.5), [869.5,1286.5), [1286.5,1839].]

Figure 8 : Bubble plot of zinc levels. [Plot title: zinc levels at Meuse; key entries: 100, 200, 400, 800, 1600.]


John Snow...

• For fun, let’s look at the poster child of health mapping.

• The Snow data consists of the relevant 1854 London streets, the locations of 578 deaths from cholera, and the positions of 13 water pumps (wells) that can be used to re-create John Snow’s map showing deaths from cholera in the area surrounding Broad Street, London in the 1854 outbreak.

• The following code was taken from

http://www.inside-r.org/packages/cran/HistData/docs/Snow

    library(HistData)
    data(Snow.deaths)
    data(Snow.pumps)
    data(Snow.streets)
    data(Snow.polygons)
    library(sp)


John Snow...

    # streets: one Line per street segment, bundled into a single Lines object
    slist <- split(Snow.streets[, c("x", "y")],
                   as.factor(Snow.streets[, "street"]))
    Ll1 <- lapply(slist, Line)
    Lsl1 <- Lines(Ll1, "Street")
    Snow.streets.sp <- SpatialLines(list(Lsl1))
    plot(Snow.streets.sp, col="gray")
    title(main="Snow's Cholera Map of London")

    # deaths
    Snow.deaths.sp <- SpatialPoints(Snow.deaths[, c("x", "y")])
    plot(Snow.deaths.sp, add=TRUE, col='red', pch=15, cex=0.6)

    # pumps
    spp <- SpatialPoints(Snow.pumps[, c("x", "y")])
    Snow.pumps.sp <-
        SpatialPointsDataFrame(spp, Snow.pumps[, c("x", "y")])
    plot(Snow.pumps.sp, add=TRUE, col='blue', pch=17, cex=1.0)
    text(Snow.pumps[, c("x", "y")], labels=Snow.pumps$label,
         pos=1, cex=0.8)

Figure 9 : Deaths (red dots) and pumps (blue triangles). [Plot title: Snow's Cholera Map of London; pump labels shown include Oxford St #1, Oxford St #2, Gt Marlborough, Crown Chapel, Broad St, Warwick, Briddle St, So Soho, Dean St, Coventry St, Vigo St.]


Coordinate Reference Systems

• Coordinate reference systems (CRS) are key to GIS.

• Among other things, a CRS allows datasets to be combined and provides a reference for calculated distances.

• The underlying problem is how to represent attributes within some region on the surface of the earth on the plane.

• A geographic CRS is expressed in degrees and is associated with a model for the shape of the earth, a prime meridian, and a datum, which anchors the CRS to an origin in 3D, including a height from the center of the earth or above a standard measure of sea level.

• Most countries have multiple CRS.


Projections

• If we have a set of points on the surface of the Earth they may be represented by their associated latitude and longitude (one possible coordinate system).

• Lines of longitude pass through the north and south poles. The origin (0°) is the line of longitude passing through the Greenwich Observatory in England.

• Longitude can be measured in degrees (0° to 180°) east or west from the 0° meridian.

• Due to the curvature of the Earth the distance between two meridians (lines of longitude) depends on where we are.

• Latitude references North-South position and lines of latitude (called parallels) are perpendicular to lines of longitude.

• The equator is defined as 0° latitude.

• Once a coordinate system is decided upon one must decide on which projection is to be used for display, i.e. the map projection.
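The dependence of the distance between meridians on latitude can be made concrete with a small calculation. This is a sketch under a spherical-earth approximation; the radius of 6371 km and the function name are assumptions made here, not part of the course material.

```r
# Spherical-earth approximation; the radius is an assumed round figure.
earth_radius_km <- 6371

# Length (km) of one degree of longitude along the parallel at lat_deg.
km_per_degree_longitude <- function(lat_deg) {
  (pi / 180) * earth_radius_km * cos(lat_deg * pi / 180)
}

lats <- c(0, 30, 47.6, 60, 90)  # 47.6 is roughly the latitude of Seattle
round(km_per_degree_longitude(lats), 1)
```

Under this approximation one degree of longitude is about 111 km at the equator, half that at 60° latitude, and shrinks to zero at the poles, which is why raw longitude/latitude differences should not be treated as planar distances.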


Projections

• Different map projections distort areas, shapes, distances and directions in different ways.

• Conformal (e.g. Mercator) projections preserve local shape.

• Equal-area (e.g. Albers) projections preserve local area.

• Once projection has taken place a grid system must be established.

• One common system is the Universal Transverse Mercator (UTM) coordinate system, which is a conformal mapping system.
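What "conformal" buys and costs can be sketched with the spherical Mercator formulas, x = R λ and y = R log tan(π/4 + φ/2). The code below is an illustration written for these notes (function names and the 6371 km radius are assumptions); it compares the mapped length of a small eastward and northward step with its true ground length.

```r
earth_radius_km <- 6371  # assumed spherical radius

# Spherical Mercator coordinates (km) from degrees.
merc_x <- function(lon_deg) earth_radius_km * lon_deg * pi / 180
merc_y <- function(lat_deg) earth_radius_km * log(tan(pi / 4 + lat_deg * pi / 360))

# Ratio of map length to ground length for a small step east and north.
local_scale <- function(lat_deg, step = 0.01) {
  true_north <- earth_radius_km * step * pi / 180
  true_east  <- true_north * cos(lat_deg * pi / 180)
  c(east  = (merc_x(step) - merc_x(0)) / true_east,
    north = (merc_y(lat_deg + step) - merc_y(lat_deg)) / true_north)
}

scales <- sapply(c(0, 30, 47, 60), local_scale)
colnames(scales) <- c("0", "30", "47", "60")
round(scales, 3)
```

At each latitude the east and north scale factors agree (local shape is preserved), but the common factor grows with latitude, so areas are inflated by roughly its square. An equal-area projection makes the opposite trade.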


Reading a Shapefile into R

• Shapefiles are a common format from ESRI (a company one of whose products is ArcGIS); a shapefile consists of three files.

• The first file (*.shp) contains the geography of each shape.

• The second file (*.shx) is an index file which contains record offsets.

• The third file (*.dbf) contains feature attributes with one record per feature.

• The Washington state Geospatial Data Archive (http://wagda.lib.washington.edu/) contains data that can be read into R.

• As an example, Washington county data were downloaded, consisting of three files: wacounty.shp, wacounty.shx, wacounty.dbf.

• The following code reads in these data and then draws a county level map of 1990 populations, and a map with centroids.

• We also read in Washington census tract data.
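The code on the following slides uses readShapePoly from maptools. That package has since been archived on CRAN, so the call may not be available on a current installation. As an alternative sketch, not part of the original course code, the sf package reads the same three-file format; the example file nc.shp shipped with sf stands in for wacounty.shp here.

```r
library(sf)

# Example shapefile shipped with sf (North Carolina counties), used as a
# stand-in for wacounty.shp.
shp_path <- system.file("shape/nc.shp", package = "sf")

# The .shx and .dbf companions sit next to the .shp file.
list.files(dirname(shp_path), pattern = "^nc\\.(shp|shx|dbf)$")

nc <- st_read(shp_path, quiet = TRUE)  # geometry and attributes together
names(nc)                              # attribute columns plus geometry
plot(nc["BIR74"])                      # map of a single attribute

# Optional: convert to the sp classes used in these notes (needs sp).
# nc_sp <- as(nc, "Spatial")
```

Only the path to the .shp file is passed; the index and attribute files are located by name in the same directory, so all three must be kept together.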


Reading a Shapefile into R

To add variables to be plotted to a shapefile, the *.dbf file may be edited (in Excel, for example), with an extra column containing the variable you want to plot.

    library(maps)
    library(RColorBrewer)
    library(shapefiles)
    library(maptools)

    # the following is useful to see if you have the
    # versions you want
    sessionInfo()
    ## R version 2.10.1 (2009-12-14)
    ## attached base packages:
    ## [1] stats graphics grDevices utils datasets methods base
    ## other attached packages:
    ## [1] RColorBrewer_1.0-2 SpatialEpi_0.1  shapefiles_0.6
    ## [4] maptools_0.7-34    lattice_0.18-3  foreign_0.8-38
    ## [7] maps_2.1-4         sp_0.9-65       MASS_7.3-4


Reading a Shapefile into R

Note that there are problems with the files, which are sorted out by using the latest version of maptools, thanks to Roger Bivand (using the repair=TRUE argument).

    wacounty <- readShapePoly(file.choose(),
        proj4string=CRS("+proj=longlat"), repair=TRUE)
    # enter wacounty.shp
    names(wacounty)
    ## [1] "AreaName" "AreaKey" "INTPTLAT" "INTPTLNG" "TotPop90" "CNTY"

    # Let's see what these variables look like
    wacounty$AreaName # county names
    ##  [1] WA, Adams County    WA, Asotin County    WA, Benton County
    ##  [4] WA, Chelan County   WA, Clallam County   WA, Clark County
    ## ...
    ## [34] WA, Thurston County WA, Wahkiakum County WA, ...
    ## [37] WA, Whatcom County  WA, Whitman County   WA, ...
    ## 39 Levels: WA, Adams County WA, ... WA, Yakima County

    wacounty$AreaKey # FIPS (Federal Information Processing
                     # Standards) codes: 2 digits for state, 3 for cty
    ##  [1] 53001 53003 53005 53007 53009 53011 53013 53015 53017
    ## ...
    ## [37] 53073 53075 53077
    ## 39 Levels: 53001 53003 53005 53007 53009 53011 ... 53077


WA County Data

    wacounty$INTPTLAT # latitude
    ##  [1] 46.98899 46.18248 46.24764 47.87696 48.10950 45.77367
    ## ...
    ## [33] 48.39702 46.92512 46.29340 46.22599 48.83375 46.88668

    wacounty$INTPTLNG # longitude
    ##  [1] -118.5569 -117.1850 -119.5015 -120.6414 -123.9312
    ## ...
    ## [36] -118.4784 -121.9001 -117.5191 -120.7389

    wacounty$CNTY
    ##  [1] 1 3 5 7 9 11 13 15 17 19 21 23 ... 45 47 49
    ## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77
    ## 39 Levels: 1 11 13 15 17 19 21 23 ... 9

    wacounty$TotPop90
    ##  [1] 13603 17605 112560 52250 56464 238053 4024 82119 26205
    ## ...
    ## [37] 127780 38775 188823


WA County Data

    # Now let's map the populations
    plotvar <- wacounty$TotPop90 # variable we want to map
    nclr <- 8 # next few lines set up the color scheme for plotting
    plotclr <- brewer.pal(nclr, "BuPu")
    # class breaks at the octiles of the population distribution
    brks <- round(quantile(plotvar, probs=seq(0, 1, 1/(nclr))), digits=1)
    colornum <- findInterval(plotvar, brks, all.inside=TRUE)
    colcode <- plotclr[colornum]
    plot(wacounty)
    plot(wacounty, col=colcode, add=TRUE)
    legend(-119, 46, legend=leglabs(round(brks, digits=1)), fill=plotclr,
           cex=0.6, bty="n")

    # Now the map with centroids
    plot(wacounty)
    text(wacounty$INTPTLNG, wacounty$INTPTLAT)

Figure 10 : 1990 WA populations by county. [Legend classes: under 8720.2; 8720.2 - 17110.5; 17110.5 - 27780.8; 27780.8 - 38775; 38775 - 58634.5; 58634.5 - 97339.5; 97339.5 - 201811.5; over 201811.5.]


Figure 11 : 1990 WA county labels.


WA Census Tract Data

    # Now census tracts
    watract <- readShapePoly(file.choose(),
        proj4string=CRS("+proj=longlat"), repair=TRUE)
    names(watract)
    ## [1] "AreaName" "AreaKey" "INTPTLAT" "INTPTLNG" "TotPop90"
    ## [6] "TRACT"    "CNTY"

    plotvar <- watract$TotPop90 # variable we want to map
    # nclr and plotclr are reused from the county map
    brks <- round(quantile(plotvar, probs=seq(0, 1, 1/(nclr))),
                  digits=1)
    colornum <- findInterval(plotvar, brks, all.inside=TRUE)
    colcode <- plotclr[colornum]
    plot(watract)
    plot(watract, col=colcode, add=TRUE)
    legend(-119, 46, legend=leglabs(round(brks, digits=1)),
           fill=plotclr, cex=0.6, bty="n")

Figure 12 : 1990 WA populations by census tract. [Legend classes: under 1761.6; 1761.6 - 2690.8; 2690.8 - 3369.9; 3369.9 - 3969.5; 3969.5 - 4729.1; 4729.1 - 5634; 5634 - 6834.2; over 6834.2.]


Final Comments

• Finding the shapefiles and coercing the data into the right form for mapping is often not trivial.

• A great resource for finding shapefiles at different geographical scales: http://gadm.org/country

• Particular polygon files can be found in specific R packages, e.g., for US 2000 census tracts: http://cran.r-project.org/web/packages/UScensus2000tract/

• Once shapefiles are read into R, we can add the variables we wish to map/analyze to the spatial data frames.
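Adding a variable usually means joining by an area identifier. A minimal sketch, using a plain data frame as a stand-in for the attribute table of a spatial data frame: the FIPS codes and county names are the first three listed earlier in these notes, while new_data and its cases counts are invented for illustration.

```r
# Stand-in for the attribute table of a spatial data frame.
attr_table <- data.frame(
  AreaKey  = c("53001", "53003", "53005"),
  AreaName = c("WA, Adams County", "WA, Asotin County", "WA, Benton County"),
  stringsAsFactors = FALSE
)

# New variable keyed by FIPS code, deliberately in a different order.
# The counts are invented.
new_data <- data.frame(
  fips  = c("53005", "53001", "53003"),
  cases = c(30, 10, 20),
  stringsAsFactors = FALSE
)

# For each row of attr_table, find the row of new_data with the same key,
# so the existing (polygon) order is left untouched.
idx <- match(attr_table$AreaKey, new_data$fips)
attr_table$cases <- new_data$cases[idx]
attr_table
```

With a Spatial*DataFrame such as wacounty the same assignment works directly through $, e.g. wacounty$cases <- new_data$cases[match(as.character(wacounty$AreaKey), new_data$fips)]. Using match() rather than assigning the result of base merge() to the attribute table avoids reordering rows relative to the polygons.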


References

Bivand, R., Pebesma, E., and Gomez-Rubio, V. (2013). Applied Spatial Data Analysis with R, 2nd Edition. Springer, New York.

Carstensen, B. (2007). Age–period–cohort models for the Lexis diagram. Statistics in Medicine, 26, 3018–3045.

Gardner, M. (1992). Childhood leukaemia around the Sellafield nuclear plant. In P. Elliott, J. Cuzick, D. English, and R. Stern, editors, Geographical and Environmental Epidemiology: Methods for Small-area Studies, pages 291–309, Oxford. Oxford University Press.

Kuang, D., Nielsen, B., and Nielsen, J. (2008). Identification of the age-period-cohort model and the extended chain-ladder model. Biometrika, 95, 979–986.

Wakefield, J. and Elliott, P. (1999). Issues in the statistical analysis of small area health data. Statistics in Medicine, 18, 2377–2399.