Introduction to Geospatial Analysis in R SURF – 24 April 2012 Daniel Marlay

Page 1: Introduction to Geospatial Analysis in R

Introduction to Geospatial Analysis in R

SURF – 24 April 2012
Daniel Marlay

Page 2: Introduction to Geospatial Analysis in R

Synopsis

• This month's talk is going to look at the geospatial capabilities of R. We'll look at how to import common geographical data formats into R and some of the free geographic data sources and map layers available. We'll then look at how to create maps in R using this data, and some of the ways to style it to display our data. We'll look at how R stores geographic data and how we can perform queries against that, for example identifying which points fall into a particular region. Finally, we'll take a brief look at modelling geospatial data and some of the issues to be aware of.

Page 3: Introduction to Geospatial Analysis in R

Introduction

• There are extensive geospatial capabilities in R
– I’ve just started to scratch the surface
• This presentation will give a little bit of theory
– Most of the content is a walk through of doing geospatial analysis in R
• I’ve picked data sets that are freely available
– Trying this yourself is the best way to learn
• And maybe we’ll learn something about the way Australians vote…

Page 4: Introduction to Geospatial Analysis in R

R Geospatial Packages

• sp – provides a generic set of functions, classes and methods for handling spatial data

• rgdal – provides an R interface into the Geospatial Data Abstraction Library (GDAL) which is used to read and write geospatial data from R

Page 5: Introduction to Geospatial Analysis in R

Types of Geospatial Data

• Vector data
– Points
– Lines
– Areas
• Bitmap
– Often used for image data (e.g. aerial photos)
– Needs to be registered to a coordinate system
• “Labelled” data
– Has geographic information, but needs to be matched before it can be used

Page 6: Introduction to Geospatial Analysis in R

Setting up the R Environment

## Set working directory to where the data is. Update as required if running this yourself
setwd("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation");

## Load the relevant libraries
library(sp); # Basic R classes for handling geographic data
library(rgdal); # Library for using the Geospatial Data Abstraction Library (GDAL)
library(nlme); # Library that gives us generalised least squares
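Before running the walkthrough it can help to confirm that the three packages are actually installed. The sketch below is an illustrative check and not part of the original slides; the `pkgs` and `available` names are made up for the example. Note that rgdal was retired from CRAN in 2023, so on a current R installation it may have to come from the CRAN archive, or its reading functions may need to be swapped for those of a successor package such as sf.

```r
# Packages used in this walkthrough
pkgs <- c("sp", "rgdal", "nlme")

# requireNamespace() returns TRUE/FALSE without attaching the package
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs installing
print(available)
if (!all(available)) {
  message("Missing packages: ", paste(pkgs[!available], collapse = ", "))
}
```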

Page 7: Introduction to Geospatial Analysis in R

Obtain Census Data (1/6)

Page 8: Introduction to Geospatial Analysis in R

Obtain Census Data (2/6)

Page 9: Introduction to Geospatial Analysis in R

Obtain Census Data (3/6)

Page 10: Introduction to Geospatial Analysis in R

Obtain Census Data (4/6)

Page 11: Introduction to Geospatial Analysis in R

Obtain Census Data (5/6)

Page 12: Introduction to Geospatial Analysis in R

Obtain Census Data (6/6)

Page 13: Introduction to Geospatial Analysis in R

Read In Census Data (1/3)

## Read in and clean the census data (Note: a lot of this cleaning could be done more easily in Excel)
EducationLevel <- read.csv("EducationData.csv",skip=6,na.strings="");

EducationLevel <- EducationLevel[c(-1,-2),c(-1,-27)]; # Remove leading and trailing blank columns and blank second row
EducationLevel <- EducationLevel[-(97:100),]; # Remove trailing blank lines

#### Create some useable column names
EduDataCols <- paste(c(rep("Male",8),rep("Female",8),rep("Total",8)),
                     rep(c("NotStated","InadDescr","Postgrad","GradDipCert","Bachelor","Diploma","Certificate","NA"),3),
                     sep=".");
colnames(EducationLevel) <- c("SED",EduDataCols);
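The column-naming step relies on paste() pairing two equal-length vectors element by element. As a rough sketch of the same idea on a smaller scale (two sexes and three categories instead of three groups and eight; `sexes`, `cats` and `cols` are illustrative names, not from the slides):

```r
# Two groups and three categories, shortened for readability
sexes <- c("Male", "Female")
cats  <- c("Postgrad", "Bachelor", "NA")

# rep(..., each = 3) gives Male Male Male Female Female Female
# rep(..., times = 2) cycles the categories once per group
cols <- paste(rep(sexes, each = length(cats)),
              rep(cats, times = length(sexes)),
              sep = ".")
print(cols)
```

The slide's version writes the first vector out with three rep() calls and uses rep(..., 3) for the second, which should produce the same kind of pairing.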

Page 14: Introduction to Geospatial Analysis in R

Read In Census Data (2/3)

#### Recode the data into character and numeric data to avoid weird errors from factors
EducationLevel[,1] <- as.character(EducationLevel[,1]);
for (col in EduDataCols) {
  EducationLevel[,col] <- as.numeric(as.character(EducationLevel[,col]));
}

#### Eyeball the data to make sure it is ok.
summary(EducationLevel);
head(EducationLevel,10);
tail(EducationLevel,10);
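The reason for going through as.character() before as.numeric() is easy to miss. A small sketch with made-up values shows what presumably lies behind the "weird errors from factors": converting a factor directly returns its internal level codes rather than the numbers it displays. (Since R 4.0.0, read.csv() no longer creates factors by default, so the recode is mainly needed on older versions or when stringsAsFactors=TRUE.)

```r
# A column that looks numeric but has been read in as a factor
f <- factor(c("120", "45", "300"))

# Direct conversion gives the codes of the sorted levels, not the values
codes <- as.numeric(f)

# Going via character recovers the original numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)
```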

Page 15: Introduction to Geospatial Analysis in R

Read In Census Data (3/3)

Page 16: Introduction to Geospatial Analysis in R

Obtain Electoral Data (1/4)

Page 17: Introduction to Geospatial Analysis in R

Obtain Electoral Data (2/4)

Page 18: Introduction to Geospatial Analysis in R

Obtain Electoral Data (3/4)

Page 19: Introduction to Geospatial Analysis in R

Obtain Electoral Data (4/4)

Page 20: Introduction to Geospatial Analysis in R

Read In Electoral Data (1/2)

## Read in the electoral data
ElectionResults <- read.csv("2011NSWElectionResults.csv");

#### Eyeball data to make sure it is ok
summary(ElectionResults);
head(ElectionResults);
tail(ElectionResults);

Page 21: Introduction to Geospatial Analysis in R

Read In Electoral Data (2/2)

Page 22: Introduction to Geospatial Analysis in R

Obtain Geography (1/4)

Page 23: Introduction to Geospatial Analysis in R

Obtain Geography (2/4)

Page 24: Introduction to Geospatial Analysis in R

Obtain Geography (3/4)

Page 25: Introduction to Geospatial Analysis in R

Obtain Geography (4/4)

Page 26: Introduction to Geospatial Analysis in R

Read In SED Geography (1/3)

## Read in the state electoral division boundaries (geography) and explore the SpatialPolygonsDataFrame class
SED <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies","SED06aAUST_region");

#### Have an initial look at the SED data set that we've just read in
summary(SED);
plot(SED);

Page 27: Introduction to Geospatial Analysis in R

Read In SED Geography (2/3)

Page 28: Introduction to Geospatial Analysis in R

Read In SED Geography (3/3)

Page 29: Introduction to Geospatial Analysis in R

Examining the SpatialPolygonsDataFrame (1/2)

#### SED is a SpatialPolygonsDataFrame, an S4 object. We can have a look at how it is constructed
mode(SED);
slotNames(SED);
summary(SED@data);
summary(SED@polygons);
SED@plotOrder;
SED@bbox;
SED@proj4string;
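For readers who have not met S4 objects before, the same inspection calls can be tried on a toy class. This is only a sketch: `ToyLayer` is a hypothetical class invented here and is far simpler than the sp classes, but slotNames() and the @ accessor behave the same way.

```r
# A toy S4 class with two slots (hypothetical, not an sp class)
setClass("ToyLayer", representation(data = "data.frame", bbox = "matrix"))

layer <- new("ToyLayer",
             data = data.frame(NAME = c("A", "B")),
             bbox = matrix(c(150.6, -34.3, 151.4, -33.4), nrow = 2,
                           dimnames = list(c("x", "y"), c("min", "max"))))

isS4(layer)             # confirms this is an S4 object
slotNames(layer)        # lists the slots, as slotNames(SED) does
layer@data              # slots are accessed with @
layer@bbox["x", "max"]  # a single element from the bounding box slot
```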

Page 30: Introduction to Geospatial Analysis in R

Examining the SpatialPolygonsDataFrame (2/2)

Page 31: Introduction to Geospatial Analysis in R

Simple Mapping of SpatialPolygonsDataFrames (1/2)

#### Let's now look at some more mapping, we've seen that we can plot all of Australia
plot(SED[SED$STATE_2006 == "1",]); # Plot NSW

plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney - xlim and ylim from google maps ;-)

plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney and put on some electoral district names
text(coordinates(SED[SED$STATE_2006 == "1",]),labels=(SED[SED$STATE_2006 == "1",])$NAME_2006,cex=0.5);

Page 32: Introduction to Geospatial Analysis in R

Simple Mapping of SpatialPolygonsDataFrames (2/2)

Page 33: Introduction to Geospatial Analysis in R

Thematic Mapping (1/8)

## Thematic mapping
SED.NSW <- SED[SED$STATE_2006 == "1",]; # subset of SED for convenience

#### Create a ThemeData data set with a summary of the data we are interested in - proportion of people with a tertiary education
ThemeData <- data.frame(SED = as.character(EducationLevel$SED),
                        PropTertiaryEd = (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate) /
                                         (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate + EducationLevel$Total.NA),
                        stringsAsFactors=FALSE);

hist(ThemeData$PropTertiaryEd); # Histogram of the proportions to work out the appropriate cut points

ThemeData$PropTertiaryEdFact <- cut(ThemeData$PropTertiaryEd,c(0,0.25,0.3,0.35,0.4,0.5,1.0)); # Create a factor for the proportion variable
levels(ThemeData$PropTertiaryEdFact) <- c("25% or Less","25% to 30%","30% to 35%","35% to 40%","40% to 50%","More than 50%");
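To see how cut() turns proportions into bands, here is a small sketch using the same break points on four made-up values (`props`, `labs` and `band` are illustrative names). It passes the labels directly through the `labels` argument rather than renaming the levels afterwards, which should give the same result.

```r
# Four made-up proportions
props  <- c(0.22, 0.27, 0.45, 0.62)
breaks <- c(0, 0.25, 0.3, 0.35, 0.4, 0.5, 1.0)
labs   <- c("25% or Less", "25% to 30%", "30% to 35%",
            "35% to 40%", "40% to 50%", "More than 50%")

# Intervals are open on the left and closed on the right by default,
# so a value of exactly 0 would become NA unless include.lowest = TRUE
band <- cut(props, breaks = breaks, labels = labs)

print(band)
print(table(band))  # empty bands are kept as levels with a count of zero
```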

Page 34: Introduction to Geospatial Analysis in R

Thematic Mapping (2/8)

Page 35: Introduction to Geospatial Analysis in R

Thematic Mapping (3/8)

#### Display a thematic map for all of NSW
bands <- length(levels(ThemeData$PropTertiaryEdFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]]); # Note the use of match() to get the right rows
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);

#### Display a thematic map for Sydney
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4));
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);
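The colour lookup inside plot() packs three steps into one expression: match() finds the ThemeData row for each polygon, that row's factor value is picked out, and the factor's integer code indexes the palette. A minimal sketch with invented data (the district names are used only as labels, and the bands assigned to them are made up):

```r
# Polygon order on the map
map_names <- c("Balmain", "Albury", "Coogee")

# Lookup table in a different order, with one extra row
theme <- data.frame(SED  = c("Albury", "Balmain", "Coogee", "Dubbo"),
                    band = factor(c("low", "high", "mid", "low"),
                                  levels = c("low", "mid", "high")),
                    stringsAsFactors = FALSE)

pal <- c("yellow", "orange", "red")  # one colour per factor level

idx  <- match(map_names, theme$SED)  # row of theme for each polygon
cols <- pal[theme$band[idx]]         # factor codes index the palette

print(idx)
print(cols)
```

A polygon whose name has no row in the lookup table would get NA from match() and hence an NA colour, which plot() leaves unfilled.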

Page 36: Introduction to Geospatial Analysis in R

Thematic Mapping (4/8)

Page 37: Introduction to Geospatial Analysis in R

Thematic Mapping (5/8)

#### Now we'll add the election results to our ThemeData data set
rownames(ElectionResults) <- as.character(ElectionResults$District); # Adding rownames allows us to index by them when matching
ThemeData$PropGreenVote <- ElectionResults[ThemeData$SED,"GRN"] / ElectionResults[ThemeData$SED,"Total"]; # Create a green vote proportion variable

hist(ThemeData$PropGreenVote,breaks=20); # Have a look at the distribution

ThemeData$PropGreenVoteFact <- cut(ThemeData$PropGreenVote,c(0,0.05,0.06,0.08,0.1,0.15,1.0)); # Create a factor
levels(ThemeData$PropGreenVoteFact) <- c("Less than 5%","5% to 6%","6% to 8%","8% to 10%","10% to 15%","More than 15%");
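Indexing by row names, as done for the green vote proportion above, is a compact way to join two tables, and it is worth knowing what happens when a name has no match. The following sketch uses invented vote counts (`results`, `wanted` and `prop_green` are illustrative names):

```r
# Invented results table, keyed by district name
results <- data.frame(District = c("Albury", "Balmain"),
                      GRN      = c(2000, 12000),
                      Total    = c(40000, 40000),
                      stringsAsFactors = FALSE)
rownames(results) <- results$District

# Names to look up, in a different order; "Nowhere" has no results row
wanted <- c("Balmain", "Albury", "Nowhere")

# Character row indexing returns NA for names that are not found
prop_green <- results[wanted, "GRN"] / results[wanted, "Total"]
print(prop_green)
```

The NA for the unmatched name is why later steps in the walkthrough test with !is.na() and pass na.action=na.omit.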

Page 38: Introduction to Geospatial Analysis in R

Thematic Mapping (6/8)

Page 39: Introduction to Geospatial Analysis in R

Thematic Mapping (7/8)

#### And do some thematic maps of the election results
bands <- length(levels(ThemeData$PropGreenVoteFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]]);
legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01);

plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4));
legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01);

Page 40: Introduction to Geospatial Analysis in R

Thematic Mapping (8/8)

Page 41: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (1/9)

Page 42: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (2/9)

Page 43: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (3/9)

Page 44: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (4/9)

Page 45: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (5/9)

Page 46: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (6/9)

Page 47: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (7/9)

Page 48: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (8/9)

Page 49: Introduction to Geospatial Analysis in R

Obtain Topographic Map Data (9/9)

Page 50: Introduction to Geospatial Analysis in R

Geographic Querying (1/4)

## Demonstration of geographic querying

#### Read in the Localities layer from the TOPO 2.5M data set
Locs <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies\\localities","aus25lgd_p");
Mtns <- Locs[Locs$LOCALITY == "6",]; # Select only mountains

plot(Mtns);

#### Use the over function to find a list of mountains in SEDs with more than 10% green votes
over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns); # Only gets one mountain per SED

over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE); # Gets all mountains, but in a less useful format

do.call("rbind",over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE)); # Gives us something a bit more useable
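The last call flattens the list that over() returns with returnList=TRUE. The sketch below imitates that list with hand-made data frames so the do.call("rbind", ...) step can be tried without any spatial data. The district and mountain names are placeholders, and the real list elements would carry every attribute column of the Mtns layer rather than a single NAME column.

```r
# Stand-in for over(..., returnList = TRUE): one data frame per district
per_district <- list(
  DistrictA = data.frame(NAME = c("Mt One", "Mt Two"), stringsAsFactors = FALSE),
  DistrictB = data.frame(NAME = character(0), stringsAsFactors = FALSE),  # no mountains
  DistrictC = data.frame(NAME = "Mt Three", stringsAsFactors = FALSE)
)

# do.call() passes every list element to rbind() as a separate argument,
# stacking them into a single data frame
flat <- do.call("rbind", per_district)
print(flat)
```

Districts with no matching points contribute no rows, so the district-to-mountain link survives only in the row names of the result.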

Page 51: Introduction to Geospatial Analysis in R

Geographic Querying (2/4)

Page 52: Introduction to Geospatial Analysis in R

Geographic Querying (3/4)

Page 53: Introduction to Geospatial Analysis in R

Geographic Querying (4/4)

Page 54: Introduction to Geospatial Analysis in R

Geospatial Modelling (1/6)

## Spatial GLS relating proportion who vote green to proportion with a higher education

#### Add some spatial data to the ThemeData data set - using equidistant conic coordinates - lat-long give greater distance distortion
SED.NSW.coords.eqdc <- coordinates(spTransform(SED.NSW,CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0=-33.5 +lon_0=151 +x_0=0 +y_0=0")));
rownames(SED.NSW.coords.eqdc) <- as.character(SED.NSW$NAME_2006);
colnames(SED.NSW.coords.eqdc) <- c("x","y");

plot(spTransform(SED.NSW,CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0=-33.5 +lon_0=151 +x_0=0 +y_0=0"))); # shows how the conic projection looks
lines(spTransform(gridlines(SED.NSW,easts=seq(140,160,by=2.5),norths=seq(-37.5,-27.5,by=2.5)),CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0=-33.5 +lon_0=151 +x_0=0 +y_0=0")));

tail(ThemeData);
ThemeData2 <- ThemeData[-(94:96),]; # Remove the last few rows of ThemeData - they don't have geographic locations
ThemeData2 <- cbind(ThemeData2,SED.NSW.coords.eqdc[ThemeData2$SED,]);
head(ThemeData2);
summary(ThemeData2);
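The remark that lat-long coordinates distort distance can be checked with a little arithmetic: a degree of longitude shrinks with the cosine of the latitude, while a degree of latitude stays roughly constant. The sketch below assumes a spherical earth and a round figure of 111.32 km per degree of latitude; both are simplifying assumptions for illustration, not values from the slides.

```r
lat_deg <- -33.9          # approximate latitude of Sydney, in degrees
km_per_deg_lat <- 111.32  # rough spherical-earth figure (assumption)

# A degree of longitude is shorter by a factor of cos(latitude)
km_per_deg_lon <- km_per_deg_lat * cos(lat_deg * pi / 180)
ratio <- km_per_deg_lon / km_per_deg_lat

print(round(c(km_lat = km_per_deg_lat, km_lon = km_per_deg_lon, ratio = ratio), 2))
```

At Sydney's latitude the ratio comes out at roughly 0.83, so treating raw degrees as planar x and y would overstate east-west distances relative to north-south ones by about a fifth. Projecting first avoids feeding that distortion into the correlation structure.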

Page 55: Introduction to Geospatial Analysis in R

Geospatial Modelling (2/6)

Page 56: Introduction to Geospatial Analysis in R

Geospatial Modelling (3/6)

#### Start with a basic linear model
model1 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,na.action=na.omit);
summary(model1);
plot(model1);

plot(Variogram(model1, form=~x+y)); # Note the correlation structure
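For intuition about what a variogram plot shows, an empirical semivariogram can be computed by hand: for every pair of observations, take the distance between them and half the squared difference of their residuals, then average within distance bins. The sketch below does this in base R on simulated points with independent residuals. It is a simplified illustration in the same spirit as the nlme function, not a reimplementation of it, and all names and data are made up.

```r
set.seed(1)
n   <- 30
pts <- data.frame(x = runif(n), y = runif(n))  # simulated locations
res <- rnorm(n)                                # simulated residuals, no spatial structure

d <- as.vector(dist(pts))        # pairwise distances between locations
g <- as.vector(dist(res))^2 / 2  # half squared residual differences, same pair order

# Average the semivariance within five equal-width distance bins
bins    <- cut(d, breaks = seq(0, max(d), length.out = 6), include.lowest = TRUE)
semivar <- tapply(g, bins, mean)

print(semivar)
```

With spatially correlated residuals the semivariance would tend to be lower at short distances and rise with distance; with independent residuals, as simulated here, no such trend is expected.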

Page 57: Introduction to Geospatial Analysis in R

Geospatial Modelling (4/6)

Page 58: Introduction to Geospatial Analysis in R

Geospatial Modelling (5/6)

#### Now try some gls models with spatial correlation structures
model2 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,corr=corExp(form=~x+y),na.action=na.omit);
summary(model2);
plot(model2);
plot(Variogram(model2, form=~x+y));

model3 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,corr=corGaus(form=~x+y),na.action=na.omit);
summary(model3);
plot(model3);
plot(Variogram(model3, form=~x+y));

model4 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,corr=corSpher(form=~x+y),na.action=na.omit);
summary(model4);
plot(model4);
plot(Variogram(model4, form=~x+y));

#### Compare the models using AIC
AIC(model1,model2,model3,model4); # Looks like adding the correlation structure gave no benefit
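The AIC comparison trades goodness of fit against the number of estimated parameters: AIC = 2k - 2 log L, where lower is better, so an extra parameter has to improve the log-likelihood enough to pay for itself. The sketch below shows the mechanics on simulated data. It uses lm() as a stand-in for gls() to keep the example in base R, and the extra term is an irrelevant predictor rather than a correlation structure.

```r
set.seed(42)
x <- runif(50)
z <- runif(50)                             # an irrelevant extra predictor
y <- 0.1 + 0.5 * x + rnorm(50, sd = 0.05)  # simulated response

m1 <- lm(y ~ x)
m2 <- lm(y ~ x + z)

# AIC by hand for m1: 2k - 2*logLik, where k counts the coefficients
# plus the residual variance
ll <- logLik(m1)
k  <- attr(ll, "df")
aic_by_hand <- 2 * k - 2 * as.numeric(ll)

comparison <- AIC(m1, m2)  # data frame with df and AIC columns
print(aic_by_hand)
print(comparison)
```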

Page 59: Introduction to Geospatial Analysis in R

Geospatial Modelling (6/6)

Page 60: Introduction to Geospatial Analysis in R

Nice Looking Map (1/2)

## Finally, let's put together a good looking map.

Roads <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies\\roads","aus25vgd_l");

SED.NSW.coords <- coordinates(SED.NSW);
sydrows <- (SED.NSW.coords[,1] > 150.5) & (SED.NSW.coords[,1] < 151.4) & (SED.NSW.coords[,2] > -34.3) & (SED.NSW.coords[,2] < -33.4);
SED.SYD <- SED.NSW[sydrows,];

sydgrid <- gridlines(SED.SYD,easts=seq(150.4,151.6,by=0.1),norths=seq(-34.3,-33.4,by=0.1));
sydgridat <- gridat(SED.SYD,easts=seq(150.4,151.6,by=0.1),norths=seq(-34.3,-33.4,by=0.1));

pdf("FinalMap.pdf");
bands <- length(levels(ThemeData$PropTertiaryEdFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.5,-33.4));
lines(Roads,col="black",xlim=c(150.6,151.4),ylim=c(-34.5,-33.4));
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01,bty="n",bg="white");
title(c("Proportion of People with Tertiary Education","by Sydney State Electoral Divisions"),sub="Data from 2006 Census");
dev.off();

Page 61: Introduction to Geospatial Analysis in R

Nice Looking Map (2/2)

Page 62: Introduction to Geospatial Analysis in R

Example Data Sources

• Census geographies
– http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Geography?opendocument#from-banner=LN
• Census results (CDATA Online)
– http://www.abs.gov.au/CDATAOnline
• NSW State Electoral Results
– http://elections.nsw.gov.au/home
• Geoscience Australia – Topographic Maps
– http://www.ga.gov.au/

Page 63: Introduction to Geospatial Analysis in R

QUESTIONS?