chapter 11 spatial analysis credit to prof michael goodchild

Chapter 11 Spatial Analysis

Credit to Prof Michael Goodchild

What is spatial analysis?

Methods for working with spatial data

to detect patterns, anomalies

to find answers to questions

to test or confirm theories (deductive reasoning)

to generate new theories and generalizations (inductive reasoning)

Methods for adding value to data

in doing scientific research

in trying to convince others

A collaboration between human and machine

the machine does things the human finds too tedious, difficult, or complex to do by hand

the human directs, makes interpretations and inferences

Ranging from simple to complex

some methods are mathematically sophisticated, e.g. statistical tests

other methods are visual, intuitive, simple, e.g. making and examining maps

The Snow map

Cholera outbreak in Soho, 1854

Dr John Snow and the pump

inference regarding the transmission mechanism for cholera

see www.jsi.com

Updating Snow

Openshaw's map of childhood leukemia in N England

Types of spatial analysis

Data types

Discrete objects (points, lines, areas)

Fields

spatially intensive, spatially extensive

nominal, ordinal, interval, ratio, cyclic variables

Application domains

Objectives

Data types

Nominal

e.g. vegetation class

no implied order, no arithmetic operations

no average

"central" value is the commonest class (mode)

Ordinal

e.g. ranking from best to worst

implied order, but no arithmetic operations

no average

"central" value has half of cases above, half below (median)

Interval

e.g. Fahrenheit temperature

differences make sense

arbitrary zero point

"central" value is the mean

Ratio

e.g. weight

ratios make sense

absolute zero point

"central" value is the mean

Cyclic

e.g. aspect

be careful with arithmetic

the arithmetic average of 1 and 359 degrees is 180, but the sensible central direction is 0 (north)
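
One common way to average cyclic values such as aspect is to treat each angle as a unit vector, sum the vectors, and take the direction of the sum. The sketch below assumes angles in degrees; the function name is illustrative and not taken from any GIS package.

```python
import math

def circular_mean_degrees(angles):
    """Mean direction of angles given in degrees (None if undefined)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    if math.hypot(s, c) < 1e-12:
        return None  # the directions cancel out, so no mean direction exists
    mean = math.degrees(math.atan2(s, c)) % 360.0
    return 0.0 if mean == 360.0 else mean

naive = (1 + 359) / 2                      # plain arithmetic average
circular = circular_mean_degrees([1, 359])  # within rounding error of north
print(naive, circular)
```

Because of floating-point rounding, the circular result may print as a value extremely close to 0 or to 360; both denote the same direction.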

Six distinct objectives

Queries and reasoning

Measurements

Transformations

Descriptive summaries

Optimization

Hypothesis testing

QUERIES

In ArcMap

map view

table view

linked views

histogram view

scatterplot view

Exploratory spatial data analysis

interactive methods to explore spatial data

use of linked views

finding anomalies

mining large masses of data

SQL

structured or standard query language

e.g. SELECT * FROM counties WHERE median_value > 100000
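
An attribute query of this kind can be tried outside a GIS with Python's built-in sqlite3 module. This is only a sketch: the table contents are made up, and the column name median_value is an assumed spelling of the "median value" attribute.

```python
import sqlite3

# In-memory database holding a hypothetical counties table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counties (name TEXT, median_value REAL)")
conn.executemany(
    "INSERT INTO counties VALUES (?, ?)",
    [("County A", 85000), ("County B", 120000), ("County C", 240000)],
)

# Select the counties whose attribute exceeds the threshold.
rows = conn.execute(
    "SELECT name FROM counties WHERE median_value > 100000 ORDER BY name"
).fetchall()
selected = [name for (name,) in rows]
print(selected)
```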

REASONING WITH GIS

We spend our lives in the vague world of human discourse

"is Santa Barbara north of LA?"

a GIS needs to know exactly what is meant by "north of"

is Reno east or west of San Diego?

we tend to think of the US as a square, with two N-S coasts

how to design a GIS to provide driving directions?

to direct people through airports?

a GIS would be easier to use if it could "think" and "talk" more like humans

or if there could be smooth transitions between our vague world and its precise world

in our vague world, terms like "north of" are context-specific

geographically relevant terms like "across" or "in" have many meanings

MEASUREMENT WITH GIS

Measurements are often difficult to make by hand from maps

Distance and length

calculation from metric coordinates

straight-line distance on a plane

Pythagorean distance

$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

distance on a spherical Earth

from (lat1,long1) to (lat2,long2)

R is the radius of the Earth, roughly 6378 km

$d = R \arccos[\sin \mathrm{lat}_1 \sin \mathrm{lat}_2 + \cos \mathrm{lat}_1 \cos \mathrm{lat}_2 \cos(\mathrm{long}_1 - \mathrm{long}_2)]$
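
Both distance formulas are short enough to sketch directly. The version below assumes latitude and longitude are given in degrees and uses the 6378 km radius quoted above; the function names are illustrative. The clamp guards against rounding pushing the cosine slightly outside [-1, 1].

```python
import math

EARTH_RADIUS_KM = 6378.0  # approximate radius quoted in the text

def planar_distance(x1, y1, x2, y2):
    """Pythagorean (straight-line) distance on a plane."""
    return math.hypot(x1 - x2, y1 - y2)

def spherical_distance_km(lat1, long1, lat2, long2, radius=EARTH_RADIUS_KM):
    """Great-circle distance on a spherical Earth; inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlong = math.radians(long1 - long2)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlong))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return radius * math.acos(cos_angle)
```

For example, two points on the equator 90 degrees of longitude apart are a quarter of the circumference apart, i.e. R times pi/2.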

Length of a complex object

add the lengths of polyline or polygon segments

Two types of distortions

if segments are straight, length will be underestimated in general

for lines and areas

lengths are measured in the horizontal plane

underestimated in hilly areas

Area (of a polygon)

proceed in clockwise direction around the polygon

for each segment

drop perpendiculars to the x axis

this constructs a trapezium

compute the area of the trapezium

difference in x times average of y

keep a cumulative sum of areas

at the end, the sum will be the area of the polygon
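
The steps above can be sketched as a short function. It assumes the polygon is a list of (x, y) vertices without a repeated closing vertex, and the name polygon_area is illustrative. With this sign convention a clockwise ring gives a positive area and an anticlockwise ring a negative one.

```python
def polygon_area(vertices):
    """Signed area by summing trapezia dropped to the x axis."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]          # wrap around to close the ring
        total += (x2 - x1) * (y1 + y2) / 2.0    # difference in x times average of y
    return total

clockwise_square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(polygon_area(clockwise_square))                    # positive
print(polygon_area(list(reversed(clockwise_square))))    # negative
```

Python floats are double precision, so the large-coordinate problem is less severe here than in single precision, though it does not disappear entirely.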

when might the algorithm fail?

islands must all be scanned clockwise

holes must be scanned anticlockwise

holes have negative area

because of limited computer precision

results could be wrong if the area is very small and the coordinate values are very large

e.g. in UTM or SPC

need double precision for calculations

but not for results

applying the algorithm to a coverage

keep running total for each polygon

for each arc

proceed segment by segment from FNODE to TNODE

add trapezia areas to the right-hand (R) polygon's area

subtract them from the left-hand (L) polygon's area

on completing all arcs, totals are correct areas
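
A minimal sketch of the coverage version follows. It assumes each arc is stored as a tuple of (vertices from FNODE to TNODE, left polygon id, right polygon id); that layout and the sample data are illustrative, not the actual coverage file format. The example coverage is two unit squares, A below B, sharing one horizontal arc.

```python
from collections import defaultdict

def coverage_areas(arcs):
    """Accumulate polygon areas arc by arc from left/right polygon labels."""
    totals = defaultdict(float)
    for vertices, left_id, right_id in arcs:
        # Walk the arc segment by segment from FNODE to TNODE.
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]):
            trapezium = (x2 - x1) * (y1 + y2) / 2.0
            totals[right_id] += trapezium   # add to the R polygon
            totals[left_id] -= trapezium    # subtract from the L polygon
    return dict(totals)

OUTSIDE = 0  # label for the region outside the coverage
arcs = [
    ([(0, 1), (1, 1)], "B", "A"),                      # shared arc
    ([(1, 1), (1, 0), (0, 0), (0, 1)], OUTSIDE, "A"),  # rest of A's boundary
    ([(0, 1), (0, 2), (1, 2), (1, 1)], OUTSIDE, "B"),  # rest of B's boundary
]
areas = coverage_areas(arcs)
print(areas)
```

The total recorded for the outside region is the negative of the whole coverage area and would normally be discarded.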

Shape

how to measure shape of an area?

a compact shape has a small perimeter for a given area

compare perimeter to the perimeter of a circle of the same area

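$\text{shape} = \text{perimeter} / (3.54 \sqrt{\text{area}})$

3.54 is approximately $2\sqrt{\pi}$, so a circle scores 1 and less compact shapes score higher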
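
As a quick check of the shape measure, the sketch below uses the exact constant 2*sqrt(pi) rather than the rounded 3.54; the function name is illustrative.

```python
import math

def shape_index(perimeter, area):
    """Perimeter divided by the perimeter of a circle of the same area."""
    return perimeter / (2.0 * math.sqrt(math.pi) * math.sqrt(area))

circle = shape_index(2.0 * math.pi, math.pi)  # circle of radius 1
square = shape_index(4.0, 1.0)                # unit square
print(circle, square)
```

The circle gives 1 (up to rounding) and the unit square gives about 1.13, so higher values mean less compact shapes.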

other types of districts designed with GIS

administrative regions

sales districts

Slope and aspect

measured from DEM raster

by comparing elevations of points in a 3x3 neighborhood

slope and aspect at a point are estimated from its elevation and those of the surrounding 8 points

various methods

important to know how your favorite GIS calculates slope

number points row by row from top left

from 1 to 9

b denotes slope in the x direction

c denotes slope in the y direction

D is the spacing of points

e.g. 30m for USGS DEMs

find the slope that fits best to the 9 elevations

minimizes the total of squared differences

between point elevation and the fitted slope

weighting four closer neighbors higher

$b = dz/dx = (z_3 + 2z_6 + z_9 - z_1 - 2z_4 - z_7) / (8D)$

$c = dz/dy = (z_1 + 2z_2 + z_3 - z_7 - 2z_8 - z_9) / (8D)$

$\tan(\text{slope}) = \sqrt{b^2 + c^2}$

slope defined as angle

or rise over horizontal run

or rise over actual run

$\tan(\text{aspect}) = b/c$

aspect is measured clockwise from vertical (north, if the top of the raster is north) to the direction of steepest downhill slope

add 180 to aspect if $c$ is positive, 360 to aspect if $c$ is negative and $b$ is positive
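
The calculation can be sketched for a single 3x3 window as follows. This assumes the top row of the window is north and that elevations and spacing share the same units. It uses atan2 in place of the quadrant corrections listed above, which yields the same downhill bearing. The function name is illustrative, and a real GIS may use a different method.

```python
import math

def slope_aspect(window, spacing):
    """Slope and aspect in degrees from a 3x3 elevation window.

    window is three rows of three elevations, numbered z1..z9 row by row
    from the top left; spacing is the point spacing D.
    """
    (z1, z2, z3), (z4, _z5, z6), (z7, z8, z9) = window  # z5 is not used
    b = (z3 + 2 * z6 + z9 - z1 - 2 * z4 - z7) / (8.0 * spacing)  # dz/dx
    c = (z1 + 2 * z2 + z3 - z7 - 2 * z8 - z9) / (8.0 * spacing)  # dz/dy
    slope = math.degrees(math.atan(math.hypot(b, c)))
    if b == 0 and c == 0:
        return slope, None  # flat: aspect is undefined
    aspect = math.degrees(math.atan2(-b, -c)) % 360.0  # clockwise from north
    return slope, aspect

# Surface rising to the east by one unit per cell: faces west.
print(slope_aspect([[0, 1, 2], [0, 1, 2], [0, 1, 2]], 1.0))
```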

TRANSFORMATIONS

Transformations create new objects and data sets from existing objects and data sets

Buffering

Point in polygon

Polygon overlay

Spatial interpolation

Density estimation

BUFFERING

buffering takes points, lines, or areas and creates areas

every location within the resulting area is either:

in/on the original object

within the defined buffer width of the original object

Two versions

discrete object:

for every object, result is a new polygon object

new objects may overlap

field (objects cannot overlap):

every location on the map has one of two values:

inside buffer distance

outside buffer distance

Applications

find all households within 1 mile of a proposed new freeway

and send them notification of proposal

find all areas of Los Padres National Forest beyond 1 mile from a road

find all liquor stores within 1 mile of a school

and notify them of a proposed change in the law

Variants

raster and vector versions

vary the object's buffer width according to an attribute value

e.g. noise buffers depending on road traffic volume

vary the rate of spread according to a friction field

only in raster

e.g. travel speed varies

Thiessen polygons for point objects

the area closest to each point forms a polygon

POINT IN POLYGON

Determine whether a given point lies inside or outside a given polygon

assign a set of points to a set of polygons

e.g. count numbers of accidents in counties

e.g. whose property does this phone pole lie in?

Algorithm

draw a line from the point to infinity

count intersections with the polygon boundary

inside if the count is odd

outside if the count is even
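
A minimal sketch of this test, with the line to infinity drawn horizontally in the positive x direction. The polygon is assumed to be a list of (x, y) vertices forming a single ring; points lying exactly on the boundary are not treated specially, so their result should not be relied on.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) is inside the polygon (odd crossing count)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:          # crossing lies on the ray to +infinity
                inside = not inside  # each crossing flips odd/even
    return inside

l_shape = [(0, 0), (0, 4), (2, 4), (2, 2), (4, 2), (4, 0)]
print(point_in_polygon(1, 3, l_shape), point_in_polygon(3, 3, l_shape))
```

Assigning a set of points to a set of polygons then amounts to running this test for each point against each candidate polygon, usually after a cheaper bounding-box check.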

Field case

point must lie in exactly one polygon

Discrete object case

point can lie in any number of polygons, including zero

Issues

algorithm for a coverage

what if the point lies on the boundary?

special cases

POLYGON OVERLAY

Create polygons by overlaying existing polygons

how many polygons are created when two polygons are overlaid?

Discrete object case

find overlaps between two polygons

e.g. a property and an easement

creates a collection of polygons

Field case

overlay two complete coverages

creates a new coverage

e.g. find all areas that are owned by the Forest Service and classified as wetland

in vector or raster

in raster the values in each cell are combined, e.g. added

Issues

major computing workload

indexing

swamped by slivers

tolerance

SPATIAL INTERPOLATION

What is interpolation?

intelligent guesswork

an interval/ratio variable conceived as a field

temperature

soil pH

population density

sampled at observation points

needed:

values at other points

a complete surface

a contour map

a TIN

a raster of point values

Two methods commonly used in GIS

inverse-distance weighting (IDW)

Kriging (geostatistics)

Moving average/distance weighted average/inverse distance weighting

estimates are averages of the values at n known points

known values $z_1, z_2, \ldots, z_n$

unknown value $z = \sum_i w_i z_i / \sum_i w_i$

where $w$ is some function of distance $d$, such as:

$w = 1/d^k$

$w = e^{-kd}$

an almost infinite variety of algorithms may be used; variations include:

the nature of the distance function

varying the number of points used

the direction from which they are selected

IDW is the most widely used method

objections to this method arise from the fact that the range of interpolated values is limited by the range of the data: a weighted average with positive weights can never exceed the largest observed value or fall below the smallest

other problems include:

how many points should be included in the averaging?

what to do about irregularly spaced points?

how to deal with edge effects?
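
The weighted average can be sketched in a few lines using the weight w = 1/d^k. This minimal version uses every known point; it does no neighbor or sector selection, and the function name and sample values are illustrative.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples is a list of (x, y, z) known points; power is the exponent k.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz  # exactly at a known point: return its value
        w = 1.0 / d ** power
        weighted_sum += w * sz
        weight_total += w
    return weighted_sum / weight_total

samples = [(0, 0, 10.0), (2, 0, 20.0)]
print(idw(1, 0, samples))    # midway between the two points
print(idw(100, 0, samples))  # far away: still between 10 and 20
```

The second call illustrates the objection noted above: however far the estimate is from the data, it stays within the range of the observed values.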

Example

ozone concentrations at CA measurement stations

objectives:

1. estimate a complete field, make a map

2. estimate ozone concentrations at other locations

e.g. cities

data sets:

measuring stations and concentrations (point shapefile)

CA outline (polygon shapefile)

DEM (raster)

CA cities (point shapefile)

IDW wizard in Geostatistical Analyst

opening screen defines data source

next screen defines interpolation method

which power of distance? (2)

how many sectors? (4)

how many neighbors in each sector? (10-15)

next screen gives results of cross-validation

results map

Kriging

developed by Georges Matheron, as the "theory of regionalized variables", and D.G. Krige as an optimal method of interpolation for use in the mining industry

the basis of this technique is the rate at which the variance between points changes over space

this is expressed in the variogram, which shows how the average difference between values at points changes with distance between points

Kriging is based on an analysis of the data, then an application of the results of this analysis to interpolation

Variograms

vertical axis is $E[(z_i - z_j)^2]$, i.e. the "expectation" of the squared difference

i.e. the average squared difference in value (e.g. elevation) of any two points distance $d$ apart

$d$ (horizontal axis) is distance between $i$ and $j$

most variograms show behavior like the diagram

the upper limit (asymptote) is called the sill

the distance at which this limit is reached is called the range

the intersection with the y axis is called the nugget

a non-zero nugget indicates that repeated measurements at the same point yield different values

in developing the variogram it is necessary to make some assumptions about the nature of the observed variation on the surface:

simple Kriging assumes that the surface has a constant mean, no underlying trend and that all variation is statistical

universal Kriging assumes that there is a deterministic trend in the surface that underlies the statistical variation

in either case, once trends have been accounted for (or assumed not to exist), all other variation is assumed to be a function of distance

Deriving the variogram

the input data for Kriging is usually an irregularly spaced sample of points

to compute a variogram we need to determine how variance increases with distance

begin by dividing the range of distance into a set of discrete intervals, e.g. 10 intervals between distance 0 and the maximum distance in the study area

for every pair of points, compute distance and the squared difference in z values

assign each pair to one of the distance ranges, and accumulate total variance in each range

after every pair has been used (or a sample of pairs in a large dataset) compute the average variance in each distance range

plot this value at the midpoint distance of each range

fit one of a standard set of curve shapes to the points

"model" the variogram

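The binning steps of this derivation can be sketched as below. It assumes points are (x, y, z) tuples with at least two distinct locations, takes the largest pair distance as the maximum distance, and averages the squared difference as described here, without the halving used in some semivariogram definitions. Fitting a model curve is not included, and the function name is illustrative.

```python
import math
from itertools import combinations

def empirical_variogram(points, n_bins=10):
    """Return (midpoint distance, mean squared difference, pair count) per bin."""
    # Distance and squared z difference for every pair of points.
    pairs = [
        (math.hypot(x1 - x2, y1 - y2), (z1 - z2) ** 2)
        for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2)
    ]
    max_dist = max(d for d, _ in pairs)
    width = max_dist / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for d, squared_diff in pairs:
        k = min(int(d / width), n_bins - 1)  # the longest pair goes in the last bin
        sums[k] += squared_diff
        counts[k] += 1
    # Report only the bins that received at least one pair.
    return [((k + 0.5) * width, sums[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k]]

points = [(0, 0, 0.0), (1, 0, 1.0), (2, 0, 2.0)]
for midpoint, mean_sq, count in empirical_variogram(points, n_bins=4):
    print(midpoint, mean_sq, count)
```
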
Computing the estimates

once the variogram has been developed, it is used to estimate distance weights for interpolation

interpolated values are the sum of the weighted values of some number of known points where weights depend on the distance between the interpolated and known points

weights are selected so that the estimates are:

unbiased (if used repeatedly, Kriging would give the correct result on average)

minimum variance (variation between repeated estimates is minimum)

problems with this method:

when the number of data points is large this technique is computationally very intensive

the estimation of the variogram is not simple, no one technique is best

since there are several crucial assumptions that must be made about the statistical nature of the variation, results from this technique can never be absolute

simple Kriging routines are available in the Surface II package (Kansas Geological Survey), in Surfer (Golden Software), in the GEOEAS package for the PC developed by the US Environmental Protection Agency, and in ArcInfo 8 as the add-on Geostatistical Analyst

DENSITY ESTIMATION

Suppose you had a map of discrete objects and wanted to calculate their density

density of population

density of cases of a disease

density of roads in an area

density would form a field

density estimation is one way of creating a field from a set of discrete objects

Methods

count the number of points in every cell of a raster

measure the length of lines, e.g. roads

result depends on cell size

result is very noisy, erratic

Density estimation using kernels

think of each point being replaced by a pile of sand of constant shape

add the piles to create a surface

example kernel

width of the kernel determines the smoothness of the surface
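
The "pile of sand" idea can be sketched with one particular kernel shape. The quartic kernel below is an assumption chosen for illustration (the text does not say which kernel is used), and the function names are hypothetical. Each kernel integrates to 1, so the surface integrates to the number of points.

```python
import math

def kernel_density(x, y, points, radius):
    """Density at (x, y): sum of quartic kernels centred on each point."""
    total = 0.0
    for px, py in points:
        u = math.hypot(x - px, y - py) / radius
        if u < 1.0:  # the kernel is zero beyond its radius
            total += 3.0 / (math.pi * radius ** 2) * (1.0 - u * u) ** 2
    return total

def density_raster(points, radius, x0, y0, cell, n_cols, n_rows):
    """Evaluate the density at cell centres; row 0 starts at y0."""
    return [[kernel_density(x0 + (col + 0.5) * cell,
                            y0 + (row + 0.5) * cell, points, radius)
             for col in range(n_cols)]
            for row in range(n_rows)]

points = [(0.0, 0.0)]
narrow_peak = kernel_density(0.0, 0.0, points, radius=1.0)
wide_peak = kernel_density(0.0, 0.0, points, radius=2.0)
print(narrow_peak, wide_peak)  # the wider kernel gives a lower, smoother peak
```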

Density estimation and spatial interpolation applied to the same data

density of ozone measuring stations

using Spatial Analyst

kernel is too small (radius of 16 km)

kernel radius 150 km

what's the difference?
