A Practical Guide to Geostatistical Mapping

Tomislav Hengl

ISRIC — World Soil Information, Wageningen University

GEOSTAT course, 11-17 April 2011, Canberra

Topics

* spatio-temporal data: elements, aspects, formats

* data import (GDAL) and visual exploration

* geographic data, maps, cartographic projection systems (proj4)

* Google Earth: the final GIS?

* spatio-temporal statistics, basics:

1. spatial prediction / automated mapping

2. kriging, regression, regression-kriging

3. some applications


Today, everybody is a spatial analyst!

* We have the tools that allow GIS + statistics integration

* There is more and more auxiliary data:

1. MODIS (global coverage, 250 m, every 2 days, 36 bands)

2. Meteorological images (e.g. SEVIRI; 1 km, every 15 min, 12 bands)

3. SRTM DEM, GDEM, LiDAR (topography, 30–100 m)

* We can automate data analysis (“Get results sooner, with more accuracy... and retire sooner”, Chih Jeng Kenneth Tan)

* Google Earth (GE) registered more than 350 million downloads!


http://google-latlong.blogspot.com/2008/02/truly-global.html

GIS analysis for all

“From a period in which geographic information systems, and later geocomputation and geographical information science, have been agenda setters, there seems to be interest in trying things out, in expressing ideas in code, and in encouraging others to apply the coded functions in teaching and applied research settings.”

Roger Bivand


http://www.nhh.no/Default.aspx?ID=697

The missing link

* Our projects typically depend on both statistical and GIS analysis

* Some believe that this could all be done within R

* Others believe that this could all be done within commercial packages (ArcGIS)

* ... and the winner is:

1. R: scripting, statistical computing

2. SAGA/GRASS: GIS data input and geographical analysis

3. Google Earth: storage, sharing, browsing


Basic concepts

* Models: statistical model (conceptual); data models (formats); model parameters;

* Methods (functions): implemented as algorithms; inputs, outputs, arguments;

* Data: variables, i.e. target variables and auxiliary variables (predictors); metadata; geoinformation;

* Applications: field-specific; result interpretation; associated uncertainty;


What is spatio-temporal statistics about?

Spatio-temporal statistics: statistical techniques adjusted to handle spatio-temporal data.

Geostatistics is a subset of statistics specialized in analysis and interpretation of geographically (and temporally) referenced data.

Geostatistics is an analytical tool for statistical analysis of sampled field data.

The bottom line: you collect (spatio-temporal) data and you need tools that can help you answer field-specific questions (i.e. that can help you produce outputs of interest: maps, predictions, statistical measures).


Geostatistics — topics

Typical questions of interest to a geostatistician are:

* how does a variable vary in space?

* what controls its variation in space?

* where to locate samples to describe its spatial variability?

* how many samples are needed to represent its spatial variability?

* what is the value of a variable at some new location?

* what is the uncertainty of the estimate?


Analysis objectives

According to Diggle and Ribeiro (2007), geostatistics has three scientific objectives:

1. model estimation, i.e. inference about the model parameters;

2. prediction, i.e. inference about the unobserved values of the target variable;

3. hypothesis testing;


http://www.leg.ufpr.br/mbgbook/

Environmental variables

Quantitative or descriptive measures of different environmental features.

* biology (distribution of species and biodiversity measures)

* soil science (soil properties and types)

* vegetation science (plant species and communities, land cover types)

* climatology (climatic variables at the surface and beneath/above it)

* hydrology (water quantities and conditions)


Example


Variables

* Name, definition

* Feature of interest

* Measurement units

* Representation, data model, domain

* Spatio-temporal pattern

* Application, decision making process, data interpretation


Example: pH

* environmental feature: acidity in soil

* variable of interest: pH factor

* units: concentration of the H+ ions in the soil solution, expressed as a negative base-10 exponent

* sampling technique: pH meter (field or laboratory); soil solution

* targeted output: a map of continuous values of concentration (continuous fields)

* interpretation: values of pH define properties of soil (acid, neutral, alkaline soils)


Spatial variability

Commonly a result of complex processes working at the same time and over long periods of time, rather than an effect of a single realization of a single factor.

Sum of two components: (a) the natural spatial variation and (b) the inherent noise.

* Geographical variation (2D)

* Vertical variation (3D)

* Temporal variation

* Variation at different scales (support size)


A way to classify variables

1. SRV — short-range variability

2. TV — temporal variability

3. VV — vertical variability

4. SSD — standard sampling density

5. DRS — remote-sensing detectability

Other important issues: (6) sampling costs, (7) global or local coverage, (8) relationship with other variables, (9) scalability


What you need to know!

Each 2D map of an environmental variable should always indicate a time reference (interval), the applicable vertical dimension [1] and the sample (support) size, i.e. the effective scale.

It is also important to know: the approximate geographical coordinates of the study area (gravity point), the borders of the area of interest (mask), the coordinate system (proj4 string), and who made the map and how.

[1] Orthogonal distance from the land surface.


Look again at these maps


Nature of variables

From a metaphysical perspective, what we are most often mapping in geostatistics are, in fact, quantities of molecules of a certain kind or quantities of energy.

Many variables directly refer to processes and are expressed in quantity per time unit, e.g. mm of rainfall per year.

In ecology, the objects of interest (individual plants or animals) are often immeasurable in quantity: animal species change their location dynamically, often in unpredictable directions and with unpredictable spatial patterns (non-linear trajectories). Occurrence records are 0/1 observations; these are modelled using statistical probability theory.


Data models/formats

A data format is the way we define the structure and elements of a record of some variable/feature.

The data format dictates many things: the way we edit (read/write), search, compute, transform or scale data.

Data formats are software-specific: everybody has a different idea about how to represent data digitally.


Data formats in R

R classes — what type of object is it?

numeric: array of numbers (vector/matrix); data frame: the fundamental structure for statistical analysis, a table with named columns (roughly, database fields) and (optionally) named rows (roughly, database cases); models/formulas: complex hierarchical structure (a set of lists, vectors, data frames);

Common classes in R: numeric, character (string), integer, factor, Date-Time (POSIX/C99) etc.


Spatial data in R

R has special classes for spatial data: spatial points, point patterns, pixels, lines, grids, CRS etc.; many existing spatial statistics packages work with these classes;

http://www.r-project.org/Rgeo/

Pebesma, E.J., Bivand, R.S., 2005. Classes and methods for spatial data in R. R News 5/2, 9–13. http://cran.r-project.org/doc/Rnews/Rnews_2005-2.pdf

Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V., 2008. Applied Spatial Data Analysis with R. UseR! Series, Springer, 378 pp. http://www.asdar-book.org/

Example of gridded data in R

Formal class 'SpatialGridDataFrame' [package "sp"] with 6 slots

..@ data :'data.frame': 5530 obs. of 1 variables:

.. ..$ lgn3: int [1:5530] 1 1 1 1 1 1 1 22 22 22 ...

..@ grid :Formal class 'GridTopology' [package "sp"] with 3 slots

.. .. ..@ cellcentre.offset: Named num [1:2] 190163 314013

.. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"

.. .. ..@ cellsize : num [1:2] 25 25

.. .. ..@ cells.dim : int [1:2] 70 79

..@ grid.index : int(0)

..@ coords : num [1:2, 1:2] 190163 191888 314013 315963

.. ..- attr(*, "dimnames")=List of 2

.. .. ..$ : NULL

.. .. ..$ : chr [1:2] "x" "y"

..@ bbox : num [1:2, 1:2] 190150 314000 191900 315975

.. ..- attr(*, "dimnames")=List of 2

.. .. ..$ : chr [1:2] "x" "y"

.. .. ..$ : chr [1:2] "min" "max"

..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots

.. .. ..@ projargs: chr NA


Compare with ArcInfo ASCII

ncols 70

nrows 79

xllcorner 190150

yllcorner 314000

cellsize 25.00

nodata_value 0

1 1 1 1 1 1 1 22 22 22 22 22 22 22 22 22 1 1

1 17 17 17 17 24 17 17 17 17 17 17 17

17 17 17 24 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 4 5 5 5 5 5 5 5 5 6 6 6 2 2 2 2 2

1 1 1 1 1 22 22 22 22 22 22 22 22 22 22 22

1 1 1 17 17 17 24 24 24 17 17 17 17 17 17

17 17 17 24 24 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 ...
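The ArcInfo ASCII header and the `SpatialGridDataFrame` slots above describe the same grid geometry in two ways. As a minimal sketch (plain Python, with a hypothetical helper name `grid_geometry` that belongs to neither format), the bounding box and the centre of the lower-left cell can be derived from the header values like this:

```python
def grid_geometry(header):
    """Derive bounding box and first cell centre from an ArcInfo ASCII header."""
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    size = float(header["cellsize"])
    xmin, ymin = float(header["xllcorner"]), float(header["yllcorner"])
    return {
        # (xmin, ymin, xmax, ymax): the lower-left corner plus the grid extent
        "bbox": (xmin, ymin, xmin + ncols * size, ymin + nrows * size),
        # centre of the lower-left cell: half a cell in from the corner
        "cellcentre_offset": (xmin + size / 2, ymin + size / 2),
        "n_cells": ncols * nrows,
    }

# Header values copied from the ArcInfo ASCII example above
header = {"ncols": 70, "nrows": 79, "xllcorner": 190150,
          "yllcorner": 314000, "cellsize": 25.0}
geom = grid_geometry(header)
print(geom["bbox"])               # (190150.0, 314000.0, 191900.0, 315975.0)
print(geom["cellcentre_offset"])  # (190162.5, 314012.5)
print(geom["n_cells"])            # 5530
```

The results agree with the `bbox` slot and the 5530 observations in the R object; the `cellcentre.offset` printed by R (190163, 314013) is the same value rounded for display.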


Compare with Idrisi raster I

file title : Land cover types from LGN3

data type : byte

file type : binary

columns : 70

rows : 79

ref. system : epsg:28992

ref. units : m

unit dist. : 1.0000000

min. X : 190150.00000

max. X : 191900.00000

min. Y : 314000.00000

max. Y : 315975.00000

pos'n error : unknown

resolution : 25.000000

min. value : 0

max. value : 39

value units : meter

value error : unknown

flag value : 0

flag def'n : missing data

legend cats : 26


Compare with Idrisi raster II

category 0 :

category 1: Agrarisch gras

category 2: Maïs

category 3: Aardappelen

category 4: Bieten

category 5: Granen

category 6: Overige landbouwgewassen

category 7: Glastuinbouw

category 8: Boomgaarden

category 9: Bloembollen

category 10: Loofbos

category 11: Naaldbos

category 12: Droge heide

category 13: Open begroeid natuurgebied

category 14: Kale grond natuurgebied

category 15: Zoet water

...


Land cover map in NL

[Figure: map of LGN3 land cover classes; easting 190500 to 191500, northing 314500 to 315500; legend of class codes 1, 2, 3, 4, 5, 6, 8, 10, 15, 17, 18, 19, 20, 21, 22, 24, 25]


What is GDAL/OGR?

Translating data from one software package to another is now simple thanks to:

* GDAL [2]: Geospatial Data Abstraction Library

* OGR: OpenGIS Simple Features Reference Implementation

Note: not all software producers support GDAL!

[2] http://www.gdal.org/formats_list.html

PROJ.4

Geographic data always refer to some reference coordinate system.

PROJ.4: Cartographic Projections Library (this allows you to reproject maps to almost any coordinate system); see also http://spatialreference.org

Today, it is much easier to move maps from one projection system to another than it was 10 years ago (on-the-fly reprojection). You only need to assign the correct proj4string and then you do not have to worry about it any more. Unless you got it wrong: all parameters need to be absolutely correct!


http://proj.maptools.org/


EPSG

European Petroleum Survey Group (EPSG) database: http://www.epsg-registry.org/

World standard or user-defined coordinate systems; e.g. Amersfoort / RD New (EPSG 28992):

+proj=sterea

+lat_0=52.15616055555555 +lon_0=5.38763888888889

+k=0.999908 +x_0=155000 +y_0=463000

+ellps=bessel

+towgs84=565.237,50.0087,465.658,-0.406857,

0.350733,-1.87035,4.0812

+units=m +no_defs
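Because every parameter of a proj4 string has to be correct, it can help to split the string into named parameters and inspect them one by one. The following is only a sketch in plain Python; `parse_proj4` is a hypothetical helper written for this illustration, not a function of PROJ.4 or of any R package:

```python
def parse_proj4(text):
    """Split a proj4 string into a {parameter: value} dictionary."""
    params = {}
    for token in text.split():
        if not token.startswith("+"):
            continue
        key, _, value = token[1:].partition("=")
        # parameters without a value (e.g. +no_defs) are stored as True
        params[key] = value if value else True
    return params

# The Amersfoort / RD New definition shown above, joined into one string
rd_new = ("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 "
          "+k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel "
          "+towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812 "
          "+units=m +no_defs")
p = parse_proj4(rd_new)
print(p["proj"], p["ellps"], p["units"])   # sterea bessel m
print(len(p["towgs84"].split(",")))        # 7 datum-shift parameters
```

A check like this only confirms that the string is complete and well formed; it does not verify that the numeric values are the right ones for the coordinate system.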



WGS84

The only truly global reference model of the Earth is the World Geodetic System (WGS84) ellipsoid:

> EPSG <- make_EPSG()

> EPSG[EPSG$note=="# WGS 84",-2]

code prj4

249 4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs

Our current space-time location:

> 149.11902 E, -35.28028 N, 14 April, 03:15 GMT


http://earth-info.nga.mil/GandG/wgs84/


Google Earth


Why not always use Longlat system?

* The aspect ratio of X (longitude) and Y (latitude) coordinates is not 1:1

* Grid cell size is not constant; most algorithms in geomorphometry and geostatistics assume a Cartesian system (e.g. derivation of slope, distances etc.)

* You cannot print and use such maps to determine distances, areas, directions (in a traditional cartographic way)

* Map units are abstract: arcseconds, arcminutes, arcdegrees
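The first two points can be made concrete with a short calculation. This is a rough sketch that assumes a spherical Earth with a mean radius of 6371 km (an approximation, not the WGS84 ellipsoid); the function name `degree_size_m` is introduced here for illustration only:

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean radius of a spherical Earth (an approximation)

def degree_size_m(lat_deg):
    """Approximate ground length (m) of one degree of longitude and of latitude."""
    one_deg = math.pi * EARTH_RADIUS_M / 180.0
    # meridians converge towards the poles, so a degree of longitude shrinks
    # with the cosine of the latitude; a degree of latitude stays constant
    return one_deg * math.cos(math.radians(lat_deg)), one_deg

# Equator, Canberra, and the RD New projection origin (latitudes from the slides)
for lat in (0.0, -35.28028, 52.15616):
    dx, dy = degree_size_m(lat)
    print(f"lat {lat:9.5f}: 1 deg lon = {dx / 1000:6.1f} km, "
          f"1 deg lat = {dy / 1000:6.1f} km")
```

Only at the equator are the two lengths equal, which is why a grid with a constant cell size in degrees does not have a constant cell size on the ground.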


From Geographic to Projected coordinates


Grid cell size / scale

see also “Finding the right pixel size”


http://dx.doi.org/10.1016/j.cageo.2005.11.008

Impact of grid cell size


Resampling

For example in SAGA GIS:

$module name : Get Grid Data for Shapes

...

-INTERPOL:<num> Interpolation

Choice

Available Choices:

[0] Nearest Neighbor

[1] Bilinear Interpolation

[2] Inverse Distance Interpolation

[3] Bicubic Spline Interpolation

[4] B-Spline Interpolation
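The first two choices in this list differ in a way that is easy to show on a tiny grid. The sketch below is plain Python and is not SAGA code; it assumes that cell centres sit at integer (column, row) positions and that the query point lies inside the grid:

```python
import math

def nearest(grid, col, row):
    """Value of the cell whose centre is closest to (col, row)."""
    # floor(x + 0.5) rounds halves upward (Python's round() rounds them to even)
    return grid[math.floor(row + 0.5)][math.floor(col + 0.5)]

def bilinear(grid, col, row):
    """Distance-weighted mean of the four surrounding cell centres."""
    c0, r0 = int(math.floor(col)), int(math.floor(row))
    c1, r1 = min(c0 + 1, len(grid[0]) - 1), min(r0 + 1, len(grid) - 1)
    fc, fr = col - c0, row - r0          # fractional position inside the cell block
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr

grid = [[10.0, 20.0],
        [30.0, 40.0]]
print(nearest(grid, 0.4, 0.4))   # 10.0
print(bilinear(grid, 0.5, 0.5))  # 25.0
```

Nearest neighbour keeps the original cell values, which suits categorical maps such as land cover classes; bilinear interpolation produces new intermediate values and therefore only makes sense for continuous variables.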


Resampling scheme (nearest neighbor and bilinear)


Spatial prediction techniques

Next topics:

* Spatial prediction: basic principles, classification (mechanical / statistical methods)

* Kriging: semivariance, variogram, ordinary kriging, characteristics of kriging, variants of kriging

* Regression: correlation, prediction error, OLS, GLS, GLMs

* Regression-kriging: the generic spatial prediction model


Books on geostatistics

* Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation (Applied Geostatistics). Oxford University Press, New York, 496 pp.

* Webster, R. and Oliver, M.A., 2007. Geostatistics for Environmental Scientists. Statistics in Practice. John Wiley & Sons, Chichester, 330 pp.

* Pebesma, E.J., 2003. Gstat User’s manual. University of Utrecht, Utrecht. www.gstat.org

* Rossiter, D.G., 2008. Spatial analysis and Geostatistics, lecture notes, ITC.


http://www.geog.uu.nl/gstat/manual/gstat.html


Geostatistical mapping

“Analytical production of maps by using field observations, auxiliary information and a computer program that calculates values at locations of interest (a study area)”


Spatial prediction model

A widely accepted generic spatial prediction model [3]:

$$\hat{z}(\mathbf{s}_0) = E\left\{ Z \mid z(\mathbf{s}_i),\ q_k(\mathbf{s}_0),\ \gamma(\mathbf{h}),\ \mathbf{s} \in \mathbb{A} \right\} \qquad (1)$$

where $z(\mathbf{s}_i)$ is the input point dataset, $q_k(\mathbf{s}_0)$ is the list of deterministic predictors and $\gamma(\mathbf{h})$ is the covariance model defining the spatial autocorrelation structure.

[3] A spatial prediction model defines inputs, outputs and the computational procedure to derive outputs based on the given inputs.


Spatial prediction scheme


Spatial prediction techniques I

1. MECHANICAL (DETERMINISTIC) MODELS: These are models where arbitrary or empirical model parameters are used. No estimate of the model error is available and usually no strict assumptions about the variability of a feature exist. The most common techniques that belong to this group are:

* Thiessen polygons;

* Inverse distance interpolation;

* Regression on coordinates;

* Natural neighbors;

* Splines;

* ...


Spatial prediction techniques II

2. LINEAR STATISTICAL (PROBABILITY) MODELS: In the case of statistical models, the model parameters are commonly estimated in an objective way, following probability theory. The predictions are accompanied by an estimate of the prediction error. A drawback is that the input data set usually needs to satisfy strict statistical assumptions. There are at least four groups of linear statistical models:

* kriging (plain geostatistics);

* environmental correlation (e.g. regression-based);

* Bayesian-based models (e.g. Bayesian Maximum Entropy);

* hybrid models (e.g. regression-kriging);

* ...


Spatial prediction techniques III

3. EXPERT-BASED SYSTEMS: These models can be completely subjective (ergo irreproducible) or completely based on data; predictions are typically different for each run. Expert systems can also largely be based on probability theory (especially Bayesian statistics); however, it is good to put them in a different group because they are conceptually different from standard linear statistical techniques. There are at least three groups of expert-based systems:

* mainly knowledge-driven expert systems (e.g. hand-drawn maps);

* mainly data-driven expert systems (e.g. based on neural networks);

* machine learning algorithms (purely data-driven);


Inverse distance interpolation

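The value of the target variable at some new location can be derived as a weighted average:

$$\hat{z}(\mathbf{s}_0) = \sum_{i=1}^{n} \lambda_i(\mathbf{s}_0) \cdot z(\mathbf{s}_i) = \boldsymbol{\lambda}_0^{\mathrm{T}} \cdot \mathbf{z} \qquad (2)$$

where $\lambda_i$ is the weight for neighbour $i$. The weights need to sum to one to ensure an unbiased interpolator.

The simplest approach for determining the weights is to use the inverse distances from all points to the new point:

$$\lambda_i(\mathbf{s}_0) = \frac{\dfrac{1}{d^{\beta}(\mathbf{s}_0, \mathbf{s}_i)}}{\sum\limits_{j=1}^{n} \dfrac{1}{d^{\beta}(\mathbf{s}_0, \mathbf{s}_j)}}; \qquad \beta > 1 \qquad (3)$$

where $d(\mathbf{s}_0, \mathbf{s}_i)$ is the distance from the new point to a sampled point and $\beta$ adjusts how quickly the weights decay with distance.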
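Equations (2) and (3) translate almost directly into code. The following is a minimal sketch in plain Python; the function name `idw`, the default of $\beta = 2$ and the three sample points are assumptions made for this illustration:

```python
import math

def idw(points, values, s0, beta=2.0):
    """Inverse distance weighted prediction at s0, following Eqs. (2) and (3)."""
    inv = []
    for k, ((x, y), z) in enumerate(zip(points, values)):
        d = math.hypot(x - s0[0], y - s0[1])
        if d == 0.0:
            # s0 coincides with an observation: return that value exactly
            return z, [1.0 if m == k else 0.0 for m in range(len(points))]
        inv.append(1.0 / d ** beta)
    total = sum(inv)
    weights = [w / total for w in inv]   # normalised, so the weights sum to one
    return sum(w * z for w, z in zip(weights, values)), weights

# Hypothetical observations (x, y) with measured values
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
values = [10.0, 20.0, 40.0]
z_hat, lam = idw(points, values, (0.5, 0.5))
print(round(z_hat, 3), [round(w, 3) for w in lam])
```

Note that the weights depend only on distances and on the chosen $\beta$, not on the data values, which is why this mechanical method cannot report a prediction error.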


Kriging

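The standard version of kriging is called ordinary kriging (OK). The predictions are based on the model:

$$Z(\mathbf{s}) = \mu + \varepsilon'(\mathbf{s}) \qquad (4)$$

where $\mu$ is the constant stationary function (global mean) and $\varepsilon'(\mathbf{s})$ is the spatially correlated stochastic part of variation.

The predictions are obtained using:

$$\hat{z}_{\mathrm{OK}}(\mathbf{s}_0) = \sum_{i=1}^{n} w_i(\mathbf{s}_0) \cdot z(\mathbf{s}_i) = \boldsymbol{\lambda}_0^{\mathrm{T}} \cdot \mathbf{z} \qquad (5)$$

where $\boldsymbol{\lambda}_0$ is the vector of kriging weights ($w_i$) and $\mathbf{z}$ is the vector of $n$ observations at primary locations.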


Kriging (2)

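The ordinary kriging weights are obtained by multiplying the inverse of the covariance matrix by the vector of covariances at the new location:

$$\boldsymbol{\lambda}_0 = \mathbf{C}^{-1} \cdot \mathbf{c}_0; \qquad C(|\mathbf{h}| = 0) = C_0 + C_1 \qquad (6)$$

where $\mathbf{C}$ is the covariance matrix derived for the $n \times n$ pairs of observations and $\mathbf{c}_0$ is the vector of covariances at the new location. Written out, with an extra row and column that force the weights to sum to one:

$$\begin{bmatrix} C(\mathbf{s}_1,\mathbf{s}_1) & \cdots & C(\mathbf{s}_1,\mathbf{s}_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(\mathbf{s}_n,\mathbf{s}_1) & \cdots & C(\mathbf{s}_n,\mathbf{s}_n) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}^{-1} \cdot \begin{bmatrix} C(\mathbf{s}_0,\mathbf{s}_1) \\ \vdots \\ C(\mathbf{s}_0,\mathbf{s}_n) \\ 1 \end{bmatrix} = \begin{bmatrix} w_1(\mathbf{s}_0) \\ \vdots \\ w_n(\mathbf{s}_0) \\ \varphi \end{bmatrix} \qquad (7)$$

where $\varphi$ is the Lagrange multiplier.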
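The system in Eq. (7) is small enough to solve by hand-written elimination. The sketch below is plain Python and rests on several assumptions that are not part of the slides: an exponential covariance function with made-up parameters ($C_0 = 0$, $C_1 = 1$, range parameter 300), three hypothetical observations, and helper names (`cov`, `solve`, `ok_weights`) chosen for this illustration. Instead of inverting the matrix it solves the equivalent linear system.

```python
import math

def cov(h, c0=0.0, c1=1.0, a=300.0):
    """Exponential covariance; C(0) = C0 + C1 (nugget plus partial sill)."""
    return c0 + c1 if h == 0.0 else c1 * math.exp(-h / a)

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):                        # back substitution
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def ok_weights(points, s0):
    """Weights w_i(s0) and the Lagrange multiplier from the bordered system."""
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    A = [[cov(dist(p, q)) for q in points] + [1.0] for p in points]
    A.append([1.0] * n + [0.0])                         # unbiasedness constraint
    b = [cov(dist(s0, p)) for p in points] + [1.0]
    sol = solve(A, b)
    return sol[:n], sol[n]

points = [(0.0, 0.0), (100.0, 0.0), (0.0, 200.0)]
z = [10.0, 20.0, 40.0]
w, phi = ok_weights(points, (50.0, 50.0))
print(round(sum(w), 6))                       # the weights sum to one
print(sum(wi * zi for wi, zi in zip(w, z)))   # ordinary kriging prediction, Eq. (5)
```

Unlike the inverse distance weights, these weights depend on the covariance model, so fitting that model to the data is the step that makes kriging a statistical rather than a mechanical method.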


Semivariance

The basis of kriging is derivation and plotting of the so-called semivariances, i.e. half the expected squared differences between neighbouring values:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[ \left( z(\mathbf{s}_i) - z(\mathbf{s}_i + \mathbf{h}) \right)^2 \right] \qquad (8)$$

where $z(\mathbf{s}_i)$ is the value of the target variable at some sampled location and $z(\mathbf{s}_i + \mathbf{h})$ is the value of the neighbour at location $\mathbf{s}_i + \mathbf{h}$, i.e. separated by the lag $\mathbf{h}$.

Suppose that there are $n$ point observations; this yields $n \cdot (n - 1)/2$ pairs for which a semivariance can be calculated.

Once we have calculated an experimental variogram, we can fit it using one of the authorized variogram models, such as linear, spherical, exponential, circular, Gaussian, Bessel, power...
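The pair counting and the averaging into an experimental variogram can be sketched in a few lines of plain Python. The function names, the five sample points and the lag width of 2.0 are assumptions made for this illustration; in practice a package such as gstat does this step.

```python
import math
from itertools import combinations

def variogram_cloud(points, values):
    """Distance and semivariance 0.5 * (z_i - z_j)^2 for every point pair."""
    cloud = []
    for i, j in combinations(range(len(points)), 2):
        h = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        cloud.append((h, 0.5 * (values[i] - values[j]) ** 2))
    return cloud

def experimental_variogram(cloud, width):
    """Average the cloud in distance bins (lags) of the given width."""
    bins = {}
    for h, g in cloud:
        bins.setdefault(int(h // width), []).append(g)
    # (lag midpoint, mean semivariance, number of pairs) for each occupied bin
    return [((k + 0.5) * width, sum(g) / len(g), len(g))
            for k, g in sorted(bins.items())]

# Hypothetical observations
points = [(0, 0), (1, 0), (0, 1), (3, 0), (3, 3)]
values = [1.0, 2.0, 1.5, 4.0, 6.0]
cloud = variogram_cloud(points, values)
print(len(cloud))   # n * (n - 1) / 2 = 10 pairs for n = 5
for lag, gamma, n_pairs in experimental_variogram(cloud, width=2.0):
    print(lag, round(gamma, 3), n_pairs)
```

The unbinned list of pairs is the variogram cloud; the binned averages are the points to which a variogram model is then fitted.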


Variograms


Anisotropy


Anisotropy in gstat

The variogram models can be extended to an even larger number of parameters if either (a) anisotropy or (b) smoothness is considered in addition to modelling of nugget and sill variation.

The 2D geometric anisotropy in gstat, for example, is modelled by replacing the range parameter with three parameters: range in the major direction (direction of the strongest correlation), angle of the principal direction and the anisotropy ratio:

vgm(nugget=1, model="Sph", psill=10, range=2,

anis=c(30,0.5))

where the angle of the major direction is 30 (azimuthal direction measured in degrees clockwise from north), the anisotropy ratio is 0.5 (the range in the minor direction is half the range in the major direction), and the sill is passed through the psill (partial sill) argument of vgm.
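One common way to apply geometric anisotropy is to rescale each separation vector before evaluating an isotropic model. The sketch below is plain Python and uses the parameter values from the vgm() call above; it assumes the azimuth is measured clockwise from north (the positive Y axis), and it illustrates the idea rather than reproducing gstat's internal code. The function names are hypothetical.

```python
import math

def reduced_distance(dx, dy, angle_deg=30.0, ratio=0.5):
    """Isotropic-equivalent length of the separation vector (dx, dy)."""
    t = math.radians(angle_deg)
    along = dx * math.sin(t) + dy * math.cos(t)    # component on the major axis
    across = dx * math.cos(t) - dy * math.sin(t)   # component on the minor axis
    # stretching the minor component by 1/ratio shortens the effective range there
    return math.hypot(along, across / ratio)

def spherical(h, nugget=1.0, psill=10.0, rng=2.0):
    """Spherical semivariogram with nugget, partial sill and range."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return nugget + psill
    return nugget + psill * (1.5 * h / rng - 0.5 * (h / rng) ** 3)

t = math.radians(30.0)
major = (math.sin(t), math.cos(t))    # unit step along azimuth 30 degrees
minor = (math.cos(t), -math.sin(t))   # unit step perpendicular to it
print(spherical(reduced_distance(*major)))  # still below the sill
print(spherical(reduced_distance(*minor)))  # sill reached: the range is halved
```

A step of one unit along the major direction is halfway to the range of 2, while the same step in the minor direction already corresponds to the full range.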


http://www.gstat.org/manual/node20.html

Kriging, steps

[Figure: kriging steps. (a) sampling locations (easting 178500 to 181500, northing 330000 to 333000) with a legend of values 100, 200, 400, 800, 1600; (b) semivariance plotted against distance (distance axis 500 to 1500, semivariance axis 1 to 3); a further panel with the same axes follows.]

ianc

e

0.2

0.4

0.6

500 1000 1500

+

+

+

+

+

+ +

++

+ +

+

+

+ +

57

299

419

457547

533574564589

543500

477452

457415

(c)

distance

sem

ivar

ianc

e0.2

0.4

0.6

500 1000 1500

+

+

+

+

+

+ +

++

+ +

+

+

+ +

(d)


Burrough and McDonnell (1998) example

What is the value of the target variable at the location x=5, y=5? (Example from the book by Burrough and McDonnell (1998).)

> # requires the gstat and sp packages; points5 must already hold the sample points (attribute Z)
> grid10 = expand.grid(x = seq(0, 10, 10), y = seq(0, 10, 10))
> gridded(grid10) = ~x + y
> newpoint = as.data.frame(matrix(c(5, 5), nrow = 1, ncol = 2,
+     dimnames = list(c("x1"), c("X", "Y"))))
> coordinates(newpoint) <- ~X + Y
> krige(Z ~ 1, points5, newpoint, vgm(nugget = 2.5, "Sph",
+     psill = 7.5, range = 10))
[using ordinary kriging]
  coordinates var1.pred var1.var
1      (5, 5)       4.3     4.93
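To make the computation behind this call visible, here is a minimal base-R sketch of ordinary kriging at a single location. It is not the gstat implementation: the helper names `sph_gamma` and `ok_point` are hypothetical, and the five sample points are illustrative values, not the `points5` object used on the slide. The variogram parameters are the ones passed to `vgm()` above, with `psill` read as the partial sill.

```r
# Spherical semivariogram: nugget c0, partial sill c1, range a
sph_gamma <- function(h, c0, c1, a) {
  g <- ifelse(h < a, c0 + c1 * (1.5 * h / a - 0.5 * (h / a)^3), c0 + c1)
  g[h == 0] <- 0                       # semivariance is 0 at zero distance
  g
}

# Ordinary kriging at one location xy0 from samples (xy, z)
ok_point <- function(xy, z, xy0, c0, c1, a) {
  n   <- nrow(xy)
  G   <- sph_gamma(as.matrix(dist(xy)), c0, c1, a)   # sample-to-sample semivariances
  g0  <- sph_gamma(sqrt((xy[, 1] - xy0[1])^2 + (xy[, 2] - xy0[2])^2), c0, c1, a)
  A   <- rbind(cbind(G, 1), c(rep(1, n), 0))         # last row/column: weights sum to 1
  sol <- solve(A, c(g0, 1))
  w   <- sol[1:n]                                    # kriging weights
  mu  <- sol[[n + 1]]                                # Lagrange multiplier
  list(pred = sum(w * z), var = sum(w * g0) + mu, weights = w)
}

# Illustrative sample points (an assumption, not taken from the slide)
xy  <- cbind(x = c(2, 3, 9, 6, 5), y = c(2, 7, 9, 5, 3))
z   <- c(3, 4, 2, 4, 6)
res <- ok_point(xy, z, xy0 = c(5, 5), c0 = 2.5, c1 = 7.5, a = 10)
```

`res$pred` and `res$var` play the roles of `var1.pred` and `var1.var`; the weights in `res$weights` sum to one because of the unbiasedness constraint.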


Kriging = geostatistics

Kriging has for many decades been used as a synonym for geostatistical interpolation.

It originated in the mining industry in the early 1950s as a means of improving ore reserve estimation (the mining engineer D. G. Krige and the statistician H. S. Sichel).

The technique was first published in Krige (1951), but it took almost a decade until the French mathematician G. Matheron derived the formulas and basically established the whole field of linear geostatistics.


Environmental correlation

The concept of vegetation/soil-environment relationships has frequently been presented in terms of an equation with six key environmental factors:

$$V \times S[x, y, t] = f\left(s[x, y, t],\; c[x, y, t],\; o[x, y, t],\; r[x, y, t],\; p[x, y, t],\; a[x, y, t]\right) \tag{9}$$


Types of lm’s

There are (at least) four groups of statistical models that have been used to make spatial predictions with the help of environmental factors:

- Classification-based models

- Tree-based models (decision trees)

- Regression models (Generalized Linear Models, Generalized Additive Models)


Environmental correlation with OLS

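A common regression-based approach to spatial prediction is multiple linear regression:

$$\hat{z}_{\mathrm{OLS}}(s_0) = \hat{\beta}_0 + \hat{\beta}_1 \cdot q_1(s_0) + \ldots + \hat{\beta}_p \cdot q_p(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \cdot q_k(s_0) = \hat{\boldsymbol{\beta}}^{T} \cdot \mathbf{q}_0 \tag{10}$$

where $q_k(s_0)$ are the values of the auxiliary variables at the target location (with $q_0(s_0) = 1$ for the intercept), $p$ is the number of predictors or auxiliary variables, and $\hat{\beta}_k$ are the regression coefficients solved using Ordinary Least Squares:

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{q}^{T} \cdot \mathbf{q}\right)^{-1} \cdot \mathbf{q}^{T} \cdot \mathbf{z} \tag{11}$$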
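As a quick check of Eqs. (10) and (11), the sketch below solves the normal equations directly in base R and compares the result with `lm()`. The data are simulated and the predictor names `q1`, `q2` are placeholders; this is an illustration under those assumptions, not part of the original course material.

```r
set.seed(1)
n  <- 50
q1 <- runif(n)                                  # simulated auxiliary variables
q2 <- runif(n)
z  <- 2 + 3 * q1 - 1.5 * q2 + rnorm(n, sd = 0.2)

q    <- cbind(1, q1, q2)                        # design matrix; first column is the intercept
beta <- solve(t(q) %*% q, t(q) %*% z)           # Eq. (11)

q0 <- c(1, 0.5, 0.5)                            # predictors at a new location, q_0(s0) = 1
z0 <- sum(beta * q0)                            # Eq. (10)

fit <- lm(z ~ q1 + q2)                          # reference fit
```

In practice `lm()` is preferable because it uses a QR decomposition, which is numerically more stable than inverting the cross-product matrix.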


OLS prediction error

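The prediction error of a multiple linear regression model is:

$$\hat{\sigma}^2_{\mathrm{OLS}}(s_0) = \mathrm{MSE} \cdot \left[1 + \mathbf{q}_0^{T} \cdot \left(\mathbf{q}^{T} \cdot \mathbf{q}\right)^{-1} \cdot \mathbf{q}_0\right] \tag{12}$$

where MSE is the mean square (residual) error around the regression line:

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} \left[z(s_i) - \hat{z}(s_i)\right]^2}{n - 2} \tag{13}$$

and $\mathbf{q}_0$ is the vector of predictors at the new, unvisited location. The denominator $n - 2$ holds for a single predictor; with $p$ predictors the residual degrees of freedom are $n - p - 1$. The OLS prediction error reflects the amount of extrapolation in the feature space!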
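The effect of extrapolation in feature space can be seen in a small base-R sketch of Eqs. (12) and (13). It uses one simulated predictor, so that the denominator n - 2 is exact; the function name `pred_var` and all data values are assumptions made for illustration.

```r
set.seed(2)
n  <- 40
q1 <- runif(n, 0, 10)                            # predictor observed between 0 and 10
z  <- 1 + 0.5 * q1 + rnorm(n)

q    <- cbind(1, q1)
beta <- solve(crossprod(q), crossprod(q, z))     # OLS coefficients, Eq. (11)
mse  <- sum((z - q %*% beta)^2) / (n - 2)        # Eq. (13)

# Eq. (12): prediction variance at a new location with predictor vector q0
pred_var <- function(q0) {
  mse * (1 + drop(t(q0) %*% solve(crossprod(q)) %*% q0))
}

v_inside  <- pred_var(c(1, 5))                   # inside the sampled feature space
v_outside <- pred_var(c(1, 30))                  # far outside it: larger variance

# Cross-check against lm(): se.fit^2 + residual variance
fit <- lm(z ~ q1)
p   <- predict(fit, data.frame(q1 = 5), se.fit = TRUE)
v_lm <- p$se.fit^2 + p$residual.scale^2
```

`v_outside` exceeds `v_inside` purely because the new predictor value lies far from the sampled range, regardless of where the location sits geographically.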


Adjusted R-square

The sum of squares of residuals (SSE) can be used to determine the adjusted coefficient of multiple determination ($R^2_a$), which describes the goodness of fit:

$$R^2_a = 1 - \left(\frac{n-1}{n-p}\right) \cdot \frac{\mathrm{SSE}}{\mathrm{SSTO}} = 1 - \left(\frac{n-1}{n-p}\right) \cdot \left(1 - R^2\right) \tag{14}$$

where SSTO is the total sum of squares, $R^2$ indicates the amount of variance explained by the model, and $R^2_a$ adjusts for the number of variables ($p$) used. For many environmental mapping projects, an $R^2_a \geq 0.85$ is already a very satisfactory solution, and higher values will typically only mean over-fitting of the data.
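Eq. (14) can be reproduced by hand and compared with the value reported by `summary.lm()`. One assumption is needed for the two to agree: `p` has to count all estimated coefficients, including the intercept. The data below are simulated for illustration only.

```r
set.seed(7)
n  <- 60
q1 <- rnorm(n)
q2 <- rnorm(n)
z  <- 1 + 2 * q1 + 0.5 * q2 + rnorm(n)
fit <- lm(z ~ q1 + q2)

sse  <- sum(residuals(fit)^2)                  # sum of squares of residuals
ssto <- sum((z - mean(z))^2)                   # total sum of squares
p    <- length(coef(fit))                      # assumption: p includes the intercept (3 here)

r2  <- 1 - sse / ssto
r2a <- 1 - ((n - 1) / (n - p)) * (sse / ssto)  # Eq. (14)
```

If `p` is read as the number of predictors only, the denominator becomes n - p - 1, which gives the same number.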


Comparison of spatial prediction techniques

[Figure: four panels, zinc.id, zinc.tr, zinc.lm and zinc.ok, sharing a scale from 0 to 1500.]


Universal model of spatial variation

From the statistical perspective, an environmental variable can be viewed as an information signal consisting of three components:

$$Z(s) = Z^{*}(s) + \varepsilon'(s) + \varepsilon'' \tag{15}$$

where $Z^{*}(s)$ is the deterministic component, $\varepsilon'(s)$ is the spatially correlated random component and $\varepsilon''$ is the pure noise, usually the result of measurement error.


Best Linear Unbiased Predictor

Matheron (1969) proposed that the value of a target variable at some location can be modelled as a sum of the deterministic and stochastic components:

$$\hat{z}(s_0) = \hat{m}(s_0) + \hat{e}(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \cdot q_k(s_0) + \sum_{i=1}^{n} \lambda_i \cdot e(s_i) \tag{16}$$

where $\hat{m}(s_0)$ is the fitted deterministic part, $\hat{e}(s_0)$ is the interpolated residual, $\hat{\beta}_k$ are the estimated deterministic model coefficients ($\hat{\beta}_0$ is the estimated intercept), $\lambda_i$ are the kriging weights determined by the spatial dependence structure of the residual, and $e(s_i)$ is the residual at location $s_i$.


BLUP (2)

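The regression coefficients $\hat{\beta}_k$ can be estimated from the sample by some fitting method, e.g. ordinary least squares (OLS) or, optimally, using Generalized Least Squares:

$$\hat{\boldsymbol{\beta}}_{\mathrm{GLS}} = \left(\mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{q}\right)^{-1} \cdot \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{z} \tag{17}$$

where $\hat{\boldsymbol{\beta}}_{\mathrm{GLS}}$ is the vector of estimated regression coefficients, $\mathbf{C}$ is the covariance matrix of the residuals, $\mathbf{q}$ is the matrix of predictors at the sampling locations and $\mathbf{z}$ is the vector of measured values of the target variable.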
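A minimal base-R sketch of Eq. (17) follows. The sampling locations, the predictor and the exponential covariance function `cov_exp` (with its nugget, partial sill and range parameter) are all assumptions chosen for illustration; in a real analysis the covariance would come from a variogram fitted to the residuals.

```r
set.seed(3)
n  <- 30
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))    # hypothetical sampling locations
q  <- cbind(1, runif(n))                           # intercept + one simulated predictor
z  <- drop(q %*% c(10, 4)) + rnorm(n)              # simulated target variable

# Assumed residual covariance: exponential model, nugget 0.2, partial sill 0.8
cov_exp <- function(h, C0 = 0.2, C1 = 0.8, a = 20) {
  ifelse(h == 0, C0 + C1, C1 * exp(-h / a))
}
C    <- cov_exp(as.matrix(dist(xy)))               # n x n covariance matrix of residuals
Cinv <- solve(C)

beta_gls <- solve(t(q) %*% Cinv %*% q, t(q) %*% Cinv %*% z)   # Eq. (17)
beta_ols <- solve(t(q) %*% q, t(q) %*% z)                     # Eq. (11), for comparison

# With uncorrelated residuals (C = identity) GLS gives the OLS solution
I_n      <- diag(n)
beta_iid <- solve(t(q) %*% I_n %*% q, t(q) %*% I_n %*% z)
```

The difference between `beta_gls` and `beta_ols` grows with the strength of the spatial correlation and with the degree of clustering of the samples.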


BLUP (3)

In matrix notation, regression-kriging is commonly written as (Christensen, 2001):

$$\hat{z}_{\mathrm{RK}}(s_0) = \mathbf{q}_0^{T} \cdot \hat{\boldsymbol{\beta}}_{\mathrm{GLS}} + \boldsymbol{\lambda}_0^{T} \cdot \left(\mathbf{z} - \mathbf{q} \cdot \hat{\boldsymbol{\beta}}_{\mathrm{GLS}}\right) \tag{18}$$

where $\hat{z}_{\mathrm{RK}}(s_0)$ is the predicted value at location $s_0$, $\mathbf{q}_0$ is the vector of $p + 1$ predictors and $\boldsymbol{\lambda}_0$ is the vector of $n$ kriging weights used to interpolate the residuals.

The estimation of the residuals is an iterative process: first the deterministic part of variation is estimated using ordinary least squares (OLS), then the covariance function of the residuals is used to obtain the GLS coefficients. The most reliable way to estimate the model coefficients is restricted maximum likelihood (REML).


RK variance

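The RK prediction variance reflects the position of new locations (extrapolation) in both geographical and feature space:

$$\hat{\sigma}^2_{\mathrm{RK}}(s_0) = (C_0 + C_1) - \mathbf{c}_0^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 + \left(\mathbf{q}_0 - \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0\right)^{T} \cdot \left(\mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{q}\right)^{-1} \cdot \left(\mathbf{q}_0 - \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0\right) \tag{19}$$

where $C_0 + C_1$ is the sill variation and $\mathbf{c}_0$ is the vector of covariances of residuals at the unvisited location.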
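Eqs. (17) to (19) can be put together in a few lines of base R. The sketch below is one possible reading, under stated assumptions: the residual weights are taken as C^-1 c0 (consistent with the terms of Eq. 19), the covariance model `cov_exp` and its parameters are invented, and the data are simulated. The function name `rk_predict` is hypothetical.

```r
set.seed(4)
n  <- 40
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))   # hypothetical sampling locations
q  <- cbind(1, xy[, 1] / 100)                     # intercept + one predictor (scaled x)
z  <- drop(q %*% c(5, 3)) + rnorm(n, sd = 0.5)

C0 <- 0.1; C1 <- 0.4; a <- 25                     # assumed nugget, partial sill, range parameter
cov_exp <- function(h) ifelse(h == 0, C0 + C1, C1 * exp(-h / a))

rk_predict <- function(xy0, q0) {
  C    <- cov_exp(as.matrix(dist(xy)))
  c0   <- cov_exp(sqrt((xy[, 1] - xy0[1])^2 + (xy[, 2] - xy0[2])^2))
  Cinv <- solve(C)
  A    <- solve(t(q) %*% Cinv %*% q)              # (q' C^-1 q)^-1
  beta <- A %*% t(q) %*% Cinv %*% z               # Eq. (17)
  lam  <- Cinv %*% c0                             # weights for the residuals
  pred <- sum(q0 * beta) + sum(lam * (z - q %*% beta))          # Eq. (18)
  d    <- q0 - t(q) %*% lam                       # q0 - q' C^-1 c0
  v    <- (C0 + C1) - sum(c0 * lam) + drop(t(d) %*% A %*% d)    # Eq. (19)
  c(pred = pred, var = v)
}

near <- rk_predict(c(50, 50), c(1, 0.5))          # inside the sampled area and feature space
far  <- rk_predict(c(500, 500), c(1, 5))          # extrapolation in both spaces
```

Comparing `near["var"]` with `far["var"]` shows both effects named in the text: far from the samples the kriging term vanishes, and the unusual predictor value inflates the trend term.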


RK and MLR

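If the residuals show no spatial auto-correlation (pure nugget effect), regression-kriging converges to pure multiple linear regression, because $\mathbf{C}$ becomes a multiple of the identity matrix:

$$\mathbf{C} = \begin{bmatrix} C_0 + C_1 & \cdots & 0 \\ \vdots & C_0 + C_1 & 0 \\ 0 & 0 & C_0 + C_1 \end{bmatrix} = (C_0 + C_1) \cdot \mathbf{I} \tag{20}$$

so the kriging weights predict the mean residual, i.e. a value of 0, at any location. Similarly, the regression-kriging variance reduces to the multiple linear regression variance:

$$\hat{\sigma}^2_{\mathrm{RK}}(s_0) = (C_0 + C_1) - 0 + \mathbf{q}_0^{T} \cdot \left(\mathbf{q}^{T} \cdot \frac{1}{(C_0 + C_1)} \cdot \mathbf{q}\right)^{-1} \cdot \mathbf{q}_0$$

and, since the sill $(C_0 + C_1)$ then equals the MSE:

$$\hat{\sigma}^2_{\mathrm{RK}}(s_0) = \hat{\sigma}^2_{\mathrm{OLS}}(s_0) = \mathrm{MSE} \cdot \left[1 + \mathbf{q}_0^{T} \cdot \left(\mathbf{q}^{T} \cdot \mathbf{q}\right)^{-1} \cdot \mathbf{q}_0\right] \tag{21}$$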
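The reduction can be verified numerically. In the base-R sketch below the covariance matrix is set to a multiple of the identity and the covariances with the new location to zero, then Eq. (19) is evaluated and compared with Eq. (21). The sill value and the simulated data are arbitrary assumptions.

```r
set.seed(5)
n  <- 25
q  <- cbind(1, runif(n), runif(n))          # intercept + two simulated predictors
z  <- drop(q %*% c(1, 2, -1)) + rnorm(n)
q0 <- c(1, 0.3, 0.7)                        # predictors at the new location

sill <- 1.7                                 # assumed C0 + C1 (any positive value works)
C    <- sill * diag(n)                      # Eq. (20): pure nugget effect
c0   <- rep(0, n)                           # new location uncorrelated with all samples
Cinv <- solve(C)

A <- solve(t(q) %*% Cinv %*% q)
d <- q0 - t(q) %*% Cinv %*% c0
var_rk  <- sill - sum(c0 * (Cinv %*% c0)) + drop(t(d) %*% A %*% d)   # Eq. (19)
var_mlr <- sill * (1 + drop(t(q0) %*% solve(t(q) %*% q) %*% q0))     # Eq. (21), MSE = sill

beta_gls <- A %*% t(q) %*% Cinv %*% z       # GLS coefficients, Eq. (17)
beta_ols <- solve(t(q) %*% q, t(q) %*% z)   # OLS coefficients, Eq. (11)
```

Both the variances and the coefficient vectors coincide, so in this special case nothing is gained over ordinary multiple linear regression.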


RK and OK

If the target variable shows no correlation with the auxiliary predictors, the regression-kriging model reduces to the ordinary kriging model, because the deterministic part equals the (global) mean value.


RK and KED/UK

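In the case of KED/UK, the extended covariance matrix of residuals is used, which looks like this:

$$\mathbf{C}^{\mathrm{KED}} = \begin{bmatrix}
C(s_1, s_1) & \cdots & C(s_1, s_n) & 1 & q_1(s_1) & \cdots & q_p(s_1) \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
C(s_n, s_1) & \cdots & C(s_n, s_n) & 1 & q_1(s_n) & \cdots & q_p(s_n) \\
1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
q_1(s_1) & \cdots & q_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & 0 & \vdots & & \vdots \\
q_p(s_1) & \cdots & q_p(s_n) & 0 & 0 & \cdots & 0
\end{bmatrix} \tag{22}$$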


RK and KED/UK (2)

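The KED/UK weights are solved using the extended matrices:

$$\boldsymbol{\lambda}_0^{\mathrm{KED}} = \left\{w_1^{\mathrm{KED}}(s_0), \ldots, w_n^{\mathrm{KED}}(s_0), \varphi_0(s_0), \ldots, \varphi_p(s_0)\right\}^{T} = \left(\mathbf{C}^{\mathrm{KED}}\right)^{-1} \cdot \mathbf{c}_0^{\mathrm{KED}} \tag{23}$$

where $\boldsymbol{\lambda}_0^{\mathrm{KED}}$ is the vector of solved weights, $\varphi_p$ are the Lagrange multipliers, $\mathbf{C}^{\mathrm{KED}}$ is the extended covariance matrix of residuals and $\mathbf{c}_0^{\mathrm{KED}}$ is the extended vector of covariances at the new location.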


RK and KED/UK (3)

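The predictions at new locations are made by:

$$\hat{z}_{\mathrm{KED}}(s_0) = \sum_{i=1}^{n} w_i^{\mathrm{KED}}(s_0) \cdot z(s_i) = \boldsymbol{\delta}_0^{T} \cdot \mathbf{z} \tag{24}$$

for:

$$\sum_{i=1}^{n} w_i^{\mathrm{KED}}(s_0) \cdot q_k(s_i) = q_k(s_0); \quad k = 1, \ldots, p \tag{25}$$

where $\boldsymbol{\delta}_0$ is the vector of KED/UK weights ($w_i^{\mathrm{KED}}$).

Hence, KED/UK looks exactly like ordinary kriging, except that the covariance matrix is extended with the values of the auxiliary predictors!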
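When the same covariance function is used, the KED/UK prediction from the extended system and the RK prediction of Eq. (18) give the same number. The base-R sketch below checks this on simulated data; the covariance model `cov_exp`, the predictor and the target location are assumptions made for illustration.

```r
set.seed(6)
n  <- 30
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))   # hypothetical sampling locations
Q  <- cbind(1, xy[, 1] / 100)                     # columns: 1 and q1(s_i)
z  <- drop(Q %*% c(2, 6)) + rnorm(n, sd = 0.5)
cov_exp <- function(h) ifelse(h == 0, 0.5, 0.4 * exp(-h / 30))   # assumed covariance model

xy0 <- c(40, 60)                                  # new location
q0  <- c(1, 0.4)                                  # its predictor values
C   <- cov_exp(as.matrix(dist(xy)))
c0  <- cov_exp(sqrt((xy[, 1] - xy0[1])^2 + (xy[, 2] - xy0[2])^2))

# KED/UK: extended matrix (Eq. 22), extended right-hand side and weights (Eq. 23)
p1    <- ncol(Q)
C_ked <- rbind(cbind(C, Q), cbind(t(Q), matrix(0, p1, p1)))
c_ked <- c(c0, q0)
lam   <- solve(C_ked, c_ked)
w     <- lam[1:n]                                 # the remaining entries are Lagrange multipliers
z_ked <- sum(w * z)                               # Eq. (24)

# RK: GLS trend (Eq. 17) plus interpolated residuals (Eq. 18)
Cinv <- solve(C)
beta <- solve(t(Q) %*% Cinv %*% Q, t(Q) %*% Cinv %*% z)
z_rk <- sum(q0 * beta) + sum(c0 * (Cinv %*% (z - Q %*% beta)))
```

The weights `w` also satisfy the constraints of Eq. (25): they sum to one and reproduce the predictor value at the new location.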


The name confusion

Matheron (1969) originally termed the technique Le krigeage universel (universal kriging); however, the technique was intended as a generalized case of kriging where the trend is modelled as a function of coordinates only.

If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than the coordinates, the term Kriging with External Drift (KED) is preferred.

The drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed and de Marsily (1987) and Odeh et al. (1995), who later named it Regression-kriging.

Minasny and McBratney (2007) suggest that a mathematically accurate term should be used instead to name the technique: EBLUP (empirical best linear unbiased predictor).


Why do I prefer the term RK?

1. RK explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily complex forms of regression rather than only simple linear techniques

2. RK allows the separate interpretation of the two interpolated components

3. The emphasis on regression is also important because fitting the deterministic part of variation (regression) is often more beneficial for the quality of the final maps than fitting the stochastic part (residuals)

4. The KED (extended) matrix is unstable when the covariate does not vary smoothly in space


Decision tree

[Figure: decision tree for choosing a spatial prediction technique. Each question is answered YES or NO:

- Is the physical model known?
- Is the variable correlated with environmental factors?
- Do the residuals show spatial auto-correlation? (asked on two branches)
- Does the variable show spatial auto-correlation?
- Can a variogram with >1 parameters be fitted?

Possible outcomes: DETERMINISTIC MODEL, REGRESSION-KRIGING (calibration), REGRESSION-KRIGING (GLS), ENVIRONMENTAL CORRELATION (OLS), ORDINARY KRIGING, INVERSE DISTANCE INTERPOLATION, NO PREDICTIONS POSSIBLE.]


Gstat

inverse distance interpolation:

ev.id = krige(ev~1, locations=points, newdata=mapgrid)

correlation with coordinates (2nd order polynomial model):

# I() keeps the products and squares as arithmetic; in a formula x*x is just x
ev.ts = krige(ev~x+y+I(x*y)+I(x^2)+I(y^2), locations=points, newdata=mapgrid)

moving window (with coordinates):

ev.mv = krige(ev~x+y+I(x*y)+I(x^2)+I(y^2), locations=points, newdata=mapgrid, nmax=20)


Gstat (2)

ordinary kriging:

ev.ok = krige(ev~1, locations=points, newdata=mapgrid, model=vgm(psill=5, "Exp", range=1000, nugget=1))

environmental correlation (OLS):

ev.ec = krige(ev~q1+q2, locations=points, newdata=mapgrid)

regression-kriging (universal kriging):

ev.rk = krige(ev~q1+q2, locations=points, newdata=mapgrid, model=vgm(psill=3, "Exp", range=500, nugget=0))


R syntax


Space-time data

Universal kriging model for spatio-temporal data (Heuvelink & Griffith, 2010):

$$T(s, t) = m(s, t) + \varepsilon(s, t) \tag{26}$$

where $m(s, t)$ is the deterministic part of the variation (i.e. a linear function of the auxiliary variables) and $\varepsilon(s, t)$ is the residual for every $(s, t)$.


http://dx.doi.org/10.1111/j.1538-4632.2010.00788.x


Space-time cube


Space-time workshop (Munster)


Space-time semivariance

$$\gamma(s_i, t_i; s_j, t_j) = 0.5 \cdot E\left[\left(\varepsilon(s_i, t_i) - \varepsilon(s_j, t_j)\right)^2\right] \tag{27}$$


Residuals

Residuals ($\varepsilon$) consist of three stationary and independent components (Heuvelink & Griffith, 2010):

$$\varepsilon(s, t) = \varepsilon_s(s) + \varepsilon_t(t) + \varepsilon_{s,t}(s, t) \tag{28}$$

where $\varepsilon_s(s)$ is a purely spatial process (with constant realizations over time), $\varepsilon_t(t)$ is a purely temporal process, and $\varepsilon_{s,t}(s, t)$ is a space-time process for which distance in space is made comparable to distance in time by introducing a space-time anisotropy ratio.


http://dx.doi.org/10.1111/j.1538-4632.2010.00788.x

Zonal anisotropies

The covariance structure can be represented by (Snepvangers et al., 2003):

$$C(h, u) = C_s(h) + C_t(u) + C_{s,t}\left(\sqrt{h^2 + (\alpha \cdot u)^2}\right) \tag{29}$$

where $C(h, u)$ is the covariance at distance $h$ in space and time-distance $u$, $C_s(h) + C_t(u)$ allow the presence of zonal anisotropies (different variogram sills in different directions), and $C_{s,t}\left(\sqrt{h^2 + (\alpha \cdot u)^2}\right)$ allows the presence of geometric anisotropy represented with the ratio $\alpha$, which rescales time-distance to space-distance.
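The three-component covariance can be written as a small base-R function. This is a sketch under assumptions: each component is given an exponential form, the sills and range parameters are invented, and the ratio alpha is applied as a multiplier of the time lag, which is how an anisotropy ratio converts time units into distance units. The function names are hypothetical.

```r
# Exponential covariance with a sill and a range parameter (assumed model form)
cov_exp <- function(d, sill, range) sill * exp(-d / range)

# Space-time covariance with purely spatial, purely temporal and joint components.
# h: distance in space (m), u: distance in time (days), alpha: metres per day.
cov_st <- function(h, u, alpha,
                   sp = c(sill = 2, range = 50000),    # purely spatial component
                   tm = c(sill = 3, range = 5),        # purely temporal component
                   st = c(sill = 4, range = 30000)) {  # joint space-time component
  cov_exp(h, sp[["sill"]], sp[["range"]]) +
    cov_exp(abs(u), tm[["sill"]], tm[["range"]]) +
    cov_exp(sqrt(h^2 + (alpha * u)^2), st[["sill"]], st[["range"]])
}

alpha   <- 5000                       # assumption: one day "counts" as 5 km
c_zero  <- cov_st(0, 0, alpha)        # total sill: 2 + 3 + 4
c_space <- cov_st(c(10000, 100000), 0, alpha)
c_time  <- cov_st(0, c(1, 10), alpha)
```

Because the spatial and temporal components have their own sills, the marginal variograms in space and in time can level off at different values, which is the zonal anisotropy mentioned above.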


http://dx.doi.org/10.1016/S0016-7061(02)00310-5


The data set


http://spatial-analyst.net/book/HRclim2008

In space-time cube

[Figure: the data set shown in a space-time cube with axes X, Y and cdays.]


Variograms (separately)

[Figure: two variogram panels. Panel "365 days": semivariance (0 to 25) against distance in metres (0 to 200000). Panel "159 stations": semivariance (0 to 50) against distance in days (0 to 60).]


Variograms (zonal anisotropy)

[Figure: two panels of semivariance (2 to 10), one against distance in metres (50000 to 200000) and one against distance in days (5 to 15).]

Marginal experimental variograms for residuals and fitted models: (left) space-domain only, (right) time-domain only.


Final results


http://spatial-analyst.net/book/node/467

Some experiences

- By adding the time component we are better off.

- Automation of space-time regression-kriging (overlay, regression modeling, variogram fitting, predictions, visualization in Google Earth) is anticipated.

- Fitting and visualization of space-time variograms is a bottleneck!

- Predictions need to be visualized as animations.

- We have ignored the one-way auto-correlation (time works only one way)?


Universal space-time reference

Each observation should have by default:

- Longitude and latitude (WGS84), or projected X, Y coordinates + proj4 string;

- Begin / end of the time interval in the UTC (GMT) system;

- Support size (in square meters);

- Uncertainty or measurement error.
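One way to picture such a record is a data frame with one row per observation. The column names and all values below are hypothetical; they only illustrate the four items in the list and are not a proposed standard.

```r
# One observation with a full space-time reference (all values are made up)
obs <- data.frame(
  lon        = 15.97,                                        # WGS84 longitude (degrees)
  lat        = 45.81,                                        # WGS84 latitude (degrees)
  t_begin    = as.POSIXct("2008-01-01 00:00:00", tz = "UTC"), # start of the time interval
  t_end      = as.POSIXct("2008-01-02 00:00:00", tz = "UTC"), # end of the time interval
  support_m2 = 1,                                            # support size in square meters
  value      = 3.2,                                          # measured value
  error_sd   = 0.5                                           # measurement error (standard deviation)
)

# Length of the time interval in hours
interval_h <- as.numeric(difftime(obs$t_end, obs$t_begin, units = "hours"))
```

Storing the begin and end of the interval, rather than a single time stamp, keeps the temporal support explicit in the same way that `support_m2` does for space.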


Space-time algebra re-visited

Should we (re)define and (re)implement space-time (4D) algebra?


What does this mean?

- Distances always on a sphere (sphere geometry);

- Always use information about uncertainty (weighted regression);

- Always use information about the support size (nugget estimation, cross-validation);

- Re-implement also any raster processing (geomorphometry, resampling, filtering etc.);

- Use Google Earth to visualize any type of geographic data.
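For the first point, distances on a sphere can be computed from longitude and latitude with the haversine formula. The base-R sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name `haversine_km` is hypothetical.

```r
# Great-circle distance in km between points given in decimal degrees.
# R is the assumed mean Earth radius (km); the Earth is treated as a sphere.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))     # pmin guards against rounding slightly above 1
}

d_one_degree <- haversine_km(0, 0, 0, 1)      # one degree of latitude
d_half_globe <- haversine_km(0, 0, 180, 0)    # half-way around the equator
```

Such distances could replace the Euclidean distances used when building variograms from unprojected coordinates, at the cost of ignoring the flattening of the Earth.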


Global Multiscale Nested RK

Global models will soon replace local (isolated) modeling. One such approach is the nested RK model:

$$z(s_B) = m_0(s_{B-k}) + e_1(s_{B-k} \mid s_{B-(k+1)}) + \ldots + e_k(s_{B-2} \mid s_{B-1}) + \varepsilon(s_B) \tag{30}$$

where $z(s_B)$ is the value of the target variable estimated at ground scale ($B$), $B-1, \ldots, B-k$ are the higher order components, $e_k(s_{B-k} \mid s_{B-(k+1)})$ is the residual variation from scale $s_{B-(k+1)}$ to a higher resolution scale $s_{B-k}$, and $\varepsilon$ is the spatially auto-correlated residual soil variation (dealt with by ordinary kriging).


Multi-scale concept

Scales shown in the figure (ordered by extent):

- 5.6 km: Global (climatic patterns and processes, vegetation zones, elevation)

- 1 km: Continental (geological zones, meso-climatic conditions, erosion/deposition at large scales)

- 250 m: Regional (general land use, vegetation cover)

- 100 m: Local (land management, erosion/deposition at watershed level)


Multi-resolution signal (McBratney, 1998)


http://dx.doi.org/10.1023/A:1009778500412

Some thoughts

- Global models (global multiscale predictions) are here now.

- It is very probable that, in the near future, any geostatistical analysis will be global.

- We probably need to re-write the geostatistical algorithms so they work with sphere geometry (3D + time).

- There is an enormous amount of publicly available RS and GIS data waiting to be used for geostatistical mapping: use it!


Limits of geostatistics in R

- Sampling optimization algorithms for geostatistical modelling are missing

- Local regression-kriging still waits to be implemented

- In gstat, only linear models can be implemented; geoR can also incorporate non-linear models, but it can only work with $\ll 10^3$ points

- The KED algorithm can be rather slow and can lead to instabilities (singularity problem)

- Interactive visualization and fitting of 3D space-time variograms is missing


Nothing can save bad data!

- Even the most sophisticated geostatistical tools will not be able to save data sets of poor quality! If you want to produce quality outputs (maps/reports), make sure your input field data satisfies some minimum requirements:

  1. it is large enough
  2. it is representative
  3. it is independent
  4. it is produced using a consistent methodology
  5. it is sufficiently precise

- Geostatistical mapping using inconsistent point samples is possible, but do you really need this?

