A Practical Guide to Geostatistical Mapping
Tomislav Hengl
ISRIC — World Soil Information, Wageningen University
GEOSTAT course, 11-17 April 2011, Canberra
Topics
* spatio-temporal data: elements, aspects, formats
* data import (GDAL) and visual exploration
* geographic data, maps, cartographic projection systems (proj4)
* Google Earth: the final GIS?
* spatio-temporal statistics, basics:
  1. spatial prediction / automated mapping
  2. kriging, regression, regression-kriging
  3. some applications
Today, everybody is a spatial analyst!
* We have the tools that allow GIS + statistics integration
* There is more and more auxiliary data:
  1. MODIS (global coverage, 250 m, every 2 days, 36 bands)
  2. Meteorological images (e.g. SEVIRI; 1 km, every 15 min, 12 bands)
  3. SRTM DEM, GDEM, LiDAR (topography, 30–100 m)
* We can automate data analysis ("Get results sooner, with more accuracy... and retire sooner", Chih Jeng Kenneth Tan)
* Google Earth has registered more than 350 million downloads!
GIS analysis for all
“From a period in which geographic information systems, and later geocomputation and geographical information science, have been agenda setters, there seems to be interest in trying things out, in expressing ideas in code, and in encouraging others to apply the coded functions in teaching and applied research settings.”
Roger Bivand
The missing link
* Our projects typically depend on both statistical and GIS analysis
* Some believe that this could all be done within R
* Others believe that this could all be done within commercial packages (ArcGIS)
* ... and the winner is:
  1. R: scripting, statistical computing
  2. SAGA/GRASS: GIS data input and geographical analysis
  3. Google Earth: storage, sharing, browsing
Basic concepts
* Models: statistical model (conceptual); data models (formats); model parameters;
* Methods (functions): implemented as algorithms; inputs, outputs, arguments;
* Data: variables (target variables, auxiliary variables or predictors); metadata; geoinformation;
* Applications: field-specific; result interpretation; associated uncertainty;
What is spatio-temporal statistics about?
Spatio-temporal statistics: statistical techniques adjusted to handle spatio-temporal data.
Geostatistics is a subset of statistics specialized in the analysis and interpretation of geographically (and temporally) referenced data.
Geostatistics is an analytical tool for statistical analysis of sampled field data.
The bottom line is: you collect (spatio-temporal) data and you need tools that can help you answer field-specific questions (i.e. that can help you produce outputs of interest: maps, predictions, statistical measures).
Geostatistics — topics
Typical questions of interest to a geostatistician are:
* How does a variable vary in space?
* What controls its variation in space?
* Where should samples be located to describe its spatial variability?
* How many samples are needed to represent its spatial variability?
* What is the value of a variable at some new location?
* What is the uncertainty of the estimate?
Analysis objectives
For Diggle and Ribeiro (2007) there are three scientific objectives of geostatistics:
1. model estimation, i.e. inference about the model parameters;
2. prediction, i.e. inference about the unobserved values of the target variable;
3. hypothesis testing.
Environmental variables
Quantitative or descriptive measures of different environmental features.
* biology (distribution of species and biodiversity measures)
* soil science (soil properties and types)
* vegetation science (plant species and communities, land cover types)
* climatology (climatic variables at the surface and beneath/above it)
* hydrology (water quantities and conditions)
Example
Variables
* Name, definition
* Feature of interest
* Measurement units
* Representation, data model, domain
* Spatio-temporal pattern
* Application, decision making process, data interpretation
Example: pH
* environmental feature: acidity in soil
* variable of interest: pH
* units: dimensionless (negative decimal logarithm of the H+ ion concentration in the soil solution)
* sampling technique: pH meter (field or laboratory); soil solution
* targeted output: a map of continuous pH values (continuous field)
* interpretation: values of pH define properties of soil (acid, neutral, alkaline soils)
Spatial variability
Commonly a result of complex processes working at the same time and over long periods of time, rather than an effect of a single realization of a single factor.
Sum of two components: (a) the natural spatial variation and (b) the inherent noise.
* Geographical variation (2D)
* Vertical variation (3D)
* Temporal variation
* Variation at different scales (support size)
A way to classify variables
1. SRV — short-range variability
2. TV — temporal variability
3. VV — vertical variability
4. SSD — standard sampling density
5. DRS — remote-sensing detectability
Other important issues: (6) sampling costs, (7) global or local coverage, (8) relationship with other variables, (9) scalability
What you need to know!
Each 2D map of an environmental variable should always indicate a time reference (interval), the applicable vertical dimension [1] and the sample (support) size, i.e. the effective scale.
It is also important to know: the approximate geographical coordinates of the study area (gravity point), the borders of the area of interest (mask), the coordinate system (proj4 string), and who made the map and how.
[1] Orthogonal distance from the land surface.
Look again at these maps
Nature of variables
From a metaphysical perspective, what we are most often mapping in geostatistics are, in fact, quantities of molecules of a certain kind or quantities of energy.
Many variables directly refer to processes and are expressed in quantity per time unit, e.g. mm of rainfall per year.
In ecology: objects of interest (individual plants or animals) are often immeasurable in quantity; animal species change their location dynamically, often in unpredictable directions and with unpredictable spatial patterns (non-linear trajectories); occurrence records are 0/1 observations; these are modeled using statistical probability theory.
Data models/formats
Data format is the way we define the structure and elements of a record of some variable/feature.
Data format dictates many things: the way we edit (reading/writing), search, compute, transform or scale data.
Data formats are software-specific: everybody has a different idea about how to represent data digitally.
Data formats in R
R classes: what type of object is it?
* numeric: array of numbers (vector/matrix);
* data frame: the fundamental structure for statistical analysis; a matrix with named columns (roughly, database fields) and (optionally) named rows (roughly, database cases);
* models/formulas: complex hierarchical structures (a set of lists, vectors, data frames).
Common classes in R: numeric, character (strings), integer, factor, date-time (POSIXct/POSIXlt) etc.
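The class of any R object can be checked with class(). The following is a minimal base-R sketch; the object names and values are illustrative and do not come from the course material.

```r
# Illustrative objects (hypothetical names and values)
x  <- c(1.5, 2.5, 3.5)                               # numeric vector
n  <- 3L                                             # integer
s  <- "soil"                                         # character (string)
f  <- factor(c("acid", "neutral", "acid"))           # factor
t0 <- as.POSIXct("2011-04-14 03:15:00", tz = "GMT")  # date-time
df <- data.frame(id = 1:3, pH = x, type = f)         # data frame

classes <- c(class(x), class(n), class(s), class(f), class(df))
print(classes)
print(class(t0))  # date-times carry two classes: "POSIXct" "POSIXt"
str(df)           # compact view of the structure of a data frame
```

str() is the same function that produces listings such as the SpatialGridDataFrame structure on the following slides.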
Spatial data in R
R has special classes for spatial data: spatial points, point patterns, pixels, lines, grids, CRS etc.; many existing spatial statistics packages work with these classes;
http://www.r-project.org/Rgeo/
Pebesma, E.J., Bivand, R.S., 2005. Classes and methods for spatial data in R. R News 5/2, 9–13.
Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V., 2008. Applied Spatial Data Analysis with R. UseR! Series, Springer, 378 pp.
Example of gridded data in R
Formal class 'SpatialGridDataFrame' [package "sp"] with 6 slots
..@ data :'data.frame': 5530 obs. of 1 variables:
.. ..$ lgn3: int [1:5530] 1 1 1 1 1 1 1 22 22 22 ...
..@ grid :Formal class 'GridTopology' [package "sp"] with 3 slots
.. .. ..@ cellcentre.offset: Named num [1:2] 190163 314013
.. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"
.. .. ..@ cellsize : num [1:2] 25 25
.. .. ..@ cells.dim : int [1:2] 70 79
..@ grid.index : int(0)
..@ coords : num [1:2, 1:2] 190163 191888 314013 315963
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..@ bbox : num [1:2, 1:2] 190150 314000 191900 315975
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "x" "y"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
.. .. ..@ projargs: chr NA
Compare with ArcInfo ASCII
ncols 70
nrows 79
xllcorner 190150
yllcorner 314000
cellsize 25.00
nodata_value 0
1 1 1 1 1 1 1 22 22 22 22 22 22 22 22 22 1 1
1 17 17 17 17 24 17 17 17 17 17 17 17
17 17 17 24 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 4 5 5 5 5 5 5 5 5 6 6 6 2 2 2 2 2
1 1 1 1 1 22 22 22 22 22 22 22 22 22 22 22
1 1 1 17 17 17 24 24 24 17 17 17 17 17 17
17 17 17 24 24 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 ...
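The ASCII header above carries the same geometry as the grid and bbox slots of the SpatialGridDataFrame listing. As a rough check, the base-R sketch below recomputes the cell-centre offset, the bounding box and the number of cells from the header values; the variable names are illustrative only.

```r
# Header values copied from the ArcInfo ASCII example
ncols <- 70; nrows <- 79
xllcorner <- 190150; yllcorner <- 314000
cellsize <- 25

# Centre of the lower-left cell (the cellcentre.offset of a GridTopology)
cellcentre_offset <- c(x = xllcorner + cellsize / 2,
                       y = yllcorner + cellsize / 2)

# Bounding box: lower-left corner plus number of cells times cell size
bbox <- rbind(x = c(min = xllcorner, max = xllcorner + ncols * cellsize),
              y = c(min = yllcorner, max = yllcorner + nrows * cellsize))

n_cells <- ncols * nrows   # should match the number of rows in the data slot
print(cellcentre_offset)
print(bbox)
print(n_cells)
```

The offsets come out at half a cell from the corner, which the str() listing displays rounded to whole metres.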
Compare with Idrisi raster I
file title : Land cover types from LGN3
data type : byte
file type : binary
columns : 70
rows : 79
ref. system : epsg:28992
ref. units : m
unit dist. : 1.0000000
min. X : 190150.00000
max. X : 191900.00000
min. Y : 314000.00000
max. Y : 315975.00000
pos'n error : unknown
resolution : 25.000000
min. value : 0
max. value : 39
value units : meter
value error : unknown
flag value : 0
flag def'n : missing data
legend cats : 26
Compare with Idrisi raster II
category 0 :
category 1: Agrarisch gras
category 2: Maïs
category 3: Aardappelen
category 4: Bieten
category 5: Granen
category 6: Overige landbouwgewassen
category 7: Glastuinbouw
category 8: Boomgaarden
category 9: Bloembollen
category 10: Loofbos
category 11: Naaldbos
category 12: Droge heide
category 13: Open begroeid natuurgebied
category 14: Kale grond natuurgebied
category 15: Zoet water
...
Land cover map in NL
[Figure: map of LGN3 land cover classes; easting 190500 to 191500, northing 314500 to 315500; legend classes 1, 2, 3, 4, 5, 6, 8, 10, 15, 17, 18, 19, 20, 21, 22, 24, 25]
What is GDAL/OGR?
Translation of data from one software package to another is now made simple thanks to:
* GDAL [2]: Geospatial Data Abstraction Library
* OGR: OpenGIS Simple Features Reference Implementation
Note: not all software producers support GDAL!
[2] http://www.gdal.org/formats_list.html
PROJ.4
Geographic data always refers to some reference coordinate system.
PROJ.4: Cartographic Projections Library (this allows you to reproject maps to almost any coordinate system); see http://spatialreference.org
Today, it is much easier to move maps from one projection system to another than it was 10 years ago (on-the-fly reprojection). You only need to assign the correct proj4string and then you do not have to worry about it any more. Unless you got it wrong: all parameters need to be absolutely correct!
EPSG
European Petroleum Survey Group (EPSG) database: http://www.epsg-registry.org/
World standard or user-defined coordinate systems; e.g. Amersfoort / RD New (EPSG 28992):
+proj=sterea
+lat_0=52.15616055555555 +lon_0=5.38763888888889
+k=0.999908 +x_0=155000 +y_0=463000
+ellps=bessel
+towgs84=565.237,50.0087,465.658,-0.406857,
0.350733,-1.87035,4.0812
+units=m +no_defs
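A proj4 string is a list of +key=value parameters, so individual values can be checked before the string is assigned to a dataset. The base-R sketch below splits the string above into a named list; it is only a string-handling illustration with hypothetical variable names, not a replacement for the validation done by the PROJ.4 library itself.

```r
# proj4 string for Amersfoort / RD New (EPSG 28992), as listed above
p4s <- paste("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889",
             "+k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel",
             "+towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812",
             "+units=m +no_defs")

# Split on whitespace, drop the leading "+", then separate key from value
tokens <- sub("^\\+", "", strsplit(p4s, "\\s+")[[1]])
keys   <- sub("=.*$", "", tokens)
values <- ifelse(grepl("=", tokens), sub("^[^=]*=", "", tokens), NA)  # flags get NA
params <- setNames(as.list(values), keys)

# The datum-shift parameters as numbers
towgs84 <- as.numeric(strsplit(params$towgs84, ",")[[1]])
print(params$proj)
print(params$ellps)
print(length(towgs84))
```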
WGS84
The only truly global reference model of the Earth is the World Geodetic System (WGS84) ellipsoid:
> library(rgdal)   # provides make_EPSG()
> EPSG <- make_EPSG()
> EPSG[EPSG$note=="# WGS 84",-2]
    code                                             prj4
249 4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
Our current space-time location:
149.11902 E, -35.28028 N; 14 April, 03:15 GMT
Google Earth
Why not always use the longlat system?
* The aspect ratio of X (longitude) and Y (latitude) coordinates is not 1:1
* Grid cell size is not constant; most algorithms in geomorphometry and geostatistics assume a Cartesian system (e.g. derivation of slope, distances etc.)
* You cannot print and use such maps to determine distances, areas and directions (in the traditional cartographic way)
* Map units are abstract: arcseconds, arcminutes, arcdegrees
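The aspect-ratio point can be made concrete with a little arithmetic: one degree of longitude covers cos(latitude) times the ground distance of one degree of latitude. The base-R sketch below assumes a spherical Earth with a mean radius of 6371 km (an assumption of this example, not a value from the slides) and uses the Canberra latitude quoted on the WGS84 slide.

```r
# Assumption: spherical Earth with a mean radius of 6371 km
R_earth <- 6371
deg2rad <- function(d) d * pi / 180

# Ground length (km) of one degree of latitude and of one degree of longitude
km_per_deg_lat <- function() deg2rad(1) * R_earth
km_per_deg_lon <- function(lat) deg2rad(1) * R_earth * cos(deg2rad(lat))

lat <- -35.28028   # latitude quoted for Canberra
aspect <- km_per_deg_lon(lat) / km_per_deg_lat()
print(round(c(lat_km = km_per_deg_lat(),
              lon_km = km_per_deg_lon(lat),
              ratio  = aspect), 3))
```

At this latitude the ratio is roughly 0.82, so a grid cell that is square in degrees is noticeably narrower than it is tall on the ground.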
From Geographic to Projected coordinates
Grid cell size / scale
see also “Finding the right pixel size”
Impact of grid cell size
Resampling
For example in SAGA GIS:
module name : Get Grid Data for Shapes
...
-INTERPOL:<num> Interpolation
Choice
Available Choices:
[0] Nearest Neighbor
[1] Bilinear Interpolation
[2] Inverse Distance Interpolation
[3] Bicubic Spline Interpolation
[4] B-Spline Interpolation
Resampling scheme (nearest neighbor and bilinear)
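The difference between the two schemes can be sketched in a few lines of base R. This is a toy illustration, not the SAGA GIS implementation: the function names are hypothetical, and the grid is assumed to have its cell centres at integer coordinates, with the value at column x and row y stored in g[y, x].

```r
# Toy 2 x 2 grid; cell centres at integer coordinates (assumption of this sketch)
g <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)

# Nearest neighbour: take the value of the closest cell centre
resample_nn <- function(g, x, y) {
  g[round(y), round(x)]
}

# Bilinear: distance-weighted mean of the four surrounding cell centres
resample_bilinear <- function(g, x, y) {
  x0 <- min(floor(x), ncol(g) - 1)
  y0 <- min(floor(y), nrow(g) - 1)
  tx <- x - x0   # fractional position between the two columns
  ty <- y - y0   # fractional position between the two rows
  (1 - tx) * (1 - ty) * g[y0,     x0    ] +
       tx  * (1 - ty) * g[y0,     x0 + 1] +
  (1 - tx) *      ty  * g[y0 + 1, x0    ] +
       tx  *      ty  * g[y0 + 1, x0 + 1]
}

nn  <- resample_nn(g, 1.2, 1.2)        # snaps to the cell centred at (1, 1)
bil <- resample_bilinear(g, 1.5, 1.5)  # midpoint between all four centres
print(c(nearest = nn, bilinear = bil))
```

Nearest neighbour preserves the original values, which suits categorical grids such as land cover classes, whereas bilinear resampling produces new intermediate values and is usually reserved for continuous variables.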
Spatial prediction techniques
Next topics:
* Spatial prediction: basic principles, classification (mechanical / statistical methods)
* Kriging: semivariance, variogram, ordinary kriging, characteristics of kriging, variants of kriging
* Regression: correlation, prediction error, OLS, GLS, GLMs
* Regression-kriging: the generic spatial prediction model
Books on geostatistics
* Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation (Applied Geostatistics). Oxford University Press, New York, 496 pp.
* Webster, R. and Oliver, M.A., 2007. Geostatistics for Environmental Scientists. Statistics in Practice. John Wiley & Sons, Chichester, 330 pp.
* Pebesma, E.J., 2003. Gstat User's Manual. University of Utrecht, Utrecht. www.gstat.org
* Rossiter, D.G., 2008. Spatial Analysis and Geostatistics, lecture notes, ITC.
Geostatistical mapping
“Analytical production of maps by using field observations, auxiliary information and a computer program that calculates values at locations of interest (a study area)”
Spatial prediction model
A widely accepted generic spatial prediction model [3]:
$$\hat{z}(s_0) = E\left\{ Z \mid z(s_i),\ q_k(s_0),\ \gamma(h),\ s \in A \right\} \qquad (1)$$
where $z(s_i)$ is the input point dataset, $q_k(s_0)$ is the list of deterministic predictors and $\gamma(h)$ is the covariance model defining the spatial autocorrelation structure.
[3] A spatial prediction model defines the inputs, the outputs and the computational procedure to derive outputs based on the given inputs.
Spatial prediction scheme
Spatial prediction techniques I
1. MECHANICAL (DETERMINISTIC) MODELS: These are models where arbitrary or empirical model parameters are used. No estimate of the model error is available and usually no strict assumptions about the variability of a feature exist. The most common techniques that belong to this group are:
* Thiessen polygons;
* Inverse distance interpolation;
* Regression on coordinates;
* Natural neighbors;
* Splines;
* ...
Spatial prediction techniques II
2. LINEAR STATISTICAL (PROBABILITY) MODELS: In the case of statistical models, the model parameters are commonly estimated in an objective way, following probability theory. The predictions are accompanied by an estimate of the prediction error. A drawback is that the input data set usually needs to satisfy strict statistical assumptions. There are at least four groups of linear statistical models:
* kriging (plain geostatistics);
* environmental correlation (e.g. regression-based);
* Bayesian-based models (e.g. Bayesian Maximum Entropy);
* hybrid models (e.g. regression-kriging);
* ...
Spatial prediction techniques III
3. EXPERT-BASED SYSTEMS: These models can be completely subjective (ergo irreproducible) or completely based on data; predictions are typically different for each run. Expert systems can also largely be based on probability theory (especially Bayesian statistics); however, it is good to put them in a different group because they are conceptually different from standard linear statistical techniques. There are at least three groups of expert-based systems:
* mainly knowledge-driven expert systems (e.g. hand-drawn maps);
* mainly data-driven expert systems (e.g. based on neural networks);
* machine learning algorithms (purely data-driven);
Inverse distance interpolation
A value of the target variable at some new location can be derived as a weighted average:
$$\hat{z}(s_0) = \sum_{i=1}^{n} \lambda_i(s_0) \cdot z(s_i) = \lambda_0^{T} \cdot z \qquad (2)$$
where $\lambda_i$ is the weight for neighbour $i$. The sum of weights needs to equal one to ensure an unbiased interpolator.
The simplest approach for determining the weights is to use the inverse distances from all points to the new point:
$$\lambda_i(s_0) = \frac{1 / d^{\beta}(s_0, s_i)}{\sum_{j=1}^{n} 1 / d^{\beta}(s_0, s_j)}; \qquad \beta > 1 \qquad (3)$$
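Equations (2) and (3) translate almost directly into code. The following base-R sketch uses hypothetical function names and made-up observations; the special case for a prediction location that coincides with a sample is an addition of this example to avoid division by zero.

```r
# Inverse distance weights for one prediction location s0 (Eq. 3)
idw_weights <- function(obs_xy, s0, beta = 2) {
  d <- sqrt((obs_xy[, 1] - s0[1])^2 + (obs_xy[, 2] - s0[2])^2)
  # If s0 coincides with a sample, give that sample all the weight
  if (any(d == 0)) return(as.numeric(d == 0) / sum(d == 0))
  w <- 1 / d^beta
  w / sum(w)   # normalise so that the weights sum to one
}

# Weighted average of the observations (Eq. 2)
idw_predict <- function(obs_xy, z, s0, beta = 2) {
  sum(idw_weights(obs_xy, s0, beta) * z)
}

# Hypothetical observations at the corners of a 10 x 10 square
obs_xy <- cbind(x = c(0, 10, 0, 10), y = c(0, 0, 10, 10))
z      <- c(1, 2, 3, 6)

lambda <- idw_weights(obs_xy, c(5, 5))
z_hat  <- idw_predict(obs_xy, z, c(5, 5))
print(lambda)   # the centre is equally far from all corners: equal weights
print(z_hat)    # hence the prediction equals the plain mean of z
```

Note that the exponent beta is chosen by the user rather than estimated from the data, which is why this interpolator is classed as a mechanical model.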
Kriging
A standard version of kriging is called ordinary kriging (OK). The predictions are based on the model:
$$Z(s) = \mu + \varepsilon'(s) \qquad (4)$$
where $\mu$ is the constant stationary function (global mean) and $\varepsilon'(s)$ is the spatially correlated stochastic part of variation.
The predictions are obtained using:
$$\hat{z}_{OK}(s_0) = \sum_{i=1}^{n} w_i(s_0) \cdot z(s_i) = \lambda_0^{T} \cdot z \qquad (5)$$
where $\lambda_0$ is the vector of kriging weights ($w_i$) and $z$ is the vector of $n$ observations at primary locations.
Kriging (2)
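The ordinary kriging weights are solved by multiplying the covariances:
$$\lambda_0 = C^{-1} \cdot c_0; \qquad C(|h| = 0) = C_0 + C_1 \qquad (6)$$
where $C$ is the covariance matrix derived for $n \times n$ observations and $c_0$ is the vector of covariances at the new location.
$$\begin{bmatrix} C(s_1, s_1) & \cdots & C(s_1, s_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(s_n, s_1) & \cdots & C(s_n, s_n) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}^{-1} \cdot \begin{bmatrix} C(s_0, s_1) \\ \vdots \\ C(s_0, s_n) \\ 1 \end{bmatrix} = \begin{bmatrix} w_1(s_0) \\ \vdots \\ w_n(s_0) \\ \varphi \end{bmatrix} \qquad (7)$$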
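The system in Eq. (7) can be solved directly with solve() in base R. The sketch below is an illustration under stated assumptions, not the gstat implementation: it assumes a spherical covariance function, borrows the parameter values (nugget 2.5, partial sill 7.5, range 10) from the krige() call shown later in these slides, and uses made-up observations. The last element of the solution is the Lagrange multiplier that forces the weights to sum to one.

```r
# Spherical covariance: C0 + C1 at h = 0,
# C1 * (1 - 1.5 h/R + 0.5 (h/R)^3) for 0 < h < R, and zero beyond the range R
cov_sph <- function(h, nugget, psill, range) {
  c_h <- ifelse(h < range,
                psill * (1 - 1.5 * h / range + 0.5 * (h / range)^3), 0)
  ifelse(h == 0, nugget + psill, c_h)
}

ok_predict <- function(obs_xy, z, s0, nugget, psill, range) {
  n  <- nrow(obs_xy)
  D  <- as.matrix(dist(obs_xy))   # distances between the samples
  d0 <- sqrt(rowSums((obs_xy - matrix(s0, n, 2, byrow = TRUE))^2))
  # Augmented system of Eq. (7): covariances bordered by ones and a zero
  A   <- rbind(cbind(cov_sph(D, nugget, psill, range), 1), c(rep(1, n), 0))
  b   <- c(cov_sph(d0, nugget, psill, range), 1)
  sol <- solve(A, b)              # kriging weights followed by the multiplier
  w   <- sol[1:n]
  list(weights = w,
       pred = sum(w * z),                     # Eq. (5)
       var  = nugget + psill - sum(sol * b))  # ordinary kriging variance
}

# Hypothetical observations (not the data of any example in these slides)
obs_xy <- cbind(x = c(1, 3, 8, 6, 4), y = c(2, 8, 9, 4, 1))
z      <- c(3, 4, 2, 4, 6)
res    <- ok_predict(obs_xy, z, c(5, 5), nugget = 2.5, psill = 7.5, range = 10)
print(res)
```

Unlike the inverse distance weights, these weights depend on the fitted covariance model and on the configuration of the samples relative to each other, and the prediction comes with a variance.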
Semivariance
The basis of kriging is the derivation and plotting of the so-called semivariances, i.e. half the expected squared differences between neighbouring values:
$$\gamma(h) = \frac{1}{2} E\left[ \left( z(s_i) - z(s_i + h) \right)^2 \right] \qquad (8)$$
where $z(s_i)$ is the value of the target variable at some sampled location and $z(s_i + h)$ is the value of the neighbour at location $s_i + h$, i.e. at separation distance $h$.
Suppose that there are $n$ point observations; this yields $n \cdot (n - 1)/2$ pairs for which a semivariance can be calculated.
Once we have calculated an experimental variogram, we can fit it using one of the authorized variogram models, such as linear, spherical, exponential, circular, Gaussian, Bessel, power...
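The pair counting and the averaging into distance classes can be sketched in base R as follows. The data are simulated and the lag width is an arbitrary choice made for this example; in practice a function such as variogram() in gstat would be used instead.

```r
# Simulated point observations (hypothetical data)
set.seed(1)
n  <- 30
xy <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
z  <- rnorm(n)

# Variogram cloud: one semivariance per pair of points
idx <- t(combn(n, 2))   # all n(n-1)/2 index pairs
h   <- sqrt(rowSums((xy[idx[, 1], ] - xy[idx[, 2], ])^2))
gam <- 0.5 * (z[idx[, 1]] - z[idx[, 2]])^2

# Experimental variogram: average the cloud within distance classes (lags)
lag_width <- 20         # arbitrary choice for this sketch
lag_id    <- floor(h / lag_width)
exp_vgm   <- data.frame(
  dist  = as.vector(tapply(h,   lag_id, mean)),
  gamma = as.vector(tapply(gam, lag_id, mean)),
  np    = as.vector(table(lag_id))   # number of point pairs per lag
)
print(nrow(idx))
print(exp_vgm)
```

With 30 points there are already 435 pairs, which is why the variogram cloud of a real dataset quickly becomes a dense scatter and is summarised per lag.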
Variograms
Anisotropy
Anisotropy in gstat
The variogram models can be extended to an even larger number of parameters if either (a) anisotropy or (b) smoothness are considered in addition to the modelling of nugget and sill variation.
The 2D geometric anisotropy in gstat, for example, is modelled by replacing the range parameter with three parameters: the range in the major direction (direction of the strongest correlation), the angle of the principal direction and the anisotropy ratio:
vgm(nugget=1, model="Sph", psill=10, range=2,
    anis=c(30,0.5))
where the angle of the major direction is 30 (azimuthal direction measured in degrees clockwise from north) and the anisotropy ratio is 0.5 (the range in the minor direction is half the range in the major direction).
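One common way to think about geometric anisotropy is as a coordinate transformation: the separation vector is rotated onto the major and minor axes and the minor component is stretched by the inverse of the anisotropy ratio, after which an isotropic model with the major range applies. The base-R sketch below illustrates this idea with a hypothetical function; it is not taken from the gstat source code.

```r
# Anisotropy-corrected distance for a separation vector (dx, dy).
# angle: azimuth of the major direction, degrees clockwise from north (y axis)
# ratio: range in the minor direction divided by range in the major direction
aniso_dist <- function(dx, dy, angle = 30, ratio = 0.5) {
  a <- angle * pi / 180
  major <- dx * sin(a) + dy * cos(a)   # component along the major direction
  minor <- dx * cos(a) - dy * sin(a)   # component perpendicular to it
  sqrt(major^2 + (minor / ratio)^2)    # stretch the minor axis
}

a <- 30 * pi / 180
d_major <- aniso_dist(2 * sin(a),  2 * cos(a))  # 2 units along the major direction
d_minor <- aniso_dist(1 * cos(a), -1 * sin(a))  # 1 unit along the minor direction
print(c(major = d_major, minor = d_minor))
```

Both vectors map to the same corrected distance: with a major range of 2 and a ratio of 0.5, the model reaches its sill after 2 units along the major direction but after only 1 unit across it.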
Kriging, steps
[Figure with four panels: (a) map of sampling locations (easting 178500 to 181500, northing 330000 to 333000; symbol size classes 100, 200, 400, 800, 1600); (b) semivariance plotted against distance as a dense point cloud; (c) semivariance against distance, with the number of point pairs per distance class (57, 299, 419, 457, 547, 533, 574, 564, 589, 543, 500, 477, 452, 457, 415); (d) semivariance against distance.]
Burrough and McDonnell (1998) example
What is the value of the target variable at the location x=5, y=5? (Example from the book by Burrough and McDonnell (1998).)
> grid10 = expand.grid(x = seq(0, 10, 10), y = seq(0, 10,
+ 10))
> gridded(grid10) = ~x + y
> newpoint = as.data.frame(matrix(c(5, 5), nrow = 1, ncol = 2,
+ dimnames = list(c("x1"), c("X", "Y"))))
> coordinates(newpoint) <- ~X + Y
> krige(Z ~ 1, points5, newpoint, vgm(nugget = 2.5, "Sph",
+ psill = 7.5, range = 10))
[using ordinary kriging]
coordinates var1.pred var1.var
1 (5, 5) 4.3 4.93
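What krige() solves for a single target point can be sketched in base R. This is a minimal illustration, assuming the spherical model from the call above (nugget 2.5, partial sill 7.5, range 10); the five observations in `pts` are hypothetical stand-ins, not the original `points5` values, and the helper name `sph` is made up for this sketch.

```r
# Spherical semivariogram with nugget c0, partial sill c1 and range a
sph <- function(h, c0 = 2.5, c1 = 7.5, a = 10) {
  g <- ifelse(h < a, c0 + c1 * (1.5 * h / a - 0.5 * (h / a)^3), c0 + c1)
  ifelse(h == 0, 0, g)   # semivariance is zero at zero distance
}

# Hypothetical observations (assumption: NOT the original points5 data)
pts <- data.frame(x = c(2, 3, 9, 6, 5),
                  y = c(2, 7, 9, 5, 3),
                  z = c(3, 4, 2, 4, 6))
x0 <- c(5, 5)            # target location
n  <- nrow(pts)

D  <- as.matrix(dist(pts[, c("x", "y")]))           # distances among observations
d0 <- sqrt((pts$x - x0[1])^2 + (pts$y - x0[2])^2)   # distances to the target

# Ordinary kriging system in semivariance form, with one Lagrange multiplier
A   <- rbind(cbind(sph(D), 1), c(rep(1, n), 0))
b   <- c(sph(d0), 1)
sol <- solve(A, b)
w   <- sol[1:n]          # kriging weights

ok_pred <- sum(w * pts$z)   # ordinary kriging prediction
ok_var  <- sum(sol * b)     # sum(w * gamma0) + Lagrange multiplier
```

The weights sum to one by construction, which is what makes the predictor unbiased for an unknown constant mean.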
Kriging = geostatistics
Kriging has for many decades been used as a synonym for geostatistical interpolation.
It originated in the mining industry in the early 1950s as a means of improving ore reserve estimation (the mining engineer D. G. Krige and the statistician H. S. Sichel).
The technique was first published in Krige (1951), but it took almost a decade until the French mathematician G. Matheron derived the formulas and essentially established the whole field of linear geostatistics.
Environmental correlation
The concept of vegetation/soil–environment relationships has frequently been presented in terms of an equation with six key environmental factors:
$$ V \times S[x, y, t] = f \left\{ s[x, y, t],\; c[x, y, t],\; o[x, y, t],\; r[x, y, t],\; p[x, y, t],\; a[x, y, t] \right\} \tag{9} $$
Types of lm’s
There are (at least) four groups of statistical models that have been used to make spatial predictions with the help of environmental factors, including:
- Classification-based models
- Tree-based models (decision trees)
- Regression models (Generalized Linear Models, Generalized Additive Models)
Environmental correlation with OLS
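A common regression-based approach to spatial prediction is multiple linear regression:
$$ \hat{z}_{\mathrm{OLS}}(s_0) = \hat{\beta}_0 + \hat{\beta}_1 \cdot q_1(s_0) + \ldots + \hat{\beta}_p \cdot q_p(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \cdot q_k(s_0) = \hat{\boldsymbol{\beta}}^{T} \cdot \mathbf{q}_0 \tag{10} $$
where $q_k(s_0)$ are the values of the auxiliary variables at the target location (with $q_0(s_0) = 1$ for the intercept), $\mathbf{q}_0$ is the vector of these values, $p$ is the number of predictors or auxiliary variables, and $\hat{\beta}_k$ are the regression coefficients solved using Ordinary Least Squares:
$$ \hat{\boldsymbol{\beta}} = \left( \mathbf{q}^{T} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}^{T} \cdot \mathbf{z} \tag{11} $$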
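Eqs. (10) and (11) can be checked numerically in a few lines of base R. This is a minimal sketch on simulated data; the predictor names `q1`, `q2` and the "true" coefficients are assumptions made up for the illustration.

```r
set.seed(1)
n  <- 50
q1 <- runif(n)                         # hypothetical auxiliary variable 1
q2 <- runif(n)                         # hypothetical auxiliary variable 2
z  <- 2 + 3 * q1 - 1.5 * q2 + rnorm(n, sd = 0.3)

q    <- cbind(1, q1, q2)               # design matrix; first column is q0 = 1
beta <- solve(t(q) %*% q, t(q) %*% z)  # Eq. (11): OLS coefficients

q0    <- c(1, 0.5, 0.5)                # predictors at a new location s0
z_ols <- sum(beta * q0)                # Eq. (10): prediction at s0

# Cross-check against R's built-in linear model fit
fit <- lm(z ~ q1 + q2)
```

In practice one would simply call lm(); the explicit normal equations are shown only to mirror the notation on the slide.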
OLS prediction error
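The prediction error of a multiple linear regression model is:
$$ \hat{\sigma}^2_{\mathrm{OLS}}(s_0) = \mathrm{MSE} \cdot \left[ 1 + \mathbf{q}_0^{T} \cdot \left( \mathbf{q}^{T} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}_0 \right] \tag{12} $$
where MSE is the mean square (residual) error around the regression line:
$$ \mathrm{MSE} = \frac{\sum_{i=1}^{n} \left[ z(s_i) - \hat{z}(s_i) \right]^2}{n - 2} \tag{13} $$
and $\mathbf{q}_0$ is the vector of predictors at the new, unvisited location. The denominator $n - 2$ applies to a single predictor; with $p$ predictors the residual degrees of freedom are $n - p - 1$. The OLS prediction error reflects the amount of extrapolation in the feature space!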
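The link between Eq. (12) and extrapolation in feature space can be made concrete with a small base-R sketch. The data are simulated and the single predictor `q1` is hypothetical; with one predictor the divisor n - 2 of Eq. (13) is the exact residual degrees of freedom.

```r
set.seed(2)
n  <- 40
q1 <- runif(n, 0, 10)                  # hypothetical predictor, sampled on [0, 10]
z  <- 1 + 0.8 * q1 + rnorm(n)

q    <- cbind(1, q1)
beta <- solve(t(q) %*% q, t(q) %*% z)
res  <- z - q %*% beta
mse  <- sum(res^2) / (n - 2)           # Eq. (13)

# Eq. (12): prediction variance for a new predictor value
ols_var <- function(q_new) {
  q0 <- c(1, q_new)
  as.numeric(mse * (1 + t(q0) %*% solve(t(q) %*% q) %*% q0))
}

v_inside  <- ols_var(5)    # inside the sampled range of q1
v_outside <- ols_var(30)   # far outside the sampled range
```

The variance at `q1 = 30` is larger than at `q1 = 5`, although nothing about geographic position entered the calculation.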
Adjusted R-square
The sum of squares of residuals (SSE) can be used to determine the adjusted coefficient of multiple determination ($R_a^2$), which describes the goodness of fit:
$$ R_a^2 = 1 - \left( \frac{n - 1}{n - p} \right) \cdot \frac{\mathrm{SSE}}{\mathrm{SSTO}} = 1 - \left( \frac{n - 1}{n - p} \right) \cdot \left( 1 - R^2 \right) \tag{14} $$
where SSTO is the total sum of squares, $R^2$ indicates the amount of variance explained by the model, whereas $R_a^2$ adjusts for the number of variables ($p$) used. For many environmental mapping projects, an $R_a^2 \geq 0.85$ is already a very satisfactory solution, and higher values will typically only mean over-fitting of the data.
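A short base-R check of Eq. (14) on simulated data follows. One assumption is needed to match R's own output: `p` is taken here as the number of estimated coefficients including the intercept, which is how summary.lm() computes its adjusted R-square. The predictors `q1` and `q2` are hypothetical.

```r
set.seed(3)
n  <- 60
q1 <- rnorm(n)
q2 <- rnorm(n)                         # deliberately irrelevant predictor
z  <- 1 + 2 * q1 + rnorm(n)

fit  <- lm(z ~ q1 + q2)
sse  <- sum(residuals(fit)^2)          # sum of squares of residuals
ssto <- sum((z - mean(z))^2)           # total sum of squares
p    <- length(coef(fit))              # assumption: p counts the intercept too

r2     <- 1 - sse / ssto
r2_adj <- 1 - (n - 1) / (n - p) * sse / ssto   # Eq. (14)
```

Adding the irrelevant `q2` can only raise `r2`, while `r2_adj` pays a penalty for the extra coefficient.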
Comparison of spatial prediction techniques
[Figure: maps of zinc produced by four prediction techniques, panels zinc.id, zinc.tr, zinc.lm and zinc.ok; common legend from 0 to 1500.]
Universal model of spatial variation
From the statistical perspective, an environmental variable can be viewed as an information signal consisting of three components:
$$ Z(s) = Z^{*}(s) + \varepsilon'(s) + \varepsilon'' \tag{15} $$
where $Z^{*}(s)$ is the deterministic component, $\varepsilon'(s)$ is the spatially correlated random component and $\varepsilon''$ is the pure noise, usually the result of measurement error.
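The three components of Eq. (15) can be simulated along a one-dimensional transect in base R. This is only a sketch: the linear trend, the exponential covariance with range parameter 15 and the noise level are all assumed values chosen for illustration.

```r
set.seed(4)
s <- seq(0, 100, by = 1)               # locations along a transect
n <- length(s)

z_det <- 10 + 0.05 * s                 # deterministic component Z*(s)

# Spatially correlated component: exponential covariance, sill 1
C    <- exp(-as.matrix(dist(s)) / 15)
eps1 <- as.numeric(t(chol(C)) %*% rnorm(n))   # eps'(s), correlated draw via Cholesky

eps2 <- rnorm(n, sd = 0.2)             # pure noise eps''

Z <- z_det + eps1 + eps2               # Eq. (15)

# Lag-1 correlation separates the two random components
r_eps1 <- cor(eps1[-1], eps1[-n])
r_eps2 <- cor(eps2[-1], eps2[-n])
```

Neighbouring values of `eps1` are strongly correlated, while those of `eps2` are not; this difference is what a variogram picks up as the partial sill versus the nugget.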
Best Linear Unbiased Predictor
Matheron (1969) proposed that a value of a target variable at some location can be modelled as a sum of the deterministic and stochastic components:
$$ \hat{z}(s_0) = \hat{m}(s_0) + \hat{e}(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \cdot q_k(s_0) + \sum_{i=1}^{n} \lambda_i \cdot e(s_i) \tag{16} $$
where $\hat{m}(s_0)$ is the fitted deterministic part, $\hat{e}(s_0)$ is the interpolated residual, $\hat{\beta}_k$ are the estimated deterministic model coefficients ($\hat{\beta}_0$ is the estimated intercept), $\lambda_i$ are kriging weights determined by the spatial dependence structure of the residual, and $e(s_i)$ is the residual at location $s_i$.
BLUP (2)
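The regression coefficients $\hat{\beta}_k$ can be estimated from the sample by some fitting method, e.g. ordinary least squares (OLS) or, optimally, using Generalized Least Squares:
$$ \hat{\boldsymbol{\beta}}_{\mathrm{GLS}} = \left( \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{z} \tag{17} $$
where $\hat{\boldsymbol{\beta}}_{\mathrm{GLS}}$ is the vector of estimated regression coefficients, $\mathbf{C}$ is the covariance matrix of the residuals, $\mathbf{q}$ is a matrix of predictors at the sampling locations and $\mathbf{z}$ is the vector of measured values of the target variable.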
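Eq. (17) translates directly into base R matrix algebra. In this sketch the coordinates, the predictor and the residual covariance model (nugget 0.2 plus an exponential partial sill of 0.8) are all assumptions; in a real analysis C would come from a variogram fitted to the residuals.

```r
set.seed(5)
n  <- 30
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))   # hypothetical sample locations
q1 <- runif(n)                                    # hypothetical predictor
q  <- cbind(1, q1)

# Assumed residual covariance: nugget 0.2 + exponential partial sill 0.8
D <- as.matrix(dist(xy))
C <- 0.8 * exp(-D / 20) + 0.2 * diag(n)

# Simulated target variable with spatially correlated residuals
z <- as.numeric(q %*% c(5, 2) + t(chol(C)) %*% rnorm(n))

Ci       <- solve(C)
beta_gls <- solve(t(q) %*% Ci %*% q, t(q) %*% Ci %*% z)   # Eq. (17)
beta_ols <- solve(t(q) %*% q, t(q) %*% z)                 # Eq. (11), for comparison
```

GLS down-weights clusters of nearby (and therefore redundant) observations, so `beta_gls` and `beta_ols` generally differ whenever the residuals are spatially correlated.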
BLUP (3)
In matrix notation, regression-kriging is commonly written as (Christensen, 2001):
$$ \hat{z}_{\mathrm{RK}}(s_0) = \mathbf{q}_0^{T} \cdot \hat{\boldsymbol{\beta}}_{\mathrm{GLS}} + \boldsymbol{\lambda}_0^{T} \cdot \left( \mathbf{z} - \mathbf{q} \cdot \hat{\boldsymbol{\beta}}_{\mathrm{GLS}} \right) \tag{18} $$
where $\hat{z}_{\mathrm{RK}}(s_0)$ is the predicted value at location $s_0$, $\mathbf{q}_0$ is the vector of $p + 1$ predictors and $\boldsymbol{\lambda}_0$ is the vector of $n$ kriging weights used to interpolate the residuals.
The estimation of the residuals is an iterative process: first the deterministic part of variation is estimated using ordinary least squares (OLS), then the covariance function of the residuals is used to obtain the GLS coefficients. The most reliable way to estimate the model coefficients is restricted maximum likelihood (REML).
RK variance
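The RK prediction variance reflects the position of new locations (extrapolation) in both geographical and feature space:
$$ \hat{\sigma}^2_{\mathrm{RK}}(s_0) = (C_0 + C_1) - \mathbf{c}_0^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 + \left( \mathbf{q}_0 - \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 \right)^{T} \cdot \left( \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{q} \right)^{-1} \cdot \left( \mathbf{q}_0 - \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 \right) \tag{19} $$
where $C_0 + C_1$ is the sill variation and $\mathbf{c}_0$ is the vector of covariances of residuals at the unvisited location.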
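The RK prediction of Eq. (18) and its variance, Eq. (19), can be written out in base R as below. This is a sketch under stated assumptions: the data are simulated, the covariance model (nugget 0.2, partial sill 0.8, exponential with range parameter 20) is assumed rather than fitted, and the function names `covfun` and `rk` are made up for the illustration.

```r
set.seed(6)
n  <- 25
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))   # hypothetical sample locations
q  <- cbind(1, runif(n))                          # intercept + one hypothetical predictor
z  <- as.numeric(q %*% c(5, 2) + rnorm(n))        # hypothetical measurements

C0 <- 0.2; C1 <- 0.8; a <- 20                     # assumed nugget, partial sill, range parameter
covfun <- function(h) ifelse(h == 0, C0 + C1, C1 * exp(-h / a))

C    <- covfun(as.matrix(dist(xy)))               # covariances among the samples
Ci   <- solve(C)
beta <- solve(t(q) %*% Ci %*% q, t(q) %*% Ci %*% z)   # GLS coefficients, Eq. (17)

rk <- function(s0, q0) {
  c0   <- covfun(sqrt((xy[, 1] - s0[1])^2 + (xy[, 2] - s0[2])^2))
  lam  <- Ci %*% c0                               # kriging weights for the residuals
  pred <- sum(q0 * beta) + sum(lam * (z - q %*% beta))          # Eq. (18)
  d    <- q0 - t(q) %*% lam
  v    <- (C0 + C1) - sum(c0 * lam) +
          t(d) %*% solve(t(q) %*% Ci %*% q) %*% d               # Eq. (19)
  c(pred = pred, var = as.numeric(v))
}

rk(c(50, 50), c(1, 0.5))   # prediction and variance at a new location
```

With this covariance convention the predictor is exact: at a sampled location it returns the observed value with zero variance.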
RK and MLR
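If the residuals show no spatial auto-correlation (pure nugget effect), regression-kriging converges to pure multiple linear regression, because $\mathbf{C}$ becomes proportional to the identity matrix:
$$ \mathbf{C} = \begin{bmatrix} C_0 + C_1 & \cdots & 0 \\ \vdots & C_0 + C_1 & 0 \\ 0 & 0 & C_0 + C_1 \end{bmatrix} = (C_0 + C_1) \cdot \mathbf{I} \tag{20} $$
so the kriging weights at any location predict the mean residual, i.e. a value of 0. Similarly, because $\mathbf{c}_0 = \mathbf{0}$, the regression-kriging variance reduces to the multiple linear regression variance:
$$ \hat{\sigma}^2_{\mathrm{RK}}(s_0) = (C_0 + C_1) - 0 + \mathbf{q}_0^{T} \cdot \left( \mathbf{q}^{T} \cdot \frac{1}{(C_0 + C_1)} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}_0 $$
and, with the sill $C_0 + C_1$ estimated by the MSE:
$$ \hat{\sigma}^2_{\mathrm{RK}}(s_0) = \hat{\sigma}^2_{\mathrm{OLS}}(s_0) = \mathrm{MSE} \cdot \left[ 1 + \mathbf{q}_0^{T} \cdot \left( \mathbf{q}^{T} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}_0 \right] \tag{21} $$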
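The reduction in Eqs. (20) and (21) is easy to confirm numerically in base R. The design matrix, the new location's predictors and the sill value below are hypothetical; the sill plays the role of the MSE.

```r
set.seed(7)
n    <- 20
q    <- cbind(1, runif(n))    # hypothetical design matrix (intercept + one predictor)
q0   <- c(1, 0.3)             # hypothetical predictors at the new location
sill <- 1.5                   # assumed C0 + C1

C  <- sill * diag(n)          # Eq. (20): pure nugget effect
c0 <- rep(0, n)               # no covariance between s0 and any sample
Ci <- solve(C)

# General RK variance, Eq. (19)
d      <- q0 - t(q) %*% Ci %*% c0
var_rk <- as.numeric(sill - t(c0) %*% Ci %*% c0 +
                     t(d) %*% solve(t(q) %*% Ci %*% q) %*% d)

# OLS prediction variance, Eq. (21), with MSE = sill
var_ols <- as.numeric(sill * (1 + t(q0) %*% solve(t(q) %*% q) %*% q0))
```

The two numbers agree to machine precision, so nothing is lost by running the RK machinery on data without spatial structure; it simply returns the regression answer.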
RK and OK
If the target variable shows no correlation with the auxiliary predictors, the regression-kriging model reduces to the ordinary kriging model, because the deterministic part equals the (global) mean value.
RK and KED/UK
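In the case of KED/UK, the extended covariance matrix of residuals is used, which looks like this:
$$ \mathbf{C}^{\mathrm{KED}} = \begin{bmatrix}
C(s_1, s_1) & \cdots & C(s_1, s_n) & 1 & q_1(s_1) & \cdots & q_p(s_1) \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
C(s_n, s_1) & \cdots & C(s_n, s_n) & 1 & q_1(s_n) & \cdots & q_p(s_n) \\
1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
q_1(s_1) & \cdots & q_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & 0 & \vdots & & \vdots \\
q_p(s_1) & \cdots & q_p(s_n) & 0 & 0 & \cdots & 0
\end{bmatrix} \tag{22} $$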
RK and KED/UK (2)
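The KED/UK weights are solved using the extended matrices:
$$ \boldsymbol{\lambda}_0^{\mathrm{KED}} = \left\{ w_1^{\mathrm{KED}}(s_0), \ldots, w_n^{\mathrm{KED}}(s_0), \varphi_0(s_0), \ldots, \varphi_p(s_0) \right\}^{T} = \left( \mathbf{C}^{\mathrm{KED}} \right)^{-1} \cdot \mathbf{c}_0^{\mathrm{KED}} \tag{23} $$
where $\boldsymbol{\lambda}_0^{\mathrm{KED}}$ is the vector of solved weights, $\varphi_p$ are the Lagrange multipliers, $\mathbf{C}^{\mathrm{KED}}$ is the extended covariance matrix of residuals and $\mathbf{c}_0^{\mathrm{KED}}$ is the extended vector of covariances at the new location.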
RK and KED/UK (3)
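The predictions at new locations are made by:
$$ \hat{z}_{\mathrm{KED}}(s_0) = \sum_{i=1}^{n} w_i^{\mathrm{KED}}(s_0) \cdot z(s_i) = \boldsymbol{\delta}_0^{T} \cdot \mathbf{z} \tag{24} $$
for:
$$ \sum_{i=1}^{n} w_i^{\mathrm{KED}}(s_0) \cdot q_k(s_i) = q_k(s_0); \quad k = 1, \ldots, p \tag{25} $$
where $\boldsymbol{\delta}_0$ is the vector of KED/UK weights ($w_i^{\mathrm{KED}}$).
Hence, KED/UK looks exactly like ordinary kriging, except that the covariance matrix is extended with values of auxiliary predictors!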
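The extended system of Eqs. (22) to (25) can be assembled in base R and compared with regression-kriging using GLS coefficients; under the same covariance model the two give the same prediction. Everything below is a sketch on simulated data: the locations, the predictor `q1` and the covariance function `covfun` are assumptions.

```r
set.seed(8)
n  <- 15
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))   # hypothetical sample locations
q1 <- runif(n)                                    # hypothetical auxiliary predictor
z  <- 5 + 2 * q1 + rnorm(n)                       # hypothetical measurements
covfun <- function(h) ifelse(h == 0, 1, 0.8 * exp(-h / 20))   # assumed covariance model

s0 <- c(50, 50)                                   # new location
q0 <- c(1, 0.5)                                   # its predictors: intercept and q1
C  <- covfun(as.matrix(dist(xy)))
c0 <- covfun(sqrt((xy[, 1] - s0[1])^2 + (xy[, 2] - s0[2])^2))
Q  <- cbind(1, q1)

# Extended system, Eqs. (22)-(23)
C_ked   <- rbind(cbind(C, Q), cbind(t(Q), matrix(0, 2, 2)))
c0_ked  <- c(c0, q0)
lam_ked <- solve(C_ked, c0_ked)
w       <- lam_ked[1:n]          # KED weights; the last two entries are Lagrange multipliers
z_ked   <- sum(w * z)            # Eq. (24)

# Regression-kriging with GLS coefficients, for comparison
Ci   <- solve(C)
beta <- solve(t(Q) %*% Ci %*% Q, t(Q) %*% Ci %*% z)
z_rk <- sum(q0 * beta) + sum((Ci %*% c0) * (z - Q %*% beta))
```

The weights also satisfy the constraint of Eq. (25): applied to the predictor values at the samples, they reproduce the predictor value at the new location.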
The name confusion
Matheron (1969) originally termed the technique Le krigeage universel (universal kriging). However, the technique was intended as a generalized case of kriging where the trend is modelled as a function of coordinates only.
If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than the coordinates, the term Kriging with External Drift (KED) is preferred.
The drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed and de Marsily (1987) and by Odeh et al. (1995), who later named it regression-kriging.
Minasny and McBratney (2007) suggest that a mathematically accurate term should be used instead: EBLUP (Empirical Best Linear Unbiased Predictor).
Why do I prefer the term RK?
1. RK explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily complex forms of regression rather than only simple linear techniques
2. RK allows the separate interpretation of the two interpolated components
3. The emphasis on regression is also important because fitting the deterministic part of variation (regression) is often more beneficial for the quality of the final maps than fitting the stochastic part (residuals)
4. The KED (extended) matrix is unstable when the covariate does not vary smoothly in space
Decision tree
[Figure: decision tree for selecting a spatial prediction method. Decision nodes, each answered YES or NO: Is the variable correlated with environmental factors? Is the physical model known? Do the residuals show spatial auto-correlation? Does the variable show spatial auto-correlation? Can a variogram with >1 parameters be fitted? Outcomes: DETERMINISTIC MODEL; REGRESSION-KRIGING (calibration); REGRESSION-KRIGING (GLS); ENVIRONMENTAL CORRELATION (OLS); ORDINARY KRIGING; INVERSE DISTANCE INTERPOLATION; NO PREDICTIONS POSSIBLE.]
Gstat
inverse distance interpolation:
ev.id = krige(ev ~ 1, locations=points, newdata=mapgrid)
correlation with coordinates (2nd order polynomial model):
# I() protects arithmetic inside a formula; a plain x*x would collapse to x
ev.ts = krige(ev ~ x + y + I(x*y) + I(x^2) + I(y^2), locations=points,
newdata=mapgrid)
moving window (with coordinates):
ev.mv = krige(ev ~ x + y + I(x*y) + I(x^2) + I(y^2), locations=points,
newdata=mapgrid, nmax=20)
Gstat (2)
ordinary kriging:
ev.ok = krige(ev ~ 1, locations=points, newdata=mapgrid,
model=vgm(psill=5, "Exp", range=1000, nugget=1))
environmental correlation (OLS):
ev.ec = krige(ev ~ q1 + q2, locations=points, newdata=mapgrid)
regression-kriging (universal kriging):
ev.rk = krige(ev ~ q1 + q2, locations=points, newdata=mapgrid,
model=vgm(psill=3, "Exp", range=500, nugget=0))
R syntax
Space-time data
Universal kriging model for spatio-temporal data (Heuvelink & Griffith, 2010):
$$ T(s, t) = m(s, t) + \varepsilon(s, t) \tag{26} $$
where $m(s, t)$ is the deterministic part of the variation (i.e. a linear function of the auxiliary variables) and $\varepsilon(s, t)$ is the residual for every $(s, t)$.
Space-time cube
Space-time workshop (Munster)
Space-time semivariance
$$ \gamma(s_i, t_i; s_j, t_j) = 0.5 \cdot E\left[ \left( \varepsilon(s_i, t_i) - \varepsilon(s_j, t_j) \right)^2 \right] \tag{27} $$
Residuals
Residuals ($\varepsilon$) consist of three stationary and independent components (Heuvelink & Griffith, 2010):
$$ \varepsilon(s, t) = \varepsilon_s(s) + \varepsilon_t(t) + \varepsilon_{s,t}(s, t) \tag{28} $$
where $\varepsilon_s(s)$ is a purely spatial process (with constant realizations over time), $\varepsilon_t(t)$ is a purely temporal process, and $\varepsilon_{s,t}(s, t)$ is a space-time process for which distance in space is made comparable to distance in time by introducing a space-time anisotropy ratio.
Zonal anisotropies
The covariance structure can be represented by (Snepvangers et al., 2003):
$$ C(h, u) = C_s(h) + C_t(u) + C_{s,t}\left( \sqrt{h^2 + (\alpha \cdot u)^2} \right) \tag{29} $$
where $C(h, u)$ is the covariance at distance $h$ in space and time-distance $u$, $C_s(h) + C_t(u)$ allow the presence of zonal anisotropies (different variogram sills in different directions), and $C_{s,t}\left( \sqrt{h^2 + (\alpha \cdot u)^2} \right)$ allows the presence of geometric anisotropy, represented by the ratio $\alpha$ that rescales time-distance to space-distance.
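The three-part covariance structure of Eq. (29) can be sketched as a plain R function. The time lag is multiplied by the anisotropy ratio, the usual way a geometric anisotropy ratio enters a joint distance. All sills, range parameters and the value of `alpha` are hypothetical, and the exponential form of each component is an assumed model choice.

```r
# Exponential covariance with a given sill and range parameter (assumed model)
cexp <- function(d, sill, range) sill * exp(-d / range)

# Eq. (29): purely spatial + purely temporal + joint space-time component
cov_st <- function(h, u, alpha) {
  cexp(h, sill = 4, range = 50000) +                          # C_s(h), h in metres
    cexp(u, sill = 2, range = 5) +                            # C_t(u), u in days
    cexp(sqrt(h^2 + (alpha * u)^2), sill = 3, range = 30000)  # C_{s,t}
}

alpha <- 10000   # hypothetical ratio: one day counts as 10 km of separation

c_total <- cov_st(h = 0, u = 0, alpha)        # total sill (sum of the three sills)
c_lag   <- cov_st(h = 20000, u = 2, alpha)    # 20 km and 2 days apart
```

Because the spatial and temporal components carry different sills, the marginal variograms in space and in time level off at different heights, which is the zonal anisotropy referred to above.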
In space-time cube
[Figure: observations plotted in a space-time cube; axes X, Y and cdays.]
Variograms (separately)
[Figure: two experimental variograms fitted separately. Left panel "365 days": semivariance against distance (m), 0 to 200000 m. Right panel "159 stations": semivariance against distance (in days), 0 to 60 days.]
Variograms (zonal anisotropy)
[Figure: semivariance (2 to 10) against distance (m), 50000 to 200000 m, and against distance (in days), 5 to 15 days.]
Marginal experimental variograms for residuals and fitted models: (left) space-domain only, (right) time-domain only.
Some experiences
- By adding the time component we are better off.
- Automation of space-time regression-kriging (overlay, regression modeling, variogram fitting, predictions, visualization in Google Earth) is anticipated.
- Fitting and visualization of space-time variograms is a bottleneck!
- Predictions need to be visualized as animations.
- We have ignored one-way auto-correlation (time works only one way)?
Universal space-time reference
Each observation should have by default:
- Longitude and latitude (WGS84), or projected X, Y coordinates plus a proj4 string;
- Begin and end of the time interval in UTC (GMT);
- Support size (in square meters);
- Uncertainty or measurement error.
Space-time algebra re-visited
Should we (re)define and (re)implement space-time (4D) algebra?
What does this mean?
- Distances always on a sphere (sphere geometry);
- Always use information about uncertainty (weighted regression);
- Always use information about the support size (nugget estimation, cross-validation);
- Also re-implement any raster processing (geomorphometry, resampling, filtering etc.);
- Use Google Earth to visualize any type of geographic data.
Global Multiscale Nested RK
Global models will soon replace local (isolated) modeling. One such approach is the nested RK model:
$$ z(s_B) = m_0(s_{B-k}) + e_1(s_{B-k} \mid s_{B-(k+1)}) + \ldots + e_k(s_{B-2} \mid s_{B-1}) + \varepsilon(s_B) \tag{30} $$
where $z(s_B)$ is the value of the target variable estimated at ground scale ($B$), $B-1, \ldots, B-k$ are the higher order components, $e_k(s_{B-k} \mid s_{B-(k+1)})$ is the residual variation from scale $s_{B-(k+1)}$ to a higher resolution scale $s_{B-k}$, and $\varepsilon$ is the spatially auto-correlated residual soil variation (handled by ordinary kriging).
Multi-scale concept
[Figure: nested scales ordered by extent, with the resolution used at each.]
5.6 km: Global (climatic patterns and processes, vegetation zones, elevation)
1 km: Continental (geological zones, meso-climatic conditions, erosion/deposition at large scales)
250 m: Regional (general land use, vegetation cover)
100 m: Local (land management, erosion/deposition at watershed level)
Multi-resolution signal (McBratney, 1998)
Some thoughts
- Global models (global multiscale predictions) are happening now.
- It is very probable that, in the near future, any geostatistical analysis will be global.
- We probably need to re-write the geostatistical algorithms so they work with sphere geometry (3D + time).
- There is an enormous amount of publicly available RS and GIS data waiting to be used for geostatistical mapping: use it!
Limits of geostatistics in R
- Sampling optimization algorithms for geostatistical modelling are missing
- Local regression-kriging still waits to be implemented
- In gstat, only linear models can be implemented; geoR can also incorporate non-linear models, but it can only work with ≪ 10^3 points
- The KED algorithm can be rather slow and can lead to instabilities (singularity problem)
- Interactive visualization and fitting of 3D space-time variograms is missing
Nothing can save bad data!
- Even the most sophisticated geostatistical tools will not be able to save data sets of poor quality! If you want to produce quality outputs (maps/reports), make sure your input field data satisfies some minimum requirements:
1. it is large enough
2. it is representative
3. it is independent
4. it is produced using a consistent methodology
5. it is sufficiently precise
- Geostatistical mapping using inconsistent point samples is possible, but do you really need this?