Page 1: MATH2740: Environmental Statistics

Introduction Point-object distances Clark-Evans test

MATH2740: Environmental Statistics
Lecture 6: Distance Methods I

February 15, 2019

Table of contents

1 Introduction
    Problem with quadrat data
    Distance methods

2 Point-object distances
    Poisson process case
    Rayleigh distribution
    Distribution of object-object distances

3 Clark-Evans test
    Clark-Evans test of randomness
    Problems with the Clark-Evans test
    Examples of Clark and Evans test
    Problems with Clark-Evans test II

Problems with quadrat data

Quadrat methods can be inefficient or unreliable in some circumstances:

Time and cost are needed to lay out and search all quadrats.

The choice of quadrat size can influence the conclusions.

Quadrat counts do not fully capture the underlying point pattern: two plots can have the same quadrat counts but different spatial patterns.

Distance methods

Distance methods try to overcome some of the problems associated with quadrat counting methods.

Types of distance measurement

Distance measurements involve measuring:

Distances from randomly selected points to the nearest neighbouring object, giving a point-object distance.

Distances from a randomly selected object to the nearest neighbouring object, giving an object-object distance. This procedure requires us to know the locations of all objects within the study area, so that objects can be selected at random.

Example: Types of distance measurement

The plots show a Poisson process with 30 objects within a unit square. Left: distances from four randomly selected objects in the study area to their nearest neighbouring object, giving object-object distances. Right: distances from four randomly located points in the study area to their nearest object, giving point-object distances.
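
A minimal sketch of the two measurement types, assuming 30 objects placed independently and uniformly in the unit square as in the plots; the helper name `nearest_distance` and the random seed are illustrative choices, not part of the original notes.

```python
import math
import random

def nearest_distance(x, y, objects, exclude=None):
    """Distance from (x, y) to the nearest object, ignoring the object at index `exclude`."""
    return min(math.hypot(x - ox, y - oy)
               for i, (ox, oy) in enumerate(objects) if i != exclude)

rng = random.Random(1)
# 30 objects placed independently and uniformly in the unit square
objects = [(rng.random(), rng.random()) for _ in range(30)]

# Object-object: 4 randomly selected objects, distance to the nearest *other* object
chosen = rng.sample(range(len(objects)), 4)
object_object = [nearest_distance(*objects[i], objects, exclude=i) for i in chosen]

# Point-object: 4 uniformly located points, distance to the nearest object
points = [(rng.random(), rng.random()) for _ in range(4)]
point_object = [nearest_distance(px, py, objects) for px, py in points]

print("object-object:", [round(d, 3) for d in object_object])
print("point-object: ", [round(d, 3) for d in point_object])
```

The only difference between the two measurements is whether the starting location is itself one of the objects (and so must be excluded from the search).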

Other types of distance measurement I (NOT examined)

Other types of distance measurement can be considered:

Random object to the nth nearest neighbour.

Random point to the nth nearest neighbour.

Besag and Gleaves (1973) T-square sampling.

Other types of distance measurement II (NOT examined)

Besag and Gleaves (1973) T-square sampling.

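Find the distance from a random point O to the nearest object P.

Find the distance from P to the nearest object Q, where Q is restricted to the half-plane beyond P, i.e. on the far side (away from O) of the line through P perpendicular to OP.

Gives a point-object distance and an object-object distance.

[Figure: T-square sampling, showing the random point O, its nearest object P and the object Q.]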
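
A sketch of the T-square measurements for a single sampling point, assuming objects are given as (x, y) tuples; the helper `t_square` is hypothetical. The half-plane condition (Q on the far side of P from O) is written as a dot product, $(Q - P)\cdot(P - O) > 0$.

```python
import math

def t_square(o, objects):
    """Hypothetical helper: (point-object, object-object) T-square distances from point o."""
    ox, oy = o
    # Step 1: nearest object P to the sampling point O
    p = min(objects, key=lambda s: math.hypot(s[0] - ox, s[1] - oy))
    d_op = math.hypot(p[0] - ox, p[1] - oy)
    # Step 2: only objects beyond the line through P perpendicular to OP are eligible,
    # i.e. those with (Q - P) . (P - O) > 0
    vx, vy = p[0] - ox, p[1] - oy
    candidates = [s for s in objects
                  if s != p and (s[0] - p[0]) * vx + (s[1] - p[1]) * vy > 0]
    if not candidates:
        return d_op, None  # no eligible Q in the study region
    q = min(candidates, key=lambda s: math.hypot(s[0] - p[0], s[1] - p[1]))
    return d_op, math.hypot(q[0] - p[0], q[1] - p[1])

# (0.8, 0.7) is closer to P = (1, 0) than (2, 0) is, but lies on O's side, so Q = (2, 0)
print(t_square((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0), (0.8, 0.7)]))
```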

Point-object distances I

Suppose object locations occur as a Poisson process with intensity $\lambda$ (the mean number of objects per unit area is $\lambda$). The number $X(A)$ of objects in a region $A$ with size $|A|$ has a Poisson distribution with mean $\mu = \lambda|A|$, so

$$\mathrm{pr}\{X(A) = x\} = \frac{\mu^x e^{-\mu}}{x!} = \frac{(\lambda|A|)^x e^{-\lambda|A|}}{x!}, \quad x = 0, 1, 2, \ldots.$$

In particular, $\mathrm{pr}\{X(A) = 0\} = e^{-\lambda|A|}$.
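
As a rough check on $\mathrm{pr}\{X(A) = 0\} = e^{-\lambda|A|}$, the sketch below simulates a homogeneous Poisson process on a small square (a Poisson number of points, each placed uniformly) and records how often the square is empty. The Poisson sampler (Knuth's multiplication method), the intensity, the square size and the seed are illustrative assumptions.

```python
import math
import random

def poisson_count(mu, rng):
    """Draw from Poisson(mu) by Knuth's multiplication method (fine for moderate mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_process(lam, width, height, rng):
    """Homogeneous Poisson process on [0, width] x [0, height]:
    a Poisson(lam * area) number of points, each placed uniformly."""
    n = poisson_count(lam * width * height, rng)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

rng = random.Random(2019)
lam, side, reps = 2.0, 0.5, 20000   # illustrative intensity, square side, repetitions
empty = 0
for _ in range(reps):
    empty += len(poisson_process(lam, side, side, rng)) == 0
print("simulated pr{X(A) = 0}:", empty / reps)
print("exp(-lambda |A|):      ", math.exp(-lam * side * side))
```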

Point-object distances II

Let $R$ denote the distance from a random point to the nearest object, and consider a circle of radius $r$ centred on the random point.

[Figure: circle of radius $r$ centred on the random point.]

The distance $R$ from the random point to the nearest object is greater than $r$ if and only if the circle of radius $r$, with area $\pi r^2$, contains no objects.

Point-object distances III

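The distance $R$ from a random point to the nearest object satisfies

$$\mathrm{pr}\{R > r\} = \mathrm{pr}\{\text{no objects inside circle of radius } r\} = \mathrm{pr}\{X(A) = 0\},$$

where $X(A) \sim \mathrm{Poisson}(\mu = \lambda|A|)$ with $|A| = \pi r^2$. Hence

$$\mathrm{pr}\{R > r\} = \exp(-\lambda\pi r^2).$$

The cumulative distribution function of $R$ is

$$F_R(r) = \mathrm{pr}\{R \le r\} = 1 - \mathrm{pr}\{R > r\} = 1 - \exp(-\lambda\pi r^2), \quad r > 0.$$

The probability density function $f_R(r)$ of $R$ is

$$f_R(r) = \frac{dF_R(r)}{dr} = 2\lambda\pi r \exp(-\lambda\pi r^2), \quad r > 0.$$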
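
A Monte Carlo check of $\mathrm{pr}\{R > r\} = \exp(-\lambda\pi r^2)$: simulate a Poisson process with intensity $\lambda$ on the unit square, drop a random point at least `margin` away from the edges (so the circle of radius $r$ lies entirely inside the square), and record how often its nearest object is further away than $r$. The intensity, margin, radius, seed and number of repetitions are illustrative assumptions.

```python
import math
import random

def poisson_count(mu, rng):
    """Poisson(mu) draw by Knuth's multiplication method."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def survival(r, lam):
    """Theoretical pr{R > r} = exp(-lam * pi * r^2)."""
    return math.exp(-lam * math.pi * r * r)

rng = random.Random(6)
lam, margin, r0, reps = 50.0, 0.2, 0.1, 4000
exceed = 0
for _ in range(reps):
    # Poisson process with intensity lam on the unit square
    n = poisson_count(lam, rng)
    objects = [(rng.random(), rng.random()) for _ in range(n)]
    # Random point at least `margin` from every edge, so the circle of radius r0 fits inside
    px, py = rng.uniform(margin, 1 - margin), rng.uniform(margin, 1 - margin)
    nearest = min((math.hypot(px - x, py - y) for x, y in objects), default=math.inf)
    exceed += nearest > r0
print("simulated pr{R > r0}:", exceed / reps)
print("exp(-lam*pi*r0^2):   ", survival(r0, lam))
```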

Rayleigh distribution I

$$f_R(r) = \frac{dF_R(r)}{dr} = 2\lambda\pi r \exp(-\lambda\pi r^2), \quad r > 0.$$

This is the probability density function of a Rayleigh distribution. It is a special case of the Weibull distribution with probability density function $f_X(x) = abx^{b-1}\exp(-ax^b)$ for $x > 0$, where $a > 0$ and $b > 0$. Here $a = \lambda\pi$ and $b = 2$. The plots show $\lambda = 0.1$ (left), $\lambda = 0.2$ (centre) and $\lambda = 0.4$ (right).

[Figure: three panels plotting the pdf $f_R(r)$ against $r$ for $0 \le r \le 5$, with the pdf axis running from 0.0 to 1.0.]

Rayleigh distribution II

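$$E[R] = \int_{r=0}^{\infty} r f_R(r)\,dr = \int_{r=0}^{\infty} 2\lambda\pi r^2 \exp(-\lambda\pi r^2)\,dr.$$

Let $y = \lambda\pi r^2$, so $dy = 2\lambda\pi r\,dr$ and $dr = \dfrac{dy}{2\sqrt{\lambda\pi y}}$. Then

$$E[R] = \int_{y=0}^{\infty} \frac{2y e^{-y}}{2\sqrt{\lambda\pi y}}\,dy = \frac{1}{2\sqrt{\lambda}} \int_{y=0}^{\infty} \frac{y^{0.5} e^{-y}}{\frac{1}{2}\sqrt{\pi}}\,dy = \frac{1}{2\sqrt{\lambda}} \int_{y=0}^{\infty} \frac{y^{0.5} e^{-y}}{\Gamma(\frac{3}{2})}\,dy = \frac{1}{2\sqrt{\lambda}},$$

since $\Gamma(\frac{3}{2}) = \frac{1}{2}\Gamma(\frac{1}{2}) = \frac{1}{2}\sqrt{\pi}$ and the area under a gamma($\alpha = \frac{3}{2}$, 1) density is one, so that

$$\int_{y=0}^{\infty} \frac{y^{0.5} e^{-y}}{\Gamma(\frac{3}{2})}\,dy = 1.$$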
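
To check $E[R] = 1/(2\sqrt{\lambda})$ numerically, the sketch below integrates $r^k f_R(r)$ with a simple midpoint rule for the three values of $\lambda$ used in the Rayleigh plots, and confirms $\Gamma(\frac{3}{2}) = \frac{1}{2}\sqrt{\pi}$ with `math.gamma`. The truncation point $10/\sqrt{\lambda}$ and the step count are illustrative choices.

```python
import math

def rayleigh_pdf(r, lam):
    """f_R(r) = 2*lam*pi*r*exp(-lam*pi*r^2) for r > 0."""
    return 2 * lam * math.pi * r * math.exp(-lam * math.pi * r * r)

def moment(k, lam, steps=50_000):
    """Midpoint-rule approximation to the integral of r^k f_R(r) over [0, 10/sqrt(lam)]."""
    upper = 10 / math.sqrt(lam)  # pr{R > upper} = exp(-100*pi), negligible
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** k * rayleigh_pdf((i + 0.5) * h, lam)
                   for i in range(steps))

for lam in (0.1, 0.2, 0.4):
    print(lam, moment(1, lam), 1 / (2 * math.sqrt(lam)))

# Gamma-function fact used in the derivation
print(math.gamma(1.5), 0.5 * math.sqrt(math.pi))
```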

Rayleigh distribution III: revision of gamma distribution

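A gamma($\alpha$, $\lambda$) distribution has probability density function

$$f_Y(y) = \frac{\lambda^\alpha y^{\alpha-1} e^{-\lambda y}}{\Gamma(\alpha)}$$

for $y > 0$, where the gamma function satisfies $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ with $\Gamma(1) = 1$ and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.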

Rayleigh distribution IV

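$$E[R^2] = \int_{r=0}^{\infty} r^2 f_R(r)\,dr = \int_{r=0}^{\infty} 2\lambda\pi r^3 \exp(-\lambda\pi r^2)\,dr.$$

Putting $y = \lambda\pi r^2$ and $dy = 2\lambda\pi r\,dr$ gives

$$E[R^2] = \int_{y=0}^{\infty} \frac{2y e^{-y}}{2\lambda\pi}\,dy = \frac{1}{\lambda\pi}\int_{y=0}^{\infty} y e^{-y}\,dy = \frac{1}{\lambda\pi},$$

as the area under a gamma($\alpha = 2$, 1) density is one (⋆), so, with $\Gamma(2) = 1$,

$$\int_{y=0}^{\infty} \frac{y e^{-y}}{\Gamma(2)}\,dy = 1.$$

Hence

$$\mathrm{Var}[R] = E[R^2] - \{E[R]\}^2 = \frac{1}{\lambda\pi} - \frac{1}{4\lambda} = \frac{4 - \pi}{4\lambda\pi}.$$

(⋆) Or recall that for $Y \sim$ exponential(1), $E[Y] = \int_{y=0}^{\infty} y e^{-y}\,dy = 1$.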
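
Because $F_R$ has a closed form, Rayleigh distances can also be simulated by inversion: if $U \sim$ Uniform(0, 1) then $R = \sqrt{-\log(1 - U)/(\lambda\pi)}$ has distribution function $F_R$. The sketch below, with an illustrative $\lambda$, sample size and seed, compares the simulated mean and variance with $1/(2\sqrt{\lambda})$ and $(4 - \pi)/(4\lambda\pi)$.

```python
import math
import random
import statistics

def sample_rayleigh(lam, n, rng):
    """Draw n values of R by inverting F_R(r) = 1 - exp(-lam*pi*r^2)."""
    return [math.sqrt(-math.log(1.0 - rng.random()) / (lam * math.pi))
            for _ in range(n)]

lam = 0.4
draws = sample_rayleigh(lam, 100_000, random.Random(42))
mean_theory = 1 / (2 * math.sqrt(lam))
var_theory = (4 - math.pi) / (4 * lam * math.pi)
print("mean:", statistics.fmean(draws), "theory:", mean_theory)
print("var: ", statistics.variance(draws), "theory:", var_theory)
```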

Object-object distances

Given a large number $N$ of objects in the study area $A$, the distribution of the distance between a random object and its nearest neighbouring object is approximately the same as that of the point-object distance.

Suppose $A$ contains $N$ objects positioned randomly and independently within $A$. The probability that any given object is located in a small region $a \subset A$ is $|a|/|A|$, and the probability that it is not located in $a$ is $1 - |a|/|A|$. If $a$ is the circle of radius $r$ around a randomly chosen object, so $|a| = \pi r^2$, then by independence the probability that none of the remaining $N - 1$ objects is within distance $r$ of the chosen object is $(1 - \pi r^2/|A|)^{N-1}$. Writing $\lambda \approx N/|A|$ gives

$$\mathrm{pr}\{R \le r\} \approx 1 - \left(1 - \frac{\lambda\pi r^2}{N}\right)^{N-1}.$$

As $N \to \infty$ this tends to $1 - \exp(-\lambda\pi r^2)$, the same as the point-object distribution function.
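
The limiting argument can be seen numerically. The sketch below, with illustrative values $\lambda = 1$ and $r = 0.4$, compares the finite-$N$ expression $1 - (1 - \lambda\pi r^2/N)^{N-1}$ with the point-object distribution function $1 - \exp(-\lambda\pi r^2)$ as $N$ grows.

```python
import math

def cdf_finite_n(r, lam, n):
    """Approximate pr{R <= r} with n objects: 1 - (1 - lam*pi*r^2/n)^(n-1)."""
    return 1 - (1 - lam * math.pi * r * r / n) ** (n - 1)

def cdf_poisson(r, lam):
    """Point-object (Poisson process) distribution function 1 - exp(-lam*pi*r^2)."""
    return 1 - math.exp(-lam * math.pi * r * r)

lam, r = 1.0, 0.4
for n in (10, 100, 1000, 10000):
    print(n, round(cdf_finite_n(r, lam, n), 5), round(cdf_poisson(r, lam), 5))
```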

Clark-Evans test I

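Suppose we have $N$ object-object nearest neighbour distances $r_i$, $i = 1, 2, \ldots, N$, with sample mean $\bar{r}$. If the randomness (Poisson process) assumption is true then, for large $N$, Clark and Evans (1954) assume

$$\bar{r} \approx N\left(\frac{1}{2\sqrt{\lambda}}, \frac{4 - \pi}{4\lambda\pi N}\right),$$

where $E[R] = \dfrac{1}{2\sqrt{\lambda}}$ and $\mathrm{Var}[R] = \dfrac{4 - \pi}{4\lambda\pi}$. Hence

$$Z = \frac{\bar{r} - \frac{1}{2\sqrt{\lambda}}}{\sqrt{\frac{4 - \pi}{4\lambda\pi N}}} \approx N(0, 1).$$

Reject the randomness hypothesis at the 5% level if $|Z| > 1.96$.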
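
A minimal sketch of the large-$N$ Clark-Evans test as described above, assuming $\lambda$ is known (or already estimated) and ignoring edge effects; the function name `clark_evans` is hypothetical. The two-sided P-value uses the standard normal tail $\mathrm{pr}\{|Z| > |z|\} = \mathrm{erfc}(|z|/\sqrt{2})$.

```python
import math

def clark_evans(distances, lam):
    """Clark-Evans Z statistic and two-sided normal P-value.

    distances: object-object nearest neighbour distances r_1, ..., r_N
    lam:       intensity (objects per unit area), assumed known or estimated
    """
    n = len(distances)
    r_bar = sum(distances) / n
    mean_r = 1 / (2 * math.sqrt(lam))                        # E[R]
    se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))  # standard deviation of r_bar
    z = (r_bar - mean_r) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # pr{|Z| > |z|} for Z ~ N(0, 1)
    return z, p
```

Rejecting when the P-value is below 0.05 is equivalent to rejecting when $|z| > 1.96$.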

Clark-Evans test II

For small $N$, Clark and Evans suggest using a suitable gamma distribution as an approximation to the distribution of $\bar{r}$.

Clark-Evans measure of randomness

Clark and Evans use¹

$$\phi_R = \frac{\bar{r}}{E[R]} = 2\sqrt{\lambda}\,\bar{r}$$

as a measure of randomness: $\phi_R \approx 1$ for a random process, $\phi_R < 1$ for a clustered (aggregated) process, and $\phi_R > 1$ for a regularly located process².

¹ Clark and Evans used the symbol R for their randomness measure, but to avoid confusion with the random variable $R$ the symbol $\phi_R$ is used here.

² The most extreme case has objects on a hexagonal grid, each object the same distance $\bar{r}$ from six others. The hexagon formed by these six neighbours has area $3\sqrt{3}\,\bar{r}^2/2$ and is associated with 3 data points: the central point, plus weight one third for each of the six surrounding points. So $\lambda = 3/(3\sqrt{3}\,\bar{r}^2/2)$, giving $\bar{r} = 1.0746/\sqrt{\lambda}$ and hence $\phi_R = 2.149$.
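
The hexagonal-grid arithmetic can be checked directly. The sketch below, with a hypothetical helper `phi_r`, solves $\lambda = 3/(3\sqrt{3}\,\bar{r}^2/2)$ for $\bar{r}$ (taking $\lambda = 1$ for illustration) and evaluates $\phi_R$.

```python
import math

def phi_r(r_bar, lam):
    """Clark-Evans randomness measure phi_R = r_bar / E[R] = 2 * sqrt(lam) * r_bar."""
    return 2 * math.sqrt(lam) * r_bar

# Hexagonal grid with spacing d: a hexagon of area 3*sqrt(3)*d^2/2 carries 3 objects
lam = 1.0
d = math.sqrt(2 / (math.sqrt(3) * lam))  # solves lam = 3 / (3*sqrt(3)*d^2/2)
print(round(d * math.sqrt(lam), 4), round(phi_r(d, lam), 3))  # prints 1.0746 2.149
```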

Problems with Clark-Evans test I

The intensity $\lambda$ should be known to carry out the test. It could be estimated by the mean number of objects per unit area in the study region.

The Clark-Evans test uses all $N$ object-object distances. These distances are not independent, but Diggle (1976) and Donnelly (1978) showed that the correlations are small³.

Because of the correlations between the object-object distances, the central limit theorem does NOT apply directly. However, $Z \approx N(0, 1)$ still holds approximately, as shown by Donnelly (1978).

³ Donnelly (1978) obtained better approximations for the mean and variance of the object-object distances, but for large $N$ these reduce to the $E[R]$ and $\mathrm{Var}[R]$ obtained by assuming that the object-object distances are independent.

Using a border region I

Clark and Evans (1954) advise having a border around the study region to avoid bias. For objects near the edge of the study region, the calculated object-object distance to objects within the study region will tend to be larger than it should be, because the true nearest neighbour may lie outside the region. This biases the test statistic $Z$ upwards, which can lead to rejecting the randomness hypothesis and wrongly suggesting regularity of the data points.

Using a border region II

Object-object distances are measured for all objects within the inner region, but the nearest neighbour of an inner object may be an object within the border region.
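
A sketch of the border method, assuming a rectangular inner region given as bounds `(x0, x1, y0, y1)` (a hypothetical parameterization) inside a larger surveyed area: distances are computed only for objects inside the inner region, but the nearest neighbour search runs over every object, including those in the border.

```python
import math

def border_nn_distances(objects, inner):
    """Nearest neighbour distances for objects inside `inner`, searching all objects.

    objects: list of (x, y) for every object in the inner region *and* the border
    inner:   (x0, x1, y0, y1) bounds of the inner study region
    """
    x0, x1, y0, y1 = inner
    result = []
    for i, (x, y) in enumerate(objects):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # border objects only serve as potential neighbours
        result.append(min(math.hypot(x - u, y - v)
                          for j, (u, v) in enumerate(objects) if j != i))
    return result

# (0, 2) lies in the border but is still the nearest neighbour of (2, 2)
print(border_nn_distances([(2.0, 2.0), (2.0, 5.0), (0.0, 2.0)], (1.0, 6.0, 1.0, 6.0)))
```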

Using a border region III

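Donnelly (1978) presented approximations for $E[R]$, and for the variance of the sample mean $\bar{r}$ (written $\mathrm{Var}[R]$ below), when a border is ignored. For a study region with perimeter $P$,

$$E[R] \approx \frac{1}{2\sqrt{\lambda}} + \frac{P}{N}\left(0.0514 + \frac{0.0412}{\sqrt{N}}\right),$$

$$\mathrm{Var}[R] \approx \frac{0.070}{\lambda N} + \frac{0.037P}{N^2\sqrt{\lambda}},$$

and the test statistic becomes $Z = (\bar{r} - E[R])/\sqrt{\mathrm{Var}[R]}$.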
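
A sketch of Donnelly's edge-corrected approximations, coded directly from the formulas above; the function names are hypothetical, and the variance returned is that of the sample mean $\bar{r}$.

```python
import math

def donnelly_moments(lam, n, perimeter):
    """Donnelly (1978) approximations for E[R] and the variance of r_bar, no border."""
    mean = 1 / (2 * math.sqrt(lam)) + (perimeter / n) * (0.0514 + 0.0412 / math.sqrt(n))
    var = 0.070 / (lam * n) + 0.037 * perimeter / (n ** 2 * math.sqrt(lam))
    return mean, var

def donnelly_z(r_bar, lam, n, perimeter):
    """Test statistic using Donnelly's edge-corrected mean and variance."""
    mean, var = donnelly_moments(lam, n, perimeter)
    return (r_bar - mean) / math.sqrt(var)
```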

Using a toroidal correction

If the study region is rectangular, an alternative is to assume that the region lies on a torus, so that opposite edges are adjacent to each other. The study region is then surrounded by a grid of identical copies of itself. Object-object distances are measured for all objects within the central region and may be to points in the surrounding copies.
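
A sketch of the toroidal correction, assuming object coordinates lie in a `width` x `height` rectangle with corner at the origin. For each coordinate the separation is the shorter of the direct gap and the gap "around" the torus, which is the same as measuring to the nearest of the surrounding copies of each object. Function names are hypothetical.

```python
import math

def torus_distance(p, q, width, height):
    """Distance between p and q on a width x height rectangle whose opposite edges are joined."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, width - dx)   # going the other way round may be shorter
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)

def torus_nn_distances(objects, width, height):
    """Object-object nearest neighbour distances with a toroidal edge correction."""
    return [min(torus_distance(p, q, width, height)
                for j, q in enumerate(objects) if j != i)
            for i, p in enumerate(objects)]

# Objects near opposite edges of a 10 x 10 region are neighbours on the torus
print(torus_nn_distances([(0.5, 5.0), (9.5, 5.0), (5.0, 5.0)], 10.0, 10.0))
```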

Example 1: Simulated data I

The object-object nearest neighbour distances for the $N = 11$ objects within the inner study region are:

0.201 0.201 0.327 0.327 0.350 0.350 0.500 0.500 0.657 0.826 1.278

Example 1: Simulated data II

Data are:

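0.201 0.201 0.327 0.327 0.350 0.350 0.500 0.500 0.657 0.826 1.278

These have mean $\bar{r} = 0.5015$. The inner region has area 9 m², so $\lambda$ can be estimated by $\hat{\lambda} = 11/9 = 1.222$. The test statistic is thus

$$z = \frac{\bar{r} - \frac{1}{2\sqrt{\lambda}}}{\sqrt{\frac{4 - \pi}{4\lambda\pi N}}} = \frac{0.5015 - 0.4523}{0.07128} = 0.690.$$

Here $|z| < 1.96$, so we do not reject the randomness hypothesis at the 5% level. Notice that many of the object-object distances are the same: these come from pairs of objects that are each other's nearest neighbour.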
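
The Example 1 arithmetic can be reproduced as follows (a sketch; variable names are illustrative).

```python
import math

distances = [0.201, 0.201, 0.327, 0.327, 0.350, 0.350,
             0.500, 0.500, 0.657, 0.826, 1.278]
n = len(distances)
area = 9.0                       # inner region, m^2
lam = n / area                   # estimated intensity, 11/9
r_bar = sum(distances) / n
mean_r = 1 / (2 * math.sqrt(lam))
se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
z = (r_bar - mean_r) / se
print(f"r_bar={r_bar:.4f} E[R]={mean_r:.4f} se={se:.5f} z={z:.3f}")
```

Carried through without intermediate rounding, $z$ comes out at about 0.691 rather than 0.690; either way $|z| < 1.96$.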

Example 2: Ground ant nests in Panama I

Levings and Franks (1982) present data for the number of ground ant nests in various study regions on Barro Colorado Island, in Gatun Lake, Panama. For one 100 m² square study region, the number of nests of Ectatomma ruidum per m² was given as 0.61, with $\phi_R = 1.16$. This suggests $\lambda = 0.61$, $N = 100\lambda = 61$ and

$$\bar{r} = \frac{\phi_R}{2\sqrt{\lambda}} = 0.7426.$$

The Clark-Evans test statistic is then

$$z = \frac{\bar{r} - \frac{1}{2\sqrt{\lambda}}}{\sqrt{\frac{4 - \pi}{4\lambda\pi N}}} = \frac{0.7426 - 0.6402}{0.04285} = 2.390.$$

As a two-sided test, the P-value is $P = \mathrm{pr}\{|Z| > 2.390\} = 0.0168$, so we reject the randomness hypothesis at the 5% level. Since $\phi_R > 1$, this suggests that the ant nests are distributed regularly.
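
A sketch reproducing the Example 2 calculation from the reported $\lambda$, $N$ and $\phi_R$, using `math.erfc` for the two-sided normal P-value.

```python
import math

lam = 0.61          # nests per m^2
n = 61              # 100 m^2 x 0.61
phi = 1.16          # reported Clark-Evans measure
mean_r = 1 / (2 * math.sqrt(lam))
r_bar = phi * mean_r                     # since phi_R = r_bar / E[R]
se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
z = (r_bar - mean_r) / se
p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal P-value
print(f"r_bar={r_bar:.4f} z={z:.3f} P={p:.4f}")
```

Without intermediate rounding the statistic is about 2.391, with $P \approx 0.0168$.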

Example 2: Ground ant nests in Panama II

Unfortunately, Levings and Franks did not appear to use a border, so their results are not valid. Using instead the corrected values of $E[R]$ and $\mathrm{Var}[R]$ obtained by Donnelly (1978), with perimeter $P = 40$ m,

$$E[R] \approx \frac{1}{2\sqrt{\lambda}} + \frac{P}{N}\left(0.0514 + \frac{0.0412}{\sqrt{N}}\right) = 0.67735,$$

$$\mathrm{Var}[R] \approx \frac{0.070}{\lambda N} + \frac{0.037P}{N^2\sqrt{\lambda}} = 0.0018812 + 0.0005093 = 0.0023905,$$

and the test statistic becomes $z = (0.7426 - 0.67735)/\sqrt{0.0023905} = 1.335$, which is not significant at the 5% level. There is thus no evidence to reject the randomness hypothesis.
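
A sketch that recomputes Donnelly's edge-corrected mean and variance directly from the formulas, with $\lambda = 0.61$, $N = 61$ and $P = 40$ m. Evaluated this way the first variance term is $0.070/(0.61 \times 61) \approx 0.00188$, the variance is about 0.00239 and $z \approx 1.33$, which is still well inside ±1.96.

```python
import math

lam, n, perimeter, phi = 0.61, 61, 40.0, 1.16
r_bar = phi / (2 * math.sqrt(lam))
# Donnelly (1978) edge-corrected mean, and variance of r_bar, with no border
mean_r = 1 / (2 * math.sqrt(lam)) + (perimeter / n) * (0.0514 + 0.0412 / math.sqrt(n))
var_rbar = 0.070 / (lam * n) + 0.037 * perimeter / (n ** 2 * math.sqrt(lam))
z = (r_bar - mean_r) / math.sqrt(var_rbar)
print(f"E[R]={mean_r:.5f} Var={var_rbar:.7f} z={z:.3f}")
```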

Intensive sampling

If all the nearest neighbour distances in a region are calculated, the values are not independent. Cressie (1993, pp. 609-610) refers to this as "intensive sampling". Because of the correlations, the true variance of $\bar{r}$ is greater than that assumed, so the test statistic $Z$ tends to be larger in magnitude than it should be, and non-randomness is suggested more often than it should be.

One solution is to use Monte Carlo tests for inference.

Independent realizations of the data are simulated assuming the null hypothesis is true, and the test statistic $Z_i$ is calculated for each. The observed value of the test statistic $Z$ is then compared with the simulated $Z_i$, and the test rejects the null hypothesis if the observed $Z$ is too large or too small compared with the simulated values.
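
A minimal sketch of such a Monte Carlo test for the Clark-Evans statistic, under two labelled assumptions: the null model places the observed number of objects independently and uniformly in a rectangular study region, and the two-sided P-value doubles the smaller rank-based tail proportion. Function names, the number of simulations and the seed are hypothetical choices.

```python
import math
import random

def ce_z(objects, lam):
    """Clark-Evans Z from all object-object nearest neighbour distances (no edge correction)."""
    n = len(objects)
    nn = [min(math.hypot(x - u, y - v) for j, (u, v) in enumerate(objects) if j != i)
          for i, (x, y) in enumerate(objects)]
    r_bar = sum(nn) / n
    se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
    return (r_bar - 1 / (2 * math.sqrt(lam))) / se

def monte_carlo_test(objects, width, height, n_sim=999, seed=0):
    """Two-sided Monte Carlo P-value for the Clark-Evans Z statistic.

    Null model (an assumption): the same number of objects placed independently
    and uniformly in the width x height study region.
    """
    rng = random.Random(seed)
    n = len(objects)
    lam = n / (width * height)
    z_obs = ce_z(objects, lam)
    z_sim = [ce_z([(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)], lam)
             for _ in range(n_sim)]
    # Rank-based tail proportions, counting the observed value as one of the sample
    upper = (1 + sum(z >= z_obs for z in z_sim)) / (n_sim + 1)
    lower = (1 + sum(z <= z_obs for z in z_sim)) / (n_sim + 1)
    return z_obs, min(1.0, 2 * min(upper, lower))
```

Because the simulated patterns are measured in the same region, with the same estimator, as the data, they share its edge effects and correlations, so in this sketch no border or Donnelly correction is applied to $Z$.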