statistics 416, applied geostatistics

31
Statistics 416, Applied Geostatistics. Outline for the day: 1. Syllabus, etc. 2. Examples of spatial-temporal point processes. 3. Characterizations of point processes. 4. Integration. 5. Exercises. 6. Simplicity and orderliness. 7. Conditional intensity. 8. Poisson process. Read reinhart18.pdf . Ignore CCLE website. Use http://www.stat.ucla.edu/~frederic/416/S21 . Zoom password. Grading? Breaks? Please do not drop without talking to me first!

Upload: others

Post on 18-Mar-2022

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistics 416, Applied Geostatistics

Statistics 416, Applied Geostatistics.

Outline for the day:1. Syllabus, etc.2. Examples of spatial-temporal point processes. 3. Characterizations of point processes. 4. Integration. 5. Exercises. 6. Simplicity and orderliness. 7. Conditional intensity. 8. Poisson process.

Read reinhart18.pdf . Ignore CCLE website. Use http://www.stat.ucla.edu/~frederic/416/S21 . Zoom password. Grading? Breaks? Please do not drop without talking to me first!

Page 2: Statistics 416, Applied Geostatistics

1. Syllabus.

2. Examples of spatial-temporal point processes.

A point process is a random collection of points falling in some metric space. For a spatial-temporal point process, the metric space is a portion of space-time, S = Rd x R. Often d = 2, but sometimes 3.

Examples include incidence of disease, sightings or births of a species, occurrences of fires, earthquakes, lightning strikes, tsunamis, or volcanic eruptions.

Page 3: Statistics 416, Applied Geostatistics

Microearthquake origin times and epicenters in Parkfield, CA, recorded by the US High-Resolution Seismographic Station Network.

Page 4: Statistics 416, Applied Geostatistics

Wildfire centroids between 1876 and 1996 in Los Angeles County, CA, recorded by the Los Angeles County Department of Public Works

Page 5: Statistics 416, Applied Geostatistics

For a marked point process, each point has some mark or random variable associated with it.

Locations, times and magnitudes of moderate-sized (M ≥ 3.5) earthquakes in Bear Valley, CA, between 1970 and 2000.

Page 6: Statistics 416, Applied Geostatistics

3. Characterizations of point processes.

Point processes have been characterized various ways.

Daley and Vere-Jones (2003) detail the history originating from life tables, dating back to the 1700s.

3a. STOCHASTIC PROCESS. Point processes on the line were originally characterized as examples of stochastic processes on the line, that are: * Non-decreasing, * Z+ valued. So it is tempting to define a point process as any non-decreasing, Z+ valued stochastic process. On the line, this definition works just fine.

0 1 2 3 4 5 6 7 --------.------.-.------------.------.--------.--.-----------

Page 7: Statistics 416, Applied Geostatistics

3a continued.

Non-decreasing, Z+ valued stochastic process.

This has problems extending to spatial-temporal processes.

Let N(x,y) = the number of points (x1,y1) with x1 ≤ x and y1≤y.

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

0

1

2

3

Page 8: Statistics 416, Applied Geostatistics

3a continued.

It is very unclear what points this nondecreasing Z+ valued process corresponds to though.

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

0

1

Page 9: Statistics 416, Applied Geostatistics

3b. A LIST OF POINTS. For any finite collection of points, one could represent it simply as a finite list of points, N = {x1, x2, ...}. J. Møller and R. Waagepetersen (2004), Statistical Inference and Simulation for Spatial Point Processes, CRC, Boca Raton uses this formulation. If S = Rd x R is the domain where the points occur, then N is a vector in Ø U S U S2 U S3 U ....This works just fine for finite point processes, and point processes with a countable number of points. But you can imagine wanting to consider the point process with points everywhere, or points at all irrational times, or some other uncountable collection of points. You could restrict your definition to include only point processes with points on a set of measure zero, but then what about if N has points at all times in the Cantor set?

Page 10: Statistics 416, Applied Geostatistics

3c. RANDOM MEASURE.

A Z+ valued random measure is now the preferred mathematical definition, as it includes a wide range of processes on the line and extends readily to space-time.

The measure N(A) represents the number of points falling in the region A of space-time. For this set A, for example, the value of N(A) is 3. Attention is sometimes restricted to processes with only a finite number of points in any compact subset of S.

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

A

Page 11: Statistics 416, Applied Geostatistics

3c, continued.

In the 1950s-1970s much attention was paid to the idea that there are non-measurable sets. N(A) is technically only defined for measurable A. A collection of points at all times in the Vitali set, for instance, if not considered a valid point process.

With the random measure formulation, the domain S can also be some abstract space. Brillinger (1997, 2000) analyzed point processes on the surface of a sphere or on an ellipse, for instance.

Brillinger, D.R. (1997). A particle migrating randomly on a sphere, Journal of Theoretical Probability 10, 429-433.

Brillinger, D.R. (2000). Some examples of random process environmental data analysis, in Handbook of Statistics, Vol. 18, C.R. Rao & P.K. Sen, eds, NorthHolland, Amsterdam, pp. 33–56.

Page 12: Statistics 416, Applied Geostatistics

4. Integration.

Note that with the measure theory formulation, which we will use throughout, N(B) = ∫B dN is the number of points in B.

What is ∫t ∫x ∫y f(t,x,y) dN?

It is simply ∑i f(ti,xi,yi) .

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

t

f(t)

Page 13: Statistics 416, Applied Geostatistics

Exercises.

1. Suppose the spatial-temporal point process N has points at

time 1.2, x=2, y=3.

time 2.4, x=3, y=0.5.

time 8.7, x=2, y=1.

Let B = [0,10] x [1.5, 2.5] x [0,5].

What is ∫B dN?

Page 14: Statistics 416, Applied Geostatistics

Exercises.

1. Suppose the spatial-temporal point process N has points at

time 1.2, x=2, y=3.

time 2.4, x=3, y=0.5.

time 8.7, x=2, y=1.

Let B = [0,10] (time) x [0,5] x [0,5].

What is ∫B dN?

2.

Page 15: Statistics 416, Applied Geostatistics

Exercises.

1. Suppose the spatial-temporal point process N has points at

time 1.2, x=2, y=3.

time 2.4, x=3, y=0.5.

time 8.7, x=2, y=1.

Let B = [0,10] (time) x [0,5] x [0,5].

What is ∫B (t+x2y) dN?

Page 16: Statistics 416, Applied Geostatistics

Exercises.

1. Suppose the spatial-temporal point process N has points at

time 1.2, x=2, y=3.

time 2.4, x=3, y=0.5.

time 8.7, x=2, y=1.

Let B = [0,10] (time) x [0,5] x [0,5].

What is ∫B (t+x2y) dN?

13.2 + 6.9 + 12.7 = 32.8.

Page 17: Statistics 416, Applied Geostatistics

6. Simple and orderly point processes.

Last time we noted that the accepted definition of a point process is a Z+ valued random measure.

We will sometimes use the notation ti to represent point i = (ti,xi,yi).

A point process is stationary (or homogeneous) if, for any k and any collection of measurable sets, B1, B2, ..., Bk, the joint distribution of the collection {N(B1+D), N(B2+D), ..., N(Bk+D)} doesn't depend on D.

A point process is called simple if all the points are distinct, i.e.

P(there are indices i and j where ti = tj) = 0.

A point process is orderly if for any time t and any spatial interval B,

lim Dt -> 0 P(N([t,t+Dt) x B) > 1) / (Dt |B|) = 0.

If N is simple and stationary, then it is orderly (Daley and Vere-Jones, 2003, 3.3.V and 3.3.VI).

Page 18: Statistics 416, Applied Geostatistics

Simple and orderly point processes, continued.

The times {t1,t2, ..., tn} are sometimes said to form the ground process.

N has simple ground process if all the times are distinct, with prob. 1.

Examples of a non-simple process, N with non-simple ground process, and a nonorderly process with points at {(1/i, U(0,1), i=1,2,...}.

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

t

lat

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

t

lat

0.0 0.2 0.4 0.6 0.8 1.0

0.00.2

0.40.6

0.81.0

t

lat

Page 19: Statistics 416, Applied Geostatistics

7. Conditional intensity, l.

Fix any space-time location (t,x,y). l(t,x,y) is the limiting expected rate of accumulation of points around (t,x,y), given all points prior to t.

l(t,x,y) = lim Dt, d -> 0 E( N([t,t+Dt) x B(x,y,d)) | Ht ) / (Dt πd2), where

B(x,y,d) is a circle of radius d around (x,y), and Ht is the history of the process up to but not including time t.

If N is orderly, then

l(t,x,y) = lim Dt, d -> 0 P(N([t,t+Dt) x B(x,y,d)) > 0 | Ht ) / (Dt πd2).

Note that l is random. It can depend on what points have occurred previously, and these might be different with every realization.

Fix some spatial interval, B.

N(t, B) – ∫0t ∫∫ B l(t,x,y) dxdy dt is a martingale.

Page 20: Statistics 416, Applied Geostatistics

Conditional intensity, l, continued.

We will use l(t,x,y) = lim Dt, d -> 0 E( N([t,t+Dt) x B(x,y,d)) | Ht ) / (Dt πd2).

l is predictable.

Predictable is just a slight generalization of left-continuous.

If for any (t,x,y), the limit doesn't exist or is ∞, we say l doesn't exist.

l uniquely determines the distributions of any simple point process.

Page 21: Statistics 416, Applied Geostatistics

Conditional intensity, continued.

l(t,x,y) = lim Dt, d -> 0 E( N([t,t+Dt) x B(x,y,d)) | Ht ) / (Dt πd2).

l uniquely determines the distributions of any simple point process.

Daley and Vere-Jones (2003), prop. 7.2.IV.

This is an amazing fact. For many stochastic processes, you need to know the conditional mean and variance and maybe more info in order to specify the process. For Gaussian processes, you need to specify the mean, variance, and covariance of the process. For simple point processes, all you need is the conditional mean, l.

In modeling a point process, we typically assume it is simple and then just write down a model for l.

Page 22: Statistics 416, Applied Geostatistics

8. Poisson processes.

The most basic models for point processes are the Poisson processes.

If N is a simple point process with conditional intensity l, where l does not depend on what points have occurred previously, then N is a Poisson process.

The name comes from the fact that for such a process, for any set B, N(B) has a Poisson distribution.

P(N(B) = k) = e-A Ak / k! , for k = 0, 1, 2, ..., where A = ∫B l(t,x,y) dtdxdy, and with the convention 0! = 1.The mean of N(B) is A and the varianceis also A.

Page 23: Statistics 416, Applied Geostatistics

Poisson processes, continued.

A Poisson process is a simple point process with conditional intensity l, where l does not depend on what points have occurred previously.

Note that a Poisson process does not have

to be stationary. A simple point process

with l(t,x,y) = 10.5 + 2t + 4x – exp(-y)

is a Poisson process.

If l is constant for all t,x,y, and N is

simple, then N is a stationary Poisson

process, and is sometimes called

completely random.

Page 24: Statistics 416, Applied Geostatistics

Poisson processes, continued.

On the left is a stat. Poisson process with l(t,x) = 2.5 on [0,1] x [0,10],

and on the right is a Poisson process with l(t,x) = 1.5 + 10t + 2x.

The key thing about Poisson processes is their complete independence.

For a Poisson process N, N(B1) and N(B2) are independent for any disjoint sets B1 and B2.

0.0 0.2 0.4 0.6 0.8

24

68

10

t

x

0.0 0.2 0.4 0.6 0.8 1.0

02

46

810

t

x

Page 25: Statistics 416, Applied Geostatistics

More (tricky) exercises.

1. Suppose N is generated as follows. For each integer i = 1,2,..., N has a point at (i, i, i) with probability 1/i, independently of the other points, and N has no other points.

What is l(2,2,2)?

a) 2. b) 1. c) ½. d) does not exist.

0 20 40 60 80 100

020

4060

80100

t

lat

Page 26: Statistics 416, Applied Geostatistics

Exercises.

2. Suppose N is generated as follows. For each integer i = 1,2,..., N has a point at (i, i, i) with probability 1/i, independently of the other points, and N has no other points.

N is

a) simple but not orderly. b) orderly but not simple.

c) simple and orderly. d) neither simple nor orderly.

0 20 40 60 80 100

020

4060

80100

t

lat

Page 27: Statistics 416, Applied Geostatistics

Exercises.

3. Suppose N is a Poisson process with l(t,x,y) = 1.5 + 10t + 2x

on B = [0,1] x [0,10] x [0,1].

What is EN(B)?

0.0 0.2 0.4 0.6 0.8 1.0

02

46

810

t

x

Page 28: Statistics 416, Applied Geostatistics

Exercises.

3. Suppose N is a Poisson process with l(t,x,y) = 1.5 + 10t + 2x

on B = [0,1] x [0,10] x [0,1].

What is EN(B)?

∫B l(t,x,y) dt dx dy = 1.5(10) + 10(10)(12)/2 + 2(1)(102)/2 = 15+50+100=165.

0.0 0.2 0.4 0.6 0.8 1.0

02

46

810

t

x

Page 29: Statistics 416, Applied Geostatistics

Code.

## nonsimple point process n = 20x = runif(n)y = runif(n)plot(x,y,xlab="t",ylab="lat",pch=2)points(x[20],y[20],pch=3)

## nonsimple ground process plot(x,y,xlab="t",ylab="lat",pch=2)points(x[20],y[20]+.05,pch=3)

## nonorderly processplot(c(0,1),c(0,1),type="n",xlab="t",ylab="lat")n = 100for(i in 1:n) points(1/i,runif(1),pch=3,cex=.5)

Page 30: Statistics 416, Applied Geostatistics

Code.

## points at (i,i) with prob. 1/i. plot(c(0,100),c(0,100),type="n",xlab="t",ylab="lat")for(i in 1:100) if(runif(1) < 1/i) points(i,i,pch=3)

## stationary Poisson process with intensity 2.5 on B=[0,1]x[0,10].n = rpois(1,2.5*1*10)t = runif(n)x = runif(n)*10plot(t,x,pch=3)

Page 31: Statistics 416, Applied Geostatistics

Code. ## nonstationary Poisson process with intensity 1.5+10t+2x on B.n = rpois(1,15+50+100)n1 = 0t = c()x = c()while(n1<n){t2 = runif(1) ## candidate pointx2 = runif(1)*10if(runif(1) < (1.5+10*t2+2*x2)/(1.5+10+20)){ ## keep itt = c(t,t2)x = c(x,x2)n1 = n1 + 1cat(n1," ")

}}plot(t,x,pch=3)