
Compression and Analysis of Very Large Imagery Data Sets Using Spatial Statistics

James A. Shine

George Mason University and

US Army Topographic Engineering Center

Interface 2001

June 16, 2001

ACKNOWLEDGMENTS

Dr. Margaret Oliver, University of Reading, UK

Dr. Richard Webster, Rothamsted Laboratory, UK

Dr. Daniel Carr, George Mason University

INTRODUCTION

Greater resolution in imagery data sets:

pixel resolution (1 meter; ~3 x 10^6 data points per square mile)

more bands (up to 256 in hyperspectral sensors, i.e., on the order of 10^2 bands)

more imagery over time

Compression therefore becomes an important part of timely analysis. How far can an image be compressed before information is lost?

PROFESSIONAL MOTIVATION:

Collecting imagery, climatic and other topographic data

Transforming the data into maps, surfaces, and other topographic products

Determining sampling intervals with spatial statistics is an important tool for many of our applications:

collecting ground truth

choosing training points for classification

DATA SETS

CAMIS Data Collection

Computerized Airborne Multicamera Imaging System

Four-band sensor flown in a Learjet (blue, green, red, near infrared)

Each data frame is 768 x 576 pixels

Each flight line has 30 frames

Each collect uses 10-15 flight lines

Order of 10^7 data points per collect

Data Preprocessing

Considerable overlap in flight lines

Bands registered to each other first

Overlap removed, forming a mosaic

Radiometric correction

Map registration

Ft. Story, VA

Ft. A.P. Hill, VA

SPATIAL STATISTICS

Much spatial data (such as imagery) is spatially correlated: points close together differ less, on average, than points farther apart, so the variance of their differences is lower.

The variance can be divided into a stochastic component (background noise) and a spatial component.

The variance can be modeled by plotting it against the distance between points (the variogram); the model is then used for many applications.

STOCHASTIC AND SPATIAL VARIATION

STOCHASTIC VARIATION IS LOCAL, BACKGROUND NOISE (NUGGET EFFECT)

SPATIAL VARIATION IS GLOBAL (SILL AND RANGE)

THE SCALE OF SPATIAL VARIATION IS ESPECIALLY IMPORTANT

VARIOGRAMS DEMONSTRATE THESE TWO VARIATIONS

HOW TO COMPUTE A VARIOGRAM

We have sample locations x1, x2, … and a value z(x) at each location. The semivariance for a given distance h is estimated as:

\gamma^*(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i) - z(x_i + h) \right]^2

where n(h) is the number of pairs of points a distance h apart. The semivariance is then plotted against h as shown on the next slide.
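A minimal Python sketch of the estimator above, assuming the band is a regular grid of pixel values stored as a list of rows and that distance is measured in whole pixels. It computes the semivariance along rows (E-W) and along columns (N-S), the two directions used for the full-image variograms. The function names are hypothetical, and this is not the author's FORTRAN program.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def directional_semivariance(grid: Grid, h: int, direction: str) -> float:
    """Estimate gamma*(h) = 1/(2 n(h)) * sum of [z(x_i) - z(x_i + h)]^2.

    direction="EW" pairs pixels h columns apart within each row;
    direction="NS" pairs pixels h rows apart within each column.
    """
    if h < 1:
        raise ValueError("lag h must be a positive integer")
    n_rows, n_cols = len(grid), len(grid[0])
    if direction == "EW":
        dr, dc = 0, h  # same row, h columns further along
    elif direction == "NS":
        dr, dc = h, 0  # same column, h rows further along
    else:
        raise ValueError("direction must be 'EW' or 'NS'")

    total = 0.0
    n_pairs = 0
    for r in range(n_rows - dr):
        for c in range(n_cols - dc):
            diff = grid[r][c] - grid[r + dr][c + dc]
            total += diff * diff
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("lag h is too large for this grid")
    return total / (2 * n_pairs)


def empirical_variogram(grid: Grid, max_lag: int,
                        direction: str) -> List[Tuple[int, float]]:
    """Return (h, gamma*(h)) for h = 1 .. max_lag in one direction."""
    return [(h, directional_semivariance(grid, h, direction))
            for h in range(1, max_lag + 1)]


if __name__ == "__main__":
    # Toy band: values rise by 1 per pixel west to east, constant north to south.
    band = [[0, 1, 2, 3],
            [0, 1, 2, 3]]
    print(empirical_variogram(band, 2, "EW"))  # [(1, 0.5), (2, 2.0)]
    print(empirical_variogram(band, 1, "NS"))  # [(1, 0.0)]
```

In the toy band every E-W pair at lag h differs by exactly h, so the E-W semivariance is h^2/2, while all N-S pairs are identical and give zero.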

MODELING THE VARIOGRAM

The variogram is then fit with several different models: exponential, nested exponential, spherical, nested spherical, circular, and others.

The best-fitting model (minimum squared error or a similar metric) is chosen.

The model is then used to determine the scale (or scales in nested models) of variation and for interpolation and estimation.
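One way to carry out the fit described above, sketched under assumptions: each candidate model is written as a nugget c0 plus a sill component c1 times a unit-sill shape with range parameter a; for each trial value of a the two linear coefficients are found by ordinary least squares, and the candidate with the smallest sum of squared errors is kept. Only the spherical and exponential shapes are included, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

# Unit-sill shapes f(h; a); the fitted model is gamma(h) = c0 + c1 * f(h; a).


def spherical(h: float, a: float) -> float:
    """Spherical shape with range a: reaches its sill exactly at h = a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3


def exponential(h: float, a: float) -> float:
    """Exponential shape with distance parameter a (about 95% of sill at 3a)."""
    return 1.0 - math.exp(-h / a)


SHAPES: Dict[str, Callable[[float, float], float]] = {
    "spherical": spherical,
    "exponential": exponential,
}


def fit_for_range(hs: Sequence[float], gs: Sequence[float],
                  shape: Callable[[float, float], float],
                  a: float) -> Optional[Tuple[float, float, float]]:
    """Least-squares c0, c1 for a fixed a via the 2x2 normal equations."""
    fs = [shape(h, a) for h in hs]
    n = len(fs)
    sf, sg = sum(fs), sum(gs)
    sff = sum(f * f for f in fs)
    sfg = sum(f * g for f, g in zip(fs, gs))
    det = n * sff - sf * sf
    if abs(det) < 1e-12:  # shape constant over hs: c0 and c1 not separable
        return None
    c1 = (n * sfg - sf * sg) / det
    c0 = (sg - c1 * sf) / n
    sse = sum((c0 + c1 * f - g) ** 2 for f, g in zip(fs, gs))
    return c0, c1, sse


def fit_best_model(hs: Sequence[float], gs: Sequence[float],
                   candidate_ranges: Iterable[float]
                   ) -> Optional[Tuple[str, float, float, float, float]]:
    """Grid-search a for every shape; return (name, a, c0, c1, sse) with least SSE."""
    ranges = list(candidate_ranges)  # reused once per shape
    best = None
    for name, shape in SHAPES.items():
        for a in ranges:
            fit = fit_for_range(hs, gs, shape, a)
            if fit is not None and (best is None or fit[2] < best[4]):
                best = (name, a, *fit)
    return best
```

Solving for c0 and c1 in closed form at each trial range keeps the search one-dimensional. A fuller version would add the nested and circular models, weight each lag by n(h), and constrain c0 and c1 to be non-negative, none of which this sketch does.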

COMPARISON EXPERIMENT

Compute the variogram of the complete image band

Compute variograms of the subsampled image band (reduced by powers of 2)

Compare the variograms and determine when the shape of the curve is lost

Use this as a compression threshold
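The subsampling step can be sketched as simple decimation. This is an assumption: the slides say only that the band is "reduced by powers of 2" and that the sample is reduced "by 1/4 successively", and keeping every 2^k-th pixel in both directions matches those fractions. A lag of h pixels in the reduced band then corresponds to h x 2^k pixels (meters, for 1-meter imagery) in the original, which is the rescaling needed before the two variograms can share a distance axis. The function names are hypothetical.

```python
from typing import List, Sequence


def subsample(grid: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Keep every (2**k)-th pixel along both rows and columns.

    Halving the pixel count in each direction cuts the number of points by 4,
    so k = 1, 2, 3, 4 keeps roughly 1/4, 1/16, 1/64 and 1/256 of the band.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    step = 2 ** k
    return [list(row[::step]) for row in grid[::step]]


def ground_lag(h_pixels: int, k: int, pixel_size_m: float = 1.0) -> float:
    """Ground distance of an h-pixel lag in the level-k subsampled band."""
    return h_pixels * (2 ** k) * pixel_size_m


if __name__ == "__main__":
    # Hypothetical 16 x 16 band; values only make the kept indices visible.
    band = [[16 * r + c for c in range(16)] for r in range(16)]
    for k in range(5):
        reduced = subsample(band, k)
        n_points = len(reduced) * len(reduced[0])
        print(k, n_points, ground_lag(1, k))
```

Under this scheme the 1/256 reduction keeps one pixel in 16 along each axis, so adjacent pixels in the reduced band are 16 m apart on the ground.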

COMPUTING A FULL IMAGE VARIOGRAM

Data transferred from imagery to text file (ERDAS Imagine, Arc/Info)

Modified FORTRAN program

Running time: approx. 1 hour per 4 x 10^6 points

Only 2 directions (N-S and E-W)

Current algorithm is O(n^2); may be reducible

Details: Shine, JSM 2000

Ft. Story full image variograms

[Figure: gamma (semivariance) vs. distance for FT. STORY BAND 1, BAND 2 and BAND 3, each shown for ROWS, COLUMNS and AVERAGE]

THEORETICAL VARIOGRAM MODELS

[Figure: gamma vs. h for the NUGGET MODEL, LINEAR MODEL, SPHERICAL MODEL and EXPONENTIAL MODEL]

A NESTED VARIOGRAM MODEL

[Figure: DOUBLE EXPONENTIAL MODEL, gamma vs. distance, with three plotted curves (+, o and X)]
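For reference, the models named on these slides are written below in their common textbook forms, with nugget c_0, sill component c, range a, exponential distance parameter r, and linear slope b. This parameterization is an assumption; the slides do not state which form the fitting program used.

```latex
\begin{aligned}
\text{nugget:}\quad & \gamma(h) = c_0 \ \ (h > 0), \qquad \gamma(0) = 0 \\
\text{linear:}\quad & \gamma(h) = c_0 + b\,h \\
\text{spherical:}\quad & \gamma(h) =
  \begin{cases}
    c_0 + c\left[\dfrac{3h}{2a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^{3}\right], & 0 < h \le a \\
    c_0 + c, & h > a
  \end{cases} \\
\text{exponential:}\quad & \gamma(h) = c_0 + c\left[1 - \exp\left(-\dfrac{h}{r}\right)\right] \\
\text{double exponential:}\quad & \gamma(h) = c_0
  + c_1\left[1 - \exp\left(-\dfrac{h}{r_1}\right)\right]
  + c_2\left[1 - \exp\left(-\dfrac{h}{r_2}\right)\right]
\end{aligned}
```

The exponential model approaches its sill only asymptotically, reaching about 95% of it at h = 3r. In the nested (double exponential) form, r_1 and r_2 describe two distinct scales of spatial variation and the total sill is c_0 + c_1 + c_2.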

Ft. A.P. Hill full image variograms

BAND 1

COMPRESSION ANALYSIS

Start with full variogram

Reduce sample by ¼ successively

Compare resulting variograms
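The slides do not say how loss of the curve was judged, so the comparison step is sketched here with an assumed metric: the largest relative difference between the full and reduced variograms at ground distances both were evaluated at, checked against a hypothetical 10% tolerance. The reduction factors (4, 16, 64, 256) follow the slides; the metric, the tolerance and the function names are illustrative only.

```python
from typing import Dict, Mapping


def max_relative_difference(full: Mapping[float, float],
                            reduced: Mapping[float, float]) -> float:
    """Largest |gamma_reduced(d) - gamma_full(d)| / gamma_full(d) over shared d."""
    ratios = [abs(reduced[d] - full[d]) / full[d]
              for d in set(full) & set(reduced) if full[d] > 0]
    if not ratios:
        raise ValueError("no shared distances with positive semivariance")
    return max(ratios)


def largest_acceptable_reduction(full: Mapping[float, float],
                                 reduced_by_factor: Dict[int, Mapping[float, float]],
                                 tolerance: float = 0.10) -> int:
    """Walk the reduction factors in increasing order and return the last one
    whose variogram stays within `tolerance` of the full variogram.
    Returns 1 (no reduction) if even the first factor fails."""
    accepted = 1
    for factor in sorted(reduced_by_factor):
        if max_relative_difference(full, reduced_by_factor[factor]) > tolerance:
            break
        accepted = factor
    return accepted
```

Each variogram here is a mapping from ground distance to semivariance, for example built with a directional estimator after rescaling the reduced band's lags; the loop stops at the first factor that fails, mirroring the successive 1/4 reductions.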

EXAMPLE RESULT: A.P. HILL, BAND 1

FULL

ADD 1/4

ADD 1/16

ADD 1/64

ADD 1/256

FULL (ORANGE) AND 1/256 (BLUE) IMAGES SUPERIMPOSED

CONCLUSIONS

Preliminary results show little degradation in the variogram at a 256-fold reduction

This suggests that an image can be compressed by a factor of ~10^2 without affecting the results of spatial statistical analysis

Computing time savings: hours to minutes

FUTURE WORK

Optimize variogram code

Finish tests on the other Ft. A.P. Hill and Ft. Story imagery bands

Compare other available CAMIS imagery

Obtain a general rule for the compression achievable when deriving a spatial correlation model from 1-meter imagery

Perform other image analysis operations on original and compressed images and compare.
