Post on 13-Jan-2016
Compression and Analysis of Very Large Imagery Data Sets
Using Spatial Statistics
James A. Shine
George Mason University and
US Army Topographic Engineering Center
Interface 2001
June 16, 2001
ACKNOWLEDGMENTS
Dr. Margaret Oliver, University of Reading, UK
Dr. Richard Webster, Rothamsted Laboratory, UK
Dr. Daniel Carr, George Mason University
INTRODUCTION
Greater resolution in imagery data sets:
pixel resolution (1 meter; 3 x 10^6 data points/square mile)
more bands (up to 256 in hyperspectral sensors; ~10^2)
more imagery over time
Compression becomes an important part of timely analysis.
How far can an image be compressed before information is lost?
PROFESSIONAL MOTIVATION:
Collecting imagery, climatic and other topographic data
Transforming the data into maps, surfaces, and other topographic products
Determination of sampling intervals using spatial statistics is an important tool for many of our applications:
collecting ground truth
choosing training points for classification
DATA SETS
CAMIS Data Collection
Computerized Airborne Multicamera Imaging System
Four-band sensor flown in Lear jet (blue, green, red, near infrared)
Each data frame 768x576 pixels
Each flight line has 30 frames
Each collect uses 10-15 flight lines
Order of 10^7 data points per collect
Data Preprocessing
Considerable overlap in flight lines
Bands registered to each other first
Overlap removed, forming mosaic
Radiometric correction
Map registration
Ft. Story, VA
Ft. A.P. Hill, VA
SPATIAL STATISTICS
Much spatial data (such as imagery) is spatially correlated; values at points close together vary less than values at points farther apart.
Variance can be divided into a background noise (stochastic) component and a spatial component.
The variance can be modeled by plotting it against the distance between points (a variogram); the resulting model is used in many applications.
STOCHASTIC AND SPATIAL VARIATION
STOCHASTIC VARIATION IS LOCAL, BACKGROUND NOISE (NUGGET EFFECT)
SPATIAL VARIATION IS GLOBAL (SILL AND RANGE)
THE SCALE OF SPATIAL VARIATION IS ESPECIALLY IMPORTANT
VARIOGRAMS DEMONSTRATE THESE TWO VARIATIONS
HOW TO COMPUTE A VARIOGRAM
We have sample locations x_1, x_2, … and values z at each location. The semivariance for a given distance h is:

\[
\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i + h) - z(x_i) \right]^2
\]

where n(h) is the number of pairs of points a distance h apart. The semivariance is then plotted against h as shown on the next slide.
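The semivariance defined above can be sketched in a few lines of Python for a gridded image band. This is a minimal illustration, not the FORTRAN program used for the talk: the function names are hypothetical, the lag h is measured in whole pixels, pairs are taken only along rows or only along columns, and the "average" is assumed here to be the simple mean of the two directional values.

```python
def semivariance_rows(image, h):
    """Semivariance at lag h (pixels), pairing pixels within the same row."""
    total = 0.0
    n_pairs = 0
    for row in image:
        for i in range(len(row) - h):
            diff = row[i + h] - row[i]
            total += diff * diff
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no pixel pairs exist at this lag")
    # Sum of squared differences divided by twice the number of pairs.
    return total / (2.0 * n_pairs)


def semivariance_columns(image, h):
    """Same computation along columns, done by transposing the grid."""
    transposed = [list(col) for col in zip(*image)]
    return semivariance_rows(transposed, h)


def directional_variogram(image, max_lag):
    """Return a list of (h, rows, columns, average) for h = 1..max_lag.

    The average is the plain mean of the two directions (an assumption).
    """
    result = []
    for h in range(1, max_lag + 1):
        g_rows = semivariance_rows(image, h)
        g_cols = semivariance_columns(image, h)
        result.append((h, g_rows, g_cols, 0.5 * (g_rows + g_cols)))
    return result
```

Plotting the second, third, or fourth element of each tuple against h gives a row, column, or averaged variogram for the band.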
MODELING THE VARIOGRAM
The variogram is then fitted with several different models:
exponential, nested exponential
spherical, nested spherical
circular
others
The best-fitting model (minimum squared error or a similar metric) is chosen.
The model is then used to determine the scale (or scales in nested models) of variation and for interpolation and estimation.
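The model selection step can be illustrated with a small pure-Python sketch. It assumes single (non-nested) spherical and exponential structures, a grid search over candidate ranges, and an ordinary least-squares solve for the nugget and sill contribution at each candidate; the function names and the grid-search strategy are illustrative assumptions, not the procedure used for the results in this talk.

```python
import math


def spherical(h, a):
    """Unit spherical structure: rises from 0 and reaches 1 at range a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3


def exponential(h, a):
    """Unit exponential structure: approaches 1 asymptotically."""
    return 1.0 - math.exp(-h / a)


def fit_model(shape, lags, gammas, candidate_ranges):
    """Fit gamma(h) ~ c0 + c * shape(h, a) by minimum squared error.

    For each candidate range a, the nugget c0 and sill contribution c are
    found by simple linear regression (unconstrained, so c0 may come out
    negative on noisy data). Returns (sse, c0, c, a) for the best a.
    """
    n = len(lags)
    mean_g = sum(gammas) / n
    best = None
    for a in candidate_ranges:
        f = [shape(h, a) for h in lags]
        mean_f = sum(f) / n
        sff = sum((fi - mean_f) ** 2 for fi in f)
        if sff == 0.0:
            continue  # shape is constant over these lags; skip this range
        sfg = sum((fi - mean_f) * (gi - mean_g) for fi, gi in zip(f, gammas))
        c = sfg / sff
        c0 = mean_g - c * mean_f
        sse = sum((c0 + c * fi - gi) ** 2 for fi, gi in zip(f, gammas))
        if best is None or sse < best[0]:
            best = (sse, c0, c, a)
    return best
```

Calling `fit_model` once per model shape and keeping the result with the smallest squared error mirrors the "best-fitting model" choice; the fitted range `a` is the scale of variation.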
COMPARISON EXPERIMENT
Compute variogram of complete image band
Compute variograms of subsampled image band (reduced by powers of 2)
Compare the variograms, determine when curve is lost
Use this as a compression threshold
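The experiment outlined above can be sketched as follows. The sketch assumes that subsampling means keeping every k-th pixel in both directions (k = 2, 4, 8, 16 keeps 1/4, 1/16, 1/64, 1/256 of the points) and that the variograms are compared at matching ground distances. The numerical comparison (largest relative difference) is a hypothetical stand-in for a visual comparison of the curves, and all function names are illustrative.

```python
def row_semivariance(image, h):
    """Semivariance at lag h (pixels) using pairs within each row."""
    total = 0.0
    n_pairs = 0
    for row in image:
        for i in range(len(row) - h):
            d = row[i + h] - row[i]
            total += d * d
            n_pairs += 1
    return total / (2.0 * n_pairs)


def subsample(image, step):
    """Keep every step-th pixel in both directions (1/step**2 of the points)."""
    return [row[::step] for row in image[::step]]


def largest_relative_difference(image, step, distances):
    """Compare full and subsampled row variograms at the given distances.

    distances are in original pixels and must be multiples of step, so that
    a distance d corresponds to lag d // step in the subsampled image.
    Assumes the full-image semivariance is nonzero at each distance.
    """
    small = subsample(image, step)
    worst = 0.0
    for d in distances:
        full = row_semivariance(image, d)
        reduced = row_semivariance(small, d // step)
        worst = max(worst, abs(reduced - full) / full)
    return worst
```

Running the comparison for increasing `step` and noting where the difference grows sharply gives one possible way to express the compression threshold as a number.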
COMPUTING A FULL IMAGE VARIOGRAM
Data transferred from imagery to text file (ERDAS Imagine, Arc/Info)
Modified FORTRAN program
Running time: approx. 1 hour per 4 x 10^6 points
Only 2 directions (N-S and E-W)
Current algorithm O(n^2), may be reducible
Details: Shine, JSM 2000
Ft. Story full image variograms
[Figure: nine variogram panels, FT. STORY BAND 1, BAND 2 and BAND 3, each shown for ROWS, COLUMNS and AVERAGE. x-axis: DISTANCE (0 to 1000); y-axis: GAMMA (0 to 5000 for band 1, 0 to 4000 for band 2, 0 to 2500 for band 3).]
THEORETICAL VARIOGRAM MODELS
[Figure: four panels, NUGGET MODEL, LINEAR MODEL, SPHERICAL MODEL and EXPONENTIAL MODEL. x-axis: h (0 to 30); y-axis: gamma.]
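For reference, the four model shapes named on this slide are commonly written as below. The symbols are notation assumed here, not taken from the slides: c_0 is the nugget, c the sill contribution of the spatial structure, a the range parameter, and b the slope of the linear model. Some texts write the exponential model with 3h/a in the exponent so that a is the practical range; the plain h/a form is shown.

```latex
% Nugget (pure noise): constant for any nonzero lag
\gamma(h) = c_0, \qquad h > 0

% Linear: unbounded, no sill
\gamma(h) = c_0 + b\,h

% Spherical: reaches the sill c_0 + c exactly at the range a
\gamma(h) =
\begin{cases}
  c_0 + c\left[\dfrac{3h}{2a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^{3}\right], & 0 < h \le a \\[2ex]
  c_0 + c, & h > a
\end{cases}

% Exponential: approaches the sill c_0 + c asymptotically
\gamma(h) = c_0 + c\left[1 - \exp\!\left(-\dfrac{h}{a}\right)\right]
```

A nested model is the sum of two or more such structures with different ranges, which is how more than one scale of variation is represented.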
A NESTED VARIOGRAM MODEL
[Figure: DOUBLE EXPONENTIAL MODEL. x-axis: distance (0 to 30); y-axis: gamma (0.5 to 2.0); three curves plotted with the symbols +, o and X.]
Ft. A.P. Hill full image variograms
BAND 1
COMPRESSION ANALYSIS
Start with full variogram
Reduce the sample to 1/4 of its size at each successive step (1/4, 1/16, 1/64, 1/256)
Compare resulting variograms
EXAMPLE RESULT: A.P. HILL, BAND 1
[Figure: variogram panels labeled FULL, ADD 1/4, ADD 1/16, ADD 1/64 and ADD 1/256.]
FULL (ORANGE) AND 1/256 (BLUE) IMAGES SUPERIMPOSED
CONCLUSIONS
Preliminary results show little degradation in the variogram at a 256-fold reduction
This seems to indicate that the image can be compressed by a factor of ~10^2 without affecting the results of spatial statistical analysis
Computing time savings: hours to minutes
FUTURE WORK
Optimize variogram code
Finish tests on other Ft. A.P. Hill and Ft. Story imagery bands
Compare other available CAMIS imagery
Obtain general rule for achievable compression for obtaining a spatial correlation model from 1-meter imagery
Perform other image analysis operations on original and compressed images and compare.