8/11/2019 StatGIS2009-PG1-revision.pdf
Automatic Classification of Landsat Timeseries
using Geostatistics and Discriminant Analysis
Pierre Goovaerts¹, Susan K. Maxwell², Jaymie R. Meliker³
¹BioMedware
²U.S. Geological Survey (USGS) Earth Resource Observation and Science (EROS) Center
³Stony Brook University, Department of Preventive Medicine
Abstract. This paper presents a new supervised classification algorithm that combines classical discriminant analysis with space-time interpolation of probability residuals. This approach allows the automatic segmentation of satellite imagery while using available classification maps (e.g. crop maps) to correct the results a posteriori. The methodology is illustrated using a 6-year Landsat time series and six types of land cover: urban, water, non-agricultural (e.g. forest), corn, soybean, and other crops. The validation demonstrates the greater accuracy of the proposed approach over a traditional discriminant analysis: the proportion of correctly classified pixels ranges from 67% to 96% of the image depending on the land cover, with better results obtained when the last three land cover types (corn, soybean, and other crops) are merged into a single crop category. Furthermore, the geostatistical approach attenuates the pepper-and-salt effect in the classified maps. This classification can be used in environmental and health studies to estimate long-term exposures to pesticides.
1 INTRODUCTION
A new generation of remote sensing hardware is providing high spatial and temporal resolution imagery that documents the environment with unprecedented spatial and spectral detail. This quantum leap in resolution has enormous potential for improving our ability to document, monitor and model environmental risks thought to be associated with adverse health outcomes. In particular, Maxwell et al. (2009) summarized how remotely sensed land surface data are currently being used to study the relationship between cancer and environmental contaminants.
Landscape-based cancer studies require mid-resolution (10-100m) or high-resolution
(
The development of the classification algorithm described in this paper was based on the assumption that some classified scenes are available to calibrate the relationship between spectral values and land cover. This is a reasonable assumption for crops, since land cover/land use maps are now available from USGS, as well as crop maps from USDA for some states. Figure 1 illustrates a typical situation whereby a time series of satellite images, possibly with missing years/seasons or missing pixels for a monitored time, coexists with a few classified scenes (e.g. 2001, 2003). The objective is to create a time series of land cover maps for all years. A geostatistical pixel-based classification was adopted for the following reasons: 1) the approach can be fully automated, 2) the algorithm allows filling in gaps through space-time interpolation, 3) a similar approach generated high-accuracy classifications in an earlier analysis of hyperspectral data (Goovaerts, 2002), and 4) the probabilistic classification provides a measure of uncertainty that can be propagated through exposure assessment.
The methodology is illustrated using a 6-year Landsat time series (2000-2005) for an area NW of Des Moines, Iowa (Figure 1). Data were collected in both spring and summer. The scene comprises 801×801 pixels of 30 meters, and the analysis focused on three spectral bands and six types of land cover: urban, water, non-agricultural (e.g. forest), corn, soybean, and other crops. The relative area of the scene covered by each land cover is fairly constant across time; in percent, starting with the largest share: non-agricultural (20.3-26.3), urban (22.0-23.0), corn (19.0-22.2), soybean (19.1-21.8), water (9.5-9.8), and other crops (0.9-2.8).
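As a small illustration of how such relative areas can be tallied, the Python sketch below counts the pixels of a classified scene per land cover and converts the counts to percentages, largest share first. The function name `class_percentages` and the toy 4×4 scene are hypothetical; they are not taken from the Iowa data.

```python
from collections import Counter


def class_percentages(labels):
    """Return {class: percent of pixels}, ordered from largest to smallest share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.most_common()}


# Hypothetical 4x4 classified scene (not the Iowa data)
scene = [
    ["corn", "corn", "soybean", "water"],
    ["corn", "urban", "soybean", "water"],
    ["non-ag", "urban", "soybean", "non-ag"],
    ["non-ag", "urban", "other crops", "non-ag"],
]
flat = [label for row in scene for label in row]
shares = class_percentages(flat)
print(shares)
```

Applied to each year's map, the minimum and maximum share per class give ranges of the kind listed above.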
2 METHODOLOGY
The classification algorithm proceeds as follows:
1. Factorial kriging is used to filter the noise in the image and interpolate any missing reflectance values, such as cloudy pixels or missing years; see Figure 2.
2. For the two years where a classification is available, the filtered values undergo a linear discriminant analysis (DA, PROC DISCRIM in SAS). Based on this model, one can compute, for each time where reflectance values are known or were estimated at Step 1, the probability of occurrence of each land cover of interest (e.g. corn, soybean, urban). Each pixel is then assigned to the most likely category (see example for 4 categories in Figure 3, bottom left map).
3. For the two years where a classification is available, probability residuals are computed as the difference between the DA-based probability and the observed probability (0 or 1) of occurrence (Figure 3). Probability residuals are then interpolated to unmonitored times and locations using space-time kriging and variogram models that are inferred automatically.
4. Interpolated residuals are combined with the DA-based probabilities at unmonitored times, leading to a posterior probability of occurrence for each land cover of interest (see example for crop in Figure 4).
5. Each pixel is assigned to the land cover with the largest probability of occurrence (maximum-likelihood classification), and the classification uncertainty is quantified using the entropy of the probability distribution (Figure 5).
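Step 2 can be sketched in plain Python under stated assumptions: a linear discriminant analysis with a pooled within-class covariance matrix, priors proportional to the number of training pixels per class (an assumption; other priors are possible), and posterior probabilities obtained from the generalized squared distance. This is not the SAS PROC DISCRIM code used by the authors; the helper names (`mat_inverse`, `fit_lda`, `lda_probabilities`) and the toy two-band training pixels are hypothetical.

```python
import math


def mat_inverse(a):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_lda(samples):
    """samples: {class: [feature vector, ...]}, e.g. filtered TM 3/4/5 reflectances."""
    n_total = sum(len(v) for v in samples.values())
    d = len(next(iter(samples.values()))[0])
    means = {k: [sum(x[i] for x in v) / len(v) for i in range(d)] for k, v in samples.items()}
    # Pooled within-class covariance matrix (shared by all classes in linear DA)
    s = [[0.0] * d for _ in range(d)]
    for k, v in samples.items():
        for x in v:
            dev = [x[i] - means[k][i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += dev[i] * dev[j]
    dof = n_total - len(samples)
    s = [[val / dof for val in row] for row in s]
    priors = {k: len(v) / n_total for k, v in samples.items()}  # assumed proportional priors
    return means, mat_inverse(s), priors


def lda_probabilities(x, model):
    """Posterior probability of each class for one pixel's feature vector x."""
    means, s_inv, priors = model
    d = len(x)
    scores = {}
    for k, mu in means.items():
        diff = [x[i] - mu[i] for i in range(d)]
        # Squared Mahalanobis distance to the class mean
        d2 = sum(diff[i] * s_inv[i][j] * diff[j] for i in range(d) for j in range(d))
        # Score = -0.5 * generalized squared distance, i.e. -0.5 * (d2 - 2 ln prior)
        scores[k] = -0.5 * d2 + math.log(priors[k])
    top = max(scores.values())
    weights = {k: math.exp(v - top) for k, v in scores.items()}  # numerically stable softmax
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical two-band training pixels for two classes
training = {
    "corn": [[0.10, 0.40], [0.12, 0.45], [0.08, 0.42], [0.11, 0.38]],
    "water": [[0.05, 0.02], [0.06, 0.03], [0.04, 0.01], [0.05, 0.04]],
}
model = fit_lda(training)
probs = lda_probabilities([0.10, 0.41], model)
label = max(probs, key=probs.get)  # most likely category
```

With three bands and six classes the same functions apply unchanged; only the feature vectors and class dictionary grow.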
Figure 1: Imagery time series used to illustrate and validate the classification algorithm: three spectral bands (TM 3, 4, 5) recorded in spring and summer for six years and the USDA CDL maps (urban, water, non-agricultural, corn, soybean, and other crops) for the corresponding years. Red scenes correspond to the two seasons where Landsat imagery was not available. Only the 2001 and 2003 images were used to calibrate the relationships between spectral values and land cover, while the four other years were set aside for validation.
Figure 2: Geostatistical space-time interpolation of reflectance values to fill in the spatial and temporal gaps in the time series of Landsat images. In this particular example, the missing image of Summer 2002 is reconstructed. Filter weights are computed using space-time ordinary kriging and a separable space-time covariance obtained as the product of the spatial covariance and the temporal correlation. Note the strong correlation between summer images for different years, while the correlation is only 0.398 with the spring image of the same year.
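The kriging system behind Figure 2 can be sketched as follows, assuming an exponential spatial covariance (practical range `rng`) multiplied by a user-supplied table of temporal correlations between image dates. Apart from the 0.398 spring/summer 2002 correlation quoted in the caption, all correlation values, coordinates and reflectances are hypothetical, as are the helper names. The sketch only covers gap filling; the noise-filtering part of factorial kriging (removing, e.g., a nugget component from the estimate) is not shown.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def st_cov(p, q, sill, rng, time_corr):
    """Separable covariance: exponential spatial covariance times temporal correlation."""
    (x1, y1, t1), (x2, y2, t2) = p, q
    h = math.hypot(x1 - x2, y1 - y2)
    return sill * math.exp(-3.0 * h / rng) * time_corr[(t1, t2)]


def ok_estimate(data, target, sill, rng, time_corr):
    """Space-time ordinary kriging of target (x, y, t) from data [(x, y, t, value), ...]."""
    pts = [d[:3] for d in data]
    n = len(pts)
    a = [[st_cov(pts[i], pts[j], sill, rng, time_corr) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness: kriging weights sum to one
    b = [st_cov(p, target, sill, rng, time_corr) for p in pts] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * d[3] for w, d in zip(weights, data)), weights


def symmetric(pairs, labels):
    """Build a symmetric temporal-correlation lookup with ones on the diagonal."""
    corr = {(t, t): 1.0 for t in labels}
    for (s, t), r in pairs.items():
        corr[(s, t)] = corr[(t, s)] = r
    return corr


# Hypothetical correlations between image dates (only 0.398 comes from Figure 2)
time_corr = symmetric(
    {("sum2001", "sum2002"): 0.90, ("sum2003", "sum2002"): 0.90,
     ("sum2001", "sum2003"): 0.85, ("spr2002", "sum2002"): 0.398,
     ("spr2002", "sum2001"): 0.35, ("spr2002", "sum2003"): 0.35},
    ["sum2001", "sum2002", "sum2003", "spr2002"],
)
# Hypothetical reflectances (x, y in meters; one 30 m neighbor)
data = [(0, 0, "sum2001", 0.30), (0, 0, "sum2003", 0.34),
        (0, 0, "spr2002", 0.12), (30, 0, "sum2001", 0.28)]
estimate, weights = ok_estimate(data, (0, 0, "sum2002"), 1.0, 300.0, time_corr)
```

The weights sum to one by construction, and placing the target on an existing observation returns that observation, since ordinary kriging is an exact interpolator.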
Figure 3: Comparison of the true land cover map and results of the discriminant analysis (DA) for 2001. Misclassification errors are caused by an overestimation or underestimation, by the DA-based probability, of the true probability of occurrence of each class (0 or 1); this discrepancy can be expressed as a probability residual. Figure 4 shows an example for the crop category.
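Following Step 3 of the methodology, a residual is the DA-based probability minus the 0/1 indicator of the class observed in the reference map, so a negative value flags an underestimated class and a positive value an overestimated one. A minimal sketch for a single pixel, with a hypothetical function name, class names and probabilities:

```python
def probability_residuals(da_probs, observed):
    """Residual per class: DA-based probability minus the 0/1 observed indicator."""
    return {k: p - (1.0 if k == observed else 0.0) for k, p in da_probs.items()}


# Hypothetical pixel: DA favors "non-ag" but the reference map says "urban"
res = probability_residuals({"urban": 0.35, "non-ag": 0.55, "crop": 0.10}, "urban")
print(res)  # urban is underestimated (negative), non-ag and crop overestimated
```

Because the DA probabilities and the indicators each sum to one, the residuals of a pixel sum to zero.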
Figure 4: Use of the kriged probability residuals to update the probability of occurrence
of the crop category derived from the discriminant analysis conducted in 2001 and 2003.
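One plausible way to carry out this update, sketched below, is to subtract the kriged residual from the DA-based probability (since a residual is the DA probability minus the observed indicator), then clip the results to [0, 1] and rescale them to sum to one. The clipping and rescaling step is an assumption borrowed from common indicator-kriging practice, needed because residuals kriged separately for each class need not sum to zero; the function name and values are hypothetical.

```python
def update_probabilities(da_probs, kriged_residuals):
    """Posterior probabilities from DA probabilities and kriged residuals (DA minus observed)."""
    raw = {k: p - kriged_residuals.get(k, 0.0) for k, p in da_probs.items()}
    # Assumed order-relation correction: clip to [0, 1], then rescale to sum to one
    clipped = {k: min(1.0, max(0.0, v)) for k, v in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return dict(da_probs)  # degenerate case: keep the DA probabilities
    return {k: v / total for k, v in clipped.items()}


# Hypothetical pixel at an unmonitored time: nearby residuals point toward "urban"
post = update_probabilities({"urban": 0.35, "non-ag": 0.55, "crop": 0.10},
                            {"urban": -0.40, "non-ag": 0.35, "crop": 0.05})
best = max(post, key=post.get)
```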
Figure 5: Maximum likelihood classification of the Landsat time series before and after updating of the discriminant analysis results using spatio-temporal interpolation of probability residuals computed for 2001 and 2003. Note how the updating attenuates the pepper-and-salt effect in the classified map.
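The maximum-likelihood assignment and the entropy-based uncertainty of Step 5 reduce to a few lines per pixel. The sketch below uses the Shannon entropy with natural logarithms and also reports it divided by ln(K) for K classes, so that 0 means a certain classification and 1 a uniform distribution; that scaling, the function name and the example probabilities are assumptions, not necessarily the choices made for Figure 5.

```python
import math


def classify_with_entropy(probs):
    """Return (most likely class, Shannon entropy in nats, entropy divided by ln K)."""
    label = max(probs, key=probs.get)
    h = -sum(p * math.log(p) for p in probs.values() if p > 0.0)
    k = len(probs)
    h_rel = h / math.log(k) if k > 1 else 0.0
    return label, h, h_rel


# Hypothetical pixels: a confident one and a fully ambiguous one
confident = classify_with_entropy({"corn": 0.9, "soybean": 0.08, "urban": 0.02})
ambiguous = classify_with_entropy({"corn": 1 / 3, "soybean": 1 / 3, "urban": 1 / 3})
```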
Table 1: Results of the validation study: percentages of pixels correctly classified into each of the four main types of land cover. The last row for each year gives results obtained when urban and water pixels are not used in the discriminant analysis.

Year  Method                      Crops  Non-ag.  Urban  Water
2000  Discriminant Analysis (DA)   93.2     57.3   54.5   61.5
      DA + ST kriging              94.7     69.0   87.2   83.6
      Analysis w/o urban, water    86.2     67.4      -      -
2002  Discriminant Analysis (DA)   96.3     24.3   28.5   62.3
      DA + ST kriging              95.4     71.6   89.9   91.7
      Analysis w/o urban, water    96.1     72.4      -      -
2004  Discriminant Analysis (DA)   83.3     36.8   57.8   78.0
      DA + ST kriging              85.3     73.4   92.9   95.2
      Analysis w/o urban, water    86.4     81.6      -      -
2005  Discriminant Analysis (DA)   85.4     25.8   58.9   75.9
      DA + ST kriging              84.8     61.3   94.8   95.8
      Analysis w/o urban, water    86.2     69.9      -      -
Table 2: Confusion matrix computed for the proposed approach and traditional DA (2nd row) for year 2005: actual and predicted numbers of pixels for each land cover.
Land cover Actual
Predicted Corn Soybean Other crops Non-ag Urban Water
Corn
Soybean
Other crops
Non-ag
Urban
Water
110477
113023
6650
4934
1826
7274
10313
5251
679
880379
987
18424
33986
87435
71444
2823
8906
6147
2174
499
621471
779
985
1119
604
770
732
2691
2882
497
60
144108
186
21505
22235
14552
15039
10549
53730
97550
41460
12062
167012812
11399
2732
5602
2622
4482
268
26520
858
18083
130910
83197686
3310
703
1378
206
348
148
1711
999
9644
373
84255012
43852
3 MODEL ASSESSMENT
A validation study was conducted to quantify whether the visual improvement noticed in the updated classifications of Figure 5 (fewer isolated pixels) and the reduction in uncertainty translate into a greater accuracy of the classified maps. USDA CDL (Cropland Data Layer) maps for the same acquisition years were used as the reference classification for the validation, although the lower accuracy of the urban and water classes forced us to retain only the pixels that were consistently urban and water for the period 2002-2005. The validation was conducted for both four and six classes (crops category split into corn, soybean and small grains) and two different approaches: (1) straightforward discriminant analysis (reference method), and (2) discriminant analysis followed by geostatistical updating of probabilities using space-time kriging (proposed approach). The latter approach was also repeated after excluding from the discriminant analysis the water and urban classes, which can be obtained from other sources (e.g. USGS NLCD maps).
The performance criterion is the percentage of correctly classified pixels (Table 1). The validation demonstrated the greater accuracy of the proposed approach over a traditional discriminant analysis: the proportion of correctly classified pixels ranges from 67% to 96% of the image depending on the land cover, with higher accuracies obtained when fewer classes are considered. The average proportion of correctly classified non-agricultural pixels is 80%, except for the very dry year of 2002 where a majority of those pixels were classified as crops. The classification of water is very good too, from 64 to 99% of correct classification. The detection of urban pixels remains problematic for the reference approach, with a majority of them being classified as non-agricultural (see confusion matrix in Table 2), which is consistent with the pepper-and-salt effect visible in those maps. A very important result is that the space-time kriging of residuals almost systematically improves the accuracy and precision (smaller entropy) of the prediction.
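The performance criterion can be computed as sketched below, interpreting the percentage for each class as the share of reference pixels of that class that are predicted correctly; an optional mask mimics retaining only consistently urban or water pixels. The function name, the mask mechanism and the toy labels are hypothetical.

```python
from collections import Counter


def validation_summary(reference, predicted, mask=None):
    """Confusion counts and percentage of reference pixels correctly classified per class.

    mask (optional) keeps only selected pixels, e.g. consistently urban or water ones.
    """
    if mask is None:
        mask = [True] * len(reference)
    pairs = [(r, p) for r, p, keep in zip(reference, predicted, mask) if keep]
    confusion = Counter(pairs)  # (actual, predicted) -> number of pixels
    totals = Counter(r for r, _ in pairs)
    percent = {c: 100.0 * confusion[(c, c)] / n for c, n in totals.items()}
    return confusion, percent


# Hypothetical reference (e.g. CDL) and predicted labels for ten pixels
reference = ["urban"] * 4 + ["water"] * 2 + ["crop"] * 4
predicted = ["urban", "non-ag", "urban", "urban", "water", "water",
             "crop", "crop", "non-ag", "crop"]
confusion, percent = validation_summary(reference, predicted)
```

The off-diagonal entries of `confusion`, such as urban pixels predicted as non-agricultural, correspond to the kind of errors tabulated in a confusion matrix like Table 2.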
4 ACKNOWLEDGEMENTS
This research was funded by contract N44-PC-95008 from the National Cancer Institute.
The views stated in this publication are those of the authors and do not necessarily represent the official views of the NCI.
5 REFERENCES
Goovaerts, P. (2002). Geostatistical incorporation of spatial coordinates into supervised classification of hyperspectral data. Journal of Geographical Systems 4(1): 99-111.
Maxwell, S.K., J.R. Meliker, et al. (2009). Use of land surface remotely sensed satellite and airborne data for environmental exposure assessment in cancer research. Journal of Exposure Science and Environmental Epidemiology, in press.
Nuckols, J.R., M.H. Ward, et al. (2004). Using Geographic Information Systems for Exposure Assessment in Environmental Epidemiology Studies. Environmental Health Perspectives 112: 1007-101