statgis2009-pg1-revision.pdf


Automatic Classification of Landsat Timeseries

using Geostatistics and Discriminant Analysis

Pierre Goovaerts1, Susan K. Maxwell2, Jaymie R. Meliker3

1BioMedware

[email protected]

2U.S. Geological Survey (USGS) Earth Resource Observation and Science (EROS) Center

[email protected]

3Stony Brook University, Department of Preventive Medicine

[email protected]

Abstract. This paper presents a new supervised classification algorithm that combines
classical discriminant analysis with space-time interpolation of probability residuals.
This approach allows the automatic segmentation of satellite imagery while using
available classification maps (e.g. crop maps) to correct the results a posteriori. The
methodology is illustrated using a 6-year Landsat time series and six types of land
cover: urban, water, non-agricultural (e.g. forest), corn, soybean, and other crops. The
validation demonstrates the greater accuracy of the proposed approach over a traditional
discriminant analysis: the proportion of correctly classified pixels ranges from 67% to
96% of the image depending on the land cover, with better results obtained when the last
three land cover types are merged into a single crop category. Furthermore, the
geostatistical approach attenuates the pepper-and-salt effect in the classified maps.
This classification can be used in environmental and health studies to estimate
long-term exposures to pesticides.

1 INTRODUCTION

A new generation of remote sensing hardware is providing high spatial and temporal
resolution imagery that documents the environment with unprecedented spatial and
spectral detail. This quantum leap in resolution has enormous potential for improving
our ability to document, monitor and model environmental risks thought to be associated
with adverse health outcomes. In particular, Maxwell et al. (2009) summarized how land
surface remotely sensed data are currently being used to study the relationship between
cancer and environmental contaminants.

Landscape-based cancer studies require mid-resolution (10-100m) or high-resolution

(



The development of the classification algorithm described in this paper was based on
the assumption that some classified scenes are available to calibrate the relationship
between spectral values and land covers. This is a reasonable assumption for crops since
land cover/land use maps are now available from USGS, as well as crop maps from USDA for
some states. Figure 1 illustrates a typical situation whereby a time series of satellite
images, possibly with missing years/seasons or missing pixels for a monitored time,
coexists with a few classified scenes (e.g. 2001, 2003). The objective is to create a
time series of land cover maps for all years. A geostatistical pixel-based
classification was adopted for the following reasons: 1) the approach can be fully
automated, 2) the algorithm allows filling in gaps through space-time interpolation,
3) a similar approach proved to generate high-accuracy classifications in an earlier
analysis of hyperspectral data (Goovaerts, 2002), and 4) the probabilistic
classification provides a measure of uncertainty that can be propagated through exposure
assessment.

The methodology is illustrated using a 6-year Landsat time series (2000-2005) for an
area NW of Des Moines, Iowa (Figure 1). Data were collected in both spring and summer.
The scene comprises 801 × 801 pixels of 30 meters, and the analysis focused on three
spectral bands and six types of land cover: urban, water, non-agricultural (e.g. forest),
corn, soybean, and other crops. The relative area of the scene covered by each land
cover is fairly constant across time; starting with the largest percentage:
non-agricultural (20.3-26.3%), urban (22.0-23.0%), corn (19.0-22.2%), soybean
(19.1-21.8%), water (9.5-9.8%), and other crops (0.9-2.8%).

2 METHODOLOGY

The classification algorithm proceeds as follows:

1. Factorial kriging is used to filter the noise in the image and interpolate any
missing reflectance values, such as cloudy pixels or missing years; see Figure 2.

2. For the two years where a classification is available, the filtered values undergo a
linear discriminant analysis (DA, PROC DISCRIM in SAS). Based on this model one can
compute, for each time where reflectance values are known or were estimated at Step 1,
the probability of occurrence for each land cover of interest (e.g. corn, soybean,
urban). Each pixel is then assigned to the most likely category (see example for 4
categories in Figure 3, left bottom map).

3. For the two years where a classification is available, probability residuals are
computed as the difference between the DA-based probability and the observed
probability (0 or 1) of occurrence (Figure 3). Probability residuals are then
interpolated to unmonitored times and locations using space-time kriging and variogram
models that are inferred automatically.

4. Interpolated residuals are combined with the DA-based probabilities at unmonitored
times, leading to a posterior probability of occurrence for each land cover of interest
(see example for crop in Figure 4).

5. Each pixel is assigned to the land cover with the largest probability of occurrence
(maximum-likelihood classification) and the classification uncertainty is quantified
using the entropy of the probability distribution (Figure 5).
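Steps 2 and 3 can be illustrated with a minimal NumPy sketch. It is not the SAS PROC DISCRIM run used in the study: the synthetic reflectance values, the three-class setup, and the helper names `fit_lda` and `lda_posterior` are assumptions made for illustration. The sign of the residual (observed indicator minus DA probability) is also an assumed convention, chosen so that adding a residual moves the DA probability toward the observed class.

```python
import numpy as np

def fit_lda(X, y, n_classes):
    """Estimate class means, pooled covariance and priors for a linear DA."""
    means = np.array([X[y == k].mean(axis=0) for k in range(n_classes)])
    priors = np.array([(y == k).mean() for k in range(n_classes)])
    # Pooled within-class covariance, shared by all classes in linear DA.
    centered = X - means[y]
    cov = centered.T @ centered / (len(X) - n_classes)
    return means, cov, priors

def lda_posterior(X, means, cov, priors):
    """Posterior probability of each class for every pixel (rows sum to 1)."""
    cov_inv = np.linalg.inv(cov)
    # Linear score: x' S^-1 m_k - 0.5 m_k' S^-1 m_k + log(prior_k)
    quad = 0.5 * np.sum(means @ cov_inv * means, axis=1)
    scores = X @ cov_inv @ means.T - quad + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)

# Synthetic "pixels": 3 spectral bands and 3 land covers (hypothetical values).
rng = np.random.default_rng(0)
centers = np.array([[0.1, 0.4, 0.2], [0.3, 0.2, 0.5], [0.6, 0.6, 0.1]])
y = rng.integers(0, 3, size=600)
X = centers[y] + rng.normal(scale=0.08, size=(600, 3))

means, cov, priors = fit_lda(X, y, 3)
p_da = lda_posterior(X, means, cov, priors)   # Step 2: DA-based probabilities
labels_da = p_da.argmax(axis=1)               # most likely category per pixel

# Step 3: probability residuals for a calibration year.
# Assumed sign convention: observed indicator (0 or 1) minus DA probability.
indicator = np.eye(3)[y]
residuals = indicator - p_da
```

Because both the indicators and the DA probabilities sum to one over the classes, the residuals of each pixel sum to zero.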



Figure 1: Imagery time series used to illustrate and validate the classification
algorithm: three spectral bands (TM 3, 4, 5) recorded in spring and summer for six years
and the USDA CDL maps (urban, water, non-agricultural, corn, soybean, and other crops)
for the corresponding years. Red scenes correspond to the two seasons where Landsat
imagery was not available. Only the 2001 and 2003 images were used to calibrate the
relationships between spectral values and land cover, while the four other years were
set aside for validation.

Figure 2: Geostatistical space-time interpolation of reflectance values to fill in the
spatial and temporal gaps in the time series of Landsat images. In this particular
example, the missing image of Summer 2002 is reconstructed. Filter weights are computed
using space-time ordinary kriging and a separable space-time covariance obtained as the
product of the spatial covariance by the temporal correlation. Note the strong
correlation between summer images for different years, while the correlation is only
0.398 with the spring image of the same year.
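The separable covariance and the ordinary kriging weights described in the caption of Figure 2 can be sketched as follows. This is a simplified illustration, not the factorial kriging filter used in the paper: the exponential spatial model, its 300 m range, the summer-to-summer correlation of 0.9 and the cross-year spring-to-summer correlation of 0.35 are all assumed values. Only the 0.398 correlation between spring and summer of the same year comes from the text.

```python
import numpy as np

def spatial_cov(h, sill=1.0, practical_range=300.0):
    """Exponential spatial covariance; model and parameters are assumptions."""
    return sill * np.exp(-3.0 * h / practical_range)

def separable_cov(xy_a, t_a, xy_b, t_b, temporal_corr):
    """C((s,t),(s',t')) = C_s(|s - s'|) * rho_t(t, t')."""
    h = np.linalg.norm(xy_a[:, None, :] - xy_b[None, :, :], axis=2)
    return spatial_cov(h) * temporal_corr[np.ix_(t_a, t_b)]

def ordinary_kriging_weights(c_data, c_target):
    """Solve the ordinary kriging system; the weights are forced to sum to 1."""
    n = len(c_target)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = c_data
    lhs[n, n] = 0.0
    rhs = np.append(c_target, 1.0)
    return np.linalg.solve(lhs, rhs)[:n]  # drop the Lagrange multiplier

# Scene index: 0 = spring 2002, 1 = summer 2001, 2 = summer 2003, 3 = summer 2002.
# Only 0.398 is taken from the text; 0.9 and 0.35 are hypothetical.
temporal_corr = np.array([
    [1.000, 0.350, 0.350, 0.398],
    [0.350, 1.000, 0.900, 0.900],
    [0.350, 0.900, 1.000, 0.900],
    [0.398, 0.900, 0.900, 1.000],
])

# Data: the target pixel in three other scenes plus one neighbour 30 m away.
data_xy = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [30.0, 0.0]])
data_t = np.array([0, 1, 2, 2])
target_xy = np.array([[0.0, 0.0]])
target_t = np.array([3])  # missing summer 2002 scene

c_data = separable_cov(data_xy, data_t, data_xy, data_t, temporal_corr)
c_target = separable_cov(data_xy, data_t, target_xy, target_t, temporal_corr)[:, 0]
weights = ordinary_kriging_weights(c_data, c_target)
```

With these assumed correlations, the two summer observations at the target pixel receive much larger weights than the spring observation of the same year, which mirrors the behaviour noted in the caption.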



Figure 3: Comparison of the true landcover map and results of the discriminant analysis
(DA) for 2001. Misclassification errors are caused by an overestimation or
underestimation of the true probability of occurrence of each class (0 or 1) by the
DA-based probability, which can be expressed as a probability residual. Figure 4 shows
an example for the crop category.

Figure 4: Use of the kriged probability residuals to update the probability of occurrence

of the crop category derived from the discriminant analysis conducted in 2001 and 2003.
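Steps 4 and 5 of the algorithm (probability updating, classification and entropy) can be sketched as below. The paper does not state how updated probabilities are kept within [0, 1], so the clipping and renormalisation are assumptions, as are the function names, the class order and the numbers for the two example pixels. Residuals are taken as observed indicator minus DA probability, which is also an assumed sign convention.

```python
import numpy as np

def update_probabilities(p_da, kriged_residuals):
    """Add kriged residuals to DA probabilities, then clip and renormalise."""
    p = np.clip(p_da + kriged_residuals, 0.0, 1.0)
    total = p.sum(axis=1, keepdims=True)
    safe_total = np.where(total > 0, total, 1.0)
    # Fall back to the DA probabilities if every class was clipped to zero.
    return np.where(total > 0, p / safe_total, p_da)

def entropy(p):
    """Shannon entropy of each pixel's class distribution (0 = no uncertainty)."""
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero-probability terms vanish
    return -(p * np.log(safe)).sum(axis=1)

# Class order: crop, non-agricultural, urban, water (hypothetical pixels).
p_da = np.array([[0.45, 0.40, 0.10, 0.05],
                 [0.30, 0.35, 0.30, 0.05]])
kriged_residuals = np.array([[0.30, -0.25, -0.03, -0.02],
                             [-0.20, -0.25, 0.48, -0.03]])

p_post = update_probabilities(p_da, kriged_residuals)
labels_before = p_da.argmax(axis=1)    # classification from DA alone
labels_after = p_post.argmax(axis=1)   # classification after updating
h_before, h_after = entropy(p_da), entropy(p_post)
```

In this toy case the second pixel switches from non-agricultural to urban after updating, and the entropy of both pixels decreases.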



Figure 5: Maximum likelihood classification of the Landsat time series before and after
updating of the discriminant analysis results using spatio-temporal interpolation of
probability residuals computed for 2001 and 2003. Note how the updating attenuates the
pepper-and-salt effect in the classified map.

Table 1: Results of the validation study: percentages of pixels correctly classified into
each of the four main types of landcover. The last row for each year gives results
obtained when urban and water pixels are not used in the discriminant analysis.

Year  Method                       Crops  Non-ag.  Urban  Water
2000  Discriminant Analysis (DA)    93.2     57.3   54.5   61.5
      DA + ST kriging               94.7     69.0   87.2   83.6
      Analysis w/o urban, water     86.2     67.4      -      -
2002  Discriminant Analysis (DA)    96.3     24.3   28.5   62.3
      DA + ST kriging               95.4     71.6   89.9   91.7
      Analysis w/o urban, water     96.1     72.4      -      -
2004  Discriminant Analysis (DA)    83.3     36.8   57.8   78.0
      DA + ST kriging               85.3     73.4   92.9   95.2
      Analysis w/o urban, water     86.4     81.6      -      -
2005  Discriminant Analysis (DA)    85.4     25.8   58.9   75.9
      DA + ST kriging               84.8     61.3   94.8   95.8
      Analysis w/o urban, water     86.2     69.9      -      -

Table 2: Confusion matrix computed for the proposed approach (first row of each pair) and
traditional DA (second row) for year 2005: actual and predicted numbers of pixels for
each land cover. Columns give the actual land cover.

Predicted    Method      Corn  Soybean  Other crops  Non-ag   Urban  Water
Corn         Proposed  110477    18424          985   21505    2732    703
             DA        113023    33986         1119   22235    5602   1378
Soybean      Proposed    6650    87435          604   14552    2622    206
             DA          4934    71444          770   15039    4482    348
Other crops  Proposed    1826     2823          732   10549     268    148
             DA          7274     8906         2691   53730   26520   1711
Non-ag       Proposed   10313     6147         2882   97550     858    999
             DA          5251     2174          497   41460   18083   9644
Urban        Proposed     679      499           60   12062  130910    373
             DA           880      621          144   16701   83197    842
Water        Proposed     379      471          108    2812     686  55012
             DA           987      779          186   11399    3310  43852



3 MODEL ASSESSMENT

A validation study was conducted to quantify whether the visual improvement noticed in
the updated classifications of Figure 5 (fewer isolated pixels) and the reduction in
uncertainty translate into a greater accuracy of the classified maps. USDA CDL (Cropland
Data Layer) maps for the same acquisition years were considered as the reference
classification for the validation, although the lower accuracy of the urban and water
classes forced us to retain only the pixels that were consistently urban and water for
the period 2002-2005. The validation was conducted for both four and six classes (crops
category split into corn, soybean and small grains) and two different approaches: (1)
straightforward discriminant analysis (reference method), and (2) discriminant analysis
followed by geostatistical updating of probabilities using space-time kriging (proposed
approach). The proposed approach was also repeated after excluding from the discriminant
analysis the water and urban classes, which can be obtained from other sources (e.g.
USGS NLCD maps).

The performance criterion is the percentage of correctly classified pixels (Table 1).
The validation demonstrated the greater accuracy of the proposed approach over a
traditional discriminant analysis: the proportion of correctly classified pixels ranges
from 67% to 96% of the image depending on the land cover, with higher accuracies
obtained when fewer classes are considered. The average proportion of correctly
classified non-agricultural pixels is 80%, except for the very dry year of 2002 where a
majority of those pixels were classified as crops. The classification of water is very
good too, with 64 to 99% of pixels correctly classified. The detection of urban pixels
remains problematic for the reference approach, with a majority of them being classified
as non-agricultural (see confusion matrix in Table 2), which confirms the
pepper-and-salt effect of those maps. A very important result is that the space-time
kriging of residuals almost systematically improves the accuracy and precision (smaller
entropy) of the prediction.
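The performance criterion can be computed from a confusion matrix laid out like Table 2 (rows are predicted classes, columns are actual classes). The sketch below uses a small hypothetical matrix, not the study data, and the function names are illustrative. It also shows why merging classes tends to raise the score: confusion between two crops no longer counts as an error.

```python
import numpy as np

def percent_correct_per_class(confusion):
    """Percentage of pixels of each actual class that were predicted correctly.

    Rows are predicted classes and columns are actual classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    return 100.0 * np.diag(confusion) / confusion.sum(axis=0)

def merge_classes(confusion, groups):
    """Sum the rows and columns of the classes listed in each group."""
    confusion = np.asarray(confusion)
    return np.array([[confusion[np.ix_(gi, gj)].sum() for gj in groups]
                     for gi in groups])

# Hypothetical counts; class order: corn, soybean, non-agricultural, urban.
confusion = np.array([
    [80, 15,  5,  0],
    [12, 70,  5,  0],
    [ 8, 15, 85, 10],
    [ 0,  0,  5, 90],
])

six_style = percent_correct_per_class(confusion)
# Merge corn and soybean into a single crop category.
merged = merge_classes(confusion, [[0, 1], [2], [3]])
four_style = percent_correct_per_class(merged)
```

For this toy matrix the per-class scores are 80% and 70% for corn and soybean, against 88.5% for the merged crop category.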

4 ACKNOWLEDGEMENTS

This research was funded by contract N44-PC-95008 from the National Cancer Institute.
The views stated in this publication are those of the authors and do not necessarily
represent the official views of the NCI.

5 REFERENCES

Goovaerts, P. (2002). Geostatistical incorporation of spatial coordinates into
supervised classification of hyperspectral data. Journal of Geographical Systems 4(1):
99-111.

Maxwell, S.K., J.R. Meliker, et al. (2009). Use of land surface remotely sensed satellite
and airborne data for environmental exposure assessment in cancer research. Journal of
Exposure Science and Environmental Epidemiology, in press.

Nuckols, J.R., M.H. Ward, et al. (2004). Using Geographic Information Systems for
Exposure Assessment in Environmental Epidemiology Studies. Environmental Health
Perspectives 112: 1007-101
