
  • 8/11/2019 StatGIS2009-PG1-revision.pdf


    Automatic Classification of Landsat Timeseries

    using Geostatistics and Discriminant Analysis

    Pierre Goovaerts¹, Susan K. Maxwell², Jaymie R. Meliker³

    ¹BioMedware, [email protected]

    ²U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center, [email protected]

    ³Stony Brook University, Department of Preventive Medicine, [email protected]

    Abstract. This paper presents a new supervised classification algorithm that combines
    classical discriminant analysis with space-time interpolation of probability residuals.
    This approach allows the automatic segmentation of satellite imagery while using
    available classification maps (e.g. crop maps) to correct the results a posteriori. The
    methodology is illustrated using a 6-year Landsat time series and six types of land
    cover: urban, water, non-agricultural (e.g. forest), corn, soybean, and other crops. The
    validation demonstrates the greater accuracy of the proposed approach over a traditional
    discriminant analysis: the proportion of correctly classified pixels ranges from 67% to
    96% of the image depending on the land cover, with better results obtained when the last
    three land cover types are merged into a single crop category. Furthermore, the
    geostatistical approach attenuates the pepper-and-salt effect in the classified maps.
    This classification can be used in environmental and health studies to estimate
    long-term exposures to pesticides.

    1 INTRODUCTION

    A new generation of remote sensing hardware is providing high spatial and temporal
    resolution imagery that documents the environment with unprecedented spatial and
    spectral detail. This quantum leap in resolution has enormous potential for improving
    our ability to document, monitor and model environmental risks thought to be associated
    with adverse health outcomes. In particular, Maxwell et al. (2009) summarized how land
    surface remotely sensed data are currently being used to study the relationship between
    cancer and environmental contaminants.

    Landscape-based cancer studies require mid-resolution (10-100m) or high-resolution

    (


    The development of the classification algorithm described in this paper was based on
    the assumption that some classified scenes are available in order to calibrate the
    relationship between spectral values and land covers. This is a reasonable assumption
    for crops since land cover/land use maps are now available from USGS, as well as crop
    maps from USDA for some states. Figure 1 illustrates a typical situation whereby a time
    series of satellite images, possibly with missing years/seasons or missing pixels for a
    monitored time, coexists with a few classified scenes (e.g. 2001, 2003). The objective
    is to create a time series of land cover maps for all years. A geostatistical
    pixel-based classification was adopted for the following reasons: 1) the approach can
    be fully automated, 2) the algorithm allows filling in gaps through space-time
    interpolation, 3) a similar approach proved to generate high-accuracy classifications
    in an earlier analysis of hyperspectral data (Goovaerts, 2002), and 4) the probabilistic
    classification provides a measure of uncertainty that can be propagated through
    exposure assessment.

    The methodology is illustrated using a 6-year Landsat time series (2000-2005) for an
    area NW of Des Moines, Iowa (Figure 1). Data were collected in both spring and summer.
    The scene comprises 801 × 801 pixels of 30 meters, and the analysis focused on three
    spectral bands and six types of land cover: urban, water, non-agricultural (e.g.
    forest), corn, soybean, and other crops. The relative area of the scene covered by each
    land cover is fairly constant across time; starting with the largest percentage:
    non-agricultural (20.3-26.3%), urban (22.0-23.0%), corn (19.0-22.2%), soybean
    (19.1-21.8%), water (9.5-9.8%), and other crops (0.9-2.8%).

    2 METHODOLOGY

    The classification algorithm proceeds as follows:

    1. Factorial kriging is used to filter the noise in the image and interpolate any
       missing reflectance values, such as cloudy pixels or missing years; see Figure 2.

    2. For the two years where a classification is available, the filtered values undergo a
       linear discriminant analysis (DA, PROC DISCRIM in SAS). Based on this model one can
       compute, for each time where reflectance values are known or were estimated at
       Step 1, the probability of occurrence for each land cover of interest (e.g. corn,
       soybean, urban). Each pixel is then assigned to the most likely category (see
       example for 4 categories in Figure 3, left bottom map).

    3. For the two years where a classification is available, probability residuals are
       computed as the difference between the DA-based probability and the observed
       probability (0 or 1) of occurrence (Figure 3). Probability residuals are then
       interpolated to unmonitored times and locations using space-time kriging and
       variogram models that are inferred automatically.

    4. Interpolated residuals are combined with the DA-based probabilities at unmonitored
       times, leading to a posterior probability of occurrence for each land cover of
       interest (see example for crop in Figure 4).

    5. Each pixel is assigned to the land cover with the largest probability of occurrence
       (maximum-likelihood classification) and the classification uncertainty is quantified
       using the entropy of the probability distribution (Figure 5).
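Steps 3 to 5 can be illustrated for a single pixel with the minimal Python sketch below. It is not the authors' implementation: the sign convention for the residual (observed indicator minus DA probability), the clipping and rescaling of the updated probabilities, the natural-log entropy, and all numerical values are assumptions made for illustration, and the kriged residuals are simply supplied as inputs rather than interpolated.

```python
import math

# Four-class case used in the paper's figures; the order is arbitrary.
CLASSES = ["crop", "non-ag", "urban", "water"]


def probability_residuals(p_da, observed_class):
    """Residuals at a calibration pixel: observed indicator (0 or 1) minus
    the DA-based probability. The sign convention is an assumption."""
    return [(1.0 if c == observed_class else 0.0) - p
            for c, p in zip(CLASSES, p_da)]


def update_probabilities(p_da, kriged_residuals):
    """Add interpolated residuals to the DA probabilities. Clipping to
    [0, 1] and rescaling to sum to 1 is an assumed correction, since
    kriged values are not guaranteed to stay within those bounds."""
    raw = [min(1.0, max(0.0, p + r)) for p, r in zip(p_da, kriged_residuals)]
    total = sum(raw)
    if total == 0.0:
        return list(p_da)  # fall back to the DA probabilities
    return [v / total for v in raw]


def classify(probs):
    """Assign the class with the largest probability of occurrence."""
    return CLASSES[max(range(len(probs)), key=probs.__getitem__)]


def entropy(probs):
    """Shannon entropy (natural log); 0 means no classification uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical pixel at an unmonitored time: DA slightly favours non-ag,
# but the residuals kriged from calibration years point towards crop.
p_da = [0.40, 0.45, 0.10, 0.05]
kriged = [0.30, -0.25, -0.03, -0.02]
posterior = update_probabilities(p_da, kriged)

print(classify(p_da), round(entropy(p_da), 3))
print(classify(posterior), round(entropy(posterior), 3))
```

In this made-up example the update switches the assigned class from non-ag to crop and lowers the entropy, which is the kind of behaviour Figures 4 and 5 illustrate.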


    Figure 1: Imagery time series used to illustrate and validate the classification
    algorithm: three spectral bands (TM 3, 4, 5) recorded in spring and summer for six
    years and the USDA CDL maps (urban, water, non-agricultural, corn, soybean, and other
    crops) for the corresponding years. Red scenes correspond to the two seasons where
    Landsat imagery was not available. Only the 2001 and 2003 images were used to calibrate
    the relationships between spectral values and land cover, while the four other years
    were set aside for validation.

    Figure 2: Geostatistical space-time interpolation of reflectance values to fill in the
    spatial and temporal gaps in the time series of Landsat images. In this particular
    example, the missing image of Summer 2002 is reconstructed. Filter weights are computed
    using space-time ordinary kriging and a separable space-time covariance obtained as the
    product of the spatial covariance by the temporal correlation. Note the strong
    correlation between summer images for different years, while the correlation is only
    0.398 with the spring image of the same year.
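The separable covariance model and the ordinary kriging estimator mentioned in the caption of Figure 2 can be written compactly as below. The notation is assumed here, not taken from the paper: $\mathbf{h}$ is the spatial separation vector, $\tau$ the time lag, $C_S$ the spatial covariance, $\rho_T$ the temporal correlation, and $\lambda_\alpha$ the kriging weights assigned to the $n$ neighbouring space-time observations.

```latex
% Separable space-time covariance: spatial covariance times temporal correlation
C(\mathbf{h}, \tau) = C_S(\mathbf{h}) \, \rho_T(\tau)

% Space-time ordinary kriging estimate at location u and time t;
% the weights sum to one, which makes the estimator unbiased
z^{*}(\mathbf{u}, t) = \sum_{\alpha=1}^{n} \lambda_\alpha \, z(\mathbf{u}_\alpha, t_\alpha),
\qquad \sum_{\alpha=1}^{n} \lambda_\alpha = 1
```

Under this model, an observation from a different season contributes little to the estimate when $\rho_T$ is small, which is consistent with the low spring-summer correlation (0.398) noted in the caption.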


    Figure 3: Comparison of the true land cover map and results of the discriminant
    analysis (DA) for 2001. Misclassification errors are caused by an overestimation or
    underestimation of the true probability of occurrence of each class (0 or 1) by the
    DA-based probability, which can be expressed as a probability residual. Figure 4 shows
    an example for the crop category.

    Figure 4: Use of the kriged probability residuals to update the probability of occurrence

    of the crop category derived from the discriminant analysis conducted in 2001 and 2003.


    Figure 5: Maximum likelihood classification of the Landsat time series before and after
    updating of the discriminant analysis results using spatio-temporal interpolation of
    probability residuals computed for 2001 and 2003. Note how the updating attenuates the
    pepper-and-salt effect in the classified map.

    Table 1: Results of the validation study: percentages of pixels correctly classified
    into each of the four main types of land cover. The last row for each year gives
    results obtained when urban and water pixels are not used in the discriminant analysis.

    Methods                         Crops   Non-ag.   Urban   Water

    Year 2000
      Discriminant Analysis (DA)     93.2     57.3     54.5    61.5
      DA + ST kriging                94.7     69.0     87.2    83.6
      Analysis w/o urban, water      86.2     67.4      n/a     n/a

    Year 2002
      Discriminant Analysis (DA)     96.3     24.3     28.5    62.3
      DA + ST kriging                95.4     71.6     89.9    91.7
      Analysis w/o urban, water      96.1     72.4      n/a     n/a

    Year 2004
      Discriminant Analysis (DA)     83.3     36.8     57.8    78.0
      DA + ST kriging                85.3     73.4     92.9    95.2
      Analysis w/o urban, water      86.4     81.6      n/a     n/a

    Year 2005
      Discriminant Analysis (DA)     85.4     25.8     58.9    75.9
      DA + ST kriging                84.8     61.3     94.8    95.8
      Analysis w/o urban, water      86.2     69.9      n/a     n/a

    Table 2: Confusion matrix computed for the proposed approach and traditional DA
    (2nd row) for year 2005: actual and predicted numbers of pixels for each land cover.

                                          Actual land cover
    Predicted                  Corn   Soybean   Other crops   Non-ag    Urban   Water

    Corn         Proposed    110477     18424           985    21505     2732     703
                 DA          113023     33986          1119    22235     5602    1378

    Soybean      Proposed      6650     87435           604    14552     2622     206
                 DA            4934     71444           770    15039     4482     348

    Other crops  Proposed      1826      2823           732    10549      268     148
                 DA            7274      8906          2691    53730    26520    1711

    Non-ag       Proposed     10313      6147          2882    97550      858     999
                 DA            5251      2174           497    41460    18083    9644

    Urban        Proposed       679       499            60    12062   130910     373
                 DA             880       621           144    16701    83197     842

    Water        Proposed       379       471           108     2812      686   55012
                 DA             987       779           186    11399     3310   43852


    3 MODEL ASSESSMENT

    A validation study was conducted to quantify whether the visual improvement noticed in
    the updated classifications of Figure 5 (fewer isolated pixels) and the reduction in
    uncertainty translate into a greater accuracy of the classified maps. USDA CDL
    (Cropland Data Layer) maps for the same acquisition years were considered as the
    reference classification for the validation, although the lower accuracy of the urban
    and water classes forced us to retain only the pixels that were consistently urban and
    water for the period 2002-2005. The validation was conducted for both four and six
    classes (crops category split into corn, soybean and small grains) and two different
    approaches: (1) straightforward discriminant analysis (reference method), and (2)
    discriminant analysis followed by geostatistical updating of probabilities using
    space-time kriging (proposed approach). The latter approach was also repeated after
    excluding from the discriminant analysis the water and urban classes, which can be
    obtained from other sources (e.g. USGS NLCD maps).

    The performance criterion is the percentage of correctly classified pixels (Table 1).
    The validation demonstrated the greater accuracy of the proposed approach over a
    traditional discriminant analysis: the proportion of correctly classified pixels ranges
    from 67% to 96% of the image depending on the land cover, with higher accuracies
    obtained when fewer classes are considered. The average proportion of correctly
    classified non-agricultural pixels is 80%, except for the very dry year of 2002 where a
    majority of those pixels were classified as crops. The classification of water is very
    good too, with 64 to 99% of pixels correctly classified. The detection of urban pixels
    remains problematic for the reference approach, with a majority of them being
    classified as non-agricultural (see confusion matrix in Table 2), which confirms the
    pepper-and-salt effect of those maps. A very important result is that the space-time
    kriging of residuals almost systematically improves the accuracy and precision (smaller
    entropy) of the prediction.
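As a worked check of the performance criterion, the short Python sketch below computes the percentage of correctly classified pixels from one column of a confusion matrix. The counts are the Urban column of Table 2 (year 2005) as read from this transcript, which is an assumption about how the extracted table should be parsed; the function name `percent_correct` is hypothetical.

```python
def percent_correct(predicted_counts, true_index):
    """Percentage of reference pixels of one class that were assigned
    to that same class (the entry at true_index of the column)."""
    total = sum(predicted_counts)
    return 100.0 * predicted_counts[true_index] / total


# Pixels whose reference class is urban, broken down by predicted class:
# corn, soybean, other crops, non-ag, urban, water.
URBAN = 4
urban_proposed = [2732, 2622, 268, 858, 130910, 686]
urban_da = [5602, 4482, 26520, 18083, 83197, 3310]

print(round(percent_correct(urban_proposed, URBAN), 1))  # DA + ST kriging
print(round(percent_correct(urban_da, URBAN), 1))        # traditional DA
```

With these counts the two values round to 94.8 and 58.9, matching the urban percentages listed for 2005 in Table 1.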

    4 ACKNOWLEDGEMENTS

    This research was funded by contract N44-PC-95008 from the National Cancer Institute.
    The views stated in this publication are those of the authors and do not necessarily
    represent the official views of the NCI.

    5 REFERENCES

    Goovaerts, P. (2002). Geostatistical incorporation of spatial coordinates into
    supervised classification of hyperspectral data. Journal of Geographical Systems 4(1):
    99-111.

    Maxwell, S.K., J.R. Meliker, et al. (2009). Use of land surface remotely sensed
    satellite and airborne data for environmental exposure assessment in cancer research.
    Journal of Exposure Science and Environmental Epidemiology, in press.

    Nuckols, J.R., M.H. Ward, et al. (2004). Using Geographic Information Systems for
    Exposure Assessment in Environmental Epidemiology Studies. Environmental Health
    Perspectives 112: 1007-101