
    Neural Maps in Remote Sensing Image Analysis

Thomas Villmann, Erzsébet Merényi and Barbara Hammer

    Universität Leipzig, Germany

    Klinik für Psychotherapie, 04107 Leipzig, K.-Tauchnitz-Str. 25

    Rice University

    Department of Electrical and Computer Engineering

    6100 Main Street, MS 380

    Houston, Texas, U.S.A.

    University of Osnabrück

    Department of Mathematics/Computer Science

    Albrechtstraße 28, 49069 Osnabrück, Germany

    Corresponding author; email: [email protected], phone: +49 (0) 341 9718868, fax: +49 (0) 341 2131257


    Abstract

    We study the application of Self-Organizing Maps for the analyses of remote sensing

    spectral images. Advanced airborne and satellite-based imaging spectrometers produce very

    high-dimensional spectral signatures that provide key information to many scientific inves-

    tigations about the surface and atmosphere of Earth and other planets. These new, so-

    phisticated data demand new and advanced approaches to cluster detection, visualization,

    and supervised classification. In this article we concentrate on the issue of faithful topo-

    logical mapping in order to avoid false interpretations of cluster maps created by an SOM.

    We describe several new extensions of the standard SOM, developed in the past few years:

the Growing Self-Organizing Map, magnification control, and Generalized Relevance Learning Vector Quantization, and demonstrate their effect on both low-dimensional traditional multi-spectral imagery and 200-dimensional hyperspectral imagery.


    1 Introduction

    In geophysics, geology, astronomy, as well as in many environmental applications airborne

    and satellite-borne remote sensing spectral imaging has become one of the most advanced

    tools for collecting vital information about the surface of Earth and other planets. Spec-

    tral images consist of an array of multi-dimensional vectors each of which is assigned to a

    particular spatial area, reflecting the response of a spectral sensor for that area at various

    wavelengths (Figure 1). Classification of intricate, high-dimensional spectral signatures has

turned out far from trivial. Discrimination among many surface cover classes, and discovery of spatially small, interesting spectral species, proved to be an insurmountable challenge for many traditional clustering and classification methods. By customary measures (such as, for example, Principal Component Analysis (PCA)) the intrinsic spectral dimensionality of hyperspectral images appears to be surprisingly low, 5 to 10 at most. Yet dimensionality

    reduction to such low numbers has not been successful in terms of preservation of impor-

    tant class distinctions. The spectral bands, many of which are highly correlated, may lie on

    a low-dimensional but non-linear manifold, which is a scenario that eludes many classical

    approaches. In addition, these data comprise enormous volumes and are frequently noisy.

This motivates research into advanced and novel approaches for remote sensing image analysis and, in particular, neural networks [1]. Generally, artificial neural networks attempt to

    replicate the computational power of biological neural networks, with the following important

    (incomplete list of) features:

- adaptivity: the ability to change the internal representation (connection weights, network structure) if new data or information are available

    - robustness: handling of missing, noisy or otherwise corrupted data

    - power/speed: handling of large data volumes in acceptable time due to inherent parallelism

    - non-linearity: the ability to represent non-linear functions or mappings

    Exactly these properties make the application of neural networks interesting in remote

    sensing image analysis. In the present contribution we will concentrate on a special neural

    network type, neural maps. Neural maps constitute an important neural network paradigm.


    In brains, neural maps occur in all sensory modalities as well as in motor areas. In tech-

    nical contexts, neural maps are utilized in the fashion of neighborhood preserving vector

    quantizers. In both cases these networks project data from some possibly high-dimensional

    input space onto a position in some output space. In the area of spectral image analysis we

    apply neural maps for clustering, data dimensionality reduction, learning of relevant data

    dimensions and visualization. For this purpose we highlight several useful extensions of the

    basic neural map approaches.

    This paper is structured as follows: In Section 2 we briefly introduce the problems in

    remote sensing spectral image analysis. In Section 3 we give the basics of neural maps and

    vector quantization together with a short review of new extensions: structure adaptation

    and magnification control (Section 3.2), and learning of proper metrics or relevance learning

    (Section 3.3). In Section 4 we present applications of the reviewed methods to multi-spectral

    LANDSAT TM imagery as well as very high-dimensional hyperspectral AVIRIS imagery.

    2 Remote Sensing Spectral Imaging

    As mentioned above airborne and satellite-borne remote sensing spectral images consist of

    an array of multi-dimensional vectors assigned to particular spatial regions (pixel locations)

    reflecting the response of a spectral sensor at various wavelengths. These vectors are called

    spectra. A spectrum is a characteristic pattern that provides a clue to the surface material

    within the respective surface element. The utilization of these spectra includes areas such

    as mineral exploration, land use, forestry, ecosystem management, assessment of natural

    hazards, water resources, environmental contamination, biomass and productivity; and many

    other activities of economic significance, as well as prime scientific pursuits such as looking

    for possible sources of past or present life on other planets. The number of applications

    has dramatically increased in the past ten years with the advent of imaging spectrometers

    such as AVIRIS of NASA/JPL, that greatly surpass traditional multi-spectral sensors (e.g.,

    LANDSAT Thematic Mapper (TM)).

    Imaging spectrometers can resolve the known, unique, discriminating spectral features

    of minerals, soils, rocks, and vegetation. While a multi-spectral sensor samples a given

wavelength window (typically the 0.4–2.5 µm range in the case of Visible and Near-Infrared

    imaging) with several broad bandpasses, leaving large gaps between the bands, imaging


spectrometers sample a spectral window contiguously with very narrow, 10–20 nm bandpasses

    (Figure 1) [2]. Hyperspectral technology is in great demand because direct identification of

    surface compounds is possible without prior field work, for materials with known spectral

    signatures. Depending on the wavelength resolution and the width of the wavelength window

used by a particular sensor, the dimensionality of the spectra can be as low as 5 or 6 (such

    as in LANDSAT TM), or as high as several hundred for hyperspectral imagers [3].

Spectral images can formally be described as a matrix $S = (\mathbf{v}^{(x,y)})$, where $\mathbf{v}^{(x,y)} \in \mathbb{R}^{D_V}$ is the vector of spectral information associated with pixel location $(x,y)$. The elements $v_i^{(x,y)}$, $i = 1 \ldots D_V$, of spectrum $\mathbf{v}^{(x,y)}$ reflect the responses of a spectral sensor at a suite of wavelengths (Figure 2, [4], [5]). The spectrum is a characteristic fingerprint pattern that identifies the surface material within the area defined by pixel $(x,y)$. The individual 2-dimensional image $S_i = (v_i^{(x,y)})$ at wavelength $i$ is called the $i$-th image band. The data space $V$ spanned by Visible-Near Infrared reflectance spectra is $[0 - \mathrm{noise},\, U + \mathrm{noise}]^{D_V} \subset \mathbb{R}^{D_V}$, where $U > 0$ represents an upper limit of the measured scaled reflectivity and $\mathrm{noise}$ is the maximum value of noise across all spectral channels and image pixels. The data density $P(V)$

    may vary strongly within this space. Sections of the data space can be very densely populated

    while other parts may be extremely sparse, depending on the materials in the scene and on

    the spectral bandpasses of the sensor. According to this model traditional multi-spectral

    imagery has a low DV value while DV can be several hundred for hyperspectral images. The

    latter case is of particular interest because the great spectral detail, complexity, and very

    large data volume pose new challenges in clustering, cluster visualization, and classification

    of images with such high spectral dimensionality [6].
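As an illustration of the formal model above, a spectral image $S$ can be held as a simple data cube from which individual spectra and image bands are slices. The scene size and the random values below are hypothetical stand-ins, not the image data discussed later:

```python
import numpy as np

# Hypothetical scene: 100 x 100 pixels, D_V = 6 spectral bands (LANDSAT-TM-like).
H, W, D_V = 100, 100, 6
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(H, W, D_V))  # S[x, y] is the spectrum v^(x,y)

spectrum = S[10, 20]   # v^(10,20), a D_V-dimensional vector
band_3 = S[:, :, 2]    # the 3rd image band S_3, one scalar per pixel

assert spectrum.shape == (D_V,)
assert band_3.shape == (H, W)
```

For hyperspectral imagers the same layout applies with $D_V$ of several hundred, which is where the volume and dimensionality issues discussed in the text arise.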

    In addition to dimensionality and volume, other factors, specific to remote sensing, can

    make the analyses of hyperspectral images even harder. For example, given the richness

of data, the goal is to separate many cover classes; however, surface materials that are significantly different for an application may be distinguished by very subtle differences

    in their spectral patterns. The pixels can be mixed, which means that several different

    materials may contribute to the spectral signature associated with one pixel. Training data

    may be scarce for some classes, and classes may be represented very unevenly. All the above

    difficulties motivate research into advanced and novel approaches [1].

Noise is far less problematic than the intricacy of the spectral patterns, because of the high Signal-to-Noise Ratios (500–1,500) that present-day hyperspectral imagers provide.


    For this discussion, we will omit noise issues, and additional effects such as atmospheric

    distortions, illumination geometry and albedo variations in the scene, because these can be

    addressed through well-established procedures prior to clustering or classification.

    3 Neural Maps and Vector Quantization

    3.1 Basic Concepts and Notations

Neural maps, as tools for (unsupervised) topographic vector quantization, map data vectors $\mathbf{v}$ sampled from a data manifold $V \subseteq \mathbb{R}^{D_V}$ onto a set $A$ of neurons $r$:

    $\Psi_{V \to A}: V \to A$ .   (3.1)

    Associated with each neuron is a pointer (weight vector) $\mathbf{w}_r \in \mathbb{R}^{D_V}$, specifying the configuration of the neural map $W = \{\mathbf{w}_r\}_{r \in A}$. The task of vector quantization is realized by the map as a winner-take-all rule, i.e. a stimulus vector $\mathbf{v} \in V$ is mapped onto that neuron $s \in A$ the pointer $\mathbf{w}_s$ of which is closest to the presented stimulus vector $\mathbf{v}$,

    $\Psi_{V \to A}: \mathbf{v} \mapsto s(\mathbf{v}) = \mathrm{argmin}_{r \in A}\; d(\mathbf{v}, \mathbf{w}_r)$ ,   (3.2)

    where $d(\mathbf{v}, \mathbf{w}_r)$ is the Euclidean distance. The neuron $s$ is referred to as the winner or best-matching unit. The reverse mapping is defined as $\Psi_{A \to V}: r \mapsto \mathbf{w}_r$. These two mappings together determine the neural map

    $M = (\Psi_{V \to A}, \Psi_{A \to V})$   (3.3)

    realized by the network. The subset of the input space

    $\Omega_r = \{\mathbf{v} \in V : r = \Psi_{V \to A}(\mathbf{v})\}$   (3.4)

    which is mapped to a particular neuron $r$ according to (3.2) forms the (masked) receptive field of that neuron. If the intersection of two masked receptive fields $\Omega_r$, $\Omega_{r'}$ is non-vanishing, we call $r$ and $r'$ neighbored. The neighborhood relations form a corresponding graph structure $G_V$ in $A$: two neurons are connected in $G_V$ if and only if their masked receptive fields are neighbored. The graph $G_V$ is called the induced Delaunay graph (see, for example, [7] for detailed definitions). Due to the bijective relation between neurons and weight vectors, $G_V$ also represents the Delaunay graph of the weights [7].
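In practice the induced Delaunay graph can be estimated from data by a competitive Hebbian rule, as used in the TRN [7]: for each stimulus, connect the best- and second-best-matching units. A minimal sketch (the prototype positions and stimuli below are made up for illustration):

```python
import numpy as np

def induced_delaunay_edges(data, weights):
    """Estimate the induced Delaunay graph of the weights: for each stimulus
    v, connect the best- and the second-best-matching unit (competitive
    Hebbian rule)."""
    edges = set()
    for v in data:
        d = np.sum((weights - v) ** 2, axis=1)        # squared Euclidean distances
        s1, s2 = (int(i) for i in np.argsort(d)[:2])  # winner and runner-up
        edges.add((min(s1, s2), max(s1, s2)))
    return edges

# Three prototypes on a line; stimuli between neighbors induce the chain edges.
weights = np.array([[0.0], [1.0], [2.0]])
data = np.array([[0.1], [0.9], [1.6]])
print(sorted(induced_delaunay_edges(data, weights)))  # [(0, 1), (1, 2)]
```

With enough stimuli this recovers exactly the neighborhood relations between the masked receptive fields described above.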


In the most widely used neural map algorithm, the self-organizing map (SOM) [8], the neurons are arranged a priori on a fixed grid A.¹ Other algorithms such as the topology

    representing network (TRN), which is based on the neural gas algorithm (NG) [9], do not

    specify the topological relations among neurons, but construct a neighborhood structure in

    the output space during learning [7].

In any of these algorithms the pointers are adapted such that the input data are matched appropriately. For this purpose a random sequence of data points $\mathbf{v} \in V$ from the stimulus distribution $P(\mathbf{v})$ is presented to the map, the winning neuron $s$ is determined according to (3.2), and the pointer $\mathbf{w}_s$ is shifted toward $\mathbf{v}$. Interactions among the neurons are included via a neighborhood function determining to what extent the units that are close to the winner in the output space participate in the learning step:

    $\Delta \mathbf{w}_r = \epsilon\, h(r, s, \mathbf{v})\, (\mathbf{v} - \mathbf{w}_r)$ .   (3.5)

    $h(r, s, \mathbf{v})$ is usually chosen to be of Gaussian shape,

    $h(r, s, \mathbf{v}) = \exp\!\left(-\frac{(d_A(r, s(\mathbf{v})))^2}{\gamma}\right)$ ,   (3.6)

    where $d_A(r, s(\mathbf{v}))$ is a distance measure on the set $A$. We emphasize that $h(r, s, \mathbf{v})$ implicitly depends on the whole set $W$ of weight vectors via the winner determination of $s$ according to (3.2). In SOMs it is evaluated in the output space $A$: $d_A^{SOM}(r, s(\mathbf{v})) = \|r - s(\mathbf{v})\|_A$, whereas for the NG we have $d_A^{NG}(r, s(\mathbf{v})) = k_r(\mathbf{v}, W)$, where $k_r(\mathbf{v}, W)$ gives the number of pointers $\mathbf{w}_{r'}$ for which the relation $d(\mathbf{v}, \mathbf{w}_{r'}) \le d(\mathbf{v}, \mathbf{w}_r)$ is valid [9], i.e., $d_A^{NG}$ is evaluated in the input space. (Here, $d(\cdot,\cdot)$ is the Euclidean distance.) In the SOM, usually $\gamma = 2\sigma^2$ is used.
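One adaptation step of Eqs. (3.2), (3.5) and (3.6) can be sketched as follows; the lattice layout, learning rate and $\sigma$ below are arbitrary illustrative choices:

```python
import numpy as np

def som_step(weights, grid_pos, v, eps=0.5, sigma=1.0):
    """One SOM learning step: winner search, Eq. (3.2); Gaussian neighborhood,
    Eq. (3.6), evaluated on the output lattice with gamma = 2*sigma**2; and
    the pointer update, Eq. (3.5)."""
    d_in = np.sum((weights - v) ** 2, axis=1)              # distances in input space
    s = int(np.argmin(d_in))                               # winner s(v)
    d_lat = np.sum((grid_pos - grid_pos[s]) ** 2, axis=1)  # squared lattice distances
    h = np.exp(-d_lat / (2.0 * sigma ** 2))                # neighborhood function h
    weights += eps * h[:, None] * (v - weights)            # Eq. (3.5)
    return s
```

Repeated presentation of stimuli drawn from $P(\mathbf{v})$, with slowly decreasing eps and sigma, drives the pointers toward a topographically ordered configuration.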

    The particular definitions of the distance measures in the two models cause further dif-

    ferences. In [9] the convergence rate of the neural gas network was shown to be faster than

    that of the SOM. Furthermore, it was proven that the adaptation rule for the weight vectors

    follows, on average, a potential dynamic. In contrast, a global energy function does not exist

    in the SOM algorithm [10].

One important extension of the basic concepts of neural maps concerns the so-called magnification.

    ¹ Usually the grid A is chosen as a $d_A$-dimensional hypercube and then one has $i = (i_1, \ldots, i_{d_A})$. Yet, other ordered arrangements are also admissible.

    The standard SOM distributes the pointers according to the input distribution

$P(W) \propto P(V)^{\alpha}$   (3.7)

    with the magnification factor $\alpha_{SOM} = 2/3$ [11],[12].²,³ For the NG one finds, by analytical considerations, that for small neighborhood range the magnification factor is $\alpha_{NG} = \frac{d}{d+2}$, which depends only on the dimensionality $d$ of the input space embedded in $\mathbb{R}^{D_V}$, i.e. the result is valid for all dimensions [9].

Topology preservation, or topographic mapping, in neural maps is defined as the preservation of the continuity of the mapping from the input space onto the output space; more precisely, it is equivalent to the continuity of $M$ between the topological spaces with properly chosen metrics in both $A$ and $V$. For lack of space we refer to [14] for detailed considerations.

    For the TRN the metric in A is defined according to the given data structure whereas in case

    of the SOM violations of topology preservation may occur if the structure of the output space

    does not match the topological structure of V. In SOMs the topology preserving property

    can be used for immediate evaluations of the resulting map, for instance by interpretation

    as a color space, as demonstrated in Section 4.2.2 . Topology preservation also allows the

    applications of interpolating schemes such as the parametrized SOM (PSOM) [15] or inter-

    polating SOM (I-SOM) [16]. A higher degree of topology preservation, in general, improves

    the accuracy of the map [17].

As pointed out in [14], violations of topographic mapping in SOMs can result in false interpretations. Several approaches have been developed to judge the degree of topology preservation for a given map. Here we briefly describe a variant $\tilde{P}$ of the well-known topographic product $P$ [17]. Instead of the Euclidean distances between the weight vectors, $\tilde{P}$ uses the respective distances $d_{G_V}(\mathbf{w}_r, \mathbf{w}_{r'})$ of minimal path lengths in the induced Delaunay graph $G_V$ of the $\mathbf{w}_r$. During the computation of $\tilde{P}$, for each node $r$ the sequences $n_j^A(r)$ of $j$-th neighbors of $r$ in $A$, and $n_j^V(r)$ describing the $j$-th neighbor of $\mathbf{w}_r$ in $V$, have to be determined. These sequences, and further averaging over neighborhood orders $j$ and nodes $r$, finally lead to

    $\tilde{P} = \frac{1}{N(N-1)} \sum_{r} \sum_{j=1}^{N-1} \frac{1}{2j} \log(\zeta)$   (3.8)

    with

    $\zeta = \prod_{l=1}^{j} \frac{d_{G_V}(\mathbf{w}_r, \mathbf{w}_{n_l^A(r)})}{d_{G_V}(\mathbf{w}_r, \mathbf{w}_{n_l^V(r)})} \cdot \frac{d_A(r, n_l^A(r))}{d_A(r, n_l^V(r))}$ ,   (3.9)

    where $d_A(r, r')$ is the distance between the neuron positions $r$ and $r'$ in the lattice $A$. $\tilde{P}$ can take on positive or negative values with similar meaning as for the original topographic product $P$: if $\tilde{P} < 0$ holds, the output space is too low-dimensional, and for $\tilde{P} > 0$ the output space is too high-dimensional. In both cases neighborhood relations are violated. Only for $\tilde{P} \approx 0$ does the output space approximately match the topology of the input data. The present variant $\tilde{P}$ overcomes the problem of strongly curved maps, which may be judged neighborhood-violating by the original $P$ even though the shape of the map might be perfectly justified [14].

    ² This result is valid for the one-dimensional case and for higher-dimensional cases which separate.

    ³ The term "magnification factor" is, strictly speaking, not correct; mathematically exact would be "magnification exponent". However, "magnification factor" is commonly used [13], therefore we use this term here.

    3.2 Magnification Control and Structure Adaptation

    3.2.1 Magnification Control

The first approach to influence the magnification of a vector quantizer, proposed in [18], is the mechanism of conscience, yielding approximately the same winning probability for each neuron. Hence, the approach yields density matching between input and output space, which corresponds to a magnification factor of 1. Bauer et al. [19] suggested a more general scheme for the SOM to achieve an arbitrary magnification. They introduced a local learning parameter $\epsilon_s$ for each neuron $r$ with $\langle \epsilon_s \rangle \propto P(\mathbf{w}_r)^m$ in (3.5), where $m$ is an additional control parameter. Equation (3.5) now reads

    $\Delta \mathbf{w}_r = \epsilon_s\, h(r, s, \mathbf{v})\, (\mathbf{v} - \mathbf{w}_r)$ .   (3.10)

    Note that the learning factor $\epsilon_s$ of the winning neuron $s$ is applied to all updates. This local learning leads to a relation similar to (3.7):

    $P(W) \propto P(V)^{\alpha'}$   (3.11)

    with $\alpha' = \alpha(m+1)$, and allows magnification control through the choice of $m$. In particular, one can achieve a resolution of $\alpha' = 1$, which maximizes mutual information [20, 21]. Application of this approach to the derivation of the dynamics of the TRN yields exactly the same result [22].
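A sketch of the locally controlled update of Eq. (3.10) follows. Since the true local density $P(\mathbf{w}_s)$ is generally unknown at run time, this sketch approximates it by an exponentially decayed estimate of each neuron's winning frequency; that estimator is an assumption of this illustration, not part of the derivation in [19]:

```python
import numpy as np

def som_step_controlled(weights, grid_pos, win_freq, v,
                        m=-0.5, eps0=0.1, sigma=1.0, decay=0.99):
    """SOM step with magnification control, Eq. (3.10): the winner's local
    learning rate is eps_s ~ P(w_s)^m, with P(w_s) approximated by a running
    winning-frequency estimate (an assumption of this sketch)."""
    s = int(np.argmin(np.sum((weights - v) ** 2, axis=1)))  # winner, Eq. (3.2)
    win_freq *= decay                        # decayed win counts per neuron
    win_freq[s] += 1.0 - decay
    eps_s = eps0 * win_freq[s] ** m          # local learning factor of the winner
    d_lat = np.sum((grid_pos - grid_pos[s]) ** 2, axis=1)
    h = np.exp(-d_lat / (2.0 * sigma ** 2))  # neighborhood function, Eq. (3.6)
    weights += eps_s * h[:, None] * (v - weights)  # Eq. (3.10)
    return s, eps_s
```

With $m < 0$, rarely winning neurons (sparse regions) get larger updates, pushing the effective magnification toward $\alpha' = 1$ as described above.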


    3.2.2 Structure Adaptation

    For both neural map types, the SOM and the TRN, growing variants for structure adaptation

    are known.

In the growing SOM (GSOM) approach the output space A is a hypercube in which both the dimension and the respective edge-length ratios are adapted during the learning procedure, in addition to the weight vector learning, i.e. the overall dimensionality and the extent along the individual directions of the hypercube change. The GSOM starts from an initial 2-neuron chain, learns according to the regular SOM algorithm, adds neurons to the output space according to a criterion described below, learns again, adds again, etc., until a prespecified maximum number $N_{max}$ of neurons is distributed. During this procedure, the output space topology remains of the form $n_1 \times n_2 \times \ldots$, with $n_j = 1$ for $j > D_A$, where $D_A$ is the current dimensionality of A.⁴ From there it can grow either by adding nodes in one of the directions already spanned by the output space or by initializing a new dimension. This decision is made on the basis of the receptive fields $\Omega_r$. When reconstructing $\mathbf{v} \in V$ from neuron $r$, an error $\epsilon = \mathbf{v} - \mathbf{w}_r$ remains; it is decomposed along the different directions that result from projecting the output grid A back into the input space V:

    $\epsilon = \mathbf{v} - \mathbf{w}_r = \sum_{i=1}^{D_A} a_i(\mathbf{v}) \frac{\mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_{r-\mathbf{e}_i}}{\|\mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_{r-\mathbf{e}_i}\|} + \mathbf{v}'$   (3.12)

    Here, $\mathbf{e}_i$ denotes the unit vector in direction $i$ of A, and $\mathbf{w}_{r+\mathbf{e}_i}$, $\mathbf{w}_{r-\mathbf{e}_i}$ are the weight vectors of the neighbors of neuron $r$ in the $i$-th direction of the lattice A.⁵ Considering a receptive field $\Omega_r$ and determining the first principal component $\mathbf{v}_{PCA}$ of $\Omega_r$ allows a further decomposition of $\mathbf{v}'$. Projection of $\mathbf{v}'$ onto the direction of $\mathbf{v}_{PCA}$ then yields $a_{D_A+1}(\mathbf{v})$:

    $\mathbf{v}' = a_{D_A+1}(\mathbf{v}) \frac{\mathbf{v}_{PCA}}{\|\mathbf{v}_{PCA}\|} + \mathbf{v}''$ .   (3.13)

    The criterion for growing is to add nodes in the direction which has, on average, the largest expected (normalized) error amplitude $\bar{a}_i$:

    $\bar{a}_i = \sqrt{\frac{n_i}{n_i+1}} \sum_{\mathbf{v} \in \Omega_r} \frac{|a_i(\mathbf{v})|}{\sqrt{\sum_{j=1}^{D_A+1} a_j^2(\mathbf{v})}}$ ,  $i = 1, \ldots, D_A+1$   (3.14)

    ⁴ Hence, the initial configuration is $2 \times 1 \times 1 \times \ldots$, $D_A = 1$.

    ⁵ At the border of the output space grid, where not two but just one neighboring neuron is available, we use $\frac{\mathbf{w}_r - \mathbf{w}_{r-\mathbf{e}_i}}{\|\mathbf{w}_r - \mathbf{w}_{r-\mathbf{e}_i}\|}$ or $\frac{\mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_r}{\|\mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_r\|}$ to compute the backprojection of the output space direction $\mathbf{e}_i$ into the input space.


    After each growth step, a new learning phase has to take place, in order to readjust the map.

    For a detailed study of the algorithm we refer to [23].
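The growth-direction decision of Eqs. (3.12)-(3.14) can be sketched for a single receptive field as follows. This is a simplified illustration: border handling (footnote 5) and the correction factor of Eq. (3.14) are omitted, and the back-projected lattice directions are assumed to be passed in pre-normalized:

```python
import numpy as np

def growth_direction(v_batch, w_r, dir_vectors):
    """Choose the GSOM growth direction for one receptive field (sketch).
    dir_vectors: back-projected, normalized lattice directions (Eq. (3.12)),
    one row per existing output dimension.  The residual after removing these
    components is summarized by its first principal component (Eq. (3.13));
    the direction with the largest mean normalized amplitude wins
    (cf. Eq. (3.14), correction factor omitted here)."""
    E = v_batch - w_r                          # errors eps = v - w_r
    a = E @ dir_vectors.T                      # amplitudes a_i(v), i <= D_A
    resid = E - a @ dir_vectors                # residual v'
    # first principal component of the residuals via SVD
    _, _, vt = np.linalg.svd(resid - resid.mean(axis=0), full_matrices=False)
    a_new = resid @ vt[0]                      # a_{D_A+1}(v)
    amps = np.column_stack([a, a_new])
    norm = np.sqrt((amps ** 2).sum(axis=1, keepdims=True)) + 1e-12
    mean_amp = np.abs(amps / norm).mean(axis=0)
    return int(np.argmax(mean_amp))            # index D_A means: open a new dimension
```

If the returned index equals the current output dimensionality, a new lattice dimension is initialized; otherwise nodes are added along an existing direction, followed by a new learning phase as described above.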

The growing TRN adapts the number of neurons in the TRN, whereby the structure between them is re-arranged according to the TRN definition [24]. Hence, it is capable of representing the data space in a topologically faithful manner with increasing accuracy, and it also realizes a structure adaptation.

    3.3 Relevance Learning

As mentioned above, neural maps are unsupervised learning algorithms. In case of labeled

    data one can apply supervised classification schemes for vector quantization (VQ). Assuming

    that the data are labeled, an automatic clustering can be learned via attaching maps to

    the SOM or adding a supervised component to VQ to achieve learning vector quantization

    (LVQ) [25, 26]. Various modifications of LVQ exist which ensure faster convergence, a better

    adaptation of the receptive fields to optimum Bayesian decision, or an adaptation for complex

    data structures, to mention just a few [25, 27, 28].

    A common feature of unsupervised algorithms and LVQ is the information provided by

    the distance structure in V, which is determined by the chosen metric. Learning heavily

    relies on this feature and therefore crucially depends on how appropriate the chosen metric

    is for the respective learning task. Several methods have been proposed which adapt the

    metric during training [29],[30],[31].

Here we focus on metric adaptation for LVQ, since LVQ combines the elegance of simple and intuitive updates in unsupervised algorithms with the accuracy of supervised methods. Let $c_{\mathbf{v}} \in L$ be the label of input $\mathbf{v}$, where $L$ is a set of labels (classes) with $\#L = N_L$. LVQ uses a fixed number of codebook vectors (weight vectors) for each class. Let $W = \{\mathbf{w}_r\}$ be the set of all codebook vectors and $c_r$ be the class label of $\mathbf{w}_r$. Furthermore, let $W_c = \{\mathbf{w}_r \,|\, c_r = c\}$ be the subset of weights assigned to class $c \in L$.

    The training algorithm adapts the codebook vectors such that for each class $c \in L$ the corresponding codebook vectors $W_c$ represent the class as accurately as possible. This means that the set of points in any given class, $V_c = \{\mathbf{v} \in V \,|\, c_{\mathbf{v}} = c\}$, and the union $U_c = \bigcup_{r | \mathbf{w}_r \in W_c} \Omega_r$ of receptive fields of the corresponding codebook vectors should differ as little as possible.

    For a given data point $\mathbf{v}$ with class label $c_{\mathbf{v}}$, let $\mu(\mathbf{v})$ denote some function which is negative if $\mathbf{v}$ is classified correctly, i.e., it belongs to a receptive field $\Omega_r$ with $c_r = c_{\mathbf{v}}$, and which is


positive if $\mathbf{v}$ is classified incorrectly, i.e., it belongs to a receptive field $\Omega_r$ with $c_r \ne c_{\mathbf{v}}$. Let $f: \mathbb{R} \to \mathbb{R}$ be some monotonically increasing function. The general scheme of GLVQ aims to minimize the cost function

    $\mathrm{Cost} = \sum_{\mathbf{v}} f(\mu(\mathbf{v}))$   (3.15)

    via stochastic gradient descent. Different choices of $f$ and $\mu(\mathbf{v})$ yield LVQ or LVQ2.1 as introduced in [25], respectively, as shown in [27]. Denote by $d_r$ the squared Euclidean distance of $\mathbf{v}$ to the codebook vector $\mathbf{w}_r$. The choice of $f$ as the sigmoidal function and

    $\mu(\mathbf{v}) = \frac{d_{r^+} - d_{r^-}}{d_{r^+} + d_{r^-}}$ ,   (3.16)

    where $d_{r^+}$ is the squared Euclidean distance of the input vector $\mathbf{v}$ to the nearest codebook vector labeled with $c_{r^+} = c_{\mathbf{v}}$, say $\mathbf{w}_{r^+}$, and $d_{r^-}$ is the squared Euclidean distance to the best matching codebook vector with a label $c_{r^-} \ne c_{r^+}$, say $\mathbf{w}_{r^-}$, yields a powerful and noise-tolerant behavior, since it combines adaptation near the optimum Bayesian borders, similarly to LVQ2.1, with prohibiting the possible divergence of LVQ2.1 (as reported in [27]). We refer to this modification as the GLVQ. The respective learning dynamics are obtained by differentiation of the cost function (3.15) [27]:

    $\Delta \mathbf{w}_{r^+} = \epsilon \, \frac{4 f'|_{\mu(\mathbf{v})} \, d_{r^-}}{(d_{r^+} + d_{r^-})^2} \, (\mathbf{v} - \mathbf{w}_{r^+}) \quad \text{and} \quad \Delta \mathbf{w}_{r^-} = -\epsilon \, \frac{4 f'|_{\mu(\mathbf{v})} \, d_{r^+}}{(d_{r^+} + d_{r^-})^2} \, (\mathbf{v} - \mathbf{w}_{r^-})$   (3.17)
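A single stochastic update of Eqs. (3.16)-(3.17), with $f$ chosen sigmoidal, might look as follows; the prototype initialization and learning rate in the sketch are arbitrary:

```python
import numpy as np

def glvq_step(codebook, labels, v, c_v, eps=0.1):
    """One GLVQ update, Eqs. (3.16)-(3.17), with sigmoidal f.
    codebook: (N, D) array of weight vectors; labels: their class labels."""
    d = np.sum((codebook - v) ** 2, axis=1)             # squared Euclidean distances
    same = labels == c_v
    rp = np.flatnonzero(same)[np.argmin(d[same])]       # nearest correct prototype r+
    rm = np.flatnonzero(~same)[np.argmin(d[~same])]     # nearest wrong prototype r-
    dp, dm = d[rp], d[rm]
    mu = (dp - dm) / (dp + dm)                          # Eq. (3.16)
    fprime = 1.0 / (2.0 + np.exp(mu) + np.exp(-mu))     # f'(mu) for sigmoid f
    scale = 4.0 * fprime / (dp + dm) ** 2
    codebook[rp] += eps * scale * dm * (v - codebook[rp])   # attract r+
    codebook[rm] -= eps * scale * dp * (v - codebook[rm])   # repel r-
    return mu
```

Negative $\mu$ indicates a correct classification of $\mathbf{v}$; iterating the step over the training set performs the stochastic gradient descent on (3.15).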

To obtain a metric adaptation we now introduce input weights $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_{D_V})$, $\lambda_i \ge 0$, $\|\boldsymbol{\lambda}\| = 1$, in order to allow a different scaling of the input dimensions. Substituting the squared Euclidean metric by its scaled variant

    $d^{\boldsymbol{\lambda}}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{D_V} \lambda_i (x_i - y_i)^2$ ,   (3.18)

    the receptive field of codebook vector $\mathbf{w}_r$ becomes

    $\Omega_r^{\boldsymbol{\lambda}} = \{\mathbf{v} \in V : r = \Psi_{V \to A}^{\boldsymbol{\lambda}}(\mathbf{v})\}$   (3.19)

    with

    $\Psi_{V \to A}^{\boldsymbol{\lambda}} : \mathbf{v} \mapsto s(\mathbf{v}) = \mathrm{argmin}_{r \in A}\; d^{\boldsymbol{\lambda}}(\mathbf{v}, \mathbf{w}_r)$   (3.20)

    as mapping rule. Replacing $\Omega_r$ by $\Omega_r^{\boldsymbol{\lambda}}$ and $d$ by $d^{\boldsymbol{\lambda}}$ in the cost function (3.15) yields a variable weighting of the input dimensions and hence an adaptive metric. Now (3.17) reads

    $\Delta \mathbf{w}_{r^+} = \epsilon \, \frac{4 f'|_{\mu(\mathbf{v})} \, d^{\boldsymbol{\lambda}}_{r^-}}{(d^{\boldsymbol{\lambda}}_{r^+} + d^{\boldsymbol{\lambda}}_{r^-})^2} \, \Lambda (\mathbf{v} - \mathbf{w}_{r^+}) \quad \text{and} \quad \Delta \mathbf{w}_{r^-} = -\epsilon \, \frac{4 f'|_{\mu(\mathbf{v})} \, d^{\boldsymbol{\lambda}}_{r^+}}{(d^{\boldsymbol{\lambda}}_{r^+} + d^{\boldsymbol{\lambda}}_{r^-})^2} \, \Lambda (\mathbf{v} - \mathbf{w}_{r^-})$   (3.21)


with $\Lambda$ being the diagonal matrix with entries $\lambda_1, \ldots, \lambda_{D_V}$. Appropriate weighting factors can also be determined automatically via stochastic gradient descent. In this way the GLVQ learning rule (3.17) is augmented as follows:

    $\Delta \lambda_k = -\epsilon_1 \cdot 2 f'|_{\mu(\mathbf{v})} \left( \frac{d^{\boldsymbol{\lambda}}_{r^-}}{(d^{\boldsymbol{\lambda}}_{r^+} + d^{\boldsymbol{\lambda}}_{r^-})^2} (v_k - w_{r^+,k})^2 - \frac{d^{\boldsymbol{\lambda}}_{r^+}}{(d^{\boldsymbol{\lambda}}_{r^+} + d^{\boldsymbol{\lambda}}_{r^-})^2} (v_k - w_{r^-,k})^2 \right)$   (3.22)

    with $\epsilon_1 \in (0, 1)$ and $k = 1 \ldots D_V$, followed by normalization to obtain $\|\boldsymbol{\lambda}\| = 1$. We term this generalization of the Relevance Learning Vector Quantization (RLVQ) [32] and GLVQ Generalized Relevance Learning Vector Quantization or GRLVQ, as introduced in [33],[34].
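The relevance update of Eq. (3.22) can be sketched as follows, again with sigmoidal $f$. Note two assumptions of this sketch: negative components of $\boldsymbol{\lambda}$ are clipped to keep $\lambda_i \ge 0$, and the final normalization uses the sum of the weights, one common reading of $\|\boldsymbol{\lambda}\| = 1$:

```python
import numpy as np

def grlvq_lambda_step(lam, codebook, labels, v, c_v, eps1=0.01):
    """One relevance update of GRLVQ, Eq. (3.22): gradient descent on the
    cost with the weighted metric of Eq. (3.18), then renormalization."""
    d = (lam * (codebook - v) ** 2).sum(axis=1)      # weighted metric, Eq. (3.18)
    same = labels == c_v
    rp = np.flatnonzero(same)[np.argmin(d[same])]    # nearest correct prototype r+
    rm = np.flatnonzero(~same)[np.argmin(d[~same])]  # nearest wrong prototype r-
    dp, dm = d[rp], d[rm]
    mu = (dp - dm) / (dp + dm)                       # Eq. (3.16) with weighted d
    fprime = 1.0 / (2.0 + np.exp(mu) + np.exp(-mu))  # f'(mu) for sigmoid f
    grad = (dm * (v - codebook[rp]) ** 2
            - dp * (v - codebook[rm]) ** 2) / (dp + dm) ** 2
    lam = lam - eps1 * 2.0 * fprime * grad           # Eq. (3.22)
    lam = np.clip(lam, 0.0, None)                    # keep lambda_i >= 0 (assumption)
    return lam / lam.sum()                           # renormalize
```

Dimensions that separate $r^+$ from $r^-$ poorly accumulate weight, while uninformative dimensions are driven toward zero, yielding the relevance ranking exploited in Section 4.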

    Obviously, the same idea could be applied to any gradient dynamics. We could, for

    example, minimize a different error function such as the Kullback-Leibler divergence of the

    distribution which is to be learned and the distribution which is implemented by the vector

    quantizer. This approach is not limited to supervised tasks. Unsupervised methods such as

    the NG [9], which obey a gradient dynamics, could be improved by application of weighting

    factors in order to obtain an adaptive metric. Furthermore, in unsupervised topographic

    mapping models like the SOM or NG one can try to learn the relevance of input dimensions

    subject to the requirement of topology preservation [33],[35].

    3.4 Further Approaches

Besides the methods introduced above, further neurally inspired algorithms are well known. In this context we refer to the family of fuzzy clustering algorithms, for instance Fuzzy-SOM [36],[37],[38], Fuzzy-c-means [39],[40], or other (neural) vector quantizers based on minimization of an energy function [41],[42],[20]. For an overview of the respective approaches we refer to [43] (neural network approaches) and [44],[45] (pattern classification).

    4 Application in Remote Sensing Image Analysis

4.1 Low-Dimensional Data: LANDSAT TM Multi-spectral Images

    4.1.1 Image Description

LANDSAT TM satellite-based sensors produce images of the Earth in 7 different spectral bands. The ground resolution is 30 m × 30 m for bands 1–5 and band 7. Band 6


    (thermal band) is often dropped from analyses because of the lower spatial resolution. The

    LANDSAT TM bands were strategically determined for optimal detection and discrimination

    of vegetation, water, rock formations and cultural features within the limits of broad band

    multi-spectral imaging. The spectral information, associated with each pixel of a LANDSAT

    scene is represented by a vector v V RDV with DV = 6. The aim of any classification

    algorithm is to subdivide this data space into subsets of data points, with each subset

    corresponding to specific surface covers such as forest, industrial region, etc. The feature

    categories are specified by prototype data vectors (training spectra).

In the present contribution we consider a LANDSAT TM image from the Colorado area, U.S.A.⁶ In addition we have a manually generated label map (ground truth image) for the

    assessment of classification accuracy. All pixels of the Colorado image have associated ground

    truth labels, categorized into 14 classes. The classes include several regions of different

    vegetation and soil, and can be used for supervised learning schemes like the GRLVQ (see

    Table 1).

    4.1.2 Analyses of LANDSAT TM Imagery

An initial Grassberger-Procaccia analysis [46] yields $D_{GP} \approx 3.1414$ for the intrinsic dimensionality (ID) of the Colorado image.

SOM analysis  The GSOM generates a 12 × 7 × 3 lattice ($N_{max} = 256$), in agreement with the Grassberger-Procaccia analysis ($D_{GP} \approx 3.1414$); the map corresponds to a $\tilde{P}$-value of 0.0095, indicating good topology preservation (Figure 3).

    One way to visualize the GSOM clusters of LANDSAT TM data is to use a SOM di-

    mension DA = 3 [47] and interpret the positions of the neurons r in the lattice A as vectors

c(r) = (r, g, b) in the RGB color space, where r, g, b are the intensities of the colors red, green

and blue, respectively, computed as c_l(r) = (r_l − 1)/(n_l − 1) · 255 for l = 1, 2, 3, with n_l

the lattice extent in dimension l [47]. Such an assignment of

    colors to winner neurons immediately yields a pseudo-color cluster map of the original image

    for visual interpretation. Since we are mapping the data clouds from a 6-dimensional input

space onto a three-dimensional color space, dimensional conflicts may arise and the visual

    interpretation may fail. However, in the case of topologically faithful mapping this color

    representation, prepared using all six LANDSAT TM image bands, contains considerably

6 Thanks to M. Augusteijn (University of Colorado) for providing this image.


more information than a customary color composite combining three TM bands (frequently

bands 2, 3, and 4) [47].
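The lattice-to-color assignment described above can be sketched in a few lines; the function name and the 1-based lattice coordinate convention are ours, chosen to match the formula in the text.

```python
def lattice_colors(shape):
    """Map each neuron position r = (r1, r2, r3) of a 3-D SOM lattice of size
    n1 x n2 x n3 to an RGB triple c_l(r) = (r_l - 1) / (n_l - 1) * 255."""
    n1, n2, n3 = shape
    colors = {}
    for r1 in range(1, n1 + 1):
        for r2 in range(1, n2 + 1):
            for r3 in range(1, n3 + 1):
                colors[(r1, r2, r3)] = tuple(
                    round((rl - 1) / (nl - 1) * 255)
                    for rl, nl in zip((r1, r2, r3), (n1, n2, n3))
                )
    return colors

# the 12 x 7 x 3 GSOM lattice from the Colorado analysis
colors = lattice_colors((12, 7, 3))
print(colors[(1, 1, 1)])   # (0, 0, 0): one lattice corner maps to black
print(colors[(12, 7, 3)])  # (255, 255, 255): the opposite corner maps to white
```

Winner neurons at opposite corners of the lattice thus receive maximally different colors, so lattice distance translates directly into perceived color distance in the cluster map.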

Application of GRLVQ We trained the GRLVQ with 42 codebook vectors (3 for each class) on 5% of the data set until convergence. The algorithm converged in less than 10

cycles with learning rates of 0.1 and 0.01. The GRLVQ produced a classification accuracy of about

    90% on the training set as well as on the entire data set. The weighting factors resulting

    from GRLVQ analysis provide a ranking of the data dimensions:

λ = (0.1, 0.17, 0.27, 0.21, 0.26, 0.0) (4.1)

    Clearly, dimension 6 is ranked as least important with weighting factor close to 0. Dimension

    1 is the second least important followed by dimensions 2 and 4. Dimensions 3 and 5 are

    of similarly high importance according to this ranking. If we prune dimensions 6, 1, and

    2, an accuracy of 84% can still be achieved. Pruning dimension 4 brings the classification

    accuracy down to 50%, which is a very significant loss of information. This indicates that

    the intrinsic data dimension may not be as low as 2. These classification results are shown

    in Figure 4 with misclassified pixels colored black and classes keyed by the color wedge.
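A minimal sketch of how such a relevance vector is used: classification by the nearest prototype under the λ-weighted squared Euclidean distance, and pruning by zeroing the least relevant weights. The prototypes, labels, and input vector below are invented; only the λ values come from Eq. (4.1).

```python
import numpy as np

# relevance factors lambda found by GRLVQ for the six TM bands (Eq. 4.1)
lam = np.array([0.10, 0.17, 0.27, 0.21, 0.26, 0.00])

def weighted_dist(v, w, lam):
    """GRLVQ similarity measure: squared Euclidean distance with
    per-dimension relevance weights lambda_i."""
    return float(np.sum(lam * (v - w) ** 2))

def classify(v, prototypes, labels, lam):
    """Assign v the label of the nearest prototype under the weighted metric."""
    return labels[int(np.argmin([weighted_dist(v, w, lam) for w in prototypes]))]

# two hypothetical codebook vectors standing in for trained prototypes
protos = np.array([[0.2, 0.1, 0.8, 0.3, 0.7, 0.5],
                   [0.9, 0.8, 0.1, 0.6, 0.2, 0.5]])
labels = ["forest", "soil"]
v = np.array([0.25, 0.15, 0.75, 0.35, 0.65, 0.0])
print(classify(v, protos, labels, lam))            # forest

# pruning the least relevant dimensions (6, 1 and 2 in the text's ranking)
lam_pruned = lam.copy()
lam_pruned[[5, 0, 1]] = 0.0
print(classify(v, protos, labels, lam_pruned))     # still forest
```

Because the zeroed dimensions contribute nothing to the distance, pruning them is equivalent to discarding the corresponding image bands before classification.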

Use of the scaled metric (3.18) during training of a GSOM (which is equivalent to having

a new input space V′) results in a two-dimensional lattice structure of size

11 × 10. This is in agreement with a corresponding Grassberger-Procaccia estimate of

the intrinsic data dimension as DGP ≈ 2.261 when this metric is applied. However, as we

    saw above and in Figure 4, reducing the dimensionality of this Landsat TM image to 2 is

    not as simple as discarding the four input image planes that received the lowest weightings

    from the GRLVQ. Both the Grassberger-Procaccia estimate and the GSOM suggest the

intrinsic dimension of 2 for V′, whereas the dimension suggested by GRLVQ is at least 4.

Hence, the (scaled) data lie on a two-dimensional submanifold in V′. The distribution of

    weights for the two-dimensional lattice structure is visualized in Figure 5 for each original

    (but scaled) input dimension. All input dimensions, except the fourth, seem to be correlated.

The fourth dimension shows a clearly different distribution, which causes the two-dimensional

    representation. The corresponding clustering, visualized in the two-dimensional (r,g, 0) color

    space is depicted in Figure 4. Based on visual inspection this cluster structure compares

    better with the reference classification in Figure 4, upper left, than the GSOM clustering


    in Figure 4. The scaling information provided to the GSOM by GRLVQ based on training

    labels seems to have improved the quality of the clustering.
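The equivalence between the scaled metric and a transformed input space noted above can be checked directly: weighting the squared Euclidean metric by λ_i is the same as applying the plain metric to inputs rescaled by √λ_i. This is a sketch with made-up vectors, reusing the λ of Eq. (4.1).

```python
import numpy as np

# sum_i lam_i (v_i - w_i)^2  ==  || v' - w' ||^2   with   v'_i = sqrt(lam_i) v_i
lam = np.array([0.10, 0.17, 0.27, 0.21, 0.26, 0.00])

rng = np.random.default_rng(1)
v, w = rng.random(6), rng.random(6)        # two arbitrary 6-band "spectra"

d_weighted = np.sum(lam * (v - w) ** 2)    # scaled metric in the original space V
scale = np.sqrt(lam)
d_rescaled = np.sum((scale * v - scale * w) ** 2)   # plain metric in V'
print(np.isclose(d_weighted, d_rescaled))
```

Note that dimension 6, with λ₆ = 0, vanishes entirely in the rescaled space, which is why the GSOM trained under this metric effectively sees a lower-dimensional input manifold.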

    4.2 Hyperspectral Data: The Lunar Crater Volcanic Field

    AVIRIS Image

    4.2.1 Image Description

A Visible-Near Infrared (0.4–2.5 µm), 224-band, 20 m/pixel AVIRIS image of the Lunar

    Crater Volcanic Field (LCVF), Nevada, U.S.A., was analyzed in order to study SOM per-

    formance for high-dimensional remote sensing spectral imagery. (AVIRIS is the Airborne

    Visible-Near Infrared Imaging Spectrometer, developed at NASA/Jet Propulsion Labora-

    tory. See http://makalu.jpl.nasa.gov for details on this sensor and on imaging spectroscopy.)

The LCVF is one of NASA's remote sensing test sites, where images are obtained regularly.

    A great amount of accumulated ground truth from comprehensive field studies [48] and

    research results from independent earlier work such as [49] provide a detailed basis for the

    evaluation of the results presented here.

    Figure 6 shows a natural color composite of the LCVF with labels marking the locations

of 23 different surface cover types of interest. This 10 × 12 km² area contains, among other

    materials, volcanic cinder cones (class A, reddest peaks) and weathered derivatives thereof

    such as ferric oxide rich soils (L, M, W), basalt flows of various ages (F, G, I), a dry lake

    divided into two halves of sandy (D) and clayey composition (E); a small rhyolitic outcrop

    (B); and some vegetation at the lower left corner (J), and along washes (C). Alluvial material

    (H), dry (N,O,P,U) and wet (Q,R,S,T) playa outwash with sediments of various clay contents

    as well as other sediments (V) in depressions of the mountain slopes, and basalt cobble stones

    strewn around the playa (K) form a challenging series of spectral signatures for pattern

recognition (see [6]). A long, NW-SE trending scarp, straddled by the label G, borders

    the vegetated area. Since this color composite only contains information from three selected

    image bands (one Red, one Green, and one Blue), many of the cover type variations remain

    undistinguished. They will become evident in the cluster and class maps below.

    After atmospheric correction and removal of excessively noisy bands (saturated water

    bands and overlapping detector channels), 194 image bands remained from the original 224.

    These 194-dimensional spectra are the input patterns in the following analyses.


    The spectral dimensionality of hyperspectral images is not well understood and it is an

    area of active research. While many believe that hyperspectral images are highly redundant

    because of band correlations, others maintain an opposite view. Few investigations exist into

    the intrinsic dimensionality (ID) of hyperspectral images. Linear methods such as PCA or

determination of mixture model endmembers [50], [51] usually yield 3–8 endmembers.

    Optimally Topology Preserving Maps, a TRN approach [52], finds the spectral ID of the

LCVF AVIRIS image to be between 3 and 7, whereas the Grassberger-Procaccia estimate

[46] for the same image is DGP ≈ 3.06. These surprisingly low numbers, which increase with

    improved sensor performance [53], result from using statistical thresholds for the determina-

tion of what is relevant, regardless of the application-dependent meaning of the data.

    The number of relevant components increases dramatically when specific goals are con-

    sidered such as what cover classes should be separated. With an associative neural network,

Pendock [54] extracted 20 linear mixing endmembers from a 50-band (2.0–2.5 µm) seg-

    ment of an AVIRIS image of Cuprite, Nevada (another well known remote sensing test site),

setting only a rather general surface texture criterion. Benediktsson et al. [55] performed

    feature extraction on an AVIRIS geologic scene of Iceland, which resulted in 35 bands.

    They used an ANN (the same network that performed the classification itself) for Decision

    Boundary Feature Extraction (DBFE). The DBFE is claimed to preserve all features that

    are necessary to achieve the same accuracy by using all the original data dimensions, by the

    same classifier for predetermined classes. However, no comparison of classification accuracy

    was made using the full spectral dimension to support the DBFE claim. In their study a

relatively low number of classes, 9, was of interest, and the goal was to find the number of

    features that describe those classes. Separation of a higher number of classes may require

    more features.

    It is not clear how feature extraction should be done in order to preserve relevant in-

    formation in hyperspectral images. Selection of 30 bands from our LCVF image by any of

    several methods leads to a loss of a number of the originally determined 23 cover classes.

    One example is shown in Figure 8. Wavelet compression studies on an earlier image of the

same AVIRIS scene [56] conclude that various schemes and compression rates affect dif-

    ferent spectral classes differently, and none was found overall better than another, within

25%–50% compression (retaining 75%–50% of the wavelet coefficients). In a study on

    simulated, 201-band spectral data, [57] show slight accuracy increase across classifications


    on 20-band, 40-band, and 60-band subsets. However, that study is based on only two vege-

    tation classes, the feature extraction is a progressive hierarchical subsampling of the spectral

    bands, and there is no comparison with using the full, 201-band case. Comparative studies

    using full spectral resolution and many classes are lacking, in general, because few methods

    can cope with such high-dimensional data technically, and the ones that are capable (such

as Minimum Distance or Parallelepiped) often perform too poorly to merit consideration.

Undesirable loss of relevant information can result from using any of these feature extraction

    approaches. In any case, finding an optimal feature extraction requires great preprocessing

efforts just to tailor the data to available tools. An alternative is to develop capabilities

    to handle the full spectral information. Analysis of unreduced data is important for the

    establishment of benchmarks, exploration and novelty detection; as well as to allow for

the distinction of a significantly greater number of cover types than from traditional multi-

    spectral imagery (such as LANDSAT TM), according to the purpose of modern imaging

    spectrometers.

    4.2.2 SOM Analyses of Hyperspectral Imagery

    A systematic supervised classification study was conducted on the LCVF image

    (Figure 6), to simultaneously assess loss of information due to reduction of spectral dimen-

    sionality, and to compare performances of several traditional and an SOM-based hybrid

    ANN classifier. The 23 geologically relevant classes indicated in Figure 6 represent a great

    variety of surface covers in terms of spatial extent, the similarity of spectral signatures [6],

    and the number of available training samples. The full study, complete with evaluations of

    classification accuracies, is described in [58]. Average spectral shapes of these 23 classes are

    also shown in [6].

    Figure 7, top panel, shows the best classification, with 92% overall accuracy, produced

    by an SOM-hybrid ANN using all 194 spectral bands for input. This ANN first learns in an

    unsupervised mode, during which the input data are clustered in the hidden SOM layer. In

    this version the SOM uses the conscience mechanism of DeSieno [18], which forces density

    matching (i.e. a magnification factor of 1) as pointed out in Section 3.2 and illustrated

    for this particular LCVF image and SOM mapping in [59]. After the convergence of the

    SOM, the output layer is allowed to learn class labels via a Widrow-Hoff learning rule. The

    preformed clusters in the SOM greatly aid in accurate and sensitive classification, by helping


    prevent the learning of inconsistent class labels. Detailed description of this classifier is

    given in several previous scientific studies, which produced improved interpretation of high-

    dimensional spectral data compared to earlier analyses [60] [61] [62]. Training samples

    for the supervised classifications were selected based on field knowledge. The SOM hidden

layer was not evaluated or used for identification of spectral types (SOM clusters) prior

to training sample determination. Figure 7 reflects the geologists' view of the desirable

    segmentation.
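The conscience mechanism referred to above can be illustrated on a toy competition: each neuron's distance is biased by its running win frequency p_j (bias b_j = C(1/N − p_j)), so habitual winners are handicapped and win counts even out. The prototypes, input distribution, and the constants C and B below are invented for the demonstration and are not the parameters used for the LCVF map.

```python
import numpy as np

def conscience_winner(dists, p, C):
    """DeSieno's conscience: bias each neuron's distance by its running win
    frequency, handicapping frequent winners."""
    bias = C * (1.0 / len(dists) - p)
    return int(np.argmin(dists - bias))

def compete(C, B=0.001, steps=4000):
    rng = np.random.default_rng(2)
    # toy codebook: neuron 0 sits on the data, the other 7 are off to one side,
    # so without a conscience neuron 0 wins essentially every competition
    protos = np.vstack([np.zeros(4), rng.normal(1.0, 0.3, size=(7, 4))])
    p = np.zeros(8)                          # running win frequencies
    wins = np.zeros(8, dtype=int)
    for _ in range(steps):
        x = rng.normal(0.0, 0.2, size=4)
        d = np.sum((protos - x) ** 2, axis=1)
        j = conscience_winner(d, p, C)
        wins[j] += 1
        p += B * ((np.arange(8) == j) - p)   # p_j <- p_j + B(y_j - p_j)
    return wins

plain = compete(C=0.0)    # conscience switched off: winner-take-all
fair = compete(C=20.0)    # conscience active: wins are shared
print(plain)
print(fair)
```

With the bias active, neuron 0's monopoly is broken and other neurons are pulled into the competition, which is the density-matching (magnification factor 1) behavior exploited in the classifier above.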

    In order to apply Maximum Likelihood and other covariance based classifiers, the number

    of spectral channels needed to be reduced to 30, since the maximum number of training spec-

    tra that could be identified for all classes was 31. Dimensionality reduction was performed

    in several ways, including PCA, equidistant subsampling, and band selection by a domain

    expert. Band selection by domain expert proved most favorable. Figure 7, bottom panel,

    shows the Maximum Likelihood classification with 30 bands, which produced 51% accuracy.

    A number of classes (notably the ones with subtle spectral differences, such as N, Q, R, S,

    T, V, W) were entirely lost. Class K (basalt cobbles) disappeared from most of the edge of

    the playa, and only traces of B (rhyolitic outcrop) remained. Class G and F were greatly

    overestimated. Although the ANN classifier produced better results (not shown here) on the

    same 30-band reduced data set than the Maximum Likelihood, a marked drop in accuracy

    (to 75% from 92%) occurred compared to classification on the full data set. This emphasizes

    that accurate mapping of interesting, spatially small geologic units is possible from full

    hyperspectral information and with appropriate tools.

    Discovery in Hyperspectral Images with a SOM The previous section demonstrated

    the power of the SOM in helping discriminate among a large number of predetermined

    surface cover classes with subtle differences in the spectral patterns, using the full spectral

resolution. It is even more interesting to examine the SOM's performance for the detection

of clusters in high-dimensional data. Figure 8 displays a 40 × 40 SOM trained with

the conscience algorithm [18]. The input data space was the entire 194-band LCVF image.

    Groups of neurons, altogether 32, that were found to be sensitized to groups of similar

    spectra in the 194-dimensional input data, are indicated by various colors. The boundaries

    of these clusters were determined by a somewhat modified version of the U-matrix method

    [63]. The density matching property of this variant of the SOM facilitates proper mapping


    of spatially small clusters and therefore increases the possibility of discovery. Examples of

    discoveries are discussed below. Areas where no data points (spectra) were mapped are the

    grey corners with uniformly high fences, and are relatively small. The black background in

    the SOM lattice shows areas that have not been evaluated for cluster detection. The spatial

    locations of the image pixels mapped onto the groups of neurons in Figure 8, are shown in the

    same colors in Figure 9. Color coding for clusters that correspond to classes or subclasses of

    those in Figure 7, top, is the same as in Figure 7, to show similarities. Colors for additional

    groups were added.
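The standard (unmodified) U-matrix underlying that boundary detection can be sketched as follows: each lattice cell is assigned the mean weight-vector distance to its lattice neighbors, and ridges of high values ("fences") mark cluster boundaries. The 4 × 4 toy map with two converged weight regions is invented for illustration.

```python
import numpy as np

def u_matrix(weights):
    """U-matrix of a 2-D SOM: each cell holds the mean distance between a
    neuron's weight vector and those of its 4-connected lattice neighbors."""
    rows, cols, _ = weights.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            neigh = [(i + di, j + dj)
                     for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= i + di < rows and 0 <= j + dj < cols]
            U[i, j] = np.mean([np.linalg.norm(weights[i, j] - weights[a, b])
                               for a, b in neigh])
    return U

# toy 4 x 4 map: left half converged to one spectral type, right half to another
W = np.zeros((4, 4, 3))
W[:, :2] = [0.0, 0.0, 0.0]
W[:, 2:] = [1.0, 1.0, 1.0]
U = u_matrix(W)
print(np.round(U, 2))
# the ridge of nonzero values between columns 1 and 2 separates the two clusters
```

In the hyperspectral case the same computation runs on 194-dimensional weight vectors; only the distance grows in dimension, not the lattice bookkeeping.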

    The first observation is the striking correspondence between the supervised ANN class

    map in Figure 7, top panel, and this clustering: the SOM detected all classes that were known

    as meaningful geological units. The discovery of classes B (rhyolitic outcrop, white), F

    (young basalt flows, dark grey and black, some shown in the black ovals), G (a different

    basalt, exposed along the scarp, dark blue, one segment outlined in the white rectangle), K

    (basalt cobbles, light blue, one segment shown in the black rectangle), and other spatially

    small classes such as the series of playa deposits (N, O, P, Q, R, S, T) is significant. This

    is the capability we need for sifting through high-volume, high-information-content data to

    alert for interesting, novel, or hard-to-find units. The second observation is that the SOM

    detected more, spatially coherent, clusters than the number of classes that we trained for in

Figure 7. The SOM's view of the data is more refined and more precise than that of the

geologists'. For example, class A (in Figure 7) is split here into a red (peak of cinder cones)

    and a dark orange (flanks of cinder cones) cluster, that make geologic sense. The maroon

    cluster to the right of the red and dark orange clusters at the bottom of the SOM fills in some

    areas that remained unclassified (bg) in the ANN class map, in Figure 7. An example is the

    arcuate feature at the base of the cinder cone in the white oval, that apparently contains a

    material different enough to merit a separate spectral cluster. This material fills other areas,

    also unclassified in Figure 7, consistently at the foot of cinder cones (another example is seen

in the large black oval). Evaluation of further refinements is left to the reader. Evidence

    that the SOM mapping in Figure 9 approximates an equiprobabilistic mapping (that the

magnification factor for the SOM in Figure 9 is close to 1), using DeSieno's algorithm, is

    presented in [59].


    GSOM Analysis of the LCVF Hyperspectral Image As mentioned above, earlier

investigations yielded an intrinsic spectral dimensionality of 3–7 for the LCVF data set

[52]. The Grassberger-Procaccia estimate [46] DGP ≈ 3.06 corroborates the lower end of

    the above range, suggesting that the data are highly correlated, and therefore a drastic

    dimensionality reduction may be possible. However, a faithful topographic mapping is nec-

    essary to preserve the information contained in the hyperspectral image. In addition to the

conscience algorithm, explicit magnification control [19] according to (3.10) and the growing

    SOM (GSOM) procedure, as extensions of the standard SOM, are suitable tools [64]. The

GSOM produced a lattice of dimensions 8 × 6 × 6 for the LCVF image, which represents a

    radical dimension reduction, and it is in agreement with the Grassberger-Procaccia analysis

    above. The resulting false color visualization of the spectral clusters is depicted in Figure 10.

    It shows a segmentation similar to that in the supervised classification in Figure 7, top panel.

    Closer examination reveals, however, that a number of the small classes with subtle spectral

differences from others were lost (classes B, G, K, Q, R, S, T in Figure 7, top panel). In

    addition, some confusion of classes can be observed. For example, the bright yellow cluster

    appears at locations of known young basalt outcrops (class F, black, in Figure 7), and it also

    appears at locations of alluvial deposits (class H, orange in Figure 7, top panel). This may

    inspire further investigation into the use of magnification control, and perhaps the growth

criterion in the GSOM.

    5 Conclusion

Self-Organizing Maps have shown great promise for the analyses of remote sensing

    spectral images. With recent advances in remote sensor technology, very high-dimensional

spectral data have emerged and demand new and advanced approaches to cluster detection, vi-

    sualization, and supervised classification. While standard SOMs produce good results, the

    high dimensionality and large amount of hyperspectral data call for very careful evaluation

    and control of the faithfulness of topological mapping performed by SOMs. Faithful topolog-

    ical mapping is required in order to avoid false interpretations of cluster maps created by an

    SOM. We summarized several new approaches developed in the past few years. Extensions to

    the standard Kohonen SOM, the Growing Self-Organizing Map, magnification control, and

    Generalized Relevance Learning Vector Quantization were discussed along with the modified


    topographic product P. These ensure topology preservation through mathematical consid-

    erations. Their performance, and relationship to a former powerful SOM extension, the

    DeSieno conscience mechanism was discussed in the framework of case studies for both low-

    dimensional traditional multi-spectral, and very high-dimensional (hyperspectral) imagery.

The Grassberger-Procaccia analysis served as an independent estimate of intrinsic

dimensionality to benchmark ID estimation by GSOM and GRLVQ. While

we show some excellent data clustering and classification, a certain discrepancy remains

    between theoretical considerations and application results, notably with regard to intrinsic

    dimensionality measures and the consequences of dimensionality reduction to classification

    accuracy. This will be targeted in future work. Finally, since it is outside the scope of this

contribution, we want to point out that full-scale investigations such as the LCVF study also

    have to make heavy use of advanced image processing tools and user interfaces, to handle

    great volumes of data efficiently, and for effective graphics/visualization. References to such

    tools are made in the cited literature on data analyses.

    6 Acknowledgements

    E.M. has been supported by the Applied Information Systems Research Program of NASA,

Office of Space Science, grants NAG5-9045 and NAG5-10432. Contributions by Dr. William H.

    Farrand, providing field knowledge and data for the evaluation of the LCVF results, are

    gratefully acknowledged. Khoros (Khoral Research, Inc.) and NeuralWorks Professional

    (NeuralWare) packages were utilized in research software development.

    References

[1] E. Merényi, The challenges in spectral image analysis: An introduction and review of

ANN approaches, in: Proc. of the European Symposium on Artificial Neural Networks

(ESANN'99), D facto publications, Brussels, Belgium, 1999, pp. 93–98.

[2] J. Richards, X. Jia, Remote Sensing Digital Image Analysis, third, revised and enlarged

edition, Springer-Verlag, Berlin, Heidelberg, New York, 1999.

    [3] R. O. Green, Summaries of the 6th Annual JPL Airborne Geoscience Workshop, 1.

AVIRIS Workshop, Pasadena, CA (March 4–6, 1996).


    [4] J. Campbell, Introduction to Remote Sensing, The Guilford Press, U.S.A., 1996.

    [5] R. N. Clark, Spectroscopy of rocks and minerals, and principles of spectroscopy, in:

    A. Rencz (Ed.), Manual of Remote Sensing, John Wiley and Sons, Inc, New York,

    1999.

[6] E. Merényi, Self-organizing ANNs for planetary surface composition research, in: Proc.

of the European Symposium on Artificial Neural Networks (ESANN'98), D facto publica-

tions, Brussels, Belgium, 1998, pp. 197–202.

[7] T. Martinetz, K. Schulten, Topology representing networks, Neural Networks 7 (3)

(1994) 507–522.

    [8] T. Kohonen, Self-Organizing Maps, Vol. 30 of Springer Series in Information Sciences,

    Springer, Berlin, Heidelberg, 1995, (Second Extended Edition 1997).

    [9] T. M. Martinetz, S. G. Berkovich, K. J. Schulten, Neural-gas network for vector quan-

    tization and its application to time-series prediction, IEEE Trans. on Neural Networks

4 (4) (1993) 558–569.

[10] E. Erwin, K. Obermayer, K. Schulten, Self-organizing maps: Ordering, convergence properties and energy functions, Biol. Cyb. 67 (1) (1992) 47–55.

[11] H. Ritter, K. Schulten, On the stationary state of Kohonen's self-organizing sensory

mapping, Biol. Cyb. 54 (1986) 99–106.

    [12] T. Kohonen, Comparison of SOM point densities based on different criteria, Neural

    Computation 11 (8) (1999) 212234.

    [13] M. M. van Hulle, Faithful Representations and Topographic Maps From Distortion- to

    Information-based Self-organization, J. Wiley & Sons, Inc., 2000.

[14] T. Villmann, R. Der, M. Herrmann, T. Martinetz, Topology Preservation in Self-

Organizing Feature Maps: Exact Definition and Measurement, IEEE Transactions on

Neural Networks 8 (2) (1997) 256–266.

    [15] H. Ritter, Parametrized self-organizing maps, in: S. Gielen, B. Kappen (Eds.), Proc.

ICANN'93 International Conference on Artificial Neural Networks, Springer, London,

UK, 1993, pp. 568–575.


[16] J. Göppert, W. Rosenstiel, The continuous interpolating self-organizing map, Neural

Processing Letters 5 (3) (1997) 185–192.

    [17] H.-U. Bauer, K. R. Pawelzik, Quantifying the neighborhood preservation of Self-

Organizing Feature Maps, IEEE Trans. on Neural Networks 3 (4) (1992) 570–579.

[18] D. DeSieno, Adding a conscience to competitive learning, in: Proc. ICNN'88, Interna-

tional Conference on Neural Networks, IEEE Service Center, Piscataway, NJ, 1988, pp.

117–124.

[19] H.-U. Bauer, R. Der, M. Herrmann, Controlling the magnification factor of self-

organizing feature maps, Neural Computation 8 (4) (1996) 757–771.

    [20] R. Linsker, How to generate maps by maximizing the mutual information between input

and output signals, Neural Computation 1 (1989) 402–411.

[21] T. Villmann, M. Herrmann, Magnification control in neural maps, in: Proc. of the Eu-

ropean Symposium on Artificial Neural Networks (ESANN'98), D facto publications,

Brussels, Belgium, 1998, pp. 191–196.

    [22] T. Villmann, Controlling strategies for the magnification factor in the neural gas net-

work, Neural Network World 10 (4) (2000) 739–750.

[23] H.-U. Bauer, T. Villmann, Growing a Hypercubical Output Space in a Self-Organizing

Feature Map, IEEE Transactions on Neural Networks 8 (2) (1997) 218–226.

    [24] B. Fritzke, A growing neural gas network learns topologies, in: G. Tesauro, D. S. Touret-

zky, T. K. Leen (Eds.), Advances in Neural Information Processing Systems 7, MIT

    Press, Cambridge MA, 1995.

[25] T. Kohonen, S. Kaski, H. Lappalainen, J. Salojärvi, The adaptive-subspace self-

organizing map (ASSOM), in: International Workshop on Self-Organizing Maps

(WSOM'97), Helsinki, 1997, pp. 191–196.

    [26] A. Meyering, H. Ritter, Learning 3D-shape-perception with local linear maps, in: Pro-

ceedings of IJCNN'92, 1992, pp. 432–436.


    [27] A. S. Sato, K. Yamada, Generalized learning vector quantization, in: G. Tesauro,

    D. Touretzky, T. Leen (Eds.), Advances in Neural Information Processing Systems,

Vol. 7, MIT Press, 1995, pp. 423–429.

    [28] P. Somervuo, T. Kohonen, Self-organizing maps and learning vector quantization for

feature sequences, Neural Processing Letters 10 (2) (1999) 151–159.

    [29] M. Pregenzer, G. Pfurtscheller, D. Flotzinger, Automated feature selection with a dis-

tinction sensitive learning vector quantizer, Neurocomputing 11 (1) (1996) 19–29.

    [30] S. Kaski, Dimensionality reduction by random mapping: Fast similarity computation

for clustering, in: Proceedings of IJCNN'98, International Joint Conference on Neural

Networks, Vol. 1, IEEE Service Center, Piscataway, NJ, 1998, pp. 413–418.

    [31] S. Kaski, J. Sinkkonen, Metrics induced by maximizing mutual information, in: Helsinki

    University of Technology, Publications in Computer and Information Science, Report

    A55, Espoo, Finland, 1999.

    [32] T. Bojer, B. Hammer, D. Schunk, K. T. von Toschanowitz, Relevance determination

in learning vector quantization, in: Proc. of the European Symposium on Artificial Neural

Networks (ESANN'2001), D facto publications, Brussels, Belgium, 2001, pp. 271–276.

    [33] B. Hammer, T. Villmann, Estimating relevant input dimensions for self-organizing algo-

    rithms, in: N. Allinson, H. Yin, L. Allinson, J. Slack (Eds.), Advances in Self-Organising

Maps, Springer-Verlag, London, 2001, pp. 173–180.

    [34] B. Hammer, T. Villmann, Generalized relevance learning vector quantization, Neural

Networks 15 (8–9) (2002) 1059–1068.

[35] B. Hammer, T. Villmann, Input pruning for neural gas architectures, in: Proc. of the Eu-

ropean Symposium on Artificial Neural Networks (ESANN'2001), D facto publications,

Brussels, Belgium, 2001, pp. 283–288.

    [36] J. Bezdek, N. R. Pal, Fuzzification of the self-organizing feature map: will it work?,

Proceedings of the SPIE, The International Society for Optical Engineering 2061 (1993)

142–162.


    [37] T. Graepel, M. Burger, K. Obermayer, Self-organizing maps: generalizations and new

optimization techniques, Neurocomputing 21 (1–3) (1998) 173–190.

[38] M. Hasenjäger, H. Ritter, K. Obermayer, Active data selection for fuzzy topographic

mapping of proximities, in: G. Brewka, R. Der, S. Gottwald, A. Schierwagen (Eds.),

Fuzzy-Neuro-Systems'99 (FNS'99), Proc. of the FNS-Workshop, Leipziger Univer-

sitätsverlag, Leipzig, 1999, pp. 93–103.

    [39] J. C. Bezdek, N. R. Pal, R. J. Hathaway, N. B. Karayiannis, Some new competitive

learning schemes, Proceedings of the SPIE, The International Society for Optical En-

gineering 2492 (pt. 1) (1995) 538–549 (Applications and Science of Artificial Neural

Networks, Orlando, FL, USA, 17–21 April 1995).

    [40] I. Gath, A. Geva, Unsupervised optimal fuzzy clustering, IEEE Transactions on Pattern

Analysis and Machine Intelligence 11 (1989) 773–781.

[41] C. M. Bishop, M. Svensén, C. K. I. Williams, GTM: The generative topographic map-

ping, Neural Computation 10 (1998) 215–234.

[42] J. Buhmann, H. Kühnel, Vector quantization with complexity costs, IEEE Transactions

on Information Theory 39 (1993) 1133–1145.

    [43] S. Haykin, Neural Networks - A Comprehensive Foundation, IEEE Press, New York,

    1994.

    [44] R. Duda, P. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.

[45] J. Schürmann, Pattern Classification, J. Wiley and Sons Inc., New York, 1996.

[46] P. Grassberger, I. Procaccia, Measuring the strangeness of strange attractors, Physica

9D (1983) 189–208.

    [47] M. H. Gross, F. Seibert, Visualization of multidimensional image data sets using a

neural network, Visual Computer 10 (1993) 145–159.

[48] R. E. Arvidson, M. Dale-Bannister, et al., Archiving and distribution of geologic remote

sensing field experiment data, EOS, Transactions of the American Geophysical Union

72 (17) (1991) 176.


    [49] W. H. Farrand, VIS/NIR reflectance spectroscopy of tuff rings and tuff cones, Ph.D.

    thesis, University of Arizona (1991).

    [50] J. B. Adams, M. O. Smith, A. R. Gillespie, Imaging spectroscopy: Interpretation based on spectral mixture analysis, in: C. Peters, P. Englert (Eds.), Remote Geochemical Analysis: Elemental and Mineralogical Composition, Cambridge University Press, New York, 1993, pp. 145–166.

    [51] Research Systems Inc., ENVI v.3 User's Guide (1997).

    [52] J. Bruske, E. Merényi, Estimating the intrinsic dimensionality of hyperspectral images, in: Proc. of the European Symposium on Artificial Neural Networks (ESANN '99), D facto publications, Brussels, Belgium, 1999, pp. 105–110.

    [53] R. O. Green, J. Boardman, Exploration of the relationship between information content

    and signal-to-noise ratio and spatial resolution, in: Proc. 9th AVIRIS Earth Science and

    Applications Workshop, Pasadena, CA, 2000.

    [54] N. Pendock, A simple associative neural network for producing spatially homogeneous spectral abundance interpretations of hyperspectral imagery, in: Proc. of the European Symposium on Artificial Neural Networks (ESANN '99), D facto publications, Brussels, Belgium, 1999, pp. 99–104.

    [55] J. A. Benediktsson, J. R. Sveinsson, et al., Classification of very-high-dimensional data with geological applications, in: Proc. MAC Europe '91, Lenggries, Germany, 1994, pp. 13–18.

    [56] T. Moon, E. Merényi, Classification of hyperspectral images using wavelet transforms and neural networks, in: Proc. of the Annual SPIE Conference, San Diego, CA, 1995, Vol. 2569.

    [57] J. A. Benediktsson, P. H. Swain, et al., Classification of very high dimensional data using neural networks, in: IGARSS '90, 10th Annual International Geoscience and Remote Sensing Symp., Vol. 2, 1990, p. 1269.

    [58] E. Merényi, W. H. Farrand, et al., Efficient geologic mapping from hyperspectral images with artificial neural networks classification: a comparison to conventional tools, IEEE TGARS (2001) (in preparation).


    [59] E. Merényi, "Precision mining" of high-dimensional patterns with self-organizing maps: Interpretation of hyperspectral images, in: P. Sinčák, J. Vaščák (Eds.), Quo Vadis Computational Intelligence? New Trends and Approaches in Computational Intelligence, Studies in Fuzziness and Soft Computing, Vol. 54, Physica-Verlag, 2000.

    [60] E. S. Howell, E. Merényi, L. A. Lebofsky, Classification of asteroid spectra using a neural network, Jour. Geophys. Res. 99 (10) (1994) 847–865.

    [61] E. Merényi, E. S. Howell, et al., Prediction of water in asteroids from spectral data shortward of 3 microns, ICARUS 129 (10) (1997) 421–439.

    [62] E. Merényi, R. B. Singer, J. S. Miller, Mapping of spectral variations on the surface of Mars from high spectral resolution telescopic images, ICARUS 124 (10) (1996) 280–295.

    [63] A. Ultsch, Knowledge acquisition with self-organizing neural networks, in: I. Aleksander, J. Taylor (Eds.), Artificial Neural Networks, 2, Vol. I, North-Holland, Amsterdam, Netherlands, 1992, pp. 735–738.

    [64] T. Villmann, Neural maps for faithful data modelling in medicine – state of the art and exemplary applications, Neurocomputing 48 (1–4) (2002) 229–250.


    Tables


    class label   ground cover type

    a             scotch pine
    b             Douglas fir
    c             pine/fir
    d             mixed pine forest
    e             supple/prickle pine
    f             aspen / mixed pine forest
    g             without vegetation
    h             aspen
    i             water
    j             moist meadow
    k             bush land
    l             grass / pastureland
    m             dry meadow
    n             alpine vegetation
    o             misclassification

    Table 1: Table of labels and classes occurring in the LANDSAT TM Colorado image.


    Figures


    Figure 1: Comparison of the spectral resolution of the AVIRIS hyperspectral imager and the LANDSAT TM imager in the visible and near-infrared spectral range. AVIRIS represents a quasi-continuous spectral sampling, whereas LANDSAT TM has coarse spectral resolution with large gaps between the band passes. Additionally, the spectral reflectance characteristics of common earth surface materials are shown: 1 - water, 2 - vegetation, 3 - soil (from [2]).


    Figure 2: Left: The concept of hyperspectral imaging. Figure from [4]. Right: The spectral signature of the mineral alunite as seen through the 6 broad bands of Landsat TM, as seen by the moderate spectral resolution sensor MODIS (20 bands in this region), and as measured in the laboratory. Figure from [5]. Hyperspectral sensors such as AVIRIS of NASA/JPL [3] produce spectral details comparable to laboratory measurements.


    Figure 3: Left: The reference classification for the Landsat TM Colorado image, provided by M. Augusteijn. Classes are keyed as shown on the color wedge on the left, and the corresponding surface units are described in Table 1. Right: Cluster map of the Colorado image derived from all six bands by the 12×7×3 GSOM solution. Note that, due to the visualization scheme, which uses a continuous scale of RGB colors described in the text, this map does not show hard cluster boundaries. More similar colors indicate more similar spectral classes.
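A continuous coloring of this kind can be produced by deriving each cluster's color from the position of its units in the lattice, so that neighboring (and hence spectrally similar) clusters receive similar colors. A minimal sketch, assuming a three-dimensional lattice whose axes are mapped directly to the R, G, and B channels (the function name and the exact mapping are illustrative, not the authors' scheme):

```python
def lattice_to_rgb(pos, shape):
    """Map a 3-D SOM lattice position to an RGB triple in [0, 1].

    Each lattice axis is normalized to [0, 1] and used as one color
    channel, so neighboring units receive similar colors.
    """
    return tuple(p / max(s - 1, 1) for p, s in zip(pos, shape))
```

For a 12×7×3 lattice, `lattice_to_rgb((0, 0, 0), (12, 7, 3))` yields black and `lattice_to_rgb((11, 6, 2), (12, 7, 3))` yields white, with smooth color transitions for the units in between.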


    Figure 4: GRLVQ results for the Landsat TM Colorado image. Upper left: GRLVQ

    without pruning. Upper right: GRLVQ with pruning of dimensions 1, 2, and 6. Lower

    left: GRLVQ with pruning dimensions 1, 2, 6, and 4. Lower right: Clustering of the

    Colorado image using GSOM with GRLVQ scaling. For the three supervised classifications,

    the same color wedge and class labels apply as in Figure 3, left.
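The pruning referred to in this caption amounts to discarding the input dimensions whose learned relevance factors are smallest, since GRLVQ replaces the plain squared Euclidean distance by a relevance-weighted one. A minimal sketch (function and variable names are illustrative):

```python
def relevance_distance(x, w, lam):
    """Relevance-weighted squared distance of GRLVQ-style models:
    d_lambda(x, w) = sum_i lam[i] * (x[i] - w[i])**2."""
    return sum(l * (xi - wi) ** 2 for l, xi, wi in zip(lam, x, w))

def prune_dimensions(lam, dims):
    """Prune input dimensions by zeroing their relevance factors."""
    return [0.0 if i in dims else l for i, l in enumerate(lam)]
```

Pruning bands 1, 2, and 6 of a six-band Landsat TM vector then corresponds to `prune_dimensions(lam, {0, 1, 5})` (zero-based indices), after which `relevance_distance` ignores those bands entirely.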


    Figure 5: The distribution of weight values in the two-dimensional 11×10 GSOM lattice for the non-vanishing input dimensions 1–5, as a result of GRLVQ scaling of the Colorado image during GSOM training. Dimension 4 has a significantly different weight distribution from all others, while in the rest of the dimensions the variations are less dramatic. (Please note that the scaling is different for each dimension.) For visualization, the SOM-Toolbox provided by the Neural Network Group at Helsinki University of Technology was used.


    Figure 6: The Lunar Crater Volcanic Field. RGB natural color composite from a 1994 AVIRIS image. The full hyperspectral image comprises 224 image bands over the 0.4–2.5 µm wavelength range and 512×614 pixels, altogether 140 Mbytes of data. Labels indicate 23 different cover types described in the text. The ground resolution is 20 m/pixel.


    Figure 7: Top: SOM-hybrid supervised ANN classification of the LCVF scene, using 194 image bands. The overall classification accuracy is 92% [58]. Bottom: Maximum Likelihood classification of the LCVF scene. 30 strategically selected bands were used, due to the limited number of training samples for a number of classes. Considerable loss of class distinction occurred compared to the ANN classification, resulting in an overall accuracy of 51%. 'bg' stands for background (unclassified pixels).


    Figure 8: Clusters identified in a 40×40 SOM. The SOM was trained on the entire 194-band LCVF image, using the DeSieno algorithm [18].
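The DeSieno algorithm cited here adds a "conscience" bias to the winner competition so that units that win too often are handicapped and all units come to win with roughly equal frequency. A minimal sketch of the selection step under assumed parameter values (the names `conscience_winner`, `bias_factor`, and `beta` are illustrative, not taken from [18] verbatim):

```python
def conscience_winner(weights, x, p, bias_factor=10.0, beta=1e-4):
    """DeSieno-style winner selection with a 'conscience' bias.

    weights : list of prototype vectors
    x       : input vector
    p       : list of running win frequencies (updated in place)
    """
    n = len(weights)

    def sqdist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # units that have won more often than average (p[i] > 1/n) get a
    # positive penalty added to their distance
    scores = [sqdist(w) - bias_factor * (1.0 / n - p[i])
              for i, w in enumerate(weights)]
    winner = min(range(n), key=scores.__getitem__)
    # update the win-frequency estimates with a small step beta
    for i in range(n):
        p[i] += beta * ((1.0 if i == winner else 0.0) - p[i])
    return winner
```

Because the bias grows with a unit's win frequency, a prototype that dominates early is gradually forced to cede inputs to under-used units, driving the map toward equiprobable winning.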


    Figure 9: The clusters from Figure 8 remapped to the original spatial image, to show where the different spectral types originated. The relatively large, light grey areas correspond to the black, unevaluated parts of the SOM in Figure 8. Ovals and rectangles highlight examples of small classes with subtle spectral differences, discussed in the text. 32 clusters were detected and found geologically meaningful, adding to the geologist's view reflected in Figure 7, top panel.


    Figure 10: GSOM-generated cluster map of the same 194-band hyperspectral image of the Lunar Crater Volcanic Field, Nevada, USA, as in Figure 7. It shows groups similar to those in the supervised classification map in Figure 7, top panel. Subtle spectral units such as B, G, K, Q, R, S, and T were not separated, however; for example, the yellow unit appears both at locations of known young basalt deposits (class F, black, in Figure 7, top panel) and at locations of alluvial deposits (class H, orange, in Figure 7).