UNIVERSITY POLYTECHNIC OF MADRID
FACULTY OF COMPUTER SCIENCE
DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES
FOR CLASSIFICATION LANDSAT TM IMAGES
Presented by
KAMAL R. AL-RAWI
To obtain the Ph.D. in Computer Science
MADRID- SPAIN
2001
CONSUELO GONZALO MARTIN, Associate Professor,
Department of Architecture and Technology of Computer
Systems, Faculty of Computer Science, University
Polytechnic of Madrid.
CERTIFIES: that the thesis entitled "DESIGN NEW
SUPERVISED ART-TYPE ARTIFICIAL NEURAL
NETWORKS, AND THEIR PERFORMANCES FOR
CLASSIFICATION LANDSAT TM IMAGES", has been
carried out by KAMAL R. AL-RAWI, under my
supervision, in the Department of Architecture and
Technology of Computer Systems, Faculty of Computer
Science, University Polytechnic of Madrid.
To
Prof. Dr. Amos Eddy
ACKNOWLEDGEMENTS
I gratefully thank Dr. Consuelo Gonzalo Martín, associate professor of computer science at the Faculty of Computer Science, University Polytechnic of Madrid, for her continuous efforts during the supervision of this thesis. Her guidance and criticism were a great help to me.
The criticism of Dr. Águeda Arquero Hidalgo and Dr. Estibaliz Martínez Izquierdo was a great help. They were always in touch during the preparation of this work.
My grateful thanks to Professor Dr. Pedro Gómez Vilda and the rest of the Working Group on Computer Technology, Dr. Victoria Rodellar Biarge, Dr. Mercedes Pérez Castellanos, and Dr. Víctor Nieto Lluis, for their support and useful discussions.
I would like to thank all the graduate students in the group, especially Vicente Garcia del Cantara, for the friendly atmosphere during my stay in the Department of Architecture and Technology of Computer Systems.
My thanks to the secretary of the department, Mrs. M. del Carmen Parró Cruz, who was always there to arrange our administrative work.
I gratefully thank Professor Dr. José Luis Casanova, Director of the Remote Sensing Laboratory (LATUV), University of Valladolid, for allowing me to use the facilities of the laboratory.
My thanks to Miss Sarah Strauss and Miss Nicole Knudsen for proofreading the thesis.
Finally, I need to thank my wife Eman, my daughter Hiba, and my sons Saif Al-Deen and Haitham for their support during the preparation of this work.
INDEX
CHAPTER I: INTRODUCTION
1.1 Historical background
1.2 Adaptive Resonance Theory ANNs
1.2.1 Unsupervised ART ANNs
1.2.2 Supervised ART ANNs
1.3 Classifying remotely sensed data with ANNs
1.4 Objectives
CHAPTER II: FUZZY ART ANN
2.1 Introduction
2.2 Matching system and vigilance parameter
2.3 Fuzzy ART dynamics
2.4 Fast-learning slow record option
2.5 Complement coding
2.6 Fuzzy subset and conservative limit
2.7 Training Algorithms of Fuzzy ART
2.8 Evolution of Fuzzy ART
2.9 Newly developed versions of Fuzzy ART
2.9.1 Flagged approach
2.9.2 Training algorithms of Flagged-Fuzzy ART
2.9.3 Compact approach
2.9.4 Training algorithms of Compact-Fuzzy ART
2.10 Categorization
CHAPTER III: FUZZY ARTMAP
3.1 Introduction
3.2 Fuzzy ARTMAP
3.2.1 Vigilance parameter dynamics in supervised environment
3.2.2 Training phase
3.2.3 Classification phase
3.3 Full algorithm of Fuzzy ARTMAP
3.3.1 Training algorithms of Fuzzy ARTMAP
3.3.2 Classification algorithm of Fuzzy ARTMAP
CHAPTER IV: SUPERVISED ART-I ANN
4.1 Introduction
4.2 Supervised ART-I
4.2.1 Architecture of Supervised ART-I
4.2.2 Data Description
4.2.3 Training of Supervised ART-I
4.2.4 Classification by Supervised ART-I
4.3 Algorithm of Supervised ART-I
4.3.1 Training Algorithm of Supervised ART-I
4.3.2 Classification algorithm of Supervised ART-I
4.4 Discussion
CHAPTER V: SUPERVISED ART-II ANN
5.1 Introduction
5.2 Supervised ART-II
5.2.1 Architecture of Supervised ART-II
5.2.2 Training of Supervised ART-II
5.2.3 Classification by Supervised ART-II
5.3 Full algorithm of Supervised ART-II
5.3.1 Training algorithm of Supervised ART-II
5.3.2 Classification algorithm of Supervised ART-II
5.4 Discussion
CHAPTER VI: PERFORMANCE OF SUPERVISED ART-I&II FOR CLASSIFICATION OF LANDSAT TM IMAGES
6.1 Satellites Landsat
6.2 Data
6.3 Performance
6.3.1 Training performance
6.3.2 Classification performance
CHAPTER VII: PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS
7.1 Introduction
7.2 Vigilance dynamics
7.2.1 Flying approach
7.2.2 Fixed vigilance approach
7.2.3 Free vigilance approach
7.2.4 Floating approach
7.3 Results and discussion
CHAPTER VIII: CONCLUSIONS
BIBLIOGRAPHY
APPENDIX: SUMMARY
A.1. INTRODUCTION
A.1.1 Historical evolution of Artificial Neural Networks (ANNs)
A.1.2 Classification of remotely sensed data with ANNs
A.2. OBJECTIVES OF THE THESIS
A.3. ART-TYPE ARTIFICIAL NEURAL NETWORKS
A.3.1 Fuzzy ART
A.3.2 Fuzzy ARTMAP
A.4. PROPOSAL OF TWO IMPROVED VERSIONS OF FUZZY ART
A.4.1 "Flagged" version of Fuzzy ART
A.4.2 "Compact" version of Fuzzy ART
A.5. PROPOSAL OF TWO NEW SUPERVISED ART-TYPE ARCHITECTURES
A.5.1 Learning and classification algorithms of the Supervised ART-I architecture
A.5.2 Learning and classification algorithms of the Supervised ART-II architecture
A.6. EVALUATION OF THE PERFORMANCE OF SUPERVISED ART-I AND SUPERVISED ART-II IN THE CLASSIFICATION OF REMOTELY SENSED IMAGES
A.7. PERFORMANCE OF SUPERVISED ART-TYPE NETWORKS FOR DIFFERENT DYNAMICS OF THE VIGILANCE PARAMETER
A.8. CONCLUSIONS
LIST OF FIGURES
Figure 2-1: Fuzzy ART dynamics
Figure 2-2: The architecture of Fuzzy ART
Figure 2-3: The architecture of Flagged Fuzzy ART
Figure 2-4: The architecture of Compact Fuzzy ART
Figure 3-1: Block diagram showing supervision through the map field
Figure 3-2: The full architecture for supervision through the map field
Figure 3-3: The architecture of ARTMAP for the classification problem
Figure 3-4: Full architecture of Fuzzy ARTMAP
Figure 3-5: Match tracking using the flying vigilance parameter
Figure 4-1: Training of map field weights
Figure 4-2: Supervision dynamic of the tagging approach of Supervised ART-I
Figure 4-3: Architecture of Supervised ART-I
Figure 5-1: Supervision dynamic of the stacking approach of Supervised ART-II
Figure 5-2: Architecture of Supervised ART-II
Figure 5-3: Determination of the winning node in the stacking-supervision approach of Supervised ART-II
Figure 6-1: Number of category nodes in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-2: Training time, in minutes, for Supervised ART-I, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-3: Training time, in minutes, for Supervised ART-II, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-4: The ratio of training time for Supervised ART-I / Supervised ART-II, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-5: Classification time, in minutes, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, for 52 440 pixels of the Landsat TM images
Figure 6-6: Classification performance, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, for Landsat TM images
Figure 6-7: The upper image is the reference image. The lower image is the image classified using Supervised ART-II, with vigilance parameter $\rho = 0.98$, dynamic learning parameter $\beta = 0.50$, and training with 9000 exemplars. The classification accuracy is 85.82%
Figure 7-1: Sketches showing different vigilance parameter dynamics: fixed, free, and float approaches
Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns show images classified using the fly, float, fixed, and free vigilance parameter, respectively. The first, second, third, fourth, and fifth rows show images classified using initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), (0.00, 0.15), respectively
LIST OF TABLES
Table 2-1: Comparison among Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study
Table 5-1: Comparisons between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II
Table 6-1: Descriptions for Landsat-5 Thematic Mapper (TM) images
Table 6-2: Performance of Supervised ART-II when trained with different sizes of training samples
Table 6-3: Training and classification statistics for Landsat TM image at individual class level
Table 6-4: The confusion matrix for the classification process for the 52 440 pixels of the Landsat TM image
Table 7-1: The performance of Supervised ART-II ANN with different vigilance dynamics
ABSTRACT
New supervised ART ANNs with simple architectures have been developed in
this study. Their architectures are built from a single ART module rather than from
a pair of modules connected by a map field, as in all other supervised ART-type ANNs
reported in the literature. Two different algorithms have been developed:
Supervised ART-I and Supervised ART-II. The developed algorithms reduce the
number of dynamic parameters, the memory requirement, and the training time, which is
the major problem facing ANNs, without altering the classification accuracy.
Two simplified versions of the Fuzzy ART algorithm have been developed that keep
the categorization performance of the original algorithm: Flagged
Fuzzy ART and Compact Fuzzy ART. While Supervised ART-I and Supervised ART-II
are general in nature and can be applied to all ART ANNs, this work addresses the
supervision of Compact Fuzzy ART. The full algorithms for Supervised ART-I
and Supervised ART-II are listed.
The newly developed ANNs have been applied to classify Landsat Thematic
Mapper (TM) images. The performance of the systems has been tested for different
dynamic parameters and different training samples, and their behavior across the
space of the vigilance parameter and the dynamic learning parameter has been
characterized.
Only one approach to vigilance dynamics in supervised ART-type ANNs
has been addressed in the literature. Three more approaches have been developed in this
study: fixed, free, and float. The performance of the developed ANNs for the classification
of Landsat TM images has been tested for all these vigilance dynamics.
CHAPTER I
INTRODUCTION
1.1 Historical background
Although the roots of the field of Artificial Neural Networks (ANNs) extend to
1943, when McCulloch and Pitts built the first artificial neural structure, its foundations
were established in the mid-seventies. Werbos (1974) developed the principle of the
Back Propagation (BP) ANN, and Grossberg (1976) developed the principle of Adaptive
Resonance Theory (ART) ANNs. However, the great theoretical advances in the field
were achieved in the 1980s. In that decade the algorithms of the BP ANN were
developed independently by several authors (Le Cun 1986, Parker 1986, and Rumelhart
et al. 1986). The Kohonen Self-Organizing Map (KSOM) (Kohonen 1982) and the Hopfield
ANN (Hopfield 1982) were developed, and many advances were achieved for
ART ANNs (Carpenter & Grossberg 1987a&b). ART ANNs are the concern of this
study because of their stability, speed, and accuracy (Carpenter et al. 1991a&b, 1992,
1997, and Gan & Lua 1992).
ART ANNs have been applied in many fields. The Boeing Company has
implemented an ART-1 neural information retrieval system for its engineering designs
(Caudell et al. 1994). Boeing has thousands of designs for its aircraft parts.
Features are extracted for each design and presented to the network to
establish categories for these designs. When a new design is needed, its features are
presented to the system to determine the category to which the required design belongs.
Retrieving features from the designs in the indicated category avoids
repeating work for the new design.
ART ANNs have been employed for target recognition (Seibert & Waxman
1992). Their approach extracts features of the target (aircraft) from different views.
Bernardon & Carrick (1995) have also used them for target recognition with Synthetic-
Aperture Radar (SAR) imagery. After the network is trained, target recognition is done
by matching the signal of the target against a set of stored target models. Kumar &
Guez (1989) and Waxman et al. (1995) have used ART ANNs for target recognition too.
Kumar and Guez worked with visible imagery, while Waxman and his group worked with
visible, infrared, and SAR imagery.
Moreover, ART ANNs have been employed for robot sensory motor control
(Baloch & Waxman 1991, Bachelder et al. 1993, Dubrawski & Crowley 1994,
Srinivasa & Sharma 1996) and robot navigation (Racz & Dubrawski 1995); machine
vision (Caudell & Healy 1994); object recognition (Seibert & Waxman 1992); face
recognition (Seibert & Waxman 1993); pattern clustering (Moore 1989, Mekkaoui &
Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing
(Simpson 1990); medical imaging (Soliz & Donohoe 1996); electrocardiogram wave
recognition (Ham & Han 1996); signature verification (Murshed et al. 1995); fault
identification in a nuclear power plant (Keyvan 1999); and remote sensing
(Gopal et al. 1994; Baraldi & Parmiggiani 1995).
1.2 Adaptive Resonance Theory ANNs
There are two types of ANNs, supervised and unsupervised. In the unsupervised
case only the input features are introduced to the input layer, and the network then
categorizes them. In the supervised type of ANN, the class code is supplied to
the network together with the input features. During the training phase, when the network
correctly classifies an input feature, the weights are trained; otherwise a correction
is made.
1.2.1 Unsupervised ART ANNs
The principle of ART ANNs was introduced in the literature as a theory of
human cognitive information processing (Grossberg 1976, 1980). Since then a series of
ART-based ANNs have been developed for unsupervised category learning and pattern
recognition in real time: ART1 (Carpenter & Grossberg 1987a), ART2 (Carpenter &
Grossberg 1987b), ART3 (Carpenter & Grossberg 1990), SART (Baraldi &
Parmiggiani 1995), and Fuzzy ART (Carpenter et al. 1991a). ART1 can
categorize arbitrary binary input patterns (Carpenter & Grossberg 1987a). ART2 can
deal with analog patterns as well as binary ones (Carpenter & Grossberg 1987b).
In ART1 and ART2, information flows forward through weights that connect
each node in the input layer to all nodes in the category layer, and backward through
another set of weights that connect each category node to all nodes in the input layer.
A simpler architecture of unsupervised ART ANN, called Fuzzy ART, was developed by
Carpenter et al. (1991a). Like ART2, it can
categorize analog multi-valued input patterns as well as binary input patterns. Weights
in Fuzzy ART connect each node in the input layer to all category nodes. Information flows
through these weights in one direction, from the input layer to the category layer. Fuzzy
ART is explained in detail in Chapter II.
1.2.2 Supervised ART ANNs
In the early nineties, two supervised ART architectures were developed:
ARTMAP (Carpenter et al. 1991b) and Fuzzy ARTMAP (Carpenter et al. 1992).
The architecture of ARTMAP is built from two modules of ART1, while the architecture
of Fuzzy ARTMAP is built from two modules of Fuzzy ART. ARTMAP can
learn and classify binary multi-valued input patterns. Fuzzy ARTMAP
can learn and classify analog input patterns in addition to binary
ones (Carpenter et al. 1992). More supervised ART-type ANNs have since been developed:
ART-EMAP (Carpenter & Ross 1993), Gaussian ARTMAP (Williamson 1996),
ARTMAP-IC (Carpenter & Markuzon 1998), and Distributed ARTMAP (Carpenter
1998). In all these architectures, supervision is done through a map field that
requires two modules of ART.
Fuzzy ARTMAP has been used widely. It showed better performance than
various other ANNs dealing with different problems such as, automatic analysis of
electrocardiogram (Ham & Han 1996); diagnostic monitoring of nuclear plants (Keyvan
et al. 1993); and prediction of protein secondary structure (Mehta et al. 1993).
1.3 Classifying remotely sensed data with ANNs
Mapping land cover using remotely sensed data is a very active area of
research, due to the advances in space and computer technology (Benediktsson et al.
1990). Conventional classification is usually employed for this task. However, neural
networks have often been used in the last decade. The main advantages of neural
networks over conventional classifiers such as the Maximum Likelihood Classifier (MLC)
are the following. 1) They are non-parametric; therefore, the probability distributions
for each class are not required. This allows ancillary data (slope, topography, aspect,
etc.) to be introduced to the network in addition to the spectral data, which many
authors report can increase the classification accuracy (Benediktsson et al. 1990,
Carpenter et al. 1997). Moreover, neural networks are more robust when the distribution
is not Gaussian (Paola & Schowengerdt 1997, Hepner et al. 1990). 2) Unlike conventional
classifiers, neural networks are able to manage fuzzy classifications (Paola &
Schowengerdt 1997, Warner & Shank 1997, Yool 1998). The numbers in the output represent
the strength of the class membership of the specific input. This is very important when
dealing with low spatial resolution. 3) The parallel nature of neural networks allows
the classification process to be sped up by implementing them on parallel
computers (Salu & Tilton 1993, Heermann & Khazenie 1992). 4) Neural networks
are flexible with respect to classification improvement (Carpenter et al. 1997). 5) They
are able to establish arbitrary decision boundaries (Paola & Schowengerdt 1995,
Tzeng et al. 1994).
"Neural networks offer a flexible approach to building the complex, highly
non-linear models that required for a complex system. ... Unlike traditional expert
systems where knowledge is made explicit in the form of rules, neural networks
generate their own rules by learning from exemplars" (Keyvan 1993).
Multi-Layer Perceptron (MLP) with Back Propagation learning is the neural
network most commonly used in the literature to classify remotely sensed data. This is
due to the attractive learning approach of the network, which is based on minimizing
the error between the output of the network and the target value. While some authors
have reported that conventional classifiers perform better than MLP (Mulder &
Spreeuwers 1991, Solaiman & Mouchot 1994), many authors have reported that MLP
performs better than MLC in classifying remotely sensed data (Hepner et al. 1990,
Heermann & Khazenie 1992, Paola & Schowengerdt 1994, Yoshida & Omatu 1994). The
classification performance of MLP can be improved by using ancillary data in addition
to the spectral data (Benediktsson et al. 1990). However, employing MLP as a classifier
incurs many problems. The architecture of the network is not fixed: the number of
hidden layers and the number of nodes in each hidden layer must be determined by trial
and error. This is a very costly process given the long training time of the
network. In addition, MLP might fall into a local minimum during the training
phase. Moreover, MLP might not converge. Using a small learning rate to avoid the
convergence problem makes the already long training time of the MLP network much longer.
Heermann & Khazenie (1992) suggested using parallel computers to reduce the training
time. This reduces the training time but increases the hardware cost.
For classification of a Landsat image, Carpenter et al. (1997) reported that
MLP did not converge, using a learning rate of 0.6 and a momentum rate of 0.4, after 212
minutes of training on a SUN 4 SPARC Station with 100 000 input presentations.
They employed a lower learning rate to avoid the convergence problem. The training
time then exceeded 1000 minutes, while the classification accuracy was less than 27%. They
reported that Fuzzy ARTMAP (Carpenter et al. 1992) achieves better classification
accuracy than MLP with lower training time. They also reported that Fuzzy ARTMAP
and MLC achieve the same level of classification accuracy. Fuzzy ARTMAP has also been
employed by Mannan et al. (1998) to classify a 512x512-pixel image from the
Linear Imaging Self-scanning Sensor (LISS-II) of the Indian Remote Sensing Satellite
(IRS-1B) into 13 classes. They reported that Fuzzy ARTMAP performs better than
both MLC and MLP in classification accuracy. The average classification accuracies for
six data sets are 84.7%, 80.3%, and 79.9% for Fuzzy ARTMAP, MLC, and MLP, respectively.
They reported that the training time of Fuzzy ARTMAP was slightly less than that of MLC
and many times less than that of MLP.
Unlike MLP, Fuzzy ARTMAP has a well-defined architecture, always
converges, and can tune itself to represent sub-classes by generating new category
nodes. However, the main drawback of Fuzzy ARTMAP lies in its complex architecture:
it is constructed from two modules of Fuzzy ART linked by a map field.
1.4 Objectives
The global objective of this work is to design new, simplified versions of ART
ANN architectures that maintain the performance of the originals while reducing
computation time and memory.
This objective can be divided into several partial objectives:
• Design new simple architectures of ART-type ANNs that provide the same
classification performance as classical ARTs.
• Develop learning and classification algorithms for these architectures.
• Implement the developed algorithms.
• Study the behavior of the developed architectures for the classification of remotely
sensed Landsat Thematic Mapper (TM) images over the whole domain of the dynamic
parameters.
The layout of this study is as follows. Chapter II deals with Fuzzy ART.
Chapter III deals with Fuzzy ARTMAP. Chapter IV deals with the newly developed
architecture Supervised ART-I. Chapter V deals with the newly developed
architecture Supervised ART-II. The performance of the Supervised ART-I and Supervised
ART-II ANNs for learning and classifying Landsat TM images is addressed in Chapter
VI. The performance of the newly developed ANNs using different vigilance dynamics is
addressed in Chapter VII. Conclusions are listed in Chapter VIII.
CHAPTER II
FUZZY ART ANN
2.1 Introduction
Fuzzy ART is an unsupervised ART-based ANN. Its architecture has been
designed for learning and categorization of arbitrary analog or binary multi-valued
input patterns. This has been achieved by using the minimum operator ($\wedge$) of fuzzy
set theory instead of the intersection operator ($\cap$) of set theory, which is
employed in ART1.
2.2 Matching system and vigilance parameter
"Fuzzy ART incorporates the basic features of all ARTs system, notably,
pattern matching between bottom-up input and top-down learned prototype vectors.
This matching process leads either to a resonant state that focuses attention and triggers
stable prototype learning or a self-regulating parallel memory search. If the search ends
by selecting an established category, then the category's prototype may be refined to
incorporate new information in the input pattern. If the search ends by selecting a
previously untrained node, then learning of a new category takes place" (Carpenter et
al. 1991a). If the match value reaches the predetermined value, resonance
occurs and new information is incorporated into the winning category node by
training its weights; otherwise, a self-organizing parallel memory search is conducted.
The match criterion is called the vigilance parameter $\rho$. It calibrates the minimum
confidence that a category node must have to represent the current input before a search
for a better committed category node is triggered. If all committed category nodes fail
to represent the current input, a new category node is committed, as long as the
network's memory capacity is not fully utilized. The vigilance parameter is a
non-dimensional number $\rho \in (0, 1]$. A value of 1 means perfect matching. A low
vigilance parameter leads to code compression with broad generalization for categories.
A high vigilance parameter leads to a large number of category nodes with fine categories.
The vigilance parameter is the key feature of all ART ANNs. An ART ANN can
discriminate down to the individual level by setting $\rho = 1$, while it creates a single
category node for all data by setting $\rho = 0$. The value of the vigilance parameter is
determined according to the type and amount of data available, the categorization level
sought, the required speed, and the available memory. The vigilance parameter is fixed
during training in all unsupervised ART ANNs.
2.3 Fuzzy ART dynamics
Input patterns $A^{(t)}$, with components $A_i^{(t)} \in [0, 1]$, are presented at the
input layer $F_1$. The choice function $T_j^{(t)}$ for each committed category node of the
category layer $F_2$ is computed according to equation (2-1). The choice value represents
the activation level of each committed category node:

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}^{(t)}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}^{(t)}}; \quad j = 1, \dots, C \tag{2-1}$$

where $w_{ij}$ are the weights that connect each category node $j$ in the $F_2$ field with all
nodes of the input layer $F_1$. All weight values are initially set to 1 (i.e. $w_{ij} = 1$ for
$i = 1, \dots, 2M$ and $j = 1, \dots, C$). $M$ represents the dimension of the input features.
Since the normalized features and their complements are introduced to the network, the
dimension of the input vector $A^{(t)}$ is $2M$. $\alpha$ is the choice parameter
($\alpha > 0$). $C$ is the total number of committed category nodes at iteration $t$.
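As a minimal sketch of equation (2-1), the choice values can be computed with plain Python lists. The function name `choice_values`, the example vectors, and the default `alpha = 0.001` are illustrative assumptions and are not taken from this study.

```python
from typing import List, Sequence


def choice_values(a: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  alpha: float = 0.001) -> List[float]:
    """Choice value T_j of equation (2-1) for every committed node j.

    a       : complement-coded input vector of length 2M
    weights : one weight vector of length 2M per committed node
    alpha   : choice parameter, assumed > 0
    """
    values = []
    for w in weights:
        # Fuzzy AND (component-wise minimum) of input and weights, summed.
        overlap = sum(min(x, y) for x, y in zip(a, w))
        values.append(overlap / (alpha + sum(w)))
    return values


# Hypothetical input (M = 2, already complement coded) and two committed nodes.
a = [0.2, 0.7, 0.8, 0.3]
committed = [[1.0, 1.0, 1.0, 1.0], [0.2, 0.6, 0.7, 0.3]]
T = choice_values(a, committed)
```

In this example the second node, whose weights lie below the input in every component, receives the larger choice value.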
The winning category node $J$ is determined:

$$T_J^{(t)} = \max\{T_j^{(t)}\}; \quad j = 1, \dots, N \tag{2-2}$$

It represents the category node with the highest choice value among all category nodes
(committed and uncommitted) in the category layer. $N$ represents the full memory
capacity of Fuzzy ART. The value of $N$ is normally much larger than $C$ ($N \gg C$).
Carpenter et al. (1991a) involve all $N$ category nodes, instead of only the $C$ committed
category nodes. Their reasoning is to let uncommitted
category nodes be committed, when needed, in sequence order ($1, 2, \dots, j-1, j, j+1,
\dots, N$). To achieve this, they assigned a very small positive value $\phi_j$ to each
category node before training starts. They called these values $F_2$-order constants.
These values decrease as the index $j$ of the category node in the memory field increases:

$$T_j = \phi_j; \quad j = C+1, \dots, N \tag{2-3}$$

$$0 < \phi_N < \dots < \phi_j < \dots < \phi_1 \approx 0 \tag{2-4}$$

In this way, when all committed category nodes are in shut-off mode because they
failed to represent the current input, the uncommitted category node $C+1$ will be
committed, since it has the highest choice value ($F_2$-order constant) among all
uncommitted category nodes, as prearranged:

$$\phi_{C+1} = \max\{\phi_j\}; \quad j = C+1, \dots, N \tag{2-5}$$
The match value is computed for the winning category node $J$:

$$\text{match value for node } J = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}^{(t)}\right)}{\sum_{i=1}^{2M} A_i^{(t)}} \tag{2-6}$$

The match value represents the hypothesis that the current input $A^{(t)}$ belongs to
the winning category node $J$ of the $F_2$ field. This hypothesis is tested against the
predetermined vigilance parameter $\rho \in (0, 1]$. The vigilance parameter represents the
minimum confidence level required to accept that the winning node $J$ of the
$F_2$ field represents the category of the current input $A^{(t)}$.
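The match test of equation (2-6) can be sketched in the same way. The helper names `match_value` and `passes_vigilance` and the example numbers are assumptions made for illustration. The hypothesis is accepted when the match value is not less than $\rho$.

```python
from typing import Sequence


def match_value(a: Sequence[float], w: Sequence[float]) -> float:
    """Match value of equation (2-6): |A ^ w_J| / |A|."""
    overlap = sum(min(x, y) for x, y in zip(a, w))
    return overlap / sum(a)


def passes_vigilance(a: Sequence[float], w: Sequence[float], rho: float) -> bool:
    """True when node J is accepted for input a at vigilance rho."""
    return match_value(a, w) >= rho


a = [0.2, 0.7, 0.8, 0.3]            # hypothetical complement-coded input
w_trained = [0.2, 0.6, 0.7, 0.3]    # hypothetical trained node
w_fresh = [1.0, 1.0, 1.0, 1.0]      # uncommitted node, all weights equal to 1
accepted = passes_vigilance(a, w_trained, rho=0.85)
```

With all weights equal to 1 the minimum always returns the input component itself, so the match value of an uncommitted node is exactly 1 and it passes any vigilance level.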
If the match value of the winning node is less than $\rho$, the hypothesis is
rejected and this committed category node shuts off for as long as the current input is
presented to the network. This prevents the persistent selection of the same category
node during search. Shut-off is done simply by assigning -1 to the choice value of the
failed category node. A new search for another winning committed category node is
then triggered. The category node with the highest choice value is
determined among all $N$ category nodes. The match value for this node is
computed. The confidence level of this new winning category node to represent the
current input is then tested. Category nodes in shut-off mode remain in this mode
for as long as the input vector that they failed to represent is presented at the input
layer $F_1$.
The network keeps searching for the maximum choice value node $J$, computing
the match function for node $J$, and testing it against the vigilance
parameter $\rho$, for each committed node of the $F_2$ field. This is done in order of
the nodes' choice values until one of two things happens. Either one of the committed
category nodes can represent the current input $A^{(t)}$ (resonance occurs), in which case
the weights $w_{iJ}$ of the selected category node $J$ are trained. Or none can, in which
case the uncommitted category node with index $C+1$, which has the highest choice value
among all uncommitted category nodes as prearranged, is picked by the network to
represent the current input. The match value of a newly committed category node always
passes the vigilance test, since it has the value of one. This will be explained later
in this chapter.
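The search described above can be sketched as a short loop. This is an illustration under stated assumptions, not the algorithm listing of this study: indices are 0-based, the function name `search_category` is hypothetical, and the $F_2$-order constants are modeled implicitly by returning the index of the next uncommitted node when every committed node has been shut off.

```python
from typing import List, Sequence


def search_category(a: Sequence[float],
                    weights: List[List[float]],
                    rho: float,
                    alpha: float = 0.001) -> int:
    """Return the 0-based index of the node chosen for input a.

    An index equal to len(weights) means that no committed node passed
    the vigilance test, so the next uncommitted node should be committed.
    """
    # Choice values for committed nodes only, equation (2-1).
    choice = [sum(min(x, y) for x, y in zip(a, w)) / (alpha + sum(w))
              for w in weights]
    norm_a = sum(a)
    for _ in range(len(weights)):
        # Winner: committed node with the highest remaining choice value.
        winner = max(range(len(choice)), key=choice.__getitem__)
        match = sum(min(x, y) for x, y in zip(a, weights[winner])) / norm_a
        if match >= rho:
            return winner          # resonance
        choice[winner] = -1.0      # shut off for the current input
    return len(weights)


a = [0.2, 0.7, 0.8, 0.3]
committed = [[0.2, 0.6, 0.7, 0.3], [0.2, 0.7, 0.8, 0.4]]
chosen = search_category(a, committed, rho=0.95)
```

In this example node 0 has the higher choice value but a match value of about 0.9, so at `rho=0.95` it is shut off and the search moves on to node 1, which matches the input exactly.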
The weights of the selected category node are updated in order to incorporate the
characteristics of the input pattern into category $J$ according to the following equation:

$$w_{iJ}^{(t+1)} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{(t)}\right) + (1 - \beta)\, w_{iJ}^{(t)}; \quad i = 1, \dots, 2M \tag{2-7}$$

where $\beta \in (0, 1]$ is the dynamic learning parameter. Setting $\beta = 1$ gives fast
learning. The dynamics of Fuzzy ART are shown in Figure 2-1.
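A small sketch of the update rule (2-7) follows. The function name `update_weights` and the sample values are assumptions for illustration only.

```python
from typing import List, Sequence


def update_weights(a: Sequence[float],
                   w: Sequence[float],
                   beta: float) -> List[float]:
    """Apply equation (2-7) to the weight vector of the selected node."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    # Convex combination of the fuzzy AND and the old weight.
    return [beta * min(x, y) + (1.0 - beta) * y for x, y in zip(a, w)]


a = [0.2, 0.7, 0.8, 0.3]
w_old = [1.0, 1.0, 1.0, 1.0]
w_fast = update_weights(a, w_old, beta=1.0)   # fast learning
w_slow = update_weights(a, w_old, beta=0.5)   # slower learning
```

Because the minimum never exceeds the old weight, each updated weight is less than or equal to its previous value, whatever the value of `beta`.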
Figure 2-1: Fuzzy ART dynamics. $F_1$ is the input layer. $F_2$ is the category
layer. Weights $w_{ij}$ connect each node in the input layer to all nodes in the
category layer. Learning for the weights $w_{iJ}$ of the maximum choice value node is
conducted if it passes the matching test; otherwise reset occurs. $|X|$ represents the
degree of membership between the input and the weights of the winning category node $J$.
It is computed as $|X| = \sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right)$. The value of
$y_j$ represents the degree to which category node $j$ can represent the input. For
Winner Take All (WTA), as is the case here, $y_J = 1$. However, $\sum_j y_j = 1$ for the
distributed case.

After learning the weights of the selected category node $J$, a check should be
done to see whether a committed category node has been chosen to represent the current
input or a new category node has been committed. This determines whether the number of
committed category nodes $C$ is increased by one or left unchanged. The number of
committed category nodes $C$ controls the computation of the choice function, which is
done for committed category nodes only. The test can be done easily as follows:

$$\text{If } (J > C) \text{ then } C = C + 1 \tag{2-8}$$
If the index $J$ of the selected category node is greater than the number of committed
category nodes $C$, the uncommitted category node $C+1$ has been chosen to
represent the current input because all committed category nodes failed to do so. This
uncommitted category node has the maximum prearranged choice value $\phi_{C+1}$ among the
choice values $\phi_j$ of all other uncommitted category nodes.
If the choice function were computed for uncommitted category nodes, the order of
the prearranged choice values $\phi_j$ assigned to each category node $j$ would be
destroyed. Uncommitted category nodes would then no longer be committed in sequence,
which leads to fragmentation of the nodes of the category layer.
In contrast, if the choice function were not computed for some committed category
nodes, they would keep their old choice values, which represent the previous input rather
than the current input. This would corrupt the selection of category nodes
according to their choice values for the current input.
2.4 Fast-learning slow-record option
The fast-learning slow-record option was suggested by (Carpenter et al.
1991b) for learning a newly committed category node. In this learning mode, β = 1 for the
first training iteration, while β < 1 during the rest of the training phase. This avoids
under-training of rare events.
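The effect of the two learning rates can be illustrated with a minimal Python sketch. It assumes the weight update of equation (2-7), w_new = β(A ∧ w_old) + (1 − β) w_old, with the fuzzy AND taken as the component-wise minimum. The function name `update_weights` and the sample vectors are hypothetical and serve only as illustration.

```python
def update_weights(A, w, beta):
    """Weight update of equation (2-7); the fuzzy AND is the component-wise minimum."""
    return [beta * min(x, y) + (1.0 - beta) * y for x, y in zip(A, w)]

# Two complement-coded inputs (M = 2, so 2M = 4 components); values are made up.
A1 = [0.2, 0.9, 0.8, 0.1]
A2 = [0.6, 0.9, 0.4, 0.1]

# Fast learning (beta = 1) on the first presentation: the node copies the input.
w_committed = update_weights(A1, [1.0] * 4, beta=1.0)

# Slow learning (beta < 1) afterwards: the weights move only part of the way.
w_recoded = update_weights(A2, w_committed, beta=0.1)
```

With β = 1 the new node stores the rare input exactly; with β = 0.1 the third weight only moves from 0.8 to 0.76 instead of dropping to 0.4.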
2.5 Complement coding
All ART architectures are stable: their weights decrease monotonically
during learning. This advantage sometimes erodes some of the weights w_ij to
zero when a large number of inputs is presented to the network, which leads to the
category proliferation problem (Moore 1989, Carpenter et al. 1991b).
(Carpenter et al. 1991b) suggested that proliferation of category nodes can be
avoided if inputs are normalized before their presentation to the network. They
recommended presenting the input patterns together with their complements as the
normalization process. Complement coding introduces both the on-response and the
off-response to the network. The norm |A| of an input pattern with its complement is
always equal to the dimension M of the input pattern. This can be shown as follows;

|A| = |(a, a^c)| = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} a_i^c = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} (1 - a_i) = \sum_{i=1}^{M} a_i + M - \sum_{i=1}^{M} a_i = M    2-9

Therefore, complement coding automatically normalizes each input pattern to M.
It is a normalization process that does not alter the amplitude
information of the input features.
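Equation 2-9 can be checked numerically with a short Python sketch. The function name `complement_code` and the sample feature vector are assumptions made for illustration; the features are taken to be already scaled to [0, 1].

```python
def complement_code(a):
    """Return A = (a, 1 - a) for a feature vector a with components in [0, 1]."""
    if any(x < 0.0 or x > 1.0 for x in a):
        raise ValueError("features must be scaled to [0, 1] first")
    return list(a) + [1.0 - x for x in a]

a = [0.2, 0.9, 0.5]        # M = 3 input features (made-up values)
A = complement_code(a)     # 2M = 6 components
norm = sum(A)              # city-block norm |A|, equal to M up to rounding
```

Whatever the amplitudes in `a`, the norm stays at M, while the individual feature values remain available in the first half of `A`.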
2.6 Fuzzy subset and conservative limit
If α is close to zero, the category node whose weight vector w_j is a fuzzy subset of
the input vector A is chosen first, if such a category node exists, since its choice
value is close to unity. The activation value of each committed category node j thus
represents the fuzzy membership of the input vector A in that category. Moreover, if
resonance occurs for a category node whose weights w_J are a fuzzy subset of the input A, its
weights are unchanged during training. The case in which α approaches zero is called the
conservative mode. In conservative mode neither training nor resetting occurs, so α
must be large enough to alter such a selection.
While the choice value depends on the degree to which w_j is a fuzzy subset of A,
the reverse is true for the match value: it depends on the degree to which A is a fuzzy
subset of w_j. If the winning category node is a fuzzy subset choice, the match function is;

\frac{|A \wedge w_J|}{|A|} = \frac{|w_J|}{M} = \frac{1}{M} \sum_{i=1}^{2M} w_{iJ}    2-10

According to this equation, the category node with the highest subset choice should be
chosen among all subset choices, if such choices exist, in order to have the
maximum match value. This choice can be controlled by the choice parameter α.
The full architecture of the Fuzzy ART is shown in (figure 2-2). For more
details about Fuzzy ART see (Carpenter et al. 1991a, 1992).
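The role of α in the subset choice can be sketched in Python. The helper names (`fuzzy_and`, `choice`, `match`) and the numeric vectors are hypothetical; the choice and match functions follow the forms used in this chapter.

```python
def fuzzy_and(x, y):
    """Component-wise minimum, the fuzzy AND operator."""
    return [min(p, q) for p, q in zip(x, y)]

def choice(A, w, alpha):
    """Choice value |A ^ w| / (alpha + |w|)."""
    return sum(fuzzy_and(A, w)) / (alpha + sum(w))

def match(A, w):
    """Match value |A ^ w| / M, where A has 2M components."""
    return sum(fuzzy_and(A, w)) / (len(A) // 2)

A = [0.2, 0.9, 0.8, 0.1]          # complement-coded input, M = 2
w_subset = [0.1, 0.5, 0.6, 0.1]   # every weight <= the matching input component
w_other = [0.2, 1.0, 0.3, 0.1]    # not a fuzzy subset of A

small = (choice(A, w_subset, 1e-6), choice(A, w_other, 1e-6))
large = (choice(A, w_subset, 10.0), choice(A, w_other, 10.0))
```

For α near zero the subset node wins with a choice value close to one, and since A ∧ w_subset equals w_subset, fast learning leaves its weights unchanged. With α = 10 the other node wins instead, which is the sense in which α must be large enough to alter the selection.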
Figure 2-2: The architecture of Fuzzy ART. The full capacity N of the category layer is involved in determining the maximum choice value node J. These nodes are shown in dark. Weights are connected to all category nodes. Weights connected to uncommitted category nodes are shown in light, because they are not learned yet. The number of comparisons needed to determine the maximum choice value node J is N−1, since it is carried out among all category nodes. This increases training time. If J > C, the uncommitted category node with index C+1 has been committed, since its choice value is φ_{C+1} = max{φ_j} ; j = C+1, ..., N. That is because these constants are arranged as φ_N < ... < φ_{C+1}. They have been prearranged this way so that category nodes are committed in order, which prevents fragmentation of the category layer.

2.7 Training Algorithms of Fuzzy ART
1) Input parameters;
a) Dynamic parameters:
i- ρ ∈ (0, 1]: The vigilance parameter.
ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
iii- α > 0: The choice value parameter.
b) Data characteristics;
i- M: The dimension of the input features.
ii- Pt: The number of exemplars to be used in learning.
c) Initialization;
i- w_{ij} = 1 ; i = 1, ..., 2M ; j = 1, ..., N
ii- T_j = \phi_j ; j = 1, ..., N, where \phi_j \approx 0 and 0 < \phi_N < ... < \phi_j < ... < \phi_1
iii- Number of iterations t = 1.
iv- Number of committed category nodes C = 1.
2) New input;
A_i^{(t)} = \begin{cases} a_i^{(t)} & 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & M+1 \le i \le 2M \end{cases}
3) Compute the choice function for all committed category nodes;
T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; j = 1, ..., C
4) Reset: Determine the node J, which has the maximum choice value;
T_J^{(t)} = \max\{T_j^{(t)}\} ; j = 1, ..., N
5) Matching criterion: If ( \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho ) Then;
i- Shut off this node to put it out of competition;
T_J^{(t)} = -1
ii- GOTO STEP (4)
6) If (J > C) Then a new category node has been committed;
C = J
7) Training;
w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old} ; i = 1, ..., 2M
8) If (t < Pt) then;
i- t = t + 1
ii- GOTO STEP (2)
9) Training has been done. The network is ready for categorization.
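The steps above can be sketched in Python as follows. This is an illustrative reading of the listing, not the author's code: the function name `train_fuzzy_art`, the concrete values chosen for the constants φ_j, the 0-based indexing, and the choice to start with no committed node (C = 0) are all assumptions.

```python
def train_fuzzy_art(samples, N, rho, alpha=0.001, beta=1.0):
    """Original Fuzzy ART: the winner is searched among all N category nodes."""
    M = len(samples[0])
    w = [[1.0] * (2 * M) for _ in range(N)]        # step 1c-i: all weights start at one
    phi = [1e-6 * (N - j) for j in range(N)]       # assumed values: phi[0] > ... > phi[N-1] > 0
    C = 0                                          # assumption: no node committed at the start
    for a in samples:
        A = list(a) + [1.0 - x for x in a]         # step 2: complement coding
        T = list(phi)                              # uncommitted nodes keep phi_j
        for j in range(C):                         # step 3: committed nodes only
            overlap = sum(min(x, y) for x, y in zip(A, w[j]))
            T[j] = overlap / (alpha + sum(w[j]))
        while True:
            J = max(range(N), key=lambda j: T[j])  # step 4: search over all N nodes
            if T[J] < 0:
                raise RuntimeError("category layer capacity N exhausted")
            overlap = sum(min(x, y) for x, y in zip(A, w[J]))
            if overlap >= M * rho:                 # step 5: match test passed
                break
            T[J] = -1.0                            # shut off and search again
        if J >= C:                                 # step 6 (0-based form of J > C)
            C = J + 1
        w[J] = [beta * min(x, y) + (1.0 - beta) * y for x, y in zip(A, w[J])]  # step 7
    return w[:C], C

weights, C = train_fuzzy_art([[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]], N=5, rho=0.8)
```

In this small run the first two samples share one category node and the third commits a second one. Note that each winner search scans all N entries of `T`, which is the cost the later sections set out to remove.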
2.8 Evolution of Fuzzy ART
The Fuzzy ART algorithm was introduced in the literature by (Carpenter
et al. 1991a). In their algorithm, the choice value is computed for all N potential
category nodes in the following way;

T_j = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; j = 1, ..., N    2-11
Since the input pattern A is normalized to [0, 1] and the initial weight values w_ij(0) are
set to one, the choice value for uncommitted category nodes using complement coding
is,

T_{uncommitted} = \frac{M}{\alpha + 2M}    2-12

So the choice value is;

T_j = \begin{cases} \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} & j = 1, ..., C \\ \frac{M}{\alpha + 2M} & j = C+1, ..., N \end{cases}    2-13
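The value for uncommitted nodes follows from equation 2-11 in a few steps, sketched here in LaTeX. The only facts used are that w_ij = 1 for an uncommitted node, that A_i ∧ 1 = A_i for A_i in [0, 1], and that complement coding gives |A| = M (equation 2-9).

```latex
\begin{aligned}
T_j &= \frac{\sum_{i=1}^{2M} (A_i \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}
     = \frac{\sum_{i=1}^{2M} (A_i \wedge 1)}{\alpha + \sum_{i=1}^{2M} 1} \\
    &= \frac{\sum_{i=1}^{2M} A_i}{\alpha + 2M}
     = \frac{M}{\alpha + 2M}, \qquad j = C+1, \ldots, N
\end{aligned}
```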
(Carpenter et al. 1991a) stated that choosing larger values for the initial weights is
possible, but that it biases the system against the selection of uncommitted category
nodes. This is exactly what is wanted here: all committed category nodes should be
tested to represent a specific input before a new node is committed. However, when the
initial weights are very large, fast learning should be used, at least the first
time the node is committed. Fast learning brings the weights to values below one,
because the normalized input features that forced this node to be committed are
assigned to its weights. Otherwise, long training is required to reduce the weights to
their converged values.
(Georgiopoulos et al. 1996) introduced a very complicated approach to ensure
that all committed category nodes are tested before a new node is
committed. Their Fuzzy ART architecture has top-down weights w_ij, which they
initialized to one, and bottom-up weights W_ij, which they initialized to
1/(α + φ), where φ is a very large constant in [2M, ∞). They called φ the uncommitted
choice value parameter. They stated that the bottom-up inputs (using complement
coding) are given by the equation;

T_j = \begin{cases} \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} & j = 1, ..., C \\ \frac{M}{\alpha + \varphi} & j = C+1, ..., N \end{cases}    2-14
In this equation, φ replaces the 2M of equation (2-12). Since φ > 2M, T_j for
uncommitted nodes is smaller than that of committed category nodes. With a very
large φ, the choice function has values very close to zero for uncommitted category
nodes. This allowed them to examine the performance of Fuzzy ART at α → ∞, since
when φ → ∞, all committed category nodes are tested before a new node is
committed. The value of α alters the order of search among committed category nodes.
"A node is called an uncommitted node if all its top-down weights are equal to the
initial top-down weight value, otherwise, the node is committed" (Georgiopoulos et al.
1996). This test takes time, especially when the number of input features is large.
The weight update for w_ij in their Fuzzy ART algorithm (which they called Fuzzy ART
Variant, because α → ∞ and φ → ∞) is that of equation (2-7). However,
they did not show the training algorithm for W_ij; they stated only that the bottom-up
weights do not change where equation (2-15) is valid,
K * ^ " ^ ^ 2-15
While very large values of the choice parameter α and the uncommitted
node choice parameter φ lead to the testing of all committed category nodes before a new
node is committed in Fuzzy ART, they are not practical for Fuzzy ARTMAP because
they create too many category nodes (Georgiopoulos et al. 1996).
The use of the F_2-order constants φ_j to ensure that all committed category
nodes are tested before a new node is committed was explained in (section 2.3).
Since φ_j ≈ 0, all committed category nodes are tested to represent a
specific input before the uncommitted category nodes are tested. The value of
φ_j decreases as j increases, so the uncommitted node C+1 is chosen before
the other uncommitted category nodes. This approach is not mentioned in the
original algorithm of Fuzzy ART (Carpenter et al. 1991a) or the original algorithm of
Fuzzy ARTMAP (Carpenter et al. 1992); it has been extracted from the full Fuzzy
ARTMAP algorithm listed by (Carpenter et al. 1997).
(Georgiopoulos et al. 1999) include all committed category nodes, in addition
to one uncommitted category node, in the search for the maximum choice value node. The
choice value in their algorithm is;

T_j = \begin{cases} \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} & j = 1, ..., C \\ 0 & j = C+1 \end{cases}    2-16

They assigned very large values to the initial weights so that the
uncommitted category node has a choice value of zero; committed category nodes are
therefore tested first, before a new node is committed. This forced them to use fast
learning when a new node is committed, to reduce the weights to their theoretical
values, which are below one.
2.9 Newly developed versions of Fuzzy ART
Determining the winning category node among the full capacity N of the
network, as reported by (Carpenter & Grossberg 1987a, Carpenter et al. 1991a), is
time consuming. The capacity of the system can be very large, especially when it
works in a non-homogeneous environment. Uncommitted category nodes can be
committed in sequence without using the prearranged choice values φ_j and
without involving the whole capacity N of the category layer in determining the
maximum choice value node J.
Two new versions of the Fuzzy ART architecture have been constructed in this
work. The first is the Flagged approach. It involves the uncommitted
category node with rank C+1 in the category layer, together with all committed category
nodes, in determining the maximum choice value node J. A total of C comparisons is
required rather than N−1 as in the original Fuzzy ART architecture. As
mentioned before, this approach was also taken by (Georgiopoulos et al. 1999), but
they assigned large values to the initial weights, which forced them to use
fast learning. The second is the Compact approach, which involves only the C committed
category nodes in determining the maximum choice value node J.
2.9.1 Flagged approach
There is no reason to involve T_{C+1}, ..., T_N in determining the maximum
choice value node. Only the uncommitted category node with rank C+1 in the category
layer is involved. This uncommitted category node is flagged by assigning a value
φ_{C+1} to its choice value such that;

T_{shut-off} < \phi_{C+1} < 0    2-17

A negative value is assigned to φ_{C+1} because neither the input features A_i nor
the weights w_ij ever take negative values. So, according to equation 2-11, the choice
value of any committed category node is never negative,

T_j \ge 0 ; j = 1, ..., C    2-18

However, the value of φ_{C+1} must be greater than the choice value of committed
category nodes that are in shut-off mode. In this way, when all committed category nodes
are in shut-off mode, the flagged node with index C+1 in the category layer is
chosen as the maximum choice value node. The match value of a newly committed
category node is not a concern, since it is always
equal to one, which is the highest value that the vigilance parameter ρ can take. That is
because A_i is normalized to [0, 1] before its presentation to the network and the initial
weights of category nodes are equal to one, so the input A is a fuzzy subset of w_{C+1}. That
means A_i ∧ w_{i,C+1} = A_i. According to equation 2-11, computing the match function for
the subset choice always gives one, as demonstrated below;

\frac{1}{M} \sum_{i=1}^{2M} (A_i \wedge w_{i,C+1}) = \frac{1}{M} \sum_{i=1}^{2M} A_i = \frac{M}{M} = 1    2-19

Therefore, the uncommitted flagged node C+1 never goes into shut-off mode. It is
certain to pass the match test.
After resonance occurs, a check is made to see whether the flagged
uncommitted category node was chosen. If J > C, the flagged node has been chosen.
The number of committed category nodes must then be increased by one (C = C+1) and the
weights of the new flagged node w_{i,C+1} initiated;

w_{i,C+1} = 1 ; i = 1, ..., 2M    2-20

The full architecture of the Flagged-Fuzzy ART is shown in (figure 2-3). Only
the committed category nodes and the flagged uncommitted category node are involved
in determining the maximum choice value node. Weights are not yet established
between the remaining uncommitted category nodes in the F_2 layer and the nodes of
the input layer F_1.
Figure 2-3: The architecture of Flagged Fuzzy ART. Only committed category nodes and the uncommitted category node with index C+1 in the category layer are involved in determining the maximum choice value node J. These category nodes are shown in dark. Weights are connected to all these category nodes. Category nodes that are not involved in determining the maximum choice value node are shown in light. Weights are not connected to them. Weights connected to the flagged node (uncommitted category node with index C+1) are shown in light, because they are not initiated yet. They are initiated (w_{i,C+1} = 1 ; i = 1, ..., 2M) only when the node is chosen as the maximum choice value node. The number of comparisons needed to determine the maximum choice value node is C, since it is carried out among committed category nodes plus the flagged node. This reduces training time.
2.9.2 Training algorithms of Flagged-Fuzzy ART
1) Input parameters;
a) Dynamic parameters:
i- ρ ∈ (0, 1]: The vigilance parameter.
ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
iii- α > 0: The choice value parameter.
b) Data characteristics;
i- M: The dimension of the input features.
ii- Pt: The number of exemplars to be used in learning.
c) Initialization;
i- T_{C+1} = -0.1
ii- Number of iterations t = 1.
iii- Number of committed category nodes C = 1.
2) New input;
A_i^{(t)} = \begin{cases} a_i^{(t)} & 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & M+1 \le i \le 2M \end{cases}
3) Compute the choice function for all committed category nodes;
T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; j = 1, ..., C
4) Reset: Determine the node J, which has the maximum choice value;
T_J^{(t)} = \max\{T_j^{(t)}\} ; j = 1, ..., C+1
5) Matching criterion: If ( \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho ) then;
i- Shut off this node to put it out of competition;
T_J^{(t)} = -1
ii- GOTO STEP (4)
6) If (J > C) Then a new category node has been committed;
i- C = J
ii- w_{iJ} = 1 ; i = 1, ..., 2M
iii- T_{C+1} = -0.1
7) Training;
w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old} ; i = 1, ..., 2M
8) If (t < Pt) Then;
i- t = t + 1
ii- GOTO STEP (2)
9) Training has been done. The network is ready for categorization.
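A minimal Python sketch of the Flagged approach is given below under stated assumptions: the function name `train_flagged_fuzzy_art` is hypothetical, indexing is 0-based, training starts with no committed node, and the match test is skipped for the flagged node because equation 2-19 shows that it always passes.

```python
def train_flagged_fuzzy_art(samples, rho, alpha=0.001, beta=1.0, flag=-0.1):
    """Flagged Fuzzy ART: search among committed nodes plus one flagged node."""
    M = len(samples[0])
    w = []                                        # weights exist for committed nodes only
    for a in samples:
        A = list(a) + [1.0 - x for x in a]        # step 2: complement coding
        C = len(w)
        T = [sum(min(x, y) for x, y in zip(A, wj)) / (alpha + sum(wj)) for wj in w]
        T.append(flag)                            # flagged node: -1 < flag < 0
        while True:
            J = max(range(C + 1), key=lambda j: T[j])   # step 4: C comparisons
            if J == C:                            # step 6: flagged node chosen
                w.append([1.0] * (2 * M))         # equation 2-20
                break
            if sum(min(x, y) for x, y in zip(A, w[J])) >= M * rho:
                break                             # step 5: match test passed
            T[J] = -1.0                           # shut off, below the flag value
        w[J] = [beta * min(x, y) + (1.0 - beta) * y for x, y in zip(A, w[J])]  # step 7
    return w

flagged_weights = train_flagged_fuzzy_art([[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]], rho=0.8)
```

Because shut-off nodes carry −1 and the flag is −0.1, the flagged node wins exactly when every committed node has been rejected, so nodes are committed in order without storing any prearranged constants φ_j.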
2.9.3 Compact approach
Uncommitted category nodes can be committed in sequence without
using a flagged uncommitted category node. This approach, called the Compact approach,
involves only the committed category nodes in determining the maximum choice value node J.
The choice function is computed for committed category nodes. The maximum
choice value node J is determined among the C committed category nodes only.

T_J^{(t)} = \max\{T_j^{(t)}\} ; j = 1, ..., C    2-21

The match value of the selected category node J is tested against the
predetermined value of the vigilance parameter ρ. If the match value of node J is less
than ρ, the node is shut off by assigning −1 to its choice value, which puts it out
of competition for the current input. Otherwise, the node is trained, all committed
category nodes are switched on again, and a new input is presented to the network.
When the maximum choice value equals −1, all committed category nodes are
in shut-off mode. The uncommitted category node C+1 should then be committed to
represent the current input, which prevents fragmentation of the category layer.
This is done simply by training the initial weights of the category node with index C+1
and increasing the count of committed category nodes by one. Uncommitted category
nodes are thus committed according to their order in the category layer. The number
of comparisons needed to determine the maximum choice value node is C−1 rather than
the N−1 that the original Fuzzy ART algorithm requires. This saves a lot of computation
time, keeping in mind that N ≫ C.
When a new category node is to be committed, its weights are
updated through the following equation;

w_{i,C+1}^{new} = \beta A_i^{(t)} + (1 - \beta) ; i = 1, ..., 2M    2-22

This is equation (2-7) with w_old = 1, for which A ∧ 1 = A. With this equation, the
weight initialization (w_ij = 1 ; i = 1, ..., 2M ; j = 1, ..., N) reported by
(Carpenter et al. 1991b) is not required. It also saves time, since this
equation requires fewer arithmetic operations than the previously suggested one. The full
architecture of Compact-Fuzzy ART is shown in (figure 2-4). Committed category
nodes are shown in dark. Uncommitted category nodes are shown in light. Weights
connect all input layer nodes to committed category nodes only. Weights are not
connected to uncommitted category nodes since they are not committed yet (they are
not assigned weights yet).
The comparison among the original Fuzzy ART, Flagged-Fuzzy ART, and
Compact-Fuzzy ART is shown in (table 2-1). It shows clearly that Flagged-Fuzzy ART
and Compact-Fuzzy ART are faster than the original algorithm of Fuzzy ART. The
main factor in the reduction of the training time is the number of
comparisons needed to determine the winning category node. As
mentioned before, these are N−1, C, and C−1 for the original Fuzzy ART, Flagged-Fuzzy ART
and Compact-Fuzzy ART, respectively.
Figure 2-4: The architecture of Compact Fuzzy ART. Only committed category nodes are involved in determining the maximum choice value node J. These category nodes are shown in dark. Weights connect all input layer nodes to committed category nodes only. Uncommitted category nodes are shown in light. Weights are not connected to them since they are not committed yet (they are not assigned weights yet). The number of comparisons needed to determine the maximum choice value node is C−1, since it is carried out among committed category nodes only. This reduces training time.
Table 2-1: Comparison among the Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study. The Flagged and Compact algorithms are faster; however, the Compact algorithm is recommended.

Item | Original | Flagged | Compact
Initialization for choice value | 0 < φ_N < ... < φ_j < ... < φ_1 | φ_{C+1} = −0.1 | None
Choice function T_j | T_j = \sum_{i=1}^{2M} (A_i ∧ w_ij) / (α + \sum_{i=1}^{2M} w_ij) ; j = 1, ..., C | Same | Same
Determination of T_max | T_J = max{T_j ; j = 1, ..., N} | T_J = max{T_j ; j = 1, ..., C+1} | T_J = max{T_j ; j = 1, ..., C}
Check for new committed node | J > C | J > C | T_J = −1
Number of comparisons for T_max | N−1 | C | C−1
Match testing | (1/M) \sum_{i=1}^{2M} (A_i ∧ w_iJ) ≥ ρ | Same | Same
Weights initialization | w_ij = 1 ; i = 1, ..., 2M ; j = 1, ..., N | w_iJ = 1 ; i = 1, ..., 2M | None
Weights updating for old node | w_iJ^new = β(A_i ∧ w_iJ^old) + (1 − β) w_iJ^old | Same | Same
Weights updating for new node | w_iJ^new = β(A_i ∧ w_iJ^old) + (1 − β) w_iJ^old | w_iJ^new = β(A_i ∧ w_iJ^old) + (1 − β) w_iJ^old | w_{i,C+1}^new = β A_i + (1 − β)
Replacement of fixed choice value φ | None | φ_{C+1} = −0.1 | None
2.9.4 Training algorithms of Compact-Fuzzy ART
1) Input parameters;
a) Dynamic parameters;
i- ρ ∈ (0, 1]: The vigilance parameter.
ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.
iii- α > 0: The choice value parameter.
b) Data characteristics;
i- M: The dimension of the input features.
ii- Pt: The number of exemplars to be used in learning.
c) Initialization;
i- Number of iterations t = 1.
ii- Number of committed category nodes C = 1.
2) New input;
A_i^{(t)} = \begin{cases} a_i^{(t)} & 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & M+1 \le i \le 2M \end{cases}
3) Compute the choice function for all committed category nodes;
T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; j = 1, ..., C
4) Reset: Determine the node J, which has the maximum choice value;
T_J^{(t)} = \max\{T_j^{(t)}\} ; j = 1, ..., C
5) If T_J^{(t)} = -1 (all committed category nodes are in shut-off mode) then a
node (the node whose order in the category layer is C+1) should be
committed;
i- Increase the number of committed nodes by one;
C = C + 1
ii- If in fast-learning slow-record mode;
Assign the values of the input features to the weights of this node;
w_{iC}^{new} = A_i^{(t)} ; i = 1, ..., 2M
Else (normal mode)
w_{iC}^{new} = \beta A_i^{(t)} + (1 - \beta) ; i = 1, ..., 2M
iii- GOTO STEP (2)
6) Matching criterion: If ( \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho ) then;
i- Shut off this node to put it out of competition;
T_J^{(t)} = -1
ii- GOTO STEP (4)
7) Learning;
w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old} ; i = 1, ..., 2M
8) If (t < Pt) then;
i- t = t + 1
ii- GOTO STEP (2)
9) Training has been done. The network is ready for categorization.
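The Compact listing can be sketched in Python as below. The function name `train_compact_fuzzy_art` and the `fast_commit` switch are hypothetical, indexing is 0-based, and training starts with an empty category layer. One simplification is assumed: after a node is committed the sketch moves on to the next input rather than presenting the same input again.

```python
def train_compact_fuzzy_art(samples, rho, alpha=0.001, beta=1.0, fast_commit=True):
    """Compact Fuzzy ART: only committed nodes take part in the winner search."""
    M = len(samples[0])
    w = []                                          # no weight initialization needed
    for a in samples:
        A = list(a) + [1.0 - x for x in a]          # step 2: complement coding
        T = [sum(min(x, y) for x, y in zip(A, wj)) / (alpha + sum(wj)) for wj in w]
        while True:
            J = max(range(len(w)), key=lambda j: T[j], default=None)  # step 4
            if J is None or T[J] == -1.0:           # step 5: every node is shut off
                b = 1.0 if fast_commit else beta
                w.append([b * x + (1.0 - b) for x in A])   # equation 2-22
                break
            if sum(min(x, y) for x, y in zip(A, w[J])) >= M * rho:
                # step 7: train the resonating node
                w[J] = [beta * min(x, y) + (1.0 - beta) * y for x, y in zip(A, w[J])]
                break
            T[J] = -1.0                             # step 6: shut off and search again
    return w

compact_weights = train_compact_fuzzy_art([[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]], rho=0.8)
```

On the same three samples this yields the same two categories as the other variants, while the list `T` never grows beyond the number of committed nodes.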
2.10 Categorization
At the end of the training phase all weights are fixed, the number of category
nodes C is known, and the network is ready for categorization.
1) Input:
A_i^{(t)} = \begin{cases} a_i^{(t)} & 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & M+1 \le i \le 2M \end{cases}
2) Compute the choice values for all committed nodes;
T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; j = 1, ..., C
3) Determine the node J, which has the maximum choice value among all
committed category nodes;
T_J^{(t)} = \max\{T_j^{(t)}\} ; j = 1, ..., C
4) Match testing:
If (the match value for the winning node J > ρ) then;
Category node J represents the category of this input
Else
The network fails to categorize this input
5) If more categorization is needed GOTO STEP (1).
6) Categorization has been done.
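The categorization steps can be sketched as a single Python function. The name `categorize` and the example weights are hypothetical. The sketch accepts a node when its match value is at least ρ, mirroring the training criterion; that boundary choice is an assumption.

```python
def categorize(a, w, rho, alpha=0.001):
    """Return the index of the winning committed node, or None if it fails the match test."""
    M = len(a)
    A = list(a) + [1.0 - x for x in a]                      # step 1: complement coding

    def overlap(wj):
        return sum(min(x, y) for x, y in zip(A, wj))

    # steps 2-3: choice values and winner among committed nodes only
    J = max(range(len(w)), key=lambda j: overlap(w[j]) / (alpha + sum(w[j])))
    # step 4: match test against the vigilance parameter
    return J if overlap(w[J]) / M >= rho else None

trained = [[0.1, 0.1, 0.88, 0.9], [0.9, 0.9, 0.1, 0.1]]   # made-up fixed weights
near_first = categorize([0.11, 0.1], trained, rho=0.8)
far_from_both = categorize([0.5, 0.5], trained, rho=0.8)
```

An input close to a stored category returns that node's index, while an input that no node matches well enough returns `None`, which corresponds to the network failing to categorize the input.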
CHAPTER III
FUZZY ARTMAP ANN
3.1 Introduction
While the roots of ART ANNs go back to 1976, supervision did not start
until sixteen years later, when the ARTMAP architecture was constructed (Carpenter et
al. 1991b). More supervised architectures have been constructed since (Fuzzy
ARTMAP, ART-EMAP, Gaussian ARTMAP, ARTMAP-IC, and Distributed
ARTMAP). All of them are built from two ART modules linked by a
map field. Supervision of ART ANNs using the map field approach is shown in
(figure 3-1). The supervision approach using the map field is explained here through
Fuzzy ARTMAP.
3.2 Fuzzy ARTMAP
The Fuzzy ARTMAP is a supervised ART-type ANN. It is a generalization of
the ARTMAP ANN. "ARTMAP system learns orders of magnitude more quickly,
effectively, and accurately than alternative algorithms. It achieves these properties by
using an internal controller that conjointly maximizes predictive generalization and
minimizes predictive error by linking predictive success to category size on a
trial-by-trial basis, using only local operations. This computation increases the
vigilance parameter ρ_a of ART_a by the minimum amount needed to correct predictive
error at ART_b" (Carpenter et al. 1991b). Therefore, ARTMAP is a self-organizing expert
system, since it calibrates the selectivity of its hypotheses based upon predictive
success (Carpenter et al. 1991b). While ARTMAP treats binary input only, Fuzzy ARTMAP
is capable of learning and classifying both binary and analog input patterns presented
to the network in arbitrary order. While a back propagation ANN required 20 000 epochs to
learn a benchmark (Lang & Withbrok 1989), Fuzzy ARTMAP required only 5 epochs
(Carpenter et al. 1992). The architectures of all supervised ART-type ANNs consist of
two ART modules (ART_a and ART_b). These two modules are linked together
through a map field, see (figures 3-1 and 3-2). For classification tasks, ART_b is reduced
to its input layer only, see (figure 3-3). The map field is simply an N×L array of binary
weights w_jk ; j = 1, ..., N ; k = 1, ..., L, initially set to one, see (figure 3-4).
w_JK = 0 means that category node J represents a class other than K. Therefore, the
node J should be shut off.

Figure 3-1: Block diagram showing supervision through the map field. Two ART modules are inter-linked by a map field.
3.2.1 Vigilance parameter dynamics in supervised environment
As mentioned before, the vigilance parameter ρ ∈ [0, 1] is the key
feature of ART ANNs. It represents the minimum match value required for a
committed category node to represent the current input. A match value of 1 represents
perfect representation, while a match value of 0 represents no match at all. A high
vigilance value generates many category nodes that represent fine subclasses, while a
low value leads to fewer category nodes with coarser subclasses. If the match value of
the winning category node is greater than the predetermined vigilance parameter ρ while
class matching fails, the current match value, increased by a very small value ε, is
assigned to the vigilance parameter, as shown in equation 3-1;

\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon    3-1
Figure 3-2: The full architecture for supervision through the map field. ARTMAP is shown here. The dynamics of the network are very complex. All supervised ART-type ANNs that have been reported in the literature (Fuzzy ARTMAP, ART-EMAP, Gaussian ARTMAP, ARTMAP-IC, and Distributed ARTMAP) are constructed using the map field approach. Carpenter and her group represent all ART modules as three layers: the input layer F_0, the category layer F_2, and the hypothetical layer F_1. The assumed layer F_1 represents the membership x between the input and the weights of the winning category node J.

Figure 3-3: The architecture of ARTMAP for classification problems. Match tracking is done through the map field. If \sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) < \rho_{ab}, match tracking should be conducted. This approach requires map field weights, a map field vigilance parameter, and a binary digital code for the class.

Figure 3-4: Architecture of Fuzzy ARTMAP. The full structure of ART_b is not needed; it is reduced to its input layer only. All components in the upper light box belong to the supervision through the map field. The components outside the box are the original architecture of Fuzzy ART.
The vigilance parameter ρ increases only during the training phase, when class
matching of the winning category node fails for a specific input. The very
small value ε is added to the failed match value, and the result is assigned to the
vigilance parameter, in order to classify rare events (Carpenter et al. 1992). The
vigilance dynamics in a supervised environment are shown in (figure 3-5).
If class matching occurs, all weights of the winning node are
trained. Otherwise, a value of −1 is assigned to the choice value of this category node to
put it out of competition during the current input (shut off). In addition to class
matching, the next winning node must exceed the new vigilance parameter in order to
represent the current input. The vigilance parameter stays fixed if category node J
fails the match test, while the match value of node J is assigned to ρ if J fails
class matching;

\rho^{new} = \max\left\{\rho^{old}, \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ})\right\} + \varepsilon    3-2

This step is repeated until either one of the committed category nodes can
represent the current input or a new category node must be committed. The vigilance
parameter is reset to its base-line value and all committed category nodes are
reactivated before a new input is presented to the network.
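One step of the vigilance dynamics can be sketched in Python. The function name `match_tracking`, its return convention, and the value of ε are assumptions. The sketch follows the prose: vigilance is left unchanged when the node fails the match test and is raised to the match value plus ε only when class matching fails.

```python
def match_tracking(A, w_J, rho, class_ok, eps=1e-4):
    """Return (new_rho, resonates) for the winning node J and the current input A."""
    M = len(A) // 2
    match = sum(min(x, y) for x, y in zip(A, w_J)) / M
    if match < rho:
        return rho, False          # fails vigilance: shut off, vigilance unchanged
    if class_ok:
        return rho, True           # resonance: the node may be trained
    return match + eps, False      # class mismatch: raise vigilance (equation 3-1)

A = [0.2, 0.9, 0.8, 0.1]           # complement-coded input, M = 2
w_J = [0.2, 0.8, 0.7, 0.1]         # made-up weights of the winning node, match = 0.9
new_rho, resonates = match_tracking(A, w_J, rho=0.5, class_ok=False)
```

After the class mismatch the vigilance sits just above 0.9, so any later winner for this input must match it more closely than the rejected node did.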
3.2.2 Training phase
Figure 3-5: Sketch showing match tracking. The x-axis represents the ranking of all committed category nodes according to their choice values T_j. The y-axis represents the match value of each category node. The thin line that runs along the solid line represents the match value before adding ε. A new node must be committed, because none of the committed category nodes can represent the current input.

During the training phase, a stream of input vectors A^(t) and a stream of input
vectors b^(t) are presented simultaneously to Fuzzy ART_a and Fuzzy ART_b of Fuzzy
ARTMAP, respectively. The input vector A^(t) is normalized to [0, 1] before its
presentation. The input vector b^(t) is the correct prediction given A^(t). It is a binary
digital code with as many digits (neurons) as there are classes in
the input data. In winner-take-all mode, all digits are zero except the one that
corresponds to the order of the class code, which is equal to one. In class
fuzzy membership mode, however, the sum of the input values at the input layer of
Fuzzy ART_b is equal to one, that is;

\sum_{k=1}^{L} b_k^{(t)} = 1
If the match value of the winning node J is greater than the vigilance
parameter ρ_a, class matching should be tested. Thus;
If ( \sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) \ge \rho_{ab} ) Then
Class matching: weights should be updated for node J;
w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old} ; i = 1, ..., 2M
w_{Jk}^{new} = \beta (b_k^{(t)} \wedge w_{Jk}^{old}) + (1 - \beta) w_{Jk}^{old} ; k = 1, ..., L
Else
Match tracking:
\rho_a = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon
Here ρ_ab ∈ (0, 1] is the map field vigilance parameter and w_Jk is the weight vector that
connects the winning node J in the category layer with all nodes of the map field. All
weight values are initially set to 1 (i.e. w_jk = 1 ; j = 1, ..., N and k = 1, ..., L), where L is
the total number of nodes in the map field, which is equal to the number of classes of
the input data. If class matching fails (the class matching value is less than the
predetermined map field vigilance parameter ρ_ab), the vigilance parameter ρ_a is
raised just above the match value of the selected category node by a small value
ε. This is called match tracking. It is an internal control mechanism that maximizes code
compression and minimizes predictive errors. The vigilance parameter of the
map field ρ_ab, however, is fixed during the learning phase. When a winning committed
category node fails either the required confidence level or class matching, it is shut
off for the duration of the input. The network repeats this until either one of the
committed category nodes can represent the current input or a new category node must be
committed.
If a new category node must be committed (all committed category nodes
failed to represent the current input), its weights are updated as follows in the normal
learning case (β < 1);

w_{i,C+1}^{new} = \beta A_i^{(t)} + (1 - \beta) ; i = 1, ..., 2M    3-4
w_{C+1,k}^{new} = \beta b_k^{(t)} + (1 - \beta) ; k = 1, ..., L    3-5

In the fast learning case (β = 1), the values of the current input A_i^{(t)} are
assigned to w_{i,C+1}^{new} and the values of b_k^{(t)} are assigned to w_{C+1,k}^{new}. Fast learning for a
newly committed category node is recommended in order to classify rare events. The value of
β depends on the amount and type of the data under consideration. It is clear from the
above formulas that if β = 0 no learning occurs, since the weights are not changed
and stay fixed at 1 during training.
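Equations 3-4 and 3-5 and their two limiting cases can be checked with a small Python sketch. The function name `commit_new_node` and the sample vectors are hypothetical.

```python
def commit_new_node(A, b, beta):
    """Weights of a newly committed node according to equations 3-4 and 3-5."""
    w_a = [beta * x + (1.0 - beta) for x in A]     # ART_a weights, 2M values
    w_ab = [beta * k + (1.0 - beta) for k in b]    # map field weights, L values
    return w_a, w_ab

A = [0.3, 0.6, 0.7, 0.4]     # complement-coded input, M = 2 (made-up values)
b = [0.0, 1.0, 0.0]          # class code for the second of L = 3 classes

fast = commit_new_node(A, b, beta=1.0)     # weights copy the input and the class code
frozen = commit_new_node(A, b, beta=0.0)   # weights stay at their initial value of one
```

Intermediate values of β place the new weights between the input and one, which is why a rare event seen only once is represented exactly only in the fast learning case.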
More details about the Fuzzy ARTMAP architecture and algorithm can be found
in (Carpenter et al. 1992, 1997).
3.2.3 Classification phase
At the end of the training phase the weights w_ij and w_jk are fixed and the network
is ready for classification. Input patterns are presented to ART_a without class code. The
choice value is computed for all committed category nodes, and the category node with the
maximum choice value is determined. The score of the winning category node J at each
node of the input layer of ART_b is computed by;

b_k^{(t)} = \frac{w_{Jk}}{\sum_{k=1}^{L} w_{Jk}} ; k = 1, ..., L    3-6

The node with the maximum score b_K at the input layer of ART_b is determined.
The index K of this node is the class code of the current input.
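The classification phase can be sketched in Python as follows. The function name `classify` and the example weights are hypothetical, and the class index returned is 0-based.

```python
def classify(a, w_a, w_ab, alpha=0.001):
    """Return (K, scores): the predicted class index and the scores of equation 3-6."""
    A = list(a) + [1.0 - x for x in a]                       # complement coding

    def choice(wj):
        return sum(min(x, y) for x, y in zip(A, wj)) / (alpha + sum(wj))

    J = max(range(len(w_a)), key=lambda j: choice(w_a[j]))   # winning category node
    total = sum(w_ab[J])
    scores = [wk / total for wk in w_ab[J]]                  # equation 3-6
    K = max(range(len(scores)), key=lambda k: scores[k])     # class with the top score
    return K, scores

w_a = [[0.1, 0.1, 0.88, 0.9], [0.9, 0.9, 0.1, 0.1]]   # made-up ART_a weights
w_ab = [[1.0, 0.0], [0.0, 1.0]]                        # made-up map field weights
K, scores = classify([0.85, 0.9], w_a, w_ab)
```

With binary map field weights the scores reduce to a one-hot vector; with fuzzy class membership they give a normalized degree of membership in each class.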
i
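The scoring rule of equation (3-6) can be illustrated with a minimal Python sketch. The function names and the example weight vector are hypothetical and are not part of the original algorithm listing; the sketch only assumes that the map field weights of the winning node J are available as a list.

```python
def class_scores(w_J):
    """Normalise the map field weights of the winning node J (equation 3-6)."""
    total = sum(w_J)
    return [w / total for w in w_J]


def predict_class(w_J):
    """Return the 1-based index K of the node with the maximum score b_K."""
    scores = class_scores(w_J)
    return scores.index(max(scores)) + 1


# Hypothetical map field weights of a node that was trained on class #4 (L = 5).
w_J = [0.0, 0.0, 0.0, 1.0, 0.0]
print(predict_class(w_J))  # 4
```

With hard training samples only one map field weight per node stays at one, so the normalisation does not change which class wins.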
3.3 Full algorithm of Fuzzy ARTMAP
The full algorithm of Fuzzy ARTMAP is listed below. The supervision of
Compact Fuzzy ART, which has been developed in this work, is used. The Fuzzy
ARTMAP algorithm that uses the original Fuzzy ART is listed in (Carpenter et al.
1997).
3.3.1 Training algorithm of Fuzzy ARTMAP
1) Input parameters;
   a) Dynamic parameters;
      i- $\bar{\rho} \in [0, 1]$: Base-line vigilance parameter.
      ii- $\rho_{ab} \in (0, 1]$: Map field vigilance parameter.
      iii- $\beta \in (0, 1]$: The dynamic learning parameter; $\beta = 1$ for fast learning.
      iv- $\alpha > 0$: The choice value parameter.
   b) Data characteristics;
      i- M: The dimension of the input features.
      ii- Pt: The number of exemplars to be used in learning.
   c) Initialization;
      i- Number of iterations t=1.
      ii- Number of committed category nodes C=1.
2) New input;
   $A_i^{(t)} = a_i^{(t)}$ for $1 \le i \le M$, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for $M+1 \le i \le 2M$
   $b_k^{(t)}; \quad k = 1, \ldots, L$
3) Compute the choice function for all committed category nodes;
   $T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}; \quad j = 1, \ldots, C$
4) Reset: Determine the node J, which has the maximum choice value;
   $T_J^{(t)} = \max\{T_j^{(t)}\}; \quad j = 1, \ldots, C$
5) If ($T_J^{(t)} = -1$: all committed category nodes are in shut-off mode) then a
   new node (the node whose order in the category layer is C+1) should be
   committed;
   i- Increase the number of committed nodes by one;
      C=C+1
   ii- If in fast-commit slow-record mode;
      Assign the values of the input features to the weights of this node;
      $w_{iC}^{new} = A_i^{(t)}; \quad i = 1, \ldots, 2M$
      $w_{Ck}^{new} = b_k^{(t)}; \quad k = 1, \ldots, L$
      Else (normal mode)
      $w_{iC}^{new} = \beta A_i^{(t)} + (1 - \beta); \quad i = 1, \ldots, 2M$
      $w_{Ck}^{new} = \beta b_k^{(t)} + (1 - \beta); \quad k = 1, \ldots, L$
   iii- GOTO STEP (2)
6) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;
   i- Shut off this node to put it out of competition;
   ii- GOTO STEP (4)
7) Class matching: If ($\sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) < \rho_{ab}$) then;
   i- Shut off node J;
      $T_J^{(t)} = -1$
   ii- Raise $\rho$ to the limit that deactivates node J;
      $\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon$
   iii- GOTO STEP (4)
8) Learning;
   $w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}; \quad i = 1, \ldots, 2M$
   $w_{Jk}^{new} = \beta (b_k^{(t)} \wedge w_{Jk}^{old}) + (1 - \beta) w_{Jk}^{old}; \quad k = 1, \ldots, L$
9) If (t<Pt) then;
   i- t=t+1
   ii- $\rho = \bar{\rho}$
   iii- GOTO STEP (2)
10) Training has been done. The network is ready for classification.
3.3.2 Classification algorithm of Fuzzy ARTMAP
1) New input;
   $A_i^{(t)} = a_i^{(t)}$ for $1 \le i \le M$, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for $M+1 \le i \le 2M$
2) Compute the choice function for all committed category nodes;
   $T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}; \quad j = 1, \ldots, C$
3) Determine the node J, which has the maximum choice value;
   $T_J^{(t)} = \max\{T_j^{(t)}\}; \quad j = 1, \ldots, C$
4) Matching criterion:
   If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;
   i- The network cannot determine the class code of this input.
   ii- GOTO STEP (7).
5) Class matching:
   $b_k^{(t)} = \frac{w_{Jk}}{\sum_{k=1}^{L} w_{Jk}}; \quad k = 1, \ldots, L$
   If ($\sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) < \rho_{ab}$) then
   i- The network cannot determine the class of this input.
   ii- GOTO STEP (7).
6) Class assigning:
   i- $b_K^{(t)} = \max\{b_k^{(t)}\}; \quad k = 1, \ldots, L$
   ii- K is the class code of the current input.
7) If more classification is needed GOTO STEP (1).
8) Classification has been done.
CHAPTER IV
SUPERVISED ART-I ANN
4.1 Introduction
As mentioned earlier, the map field approach is the only approach that
has been addressed in the literature for supervision of all ART-type ANNs. Using the map
field approach, two modules of ART ANNs are required. Moreover, the map field
supervision approach forces one to present the class code as an L-digit binary code,
where L is the number of classes. The binary digital coding is employed by setting all
the class code digits to zero, except the one that corresponds to the order of the
class code, which is set to 1. The class code should be presented to the network as
follows: class code #1 as (1 0 0 . . . 0), class code #2 as (0 1 0 . . . 0), class code #3 as (0
0 1 . . . 0), ..., class code #L as (0 0 0 . . . 1). Furthermore, training with hard samples
(each training exemplar represents a single class, which is the normal case) requires
additional (false) learning for the weights of the map field. The learning is false because the
initial values of all weights that connect the category node with the map field equal one.
During training all weights drop to zero except one, whose value remains equal to
one. This is the weight that connects the category node with the corresponding node of the
map field that represents its class, see (figure 4-1). So, the value of the map field
vigilance parameter $\rho_{ab}$ does not affect the efficiency of learning or the accuracy of
classification. Therefore, $\rho_{ab}$ only needs to be a positive fractional number in (0, 1], because
the match value at the map field is either equal to one or zero. So, the value of $\rho_{ab}$ does
[Figure 4-1: The map field supervision approach. Each panel lists the map field weights $w_{Jk}$, the class code $b_k$, the value $b_k \wedge w_{Jk}$ at each map field node, and the map field total $\sum_{k=1}^{L} (b_k \wedge w_{Jk})$. a) A newly committed category node, which represents class code #4. The initial map field weights are equal to one. The class code for class #4 is a binary digital code with all digits equal to zero except digit #4, which is equal to one. After learning, the map field weights of the newly committed category node are equal to zero except the fourth one, which is equal to one. So, the code of class #4 has been stamped on the map field weights of this newly committed category node to represent class #4. b) A committed category node that represents class code #4 is rejected as a representative of class #2.]
not affect the class matching process. Class matching using the map field approach is done
as follows:
$\sum_{k=1}^{L} (b_k \wedge w_{Jk}) = \begin{cases} 1 & \text{for class matching} \\ 0 & \text{for class correction} \end{cases}$   (4-1)
In addition to the requirement of two modules of ART, the map field approach
leads to more computation through the map process and weight learning. Moreover, it
requires more memory.
This chapter describes a new simplified supervised ANN architecture, which is
constructed from a single module of ART, called Supervised ART-I (Al-Rawi 1999).
Supervision of the new simplified version of Fuzzy ART (Compact Fuzzy ART), which
has been developed in chapter II, is described here. This new ANN has a simple
architecture and thus reduces the computational complexity, keeping its accuracy
at the same level as Fuzzy ARTMAP. In addition, it has fewer
parameters and requires less memory. Moreover, if hardware is developed, the cost will
be much lower than that of the map field approach.
The layout of this chapter is as follows: Section 4.2 describes the architecture
of Supervised ART-I, data representation, training phase, and testing phase. The full
algorithm is listed in section 4.3. Section 4.4 includes the discussion.
4.2 Supervised ART-I
Like Fuzzy ARTMAP, the newly developed ANN Supervised ART-I is able
to learn and classify binary and analog multi-valued input patterns presented in
arbitrary order. It has the same classification accuracy as Fuzzy ARTMAP
with a simpler architecture. This leads to the simplification of the mathematical
construction of the network, a reduction in the number of parameters, a reduction in the
memory requirement, and finally, a reduction of both training and classification time. The
supervision approach of Supervised ART-I is shown in (figure 4-2).
4.2.1 Architecture of Supervised ART-I
The Supervised ART-I architecture is constructed from a single Fuzzy ART,
instead of two, as in Fuzzy ARTMAP. The full architecture of Supervised ART-I is
shown in (figure 4-3). This leads to the elimination of the map field, and therefore, the
elimination of the map field weights and the map field vigilance parameter ($\rho_{ab}$) of Fuzzy
ARTMAP. This has been achieved by two different processes: 1) Employing analog
class coding (it is convenient to use positive integers) instead of the binary
digital coding that Fuzzy ARTMAP requires; 2) Introducing a one-dimensional memory, running
along all the category nodes N of the $F_2^a$ field, which is used to tag each newly committed node
with the code of the class that it belongs to. This memory of size N represents just (1/L)th of
the eliminated memory that is occupied by the weights of the map field $w_{jk}$. Since all
category nodes are connected to the map field, the total number of weights connected to
the map field is N x L, where N is the total capacity of the network.
Class matching in Supervised ART-I is much simpler than that in Fuzzy
ARTMAP. It is done simply by reading the tag-value of the winning category node. When
a node is committed during the training phase, the class code of the input pattern that forces
it to be committed is assigned to the memory of its tag-value. Each committed category
node has only one tag-value because each node can represent only one class. However,
more than one category node can represent the same class. Therefore, each category
node can be seen as a representation of a subclass of the class to which it belongs.
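The memory argument above can be checked with a short Python sketch. The function names and the example values of N and L are assumptions chosen only for illustration (L = 13 matches the number of thematic classes used later in this work; N = 1000 is arbitrary).

```python
def map_field_weight_count(N, L):
    """Number of map field weights w_jk in Fuzzy ARTMAP: one per (node, class) pair."""
    return N * L


def tag_memory_size(N):
    """Size of the one-dimensional tag memory in Supervised ART-I: one tag per node."""
    return N


N, L = 1000, 13  # hypothetical network capacity and number of classes
print(map_field_weight_count(N, L))  # 13000
print(tag_memory_size(N))            # 1000
```

The ratio of the two sizes is 1/L, which is the (1/L)th fraction stated in the text.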
[Figure 4-2: Supervision dynamic of the tagging approach of Supervised ART-I. Class code is integer. Match tracking is conducted by checking the tag of the winning committed category node J, Tag(J), against the class code b. This replaces the complicated map field approach.]
[Figure 4-3: Architecture of Supervised ART-I. Supervision is done using the tagging approach. When a node is committed, it is tagged with the class code of the input features that force it to be committed. Committed category nodes carry a tag (Tag #1, Tag #2, ...); uncommitted nodes are marked "No Tag Yet".]
4.2.2 Data Description
The multi-valued input patterns $A^{(t)}$ can be presented to Supervised ART-I in
both binary and analog forms. However, input data should be normalized to [0, 1]
before their presentation. Since some of the weight values sometimes erode to zero, it
is recommended to introduce $A^{(t)}$ in the complement form to avoid the category
proliferation problem.
The class node $b^{(t)}$ is not a binary vector as in the case of the map field supervision
approach, but a positive integer that represents the class code of the input
patterns $A^{(t)}$ (b = 1, 2, ..., etc.). b = 1 represents class number 1, b = 2 represents class
number 2, etc.
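The two data representations described above can be sketched in Python as follows. This is a minimal illustration, not code from the thesis: the helper names `complement_code` and `one_hot` are hypothetical, and `one_hot` is included only to contrast the binary coding of the map field approach with the integer class code used here.

```python
def complement_code(a):
    """Return the complement-coded pattern A = (a, 1 - a) of length 2M."""
    if any(x < 0.0 or x > 1.0 for x in a):
        raise ValueError("features must be normalized to [0, 1]")
    return list(a) + [1.0 - x for x in a]


def one_hot(b, L):
    """Binary digital class code of the map field approach (for comparison only)."""
    return [1 if k == b else 0 for k in range(1, L + 1)]


A = complement_code([0.25, 1.0])
print(A)              # [0.25, 1.0, 0.75, 0.0]
print(sum(A))         # 2.0, i.e. M: the sum of a complement-coded pattern is constant
print(one_hot(3, 4))  # [0, 0, 1, 0]; Supervised ART-I simply stores b = 3
```

Because the components of a complement-coded pattern always sum to M, the match value can be compared directly with $M\rho$.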
4.2.3 Training of Supervised ART-I
During the training phase, a stream of multi-valued input patterns $A^{(t)}$ and the
class code $b^{(t)}$ are introduced simultaneously to the network. The choice function is
computed for each committed node according to equation (2-1).
The network selects the committed category node J, which 1) has the
maximum choice value among all the committed category nodes (in the $F_2$ field) and 2)
has a match value greater than or equal to the vigilance parameter $\rho$;
$\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) \ge M\rho$   (4-2)
If the tag-value of the winning category node matches the current class code $b^{(t)}$, the
node will be trained. Otherwise match tracking should be conducted;
If (Tag(J) = $b^{(t)}$) then
   Weights updating;
   $w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}; \quad i = 1, \ldots, 2M$
Else
   Match tracking;
   $\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon$
Otherwise, class correction should be conducted by increasing the vigilance parameter
$\rho$ above the match value of this node by a small value $\varepsilon$, and another committed node
is chosen. This sacrifices generalization to correct the predictive error. Any committed
category node that fails to represent the current input must be shut off, as long as this
input is on, in order to prevent its reselection. A category node is shut off by assigning
-1 to its choice value. This works because all category nodes that are not in shut-off mode have a
positive choice value. If the failed category node were not shut off, the network would enter an
infinite reselect-fail loop: the network would reselect the same category node, and the
node would again fail to pass the match criterion. If none of the committed category nodes is
able to represent the current input $A^{(t)}$ (all committed category nodes are in shut-off
mode), a new category node is committed and is tagged immediately with the class code
$b^{(t)}$ of the current input pattern. Such action is needed when the maximum choice value
is the value of a shut-off node ($T_J = -1$). In the fast-learning slow-record option, the values of
the input features, which forced the category node to be committed, are assigned to its
weight values. This lets the network deal better with noisy data, so rare events can
be classified. If the network is in normal mode, the weights of the newly committed
category node will be as given by equation (2-16).
As in all supervised ART-type ANNs, the vigilance parameter $\rho$ should be
reset to its base-line value $\bar{\rho}$ before a new input pattern is introduced to the network.
4.2.4 Classification by Supervised ART-I
During the testing phase, only the input pattern $A^{(t)}$ is introduced to the
network. The choice function is computed for all committed category nodes. The
category node with the maximum choice function is determined among all committed
category nodes. If the match value of the winning category node J passes the base-line
vigilance parameter $\bar{\rho}$, then the tag of the category node J represents the class code of
the current input pattern $A^{(t)}$. If not, the network cannot determine the class code of the
current input.
4.3 Algorithm of Supervised ART-I
4.3.1 Training Algorithm of Supervised ART-I
1) Input parameters;
   a) Dynamic parameters;
      i- $\bar{\rho} \in [0, 1]$: Base-line vigilance parameter.
      ii- $\beta \in (0, 1]$: The dynamic learning parameter; $\beta = 1$ for fast learning.
      iii- $\alpha > 0$: The choice value parameter.
   b) Data characteristics;
      i- M: The dimension of the input features.
      ii- Pt: The number of exemplars to be used in learning.
      iii- L: The number of classes.
   c) Initialization;
      i- Number of iterations t=1.
      ii- Number of committed category nodes C=1.
2) New input;
   $A_i^{(t)} = a_i^{(t)}$ for $1 \le i \le M$, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for $M+1 \le i \le 2M$
   $b^{(t)}$: the class code of the current input.
3) Compute the choice function for all committed category nodes;
   $T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}; \quad j = 1, \ldots, C$
4) Reset: Determine the node J, which has the maximum choice value;
   $T_J^{(t)} = \max\{T_j^{(t)}\}; \quad j = 1, \ldots, C$
5) If $T_J^{(t)} = -1$ (all committed category nodes are in shut-off mode) then a
   node (the node whose order in the category layer is C+1) should be
   committed;
   i- Increase the number of committed nodes by one;
      C=C+1
   ii- If in fast-learning slow-record mode;
      Assign the values of the input features to the weights of this node;
      $w_{iC} = A_i^{(t)}; \quad i = 1, \ldots, 2M$
      Else (normal mode)
      $w_{iC} = \beta A_i^{(t)} + (1 - \beta); \quad i = 1, \ldots, 2M$
   iii- GOTO STEP (2)
6) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;
   i- Shut off node J to put it out of competition;
      $T_J^{(t)} = -1$
   ii- GOTO STEP (4)
7) Class matching: If (Tag(J) $\ne b^{(t)}$) then;
   i- Shut off node J;
      $T_J^{(t)} = -1$
   ii- Raise $\rho$ to the limit that deactivates node J;
      $\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon$
   iii- GOTO STEP (4)
8) Learning;
   $w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}; \quad i = 1, \ldots, 2M$
9) If (t<Pt) then;
   i- t=t+1
   ii- $\rho = \bar{\rho}$
   iii- GOTO STEP (2)
10) Training has been done. The network is ready for classification.
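The training steps above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: the function and variable names are hypothetical, the network starts with no committed nodes (instead of C=1), new nodes are always committed in fast-learning slow-record mode, and after committing a node the loop moves on to the next exemplar instead of returning to step 2.

```python
def complement_code(a):
    """Complement-coded input A = (a, 1 - a)."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and_sum(A, w):
    """Sum over i of (A_i AND w_i), where AND is the component-wise minimum."""
    return sum(min(x, y) for x, y in zip(A, w))


def train_supervised_art1(samples, rho_bar=0.0, beta=1.0, alpha=0.001, eps=1e-6):
    """samples: list of (features in [0, 1], positive integer class code)."""
    weights, tags = [], []                      # one weight vector and one tag per node
    for a, b in samples:
        A, M = complement_code(a), len(a)
        rho = rho_bar                           # step 9ii: reset vigilance to base line
        T = [fuzzy_and_sum(A, w) / (alpha + sum(w)) for w in weights]   # step 3
        while True:
            J = max(range(len(T)), key=T.__getitem__) if T else None    # step 4
            if J is None or T[J] < 0:           # step 5: every node is shut off
                weights.append(list(A))         # fast commit: copy the input
                tags.append(b)                  # tag the node with the class code
                break
            match = fuzzy_and_sum(A, weights[J])
            if match < M * rho:                 # step 6: matching criterion failed
                T[J] = -1.0
                continue
            if tags[J] != b:                    # step 7: class matching failed
                T[J] = -1.0
                rho = match / M + eps           # match tracking
                continue
            weights[J] = [beta * min(x, w) + (1.0 - beta) * w       # step 8
                          for x, w in zip(A, weights[J])]
            break
    return weights, tags


weights, tags = train_supervised_art1(
    [([0.125, 0.125], 1), ([0.875, 0.875], 2), ([0.25, 0.125], 1)], rho_bar=0.5)
print(tags)  # [1, 2]
```

In this toy run the third exemplar is absorbed by the first node, whose weights shrink to the component-wise minimum of the two class #1 inputs.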
4.3.2 Classification algorithm of Supervised ART-I
1) New input;
   $A_i^{(t)} = a_i^{(t)}$ for $1 \le i \le M$, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for $M+1 \le i \le 2M$
2) Compute the choice function for all committed category nodes;
   $T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}; \quad j = 1, \ldots, C$
3) Determine the node J, which has the maximum choice value;
   $T_J^{(t)} = \max\{T_j^{(t)}\}; \quad j = 1, \ldots, C$
4) Matching criterion:
   If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;
      The network cannot determine the class of this input.
   Else
      Tag(J) is the class code of the current input.
5) If more classification is needed GOTO STEP (1).
6) Classification has been done.
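The classification steps can be sketched in the same hedged way. The function name and the example weights and tags below are hypothetical; the weights are assumed to come from a previously trained network, and `None` stands for "the network cannot determine the class".

```python
def classify_supervised_art1(a, weights, tags, rho_bar=0.0, alpha=0.001):
    """Return the tag of the winning node, or None if it fails the vigilance test."""
    A = list(a) + [1.0 - x for x in a]          # step 1: complement coding
    M = len(a)
    best_J, best_T = None, -1.0
    for j, w in enumerate(weights):             # steps 2-3: maximum choice value
        T = sum(min(x, y) for x, y in zip(A, w)) / (alpha + sum(w))
        if T > best_T:
            best_J, best_T = j, T
    if best_J is None:                          # no committed node at all
        return None
    match = sum(min(x, y) for x, y in zip(A, weights[best_J]))
    return tags[best_J] if match >= M * rho_bar else None   # step 4


# Hypothetical trained network with two committed nodes.
weights = [[0.125, 0.125, 0.75, 0.875], [0.875, 0.875, 0.125, 0.125]]
tags = [1, 2]
print(classify_supervised_art1([0.125, 0.125], weights, tags, rho_bar=0.5))  # 1
print(classify_supervised_art1([0.5, 0.5], weights, tags, rho_bar=0.9))      # None
```

Reading the tag replaces the whole map field computation of Fuzzy ARTMAP, which is where the saving in classification time comes from.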
4.4 Discussion
The supervision approach of Supervised ART-I is more powerful than the map
field approach. The Supervised ART-I neural network has a simple architecture and a
simple mathematical construction, which reduces the time of both the training and
classification phases. In addition, it has fewer parameters and requires less
memory than Fuzzy ARTMAP.
The one-dimensional memory N, introduced to store the tag value of
each committed node in Supervised ART-I, represents only (1/L)th of the eliminated
two-dimensional memory $w_{jk}$ (N, L), which connects the nodes of the $F_2^a$ field with
the nodes of the map field $F^{ab}$, see (figure 4-3).
Using positive integers for class coding, in Supervised ART-I, is
easier to handle than the binary digital coding required by the map field approach of Fuzzy
ARTMAP, especially when there is a large number of classes, as in character recognition
and remote sensing tasks.
Supervised ART-I, as suggested by its simple architecture, sharply decreases
the training and classification times relative to Fuzzy ARTMAP. However, it is
theoretically known that both of them have the same classification accuracy. The
supervision approach developed in this chapter is used for supervision
of Compact Fuzzy ART. However, it can be applied to the supervision of all ART
architectures.
CHAPTER V
SUPERVISED ART-II ANN
5.1 Introduction
It has been shown in the previous chapter that Supervised ART-I has a simpler
architecture, fewer parameters, and requires less memory than Fuzzy ARTMAP. This
leads to quicker learning and classifying algorithms, with the same accuracy as Fuzzy
ARTMAP. The great achievement of Supervised ART-I is that its architecture is
built from a single module of ART, instead of a pair of them as in all other supervised
ART-type ANNs. This led to the elimination of the map field. The supervision approach of
Supervised ART-I can be applied to all supervised ART-type architectures that have
been addressed in the literature.
This chapter deals with constructing a new generation of Supervised ART-I,
called Supervised ART-II (Al-Rawi et al. 1999). As will be shown, it is quicker in
learning for non-homogeneous data and requires less memory than Supervised ART-I.
Nevertheless, the classification accuracy and number of parameters are like those of
Supervised ART-I.
5.2 Supervised ART-II
5.2.1 Architecture of Supervised ART-II
Supervised ART-II, like Supervised ART-I, is built from a single Fuzzy
ART module, see (figure 5-1). The one-dimensional memory N of the category nodes of
Supervised ART-I is divided into L one-dimensional memories ($N_k$; k=1, ..., L) in
Supervised ART-II, see (figure 5-2). Each of these one-dimensional memories $N_k$ is
called a "stack". The stack number k represents the class code for all its committed
category nodes. The memory requirement for Supervised ART-II is less than that of
Supervised ART-I. One label is assigned to each stack, rather than tagging each
individual category node as in Supervised ART-I. The memory field that represents the class
code in Supervised ART-II is a one-dimensional array of length L.
[Figure 5-1: Supervision dynamic of the stacking approach of Supervised ART-II. Like Supervised ART-I, class code is integer. Match tracking is conducted by checking the stack K of the winning category node against the class code b. When all committed category nodes of the stack b fail to represent the input features, a new node should be committed. This is because committed category nodes of other stacks will not pass the class matching. This sharply decreases the training time.]
The sizes $N_k$ (number of nodes which are available to be committed) of the
stacks are not necessarily equal. They depend on the nature and size of the data of each
class. However, if no previous knowledge about the data is available, an equal memory
size is recommended. In the case of using equal memory size for all stacks, the memory
field of the category layer is an $N_k \times L$ matrix. The dynamic of Supervised ART-II is
shown in (figure 5-1), and the full architecture is shown in (figure 5-2).
5.2.2 Training of Supervised ART-II
During the training phase, a stream of multi-valued input patterns $A_i^{(t)}$ and their
class codes $b^{(t)}$ are introduced simultaneously to the network. The choice function is
computed for each committed node in all the stacks;
$T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k k})}{\alpha + \sum_{i=1}^{2M} w_{i j_k k}}; \quad j_k = 1, \ldots, C(k), \; k = 1, \ldots, L$   (5-1)
where C(k) is the number of committed nodes in stack number k, and $w_{i j_k k}$ are the
weights that connect each category node $j_k$ in each stack k with the input node i.
[Figure 5-2: Architecture of Supervised ART-II, showing the input layer, the category layer, and the class code. Input nodes are connected to all committed nodes. Committed nodes are shown in dark. Uncommitted nodes are shown in light.]
The node that has the maximum choice value is determined for each stack;
$T_{J_k}^{(t)} = \max\{T_{j_k k}^{(t)}\}; \quad j_k = 1, \ldots, C(k)$   (5-2)
These maximum choice value nodes are the candidates of their stacks to
represent the current input. The node that has the highest choice value $T_{J_K}^{(t)}$ among
all the candidate nodes is chosen to represent the current input, see (figure 5-3);
$T_{J_K}^{(t)} = \max\{T_{J_k}^{(t)}\}; \quad J_k = J_1, \ldots, J_L$   (5-3)
The match value is computed for the winning node as follows;
$\text{match value} = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i J K})$   (5-4)
At this stage it can be checked whether a new category node should be committed
from the stack that represents $b^{(t)}$. This can be done simply by checking the maximum
choice value of that stack, $T_{J_b}^{(t)}$. If it is
equal to -1, a category node in shut-off mode has been selected. This happens
only when all committed category nodes of the stack that represents the current input are
in shut-off mode. In this case committed category nodes of other stacks do not need
to be tested since they will not pass the class matching. This saves time during the
learning phase.
[Figure 5-3: Determination of the winning node in the stacking-supervision approach of Supervised ART-II. The maximum choice value node of each stack is selected. The winning node among these candidates is determined. When the winning node fails, a new candidate for its stack is presented. The winning category node among all candidates is then re-determined.]
If the match value of the winning node has passed the vigilance parameter $\rho$,
the class matching should be checked. The class matching is passed if the stack label K
of the winning node matches the current class code $b^{(t)}$. Then the weights of the
winning node are trained; otherwise match tracking should be conducted;
If (K = $b^{(t)}$) then
   Weights updating:
   $w_{iJK}^{new} = \beta (A_i^{(t)} \wedge w_{iJK}^{old}) + (1 - \beta) w_{iJK}^{old}; \quad i = 1, \ldots, 2M$
Else
   Match tracking should be conducted
If either the match value or the class matching for the current node has failed, a
value of -1 is assigned to the choice value $T_{J_K}^{(t)}$ of this node. This puts it out of
competition. Another node with the maximum choice value should be selected among
all the committed nodes of the stack K only. This node is the new candidate for its stack.
The candidates of all other stacks remain the same. The candidate node that has the
highest choice value among all the candidate nodes of all the stacks is redetermined.
This process is repeated until either one of the committed nodes can represent
the current input, in which case its weights are trained, or a new node must be committed from
the stack that represents the class code of the current input $b^{(t)}$. This way all the
committed category nodes are stacked according to their classes.
When a new category node is committed, its weights are assigned the values of
the current input $A_i^{(t)}$, which forces it to be committed, that is;
$C(b) = C(b) + 1$   (5-5)
$w_{i,C(b),b} = A_i^{(t)}; \quad i = 1, \ldots, 2M$   (5-6)
where C(b) is the number of committed nodes in the stack that represents the class code
b. Therefore, initial values for the weights are not required. The weight update for a newly
committed category node in equation 5-6 represents the fast-learning slow-record
option, which is recommended to classify rare events. In the normal mode the weight
update should be conducted by;
$w_{i,C(b),b} = \beta A_i^{(t)} + (1 - \beta); \quad i = 1, \ldots, 2M$   (5-7)
As in all supervised ART-type architectures, if class matching has failed, the
vigilance parameter $\rho$ should be increased to the limit at which the failed category node is
deactivated. Thus, the vigilance parameter should be assigned the match value of the
current category node plus a small value $\varepsilon$ in order to handle the class correction;
$\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) + \varepsilon$   (5-8)
However, the vigilance parameter should be relaxed to its base value before
introducing the next input; $\rho = \bar{\rho}$, where $\bar{\rho}$ is the predetermined minimum accepted
matching value.
5.2.3 Classification by Supervised ART-II
During the classification phase, the node that has the highest choice
value $T_{J_K}^{(t)}$ among all the committed nodes is selected to represent the current input;
$T_{J_K}^{(t)} = \max\{T_{j_k k}^{(t)}\}; \quad j_k = 1, \ldots, C(k); \; k = 1, \ldots, L$   (5-9)
If the match value of the winning node is greater than or equal to the vigilance
parameter $\rho$, the current input belongs to class K. If not, the network cannot determine
the class of the current input.
5.3 Full algorithm of Supervised ART-II
5.3.1 Training algorithm of Supervised ART-II
1) Input parameters;
   a) Dynamic parameters;
      i- $\bar{\rho} \in [0, 1]$: Base-line vigilance parameter.
      ii- $\beta \in (0, 1]$: The dynamic learning parameter; $\beta = 1$ for fast learning.
      iii- $\alpha > 0$: The choice value parameter.
   b) Data characteristics;
      i- M: The dimension of the input features.
      ii- Pt: The number of exemplars to be used in learning.
      iii- L: The number of classes.
   c) Initialization;
      i- Number of iterations t=1.
      ii- Number of committed category nodes for each stack (class) is set to zero;
         C(k)=0; k=1, ..., L
2) New input;
   $A_i^{(t)} = a_i^{(t)}$ for $1 \le i \le M$, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for $M+1 \le i \le 2M$
   $b^{(t)}$: the class code of the current input $A_i^{(t)}$.
3) If (no node has been committed for the current class $b^{(t)}$, i.e. C(b)=0) then;
   GOTO STEP (6i)
4) For all the stacks; k=1, ..., L:
   If C(k)=0 (no category node has been committed yet for this class) then;
      $T_{J_k}^{(t)} = -1$
   Else
      Compute the choice function for all committed category nodes;
      $T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k k})}{\alpha + \sum_{i=1}^{2M} w_{i j_k k}}$
5) For each stack k, determine the node $J_k$, which has the maximum choice
   value;
   $T_{J_k}^{(t)} = \max\{T_{j_k k}^{(t)}\}; \quad j_k = 1, \ldots, C(k)$
6) If ($T_{J_b}^{(t)} = -1$), a new node should be committed for class $b^{(t)}$;
   i- Increase the number of committed nodes for class $b^{(t)}$ by one;
      $C(b^{(t)}) = C(b^{(t)}) + 1$
   ii- If (fast-learning slow-record mode) then;
      $w_{i,C(b),b} = A_i^{(t)}; \quad i = 1, \ldots, 2M$
      Else
      $w_{i,C(b),b} = \beta A_i^{(t)} + (1 - \beta); \quad i = 1, \ldots, 2M$
   iii- GOTO STEP (11)
7) Determine the node that has the highest choice value among all the
   maximum choice value nodes of all the stacks;
   $T_{J_K}^{(t)} = \max\{T_{J_k}^{(t)}\}; \quad k = 1, \ldots, L$
8) Match testing: For category node $J_K$;
   If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) < M\rho$) then;
   i- Shut off category node $J_K$;
      $T_{J_K}^{(t)} = -1$
   ii- Redetermine the node that has the maximum choice value for
      this stack K. k=K: GOTO STEP (5).
9) Match tracking:
   If (K $\ne b^{(t)}$) the stack number K of the winning category node does not
   match the current class code. Match tracking should be conducted;
   i- Shut off category node $J_K$ during the current input $A_i^{(t)}$;
      $T_{J_K}^{(t)} = -1$
   ii- Raise $\rho$ to the limit that deactivates node $J_K$;
      $\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) + \varepsilon$
   iii- GOTO STEP (5)
10) Learning;
   $w_{iJK}^{new} = \beta (A_i^{(t)} \wedge w_{iJK}^{old}) + (1 - \beta) w_{iJK}^{old}; \quad i = 1, \ldots, 2M$
11) If the number of iterations t<Pt then;
   i- t=t+1
   ii- $\rho = \bar{\rho}$
   iii- GOTO STEP (2)
12) The training has been done. The network is ready for classification.
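The stacking approach can be sketched in Python along the following lines. As with any such sketch, this is an illustration under assumptions rather than the code used in this work: the names are hypothetical, stacks are unbounded Python lists (the thesis uses a predetermined size $N_k$ per stack), and new nodes are always committed in fast-learning slow-record mode.

```python
def fuzzy_and_sum(A, w):
    """Sum over i of min(A_i, w_i)."""
    return sum(min(x, y) for x, y in zip(A, w))


def argmax(values):
    """Index of the largest element of a non-empty list."""
    return max(range(len(values)), key=values.__getitem__)


def train_supervised_art2(samples, L, rho_bar=0.0, beta=1.0, alpha=0.001, eps=1e-6):
    """samples: list of (features in [0, 1], class code in 1..L)."""
    stacks = {k: [] for k in range(1, L + 1)}   # stack k holds the nodes of class k
    for a, b in samples:
        A, M = list(a) + [1.0 - x for x in a], len(a)
        rho = rho_bar
        # Step 4: choice values of every committed node, grouped by stack.
        T = {k: [fuzzy_and_sum(A, w) / (alpha + sum(w)) for w in stacks[k]]
             for k in stacks}
        # Step 5: one candidate per non-empty stack.
        cand = {k: argmax(T[k]) for k in stacks if T[k]}
        while True:
            # Steps 3 and 6: stack b is empty or all of its nodes are shut off.
            if b not in cand or T[b][cand[b]] < 0:
                stacks[b].append(list(A))       # fast commit into stack b
                break
            K = max(cand, key=lambda k: T[k][cand[k]])   # step 7: winner
            J = cand[K]
            match = fuzzy_and_sum(A, stacks[K][J])
            if match < M * rho or K != b:       # step 8 or step 9 failed
                if match >= M * rho:            # class mismatch: match tracking
                    rho = match / M + eps
                T[K][J] = -1.0                  # shut the node off
                cand[K] = argmax(T[K])          # redetermine stack K only
                continue
            stacks[K][J] = [beta * min(x, w) + (1.0 - beta) * w    # step 10
                            for x, w in zip(A, stacks[K][J])]
            break
    return stacks


stacks = train_supervised_art2(
    [([0.125, 0.125], 1), ([0.875, 0.875], 2), ([0.25, 0.125], 1)], L=2, rho_bar=0.5)
print(len(stacks[1]), len(stacks[2]))  # 1 1
```

Only the candidate of the failed stack K is recomputed after a shut-off, and a new node is committed as soon as stack b is exhausted, which are the two sources of the time saving described in this chapter.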
5.3.2 Classification algorithm of Supervised ART-II
1) New input;
   $A_i^{(t)} = a_i^{(t)}$ for $1 \le i \le M$, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for $M+1 \le i \le 2M$
2) Compute the choice function for all committed category nodes;
   $T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k k})}{\alpha + \sum_{i=1}^{2M} w_{i j_k k}}; \quad j_k = 1, \ldots, C(k), \; k = 1, \ldots, L$
3) Choose the node that has the maximum choice value among all the
   committed nodes for all stacks;
   $T_{J_K}^{(t)} = \max\{T_{j_k k}^{(t)}\}; \quad j_k = 1, \ldots, C(k); \; k = 1, \ldots, L$
4) Match testing:
   If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) \ge M\rho$) then;
      The class code of the current input $A_i^{(t)}$ is K
   Else
      The network cannot determine the class code of this input.
5) If more classification is needed then;
      GOTO STEP (1)
   ELSE
      Classification has been completed.
5.4 Discussion
It is clear from the algorithms of Fuzzy ARTMAP (Carpenter et al. 1992),
Supervised ART-I (Al-Rawi 1999), and Supervised ART-II (Al-Rawi et al. 1999) that
they have the same classification accuracy. That is because all these architectures share
the same matching criteria. However, the last two learn more quickly due to their simple
architectures.
Supervised ART-II is quicker than Supervised ART-I in learning binary and analog
input patterns. Stacking the category nodes according to their classes
makes the redetermination of the maximum choice value node quicker than in the tagging
approach of Supervised ART-I when the previous node has failed to pass the
vigilance parameter or the class matching, see (figure 5-3). The number of comparisons
required to redetermine the winning node among all the C committed category
nodes is C-1 for Supervised ART-I, while for Supervised ART-II it
is (C/L - 1) + (L - 1) = C/L + (L - 2) on average. Therefore, the learning time
of Supervised ART-II becomes shorter than that of Supervised ART-I as C increases
(Al-Rawi et al. 2000).
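A small worked example makes the comparison count concrete. The function names and the values C = 1300 and L = 13 are chosen here only for illustration, and the sketch assumes, as the text does, that the committed nodes are spread evenly over the stacks.

```python
def comparisons_art1(C):
    """Comparisons to redetermine the winner among C tagged nodes (Supervised ART-I)."""
    return C - 1


def comparisons_art2(C, L):
    """Average comparisons with L stacks: within one stack, then among L candidates."""
    return (C / L - 1) + (L - 1)


C, L = 1300, 13  # hypothetical number of committed nodes and classes
print(comparisons_art1(C))     # 1299
print(comparisons_art2(C, L))  # 111.0
```

For these values the stacking approach needs roughly one twelfth of the comparisons, and the gap widens as C grows for a fixed L.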
In Supervised ART-II, when all committed category nodes for the stack of the
current input pattern are in shut off mode, a new node must be committed from this
stack without waiting to test all the committed category nodes for all stacks, because
they will not pass the class matching. However, in Supervised ART-I all committed
category nodes should be tested before such a decisión can be made.
The comparison between Fuzzy ARTMAP, Supervised ART-I, and Supervised
ART-II is shown in (table 5-1). Supervised ART-I is superior or equivalent to
Fuzzy ARTMAP on each point. Supervised ART-II has a simpler architecture than Supervised
ART-I. However, the main limitation of Supervised ART-II is that the maximum
number of category nodes of each stack is predetermined before the training process.
When all the nodes of a particular stack have been committed, borrowing an
uncommitted node from another stack is not possible. In the tagging approach of
Supervised ART-I, by contrast, uncommitted nodes are free to represent any class during the
training process. This is the main constraint of the stacking supervision approach of
Supervised ART-II.

Property | Fuzzy ARTMAP | Supervised ART-I | Supervised ART-II
Architecture | Two ARTs | One ART | One ART
No. of category nodes involved for Tmax | C | C | C/L
Checking for a new committed node | J > C | $T_J = -1$ | Tmax(b) = -1
Redetermination of Tmax before a new node is committed | C | C | C/L
Class coding | Binary digital code | Analog | Analog
Supervision memory requirement | (N + 1) x L | C | L
Map field vigilance parameter | $\rho_{ab}$ | None | None
Match tracking | $\sum_{k=1}^{L} (b_k \wedge w_{Jk}) \ge \rho_{ab}$ | Tag(J) = b | K = b
Update weights of ART $w_{ij}$ | $w_{iJ}^{new} = \beta (A_i \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}$ | Same | Same
Update weights of map field $w_{jk}$ | $w_{Jk}^{new} = \beta (b_k \wedge w_{Jk}^{old}) + (1 - \beta) w_{Jk}^{old}$ | None | None
Freedom of uncommitted category nodes | Free | Free | Bounded

Table 5-1: Comparisons between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II. It is very clear that the supervision approach of Supervised ART-I and Supervised ART-II is much simpler than the map field approach.

This limitation of the stacking supervision approach of Supervised ART-II
can be treated by increasing the memory size of each stack. This additional memory is
compensated for by the fact that Supervised ART-II employs only L of the N memory
locations used by the tagging supervision of Supervised ART-I. The released memory can be
used to increase the memory size of each stack by one fold.
CHAPTER VI
PERFORMANCE OF SUPERVISED ART-I&II FOR
CLASSIFICATION OF LANDSAT TM IMAGES
6.1 Landsat Satellites
The satellites of the Landsat program are polar-orbiting satellites. The orbit of
Landsat-5, whose images are used in this study, is 705 km high with an
inclination of 98.2°. The satellite completes one orbit of the Earth every 99 minutes, and its repeat coverage
is 16 days. The orbit is sun-synchronous, crossing the equator at 9:45 AM.
Landsat carries the Multi-Spectral Scanner (MSS) and the Thematic Mapper
(TM). Our concern in this study is the TM. This sensor operates in the visible and
infrared ranges. TM has seven bands. All bands have a spatial resolution of (30 x 30) m
except band-6, which has a resolution of (120 x 120) m. Band-6 operates in the thermal infrared. The
radiometric resolution is 8 bits for all channels, so digital values range from 0 to
255. The details of the TM sensor are listed in (table 6-1). The ground size of a Landsat TM
image is (185 x 172) km. It consists of 5760 lines x 6928 pixels.
The objective of this chapter is to test the performance of Supervised ART-I
and Supervised ART-II for mapping land cover using Landsat TM images (Al-Rawi et
al. 2000). The sensitivity of the system is tested over the whole domain of the dynamic
parameters (ρ, β) and for different sizes of training set. The effect of the dynamic
parameters on training time and classification accuracy is addressed.
Band   Bandwidth (µm)            Detectors     Resolution (m)   SNR (average)
1      0.45-0.52 (VIS, blue)     SiPD (16)     30               60
2      0.52-0.60 (VIS, green)    SiPD (16)     30               60
3      0.63-0.69 (VIS, red)      SiPD (16)     30               46
4      0.76-0.90 (NIR)           SiPD (16)     30               46
5      1.55-1.75 (SWIR)          InSb (16)     30               36
7      2.08-2.35 (SWIR)          InSb (16)     30               28
6      10.40-12.50 (TIR)         HgCdTe (4)    120
Table 6-1: Descriptions for Landsat-5 Thematic Mapper (TM) images. TM is a scanning optical sensor operating in the visible and infrared ranges. The last column in the table represents the Signal to Noise ratio (SNR) for each channel.
6.2 Data
Results obtained for a scene of (256 x 240) pixels of the Landsat TM image
201/32 EOSAT are shown and discussed in this thesis. The scene corresponds to the
area around the Spanish city of "Talavera de la Reina". All bands are used here except
band-6, because it has a different resolution and operates in the thermal infrared region (10.4-
12.5 µm) of the spectrum. The normalized values [0, 1] of the bands, as well as their
complements, have been introduced simultaneously to the network together with the class
code. The thematic classes present in this scene have been determined by supervised
field visits to almost all the training areas established during the image classification process.
Thirteen different thematic classes have been detected. These classes and the number of
pixels they cover are: meadow (7569), wheat (5297), alfalfa (2061), mountains (7233),
fallow land-1 (3026), fallow land-2 (2957), fallow land-3 (3447), natural vegetation-1
(12578), natural vegetation-2 (9558), forest (3492), irrigated land (3476), wetland (101),
and river (645). The whole space defined by the vigilance parameter ρ ∈ [0, 1] and the
dynamic learning parameter β ∈ (0, 1] has been investigated using different sizes of
training samples. This lets us understand the influence of these parameters on the
sensitivity of the system from both the training time and classification accuracy points
of view. Achieving this requires training the networks thousands of times. The simple
architectures of Supervised ART-I and Supervised ART-II make this possible.
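The input preparation described above (normalized band values together with their complements) can be sketched as follows. This is a minimal illustration, assuming that normalization divides each 8-bit digital value by 255; the function name and the sample pixel are hypothetical.

```python
def complement_code(pixel, max_value=255):
    """Normalize the band values to [0, 1] and append their complements.

    For M bands the result has 2M components, and the components sum to M.
    """
    a = [v / max_value for v in pixel]
    return a + [1.0 - x for x in a]

# Hypothetical pixel with the six TM bands used here (1-5 and 7).
pixel = [0, 255, 51, 102, 153, 204]
coded = complement_code(pixel)
```

With six bands the network therefore receives a 12-component feature vector per pixel, and the class code is supplied alongside it during training.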
6.3 Performance
6.3.1 Training performance
The networks (Supervised ART-I and Supervised ART-II) have been trained
with different sizes of training set. They were trained with 200, 600, 1000, 3000, 9000, and
15000 exemplars for all the combinations of ρ (0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90,
and 0.95) and β (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60,
0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, and 1.0). This required training each network
(Supervised ART-I and Supervised ART-II) 960 times.
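The figure of 960 runs per network follows from the grid itself: 6 training sizes, 8 vigilance values, and 20 learning-rate values. A short sketch that enumerates the grid is given below; the variable names are illustrative only.

```python
from itertools import product

training_sizes = [200, 600, 1000, 3000, 9000, 15000]
rho_values = [round(0.60 + 0.05 * i, 2) for i in range(8)]   # 0.60 ... 0.95
beta_values = [round(0.05 * i, 2) for i in range(1, 21)]     # 0.05 ... 1.0

# One training run per (training size, rho, beta) combination.
runs = list(product(training_sizes, rho_values, beta_values))
```

Here `len(runs)` is 6 x 8 x 20 = 960, which is the number of training runs quoted for each network.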
Table 6-2 summarizes the most significant results obtained from the runs mentioned
above. The maximum and minimum values of the number of committed category nodes, the learning and
classification times, and the classification precision, as well as the corresponding pairs of
parameter values (ρ, β), are shown. It can be observed that the number of category nodes ranges from
31 (at ρ=0.70 and β=1.0) to 1077 (at ρ=0.95 and β=0.05) when the networks are
trained with 200 and 15000 exemplars, respectively. The classification accuracy
ranges from 64.66% (at ρ=0.80 and β=0.95) to 81.87% (at ρ=0.95 and β=0.40)
when the networks are trained with 15000 and 9000 exemplars, respectively.
The behavior of the proposed ANNs over the whole dynamic parameter domain
(ρ, β) has been investigated in detail for the learning set of 9000 exemplars, which gave the maximum accuracy.
The variation of the number of committed category nodes
in the dynamic parameter domain is displayed in figure 6-1. It can be observed that as
ρ increases, more category nodes are generated (fine categories). At medium values of the
dynamic learning parameter (0.3 < β < 0.8), the number of committed category nodes
does not change much at medium and low vigilance values (ρ < 0.80), but it increases
sharply as the vigilance parameter approaches its maximum value (see figure 6-1). When
ρ=0.80 the number of committed category nodes is less than 200; when ρ=0.90 the
number increases to 1000, and it increases further, up to 3000, when ρ=0.99. The
dynamic learning parameter also influences the number of committed category nodes. It
is clear in (figure 6-1) that at low and medium levels of ρ, the number of
committed category nodes increases as β approaches its minimum and maximum
Training size  Quantity                           Minimum    (ρ, β)         Maximum    (ρ, β)
200            Number of category nodes           31         (0.70, 1.0)    64         (0.8, 0.05)
200            Training time (min:sec.ms)         00:00.1    (0.8, 0.7)     00:00.2    (0.70, 0.05)
200            Classification (%)                 67.14      (0.70, 0.45)   76.27      (0.95, 0.95)
200            Classification time (min:sec.ms)   00:43.0    (0.75, 1.0)    01:13.0    (0.75, 0.05)
600            Number of category nodes           40         (0.85, 0.85)   121        (0.95, 0.05)
600            Training time (min:sec.ms)         00:00.4    (0.60, 0.90)   00:01.0    (0.70, 0.05)
600            Classification (%)                 72.75      (0.75, 0.35)   80.99      (0.95, 0.95)
600            Classification time (min:sec.ms)   00:51.0    (0.85, 0.85)   02:03.0    (0.95, 0.05)
1000           Number of category nodes           50         (0.70, 0.70)   160        (0.90, 0.05)
1000           Training time (min:sec.ms)         00:09.0    (0.70, 0.85)   00:02.0    (0.75, 0.05)
1000           Classification (%)                 72.74      (0.70, 0.70)   80.56      (0.95, 0.40)
1000           Classification time (min:sec.ms)   00:59.0    (0.70, 0.90)   02:38.0    (0.60, 0.05)
3000           Number of category nodes           71         (0.70, 0.70)   348        (0.95, 0.05)
3000           Training time (min:sec.ms)         00:03.0    (0.70, 0.75)   00:11.0    (0.80, 0.05)
3000           Classification (%)                 71.36      (0.90, 0.80)   81.82      (0.95, 0.20)
3000           Classification time (min:sec.ms)   01:15.0    (0.70, 0.70)   05:29.0    (0.95, 0.05)
9000           Number of category nodes           120        (0.70, 0.40)   753        (0.95, 0.05)
9000           Training time (min:sec.ms)         00:14.0    (0.70, 0.40)   01:17.0    (0.95, 0.05)
9000           Classification (%)                 67.62      (0.90, 0.90)   81.87      (0.95, 0.40)
9000           Classification time (min:sec.ms)   01:45.0    (0.70, 0.40)   11:55.0    (0.95, 0.05)
15000          Number of category nodes           166        (0.70, 0.40)   1077       (0.95, 0.05)
15000          Training time (min:sec.ms)         00:31.0    (0.70, 0.60)   03:13.0    (0.95, 0.05)
15000          Classification (%)                 64.66      (0.80, 0.95)   81.63      (0.95, 0.20)
15000          Classification time (min:sec.ms)   02:04.0    (0.70, 0.40)   15:42.0    (0.95, 0.05)
Table 6-2: Performance of Supervised ART-II when trained with different sizes of training samples. The maximum and minimum values are shown.
Figure 6-1: Number of category nodes, in hundreds, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. The number of category nodes increases as the vigilance parameter increases, creating fine categories. The dynamic learning parameter influences the number of category nodes at its low and high ranges due to the occurrence of under-training and over-training, respectively.
values. For small β, an exemplar does not influence its group much, which leads to
under-training. In contrast, for large β, an exemplar influences its group very strongly,
which leads to over-training. In both cases (under-training and over-training) the categories
cannot represent their members well, which leads the network to generate more
category nodes. Over-training and under-training increase the training time, due to the
generation of more category nodes (figures 6-2 & 3).
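The role of β can be illustrated with the standard Fuzzy ART learning rule (Carpenter et al. 1991a), in which the new weight vector is β(A ∧ w) + (1 - β)w and ∧ is the component-wise minimum. The sketch below assumes that the networks studied here use this rule; the function names and the sample vectors are hypothetical.

```python
def fuzzy_and(a, w):
    """Component-wise minimum of input a and weight vector w."""
    return [min(x, y) for x, y in zip(a, w)]

def update_weights(a, w, beta):
    """Standard Fuzzy ART update: w_new = beta * (a AND w) + (1 - beta) * w."""
    m = fuzzy_and(a, w)
    return [beta * mi + (1.0 - beta) * wi for mi, wi in zip(m, w)]

w_old = [1.0, 1.0]
a = [0.2, 0.8]
w_small_beta = update_weights(a, w_old, beta=0.05)  # exemplar barely moves the category
w_large_beta = update_weights(a, w_old, beta=1.0)   # category collapses onto a AND w
```

With β = 0.05 the category barely moves towards the exemplar, while with β = 1.0 a single exemplar fully determines the updated weights, which matches the two extremes discussed above.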
It is well known from the theory that the classification accuracy and the number
of category nodes are equal for Fuzzy ARTMAP, Supervised ART-I, and Supervised
ART-II. However, they have different training times. It is clear from the algorithms that
both Supervised ART-I and Supervised ART-II require less training time due to
their simple architectures. Figures 6-2 and 6-3 show the variation of the training time in
the dynamic parameter domain for Supervised ART-I and Supervised ART-II,
respectively. It can be observed that the training time is proportional to the number of
committed category nodes.
To compare the performance of Supervised ART-I with that of
Supervised ART-II, the ratio of the training time of Supervised ART-I to that of Supervised
ART-II is shown in (figure 6-4). Above the heavy line, Supervised ART-II requires less
training time than Supervised ART-I; below the heavy line, Supervised ART-I
performs better. Thanks to the simple architectures of both Supervised ART-I and
Supervised ART-II, the construction of these figures is practically possible. Supervised
ART-II performs better when the number of category nodes exceeds 1000; otherwise
Supervised ART-I should be employed. The training times are on the order of minutes using a
SUN 4 SPARC Station. Such short learning times make Supervised ART-I and
Supervised ART-II very powerful tools for classifying large-scale remotely sensed data.
Figure 6-2: Training time, in minutes, for Supervised ART-I, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. Training time is proportional to number of category nodes.
Figure 6-3: Training time, in minutes, for Supervised ART-II, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. The training time increases as the number of category nodes increases, however, generally speaking, the training time for Supervised ART-II is lower than that for Supervised ART-I.
Figure 6-4: The ratio of training time for Supervised ART-I / Supervised ART-II, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. Above the heavy line the training time of Supervised ART-II is faster than Supervised ART-I. However, below the heavy line, Supervised ART-I is faster.
6.3.2 Classification performance
The effect of the dynamic parameters on the classification time and classification
accuracy of the networks is shown in (figure 6-5) and (figure 6-6),
respectively. The heavy line in (figure 6-6) represents the values of the dynamic
parameters that give optimum classification. As the vigilance parameter increases, the number
of category nodes increases, so fewer training exemplars are available for
each category node. Therefore, either the dynamic learning rate or the number of training
exemplars should be increased. The minimum classification accuracy is 67.62% at
ρ=0.90 and β=0.90, while the maximum classification accuracy is 81.87% at ρ=0.95
and β=0.40.
Training has also been conducted using 9000 pixels with higher vigilance parameters.
The values ρ=0.96, 0.97, 0.98, and 0.99 have been combined with the whole domain of the dynamic
learning parameter. A classification accuracy of 85.82% has been
obtained at ρ=0.98 and β=0.50, with a classification time of around 25 minutes using a SUN
4 SPARC Station, and 9.50 minutes using an ALPHA Station 500. The reference image
and the classified image for this run are shown in (figure 6-7). Each class is assigned
one color. The classification accuracy and number of category nodes for each class are
shown in (table 6-3). The confusion matrix for the classification is shown in (table 6-4).
The diagonal values represent the number of pixels that are correctly classified. The
off-diagonal values represent the number of pixels that are wrongly assigned to each
class. Natural vegetation-1 and natural vegetation-2 have the largest share of
misclassified pixels. They are wrongly assigned 1060 and 1940 pixels, respectively. This represents
42.5% of the total misclassified pixels. However, they have very good classification
accuracy, since their misclassified pixels represent only about 10% of their total pixels.
Forest contributed 762 pixels to natural vegetation-2 and 197 pixels to mountains.
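The quantities discussed above can be read off a confusion matrix as in the following sketch. It assumes that rows hold the reference class and columns hold the class assigned by the network; the 3-class matrix is hypothetical and is not taken from table 6-4.

```python
def overall_accuracy(cm):
    """Percentage of pixels on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total

def per_class_accuracy(cm):
    """Diagonal value divided by the row total, per reference class."""
    return [100.0 * cm[i][i] / sum(cm[i]) for i in range(len(cm))]

def assigned_per_class(cm):
    """Column totals: pixels the network assigned to each class."""
    return [sum(row[j] for row in cm) for j in range(len(cm))]

# Hypothetical 3-class matrix (rows: reference class, columns: assigned class).
cm = [[90, 5, 5],
      [10, 80, 10],
      [0, 20, 80]]

# Totals quoted for the testing phase in table 6-3: 45002 correct of 52440 pixels.
test_accuracy = 100.0 * 45002 / 52440
```

Rounded to two decimals, `test_accuracy` reproduces the 85.82% overall accuracy reported for the testing phase.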
Figure 6-5: Classification time, in minutes, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, for 52 440 pixels of the Landsat TM images.
Figure 6-6: Classification performance, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, for Landsat TM images. Classification performance is low at low vigilance parameter and high dynamic learning parameter. As the vigilance parameter increases, the number of category nodes increases, and therefore the dynamic learning parameter should be increased if the training size is fixed.
Figure 6-7: The upper image is the reference image. The lower image is the classified image using Supervised ART-II, with vigilance parameter ρ=0.98, dynamic learning parameter β=0.50, and training with 9000 exemplars. The classification accuracy is 85.82%. Classes are assigned colors as follows: 1-white, 2-red, 3-dark green, 4-dark yellow, 5-bright yellow, 6-yellowish green, 7-yellow, 8-brown, 9-green, 10-black, 11-dark blue, 12-light green, and 13-blue.
                       TRAINING PHASE                          TESTING PHASE
Class name             Correct  Total   Accuracy  Number of    Correct  Total   Accuracy
                       pixels   pixels  (%)       nodes        pixels   pixels  (%)
1-Meadow               1449     1475    98.24     189          5470     6094    89.76
2-Wheat                507      594     85.35     59           3918     4703    83.31
3-Alfalfa              377      394     95.69     42           1521     1667    91.24
4-Mountains            286      310     92.26     63           5320     6923    76.85
5-Fallow land-1        233      247     94.33     50           2099     2779    75.53
6-Fallow land-2        575      627     91.71     81           2028     2330    87.04
7-Fallow land-3        641      652     98.31     140          2514     2795    89.95
8-Natural veget.-1     2416     2519    95.91     450          9128     10059   90.74
9-Natural veget.-2     1363     1532    88.97     127          7154     8026    89.14
10-Forest              34       35      97.14     15           2474     3457    71.56
11-Irrigated land      561      578     97.06     35           2696     2898    93.03
12-Wetland             13       13      100.00    1            84       88      95.45
13-River               22       24      91.67     3            596      621     95.97
Total                  8477     9000    94.19     1255         45002    52440   85.82
Table 6-3: Training and classification statistics for Landsat TM image at individual class level. The neural network has been trained with 9000 exemplars. The classification accuracy for the training set is 94.19%. The classification accuracy for the remaining 52 440 pixels of the image is 85.82%. The vigilance parameter ρ=0.98 and the dynamic learning rate β=0.50 are used. The learning time is 2.99 minutes and the classification time is 20.97 minutes, using SUN 4 SPARC Station. The dark numbers in the last row represent the total number of pixels that are assigned by the neural network to each class.
[Table 6-4: Confusion matrix for the classification of the Landsat TM image. The table body is not recoverable from the source.]
Forest has the lowest classification accuracy among all classes. Mountains contributed
347, 751, and 328 pixels to natural vegetation-1, natural vegetation-2, and forest,
respectively.
The behaviour of both Supervised ART-I and Supervised ART-II when trained on
remotely sensed data is well understood over the whole domain of the dynamic parameters.
According to the results obtained, Supervised ART-II should be
employed when the number of category nodes is in the thousands. Otherwise Supervised
ART-I performs better. However, Supervised ART-II can be employed in that case too, since
its learning time is very short, less than a minute, when the number of category nodes is
below 1000.
CHAPTER VII
PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS
7.1 Introduction
Only one approach to vigilance dynamics in supervised ART ANNs has been
addressed in the literature. If the match value of the winning category node is greater
than the predetermined vigilance parameter ρ while class matching fails, then the
current match value, increased by a very small value ε, is assigned to the vigilance
parameter (equation 3-1).
The vigilance parameter ρ only increases during the training phase, when class
matching of the winning category node fails for a specific input. The very
small positive value ε is added to the failed match value, and the result is assigned to the vigilance
parameter in order to classify rare events (Carpenter et al. 1991b). However, Carpenter
et al. (1998b) reported that reducing it by ε rather than increasing it leads to a reduction in
the number of category nodes without influencing the classification accuracy of the network.
The vigilance dynamic of this approach is shown in figure 3-5.
7.2. Vigilance dynamics
7.2.1 Flying approach
As mentioned above, the only vigilance dynamic reported in the
literature is one in which the vigilance parameter only increases during the training phase, when class matching of the
winning category node fails for a specific input. This approach is called the
flying approach to differentiate it from the other approaches proposed
in this work. The vigilance parameter in the flying approach is controlled by the
following equation:
\rho_{t+1} = \max\left\{ \rho_t, \left( \sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M \right)_t \right\} \pm \varepsilon    (7-1)
The flying approach keeps a committed category node that has a match value
greater than the initial vigilance parameter and belongs to the class of the current input
out of the competition if the match value of the failed category node is higher than the
match value of this category node (see figure 3-5). This leads to the generation of more
committed category nodes. Therefore, longer training and classification times are
required.
7.2.2 Fixed vigilance approach
In this approach, the vigilance parameter is constant during the training phase
and keeps its initial value:
\rho_{t+1} = \rho_t    (7-2)
This allows all committed category nodes to be created under the same level
of confidence. Moreover, a committed category node that has a match value greater than
the initial vigilance parameter and belongs to the class of the current input can represent
the input, independently of its choice-value rank among the committed category nodes (see
figure 7-1a).
7.2.3 Free vigilance approach
The free vigilance approach assigns to the vigilance parameter the match value
of the previous category node if that node failed to represent the current input.
Figure 7-1: Sketches showing the different vigilance parameter dynamics: (a) fixed, (b) free, (c) float. The x-axis represents the ranking of all committed nodes according to their choice values. The y-axis represents the match value of each category node. The first sketch (a) represents the fixed approach. All category nodes are committed at the same level of vigilance value. The second sketch (b) represents the free approach. In this approach, the vigilance parameter is always equal to the previous match value. Therefore, a category node might be committed with a match value less than the initial vigilance parameter ρ_0. Finally, the third sketch (c) represents the float approach: the vigilance parameter is equal to the previous match value if it is not smaller than the initial vigilance value; otherwise the initial vigilance value is employed.
\rho_{t+1} = \left( \sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M \right)_t    (7-3)
This allows the vigilance parameter to change freely above and below the
initial value, so the network can attune itself to the proper vigilance
parameter during the training phase rather than being forced to it (see figure 7-1b).
7.2.4 Floating approach
The floating approach is like the free vigilance approach, but with the
constraint that the vigilance parameter is not allowed to fall below its initial value. This
ensures that all committed category nodes have the minimum required level of
confidence.
\rho_{t+1} = \max\left\{ \rho_0, \left( \sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M \right)_t \right\}    (7-4)
This leads to the generation of more category nodes than both the fixed and free
approaches, but fewer than the flying approach (see figure 7-1c).
It should be mentioned here that ρ_1 = ρ_0 for all the above vigilance dynamics,
where ρ_0 is the initial vigilance value.
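The four vigilance dynamics can be compared side by side in a short sketch. It assumes the update is applied after a winning node fails class matching, that the match value is the sum of the component-wise minima divided by M for a complement-coded input of length 2M, and that the flying approach adds ε (the text also mentions subtracting it). The function names and the value of ε are hypothetical.

```python
def match_value(a, w):
    """Match of a complement-coded input a (length 2M) with weights w."""
    m = len(a) // 2
    return sum(min(x, y) for x, y in zip(a, w)) / m

def next_vigilance(approach, rho_t, rho_0, match, eps=1e-4):
    """Vigilance for step t+1 after the current winner failed class matching."""
    if approach == "flying":
        return max(rho_t, match) + eps   # only increases during training
    if approach == "fixed":
        return rho_t                     # stays at its initial value
    if approach == "free":
        return match                     # may go above or below rho_0
    if approach == "floating":
        return max(rho_0, match)         # never falls below rho_0
    raise ValueError("unknown approach: " + approach)

# Hypothetical case: initial vigilance 0.90, failed node matched at 0.80.
rho_0 = 0.90
updates = {name: next_vigilance(name, rho_0, rho_0, 0.80)
           for name in ("flying", "fixed", "free", "floating")}
```

For a failed match of 0.80 with ρ_0 = 0.90, only the free approach lets the vigilance drop below its initial value, which is why it can commit nodes at a lower level of confidence than the other three.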
7.3 Results and discussion
The performance of supervised ART-II ANN has been tested, for classification
of the Landsat TM images, using all the above mentioned vigilance dynamics.
[Table 7-1: The performance of Supervised ART-II ANN with different vigilance dynamics. For each pair of initial vigilance parameter ρ_0 and dynamic learning rate β, the table lists the classification performance (%) and the number of category nodes (#cn) for the fly, float, free, and fixed approaches. Alpha station 500 has been used for these runs. The table body is not recoverable from the source.]
Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns represent classified images using the fly, float, fixed, and free vigilance parameter, respectively. The first, second, third, fourth, and fifth rows represent classified images using initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), respectively. Classes are assigned colours as follows: 1) meadow-white, 2) mountain-brown, 3) fallow land1-yellow, 4) fallow land2-dark yellow, 5) fallow land3-bright yellow, 6) irrigated land-red, 7) alfalfa-black, 8) wetland-dark blue, 9) forest-dark green, 10) wheat-light green, 11) natural vegetation1-yellowish green, 12) natural vegetation2-green, and 13) river-blue.
The network has been tested using five different combinations, (0.98, 0.50),
(0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), of the vigilance parameter and
dynamic learning rate, respectively. These values of the vigilance parameter ρ_0 and
dynamic learning rate β are located on the optimum line for classification performance
(figure 6-6). This optimum line represents the best value of β for a specific value of ρ
for obtaining the maximum classification accuracy using the flying approach.
The classification performance ranges from 66.71% using the free approach
to 87.05% using the flying approach. The number of category nodes was 120 and 1252
for the free and flying approaches, respectively. These results were obtained when 0.98 and 0.50
were used for the vigilance parameter and the dynamic learning rate, respectively.
See (table 7-1) for details. Classified images are shown in (figure 7-2).
The neural network performances using the flying, floating, and fixed approaches
move closer to each other as the vigilance parameter approaches unity. It is clear from the
theory that all the above-mentioned approaches lead to the same classification accuracy and
number of category nodes at ρ_0 = 1. The neural network performances using the floating
and free approaches move closer to each other as ρ_0 approaches zero, and they lead to the
same performance at ρ_0 = 0.
While the flying approach shows better accuracy
when the initial vigilance parameter is equal to or greater than 0.95, the floating
approach shows better accuracy for initial vigilance parameters below 0.95. From the
number-of-category-nodes point of view, the network performs better using the floating
approach. While the number of nodes is the same for the two approaches at ρ_0 = 1, with the floating
approach it is reduced to less than 25% (56/227) of that of the flying approach at ρ_0 = 0 (see table 7-1).
Such a reduction leads to a reduction in the training
time and the classification time as well.
CHAPTER VIII
CONCLUSIONS
In this study new simplified architectures of ANNs have been designed. These
architectures have been employed to analyze remotely sensed data. The conclusions that
can be drawn from this study are:
1) Two new versions of Fuzzy ART have been developed. The algorithms show that
these new versions have the same categorization performance as the original algorithm.
However, they require less training and categorization time.
2) A new supervised ART-type architecture, called Supervised ART-I, has been developed.
It has been built from a single ART module rather than from two ART modules
linked by a map field, as is the case for all the supervised ART ANNs that have been
addressed in the literature. This leads to the elimination of the map field and its
parameters. It is theoretically proven that Supervised ART-I has the classification
performance of fuzzy ARTMAP; however, it requires less memory and less training
time due to its simple architecture.
3) Another supervised ART-type architecture, called Supervised ART-II, has been developed.
It has also been built from a single ART module. It has the
classification performance of fuzzy ARTMAP and Supervised ART-I. The category
layer of Supervised ART-II is divided into stacks, each of which represents a
single class. This reduces the memory required for labeling category nodes from N
in the tagging approach of Supervised ART-I to only L in the stacking approach of
Supervised ART-II.
4) An uncommitted category node in Supervised ART-I is free to represent any class,
whereas an uncommitted category node in Supervised ART-II is predetermined to
represent a specific class. When a stack runs out of uncommitted category nodes,
borrowing an uncommitted category node from another stack is not possible. Increasing
the memory size of each stack can solve this limitation of the stacking supervision
approach of Supervised ART-II. This additional memory is compensated for because
stacking employs only L of the N label entries required by the tagging supervision of
Supervised ART-I. The released memory can be used to increase the memory size of
each stack by one fold.
5) While we only employed the newly developed supervision approaches for Fuzzy
ART, they can be applied to all ART-type ANNs.
6) Supervised ART-I is oriented to homogeneous environments, while Supervised ART-II
is oriented to non-homogeneous environments. The homogeneity of the
environment depends on the type of data and on the dynamic parameters.
7) Since both Supervised ART-I and Supervised ART-II have been built from a single
ART module, the cost of building chips for classification tasks will be much
lower than with the map field approach.
8) The behavior of both Supervised ART-I and Supervised ART-II when trained on
remotely sensed data is well understood over the whole domain of the dynamic
parameters.
9) An automatic system for classifying Landsat TM images, with very good
classification accuracy, has been developed.
10) This study shows that the flying approach should be employed for the vigilance dynamic if
the vigilance parameter is very high (>0.95), while the floating approach should be
employed otherwise.
Some aspects derived from this study that need to be investigated in future
work are the following. New learning algorithms need to be developed; these learning algorithms
must eliminate or reduce the under-training and over-training episodes. Further studies
are recommended to investigate the behavior of the designed architectures when dealing
with different digital signal processing problems. Some studies in this direction have
already been conducted. The developed architectures have been employed successfully
for monitoring forest fires (Al-Rawi et al. 2001a, b, c & d) and for cloud detection
(Al-Rawi et al. 2001e & f).
BIBLIOGRAPHY
Al-Rawi, K. R., 1999, "Supervised ART-I: A new neural network architecture for learning and classifying multivalued input patterns", Lecture Notes in Computer Science, 1606, 756-765.
Al-Rawi, K. R., Gonzalo, C., and Arquero, A., 1999, "Supervised ART-II: A new neural network architecture, with quicker learning algorithm, for classifying multivalued input patterns", In proceeding of the European Symposium on Artificial Neural Network ESANN'99, Bruges, Belgium, 289-294.
Al-Rawi, K. R., Gonzalo, C., and Martínez, E., 2000, "Supervised ART-II for classification Landsat Thematic Mapper image", Remote Sensing in the 21st Century: Economic and Environmental Applications, Casanova (ed), Balkema, Rotterdam, 229-235.
Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001a, "Burned area mapping system and fire detection system, based on neural networks and NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).
Al-Rawi, K. R., Casanova, J. L., and Romo, A., 2001b, "IFEMS: New approach for monitoring wildfire evolution with NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).
Al-Rawi, K. R., Casanova, J. L., and Louakfaoui, M., 2001c, "IFEMS for monitoring spatial-temporal behaviour of multiple fire phenomena", International Journal of Remote Sensing (in press).
Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001d, "ART neural network for mapping burned area and determination severity of burn with Landsat TM images", Submitted to IEEE on Geoscience and Remote Sensing.
Al-Rawi, K. R., Casanova, J. L., and Vasileisky, A., 2001e, "A very quick neural network algorithm for cloud detection", Submitted to Geocarto International.
Al-Rawi, K. R., and Casanova, J. L., 2001f, "Neural network as an aid tool for building non-linear threshold algorithm for cloud detection", Submitted to Remote Sensing of Environment.
Bachelder, I. A., Waxman, A. M., and Seibert, M., 1993, "A neural system for mobile robot visual place learning and recognition", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 512-517.
Baloch, A. A., and Waxman, A. M., 1991, "Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN", Neural Networks, 4, 271-302.
Baraldi, A., and Parmiggiani, F., 1995, "A neural network for unsupervised categorization for multivalued input patterns. An application to satellite image clustering", IEEE Transactions on Geoscience and Remote Sensing, 33, 305-316.
Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., 1990, "Neural network approaches versus statistical methods in classification of multisource remote sensing data", IEEE Transactions on Geoscience and Remote Sensing, 28, 540-552.
Bernardon, A. M., and Carrick, J. E., 1995, "A neural system for automatic target learning and recognition applied to bare and camouflaged SAR target", Neural Networks, 8, 1103-1108.
Carpenter, G. A., and Grossberg, S., 1987a, "A massively parallel architecture for a self-organizing neural pattern recognition machine", Computer Vision, Graphics, and Image Processing, 37, 54-115.
Carpenter, G. A., and Grossberg, S., 1987b, "ART2: Stable self-organization of pattern recognition codes for analog input patterns", Applied Optics, 26, 4919-4930.
Carpenter, G. A., and Grossberg, S., 1990, "ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures", Neural Networks, 3, 129-159.
Carpenter, G. A., Grossberg, S., and Rosen, D. B., 1991a, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system", Neural Networks, 4, 759-771.
Carpenter, G. A., Grossberg, S., and Reynolds, J. H., 1991b, "ARTMAP: Supervised real-time learning and classification of nonstationary data by self-organizing neural network", Neural Networks, 4, 565-588.
Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B., 1992, "FUZZY ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3, 698-713.
Carpenter, G. A., and Ross, W. D., 1993, "ART-EMAP: a new neural network architecture for object recognition by evidence accumulation", IEEE Transactions on Neural Networks, 6, 805-818.
Carpenter, G. A., Gjaja, M. N., Gopal, S., and Woodcock, C. E., 1997, "ART neural networks for remote sensing: vegetation classification from landsat TM and terrain data", IEEE Transactions on Geoscience and Remote Sensing, 35, 308-325.
Carpenter, G. A., 1997, "Distributed learning, recognition, and prediction by ART and ARTMAP neural networks", Neural Networks, 10, 1473-1494.
Carpenter, G. A., and Markuzon, N., 1998, "ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases", Neural Networks, 11, 323-336.
Carpenter, G. A., 1998, "Distributed ARTMAP: a neural network for fast distributed supervised learning", Neural Networks, 11, 793-813.
Caudell, T. P., and Healy, M. J., 1994, "Adaptive Resonance Theory networks in the Encephalon autonomous vision system", Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, USA, 1235-1240.
Caudell, T. P., Smith, S. D. G., Escobedo, R., and Anderson, M., 1994, "NIRS: large scale ART-1 neural architectures for engineering design retrieval", Neural Networks, 7, 1339-1350.
Dubrawski, A., and Crowley, J. L., 1994, "Learning locomotion reflexes: A self-supervised neural system for a mobile robot", Robotics and Autonomous Systems, 12, 133-142.
Gan, K. W., and Lua, K. T., 1992, "Chinese character classification using adaptive resonance network", Pattern Recognition, 25, 877-882.
Georgiopoulos, M., Fernlund, H., Bebis, G., and Heileman, G. L., 1996, "Order of search in Fuzzy ART and Fuzzy ARTMAP: Effect of the choice parameter", Neural Networks, 9, 1541-1559.
Georgiopoulos, M., Dagher, I., Heileman, G. L., and Bebis, G., 1999, "Properties of learning of a Fuzzy ART variant", Neural Networks, 12, 837-850.
Gopal, S., Sklarew, D. M., and Lambin, E., 1994, "Fuzzy-neural networks in multi-temporal classification of landcover change in the Sahel", Proceeding of the DOSES Workshop on New Tools for Spatial Analysis, Lisbon, Portugal, 55-68.
Grossberg, S., 1976, "Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions", Biological Cybernetics, 23, 187-202.
Grossberg, S., 1980, "How does a brain build a cognitive code?", Psychological Review, 1, 1-51.
Ham, F. M., and Han, S. W., 1996, "Quantitative study of the QRS complex using fuzzy ARTMAP and MIT/BIH arrhythmia database", in Proceedings of the World Congress on Neural Networks, 1, 207-211.
Heermann, P. D., and Khazenie, N., 1992, "Classification of multispectral remote sensing data using a Back-Propagation neural network", IEEE Transaction on Geoscience and Remote Sensing, 30, 81-88.
Hepner, G. F., Logan, T., Ritter, N., and Bryant, N., 1990, "Artificial neural network classification using a minimal training set: comparison to conventional supervised classification", Photogrammetric Engineering & Remote Sensing, 56, 469-473.
Hopfield, J. J., 1982, "Neural networks and physical systems with emergent collective computational abilities", Proceedings of the National Academy of Sciences, 79, 2554-2558.
Keyvan, S., Drug, A., and Rabelo, L. C., 1993, "Application of artificial neural networks for development of diagnostic monitoring system in nuclear plants", Transactions of the American Nuclear Society, 1, 515-522.
Keyvan, S., 1999, "Application of ART2-A as a Pseudo-supervised paradigm to nuclear reactor diagnostics", Lecture Notes in Computer Science, 1606, 747-755.
Kohonen, T., 1982, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, 43, 59-69.
Kumar, S. S., and Guez, A., 1989, "A neural network approach to target recognition", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associate, II, 573-578.
Lang, K. J., and Witbrock, M. J., 1989, "Learning to tell two spirals apart", Proceedings 1988 Connectionist Models Summer School, 52-59.
Le Cun, Y., 1986, "Learning processes in an asymmetric threshold network", in Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, Eds., Berlin, Springer-Verlag.
Mannan, B., Roy, J., and Ray, K., 1998, "Fuzzy ARTMAP supervised classification of multi-spectral remotely-sensed data", International Journal of Remote Sensing, 19, 767-774.
Mehta, B. V., Vij, L., and Rabelo, L. C., 1993, "Prediction of secondary structure of protein using fuzzy ARTMAP", in Proceedings of the World Congress on Neural Networks, 1, 228-232.
Mekkaoui, A., and Jespers, P., 1990, "An optimal self-organizing pattern classifier", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associate, 1, 477-450.
Moore, B., 1989, "ART1 and patterns clustering", Proceedings 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., San Mateo, CA: Morgan Kaufmann, 174-185.
Mulder, N. J., and Spreeuwers, L., 1991, "Neural networks applied to the classification of remotely sensed data", International Geoscience and Remote Sensing Symposium (IGARSS'91), Espoo, Finland, 2211-2213.
Murshed, N. A., Bortolozzi, F., and Sabourin, R., 1995, "Off-line signature verification, without a priori knowledge of class col. A new approach", Proceedings of the Third International Conference on Document Analysis and Recognition, Piscataway, NJ, USA.
Paola, J. D., and Schowengerdt, R. A., 1994, "Comparisons of neural networks to standard techniques for image classification and correlation", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1404-1406.
Paola, J. D., and Schowengerdt, R. A., 1995, "A review and analysis of backpropagation neural networks for classification of remotely-sensed multi-spectral imagery", International Journal of Remote Sensing, 16, 3033-3058.
Paola, J. D., and Schowengerdt, R. A., 1997, "The effect of neural-network structure on a multispectral land-use / land-cover classification", Photogrammetric Engineering & Remote Sensing, 63, 535-544.
Parker, D., 1986, "Computational research in economics and management science", MIT, Cambridge, MA, USA, technical report TR-87, 1986.
Racz, J., and Dubrawski, A., 1995, "Artificial neural network for mobile robot topological localization", Robotics and Autonomous Systems, 16, 73-80.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J., 1986, "Learning internal representations by back-propagation", Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. E. Rumelhart and J. L. McClelland, Eds.), MIT Press, Cambridge, Massachusetts, 318-362.
Salu, Y., and Tilton, J., 1993, "Classification of multispectral image data by the binary diamond neural network and by nonparametric, pixel-by-pixel methods", IEEE Transaction on Geoscience and Remote Sensing, 31, 606-617.
Seibert, M., and Waxman, A. M., 1992, "Adaptive 3-D object recognition from multiple views", IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 107-124.
Seibert, M., and Waxman, A. M., 1993, "An approach to face recognition using saliency maps and caricatures", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 661-664.
Simpson, P. K., 1990, "Neural networks for sonar signal processing", Handbook of Neural Computing Applications (A. J. Maren, C. T. Harston, and R. M. Pap, Eds.), San Diego, Academic Press, 319-335.
Solaiman, B., and Mouchot, M. C., 1994, "A comparative study of conventional and neural network classification of multispectral data", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1413-1415.
Soliz, P., and Donohoe, G. W., 1996, "Adaptive resonance theory neural network for fundus image segmentation", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 1180-1183.
Srinivasa, N., and Sharma, R., 1996, "A self-organizing invertible map for active vision applications", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 121-124.
Tzeng, Y. C., Chen, K. S., Kao, W. L., and Fung, A. K., 1994, "A dynamic learning neural network for remote sensing applications", IEEE Transaction on Geoscience and Remote Sensing, 32, 1096-1102.
Yool, S. R., 1998, "Land cover classification in rugged areas using simulated moderate-resolution remote sensor data and an artificial neural network", International Journal of Remote Sensing, 19, 85-96.
Yoshida, T., and Omatu, S., 1994, "Neural network approach to land cover mapping", IEEE Transaction on Geoscience and Remote Sensing, 32, 1103-1109.
Warner, T. A., and Shank, M., 1997, "An evaluation of the potential for fuzzy classification of multispectral data using artificial neural networks", Photogrammetric Engineering & Remote Sensing, 63, 1285-1294.
Waxman, A. M., Seibert, M. R., Gove, A., Fay, D. A., Bernardon, A. M., Lazott, C., Steele, W. R., and Cunningham, R. K., 1995, "Neural processing of targets in visible, multispectral IR and SAR imagery", Neural Networks, 8, 1029-1051.
Werbos, P. J., 1974, "Beyond regression: New tools for prediction and analysis in the behavioural sciences", Ph.D. thesis, Harvard University, Cambridge, MA, USA.
Williamson, J. R., 1996, "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps", Neural Networks, 9, 881-897.
Wilson, C. L., Wilkinson, R. A., and Ganis, M. D., 1990, "Self-organizing neural network character recognition on a massively parallel computer", International Joint Conference on Neural Networks, San Diego, Piscataway, NJ, IEEE Service Center, II, 325-329.
APPENDIX
SUMMARY
A.1. INTRODUCTION
A.1.1 Historical evolution of Artificial Neural Networks (ANN)
Although the origin of ANNs can be dated to 1943, when McCulloch and Pitts built the
first ANN structure, the foundations of the field were laid in the first half of the
1970s. It was then that Werbos (1974) set out the principles of the learning algorithm
known as Back Propagation (BP) and Grossberg (1976) established the basis of Adaptive
Resonance Theory (ART). Nevertheless, it was in the 1980s that the field made major
theoretical progress. The BP algorithm was developed simultaneously and independently
by several authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). New neural
network structures and new learning algorithms were also proposed. Thus, Kohonen
proposed the Self-Organizing Map (KSOM) in 1982.
This work pays special attention to the evolution of ART-type ANNs (Carpenter and
Grossberg 1987a&b), given their proven stability, speed, and accuracy (Carpenter et al.
1991a&b, 1992, 1997, and Gan & Lua 1992). These properties have led to their
application in many different areas. For instance, the Boeing company has used this
type of ANN to retrieve information about existing systems in order to ease the design
of new ones (Caudell et al. 1994). These networks have also been used for moving-target
recognition (Seibert and Waxman 1992, Bernardon and Carrick 1995, Kumar and Guez 1989,
Koch et al. 1995, and Waxman et al. 1995); motor control in robotics (Baloch and Waxman
1991, Bachelder et al. 1993, Dubrawski and Crowley 1994, Srinivasa and Sharma 1996);
robot navigation (Racz and Dubrawski 1995); machine vision (Caudell and Healy 1994);
object recognition (Seibert and Waxman 1992); face recognition (Seibert and Waxman
1993); pattern clustering (Moore 1989, Mekkaoui and Jespers 1990); character
recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical image
processing (Soliz and Donohoe 1996); waveform recognition in electrocardiograms (Ham
and Han 1996); signature verification (Murshed et al. 1995); fault identification in
nuclear plants (Keyvan 1999); and remote sensing (Gopal et al. 1994, Baraldi and
Parmiggiani 1995).
A.1.2 Classification of remotely sensed data with ANNs
Advances over recent decades in both space research and computing technology have made
it possible to use remotely sensed data to determine and locate automatically the
thematic classes present on the Earth's surface. This is currently a very active line
of research (Benediktsson et al. 1990). Compared with some conventional classifiers,
such as the maximum likelihood classifier (MLC), ANNs offer the following advantages
for these classification tasks: 1) ANNs do not need to know a priori the probability
distribution of each class, since they are non-parametric systems. This also allows
auxiliary data of a non-spectral nature (slope, topography, texture, etc.) to be
introduced, which appears to improve classification accuracy (Benediktsson et al. 1990,
Carpenter et al. 1997). Neural networks have also been shown to be more robust when the
distribution is not Gaussian (Paola and Schowengerdt 1997, Hepner et al. 1990). 2)
Unlike conventional classifiers, ANNs can handle fuzzy classifications (Paola and
Schowengerdt 1997, Warner and Shank 1997, Yool 1998). In these cases, the values
produced by the output neurons can quantify the degree of membership of the input data
in a given class. This is especially relevant when working with sensors of low spatial
resolution. 3) The inherent parallelism of ANNs makes them relatively easy to implement
on parallel computers (Salu and Tilton 1993, Heermann and Khazenie 1992), which
considerably reduces classification time relative to classical classifiers. 4) The
flexibility of ANNs allows classification results to be improved in certain
circumstances (Carpenter et al. 1997). 5) Finally, these systems can form arbitrary
decision boundaries (Paola and Schowengerdt 1995, Tzeng et al. 1994).
The neural network most commonly used in the literature to classify remotely sensed
data is the multi-layer perceptron (MLP), with the well-known Backpropagation learning
algorithm. This algorithm is based on minimizing the error between the value produced
at the network output and the true value. Some authors have claimed that conventional
classifiers perform better than the MLP (Mulder and Spreeuwers 1991, Solaiman and
Mouchot 1994). Others, however, have concluded that the MLP classifies remotely sensed
data more accurately than the MLC (Hepner et al. 1990, Heermann and Khazenie 1992,
Paola and Schowengerdt 1994, Yoshida and Omatu 1994). Nevertheless, classifying remote
sensing data with the MLP has several drawbacks. The network architecture is not fixed,
so the number of hidden layers and the number of nodes in each hidden layer must be
determined by trial and error. This process can be very costly in computing time,
because training the network is slow. Moreover, during learning the network can become
trapped in local minima, which would prevent it from converging. This problem can be
reduced by lowering the learning rate, but doing so increases the training time.
Heermann and Khazenie (1992) proposed using parallel computers to reduce training time,
at the price of higher hardware cost.
Some studies (Carpenter et al. 1992) have shown that Fuzzy ARTMAP gives higher
classification accuracy than the MLP for images from the Thematic Mapper (TM) sensor
carried by the Landsat satellite, and does so in less time. These authors also
concluded that, in this case, Fuzzy ARTMAP and the MLC gave the same classification
accuracy. However, Mannan et al. (1998) compared the performance of Fuzzy ARTMAP, the
MLP, and the MLC in classifying a 512x512 image acquired by the LISS-II sensor carried
by the Indian satellite IRS-1B, and concluded that the classification accuracy of Fuzzy
ARTMAP was far superior to that of the other two classifiers. Its learning time was
slightly shorter than that of the MLC and considerably shorter than that of the MLP. It
should also be noted that, unlike the MLP, Fuzzy ARTMAP has a well-defined
architecture, always converges, and can by itself generate new nodes to represent
subclasses. The main drawback of Fuzzy ARTMAP is the complexity of its architecture.
A.2. OBJECTIVES OF THE THESIS
The objective of this thesis follows from the points discussed above. It can be stated
as the search for ART-type neural network architectures that offer the same performance
as the existing ones but are structurally simpler, which in turn reduces the computing
time of both learning and operation.
This overall objective can be broken down into the following partial objectives:
• Design of new ART-type ANN architectures that give the same classification accuracy
as the classical ART networks while reducing the complexity of their architectures.
• Proposal of learning algorithms for these architectures.
• Coding of the learning algorithms for the proposed architectures.
• Exhaustive comparative study of the performance of the proposed networks and
algorithms in classifying images remotely sensed by the Thematic Mapper sensor.
A.3. ART-TYPE ARTIFICIAL NEURAL NETWORKS
The principles of Adaptive Resonance Theory (ART) were put forward by Carpenter and
Grossberg (Center for Adaptive Systems, Department of Cognitive and Neural Systems,
Boston University) as a theory of information processing in the human cognitive system
(Grossberg 1976, 1980). Several unsupervised structures were initially developed from
this theory: ART1 (Carpenter and Grossberg 1987a), ART2 (Carpenter and Grossberg
1987b), ART3 (Carpenter and Grossberg 1990), SART (Baraldi and Parmiggiani 1995), and
Fuzzy ART (Carpenter et al. 1991a). All of these networks can group their inputs into
classes using only the information that characterizes those inputs (unsupervised
learning). The fundamental difference between ART1 and ART2 is that the former accepts
only binary data, whereas the latter also accepts analog data. In both, information
flows forward and backward. It flows forward through the weights that connect each node
of the input layer to every node of the layer that clusters the input data; each of
these nodes is called a category node. It flows backward through a second set of
weights that connects each category node to every node of the input layer. Like ART2,
Fuzzy ART can classify both binary and analog data. In Fuzzy ART, however, information
flows only forward, from the input layer to the classifying layer. Another fundamental
difference between Fuzzy ART and ART1 and ART2 is that the set-theoretic intersection
operator ($\cap$) is replaced by the operator ($\wedge$), the minimum operator of fuzzy
logic.
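The replacement of the intersection by the fuzzy minimum can be illustrated with a minimal Python sketch. The function name `fuzzy_and` and the sample vectors are illustrative assumptions, not taken from the thesis; the sketch only shows that the component-wise minimum coincides with set intersection on binary patterns and extends naturally to analog ones.

```python
def fuzzy_and(a, b):
    """Fuzzy AND: component-wise minimum of two equal-length vectors."""
    return [min(x, y) for x, y in zip(a, b)]

# Binary patterns: the minimum behaves exactly like set intersection.
binary_a = [1, 0, 1, 1]
binary_b = [1, 1, 0, 1]
print(fuzzy_and(binary_a, binary_b))   # [1, 0, 0, 1]

# Analog patterns: the same operator still applies.
analog_a = [0.2, 0.9, 0.5]
analog_b = [0.6, 0.4, 0.5]
print(fuzzy_and(analog_a, analog_b))   # [0.2, 0.4, 0.5]
```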
The first ART-type neural network with supervised learning was ARTMAP, proposed by
Carpenter et al. (1991b). Here, in addition to the features to be classified, the
network must be given, during training, the class code of each input. In 1992 the same
authors presented another ART-type network with supervised learning, Fuzzy ARTMAP
(Carpenter et al. 1992). Many other supervised ART-type architectures have since been
investigated, including ART-EMAP (Carpenter and Ross 1993), Gaussian ARTMAP (Williamson
1996), ARTMAP-IC (Carpenter and Markuzon 1998), and Distributed ARTMAP (Carpenter
1998). In all of these architectures, supervision is carried out through a "map field"
that requires two ART-type modules (ARTa and ARTb). The main difference between ARTMAP
and Fuzzy ARTMAP is that the former is built from two ART1 modules, whereas the latter
uses two Fuzzy ART modules. ARTMAP can learn and classify multivalued binary input
patterns, while Fuzzy ARTMAP also accepts analog patterns.
Of all the supervised networks mentioned above, Fuzzy ARTMAP has been the most widely
used. It has been applied to a variety of problems, such as automatic analysis of
electrocardiograms (Ham and Han 1996), management and diagnosis of nuclear power plants
(Keyvan et al. 1993), and prediction of the secondary structure of proteins (Mehta et
al. 1993).
A.3.1 Fuzzy ART
Since all the architectures and algorithms proposed in this work are inspired by Fuzzy
ART and Fuzzy ARTMAP, both are briefly described here. Both retain the basic features
common to all ART-type systems. Especially noteworthy among these is the matching,
according to similarity criteria, between input patterns and the prototype vectors
previously learned by the network. This matching process can bring the network into a
resonant state, which can lead either to the learning of new prototypes (categories) or
to a search among similar, previously learned prototypes. If the similarity between the
input pattern and the stored pattern exceeds the preset value, resonance occurs and the
new information is incorporated into the selected category node by training its
weights. The similarity criterion is set through the so-called vigilance parameter
$\rho$. This parameter determines the threshold that a committed category node must
exceed in order to represent a given input pattern; otherwise a search is triggered for
another category node that represents the pattern better. If none of the committed
category nodes exceeds this threshold, a new category node must be committed. This
process can be repeated as long as the memory capacity of the network is not exceeded.
The vigilance parameter $\rho$ is a dimensionless number defined on the interval
(0, 1]. A value of 1 represents a perfect match: it yields very sharply separated
classes but a large number of category nodes, whereas low values allow the network to
work with few category nodes but yield very general classes. This parameter is one of
the keys to all ART-type ANNs. Its value depends on the type and volume of the data,
the desired classification accuracy, the required speed, and the available memory. It
is held constant during the operation of all unsupervised networks.
Figure 2-1 of the thesis shows the dynamics of Fuzzy ART. In this figure, $F_1$ is the
input layer and $F_2$ the so-called classifying layer. The weights $w_{ij}$ connect
each node of the input layer to every node of the classifying layer. The weights of the
winning node, $w_{iJ}$, are trained only if this node passes the similarity test, in
other words if it exceeds the vigilance parameter; otherwise the node leaves the
competition (reset). In figure 2-1, $|X|$ represents the degree of similarity between
the input and the weights of the winning category node $J$. This degree of similarity
is given by the relation $|X| = \sum_{i} (A_i^{(t)} \wedge w_{iJ})$. Selecting the
winning node involves computing the activation level of each category node,
$T_j^{(t)}$ (eq. 2-1), and choosing the node with the highest level. The value obtained
for node $j$ is an estimate of the degree of membership of the input in the class that
the node represents.
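The winner selection and vigilance test described above can be sketched as follows. This is a minimal sketch under stated assumptions: eq. 2-1 is not reproduced in this summary, so the activation uses the standard Fuzzy ART choice function $T_j = |A \wedge w_j| / (\alpha + |w_j|)$ from Carpenter et al. (1991a), which may differ in detail from the thesis; the match is normalised by $|A|$; and the names `complement_code`, `choice`, `match`, `find_winner`, and the parameter `alpha` are hypothetical.

```python
def complement_code(a):
    """Append the complements 1 - a_i, giving a 2M-component input."""
    return list(a) + [1.0 - x for x in a]

def fuzzy_and(a, b):
    return [min(x, y) for x, y in zip(a, b)]

def choice(A, w, alpha=0.001):
    # Assumed standard choice function: |A ^ w| / (alpha + |w|).
    return sum(fuzzy_and(A, w)) / (alpha + sum(w))

def match(A, w):
    # Degree of similarity, normalised by |A| so it can be compared with rho.
    return sum(fuzzy_and(A, w)) / sum(A)

def find_winner(A, weights, rho, alpha=0.001):
    """Visit nodes by decreasing activation; return the first one that
    passes the vigilance test, or None if every node is reset."""
    order = sorted(range(len(weights)),
                   key=lambda j: choice(A, weights[j], alpha),
                   reverse=True)
    for j in order:
        if match(A, weights[j]) >= rho:
            return j          # resonance
    return None               # all committed nodes are out of competition
```

Visiting nodes in decreasing order of activation is equivalent to repeatedly taking the maximum and resetting the nodes that fail the test.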
Figure 2-2 shows the architecture of Fuzzy ART, with the $2M$ nodes of the input layer,
where $M$ is the number of values that define each input pattern. The last $M$ input
nodes hold the complements of those values. Figure 2-2 also shows the category nodes
and all the connections between the nodes of $F_1$ and $F_2$. Category nodes with
indices from 1 to $C$ are called committed category nodes, while those with indices
from $C+1$ to $N$ are called uncommitted category nodes. When every committed category
node fails to represent an input, and is therefore out of the competition, one of the
uncommitted category nodes must be committed. Once a node able to represent the input
pattern has been found and has passed the vigilance test, its weights must be updated
so that node $J$ incorporates the features of the new pattern (eq. 2-7). The weight
adaptation equation is:
$$w_{iJ}^{(new)} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{(old)}\right) + (1 - \beta)\, w_{iJ}^{(old)}; \quad i = 1, \ldots, 2M \qquad (2\text{-}7)$$
where $\beta \in (0, 1]$ is the parameter called the learning rate.
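As a worked illustration of the weight adaptation rule (eq. 2-7), the following minimal Python sketch applies it to one hypothetical winning node. The function name `update_weights` and the numerical values are assumptions chosen only for illustration.

```python
def update_weights(A, w_old, beta=1.0):
    """Eq. 2-7: w_new_i = beta * min(A_i, w_old_i) + (1 - beta) * w_old_i."""
    return [beta * min(a, w) + (1.0 - beta) * w for a, w in zip(A, w_old)]

A = [0.2, 0.8, 0.8, 0.2]   # complement-coded input with M = 2 (illustrative)
w = [0.5, 0.9, 0.6, 0.4]   # current weights of the winning node J (illustrative)

fast = update_weights(A, w, beta=1.0)   # fast learning: w_new = A ^ w_old
slow = update_weights(A, w, beta=0.5)   # halfway between w_old and A ^ w_old
print(fast)
print(slow)
```

With either setting no weight can increase, because each new value is a convex combination of the old weight and a value that does not exceed it.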
A.3.2 Fuzzy ARTMAP
As already mentioned, Fuzzy ARTMAP is a generalization of ARTMAP (Carpenter et al.
1991b) (see figures 3-1 and 3-2). In this case, the map field is an $N \times L$ matrix
of binary weights ($w_{jk}$; $j = 1, \ldots, N$; $k = 1, \ldots, L$) initialized to 1
(figure 3-4), where $L$ is the number of classes considered.
Unlike in unsupervised ART-type networks, in supervised ones the vigilance parameter
$\rho \in [0, 1]$ can increase during learning. For example, if the activation level of
the winning node exceeds the preset value of the similarity parameter but the class
similarity test is not passed, $\rho$ takes the value of the similarity level increased
by a small quantity $\varepsilon$, as the following equation shows:
$$\rho = \frac{1}{M} \sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right) + \varepsilon \qquad (3\text{-}1)$$
This definition of the vigilance parameter makes it possible to classify rare events
(Carpenter et al. 1992). Figure 3-5 shows the dynamics of the vigilance parameter in
supervised settings.
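The vigilance increase described in the text (the similarity level of the winning node plus a small quantity $\varepsilon$) can be sketched in Python as follows. This is a sketch under assumptions: the similarity is taken to be normalised by $M$, as is usual with complement coding, and the function name `raise_vigilance` and the value of `eps` are hypothetical.

```python
def raise_vigilance(A, w_J, M, eps=0.001):
    """Set rho just above the similarity of winning node J, so that J
    fails the vigilance test and the search moves on to another node."""
    similarity = sum(min(a, w) for a, w in zip(A, w_J)) / M
    return similarity + eps

A = [0.2, 0.8, 0.8, 0.2]     # complement-coded input, M = 2 (illustrative)
w_J = [0.5, 0.5, 0.5, 0.5]   # weights of a winner that predicted the wrong class
rho = raise_vigilance(A, w_J, M=2)
similarity = sum(min(a, w) for a, w in zip(A, w_J)) / 2
print(similarity < rho)      # True: node J can no longer resonate with this input
```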
To train the Fuzzy ARTMAP network, the pairs formed by the input vectors $A^{(t)}$ and
$b^{(t)}$ must be presented to the ARTa and ARTb modules. The first set of vectors
represents the training patterns, while the second represents the binary code assigned
to the class to which the corresponding training pattern belongs.
When the activation level of the winning node exceeds the vigilance parameter, the
class similarity must be evaluated. It is considered acceptable if it exceeds the
preset value of the map field vigilance parameter, $\rho_{ab}$, and the weights are
then updated according to the equations
$$w_{iJ}^{(new)} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{(old)}\right) + (1 - \beta)\, w_{iJ}^{(old)}; \quad i = 1, \ldots, 2M$$
$$w_{Jk}^{(new)} = \beta \left(b_k^{(t)} \wedge w_{Jk}^{(old)}\right) + (1 - \beta)\, w_{Jk}^{(old)}; \quad k = 1, \ldots, L$$
Otherwise, it is the vigilance parameter that must change its value according to
equation 3-1. The map field vigilance parameter, $\rho_{ab}$, is held constant
throughout training. The network repeats the search, among all committed nodes, for a
winning node that passes every test; if none is found, it commits new nodes until it
finds one that represents the training pattern under consideration.
Once training has finished, the ARTa module reduces to the input layer (figure 3-3) and
the values of the weights $w_{ij}$ and $w_{jk}$ are held constant. During
classification, the activation level of every committed category node is computed for
each input pattern presented to ARTa. The node with the highest activation level is the
winner. The score of this node is then computed for each of the input nodes of ARTb
according to equation 3-6. The index of the node $b_K$ with the highest score gives the
code of the class to which the input pattern belongs.
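The classification phase can be sketched in Python as below. Equation 3-6 is not reproduced in this summary, so this sketch assumes that the score of the winning node $J$ for class $k$ is simply the map field weight $w_{Jk}$; the activation again uses the standard Fuzzy ART choice function as an assumption, and the names `classify`, `w_a`, `w_ab`, and `alpha` are hypothetical.

```python
def classify(A, w_a, w_ab, alpha=0.001):
    """Return (J, K): the winning committed node and the predicted class index.

    w_a  : weight vectors of the committed category nodes (fixed after training)
    w_ab : map field rows, one per committed node, one column per class
    """
    # Activation of every committed node (assumed standard choice function).
    T = [sum(min(a, w) for a, w in zip(A, wj)) / (alpha + sum(wj)) for wj in w_a]
    J = max(range(len(T)), key=T.__getitem__)
    # Assumed score: the map field weight from node J to each class node.
    scores = w_ab[J]
    K = max(range(len(scores)), key=scores.__getitem__)
    return J, K

w_a = [[0.2, 0.8, 0.8, 0.2], [0.9, 0.1, 0.1, 0.9]]
w_ab = [[0, 1], [1, 0]]                 # node 0 maps to class 1, node 1 to class 0
print(classify([0.85, 0.15, 0.15, 0.85], w_a, w_ab))   # (1, 0)
```

No vigilance test and no weight update appear here, which reflects the statement that the weights are held constant once training has finished.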
A.4. PROPOSAL OF TWO IMPROVED VERSIONS OF FUZZY ART
To reduce the computing time of the learning and classification algorithms of the
unsupervised Fuzzy ART architecture, two new versions of this architecture have been
developed in this work, named "Flagged Fuzzy ART" and "Compact Fuzzy ART".
A.4.1 "Flagged" version of Fuzzy ART
The first version ("Flagged") developed to improve the performance of Fuzzy ART is
inspired by the ideas of Georgiopoulos et al. (1999). These authors proposed searching
for the node of maximum value among all the committed category nodes and one
uncommitted category node. Their approach requires high initial weight values and
consequently fast learning. In the proposed flagged approach, the search for the node
of maximum value takes into account only one uncommitted node, suitably flagged with a
fixed value $\phi_{C+1}$ that is less than zero but greater than the activation levels
of the committed nodes that have dropped out of the competition. This guarantees that,
when all the committed category nodes are out of the competition, the uncommitted node
with index $C+1$ and value $\phi_{C+1}$ is the winner. This node also always passes the
similarity test, since it has been shown that the degree of similarity of any category
node committed for the first time is 1 (eq. 2-19). Once the network reaches resonance,
it must be determined whether a new node has been committed. If so, the number of
committed category nodes has increased by one, and the weights of the node that becomes
the flagged node must be initialized to 1. The architecture of this new network is
shown in figure 2-3. Clearly, the number of comparisons needed to find the maximum
activation value is smaller than the number of comparisons involved in the operation of
Fuzzy ART.
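The flagged search can be sketched in Python as follows. This is a minimal sketch, not the thesis code: the constants `PHI` and `RESET` are assumed values (the text only requires that the flag value be negative and larger than the activation of reset nodes), the activation uses the standard Fuzzy ART choice function as an assumption, and all names are hypothetical.

```python
PHI = -1.0     # assumed fixed activation of the flagged uncommitted node
RESET = -2.0   # assumed activation given to reset nodes (must be below PHI)

def flagged_search(A, weights, rho, alpha=0.001):
    """weights[:-1] hold the C committed nodes; weights[-1] is the flagged
    uncommitted node with all weights equal to 1. Returns the winner index."""
    M = len(A) // 2
    T = [sum(min(a, w) for a, w in zip(A, wj)) / (alpha + sum(wj))
         for wj in weights[:-1]]
    T.append(PHI)                      # only one uncommitted node competes
    flagged = len(T) - 1
    while True:
        J = max(range(len(T)), key=T.__getitem__)
        if J == flagged:
            return J                   # always passes: its similarity is 1
        similarity = sum(min(a, w) for a, w in zip(A, weights[J])) / M
        if similarity >= rho:
            return J                   # resonance with a committed node
        T[J] = RESET                   # out of competition

A = [0.2, 0.8, 0.8, 0.2]
print(flagged_search(A, [[0.9, 0.1, 0.1, 0.9], [1, 1, 1, 1]], rho=0.9))  # 1
```

If the returned index is the flagged one, the caller would commit that node and append a new all-ones node as the next flagged node, as the text describes.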
A.4.2 "Compact" version of Fuzzy ART
In this version, the search for the node with maximum activation value takes into
account only the committed category nodes. The uncommitted category nodes are committed
in sequential order, with no need to be flagged beforehand. Thus, if for a given input
all the committed nodes are taken out of competition, the next uncommitted node becomes
committed and represents the new input, with its weights defined by the following
equation:
$$w_{i,C+1}^{\text{new}} = \beta A_i^{(t)} + (1 - \beta); \quad i = 1, \ldots, 2M \qquad (2\text{-}22)$$
As this equation shows, the weights of the uncommitted nodes need not be initialized
before those nodes are committed. This is already an optimization of the original
algorithm, since it substantially reduces the number of arithmetic operations to be
performed. The new version requires C-1 comparisons, against the N-1 required by Fuzzy
ART, which is a substantial reduction in computation, given that $N \gg C$. Figure 2-4
shows the complete architecture of the "Compact" version of Fuzzy ART. Table 2-1
presents a comparative study of the original, "Flagged" and "Compact" versions of Fuzzy
ART. The table shows that both proposed versions are faster than the original and that,
of the two, the compact version is preferable: although the difference is very small, it
requires less memory and fewer assignment operations.
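The commit step of the compact version can be illustrated with a short sketch. The helper names are hypothetical; the example only checks that writing the weights directly with equation (2-22) gives the same result as applying the general learning rule to a node whose weights were all initialized to 1, which is why the initialization can be skipped.

```python
# Illustrative sketch; function names are assumptions, not the thesis pseudo-code.

def commit_new_node(weights, pattern, beta):
    """Append a node whose weights follow eq. (2-22): beta * A_i + (1 - beta)."""
    weights.append([beta * a + (1.0 - beta) for a in pattern])
    return len(weights) - 1          # index of the newly committed node


def general_update(old, pattern, beta):
    """General rule: beta * min(A_i, w_i) + (1 - beta) * w_i."""
    return [beta * min(a, w) + (1.0 - beta) * w for a, w in zip(pattern, old)]


weights = []                         # no storage is reserved for uncommitted nodes
A = [0.2, 0.7, 0.8, 0.3]             # an input with 2M = 4 components in [0, 1]
j = commit_new_node(weights, A, beta=0.5)

# Same result as updating a node pre-initialized to all ones.
reference = general_update([1.0] * len(A), A, beta=0.5)
print(weights[j], reference)
```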
A.5. PROPOSAL OF TWO NEW SUPERVISED ART-TYPE ARCHITECTURES
Like Fuzzy ARTMAP, the two proposed supervised ART-type architectures can learn and
classify binary and multi-valued analog patterns. Both ANNs provide the same
classification accuracy as the original, yet the new structures are much simpler. This
yields a substantial reduction in computational complexity and, consequently, in the
operating times of the network, in both the learning and the classification phases.
Both Supervised ART-I and Supervised ART-II are built from a single ART module. This
approach eliminates the second ART module and the module known as the map field, and
with them the weights associated with these two modules and the vigilance parameter of
the map field. These simplifications in turn allow an analog class code (a positive
integer) to be used instead of a binary code of L digits, and the memory of NxL map
field weights to be replaced by a memory of N positions, which stores the label of each
newly committed node, that is, the code of the class to which it belongs.
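The saving in label storage can be pictured with a toy example. It assumes, purely for illustration, that each map field row holds a binary code with a single 1 marking the class; the variable and function names are hypothetical.

```python
# Toy comparison of the two storage schemes (one-hot rows are an assumption).

def class_from_map_field(row):
    """Class code (1..L) encoded by the position of the single 1 in the row."""
    return row.index(1) + 1


L = 4
map_field = [[0, 0, 1, 0],           # N x L entries, here N = 2 category nodes
             [1, 0, 0, 0]]
labels = [3, 1]                      # N entries: one positive integer per node

decoded = [class_from_map_field(row) for row in map_field]
print(decoded == labels, len(map_field) * L, len(labels))
```

Both structures carry the same class information, but the label memory grows with N rather than with NxL.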
Figures 4-2 and 5-2 show the architectures of Supervised ART-I and Supervised ART-II,
respectively.
A.5.1 Learning and classification algorithms of the Supervised ART-I architecture
Supervised ART-I is trained by presenting to the network pairs formed by an input
pattern vector $A^{(t)}$ and the class code assigned to that pattern, $b^{(t)}$. The
activation level of each committed node is computed with equation (2-1). Among all the
committed category nodes, the node J (maximum activation level) is selected that meets
the following two conditions: its similarity value is greater than or equal to the
vigilance parameter, and it passes the class test, that is, its label matches
$b^{(t)}$. That node is then trained according to the following equation:
$$w_{iJ}^{\text{new}} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{\text{old}}\right) + (1 - \beta)\, w_{iJ}^{\text{old}}; \quad i = 1, \ldots, 2M$$
If the similarity test is passed but the class test is not, the value of the vigilance
parameter must be updated according to the equation:
$$\rho = \frac{1}{M} \left\{ \sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right) \right\} + \varepsilon$$
In that case another committed node must be selected. The category node that could not
represent the current input pattern must be taken out of competition to prevent it from
being selected again. If none of the committed category nodes can represent the input, a
new node must be committed and labeled with the class code $b^{(t)}$. Each time a new
training pattern is presented to the network, the vigilance parameter must return to its
initial value.
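The training procedure described above can be sketched for a single pattern as follows. This is a hedged illustration and not the pseudo-code of the thesis: equation (2-1) is not reproduced in this summary, so the usual Fuzzy ART choice function with a small choice parameter `alpha` is assumed; the increment `eps`, the reset value of -1 and the use of equation (2-22) for a newly committed node are likewise assumptions, and all names are hypothetical.

```python
def fuzzy_and_sum(a, w):
    """Sum over i of min(A_i, w_i)."""
    return sum(min(x, y) for x, y in zip(a, w))


def train_pattern(weights, labels, a, b, rho0, beta, alpha=0.001, eps=1e-6):
    """Present one (pattern, class code) pair; return the index of the node that learned.

    weights: weight vectors of the committed nodes (modified in place)
    labels:  integer class code of each committed node (modified in place)
    alpha, eps: assumed choice parameter and vigilance increment
    """
    m = len(a) / 2.0                          # the input has 2M components
    rho = rho0                                # vigilance restarts at its initial value
    t = [fuzzy_and_sum(a, w) / (alpha + sum(w)) for w in weights]
    while t and max(t) >= 0.0:
        j = max(range(len(t)), key=t.__getitem__)
        match = fuzzy_and_sum(a, weights[j]) / m
        if match >= rho and labels[j] == b:   # similarity test and class test passed
            weights[j] = [beta * min(x, w) + (1.0 - beta) * w
                          for x, w in zip(a, weights[j])]
            return j
        if match >= rho:                      # similarity passed, class test failed
            rho = match + eps                 # raise vigilance just above this match
        t[j] = -1.0                           # take node j out of competition
    # No committed node can represent the input: commit and label a new node.
    weights.append([beta * x + (1.0 - beta) for x in a])
    labels.append(b)
    return len(weights) - 1


weights, labels = [], []
first = train_pattern(weights, labels, [0.2, 0.8, 0.8, 0.2], b=1, rho0=0.7, beta=1.0)
second = train_pattern(weights, labels, [0.9, 0.1, 0.1, 0.9], b=2, rho0=0.7, beta=1.0)
third = train_pattern(weights, labels, [0.2, 0.8, 0.8, 0.2], b=1, rho0=0.7, beta=1.0)
print(first, second, third, labels)
```

In this run the first two patterns each commit a node, while the third is absorbed by the existing node of class 1.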
Once Supervised ART-I has been trained with a sufficient and representative number of
learning patterns, classification is straightforward. Only the patterns to be
classified, $A^{(t)}$, need to be presented to the network. The network computes the
activation function of each committed category node, and the winner is the node with the
highest activation level. The class code assigned to this node during training
determines the class of the pattern currently at the network input, provided that the
similarity value of the winning node exceeds the initial value of the vigilance
parameter, $\rho$. Otherwise, that input cannot be classified. In this respect, how
representative the training set is becomes a critical factor for the classification
results.
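A minimal sketch of this classification step is given below. As before, the choice function with the small parameter `alpha` is an assumption standing in for equation (2-1), the names are hypothetical, and a match equal to the vigilance value is treated as passing, which is a choice made here rather than something stated in the text.

```python
def fuzzy_and_sum(a, w):
    """Sum over i of min(A_i, w_i)."""
    return sum(min(x, y) for x, y in zip(a, w))


def classify(weights, labels, a, rho0, alpha=0.001):
    """Return the class code of the winning node, or None if the input is rejected."""
    if not weights:
        return None
    m = len(a) / 2.0
    t = [fuzzy_and_sum(a, w) / (alpha + sum(w)) for w in weights]
    j = max(range(len(t)), key=t.__getitem__)        # node with highest activation
    match = fuzzy_and_sum(a, weights[j]) / m
    return labels[j] if match >= rho0 else None      # vigilance check at its initial value


weights = [[0.2, 0.8, 0.8, 0.2], [0.9, 0.1, 0.1, 0.9]]
labels = [1, 2]
pattern = [0.25, 0.75, 0.75, 0.25]
print(classify(weights, labels, pattern, rho0=0.90))   # winner is accepted
print(classify(weights, labels, pattern, rho0=0.99))   # winner fails vigilance
```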
The pseudo-code of the learning and classification algorithms of Supervised ART-I is
detailed in chapter 4 of the thesis.
A.5.2 Learning and classification algorithms of the Supervised ART-II architecture
As figure 5-2 shows, the memory used in Supervised ART-I to store the category nodes is
divided into L "stacks", each containing $N_k$ ($k = 1, \ldots, L$) category nodes. The
index k of each "stack" is the class code of the category nodes it contains. The number
of nodes, $N_k$, may differ from one stack to another; in fact it may depend on the
nature and size of the data. However, if no a priori knowledge of the data is available,
the same number of nodes should initially be assigned to each "stack".
Training of Supervised ART-II starts, as for the other supervised ART-type networks, by
presenting to the network an input pattern, $A^{(t)}$, and its corresponding class
code, $b^{(t)}$. The activation level of all the committed nodes of all the stacks must
then be computed. In this case the activation level is defined by equation 5-1.
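$$T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{i j_k k}\right)}{\sum_{i=1}^{2M} w_{i j_k k}}; \quad j_k = 1, \ldots, C(k), \; k = 1, \ldots, L \qquad (5\text{-}1)$$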
where C(k) is the number of committed nodes in the k-th stack and $w_{i j_k k}$ are the
weights of the connections between category node $j_k$ of that stack and the i-th input
node. Next, the node with maximum activation level in each stack is determined, and the
winner is the one with the maximum activation level among these candidates. As in Fuzzy
ARTMAP and Supervised ART-I, this node must pass the similarity test and the class test
for its weights to be trained. If the winning node (JK) cannot represent the input, it
is taken out of competition ($T_{JK} = -1$) and another candidate from that "stack" is
selected. The process is repeated with all the candidate nodes until a committed node
able to represent the current input pattern is found. If none of the committed category
nodes in the stack assigned to the class code can represent the current input, a new
node of this stack must be committed, since none of the committed nodes belonging to
other stacks will pass the class test. Clearly, in this approach the number of nodes to
be considered for each input presented to the network is considerably lower than in
Supervised ART-I. However, Supervised ART-I allows any category node to represent any
class, and Supervised ART-II does not. This drawback can be minimized if the memory used
in Supervised ART-I to store the labels of the category nodes is used to double the
number of category nodes of each stack in Supervised ART-II, without increasing the
memory requirements.
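The stack-based training step can be sketched as follows. It is an illustration under stated assumptions, not the thesis pseudo-code: stacks are held in a dictionary keyed by integer class code, the choice parameter `alpha` and the increment `eps` are assumed, the vigilance is raised in the same way as in Supervised ART-I, and a new node receives the weights of equation (2-22). All names are hypothetical.

```python
def fuzzy_and_sum(a, w):
    """Sum over i of min(A_i, w_i)."""
    return sum(min(x, y) for x, y in zip(a, w))


def train_pattern_ii(stacks, a, b, rho0, beta, alpha=0.001, eps=1e-6):
    """Present one (pattern, class code) pair; return (stack, node) that learned.

    stacks maps a class code k to the list of weight vectors committed in stack k.
    """
    m = len(a) / 2.0
    rho = rho0
    # Activation of every committed node, stack by stack.
    t = {k: [fuzzy_and_sum(a, w) / (alpha + sum(w)) for w in stack]
         for k, stack in stacks.items()}
    # Search only while stack b still has a node in competition: nodes of
    # other stacks can never pass the class test.
    while any(v >= 0.0 for v in t.get(b, [])):
        _, k, j = max((v, k, j) for k, tk in t.items() for j, v in enumerate(tk))
        w = stacks[k][j]
        match = fuzzy_and_sum(a, w) / m
        if match >= rho and k == b:
            stacks[k][j] = [beta * min(x, y) + (1.0 - beta) * y for x, y in zip(a, w)]
            return k, j
        if match >= rho:                 # similarity passed, wrong class
            rho = match + eps
        t[k][j] = -1.0                   # node out of competition
    stacks.setdefault(b, []).append([beta * x + (1.0 - beta) for x in a])
    return b, len(stacks[b]) - 1


stacks = {}
r1 = train_pattern_ii(stacks, [0.2, 0.8, 0.8, 0.2], b=1, rho0=0.9, beta=1.0)
r2 = train_pattern_ii(stacks, [0.9, 0.1, 0.1, 0.9], b=2, rho0=0.9, beta=1.0)
r3 = train_pattern_ii(stacks, [0.25, 0.75, 0.75, 0.25], b=1, rho0=0.9, beta=1.0)
print(r1, r2, r3, stacks[1])
```

No label memory is needed: the class of a node is the key of the stack that holds it.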
In Supervised ART-II, classification reduces to assigning to the input pattern presented
to the network the class code that was assigned, during training, to the node that
attains the highest activation level for that input pattern. If no node meets the
vigilance parameter, that input cannot be classified by the network.
The pseudo-code of the learning and classification algorithms of Supervised ART-II is
detailed in chapter 5 of the thesis.
A.6. PERFORMANCE EVALUATION OF SUPERVISED ART-I AND SUPERVISED ART-II IN THE CLASSIFICATION OF REMOTELY SENSED IMAGES
Once the two new supervised ART-type architectures had been developed, an exhaustive
study was carried out of the behavior of their learning and classification algorithms
for the particular case of images remotely sensed by the Thematic Mapper (TM) sensor
(Al-Rawi et al. 2000). To this end, the dependence of the number of category nodes, the
learning and classification times, and the classification accuracy on the dynamic
parameters $(\rho, \beta)$ was studied for training sets of different sizes (200, 600,
1000, 3000, 9000 and 15000). The complete domain defined by the vigilance parameter
$\rho \in [0, 1]$ and the learning rate parameter $\beta \in (0, 1]$ was analyzed for
each training set. This study involved training each network defined by a pair of values
$(\rho, \beta)$ on each of the training sets. Given the values assigned to the learning
parameters, $\rho$ (0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90 and 0.95) and $\beta$
(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70,
0.75, 0.80, 0.85, 0.90, 0.95 and 1.0), each network was trained 960 times in total
(8 values of $\rho$ x 20 values of $\beta$ x 6 training sets). It should be mentioned
that this study was feasible thanks to the simplicity of the architectures proposed in
this work. It should also be stressed that the theoretical studies conclude that the
number of category nodes involved in learning a given training set is the same for
Fuzzy ARTMAP, Supervised ART-I and Supervised ART-II.
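The figure of 960 training runs per network can be checked by enumerating the experimental grid. The snippet below only rebuilds the parameter lists quoted in the text; the variable names are illustrative.

```python
from itertools import product

rhos = [round(0.60 + 0.05 * i, 2) for i in range(8)]      # 0.60 ... 0.95
betas = [round(0.05 * i, 2) for i in range(1, 21)]        # 0.05 ... 1.0
sizes = [200, 600, 1000, 3000, 9000, 15000]               # training set sizes

runs = list(product(rhos, betas, sizes))
print(len(rhos), len(betas), len(sizes), len(runs))       # 8 20 6 960
```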
The input data to the network, in both the learning and classification phases, are the
spectral values of the 6 bands of the TM sensor that have the same spatial resolution
(TM1, TM2, TM3, TM4, TM5 and TM7), normalized to [0,1]. The thematic class of each
training pattern was assigned through field visits to the selected areas, together with
other auxiliary information (maps, monographs, ...).
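One possible way to prepare such an input vector is sketched below. The text does not state how the normalization was done, so dividing 8-bit digital numbers by 255 is an assumption, as is the complement coding used to obtain the 2M components that the weight equations refer to (M = 6 bands). The function name is hypothetical.

```python
TM_BANDS = ("TM1", "TM2", "TM3", "TM4", "TM5", "TM7")   # bands named in the text


def encode_pixel(digital_numbers, max_dn=255):
    """Scale band values to [0, 1] and append their complements (2M components)."""
    if len(digital_numbers) != len(TM_BANDS):
        raise ValueError("expected one value per band")
    a = [dn / max_dn for dn in digital_numbers]          # assumed linear scaling
    return a + [1.0 - x for x in a]                      # assumed complement coding


vector = encode_pixel([0, 51, 102, 153, 204, 255])
print(len(vector), vector[:6])
```

With complement coding the components of every input sum to M, which is consistent with the factor 1/M in the vigilance update.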
Table 6-2 summarizes the most significant results of the experiments: the maximum and
minimum values of the number of committed category nodes, the learning and
classification times, and the classification accuracy for all the training sets,
together with the pair of values of the parameters $\rho$ and $\beta$ to which they
correspond. The table shows that the number of committed nodes ranges from 31 (for
$\rho = 0.70$, $\beta = 1.0$ and 200 learning patterns) to 1077 (for $\rho = 0.70$,
$\beta = 0.40$ and 15 000 learning patterns), while the classification accuracy ranges
from 64.66% (for $\rho = 0.80$, $\beta = 0.95$ and 15 000 training patterns) to 81.87%
(for $\rho = 0.95$, $\beta = 0.40$ and 9 000 patterns).
To show the behavior of the algorithms for a fixed number of patterns as a function of
$\rho$ and $\beta$, and given that the maximum accuracy was reached for 9 000 patterns,
figures 6-1, 6-2 and 6-3 show, respectively, the variation of the number of committed
nodes, the learning time for Supervised ART-I and the learning time for Supervised
ART-II over the whole domain defined by $\rho$ and $\beta$ for that number of patterns.
This study has established that the vigilance parameter $\rho$ is the parameter that
determines the characteristics and performance of a given ART-type network. As figure
6-1 shows, the number of committed nodes increases as $\rho$ increases. Whereas for
$\rho < 0.80$ this number is practically independent of $\beta$, provided that $\beta$
lies in the interval [0.3, 0.8], the number of committed nodes increases markedly as the
vigilance parameter approaches its maximum value. The influence of $\beta$ on the number
of committed nodes can also be seen in figure 6-1. Clearly, for medium and low values of
$\rho$ this number increases as $\beta$ approaches the ends of its interval of
definition. The reason is that small values of $\beta$ can cause incomplete learning
(under-training), while high values cause the opposite effect (over-training); in both
cases each established category cannot represent all its members, which increases the
number of committed nodes. These over-training and under-training effects lengthen the
learning and classification times, as a consequence of the larger number of committed
nodes, and also reduce the classification accuracy. This behavior is reflected in
figures 6-2, 6-3, 6-5 and 6-6.
The comparative study of the learning times of Supervised ART-I and Supervised ART-II
has made it possible to determine a threshold value for the number of category nodes
(see figure 6-4). If the number of nodes is below 1000, Supervised ART-I takes less time
to learn, while above that value Supervised ART-II learns faster.
To improve the classification results, the network was trained with 9 000 patterns and
with values of the vigilance parameter higher than those considered so far. Experiments
were run for $\rho$ = 0.96, 0.97, 0.98 and 0.99, varying the learning parameter over all
the values considered previously. In this way a classification accuracy of 85.87% was
obtained for $\rho = 0.98$ and $\beta = 0.50$. The classification times in this case
were 25 minutes on a SUN 4 SPARC workstation and 9.50 minutes on an ALPHA 500
workstation. Specific classification results are presented and discussed in the thesis.
The differences between the two proposed architectures are basically that Supervised
ART-II requires less memory and learns faster than Supervised ART-I in the case of
non-homogeneous data.
A.7. PERFORMANCE OF SUPERVISED ART-TYPE NETWORKS FOR DIFFERENT DYNAMICS OF THE VIGILANCE PARAMETER
In all the experiments carried out so far in this work, the vigilance parameter was
updated according to the approach proposed in the literature for all supervised ART-type
networks. This approach has been named flying (eq. 3-1), to distinguish it from the
approaches proposed in this work: vigilance parameter fixed during learning (fixed
vigilance), free vigilance parameter and floating vigilance parameter.
In the fixed (constant) approach, all the committed category nodes have the same
confidence level, so that any node whose similarity level meets the initial vigilance
parameter and that belongs to the same class can represent the input, regardless of the
order established by the activation values of the category nodes.
In the free approach, the vigilance parameter can change freely to values above or below
its initial value; its value at any moment is determined by the similarity value of the
last category node that failed to represent the input pattern under consideration.
The difference between the free and floating approaches is that the latter prevents the
vigilance parameter from falling below its initial value. This ensures that all the
committed category nodes have a minimum confidence level. In this case the resulting
number of committed nodes is higher than in the other two proposed approaches, but
always lower than with the flying vigilance parameter.
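The four vigilance dynamics can be contrasted in a small sketch. This is one possible reading of the descriptions above, not the formulation of the thesis: the flying rule follows the update of section A.5.1 and is applied only when the failed node had passed the similarity test, while the free and floating rules are assumed to copy the similarity of the failed node without any increment. The function name, the mode strings and `eps` are hypothetical.

```python
def next_vigilance(mode, rho, rho0, failed_match, eps=1e-6):
    """Vigilance after a node with similarity failed_match could not represent the input.

    rho is the current vigilance, rho0 its initial value.
    """
    if mode == "fixed":
        return rho0                                   # never changes during learning
    if mode == "flying":
        # Raised just above the match of a node that passed similarity but not class.
        return failed_match + eps if failed_match >= rho else rho
    if mode == "free":
        return failed_match                           # may end up above or below rho0
    if mode == "floating":
        return max(rho0, failed_match)                # as free, but never below rho0
    raise ValueError("unknown mode: " + mode)


for mode in ("fixed", "flying", "free", "floating"):
    print(mode, next_vigilance(mode, rho=0.7, rho0=0.7, failed_match=0.4),
          next_vigilance(mode, rho=0.7, rho0=0.7, failed_match=0.85))
```

Under this reading, floating and free coincide when the initial vigilance is zero, in line with the limiting behavior reported for these two approaches.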
To investigate the effect of these approaches on the operation of this type of network,
experiments were carried out for five pairs of values of the network parameters
($\rho_0$ and $\beta$), all belonging to the set of optimal values previously obtained
for the flying approach: (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15) and
(0.00, 0.15). The classification accuracy obtained ranged from 66.71%, with the free
approach, to 87.05%, with the flying approach. The number of category nodes was lowest
with the free approach (120) and highest with the flying approach (1252). These accuracy
values are the maxima reached for each approach and correspond to the dynamic parameter
values $\rho = 0.98$ and $\beta = 0.50$.
Which approach is preferable depends on the requirements imposed and on the values
selected for the dynamic parameters. If classification accuracy is to be optimized, the
flying approach is the best choice for vigilance parameter values greater than or equal
to 0.95, whereas below this threshold the floating approach is the most advisable. If,
instead, the aim is to minimize the number of category nodes, and hence the learning and
classification times, the floating approach is the most suitable when $\rho_0 = 0$.
When the vigilance parameter approaches 1, the number of category nodes and the
classification accuracy are practically independent of whether the flying, floating or
fixed approach is used. With the floating and free approaches, the performance of the
network becomes very similar as $\rho_0$ approaches zero, and identical in the limit.
A.8. CONCLUSIONS
The conclusions that can be drawn from the results obtained in this doctoral thesis can
be summarized in the following points:
1.- Two new versions of the ART artificial neural network known as Fuzzy ART have been
developed: Flagged and Compact. Analysis of the unsupervised algorithms of these new
versions shows that their classification capability is identical to that of the original
algorithm, while their computational complexity, and hence the computing time of both
the training and the classification stages, is reduced.
2.- A new supervised ART-type ANN architecture, named Supervised ART-I, has been
developed, together with its supervised learning algorithm. The proposed architecture is
much simpler than the original model, since it is based on a single ART module instead
of two. This also eliminates the module linking them (map field). A theoretical study
has shown that Supervised ART-I has the same classification accuracy as Fuzzy ARTMAP,
but with lower storage requirements and a shorter learning time.
3.- A second supervised ART-type architecture, named Supervised ART-II, has been
proposed. It offers the same classification performance as Fuzzy ARTMAP and Supervised
ART-I. The category node layer of Supervised ART-II is structured in L stacks, which
substantially reduces the memory required to store the class labels compared with
Supervised ART-I.
4.- Supervised ART-I allows any category node to represent any class, whereas in
Supervised ART-II each category node is predestined to represent a given class. However,
for a constant memory size, the memory used in Supervised ART-I to store the labels of
the category nodes could be used in Supervised ART-II to double the number of category
nodes of each stack, which would minimize this inflexibility.
5.- The supervised approaches proposed for Fuzzy ART can be extended to any ART-type
artificial neural network.
6.- It has been shown that each of the proposed supervised approaches works best
depending on the homogeneity of the environment. Supervised ART-I is better suited to
homogeneous environments, while Supervised ART-II is aimed at non-homogeneous
environments. This homogeneity is defined in terms of the type of data and the dynamic
parameters.
7.- The simplicity of the Supervised ART-I and Supervised ART-II architectures makes
them especially suitable for implementation as application-specific integrated circuits.
8.- A detailed study of the behavior of Supervised ART-I and Supervised ART-II over the
domain of their dynamic parameters has clarified the influence of these parameters on
the operation of the networks.
9.- An automatic system for classifying images remotely sensed by the TM sensor has been
developed, which achieves very good classification accuracy.
10.- Different dynamics of the vigilance parameter have been proposed and evaluated.
These investigations lead to the conclusion that the "flying" approach is suitable when
the initial value of the parameter is very high (greater than 0.95), while in all other
cases the "floating" approach is more suitable.
Some of the aspects derived from this work that should be investigated in the future
are: the study of new learning algorithms that reduce or eliminate episodes of
under-training and over-training; the implementation of the proposed algorithms on
different DSP (Digital Signal Processing) systems; and the investigation of the behavior
of the designed architectures on other types of problems. In this regard, it should be
noted that the architectures developed in this work have been used, with very good
results, in the detection and monitoring of forest fires (Al-Rawi et al. 2001a, b, c &
d), as well as in cloud detection (Al-Rawi et al. 2001e & f).