
UNIVERSITY POLYTECHNIC OF MADRID

FACULTY OF COMPUTER SCIENCE

DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES

FOR CLASSIFICATION LANDSAT TM IMAGES

Presented by

KAMAL R. AL-RAWI

To obtain the Ph.D. in Computer Science

MADRID- SPAIN

2001

CONSUELO GONZALO MARTIN, Associate Professor,

Department of Architecture and Technology of Computer

Systems, Faculty of Computer Science, University

Polytechnic of Madrid.

CERTIFIES: that the thesis entitled "DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES FOR CLASSIFICATION LANDSAT TM IMAGES" has been carried out by KAMAL R. AL-RAWI under my supervision, in the Department of Architecture and Technology of Computer Systems, Faculty of Computer Science, University Polytechnic of Madrid.

To

Prof. Dr. Amos Eddy

ACKNOWLEDGEMENTS

I gratefully thank Dr. Consuelo Gonzalo Martín, associate professor of computer science at the Faculty of Computer Science, University Polytechnic of Madrid, for her continuous efforts during the supervision of this thesis. Her guidance and criticism were a great help to me.

The criticism of Dr. Águeda Arquero Hidalgo and Dr. Estibaliz Martínez Izquierdo was a great help. They were always in touch during the preparation of this work.

My grateful thanks to Professor Dr. Pedro Gómez Vilda and the rest of the Working Group on Computer Technology, Dr. Victoria Rodellar Biarge, Dr. Mercedes Pérez Castellanos, and Dr. Víctor Nieto Lluis, for their support and useful discussions.

I would like to thank all the graduate students in the group, especially Vicente Garcia del Cantara, for the friendly atmosphere during my stay in the Department of Architecture and Technology of Computer Systems.

My thanks to the secretary of the department, Mrs. M. del Carmen Parró Cruz, who was always there to arrange our administrative work.

I gratefully thank Professor Dr. José Luis Casanova, Director of the Remote Sensing Laboratory (LATUV), University of Valladolid, for allowing me to use the facilities of the laboratory.

My thanks to Miss Sarah Strauss and Miss Nicole Knudsen for proofreading the thesis.

Finally, I need to thank my wife Eman, my daughter Hiba, and my sons Saif Al-Deen and Haitham for their support during the preparation of this work.

INDEX

CHAPTER I: INTRODUCTION
1.1 Historical background
1.2 Adaptive Resonance Theory ANNs
1.2.1 Unsupervised ART ANNs
1.2.2 Supervised ART ANNs
1.3 Classifying remotely sensed data with ANNs
1.4 Objectives

CHAPTER II: FUZZY ART ANN
2.1 Introduction
2.2 Matching system and vigilance parameter
2.3 Fuzzy ART dynamics
2.4 Fast-learning slow-record option
2.5 Complement coding
2.6 Fuzzy subset and conservative limit
2.7 Training algorithms of Fuzzy ART
2.8 Evolution of Fuzzy ART
2.9 Newly developed versions of Fuzzy ART
2.9.1 Flagged approach
2.9.2 Training algorithms of Flagged-Fuzzy ART
2.9.3 Compact approach
2.9.4 Training algorithms of Compact-Fuzzy ART
2.10 Categorization

CHAPTER III: FUZZY ARTMAP
3.1 Introduction
3.2 Fuzzy ARTMAP
3.2.1 Vigilance parameter dynamics in supervised environment
3.2.2 Training phase
3.2.3 Classification phase
3.3 Full algorithm of Fuzzy ARTMAP
3.3.1 Training algorithms of Fuzzy ARTMAP
3.3.2 Classification algorithm of Fuzzy ARTMAP

CHAPTER IV: SUPERVISED ART-I ANN
4.1 Introduction
4.2 Supervised ART-I
4.2.1 Architecture of Supervised ART-I
4.2.2 Data description
4.2.3 Training of Supervised ART-I
4.2.4 Classification by Supervised ART-I
4.3 Algorithm of Supervised ART-I
4.3.1 Training algorithm of Supervised ART-I
4.3.2 Classification algorithm of Supervised ART-I
4.4 Discussion

CHAPTER V: SUPERVISED ART-II ANN
5.1 Introduction
5.2 Supervised ART-II
5.2.1 Architecture of Supervised ART-II
5.2.2 Training of Supervised ART-II
5.2.3 Classification by Supervised ART-II
5.3 Full algorithm of Supervised ART-II
5.3.1 Training algorithm of Supervised ART-II
5.3.2 Classification algorithm of Supervised ART-II
5.4 Discussion

CHAPTER VI: PERFORMANCE OF SUPERVISED ART-I&II FOR CLASSIFICATION OF LANDSAT TM IMAGES
6.1 Landsat satellites
6.2 Data
6.3 Performance
6.3.1 Training performance
6.3.2 Classification performance

CHAPTER VII: PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS
7.1 Introduction
7.2 Vigilance dynamics
7.2.1 Flying approach
7.2.2 Fixed vigilance approach
7.2.3 Free vigilance approach
7.2.4 Floating approach
7.3 Results and discussion

CHAPTER VIII: CONCLUSIONS

BIBLIOGRAPHY

APPENDIX: SUMMARY
A.1 INTRODUCTION
A.1.1 Historical evolution of Artificial Neural Networks (ANNs)
A.1.2 Classification of remotely sensed data with ANNs
A.2 OBJECTIVES OF THE THESIS
A.3 ART-TYPE ARTIFICIAL NEURAL NETWORKS
A.3.1 Fuzzy ART
A.3.2 Fuzzy ARTMAP
A.4 PROPOSAL OF TWO IMPROVED VERSIONS OF FUZZY ART
A.4.1 "Flagged" version of Fuzzy ART
A.4.2 "Compact" version of Fuzzy ART
A.5 PROPOSAL OF TWO NEW SUPERVISED ART-TYPE ARCHITECTURES
A.5.1 Learning and classification algorithms of the Supervised ART-I architecture
A.5.2 Learning and classification algorithms of the Supervised ART-II architecture
A.6 EVALUATION OF THE PERFORMANCE OF SUPERVISED ART-I AND SUPERVISED ART-II IN THE CLASSIFICATION OF REMOTELY SENSED IMAGES
A.7 PERFORMANCE OF SUPERVISED ART-TYPE NETWORKS FOR DIFFERENT DYNAMICS OF THE VIGILANCE PARAMETER
A.8 CONCLUSIONS

LIST OF FIGURES

Figure 2-1: Fuzzy ART dynamics
Figure 2-2: The architecture of Fuzzy ART
Figure 2-3: The architecture of Flagged Fuzzy ART
Figure 2-4: The architecture of Compact Fuzzy ART
Figure 3-1: Block diagram showing supervision through the map field
Figure 3-2: The full architecture for supervision through the map field
Figure 3-3: The architecture of ARTMAP for classification problems
Figure 3-4: Full architecture of Fuzzy ARTMAP
Figure 3-5: Match tracking using flying vigilance parameter
Figure 4-1: Training of map field weights
Figure 4-2: Supervision dynamic of the tagging approach of Supervised ART-I
Figure 4-3: Architecture of Supervised ART-I
Figure 5-1: Supervision dynamic of the stacking approach of Supervised ART-II
Figure 5-2: Architecture of Supervised ART-II
Figure 5-3: Determination of the winning node in the stacking-supervision approach of Supervised ART-II
Figure 6-1: Number of category nodes in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-2: Training time, in minutes, for Supervised ART-I, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-3: Training time, in minutes, for Supervised ART-II, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-4: The ratio of training time for Supervised ART-I / Supervised ART-II, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-5: Classification time, in minutes, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, for 52 440 pixels of the Landsat TM images
Figure 6-6: Classification performance, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, for Landsat TM images
Figure 6-7: The above image is the reference image. The lower image is the classified image using Supervised ART-II, with vigilance parameter $\rho = 0.98$, dynamic learning parameter $\beta = 0.50$, and training with 9000 exemplars. The classification accuracy is 85.82%
Figure 7-1: Sketches showing different vigilance parameter dynamics: fixed, free, and float approaches
Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns represent classified images using fly, float, fixed, and free vigilance parameters, respectively. The first, second, third, fourth, and fifth rows represent classified images using initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), (0.00, 0.15), respectively

LIST OF TABLES

Table 2-1: Comparison among Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study
Table 5-1: Comparisons between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II
Table 6-1: Descriptions for Landsat-5 Thematic Mapper (TM) images
Table 6-2: Performance of Supervised ART-II when trained with different sizes of training samples
Table 6-3: Training and classification statistics for Landsat TM image at individual class level
Table 6-4: The confusion matrix for the classification process for the 52 440 pixels of the Landsat TM image
Table 7-1: The performance of Supervised ART-II ANN with different vigilance dynamics

ABSTRACT

New supervised ART ANNs with simple architectures have been developed in this study. Their architectures are built from a single ART module rather than from a pair of modules connected by a map field, as in all other supervised ART-type ANNs reported in the literature. Two different algorithms have been developed: Supervised ART-I and Supervised ART-II. The developed algorithms reduce the number of dynamic parameters, the memory requirement, and the training time, which is the major problem facing ANNs, without altering the classification accuracy.

Two simplified versions of the Fuzzy ART algorithm have been developed that keep the categorization performance of the original algorithm: Flagged Fuzzy ART and Compact Fuzzy ART. Although Supervised ART-I and Supervised ART-II are general in nature and can be applied to all ART ANNs, this work addresses the supervision of Compact Fuzzy ART. The full algorithms for Supervised ART-I and Supervised ART-II are listed.

The newly developed ANNs have been applied to classify Landsat Thematic Mapper (TM) images. The performance of the systems has been tested for different dynamic parameters and different training samples, and their behavior over the domain of the vigilance parameter and the dynamic learning parameter has been characterized.

Only one approach to vigilance dynamics in supervised ART-type ANNs has been addressed in the literature. Three more approaches have been developed in this study: fixed, free, and float. The performance of the developed ANNs for the classification of Landsat TM images has been tested for all these vigilance dynamics.

CHAPTER I

INTRODUCTION

1.1 Historical background

Although the roots of the field of Artificial Neural Networks (ANNs) extend to 1943, when McCulloch and Pitts built the first artificial neural structure, its foundations were established in the mid-seventies. Werbos (1974) developed the principle of the Back Propagation (BP) ANN, and Grossberg (1976) developed the principle of Adaptive Resonance Theory (ART) ANNs. However, the great theoretical advances in the field were achieved in the 1980s. In that decade the algorithms of the BP ANN were developed independently by several authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). The Kohonen Self-Organizing Map (KSOM) (Kohonen 1982) and the Hopfield ANN (Hopfield 1982) were developed, and many advances were achieved for ART ANNs (Carpenter & Grossberg 1987a&b). ART ANNs are the concern of this study because of their stability, speed, and accuracy (Carpenter et al. 1991a&b, 1992, 1997, and Gan & Lua 1992).

ART ANNs have been applied in many fields. The Boeing Company implemented an ART-1 neural information retrieval system for its engineering designs (Caudell et al. 1994). Boeing has thousands of designs for its aircraft parts. Features are extracted from each design and presented to the network to establish categories for these designs. When a new design is needed, its features are presented to the system to determine the category to which the required design belongs. Retrieving features from the designs in that category avoids repeating work for the new design.

ART ANNs have been employed for target recognition (Seibert & Waxman 1992). Their approach extracts features of the target (an aircraft) from different views. Bernardon & Carrick (1995) also used them for target recognition with Synthetic-Aperture Radar (SAR) imagery. After the network has been trained, target recognition is done by matching the signal of the target against a set of stored target models. Kumar & Guez (1989) and Waxman et al. (1995) also used ART ANNs for target recognition. Kumar and Guez worked with visible data, while Waxman and his group worked with visible, infrared, and SAR data.

Moreover, ART ANNs have been employed for robot sensory motor control (Baloch & Waxman 1991, Bachelder et al. 1993, Dubrawski & Crowley 1994, Srinivasa & Sharma 1996) and robot navigation (Racz & Dubrawski 1995); machine vision (Caudell & Healy 1994); object recognition (Seibert & Waxman 1992); face recognition (Seibert & Waxman 1993); pattern clustering (Moore 1989, Mekkaoui & Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical imaging (Soliz & Donohoe 1996); electrocardiogram wave recognition (Ham & Han 1996); signature verification (Murshed et al. 1995); fault identification in a nuclear power plant (Keyvan 1999); and remote sensing (Gopal et al. 1994; Baraldi & Parmiggiani 1995).

1.2 Adaptive Resonance Theory ANNs

There are two types of ANNs: supervised and unsupervised. In the unsupervised case, only the input features are presented to the input layer, and the network categorizes them. In the supervised case, the class code is supplied to the network together with the input features. During the training phase, when the network correctly classifies an input, the weights are trained; otherwise a correction is made.

1.2.1 Unsupervised ART ANNs

The principle of ART ANNs was introduced in the literature as a theory of human cognitive information processing (Grossberg 1976, 1980). Since then a series of ART-based ANNs have been developed for unsupervised category learning and pattern recognition in real time: ART1 (Carpenter & Grossberg 1987a), ART2 (Carpenter & Grossberg 1987b), ART3 (Carpenter & Grossberg 1990), SART (Baraldi & Parmiggiani 1995), and Fuzzy ART (Carpenter et al. 1991a). ART1 can categorize arbitrary binary input patterns (Carpenter & Grossberg 1987a). ART2 can deal with both binary and analog patterns (Carpenter & Grossberg 1987b). In ART1 and ART2, information flows forward through weights that connect each node in the input layer to all nodes in the category layer, and backward through another set of weights that connect each category node to all nodes in the input layer. Carpenter et al. (1991a) developed a simpler unsupervised ART architecture, which they called Fuzzy ART. Like ART2, it can categorize analog multi-valued input patterns as well as binary input patterns. Weights in Fuzzy ART connect each node in the input layer to all category nodes, and information flows through these weights in one direction only, from the input layer to the category layer. Fuzzy ART is explained in detail in Chapter II.


1.2.2 Supervised ART ANNs

In the early nineties, two supervised ART architectures were developed: ARTMAP (Carpenter et al. 1991b) and Fuzzy ARTMAP (Carpenter et al. 1992). The architecture of ARTMAP is built from two ART1 modules, while that of Fuzzy ARTMAP is built from two Fuzzy ART modules. ARTMAP can learn and classify binary multi-valued input patterns. Fuzzy ARTMAP can learn and classify analog input patterns in addition to binary ones (Carpenter et al. 1992). More supervised ART-type ANNs have since been developed: ART-EMAP (Carpenter & Ross 1993), Gaussian ARTMAP (Williamson 1996), ARTMAP-IC (Carpenter & Markuzon 1998), and Distributed ARTMAP (Carpenter 1998). In all these architectures, supervision is done through a map field, which requires two ART modules.

Fuzzy ARTMAP has been widely used. It has shown better performance than various other ANNs on different problems, such as automatic analysis of electrocardiograms (Ham & Han 1996), diagnostic monitoring of nuclear plants (Keyvan et al. 1993), and prediction of protein secondary structure (Mehta et al. 1993).

1.3 Classifying remotely sensed data with ANNs

Mapping land cover using remotely sensed data is a very active area of research, owing to advances in space and computer technology (Benediktsson et al. 1990). Conventional classification is usually employed for this task. However, neural networks have often been used in the last decade. The main advantages of neural networks over conventional classifiers such as the Maximum Likelihood Classifier (MLC) are the following. 1) They are non-parametric, so the probability distribution of each class is not required. This allows ancillary data (slope, topography, aspect, etc.) to be introduced to the network in addition to the spectral data, which many authors report can increase the classification accuracy (Benediktsson et al. 1990, Carpenter et al. 1997). Moreover, neural networks are more robust when the distribution is not Gaussian (Paola & Schowengerdt 1997, Hepner et al. 1990). 2) Unlike conventional classifiers, neural networks can manage fuzzy classifications (Paola & Schowengerdt 1997, Warner & Shank 1997, Yool 1998). The numbers in the output represent the strength of the class membership of the specific input. This is very important when dealing with low spatial resolution. 3) The parallel nature of neural networks allows the classification process to be sped up by implementing them on parallel computers (Salu & Tilton 1993, Heermann & Khazenie 1992). 4) Neural networks offer flexibility for classification improvement (Carpenter et al. 1997). 5) They can establish an arbitrary decision boundary (Paola & Schowengerdt 1995, Tzeng et al. 1994).

"Neural networks offer a flexible approach to building the complex, highly

non-linear models that required for a complex system. ... Unlike traditional expert

systems where knowledge is made explicit in the form of rules, neural networks

generate their own rules by learning from exemplars" (Keyvan 1993).

Multi-Layer Perceptron (MLP) with Back Propagation learning is the neural network most commonly used in the literature to classify remotely sensed data. This is due to the preferable learning approach of the network, which is based on minimizing the error between the output of the network and the target value. While some authors have reported that conventional classifiers perform better than MLP (Mulder & Spreeuwers 1991, Solaiman & Mouchot 1994), many authors have reported that MLP performs better than MLC in classifying remotely sensed data (Hepner et al. 1990, Heermann & Khazenie 1992, Paola & Schowengerdt 1994, Yoshida & Omatu 1994). The classification performance of MLP can be improved by using ancillary data in addition to the spectral data (Benediktsson et al. 1990). However, employing MLP as a classifier incurs many problems. The architecture of the network is not fixed: the number of hidden layers and the number of nodes in each hidden layer must be determined by trial and error. This is a very costly process, given the long training time of the network. In addition, MLP might fall into a local minimum during the training phase. Moreover, MLP might not converge. Using a small learning rate to avoid the convergence problem makes the already long training time of MLP even longer. Heermann & Khazenie (1992) suggested using parallel computers to reduce the training time. This reduces the training time but increases the hardware cost.

For classification of a Landsat image, Carpenter et al. (1997) reported that MLP did not converge, using a learning rate of 0.6 and a momentum rate of 0.4, after 212 minutes of training on a SUN 4 SPARC Station with 100 000 input presentations. They employed a lower learning rate to avoid the convergence problem. The training time exceeded 1000 minutes, while the classification accuracy was less than 27%. They reported that Fuzzy ARTMAP (Carpenter et al. 1992) achieves better classification accuracy than MLP with a shorter training time, and that Fuzzy ARTMAP and MLC achieve the same level of classification accuracy. Fuzzy ARTMAP was also employed by Mannan et al. (1998) to classify a 512x512-pixel image from the Linear Imaging Self-scanning Sensor (LISS-II) of the Indian Remote Sensing Satellite (IRS-1B) into 13 classes. They reported that Fuzzy ARTMAP performs better than both MLC and MLP in classification accuracy: the average classification accuracies for six data sets are 84.7%, 80.3%, and 79.9% for Fuzzy ARTMAP, MLC, and MLP, respectively. They also reported that its training time was slightly less than that of MLC and many times shorter than that of MLP.


Unlike MLP, Fuzzy ARTMAP has a well-defined architecture, always converges, and can tune itself to represent sub-classes by generating new category nodes. However, the main drawback of Fuzzy ARTMAP lies in its complex architecture: it is constructed from two Fuzzy ART modules linked by a map field.

1.4 Objectives

The global objective of this work is to design new, simplified versions of ART ANN architectures that maintain the original performance while improving computational time and memory use.

This objective can be divided into several partial objectives:

• Design new, simple architectures of ART-type ANNs that provide the same classification performance as classical ARTs.

• Develop learning and classification algorithms for these architectures.

• Encode the developed algorithms.

• Study the behavior of the developed architectures for the classification of remotely sensed Landsat Thematic Mapper (TM) images over the whole domain of the dynamic parameters.

The layout of this study is as follows. Chapter II deals with Fuzzy ART. Chapter III deals with Fuzzy ARTMAP. Chapter IV deals with the newly developed architecture Supervised ART-I. Chapter V deals with the newly developed architecture Supervised ART-II. The performance of the Supervised ART-I and Supervised ART-II ANNs for learning and classifying Landsat TM images is addressed in Chapter VI. The performance of the newly developed ANNs using different vigilance dynamics is addressed in Chapter VII. Conclusions are listed in Chapter VIII.


CHAPTER II

FUZZY ART ANN

2.1 Introduction

Fuzzy ART is an unsupervised ART-based ANN. Its architecture has been designed for learning and categorizing arbitrary analog or binary multi-valued input patterns. This is achieved by using the minimum operator ($\wedge$) of fuzzy set theory instead of the intersection operator ($\cap$) of set theory, which is employed in ART1.

2.2 Matching system and vigilance parameter

"Fuzzy ART incorporates the basic features of all ART systems, notably, pattern matching between bottom-up input and top-down learned prototype vectors. This matching process leads either to a resonant state that focuses attention and triggers stable prototype learning or a self-regulating parallel memory search. If the search ends by selecting an established category, then the category's prototype may be refined to incorporate new information in the input pattern. If the search ends by selecting a previously untrained node, then learning of a new category takes place" (Carpenter et al. 1991a). If the matching value is greater than the predetermined value, resonance occurs and new information is incorporated into the winning category node by training its weights; otherwise, a self-organizing parallel memory search is conducted.


The match criterion is called the vigilance parameter $\rho$. It calibrates the minimum confidence that a category node must have in order to represent the current input before a search for a better committed category node is triggered. If all committed category nodes fail to represent the current input, a new category node is committed, as long as the network's memory capacity is not fully utilized. The vigilance parameter is a non-dimensional number $\rho \in (0, 1]$. A value of 1 means perfect matching. A low vigilance parameter leads to code compression with broad generalization for categories. A high vigilance parameter leads to a large number of category nodes with fine categories.

The vigilance parameter is the key feature of all ART ANNs. An ART ANN can discriminate down to the individual level by setting $\rho = 1$, or create a single category node for all data by setting $\rho = 0$. The value of the vigilance parameter is chosen according to the type and amount of data available, the categorization level sought, the required speed, and the available memory. The vigilance parameter is fixed during training in all unsupervised ART ANNs.

2.3 Fuzzy ART dynamics

Input patterns $A^{(t)} \in [0, 1]$ are presented at the input layer $F_1$. The choice function $T_j^{(t)}$ for each committed category node of the category layer $F_2$ is computed according to equation (2-1). The choice value represents the activation level of each committed category node:

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}}, \qquad j = 1, \ldots, C \tag{2-1}$$

where $w_{ij}$ are the weights that connect each category node $j$ in the $F_2$ field with all nodes of the input layer $F_1$. All weight values are initially set to 1 (i.e. $w_{ij} = 1$ for $i = 1, \ldots, 2M$ and $j = 1, \ldots, C$). $M$ is the dimension of the input features. Since the normalized features and their complements are introduced to the network, the dimension of the input vector $A^{(t)}$ is $2M$. $\alpha$ is the choice parameter ($\alpha > 0$). $C$ is the total number of committed category nodes at iteration $t$.
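The computation in equation (2-1) can be illustrated with a minimal Python sketch. The function names `complement_code` and `choice_value`, the two-feature input, the value of the choice parameter, and the "trained" weight vector are all hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def complement_code(features):
    """Append the complement 1 - a of every normalized feature a in [0, 1],
    so an M-dimensional input becomes a 2M-dimensional vector."""
    return list(features) + [1.0 - a for a in features]


def choice_value(a, w_j, alpha):
    """Equation (2-1): sum_i min(A_i, w_ij) / (alpha + sum_i w_ij)."""
    fuzzy_and = sum(min(a_i, w_ij) for a_i, w_ij in zip(a, w_j))
    return fuzzy_and / (alpha + sum(w_j))


a = complement_code([0.2, 0.7])      # M = 2, so the coded input has 2M = 4 entries
w_fresh = [1.0] * len(a)             # weights are initially set to 1
w_trained = [0.2, 0.6, 0.7, 0.3]     # hypothetical weights after some learning

t_fresh = choice_value(a, w_fresh, alpha=0.001)
t_trained = choice_value(a, w_trained, alpha=0.001)
print(t_fresh, t_trained)
```

In this example the node whose weights lie close to the input obtains a much higher choice value than the node whose weights are still at their initial value of 1, because the large weight sum in the denominator penalizes the untrained node.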

The winning category node $J$ is determined:

$$T_J^{(t)} = \max\{T_j^{(t)}\}, \qquad j = 1, \ldots, N \tag{2-2}$$

It is the category node with the highest choice value among all category nodes (committed and uncommitted) in the category layer. $N$ is the full memory capacity of Fuzzy ART. The value of $N$ is normally much larger than $C$ ($N \gg C$). Carpenter et al. (1991a) involve all $N$ category nodes in this step instead of only the $C$ committed category nodes. Their reasoning is to let uncommitted category nodes be committed, when needed, in sequential order $(1, 2, \ldots, j-1, j, j+1, \ldots, N)$. To achieve this, they assigned a very small positive value $\phi_j$ to each category node before training starts. They called these values $F_2$-order constants. The values decrease as the index $j$ of the category node in the memory field increases:

$$T_j = \phi_j, \qquad j = (C+1)^{(t)}, \ldots, N \tag{2-3}$$

$$0 < \phi_N < \cdots < \phi_j < \cdots < \phi_1 \approx 0 \tag{2-4}$$

In this way, when all committed category nodes are shut off because they failed to represent the current input, the uncommitted category node $(C+1)^{(t)}$ is committed, since it has the highest choice value ($F_2$-order constant) among all uncommitted category nodes, as prearranged:

$$\phi_{(C+1)^{(t)}} = \max\{\phi_j\}, \qquad j = (C+1)^{(t)}, \ldots, N \tag{2-5}$$

The match value is computed for the winning category node $J$:

$$\text{match value for node } J = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right)}{\sum_{i=1}^{2M} A_i^{(t)}} \tag{2-6}$$

The match value represents the hypothesis that the current input $A^{(t)}$ belongs to the winning category node $J$ of the $F_2$ field. This hypothesis is tested against the predetermined vigilance parameter $\rho \in (0, 1]$. The vigilance parameter is the minimum confidence level required to accept that the winning node $J$ of the $F_2$ field represents the category of the current input $A^{(t)}$.
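As a small illustration of equation (2-6) and the vigilance test, the following Python sketch computes the match value for one node and compares it with $\rho$. The names `match_value` and `passes_vigilance`, the input vector, and the weight vectors are hypothetical; the test accepts a node when its match value is not less than $\rho$, which is one reading of the rejection rule described in this section.

```python
def match_value(a, w_J):
    """Equation (2-6): sum_i min(A_i, w_iJ) / sum_i A_i."""
    return sum(min(a_i, w_i) for a_i, w_i in zip(a, w_J)) / sum(a)


def passes_vigilance(a, w_J, rho):
    """The hypothesis is rejected when the match value is less than rho."""
    return match_value(a, w_J) >= rho


a = [0.2, 0.7, 0.8, 0.3]             # complement-coded input (M = 2)
w_trained = [0.2, 0.6, 0.7, 0.3]     # hypothetical weights of a committed node
w_fresh = [1.0, 1.0, 1.0, 1.0]       # weights of a node that has never been trained

m_trained = match_value(a, w_trained)    # about 0.9
m_fresh = match_value(a, w_fresh)        # exactly 1
print(m_trained, passes_vigilance(a, w_trained, rho=0.95))
```

With these numbers the committed node is accepted for $\rho = 0.85$ and rejected for $\rho = 0.95$, while a node whose weights are all 1 always has a match value of one.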

If the match value of the winning node is less than $\rho$, the hypothesis is rejected and this committed category node is shut off for as long as the current input is presented to the network. This prevents the persistent selection of the same category node during the search. Shut-off is done simply by assigning -1 to the choice value of the failed category node. A new search for another winning committed category node is then triggered. The category node with the highest choice value is determined among all $N$ category nodes, its match value is computed, and the confidence level of this new winning category node to represent the current input is tested. Category nodes that are shut off remain in this mode for as long as the input vector that they failed to represent is presented at the input layer $F_1$.

The network keeps searching for the maximum choice value node $J$, computing the match function for node $J$, and testing it against the vigilance parameter $\rho$ for each committed node of the $F_2$ field. This is done in order of choice value rank until one of two outcomes is reached. Either one of the committed category nodes can represent the current input $A^{(t)}$ (resonance occurs), in which case the weights $w_{iJ}$ of the selected category node $J$ are trained, or none can, in which case the uncommitted category node with index $C+1$, which has the highest choice value among all uncommitted category nodes as prearranged, is picked by the network to represent the current input. The match value of a newly committed category node always passes the vigilance test, since it has the value of one. This is explained later in this chapter.
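The search described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the thesis's algorithm listing: the function names, the weight vectors, and the default choice parameter are hypothetical, nodes are indexed from 0, and instead of storing $F_2$-order constants for uncommitted nodes the sketch simply returns the index of the next free node when every committed node has been shut off.

```python
def choice_value(a, w_j, alpha):
    """Choice function: sum_i min(a_i, w_ij) / (alpha + sum_i w_ij)."""
    return sum(min(x, w) for x, w in zip(a, w_j)) / (alpha + sum(w_j))


def match_value(a, w_j):
    """Match function: sum_i min(a_i, w_ij) / sum_i a_i."""
    return sum(min(x, w) for x, w in zip(a, w_j)) / sum(a)


def search(a, committed, rho, alpha=0.001):
    """Return the index of the resonating committed node, or len(committed)
    when every committed node fails and a new node has to be committed."""
    choice = [choice_value(a, w_j, alpha) for w_j in committed]
    for _ in range(len(committed)):
        winner = max(range(len(choice)), key=lambda j: choice[j])
        if match_value(a, committed[winner]) >= rho:
            return winner             # resonance: this node will be trained
        choice[winner] = -1.0         # shut off for the current input
    return len(committed)


a = [0.2, 0.7, 0.8, 0.3]
committed = [[0.2, 0.2, 0.2, 0.2], [0.3, 0.7, 0.8, 0.3]]

first_try = search(a, committed, rho=0.3)        # node 0 wins and resonates
after_reset = search(a, committed, rho=0.9)      # node 0 is shut off, node 1 resonates
new_node = search(a, committed[:1], rho=0.9)     # no committed node passes
print(first_try, after_reset, new_node)
```

In the second call node 0 has the highest choice value but a match value of only about 0.4, so it is shut off and the search continues with node 1. This shows why the choice ranking and the vigilance test have to be evaluated separately.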

The weights of the selected category node are updated in order to incorporate the characteristics of the input pattern into category $J$ according to the following equation:

$$w_{iJ}^{(t+1)} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{(t)}\right) + (1 - \beta)\, w_{iJ}^{(t)}, \qquad i = 1, \ldots, 2M \tag{2-7}$$

where $\beta \in (0, 1]$ is the dynamic learning parameter. Setting $\beta = 1$ gives fast learning. The dynamics of Fuzzy ART are shown in Figure 2-1.
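A minimal Python sketch of the update in equation (2-7) follows. The function name `update_weights` and the numeric values are hypothetical and serve only to show the effect of the dynamic learning parameter.

```python
def update_weights(a, w_J, beta):
    """Equation (2-7): w_new = beta * min(A_i, w_iJ) + (1 - beta) * w_iJ."""
    return [beta * min(a_i, w_i) + (1.0 - beta) * w_i for a_i, w_i in zip(a, w_J)]


a = [0.2, 0.7, 0.8, 0.3]                       # complement-coded input
w_old = [1.0, 1.0, 1.0, 1.0]                   # weights of a fresh node

w_fast = update_weights(a, w_old, beta=1.0)    # fast learning: weights become the input
w_slow = update_weights(a, w_old, beta=0.5)    # weights move halfway towards min(A, w)
print(w_fast, w_slow)
```

Because the minimum of $A_i$ and $w_{iJ}$ never exceeds $w_{iJ}$, every update leaves each weight unchanged or smaller, whatever the value of $\beta$.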

After learning the weights of the selected category node $J$, a check should be made to see whether a committed category node was chosen to represent the current input or a new category node was committed. This determines whether the number of committed category nodes $C$ is increased by one or left unchanged. The number of committed category nodes $C$ controls the computation of the choice function, which is done for committed category nodes only. The test can be done easily as follows:

$$\text{If } (J > C) \text{ then } C = C + 1 \tag{2-8}$$

[Figure 2-1 diagram: input layer $F_1$, weights $w_{ij}$, category layer $F_2$ (WTA) with choice values $T_j$, and TRAINING, MATCH TESTING, and RESET blocks.]

Figure 2-1: Fuzzy ART dynamics. $F_1$ is the input layer. $F_2$ is the category layer. Weights $w_{ij}$ connect each node in the input layer to all nodes in the category layer. Learning for the weights $w_{iJ}$ of the maximum choice value node is conducted if the node passes the matching test; otherwise reset occurs. $|X|$ represents the degree of membership between the input and the weights of the winning category node $J$. It is computed as $|X| = \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ})$. The value of $y_j$ represents the degree to which category node $j$ can represent the input. For Winner Take All (WTA), as is the case here, $y_J = 1$. However, $\sum_j y_j = 1$ for the distributed case.

If the index of the selected category node $J$ is greater than the number of committed category nodes $C$, the uncommitted category node $C+1$ has been chosen to represent the current input because all committed category nodes failed to do so. This uncommitted category node has the maximum prearranged choice value $\phi_{C+1}$ among the choice values $\phi_j$ of all uncommitted category nodes.

If choice function is computed for uncommitted category nodes, the order of

the prearranged choice valúes ^ that has been assigned for each category node y will be

destroyed. So committing uncommitted category nodes in order will be disordered.

This leads to the fragmentation of the nodes of the category layer.

In contrast, if choice function is not computed for some committed category

nodes, they keep their oíd choice valúes which represent the previous input rather than

the current input of our concern. This leads to the destruction of category node selection

according to their choice valúes for the current input.

2.4 Fast-learning slow-record option

The fast-learning slow-record option was suggested by (Carpenter et al. 1991b) for learning a newly committed category node. In this learning mode, $\beta = 1$ for the first training iteration, while $\beta < 1$ during the rest of the training phase. This avoids under-training of rare events.

2.5 Complement coding

All ART architectures are stable: their weights decrease monotonically during learning. This attractive property sometimes erodes some of the weights $w_{ij}$ to zero when a large number of inputs is presented to the network, which leads to the category proliferation problem (Moore 1989, Carpenter et al. 1991b).

(Carpenter et al. 1991b) suggested that proliferation of category nodes can be avoided if inputs are normalized before they are presented to the network. They recommended presenting each input pattern together with its complement as the normalization process. Complement coding introduces both the on-response and the off-response to the network. The norm $|A|$ of an input pattern together with its complement is always equal to the dimension $M$ of the input pattern, as the following derivation shows:

$$|A| = |(a, a^c)| = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} a_i^c = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} (1 - a_i) = \sum_{i=1}^{M} a_i + M - \sum_{i=1}^{M} a_i = M \qquad \text{(2-9)}$$

Therefore, complement coding automatically normalizes each input pattern to $M$. Complement coding is a normalization process that does not alter the amplitude information of the input features.
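The normalization property of equation (2-9) is easy to check numerically. The short sketch below assumes features already scaled to [0, 1]; the helper name `complement_code` and the sample vector are illustrative choices made here.

```python
def complement_code(a):
    """Return A = (a, 1 - a) for a feature vector a with entries in [0, 1]."""
    if any(x < 0.0 or x > 1.0 for x in a):
        raise ValueError("features must be scaled to [0, 1] first")
    return list(a) + [1.0 - x for x in a]


a = [0.2, 0.9, 0.5]      # hypothetical input with M = 3 features
A = complement_code(a)   # 2M = 6 components
norm = sum(A)            # city-block norm |A|, equal to M up to rounding
```

Whatever the feature values are, the norm of the coded vector stays at $M$, while the original amplitudes remain available in the first $M$ components.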

2.6 Fuzzy subset and conservative limit

If $\alpha$ is close to zero, the category node whose weight vector $w_j$ is a fuzzy subset of the input vector $A$ is chosen first, if such a category node exists, since its choice value will be close to unity. The activation value of each committed category node $j$ thus represents the fuzzy membership of the input vector $A$ in that category. Moreover, if resonance occurs for a category node whose weight vector $w_J$ is a fuzzy subset of the input $A$, its weights are unchanged during training. The limit in which $\alpha$ approaches zero is called the conservative mode. In conservative mode neither training nor resetting occurs. So $\alpha$ must be large enough to alter such a selection.

While the choice value depends on the degree to which $w_j$ is a fuzzy subset of $A$, the reverse is true for the match value: it depends on the degree to which $A$ is a fuzzy subset of $w_j$. If the winning category node is a fuzzy subset choice, then the match function is

$$\frac{|A \wedge w_J|}{|A|} = \frac{|w_J|}{M} = \frac{1}{M} \sum_{i=1}^{2M} w_{iJ} \qquad \text{(2-10)}$$

According to this equation, among all subset choices (if any exist) the category node with the highest subset choice should be chosen in order to obtain the maximum match value. This choice can be controlled by the choice parameter $\alpha$.

The full architecture of Fuzzy ART is shown in (figure 2-2). For more details about Fuzzy ART see (Carpenter et al. 1991a, 1992).

2.7 Training Algorithms of Fuzzy ART

1) Input parameters;

a) Dynamic parameters:

i- $\rho \in (0, 1]$: The vigilance parameter.

ii- $\beta \in (0, 1]$: The dynamic learning parameter; $\beta = 1$ for fast learning.

iii- $\alpha > 0$: The choice value parameter.

Figure 2-2: The architecture of Fuzzy ART. The full capacity $N$ of the category layer is involved in determining the maximum choice value node $J$. These nodes are shown in dark. Weights are connected to all category nodes. Weights that are connected to uncommitted category nodes are shown in light, because they have not been learned yet. The number of comparisons needed to determine the maximum choice value node $J$ is $N-1$, since the search is carried out among all category nodes. This increases training time. If $J > C$ then the uncommitted category node with index $C+1$ has been committed, since its choice value is $\phi_{C+1} = \max\{\phi_j\}$; $j = C+1, \dots, N$. That is because these constants are prearranged in decreasing order, $\phi_{C+1} > \dots > \phi_N$, so that category nodes are committed in order and fragmentation of the category layer is prevented.

b) Data characteristics;

i- $M$: The dimension of the input features.

ii- $Pt$: The number of exemplars to be used in learning.

c) Initialization;

i- $w_{ij} = 1$ ; $i = 1, \dots, 2M$ ; $j = 1, \dots, N$

ii- $T_j = \phi_j$ ; $j = 1, \dots, N$, where $\phi_j \approx 0$ and $0 < \phi_N < \dots < \phi_j < \dots < \phi_1$

iii- Number of iterations $t = 1$.

iv- Number of committed category nodes $C = 1$.

2) New input;

$$A_i^{(t)} = \begin{cases} a_i^{(t)} & \text{for } 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & \text{for } M+1 \le i \le 2M \end{cases}$$

3) Compute the choice function for all committed category nodes;

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; \quad j = 1, \dots, C$$

4) Reset: Determine the node $J$, which has the maximum choice value;

$$T_J^{(t)} = \max\{T_j^{(t)}\} ; \quad j = 1, \dots, N$$

5) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) Then;

i- Shut off this node to put it out of competition;

$T_J^{(t)} = -1$

ii- GOTO STEP (4)

6) If ($J > C$) Then new category node has been committed

$C = J$

7) Training; the weights of node $J$ are updated according to equation (2-7).

8) If ($t < Pt$) then;

i- $t = t + 1$

ii- GOTO STEP (2)

9) Training has been done. The network is ready for categorization.

2.8 Evolution of Fuzzy ART

The Fuzzy ART algorithm was introduced in the literature by (Carpenter et al. 1991a). In their algorithm, the choice value is computed for all $N$ potential category nodes in the following way:

$$T_j = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; \quad j = 1, \dots, N \qquad \text{(2-11)}$$

Since the input pattern $A$ is normalized to [0, 1] and the initial weight values $w_{ij}(0)$ are set to one, the choice value for uncommitted category nodes using complement coding is

$$T_{uncommitted} = \frac{M}{\alpha + 2M} \qquad \text{(2-12)}$$

So the choice value is

$$T_j = \begin{cases} \dfrac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} & ; \; j = 1, \dots, C \\[2ex] \dfrac{M}{\alpha + 2M} & ; \; j = C+1, \dots, N \end{cases} \qquad \text{(2-13)}$$
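Equations (2-11) to (2-13) can be checked with a small numerical sketch. The function name `choice_value` and the sample values of `alpha` and `a` are assumptions chosen here for illustration only.

```python
def choice_value(A, w, alpha):
    """Equation (2-11): T_j = |A ^ w_j| / (alpha + |w_j|)."""
    overlap = sum(min(x, y) for x, y in zip(A, w))
    return overlap / (alpha + sum(w))


M = 2
alpha = 0.01
a = [0.3, 0.7]
A = a + [1.0 - x for x in a]   # complement-coded input, |A| = M

w_uncommitted = [1.0] * (2 * M)  # initial weights are all one
w_committed = list(A)            # a node that fast-learned this same input

T_uncommitted = choice_value(A, w_uncommitted, alpha)  # M / (alpha + 2M)
T_committed = choice_value(A, w_committed, alpha)      # M / (alpha + M)
```

For this input the uncommitted node receives $M/(\alpha + 2M)$, as in equation (2-12), which is lower than the value of a committed node that already encodes the same pattern.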

(Carpenter et al. 1991a) stated that choosing larger values for the initial weights is possible, but that it biases the system against the selection of uncommitted category nodes. This is exactly what we are looking for: our concern is to test all committed category nodes as candidates to represent a specific input before a new node is committed. However, when the initial weights are very large, fast learning should be used, at least the first time the node is committed. Fast learning brings the weights to values below one, because the normalized input features that forced this node to be committed are assigned to its weights. Otherwise, long training is required to reduce the weights to their converged values.

(Georgiopoulos et al. 1996) introduced a very complicated approach to ensure that the testing of all committed category nodes is completed before a new node is committed. The architecture of their Fuzzy ART has top-down weights $w_{ij}$, which they initialized to one, and bottom-up weights $W_{ij}$, which they initialized to $1/(\alpha + \varphi)$, where $\varphi \in [2M, \infty)$ is a very large constant. They called $\varphi$ the uncommitted choice value parameter. They stated that the bottom-up inputs (using complement coding) are given by the equation

$$T_j = \begin{cases} \dfrac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} & ; \; j = 1, \dots, C \\[2ex] \dfrac{M}{\alpha + \varphi} & ; \; j = C+1, \dots, N \end{cases} \qquad \text{(2-14)}$$

In this equation, $\varphi$ replaces the $2M$ of equation (2-12). Since $\varphi > 2M$, $T_j$ for uncommitted nodes is smaller than that of committed category nodes. With a very large $\varphi$, the choice function has values very close to zero for uncommitted category nodes. This allowed them to examine the performance of Fuzzy ART at $\alpha \to \infty$, since when $\varphi \to \infty$ all committed category nodes are tested before a new node is committed. The value of $\alpha$ alters the order of search among committed category nodes. "A node is called an uncommitted node if all its top-down weights are equal to the initial top-down weight value, otherwise, the node is committed" (Georgiopoulos et al. 1996). This test takes time, especially when the number of input features is large.

The weight update for $w_{ij}$ in their Fuzzy ART algorithm (they called it Fuzzy ART Variant, because $\alpha \to \infty$ and $\varphi \to \infty$) is the same as that of equation (2-7). However, they did not show the training algorithm for $W_{ij}$, but they stated that the bottom-up weights do not change where equation (2-15) is valid,

K * ^ " ^ ^ 2-15

While very large values for the choice parameter $\alpha$ and the uncommitted node choice parameter $\varphi$ lead to the testing of all committed category nodes before a new node is committed in Fuzzy ART, they are not practical for Fuzzy ARTMAP because they create too many category nodes (Georgiopoulos et al. 1996).

The use of the $F_2$-order constants $\phi_j$ to ensure that the test of all committed category nodes is completed before a new node is committed was explained in (section 2.3). Since $\phi_j \approx 0$, all committed category nodes are tested as candidates to represent a specific input before the uncommitted category nodes are tested. The value of $\phi_j$ decreases as $j$ increases, so the uncommitted node $C+1$ is chosen before the other uncommitted category nodes. This approach is not mentioned in the original algorithm of Fuzzy ART (Carpenter et al. 1991a) or the original algorithm of Fuzzy ARTMAP (Carpenter et al. 1992), but has been extracted from the full Fuzzy ARTMAP algorithm listed by (Carpenter et al. 1997).

(Georgiopoulos et al. 1999) include all committed category nodes in addition to one uncommitted category node in the search for the maximum choice value node. The choice value for their algorithm is;

TJ limJs^M = - E ( 4 ( , ) A W , ) ;j = l,...,C

O ; = c +1

2-16

They assigned very large values to the initial weights in order to obtain a zero choice value for the uncommitted category node, so that committed category nodes are tested first, before a new node is committed. This forced them to use fast learning when a new node is committed, to reduce the weights to their theoretical values, which are below one.

2.9 Newly developed versions of Fuzzy ART

The determination of the winning category node among the full capacity $N$ of the network, as reported by (Carpenter & Grossberg 1987a, Carpenter et al. 1991a), is time consuming. The capacity of the system can be very large, especially when it is working in a non-homogeneous environment. Uncommitted category nodes can be committed in sequential order without using the prearranged choice values $\phi_j$ and without including the whole capacity $N$ of the category layer in the determination of the maximum choice value node $J$.

Two new versions of the Fuzzy ART architecture have been constructed in this work. The first one is the Flagged approach. This approach involves the uncommitted category node with rank $C+1$ in the category layer, together with all committed category nodes, in determining the maximum choice value node $J$. A total of $C$ comparisons is required rather than $N-1$, as is the case in the original Fuzzy ART architecture. As mentioned before, this approach has been used by (Georgiopoulos et al. 1999), but in their approach they assigned large values to the initial weights, which forced them to use fast learning. The second one is the Compact approach, which involves only the $C$ committed category nodes in determining the maximum choice value node $J$.

2.9.1 Flagged approach

There is no reason to involve $T_{C+1}, \dots, T_N$ in the determination of the maximum choice value node. Only the uncommitted category node with rank $C+1$ in the category layer is involved. This uncommitted category node is flagged by assigning a value $\phi_{C+1}$ to its choice value such that

$$T_{shut\text{-}off} < \phi_{C+1} < 0 \qquad \text{(2-17)}$$

A negative value is assigned to $\phi_{C+1}$ because neither the input features $A_i$ nor the weights $w_{ij}$ ever have negative values. So, according to equation 2-11, the choice value of any committed category node is never negative,

$$T_j \geq 0 ; \quad j = 1, \dots, C \qquad \text{(2-18)}$$

However, the value of $\phi_{C+1}$ must be greater than the choice value of committed category nodes that are in shut-off mode. In this way, when all committed category nodes are in shut-off mode, the flagged node with index $C+1$ in the category layer is chosen as the maximum choice value node. There is no need to worry about the match value of a newly committed category node, since it is equal to one, which is the highest value that the vigilance parameter $\rho$ can have. That is because $A_i$ is normalized to [0, 1] before its presentation to the network, and the initial weights of category nodes are equal to one. So the input $A_i$ is a subset of $w_{i,C+1}$, which means $A_i \wedge w_{i,C+1} = A_i$. According to equation 2-11, computing the match function for the subset choice always gives one, as demonstrated below:

$$\frac{1}{M} \sum_{i=1}^{2M} \left(A_i \wedge w_{i,C+1}\right) = \frac{1}{M} \sum_{i=1}^{2M} A_i = \frac{M}{M} = 1 \qquad \text{(2-19)}$$

Therefore, the uncommitted flagged node $C+1$ will not go into shut-off mode; it is certain to pass the match test.
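The role of the flag value can be illustrated with a short sketch. The constants `SHUT_OFF = -1.0` and `FLAG = -0.1` follow the values used in this chapter; the function names and the 0-based indexing are assumptions made here for illustration.

```python
SHUT_OFF = -1.0  # choice value assigned to a node that failed the match test
FLAG = -0.1      # flag value for the uncommitted node C+1 (between -1 and 0)


def select_winner(T_committed, flag=FLAG):
    """Index (0-based) of the maximum among committed nodes plus the flagged node.

    An index equal to len(T_committed) means the flagged node was chosen.
    """
    candidates = list(T_committed) + [flag]
    return max(range(len(candidates)), key=candidates.__getitem__)


def match_value(A, w, M):
    """Match value (1/M) * sum_i (A_i ^ w_i) of a complement-coded input A."""
    return sum(min(x, y) for x, y in zip(A, w)) / M


# While any committed node is still active, its non-negative T beats the flag.
winner_active = select_winner([0.4, 0.6])
# Once every committed node is shut off, the flagged node wins.
winner_flagged = select_winner([SHUT_OFF, SHUT_OFF])
# The flagged node has weights of one, so its match value is one (eq. 2-19).
m = match_value([0.3, 0.7, 0.7, 0.3], [1.0, 1.0, 1.0, 1.0], M=2)
```

Because the flag lies strictly between the shut-off value and zero, the flagged node can only win after every committed node has been rejected, and its match value of one guarantees that it passes any vigilance setting.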

After resonance occurs, a check should be made to see whether the flagged uncommitted category node has been chosen. If $J > C$ then the flagged node has been chosen. The number of committed category nodes must then be increased by one ($C = C+1$) and the weights of the new flagged node $w_{i,C+1}$ should be initiated:

$$w_{i,C+1} = 1 ; \quad i = 1, \dots, 2M \qquad \text{(2-20)}$$

The full architecture of Flagged-Fuzzy ART is shown in (figure 2-3). Only the committed category nodes and the flagged uncommitted category node are involved in the determination of the maximum choice value node. Weights are not yet established to connect uncommitted category nodes in the $F_2$ layer with the nodes of the input layer $F_1$.

Figure 2-3: The architecture of Flagged Fuzzy ART. Only committed category nodes and the uncommitted category node with index $C+1$ in the category layer are involved in determining the maximum choice value node $J$. These category nodes are shown in dark. Weights are connected to all these category nodes. Category nodes that are not involved in determining the maximum choice value node are shown in light. Weights are not connected to them. Weights that are connected to the flagged node (the uncommitted category node with index $C+1$) are shown in light, because they have not been initiated yet. They are initiated ($w_{i,C+1} = 1$; $i = 1, \dots, 2M$) only when the node is chosen as the maximum choice value node. The number of comparisons needed to determine the maximum choice value node is $C$, since the search is carried out among committed category nodes plus the flagged node. This reduces training time.

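2.9.2 Training algorithms of Flagged-Fuzzy ART

1) Input parameters;

a) Dynamic parameters:

i- $\rho \in (0, 1]$: The vigilance parameter.

ii- $\beta \in (0, 1]$: The dynamic learning parameter; $\beta = 1$ for fast learning.

iii- $\alpha > 0$: The choice value parameter.

b) Data characteristics;

i- $M$: The dimension of the input features.

ii- $Pt$: The number of exemplars to be used in learning.

c) Initialization;

i- $T_{C+1} = -0.1$

ii- Number of iterations $t = 1$.

iii- Number of committed category nodes $C = 1$.

2) New input;

$$A_i^{(t)} = \begin{cases} a_i^{(t)} & \text{for } 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & \text{for } M+1 \le i \le 2M \end{cases}$$

3) Compute the choice function for all committed category nodes;

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; \quad j = 1, \dots, C$$

4) Determine the node $J$, which has the maximum choice value;

$$T_J^{(t)} = \max\{T_j^{(t)}\} ; \quad j = 1, \dots, C+1$$

5) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;

i- Shut off this node to put it out of competition;

ii- GOTO STEP (4)

6) If ($J > C$) Then new category node has been committed

i- $C = J$

ii- $w_{iJ} = 1$ ; $i = 1, \dots, 2M$

iii- $T_{C+1} = -0.1$

7) Training;

8) If ($t < Pt$) Then;

i- $t = t + 1$

ii- GOTO STEP (2)

9) Training has been done. The network is ready for categorization.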

2.9.3 Compact approach

Uncommitted category nodes can be committed in sequential order without using the flagged uncommitted category node. Only the committed category nodes are involved in determining the maximum choice value node $J$. This is called the Compact approach.

The choice function is computed for committed category nodes. The maximum choice value node $J$ is determined among the $C$ committed category nodes only.

$$T_J^{(t)} = \max\{T_j^{(t)}\} ; \quad j = 1, \dots, C \qquad \text{(2-21)}$$

The match value of the selected category node $J$ is tested against the predetermined value of the vigilance parameter $\rho$. If the match value of node $J$ is less than $\rho$, the node is shut off by assigning a value of $-1$ to its choice value, which puts it out of competition for the current input. Otherwise, the node is trained, all committed category nodes are switched on again, and a new input is presented to the network.

When the maximum choice value equals $-1$, all committed category nodes are in shut-off mode. The uncommitted category node $C+1$ should then be committed to represent the current input, in order to prevent fragmentation of the category layer. This is done simply by training the initial weights of the category node with index $C+1$ and increasing the count of committed category nodes by one. Uncommitted category nodes are thus committed according to their order in the category layer. The number of comparisons needed to determine the maximum choice value node is $(C-1)$ rather than the $(N-1)$ that the original Fuzzy ART algorithm requires. This saves a lot of computation time, keeping in mind that $N \gg C$.

In the case that a new category node should be committed, its weights are updated through the following equation:

$$w_{i,C+1}^{first} = \beta A_i^{(t)} + (1 - \beta) ; \quad i = 1, \dots, 2M \qquad \text{(2-22)}$$

According to this equation, the weight initialization ($w_{ij}$ ; $i = 1, \dots, 2M$ ; $j = 1, \dots, N$) reported by (Carpenter et al. 1991b) is not required. This also saves time, since this equation requires fewer arithmetic operations than the previously suggested one. The full architecture of Compact-Fuzzy ART is shown in (figure 2-4). Committed category nodes are shown in dark. Uncommitted category nodes are shown in light. Weights connect all input layer nodes to committed category nodes only. Weights are not connected to uncommitted category nodes, since they are not committed yet (they have not been assigned weights yet).

The comparison among the original Fuzzy ART, Flagged-Fuzzy ART, and Compact-Fuzzy ART is shown in (table 2-1). It shows clearly that Flagged-Fuzzy ART and Compact-Fuzzy ART are faster than the original Fuzzy ART algorithm. The main factor in the reduction of training time is the number of comparisons needed to determine the winning category node. As mentioned before, these are $N-1$, $C$, and $C-1$ for the original Fuzzy ART, Flagged-Fuzzy ART, and Compact-Fuzzy ART, respectively.

Figure 2-4: The architecture of Compact Fuzzy ART. Only committed category nodes are involved in determining the maximum choice value node $J$. These category nodes are shown in dark. Weights connect all input layer nodes to committed category nodes only. Uncommitted category nodes are shown in light. Weights are not connected to them, since they are not committed yet (they have not been assigned weights yet). The number of comparisons needed to determine the maximum choice value node is $C-1$, since the search is carried out among committed category nodes only. This reduces training time.

Table 2-1: Comparison among the Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study. The Flagged and Compact algorithms are faster; however, the Compact algorithm is recommended.

Step | Original | Flagged | Compact
--- | --- | --- | ---
Initialization for choice value | $0 < \phi_N < \dots < \phi_j < \dots < \phi_1$ | $\phi_{C+1} = -0.1$ | None
Choice function $T_j$ | $T_j = \sum_{i=1}^{2M} (A_i \wedge w_{ij}) / (\alpha + \sum_{i=1}^{2M} w_{ij})$ ; $j = 1, \dots, C$ | Same | Same
Determination of $T_{max}$ | $T_J = \max\{T_j ; j = 1, \dots, N\}$ | $T_J = \max\{T_j ; j = 1, \dots, C+1\}$ | $T_J = \max\{T_j ; j = 1, \dots, C\}$
Check for new committed node | $J > C$ | $J > C$ | $T_J = -1$
Number of comparisons for $T_{max}$ | $N-1$ | $C$ | $C-1$
Match testing | $\sum_{i=1}^{2M} (A_i \wedge w_{iJ}) \geq M\rho$ | Same | Same
Weights initialization | $w_{ij} = 1$ ; $i = 1, \dots, 2M$ ; $j = 1, \dots, N$ | $w_{iJ} = 1$ ; $i = 1, \dots, 2M$ | None
Weights updating for old node | $w_{iJ}^{new} = \beta (A_i \wedge w_{iJ}^{old}) + (1-\beta) w_{iJ}^{old}$ | Same | Same
Weights updating for new node | $w_{iJ}^{new} = \beta (A_i \wedge w_{iJ}^{old}) + (1-\beta) w_{iJ}^{old}$ | $w_{iJ}^{new} = \beta (A_i \wedge w_{iJ}^{old}) + (1-\beta) w_{iJ}^{old}$ | $w_{i,C+1}^{first} = \beta A_i + (1-\beta)$
Replacement of fixed choice value $\phi$ | None | $\phi_{C+1} = -0.1$ | None

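2.9.4 Training algorithms of Compact-Fuzzy ART

1) Input parameters;

a) Dynamic parameters;

i- $\rho \in (0, 1]$: The vigilance parameter.

ii- $\beta \in (0, 1]$: The dynamic learning parameter; $\beta = 1$ for fast learning.

iii- $\alpha > 0$: The choice value parameter.

b) Data characteristics;

i- $M$: The dimension of the input features.

ii- $Pt$: The number of exemplars to be used in learning.

c) Initialization;

i- Number of iterations $t = 1$.

ii- Number of committed category nodes $C = 1$.

2) New input;

$$A_i^{(t)} = \begin{cases} a_i^{(t)} & \text{for } 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & \text{for } M+1 \le i \le 2M \end{cases}$$

3) Compute the choice function for all committed category nodes;

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; \quad j = 1, \dots, C$$

4) Determine the node $J$, which has the maximum choice value;

$$T_J^{(t)} = \max\{T_j^{(t)}\} ; \quad j = 1, \dots, C$$

5) If $T_J^{(t)} = -1$ (all committed category nodes are in shut-off mode) then a node (the node whose order in the category layer is $C+1$) should be committed;

i- Increase the number of committed nodes by one;

$C = C + 1$

ii- If in fast-learning slow-record mode;

Assign the values of the input features to the weights of this node;

$$w_{iC}^{first} = A_i^{(t)} ; \quad i = 1, \dots, 2M$$

Else (normal mode)

$$w_{iC}^{first} = \beta A_i^{(t)} + (1 - \beta) ; \quad i = 1, \dots, 2M$$

iii- GOTO STEP (2)

6) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;

i- Shut off this node to put it out of competition;

$T_J^{(t)} = -1$

ii- GOTO STEP (4)

7) Learning;

$$w_{iJ}^{new} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{old}\right) + (1 - \beta)\, w_{iJ}^{old}$$

8) If ($t < Pt$) then;

i- $t = t + 1$

ii- GOTO STEP (2)

9) Training has been done. The network is ready for categorization.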
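The Compact training loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation: it starts with no committed nodes (the listing starts from $C = 1$), it makes a single pass over the data, and the names `train_compact_fuzzy_art`, `weights` and `labels` are hypothetical.

```python
def complement_code(a):
    """Return A = (a, 1 - a) for features scaled to [0, 1]."""
    return list(a) + [1.0 - x for x in a]


def train_compact_fuzzy_art(patterns, rho, alpha=0.001, beta=1.0):
    """One pass of a Compact-style Fuzzy ART; returns (weights, labels).

    Assumption: the category layer starts empty, so the first input
    commits the first node. Labels are 0-based node indices.
    """
    weights = []  # one weight vector per committed node
    labels = []   # node chosen for each input
    for a in patterns:
        M = len(a)
        A = complement_code(a)
        # Choice values are computed for committed nodes only.
        T = [sum(min(x, w) for x, w in zip(A, wj)) / (alpha + sum(wj))
             for wj in weights]
        while True:
            J = max(range(len(T)), key=T.__getitem__) if T else None
            if J is None or T[J] == -1.0:
                # Every committed node is shut off: commit the next node
                # with w = beta * A + (1 - beta), as in equation (2-22).
                weights.append([beta * x + (1.0 - beta) for x in A])
                labels.append(len(weights) - 1)
                break
            overlap = sum(min(x, w) for x, w in zip(A, weights[J]))
            if overlap < M * rho:
                T[J] = -1.0  # shut off node J for the current input
                continue
            # Resonance: train node J with equation (2-7).
            weights[J] = [beta * min(x, w) + (1.0 - beta) * w
                          for x, w in zip(A, weights[J])]
            labels.append(J)
            break
    return weights, labels


weights, labels = train_compact_fuzzy_art(
    [[0.1, 0.1], [0.12, 0.1], [0.9, 0.9]], rho=0.8)
```

In this toy run the first two inputs share one category and the third commits a second node, so the search only ever touches committed nodes, which is the point of the Compact approach.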


2.10 Categorization

At the end of the training phase, all weights are fixed. The number of category nodes $C$ is known. The network is ready for categorization.

1) Input:

$$A_i^{(t)} = \begin{cases} a_i^{(t)} & \text{for } 1 \le i \le M \\ 1 - a_{i-M}^{(t)} & \text{for } M+1 \le i \le 2M \end{cases}$$

2) Compute the choice values for all committed nodes;

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{ij}\right)}{\alpha + \sum_{i=1}^{2M} w_{ij}} ; \quad j = 1, \dots, C$$

3) Determine the node $J$, which has the maximum choice value among all committed category nodes;

$$T_J^{(t)} = \max\{T_j^{(t)}\} ; \quad j = 1, \dots, C$$

4) Match testing:

If (the match value for the winning node $J \geq \rho$) then;

Category node $J$ represents the category of this input

Else

The network fails to categorize this input

5) If more categorization is needed GOTO STEP (1).

6) Categorization has been done.
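The categorization steps can be sketched as a single function. The name `categorize`, the 0-based node index, and the use of `None` for a failed match test are assumptions made here for illustration; the sketch also assumes at least one committed node.

```python
def categorize(a, weights, rho, alpha=0.001):
    """Return the 0-based index of the winning node, or None if it fails the match test."""
    M = len(a)
    A = list(a) + [1.0 - x for x in a]  # complement coding
    overlaps = [sum(min(x, w) for x, w in zip(A, wj)) for wj in weights]
    T = [o / (alpha + sum(wj)) for o, wj in zip(overlaps, weights)]
    J = max(range(len(T)), key=T.__getitem__)  # maximum choice value node
    return J if overlaps[J] / M >= rho else None


# Hypothetical weights of two committed nodes (M = 2).
trained = [[0.1, 0.1, 0.88, 0.9], [0.9, 0.9, 0.1, 0.1]]

near_first = categorize([0.11, 0.1], trained, rho=0.8)  # close to node 0
ambiguous = categorize([0.5, 0.5], trained, rho=0.8)    # fails the match test
```

As in the listing, only the single winning node is match-tested; an input whose winner falls below $\rho$ is reported as not categorized.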

CHAPTER III

FUZZY ARTMAP ANN

3.1 Introduction

While the roots of ART ANNs go back to 1976, supervision did not start until sixteen years later, when the ARTMAP architecture was constructed (Carpenter et al. 1991b). More supervised architectures have been constructed thereafter (Fuzzy ARTMAP, ART-EMAP, Gaussian ARTMAP, ARTMAP-IC, and Distributed ARTMAP). All of them are constructed from two ART modules linked by a map field. Supervision of ART ANNs using the map field approach is shown in (figure 3-1). The supervision approach using a map field is explained through Fuzzy ARTMAP.

3.2 Fuzzy ARTMAP

Fuzzy ARTMAP is a supervised ART-type ANN. It is a generalization of the ARTMAP ANN. "ARTMAP system learns orders of magnitude more quickly, effectively, and accurately than alternative algorithms. It achieves these properties by using an internal controller that conjointly maximizes predictive generalization and minimizes predictive error by linking predictive success to category size on a trial-by-trial basis, using only local operations. This computation increases the vigilance parameter $\rho_a$ of ART$_a$ by the minimum amount needed to correct predictive error at ART$_b$" (Carpenter et al. 1991b). Therefore, ARTMAP is a self-organizing expert system, since it calibrates the selectivity of its hypotheses based upon predictive success (Carpenter et al. 1991b). While ARTMAP handles binary input only, Fuzzy ARTMAP is capable of learning and classifying both binary and analog input patterns that are presented to the network in arbitrary order. While a back propagation ANN required 20 000 epochs to learn a benchmark (Lang & Witbrock 1989), Fuzzy ARTMAP required only 5 epochs (Carpenter et al. 1992). The architectures of all supervised ART-type ANNs consist of two ART modules (ART$_a$ and ART$_b$). These two modules are linked together through a map field, see (figures 3-1 and 3-2). For classification tasks, ART$_b$ is reduced to the input layer only, see (figure 3-3). The map field is simply an $N \times L$ array of binary weights $w_{jk}$; $j = 1, \dots, N$; $k = 1, \dots, L$, initially set to one, see (figure 3-4). $w_{JK} = 0$ means that the category node $J$ represents a class other than $K$; therefore, the node $J$ should be shut off.

Figure 3-1: Block diagram showing supervision through a map field. Two ART modules are inter-linked by a map field.

3.2.1 Vigilance parameter dynamics in supervised environment

As mentioned before, the vigilance parameter $\rho \in [0, 1]$ is the key feature of ART ANNs. It represents the minimum match value required for a committed category node to represent the current input. A match value of 1 represents perfect representation, while a match value of 0 represents no match at all. A high value leads to the generation of many category nodes representing fine subclasses, while a low value leads to fewer category nodes with coarser subclasses. If the match value of the winning category node is greater than the predetermined vigilance parameter $\rho$ while class matching fails, then the current match value, increased by a very small value $\varepsilon$, is assigned to the vigilance parameter, as shown in equation 3-1:

$$\rho = \frac{1}{M} \sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right) + \varepsilon \qquad \text{(3-1)}$$

Figure 3-2: The full architecture for supervision through a map field. ARTMAP is shown here. The dynamics of the network are very complex. All supervised ART-type ANNs that have been reported in the literature (Fuzzy ARTMAP, ART-EMAP, Gaussian ARTMAP, ARTMAP-IC, and Distributed ARTMAP) are constructed using the map field approach. Carpenter and her group represent all ART modules as three layers: the input layer $F_0$, the category layer $F_2$, and the hypothetical layer $F_1$. The assumed layer $F_1$ represents the membership $x$ between the input and the weights of the winning category node $J$.

Figure 3-3: The architecture of ARTMAP for the classification problem. Match tracking is done through the map field. If $\sum_{k} (b_k^{(t)} \wedge w_{Jk}) < \rho_{ab}$, match tracking should be conducted. This approach requires map field weights, a map field vigilance parameter, and a binary digital code for the class.

Figure 3-4: Architecture of Fuzzy ARTMAP. The full structure of ART$_b$ is not needed; it is reduced to the input layer only. All components in the upper light box belong to the supervision through the map field. The components outside the box form the original architecture of Fuzzy ART.

The vigilance parameter $\rho$ increases only during the training phase, when class matching of the winning category node fails for a specific input. The very small value $\varepsilon$ is added to the failed match value, and the result is assigned to the vigilance parameter in order to classify rare events (Carpenter et al. 1992). The vigilance dynamics in a supervised environment are shown in (figure 3-5).

If class matching has occurred, all weights of the winning node should be trained. Otherwise, a value of $-1$ is assigned to the choice value of this category node to put it out of competition for the current input (shut off). In addition to class matching, the next winning node must exceed the new vigilance parameter in order to represent the current input. The vigilance parameter stays fixed if category node $J$ fails the match test, while the match value of node $J$ is assigned to $\rho$ if $J$ fails the class matching:

$$\rho^{new} = \max\left\{\rho^{old}, \; \frac{1}{M} \sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right)\right\} + \varepsilon \qquad \text{(3-2)}$$

This step is repeated until either one of the committed category nodes can represent the current input or a new category node has to be committed. The vigilance parameter is reset to its base-line value and all committed category nodes are reactivated before a new input is presented to the network.
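The vigilance update of equation (3-2) can be sketched as follows. The function names and the default `epsilon` are assumptions made here; the sketch is meant to be called only when the winning node passed the match test but failed class matching.

```python
def match_value(A, w, M):
    """Match value (1/M) * sum_i (A_i ^ w_i) of a complement-coded input A."""
    return sum(min(x, y) for x, y in zip(A, w)) / M


def track_vigilance(rho_old, A, w_J, M, epsilon=1e-4):
    """Equation (3-2): raise the vigilance just above the match value of node J."""
    return max(rho_old, match_value(A, w_J, M)) + epsilon


# Hypothetical situation: node J matches the input well but predicts the wrong class.
A = [0.11, 0.1, 0.89, 0.9]
w_J = [0.1, 0.1, 0.88, 0.9]
rho_new = track_vigilance(0.5, A, w_J, M=2)

# Node J can no longer pass the match test for this input.
still_passes = match_value(A, w_J, 2) >= rho_new
```

After the update the rejected node falls just below the raised vigilance, so the search continues with the next best committed node until the base-line value is restored for the next input.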

3.2.2 Training phase

During the training phase, a stream of input vectors $A^{(t)}$ and a stream of input vectors $b^{(t)}$ are presented simultaneously to Fuzzy ART$_a$ and Fuzzy ART$_b$ of Fuzzy ARTMAP, respectively. The input vector $A^{(t)}$ is normalized to [0, 1] before its presentation. The input vector $b^{(t)}$ is the correct prediction given $A^{(t)}$. It is a binary digital code whose number of digits (neurons) equals the number of classes of the input data. In winner-take-all mode, all digits are equal to zero except the one that corresponds to the order of the class code, which is equal to one. However, in class fuzzy membership the sum of the input values at the input layer of Fuzzy ART$_b$ is equal to one, that is,

$$\sum_{k=1}^{L} b_k^{(t)} = 1$$

Figure 3-5: Sketch showing match tracking. The x-axis represents the ranking of all committed category nodes according to their choice values $T_j$. The y-axis represents the match value of each category node. The thin line that runs along the solid line represents the match value before adding $\varepsilon$. A new node must be committed, because none of the committed category nodes can represent the current input.

If the match value of the winning node $J$ is greater than the vigilance parameter $\rho_a$, the class matching should be tested. Thus;

If ($\sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) \geq \rho_{ab}$) Then

Class matching: weights should be updated for node $J$;

$$w_{iJ}^{new} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{old}\right) + (1 - \beta)\, w_{iJ}^{old} ; \quad i = 1, \dots, 2M$$

$$w_{Jk}^{new} = \beta \left(b_k^{(t)} \wedge w_{Jk}^{old}\right) + (1 - \beta)\, w_{Jk}^{old} ; \quad k = 1, \dots, L$$

Else

Match tracking:

$$\rho_a = \frac{1}{M} \sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right) + \varepsilon$$

where $\rho_{ab} \in (0, 1]$ is the map field vigilance parameter and $w_{Jk}$ is the weight vector that connects the winning node $J$ in the category layer with all nodes of the map field. All weight values are initially set to 1 (i.e. $w_{jk} = 1$; $j = 1, \dots, N$ and $k = 1, \dots, L$), where $L$ is the total number of nodes at the map field, which is equal to the number of classes of the input data. If class matching fails (the class matching value is less than the predetermined map field vigilance parameter $\rho_{ab}$), the vigilance parameter $\rho_a$ should be increased to just above the match value of the selected category node by a small value $\varepsilon$. This is called match tracking. It is an internal control mechanism that maximizes code compression and minimizes predictive errors. The vigilance parameter of the map field $\rho_{ab}$, however, is fixed during the learning phase. When a winning committed category node fails to pass either the required confidence level or class matching, it is shut off for the duration of the input. The network repeats this until either one of the committed category nodes can represent the current input or a new category node has to be committed.

If a new category node has to be committed (all committed category nodes failed to represent the current input), its weights will be updated as follows for the normal learning case (β < 1);

$w_{iJ}^{first} = \beta A_i^{(t)} + (1 - \beta)$ ; i = 1, ..., 2M    (3-4)

$w_{Jk}^{first} = \beta b_k^{(t)} + (1 - \beta)$ ; k = 1, ..., L    (3-5)

For the fast learning case (β = 1), the values of the current input A(t) will be assigned to the input weights of the new node and the values of b(t) will be assigned to its map field weights. Fast learning, for a newly committed category node, is recommended to classify rare events. The value of β depends on the amount and type of the data under consideration. It is clear from the above formulas that if β = 0 no learning will occur, since the weights will not change, being fixed at 1 during training.
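The commitment rule of equations (3-4) and (3-5) can be illustrated with a minimal Python sketch. The function name `commit_weights` and the sample vectors are hypothetical and only serve the illustration; the sketch assumes that all weights of an uncommitted node start at 1, as stated above.

```python
def commit_weights(pattern, beta):
    """Weights of a newly committed node whose weights start at 1.

    Applies w = beta * x + (1 - beta) to every component x, which is the
    form of equations (3-4) and (3-5). beta = 1 copies the pattern
    (fast learning); beta = 0 leaves every weight at its initial value 1.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [beta * x + (1.0 - beta) for x in pattern]


A = [0.2, 0.7, 0.8, 0.3]   # complement-coded input (hypothetical values)
b = [0.0, 1.0, 0.0]        # winner-take-all class code for class 2

fast = commit_weights(A, beta=1.0)     # equals the input pattern
frozen = commit_weights(A, beta=0.0)   # all weights remain 1: no learning
half = commit_weights(b, beta=0.5)     # weights move halfway towards b
```

With β = 0 the result stays at the initial value 1, which is the "no learning" case noted in the text.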

More details about the Fuzzy ARTMAP architecture and algorithm can be found in (Carpenter et al. 1992, 1997).

3.2.3 Classification phase

At the end of the training phase the weights wij and wjk are fixed. The network is ready for classification. Input patterns are presented to ARTa without class code. The choice value is computed for all committed category nodes. The category node with the maximum choice value is determined. The score of the winning category node J at each node at the input of ARTb is computed by;

$b_k^{(t)} = \frac{w_{Jk}}{\sum_{k=1}^{L} w_{Jk}}$ ; k = 1, ..., L    (3-6)

The node with the maximum score bK at the input layer of ARTb is determined. The index K of this node will be the class code of the current input.
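As a rough illustration of equation (3-6), the following Python sketch normalises the map field weights of the winning node and returns the index of the largest score. The function names and the sample weight vector are assumptions made for this example only.

```python
def map_field_scores(w_J):
    """Scores b_k of equation (3-6): map field weights of node J, normalised."""
    total = sum(w_J)
    if total == 0:
        raise ValueError("winning node has no map field weight")
    return [w / total for w in w_J]


def predict_class(w_J):
    """Return the 1-based class code K whose score b_K is the maximum."""
    scores = map_field_scores(w_J)
    return scores.index(max(scores)) + 1


w_J = [0.0, 0.25, 0.75, 0.0]   # hypothetical map field weights of node J
scores = map_field_scores(w_J)
K = predict_class(w_J)
```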

3.3 Full algorithm of Fuzzy ARTMAP

The full algorithm of Fuzzy ARTMAP is listed below. The supervision of Compact Fuzzy ART, which has been developed in this work, will be used. The Fuzzy ARTMAP algorithm, which uses the original Fuzzy ART, is listed in (Carpenter et al. 1997).

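3.3.1 Training algorithm of Fuzzy ARTMAP

i- $\bar{\rho}$ ∈ [0, 1]: Base-line vigilance parameter.

ii- ρab ∈ (0, 1]: Map field vigilance parameter.

iii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.

iv- α > 0: The choice value parameter.

c) Initialization;

ii- Number of committed category nodes C = 1.

2) New input;

$A_i^{(t)} = a_i^{(t)}$ for 1 ≤ i ≤ M, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for M+1 ≤ i ≤ 2M

$b_k^{(t)}$ ; k = 1, ..., L.

3) Compute the choice function for all committed category nodes;

$T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}$ ; j = 1, ..., C

4) Determine the node J, which has the maximum choice value;

$T_J^{(t)} = \max\{T_j^{(t)}\}$ ; j = 1, ..., C

5) If ($T_J^{(t)} = -1$: all committed category nodes are in shut-off mode) then a new node (the node whose order in the category layer is C+1) should be committed;

i- Increase the number of committed nodes by one;

C = C + 1

ii- If in fast-commit slow-record mode;

$w_{iC}^{new} = A_i^{(t)}$ ; i = 1, ..., 2M

$w_{Ck}^{new} = b_k^{(t)}$ ; k = 1, ..., L

Else (normal mode)

$w_{iC}^{new} = \beta A_i^{(t)} + (1 - \beta)$ ; i = 1, ..., 2M

$w_{Ck}^{new} = \beta b_k^{(t)} + (1 - \beta)$ ; k = 1, ..., L

iii- GOTO STEP (2)

6) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;

i- Shut off node J to put it out of competition;

$T_J^{(t)} = -1$

ii- GOTO STEP (4)

7) Class matching: If ($\sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) < \rho_{ab}$) then;

i- Shut off node J;

$T_J^{(t)} = -1$

ii- Raise ρ to the limit that deactivates node J;

$\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon$

iii- GOTO STEP (4)

8) Learning;

$w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}$ ; i = 1, ..., 2M

$w_{Jk}^{new} = \beta (b_k^{(t)} \wedge w_{Jk}^{old}) + (1 - \beta) w_{Jk}^{old}$ ; k = 1, ..., L

9) If (t < Pt) then;

i- t = t + 1

ii- $\rho = \bar{\rho}$

iii- GOTO STEP (2)

10) Training has been done. The network is ready for classification.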

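3.3.2 Classification algorithm of Fuzzy ARTMAP

1) New input;

$A_i^{(t)} = a_i^{(t)}$ for 1 ≤ i ≤ M, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for M+1 ≤ i ≤ 2M

2) Compute the choice function for all committed category nodes;

$T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}$ ; j = 1, ..., C

3) Determine the node J, which has the maximum choice value;

$T_J^{(t)} = \max\{T_j^{(t)}\}$ ; j = 1, ..., C

4) Matching criterion:

If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;

i- The network cannot determine the class code of this input.

ii- GOTO STEP (7).

5) Class matching:

$b_k^{(t)} = \frac{w_{Jk}}{\sum_{k=1}^{L} w_{Jk}}$ ; k = 1, ..., L

If ($\sum_{k=1}^{L} (b_k^{(t)} \wedge w_{Jk}) < \rho_{ab}$) then

i- The network cannot determine the class of this input.

ii- GOTO STEP (7).

6) Class assigning:

i- $b_K^{(t)} = \max\{b_k^{(t)}\}$ ; k = 1, ..., L

ii- K is the class code of the current input.

7) If more classification is needed GOTO STEP (1).

8) Classification has been done.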

CHAPTER IV

SUPERVISED ART-I ANN

4.1 Introduction

As has been mentioned, the map field approach is the only approach that has been addressed in the literature for the supervision of ART-type ANNs. Using the map field approach, two modules of ART ANNs are required. Moreover, the map field supervision approach forces one to present the class code as an L-digit binary code, where L is the number of classes. The binary digital coding is employed by setting all the class code digits to zero, except the one that corresponds to the order of the class code, which is set to 1. The class code should be presented to the network as follows: class code #1 as (1 0 0 . . . 0), class code #2 as (0 1 0 . . . 0), class code #3 as (0 0 1 . . . 0), ..., class code #L as (0 0 0 . . . 1). Furthermore, training with hard samples (each training exemplar represents a single class, which is the normal case) requires additional (false) learning for the weights of the map field. The learning is false because the initial values of all weights that connect the category node with the map field are equal to one. During training all these weights drop to zero except one, whose value remains equal to one. This is the weight that connects the category node with the corresponding node of the map field that represents its class, see (figure 4-1). So, the value of the map field vigilance parameter ρab does not affect the efficiency of learning or the accuracy of classification. Therefore, ρab just needs to be a positive fractional number in (0, 1], because

the match value at the map field is either equal to one or zero. So, the value of ρab does not affect the class matching process. Class matching using the map field approach is done as follows;

$\sum_{k=1}^{L} (b_k \wedge w_{Jk}) = 1$ for class matching, and $= 0$ for class correction    (4-1)

[Figure 4-1. Panel a, rows: the initial map field weights; the class code bk for class #4; each map field node bk ∧ wJk; total map field Σ(bk ∧ wJk). Panel b, rows: map field weights for class #4; the class code bk for class #2; each map field node bk ∧ wJk; total map field Σ(bk ∧ wJk).]

Figure 4-1: The map field supervision approach: a) A newly committed category node, which represents class code #4. The initial map field weights are equal to one. The class code for class #4 is a binary digital code with all digits equal to zero except digit #4, which is equal to one. After learning, the map field weights of the newly committed category node are equal to zero except the fourth one, which is equal to one. So, the code of class #4 has been stamped on the map field weights of this newly committed category node to represent class #4. b) A committed category node that represents class code #4 is rejected as a representative of class #2.

In addition to the requirement of two modules of ART, the map field approach leads to more computation through the map process and weight learning. Moreover, it requires more memory.
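The claim that the map field match is either one or zero for hard samples can be checked with a small Python sketch that mirrors the two panels of figure 4-1. The helper names are hypothetical, and fast learning (β = 1) is assumed for the stamping step.

```python
def one_hot(class_code, n_classes):
    """L-digit binary class code: 1 at the position of the class, 0 elsewhere."""
    return [1.0 if k == class_code else 0.0 for k in range(1, n_classes + 1)]


def map_field_match(b, w_J):
    """Class matching value: sum over k of (b_k AND w_Jk), fuzzy AND = min."""
    return sum(min(bk, wk) for bk, wk in zip(b, w_J))


L = 5
w_initial = [1.0] * L                     # map field weights before learning
b4 = one_hot(4, L)

# Panel a: a new node learns class #4 with beta = 1, so w = b AND w.
w_class4 = [min(bk, wk) for bk, wk in zip(b4, w_initial)]

match_same = map_field_match(b4, w_class4)              # class matching
match_other = map_field_match(one_hot(2, L), w_class4)  # class correction
```

Because the match takes only the values 0 and 1, any ρab in (0, 1] gives the same decision, which is the argument made above.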

This chapter describes a new simplified supervised ANN architecture, which is constructed from a single module of ART, called Supervised ART-I (Al-Rawi 1999). The supervision of the new simplified version of Fuzzy ART (Compact Fuzzy ART), which has been developed in chapter II, is described here. This new ANN has a simple architecture and thus reduces the computational complexity, while keeping its accuracy at the same level as Fuzzy ARTMAP. In addition, it has fewer parameters and requires less memory. Moreover, if hardware is developed, the cost will be much lower than that of the map field approach.

The layout of this chapter is as follows: Section 4.2 describes the architecture of Supervised ART-I, data representation, training phase, and testing phase. The full algorithm is listed in section 4.3. Section 4.4 includes the discussion.

4.2 Supervised ART-I

Like Fuzzy ARTMAP, the newly developed ANN Supervised ART-I is able to learn and classify binary and analog multi-valued input patterns presented in an arbitrary order. It has the same classification accuracy as Fuzzy ARTMAP with a simpler architecture. This leads to a simpler mathematical construction of the network, a reduction in the number of parameters, a reduction in the memory requirement, and finally, a reduction of both training and classification time. The supervision approach of Supervised ART-I is shown in (figure 4-2).

4.2.1 Architecture of Supervised ART-I

The Supervised ART-I architecture is constructed from a single Fuzzy ART, instead of two, as in Fuzzy ARTMAP. The full architecture of Supervised ART-I is shown in (figure 4-3). This leads to the elimination of the map field, and therefore, the elimination of the map field weights and the map field vigilance parameter (ρab) of Fuzzy ARTMAP. This has been achieved by two different processes: 1) Employing analog class coding (it is convenient to use positive integers) instead of the necessary binary digital coding in Fuzzy ARTMAP; 2) Introducing a one-dimensional memory, running along all the category nodes N of the F2 field, which is used to tag each newly committed node with the code of the class that it belongs to. This memory of size N represents just (1/L)th of the eliminated memory that is occupied by the weights of the map field wjk. Since all category nodes are connected to the map field, the total number of weights connected to the map field is N×L, where N is the total capacity of the network.

Class matching in Supervised ART-I is much simpler than that in Fuzzy ARTMAP. It is done simply by reading the tag value of the winning category node. When a node is committed during the training phase, the class code of the input pattern that forces it to be committed is assigned to the memory of its tag value. Each committed category node has only one tag value because each node can represent only one class. However, more than one category node can represent the same class. Therefore, each category node can be seen as a representation of a subclass of the class to which it belongs.
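The memory argument and the tag-based class matching can be sketched in a few lines of Python. The function names and the sample sizes (N = 1000 nodes, L = 8 classes) are hypothetical values chosen only for illustration.

```python
def supervision_memory(n_nodes, n_classes):
    """Memory cells needed for supervision under each approach."""
    return {
        "map_field_weights": n_nodes * n_classes,  # w_jk is an N x L array
        "tags": n_nodes,                           # one integer tag per node
    }


def class_match(tags, J, b):
    """Class matching in the tagging approach: read the tag of node J."""
    return tags[J] == b


memory = supervision_memory(n_nodes=1000, n_classes=8)
ratio = memory["tags"] / memory["map_field_weights"]   # equals 1 / L

tags = [3, 1, 3]   # nodes 0 and 2 are two subclasses of class 3
```

Two nodes carrying the same tag, as in the last line, correspond to two subclasses of one class.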

Figure 4-2: Supervision dynamic of the tagging approach of Supervised ART-I. The class code is an integer. Match tracking is conducted by checking the tag of the winning committed category node J against the class code b. This replaces the complicated map field approach.

Figure 4-3: Architecture of Supervised ART-I. Supervision is done using the tagging approach. When a node is committed, it is tagged with the class code of the input features that force it to be committed.

4.2.2 Data Description

The multi-valued input patterns A(t) can be presented to Supervised ART-I in both binary and analog forms. However, input data should be normalized to [0, 1] before their presentation. Since some of the weight values sometimes erode to zero, it is recommended to introduce A(t) in the complement form to avoid the category proliferation problem.

The class code b(t) is not a binary vector as in the case of the map field supervision approach, but a positive integer number which represents the class code of the input pattern A(t) (b = 1, 2, ..., etc.). b = 1 represents class number 1, b = 2 represents class number 2, and so on.
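A minimal Python sketch of complement coding follows. The function name `complement_code` is an assumption; the sketch only shows that an M-feature pattern becomes a 2M-component pattern whose components always sum to M.

```python
def complement_code(a):
    """Return the complement-coded pattern A = (a, 1 - a) of length 2M."""
    if any(x < 0.0 or x > 1.0 for x in a):
        raise ValueError("features must be normalized to [0, 1] first")
    return list(a) + [1.0 - x for x in a]


a = [0.25, 1.0]            # M = 2 normalized features (hypothetical values)
A = complement_code(a)     # 2M = 4 components
b = 2                      # integer class code presented with the pattern
```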

4.2.3 Training of Supervised ART-I

During the training phase, a stream of multi-valued input patterns A(t) and the class codes b(t) are introduced simultaneously to the network. The choice function is computed for each committed node according to equation (2-1).

The network selects the committed category node J, which 1) has the maximum choice value among all the committed category nodes (in the F2 field) and 2) has a match value greater than or equal to the vigilance parameter ρ;

$\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) \ge M\rho$    (4-2)

If the tag value of the winning category node matches the current class code b(t), the node will be trained. Otherwise match tracking should be conducted;

If (Tag(J) = b(t)) then

Weights updating;

$w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}$ ; i = 1, ..., 2M

Else

Match tracking;

$\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon$

Otherwise, class correction should be conducted by increasing the vigilance parameter ρ above the match value of this node by a small value ε, and another committed node is chosen. This sacrifices generalization to correct the predictive error. Any committed category node that failed to represent the current input must be shut off, as long as this input is on, in order to prevent its reselection. A category node is shut off by assigning -1 to its choice value. This works because all category nodes that are not in shut-off mode have a positive choice value. If the failed category node were not shut off, the network would enter an infinite reselect-fail loop: it would reselect the same category node, and the node would fail the match criterion again. If none of the committed category nodes is able to represent the current input A(t) (all committed category nodes are in shut-off mode), a new category node is committed and is tagged immediately with the class code b(t) of the current input pattern. Such action is needed when the maximum choice value is the value of a shut-off node (TJ = -1). In the fast-learning slow-record option, the values of the input features which forced the category node to be committed are assigned to its weight values. This lets the network deal better with noisy data, so rare events can be classified. If the network is in normal mode, the weights of the newly committed category node will be as in equation (2-16).

As in all supervised ART-type ANNs, the vigilance parameter ρ should be reset to its base-line value $\bar{\rho}$ before a new input pattern is introduced to the network.

4.2.4 Classification by Supervised ART-I

During the testing phase, only the input pattern A(t) is introduced to the network. The choice function is computed for all committed category nodes. The category node with the maximum choice function is determined among all committed category nodes. If the match value of the winning category node J passes the base-line vigilance parameter $\bar{\rho}$, then the tag of the category node J represents the class code of the current input pattern A(t). If not, the network cannot determine the class code of the current input.

4.3 Algorithm of Supervised ART-I

4.3.1 Training Algorithm of Supervised ART-I



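i- $\bar{\rho}$ ∈ [0, 1]: Base-line vigilance parameter.

ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.

iii- L: The number of classes.

c) Initialization;

ii- Number of committed category nodes C = 1.

2) New input;

$A_i^{(t)} = a_i^{(t)}$ for 1 ≤ i ≤ M, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for M+1 ≤ i ≤ 2M

b(t): the class code of the current input A(t).

3) Compute the choice function for all committed category nodes;

$T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}$ ; j = 1, ..., C

4) Determine the node J, which has the maximum choice value;

$T_J^{(t)} = \max\{T_j^{(t)}\}$ ; j = 1, ..., C

5) If $T_J^{(t)} = -1$ (all committed category nodes are in shut-off mode) then a node (the node whose order in the category layer is C+1) should be committed;

i- Increase the number of committed nodes by one and tag the new node;

C = C + 1

Tag(C) = b(t)

ii- If in fast-learning slow-record mode;

$w_{iC} = A_i^{(t)}$ ; i = 1, ..., 2M

Else (normal mode)

$w_{iC} = \beta A_i^{(t)} + (1 - \beta)$ ; i = 1, ..., 2M

iii- GOTO STEP (2)

6) Matching criterion: If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;

i- Shut off node J to put it out of competition;

$T_J^{(t)} = -1$

ii- GOTO STEP (4)

7) Class matching: If (Tag(J) ≠ b(t)) then;

i- Shut off node J;

$T_J^{(t)} = -1$

ii- Raise ρ to the limit that deactivates node J;

$\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) + \varepsilon$

iii- GOTO STEP (4)

8) Learning;

$w_{iJ}^{new} = \beta (A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}$ ; i = 1, ..., 2M

9) If (t < Pt) then;

i- t = t + 1

ii- $\rho = \bar{\rho}$

iii- GOTO STEP (2)

10) Training has been done. The network is ready for classification.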
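The presentation of a single training pattern (steps 2 to 8 above) can be sketched in Python as follows. This is an illustrative sketch, not the thesis implementation: the function names, the use of Python lists, the default values of `alpha` and `eps`, and the sample patterns are all assumptions. Shut-off is modelled, as in the text, by setting the choice value to -1.

```python
def fuzzy_and_sum(A, w):
    """Sum over i of (A_i AND w_i), with the fuzzy AND taken as min."""
    return sum(min(a, x) for a, x in zip(A, w))


def train_pattern(A, b, weights, tags, rho_base, beta=1.0, alpha=0.001, eps=1e-6):
    """Present one complement-coded pattern A with integer class code b.

    weights: list of weight vectors of the committed nodes (mutated).
    tags:    list of class codes, one per committed node (mutated).
    Returns the index of the node that learned or was committed.
    """
    M = len(A) // 2
    rho = rho_base                       # vigilance starts at its base-line
    T = [fuzzy_and_sum(A, w) / (alpha + sum(w)) for w in weights]
    while True:
        J = max(range(len(T)), key=T.__getitem__) if T else None
        if J is None or T[J] == -1.0:
            # Every committed node is shut off: commit and tag a new node.
            weights.append([beta * a + (1.0 - beta) for a in A])
            tags.append(b)
            return len(weights) - 1
        match = fuzzy_and_sum(A, weights[J]) / M
        if match < rho:                  # matching criterion failed
            T[J] = -1.0
            continue
        if tags[J] != b:                 # class matching failed
            T[J] = -1.0
            rho = match + eps            # match tracking
            continue
        weights[J] = [beta * min(a, w) + (1.0 - beta) * w
                      for a, w in zip(A, weights[J])]
        return J


weights, tags = [], []
first = train_pattern([0.2, 0.8, 0.8, 0.2], 1, weights, tags, rho_base=0.5)
second = train_pattern([0.9, 0.1, 0.1, 0.9], 2, weights, tags, rho_base=0.5)
third = train_pattern([0.25, 0.75, 0.75, 0.25], 1, weights, tags, rho_base=0.5)
fourth = train_pattern([0.25, 0.75, 0.75, 0.25], 2, weights, tags, rho_base=0.5)
```

In the last call the pattern already belongs to a node tagged with class 1, so match tracking raises ρ, every committed node is shut off, and a third node tagged with class 2 is committed.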

4.3.2 Classification algorithm of Supervised ART-I

1) New input;

$A_i^{(t)} = a_i^{(t)}$ for 1 ≤ i ≤ M, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for M+1 ≤ i ≤ 2M

2) Compute the choice function for all committed category nodes;

$T_j^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{ij})}{\alpha + \sum_{i=1}^{2M} w_{ij}}$ ; j = 1, ..., C

3) Determine the node J, which has the maximum choice value;

$T_J^{(t)} = \max\{T_j^{(t)}\}$ ; j = 1, ..., C

4) Matching criterion:

If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) < M\rho$) then;

The network cannot determine the class of this input.

Else

Tag(J) is the class code of the current input.

5) If more classification is needed GOTO STEP (1).

6) Classification has been done.
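The classification steps above can be sketched as a single Python function. The function name, the default `alpha`, and the two example nodes are hypothetical; `None` stands for "the network cannot determine the class".

```python
def classify(A, weights, tags, rho, alpha=0.001):
    """Return the tag of the winning node, or None if the match test fails."""
    if not weights:
        return None
    M = len(A) // 2
    overlaps = [sum(min(a, x) for a, x in zip(A, w)) for w in weights]
    T = [o / (alpha + sum(w)) for o, w in zip(overlaps, weights)]
    J = max(range(len(T)), key=T.__getitem__)   # maximum choice value
    if overlaps[J] < M * rho:                   # matching criterion
        return None
    return tags[J]


weights = [[0.2, 0.8, 0.8, 0.2], [0.9, 0.1, 0.1, 0.9]]
tags = [1, 2]

near_class_1 = classify([0.25, 0.75, 0.75, 0.25], weights, tags, rho=0.8)
exact_class_2 = classify([0.9, 0.1, 0.1, 0.9], weights, tags, rho=0.8)
unknown = classify([0.5, 0.5, 0.5, 0.5], weights, tags, rho=0.9)
```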

4.4 Discussion

The supervision approach of Supervised ART-I is more powerful than the map field approach. The Supervised ART-I neural network has a simple architecture and a simple mathematical construction, which reduces the time of both the training and the classification phases. In addition, it has fewer parameters and requires less memory than Fuzzy ARTMAP.

The one-dimensional memory N introduced to store the tag value of each committed node in Supervised ART-I represents only (1/L)th of the eliminated two-dimensional memory wjk (N, L), which connects the nodes of the F2a field with the nodes of the map field Fab, see (figure 4-3).

Using positive integer numbers for class coding in Supervised ART-I is easier to handle than the necessary binary digital coding in the map field approach of Fuzzy ARTMAP, especially when there is a large number of classes, as in character recognition and remote sensing tasks.

Supervised ART-I, as follows from its simple architecture, sharply decreases the training and classification times of Fuzzy ARTMAP. However, it is theoretically known that both of them have the same classification accuracy. The supervision approach which has been developed in this chapter is used for the supervision of Compact Fuzzy ART. However, it can be applied to the supervision of all ART architectures.

CHAPTER V

SUPERVISED ART-II ANN

5.1 Introduction

It has been shown in the previous chapter that Supervised ART-I has a simpler architecture, fewer parameters, and requires less memory than Fuzzy ARTMAP. This leads to quicker learning and classifying algorithms, with the same accuracy as Fuzzy ARTMAP. The great achievement of Supervised ART-I is that its architecture has been built from a single module of ART, instead of a pair of them as in all other supervised ART-type ANNs. This led to the elimination of the map field. The supervision approach of Supervised ART-I can be applied to all supervised ART-type architectures that have been addressed in the literature.

This chapter deals with constructing a new generation of Supervised ART-I, called Supervised ART-II (Al-Rawi et al. 1999). As will be shown, it learns non-homogeneous data more quickly and requires less memory than Supervised ART-I. Nevertheless, the classification accuracy and number of parameters are the same as those of Supervised ART-I.

5.2 Supervised ART-II

5.2.1 Architecture of Supervised ART-II

Supervised ART-II, like Supervised ART-I, has been built from a single Fuzzy ART module, see (figure 5-1). The one-dimensional memory N of the category nodes of Supervised ART-I is divided into L one-dimensional memories (Nk; k = 1, ..., L) in Supervised ART-II, see (figure 5-2). Each of these one-dimensional memories Nk is called a "stack". The stack number k represents the class code for all its committed category nodes. The memory requirement for Supervised ART-II is less than that of Supervised ART-I. One label is assigned to each stack, rather than tagging each individual category node as in Supervised ART-I. The memory field that represents the class code in Supervised ART-II is a one-dimensional array of length L.

The sizes Nk (number of nodes which are available to be committed) of the stacks are not necessarily equal. They depend on the nature and size of the data of each class. However, if no previous knowledge about the data is available, an equal memory size is recommended. In the case of using an equal memory size for all stacks, the memory field of the category layer is an Nk × L matrix. The dynamic of Supervised ART-II is shown in (figure 5-1), and the full architecture is shown in (figure 5-2).

Figure 5-1: Supervision dynamic of the stacking approach of Supervised ART-II. As in Supervised ART-I, the class code is an integer. Match tracking is conducted by checking the stack K of the winning category node against the class code b. When all committed category nodes of the stack b fail to represent the input features, a new node should be committed. This is because committed category nodes of other stacks will not pass the class matching. This sharply decreases the training time.

5.2.2 Training of Supervised ART-II

During the training phase, a stream of multi-valued input patterns A(t) and their class codes b(t) are introduced simultaneously to the network. The choice function is computed for each committed node in all the stacks;

$T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k k})}{\alpha + \sum_{i=1}^{2M} w_{i j_k k}}$ ; jk = 1, ..., C(k); k = 1, ..., L    (5-1)

Where C(k) is the number of committed nodes in stack number k, and $w_{i j_k k}$ are the weights which connect each category node jk in each stack k with the input node i. The node which has the maximum choice value is determined for each stack;

$T_{J_k}^{(t)} = \max\{T_{j_k k}^{(t)}\}$ ; jk = 1, ..., C(k)    (5-2)

These maximum choice value nodes are the candidates of their stacks to represent the current input. The node which has the highest choice value $T_{J_K}^{(t)}$ among all the candidate nodes is chosen to represent the current input, see (figure 5-3);

$T_{J_K}^{(t)} = \max\{T_{J_k}^{(t)}\}$ ; Jk = J1, ..., JL    (5-3)

The match value is computed for the winning node in the following way;

match value = $\frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i J K})$    (5-4)

Figure 5-2: Architecture of Supervised ART-II, showing the input layer, the category layer, and the class code. Input nodes are connected to all committed nodes. Committed nodes are shown in dark. Uncommitted nodes are shown in light.

At this stage it can be checked whether a new category node should be committed from the stack that represents b(t). This can be done simply by checking the choice value of the candidate of that stack. If it is equal to -1, a category node in shut-off mode has been selected. This happens only when all committed category nodes of the stack that represents the current input are in shut-off mode. In this case committed category nodes of other stacks are not required to be tested since they will not pass the class matching. This saves time during the learning phase.

Figure 5-3: Determination of the winning node in the stacking supervision approach of Supervised ART-II. The maximum choice value node for each stack is selected. The winning node among these candidates is determined. When the winning node fails, a new candidate for its stack is presented. The winning category node among all candidates is re-determined.

If the match value of the winning node has passed the vigilance parameter ρ, the class matching should be checked. The class matching is passed if the stack label K of the winning node matches the current class code b(t). Then the weights of the winning node are trained, otherwise match tracking should be conducted;

If (K = b(t)) then

Weights updating:

$w_{iJK}^{new} = \beta (A_i^{(t)} \wedge w_{iJK}^{old}) + (1 - \beta) w_{iJK}^{old}$ ; i = 1, ..., 2M

Else

Match tracking should be conducted

If either the match value or the class matching for the current node has failed, a value of -1 is assigned to the choice value of this node. This puts it out of competition. Another node with the maximum choice value should be selected among the committed nodes of the stack K only. This node is the new candidate for its stack. The candidates of all other stacks remain the same. The candidate node which has the highest choice value among all the candidate nodes of all the stacks is redetermined. This process should be repeated until either one of the committed nodes can represent the current input, in which case its weights are trained, or a new node has to be committed from the stack which represents the class code of the current input b(t). This way all the committed category nodes are stacked according to their classes.

When a new category node is committed, its weights are assigned the values of the current input A(t), which forced it to be committed, that is;

$C(b) = C(b) + 1$    (5-5)

$w_{i,C(b),b} = A_i^{(t)}$ ; i = 1, ..., 2M    (5-6)

Where C(b) is the number of committed nodes in the stack that represents the class code b. Therefore, initial values of the weights are not required. The weight update for a newly committed category node in equation 5-6 represents the fast-learning slow-record option, which is recommended to classify rare events. In the normal mode the weight update should be conducted by;

$w_{i,C(b),b} = \beta A_i^{(t)} + (1 - \beta)$ ; i = 1, ..., 2M    (5-7)

As in all supervised ART-type architectures, if class matching has failed, the vigilance parameter ρ should be increased to the limit at which the failed category node is deactivated. Thus, the vigilance parameter should be assigned the match value of the current category node plus a small value ε in order to perform the class correction;

$\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) + \varepsilon$    (5-8)

However, the vigilance parameter should be relaxed to its base value before introducing the next input; $\rho = \bar{\rho}$, where $\bar{\rho}$ is the predetermined minimum accepted matching value.

5.2.3 Classification by Supervised ART-II

During the classification phase, the node which has the highest choice value $T_{J_K}^{(t)}$ among all the committed nodes is selected to represent the current input;

$T_{J_K}^{(t)} = \max\{T_{j_k k}^{(t)}\}$ ; jk = 1, ..., C(k); k = 1, ..., L    (5-9)

If the match value of the winning node is greater than or equal to the vigilance parameter ρ, the current input belongs to class K. If not, the network cannot determine the class of the current input.

5.3 Full algorithm of Supervised ART-II

5.3.1 Training algorithm of Supervised ART-II



i- $\bar{\rho}$ ∈ [0, 1]: Base-line vigilance parameter.

ii- β ∈ (0, 1]: The dynamic learning parameter; β = 1 for fast learning.

iii- α > 0: The choice value parameter.

iii- L: The number of classes.

c) Initialization;

ii- Number of committed category nodes for each stack (class) is set to zero;

C(k) = 0; k = 1, ..., L

2) New input;

$A_i^{(t)} = a_i^{(t)}$ for 1 ≤ i ≤ M, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for M+1 ≤ i ≤ 2M

b(t): the class code of the current input A(t).

3) If (no node has been committed for the current class b(t) (i.e. C(b) = 0)) then;

GOTO STEP (6i)

4) For all the stacks; k = 1, ..., L:

If C(k) = 0 (no category node has been committed yet for this class) then;

$T_{J_k}^{(t)} = -1$

Else

Compute the choice function for all committed category nodes;

$T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k k})}{\alpha + \sum_{i=1}^{2M} w_{i j_k k}}$ ; jk = 1, ..., C(k)

5) For each stack k, determine the node Jk, which has the maximum choice value;

$T_{J_k}^{(t)} = \max\{T_{j_k k}^{(t)}\}$ ; jk = 1, ..., C(k)

6) If ($T_{J_b}^{(t)} = -1$), a new node should be committed for class b(t);

i- Increase the number of committed nodes for class b(t) by one;

C(b(t)) = C(b(t)) + 1

ii- If (fast-learning slow-record mode) then;

$w_{i,C(b),b} = A_i^{(t)}$ ; i = 1, ..., 2M

Else

$w_{i,C(b),b} = \beta A_i^{(t)} + (1 - \beta)$ ; i = 1, ..., 2M

iii- GOTO STEP (11)

7) Determine the node which has the highest choice value among all the maximum choice value nodes of all the stacks;

$T_{J_K}^{(t)} = \max\{T_{J_k}^{(t)}\}$ ; k = 1, ..., L

8) Match testing: For category node JK;

If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) < M\rho$) then;

i- Shut off category node JK;

$T_{J_K}^{(t)} = -1$

ii- Redetermine the node which has the maximum choice value for this stack K. k = K: GOTO STEP (5).

9) Match tracking:

If (K ≠ b(t)) the stack number K of the winning category node does not match the current class code. Match tracking should be conducted;

i- Shut off category node JK during the current input A(t);

$T_{J_K}^{(t)} = -1$

ii- Raise ρ to the limit that deactivates node JK;

$\rho = \frac{1}{M} \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) + \varepsilon$

iii- GOTO STEP (5)

10) Learning;

$w_{iJK}^{new} = \beta (A_i^{(t)} \wedge w_{iJK}^{old}) + (1 - \beta) w_{iJK}^{old}$ ; i = 1, ..., 2M

11) If the number of iterations t < Pt then;

i- t = t + 1

ii- $\rho = \bar{\rho}$

iii- GOTO STEP (2)

12) The training has been done. The network is ready for classification.
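The stacking procedure for one training pattern can be sketched in Python as below. This is a sketch under assumptions, not the thesis code: stacks are modelled as a dictionary from class code to a list of weight vectors, stack capacity limits are ignored, and the names `train_pattern_stacked`, `argmax`, `alpha`, and `eps` are hypothetical. After a failure only the candidate of the failing stack K is redetermined, as described in step 8.

```python
def overlap(A, w):
    """Sum over i of (A_i AND w_i), with the fuzzy AND taken as min."""
    return sum(min(a, x) for a, x in zip(A, w))


def argmax(values):
    """Index of the first maximum of a non-empty list."""
    return max(range(len(values)), key=values.__getitem__)


def train_pattern_stacked(A, b, stacks, rho_base, beta=1.0, alpha=0.001, eps=1e-6):
    """Present pattern A with class code b; stacks maps class -> weight vectors.

    Returns (K, j): the stack label and node index that learned the pattern.
    """
    M = len(A) // 2
    rho = rho_base
    own = stacks.setdefault(b, [])
    T = {k: [overlap(A, w) / (alpha + sum(w)) for w in nodes]
         for k, nodes in stacks.items()}
    cand = {k: argmax(t) for k, t in T.items() if t}   # one candidate per stack
    while True:
        # A new node is needed as soon as stack b has no active node left.
        if b not in cand or T[b][cand[b]] == -1.0:
            own.append([beta * a + (1.0 - beta) for a in A])
            return b, len(own) - 1
        K = max(cand, key=lambda k: T[k][cand[k]])     # winner among candidates
        J = cand[K]
        match = overlap(A, stacks[K][J]) / M
        if match < rho or K != b:
            if match >= rho:
                rho = match + eps                      # match tracking
            T[K][J] = -1.0                             # shut off node J of stack K
            cand[K] = argmax(T[K])                     # new candidate for stack K only
            continue
        stacks[K][J] = [beta * min(a, w) + (1.0 - beta) * w
                        for a, w in zip(A, stacks[K][J])]
        return K, J


stacks = {}
r1 = train_pattern_stacked([0.2, 0.8, 0.8, 0.2], 1, stacks, rho_base=0.5)
r2 = train_pattern_stacked([0.9, 0.1, 0.1, 0.9], 2, stacks, rho_base=0.5)
r3 = train_pattern_stacked([0.25, 0.75, 0.75, 0.25], 1, stacks, rho_base=0.5)
r4 = train_pattern_stacked([0.25, 0.75, 0.75, 0.25], 2, stacks, rho_base=0.5)
```

In the second call stack 2 is empty, so a node is committed without testing stack 1 at all, which is the time saving described in the text.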

5.3.2 Classification algorithm of Supervised ART-II

1) New input;

$A_i^{(t)} = a_i^{(t)}$ for 1 ≤ i ≤ M, and $A_i^{(t)} = 1 - a_{i-M}^{(t)}$ for M+1 ≤ i ≤ 2M

2) Compute the choice function for all committed category nodes;

$T_{j_k k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k k})}{\alpha + \sum_{i=1}^{2M} w_{i j_k k}}$ ; jk = 1, ..., C(k); k = 1, ..., L

3) Choose the node that has the maximum choice value among all the committed nodes for all stacks;

$T_{J_K}^{(t)} = \max\{T_{j_k k}^{(t)}\}$ ; jk = 1, ..., C(k); k = 1, ..., L

4) Match testing:

If ($\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJK}) \ge M\rho$) then;

The class code of the current input A(t) is K

Else

The network cannot determine the class code of this input.

5) If more classification is needed then;

GOTO STEP (1)

ELSE

Classification has been completed.
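A compact Python sketch of the classification steps follows. As before, the dictionary-of-stacks layout, the function name, and the default `alpha` are assumptions for illustration; `None` stands for an undetermined class.

```python
def classify_stacked(A, stacks, rho, alpha=0.001):
    """Return the stack label K of the winning node, or None if undetermined."""
    M = len(A) // 2
    best = None                               # (choice value, stack label, overlap)
    for K, nodes in stacks.items():
        for w in nodes:
            o = sum(min(a, x) for a, x in zip(A, w))
            T = o / (alpha + sum(w))
            if best is None or T > best[0]:
                best = (T, K, o)
    if best is None or best[2] < M * rho:     # match testing
        return None
    return best[1]


stacks = {1: [[0.2, 0.8, 0.8, 0.2]], 2: [[0.9, 0.1, 0.1, 0.9]]}

label = classify_stacked([0.25, 0.75, 0.75, 0.25], stacks, rho=0.8)
rejected = classify_stacked([0.5, 0.5, 0.5, 0.5], stacks, rho=0.9)
```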

5.4 Discussion

It is clear from the algorithms of Fuzzy ARTMAP (Carpenter et al. 1992), Supervised ART-I (Al-Rawi 1999), and Supervised ART-II (Al-Rawi et al. 1999) that they have the same classification accuracy. That is because all these architectures share the same matching criteria. However, the last two learn more quickly due to their simple architectures.

Supervised ART-II learns binary and analog input patterns more quickly than Supervised ART-I. This is because stacking the category nodes according to their classes makes the redetermination of the maximum choice value node quicker than in the tagging approach of Supervised ART-I, when the previous node has failed to pass the vigilance parameter or the class matching, see (figure 5-3). The number of comparisons required to redetermine the winning node among all the committed category nodes C is C-1 for Supervised ART-I, while for Supervised ART-II it is (C/L-1)+(L-1) = (1/L)C+(L-2) comparisons on average. Therefore, Supervised ART-II learns more quickly than Supervised ART-I as C increases (Al-Rawi et al. 2000).
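The two comparison counts can be evaluated with a short Python sketch. The function names and the sample sizes (C = 800 committed nodes, L = 8 classes) are hypothetical; the stacking count assumes, as the text does, that the committed nodes are spread evenly over the stacks.

```python
def comparisons_tagging(C):
    """Comparisons to redetermine the winner among C nodes (Supervised ART-I)."""
    return C - 1


def comparisons_stacking(C, L):
    """Average comparisons for Supervised ART-II: within one stack, then across stacks."""
    return (C / L - 1) + (L - 1)


tagging = comparisons_tagging(800)        # 799
stacking = comparisons_stacking(800, 8)   # (100 - 1) + (8 - 1) = 106
break_even = comparisons_stacking(8, 8)   # equals comparisons_tagging(8) = 7
```

With one committed node per class (C = L) the two counts coincide, and the advantage of stacking grows as C increases.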

In Supervised ART-II, when all committed category nodes of the stack of the current input pattern are in shut-off mode, a new node must be committed from this stack without waiting to test the committed category nodes of the other stacks, because they will not pass the class matching. However, in Supervised ART-I all committed category nodes should be tested before such a decision can be made.

The comparison between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II is shown in (table 5-1). Supervised ART-I is superior or equivalent in each point to Fuzzy ARTMAP. Supervised ART-II has a simpler architecture than Supervised ART-I. However, the main limitation of Supervised ART-II is that the maximum number of category nodes of each stack is predetermined before the training process. When all the nodes of a particular stack have been committed, borrowing an uncommitted node from another stack is not possible. Yet in the tagging approach of Supervised ART-I, uncommitted nodes are free to represent any class during the training process. This is the main constraint of the stacking supervision approach of Supervised ART-II.

This limitation of the stacking supervision approach of Supervised ART-II can be treated by increasing the memory size of each stack. This additional memory is compensated by employing only L of the N released memory of the tagging supervision of Supervised ART-I. The released memory can be used to increase the memory size of each stack by one fold.

| Feature | Fuzzy ARTMAP | Supervised ART-I | Supervised ART-II |
| Architecture | Two ARTs | One ART | One ART |
| No. of category nodes involved for Tmax | C | C | C/L |
| Checking for a new committed node | J > C | TJ = -1 | Tmax(b) = -1 |
| Re-Tmax before new node is committed | C | C | C/L |
| Class coding | Binary digital code | Analog | Analog |
| Supervision memory requirement | (N + 1) × L | C | L |
| Map field vigilance parameter | ρab | None | None |
| Match tracking | $\sum_{k=1}^{L} (b_k \wedge w_{Jk}) \ge \rho_{ab}$ | Tag(J) = b | K = b |
| Update weights of ART wij | $w_{iJ}^{new} = \beta (A_i \wedge w_{iJ}^{old}) + (1 - \beta) w_{iJ}^{old}$ | Same | Same |
| Update weights of map field wjk | $w_{Jk}^{new} = \beta (b_k \wedge w_{Jk}^{old}) + (1 - \beta) w_{Jk}^{old}$ | None | None |
| Freedom of uncommitted category nodes | Free | Free | Bounded |

Table 5-1: Comparisons between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II. It is very clear that the supervision approach of Supervised ART-I and Supervised ART-II is much simpler than the map field approach.

CHAPTER VI

PERFORMANCE OF SUPERVISED ART-I&II FOR

CLASSIFICATION OF LANDSAT TM IMAGES

6.1 Landsat satellites

Satellites of the Landsat program are polar-orbiting satellites. The orbit of Landsat-5, whose images are used in this study, is 705 km high with an inclination of 98.2°. It orbits the Earth every 99 minutes. The repeat coverage of the satellite is 16 days. It is sun-synchronous, crossing the equator at 9:45 AM.

Landsat carries the Multi-Spectral Scanner (MSS) and the Thematic Mapper (TM). Our concern in this study is the TM. This sensor operates in the visible and infrared range. TM has seven bands. All bands have (30 x 30) m spatial resolution except band-6, which has (120 x 120) m resolution. Band-6 operates in the thermal infrared. The radiometric resolution is 8 bits for all channels. So, digital values range between 0 and 255. The details of the TM sensor are listed in (table 6-1). The ground size of Landsat TM images is (185 x 172) km. An image consists of 5760 lines x 6928 pixels.

The objectives of this chapter is to test the performance of Supervised ART-I

and Supervised ART-II for mapping land-cover, using Landsat TM images (Al-Rawi et

al. 2000). The sensitivity of the system will be tested for all the domain of dynamic

parameters (/?,/?) and for different sizes of training set. The effect of dynamic

parameters on training time and classification accuracy will be addressed.


Band   Bandwidth (µm)            Detectors     Resolution (m)   SNR (average)
1      0.45-0.52 (VIS, blue)     SiPD (16)     30               60
2      0.52-0.60 (VIS, green)    SiPD (16)     30               60
3      0.63-0.69 (VIS, red)      SiPD (16)     30               46
4      0.76-0.90 (NIR)           SiPD (16)     30               46
5      1.55-1.75 (SWIR)          InSb (16)     30               36
7      2.08-2.35 (SWIR)          InSb (16)     30               28
6      10.40-12.50 (TIR)         HgCdTe (4)    120

Table 6-1: Descriptions for Landsat-5 Thematic Mapper (TM) images. TM is a scanning optical sensor operating in the visible and infrared ranges. The last column in the table represents the signal-to-noise ratio (SNR) for each channel.

6.2 Data

Results obtained for a scene of (256 x 240) pixels of the Landsat TM image 201/32 EOSAT are shown and discussed in this thesis. The scene corresponds to the area around the Spanish city of "Talavera de la Reina". All bands are used here except band-6, because it has a different resolution and operates in the infrared region (10.4-12.5 µm) of the spectrum. The normalized values [0, 1] of the bands, as well as their complements, have been introduced simultaneously to the network together with the class code. The thematic classes present in this scene have been determined by supervised field visits to almost all training areas established during the image classification process. Thirteen different thematic classes have been detected. These classes and the number of pixels that they cover are: meadow (7569), wheat (5297), alfalfa (2061), mountains (7233), fallow land-1 (3026), fallow land-2 (2957), fallow land-3 (3447), natural vegetation-1 (12578), natural vegetation-2 (9558), forest (3492), irrigated land (3476), wetland (101), and river (645). The whole space defined by the vigilance parameter ρ ∈ [0, 1] and the dynamic learning parameter β ∈ (0, 1] has been investigated using different sizes of training samples. This lets us understand the influence of these parameters on the sensitivity of the system from both the training time and classification accuracy points of view. To achieve this, the networks must be trained thousands of times. The simple architectures of Supervised ART-I and Supervised ART-II make this possible.
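The input preparation described above (normalized band values together with their complements) can be sketched as follows. This is only a minimal illustration, assuming 8-bit digital values and the six bands that remain after dropping band-6; the function names and the sample pixel are hypothetical and are not taken from the thesis software.

```python
def normalize_band(dn, bits=8):
    """Scale a digital number (0..255 for 8 bits) to the interval [0, 1]."""
    return dn / float(2 ** bits - 1)


def complement_code(pixel_dn):
    """Return the 2M-dimensional input [a, 1 - a] for an M-band pixel."""
    a = [normalize_band(v) for v in pixel_dn]
    return a + [1.0 - v for v in a]


# Hypothetical six-band pixel (bands 1-5 and 7); the values are illustrative only.
pixel = [63, 28, 31, 90, 75, 30]
coded = complement_code(pixel)

# Each value and its complement add up to 1, so the coded vector sums to M.
print(len(coded), round(sum(coded), 6))
```

With complement coding every input has the same total magnitude M, which is why the match value of a category node is divided by M rather than by the size of the individual input.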

6.3 Performance

6.3.1 Training performance

The networks (Supervised ART-I and Supervised ART-II) have been trained with different sizes of training set. They were trained with 200, 600, 1000, 3000, 9000, and 15000 exemplars for all the combinations of ρ (0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, and 0.95) and β (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, and 1.0). This required training each network (Supervised ART-I and Supervised ART-II) 960 times.
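The figure of 960 runs per network follows directly from the parameter grid: 6 training sizes, 8 vigilance values, and 20 learning-parameter values. A short sketch that enumerates the grid is given below; the variable names are illustrative and the training call itself is omitted.

```python
from itertools import product

training_sizes = [200, 600, 1000, 3000, 9000, 15000]
rho_values = [round(0.60 + 0.05 * i, 2) for i in range(8)]   # 0.60 .. 0.95
beta_values = [round(0.05 * i, 2) for i in range(1, 21)]     # 0.05 .. 1.00

# One training run per (training size, rho, beta) combination.
runs = list(product(training_sizes, rho_values, beta_values))
print(len(runs))  # 6 * 8 * 20 = 960
```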

Table 6-2 summarizes the more significant results obtained from the runs mentioned above. Maximum and minimum values for committed category nodes, learning and classification times, and classification precision, as well as the corresponding pair of parameter values (ρ, β), are shown. It can be observed that the number of category nodes ranges from 31 (at ρ=0.70 and β=1.0) to 1077 (at ρ=0.95 and β=0.05) when the networks are trained with 200 and 15000 exemplars, respectively. The classification accuracy ranges from 64.66% (at ρ=0.80 and β=0.95) to 81.87% (at ρ=0.95 and β=0.40) when the networks are trained with 15000 and 9000 exemplars, respectively.

The behavior of the proposed ANNs over the whole dynamic parameter domain (ρ, β) has been investigated in detail for the learning set of 9000 exemplars, which gave the maximum accuracy. The variation of the number of committed category nodes in the dynamic parameter domain is displayed in figure 6-1. It can be observed that as ρ increases, more category nodes are generated (fine categories). At medium values of the dynamic learning parameter (0.3 < β < 0.8), the number of committed category nodes does not change much at medium and low vigilance values (ρ < 0.80), but it increases sharply as the vigilance parameter approaches its maximum value, see (figure 6-1). When ρ=0.8 the number of committed category nodes is less than 200, when ρ=0.90 the number increases to 1000, and it increases further, up to 3000, when ρ=0.99. The dynamic learning parameter also influences the number of committed category nodes. It is clear in (figure 6-1) that at low and medium levels of ρ, the number of committed category nodes increases as β approaches its minimum and maximum


Number of category nodes
Training size   min (ρ, β)               max (ρ, β)
200             31 (0.70, 1.0)           64 (0.8, 0.05)
600             40 (0.85, 0.85)          121 (0.95, 0.05)
1000            50 (0.70, 0.70)          160 (0.90, 0.05)
3000            71 (0.70, 0.70)          348 (0.95, 0.05)
9000            120 (0.70, 0.40)         753 (0.95, 0.05)
15000           166 (0.70, 0.40)         1077 (0.95, 0.05)

Training time (min:sec.ms)
Training size   min (ρ, β)               max (ρ, β)
200             00:00.1 (0.8, 0.7)       00:00.2 (0.70, 0.05)
600             00:00.4 (0.60, 0.90)     00:01.0 (0.70, 0.05)
1000            00:09.0 (0.70, 0.85)     00:02.0 (0.75, 0.05)
3000            00:03.0 (0.70, 0.75)     00:11.0 (0.80, 0.05)
9000            00:14.0 (0.70, 0.40)     01:17.0 (0.95, 0.05)
15000           00:31.0 (0.70, 0.60)     03:13.0 (0.95, 0.05)

Classification (%)
Training size   min (ρ, β)               max (ρ, β)
200             67.14% (0.70, 0.45)      76.27% (0.95, 0.95)
600             72.75% (0.75, 0.35)      80.99% (0.95, 0.95)
1000            72.74% (0.70, 0.70)      80.56% (0.95, 0.40)
3000            71.36% (0.90, 0.80)      81.82% (0.95, 0.20)
9000            67.62% (0.90, 0.90)      81.87% (0.95, 0.40)
15000           64.66% (0.80, 0.95)      81.63% (0.95, 0.20)

Classification time (min:sec.ms)
Training size   min (ρ, β)               max (ρ, β)
200             00:43.0 (0.75, 1.0)      01:13.0 (0.75, 0.05)
600             00:51.0 (0.85, 0.85)     02:03.0 (0.95, 0.05)
1000            00:59.0 (0.70, 0.90)     02:38.0 (0.60, 0.05)
3000            01:15.0 (0.70, 0.70)     05:29.0 (0.95, 0.05)
9000            01:45.0 (0.70, 0.40)     11:55.0 (0.95, 0.05)
15000           02:04.0 (0.70, 0.40)     15:42.0 (0.95, 0.05)

Table 6-2: Performance of Supervised ART-II when trained with different sizes of training samples. The maximum and minimum values are shown.

Figure 6-1: Number of category nodes, in hundreds, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. The number of category nodes increases as the vigilance parameter increases, creating fine categories. The dynamic learning parameter influences the number of category nodes at its low and high ranges, due to the occurrence of under-training and over-training, respectively.


values. For small β, an exemplar will not influence its group much, which leads to under-training. In contrast, for large β, an exemplar will influence its group very much, which leads to over-training. In both cases (under-training and over-training) the categories cannot represent their members well, which leads the network to generate more category nodes. Over-training and under-training increase the training time, due to the generation of more category nodes (figures 6-2 & 6-3).
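The influence of β on a category can be illustrated with the standard Fuzzy ART learning rule, w_new = β (A ∧ w_old) + (1 - β) w_old, where ∧ is the component-wise minimum. The sketch below assumes that rule; the vectors are arbitrary illustrative values rather than data from the experiments, and the function names are hypothetical.

```python
def fuzzy_and(a, w):
    """Component-wise minimum (fuzzy AND) of two vectors."""
    return [min(x, y) for x, y in zip(a, w)]


def update_weights(w_old, a, beta):
    """Standard Fuzzy ART update: move w_old towards (a AND w_old) by a fraction beta."""
    m = fuzzy_and(a, w_old)
    return [beta * mi + (1.0 - beta) * wi for mi, wi in zip(m, w_old)]


w = [0.8, 0.6, 0.2, 0.4]   # current category weights (illustrative)
a = [0.5, 0.7, 0.1, 0.3]   # input exemplar (illustrative)

slow = update_weights(w, a, beta=0.05)  # exemplar barely changes the category
fast = update_weights(w, a, beta=1.0)   # category jumps to a AND w in one step
print(slow)
print(fast)
```

With β = 0.05 the weights stay close to their old values, so many presentations are needed before the category reflects its members; with β = 1.0 a single exemplar reshapes the category completely.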

It is well known from the theory that the classification accuracy and the number of category nodes are equal for Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II. However, they have different training times. It is clear from the algorithms that both Supervised ART-I and Supervised ART-II require less time for training due to their simple architectures. Figures 6-2 and 6-3 show the variation of the training time in the dynamic parameter domain for Supervised ART-I and Supervised ART-II, respectively. It can be observed that the training time is proportional to the number of committed category nodes.

To compare the performance of Supervised ART-I with that of Supervised ART-II, the ratio of the training time of Supervised ART-I to that of Supervised ART-II is shown in (figure 6-4). Above the heavy line Supervised ART-II requires less training time than Supervised ART-I, while below the heavy line Supervised ART-I performs better. Thanks to the simple architectures of both Supervised ART-I and Supervised ART-II, the construction of these figures is practically possible. Supervised ART-II performs better when the number of category nodes exceeds 1000. Otherwise Supervised ART-I should be employed. The training times are in the order of minutes using a SUN 4 SPARC Station. Such short learning times make Supervised ART-I and Supervised ART-II very powerful tools for classifying large-scale remotely sensed data.


Figure 6-2: Training time, in minutes, for Supervised ART-I, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. Training time is proportional to the number of category nodes.

Figure 6-3: Training time, in minutes, for Supervised ART-II, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. The training time increases as the number of category nodes increases; however, generally speaking, the training time for Supervised ART-II is lower than that for Supervised ART-I.

Figure 6-4: The ratio of training time for Supervised ART-I / Supervised ART-II, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, using 9000 pixels of the Landsat TM images. Above the heavy line, training of Supervised ART-II is faster than that of Supervised ART-I. Below the heavy line, Supervised ART-I is faster.


6.3.2 Classification performance

The effect of the dynamic parameters on the performance of the networks, in terms of classification time and classification accuracy, is shown in (figure 6-5) and (figure 6-6), respectively. The heavy line in (figure 6-6) represents the values of the dynamic parameters that give optimum classification. As the vigilance parameter increases, the number of category nodes increases. Therefore, the dynamic learning rate or the number of training exemplars should be increased, because the number of training exemplars for each category node decreases. The minimum classification accuracy is 67.62% at ρ=0.90 and β=0.90, while the maximum classification accuracy is 81.87% at ρ=0.95 and β=0.40.

Training has also been conducted using 9000 pixels with higher vigilance parameters. The combinations of ρ=0.96, 0.97, 0.98, and 0.99 with the whole domain of the dynamic learning parameter have been used. An 85.82% classification accuracy has been obtained at ρ=0.98 and β=0.50, with a classification time of around 25 minutes using a SUN 4 SPARC Station, and 9.50 minutes using an ALPHA Station 500. The reference image and the classified image for this run are shown in (figure 6-7). Each class is assigned one color. The classification accuracy and number of category nodes for each class are shown in (table 6-3). The confusion matrix for the classification is shown in (table 6-4). The diagonal values represent the number of pixels that are correctly classified. The off-diagonal values represent the number of pixels that are wrongly assigned to each class. Natural vegetation-1 and natural vegetation-2 have the largest share of misclassified pixels. They are wrongly assigned 1060 and 1940 pixels, respectively. This represents 42.5% of the total misclassified pixels. However, they have very good classification accuracy, since misclassified pixels represent only about 10% of their total pixels. Forest contributed 762 pixels to natural vegetation-2 and 197 pixels to mountains.


Figure 6-5: Classification time, in minutes, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, for 52 440 pixels of the Landsat TM images.

Figure 6-6: Classification performance, in the domain of the vigilance parameter ρ and the dynamic learning parameter β, for Landsat TM images. Classification performance is low at low vigilance parameter and high dynamic learning parameter. As the vigilance parameter increases, the number of category nodes increases, and therefore the dynamic learning parameter should be increased if the training size is fixed.

Figure 6-7: The upper image is the reference image. The lower image is the classified image using Supervised ART-II, with vigilance parameter ρ=0.98, dynamic learning parameter β=0.50, and training with 9000 exemplars. The classification accuracy is 85.82%. Classes are assigned colors as follows: 1-white, 2-red, 3-dark green, 4-dark yellow, 5-bright yellow, 6-yellowish green, 7-yellow, 8-brown, 9-green, 10-black, 11-dark blue, 12-light green, and 13-blue.

                          TRAINING PHASE                               TESTING PHASE
Class name                correct   total    accuracy   number of     correct   total    accuracy
                          pixels    pixels   %          nodes         pixels    pixels   %
1-Meadow                  1449      1475     98.24      189           5470      6094     89.76
2-Wheat                   507       594      85.35      59            3918      4703     83.31
3-Alfalfa                 377       394      95.69      42            1521      1667     91.24
4-Mountains               286       310      92.26      63            5320      6923     76.85
5-Fallow land-1           233       247      94.33      50            2099      2779     75.53
6-Fallow land-2           575       627      91.71      81            2028      2330     87.04
7-Fallow land-3           641       652      98.31      140           2514      2795     89.95
8-Natural veget.-1        2416      2519     95.91      450           9128      10059    90.74
9-Natural veget.-2        1363      1532     88.97      127           7154      8026     89.14
10-Forest                 34        35       97.14      15            2474      3457     71.56
11-Irrigated land         561       578      97.06      35            2696      2898     93.03
12-Wetland                13        13       100.00     1             84        88       95.45
13-River                  22        24       91.67      3             596       621      95.97
Total                     8477      9000     94.19      1255          45002     52440    85.82

Table 6-3: Training and classification statistics for the Landsat TM image at individual class level. The neural network has been trained with 9000 exemplars. The classification accuracy for the training set is 94.19%. The classification accuracy for the remaining 52 440 pixels of the image is 85.82%. The vigilance parameter ρ=0.98 and the dynamic learning rate β=0.50 are used. The learning time is 2.99 minutes and the classification time is 20.97 minutes, using a SUN 4 SPARC Station. The dark numbers in the last row represent the total number of pixels that are assigned by the neural network to each class.

Table 6-4: Confusion matrix for the classification of the Landsat TM image.


Forest has the lowest classification accuracy among all classes. Mountains contributed 347, 751, and 328 pixels to natural vegetation-1, natural vegetation-2, and forest, respectively.
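The per-class and overall accuracies quoted in this section can be recomputed from the testing-phase counts of table 6-3 (correct pixels and total pixels per class). The sketch below does only that arithmetic; the variable names are illustrative.

```python
classes = [
    "Meadow", "Wheat", "Alfalfa", "Mountains", "Fallow land-1",
    "Fallow land-2", "Fallow land-3", "Natural veget.-1", "Natural veget.-2",
    "Forest", "Irrigated land", "Wetland", "River",
]
# Testing-phase counts from table 6-3.
correct = [5470, 3918, 1521, 5320, 2099, 2028, 2514, 9128, 7154, 2474, 2696, 84, 596]
total = [6094, 4703, 1667, 6923, 2779, 2330, 2795, 10059, 8026, 3457, 2898, 88, 621]

# Accuracy per class and for the whole test set, in percent.
per_class = {name: 100.0 * c / t for name, c, t in zip(classes, correct, total)}
overall = 100.0 * sum(correct) / sum(total)
worst = min(per_class, key=per_class.get)

print(round(overall, 2), worst, round(per_class[worst], 2))
```

The totals reproduce the 52 440 test pixels, and the class with the lowest accuracy is forest, in agreement with the text.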

The behaviour of both Supervised ART-I and Supervised ART-II, for training with remotely sensed data, over the whole domain of the dynamic parameters is well understood. According to the results that have been obtained, Supervised ART-II should be employed when the number of category nodes is in the thousands. Otherwise Supervised ART-I performs better. However, Supervised ART-II can be employed here too, since its learning time is very short (less than a minute) when the number of category nodes is less than 1000.


CHAPTER VII

PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS

7.1 Introduction

Only one approach to vigilance dynamics in supervised ART ANNs has been addressed in the literature. If the match value of the winning category node is greater than the predetermined vigilance parameter ρ while class matching fails, then the current match value, increased by a very small value ε, is assigned to the vigilance parameter (equation 3-1).

The vigilance parameter ρ only increases during the training phase, when class matching of the winning category node fails for a specific input. The very small positive value ε is added to the failed match value, and the result is assigned to the vigilance parameter, in order to classify rare events (Carpenter et al. 1991b). However, Carpenter et al. (1998b) reported that reducing it by ε rather than increasing it leads to a reduction in the number of category nodes without influencing the classification accuracy of the network. The vigilance dynamic of this approach is shown in figure 3-5.

7.2 Vigilance dynamics

7.2.1 Flying approach

The only vigilance dynamic reported in the literature, mentioned above, only increases during the training phase, when class matching of the winning category node fails for a specific input. This approach is called the flying approach, to differentiate it from the other approaches proposed in this work. The vigilance parameter in the flying approach is controlled by the following equation:

$$\rho_{t+1} = \max\left\{\rho_t,\ \left(\sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M\right)_t\right\} \pm \varepsilon \qquad (7\text{-}1)$$

The flying approach excludes from the competition a committed category node that has a match value greater than the initial vigilance parameter and belongs to the class of the current input, whenever the match value of the failed category node is higher than the match value of this category node (see figure 3-5). This leads to the generation of more committed category nodes. Therefore, longer training and classification times are required.

7.2.2 Fixed vigilance approach

In this approach, the vigilance parameter is constant during the training phase and keeps its initial value.

$$\rho_{t+1} = \rho_t \qquad (7\text{-}2)$$

This allows all committed category nodes to be created under the same level of confidence. Moreover, a committed category node that has a match value greater than the initial vigilance parameter and belongs to the class of the current input can represent the input, independently of its choice value rank among the committed category nodes (see figure 7-1a).

7.2.3 Free vigilance approach

The free vigilance approach assigns to the vigilance parameter the match value of the previous category node if that node fails to represent the current input.


Figure 7-1: Sketches showing the different vigilance parameter dynamics: (a) fixed, (b) free, (c) float. The x-axis represents the ranking of all committed nodes according to their choice values. The y-axis represents the match value of each category node. The first sketch (a) represents the fixed approach: all category nodes are committed at the same level of vigilance. The second sketch (b) represents the free approach: the vigilance parameter is always equal to the previous match value, so a category node might be committed with a match value less than the initial vigilance parameter ρ0. Finally, the third sketch (c) represents the float approach: the vigilance parameter is equal to the previous match value if it is not smaller than the initial vigilance value; otherwise the initial vigilance value is employed.


$$\rho_{t+1} = \left(\sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M\right)_t \qquad (7\text{-}3)$$

This allows the vigilance parameter to change freely above and below the initial value. The network can thus adapt itself to the (proper) vigilance parameter during the training phase rather than being forced to a given value (see figure 7-1b).

7.2.4 Floating approach

The floating approach is like the free vigilance approach, but with the constraint that the vigilance parameter is not allowed to fall below its initial value. This ensures that all committed category nodes have the minimum required level of confidence.

$$\rho_{t+1} = \max\left\{\rho_0,\ \left(\sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M\right)_t\right\} \qquad (7\text{-}4)$$

This generates more category nodes than both the fixed and free approaches, but fewer than the flying approach (see figure 7-1c).

It should be mentioned here that ρ1 = ρ0 for all the above vigilance dynamics, where ρ0 is the initial vigilance value.
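The four vigilance dynamics can be summarized in a few lines of code. The sketch below is only an illustration of equations 7-1 to 7-4 under stated assumptions: the function and argument names are hypothetical, the flying rule is shown in its "+ε" variant, and `failed_match` stands for the match value of the category node that has just failed class matching.

```python
def match_value(a, w, m):
    """Match value of a node: sum_i min(A_i, w_i) / M for a 2M-dimensional coded input."""
    return sum(min(x, y) for x, y in zip(a, w)) / m


def next_vigilance(approach, rho_t, rho_0, failed_match, eps=1e-4):
    """Vigilance after a failed class match, for each of the four dynamics."""
    if approach == "flying":      # equation 7-1, shown with +eps
        return max(rho_t, failed_match) + eps
    if approach == "fixed":       # equation 7-2: vigilance never changes
        return rho_t
    if approach == "free":        # equation 7-3: follow the failed match value
        return failed_match
    if approach == "floating":    # equation 7-4: follow it, but never below rho_0
        return max(rho_0, failed_match)
    raise ValueError("unknown approach: %s" % approach)


# A failed node whose match value (0.6) is below the initial vigilance (0.7).
for name in ("flying", "fixed", "free", "floating"):
    print(name, next_vigilance(name, rho_t=0.7, rho_0=0.7, failed_match=0.6))
```

In this example only the free approach lets the vigilance drop below its initial value, which is why it can commit category nodes with a match value smaller than ρ0.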

7.3 Results and discussion

The performance of the Supervised ART-II ANN has been tested, for classification of the Landsat TM images, using all the above-mentioned vigilance dynamics.


Table 7-1: The performance of the Supervised ART-II ANN with different vigilance dynamics: classification performance (%) and number of category nodes (#cn) for the fly, float, free, and fixed approaches at each pair of vigilance parameter ρ and dynamic learning rate β. An Alpha Station 500 has been used for these runs.

Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns represent classified images using the fly, float, fixed, and free vigilance parameter, respectively. The first, second, third, fourth, and fifth rows represent classified images using initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), respectively. Classes are assigned colours as follows: 1) meadow: white, 2) mountain: brown, 3) fallow land-1: yellow, 4) fallow land-2: dark yellow, 5) fallow land-3: bright yellow, 6) irrigated land: red, 7) alfalfa: black, 8) wetland: dark blue, 9) forest: dark green, 10) wheat: light green, 11) natural vegetation-1: yellowish green, 12) natural vegetation-2: green, and 13) river: blue.

The network has been tested using five different combinations of vigilance parameter and dynamic learning rate: (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15). These values of the vigilance parameter ρ0 and the dynamic learning rate β are located on the optimum line for classification performance (figure 6-6). This optimum line represents the best value of β for a specific value of ρ to get the maximum classification accuracy using the flying approach.

The classification performance ranges from 66.71% using the free approach to 87.05% using the flying approach. The numbers of category nodes were 120 and 1252 for the free and flying approaches, respectively. These results were obtained when 0.98 and 0.50 were used for the vigilance parameter and the dynamic learning rate, respectively. See (table 7-1) for details. Classified images are shown in (figure 7-2).

The neural network performances using the flying, floating, and fixed approaches become closer to each other as the vigilance parameter approaches unity. It is clear from the theory that all the above-mentioned approaches lead to the same classification accuracy and number of category nodes at ρ0 = 1. The neural network performances using the floating and free approaches become closer to each other as ρ0 approaches zero. They lead to the same performance at ρ0 = 0.

While the flying approach shows better performance from the accuracy point of view when the initial vigilance parameter is equal to or greater than 0.95, the floating approach shows better performance for initial vigilance parameters less than 0.95. From the point of view of the number of category nodes, the network performs better using the floating approach. While the numbers of category nodes are equal at ρ0 = 1, with the floating approach the number is reduced to less than 25% (56/227) of that of the flying approach at ρ0 = 0 (see table 7-1). Such a reduction leads to a reduction in the training time and the classification time as well.


CHAPTER VIII

CONCLUSIONS

In this study new simplified architectures of ANNs have been designed. These architectures have been employed to analyze remotely sensed data. The conclusions that can be drawn from this study are:

1) Two new versions of Fuzzy ART have been developed. The algorithms show that these new versions have the same performance as the original algorithm for categorization. However, they require less training and categorization time.

2) A new supervised ART-type architecture, called Supervised ART-I, has been developed. It has been built from a single module of ART rather than from two ART modules linked by a map field, as is the case for all supervised ART ANNs that have been addressed in the literature. This leads to the elimination of the map field and its parameters. It is theoretically proven that Supervised ART-I has the classification performance of Fuzzy ARTMAP; however, it requires less memory and less training time due to its simple architecture.

3) Another supervised ART-type architecture, called Supervised ART-II, has been developed. It has also been built from a single module of ART. It has the classification performance of Fuzzy ARTMAP and Supervised ART-I. The category layer of Supervised ART-II has been divided into stacks. Each stack represents a single class. This reduces the memory required for labeling category nodes from N in the tagging approach of Supervised ART-I to only L in the stacking approach of Supervised ART-II.


4) An uncommitted category node in Supervised ART-I is free to represent any class, whereas an uncommitted category node in Supervised ART-II is predetermined to represent a specific class. When a stack runs out of uncommitted category nodes, borrowing an uncommitted category node from another stack is not possible. Increasing the memory size of each stack can overcome this limitation of the stacking supervision approach of Supervised ART-II. This additional memory is compensated by the memory released from labeling, since the stacking approach employs only L memory locations instead of the N required by the tagging supervision of Supervised ART-I. The released memory can be used to increase the memory size of each stack by one fold.

5) While we only employed the newly developed supervision approaches for Fuzzy ART, they can be applied to all ART-type ANNs.

6) Supervised ART-I is oriented to homogeneous environments, while Supervised ART-II is oriented to non-homogeneous environments. The homogeneity of the environment depends on the type of data and on the dynamic parameters.

7) Since both Supervised ART-I and Supervised ART-II have been built from a single module of ART, the cost of building chips for classification tasks will be much lower than with the map field approach.

8) The behavior of both Supervised ART-I and Supervised ART-II, for training with remotely sensed data, over the whole domain of the dynamic parameters is well understood.

9) An automatic system for classifying Landsat TM images, with very good classification accuracy, has been developed.

10) This study shows that the flying approach should be employed for the vigilance dynamic if the vigilance parameter is very high (>0.95), while the floating approach should be employed otherwise.


Some aspects derived from this study need to be investigated in future work. New learning algorithms need to be developed; these learning algorithms must eliminate or reduce the under-training and over-training episodes. Further studies are recommended to investigate the behavior of the designed architectures in dealing with different digital signal processing problems. Some studies in this direction have already been conducted. The developed architectures have been employed successfully for monitoring forest fires (Al-Rawi et al. 2001a, b, c & d) and for cloud detection (Al-Rawi et al. 2001e & f).


BIBLIOGRAPHY

Al-Rawi, K. R., 1999, "Supervised ART-I: A new neural network architecture for learning and classifying multivalued input patterns", Lecture Notes in Computer Science, 1606, 756-765.

Al-Rawi, K. R., Gonzalo, C., and Arquero, A., 1999, "Supervised ART-II: A new neural network architecture, with quicker learning algorithm, for classifying multivalued input patterns", In Proceedings of the European Symposium on Artificial Neural Networks ESANN'99, Bruges, Belgium, 289-294.

Al-Rawi, K. R., Gonzalo, C., and Martínez, E., 2000, "Supervised ART-II for classification Landsat Thematic Mapper image", Remote Sensing in the 21st Century: Economic and Environmental Applications, Casanova (ed), Balkema, Rotterdam, 229-235.

Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001a, "Burned area mapping system and fire detection system, based on neural networks and NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).

Al-Rawi, K. R., Casanova, J. L., and Romo, A., 2001b, "IFEMS: New approach for monitoring wildfire evolution with NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).

Al-Rawi, K. R., Casanova, J. L., and Louakfaoui, M., 2001c, "IFEMS for monitoring spatial-temporal behaviour of multiple fire phenomena", International Journal of Remote Sensing (in press).

Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001d, "ART neural network for mapping burned area and determination severity of burn with Landsat TM images", Submitted to IEEE on Geoscience and Remote Sensing.

Al-Rawi, K. R., Casanova, J. L., and Vasileisky, A., 2001e, "A very quick neural network algorithm for cloud detection", Submitted to Geocarto International.

Al-Rawi, K. R., and Casanova, J. L., 2001f, "Neural network as an aid tool for building non-linear threshold algorithm for cloud detection", Submitted to Remote Sensing of Environment.

Bachelder, I. A., Waxman, A. M., and Seibert, M., 1993, "A neural system for mobile robot visual place learning and recognition", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 512-517.


Baloch, A. A., and Waxman, A. M., 1991, "Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN", Neural Networks, 4, 271-302.

Baraldi, A., and Parmiggiani, F., 1995, "A neural network for unsupervised categorization for multivalued input patterns. An application to satellite image clustering", IEEE Transactions on Geoscience and Remote Sensing, 33, 305-316.

Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., 1990, "Neural network approaches versus statistical methods in classification of multisource remote sensing data", IEEE Transactions on Geoscience and Remote Sensing, 28, 540-552.

Bernardon, A. M., and Carrick, J. E., 1995, "A neural system for automatic target learning and recognition applied to bare and camouflaged SAR target", Neural Networks, 8, 1103-1108.

Carpenter, G. A., and Grossberg, S., 1987a, "A massively parallel architecture for a self-organizing neural pattern recognition machine", Computer Vision, Graphics, and Image Processing, 37, 54-115.

Carpenter, G. A., and Grossberg, S., 1987b, "ART2: Stable self-organization of pattern recognition codes for analog input patterns", Applied Optics, 26, 4919-4930.

Carpenter, G. A., and Grossberg, S., 1990, "ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures", Neural Networks, 3, 129-159.

Carpenter, G. A., Grossberg, S., and Rosen, D. B., 1991a, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system", Neural Networks, 4, 759-771.

Carpenter, G. A., Grossberg, S., and Reynolds, J. H., 1991b, "ARTMAP: Supervised real-time learning and classification of nonstationary data by self-organizing neural network", Neural Networks, 4, 565-588.

Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B., 1992, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3, 698-713.

Carpenter, G. A., and Ross, W. D., 1993, "ART-EMAP: a new neural network architecture for object recognition by evidence accumulation", IEEE Transactions on Neural Networks, 6, 805-818.

Carpenter, G. A., Gjaja, M. N., Gopal, S., and Woodcock, C. E., 1997, "ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data", IEEE Transactions on Geoscience and Remote Sensing, 35, 308-325.


Carpenter, G. A., 1997, "Distributed learning, recognition, and prediction by ART and ARTMAP neural networks", Neural Networks, 10, 1473-1494.

Carpenter, G. A., and Markuzon, N., 1998, "ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases", Neural Networks, 11, 323-336.

Carpenter, G. A., 1998, "Distributed ARTMAP: a neural network for fast distributed supervised learning", Neural Networks, 11, 793-813.

Caudell, T. P., and Healy, M. J., 1994, "Adaptive Resonance Theory networks in the Encephalon autonomous vision system", Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, USA, 1235-1240.

Caudell, T. P., Smith, S. D. G., Escobedo, R., and Anderson, M., 1994, "NIRS: large scale ART-1 neural architectures for engineering design retrieval", Neural Networks, 7, 1339-1350.

Dubrawski, A., and Crowley, J. L., 1994, "Learning locomotion reflexes: A self-supervised neural system for a mobile robot", Robotics and Autonomous Systems, 12, 133-142.

Gan, K. W., and Lua, K. T., 1992, "Chinese character classification using adaptive resonance network", Pattern Recognition, 25, 877-882.

Georgiopoulos, M., Fernlund, H., Bebis, G., and Heileman, G. L., 1996, "Order of search in Fuzzy ART and Fuzzy ARTMAP: Effect of the choice parameter", Neural Networks, 9, 1541-1559.

Georgiopoulos, M., Dagher, I., Heileman, G. L., and Bebis, G., 1999, "Properties of learning of a Fuzzy ART variant", Neural Networks, 12, 837-850.

Gopal, S., Sklarew, D. M., and Lambin, E., 1994, "Fuzzy-neural networks in multi-temporal classification of landcover change in the Sahel", Proceedings of the DOSES Workshop on New Tools for Spatial Analysis, Lisbon, Portugal, 55-68.

Grossberg, S., 1976, "Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions", Biological Cybernetics, 23, 187-202.

Grossberg, S., 1980, "How does a brain build a cognitive code?", Psychological Review, 1, 1-51.

Ham, F. M., and Han, S. W., 1996, "Quantitative study of the QRS complex using fuzzy ARTMAP and MIT/BIH arrhythmia database", in Proceedings of the World Congress on Neural Networks, 1, 207-211.

Heermann, P. D., and Khazenie, N., 1992, "Classification of multispectral remote sensing data using a Back-Propagation neural network", IEEE Transactions on Geoscience and Remote Sensing, 30, 81-88.


Hepner, G. F., Logan, T., Ritter, N., and Bryant, N., 1990, "Artificial neural network classification using a minimal training set: comparison to conventional supervised classification", Photogrammetric Engineering & Remote Sensing, 56, 469-473.

Hopfield, J. J., 1982, "Neural networks and physical systems with emergent collective computational abilities", Proceedings of the National Academy of Sciences, 79, 2554-2558.

Keyvan, S., Drug, A., and Rabelo, L. C., 1993, "Application of artificial neural networks for development of diagnostic monitoring system in nuclear plants", Transactions of the American Nuclear Society, 1, 515-522.

Keyvan, S., 1999, "Application of ART2-A as a Pseudo-supervised paradigm to nuclear reactor diagnostics", Lecture Notes in Computer Science, 1606, 747-755.

Kohonen, T., 1982, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, 43, 59-69.

Kumar, S. S., and Guez, A., 1989, "A neural network approach to target recognition", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associates, II, 573-578.

Lang, K. J., and Witbrock, M. J., 1989, "Learning to tell two spirals apart", Proceedings of the 1988 Connectionist Models Summer School, 52-59.

Le Cun, Y., 1986, "Learning processes in an asymmetric threshold network", in Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, Eds., Berlin, Springer-Verlag.

Mannan, B., Roy, J., and Ray, K., 1998, "Fuzzy ARTMAP supervised classification of multi-spectral remotely-sensed data", International Journal of Remote Sensing, 19, 767-774.

Mehta, B. V., Vij, L., and Rabelo, L. C., 1993, "Prediction of secondary structure of protein using fuzzy ARTMAP", in Proceedings of the World Congress on Neural Networks, 1, 228-232.

Mekkaoui, A., and Jespers, P., 1990, "An optimal self-organizing pattern classifier", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associates, 1, 477-450.

Moore, B., 1989, "ART1 and patterns clustering", Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., San Mateo, CA: Morgan Kaufmann, 174-185.

Mulder, N. J., and Spreeuwers, L., 1991, "Neural networks applied to the classification of remotely sensed data", International Geoscience and Remote Sensing Symposium (IGARSS'91), Espoo, Finland, 2211-2213.


Murshed, N. A., Bortolozzi, F., and Sabourin, R., 1995, "Off-line signature verification, without a priori knowledge of class ω2. A new approach", Proceedings of the Third International Conference on Document Analysis and Recognition, Piscataway, NJ, USA.

Paola, J. D., and Schowengerdt, R. A., 1994, "Comparisons of neural networks to standard techniques for image classification and correlation", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1404-1406.

Paola, J. D., and Schowengerdt, R. A., 1995, "A review and analysis of backpropagation neural networks for classification of remotely-sensed multispectral imagery", International Journal of Remote Sensing, 16, 3033-3058.

Paola, J. D., and Schowengerdt, R. A., 1997, "The effect of neural-network structure on a multispectral land-use / land-cover classification", Photogrammetric Engineering & Remote Sensing, 63, 535-544.

Parker, D., 1986, "Computational research in economics and management science", MIT, Cambridge, MA, USA, technical report TR-87.

Racz, J., and Dubrawski, A., 1995, "Artificial neural network for mobile robot topological localization", Robotics and Autonomous Systems, 16, 73-80.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J., 1986, "Learning internal representations by back-propagation", Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. E. Rumelhart and J. L. McClelland, Eds.), MIT Press, Cambridge, Massachusetts, 318-362.

Salu, Y., and Tilton, J., 1993, "Classification of multispectral image data by the binary diamond neural network and by nonparametric, pixel-by-pixel methods", IEEE Transactions on Geoscience and Remote Sensing, 31, 606-617.

Seibert, M., and Waxman, A. M., 1992, "Adaptive 3-D object recognition from multiple views", IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 107-124.

Seibert, M., and Waxman, A. M., 1993, "An approach to face recognition using saliency maps and caricatures", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 661-664.

Simpson, P. K., 1990, "Neural networks for sonar signal processing", Handbook of Neural Computing Applications (A. J. Maren, C. T. Harston, and R. M. Pap, Eds.), San Diego, Academic Press, 319-335.

Solaiman, B., and Mouchot, M. C., 1994, "A comparative study of conventional and neural network classification of multispectral data", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1413-1415.


Soliz, P., and Donohoe, G. W., 1996, "Adaptive resonance theory neural network for fundus image segmentation", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 1180-1183.

Srinivasa, N., and Sharma, R., 1996, "A self-organizing invertible map for active vision applications", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 121-124.

Tzeng, Y. C., Chen, K. S., Kao, W. L., and Fung, A. K., 1994, "A dynamic learning neural network for remote sensing applications", IEEE Transactions on Geoscience and Remote Sensing, 32, 1096-1102.

Yool, S. R., 1998, "Land cover classification in rugged areas using simulated moderate-resolution remote sensor data and an artificial neural network", International Journal of Remote Sensing, 19, 85-96.

Yoshida, T., and Omatu, S., 1994, "Neural network approach to land cover mapping", IEEE Transactions on Geoscience and Remote Sensing, 32, 1103-1109.

Warner, T. A., and Shank, M., 1997, "An evaluation of the potential for fuzzy classification of multispectral data using artificial neural networks", Photogrammetric Engineering & Remote Sensing, 63, 1285-1294.

Waxman, A. M., Seibert, M. R., Gove, A., Fay, D. A., Bernardon, A. M., Lazott, C., Steele, W. R., and Cunningham, R. K., 1995, "Neural processing of targets in visible, multispectral IR and SAR imagery", Neural Networks, 8, 1029-1051.

Werbos, P. J., 1974, "Beyond regression: New tools for prediction and analysis in the behavioural sciences", Ph.D. thesis, Harvard University, Cambridge, MA, USA.

Williamson, J. R., 1996, "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps", Neural Networks, 9, 881-897.

Wilson, C. L., Wilkinson, R. A., and Ganis, M. D., 1990, "Self-organizing neural network character recognition on a massively parallel computer", International Joint Conference on Neural Networks, San Diego, Piscataway, NJ, IEEE Service Center, II, 325-329.


APPENDIX

SUMMARY

A.1. INTRODUCTION

A.1.1 Historical evolution of Artificial Neural Networks (ANNs)

Although the origin of ANNs can be dated to 1943, when McCulloch and Pitts built the first ANN structure, the foundations of the field were laid in the first half of the 1970s. It was then that Werbos (1974) set out the principles of the learning algorithm known as Back Propagation (BP) and Grossberg (1976) established the basis of Adaptive Resonance Theory (ART). It was in the 1980s, however, that major theoretical progress took place. The BP algorithm was developed simultaneously and independently by different authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). New neural network structures and new learning algorithms were also proposed. Thus, Kohonen proposed the Self-Organizing Map (KSOM) in 1982.

This work pays special attention to the evolution of ART-type ANNs (Carpenter and Grossberg 1987a, b), given their proven stability, speed and accuracy (Carpenter et al. 1991a, b, 1992, 1997, and Gan and Lua 1992). These properties have led to their application in many different areas. For example, the Boeing company has used this type of ANN to retrieve information about existing systems in order to ease the design of new ones (Caudell et al. 1994). These networks have also been used for moving target recognition (Seibert and Waxman 1992, Bernardon and Carrick 1995, Kumar and Guez 1989, Koch et al. 1995, and Waxman et al. 1995); motor control in robotics (Baloch and Waxman 1991, Bachelder et al. 1993, Dubrawski and Crowley 1994, Srinivasa and Sharma 1996); robot navigation (Racz and Dubrawski 1995); machine vision (Caudell and Healy 1994); object recognition (Seibert and Waxman 1992); face recognition (Seibert and Waxman 1993); pattern clustering (Moore 1989, Mekkaoui and Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical image processing (Soliz and Donohoe 1996); wave recognition in electrocardiograms (Ham and Han 1996); signature verification (Murshed et al. 1995); fault identification in nuclear plants (Keyvan 1999); and remote sensing (Gopal et al. 1994, Baraldi and Parmiggiani 1995).

A.1.2 Classification of remotely sensed data with ANNs

Advances over recent decades in both space research and computing technology have made it possible to use remotely sensed data to identify and locate automatically the thematic classes present on the Earth's surface. This is currently a very active line of research (Benediktsson et al. 1990). The advantages of using ANNs for these classification tasks over conventional classifiers such as the maximum likelihood classifier (MLC) are: 1) ANNs do not need to know a priori the probability distribution of each class, since they are non-parametric systems. This also allows auxiliary non-spectral data (slope, topography, texture, etc.) to be included, which appear to improve classification accuracy (Benediktsson et al. 1990, Carpenter et al. 1997). Neural networks have also been shown to be more robust when the distribution is not Gaussian (Paola and Schowengerdt 1997, Hepner et al. 1990). 2) Unlike conventional classifiers, ANNs can handle fuzzy classifications (Paola and Schowengerdt 1997, Warner and Shank 1997, Yool 1998). In these cases, the values given by the output neurons can quantify the degree of membership of the input data in a given class. This is especially relevant when working with sensors of low spatial resolution. 3) The inherent parallelism of ANNs makes them relatively easy to run on parallel computers (Salu and Tilton 1993, Heermann and Khazenie 1992), which considerably reduces classification time compared with classical classifiers. 4) The flexibility of ANNs allows classification results to be improved in certain circumstances (Carpenter et al. 1997). 5) Finally, these systems can establish arbitrary decision boundaries (Paola and Schowengerdt 1995, Tzeng et al. 1994).

The neural network most commonly used in the literature to classify remotely sensed data is the multilayer perceptron (MLP), with the well-known Backpropagation learning algorithm. This algorithm is based on minimizing the error between the value the network produces at its output and the true value. Some authors have claimed that conventional classifiers perform better than the MLP (Mulder and Spreeuwers 1991, Solaiman and Mouchot 1994). Others, however, have concluded that the MLP classifies remotely sensed data more accurately than the MLC (Hepner et al. 1990, Heermann and Khazenie 1992, Paola and Schowengerdt 1994, Yoshida and Omatu 1994). Nevertheless, classifying remote sensing data with the MLP has several drawbacks. The network architecture is not fixed: the number of hidden layers and the number of nodes in each hidden layer must be determined by trial and error. This process can be very expensive in computing time, because training the network is slow. Moreover, during learning the network can become trapped in local minima, which would prevent it from converging. This problem can be reduced by lowering the learning rate, but that increases training time. Heermann and Khazenie (1992) proposed using parallel computers to reduce training time, at the expense of higher hardware cost.

Some studies (Carpenter et al. 1992) have shown that Fuzzy ARTMAP gives higher classification accuracy than the MLP, in less time, for images from the Thematic Mapper (TM) sensor carried by the Landsat satellite. The same authors concluded that, in this case, Fuzzy ARTMAP and the MLC gave the same classification accuracy. However, Mannan et al. (1998) compared the performance of Fuzzy ARTMAP, MLP and MLC in classifying a 512x512 image acquired by the LISS-II sensor carried by the Indian satellite IRS-1B, and concluded that the classification accuracy of Fuzzy ARTMAP was far superior to that of the other two classifiers. Its training time was slightly shorter than that of the MLC and considerably shorter than that of the MLP. It should also be noted that, unlike the MLP, the Fuzzy ARTMAP architecture is well defined, always converges, and can by itself generate new nodes to represent subclasses. The main drawback of Fuzzy ARTMAP is the complexity of its architecture.


A.2. OBJECTIVES OF THE THESIS

The objective of this thesis follows from the points discussed above. It can be stated as the search for ART-type neural network architectures that match the performance of the existing ones but are structurally simpler, which in turn reduces the computing time of both learning and operation.

This overall objective can be broken down into the following partial objectives:

• Design of new ART-type ANN architectures that give the same classification accuracy as the classical ART networks while reducing the complexity of their architectures.

• Proposal of learning algorithms for these architectures.

• Implementation of the learning algorithms of the different proposed architectures.

• Exhaustive comparative study of the performance of the proposed networks and algorithms for the classification of images remotely sensed by the Thematic Mapper sensor.

A.3. ART-TYPE ARTIFICIAL NEURAL NETWORKS

The principles of Adaptive Resonance Theory (ART) were set out by Carpenter and Grossberg (Center for Adaptive Systems, Department of Cognitive and Neural Systems, Boston University) as a theory of information processing in the human cognitive system (Grossberg 1976, 1980). From this theory, several unsupervised structures were initially developed: ART1 (Carpenter and Grossberg 1987a), ART2 (Carpenter and Grossberg 1987b), ART3 (Carpenter and Grossberg 1990), SART (Baraldi and Parmiggiani 1995) and Fuzzy ART (Carpenter et al. 1991a). All these networks were able to group inputs into classes using only the information that characterizes those inputs (unsupervised learning). The fundamental difference between ART1 and ART2 is that the former accepts only binary data, whereas the latter also accepts analog data. In both, information flows forward and backward: forward through the weights that connect each node of the input layer to all the nodes of the layer that clusters the input data, each of which will be called a category node; and backward through another set of weights that connects each category node to all the nodes of the input layer. Like ART2, Fuzzy ART can classify both binary and analog data. In Fuzzy ART, however, information flows only forward, from the input layer to the classifying layer. Another fundamental difference between Fuzzy ART and ART1 and ART2 is that the set-theoretic intersection operator ($\cap$) has been replaced by the operator ($\wedge$), which is the minimum operator of fuzzy logic.

The first ART-type neural network with supervised learning was ARTMAP, proposed by Carpenter et al. (1991b). In this case, besides the features to be classified, the network must be given the class code of each input during the training phase. In 1992 the same authors presented another supervised ART-type network, Fuzzy ARTMAP (Carpenter et al. 1992). Many other supervised ART-type architectures have since been investigated, among them ART-EMAP (Carpenter and Ross 1993), Gaussian ARTMAP (Williamson 1996), ARTMAP-IC (Carpenter and Markuzon 1998), and Distributed ARTMAP (Carpenter 1998). In all these architectures, supervision is carried out through a "map field" that requires two ART modules (ARTa and ARTb). The main difference between ARTMAP and Fuzzy ARTMAP is that the former is built from two ART1 modules, while the latter uses two Fuzzy ART modules. ARTMAP can learn and classify multivalued binary input patterns, whereas Fuzzy ARTMAP also accepts analog patterns.

Of all the supervised networks mentioned above, Fuzzy ARTMAP has been the most widely used. It has been applied to problems such as automatic analysis of electrocardiograms (Ham and Han 1996), management and diagnosis of nuclear power plants (Keyvan et al. 1993), and prediction of the secondary structure of proteins (Mehta et al. 1993).

A.3.1 Fuzzy ART

Since all the architectures and algorithms proposed in this work are inspired by Fuzzy ART and Fuzzy ARTMAP, both are briefly described here. Both retain the basic characteristics of all ART-type systems. The most notable of these is the matching, according to similarity criteria, between the input patterns and the prototype vectors previously learned by the network. This matching process can lead the network to a resonant state, which can result either in learning a new prototype (category) or in a search for similar, previously learned prototypes. If the similarity between the input pattern and the stored prototype exceeds the preset threshold, resonance occurs and the new information is incorporated into the selected category node by training its weights. The similarity criterion is set by the so-called vigilance parameter $\rho$. This parameter determines the threshold that a committed category node must exceed in order to represent a given input pattern, before a search is triggered for another category node that represents the pattern better. If none of the committed category nodes exceeds this threshold, a new category node must be committed. This process can be repeated as long as the memory capacity of the network is not exceeded. The vigilance parameter $\rho$ is a dimensionless number defined on the interval (0, 1]. A value of 1 demands a perfect match: it yields sharply differentiated classes but a large number of category nodes. Low values allow working with few category nodes but yield very general classes. This parameter is one of the keys to all ART-type ANNs. Its value depends on the type and volume of the data, the desired classification accuracy, the required speed and the available memory. It is kept constant during the operation of all the unsupervised networks.

Figure 2-1 of the thesis shows the dynamics of Fuzzy ART. In this figure $F_1$ denotes the input layer and $F_2$ the so-called classifying layer. The weights $w_{ij}$ connect each node of the input layer to all the nodes of the classifying layer. The weights of the winning node, $w_{iJ}$, are trained only if that node passes the similarity test, in other words if it exceeds the vigilance parameter; otherwise the node leaves the competition (reset). In Figure 2-1, $|X|$ denotes the degree of similarity between the input and the weights of the winning category node $J$. This degree of similarity is determined by the relation $X = \sum_{i} (A_i^{(t)} \wedge w_{iJ})$. Selecting the winning node involves computing the activation level of each category node, $T_j^{(t)}$ (eq. 2-1), and choosing the node with the highest level. The value of $T_j$ is an estimate of the degree of membership of the input in the class represented by node $j$.
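The search for a winning node described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis code. It assumes the standard Fuzzy ART choice function $T_j = |A \wedge w_j| / (\alpha + |w_j|)$ of Carpenter et al. (1991a) for eq. 2-1, which is not reproduced in this summary, and it normalizes the degree of similarity by $|A|$ so that it can be compared with $\rho$. The names `choose_category`, `alpha` and `rho` are hypothetical.

```python
import numpy as np

def choose_category(A, W, rho, alpha=0.001):
    """Return the index of the committed node that wins and passes vigilance, or None.

    A   : complement-coded input vector of length 2M
    W   : array of shape (C, 2M) holding the weights of the C committed nodes
    rho : vigilance parameter in (0, 1]
    alpha is the choice parameter of the ASSUMED standard choice function.
    """
    overlap = np.minimum(A, W).sum(axis=1)       # |A ^ w_j| for every node j
    T = overlap / (alpha + W.sum(axis=1))        # activation level T_j (assumed form)
    match = overlap / A.sum()                    # normalized degree of similarity
    for j in np.argsort(-T, kind="stable"):      # visit nodes from highest T_j down
        if match[j] >= rho:                      # vigilance test passed: resonance
            return int(j)
        # otherwise node j is reset (out of competition) and the search continues
    return None                                  # no committed node represents A

A = np.array([0.2, 0.8, 0.8, 0.2])
W = np.array([[0.2, 0.8, 0.8, 0.2],
              [0.9, 0.1, 0.1, 0.9]])
winner = choose_category(A, W, rho=0.9)
```

A return value of `None` corresponds to the case in which every committed node has been reset, so an uncommitted node would have to be committed.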

The architecture of Fuzzy ART is shown in Figure 2-2, which depicts the 2M nodes of the input layer, where M is the number of values that define each input pattern. The last M input nodes hold the complements of those values. Figure 2-2 also shows the category nodes and all the connections between the nodes of $F_1$ and $F_2$. Category nodes with indices 1 to C are called committed category nodes, while those with indices C+1 to N are called uncommitted category nodes. When all the committed category nodes fail to represent an input and are therefore out of the competition, one of the uncommitted category nodes must be committed. Once a node able to represent the input pattern has been found and has passed the vigilance test, its weights must be updated so that the characteristics of the new pattern are incorporated into node J (eq. 2-7). The weight adaptation equation is:

$$w_{iJ}^{new} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{old}\right) + (1 - \beta)\, w_{iJ}^{old}; \quad i = 1, \ldots, 2M \qquad (2\text{-}7)$$

where $\beta \in (0, 1]$ is the parameter called the learning rate.
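As an illustration of the complement-coded input layer and of the weight adaptation rule (eq. 2-7), the following minimal Python sketch may help. The function names `complement_code` and `update_weights` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def complement_code(a):
    """Map an M-valued pattern a (values in [0, 1]) to the 2M-valued input (a, 1 - a)."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def update_weights(w_J, A, beta=1.0):
    """Weight adaptation: w_new = beta * (A ^ w_old) + (1 - beta) * w_old."""
    return beta * np.minimum(A, w_J) + (1.0 - beta) * w_J

A = complement_code([0.2, 0.8])        # 2M = 4 input values
w_new = update_weights(np.ones(4), A)  # newly committed node, weights start at 1
```

With $\beta = 1$ (fast learning) a newly committed node whose weights start at 1 simply copies the input, since the minimum of each $A_i$ and 1 is $A_i$.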

A.3.2 Fuzzy ARTMAP

As already mentioned, Fuzzy ARTMAP is a generalization of ARTMAP (Carpenter et al. 1991b) (see Figures 3-1 and 3-2). In this case, the map field is an N×L matrix of binary weights ($w_{jk}$; j = 1, ..., N; k = 1, ..., L) initialized to 1 (Figure 3-4), where L is the number of classes considered.

Unlike in the unsupervised ART-type networks, in the supervised ones the vigilance parameter $\rho \in [0, 1]$ can increase during learning. For example, if the activation level of the winning node is greater than the preset value of the similarity parameter and yet the class similarity test is not passed, then $\rho$ takes the value of the similarity level increased by a small amount $\varepsilon$, as shown in the following equation:

$$\rho = \frac{\sum_{i=1}^{2M} \left(A_i^{(t)} \wedge w_{iJ}\right)}{\sum_{i=1}^{2M} A_i^{(t)}} + \varepsilon \qquad (3\text{-}1)$$

This definition of the vigilance parameter makes it possible to classify rare events (Carpenter et al. 1992). Figure 3-5 shows the dynamics of the vigilance parameter in supervised settings.
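The vigilance increase described above (often called match tracking) can be sketched as follows. The sketch assumes the standard form of Carpenter et al. (1992), in which the new vigilance is the normalized similarity of the rejected winner plus a small $\varepsilon$; the function name `raise_vigilance` and the default value of `epsilon` are hypothetical.

```python
import numpy as np

def raise_vigilance(A, w_J, epsilon=0.001):
    """Return a vigilance just above the similarity of the rejected winning node J."""
    similarity = np.minimum(A, w_J).sum() / A.sum()  # |A ^ w_J| / |A|
    return similarity + epsilon

A = np.array([0.2, 0.8, 0.8, 0.2])
w_J = np.array([0.9, 0.1, 0.1, 0.9])
rho = raise_vigilance(A, w_J)
```

Because the new vigilance is strictly greater than the similarity of node J, that node fails the vigilance test when the same input is searched again, and the search moves on to another node.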

To train the Fuzzy ARTMAP network, the pairs formed by the input vectors $A^{(t)}$ and $b^{(t)}$ must be presented to the ARTa and ARTb modules. The first set of vectors represents the training patterns, while the second represents the binary code assigned to the class to which the corresponding training pattern belongs.

When the activation level of the winning node exceeds the vigilance parameter, the class similarity must be evaluated; it is considered acceptable if it exceeds the preset value of the map field vigilance parameter, $\rho_{ab}$. The weights are then updated according to the equations

$$w_{iJ}^{new} = \beta \left(A_i^{(t)} \wedge w_{iJ}^{old}\right) + (1 - \beta)\, w_{iJ}^{old}; \quad i = 1, \ldots, 2M$$

$$w_{Jk}^{new} = \beta \left(b_k^{(t)} \wedge w_{Jk}^{old}\right) + (1 - \beta)\, w_{Jk}^{old}; \quad k = 1, \ldots, L$$

Otherwise, it is the vigilance parameter that must change its value, through equation 3-1. The map field vigilance parameter, $\rho_{ab}$, is kept constant throughout the training phase. The network repeats the search, among all the committed nodes, for a winning node that passes all the tests; if none is found, it commits new nodes until it finds one that represents the training pattern under consideration.

Once the training phase is complete, the ARTb module is reduced to its input layer (Figure 3-3) and the weights $w_{ij}$ and $w_{jk}$ are kept constant. During the classification phase, the activation level of all the committed category nodes is computed for each input pattern presented to ARTa. The node that reaches the maximum activation level is the winner. The score of this node is then computed for each of the input nodes of ARTb according to equation 3-6. The index of the node $b_K$ that reaches the maximum score gives the code of the class to which the input pattern belongs.

A.4. PROPOSAL OF TWO IMPROVED VERSIONS OF FUZZY ART

In order to reduce the computation times of the learning and classification algorithms of the unsupervised Fuzzy ART architecture, two new versions of this architecture have been developed in this work, called "Flagged Fuzzy ART" and "Compact Fuzzy ART" respectively.

A.4.1 "Flagged" version of Fuzzy ART

The first version ("Flagged") developed to improve the performance of Fuzzy ART is inspired by the ideas of Georgiopoulos et al. (1999). These authors proposed searching for the node of maximum value among all committed category nodes and one uncommitted category node. Their approach forces the use of high initial weight values and, consequently, fast learning. In the proposed flagged approach, the search for the node of maximum value considers, besides the committed nodes, only a single uncommitted node, suitably flagged with a fixed value $\phi_{C+1}$ that is less than zero but greater than the activation levels of the committed nodes that have been put out of competition. This guarantees that when all committed category nodes are out of competition, the uncommitted node with index C+1 and value $\phi_{C+1}$ wins. Moreover, this node always passes the match test, since it has been shown that the match value of any category node committed for the first time is 1 (eq. 2-19). Once the network reaches resonance, it must be determined whether a new node has been committed. If so, the number of committed category nodes increases by one and the weights of the node that becomes the new flagged node must be initialized to 1. The architecture of the proposed network is shown in Figure 2-3. Clearly, the number of comparisons needed to find the maximum activation value is lower than the number of comparisons involved in the operation of Fuzzy ART.
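The flagged search can be sketched in a few lines of Python. The concrete values are assumptions made for illustration only: nodes put out of competition are marked with -1, and the flag value is set to -0.5, which is one possible fixed value below zero and above the reset marker.

```python
FLAG_VALUE = -0.5    # hypothetical: any fixed value in (-1, 0) would serve
RESET_VALUE = -1.0   # assumed marker for nodes put out of competition


def pick_winner(committed_activations):
    """Return the index of the winner among the C committed nodes plus the
    single flagged uncommitted node, which sits at index C."""
    candidates = list(committed_activations) + [FLAG_VALUE]
    return max(range(len(candidates)), key=candidates.__getitem__)


# Three committed nodes (C = 3): the best committed node wins.
activations = [0.62, 0.91, 0.40]
first = pick_winner(activations)

# Once every committed node has been reset, the flagged node wins.
activations = [RESET_VALUE] * 3
last = pick_winner(activations)
```

Only C + 1 values take part in the search, regardless of how many uncommitted nodes the network has in reserve.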

A.4.2 "Compact" version of Fuzzy ART

In this version, the search for the node with maximum activation value takes into account only the committed category nodes. Uncommitted category nodes are committed in sequential order without needing to be flagged beforehand. Thus, if for a given input all committed nodes are put out of competition, the next uncommitted node becomes committed and represents the new input, with its weights defined by the following equation:

$$w_{iJ}^{new} = \beta\,A_i^{(t)} + (1 - \beta) \; ; \quad i = 1, \ldots, 2M \qquad (2\text{-}22)$$

As this equation shows, the weights of uncommitted nodes do not need to be initialized before those nodes are committed. This alone optimizes the original algorithm, since it considerably reduces the number of arithmetic operations to be performed. The new version requires C-1 comparisons, against the N-1 required by Fuzzy ART, which is a significant reduction in computation given that N ≫ C. Figure 2-4 shows the complete architecture of the "Compact" version of Fuzzy ART. Table 2-1 compares the original, "Flagged" and "Compact" versions of Fuzzy ART. The table shows that both proposed versions are faster than the original and that, of the two, the compact version is preferable: although the difference is very small, it requires less memory and fewer assignment operations.
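A short Python sketch may help to show why no prior initialization is needed. It assumes that an uncommitted node would otherwise start with all weights equal to 1, so applying the general learning rule to it collapses to β·A + (1 − β). The function name and the sample data are hypothetical.

```python
def commit_next_node(weights, a, beta):
    """Append a new category node with weights beta * a_i + (1 - beta)
    and return its index. No weights exist before the node is committed."""
    weights.append([beta * x + (1.0 - beta) for x in a])
    return len(weights) - 1


weights = []                       # committed nodes only; grows on demand
a = [0.2, 0.7, 0.8, 0.3]           # hypothetical complement-coded input
j = commit_next_node(weights, a, beta=0.5)

# Same result as initializing the node to 1 and applying the general rule.
explicit = [0.5 * min(x, 1.0) + (1.0 - 0.5) * 1.0 for x in a]
```

Because the list holds committed nodes only, the search for the maximum activation runs over C entries rather than over all N available nodes.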

A.5. PROPOSAL OF TWO NEW SUPERVISED ART-TYPE ARCHITECTURES

Like Fuzzy ARTMAP, the two new supervised ART-type architectures can learn and classify binary and multi-valued analog patterns. Both ANNs provide the same classification accuracy as the original, but their structures are much simpler. This leads to a significant reduction in computational complexity and, consequently, in the operating times of the network, in both the learning and the classification phases.


Both Supervised ART-I and Supervised ART-II are built from a single ART module. This approach eliminates the second ART module and the module called the map field, which in turn eliminates the weights associated with these two modules and the map field vigilance parameter. These simplifications have made it possible to use an analog class code (a positive integer) instead of a binary code of L digits, and to replace the memory of N×L map field weights with a memory of N positions, used to store the label that associates each newly committed node with the code of the class to which it belongs.

Figures 4-2 and 5-2 show the architectures of Supervised ART-I and Supervised ART-II respectively.

A.5.1 Learning and classification algorithms of the Supervised ART-I architecture

Supervised ART-I is trained by presenting the network with pairs formed by the input pattern vector $A^{(t)}$ and the class code assigned to that pattern, $b^{(t)}$. The activation level of each committed node is computed with equation (2-1). Among all committed category nodes, the node J (maximum activation level) is selected that meets the following two conditions: its match value is greater than or equal to the vigilance parameter, and it passes the class match test for that class ($b^{(t)}$). The node is then trained according to the following equation:

$$w_{iJ}^{new} = \beta\,(A_i^{(t)} \wedge w_{iJ}^{old}) + (1 - \beta)\,w_{iJ}^{old} \; ; \quad i = 1, \ldots, 2M$$

If the match test is passed but the class test is not, the value of the vigilance parameter must be updated according to the equation:

$$\rho = \frac{1}{M} \left\{ \sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{iJ}) \right\} + \varepsilon$$

In this case another committed node must be selected. The category node that could not represent the current input pattern must be put out of competition to prevent it from being selected again. If none of the committed category nodes can represent the input, a new node must be committed and labeled with the class code $b^{(t)}$. Each time a new training pattern is presented to the network, the vigilance parameter must return to its initial value.

Once Supervised ART-I has been trained with a sufficient and representative number of training patterns, classification is very simple. It is only necessary to present the patterns to be classified, $A^{(t)}$, to the network. The network computes the activation function of each committed category node, and the winner is the node with the highest activation level. The class code assigned to this node during training determines the class of the pattern currently present at the network input, provided that the match value of the winning node exceeds the initial value of the vigilance parameter, ρ. If it does not, the input cannot be classified. For this reason, how representative the training set is becomes a critical factor for the classification results.

The pseudo-code of the learning and classification algorithms of Supervised ART-I is given in detail in chapter 4 of the thesis.
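The training and classification steps described in this section can be put together in a compact Python sketch. It is a simplified illustration, not the thesis pseudo-code: the class name `SupervisedArt1`, the choice function `|A ∧ w| / (α + |w|)` with a small α, the value of ε, and the initialization of a new node as β·A + (1 − β) are all assumptions.

```python
def complement_code(x):
    """Append 1 - x_i to the input so that it has 2M components."""
    return list(x) + [1.0 - v for v in x]


def overlap(a, w):
    """Sum of the component-wise minimum (fuzzy AND) of a and w."""
    return sum(min(x, y) for x, y in zip(a, w))


class SupervisedArt1:
    def __init__(self, rho0, beta, alpha=1e-6, eps=1e-6):
        self.rho0, self.beta, self.alpha, self.eps = rho0, beta, alpha, eps
        self.weights = []   # one weight vector per committed node
        self.labels = []    # integer class code of each committed node

    def _activations(self, a):
        return [overlap(a, w) / (self.alpha + sum(w)) for w in self.weights]

    def train_one(self, a, label):
        rho = self.rho0     # vigilance restarts at its initial value
        act = self._activations(a)
        while True:
            if not act or max(act) < 0.0:
                # Every committed node is out of competition: commit a new one.
                self.weights.append([self.beta * x + (1.0 - self.beta) for x in a])
                self.labels.append(label)
                return len(self.weights) - 1
            j = max(range(len(act)), key=act.__getitem__)
            match = overlap(a, self.weights[j]) / sum(a)
            if match >= rho and self.labels[j] == label:
                w = self.weights[j]
                self.weights[j] = [self.beta * min(x, y) + (1.0 - self.beta) * y
                                   for x, y in zip(a, w)]
                return j
            if match >= rho:
                rho = match + self.eps   # match passed, class failed
            act[j] = -1.0                # put the node out of competition

    def classify(self, a):
        act = self._activations(a)
        if not act:
            return None
        j = max(range(len(act)), key=act.__getitem__)
        match = overlap(a, self.weights[j]) / sum(a)
        return self.labels[j] if match >= self.rho0 else None


net = SupervisedArt1(rho0=0.7, beta=1.0)
net.train_one(complement_code([0.10, 0.10]), 1)
net.train_one(complement_code([0.90, 0.90]), 2)
net.train_one(complement_code([0.15, 0.10]), 1)
known = net.classify(complement_code([0.12, 0.10]))
unknown = net.classify(complement_code([0.50, 0.50]))
```

In this toy run two nodes are committed, one per class. The first query is assigned to class 1, while the second falls below the initial vigilance and is left unclassified.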


A.5.2 Learning and classification algorithms of the Supervised ART-II architecture

As Figure 5-2 shows, the memory used to store the category nodes in Supervised ART-I has been divided into L "stacks", each containing $N_k$ ($k = 1, \ldots, L$) category nodes. The index k of each stack is the class code of the category nodes it contains. The number of nodes $N_k$ may vary from one stack to another and may depend on the nature and size of the data. However, if nothing is known about the data a priori, the same number of nodes should initially be assigned to each stack.

As with the other supervised ART-type networks, training of Supervised ART-II begins by presenting the network with an input pattern, $A^{(t)}$, and its corresponding class code, $b^{(t)}$. The activation level of all committed nodes in all stacks is then computed. In this case the activation level is defined by equation 5-1.

$$T_{j_k}^{(t)} = \frac{\sum_{i=1}^{2M} (A_i^{(t)} \wedge w_{i j_k})}{\sum_{i=1}^{2M} w_{i j_k}} \; ; \quad j_k = 1, \ldots, C(k), \; k = 1, \ldots, L \qquad (5\text{-}1)$$

where C(k) is the number of committed nodes in the k-th stack and $w_{i j_k}$ are the weights of the connections between category node $j_k$ of that stack and the i-th input node. Next, the node with the maximum activation level in each stack is determined, and the winner is the one with the highest activation level among these candidates. As in Fuzzy ARTMAP and Supervised ART-I, this node must pass both the match test and the class match test before its weights are trained. If the winning node ($J_K$) cannot represent the current input, it is taken out of the competition ($T_{J_K} = -1$) and another candidate is selected from that stack. The process is repeated with all candidate nodes until a committed node able to represent the current input pattern is found. If none of the committed category nodes in the stack assigned to the class code can represent the current input, a new node of that stack must be committed, since none of the committed nodes in other stacks will pass the class match test. Clearly, the number of nodes to be considered for each input presented to the network is considerably lower in this approach than in Supervised ART-I. On the other hand, Supervised ART-I allows any category node to represent any class and Supervised ART-II does not. This disadvantage can be minimized if the memory used in Supervised ART-I to store the category node labels is used instead to double the number of category nodes in each stack of Supervised ART-II, without increasing the memory requirements.

In Supervised ART-II, classification consists of assigning to the input pattern currently presented to the network the class code that was assigned, during training, to the node with the highest activation level for that pattern. If no node exceeds the vigilance parameter, the input cannot be classified by the network.

The pseudo-code of the learning and classification algorithms of Supervised ART-II is given in detail in chapter 5 of the thesis.
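The stack organization and the classification step can be sketched as follows in Python. This is an illustration under stated assumptions: the stacks are held in a dictionary keyed by class code, the activation is taken as `|A ∧ w| / |w|`, the vigilance test is applied to the overall winner only, and the weight vectors are invented for the example.

```python
def activation(a, w):
    """Assumed activation level: |a AND w| / |w|."""
    return sum(min(x, y) for x, y in zip(a, w)) / sum(w)


def classify(stacks, a, rho):
    """Return the class code of the stack holding the most active node,
    or None when that node fails the vigilance test."""
    best_class, best_t, best_w = None, -1.0, None
    for class_code, nodes in stacks.items():
        for w in nodes:
            t = activation(a, w)
            if t > best_t:
                best_class, best_t, best_w = class_code, t, w
    if best_w is None:
        return None
    match = sum(min(x, y) for x, y in zip(a, best_w)) / sum(a)
    return best_class if match >= rho else None


# Hypothetical trained network: L = 2 stacks with one committed node each.
stacks = {
    1: [[0.10, 0.10, 0.85, 0.90]],
    2: [[0.90, 0.90, 0.10, 0.10]],
}
near_class_1 = classify(stacks, [0.12, 0.10, 0.88, 0.90], rho=0.7)
rejected = classify(stacks, [0.50, 0.50, 0.50, 0.50], rho=0.7)
```

The class code is never stored per node: it is simply the key of the stack in which the winning node lives.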


A.6. PERFORMANCE EVALUATION OF SUPERVISED ART-I AND SUPERVISED ART-II IN THE CLASSIFICATION OF REMOTELY SENSED IMAGES

Once the two new supervised ART-type architectures had been developed, an exhaustive study was made of the behaviour of their learning and classification algorithms for the particular case of images remotely sensed by the Thematic Mapper (TM) sensor (Al-Rawi et al. 2000). To this end, the dependence of the number of category nodes, the learning and classification times, and the classification accuracy on the dynamic parameters (ρ, β) was studied for training sets of different sizes (200, 600, 1000, 3000, 9000 and 15000). The complete domain defined by the vigilance parameter ρ ∈ [0, 1] and the learning rate parameter β ∈ (0, 1] was analysed for each training set. This study involved training each network, defined by a pair of values (ρ, β), on each of the training sets. In total, given the values assigned to the learning parameters, ρ (0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90 and 0.95) and β (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95 and 1.0), each network was trained 960 times (8 values of ρ × 20 values of β × 6 training sets). It should be mentioned that this study was feasible because of the simplicity of the architectures proposed in this work. It should also be stressed that the theoretical studies show that the number of category nodes involved in learning a given training set is the same for Fuzzy ARTMAP, Supervised ART-I and Supervised ART-II.
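The figure of 960 training runs per network is consistent with the full Cartesian product of the parameter values and training set sizes listed above, as the following short Python check shows. The variable names are chosen here for illustration.

```python
from itertools import product

# Parameter values and training set sizes as listed in the text.
rho_values = [round(0.60 + 0.05 * i, 2) for i in range(8)]      # 0.60 .. 0.95
beta_values = [round(0.05 * i, 2) for i in range(1, 21)]        # 0.05 .. 1.00
training_sizes = [200, 600, 1000, 3000, 9000, 15000]

# One training run per (rho, beta, training set size) combination.
runs = list(product(rho_values, beta_values, training_sizes))
print(len(runs))
```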

In both the learning and the classification phases, the input data to the network were the spectral values of the 6 bands of the TM sensor that share the same spatial resolution (TM1, TM2, TM3, TM4, TM5 and TM7), normalized to [0, 1]. The thematic class of each training pattern was assigned through field visits to the selected areas, together with other auxiliary information (maps, monographs, ...).
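As an illustration of the input encoding, the Python sketch below normalizes the six band values of one pixel and appends their complements, which gives the 2M = 12 components used by the weight equations. The division by 255 assumes 8-bit digital numbers, and both that normalization and the sample pixel values are assumptions, not details taken from the thesis.

```python
def encode_pixel(dn_values, max_dn=255):
    """Scale digital numbers to [0, 1] and append the complement 1 - x_i."""
    x = [v / max_dn for v in dn_values]
    return x + [1.0 - v for v in x]


# Hypothetical digital numbers for TM1, TM2, TM3, TM4, TM5 and TM7.
pixel = [63, 25, 21, 90, 70, 27]
a = encode_pixel(pixel)
```

With complement coding, the components of every encoded pattern add up to M (here 6), whatever the pixel values are.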

Table 6-2 summarizes the most significant results of the experiments: the maximum and minimum values of the number of committed category nodes, the learning and classification times, and the classification accuracy for all training sets, together with the pair of values of ρ and β at which they occur. The table shows that the number of committed nodes ranges from 31 (for ρ = 0.70, β = 1.0 and 200 training patterns) to 1077 (for ρ = 0.70, β = 0.40 and 15 000 training patterns), while the classification accuracy ranges from 64.66% (for ρ = 0.80, β = 0.95 and 15 000 training patterns) to 81.87% (for ρ = 0.95, β = 0.40 and 9 000 training patterns).

To show how the algorithms behave for a fixed number of patterns as a function of ρ and β, and given that the maximum accuracy was reached with 9 000 patterns, Figures 6-1, 6-2 and 6-3 show, for that number of patterns and over the whole domain defined by ρ and β, the variation of the number of committed nodes, of the learning time of Supervised ART-I and of the learning time of Supervised ART-II respectively.

This study confirmed that the vigilance parameter ρ is the parameter that determines the characteristics and performance of a given ART-type network. As Figure 6-1 shows, the number of committed nodes increases as ρ increases. For ρ < 0.80 this number is practically independent of β, provided that β lies in the interval [0.3, 0.8], whereas it increases markedly as the vigilance parameter approaches its maximum value. The influence of β on the number of committed nodes can also be seen in Figure 6-1. For medium and low values of ρ, this number clearly increases when β approaches either end of its range. This is because small values of β can cause incomplete learning (under-training), while high values cause the opposite effect (over-training). In either case, each established category is unable to represent all its members, which increases the number of committed nodes. These over-training and under-training effects lengthen the learning and classification times, as a consequence of the larger number of committed nodes, and also reduce the classification accuracy. This behaviour is reflected in Figures 6-2, 6-3, 6-5 and 6-6.

The comparative study of the learning times of Supervised ART-I and Supervised ART-II made it possible to determine a threshold value for the number of category nodes (see Figure 6-4): if the number of nodes is below 1000, Supervised ART-I takes less time to learn, while above that value Supervised ART-II learns faster.

In order to improve the classification results, the network was trained with 9 000 patterns and with vigilance values higher than those considered so far. Experiments were carried out for ρ = 0.96, 0.97, 0.98 and 0.99, varying the learning parameter over all the values considered previously. In this way a classification accuracy of 85.87% was obtained for ρ = 0.98 and β = 0.50. The classification times in this case were 25 minutes on a SUN 4 SPARC workstation and 9.50 minutes on an ALPHA 500 workstation. Specific classification results are presented and discussed in the main body of the thesis.

The differences between the two proposed architectures lie essentially in the fact that Supervised ART-II requires less memory and learns faster than Supervised ART-I for non-homogeneous data.

A.7. PERFORMANCE OF SUPERVISED ART-TYPE NETWORKS FOR DIFFERENT DYNAMICS OF THE VIGILANCE PARAMETER

In all the experiments carried out so far in this work, the vigilance parameter was updated according to the approach proposed in the literature for all supervised ART-type networks. This approach has been called flying (eq. 3-1) to distinguish it from the approaches proposed in this work: vigilance parameter fixed during learning (fixed vigilance), free vigilance parameter and floating vigilance parameter.

In the fixed approach, all committed category nodes have the same confidence level, so that any node whose match value exceeds the initial vigilance parameter and which belongs to the same class can represent the input, regardless of the order established by the activation function values of the category nodes.

In the free approach, the vigilance parameter can move freely to values above or below its initial value. At any moment its value is determined by the match value of the last category node that was unable to represent the current input pattern.


The difference between the free and floating approaches is that the latter prevents the vigilance parameter from falling below its initial value. This ensures that all committed category nodes have a minimum confidence level. In this case the resulting number of committed nodes will be higher than with the other two proposed approaches, but always lower than with the flying vigilance parameter.
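The four vigilance dynamics can be contrasted in a small Python sketch. It is one possible reading of the descriptions above rather than the thesis algorithm: the function `next_vigilance`, the value of ε, and the exact form of each rule (in particular, that the free and floating rules use the match value without adding ε) are assumptions.

```python
def next_vigilance(mode, rho0, rho, match, eps=1e-6):
    """Vigilance to use after a node with match value `match` has failed
    to represent the current input. rho0 is the initial vigilance."""
    if mode == "fixed":
        return rho0                                   # never changes
    if mode == "flying":
        # Raised just above the match value of a node that passed the
        # match test but failed the class test.
        return match + eps if match >= rho else rho
    if mode == "free":
        return match                                  # may drop below rho0
    if mode == "floating":
        return max(rho0, match)                       # never below rho0
    raise ValueError(f"unknown mode: {mode}")


rho0 = 0.7
low = {m: next_vigilance(m, rho0, rho0, match=0.5)
       for m in ("fixed", "flying", "free", "floating")}
high = {m: next_vigilance(m, rho0, rho0, match=0.9)
        for m in ("fixed", "flying", "free", "floating")}
```

Under this reading, the free and floating rules differ only when the failed node's match value lies below the initial vigilance.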

In order to investigate the effect of these approaches on the operation of this type of network, experiments were carried out for five pairs of values of the network parameters ($\rho_0$ and β), all of them belonging to the set of optimal values previously obtained for the flying approach: (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15) and (0.00, 0.15). The classification accuracy obtained ranged from 66.71% with the free approach to 87.05% with the flying approach. The number of category nodes was lowest with the free approach (120) and highest with the flying approach (1252). These accuracy values are the maxima reached for each approach and correspond to the dynamic parameter values ρ = 0.98 and β = 0.50.

Which approach to use depends on the requirements imposed and on the values selected for the dynamic parameters. If the aim is to optimize classification accuracy, the flying approach is the best choice for vigilance values greater than or equal to 0.95, whereas below this threshold the floating approach is more advisable. From the point of view of minimizing the number of category nodes, and therefore the learning and classification times, the floating approach is the most suitable if $\rho_0 = 0$.


When the vigilance parameter approaches 1, the number of category nodes and the classification accuracy are practically independent of whether the flying, floating or fixed approach is used. With the floating and free approaches, network performance becomes very similar as $\rho_0$ approaches zero, and identical in the limit.

A.8. CONCLUSIONS

The conclusions that can be drawn from the results of this doctoral thesis can be summarized in the following points:

1.- Two new versions of the artificial neural network (ANN) known as Fuzzy ART have been developed: Flagged and Compact. Analysis of the unsupervised algorithms of these new versions shows that their classification capability is identical to that of the original algorithm, while they reduce its computational complexity and therefore the computation times of both the training and the classification stages.

2.- A new supervised ART-type ANN architecture, called Supervised ART-I, has been developed together with its supervised learning algorithm. The proposed architecture is much simpler than that of the original model, since it is based on a single ART module instead of two. This also eliminates the module that links them (map field). A theoretical study has shown that Supervised ART-I has the same classification accuracy as Fuzzy ARTMAP, but with lower storage requirements and a shorter learning time.

3.- A second supervised ART-type architecture, called Supervised ART-II, has been proposed. It offers the same classification performance as Fuzzy ARTMAP and Supervised ART-I. The category node layer of Supervised ART-II is organized into L stacks, which substantially reduces the memory needed to store the class labels compared with Supervised ART-I.

4.- Supervised ART-I allows any category node to represent any class, whereas in Supervised ART-II each category node is predestined to represent a particular class. However, for a fixed memory size, the memory used in Supervised ART-I to store the category node labels could be used in Supervised ART-II to double the number of category nodes in each stack, which would minimize this inflexibility.

5.- The supervised approaches proposed for Fuzzy ART can be extended to any ART-type artificial neural network.

6.- It has been shown that which of the proposed supervised approaches works best depends on the homogeneity of the environment. Supervised ART-I is better suited to homogeneous environments, while Supervised ART-II is oriented towards non-homogeneous environments. This homogeneity is defined in terms of the type of data and of the dynamic parameters.

7.- The simplicity of the Supervised ART-I and Supervised ART-II architectures makes them especially suitable for implementation as application-specific integrated circuits.

8.- A detailed study of the behaviour of Supervised ART-I and Supervised ART-II over the domain of their dynamic parameters has clarified how these parameters influence the operation of the networks.

9.- An automatic system has been developed for classifying images remotely sensed by the TM sensor, and it achieves very good classification accuracy.


10.- Different dynamics of the vigilance parameter have been proposed and evaluated. These investigations lead to the conclusion that the "flying" approach is suitable when the initial value of this parameter is very high (greater than 0.95), while in all other cases the "floating" approach is more suitable.

Some of the issues arising from this work that should be investigated in the future are: the study of new learning algorithms that reduce or eliminate episodes of under-training and over-training; the implementation of the proposed algorithms on different DSP (Digital Signal Processing) systems; and the study of how the designed architectures behave on other types of problems. In this respect, it should be noted that the architectures developed in this work have been used, with very good results, for the detection and monitoring of forest fires (Al-Rawi et al. 2001a, b, c & d) and for cloud detection (Al-Rawi et al. 2001e & f).
