
Neural Processing Letters 6: 109–117, 1997. © 1997 Kluwer Academic Publishers. Printed in the Netherlands.

ART1: Similarity Measures

EI EI KHIN 1, AMAROTTAM SHRESTHA 2 and R. SADANANDA 1
1 Computer Science and Information Management Program, Asian Institute of Technology, GPO Box 2754, Bangkok 10501, Thailand; 2 Dept. of Information Technology, Sirindhorn International Institute of Technology, Thammasat University, P.O. Box 22, Thammasat Rangsit P.O., Pathumthani 12121, Thailand
E-mail: [email protected]; [email protected]

Key words: ART1, neural network, self stabilization, similarity measures, stability-plasticity dilemma

Abstract. This paper concerns the ART1 (Adaptive Resonance Theory 1) neural network. Important features of ART1 are the similarity measure (criterion), the vigilance parameter (ρ), and their role in classifying input patterns. Experimental results show that, with the originally designed similarity measure, the number of categories does not grow monotonically with increasing ρ: it sometimes decreases. This runs counter to the claim regarding the 'stability-plasticity' dilemma. A number of researchers have considered this and suggested alternative similarity measures. Here, we propose a new similarity criterion which eliminates this problem and also requires the lowest number of list presentations for self-stabilization of the network. We compare the results of different similarity criteria experimentally and present them in graphs. An analysis of the network in a noisy environment is also carried out.

1. Introduction

ART1 (Adaptive Resonance Theory 1) has evolved as a real-time network model for unsupervised category learning and pattern recognition. The ART system is very stable in both the search process and training. It is vigilant to unfamiliar and unexpected events; at the same time, it memorizes familiar and expected events. The familiarity (similarity) test is carried out by the novelty detector in the orienting subsystem of ART1. For a detailed description of the ART1 model, related terminology, algorithms and theorems, readers are referred to Carpenter and Grossberg [1].

An essential property of ART1 concerns the orienting subsystem, which expresses whether an input pattern matches an existing recognition code or is unfamiliar to the network and requires a new recognition code. The network permits us to vary the threshold value of vigilance to obtain the desired categorization. Given a fixed vigilance level, the network automatically rescales its sensitivity to patterns of variable complexity. A low vigilance level leads to learning of coarse categories, whereas a high vigilance level leads to learning of fine categories. If the given vigilance value (ρ) is less than or equal to the similarity value of the current input pattern,


resonance occurs and the pattern is learned into the matching category; otherwise, a reset signal is issued in the orienting subsystem to ensure the orthogonality of the compared patterns.

Many similarity criteria have been proposed and tested by researchers. Besides the original Carpenter and Grossberg criterion [1], these include the Cosine Angle [2], the Hamming Distance, and the Sadananda and Sudhakara Rao criterion [3]. A brief explanation of each of these similarity measures is given in the following section.

2. Similarity measures

2.1. CARPENTER AND GROSSBERG METHOD

The similarity measure for this method can be represented by

$\rho \le |T \cdot X| / |X|$

where

$|T \cdot X| = \sum_{i=0}^{N-1} t_{ij}\, x_i$, $\quad |X| = \sum_{i=0}^{N-1} x_i$ and $\quad |T| = \sum_{i=0}^{N-1} t_{ij}$

and
ρ = vigilance parameter,
T = stored exemplar,
X = input pattern,
N = the length of the input vector (in bits),
x_i = component of X corresponding to node i in the F1 layer,
t_ij = component of the weight vector T connecting node i in the F1 layer and node j in the F2 layer,
· = multiplication.

This method has the 'superset-subset' problem, i.e., categorization depends on the order in which the input patterns are presented. The problem can be overcome by repeated list presentations, but in real-time usage there is no guarantee of receiving the same input pattern again. This similarity test ignores any additional features in the input pattern beyond the features it shares with the stored exemplar. The values of this similarity range from zero to one, corresponding to orthogonal and identical patterns, respectively.
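The paper gives no code; as an illustrative sketch (function and variable names are ours), the criterion can be computed for binary vectors in Python as follows:

```python
def cg_similarity(t, x):
    """Carpenter-Grossberg match ratio |T.X| / |X| for binary vectors t and x."""
    overlap = sum(ti * xi for ti, xi in zip(t, x))  # |T.X|
    norm_x = sum(x)                                 # |X|
    return overlap / norm_x if norm_x else 0.0

# A template that is a superset of the input matches perfectly,
# while the reverse ordering does not: the superset-subset asymmetry.
print(cg_similarity([1, 1, 1, 1], [1, 1, 0, 0]))  # 1.0
print(cg_similarity([1, 1, 0, 0], [1, 1, 1, 1]))  # 0.5
```

The asymmetry in the two calls above illustrates why the categorization depends on presentation order.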

2.2. COSINE ANGLE METHOD

The Cosine Angle similarity measure [2] for binary vector patterns of identical length can be represented as

$\rho \le \cos(\theta) = |T \cdot X| / \left( \sqrt{|T|} \cdot \sqrt{|X|} \right)$


Figure 1. Example patterns.

where θ is the threshold angle between the two vectors T and X, and the other notations are as specified for Carpenter and Grossberg's similarity criterion described in Section 2.1. The values of similarity range between 0 and 1, the same as for the original similarity criterion.
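A minimal sketch of this formula for binary vectors (names are ours):

```python
import math

def cosine_similarity(t, x):
    """cos(theta) = |T.X| / (sqrt(|T|) * sqrt(|X|)) for binary vectors t and x."""
    overlap = sum(ti * xi for ti, xi in zip(t, x))  # |T.X|
    denom = math.sqrt(sum(t)) * math.sqrt(sum(x))   # sqrt(|T|) * sqrt(|X|)
    return overlap / denom if denom else 0.0

# Identical patterns score 1, orthogonal patterns score 0.
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```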

2.3. HAMMING DISTANCE METHOD

The Hamming distance is the number of bits that differ between two binary vectors of identical length. Given the two patterns 0011 and 1111, the Hamming distance between them is two. The Hamming distance is also two for 0011 and 0000. Thus the patterns 1111 and 0000 are at equal Hamming distance from the pattern 0011, although they are in fact very different. The following similarity measure is based on the concept of Hamming distance:

$\rho \le (T \oplus X) / N$

where T and X are two binary vectors of identical length N, and

$T \oplus X = \sum_{i=0}^{N-1} t_{ij} \oplus x_i$, with ⊕ = exclusive OR.

The notations used are as specified for Carpenter and Grossberg's similarity criterion explained in Section 2.1. The values range between 0 and 1, making the measure appropriate as a similarity criterion.
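A direct transcription of the formula (names are ours); note that, as written, identical vectors score 0 and fully complementary vectors score 1, so the quantity behaves as a normalized distance:

```python
def hamming_fraction(t, x):
    """(T XOR X) / N: fraction of differing bits in two equal-length binary vectors."""
    diff = sum(ti ^ xi for ti, xi in zip(t, x))  # bitwise XOR, summed
    return diff / len(t)

# Both 1111 and 0000 are at distance 2 from 0011, as in the example above.
print(hamming_fraction([0, 0, 1, 1], [1, 1, 1, 1]))  # 0.5
print(hamming_fraction([0, 0, 1, 1], [0, 0, 0, 0]))  # 0.5
```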

2.4. SADANANDA AND SUDHAKARA RAO METHOD

The similarity criterion can be represented by

$\rho \le |T \cdot X| / \left( |T| + |X| - |T \cdot X| \right)$

with the same notations as in the Carpenter and Grossberg similarity criterion (Section 2.1).

This criterion eliminates the superset-subset problem encountered in Carpenter and Grossberg's method [1]. The similarity is the ratio of the size of the subset pattern to the size of the superset pattern, irrespective of the order of presentation, for patterns such as those shown in Figure 1. The range of similarity values is the same as for the criteria described above.
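For binary sets this ratio coincides with the Jaccard index. A sketch (names are ours) showing its order independence:

```python
def sr_similarity(t, x):
    """Sadananda-Sudhakara Rao ratio |T.X| / (|T| + |X| - |T.X|) for binary vectors."""
    overlap = sum(ti * xi for ti, xi in zip(t, x))  # |T.X|
    denom = sum(t) + sum(x) - overlap               # |T| + |X| - |T.X|
    return overlap / denom if denom else 0.0

# Symmetric in its arguments, so presentation order does not matter.
print(sr_similarity([1, 1, 0, 0], [1, 1, 1, 1]))  # 0.5
print(sr_similarity([1, 1, 1, 1], [1, 1, 0, 0]))  # 0.5
```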


2.5. PROPOSED METHOD

One of the basic problems with the previous similarity measures is the formation of a haphazard number of categories as the vigilance value increases: the number of categories does not increase monotonically with the vigilance value. The proposed method stems from this nonlinearity and addresses it through repeated consideration of the magnitudes of the stored template and the input pattern (|T| and |X|).

It ensures fine differentiation of the input patterns by giving a larger similarity value than the other criteria. This is evident from the fact that the proposed similarity criterion adds an extra positive term |T| to both the numerator and the denominator of the original Carpenter and Grossberg similarity criterion.

The proposed similarity measure is given by

$\rho \le \left( |T \cdot X| + |T| \right) / \left( |T| + |X| \right)$

with the definitions and notations as stated in Section 2.1.

When the patterns are very different and the input is large, the value of the proposed similarity measure approaches zero. When the patterns are identical, the value becomes one, indicating absolute similarity. This agrees with the boundary values of zero and one expected of a similarity criterion.

The proposed similarity criterion is implemented by modifying the existing algorithm for the ART1 network without much complication.
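A minimal sketch of the proposed formula for binary vectors (names are ours):

```python
def proposed_similarity(t, x):
    """Proposed ratio (|T.X| + |T|) / (|T| + |X|) for binary vectors t and x."""
    overlap = sum(ti * xi for ti, xi in zip(t, x))  # |T.X|
    denom = sum(t) + sum(x)                         # |T| + |X|
    return (overlap + sum(t)) / denom if denom else 0.0

# Identical patterns give 1; a disjoint, much larger input drives the value down.
print(proposed_similarity([1, 0, 1], [1, 0, 1]))                    # 1.0
print(proposed_similarity([1, 0, 0, 0, 0, 0], [0, 1, 1, 1, 1, 1]))  # ~0.167
```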

3. Characteristics of Proposed Similarity Measure

The following lemmas characterize the proposed similarity criterion. Proofs of these lemmas can be found in [4].

Lemma 1: If the winning neuron's corresponding template T is a superset of an input I, then the similarity between the exemplar and the input will be one.

In Carpenter and Grossberg's method, the ratio will likewise be one.

Lemma 2: If the winning neuron's corresponding template T is a subset of an input I, then the similarity ratio will be less than one.

The same is true for Carpenter and Grossberg’s method.

Lemma 3: If the corresponding template T of the winning neuron is identical to the input I, then the similarity for both the proposed method and Carpenter and Grossberg's method is 1.

Lemma 4: If the winning neuron's corresponding template T and the input pattern I are mixed templates, satisfying none of the above cases, then the similarity will be less than one for the proposed method as well as for Carpenter and Grossberg's.
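Lemmas 1 through 4 can be spot-checked numerically with a small sketch (assuming the proposed formula above; the example vectors and names are ours):

```python
def proposed_similarity(t, x):
    """(|T.X| + |T|) / (|T| + |X|) for binary vectors t and x."""
    overlap = sum(ti * xi for ti, xi in zip(t, x))
    return (overlap + sum(t)) / (sum(t) + sum(x))

superset_t = proposed_similarity([1, 1, 1, 1], [1, 1, 0, 0])  # Lemma 1: T superset of I
subset_t   = proposed_similarity([1, 1, 0, 0], [1, 1, 1, 1])  # Lemma 2: T subset of I
identical  = proposed_similarity([1, 1, 0, 0], [1, 1, 0, 0])  # Lemma 3: T identical to I
mixed      = proposed_similarity([1, 1, 0, 0], [0, 1, 1, 0])  # Lemma 4: mixed templates
print(superset_t, identical)    # both 1.0
print(subset_t < 1, mixed < 1)  # True True
```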


Figure 2. Experimental results (category formation) with uppercase characters as inputs for different similarity criteria.

Lemma 5: If the F2 layer has sufficient nodes for each input pattern and the vigilance parameter ρ is approximately one, then under the proposed method each member of a list of arbitrary n input patterns presented to the network will have direct access to an F2 node, the same as in Carpenter and Grossberg's method, both for input patterns of increasing size and for input patterns of decreasing size.

In Carpenter and Grossberg's method, at most one list presentation will be required with input patterns of increasing size (X1 ⊂ X2 ⊂ X3 ⊂ ... ⊂ Xn) and, in the worst case, n list presentations will be required with input patterns of decreasing size (X1 ⊃ X2 ⊃ X3 ⊃ ... ⊃ Xn).

4. Experimental Results

The simulated ART1 network is presented with the alphabet characters (uppercase and lowercase) sequentially, using the new similarity measure. Figure 2 and Figure 3 show the experimental results with uppercase characters as inputs, and Figure 4 and Figure 5 show those with lowercase characters as inputs, for stable categorization.
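The paper does not list its simulation code; the category-formation experiment can be approximated with a simplified sketch (assumed details, ours: fast learning via bitwise AND, and the similarity value used directly for both choice and match, which collapses ART1's separate choice function and reset search into one step):

```python
def proposed_similarity(t, x):
    """(|T.X| + |T|) / (|T| + |X|) for binary vectors t and x."""
    overlap = sum(ti * xi for ti, xi in zip(t, x))
    denom = sum(t) + sum(x)
    return (overlap + sum(t)) / denom if denom else 0.0

def categorize(patterns, rho):
    """Greedy ART1-like clustering: commit a new category when no template passes rho."""
    templates = []
    for x in patterns:
        best = max(templates, key=lambda t: proposed_similarity(t, x), default=None)
        if best is not None and proposed_similarity(best, x) >= rho:
            # Fast learning: the winning template becomes the bitwise AND.
            best[:] = [ti & xi for ti, xi in zip(best, x)]
        else:
            templates.append(list(x))
    return templates

patterns = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
print(len(categorize(patterns, rho=0.95)))  # 3: high vigilance, finer categories
print(len(categorize(patterns, rho=0.55)))  # 2: low vigilance, coarser categories
```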

4.1. ANALYSIS OF RESULTS

As shown in the figures, with the proposed method the number of categories formed for uppercase as well as lowercase characters increases monotonically with the vigilance value. For the other similarity criteria, there are peaks (increments) and dips (decrements) in the curves; that is, the number of categories formed both increases and decreases as the vigilance parameter increases.

Near a vigilance value of one, the number of clusters formed is essentially the same for all methods. It can be seen from the graphs that the number of list presentations needed


Figure 3. Experimental results (list presentations) with uppercase characters as inputs for different similarity criteria.

Figure 4. Experimental results (category formation) with lowercase characters as inputs for different similarity criteria.

to directly access the corresponding stored templates is always lower for the proposed method than for the other methods, for both uppercase and lowercase characters as inputs. This is an important point because it saves training time and memory capacity. One significant observation is that, for our method, category formation is more sensitive to changes in the vigilance value near a vigilance value of one.

4.2. NOISY ENVIRONMENT TEST

The simulated network is presented sequentially with fresh and noisy patterns, as shown in Figure 6, to analyze the performance of the proposed similarity measure


Figure 5. Experimental results (list presentations) with lowercase characters as inputs for different similarity criteria.

Figure 6. Noisy input patterns.

under a noisy environment. The experimental results for these patterns are depicted in Figure 7 and Figure 8, in comparison with the original Carpenter and Grossberg method.

Both the Carpenter and Grossberg method and the proposed method give 11 groups. Our noisy input patterns are not complex enough to require repeated list presentations to self-stabilize the network, as shown in Figure 8. Finer categories are seen at smaller vigilance values for the Carpenter and Grossberg method, but the results are the same for both methods when the vigilance is equal to 1.

Figure 7. Experimental results (category formation) with noisy input patterns.

Figure 8. Experimental results (list presentations) with noisy input patterns.

5. Conclusion

An alternative similarity measure for the novelty detector of ART1 has been proposed and examined by comparison with other similarity measures. It shows good promise in overcoming the stability-plasticity dilemma. Applications that intend to use the ART network paradigm are thereby able to pick the suitable similarity measure. The proposed method can also be extended for use in other members of the ART family, such as ART2, ART2A, ART3, and Fuzzy ART.

References

1. G.A. Carpenter and S. Grossberg, “A Massively Parallel Architecture for a Self-organizing Neural Pattern Recognition Machine”, Computer Vision, Graphics and Image Processing, Vol. 37, pp. 54–115, 1987.

2. M.M.A. Hashem, “Adaptive Resonance Theory: Characterization of ART1 and Development of Similarity Measure”, Master's Thesis, Asian Institute of Technology, Bangkok, 1993.


3. R. Sadananda and G.R.M. Sudhakara Rao, “ART1: Model Algorithm Characterization and New Similarity Metric Proposition in the Novelty Detector”, Proc. IEEE Neural Network Conference (Perth), Causal Productions, Australia, 1995.

4. E.E. Khin, “Concept Generalization in Adaptive Resonance Theory”, Master's Thesis, Asian Institute of Technology, Bangkok, 1996.