
A Weighted Voting Model of Associative Memory

Xiaoyan Mu, Paul Watta, and Mohamad H. Hassoun

Abstract—This paper presents an analysis of a random access memory (RAM)-based associative memory which uses a weighted voting scheme for information retrieval. This weighted voting memory can operate in heteroassociative or autoassociative mode, can store both real-valued and binary-valued patterns, and, unlike many memory models, is equipped with a rejection mechanism. A theoretical analysis of the performance of the weighted voting memory is given for the case of binary and random memory sets. Performance measures are derived as a function of the model parameters: pattern size, window size, and number of patterns in the memory set. It is shown that the weighted voting model has large capacity and error correction. The results show that the weighted voting model can successfully achieve high detection and identification rates and, simultaneously, low false-acceptance rates.

Index Terms—Associative memory, capacity, neural network, retrieval, voting, weighted voting.

I. INTRODUCTION

THE associative memory problem is stated as follows. We are given a fundamental memory set of desired associations $\{(\mathbf{x}^k, \mathbf{y}^k),\ k = 1, \ldots, m\}$, where $\mathbf{x}^k$ is an $N$-dimensional input pattern and $\mathbf{y}^k$ is the corresponding desired output pattern. The task is to design a system which robustly stores the fundamental associations [13], such that the following holds:

1) when presented with $\mathbf{x}^k$ as input, the system should produce $\mathbf{y}^k$ at the output;

2) when presented with a noisy (corrupted, distorted, or incomplete) version of $\mathbf{x}^k$ at the input, the system should also produce $\mathbf{y}^k$ at the output;

3) when presented with an input $\mathbf{x}$ that is not sufficiently similar to any of the inputs in the memory set, the system should reject the input.

The meaning of the words noisy and not sufficiently similar is application dependent. For example, in an image processing application, translation, rotation, and scale variations are common types of image distortion.

Early work on associative neural networks [9], [12], [15], [19], [20] focused on designing systems to meet the first two requirements. In fact, this focus on just the first two requirements is characteristic of more recent research as well [10], [30], [31], [38]. For many practical applications, though, the third requirement is as important or even more important than the first two. Many neural-net-based associative memories have no rejection mechanism and, hence, cannot even distinguish between meaningful patterns and pure noise. This glaring deficiency may explain why few, if any, of the associative neural memory designs have found their way into practical applications. This is unfortunate since many important and practical applications require associative memory as a main component. Automated human face recognition is one such application.

Manuscript received May 23, 2006; revised October 26, 2006; accepted November 14, 2006. This work was supported in part by the University of Michigan under the Office of the Vice President for Research (OVPR) Grant.

X. Mu is with the Department of Electrical and Computer Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803 USA (e-mail: [email protected]).

P. Watta is with the Department of Electrical and Computer Engineering, University of Michigan-Dearborn, Dearborn, MI 48128 USA (e-mail: [email protected]).

M. H. Hassoun is with the Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNN.2007.891196

In [16], a voting-based model of associative memory was proposed and analyzed. This associative memory model was called a two-level decoupled Hamming memory in order to emphasize the two types of processing that it employs. The low-level processing consists of a network of random access memory (RAM) processors. Each RAM unit computes local distance measures between the given input pattern and each of the patterns in the memory set. After cycling through the entire memory set, each RAM processor casts a vote for the pattern in the memory set to which it is (locally) closest. The higher level processing of this memory consists of a voting network which tallies the votes cast for each memory set pattern, and then outputs the pattern with the most votes.

The two-level decoupled Hamming memory was shown to have a number of advantages over other neural-net-based associative memories [16], [37]. First, it was shown both theoretically and experimentally that the two-level decoupled Hamming memory has high capacity and offers a large amount of error correction. Second, this model never produces spurious memories and cannot get stuck in oscillations. Third, this memory model is easy to maintain: associations can easily be added to or deleted from the memory set online, whereas some neural net approaches require complete retraining when adding or deleting a single association. Fourth, given a single memory key, this model is able to retrieve more than just one pattern. By ordering the number of votes, the memory can retrieve an ordered list of best-matching patterns. Finally, the system can be implemented in hardware with present digital technology.

In order to be consistent with the present discussion, we will refer to the two-level decoupled Hamming memory as the voting-based associative memory model, or simply voting model.

In this paper, we extend the analysis of the voting model in several ways. First, the voting memory is enhanced by including a rejection mechanism. The rejection threshold is in terms of number of votes, which is typically easier to adjust and more robust than thresholds set up to discriminate among distances. Second, we generalize the voting scheme by allowing each local processor to cast not just a single vote, but a set of weighted votes. The resulting memory model will be referred to as the weighted voting memory. Third, a theoretical analysis of the capacity of the weighted voting memory is given, as well as an analysis of how to compute the weights from training data.

The innovations of the voting memory developed here were inspired by the application of human face recognition [7], [42]. The face-recognition research community has developed protocols for assessing system performance [32], [33]. There are two main tests commonly used: the identification test and the watchlist test. In both cases, we are given a database of patterns (face images) to store. In the identification test, the system is tested with images of known people (of course, the test images are different from the database images). For a given input image, the task is to identify which database person it is. In this case, no rejection state is needed. The measure of merit here is the identification rate (IR), which is the probability that a given input image will be matched with the correct database person. In the associative memory literature, the term retrieval rate is commonly used and typically means the same thing as identification rate. We will use both terms interchangeably.

In the watchlist test, the system must have a rejection mechanism. Here, two test sets are required: one containing images of the known people, and one containing images of strangers (people not in the database). For the known-people test set, there are the following two measures of merit: the detection and identification rate (DIR), which is the percentage of images that are correctly matched with the known individuals, and the false rejection rate (FRR), which is the percentage of images that are rejected by the system [32]. For the stranger test set, there is only one measure of interest: the false acceptance rate (FAR), which gives the percentage of imposter images that are incorrectly matched with someone in the database.

Of course, there is a tradeoff between the detection and identification rate and the false acceptance rate. Typically, face-recognition systems are designed with a tunable parameter or threshold which allows one to control the tradeoff between DIR and FAR. A receiver operating characteristic (ROC) curve can be constructed which shows how DIR and FAR vary as a function of this threshold. Note that the identification test can be seen as a special case of the watchlist test on the known-people test set, where the threshold is set to 0 (so the system does not reject any images).

We propose that researchers in neural associative memories adopt both the identification and watchlist testing methodologies. In this paper, for memory sets consisting of random binary patterns, we are able to derive theoretical expressions for the retrieval rate, the detection and identification rate, and the false acceptance rate for both the voting model and the weighted voting model.

This paper is organized as follows. Section II reviews the operation of the voting model and extends the analysis to consider rejection and the watchlist problem. In Section III, a weighted voting model is proposed and its operation is described. In Section IV, a theoretical analysis of the performance of the weighted voting memory is given. Section V gives experimental results on random memory patterns. These experimental results are compared to the theoretical predictions. Finally, Section VI summarizes the results and discusses future extensions of this work.

Fig. 1. Structure of the voting associative memory.

II. VOTING ASSOCIATIVE MEMORY

In this section, we first review the operation of the voting associative memory and some of the results in [16]. The previous analysis relied mainly on a continuous approximation of the necessary probability distributions. In this paper, we derive the corresponding discrete probability distributions. As will be shown in Section V, the discrete distributions give a much more accurate model of the behavior of the voting memory. In addition, we extend the analysis by considering rejection, and we derive measures of performance on the watchlist test.

A. Operation of the Voting Associative Memory

In the voting associative memory, the $N$-dimensional input and each memory pattern are partitioned into a collection of nonoverlapping windows of size $n$. For notational simplicity, we will assume that $n$ divides $N$; hence, the number of windows $N_w = N/n$ is an integer.

For the input (memory key) $\mathbf{x}$, let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{N_w}$ denote the data in each window. That is, $\mathbf{x}_i$ is the portion of $\mathbf{x}$ contained in the $i$th window. The database patterns are partitioned in the same way: $\mathbf{x}^k = (\mathbf{x}_1^k, \mathbf{x}_2^k, \ldots, \mathbf{x}_{N_w}^k)$, $k = 1, \ldots, m$. The partitioned database patterns can be stored in a RAM-type network, where the $i$th RAM holds all the database patterns associated with the $i$th window: $\mathbf{x}_i^1, \mathbf{x}_i^2, \ldots, \mathbf{x}_i^m$. Fig. 1 shows the architecture of a RAM network with nine windows arranged in a 3 × 3 structure.

The voting network requires a distance measure to be computed locally at each window. Let $d(\cdot,\cdot)$ be a distance measure between two $n$-dimensional vectors. Any suitable distance function can be used. In this paper, we use the Hamming distance for binary memory patterns and the city-block distance for real-valued patterns. In either case, the (local) distance between $\mathbf{x}_i$ and $\mathbf{x}_i^k$ is given by

$d(\mathbf{x}_i, \mathbf{x}_i^k) = \sum_{l=1}^{n} |x_{i,l} - x_{i,l}^k|$

where $x_{i,l}$ and $x_{i,l}^k$ denote the $l$th component of $\mathbf{x}_i$ and $\mathbf{x}_i^k$, respectively.


Fig. 2. Distance calculation for the voting memory. The distance between the highlighted window of the input and the corresponding window of each of the memory set patterns is computed.


The local distance calculation of the voting network is shown in Fig. 2. At each window, we compute a local distance between the input key $\mathbf{x}_i$ and all $m$ memory patterns. The smallest of these $m$ distances is found, and the local window casts a vote for the corresponding memory pattern. The decision network examines the votes of all the windows, chooses the memory pattern that received the most votes, and then outputs the pattern associated with it. Combination schemes more sophisticated than plurality voting could also be used [14], [21]–[24], [40].
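The retrieval step just described is compact enough to sketch in code. The following Python fragment is a minimal illustration, not the authors' implementation: it assumes the windows are contiguous one-dimensional blocks (the paper's figures use two-dimensional image windows), uses the city-block distance of Section II-A, and breaks distance ties by lowest pattern index.

```python
import numpy as np

def voting_retrieve(key, memory, n_win):
    """Plurality-vote retrieval over nonoverlapping windows."""
    m, N = memory.shape
    key_w = key.reshape(n_win, -1)             # (n_win, n): window data of the key
    mem_w = memory.reshape(m, n_win, -1)       # (m, n_win, n): windowed patterns
    # City-block distance per window (equals Hamming distance for 0/1 data).
    dist = np.abs(mem_w - key_w).sum(axis=2)   # (m, n_win)
    winners = dist.argmin(axis=0)              # locally closest pattern per window
    votes = np.bincount(winners, minlength=m)  # tally of votes per stored pattern
    best = int(votes.argmax())
    # Rejection variant: reject the key if votes[best] falls below a threshold.
    return best, votes
```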

Note that the network structure of the voting memory is similar to the WISARD system of Aleksander [1], [25]. However, the present system is formulated to work with general types of data, not just binary data. In addition, the present system uses regular connections between the input and the RAM units and not random connections. Finally, the voting-based method of information retrieval is not present in the WISARD system.

It is easy to introduce a rejection mechanism in the voting model. We simply use a threshold $T$ to indicate whether the number of votes received by the best-matching pattern is sufficiently large. In the case that the number of votes received is less than $T$, the input is rejected.

It is interesting to note the two-level structure of this associative memory network. The RAMs are low-level processors which operate on just a portion of the image. The decision network is a higher level computation which integrates and makes sense of the low-level information. Of course, the problem of understanding the connection between low-level processing and high-level decision making has long been an area of interest in both neurobiology and artificial intelligence [3], [6], [35], [43].

B. Assumptions for Theoretical Analysis

The voting network is suitable for storing real-valued and heteroassociative memory sets. For the theoretical analysis that follows, though, we assume that the memory set is binary-valued ($\{0, 1\}$ components) and random (each memory pattern is generated randomly and independently). We will assume that each component of the fundamental memory patterns has a 50% chance of being 1 and a 50% chance of being 0.

We start by considering the identification task. In this case, we test the system with noisy versions of the memory set patterns and see how well the system can retrieve the correct pattern. To create a suitable memory key, we proceed as follows. We first select one of the memory set patterns, called the target memory pattern, or simply target. The memory key is formed by corrupting the target memory pattern with an amount $\rho$ of uniform random noise; that is, with probability $\rho$, each component of the target pattern is flipped from its original value. Each of the $m-1$ remaining fundamental memories will be called nontarget memory patterns.
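Under these assumptions, generating a random memory set and a noisy memory key takes a few lines. A minimal sketch; the function names and the fixed seed are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility of the sketch

def make_memory_set(m, N):
    # Each component is 1 with probability 0.5, independently.
    return rng.integers(0, 2, size=(m, N))

def make_key(target_pattern, rho):
    # Flip each component of the target independently with probability rho.
    flips = rng.random(target_pattern.shape) < rho
    return np.where(flips, 1 - target_pattern, target_pattern)
```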

The goal of the theoretical analysis is to derive expressions for the retrieval rate $P_R$, the detection and identification rate $P_{DIR}$, and the false acceptance rate $P_{FAR}$ as a function of the following model parameters:

$N$  dimension of memory patterns;

$n$  window size;

$m$  number of patterns in the memory set;

$\rho$  amount of noise.

C. Probability of Voting for the Target and Nontarget Patterns

Recall that our memory key and memory set patterns are partitioned into windows. Let $D_t$ be a random variable which gives the local Hamming distance between a single window of the input and the corresponding window of the target pattern. From our assumptions, $D_t$ follows a binomial distribution:

$P(D_t = d) = \binom{n}{d} \rho^d (1 - \rho)^{n-d}$   (1)

where $d = 0, 1, \ldots, n$, and $\binom{n}{d}$ is the combination of $n$ things taken $d$ at a time

$\binom{n}{d} = \dfrac{n!}{d!\,(n-d)!}$

Similarly, let $D_j$ be a random variable which gives the local distance between the input and one of the nontarget patterns. The distribution for $D_j$ is given by [17], [37]

$P(D_j = d) = \binom{n}{d} \left(\dfrac{1}{2}\right)^n$   (2)

For each $d = 0, 1, \ldots, n$, the local window will vote for the target pattern if $D_t = d$ and $D_j > d$, where $j$ ranges over all the indices except for the index of the target image. Hence, the probability that a local window will vote for the target pattern is given by

$P_t = \sum_{d=0}^{n} P(D_t = d)\, [P(D_j > d)]^{m-1}$   (3)


Similarly, a local window will vote for the $j$th nontarget pattern if $D_j = d$ and $D_l > d$, where $l$ ranges over all the indices except for $j$ and the index of the target image. Hence, the probability that a local window will vote for the nontarget pattern is given by

$P_{nt} = \sum_{d=0}^{n} P(D_j = d)\, P(D_t > d)\, [P(D_l > d)]^{m-2}$   (4)
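Probabilities (1)–(4) are easy to evaluate numerically. The sketch below uses SciPy's binomial distribution and the reconstructed forms of (3) and (4) above, in which ties among local distances are ignored (strict inequalities):

```python
from scipy.stats import binom

def vote_probabilities(n, m, rho):
    """P_t of (3) and P_nt of (4) for window size n, m patterns, noise rho."""
    P_t, P_nt = 0.0, 0.0
    for d in range(n + 1):
        pt_eq = binom.pmf(d, n, rho)   # (1): P(D_t = d)
        pt_gt = binom.sf(d, n, rho)    #      P(D_t > d)
        pj_eq = binom.pmf(d, n, 0.5)   # (2): P(D_j = d)
        pj_gt = binom.sf(d, n, 0.5)    #      P(D_j > d)
        P_t += pt_eq * pj_gt ** (m - 1)            # target wins the window
        P_nt += pj_eq * pt_gt * pj_gt ** (m - 2)   # a given nontarget wins
    return P_t, P_nt
```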

D. Number of Votes for the Target and Nontarget Patterns

$P_t$ and $P_{nt}$ give the probability that a single window will vote for the target or one of the nontarget patterns, respectively. The decision network counts up the votes from each of the $N_w$ windows, and then chooses the fundamental memory with the most votes. Here, we ask: what is the total number of votes received by the target and the nontarget memories?

Let $V_t$ be a random variable that gives the total number of votes received by the target pattern. Each window of the RAM network performs a Bernoulli experiment with probability $P_t$ of success (i.e., getting a vote) and probability $1 - P_t$ of failure (not getting a vote). If the experiment is repeated over the $N_w$ windows, the probability that the target receives $v$ votes follows a binomial distribution:

$P(V_t = v) = \binom{N_w}{v} P_t^v (1 - P_t)^{N_w - v}$   (5)

where $N_w = N/n$. Similarly, let $V_j$ be the total number of votes received by the $j$th nontarget pattern. $V_j$ also follows a binomial distribution of the form:

$P(V_j = v) = \binom{N_w}{v} P_{nt}^v (1 - P_{nt})^{N_w - v}$   (6)

By the central limit theorem, and assuming a large number of windows $N_w$, the binomial distribution for $V_t$ approaches a normal distribution given by

$f_t(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma_t} \exp\!\left[-\dfrac{(v - \mu_t)^2}{2\sigma_t^2}\right]$   (7)

where the mean and variance are given by

$\mu_t = N_w P_t$   (8)

$\sigma_t^2 = N_w P_t (1 - P_t)$   (9)

Similarly, the distribution for $V_j$ can be approximated as

$f_{nt}(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{nt}} \exp\!\left[-\dfrac{(v - \mu_{nt})^2}{2\sigma_{nt}^2}\right]$   (10)

with mean and variance given by

$\mu_{nt} = N_w P_{nt}$   (11)

$\sigma_{nt}^2 = N_w P_{nt} (1 - P_{nt})$   (12)

E. Estimation of the Correct Retrieval Rate—The Continuous Case

We now have probability density functions for $V_t$, the number of votes received by the target pattern, and $V_j$, the number of votes received by the $j$th nontarget pattern. These density functions can be expressed in discrete form [(5) and (6)] or in an approximate continuous form [(7) and (10)]. The system will retrieve the correct pattern when $V_t$ is larger than $V_j$ for each and every one of the $m-1$ nontarget memory patterns. Of these $m-1$ nontarget patterns, we need only concern ourselves with the one that received the maximum number of votes. Let $V_{\max}$ be a random variable that gives the maximum number of votes among all the nontarget patterns.

In the previous analysis [16], the authors showed that the density function for $V_{\max}$ can be expressed in terms of the continuous densities as

$f_{\max}(v) = (m-1)\, f_{nt}(v)\, [F_{nt}(v)]^{m-2}$   (13)

where $F_{nt}$ is the cumulative distribution function associated with $f_{nt}$. That is, for the $j$th nontarget pattern

$F_{nt}(v) = P(V_j \le v) = \int_{-\infty}^{v} f_{nt}(u)\, du$   (14)

Then, the average value of $V_{\max}$ is given by

$\bar V_{\max} = \int_{-\infty}^{\infty} v\, f_{\max}(v)\, dv$   (15)

In the original analysis [16], the probability of correct retrieval was defined as $P(V_t > \bar V_{\max})$. However, this definition is not useful, since it does not properly model how the voting memory works. Rather, for the identification problem, the probability of correct retrieval is simply the probability that the number of votes received by the target pattern exceeds that received by the maximum nontarget pattern

$P_R = P(V_t > V_{\max})$   (16)


Assuming that $V_t$ and $V_{\max}$ are independent random variables, $P(V_t > V_{\max})$ can be written as

$P(V_t > V_{\max}) = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{v}^{\infty} f_t(u)\, du\right] dv$   (17)

The proof of (17) is shown in Appendix A. Upon substituting (13) and (14), we get an expression for the probability of correct retrieval in terms of the two Gaussian density functions $f_t$ and $f_{nt}$

$P_R = \int_{-\infty}^{\infty} (m-1)\, f_{nt}(v)\, [F_{nt}(v)]^{m-2} \left[\int_{v}^{\infty} f_t(u)\, du\right] dv$   (18)

F. Estimation of the Correct Retrieval Rate—The Discrete Case

The probability of correct retrieval in (18) is an approximation. It is possible, though, to derive an exact expression for the probability of correct retrieval using the discrete density functions. Note that this analysis was not done in [16].

We have the discrete distributions for the number of votes received by the target pattern (5) and the number of votes received by a nontarget pattern (6). To compute the probability of correct retrieval, we first need to compute the (discrete) distribution of the number of votes received by the maximum nontarget pattern $V_{\max}$.

Unlike the continuous case, here, we have to consider the possibility of ties among the nontarget patterns. For example, suppose the maximum number of votes among the $m-1$ nontarget patterns is $v$. The probability that a single nontarget memory set pattern received $v$ votes (and all the other nontarget patterns received less) is $(m-1)\,P(V_j = v)\,[P(V_j < v)]^{m-2}$. The probability that precisely $s$ of the nontarget patterns achieve $v$ votes (an $s$-way tie at the top) is given by $\binom{m-1}{s} [P(V_j = v)]^s [P(V_j < v)]^{m-1-s}$. To account for all possible ties, we sum over all possible values of $s$

$P(V_{\max} = v) = \sum_{s=1}^{m-1} \binom{m-1}{s} [P(V_j = v)]^s\, [P(V_j < v)]^{m-1-s}$   (19)

Substituting (6) into (19), we obtain

$P(V_{\max} = v) = \sum_{s=1}^{m-1} \binom{m-1}{s} \left[\binom{N_w}{v} P_{nt}^v (1 - P_{nt})^{N_w - v}\right]^s \left[\sum_{u=0}^{v-1} \binom{N_w}{u} P_{nt}^u (1 - P_{nt})^{N_w - u}\right]^{m-1-s}$   (20)
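The tie-counting sum (19)/(20) vectorizes directly. A sketch, under the reconstructed equations above:

```python
import math
import numpy as np
from scipy.stats import binom

def vmax_pmf(Nw, m, P_nt):
    """P(V_max = v) for the maximum over m-1 nontarget vote counts,
    summing over s-way ties at the top as in (19)/(20)."""
    v = np.arange(Nw + 1)
    p_eq = binom.pmf(v, Nw, P_nt)      # P(V_j = v), from (6)
    p_lt = binom.cdf(v - 1, Nw, P_nt)  # P(V_j < v); cdf(-1) is 0
    pmf = np.zeros(Nw + 1)
    for s in range(1, m):              # s nontargets tied at the maximum
        pmf += math.comb(m - 1, s) * p_eq**s * p_lt**(m - 1 - s)
    return pmf
```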

As with the continuous case, the probability of correct retrieval is the probability that $V_t > V_{\max}$. In the discrete case, though, $V_{\max}$ is an integer that varies from 0 to $N_w$. Hence,

$P_R = \sum_{v=0}^{N_w} P(V_{\max} = v\ \text{and}\ V_t > v)$

Assuming $V_t$ is independent of $V_{\max}$

$P_R = \sum_{v=0}^{N_w} P(V_{\max} = v)\, P(V_t > v)$   (21)
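With the distribution of $V_{\max}$ in hand, (21) is a single dot product. This fragment reuses vmax_pmf from the previous sketch:

```python
import numpy as np
from scipy.stats import binom

def retrieval_rate(Nw, m, P_t, P_nt):
    """P_R of (21), assuming V_t is independent of V_max."""
    v = np.arange(Nw + 1)
    p_max = vmax_pmf(Nw, m, P_nt)  # P(V_max = v), previous sketch
    p_beat = binom.sf(v, Nw, P_t)  # P(V_t > v), from (5)
    return float(np.sum(p_max * p_beat))
```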

G. Watchlist Test

Here, the voting network employs a rejection mechanism. A threshold $T$ is set, and if the winning memory set pattern does not receive at least $T$ votes, the input is rejected. The probability of correct retrieval here is the DIR, which is the probability that $V_t > V_{\max}$ and $V_t \ge T$

$P_{DIR} = P(V_t > V_{\max}\ \text{and}\ V_t \ge T)$   (22)

To compute $P_{DIR}$ in the discrete form, we start with (21), but now the smallest allowable value for $V_t$ is $T$

$P_{DIR} = \sum_{v=0}^{N_w} P(V_{\max} = v)\, P(V_t > v\ \text{and}\ V_t \ge T)$   (23)

Similarly, in continuous form, $P_{DIR}$ is given by

$P_{DIR} = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{\max(v,\,T)}^{\infty} f_t(u)\, du\right] dv$   (24)


To see how well the system rejects unknown patterns (the false acceptance rate), we probe the system not with a corrupted version of one of the memory patterns, but with a completely new (and random) input pattern. Since there is no target pattern in the memory set, all $m$ memory patterns are now considered nontarget patterns. For a single local window, the probability that the $j$th memory pattern receives a vote is a bit simpler than (4) and is given by

$P_{nt} = \sum_{d=0}^{n} P(D_j = d)\, [P(D_l > d)]^{m-1}$   (25)

Let $V_j$ be the total number of votes received by the $j$th memory pattern. View $P_{nt}$ as the probability of a single success (getting a vote). $V_j$ is the total number of successes if the experiment is repeated $N_w$ times; hence, $V_j$ follows a binomial distribution:

$P(V_j = v) = \binom{N_w}{v} P_{nt}^v (1 - P_{nt})^{N_w - v}$   (26)

Let $V_{\max}$ denote the maximum number of votes received by one of the $m$ (nontarget) memory patterns. The probability density function of $V_{\max}$ can be obtained by accounting for all possible ties [similar to (19)]. The only difference here is that there are $m$ nontarget patterns instead of $m-1$

$P(V_{\max} = v) = \sum_{s=1}^{m} \binom{m}{s} [P(V_j = v)]^s\, [P(V_j < v)]^{m-s}$   (27)

Again, assuming a large number of windows, the discrete distribution for $V_j$ in (26) can be approximated by a Gaussian distribution of the form

$f_{nt}(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{nt}} \exp\!\left[-\dfrac{(v - \mu_{nt})^2}{2\sigma_{nt}^2}\right]$   (28)

where the mean and variance are given by

$\mu_{nt} = N_w P_{nt}$   (29)

$\sigma_{nt}^2 = N_w P_{nt} (1 - P_{nt})$   (30)

In the continuous case, the random variable $V_{\max}$ is a maximum over $m$ Gaussian random variables and hence, similar to (13), has density function given by

$f_{\max}(v) = m\, f_{nt}(v)\, [F_{nt}(v)]^{m-1}$   (31)

The probability of false acceptance is the probability that one of the memory patterns receives at least $T$ votes: $P_{FAR} = P(V_{\max} \ge T)$. In discrete form, the probability of false acceptance is computed as

$P_{FAR} = \sum_{v=T}^{N_w} P(V_{\max} = v)$   (32)

and in continuous form

$P_{FAR} = \int_{T}^{\infty} f_{\max}(v)\, dv$   (33)
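Sweeping the threshold $T$ through (23) and (32) traces out the ROC curve mentioned in Section I. A sketch reusing vmax_pmf; here P_imp denotes the imposter-key vote probability of (25), and passing m + 1 as the second argument makes vmax_pmf take the maximum over all m stored patterns:

```python
import numpy as np
from scipy.stats import binom

def dir_far_curve(Nw, m, P_t, P_nt, P_imp, thresholds):
    """Discrete DIR (23) and FAR (32) for each integer vote threshold T."""
    v = np.arange(Nw + 1)
    p_max_gen = vmax_pmf(Nw, m, P_nt)       # genuine key: max over m-1 nontargets
    p_max_imp = vmax_pmf(Nw, m + 1, P_imp)  # imposter key: max over all m patterns
    curve = []
    for T in thresholds:
        # (23): target must beat the best nontarget AND reach T votes.
        p_win = np.array([binom.sf(max(u, T - 1), Nw, P_t) for u in v])
        dir_T = float(np.sum(p_max_gen * p_win))
        far_T = float(np.sum(p_max_imp[T:]))  # (32): P(V_max >= T)
        curve.append((T, dir_T, far_T))
    return curve
```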

III. WEIGHTED VOTING MODEL

At the local level, the voting memory works in an all-or-nothing fashion. That is, the memory pattern that has the smallest (local) distance gets a vote and all the other memory patterns get nothing. In noisy environments, though, it is quite possible that the target pattern may not appear first on the list of best-matching patterns. In this case, the worst happens: the window casts a vote for a nontarget pattern and the target pattern gets nothing.

In this section, we extend the voting model by having each window cast a set of weighted votes, one for each memory set pattern. This more general model is called the weighted voting memory model.

A. Operation of the Weighted Voting Memory

The weighted voting model operates as follows. As before, we start with a fundamental memory set and partition it into windows. In general, the memory set can be heteroassociative and the patterns real-valued. Now, suppose we are given a memory key and we want to determine which (if any) memory pattern it should be associated with. As before, we compute local distance measures at each window. However, instead of just choosing the smallest distance and assigning a vote to the corresponding memory pattern, we sort all the distances and assign a rank to each. Let the memory set pattern that has the smallest (local) distance be assigned rank 1. The pattern with the next smallest distance will have rank 2, etc. The distance computations and ranking of memory set patterns are done independently by each local window. Hence, the weighted voting model has the same parallel structure as the voting model shown in Fig. 1. The only difference is that each window now requires a sorting operation and not a simple select.
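The local sorting step can be sketched by extending the earlier distance computation; the double argsort converts each window's local distances into ranks 1 through m, with ties broken arbitrarily:

```python
import numpy as np

def window_rankings(key, memory, n_win):
    """Rank every stored pattern within each window (1 = locally closest)."""
    m, N = memory.shape
    key_w = key.reshape(n_win, -1)
    mem_w = memory.reshape(m, n_win, -1)
    dist = np.abs(mem_w - key_w).sum(axis=2)        # (m, n_win) local distances
    ranks = dist.argsort(axis=0).argsort(axis=0) + 1
    return ranks   # ranks[k, i] is the rank of pattern k in window i
```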

After all the rankings have been computed, they are routed to the decision network. The decision network examines the rankings for each memory set pattern, and then computes an appropriate output. The design of the decision network will be given in Section IV.

A simple example will help clarify the concepts. Suppose we have a memory set consisting of $m = 3$ patterns and suppose the patterns are partitioned into nine windows (in a 3 × 3 arrangement). Suppose that for a given memory key, the local distances are computed and sorted, and the resulting rankings are as shown in Fig. 3(a). For example, for the first window [highlighted in Fig. 3(a)], the local distance was smallest for memory set pattern #1 and largest for pattern #3. Hence, for this window, memory set pattern #1 is assigned rank 1, pattern #2 is assigned rank 2, and pattern #3 is assigned rank 3. The rankings from all the other windows are shown, as well.


Fig. 3. Example of a memory set with three patterns and a set of 3 × 3 local classifiers. (a) Rankings assigned to each memory set pattern by the weighted voting model. (b) Votes cast by the simple voting model.


The voting network can be seen as a special case of weighted voting, where we only consider the rank 1 information. Fig. 3(b) shows the corresponding pattern of votes registered in the simple voting network. In this example, though, each memory set pattern receives three votes, and hence, the voting network cannot adequately discriminate among the memory set patterns. Clearly, in this example, the second- and third-place rankings give additional information that would lead us to prefer memory set pattern #1 over the other two.

B. Statistical Interpretation of the Weights

Now, how does the decision network operate? Given an input pattern $\mathbf{x}$, what we really want to compute for each memory set pattern $\mathbf{x}^k$ is the probability that $\mathbf{x}^k$ is the target. Let this probability be denoted $P(k = t \mid \mathbf{x})$, where $t$ denotes the target pattern. Each window of the weighted voting network can be seen as a memory or classifier in its own right. We expect that each local memory will give a crude estimate $P(k = t \mid \mathbf{x}_i)$ that $\mathbf{x}^k$ is the target (really, window $i$ only sees $\mathbf{x}_i$ and not all of $\mathbf{x}$). Now, how do we combine the local probabilities $P(k = t \mid \mathbf{x}_i)$ to estimate $P(k = t \mid \mathbf{x})$? We will adopt the commonly used combination scheme of simply summing (or averaging) the local estimates [2], [18], [40]

$P(k = t \mid \mathbf{x}) \approx \dfrac{1}{N_w} \sum_{i=1}^{N_w} P(k = t \mid \mathbf{x}_i)$   (34)

Now, for the weighted voting model, how do we compute the local probabilities $P(k = t \mid \mathbf{x}_i)$? For each memory set pattern $\mathbf{x}^k$, what we have available is the set of rankings that $\mathbf{x}^k$ received at each local window (see Fig. 3). Let us tally the rankings: let $N_1$ be the number of windows where $\mathbf{x}^k$ was found to have rank 1, $N_2$ the number of windows that have rank 2, etc. In general, let $N_r$ denote the number of windows that have rank $r$, $r = 1, \ldots, m$. The number of windows is $N_w$, so

$\sum_{r=1}^{m} N_r = N_w$   (35)

Typically, $m$ is much larger than the number of windows, so many of the $N_r$ terms will be 0.

We propose that for the weighted voting memory, the total number of votes $V^k$ assigned to pattern $\mathbf{x}^k$ be a weighted sum of the number of windows at each ranking that $\mathbf{x}^k$ received

$V^k = \sum_{r=1}^{m} w_r N_r$   (36)

where the weights $w_1, w_2, \ldots, w_m$ are used to adjust the relative importance of each ranking. Note that the simple voting memory can be seen as a special case of weighted voting with $w_1 = 1$ and all other weights set to zero: $w_2 = w_3 = \cdots = w_m = 0$.

Although there are many possible ways of choosing proper weights, we propose that the weights be set as follows:

$w_r = P(k = t \mid \mathrm{rank}(k) = r)$   (37)

That is, given the fact that we know that a memory pattern is (locally) ranked 1, $w_1$ is the probability that it is, in fact, the target pattern. In noisy environments, the target pattern does not always locally get ranked 1, though; and, given the fact that a memory pattern locally receives a rank of $r$, $w_r$ is the probability that said memory pattern is the target. For notational convenience, let $P_r = P(k = t \mid \mathrm{rank}(k) = r)$. Hence, the total number of votes received by pattern $\mathbf{x}^k$ is given by

$V^k = \sum_{r=1}^{m} P_r N_r$   (38)
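Given the per-window ranks and a weight vector, (38) reduces to a table lookup and a sum over windows, since adding up the weight of each window's rank is the same as computing $\sum_r w_r N_r$. A sketch, with weights indexed so that weights[r-1] is $w_r$:

```python
import numpy as np

def weighted_votes(ranks, weights):
    """Total weighted vote (38) per pattern from per-window ranks."""
    w = np.asarray(weights)
    # Look up the weight of each window's rank and sum over windows.
    return w[ranks - 1].sum(axis=1)   # shape (m,); pick argmax, or reject
```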

The probabilities $P_r$, $r = 1, \ldots, m$, can be computed from training data by running trial simulations. If no training data are available, a simple heuristic setting of the weights has been found to be useful [27]. Note that, in general, a different set of probabilities can be computed at each window, $P_r(i)$, $i = 1, \ldots, N_w$, $r = 1, \ldots, m$. For example, in face recognition, the weights used for windows near the eye region might be different than the weights for windows near the mouth region. However, experimental evidence shows that, at least for face recognition, the weights are very similar from window to window, and hence, a single set of weights can be used for all windows.


Fig. 4. Using the rankings in Fig. 3(a), the weight assigned to each local window is shown. The weighted vote assigned to each pattern is the sum of these local probabilities [see (34)].


For the special case of binary and random memory patterns, it is possible to derive theoretical expressions for the rank probabilities of the target and nontarget patterns as a function of pattern dimension, window size, and noise level. In fact, the derivation will be given in Section IV.

Note that (38) shows how the local probabilities $P_r$ are combined to produce a global measure that $\mathbf{x}^k$ is the target pattern. Again, the combination scheme is the simple summing given in (34) (the only difference is that the scaling factor $1/N_w$ has been dropped here). To see this, let us continue the weighted voting example given in Fig. 3(a). Based on the given rankings, Fig. 4 shows the weight assigned to each window of the memory set patterns. Memory pattern #1 has rank counts $N_1$, $N_2$, and $N_3$ that can be read off Fig. 3(a). Using (38), the weighted vote assigned to memory pattern #1 is $V^1 = P_1 N_1 + P_2 N_2 + P_3 N_3$, which is the same as adding all the local probabilities for memory pattern #1 given in Fig. 4.

IV. DERIVATION OF WEIGHTED VOTING PARAMETERS FOR BINARY MEMORY SETS

For the theoretical analysis that follows, we again assume that the memory set is binary-valued and random. In addition, we start with the identification problem and assume that the memory key is constructed by choosing one of the fundamental memories and corrupting it with an amount $\rho$ of uniform random noise.

We want to compute how many votes are received by each memory pattern. As before, there are really only two cases to consider: 1) the number of votes $V_t$ received by the target memory pattern (i.e., the memory pattern used to construct the memory key) and 2) the number of votes $V_j$ received by the $j$th nontarget memory pattern. In the weighted voting scheme, (38) is used to compute $V_t$ and $V_j$. This equation, though, requires that we first compute the weights $P_r$ and the expected number of windows at each rank for the target pattern $N_r^t$, as well as for the $j$th nontarget pattern $N_r^j$.

Note that, in this section, we are using the same notation $P_t$, $P_{nt}$, $V_t$, $V_j$, etc., that was used when discussing the voting model. However, it should be clear from the context that all quantities in this section pertain to the weighted voting model.

A. Weights

To compute the weights, we need to compute $P(k = t \mid \mathrm{rank}(k) = r)$. That is, at the local level, after all distances are computed and sorted, if we know that a particular memory pattern appears $r$th on the list, what is the probability that it is actually the target pattern? It will be more convenient to compute $P(\mathrm{rank}(k) = r \mid k = t)$, so we will employ Bayes' theorem

$P(k = t \mid \mathrm{rank}(k) = r) = \dfrac{P(\mathrm{rank}(k) = r \mid k = t)\, P(k = t)}{P(\mathrm{rank}(k) = r)}$   (39)

$P(k = t) = 1/m$ is the probability of randomly choosing the target pattern from the memory set, since there are $m$ possible memory patterns and only one target pattern. In addition, $P(\mathrm{rank}(k) = r)$ is the probability of finding rank $r$ in the sorted list. Since rank $r$ is one of $m$ possible rankings, $P(\mathrm{rank}(k) = r) = 1/m$. Hence, (39) reduces to

$P(k = t \mid \mathrm{rank}(k) = r) = P(\mathrm{rank}(k) = r \mid k = t)$   (40)

For convenience, let $P_r = P(\mathrm{rank}(k) = r \mid k = t)$. To compute the weights, we have to compute $P_r$ for each rank $r$. This will be done in Section IV-B.

B. Number of Windows at Each Rank for the Target Pattern: $N_r^t$

Let $N_r^t$ denote the total number of windows of the target memory that have rank $r$. In this section, we derive the distribution for $N_r^t$. Of course, the total number of windows of the target memory that will end up with rank $r$ can be computed if we know the probability that a single window of the target memory will have rank $r$. As mentioned previously, this probability is denoted $P_r$. The quantity $P_1$ is the probability that the target memory is locally ranked 1, as shown schematically in Fig. 5(a). Note that this probability is the same as (3) (recall that the voting model only considers the first rank case, so in Section III-A, the expression "probability of voting for the target" means probability that it is rank 1).

The probability that at a local window the target memory has rank 2 (i.e., appears second in the ordered list of best-matching patterns) is shown schematically in Fig. 5(b). In this case, we require precisely one of the $m-1$ nontarget memories to be ranked higher than the target. The derivation of $P_2$ is given in Appendix B. Similarly, the conditions for deriving the probability $P_r$ are shown schematically in Fig. 5(c). As shown in Appendix B, $P_r$ is given by

$P_r = \sum_{d=0}^{n} P(D_t = d) \binom{m-1}{r-1} [P(D_j < d)]^{r-1}\, [P(D_j > d)]^{m-r}$   (41)


Fig. 5. Schematic diagram of the conditions for computing (a) $P_1$, (b) $P_2$, and (c) $P_r$.


The set of probabilities $P_r$ will be used to compute $N_r^t$. However, note that from (40), the weights satisfy $w_r = P_r$; hence, these probabilities are the weights in (38) used to compute the number of votes assigned to each memory pattern.

Now, we are in a position to compute $N_r^t$. At a single window, $P_r$ gives the probability that the target memory pattern will be ranked $r$. Of course, with probability $1 - P_r$, the target memory will not be ranked $r$ at that window. If the experiment is repeated over all $N_w$ windows, the probability that there will be $h$ successes follows a binomial distribution:

$P(N_r^t = h) = \binom{N_w}{h} P_r^h (1 - P_r)^{N_w - h}$   (42)

C. Number of Windows at Each Rank for Nontarget Patterns

Let $Q_r$ be the probability that the $j$th nontarget memory set pattern has rank $r$. The probability that the $j$th nontarget memory will have rank 1 is the same as in (4). To compute the probability that the $j$th nontarget memory pattern has rank 2, there are the following two cases to consider: 1) the target pattern is in front of the $j$th nontarget pattern (that is, the target has rank 1) and 2) the target pattern is behind the $j$th nontarget pattern. These conditions are illustrated in Fig. 6.

Fig. 6. Schematic diagram of the conditions for computing $Q_2$.

Similarly, the probability that the $j$th nontarget pattern has rank $r$ is the sum of the probability of the following two cases: 1) the target pattern is in front of the $j$th nontarget pattern (among the top $r-1$ patterns) and 2) the target pattern is behind the $j$th nontarget pattern. As shown in Appendix B, $Q_r$ is given by

$Q_r = \sum_{d=0}^{n} P(D_j = d) \left\{ \binom{m-2}{r-2} P(D_t < d)\, [P(D_l < d)]^{r-2} [P(D_l > d)]^{m-r} + \binom{m-2}{r-1} P(D_t > d)\, [P(D_l < d)]^{r-1} [P(D_l > d)]^{m-1-r} \right\}$   (43)

where $D_l$ denotes the local distance to one of the other $m-2$ nontarget patterns.

At a single window, $Q_r$ gives the probability that the $j$th nontarget memory pattern will be ranked $r$, and $1 - Q_r$ gives the probability that it will not be ranked $r$. If the experiment is repeated over all $N_w$ windows, the probability that there will be $h$ successes follows a binomial distribution:

$P(N_r^j = h) = \binom{N_w}{h} Q_r^h (1 - Q_r)^{N_w - h}$   (44)

D. Continuous Approximation of $N_r^t$ and $N_r^j$

By the central limit theorem, and assuming a large number of windows, each of the discrete densities in (42) (there are $m$ of them) can be approximated with a normal distribution. In this case, the continuous density function for $N_r^t$ is given by

$f_{N_r^t}(h) = \dfrac{1}{\sqrt{2\pi}\,\sigma_r^t} \exp\!\left[-\dfrac{(h - \mu_r^t)^2}{2(\sigma_r^t)^2}\right]$   (45)

where the expected value and variance of $N_r^t$ are given by

$\mu_r^t = N_w P_r$   (46)

$(\sigma_r^t)^2 = N_w P_r (1 - P_r)$   (47)


TABLE I
ENUMERATION OF ALL POSSIBLE VALUES FOR $N_1$, $N_2$, AND $N_3$ FOR THE SIMPLE EXAMPLE SHOWN IN FIG. 7. THE PROBABILITY FOR EACH CASE IS SHOWN, ALONG WITH THE RESULTING VALUE OF $V_t$

Fig. 7. Simple example with m = 3 memory patterns and N/n = 3 windows.

Similarly, the $m$ discrete probability density functions in (44) can be approximated by a continuous normal density

$f_{N_r^j}(h) = \dfrac{1}{\sqrt{2\pi}\,\sigma_r^j} \exp\!\left[-\dfrac{(h - \mu_r^j)^2}{2(\sigma_r^j)^2}\right]$   (48)

where the expected value and variance of $N_r^j$ are given by

$\mu_r^j = N_w Q_r$   (49)

$(\sigma_r^j)^2 = N_w Q_r (1 - Q_r)$   (50)

E. Weighted Votes Received by Both Target and Nontarget Patterns—Discrete Case

Now, we have computed the weights $P_r$ (41) and the number of windows at each rank $r$ for each memory pattern. Specifically, we computed $N_r^t$ for the target pattern (42) and $N_r^j$ for the $j$th nontarget pattern (44). We can now compute the total number of votes received by each memory set pattern. The total number of votes received by the target pattern is given by

$V_t = \sum_{r=1}^{m} P_r N_r^t$   (51)

The probability distribution for $V_t$ can be computed by exhaustively enumerating all possible rankings of the votes. To see this, let us construct an even simpler example than the one given previously. Suppose we construct a weighted voting network with $m = 3$ memory patterns and $N_w = 3$ windows. A schematic diagram of such a memory set is shown in Fig. 7.

In this case, we can enumerate all possible values for $N_1$, $N_2$, and $N_3$, as shown in Table I. For each set of values, Table I shows the probability that it occurs and the resulting number of votes $V_t$. The distribution for $V_t$ can be obtained by tabulating and organizing the rightmost column. For example, to find the probability $P(V_t = v)$, we sum the probability for all rows of Table I that have that value of $V_t$.

Note that the sum of all the $N_r$ values must be $N_w$. Hence, an analytic expression for the discrete probability density function can be written as

$P(V_t = v) = \sum_{h_1 + \cdots + h_m = N_w} \dfrac{N_w!}{h_1! \cdots h_m!}\, P_1^{h_1} \cdots P_m^{h_m}\, \delta\!\left(v - \sum_{r=1}^{m} P_r h_r\right)$   (52)

where $\delta(\cdot)$ is the delta function

$\delta(z) = \begin{cases} 1, & \text{if } z = 0 \\ 0, & \text{otherwise.} \end{cases}$
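For tiny configurations such as the Fig. 7/Table I example, the enumeration behind (52) can be carried out directly. A sketch; the weight and rank-probability values in the usage comment are made-up illustrations:

```python
import math
from itertools import product
from collections import defaultdict

def vt_pmf_exact(Nw, weights, rank_probs):
    """Exact pmf of V_t via (52): enumerate all rank-count vectors
    (h_1, ..., h_m) summing to Nw. Feasible only for tiny m and Nw."""
    m = len(weights)
    pmf = defaultdict(float)
    for h in product(range(Nw + 1), repeat=m):
        if sum(h) != Nw:
            continue
        p = math.factorial(Nw)                 # multinomial probability
        for h_r, P_r in zip(h, rank_probs):
            p *= P_r**h_r / math.factorial(h_r)
        v = sum(w * h_r for w, h_r in zip(weights, h))
        pmf[round(v, 12)] += p                 # group equal vote totals
    return dict(pmf)

# Example in the spirit of Fig. 7 / Table I (m = 3 patterns, Nw = 3 windows):
# vt_pmf_exact(3, weights=[0.7, 0.2, 0.1], rank_probs=[0.7, 0.2, 0.1])
```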

In general, for the weighted voting memory, the weights are nonnegative real numbers. Hence, the number of votes is not an integer [nor is $v$ in (52)]. However, as was seen in Table I, there are only a finite number of possibilities for $V_t$, and hence, $V_t$ can be described by the discrete distribution given in (52).

Similarly, the discrete probability density function of $V_j$ can be written as

$P(V_j = v) = \sum_{h_1 + \cdots + h_m = N_w} \dfrac{N_w!}{h_1! \cdots h_m!}\, Q_1^{h_1} \cdots Q_m^{h_m}\, \delta\!\left(v - \sum_{r=1}^{m} P_r h_r\right)$   (53)

Clearly, obtaining $P(V_t = v)$ and $P(V_j = v)$ by (52) and (53) is a computationally intensive endeavor. In general, the number of nontrivial terms in the sum (or number of rows in the table) is

$\binom{N_w + m - 1}{m - 1}$

which increases very fast for large $m$; even for moderate numbers of patterns and windows, the required table becomes astronomically large. Hence, the continuous approximation of the discrete densities (developed in Section IV-F) will be very useful for the weighted voting model.

Let $V_{\max}$ be the maximum of the $m-1$ random variables $V_j$. Similar to (19), the distribution for $V_{\max}$ can be derived by accounting for all possible ties among the weighted votes received by the nontarget patterns

$P(V_{\max} = v) = \sum_{s=1}^{m-1} \binom{m-1}{s} [P(V_j = v)]^s\, [P(V_j < v)]^{m-1-s}$   (54)

F. Weighted Votes Received by Both Target and Nontarget Patterns—Continuous Case

Here, we again compute the total number of weighted votes received by the target pattern $V_t = \sum_{r} P_r N_r^t$. However, this time, we use the normal approximations (45)–(47). A linear combination $\sum_r a_r X_r$ of independent normally distributed random variables also follows a normal distribution, with mean $\sum_r a_r \mu_r$ and variance $\sum_r a_r^2 \sigma_r^2$ [34]. Hence, $V_t$ follows a normal distribution of the form:

$f_t(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma_t} \exp\!\left[-\dfrac{(v - \mu_t)^2}{2\sigma_t^2}\right]$   (55)

Using the fact that the weights are $w_r = P_r$, the mean and variance are given by

$\mu_t = N_w \sum_{r=1}^{m} P_r^2$   (56)

$\sigma_t^2 = N_w \sum_{r=1}^{m} P_r^3 (1 - P_r)$   (57)

Similarly, for the $j$th nontarget pattern, the total number of weighted votes is given by

$V_j = \sum_{r=1}^{m} P_r N_r^j$   (58)

Again, since $V_j$ is a linear combination of normally distributed random variables [see (48)–(50)], then $V_j$ is also normally distributed with density function

$f_{nt}(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{nt}} \exp\!\left[-\dfrac{(v - \mu_{nt})^2}{2\sigma_{nt}^2}\right]$   (59)

where the mean and the variance are given by

$\mu_{nt} = N_w \sum_{r=1}^{m} P_r Q_r$   (60)

$\sigma_{nt}^2 = N_w \sum_{r=1}^{m} P_r^2 Q_r (1 - Q_r)$   (61)
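The Gaussian parameters (56)–(57) and (60)–(61) are simple sums over the rank probabilities. A sketch:

```python
import numpy as np

def weighted_vote_moments(Nw, P, Q):
    """Means and variances (56)-(57), (60)-(61) of the weighted votes.
    P: target rank probabilities P_r (also the weights, by (40)).
    Q: nontarget rank probabilities Q_r."""
    P, Q = np.asarray(P), np.asarray(Q)
    mu_t = Nw * np.sum(P * P)                  # (56)
    var_t = Nw * np.sum(P**2 * P * (1 - P))    # (57)
    mu_nt = Nw * np.sum(P * Q)                 # (60)
    var_nt = Nw * np.sum(P**2 * Q * (1 - Q))   # (61)
    return mu_t, var_t, mu_nt, var_nt
```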

G. Estimation of Correct Retrieval Rate

As before, in the continuous case, the density for $V_{\max}$ can be obtained from the individual densities by using

$f_{\max}(v) = (m-1)\, f_{nt}(v)\, [F_{nt}(v)]^{m-2}$   (62)

where $F_{nt}$ is the distribution function corresponding to $f_{nt}$. The probability of correct retrieval is the probability that $V_t > V_{\max}$, which can be computed as follows:

$P_R = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{v}^{\infty} f_t(u)\, du\right] dv$   (63)

H. Watchlist Test

Here, we use a threshold $T$. Again, DIR is the probability $P(V_t > V_{\max}\ \text{and}\ V_t \ge T)$ and is computed as follows:

$P_{DIR} = \int_{-\infty}^{\infty} f_{\max}(v) \left[\int_{\max(v,\,T)}^{\infty} f_t(u)\, du\right] dv$   (64)

For the false positive test, the memory key is created by generating a completely random input. Here, there is no target pattern in the memory set, and all $m$ memories are nontarget patterns. At a single local window, the probability that the $j$th memory has rank 1 is the same as in the simple voting case (25) and is given by

$Q_1 = \sum_{d=0}^{n} P(D_j = d)\, [P(D_l > d)]^{m-1}$   (65)

The probability that the $j$th memory has rank $r$ at a single window is simpler than (43) because here we do not have to worry about the position of the target pattern (since there is none). In this case, we have $r-1$ nontarget patterns ranked higher than pattern $j$ [there are $\binom{m-1}{r-1}$ ways of choosing these $r-1$ patterns] and all other $m-r$ patterns have higher rank. Hence, $Q_r$ is given by

$Q_r = \sum_{d=0}^{n} P(D_j = d) \binom{m-1}{r-1} [P(D_l < d)]^{r-1}\, [P(D_l > d)]^{m-r}$   (66)

At a single window, $Q_r$ gives the probability that the $j$th nontarget memory pattern will be ranked $r$, and $1 - Q_r$ gives the probability that it will not be ranked $r$. If the experiment is repeated over all $N_w$ windows, the probability that there will be $h$ successes follows a binomial distribution:

$P(N_r^j = h) = \binom{N_w}{h} Q_r^h (1 - Q_r)^{N_w - h}$   (67)

The total number of votes received by the $j$th nontarget memory pattern is

$V_j = \sum_{r=1}^{m} P_r N_r^j$   (68)

Similar to (53) and (54), the discrete form of the probability density function can be computed by going through all possible combinations of $N_1^j, \ldots, N_m^j$

$P(V_j = v) = \sum_{h_1 + \cdots + h_m = N_w} \dfrac{N_w!}{h_1! \cdots h_m!}\, Q_1^{h_1} \cdots Q_m^{h_m}\, \delta\!\left(v - \sum_{r=1}^{m} P_r h_r\right)$   (69)

Let $V_{\max}$ denote the maximum among $V_1, \ldots, V_m$. The distribution for $V_{\max}$ is given by

$P(V_{\max} = v) = \sum_{s=1}^{m} \binom{m}{s} [P(V_j = v)]^s\, [P(V_j < v)]^{m-s}$   (70)

Each of the discrete probability density functions $P(N_r^j = h)$ in (67) can be approximated by a normal distribution

$f_{N_r^j}(h) = \dfrac{1}{\sqrt{2\pi}\,\sigma_r^j} \exp\!\left[-\dfrac{(h - \mu_r^j)^2}{2(\sigma_r^j)^2}\right]$   (71)

where the mean and variance are given by

$\mu_r^j = N_w Q_r$   (72)

$(\sigma_r^j)^2 = N_w Q_r (1 - Q_r)$   (73)

In the continuous case, since the total number of votes $V_j$ received by the $j$th memory pattern (68) is a linear combination of Gaussians, it has a normal distribution of the form

$f_{nt}(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{nt}} \exp\!\left[-\dfrac{(v - \mu_{nt})^2}{2\sigma_{nt}^2}\right]$   (74)

where the mean and variance are given by

$\mu_{nt} = N_w \sum_{r=1}^{m} P_r Q_r$   (75)

$\sigma_{nt}^2 = N_w \sum_{r=1}^{m} P_r^2 Q_r (1 - Q_r)$   (76)

In the continuous case, the distribution for $V_{\max}$ can be obtained from

$f_{\max}(v) = m\, f_{nt}(v)\, [F_{nt}(v)]^{m-1}$   (77)

Finally, the probability of false acceptance is the probability $P(V_{\max} \ge T)$, which is given by

$P_{FAR} = \int_{T}^{\infty} f_{\max}(v)\, dv$   (78)

By varying the threshold $T$, we can achieve different values of $P_{DIR}$ and $P_{FAR}$. An ROC curve can be plotted to show the tradeoff between the two as a function of the threshold.

I. Discussion

Note that for the weighted voting model, we used all $m$ possible rankings $r = 1, \ldots, m$ in constructing the weighted sum of votes in (38). Of course, one may limit the number of terms in the sum to the top, say $R$, rankings

$V^k = \sum_{r=1}^{R} P_r N_r$   (79)

This will be useful in Section V because, for large values of $m$ and $N_w$, the discrete distributions characterizing the weighted voting model [(52), (53), (69), and (70)] cannot easily be obtained by the exhaustive enumeration procedure outlined previously (it is too computationally intensive). However, the discrete distributions can be computed if a suitably small value of $R$ is chosen.


Fig. 8. Binary image (left) with various amounts of random noise.

V. EXPERIMENTAL RESULTS

In this section, we simulate the voting and weighted voting models on random memory sets. The experimental methodology is as follows. We first choose the model parameters: memory set size $m$, pattern dimension $N$, window size $n$, and input noise $\rho$. Next, a memory set with $m$ random binary patterns is generated. An appropriate memory key is then generated. For experiments involving the detection and identification rate, one of the memory patterns is randomly chosen as the target, and the memory key is created by corrupting the target with an amount $\rho$ of noise. For experiments involving the false acceptance rate, a completely new and random memory key is created.

Next, the memory key is sent through the system, and all the local distance calculations and rankings are made. For the voting model, the values of $\tilde V_t$, $\tilde V_j$, and $\tilde V_{\max}$ are tabulated (the tilde indicates quantities determined from simulation). For the weighted voting memory, the values of $\tilde N_r$ are tabulated for the target pattern, and $\tilde N_r^j$ for one of the nontarget patterns. From a single run, the simulated number of votes will not provide a good estimate of the true value. Hence, we repeat the aforementioned procedure over several runs (each time a new memory set is generated) and average the results.

An estimate of the probability $P_r$ is obtained by dividing the average value of $\tilde N_r$ by the total number of windows

$\tilde P_r = \dfrac{\langle \tilde N_r \rangle}{N_w}$

Similarly, for the $j$th nontarget pattern, $\tilde Q_r = \langle \tilde N_r^j \rangle / N_w$.

In Section V-A, we will see that the weighted voting model and, to a lesser extent, the voting model can tolerate a large quantity of uniform random noise. In order to get a feel for the noise levels that are being discussed, Fig. 8 shows the effect of various amounts of noise. Here, the original image (the letter A) is corrupted with increasing amounts of noise $\rho$.

A. Experiment 1

For the first experiment, the following parameters were chosen: memory set size $m = 150$, pattern dimension $N = 300 \times 300$, window size $n = 10 \times 10$, and input noise $\rho = 0.48$. The left-hand side of Fig. 9 shows the results for the voting network.

Fig. 9(a) shows a histogram of the number of votes (experimentally observed) received by the target pattern $\tilde V_t$, a typical nontarget pattern $\tilde V_j$, and the nontarget pattern with the most votes $\tilde V_{\max}$. Fig. 9(b) shows the corresponding theoretical results computed using the discrete distributions and Fig. 9(c) shows the continuous approximations.

To obtain these theoretical graphs, the probabilities $P(D_t = d)$ in (1) and $P(D_j = d)$ in (2) are computed for each $d$. Next, the values of $P_t$ and $P_{nt}$ can be computed using (3) and (4), respectively. It is possible to code these calculations directly using nested iterative loops. Once these values are known, the discrete distributions for $V_t$, $V_j$, and $V_{\max}$ can be computed using (5), (6), and (20), respectively. In the continuous case, $P_t$ and $P_{nt}$ are first used to compute the means and variances given in (8), (9), (11), and (12). Finally, the continuous densities $f_t$, $f_{nt}$, and $f_{\max}$ can be computed using (7), (10), and (13).

rate estimation of the experimental histograms. The continuousapproximation for is not as accurate. The reason for thisis that in the continuous case, we model each as a Gaussian.It is clear from Fig. 9, though, that is not a perfect Gaussian.The error in is magnified in computing because of theterm raised to the power [see (13)]. Fig. 10 shows a su-perposition of the continuous and discrete densities of Fig. 9.Here, underestimates the number of votes received by themaximum nontarget pattern. Hence, the retrieval rate willbe overestimated.

To use the weighted voting model, we first need to compute the weights. For each rank $r$, Fig. 11 shows a plot of the theoretical values of $P_r$ and $Q_r$ obtained from (41) and (43), respectively (solid lines). The values $\tilde P_r$ and $\tilde Q_r$ obtained from the simulation are also shown (as circles). Since there is such close agreement between the theoretical and experimental values, $\tilde P_r$ and $\tilde Q_r$ are subsampled, and every fifth sample is shown (otherwise, the graphs would completely overlap). For small values of $r$, $P_r$ exceeds $Q_r$, but for larger values of $r$, the situation is reversed, and $Q_r$ exceeds $P_r$. In computing the output of the weighted voting model, the values $\tilde P_r$ are used as the weights for the experimental results, and $P_r$ are the weights for the theoretical results.

The right-hand side of Fig. 9 shows the resulting histograms for the weighted voting model. In order to present theoretical results using the discrete model, we limited the weighted sum to include only the top-ranked terms [see (79)]. In Fig. 9(d), the results of the simulation are given. Fig. 9(e) shows the corresponding theoretical results using the discrete model (52)–(54), and Fig. 9(f) shows the continuous theoretical results (55)–(62). Comparing with the previous voting results [Fig. 9(a)–(c)], we see that for the same parameters, the weighted voting model offers better performance because the distributions for $V_t$ and $V_{\max}$ are more separated. For the voting model, the correct retrieval rate was found experimentally [using (21)] together with its theoretical prediction; note that, for convenience, we express these probabilities as percentages, that is, 100 times the actual probability. For the weighted voting model, the experimental retrieval rate [again obtained from (21), but this time using the distributions of the weighted voting model] and the corresponding theoretical prediction were obtained in the same way.

Fig. 9. (a)–(c) Results of the voting and (d)–(f) weighted voting models with parameters N = 300 × 300, n = 10 × 10, ρ = 0.48, and m = 150. (a) and (d) Simulation results. (b) and (e) Theoretical results (discrete distribution). (c) and (f) Theoretical results (continuous approximation).

B. Experiment 2

For the second experiment, we use the same system parameters as in experiment 1, except that the system dimension is reduced. The resulting histograms for $\tilde V_t$, $\tilde V_j$, and $\tilde V_{\max}$ for both the voting and weighted voting models are shown in Fig. 12.

For the voting model, Fig. 12(a) gives the simulation result and Fig. 12(b) gives the theoretical result (using the discrete model). For the weighted voting model, Fig. 12(c) gives the simulation result and Fig. 12(d) gives the theoretical result (again using the discrete model).

As expected, reducing the pattern dimension reduces system performance. For the voting model, there is much more overlap between $\tilde V_t$ and $\tilde V_{\max}$ [compare Fig. 12(a) with Fig. 9(a)]. In fact, the means of the two distributions are nearly the same. Again, the weighted voting model is able to provide more separation between the two distributions. In terms of overall performance, the weighted voting model again achieved a substantially higher retrieval rate than the voting model, both experimentally and theoretically (see Table II).

C. Experiment 3

For the third experiment, we keep the same parameters as experiment 1 (pattern dimension $N = 300 \times 300$, window size $n = 10 \times 10$, and input noise $\rho = 0.48$), but this time double the number of memory patterns to $m = 300$. The results for the voting model are shown in Fig. 13(a) and (b) and for the weighted model in Fig. 13(c) and (d).

Here, the voting model shows a large overlap between the distributions of $\tilde V_t$ and $\tilde V_{\max}$. As with the previous example, this is clearly a case where the capacity of the voting model has been exceeded (too many patterns stored for the given dimension and noise level). Again, the weighted voting memory gives a much better separation between the distributions.

Fig. 10. Comparison between the discrete density and the continuous approximation for the voting model.

Fig. 11. Comparison between simulation and theory for the case of 10 × 10 window size, noise ρ = 0.48, memory set size m = 150, and image size 300 × 300.

A summary of the results of the previous three experiments is shown in Table II. In all three of these experiments, the performance of the weighted voting model is substantially better than that of the voting model.

D. Effect of Database Size

For the experiments in this section and in subsequent sections (unless otherwise noted), the parameters were set as follows: pattern dimension N = 60 × 60, window size n = 5 × 5, and input noise level 0.4. For the weighted voting model, the number of terms R in the sum [see (79)] was fixed at a large value. Fig. 14 shows how the retrieval rate for both the voting and weighted voting memory varies as the memory set size m is increased up to m = 1000.

Clearly, the performance of the voting model degrades sharply as the number of memory patterns increases. The weighted voting model, though, is much more robust and is able to achieve a retrieval rate of about 95% when storing 1000 memory patterns.

E. Effect of Noise Level

Fixing the memory set size at m = 200 patterns, Fig. 15 shows the retrieval rate as the level of input noise is varied.

Fig. 15 shows the results for both the voting model and the weighted voting model (the weighted voting results are the two rightmost curves). The experimental results are shown as circles, and the theoretical results as a dashed line (discrete model) and solid line (continuous model).

F. Effect of Window Size

The selection of the window size is an important consideration for both the voting and weighted voting models. Fig. 16 shows how the retrieval rate varies as a function of window size for the voting model [Fig. 16(a)] and the weighted voting model [Fig. 16(b)].

As noted in [16], on binary and random memory sets, the performance of the voting models is best when using very small window sizes (for example, 2 × 2) or large window sizes (for example, larger than 10 × 10). Performance suffers when intermediate window sizes are used (such as 5 × 5 or 6 × 6).

A natural question arises: If the retrieval rate can be maximized by using just a single window covering the entire pattern, why should we consider what happens with the intermediate window sizes? One reason is that correct retrieval performance is not the only design consideration to be taken into account. Retrieval speed is also an important factor in practical systems. If parallel hardware is available which allows for the fast and efficient computation of local distances (in a windowed approach), it is conceivable that such a system could operate many times faster than a conventional and serial system which uses just a single window. In this case, it is important to know how the system performs with intermediate sizes and important to know that, for a fixed window size, the proposed weighted voting scheme yields better results than standard voting; a sketch of this windowed computation follows.
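To make the windowed computation concrete, the following is a minimal sketch of the basic (unweighted) voting retrieval. The function name, the use of non-overlapping windows, and the Hamming local distance are our illustrative assumptions; the paper's own equations define the exact model.

```python
import numpy as np

def voting_retrieve(x, memory, n, theta):
    """Sketch of voting retrieval: each n x n window casts one vote.

    x      : 2-D binary input pattern
    memory : list of stored 2-D binary patterns, same shape as x
    n      : window side length
    theta  : vote threshold used by the rejection mechanism
    Returns the index of the winning stored pattern, or None (reject).
    """
    H, W = x.shape
    votes = np.zeros(len(memory), dtype=int)
    # Slide over non-overlapping n x n windows of the input.
    for r in range(0, H - n + 1, n):
        for c in range(0, W - n + 1, n):
            block = x[r:r + n, c:c + n]
            # Local Hamming distance to each stored pattern's window.
            d = [np.sum(block != p[r:r + n, c:c + n]) for p in memory]
            votes[np.argmin(d)] += 1  # one vote for the local winner
    best = int(np.argmax(votes))
    # Rejection: accept only if the winner received enough votes.
    return best if votes[best] >= theta else None
```

Since each window's local distances are independent of every other window's, the outer double loop parallelizes trivially, which is the speed argument made above.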

In addition, previous studies [16], [27] have indicated that for correlated patterns such as face images, the detection and identification rate is best for intermediate window sizes. From a designer's point of view, this is an interesting and welcome result because it allows one to find an optimal intermediate window size that works best for the problem at hand. In this case, we get the best of both worlds: fast parallel computation and high classification results.

G. Effect of Number of Rankings to Include

In this experiment, we study the effect of the number of rankings R that are included in the weighted sum of votes [see (79)]. Fig. 17(a) shows how the retrieval rate varies when R is varied from 0 to 100. Here, the experimental results are shown as circles and the continuous theoretical results are shown as the solid line. Notice that for large values of R, the continuous model underestimates the retrieval rate. For small values of R, the retrieval rate is overestimated, as can be seen in Fig. 17(b), where we zoom in on the range 0 ≤ R ≤ 20. Clearly, for the weighted voting model, it is best to use a large value for R.
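For comparison with the basic scheme, here is a hedged sketch of the weighted generalization studied in this experiment: each window ranks all stored patterns by local distance and awards the rank-r pattern the weight w[r], summing over the top R ranks. The weight values shown are placeholders; in the paper, the weights are computed from training data.

```python
import numpy as np

def weighted_voting_retrieve(x, memory, n, w, theta):
    """Sketch of weighted voting: each window casts R weighted votes.

    w : weight vector of length R; w[r] is the vote awarded to the
        pattern ranked r-th by local distance at a window.
    """
    H, W = x.shape
    m = len(memory)
    R = min(len(w), m)
    score = np.zeros(m)
    for r0 in range(0, H - n + 1, n):
        for c0 in range(0, W - n + 1, n):
            block = x[r0:r0 + n, c0:c0 + n]
            d = np.array([np.sum(block != p[r0:r0 + n, c0:c0 + n])
                          for p in memory])
            order = np.argsort(d)        # rank patterns by local distance
            for r in range(R):
                score[order[r]] += w[r]  # weighted vote for rank r
    best = int(np.argmax(score))
    return best if score[best] >= theta else None

# Illustrative choice only: R = 20 terms with decreasing weights.
# w = np.linspace(1.0, 0.05, 20)
```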



Fig. 12. (a) and (b) Results for the voting and (c) and (d) weighted voting models with N = 200 × 200, n = 10 × 10, noise level 0.48, and m = 150. (a) and (c) Simulation results. (b) and (d) Theoretical results using the discrete distributions.

Fig. 13. (a) and (b) Results for the voting and (c) and (d) weighted voting models with parameters N = 300 × 300, n = 10 × 10, noise level 0.48, and m = 300. (a) and (c) Simulation results. (b) and (d) Theoretical results.

H. ROC Curves

An ROC curve showing how both the detection and identification rate and the false acceptance rate vary for various memory set sizes is shown in Fig. 18. In Fig. 18(a), the results for the voting model are given. Here, the solid line is the experimental result and the dashed line is the theoretical result (discrete model). The results for the weighted voting model are shown in Fig. 18(b). The performance of the weighted voting model is very good, and hence, most of the results are clumped at the top of the graph.



TABLE II
SUMMARY OF THE THREE EXPERIMENTS. THE THEORETICAL P AND EXPERIMENTAL ~P VALUES FOR THE PROBABILITY OF CORRECT RETRIEVAL ARE EXPRESSED IN PERCENT

Fig. 14. Effect of database size on the retrieval rate for the voting model (lower curve) and the weighted voting model (upper curve). Here, N = 60 × 60, n = 5 × 5, and noise level 0.4. Both experimental results (circles) and discrete theoretical results (dashed line) are shown.

Recall that ROC curves are constructed by using various values of the system threshold. For the leftmost part of each curve, a large value of the threshold is used. In this case, to meet the threshold, the best-matching candidate must receive a large number of votes. As the threshold is reduced, the detection and identification rate increases, but at the expense of a higher false acceptance rate.

Of course, to implement the proposed voting-based system, a single threshold must be chosen. The selected value depends on the nature of the application. For example, for high-security systems, a premium is placed on low false acceptance; hence, a suitably high threshold is chosen.
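As a sketch of how such a curve can be traced in practice (the variable names below are ours, not the paper's notation): sweep the threshold over a range and, at each value, measure the fraction of noisy versions of stored patterns that are accepted and correctly identified, and the fraction of unrelated inputs that are wrongly accepted.

```python
import numpy as np

def roc_points(win_scores, correct, impostor_scores, thresholds):
    """Trace an ROC curve by sweeping the vote threshold (sketch).

    win_scores      : winning vote totals for noisy stored-pattern inputs
    correct         : boolean array; True where the winner was the target
    impostor_scores : winning vote totals for inputs unrelated to memory
    Returns arrays of (detection-and-identification rate,
    false acceptance rate), one pair per threshold.
    """
    pd, pf = [], []
    for t in thresholds:
        pd.append(np.mean((win_scores >= t) & correct))  # accepted and correct
        pf.append(np.mean(impostor_scores >= t))         # wrongly accepted
    return np.array(pd), np.array(pf)
```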

VI. SUMMARY

The goal of associative memory research is to design systems that can reliably store and retrieve information. Early research into associative neural memories demonstrated that a parallel and distributed processing paradigm can be used to design content-addressable types of memory. Although these early designs were quite interesting and mathematically elegant, they were very poor at doing what they were designed to do: reliably store information. The voting-based systems discussed in this paper still use the parallel and distributed processing approach, but the systems we propose are much more practical. In fact, the characteristics of a practical memory system are explicitly outlined in

Fig. 15. Effect of noise on the retrieval rate for the voting model (lower two curves) and the weighted voting model (upper two curves). Here, N = 60 × 60, n = 5 × 5, and m = 200.

this paper: the detection and identification rate should be maximized, and (simultaneously) the false acceptance rate and false rejection rate should be minimized.

In this paper, we have extended the analysis of the voting associative memory proposed in [16]. In addition, we proposed a generalization of the voting memory where, rather than casting a single vote for the best-matching memory set pattern, each window casts a set of weighted votes. For the case of random and binary memory set patterns, we were able to derive expressions for the retrieval rate, detection and identification rate, and false acceptance rate for both the voting memory and the weighted voting memory.

For random/binary memory, the simulations reported here show that the weighted voting memory consistently outperforms the voting memory. The price to pay for this increased performance is in terms of computation time (the weighted voting memory requires a local sorting operation, while the voting memory only requires a min-select operation) and training (the weighted voting memory requires training data and a training phase in order to compute the weights, while the voting memory does not).

It is important to note that the weighted voting memory is not limited to binary patterns. The memory has been successfully applied to the difficult problem of human face recognition (using grayscale patterns); details can be found in [27]–[29].



Fig. 16. Effect of window size on the retrieval rate for (a) the voting model and (b) the weighted voting model. Both experimental results (circles) and discrete theoretical results (solid line) are shown. Here, m = 200, N = 60 × 60, and noise level 0.4.

Fig. 17. Retrieval rate versus the number of terms R in the sum [see (79)]. Experimental results (circles) and theoretical results (solid line) using the continuous model are shown. (a) Range 0 ≤ R ≤ 100. (b) Range 0 ≤ R ≤ 20. Here, N = 60 × 60, n = 5 × 5, m = 200, and noise level 0.4.

The weighted voting associative memory model proposed in this paper has the following desirable properties.

1) The proposed model can operate with binary patterns or, with a simple change in local distance measure, with grayscale patterns or real-valued patterns. Models like the Hopfield memory and bidirectional associative memory (BAM) do not easily generalize to grayscale patterns (for example, one can use the binary expansion representation for each pixel, but the resulting network will be eight times as big and require 64 times more weights; for image processing problems with a large number of pixels, the resulting network is simply too large to be practical; see the worked example after this list).

2) The proposed system has a rejection mechanism and a tunable threshold which allows the user to adjust the detection and false acceptance rates.

3) Our theoretical derivation completely characterizes the performance of the voting memory and weighted voting memory for binary and random memory sets. In fact, the theoretical analysis gives a framework for how capacity can be analyzed for a whole class of voting-based systems.

4) We have shown by simulation on binary and random memory sets that the proposed weighted voting memory outperforms the voting memory. In addition, we have shown that the proposed systems can reliably reject patterns with low signal-to-noise ratio. Hence, the proposed memory exhibits very high performance and can be used for practical associative memory problems.

5) The proposed memory model never produces a spurious memory.

6) The memory can operate in autoassociative or heteroassociative mode.

7) If desired, the memory can retrieve an ordered list of best-matching patterns and not just a single pattern.

8) The voting-based retrieval mechanism of the proposed model is excellent at handling localized noise [8]. This interesting property will be explored in a future paper.

9) In terms of hardware realization, the proposed model is practical in the sense that it does not require full interconnectivity and can be realized using present-day digital technology.

10) The system is very easy to maintain, and memory patterns can be easily added to or deleted from the memory set without extensive retraining.
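To make the size argument in item 1) concrete, here is the arithmetic behind the factors of 8 and 64 (the worked numbers are our illustration): a fully interconnected memory on N units has on the order of $N^2$ weights, so the 8-bit binary expansion gives

$$N \longrightarrow 8N \ \text{units}, \qquad N^2 \longrightarrow (8N)^2 = 64N^2 \ \text{weights}.$$

For a 256 × 256 grayscale image, N = 65 536 pixels, so a fully connected design would require roughly $64N^2 \approx 2.7 \times 10^{11}$ weights, which is clearly impractical.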

Note that the weighted voting strategy outlined here is quite general and can be used with other types of features [5]; for example, eigenfaces [4], [26], [36], wavelets [11], [39], [41], etc.



Fig. 18. ROC curves showing how the detection and identification rate and the false acceptance rate vary as a function of the threshold (implicit in the graph) and memory set size for (a) the voting model and (b) the weighted voting model. Here, N = 60 × 60, n = 5 × 5, and noise level 0.4.

In future work, we will study the performance of the weighted voting memory when using other types of features. In addition, we will look into ways of making the computation more efficient.

APPENDIX A

Here, we want to show that

(A-1)

The desired computation can be written in terms of the difference of two random variables. If we know the probability density of this difference, then the probability is easily computed

(A-2)

For two independent random variables $X$ and $Y$, the distribution of their sum $Z = X + Y$ is given by the convolution formula

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx \qquad \text{(A-3)}$$

Assuming that the two random variables in our problem are independent, the distribution of their sum is given by

(A-4)

Plugging (A-4) into (A-2), we get

(A-5)

Interchanging the order of integration, we get

(A-6)

and, using a variable substitution, (A-6) can be written as

(A-7)

Since the density is Gaussian, it has the usual symmetry property about its mean. Plugging this in, we obtain

(A-8)

which is the desired result (A-1).
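The derivation above reduces to computing the probability that the difference of two random variables is negative. As a numerical sanity check of this kind of computation under the Gaussian assumption used in the text (the parameter values and function names below are ours):

```python
import numpy as np
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Two independent Gaussians (parameters are illustrative, not the paper's).
mu_x, sd_x = 3.0, 1.5
mu_y, sd_y = 5.0, 2.0

# Closed form: P(X - Y < 0) = Phi((mu_y - mu_x) / sqrt(sd_x^2 + sd_y^2)),
# since X - Y is Gaussian with mean mu_x - mu_y and the variances add.
p_closed = phi((mu_y - mu_x) / sqrt(sd_x**2 + sd_y**2))

# Monte Carlo check of the same probability.
rng = np.random.default_rng(0)
x = rng.normal(mu_x, sd_x, 1_000_000)
y = rng.normal(mu_y, sd_y, 1_000_000)
p_mc = np.mean(x < y)

print(p_closed, p_mc)  # the two values should agree to about 3 decimals
```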

APPENDIX B

Here, we will derive expressions for the rank probabilities of the target and nontarget patterns. These quantities are needed to compute the distribution of the number of votes received by the target pattern at each rank [see (42)] and the number of votes received by one of the nontarget patterns [see (44)].

A. Computation of the Target Rank Probabilities

We want the probability that, at a single local window, the target pattern appears rth in the sorted list. Consider first the probability that



the target memory has rank 1. This probability is the same as (3) and, for convenience, is rewritten here

(B-1)

What is the probability that the target memory has rank 2, i.e., appears second in the ordered list of best-matching patterns? The target memory is the second winner if precisely one of the nontarget memories has a smaller local distance than the target, and every other nontarget memory has a larger local distance. A diagram illustrating these conditions is shown in Fig. 5(a). The probability of meeting these conditions is given by

(B-2)

Using (1) and (2), (B-2) can be written as

(B-3)

Similarly, there are three conditions to be met for the target pattern to be the rth winner, as illustrated in Fig. 5(b). Here, there are r − 1 nontarget patterns with a higher rank than the target pattern. In addition, there are $\binom{m-1}{r-1}$ ways of selecting the r − 1 nontarget memories ranked higher than the target. Hence, the rank-r probability is given by

(B-4)

Using (1) and (2), (B-4) can be written as

(B-5)
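The resulting rank distribution is easy to evaluate numerically. In the sketch below, q stands in for the per-window probability (built from the paper's (1) and (2)) that a single fixed nontarget memory beats the target, and the nontarget patterns are treated as independent, mirroring the counting argument above:

```python
from math import comb

def target_rank_pmf(m, q):
    """Probability that the target has rank r at one window (sketch).

    m : number of stored patterns (so m - 1 nontargets)
    q : probability that a single nontarget memory beats the target
        at this window (a stand-in for the quantity built from the
        paper's (1) and (2)); nontargets are treated as independent.
    Returns [p_1, ..., p_m] following the counting argument of (B-4):
    choose which r - 1 of the m - 1 nontargets rank above the target.
    """
    return [comb(m - 1, r - 1) * q**(r - 1) * (1 - q)**(m - r)
            for r in range(1, m + 1)]

# Quick check: the ranks exhaust all possibilities, so the pmf sums to 1.
assert abs(sum(target_rank_pmf(150, 0.02)) - 1.0) < 1e-9
```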

B. Computation of the Nontarget Rank Probabilities

Let this quantity be the probability that a given (say, the jth) nontarget memory set pattern has rank r. The probability that the jth nontarget memory will have rank 1 is the same as in (4)

(B-6)

To compute the probability that the jth nontarget memory pattern has rank 2, there are two cases to consider: 1) the target pattern is in front of the jth nontarget pattern (that is, the target has rank 1) and 2) the target pattern is behind the jth nontarget pattern. These conditions are illustrated in Fig. 6. In the first case, the target must rank first, the jth nontarget second, and all other nontarget patterns must be behind it. In the second case, one of the other nontarget patterns must rank first (there are m − 2 ways to choose this nontarget pattern), with the jth nontarget second and both the target pattern and all remaining nontarget patterns behind it

(B-7)

Using (1) and (2), (B-7) can be written as

(B-8)

Similarly, the probability that the jth nontarget pattern has rank r is the sum of the probabilities of the following two cases: the target pattern is in front of the jth nontarget pattern (among the top r − 1 patterns), or the target pattern is behind the jth nontarget pattern. This time, there are $\binom{m-2}{r-1}$ ways to choose the r − 1 nontarget patterns in front of the rank-r nontarget pattern when the target is behind it. In addition, there are $\binom{m-2}{r-2}$ ways to choose the preceding r − 2 nontarget patterns when the target is in front of the rank-r nontarget pattern

(B-9)

Using (1) and (2), (B-9) can be written as

(B-10)
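Likewise, the two-case argument above can be evaluated directly. In this sketch, s and t are stand-ins for the pairwise probabilities that the paper builds from (1) and (2); independence across patterns is again assumed:

```python
from math import comb

def nontarget_rank_pmf(m, s, t):
    """Rank distribution of one fixed nontarget pattern (sketch).

    m : number of stored patterns
    s : probability the target beats this nontarget at the window
    t : probability a given *other* nontarget beats this nontarget
    Both s and t are stand-ins for quantities built from the paper's
    (1) and (2); the two cases mirror the counting argument of (B-9).
    """
    pmf = []
    for r in range(1, m + 1):
        # Case 1: target is among the r - 1 patterns in front, so
        # r - 2 of the m - 2 other nontargets are also in front.
        case_front = (s * comb(m - 2, r - 2) * t**(r - 2)
                      * (1 - t)**(m - r)) if r >= 2 else 0.0
        # Case 2: target is behind, so all r - 1 patterns in front
        # are chosen from the m - 2 other nontargets.
        case_behind = ((1 - s) * comb(m - 2, r - 1) * t**(r - 1)
                       * (1 - t)**(m - r - 1)) if r <= m - 1 else 0.0
        pmf.append(case_front + case_behind)
    return pmf

# The two cases partition the possibilities, so the pmf sums to 1.
assert abs(sum(nontarget_rank_pmf(150, 0.9, 0.5)) - 1.0) < 1e-9
```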

REFERENCES

[1] I. Aleksander, W. Thomas, and P. Bowden, "WISARD: A radical step forward in image recognition," Sens. Rev., pp. 120–124, 1984.

[2] F. M. Alkoot and J. Kittler, "Experimental evaluation of expert fusion strategies," Pattern Recognit. Lett., vol. 20, pp. 1361–1369, 1999.

[3] C. Altmann, H. Bülthoff, and Z. Kourtzi, "Perceptual organization of local elements into global shapes in the human visual cortex," Current Biol., vol. 13, pp. 342–349, 2003.

[4] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.

[5] C. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press, 1995.

[6] A. Burton, V. Bruce, and P. Hancock, "From pixels to people: A model of familiar face recognition," Cogn. Sci., vol. 23, no. 1, pp. 1–31, 1999.

[7] R. Chellappa, C. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, pp. 705–740, May 1995.

[8] L. Chen and N. Tokuda, "Robustness of regional matching scheme over global matching scheme," Artif. Intell., vol. 144, pp. 213–232, 2003.

[9] P. Chou, "The capacity of the Kanerva associative memory," IEEE Trans. Inf. Theory, vol. 35, no. 2, pp. 281–298, Mar. 1989.

[10] G. Costantini, D. Casali, and R. Perfetti, "Neural associative memory storing gray-coded gray-scale images," IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 703–707, May 2003.

[11] B. Duc and S. Fischer, "Face authentication with Gabor information on deformable graphs," IEEE Trans. Image Process., vol. 8, no. 4, pp. 504–516, Apr. 1999.

[12] M. H. Hassoun, Ed., Associative Neural Memories: Theory and Implementation. New York: Oxford Univ. Press, 1993.

[13] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.

[14] T. Ho, J. Hull, and S. Srihari, "Decision combination in multiple classifier systems," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 1, pp. 66–75, Jan. 1994.

[15] J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci., vol. 81, pp. 3088–3092, 1984.

[16] N. Ikeda, P. Watta, M. Artiklar, and M. Hassoun, "Generalizations of the Hamming net for high performance associative memory," Neural Netw., vol. 14, no. 9, pp. 1189–1200, 2001.

[17] N. Ikeda, P. Watta, and M. Hassoun, "Capacity analysis of the two-level decoupled Hamming associative memory," in Proc. Int. Joint Conf. Neural Netw., Anchorage, AK, May 4–9, 1998, pp. 486–491.

[18] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226–239, Mar. 1998.

[19] T. Kohonen, Self-Organization and Associative Memory. Berlin, Germany: Springer-Verlag, 1989.

[20] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. SMC-18, pp. 49–60, 1988.

[21] L. I. Kuncheva, "A theoretical study on six classifier fusion strategies," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, pp. 281–286, Feb. 2002.

[22] L. Lam and S. Y. Suen, "Application of majority voting to pattern recognition: An analysis of its behavior and performance," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 27, no. 5, pp. 553–568, Sep. 1997.

[23] S. Lin, S. Kung, and L. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 114–132, Jan. 1997.

[24] X. Lin, S. Yacoub, J. Burns, and S. Simske, "Performance analysis of pattern classifier combination by plurality voting," Pattern Recognit. Lett., vol. 24, no. 12, pp. 1959–1969, 2003.

[25] G. Lockwood and I. Aleksander, "Predicting the behaviour of G-RAM networks," Neural Netw., vol. 16, pp. 91–100, 2003.

[26] H. Moon and P. J. Phillips, "Analysis of PCA-based face recognition algorithms," in Empirical Evaluation Techniques in Computer Vision, K. W. Bowyer and P. J. Phillips, Eds. Los Alamitos, CA: IEEE Comput. Soc. Press, 1998, pp. 57–71.

[27] X. Mu, "Automated face recognition: A weighted voting method," Ph.D. dissertation, Dept. Electr. Comput. Eng., Wayne State Univ., Detroit, MI, 2004.

[28] X. Mu, M. Hassoun, and P. Watta, "Combining local distance measures: Summing, voting, and weighted voting," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Waikoloa, HI, Oct. 10–12, 2005, pp. 737–743.

[29] X. Mu, P. Watta, and M. Hassoun, "A weighted voting model of associative memory: Experimental analysis," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Big Island, HI, Oct. 10–12, 2005, vol. 2, pp. 1252–1257.

[30] M. Muezzinoglu and C. Guzelis, "A Boolean Hebb rule for binary associative memory design," IEEE Trans. Neural Netw., vol. 15, no. 1, pp. 195–202, Jan. 2004.

[31] M. Muezzinoglu, C. Guzelis, and J. Zurada, "A new design method for the complex-valued multistate Hopfield associative memory," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 891–899, Jul. 2003.

[32] P. Phillips, H. Moon, S. Rizvi, and P. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.

[33] P. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image Vis. Comput., vol. 16, no. 5, pp. 295–306, 1998.

[34] J. Rice, Mathematical Statistics and Data Analysis, 2nd ed. Belmont, CA: Duxbury Press, 1995.

[35] J. Saarinen and D. Levi, "Integration of local features into a global shape," Vis. Res., vol. 41, no. 14, pp. 1785–1790, 2001.

[36] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.

[37] P. Watta, N. Ikeda, M. Artiklar, A. Subramanian, and M. Hassoun, "Comparison between theory and simulation for the 2-level decoupled Hamming associative memory," presented at the IEEE Int. Conf. Neural Netw., Washington, DC, Jul. 10–16, 1999, Paper #JCNN 0337, unpublished.

[38] R. Wilson and E. Hancock, "A study of pattern recovery in recurrent correlation associative memories," IEEE Trans. Neural Netw., vol. 14, no. 3, pp. 506–519, May 2003.

[39] L. Wiskott, J. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 775–779, Jul. 1997.

[40] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Trans. Syst., Man, Cybern., vol. 22, no. 3, pp. 418–435, May/Jun. 1992.

[41] B. Zhang, H. Zhang, and S. Ge, "Face recognition by applying wavelet subband representation and kernel associative memory," IEEE Trans. Neural Netw., vol. 15, no. 1, pp. 166–177, Jan. 2004.

[42] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Comput. Surv., vol. 35, no. 4, pp. 399–458, 2003.

[43] S. Zhu and Y. Wu, "From local features to global perception: A perspective of Gestalt psychology from Markov random field theory," Neurocomputing, pp. 939–945, 1999.

Xiaoyan Mu received the Ph.D. degree from Wayne State University, Detroit, MI, in July 2004.

Currently, she is an Assistant Professor in the Department of Electrical and Computer Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN. Her research interests are in the areas of artificial intelligence, pattern recognition, neural networks, image processing, and computer vision.

Paul Watta received the B.S., M.S., and Ph.D. degrees in electrical engineering from Wayne State University, Detroit, MI, in 1987, 1988, and 1994, respectively.

Currently, he is an Associate Professor at the University of Michigan-Dearborn in the Department of Electrical and Computer Engineering. His research interests include associative memory, image processing, face recognition, pattern recognition, and computer music.

Mohamad H. Hassoun received the B.S., M.S., and Ph.D. degrees in electrical engineering from Wayne State University, Detroit, MI, in 1981, 1982, and 1986, respectively.

He is a Professor in the Department of Electrical and Computer Engineering at Wayne State University, and served as Interim Chair in 1994–1995. He founded the Computation and Neural Networks Laboratory, which performs research in the field of artificial neural networks, machine learning, and pattern recognition. He has numerous papers and book chapters on artificial neural network subjects. He is the editor of Associative Neural Memories: Theory and Implementation (Oxford Univ. Press, 1993). He is also the author of the graduate textbook Fundamentals of Artificial Neural Networks (MIT Press, 1995).

Dr. Hassoun has served as Associate Editor and Reviewer for a number of technical journals. Since January 1998, he has been the Co-Editor-in-Chief of the Neural Processing Letters. He served on the program committees of several international conferences on neural networks. He received a National Science Foundation Presidential Young Investigator Award in 1990 and a number of teaching awards at Wayne State University, including the President's Award for Excellence in Teaching.