
Page 1: Sparse Distributed Memory

Sparse Distributed Memory
A study of psychologically driven storage

Pentti Kanerva


Page 2: Sparse Distributed Memory

Outline

Neurons as address decoders
Best match
Sparse Memory
Distributed Storage


Page 3: Sparse Distributed Memory

Goal

The goal of Kanerva's paper is to present a method of storage that, given a test vector, can retrieve the best match to that vector among a set of previously stored vectors. He builds this model from the ground up, first using neurons as address decoders. He then describes a method of storage called sparse memory. This falls short of his goals; he builds on the model, however, culminating in what he calls distributed storage.


Page 4: Sparse Distributed Memory

Goal

Kanerva believes that his model may be a good generalization of how the human brain works. It is unconventional in that it does not strive to store exactly that which one would like to retrieve. Rather, it provides a way to retrieve the stored vectors that are most similar to a given vector.


Page 5: Sparse Distributed Memory

Address Decoder Neurons

Have a standard weight vector [called input coefficients]
Bipolar
Linear Threshold Gate
Unipolar Output
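As a concrete reference, here is a minimal sketch of such a unit (my own illustration, not from the slides; I assume bipolar, i.e. ±1, inputs and weights, with a unipolar {0, 1} output):

    import numpy as np

    def address_decoder(weights, threshold, x):
        """Linear threshold gate: fires (outputs 1) when the weighted
        sum of the bipolar input x reaches the threshold."""
        return int(np.dot(weights, x) >= threshold)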


Page 6: Sparse Distributed Memory

Key Observation

A weight vector can be realized as an address which the neuron 'codes' for.

The threshold value can be realized in terms of the difference between the desired address and the input address.

The Response Region of this type of neuron is the set of inputs for which the output of the neuron is 1.
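Under the bipolar convention above, this observation has a two-line justification (a worked note I'm adding): if the weights encode address a, then for an input encoding x we get w · x = n - 2·d(a, x), so choosing threshold n - 2r makes the response region exactly the Hamming ball of radius r around a. A quick check, reusing address_decoder from the previous sketch (the bit-to-bipolar map 2b - 1 is my assumed encoding):

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 16, 5
    a = rng.integers(0, 2, size=n)      # the address the neuron codes for
    weights = 2 * a - 1                 # bipolar weights encoding a
    threshold = n - 2 * r               # fire iff d(a, x) <= r

    for _ in range(1000):
        x = rng.integers(0, 2, size=n)
        fires = address_decoder(weights, threshold, 2 * x - 1)
        assert fires == (np.sum(a != x) <= r)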


Page 7: Sparse Distributed Memory

Mathematical Aside

It isn't necessary that the weights be bipolar, but it makes the model simple and more useful. It also makes each bit of the address contribute equally; otherwise the distance function becomes skewed.


Page 8: Sparse Distributed Memory

Biological Aside

The addition and deletion of synapses will not greatly affect the address of an individual neuron. [This corresponds to the expansion or contraction of the weight vector.] Furthermore, if synapses changed from excitatory to inhibitory or vice versa, the address would be affected, but neurons are not known to make this sort of change.


Page 9: Sparse Distributed Memory

Assumption for Rest of Work

Weights are constant over time
[Contributing to the idea that a neuron has an address]

Thresholds are allowed to vary


Page 10: Sparse Distributed Memory

Best Match

Problem: Given a data set of stored words, retrieve the best match to a test word.

Formally: Describe a filing scheme for storing a set X of words so that one can retrieve, in minimum time, the stored word ζ that is the best match [closest in Hamming distance] to a test word z.

Recall: the Hamming distance between two binary sequences is the number of bits in which they disagree.
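For concreteness, the brute-force baseline that any filing scheme competes against looks like this (a sketch of mine, not from the slides):

    def hamming(a, b):
        """Hamming distance: the number of bit positions that disagree."""
        return sum(u != v for u, v in zip(a, b))

    def best_match(X, z):
        """Linear scan over every stored word; the filing-scheme problem
        is to do better than this exhaustive search."""
        return min(X, key=lambda xi: hamming(xi, z))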


Page 11: Sparse Distributed Memory

Problem with Conventional Architecture

It is quicker to check each word of a trillion-word data set against each word of a trillion-word test set than it is to load the given data set into memory.


Page 12: Sparse Distributed Memory

Kanerva's 'Machine'

An array of address-decoding neurons, with the ability to change all their thresholds as a unit. The output is probably much like content-addressable hardware: the closest match comes out once the threshold is passed.


Page 13: Sparse Distributed Memory

Sparse [Random Access] Memory

Storage locations and addresses are given from the start.
Only the contents of locations are modifiable.
Storage locations are very few in comparison with N = 2^n.
Storage locations are randomly distributed over {0, 1}^n.
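A sketch of how such a memory might be laid out (demo-sized parameters and all names here are my own choices):

    import numpy as np

    n = 256     # address length; the full space {0, 1}^n has 2**n points
    M = 2000    # hard locations: a vanishingly small sample of that space
    rng = np.random.default_rng(0)

    # Each row is one hard location, drawn uniformly at random from {0, 1}^n.
    hard_locations = rng.integers(0, 2, size=(M, n), dtype=np.uint8)

    def nearest_hard_location(x):
        """The hard location closest to x in Hamming distance
        (written x' on the next slide)."""
        return hard_locations[np.argmin((hard_locations ^ x).sum(axis=1))]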


Page 14: Sparse Distributed Memory

Definitions

Hard Locations: The random sample of actual storage locations.

Nearest N′-neighbor, x′: The hard location closest to x ∈ N.


Page 15: Sparse Distributed Memory

An example of a sparse memory

Address space: N = {0, 1}^1000

Hard locations: |N′| = 1,000,000

On average, each location x ∈ N is 424 bits from x′
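The 424-bit figure can be sanity-checked with a normal approximation to the binomial (my own calculation, not from the slides): the distance from x to one random hard location is Bin(1000, 1/2), so the nearest of 10^6 locations sits roughly where the lower tail probability equals 10^-6.

    from scipy.stats import norm

    n, M = 1000, 1_000_000
    mean, std = n / 2, (n / 4) ** 0.5   # Bin(n, 1/2): mean 500, std ~15.8
    # Distance at which a single random location is this close with
    # probability ~1/M, i.e. roughly the nearest of M samples.
    d = mean + std * norm.ppf(1 / M)
    print(round(d))                      # ~425, in line with the slide's 424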


Page 16: Sparse Distributed Memory

Best Match w/ Sparse Memory

Let X be a random data set of 10,000 words.

Try storing each ξ ∈ X in ξ′

With test word z and ξ the word of X closest to z, what is the probability that the occupied hard location nearest to z contains ξ?

Note: we are searching for ξ, located in ξ′; we are not looking for z′, since z is not necessarily in X.
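The slides leave this probability open. A scaled-down simulation gives a feel for it (every parameter below is a demo choice of mine, much smaller than the slide's setting, and the noisy cue z is built so that ξ is almost surely the word of X closest to it):

    import numpy as np

    rng = np.random.default_rng(0)
    n, M, W = 64, 5000, 100     # word length, hard locations, stored words

    hard = rng.integers(0, 2, size=(M, n), dtype=np.uint8)
    X = rng.integers(0, 2, size=(W, n), dtype=np.uint8)

    def nearest(addrs, x):
        """Row index of addrs closest to x in Hamming distance."""
        return int(np.argmin((addrs ^ x).sum(axis=1)))

    # Store each word xi at its nearest hard location xi' (collisions
    # simply overwrite, one of sparse memory's weaknesses).
    contents = {}
    for xi in X:
        contents[nearest(hard, xi)] = xi

    # Probe with noisy copies of stored words: how often does the
    # occupied hard location nearest to z hold the right word?
    occupied = np.array(sorted(contents))
    hits, trials = 0, 200
    for _ in range(trials):
        xi = X[rng.integers(W)]
        z = xi ^ (rng.random(n) < 0.1).astype(np.uint8)   # flip ~10% of bits
        loc = occupied[nearest(hard[occupied], z)]
        hits += np.array_equal(contents[loc], xi)
    print(hits / trials)

At this scale the hit rate falls clearly short of certainty, which is the shortcoming of plain sparse memory that motivates distributed storage.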


Page 17: Sparse Distributed Memory

Distributed Storage

Many storage locations participate in a write/read operation
Appearance of a RAM with a large address space
If η is stored at ξ, then reading from ξ retrieves η

Claim: Reading from an address x similar to ξ retrieves a word y that is closer to η than x is to ξ.


Page 18: Sparse Distributed Memory

Definitions

Access Radius, Circle: If r is the access radius, then given an address x, we access all hard locations within r of x when reading from or writing to x.

Contents of a Location x: Each storage location contains all of the words that have ever been written there.

Data of x, D(x): The data retrievable from address x is the multiset of all the words in the hard locations inside the access radius of x.

Reading at x: Returns a single word.
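These definitions translate almost line for line into code. The toy class below is my own sketch (its name and parameters are assumptions); its read rule, a bitwise majority over D(x), anticipates the 'average element' of the slides that follow:

    import numpy as np

    class SparseDistributedMemory:
        """Hard locations, an access radius r, multiset contents per
        location, and a read that pools D(x) into a single word."""

        def __init__(self, n, M, r, seed=0):
            rng = np.random.default_rng(seed)
            self.r = r
            self.hard = rng.integers(0, 2, size=(M, n), dtype=np.uint8)
            self.contents = [[] for _ in range(M)]   # multiset per location

        def _circle(self, x):
            """Indices of all hard locations within r of address x."""
            return np.flatnonzero((self.hard ^ x).sum(axis=1) <= self.r)

        def write(self, x, word):
            for i in self._circle(x):
                self.contents[i].append(word)

        def data(self, x):
            """D(x): multiset of words stored in the access circle of x."""
            return [w for i in self._circle(x) for w in self.contents[i]]

        def read(self, x):
            """Return one word: the bitwise majority of D(x)
            (assumes the access circle of x is not empty of data)."""
            D = self.data(x)
            return (np.sum(D, axis=0) * 2 >= len(D)).astype(np.uint8)

With, say, n = 256, M = 2000, and r just under the mean distance n/2 = 128 (r = 112 in the later demo), each access circle covers a few dozen hard locations, so every write is spread across many locations and every read pools many of them.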


Page 19: Sparse Distributed Memory

Best Match

Given a test word z, retrieve the target word ζ ∈ X:

1. Search D(z) for best match

2. Most frequent word of D(z)

3. Random word of D(z)

4. Compute archetypical element of D(z)


Page 20: Sparse Distributed Memory

Finding the Best Match

Key Observation: If ζ is the word of X most similar to the test word, we can expect it to occur most often in D(z), since ζ's write circle overlaps the read circle of z more than any other word's does.

Obvious Attempt: Take the word that occurs most frequently. Too much computation.

Weak Attempt: Take a random word. Doesn't work.

Working Attempt: Take the average element. Works, in most cases.


Page 21: Sparse Distributed Memory

Computing the Archetypical Element

The Average Element: Take the bitwise sum of D(z) and threshold each bit at half the size of D(z), yielding a single word.

If d(z, ζ) < 209, then reading iteratively, each time at the address given by the word just read, yields a sequence of words that converges on ζ.
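A sketch of that iteration, reusing the SparseDistributedMemory toy from the definitions slide (the 209-bit critical distance belongs to the slide's 1000-bit setting; the demo parameters below are again my own):

    import numpy as np

    def iterated_read(sdm, z, steps=10):
        """Read at z, then read again at the word just retrieved,
        until the sequence stops changing or the budget runs out."""
        x = z
        for _ in range(steps):
            y = sdm.read(x)
            if np.array_equal(y, x):
                break                      # reached a fixed point
            x = y
        return x

    # Store a word at its own address, then recover it from a noisy cue.
    rng = np.random.default_rng(1)
    xi = rng.integers(0, 2, size=256, dtype=np.uint8)
    sdm = SparseDistributedMemory(n=256, M=2000, r=112)
    sdm.write(xi, xi)
    z = xi ^ (rng.random(256) < 0.15).astype(np.uint8)   # ~38 bits of noise
    print(np.array_equal(iterated_read(sdm, z), xi))     # expect True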


Page 22: Sparse Distributed Memory

Other Properties

Convergence/Divergence
Memory Capacity
Storage Location Capacity
Signal Strength/Noise/Fidelity


Page 23: Sparse Distributed Memory

What's it got to do with Neural Networks?

Knowing that one knows may be indicated by fast convergence.

Tip-of-the-tongue may correspond to the critical point, at which we may or may not recover the memory.

Rehearsal of some action might be indicated by storing that item over and over.

The effects of brain damage, which sometimes seem nonexistent, may be indicative of distributed storage, under which it is still possible to retrieve memories.


Page 24: Sparse Distributed Memory

Questions

???
