8. Auto-associative memory and network dynamics
Lecture Notes on Brain and Computation
Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics
Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: [email protected]
This material is available online at http://bi.snu.ac.kr/
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.
(C) 2009 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
Outline
8.1 Short-term memory and reverberating network activity
8.2 Long-term memory and auto-associators
8.3 Point-attractor networks: the Grossberg-Hopfield model
8.4 The phase diagram and the Grossberg-Hopfield model
8.5 Sparse attractor neural networks
8.6 Chaotic networks: a dynamic systems view
8.7 Biologically more realistic variations of attractor networks
8.1 Short-term memory and reverberating network activity
8.1.1 Short-term memory
Short-term memory
- Ability to hold information temporarily
- Recency effect: the tendency to remember recent items
Physiological level
- Hold the corresponding neural activity over a certain duration
Working memory
- Another type of short-term memory
- Discussed in Chapter 11
8.1.2 Maintenance of neural activity
Monkey experiment: a monkey was trained to keep its eyes on a central fixation spot until a 'go' signal (a tone) was given
The target location for each trial had to be remembered during the delay period
Dorsolateral prefrontal cortex activity
Fig. 8.1: The neurons were sensitive to the particular target direction of 270 deg
Fig. 8.1 Maintenance of delay activity in physiological experiments
8.1.3 Recurrences
Delay activity can be used to store information over some period of time
The question is how such activity can be maintained by a neural network
- Recurrences
1. A recurrent node is able to maintain its firing as long as the recurrent pathway is strong enough;
2. There is some delay in the feedback so that the re-entry signal does not fall within a refractory time of the node; and
3. A possible leakage in the recurrent pathway is small enough so that it is possible to fire the node again.
Fig. 8.2 (A) Schematic illustration of an auto-associative node, distinguished from the associative node of Fig. 7.1A in that it has, in addition, a recurrent feedback connection. (B) An auto-associative network that consists of associative nodes that not only receive external input from other neural layers but, in addition, have many recurrent collateral connections between the nodes in the neural layer.
8.2 Long-term memory and auto-associators
Auto-associative memory
- The output of each node is fed back to all of the other nodes in the network
A recurrent network model
- Tune the recurrent connections
The back-projections in this associative network
- Enhance the pattern completion ability
The recurrent model
- Anatomically faithful
- Collateral connections
- Intercortical back-projections
8.2.1 The hippocampus and episodic memory
The hippocampus has been associated with the form of LTM called episodic memory
- The storage of events
- Area CA3 has well-developed axon collaterals connecting to other CA3 neurons
- A recurrent auto-associative network
The acquisition of episodic long-term memory
Lesion experiment
- Amnesia
- Inability to form new long-term memories
Intermediate-term memory
8.2.2 Learning and retrieval phase
A difficulty occurs when combining associative Hebbian mechanisms with recurrences in the network
- A training phase and a retrieval phase
Switching between the learning and retrieval phases could be accomplished in the brain by:
- Mossy fibres commanding the firing of specific CA3 neurons
- Chemical agents
Acetylcholine (ACh) and noradrenaline
Neuromodulators
8.3 Point-attractor networks: the Grossberg-Hopfield model
Grossberg-Hopfield model
- The understanding and popularization of recurrent auto-associative memory
- Dynamic properties
Attractor states
- States that a dynamic system reaches asymptotically
The brain does not rely entirely on the necessity of settling into an attractor state
- The rapid path towards an attractor state is sufficient to use such networks as associative memory devices
8.3.1 Network dynamics
Considering the time domain
A network of sigma nodes governed by the leaky-integrator dynamics
The change of the internal state of node i
A discrete version
Set Δt = τ = 1
The stationary states, dh_i/dt = 0, of the continuous system (eqn 8.1) are also the fixpoints of the discrete system (eqn 8.3)
The transient response dynamics of the model
Continuous system (leaky integrator):

τ dh_i(t)/dt = −h_i(t) + Σ_j w_ij r_j(t) + I_i^ext(t)    (8.1)

Discrete version:

h_i(t+Δt) = (1 − Δt/τ) h_i(t) + (Δt/τ) [Σ_j w_ij r_j(t) + I_i^ext(t)]    (8.2)

With Δt = τ = 1:

h_i(t+1) = Σ_j w_ij r_j(t) + I_i^ext(t)    (8.3)
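The leaky-integrator updates of eqns 8.1-8.3 can be sketched numerically; the network size, random weights, and tanh gain function below are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                            # number of nodes (assumed)
w = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))    # random weights (assumed)
tau, dt = 10.0, 1.0                              # time constant and step size
I_ext = np.zeros(N)                              # no external input

def step(h):
    """One discrete step of eqn 8.2; with dt = tau it reduces to eqn 8.3."""
    r = np.tanh(h)                               # rates from an assumed gain function
    return (1.0 - dt / tau) * h + (dt / tau) * (w @ r + I_ext)

h = rng.normal(size=N)
for _ in range(1000):
    h = step(h)
# A stationary state of the discrete update is also a fixpoint of eqn 8.1.
```

Because of the leak term, the internal states remain bounded no matter how the weights are drawn.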
8.3.2 Hebbian auto-correlation learning
The weights are not random
- They are specifically self-organized by Hebbian learning
Capacity analysis and recall abilities
A binary system, r_i ∈ {0, 1}
- Transform the variables
Train the system on a set of random patterns ξ_i^μ ∈ {−1, 1}
- The index μ labels the pattern, μ = 1, …, P
- The Hebbian rule
The threshold gain function of the net input h_i:
s_i = 2r_i − 1    (8.4)

w_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ    (8.5)

s_i = sign(h_i)    (8.6)
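The Hebbian rule of eqn 8.5 and the sign dynamics can be sketched as follows; the network size and pattern count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 5                          # nodes and patterns (assumed)
xi = rng.choice([-1, 1], size=(P, N))  # random patterns xi_i^mu in {-1, 1}

# Hebbian auto-correlation rule (eqn 8.5): w_ij = (1/N) sum_mu xi_i^mu xi_j^mu
w = (xi.T @ xi) / N

# One synchronous application of the sign dynamics (eqn 8.6),
# starting from the first stored pattern
s = np.sign(w @ xi[0])
overlap = (s @ xi[0]) / N              # +1 would mean perfect recall
```

At this small load (P/N = 0.025) the cross-talk term rarely flips a node, so the overlap stays close to 1.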
8.3.3 Signal-to-noise analysis
How the network behaves on its own
- Set the external input to zero
- Take s(t) = ξ^1 for demonstration
The first term is the signal; the second term is the cross-talk term
- 'Noise'
s_i(t+1) = sign( Σ_j w_ij s_j(t) )    (8.7)

         = sign( (1/N) Σ_j Σ_μ ξ_i^μ ξ_j^μ s_j(t) )    (8.8)

With s_j(t) = ξ_j^1:

s_i(t+1) = sign( (1/N) Σ_j (ξ_j^1)^2 ξ_i^1 + (1/N) Σ_j Σ_{μ=2}^{P} ξ_i^μ ξ_j^μ ξ_j^1 )    (8.9)

         = sign( ξ_i^1 + (1/N) Σ_j Σ_{μ=2}^{P} ξ_i^μ ξ_j^μ ξ_j^1 )    (8.10)
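The signal-plus-cross-talk decomposition of eqns 8.9-8.10 can be checked empirically: the cross-talk term should have roughly zero mean and variance close to (P−1)/N. The sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 500, 50
xi = rng.choice([-1, 1], size=(P, N))

# Cross-talk at node i: (1/N) sum_j sum_{mu>1} xi_i^mu xi_j^mu xi_j^1
overlaps = xi[1:] @ xi[0] / N          # (1/N) xi^mu . xi^1 for mu > 1
crosstalk = xi[1:].T @ overlaps        # one value per node i

mean, var = crosstalk.mean(), crosstalk.var()
# var should be close to the prediction (P - 1)/N = 0.098
```

The empirical variance fluctuates around the prediction because only N samples (one per node) are available.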
8.3.4 One pattern
Only one imprinted pattern
- No cross-talk term
The dynamics of this network
Start the network not with the trained pattern but with a noisy version
- The network retrieves the learned pattern when initialized with a moderately noisy version of the trained pattern
A point attractor
- The trained pattern
s_i(t+1) = sign( (1/N) Σ_j ξ_i^1 ξ_j^1 s_j(t) )    (8.11)
8.3.5 Many patterns
The cross-talk term is a random variable
- Estimate its variance
Each term of the sum in the cross-talk term is a random number in {−1, 1}
The variance, the error function, and a load parameter:
σ^2 = (P − 1)/N ≈ P/N    (8.12)

P_error = ½ [1 − erf( 1/(√2 σ) )]    (8.13)

α = P/N,   α_bound = 1 / ( 2 [erf^(−1)(1 − 2 P_error)]^2 )    (8.14)

Fig. 8.3 The probability distribution of the cross-talk term is well approximated by a Gaussian with zero mean and variance σ^2. The value of the shaded area marked P_error is the probability that the cross-talk term changes the state of the node. The table lists examples of this probability for different values of the load parameter α.
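Eqns 8.12-8.13 can be evaluated directly; the values of α below are illustrative choices:

```python
import math

def p_error(alpha):
    """Probability that the cross-talk flips a node (eqn 8.13),
    using sigma^2 = alpha from eqn 8.12 in the large-N limit."""
    sigma = math.sqrt(alpha)
    return 0.5 * (1.0 - math.erf(1.0 / (math.sqrt(2.0) * sigma)))

for alpha in (0.05, 0.1, 0.138, 0.2):
    print(f"alpha = {alpha:5.3f}   P_error = {p_error(alpha):.2e}")
```

The error probability grows rapidly with the load parameter, which is the origin of the capacity limit discussed below.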
8.3.6 Point attractors
The pattern completion ability of the associative nodes
The distance between a trained pattern and the network state
- a = ξ^1, b = s
d(a, b) = ½ (1 − a·b / (||a|| ||b||))    (8.15)

Fig. 8.4 Simulation results for an auto-associative network with quasicontinuous leaky-integrator dynamics, N = 1000 nodes, and a time constant of τ = 10 ms. The time step was set to Δt = 0.01 ms. (A) Distance between a trained pattern and the state of the network after the network was initialized with a noisy version of one of the 100 trained patterns. (B) Dependence of the distance at t = 1 ms on the initial distance at time t = 0 ms.
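A pattern-completion sketch using the distance measure of eqn 8.15; the network size, pattern count, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 500, 10
xi = rng.choice([-1, 1], size=(P, N))
w = (xi.T @ xi) / N                        # Hebbian weights (eqn 8.5)

def distance(a, b):
    """Normalized distance (eqn 8.15): 0 for identical patterns."""
    return 0.5 * (1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s = xi[0].astype(float).copy()
flipped = rng.choice(N, size=50, replace=False)
s[flipped] *= -1                           # 10% of components reversed
d_start = distance(xi[0], s)

for _ in range(10):                        # iterate the sign dynamics
    s = np.sign(w @ s)
d_end = distance(xi[0], s)
# The state falls into the point attractor near the trained pattern.
```

At this low load (α = 0.02) the noisy start lies well inside the basin of attraction, so the distance shrinks towards zero.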
8.4 The phase diagram and the Grossberg-Hopfield model
8.4.1 The load capacity αc
Sparse connectivity
- The number of connections per node, C
The load capacity αc
α = N^pat / C    (8.16)

Fig. 8.5 Simulation results for an auto-associative network equivalent to the network used in Fig. 8.4 but with different numbers of patterns in the training set, N^pat. The distance of the network state from the first training pattern at time t = 1 ms is shown. The network was initialized with a version of the first trained pattern with 1% of its components reversed.
8.4.2 The spin model analogy
Spin models (developed in statistical physics)
Central to the correspondence of spin models and the recurrent models
- The binary state
Thermal noise, T
A sharp transition between
- a paramagnetic phase: no dominant direction of the elementary magnets, and
- a ferromagnetic phase: a dominating direction of the elementary magnets
Phase transition
8.4.3 Frustration and spin glasses
The situation in auto-associative networks is further complicated
- The force between the nodes is not consistently positive
The Hebbian rule
- Positive and negative weights
- Complicated spin states of the system
- Frustrated systems or spin glasses
Mean field theory
Replica method
8.4.4 The phase diagram
The phase diagram of the Grossberg-Hopfield model
Noise in the network
A probabilistic version of the update rule:

r_i(t) = sign(h_i(t))    (8.17)

P(r_i(t) = 1) = 1 / (1 + exp(−2 h_i(t)/T))    (8.18)
Fig. 8.6 Phase diagram of the attractor network trained on binary patterns with Hebbian imprinting. The abscissa represents the values of the load parameter α = N^pat/C, where N^pat is the number of trained patterns and C is the number of connections per node. The ordinate represents the amount of noise in the system. The shaded region is where point attractors proportional to the trained patterns exist. The network in this region can therefore function as associative memory.
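The probabilistic update of eqn 8.18 can be sketched directly; the input values h and the two temperatures are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def stochastic_update(h, T):
    """Eqn 8.18: P(r_i = +1) = 1 / (1 + exp(-2 h_i / T))."""
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * h / T))
    return np.where(rng.random(h.shape) < p_plus, 1.0, -1.0)

h = np.array([2.0, -2.0])
cold = np.mean([stochastic_update(h, 0.1) for _ in range(2000)], axis=0)
hot = np.mean([stochastic_update(h, 50.0) for _ in range(2000)], axis=0)
# At low T the update follows sign(h) (eqn 8.17); at high T it is nearly random.
```

The temperature T thus interpolates between the deterministic dynamics and a coin flip, which is the vertical axis of the phase diagram in Fig. 8.6.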
8.4.5 Spurious states
Noise can help the memory performance
Consider the network in a state that has the sign of the majority of the first three patterns; the state of a node after one update is given below
If the components ξ_i^1, ξ_i^2, ξ_i^3 all have the same value, which happens with probability ¼, then we can pull this value out of the sum in the signal term
If ξ_i^3 has a different sign, the signal is weaker
On average the signal has 3/4 of the strength of the signal when updating a trained pattern
s_i^mix = sign(ξ_i^1 + ξ_i^2 + ξ_i^3)    (8.19)

s_i(t+1) = sign( (1/N) Σ_j Σ_μ ξ_i^μ ξ_j^μ sign(ξ_j^1 + ξ_j^2 + ξ_j^3) )    (8.20)

If ξ_i^1 = ξ_i^2 = ξ_i^3, the signal term becomes

ξ_i^1 (1/N) Σ_j (ξ_j^1 + ξ_j^2 + ξ_j^3) sign(ξ_j^1 + ξ_j^2 + ξ_j^3) = ξ_i^1 (1/N) Σ_j |ξ_j^1 + ξ_j^2 + ξ_j^3| ≈ (3/2) ξ_i^1    (8.21)

If ξ_i^3 has a different sign, the signal term becomes

ξ_i^1 (1/N) Σ_j (ξ_j^1 + ξ_j^2 − ξ_j^3) sign(ξ_j^1 + ξ_j^2 + ξ_j^3) ≈ (1/2) ξ_i^1    (8.22)

Averaging over both cases gives a signal of strength (1/4)(3/2) + (3/4)(1/2) = 6/8 = 3/4 times the signal when updating a trained pattern.

ξ^1  ξ^2  ξ^3  ξ^1+ξ^2+ξ^3
 1    1    1    3
 1    1   −1    1
 1   −1    1    1
 1   −1   −1   −1
−1    1    1    1
−1    1   −1   −1
−1   −1    1   −1
−1   −1   −1   −3
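The 3/4 average signal strength can be confirmed by enumerating the eight equally likely combinations of (ξ_i^1, ξ_i^2, ξ_i^3), exactly as in the table:

```python
from fractions import Fraction
from itertools import product

combos = list(product((-1, 1), repeat=3))   # the eight rows of the table

# E[xi^1 sign(xi^1 + xi^2 + xi^3)] = 1/2 for random binary components
half = Fraction(sum(c[0] * (1 if sum(c) > 0 else -1) for c in combos),
                len(combos))

# The mixture-state signal at a node is (1/2)|xi^1 + xi^2 + xi^3|,
# which averages to (1/4)(3/2) + (3/4)(1/2) = 3/4
avg_signal = sum(Fraction(abs(sum(c)), 2) for c in combos) / len(combos)
print(half, avg_signal)   # -> 1/2 3/4
```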
8.4.6 The advantage of noise
Spurious states
- Are attractors under the network dynamics
The average strength of the signal for the spurious states is less than the signal for a trained pattern
- The spurious states under normal conditions are less stable than attractors related to trained patterns
With an appropriate level of noise
- Kick the system out of the basin of attraction of some spurious states and into the basin of attraction of another attractor
Noise can help to destabilize undesired memory states
The average behavior of the network
- A good assumption for large networks
The behavior of a particular network depends strongly on the specific realization of the training patterns
- The phase diagram is specific to the choice of training patterns
8.5 Sparse attractor neural networks
The load capacity of the noiseless Grossberg-Hopfield model with standard Hebbian learning is about 0.138
- Provided the training patterns are uncorrelated
Sensory signals are often correlated
- For example, a fish image and a water image
- The cross-talk term can then yield high values
- Solution: modify the training patterns to yield orthogonal patterns, giving αc = 1
ξ'_i^μ = 1 if i = μ, 0 elsewhere    (8.23)

α_c = 1    (8.24)
8.5.1 Expansion coding
Orthogonalization
- Expansion coding
Fig. 8.7 Example of expansion coding that can orthogonalize a pattern representation with a single-layer perceptron. The nodes in the perceptron are threshold units, and we have included a bias with a separate node with constant input. The orthogonal output can be fed into a recurrent attractor network where all these inputs are fixpoints of the attractor dynamics.
8.5.2 Sparse patterns
Expansion coding
- The load capacities of attractor networks can be larger for patterns with sparse representations
The sparseness
The storage capacity of attractor networks
- k is a constant (roughly of the order of 0.2-0.3)
- With sparseness a = 0.1 and 10,000 synapses
- The number of patterns that can be stored exceeds 20,000
a = ⟨r⟩^2 / ⟨r^2⟩    (8.25)

α_c = k / (a ln(1/a))    (8.26)
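Eqn 8.26 can be evaluated for a few sparseness values; the constant k = 0.25 is an assumed value within the 0.2-0.3 range quoted above:

```python
import math

def alpha_c(a, k=0.25):
    """Storage capacity per connection for sparse patterns (eqn 8.26)."""
    return k / (a * math.log(1.0 / a))

C = 10_000                             # connections per node, as in the text
for a in (0.5, 0.1, 0.01):
    print(f"a = {a:4.2f}   alpha_c = {alpha_c(a):6.2f}   "
          f"patterns ~ {alpha_c(a) * C:8.0f}")
# Sparser patterns (smaller a) allow many more patterns to be stored.
```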
8.5.3 Alternative learning schemes
The cross-talk term
- Can be minimized with different choices of weight matrices
- The overlap matrix
- The pseudo-inverse method
The inverse orthogonalizes the input patterns, and the storage capacity is αc = 1
- Not very plausible biologically
Q_{μν} = (1/N) Σ_i ξ_i^μ ξ_i^ν    (8.27)

w_ij = (1/N) Σ_{μν} ξ_i^μ (Q^{−1})_{μν} ξ_j^ν    (8.28)
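A sketch of the pseudo-inverse rule of eqns 8.27-8.28, showing that even a correlated pattern becomes a fixed point of the sign dynamics. The sizes and the way the correlated pattern is built are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 100, 20
xi = rng.choice([-1, 1], size=(P, N)).astype(float)
xi[1] = np.sign(xi[0] + rng.normal(size=N))   # a pattern correlated with xi[0]

Q = (xi @ xi.T) / N                    # overlap matrix (eqn 8.27)
w = xi.T @ np.linalg.inv(Q) @ xi / N   # pseudo-inverse weights (eqn 8.28)

# Every stored pattern is a fixed point: sign(w xi^mu) = xi^mu
recalled = np.sign(xi @ w)
```

In matrix form w X^T = X^T Q^{-1} Q = X^T, so each training pattern is reproduced exactly regardless of correlations, at the price of the biologically implausible matrix inversion noted above.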
8.5.4 The best storage capacity with sparse patterns
There are many possible learning algorithms
- Each learning rule can lead to a different weight matrix
For a given training set we can ask what the load capacity of the network is
- Try out all the possible weight matrices
- A daunting task
The maximal storage capacity of an auto-associative network with binary patterns:
The simplest Hebbian rule comes close to giving the maximum value
α_c = 1 / (a ln(1/a))    (8.29)
8.5.5 Control of sparseness in attractor networks
Training recurrent networks on patterns with sparseness a with the basic Hebbian covariance rule
- Is not enough to ensure that the retrieved state also has sparseness a_ret = a
Mean and variance of the weight distribution
A weight matrix with Gaussian-distributed components means that the amount of inhibition is equal to the amount of excitation
Hebbian covariance rule:

w_ij = k (r_i − a)(r_j − a)    (8.30)

r_i  r_j  δw          P(w)
0    0    a^2         (1−a)^2
0    1    −a(1−a)     a(1−a)
1    0    −a(1−a)     a(1−a)
1    1    (1−a)^2     a^2

Mean of the weight distribution:

⟨w⟩ = Σ_w P(w) w = a^2(1−a)^2 − 2a^2(1−a)^2 + a^2(1−a)^2 = 0    (8.31)

Variance of the weight distribution:

⟨w^2⟩ = Σ_w P(w) w^2 = a^2(1−a)^2,  so after N^pat patterns  σ_w^2 = N^pat a^2(1−a)^2    (8.32)

Inhibitory and excitatory parts of the weight distribution:

W_in = ∫_{−∞}^{0} P(w) w dw    (8.33)

W_ex = ∫_{0}^{∞} P(w) w dw    (8.34)

Eqns (8.35)-(8.38) relate the retrieval sparseness a_ret to the balance of W_ex and W_in.
8.5.6 Global inhibition (1)
Activity-dependent global inhibition shifts all weights by a constant c:

w_ij → w_ij − c    (8.39)

For Gaussian-distributed weight values centred at −c with standard deviation σ_w, the fraction of positive (excitatory) weight values, and with it the retrieval sparseness, is given by the Gaussian error function:

a_ret = ½ [1 − erf( c / (√2 σ_w) )]    (8.40)

Eqns (8.41)-(8.42) evaluate this expression for the pattern statistics underlying Fig. 8.8.

Fig. 8.8 (A) A Gaussian function centred at a value −c. Such a curve describes the distribution of Hebbian weight values trained on random patterns and includes some global inhibition with strength value c. The shaded area is given by the Gaussian error function described in Appendix B. (B) Theoretical retrieval sparseness a_ret as a function of global inhibition c, plotted as thin lines for different values of the sparseness of the pattern set a, from a = 0.05 (lower curve) to a = 0.5 (upper curve) in steps of Δa = 0.05. We assumed therein 40 imprinted patterns with Gaussian-distributed components of the weight matrix. The thick line shows where the retrieval sparseness matches the sparseness of the imprinted pattern (a_ret = a).
8.5.6 Global inhibition (2)
Fig. 8.9 (A) The simulated curve for patterns with sparseness a = 0.1. The plateau is due to the attractor dynamics, which were not taken into account in the analysis that led to Fig. 8.8B. The lower curve indicates the average Hamming distance between the imprinted pattern and the network state after updating the network. Correct recalls were indeed achieved for inhibition constants that coincide with the plateau in the retrieval sparseness. (B) Normalized histogram of weight components for 40 patterns trained with the Hebbian covariance rule. (C) Normalized histogram of weight components for 400 patterns trained with the Hebbian covariance rule.
8.6 Chaotic networks: a dynamic systems view
The theory of dynamic systems
- Auto-associative memories correspond to 'point attractors'
Equations of motion
The dynamics of a recurrent network with continuous dynamics
Dimensionality
- The number of equations
- The number of nodes in the network
The vector x is the state vector
State space
Trajectory
dx/dt = f(x)    (8.43)
8.6.1 Attractors
Lorenz system
A recurrent network of three nodes

dx1/dt = a (x2 − x1)    (8.44)

dx2/dt = b x1 − x2 − x1 x3    (8.45)

dx3/dt = x1 x2 − c x3    (8.46)

This can be written as a recurrent network with multiplicative interactions,

dx_i/dt = Σ_j w_ij x_j + Σ_{jk} w_ijk x_j x_k    (8.47)

with weights

w = ( −a  a  0 ;  b  −1  0 ;  0  0  −c )  and  w_ijk = { −1 for (i,j,k) = (2,1,3);  1 for (i,j,k) = (3,1,2);  0 otherwise }    (8.48)

Fig. 8.10 Example of a trajectory of the Lorenz system from a numerical integration within the time interval 0 ≤ t ≤ 100. The parameters used were a = 10, b = 28, and c = 8/3.
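A numerical integration of eqns 8.44-8.46 with the parameters of Fig. 8.10; the Runge-Kutta step size and the initial state are assumptions:

```python
import numpy as np

a, b, c = 10.0, 28.0, 8.0 / 3.0        # parameters of Fig. 8.10

def f(x):
    """Right-hand side of the Lorenz system (eqns 8.44-8.46)."""
    x1, x2, x3 = x
    return np.array([a * (x2 - x1),
                     b * x1 - x2 - x1 * x3,
                     x1 * x2 - c * x3])

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, 1.0, 1.0])          # assumed initial state
traj = []
for _ in range(10_000):                # 0 <= t <= 100 with dt = 0.01
    x = rk4_step(x, 0.01)
    traj.append(x)
traj = np.array(traj)
# The trajectory stays on the bounded 'butterfly' attractor
# without settling into a fixed point.
```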
8.6.2 Lyapunov functions (1)
A system has a point attractor if a Lyapunov function (energy function) V(x) exists
'Landscape'
If there is a function V(x) that never increases under the dynamics of the system,

dV(x)/dt ≤ 0    (8.49)

Fig. 8.11 A ball in an 'energy' landscape.
8.6.2 Lyapunov functions (2)
Lyapunov function for recurrent networks: the dynamics

h_i(t+1) = Σ_j w_ij r_j(t) + I_i^ext(t)    (8.50)

have the Lyapunov function

V(r_1, …, r_n) = −½ Σ_ij w_ij r_i r_j − Σ_i I_i^ext r_i    (8.51)

The change of V in one time step is

ΔV = V(t+1) − V(t) = −½ Σ_kj w_kj r_k(t+1) r_j(t+1) + ½ Σ_kj w_kj r_k(t) r_j(t) − Σ_k I_k^ext [r_k(t+1) − r_k(t)]    (8.52)

For an asynchronous update of a single node i (so that r_j(t+1) = r_j(t) for j ≠ i), and using the symmetry w_ij = w_ji,

ΔV = −[r_i(t+1) − r_i(t)] ( Σ_j w_ij r_j(t) + I_i^ext )    (8.53)

ΔV = −[r_i(t+1) − r_i(t)] h_i(t+1) ≤ 0    (8.54)
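The monotonic decrease of eqn 8.54 can be checked numerically: under asynchronous updates with a symmetric weight matrix (zero diagonal), the energy of eqn 8.51 never increases. The network size and patterns are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
xi = rng.choice([-1, 1], size=(5, N))
w = (xi.T @ xi) / N                    # symmetric Hebbian weights
np.fill_diagonal(w, 0.0)
I_ext = np.zeros(N)

def energy(r):
    """Lyapunov function of eqn 8.51."""
    return -0.5 * r @ w @ r - I_ext @ r

r = rng.choice([-1.0, 1.0], size=N)
energies = [energy(r)]
for _ in range(5):                     # five asynchronous sweeps
    for i in rng.permutation(N):
        h = w[i] @ r + I_ext[i]
        if h != 0.0:
            r[i] = np.sign(h)
        energies.append(energy(r))

# Delta V = -(r_i(t+1) - r_i(t)) h_i <= 0 for every single-node update
max_increase = max(np.diff(energies))
```

Because the energy is bounded below and decreases by a finite amount with every flip, the asynchronous dynamics must settle into a point attractor.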
8.6.3 The Cohen-Grossberg theorem
General systems with continuous dynamics have a Lyapunov function under the conditions:
- Positivity a_i ≥ 0: the dynamics must be a leaky integrator rather than an amplifying integrator
- Symmetry w_ij = w_ji: the influence of one node on another has to be the same as the reverse influence
- Monotonicity sign(dg(x)/dx) = const: the activation function has to be a monotonic function
dx_i/dt = a_i(x_i) [ b_i(x_i) − Σ_{k=1}^{N} w_ik f_k(x_k) ]    (8.55)
8.7 Biologically more realistic variations of attractor networks
The synaptic weights between neurons in the nervous system
- Cannot be expected to fulfil the symmetry condition on the weight matrix required to guarantee stable attractors
Neurons receive a mixture of input from excitatory and inhibitory presynaptic neurons
Dale's principle
The Cohen-Grossberg theorem
- Would require that an inhibitory node only receives inhibitory connections and vice versa
8.7.1 Asymmetric networks
A simple case of non-symmetric weight matrices
- A symmetric and an anti-symmetric part
The difference between two consecutive time steps as a convergence indicator
w = g_s w^s + g_a w^a    (8.56)

w^s_ij = w^s_ji    (8.57)

w^a_ij = −w^a_ji    (8.58)

w_ij = { g_s + g_a for i > j;  0 for i = j;  g_s − g_a for i < j }    (8.59)

d(t) = |r(t) − r(t−1)|^2    (8.60)

Fig. 8.12 (A) Convergence indicator for networks with an asymmetric weight matrix where the individual components of the matrix are chosen with unit strength. (B) Similar to (A) except that the individual components of the weight matrix are chosen from a Gaussian distribution. (C) Overlap of the network state with a trained pattern in a Hebbian auto-associative network that satisfies Dale's principle.
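Every weight matrix splits into the symmetric and antisymmetric parts of eqns 8.56-8.58, and a purely antisymmetric network has no fixed point of the sign dynamics (since r·w^a r = 0 for every state), so the convergence indicator of eqn 8.60 never reaches zero. The network size and the Gaussian matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50
w = rng.normal(size=(N, N))
w_s = 0.5 * (w + w.T)                  # symmetric part:     w_s^T =  w_s
w_a = 0.5 * (w - w.T)                  # antisymmetric part: w_a^T = -w_a

r = np.sign(rng.normal(size=N))
d = []                                 # convergence indicator d(t) (eqn 8.60)
for _ in range(100):                   # synchronous sign dynamics on w_a alone
    r_new = np.sign(w_a @ r)
    d.append(float(np.sum((r_new - r) ** 2)))
    r = r_new
# d(t) stays positive: the purely antisymmetric network never settles.
```

Mixing in a sufficiently strong symmetric part restores convergent, attractor-like behaviour, which is what Fig. 8.12 explores.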
8.7.2 Random and Hebbian matrices with asymmetries
Weight components of strength |wij| = 1 chosen as random variables (Fig. 8.12B)
Dale's principle (Fig. 8.12C)
8.7.3 Non-monotonic networks
The Cohen-Grossberg theorem indicates that networks can also behave chaotically when violating its other constraints
Hebbian-trained networks with non-monotonic activation functions
- Point attractors still exist
- Profoundly enhanced storage capacities
- Basins of attraction that seem to be surrounded by chaotic regimes
Non-monotonic activation functions
An effective non-monotonicity
- Appropriate activation of excitatory and inhibitory connections
- Some neurons may have non-monotonic gain functions
Conclusion
Short-term memory and long-term memory
Recurrent neural networks
Point-attractor networks
- Dynamics
- Hebbian learning
Phase diagram
Sparse attractor neural networks
Dynamical systems
- Chaotic networks
- Lyapunov function
Asymmetric networks