8. Auto-associative memory and network dynamics
Lecture Notes on Brain and Computation
Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics
Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: [email protected]
This material is available online at http://bi.snu.ac.kr/
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.
(C) 2009 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
Outline
8.1 Short-term memory and reverberating network activity
8.2 Long-term memory and auto-associators
8.3 Point-attractor networks: the Grossberg-Hopfield model
8.4 The phase diagram and the Grossberg-Hopfield model
8.5 Sparse attractor neural networks
8.6 Chaotic networks: a dynamic systems view
8.7 Biologically more realistic variations of attractor networks
8.1 Short-term memory and reverberating network activity
8.1.1 Short-term memory
Short-term memory
- Ability to hold information temporarily
- Recency effect: the tendency to remember recent items
Physiological level
- Hold the corresponding neural activity over a certain duration
Working memory
- Another type of short-term memory
- Discussed in Chapter 11
8.1.2 Maintenance of neural activity
Monkey experiment: a monkey was trained to keep its eyes on a central fixation spot until a 'go' signal (a tone) was given
The target location for each trial had to be remembered during the delay period
Dorsolateral prefrontal cortex activity
Fig. 8.1: The neurons were sensitive to the particular target direction of 270 deg
Fig. 8.1 Maintenance of delay activity in physiological experiments
8.1.3 Recurrences
Delay activity can be used to store information over some period of time
The question is how such activity can be maintained by a neural network
- Recurrences
1. A recurrent node is able to maintain its firing as long as the recurrent pathway is strong enough;
2. There is some delay in the feedback so that the re-entry signal does not fall within a refractory time of the node; and
3. A possible leakage in the recurrent pathway is small enough so that it is possible to fire the node again.
Fig. 8.2 (A) Schematic illustration of an auto-associative node, distinguished from the associative node of Fig. 7.1A in that it has, in addition, a recurrent feedback connection. (B) An auto-associative network that consists of associative nodes that not only receive external input from other neural layers but, in addition, have many recurrent collateral connections between the nodes in the neural layer.
8.2 Long-term memory and auto-associators
Auto-associative memory
- The output of each node is fed back to all of the other nodes in the network
A recurrent network model
- Tune the recurrent connections
The back-projections in this associative network
- Enhance the pattern completion ability
The recurrent model
- Anatomically faithful
- Collateral connections
- Intercortical back-projections
8.2.1 The hippocampus and episodic memory
The hippocampus has been associated with the form of LTM called episodic memory
- The storage of events
- Area CA3 has well-developed axon collaterals connecting to other CA3 neurons
- A recurrent auto-associative network
The acquisition of episodic long-term memory
Lesion experiment
- Amnesia
- Inability to form new long-term memories
Intermediate-term memory
8.2.2 Learning and retrieval phase
A difficulty occurs when combining associative Hebbian mechanisms with recurrences in the network
- A training phase and a retrieval phase
Switching between the learning and retrieval phases could be accomplished in the brain by:
- Mossy fibres commanding the firing of specific CA3 neurons
- Chemical agents
Acetylcholine (ACh) and noradrenaline
Neuromodulators
8.3 Point-attractor networks: the Grossberg-Hopfield model
Grossberg-Hopfield model
- The understanding and popularization of recurrent auto-associative memory
- Dynamic properties
Attractor states
- States that a dynamic system reaches asymptotically
The brain does not rely entirely on the necessity of settling into an attractor state
- The rapid path towards an attractor state is sufficient to use such networks as associative memory devices
8.3.1 Network dynamics
Considering the time domain
A network of sigma nodes governed by the leaky-integrator dynamics
The change of the internal state of node i
A discrete version
Set Δt = τ = 1
The stationary states, dh_i/dt = 0, of the continuous system (eqn 8.1) are also the fixpoints of the discrete system (eqn 8.3)
The transient response dynamics of the model
Continuous system (leaky integrator):

τ dh_i(t)/dt = −h_i(t) + Σ_j w_ij r_j(t) + I_i^ext(t)    (8.1)

Discrete version:

h_i(t+Δt) = (1 − Δt/τ) h_i(t) + (Δt/τ) [Σ_j w_ij r_j(t) + I_i^ext(t)]    (8.2)

With Δt = τ = 1:

h_i(t+1) = Σ_j w_ij r_j(t) + I_i^ext(t)    (8.3)
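The leaky-integrator updates of eqns 8.1-8.3 can be sketched numerically; the network size, random weights, and tanh gain function below are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                            # number of nodes (assumed)
w = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))    # random weights (assumed)
tau, dt = 10.0, 1.0                              # time constant and step size
I_ext = np.zeros(N)                              # no external input

def step(h):
    """One discrete step of eqn 8.2; with dt = tau it reduces to eqn 8.3."""
    r = np.tanh(h)                               # rates from an assumed gain function
    return (1.0 - dt / tau) * h + (dt / tau) * (w @ r + I_ext)

h = rng.normal(size=N)
for _ in range(1000):
    h = step(h)
# A stationary state of the discrete update is also a fixpoint of eqn 8.1.
```

Because of the leak term, the internal states remain bounded no matter how the weights are drawn.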
8.3.2 Hebbian auto-correlation learning
The weights are not random
- They are specifically self-organized by Hebbian learning
Capacity analysis and recall abilities
A binary system, r_i ∈ {0, 1}
- Transform the variables
Train the system on a set of random patterns ξ_i^μ ∈ {−1, 1}
- The index μ labels the pattern, μ = 1, …, P
- The Hebbian rule
The threshold gain function of the net input h_i:
s_i = 2r_i − 1    (8.4)

w_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ    (8.5)

s_i = sign(h_i)    (8.6)
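The Hebbian rule of eqn 8.5 and the sign dynamics can be sketched as follows; the network size and pattern count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 5                          # nodes and patterns (assumed)
xi = rng.choice([-1, 1], size=(P, N))  # random patterns xi_i^mu in {-1, 1}

# Hebbian auto-correlation rule (eqn 8.5): w_ij = (1/N) sum_mu xi_i^mu xi_j^mu
w = (xi.T @ xi) / N

# One synchronous application of the sign dynamics (eqn 8.6),
# starting from the first stored pattern
s = np.sign(w @ xi[0])
overlap = (s @ xi[0]) / N              # +1 would mean perfect recall
```

At this small load (P/N = 0.025) the cross-talk term rarely flips a node, so the overlap stays close to 1.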
8.3.3 Signal-to-noise analysis
How the network behaves on its own
- Set the external input to zero
- Take s(t) = ξ^1 for demonstration
The first term is the signal; the second term is the cross-talk term
- 'Noise'
s_i(t+1) = sign( Σ_j w_ij s_j(t) )    (8.7)

         = sign( (1/N) Σ_j Σ_μ ξ_i^μ ξ_j^μ s_j(t) )    (8.8)

With s_j(t) = ξ_j^1:

s_i(t+1) = sign( (1/N) Σ_j (ξ_j^1)^2 ξ_i^1 + (1/N) Σ_j Σ_{μ=2}^{P} ξ_i^μ ξ_j^μ ξ_j^1 )    (8.9)

         = sign( ξ_i^1 + (1/N) Σ_j Σ_{μ=2}^{P} ξ_i^μ ξ_j^μ ξ_j^1 )    (8.10)
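The signal-plus-cross-talk decomposition of eqns 8.9-8.10 can be checked empirically: the cross-talk term should have roughly zero mean and variance close to (P−1)/N. The sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 500, 50
xi = rng.choice([-1, 1], size=(P, N))

# Cross-talk at node i: (1/N) sum_j sum_{mu>1} xi_i^mu xi_j^mu xi_j^1
overlaps = xi[1:] @ xi[0] / N          # (1/N) xi^mu . xi^1 for mu > 1
crosstalk = xi[1:].T @ overlaps        # one value per node i

mean, var = crosstalk.mean(), crosstalk.var()
# var should be close to the prediction (P - 1)/N = 0.098
```

The empirical variance fluctuates around the prediction because only N samples (one per node) are available.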
8.3.4 One pattern
Only one imprinted pattern
- No cross-talk term
The dynamics of this network
Start the network not with the trained pattern but with a noisy version
- The network retrieves the learned pattern when initialized with a moderately noisy version of the trained pattern
A point attractor
- The trained pattern
s_i(t+1) = sign( (1/N) Σ_j ξ_i^1 ξ_j^1 s_j(t) )    (8.11)
8.3.5 Many patterns
The cross-talk term is a random variable
- Estimate its variance
Each term of the sum in the cross-talk term is a random number in {−1, 1}
The variance, the error function, and a load parameter:
σ^2 = (P − 1)/N ≈ P/N    (8.12)

P_error = ½ [1 − erf( 1/(√2 σ) )]    (8.13)

α = P/N,   α_bound = 1 / ( 2 [erf^(−1)(1 − 2 P_error)]^2 )    (8.14)

Fig. 8.3 The probability distribution of the cross-talk term is well approximated by a Gaussian with zero mean and variance σ^2. The value of the shaded area marked P_error is the probability that the cross-talk term changes the state of the node. The table lists examples of this probability for different values of the load parameter α.
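Eqns 8.12-8.13 can be evaluated directly; the values of α below are illustrative choices:

```python
import math

def p_error(alpha):
    """Probability that the cross-talk flips a node (eqn 8.13),
    using sigma^2 = alpha from eqn 8.12 in the large-N limit."""
    sigma = math.sqrt(alpha)
    return 0.5 * (1.0 - math.erf(1.0 / (math.sqrt(2.0) * sigma)))

for alpha in (0.05, 0.1, 0.138, 0.2):
    print(f"alpha = {alpha:5.3f}   P_error = {p_error(alpha):.2e}")
```

The error probability grows rapidly with the load parameter, which is the origin of the capacity limit discussed below.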
8.3.6 Point attractors
The pattern completion ability of the associative nodes
The distance between a trained pattern and the network state
- a = ξ^1, b = s
d(a, b) = ½ (1 − a·b / (||a|| ||b||))    (8.15)

Fig. 8.4 Simulation results for an auto-associative network with quasicontinuous leaky-integrator dynamics, N = 1000 nodes, and a time constant of τ = 10 ms. The time step was set to Δt = 0.01 ms. (A) Distance between a trained pattern and the state of the network after the network was initialized with a noisy version of one of the 100 trained patterns. (B) Dependence of the distance at t = 1 ms on the initial distance at time t = 0 ms.
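A pattern-completion sketch using the distance measure of eqn 8.15; the network size, pattern count, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 500, 10
xi = rng.choice([-1, 1], size=(P, N))
w = (xi.T @ xi) / N                        # Hebbian weights (eqn 8.5)

def distance(a, b):
    """Normalized distance (eqn 8.15): 0 for identical patterns."""
    return 0.5 * (1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s = xi[0].astype(float).copy()
flipped = rng.choice(N, size=50, replace=False)
s[flipped] *= -1                           # 10% of components reversed
d_start = distance(xi[0], s)

for _ in range(10):                        # iterate the sign dynamics
    s = np.sign(w @ s)
d_end = distance(xi[0], s)
# The state falls into the point attractor near the trained pattern.
```

At this low load (α = 0.02) the noisy start lies well inside the basin of attraction, so the distance shrinks towards zero.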
8.4 The phase diagram and the Grossberg-Hopfield model
8.4.1 The load capacity αc
Sparse connectivity
- The number of connections per node, C
The load capacity αc
α = N^pat / C    (8.16)

Fig. 8.5 Simulation results for an auto-associative network equivalent to the network used in Fig. 8.4 but with different numbers of patterns in the training set, N^pat. The distance of the network state from the first training pattern at time t = 1 ms is shown. The network was initialized with a version of the first trained pattern with 1% of its components reversed.
8.4.2 The spin model analogy
Spin models (developed in statistical physics)
Central to the correspondence of spin models and the recurrent models
- The binary state
Thermal noise, T
A sharp transition between
- a paramagnetic phase: no dominant direction of the elementary magnets, and
- a ferromagnetic phase: a dominating direction of the elementary magnets
Phase transition
8.4.3 Frustration and spin glasses
The situation in auto-associative networks is further complicated
- The force between the nodes is not consistently positive
The Hebbian rule
- Positive and negative weights
- Complicated spin states of the system
- Frustrated systems or spin glasses
Mean field theory
Replica method
8.4.4 The phase diagram
The phase diagram of the Grossberg-Hopfield model
Noise in the network
A probabilistic version of the update rule:

r_i(t) = sign(h_i(t))    (8.17)

P(r_i(t) = 1) = 1 / (1 + exp(−2 h_i(t)/T))    (8.18)
Fig. 8.6 Phase diagram of the attractor network trained on binary patterns with Hebbian imprinting. The abscissa represents the values of the load parameter α = N^pat/C, where N^pat is the number of trained patterns and C is the number of connections per node. The ordinate represents the amount of noise in the system. The shaded region is where point attractors proportional to the trained patterns exist. The network in this region can therefore function as associative memory.
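The probabilistic update of eqn 8.18 can be sketched directly; the input values h and the two temperatures are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def stochastic_update(h, T):
    """Eqn 8.18: P(r_i = +1) = 1 / (1 + exp(-2 h_i / T))."""
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * h / T))
    return np.where(rng.random(h.shape) < p_plus, 1.0, -1.0)

h = np.array([2.0, -2.0])
cold = np.mean([stochastic_update(h, 0.1) for _ in range(2000)], axis=0)
hot = np.mean([stochastic_update(h, 50.0) for _ in range(2000)], axis=0)
# At low T the update follows sign(h) (eqn 8.17); at high T it is nearly random.
```

The temperature T thus interpolates between the deterministic dynamics and a coin flip, which is the vertical axis of the phase diagram in Fig. 8.6.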
8.4.5 Spurious states
Noise can help the memory performance
Consider the network in a state that has the sign of the majority of the first three patterns; the state of a node after one update is given below
If the components ξ_i^1, ξ_i^2, ξ_i^3 all have the same value, which happens with probability ¼, then we can pull this value out of the sum in the signal term
If ξ_i^3 has a different sign, the signal is weaker
On average the signal has 3/4 of the strength of the signal when updating a trained pattern
s_i^mix = sign(ξ_i^1 + ξ_i^2 + ξ_i^3)    (8.19)

s_i(t+1) = sign( (1/N) Σ_j Σ_μ ξ_i^μ ξ_j^μ sign(ξ_j^1 + ξ_j^2 + ξ_j^3) )    (8.20)

If ξ_i^1 = ξ_i^2 = ξ_i^3, the signal term becomes

ξ_i^1 (1/N) Σ_j (ξ_j^1 + ξ_j^2 + ξ_j^3) sign(ξ_j^1 + ξ_j^2 + ξ_j^3) = ξ_i^1 (1/N) Σ_j |ξ_j^1 + ξ_j^2 + ξ_j^3| ≈ (3/2) ξ_i^1    (8.21)

If ξ_i^3 has a different sign, the signal term becomes

ξ_i^1 (1/N) Σ_j (ξ_j^1 + ξ_j^2 − ξ_j^3) sign(ξ_j^1 + ξ_j^2 + ξ_j^3) ≈ (1/2) ξ_i^1    (8.22)

Averaging over both cases gives a signal of strength (1/4)(3/2) + (3/4)(1/2) = 6/8 = 3/4 times the signal when updating a trained pattern.

ξ^1  ξ^2  ξ^3  ξ^1+ξ^2+ξ^3
 1    1    1    3
 1    1   −1    1
 1   −1    1    1
 1   −1   −1   −1
−1    1    1    1
−1    1   −1   −1
−1   −1    1   −1
−1   −1   −1   −3
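The 3/4 average signal strength can be confirmed by enumerating the eight equally likely combinations of (ξ_i^1, ξ_i^2, ξ_i^3), exactly as in the table:

```python
from fractions import Fraction
from itertools import product

combos = list(product((-1, 1), repeat=3))   # the eight rows of the table

# E[xi^1 sign(xi^1 + xi^2 + xi^3)] = 1/2 for random binary components
half = Fraction(sum(c[0] * (1 if sum(c) > 0 else -1) for c in combos),
                len(combos))

# The mixture-state signal at a node is (1/2)|xi^1 + xi^2 + xi^3|,
# which averages to (1/4)(3/2) + (3/4)(1/2) = 3/4
avg_signal = sum(Fraction(abs(sum(c)), 2) for c in combos) / len(combos)
print(half, avg_signal)   # -> 1/2 3/4
```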
8.4.6 The advantage of noise
Spurious states
- Are attractors under the network dynamics
The average strength of the signal for the spurious states is less than the signal for a trained pattern
- The spurious states under normal conditions are less stable than attractors related to trained patterns
With an appropriate level of noise
- Kick the system out of the basin of attraction of some spurious states and into the basin of attraction of another attractor
Noise can help to destabilize undesired memory states
The average behavior of the network
- A good assumption for large networks
The behavior of a particular network depends strongly on the specific realization of the training patterns
- The phase diagram is specific to the choice of training patterns
8.5 Sparse attractor neural networks
The load capacity of the noiseless Grossberg-Hopfield model with standard Hebbian learning is about 0.138
- Provided the training patterns are uncorrelated
Sensory signals are often correlated
- For example, a fish image and a water image
- The cross-talk term can then yield high values
- Solution: modify the training patterns to yield orthogonal patterns, giving αc = 1
ξ'_i^μ = 1 if i = μ, 0 elsewhere    (8.23)

α_c = 1    (8.24)
8.5.1 Expansion coding
Orthogonalization
- Expansion coding
Fig. 8.7 Example of expansion coding that can orthogonalize a pattern representation with a single-layer perceptron. The nodes in the perceptron are threshold units, and we have included a bias with a separate node with constant input. The orthogonal output can be fed into a recurrent attractor network where all these inputs are fixpoints of the attractor dynamics.
8.5.2 Sparse patterns
Expansion coding
- The load capacities of attractor networks can be larger for patterns with sparse representations
The sparseness
The storage capacity of attractor networks
- k is a constant (roughly of the order of 0.2-0.3)
- With sparseness a = 0.1 and 10,000 synapses
- The number of patterns that can be stored exceeds 20,000
a = ⟨r⟩^2 / ⟨r^2⟩    (8.25)

α_c = k / (a ln(1/a))    (8.26)
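Eqn 8.26 can be evaluated for a few sparseness values; the constant k = 0.25 is an assumed value within the 0.2-0.3 range quoted above:

```python
import math

def alpha_c(a, k=0.25):
    """Storage capacity per connection for sparse patterns (eqn 8.26)."""
    return k / (a * math.log(1.0 / a))

C = 10_000                             # connections per node, as in the text
for a in (0.5, 0.1, 0.01):
    print(f"a = {a:4.2f}   alpha_c = {alpha_c(a):6.2f}   "
          f"patterns ~ {alpha_c(a) * C:8.0f}")
# Sparser patterns (smaller a) allow many more patterns to be stored.
```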
8.5.3 Alternative learning schemes
The cross-talk term
- Can be minimized with different choices of weight matrices
- The overlap matrix
- The pseudo-inverse method
The inverse orthogonalizes the input patterns, and the storage capacity is αc = 1
- Not very plausible biologically
Q_{μν} = (1/N) Σ_i ξ_i^μ ξ_i^ν    (8.27)

w_ij = (1/N) Σ_{μν} ξ_i^μ (Q^{−1})_{μν} ξ_j^ν    (8.28)
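A sketch of the pseudo-inverse rule of eqns 8.27-8.28, showing that even a correlated pattern becomes a fixed point of the sign dynamics. The sizes and the way the correlated pattern is built are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 100, 20
xi = rng.choice([-1, 1], size=(P, N)).astype(float)
xi[1] = np.sign(xi[0] + rng.normal(size=N))   # a pattern correlated with xi[0]

Q = (xi @ xi.T) / N                    # overlap matrix (eqn 8.27)
w = xi.T @ np.linalg.inv(Q) @ xi / N   # pseudo-inverse weights (eqn 8.28)

# Every stored pattern is a fixed point: sign(w xi^mu) = xi^mu
recalled = np.sign(xi @ w)
```

In matrix form w X^T = X^T Q^{-1} Q = X^T, so each training pattern is reproduced exactly regardless of correlations, at the price of the biologically implausible matrix inversion noted above.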
8.5.4 The best storage capacity with sparse patterns
There are many possible learning algorithms
- Each learning rule can lead to a different weight matrix
For a given training set we can ask what the load capacity of the network is
- Try out all the possible weight matrices
- A daunting task
The maximal storage capacity of an auto-associative network with binary patterns:
The simplest Hebbian rule comes close to giving the maximum value
α_c = 1 / (a ln(1/a))    (8.29)
8.5.5 Control of sparseness in attractor networks
Training recurrent networks on patterns with sparseness a with the basic Hebbian covariance rule
- Is not enough to ensure that the retrieved state also has sparseness a_ret = a
Mean and variance of the weight distribution
A weight matrix with Gaussian-distributed components means that the amount of inhibition is equal to the amount of excitation
Hebbian covariance rule:

w_ij = k (r_i − a)(r_j − a)    (8.30)

r_i  r_j  δw          P(w)
0    0    a^2         (1−a)^2
0    1    −a(1−a)     a(1−a)
1    0    −a(1−a)     a(1−a)
1    1    (1−a)^2     a^2

Mean of the weight distribution:

⟨w⟩ = Σ_w P(w) w = a^2(1−a)^2 − 2a^2(1−a)^2 + a^2(1−a)^2 = 0    (8.31)

Variance of the weight distribution:

⟨w^2⟩ = Σ_w P(w) w^2 = a^2(1−a)^2,  so after N^pat patterns  σ_w^2 = N^pat a^2(1−a)^2    (8.32)

Inhibitory and excitatory parts of the weight distribution:

W_in = ∫_{−∞}^{0} P(w) w dw    (8.33)

W_ex = ∫_{0}^{∞} P(w) w dw    (8.34)

Eqns (8.35)-(8.38) relate the retrieval sparseness a_ret to the balance of W_ex and W_in.
8.5.6 Global inhibition (1)
Activity-dependent global inhibition shifts all weights by a constant c:

w_ij → w_ij − c    (8.39)

For Gaussian-distributed weight values centred at −c with standard deviation σ_w, the fraction of positive (excitatory) weight values, and with it the retrieval sparseness, is given by the Gaussian error function:

a_ret = ½ [1 − erf( c / (√2 σ_w) )]    (8.40)

Eqns (8.41)-(8.42) evaluate this expression for the pattern statistics underlying Fig. 8.8.

Fig. 8.8 (A) A Gaussian function centred at a value −c. Such a curve describes the distribution of Hebbian weight values trained on random patterns and includes some global inhibition with strength value c. The shaded area is given by the Gaussian error function described in Appendix B. (B) Theoretical retrieval sparseness a_ret as a function of global inhibition c, plotted as thin lines for different values of the sparseness of the pattern set a, from a = 0.05 (lower curve) to a = 0.5 (upper curve) in steps of Δa = 0.05. We assumed therein 40 imprinted patterns with Gaussian-distributed components of the weight matrix. The thick line shows where the retrieval sparseness matches the sparseness of the imprinted pattern (a_ret = a).
8.5.6 Global inhibition (2)
Fig. 8.9 (A) The simulated curve for patterns with sparseness a = 0.1. The plateau is due to the attractor dynamics, which were not taken into account in the analysis that led to Fig. 8.8B. The lower curve indicates the average Hamming distance between the imprinted pattern and the network state after updating the network. Correct recalls were indeed achieved for inhibition constants that coincide with the plateau in the retrieval sparseness. (B) Normalized histogram of weight components for 40 patterns trained with the Hebbian covariance rule. (C) Normalized histogram of weight components for 400 patterns trained with the Hebbian covariance rule.
8.6 Chaotic networks: a dynamic systems view
The theory of dynamic systems
- Auto-associative memories correspond to 'point attractors'
Equations of motion
The dynamics of a recurrent network with continuous dynamics
Dimensionality
- The number of equations
- The number of nodes in the network
The vector x is the state vector
State space
Trajectory
dx/dt = f(x)    (8.43)
8.6.1 Attractors
Lorenz system
A recurrent network of three nodes

dx1/dt = a (x2 − x1)    (8.44)

dx2/dt = b x1 − x2 − x1 x3    (8.45)

dx3/dt = x1 x2 − c x3    (8.46)

This can be written as a recurrent network with multiplicative interactions,

dx_i/dt = Σ_j w_ij x_j + Σ_{jk} w_ijk x_j x_k    (8.47)

with weights

w = ( −a  a  0 ;  b  −1  0 ;  0  0  −c )  and  w_ijk = { −1 for (i,j,k) = (2,1,3);  1 for (i,j,k) = (3,1,2);  0 otherwise }    (8.48)

Fig. 8.10 Example of a trajectory of the Lorenz system from a numerical integration within the time interval 0 ≤ t ≤ 100. The parameters used were a = 10, b = 28, and c = 8/3.
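A numerical integration of eqns 8.44-8.46 with the parameters of Fig. 8.10; the Runge-Kutta step size and the initial state are assumptions:

```python
import numpy as np

a, b, c = 10.0, 28.0, 8.0 / 3.0        # parameters of Fig. 8.10

def f(x):
    """Right-hand side of the Lorenz system (eqns 8.44-8.46)."""
    x1, x2, x3 = x
    return np.array([a * (x2 - x1),
                     b * x1 - x2 - x1 * x3,
                     x1 * x2 - c * x3])

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, 1.0, 1.0])          # assumed initial state
traj = []
for _ in range(10_000):                # 0 <= t <= 100 with dt = 0.01
    x = rk4_step(x, 0.01)
    traj.append(x)
traj = np.array(traj)
# The trajectory stays on the bounded 'butterfly' attractor
# without settling into a fixed point.
```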
8.6.2 Lyapunov functions (1)
A system has a point attractor if a Lyapunov function (energy function) V(x) exists
'Landscape'
If there is a function V(x) that never increases under the dynamics of the system,

dV(x)/dt ≤ 0    (8.49)

Fig. 8.11 A ball in an 'energy' landscape.
8.6.2 Lyapunov functions (2)
Lyapunov function for recurrent networks: the dynamics

h_i(t+1) = Σ_j w_ij r_j(t) + I_i^ext(t)    (8.50)

have the Lyapunov function

V(r_1, …, r_n) = −½ Σ_ij w_ij r_i r_j − Σ_i I_i^ext r_i    (8.51)

The change of V in one time step is

ΔV = V(t+1) − V(t) = −½ Σ_kj w_kj r_k(t+1) r_j(t+1) + ½ Σ_kj w_kj r_k(t) r_j(t) − Σ_k I_k^ext [r_k(t+1) − r_k(t)]    (8.52)

For an asynchronous update of a single node i (so that r_j(t+1) = r_j(t) for j ≠ i), and using the symmetry w_ij = w_ji,

ΔV = −[r_i(t+1) − r_i(t)] ( Σ_j w_ij r_j(t) + I_i^ext )    (8.53)

ΔV = −[r_i(t+1) − r_i(t)] h_i(t+1) ≤ 0    (8.54)
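The monotonic decrease of eqn 8.54 can be checked numerically: under asynchronous updates with a symmetric weight matrix (zero diagonal), the energy of eqn 8.51 never increases. The network size and patterns are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
xi = rng.choice([-1, 1], size=(5, N))
w = (xi.T @ xi) / N                    # symmetric Hebbian weights
np.fill_diagonal(w, 0.0)
I_ext = np.zeros(N)

def energy(r):
    """Lyapunov function of eqn 8.51."""
    return -0.5 * r @ w @ r - I_ext @ r

r = rng.choice([-1.0, 1.0], size=N)
energies = [energy(r)]
for _ in range(5):                     # five asynchronous sweeps
    for i in rng.permutation(N):
        h = w[i] @ r + I_ext[i]
        if h != 0.0:
            r[i] = np.sign(h)
        energies.append(energy(r))

# Delta V = -(r_i(t+1) - r_i(t)) h_i <= 0 for every single-node update
max_increase = max(np.diff(energies))
```

Because the energy is bounded below and decreases by a finite amount with every flip, the asynchronous dynamics must settle into a point attractor.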
8.6.3 The Cohen-Grossberg theorem
General systems with continuous dynamics have a Lyapunov function under the conditions:
- Positivity a_i ≥ 0: the dynamics must be a leaky integrator rather than an amplifying integrator
- Symmetry w_ij = w_ji: the influence of one node on another has to be the same as the reverse influence
- Monotonicity sign(dg(x)/dx) = const: the activation function has to be a monotonic function
dx_i/dt = a_i(x_i) [ b_i(x_i) − Σ_{k=1}^{N} w_ik f_k(x_k) ]    (8.55)
8.7 Biologically more realistic variations of attractor networks
The synaptic weights between neurons in the nervous system
- Cannot be expected to fulfil the symmetry condition on the weight matrix required to guarantee stable attractors
Neurons receive a mixture of input from excitatory and inhibitory presynaptic neurons
Dale's principle
The Cohen-Grossberg theorem
- Would require that an inhibitory node only receives inhibitory connections and vice versa
8.7.1 Asymmetric networks
A simple case of non-symmetric weight matrices
- A symmetric and an anti-symmetric part
The difference between two consecutive time steps as a convergence indicator
w = g_s w^s + g_a w^a    (8.56)

w^s_ij = w^s_ji    (8.57)

w^a_ij = −w^a_ji    (8.58)

w_ij = { g_s + g_a for i > j;  0 for i = j;  g_s − g_a for i < j }    (8.59)

d(t) = |r(t) − r(t−1)|^2    (8.60)

Fig. 8.12 (A) Convergence indicator for networks with an asymmetric weight matrix where the individual components of the matrix are chosen with unit strength. (B) Similar to (A) except that the individual components of the weight matrix are chosen from a Gaussian distribution. (C) Overlap of the network state with a trained pattern in a Hebbian auto-associative network that satisfies Dale's principle.
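Every weight matrix splits into the symmetric and antisymmetric parts of eqns 8.56-8.58, and a purely antisymmetric network has no fixed point of the sign dynamics (since r·w^a r = 0 for every state), so the convergence indicator of eqn 8.60 never reaches zero. The network size and the Gaussian matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50
w = rng.normal(size=(N, N))
w_s = 0.5 * (w + w.T)                  # symmetric part:     w_s^T =  w_s
w_a = 0.5 * (w - w.T)                  # antisymmetric part: w_a^T = -w_a

r = np.sign(rng.normal(size=N))
d = []                                 # convergence indicator d(t) (eqn 8.60)
for _ in range(100):                   # synchronous sign dynamics on w_a alone
    r_new = np.sign(w_a @ r)
    d.append(float(np.sum((r_new - r) ** 2)))
    r = r_new
# d(t) stays positive: the purely antisymmetric network never settles.
```

Mixing in a sufficiently strong symmetric part restores convergent, attractor-like behaviour, which is what Fig. 8.12 explores.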
8.7.2 Random and Hebbian matrices with asymmetries
Weight components of strength |wij| = 1 chosen as random variables (Fig. 8.12B)
Dale's principle (Fig. 8.12C)
8.7.3 Non-monotonic networks
The Cohen-Grossberg theorem indicates that networks can also behave chaotically when violating its other constraints
Hebbian-trained networks with non-monotonic activation functions
- Point attractors still exist
- Profoundly enhanced storage capacities
- Basins of attraction that seem to be surrounded by chaotic regimes
Non-monotonic activation functions
An effective non-monotonicity
- Appropriate activation of excitatory and inhibitory connections
- Some neurons may have non-monotonic gain functions
Conclusion
Short-term memory and long-term memory
Recurrent neural networks
Point-attractor networks
- Dynamics
- Hebbian learning
Phase diagram
Sparse attractor neural networks
Dynamical systems
- Chaotic networks
- Lyapunov function
Asymmetric networks