1. Neuronal Dynamical Systems
We describe the neuronal dynamical systems by first-order differential or difference equations that govern the time evolution of the neuronal activations or membrane potentials:
$$\dot{x} = g(F_X, F_Y, \ldots), \qquad \dot{y} = h(F_X, F_Y, \ldots)$$
Review
4. Additive activation models

$$\dot{x}_i = -A_i x_i + \sum_{j=1}^{p} S_j(y_j)\, m_{ji} + I_i$$
$$\dot{y}_j = -A_j y_j + \sum_{i=1}^{n} S_i(x_i)\, m_{ij} + J_j$$
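A minimal simulation sketch (not from the original notes): forward-Euler integration of these two equations, with an illustrative logistic signal function, unit decay rates, and step size.

```python
import numpy as np

def simulate_additive_bam(M, I, J, steps=2000, dt=0.01, A=1.0):
    """Forward-Euler integration of the additive activation model:
    dx_i/dt = -A x_i + sum_j S(y_j) m_ji + I_i, and dually for y_j.
    The logistic S, decay A, and step dt are illustrative choices."""
    S = lambda v: 1.0 / (1.0 + np.exp(-v))   # bounded, strictly increasing signal
    x, y = np.zeros(M.shape[0]), np.zeros(M.shape[1])
    for _ in range(steps):
        dx = -A * x + M @ S(y) + I           # field F_Y feeds F_X through M
        dy = -A * y + M.T @ S(x) + J         # field F_X feeds F_Y through M^T
        x, y = x + dt * dx, y + dt * dy
    return x, y
```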
Hopfield circuit:
1. Additive autoassociative model;
2. Strictly increasing bounded signal function ($S' > 0$);
3. Symmetric synaptic connection matrix ($M = M^T$).

$$C_i \dot{x}_i = -\frac{x_i}{R_i} + \sum_j S_j(x_j)\, m_{ji} + I_i$$
Review
5. Additive bivalent models

$$x_i^{k+1} = \sum_{j=1}^{p} S_j(y_j^{k})\, m_{ji} + I_i$$
$$y_j^{k+1} = \sum_{i=1}^{n} S_i(x_i^{k})\, m_{ij} + J_j$$
Lyapunov Functions
If we cannot find a Lyapunov function, nothing follows;
if we can find a Lyapunov function, stability holds.
Review
A dynamical system is
stable, if $\dot{L} \le 0$;
asymptotically stable, if $\dot{L} < 0$.
Monotonicity of a Lyapunov function is a sufficient but not necessary condition for stability and asymptotic stability.
Review
Bivalent BAM theorem.
Every matrix is bidirectionally stable for synchronous or asynchronous state changes.
• Synchronous: update an entire field of neurons at a time.
• Simple asynchronous: only one neuron makes a state-change decision.
• Subset asynchronous: one subset of neurons per field makes state-change decisions at a time.
Review
Chapter 3. Neural Dynamics II:Activation Models
The most popular method for constructing M: the bipolar Hebbian or outer-product learning method.
Binary vector associations: $(A_i, B_i)$
Bipolar vector associations: $(X_i, Y_i)$, $i = 1, 2, \ldots, m$
$$A_i = \frac{1}{2}[X_i + 1], \qquad X_i = 2A_i - 1$$
Chapter 3. Neural Dynamics II:Activation Models
The bipolar outer-product law:
$$M = \sum_{k=1}^{m} X_k^T Y_k$$
The binary outer-product law:
$$M = \sum_{k=1}^{m} A_k^T B_k$$
The Boolean outer-product law:
$$M = \max_{k} A_k^T B_k, \qquad m_{ij} = \max(a_i^1 b_j^1, \ldots, a_i^m b_j^m)$$
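The three laws are easy to compare numerically. A short sketch, using as illustrative data the two binary associations that appear in the BAM encoding example later in this chapter:

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 1, 0],
              [1, 1, 1, 0, 0, 0]])   # binary A_k (rows)
B = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0]])         # binary B_k (rows)
X, Y = 2 * A - 1, 2 * B - 1          # bipolar versions: X_k = 2A_k - 1

M_bipolar = sum(np.outer(X[k], Y[k]) for k in range(len(X)))   # sum X_k^T Y_k
M_binary  = sum(np.outer(A[k], B[k]) for k in range(len(A)))   # sum A_k^T B_k
M_boolean = np.max([np.outer(A[k], B[k]) for k in range(len(A))], axis=0)
```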
Chapter 3. Neural Dynamics II:Activation Models
The weighted outer-product law:
$$M = \sum_{k=1}^{m} w_k X_k^T Y_k$$
where $\sum_{k=1}^{m} w_k = 1$ holds.
In matrix notation:
$$M = X^T W Y$$
where $W = \mathrm{Diagonal}[w_1, \ldots, w_m]$, $X^T = [X_1^T \mid \cdots \mid X_m^T]$, and $Y^T = [Y_1^T \mid \cdots \mid Y_m^T]$.
Chapter 3. Neural Dynamics II:Activation Models
One can model the inherent exponential fading of unsupervised learning laws by choosing the coefficients of the matrix W. For example, an exponential fading memory, constrained by $w_1 \le \cdots \le w_m$, results if
$$w_k = c^{m-k}, \qquad 0 < c < 1$$
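A sketch of the weighted law with exponentially fading weights; the fade rate c = 0.9 is an illustrative value:

```python
import numpy as np

def weighted_outer_product(X, Y, c=0.9):
    """M = X^T W Y with fading weights w_k = c^(m-k), 0 < c < 1.
    Recent associations (large k) get weight near 1; old ones fade."""
    m = X.shape[0]
    w = c ** (m - np.arange(1, m + 1))   # w_k = c^(m-k), k = 1..m
    return X.T @ np.diag(w) @ Y
```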
Chapter 3. Neural Dynamics II:Activation Models
1. Unweighted encoding skews memory-capacity analyses.
2. The neural-network literature has largely overlooked the weighted outer-product laws.
Chapter 3. Neural Dynamics II:Activation Models
※ Optimal Linear Associative Memory Matrices
Optimal linear associative memory matrices:
The pseudo-inverse matrix $X^*$ of $X$:
If x is a nonzero scalar: $x^* = 1/x$.
If x is a zero scalar or zero vector: $x^* = 0$.
If x is a nonzero vector: $x^* = x^T / (x\,x^T)$.
For a rectangular matrix X, if $(X X^T)^{-1}$ exists:
$$X^* = X^T (X X^T)^{-1}$$
Chapter 3. Neural Dynamics II:Activation Models
※ Optimal Linear Associative Memory Matrices
Define the matrix Euclidean norm of M as
$$\|M\| = \sqrt{\mathrm{Trace}(M M^T)}$$
Minimize the mean-squared error of forward recall to find the $\hat{M}$ that satisfies the relation
$$\|Y - X\hat{M}\| \le \|Y - XM\| \quad \text{for all } M$$
Chapter 3. Neural Dynamics II:Activation Models
Suppose further that the inverse matrix $X^{-1}$ exists. Then
$$\|Y - X X^{-1} Y\| = \|Y - Y\| = \|0\| = 0$$
So the OLAM matrix $\hat{M}$ corresponds to
$$\hat{M} = X^{-1} Y$$
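In NumPy, `np.linalg.pinv` computes the pseudo-inverse X*, so the general OLAM matrix M̂ = X*Y (which reduces to X⁻¹Y when X is square and invertible) is a one-line sketch; the stored vectors below are illustrative:

```python
import numpy as np

X = np.array([[1., -1.,  1., -1.],
              [1.,  1., -1., -1.]])   # stored input vectors (rows)
Y = np.array([[1.,  1.],
              [1., -1.]])             # associated output vectors (rows)

M_hat = np.linalg.pinv(X) @ Y         # OLAM: M-hat = X* Y
print(np.allclose(X @ M_hat, Y))      # exact forward recall here -> True
```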
※ Optimal Linear Associative Memory Matrices
Chapter 3. Neural Dynamics II:Activation Models
If the set of vectors $\{X_1, \ldots, X_m\}$ is orthonormal:
$$X_i X_j^T = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$
then the OLAM matrix reduces to the classical linear associative memory (LAM):
$$\hat{M} = X^T Y$$
For orthonormal X, the pseudo-inverse of X is $X^T$.
Chapter 3. Neural Dynamics II:Activation Models
※ Autoassociative OLAM Filtering
Autoassociative OLAM systems behave as linear filters. In the autoassociative case the OLAM matrix encodes only the known signal vectors $x_i$. Then the OLAM matrix equation (3-78) reduces to
$$M = X^* X$$
M linearly "filters" input measurement x to the output vector $\hat{x}$ by vector-matrix multiplication: $\hat{x} = xM$.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
The OLAM matrix $M = X^*X$ behaves as a projection operator. Algebraically, this means the matrix M is idempotent: $M^2 = M$.
Since matrix multiplication is associative, the pseudo-inverse property (3-80), $X X^* X = X$, implies idempotency of the autoassociative OLAM matrix M:
$$M^2 = (X^*X)(X^*X) = X^*(X X^* X) = X^*X = M$$
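A quick numerical check of idempotency on illustrative data:

```python
import numpy as np

X = np.array([[1., -1.,  1., -1.],
              [1.,  1., -1., -1.]])   # known signal vectors (rows)
M = np.linalg.pinv(X) @ X             # autoassociative OLAM: M = X* X
print(np.allclose(M @ M, M))          # idempotent, M^2 = M -> True
```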
Chapter 3. Neural Dynamics II:Activation Models
※ Autoassociative OLAM Filtering
Then (3-80) also implies that the additive dual matrix $I - X^*X$ behaves as a projection operator:
$$(I - X^*X)(I - X^*X) = I - 2X^*X + (X^*X)(X^*X) = I - 2X^*X + X^*X = I - X^*X$$
We can represent a projection matrix M as the mapping $M: R^n \to L$.
Chapter 3. Neural Dynamics II:Activation Models
※ Autoassociative OLAM Filtering
The Pythagorean theorem underlies projection operators.
The known signal vectors $X_1, \ldots, X_m$ span some unique linear subspace $L = L(X_1, \ldots, X_m)$ of $R^n$.
L equals $\{\sum_i c_i X_i : c_i \in R \text{ for all } i\}$, the set of all linear combinations of the m known signal vectors.
$L^{\perp}$ denotes the orthogonal complement space:
$$L^{\perp} = \{x \in R^n : x\,y^T = 0 \text{ for all } y \in L\}$$
the set of all real n-vectors x orthogonal to every n-vector y in L.
Chapter 3. Neural Dynamics II:Activation Models
※ Autoassociative OLAM Filtering
1. The operator $X^*X$ projects $R^n$ onto L.
2. The dual operator $I - X^*X$ projects $R^n$ onto $L^{\perp}$.
Projection operators $X^*X$ and $I - X^*X$ uniquely decompose every vector x in $R^n$ into a summed signal vector $\hat{x}$ and a noise or novelty vector $\tilde{x}$:
$$x = x\,X^*X + x\,(I - X^*X) = \hat{x} + \tilde{x}$$
Chapter 3. Neural Dynamics II:Activation Models
※ Autoassociative OLAM Filtering
The unique additive decomposition $x = \hat{x} + \tilde{x}$ obeys a generalized Pythagorean theorem:
$$\|x\|^2 = \|\hat{x}\|^2 + \|\tilde{x}\|^2$$
where $\|x\|^2 = x_1^2 + \cdots + x_n^2$ defines the squared Euclidean or $l^2$ norm.
Kohonen [1988] calls $I - X^*X$ the novelty filter on $R^n$.
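A sketch verifying the decomposition and the Pythagorean identity on an arbitrary measurement vector (all data illustrative):

```python
import numpy as np

X = np.array([[1., -1.,  1., -1.],
              [1.,  1., -1., -1.]])           # stored signal vectors (rows)
P = np.linalg.pinv(X) @ X                     # projects onto L
N = np.eye(X.shape[1]) - P                    # novelty filter: projects onto L-perp

x = np.array([2., 0., 1., -1.])               # arbitrary measurement
x_hat, x_tilde = x @ P, x @ N                 # signal and novelty parts
print(np.allclose(x, x_hat + x_tilde))        # x = x-hat + x-tilde -> True
print(np.isclose(x @ x, x_hat @ x_hat + x_tilde @ x_tilde))  # Pythagoras -> True
```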
Chapter 3. Neural Dynamics II:Activation Models
※ Autoassociative OLAM Filtering
Projection $\hat{x}$ measures what we know about input x relative to stored signal vectors $X_1, \ldots, X_m$:
$$\hat{x} = \sum_{i=1}^{m} c_i X_i$$
for some constant vector $(c_1, \ldots, c_m)$.
The novelty vector $\tilde{x}$ measures what is maximally unknown or novel in the measured input signal x.
Chapter 3. Neural Dynamics II:Activation Models
※Autoassociative OLAM Filtering
Suppose we model a random measurement vector x as a random signal vector $x_s$ corrupted by an additive, independent random-noise vector $x_N$:
$$x = x_s + x_N$$
We can estimate the unknown signal $x_s$ as the OLAM-filtered output $\hat{x}$:
$$\hat{x} = x\,X^*X$$
Chapter 3. Neural Dynamics II:Activation Models
※Autoassociative OLAM Filtering
Kohonen [1988] has shown that if the multivariable noise distribution is radially symmetric, such as a multivariable Gaussian distribution, then the OLAM capacity m and pattern dimension n scale the variance of the random-variable estimator-error norm $\|\hat{x} - x_s\|$:
$$V[\|\hat{x} - x_s\|] = \frac{m}{n}\,\|x - x_s\|^2 = \frac{m}{n}\,\|x_N\|^2$$
Chapter 3. Neural Dynamics II:Activation Models
※Autoassociative OLAM Filtering
1. The autoassociative OLAM filter suppresses noise if m < n, when memory capacity does not exceed signal dimension.
2. The OLAM filter amplifies noise if m > n, when capacity exceeds dimension.
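A minimal Monte Carlo sketch of this m/n scaling (sizes and noise level illustrative): store random bipolar signals, add radially symmetric Gaussian noise, and watch the mean squared OLAM estimation error grow with m/n.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                            # pattern dimension
for m in (8, 32):                                 # memory capacities, m < n
    X = np.where(rng.random((m, n)) < 0.5, -1.0, 1.0)   # random bipolar signals
    P = np.linalg.pinv(X) @ X                     # autoassociative OLAM filter
    x_s = X[0]                                    # a stored (known) signal
    errs = [np.sum(((x_s + rng.normal(0, 1, n)) @ P - x_s) ** 2)
            for _ in range(200)]
    print(m, np.mean(errs) / n)                   # roughly m/n: 0.125 vs 0.5
```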
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
The above data-dependent encoding schemes add outer-product correlation matrices.
The following example illustrates a complete nonlinear feedback neural network in action, with data deliberately encoded into the system dynamics.
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
Suppose the data consists of two unweighted ($w_1 = w_2 = 1$) binary associations $(A_1, B_1)$ and $(A_2, B_2)$ defined by the nonorthogonal binary signal vectors:
$$A_1 = (1, 0, 1, 0, 1, 0), \qquad B_1 = (1, 1, 0, 0)$$
$$A_2 = (1, 1, 1, 0, 0, 0), \qquad B_2 = (1, 0, 1, 0)$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
These binary associations correspond to the two bipolar associations $(X_1, Y_1)$ and $(X_2, Y_2)$ defined by the bipolar signal vectors:
$$X_1 = (1, -1, 1, -1, 1, -1), \qquad Y_1 = (1, 1, -1, -1)$$
$$X_2 = (1, 1, 1, -1, -1, -1), \qquad Y_2 = (1, -1, 1, -1)$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
We compute the BAM memory matrix M by adding the bipolar correlation matrices $X_1^T Y_1$ and $X_2^T Y_2$ pointwise. The first correlation matrix equals
$$X_1^T Y_1 = \begin{bmatrix} 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \end{bmatrix}$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
Observe that the i-th row of the correlation matrix $X_1^T Y_1$ equals the bipolar vector $Y_1$ multiplied by the i-th element of $X_1$, and the j-th column equals $X_1$ multiplied by the j-th element of $Y_1$. So $X_2^T Y_2$ equals
$$X_2^T Y_2 = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 \end{bmatrix}$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
Adding these matrices pairwise gives M:
$$M = X_1^T Y_1 + X_2^T Y_2 = \begin{bmatrix} 2 & 0 & 0 & -2 \\ 0 & -2 & 2 & 0 \\ 2 & 0 & 0 & -2 \\ -2 & 0 & 0 & 2 \\ 0 & 2 & -2 & 0 \\ -2 & 0 & 0 & 2 \end{bmatrix}$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
Suppose, first, we use binary state vectors. All update policies are synchronous. Suppose we present binary vector $A_1$ as input to the system, as the current signal state vector at $F_X$. Then applying the threshold law (3-26) synchronously gives
$$A_1 M = (4,\ 2,\ -2,\ -4) \longrightarrow (1,\ 1,\ 0,\ 0) = B_1$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
Passing $B_1$ through the backward filter $M^T$, and applying the bipolar version of the threshold law (3-27), gives back $A_1$:
$$B_1 M^T = (2,\ -2,\ 2,\ -2,\ 2,\ -2) \longrightarrow (1,\ 0,\ 1,\ 0,\ 1,\ 0) = A_1$$
So $(A_1, B_1)$ is a fixed point of the BAM dynamical system. It has Lyapunov "energy" $L(A_1, B_1) = -A_1 M B_1^T = -6$, which equals the backward value $-B_1 M^T A_1^T = -6$.
$(A_2, B_2)$ behaves the same way: it is a fixed point with energy $-A_2 M B_2^T = -6$.
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
So the two deliberately encoded fixed points reside in equally "deep" attractors.
Hamming distance H equals $l^1$ distance. $H(A_i, A_j)$ counts the number of slots in which binary vectors $A_i$ and $A_j$ differ:
$$H(A_i, A_j) = \sum_{k=1}^{n} |a_k^i - a_k^j|$$
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
Consider for example the input $A = (0, 1, 1, 0, 0, 0)$, which differs from $A_2$ by 1 bit, or $H(A, A_2) = 1$. Then
$$A M = (2,\ -2,\ 2,\ -2) \longrightarrow (1,\ 0,\ 1,\ 0) = B_2$$
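The whole example fits in a few lines of NumPy. The threshold helper below maps positive activations to 1 and the rest to 0; it ignores the tie case (activation exactly at threshold), which never occurs in this example:

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 1, 0], [1, 1, 1, 0, 0, 0]])   # A1, A2
B = np.array([[1, 1, 0, 0], [1, 0, 1, 0]])               # B1, B2
X, Y = 2 * A - 1, 2 * B - 1                              # bipolar versions

M = X.T @ Y                          # X1^T Y1 + X2^T Y2 in one product
thresh = lambda v: (v > 0).astype(int)

print(A[0] @ M)                      # [ 4  2 -2 -4]
print(thresh(A[0] @ M))              # [1 1 0 0] = B1
print(thresh(B[0] @ M.T))            # [1 0 1 0 1 0] = A1
print(-A[0] @ M @ B[0])              # Lyapunov energy -6

A_noisy = np.array([0, 1, 1, 0, 0, 0])   # H(A_noisy, A2) = 1
print(thresh(A_noisy @ M))               # [1 0 1 0] = B2
```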
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
On average, bipolar signal state vectors produce more accurate recall than binary signal state vectors when we use bipolar outer-product encoding.
Intuitively, binary signals implicitly favor 1s over 0s, whereas bipolar signals are not biased in favor of 1s or -1s:
1 + 0 = 1, whereas 1 + (-1) = 0.
Chapter 3. Neural Dynamics II:Activation Models
※BAM Correlation Encoding Example
The neurons do not know that a global pattern "error" has occurred. They do not know that they should correct the error, or whether their current behavior helps correct it. The network provides the neurons with neither a global error signal nor Lyapunov "energy" information, though state-changing neurons decrease the energy.
Instead, the system dynamics guide the local behavior to a global reconstruction (recollection) of a learned pattern.
Chapter 3. Neural Dynamics II:Activation Models
※Memory Capacity:Dimensionality Limits Capacity
Synaptic connection matrices encode limited information.
After a point, adding additional associations $(A_k, B_k)$ does not significantly change the connection matrix. The system "forgets" some patterns. This limits the memory capacity.
As we sum more correlation matrices, $|m_{ij}| \gg 1$ holds more frequently.
Chapter 3. Neural Dynamics II:Activation Models
※Memory Capacity:Dimensionality Limits Capacity
A general property of neural networks:
Dimensionality limits capacity
Chapter 3. Neural Dynamics II:Activation Models
※3.6.4 Memory Capacity: Dimensionality Limits Capacity
Grossberg's sparse coding theorem says, for deterministic encoding, that pattern dimensionality must exceed pattern number to prevent learning some patterns at the expense of forgetting others.
For example, the capacity bound for bipolar correlation encoding in the Amari-Hopfield network is
$$m < \frac{n}{2 \log_2 n}$$
Chapter 3. Neural Dynamics II:Activation Models
※3.6.4 Memory Capacity: Dimensionality Limits Capacity
For Boolean encoding of binary associations, the memory capacity of bivalent additive BAMs can greatly exceed $\min(n, p)$, up to the new upper bound $\min(2^n, 2^p)$, if the thresholds $U_i$ and $V_j$ are judiciously chosen.
Different sets of thresholds should also improve capacity in the bipolar case (including bipolar Hebbian encoding).
Chapter 3. Neural Dynamics II:Activation Models
※The Hopfield Model
The Hopfield model illustrates an autoassociative additive bivalent BAM operated serially with simple asynchronous state changes.
Autoassociativity means the network topology reduces to only one field, $F_X$, of neurons: $F_X = F_Y$. The synaptic connection matrix M symmetrically intraconnects the n neurons in field $F_X$: $m_{ij} = m_{ji}$, or $M = M^T$.
Chapter 3. Neural Dynamics II:Activation Models
※The Hopfield Model
The autoassociative version of Equation (3-24) describes the additive neuronal activation dynamics:
$$x_i^{k+1} = \sum_{j} S_j(x_j^{k})\, m_{ji} + I_i \qquad (3\text{-}87)$$
for constant input $I_i$, with threshold signal function
$$S_i(x_i^{k+1}) = \begin{cases} 1 & \text{if } x_i^{k+1} > U_i \\ S_i(x_i^{k}) & \text{if } x_i^{k+1} = U_i \\ 0 & \text{if } x_i^{k+1} < U_i \end{cases} \qquad (3\text{-}88)$$
Chapter 3. Neural Dynamics II:Activation Models
※The Hopfield Model
We precompute the Hebbian synaptic connection matrix M by summing bipolar outer-product (autocorrelation) matrices and zeroing the main diagonal:
$$M = \sum_{k=1}^{m} X_k^T X_k - mI \qquad (3\text{-}89)$$
where I denotes the n-by-n identity matrix. Zeroing the main diagonal tends to improve recall accuracy by helping the system transfer function behave less like the identity operator.
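A minimal sketch of this encoding plus simple asynchronous recall, with zero external inputs and thresholds $U_i = 0$ (both illustrative choices):

```python
import numpy as np

def hopfield_encode(X):
    """M = sum_k X_k^T X_k - m I: bipolar autocorrelations, zero diagonal."""
    M = X.T @ X
    np.fill_diagonal(M, 0)            # equivalent to subtracting m I
    return M

def hopfield_recall(M, x, U=0.0, sweeps=10, seed=0):
    """Simple asynchronous updates: one randomly chosen neuron per step.
    Binary states; by (3-88) a neuron keeps its state on a tie."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    for _ in range(sweeps * len(x)):
        i = rng.integers(len(x))
        act = M[:, i] @ x             # sum_j S_j(x_j) m_ji, with I_i = 0
        if act > U:
            x[i] = 1
        elif act < U:
            x[i] = 0
    return x
```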
Chapter 3. Neural Dynamics II:Activation Models
※Additive dynamics and the noise-saturation dilemma
Grossberg's saturation theorem states that additive activation models saturate for large inputs, but multiplicative models do not.
Chapter 3. Neural Dynamics II:Activation Models
Grossberg's Saturation Theorem
The stationary "reflectance pattern" $P = (p_1, \ldots, p_n)$ confronts the system amid the background illumination $I(t)$, where
$$p_i \ge 0 \quad \text{and} \quad p_1 + \cdots + p_n = 1$$
The i-th neuron receives input $I_i$. Convex coefficient $p_i$ defines the "reflectance" of $I_i$:
$$I_i = p_i I$$
A: the passive decay rate. $[0, B]$: the activation bound.
Chapter 3. Neural Dynamics II:Activation Models
Additive Grossberg model:
$$\dot{x}_i = -A x_i + (B - x_i) I_i = -(A + I_i)x_i + B I_i$$
We can solve this linear differential equation to yield
$$x_i(t) = x_i(0)\, e^{-(A + I_i)t} + \frac{B I_i}{A + I_i}\left[1 - e^{-(A + I_i)t}\right]$$
For initial condition $x_i(0) = 0$, as time increases the activation converges to its steady-state value:
$$x_i = \frac{B I_i}{A + I_i} \longrightarrow B \quad \text{as } I \longrightarrow \infty$$
So the additive model saturates.
Chapter 3. Neural Dynamics II:Activation Models
Multiplicative activation model:
$$\dot{x}_i = -A x_i + (B - x_i) I_i - x_i \sum_{j \ne i} I_j = -\Big(A + I_i + \sum_{j \ne i} I_j\Big) x_i + B I_i = -(A + I)x_i + B I_i$$
since $\sum_j I_j = I$.
Chapter 3. Neural Dynamics II:Activation Models
For initial condition $x_i(0) = 0$, the solution to this differential equation becomes
$$x_i(t) = \frac{B p_i I}{A + I}\left(1 - e^{-(A + I)t}\right) \qquad (3\text{-}96)$$
As time increases, the neuron reaches steady state exponentially fast:
$$x_i = \frac{B p_i I}{A + I} \longrightarrow B p_i \quad \text{as } I \longrightarrow \infty$$
The multiplicative neuron does not saturate: its steady state retains the reflectance $p_i$ no matter how large the background illumination I grows.
Chapter 3. Neural Dynamics II:Activation Models
This proves the Grossberg saturation theorem:
Additive models saturate; multiplicative models do not.
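The closed-form steady states make the theorem easy to check numerically; a sketch with illustrative parameters A = B = 1 and reflectance p_i = 0.3:

```python
def steady_states(I_total, p_i=0.3, A=1.0, B=1.0):
    """Steady states for one neuron under total illumination I.
    Additive: B*I_i/(A+I_i) -> B.  Multiplicative: B*p_i*I/(A+I) -> B*p_i."""
    I_i = p_i * I_total
    return B * I_i / (A + I_i), B * p_i * I_total / (A + I_total)

for I in (1.0, 10.0, 1000.0):
    add, mult = steady_states(I)
    print(f"I={I:7.1f}  additive={add:.3f}  multiplicative={mult:.3f}")
# As I grows, the additive neuron saturates at B = 1 and loses the
# reflectance information; the multiplicative neuron converges to B*p_i = 0.3.
```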
Chapter 3. Neural Dynamics II:Activation Models
In general the activation variable $x_i$ can assume negative values. Then the operating range equals $[-C_i, B_i]$ for $C_i \ge 0$. In the neurobiological literature the lower bound $-C_i$ is usually smaller in magnitude than the upper bound $B_i$: $C_i \le B_i$.
This leads to the slightly more general shunting activation model:
$$\dot{x}_i = -A_i x_i + (B_i - x_i) I_i - (C_i + x_i) \sum_{j \ne i} I_j$$
Chapter 3. Neural Dynamics II:Activation Models
※ General Neuronal Activations:Cohen-Grossberg and multiplicative models
Consider the symmetric unidirectional or autoassociative case when $F_X = F_Y$, $M = M^T$, and M is constant. Then a neural network possesses Cohen-Grossberg [1983] activation dynamics if its activation equations have the form
$$\dot{x}_i = -a_i(x_i)\Big[b_i(x_i) - \sum_{j=1}^{n} S_j(x_j)\, m_{ij}\Big] \qquad (3\text{-}102)$$
The nonnegative function $a_i(x_i) \ge 0$ represents an abstract amplification function.
Chapter 3. Neural Dynamics II:Activation Models
※ General Neuronal Activations:Cohen-Grossberg and multiplicative models
1. An intensity range of many orders of magnitude is compressed into a manageable excursion in signal level.
2. The voltage difference between two points is proportional to the contrast ratio between the two corresponding points in the image, independent of incident light intensity.
Chapter 3. Neural Dynamics II:Activation Models
※ General Neuronal Activations:Cohen-Grossberg and multiplicative models
Shortcomings of Grossberg's model:
1. Grossberg's interpretation of signal and noise.
2. Grossberg's interpretation of noise as a uniform distribution.
3. Grossberg's noise suppression.
Chapter 3. Neural Dynamics II:Activation Models
Grossberg [1988] has also shown that (3-102) reduces to the additive brain-state-in-a-box model of Anderson [1977, 1983] and the shunting masking-field model [Cohen, 1987] upon appropriate change of variables.
Chapter 3. Neural Dynamics II:Activation Models
If $a_i = 1/C_i$, $b_i = x_i/R_i - I_i$, and $m_{ij} = m_{ji} = T_{ij} = T_{ji}$ is constant, where $C_i$ and $R_i$ are positive constants, $S_i(x_i) = g_i(x_i) = V_i$, and the input $I_i$ is constant or varies slowly relative to fluctuations in $x_i$, then (3-102) reduces to the Hopfield circuit [1984]:
$$C_i \dot{x}_i = -\frac{x_i}{R_i} + \sum_{j} T_{ji} V_j + I_i$$
An autoassociative network has shunting or multiplicative activation dynamics when the amplification function $a_i$ is linear and $b_i$ is nonlinear.
Chapter 3. Neural Dynamics II:Activation Models
For instance, if $a_i = x_i$, $m_{ii} = 1$ (self-excitation in lateral inhibition), and
$$b_i = \frac{1}{x_i}\Big[A_i x_i - (B_i - x_i)\big(S_i(x_i) + I_i\big) + C_i\Big(\sum_{j \ne i} S_j(x_j)\, m_{ij} + I_i\Big)\Big]$$
then (3-104) describes the distance-dependent ($m_{ij} = m_{ji}$) unidirectional shunting network:
$$\dot{x}_i = -A_i x_i + (B_i - x_i)\big[S_i(x_i) + I_i\big] - (C_i + x_i)\Big[\sum_{j \ne i} S_j(x_j)\, m_{ij} + I_i\Big]$$
Chapter 3. Neural Dynamics II:Activation Models
Hodgkin-Huxley membrane equation:
$$c\,\frac{\partial V}{\partial t} = (V_p - V)\,g_p + (V^+ - V)\,g^+ + (V^- - V)\,g^-$$
$V_p$, $V^+$, and $V^-$ denote respectively the passive (chloride Cl⁻), excitatory (sodium Na⁺), and inhibitory (potassium K⁺) saturation upper bounds.
Chapter 3. Neural Dynamics II:Activation Models
At equilibrium, when the current equals zero, the Hodgkin-Huxley model has the resting potential $V_{\text{rest}}$:
$$V_{\text{rest}} = \frac{g_p V_p + g^+ V^+ + g^- V^-}{g_p + g^+ + g^-}$$
Neglect the chloride-based passive terms. This gives the resting potential of the shunting model as
$$V_{\text{rest}} = \frac{g^+ V^+ + g^- V^-}{g^+ + g^-}$$
Chapter 3. Neural Dynamics II:Activation Models
BAM activations also possess Cohen-Grossberg dynamics, extended to the bidirectional case:
$$\dot{x}_i = -a_i(x_i)\Big[b_i(x_i) - \sum_{j=1}^{p} S_j(y_j)\, m_{ij}\Big]$$
$$\dot{y}_j = -a_j(y_j)\Big[b_j(y_j) - \sum_{i=1}^{n} S_i(x_i)\, m_{ij}\Big]$$
with corresponding Lyapunov function L, as we show in Chapter 6:
$$L = -\sum_i \sum_j S_i(x_i)\, S_j(y_j)\, m_{ij} + \sum_i \int_0^{x_i} S_i'(\theta_i)\, b_i(\theta_i)\, d\theta_i + \sum_j \int_0^{y_j} S_j'(\theta_j)\, b_j(\theta_j)\, d\theta_j$$
Chapter 3. Neural Dynamics II:Activation Models
1. The synaptic connections of all the models so far have not changed with time.
2. Such systems only recall stored patterns.
3. They do not simultaneously learn new ones.