

11. System level organization and coupled networks

Lecture Notes on Brain and Computation

Byoung-Tak Zhang

Biointelligence Laboratory

School of Computer Science and Engineering

Graduate Programs in Cognitive Science, Brain Science and Bioinformatics

Brain-Mind-Behavior Concentration Program

Seoul National University

E-mail: [email protected]

This material is available online at http://bi.snu.ac.kr/

Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.


Outline


11.1 System level anatomy of the brain
11.2 Modular mapping networks
11.3 Coupled attractor networks
11.4 Working memory
11.5 Attentive vision
11.6 An interconnecting workspace hypothesis


11.1 System level anatomy of the brain

The brain is more than just a big neural network with completely interconnected neurons
Combine the basic networks
- Associative and competitive networks
Global architectures reflecting large-scale organizations of the brain
Modular networks resulting from combining the basic networks
- Display some structure within their architecture, as opposed to completely interconnected networks


11.1.1 Large-scale anatomical and functional organization in the brain


Fig. 11.1 Example of a map of connectivities between cortical areas involved in visual processing.


11.1.2 Advantages of modular organizations

Modular specialization is used in the brain
The functional significance of modular specialization in visual processing
- The cortex uses inhibition to sharpen various visual attributes (color, edges, or orientations)
- Local inhibition
- The separate attentional amplification of separate features
Learning speed, generalization abilities, representation capabilities, and task realizations
- Modular mapping networks
- Modular attractor networks


11.2 Modular mapping networks
11.2.1 Mixture of experts


Fig. 11.2 An example of a type of modular mapping network called mixture of experts. Each expert, the gating network, and the integration network are usually mapping networks. The input layer of the integration network is composed of sigma-pi nodes, as the output of the gating network weights (modulates) the output of the expert networks to form the inputs of the integration network.


11.2.2 Divide-and-conquer


Fig. 11.3 (A) Illustration of the absolute function f(x) = |x| that is difficult to approximate with a single mapping network. (B) A modular mapping network in the form of a mixture of experts that can represent the absolute function accurately.
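The divide-and-conquer idea of Fig. 11.3 can be made concrete in a few lines. The sketch below is not from the text; it is a minimal NumPy illustration with hand-chosen linear experts and a logistic gating unit, showing how a gating network that softly switches between two experts represents f(x) = |x|, which a single linear mapping cannot.

import numpy as np

# Two linear 'experts', each specializing on one branch of f(x) = |x|
def expert_neg(x):
    return -x          # accurate for x < 0

def expert_pos(x):
    return x           # accurate for x >= 0

# Gating network: a steep logistic unit on x decides which expert dominates
def gate(x, steepness=50.0):
    g_pos = 1.0 / (1.0 + np.exp(-steepness * x))
    return np.stack([1.0 - g_pos, g_pos])       # gating weights sum to 1

def mixture_of_experts(x):
    g = gate(x)
    experts = np.stack([expert_neg(x), expert_pos(x)])
    # Sigma-pi combination: the gating output multiplies (modulates) the
    # expert outputs before they are summed, as in Fig. 11.2
    return np.sum(g * experts, axis=0)

x = np.linspace(-1.0, 1.0, 201)
y = mixture_of_experts(x)
print("max |error| on f(x) = |x|:", np.max(np.abs(y - np.abs(x))))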


11.2.3 Training modular structures

Training such networks
- To solve specific tasks in a flexible manner
Training the experts alone has two components:
- To assign each expert to a particular task
- To train each expert on its designated task
Training the gating network
- Credit-assignment problem
Biological systems
- A task-assignment phase and an expert-training phase are not separated


11.2.4 The ‘what-and-where’ task

The ventral visual pathway: ‘what’
The dorsal visual pathway: ‘where’
Performing object recognition and determining the location of objects in a modular network
Retina of 5 x 5 cells
Objects of 3 x 3 patterns
26 input channels
- 25 for retina inputs
- 1 for task specification
18 output nodes
- 9 for object patterns
- 9 for locations
36 hidden nodes
Back-propagation learning (a minimal sketch of such a network follows the figure below)


Fig. 11.4 Example of the ‘what-and-where’ tasks. (A) 5 x 5 model retina with 3 x 3 image of an object as an example.
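As a rough illustration of the monolithic network just described (26 inputs, 36 hidden nodes, 18 outputs, back-propagation), the sketch below sets up such a network and a plain gradient-descent training loop. The training patterns are random placeholders; the actual 3 x 3 object patterns and the ‘what’/‘where’ targets of the original study are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the slide: 25 retina inputs + 1 task unit, 36 hidden, 18 outputs
n_in, n_hidden, n_out = 26, 36, 18

# Placeholder training data (random binary patterns and targets)
X = rng.integers(0, 2, size=(100, n_in)).astype(float)
T = rng.integers(0, 2, size=(100, n_out)).astype(float)

W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_out))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

eta = 0.1
for epoch in range(200):
    # forward pass
    h = sigmoid(X @ W1)              # hidden activations
    y = sigmoid(h @ W2)              # outputs: 9 'what' + 9 'where' units
    # error back-propagation for the mean squared error
    delta_out = (y - T) * y * (1.0 - y)
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
    W2 -= eta * h.T @ delta_out / len(X)
    W1 -= eta * X.T @ delta_hid / len(X)

print("final mean squared error:", np.mean((y - T) ** 2))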


11.2.5 Temporal and spatial cross-talk

Conflicting training information
Temporal cross-talk
- The network will quickly adapt to reasonable performance on the ‘what’ task if we train the network first entirely on this task
- The representations of the hidden layers will change in a subsequent learning period on the ‘where’ task, which is likely to conflict with the representations necessary for the ‘what’ task
- Training sets with conflicting training patterns
Spatial cross-talk
- Conflicts within one training example due to the distributed nature of the representations
The division of the tasks into separate networks
- Abolishes both problematic cross-talk conflicts


11.2.6 Task decomposition

Modular networks can learn task decomposition
Solution of Jacobs, Jordan, and Barto
- Using a gating network that increases the strength of the expert network that significantly improved the output of the system
- The back-propagated error signal is modulated by the gating weights
- The module that contributed most to the answer will adapt most to the new example
- Specialized experts
Solution of Jacobs and Jordan
- A physical location of the nodes in a single mapping network
- Used a distance-dependent term in the objective function
- Leads to a weight decay favoring short connections
- The objective (or error) function

E = (1/2) Σ_i (r_i − y_i^out)² + (1/2) Σ_ij d_ij w_ij²    (11.1)

Fig. 11.4 (B) Connection weights between hidden and output nodes in a single mapping network. Positive weights are shown by solid squares, while open squares symbolize negative values. (C) Connection weights between hidden and output nodes in a single mapping network when trained with a bias toward short connections.
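To make the objective of Eq. (11.1), as reconstructed above, concrete, the following sketch evaluates it for a small layer of weights; the node positions, weights, and targets are hypothetical, chosen only to show that the distance-dependent term penalizes long connections more strongly than short ones (the weight decay favoring short connections).

import numpy as np

rng = np.random.default_rng(1)

n_hidden, n_out = 8, 4
# Hypothetical 1-D positions of hidden and output nodes, defining wiring length d_ij
pos_hidden = np.linspace(0.0, 1.0, n_hidden)
pos_out = np.linspace(0.0, 1.0, n_out)
d = np.abs(pos_hidden[:, None] - pos_out[None, :])   # distances d_ij

w = rng.normal(0.0, 1.0, size=(n_hidden, n_out))     # hidden-to-output weights w_ij
h_act = rng.normal(size=n_hidden)                    # some hidden activations
y = h_act @ w                                        # network outputs y_i^out
r = rng.normal(size=n_out)                           # target values r_i

# Eq. (11.1): squared output error plus a distance-weighted decay term
E = 0.5 * np.sum((r - y) ** 2) + 0.5 * np.sum(d * w ** 2)

# The gradient of the decay term is d_ij * w_ij, so long connections are
# pushed toward zero more strongly than short ones.
decay_grad = d * w
print("objective E:", E)
print("mean |decay gradient|, long connections :", np.abs(decay_grad[d > np.median(d)]).mean())
print("mean |decay gradient|, short connections:", np.abs(decay_grad[d <= np.median(d)]).mean())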


11.2.7 Product of experts

The output can be interpreted as the probability that an input vector has a certain feature
- If the summed outputs of the expert networks are normalized, modular networks can easily be generalized to allow similar interpretations
View the mixture of experts as a collection of experts whose weighted opinion is averaged to determine the probability of the feature value
The product of experts instead combines the opinions as the probability of independent events
The advantages of a product of experts relative to the weighted mean calculated by the mixture of experts (see the sketch below):
- A large probability assigned by one expert can be largely suppressed by low probabilities assigned by other experts
- A large probability is only indicated if there is some agreement between the experts
- Allows the individual experts to assign unreasonably large probabilities to some events, as long as other experts represent such events more accurately with low probabilities
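A small numerical contrast between the two combination rules; the expert probabilities and the equal gating weights below are made-up values for illustration only.

import numpy as np

# Probabilities assigned by three hypothetical experts to the same binary feature
p = np.array([0.95, 0.10, 0.15])       # one very confident expert, two that disagree
gate = np.array([1 / 3, 1 / 3, 1 / 3])

# Mixture of experts: gating-weighted mean of the expert opinions
p_mixture = np.sum(gate * p)

# Product of experts: normalized product of the opinions, treating the experts
# as independent sources of evidence for 'feature present' versus 'absent'
p_product = np.prod(p) / (np.prod(p) + np.prod(1 - p))

print("mixture of experts:", round(p_mixture, 3))   # ~0.40
print("product of experts:", round(p_product, 3))   # ~0.27

The confident outlier pulls the weighted mean up to about 0.4, while in the product it is largely suppressed by the two low probabilities, which is the behaviour described above.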


11.3 Coupled attractor networks


Fig. 11.5 Coupled (or subdivided) recurrent neural networks. The nodes in this example are divided into two groups (the nodes of each group are indicated with different shadings). There are connections within the nodes of each group and between nodes of different groups.


11.3.1 Imprinted and composite patterns

One single point attractor network versus two single point attractor networks
Imprint into this system all objects
- Two independent feature vectors representing m possible feature values for each feature of an object
- Build m² possible objects
A network with 1000 nodes can store around 138 patterns
- Hebbian rule
- The storage capacity αc ≈ 0.138 (see Chapter 8)
A network with two such independent subnetworks (m = 2 modules with N/m = 1000 nodes each) could store

P = (αc N/m)^m = 138² = 19,044    (11.2)

The number of patterns that can be stored in a single network of N nodes is only

P = αc N    (11.3)
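Plugging numbers into Eqs. (11.2) and (11.3), under the reading given above (two subnetworks of 1000 nodes each), reproduces the comparison made on this slide.

alpha_c = 0.138          # storage capacity per node of a Hopfield-type network
N = 2000                 # total number of nodes
m = 2                    # number of independent modules, N/m = 1000 nodes each

# Eq. (11.2): combinations of subpatterns storable by the modular system
P_modular = (alpha_c * N / m) ** m
# Eq. (11.3): patterns storable by a single network of N nodes
P_single = alpha_c * N

print("modular system:", int(round(P_modular)))   # 138**2 = 19044
print("single network:", int(round(P_single)))    # 276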


11.3.2 Signal-to-noise analysis

The behavior of coupled attractor networks using a signal-to-noise analysis

An overall network of N nodes that can be divided into modules, each having the same number of nodes N′

The weights are trained with the Hebbian rule
Modulate the weight values between the modules with a factor g
Define a new weight matrix whose components are scaled by a modulation matrix g:

m = N/N′    (11.4)

w_ij = (1/N) Σ_{μ=1}^{P} ξ_i^μ ξ_j^μ    (11.5)

w̃_ij = g_ij w_ij    (11.6)

g_ij = 1 if nodes i, j are within the same module, and g_ij = g otherwise    (11.7)
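A short sketch of Eqs. (11.4)-(11.7): random binary patterns are imprinted with the Hebbian rule and the weights between modules are scaled by g. The number of patterns, the module sizes, and the value of g below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(2)

N, m = 120, 3                  # N nodes divided into m modules
Np = N // m                    # N' = N/m nodes per module (Eq. 11.4)
P = 5                          # number of imprinted random patterns
g = 0.4                        # relative intermodule coupling strength

xi = rng.choice([-1, 1], size=(P, N))      # random binary patterns xi^mu

# Hebbian weights, Eq. (11.5), without self-coupling
w = xi.T @ xi / N
np.fill_diagonal(w, 0.0)

# Modulation matrix g_ij, Eq. (11.7): 1 within a module, g between modules
module = np.repeat(np.arange(m), Np)
G = np.where(module[:, None] == module[None, :], 1.0, g)

# Modulated weight matrix, Eq. (11.6)
w_tilde = G * w

# One parallel update starting from the first imprinted pattern
s = np.sign(w_tilde @ xi[0])
print("overlap with the imprinted pattern:", float(s @ xi[0]) / N)

With this small load the printed overlap comes out at (or very close to) 1, i.e. the imprinted pattern is essentially a fixed point even at intermediate coupling g.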


11.3.3 Imprinted pattern

To evaluate the stability of the imprinted pattern, separate the signal term from the noise terms
To simplify the formulas, use N′/N = 1/m
The capacity bound

s_i(t+1) = sign( Σ_j g_ij w_ij ξ_j^1 ) = sign( signal + noise )    (11.8)

signal: S = ξ_i^1 [ N′/N + g (N − N′)/N ] = ξ_i^1 (1/m)[1 + g(m − 1)]
noise: variance σ² = ((P − 1)/N) (1/m)[1 + g²(m − 1)]    (11.9)

Stability requires S > zσ with z = √2 erf⁻¹(1 − 2 P_error)    (11.10)

which gives the capacity bound P − 1 < (N/m) [1 + g(m − 1)]² / ( z² [1 + g²(m − 1)] )    (11.11)

Fig. 11.6 Coupled attractor neural networks: results from the signal-to-noise analysis. (A) Dependence of the load capacity of the imprinted patterns on the relative intermodule coupling strength g for different numbers of modules m.


11.3.4 Composite pattern (1)

All kinds of combinations of patterns in different modules
All the subpatterns in the modules are chosen to be the first training pattern, except for the module to which the node under consideration belongs

h_i = Σ_j g_ij w_ij s_j, with s_j = ξ_j^1 for nodes j outside the module of node i and s_j equal to a different imprinted subpattern inside it    (11.12)

signal: the within-module contribution supporting the composite pattern, S = N′/N = 1/m    (11.13)

noise: the cross-module contributions, which pull toward ξ^1 with strength g(1 − 1/m), plus the usual loading noise from the other imprinted patterns    (11.14)

The composite pattern is stable at node i only if the within-module signal exceeds the cross-module pull plus z times the loading noise; this yields one of the bounds on g shown in Fig. 11.6 (B)    (11.15)

Fig. 11.6 (B) Bounds on the relative intermodule coupling strength g. For g values greater than the upper curve the imprinted patterns are stable. For g less than the lower curve the composite patterns are stable. In the narrow band in between we can adjust the system to have several composite and some imprinted patterns stable. This band gets narrower as the number of modules m increases and vanishes for networks with many modules.


11.3.4 Composite pattern (2)

The reverse case

The module of node i now carries the first subpattern ξ^1, while all other modules carry different imprinted subpatterns:

h_i = Σ_j g_ij w_ij s_j, with s_j = ξ_j^1 inside the module of node i    (11.16)

signal: S = N′/N = 1/m    (11.17)

noise: the cross-module contributions now act purely as noise, growing with g and with the number of other modules (m − 1), in addition to the loading noise    (11.18)

Requiring the signal to exceed z times this noise gives the second condition on g; together the two conditions define the curves shown in Fig. 11.6 (B)    (11.19)

Fig. 11.6 (B) Bounds on the relative intermodule coupling strength g (caption as on the previous slide).


11.4 Working memory

A specific hypothesis on the implementation of working memory in the brain

Working memory is a construct that is hard to define precisely
- A workspace that provides the necessary information to solve complex tasks
- Language comprehension
- Mental arithmetic
- Reasoning for problem-solving and decision-making
- A specific form of short-term memory (STM)


11.4.1 Distributed model of working memory

A conceptual model of working memory
- Based on a modular structure

How the interaction of these modules enables the brain to utilize this kind of information


Fig. 11.7 A modular system of short-, intermediate-, and long-term memory, which are associated with functionalities of the prefrontal cortex (PFC), the hippocampus and related areas (HCMP), and the perceptual and motor cortex (PMC), respectively.


11.4.2 Limited capacity of working memory

‘31, 27, 4, 18’
‘62, 97, 12, 73, 27, 54, 8’
The limited capacity of working memory
- ‘Magical number 7±2’
The very limited capacity of working memory is puzzling
- It correlates with classical measurements of IQ factors
- A larger storage capacity should make us fitter to survive
The search for the reasons behind the limited capacity of working memory is prominent in cognitive neuroscience


11.4.3 Spurious synchronization

The reasons behind the limited capacity of working memory


Fig. 11.8 (A) The percentage of correct responses (recall ability) in human subjects in a sequential comparison task of two images with different numbers of objects Nobj (solid squares). The dotted lines illustrate examples of the functional form as suggested by the synchronization hypothesis. The dashed line corresponds to the results with PSS(2) = 0.04, where PSS is the probability of spurious synchronization. (B) Illustration of the spurious synchronization hypothesis. The features of the object are represented by different spike trains, so that the number of synchronous spikes within a certain resolution increases with an increasing number of objects.


11.4.4 Quantification of the spurious synchronization hypothesis

The spurious synchronization hypothesis
- Each additional feature of one object would in any case be synchronized with the other features of that object, so that only the number of objects with different spike trains matters
- The number of pairs
- The probability of spurious synchronization between at least two spike trains in a set of Nobj spike trains (patterns)
- A functional expectation of the percentage of correct recall

N_pairs = N_obj! / (2! (N_obj − 2)!)    (11.20)

P_SS(N_obj) = 1 − (1 − P_SS(2))^N_pairs    (11.21)

P_correct = 100 (1 − P_SS(N_obj)) + 50 P_SS(N_obj)    (11.22)
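Eqs. (11.20)-(11.22) are easy to evaluate directly. The sketch below uses P_SS(2) = 0.04, the value quoted in the caption of Fig. 11.8, and prints the expected percentage of correct recall for one to eight objects; P_correct falls toward 50 per cent, the chance level of the two-alternative comparison task, as spurious synchronization becomes certain.

from math import comb

def p_correct(n_obj, p_ss2=0.04):
    """Expected recall percentage under the spurious synchronization
    hypothesis, Eqs. (11.20)-(11.22)."""
    n_pairs = comb(n_obj, 2)                    # Eq. (11.20)
    p_ss = 1.0 - (1.0 - p_ss2) ** n_pairs       # Eq. (11.21)
    return 100.0 * (1.0 - p_ss) + 50.0 * p_ss   # Eq. (11.22)

for n_obj in range(1, 9):
    print(n_obj, "objects:", round(p_correct(n_obj), 1), "% correct")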


11.4.5 Other hypotheses on the capacity limit

Lisman and Idiart
- The limit might be solely due to the limited ability of reverberating neural activity for short-term memory
- The representations of different objects are kept in different high-frequency subcycles of low-frequency oscillations found in the brain
Nelson Cowan
- Based on the limits of an attentional system, thought to be a necessary ingredient in working memory models


11.5 Attentive vision
11.5.1 Object recognition versus visual search


Fig. 11.9 Illustration of a visual search and an object recognition task. Each task demands a different strategy in exploring the visual scene.


11.5.2 The overall model


Fig. 11.10 Outline of a model of visual processing in primates to simulate visual search and object recognition. The main parts of the model are inspired by structural features of the cortical areas thought to be central in these processes. These include early visual areas (labeled ‘V1-V4’) that represent the content of the visual field topographically with basic features, the inferior-temporal cortex (labeled ‘IT’) that is known to be central for object recognition, and the posterior parietal cortex (labeled ‘PP’) that is strongly associated with spatial attention.


11.5.3 Object representation in early visual areas

The first part of the model, labeled ‘V1-V4’, represents early visual areas
- The striate cortex (V1) and adjacent visual areas
The primary visual area V1 is the main cortical area that
- Receives visual input from the LGN of the thalamus
- Is the major target of the optic nerves from the eyes
Neuronal responses to visual input from the eyes
- Gabor functions
The principal role
- The decomposition of the visual field into features
- Orientation, color, motion, etc.
The modeling point of view
- The feature representation in this part of the model is topographic
- Features are represented in modules corresponding to the location of the object in the visual field


11.5.4 Translation-invariant object recognition with ANN

The representation of the visual field in ‘V1-V4’ in this model feeds into the part labeled ‘IT’
The inferior-temporal cortex
- Involved in object recognition
The connections between ‘V1-V4’ and ‘IT’
- Trained with Hebbian learning
Translation-invariant object recognition
- The point attractor network can ‘recognize’ trained objects in test trials at all locations in the visual field
Cortical magnification


11.5.5 Size of the receptive field

The size of the receptive field of inferior-temporal neurons depends on the content of the visual field and the specifics of the task


Fig. 11.11 (A) Example of the average firing rate from recordings of a neuron in the inferior-temporal cortex of a monkey in response to an effective stimulus that is located at various degrees away from the direction of gaze. (B) Simulation results of a model with the essential components of the model shown in Fig. 11.10. The correlation is thereby a measure of overlap between the activity of ‘IT’ nodes and the representation of the target object that was used during training.


11.5.6 Attentional bias in visual search and object recognition

Visual search
- Simulated by supplying an object bias input to the attractor network in ‘IT’
- Top-down information
- The additional input of an object bias to ‘IT’ can speed up the recognition process in ‘IT’
The object bias also supports the recognition of the input from ‘V1-V4’ that corresponds to the target object
Parallel conclusions
- An object recognition task in which top-down input to a specific location in ‘PP’ is given
- Enhances the neural activity in ‘V1-V4’ for the features of the object located at the corresponding location


11.5.7 Parallel versus serial search


Fig. 11.12 Numerical experiments in which the model simulated a visual search task for a target object (the letter ‘E’) in a visual scene with visual distractors. (A) In one experiment the distractors consist of the letters ‘X’ that are visually very different from the target letter. The activity of a ‘PP’ node that corresponds to the target location increases in these experiments independently of the number of distractors, implying parallel search. (B) The second experiment was done with distractors (letter ‘F’) that were visually similar to the target letter. The reaction times, as measured from ‘PP’ nodes, depend linearly on the number of objects, a feature that is also characteristic of serial search. Both modes are, however, present in the same ‘parallel architecture’.


11.6 An interconnecting workspace hypothesis
11.6.1 The global workspace


Fig. 11.13 Illustration of the workspace hypothesis. Two computational spaces can be distinguished, the subnetworks with localized and specific computational specialization, and an interconnecting network that is the platform of the global workspace.


11.6.2 Demonstration of the global workspace in the Stroop task


Fig. 11.14 (A) In a Stroop task a word for a color, written in a color that can be different from the meaning of the word, is shown to a subject who is asked to perform either a word-naming or color-naming task. (B) Global workspace model that is able to reproduce several experimental findings in the Stroop task.


Conclusion

Fundamental examples of modular networks
- Complex information-processing systems
- Expert and gating networks
The effect of the number of modules in coupled attractor networks
Working memory with modular networks
Attentive vision
- Object recognition
- Visual search
Workspace hypothesis
