Broca’s area


Page 1: Broca’s area

Learning and Memory

[Figure: lateral view of the cortex with labeled regions: Broca’s area, pars opercularis, motor cortex, somatosensory cortex, sensory associative cortex, primary auditory cortex, Wernicke’s area, visual associative cortex, visual cortex]

[Adapted from Neural Basis of Thought and Language, Jerome Feldman, Spring 2007, [email protected]]

Page 2: Broca’s area

Learning and Memory

Memory
• Declarative
  • Episodic: memory of a situation
  • Semantic: general facts
• Non-Declarative
  • Procedural: skills

Page 3: Broca’s area

Learning and Memory

There are two different types of learning:
• Skill Learning
• Fact and Situation Learning
  • General Fact Learning
  • Episodic Learning

The processes underlying skill (procedural) learning are partially different from those underlying fact/situation (declarative) learning.

Page 4: Broca’s area

Skill and Fact Learning involve different mechanisms

Certain brain injuries involving the hippocampal region of the brain render their victims incapable of learning any new facts or new situations. But these people can still learn new skills, including relatively abstract skills like solving puzzles.

Fact learning can be single-instance based. Skill learning requires repeated exposure to stimuli.

Page 5: Broca’s area

Short term memory

How do we remember someone’s telephone number just after they tell us, or the words in this sentence?

Short term memory is known to have a different biological basis than long term memory of either facts or skills. We now know that this kind of short term memory depends upon ongoing electrical activity in the brain. You can keep something in mind by rehearsing it, but this will interfere with your thinking about anything else.

Page 6: Broca’s area

Long term memory

But we do recall memories from decades past. These long term memories are known to be based on structural changes in the synaptic connections between neurons.

Such permanent changes require the construction of new protein molecules and their establishment in the membranes of the synapses connecting neurons, and this can take several hours.

Thus there is a huge time gap between short term memory that lasts only for a few seconds and the building of long-term memory that takes hours to accomplish.

In addition to bridging the time gap, the brain needs mechanisms for converting the content of a memory from electrical to structural form.

Page 7: Broca’s area

Situational Memory

Think about an old situation that you still remember well. Your memory will include multiple modalities: vision, emotion, sound, smell, etc.

The standard theory is that memories in each particular modality activate much of the brain circuitry from the original experience.

There is general agreement that the hippocampal area contains circuitry that can bind together the various aspects of an important experience into a coherent memory.

This process is believed to involve calcium-based long-term potentiation (LTP).

Page 8: Broca’s area

Dreaming and Memory

There is general agreement and considerable evidence that dreaming involves simulating experiences and is important in consolidating memory.

Page 9: Broca’s area

Models of Learning

• Hebbian ~ coincidence
• Recruitment ~ one trial
• Supervised ~ correction (backprop)
• Reinforcement ~ delayed reward
• Unsupervised ~ similarity

Page 10: Broca’s area

Hebb’s Rule

The key idea underlying theories of neural learning goes back to the Canadian psychologist Donald Hebb and is called Hebb’s rule.

From an information processing perspective, the goal of the system is to increase the strength of the neural connections that are effective.

Page 11: Broca’s area

Hebb (1949)

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” From: The Organization of Behavior.

Page 12: Broca’s area

Hebb’s rule

Each time that a particular synaptic connection is active, see if the receiving cell also becomes active. If so, the connection contributed to the success (firing) of the receiving cell and should be strengthened.

If the receiving cell was not active in this time period, our synapse did not contribute to the success (firing) of the receiving cell and should be weakened.
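
As a rough illustration (not from the lecture), here is a minimal sketch of this rule as a weight update; the learning rate, the decay value, and the boolean treatment of "active" are assumptions made for the example.

    # Sketch of Hebb's rule: strengthen a synapse whose pre-synaptic input was
    # active when the receiving cell fired; weaken it if the cell stayed silent.
    # Learning rate and decay are arbitrary illustrative values.
    def hebbian_update(weights, pre_activity, post_fired, lr=0.1, decay=0.05):
        for i, pre in enumerate(pre_activity):
            if pre > 0 and post_fired:
                weights[i] += lr * pre       # fired together: strengthen
            elif pre > 0 and not post_fired:
                weights[i] -= decay * pre    # synapse active, cell silent: weaken
        return weights

    w = hebbian_update([0.2, 0.2, 0.2], pre_activity=[1, 0, 1], post_fired=True)
    print([round(x, 2) for x in w])          # [0.3, 0.2, 0.3]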

Page 13: Broca’s area

Hebb’s Rule: neurons that fire together wire together.

Long Term Potentiation (LTP) is the biological basis of Hebb’s Rule. Calcium channels are the key mechanism.

[Diagram: LTP and Hebb’s Rule; co-active synapses strengthen, others weaken]

Page 14: Broca’s area

Chemical realization of Hebb’s rule

It turns out that there are elegant chemical processes that realize Hebbian learning at two distinct time scales:
• Early Long Term Potentiation (LTP)
• Late LTP

These provide the temporal and structural bridge from short term electrical activity, through intermediate memory, to long term structural changes.

Page 15: Broca’s area

Calcium Channels Facilitate Learning

In addition to the synaptic channels responsible for neural signaling, there are also calcium-based channels that facilitate learning. As Hebb suggested, when a receiving neuron fires, chemical changes take place at each synapse that was active shortly before the event.

Page 16: Broca’s area

Long Term Potentiation (LTP)

These changes make each of the winning synapses more potent for an intermediate period, lasting from hours to days (LTP).

In addition, repetition of a pattern of successful firing triggers additional chemical changes that lead, in time, to an increase in the number of receptor channels associated with successful synapses, the requisite structural change for long term memory. There are also related processes for weakening synapses and also for strengthening pairs of synapses that are active at about the same time.

Page 17: Broca’s area

The Hebb rule is found with long term potentiation (LTP) in the hippocampus.

[Figure: 1 s stimuli at 100 Hz applied to the Schaffer collateral pathway, recording from pyramidal cells]

Page 18: Broca’s area

With high-frequency stimulation, calcium comes in.

During normal low-frequency transmission, glutamate interacts with NMDA, non-NMDA (AMPA), and metabotropic receptors.

Page 19: Broca’s area

Enhanced Transmitter Release

[Figure: enhanced transmitter release at the synapse; AMPA receptors labeled]

Page 20: Broca’s area

Early and late LTP (Kandel ER, Schwartz JH, Jessell TM (2000) Principles of Neural Science. New York: McGraw-Hill.)

A. Experimental setup for demonstrating LTP in the hippocampus. The Schaffer collateral pathway is stimulated to cause a response in pyramidal cells of CA1.

B. Comparison of EPSP size in early and late LTP, with the early phase evoked by a single train and the late phase by 4 trains of pulses.

Page 21: Broca’s area
Page 22: Broca’s area

Computational Models Based on Hebb’s Rule

The activity-dependent tuning of the developing nervous system, as well as post-natal learning and development, do well by following Hebb’s rule.

Explicit memory in mammals appears to involve LTP in the hippocampus.

Many computational systems for modeling incorporate versions of Hebb’s rule:
• Winner-Take-All: units compete to learn (update their weights); the processing element with the largest output is declared the winner, and lateral inhibition suppresses its competitors.
• Recruitment Learning: learning triangle nodes.
• LTP in episodic memory formation.

Page 23: Broca’s area
Page 24: Broca’s area

WTA: Stimulus ‘at’ is presented

[Diagram: letter units a, t, o connected to category units 1 and 2]

Page 25: Broca’s area

Competition starts at category level


Page 26: Broca’s area

Competition resolves


Page 27: Broca’s area

Hebbian learning takes place


Category node 2 now represents ‘at’

Page 28: Broca’s area

Presenting ‘to’ leads to activation of category node 1



Page 32: Broca’s area

Category 1 is established through Hebbian learning as well


Category node 1 now represents ‘to’
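
A minimal competitive-learning sketch of the walkthrough above, in the spirit of the slides rather than their exact model: two category units compete over the letter units a, t, o, and only the winner updates its weights. The letter coding, the normalization, the learning rate, and the tie-breaking noise are illustrative assumptions.

    # Winner-take-all with a Hebbian update on the winning category unit.
    import random
    random.seed(1)

    letters = ['a', 't', 'o']

    def encode(word):
        # 1 if the letter occurs in the word, else 0; normalised for comparability
        x = [1.0 if ch in word else 0.0 for ch in letters]
        s = sum(x)
        return [v / s for v in x]

    # Each category unit starts with roughly uniform weights summing to 1.
    W = []
    for _ in range(2):
        w = [1.0 + random.uniform(-0.01, 0.01) for _ in letters]
        s = sum(w)
        W.append([v / s for v in w])

    def present(word, lr=0.5):
        x = encode(word)
        nets = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
        winner = nets.index(max(nets))      # competition resolves (lateral inhibition)
        # Hebbian learning on the winner: shift its weights toward the input.
        W[winner] = [wi + lr * (xi - wi) for wi, xi in zip(W[winner], x)]
        return winner

    first = present('at')    # one category node comes to represent 'at'
    second = present('to')   # the other node wins and comes to represent 'to'
    print(first, second)     # two different indices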

Page 33: Broca’s area

Hebb’s rule is not sufficient

What happens if the neural circuit fires perfectly, but the result is very bad for the animal, like eating something sickening? A pure invocation of Hebb’s rule would strengthen all participating connections, which can’t be good.

On the other hand, it isn’t right to weaken all the active connections involved; much of the activity was just recognizing the situation – we would like to change only those connections that led to the wrong decision.

No one knows how to specify a learning rule that will change exactly the offending connections when an error occurs. Computer systems, and presumably nature as well, rely upon statistical learning rules that tend to make the right changes over time. More in later lectures.

Page 34: Broca’s area

Hebb’s rule is insufficient

Should you “punish” all the connections?

[Diagram: an animal eats food and drinks water; a tastebud tastes rotten and the animal gets sick]

Page 35: Broca’s area

Models of Learning

• Hebbian ~ coincidence
• Recruitment ~ one trial
• Supervised ~ correction (backprop)
• Reinforcement ~ delayed reward
• Unsupervised ~ similarity

Page 36: Broca’s area

Recruiting connections

Given that LTP involves synaptic strength changes and Hebb’s rule involves coincident-activation-based strengthening of connections: how can connections between two nodes be recruited using Hebb’s rule?

Page 37: Broca’s area

Recruitment Learning

Suppose we want to link up node X to node Y.

The idea is to pick the two nodes in the middle to link them up.

Can we be sure that we can find a path to get from X to Y?

P(no link) = (1 - F)^(B^K)

The point is, with a fan-out of 1000, if we allow 2 intermediate layers, we can almost always find a path.

[Figure: node X connected through K intermediate layers of N nodes each to node Y; each node has B outgoing links, and F = B/N]

Page 38: Broca’s area


Page 39: Broca’s area


Page 40: Broca’s area

Finding a Connection

P = probability of NO link between X and Y
N = number of units in a “layer”
B = number of randomly outgoing units per unit
F = B/N, the branching factor
K = number of intermediate layers (2 in the example)

P = (1 - F)^(B^K)

# Paths = (1 - P_{K-1}) · N · F = (1 - P_{K-1}) · B

With B = 1000:

K \ N     10^6       10^7      10^8
0         .999       .9999     .99999
1         .367       .905      .989
2         10^-440    10^-44    10^-5
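
The table can be checked directly from the formula; the sketch below evaluates P = (1 - F)^(B^K) for the fan-out B = 1000 assumed in the example. The doubly tiny K = 2, N = 10^6 entry comes out as roughly 1e-435 here, the same “essentially zero” as the table’s 10^-440.

    # Reproduce the table: P(no link) = (1 - F)^(B^K) with F = B/N, B = 1000.
    # B^K makes the exponent huge, so compute log10(P) to avoid underflow.
    import math

    B = 1000
    for K in (0, 1, 2):
        row = []
        for N in (10**6, 10**7, 10**8):
            F = B / N
            log10_P = (B ** K) * math.log10(1.0 - F)
            row.append(round(10 ** log10_P, 5) if log10_P > -30 else f"1e{log10_P:.0f}")
        print(f"K={K}:", row)
    # K=0: [0.999, 0.9999, 0.99999]
    # K=1: [0.3677, 0.90483, 0.99005]
    # K=2: ['1e-435', '1e-43', 5e-05]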

Page 41: Broca’s area

Finding a Connection in Random Networks

For networks with N nodes and branching factor F, there is a high probability of finding good links.

Page 42: Broca’s area

Recruiting a Connection in Random Networks

1. Activate the two nodes to be linked.
2. Have nodes with double activation strengthen their active synapses (Hebb).
3. There is evidence for a “now print” signal based on LTP (episodic memory).
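
A toy sketch of steps 1 and 2, using a single intermediate layer for simplicity (the lecture’s example allows two); the network size and fan-out are arbitrary illustrative values.

    # Recruit intermediate nodes that receive "double activation":
    # reached by X and also feeding Y.
    import random
    random.seed(0)

    N, B = 200, 40                                            # nodes, fan-out
    out = {i: random.sample(range(N), B) for i in range(N)}   # random outgoing links

    def recruit(X, Y):
        from_X = set(out[X])                           # activate X: nodes X reaches
        into_Y = {i for i in range(N) if Y in out[i]}  # activate Y: nodes that feed Y
        return from_X & into_Y                         # doubly active: strengthen X->m and m->Y

    middle = recruit(X=0, Y=1)
    print(f"recruited {len(middle)} intermediate node(s)")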

Page 43: Broca’s area

Recruiting triangle nodes

Let’s say we are trying to remember a green circle. Currently there are only weak connections between the concepts (dotted lines).

[Diagram: concept nodes blue, green, round, oval linked by weak has-color and has-shape connections]

Page 44: Broca’s area

Strengthen these connections and you end up with this picture.

[Diagram: a “Green circle” node bound to green via has-color and to round via has-shape]

Page 45: Broca’s area
Page 46: Broca’s area

[Diagram: a triangle node binding Has-color = Green and Has-shape = Round]

Page 47: Broca’s area

[Diagram: the same binding, with GREEN and ROUND linked via Has-color and Has-shape]

Page 48: Broca’s area

Models of Learning

• Hebbian ~ coincidence
• Recruitment ~ one trial
• Supervised ~ correction (backprop)
• Reinforcement ~ delayed reward
• Unsupervised ~ similarity

Page 49: Broca’s area

Back Propagation

Page 50: Broca’s area

Supervised Learning - Backprop

How do we train the weights of the network?

Basic concepts:
• Use a continuous, differentiable activation function (sigmoid)
• Use the idea of gradient descent on the error surface
• Extend to multiple layers

Page 51: Broca’s area

Backpropagation Algorithm

[Diagram: a layered network; “activations” flow forward, “errors” propagate backward]

Page 52: Broca’s area

Backprop

To learn on data which is not linearly separable:
• Build multiple layer networks (hidden layer)
• Use a sigmoid squashing function instead of a step function

Page 53: Broca’s area

Tasks

• Unconstrained pattern classification
• Credit assessment
• Digit classification
• Function approximation
• Learning control
• Stock prediction

Page 54: Broca’s area

Sigmoid Squashing Function

[Diagram: a unit with inputs y1 … yn and bias input y0 = 1, weights w0, w1, …, wn, and a sigmoid output]

net = Σ_{i=0..n} w_i y_i

y = 1 / (1 + e^(-net))
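
A direct transcription of this unit; the weights in the usage line are the ones that reappear in the worked example a few slides later.

    # Sigmoid unit: net = sum_i w_i y_i with y0 = 1 as the bias input,
    # then y = 1 / (1 + e^(-net)).
    import math

    def sigmoid_unit(weights, inputs):
        ys = [1.0] + list(inputs)                    # y0 = 1 is the bias input
        net = sum(w * y for w, y in zip(weights, ys))
        return 1.0 / (1.0 + math.exp(-net))

    # weights = [w0 (bias), w1, w2]; with zero inputs only the bias contributes
    print(sigmoid_unit([0.5, 0.8, 0.6], [0, 0]))     # 1/(1 + e^-0.5) = 0.622...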

Page 55: Broca’s area

Gradient Descent on an Error Surface

[Plot: error E as a function of a weight W; the slope at the current W gives the direction of descent]

Page 56: Broca’s area

Learning Rule – Gradient Descent on the Root Mean Square (RMS) Error

Learn w_i’s that minimize the squared error:

E[w] = ½ Σ_{k ∈ O} (t_k - o_k)²

where O is the set of output units.

Page 57: Broca’s area

Gradient Descent

Gradient:

∇E[w] = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]

Training rule:

Δw = -η ∇E[w],   i.e.   Δw_i = -η ∂E/∂w_i

with

E[w] = ½ Σ_{k ∈ O} (t_k - o_k)²

Page 58: Broca’s area

Backpropagation Algorithm

Generalization to multiple layers and multiple output units

Page 59: Broca’s area

An informal account of BackProp

For each pattern in the training set:

• Compute the error at the output nodes
• Compute Δw for each weight in the 2nd layer
• Compute δ (the generalized error expression) for the hidden units
• Compute Δw for each weight in the 1st layer

After amassing Δw for all weights, change each weight a little bit, as determined by the learning rate:

Δw_ij = η δ_pj o_pi

Page 60: Broca’s area

[Diagram: three layers indexed k → j → i, with weights w_jk between layers k and j and w_ij between layers j and i; y_i is the output of unit i and t_i its target]

E = Error = ½ Σ_i (t_i - y_i)²

The output layer (η is the learning rate):

W_ij ← W_ij - η ∂E/∂W_ij

∂E/∂W_ij = (∂E/∂y_i)(∂y_i/∂x_i)(∂x_i/∂W_ij) = -(t_i - y_i) f'(x_i) y_j

The derivative of the sigmoid is just y_i (1 - y_i), so

ΔW_ij = η (t_i - y_i) y_i (1 - y_i) y_j = η δ_i y_j,   where δ_i = (t_i - y_i) y_i (1 - y_i)

Page 61: Broca’s area

[Diagram: the same three-layer network k → j → i; now consider a weight w_jk into hidden unit j]

E = Error = ½ Σ_i (t_i - y_i)²

The hidden layer:

W_jk ← W_jk - η ∂E/∂W_jk

∂E/∂W_jk = Σ_i (∂E/∂y_i)(∂y_i/∂x_i)(∂x_i/∂y_j)(∂y_j/∂x_j)(∂x_j/∂W_jk)
         = -Σ_i (t_i - y_i) f'(x_i) W_ij f'(x_j) y_k
         = -[ Σ_i (t_i - y_i) y_i (1 - y_i) W_ij ] y_j (1 - y_j) y_k

ΔW_jk = η δ_j y_k,   where δ_j = [ Σ_i δ_i W_ij ] y_j (1 - y_j)
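
A compact sketch that applies exactly these update rules (δ_i for the output layer, δ_j for the hidden layer) to a one-hidden-layer network trained on XOR. The network size, learning rate, number of epochs, and random seed are illustrative choices; it usually converges, but an unlucky initialization can land in a local minimum, which is the issue the Convergence slide below raises.

    import math, random
    random.seed(0)

    def sig(x): return 1.0 / (1.0 + math.exp(-x))

    n_in, n_hid, n_out, eta = 2, 3, 1, 0.5
    W_jk = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]   # hidden weights (+ bias)
    W_ij = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]  # output weights (+ bias)

    data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]   # XOR

    def forward(x):
        y_k = list(x) + [1.0]                                             # inputs plus bias
        y_j = [sig(sum(w * y for w, y in zip(W_jk[j], y_k))) for j in range(n_hid)] + [1.0]
        y_i = [sig(sum(w * y for w, y in zip(W_ij[i], y_j))) for i in range(n_out)]
        return y_k, y_j, y_i

    for epoch in range(20000):
        for x, t in data:
            y_k, y_j, y_i = forward(x)
            # output layer: delta_i = (t_i - y_i) y_i (1 - y_i)
            d_i = [(t[i] - y_i[i]) * y_i[i] * (1 - y_i[i]) for i in range(n_out)]
            # hidden layer: delta_j = (sum_i delta_i W_ij) y_j (1 - y_j)
            d_j = [sum(d_i[i] * W_ij[i][j] for i in range(n_out)) * y_j[j] * (1 - y_j[j])
                   for j in range(n_hid)]
            # weight changes: Delta W = eta * delta * incoming activation
            for i in range(n_out):
                for j in range(n_hid + 1):
                    W_ij[i][j] += eta * d_i[i] * y_j[j]
            for j in range(n_hid):
                for k in range(n_in + 1):
                    W_jk[j][k] += eta * d_j[j] * y_k[k]

    for x, t in data:
        print(x, t, round(forward(x)[2][0], 3))   # typically near the 0/1 targets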

Page 62: Broca’s area

Let’s just do an example

[Diagram: a single sigmoid unit f with inputs i1, i2 and a bias input b = 1; weights w01 = 0.8, w02 = 0.6, w0b = 0.5; output y0]

E = ½ (t0 - y0)²

Training data (i1, i2 → target t0):
  0  0  →  0
  0  1  →  1
  1  0  →  1
  1  1  →  1

Present (i1, i2) = (0, 0) with target t0 = 0:

x0 = 0 · 0.8 + 0 · 0.6 + 1 · 0.5 = 0.5
y0 = 1 / (1 + e^-0.5) = 0.6224
E = ½ (0 - 0.6224)² = 0.1937

Using ΔW_ij = η δ_i y_j with δ_i = (t_i - y_i) y_i (1 - y_i):

δ_0 = (t0 - y0) y0 (1 - y0) = (0 - 0.6224)(0.6224)(1 - 0.6224) = -0.1463

ΔW_01 = η δ_0 i1 = 0    (i1 = 0)
ΔW_02 = η δ_0 i2 = 0    (i2 = 0)
ΔW_0b = η δ_0 · 1

Suppose the learning rate η = 0.5: ΔW_0b = 0.5 · (-0.1463) = -0.0731, so W_0b becomes 0.5 - 0.0731 ≈ 0.4268.
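
The same arithmetic in a few lines of Python (the slide truncates 0.62246 to 0.6224 and 0.42686 to 0.4268):

    import math

    w1, w2, wb = 0.8, 0.6, 0.5
    i1, i2, t0 = 0.0, 0.0, 0.0
    eta = 0.5

    x0 = w1 * i1 + w2 * i2 + wb * 1.0          # 0.5
    y0 = 1.0 / (1.0 + math.exp(-x0))           # 0.6224...
    E  = 0.5 * (t0 - y0) ** 2                  # 0.1937

    delta0 = (t0 - y0) * y0 * (1 - y0)         # -0.1463
    dw1 = eta * delta0 * i1                    # 0 (input was 0)
    dw2 = eta * delta0 * i2                    # 0
    dwb = eta * delta0 * 1.0                   # -0.0731
    print(round(y0, 4), round(E, 4), round(delta0, 4), round(wb + dwb, 4))
    # 0.6225 0.1937 -0.1463 0.4269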

Page 63: Broca’s area

Momentum term

The speed of learning is governed by the learning rate.
• If the rate is low, convergence is slow.
• If the rate is too high, error oscillates without reaching the minimum.

Momentum tends to smooth small weight error fluctuations:

Δw_ij(n) = η δ_j(n) y_i(n) + α Δw_ij(n - 1),   0 ≤ α < 1

• The momentum accelerates the descent in steady downhill directions.
• The momentum has a stabilizing effect in directions that oscillate in time.
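
A small numeric illustration of the update above; the gradient step and α = 0.9 are arbitrary example values.

    # Momentum: carry over a fraction alpha of the previous weight step.
    def update_with_momentum(w, grad_step, prev_step, alpha=0.9):
        # grad_step = eta * delta * y (the plain backprop step)
        step = grad_step + alpha * prev_step
        return w + step, step            # return step for the next iteration

    w, prev = 0.0, 0.0
    for _ in range(3):
        w, prev = update_with_momentum(w, grad_step=0.1, prev_step=prev)
        print(round(w, 3))               # 0.1, 0.29, 0.561: a steady direction accelerates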

Page 64: Broca’s area

Convergence

• May get stuck in local minima
• Weights may diverge
…but works well in practice.

Representation power:
• 2 layer networks: any continuous function
• 3 layer networks: any function

Page 65: Broca’s area

Local Minimum

Use a random component: simulated annealing.

Page 66: Broca’s area

Overfitting and generalization

Too many hidden nodes tends to cause overfitting.

Page 67: Broca’s area

Overfitting in ANNs

Early stopping: stop training when the error goes up on the validation set.

Page 68: Broca’s area

Stopping criteria

Sensible stopping criteria:

• Total mean squared error change: backprop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]).

• Generalization-based criterion: after each epoch the NN is tested for generalization. If the generalization performance is adequate, then stop. If this stopping criterion is used, then the part of the training set used for testing the network’s generalization will not be used for updating the weights.
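
A sketch of a training loop using both criteria; train_epoch and validation_error are hypothetical placeholders for whatever network and data are being used, and the thresholds are illustrative.

    def train(train_epoch, validation_error, max_epochs=1000,
              min_delta=0.01, patience=5):
        prev_train_err, best_val, bad_epochs = float('inf'), float('inf'), 0
        for epoch in range(max_epochs):
            train_err = train_epoch()                    # one pass over the training data
            if abs(prev_train_err - train_err) < min_delta:
                return epoch                             # converged: error barely changing
            prev_train_err = train_err
            val_err = validation_error()                 # data held out from training
            if val_err < best_val:
                best_val, bad_epochs = val_err, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    return epoch                         # generalization stopped improving
        return max_epochs

    # toy usage: training error keeps falling while validation error turns up
    tr = iter([1.0, 0.6, 0.4, 0.3, 0.25, 0.22, 0.2, 0.19, 0.185, 0.18, 0.178])
    va = iter([1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.48, 0.5, 0.52, 0.55, 0.6])
    print(train(lambda: next(tr), lambda: next(va), patience=3))   # stops early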

Page 69: Broca’s area

Architectural Considerations

What is the right size network for a given job? How many hidden units?
• Too many: no generalization
• Too few: no solution

Possible answer: constructive algorithms, e.g. Cascade Correlation (Fahlman & Lebiere, 1990), etc.

Page 70: Broca’s area

Network Topology

The number of layers and of neurons depend on the specific task. In practice this issue is solved by trial and error.

Two types of adaptive algorithms can be used:
• Start from a large network and successively remove some nodes and links until network performance degrades.
• Begin with a small network and introduce new neurons until performance is satisfactory.

Page 71: Broca’s area

Supervised vs Unsupervised Learning

•Backprop is supervised

•requires a 'target'

•how realistic is that?

•Hebbian learning is unsupervised

•but limited in power

•How can we combine the power of backprop with the idea of unsupervised learning?

Page 72: Broca’s area

Autoassociative Networks

[Diagram: an input layer, a smaller hidden layer, and an output layer; a copy of the input serves as the target]

•Network trained to reproduce the input at the output layer

•Non-trivial if number of hidden units is smaller than inputs/outputs

•Forced to develop compressed representations of the patterns

•Hidden unit representations may reveal natural kinds (e.g. Vowels vs Consonants)

•Problem of explicit teacher is circumvented
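
A sketch of this idea on the classic 4-2-4 encoder problem, reusing the update rules from the backprop sketch earlier; sizes, learning rate, epochs, and seed are illustrative, and plain backprop on this problem can be slow, so the bottleneck codes may separate only approximately.

    # Autoassociative net: the input is also the target, and a 2-unit hidden
    # layer forces a compressed representation of the 4 one-hot patterns.
    import math, random
    random.seed(2)
    sig = lambda x: 1 / (1 + math.exp(-x))

    patterns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    W1 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(2)]   # 4 inputs + bias -> 2 hidden
    W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]   # 2 hidden + bias -> 4 outputs

    def forward(x):
        xk = x + [1.0]
        h = [sig(sum(w * v for w, v in zip(row, xk))) for row in W1] + [1.0]
        o = [sig(sum(w * v for w, v in zip(row, h))) for row in W2]
        return xk, h, o

    for _ in range(20000):
        for p in patterns:                       # the input is also the target
            xk, h, o = forward(p)
            d_o = [(p[i] - o[i]) * o[i] * (1 - o[i]) for i in range(4)]
            d_h = [sum(d_o[i] * W2[i][j] for i in range(4)) * h[j] * (1 - h[j]) for j in range(2)]
            for i in range(4):
                for j in range(3):
                    W2[i][j] += 0.5 * d_o[i] * h[j]
            for j in range(2):
                for k in range(5):
                    W1[j][k] += 0.5 * d_h[j] * xk[k]

    for p in patterns:                           # the 2 hidden units act as a learned code
        print(p, [round(v, 2) for v in forward(p)[1][:2]])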

Page 73: Broca’s area

Problems and Networks

•Some problems have natural "good" solutions

•Solving a problem may be possible by providing the right armoury of general-purpose tools

•Networks are general purpose tools.

•Choice of network type, training, architecture, etc. greatly influences the chances of successfully solving a problem

•Tailoring tools for a specific job vs. exploiting a general purpose learning mechanism

Page 74: Broca’s area

Summary

• Multiple layer feed-forward networks
• Replace step with sigmoid (differentiable) function
• Learn weights by gradient descent on error function
• Backpropagation algorithm for learning
• Avoid overfitting by early stopping

Page 75: Broca’s area

ALVINN drives 70mph on highways

Page 76: Broca’s area

Use MLP Neural Networks when …

• (Vectored) real inputs, (vectored) real outputs
• You’re not interested in understanding how it works
• Long training times acceptable
• Short execution (prediction) times required
• Robust to noise in the dataset

Page 77: Broca’s area

Applications of FFNN

Classification, pattern recognition: FFNN can be applied to tackle non-linearly separable learning problems.
• Recognizing printed or handwritten characters
• Face recognition
• Classification of loan applications into credit-worthy and non-credit-worthy groups
• Analysis of sonar/radar data to determine the nature of the source of a signal

Regression and forecasting: FFNN can be applied to learn non-linear functions (regression) and functions whose input is a sequence of measurements over time (time series).

Page 78: Broca’s area

Extensions of Backprop Nets

• Recurrent architectures
• Backprop through time

Page 79: Broca’s area

Elman Nets & Jordan Nets

• Updating the context as we receive input
• In Jordan nets we model “forgetting” as well
• The recurrent connections have fixed weights
• You can train these networks using good ol’ backprop

[Diagrams: an Elman net, in which the hidden layer is copied (weight 1) into a context layer that feeds back into the hidden layer, and a Jordan net, in which the output is copied into the context layer with a decay factor α]
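
A minimal sketch of one Elman-style time step; the weights are random placeholders (training would use ordinary backprop, as the slide says). In a Jordan net the context would instead hold a decayed copy of the previous output (the α in the right-hand diagram).

    # One Elman step: the previous hidden state is fed back as context.
    import math, random
    random.seed(3)
    sig = lambda x: 1 / (1 + math.exp(-x))

    n_in, n_hid, n_out = 2, 3, 1
    W_h = [[random.uniform(-1, 1) for _ in range(n_in + n_hid + 1)] for _ in range(n_hid)]
    W_o = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]

    def step(x, context):
        z = list(x) + list(context) + [1.0]            # input + context + bias
        hidden = [sig(sum(w * v for w, v in zip(row, z))) for row in W_h]
        out = [sig(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in W_o]
        return out, hidden                             # hidden becomes the next context

    context = [0.0] * n_hid
    for x in ([1, 0], [0, 1], [1, 1]):                 # a short input sequence
        out, context = step(x, context)
        print([round(v, 3) for v in out], [round(h, 3) for h in context])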

Page 80: Broca’s area

Recurrent Backprop

• We’ll pretend to step through the network one iteration at a time.
• Backprop as usual, but average equivalent weights (e.g. all 3 highlighted copies of the same edge in the unrolled diagram are equivalent).

[Diagram: a three-node recurrent network a, b, c with weights w1, w2, w3, w4, unrolled for 3 iterations; each weight appears once per unrolled copy]