
Page 1:

Introduction to Artificial Neural Networks

Kristof Van Laerhoven
kristof@mis.tu-darmstadt.de
http://www.mis.informatik.tu-darmstadt.de/kristof

Page 2:

Course Overview

• We will:

Learn to interpret common ANN formulas and diagrams

Get an extensive overview of the major ANNs

Understand mechanisms behind ANNs

Introduce them with historical background

• Left as exercises:

Implementations of algorithms and details

Evaluations and proofs

Page 3:

The Start of Artificial Neural Nets

• Standard ANN introduction courses just mention:

McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 7:115-133.

• But let's give a few more details…

Warren McCulloch (1898-1972)

Walter Pitts (1921-1969)

‘real neuron’… ‘simple model’… ‘some authors’…

Page 4:

before 1943: Neurobiology

• Trepanation: Brain = ‘cranial stuffing’

• ~400BC: Hippocrates:

Brain = “Seat of intelligence”

• ~350BC: Aristotle: Brain cools blood

• 160: Galen: Brain damage ~ mental functioning

• 1543: Vesalius: anatomy

• 1783: Galvani: electrical excitability of muscles, neurons

Page 5:

before 1943: Neurobiology

• 1898: Camillo Golgi: Golgi Stains

• 1906: Golgi + Santiago Ramón y Cajal: Nobel prize

• ~1900: du Bois-Reymond, Müller, and von Helmholtz:

neurons are electrically excitable

their activity affects the electrical state of adjacent neurons

Page 6:

before 1943: Neurobiology

• In 1943 there already existed a lively community of biophysicists doing mathematical work on neural nets

• Dissections, Golgi stains, and microscopes, but not: Church-Turing for the brain

"One would assume, I think, that the presence of a theory, however strange, in a field in which no theory had previously existed, would have been a spur to the imagination of neurobiologists. But this did not occur at all! The whole field of neurology and neurobiology ignored the structure, the message, and the form of McCulloch's and Pitts's theory. Instead, those who were inspired by it were those who were destined to become the aficionados of a new venture, now called Artificial Intelligence, which proposed to realize in a programmatic way the ideas generated by the theory" -- Lettvin, 1989

‘real’ neurons…

Page 7:

Back to McCulloch-Pitts

• Warren McCulloch: neurophysiologist (philosophy, psychology, MD); the 'McCulloch group'

• Walter Pitts: self-taught logic and maths, read Principia at age 12; homeless, never enrolled as a student, eccentric; wrote a manuscript on 3D networks

• Their seminal 1943 paper: a mathematical (logic) model for the neuron; the nervous system can be considered a kind of universal computing device as described by Leibniz

» Recommended Reading: Dark Hero Of The Information Age: In Search of Norbert Wiener, the Father of Cybernetics, and search the web for Jerome Lettvin

"What the frog's eye tells the frog's brain" (1959)

Page 8:

(Hebbian) Learning

• Previously:

Pavlov, 1927: Classical Conditioning

Thorndike, 1898: Operant Conditioning

• Donald Hebb (psychology, teacher): The Organization of Behavior

What fires together, wires together: "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased" -- D. Hebb

• Others: Stephen Grossberg, Gail Carpenter: Adaptive Resonance Theory

Page 9:

The Perceptron

° 1957: Frank Rosenblatt, "The perceptron: A perceiving and recognizing automaton (project PARA)."

[Diagram: a single neuron with inputs x1, x2, x3, …, weights w1, w2, w3, …, bias b, and output y]

x: inputs;  w: weights;  b: bias

0 < η < 1: learning rate

sign(y): output (usually -1 or 1)

yT: target output (usually -1 or 1)

$y = \sum_{i=1,2,3,\ldots} w_i x_i + b$

Page 10:

The Perceptron

[Diagram: the same neuron, its output now compared against the target yT]

x: inputs;  w: weights;  b: bias;  0 < η < 1: learning rate;  sign(y): output;  yT: target output

$y = \sum_{i=1,2,3,\ldots} w_i x_i + b$

Learning / Training Phase (Widrow-Hoff, or Delta Rule; sketched in code below):
1. Start with random w1, w2, w3, …, b (usually near 0)
2. Calculate y for the current input x1, x2, x3, …
3. Update weights and bias: $w_i' = w_i + \eta\,(y_T - y)\,x_i$
4. Go to step 2
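To make the loop concrete, here is a minimal sketch of this training phase in Python (the function name, defaults, and near-zero initialisation range are illustrative choices, not from the slides):

```python
import random

def train_perceptron(samples, eta=0.3, epochs=5):
    """Delta-rule training. samples: list of (inputs, target) pairs,
    with inputs a sequence of numbers and target -1 or +1."""
    n = len(samples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]  # step 1: near-zero weights
    b = random.uniform(-0.05, 0.05)                      # ... and bias
    for _ in range(epochs):
        for x, y_t in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # step 2: compute y
            w = [wi + eta * (y_t - y) * xi                # step 3: delta rule
                 for wi, xi in zip(w, x)]
            b = b + eta * (y_t - y)                       # bias updated likewise
    return w, b
```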

Page 11:

The Perceptron: Example

• Teach the neuron to spot male authors of emails on a mailing list, given two inputs

x1: counts of {'around', 'what', 'more'} minus counts of {'with', 'if', 'not'}

x2: counts of {'at', 'it', 'many'} minus counts of {'myself', 'hers', 'was'}

• Learning by examples, with negative samples and positive samples

• (loosely based on http://www.bookblog.net/gender/genie.php and "Gender, Genre, and Writing Style in Formal Written Texts", Shlomo Argamon, Moshe Koppel, Jonathan Fine, Anat Rachel Shimoni)

Sender:     Subject:                (x1,x2)
Kristof     Coffee machine h…       (6,1)
Alan        Re: UN petition fo…     (3,2)
Cath        Printer on B floor i…   (-2,1)
Hans        Visitors on Mon/Tue     (5,4)
Christine   Meeting room info…      (-2,-1)
Gordon      Camera Equipme…         (5,-2)
…           …                       …
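Fed into the train_perceptron sketch above, these samples could look as follows (the +1/-1 labelling of male vs. other authors is my reading of the task, not stated explicitly on the slide):

```python
samples = [
    (( 6,  1), +1),  # Kristof
    (( 3,  2), +1),  # Alan
    ((-2,  1), -1),  # Cath
    (( 5,  4), +1),  # Hans
    ((-2, -1), -1),  # Christine
    (( 5, -2), +1),  # Gordon
]
w, b = train_perceptron(samples, eta=0.3, epochs=5)
```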

Page 12:

The Perceptron: Example

• Teach the neuron to spot male authors of emails on a mailing list, given two inputs

x1: counts of {'around', 'what', 'more'} minus counts of {'with', 'if', 'not'}

x2: counts of {'at', 'it', 'many'} minus counts of {'myself', 'hers', 'was'}

• Learning by examples, with negative samples and positive samples

• (loosely based on http://www.bookblog.net/gender/genie.php and "Gender, Genre, and Writing Style in Formal Written Texts", Shlomo Argamon, Moshe Koppel, Jonathan Fine, Anat Rachel Shimoni)

[Scatter plot: the six samples in the (x1, x2) plane, labelled alan, hans, kristof, gordon, christine, and cath]

Instead of using the whole mails, we use only keyword counts. Such 'pre-processed' values are called features in the machine learning literature.

Page 13:

The Perceptron: Example

Initial values:

w1 = 1/2,  w2 = -1/2,  b = 0

$y = w_1 x_1 + w_2 x_2 + b = (1/2)\,x_1 + (-1/2)\,x_2 + 0$

Decision boundary: $\frac{1}{2}x_1 - \frac{1}{2}x_2 + 0 = 0$, splitting the plane into the regions sign(y) = -1 and sign(y) = +1

[Plot: the samples in the (x1, x2) plane with this boundary line]

Page 14:

The Perceptron: Example

Values after training once on all samples (with η = 0.3):

w1 = 2/5,  w2 = -1/3,  b = -2/7

$y = w_1 x_1 + w_2 x_2 + b = (2/5)\,x_1 + (-1/3)\,x_2 + (-2/7)$

Decision boundary: $\frac{2}{5}x_1 - \frac{1}{3}x_2 - \frac{2}{7} = 0$, with the yT = -1 and yT = +1 samples on either side

[Plot: the samples and the updated boundary line]

Page 15:

The Perceptron: Example

Values after training 5 times on all samples (with η = 0.3):

w1 = 9/12,  w2 = -2/23,  b = -2

$y = w_1 x_1 + w_2 x_2 + b = (9/12)\,x_1 + (-2/23)\,x_2 + (-2)$

Decision boundary: $\frac{9}{12}x_1 - \frac{2}{23}x_2 - 2 = 0$

[Plot: the samples and the boundary after five passes]

Page 16:

The Perceptron’s Delta Rule

• Delta Rule or Gradient Descent:

• Error: $E = \frac{(y_T - y)^2}{2}$

• Calculate the partial derivative of the error w.r.t. each weight:

$\frac{\partial E}{\partial w_1} = -(y_T - y)\,x_1$

$\Delta w_1 = w_1' - w_1 = \eta\,(y_T - y)\,x_1$

Local minima only!
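The chain-rule step between these two lines, written out (using $y = \sum_i w_i x_i + b$ from the earlier slides):

```latex
\frac{\partial E}{\partial w_1}
  = (y_T - y)\,\frac{\partial (y_T - y)}{\partial w_1}
  = -(y_T - y)\,\frac{\partial y}{\partial w_1}
  = -(y_T - y)\,x_1,
\qquad
\Delta w_1 = -\eta\,\frac{\partial E}{\partial w_1} = \eta\,(y_T - y)\,x_1 .
```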

Page 17:

The Perceptron

• The perceptron has 2 phases:

Training/Learning Phase: adaptive weights

Testing Phase: for checking performance and generalisation

• Neurons and their outputs are grouped in 1 layer:

[Diagram: a single-layer network with inputs x1 … xn, weights w11 … and biases b1 … bm, feeding m neurons with outputs y1 … ym]

Page 18:

Output Functions

• Output function per neuron: g(y) limits and scales the output (see the sketch below)

Simple threshold

Sigmoid: $g(x) = \frac{1}{1 + e^{-x}}$  or  $g(x) = \tanh(x)$

Gaussian: $g(x) = e^{-x^2/2}$

Piecewise linear: $g(x) = \begin{cases} 1 & 1 < x \\ \frac{x+1}{2} & -1 < x < 1 \\ 0 & x < -1 \end{cases}$
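As a quick reference, these output functions in Python (a sketch; the ±1 convention for the threshold is my assumption, matching sign(y) from the perceptron slides):

```python
import math

def threshold(x):                 # simple threshold, as in sign(y)
    return 1.0 if x >= 0 else -1.0

def sigmoid(x):                   # 1 / (1 + e^-x), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def gaussian(x):                  # e^(-x^2/2), peaks at x = 0
    return math.exp(-x * x / 2.0)

def piecewise_linear(x):          # clamps a linear ramp to [0, 1]
    if x > 1.0:
        return 1.0
    if x < -1.0:
        return 0.0
    return (x + 1.0) / 2.0
```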

Page 19:

Parameters

• Bias can be seen as a weight to a constant input 1, or an input with constant weight 1

• Output parameters (e.g., steepness for the sigmoid)

• Momentum term to avoid local minima

• Initialisation: use a 'Committee of networks' to avoid random-init effects (many networks training on the same data, with differently initialised weights)

Page 20:

The Perceptron’s Issues

° 1969: Marvin Minsky and Seymour Papert, "Perceptrons", Cambridge, MA: MIT Press

=> Non-linearly separable functions are not solvable!

[Diagram: a single perceptron with inputs x1, x2, weights w1, w2, bias b, and output y]

$y = \sum_{i=1,2,3,\ldots} w_i x_i + b$

XOR?
x1=0, x2=0 -> yT=0
x1=1, x2=0 -> yT=1
x1=0, x2=1 -> yT=1
x1=1, x2=1 -> yT=0
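A quick way to convince yourself (my own illustration, not from the slides): a brute-force search over a grid of (w1, w2, b) values finds no line that separates XOR.

```python
import itertools

xor = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}

def separates(w1, w2, b):
    # True if the sign of w1*x1 + w2*x2 + b matches the target everywhere
    return all((w1 * x1 + w2 * x2 + b > 0) == (t == 1)
               for (x1, x2), t in xor.items())

grid = [i / 10.0 for i in range(-20, 21)]
print(any(separates(*p) for p in itertools.product(grid, repeat=3)))  # False
```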

Page 21:

Backpropagation

° P. Werbos in 1974

° Compiled in 1986: D.E. Rumelhart, J.L. McClelland, "Parallel Distributed Processing: Explorations in the Microstructure of Cognition"

• Network: multi-layer perceptron: same neurons, but with 'hidden' layers

• Training propagates per layer, from output back to input weights

[Diagram: multi-layer perceptron with an (input layer), 1 hidden layer, and an output layer]

Page 22:

Backpropagation Test Phase

• Inputs are entered into the network


Page 23:

Backpropagation Test Phase

• Inputs are entered into the network

• Weighted sums calculate…


Page 24:

Backpropagation Test Phase

• Inputs are entered into the network

• Weighted sums calculate…

• Outputs of hidden layer neurons


Page 25:

Backpropagation Test Phase

• Inputs are entered into the network

• Weighted sums calculate…

• Outputs of hidden layer neurons

• Which are weighted again to calculate…


Page 26:

Backpropagation Test Phase

• Inputs are entered into the network

• Weighted sums calculate…

• Outputs of hidden layer neurons

• Which are weighted again to calculate…

• The outputs of the neurons in the output layer… finally!


Page 27:

Backpropagation Training

• In the training phase, we also know the target outputs for the input…


Page 28:

Backpropagation Training

• In the training phase, we also know the target outputs for the input…

• Which can be used to re-adjust the weights going to the output layer


Page 29:

Backpropagation Training

• In the training phase, we also know the target outputs for the input…

• Which can be used to re-adjust the weights going to the output layer

• Which lead to target outputs for the middle layer neurons


Page 30:

Backpropagation Training

• In the training phase, we also know the target outputs for the input…

• Which can be used to re-adjust the weights going to the output layer

• Which lead to target outputs for the middle layer neurons

• Which re-adjust the remaining weights (just like in the perceptron); the sketch below puts the two phases together

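A compact sketch of forward and backward pass in one place (my own illustration: a 2-2-1 network with sigmoid units, trained on XOR; the layer sizes, learning rate, and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(0, 1, (2, 2)), np.zeros(2)  # input -> hidden
W2, b2 = rng.normal(0, 1, (2, 1)), np.zeros(1)  # hidden -> output
eta = 0.5

for _ in range(20000):
    # Forward pass (the test phase of the previous slides)
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: output-layer deltas, then hidden-layer deltas
    d_out = (T - y) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # Re-adjust the weights layer by layer, from output back to input
    W2 += eta * h.T @ d_out;  b2 += eta * d_out.sum(axis=0)
    W1 += eta * X.T @ d_hid;  b1 += eta * d_hid.sum(axis=0)

print(np.round(y).ravel())  # typically [0. 1. 1. 0.]
```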

Page 31:

Backpropagation

• XOR works!

[Diagram: a multi-layer perceptron with neurons y1, y2 (hidden) and y3 (output), with the trained values 14.6, -12, -10.4, -6.5, -12, 14.6, -11.9, 4.3, -8.3 as its six weights and three biases, shown producing the correct output for each of the four XOR input combinations]

• Note: This network could be re-trained for OR, AND, NOR, or NAND

Page 32:

Backpropagation

• In formulas:
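The slide's equations did not survive extraction; a standard statement of the backpropagation updates (my reconstruction, assuming the squared error from the delta-rule slide and an output function g, with a_j the weighted sum into neuron j) reads:

```latex
\delta_k = (y_{T,k} - y_k)\, g'(a_k) \quad \text{(output layer)}
\qquad
\delta_j = g'(a_j) \sum_k w_{jk}\, \delta_k \quad \text{(hidden layer)}
\qquad
\Delta w_{ij} = \eta\, \delta_j\, x_i
```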

Page 33:

Multilayer Perceptron Apps

• 1988: Gorman and Sejnowski, "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets"

Page 34:

Multilayer Perceptron Apps

• ALVINN (Pomerleau, 1993)

• Image Compression using Backprop (Paul Watta, Brijesh Desaie, Norman Dannug, Mohamad Hassoun, 1996)

Page 35:

Underfitting and Overfitting

• Underfitting: an ANN that is not sufficiently complex to correctly detect the pattern in a noisy data set

• Overfitting: an ANN that is too complex, so it reacts to noise in the data

• Techniques to avoid them (see the sketch after this list):

Model selection

Jittering

Weight decay

Early stopping

Bayesian estimation
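Of these, early stopping is the easiest to show in code. A generic sketch (all names are my own; `update` performs one training step, `val_error` measures error on a held-out validation set):

```python
def train_with_early_stopping(update, val_error, patience=10):
    """Stop once validation error has not improved for `patience` steps."""
    best, since_best = float("inf"), 0
    while since_best < patience:
        update()                       # one training step on the training set
        err = val_error()              # error on data the network never trains on
        if err < best:
            best, since_best = err, 0  # improvement: reset the counter
        else:
            since_best += 1            # no improvement: one step closer to stopping
    return best
```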

Page 36:

Overview so far…

• These are the most popular neural network architectures and their training algorithms:

Perceptron - gradient descent / delta rule
• Only linearly separable functions
• One layer

Multilayer perceptron - backpropagation
• Hidden layer(s)
• More powerful
• Training: errors are propagated back towards input weights
• Many applications

• They are SUPERVISED: we train them with inputs plus known target outputs

Page 37:

The Kohonen Map

• Also: Self-Organising (Feature) Map

• Teuvo Kohonen (1982), "Self-organized formation of topologically correct feature maps"

• 'Competition' among neurons for the input

• Topology among neurons

• Unsupervised

$w_i' = w_i + \eta\,(x_i - w_i)$   (η: learning rate, scaled by the neighbourhood)

[Diagram: an input vector (x1 … x5) compared against the weight vectors (w1 … w5) of neurons arranged in a grid]

Page 38:

The Kohonen Map: Example

• Randomly initialised grid of (weight-)vectors:

(72,19)  (22,93)  (24,91)  (31,26)
(35,102) (95,67)  (30,11)  (64,19)
(12,19)  (83,36)  (16,13)  (24,74)
(93,10)  (72,56)  (75,55)  (61,42)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 39:

The Kohonen Map: Example

• Randomly initialised grid of (weight-)vectors:

• Input: (82,41)

(72,19)  (22,93)  (24,91)  (31,26)
(35,102) (95,67)  (30,11)  (64,19)
(12,19)  (83,36)  (16,13)  (24,74)
(93,10)  (72,56)  (75,55)  (61,42)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 40:

The Kohonen Map: Example

• Randomly initialised grid of (weight-)vectors:

• Input: (82,41); the weight vector with minimum distance to the input wins

(72,19)  (22,93)  (24,91)  (31,26)
(35,102) (95,67)  (30,11)  (64,19)
(12,19)  (83,36)  (16,13)  (24,74)
(93,10)  (72,56)  (75,55)  (61,42)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 41:

The Kohonen Map: Example

• 'Winner Takes All': η = 1 for the closest neuron

• Step 1: input (82,41); the winning cell is updated to (82,39)

(72,19)  (22,93)  (24,91)  (31,26)
(35,102) (95,67)  (30,11)  (64,19)
(12,19)  (82,39)  (16,13)  (24,74)
(93,10)  (72,56)  (75,55)  (61,42)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 42:

The Kohonen Map: Example

• Neighbours: η is smaller for the surrounding neurons

• Step 1 (continued): the winner sits at (82,39); its neighbours also move towards the input

(72,19)  (22,93)  (24,91)  (31,26)
(41,77)  (91,52)  (53,20)  (64,19)
(80,28)  (82,39)  (34,21)  (24,74)
(88,29)  (78,46)  (77,37)  (61,42)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 43:

The Kohonen Map: Example

• Neighbours: η gets smaller and smaller…

• Step 1 (continued): the winner stays at (82,39)

(74,21)  (26,89)  (27,91)  (38,27)
(41,77)  (91,52)  (53,20)  (68,22)
(80,28)  (82,39)  (34,21)  (32,69)
(88,29)  (78,46)  (77,37)  (63,41)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 44:

The Kohonen Map: Example

• Different inputs activate different regions

• Step 2: new input: (70,22); the previous winner remains at (82,39)

(74,21)  (26,89)  (27,91)  (38,27)
(41,77)  (91,52)  (53,20)  (68,22)
(80,28)  (82,39)  (34,21)  (32,69)
(88,29)  (78,46)  (77,37)  (63,41)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 45:

The Kohonen Map: Example

• Different inputs activate different regions

• Step 2 (continued): input (70,20); the earlier winner has drifted to (79,37)

(74,21)  (31,79)  (33,72)  (46,23)
(44,71)  (90,47)  (57,21)  (68,22)
(80,28)  (79,37)  (39,21)  (38,47)
(88,29)  (76,41)  (77,37)  (58,37)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 46:

The Kohonen Map: Example

• The map organises along the input space

• Step 3: new input: (76,20); the earlier winner now sits at (78,36)

(74,21)  (43,61)  (37,69)  (46,23)
(52,65)  (89,45)  (61,34)  (68,22)
(79,27)  (78,36)  (40,21)  (38,47)
(88,29)  (76,41)  (77,37)  (58,37)

$w_i' = w_i + \eta\,(x_i - w_i)$

Page 47:

The Kohonen Map

• The neighbourhood and learning rate are controlled in 2 phases:

Self-Organising / Ordering phase: topological ordering of the weight vectors (η and neighbourhood large)

Convergence phase: fine-tuning of the map (η and neighbourhood small)

[Plot: η / neighbourhood size decreasing from large to small as inputs are presented]
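A compact sketch of the whole procedure (my own illustration; the Gaussian neighbourhood and the linear decay schedules are common choices, not prescribed by the slides):

```python
import math, random

def train_som(grid, inputs, steps=1000, eta0=0.5, radius0=2.0):
    """grid[r][c] is the weight vector of the neuron at position (r, c)."""
    rows, cols = len(grid), len(grid[0])
    for t in range(steps):
        x = random.choice(inputs)
        frac = t / steps
        eta = eta0 * (1.0 - frac)                   # learning rate decays...
        radius = max(radius0 * (1.0 - frac), 0.5)   # ...and so does the neighbourhood
        # Winner: the neuron whose weight vector is closest to the input
        br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: sum((w - xi) ** 2
                                        for w, xi in zip(grid[rc[0]][rc[1]], x)))
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # neighbourhood factor
                grid[r][c] = [w + eta * h * (xi - w)     # move towards the input
                              for w, xi in zip(grid[r][c], x)]
    return grid
```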

Page 48:

Kohonen Map: Applications

• Detecting structure in input data (NNRC, 1997)

Input: 39 indicators describing various quality-of-life factors, such as state of health, nutrition, educational services

Page 49:

Kohonen Map: Applications

• Traveling Salesman Problem (Fritzke & Wilke, 1990)

• 1D ring structure

Page 50:

Kohonen Map: Applications

• Visualising sensor data from different contexts (Van Laerhoven, 1999)

Page 51:

The Hopfield Network

• 1982: John Hopfield, "Neural networks and physical systems with emergent collective computational abilities"

• Associative Memory: 'pony'

• Simple threshold units

• Outputs are -1 or +1 (sometimes 0 or 1):

$y = \begin{cases} +1 & \text{if } \sum_{i=1,2,3,\ldots} w_i x_i > \theta \\ -1 & \text{otherwise} \end{cases}$

• Connections are typically symmetric!

Page 52:

The Hopfield Network

• States:

[Diagram: a three-neuron Hopfield network with symmetric connections between neurons 1, 2, and 3]

state | 1 2 3 | new state if 1 fires | if 2 fires | if 3 fires
  0   | 0 0 0 |          4           |     2      |     1
  1   | 0 0 1 |          1           |     3      |     1
  2   | 0 1 0 |          6           |     2      |     3
  3   | 0 1 1 |          3           |     3      |     3
  4   | 1 0 0 |          4           |     6      |     4
  5   | 1 0 1 |          1           |     7      |     4
  6   | 1 1 0 |          6           |     6      |     6
  7   | 1 1 1 |          3           |     7      |     6

(state = the bits of neurons 1, 2, 3 read as a binary number)

Page 53:

Energy Functions

• The energy function of the network is a scalar value used for finding local minima, regarded as solutions

• Popular method for neural network design

$E = -\frac{1}{2}\sum_{i<j} w_{ij}\, s_i s_j + \sum_i \theta_i s_i$

'solutions' = memorised states for the Hopfield net
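For reference, asynchronous updates and this energy in Python (a sketch under my own conventions: ±1 states, zero thresholds, symmetric weight matrix `w` with `w[i][i] = 0`):

```python
import random

def recall(w, s, steps=100):
    """Asynchronous Hopfield updates: repeatedly pick a neuron and let it fire."""
    n = len(s)
    for _ in range(n * steps):
        i = random.randrange(n)
        a = sum(w[i][j] * s[j] for j in range(n))  # weighted input to neuron i
        s[i] = 1 if a > 0 else -1                  # simple threshold unit
    return s

def energy(w, s):
    """E = -1/2 sum_{i,j} w_ij s_i s_j (thresholds taken as zero)."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))
```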

Page 54:

The Hopfield Network

• The same state table, now drawn as a transition diagram:

[State-transition diagram over states 0-7: repeated firing drives every state into one of the two stable states, 3 (0 1 1) and 6 (1 1 0), the network's memorised patterns]

Page 55:

Hopfield Network Apps

• Pattern recognition:

Page 56:

Elman Network

• Jeff Elman 1990: "Finding structure in time"

• Sequences in the input: implicit representation of time

• Hidden neuron(s) copied as an input

• Backpropagation as usual

[Diagram: Elman network with input neurons and a context neuron feeding hidden neurons, which feed an output neuron; the hidden activations are copied back into the context neuron]

Page 57:

Elman Network


• Input is given

Page 58:

Elman Network


• Input is given

• Hidden neurons' output functions are calculated

Page 59:

Elman Network


• Input is given

• Hidden neurons' output functions are calculated

• Output is calculated and the hidden neurons' output is copied

Page 60:

Elman Network


• Input is given

• Hidden neurons' output functions are calculated

• Output is calculated and the hidden neurons' output is copied

• The next input… (one full time step is sketched in code below)
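One time step of this loop in code (a sketch; the tanh output function and all names are my assumptions):

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, W_out):
    """One Elman step: the hidden layer sees the input plus the
    previous step's hidden activations (the 'context')."""
    h = np.tanh(W_in @ x + W_ctx @ context)  # hidden layer
    y = np.tanh(W_out @ h)                   # output layer
    return y, h                              # h becomes the next context

# Usage: thread the context through a sequence, starting from zeros.
#   context = np.zeros(n_hidden)
#   for x in sequence:
#       y, context = elman_step(x, context, W_in, W_ctx, W_out)
```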

Page 61:

Elman Network Apps

• Learning formal grammars: which strings belong to a language?

• Speech recognition: Recognition of Auditory Sound Sources and Events (Stephen McAdams, 1990)

• Music composition: Neural network music composition by prediction: Exploring the benefits of psychophysical constraints and multiscale processing (Michael Mozer, 1994)

• Activity recognition:

Page 62:

Terminology

Architectures:

• McCulloch-Pitts

• Adaptive Resonance

Theory

• Perceptron

• Multilayer Perceptron

• Kohonen Map

• Hopfield Network

• Elman Network

Learning:

• Hebbian

• Delta rule

• Backpropagation

• (Un)Supervised

• Competitive Learning

• Associative Memory

• Energy

Page 63:

Architectures

[Diagrams of the five architectures: perceptron, multilayer perceptron, Hopfield net, Kohonen map, Elman network]

Page 64:

Training

perceptron:            delta rule / gradient descent (supervised)
multilayer perceptron: backpropagation (supervised)
Hopfield net:          associative memory (unsupervised)
Kohonen map:           competitive learning (unsupervised)
Elman network:         backpropagation (supervised)

Page 65:

Number of Inputs

(for the example networks drawn on the Architectures slide)

perceptron: 4    multilayer perceptron: 4    Hopfield net: 3
Kohonen map: 5   Elman network: 2

Page 66:

Number of Weights

perceptron: 12   multilayer perceptron: 24   Hopfield net: 3
Kohonen map: 60  Elman network: 9

Page 67:

Number of Outputs

perceptron: 3    multilayer perceptron: 4    Hopfield net: 3
Kohonen map: 12  Elman network: 1

Page 68:

Further Research

• Spiking / Pulsed Neural Networks

'Spike trains': the timing of pulses is crucial

Closer to the biological neuron

[Plot: action potential p over time t, with spike times t1,1, t1,2, t2,1]

Page 69:

Further Research

• Principal Component Analysis

Dimension reduction

• Independent Component Analysis

Cocktail party problem

‘blind source detection’

Page 70:

Further Research

• Support Vector Machines

[Plot: two classes of samples in the (x1, x2) plane, to be separated with a maximum-margin boundary]