
Neural networks

Chapter 20


Outline

♦ Brains

♦ Neural networks

♦ Perceptrons

♦ Multilayer networks

♦ Applications of neural networks


Brains

10^11 neurons of > 20 types, 10^14 synapses, 1 ms–10 ms cycle time

Signals are noisy “spike trains” of electrical potential

[Figure: schematic of a neuron: cell body or soma containing the nucleus, dendrites, the axon with its axonal arborization, and synapses where axons from other cells make contact]


McCulloch–Pitts “unit”

Output is a “squashed” linear function of the inputs:

ai ← g(ini) = g(Σj Wj,i aj)

[Figure: a unit: input links deliver activations aj, each weighted by Wj,i, plus a bias weight W0,i on the fixed input a0 = −1; the input function computes ini = Σj Wj,i aj, the activation function g produces ai = g(ini), which is passed along the output links]


Activation functions

[Figure: two activation functions g(ini) plotted against ini, each saturating at +1: (a) and (b)]

(a) is a step function or threshold function

(b) is a sigmoid function 1/(1 + e^−x)

Changing the bias weight W0,i moves the threshold location
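A minimal Python sketch (not from the slides) of a single unit with both activation functions; the a0 = −1 bias convention follows the diagram, and the example weights are arbitrary:

import math

def step(x):
    return 1.0 if x > 0 else 0.0          # threshold activation, figure (a)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))     # sigmoid activation, figure (b)

def unit_output(weights, inputs, g=step):
    # weights = [W0, W1, ..., Wn]; inputs = [a1, ..., an]; a0 = -1 is the fixed bias input
    in_i = weights[0] * -1.0 + sum(w * a for w, a in zip(weights[1:], inputs))
    return g(in_i)

print(unit_output([1.5, 1.0, 1.0], [1, 1]))           # fires: 1.0
print(unit_output([1.5, 1.0, 1.0], [1, 0], sigmoid))  # soft output below 0.5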


Implementing logical functions

McCulloch and Pitts: every Boolean function can be implemented (with a large enough network)

Can a single unit implement AND? OR? NOT? MAJORITY?

AND: W0 = 1.5, W1 = 1, W2 = 1

OR: W0 = 0.5, W1 = 1, W2 = 1

NOT: W0 = −0.5, W1 = −1
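A quick Python check (illustrative only) that these weight settings reproduce the Boolean truth tables, using the a0 = −1 bias convention:

def threshold_unit(w0, ws, xs):
    # fires iff -W0 + sum_j Wj*xj > 0 (bias input a0 = -1 with weight W0)
    return int(-w0 + sum(w * x for w, x in zip(ws, xs)) > 0)

AND = lambda x1, x2: threshold_unit(1.5, [1, 1], [x1, x2])
OR  = lambda x1, x2: threshold_unit(0.5, [1, 1], [x1, x2])
NOT = lambda x1: threshold_unit(-0.5, [-1], [x1])

for x1 in (0, 1):
    print("NOT", x1, "=", NOT(x1))
    for x2 in (0, 1):
        print(x1, "AND", x2, "=", AND(x1, x2), "  ", x1, "OR", x2, "=", OR(x1, x2))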


Network structures

Feed-forward networks:
– single-layer perceptrons
– multi-layer networks

Feed-forward networks implement functions, have no internal state

Recurrent networks:
– Hopfield networks have symmetric weights (Wi,j = Wj,i);
  g(x) = sign(x), ai = ±1; holographic associative memory
– Boltzmann machines use stochastic activation functions, ≈ MCMC in BNs
– recurrent neural nets have directed cycles with delays
  ⇒ have internal state (like flip-flops), can oscillate etc.
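A toy Python sketch of Hopfield-style recall; the slides only state the symmetric weights and sign activation, so the Hebbian outer-product weight construction and the example patterns below are assumptions:

import numpy as np

patterns = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1, 1, -1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)                 # symmetric weights, no self-connections

state = np.array([-1, 1, 1, 1, -1, -1, -1, -1], dtype=float)   # first pattern, one bit flipped
for _ in range(5):                       # repeated sign updates settle on a stored pattern
    state = np.sign(W @ state)
print(state.astype(int))                 # recovers the first stored pattern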


Feed-forward example

[Figure: a feed-forward network with input units 1 and 2, hidden units 3 and 4, and output unit 5, connected by weights W1,3, W1,4, W2,3, W2,4, W3,5, W4,5]

Feed-forward network = a parameterized family of nonlinear functions:

a5 = g(W3,5 · a3 + W4,5 · a4)

= g(W3,5 · g(W1,3 · a1 + W2,3 · a2) + W4,5 · g(W1,4 · a1 + W2,4 · a2))
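The same computation written directly in Python; the numeric weight values are made up purely to make the example concrete:

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))    # sigmoid activation

# arbitrary example weights W[(j, i)] for the links drawn above
W = {(1, 3): 0.5, (2, 3): -0.4, (1, 4): 0.9, (2, 4): 0.1, (3, 5): 1.2, (4, 5): -0.3}

def h(a1, a2):
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)
    return g(W[(3, 5)] * a3 + W[(4, 5)] * a4)

print(h(1.0, 0.0))   # one member of the parameterized family of nonlinear functions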


Perceptrons

[Figure: a single-layer perceptron network, with input units connected directly to output units by weights Wj,i, and a plot of the output of a two-input sigmoid perceptron as a function of x1 and x2]


Expressiveness of perceptrons

Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)

Can represent AND, OR, NOT, majority, etc.

Represents a linear separator in input space:

Σj Wj xj > 0,  i.e.,  W · x > 0

[Figure: points in the (I1, I2) plane labelled with the outputs of (a) I1 and I2, (b) I1 or I2, (c) I1 xor I2; a linear separator exists for (a) and (b) but not for (c)]
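A brute-force Python illustration (not from the slides): search a small grid of weights for a threshold unit matching each Boolean function; AND and OR are found, XOR is not (and in fact no separating weights exist):

from itertools import product

def find_weights(target):
    # search W0, W1, W2 over a coarse grid for a unit computing target(x1, x2)
    grid = [v / 2 for v in range(-4, 5)]              # -2.0, -1.5, ..., 2.0
    for w0, w1, w2 in product(grid, repeat=3):
        if all(int(-w0 + w1 * x1 + w2 * x2 > 0) == target(x1, x2)
               for x1 in (0, 1) for x2 in (0, 1)):
            return (w0, w1, w2)
    return None

print("AND:", find_weights(lambda a, b: a & b))
print("OR: ", find_weights(lambda a, b: a | b))
print("XOR:", find_weights(lambda a, b: a ^ b))       # None: no linear separator exists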



Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

∂E

∂Wj= Err ×

∂Err

∂Wj= Err ×

∂Wj

(

y − g(Σnj = 0

Wjxj))

= −Err × g′(in)× xj

Simple weight update rule:

Wj ← Wj + α × Err × g′(in) × xj

E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on −ve inputs


Perceptron learning

W = random initial values
for iter = 1 to T
    for i = 1 to N (all examples)
        x = input for example i
        y = output for example i
        Wold = W
        Err = y − g(Wold · x)
        for j = 1 to M (all weights)
            Wj = Wj + α · Err · g′(Wold · x) · xj
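A direct Python rendering of this pseudocode for a sigmoid unit; the toy training set (the OR function), learning rate, and iteration count are illustrative choices, and g′(x) = g(x)(1 − g(x)) is taken from the following slide:

import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))                 # derived on the following slide

# toy training set: the OR function, with x[0] = -1 as the fixed bias input a0
examples = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]

random.seed(0)
W = [random.uniform(-0.5, 0.5) for _ in range(3)]     # random initial values
alpha, T = 0.5, 2000

for _ in range(T):
    for x, y in examples:
        in_old = sum(w * xi for w, xi in zip(W, x))   # W_old . x
        err = y - g(in_old)
        W = [w + alpha * err * gp(in_old) * xi for w, xi in zip(W, x)]

for x, y in examples:
    print(y, round(g(sum(w * xi for w, xi in zip(W, x))), 2))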



Perceptron learning contd.

Derivative of sigmoid g(x) can be written in simple form:

g(x) = 1/(1 + e^−x)

g′(x) = e^−x/(1 + e^−x)² = e^−x g(x)²

Also,

g(x) = 1/(1 + e^−x) ⇒ g(x) + e^−x g(x) = 1 ⇒ e^−x = (1 − g(x))/g(x)

So

g′(x) = [(1 − g(x))/g(x)] · g(x)² = (1 − g(x)) g(x)
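A quick numerical check of this identity using a central finite difference (the step size h is an arbitrary small value):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

h = 1e-6                                   # finite-difference step
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    numeric  = (g(x + h) - g(x - h)) / (2 * h)
    analytic = g(x) * (1.0 - g(x))
    print(x, round(numeric, 6), round(analytic, 6))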


Perceptron learning contd.

Perceptron learning rule converges to a consistent function for any linearly separable data set

[Figure: learning curves (proportion correct on test set vs. training set size) for perceptron and decision-tree learning, on MAJORITY with 11 inputs and on the RESTAURANT data]


Multilayer networks

Layers are usually fully connected; numbers of hidden units typically chosen by hand

[Figure: a multilayer feed-forward network: input units with activations ak feed hidden units aj through weights Wk,j, which feed output units ai through weights Wj,i]


Expressiveness of MLPs

All continuous functions w/ 1 hidden layer, all functions w/ 2 hidden layers

[Figure: two plots of network output hW(x1, x2) over x1, x2 ∈ [−4, 4], illustrating the kinds of surfaces a network with one hidden layer can represent]
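A Python sketch of the standard construction behind such surfaces: two opposed soft thresholds add up to a ridge, and two perpendicular ridges combine into a bump; the specific weights are illustrative assumptions:

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(x):
    # difference of two shifted soft thresholds: high in a band around x = 0
    return g(5 * x + 2.5) - g(5 * x - 2.5)

def bump(x1, x2):
    # an x1-ridge plus an x2-ridge, pushed through one more soft threshold
    return g(10 * (ridge(x1) + ridge(x2)) - 15)

for x1 in (-4, -1, 0, 1, 4):
    print([round(bump(x1, x2), 2) for x2 in (-4, -1, 0, 1, 4)])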


Back-propagation learning

In general have n output nodes,

E ≡ ½ Σi Erri²,

where Erri = (yi − ai) and Σi runs over all nodes in the output layer.

Output layer: same as for single-layer perceptron,

Wj,i ← Wj,i + α × aj × ∆i

where ∆i = Erri × g′(ini)

Hidden layers: back-propagate the error from the output layer:

∆j = g′(inj) Σi Wj,i ∆i

Update rule for weights in hidden layers:

Wk,j ← Wk,j + α × ak × ∆j


Back-propagation derivation

For a node i in the output layer:

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i = −(yi − ai) ∂g(ini)/∂Wj,i

= −(yi − ai) g′(ini) ∂ini/∂Wj,i = −(yi − ai) g′(ini) ∂/∂Wj,i ( Σk Wk,i ak )

= −(yi − ai) g′(ini) aj = −aj ∆i

where ∆i = (yi − ai) g′(ini)


Back-propagation derivation: hidden layer

For a node j in a hidden layer:

∂E/∂Wk,j = ?


“Reminder”: Chain rule for partial derivatives

For f(x, y), with f differentiable wrt x and y, and x and y differentiable wrt u and v:

∂f/∂u = (∂f/∂x)(∂x/∂u) + (∂f/∂y)(∂y/∂u)

and

∂f/∂v = (∂f/∂x)(∂x/∂v) + (∂f/∂y)(∂y/∂v)


Back-propagation derivation: hidden layer

For a node j in a hidden layer, E depends on Wk,j only through the activations of j's layer, E = E(aj1, aj2, . . . , ajm), where {ji} are the indices of the nodes in the same layer as node j. Applying the chain rule:

∂E/∂Wk,j = (∂E/∂aj)(∂aj/∂Wk,j) + Σi (∂E/∂ai)(∂ai/∂Wk,j)

where Σi runs over all the other nodes i in the same layer as node j

= (∂E/∂aj)(∂aj/∂Wk,j)   since ∂ai/∂Wk,j = 0 for i ≠ j

= (∂E/∂aj) · g′(inj) ak

For ∂E/∂aj, view E as a function of the activations of the layer after node j, E = E(ak1, ak2, . . . , akm), where {ki} are the indices of those nodes. Applying the chain rule again:

∂E/∂aj = Σk (∂E/∂ak)(∂ak/∂aj) = Σk (∂E/∂ak) g′(ink) Wj,k

where Σk runs over all nodes k that node j connects to.

If we define

∆j ≡ g′(inj) Σk Wj,k ∆k

then

∂E/∂Wk,j = −∆j ak


Back-propagation pseudocode

for iter = 1 to T
    for e = 1 to N (all examples)
        x = input for example e
        y = output for example e
        run x forward through network, computing all {ai}, {ini}
        for all weights (j, i) (in reverse order)
            compute ∆i = (yi − ai) × g′(ini)   if i is an output node
                       = g′(ini) Σk Wi,k ∆k    otherwise
            Wj,i = Wj,i + α × aj × ∆i
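A compact numpy rendering of this pseudocode for a network with one hidden layer, trained on XOR; the layer sizes, learning rate, initialization, and per-example updates are illustrative choices, and with this seed training usually reaches the XOR targets (backprop can stall in local minima, as the next slide notes):

import numpy as np

def g(x):  return 1.0 / (1.0 + np.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, (3, 3))   # (2 inputs + bias) -> 3 hidden units
W2 = rng.uniform(-1, 1, (4, 1))   # (3 hidden units + bias) -> 1 output unit
alpha, T = 0.5, 10000

def forward(x):
    xb = np.append(x, -1.0)                    # bias input a0 = -1, as in the unit diagram
    in_h = xb @ W1
    ahb = np.append(g(in_h), -1.0)             # hidden activations plus bias for next layer
    in_o = ahb @ W2
    return xb, in_h, ahb, in_o, g(in_o)

for _ in range(T):
    for x, y in zip(X, Y):
        xb, in_h, ahb, in_o, a_o = forward(x)
        d_o = (y - a_o) * gp(in_o)             # Delta_i = (y_i - a_i) g'(in_i)
        d_h = gp(in_h) * (W2[:3] @ d_o)        # Delta_j = g'(in_j) sum_i W_{j,i} Delta_i
                                               # (W2[:3] skips the hidden bias input, which has no delta)
        W2 += alpha * np.outer(ahb, d_o)       # W_{j,i} += alpha * a_j * Delta_i
        W1 += alpha * np.outer(xb, d_h)        # W_{k,j} += alpha * a_k * Delta_j

for x in X:
    print(x, np.round(forward(x)[-1], 2))      # should approach the XOR targets 0, 1, 1, 0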


Back-propagation learning contd.

At each epoch, sum gradient updates for all examples and apply

Restaurant data:

[Figure: total error on the training set vs. number of epochs (0 to 400) on the restaurant data]

Usual problems with slow convergence, local minima


Back-propagation learning contd.

Restaurant data:

[Figure: proportion correct on test set vs. training set size for the multilayer network and the decision tree on the restaurant data]


Handwritten digit recognition

3-nearest-neighbor = 2.4% error
400–300–10 unit MLP = 1.6% error
LeNet: 768–192–30–10 unit MLP = 0.9% error


Summary

Most brains have lots of neurons; each neuron ≈ linear–threshold unit (?)

Perceptrons (one-layer networks) insufficiently expressive

Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation

Many applications: speech, driving, handwriting, credit cards, etc.
