nn – cont
DESCRIPTION
NN – cont. Alexandra I. Cristea. USI intensive course “Adaptive Systems”, April–May 2003. We have seen how the neuron computes; now let's see what it can compute and how it can learn. What does the neuron compute? Perceptron, discrete neuron. First, simple case: no hidden layers.
TRANSCRIPT
![Page 1: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/1.jpg)
NN – cont.
Alexandra I. Cristea
USI intensive course “Adaptive Systems”, April–May 2003
![Page 2: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/2.jpg)
• We have seen how the neuron computes; let's now see:
– what it can compute
– how it can learn
![Page 3: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/3.jpg)
What does the neuron compute?
![Page 4: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/4.jpg)
Perceptron, discrete neuron
• First, a simple case:
– no hidden layers
– only one neuron
– get rid of the explicit threshold: the bias b becomes w0
– Y is a Boolean function: weighted sum > 0 → fires (1), otherwise → doesn't fire (0)
![Page 5: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/5.jpg)
Threshold function f
Step function f with threshold t = 1 (equivalently, bias w0 = −t = −1).
![Page 6: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/6.jpg)
Y = X1 or X2
w1 = 1, w2 = 1, t = 1

| X1 | X2 | Y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 1 |
![Page 7: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/7.jpg)
Y = X1 and X2
w1 = 0.5, w2 = 0.5, t = 1

| X1 | X2 | Y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 0 |
| 1  | 0  | 0 |
| 1  | 1  | 1 |
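The two truth tables above can be checked directly. A minimal sketch of the discrete perceptron, assuming the unit fires when the weighted sum reaches the threshold t (the function name `perceptron` is mine):

```python
# A discrete (step) perceptron: fires (1) when the weighted input sum
# reaches the threshold t, otherwise outputs 0.
def perceptron(weights, t, inputs):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= t else 0

# OR with w1 = w2 = 1, t = 1: matches the first truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron([1, 1], 1, [x1, x2]) == (x1 or x2)

# AND with w1 = w2 = 0.5, t = 1: only (1, 1) reaches the threshold.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron([0.5, 0.5], 1, [x1, x2]) == (x1 and x2)
```

All eight assertions pass, confirming the weight settings on the slides.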
![Page 8: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/8.jpg)
Y = or(x1, …, xn)
w1 = w2 = … = wn = 1, t = 1
![Page 9: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/9.jpg)
Y = and(x1, …, xn)
w1 = w2 = … = wn = 1/n, t = 1
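The n-input generalization works the same way: with all weights equal to 1 the sum reaches t = 1 as soon as any input is 1 (OR); with all weights equal to 1/n it reaches 1 only when every input is 1 (AND). A sketch checking this exhaustively for n = 4 (the helper name `fires` is mine):

```python
# n-input OR and AND with a single step unit, threshold t = 1.
from itertools import product

def fires(weights, t, xs):
    return 1 if sum(w * x for w, x in zip(weights, xs)) >= t else 0

n = 4
for xs in product((0, 1), repeat=n):
    assert fires([1] * n, 1, xs) == (1 if any(xs) else 0)        # weights 1: OR
    assert fires([1.0 / n] * n, 1, xs) == (1 if all(xs) else 0)  # weights 1/n: AND
```

n = 4 is chosen so 1/n is exact in floating point; for general n a small tolerance on the threshold comparison would be safer.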
![Page 10: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/10.jpg)
What are we actually doing?

The neuron evaluates w0 + w1·X1 + w2·X2 and fires when the sum is positive. The slide compares the truth tables obtained for three weight settings:
• w0 = −1, w1 = 7, w2 = 9
• w0 = −1, w1 = 0.7, w2 = 0.9
• w0 = 1, w1 = 7, w2 = 9
With the "> 0 fires" rule these realize OR, AND, and the constant-1 function, respectively.
![Page 11: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/11.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0 in the (x1, x2) plane, with w0 = −1, w1 = −0.67, w2 = 1.
![Page 12: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/12.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0, with w0 = −1, w1 = 0.25, w2 = −0.1.
![Page 13: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/13.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0, with w0 = −1, w1 = 0.25, w2 = 0.04.
![Page 14: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/14.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0, with w0 = −1, w1 = 0.167, w2 = 0.1.
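The classification rule behind these pictures is just the sign of w0 + w1·x1 + w2·x2: positive on one side of the line, non-positive on the other. A small sketch using the weights from the first of these slides (the sample points are my own):

```python
# Which side of the decision line w0 + w1*x1 + w2*x2 = 0 does a point fall on?
def side(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

w0, w1, w2 = -1.0, -0.67, 1.0      # weights from the slide
print(side(w0, w1, w2, 0.0, 2.0))  # -> 1 (above the line x2 = 1 + 0.67*x1)
print(side(w0, w1, w2, 2.0, 0.0))  # -> 0 (below it)
```

A set is linearly separable exactly when some choice of (w0, w1, w2) makes `side` agree with the class label on every point.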
![Page 15: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/15.jpg)
Non-linearly separable Set
![Page 16: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/16.jpg)
Non-Linearly Separable Set
No values of w0, w1, w2 make the line w0 + w1·x1 + w2·x2 = 0 separate the two classes in the (x1, x2) plane.
![Page 20: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/20.jpg)
Perceptron Classification Theorem
A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
![Page 21: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/21.jpg)
Typical non-linearly separable set: Y = XOR(x1, x2)
The four inputs (0,0), (1,0), (0,1), (1,1): Y = 1 at (1,0) and (0,1), Y = 0 at (0,0) and (1,1). No line w0 + w1·x1 + w2·x2 = 0 puts the Y = 1 points on one side and the Y = 0 points on the other.
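One way to make the XOR counterexample concrete is a brute-force search: no weight setting on a grid reproduces XOR. This is an illustration under my choice of grid, not a proof; the theorem above gives the real argument.

```python
# Search a grid of (w0, w1, w2) for a single step unit computing XOR.
# Step rule: fire iff w0 + w1*x1 + w2*x2 > 0.
def output(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 4 for i in range(-8, 9)]  # -2.0 .. 2.0 in steps of 0.25
found = any(
    all(output(w0, w1, w2, x1, x2) == y for (x1, x2), y in xor_table.items())
    for w0 in grid for w1 in grid for w2 in grid
)
print(found)  # -> False: no grid point matches XOR on all four inputs
```

Since XOR is not linearly separable, the search fails for any grid, however fine.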
![Page 22: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/22.jpg)
How does the neuron learn?
![Page 23: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/23.jpg)
Learning: weight computation
For Y = X1 and X2 with threshold t = 1, the neuron computes W1·X1 + W2·X2, so the weights must satisfy:
• W1·1 + W2·1 ≥ 1 (input (1, 1) fires)
• W1·0 + W2·1 < 1
• W1·1 + W2·0 < 1
• W1·0 + W2·0 < 1
![Page 24: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/24.jpg)
Perceptron Learning Rule (incremental version)

```
FOR i := 0 TO n DO wi := random initial value ENDFOR;
REPEAT
  select a pair (x, t) in X;
    (* each pair must have a positive probability of being selected *)
  IF wT · x' > 0 THEN y := 1 ELSE y := 0 ENDIF;
  IF y ≠ t THEN
    FOR i := 0 TO n DO wi := wi + (t − y)·xi' ENDFOR
  ENDIF;
UNTIL X is correctly classified
```

Rosenblatt (1962)
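A minimal Python sketch of Rosenblatt's rule: x' is the input extended with a constant 1 (so w0 plays the role of −t), and the weights move by (t − y)·x' whenever the prediction is wrong. The slide selects pairs at random; this sketch simply cycles through the set, which also converges for separable data. Training data (AND) and seed are my choices.

```python
import random

random.seed(0)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # Y = X1 and X2
w = [random.uniform(-1, 1) for _ in range(3)]                # w0, w1, w2

while True:
    errors = 0
    for (x1, x2), t in data:
        xp = (1, x1, x2)                                     # extended input x'
        y = 1 if sum(wi * xi for wi, xi in zip(w, xp)) > 0 else 0
        if y != t:                                           # wrong: move w by (t - y) x'
            errors += 1
            w = [wi + (t - y) * xi for wi, xi in zip(w, xp)]
    if errors == 0:                                          # X correctly classified
        break

for (x1, x2), t in data:
    assert (1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0) == t
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop terminates.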
![Page 25: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/25.jpg)
Idea of the Perceptron Learning Rule

wi := wi + (t − y)·xi'

• t = 1, y = 0 (wT·x' ≤ 0): wnew = w + x', so w moves toward the input (+).
• t = 0, y = 1 (wT·x' > 0): wnew = w − x', so w moves away from the input (−).

In both cases, w changes in the direction of the input.
![Page 26: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/26.jpg)
![Page 27: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/27.jpg)
For multi-layered perceptrons with continuous neurons, a simple and successful learning algorithm exists.
![Page 28: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/28.jpg)
BKP (Backpropagation): Error

Input → hidden layer → output. Each output yi has a desired value di, giving the output errors:
e1 = d1 − y1, e2 = d2 − y2, e3 = d3 − y3, e4 = d4 − y4
But what is the hidden-layer error?
![Page 29: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/29.jpg)
Synapse
w: weight, connecting neuron1 to neuron2.
y2 = w·y1
The values (y1, y2) are internal activations.
Forward propagation: the weight serves as an amplifier!
![Page 30: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/30.jpg)
Inverse Synapse
w: weight, connecting neuron1 to neuron2.
e1 = ????
The values (e1, e2) are errors.
Backward propagation: the weight serves as an amplifier!
![Page 31: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/31.jpg)
Inverse Synapse
w: weight, connecting neuron1 to neuron2.
e1 = w·e2
The values (e1, e2) are errors.
Backward propagation: the weight serves as an amplifier!
![Page 32: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/32.jpg)
BKP: Error (revisited)

As before, the output errors are ei = di − yi, and the hidden-layer error is still the open question. Notation: I1 = system input, O2 = I2 = hidden-layer output, O1 = system output.
![Page 33: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/33.jpg)
Backpropagation to the hidden layer

Input I1 → hidden layer (O2 = I2) → output O1, with weights w1, w2, w3 and output errors e1, e2, e3.

Backpropagation: e[j] = Σi e[i]·w[j, i]
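The hidden-layer error formula can be checked with a tiny numeric example: one hidden unit j feeding three output units. The error and weight values here are my own illustration.

```python
# Hidden-layer error e[j] = sum_i e[i] * w[j, i]:
# the downstream errors, weighted by the connections that carried
# the hidden unit's activation forward.
e = [0.5, -0.2, 0.1]      # output-layer errors e[i]
w_ji = [0.8, 0.4, -1.0]   # weights w[j, i] from hidden unit j to each output
e_j = sum(ei * wji for ei, wji in zip(e, w_ji))
print(round(e_j, 2))      # 0.5*0.8 - 0.2*0.4 + 0.1*(-1.0) = 0.22
```

This is the "inverse synapse" of the previous slides applied to every outgoing connection at once.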
![Page 34: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/34.jpg)
Update rule for the 2 weight types

• ① weights between I2 (hidden layer) and O1 (system output)
• ② weights between I1 (system input) and O2 (hidden layer)

① Δw = α·(d[i] − y[i])·f′(S[i])·f(S[i]) = α·e[i]·f(S[i]) (simplification: f′ = 1, e.g. for a repeater)
  S[i] = Σj w[j,i](t)·h[j]
② Δw = α·(Σi e[i]·w[j,i])·f′(S[j])·f(S[j]) = α·e[j]·f(S[j])
  S[j] = Σk w[k,j](t)·x[k]
![Page 35: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/35.jpg)
Backpropagation algorithm

```
FOR s := 1 TO r DO Ws := initial matrix (often random) END;
REPEAT
  select a pair (x, t) in X;
  y0 := x;
  (* forward phase: compute the actual output ys of the network with input x *)
  FOR s := 1 TO r DO ys := F(Ws · ys−1) END;   (* yr is the output vector of the network *)
  (* backpropagation phase: propagate the errors back through the network *)
  (* and adapt the weights of all layers *)
  dr := Fr′ · (t − yr);
  FOR s := r DOWNTO 2 DO
    ds−1 := Fs−1′ · WsT · ds;
    Ws := Ws + ds · ys−1T
  END;
  W1 := W1 + d1 · y0T
UNTIL stop criterion
```
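The algorithm above can be sketched in plain Python for a tiny 2-2-1 sigmoid network trained on XOR. The network size, learning rate alpha, epoch count, and seed are my choices (the slide's update has no explicit learning rate); the backward sweep and weight updates follow the pseudocode element-wise.

```python
import math, random

random.seed(1)
def sig(s):
    return 1.0 / (1.0 + math.exp(-s))

# W1: 2x3 (hidden layer, with bias column), W2: 1x3 (output, with bias)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
alpha = 0.5

def forward(x1, x2):
    y0 = [1.0, x1, x2]                                   # input with bias unit
    y1 = [1.0] + [sig(sum(w * v for w, v in zip(row, y0))) for row in W1]
    y2 = sig(sum(w * v for w, v in zip(W2, y1)))
    return y0, y1, y2

def loss():
    return sum((t - forward(x1, x2)[2]) ** 2 for (x1, x2), t in data)

before = loss()
for _ in range(2000):
    for (x1, x2), t in data:
        y0, y1, y2 = forward(x1, x2)
        d2 = (t - y2) * y2 * (1 - y2)                    # dr = Fr' (t - yr)
        d1 = [y1[j + 1] * (1 - y1[j + 1]) * W2[j + 1] * d2 for j in range(2)]
        for j in range(3):                               # Ws := Ws + ds * ys-1^T
            W2[j] += alpha * d2 * y1[j]
        for i in range(2):
            for j in range(3):
                W1[i][j] += alpha * d1[i] * y0[j]
after = loss()
print(after < before)  # -> True: the squared error went down
```

Note that d1 uses the pre-update W2, matching the pseudocode's ordering of the backward sweep before the weight update at each layer.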
![Page 36: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/36.jpg)
Conclusion
• We have seen binary function representation with the single-layer perceptron
• We have seen a learning algorithm for the SLP
• We have seen a learning algorithm for the MLP (backpropagation)
• So, neurons can represent knowledge AND learn!