nn – cont
DESCRIPTION
NN – cont. Alexandra I. Cristea. USI intensive course “Adaptive Systems”, April–May 2003. We have seen how the neuron computes; now let's see what it can compute and how it can learn. What does the neuron compute? Perceptron, discrete neuron. First, simple case: no hidden layers.
TRANSCRIPT
![Page 1: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/1.jpg)
NN – cont.
Alexandra I. Cristea
USI intensive course “Adaptive Systems”, April–May 2003
![Page 2: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/2.jpg)
• We have seen how the neuron computes; let's now see:
– what it can compute
– how it can learn
![Page 3: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/3.jpg)
What does the neuron compute?
![Page 4: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/4.jpg)
Perceptron, discrete neuron
• First, a simple case:
– no hidden layers
– only one neuron
– get rid of the explicit threshold: the bias b becomes w0
– Y is a Boolean function: weighted sum > 0 → fires (1), otherwise → doesn't fire (0)
![Page 5: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/5.jpg)
Threshold function f
Step function f with threshold t = 1 (equivalently, bias w0 = −t = −1).
![Page 6: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/6.jpg)
Y = X1 or X2
w1 = 1, w2 = 1, t = 1

| X1 | X2 | Y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 1 |
![Page 7: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/7.jpg)
Y = X1 and X2
w1 = 0.5, w2 = 0.5, t = 1

| X1 | X2 | Y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 0 |
| 1  | 0  | 0 |
| 1  | 1  | 1 |
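The two truth tables above can be checked directly. A minimal sketch of the discrete perceptron, assuming the unit fires when the weighted sum reaches the threshold t (the function name `perceptron` is mine):

```python
# A discrete (step) perceptron: fires (1) when the weighted input sum
# reaches the threshold t, otherwise outputs 0.
def perceptron(weights, t, inputs):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= t else 0

# OR with w1 = w2 = 1, t = 1: matches the first truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron([1, 1], 1, [x1, x2]) == (x1 or x2)

# AND with w1 = w2 = 0.5, t = 1: only (1, 1) reaches the threshold.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron([0.5, 0.5], 1, [x1, x2]) == (x1 and x2)
```

All eight assertions pass, confirming the weight settings on the slides.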
![Page 8: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/8.jpg)
Y = or(x1, …, xn)
w1 = w2 = … = wn = 1, t = 1
![Page 9: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/9.jpg)
Y = and(x1, …, xn)
w1 = w2 = … = wn = 1/n, t = 1
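The n-input generalization works the same way: with all weights equal to 1 the sum reaches t = 1 as soon as any input is 1 (OR); with all weights equal to 1/n it reaches 1 only when every input is 1 (AND). A sketch checking this exhaustively for n = 4 (the helper name `fires` is mine):

```python
# n-input OR and AND with a single step unit, threshold t = 1.
from itertools import product

def fires(weights, t, xs):
    return 1 if sum(w * x for w, x in zip(weights, xs)) >= t else 0

n = 4
for xs in product((0, 1), repeat=n):
    assert fires([1] * n, 1, xs) == (1 if any(xs) else 0)        # weights 1: OR
    assert fires([1.0 / n] * n, 1, xs) == (1 if all(xs) else 0)  # weights 1/n: AND
```

n = 4 is chosen so 1/n is exact in floating point; for general n a small tolerance on the threshold comparison would be safer.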
![Page 10: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/10.jpg)
What are we actually doing?

The neuron evaluates w0 + w1·X1 + w2·X2 and fires when the sum is positive. The slide compares the truth tables obtained for three weight settings:
• w0 = −1, w1 = 7, w2 = 9
• w0 = −1, w1 = 0.7, w2 = 0.9
• w0 = 1, w1 = 7, w2 = 9
With the "> 0 fires" rule these realize OR, AND, and the constant-1 function, respectively.
![Page 11: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/11.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0 in the (x1, x2) plane, with w0 = −1, w1 = −0.67, w2 = 1.
![Page 12: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/12.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0, with w0 = −1, w1 = 0.25, w2 = −0.1.
![Page 13: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/13.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0, with w0 = −1, w1 = 0.25, w2 = 0.04.
![Page 14: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/14.jpg)
Linearly Separable Set
Decision boundary w0 + w1·x1 + w2·x2 = 0, with w0 = −1, w1 = 0.167, w2 = 0.1.
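The classification rule behind these pictures is just the sign of w0 + w1·x1 + w2·x2: positive on one side of the line, non-positive on the other. A small sketch using the weights from the first of these slides (the sample points are my own):

```python
# Which side of the decision line w0 + w1*x1 + w2*x2 = 0 does a point fall on?
def side(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

w0, w1, w2 = -1.0, -0.67, 1.0      # weights from the slide
print(side(w0, w1, w2, 0.0, 2.0))  # -> 1 (above the line x2 = 1 + 0.67*x1)
print(side(w0, w1, w2, 2.0, 0.0))  # -> 0 (below it)
```

A set is linearly separable exactly when some choice of (w0, w1, w2) makes `side` agree with the class label on every point.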
![Page 15: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/15.jpg)
Non-linearly separable Set
![Page 16: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/16.jpg)
Non-Linearly Separable Set
No values of w0, w1, w2 make the line w0 + w1·x1 + w2·x2 = 0 separate the two classes in the (x1, x2) plane.
![Page 20: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/20.jpg)
Perceptron Classification Theorem
A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
![Page 21: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/21.jpg)
Typical non-linearly separable set: Y = XOR(x1, x2)
The four inputs (0,0), (1,0), (0,1), (1,1): Y = 1 at (1,0) and (0,1), Y = 0 at (0,0) and (1,1). No line w0 + w1·x1 + w2·x2 = 0 puts the Y = 1 points on one side and the Y = 0 points on the other.
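One way to make the XOR counterexample concrete is a brute-force search: no weight setting on a grid reproduces XOR. This is an illustration under my choice of grid, not a proof; the theorem above gives the real argument.

```python
# Search a grid of (w0, w1, w2) for a single step unit computing XOR.
# Step rule: fire iff w0 + w1*x1 + w2*x2 > 0.
def output(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 4 for i in range(-8, 9)]  # -2.0 .. 2.0 in steps of 0.25
found = any(
    all(output(w0, w1, w2, x1, x2) == y for (x1, x2), y in xor_table.items())
    for w0 in grid for w1 in grid for w2 in grid
)
print(found)  # -> False: no grid point matches XOR on all four inputs
```

Since XOR is not linearly separable, the search fails for any grid, however fine.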
![Page 22: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/22.jpg)
How does the neuron learn?
![Page 23: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/23.jpg)
Learning: weight computation
For Y = X1 and X2 with threshold t = 1, the neuron computes W1·X1 + W2·X2, so the weights must satisfy:
• W1·1 + W2·1 ≥ 1 (input (1, 1) fires)
• W1·0 + W2·1 < 1
• W1·1 + W2·0 < 1
• W1·0 + W2·0 < 1
![Page 24: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/24.jpg)
Perceptron Learning Rule (incremental version)

```
FOR i := 0 TO n DO wi := random initial value ENDFOR;
REPEAT
  select a pair (x, t) in X;
    (* each pair must have a positive probability of being selected *)
  IF wT · x' > 0 THEN y := 1 ELSE y := 0 ENDIF;
  IF y ≠ t THEN
    FOR i := 0 TO n DO wi := wi + (t − y)·xi' ENDFOR
  ENDIF;
UNTIL X is correctly classified
```

Rosenblatt (1962)
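A minimal Python sketch of Rosenblatt's rule: x' is the input extended with a constant 1 (so w0 plays the role of −t), and the weights move by (t − y)·x' whenever the prediction is wrong. The slide selects pairs at random; this sketch simply cycles through the set, which also converges for separable data. Training data (AND) and seed are my choices.

```python
import random

random.seed(0)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # Y = X1 and X2
w = [random.uniform(-1, 1) for _ in range(3)]                # w0, w1, w2

while True:
    errors = 0
    for (x1, x2), t in data:
        xp = (1, x1, x2)                                     # extended input x'
        y = 1 if sum(wi * xi for wi, xi in zip(w, xp)) > 0 else 0
        if y != t:                                           # wrong: move w by (t - y) x'
            errors += 1
            w = [wi + (t - y) * xi for wi, xi in zip(w, xp)]
    if errors == 0:                                          # X correctly classified
        break

for (x1, x2), t in data:
    assert (1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0) == t
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop terminates.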
![Page 25: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/25.jpg)
Idea of the Perceptron Learning Rule

wi := wi + (t − y)·xi'

• t = 1, y = 0 (wT·x' ≤ 0): wnew = w + x', so w moves toward the input (+).
• t = 0, y = 1 (wT·x' > 0): wnew = w − x', so w moves away from the input (−).

In both cases, w changes in the direction of the input.
![Page 26: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/26.jpg)
![Page 27: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/27.jpg)
For multi-layered perceptrons with continuous neurons, a simple and successful learning algorithm exists.
![Page 28: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/28.jpg)
BKP (Backpropagation): Error

Input → hidden layer → output. Each output yi has a desired value di, giving the output errors:
e1 = d1 − y1, e2 = d2 − y2, e3 = d3 − y3, e4 = d4 − y4
But what is the hidden-layer error?
![Page 29: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/29.jpg)
Synapse
w: weight, connecting neuron1 to neuron2.
y2 = w·y1
The values (y1, y2) are internal activations.
Forward propagation: the weight serves as an amplifier!
![Page 30: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/30.jpg)
Inverse Synapse
w: weight, connecting neuron1 to neuron2.
e1 = ????
The values (e1, e2) are errors.
Backward propagation: the weight serves as an amplifier!
![Page 31: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/31.jpg)
Inverse Synapse
w: weight, connecting neuron1 to neuron2.
e1 = w·e2
The values (e1, e2) are errors.
Backward propagation: the weight serves as an amplifier!
![Page 32: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/32.jpg)
BKP: Error (revisited)

As before, the output errors are ei = di − yi, and the hidden-layer error is still the open question. Notation: I1 = system input, O2 = I2 = hidden-layer output, O1 = system output.
![Page 33: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/33.jpg)
Backpropagation to the hidden layer

Input I1 → hidden layer (O2 = I2) → output O1, with weights w1, w2, w3 and output errors e1, e2, e3.

Backpropagation: e[j] = Σi e[i]·w[j, i]
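The hidden-layer error formula can be checked with a tiny numeric example: one hidden unit j feeding three output units. The error and weight values here are my own illustration.

```python
# Hidden-layer error e[j] = sum_i e[i] * w[j, i]:
# the downstream errors, weighted by the connections that carried
# the hidden unit's activation forward.
e = [0.5, -0.2, 0.1]      # output-layer errors e[i]
w_ji = [0.8, 0.4, -1.0]   # weights w[j, i] from hidden unit j to each output
e_j = sum(ei * wji for ei, wji in zip(e, w_ji))
print(round(e_j, 2))      # 0.5*0.8 - 0.2*0.4 + 0.1*(-1.0) = 0.22
```

This is the "inverse synapse" of the previous slides applied to every outgoing connection at once.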
![Page 34: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/34.jpg)
Update rule for the 2 weight types

• ① weights between I2 (hidden layer) and O1 (system output)
• ② weights between I1 (system input) and O2 (hidden layer)

① Δw = α·(d[i] − y[i])·f′(S[i])·f(S[i]) = α·e[i]·f(S[i]) (simplification: f′ = 1, e.g. for a repeater)
  S[i] = Σj w[j,i](t)·h[j]
② Δw = α·(Σi e[i]·w[j,i])·f′(S[j])·f(S[j]) = α·e[j]·f(S[j])
  S[j] = Σk w[k,j](t)·x[k]
![Page 35: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/35.jpg)
Backpropagation algorithm

```
FOR s := 1 TO r DO Ws := initial matrix (often random) END;
REPEAT
  select a pair (x, t) in X;
  y0 := x;
  (* forward phase: compute the actual output ys of the network with input x *)
  FOR s := 1 TO r DO ys := F(Ws · ys−1) END;   (* yr is the output vector of the network *)
  (* backpropagation phase: propagate the errors back through the network *)
  (* and adapt the weights of all layers *)
  dr := Fr′ · (t − yr);
  FOR s := r DOWNTO 2 DO
    ds−1 := Fs−1′ · WsT · ds;
    Ws := Ws + ds · ys−1T
  END;
  W1 := W1 + d1 · y0T
UNTIL stop criterion
```
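The algorithm above can be sketched in plain Python for a tiny 2-2-1 sigmoid network trained on XOR. The network size, learning rate alpha, epoch count, and seed are my choices (the slide's update has no explicit learning rate); the backward sweep and weight updates follow the pseudocode element-wise.

```python
import math, random

random.seed(1)
def sig(s):
    return 1.0 / (1.0 + math.exp(-s))

# W1: 2x3 (hidden layer, with bias column), W2: 1x3 (output, with bias)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
alpha = 0.5

def forward(x1, x2):
    y0 = [1.0, x1, x2]                                   # input with bias unit
    y1 = [1.0] + [sig(sum(w * v for w, v in zip(row, y0))) for row in W1]
    y2 = sig(sum(w * v for w, v in zip(W2, y1)))
    return y0, y1, y2

def loss():
    return sum((t - forward(x1, x2)[2]) ** 2 for (x1, x2), t in data)

before = loss()
for _ in range(2000):
    for (x1, x2), t in data:
        y0, y1, y2 = forward(x1, x2)
        d2 = (t - y2) * y2 * (1 - y2)                    # dr = Fr' (t - yr)
        d1 = [y1[j + 1] * (1 - y1[j + 1]) * W2[j + 1] * d2 for j in range(2)]
        for j in range(3):                               # Ws := Ws + ds * ys-1^T
            W2[j] += alpha * d2 * y1[j]
        for i in range(2):
            for j in range(3):
                W1[i][j] += alpha * d1[i] * y0[j]
after = loss()
print(after < before)  # -> True: the squared error went down
```

Note that d1 uses the pre-update W2, matching the pseudocode's ordering of the backward sweep before the weight update at each layer.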
![Page 36: NN – cont](https://reader033.vdocument.in/reader033/viewer/2022061608/568145df550346895db2e019/html5/thumbnails/36.jpg)
Conclusion
• We have seen binary function representation with the single-layer perceptron
• We have seen a learning algorithm for the SLP
• We have seen a learning algorithm for the MLP (backpropagation)
• So, neurons can represent knowledge AND learn!