Post on 23-Jan-2016
Concept Map for Ch. 3
[Diagram: networks divide into Layered and Nonlayered; layered feed-forward networks divide into Single-Layer (Ch. 1, Ch. 2) and Multilayer (Multilayer Perceptron, Multilayer ALC). The MLP computes y = F(x, W) ≈ f(x); Learning maps {(xi, f(xi)) | i = 1 ~ N} → W by minimizing E(W) via Gradient Descent — Learning by Backpropagation (BP) — with Sigmoid activations, in Matrix-Vector (W) or Scalar (wij) notation. New W = Old W + correction driven by (Desired Output − Actual Output).]
Chapter 3. Multilayer Perceptron

1. MLP Architecture
– Extension of the Perceptron to many layers and sigmoidal activation functions
– For real-valued mapping/classification
– Learning: Discrete → Find W*; Continuous → F(x, W*) ≈ f(x)
Activation functions (φ must be differentiable):
– Hyperbolic Tangent: φ(v) = tanh(v/2) = (1 − e^-v)/(1 + e^-v), range (−1, 1), with φ'(v) = (1/2)(1 − φ²(v))
– Logistic: φ(v) = 1/(1 + e^-v), range (0, 1), with φ'(v) = φ(v)(1 − φ(v)); its maximum slope, 1/4 at v = 0, is smaller than that of tanh.
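The two sigmoids and their derivatives can be sketched in Python as follows (function names are my own; note that each derivative is expressed through the unit's output, which is what makes backpropagation local):

```python
import math

def tanh_act(v):
    """Hyperbolic tangent activation tanh(v/2): range (-1, 1)."""
    return math.tanh(v / 2)      # equals (1 - e^-v) / (1 + e^-v)

def tanh_deriv(phi_out):
    """Derivative via the output: phi' = (1/2)(1 - phi^2)."""
    return 0.5 * (1.0 - phi_out ** 2)

def logistic(v):
    """Logistic activation: range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def logistic_deriv(phi_out):
    """Derivative via the output: phi' = phi(1 - phi), max 1/4 at v = 0."""
    return phi_out * (1.0 - phi_out)
```

A quick finite-difference check confirms the derivative identities, e.g. (tanh_act(v+h) − tanh_act(v−h)) / 2h ≈ tanh_deriv(tanh_act(v)).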
2. Weight Learning Rule – Backpropagation of Error
(1) Training Data {(x_p, d_p)} → Weights (W):
Curve (Data) Fitting (Modeling, Nonlinear Regression)
(2) Mean Squared Error E for a 1-D Function as an Example
E(W) = (1/2) Σ_p e_p² = (1/2) Σ_p (d_p − F(x_p, W))²   (Cost Function)
where (x_p, d_p) are the training pairs, f(x) is the True Function, and F(x, W) is the NN Approximating Function.
(3) Gradient Descent Learning
Δw = −η ∂E/∂w, η > 0
1) Batch: Δw = −η ∂E/∂w, with E = Σ_p E_p
2) Pattern: Δw = −η ∂E_p/∂w   (local [instantaneous] gradient)
(4) Learning Curve
Iteration = one scan of the training set (Epoch).
[Plot: E{W(n), weight track} versus the number of iterations n, decreasing from the initial error toward 0.]
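The batch vs. pattern distinction can be sketched on a one-weight model y = w·x with E = (1/2) Σ_p (d_p − w·x_p)²; the data, learning rate, and epoch count below are illustrative, not from the text:

```python
# (x_p, d_p) training pairs, roughly d = 2x
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
eta = 0.02

def batch_epoch(w):
    # 1) Batch: accumulate the full gradient over one scan, then update once.
    grad = sum(-(d - w * x) * x for x, d in data)
    return w - eta * grad

def pattern_epoch(w):
    # 2) Pattern: update after each sample (instantaneous/local gradient).
    for x, d in data:
        w -= eta * (-(d - w * x) * x)
    return w

w_b = w_p = 0.0
for _ in range(200):          # each loop iteration = one epoch
    w_b = batch_epoch(w_b)
    w_p = pattern_epoch(w_p)
```

Both variants settle near the least-squares weight w ≈ 2; pattern mode hops around it slightly because each per-sample gradient is only a noisy estimate of the full one.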
(5) Backpropagation Learning Rule
[Network diagram: input unit i → hidden unit j → output unit k; weights w_ij and w_jk; signals y_i, y_j, y_k; output error (d_k − y_k).]
A. Output Layer Weights
Δw_jk = −η ∂E_p/∂w_jk = η δ_k y_j   ( = η (local error)(local activation) )
where δ_k ≡ −∂E_p/∂sum_k = (d_k − y_k) φ'(sum_k)
B. Inner Layer Weights
Δw_ij = −η ∂E_p/∂w_ij = η δ_j y_i
where δ_j ≡ −∂E_p/∂sum_j = φ'(sum_j) Σ_k δ_k w_jk   (error signal for hidden units)
Features: Locality of Computation, No Centralized Control, 2-Pass (Credit Assignment)
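The two delta rules above can be sketched for a single training pattern with plain lists (index j over hidden units, k over outputs; the function names and the tanh(v/2)-based derivative are assumptions, not from the text):

```python
def phi_prime(phi_out):
    # tanh(v/2) derivative expressed through the unit's own output,
    # so no net input needs to be stored: phi' = (1/2)(1 - phi^2)
    return 0.5 * (1.0 - phi_out ** 2)

def backprop_deltas(y_hidden, y_out, d_out, w_out):
    """w_out[j][k] connects hidden unit j to output unit k."""
    # A. Output layer: delta_k = (d_k - y_k) * phi'(sum_k)
    delta_k = [(d - y) * phi_prime(y) for d, y in zip(d_out, y_out)]
    # B. Inner layer: delta_j = phi'(sum_j) * sum_k delta_k * w_jk
    delta_j = [phi_prime(yj) * sum(dk * w_out[j][k]
                                   for k, dk in enumerate(delta_k))
               for j, yj in enumerate(y_hidden)]
    return delta_j, delta_k
```

Each δ_j uses only the δ_k values one layer above and the weights leaving unit j, which is the locality-of-computation feature noted above.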
Water Flow Analogy to Backpropagation
[Diagram: a river flow; the input x_p is the object dropped upstream ("Drop Object Here"), the output y_p = F(x_p, W) is fetched downstream ("Fetch Object Here"); the many weights w_1, …, w_l are the flows; e_p = d_p − y_p is the error.]
– Many weights (flows).
– If the error is very sensitive to a weight change, then change that weight a lot, and vice versa.
→ Gradient Descent, Minimum Disturbance Principle
(6) Computation Example: MLP(2-1-2)
A. Forward Processing — Comp. Function Signals
sum_1 = v_1 x_1 + v_2 x_2
h = φ(sum_1)
y_1 = φ(sum_21), where sum_21 = w_1 h
y_2 = φ(sum_22), where sum_22 = w_2 h
[Diagram: inputs x_1, x_2 feed hidden unit h through weights v_1, v_2; h feeds outputs y_1, y_2 through weights w_1, w_2.]
No desired response is needed for hidden nodes. φ' must exist → φ = sigmoid [tanh or logistic]. For classification, d = ±0.9 for tanh; d = 0.1, 0.9 for logistic.
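The forward pass of the MLP(2-1-2) can be sketched directly from the equations above, assuming the tanh(v/2) sigmoid (the function names are mine):

```python
import math

def phi(v):
    # tanh(v/2) sigmoid activation, as in the text
    return math.tanh(v / 2)

def forward(x1, x2, v1, v2, w1, w2):
    """Forward processing of the MLP(2-1-2): 2 inputs, 1 hidden unit, 2 outputs."""
    sum1 = v1 * x1 + v2 * x2          # hidden net input
    h = phi(sum1)                     # hidden function signal
    sum21, sum22 = w1 * h, w2 * h     # output net inputs
    y1, y2 = phi(sum21), phi(sum22)   # output function signals
    return sum1, h, sum21, sum22, y1, y2
```

The intermediate signals sum_1, h, sum_21, sum_22 are returned as well because the backward pass reuses them.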
B. Backward Processing — Comp. Error Signals
e_1 = d_1 − y_1,  e_2 = d_2 − y_2
δ_21 = e_1 φ'(sum_21)
δ_22 = e_2 φ'(sum_22)
Δw_1 = η δ_21 h = η e_1 φ'(sum_21) h
Δw_2 = η δ_22 h = η e_2 φ'(sum_22) h
Δv_1 = η [w_1 δ_21 + w_2 δ_22] φ'(sum_1) x_1 = η [w_1 e_1 φ'(sum_21) + w_2 e_2 φ'(sum_22)] φ'(sum_1) x_1
Δv_2 = η [w_1 δ_21 + w_2 δ_22] φ'(sum_1) x_2 = η [w_1 e_1 φ'(sum_21) + w_2 e_2 φ'(sum_22)] φ'(sum_1) x_2
(Take η = 1; if φ(v) = 1/(1 + e^-v), then φ' = φ(1 − φ).)
h has been computed in forward processing.
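The backward pass for the MLP(2-1-2) can be sketched as follows, restating the forward pass so the block is self-contained (tanh(v/2) activation assumed; names are mine). The finite-difference check after the block verifies that each Δ really equals −η ∂E_p/∂w:

```python
import math

def phi(v):
    return math.tanh(v / 2)

def phi_prime(v):
    return 0.5 * (1.0 - phi(v) ** 2)

def forward(x1, x2, v1, v2, w1, w2):
    sum1 = v1 * x1 + v2 * x2
    h = phi(sum1)
    sum21, sum22 = w1 * h, w2 * h
    return sum1, h, sum21, sum22, phi(sum21), phi(sum22)

def backward(x1, x2, d1, d2, v1, v2, w1, w2, eta=1.0):
    """One backpropagation step for the MLP(2-1-2) example."""
    sum1, h, sum21, sum22, y1, y2 = forward(x1, x2, v1, v2, w1, w2)
    e1, e2 = d1 - y1, d2 - y2
    # output-layer error signals and weight changes
    delta21 = e1 * phi_prime(sum21)
    delta22 = e2 * phi_prime(sum22)
    dw1, dw2 = eta * delta21 * h, eta * delta22 * h
    # hidden-layer error signal, back-propagated through w1 and w2
    delta1 = (w1 * delta21 + w2 * delta22) * phi_prime(sum1)
    dv1, dv2 = eta * delta1 * x1, eta * delta1 * x2
    return dw1, dw2, dv1, dv2
```

Usage: compare e.g. Δv_1 from backward(...) against the numerical gradient −(E(v_1 + ε) − E(v_1 − ε)) / 2ε of E = (1/2)(e_1² + e_2²); the two agree to within the finite-difference error.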
If we knew f(x), it would be a lot faster to use it to calculate the output than to use the NN.
Student Questions:
Does the output error become more uncertain for a complex multilayer network than for a single layer?
Should we use only up to 3 layers?
Why can oscillation occur in the learning curve?
Do we use the old weights for calculating the error signal δ?
What does ANN mean?
Which makes more sense, the error gradient or the weight gradient, considering the equation for weight change?
What becomes the error signal to train the weights in forward mode?