Post on 23-Jan-2016
Concept Map for Ch. 3
[Diagram: networks divide into Layered and Nonlayered; layered feed-forward networks divide into Single-Layer (Ch. 1, Ch. 2) and Multilayer (Multilayer Perceptron, Multilayer ALC). The MLP computes y = F(x, W) ≈ f(x); Learning maps {(xi, f(xi)) | i = 1 ~ N} → W by minimizing E(W) via Gradient Descent — Learning by Backpropagation (BP) — with Sigmoid activations, in Matrix-Vector (W) or Scalar (wij) notation. New W = Old W + correction driven by (Desired Output − Actual Output).]
Chapter 3. Multilayer Perceptron

1. MLP Architecture
– Extension of the Perceptron to many layers and sigmoidal activation functions
– For real-valued mapping/classification
– Learning: Discrete → Find W*; Continuous → F(x, W*) ≈ f(x)
Activation functions (φ must be differentiable):
– Hyperbolic Tangent: φ(v) = tanh(v/2) = (1 − e^-v)/(1 + e^-v), range (−1, 1), with φ'(v) = (1/2)(1 − φ²(v))
– Logistic: φ(v) = 1/(1 + e^-v), range (0, 1), with φ'(v) = φ(v)(1 − φ(v)); its maximum slope, 1/4 at v = 0, is smaller than that of tanh.
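The two sigmoids and their derivatives can be sketched in Python as follows (function names are my own; note that each derivative is expressed through the unit's output, which is what makes backpropagation local):

```python
import math

def tanh_act(v):
    """Hyperbolic tangent activation tanh(v/2): range (-1, 1)."""
    return math.tanh(v / 2)      # equals (1 - e^-v) / (1 + e^-v)

def tanh_deriv(phi_out):
    """Derivative via the output: phi' = (1/2)(1 - phi^2)."""
    return 0.5 * (1.0 - phi_out ** 2)

def logistic(v):
    """Logistic activation: range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def logistic_deriv(phi_out):
    """Derivative via the output: phi' = phi(1 - phi), max 1/4 at v = 0."""
    return phi_out * (1.0 - phi_out)
```

A quick finite-difference check confirms the derivative identities, e.g. (tanh_act(v+h) − tanh_act(v−h)) / 2h ≈ tanh_deriv(tanh_act(v)).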
2. Weight Learning Rule – Backpropagation of Error
(1) Training Data {(x_p, d_p)} → Weights (W):
Curve (Data) Fitting (Modeling, Nonlinear Regression)
(2) Mean Squared Error E for a 1-D Function as an Example
E(W) = (1/2) Σ_p e_p² = (1/2) Σ_p (d_p − F(x_p, W))²   (Cost Function)
where (x_p, d_p) are the training pairs, f(x) is the True Function, and F(x, W) is the NN Approximating Function.
(3) Gradient Descent Learning
Δw = −η ∂E/∂w, η > 0
1) Batch: Δw = −η ∂E/∂w, with E = Σ_p E_p
2) Pattern: Δw = −η ∂E_p/∂w   (local [instantaneous] gradient)
(4) Learning Curve
Iteration = one scan of the training set (Epoch).
[Plot: E{W(n), weight track} versus the number of iterations n, decreasing from the initial error toward 0.]
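The batch vs. pattern distinction can be sketched on a one-weight model y = w·x with E = (1/2) Σ_p (d_p − w·x_p)²; the data, learning rate, and epoch count below are illustrative, not from the text:

```python
# (x_p, d_p) training pairs, roughly d = 2x
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
eta = 0.02

def batch_epoch(w):
    # 1) Batch: accumulate the full gradient over one scan, then update once.
    grad = sum(-(d - w * x) * x for x, d in data)
    return w - eta * grad

def pattern_epoch(w):
    # 2) Pattern: update after each sample (instantaneous/local gradient).
    for x, d in data:
        w -= eta * (-(d - w * x) * x)
    return w

w_b = w_p = 0.0
for _ in range(200):          # each loop iteration = one epoch
    w_b = batch_epoch(w_b)
    w_p = pattern_epoch(w_p)
```

Both variants settle near the least-squares weight w ≈ 2; pattern mode hops around it slightly because each per-sample gradient is only a noisy estimate of the full one.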
(5) Backpropagation Learning Rule
[Network diagram: input unit i → hidden unit j → output unit k; weights w_ij and w_jk; signals y_i, y_j, y_k; output error (d_k − y_k).]
A. Output Layer Weights
Δw_jk = −η ∂E_p/∂w_jk = η δ_k y_j   ( = η (local error)(local activation) )
where δ_k ≡ −∂E_p/∂sum_k = (d_k − y_k) φ'(sum_k)
B. Inner Layer Weights
Δw_ij = −η ∂E_p/∂w_ij = η δ_j y_i
where δ_j ≡ −∂E_p/∂sum_j = φ'(sum_j) Σ_k δ_k w_jk   (error signal for hidden units)
Features: Locality of Computation, No Centralized Control, 2-Pass (Credit Assignment)
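The two delta rules above can be sketched for a single training pattern with plain lists (index j over hidden units, k over outputs; the function names and the tanh(v/2)-based derivative are assumptions, not from the text):

```python
def phi_prime(phi_out):
    # tanh(v/2) derivative expressed through the unit's own output,
    # so no net input needs to be stored: phi' = (1/2)(1 - phi^2)
    return 0.5 * (1.0 - phi_out ** 2)

def backprop_deltas(y_hidden, y_out, d_out, w_out):
    """w_out[j][k] connects hidden unit j to output unit k."""
    # A. Output layer: delta_k = (d_k - y_k) * phi'(sum_k)
    delta_k = [(d - y) * phi_prime(y) for d, y in zip(d_out, y_out)]
    # B. Inner layer: delta_j = phi'(sum_j) * sum_k delta_k * w_jk
    delta_j = [phi_prime(yj) * sum(dk * w_out[j][k]
                                   for k, dk in enumerate(delta_k))
               for j, yj in enumerate(y_hidden)]
    return delta_j, delta_k
```

Each δ_j uses only the δ_k values one layer above and the weights leaving unit j, which is the locality-of-computation feature noted above.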
Water Flow Analogy to Backpropagation
[Diagram: a river flow; the input x_p is the object dropped upstream ("Drop Object Here"), the output y_p = F(x_p, W) is fetched downstream ("Fetch Object Here"); the many weights w_1, …, w_l are the flows; e_p = d_p − y_p is the error.]
– Many weights (flows).
– If the error is very sensitive to a weight change, then change that weight a lot, and vice versa.
→ Gradient Descent, Minimum Disturbance Principle
(6) Computation Example: MLP(2-1-2)
A. Forward Processing — Comp. Function Signals
sum_1 = v_1 x_1 + v_2 x_2
h = φ(sum_1)
y_1 = φ(sum_21), where sum_21 = w_1 h
y_2 = φ(sum_22), where sum_22 = w_2 h
[Diagram: inputs x_1, x_2 feed hidden unit h through weights v_1, v_2; h feeds outputs y_1, y_2 through weights w_1, w_2.]
No desired response is needed for hidden nodes. φ' must exist → φ = sigmoid [tanh or logistic]. For classification, d = ±0.9 for tanh; d = 0.1, 0.9 for logistic.
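The forward pass of the MLP(2-1-2) can be sketched directly from the equations above, assuming the tanh(v/2) sigmoid (the function names are mine):

```python
import math

def phi(v):
    # tanh(v/2) sigmoid activation, as in the text
    return math.tanh(v / 2)

def forward(x1, x2, v1, v2, w1, w2):
    """Forward processing of the MLP(2-1-2): 2 inputs, 1 hidden unit, 2 outputs."""
    sum1 = v1 * x1 + v2 * x2          # hidden net input
    h = phi(sum1)                     # hidden function signal
    sum21, sum22 = w1 * h, w2 * h     # output net inputs
    y1, y2 = phi(sum21), phi(sum22)   # output function signals
    return sum1, h, sum21, sum22, y1, y2
```

The intermediate signals sum_1, h, sum_21, sum_22 are returned as well because the backward pass reuses them.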
B. Backward Processing — Comp. Error Signals
e_1 = d_1 − y_1,  e_2 = d_2 − y_2
δ_21 = e_1 φ'(sum_21)
δ_22 = e_2 φ'(sum_22)
Δw_1 = η δ_21 h = η e_1 φ'(sum_21) h
Δw_2 = η δ_22 h = η e_2 φ'(sum_22) h
Δv_1 = η [w_1 δ_21 + w_2 δ_22] φ'(sum_1) x_1 = η [w_1 e_1 φ'(sum_21) + w_2 e_2 φ'(sum_22)] φ'(sum_1) x_1
Δv_2 = η [w_1 δ_21 + w_2 δ_22] φ'(sum_1) x_2 = η [w_1 e_1 φ'(sum_21) + w_2 e_2 φ'(sum_22)] φ'(sum_1) x_2
(Take η = 1; if φ(v) = 1/(1 + e^-v), then φ' = φ(1 − φ).)
h has been computed in forward processing.
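The backward pass for the MLP(2-1-2) can be sketched as follows, restating the forward pass so the block is self-contained (tanh(v/2) activation assumed; names are mine). The finite-difference check after the block verifies that each Δ really equals −η ∂E_p/∂w:

```python
import math

def phi(v):
    return math.tanh(v / 2)

def phi_prime(v):
    return 0.5 * (1.0 - phi(v) ** 2)

def forward(x1, x2, v1, v2, w1, w2):
    sum1 = v1 * x1 + v2 * x2
    h = phi(sum1)
    sum21, sum22 = w1 * h, w2 * h
    return sum1, h, sum21, sum22, phi(sum21), phi(sum22)

def backward(x1, x2, d1, d2, v1, v2, w1, w2, eta=1.0):
    """One backpropagation step for the MLP(2-1-2) example."""
    sum1, h, sum21, sum22, y1, y2 = forward(x1, x2, v1, v2, w1, w2)
    e1, e2 = d1 - y1, d2 - y2
    # output-layer error signals and weight changes
    delta21 = e1 * phi_prime(sum21)
    delta22 = e2 * phi_prime(sum22)
    dw1, dw2 = eta * delta21 * h, eta * delta22 * h
    # hidden-layer error signal, back-propagated through w1 and w2
    delta1 = (w1 * delta21 + w2 * delta22) * phi_prime(sum1)
    dv1, dv2 = eta * delta1 * x1, eta * delta1 * x2
    return dw1, dw2, dv1, dv2
```

Usage: compare e.g. Δv_1 from backward(...) against the numerical gradient −(E(v_1 + ε) − E(v_1 − ε)) / 2ε of E = (1/2)(e_1² + e_2²); the two agree to within the finite-difference error.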
If we knew f(x), it would be a lot faster to use it to calculate the output than to use the NN.
Student Questions:
Does the output error become more uncertain for a complex multilayer network than for a single layer?
Should we use only up to 3 layers?
Why can oscillation occur in the learning curve?
Do we use the old weights for calculating the error signal δ?
What does ANN mean?
Which makes more sense, the error gradient or the weight gradient, considering the equation for weight change?
What becomes the error signal to train the weights in forward mode?