
Backpropagation Learning Algorithm

• An algorithm for learning the weights in the network, given a training set of input-output pairs $\{x^\mu, o^\mu\}$.

• The algorithm is based on the gradient descent method.

• Architecture: $w_{ij}^{(l)}$ denotes the weight on the connection from the $i$-th unit in layer $(l-1)$ to the $j$-th unit in layer $l$.

Supervised Learning

Supervised learning algorithms require the presence of a "teacher" who provides the right answers to the input questions.

Technically, this means that we need a training set of the form

$$L = \left\{ \left( x^1, y^1 \right), \ldots, \left( x^p, y^p \right) \right\}$$

where:

• $x^h$ $(h = 1 \ldots p)$ is the network input vector

• $y^h$ $(h = 1 \ldots p)$ is the network output vector
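As a concrete illustration (not part of the original slides), such a training set can be written down in Python; the XOR mapping below is an assumed toy example.

```python
# A toy training set L = {(x^1, y^1), ..., (x^p, y^p)}.
# The XOR mapping is an illustrative assumption, not from the slides.
import numpy as np

L = [
    (np.array([0.0, 0.0]), np.array([0.0])),  # (x^1, y^1)
    (np.array([0.0, 1.0]), np.array([1.0])),  # (x^2, y^2)
    (np.array([1.0, 0.0]), np.array([1.0])),  # (x^3, y^3)
    (np.array([1.0, 1.0]), np.array([0.0])),  # (x^4, y^4)
]
p = len(L)  # p = 4 training examples
```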

Supervised Learning

The learning (or training) phase consists of determining a configuration of weights in such a way that the network output is as close as possible to the desired output, for all examples in the training set. Formally, this amounts to minimizing the following error function:

$$E = \frac{1}{2} \sum_{h=1}^{p} \sum_k \left( out_k^h - y_k^h \right)^2 = \frac{1}{2} \sum_{h=1}^{p} \left\| out^h - y^h \right\|^2$$

where $out^h$ is the output provided by the network when given $x^h$ as input.
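A minimal sketch of this error function in Python, with made-up pattern values purely for illustration:

```python
import numpy as np

def error(outputs, targets):
    """E = 1/2 * sum_h sum_k (out_k^h - y_k^h)^2  (sum-of-squares error)."""
    return 0.5 * sum(np.sum((out - y) ** 2) for out, y in zip(outputs, targets))

# Illustrative values: two patterns, two output units each.
outs = [np.array([0.8, 0.2]), np.array([0.1, 0.9])]
ys   = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(error(outs, ys))  # 0.5 * (0.04 + 0.04 + 0.01 + 0.01) ≈ 0.05
```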

Back-Propagation

To minimize the error function $E$ we can use the classic gradient-descent algorithm.

To compute the partial derivatives $\partial E / \partial w_{ij}$, we use the error back-propagation algorithm.

It consists of two stages:

• Forward pass: the input to the network is propagated layer after layer in the forward direction

• Backward pass: the "error" made by the network is propagated backward, and the weights are updated accordingly

Given pattern $\mu$, hidden unit $j$ receives a net input given by

$$h_j^\mu = \sum_k w_{jk}\, x_k^\mu$$

and produces as output:

$$V_j^\mu = g\!\left( h_j^\mu \right) = g\!\left( \sum_k w_{jk}\, x_k^\mu \right)$$
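A minimal sketch of this forward computation for the hidden layer, assuming a logistic sigmoid for the transfer function $g$ (the slides leave $g$ generic) and illustrative sizes:

```python
import numpy as np

def g(h):
    # Transfer function: assumed logistic sigmoid (the slides leave g generic).
    return 1.0 / (1.0 + np.exp(-h))

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 2))  # w[j, k]: input unit k -> hidden unit j
x = np.array([1.0, -1.0])               # input pattern x^mu

h = w @ x   # net inputs:    h_j = sum_k w_jk * x_k
V = g(h)    # hidden output: V_j = g(h_j)
```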

Back-Prop: Updating Hidden-to-Output Weights

Here $O_i^\mu = g(h_i^\mu)$, with $h_i^\mu = \sum_j W_{ij} V_j^\mu$, denotes the output of unit $i$ on pattern $\mu$. Gradient descent on $E = \frac{1}{2} \sum_{\mu,k} \left( y_k^\mu - O_k^\mu \right)^2$ gives:

$$\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}} = \eta \sum_\mu \left( y_i^\mu - O_i^\mu \right) \frac{\partial O_i^\mu}{\partial W_{ij}} = \eta \sum_\mu \left( y_i^\mu - O_i^\mu \right) g'\!\left( h_i^\mu \right) V_j^\mu = \eta \sum_\mu \delta_i^\mu\, V_j^\mu$$

where:

$$\delta_i^\mu = \left( y_i^\mu - O_i^\mu \right) g'\!\left( h_i^\mu \right)$$
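A minimal sketch of this update for a single pattern $\mu$ (the online variant of the sum over $\mu$ above); layer sizes, values, learning rate, and the sigmoid are illustrative assumptions:

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-h))

def g_prime(h):
    s = g(h)
    return s * (1.0 - s)  # derivative of the logistic sigmoid

eta = 0.1                               # learning rate (illustrative)
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(2, 3))  # W[i, j]: hidden unit j -> output unit i
V = np.array([0.5, 0.2, 0.9])           # hidden outputs V_j^mu
y = np.array([1.0, 0.0])                # desired outputs y_i^mu

h_out = W @ V                           # h_i = sum_j W_ij * V_j
O = g(h_out)                            # O_i = g(h_i)
delta = (y - O) * g_prime(h_out)        # delta_i = (y_i - O_i) * g'(h_i)
W += eta * np.outer(delta, V)           # Delta W_ij = eta * delta_i * V_j
```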

Back-Prop: Updating Input-to-Hidden Weights (I)

$$\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} = \eta \sum_{\mu,i} \left( y_i^\mu - O_i^\mu \right) \frac{\partial O_i^\mu}{\partial w_{jk}} = \eta \sum_{\mu,i} \left( y_i^\mu - O_i^\mu \right) g'\!\left( h_i^\mu \right) \frac{\partial h_i^\mu}{\partial w_{jk}}$$

and, since $h_i^\mu = \sum_l W_{il} V_l^\mu$:

$$\frac{\partial h_i^\mu}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_l W_{il}\, V_l^\mu = W_{ij}\, \frac{\partial V_j^\mu}{\partial w_{jk}} = W_{ij}\, \frac{\partial g\!\left( h_j^\mu \right)}{\partial w_{jk}} = W_{ij}\, g'\!\left( h_j^\mu \right) \frac{\partial h_j^\mu}{\partial w_{jk}}$$

Back-Prop: Updating Input-to-Hidden Weights (II)

Hence, we get:

$$\frac{\partial h_j^\mu}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \sum_m w_{jm}\, x_m^\mu = x_k^\mu$$

so that:

$$\Delta w_{jk} = \eta \sum_{\mu,i} \left( y_i^\mu - O_i^\mu \right) g'\!\left( h_i^\mu \right) W_{ij}\, g'\!\left( h_j^\mu \right) x_k^\mu = \eta \sum_{\mu,i} \delta_i^\mu\, W_{ij}\, g'\!\left( h_j^\mu \right) x_k^\mu = \eta \sum_\mu \hat{\delta}_j^\mu\, x_k^\mu$$

where:

$$\hat{\delta}_j^\mu = g'\!\left( h_j^\mu \right) \sum_i \delta_i^\mu\, W_{ij}$$
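Putting the two passes together, here is a minimal single-pattern sketch of the complete update (forward pass, back-propagated deltas, and both weight updates); sizes and the sigmoid choice remain illustrative assumptions:

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-h))

def g_prime(h):
    s = g(h)
    return s * (1.0 - s)

eta = 0.1
rng = np.random.default_rng(2)
w = rng.normal(scale=0.1, size=(3, 2))  # w[j, k]: input k  -> hidden j
W = rng.normal(scale=0.1, size=(2, 3))  # W[i, j]: hidden j -> output i
x = np.array([1.0, -1.0])               # input pattern x^mu
y = np.array([1.0, 0.0])                # target y^mu

# Forward pass: propagate the input layer after layer.
h_hid = w @ x                # h_j = sum_k w_jk * x_k
V = g(h_hid)                 # V_j = g(h_j)
h_out = W @ V                # h_i = sum_j W_ij * V_j
O = g(h_out)                 # O_i = g(h_i)

# Backward pass: propagate the error (the deltas) backward.
delta = (y - O) * g_prime(h_out)            # delta_i = (y_i - O_i) * g'(h_i)
delta_hat = g_prime(h_hid) * (W.T @ delta)  # delta_hat_j = g'(h_j) * sum_i delta_i * W_ij

# Weight updates.
W += eta * np.outer(delta, V)       # Delta W_ij = eta * delta_i     * V_j
w += eta * np.outer(delta_hat, x)   # Delta w_jk = eta * delta_hat_j * x_k
```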

Back-propagation of the error:

• the black lines indicate the signal propagated forward

• the blue lines indicate the error (the $\delta$'s) propagated backward