Multilayer Perceptron
Building Robots, Spring 2003
Multilayer Perceptron
Neural Networks with One and More Layers
The association problem
ξ – input to the network, of length N_I, i.e., {ξ_k ; k = 1, 2, …, N_I}
O – output, of length N_o, i.e., {O_i ; i = 1, 2, …, N_o}
ς – desired output, i.e., {ς_i ; i = 1, 2, …, N_o}
w – weights in the network, i.e., w_ik is the weight between ξ_k and O_i
T – threshold value for an output unit to be activated
g – function converting the input to output values between 0 and 1. Special case: the threshold function, g(x) = θ(x) = 1 if x > 0 and 0 otherwise.

Each output unit computes

    O_i = g( Σ_{k=1}^{N_I} w_ik ξ_k − T_i ).

Given an input pattern ξ, we would like the output O to be the desired one, ς. Indeed, we would like this to hold for a set of p pairs of input patterns and desired outputs (ξ^μ, ς^μ), μ = 1, …, p. The inputs and outputs may be continuous or boolean.
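The output rule above can be sketched directly in code (function and variable names are my own; g defaults to the threshold function θ):

```python
# Single-layer perceptron forward pass: O_i = g(sum_k w[i][k]*xi[k] - T[i]),
# with g the threshold function theta(x) = 1 if x > 0 else 0.
def theta(x):
    return 1 if x > 0 else 0

def forward(w, T, xi, g=theta):
    # w: N_o x N_I weight matrix, T: list of N_o thresholds, xi: input of length N_I
    return [g(sum(w_ik * x_k for w_ik, x_k in zip(w_i, xi)) - T_i)
            for w_i, T_i in zip(w, T)]
```

For example, `forward([[1, 1]], [1.5], [1, 1])` implements the AND unit discussed on the next slide.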
The geometric view of the weights
For the boolean case we want O_i = ς_i ∈ {0, 1}, i.e.,

    θ( Σ_{k=1}^{N_I} w_ik ξ_k − T_i ) = ς_i.

The boundary between positive and negative threshold is defined by w_i · ξ = T_i, which gives a plane (hyperplane) perpendicular to w_i. The solution is to find the hyperplane that separates all the inputs according to the desired classification.

For example, the boolean function AND:

    ς(1,1) = 1,   ς(1,0) = 0,   ς(0,1) = 0,   ς(0,0) = 0.

In the (ξ_1, ξ_2) plane the hyperplane is a line; the weights w_1 = (w_11, w_12) = (1, 1) with any threshold 1 < T_1 < 2 separate the point (1,1) from the other three.
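The AND example can be checked directly; a minimal sketch (T = 1.5 is one arbitrary choice in the interval (1, 2)):

```python
# Check that w = (1, 1) with threshold 1 < T < 2 realizes boolean AND:
# theta(w . xi - T) = AND(xi_1, xi_2) for all four inputs.
def theta(x):
    return 1 if x > 0 else 0

w, T = (1, 1), 1.5  # any 1 < T < 2 works
for xi, target in [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]:
    out = theta(w[0] * xi[0] + w[1] * xi[1] - T)
    assert out == target
```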
Learning: Steepest descent on weights
The optimal set of weights minimizes the following cost:

    E[w] = ½ Σ_{μ=1}^{p} Σ_{i=1}^{N_o} ( ς_i^μ − O_i^μ )²
         = ½ Σ_{μ=1}^{p} Σ_{i=1}^{N_o} [ ς_i^μ − g( Σ_{k=1}^{N_I} w_ik ξ_k^μ − T_i ) ]².

The steepest descent method will find a local minimum via

    w_ik(t+1) = w_ik(t) + Δw_ik,   Δw_ik = −η ∂E/∂w_ik = η Σ_{μ=1}^{p} δ_i^μ ξ_k^μ,

or

    Δw_ik = η δ_i^μ ξ_k^μ,

where the update can be done one pattern at a time, η is the "learning rate",

    h_i^μ = Σ_k w_ik ξ_k^μ − T_i,   and   δ_i^μ = g'(h_i^μ) ( ς_i^μ − O_i^μ )   (the "error").
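The per-pattern update can be sketched for a single sigmoid output unit; the learning rate, epoch count, and the AND training task are illustrative choices of mine, with g(h) = 1/(1 + e^(−2h)) as on the later slides:

```python
# Per-pattern steepest-descent ("delta rule") sketch for one sigmoid output unit.
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))   # so g'(h) = 2 g(h) (1 - g(h))

def train(patterns, n_inputs, eta=0.5, epochs=2000):
    w = [0.0] * n_inputs
    T = 0.0
    for _ in range(epochs):
        for xi, target in patterns:                       # one pattern at a time
            h = sum(wk * xk for wk, xk in zip(w, xi)) - T
            O = g(h)
            delta = 2.0 * O * (1.0 - O) * (target - O)    # g'(h) * error
            w = [wk + eta * delta * xk for wk, xk in zip(w, xi)]
            T = T - eta * delta   # dh/dT = -1, so the threshold moves with opposite sign
    return w, T

# Learn AND: the output should approach 1 only for input (1, 1).
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, T = train(AND, 2)
```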
Analysis of Learning Weights
The steepest descent rule

    Δw_i = w_i(t+1) − w_i(t) = η δ_i ξ

produces changes of the weight vector only in the direction of each pattern vector ξ. Thus, components of the vector perpendicular to the input patterns are left unchanged: writing

    w_ik = Σ_μ a_μ ξ_k^μ + ŵ_ik,

if ŵ_i is perpendicular to all input patterns, then that part of the weight vector never changes and will not affect the solution.

For g(h) = 1/(1 + e^{−2h}) we have g'(h_i) = 2 g(h_i)(1 − g(h_i)), which is largest when h_i is small. Since h_i = Σ_{k=1}^{N_I} w_ik ξ_k − T_i, the largest changes occur for units "in doubt" (weighted input close to the threshold value).
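A quick numerical check of this claim (using the slide's g with the factor of 2 in the exponent):

```python
# The sigmoid g(h) = 1/(1 + exp(-2h)) has derivative g'(h) = 2 g(h) (1 - g(h)),
# which peaks at h = 0, i.e., for a unit exactly at its threshold ("in doubt").
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def g_prime(h):
    return 2.0 * g(h) * (1.0 - g(h))
```

At h = 0 the derivative is 2 · 0.5 · 0.5 = 0.5, and it decays monotonically as |h| grows.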
Limitations of the Perceptron
Many problems, as simple as the XOR problem, cannot be solved by the perceptron (no hyperplane can separate the inputs):

    ς(1,1) = 0,   ς(1,0) = 1,   ς(0,1) = 1,   ς(0,0) = 0.

No line in the (ξ_1, ξ_2) plane can put (1,0) and (0,1) on one side and (0,0) and (1,1) on the other, so a single-layer perceptron has no solution.
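A small brute-force sketch supports this (the grid and step size are arbitrary choices of mine; the mathematical statement holds for all real weights):

```python
# Brute-force evidence that no single threshold unit computes XOR:
# scan a grid of weights (w1, w2) and thresholds T and check all four patterns.
def theta(x):
    return 1 if x > 0 else 0

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def solves_xor(w1, w2, T):
    return all(theta(w1 * x1 + w2 * x2 - T) == target
               for (x1, x2), target in XOR)

grid = [i / 2.0 for i in range(-10, 11)]  # -5.0 .. 5.0 in steps of 0.5
found = any(solves_xor(w1, w2, T) for w1 in grid for w2 in grid for T in grid)
```

`found` comes out False for every combination on the grid.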
Multilayer Neural Network
V^L – activities of layer L, the input to layer L+1 (V^0 = ξ is the network input)
w^L – weights connecting layer L to layer L+1
T^L – threshold values for the units at layer L

Each unit of layer L+1 computes

    V_j^{L+1} = g( Σ_{k=1}^{N_L} w_jk^L V_k^L − T_j^{L+1} ).

Thus, the output of a two-layer network is written as

    O_i = V_i^2 = g( Σ_{j=1}^{N_1} w_ij^1 V_j^1 − T_i^2 )
                = g( Σ_{j=1}^{N_1} w_ij^1 g( Σ_{k=1}^{N_0} w_jk^0 ξ_k − T_j^1 ) − T_i^2 ).

The cost optimization on all weights is given by

    E[{w^0, w^1}] = ½ Σ_{μ=1}^{p} Σ_{i=1}^{N_o} ( ς_i^μ − O_i^μ({w^0, w^1}) )².
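The nested two-layer formula, as a sketch (layer sizes and names are illustrative):

```python
# Two-layer forward pass: O = g(w1 . g(w0 . xi - T1) - T2), following the
# nested formula above, with g(h) = 1/(1 + exp(-2h)).
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def layer(w, T, v, act=g):
    # w: N_out x N_in weights, T: N_out thresholds, v: N_in input activities
    return [act(sum(wjk * vk for wjk, vk in zip(wj, v)) - Tj)
            for wj, Tj in zip(w, T)]

def two_layer(w0, T1, w1, T2, xi):
    return layer(w1, T2, layer(w0, T1, xi))
```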
Properties and How it Works
With one input layer, one output layer, one or more hidden layers, and enough units in each layer, any classification problem can be solved.

Example: the XOR problem,

    ς(1,1) = 0,   ς(1,0) = 1,   ς(0,1) = 1,   ς(0,0) = 0.

Take layer L=0 as the input (ξ_1, ξ_2), layer L=1 with hidden units V_1^1, V_2^1, and layer L=2 with the single output O = V_1^2. With thresholds T_1^1 = T_2^1 = T_1^2 = 0.5 and weights w_11^0 = w_22^0 = 1, w_12^0 = w_21^0 = −1, w_11^1 = w_12^1 = 1, the network computes

    V_1^1 = θ(ξ_1 − ξ_2 − 0.5),   V_2^1 = θ(ξ_2 − ξ_1 − 0.5),   O = θ(V_1^1 + V_2^1 − 0.5),

which is exactly XOR.

Later we address the generalization problem (performance on new examples).
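This construction can be verified exhaustively; the specific weight signs here are my reconstruction of the slide's diagram:

```python
# Verify the two-layer XOR construction: thresholds 0.5, hidden weights (1, -1)
# and (-1, 1), output weights (1, 1).
def theta(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    v1 = theta(x1 - x2 - 0.5)    # hidden unit 1: "x1 and not x2"
    v2 = theta(x2 - x1 - 0.5)    # hidden unit 2: "x2 and not x1"
    return theta(v1 + v2 - 0.5)  # output unit: OR of the two hidden units

for (x1, x2), target in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    assert xor_net(x1, x2) == target
```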
Learning: Steepest descent on weights
In the general case the weights are updated either over all patterns,

    Δw_pq^L = −η ∂E/∂w_pq^L = η Σ_μ δ_p^{μ,L+1} V_q^{μ,L},

or one pattern at a time,

    w_pq^L(t+1) = w_pq^L(t) + η δ_p^{L+1} V_q^L,     (2)

where

    δ_p^{final} = g'(h_p^{final}) ( ς_p − O_p )        (final layer)
    δ_p^L = g'(h_p^L) Σ_r w_rp^L δ_r^{L+1}             (any other layer)

and

    h_p^L = Σ_q w_pq^{L−1} V_q^{L−1} − T_p^L.

For g(h) = 1/(1 + e^{−2h}), g'(h_p) = 2 g(h_p)(1 − g(h_p)), so the final-layer error is

    δ_p^{final} = 2 O_p (1 − O_p) ( ς_p − O_p ).
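Putting rule (2) together for a network with one hidden layer; the initialization, learning rate, epoch count, and the XOR training task are illustrative choices of mine, not from the slides:

```python
# Backpropagation sketch, per-pattern updates following rule (2):
# w_pq^L(t+1) = w_pq^L(t) + eta * delta_p^{L+1} * V_q^L, with
# g(h) = 1/(1 + exp(-2h)) and g'(h) = 2 g (1 - g).
import math, random

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def train_xor(eta=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in, n_hid = 2, 2
    w0 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
    T1 = [rng.uniform(-1, 1) for _ in range(n_hid)]
    w1 = [rng.uniform(-1, 1) for _ in range(n_hid)]   # single output unit
    T2 = rng.uniform(-1, 1)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for xi, target in data:
            # forward pass
            h1 = [sum(w0[j][k] * xi[k] for k in range(n_in)) - T1[j]
                  for j in range(n_hid)]
            V1 = [g(h) for h in h1]
            O = g(sum(w1[j] * V1[j] for j in range(n_hid)) - T2)
            # backward pass: final-layer delta, then hidden-layer deltas
            d_out = 2.0 * O * (1.0 - O) * (target - O)
            d_hid = [2.0 * V1[j] * (1.0 - V1[j]) * w1[j] * d_out
                     for j in range(n_hid)]
            # per-pattern updates; thresholds move with opposite sign
            for j in range(n_hid):
                w1[j] += eta * d_out * V1[j]
                for k in range(n_in):
                    w0[j][k] += eta * d_hid[j] * xi[k]
                T1[j] -= eta * d_hid[j]
            T2 -= eta * d_out
    def predict(xi):
        V1 = [g(sum(w0[j][k] * xi[k] for k in range(n_in)) - T1[j])
              for j in range(n_hid)]
        return g(sum(w1[j] * V1[j] for j in range(n_hid)) - T2)
    return predict
```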
Learning Threshold Values
A new unit, clamped at −1, is added at each layer, such that the weight connecting it to the units of the next layer becomes the threshold value of those units (the clamped unit itself receives no weights). More precisely,

    h_p^L = Σ_{q=1} w_pq^{L−1} V_q^{L−1} − T_p^L = Σ_{q=0} w_pq^{L−1} V_q^{L−1},

where V_0^L = −1 and w_p0^{L−1} = T_p^L.

Thus, the learning rule (2),

    w_pq^L(t+1) = w_pq^L(t) + η δ_p^{L+1} V_q^L,

is extended to q = 0, and all thresholds are learned just like the weights.
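The clamped-unit trick as a sketch (the helper name is my own):

```python
# Clamped-unit trick: prepend V_0 = -1 to each layer's activity vector and store
# the threshold as weight w_p0 = T_p; then h_p = sum_{q>=0} w_pq * V_q with no
# separate threshold term, so rule (2) trains thresholds like any other weight.
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def forward_with_clamped_unit(w, v):
    # w: rows [w_p0, w_p1, ..., w_pN] where w_p0 is the threshold T_p;
    # v: layer activities [v_1, ..., v_N] (the clamped -1 is prepended here)
    v_ext = [-1.0] + list(v)
    return [g(sum(wpq * vq for wpq, vq in zip(wp, v_ext))) for wp in w]
```

For a row [0.5, 1, 1] and input [1, 1], the unit computes g(1 + 1 − 0.5), identical to the explicit-threshold form.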