Multilayer Perceptron

One and More Layers Neural Network (Building Robots, Spring 2003)


TRANSCRIPT

Page 1: Multilayer Perceptron


Multilayer Perceptron

One and More Layers Neural Network

Page 2: Multilayer Perceptron


The association problem

ξ – input to the network, of length N_I, i.e., {ξ_k ; k = 1, 2, …, N_I}
O – output, of length N_o, i.e., {O_i ; i = 1, 2, …, N_o}
ς – desired output, i.e., {ς_i ; i = 1, 2, …, N_o}
w – weights in the network, i.e., w_ik is the weight between ξ_k and O_i
T – threshold value for an output unit to be activated
g – function converting the input into output values between 0 and 1

Special case: the threshold function, g(x) = θ(x), which is 1 if x > 0 and 0 otherwise.

The output is

$$O_i = g\!\left(\sum_{k=1}^{N_I} w_{ik}\,\xi_k - T_i\right)$$

[Figure: single-layer network, inputs ξ feeding the output units O through weights w.]

Given an input pattern ξ, we would like the output O to be the desired one, ς. Indeed, we would like this to hold for a set of p input patterns ξ^μ and desired output patterns ς^μ, μ = 1, …, p. The inputs and outputs may be continuous or boolean.
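As a minimal sketch (not part of the slides), the forward pass above can be written in a few lines of Python; the function names and array shapes are illustrative assumptions:

```python
import numpy as np

def theta(x):
    """Threshold function: 1 where x > 0, else 0."""
    return (x > 0).astype(float)

def forward(xi, w, T, g=theta):
    """Single-layer output O_i = g(sum_k w_ik * xi_k - T_i).

    xi: input pattern, shape (N_I,)
    w:  weight matrix, shape (N_o, N_I)
    T:  thresholds, shape (N_o,)
    """
    return g(w @ xi - T)
```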

Page 3: Multilayer Perceptron


The geometric view of the weights

For the boolean case, with ξ_k ∈ {0, 1}, we want $O_i^\mu = \varsigma_i^\mu$. The boundary between positive and negative threshold is defined by

$$\sum_{k=1}^{N_I} w_{ik}\,\xi_k = T_i, \qquad \text{i.e.,} \quad \mathbf{w}_i \cdot \boldsymbol{\xi} = T_i,$$

which gives a plane (hyperplane) perpendicular to $\mathbf{w}_i$. The solution is to find the hyperplane that separates all the inputs according to the desired classification.

For example, the boolean function AND:

ξ^1 = (1,1) → ς^1 = 1, ξ^2 = (1,0) → ς^2 = 0, ξ^3 = (0,1) → ς^3 = 0, ξ^4 = (0,0) → ς^4 = 0

[Figure: the four inputs in the (ξ_1, ξ_2) plane; the weights w_i = (1,1) with a threshold 1 < T_i < 2 give a hyperplane (here a line) separating the single positive example (1,1) from the rest.]
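A quick check of the AND example (the particular threshold value T = 1.5 is an assumed choice inside the range 1 < T < 2 from the slide):

```python
import numpy as np

w = np.array([1.0, 1.0])   # weights from the slide
T = 1.5                    # any 1 < T < 2 works
for xi in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    out = 1.0 if w @ np.array(xi) - T > 0 else 0.0
    print(xi, "->", out)   # only (1, 1) fires
```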

Page 4: Multilayer Perceptron


Learning: Steepest descent on weights

The optimal set of weights minimizes the following cost:

$$E(w) = \frac{1}{2}\sum_{\mu=1}^{p}\sum_{i=1}^{N_o}\bigl(\varsigma_i^\mu - O_i^\mu\bigr)^2 = \frac{1}{2}\sum_{\mu=1}^{p}\sum_{i=1}^{N_o}\Bigl(\varsigma_i^\mu - g\Bigl(\sum_{k=1}^{N_I} w_{ik}\,\xi_k^\mu - T_i\Bigr)\Bigr)^2$$

The steepest descent method will find a local minimum via

$$\Delta w_{ik} = w_{ik}(t+1) - w_{ik}(t) = -\eta\,\frac{\partial E}{\partial w_{ik}} = \eta\sum_{\mu=1}^{p}\delta_i^\mu\,\xi_k^\mu$$

or, updating one pattern at a time,

$$w_{ik}(t+1) = w_{ik}(t) + \eta\,\delta_i^\mu\,\xi_k^\mu,$$

where η is the "learning rate",

$$h_i^\mu = \sum_{k=1}^{N_I} w_{ik}\,\xi_k^\mu - T_i, \qquad O_i^\mu = g(h_i^\mu),$$

and

$$\delta_i^\mu = g'(h_i^\mu)\,\bigl(\varsigma_i^\mu - O_i^\mu\bigr) \quad \text{(the error term)}.$$
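A sketch of one per-pattern update, assuming the sigmoid g(h) = 1/(1 + e^(−2h)) used later on the slides (the function names and the learning rate value are illustrative):

```python
import numpy as np

def g(h):
    """Sigmoid from the slides: g(h) = 1 / (1 + exp(-2h))."""
    return 1.0 / (1.0 + np.exp(-2.0 * h))

def train_step(w, T, xi, target, eta=0.1):
    """One per-pattern steepest-descent update:
    w_ik(t+1) = w_ik(t) + eta * delta_i * xi_k,
    with delta_i = g'(h_i) * (target_i - O_i) and g'(h) = 2 g(h) (1 - g(h)).
    """
    h = w @ xi - T                     # h_i = sum_k w_ik xi_k - T_i
    O = g(h)
    delta = 2.0 * O * (1.0 - O) * (target - O)
    w += eta * np.outer(delta, xi)     # weight update
    return w, O
```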

Page 5: Multilayer Perceptron


Analysis of Learning Weights

The steepest descent rule

$$w_{ik}(t+1) = w_{ik}(t) + \eta\,\delta_i^\mu\,\xi_k^\mu$$

produces changes in the weight vector only in the direction of each pattern vector $\boldsymbol{\xi}^\mu$. Thus, components of the weight vector perpendicular to the input patterns are left unchanged: decomposing $w_{ik} = \hat{w}_{ik} + \sum_\mu a_\mu\,\xi_k^\mu$, if $\hat{\mathbf{w}}_i$ is perpendicular to all input patterns, then the changes in weight will not affect it, and it plays no role in the solution.

For

$$g(h_i) = \frac{1}{1 + e^{-2 h_i}}, \qquad g'(h_i) = 2\,g(h_i)\,\bigl(1 - g(h_i)\bigr),$$

which is largest when $|h_i|$ is small. Since $h_i = \sum_{k=1}^{N_I} w_{ik}\,\xi_k - T_i$, the largest changes occur for units "in doubt" (close to the threshold value).

[Figure: the sigmoid g(h_i), rising from 0 to 1 and steepest near h_i = 0.]
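A quick numerical confirmation of the derivative identity and of the "in doubt" claim (a throwaway check, not from the slides):

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-2.0 * h))

# Finite-difference check that g'(h) = 2 g(h) (1 - g(h)),
# and that the derivative peaks at h = 0 (units "in doubt").
h = np.linspace(-3, 3, 7)
eps = 1e-6
numeric = (g(h + eps) - g(h - eps)) / (2 * eps)
analytic = 2 * g(h) * (1 - g(h))
print(np.max(np.abs(numeric - analytic)))  # ~1e-10
print(h[np.argmax(analytic)])              # 0.0
```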

Page 6: Multilayer Perceptron


Limitations of the Perceptron

Many problems, as simple as the XOR problem, cannot be solved by the perceptron (no hyperplane can separate the inputs):

ξ^1 = (1,1) → ς^1 = 1, ξ^2 = (1,0) → ς^2 = 0, ξ^3 = (0,1) → ς^3 = 0, ξ^4 = (0,0) → ς^4 = 1

[Figure: the four inputs in the (ξ_1, ξ_2) plane; any single line fails to separate the two classes ("Not a solution").]
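A brute-force illustration of the claim (an illustration over a finite grid, not a proof; the pattern table is the slide's, which maps both (1,1) and (0,0) to 1):

```python
import numpy as np
from itertools import product

patterns = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 1}

def solves(w1, w2, T):
    return all((w1 * x1 + w2 * x2 - T > 0) == bool(t)
               for (x1, x2), t in patterns.items())

# Scan a dense grid of weights and thresholds: no single
# threshold unit reproduces the table.
grid = np.linspace(-2, 2, 81)
print(any(solves(w1, w2, T) for w1, w2, T in product(grid, grid, grid)))
# -> False
```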

Page 7: Multilayer Perceptron


Multilayer Neural Network

[Figure: layered network V^0 → (w^0) → V^1 → (w^1) → O.]

V^L – output of layer L, the input of layer L to layer L+1
w^L – weights connecting layer L to layer L+1
T^L – threshold values for units at layer L

Each layer's units are computed as

$$V_j^{L+1} = g\!\left(\sum_{k=1}^{N_L} w_{jk}^{L}\,V_k^{L} - T_j^{L+1}\right)$$

Thus, the output of a two-layer network is written as

$$O_i^\mu = V_i^{2,\mu} = g\!\left(\sum_{j=1}^{N_1} w_{ij}^{1}\,V_j^{1,\mu} - T_i^{2}\right) = g\!\left(\sum_{j=1}^{N_1} w_{ij}^{1}\,g\!\left(\sum_{k=1}^{N_0} w_{jk}^{0}\,\xi_k^\mu - T_j^{1}\right) - T_i^{2}\right)$$

The cost optimization on all weights is given by

$$E\bigl(\{w_{ij}^{1}, w_{jk}^{0}\}\bigr) = \frac{1}{2}\sum_{\mu=1}^{p}\sum_{i=1}^{N_o}\Bigl(\varsigma_i^\mu - O_i^\mu\bigl(\{w_{ij}^{1}, w_{jk}^{0}\}\bigr)\Bigr)^2$$
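A minimal sketch of the two-layer forward pass (function and variable names are assumptions, matching the notation above):

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-2.0 * h))

def forward_two_layer(xi, w0, T1, w1, T2):
    """Two-layer forward pass:
    V1_j = g(sum_k w0_jk xi_k - T1_j)   (hidden layer L=1)
    O_i  = g(sum_j w1_ij V1_j - T2_i)   (output layer L=2)
    """
    V1 = g(w0 @ xi - T1)
    O = g(w1 @ V1 - T2)
    return O
```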

Page 8: Multilayer Perceptron


Properties and How it Works

With one input layer, one output layer, one or more hidden layers, and enough units for each layer, any classification problem can be solved.

Example: the XOR problem:

ξ^1 = (1,1) → ς^1 = 1, ξ^2 = (1,0) → ς^2 = 0, ξ^3 = (0,1) → ς^3 = 0, ξ^4 = (0,0) → ς^4 = 1

[Figure: a two-layer network that solves the problem. The inputs (layer L=0) feed two hidden units V_1^{L=1} and V_2^{L=1} through weights w^{L=0} of magnitude 1; the output O = V_1^{L=2} (layer L=2) combines them with weights w^{L=1} = (1,1); each unit has its own threshold T. A check of this network appears below.]

Later we address the generalization problem (for new examples).
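The slide's exact weight and threshold values are garbled in this transcript. One standard assignment that realizes the table above (an assumption, not necessarily the slide's): hidden unit 1 computes AND with w = (1, 1), T = 1.5; hidden unit 2 computes NOR with w = (−1, −1), T = −0.5; the output unit ORs them with w = (1, 1), T = 0.5:

```python
import numpy as np

def theta(x):
    return (x > 0).astype(float)

# Hypothetical weights/thresholds realizing the slide's truth table:
w0 = np.array([[1.0, 1.0],     # hidden unit 1: AND
               [-1.0, -1.0]])  # hidden unit 2: NOR
T1 = np.array([1.5, -0.5])
w1 = np.array([[1.0, 1.0]])    # output: OR of the hidden units
T2 = np.array([0.5])

for xi in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    V1 = theta(w0 @ np.array(xi, float) - T1)
    O = theta(w1 @ V1 - T2)
    print(xi, "->", int(O[0]))   # 1, 0, 0, 1
```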

Page 9: Multilayer Perceptron


Learning: Steepest descent on weights

In the general case the weight updates are

$$\Delta w_{pq}^{L}(t) = -\eta\,\frac{\partial E}{\partial w_{pq}^{L}} = \eta\sum_{\mu}\delta_p^{\mu,L}\,V_q^{\mu,L-1},$$

or, per pattern,

$$w_{pq}^{L}(t+1) = w_{pq}^{L}(t) + \eta\,\delta_p^{\mu,L}\,V_q^{\mu,L-1}, \qquad (2)$$

where

$$h_p^{\mu,L} = \sum_q w_{pq}^{L}\,V_q^{\mu,L-1} - T_p^{L},$$

$$\delta_p^{\mu,\text{final}} = g'\bigl(h_p^{\mu,\text{final}}\bigr)\,\bigl(\varsigma_p^\mu - O_p^\mu\bigr) \quad \text{(final layer)},$$

$$\delta_p^{\mu,L} = g'\bigl(h_p^{\mu,L}\bigr)\sum_r w_{rp}^{L+1}\,\delta_r^{\mu,L+1} \quad \text{(any other layer)}.$$

For $g(h_p) = \bigl(1 + e^{-2 h_p}\bigr)^{-1}$, $g'(h_p) = 2\,g(h_p)\,\bigl(1 - g(h_p)\bigr)$, so at the final layer

$$\delta_p^{\mu,\text{final}} = 2\,O_p^\mu\,\bigl(1 - O_p^\mu\bigr)\,\bigl(\varsigma_p^\mu - O_p^\mu\bigr).$$

[Figure: the error δ_p^L of a unit in layer L collects the errors δ_r^{L+1} of the units it feeds in layer L+1.]
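A sketch of rule (2) for a two-layer network, under assumed shapes w0: (N1, N0) and w1: (No, N1); thresholds are left fixed here, since the next slide folds them into the weights:

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-2.0 * h))

def backprop_step(xi, target, w0, T1, w1, T2, eta=0.1):
    """One per-pattern update of rule (2) for a two-layer network."""
    # Forward pass
    V1 = g(w0 @ xi - T1)
    O = g(w1 @ V1 - T2)
    # Errors: final layer, then propagated back (g' = 2 g (1 - g))
    delta2 = 2 * O * (1 - O) * (target - O)
    delta1 = 2 * V1 * (1 - V1) * (w1.T @ delta2)
    # Weight updates: w_pq^L += eta * delta_p^L * V_q^{L-1}
    w1 += eta * np.outer(delta2, V1)
    w0 += eta * np.outer(delta1, xi)
    return O
```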

Page 10: Multilayer Perceptron


Learning Threshold Values

[Figure: a two-layer network with an extra unit clamped at −1 in each layer; the weights leaving the clamped units play the role of the thresholds T.]

A new unit, clamped at −1, is added at each layer, such that the weight connecting it to the next layer's units becomes the threshold value of those units (no weights reach the clamped units themselves, i.e., $w_{0q}^{L} = 0$). More precisely, with $V_0^{\mu,L} = -1$ and $w_{p0}^{L} = T_p^{L}$,

$$h_p^{\mu,L} = \sum_{q=1}^{N} w_{pq}^{L}\,V_q^{\mu,L-1} - T_p^{L} = \sum_{q=0}^{N} w_{pq}^{L}\,V_q^{\mu,L-1}.$$

Thus, the learning rule (2),

$$w_{pq}^{L}(t+1) = w_{pq}^{L}(t) + \eta\,\delta_p^{\mu,L}\,V_q^{\mu,L-1},$$

is extended to q = 0 values, and all the thresholds are learned just like the weights.
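A minimal sketch of the clamped-unit trick (names are illustrative; column 0 of the augmented weight matrix holds the thresholds):

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-2.0 * h))

def forward_with_clamped_unit(V_prev, w_aug):
    """w_aug has shape (N_next, N_prev + 1); column 0 holds T_p,
    so h_p = sum_{q>=0} w_pq V_q with V_0 clamped at -1 equals
    sum_{q>=1} w_pq V_q - T_p."""
    V_aug = np.concatenate(([-1.0], V_prev))   # V_0 = -1
    return g(w_aug @ V_aug)
```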