Multilayer Perceptron
Building Robots, Spring 2003
Multilayer Perceptron
Neural Networks with One and More Layers
The association problem
ξ – input to the network, of length N_I, i.e., {ξ_k ; k = 1, 2, …, N_I}
O – output, of length N_o, i.e., {O_i ; i = 1, 2, …, N_o}
ς – desired output, i.e., {ς_i ; i = 1, 2, …, N_o}
w – weights in the network, i.e., w_ik is the weight between ξ_k and O_i
T – threshold value for an output unit to be activated
g – function converting the input to output values between 0 and 1. Special case: the threshold function, g(x) = θ(x) = 1 if x > 0 and 0 otherwise.

Each output unit computes

    O_i = g( Σ_{k=1}^{N_I} w_ik ξ_k − T_i ).

Given an input pattern ξ, we would like the output O to be the desired one, ς. Indeed, we would like this to hold for a set of p pairs of input patterns and desired outputs (ξ^μ, ς^μ), μ = 1, …, p. The inputs and outputs may be continuous or boolean.
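The output rule above can be sketched directly in code (function and variable names are my own; g defaults to the threshold function θ):

```python
# Single-layer perceptron forward pass: O_i = g(sum_k w[i][k]*xi[k] - T[i]),
# with g the threshold function theta(x) = 1 if x > 0 else 0.
def theta(x):
    return 1 if x > 0 else 0

def forward(w, T, xi, g=theta):
    # w: N_o x N_I weight matrix, T: list of N_o thresholds, xi: input of length N_I
    return [g(sum(w_ik * x_k for w_ik, x_k in zip(w_i, xi)) - T_i)
            for w_i, T_i in zip(w, T)]
```

For example, `forward([[1, 1]], [1.5], [1, 1])` implements the AND unit discussed on the next slide.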
The geometric view of the weights
For the boolean case we want O_i = ς_i ∈ {0, 1}, i.e.,

    θ( Σ_{k=1}^{N_I} w_ik ξ_k − T_i ) = ς_i.

The boundary between positive and negative threshold is defined by w_i · ξ = T_i, which gives a plane (hyperplane) perpendicular to w_i. The solution is to find the hyperplane that separates all the inputs according to the desired classification.

For example, the boolean function AND:

    ς(1,1) = 1,   ς(1,0) = 0,   ς(0,1) = 0,   ς(0,0) = 0.

In the (ξ_1, ξ_2) plane the hyperplane is a line; the weights w_1 = (w_11, w_12) = (1, 1) with any threshold 1 < T_1 < 2 separate the point (1,1) from the other three.
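The AND example can be checked directly; a minimal sketch (T = 1.5 is one arbitrary choice in the interval (1, 2)):

```python
# Check that w = (1, 1) with threshold 1 < T < 2 realizes boolean AND:
# theta(w . xi - T) = AND(xi_1, xi_2) for all four inputs.
def theta(x):
    return 1 if x > 0 else 0

w, T = (1, 1), 1.5  # any 1 < T < 2 works
for xi, target in [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]:
    out = theta(w[0] * xi[0] + w[1] * xi[1] - T)
    assert out == target
```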
Learning: Steepest descent on weights
The optimal set of weights minimizes the following cost:

    E[w] = ½ Σ_{μ=1}^{p} Σ_{i=1}^{N_o} ( ς_i^μ − O_i^μ )²
         = ½ Σ_{μ=1}^{p} Σ_{i=1}^{N_o} [ ς_i^μ − g( Σ_{k=1}^{N_I} w_ik ξ_k^μ − T_i ) ]².

The steepest descent method will find a local minimum via

    w_ik(t+1) = w_ik(t) + Δw_ik,   Δw_ik = −η ∂E/∂w_ik = η Σ_{μ=1}^{p} δ_i^μ ξ_k^μ,

or

    Δw_ik = η δ_i^μ ξ_k^μ,

where the update can be done one pattern at a time, η is the "learning rate",

    h_i^μ = Σ_k w_ik ξ_k^μ − T_i,   and   δ_i^μ = g'(h_i^μ) ( ς_i^μ − O_i^μ )   (the "error").
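The per-pattern update can be sketched for a single sigmoid output unit; the learning rate, epoch count, and the AND training task are illustrative choices of mine, with g(h) = 1/(1 + e^(−2h)) as on the later slides:

```python
# Per-pattern steepest-descent ("delta rule") sketch for one sigmoid output unit.
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))   # so g'(h) = 2 g(h) (1 - g(h))

def train(patterns, n_inputs, eta=0.5, epochs=2000):
    w = [0.0] * n_inputs
    T = 0.0
    for _ in range(epochs):
        for xi, target in patterns:                       # one pattern at a time
            h = sum(wk * xk for wk, xk in zip(w, xi)) - T
            O = g(h)
            delta = 2.0 * O * (1.0 - O) * (target - O)    # g'(h) * error
            w = [wk + eta * delta * xk for wk, xk in zip(w, xi)]
            T = T - eta * delta   # dh/dT = -1, so the threshold moves with opposite sign
    return w, T

# Learn AND: the output should approach 1 only for input (1, 1).
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, T = train(AND, 2)
```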
Analysis of Learning Weights
The steepest descent rule

    Δw_i = w_i(t+1) − w_i(t) = η δ_i ξ

produces changes of the weight vector only in the direction of each pattern vector ξ. Thus, components of the vector perpendicular to the input patterns are left unchanged: writing

    w_ik = Σ_μ a_μ ξ_k^μ + ŵ_ik,

if ŵ_i is perpendicular to all input patterns, then that part of the weight vector never changes and will not affect the solution.

For g(h) = 1/(1 + e^{−2h}) we have g'(h_i) = 2 g(h_i)(1 − g(h_i)), which is largest when h_i is small. Since h_i = Σ_{k=1}^{N_I} w_ik ξ_k − T_i, the largest changes occur for units "in doubt" (weighted input close to the threshold value).
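A quick numerical check of this claim (using the slide's g with the factor of 2 in the exponent):

```python
# The sigmoid g(h) = 1/(1 + exp(-2h)) has derivative g'(h) = 2 g(h) (1 - g(h)),
# which peaks at h = 0, i.e., for a unit exactly at its threshold ("in doubt").
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def g_prime(h):
    return 2.0 * g(h) * (1.0 - g(h))
```

At h = 0 the derivative is 2 · 0.5 · 0.5 = 0.5, and it decays monotonically as |h| grows.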
Limitations of the Perceptron
Many problems, as simple as the XOR problem, cannot be solved by the perceptron (no hyperplane can separate the inputs):

    ς(1,1) = 0,   ς(1,0) = 1,   ς(0,1) = 1,   ς(0,0) = 0.

No line in the (ξ_1, ξ_2) plane can put (1,0) and (0,1) on one side and (0,0) and (1,1) on the other, so a single-layer perceptron has no solution.
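A small brute-force sketch supports this (the grid and step size are arbitrary choices of mine; the mathematical statement holds for all real weights):

```python
# Brute-force evidence that no single threshold unit computes XOR:
# scan a grid of weights (w1, w2) and thresholds T and check all four patterns.
def theta(x):
    return 1 if x > 0 else 0

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def solves_xor(w1, w2, T):
    return all(theta(w1 * x1 + w2 * x2 - T) == target
               for (x1, x2), target in XOR)

grid = [i / 2.0 for i in range(-10, 11)]  # -5.0 .. 5.0 in steps of 0.5
found = any(solves_xor(w1, w2, T) for w1 in grid for w2 in grid for T in grid)
```

`found` comes out False for every combination on the grid.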
Multilayer Neural Network
V^L – activities of layer L, the input to layer L+1 (V^0 = ξ is the network input)
w^L – weights connecting layer L to layer L+1
T^L – threshold values for the units at layer L

Each unit of layer L+1 computes

    V_j^{L+1} = g( Σ_{k=1}^{N_L} w_jk^L V_k^L − T_j^{L+1} ).

Thus, the output of a two-layer network is written as

    O_i = V_i^2 = g( Σ_{j=1}^{N_1} w_ij^1 V_j^1 − T_i^2 )
                = g( Σ_{j=1}^{N_1} w_ij^1 g( Σ_{k=1}^{N_0} w_jk^0 ξ_k − T_j^1 ) − T_i^2 ).

The cost optimization on all weights is given by

    E[{w^0, w^1}] = ½ Σ_{μ=1}^{p} Σ_{i=1}^{N_o} ( ς_i^μ − O_i^μ({w^0, w^1}) )².
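The nested two-layer formula, as a sketch (layer sizes and names are illustrative):

```python
# Two-layer forward pass: O = g(w1 . g(w0 . xi - T1) - T2), following the
# nested formula above, with g(h) = 1/(1 + exp(-2h)).
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def layer(w, T, v, act=g):
    # w: N_out x N_in weights, T: N_out thresholds, v: N_in input activities
    return [act(sum(wjk * vk for wjk, vk in zip(wj, v)) - Tj)
            for wj, Tj in zip(w, T)]

def two_layer(w0, T1, w1, T2, xi):
    return layer(w1, T2, layer(w0, T1, xi))
```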
Properties and How it Works
With one input layer, one output layer, one or more hidden layers, and enough units in each layer, any classification problem can be solved.

Example: the XOR problem,

    ς(1,1) = 0,   ς(1,0) = 1,   ς(0,1) = 1,   ς(0,0) = 0.

Take layer L=0 as the input (ξ_1, ξ_2), layer L=1 with hidden units V_1^1, V_2^1, and layer L=2 with the single output O = V_1^2. With thresholds T_1^1 = T_2^1 = T_1^2 = 0.5 and weights w_11^0 = w_22^0 = 1, w_12^0 = w_21^0 = −1, w_11^1 = w_12^1 = 1, the network computes

    V_1^1 = θ(ξ_1 − ξ_2 − 0.5),   V_2^1 = θ(ξ_2 − ξ_1 − 0.5),   O = θ(V_1^1 + V_2^1 − 0.5),

which is exactly XOR.

Later we address the generalization problem (performance on new examples).
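This construction can be verified exhaustively; the specific weight signs here are my reconstruction of the slide's diagram:

```python
# Verify the two-layer XOR construction: thresholds 0.5, hidden weights (1, -1)
# and (-1, 1), output weights (1, 1).
def theta(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    v1 = theta(x1 - x2 - 0.5)    # hidden unit 1: "x1 and not x2"
    v2 = theta(x2 - x1 - 0.5)    # hidden unit 2: "x2 and not x1"
    return theta(v1 + v2 - 0.5)  # output unit: OR of the two hidden units

for (x1, x2), target in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    assert xor_net(x1, x2) == target
```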
Learning: Steepest descent on weights
In the general case the weights are updated either over all patterns,

    Δw_pq^L = −η ∂E/∂w_pq^L = η Σ_μ δ_p^{μ,L+1} V_q^{μ,L},

or one pattern at a time,

    w_pq^L(t+1) = w_pq^L(t) + η δ_p^{L+1} V_q^L,     (2)

where

    δ_p^{final} = g'(h_p^{final}) ( ς_p − O_p )        (final layer)
    δ_p^L = g'(h_p^L) Σ_r w_rp^L δ_r^{L+1}             (any other layer)

and

    h_p^L = Σ_q w_pq^{L−1} V_q^{L−1} − T_p^L.

For g(h) = 1/(1 + e^{−2h}), g'(h_p) = 2 g(h_p)(1 − g(h_p)), so the final-layer error is

    δ_p^{final} = 2 O_p (1 − O_p) ( ς_p − O_p ).
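Putting rule (2) together for a network with one hidden layer; the initialization, learning rate, epoch count, and the XOR training task are illustrative choices of mine, not from the slides:

```python
# Backpropagation sketch, per-pattern updates following rule (2):
# w_pq^L(t+1) = w_pq^L(t) + eta * delta_p^{L+1} * V_q^L, with
# g(h) = 1/(1 + exp(-2h)) and g'(h) = 2 g (1 - g).
import math, random

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def train_xor(eta=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in, n_hid = 2, 2
    w0 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
    T1 = [rng.uniform(-1, 1) for _ in range(n_hid)]
    w1 = [rng.uniform(-1, 1) for _ in range(n_hid)]   # single output unit
    T2 = rng.uniform(-1, 1)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for xi, target in data:
            # forward pass
            h1 = [sum(w0[j][k] * xi[k] for k in range(n_in)) - T1[j]
                  for j in range(n_hid)]
            V1 = [g(h) for h in h1]
            O = g(sum(w1[j] * V1[j] for j in range(n_hid)) - T2)
            # backward pass: final-layer delta, then hidden-layer deltas
            d_out = 2.0 * O * (1.0 - O) * (target - O)
            d_hid = [2.0 * V1[j] * (1.0 - V1[j]) * w1[j] * d_out
                     for j in range(n_hid)]
            # per-pattern updates; thresholds move with opposite sign
            for j in range(n_hid):
                w1[j] += eta * d_out * V1[j]
                for k in range(n_in):
                    w0[j][k] += eta * d_hid[j] * xi[k]
                T1[j] -= eta * d_hid[j]
            T2 -= eta * d_out
    def predict(xi):
        V1 = [g(sum(w0[j][k] * xi[k] for k in range(n_in)) - T1[j])
              for j in range(n_hid)]
        return g(sum(w1[j] * V1[j] for j in range(n_hid)) - T2)
    return predict
```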
Learning Threshold Values
A new unit, clamped at −1, is added at each layer, such that the weight connecting it to the units of the next layer becomes the threshold value of those units (the clamped unit itself receives no weights). More precisely,

    h_p^L = Σ_{q=1} w_pq^{L−1} V_q^{L−1} − T_p^L = Σ_{q=0} w_pq^{L−1} V_q^{L−1},

where V_0^L = −1 and w_p0^{L−1} = T_p^L.

Thus, the learning rule (2),

    w_pq^L(t+1) = w_pq^L(t) + η δ_p^{L+1} V_q^L,

is extended to q = 0, and all thresholds are learned just like the weights.
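The clamped-unit trick as a sketch (the helper name is my own):

```python
# Clamped-unit trick: prepend V_0 = -1 to each layer's activity vector and store
# the threshold as weight w_p0 = T_p; then h_p = sum_{q>=0} w_pq * V_q with no
# separate threshold term, so rule (2) trains thresholds like any other weight.
import math

def g(h):
    return 1.0 / (1.0 + math.exp(-2.0 * h))

def forward_with_clamped_unit(w, v):
    # w: rows [w_p0, w_p1, ..., w_pN] where w_p0 is the threshold T_p;
    # v: layer activities [v_1, ..., v_N] (the clamped -1 is prepended here)
    v_ext = [-1.0] + list(v)
    return [g(sum(wpq * vq for wpq, vq in zip(wp, v_ext))) for wp in w]
```

For a row [0.5, 1, 1] and input [1, 1], the unit computes g(1 + 1 − 0.5), identical to the explicit-threshold form.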