simple perceptrons - ucsb computer scienceyfwang/courses/cs290i_prann/pdf/percept… · pr , ann,...

Simple Simple PerceptronsPerceptrons

PR , ANN, & ML 2

Simple Perceptrons

� Perform supervised learning

� correct I/O associations are provided

� Feed-forward networks

� connections are one-directional

� One layer

� input layer + output layer

PR , ANN, & ML 3

� N: dimension of the input vector

� M: dimension of the output vector

� inputs

� real outputs

� weight vectors

� activation function

Njx j ,...,1, =

Miyi ,...,1, =

w i M j Nij , , ..., , , ...,= =1 1

g

1x 2x 3x4x

1y 2y 3y

w34 ∑=

==N

j

jijii xwgnetgy1

)()(

Notations

PR , ANN, & ML 4

Perceptron Training

� given

� input patterns

� desired output patterns

� how to adapt the connection weights such that

the actual outputs conform the desired outputs

uxu

O

MiyOu

i

u

i ,,1 L==

PR , ANN, & ML 5

Simplest case� two types of inputs

� binary outputs (-1,1)

� thresholding

1x

2x

( , )w w1 2

u

o

uuwxwxw yxw =++=⋅ )sgn()sgn( 2211

PR , ANN, & ML 6

Examples

2x

1x 2x O

0 0 -1

0 1 -1

1 0 -1

1 1 1

1x

2x

w w1 1= w2 1=

w0 15= .O

1x

2x O

0 0 -1

0 1 1

1 0 1

1 1 -1

)5.1sgn()( 2211 −+=uu

xwxwnetg

2x2x

1x 1x

1x 2x

PR , ANN, & ML 7

Linear separability

� Is it at all possible to learn the desired I/O

associations?

� yes, if wij can be found such that

� no, otherwise

� Single-layer perceptron is severely limited

in what it can learn

u and i allfor )sgn(1

0

u

i

N

j

i

u

jij

u

i ywxwO =−= ∑=

PR , ANN, & ML 8

Perceptron Learning

� Linear separable or not, how to find the set

of weights?

� Using tagged samples

� closed form solution

� iterative solutions

PR , ANN, & ML 9

Closed Form Solution

BAA)(AW

BAW

T1T −=

=

=

uo

u

n

u

n

n

O

O

O

w

w

w

xx

xx

xx

MMOOOO

2

1

2

1

1

22

1

11

1

1...

1...

1...

� Not practical when number of samples is

large (most likely case)

PR , ANN, & ML 10

1x

2x

),( 21 ww

Perceptron Learning Rule

� If a pattern is correctly classified, no action

PR , ANN, & ML 11

1x

2x

),( 21 ww

Perceptron Learning Rule (cont.)

� If a positive pattern becomes a negative

pattern

PR , ANN, & ML 12

1x

2x

),( 21 ww


� If a negative pattern becomes a positive

pattern

PR , ANN, & ML 13

−∈>⋅−

+∈<⋅+

=+

otherwise

c

c

k

kk

kk

k

)(

)()(

)()(

)1( ,0

,0

w

xxwxw

xxwxw

w

� How should c be decided?

� Fixed increment

� Fractional correction

−+∈<⋅+

=+

otherwise

orycy

k

kk

k

)(

)()(

)1(

,0)(

w

xxwxw

w

PR , ANN, & ML 14


� Weight is a signed, linear combination of

training points

� Use those informative points (those the

classifier made a mistake, mistake driven)

� This is VERY important, lead later to

generalization to Support Vector Machines

PR , ANN, & ML 15

Comparison� Version space

� The (w1-w2) space of all feasible solutions

� Perceptron learning

� Greedy, gradient descent that often ends up at boundary

of the version space with little space for error

� SVM learning

� Center of largest imbedded sphere in the version space

(maximum margin)

� Bayes point machine

� Centroid of the version space

w1

w2

vs

SVM

Bayes

perceptron

PR , ANN, & ML 16

Perceptron Usage Rules

� After the weight has been determined

� Classification involves inner product of

training samples and test samples

� This is again VERY important, lead later

to generalization to Kernel Methods

∑∑ ⋅=⋅=⋅=i

iii

i

iii yyy xxxxxw αα )(

PR , ANN, & ML 17

u

j

u

i

u

i

u

j

u

i

u

i

u

i

u

j

u

i

u

i

u

i

u

i

u

i

u

j

u

iij

ij

old

ij

new

ij

xOy

xOyy

xyOy

otherwise

Oyifxy

)(

)(

)1(

0

2

2

−=

−=

−=

≠

=∆

∆+=

η

η

η

ηw

www

δ

Hebb’s Learning Rule

� Synapse strength should be increased when

both pre- and post-synapse neurons fire

vigorously

� for binary outputs

PR , ANN, & ML 18

� Case 1:

1x

2xw

new

wold

xη2

� Case 2:1,1 =−= yO 1,1 −== yO

wnew

wold

xη2−

2x

1x

PR , ANN, & ML 19

∑∑ ∑∑∑=

−=−=u i

N

j

u

jij

u

i

u i

u

i

u

i xwgOyOE2

1

2 ))((2

1)(

2

1)(w

u

j

u

u

i

u

i

u

i

old

ij

ij

old

ij

new

ij

u

j

u

u

i

u

i

u

i

ij

xnetgnetgO

xnetgnetgOE

∑

∑

−+=

∆+=

−−=

)('))((

)('))(()(

η

∂

∂

w

www

w

w

LMS (Widrow-Hoff, Delta)

� Not restricted to binary outputs

� Gradient search

PR , ANN, & ML 20

u

j

u

u

i

u

i

u

i

ij

u

i

u

i

u

i

u

i

u

i

u

i

u

i

u

i

u

i

u

i

ij

xnetgnetgO

net

net

y

y

yO

yO

yOE

∑ −−=

−

−

−=

)('))((

)(

)(

)()(2

ww

w

∂

∂

∂

∂

∂

∂

∂

∂

∂

∂

Nothing but Chain Rule

PR , ANN, & ML 21

)sgn()( 2211 bxwxwnetgO ++==

1w 2w

xx =1 yx =2

1312.0

2793.0

4299.0

2

1

−=

−=

=

b

w

w

PR , ANN, & ML 22

Final training results Error vs. training epoch

PR , ANN, & ML 23

)sgn()( 332211 bxwxwxwnetgy +++==

1w 3w

xx =1 yx =2 7550.0

3196.0

7411.0

4232.0

3

2

1

=

−=

−=

=

b

w

w

w

2w

zx =3

PR , ANN, & ML 24


PR , ANN, & ML 251x2x

O1 O2

11w21w12w

22w

)( 12121111 bxwxwgy ++=

)( 22221212 bxwxwgy ++=

PR , ANN, & ML 26


PR , ANN, & ML 27

PR , ANN, & ML 28


simple perceptrons - ucsb computer scienceyfwang/courses/cs290i_prann/pdf/percept… · pr , ann,...

Documents