Lecture 3: Perceptrons
TRANSCRIPT
Slide 1
7/21/2019 Lecture 3 Perceptrons
http://slidepdf.com/reader/full/lecture-3-perceptrons 1/46
The Linear Classifier,
also known as the “Perceptron”
COMP24111 lecture 3
Slide 2
LAST WEEK: our first “machine learning” algorithm
Testing point x
For each training datapoint x’
measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class
The K-Nearest Neighbour Classifier
Make your own notes on its advantages / disadvantages.
I’ll ask for volunteers next time we meet…
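The steps above can be sketched as a short runnable function. The dataset format and the names `knn_classify` and `train` are illustrative, not from the slides; distance is taken to be Euclidean:

```python
from collections import Counter
import math

def knn_classify(x, train, k):
    """Classify testing point x by majority vote among its k nearest neighbours.

    train is a list of (point, label) pairs.
    """
    # For each training datapoint x', measure distance(x, x')
    dists = [(math.dist(x, xp), label) for xp, label in train]
    # Sort distances and select the K nearest
    dists.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in dists[:k]]
    # Assign the most common class
    return Counter(nearest_labels).most_common(1)[0][0]
```

Note that nothing is "learned" here: the whole training set is carried around and scanned at prediction time, which is the disadvantage the next slide highlights.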
Slide 3
Model (memorize the training data)
Testing Data (no labels)
Training data
Predicted Labels
Learning algorithm
(do nothing)
Supervised Learning Pipeline for Nearest Neighbour
Slide 4

The most important concept in Machine Learning
Slide 5
Looks good so far…
The most important concept in Machine Learning
Slide 6
Looks good so far…
Oh no! Mistakes!
What happened?
The most important concept in Machine Learning
Slide 7
Looks good so far…
Oh no! Mistakes!
What happened?
We didn’t have all the data.
We can never assume that we do.
This is called “OVER-FITTING” to the small dataset.
The most important concept in Machine Learning
Slide 8
The Linear Classifier
COMP24111 lecture 3
Slide 9
Model (equation)
Testing Data (no labels)
Training data
Predicted Labels
Learning algorithm (search for good parameters)
Supervised Learning Pipeline for Linear Classifiers
Slide 10
A simpler, more compact model?

(plot: weight vs. height)

if (weight > t) then “player” else “dancer”
Slide 11
What’s an algorithm to find a good threshold?
(plot: weight vs. height)

if (weight > t) then “player” else “dancer”
t = 40
numMistakes = testRule(t)
while ( numMistakes != 0 ) {
    t = t + 1
    numMistakes = testRule(t)
}
Slide 12
We have our second Machine Learning procedure.
if (weight > t) then “player” else “dancer”

t = 40
numMistakes = testRule(t)
while ( numMistakes != 0 ) {
    t = t + 1
    numMistakes = testRule(t)
}

The threshold classifier (also known as a “Decision Stump”)
Slide 13
Three “ingredients” of a Machine Learning procedure

“Model”: the final product, the thing you have to package up and send to a customer. A piece of code with some parameters that need to be set.

“Error function”: the performance criterion, the function you use to judge how well the parameters of the model are set.

“Learning algorithm”: the algorithm that optimises the model parameters, using the error function to judge how well it is doing.
Slide 14
Three “ingredients” of a Threshold Classifier
Model:
if (x > t) then “player” else “dancer”

Learning algorithm:
t = 40
while ( numMistakes != 0 ) {
    t = t + 1
    numMistakes = testRule(t)
}

Error function:
the number of mistakes (computed by testRule)

General case – we’re not just talking about the weight of a rugby player – a threshold can be put on any feature ‘x’.
Slide 15

Slide 16
Model (memorize the training data)
Testing Data (no labels)
Training data
Predicted Labels
Learning algorithm
(do nothing)
Supervised Learning Pipeline for Nearest Neighbour
Slide 17
What’s the “model” for the Nearest Neighbour classifier?
(plot: weight vs. height)

For the k-NN, the model is the training data itself!
- very good accuracy
- very computationally intensive!
Testing point x
For each training datapoint x’
measure distance(x,x’)
End
Sort distances
Select K nearest
Assign most common class
Slide 18
New data: what’s an algorithm to find a good threshold?

(plot: weight vs. height, with threshold t marked)

Our model does not match the problem!

1 mistake…

if (weight > t) then “player” else “dancer”
Slide 19
New data: what’s an algorithm to find a good threshold?

(plot: weight vs. height)

But our current model cannot represent this…

if (weight > t) then “player” else “dancer”
Slide 20
We need a more sophisticated model…
if (x > t) then “player” else “dancer”
Slide 21

Slide 22
Input signals are sent from other neurons.

If enough signals accumulate, the neuron fires a signal.

Connection strengths determine how the signals are accumulated.
Slide 23
(diagram: incoming signals x1, x2, x3 arrive with connection strengths w1, w2, w3, are added into an activation level a, which produces the output signal)

a = Σ_{i=1}^{M} w_i x_i

output = 1 if (a > t), else output = 0

- input signals ‘x’ and coefficients ‘w’ are multiplied
- weights correspond to connection strengths
- signals are added up – if they are enough, FIRE!
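The neuron’s computation is only a couple of lines of code; a minimal sketch (the function names are mine, not from the slides):

```python
def activation(x, w):
    # Multiply each incoming signal by its connection strength, then add up:
    # a = sum over i of w_i * x_i
    return sum(xi * wi for xi, wi in zip(x, w))

def output_signal(x, w, t):
    # The neuron fires (outputs 1) only if the activation exceeds the threshold t
    return 1 if activation(x, w) > t else 0
```

For example, `output_signal([1.0, 2.0], [0.5, 0.5], t=1.0)` gives 1, since the activation 1.5 exceeds the threshold.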
Slide 24
Sum notation (just like a loop from 1 to M):

a = Σ_{i=1}^{M} w_i x_i

double[] x = …
double[] w = …

Multiply corresponding elements and add them up.

if (activation > threshold) FIRE!
Slide 25
The Perceptron Decision Rule

if ( Σ_{i=1}^{M} w_i x_i > t ) then output = 1, else output = 0
Slide 26
(plot: the two regions are output = 1 and output = 0)

if ( Σ_{i=1}^{M} w_i x_i > t ) then output = 1, else output = 0

Rugby player = 1, Ballet dancer = 0
Slide 27
Is this a good decision boundary?

if ( Σ_{i=1}^{M} w_i x_i > t ) then output = 1, else output = 0
Slide 28
w1 = 1.0
w2 = 0.2
t = 0.05

if ( Σ_{i=1}^{M} w_i x_i > t ) then output = 1, else output = 0
Slide 29
w1 = 2.1
w2 = 0.2
t = 0.05

if ( Σ_{i=1}^{M} w_i x_i > t ) then output = 1, else output = 0
Slide 30
w1 = 1.4
w2 = 0.02
t = 0.05

if ( Σ_{i=1}^{M} w_i x_i > t ) then output = 1, else output = 0
Slide 31
Changing the weights/threshold makes the decision boundary move.

Pointless/impossible to do it by hand – only OK for the simple 2-D case.

We need an algorithm…
w1 = -0.8
w2 = 0.03
t = 0.05
Slide 32
w = (0.2, 0.2, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

(diagram: inputs x1, x2, x3 with weights w1, w2, w3)

a = Σ_{i=1}^{M} x_i w_i

Q1: What is the activation, a, of the neuron?
Q2: Does the neuron fire?
Q3: What if we set the threshold at 0.5 and weight w3 to zero?

Take a 20 minute break and think about this.
Slide 33
20 minute break
Slide 34
w = (0.2, 0.2, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

a = Σ_{i=1}^{M} x_i w_i = (1.0 × 0.2) + (0.5 × 0.2) + (2.0 × 0.5) = 1.3

Q1: What is the activation, a, of the neuron?
Q2: Does the neuron fire?

if (activation > threshold) output = 1, else output = 0

… So yes, it fires.
Slide 35
w = (0.2, 0.2, 0.5)
x = (1.0, 0.5, 2.0)
t = 1.0

Q3: What if we set the threshold at 0.5 and weight w3 to zero?

a = Σ_{i=1}^{M} x_i w_i = (1.0 × 0.2) + (0.5 × 0.2) + (2.0 × 0.0) = 0.3

if (activation > threshold) output = 1, else output = 0

… So no, it does not fire.
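The exercise can be checked in code. The values below follow one reconstruction of the badly OCR-damaged slides above (w = (0.2, 0.2, 0.5), x = (1.0, 0.5, 2.0), t = 1.0), so treat the specific numbers as an assumption:

```python
def activation(x, w):
    # a = sum over i of x_i * w_i
    return sum(xi * wi for xi, wi in zip(x, w))

x = (1.0, 0.5, 2.0)
w = (0.2, 0.2, 0.5)

# Q1: a = (1.0 * 0.2) + (0.5 * 0.2) + (2.0 * 0.5) = 1.3
a = activation(x, w)

# Q2: with t = 1.0 we have a > t, so the neuron fires
fires = a > 1.0

# Q3: threshold 0.5 and w3 = 0 gives a = 0.3, so it does not fire
a3 = activation(x, (0.2, 0.2, 0.0))
fires3 = a3 > 0.5
```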
Slide 36
We need a more sophisticated model…
(plot: weight vs. height)

if (weight > t) then “player” else “dancer”
if (f(x) > t) then “player” else “dancer”

x1 = height (cm)
x2 = weight (kg)

f(x) = Σ_{i=1}^{d} w_i x_i = (w1 × x1) + (w2 × x2)

The Perceptron
Slide 37
The Perceptron
f(x) = Σ_{i=1}^{d} w_i x_i = (w1 × x1) + (w2 × x2)

(plots: weight vs. height, before and after moving the decision boundary)

if (f(x) > t) then “player” else “dancer”

w1, w2 and t change the position of the DECISION BOUNDARY.
Slide 38
The Perceptron

Model:
if Σ_{i=1}^{d} w_i x_i > t then ŷ = 1 (“player”), else ŷ = 0 (“dancer”)

Error function:
number of mistakes (a.k.a. classification error)

Learning algorithm:
…we need to optimise the w and t values.

(plot: weight vs. height)
Slide 39
Perceptron Learning Rule

new weight = old weight + 0.1 × (trueLabel − output) × input

What weight updates do these cases produce?

if ( target = 1, output = 1 ) then update = ?
if ( target = 1, output = 0 ) then update = ?
if ( target = 0, output = 1 ) then update = ?
if ( target = 0, output = 0 ) then update = ?
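Working the four cases through in code, with the learning rate 0.1 from the rule above (the input value 1.0 is an arbitrary choice for illustration): only the two mistake cases produce a non-zero update, pushing the weight up when the target was 1 and down when it was 0.

```python
def weight_update(target, output, x, rate=0.1):
    # new weight = old weight + rate * (target - output) * input,
    # so the change applied to the weight is:
    return rate * (target - output) * x

# All four target/output combinations, for an input of 1.0
updates = {(t, o): weight_update(t, o, 1.0) for t in (1, 0) for o in (1, 0)}
# (1, 1) ->  0.0   correct, no change
# (1, 0) -> +0.1   missed a positive: increase the weight
# (0, 1) -> -0.1   false alarm: decrease the weight
# (0, 0) ->  0.0   correct, no change
```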
Slide 40
initialise weights to random numbers in range -1 to +1
for n = 1 to NUM_ITERATIONS
    for each training example (x)
        calculate activation
        for each weight
            update weight by learning rule
        end
    end
end
Perceptron convergence theorem:
If the data is linearly separable, then application of the
Perceptron learning rule will find a separating decision boundary,
within a finite number of iterations.
Learning algorithm for the Perceptron
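A runnable sketch of the whole procedure. One assumption not on the slide: the threshold t is folded in as an extra weight on a constant input of −1 (a standard trick), so the learning rule can adjust the threshold along with the weights:

```python
import random

def train_perceptron(data, rate=0.1, num_iterations=1000, seed=0):
    """data: list of (inputs, target) pairs, with targets 0 or 1."""
    rng = random.Random(seed)
    n = len(data[0][0]) + 1                      # extra slot for the threshold
    # initialise weights to random numbers in range -1 to +1
    w = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(num_iterations):
        mistakes = 0
        for x, target in data:
            xe = list(x) + [-1.0]                # constant input carries the threshold
            a = sum(xi * wi for xi, wi in zip(xe, w))   # calculate activation
            output = 1 if a > 0 else 0
            if output != target:
                mistakes += 1
                # update each weight by the perceptron learning rule
                w = [wi + rate * (target - output) * xi
                     for xi, wi in zip(xe, w)]
        if mistakes == 0:                        # a separating boundary was found
            break
    return w

def predict(w, x):
    xe = list(x) + [-1.0]
    return 1 if sum(xi * wi for xi, wi in zip(xe, w)) > 0 else 0
```

On linearly separable data (for example, the AND function on two inputs) the convergence theorem guarantees the loop reaches zero mistakes after finitely many updates.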
Slide 41
Model (if… then…)
Testing Data (no labels)
Training data
Predicted Labels
Learning algorithm (search for good parameters)
Supervised Learning Pipeline for Perceptron
Slide 42
New data… non-linearly separable

(plot: weight vs. height)

Our model does not match the problem! (AGAIN!)

Many mistakes!

if Σ_{i=1}^{d} w_i x_i > t then “player” else “dancer”
Slide 43
Multilayer Perceptron
(diagram: inputs x1–x5 feeding into a layer of perceptrons)
Slide 44
MLP decision boundary – nonlinear problems, solved!
Slide 45
MLP decision boundary – nonlinear problems, solved!
(plot: weight vs. height, with a nonlinear decision boundary)
Slide 46
Neural Networks - summary
Perceptrons are a (simple) emulation of a neuron.
Layering perceptrons gives you… a multilayer perceptron. An MLP is one type of neural network – there are others.
An MLP with sigmoid activation functions can solve highly nonlinear problems.
Downside – we cannot use the simple perceptron learning algorithm.
Instead we have the backpropagation algorithm.
This is outside the scope of this introductory course.