Refresher: Perceptron Training Algorithm


Page 1: Refresher: Perceptron Training Algorithm

September 23, 2010 Neural Networks Lecture 6: Perceptron Learning


Refresher: Perceptron Training Algorithm

Algorithm Perceptron;
Start with a randomly chosen weight vector w_0;
Let k = 1;
while there exist input vectors that are misclassified by w_{k-1}, do
    Let i_j be a misclassified input vector;
    Let x_k = class(i_j)·i_j, implying that w_{k-1}·x_k < 0;
    Update the weight vector to w_k = w_{k-1} + η·x_k;
    Increment k;
end-while;
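As a minimal sketch of this loop (not from the lecture), the following Python fragment implements the same procedure, assuming each input vector already has the offset component 1 prepended, class labels are +1 or -1, and a fixed learning rate eta; the function and variable names are illustrative.

```python
import numpy as np

def train_perceptron(inputs, classes, eta=1.0, max_iter=1000):
    """inputs: array of shape (n, d), offset 1 already prepended to each vector.
       classes: array of n labels in {+1, -1}."""
    rng = np.random.default_rng()
    w = rng.standard_normal(inputs.shape[1])     # randomly chosen w_0
    for _ in range(max_iter):
        x = classes[:, None] * inputs            # x_j = class(i_j) * i_j
        misclassified = np.where(x @ w < 0)[0]   # misclassified: w . x_j < 0
        if misclassified.size == 0:              # every vector classified correctly
            return w
        w = w + eta * x[misclassified[0]]        # w_k = w_{k-1} + eta * x_k
    return w
```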

Page 2: Refresher: Perceptron Training Algorithm

Another Refresher: Linear Algebra

How can we visualize a straight line defined by an equation such as w_0 + w_1 i_1 + w_2 i_2 = 0?

One possibility is to determine the points where the line crosses the coordinate axes:

i_1 = 0  ⇒  w_0 + w_2 i_2 = 0  ⇒  w_2 i_2 = -w_0  ⇒  i_2 = -w_0/w_2

i_2 = 0  ⇒  w_0 + w_1 i_1 = 0  ⇒  w_1 i_1 = -w_0  ⇒  i_1 = -w_0/w_1

Thus, the line crosses the axes at (0, -w_0/w_2)^T and (-w_0/w_1, 0)^T.

If w_1 or w_2 is 0, it just means that the line is horizontal or vertical, respectively.

If w_0 is 0, the line hits the origin, and its slope i_2/i_1 is:

w_1 i_1 + w_2 i_2 = 0  ⇒  w_2 i_2 = -w_1 i_1  ⇒  i_2/i_1 = -w_1/w_2
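As a quick check of these formulas, here is a small Python helper (illustrative names, not from the lecture) that computes the two axis crossings of the line w_0 + w_1 i_1 + w_2 i_2 = 0:

```python
def line_intercepts(w):
    """w = (w0, w1, w2); returns the crossings (0, -w0/w2) and (-w0/w1, 0),
       or None for a crossing when the corresponding weight is zero."""
    w0, w1, w2 = w
    i2_cross = (0.0, -w0 / w2) if w2 != 0 else None   # crossing of the i_2 axis
    i1_cross = (-w0 / w1, 0.0) if w1 != 0 else None   # crossing of the i_1 axis
    return i2_cross, i1_cross

# With w = (2, 1, -2), as on the next slide, the line crosses at (0, 1) and (-2, 0):
print(line_intercepts((2, 1, -2)))   # ((0.0, 1.0), (-2.0, 0.0))
```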

Page 3: Refresher: Perceptron Training Algorithm

Perceptron Learning Example

[Figure: the five data points in the (i_1, i_2) plane, with both axes running from -3 to 3, separated into class 1 and class -1 by the dividing line.]

We would like our perceptron to correctly classify the five 2-dimensional data points shown in the figure above.

Let the random initial weight vector be w_0 = (2, 1, -2)^T.

Then the dividing line crosses the axes at (0, 1)^T and (-2, 0)^T.

Let us pick the misclassified point (-2, -1)^T for learning:

i = (1, -2, -1)^T (include offset 1)

x_1 = (-1)·(1, -2, -1)^T (i is in class -1)

x_1 = (-1, 2, 1)^T
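The numbers above can be verified directly; a small sketch (assuming NumPy, variable names illustrative):

```python
import numpy as np

w0 = np.array([2, 1, -2])
i  = np.array([1, -2, -1])   # the point (-2, -1) with offset 1 included
x1 = -1 * i                  # class(i) = -1, so x_1 = (-1) * i
print(x1)                    # [-1  2  1]
print(w0 @ i)                # 2  -> net input >= 0, but the point is in class -1: misclassified
print(w0 @ x1)               # -2 -> w_0 . x_1 < 0, as the algorithm requires
```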

Page 4: Refresher: Perceptron Training Algorithm

Perceptron Learning Example

[Figure: the same five data points with the dividing line for the updated weight vector.]

w_1 = w_0 + η·x_1 (let us set η = 1 for simplicity)

w_1 = (2, 1, -2)^T + (-1, 2, 1)^T = (1, 3, -1)^T

The new dividing line crosses the axes at (0, 1)^T and (-1/3, 0)^T.

Let us pick the next misclassified point (0, 2)^T for learning:

i = (1, 0, 2)^T (include offset 1)

x_2 = (1, 0, 2)^T (i is in class 1)
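Again as a quick numerical check (assuming NumPy):

```python
import numpy as np

w1 = np.array([2, 1, -2]) + np.array([-1, 2, 1])
print(w1)                # [ 1  3 -1]
i = np.array([1, 0, 2])  # the point (0, 2) with offset 1 included, class 1
print(w1 @ i)            # -1 -> net input < 0 although the class is 1: misclassified
```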


Page 5: Refresher: Perceptron Training Algorithm

Perceptron Learning Example

[Figure: the same five data points with the final dividing line, now separating the two classes correctly.]

w_2 = w_1 + η·x_2

w_2 = (1, 3, -1)^T + (1, 0, 2)^T = (2, 3, 1)^T

Now the line crosses the axes at (0, -2)^T and (-2/3, 0)^T.

With this weight vector, the perceptron achieves perfect classification!

The learning process terminates.

In most cases, many more iterations are necessary than in this example.

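A final check on the two points used for learning (the remaining three points from the figure are not listed in the text, so they are not checked here):

```python
import numpy as np

w2 = np.array([1, 3, -1]) + np.array([1, 0, 2])
print(w2)                          # [2 3 1]
print(w2 @ np.array([1, -2, -1]))  # -5 -> net input < 0, class -1: correct
print(w2 @ np.array([1, 0, 2]))    #  4 -> net input >= 0, class 1: correct
```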

Page 6: Refresher: Perceptron Training Algorithm

Perceptron Learning Results

We proved that the perceptron learning algorithm is guaranteed to find a solution to a classification problem if it is linearly separable.

But are those solutions optimal?

One of the reasons why we are interested in neural networks is that they are able to generalize, i.e., give plausible output for new (untrained) inputs.

How well does a perceptron deal with new inputs?

Page 7: Refresher: Perceptron Training Algorithm

Perceptron Learning Results

Perfect classification of training samples, but may not generalize well to new (untrained) samples.

Page 8: Refresher: Perceptron Training Algorithm

Perceptron Learning Results

This function is likely to perform better classification on new samples.

Page 9: Refresher: Perceptron Training Algorithm

Adalines

Idea behind adaptive linear elements (Adalines):

Compute a continuous, differentiable error function between the net input and the desired output (before applying the threshold function).

For example, compute the mean squared error (MSE) between every training vector and its class (1 or -1).

Then find those weights for which the error is minimal.

With a differentiable error function, we can use the gradient descent technique to find this absolute minimum in the error function.
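As a hedged sketch of this idea (not the lecture's notation): take the error to be the mean squared difference between each desired class and the corresponding net input w·i_j, and move the weights a small step against its gradient. The function name and learning rate below are illustrative assumptions.

```python
import numpy as np

def adaline_epoch(w, inputs, classes, eta=0.01):
    """One gradient-descent step on the MSE between the net inputs
       (no threshold applied) and the desired classes (+1 or -1)."""
    net = inputs @ w                         # net input for every training vector
    errors = classes - net                   # desired output minus net input
    mse = 0.5 * np.mean(errors ** 2)
    grad = -(errors @ inputs) / len(inputs)  # gradient of the MSE with respect to w
    return w - eta * grad, mse               # step against the gradient
```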

Page 10: Refresher: Perceptron Training Algorithm

Gradient Descent

Gradient descent is a very common technique to find the absolute minimum of a function.

It is especially useful for high-dimensional functions.

We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.

Page 11: Refresher: Perceptron Training Algorithm

Gradient Descent

Gradient-descent example: Finding the absolute minimum of a one-dimensional error function f(x):

[Figure: the curve f(x), the starting point x_0, and the slope f'(x_0) drawn at that point.]

x_1 = x_0 - η·f'(x_0)

Repeat this iteratively until, for some x_i, f'(x_i) is sufficiently close to 0.
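A minimal sketch of this one-dimensional iteration, using an example function not from the lecture (f(x) = (x - 3)^2, whose minimum is at x = 3) and an assumed step size eta:

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:      # f'(x) sufficiently close to 0: stop
            break
        x = x - eta * slope       # x_{k+1} = x_k - eta * f'(x_k)
    return x

print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))   # approximately 3.0
```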

Page 12: Refresher: Perceptron Training Algorithm

Gradient Descent

Gradients of two-dimensional functions:

The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient is always pointing in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
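To make the two-dimensional picture concrete, here is an illustrative sketch (the example function and step size are assumptions, not from the lecture) that estimates the gradient numerically and repeatedly steps against it:

```python
import numpy as np

def numerical_gradient(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at point p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for k in range(len(p)):
        dp = np.zeros_like(p)
        dp[k] = h
        grad[k] = (f(p + dp) - f(p - dp)) / (2 * h)
    return grad

f = lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 2) ** 2   # minimum at (1, -2)
p = np.array([0.0, 0.0])
for _ in range(200):
    p = p - 0.1 * numerical_gradient(f, p)            # move against the gradient
print(p)                                              # approximately [ 1. -2.]
```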