artificial neural networks - university of calgary in...

Artificial Neural Networks

CPSC 533 — Winter 2001

Christian Jacob ©

Neural Networks in the Context of AI Systems

‡ Neural Networks as Mediators between Symbolic AI and Statistical Methods

‡ Neural Networks and Fuzzy-NN Hybrid Systems

2 05.1-NeuralNets-2.nb

The Brain Paradigm

Brains as Hierarchical, Adaptive Processors?


Example: Visual Cortex of a Cat

‡ Image 1

‡ Image 2


ANN Image Processing Example

The Neuron: Elementary Processing Element of the Brain

Ë The neuron is the cell that performs information processing in the brain.

Ë The neuron is the fundamental unit of all nervous system tissues.

‡ Basic Neuron Architecture

Each neuron consists of:

Ë Soma

Ë Dendrites

Ë Axon

Ë Synapses


‡ Neurons and their Adaptive Connectivity

Brains vs. Digital Computers

Ï Computers require hundreds of cycles to simulate a firing of a neuron. (What does the "firing" pattern of a neuron look like?)

Ï Computers are good at symbol processing.

ï Are "life" and "mind" reducible to "symbol processing"?

Ï Brains perform extremely well at highly parallel pattern recognition tasks:

Ë face recognition,

Ë language processing,


Ë language understanding (!),

Ë creativity, inventing, use of tools, ...

Ë self-reflection, self-awareness, ...

‡ Computers versus Human Brains

Human Brain:

Ë Grown by cell differentiation and iterated cell division (instead of constructed from pre-fabricated building blocks)

Ë Rather simple "processing elements"

Ë High degree of interconnectivity (adaptive!)

Ë Adaptive and hierarchical architecture

Ë Highly parallel and distributed information processing

Ë Redundant information storage and processing

Ë Functionality is both pre-programmed (to some degree) and "programmable"

Ë "Algorithms" are designed through learning, not programming.

‡ Computers versus Human Brains: Hard- / Software and Processing

                          Computer                          Human Brain
Computational units       1 CPU, 10^5 gates                 10^11 neurons
Storage units             10^9 bits RAM, 10^10 bits disk    10^11 neurons, 10^14 synapses
Cycle time                10^-9 sec                         10^-3 sec
Neuron updates per sec    10^5                              10^14


Networks of Neurons

Dendrites, synapses, cell body, and axon are the four elements that are usually adopted from the biological model in order to build artificial neural networks.

Artificial neurons for computing will have

Ë input channels,

Ë a cell body, and

Ë an output channel.

Synapses are simulated by contact points between the cell body and input or output connections.

A weight will be associated with these points.

Figure 1. A typical motor neuron

Transmission of Information

A fundamental problem of any information processing system is the way by which information is transmitted through the system.


Neurons transmit information using electrical signals.

However, in biological structures this cannot be done by simple electronic transport as in metallic cables.

Evolution arrived at another solution, involving ions and semipermeable membranes.

‡ Charged Cells

Our body consists mainly of water, 55% of which is contained within the cells and 45% forming its environment.

The cells preserve their identity and biological components by enclosing the protoplasm in a membrane.

Membranes are made of a double layer of molecules that form a diffusion barrier.

Some salts, present in our body, dissolve in the intra- and extracellular fluid and dissociate into negative and positive ions.

Ions present in the cells that play an important role for neurons and their information processing are

Ë sodium ions ($\mathrm{Na}^{+}$), chloride ions ($\mathrm{Cl}^{-}$), potassium ions ($\mathrm{K}^{+}$), and calcium ions ($\mathrm{Ca}^{2+}$).

The membranes of the cells exhibit different degrees of permeability for each of these ions.

The permeability is determined by the number and size of pores in the membrane, the so-called ionic channels.

The specific permeability of the membrane leads to different distributions of ions in the interior and the exterior of the cells.

‡ Action Potential

In particular, differences in membrane permeability lead to the interior of neurons being negatively charged with respect to the extracellular fluid.

An action potential is produced by an initial depolarization of the cell membrane.


Figure 2. Typical form of the action potential

The potential increases from -70 mV to +40 mV.

After some time, the potential becomes negative again, but it overshoots.

Gradually, the cell recovers and the cell membrane returns to the initial potential.


‡ Transmission of an Action Potential

Figure 3. Transmission of an action potential

Information Processing at the Synapses

Neurons transmit information using action potentials.


The processing of this information at the interfaces between neurons, the synapses, involves a combination of electrical and chemical processes.

‡ Directed Transmission of Information

Synapses determine a direction for the transmission of information.

Signals flow from one cell to another in a well-defined manner.

Figure 4. Chemical signaling at the synapse

‡ Chemical Signaling

When an electric impulse arrives at the synapse, the synaptic vesicles fuse with the cell membrane.

The transmitters flow into the synaptic gap and some attach to the ionic channels.

This opens the ionic channels such that more ions can now flow from the exterior to the interior of the cell.

This way, the cell's potential is altered.

If the potential in the interior of the cell is increased, this helps prepare an action potential and the synapse causes an excitation of the cell.


Storage of Information and Learning

Figure 5. Unblocking of an NMDA receptor

NMDA receptors help to understand some forms of learning (among many others) in neurons.

NMDA receptors are ionic channels permeable for different kinds of molecules (sodium, calcium, or potassium ions).

These channels are blocked by a magnesium ion, such that the permeability for sodium and potassium is low.

If the cell has reached a certain excitation level, the ionic channels lose the magnesium ions and become unblocked.

The permeability for $\mathrm{Ca}^{2+}$ ions increases immediately, which starts a chain of reactions resulting in a durable change of the threshold level of the cell.

Artificial Neural Networks: Introductory Concepts

Definition of an ANN

Ë A neural network is a system composed of (usually a large number of) simple processing elements (neurons).

Ë Ideally, the processing elements operate asynchronously and in parallel.


Ë The ANNs can be used to acquire (through training, learning), store, and utilize experiential knowledge.

Mathematically, a neural network is a "mapping machine" capable of modeling a function

$$F : \mathbb{R}^n \to \mathbb{R}^m$$

That is, a network maps an n-dimensional real input vector $(x_1, x_2, \ldots, x_n)$ to an m-dimensional real output vector $(y_1, y_2, \ldots, y_m)$.

ANN Architectures

‡ Feed-forward networks:

Ë Neurons are arranged in layers.

Ë Links only follow one direction, namely from input to output layer.

Ë Usually, a unit is linked only to units in the following layer(s).

Ë Units within the same layer are not linked.

Ë Signal (and error) propagation as well as weight updating can proceed uniformly from the input to the output layer.

Figure 6. Example of a feed-forward network with a single hidden layer

‡ Recurrent networks:

Ë Links can be between any neuron and form arbitrary topologies.

Ë Can implement more complex neural architectures.

Ë Internal states with memory can be modelled.

Ë A stable internal state and output might not be reached.


Figure 7. McCulloch-Pitts network for a binary scaler. For example, it translates the binary sequence 00110110 into the sequence 00100100.

‡ Example 1: Perceptron

‡ Example 2: Simple Feed-forward Network — Input: (1, 0)


[Figure sequence: signal propagation through a feed-forward network with the layers Input (I1, I2), Hidden 1 (H3, H4), Hidden 2 (H5, H6), and Output (O7).

Weights: w13 = -1, w24 = -1, w16 = 1, w25 = 1, w35 = 1, w46 = 1, w57 = 1, w67 = 1.

Thresholds shown in the figure: t = -0.5, t = 1.5, t = -0.5, t = 1.0, t = 0.5.

The frames appear to step through the inputs (1, 0), (0, 1), and (1, 1), showing the unit values after each propagation step.]


A Generic Neuron Model

Generic Model of a Neuron Processing Unit

‡ A typical model of a neural processing unit:

‡ A more detailed model of a neural processing unit:

Input function:

$$in_i = \sum_j w_{j,i} \, a_j$$

Activation function:

$$g(in_i) = g\Big(\sum_j w_{j,i} \, a_j\Big)$$

Output function:

$$a_i = out(g(in_i)) = out\Big(g\Big(\sum_j w_{j,i} \, a_j\Big)\Big)$$
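The three stages of the generic unit can be sketched in Python as follows. This is a minimal illustration rather than code from the lecture: the function name `unit_output`, the identity default for the output function, and the example threshold of 1.5 are assumptions chosen for the sketch.

```python
from typing import Callable, Sequence


def unit_output(
    weights: Sequence[float],
    activations: Sequence[float],
    g: Callable[[float], float],
    out: Callable[[float], float] = lambda v: v,
) -> float:
    """Compute a_i = out(g(in_i)) with in_i = sum_j w_ji * a_j."""
    # Input function: weighted sum of the incoming activations.
    in_i = sum(w * a for w, a in zip(weights, activations))
    # Activation function, followed by the output function.
    return out(g(in_i))


def step_1_5(x: float) -> float:
    """Hypothetical activation: step function with threshold 1.5."""
    return 1.0 if x >= 1.5 else 0.0


# Two inputs with weight 1 each.
both_on = unit_output([1.0, 1.0], [1.0, 1.0], step_1_5)
one_on = unit_output([1.0, 1.0], [1.0, 0.0], step_1_5)
print(both_on, one_on)
```

Keeping `g` and `out` as separate parameters mirrors the separation of activation and output function in the model above; in many practical networks the output function is simply the identity.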

‡ Activation Functions

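(1) Step Function:

$$\mathrm{step}(x, t) = \begin{cases} 1 & \text{if } x \ge t \\ 0 & \text{if } x < t \end{cases}$$

[Plot: step function with t = 1.]

(2) Sign Function:

$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

[Plot: sign function.]

(3) Sigmoid Function:

$$\mathrm{sigmoid}(x, a) = \frac{1}{1 + e^{-a x}}$$

[Plot: sigmoid function.]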

The parameter a determines the slope of the sigmoid function:

Ë 0.1 ≤ a ≤ 1:

[Plots: sigmoid(x, a) for a = 0.1, 0.2, ..., 1.0.]

Ë 1 ≤ a ≤ 10:

[Plots: sigmoid(x, a) for a = 1, 2, ..., 10.]

This allows the sigmoid function to approximate both the step and sign function.
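The three activation functions are easy to compare numerically. The following Python sketch is an illustration under the definitions given above; the sample points and the chosen values of a are arbitrary assumptions.

```python
import math


def step(x: float, t: float) -> int:
    """Step function: 1 if x >= t, else 0."""
    return 1 if x >= t else 0


def sign(x: float) -> int:
    """Sign function: 1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1


def sigmoid(x: float, a: float = 1.0) -> float:
    """Sigmoid function with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * x))


# For growing a, the sigmoid values move towards 0 (x < 0) and 1 (x > 0),
# i.e. towards the step function with threshold t = 0.
for a in (0.1, 1.0, 10.0):
    print(a, [round(sigmoid(x, a), 3) for x in (-4, -2, 2, 4)])
```

Because the sigmoid only takes values between 0 and 1, a sign-like curve with values between -1 and 1 is obtained by rescaling, for example 2 * sigmoid(x, a) - 1.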

Neurons in Action

‡ Neurons as Logic Gates

Individual units, representing Boolean functions, can act as logic gates, given appropriate thresholds and weights.

Activation function: $\mathrm{step}(x, t)$

[Plot: step function with t = 1.]

‡ ???

w = 1

w = 1

t = 1.5

‡ ???

w = 1

w = 1

t = 0.5


‡ ???

w = -1

t = -0.5
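One way to find out which Boolean function each of the three units computes is to evaluate it on all binary inputs. The sketch below does this in Python; the helper names `unit`, `gate_a`, `gate_b`, and `gate_c` are assumptions introduced for the illustration.

```python
from itertools import product


def step(x, t):
    """Step activation: 1 if x >= t, else 0."""
    return 1 if x >= t else 0


def unit(weights, t):
    """Return a threshold unit with the given weights and threshold t."""
    return lambda *inputs: step(sum(w * x for w, x in zip(weights, inputs)), t)


gate_a = unit([1, 1], 1.5)   # first unit:  w = 1, w = 1, t = 1.5
gate_b = unit([1, 1], 0.5)   # second unit: w = 1, w = 1, t = 0.5
gate_c = unit([-1], -0.5)    # third unit:  w = -1, t = -0.5

# Truth tables, keyed by the input tuple.
table_a = {xs: gate_a(*xs) for xs in product((0, 1), repeat=2)}
table_b = {xs: gate_b(*xs) for xs in product((0, 1), repeat=2)}
table_c = {xs: gate_c(*xs) for xs in product((0, 1), repeat=1)}

for name, table in (("a", table_a), ("b", table_b), ("c", table_c)):
    print(name, table)
```

The resulting tables are those of AND, OR, and NOT, respectively.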

Specific Neuron Models

McCulloch-Pitts Units

McCulloch-Pitts processing units are the simplest neuron models, which produce and transmit only binary information.

Figure 8. McCulloch-Pitts unit

The rule for evaluating the input of a McCulloch-Pitts unit is as follows:

Ë The MP unit gets two sorts of input:

- input x1, x2, …, xn through n excitatory edges

- input y1, y2, …, ym through m inhibitory edges.


Ë If $m \ge 1$ and at least one of the signals $y_1, y_2, \ldots, y_m$ is 1, the unit is inhibited and the output is 0.

Ë Otherwise, the total excitation $x = x_1 + x_2 + \cdots + x_n$ is computed and compared to the threshold $\theta$:

$$\text{output} = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$$
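The evaluation rule can be written down directly. The following Python sketch assumes binary inputs passed as two lists; the function name `mp_unit` and the example gates are illustrative choices, not part of the original notes.

```python
def mp_unit(excitatory, inhibitory, theta):
    """Evaluate a McCulloch-Pitts unit on binary inputs.

    excitatory: values x_1..x_n arriving through excitatory edges
    inhibitory: values y_1..y_m arriving through inhibitory edges
    theta:      threshold of the unit
    """
    # A single active inhibitory input forces the output to 0.
    if any(y == 1 for y in inhibitory):
        return 0
    # Otherwise compare the total excitation with the threshold.
    return 1 if sum(excitatory) >= theta else 0


# AND of two inputs: threshold 2, no inhibitory edges.
print([mp_unit([a, b], [], 2) for a in (0, 1) for b in (0, 1)])

# "x1 AND NOT x2": x1 is excitatory, x2 is inhibitory, threshold 1.
print([mp_unit([a], [b], 1) for a in (0, 1) for b in (0, 1)])
```

Note that all excitatory edges carry the same (unit) weight; only the threshold and the choice of inhibitory edges distinguish one unit from another.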

‡ Conjunction and Disjunction

Figure 9. Generalized AND and OR gates as McCulloch-Pitts units

‡ Negation and More Logical Functions

Figure 10. Logical functions and their realizations as McCulloch-Pitts neurons

‡ What Do MP Units Compute?

For visualization purposes, we consider the function space of logical functions of three variables.


Figure 11. Function values of a logical function of three variables $(x_1, x_2, x_3)$

McCulloch-Pitts units divide the input space into two half-spaces.

For a given input $(x_1, x_2, x_3)$ and a threshold $\theta$, the condition

$$x_1 + x_2 + x_3 \ge \theta$$

is tested, which is true for all points to one side of the plane defined by $x_1 + x_2 + x_3 = \theta$ and false for all points to the other side.


Figure 12. Separation of the input space for the OR function

The majority function (with threshold $\theta = 2$) of three variables divides the input space in a similar manner, but the separating plane is given by the equation $x_1 + x_2 + x_3 = 2$.


Figure 13. Separating planes of the OR and majority functions

The planes are always parallel in the case of McCulloch-Pitts units.
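The two separations can be checked by enumerating the eight corners of the unit cube. This Python sketch assumes the threshold 1 for OR and 2 for the majority function; the helper name `accepted` is hypothetical.

```python
from itertools import product


def accepted(theta):
    """Corners of the unit cube with x1 + x2 + x3 >= theta."""
    return [p for p in product((0, 1), repeat=3) if sum(p) >= theta]


or_points = accepted(1)        # OR: every corner except (0, 0, 0)
majority_points = accepted(2)  # majority: corners with at least two ones

print(len(or_points), or_points)
print(len(majority_points), majority_points)
```

Both tests use the same left-hand side $x_1 + x_2 + x_3$, so both planes share the normal vector (1, 1, 1) and differ only in their offset. This is why the separating planes of McCulloch-Pitts units are parallel.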

The Perceptron

Today, the perceptron is one of the classic models of neural network processing elements and architectures. Its use in practical applications is limited; however, owing to its simplicity (both in its structure and in its learning algorithm), it provides a good model for studying the basics and problems of connectionist information processing.

‡ The Classical Perceptron

The perceptron was probably the first computation device inspired by neural networks.


The perceptron was developed in 1958 by the American psychologist Frank Rosenblatt.

Rosenblatt used the perceptron for image processing and image classification tasks.

Figure 14. The classical perceptron architecture as proposed by Frank Rosenblatt

‡ Minsky-and-Papert Perceptron

Minsky and Papert distilled the essential features from Rosenblatt's model in order to study the computational capabilities of the perceptron under different assumptions.

A retina is directly connected to logic elements called predicates, which can compute a single bit according to their input.

These predicates can be as computationally complex as we like. For example, each predicate could perform a filter function on the pixel image.

Each predicate, however, is limited in its diameter or the number of input pixels. No predicate sees the whole retina.

A threshold unit, which receives weighted inputs from the predicates, is used to compute the final output of the perceptron.


Figure 15. Predicates and weights of a perceptron.

‡ Limitations

Minsky and Papert differentiated between several kinds of predicates:

Ë Diameter-limited perceptrons: the receptive field of each predicate has a limited diameter

Ë Perceptrons of limited order: each receptive field can only contain up to a certain maximum number of points

Ë Stochastic perceptrons: each receptive field consists of a number of randomly chosen points

These two researchers proved (among other limitations) that no diameter-limited perceptron can decide whether a geometric figure is connected or not.


Figure 16. Receptive fields of predicates

‡ A Perceptron Cell

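[Figure: a perceptron cell with inputs $x_1, x_2, \ldots, x_n$ and weights $w_1, w_2, \ldots, w_n$, an additional constant input $x_{n+1} = 1$ with weight $w_{n+1} = -\theta$, threshold 0, and output $y$.]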


‡ From Inputs to Output

The perceptron calculates its output value as follows:

$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n+1} w_i \cdot x_i \ge 0 \\ 0 & \text{if } \sum_{i=1}^{n+1} w_i \cdot x_i < 0 \end{cases}$$

What Do Perceptrons Compute?

‡ Geometric Interpretation

A simple perceptron is a computing unit with threshold $\theta$ which, when receiving the n real inputs $x_1, x_2, \ldots, x_n$ through edges with the associated weights $w_1, w_2, \ldots, w_n$, computes its output as follows:

$$\text{output} = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge \theta \\ 0 & \text{otherwise.} \end{cases}$$

The following figure shows this separation of the input space for weights $(w_1, w_2) = (0.9, 2)$ and threshold $\theta = 1$.


Figure 17. Separation of input space with a perceptron testing the condition $0.9 x_1 + 2 x_2 \ge 1$

‡ Perceptrons with a Bias

In many cases it is more convenient to deal with perceptrons of threshold zero only. This corresponds to linear separations which go through the origin of the input space.

Any perceptron with threshold $\theta$ can be converted into an equivalent perceptron with threshold zero, which has an additional constant input of 1 whose weight, called the bias, is $-\theta$.


Figure 18. A perceptron with a bias

Most learning algorithms can be stated more concisely by transforming thresholds into biases.

The input and weight vectors must be extended:

Ë extended input vector: $(x_1, x_2, \ldots, x_n, 1)$

Ë extended weight vector: $(w_1, w_2, \ldots, w_n, w_{n+1})$ with $w_{n+1} = -\theta$.
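The equivalence of the two formulations can be checked on a few points. The sketch below uses the condition $0.9 x_1 + 2 x_2 \ge 1$ from Figure 17; the function names and the sample points are assumptions made for the illustration.

```python
def perceptron_threshold(w, x, theta):
    """Perceptron with explicit threshold: fires if w . x >= theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def perceptron_bias(w_ext, x):
    """Perceptron with threshold zero on the extended input (x, 1)."""
    x_ext = list(x) + [1]
    return 1 if sum(wi * xi for wi, xi in zip(w_ext, x_ext)) >= 0 else 0


w, theta = [0.9, 2.0], 1.0
w_ext = w + [-theta]  # extended weight vector with w_{n+1} = -theta

# Hypothetical sample points, none of them on the separating line.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2), (1.2, -0.1)]
same = all(
    perceptron_threshold(w, p, theta) == perceptron_bias(w_ext, p)
    for p in points
)
print(same)
```

With floating-point weights, points lying exactly on the separating line could be classified differently by the two versions because of rounding, which is why the sample points are kept away from it.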

‡ Linearly Separable Functions

A perceptron network is capable of computing any logical function.

If we reduce the network to a single perceptron, which functions are still computable?

The 16 Boolean functions of two variables:

x1 x2   f0  f1  f2  f3  f4  f5  f6  f7  f8  f9  f10 f11 f12 f13 f14 f15
0  0    0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1
0  1    0   0   1   1   0   0   1   1   0   0   1   1   0   0   1   1
1  0    0   0   0   0   1   1   1   1   0   0   0   0   1   1   1   1
1  1    0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1

Perceptron-computable functions are those for which the points whose function value is 0 can be separated from the points whose function value is 1 using a line.


Figure 19. Linear separations of input space corresponding to OR and AND

Two sets of points A and B in an n-dimensional space are called linearly separable if n + 1 real numbers $w_1, \ldots, w_{n+1}$ exist, such that every point $(x_1, x_2, \ldots, x_n) \in A$ satisfies $\sum_{i=1}^{n} w_i x_i \ge w_{n+1}$ and every point $(x_1, x_2, \ldots, x_n) \in B$ satisfies $\sum_{i=1}^{n} w_i x_i < w_{n+1}$.
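Which of the 16 Boolean functions of two variables a single perceptron can compute may be explored with a small search. The Python sketch below tries weights from {-1, 0, 1} and a handful of thresholds; this grid, the function names, and the numbering convention (bit i of k gives the value of f_k on the i-th input row, in the order (0,0), (0,1), (1,0), (1,1)) are assumptions of the sketch.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def truth_table(k):
    """Values of f_k on INPUTS; bit i of k belongs to INPUTS[i]."""
    return tuple((k >> i) & 1 for i in range(4))


def find_perceptron(target):
    """Search a small grid for (w1, w2, theta) realizing the target table."""
    for w1, w2 in product((-1, 0, 1), repeat=2):
        for theta in (-1.5, -0.5, 0.5, 1.5):
            outputs = tuple(
                1 if w1 * x1 + w2 * x2 >= theta else 0 for x1, x2 in INPUTS
            )
            if outputs == target:
                return (w1, w2, theta)
    return None


solutions = {k: find_perceptron(truth_table(k)) for k in range(16)}
not_found = [k for k, s in solutions.items() if s is None]
print(not_found)
```

The search finds a perceptron for 14 of the 16 functions. It finds none for f6 (XOR) and f9 (its negation); a failed grid search is not a proof, but the XOR case is shown to be impossible in the next section.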

‡ The XOR Problem

The XOR function is not linearly separable, because the following four inequalities would have to be fulfilled:

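$x_1 = 0,\ x_2 = 0:\quad w_1 x_1 + w_2 x_2 = 0 \;\Rightarrow\; 0 < \theta$

$x_1 = 1,\ x_2 = 0:\quad w_1 x_1 + w_2 x_2 = w_1 \;\Rightarrow\; w_1 \ge \theta$

$x_1 = 0,\ x_2 = 1:\quad w_1 x_1 + w_2 x_2 = w_2 \;\Rightarrow\; w_2 \ge \theta$

$x_1 = 1,\ x_2 = 1:\quad w_1 x_1 + w_2 x_2 = w_1 + w_2 \;\Rightarrow\; w_1 + w_2 < \theta$

According to the first inequality, $\theta$ is positive. According to the next two inequalities, $w_1$ and $w_2$ are each at least $\theta$ and therefore positive, too. Adding these two inequalities gives $w_1 + w_2 \ge 2\theta > \theta$, so $w_1 + w_2 < \theta$ cannot be true.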


‡ Duality of Input Space and Weight Space

Figure 20. Duality of input and weight space

The computation performed by a perceptron can be visualized as a linear separation of input space.

When trying to find the appropriate weights for a perceptron, the search process can be better visualized in weight space.

‡ The Error Function in Weight Space

Assume that the set A of input vectors in n-dimensional space must be separated from the set B of input vectors such that a perceptron computes the binary function $f_w$ with

$$f_w(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \in B \end{cases}$$

The function fw depends on the set w of weights (including the threshold).

The error function value is the number of false classifications for a particular weight vector w.

Therefore, the error function can be defined as follows:

$$E(w) = \sum_{x \in A} (1 - f_w(x)) + \sum_{x \in B} f_w(x).$$

Since $E(w)$ is positive or zero, we want to reach the global minimum where $E(w) = 0$.

Consequently, the aim of perceptron learning is to find the weight vector for which $E(w) = 0$.
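The error function simply counts misclassified points, which the following Python sketch makes explicit for the AND function. The function names and the two example weight vectors are assumptions; the weight vector is taken in extended form, with the last component equal to the negative threshold.

```python
def f_w(w, x):
    """Perceptron output for the extended weight vector w = (w_1, ..., w_n, -theta)."""
    x_ext = tuple(x) + (1,)
    return 1 if sum(wi * xi for wi, xi in zip(w, x_ext)) >= 0 else 0


def error(w, A, B):
    """E(w): points of A classified as 0 plus points of B classified as 1."""
    return sum(1 - f_w(w, x) for x in A) + sum(f_w(w, x) for x in B)


# AND function: only (1, 1) belongs to the positive set.
A = [(1, 1)]
B = [(0, 0), (1, 0), (0, 1)]

e_good = error((1, 1, -1.5), A, B)  # threshold 1.5 separates AND correctly
e_bad = error((1, 1, -0.5), A, B)   # threshold 0.5 computes OR instead
print(e_good, e_bad)
```

Because the error only takes integer values, the error surface consists of flat plateaus separated by jumps, with the solution region forming the plateau at height 0.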

Figure 21. Error function for the AND function (for a perceptron with two inputs $(x_1, x_2)$ and constant threshold $\theta = 1$).


Figure 22. Iteration steps to the region of minimal error

The optimization problem, which the learning algorithm has to solve, can be understood as descent on the error surface.

‡ For Later: General Decision Curves

Functions used to discriminate between regions of input space are called decision curves.

A neural network must learn to identify these regions and to associate them with the correct classification response.


Figure 23. Non-linear separation of input space

Adaptive "Programming" of ANNs through Learning

ANN Learning

A learning algorithm is an adaptive method by which a network of computing units self-organizes to implement the desired behavior.


[Diagram: a loop that tests input/output examples, calculates the network errors, and changes the network parameters.]

Figure 24. Learning process in a parametric system

In some learning algorithms, examples of the desired input-output mapping are presented to the network.

A correction step is executed iteratively until the network learns to produce the desired response.

Learning Schemes

Ï Supervised Learning:

Some input vectors are collected and presented to the network. The output computed by the network is observed and the deviation from the expected answer is measured. The weights are corrected (= learning algorithm) according to the magnitude of the error.

Ë Error-correction Learning:

The magnitude of the error, together with the input vector, determines the magnitude of the corrections to the weights.

Examples: Perceptron learning, backpropagation.

Ë Reinforcement Learning:

After each presentation of an input-output example we only know whether the network produces the desired result or not. The weights are updated based on this Boolean decision (true or false).

Examples: Learning how to ride a bike.

Ï Unsupervised Learning:

For a given input, the exact numerical output a network should produce is unknown. Since no "teacher" is available, the network must organize itself (e.g., in order to associate clusters with units).

Examples: Clustering with self-organizing feature maps, Kohonen networks.

Figure 25. Three clusters and a classifier network

The Perceptron Learning Algorithm

‡ Optimization Problem

The optimization problem, which the learning algorithm has to solve, can be understood as descent on the error surface.

But we can also look at the problem as a search for an inner point of the solution region (a polytope in the case of the perceptron).

For example, let's have a look at the separation corresponding to the AND function:

$$P = \{(1, 1)\}$$

$$N = \{(0, 0), (1, 0), (0, 1)\}$$

Here P and N are the two sets of points to be separated.

The set P must be classified in the positive and the set N in the negative half-space.


‡ Optimization Problem: The Calculation

Three weights $w_1$, $w_2$, and $w_3 = -\theta$ are needed to implement the desired separation with a generic perceptron.

With the extended input vector ($x_3 = 1$), the following four inequalities have to be fulfilled:

$$(0, 0, 1) \cdot (w_1, w_2, w_3) < 0$$

$$(1, 0, 1) \cdot (w_1, w_2, w_3) < 0$$

$$(0, 1, 1) \cdot (w_1, w_2, w_3) < 0$$

$$(1, 1, 1) \cdot (w_1, w_2, w_3) > 0$$

These inequalities can be written in a simpler matrix form:

$$\begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & -1 \\ 0 & -1 & -1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} > \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

This can be written as

$$A \cdot \vec{w} > 0,$$

where $A$ is the $4 \times 3$ matrix and $\vec{w}$ the weight vector (written as a column vector).

This system of inequalities describes all points in the interior of a convex polytope.

The sides of the polytope are delimited by the planes defined by each of the inequalities above.

Any point in the interior of the polytope represents a solution for the learning problem.
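Whether a given weight vector lies inside the solution polytope can be tested by checking all four inequalities at once. In this Python sketch the matrix is called `A`, and the two candidate weight vectors are hypothetical examples.

```python
# Rows of the matrix: the three negative examples (0,0,1), (1,0,1), (0,1,1)
# with reversed sign, followed by the positive example (1,1,1).
A = [
    [0, 0, -1],
    [-1, 0, -1],
    [0, -1, -1],
    [1, 1, 1],
]


def in_solution_polytope(w):
    """True if every component of A . w is strictly positive."""
    return all(sum(a * wi for a, wi in zip(row, w)) > 0 for row in A)


inside = in_solution_polytope((1, 1, -1.5))   # threshold 1.5: solves AND
outside = in_solution_polytope((1, 1, -0.5))  # threshold 0.5: violates row 2
print(inside, outside)
```

Multiplying a solution by any positive factor gives another solution, so the solution region is unbounded and extends away from the origin like a cone.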


Figure 26. Solution polytope for the AND function in weight space

‡ Learning Algorithm

The following procedure describes the learning algorithm for a single perceptron cell.

Given are two sets of points P and N , which the perceptron should learn to classify.

Ï Start: Generate an initial vector of weights $\vec{w}_0$.

$t = 0$

$\vec{w} = \vec{w}_0$

Ï Testing: Select $\vec{x} \in P \cup N$.

If $\vec{x} \in P$ and $\vec{w}_t \cdot \vec{x} > 0$: goto Test for End

If $\vec{x} \in P$ and $\vec{w}_t \cdot \vec{x} \le 0$: goto Addition

If $\vec{x} \in N$ and $\vec{w}_t \cdot \vec{x} < 0$: goto Test for End

If $\vec{x} \in N$ and $\vec{w}_t \cdot \vec{x} \ge 0$: goto Subtraction

Ï Addition: $\vec{w}_{t+1} = \vec{w}_t + \vec{x}$

$t = t + 1$

goto Testing

Ï Subtraction: $\vec{w}_{t+1} = \vec{w}_t - \vec{x}$

$t = t + 1$

goto Testing

Ï Test for End: Are all $\vec{x} \in P \cup N$ correctly classified?

Yes: END

No: goto Testing

Note: The perceptron learning procedure only works if the point sets are linearly separable.
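The procedure above can be sketched in Python as follows. Several details are assumptions of this sketch rather than part of the algorithm as stated: the initial weight vector is all zeros, the points are selected by cycling through P and then N, the input vectors are extended by a constant 1, and a step limit `max_steps` guards against point sets that are not linearly separable.

```python
def train_perceptron(P, N, max_steps=10_000):
    """Perceptron learning on the point sets P (positive) and N (negative).

    Returns the extended weight vector (w_1, ..., w_n, -theta),
    or None if no solution was found within max_steps.
    """
    # Extended input vectors: append the constant input 1.
    samples = [(tuple(x) + (1,), True) for x in P]
    samples += [(tuple(x) + (1,), False) for x in N]
    w = [0] * len(samples[0][0])  # Start: initial weight vector (assumed zero)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def all_correct():
        # Test for End: P strictly positive, N strictly negative.
        return all(
            dot(w, x) > 0 if positive else dot(w, x) < 0
            for x, positive in samples
        )

    for t in range(max_steps):
        if all_correct():
            return w
        x, positive = samples[t % len(samples)]  # Testing: select a point
        s = dot(w, x)
        if positive and s <= 0:
            w = [wi + xi for wi, xi in zip(w, x)]  # Addition
        elif not positive and s >= 0:
            w = [wi - xi for wi, xi in zip(w, x)]  # Subtraction
    return None


# AND function, as in the optimization example above.
P = [(1, 1)]
N = [(0, 0), (1, 0), (0, 1)]
w_and = train_perceptron(P, N)
print(w_and)

# XOR is not linearly separable, so no weight vector is found.
w_xor = train_perceptron([(1, 0), (0, 1)], [(0, 0), (1, 1)], max_steps=1000)
print(w_xor)
```

For linearly separable sets such as AND, the returned vector is one interior point of the solution polytope; which point is reached depends on the initial weights and on the order in which the points are selected.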

‡ Example

The following example illustrates the convergence behavior of the perceptron learning algorithm.


Figure 27. Initial Configuration


Figure 28. After correction with $\vec{x}_1$



References

Hertz, J., Krogh, A., and Palmer, R. G. Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA, 1991.

Rojas, R. Neural Networks: A Systematic Introduction. Springer-Verlag, Berlin, 1996.

