UNIVERSITI TENAGA NASIONAL
CSNB234 ARTIFICIAL INTELLIGENCE
Chapter 10: Artificial Neural Networks (ANN)
Instructor: Alicia Tang Y. C.
(Chapter 11, pp. 458-471, Textbook) (Chapter 18, Ref. #1)
What is a Neural Network?
Neural Networks are a different paradigm for computing:
– Neural networks are based on the parallel architecture of animal brains.
It is a model that simulates a biological neural network.
Real brains, however, are orders of magnitude more complex than any artificial neural network so far considered.
Artificial Neural Networks
Supervised Learning
– The Perceptron
– Multilayer Neural Networks that use a backpropagation learning algorithm
– The Hopfield network
– Stochastic network
Unsupervised Learning
– Hebbian Learning
– Competitive Learning
– Kohonen Network (SOM)
SUPERVISED LEARNING
[Diagram: INPUT → ANN → OUTPUT; an ERROR HANDLER compares the OUTPUT with the EXPECTED OUTPUT and sends corrections back to the ANN via a feedback loop]
UNSUPERVISED LEARNING
[Diagram: INPUT → unsupervised learning program → OUTPUT]
The learning program adjusts itself to figure out what the output could be.
There are no targets to match, whatsoever.
A Schematic of a Neuron
A neural network at first glimpse
Neuron
– A cell body consists of many dendrites
– A single branch is called an axon
– It is the information processor
  • dendrites handle inputs (receive signals)
  • the soma does the processing
  • the axon holds the output
– Neurons are connected by synapses
  • synapses, the points of contact between neurons, are modelled by (adjustable) weights
What is in a Neural Network?
The model consists of artificial neurons (processing elements or parameters)
• they are called nodes
• their exact form depends on the hardware or software implementation
All neurons are connected in some structure that forms a “network”, i.e. neurons are interconnected
A neural network usually operates in parallel
• parallel computation: doing multiple things at the same time
What’s Special in a Neural Network?
Its computing architecture is based on:
– a large number of relatively simple processors
– operating in PARALLEL
– connected to each other by a link system
How does the artificial neural network model the brain?
An artificial neural network consists of a number of interconnected processors.
These processors are made very simple; they are analogous to the biological neurons in the human brain.
The neurons are connected by weighted links passing signals from one neuron to another.
Each neuron receives a number of signals, and it produces only one output signal through its connection.
– The outgoing connection, in turn, splits into a number of branches that transmit the same signal.
The outgoing branches terminate at the incoming connections of other neurons in the network.
Why Neural Network Computing?
To model and mimic certain processing capabilities of our brain.
It imitates the way a human brain works, learns, etc.
A Neural Network Model
Consists of
– Input units xi
– Weight from unit i, wi
– An activation level a
– A threshold θ
– A network topology
– A learning algorithm
(these values are real numbers)
Neural Network with Hidden Layer(s)
Perceptrons Learn by Adjusting Weights
An example of the use of ANN
THE PERCEPTRON (Single-Layer Neural Network)
Perceptron
Developed by Frank Rosenblatt (1958).
Its learning rule is superior to the Hebb learning rule.
Rosenblatt proved that the weights can converge on particular applications.
However, the Perceptron does not work for nonlinear applications, as proven by Minsky and Papert (1969).
The activation function used is the binary step function with an arbitrary, but fixed, threshold.
Weights are adjusted by the Perceptron learning rule.
A Perceptron
Is a simple neural network
[Diagram: input units 1, 2, …, n connected by weighted links to a single output unit]
Input unit: xi
Weight from unit i: wi
Activation level: a
Threshold: θ
Threshold Function used by Perceptron
a = 1 if Σ(i=1..n) wi xi ≥ θ
a = 0 otherwise ---- (1)

A unit is said to be ‘on’ or ‘active’ if its activation level is ‘1’.
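As a concrete illustration, a minimal Python sketch of threshold function (1); the names `activation`, `weights`, `inputs`, and `theta` are illustrative, not from the slides.

```python
# Threshold (binary step) activation, formula (1):
# the unit is 'on' (returns 1) iff the weighted input sum reaches theta.
def activation(weights, inputs, theta):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0
```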
Perceptron Threshold Function
A Perceptron that learns “AND” and “OR” concepts:
[Diagram: two perceptrons, each with two inputs; weights are shown next to the arcs/links and the threshold θ next to the output]
AND-function: weights 1 and 1, threshold θ = 1.5
OR-function: weights 1 and 1, threshold θ = 0.5
The perceptron will have its output ‘on’ iff
x1·1 + x2·1 ≥ 1.5 ---- using (1)
A perceptron learns by repeatedly adjusting its ‘weights’ through repeated presentation of examples.

P Q | P AND Q
-------------
1 1 |   1
1 0 |   0
0 1 |   0
0 0 |   0
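As a quick check, a sketch (reusing the illustrative `activation` helper above) that weights (1, 1) with threshold 1.5 reproduce the AND truth table:

```python
# Evaluate the AND perceptron on all four input patterns.
for x1 in (1, 0):
    for x2 in (1, 0):
        print(x1, x2, activation([1, 1], [x1, x2], 1.5))  # equals P AND Q
```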
A more abstract characterisation
We view inputs x1, x2, … xn to a perceptron as vectors in n-dim space
Since activation levels are restricted to 1 or 0, all input vectors will lie on the corners of a hypercube in this space
We may view weights and threshold as defining a hyperplane satisfying the equation:
• w1x1 + w2x2 + … + wnxn - θ = 0
Geometric Interpretation
Input vectors are classified according to which side of the hyperplane they fall on
This is termed Linear Discrimination, e.g. the four possible inputs fall on the vertices of a square
• w1x1 + w2x2 - θ = 0 defines a line in the plane
Linear Discrimination
E.g. ax1 + bx2 - c = 0 (straight line)
ax1 + bx2 - c ≥ 0 (one side of the straight line)
[Diagram: the line ax1 + bx2 - c = 0 divides the plane into a “≥ 0” region and a “≤ 0” region]
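A small sketch of this linear discrimination (the coefficients a, b, c and the test points below are illustrative only):

```python
# Classify a point (x1, x2) by which side of the line a*x1 + b*x2 - c = 0
# it falls on: 1 for the ">= 0" side, 0 for the other side.
def side(a, b, c, x1, x2):
    return 1 if a * x1 + b * x2 - c >= 0 else 0

print(side(1, 1, 1.5, 1, 1))  # 1: the ">= 0" side
print(side(1, 1, 1.5, 0, 0))  # 0: the other side
```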
Perceptron cannot compute XOR function (I)
[Graph of the XOR function: the “+” points (0,1) and (1,0) and the “-” points (0,0) and (1,1) lie on opposite corners of the unit square]
No straight line can be drawn to separate the “+” and “-”. Try it out, if you don’t believe it.

P Q | P XOR Q
-------------
1 1 |   0
1 0 |   1
0 1 |   1
0 0 |   0

Hidden layers required!!
Perceptron cannot compute XOR function (II)
Consider this net:
[Diagram: a two-layer net of threshold units; inputs x1 and x2 feed a hidden unit (weights 1 and 1, threshold 1.5) and also feed the output unit directly (weights 1 and 1); the hidden unit feeds the output unit with weight -2; the output unit has threshold 0.5]
This suggests that neural nets of threshold units comprising more than one layer can correctly compute the XOR function.
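A sketch verifying this net, assuming the classic wiring read off the diagram (hidden AND unit with threshold 1.5; output unit with threshold 0.5 that receives x1 and x2 with weight 1 each and the hidden unit with weight -2):

```python
# Two-layer net of threshold units that computes XOR.
def xor_net(x1, x2):
    hidden = 1 if x1 + x2 >= 1.5 else 0             # fires only for (1, 1)
    return 1 if x1 + x2 - 2 * hidden >= 0.5 else 0  # 'OR' minus 2 * 'AND'

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # prints P XOR Q for each pattern
```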
Perceptron cannot compute XOR function (III)
A hidden unit is neither an input nor an output unit, thus we need not concern ourselves with its activation level
Any function a perceptron can compute, a perceptron can learn
Description of a Learning Task
Rules:
– to teach a perceptron a function f which maps n binary values x1, x2, … xn to a binary output f(x1, x2, … xn).
– Think of f as being the AND function:
  • { f(1,1)=1, f(1,0)=0, f(0,1)=0, f(0,0)=0 }
– Starting off with random weights & threshold, the inputs & output will take some values corresponding to an activation level a of either 1 or 0.
We then compare the actual output with the desired output f(x1, x2, … xn ) = t
– ‘t’ for teaching
If the two are the same then leave the weights/threshold alone
Perceptron Learning Algorithm
Set wi (i = 1, 2, …, n) and θ to be real numbers
Set η to be a positive real number
UNTIL all ap = tp for each input pattern p DO
  FOR each input pattern p = (x1p … xnp) DO
    • let the new weights & threshold be:
      – wi ← wi + η . (tp - ap) . xip
      – θ ← θ - η . (tp - ap)
  ENDFOR
END UNTIL
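A minimal Python sketch of this algorithm; the function and variable names are illustrative, and convergence is only guaranteed for linearly separable data (per Rosenblatt's result above):

```python
# Perceptron learning rule: repeat passes over the patterns until every
# pattern p satisfies a_p = t_p, adjusting w and theta after each error.
def train_perceptron(patterns, targets, eta=0.1, w=None, theta=0.0):
    w = list(w) if w is not None else [0.0] * len(patterns[0])
    converged = False
    while not converged:
        converged = True
        for x, t in zip(patterns, targets):
            a = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if a != t:
                converged = False
                w = [wi + eta * (t - a) * xi for wi, xi in zip(w, x)]
                theta -= eta * (t - a)
    return w, theta

# Learn the AND function from the truth table shown earlier.
w, theta = train_perceptron([(1, 1), (1, 0), (0, 1), (0, 0)], [1, 0, 0, 0])
```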
A few words on η
This is the learning rate: the amount by which we adjust wi & θ for each pattern p. It affects the speed of learning.
– a fairly small positive number is suggested
– if it is too big --> we overstep minima
– if it is too small --> we move very, very slowly
[Diagram: with η too big, the steps jump across the error curve — “Minima is here & being skipped”; with η too small — “Too slow!!! Crawling ….”]
Multi-layer Neural Networks (MLP)
Hidden layers are required…
What are hidden layers?
- They are layers additional to the input and output layers, not connected externally.
- They are located in between the input and output layers.
Multi-layer Perceptron (MLP)
To build a nonlinear classifier based on Perceptrons
The structure of an MLP is usually found by experimentation
Parameters can be found using backpropagation
Multi-layer Perceptron (MLP)
How to learn?
– We cannot simply use the Perceptron learning rule, because we have hidden layer(s)
There is a function that we are trying to minimize: the e r r o r
We need a different activation function:
– Use the sigmoid function instead of the threshold function
Formulas needed for the backpropagation learning algorithm
netj = (Σi wji . xi) - θj ← net input

f(netj) = 1 / (1 + e^(-((Σi wji . xi) - θj) / T)) ← sigmoid

To compute the new weights, i.e. to ‘learn’, we need the following formulas:

wj,k ← wj,k + akp . δjp . f’(netjp) ← adjustment of one weight

δjp = (tj - aj) ← measure of error at an output unit

δjp = Σk wk,j . δkp ← error for the hidden units:
‘j’ says, you (those k’s) caused me to have so much error
Multi-layer Neural Networks
Modifications done to “units”:
We still assume input values are either 1 or 0
Output values are either 1 or 0
But activation levels take on any real number between 0 and 1
Thus,
– the activation level of each unit xj is found by first taking the net input to xj to be the weighted sum
  netj = (Σi wji . xi) - θj ------ (2)
Here,
– the summation runs over all input units xi in the layer previous to xj
– with wji denoting the weight on the link from xi to unit xj
– θj the threshold corresponding to xj
A smooth step-like function is required, and we use the SIGMOID function
Sigmoid Function
Is a continuous function
Also called a smooth function
Why is this f(x) needed?
– It is a mathematical function that produces a sigmoid curve (i.e. an S shape). It is a special case of the logistic function. It is used in a neural network to introduce nonlinearity into the learning model.

f(netj) = 1 / (1 + e^(-((Σi wji . xi) - θj) / T)) --- Sigmoid f(x)
(the summation runs over all i)
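A sketch of this activation in Python; `T` is the temperature from the formula above (T = 1 gives the usual logistic function), and the names are illustrative:

```python
import math

# Sigmoid activation: squashes the net input (2) into the range (0, 1).
def sigmoid_activation(weights, inputs, theta, T=1.0):
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net / T))
```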
Learning in Multi-layer NN via the ‘Backpropagation’ learning algorithm
All input patterns p are fed one at a time into the input units
The actual response of the output units is compared with the desired output
Adjustments are made to the weights in response to discrepancies between the desired & actual outputs
After all input patterns have been given, the whole process is repeated over & over until the actual response of the output is tolerably close to the desired response
We now examine the procedure of adjusting weights:
For an output unit j:
δjp = (tj - aj) -------- (3)
where δjp = the error at unit j in response to the presentation of input pattern p
– tj = desired response
– aj = actual response
The weights leading to unit j are modified in much the same way as for a single-layer perceptron.
For all units k which feed into unit j, we set:
wj,k ← wj,k + akp . δjp . f’(netjp) -------- (4)
where f’(netjp) = the rate of change of the function at that point, i.e. the derivative of the function
What if unit j is a hidden unit?
The measure δjp of the error at unit j cannot this time be given by the difference (tj - aj) [recall formula (3)]
Because we do not know what the response of the hidden units should be!!
Instead, it is calculated on the basis of the errors of the units in the layer immediately above unit j
Specifically, the error at unit j is the weighted sum of ALL the errors at the units k such that there is a link from unit j to unit k, with the weighting simply being given by the weights on the links:
δjp = Σk wk,j . δkp ------ (5)
Equation (3) tells us how to calculate error for output units and equation (5) tells us how to calculate errors for hidden units in terms of the errors in the layer above
We can construct a “goodness-of-fit” measure, which is used to determine how close the network is to computing the function we are trying to teach it.
A (sensible) measure is:
E = Σp Ep where Ep = Σj (tjp - ojp)^2
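Putting formulas (2)–(5) together, a minimal sketch of one backpropagation step for a single input pattern, on a net with two inputs, two hidden units, and one output unit. All weights, thresholds, and the learning-rate factor `eta` are illustrative assumptions (the slides' update (4) has no explicit learning rate); the full algorithm repeats this step over all patterns until E is tolerably small.

```python
import math

def f(net, T=1.0):                 # sigmoid activation, as in the slides
    return 1.0 / (1.0 + math.exp(-net / T))

def f_prime(net):                  # derivative of the sigmoid (T = 1)
    return f(net) * (1.0 - f(net))

x, t = [1, 0], 1                   # one input pattern p and its target
w_hid = [[0.5, 0.4], [0.9, 1.0]]   # w[j][i]: weight from input i to hidden j
th_hid = [0.8, -0.1]               # hidden thresholds
w_out = [-1.2, 1.1]                # weights from hidden unit j to the output
th_out = 0.3                       # output threshold
eta = 0.5                          # learning rate (an added assumption)

# Forward pass: net input (2) and sigmoid activation at each unit.
net_hid = [sum(w * xi for w, xi in zip(ws, x)) - th
           for ws, th in zip(w_hid, th_hid)]
a_hid = [f(n) for n in net_hid]
net_out = sum(w * a for w, a in zip(w_out, a_hid)) - th_out
a_out = f(net_out)

d_out = t - a_out                             # output error, formula (3)
d_hid = [w_out[j] * d_out for j in range(2)]  # hidden errors, formula (5)

# Weight adjustments, formula (4): w_jk <- w_jk + a_k * d_j * f'(net_j),
# scaled here by eta.
for j in range(2):
    w_out[j] += eta * a_hid[j] * d_out * f_prime(net_out)
    for i in range(2):
        w_hid[j][i] += eta * x[i] * d_hid[j] * f_prime(net_hid[j])
```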
ANN Promises
– A successful implementation area of ANN is “vision”.
– An NN can survive the failure of some nodes.
– It handles noise (missing data) well. Once trained, an NN shows an ability to recognize patterns even though part of the data is missing.
– A tool for modeling and exploring brain function.
– Parallelism (without much effort).
– A neural network can execute an automatic acquisition task for situations in which historical data are available.
ANN unsolved problems
• It cannot (yet) model high-level cognitive mechanisms such as attention
• Brains are very large, having tens of billions of neurons and trillions of connections
• There is growing evidence that (human) neurons learn not merely by adjusting weights but also by growing new connections
Exercises
State True/False
Multiple choice questions
Circle one of the answers, either True or False:
1. Backpropagation is an AI reasoning technique. [ True / False ]
2. In neural network learning, some inputs must be adjusted once the actual output is found not to tally with the desired one. [ True / False ]

Circle the correct answer, either True or False:
1. Neural networks can handle noisy data. [ True / False ]
2. The threshold value (θ) is a parameter of the Perceptron learning algorithm. [ True / False ]

1. The XOR problem cannot be solved by a single-layer Perceptron. [ True / False ]
2. Backpropagation is a learning algorithm for the Multi-Layer Perceptron. [ True / False ]
MCQ, tick the correct answer:

1. In supervised learning:
a. Only input stimuli are shown to the network
b. The algorithms are known but not the inputs
c. The network is being controlled by the user
d. Both the inputs and the desired outputs are known
e. None of the above

2. Which of the following is not a promise of Artificial Neural Networks?
a. It can explain results.
b. It can survive the failure of some nodes.
c. It has inherent parallelism.
d. It can handle noise.

3. Why is the XOR problem exceptionally interesting to neural network researchers?
a. Because it cannot be expressed in a way that allows you to use a neural network
b. Because it is a complex binary function that cannot be solved by a neural network
c. Because it can be solved by a single-layer Perceptron
d. Because it is the simplest linearly inseparable problem that exists
MCQ, tick the correct answer:

1. What is back-propagation?
(a) It is another name given to the curvy function in the Perceptron unit.
(b) It is the transmission of error back through the network to adjust the inputs.
(c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn.
(d) None of the above.

2. Why are linearly separable problems of interest to neural network researchers?
(a) Because they are the only class of problems that a network can solve successfully
(b) Because they are the only class of problems a perceptron can solve successfully
(c) Because they are the only mathematical functions that are continuous
(d) Because they are the only mathematical functions you can draw

3. Which of the following is not a parameter used in the perceptron learning algorithm?
(a) Input units (xi)
(b) Learning rate (η)
(c) Threshold value (θ)
(d) Error rate ()
Exercise
A neural network for training the recognition of the digits 0 – 9
How many bars (hence input bits) are required to represent the digits 0 – 9?
Only 7 bars are required to represent all 10 digits.
Answer: the seven-segment display
[Diagram: a seven-segment digit pattern — three horizontal bars (top, middle, bottom) and four vertical bars]
* labelling from 0 – 9 is also required

So, the neural network that could be used for training will look like this:
[Diagram: the seven segment inputs x1 … x7 feed weighted links into a unit that asks “Sum > 0 ?”]
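For concreteness, a sketch of the ten digits as 7-bit input patterns for such a network; the segment ordering (top, top-right, bottom-right, bottom, bottom-left, top-left, middle) is an illustrative convention, not from the slides:

```python
# Seven-segment encodings of the digits 0-9: one input bit per bar.
DIGITS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),  # all seven bars lit
    9: (1, 1, 1, 1, 0, 1, 1),
}
```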