UNIVERSITI TENAGA NASIONAL
CSNB234 ARTIFICIAL INTELLIGENCE
Chapter 10: Artificial Neural Networks (ANN)
Instructor: Alicia Tang Y. C.
(Chapter 11, pp. 458-471, Textbook) (Chapter 18, Ref. #1)
What is a Neural Network?
Neural Networks are a different paradigm for computing:
– Neural networks are based on the parallel architecture of animal brains.
It is a model that simulates a biological neural network.
Real brains, however, are orders of magnitude more complex than any artificial neural network so far considered.
Artificial Neural Networks
Supervised Learning
– The Perceptron
– Multilayer Neural Networks that use a backpropagation learning algorithm
– The Hopfield network
– Stochastic network
Unsupervised Learning
– Hebbian Learning
– Competitive Learning
– Kohonen Network (SOM)
SUPERVISED LEARNING
[Diagram: INPUT → ANN → OUTPUT; an ERROR HANDLER compares the OUTPUT with the EXPECTED OUTPUT and sends corrections back to the ANN via a feedback loop]
UNSUPERVISED LEARNING
[Diagram: INPUT → unsupervised learning program → OUTPUT]
The learning program adjusts itself to figure out what the output could be.
There are no targets to match, whatsoever.
A Schematic of a Neuron
A neural network at first glimpse
Neuron
– A cell body consists of many dendrites
– A single branch is called an axon
– It is the information processor
  • dendrites handle inputs (receive signals)
  • the soma does the processing
  • the axon holds the output
– Neurons are connected by synapses
  • synapses, the points of contact between neurons, are modelled by (adjustable) weights
What is in a Neural Network?
The model consists of artificial neurons (processing elements or parameters)
• they are called nodes
• their exact form depends on the hardware or software implementation
All neurons are connected in some structure that forms a “network”, i.e. neurons are interconnected
A neural network usually operates in parallel
• parallel computation: doing multiple things at the same time
What’s Special in a Neural Network?
Its computing architecture is based on:
– a large number of relatively simple processors
– operating in PARALLEL
– connected to each other by a link system
How does the artificial neural network model the brain?
An artificial neural network consists of a number of interconnected processors.
These processors are made very simple; they are analogous to the biological neurons in the human brain.
The neurons are connected by weighted links passing signals from one neuron to another.
Each neuron receives a number of signals, and it produces only one output signal through its connection.
– The outgoing connection, in turn, splits into a number of branches that transmit the same signal.
The outgoing branches terminate at the incoming connections of other neurons in the network.
Why Neural Network Computing?
To model and mimic certain processing capabilities of our brain.
It imitates the way a human brain works, learns, etc.
A Neural Network Model
Consists of
– Input units xi
– Weight from unit i, wi
– An activation level a
– A threshold θ
– A network topology
– A learning algorithm
(these values are real numbers)
Neural Network with Hidden Layer(s)
Perceptrons Learn by Adjusting Weights
An example of the use of ANN
THE PERCEPTRON (Single-Layer Neural Network)
Perceptron
Developed by Frank Rosenblatt (1958).
Its learning rule is superior to the Hebb learning rule.
Rosenblatt proved that the weights can converge on particular applications.
However, the Perceptron does not work for nonlinear applications, as proven by Minsky and Papert (1969).
The activation function used is the binary step function with an arbitrary, but fixed, threshold.
Weights are adjusted by the Perceptron learning rule.
A Perceptron
Is a simple neural network
[Diagram: input units 1, 2, …, n connected by weighted links to a single output unit]
Input unit: xi
Weight from unit i: wi
Activation level: a
Threshold: θ
Threshold Function used by Perceptron
a = 1 if Σ(i=1..n) wi xi ≥ θ
a = 0 otherwise ---- (1)

A unit is said to be ‘on’ or ‘active’ if its activation level is ‘1’.
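As a concrete illustration, a minimal Python sketch of threshold function (1); the names `activation`, `weights`, `inputs`, and `theta` are illustrative, not from the slides.

```python
# Threshold (binary step) activation, formula (1):
# the unit is 'on' (returns 1) iff the weighted input sum reaches theta.
def activation(weights, inputs, theta):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0
```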
Perceptron Threshold Function
A Perceptron that learns “AND” and “OR” concepts:
[Diagram: two perceptrons, each with two inputs; weights are shown next to the arcs/links and the threshold θ next to the output]
AND-function: weights 1 and 1, threshold θ = 1.5
OR-function: weights 1 and 1, threshold θ = 0.5
The perceptron will have its output ‘on’ iff
x1·1 + x2·1 ≥ 1.5 ---- using (1)
A perceptron learns by repeatedly adjusting its ‘weights’ through repeated presentation of examples.

P Q | P AND Q
-------------
1 1 |   1
1 0 |   0
0 1 |   0
0 0 |   0
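As a quick check, a sketch (reusing the illustrative `activation` helper above) that weights (1, 1) with threshold 1.5 reproduce the AND truth table:

```python
# Evaluate the AND perceptron on all four input patterns.
for x1 in (1, 0):
    for x2 in (1, 0):
        print(x1, x2, activation([1, 1], [x1, x2], 1.5))  # equals P AND Q
```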
A more abstract characterisation
We view inputs x1, x2, … xn to a perceptron as vectors in n-dim space
Since activation levels are restricted to 1 or 0, all input vectors will lie on the corners of a hypercube in this space
We may view weights and threshold as defining a hyperplane satisfying the equation:
• w1x1 + w2x2 + … + wnxn - θ = 0
Geometric Interpretation
Input vectors are classified according to which side of the hyperplane they fall on
This is termed Linear Discrimination, e.g. the four possible inputs fall on the vertices of a square
• w1x1 + w2x2 - θ = 0 defines a line in the plane
Linear Discrimination
E.g. ax1 + bx2 - c = 0 (straight line)
ax1 + bx2 - c ≥ 0 (one side of the straight line)
[Diagram: the line ax1 + bx2 - c = 0 divides the plane into a “≥ 0” region and a “≤ 0” region]
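A small sketch of this linear discrimination (the coefficients a, b, c and the test points below are illustrative only):

```python
# Classify a point (x1, x2) by which side of the line a*x1 + b*x2 - c = 0
# it falls on: 1 for the ">= 0" side, 0 for the other side.
def side(a, b, c, x1, x2):
    return 1 if a * x1 + b * x2 - c >= 0 else 0

print(side(1, 1, 1.5, 1, 1))  # 1: the ">= 0" side
print(side(1, 1, 1.5, 0, 0))  # 0: the other side
```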
Perceptron cannot compute XOR function (I)
[Graph of the XOR function: the “+” points (0,1) and (1,0) and the “-” points (0,0) and (1,1) lie on opposite corners of the unit square]
No straight line can be drawn to separate the “+” and “-”. Try it out, if you don’t believe it.

P Q | P XOR Q
-------------
1 1 |   0
1 0 |   1
0 1 |   1
0 0 |   0

Hidden layers required!!
Perceptron cannot compute XOR function (II)
Consider this net:
[Diagram: a two-layer net of threshold units; inputs x1 and x2 feed a hidden unit (weights 1 and 1, threshold 1.5) and also feed the output unit directly (weights 1 and 1); the hidden unit feeds the output unit with weight -2; the output unit has threshold 0.5]
This suggests that neural nets of threshold units comprising more than one layer can correctly compute the XOR function.
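A sketch verifying this net, assuming the classic wiring read off the diagram (hidden AND unit with threshold 1.5; output unit with threshold 0.5 that receives x1 and x2 with weight 1 each and the hidden unit with weight -2):

```python
# Two-layer net of threshold units that computes XOR.
def xor_net(x1, x2):
    hidden = 1 if x1 + x2 >= 1.5 else 0             # fires only for (1, 1)
    return 1 if x1 + x2 - 2 * hidden >= 0.5 else 0  # 'OR' minus 2 * 'AND'

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # prints P XOR Q for each pattern
```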
Perceptron cannot compute XOR function (III)
A hidden unit is neither an input nor an output unit, thus we need not concern ourselves with its activation level
Any function a perceptron can compute, a perceptron can learn
Description of a Learning Task
Rules:
– to teach a perceptron a function f which maps n binary values x1, x2, … xn to a binary output f(x1, x2, … xn).
– Think of f as being the AND function:
  • { f(1,1)=1, f(1,0)=0, f(0,1)=0, f(0,0)=0 }
– Starting off with random weights & threshold, the inputs & output will take some values corresponding to an activation level a of either 1 or 0.
We then compare the actual output with the desired output f(x1, x2, … xn ) = t
– ‘t’ for teaching
If the two are the same then leave the weights/threshold alone
Perceptron Learning Algorithm
Set wi (i = 1, 2, …, n) and θ to be real numbers
Set η to be a positive real number
UNTIL all ap = tp for each input pattern p DO
  FOR each input pattern p = (x1p … xnp) DO
    • let the new weights & threshold be:
      – wi ← wi + η . (tp - ap) . xip
      – θ ← θ - η . (tp - ap)
  ENDFOR
END UNTIL
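A minimal Python sketch of this algorithm; the function and variable names are illustrative, and convergence is only guaranteed for linearly separable data (per Rosenblatt's result above):

```python
# Perceptron learning rule: repeat passes over the patterns until every
# pattern p satisfies a_p = t_p, adjusting w and theta after each error.
def train_perceptron(patterns, targets, eta=0.1, w=None, theta=0.0):
    w = list(w) if w is not None else [0.0] * len(patterns[0])
    converged = False
    while not converged:
        converged = True
        for x, t in zip(patterns, targets):
            a = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if a != t:
                converged = False
                w = [wi + eta * (t - a) * xi for wi, xi in zip(w, x)]
                theta -= eta * (t - a)
    return w, theta

# Learn the AND function from the truth table shown earlier.
w, theta = train_perceptron([(1, 1), (1, 0), (0, 1), (0, 0)], [1, 0, 0, 0])
```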
A few words on η
This is the learning rate: the amount by which we adjust wi & θ for each pattern p. It affects the speed of learning.
– a fairly small positive number is suggested
– if it is too big --> we overstep minima
– if it is too small --> we move very, very slowly
[Diagram: with η too big, the steps jump across the error curve — “Minima is here & being skipped”; with η too small — “Too slow!!! Crawling ….”]
Multi-layer Neural Networks (MLP)
Hidden layers are required…
What are hidden layers?
- They are layers additional to the input and output layers, not connected externally.
- They are located in between the input and output layers.
Multi-layer Perceptron (MLP)
To build a nonlinear classifier based on Perceptrons
The structure of an MLP is usually found by experimentation
Parameters can be found using backpropagation
Multi-layer Perceptron (MLP)
How to learn?
– We cannot simply use the Perceptron learning rule, because we have hidden layer(s)
There is a function that we are trying to minimize: the e r r o r
We need a different activation function:
– Use the sigmoid function instead of the threshold function
Formulas needed for the backpropagation learning algorithm
netj = (Σi wji . xi) - θj ← net input

f(netj) = 1 / (1 + e^(-((Σi wji . xi) - θj) / T)) ← sigmoid

To compute the new weights, i.e. to ‘learn’, we need the following formulas:

wj,k ← wj,k + akp . δjp . f’(netjp) ← adjustment of one weight

δjp = (tj - aj) ← measure of error at an output unit

δjp = Σk wk,j . δkp ← error for the hidden units:
‘j’ says, you (those k’s) caused me to have so much error
Multi-layer Neural Networks
Modifications done to “units”:
We still assume input values are either 1 or 0
Output values are either 1 or 0
But activation levels take on any real number between 0 and 1
Thus,
– the activation level of each unit xj is found by first taking the net input to xj to be the weighted sum
  netj = (Σi wji . xi) - θj ------ (2)
Here,
– the summation runs over all input units xi in the layer previous to xj
– with wji denoting the weight on the link from xi to unit xj
– θj the threshold corresponding to xj
A smooth step-like function is required, and we use the SIGMOID function
Sigmoid Function
Is a continuous function
Also called a smooth function
Why is this f(x) needed?
– It is a mathematical function that produces a sigmoid curve (i.e. an S shape). It is a special case of the logistic function. It is used in a neural network to introduce nonlinearity into the learning model.

f(netj) = 1 / (1 + e^(-((Σi wji . xi) - θj) / T)) --- Sigmoid f(x)
(the summation runs over all i)
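A sketch of this activation in Python; `T` is the temperature from the formula above (T = 1 gives the usual logistic function), and the names are illustrative:

```python
import math

# Sigmoid activation: squashes the net input (2) into the range (0, 1).
def sigmoid_activation(weights, inputs, theta, T=1.0):
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net / T))
```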
Learning in Multi-layer NN via the ‘Backpropagation’ learning algorithm
All input patterns p are fed one at a time into the input units
The actual response of the output units is compared with the desired output
Adjustments are made to the weights in response to discrepancies between the desired & actual outputs
After all input patterns have been given, the whole process is repeated over & over until the actual response of the output is tolerably close to the desired response
We now examine the procedure of adjusting weights:
For an output unit j:
δjp = (tj - aj) -------- (3)
where δjp = the error at unit j in response to the presentation of input pattern p
– tj = desired response
– aj = actual response
The weights leading to unit j are modified in much the same way as for a single-layer perceptron.
For all units k which feed into unit j, we set:
wj,k ← wj,k + akp . δjp . f’(netjp) -------- (4)
where f’(netjp) = the rate of change of the function at that point, i.e. the derivative of the function
What if unit j is a hidden unit?
The measure δjp of the error at unit j cannot this time be given by the difference (tj - aj) [recall formula (3)]
Because we do not know what the response of the hidden units should be!!
Instead, it is calculated on the basis of the errors of the units in the layer immediately above unit j
Specifically, the error at unit j is the weighted sum of ALL the errors at the units k such that there is a link from unit j to unit k, with the weighting simply being given by the weights on the links:
δjp = Σk wk,j . δkp ------ (5)
Equation (3) tells us how to calculate error for output units and equation (5) tells us how to calculate errors for hidden units in terms of the errors in the layer above
We can construct a “goodness-of-fit” measure, which is used to determine how close the network is to computing the function we are trying to teach it.
A (sensible) measure is:
E = Σp Ep where Ep = Σj (tjp - ojp)^2
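Putting formulas (2)–(5) together, a minimal sketch of one backpropagation step for a single input pattern, on a net with two inputs, two hidden units, and one output unit. All weights, thresholds, and the learning-rate factor `eta` are illustrative assumptions (the slides' update (4) has no explicit learning rate); the full algorithm repeats this step over all patterns until E is tolerably small.

```python
import math

def f(net, T=1.0):                 # sigmoid activation, as in the slides
    return 1.0 / (1.0 + math.exp(-net / T))

def f_prime(net):                  # derivative of the sigmoid (T = 1)
    return f(net) * (1.0 - f(net))

x, t = [1, 0], 1                   # one input pattern p and its target
w_hid = [[0.5, 0.4], [0.9, 1.0]]   # w[j][i]: weight from input i to hidden j
th_hid = [0.8, -0.1]               # hidden thresholds
w_out = [-1.2, 1.1]                # weights from hidden unit j to the output
th_out = 0.3                       # output threshold
eta = 0.5                          # learning rate (an added assumption)

# Forward pass: net input (2) and sigmoid activation at each unit.
net_hid = [sum(w * xi for w, xi in zip(ws, x)) - th
           for ws, th in zip(w_hid, th_hid)]
a_hid = [f(n) for n in net_hid]
net_out = sum(w * a for w, a in zip(w_out, a_hid)) - th_out
a_out = f(net_out)

d_out = t - a_out                             # output error, formula (3)
d_hid = [w_out[j] * d_out for j in range(2)]  # hidden errors, formula (5)

# Weight adjustments, formula (4): w_jk <- w_jk + a_k * d_j * f'(net_j),
# scaled here by eta.
for j in range(2):
    w_out[j] += eta * a_hid[j] * d_out * f_prime(net_out)
    for i in range(2):
        w_hid[j][i] += eta * x[i] * d_hid[j] * f_prime(net_hid[j])
```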
ANN Promises
– A successful implementation area of ANN is “vision”.
– An NN can survive the failure of some nodes.
– It handles noise (missing data) well. Once trained, an NN shows an ability to recognize patterns even though part of the data is missing.
– A tool for modeling and exploring brain function.
– Parallelism (without much effort).
– A neural network can execute an automatic acquisition task for situations in which historical data are available.
ANN unsolved problems
• It cannot (yet) model high-level cognitive mechanisms such as attention
• Brains are very large, having tens of billions of neurons and trillions of connections
• There is growing evidence that (human) neurons learn not merely by adjusting weights but also by growing new connections
Exercises
State True/False
Multiple choice questions
Circle one of the answers, either True or False:
1. Backpropagation is an AI reasoning technique. [ True / False ]
2. In neural network learning, some inputs must be adjusted once the actual output is found not to tally with the desired one. [ True / False ]

Circle the correct answer, either True or False:
1. Neural networks can handle noisy data. [ True / False ]
2. The threshold value (θ) is a parameter of the Perceptron learning algorithm. [ True / False ]

1. The XOR problem cannot be solved by a single-layer Perceptron. [ True / False ]
2. Backpropagation is a learning algorithm for the Multi-Layer Perceptron. [ True / False ]
MCQ, tick the correct answer:

1. In supervised learning:
a. Only input stimuli are shown to the network
b. The algorithms are known but not the inputs
c. The network is being controlled by the user
d. Both the inputs and the desired outputs are known
e. None of the above

2. Which of the following is not a promise of Artificial Neural Networks?
a. It can explain results.
b. It can survive the failure of some nodes.
c. It has inherent parallelism.
d. It can handle noise.

3. Why is the XOR problem exceptionally interesting to neural network researchers?
a. Because it cannot be expressed in a way that allows you to use a neural network
b. Because it is a complex binary function that cannot be solved by a neural network
c. Because it can be solved by a single-layer Perceptron
d. Because it is the simplest linearly inseparable problem that exists
MCQ, tick the correct answer:

1. What is back-propagation?
(a) It is another name given to the curvy function in the Perceptron unit.
(b) It is the transmission of error back through the network to adjust the inputs.
(c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn.
(d) None of the above.

2. Why are linearly separable problems of interest to neural network researchers?
(a) Because they are the only class of problems that a network can solve successfully
(b) Because they are the only class of problems a perceptron can solve successfully
(c) Because they are the only mathematical functions that are continuous
(d) Because they are the only mathematical functions you can draw

3. Which of the following is not a parameter used in the perceptron learning algorithm?
(a) Input units (xi)
(b) Learning rate (η)
(c) Threshold value (θ)
(d) Error rate ()
Exercise
A neural network for training the recognition of the digits 0 – 9
How many bars (hence input bits) are required to represent the digits 0 – 9?
Only 7 bars are required to represent all 10 digits.
Answer: the seven-segment display
[Diagram: a seven-segment digit pattern — three horizontal bars (top, middle, bottom) and four vertical bars]
* labelling from 0 – 9 is also required

So, the neural network that could be used for training will look like this:
[Diagram: the seven segment inputs x1 … x7 feed weighted links into a unit that asks “Sum > 0 ?”]
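For concreteness, a sketch of the ten digits as 7-bit input patterns for such a network; the segment ordering (top, top-right, bottom-right, bottom, bottom-left, top-left, middle) is an illustrative convention, not from the slides:

```python
# Seven-segment encodings of the digits 0-9: one input bit per bar.
DIGITS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),  # all seven bars lit
    9: (1, 1, 1, 1, 0, 1, 1),
}
```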