
Page 1: CSNB234 ARTIFICIAL INTELLIGENCE

UNIVERSITI TENAGA NASIONAL

CSNB234 ARTIFICIAL INTELLIGENCE

Chapter 10: Artificial Neural Networks (ANN)

Instructor: Alicia Tang Y. C.

(Chapter 11, pp. 458-471, Textbook) (Chapter 18, Ref. #1)

Page 2: CSNB234 ARTIFICIAL INTELLIGENCE


What is a Neural Network?

Neural Networks are a different paradigm for computing:

– Neural networks are based on the parallel architecture of animal brains.

It is a model that simulates a biological neural network.

Real brains, however, are orders of magnitude more complex than any artificial neural network so far considered.

Page 3: CSNB234 ARTIFICIAL INTELLIGENCE


Artificial Neural Networks

Supervised Learning
– The Perceptron
– Multilayer Neural Networks that use a backpropagation learning algorithm
– The Hopfield network
– Stochastic networks

Unsupervised Learning
– Hebbian Learning
– Competitive Learning
– Kohonen Network (SOM)

Page 4: CSNB234 ARTIFICIAL INTELLIGENCE


SUPERVISED LEARNING

[Diagram: the INPUT is fed to the ANN, which produces an OUTPUT; an ERROR HANDLER compares this output with the EXPECTED OUTPUT and feeds corrections back to the ANN through a feedback loop.]
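As a rough illustration, here is the loop from the diagram as a Python sketch; the function names and the update scheme are illustrative placeholders, not anything prescribed by the slides.

```python
def train_supervised(forward, update, examples, epochs=100):
    """Generic supervised-learning loop, mirroring the diagram above."""
    for _ in range(epochs):
        for x, expected in examples:      # INPUT paired with EXPECTED OUTPUT
            actual = forward(x)           # the ANN produces an OUTPUT
            error = expected - actual     # the ERROR HANDLER measures the gap
            update(x, error)              # feedback loop: adjust the weights
```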

Page 5: CSNB234 ARTIFICIAL INTELLIGENCE


UNSUPERVISED LEARNING

[Diagram: the INPUT goes into an unsupervised learning program, which produces the OUTPUT directly.]

The learning program adjusts itself to figure out what the output could be.

There are no targets to match, whatsoever.

Page 6: CSNB234 ARTIFICIAL INTELLIGENCE


A Schematic of a Neuron

Page 7: CSNB234 ARTIFICIAL INTELLIGENCE


Neural networks at first glimpse

Neuron
– A cell body with many dendrites
– A single branch called an axon
– It is the information processor
  • dendrites handle inputs - they receive signals
  • the soma does the processing
  • the axon holds the output
– Neurons are connected by synapses
  • synapses are modelled by (adjustable) weights - the point of contact between neurons

Page 8: CSNB234 ARTIFICIAL INTELLIGENCE


What is in a Neural Network?

The model consists of artificial neurons (processing elements)
• they are called nodes
• they may be realised in hardware or software

All neurons are connected in some structure that forms a "network", i.e. the neurons are interconnected.

A neural network usually operates in parallel
• parallel computation - doing multiple things at the same time.

Page 9: CSNB234 ARTIFICIAL INTELLIGENCE


What’s Special in a Neural Network?

Its computing architecture is based on:
– a large number of relatively simple processors
– operating in PARALLEL
– connected to each other by a link system

Page 10: CSNB234 ARTIFICIAL INTELLIGENCE

How does the artificial neural network model the brain?

An artificial neural network consists of a number of interconnected processors.

These processors are made very simple; they are analogous to the biological neurons in the human brain.

The neurons are connected by weighted links passing signals from one neuron to another.

Each neuron receives a number of signals, and it produces only one output signal through its connection.

– The outgoing connection, in turn, splits into a number of branches that transmit the same signal.

The outgoing branches terminate at the incoming connections of other neurons in the network.


Page 11: CSNB234 ARTIFICIAL INTELLIGENCE


Why Neural Network Computing?

To model and mimic certain processing capabilities of our brain.

It imitates the way a human brain works, learns, etc.

Page 12: CSNB234 ARTIFICIAL INTELLIGENCE


A Neural Network Model

Consists of
– Input units xi
– Weight from unit i, wi
– An activation level a
– A threshold θ
– A network topology
– A learning algorithm

(xi, wi, a and θ are real numbers)

Page 13: CSNB234 ARTIFICIAL INTELLIGENCE


Neural Network with Hidden Layer(s)

Page 14: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptrons Learn by Adjusting Weights

Page 15: CSNB234 ARTIFICIAL INTELLIGENCE


An example of the use of ANN

Page 16: CSNB234 ARTIFICIAL INTELLIGENCE


THE PERCEPTRON (Single-Layer Neural Network)

Page 17: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptron

Developed by Frank Rosenblatt (1958).

Its learning rule is superior to the Hebb learning rule.

Rosenblatt proved that the weights can converge on particular applications.

However, the Perceptron does not work for nonlinear applications, as proven by Minsky and Papert (1969).

The activation function used is the binary step function with an arbitrary, but fixed, threshold.

Weights are adjusted by the Perceptron learning rule.

Page 18: CSNB234 ARTIFICIAL INTELLIGENCE


A Perceptron

Is a simple neural network

[Diagram: input units 1, 2, …, n, each connected by a weighted link to a single output unit.]

Input unit xi
Weight from unit i wi
Activation level a
Threshold θ

Given that:

Page 19: CSNB234 ARTIFICIAL INTELLIGENCE


Threshold Function used by Perceptron

a = 1, if Σ(i=1..n) wi · xi ≥ θ
a = 0, otherwise        ---- (1)

A unit is said to be 'on' or 'active' if its activation level is '1'.
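Formula (1) translates directly into Python; a minimal sketch, with variable names of my own choosing:

```python
def perceptron_output(weights, theta, xs):
    """Threshold function (1): a = 1 if the weighted sum reaches theta."""
    weighted_sum = sum(w * x for w, x in zip(weights, xs))
    return 1 if weighted_sum >= theta else 0
```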

Page 20: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptron Threshold Function

Page 21: CSNB234 ARTIFICIAL INTELLIGENCE


A Perceptron that learns the "AND" and "OR" concepts:

[Diagram: two perceptrons, each with two inputs. The weights, shown next to the arcs/links, are 1 and 1 in both cases; the threshold θ, shown next to the output, is 1.5 for the AND-function and 0.5 for the OR-function.]

Page 22: CSNB234 ARTIFICIAL INTELLIGENCE


The perceptron will have its output ‘on’ iff

x1·1 + x2·1 ≥ 1.5 ---- using (1)

The Perceptron learns by repeatedly adjusting its weights through repeated presentation of examples.

P Q | P AND Q
-------------
1 1 |    1
1 0 |    0
0 1 |    0
0 0 |    0

(here x1 = P and x2 = Q)
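A quick check of both gates in Python (a sketch; the threshold unit from formula (1) is repeated so the snippet is self-contained):

```python
def perceptron_output(weights, theta, xs):   # threshold function (1)
    return 1 if sum(w * x for w, x in zip(weights, xs)) >= theta else 0

# Both gates use weights w1 = w2 = 1; only the threshold theta differs.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", perceptron_output([1, 1], 1.5, [x1, x2]),  # theta = 1.5
              "OR:",  perceptron_output([1, 1], 0.5, [x1, x2]))  # theta = 0.5
```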

Page 23: CSNB234 ARTIFICIAL INTELLIGENCE


A more abstract characterisation

We view inputs x1, x2, … xn to a perceptron as vectors in n-dim space

Since activation levels are restricted to 1 or 0, all input vectors will lie on the corners of a hypercube in this space.

We may view the weights and threshold as defining a hyperplane satisfying the equation:

• w1x1 + w2x2 + … + wnxn - θ = 0

Page 24: CSNB234 ARTIFICIAL INTELLIGENCE


Geometric Interpretation

Input vectors are classified according to which side of the hyperplane they fall on

This is termed Linear Discrimination, e.g. the four possible inputs fall on the vertices of a square:

• w1x1 + w2x2 - θ = 0
• defines a line in the plane

Page 25: CSNB234 ARTIFICIAL INTELLIGENCE


Linear Discrimination

E.g. ax1 + bx2 - c = 0 (a straight line)

ax1 + bx2 - c ≥ 0 (one side of the straight line)

[Diagram: the line ax1 + bx2 - c = 0 splits the plane into a region where ax1 + bx2 - c ≥ 0 and a region where it is ≤ 0.]
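The same idea as a Python sketch: classify points by the sign of ax1 + bx2 - c (the coefficients below come from the AND perceptron, as an example):

```python
def side_of_line(a, b, c, x1, x2):
    """Linear discrimination: which side of a*x1 + b*x2 - c = 0 a point is on."""
    return 1 if a * x1 + b * x2 - c >= 0 else 0

# The AND perceptron as a line: x1 + x2 - 1.5 = 0 puts the corner (1,1)
# on one side of the square and the other three corners on the other.
for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, side_of_line(1, 1, 1.5, *point))
```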

Page 26: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptron cannot compute the XOR function (I)

[Graph of the XOR function: the corners (0,1) and (1,0) are marked "+", while (0,0) and (1,1) are marked "-".]

No straight line can be drawn to separate the "+" and the "-" points. Try it out, if you don't believe it.

P Q | P XOR Q
-------------
1 1 |    0
1 0 |    1
0 1 |    1
0 0 |    0

Hidden layers required!!

Page 27: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptron cannot compute the XOR function (II)

Consider this net:

[Diagram: a two-layer network. Each input feeds the output unit with weight 1 and a hidden unit with weight 1; the hidden unit (threshold 1.5) feeds the output unit with weight -2; the output unit has threshold 0.5.]

This suggests that neural nets of threshold units comprising more than one layer can correctly compute the XOR function.
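The wiring described above (my reading of the weights 1, 1, -2 and thresholds 1.5, 0.5 shown on the slide) can be checked directly:

```python
def step(x):
    """Threshold unit: fires (1) when its net input reaches 0."""
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    h = step(x1 + x2 - 1.5)              # hidden unit: an AND detector
    return step(x1 + x2 - 2 * h - 0.5)   # output: +1, +1 from inputs, -2 from h

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, "->", xor_net(*point))  # prints 0, 1, 1, 0
```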

Page 28: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptron cannot compute the XOR function (III)

The hidden unit is neither an input nor an output unit, thus we need not concern ourselves with its activation level.

Any function a perceptron can compute, a perceptron can learn

Page 29: CSNB234 ARTIFICIAL INTELLIGENCE


Description of a Learning Task

Rules:
– to teach a perceptron a function f which maps n binary values x1, x2, … xn to a binary output f(x1, x2, … xn).
– Think of f as being the AND function
  • { f(1,1)=1, f(1,0)=0, f(0,1)=0, f(0,0)=0 }
– Start off with random weights & threshold; for any inputs, the output responds with some activation level a, either 1 or 0.

Page 30: CSNB234 ARTIFICIAL INTELLIGENCE


We then compare the actual output with the desired output f(x1, x2, … xn ) = t

– ‘t’ for teaching

If the two are the same, then leave the weights/threshold alone.

Page 31: CSNB234 ARTIFICIAL INTELLIGENCE


Perceptron Learning Algorithm

Page 32: CSNB234 ARTIFICIAL INTELLIGENCE


Set wi (i = 1, 2, …, n) and θ to be real numbers

Set η to be a positive real number

UNTIL a^p = t^p for each input pattern p DO
  FOR each input pattern p = (x1^p … xn^p) DO
    • let the new weights & threshold be:
      wi ← wi + η · (t^p - a^p) · xi^p
      θ ← θ - η · (t^p - a^p)
  ENDFOR
END UNTIL
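The algorithm above, written out as a Python sketch (the range of the random initial weights and the epoch cap are my own choices):

```python
import random

def train_perceptron(patterns, eta=0.1, max_epochs=1000):
    """Perceptron learning rule, as on the slide above.
    patterns: list of (inputs, target) pairs with binary targets."""
    n = len(patterns[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]   # random initial weights
    theta = random.uniform(-0.5, 0.5)                   # random initial threshold
    for _ in range(max_epochs):              # UNTIL all a^p = t^p ...
        all_correct = True
        for xs, t in patterns:               # FOR each input pattern p
            a = 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= theta else 0
            if a != t:
                all_correct = False
                for i in range(n):           # wi <- wi + eta*(t - a)*xi
                    w[i] += eta * (t - a) * xs[i]
                theta -= eta * (t - a)       # theta <- theta - eta*(t - a)
        if all_correct:
            break
    return w, theta

# Teaching it the AND function from its truth table:
w, theta = train_perceptron([([1, 1], 1), ([1, 0], 0), ([0, 1], 0), ([0, 0], 0)])
```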

Page 33: CSNB234 ARTIFICIAL INTELLIGENCE


A few words on η

η is the learning rate: the amount by which we adjust wi & θ for each pattern p. It affects the speed of learning.
– a fairly small positive number is suggested
– if it is too big --> we overstep the minima
– if it is too small --> we move very, very slowly
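A toy demonstration of the effect (my own example, not from the slides): gradient steps on f(x) = x², whose minimum is at x = 0 and whose gradient is 2x:

```python
def descend(eta, steps=5, x=1.0):
    for _ in range(steps):
        x = x - eta * 2 * x    # one gradient step of size eta
    return x

print(descend(1.1))    # too big: oversteps the minimum and diverges
print(descend(0.01))   # too small: crawls, still near the start after 5 steps
print(descend(0.3))    # moderate: approaches the minimum quickly
```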

Page 34: CSNB234 ARTIFICIAL INTELLIGENCE


[Figure: two sketches of an error curve. With a learning rate that is too big, the steps jump across the curve and the minimum is skipped; with one that is too small, learning crawls along very slowly.]

Page 35: CSNB234 ARTIFICIAL INTELLIGENCE


Multi-layer Neural Networks (MLP)

Hidden layers are required…

What are hidden layers?
- They are layers additional to the input and output layers, not connected externally.
- They are located in between the input and output layers.

Page 36: CSNB234 ARTIFICIAL INTELLIGENCE

Multi-layer Perceptron (MLP)

To build a nonlinear classifier based on Perceptrons

The structure of an MLP is usually found by experimentation

Parameters can be found using backpropagation


Page 37: CSNB234 ARTIFICIAL INTELLIGENCE

Multi-layer Perceptron (MLP)

How to learn?
– We cannot simply use the Perceptron learning rule, because we have hidden layer(s).

There is a function that we are trying to minimize: the error.

Need a different activation function:
– Use the sigmoid function instead of the threshold function.


Page 38: CSNB234 ARTIFICIAL INTELLIGENCE

Formulas needed for the backpropagation learning algorithm


Page 39: CSNB234 ARTIFICIAL INTELLIGENCE


To compute the new weights, i.e. to 'learn', we need the following formulas.

Net input:

netj = (Σi wji · xi) - θj

Sigmoid:

f(netj) = 1 / (1 + e^(-(Σi wji·xi - θj)/T))

Adjustment of weights:

wj,k ← wj,k + η · ak^p · δj^p · f'(netj^p)

Measure of error at an output unit:

δj^p = (tj - aj)

Error for hidden units:

δj^p = Σk wk,j · δk^p

('j' says: you (those k's) caused me to have so much error.)

Page 40: CSNB234 ARTIFICIAL INTELLIGENCE


Multi-layer Neural Networks

Modifications done to "units":
- We still assume input values are either 1 or 0
- Output values are either 1 or 0
- But activation levels take on any real number between 0 and 1

Thus, the activation level of each unit xj is computed as follows: first we take the net input to xj to be the weighted sum

netj = (Σi wji · xi) - θj ------ (2)

Page 41: CSNB234 ARTIFICIAL INTELLIGENCE


Here,

– the summation runs over all units xi in the previous layer to xj

– with wji denoting the weight on the link from xi to unit xj

– and θj the threshold corresponding to xj

Instead of a step function, we use the SIGMOID function.

Page 42: CSNB234 ARTIFICIAL INTELLIGENCE


Sigmoid Function

- It is a continuous function
- Also called a smooth function

Why is this f(x) needed?
– It is a mathematical function that produces a sigmoid curve (i.e. an S shape). It is a special case of the logistic function. It is used in neural networks to introduce non-linearity into the learning model.

f(netj) = 1 / (1 + e^(-(Σi wji·xi - θj)/T)) --- Sigmoid f(x)

(the sum Σi runs over all i)
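The sigmoid and its derivative in Python, a small sketch following the slide's formula (here `net` already contains the weighted sum minus the threshold; T is the scaling parameter, usually 1):

```python
import math

def sigmoid(net, T=1.0):
    """f(net) = 1 / (1 + e^(-net / T))"""
    return 1.0 / (1.0 + math.exp(-net / T))

def sigmoid_derivative(net, T=1.0):
    """f'(net) = f(net) * (1 - f(net)) / T, needed for the weight adjustment."""
    f = sigmoid(net, T)
    return f * (1.0 - f) / T
```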

Page 43: CSNB234 ARTIFICIAL INTELLIGENCE


Learning in Multi-layer NN via the ‘Backpropagation’ learning algorithm

All input patterns P are fed one at a time into the input units

the actual response of the output units is compared with the desired output

adjustments are made to the weights in response to discrepancies between the desired & actual outputs

after all input patterns have been given, the whole process is repeated over & over until the actual response of the output is tolerably close to the desired response

Page 44: CSNB234 ARTIFICIAL INTELLIGENCE


We now examine the procedure for adjusting the weights.

For an output unit j:

δj^p = (tj - aj) -------- (3)

where
– δj^p = error at unit j in response to the presentation of input pattern p
– tj = desired response
– aj = actual response

Page 45: CSNB234 ARTIFICIAL INTELLIGENCE


The weights leading to unit j are modified in much the same way as for the single-layer perceptron.

For all units k which feed into unit j, we set:

wj,k ← wj,k + η · ak^p · δj^p · f'(netj^p) -------- (4)

f'(netj^p) = the rate of change of the function at that point, i.e. the derivative of the function.

Page 46: CSNB234 ARTIFICIAL INTELLIGENCE


What if unit j is a hidden unit?

The measure δj^p of the error at unit j cannot this time be given by the difference (tj - aj) [recall formula (3)]

Because we do not know what the response of the hidden units should be!!

Instead, it is calculated on the basis of the errors of the units in the layer immediately above unit j

Page 47: CSNB234 ARTIFICIAL INTELLIGENCE


Specifically, the error at unit j is the weighted sum of ALL the errors at the units k such that there is a link from unit j to unit k, with the weighting simply being given by the weights on the links:

δj^p = Σk wk,j · δk^p ------ (5)

Page 48: CSNB234 ARTIFICIAL INTELLIGENCE


Equation (3) tells us how to calculate error for output units and equation (5) tells us how to calculate errors for hidden units in terms of the errors in the layer above

We can construct a “goodness-of-fit” measure, which is used to determine how close the network is to compute the function we are trying to teach it.

A (sensible) measure is:

E = Σp E^p, where E^p = Σj (tj^p - oj^p)²
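Putting equations (2)-(5) together, here is a sketch of backpropagation on a tiny 2-2-1 network learning XOR. The network size, learning rate, epoch count and seed are illustrative choices, and the code follows the slides' formulation, in which f'(net) is applied in update (4) rather than folded into δ:

```python
import math
import random

def f(net):
    """Sigmoid activation (with T = 1), as on pages 39 and 42."""
    return 1.0 / (1.0 + math.exp(-net))

def train_xor(eta=0.5, epochs=20000, seed=1):
    random.seed(seed)
    rnd = lambda: random.uniform(-1.0, 1.0)
    w_h = [[rnd(), rnd()], [rnd(), rnd()]]    # w_h[j][i]: input i -> hidden j
    th_h = [rnd(), rnd()]                     # hidden thresholds theta_j
    w_o = [rnd(), rnd()]                      # hidden j -> output unit
    th_o = rnd()                              # output threshold
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for xs, t in data:
            # forward pass: netj = (sum_i wji*xi) - thetaj         --- (2)
            net_h = [w_h[j][0]*xs[0] + w_h[j][1]*xs[1] - th_h[j] for j in (0, 1)]
            a_h = [f(n) for n in net_h]
            a_o = f(w_o[0]*a_h[0] + w_o[1]*a_h[1] - th_o)
            d_o = t - a_o                             # output error  --- (3)
            d_h = [w_o[j] * d_o for j in (0, 1)]      # hidden errors --- (5)
            # updates: w <- w + eta * a_k * delta_j * f'(net_j)      --- (4)
            fp_o = a_o * (1.0 - a_o)                  # f'(net) = f * (1 - f)
            for j in (0, 1):
                w_o[j] += eta * a_h[j] * d_o * fp_o
                fp_h = a_h[j] * (1.0 - a_h[j])
                for i in (0, 1):
                    w_h[j][i] += eta * xs[i] * d_h[j] * fp_h
                th_h[j] -= eta * d_h[j] * fp_h        # thresholds move opposite
            th_o -= eta * d_o * fp_o
    return w_h, th_h, w_o, th_o
```

(Convergence is not guaranteed for every seed; a few restarts or more epochs may be needed.)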

Page 49: CSNB234 ARTIFICIAL INTELLIGENCE


ANN Promises
– A successful implementation area of ANN is "vision".
– A NN can survive the failure of some nodes.
– It handles noise (missing data) well. Once trained, a NN shows an ability to recognize patterns even though part of the data is missing.
– It is a tool for modeling and exploring brain function.
– Parallelism (without much effort).
– A neural network can execute an automatic acquisition task for situations in which historical data are available.

Page 50: CSNB234 ARTIFICIAL INTELLIGENCE


ANN unsolved problems

• It cannot (currently) model high-level cognitive mechanisms such as attention

• Brains are very large, having billions of neurons and trillions of connections

• There is growing evidence that (human) neurons can learn not merely by adjusting weights but by growing new connections

Page 51: CSNB234 ARTIFICIAL INTELLIGENCE

Exercises

State True/False

Multiple choice questions

Page 52: CSNB234 ARTIFICIAL INTELLIGENCE


Circle one of the answers, either True or False:

1. Backpropagation is an AI reasoning technique. [ True False ]
2. In neural network learning, some inputs must be adjusted once the actual output is found not to tally with the desired one. [ True False ]
3. Neural networks can handle noisy data. [ True False ]
4. The threshold value (θ) is a parameter of the Perceptron learning algorithm. [ True False ]
5. The XOR problem cannot be solved by a single-layer Perceptron. [ True False ]
6. Backpropagation is a learning algorithm for the Multi-Layer Perceptron. [ True False ]

Page 53: CSNB234 ARTIFICIAL INTELLIGENCE


MCQ, tick the correct answer:

1. In supervised learning:
a. Only input stimuli are shown to the network
b. The algorithms are known but not the inputs
c. The network is being controlled by the user
d. Both the inputs and the desired outputs are known
e. None of the above

2. Which of the following is not a promise of Artificial Neural Networks?
a. It can explain results.
b. It can survive the failure of some nodes.
c. It has inherent parallelism.
d. It can handle noise.

3. Why is the XOR problem exceptionally interesting to neural network researchers?
a. Because it cannot be expressed in a way that allows you to use a neural network
b. Because it is a complex binary function that cannot be solved by a neural network.
c. Because it can be solved by a single-layer Perceptron.
d. Because it is the simplest linearly inseparable problem that exists.

Page 54: CSNB234 ARTIFICIAL INTELLIGENCE


MCQ, tick the correct answer:

1. What is back-propagation?
(a) It is another name given to the curvy function in the Perceptron unit.
(b) It is the transmission of error back through the network to adjust the inputs.
(c) It is the transmission of error back through the network to allow weights to be adjusted so that the network can learn.
(d) None of the above.

2. Why are linearly separable problems of interest to neural network researchers?
(a) Because they are the only class of problems that a network can solve successfully
(b) Because they are the only class of problems a perceptron can solve successfully
(c) Because they are the only mathematical functions that are continuous
(d) Because they are the only mathematical functions you can draw

3. Which of the following is not a parameter used in the perceptron learning algorithm?
(a) Input units (xi)
(b) Learning rate (η)
(c) Threshold value (θ)
(d) Error rate

Page 55: CSNB234 ARTIFICIAL INTELLIGENCE

Exercise

A neural network for training the recognition of the digits 0 – 9


Page 56: CSNB234 ARTIFICIAL INTELLIGENCE


How many bars (hence input bits) are required to represent the digits 0 – 9?

Only 7 bars are required to represent all the 10 digits.

Page 57: CSNB234 ARTIFICIAL INTELLIGENCE


Answer:

 ---------
|         |
|         |
 ---------
|         |
|         |
 ---------

* labelling from 0 – 9 is also required

So, the neural network that could be used for training will look like this:

[Diagram: the bar inputs x, each connected by a weighted link to a unit that tests 'Sum > 0 ?'.]
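For concreteness, here is one possible 7-bit encoding of the digits (the segment ordering — top, top-left, top-right, middle, bottom-left, bottom-right, bottom — is my own assumption; the slide does not fix one):

```python
# Hypothetical encodings; each tuple lists which of the 7 bars is lit.
DIGITS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}

# Training pairs for the network: each 7-bit pattern labelled with its digit.
training_set = [(bits, digit) for digit, bits in DIGITS.items()]
```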