DTEL (Department for Technology Enhanced Learning)
The Centre for Technology enabled Teaching & Learning, N Y S S, India
Teaching Innovation - Entrepreneurial - Global

TRANSCRIPT

Page 1:

Page 2:

DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION ENGINEERING

VII-SEMESTER

FE2: SOFT COMPUTING

ET411

UNIT NO. 2

NEURAL NETWORK

Page 3:

UNIT 1 - SYLLABUS

1. Introduction of neural networks, learning methods
2. Perceptron training algorithm, single layer perceptron
3. Multilayer perceptron
4. Neural network architectures
5. ADALINE, MADALINE

Page 4:

CHAPTER-1 SPECIFIC OBJECTIVE / COURSE OUTCOME

The student will be able to:

1. Identify and describe learning rules.
2. Apply supervised neural networks to pattern classification.

Page 5:

Techniques in soft computing
Lect. No. 01: Unit 01

• Neural Networks
• Fuzzy Logic
• Genetic Algorithm
• Hybrid Systems

Page 6:

Definition of NN
Lect. No. 01: Unit 01

According to Haykin (1994):

• A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

– Knowledge is acquired by the network through a learning process.

– Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.

Page 7:

What is Neural Network?
Lect. No. 01: Unit 01

• A complex biological NN is a highly interconnected set of neurons that facilitates our reading, breathing, and so on.

• Each neuron
– is a rich assembly of tissue and chemistry
– has the complexity (if not the speed) of a microprocessor

• NNs operate:
– neural functions are stored in the neurons and their connections
– learning: the establishment of new connections and modification of existing connections

Page 8:

Biological Neuron
Lect. No. 01: Unit 01

• The brain is a highly complex, nonlinear, and parallel information processing system.

• It performs tasks like pattern recognition, perception, and motor control many times faster than the fastest digital computers.

• The purpose of neurons is to transmit information:
– a neuron accepts many inputs, which are all added up in some way
– if enough active inputs are received at once, the neuron will be activated and fire; if not, it will remain in its inactive state

Page 9:

Structure of Neuron
Lect. No. 01: Unit 01

• Body (soma) – contains the nucleus, which holds the chromosomes
• Dendrites
• Axon
• Synapse – a narrow gap
– couples the axon with the dendrite of another cell
– there is no direct linkage across the junction; it is a chemical one
– information is passed from one neuron to another through synapses

Figure from Elements of Artificial Neural Networks, by K. Mehrotra, MIT Press, CogNet

Page 10:

Operation of Biological Neuron
Lect. No. 01: Unit 01

• Signals are transmitted between neurons by electrical pulses (action potentials, spikes) traveling along the axon.

• When the potential at the synapse is raised sufficiently by the action potential, it releases chemicals called neurotransmitters.
– It may take the arrival of more than one action potential before the synapse is triggered.

Page 11:

ARTIFICIAL NEURAL NET
Lect. No. 01: Unit 01

• An information-processing system.
• Neurons process the information.
• The signals are transmitted by means of connection links.
• The links possess an associated weight.
• The output signal is obtained by applying activations to the net input.

Page 12:

ARTIFICIAL NEURAL NET
Lect. No. 01: Unit 01

[Figure: input neurons X1 and X2 connect to output neuron Y through weights W1 and W2]

The figure shows a simple artificial neural net with two input neurons (X1, X2) and one output neuron (Y). The interconnection weights are given by W1 and W2.

Page 13:

Association of Biological Net with Artificial Net
Lect. No. 01: Unit 01

[Figure: dendrites, cell body, summation, threshold, axon]

Biological Neuron    Artificial Neuron
Cell                 Neuron
Dendrites            Weights or interconnections
Soma                 Net input
Axon                 Output

Page 14:

Processing of an Artificial Net
Lect. No. 01: Unit 01

The neuron is the basic information processing unit of a NN. It consists of:

1. A set of links, describing the neuron inputs, with weights W1, W2, ..., Wm.

2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):

   u = Σ_{j=1..m} Wj Xj

3. An activation function for limiting the amplitude of the neuron output:

   y = φ(u + b)
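As a minimal sketch of this processing unit (the function and variable names here are illustrative, not from the slides), the linear combiner and activation can be written directly:

```python
# A single artificial neuron: weighted sum u = sum(W_j * X_j),
# then output y = phi(u + b), with a hard-limit activation as an example.

def neuron_output(weights, inputs, bias, phi):
    """Linear combiner followed by an activation function."""
    u = sum(w * x for w, x in zip(weights, inputs))
    return phi(u + bias)

def hard_limit(net):
    """Example activation: 1 if net >= 0, else 0."""
    return 1 if net >= 0 else 0

# Example: two inputs with weights W1 = W2 = 0.5 and bias b = -0.7.
y = neuron_output([0.5, 0.5], [1, 1], -0.7, hard_limit)
```

With both inputs active the net input is 1.0 - 0.7 = 0.3, so the neuron fires; with only one input active it is 0.5 - 0.7 = -0.2 and it does not.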

Page 15:

SALIENT FEATURES OF ANN
Lect. No. 01: Unit 01

• Adaptive learning
• Self-organization
• Real-time operation
• Massive parallelism
• Learning and generalizing ability

Page 16:

BIAS OF AN ARTIFICIAL NEURON
Lect. No. 02: Unit 01

The bias value is added to the weighted sum Σ wi xi so that we can shift the decision surface away from the origin:

Y_in = Σ wi xi + b, where b is the bias

[Figure: in the (x1, x2) plane, the lines x1 - x2 = 0, x1 - x2 = 1, and x1 - x2 = -1 show the boundary shifted by different bias values]

Page 17:

BUILDING BLOCKS OF ARTIFICIAL NEURAL NET
Lect. No. 02: Unit 01

• Network Architecture (connections between neurons)
• Setting the Weights (training)
• Activation Function

Page 18:

Lect. No. 02: Unit 01

[Figure]

Figure from Elements of Artificial Neural Networks, by K. Mehrotra, MIT Press, CogNet

Page 19:

LAYER PROPERTIES
Lect. No. 02: Unit 01

• Input Layer: each input unit may be designated by an attribute value possessed by the instance.
• Hidden Layer: not directly observable; provides nonlinearities for the network.
• Output Layer: encodes possible output values.

Page 20:

MULTI LAYER ARTIFICIAL NEURAL NET
Lect. No. 02: Unit 01

• INPUT: records without the class attribute, with normalized attribute values.

• INPUT VECTOR: X = {x1, x2, ..., xn}, where n is the number of (non-class) attributes.

• INPUT LAYER: there are as many nodes as non-class attributes, i.e. as the length of the input vector.

• HIDDEN LAYER: the number of nodes in the hidden layer and the number of hidden layers depend on the implementation.

Page 21:

TRAINING PROCESS
Lect. No. 02: Unit 01

• Supervised Training – providing the network with a series of sample inputs and comparing the output with the expected responses.

• Unsupervised Training – the most similar input vectors are assigned to the same output unit.

• Reinforcement Training – the right answer is not provided, but an indication of whether the answer is 'right' or 'wrong' is provided.

Page 22:

ACTIVATION FUNCTION
Lect. No. 02: Unit 01

ACTIVATION LEVEL – DISCRETE OR CONTINUOUS

HARD LIMIT FUNCTION (DISCRETE)
• Binary activation function
• Bipolar activation function
• Identity function

SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
• Binary sigmoidal activation function
• Bipolar sigmoidal activation function

Page 23:

ACTIVATION FUNCTION
Lect. No. 02: Unit 01

Activation functions:
(A) Identity
(B) Binary step
(C) Bipolar step
(D) Binary sigmoidal
(E) Bipolar sigmoidal
(F) Ramp

Figure from Principles of Soft Computing, by S. N. Sivanandam & S. N. Deepa

Page 24:

Activation Functions
Lect. No. 02: Unit 01

• Binary Step:
φ(y_in) = 1 if y_in > 0; 0 if y_in ≤ 0

• Bipolar Step:
φ(y_in) = 1 if y_in > 0; -1 if y_in ≤ 0

• Binary Sigmoidal:
φ(y_in) = 1 / (1 + e^(-α·y_in))

• Bipolar Sigmoidal:
φ(y_in) = (1 - e^(-α·y_in)) / (1 + e^(-α·y_in))

• Ramp:
φ(y_in) = 1 if y_in > 1; y_in if 0 ≤ y_in ≤ 1; 0 if y_in < 0
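The five functions listed above can be sketched directly from their piecewise definitions (α is the steepness parameter of the sigmoidal forms; the function names are illustrative):

```python
import math

def binary_step(y_in):
    # 1 if y_in > 0, else 0
    return 1 if y_in > 0 else 0

def bipolar_step(y_in):
    # 1 if y_in > 0, else -1
    return 1 if y_in > 0 else -1

def binary_sigmoid(y_in, alpha=1.0):
    # smooth 0..1 version of the binary step
    return 1.0 / (1.0 + math.exp(-alpha * y_in))

def bipolar_sigmoid(y_in, alpha=1.0):
    # smooth -1..1 version of the bipolar step
    return (1.0 - math.exp(-alpha * y_in)) / (1.0 + math.exp(-alpha * y_in))

def ramp(y_in):
    # clamp to [0, 1], linear in between
    if y_in > 1:
        return 1
    if y_in < 0:
        return 0
    return y_in
```

At y_in = 0 the binary sigmoid gives 0.5 and the bipolar sigmoid gives 0, which matches the midpoints of their output ranges.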

Page 25:

PROBLEM SOLVING
Lect. No. 02: Unit 01

1. Select a suitable NN model based on the nature of the problem.
2. Construct a NN according to the characteristics of the application domain.
3. Train the neural network with the learning procedure of the selected model.
4. Use the trained network for making inferences or solving problems.

Page 26:

NEURAL NETWORKS
Lect. No. 03: Unit 01

• A neural network learns by adjusting the weights so as to correctly classify the training data and hence, after the testing phase, to classify unknown data.

• A neural network needs a long time for training.

• A neural network has a high tolerance to noisy and incomplete data.

Page 27:

Operation of Neural Net
Lect. No. 03: Unit 01

[Figure: input vector x = (x0, x1, ..., xn) with weight vector w = (w0j, w1j, ..., wnj) feeds a weighted sum, which passes through activation function f to produce output y]

Page 28:

McCULLOCH–PITTS (MP) NEURON
Lect. No. 03: Unit 01

• Neurons are randomly connected.
• Each neuron has a fixed threshold.
• It takes one 'time step' to pass a signal over one connection link.
• The firing state (activation) is binary (1 = firing, 0 = not firing).
• Most widely used in the case of logic functions.
• One inhibitory neuron connects to all other neurons.
• It functions to regulate network activity (prevents too many firings).
• Positive weights excite the neuron; negative weights inhibit the neuron.

Page 29:

McCULLOCH–PITTS NEURON
Lect. No. 03: Unit 01

[Figure: excitatory inputs X1, ..., Xn with weight +w and inhibitory inputs Xn+1, ..., Xn+m with weight -p (p > 0) feed neuron Y; -p inhibits, +w excites]

Activation Function:

f(y_in) = 1 if y_in ≥ θ
        = 0 if y_in < θ

where θ is the threshold.

Page 30:

McCULLOCH–PITTS NEURON
Lect. No. 03: Unit 01

• AND Logic Using MP Neuron

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: X1 and X2 each connect to Y with weight 1; threshold θ = 2]

y_in = x1*1 + x2*1 = x1 + x2
y = f(y_in) = 1 if y_in ≥ θ, i.e. y_in ≥ 2
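The AND gate above can be checked with a tiny sketch of an MP neuron (names are illustrative): both excitatory weights are 1 and the threshold is θ = 2, so the neuron fires only when both inputs are active.

```python
# McCulloch-Pitts neuron: fire (1) when the net input reaches the
# threshold theta, else stay inactive (0).

def mp_neuron(inputs, weights, theta):
    y_in = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_in >= theta else 0

# AND logic: weights (1, 1), threshold 2.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], [1, 1], theta=2))
```

Only the input (1, 1) reaches y_in = 2 and fires, reproducing the AND truth table.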

Page 31:

Training Algorithms for Single Layer NN
Lect. No. 04: Unit 01

• Hebb – most fundamental
• Perceptron Learning Algorithm
• Delta Rule

Page 32:

Hebb Network
Lect. No. 04: Unit 01

• Donald Hebb stated in 1949 that in the brain, learning is performed by the change in the synaptic gap. Hebb explained it:

• "When an axon of cell A is near enough to excite cell B, and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

Page 33:

Hebb Network
Lect. No. 04: Unit 01

• Hebb Learning
– Initialize weights to 0: wi = 0.
– For each training vector and target pair si : ti (i = 1, ..., n):
– Set activations for the input neurons: xi = si.
– Set the activation for the output neuron: y = t.
– Adjust the weights: wi(new) = wi(old) + xi y.
– Adjust the bias: b(new) = b(old) + y.

[Figure: inputs X1, X2 and a bias unit (constant input 1) connect to Y through w1, w2, and b]

We use bipolar data (+1 or -1) and need training data for learning (s:t): training vector s, target vector t.

Page 34:

Hebb Learning Example (AND Logic Gate)
Lect. No. 04: Unit 01

Input x1   Input x2   Bias b   Target y
 1          1          1        1
 1         -1          1       -1
-1          1          1       -1
-1         -1          1       -1

Page 35:

Initialize weights to zero; calculate the change in weights and bias.
Lect. No. 04: Unit 01

Recall: wi(new) = wi(old) + xi y, b(new) = b(old) + y
So, define: ∆w1 = x1 y, ∆w2 = x2 y, ∆b = y

x1   x2   x0   y    ∆w1   ∆w2   ∆b    new w1, w2, b
1    1    1    1    1     1     1

Page 36:

Initialize weights to zero; calculate the change in weights and bias.
Lect. No. 04: Unit 01

Recall: wi(new) = wi(old) + xi y, b(new) = b(old) + y
So: ∆w1 = x1 y, ∆w2 = x2 y, ∆b = y

x1   x2   x0   y    ∆w1   ∆w2   ∆b    w1   w2   b (new)
1    1    1    1    1     1     1     1    1    1

Since the initial weights are 0, wi(new) = wi(old) + xi y reduces to wi(new) = xi y.

Page 37:

CURRENT DECISION BOUNDARY
Lect. No. 04: Unit 01

y = b + Σi xi wi = 0 (recall: zero is the boundary)

0 = b + x1 w1 + x2 w2

Solve for x2: x2 = -(w1/w2) x1 - b/w2

With the current weights: x2 = -x1 - 1
(x1 = 0 gives x2 = -1; x2 = 0 gives x1 = -1)

[Figure: boundary line with the + pattern on one side and the three - patterns on the other]
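The rearranged boundary equation can be checked numerically. With the weights after the first pattern (w1 = w2 = 1, b = 1), the line is x2 = -x1 - 1, hitting the axes at the two intercepts the slide lists (the function name here is illustrative):

```python
# Decision boundary b + x1*w1 + x2*w2 = 0, solved for x2.

def boundary_x2(x1, w1, w2, b):
    return -(w1 / w2) * x1 - b / w2

print(boundary_x2(0.0, 1, 1, 1))   # x1 = 0  -> x2 = -1
print(boundary_x2(-1.0, 1, 1, 1))  # x1 = -1 -> x2 = 0
```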

Page 38:

Next Data Set
Lect. No. 04: Unit 01

Using: wi(new) = wi(old) + xi y, b(new) = b(old) + y
And: ∆w1 = x1 y, ∆w2 = x2 y, ∆b = y

x1   x2   x0   y    ∆w1   ∆w2   ∆b    w1   w2   b (new)
1    -1   1    -1   -1    1     -1    0    2    0

Since the previous weights are no longer 0, wi(new) = wi(old) + xi y.

Page 39:

CURRENT DECISION BOUNDARY
Lect. No. 04: Unit 01

x2 = -(w1/w2) x1 - b/w2

With the current weights: x2 = 0

[Figure: horizontal boundary x2 = 0 with the + pattern above and the - patterns below]

Page 40:

Next Data Set
Lect. No. 04: Unit 01

Using: wi(new) = wi(old) + xi y, b(new) = b(old) + y
And: ∆w1 = x1 y, ∆w2 = x2 y, ∆b = y

x1   x2   x0   y    ∆w1   ∆w2   ∆b    w1   w2   b (new)
-1   1    1    -1   1     -1    -1    1    1    -1

Since the previous weights are no longer 0, wi(new) = wi(old) + xi y.

Page 41:

CURRENT DECISION BOUNDARY
Lect. No. 04: Unit 01

x2 = -(w1/w2) x1 - b/w2

With the current weights: x2 = -x1 + 1

[Figure: boundary line separating the + pattern from the - patterns]

The boundary is now in the correct position, but there is one more data set to process.

Page 42:

FINAL DECISION BOUNDARY
Lect. No. 04: Unit 01

x2 = -(w1/w2) x1 - b/w2

With the current weights: x2 = -x1 + 1

[Figure: final boundary separating the + pattern from the - patterns]

Observations:
• Weights only change for active input neurons (xi ≠ 0).
• Hebb learning will not always find the correct weights even if they exist.

Page 43:

Last Data Set
Lect. No. 04: Unit 01

Using: wi(new) = wi(old) + xi y, b(new) = b(old) + y
And: ∆w1 = x1 y, ∆w2 = x2 y, ∆b = y

x1   x2   x0   y    ∆w1   ∆w2   ∆b    w1   w2   b (new)
-1   -1   1    -1   1     1     -1    2    2    -2

Since the previous weights are no longer 0, wi(new) = wi(old) + xi y.
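The whole worked trace can be reproduced in a few lines. This sketch (function name illustrative) applies the Hebb updates wi(new) = wi(old) + xi·y and b(new) = b(old) + y over the four bipolar AND patterns and recovers the final weights w1 = 2, w2 = 2, b = -2:

```python
# Hebb training on the bipolar AND data, reproducing the trace above.

def hebb_train(samples):
    w1 = w2 = b = 0
    for x1, x2, t in samples:      # t plays the role of y in the update
        w1 += x1 * t
        w2 += x2 * t
        b += t
    return w1, w2, b

and_data = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]
print(hebb_train(and_data))  # (2, 2, -2)
```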

Page 44:

LINEAR SEPARABILITY
Lect. No. 05: Unit 01

Linear separability is the concept wherein the separation of the input space into regions is based on whether the network response is positive or negative.

Consider a network having a positive response in the first quadrant and a negative response in all other quadrants (the AND function), with either binary or bipolar data. The decision line is drawn separating the positive response region from the negative response region.

Page 45:

Lect. No. 05: Unit 01

• Decision region/boundary: for n = 2, b ≠ 0, θ = 0, the equation

b + x1 w1 + x2 w2 = 0, or x2 = -(w1/w2) x1 - b/w2

is a line, called the decision boundary, which partitions the (x1, x2) plane into two decision regions.

[Figure: line separating the + region from the - region in the (x1, x2) plane]

If a point/pattern (x1, x2) is in the positive region, then b + x1 w1 + x2 w2 ≥ 0, and the output is 1 (belongs to class one). Otherwise, b + x1 w1 + x2 w2 < 0, and the output is -1 (belongs to class two).

n = 2, b = 0, θ ≠ 0 would result in a similar partition.

Page 46:

Lect. No. 05: Unit 01

• If n = 3 (three input units), then the decision boundary is a two-dimensional plane in a three-dimensional space.

• In general, a decision boundary is an (n-1)-dimensional hyperplane in an n-dimensional space, which partitions the space into two decision regions.

• This simple network can thus classify a given pattern into one of the two classes, provided one of these two classes is entirely in one decision region (one side of the decision boundary) and the other class is in the other region.

• The decision boundary is determined completely by the weights W and the bias b (or threshold θ).

Page 47:

LINEAR SEPARABILITY PROBLEM
Lect. No. 05: Unit 01

• If two classes of patterns can be separated by a decision boundary, represented by the linear equation b + Σi xi wi = 0, then they are said to be linearly separable. The simple network can correctly classify any such patterns.

• The decision boundary (i.e., W, b, or θ) of linearly separable classes can be determined either by some learning procedure or by solving linear equation systems based on representative patterns of each class.

• If such a decision boundary does not exist, then the two classes are said to be linearly inseparable.

• Linearly inseparable problems cannot be solved by the simple network; a more sophisticated architecture is needed.

Page 48:

Examples of Linearly Separable Classes
Lect. No. 05: Unit 01

• Logical AND function

patterns (bipolar):        decision boundary:
x1  x2  y                  w1 = 1
-1  -1  -1                 w2 = 1
-1   1  -1                 b = -1
 1  -1  -1                 θ = 0
 1   1   1                 -1 + x1 + x2 = 0

• Logical OR function

patterns (bipolar):        decision boundary:
x1  x2  y                  w1 = 1
-1  -1  -1                 w2 = 1
-1   1   1                 b = 1
 1  -1   1                 θ = 0
 1   1   1                 1 + x1 + x2 = 0

x: class I (y = 1), o: class II (y = -1)
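The two weight sets above can be verified directly: a bipolar unit with w1 = w2 = 1 realizes AND with b = -1 and OR with b = +1 (the helper name is illustrative):

```python
# Bipolar classification: output 1 when b + x1*w1 + x2*w2 >= 0, else -1.

def classify(x1, x2, w1, w2, b):
    return 1 if b + x1 * w1 + x2 * w2 >= 0 else -1

patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print([classify(x1, x2, 1, 1, -1) for x1, x2 in patterns])  # AND targets
print([classify(x1, x2, 1, 1, 1) for x1, x2 in patterns])   # OR targets
```

The first line reproduces [-1, -1, -1, 1] (AND) and the second [-1, 1, 1, 1] (OR).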

Page 49:

Examples of Linearly Inseparable Classes
Lect. No. 05: Unit 01

• Logical XOR (exclusive OR) function

patterns (bipolar):
x1  x2  y
-1  -1  -1
-1   1   1
 1  -1   1
 1   1  -1

x: class I (y = 1), o: class II (y = -1)

No line can separate these two classes, as can be seen from the fact that the following linear inequality system has no solution:

b - w1 - w2 < 0    (1)
b - w1 + w2 ≥ 0    (2)
b + w1 - w2 ≥ 0    (3)
b + w1 + w2 < 0    (4)

From (1) + (4) we get b < 0, and from (2) + (3) we get b ≥ 0, which is a contradiction.
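The algebraic contradiction above is the actual proof; as an illustration only, a brute-force search over a small integer grid (which cannot prove the general case, but matches the conclusion) finds no single-layer weight set that reproduces XOR:

```python
import itertools

# Bipolar XOR data: (x1, x2, target).
xor_data = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]

def separates(w1, w2, b):
    """True if the single unit b + x1*w1 + x2*w2 classifies all four patterns."""
    return all((1 if b + x1 * w1 + x2 * w2 >= 0 else -1) == t
               for x1, x2, t in xor_data)

grid = range(-5, 6)
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False: no weight triple in the grid separates XOR
```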

Page 50:

Lect. No. 05: Unit 01

• XOR can be solved by a more complex network with hidden units.

[Figure: inputs x1, x2 connect to hidden units z1, z2 with weights 2 and -2; z1, z2 connect to output Y; hidden thresholds θ = 1, output threshold θ = 0]

(x1, x2)     y
(-1, -1)    -1
(-1,  1)     1
( 1, -1)     1
( 1,  1)    -1
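One consistent weight assignment for the two-layer network sketched above can be run directly. The slide gives hidden weights ±2 with thresholds θ = 1 (hidden) and θ = 0 (output); the hidden-to-output weights are not legible in the scraped figure and are assumed to be 1 each here, with a bipolar step activation:

```python
# Two-layer XOR network under the assumptions stated in the lead-in.

def bipolar_step(net, theta):
    return 1 if net >= theta else -1

def xor_net(x1, x2):
    z1 = bipolar_step(2 * x1 - 2 * x2, theta=1)   # fires only for (1, -1)
    z2 = bipolar_step(-2 * x1 + 2 * x2, theta=1)  # fires only for (-1, 1)
    return bipolar_step(z1 + z2, theta=0)         # assumed unit output weights

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
```

Each hidden unit carves out one of the two "odd" corners, and the output unit ORs them, reproducing the XOR truth table above.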

Page 51:

Perceptrons
Lect. No. 06: Unit 01

• By Rosenblatt (1962)
– For modeling visual perception (retina)
– Three layers of units: Sensory, Association, and Response
– Learning occurs only on weights from A units to R units (weights from S units to A units are fixed).
– A single R unit receives inputs from n A units (the same architecture as our simple network).
– For a given training sample s:t, change weights only if the computed output y is different from the target output t (thus error-driven).

Page 52:

Perceptron Network
Lect. No. 06: Unit 01

Supervised Learning (Perceptron Learning Rule)
• Training and test data sets
• Training set: input & target are specified

[Figure: inputs x1, x2, ..., xn with weights w0, w1, w2, ..., wn feed a summation unit Σ producing output o]

Y_in = Σi wi xi

f(Y_in) = 1 if Y_in > θ
        = 0 if -θ ≤ Y_in ≤ θ
        = -1 if Y_in < -θ

Page 53:

PERCEPTRON LEARNING
Lect. No. 06: Unit 01

w_new = w_old + ∆w
∆w = η (t - o) xi

where:
t = target value (known)
o = perceptron output (calculated)
η = a small constant (e.g. 0.1), i.e. the learning rate
xi = input sample

• If the output is correct (t = o), the weights wi are not changed.
• If the output is incorrect (t ≠ o), the weights wi are changed such that the output of the perceptron for the new weights is closer to t.

• The algorithm converges to the correct classification if
– the training data is linearly separable, and
– η is sufficiently small.
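The update rule above can be exercised on the AND gate with binary inputs and targets. This is a sketch under stated assumptions: η = 0.1 as suggested, a fixed output threshold of 0.5, and a trainable bias term; the function names are illustrative:

```python
# Perceptron rule: w_i <- w_i + eta * (t - o) * x_i, and likewise for the bias.

def predict(w, b, x):
    # Assumed hard threshold at 0.5 for binary 0/1 outputs.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0.5 else 0

def train(data, eta=0.1, epochs=50):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in data:
            o = predict(w, b, x)
            for i in range(len(w)):
                w[i] += eta * (t - o) * x[i]   # no change when t == o
            b += eta * (t - o)
    return w, b

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

Because AND is linearly separable and η is small, the weights stop changing once every pattern is classified correctly.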

Page 54:

Perceptron Learning Rules
Lect. No. 06: Unit 01

1. Condition: the perceptron classifies the input pattern correctly (y_out = t).
   Action: no change in the current set of weights w0, w1, ..., wm.

2. Condition: the perceptron misclassifies the input pattern negatively (y_out = -1 but target = +1).
   Action: increase each wi by ∆wi, where ∆wi is proportional to xi, for all i = 0, 1, ..., m.

3. Condition: the perceptron misclassifies the input pattern positively (y_out = +1 but target = -1).
   Action: decrease each wi by ∆wi, where ∆wi is proportional to xi, for all i = 0, 1, ..., m.


LEARNING ALGORITHM

Target Value, T: When training the network, we present it not only with the input but also with the value we require the network to produce. For example, if we present the network with [1,1] for the AND function, the training value will be 1.

Output, O: The output value from the neuron.

Ij: Inputs being presented to the neuron.

Wj: Weight from input neuron (Ij) to the output neuron.

LR: The learning rate, which dictates how quickly the network converges. It is set by experimentation; typically 0.1.
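Using the notation above (T, O, Ij, Wj, LR = 0.1), a training loop for the AND example might look like the sketch below. The bias weight and the fixed epoch count are added assumptions, not shown on the slide.

```python
# Sketch: training a single perceptron on the AND function with LR = 0.1.
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
W = [0.0, 0.0]   # Wj: weights from inputs Ij to the output neuron
bias = 0.0       # assumed bias weight (not explicit on the slide)
LR = 0.1

for epoch in range(20):
    for I, T in samples:
        # O: step-function output of the neuron
        O = 1 if bias + sum(w * i for w, i in zip(W, I)) > 0 else 0
        bias += LR * (T - O)
        W = [w + LR * (T - O) * i for w, i in zip(W, I)]

def predict(I):
    return 1 if bias + sum(w * i for w, i in zip(W, I)) > 0 else 0
```

After training, `predict` reproduces the AND truth table, since AND is linearly separable.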


TRAINING ALGORITHM

• Adjust neural network weights to map inputs to outputs.
• Use a set of sample patterns where the desired output (given the inputs presented) is known.
• The purpose is to learn to recognize features which are common to good and bad exemplars.


MULTILAYER PERCEPTRON

Lect. No. 07:Unit 01

[Figure: a multilayer perceptron. Input signals enter at the input layer and pass through layers of adjustable weights to the output layer, which produces the output values. Figure from Principles of Soft Computing, by S. N. Sivanandam & S. N. Deepa]


LAYERS IN NEURAL NETWORK

• The input layer:
– Introduces input values into the network.
– No activation function or other processing.

• The hidden layer(s):
– Perform classification of features.
– Two hidden layers are sufficient to solve any problem.
– More features imply more layers may be better.

• The output layer:
– Functionally just like the hidden layers.
– Outputs are passed on to the world outside the neural network.
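A minimal forward pass through the three layer types just described can be sketched as below; the layer sizes, weights, and the sigmoid activation are illustrative assumptions, not from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(b + sum(w * i for w, i in zip(row, inputs)))
            for row, b in zip(weights, biases)]

x = [0.5, -1.0]                                        # input layer: values pass in unchanged
h = layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])    # hidden layer: classifies features
y = layer(h, [[1.0, 1.0]], [-1.0])                     # output layer: same computation as hidden
```

Note the input layer does no computation, matching the slide: only the hidden and output layers apply weights and an activation.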


ADALINE

• Proposed by Widrow and Hoff (1960).
• ADAptive LINear NEuron, for signal processing.
• The same architecture as our simple network.
• Learning method: the delta rule (another form of error-driven learning), also called the Widrow-Hoff learning rule.
• The delta: t − y_in. NOT t − y, because y = f(y_in) is not differentiable.
• Learning algorithm: same as perceptron learning, except in Step 5:
  b := b + a * (t − y_in)
  wi := wi + a * xi * (t − y_in)
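The Step 5 update can be sketched as a single function; the name `adaline_step` and the default a = 0.1 are assumptions for illustration. Note the delta uses the linear net input y_in, not the thresholded output.

```python
# One ADALINE (Widrow-Hoff) delta-rule update on a single sample.

def adaline_step(w, b, x, t, a=0.1):
    """b := b + a*(t - y_in); wi := wi + a*xi*(t - y_in)."""
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    delta = t - y_in
    b_new = b + a * delta
    w_new = [wi + a * xi * delta for wi, xi in zip(w, x)]
    return w_new, b_new
```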


• Derivation of the delta rule.
• Error over all P samples (mean square error):

  E = (1/P) Σ (p = 1 to P) ( t(p) − y_in(p) )²

• E is a function of W = {w1, …, wn}: E = E(w1, …, wn).
• Learning takes a gradient-descent approach to reduce E by modifying W.
• The gradient of E:

  ∇E = ( ∂E/∂w1, …, ∂E/∂wn )

• The weight change moves against the gradient:

  ∆wi ∝ −∂E/∂wi

• Computing the partial derivative:

  ∂E/∂wi = (2/P) Σ (p = 1 to P) ( t(p) − y_in(p) ) · ∂/∂wi ( t(p) − y_in(p) )
         = −(2/P) Σ (p = 1 to P) ( t(p) − y_in(p) ) · xi(p)

• Therefore:

  ∆wi ∝ −∂E/∂wi = (2/P) Σ (p = 1 to P) ( t(p) − y_in(p) ) · xi(p)
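The derived gradient, ∂E/∂wi = −(2/P) Σ (t(p) − y_in(p)) · xi(p), can be verified numerically against a central difference of the mean square error; the toy data and function names below are illustrative assumptions.

```python
# Numerical check of the analytic gradient of the mean square error
# E = (1/P) * sum_p (t(p) - y_in(p))^2 with respect to one weight.

def mse(w, b, data):
    P = len(data)
    return sum((t - (b + sum(wi * xi for wi, xi in zip(w, x)))) ** 2
               for x, t in data) / P

def analytic_grad(w, b, data, i):
    """dE/dwi = -(2/P) * sum_p (t(p) - y_in(p)) * xi(p)."""
    P = len(data)
    return -(2.0 / P) * sum(
        (t - (b + sum(wi * xi for wi, xi in zip(w, x)))) * x[i]
        for x, t in data)

data = [([1.0, 2.0], 1.0), ([-1.0, 0.5], -1.0)]
w, b, eps = [0.3, -0.2], 0.1, 1e-6

# Central difference in w[0]; E is quadratic, so this agrees closely.
numeric = (mse([w[0] + eps, w[1]], b, data)
           - mse([w[0] - eps, w[1]], b, data)) / (2 * eps)
```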


Recommended Textbooks

• Neural Networks, Fuzzy Logic and Genetic Algorithms: Synthesis and Applications, S. Rajasekaran & G. A. Vijayalakshmi Pai
• Elements of Artificial Neural Networks, K. Mehrotra et al., MIT Press (CogNet)
• Principles of Soft Computing, S. N. Sivanandam & S. N. Deepa
• Fuzzy Sets and Fuzzy Logic, George Klir & Bo Yuan, PHI


Thank You
