Lecture 6, CS567
Neural Networks
• Introduction
– Biological neurons
– Artificial neurons
– Concepts
– Conventions
• Single Layer Perceptron
– Example
– Limitation
Biological Neuron
• Neuron = cell superclass of the nervous system
• Specs
– Total number = ~10^11 (size of a hard disk, circa ’03)
• Maximum number is reached before birth
• ~10^4 lost/day (more if you don’t study every day!)
– Connections/neuron = ~10^4
– Signal rate = ~10^3 Hz (CPU = ~10^9 Hz, circa ’03)
– Signal propagation velocity = 10^(-1 to 2) m/sec
– Power = ~40 W
Biological Neuron
• Connectivity is important (just like in human society)
– Connected
• To what, and
• To what extent
– Basis of memory and learning (revising opinions; learning lessons in life)
– Revision is important (and why reading for the first time on the eve of an exam is a flawed strategy)
– Covering an eye to prevent loss of vision in squint (why the advertising industry persists, subliminally or blatantly)
Artificial Neural Networks
• What
– Connected units with inputs and outputs
• Why
– Can “learn” and approximate any function, including non-linear functions (e.g., XOR)
• When
– Basic idea is more than 60 years old
– Resurgence of interest once coverage extended to non-linear problems
Concepts
• Trial
– Output = Verdict = Guilty/Not guilty
– Processing neurons = Jury members
– Output neuron = Jury foreman
– Inputs = Witnesses/Lawyers
– Weights = Credibility of witnesses/lawyers
• Investment
– Output decision = Buy/Sell
– Inputs = Financial advisors
– Weights = Past reliability of advice
– Iterate = Revise weights after results
Concepts
• Types of learning
– Supervised
• NN learns from a series of labeled examples (human propagation of prejudice)
• Distinction between training and prediction phases
– Unsupervised
• NN discovers clusters and classifies examples
• Also called self-organizing networks (human tendency)
• Typically, prediction rules cannot be derived from an NN
Conventions
[Figure: fully connected feedforward network. Input layer p1, p2, p3, ..., pN; first hidden layer 1h1, 1h2, ..., 1hM; second hidden layer 2h1, 2h2, ..., 2hP; output layer o1, o2, ..., oK. Connections between layers carry weights such as w1,1, w1,2, ..., wM,N.]
Conventions
• Generally, rich connectivity between, but not within, layers
• Output for any neuron = transfer/activation function applied to the net input: f(WP + b), where
W = weight matrix (row vector) = [w1,1 w1,2 w1,3 .... w1,N]
P = input matrix (column vector) = [p1; p2; ...; pN]
WP = matrix product = [w1,1p1 + w1,2p2 + w1,3p3 + ... + w1,NpN]
b = bias/offset
(a worked sketch follows)
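A minimal sketch of this computation in Python with NumPy; the numeric values of W, P, and b are invented for illustration (the slide specifies only the notation):

import numpy as np

W = np.array([[0.5, -0.2, 0.1]])     # weight matrix, row vector [w1,1 w1,2 w1,3]
P = np.array([[1.0], [2.0], [3.0]])  # input matrix, column vector [p1; p2; p3]
b = 0.4                              # bias/offset

x = (W @ P).item() + b               # WP + b = w1,1*p1 + w1,2*p2 + w1,3*p3 + b
print(x)                             # 0.5*1 - 0.2*2 + 0.1*3 + 0.4 = 0.8

Any of the activation functions on the next slide can then be applied to x.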
Activation Functions
• Hard limit: f(x) ∈ {0, 1}. If x < 0, f(x) = 0, else 1
• Symmetric hard limit: f(x) ∈ {-1, 1}. If x < 0, f(x) = -1, else 1
• Linear: f(x) = x
• Positive linear: f(x) ∈ [0, ∞). If x < 0, f(x) = 0, else x
• Saturating linear: f(x) ∈ [0, 1]. If x < 0, f(x) = 0; if x > 1, f(x) = 1; else x
• Symmetric saturating linear: f(x) ∈ [-1, 1]. If x < -1, f(x) = -1; if x > 1, f(x) = 1; else x
• Log-sigmoid: f(x) = 1/(1 + e^(-x))
• Competitive (multiple-neuron layer; winner takes all): f(xi) = 1 for the neuron with the largest net input xi; f(xj) = 0 for every other neuron j
(implementations sketched below)
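Each of these is a one-liner in code. A sketch in Python with NumPy; the function names are my own shorthand, not from the lecture:

import numpy as np

def hard_limit(x):            return np.where(x < 0, 0, 1)
def sym_hard_limit(x):        return np.where(x < 0, -1, 1)
def linear(x):                return x
def positive_linear(x):       return np.maximum(0, x)
def saturating_linear(x):     return np.clip(x, 0, 1)
def sym_saturating_linear(x): return np.clip(x, -1, 1)
def log_sigmoid(x):           return 1 / (1 + np.exp(-x))

def competitive(x):
    # winner takes all over a 1-D array of net inputs:
    # 1 for the largest entry, 0 for all others
    out = np.zeros_like(x)
    out[np.argmax(x)] = 1
    return out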
Conventions
• Output for any layer = column matrix =
[ f(W1P + b1)
f(W2P + b2)
...
f(WMP + bM) ]
where Wi = weight matrix (row vector for neuron i) = [wi,1 wi,2 wi,3 .... wi,N]
(a worked sketch follows)
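Stacking the row vectors Wi into a single M x N matrix computes the whole layer with one matrix product. A sketch in Python with NumPy, using the hard limit as the example f; the numeric values are invented:

import numpy as np

def hard_limit(x): return np.where(x < 0, 0, 1)

W = np.array([[0.5, -0.2],    # row 1 = W1
              [0.3,  0.8]])   # row 2 = W2
P = np.array([[1.0], [2.0]])  # input column vector
B = np.array([[0.1], [-1.0]]) # bias column vector [b1; b2]

A = hard_limit(W @ P + B)     # column matrix [f(W1P + b1); f(W2P + b2)]
print(A.ravel())              # [1 1]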
Single Layer Perceptron
• Single Layer Single Neuron Perceptron
– Consider multiple inputs (column vector) with respective weights (row vector) to a neuron that serves as the output neuron
– Assume f(x) is the hard limit function
– Labeled training examples are provided {(P1,t1), (P2,t2), ..., (PZ,tZ)}, where each ti is 0 or 1
– Learning rule (NOT the same as the prediction rule; see the sketch after this list)
• Error e = target - f(x)
• For each input, set
Wcurrent = Wprevious + eP^T (P transposed to a row vector, matching the shape of W)
bcurrent = bprevious + e
• Iterate until e is zero for all training examples
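A runnable sketch of this learning rule in Python with NumPy. The logical AND function is my choice of training set (the slide names none); it is linearly separable, so the rule converges:

import numpy as np

def hard_limit(x): return 0 if x < 0 else 1

# labeled training examples {(P1,t1), ..., (PZ,tZ)}: logical AND
examples = [(np.array([0., 0.]), 0),
            (np.array([0., 1.]), 0),
            (np.array([1., 0.]), 0),
            (np.array([1., 1.]), 1)]

W = np.zeros(2)  # weights (1-D here, so no explicit transpose is needed)
b = 0.0          # bias

converged = False
while not converged:
    converged = True
    for P, t in examples:
        e = t - hard_limit(W @ P + b)  # error = target - f(WP + b)
        if e != 0:
            W = W + e * P              # Wcurrent = Wprevious + e*P^T
            b = b + e                  # bcurrent = bprevious + e
            converged = False
print(W, b)  # converges to W = [2. 1.], b = -3.0 (one valid separator)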
Single Layer Perceptron
• Single Layer Multiple Neuron Perceptron
– Consider multiple inputs (column vector) with respective weights (one row vector per neuron) to a layer of several neurons that serve as the output
– Assume f(x) is the hard limit function
– Labeled training examples are provided {(P1,t1), (P2,t2), ..., (PZ,tZ)}, where each ti is a column vector consisting of 0s and/or 1s
– Learning rule (NOT the same as the prediction rule; use vectors for the error and bias; see the sketch after this list)
• Error E = target - f(x)
• For each input, set
Wcurrent = Wprevious + EP^T (outer product of the error column vector and the transposed input)
Bcurrent = Bprevious + E
• Iterate until E is zero for all training examples
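The same rule in matrix form. A sketch in Python with NumPy on an invented two-example toy problem with one-hot target vectors (the slide does not specify a dataset):

import numpy as np

def hard_limit(x): return np.where(x < 0, 0, 1)

# each example: (input column vector Pi, target column vector ti)
examples = [(np.array([[0.], [1.]]), np.array([[1.], [0.]])),
            (np.array([[1.], [0.]]), np.array([[0.], [1.]]))]

W = np.zeros((2, 2))  # one row of weights per output neuron
B = np.zeros((2, 1))  # bias column vector

while True:
    total_error = 0
    for P, t in examples:
        E = t - hard_limit(W @ P + B)  # error column vector
        W = W + E @ P.T                # Wcurrent = Wprevious + E*P^T
        B = B + E                      # Bcurrent = Bprevious + E
        total_error += np.abs(E).sum()
    if total_error == 0:               # E was zero for all training examples
        break
print(W, B)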