Lecture 6, CS567
Neural Networks
• Introduction
– Biological neurons
– Artificial neurons
– Concepts
– Conventions
• Single Layer Perceptron
– Example
– Limitation
Biological Neuron
• Neuron = cell superclass of the nervous system
• Specs
– Total number = ~10^11 (size of a hard disk, circa ’03)
• Maximum number is reached before birth
• ~10^4 lost/day (more if you don’t study every day!)
– Connections/neuron = ~10^4
– Signal rate = ~10^3 Hz (CPU = ~10^9 Hz, circa ’03)
– Signal propagation velocity = 10^(-1 to 2) m/sec
– Power = ~40 W
Biological Neuron
• Connectivity is important (just like in human society)
– Connected
• To what, and
• To what extent
– Basis of memory and learning (revising opinions; learning lessons in life)
– Revision is important (and why reading for the first time on the eve of an exam is a flawed strategy)
– Covering an eye to prevent loss of vision in squint (why the advertising industry persists, subliminally or blatantly)
Artificial Neural Networks
• What
– Connected units with inputs and outputs
• Why
– Can “learn” and approximate any function, including non-linear functions (e.g., XOR)
• When
– Basic idea is more than 60 years old
– Resurgence of interest once coverage extended to non-linear problems
Concepts
• Trial
– Output = Verdict = Guilty/Not guilty
– Processing neurons = Jury members
– Output neuron = Jury foreman
– Inputs = Witnesses/Lawyers
– Weights = Credibility of witnesses/lawyers
• Investment
– Output decision = Buy/Sell
– Inputs = Financial advisors
– Weights = Past reliability of advice
– Iterate = Revise weights after results
Concepts
• Types of learning
– Supervised
• NN learns from a series of labeled examples (human propagation of prejudice)
• Distinction between training and prediction phases
– Unsupervised
• NN discovers clusters and classifies examples
• Also called self-organizing networks (human tendency)
• Typically, prediction rules cannot be derived from an NN
Conventions
[Figure: fully connected feedforward network. Input layer p1, p2, p3, ..., pN; first hidden layer 1h1, 1h2, ..., 1hM; second hidden layer 2h1, 2h2, ..., 2hP; output layer o1, o2, ..., oK. Connections between layers carry weights such as w1,1, w1,2, ..., wM,N.]
Conventions
• Generally, rich connectivity between, but not within, layers
• Output for any neuron = transfer/activation function applied to the net input: f(WP + b), where
W = weight matrix (row vector) = [w1,1 w1,2 w1,3 .... w1,N]
P = input matrix (column vector) = [p1; p2; ...; pN]
WP = matrix product = [w1,1p1 + w1,2p2 + w1,3p3 + ... + w1,NpN]
b = bias/offset
(a worked sketch follows)
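A minimal sketch of this computation in Python with NumPy; the numeric values of W, P, and b are invented for illustration (the slide specifies only the notation):

import numpy as np

W = np.array([[0.5, -0.2, 0.1]])     # weight matrix, row vector [w1,1 w1,2 w1,3]
P = np.array([[1.0], [2.0], [3.0]])  # input matrix, column vector [p1; p2; p3]
b = 0.4                              # bias/offset

x = (W @ P).item() + b               # WP + b = w1,1*p1 + w1,2*p2 + w1,3*p3 + b
print(x)                             # 0.5*1 - 0.2*2 + 0.1*3 + 0.4 = 0.8

Any of the activation functions on the next slide can then be applied to x.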
Activation Functions
• Hard limit: f(x) ∈ {0, 1}. If x < 0, f(x) = 0, else 1
• Symmetric hard limit: f(x) ∈ {-1, 1}. If x < 0, f(x) = -1, else 1
• Linear: f(x) = x
• Positive linear: f(x) ∈ [0, ∞). If x < 0, f(x) = 0, else x
• Saturating linear: f(x) ∈ [0, 1]. If x < 0, f(x) = 0; if x > 1, f(x) = 1; else x
• Symmetric saturating linear: f(x) ∈ [-1, 1]. If x < -1, f(x) = -1; if x > 1, f(x) = 1; else x
• Log-sigmoid: f(x) = 1/(1 + e^(-x))
• Competitive (multiple-neuron layer; winner takes all): f(xi) = 1 for the neuron with the largest net input xi; f(xj) = 0 for every other neuron j
(implementations sketched below)
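Each of these is a one-liner in code. A sketch in Python with NumPy; the function names are my own shorthand, not from the lecture:

import numpy as np

def hard_limit(x):            return np.where(x < 0, 0, 1)
def sym_hard_limit(x):        return np.where(x < 0, -1, 1)
def linear(x):                return x
def positive_linear(x):       return np.maximum(0, x)
def saturating_linear(x):     return np.clip(x, 0, 1)
def sym_saturating_linear(x): return np.clip(x, -1, 1)
def log_sigmoid(x):           return 1 / (1 + np.exp(-x))

def competitive(x):
    # winner takes all over a 1-D array of net inputs:
    # 1 for the largest entry, 0 for all others
    out = np.zeros_like(x)
    out[np.argmax(x)] = 1
    return out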
Conventions
• Output for any layer = column matrix =
[ f(W1P + b1)
f(W2P + b2)
...
f(WMP + bM) ]
where Wi = weight matrix (row vector for neuron i) = [wi,1 wi,2 wi,3 .... wi,N]
(a worked sketch follows)
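Stacking the row vectors Wi into a single M x N matrix computes the whole layer with one matrix product. A sketch in Python with NumPy, using the hard limit as the example f; the numeric values are invented:

import numpy as np

def hard_limit(x): return np.where(x < 0, 0, 1)

W = np.array([[0.5, -0.2],    # row 1 = W1
              [0.3,  0.8]])   # row 2 = W2
P = np.array([[1.0], [2.0]])  # input column vector
B = np.array([[0.1], [-1.0]]) # bias column vector [b1; b2]

A = hard_limit(W @ P + B)     # column matrix [f(W1P + b1); f(W2P + b2)]
print(A.ravel())              # [1 1]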
Single Layer Perceptron
• Single Layer Single Neuron Perceptron
– Consider multiple inputs (column vector) with respective weights (row vector) to a neuron that serves as the output neuron
– Assume f(x) is the hard limit function
– Labeled training examples are provided {(P1,t1), (P2,t2), ..., (PZ,tZ)}, where each ti is 0 or 1
– Learning rule (NOT the same as the prediction rule; see the sketch after this list)
• Error e = target - f(x)
• For each input, set
Wcurrent = Wprevious + eP^T (P transposed to a row vector, matching the shape of W)
bcurrent = bprevious + e
• Iterate until e is zero for all training examples
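A runnable sketch of this learning rule in Python with NumPy. The logical AND function is my choice of training set (the slide names none); it is linearly separable, so the rule converges:

import numpy as np

def hard_limit(x): return 0 if x < 0 else 1

# labeled training examples {(P1,t1), ..., (PZ,tZ)}: logical AND
examples = [(np.array([0., 0.]), 0),
            (np.array([0., 1.]), 0),
            (np.array([1., 0.]), 0),
            (np.array([1., 1.]), 1)]

W = np.zeros(2)  # weights (1-D here, so no explicit transpose is needed)
b = 0.0          # bias

converged = False
while not converged:
    converged = True
    for P, t in examples:
        e = t - hard_limit(W @ P + b)  # error = target - f(WP + b)
        if e != 0:
            W = W + e * P              # Wcurrent = Wprevious + e*P^T
            b = b + e                  # bcurrent = bprevious + e
            converged = False
print(W, b)  # converges to W = [2. 1.], b = -3.0 (one valid separator)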
Single Layer Perceptron
• Single Layer Multiple Neuron Perceptron
– Consider multiple inputs (column vector) with respective weights (one row vector per neuron) to a layer of several neurons that serve as the output
– Assume f(x) is the hard limit function
– Labeled training examples are provided {(P1,t1), (P2,t2), ..., (PZ,tZ)}, where each ti is a column vector consisting of 0s and/or 1s
– Learning rule (NOT the same as the prediction rule; use vectors for the error and bias; see the sketch after this list)
• Error E = target - f(x)
• For each input, set
Wcurrent = Wprevious + EP^T (outer product of the error column vector and the transposed input)
Bcurrent = Bprevious + E
• Iterate until E is zero for all training examples
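The same rule in matrix form. A sketch in Python with NumPy on an invented two-example toy problem with one-hot target vectors (the slide does not specify a dataset):

import numpy as np

def hard_limit(x): return np.where(x < 0, 0, 1)

# each example: (input column vector Pi, target column vector ti)
examples = [(np.array([[0.], [1.]]), np.array([[1.], [0.]])),
            (np.array([[1.], [0.]]), np.array([[0.], [1.]]))]

W = np.zeros((2, 2))  # one row of weights per output neuron
B = np.zeros((2, 1))  # bias column vector

while True:
    total_error = 0
    for P, t in examples:
        E = t - hard_limit(W @ P + B)  # error column vector
        W = W + E @ P.T                # Wcurrent = Wprevious + E*P^T
        B = B + E                      # Bcurrent = Bprevious + E
        total_error += np.abs(E).sum()
    if total_error == 0:               # E was zero for all training examples
        break
print(W, B)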