74.419 Artificial Intelligence 2004
- Neural Networks -
Neural Networks (NN)
• basic processing units
• general network architectures
• learning
• qualities and problems of NNs
Neural Networks – Central Concepts
biologically inspired
– McCulloch-Pitts Neuron (automata theory), Perceptron
basic architecture
– units with activation state
– directed weighted connections between units
– "activation spreading": output used as input to connected units
basic processing in unit
– integrated input: sum of weighted outputs of connected "pre-units"
– activation of unit = function of integrated input
– output depends on input/activation state
– activation function or output function often threshold dependent, also sigmoid (differentiable for backprop!) or linear
Anatomy of a Neuron
Diagram of an Action Potential
From: Ana Adelstein, Introduction to the Nervous System, Part I
http://www.ualberta.ca/~anaa/PSYCHO377/PSYCH377Lectures/L02Psych377/
General Neural Network Model
• Network of simple processing units (neurons)
• Units connected by weighted links (labelled di-graph; connection matrix)

xj    Input to uj
aj    Activation state of uj
yj    Output of uj
wij   Weight of connection from ui to uj (often written wji)
Neuron Model as FSA
Calculate the new activation state and output based on the current activation and input.
Formalization as a Finite State Machine (observe the delay in the unit):
Input to uj:           xj(t) = Σi=1,…,n wij · yi(t)
Activation function:   aj(t+1) = F(aj(t), xj(t))
Output function:       yj(t) = f(aj(t), xj(t))
The output function is often a linear, threshold, or sigmoid function, depending only on the activation state. The activation function is often the identity function of the input, i.e. aj(t+1) = xj(t)
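The single-unit update above can be sketched in Python (a minimal illustration; the function name, the identity activation function, and the example weights are assumptions, not part of the slides):

```python
def unit_step(weights, pre_outputs, a_t, threshold=0.0):
    """One time step of a single unit uj, per the FSA formalization.

    weights:     w_ij for each connected pre-unit u_i
    pre_outputs: y_i(t) of the pre-units
    a_t:         current activation state a_j(t)
    """
    # Integrated input: x_j(t) = sum_i w_ij * y_i(t)
    x_t = sum(w * y for w, y in zip(weights, pre_outputs))
    # Activation function taken as the identity of the input: a_j(t+1) = x_j(t)
    a_next = x_t
    # Output function: threshold on the *current* activation state
    # (this is the one-step delay the slide points out)
    y_t = 1.0 if a_t >= threshold else 0.0
    return a_next, y_t

a_next, y = unit_step([0.5, -0.2], [1.0, 1.0], a_t=0.4)
```

Note that the output at time t uses a_j(t), while the new activation a_j(t+1) only becomes visible in the next step.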
NN - Activation Functions
Sigmoid Activation Function
Threshold Activation Function (Step Function)
adapted from Thomas Riga, University of Genoa, Italy
http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp
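The two activation functions shown in the figure can be sketched as follows (the derivative helper is included because the sigmoid's differentiability is exactly what backpropagation exploits; function names are illustrative):

```python
import math

def step(x, theta=0.0):
    """Threshold (step) activation: 1 if the input reaches threshold theta, else 0."""
    return 1.0 if x >= theta else 0.0

def sigmoid(x):
    """Logistic sigmoid: smooth, bounded in (0, 1), differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative g'(x) = g(x) * (1 - g(x)) -- what backprop needs."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The step function has no useful derivative at the threshold, which is why backpropagation networks use the sigmoid (or another differentiable function) instead.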
Parallelism – Competing Rules VP
[Figure: hidden units h1, h2, h3 competing over inputs V, NP, PP; connection weights 1.0, 0.5, 0.33, -0.2]

      h1    h2    h3
V     1.0   0.5   0.33   t=1
NP    0.8   1.0   0.66   t=2
PP    0.6   0.8   0.99   t=3
NN Architectures + Function
Feedforward, layered networks
– simple pattern classification, function estimating
Recurrent networks
– for space/time-variant input (e.g. natural language)
Completely connected networks
– Boltzmann Machine, Hopfield Network
– optimization; constraint satisfaction
Self-Organizing Networks
– SOMs, Kohonen networks, winner-take-all (WTA) networks
– unsupervised development of classification
– best-fitting weight vector slowly adapted to input vector
NN Architectures + Function
Feedforward networks
– layers of uni-directionally connected units
– strict forward processing from input to output units
– simple pattern classification, function estimating, decoder, control systems
Recurrent networks
– Feedforward network with internal feedback (context memory)
– processing of space/time-variant input, e.g. natural language
– e.g. Elman networks
Haykin, Simon: Neural Networks - A Comprehensive Foundation, Prentice-Hall, 1999, p. 22.
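The Elman idea — the hidden state fed back as a "context memory" — can be sketched in a few lines (a minimal illustration; the weights, sizes, and function name are made up, and the output layer is omitted):

```python
import math

def elman_step(x, context, W_in, W_ctx):
    """One forward step of an Elman-style recurrent network: each hidden
    unit sees the current input plus the previous hidden state (context)."""
    hidden = []
    for j in range(len(W_in)):
        net = sum(w * xi for w, xi in zip(W_in[j], x)) \
            + sum(w * ci for w, ci in zip(W_ctx[j], context))
        hidden.append(1.0 / (1.0 + math.exp(-net)))   # sigmoid units
    return hidden   # becomes the context for the next time step

# 2 hidden units, 2 inputs; process a short input sequence
W_in  = [[0.5, -0.5], [1.0, 0.0]]
W_ctx = [[0.1, 0.1], [0.0, 0.2]]
ctx = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):
    ctx = elman_step(x, ctx, W_in, W_ctx)
```

Because the context carries information across steps, the same input vector can produce different hidden states at different points in the sequence — which is what makes such networks suitable for time-variant input like natural language.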
Feed-forward Network
NN Architectures + Function
Completely connected networks
– all units bi-directionally connected
– positive weight = positive association between units; units support each other, are compatible
– optimization; constraint satisfaction
– Boltzmann Machine, Hopfield Network
Self-Organizing Networks
– SOMs, Kohonen networks, also winner-take-all (WTA) networks
– best-fitting weight vector slowly adapts to input vector
– unsupervised learning of classification
Neural Networks - Learning
Learning = change connection weights
Adjusting the connection weights in the network changes its input-output behaviour, making it react "properly" to input patterns.
– supervised = the network is told the "correct" answer = teaching input; e.g. backpropagation, reinforcement learning
– unsupervised = the network has to find the correct output (usually a classification of the input patterns) on its own; e.g. competitive learning, winner-take-all networks, self-organizing or Kohonen maps
Backpropagation - Schema
Backpropagation - Schematic Representation
The input is processed in a forward pass. Then the error is determined at the output units and propagated back through the network towards the input units.
adapted from Thomas Riga, University of Genoa, Italy
http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp
Backpropagation Learning
Backpropagation Learning is supervised.
The correct input-output relation is known for some pattern samples. Take some of these patterns for training: calculate the error between the produced output and the correct output; propagate the error back from the output to the input units and adjust the weights. After training, perform tests with known I/O patterns. Then use the network with unknown input patterns.
Idea behind the Backpropagation Rule (next slides):
Determine the error for the output units (compare the produced output with the 'teaching input' = correct or wanted output). Adjust the weights based on the error, the activation state, and the current weights. Determine the error for internal units based on the derivative of the activation function. Adjust the weights for internal units with an adapted delta-rule.
NN-Learning as Optimization
Learning: adjust the network in order to adapt its input-output behaviour so that it reacts "properly" to input patterns.
Learning as an optimization process: find the parameter setting for the network (in particular the weights) which produces the best-fitting behaviour (input-output relation):
– minimize the error in I/O behaviour
– optimize the weight setting w.r.t. an error function
– find the minimum in the error surface over different weight settings
Backpropagation implements a gradient descent search for the correct weight setting (the method is not optimal).
Statistical models (which include a stochastic parameter) allow for "jumps" out of local minima (cf. the Hopfield Neuron with probabilistic activation function, Thermodynamic Models with a temperature parameter, Simulated Annealing).
Genetic Algorithms can also be used to determine the parameter setting of a Neural Network.
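The "jump out of local minima" idea behind the stochastic models can be sketched with a Metropolis-style acceptance step, as used in Simulated Annealing (a generic illustration under assumed names, not the exact rule of any of the models listed above):

```python
import math
import random

def accept(delta_error, temperature, rng=random.random):
    """Decide whether to accept a candidate weight change.

    Always accept if the error goes down; otherwise accept with
    probability exp(-delta_error / T). High T allows many uphill
    "jumps" (escaping local minima); as T -> 0 the rule approaches
    plain downhill search.
    """
    if delta_error <= 0:
        return True
    return rng() < math.exp(-delta_error / temperature)
```

Slowly lowering the temperature over the course of learning (the annealing schedule) makes early search exploratory and late search greedy.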
Backpropagation - Delta Rule
The error is calculated as erri = (ti − yi) where
– ti is the teaching input (the correct or wanted output)
– yi is the produced output
Note: In the textbook it is called (Ti − Oi).
Backpropagation- or delta-rule:
wj,i ← wj,i + η · aj · δi
where η is a constant, the learning rate, aj is the activation of uj, and δi is the backpropagated error.
δi = erri · g'(xi)          for units in the output layer
δj = g'(xj) · Σi wj,i · δi   for internal (hidden) units
where g' is the derivative of the activation function g.
Then wk,j ← wk,j + η · xk · δj
Backpropagation as Error Minimization
Find the minimum of the error function
E = 1/2 · Σi (ti − yi)²
Transform the above formula by integrating the weights (substitute the output term yi with g(Σj wj,i · aj), the activation function applied to the sum of weighted outputs of the pre-neurons):
E(W) = 1/2 · Σi (ti − g(Σj wj,i · aj))²
where W is the complete weight matrix for the net.
Determine the derivative of the error function (the gradient) w.r.t. a single weight wk,j:
∂E / ∂wk,j = −xk · δj
To minimize the error, move against the gradient (+xk · δj).
This yields the Backpropagation- or delta-rule:
wk,j ← wk,j + η · xk · δj
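One full delta-rule update for a tiny one-hidden-layer network with sigmoid units can be sketched as follows (a minimal illustration; the sizes, initial weights, and learning rate are made up, and g'(x) = y(1 − y) uses the sigmoid derivative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, t, W_hid, W_out, eta=0.5):
    """One backprop update: forward pass, output deltas, hidden deltas,
    then w <- w + eta * input * delta (weights updated in place)."""
    # forward pass
    net_h = [sum(w * xi for w, xi in zip(W_hid[j], x)) for j in range(len(W_hid))]
    a_h = [sigmoid(n) for n in net_h]
    net_o = [sum(w * aj for w, aj in zip(W_out[i], a_h)) for i in range(len(W_out))]
    y = [sigmoid(n) for n in net_o]
    # output layer: delta_i = err_i * g'(x_i), with g' = y * (1 - y)
    d_out = [(t[i] - y[i]) * y[i] * (1.0 - y[i]) for i in range(len(y))]
    # hidden layer: delta_j = g'(x_j) * sum_i w_ji * delta_i
    d_hid = [a_h[j] * (1.0 - a_h[j]) *
             sum(W_out[i][j] * d_out[i] for i in range(len(d_out)))
             for j in range(len(a_h))]
    # weight updates: w <- w + eta * (input to the weight) * delta
    for i in range(len(W_out)):
        for j in range(len(a_h)):
            W_out[i][j] += eta * a_h[j] * d_out[i]
    for j in range(len(W_hid)):
        for k in range(len(x)):
            W_hid[j][k] += eta * x[k] * d_hid[j]
    return y

W_hid = [[0.1, 0.2], [-0.1, 0.3]]   # 2 hidden units, 2 inputs
W_out = [[0.2, -0.3]]               # 1 output unit
y_before = backprop_step([1.0, 0.0], [1.0], W_hid, W_out)
y_after  = backprop_step([1.0, 0.0], [1.0], W_hid, W_out)
```

Each update moves the weights a small step against the error gradient, so repeating the step on the same pattern shrinks the error.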
Implementation of Backprop-Learning
• Choose a description of input and output patterns which is suitable for the task.
• Determine test set and training set (disjoint sets).
• Do – in general thousands of – training runs (with various patterns) until the parameters of the NN converge.
• The training goes several times through the different pattern classes (outputs), either one class at a time or one pattern from each class at a time.
• Measure the performance of the network for the test data (determine the error – wrong vs. right reactions of the NN).
• Re-train if necessary.
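The steps above can be sketched as a generic training loop (the function names, the convergence test, and the stand-in network are all illustrative assumptions; `step` represents one run of the net on a pattern, updating the weights only when `learn=True`):

```python
def train(step, training_set, test_set, max_epochs=1000, target_error=0.01):
    """Generic backprop-style training loop.

    step(x, t, learn): runs the net on input pattern x with teaching
    input t, adjusts weights when learn=True, returns the pattern error.
    """
    for epoch in range(max_epochs):
        # one pass over all training patterns (one "training run" per pattern)
        errs = [step(x, t, learn=True) for x, t in training_set]
        if sum(errs) / len(errs) < target_error:
            break                      # parameters have converged
    # measure performance on the disjoint test set (no weight changes)
    return sum(step(x, t, learn=False) for x, t in test_set) / len(test_set)

# stand-in for a real network: its error halves on every training update
state = {"err": 1.0}
def fake_step(x, t, learn):
    if learn:
        state["err"] *= 0.5
    return state["err"]

test_error = train(fake_step, [((0,), (0,))], [((0,), (0,))])
```

Keeping the test set disjoint from the training set is what makes the final error measurement an honest estimate of how the net reacts to unknown input patterns.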
Competitive Learning 1
Competitive Learning is unsupervised.
Discovers classes in the set of input patterns.
Classes are determined by similarity of inputs.
Determines (output) unit which responds to all sample inputs of the same class.
Unit reacts to patterns which are similar and thus represents this class.
Different classes are represented by different units. The system can thus - after learning - be used for classification.
Competitive Learning 2
Units specialize to recognize pattern classes
The unit which responds most strongly (among all units) to the current input moves its weight vector towards the input vector (using e.g. Euclidean distance):
– reduce the weights on inactive lines, raise the weights on active lines
– all other units keep or reduce their weights (often a Gaussian curve is used to determine which units change their weights and how)
Winning units (their weight vectors) represent a prototype of the class they recognize.
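A minimal sketch of the winner-take-all update (Euclidean distance; the learning rate, function name, and example vectors are assumptions — the Gaussian neighbourhood for the non-winning units is omitted):

```python
import math

def wta_update(weight_vectors, x, eta=0.1):
    """One competitive-learning step: the unit whose weight vector is
    closest to the input wins, and only its weights move towards x."""
    def dist(w):
        return math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x)))
    winner = min(range(len(weight_vectors)), key=lambda j: dist(weight_vectors[j]))
    w = weight_vectors[winner]
    for k in range(len(w)):
        w[k] += eta * (x[k] - w[k])   # move the winner towards the input
    return winner

# two units; an input near the second unit's weight vector
W = [[0.0, 0.0], [1.0, 1.0]]
win = wta_update(W, [0.9, 0.8])
```

Over many inputs, each winning weight vector drifts towards the centre of the inputs it keeps winning, i.e. it becomes a prototype of its class.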
from Haykin, Simon: Neural Networks, Prentice-Hall, 1999, p. 60
Competitive Learning - Figure
Example: NetTalk (from 319)
• Terry Sejnowski of Johns Hopkins developed a system that can pronounce words of text
• The system consists of a backpropagation network with 203 input units (29 text characters, 7 characters at a time), 80 hidden units, and 26 output units
– The system was developed over a year
• The DEC-talk system consists of hand-coded linguistic rules for speech pronunciation
– developed over approximately 10 years
• DEC-talk outperforms NETtalk but DEC-talk required significantly more development time
NetTalk (from 319)
• "This exemplifies the utility of neural networks; they are easy to construct and can be used even when a problem is not fully understood. However, rule-based algorithms usually out-perform neural networks when enough understanding is available”
» Hertz, Krogh & Palmer, Introduction to the Theory of Neural Computation, p. 133
NETtalk - General
• Feedforward network architecture
• NETtalk used text as input
• Text was moved over the input units ("window"): split the text into fixed-length inputs with some overlap between adjacent text windows
• Output represents controls for a Speech Generator
• Training through backpropagation
• Training patterns from human-made phonetic transcripts
NETtalk - Processing Unit
NETtalk - Network Architecture
NETtalk - Some Articulatory Features (Output)
NN - Caveats 1
Often 3 layers are necessary
– Perceptron; Minsky & Papert's analysis: only linearly separable pattern classes
Position dependence
– visual pattern recognition can depend on the position of the pattern in the input layer / matrix
– introduce feature vectors (a pre-analysis yields features of the patterns; the features are input to the NN)
Time- and space-invariance
– patterns may be stretched / squeezed in the space / time dimension (visual objects, speech)
NN - Caveats 2
Recursive structures and functions
– not directly representable due to the fixed architecture (fixed size)
– move a window of input units over the input (which is larger than the input window)
– store information in hidden units ("context memory") and feed it back into the input layer
– use a hybrid model
Variable binding and value assignment
– simulation possible through simultaneously active, synchronized units (cf. Lokendra Shastri)
Additional References
Haykin, Simon: Neural Networks – A Comprehensive Foundation, Prentice-Hall, 1999.
Rumelhart, McClelland & The PDP Research Group: Parallel Distributed Processing. Explorations into the Microstructures of Cognition, The MIT Press, 1986.
Neural Networks Web Pages
The neuroinformatics Site (incl. Software etc.)
http://www.neuroinf.org/
Neural Networks incl. Software Repository at CBIIS (Connectionist-Based Intelligent Information Systems), University of Otago, New Zealand
http://divcom.otago.ac.nz/infosci/kel/CBIIS.html
Kohonen Feature Map - Demo
http://rfhs8012.fh-regensburg.de/~saj39122/begrolu/ kohonen.html
Neurophysiology / Neurobiology Web Pages
Animated diagram of an Action Potential (Neuroscience for Kids – featuring the giant axon of the squid)
http://faculty.washington.edu/chudler/ap.html
Adult explanation of processes involved in information transmission on the cell level (with diagrams but no animation)
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/E/ExcitableCells.html
Similar to the above but with animation and partially in Spanish
http://www.epub.org.br/cm/n10/fundamentos/pot2_i.htm
Neurophysiology / Neurobiology Web Pages
Kandel's Nobel Lecture "Molecular Biology of Memory Storage: A Dialogue Between Genes and Synapses," December 8, 2000
http://www.nobel.se/medicine/laureates/2000/kandel-lecture.html
The Molecular Sciences Institute, Berkeley http://www.molsci.org/Dispatch
The Salk Institute for Biological Studies, San Diego
http://www.salk.edu/