Neural Networks
Ellen Walker
Hiram College
Connectionist Architectures
• Characterized by (Rich & Knight)
– Large number of very simple neuron-like processing elements
– Large number of weighted connections between these elements
– Highly parallel, distributed control
– Emphasis on automatic learning of internal representations (weights)
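Basic Connectionist Unit (Perceptron)
[Figure: inputs i_1, i_2, …, i_n from other nodes are multiplied by weights w_1, w_2, …, w_n (-1 ≤ w_i ≤ 1), summed (∑), and compared against a threshold to produce the output o, which goes to other nodes]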
Classes of Connectionist Architectures
• Constraint networks
– Positive and negative connections denote constraints between the values of nodes
– Weights set by programmer
• Layered networks
– Weights represent contribution from one intermediate value to the next
– Weights are learned using feedback
Hopfield Network
• A constraint network
• Every node is connected to every other node
– If the weight is 0, the connection doesn’t matter
• To use the network, set the values of the nodes and let the nodes adjust their values according to the weights.
• The “result” is the set of all values in the stabilized network.
Hopfield Network as CAM
• Nodes represent features of objects
• Compatible features support each other (weights > 0)
• Stable states (local minima) are “valid” interpretations
• Noise features (incompatible) will be suppressed (network will fall into nearest stable state)
Hopfield Net Example
[Figure: Hopfield network over the nodes Parallelogram, Rectangle, Circle, Ellipse, and Parallel Lines, with link weights -.5, +.7, +.7, -.7, 0, -.8, -.9, -.9, -.9, -.9]
Relaxation
• Algorithm to find stable state for Hopfield network (serial or parallel)
– Pick a node
– Compute [incoming weights]*[neighbors]
– If above sum > 0, node = 1, else node = -1
• When values aren’t changing, network is stable
• Result can depend on order of nodes chosen
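The relaxation steps above can be sketched in a few lines of Python. This is a minimal serial version, not a definitive implementation: the three-node weight matrix `W` is a made-up example (nodes 0 and 1 support each other, node 2 is incompatible with both), and the random visiting order and the `max_sweeps` cap are assumptions added for illustration.

```python
import random

def relax(weights, state, rng=None, max_sweeps=100):
    """Serial relaxation: update one node at a time until nothing changes.

    weights[i][j] is the symmetric link weight between nodes i and j;
    state is a list of +1 / -1 node values.
    """
    rng = rng or random.Random(0)
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)          # the result can depend on this order
        for i in order:
            # [incoming weights] * [neighbors]
            total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new_value = 1 if total > 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:             # a full sweep with no change: stable
            return state
    return state

# Hypothetical weights: nodes 0 and 1 are compatible, node 2 conflicts with both.
W = [
    [0.0, 0.7, -0.9],
    [0.7, 0.0, -0.9],
    [-0.9, -0.9, 0.0],
]

# Start from a "noisy" state in which node 1 is wrongly off.
stable = relax(W, [1, -1, -1])
```

For this small network the noisy start settles into the state where nodes 0 and 1 are on and node 2 is off, whichever order the nodes are visited in; larger networks generally do not have that property.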
Line Labeling and Relaxation
• Given an object, each vertex constrains the labels of its connected lines
[Figure: line drawing of an object, numbered 1 to 5, with lines labeled +, –, and > (arrow)]
Hopfield Network for Labeling
[Figure: five gray boxes, numbered 1 to 5, each containing the four label nodes +, >, –, <]
Each gray box contains 4 mutually exclusive nodes (with negative links between them)
Lines denote positive links between compatible labels
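One way to picture the construction described in the caption is as a weight matrix with one group of four label nodes per box. The sketch below is only illustrative: the function name `build_label_network`, the weight values -0.9 and +0.7, and the single compatible pair in the usage line are all hypothetical, since the actual compatible pairs come from the figure and are not recoverable here.

```python
LABELS = ["+", ">", "-", "<"]   # the four mutually exclusive labels in each box

def node_index(box, label):
    """Position of a (box, label) node in the weight matrix."""
    return box * len(LABELS) + LABELS.index(label)

def build_label_network(n_boxes, compatible, inhibit=-0.9, support=0.7):
    """Build a symmetric Hopfield weight matrix for line labeling.

    compatible is a list of ((box_a, label_a), (box_b, label_b)) pairs
    that should support each other (hypothetical input format).
    """
    n = n_boxes * len(LABELS)
    w = [[0.0] * n for _ in range(n)]
    # Negative links between the labels inside one box.
    for box in range(n_boxes):
        for a in LABELS:
            for b in LABELS:
                if a != b:
                    w[node_index(box, a)][node_index(box, b)] = inhibit
    # Positive links between compatible labels in different boxes.
    for (box_a, label_a), (box_b, label_b) in compatible:
        i, j = node_index(box_a, label_a), node_index(box_b, label_b)
        w[i][j] = w[j][i] = support
    return w

# Five boxes, with one made-up compatible pair: "+" in box 0 and "+" in box 1.
weights = build_label_network(5, [((0, "+"), (1, "+"))])
```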
Boltzmann Machine
• Alternative training method for a Hopfield network, based on simulated annealing
• Goal: to find the most stable state (rather than the nearest)
• Boltzmann rule is probabilistic, based on the “temperature” of the system
Deterministic vs. Boltzmann
• Deterministic update rule
$$p(\text{unit is on}) = \frac{1}{2} + \frac{\Delta E}{2\,|\Delta E|} \quad \text{(always 0 or 1)}$$
• Probabilistic update rule
$$p(\text{unit is on}) = \frac{1}{1 + e^{-\Delta E / T}}$$
– As temperature decreases, probabilistic rule approaches deterministic one
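A short numeric check makes the relationship between the two rules concrete. The function names below are illustrative, and treating ΔE = 0 as "off" in the deterministic rule is an assumption, since the formula is undefined there.

```python
import math

def p_on_deterministic(delta_e):
    """Deterministic rule: 1/2 + dE / (2|dE|), which is 1 for dE > 0 and 0 for dE < 0.

    dE == 0 is undefined in the formula; it is treated as 0 here (assumption).
    """
    return 1.0 if delta_e > 0 else 0.0

def p_on_boltzmann(delta_e, temperature):
    """Probabilistic rule: 1 / (1 + exp(-dE / T))."""
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))

# At high temperature the unit is on about half the time regardless of dE;
# at low temperature the probability is pushed toward 0 or 1.
for t in (10.0, 1.0, 0.1, 0.01):
    print(t, p_on_boltzmann(0.5, t), p_on_boltzmann(-0.5, t))
```

Very small temperatures combined with a large negative ΔE can overflow `math.exp`, so a practical version would clamp the exponent.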
Networks and Function Fitting
• We earlier talked about function fitting
– Finding a function that approximates a set of data so that
• Function fits the data well
• Function generalizes to fit additional data
What Can a Neuron Compute?
• n inputs (i0=1, i1…in)
• n+1 weights (w0…wn)
• 1 output:
– 1 if g(i) > 0
– 0 if g(i) < 0
– g(i) = $\sum_{j=0}^{n} w_j i_j$
• g denotes a linear surface, and the output is 1 if the point is above this surface
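As a concrete instance of the definition above, here is a minimal sketch of a single neuron. The weights used for AND are hand-picked for illustration, and returning 0 when g(i) is exactly 0 is an assumption, since that case is left open.

```python
def g(weights, inputs):
    """Weighted sum over j = 0..n, with the constant input i0 = 1 prepended."""
    full_inputs = [1.0] + list(inputs)
    return sum(w * i for w, i in zip(weights, full_inputs))

def neuron_output(weights, inputs):
    """1 if the point lies above the linear surface g, else 0."""
    return 1 if g(weights, inputs) > 0 else 0

# Hand-picked weights (w0, w1, w2) that make the neuron compute logical AND:
# the line i1 + i2 = 1.5 separates (1, 1) from the other three points.
AND_WEIGHTS = [-1.5, 1.0, 1.0]
```

With these weights, `neuron_output(AND_WEIGHTS, [1, 1])` fires and the other three input pairs do not.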
Classification by a Neuron
[Figure: two panels, “Linearly Separable” and “Not Linearly Separable”]
Training a Neuron
1. Initialize weights randomly
2. Collect all misclassified examples
3. If there are none, we’re done.
4. Else compute gradient & update weights
– Add all points that should have fired, subtract all points that should not have fired
– Add a constant (0<C<1) * gradient back to the weights.
5. Repeat steps 2-4 until done (guaranteed to converge, so the loop will end, when the classes are linearly separable)
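The training steps above translate fairly directly into code. The following is a sketch under a few assumptions: the constant C is set to 0.5, the initial weights are drawn from [-1, 1], a `max_rounds` cap guards against data that is not linearly separable, and the AND data set is just a convenient separable example.

```python
import random

def predict(weights, point):
    """Neuron output with the constant input i0 = 1 prepended."""
    total = sum(w * x for w, x in zip(weights, [1.0] + list(point)))
    return 1 if total > 0 else 0

def train_perceptron(examples, c=0.5, max_rounds=1000, seed=0):
    """examples is a list of (point, target) pairs with target 0 or 1."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    weights = [rng.uniform(-1, 1) for _ in range(n + 1)]        # step 1
    for _ in range(max_rounds):
        wrong = [(p, t) for p, t in examples
                 if predict(weights, p) != t]                   # step 2
        if not wrong:                                           # step 3
            return weights
        gradient = [0.0] * (n + 1)                              # step 4
        for point, target in wrong:
            # Add points that should have fired, subtract those that should not.
            sign = 1.0 if target == 1 else -1.0
            for j, x in enumerate([1.0] + list(point)):
                gradient[j] += sign * x
        weights = [w + c * d for w, d in zip(weights, gradient)]
    return weights                                              # cap reached

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(AND_DATA)
```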
Training Example
[Figure: training example shown in three steps, numbered 1 to 3]
Perceptron Problem
• We have a model and a training algorithm, but we can only compute linearly separable functions!
• Most interesting functions are not linearly separable.
• Solution: use more than one line (multiple perceptrons)
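XOR is the standard example of a function that one line cannot separate but two lines can. The sketch below wires three threshold units together by hand; the specific weights are illustrative choices, not learned values.

```python
def step(total):
    """Threshold unit: fires when the weighted sum is positive."""
    return 1 if total > 0 else 0

def xor_net(a, b):
    h_or = step(-0.5 + a + b)        # first line: at least one input is on
    h_and = step(-1.5 + a + b)       # second line: both inputs are on
    # Output fires when the OR unit is on and the AND unit is off.
    return step(-0.5 + h_or - h_and)

truth_table = [xor_net(a, b) for a in (0, 1) for b in (0, 1)]
```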
Multilayered Network
Layered, fully-connected (between layers), feed-forward
[Figure: network with an input layer, a hidden layer, and an output layer]
Backpropagation Training
• Compute a result:
– input->hidden->output
• Compute error for each output node, based on desired result
• Propagate errors back:
– Output->hidden, hidden->input
– Weights are adjusted using gradient
Backpropagation Training (cont’d)
• Repeat above for every example in the training set (one epoch)
• Repeat above until stopping criterion is reached
– Good enough average performance on training set
– Little enough change in network
• Hundreds of epochs…
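The forward pass, backward pass, and epoch loop can be sketched for a network with one hidden layer and a single output. Several details here are assumptions rather than anything specified above: sigmoid units, squared error, a learning rate of 0.5, weight updates after every example, and a stopping rule based only on total training error.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    # Each weight row starts with a bias weight for a constant input of 1.
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(net, x):
    """input -> hidden -> output; returns hidden activations and the output."""
    hidden, output = net
    xs = [1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(row, xs))) for row in hidden]
    o = sigmoid(sum(w * v for w, v in zip(output, [1.0] + h)))
    return h, o

def train_example(net, x, target, rate=0.5):
    hidden, output = net
    h, o = forward(net, x)
    # Error at the output node, scaled by the sigmoid slope.
    delta_o = (target - o) * o * (1 - o)
    # Propagate back to hidden nodes, using the output weights before updating.
    delta_h = [delta_o * output[k + 1] * h[k] * (1 - h[k])
               for k in range(len(h))]
    for k, v in enumerate([1.0] + h):
        output[k] += rate * delta_o * v
    xs = [1.0] + list(x)
    for k, row in enumerate(hidden):
        for j in range(len(row)):
            row[j] += rate * delta_h[k] * xs[j]

def total_error(net, examples):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in examples)

def train(net, examples, max_epochs=5000, good_enough=0.01):
    """One epoch = one pass over every training example."""
    for epoch in range(max_epochs):
        for x, t in examples:
            train_example(net, x, t)
        if total_error(net, examples) < good_enough:   # stopping criterion
            return epoch + 1
    return max_epochs

# Example data: XOR, which a single neuron cannot represent.
XOR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
```

Calling `train(make_network(2, 2), XOR_DATA)` runs the loop on XOR. Whether it reaches the error threshold depends on the random starting weights, which is one reason training can take hundreds or thousands of epochs.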
Generalization
• If the network is trained correctly, results will generalize to unseen data
• If overtrained, the network will “memorize” the training data and give essentially random outputs on other inputs
• Tricks to avoid memorization
– Limit number of hidden nodes
– Insert noise into training data
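The second trick can be as simple as jittering each input value before every epoch. This is a minimal sketch; Gaussian noise and the default `sigma` of 0.05 are assumptions, and the targets are deliberately left unchanged.

```python
import random

def add_noise(examples, sigma=0.05, seed=0):
    """Return a copy of (inputs, target) pairs with Gaussian noise on the inputs."""
    rng = random.Random(seed)
    return [([v + rng.gauss(0.0, sigma) for v in x], t) for x, t in examples]

clean = [([0.0, 1.0], 1), ([1.0, 0.0], 0)]
noisy = add_noise(clean, sigma=0.05, seed=1)
```

Using a different seed for each epoch gives the network a slightly different version of every example each time, which makes exact memorization harder.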
Unsupervised Network Learning
• Kohonen network for classification
[Figure: Kohonen network whose input nodes are the features lays eggs, warm-blooded, exoskeleton, swims, flies]
Training Kohonen Network
• Create inhibitory links among nodes of output layer (“winner take all”)
• For each item in training data:
– Determine an input vector
– Run network - find max output node
– Reinforce (increase) weights to maximum node
– Normalize weights so they sum to 1
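The training loop above can be sketched as follows. A few simplifications are assumed: the inhibitory links are replaced by directly picking the maximum output, reinforcement adds a fraction (`rate`) of the input vector to the winner's weights, and the feature order, the two animals, and the starting weights are made up for illustration.

```python
def winner(weights, x):
    """Run the network and return the index of the maximum output node."""
    outputs = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return outputs.index(max(outputs))      # stands in for "winner take all"

def train_kohonen(weights, data, rate=0.3, epochs=20):
    weights = [list(row) for row in weights]
    for _ in range(epochs):
        for x in data:
            k = winner(weights, x)
            # Reinforce the weights into the winning node.
            row = [w + rate * v for w, v in zip(weights[k], x)]
            # Normalize so the winner's weights sum to 1.
            total = sum(row)
            weights[k] = [w / total for w in row]
    return weights

# Assumed feature order: lays eggs, warm-blooded, exoskeleton, swims, flies
bird = [1, 1, 0, 0, 1]
fish = [1, 0, 0, 1, 0]

# Hypothetical starting weights for two output nodes (each row sums to 1).
start = [
    [0.2, 0.3, 0.1, 0.1, 0.3],
    [0.2, 0.1, 0.2, 0.4, 0.1],
]
trained = train_kohonen(start, [bird, fish])
```

After training, each output node's weights have moved toward one of the two input patterns, so each animal is represented by a single winning node.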
Representations in Networks
• Distributed representation
– Concept = pattern
– Examples: Hopfield, backpropagation
• Localist representation
– Concept = single node
– Example: Kohonen
• Distributed can be more robust, also more efficient