
Neural Networks

Ellen Walker

Hiram College

Connectionist Architectures

• Characterized by (Rich & Knight)
– Large number of very simple neuron-like processing elements
– Large number of weighted connections between these elements
– Highly parallel, distributed control
– Emphasis on automatic learning of internal representations (weights)

Basic Connectionist Unit (Perceptron)

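[Figure: perceptron unit. Inputs i_1, i_2, …, i_n from other nodes are scaled by weights w_1, w_2, …, w_n (-1 ≤ w_i ≤ 1), summed (∑), and compared with a threshold to produce the output o, which is sent to other nodes.]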

Classes of Connectionist Architectures

• Constraint networks
– Positive and negative connections denote constraints between the values of nodes
– Weights set by programmer

• Layered networks
– Weights represent contribution from one intermediate value to the next
– Weights are learned using feedback

Hopfield Network

• A constraint network
• Every node is connected to every other node
– If the weight is 0, the connection doesn’t matter

• To use the network, set the values of the nodes and let the nodes adjust their values according to the weights.

• The “result” is the set of all values in the stabilized network.

Hopfield Network as CAM (Content-Addressable Memory)

• Nodes represent features of objects
• Compatible features support each other (weights > 0)
• Stable states (local minima) are “valid” interpretations
• Noise features (incompatible) will be suppressed (network will fall into nearest stable state)

Hopfield Net Example

[Figure: Hopfield net example. Nodes: Parallelogram, Rectangle, Circle, Ellipse, Parallel Lines. Edge weights shown: -.5, +.7, +.7, -.7, 0, -.8, -.9, -.9, -.9, -.9.]

Relaxation

• Algorithm to find stable state for Hopfield network (serial or parallel)
– Pick a node
– Compute the weighted sum [incoming weights]*[neighbors]
– If the sum > 0, node = 1, else node = -1

• When values aren’t changing, network is stable

• Result can depend on order of nodes chosen
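The relaxation steps above can be sketched in a few lines of Python. This is a minimal serial version under stated assumptions: the 3-node weight matrix `W`, the function name `relax`, and the `max_sweeps` safety limit are hypothetical and do not come from the slides. Nodes are visited in a shuffled order, since the result can depend on the order chosen.

```python
import random

def relax(weights, state, rng=None, max_sweeps=100):
    """Serial relaxation: update one node at a time until no value changes."""
    rng = rng or random.Random(0)
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):          # safety limit (an assumption, not in the slides)
        changed = False
        order = list(range(n))
        rng.shuffle(order)               # pick nodes in some order
        for i in order:
            # [incoming weights] * [neighbors]
            total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new_value = 1 if total > 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:                  # values are not changing: network is stable
            break
    return state

# Hypothetical network: nodes 0 and 1 support each other, both conflict with node 2.
W = [[0.0, 0.7, -0.9],
     [0.7, 0.0, -0.9],
     [-0.9, -0.9, 0.0]]
result = relax(W, [1, -1, -1])
```

For this particular matrix and start state, every update order leads to the same stable state, with nodes 0 and 1 on and node 2 off.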

Line Labeling and Relaxation

• Given an object, each vertex constrains the labels of its connected lines

[Figure: object whose lines are numbered 1 to 5 and marked with the line labels +, –, and > (arrow).]

Hopfield Network for Labeling

[Figure: Hopfield network for labeling. Five gray boxes, one per line (1 to 5), each containing the four label nodes +, >, –, <.]

Each gray box contains 4 mutually exclusive nodes (with negative links between them)

Lines denote positive links between compatible labels

Boltzmann Machine

• Alternative training method for a Hopfield network, based on simulated annealing

• Goal: to find the most stable state (rather than the nearest)

• Boltzmann rule is probabilistic, based on the “temperature” of the system

Deterministic vs. Boltzmann

• Deterministic update rule

$$p(\text{unit is on}) = \frac{1}{2} + \frac{\Delta E}{2\,|\Delta E|} \quad \text{(always 0 or 1)}$$

• Probabilistic update rule

$$p(\text{unit is on}) = \frac{1}{1 + e^{-\Delta E / T}}$$

– As temperature decreases, probabilistic rule approaches deterministic one
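The two update rules can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and treating ΔE = 0 as "off" in the deterministic rule is an assumption (it matches the relaxation rule, where a node turns on only when the sum is greater than 0).

```python
import math
import random

def p_on_deterministic(delta_e):
    """Deterministic rule: probability is always 0 or 1 (delta_e == 0 treated as off)."""
    return 1.0 if delta_e > 0 else 0.0

def p_on_boltzmann(delta_e, temperature):
    """Probabilistic rule: 1 / (1 + e^(-delta_e / T))."""
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))

def boltzmann_update(delta_e, temperature, rng):
    """Sample the new unit value (1 = on, 0 = off) at the given temperature."""
    return 1 if rng.random() < p_on_boltzmann(delta_e, temperature) else 0

# As the temperature drops, the probability for delta_e = 1 moves toward 1.
for t in (100.0, 1.0, 0.01):
    print(t, p_on_boltzmann(1.0, t))
```

At a high temperature the unit is on roughly half the time regardless of ΔE; at a low temperature the probability is close to the deterministic 0 or 1. Very small temperatures combined with a negative ΔE can overflow `math.exp`, so a practical version would clamp the exponent.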

Networks and Function Fitting

• We earlier talked about function fitting
– Finding a function that approximates a set of data so that
• Function fits the data well
• Function generalizes to fit additional data

What Can a Neuron Compute?

• n inputs (i_1 … i_n), plus a constant input i_0 = 1
• n+1 weights (w_0 … w_n)
• 1 output:
– 1 if g(i) > 0
– 0 if g(i) < 0
– $g(i) = \sum_{j=0}^{n} w_j i_j$
• g denotes a linear surface, and the output is 1 if the point is above this surface
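As a small worked example, the weighted sum g and the thresholded output can be written directly. The weights chosen here, which make the neuron compute logical AND, are a hypothetical illustration, and returning 0 when g is exactly 0 is an assumption, since the slide leaves that case undefined.

```python
def g(weights, inputs):
    """Weighted sum over j = 0..n, with the constant input i_0 = 1 prepended."""
    extended = [1.0] + list(inputs)
    return sum(w * i for w, i in zip(weights, extended))

def neuron_output(weights, inputs):
    """1 if the point lies above the linear surface g = 0, else 0."""
    return 1 if g(weights, inputs) > 0 else 0

# Hypothetical weights (w_0, w_1, w_2): the line i_1 + i_2 = 1.5 separates (1, 1) from the rest.
and_weights = [-1.5, 1.0, 1.0]
truth_table = [neuron_output(and_weights, (a, b)) for a in (0, 1) for b in (0, 1)]
print(truth_table)  # [0, 0, 0, 1]
```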

Classification by a Neuron

[Figure: two panels, “Linearly Separable” and “Not Linearly Separable”.]

Training a Neuron

1. Initialize weights randomly
2. Collect all misclassified examples
3. If there are none, we’re done.
4. Else compute gradient & update weights
– Add all points that should have fired, subtract all points that should not have fired
– Add a constant (0<C<1) * gradient back to the weights.
5. Repeat steps 2-4 until done (guaranteed to converge, so the loop will end, when the examples are linearly separable)
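The training steps above might look like this in Python. It is a sketch under stated assumptions: the names `predict` and `train`, the OR training set, the seed, and the `max_iterations` cap are all hypothetical additions; the cap only guards against data that is not linearly separable.

```python
import random

def predict(weights, point):
    """Threshold unit with a constant input of 1 for the bias weight."""
    return 1 if sum(w * x for w, x in zip(weights, [1.0] + list(point))) > 0 else 0

def train(examples, c=0.5, seed=0, max_iterations=1000):
    rng = random.Random(seed)
    n = len(examples[0][0])
    weights = [rng.uniform(-1, 1) for _ in range(n + 1)]       # step 1: random weights
    for _ in range(max_iterations):
        misclassified = [(p, t) for p, t in examples
                         if predict(weights, p) != t]           # step 2
        if not misclassified:                                   # step 3: done
            return weights
        gradient = [0.0] * (n + 1)                              # step 4
        for point, target in misclassified:
            sign = 1.0 if target == 1 else -1.0                 # add should-fire, subtract should-not-fire
            for j, x in enumerate([1.0] + list(point)):
                gradient[j] += sign * x
        weights = [w + c * grad for w, grad in zip(weights, gradient)]
    return weights                                              # cap reached without converging

# Hypothetical training set: logical OR, which is linearly separable.
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
trained = train(or_examples)
```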

Training Example

[Figure: training example, steps 1 to 3.]

Perceptron Problem

• We have a model and a training algorithm, but we can only compute linearly separable functions!

• Most interesting functions are not linearly separable.

• Solution: use more than one line (multiple perceptrons)
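To illustrate "more than one line", XOR (a standard example of a function that is not linearly separable) can be computed by combining three threshold units. The weights below are hand-picked, hypothetical values rather than anything from the slides: one hidden unit computes OR, another computes AND, and the output unit fires when OR is on and AND is off.

```python
def unit(weights, inputs):
    """Threshold unit: weights[0] is the bias weight on a constant input of 1."""
    total = sum(w * x for w, x in zip(weights, [1.0] + list(inputs)))
    return 1 if total > 0 else 0

def xor(a, b):
    h_or = unit([-0.5, 1.0, 1.0], (a, b))          # first line: a + b > 0.5
    h_and = unit([-1.5, 1.0, 1.0], (a, b))         # second line: a + b > 1.5
    return unit([-0.5, 1.0, -1.0], (h_or, h_and))  # on when OR is on and AND is off

xor_table = [xor(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_table)  # [0, 1, 1, 0]
```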

Multilayered Network

Layered, fully-connected (between layers), feed-forward

[Figure: network with input, hidden, and output layers.]

Backpropagation Training

• Compute a result:
– input->hidden->output

• Compute error for each output node, based on desired result

• Propagate errors back:
– Output->hidden, hidden->input
– Weights are adjusted using gradient
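A compact sketch of one backpropagation step for a network with a single hidden layer follows. The sigmoid activation, the squared-error measure, the learning rate, and the class name `TinyNetwork` are assumptions made for illustration; the slides do not specify them.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNetwork:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds a bias weight followed by one weight per incoming value.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                         for _ in range(n_out)]

    def forward(self, inputs):
        """Compute a result: input -> hidden -> output."""
        x = [1.0] + list(inputs)
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in self.w_hidden]
        h = [1.0] + hidden
        output = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in self.w_output]
        return hidden, output

    def train_step(self, inputs, targets, rate=0.5):
        hidden, output = self.forward(inputs)
        # Error term for each output node, based on the desired result.
        delta_out = [(t - o) * o * (1 - o) for t, o in zip(targets, output)]
        # Propagate errors back to the hidden nodes (index j + 1 skips the bias weight).
        delta_hidden = [h * (1 - h) * sum(d * row[j + 1]
                                          for d, row in zip(delta_out, self.w_output))
                        for j, h in enumerate(hidden)]
        x = [1.0] + list(inputs)
        hb = [1.0] + hidden
        # Adjust weights along the gradient.
        for row, d in zip(self.w_output, delta_out):
            for j in range(len(row)):
                row[j] += rate * d * hb[j]
        for row, d in zip(self.w_hidden, delta_hidden):
            for j in range(len(row)):
                row[j] += rate * d * x[j]
        # Squared error measured before this update.
        return sum((t - o) ** 2 for t, o in zip(targets, output))

net = TinyNetwork(2, 2, 1)
errors = [net.train_step([1.0, 0.0], [1.0]) for _ in range(200)]
```

Repeating the step on one example only shows that the error shrinks; a real run would loop over every example in the training set.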

Backpropagation Training (cont’d)

• Repeat above for every example in the training set (one epoch)

• Repeat above until stopping criterion is reached
– Good enough average performance on training set
– Little enough change in network

• Hundreds of epochs…

Generalization

• If the network is trained correctly, results will generalize to unseen data

• If overtrained, the network will “memorize” the training data and give essentially random outputs on other inputs

• Tricks to avoid memorization
– Limit number of hidden nodes
– Insert noise into training data

Unsupervised Network Learning

• Kohonen network for classification

[Figure: Kohonen network with input features: lays eggs, warm-blooded, exoskeleton, swims, flies.]

Training Kohonen Network

• Create inhibitory links among nodes of output layer (“winner take all”)

• For each item in training data:
– Determine an input vector
– Run network - find max output node
– Reinforce (increase) weights to maximum node
– Normalize weights so they sum to 1
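The training loop above can be sketched as follows. Several things here are assumptions: the inhibitory "winner take all" links are replaced by an explicit maximum over the outputs, the reinforcement adds `rate` times the input to the winning node's weights, and the starting weights and the two feature vectors (ordered as lays eggs, warm-blooded, exoskeleton, swims, flies) are hypothetical.

```python
def winner(weights, vector):
    """Run the network and return the index of the maximum output node."""
    outputs = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return outputs.index(max(outputs))

def train_kohonen(weights, data, rate=0.2, passes=3):
    for _ in range(passes):
        for vector in data:
            k = winner(weights, vector)
            # Reinforce (increase) the weights into the winning node.
            row = [w + rate * x for w, x in zip(weights[k], vector)]
            # Normalize so the winner's weights sum to 1.
            total = sum(row)
            weights[k] = [w / total for w in row]
    return weights

# Hypothetical inputs: [lays eggs, warm-blooded, exoskeleton, swims, flies]
bird_like = [1, 1, 0, 0, 1]
insect_like = [1, 0, 1, 0, 1]

# Hypothetical starting weights for two output nodes; each row sums to 1.
weights = [[0.1, 0.4, 0.1, 0.2, 0.2],
           [0.1, 0.1, 0.4, 0.2, 0.2]]
weights = train_kohonen(weights, [bird_like, insect_like])
```

After training, each input is claimed by a different output node, which is the sense in which the network classifies without supervision.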

Representations in Networks

• Distributed representation
– Concept = pattern
– Examples: Hopfield, backpropagation

• Localist representation
– Concept = single node
– Example: Kohonen

• Distributed can be more robust, also more efficient
