Introduction to Artificial Neural Networks · 2013-03-06
TRANSCRIPT
Introduction To Artificial Neural Networks
Machine Learning
circle square circle square …
“group these into two categories”
Supervised
Unsupervised
Supervised Machine Learning
[Figure: a scatter plot of 20 examples, built up over several slides as the classifier labels points positive or negative]
Accuracy 15/20 = 0.75
Precision 7/12 = 0.58
Recall 7/7 = 1.0
F1 (2PR/(P+R)) = 0.73
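These numbers pin down the whole confusion matrix: 12 predicted positives of which 7 are correct (so 5 false positives), all 7 actual positives found (0 false negatives), and 15 correct overall (so 8 true negatives). A minimal sketch in Python reproducing the slide's values:

```python
# Counts implied by the slide: 7 true positives, 5 false positives,
# 0 false negatives, 8 true negatives (20 examples, 15 correct).
tp, fp, fn, tn = 7, 5, 0, 8

accuracy = (tp + tn) / (tp + fp + fn + tn)          # 15/20 = 0.75
precision = tp / (tp + fp)                          # 7/12 ≈ 0.58
recall = tp / (tp + fn)                             # 7/7 = 1.0
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.73

print(accuracy, precision, recall, f1)
```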
293871947009 × √52.86301 / 80.2341 = ?
293871947009 × √52.86301 / 80.2341 = 26630240520.936812470902167425359
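The contrast the slides are drawing: arithmetic like this is trivial for a machine and hopeless by hand. A one-line check in Python (floating point agrees with the slide's digits only up to machine precision):

```python
import math

# The slide's arithmetic, reproduced to float precision.
print(293871947009 * math.sqrt(52.86301) / 80.2341)  # ≈ 26630240520.94
```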
Neural Networks
In order to combine the powers of the machine and the human brain, Neural Networks try to mimic the structure and function of our nervous system.
Biological Motivation #1
[Figure: a biological neuron with dendrites, an axon, and synapses; in the network analogy, neurons become nodes and synapses become weights, which can be excitatory (+) or inhibitory (−)]
Biological Motivation #2
w is the strength of the signal sent between A and B.
If A is stimulated sufficiently and w is positive, then A stimulates B.
If A is stimulated sufficiently and w is negative, then A inhibits B.
If A isn’t stimulated sufficiently, nothing happens.
How much a node must be stimulated is determined by its threshold.
[Figure: Node A connected to Node B by an edge with weight w]
Neural Networks
[Figure: a single node (neuron) with four inputs x1…x4 arriving on edges (interconnections) with weights w1…w4, a threshold T, and an output y]
A Single Perceptron
[Figure: the same node with inputs x1…x4, weights w1…w4, threshold T, and output y]
If w1x1 + w2x2 + … + wnxn ≥ T, then the output of the node is 1.
Otherwise, the output of the node is 0.
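This threshold rule translates directly into code; a minimal sketch, where the helper name perceptron is ours rather than the lecture's:

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Example: four inputs with arbitrary weights and threshold.
print(perceptron((1, 0, 1, 1), (0.5, 0.5, 0.5, 0.5), 1.0))  # 1.5 >= 1.0 -> 1
```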
Perceptron
Created in the 1960s
A neural “network” of a single neuron
Trainable: its threshold and input weights can be modified or learned
If the neuron doesn’t give the desired output, then it has made a mistake
Input weights and threshold can be changed according to a learning algorithm
x1 x2 y = x1 and x2
0 0 0
0 1 0
1 0 0
1 1 1
x1 = “I did my homework.”
x2 = “I’m well rested.”
y = “I will go to class.”
1 means True; 0 means False.
AND
[Figure: perceptron with inputs x1, x2, weights w1 = ?, w2 = ?, threshold T = ?, and output y]
Solution: w1 = 1, w2 = 1, T = 2
Inputs are either 0 or 1
Output is 1 only if all inputs are 1
AND
[Figure: perceptron with inputs x1…x4, weights w1 = ? … w4 = ?, threshold T = ?, and output y]
Solution: w1 = w2 = w3 = w4 = 1, T = 4
Inputs are either 0 or 1
Output is 1 only if all inputs are 1
x1 x2 y = x1 or x2
0 0 0
0 1 1
1 0 1
1 1 1
x1 = “I did my homework.”
x2 = “I’m well rested.”
y = “I will go to class.”
1 means True; 0 means False.
OR
[Figure: perceptron with inputs x1, x2, weights w1 = ?, w2 = ?, threshold T = ?, and output y]
Solution: w1 = 1, w2 = 1, T = 1
Inputs are either 0 or 1
Output is 1 if at least one input is 1
OR
[Figure: perceptron with inputs x1…x4, weights w1 = ? … w4 = ?, threshold T = ?, and output y]
Solution: w1 = w2 = w3 = w4 = 1, T = 1
Inputs are either 0 or 1
Output is 1 if at least one input is 1
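A quick sanity check of both constructions over all 16 four-bit inputs, reusing the threshold rule from earlier (a sketch, not lecture code):

```python
def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# All 16 possible 4-bit inputs.
inputs4 = [(a, b, c, d) for a in (0, 1) for b in (0, 1)
                        for c in (0, 1) for d in (0, 1)]

# AND: all weights 1, threshold n = 4. OR: all weights 1, threshold 1.
assert all(perceptron(x, (1, 1, 1, 1), 4) == int(all(x)) for x in inputs4)
assert all(perceptron(x, (1, 1, 1, 1), 1) == int(any(x)) for x in inputs4)
print("4-input AND and OR verified")
```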
x1 x2 y = x1 xor x2
0 0 0
0 1 1
1 0 1
1 1 0
x1 = “I did my homework.”
x2 = “I’m well rested.”
y = “I will go to class.”
1 means True; 0 means False.
XOR
[Figure: perceptron with inputs x1, x2, weights w1 = ?, w2 = ?, threshold T = ?, and output y]
Try T = 0.5, w1 = 1, w2 = 1
Inputs are either 0 or 1
If both inputs are 0, output is 0. If one input is 0 and one is 1, output is 1.
But if both inputs are 1, output is 1, while XOR requires 0. No setting of two weights and a threshold gets all four cases right: a single perceptron cannot compute XOR.
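A brute-force search makes the failure concrete: no combination of two weights and a threshold on a coarse grid reproduces XOR. (The grid is only evidence; the real argument is the linear-separability one on the next slides.)

```python
def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every combination of weights and threshold on a coarse grid.
grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
found = any(
    all(perceptron(x, (w1, w2), t) == y for x, y in XOR.items())
    for w1 in grid for w2 in grid for t in grid
)
print(found)  # False: nothing on the grid computes XOR
```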
Linearly Separable

x1 x2 x1 and x2
0 0 0
0 1 0
1 0 0
1 1 1

[Figure: the four AND points plotted in the (x1, x2) plane; a single straight line separates the 1 from the 0s]

x1 x2 x1 or x2
0 0 0
0 1 1
1 0 1
1 1 1

[Figure: the four OR points; again one straight line separates the 1s from the 0]

x1 x2 x1 xor x2
0 0 0
0 1 1
1 0 1
1 1 0

[Figure: the four XOR points; no single straight line can separate the 1s from the 0s, so XOR is not linearly separable]
History of Neural Networks
McCulloch and Pitts (1943) – introduced a model of artificial neurons and suggested they could learn.
Hebb (1949) – simple updating rule for learning.
Rosenblatt (1962) – the perceptron model.
Minsky and Papert (1969) – wrote Perceptrons.
Bryson and Ho (1969, but largely ignored until the 1980s) – invented back-propagation learning for multilayer networks.
Perceptrons
1969 book by Marvin Minsky and Seymour Papert
The problem: perceptrons only work for classification problems that are linearly separable
Insufficiently expressive
Called multilayer networks an “important research problem” to investigate, although they were pessimistic about their value
XOR
[Figure: a two-layer network. Inputs x1 and x2 feed two hidden units, each with threshold T = 1: one with weights 1 (from x1) and −1 (from x2), the other with weights −1 (from x1) and 1 (from x2). Both hidden units connect to an output unit with weights 1 and 1 and threshold T = 1.]

x1 x2 x1 xor x2
0 0 0
0 1 1
1 0 1
1 1 0

Output = x1 xor x2
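A sketch verifying the wiring above: the first hidden unit fires only for (x1, x2) = (1, 0), the second only for (0, 1), and the output unit ORs them.

```python
def unit(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def xor_net(x1, x2):
    h1 = unit((x1, x2), (1, -1), 1)   # fires only for (1, 0)
    h2 = unit((x1, x2), (-1, 1), 1)   # fires only for (0, 1)
    return unit((h1, h2), (1, 1), 1)  # ORs the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # matches x1 xor x2
```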
Training/Learning
Train a perceptron to respond to certain inputs with certain desired outputs
After training, the perceptron should give reasonable outputs for any input
If it wasn’t trained for that input, it should try to find the best possible output depending on how it was trained
Perceptron Training Rule
Begin with random weights
Apply the perceptron to each training example (each pass through the examples is called an epoch)
If it misclassifies an example, modify the weights
Continue until the perceptron classifies all training examples correctly
Modifying the Weights
wi ← wi + Δwi
Δwi = LearningRate × (DesiredOutput − ActualOutput) × xi

LearningRate: usually set to some small value like 0.1. It moderates the degree to which the weights are changed at each step and keeps the updates from overshooting.
(DesiredOutput − ActualOutput): the difference between what we wanted the output to be and what it actually was. If the desired and actual outputs are equal, this is 0 and the weight won’t change.
xi: the value of the input itself. If this value was 0, then it had no impact on the error, and so its weight shouldn’t be adjusted.
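Putting the rule to work, a sketch that trains a two-input perceptron on AND; the threshold is updated as if it were a weight on a constant input of −1 (a standard trick, not shown on the slides):

```python
import random

random.seed(1)

def output(x, weights, threshold):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0

examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # two-input AND

weights = [random.uniform(-1, 1) for _ in range(2)]
threshold = random.uniform(-1, 1)
rate = 0.1  # the learning rate from the slide

for epoch in range(100):
    mistakes = 0
    for x, desired in examples:
        actual = output(x, weights, threshold)
        if actual != desired:
            mistakes += 1
            for i in range(2):
                weights[i] += rate * (desired - actual) * x[i]
            threshold -= rate * (desired - actual)  # threshold as a weight on input -1
    if mistakes == 0:  # an epoch with no errors: training is done
        break

print("learned:", weights, threshold)
```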
Perceptron Training Rule
Works when…
the cases are linearly separable
the learning rate is slow enough
Other approaches to training perceptrons…
Delta rule (gradient descent approach)
Linear programming
Restaurant Problem: Will I wait for a table?
Alternate – whether there is a suitable alternative restaurant nearby
Bar – whether the restaurant has a comfortable bar area to wait in
Fri/Sat – true on Fridays and Saturdays
Hungry – whether we are hungry
Patrons – how many people are in the restaurant (None, Some, or Full)
Price – the restaurant’s price range ($, $$, $$$)
Raining – whether it is raining outside
Reservation – whether we made a reservation
Type – the kind of restaurant (French, Italian, Thai, or Burger)
WaitEstimate – the wait estimate given by the host (0-10 minutes, 10-30, 30-60, > 60)
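One way to feed these attributes to a network is to encode each example as a 0/1 vector: booleans directly, multi-valued attributes one-hot. The encoding below is an illustrative assumption, not the lecture's:

```python
# Hypothetical encoding of one restaurant example as network inputs.
example = {
    "Alternate": True, "Bar": False, "FriSat": True, "Hungry": True,
    "Patrons": "Full", "Price": "$", "Raining": False, "Reservation": False,
    "Type": "Thai", "WaitEstimate": "10-30",
}

def one_hot(value, choices):
    return [1 if value == c else 0 for c in choices]

x = [
    int(example["Alternate"]), int(example["Bar"]), int(example["FriSat"]),
    int(example["Hungry"]), int(example["Raining"]), int(example["Reservation"]),
    *one_hot(example["Patrons"], ["None", "Some", "Full"]),
    *one_hot(example["Price"], ["$", "$$", "$$$"]),
    *one_hot(example["Type"], ["French", "Italian", "Thai", "Burger"]),
    *one_hot(example["WaitEstimate"], ["0-10", "10-30", "30-60", ">60"]),
]
print(x)  # a 0/1 input vector for the network
```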
Multilayer Network
Learning in Multilayer Networks
Same method as for single-layer networks
Example inputs are presented to the network
If the network computes an output that matches the desired output, nothing is done
If there is an error, then the weights are adjusted to balance the error
Back Propagation Algorithm
An approach to dividing up the contribution of each weight to the error
Like the Perceptron Learning Algorithm, we try to minimize the error between each desired output and actual output
At the output layer, the weight update rule is very similar to the rule for the perceptron. Two differences:
The activation of the hidden unit aj is used instead of the input value
The rule contains a term for the gradient of the activation function
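As a sketch of the whole procedure (ours, not the lecture's): a 2-2-1 sigmoid network trained on XOR. Each weight change is LearningRate × error term × the activation feeding that weight; hidden error terms are the output error passed back through the output weights and the sigmoid gradient. Back-propagation can stall in a local minimum, in which case a different seed or more hidden units usually helps.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-2-1 network; index 0 of each weight vector is a bias on a constant input 1.
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
eta = 0.5

for epoch in range(20000):
    for (x1, x2), target in data:
        x = (1, x1, x2)
        a = [sigmoid(sum(w * xi for w, xi in zip(wj, x))) for wj in w_hid]
        h = (1, a[0], a[1])
        o = sigmoid(sum(v * hi for v, hi in zip(w_out, h)))

        # Output error term: (desired - actual) times the sigmoid gradient o(1-o).
        d_out = (target - o) * o * (1 - o)
        # Hidden error terms: output error passed back through w_out and a_j(1-a_j).
        d_hid = [a[j] * (1 - a[j]) * w_out[j + 1] * d_out for j in range(2)]

        for i in range(3):
            w_out[i] += eta * d_out * h[i]
        for j in range(2):
            for i in range(3):
                w_hid[j][i] += eta * d_hid[j] * x[i]

for (x1, x2), target in data:
    x = (1, x1, x2)
    a = [sigmoid(sum(w * xi for w, xi in zip(wj, x))) for wj in w_hid]
    o = sigmoid(sum(v * hi for v, hi in zip(w_out, (1, a[0], a[1]))))
    print((x1, x2), "->", round(o, 2), "target", target)
```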
Back Propagation Learning
Pattern Recognition
Inputs (x1, x2, …, xn) are called a pattern
If the perceptron gives the desired output for some pattern, the perceptron recognizes or correctly classifies that pattern
A pattern could be anything… any ideas?
Handwritten Character Recognition
Le Cun et al. (1989) implemented a neural network to read zip codes on hand-addressed envelopes, for sorting purposes
To identify the digits, it uses a 16x16 array of pixels as input, 3 hidden layers, and a distributed output encoding with 10 output units for the digits 0-9
256 input nodes, 10 output units (one for the likelihood of each digit)
Neural Nets for Face Recognition
Learning Hidden Unit Weights
ALVINN
Drives 70 mph on a public highway
Camera image: 30x32 pixels as inputs
4 hidden units; 30x32 weights lead into each hidden unit
30 outputs for steering
Interpreting Satellite Imagery for Automated Weather Forecasting
Summary
Perceptrons (one-layer networks) are insufficiently expressive
Multi-layer networks are sufficiently expressive and can be trained by error back-propagation
Many applications, including speech, driving, handwritten character recognition, and fraud detection