Introduction to Artificial Neural Networks · 2013-03-06
TRANSCRIPT
Introduction To Artificial Neural Networks
Machine Learning
circle square circle square …
“group these into two categories”
Supervised
Unsupervised
Supervised Machine Learning
[Figure: a scatter plot of 20 examples, built up over several slides as the classifier labels points positive or negative]
Accuracy 15/20 = 0.75
Precision 7/12 = 0.58
Recall 7/7 = 1.0
F1 (2PR/(P+R)) = 0.73
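These numbers pin down the whole confusion matrix: 12 predicted positives of which 7 are correct (so 5 false positives), all 7 actual positives found (0 false negatives), and 15 correct overall (so 8 true negatives). A minimal sketch in Python reproducing the slide's values:

```python
# Counts implied by the slide: 7 true positives, 5 false positives,
# 0 false negatives, 8 true negatives (20 examples, 15 correct).
tp, fp, fn, tn = 7, 5, 0, 8

accuracy = (tp + tn) / (tp + fp + fn + tn)          # 15/20 = 0.75
precision = tp / (tp + fp)                          # 7/12 ≈ 0.58
recall = tp / (tp + fn)                             # 7/7 = 1.0
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.73

print(accuracy, precision, recall, f1)
```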
293871947009 × √52.86301 / 80.2341 = ?
293871947009 × √52.86301 / 80.2341 = 26630240520.936812470902167425359
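The contrast the slides are drawing: arithmetic like this is trivial for a machine and hopeless by hand. A one-line check in Python (floating point agrees with the slide's digits only up to machine precision):

```python
import math

# The slide's arithmetic, reproduced to float precision.
print(293871947009 * math.sqrt(52.86301) / 80.2341)  # ≈ 26630240520.94
```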
Neural Networks
In order to combine the powers of the machine and the human brain, Neural Networks try to mimic the structure and function of our nervous system.
Biological Motivation #1
[Figure: a biological neuron with dendrites, an axon, and synapses; in the network analogy, neurons become nodes and synapses become weights, which can be excitatory (+) or inhibitory (−)]
Biological Motivation #2
w is the strength of the signal sent between A and B.
If A is stimulated sufficiently and w is positive, then A stimulates B.
If A is stimulated sufficiently and w is negative, then A inhibits B.
If A isn’t stimulated sufficiently, nothing happens.
How much a node must be stimulated is determined by its threshold.
[Figure: Node A connected to Node B by an edge with weight w]
Neural Networks
[Figure: a single node (neuron) with four inputs x1…x4 arriving on edges (interconnections) with weights w1…w4, a threshold T, and an output y]
A Single Perceptron
[Figure: the same node with inputs x1…x4, weights w1…w4, threshold T, and output y]
If w1x1 + w2x2 + … + wnxn ≥ T, then the output of the node is 1.
Otherwise, the output of the node is 0.
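This threshold rule translates directly into code; a minimal sketch, where the helper name perceptron is ours rather than the lecture's:

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Example: four inputs with arbitrary weights and threshold.
print(perceptron((1, 0, 1, 1), (0.5, 0.5, 0.5, 0.5), 1.0))  # 1.5 >= 1.0 -> 1
```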
Perceptron
Created in the 1960s
A neural “network” of a single neuron
Trainable: its threshold and input weights can be modified or learned
If the neuron doesn’t give the desired output, then it has made a mistake
Input weights and threshold can be changed according to a learning algorithm
x1 x2 y = x1 and x2
0 0 0
0 1 0
1 0 0
1 1 1
x1 = “I did my homework.”
x2 = “I’m well rested.”
y = “I will go to class.”
1 means True; 0 means False.
AND
[Figure: perceptron with inputs x1, x2, weights w1 = ?, w2 = ?, threshold T = ?, and output y]
Solution: w1 = 1, w2 = 1, T = 2
Inputs are either 0 or 1
Output is 1 only if all inputs are 1
AND
[Figure: perceptron with inputs x1…x4, weights w1 = ? … w4 = ?, threshold T = ?, and output y]
Solution: w1 = w2 = w3 = w4 = 1, T = 4
Inputs are either 0 or 1
Output is 1 only if all inputs are 1
x1 x2 y = x1 or x2
0 0 0
0 1 1
1 0 1
1 1 1
x1 = “I did my homework.”
x2 = “I’m well rested.”
y = “I will go to class.”
1 means True; 0 means False.
OR
[Figure: perceptron with inputs x1, x2, weights w1 = ?, w2 = ?, threshold T = ?, and output y]
Solution: w1 = 1, w2 = 1, T = 1
Inputs are either 0 or 1
Output is 1 if at least one input is 1
OR
[Figure: perceptron with inputs x1…x4, weights w1 = ? … w4 = ?, threshold T = ?, and output y]
Solution: w1 = w2 = w3 = w4 = 1, T = 1
Inputs are either 0 or 1
Output is 1 if at least one input is 1
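A quick sanity check of both constructions over all 16 four-bit inputs, reusing the threshold rule from earlier (a sketch, not lecture code):

```python
def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# All 16 possible 4-bit inputs.
inputs4 = [(a, b, c, d) for a in (0, 1) for b in (0, 1)
                        for c in (0, 1) for d in (0, 1)]

# AND: all weights 1, threshold n = 4. OR: all weights 1, threshold 1.
assert all(perceptron(x, (1, 1, 1, 1), 4) == int(all(x)) for x in inputs4)
assert all(perceptron(x, (1, 1, 1, 1), 1) == int(any(x)) for x in inputs4)
print("4-input AND and OR verified")
```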
x1 x2 y = x1 xor x2
0 0 0
0 1 1
1 0 1
1 1 0
x1 = “I did my homework.”
x2 = “I’m well rested.”
y = “I will go to class.”
1 means True; 0 means False.
XOR
[Figure: perceptron with inputs x1, x2, weights w1 = ?, w2 = ?, threshold T = ?, and output y]
Try T = 0.5, w1 = 1, w2 = 1
Inputs are either 0 or 1
If both inputs are 0, output is 0. If one input is 0 and one is 1, output is 1.
But if both inputs are 1, output is 1, while XOR requires 0. No setting of two weights and a threshold gets all four cases right: a single perceptron cannot compute XOR.
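A brute-force search makes the failure concrete: no combination of two weights and a threshold on a coarse grid reproduces XOR. (The grid is only evidence; the real argument is the linear-separability one on the next slides.)

```python
def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every combination of weights and threshold on a coarse grid.
grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
found = any(
    all(perceptron(x, (w1, w2), t) == y for x, y in XOR.items())
    for w1 in grid for w2 in grid for t in grid
)
print(found)  # False: nothing on the grid computes XOR
```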
Linearly Separable

x1 x2 x1 and x2
0 0 0
0 1 0
1 0 0
1 1 1

[Figure: the four AND points plotted in the (x1, x2) plane; a single straight line separates the 1 from the 0s]

x1 x2 x1 or x2
0 0 0
0 1 1
1 0 1
1 1 1

[Figure: the four OR points; again one straight line separates the 1s from the 0]

x1 x2 x1 xor x2
0 0 0
0 1 1
1 0 1
1 1 0

[Figure: the four XOR points; no single straight line can separate the 1s from the 0s, so XOR is not linearly separable]
History of Neural Networks
McCulloch and Pitts (1943) – introduced a model of artificial neurons and suggested they could learn.
Hebb (1949) – simple updating rule for learning.
Rosenblatt (1962) – the perceptron model.
Minsky and Papert (1969) – wrote Perceptrons.
Bryson and Ho (1969, but largely ignored until the 1980s) – invented back-propagation learning for multilayer networks.
Perceptrons
1969 book by Marvin Minsky and Seymour Papert
The problem: perceptrons only work for classification problems that are linearly separable
Insufficiently expressive
Called multilayer networks an “important research problem” to investigate, although they were pessimistic about their value
XOR
[Figure: a two-layer network. Inputs x1 and x2 feed two hidden units, each with threshold T = 1: one with weights 1 (from x1) and −1 (from x2), the other with weights −1 (from x1) and 1 (from x2). Both hidden units connect to an output unit with weights 1 and 1 and threshold T = 1.]

x1 x2 x1 xor x2
0 0 0
0 1 1
1 0 1
1 1 0

Output = x1 xor x2
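A sketch verifying the wiring above: the first hidden unit fires only for (x1, x2) = (1, 0), the second only for (0, 1), and the output unit ORs them.

```python
def unit(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def xor_net(x1, x2):
    h1 = unit((x1, x2), (1, -1), 1)   # fires only for (1, 0)
    h2 = unit((x1, x2), (-1, 1), 1)   # fires only for (0, 1)
    return unit((h1, h2), (1, 1), 1)  # ORs the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # matches x1 xor x2
```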
Training/Learning
Train a perceptron to respond to certain inputs with certain desired outputs
After training, the perceptron should give reasonable outputs for any input
If it wasn’t trained for that input, it should try to find the best possible output depending on how it was trained
Perceptron Training Rule
Begin with random weights
Apply the perceptron to each training example (each pass through the examples is called an epoch)
If it misclassifies an example, modify the weights
Continue until the perceptron classifies all training examples correctly
Modifying the Weights
wi ← wi + Δwi
Δwi = LearningRate × (DesiredOutput − ActualOutput) × xi

LearningRate: usually set to some small value like 0.1. It moderates the degree to which the weights are changed at each step and keeps the updates from overshooting.
(DesiredOutput − ActualOutput): the difference between what we wanted the output to be and what it actually was. If the desired and actual outputs are equal, this is 0 and the weight won’t change.
xi: the value of the input itself. If this value was 0, then it had no impact on the error, and so its weight shouldn’t be adjusted.
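Putting the rule to work, a sketch that trains a two-input perceptron on AND; the threshold is updated as if it were a weight on a constant input of −1 (a standard trick, not shown on the slides):

```python
import random

random.seed(1)

def output(x, weights, threshold):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0

examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # two-input AND

weights = [random.uniform(-1, 1) for _ in range(2)]
threshold = random.uniform(-1, 1)
rate = 0.1  # the learning rate from the slide

for epoch in range(100):
    mistakes = 0
    for x, desired in examples:
        actual = output(x, weights, threshold)
        if actual != desired:
            mistakes += 1
            for i in range(2):
                weights[i] += rate * (desired - actual) * x[i]
            threshold -= rate * (desired - actual)  # threshold as a weight on input -1
    if mistakes == 0:  # an epoch with no errors: training is done
        break

print("learned:", weights, threshold)
```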
Perceptron Training Rule
Works when…
the cases are linearly separable
the learning rate is slow enough
Other approaches to training perceptrons…
Delta rule (gradient descent approach)
Linear programming
Restaurant Problem: Will I wait for a table?
Alternate – whether there is a suitable alternative restaurant nearby
Bar – whether the restaurant has a comfortable bar area to wait in
Fri/Sat – true on Fridays and Saturdays
Hungry – whether we are hungry
Patrons – how many people are in the restaurant (None, Some, or Full)
Price – the restaurant’s price range ($, $$, $$$)
Raining – whether it is raining outside
Reservation – whether we made a reservation
Type – the kind of restaurant (French, Italian, Thai, or Burger)
WaitEstimate – the wait estimate given by the host (0-10 minutes, 10-30, 30-60, > 60)
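One way to feed these attributes to a network is to encode each example as a 0/1 vector: booleans directly, multi-valued attributes one-hot. The encoding below is an illustrative assumption, not the lecture's:

```python
# Hypothetical encoding of one restaurant example as network inputs.
example = {
    "Alternate": True, "Bar": False, "FriSat": True, "Hungry": True,
    "Patrons": "Full", "Price": "$", "Raining": False, "Reservation": False,
    "Type": "Thai", "WaitEstimate": "10-30",
}

def one_hot(value, choices):
    return [1 if value == c else 0 for c in choices]

x = [
    int(example["Alternate"]), int(example["Bar"]), int(example["FriSat"]),
    int(example["Hungry"]), int(example["Raining"]), int(example["Reservation"]),
    *one_hot(example["Patrons"], ["None", "Some", "Full"]),
    *one_hot(example["Price"], ["$", "$$", "$$$"]),
    *one_hot(example["Type"], ["French", "Italian", "Thai", "Burger"]),
    *one_hot(example["WaitEstimate"], ["0-10", "10-30", "30-60", ">60"]),
]
print(x)  # a 0/1 input vector for the network
```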
Multilayer Network
Learning in Multilayer Networks
Same method as for single-layer networks
Example inputs are presented to the network
If the network computes an output that matches the desired output, nothing is done
If there is an error, then the weights are adjusted to balance the error
Back Propagation Algorithm
An approach to dividing up the contribution of each weight to the error
Like the Perceptron Learning Algorithm, we try to minimize the error between each desired output and actual output
At the output layer, the weight update rule is very similar to the rule for the perceptron. Two differences:
The activation of the hidden unit aj is used instead of the input value
The rule contains a term for the gradient of the activation function
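As a sketch of the whole procedure (ours, not the lecture's): a 2-2-1 sigmoid network trained on XOR. Each weight change is LearningRate × error term × the activation feeding that weight; hidden error terms are the output error passed back through the output weights and the sigmoid gradient. Back-propagation can stall in a local minimum, in which case a different seed or more hidden units usually helps.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-2-1 network; index 0 of each weight vector is a bias on a constant input 1.
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
eta = 0.5

for epoch in range(20000):
    for (x1, x2), target in data:
        x = (1, x1, x2)
        a = [sigmoid(sum(w * xi for w, xi in zip(wj, x))) for wj in w_hid]
        h = (1, a[0], a[1])
        o = sigmoid(sum(v * hi for v, hi in zip(w_out, h)))

        # Output error term: (desired - actual) times the sigmoid gradient o(1-o).
        d_out = (target - o) * o * (1 - o)
        # Hidden error terms: output error passed back through w_out and a_j(1-a_j).
        d_hid = [a[j] * (1 - a[j]) * w_out[j + 1] * d_out for j in range(2)]

        for i in range(3):
            w_out[i] += eta * d_out * h[i]
        for j in range(2):
            for i in range(3):
                w_hid[j][i] += eta * d_hid[j] * x[i]

for (x1, x2), target in data:
    x = (1, x1, x2)
    a = [sigmoid(sum(w * xi for w, xi in zip(wj, x))) for wj in w_hid]
    o = sigmoid(sum(v * hi for v, hi in zip(w_out, (1, a[0], a[1]))))
    print((x1, x2), "->", round(o, 2), "target", target)
```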
Back Propagation Learning
Pattern Recognition
Inputs (x1, x2, …, xn) are called a pattern
If the perceptron gives the desired output for some pattern, the perceptron recognizes or correctly classifies that pattern
A pattern could be anything… any ideas?
Handwritten Character Recognition
Le Cun et al. (1989) implemented a neural network to read zip codes on hand-addressed envelopes, for sorting purposes
To identify the digits, it uses a 16x16 array of pixels as input, 3 hidden layers, and a distributed output encoding with 10 output units for the digits 0-9
256 input nodes, 10 output units (one for the likelihood of each digit)
Neural Nets for Face Recognition
Learning Hidden Unit Weights
ALVINN
Drives 70 mph on a public highway
Camera image: 30x32 pixels as inputs
4 hidden units; 30x32 weights lead into each hidden unit
30 outputs for steering
Interpreting Satellite Imagery for Automated Weather Forecasting
Summary
Perceptrons (one-layer networks) are insufficiently expressive
Multi-layer networks are sufficiently expressive and can be trained by error back-propagation
Many applications, including speech, driving, handwritten character recognition, and fraud detection