Learning and Perceptrons, CIS 479/579, Bruce R. Maxim, UM-Dearborn (7/13/2015)


Post on 22-Dec-2015


Momentum and Friction

• When human players use a mouse to aim
  – Momentum turns the view more than they expect for large angles (the ballistic mouse thing)
  – Friction slows down the turn for small angles
  – Adjustments are needed to avoid losing accuracy

• AI players simply aim at an exact target and shoot

• Perfect shooters may not be fun to play against so turning errors can be introduced

Explicit Model

• A mathematical function for computing the actual turning angle in terms of the desired angle and the previous output:

$$\beta = 1.0 + 0.1 \cdot \mathrm{noise}(\mathit{angle})$$

$$\mathit{output}(t) = (\mathit{angle} \cdot \beta) \cdot \alpha + \mathit{output}(t-1) \cdot (1 - \alpha)$$

• $\alpha$ = scaling factor for blending the previous output with the angle request, in the range [0.3, 0.5]

• $\beta$ = initialized to a random value in the range [0.9, 1.1]

• noise() returns a value in the range [-1, 1]; it could use $\cos(\mathit{angle}^2 \cdot 0.217 + 342/\mathit{angle})$
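The explicit model is small enough to sketch directly in Python. This is only an illustration under stated assumptions: the names `noise`, `turn`, `alpha` and `beta` are chosen here for readability, the default blending factor of 0.4 is just one value from the stated range, and the guard for a zero angle is an addition to avoid dividing by zero.

```python
import math

def noise(angle):
    """Pseudo-noise in [-1, 1], using the cosine expression from the slide."""
    if angle == 0.0:
        return 0.0  # assumption: avoid dividing by zero in 342/angle
    return math.cos(angle * angle * 0.217 + 342.0 / angle)

def turn(angle, previous_output, alpha=0.4):
    """Blend the noisy requested angle with the previous output.

    alpha is the blending factor, assumed to lie in [0.3, 0.5].
    """
    beta = 1.0 + 0.1 * noise(angle)  # scale factor in [0.9, 1.1]
    return (angle * beta) * alpha + previous_output * (1.0 - alpha)

# Repeating the same request moves the output toward the scaled angle.
output = 0.0
for _ in range(5):
    output = turn(0.5, output)
```

Because each call depends on the previous output, the body lags behind the request, which is the momentum and friction effect the model is meant to imitate.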

Linear Approximation

• We can use a perceptron to approximate the function described earlier

• Once the animat has learned a faster approximation of the function, the explicit function can be removed from the AI code

• Aiming errors just become a constraint on animat behavior

Methodology

• Approximation computed by training the network iteratively

• Desired output is computed for random inputs

• By grouping results, the batch algorithm can be used to find values for weights and bias

• A small perceptron is applied twice (to get pitch and then yaw) rather than creating a larger one that does both

• This reduces memory use at the expense of programming time
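The methodology above can be sketched as a batch delta-rule loop for a single linear neuron. This is a minimal sketch, not the course implementation: `target` is a hypothetical noise-free stand-in for the turning function, and the sample count, epoch count and learning rate are arbitrary choices.

```python
import random

def target(angle, previous):
    """Hypothetical stand-in for the turning function (noise term omitted)."""
    return 0.4 * angle + 0.6 * previous

def batch_train(samples, epochs=500, rate=0.5):
    """Batch delta rule for one linear neuron with two inputs and a bias."""
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x0, x1), desired in samples:
            error = desired - (w[0] * x0 + w[1] * x1 + b)
            grad_w[0] += error * x0
            grad_w[1] += error * x1
            grad_b += error
        # apply the averaged correction once per pass over the batch
        w[0] += rate * grad_w[0] / n
        w[1] += rate * grad_w[1] / n
        b += rate * grad_b / n
    return w, b

# Desired outputs are computed for random inputs, then trained as one batch.
rng = random.Random(0)
inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
samples = [(x, target(*x)) for x in inputs]
weights, bias = batch_train(samples)
```

The same small neuron could be trained once and then evaluated separately for pitch and for yaw, which is the memory saving the slide refers to.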

Accumulating Errors

• Momentum and friction cause errors or drift that tend to accumulate after several turns

• These errors make the AI's performance more realistic

• Ignoring the variations in aiming will make the AI too error prone to challenge human players

Inverse Error

• To compensate for aiming errors, we could define an inverse error function to help correct the aiming errors

• Not every function has a definable inverse, so the AI would be better served by a math-free method of approximating this type of function

• Given enough trial and error in simulation, the AI should be able to predict the corrected angles needed


Learning - 1

• In effect the AI learns how to deal with aiming errors by receiving evaluative feedback

• Using this feedback the AI can incrementally improve its task performance

• The AI uses its sensors to detect the actual angles the body was turned since the last update

• Unfortunately the AI learns to shoot where it should have shot last time

Learning - 2

• With enough trials the AI can learn to anticipate where to shoot (the NN weights provide a crude memory to work with)

• Both the inputs and outputs will need to be scaled because the perceptron will have to deal with values that do not lie within the unit interval
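A small sketch of the scaling step follows. It assumes, for illustration only, that angles are given in radians in [-pi, pi] and that the perceptron works with values in [-1, 1]; the helper names `scale` and `unscale` are hypothetical.

```python
import math

def scale(value, low, high):
    """Map value from [low, high] to [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0

def unscale(unit, low, high):
    """Map unit from [-1, 1] back to [low, high]."""
    return low + (unit + 1.0) * (high - low) / 2.0

# assumption: yaw requests are expressed in radians within [-pi, pi]
x = scale(math.pi / 2, -math.pi, math.pi)      # network input
angle = unscale(x, -math.pi, math.pi)          # back to radians
```

The same pair of functions can be applied to the perceptron's output so that the predicted correction is converted back into an angle.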

Aimy

• Perceptron is used to learn corrected angles needed to prevent undershooting and overshooting

• Gathers data from its sensors to determine how far its body turned based on each requested angle

• Incremental training is used to approximate the inverse function needed to prevent aiming errors
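The idea behind Aimy can be sketched with an incremental (one sample at a time) delta rule. Everything specific here is an assumption for illustration: `body_turn` is a hypothetical body that always turns 20% short, and the neuron is a single linear unit that learns which request produces an observed turn.

```python
import random

def body_turn(request):
    """Hypothetical body model: friction makes every turn fall 20% short."""
    return 0.8 * request

def train_inverse(steps=2000, rate=0.1, seed=0):
    """Learn request = w * actual + b from sensed (request, actual) pairs."""
    rng = random.Random(seed)
    w, b = 1.0, 0.0
    for _ in range(steps):
        request = rng.uniform(-1.0, 1.0)   # random angle to try
        actual = body_turn(request)        # what the sensors report
        predicted = w * actual + b         # request the neuron would have made
        error = request - predicted
        w += rate * error * actual         # incremental delta rule
        b += rate * error
    return w, b

w, b = train_inverse()
corrected = w * 0.5 + b                    # request needed to really turn 0.5
```

Feeding `corrected` to the body yields a turn close to the desired 0.5, since the learned weight approaches the inverse factor 1.25.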

Evaluation - 1

• Animat should have the opportunity to correct aiming while moving around

• Perceptrons can learn more quickly when more training samples are presented

• The animat can correct its aim in only two dimensions (pitch and yaw)

• Only when pitch is near horizontal can the animat aim while it is moving

Evaluation - 2

• When looking fully up or fully down, no forward movement is possible, which prevents learning

• To prevent this trap, the animat is only allowed to control yaw until satisfactory results are obtained

• The worst that happens is the animat spinning around while learning

Evaluation - 3

• The way in which the yaw is chosen determines the angles available for learning

• If the animat has full control over the yaw, it can decide what to learn and what to ignore (the effect may be for the NN to always predict the same turn to correct aiming errors)

• This is a good reason for forcing the NN to examine a variety of randomly generated angles during training to get a more representative training set and better learning

Multilayer Perceptrons

• Single layer perceptrons can only deal with linear problems

• Single layer perceptrons can only approximate non-linear problems

• Multilayer perceptrons (MLP)
  – Have extra middle layers known as “hidden” layers
  – The middle layers require more sophisticated activation functions than single layer perceptrons (e.g. linear activations would make the MLP behave like a single layer perceptron)
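The remark about linear activations can be checked numerically: two stacked linear layers compute the same thing as one layer whose weight matrix is the product of the two. The weights below are hypothetical values picked for the demonstration, and biases are omitted to keep the sketch short.

```python
def matvec(matrix, vector):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

hidden = [[0.5, -1.0], [2.0, 0.25]]   # hypothetical hidden-layer weights
output = [[1.0, -0.5]]                # hypothetical output-layer weights
x = [3.0, 4.0]

two_layer = matvec(output, matvec(hidden, x))   # linear hidden layer, then output
one_layer = matvec(matmul(output, hidden), x)   # single equivalent layer
```

Both results are identical, so the hidden layer adds nothing unless its activation function is nonlinear.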

Topology

• MLP topology is said to be feed-forward because there are no backward (recurrent) connections

• There can be an arbitrary number of hidden layers in MLP

• Adding too many hidden layers increases the computational complexity of the network

• One hidden layer is usually enough to allow the MLP to be a universal approximator capable of approximating any continuous function

Hidden Layers

• In some cases, there may be many interdependencies among the input variables, and adding an extra hidden layer can be helpful

• Adding hidden layers can sometimes reduce the total number of weights needed for a suitable approximation

• An MLP with two hidden layers can approximate any function, including non-continuous ones

Hidden Neurons

• Choosing the number of neurons in the hidden layer is an art that often depends on the AI designer’s intuition and experience

• The neurons in the hidden layer are needed to represent the problem knowledge internally

• As the number of dimensions grows the complexity of the decision surface (path through hidden layer) increases

• Basically the output on one side of the surface is positive and negative on the other side


Connections

• Neurons can be fully connected to one another within and between layers

• Neurons can also be sparsely connected and even skip layers (e.g. straight from input to output)

• Most MLP are fully connected to simplify programming

Activation Function Properties

• Differentiable (known and computable derivative)

• Continuous (derivative defined everywhere)

• Complexity (nonlinear for higher order tasks)

• Monotonic (derivative positive)

• Bounded (activation output and its derivative are finite)

• Polarity (bipolar preferred to positive)

Activation Functions

• Activation functions for the input and output layers are usually one of the following:
  – Step, Linear, Threshold logic, Sigmoid

• Hidden layer activation functions might be one of the following:
  – Sigmoid: $\mathrm{sig}(x) = 1/(1 + e^{-x})$
  – Hyperbolic tangent
  – Bipolar Sigmoid: $\mathrm{sigb}(x) = 2/(1 + e^{-x}) - 1$
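For reference, the hidden-layer activation functions listed above can be written out with their derivatives, which back propagation needs later. The function names are chosen here for illustration; the derivative formulas follow from the definitions.

```python
import math

def sigmoid(x):
    """sig(x) = 1 / (1 + e^-x), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def bipolar_sigmoid(x):
    """sigb(x) = 2 / (1 + e^-x) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    s = bipolar_sigmoid(x)
    return 0.5 * (1.0 + s) * (1.0 - s)

# The bipolar sigmoid is a rescaled hyperbolic tangent: sigb(x) = tanh(x / 2).
difference = abs(bipolar_sigmoid(1.0) - math.tanh(0.5))
```

The identity in the last line is why the bipolar sigmoid and the hyperbolic tangent behave so similarly as hidden-layer activations.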


Role of Hidden Layers

• The use of a hidden layer implies that the information needed to compute the output must be filtered before passing it on to the next layer

• Each layer of the MLP receives its input from the previous layer and passes its modified output on to the next layer

Feed-Forward Algorithm

current = input;   // process input layer
for layer = 1 to n
{
    for i = 1 to m   // compute output of each neuron
    {
        // multiply arrays and sum result
        s = NetSum(neuron[i].weights, current);
        output[i] = Activate(s);
    }
    // next layer uses this layer’s output as input
    current = output;
}
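A runnable version of the feed-forward loop might look like the following sketch. The data layout is an assumption: each layer is a list of neurons, each neuron is a list of weights, and the last weight of every neuron is treated as a bias. The sigmoid is used as one possible `activate` function.

```python
import math

def activate(s):
    """Sigmoid activation (one possible choice)."""
    return 1.0 / (1.0 + math.exp(-s))

def feed_forward(layers, inputs):
    """layers[l][i] is the weight list of neuron i in layer l; the last
    weight of each neuron is a bias (an assumption of this sketch)."""
    current = list(inputs)
    for layer in layers:
        extended = current + [1.0]          # constant input for the bias
        current = [activate(sum(w * v for w, v in zip(neuron, extended)))
                   for neuron in layer]     # next layer uses this output
    return current

network = [
    [[0.0, 0.0, 0.0], [1.0, -1.0, 0.0]],   # hidden layer: 2 neurons
    [[2.0, 0.0, -1.0]],                    # output layer: 1 neuron
]
result = feed_forward(network, [0.5, 0.5])
```

With these hand-picked weights every net sum is zero, so each neuron outputs 0.5, which makes the example easy to verify by hand.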

Benefits of MLP

• The importance of MLPs is not that they mimic animal brains; they do not

• MLP have a thoroughly researched mathematical foundation and have been proven to work well in some applications

• MLP can be trained to do interesting things and this training really just involves numeric optimization (minimizing output error)

Back Propagation - 1

• BP is the process of filtering error from the output layer back through the preceding layers

• BP was developed in response to the fact that single layer perceptron algorithms do not train hidden layers

• BP is the essence of most MLP learning algorithms

Back Propagation - 2

• A form of hill climbing known as “gradient descent” (the search moves downhill on the error surface)
  – several directions tried simultaneously
  – “steepest gradient” used to direct search

• Training may require thousands of backpropagations

• BP can get stuck or become unstable during training

• BP can be done in stages


Back Propagation - 3

• BP can train a net to recognize several concepts simultaneously

• Trained neural networks can be used to make predictions

• Too many trainable weights relative to the number of training facts can lead to overfitting problems

Back Propagation Algorithm - 1

• Given: set of input-output pairs

• Task: compute weights for a 3 layer network that maps inputs to corresponding outputs

Algorithm:

1. Determine the number of neurons required

2. Initialize weights to random values

3. Set activation values for threshold units

Back Propagation Algorithm - 2

4. Choose an input-output pair and assign activation levels to input neurons

5. Propagate activations from input neurons to hidden layer neurons for each neuron

$$h_j = \frac{1}{1 + e^{-\sum_i w1_{ij} x_i}}$$

6. Propagate activations from hidden layer neurons to output neurons for each neuron

$$o_j = \frac{1}{1 + e^{-\sum_i w2_{ij} h_i}}$$

Back Propagation Algorithm - 3

7. Compute error for output neurons by comparing the desired pattern to the actual output

8. Compute error for neurons in hidden layer

9. Adjust weights between hidden layer and output layer

10. Adjust weights between input layer and hidden layer

11. Go to step 4
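The eleven steps can be sketched as a short Python routine for a 3 layer network with sigmoid neurons. This is an illustrative sketch, not the course code: the weight layout (`w1[j][i]` links input i to hidden neuron j, `w2[k][j]` links hidden neuron j to output k), the trailing bias weight standing in for the threshold unit, and the learning rate of 0.5 are all assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_weights(rows, cols, rng):
    """Step 2: small random initial weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forward(x, w1, w2):
    """Steps 3, 5 and 6: append threshold units and propagate activations."""
    xb = x + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1] + [1.0]
    o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
    return xb, hb, o

def train_step(x, desired, w1, w2, rate=0.5):
    """Steps 4 to 10 for one input-output pair; returns the squared error
    measured before the weights were adjusted."""
    xb, hb, o = forward(x, w1, w2)
    # step 7: output error, scaled by the sigmoid derivative o * (1 - o)
    d_out = [(d - ok) * ok * (1.0 - ok) for d, ok in zip(desired, o)]
    # step 8: hidden error, computed before w2 changes (bias unit skipped)
    d_hid = [hb[j] * (1.0 - hb[j]) *
             sum(d_out[k] * w2[k][j] for k in range(len(w2)))
             for j in range(len(w1))]
    # step 9: adjust hidden-to-output weights
    for k in range(len(w2)):
        for j in range(len(hb)):
            w2[k][j] += rate * d_out[k] * hb[j]
    # step 10: adjust input-to-hidden weights
    for j in range(len(w1)):
        for i in range(len(xb)):
            w1[j][i] += rate * d_hid[j] * xb[i]
    return sum((d - ok) ** 2 for d, ok in zip(desired, o))

rng = random.Random(0)
w1 = make_weights(2, 3, rng)   # 2 hidden neurons, 2 inputs + threshold
w2 = make_weights(1, 3, rng)   # 1 output neuron, 2 hidden + threshold
first_error = train_step([1.0, 0.0], [1.0], w1, w2)
for _ in range(200):           # step 11: repeat
    last_error = train_step([1.0, 0.0], [1.0], w1, w2)
```

A real training run would cycle through many input-output pairs in random order rather than repeating a single pair; the single pair is used here only so the shrinking error is easy to observe.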

Backprop - 1

// compute gradient in last layer neurons
for j = 1 to m
    delta[last][j] = deriv_activate(net_sum[last][j]) * (desired[j] - output[j]);

// propagate the gradient back through the earlier layers
for i = last - 1 to first
    for j = 1 to m
    {
        total = 0;
        for k = 1 to n   // neurons in layer i + 1
            total += delta[i + 1][k] * weights[i + 1][j][k];
        delta[i][j] = deriv_activate(net_sum[i][j]) * total;
    }

Backprop - 2

// steepest descent on the error gradient for each weight
for j = 1 to m
    for i = 1 to n
        // adjust weights using error gradient
        weight[j][i] += learning_rate * delta[j] * output[i];

// The generalized delta rule is used to compute each weight wij
// learning_rate is set by the knowledge engineer (KE)
// delta[j] is the gradient of the error for neuron j
// output[i] is the output of neuron i in the previous layer

Quick Propagation

• Batch technique
  – Exploits locally adaptive techniques to adjust step magnitude based on local parameters
  – Uses knowledge of higher-order derivatives (e.g. Newton’s method)

• Allows for better prediction of the slope of the curve and location of minima

• Weights updated using a method similar to backprop

Quickprop - 1

// Requires two additional arrays for step and
// gradient - it remembers last set of values

// New weight update replaces steepest descent
for j = 1 to m
    for i = 1 to n
    {
        // compute gradient and step
        new_gradient[j][i] = -delta[j] * input[i];
        new_step[j][i] = new_gradient[j][i] /
            (old_gradient[j][i] - new_gradient[j][i]) * old_step[j][i];

Quickprop - 2

        // adjust weight
        weight[j][i] += new_step[j][i];
        // store values for next iteration
        old_step[j][i] = new_step[j][i];
        old_gradient[j][i] = new_gradient[j][i];
    }

• Note: since this is a batch algorithm, the gradients for all training samples are added together
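The quickprop step formula can be tried on a one-weight problem. The sketch below assumes a hypothetical error function E(w) = (w - 3)^2, bootstraps with one steepest-descent step (needed because the formula requires a previous step and gradient), and adds a guard against a zero denominator; none of these details come from the slides.

```python
def gradient(w):
    """Gradient of the hypothetical error E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def quickprop_1d(w=0.0, rate=0.1, iterations=5):
    old_gradient = gradient(w)
    old_step = -rate * old_gradient      # bootstrap: one steepest-descent step
    w += old_step
    for _ in range(iterations):
        new_gradient = gradient(w)
        denominator = old_gradient - new_gradient
        if abs(new_gradient) < 1e-12 or denominator == 0.0:
            break                        # converged, or no curvature information
        new_step = new_gradient / denominator * old_step
        w += new_step
        old_step, old_gradient = new_step, new_gradient
    return w

minimum = quickprop_1d()
```

On a quadratic error the step formula lands on the minimum almost immediately, because it fits a parabola through the last two gradients; on real error surfaces the estimate is only approximate.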

Resilient Propagation

• Weights updated only after all training samples have been seen

• Unlike steepest descent techniques, the step size is not determined by the magnitude of the gradient (only its sign is used)

• Equations are not too hard to implement

Rprop - 1

// New weight update replaces steepest descent
for j = 1 to m
    for i = 1 to n
    {
        // compute gradient and step
        new_gradient[j][i] = -delta[j] * input[i];
        // analyze change to get size of update
        if (new_gradient[j][i] * old_gradient[j][i] > 0)
            new_update[j][i] = nplus * old_update[j][i];
        else if (new_gradient[j][i] * old_gradient[j][i] < 0)
            new_update[j][i] = nminus * old_update[j][i];
        else
            new_update[j][i] = old_update[j][i];

Rprop - 2

        // determine step direction
        if (new_gradient[j][i] > 0)
            step[j][i] = -new_update[j][i];
        else if (new_gradient[j][i] < 0)
            step[j][i] = new_update[j][i];
        else
            step[j][i] = 0;
        // adjust weight and store values
        weight[j][i] += step[j][i];
        old_update[j][i] = new_update[j][i];
        old_gradient[j][i] = new_gradient[j][i];
    }
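The Rprop update can be exercised on a single weight as well. This sketch follows the simplified variant in the pseudocode (the step is still taken after a sign change) and assumes a hypothetical error E(w) = (w - 3)^2; the values 1.2 and 0.5 for `nplus` and `nminus` and the initial update of 0.1 are common choices, not values taken from the slides.

```python
def gradient(w):
    """Gradient of the hypothetical error E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def rprop_1d(w=0.0, update=0.1, nplus=1.2, nminus=0.5, iterations=100):
    old_gradient = 0.0
    for _ in range(iterations):
        new_gradient = gradient(w)
        change = new_gradient * old_gradient
        if change > 0:
            update *= nplus      # same direction: grow the update
        elif change < 0:
            update *= nminus     # overshot the minimum: shrink the update
        # only the sign of the gradient decides the step direction
        if new_gradient > 0:
            w -= update
        elif new_gradient < 0:
            w += update
        old_gradient = new_gradient
    return w

minimum = rprop_1d()
```

Note that the magnitude of the gradient never enters the weight change, which is what distinguishes this from the steepest-descent update.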

Building Neural Networks

• Define the problem in terms of neurons
  – think in terms of layers

• Represent information as neurons
  – operationalize neurons
  – select their data type
  – locate data for testing and training

• Define the network

• Train the network

• Test the network

Structuring the Training Facts

• Use randomly ordered facts

• Use representative data
  – Include people who survive surgery as well as people who do not

• Neurons can’t be coded 1=horse#1, 2=horse#2, etc.

• Networks like lots of inputs and outputs
  – Better to use two output neurons (one for buy and one for sell) than one coded 1=buy and 0=sell
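One common way to follow this advice is to give every category its own neuron. The helper below is a hypothetical illustration of that encoding; the category lists are made up for the example.

```python
def one_hot(value, categories):
    """Return one input per category: 1.0 for the match, 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]

horses = ["horse#1", "horse#2", "horse#3"]   # hypothetical category list
encoded = one_hot("horse#2", horses)         # one input neuron per horse

actions = ["buy", "sell"]
target = one_hot("sell", actions)            # one output neuron per action
```

Coding horses as 1, 2, 3 on a single neuron would wrongly suggest that horse#3 is "more" than horse#1; separate neurons avoid implying any order.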

Structuring the Training Facts

• For historical data, use “rows” not “columns”

don't use:

day1  day2  day3
3     4     5

do use:

day
3
4
5

Structuring the Training Facts

• Neural networks like differences over big numbers
  use: -50
  not: 350 vs 400

• For seasonal data
  use 1 column per month, with winter cases coded 1 for Dec, Jan, Feb, and 0 for other months

• Think qualitatively, not quantitatively
  use: restaurant visit on Monday in early Feb
  not: restaurant visit on day 43

Generalization – 1

• Learning phase is responsible for optimizing the weights from the training examples

• It would be good if the NN could also process new or unseen examples correctly as well (generalization)

• When the NN is bound too tightly to the training examples, this is known as overfitting

• Overfitting is generally not a problem with single layer perceptrons


Generalization – 2

• For MLP number of hidden neurons affects complexity of decision surface

• Need to find the trade-off between the number of hidden neurons and result quality

• Incorrect or incomplete data interferes with generalization

• Bad training examples are usually to blame for failure of MLP to learn concepts


Testing and Validation

• Training sets – used to optimize the weights for a given set of parameters

• Validation sets – used to check the quality of training, help to find best combination of parameters

• Testing sets – check final quality of validated perceptrons (no test info is used to improve NN)
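A simple way to obtain the three sets is to shuffle the available facts once and cut the list in three. The 60/20/20 proportions and the function name `split_facts` are assumptions made for this sketch.

```python
import random

def split_facts(facts, train=0.6, validation=0.2, seed=0):
    """Shuffle the facts and split them into training, validation and test sets."""
    shuffled = list(facts)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

training, validating, testing = split_facts(range(100))
```

The test set is set aside and consulted only once, after the validation set has been used to choose the parameters.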

How can you tell things aren’t working out?

• Your network refuses to learn 10-20% of the training facts

• Things to try
  – Check definition file for data range errors
  – Check for bad (incorrect) facts
  – Some training facts may conflict with one another
  – The training tolerance level may be too strict for the data being used
  – Switch from absolute score to differences

Batch vs Incremental

• Batch preferred over incremental training
  – Converges to an answer faster
  – Has greater accuracy

• Incremental data can be gathered for batch processing if necessary

• Incremental approaches best suited for real-time, in-game learning (requires less memory)

Forgetting

• With incremental learning, it may be wise to slow down learning rate later in the game to avoid forgetting earlier lessons

• There is no formal approach to reducing the learning rate, but linear or exponential decay strategies are often successful

• This implies that learning will eventually become frozen as time passes
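The two decay strategies mentioned above could be sketched as follows. The function names, the default decay factor of 0.99 and the idea of indexing by update step are assumptions for illustration.

```python
def linear_decay(initial, step, total_steps, final=0.0):
    """Move the rate in a straight line from initial to final, then hold it."""
    fraction = min(step, total_steps) / total_steps
    return initial + (final - initial) * fraction

def exponential_decay(initial, step, decay=0.99):
    """Multiply the rate by a constant factor at every step."""
    return initial * decay ** step

# learning rates an animat might use over its first few hundred updates
rates = [exponential_decay(0.5, step) for step in range(0, 300, 100)]
```

With a final rate of zero the linear schedule stops learning completely after `total_steps`, which is the freezing effect described in the last bullet; the exponential schedule only approaches zero.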

Perceptron Advantages

• Good mathematical foundation

• If a solution exists it can be found

• Work best for well defined problems

• If things go wrong the parameters can be adjusted

• Lots of training algorithms exist

• MLP works easily with continuous values

• Deals well with noise

Perceptron Disadvantages - 1

• NN do not contain an easily understood representation of their knowledge

• MLP depends entirely on the algorithms used to create it

• MLP does not scale well

• Once trained, MLP is not updated without retraining

• Retraining does not preserve previous MLP knowledge


Perceptron Disadvantages - 2

• Design of inputs and outputs can have a profound impact on MLP success

• Input may require pre-processing and outputs may require post-processing

• Getting the right number of layers and neurons requires trial and error

Onno

• Uses a large neural network to handle shooting (prediction, target selection, aiming)

• Input is similar to that described in previous chapters

• Results are moderate, but demonstrate the versatility of MLP and the benefits of decomposing behaviors

Why haven’t there been more NN commercial successes?

• Programming neural networks is very difficult: each constraint must be hardwired with $O(N^2)$ lateral inhibitory and $O(N^3)$ diagonal excitatory connections

• Learning in NN is hard
  – learning algorithms are hard to write
  – choosing the right knowledge representation in the hidden layer is non-trivial

Why haven’t there been more NN commercial successes?

• For many applications, symbol-based knowledge is superior to circuit-based knowledge in terms of performance

• Neural networks may have been oversold (as has been the problem with many early AI technologies)