Chapter 7: Artificial Neural Networks

Page 2: [Artificial] Neural Networks

• A class of powerful, general-purpose tools readily applied to:
  – Prediction
  – Classification
  – Clustering

• Biological Neural Net (human brain) is the most powerful – we can generalize from experience

• Computers are best at following pre-determined instructions

• Computerized Neural Nets attempt to bridge the gap:
  – Predicting time series in the financial world
  – Diagnosing medical conditions
  – Identifying clusters of valuable customers
  – Fraud detection
  – Etc.

Page 3: Neural Networks

• When applied in well-defined domains, their ability to generalize and learn from data “mimics” a human’s ability to learn from experience.

• Very useful in data mining, where the hope is better results.

• Drawback: training a neural network results in internal weights distributed throughout the network, making it difficult to understand why a solution is valid.

Page 4: Neural Network History

• 1930s through 1970s

• 1980s:

– Back propagation – better way of training a neural net

– Computing power became available

– Researchers became more comfortable with n-nets

– Relevant operational data more accessible

– Useful applications (expert systems) emerged

Page 5: Real Estate Appraiser

Neural networks have the ability to learn by example, in much the same way that human experts gain expertise from experience.

Page 6: Loan Prospector – HNC/Fair Isaac

• A Neural Network (Expert System) is like a black box that knows how to process inputs to create a useful output.

• The calculations are quite complex and difficult to understand.

Page 7: Neural Network Training

• Training the network = the process of adjusting weights inside it to arrive at the best combination of weights for making the desired predictions.

• The goal is to use the training set to calculate weights where the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible.

Page 8: Neural Network Training

• The network starts with a random set of weights, so it initially performs very poorly.

• By reprocessing the training set over and over (generations) and adjusting the internal weights each time to reduce the overall error, the network gradually does a better and better job of approximating the target values in the training set.

• When the approximations no longer improve, the network stops training.

• The final network comes from the generation that works best on the validation set, rather than the one that works best on the training set.
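
A minimal sketch of this train-and-validate loop in Python/NumPy (not from the slides: the toy data, the single linear unit standing in for a full network, and the error measure are all illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for a real training/validation split (assumption).
X_train, y_train = rng.uniform(-1, 1, (80, 3)), rng.uniform(-1, 1, 80)
X_val, y_val = rng.uniform(-1, 1, (20, 3)), rng.uniform(-1, 1, 20)

def predict(weights, X):
    # A deliberately simple "network": one linear unit with a bias.
    return X @ weights[:-1] + weights[-1]

def mse(weights, X, y):
    return np.mean((predict(weights, X) - y) ** 2)

weights = rng.uniform(-1, 1, 4)                 # start with random weights
best_weights, best_val_error = weights.copy(), mse(weights, X_val, y_val)

for generation in range(200):                   # reprocess the training set over and over
    # Adjust the weights to reduce the overall training error (gradient step).
    error = predict(weights, X_train) - y_train
    grad = np.append(2 * X_train.T @ error / len(y_train), 2 * error.mean())
    weights -= 0.05 * grad

    # Keep the weights from the generation that does best on the validation set.
    val_error = mse(weights, X_val, y_val)
    if val_error < best_val_error:
        best_weights, best_val_error = weights.copy(), val_error

print("validation MSE of chosen weights:", best_val_error)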

Page 9: Neural Network Training

• Outputs are usually computed by feed-forward (forward) propagation.

• Back propagation has been used since the 1980s to adjust the weights (other methods are now available):
  – Calculates the error by taking the difference between the calculated result and the actual result
  – The error is fed back through the network and the weights are adjusted to minimize the error
  – No longer the preferred method for adjusting the weights
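
One back propagation step for a tiny one-hidden-layer network can be sketched as follows (Python/NumPy; the tanh hidden units, squared-error measure, and layer sizes are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(1)

x = rng.uniform(-1, 1, 3)            # one training example with 3 inputs
target = 0.5                         # the actual (desired) result

# Random initial weights: 3 inputs -> 2 hidden units -> 1 output unit.
W_hidden = rng.uniform(-1, 1, (2, 3))
b_hidden = np.zeros(2)
w_out = rng.uniform(-1, 1, 2)
b_out = 0.0

# Forward propagation: compute the calculated result.
hidden = np.tanh(W_hidden @ x + b_hidden)
output = w_out @ hidden + b_out

# The error is the difference between the calculated and the actual result.
error = output - target

# Feed the error back through the network (chain rule) ...
grad_w_out = error * hidden                  # output-layer weight gradients
grad_b_out = error
back = error * w_out * (1 - hidden ** 2)     # error signal at the hidden units
grad_W_hidden = np.outer(back, x)            # hidden-layer weight gradients
grad_b_hidden = back

# ... and adjust the weights a small step to reduce the error.
lr = 0.1
w_out -= lr * grad_w_out
b_out -= lr * grad_b_out
W_hidden -= lr * grad_W_hidden
b_hidden -= lr * grad_b_hidden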

Page 10: Neural Networks for Directed Data Mining

The steps in this process are:

1. Identify the input and output features.

2. Transform the inputs and outputs so they are in a small range (–1 to 1).

3. Set up a network with an appropriate topology.

4. Train the network on a representative set of training examples.

5. Use the validation set to choose the set of weights that minimizes the error.

6. Evaluate the network using the test set to see how well it performs.

7. Apply the model generated by the network to predict outcomes for unknown inputs.
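
As a rough illustration of these seven steps, here is a sketch using scikit-learn as a stand-in tool (an assumption; the slides themselves use RapidMiner later) with toy placeholder data:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# 1. Identify the input and output features (toy data as a placeholder).
X = rng.uniform(0, 100, (600, 4))
y = X @ np.array([0.3, -0.2, 0.5, 0.1]) + rng.normal(0, 1, 600)

# 2. Transform the inputs so they are in a small range (-1 to 1).
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# Hold out a test set for step 6.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=0)

# 3. Set up a network with an appropriate topology (one small hidden layer).
# 4./5. Train it; early_stopping holds out part of the training data as a
#       validation set and keeps the weights that minimize the error on it.
net = MLPRegressor(hidden_layer_sizes=(3,), max_iter=5000,
                   early_stopping=True, random_state=0)
net.fit(X_train, y_train)

# 6. Evaluate the network on the test set to see how well it performs.
print("test R^2:", net.score(X_test, y_test))

# 7. Apply the model to predict outcomes for unknown inputs.
unknown = scaler.transform(rng.uniform(0, 100, (5, 4)))
print(net.predict(unknown))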

Page 11: Neural Net Limitations

• Neural Nets are good for prediction and estimation when:

– Inputs are well understood

– Output is well understood

– Experience is available in the form of examples used to “train” the neural net application (expert system)

• Neural Nets are only as good as the training set used to build them.

• The resulting model is static and must be retrained with more recent examples to stay relevant.

• Can learn patterns that exist only in the training set, resulting in overfitting

Page 12: Feed-Forward Neural Net Examples

Page 13: Feed-Forward Neural Net Examples

If the network performs very well on the training set, but does much worse on the validation set, then this is an indication that it has memorized the training set.

Page 14: The Unit of a Neural Network

• The unit of a neural network is modeled on the biological neuron.

• The unit combines its inputs into a single value, which it then transforms to produce the output; together these are called the activation function.
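
A single unit can be sketched in a few lines of Python; the tanh transfer function and the numbers below are illustrative assumptions (the slides do not name a specific transfer function):

import math

def unit_output(inputs, weights, bias):
    """Activation function of one unit: combine the inputs into a single
    value (weighted sum plus bias), then transform it (here with tanh)."""
    combined = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(combined)

# Example: three inputs with made-up weights.
print(unit_output([0.2, -0.5, 0.9], [0.4, 0.1, -0.3], bias=0.05))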

Page 15: Loan Appraiser – Revisited

Illustrates that a neural network (feed-forward in this case) is filled with seemingly meaningless weights.

The appraised value of this property is $176,228.

The input layer represents the process of mapping (converting) values into a reasonable range (–1 to +1).

[Figure: network diagram showing the input layer, hidden layer, and output layer]

Page 16: Adjusting the Weights: generalized delta rule

• Two important parameters associated with using the generalized delta rule:

– Momentum refers to the tendency of the weights inside each unit to change the “direction” they are heading in only slowly.

– That is, each weight remembers if it has been getting bigger or smaller, and momentum tries to keep it going in the same direction.

– A network with high momentum responds slowly to new training examples that want to reverse the weights.

– If momentum is low, then the weights are allowed to oscillate more freely.

Page 17: Adjusting the Weights: generalized delta rule

• The learning rate controls how quickly the weights change.

• The best approach for the learning rate is to start big and decrease it slowly as the network is being trained.

– Initially, the weights are random, so large oscillations are useful to get in the vicinity of the best weights.

– As the network gets closer to the optimal solution, the learning rate should decrease so the network can fine-tune the weights.
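
A sketch of the generalized delta rule update with both parameters, in Python (the function name, gradients, and numbers are illustrative, not from the slides):

def delta_rule_step(weights, gradients, previous_deltas, learning_rate, momentum):
    """One weight update: a step against the error gradient, scaled by the
    learning rate, plus a momentum term that keeps each weight moving in
    the direction it was already heading."""
    new_deltas = [-learning_rate * g + momentum * d
                  for g, d in zip(gradients, previous_deltas)]
    new_weights = [w + d for w, d in zip(weights, new_deltas)]
    return new_weights, new_deltas

# Illustrative numbers only: with high momentum (0.9) the weights respond
# slowly to a gradient that wants to reverse their direction; with low
# momentum they are allowed to oscillate more freely.
w, deltas = [0.5, -0.2], [0.0, 0.0]
for grad in ([0.1, -0.3], [0.1, -0.3], [-0.4, 0.2]):   # third step reverses
    w, deltas = delta_rule_step(w, grad, deltas, learning_rate=0.25, momentum=0.9)
    print(w)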

Page 18: More Decisions ...

• How large should the hidden layer be?

– It depends on the data, the patterns being detected, and the type of network.

– Since overfitting is a major concern with networks using customer data, we generally do not use hidden layers larger than the number of inputs.

– A good place to start for many problems is to experiment with one, two, and three nodes in the hidden layer.

– When the network is overtraining, reduce the size of the layer.

– If it is not sufficiently accurate, increase its size.

– When using a network for classification, however, it can be useful to start with one hidden node for each class.

Page 19: More Decisions ...

• The size of the training set

– Must be sufficiently large to cover the ranges of inputs available for each feature.

– In addition, you want several training examples for each weight in the network.

• For a network with s input units, h hidden units, and 1 output, there are h * (s + 1) + h + 1 weights in the network (each hidden layer node has a weight for each connection to the input layer, an additional weight for the bias, and then a connection to the output layer and its bias).

• For instance, if there are 15 input features and 10 units in the hidden network, then there are 171 weights in the network.

• There should be at least 30 examples for each weight, but a better minimum is 100.
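
The weight count and the corresponding training-set minimums can be checked with a few lines of Python (same numbers as the slide):

def weight_count(s, h):
    # h hidden units, each with s input weights plus 1 bias, plus h output
    # weights and 1 output bias: h * (s + 1) + h + 1
    return h * (s + 1) + h + 1

n_weights = weight_count(s=15, h=10)
print(n_weights)          # 171, as in the slide
print(30 * n_weights)     # at least 5,130 training examples
print(100 * n_weights)    # a better minimum: 17,100 examples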

Page 20: More Decisions ...

• Choosing the Training Set

– Coverage of Values for All Features

– Number of Features – if more features:

• the larger the network needs to be, increasing the risk of overfitting and the required size of the training set.

• the longer it takes the network to converge to a set of weights.

• the weights are less likely to be optimal.

Page 21: More Decisions ...

• Number of Outputs

– There should be many examples for all possible output values from the network.

– The number of training examples for each possible output should be about the same.

• Rare events need a sufficient number of examples.

• The training set may need to be balanced by oversampling the rare cases.

Page 22: Preparing the Data

• Continuous Values

– The values can be scaled into a reasonable range (a code sketch follows this list):

  mapped_value = 2 * (original_value – min) / (max – min + 1) – 1

– Ensure that all variable values are represented in the training set; if not:

• Increase the range

• Reject out-of-range values

• Peg values lower than the minimum to the minimum and higher than the maximum to the maximum

• Map the minimum value to –0.9 and the maximum value to 0.9 instead of –1 and 1

– Skewed distributions can be handled by:

• discretizing or binning

• taking the logarithm

• standardizing
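
A sketch of the mapping formula with the "peg out-of-range values" option, in Python (the function name and the income example are illustrative assumptions):

def map_continuous(value, min_val, max_val, clip=True):
    """Scale a continuous value into roughly (-1, 1) using the slide's
    formula; values outside the training range are pegged to the
    minimum/maximum when clip is True (one of the options listed above)."""
    if clip:
        value = max(min_val, min(max_val, value))
    return 2 * (value - min_val) / (max_val - min_val + 1) - 1

# Example: incomes observed between 10,000 and 120,000 in the training set.
print(map_continuous(45_000, 10_000, 120_000))
print(map_continuous(250_000, 10_000, 120_000))   # out of range, pegged to max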

Page 23: Preparing the Data

• Categorical Values

– Treat the codes as discrete, ordered values

• not appropriate in some cases

– Break the categories into flags, one for each value

• dummy variables

– replace the code itself with numerical data about the code

• Instead of including zip codes in a model, include various census fields, such as the median income or the proportion of households with children.

• Another possibility is to include historical information summarized at the level of the categorical variable.

– An example would be including the historical churn rate by zip code for a model that is predicting churn.
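
A sketch of the flag (dummy) approach and the "numerical data about the code" approach, using pandas as a stand-in tool (an assumption; the region codes and churn rates below are made up):

import pandas as pd

# Toy customer data; "region" is a categorical code with no natural order.
customers = pd.DataFrame({"region": ["N", "S", "S", "E"],
                          "income": [52_000, 48_000, 61_000, 39_000]})

# Break the category into one flag (dummy) column per value.
flags = pd.get_dummies(customers, columns=["region"], dtype=float)
print(flags)

# Alternative: replace the code with numerical data about the code, e.g. a
# made-up historical churn rate per region, added as a new column.
churn_rate_by_region = {"N": 0.04, "S": 0.07, "E": 0.05}
customers["region_churn_rate"] = customers["region"].map(churn_rate_by_region)
print(customers)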

Page 24: In-Class Exercise

• Search the web for a Neural Net Example

• Review the example

Page 25: RapidMiner Practice

• To watch:

– Training Videos\01 - Ralf Klinkenberg –RapidMinerResources\

• 5 - Modelling - Classification -3- Neural networks 1.mp4

• 5 - Modelling - Classification -3- Neural networks 2.mp4

• To practice:

– Do the exercises presented in the movies using the file “Iris.ioo”.

Page 26: RapidMiner Practice

• Data Preprocessing

– GermanCredit.xls → GermanCredit.ioo

• Process design

– Take a look at the .ioo file and its attributes / variables

• Use Neural Networks to predict the response (credit rating)

– Try different options

• Validate your model

– Use the validation operator

– Inside this put the Neural Network learner (left side) and Apply Model and Performance (right side)

• Model

– Read and interpret the results


Page 27: Self-Organizing Maps

• Self-organizing maps (SOMs), or Kohonen networks, are:

– a variant of neural networks used for undirected data mining tasks such as cluster detection.

– a neural network that can recognize unknown patterns in the data.
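
A minimal SOM training sketch in Python/NumPy (the 4x4 grid, Gaussian neighbourhood, and decay schedule are illustrative assumptions, not a definitive implementation):

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points with 2 features, standing in for customer records.
data = rng.uniform(-1, 1, (200, 2))

# A small 4x4 Kohonen map: each map node has a weight vector in input space.
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
weights = rng.uniform(-1, 1, (16, 2))

learning_rate, radius = 0.5, 2.0
for epoch in range(30):
    for x in rng.permutation(data):
        # Best matching unit: the node whose weights are closest to the input.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Pull the BMU and its grid neighbours toward the input.
        grid_dist = np.sum((grid - grid[bmu]) ** 2, axis=1)
        neighbourhood = np.exp(-grid_dist / (2 * radius ** 2))
        weights += learning_rate * neighbourhood[:, None] * (x - weights)
    # Shrink the learning rate and neighbourhood as training proceeds.
    learning_rate *= 0.9
    radius *= 0.9

# Each input is now assigned to (recognized by) its closest map node,
# giving an undirected clustering of the data.
clusters = np.argmin(((data[:, None, :] - weights) ** 2).sum(axis=2), axis=1)
print(np.bincount(clusters, minlength=16))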