
Page 1: Feature Selection in Machine Learning

Data Analytics & Machine Learning MCS4102

Assignment 2

Feature Selections with Trajan Simulator

U.V Vandebona

Page 2: Feature Selection in Machine Learning

Content

• Feature Selection
• Dataset 1 - Iris Dataset
  • Forward Selection
  • Backward Selection
  • Genetic Algorithm
• Dataset 2 - Abalone Dataset
• Dataset 3 – Custom Dataset

Page 3: Feature Selection in Machine Learning

Data Set (1) - Iris

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris flower.

Page 4: Feature Selection in Machine Learning

Data Set (1) - Iris

Attribute Information (all in centimeters):
› Sepal length
› Sepal width
› Petal length
› Petal width
› Flower class

Ex:
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica

Page 5: Feature Selection in Machine Learning

Iris Class Features

1. One of the classes (Iris Setosa) is linearly separable from the other two. However, the other two classes are not linearly separable.

2. There is some overlap between the Versicolor and Virginica classes, so that it is impossible to achieve a perfect classification rate.

3. There is some redundancy in the four input variables, so that it is possible to achieve a good solution with only three of them, or even (with difficulty) from two.

Page 6: Feature Selection in Machine Learning

Import and Setup Data

(Screenshot: data import dialog, with numbered callouts 1–3 explained on the next slide.)

Page 7: Feature Selection in Machine Learning

Import and Setup Data

1. The Iris dataset is a simple dataset whose values are delimited by commas.

2. The dataset does not include any variable or case names.

3. We can edit the dataset to give the variables proper names.

› The Class field is automatically treated as a nominal field, as it contains only three distinct nominal values.
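Outside the Trajan GUI, the same setup can be reproduced with a short script. A minimal sketch, assuming the raw UCI file iris.data and the pandas library (these are illustrative assumptions, not part of the assignment):

```python
# Minimal sketch: the raw file is comma-delimited with no header row,
# so we supply the variable names ourselves.
import pandas as pd

columns = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Class"]
df = pd.read_csv("iris.data", header=None, names=columns)

# "Class" holds only three distinct string values, so treat it as a nominal field.
df["Class"] = df["Class"].astype("category")
print(df.head())
```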

Page 8: Feature Selection in Machine Learning

Feature Selection Analysis

From the available variables, set the dependent and independent (output and input) variables.

Dependent Variable: Class
Independent Variables: Sepal Length, Sepal Width, Petal Length, Petal Width
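In script form, setting the dependent and independent variables simply means separating the output column from the input columns. A sketch continuing from the hypothetical df loaded earlier:

```python
# Dependent (output) variable and independent (input) variables,
# mirroring the configuration made in the Trajan dialog.
y = df["Class"]
X = df[["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]]
```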

Page 9: Feature Selection in Machine Learning

Why Feature Selection?

The dependent (output) variable states which flower class the record belongs to: either Virginica, Versicolor or Setosa.

The independent (input) variables are used to predict that decision.

Typically we do not have a strong idea of the relationship between the available variables and the desired prediction.

Page 10: Feature Selection in Machine Learning

Why Feature Selection?

To an extent, some neural network architectures (e.g., multilayer perceptrons) can actually learn to ignore useless variables.

However, other architectures (e.g., radial basis functions) are adversely affected, and in all cases a larger number of inputs implies that a larger number of training cases is required.

As a rule of thumb, the number of training cases should be a good few times larger than the number of weights in the network, to prevent over-learning. For example, an MLP with 4 inputs, 5 hidden units and 3 outputs already has 43 weights (including biases), so a few hundred training cases would be preferable.

Page 11: Feature Selection in Machine Learning

Why Feature Selection?

As a consequence, the performance of a network can be improved by reducing the number of inputs, sometimes even at the cost of losing some input information.

› In many problem domains, a range of input variables is available which may be used to train a neural network, but it is not clear which of them are most useful, or indeed needed at all.

Page 12: Feature Selection in Machine Learning

Why Feature Selection?

In non-linear problems, there may be interdependencies and redundancies between variables;
› for example, a pair of variables may be of no value individually, but extremely useful in conjunction, or any one of a set of parameters may be useful.

› It is not possible, in general, to simply rank parameters in order of importance.

Page 13: Feature Selection in Machine Learning

Why Feature Selection ? The "curse of dimensionality" means that it is

sometimes actually better to discard some variables that do have a genuine information content, simply to reduce the total number of input variables, and therefore the complexity of the problem, and the size of the network.

Counter-intuitively, this can actually improve the network's generalization capabilities.

Page 14: Feature Selection in Machine Learning

Why Feature Selection?

The only method guaranteed to select the best input set is to train networks with all possible input sets and all possible architectures, and to select the best.
› In practice, this is impossible for any significant number of candidate inputs.

If you wish to examine the selection of variables more closely yourself, Feature Selection is a good technique.

Page 15: Feature Selection in Machine Learning

Feature Selection

The Feature Selection algorithms conduct a large number of experiments with different combinations of inputs, building probabilistic or generalized regression networks for each combination, evaluating the performance, and using this to further guide the search.

This is a "brute force" technique that may sometimes find results much faster.

Page 16: Feature Selection in Machine Learning

Feature Selection

It explicitly identifies input variables that do not contribute significantly to the performance of the networks, and then suggests removing them.

These algorithms are either stepwise algorithms that progressively add or remove variables, or genetic algorithms.

Page 17: Feature Selection in Machine Learning

Sampling - Random

Randomized subset assignment to the train, select and test subsets.

Page 18: Feature Selection in Machine Learning

Sampling - Fixed

Fixed subset assignment to the train, select and test subsets.
› Add a column containing the nominal values "Train", "Select", "Test" and "Ignore". To generate the values, the support of a spreadsheet package may be needed (or a short script, as sketched below). Name the column NNSET.
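A minimal sketch of generating such an NNSET column with pandas/NumPy instead of a spreadsheet (the 50/25/25 split proportions and file names are assumptions for illustration, not Trajan defaults):

```python
# Randomly assign each case to "Train", "Select" or "Test" and store it in NNSET.
import numpy as np
import pandas as pd

columns = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Class"]
df = pd.read_csv("iris.data", header=None, names=columns)

rng = np.random.default_rng(seed=0)
df["NNSET"] = rng.choice(["Train", "Select", "Test"], size=len(df), p=[0.5, 0.25, 0.25])
df.to_csv("iris_with_nnset.csv", index=False)
```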

Page 19: Feature Selection in Machine Learning

Sampling - Fixed

Run the Feature Selection once and these subsets will be assigned after that.

Page 20: Feature Selection in Machine Learning

Sampling

A major problem with neural networks is the generalization issue (the tendency to overfit the training data), accompanied by the difficulty in quantifying likely performance on new data.

It is important to have ways to estimate the performance of the models on new data, and to be able to select among them.

Most work on assessing performance in neural modeling concentrates on approaches to resampling.

Page 21: Feature Selection in Machine Learning

Sampling

Typically the neural network is trained using the training subset.

The test subset is used to perform an unbiased estimate of the network's likely performance.

Page 22: Feature Selection in Machine Learning

Sampling

Often, a separate subset (the selection subset) is used to halt training to mitigate over-learning, or to select among a number of models trained with different parameters. It keeps an independent check on the performance of the networks during training, with deterioration in the selection error indicating over-learning.

If over-learning occurs, training is stopped and the network is restored to the state with the minimum selection error.
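A rough sketch of that selection-subset (early-stopping) logic, with scikit-learn's MLPClassifier trained one epoch at a time standing in for a Trajan network (the architecture, patience value and split are illustrative assumptions):

```python
# Track the selection (validation) error each epoch; keep a copy of the best model
# and stop once the selection error has deteriorated for several epochs in a row.
import copy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_sel, y_train, y_sel = train_test_split(X, y, test_size=0.3, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=1, warm_start=True)
best_error, best_net, patience = float("inf"), None, 0
for epoch in range(200):
    net.fit(X_train, y_train)                  # one more pass over the training subset
    sel_error = 1.0 - net.score(X_sel, y_sel)  # selection error
    if sel_error < best_error:
        best_error, best_net, patience = sel_error, copy.deepcopy(net), 0
    else:
        patience += 1
        if patience >= 10:                     # over-learning detected: stop training
            break

net = best_net  # restore the state with the minimum selection error
```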

Page 23: Feature Selection in Machine Learning

Feature Selection – Results Configuration

(Screenshot: results configuration dialog, with numbered callouts 6–7 explained on the next slide.)

Page 24: Feature Selection in Machine Learning

Feature Selection – Results Configuration

6. In the results shown after the analysis, each row represents a particular test of a combination of inputs, so every combination of inputs that was tried is shown.

7. It is sometimes a good idea to reduce the number of input variables to a network, even at the cost of a little performance, as this improves generalization capability and reduces the network size and execution cost.

Page 25: Feature Selection in Machine Learning

Feature Selection – Results Configuration

You can apply some extra pressure to eliminate unwanted variables by assigning a Unit Penalty.

› This is multiplied by the number of units in the network and added to the error level in assessing how good a network is, and thus penalizes larger networks.
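As a rough worked illustration (the numbers here are invented, not from the assignment): with a selection error of 0.05, a unit penalty of 0.003 and a network of 10 units, the score used to compare networks would be 0.05 + 0.003 × 10 = 0.08, so a smaller network with a slightly higher raw error can still be preferred.

```python
# Hypothetical illustration of the unit-penalty score (all values invented).
selection_error, unit_penalty, n_units = 0.05, 0.003, 10
score = selection_error + unit_penalty * n_units  # = 0.08; larger networks are penalized
```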

Page 26: Feature Selection in Machine Learning

Feature Selection – Results Configuration

If there are a large number of cases, the evaluations performed by the feature selection algorithms can be very time-consuming (the time taken is proportional to the number of cases).

› For this reason, you can specify a sub-sampling rate. (However, in this case as we have very few cases, the sampling rate of 1.0 (the default) is fine).

Page 27: Feature Selection in Machine Learning

Forward Selection

Begins by locating the single input variable that, on its own, best predicts the output variable. It then checks for a second variable that, added to the first, most improves the model. The process is repeated until either all variables have been selected or no further improvement is made.

Good for a larger number of variables.
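A minimal greedy forward-selection sketch in Python, as a rough analogue of what Trajan does internally (scikit-learn is assumed; a k-nearest-neighbour classifier and 5-fold cross-validated error stand in for Trajan's PNN and selection error):

```python
# Greedy forward selection: repeatedly add the single feature that most
# reduces the cross-validated error, stopping when no addition helps.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
feature_names = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]

def cv_error(columns):
    """Cross-validated error rate for the given subset of feature columns."""
    scores = cross_val_score(KNeighborsClassifier(), X[:, columns], y, cv=5)
    return 1.0 - scores.mean()

selected, remaining = [], list(range(X.shape[1]))
best_error = float("inf")
while remaining:
    # Try adding each remaining feature and keep the one that helps most.
    trials = {f: cv_error(selected + [f]) for f in remaining}
    best_feature = min(trials, key=trials.get)
    if trials[best_feature] >= best_error:
        break  # no further improvement: stop, as in stepwise forward selection
    best_error = trials[best_feature]
    selected.append(best_feature)
    remaining.remove(best_feature)

print([feature_names[f] for f in selected], best_error)
```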

Page 28: Feature Selection in Machine Learning

Forward Selection

Generally faster. Much faster if there are few relevant variables, as it will locate them at the beginning of its search.

Can behave sensibly when the data set has a large number of variables, as it selects only a few variables initially.

It may miss key variables if they are interdependent (that is, where two or more variables must be added at the same time in order to improve the model).

Page 29: Feature Selection in Machine Learning

Results

The row label indicates the stage (e.g. 2.3 indicates the third test in stage 2). The final row replicates the best result found, for convenience. The first column is the selection error of the Probabilistic Neural Network (PNN) or Generalized Regression Neural Network (GRNN). Subsequent columns indicate which inputs were selected for that particular combination.

Page 30: Feature Selection in Machine Learning

Results

(Result tables for unit penalties of 0, 0.001, 0.002, 0.003, 0.005 and 0.012.)

Conclusion: Considering the spread of the error values across the above penalty settings, Petal Width and Petal Length are good features to keep if the number of input features needs to be reduced.

Page 31: Feature Selection in Machine Learning

Backward Selection

The reverse process. Starts with a model including all the variables and then removes them one at a time, at each stage finding the variable that, when removed, least degrades the model.

Good for a smaller number of variables (20 or fewer).
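A minimal backward-elimination sketch using scikit-learn's SequentialFeatureSelector as a stand-in for Trajan's backward stepwise search (the k-nearest-neighbour evaluator and the choice to keep three inputs are assumptions for illustration):

```python
# Backward elimination: start from all four Iris inputs and drop the least useful ones.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
sfs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=3,
                                direction="backward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask over the four Iris inputs
```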

Page 32: Feature Selection in Machine Learning

Backward Selection

Doesn't suffer from missing key variables.

As it starts with the whole set of variables, the initial evaluations are the most time-consuming.

Suffers with a large number of variables, especially if there are only a few weakly predictive ones in the set.

Does not cut out the irrelevant variables until the very end of its search.

Page 33: Feature Selection in Machine Learning

Results

(Result tables for unit penalties of 0, 0.001, 0.002, 0.003, 0.004 and 0.012.)

Conclusion: Considering the spread of the error values across the above penalty settings, Petal Width, Petal Length and Sepal Length are good features to keep if the number of input features needs to be reduced.

Page 34: Feature Selection in Machine Learning

Genetic Algorithm

An optimization algorithm. Genetic algorithms are a particularly effective search technique for combinatorial problems (where a set of interrelated yes/no decisions needs to be made).

The method is time-consuming (it typically requires building and testing many thousands of networks).

Page 35: Feature Selection in Machine Learning

Genetic Algorithm

For reasonably-sized problem domains (perhaps 50-100 possible input variables, and cases numbering in the low thousands), the algorithm can be employed effectively overnight or at the weekend on a fast PC.

With sub-sampling, it can be applied in minutes or hours, although at the cost of reduced reliability for very large numbers of variables.

Page 36: Feature Selection in Machine Learning

Genetic Algorithm

Run with the default settings, it would perform 10,000 evaluations (100 population times 100 generations).

Since our problem has only 4 candidate inputs, the total number of possible combinations is only 16 (2 raised to the 4th power).
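For illustration only, here is a tiny genetic-algorithm sketch of the same idea in Python (scikit-learn assumed; the population size, generation count, the per-input penalty loosely analogous to the unit penalty, and the k-nearest-neighbour stand-in for Trajan's PNN are all arbitrary assumptions):

```python
# Each chromosome is a yes/no mask over the inputs; fitness is the
# cross-validated error plus a small penalty per selected input.
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
N_FEATURES, POP, GENERATIONS, PENALTY = X.shape[1], 20, 30, 0.003

def fitness(mask):
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0 + PENALTY * N_FEATURES       # empty subsets are worst
    error = 1.0 - cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=5).mean()
    return error + PENALTY * len(cols)          # penalize larger input sets

population = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    scored = sorted(population, key=fitness)
    parents = scored[: POP // 2]                # keep the better half
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_FEATURES)   # one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(N_FEATURES)        # single-bit mutation
        child[i] = 1 - child[i]
        children.append(child)
    population = parents + children

best = min(population, key=fitness)
print(best, fitness(best))
```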

Page 37: Feature Selection in Machine Learning

Results

(Result tables for unit penalties of 0, 0.003, 0.004 and 0.012.)

Conclusion: Considering the spread of the error values across the above penalty settings, Petal Width, Petal Length and Sepal Length are good features to keep if the number of input features needs to be reduced.

Page 38: Feature Selection in Machine Learning

Data Set (2) - Abalone

The age of an abalone can be determined by counting the number of rings. The number of rings is the value to predict from physical measurements.

Page 39: Feature Selection in Machine Learning

Data Set (2) - Abalone

Attribute Name  | Data Type  | Measurement Unit     | Description
Sex             | nominal    | M, F, and I (infant) |
Length          | continuous | mm                   | longest shell measurement
Diameter        | continuous | mm                   | perpendicular to length
Height          | continuous | mm                   | with meat in shell
Whole weight    | continuous | grams                | whole abalone
Shucked weight  | continuous | grams                | weight of meat
Viscera weight  | continuous | grams                | gut weight (after bleeding)
Shell weight    | continuous | grams                | after being dried
Rings           | integer    |                      | +1.5 gives the age in years

Page 40: Feature Selection in Machine Learning

Results - Forward Selection

Conclusion: Sex, Whole Weight and Shell Weight are good features to keep if the number of input features needs to be reduced.

Height doesn't give any useful effect.

(Result tables for a high penalty of 0.001 and a low penalty of 0.0001.)

Page 41: Feature Selection in Machine Learning

Results - Backward Selection

Conclusion: Sex, Whole Weight and Shell Weight are good features to keep if the number of input features needs to be reduced.

Height doesn't give any useful effect.

(Result tables for a high penalty of 0.001 and a low penalty of 0.0001.)

Page 42: Feature Selection in Machine Learning

Results - Genetic Algorithm

Conclusion: Sex, Whole Weight and Shell Weight are good features to keep if the number of input features needs to be reduced.

Height doesn't give any useful effect.

(Result tables with a sampling rate of 0.1, a high penalty of 0.001 and a low penalty of 0.0001.)

Page 43: Feature Selection in Machine Learning

Data Set (3) - Custom

4 Classes: C1, C2, C3, C4

17 Attribute Features: F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F17, F18

Page 44: Feature Selection in Machine Learning

Forward Selection Results

(Result tables for a high penalty of 0.001 and a low penalty of 0.0001.)

Conclusion: Feature F14 is a good feature to keep if the number of input features needs to be reduced.

Features F2, F5, F6, F7, F9 and F11 don't give any useful effect.

Page 45: Feature Selection in Machine Learning

Backward Selection Results

(Result tables for a high penalty of 0.001 and a low penalty of 0.0001.)

Conclusion: Feature F14 is a good feature to keep if the number of input features needs to be reduced.

Features F2, F5, F6, F7, F9 and F11 don't give any useful effect.

Page 46: Feature Selection in Machine Learning

Genetic Algorithm Results

(Result tables for a high penalty of 0.001 and a low penalty of 0.0001.)

Conclusion: Feature F14 is a good feature to keep if the number of input features needs to be reduced.

Features F2, F5, F6, F7, F9 and F11 don't give any useful effect.

Page 47: Feature Selection in Machine Learning

Reference

http://archive.ics.uci.edu/ml/datasets/Iris [Online; accessed 2015-10-25]

http://archive.ics.uci.edu/ml/datasets/Abalone [Online; accessed 2015-10-25]

Trajan Neural Network Simulator Help