Genetic-Algorithm-Based Instance and Feature Selection


Upload: carys

Post on 15-Jan-2016


Page 1: Genetic-Algorithm-Based Instance and Feature Selection

Genetic-Algorithm-Based Instance and Feature Selection

Instance Selection and Construction for Data Mining Ch. 6

H. Ishibuchi, T. Nakashima, and M. Nii

Page 2: Genetic-Algorithm-Based Instance and Feature Selection

Abstract

A GA-based approach for selecting a small number of instances from a given data set in a pattern classification problem.

The aim is to improve the classification ability of a nearest neighbor classifier by searching for an appropriate reference set.

Page 3: Genetic-Algorithm-Based Instance and Feature Selection

Genetic Algorithm

Coding: binary string of length (n + m)

$S = a_1 a_2 \cdots a_n \, s_1 s_2 \cdots s_m$

$a_i$ : inclusion or exclusion of the i-th feature
$s_p$ : inclusion or exclusion of the p-th instance

Fitness function: minimize |F|, minimize |P|, and maximize g(S)
|F| : number of selected features
|P| : number of selected instances
g(S) : classification performance
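To make the coding concrete, here is a minimal sketch of how a string of length (n + m) could be split into the selected feature set F and the selected instance set P. The function name `decode` and the 0-based indices are assumptions for illustration; the slides only say that each bit marks inclusion or exclusion, and the sketch assumes 1 means "included".

```python
def decode(chromosome, n_features):
    """Split S = a_1..a_n s_1..s_m into index lists F and P.

    Assumption: a bit value of 1 means "included", 0 means "excluded".
    """
    feature_bits = chromosome[:n_features]
    instance_bits = chromosome[n_features:]
    F = [i for i, a in enumerate(feature_bits) if a == 1]
    P = [p for p, s in enumerate(instance_bits) if s == 1]
    return F, P


# n = 3 features, m = 5 instances
S = [1, 0, 1, 0, 1, 1, 0, 1]
F, P = decode(S, n_features=3)
print(F, P)  # [0, 2] [1, 2, 4]
```

Here |F| = 2 and |P| = 3, the two quantities the fitness function tries to keep small.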

Page 4: Genetic-Algorithm-Based Instance and Feature Selection

Genetic Algorithm

Performance measure (first one): $g_A(S)$
The number of the m given instances that are correctly classified using the selected reference set.
Minimize |P| subject to $g_A(S) = m$

Performance measure (second one): $g_B(S)$
The same count, except that when an instance $x_q$ is included in the reference set, $x_q$ is not selected as its own nearest neighbor.

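Nearest neighbor distance used for $g_B(S)$:

$$
d_F(x_{p^*}, x_q) =
\begin{cases}
\min\{\, d_F(x_p, x_q) \mid x_p \in P \,\} & \text{if } x_q \notin P \\
\min\{\, d_F(x_p, x_q) \mid x_p \in P - \{x_q\} \,\} & \text{if } x_q \in P
\end{cases}
$$

Fitness:

$$
fitness(S) = W_g \cdot g(S) - W_F \cdot |F| - W_P \cdot |P|
$$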
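The two performance measures and the weighted fitness can be sketched as follows. This is an illustrative reading, not the authors' code: the fitness is taken as W_g·g(S) − W_F·|F| − W_P·|P|, which matches the stated goal of maximizing g(S) while minimizing |F| and |P|. The names `sq_dist`, `nearest`, `g`, and `fitness` are hypothetical, squared Euclidean distance over the selected features is an assumption, and so is the choice to count an instance as misclassified when F or the candidate set is empty.

```python
def sq_dist(x, y, F):
    """Squared Euclidean distance restricted to the selected features F."""
    return sum((x[i] - y[i]) ** 2 for i in F)


def nearest(q, X, F, P, exclude_self):
    """Index of the nearest reference instance to X[q], or None if there is none.

    exclude_self=False corresponds to g_A; exclude_self=True corresponds to
    g_B, where x_q may not be its own nearest neighbor.
    """
    candidates = [p for p in P if not (exclude_self and p == q)]
    if not candidates or not F:  # assumption: treated as "cannot classify"
        return None
    return min(candidates, key=lambda p: sq_dist(X[p], X[q], F))


def g(X, y, F, P, exclude_self=False):
    """Number of instances whose nearest reference instance has the same class."""
    correct = 0
    for q in range(len(X)):
        p = nearest(q, X, F, P, exclude_self)
        if p is not None and y[p] == y[q]:
            correct += 1
    return correct


def fitness(X, y, F, P, W_g=5.0, W_F=1.0, W_P=1.0, exclude_self=False):
    return W_g * g(X, y, F, P, exclude_self) - W_F * len(F) - W_P * len(P)


X = [[0.0], [0.1], [1.0], [1.1]]
y = [0, 0, 1, 1]
print(fitness(X, y, F=[0], P=[0, 2]))                     # 17.0 (g_A = 4)
print(fitness(X, y, F=[0], P=[0, 2], exclude_self=True))  # 7.0  (g_B = 2)
```

With g_A the two reference instances classify themselves correctly for free; g_B removes that advantage, which is why the same S scores lower.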

Page 5: Genetic-Algorithm-Based Instance and Feature Selection

Genetic Algorithm

1. Initialization

2. Genetic Operation: Iterate the following procedure Npop/2 times to generate Npop strings

   2.1. Randomly select a pair of strings

   2.2. Apply a uniform crossover

   2.3. Apply a mutation operator

3. Generation Update: Select the Npop best strings from the 2Npop strings

4. Termination test
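The four steps above can be sketched as a short loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_ga`, `uniform_crossover`, and `mutate` are hypothetical names, the initial population is random bits, the pair of parents is drawn without replacement, the 2Npop pool is taken to be the current strings plus the new ones, the termination test is a fixed generation count, and a toy "count the ones" fitness stands in for the real one.

```python
import random


def uniform_crossover(a, b, rng):
    """Swap each bit position between the two parents with probability 0.5."""
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return c1, c2


def mutate(s, pm, rng):
    """Flip each bit independently with probability pm."""
    return [1 - bit if rng.random() < pm else bit for bit in s]


def run_ga(fitness, length, n_pop=50, generations=500, pm=0.01, seed=0):
    rng = random.Random(seed)
    # 1. Initialization (assumption: uniformly random bits)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_pop)]
    # 4. Termination test (assumption: fixed number of generations)
    for _ in range(generations):
        # 2. Genetic operation: n_pop / 2 pairs give n_pop new strings
        children = []
        for _ in range(n_pop // 2):
            p1, p2 = rng.sample(pop, 2)
            c1, c2 = uniform_crossover(p1, p2, rng)
            children.append(mutate(c1, pm, rng))
            children.append(mutate(c2, pm, rng))
        # 3. Generation update: keep the n_pop best of the 2 * n_pop strings
        pop = sorted(pop + children, key=fitness, reverse=True)[:n_pop]
    return pop[0]


best = run_ga(fitness=sum, length=10, n_pop=10, generations=50)
print(best, sum(best))
```

Because the update keeps the best strings from parents and children together, the best fitness in the population never decreases from one generation to the next.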

Page 6: Genetic-Algorithm-Based Instance and Feature Selection

Numerical Example

Page 7: Genetic-Algorithm-Based Instance and Feature Selection

Biased Mutation

An effective way to decrease the number of selected instances is to bias the mutation probability.

In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0 than to the mutation from sp = 0 to sp = 1.
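A minimal sketch of the biased mutation on the instance part of the string follows. The function name `biased_mutation` is hypothetical; the default probabilities (0.1 for 1 to 0, 0.01 for 0 to 1) are the values listed in the parameter specifications of this presentation.

```python
import random


def biased_mutation(instance_bits, pm_1_to_0=0.1, pm_0_to_1=0.01, rng=random):
    """Flip instance bits with a direction-dependent probability.

    A selected instance (bit 1) is dropped with probability pm_1_to_0, while an
    unselected instance (bit 0) is added with the smaller probability pm_0_to_1.
    """
    mutated = []
    for s in instance_bits:
        pm = pm_1_to_0 if s == 1 else pm_0_to_1
        mutated.append(1 - s if rng.random() < pm else s)
    return mutated


rng = random.Random(0)
bits = [1] * 20 + [0] * 20
print(sum(biased_mutation(bits, rng=rng)))  # number of instances still selected
```

Since removals are ten times more likely than additions, repeated application pushes the population toward strings with few selected instances, and the fitness term for g(S) counteracts removals that hurt classification.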

Page 8: Genetic-Algorithm-Based Instance and Feature Selection

Data Sets

2 artificial + 4 real

Artificial:
  Normal distribution with small overlap
  Normal distribution with large overlap

Real:
  Iris data
  Appendicitis data
  Cancer data
  Wine data

Page 9: Genetic-Algorithm-Based Instance and Feature Selection

Parameter Specifications

Population size: 50
Crossover probability: 1.0
Mutation probability:
  Pm = 0.01 for feature selection
  Pm(1 → 0) = 0.1 for instance selection
  Pm(0 → 1) = 0.01 for instance selection
Stopping condition: 500 generations
Weight values: Wg = 5, WF = 1, WP = 1
Performance measure: gA(S) or gB(S)
30 trials for each data set

Page 10: Genetic-Algorithm-Based Instance and Feature Selection

Performance on Training Data

Page 11: Genetic-Algorithm-Based Instance and Feature Selection

Performance on Test Data

Leaving-one-out procedure (iris & appendicitis)
10-fold cross-validation (cancer & wine)
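Both evaluation procedures can be expressed with one index splitter, since leaving-one-out is k-fold cross-validation with k equal to the number of instances. The sketch below is illustrative only: `k_fold_splits` is a hypothetical helper, the folds are assigned round-robin without shuffling, and it assumes that selection is run on the training indices and the resulting reference set is scored on the held-out indices.

```python
def k_fold_splits(m, k):
    """Yield (train_indices, test_indices) for k folds over m instances.

    Assumption: instance i goes to fold i % k, with no shuffling.
    """
    folds = [list(range(i, m, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(m) if i not in held_out]
        yield train, test


# Leaving-one-out on 4 instances: each instance is held out once.
for train, test in k_fold_splits(4, 4):
    print(train, test)

# 10-fold cross-validation on 25 instances: fold sizes differ by at most one.
print([len(test) for _, test in k_fold_splits(25, 10)])
```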

Page 12: Genetic-Algorithm-Based Instance and Feature Selection

Effect of Feature Selection

Page 13: Genetic-Algorithm-Based Instance and Feature Selection

Effect on NN

Page 14: Genetic-Algorithm-Based Instance and Feature Selection

Some Variants