
Feature Selection

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

Feature Selection

• Given a set of n features, the goal of feature selection is to select a subset of d features (d < n) in order to minimize the classification error.

• Why perform feature selection?
– Data interpretation / knowledge discovery (insights into which factors are most representative of your problem)
– Curse of dimensionality (the amount of data needed grows exponentially with the number of features, O(2^n))

• Fundamentally different from dimensionality reduction (which we will discuss next time), which is based on feature combinations (i.e., feature extraction).


Feature Selection vs. Dimensionality Reduction

• Feature Selection
– When classifying novel patterns, only a small number of features need to be computed (i.e., faster classification).
– The measurement units (length, weight, etc.) of the features are preserved.

• Dimensionality Reduction (next time)
– When classifying novel patterns, all features need to be computed.
– The measurement units (length, weight, etc.) of the features are lost.

Feature Selection Steps

• Feature selection is an optimization problem.

– Step 1: Search the space of possible feature subsets.

– Step 2: Pick the subset that is optimal or near-optimal with respect to some objective function.

Feature Selection Steps (cont’d)

• Search strategies
– Optimal
– Heuristic

• Evaluation strategies
– Filter methods
– Wrapper methods

Evaluation Strategies

• Filter Methods
– Evaluation is independent of the classification algorithm.
– The objective function evaluates feature subsets by their information content, typically interclass distance, statistical dependence, or information-theoretic measures.

Evaluation Strategies (cont’d)

• Wrapper Methods
– Evaluation uses criteria related to the classification algorithm.
– The objective function is a pattern classifier, which evaluates feature subsets by their predictive accuracy (recognition rate on test data), estimated by statistical resampling or cross-validation.

Filter vs. Wrapper Approaches

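The contrast can be made concrete with a short sketch (assuming scikit-learn is available; the dataset, the candidate subset, and the classifier are illustrative stand-ins, not part of the slides). The same feature subset is scored once with a filter criterion (an information-theoretic measure, independent of any classifier) and once with a wrapper criterion (cross-validated accuracy of a specific classifier):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)     # stand-in dataset
subset = [0, 3, 7, 21]                          # an arbitrary candidate feature subset

# Filter criterion: information content of the subset, independent of any classifier.
filter_score = mutual_info_classif(X[:, subset], y, random_state=0).sum()

# Wrapper criterion: cross-validated accuracy of an actual classifier on the subset.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
wrapper_score = cross_val_score(clf, X[:, subset], y, cv=5).mean()

print(f"filter score  (total mutual information): {filter_score:.3f}")
print(f"wrapper score (5-fold CV accuracy):       {wrapper_score:.3f}")
```

The filter score is cheap and classifier-agnostic; the wrapper score is tied to the classifier that will actually be deployed but requires training it repeatedly.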

Search Strategies

• Assuming n features, an exhaustive search would require:

– Examining all possible subsets of size d.

– Selecting the subset that performs the best according to the criterion function.

• The number of subsets of size d is $\binom{n}{d} = \frac{n!}{d!\,(n-d)!}$, which grows combinatorially, making exhaustive search impractical.

• In practice, heuristics are used to speed up the search, but they cannot guarantee optimality.
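For a sense of scale, a quick Python computation of the binomial coefficient (the particular values of n and d are only illustrative):

```python
from math import comb

n = 28                                   # total number of features
for d in (2, 5, 10, 14):
    print(f"subsets of size {d:2d}: {comb(n, d):,}")
# Summing over all sizes gives every non-empty subset.
print(f"all non-empty subsets: {2**n - 1:,}")
```

Even for n = 28 the counts reach the tens of millions, and the total number of non-empty subsets is 2^28 − 1.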

Naïve Search

• Sort the given n features in order of their probability of correct recognition.

• Select the top d features from this sorted list.

• Disadvantage
– Correlation among features is not considered.
– The best pair of features may not even contain the best individual feature.
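A minimal sketch of this naïve ranking, assuming scikit-learn and a stand-in dataset; each feature is scored on its own and the top d are kept, which is exactly where the correlation problem comes from:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)     # stand-in dataset
d = 5

def single_feature_score(j):
    """Cross-validated accuracy achieved by feature j on its own."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X[:, [j]], y, cv=5).mean()

scores = [single_feature_score(j) for j in range(X.shape[1])]

# Keep the d individually best features; correlations between features are ignored,
# so the best *pair* may not even contain the best single feature.
top_d = np.argsort(scores)[::-1][:d]
print("selected features:", sorted(top_d.tolist()))
```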

Sequential forward selection (SFS) (heuristic search)

• First, the best single feature is selected (i.e., using some criterion function).

• Then, pairs of features are formed using one of the remaining features and this best feature, and the best pair is selected.

• Next, triplets of features are formed using one of the remaining features and these two best features, and the best triplet is selected.

• This procedure continues until a predefined number of features are selected.


SFS performs best when the optimal subset is small.
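A minimal sketch of SFS used as a wrapper, assuming scikit-learn; the stand-in `criterion` (cross-validated accuracy) plays the role of “some criterion function”, and d is kept small only to limit runtime:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)          # stand-in dataset (30 features)

def criterion(features):
    """Wrapper criterion: 5-fold cross-validated accuracy on the given feature subset."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X[:, features], y, cv=5).mean()

def sfs(n_features, d):
    """Greedily add the feature that most improves the criterion until d are selected."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < d:
        best = max(remaining, key=lambda j: criterion(selected + [j]))
        selected.append(best)
        remaining.remove(best)
        print(f"added feature {best:2d}  ->  accuracy {criterion(selected):.3f}")
    return selected

subset = sfs(X.shape[1], d=4)                        # d kept small for runtime
print("selected subset:", subset)
```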

Example

Results of sequential forward feature selection for classification of a satellite image using 28 features. The x-axis shows the classification accuracy (%) and the y-axis shows the features added at each iteration (the first iteration is at the bottom). The highest accuracy value is shown with a star.

Sequential backward selection (SBS) (heuristic search)

• First, the criterion function is computed for all n features.

• Then, each feature is deleted one at a time, the criterion function is computed for all subsets with n-1 features, and the worst feature is discarded.

• Next, each feature among the remaining n-1 is deleted one at a time, and the worst feature is discarded to form a subset with n-2 features.

• This procedure continues until a predefined number of features are left.


SBS performs best when the optimal subset is large.
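SBS is the mirror image of the SFS sketch above. A sketch, written against the same kind of subset-scoring `criterion` callable (passed in as a parameter so any filter or wrapper criterion can be plugged in, e.g., the cross-validation one from the SFS sketch):

```python
def sbs(n_features, d, criterion):
    """Start from all features; repeatedly discard the 'worst' one until d remain."""
    selected = list(range(n_features))
    while len(selected) > d:
        # The worst feature is the one whose removal leaves the best-scoring subset.
        worst = max(selected, key=lambda j: criterion([f for f in selected if f != j]))
        selected.remove(worst)
        print(f"removed feature {worst:2d}")
    return selected
```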

Example

Results of sequential backward feature selection for classification of a satellite image using 28 features. The x-axis shows the classification accuracy (%) and the y-axis shows the features removed at each iteration (the first iteration is at the top). The highest accuracy value is shown with a star.

Bidirectional Search (BDS)

• BDS applies SFS and SBS simultaneously:
– SFS is performed from the empty set.
– SBS is performed from the full set.

• To guarantee that SFS and SBS converge to the same solution:
– Features already selected by SFS are not removed by SBS.
– Features already removed by SBS are not added by SFS.
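A sketch of BDS in the same style, with `criterion` again a hypothetical subset-scoring callable; the two constraints above are what keep the forward and backward searches consistent:

```python
def bds(n_features, criterion):
    """Run SFS from the empty set and SBS from the full set until they meet."""
    forward = []                         # grown by the SFS half
    backward = list(range(n_features))   # shrunk by the SBS half; always a superset of `forward`
    while len(forward) < len(backward):
        # Forward step: only features not yet removed by SBS may be added.
        addable = [j for j in backward if j not in forward]
        best = max(addable, key=lambda j: criterion(forward + [j]))
        forward.append(best)
        if len(forward) == len(backward):
            break
        # Backward step: only features not yet selected by SFS may be removed.
        removable = [j for j in backward if j not in forward]
        worst = max(removable, key=lambda j: criterion([f for f in backward if f != j]))
        backward.remove(worst)
    return forward                       # equals `backward` when the two searches meet
```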

Limitations of SFS and SBS

• The main limitation of SFS is that it is unable to remove features that become non-useful after the addition of other features.

• The main limitation of SBS is its inability to re-evaluate the usefulness of a feature after it has been discarded.

• We will examine some generalizations of SFS and SBS:
– “Plus-L, minus-R” selection (LRS)
– Sequential floating forward/backward selection (SFFS and SFBS)

“Plus-L, minus-R” selection (LRS)

• A generalization of SFS and SBS

– If L > R, LRS starts from the empty set and:
• Repeatedly adds L features
• Repeatedly removes R features

– If L < R, LRS starts from the full set and:
• Repeatedly removes R features
• Repeatedly adds L features

• Its main limitation is the lack of a theory to help choose the optimal values of L and R.
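A sketch of the L > R case, with hypothetical single-step helpers `add_best` and `remove_worst` (one SFS step and one SBS step, respectively) and the same kind of subset-scoring `criterion` callable as in the earlier sketches:

```python
def add_best(selected, remaining, criterion):
    """One SFS step: move the single best feature from `remaining` into `selected`."""
    best = max(remaining, key=lambda j: criterion(selected + [j]))
    selected.append(best)
    remaining.remove(best)

def remove_worst(selected, remaining, criterion):
    """One SBS step: move the single worst feature from `selected` back to `remaining`."""
    worst = max(selected, key=lambda j: criterion([f for f in selected if f != j]))
    selected.remove(worst)
    remaining.append(worst)

def lrs(n_features, d, L, R, criterion):
    """'Plus-L, minus-R' selection for the L > R case (starts from the empty set)."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < d:
        for _ in range(L):               # repeatedly add L features...
            if remaining:
                add_best(selected, remaining, criterion)
        for _ in range(R):               # ...then remove R of them again
            if len(selected) > 1:
                remove_worst(selected, remaining, criterion)
    return selected                      # may slightly overshoot d; L and R are usually small
```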

Sequential floating forward/backward selection (SFFS and SFBS)

• An extension to LRS:
– Rather than fixing the values of L and R, floating methods determine these values from the data.
– The dimensionality of the subset during the search can be thought of as “floating” up and down.

• Two floating methods:
– Sequential floating forward selection (SFFS)
– Sequential floating backward selection (SFBS)

P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature selection, Pattern Recognition Lett. 15 (1994) 1119–1125.

Sequential floating forward selection (SFFS)

• Sequential floating forward selection (SFFS) starts from the empty set.

• After each forward step, SFFS performs backward steps as long as the objective function increases.

Sequential floating backward selection (SFBS)

• Sequential floating backward selection (SFBS) starts from the full set.

• After each backward step, SFBS performs forward steps as long as the objective function increases.
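A sketch of SFFS, reusing the hypothetical `add_best` helper and `criterion` callable from the LRS sketch; the per-size bookkeeping reflects the floating idea that a backward step is only accepted if it beats the best subset of that size found so far (SFBS is the same procedure with the roles of the forward and backward steps swapped):

```python
def sffs(n_features, d, criterion):
    """SFS with conditional exclusion: backtrack after each forward step while it helps."""
    selected, remaining = [], list(range(n_features))
    best_score = {}                                   # best criterion value seen per subset size
    while len(selected) < d:
        add_best(selected, remaining, criterion)      # one forward step
        best_score[len(selected)] = max(best_score.get(len(selected), float("-inf")),
                                        criterion(selected))
        # Conditional backward steps: keep removing features only while the reduced
        # subset beats the best subset of that smaller size found so far.
        while len(selected) > 2:
            worst = max(selected, key=lambda f: criterion([x for x in selected if x != f]))
            reduced = [x for x in selected if x != worst]
            if criterion(reduced) > best_score.get(len(reduced), float("-inf")):
                selected.remove(worst)
                remaining.append(worst)
                best_score[len(reduced)] = criterion(reduced)
            else:
                break
    return selected
```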

Feature Selection using GAs (randomized search)

[Diagram: Data → Feature Extraction → Feature Selection (GA) → Feature Subset → Classifier]

• GAs provide a simple, general, and powerful framework for feature selection.

Feature Selection Using GAs (cont’d)

• Binary encoding: 1 means “choose feature” and 0 means “do not choose feature”, for each of the N features (bits 1 … N of the chromosome).

• Fitness evaluation (to be maximized):

Fitness = w1 × accuracy + w2 × #zeros

where accuracy is the classification accuracy using a validation set, #zeros is the number of zeros in the chromosome (features not selected out of the N total), and w1 >> w2.
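A sketch of the encoding and fitness computation only (the GA machinery of selection, crossover, and mutation is omitted), assuming scikit-learn; the dataset, the validation split, and the weight values are illustrative, chosen so that w1 >> w2 as on the slide:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)            # stand-in dataset, N = 30 features
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
N = X.shape[1]
w1, w2 = 100.0, 0.1                                    # illustrative weights, w1 >> w2

def fitness(chromosome):
    """chromosome: length-N binary vector; 1 = choose feature, 0 = do not choose."""
    chosen = np.flatnonzero(chromosome)
    if chosen.size == 0:
        return 0.0                                     # an empty subset is useless
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_tr[:, chosen], y_tr)
    accuracy = clf.score(X_val[:, chosen], y_val)      # accuracy on a validation set
    zeros = N - chosen.size                            # number of features NOT selected
    return w1 * accuracy + w2 * zeros                  # Fitness = w1*accuracy + w2*#zeros

rng = np.random.default_rng(0)
chromosome = rng.integers(0, 2, size=N)                # one random candidate; a GA would
print(f"fitness: {fitness(chromosome):.2f}")           # evolve a whole population of these
```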

Feature Selection Summary

• Has the two-fold advantage of providing some interpretation of the data and making the learning problem easier.

• Finding the global optimum is impractical in most situations; rely on heuristics instead (greedy / random search).

• Filtering is fast and general but can pick a large number of features.

• Wrapping accounts for model bias but is MUCH slower because multiple models must be trained.