Applicability Issues of the (Real-valued) Negative Selection Algorithms

Zhou Ji, Dipankar Dasgupta
The University of Memphis
GECCO 2006: July 11, 2006. Seattle.



Outline

- Background
- Applicable or not?
  - Whether and when
  - Issue by issue
- Conclusion

Background

Artificial Intelligence
  - …
  - Biology-inspired methods
    - Neural network
    - Evolutionary computation
    - Artificial immune system (AIS)
      - Immune network
      - Clonal selection
      - Negative selection algorithms
      - Other models
    - …
  - …

Basic idea of negative selection algorithms (NSA)

- The problem to solve: anomaly detection or one-class classification.
- Possible detectors are generated randomly.
- Those that cover the self region are eliminated.
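
The two steps above (random generation, then elimination of candidates that cover the self region) can be sketched in a few lines of Python. This is a minimal illustration only, assuming real-valued data in the unit hypercube, hyperspherical detectors, a Euclidean matching rule, and fixed radii. The names `generate_detectors`, `is_anomalous`, `self_radius`, and `detector_radius` are hypothetical and are not taken from the talk.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate_detectors(self_samples, n_detectors, self_radius, dim, rng,
                       max_tries=100_000):
    """Random generation + censoring (assumed setup, not the authors' code).

    Candidates are drawn uniformly from the unit hypercube; any candidate
    lying within `self_radius` of a self sample is discarded.
    """
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        if all(euclidean(candidate, s) > self_radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomalous(point, detectors, detector_radius):
    """A point is flagged as non-self if any detector covers it."""
    return any(euclidean(point, d) <= detector_radius for d in detectors)


# Toy data: self samples clustered around the centre of the unit square.
rng = random.Random(0)
self_samples = [[0.5 + rng.uniform(-0.1, 0.1), 0.5 + rng.uniform(-0.1, 0.1)]
                for _ in range(50)]
detectors = generate_detectors(self_samples, n_detectors=200,
                               self_radius=0.1, dim=2, rng=rng)
```

With `detector_radius` set equal to `self_radius`, no training sample can be flagged, because every surviving detector is farther than `self_radius` from every self sample.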

Variety of NSA

- Data and detector representation
  - Binary (or string) representation
  - Real-valued representation; detectors as hyperspheres or hyper-rectangles
  - Hybrid representation
- Generation/elimination mechanism
  - Random generation + censoring
  - Genetic algorithm
  - Greedy algorithm or other deterministic algorithm
- Matching rule
  - Rcb (r contiguous bits) for binary representation
  - Euclidean distance-based for real-valued representation
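
As an illustration of the binary matching rule listed above, the sketch below implements the usual reading of r contiguous bits: a detector matches a sample when the two strings agree in at least `r` consecutive positions. The function name `rcb_match` and the use of Python strings of `'0'` and `'1'` are assumptions made for this example.

```python
def rcb_match(detector: str, sample: str, r: int) -> bool:
    """Return True if the strings agree in at least r contiguous positions."""
    if len(detector) != len(sample):
        raise ValueError("detector and sample must have the same length")
    if r < 1:
        raise ValueError("r must be at least 1")
    run = 0  # length of the current run of agreeing positions
    for d, s in zip(detector, sample):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False
```

A larger `r` makes each detector more specific, so more detectors are needed to cover the same non-self space.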

Is NSA appropriate at all?

- What we know:
  - NSA is unique in its process and representation scheme.
  - There are some scenarios in which NSA has an advantage:
    - A large number of normal samples
    - A "negative database" to hide sample instances
- What we don't know:
  - In which applications does NSA always do best?

Is real-valued representation appropriate?

- It is important to choose a proper data representation to make it possible to differentiate between classes.
  - It is application specific.
  - The issue is general in all learning or classification methods.
- It is NSA's strength that different data representations can fit in.

Matching rule's influence

- It is linked with the choice of data representation.
- The issues are similar to those of data representation:
  - It is also application specific. There is no panacea.
  - The matching threshold is a real difficulty.

Positive selection or negative selection?

- The advantage of positive selection is "being more straightforward" in many problems.
- For the same reason, negative selection is more convenient in some cases, for example when there is a large number of self samples.
- The common foundation of the various negative selection algorithms is better explained with the biological metaphor.
- A naïve "self-detector" is not a solution.
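
For contrast, one possible reading of a naïve "self-detector" is a positive selection check that stores the self samples themselves and compares every query against all of them. The sketch below is only that reading, with a hypothetical function name `is_self` and a Euclidean threshold as assumptions. It shows why the approach is straightforward, and also why its cost per query grows with the number of self samples.

```python
import math


def is_self(point, self_samples, threshold):
    """Positive selection sketch: a point is self if it lies within
    `threshold` of at least one stored self sample."""
    return any(math.dist(point, s) <= threshold for s in self_samples)


# Every query scans the stored self samples, so the work per query is
# proportional to how many self samples are kept.
stored = [(0.50, 0.52), (0.48, 0.47), (0.55, 0.50)]
print(is_self((0.50, 0.50), stored, threshold=0.05))
print(is_self((0.90, 0.90), stored, threshold=0.05))
```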

General difficulties not specific to NSA

- One-class classification
  - Without counterexamples, the boundary between the two classes is more sensitive to the threshold.
  - With NSA, a good strategy like the "boundary-aware V-detector" could handle this issue very well in some cases.
- High dimensionality
  1. How to represent high-dimensional space effectively
  2. How many samples are necessary to represent a class
  - With NSA, the first difficulty is to some extent alleviated.
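
A rough way to see the first high-dimensionality difficulty is to compute how much of a unit hypercube a single hypersphere of fixed radius covers as the dimension grows. The sketch below uses the standard volume formula for an n-dimensional ball, V_n(r) = pi^(n/2) r^n / Gamma(n/2 + 1). The radius of 0.5 and the range of dimensions are arbitrary choices for illustration, not values from the talk.

```python
import math


def hypersphere_volume(dim: int, radius: float) -> float:
    """Volume of a dim-dimensional ball of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


# A ball of radius 0.5 is the largest one that fits inside the unit
# hypercube (volume 1), so the value printed is the covered fraction.
for dim in range(3, 13):
    print(dim, hypersphere_volume(dim, 0.5))
```

The covered fraction shrinks with every added dimension, which suggests why a fixed-size detector set covers less and less of the space.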

False criticisms of NSA

- NSA is useless because it doesn't solve the "curse of dimensionality".
- NSA doesn't work because it failed "this" experiment.
- NSA makes no sense because positive selection is more straightforward in such-and-such a case.
- …

Experiments

1. The difference between algorithm variations
2. Flexibility of NSA: different distance measures
3. NSA's behavior at high dimensionality

Difference between algorithm variations

Flexibility of NSA: different distance measures

- Generalize the Euclidean distance to the Minkowski distance of order m (m-norm distance or L-m distance).
- Different detector shapes result.
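
The generalization above can be written as d_m(a, b) = (sum_i |a_i - b_i|^m)^(1/m), with the Euclidean distance as the case m = 2. The following sketch is an illustrative implementation, not the authors' code. The function name `minkowski` and the handling of m = infinity as the maximum coordinate difference are assumptions made here.

```python
import math


def minkowski(a, b, m: float) -> float:
    """Minkowski distance of order m between two points (m >= 1).

    m = 1 is the Manhattan distance, m = 2 the Euclidean distance, and
    m = math.inf the Chebyshev (maximum coordinate difference) distance.
    """
    if m < 1:
        raise ValueError("order m must be at least 1")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(m):
        return max(diffs)
    return sum(d ** m for d in diffs) ** (1.0 / m)
```

In two dimensions, the set of points within a fixed distance of a detector centre is a diamond for m = 1, a circle for m = 2, and a square for m = infinity, which is how the choice of m changes the detector shape.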

NSA's behavior at high dimensionality: Setting-up of the experiment

[Figure: experimental setup, with labels "Self region" and "A detector"]

NSA's behavior at high dimensionality: The results

Dimensionality  Detection rate  SD    False alarm rate  SD     Number of detectors  SD
3               100             0.01  7.7               1.47   500                  0
4               99.94           0.07  13.11             6.92   500                  0
5               99.96           0.05  25.67             14.03  500                  0
6               99.98           0.04  36                48     500                  0
7               99.95           0.08  N/A               N/A    500.04               0.4
8               99.79           0.14  N/A               N/A    500.25               1.22
9               99.6            0.22  N/A               N/A    502.32               3.36
10              99.4            0.23  N/A               N/A    511.11               8.06
11              99.26           0.35  N/A               N/A    534.62               11.29
12              99.16           0.38  N/A               N/A    567.88               15.65

Conclusion

- Negative selection algorithms include many variations that are different in many ways.
- Negative selection algorithms apply to certain scenarios.
- There are still many questions to be answered in NSA.
- Other alternatives exist, but do not replace NSA.

Questions and comments?

Thank you!