Negative Selection Algorithms at GECCO 2005 7/22/2005


Page 1: Negative Selection Algorithms at GECCO 2005, 7/22/2005

Page 2

AIS track of GECCO 2005

• 11 regular papers
  – 5 “negative selection algorithm” related
  – 3 “immune network model” related
  – multi-agent simulation, gene library, antigenic search
• 2 posters
  – immune network model, clonal selection

Page 3

Papers on “negative selection algorithms”

• Ji & Dasgupta, “Estimating the detector coverage in a negative selection algorithm”
• Gonzalez et al., “Discriminating and visualizing anomalies using negative selection algorithm and self-organizing maps”
• Stibor et al., “Is negative selection appropriate for anomaly detection?”
• Shapiro et al., “An evolutionary algorithm to generate hyper-ellipsoid detectors for negative selection”
• Hang et al., “Applying both positive and negative selection to supervised learning for anomaly detection”

Page 4

“Discriminating and visualizing anomalies using negative selection algorithm and self-organizing maps”

Main idea:
• Combination of NS and SOM (self-organizing map)
• Visualize the anomalies

Page 5

Key feature

• Using negative selection to produce artificial anomalies instead of detectors
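This inversion of the usual scheme — keep the random candidates that survive censoring against self, and use them as anomaly samples rather than detectors — can be sketched as follows. This is our own minimal illustration; the function name, the unit-hypercube encoding, and the single self-radius parameter are assumptions, not the paper's RRNS implementation.

```python
import random

def generate_artificial_anomalies(self_samples, n_anomalies, self_radius, dim, seed=0):
    """Negative-selection-style generation of artificial anomaly points.

    Random candidates in the unit hypercube are kept only if they fall
    outside self_radius of every normal (self) sample.
    """
    rng = random.Random(seed)
    anomalies = []
    while len(anomalies) < n_anomalies:
        cand = [rng.random() for _ in range(dim)]
        # Euclidean distance to the nearest self sample
        nearest = min(
            sum((c - s) ** 2 for c, s in zip(cand, s_pt)) ** 0.5
            for s_pt in self_samples
        )
        if nearest > self_radius:  # candidate lies in non-self space
            anomalies.append(cand)
    return anomalies

# Usage: self region clustered near one corner of the unit square
normal = [[0.1, 0.1], [0.15, 0.2], [0.2, 0.1]]
artificial = generate_artificial_anomalies(normal, 5, 0.3, 2)
```

The artificial anomalies can then be fed, together with the normal samples, into any supervised learner — here, into the SOM of the next slides.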

Page 6

SOM

• A type of neural network
• Captures the features in the input and provides a structural representation
• Output neurons are organized in a one- or two-dimensional lattice
• The weight vectors of these neurons represent prototypes (cluster centroids)
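As background, the lattice-of-prototypes idea can be sketched with a toy from-scratch SOM fit — not the SOM_PAK implementation the paper used; the decay schedules and function names here are our own:

```python
import math
import random

def train_som(data, grid_w, grid_h, epochs=50, lr0=0.5, seed=0):
    """Train a tiny self-organizing map (illustrative sketch).

    Each lattice node holds a weight vector (prototype). For every input,
    the best-matching unit (BMU) and its lattice neighbours are pulled
    toward the input; learning rate and neighbourhood radius shrink
    linearly over the epochs.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # one weight vector per node in a grid_w x grid_h lattice
    w = {(i, j): [rng.random() for _ in range(dim)]
         for i in range(grid_w) for j in range(grid_h)}
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        # neighbourhood radius starts wide enough to cover the whole grid
        radius = max(grid_w, grid_h) * (1 - t / epochs) + 0.5
        for x in data:
            # BMU = node whose weight vector is closest to the input
            bmu = min(w, key=lambda n: sum((a - b) ** 2 for a, b in zip(w[n], x)))
            for n, wv in w.items():
                d = math.dist(n, bmu)  # distance in the lattice, not in input space
                if d <= radius:
                    h = math.exp(-d * d / (2 * radius * radius))
                    for k in range(dim):
                        wv[k] += lr * h * (x[k] - wv[k])
    return w
```

After training, each node's weight vector is a prototype, and (as in the paper) each node can be coloured by the category of the samples it ends up representing.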

Page 7

Three phases of NS-SOM

Page 8

NS-SOM model

• “Training the SOM with only normal samples will produce a map that only reflects the structure of the self space, ignoring the non-self space”
• N-dimensional, real-valued
• During the second phase: if the input samples are labeled, … (moving to the third phase)
• The first phase is executed just once, but the second and third phases could be executed as many times as sets of new samples are available
• Visual representation by a 2-D grid corresponding to the network

Page 9

SOM output

• “A visual representation of the feature (self/non-self) space could be generated by drawing the 2-dimensional grid corresponding to the network, and assigning each node a different color depending on the category it represents (normal, unknown anomaly, or known anomaly).”
• “Two different SOM topologies were used with a rectangular output layer of 8×8 and 16×16 nodes.”

Page 10

Output visualization

Page 11

• Implementation
  – NS: the RRNS algorithm by Gonzalez et al.
  – SOM: the SOM_PAK package by Helsinki University of Technology, http://www.cis.hut.fi/
• Experiments
  – Iris data set
  – Wisconsin Breast Cancer data set

Page 12

“Is negative selection appropriate for anomaly detection?”

• Problems in negative selection (specific schemes and applications)
• Comparison with SVM (Support Vector Machine): requiring examples of one class or two classes?

Page 13

• General problem: candidates are generated by a simple random search
• Shape space <-> affinity
• “Holes are necessary to generalize beyond the training set”
  – No holes: overfitting
  – Too many holes: underfitting

Page 14

Criticism of the binary representation

• “The Hamming shape-space and the r-chunk matching rule are only appropriate and applicable for anomaly detection problems with a small value of l (e.g. 0 < l < 32)”
  – Based entirely on Esponda et al.’s analysis of the number of holes

* Although I want to focus on introducing rather than criticizing this work: the authors seem confused between the Hamming and r-chunk matching rules.
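For reference, the two rules being conflated are genuinely different; minimal encodings of each (our own sketch, using the standard definitions rather than the paper's notation):

```python
def r_chunk_match(detector, string, r):
    """r-chunk rule: a detector is a (position, chunk-of-length-r) pair; it
    matches a binary string if the string's substring at that position
    equals the chunk."""
    pos, chunk = detector
    return string[pos:pos + r] == chunk

def r_contiguous_match(a, b, r):
    """Hamming shape-space r-contiguous rule: two equal-length strings
    match if they agree in at least r contiguous positions."""
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

# An r-chunk detector constrains only one window; r-contiguous matching
# compares whole strings, so the induced hole structures differ.
```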

Page 15

Criticism of the real-valued representation

• Positive selection (Self Detector Classification) is more straightforward.
• It is not clear how to choose the self radius.
  – “From our point of view, it is an approach which requires two classes in the learning phase in order to determine the self-radius.” – no reason given.
• It is a problem how to find an optimal distribution of the detectors (Gonzalez et al.’s method takes “a vast amount of time”).

Page 16

Occam’s razor principle

• When you have two competing theories which make exactly the same predictions, the one that is simpler is the better.

Page 17

Comparison with SVM

• SVM is a machine learning algorithm for two-class classification problems.
• The input data are mapped into a higher-dimensional feature space, where a linear decision region is constructed.
• A one-class SVM was proposed by Scholkopf et al.
  – Provides good results in high-dimensional spaces (no details or results provided)

Page 18

Summary

• Unfortunately, the paper cites several related works and then makes an alarming claim.
• Little was done to analyze the problem or propose alternatives, except for “Self Detector Classification” – detection by directly checking all training samples.
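The alternative the paper does propose amounts to a nearest-neighbour check against the raw training set, which can be sketched in a few lines (our own naming and distance choice, not the paper's exact formulation):

```python
def self_detector_classify(x, self_samples, self_radius):
    """'Self Detector Classification' sketch: flag x as anomalous unless it
    lies within self_radius of at least one training (self) sample."""
    nearest = min(
        sum((a - b) ** 2 for a, b in zip(x, s)) ** 0.5
        for s in self_samples
    )
    return "normal" if nearest <= self_radius else "anomaly"
```

Note this sidesteps detector generation entirely but pays for it at detection time, since every query is compared against all training samples.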

Page 19

“Applying both positive and negative selection to supervised learning for anomaly detection”

• Uses synthetic anomalies to deal with anomaly detection (supervised learning from class-imbalanced data sets)
  – GA: positive selection
  – Synthetic data: negative selection
• Categorical/discrete data

Page 20

Two categories of methods

• At the data level: mainly focusing on re-sampling
  – Under-sampling the normal class
  – Over-sampling the anomaly class
  – A combination of both
• At the algorithm level

Page 21

Other works using this strategy

• Gonzalez et al.
• SMOTE (Synthetic Minority Over-sampling Technique)
  – “taking each minority class sample and introducing synthetic examples along the line segment joining any/all of the k minority class nearest neighbors.”

Page 22

How SMOTE generates synthetic samples
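The slide's figure is not reproduced here, but the quoted mechanism can be sketched directly (a simplified illustration of the SMOTE idea, not the original implementation; names and parameters are our own):

```python
import random

def smote(minority, k, n_new, seed=0):
    """SMOTE-style oversampling sketch: for each synthetic sample, pick a
    minority point, pick one of its k nearest minority neighbours, and
    interpolate at a random position along the segment joining them."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        # k nearest minority-class neighbours of p (excluding p itself)
        neigh = sorted((q for q in minority if q is not p),
                       key=lambda q: dist(p, q))[:k]
        q = rng.choice(neigh)
        gap = rng.random()  # random position along the segment p -> q
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic
```

Because each synthetic point is a convex combination of two minority samples, the new samples stay inside the minority class's convex hull rather than being placed at random.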

Page 23

Phase 1: co-evolving patterns of the normal data (positive selection)

• A number of non-interbreeding subpopulations: no cooperation, no competition
• Randomly initialized
• All converged schemata together form the decision boundary
• Individuals consist of four sections

Page 24

• Fitness-proportionate selection
• Uniform crossover
• Bit-flipping mutation
• Subpopulation size = 100
• Crossover rate = 0.65
• Mutation rate = 0.15
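The three GA operators listed can be sketched as follows — an illustrative implementation, not the paper's code; in particular, reading the 0.15 mutation rate as a per-bit flip probability is our own plausible interpretation:

```python
import random

rng = random.Random(0)

def select(pop, fitness):
    """Fitness-proportionate (roulette-wheel) selection."""
    total = sum(fitness(ind) for ind in pop)
    pick = rng.uniform(0, total)
    acc = 0.0
    for ind in pop:
        acc += fitness(ind)
        if acc >= pick:
            return ind
    return pop[-1]

def uniform_crossover(a, b, rate=0.65):
    """Uniform crossover at the slide's rate of 0.65: when crossover
    fires, each gene position is independently swapped with prob. 0.5."""
    if rng.random() >= rate:
        return a[:], b[:]
    c1, c2 = a[:], b[:]
    for i in range(len(a)):
        if rng.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

def mutate(ind, rate=0.15):
    """Bit-flipping mutation (rate assumed per-bit)."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in ind]
```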

Page 25

Phase 2: synthetic generation of anomalous samples

• Strategy 1: with seed
  – Starting with the vacant neighbors of the examples of the anomaly class
    • 2n neighbors for n-dimensional data
    • “Vacant” means neither normal nor anomalous
  – Check whether candidates are covered by schemata of the normal class; those covered are removed
• Strategy 2: without seed – in the case of no anomaly examples
  – Starting from random positions
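The "2n vacant neighbors" step of Strategy 1 can be sketched as below. Encoding the categorical attributes as integer grid coordinates is our assumption, and the schema-coverage filter is reduced here to a simple membership check against the labeled sets:

```python
def vacant_neighbors(seed_point, normal_set, anomaly_set):
    """Strategy-1 sketch: enumerate the 2n axis-aligned neighbours of an
    anomaly example on an integer grid and keep the 'vacant' ones, i.e.
    points labeled neither normal nor anomalous."""
    normals = {tuple(p) for p in normal_set}
    anomalies = {tuple(p) for p in anomaly_set}
    out = []
    for i in range(len(seed_point)):
        for delta in (-1, 1):  # one step down / up in dimension i
            nb = list(seed_point)
            nb[i] += delta
            if tuple(nb) not in normals and tuple(nb) not in anomalies:
                out.append(nb)
    return out
```

In the paper's pipeline these candidates would then be checked against the converged schemata from Phase 1, and any candidate covered by a normal-class schema is discarded.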

Page 26

Experiments

• UCI data sets: 14 used
• Multi-class data are mapped into 2-class data sets
  – Version 1: natural distribution
  – Version 2: balanced natural distribution
  – Version 3: balanced extreme distribution
  (“balanced” means “processed by the approach described in this paper”)
• Classifiers used: C4.5 and Naive Bayes
• Result: v2 > v3 >> v1