Applications of one-class classification -- searching for comparable applications for negative selection algorithms


Page 1

Applications of one-class classification
-- searching for comparable applications for negative selection algorithms

Page 2

Background

► Purpose: looking for real-world applications that demonstrate the use of V-detector (a negative selection algorithm)

► One-class classification problem: different from conventional classification, only information about one of the classes (the target class) is available
  Original application: anomaly (outlier) detection

Page 3

One-class Classification

► Basic concept of classification: a classifier is a function which outputs a class label for each input object. It cannot be constructed from known rules.
  In pattern recognition or machine learning: inferring a classifier (a function) from a set of training examples.
  Usually, the type of function is chosen beforehand and its parameters are to be determined.
► Examples: linear classifiers, mixtures of Gaussians, neural networks, support vector classifiers

Page 4

One-class Classification

► Basic concept of classification: assumptions include continuity and sufficient information (enough samples, limited noise), etc.
  Multi-class classification can be decomposed into two-class classifications (a minimal sketch follows)
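To make the decomposition concrete, here is a minimal sketch (illustrative only; scikit-learn and the iris dataset are my choices, not the slides'): one-vs-rest turns a three-class problem into three two-class problems, one binary classifier per class.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# One-vs-rest decomposition: a 3-class problem becomes three
# two-class problems, each separating one class from the rest.
X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(clf.estimators_))  # 3 binary classifiers
print(clf.predict(X[:5]))
```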

Page 5

One-class classification

► Same problems as conventional classification:
  Definition of errors
  Atypical training data
  Measuring the complexity of a solution
  The curse of dimensionality
  The generalization of the method

Page 6

A conventional and a one-class classifier applied to an example dataset containing apples and pears, represented by 2 features per object. The solid line is the conventional classifier, which distinguishes between the apples and pears, while the dashed line describes the dataset. This description can identify the outlier apple in the lower right corner, while the conventional classifier would simply classify it as a pear.

Page 7

One-class classification

► Additional problems
  Most conventional classifiers assume more or less balanced data.
  It is hard to decide, on the basis of one class only, how tightly the boundary should fit in each direction around the data.
  It is hard to find which features should be used to achieve the best separation.
  It is impossible to estimate false positives.
  The curse of dimensionality is more prominent.
  Extra constraints: a closed boundary, etc.

Page 8

Various techniques

► Generated outlier detection
  Some methods require near-target objects
► Density methods: directly estimating the density of the target objects (see the sketch after this list)
  Some works require a density estimate over the complete feature space
  A representative (typical) sample is assumed
► Reconstruction methods: based on prior knowledge
► Boundary methods
  A well-defined distance is required
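As a concrete illustration of the density approach, here is a minimal sketch (my own example, not from the slides): fit a single Gaussian to the target class and reject points whose density falls below an empirical threshold.

```python
import numpy as np

class GaussianOneClass:
    """Minimal density-based one-class classifier: fit a single
    Gaussian to the target class and reject low-density points."""

    def fit(self, X, quantile=0.05):
        # Estimate mean and covariance from target-class samples only.
        self.mean = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False)
        self.cov_inv = np.linalg.inv(self.cov)
        # Threshold on squared Mahalanobis distance so that ~5% of
        # the training targets would be rejected (empirical quantile).
        d2 = self._dist2(X)
        self.threshold = np.quantile(d2, 1.0 - quantile)
        return self

    def _dist2(self, X):
        diff = X - self.mean
        return np.einsum('ij,jk,ik->i', diff, self.cov_inv, diff)

    def predict(self, X):
        # +1 = target (inside boundary), -1 = outlier.
        return np.where(self._dist2(X) <= self.threshold, 1, -1)

# Usage: train on targets only; outliers are never seen during training.
rng = np.random.default_rng(0)
targets = rng.normal(0, 1, size=(200, 2))
clf = GaussianOneClass().fit(targets)
print(clf.predict(np.array([[0.1, -0.2], [6.0, 6.0]])))  # [ 1 -1]
```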

Page 9

Application 1: texture classification

► Problem: classification of texture images
  Polished granite (or ceramic) tiles are widely used as construction elements
  The polished granite tiles are usually inspected by a human expert using a chosen master tile as the reference
  Such inspection is subjective and qualitative
  A one-class classifier is suitable: outliers cannot be used to train any method
► Recent development
  A quasi-statistical representation of binary images, used as a feature space for texture image classification

Page 10

► Based on the CCR feature space (coordinated cluster representation)
► Outline of the method:
  Given a master texture image of a class, estimate the statistics of its CCR histogram
  Use the parameters of those statistics to define a closed decision boundary

Page 11

Master images

Page 12

CCR feature space

► A binary image intensity: S = {s_{l,m}}, where l = 1, 2, ..., L and m = 1, 2, ..., M
► A rectangular window W = I × J
► Scan all over the image in one-pixel steps using that window
► The number of all possible states of the window is 2^w, where w = I × J
► The coordinated clusters representation consists of the histogram H_{(I,J)}(b) (a code sketch follows):
  (I,J) indicates the size of the window
  b = 1, 2, ..., 2^w indexes the possible window states
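A minimal sketch of computing a CCR histogram (my own illustration; the row-major bit encoding of window states is an assumption, the papers may order the bits differently):

```python
import numpy as np

def ccr_histogram(binary_image, I, J):
    """Coordinated clusters representation: slide an I x J window over a
    binary image in one-pixel steps and histogram the 2**(I*J) possible
    window states (each state encoded as an integer, row-major bits)."""
    L, M = binary_image.shape
    w = I * J
    hist = np.zeros(2 ** w, dtype=np.int64)
    # Powers of two used to encode each window as a single integer b.
    weights = (2 ** np.arange(w)).reshape(I, J)
    for l in range(L - I + 1):
        for m in range(M - J + 1):
            window = binary_image[l:l + I, m:m + J]
            b = int((window * weights).sum())
            hist[b] += 1
    return hist

# A = (L-I+1)*(M-J+1) windows in total; a 3x3 window gives 2**9 = 512 bins.
img = (np.random.default_rng(0).random((16, 16)) > 0.5).astype(np.uint8)
H = ccr_histogram(img, 3, 3)
print(H.sum())  # (16-3+1)**2 = 196
```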

Page 13

► When the histogram is normalized, it is considered a probability distribution function of occurrence:

  F_{(I,J)}(b) = (1/A) · H_{(I,J)}(b),  where A = (L − I + 1) × (M − J + 1)

► Histogram H contains all the information about the n-point correlation moments of the image if and only if the separation vectors between the n pixels fit within the scanning window
► In general, the higher the order of the statistics, the more structural information is available
► There is a structural correspondence between a gray-level image and its thresholded counterpart
► Provided that the binary image keeps enough structural information about the primary gray-level image to be classified, the CCR of a binary image is highly suitable for recognition and classification of gray-level texture images

Page 14

► Framework of classification
  Training phase:
  ► A set of gray-level images from each texture class
  ► Threshold each image
  ► Calculate the CCR distribution function
  Recognition phase:
  ► Input a test image
  ► Threshold it
  ► Compute its CCR distribution
  ► Compare with the prototypes and assign it to the class of best match
  One-class classification:
  ► Define the limits of feature variations
  ► Establish the criterion

Page 15

► Thresholding (binarization)
  Because CCR is defined for binary images
  The Fuzzy C-Means clustering method is used (sketched below)
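The slides name Fuzzy C-Means as the binarization step; below is a from-scratch two-cluster sketch (my own minimal implementation, not the authors' code):

```python
import numpy as np

def fcm_binarize(gray_image, m=2.0, iters=50, tol=1e-5):
    """Binarize a gray-level image with two-cluster Fuzzy C-Means on
    pixel intensities (a minimal sketch of the method the slides name)."""
    x = gray_image.astype(float).ravel()              # (N,) intensities
    centers = np.array([x.min(), x.max()])            # init: dark / bright
    for _ in range(iters):
        # Membership of each pixel in each cluster (fuzzifier m).
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12   # (N, 2)
        u = 1.0 / (d ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)
        # Update cluster centers as membership-weighted means.
        new_centers = (u ** m * x[:, None]).sum(axis=0) / (u ** m).sum(axis=0)
        if np.abs(new_centers - centers).max() < tol:
            centers = new_centers
            break
        centers = new_centers
    # Assign each pixel to the cluster with the higher membership.
    bright = int(np.argmax(centers))
    binary = (u.argmax(axis=1) == bright).astype(np.uint8)
    return binary.reshape(gray_image.shape)
```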

Page 16

► Training phase
  Assuming Q images of a class are available, a random set of P subimages is sampled from each
  ► If only one image is available, Q independent random sets are sampled
  Five measurements are calculated from the distribution function F (see the sketch below):

1. F̄: the mass center of the subimage histograms (not a single value, but itself a function/histogram)
2. D̄: the mean distance ("distance" refers to the mean distance within a set)
3. σ̄: the mean standard deviation ("standard deviation" refers to the standard deviation of a set)
4. D′: the mean distance from the q-th sample center to the center of all samples
5. σ²: the variance of each set with regard to the center of all samples
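A sketch of how the five measurements could be computed (my reading of the slide; the symbol names and the use of the L1 distance inside the training phase are assumptions, not confirmed by the slides):

```python
import numpy as np

def training_statistics(sets_of_histograms):
    """Compute the five training measurements sketched on the slide from
    Q sets of P normalized CCR histograms each (shape Q x P x B)."""
    F_sets = np.asarray(sets_of_histograms, dtype=float)   # (Q, P, B)
    Q, P, B = F_sets.shape
    l1 = lambda a, b: np.abs(a - b).sum(axis=-1)           # L1 distance

    set_centers = F_sets.mean(axis=1)                      # per-set mean histogram (Q, B)
    F_bar = set_centers.mean(axis=0)                       # 1. mass center (itself a histogram)

    # Per-set L1 distances of members to their own center.
    d_within = np.stack([l1(F_sets[q], set_centers[q]) for q in range(Q)])
    D_bar = d_within.mean()                                # 2. mean within-set distance
    sigma_bar = d_within.std(axis=1).mean()                # 3. mean within-set std

    d_centers = l1(set_centers, F_bar)                     # set center -> global center
    D_prime = d_centers.mean()                             # 4. mean center-to-center distance
    sigma2 = d_centers.var()                               # 5. variance w.r.t. global center
    return F_bar, D_bar, sigma_bar, D_prime, sigma2
```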

Page 17

► Criterion (see the code sketch below):

  d(F_test, F̄) < D̄ + C·σ̄
  D̄ − 2σ < D_test < D̄ + 2σ

► C is an empirical adjustment parameter
► F_test and D_test are the means over K random subimages of the texture image to be classified
► The L1 distance is used as the measure of distinction:

  d(F′, F″) = Σ_b |F′(b) − F″(b)|
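Putting the criterion into code (a sketch; I assume the σ in the second inequality is the set standard deviation, since the symbol was lost in extraction):

```python
import numpy as np

def is_target_texture(F_test, D_test, F_bar, D_bar, sigma_bar, sigma, C):
    """Decision rule reconstructed from the slide: accept the test image
    as belonging to the master class only if both inequalities hold.
    F_test and D_test are means over K random subimages of the test image."""
    d = np.abs(np.asarray(F_test) - np.asarray(F_bar)).sum()  # L1 distance
    in_f = d < D_bar + C * sigma_bar             # d(F_test, F_bar) < D_bar + C*sigma_bar
    in_d = D_bar - 2 * sigma < D_test < D_bar + 2 * sigma
    return in_f and in_d
```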

Page 18

► Results
  C should be in the range 1, 2, ..., 20
  ► Based on the observation that σ̄ is approximately ten times smaller than D̄
  8 master images (training data) plus 16 testing images are used (128 × 128)
  For C = 1 or 2, only the master images are recognized
  More images are recognized for larger C
  For C < 19, there is no misclassification
  The proper C depends on the size of the subimages (32, 24, 64 are discussed)

Page 19

Application 2: authorship

► Problem: authorship verification
  Different from the standard text categorization problem
  Not realistic to train with negative samples
► Differences from other one-class classification problems:

1. Negative samples are not lacking; rather, it is hard to choose samples that represent the entire negative class
2. The object texts are long
  We can chunk each text into multiple samples: a set instead of a single instance

Page 20

► New idea: depth of difference between two sets
  Test the rate of degradation of accuracy as the best features are iteratively dropped

Page 21

► Standard method
► Choose a feature set: frequencies of function words, syntactic structures, part-of-speech n-grams, complexity and richness measures, syntactic and orthographic idiosyncrasies (see the toy extractor below)
  Note: very different from text categorization by topic
► Having constructed feature vectors, use a learning algorithm to construct a distinguishing model
  Similar to categorization by topic
  Linear separators are believed to work well
  Assessment: k-fold cross-validation or bootstrapping
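For illustration, a toy function-word feature extractor of the kind the first bullet describes (the word list here is a tiny hypothetical sample; real studies use hundreds of features):

```python
from collections import Counter

# A tiny illustrative function-word list (real feature sets are much larger).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "was", "it"]

def function_word_vector(text):
    """Relative frequencies of function words: a simple authorship-style
    feature vector (one value per function word)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]
```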

Page 22

► One-class scenario
  Naïve approach:
  ► Chunk the two works to generate two sufficiently large sets
  ► Test whether we can distinguish them with high accuracy using cross-validation
  ► Failed in experiments (different works are just different enough to tell apart)
  New approach: unmasking (see the sketch below)
  ► In the naïve approach, a small number of features do all the work. They are likely to stem from thematic differences, differences in genre or purpose, chronological shifts of style, or deliberate attempts to mask identity
  ► Unmasking: iteratively remove the features that are most useful for distinguishing the two sets
  ► Hypothesis: if the two works are by the same author, the difference will be reflected in only a relatively small number of features
  ► A sudden degradation in accuracy indicates the same author
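A sketch of the unmasking loop as described above (my reading of the slide, using scikit-learn's linear SVM; the authors' exact classifier, fold counts, and number of dropped features may differ):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def unmasking_curve(X_a, X_b, n_rounds=10, drop_per_round=3):
    """Repeatedly cross-validate a linear separator between the two chunk
    sets, then drop the most heavily weighted features and repeat.
    A fast drop in accuracy suggests the two works share an author."""
    X = np.vstack([X_a, X_b])
    y = np.array([0] * len(X_a) + [1] * len(X_b))
    active = np.arange(X.shape[1])          # indices of features still in play
    curve = []
    for _ in range(n_rounds):
        clf = LinearSVC(dual=False)
        acc = cross_val_score(clf, X[:, active], y, cv=5).mean()
        curve.append(acc)
        clf.fit(X[:, active], y)
        # Drop the features with the largest absolute weights.
        top = np.argsort(np.abs(clf.coef_[0]))[-drop_per_round:]
        active = np.delete(active, top)
    return curve
```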

Page 23

Page 24

► Results:
  Corpus: 21 works of 19th-century English literature
  Baseline: one-class SVM
  Extension: using negative samples to eliminate false positives
  Solution to a literary mystery: the case of the bashful rabbi

Page 25

Bibliography

► D. M. J. Tax, "One-class classification", PhD thesis, 2001.
► D. M. J. Tax, "Data description toolbox: A Matlab toolbox for data description, outlier and novelty detection", 2005.
► M. Koppel and J. Schler, "Authorship verification as a one-class classification problem", in Proceedings of the 21st International Conference on Machine Learning, 2004.
► R. E. Sanchez-Yanez et al., "One-class texture classifier in the CCR feature space", Pattern Recognition Letters, 24, 2003.
► R. E. Sanchez-Yanez et al., "A framework for texture classification using the coordinated clusters representation", Pattern Recognition Letters, 24, 2003.