Visual Information Systems: Multiple Processor Approach
TRANSCRIPT
Objectives
An introductory tutorial on multiple classifier combination:
Motivation and basic concepts
Main methods for creating multiple classifiers
Main methods for fusing multiple classifiers
Applications, achievements, open issues and conclusion
Why?
A natural move when trying to solve numerous complicated pattern recognition problems.
Efficiency:
Dimensionality;
Complicated architectures such as neural networks;
Speed;
Accuracy
Pattern Classification
Traditional approach to pattern classification: preprocessing -> feature extraction -> classification (e.g., deciding between "fork" and "spoon").
Unfortunately, no dominant classifier exists for all data distributions, and the data distribution of the task at hand is usually unknown.
No single classifier can discriminate well enough when the number of classes is huge.
For applications where the objects/classes of content are numerous, unlimited, and unpredictable, one specific classifier/detector cannot solve the problem.
Combine individual classifiers
Besides avoiding the selection of the worst classifier, under particular hypotheses, fusion of multiple classifiers can improve on the performance of the best individual classifier and, in some special cases, provide the optimal Bayes classifier.
This is possible if the individual classifiers make "different" errors.
For linear combiners, Tumer and Ghosh (1996) showed that averaging the outputs of individual classifiers with unbiased and uncorrelated errors can improve on the performance of the best individual classifier and, for an infinite number of classifiers, provide the optimal Bayes classifier.
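The averaging effect can be illustrated with a small simulation (a minimal sketch: the true posterior value, the noise model, and all the numbers below are invented for illustration). Averaging many unbiased, uncorrelated estimates of a class posterior yields a smaller expected error than a single estimate:

```python
import random

random.seed(0)

TRUE_POSTERIOR = 0.7   # hypothetical true P(class | x) for some pattern
NOISE = 0.2            # spread of each classifier's zero-mean estimation error
N_TRIALS = 2000

def estimate():
    """One classifier's posterior estimate: truth plus unbiased noise."""
    return TRUE_POSTERIOR + random.uniform(-NOISE, NOISE)

def mean_abs_error(n_classifiers):
    """Average |error| of the ensemble mean over many simulated patterns."""
    total = 0.0
    for _ in range(N_TRIALS):
        avg = sum(estimate() for _ in range(n_classifiers)) / n_classifiers
        total += abs(avg - TRUE_POSTERIOR)
    return total / N_TRIALS

single = mean_abs_error(1)
ensemble = mean_abs_error(10)
print(single, ensemble)  # the 10-classifier average has the smaller error
```

Because the errors are uncorrelated, their average shrinks roughly as 1/sqrt(N), which is the mechanism behind the Tumer and Ghosh result.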
Definitions
A "classifier" is any mapping from the space of features (measurements) to a space of class labels (names, tags, distances, probabilities).
A classifier is a hypothesis about the real relation between features and class labels.
A "learning algorithm" is a method to construct hypotheses.
A learning algorithm applied to a set of samples (the training set) outputs a classifier.
Definitions
A multiple classifier system (MCS) is a structured way to combine (exploit) the outputs of individual classifiers.
An MCS can be thought of as:
Multiple expert systems
Committees of experts
Mixtures of experts
Classifier ensembles
Composite classifier systems
Basic concepts
Multiple classifier systems (MCS) can be characterized by:
The architecture
Fixed/trained combination strategy
Others
MCS Architecture/Topology
Serial: Expert 1 -> Expert 2 -> ... -> Expert N
MCS Architecture/Topology
Parallel: Expert 1, Expert 2, ..., Expert N each process the input independently, and their outputs feed a combining strategy.
MCS Architecture/Topology
Hybrid: serial and parallel stages are mixed, e.g. Expert 1 ... Expert N feed Combiner 1, whose output is in turn fused by Combiner 2.
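The parallel topology can be sketched in a few lines of code (everything here is hypothetical: the three "experts", the fork/spoon features, and the pattern are invented to show the structure, not a real system):

```python
# Hypothetical "experts": each maps a pattern (a dict of features) to a label.
def expert_shine(pattern):
    return "spoon" if pattern["shiny"] else "fork"

def expert_shape(pattern):
    return "fork" if pattern["prongs"] > 0 else "spoon"

def expert_size(pattern):
    return "spoon" if pattern["length_cm"] < 18 else "fork"

EXPERTS = [expert_shine, expert_shape, expert_size]

def parallel_mcs(pattern, combine):
    """Parallel topology: every expert sees the pattern; a combiner fuses outputs."""
    outputs = [expert(pattern) for expert in EXPERTS]
    return combine(outputs)

def majority(labels):
    """A simple combining strategy: the most frequent label wins."""
    return max(set(labels), key=labels.count)

pattern = {"shiny": False, "prongs": 4, "length_cm": 19}
print(parallel_mcs(pattern, majority))  # -> "fork"
```

A serial topology would instead pipe each expert's output into the next expert, and a hybrid topology would mix both arrangements.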
Multiple Classifier Sources?
Different feature spaces: face, voice, fingerprint;
Different training sets: sampling;
Different classifiers: k-NN, neural net, SVM;
Different architectures: neural net layers, units, transfer function;
Different parameter values: K in k-NN, kernel in SVM;
Different initializations: neural net.
Multiple Classifier Sources?
Same feature space: three classifiers demonstrate different performance.
Multiple Classifier Sources?
Combination based on different feature spaces.
Combining based on a single feature space but different classifiers.
Architecture of multiple classifier combination
Fixed Combination Rules
Product, Minimum:
Independent feature spaces;
Different areas of expertise;
Error-free posterior probability estimates.
Sum (Mean), Median, Majority Vote:
Equal posterior-estimation distributions in the same feature space;
Differently trained classifiers, but drawn from the same distribution;
Bad if some classifiers (experts) are very good or very bad.
Maximum Rule:
Trust the most confident classifier/expert;
Bad if some classifiers (experts) are badly trained.
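The fixed rules above are one-liners over the classifiers' posterior vectors. A minimal sketch (the three posterior vectors are invented; it also shows that product and minimum can pick different classes):

```python
from functools import reduce

CLASSES = ["fork", "spoon", "knife"]

# Hypothetical posterior estimates: one row per classifier, one column per class.
outputs = [
    [0.60, 0.35, 0.05],
    [0.55, 0.35, 0.10],
    [0.30, 0.45, 0.25],
]

def combine(rule):
    """Apply a fixed rule column-wise (per class), then take the arg-max class."""
    scores = [rule([row[j] for row in outputs]) for j in range(len(CLASSES))]
    return CLASSES[scores.index(max(scores))]

product = lambda xs: reduce(lambda a, b: a * b, xs)
mean = lambda xs: sum(xs) / len(xs)

print(combine(product))  # "fork"  (0.099 vs 0.055 vs 0.00125)
print(combine(min))      # "spoon" (0.30 vs 0.35 vs 0.05): the rules can disagree
print(combine(mean))     # "fork"
print(combine(max))      # "fork"
```

Note how the minimum rule is dominated by each class's least confident supporter, while the product rule is punished most by any single near-zero estimate.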
Ever optimal?
Fixed combining rules are sub-optimal:
Base classifiers are never really independent (product);
Base classifiers are never really equally imperfectly trained (sum, median, majority);
Sensitivity to over-confident base classifiers (product, min, max).
Fixed combining rules are never optimal.
Trained combiner
Remarks on fixed and trained combination strategies
Fixed rules:
Simplicity;
Low memory and time requirements;
Well suited for ensembles of classifiers with independent/low-correlated errors and similar performances.
Trained rules:
Flexibility: potentially better performance than fixed rules;
Claimed to be more suitable than fixed rules for correlated classifiers or classifiers exhibiting different performances;
High memory and time requirements.
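One of the simplest trained combiners weights each classifier's vote by its accuracy on a held-out validation set. A minimal sketch (the toy "experts", the even/odd task, and the validation data are all invented for illustration):

```python
def weighted_vote(expert_outputs, weights):
    """Sum the weight of every expert voting for each label; pick the best."""
    scores = {}
    for label, w in zip(expert_outputs, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

def train_weights(experts, validation):
    """Trained step: each expert's weight is its validation accuracy."""
    weights = []
    for expert in experts:
        correct = sum(expert(x) == y for x, y in validation)
        weights.append(correct / len(validation))
    return weights

# Toy experts labelling integers as "even"/"odd"
expert_good = lambda x: "even" if x % 2 == 0 else "odd"
expert_bad = lambda x: "even"        # always answers "even"
experts = [expert_good, expert_bad, expert_bad]

validation = [(1, "odd"), (3, "odd"), (5, "odd"), (2, "even")]
weights = train_weights(experts, validation)   # [1.0, 0.25, 0.25]

# A plain majority vote would say "even" for 7; the trained weights fix it.
outputs = [e(7) for e in experts]              # ["odd", "even", "even"]
print(weighted_vote(outputs, weights))         # -> "odd"
```

This is where the higher memory/time cost shows up: the combiner needs extra labelled data and a training pass that fixed rules do not.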
Methods for fusing multiple classifiers
Methods for fusing multiple classifiers can be classified according to the type of information produced by the individual classifiers (Xu et al., 1992):
The abstract level: a classifier outputs only a unique label for each input pattern;
The rank level: each classifier outputs a ranked list of possible classes for each input pattern;
The measurement level: each classifier outputs class "confidence" levels for each input pattern.
For each of the above categories, methods can be further subdivided into integration vs. selection rules, and fixed vs. trained rules.
Example
The majority voting rule: a fixed rule at the abstract level.
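Majority voting needs nothing but the predicted labels. A minimal sketch of the common variant that rejects a pattern when no label reaches an absolute majority (the fork/spoon labels are illustrative):

```python
from collections import Counter

def majority_vote(labels, reject="reject"):
    """Abstract-level fixed rule: return the label predicted by more than
    half of the classifiers; otherwise reject the pattern."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count > len(labels) / 2 else reject

print(majority_vote(["fork", "fork", "spoon"]))   # "fork"
print(majority_vote(["fork", "spoon", "knife"]))  # "reject"
```

A plurality variant (most frequent label wins, no rejection) is equally common; which one to use depends on whether the application can afford rejected patterns.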
Fuser ("combination" rule)
Two main categories of fuser:
Integration (fusion) functions: for each pattern, all the classifiers contribute to the final decision. Integration assumes competitive classifiers.
Selection functions: for each pattern, just one classifier, or a subset, is responsible for the final decision. Selection assumes complementary classifiers.
Integration and selection can be "merged" to design a hybrid fuser.
Multiple functions may be necessary for non-parallel architectures.
Classifiers "Diversity" vs Fuser Complexity
Fusion is obviously useful only if the combined classifiers are mutually complementary:
Ideally, classifiers with high accuracy and high diversity.
The required degree of error diversity depends on the fuser complexity:
Majority vote fuser: the majority should always be correct;
Ideal selector: only one classifier needs to be correct for each pattern.
Classifiers "Diversity" vs Fuser Complexity
An example, four diversity levels (A. Sharkey, 1999):
Level 1: no more than one classifier is wrong for each pattern;
Level 2: the majority is always correct;
Level 3: at least one classifier is correct for each pattern;
Level 4: all classifiers are wrong for some patterns.
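Given a record of which classifiers were right on which patterns, Sharkey's four levels can be checked mechanically. A minimal sketch (the boolean-matrix encoding and the toy ensemble are my own, not from the original slides):

```python
def sharkey_level(correct):
    """Return the Sharkey diversity level of an ensemble.

    `correct[i][j]` is True iff classifier j is correct on pattern i.
    """
    n = len(correct[0])                          # number of classifiers
    wrong = [row.count(False) for row in correct]
    if max(wrong) <= 1:
        return 1  # no more than one classifier wrong on any pattern
    if all(w < n / 2 for w in wrong):
        return 2  # the majority is always correct
    if all(w < n for w in wrong):
        return 3  # at least one classifier is correct on every pattern
    return 4      # all classifiers are wrong on some pattern

# Three classifiers over four patterns (True = correct)
ensemble = [
    [True, True, False],
    [True, False, True],
    [True, True, True],
    [False, True, True],
]
print(sharkey_level(ensemble))            # 1
print(sharkey_level([[False] * 3]))       # 4
```

Levels 1 and 2 are exactly what a majority-vote fuser needs; level 3 is enough only for an ideal selector; at level 4 no fuser can be perfect.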
Classifiers Diversity
Measures of diversity in classifier ensembles are a matter of ongoing research (L. I. Kuncheva).
Key issue: how are the diversity measures related to the accuracy of the ensemble?
Simple fusers (e.g., majority voting) can be used for classifiers that exhibit a simple complementary pattern.
Complex fusers, for example a dynamic selector, are necessary for classifiers with a complex dependency model.
The required "complexity" of the fuser depends on the degree of classifier diversity.
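One of the simplest diversity measures studied in this literature is the pairwise disagreement rate: the fraction of patterns on which two classifiers output different labels, averaged over all pairs. A minimal sketch (the prediction lists are invented):

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of patterns on which two classifiers disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_disagreement(all_preds):
    """Average disagreement over all classifier pairs: a simple diversity measure."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

preds = [
    ["a", "a", "b", "b"],   # classifier 1
    ["a", "b", "b", "a"],   # classifier 2
    ["a", "a", "a", "b"],   # classifier 3
]
print(mean_pairwise_disagreement(preds))  # 0.5
```

A value of 0 means the classifiers are interchangeable (fusion gains nothing); how large a value actually predicts ensemble accuracy is precisely the open question noted above.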
Analogy Between MCS and Single Classifier Design
Single classifier: feature design -> classifier design -> performance evaluation.
MCS: ensemble design -> fuser design -> performance evaluation.
MCS DesignMCS Design
The design of MCS involves two main phases: The design of MCS involves two main phases: the design of the classifier ensemble, and the the design of the classifier ensemble, and the design of the fuserdesign of the fuser
The design of the classifier ensemble is aimed The design of the classifier ensemble is aimed to create a set of complementary/diverse to create a set of complementary/diverse classifiersclassifiers
The design of the combination function/fuser is The design of the combination function/fuser is aimed to create a fusion mechanism that can aimed to create a fusion mechanism that can exploit the complementarity/diversity of exploit the complementarity/diversity of classifiers and optimally combine themclassifiers and optimally combine them
The two above design phases are obviously The two above design phases are obviously linked (Roli and Giacinto, 2002)linked (Roli and Giacinto, 2002)
Methods for Constructing MCS
The effectiveness of an MCS relies on combining diverse/complementary classifiers.
Several approaches have been proposed to construct ensembles made up of complementary classifiers, among others:
Using problem and designer knowledge;
Injecting randomness;
Varying the classifier type, architecture, or parameters;
Manipulating the training data;
Manipulating the input features;
Manipulating the output features.
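"Manipulating the training data" is the idea behind bagging: train each ensemble member on a bootstrap resample of the training set. A minimal sketch with a toy nearest-class-mean learner (the 1-D data, the learner, and all numbers are invented for illustration):

```python
import random

random.seed(1)

def bootstrap(training_set):
    """Resample the training set with replacement (the bagging idea)."""
    return [random.choice(training_set) for _ in range(len(training_set))]

def train_nearest_mean(samples):
    """Toy learner: classify a 1-D point by the nearest class mean."""
    groups = {}
    for x, label in samples:
        groups.setdefault(label, []).append(x)
    centers = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(centers, key=lambda lab: abs(x - centers[lab]))

data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.9, "b"), (1.0, "b"), (1.1, "b")]
ensemble = [train_nearest_mean(bootstrap(data)) for _ in range(5)]

votes = [clf(0.25) for clf in ensemble]
decision = max(set(votes), key=votes.count)
print(decision)
```

Each bootstrap sample sees a slightly different training set, so the resulting classifiers make somewhat different errors, which is exactly the diversity the fuser needs.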
Using problem and designer knowledge
When problem or designer knowledge is available, "complementary" classification algorithms can be designed:
In applications with multiple sensors;
In applications where complementary representations of patterns are possible (e.g., statistical and structural representations);
When designer knowledge allows varying the classifier type, architecture, or parameters to create complementary classifiers.
These are heuristic approaches; they perform only as well as the problem and designer knowledge allows complementary classifiers to be designed.
Two main methods for MCS design (T. K. Ho, 2000)
Coverage optimisation methods: a simple fuser is given without any design; the goal is to create a set of complementary classifiers that can be fused optimally.
Decision optimisation methods: a set of carefully designed and optimised classifiers is given and unchangeable; the goal is to optimise the fuser.
Two main methods for MCS design
Decision optimisation is often used when previously, carefully designed classifiers are available, or when valid problem and designer knowledge is available.
Coverage optimisation makes sense when creating carefully designed, "strong" classifiers is difficult or time-consuming.
Integration of the two basic approaches is often used.
However, no design method guarantees the "optimal" ensemble for a given fuser or a given application (Roli and Giacinto, 2002).
The best MCS can only be determined by performance evaluation.
Rank-level Fusion Methods
Some classifiers provide class "scores", or some sort of class probabilities.
This information can be used to "rank" each class, e.g.:
Pc1 = 0.10 -> Rc1 = 1
Classifier -> Pc2 = 0.75 -> Rc2 = 3
Pc3 = 0.15 -> Rc3 = 2
In general, if Ω = {c1, ..., ck} is the set of classes, the classifiers can provide an "ordered" (ranked) list of class labels.
The Borda Count Method: an example
Let N = 3 and k = 4, Ω = {a, b, c, d}. For a given pattern, the ranked outputs of the three classifiers are as follows:
Rank value  Classifier 1  Classifier 2  Classifier 3
4           c             a             b
3           b             b             a
2           d             d             c
1           a             c             d
The Borda Count Method: an example
So we have:
r_a = r_a1 + r_a2 + r_a3 = 1 + 4 + 3 = 8
r_b = r_b1 + r_b2 + r_b3 = 3 + 3 + 4 = 10
r_c = r_c1 + r_c2 + r_c3 = 4 + 1 + 2 = 7
r_d = r_d1 + r_d2 + r_d3 = 2 + 2 + 1 = 5
The winner is class b, because it has the maximum overall rank.
Remarks on Rank-level Methods
Advantage over the abstract level (majority vote):
Ranking is suitable in problems with many classes, where the correct class may often appear near the top of the list, although not at the top.
Example: word recognition with a sizeable lexicon.
Advantages over the measurement level:
Rankings can be preferred to soft outputs to avoid lack of consistency when using different classifiers;
Rankings can be preferred to soft outputs to simplify the combiner design.
Drawbacks:
Rank-level methods are not supported by clear theoretical underpinnings;
Results depend on the scale of numbers assigned to the choices.
Open issues
General combination strategies are only sub-optimal solutions for most applications.
References
1. K. Sirlantzis, "Diversity in Multiple Classifier Systems", University of Kent, www.ee.kent.ac.uk.
2. F. Roli, "Fusion of Multiple Pattern Classifiers" (tutorial), University of Cagliari.
3. R. P. W. Duin, "The Combining Classifier: to Train or Not to Train?", ICPR 2002, Pattern Recognition Group, Faculty of Applied Sciences.
4. L. Xu, A. Krzyzak, C. Y. Suen, "Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition", IEEE Transactions on Systems, Man, and Cybernetics, 22(3), 1992, pp. 418-435.
5. J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), March 1998, pp. 226-239.
6. D. Tax, M. van Breukelen, R. P. W. Duin, J. Kittler, "Combining Multiple Classifiers by Averaging or by Multiplying?", Pattern Recognition, 33(2000), pp. 1475-1485.
7. L. I. Kuncheva, "A Theoretical Study on Six Classifier Fusion Strategies", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 2002, pp. 281-286.