Visual Information Systems: Multiple Processor Approach
TRANSCRIPT
Objectives
An introductory tutorial on multiple classifier combination:
Motivation and basic concepts
Main methods for creating multiple classifiers
Main methods for fusing multiple classifiers
Applications, achievements, open issues and conclusion
Why?
A natural move when trying to solve numerous complicated pattern recognition problems.
Efficiency:
Dimensionality;
Complicated architectures such as neural networks;
Speed;
Accuracy
Pattern Classification
Traditional approach to pattern classification: preprocessing -> feature extraction -> classification (e.g., deciding between "fork" and "spoon").
Unfortunately, no dominant classifier exists for all data distributions, and the data distribution of the task at hand is usually unknown.
No single classifier can discriminate well enough when the number of classes is huge.
For applications where the objects/classes of content are numerous, unlimited, and unpredictable, one specific classifier/detector cannot solve the problem.
Combine individual classifiers
Besides avoiding the selection of the worst classifier, under particular hypotheses, fusion of multiple classifiers can improve on the performance of the best individual classifier and, in some special cases, provide the optimal Bayes classifier.
This is possible if the individual classifiers make "different" errors.
For linear combiners, Tumer and Ghosh (1996) showed that averaging the outputs of individual classifiers with unbiased and uncorrelated errors can improve on the performance of the best individual classifier and, for an infinite number of classifiers, provide the optimal Bayes classifier.
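The averaging effect can be illustrated with a small simulation (a minimal sketch: the true posterior value, the noise model, and all the numbers below are invented for illustration). Averaging many unbiased, uncorrelated estimates of a class posterior yields a smaller expected error than a single estimate:

```python
import random

random.seed(0)

TRUE_POSTERIOR = 0.7   # hypothetical true P(class | x) for some pattern
NOISE = 0.2            # spread of each classifier's zero-mean estimation error
N_TRIALS = 2000

def estimate():
    """One classifier's posterior estimate: truth plus unbiased noise."""
    return TRUE_POSTERIOR + random.uniform(-NOISE, NOISE)

def mean_abs_error(n_classifiers):
    """Average |error| of the ensemble mean over many simulated patterns."""
    total = 0.0
    for _ in range(N_TRIALS):
        avg = sum(estimate() for _ in range(n_classifiers)) / n_classifiers
        total += abs(avg - TRUE_POSTERIOR)
    return total / N_TRIALS

single = mean_abs_error(1)
ensemble = mean_abs_error(10)
print(single, ensemble)  # the 10-classifier average has the smaller error
```

Because the errors are uncorrelated, their average shrinks roughly as 1/sqrt(N), which is the mechanism behind the Tumer and Ghosh result.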
Definitions
A "classifier" is any mapping from the space of features (measurements) to a space of class labels (names, tags, distances, probabilities).
A classifier is a hypothesis about the real relation between features and class labels.
A "learning algorithm" is a method to construct hypotheses.
A learning algorithm applied to a set of samples (the training set) outputs a classifier.
Definitions
A multiple classifier system (MCS) is a structured way to combine (exploit) the outputs of individual classifiers.
An MCS can be thought of as:
Multiple expert systems
Committees of experts
Mixtures of experts
Classifier ensembles
Composite classifier systems
Basic concepts
Multiple classifier systems (MCS) can be characterized by:
The architecture
Fixed/trained combination strategy
Others
MCS Architecture/Topology
Serial: Expert 1 -> Expert 2 -> ... -> Expert N
MCS Architecture/Topology
Parallel: Expert 1, Expert 2, ..., Expert N each process the input independently, and their outputs feed a combining strategy.
MCS Architecture/Topology
Hybrid: serial and parallel stages are mixed, e.g. Expert 1 ... Expert N feed Combiner 1, whose output is in turn fused by Combiner 2.
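The parallel topology can be sketched in a few lines of code (everything here is hypothetical: the three "experts", the fork/spoon features, and the pattern are invented to show the structure, not a real system):

```python
# Hypothetical "experts": each maps a pattern (a dict of features) to a label.
def expert_shine(pattern):
    return "spoon" if pattern["shiny"] else "fork"

def expert_shape(pattern):
    return "fork" if pattern["prongs"] > 0 else "spoon"

def expert_size(pattern):
    return "spoon" if pattern["length_cm"] < 18 else "fork"

EXPERTS = [expert_shine, expert_shape, expert_size]

def parallel_mcs(pattern, combine):
    """Parallel topology: every expert sees the pattern; a combiner fuses outputs."""
    outputs = [expert(pattern) for expert in EXPERTS]
    return combine(outputs)

def majority(labels):
    """A simple combining strategy: the most frequent label wins."""
    return max(set(labels), key=labels.count)

pattern = {"shiny": False, "prongs": 4, "length_cm": 19}
print(parallel_mcs(pattern, majority))  # -> "fork"
```

A serial topology would instead pipe each expert's output into the next expert, and a hybrid topology would mix both arrangements.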
Multiple Classifier Sources?
Different feature spaces: face, voice, fingerprint;
Different training sets: sampling;
Different classifiers: k-NN, neural net, SVM;
Different architectures: neural net layers, units, transfer function;
Different parameter values: K in k-NN, kernel in SVM;
Different initializations: neural net.
Multiple Classifier Sources?
Same feature space: three classifiers demonstrate different performance.
Multiple Classifier Sources?
Combination based on different feature spaces.
Combining based on a single feature space but different classifiers.
Architecture of multiple classifier combination
Fixed Combination Rules
Product, Minimum:
Independent feature spaces;
Different areas of expertise;
Error-free posterior probability estimates.
Sum (Mean), Median, Majority Vote:
Equal posterior-estimation distributions in the same feature space;
Differently trained classifiers, but drawn from the same distribution;
Bad if some classifiers (experts) are very good or very bad.
Maximum Rule:
Trust the most confident classifier/expert;
Bad if some classifiers (experts) are badly trained.
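The fixed rules above are one-liners over the classifiers' posterior vectors. A minimal sketch (the three posterior vectors are invented; it also shows that product and minimum can pick different classes):

```python
from functools import reduce

CLASSES = ["fork", "spoon", "knife"]

# Hypothetical posterior estimates: one row per classifier, one column per class.
outputs = [
    [0.60, 0.35, 0.05],
    [0.55, 0.35, 0.10],
    [0.30, 0.45, 0.25],
]

def combine(rule):
    """Apply a fixed rule column-wise (per class), then take the arg-max class."""
    scores = [rule([row[j] for row in outputs]) for j in range(len(CLASSES))]
    return CLASSES[scores.index(max(scores))]

product = lambda xs: reduce(lambda a, b: a * b, xs)
mean = lambda xs: sum(xs) / len(xs)

print(combine(product))  # "fork"  (0.099 vs 0.055 vs 0.00125)
print(combine(min))      # "spoon" (0.30 vs 0.35 vs 0.05): the rules can disagree
print(combine(mean))     # "fork"
print(combine(max))      # "fork"
```

Note how the minimum rule is dominated by each class's least confident supporter, while the product rule is punished most by any single near-zero estimate.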
Ever optimal?
Fixed combining rules are sub-optimal:
Base classifiers are never really independent (product);
Base classifiers are never really equally imperfectly trained (sum, median, majority);
Sensitivity to over-confident base classifiers (product, min, max).
Fixed combining rules are never optimal.
Trained combiner
Remarks on fixed and trained combination strategies
Fixed rules:
Simplicity;
Low memory and time requirements;
Well suited for ensembles of classifiers with independent/low-correlated errors and similar performances.
Trained rules:
Flexibility: potentially better performance than fixed rules;
Claimed to be more suitable than fixed rules for correlated classifiers or classifiers exhibiting different performances;
High memory and time requirements.
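One of the simplest trained combiners weights each classifier's vote by its accuracy on a held-out validation set. A minimal sketch (the toy "experts", the even/odd task, and the validation data are all invented for illustration):

```python
def weighted_vote(expert_outputs, weights):
    """Sum the weight of every expert voting for each label; pick the best."""
    scores = {}
    for label, w in zip(expert_outputs, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

def train_weights(experts, validation):
    """Trained step: each expert's weight is its validation accuracy."""
    weights = []
    for expert in experts:
        correct = sum(expert(x) == y for x, y in validation)
        weights.append(correct / len(validation))
    return weights

# Toy experts labelling integers as "even"/"odd"
expert_good = lambda x: "even" if x % 2 == 0 else "odd"
expert_bad = lambda x: "even"        # always answers "even"
experts = [expert_good, expert_bad, expert_bad]

validation = [(1, "odd"), (3, "odd"), (5, "odd"), (2, "even")]
weights = train_weights(experts, validation)   # [1.0, 0.25, 0.25]

# A plain majority vote would say "even" for 7; the trained weights fix it.
outputs = [e(7) for e in experts]              # ["odd", "even", "even"]
print(weighted_vote(outputs, weights))         # -> "odd"
```

This is where the higher memory/time cost shows up: the combiner needs extra labelled data and a training pass that fixed rules do not.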
Methods for fusing multiple classifiers
Methods for fusing multiple classifiers can be classified according to the type of information produced by the individual classifiers (Xu et al., 1992):
The abstract level: a classifier outputs only a unique label for each input pattern;
The rank level: each classifier outputs a ranked list of possible classes for each input pattern;
The measurement level: each classifier outputs class "confidence" levels for each input pattern.
For each of the above categories, methods can be further subdivided into integration vs. selection rules, and fixed vs. trained rules.
Example
The majority voting rule: a fixed rule at the abstract level.
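Majority voting needs nothing but the predicted labels. A minimal sketch of the common variant that rejects a pattern when no label reaches an absolute majority (the fork/spoon labels are illustrative):

```python
from collections import Counter

def majority_vote(labels, reject="reject"):
    """Abstract-level fixed rule: return the label predicted by more than
    half of the classifiers; otherwise reject the pattern."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count > len(labels) / 2 else reject

print(majority_vote(["fork", "fork", "spoon"]))   # "fork"
print(majority_vote(["fork", "spoon", "knife"]))  # "reject"
```

A plurality variant (most frequent label wins, no rejection) is equally common; which one to use depends on whether the application can afford rejected patterns.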
Fuser ("combination" rule)
Two main categories of fuser:
Integration (fusion) functions: for each pattern, all the classifiers contribute to the final decision. Integration assumes competitive classifiers.
Selection functions: for each pattern, just one classifier, or a subset, is responsible for the final decision. Selection assumes complementary classifiers.
Integration and selection can be "merged" to design a hybrid fuser.
Multiple functions may be necessary for non-parallel architectures.
Classifiers "Diversity" vs Fuser Complexity
Fusion is obviously useful only if the combined classifiers are mutually complementary:
Ideally, classifiers with high accuracy and high diversity.
The required degree of error diversity depends on the fuser complexity:
Majority vote fuser: the majority should always be correct;
Ideal selector: only one classifier needs to be correct for each pattern.
Classifiers "Diversity" vs Fuser Complexity
An example, four diversity levels (A. Sharkey, 1999):
Level 1: no more than one classifier is wrong for each pattern;
Level 2: the majority is always correct;
Level 3: at least one classifier is correct for each pattern;
Level 4: all classifiers are wrong for some patterns.
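Given a record of which classifiers were right on which patterns, Sharkey's four levels can be checked mechanically. A minimal sketch (the boolean-matrix encoding and the toy ensemble are my own, not from the original slides):

```python
def sharkey_level(correct):
    """Return the Sharkey diversity level of an ensemble.

    `correct[i][j]` is True iff classifier j is correct on pattern i.
    """
    n = len(correct[0])                          # number of classifiers
    wrong = [row.count(False) for row in correct]
    if max(wrong) <= 1:
        return 1  # no more than one classifier wrong on any pattern
    if all(w < n / 2 for w in wrong):
        return 2  # the majority is always correct
    if all(w < n for w in wrong):
        return 3  # at least one classifier is correct on every pattern
    return 4      # all classifiers are wrong on some pattern

# Three classifiers over four patterns (True = correct)
ensemble = [
    [True, True, False],
    [True, False, True],
    [True, True, True],
    [False, True, True],
]
print(sharkey_level(ensemble))            # 1
print(sharkey_level([[False] * 3]))       # 4
```

Levels 1 and 2 are exactly what a majority-vote fuser needs; level 3 is enough only for an ideal selector; at level 4 no fuser can be perfect.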
Classifiers Diversity
Measures of diversity in classifier ensembles are a matter of ongoing research (L. I. Kuncheva).
Key issue: how are the diversity measures related to the accuracy of the ensemble?
Simple fusers (e.g., majority voting) can be used for classifiers that exhibit a simple complementary pattern.
Complex fusers, for example a dynamic selector, are necessary for classifiers with a complex dependency model.
The required "complexity" of the fuser depends on the degree of classifier diversity.
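One of the simplest diversity measures studied in this literature is the pairwise disagreement rate: the fraction of patterns on which two classifiers output different labels, averaged over all pairs. A minimal sketch (the prediction lists are invented):

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of patterns on which two classifiers disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_disagreement(all_preds):
    """Average disagreement over all classifier pairs: a simple diversity measure."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

preds = [
    ["a", "a", "b", "b"],   # classifier 1
    ["a", "b", "b", "a"],   # classifier 2
    ["a", "a", "a", "b"],   # classifier 3
]
print(mean_pairwise_disagreement(preds))  # 0.5
```

A value of 0 means the classifiers are interchangeable (fusion gains nothing); how large a value actually predicts ensemble accuracy is precisely the open question noted above.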
Analogy Between MCS and Single Classifier Design
Single classifier: feature design -> classifier design -> performance evaluation.
MCS: ensemble design -> fuser design -> performance evaluation.
MCS DesignMCS Design
The design of MCS involves two main phases: The design of MCS involves two main phases: the design of the classifier ensemble, and the the design of the classifier ensemble, and the design of the fuserdesign of the fuser
The design of the classifier ensemble is aimed The design of the classifier ensemble is aimed to create a set of complementary/diverse to create a set of complementary/diverse classifiersclassifiers
The design of the combination function/fuser is The design of the combination function/fuser is aimed to create a fusion mechanism that can aimed to create a fusion mechanism that can exploit the complementarity/diversity of exploit the complementarity/diversity of classifiers and optimally combine themclassifiers and optimally combine them
The two above design phases are obviously The two above design phases are obviously linked (Roli and Giacinto, 2002)linked (Roli and Giacinto, 2002)
Methods for Constructing MCS
The effectiveness of an MCS relies on combining diverse/complementary classifiers.
Several approaches have been proposed to construct ensembles made up of complementary classifiers, among others:
Using problem and designer knowledge;
Injecting randomness;
Varying the classifier type, architecture, or parameters;
Manipulating the training data;
Manipulating the input features;
Manipulating the output features.
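"Manipulating the training data" is the idea behind bagging: train each ensemble member on a bootstrap resample of the training set. A minimal sketch with a toy nearest-class-mean learner (the 1-D data, the learner, and all numbers are invented for illustration):

```python
import random

random.seed(1)

def bootstrap(training_set):
    """Resample the training set with replacement (the bagging idea)."""
    return [random.choice(training_set) for _ in range(len(training_set))]

def train_nearest_mean(samples):
    """Toy learner: classify a 1-D point by the nearest class mean."""
    groups = {}
    for x, label in samples:
        groups.setdefault(label, []).append(x)
    centers = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(centers, key=lambda lab: abs(x - centers[lab]))

data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.9, "b"), (1.0, "b"), (1.1, "b")]
ensemble = [train_nearest_mean(bootstrap(data)) for _ in range(5)]

votes = [clf(0.25) for clf in ensemble]
decision = max(set(votes), key=votes.count)
print(decision)
```

Each bootstrap sample sees a slightly different training set, so the resulting classifiers make somewhat different errors, which is exactly the diversity the fuser needs.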
Using problem and designer knowledge
When problem or designer knowledge is available, "complementary" classification algorithms can be designed:
In applications with multiple sensors;
In applications where complementary representations of patterns are possible (e.g., statistical and structural representations);
When designer knowledge allows varying the classifier type, architecture, or parameters to create complementary classifiers.
These are heuristic approaches; they perform only as well as the problem and designer knowledge allows complementary classifiers to be designed.
Two main methods for MCS design (T. K. Ho, 2000)
Coverage optimisation methods: a simple fuser is given without any design; the goal is to create a set of complementary classifiers that can be fused optimally.
Decision optimisation methods: a set of carefully designed and optimised classifiers is given and unchangeable; the goal is to optimise the fuser.
Two main methods for MCS design
Decision optimisation is often used when previously, carefully designed classifiers are available, or when valid problem and designer knowledge is available.
Coverage optimisation makes sense when creating carefully designed, "strong" classifiers is difficult or time-consuming.
Integration of the two basic approaches is often used.
However, no design method guarantees the "optimal" ensemble for a given fuser or a given application (Roli and Giacinto, 2002).
The best MCS can only be determined by performance evaluation.
Rank-level Fusion Methods
Some classifiers provide class "scores", or some sort of class probabilities.
This information can be used to "rank" each class, e.g.:
Pc1 = 0.10 -> Rc1 = 1
Classifier -> Pc2 = 0.75 -> Rc2 = 3
Pc3 = 0.15 -> Rc3 = 2
In general, if Ω = {c1, ..., ck} is the set of classes, the classifiers can provide an "ordered" (ranked) list of class labels.
The Borda Count Method: an example
Let N = 3 and k = 4, Ω = {a, b, c, d}. For a given pattern, the ranked outputs of the three classifiers are as follows:
Rank value  Classifier 1  Classifier 2  Classifier 3
4           c             a             b
3           b             b             a
2           d             d             c
1           a             c             d
The Borda Count Method: an example
So we have:
r_a = r_a1 + r_a2 + r_a3 = 1 + 4 + 3 = 8
r_b = r_b1 + r_b2 + r_b3 = 3 + 3 + 4 = 10
r_c = r_c1 + r_c2 + r_c3 = 4 + 1 + 2 = 7
r_d = r_d1 + r_d2 + r_d3 = 2 + 2 + 1 = 5
The winner is class b, because it has the maximum overall rank.
Remarks on Rank-level Methods
Advantage over the abstract level (majority vote):
Ranking is suitable in problems with many classes, where the correct class may often appear near the top of the list, although not at the top.
Example: word recognition with a sizeable lexicon.
Advantages over the measurement level:
Rankings can be preferred to soft outputs to avoid lack of consistency when using different classifiers;
Rankings can be preferred to soft outputs to simplify the combiner design.
Drawbacks:
Rank-level methods are not supported by clear theoretical underpinnings;
Results depend on the scale of numbers assigned to the choices.
Open issues
General combination strategies are only sub-optimal solutions for most applications.
References
1. K. Sirlantzis, "Diversity in Multiple Classifier Systems", University of Kent, www.ee.kent.ac.uk.
2. F. Roli, "Fusion of Multiple Pattern Classifiers" (tutorial), University of Cagliari.
3. R. P. W. Duin, "The Combining Classifier: to Train or Not to Train?", ICPR 2002, Pattern Recognition Group, Faculty of Applied Sciences.
4. L. Xu, A. Krzyzak, C. Y. Suen, "Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition", IEEE Transactions on Systems, Man, and Cybernetics, 22(3), 1992, pp. 418-435.
5. J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), March 1998, pp. 226-239.
6. D. Tax, M. van Breukelen, R. P. W. Duin, J. Kittler, "Combining Multiple Classifiers by Averaging or by Multiplying?", Pattern Recognition, 33(2000), pp. 1475-1485.
7. L. I. Kuncheva, "A Theoretical Study on Six Classifier Fusion Strategies", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 2002, pp. 281-286.