Committee Based Approaches to Active Learning

Jerzy Stefanowski
+ support of Marcin Pachocki
Institute of Computing Science, Poznań University of Technology

TPD students – Advanced Data Mining, Poznan, 2009/2010
A typical approach to supervised learning

Construct a data representation (objects × attributes) and label the examples
Possibly pre-process (feature construction)
Learn from all labeled examples
Motivations

Limited number of labeled examples; unlabeled examples are easily available
Labeling is costly
Examples: classification of Web pages, email filtering, text categorization.

Aims: an efficient classifier with a minimal number of additionally labeled examples
Active Learning

Passive vs. active learning: the algorithm controls its input examples
It is able to query an oracle (teacher) and receive a response (label) before outputting a final classifier
How to select the queries?

[Diagram: unlabeled examples feed the learning algorithm, which sends queries to the oracle; the labeled responses return to the learner, which outputs the classifier]
Active Learning Structure

"A wise learner is one which will classify easy cases by itself and reserve difficult cases for the teacher" [Turney]

Who is an oracle?
Do not ask too many questions!
Query vs. random sampling
Previous Research

Selective sampling [Cohn et al. 94]
Uncertainty sampling [Lewis, Catlett 94]
…
Ensembles:
Query by Committee of two [Seung et al.; Freund et al. 97]
Sampling committees – Query by Committee [Abe, Mamitsuka 98]
QBC and Active Decorate [Melville, Mooney 04]
Query by Committee

L – set of labeled examples
U – set of unlabeled examples
A – base learning algorithm
k – number of active learning iterations
m – size of each sample

Repeat k times:
1. Generate a committee of classifiers C* = EnsembleMethod(A, L)
2. For each x in U compute Info_val(C*, x), based on the current committee
3. Select a subset S of m examples with maximal Info_val
4. Obtain labels from the Oracle for the examples in S
5. Remove the examples in S from U and add them to L
Return Ensemble
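The loop above can be sketched in plain Python. Everything concrete here is an illustrative assumption, not the authors' WEKA implementation: the committee is built by bagging (bootstrap resampling), Info_val is vote entropy, and `train` / `oracle` are caller-supplied callables.

```python
import math
import random
from collections import Counter

def vote_entropy(committee, x):
    """Info_val: committee disagreement on x, measured as the
    entropy of the vote distribution over the predicted classes."""
    votes = Counter(clf(x) for clf in committee)
    n = sum(votes.values())
    return -sum((v / n) * math.log(v / n) for v in votes.values())

def qbc(L, U, train, ensemble_size, k, m, oracle, rng):
    """Query-by-Committee loop.  L: list of (x, label) pairs,
    U: list of unlabeled x, train(sample) -> classifier callable."""
    L, U = list(L), list(U)
    for _ in range(k):
        # 1. Build a committee on bootstrap samples of L (bagging).
        committee = [train([rng.choice(L) for _ in L])
                     for _ in range(ensemble_size)]
        # 2.-3. Pick the m examples the committee disagrees on most.
        U.sort(key=lambda x: vote_entropy(committee, x), reverse=True)
        S, U = U[:m], U[m:]
        # 4.-5. Ask the oracle for their labels and move them into L.
        L.extend((x, oracle(x)) for x in S)
    return train(L), L
```

The sketch deliberately re-trains the whole committee each iteration, as in the pseudocode; an incremental variant would reuse the previous committee.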
Active vs. Passive Learning
Research Questions

Algorithms for constructing committees
Selection of examples to query
Disagreement measures
How many examples to query
Influence of creating the starting labeled set
An experimental evaluation of the different approaches
Constructing Committees

Considered approaches:
Bagging [Abe, Mamitsuka 98]
Boosting [Abe, Mamitsuka 98]
Decorate [Melville, Mooney 03] → Active Decorate [MM 04]
– an iterative and adaptive multiple classifier
– additional artificial training examples to get more diversified component classifiers
Random Forests
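Decorate's distinctive step — adding artificial training examples labeled contrary to the current ensemble, so that the next component classifier is forced to disagree — can be sketched roughly as below. The per-attribute Gaussian feature model and inverse-probability labeling follow the general idea of [Melville, Mooney 03], but every name here is illustrative.

```python
import random
import statistics

def artificial_examples(X, ensemble_proba, n, rng):
    """Draw n artificial examples from per-attribute Gaussians fitted
    to the training data X, then label each by sampling a class with
    probability inversely proportional to the current ensemble's
    predicted class probabilities (so a new member trained on them
    tends to disagree with the ensemble).
    ensemble_proba(x) -> dict mapping class -> probability."""
    dims = list(zip(*X))
    means = [statistics.mean(d) for d in dims]
    stdevs = [statistics.pstdev(d) or 1.0 for d in dims]
    out = []
    for _ in range(n):
        x = tuple(rng.gauss(m, s) for m, s in zip(means, stdevs))
        inv = {c: 1.0 / (p + 1e-9) for c, p in ensemble_proba(x).items()}
        z = sum(inv.values())
        r, acc = rng.random() * z, 0.0
        for c, w in inv.items():
            acc += w
            if r <= acc:
                out.append((x, c))
                break
    return out
```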
Disagreement measures

Analysing the predictions of the base classifiers:
Margin of the classified example – difference between the numbers of votes in the committee for the most and the second most predicted class
Probability vectors instead of single predictions – a generalization of margins: the difference between probabilities
Distance between the component classifiers and the final answer:
– median distance
– Jensen–Shannon divergence
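Two of the measures above can be sketched directly; this is a minimal illustration of the definitions, not the authors' code, and the input formats (vote counts as a dict, probability vectors as lists) are assumptions.

```python
import math

def margin(vote_counts):
    """Margin: votes for the top class minus votes for the runner-up.
    A SMALL margin means HIGH committee disagreement, so a QBC learner
    queries the examples with the smallest margins."""
    top = sorted(vote_counts.values(), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0)

def entropy(p):
    return -sum(x * math.log(x, 2) for x in p if x > 0)

def js_divergence(distributions):
    """Jensen-Shannon divergence of the members' class-probability
    vectors: entropy of the mean distribution minus the mean of the
    entropies.  0 when all members agree; larger = more disagreement."""
    n = len(distributions)
    mean = [sum(d[i] for d in distributions) / n
            for i in range(len(distributions[0]))]
    return entropy(mean) - sum(entropy(d) for d in distributions) / n
```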
Constructing the starting training set?

Influence of choosing the starting set L in active learning – studied in controlled experiments
Random sample vs. focused selection
Edited k-NN [Stefanowski, Wilk 07] – use Wilson's edited nearest neighbour rule:
– compare an example's label with those of its neighbours
– safe → correctly classified by its k nearest neighbours
– choose the most safe examples from the given classes
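Wilson's editing step can be sketched as follows; this brute-force version for small numeric data uses squared Euclidean distance and k=3 as illustrative assumptions.

```python
from collections import Counter

def wilson_edit(data, k=3):
    """Wilson's edited nearest-neighbour rule: keep only the 'safe'
    examples whose label matches the majority label of their k nearest
    neighbours (the example itself excluded).
    data: list of (feature_tuple, label) pairs."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    safe = []
    for i, (x, y) in enumerate(data):
        neighbours = sorted(
            (p for j, p in enumerate(data) if j != i),
            key=lambda p: dist(p[0], x))[:k]
        majority = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        if majority == y:
            safe.append((x, y))
    return safe
```

An active learner would then draw its starting set L from `safe`, optionally balancing it across the classes as the slide suggests.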
Usefulness of different QBC approaches

An experimental comparative study:
Performance of the different approaches to QBC
Using 4 disagreement measures
Selecting the starting training set – random vs. edited k-NN
Choosing single or more examples to query
6 benchmark data sets (UCI)
Learning curves – performance of passive vs. active learners
Implementations in WEKA
Different committees in AL
[Figure: comparing different active learners on the Soybean data]
Different committees in AL
[Figures: learning curves for Soybean, Wine, Breast Wisconsin and Ionosphere]
Selecting the starting set in QBC
[Figures: Soybean – random selection vs. edited k-NN]
Reduction of labeled examples – Active vs. Passive
Computational costs
Comparing disagreement measures
[Figures: Breast, Soybean]
Conclusions

QBC in AL → accuracy comparable to the passive versions with a much smaller number of labeled examples
The best reduction ratio → Active Decorate (on 4 of 6 data sets)
Trade-off with computational costs:
– Active Decorate ∼ 10 times more costly
– Random Forests → the fastest
Selection of the starting training set → edited k-NN improves all approaches; the best reduction ratio again → Active Decorate
Increasing the number of queries added per iteration → not much gain
Choice of disagreement measures: no big influence, except on multi-class data (Soybean), where generalized margins → JS divergence
Related Topics
Co-training and Decomposition

Co-training is a method that can be applied to machine learning problems with multiple views: the problem has a natural way to divide its features into subsets, which we call views.
There is sufficient redundant information in the description of the examples that a number of distinct feature sets can be formed, each of which is sufficient for describing the target function.
Blum and Mitchell 98: two views for classifying Web pages → the body of the page and the anchor text of the links pointing to it.
Kiritchenko and Matwin [KM01]: an application to classifying e-mail → the body and the subject of the email.
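The co-training idea above can be sketched as two view-specific classifiers that take turns labeling the unlabeled examples they are most confident about for each other. This is a minimal sketch of the general scheme of [Blum, Mitchell 98]; the `train_view` interface and all parameter names are illustrative assumptions.

```python
def co_train(L, U, train_view, rounds, per_round):
    """Co-training sketch.
    L: list of ((view1, view2), label) pairs
    U: list of unlabeled (view1, view2) pairs
    train_view(view_index, sample) -> (predict, confidence) callables."""
    L, U = list(L), list(U)
    for _ in range(rounds):
        for view in (0, 1):
            if not U:
                break
            predict, confidence = train_view(view, L)
            # Move this view's most confidently predicted examples into
            # L, labeled by this view's classifier, so the OTHER view's
            # classifier learns from them next.
            U.sort(key=lambda x: confidence(x[view]), reverse=True)
            chosen, U = U[:per_round], U[per_round:]
            L.extend((x, predict(x[view])) for x in chosen)
    return L
```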
References

Naoki Abe, Hiroshi Mamitsuka (1998), Query learning strategies using boosting and bagging. In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), 1–9.
A. Blum, T. Mitchell (1998), Combining labeled and unlabeled data with co-training. In: Proceedings of the Workshop on Computational Learning Theory.
D. Cohn, L. Atlas, R. Ladner (1994), Improving generalization with active learning. Machine Learning, 15(2), 201–221.
Michael Davy (2005), A Review of Active Learning and Co-Training in Text Classification. Department of Computer Science, Trinity College Dublin, Research Report TCD-CS-2005-64, 39 pp.
S. Kiritchenko, S. Matwin (2001), Email classification with co-training. In: Proceedings of the CASCON '01 Conference.
P. Melville, R. Mooney (2004), Diverse ensembles for active learning. In: Proceedings of the 21st International Conference on Machine Learning, 584–591.
J. Stefanowski, M. Pachocki (2009), Comparing Performance of Committee based Approaches to Active Learning. In: M. Kłopotek, A. Przepiórkowski, S. Wierzchoń, K. Trojanowski (eds.), Recent Advances in Intelligent Information Systems, Wydawnictwo EXIT, Warszawa, 457–470.