Committee Based Approaches to Active Learning

Jerzy Stefanowski
+ support of Marcin Pachocki
Institute of Computing Science, Poznań University of Technology

TPD students – Advanced Data Mining, Poznan, 2009/2010
A typical approach to supervised learning

Construct a data representation (objects × attributes) and label the examples
Possibly pre-process (feature construction)
Learn from all labeled examples
Motivations

Limited number of labeled examples; unlabeled examples are easily available
Labeling is costly
Examples: classification of Web pages, email filtering, text categorization.

Aims: an efficient classifier with a minimal number of additionally labeled examples
Active Learning

Passive vs. active learning: the algorithm controls its input examples
It is able to query an oracle (teacher) and receive a response (label) before outputting a final classifier
How to select the queries?

[Diagram: unlabeled examples feed the learning algorithm, which sends queries to the oracle; the labeled responses return to the learner, which outputs the classifier]
Active Learning Structure

"A wise learner is one which will classify easy cases by itself and reserve difficult cases for the teacher" [Turney]

Who is an oracle?
Do not ask too many questions!
Query vs. random sampling
Previous Research

Selective sampling [Cohn et al. 94]
Uncertainty sampling [Lewis, Catlett 94]
…
Ensembles:
Query by Committee of two [Seung et al.; Freund et al. 97]
Sampling committees – Query by Committee [Abe, Mamitsuka 98]
QBC and Active Decorate [Melville, Mooney 04]
Query by Committee

L – set of labeled examples
U – set of unlabeled examples
A – base learning algorithm
k – number of active learning iterations
m – size of each sample

Repeat k times:
1. Generate a committee of classifiers C* = EnsembleMethod(A, L)
2. For each x in U compute Info_val(C*, x), based on the current committee
3. Select a subset S of m examples with maximal Info_val
4. Obtain labels from the Oracle for the examples in S
5. Remove the examples in S from U and add them to L
Return Ensemble
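The loop above can be sketched in plain Python. Everything concrete here is an illustrative assumption, not the authors' WEKA implementation: the committee is built by bagging (bootstrap resampling), Info_val is vote entropy, and `train` / `oracle` are caller-supplied callables.

```python
import math
import random
from collections import Counter

def vote_entropy(committee, x):
    """Info_val: committee disagreement on x, measured as the
    entropy of the vote distribution over the predicted classes."""
    votes = Counter(clf(x) for clf in committee)
    n = sum(votes.values())
    return -sum((v / n) * math.log(v / n) for v in votes.values())

def qbc(L, U, train, ensemble_size, k, m, oracle, rng):
    """Query-by-Committee loop.  L: list of (x, label) pairs,
    U: list of unlabeled x, train(sample) -> classifier callable."""
    L, U = list(L), list(U)
    for _ in range(k):
        # 1. Build a committee on bootstrap samples of L (bagging).
        committee = [train([rng.choice(L) for _ in L])
                     for _ in range(ensemble_size)]
        # 2.-3. Pick the m examples the committee disagrees on most.
        U.sort(key=lambda x: vote_entropy(committee, x), reverse=True)
        S, U = U[:m], U[m:]
        # 4.-5. Ask the oracle for their labels and move them into L.
        L.extend((x, oracle(x)) for x in S)
    return train(L), L
```

The sketch deliberately re-trains the whole committee each iteration, as in the pseudocode; an incremental variant would reuse the previous committee.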
Active vs. Passive Learning
Research Questions

Algorithms for constructing committees
Selection of examples to query
Disagreement measures
How many examples to query
Influence of creating the starting labeled set
An experimental evaluation of the different approaches
Constructing Committees

Considered approaches:
Bagging [Abe, Mamitsuka 98]
Boosting [Abe, Mamitsuka 98]
Decorate [Melville, Mooney 03] → Active Decorate [MM 04]
– an iterative and adaptive multiple classifier
– additional artificial training examples to get more diversified component classifiers
Random Forests
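Decorate's distinctive step — adding artificial training examples labeled contrary to the current ensemble, so that the next component classifier is forced to disagree — can be sketched roughly as below. The per-attribute Gaussian feature model and inverse-probability labeling follow the general idea of [Melville, Mooney 03], but every name here is illustrative.

```python
import random
import statistics

def artificial_examples(X, ensemble_proba, n, rng):
    """Draw n artificial examples from per-attribute Gaussians fitted
    to the training data X, then label each by sampling a class with
    probability inversely proportional to the current ensemble's
    predicted class probabilities (so a new member trained on them
    tends to disagree with the ensemble).
    ensemble_proba(x) -> dict mapping class -> probability."""
    dims = list(zip(*X))
    means = [statistics.mean(d) for d in dims]
    stdevs = [statistics.pstdev(d) or 1.0 for d in dims]
    out = []
    for _ in range(n):
        x = tuple(rng.gauss(m, s) for m, s in zip(means, stdevs))
        inv = {c: 1.0 / (p + 1e-9) for c, p in ensemble_proba(x).items()}
        z = sum(inv.values())
        r, acc = rng.random() * z, 0.0
        for c, w in inv.items():
            acc += w
            if r <= acc:
                out.append((x, c))
                break
    return out
```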
Disagreement measures

Analysing the predictions of the base classifiers:
Margin of the classified example – difference between the numbers of votes in the committee for the most and the second most predicted class
Probability vectors instead of single predictions – a generalization of margins: the difference between probabilities
Distance between the component classifiers and the final answer:
– median distance
– Jensen–Shannon divergence
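Two of the measures above can be sketched directly; this is a minimal illustration of the definitions, not the authors' code, and the input formats (vote counts as a dict, probability vectors as lists) are assumptions.

```python
import math

def margin(vote_counts):
    """Margin: votes for the top class minus votes for the runner-up.
    A SMALL margin means HIGH committee disagreement, so a QBC learner
    queries the examples with the smallest margins."""
    top = sorted(vote_counts.values(), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0)

def entropy(p):
    return -sum(x * math.log(x, 2) for x in p if x > 0)

def js_divergence(distributions):
    """Jensen-Shannon divergence of the members' class-probability
    vectors: entropy of the mean distribution minus the mean of the
    entropies.  0 when all members agree; larger = more disagreement."""
    n = len(distributions)
    mean = [sum(d[i] for d in distributions) / n
            for i in range(len(distributions[0]))]
    return entropy(mean) - sum(entropy(d) for d in distributions) / n
```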
Constructing the starting training set?

Influence of choosing the starting set L in active learning – studied in controlled experiments
Random sample vs. focused selection
Edited k-NN [Stefanowski, Wilk 07] – use Wilson's edited nearest neighbour rule:
– compare an example's label with those of its neighbours
– safe → correctly classified by its k nearest neighbours
– choose the most safe examples from the given classes
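Wilson's editing step can be sketched as follows; this brute-force version for small numeric data uses squared Euclidean distance and k=3 as illustrative assumptions.

```python
from collections import Counter

def wilson_edit(data, k=3):
    """Wilson's edited nearest-neighbour rule: keep only the 'safe'
    examples whose label matches the majority label of their k nearest
    neighbours (the example itself excluded).
    data: list of (feature_tuple, label) pairs."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    safe = []
    for i, (x, y) in enumerate(data):
        neighbours = sorted(
            (p for j, p in enumerate(data) if j != i),
            key=lambda p: dist(p[0], x))[:k]
        majority = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        if majority == y:
            safe.append((x, y))
    return safe
```

An active learner would then draw its starting set L from `safe`, optionally balancing it across the classes as the slide suggests.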
Usefulness of different QBC approaches

An experimental comparative study:
Performance of the different approaches to QBC
Using 4 disagreement measures
Selecting the starting training set – random vs. edited k-NN
Choosing single or more examples to query
6 benchmark data sets (UCI)
Learning curves – performance of passive vs. active learners
Implementations in WEKA
Different committees in AL
[Figure: comparing different active learners on the Soybean data]
Different committees in AL
[Figures: learning curves for Soybean, Wine, Breast Wisconsin and Ionosphere]
Selecting the starting set in QBC
[Figures: Soybean – random selection vs. edited k-NN]
Reduction of labeled examples – Active vs. Passive
Computational costs
Comparing disagreement measures
[Figures: Breast, Soybean]
Conclusions

QBC in AL → accuracy comparable to the passive versions with a much smaller number of labeled examples
The best reduction ratio → Active Decorate (on 4 of 6 data sets)
Trade-off with computational costs:
– Active Decorate ∼ 10 times more costly
– Random Forests → the fastest
Selection of the starting training set → edited k-NN improves all approaches; the best reduction ratio again → Active Decorate
Increasing the number of queries added per iteration → not much gain
Choice of disagreement measures: no big influence, except on multi-class data (Soybean), where generalized margins → JS divergence
Related Topics
Co-training and Decomposition

Co-training is a method that can be applied to machine learning problems with multiple views: the problem has a natural way to divide its features into subsets, which we call views.
There is sufficient redundant information in the description of the examples that a number of distinct feature sets can be formed, each of which is sufficient for describing the target function.
Blum and Mitchell 98: two views for classifying Web pages → the body of the page and the anchor text of the links pointing to it.
Kiritchenko and Matwin [KM01]: an application to classifying e-mail → the body and the subject of the email.
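The co-training idea above can be sketched as two view-specific classifiers that take turns labeling the unlabeled examples they are most confident about for each other. This is a minimal sketch of the general scheme of [Blum, Mitchell 98]; the `train_view` interface and all parameter names are illustrative assumptions.

```python
def co_train(L, U, train_view, rounds, per_round):
    """Co-training sketch.
    L: list of ((view1, view2), label) pairs
    U: list of unlabeled (view1, view2) pairs
    train_view(view_index, sample) -> (predict, confidence) callables."""
    L, U = list(L), list(U)
    for _ in range(rounds):
        for view in (0, 1):
            if not U:
                break
            predict, confidence = train_view(view, L)
            # Move this view's most confidently predicted examples into
            # L, labeled by this view's classifier, so the OTHER view's
            # classifier learns from them next.
            U.sort(key=lambda x: confidence(x[view]), reverse=True)
            chosen, U = U[:per_round], U[per_round:]
            L.extend((x, predict(x[view])) for x in chosen)
    return L
```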
References

Naoki Abe, Hiroshi Mamitsuka (1998), Query learning strategies using boosting and bagging. In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), 1–9.
A. Blum, T. Mitchell (1998), Combining labeled and unlabeled data with co-training. In: Proceedings of the Workshop on Computational Learning Theory.
D. Cohn, L. Atlas, R. Ladner (1994), Improving generalization with active learning. Machine Learning, 15(2), 201–221.
Michael Davy (2005), A Review of Active Learning and Co-Training in Text Classification. Department of Computer Science, Trinity College Dublin, Research Report TCD-CS-2005-64, 39 pp.
S. Kiritchenko, S. Matwin (2001), Email classification with co-training. In: Proceedings of the CASCON '01 Conference.
P. Melville, R. Mooney (2004), Diverse ensembles for active learning. In: Proceedings of the 21st International Conference on Machine Learning, 584–591.
J. Stefanowski, M. Pachocki (2009), Comparing Performance of Committee based Approaches to Active Learning. In: M. Kłopotek, A. Przepiórkowski, S. Wierzchoń, K. Trojanowski (eds.), Recent Advances in Intelligent Information Systems, Wydawnictwo EXIT, Warszawa, 457–470.