TRANSCRIPT
Graz, the 18th of October 2016
Active Learning: Applications, Foundations & Emerging Trends
Workshop & Tutorial at i-KNOW 2016
Daniel Kottke1 Georg Krempl2 Vincent Lemaire3 Edwin Lughofer4
1 Kassel University, Kassel, Germany  2 Otto-von-Guericke University Magdeburg, Germany  3 Orange Labs, Lannion, France  4 Johannes Kepler University, Linz, Austria
1/38 | Active Machine Learning | Knowledge Management & Discovery
Schedule
Morning Session
10:30-12:30 Tutorial and Discussion
Afternoon Session
14:00-14:20 MapView: Graphical Data Representation for Active Learning by E. Weigl, A. Walch, U. Neissl, P. Meyer-Heye, Th. Radauer, E. Lughofer, W. Heidl and Ch. Eitzinger
14:20-14:40 Active Learning with SVM for Land Cover Classification - What Can Go Wrong? by S. Wuttke, W. Middelmann and U. Stilla
14:40-15:00 Dynamic Parameter Adaptation of SVM Based Active Learning Methodology by J. Smailovic, M. Grcar, N. Lavrac and M. Znidarsic
15:00-15:20 Investigating Exploratory Capabilities of Uncertainty Sampling using SVMs in Active Learning by D. Lang, D. Kottke, G. Krempl and M. Spiliopoulou
15:20-15:40 Active Subtopic Detection in Multitopic Data by B. Bergner and G. Krempl
15:40-16:00 Closing
Part 1: Introduction
I Motivation, Task & Scenarios
I Selected Approaches
  I Version Space Partitioning & Query by Committee
  I Uncertainty Sampling
  I Expected Error Reduction
  I Probabilistic Active Learning
I From Pools to Evolving Streams
I A First Summary
Part 2: Online Active Learning & Applications, presented by Edwin Lughofer
Part 3: Evaluation in Active Learning, presented by Daniel Kottke
Motivating Applications
Credit Scoring & Fraud Detection
I predict from revenue whether a client will pay or default
I predict whether a credit card transaction is fraudulent or legitimate
I relevant e.g. for banks or e-commerce companies
Brain Computer Interfaces
I predict from EEG pattern the action the user desires
I relevant e.g. for intelligent prostheses
Historical Map Annotation
I identify from scanned pixel data the annotations in historical maps
Motivating Applications
(Supervised) Machine Learning Tasks
I Historical Data, e.g. previous clients' records
I Generate Training Sample with explanatory variables (e.g. profit) and class label (e.g. default)
I Estimate Distributions: joint distributions d(x, y) or posterior distributions d(y|x) = d(x, y) / d(x)
I Derive Decision Boundary at intersections of the posterior distributions
I Make Automated Predictions for New Instances, e.g. predict a new client's class label
I Done! (?)
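The estimate-then-decide pipeline above can be sketched numerically. A minimal sketch, assuming two Gaussian class-conditional densities with illustrative means and priors (these numbers are assumptions, not from the slides):

```python
# Bayes' rule d(y|x) = d(x, y) / d(x) for two Gaussian classes; the decision
# boundary lies where the posteriors intersect.
import math

def gauss(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_pos(x, mu_pos=1.0, mu_neg=-1.0, sigma=1.0, prior_pos=0.5):
    """d(y=+|x) = d(x|+) * P(+) / d(x), via Bayes' rule."""
    joint_pos = gauss(x, mu_pos, sigma) * prior_pos
    joint_neg = gauss(x, mu_neg, sigma) * (1 - prior_pos)
    return joint_pos / (joint_pos + joint_neg)

# With equal priors and equal variances the posteriors intersect at the
# midpoint of the means, so the decision boundary is x = 0:
assert abs(posterior_pos(0.0) - 0.5) < 1e-9
```

New instances are then classified by comparing their posterior against 0.5, which is exactly the "automated prediction" step of the pipeline.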
Motivating Applications
Challenge
I Some labels are expensive
I Labelling all historical instances might be impossible
Exemplary Applications
I Credit Scoring & Fraud Detection: e.g. costly to accept high-risk clients for model building, impossible to investigate all credit card transactions
I Brain Computer Interfaces: e.g. performing calibration tasks can be tedious for the user
I Historical Map Annotation: e.g. the domain expert might be expensive or have limited time
Motivation
Big Data, but . . .
I Expert’s time is scarce,
I Storage & processing capacities are limited
Selection is important
I Efficient allocation of limited resources
I Sample where we expect something interesting
Active Learning1
Setting
I Some information is costly (some not)
I Active learner controls selection process
Objective
I Select the most valuable information
I Baseline: Random selection
Historical Remarks
I Optimal experimental design [Fedorov, 1972]
I Learning with queries/query synthesis [Angluin, 1988]
I Selective sampling [Cohn et al., 1990]
1See e.g. [Settles, 2012, Cohn, 2010].
Selective Data Acquisition Tasks2
Active Learning Scenarios
I Query synthesis: example generated upon query
I Pool U of unlabelled data: static, repeated access
I Stream: sequential arrival, no repeated access
Type of Selected Information
I Active label acquisition
I Active feature (value) acquisition
I Active class selection, also denoted active class-conditional example acquisition
I . . .

[Figure: instances (x1, y1), (x2, y2), . . . arriving sequentially over time in a stream]

2Own categorization, inspired by [Attenberg et al., 2011, Saar-Tsechansky et al., 2009, Settles, 2009].
Overview on Active Learning Strategies
Selected Active Learning Strategies3
I Version Space Partitioning & Query by Committee
I Uncertainty Sampling
I Decision Theoretic Approaches
I Loss Minimisation: Expected Error & Variance Reduction
I Probabilistic Active Learning
3Generic, i.e. usable with different classifier technologies.
Version Space Partitioning4
I Version Space Partitioning [Ruff and Dietterich, 1989]: selection based on disagreement between hypotheses
I Query by Committee [Seung et al., 1992]:
  I Disagreement within an ensemble of classifiers
  I Requires constructing a diverse ensemble of classifiers
  I Combinations with clustering (mixture models)

[Figure: two classifiers over features x1 and x2; candidates are queried from their region of disagreement]
4See [Ruff and Dietterich, 1989].
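Committee disagreement can be quantified, for instance, as the entropy of the members' votes. A hedged sketch (the toy threshold classifiers below are assumptions for illustration, not the ensemble construction discussed on the slide):

```python
# Query-by-Committee disagreement via vote entropy: candidates on which the
# committee splits its votes score highest.
import math
from collections import Counter

def vote_entropy(committee, x, labels=("+", "-")):
    """Entropy of the committee's predicted labels for candidate x."""
    votes = Counter(clf(x) for clf in committee)
    n = len(committee)
    return -sum((votes[y] / n) * math.log(votes[y] / n)
                for y in labels if votes[y] > 0)

# Two threshold classifiers that disagree on instances between 0 and 1:
committee = [lambda x: "+" if x > 0 else "-",
             lambda x: "+" if x > 1 else "-"]

# Maximal disagreement inside the version-space gap, none outside it:
assert vote_entropy(committee, 0.5) > vote_entropy(committee, 2.0)
```

Selection then picks the unlabelled candidate maximising this disagreement score.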
Uncertainty Sampling6
I Information theoretic approach
I Uses classifier's uncertainty as proxy
I Common uncertainty measures5
  I Posterior-based:
    Confidence: |P(y = +|x) − P(y = −|x)|
    Entropy: −Σ_{y∈{+,−}} p(y|x) log p(y|x)
  I Margin: distance to decision boundary
I Fast: O(|U|), where U is the set of unlabelled instances
I But do these measures really capture the uncertainty?
5See e.g. [Settles, 2012]. 6See [Roy and McCallum, 2001].
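The posterior-based measures above are cheap to evaluate. A minimal sketch of confidence- and entropy-based selection over a pool of posterior estimates (the pool values are made up for illustration):

```python
# Posterior-based uncertainty measures: both peak at P(+|x) = 0.5 and select
# the instance closest to the decision boundary in O(|U|) time.
import math

def confidence_uncertainty(p_pos):
    """1 - |P(+|x) - P(-|x)|: lower confidence means higher uncertainty."""
    return 1 - abs(p_pos - (1 - p_pos))

def entropy_uncertainty(p_pos):
    """Shannon entropy of the binary posterior."""
    return -sum(p * math.log(p) for p in (p_pos, 1 - p_pos) if p > 0)

def select(pool_posteriors, measure):
    # One evaluation per unlabelled instance, hence O(|U|).
    return max(range(len(pool_posteriors)), key=lambda i: measure(pool_posteriors[i]))

posteriors = [0.95, 0.6, 0.51, 0.2]          # estimates P(+|x) for the pool U
assert select(posteriors, confidence_uncertainty) == 2   # 0.51 is closest to 0.5
```

A margin-based variant would replace the posterior with the distance to the decision boundary, which requires access to the classifier's decision function.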
Exemplary AL Situations
[Figure: four exemplary situations I-IV, arranged by the number of labels n (low vs. high) and the observed distribution of labels p̂ (uniform vs. non-uniform) in the candidate's neighbourhood]

I a label's value depends on the label information in its neighbourhood
I label information:
  I number of labels
  I share of classes
I uncertainty sampling ignores the number of similar labels
Measuring the Uncertainty
Problem with above measures
I Focuses on exploitation, fails at exploration [Beyer et al., 2015]
I "Uncertainty" measures ignore the uncertainty of the prediction model (cmp. epistemic vs. aleatoric uncertainty in [Senge et al., 2014])

Extensions: Combined Measures
I [Fu et al., 2012]:
  I uncertainty
  I instance correlation (within batch)
I [Reitmaier and Sick, 2013], 4DS approach, considering:
  I distance to the decision boundary
  I diversity of samples in the query set
  I density
  I class prior
I [Zliobaite et al., 2013]:
  I uncertainty sampling combined with
  I randomization for better exploration
I [Weigl et al., 2015]:
  I conflict: overlap of opposing classes
  I ignorance: proximity of nearest decision boundary
Decision Theoretic Approaches
Expected Error Reduction [Cohn et al., 1996, Roy and McCallum, 2001]
I Aim: Minimise the error after selection & retraining
I Model the unknown label realisation as a random variable:

x* = argmin_x E_{y|L} [ Σ_{x'∈U} E_{y'|L'=L∪(x,y)} [ ŷ' ≠ y' ] ]

I Better results reported than for uncertainty sampling [Settles, 2012]
I Relies on a maximum-likelihood posterior estimate [Chapelle, 2005]
I Performance estimation relies on an evaluation set (using L or by self-labelling U)
I High computational complexity: O(|U|²)
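The nested expectation above can be sketched as a loop: for each candidate x and each possible label y, retrain on L ∪ {(x, y)} and estimate the remaining error on the self-labelled pool U, weighting by the current model's P(y|x). A hedged sketch; the Parzen-style posterior and the toy data are assumptions for illustration, not the authors' implementation:

```python
# Expected Error Reduction with a self-labelled error proxy on the pool U.
import math

def posterior(L, x, bw=1.0):
    """Parzen-style estimate of P(y=1|x) from the labelled set L = [(x_i, y_i)]."""
    w = [(math.exp(-0.5 * ((xi - x) / bw) ** 2), yi) for xi, yi in L]
    total = sum(wi for wi, _ in w)
    return sum(wi for wi, yi in w if yi == 1) / total

def expected_error(L, U):
    """Error proxy over the pool: summed uncertainty of the predictions."""
    return sum(min(p, 1 - p) for p in (posterior(L, x) for x in U))

def eer_select(L, U):
    """Pick the candidate whose label, in expectation, most reduces the error."""
    def risk(x):
        p1 = posterior(L, x)
        return sum(py * expected_error(L + [(x, y)], U)
                   for y, py in ((1, p1), (0, 1 - p1)))
    return min(U, key=risk)          # O(|U|^2) posterior evaluations overall

L = [(-2.0, 0), (2.0, 1)]
U = [-1.5, 0.0, 1.5]
x_star = eer_select(L, U)
```

The inner `expected_error` call over U for every candidate is what makes the method quadratic in |U|, as stated on the slide.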
Probabilistic Active Learning
Motivation
I Given a dataset with a set of labelled instances L and a pool of unlabelled instances U with a candidate x
I The true posterior in a candidate's neighbourhood is unknown:
I Explicitly model the uncertainty associated with the posterior value: expectation not only over the candidate instance's label realisation y, but also over the true posterior p in its neighbourhood:

pgain(x) = E_p [ E_{y|p} [ performancegain_p(L ∪ (x, y)) ] ]

I The impact of a label is largest in its direct neighbourhood:
I Evaluate the change in classification performance only therein
Probabilistic Active Learning
Limitations
I Separates classifier and active selector (similar to uncertainty sampling)
I Depends on an appropriate neighbourhood definition and probabilistic estimates for ls = (n, p̂)
I Performance gain is approximated within the neighbourhood (evaluating globally is possible, but computationally costly)

References
I Implementations in Java, Python, MATLAB are available (open source) at http://kmd.cs.ovgu.de/res/opal/
I Probabilistic Active Learning (PAL). Krempl, Kottke, Spiliopoulou. Discovery Science 2014.
I Optimised Probabilistic Active Learning (OPAL). Krempl, Kottke, Lemaire. Machine Learning 100(2), 2015.
I Multi-Class Probabilistic Active Learning (McPAL). Kottke, Krempl, Spiliopoulou. ECAI 2016.
Probabilistic Active Learning in a Nutshell
Illustrative Example
[Figure: dataset with two labelled instances (one -, one +), several unlabelled instances, and a candidate marked ?]

I Given: dataset with labelled (-/+) and unlabelled instances
I Objective: determine the expected gain of labelling e.g. the candidate ?
I What label information do we have already?
I Summarise the label information in its neighbourhood, for example by using a probabilistic classifier, kernel frequency estimates, label counts, . . .
  I Number of labels: n = 2
  I Share of positives therein (i.e. posterior estimate): p̂ = 1/2
I Summarise as label statistics: ls = (n = 2, p̂ = 0.5)
Probabilistic Active Learning
Probabilistic Gain7
pgain(ls) = E_p [ E_{y|p} [ gain_p(ls, y) ] ]
          = ∫_0^1 Beta_{α,β}(p) · Σ_{y∈{0,1}} Ber_p(y) · gain_p(ls, y) dp

with:
I ls = (n, p̂): label statistics
I y: candidate's label realisation
I p: true posterior at the candidate's position

I This probabilistic gain quantifies
  I the expected change in classification performance
  I at the candidate's position in feature space,
  I in each and every future classification there,
  I given that one additional label is acquired.
I Weight pgain with the density d_x over labelled and unlabelled data at the candidate's position.
I Select the candidate with the highest density-weighted probabilistic gain.

[Figure: the example dataset with labelled (-/+) instances and candidate ?]
7See [Krempl et al., 2014].
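The double expectation can be evaluated numerically. A hedged re-implementation, using accuracy as the performance measure (this is not the authors' code from http://kmd.cs.ovgu.de/res/opal/; the integration grid is an assumption):

```python
# Probabilistic gain pgain(ls) for label statistics ls = (n, p_hat):
# integrate over the Beta-distributed true posterior p, and for each p take
# the expectation over the candidate's label y ~ Ber(p).
import math

def beta_pdf(p, a, b):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    lognorm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(lognorm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def perf(p, p_hat):
    """Expected accuracy when the true posterior is p and the classifier
    predicts the majority class of its estimate p_hat."""
    return p if p_hat > 0.5 else (1 - p) if p_hat < 0.5 else 0.5

def pgain(n, p_hat, steps=2000):
    a, b = n * p_hat + 1, n * (1 - p_hat) + 1    # uniform prior + label counts
    total = 0.0
    for i in range(steps):                        # midpoint-rule integration
        p = (i + 0.5) / steps
        gain = (p * (perf(p, (n * p_hat + 1) / (n + 1)) - perf(p, p_hat))
                + (1 - p) * (perf(p, n * p_hat / (n + 1)) - perf(p, p_hat)))
        total += beta_pdf(p, a, b) * gain / steps
    return total

# Few conflicting labels are worth more than many consistent ones:
assert pgain(2, 0.5) > pgain(10, 0.9)
```

Unlike uncertainty sampling, the score depends on n: a neighbourhood with many consistent labels yields (near-)zero gain even though its posterior may be close to 0.5.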
Probabilistic Active Learning – Interpretation
[Figure: normalised likelihood of the true posterior p for label statistics with n = 0, n = 2 (p̂ = 0.5), n = 3 (p̂ = 2/3), and n = 11 (p̂ = 10/11); the peaks lie at the observed shares 0.5, 0.67, and 0.91]

I Uniform prior: prior to the first label's arrival, all values of p are assumed equally plausible.
I A Bayesian approach yields as normalised likelihood a beta distribution with parameters:
  α: number of positive labels plus one
  β: number of negative labels plus one
I Left: plot of the normalised likelihoods for different values of α, β
I The peak of this function becomes the more distinct, the more labels are obtained.
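The shape of this normalised likelihood is easy to verify numerically. A small sketch (the evaluation points mirror the slide's examples; the `beta_pdf` helper is an assumption, not library code):

```python
# Normalised likelihood of the true posterior p: Beta(alpha, beta) with
# alpha = positives + 1 and beta = negatives + 1 under a uniform prior.
import math

def beta_pdf(p, a, b):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    lognorm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(lognorm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def likelihood_of_posterior(p, n_pos, n_neg):
    return beta_pdf(p, n_pos + 1, n_neg + 1)

# Same observed share p_hat = 0.5, but more labels -> sharper peak:
assert likelihood_of_posterior(0.5, 5, 5) > likelihood_of_posterior(0.5, 1, 1)
# The mode of Beta(a, b) is (a - 1)/(a + b - 2), i.e. the observed share itself:
assert likelihood_of_posterior(10 / 11, 10, 1) > likelihood_of_posterior(0.5, 10, 1)
```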
Non-Myopic Extension of PAL
Myopic Probabilistic Gain
pgain(ls) = E_p [ E_{y|p} [ gain_p(ls, y) ] ]
          = ∫_0^1 Beta_{α,β}(p) · Σ_{y∈{0,1}} Ber_p(y) · gain_p(ls, y) dp

with:
I ls = (n, p̂): label statistics
I y: candidate's label realisation
I p: true posterior at the candidate's position

Non-Myopic Extension
I Not a single label is purchased in future, but
I a set of labels according to a given budget m
I We need to optimise the performance gain when acquiring this set of labels!
I Brute-Force Approach: calculate the gain for all label combinations
I But: the ordering (of arrival) is irrelevant (in pools). It suffices to consider the varying number k of positives among the m acquired labels.
Non-Myopic Probabilistic Gain
Non-Myopic Probabilistic Gain
G_OPAL(ls, m) = (1/m) · E_p [ E_k [ gain_p(ls, k, m) ] ]
              = (1/m) · ∫_0^1 Beta_{α,β}(p) · Σ_{k=0}^{m} Bin_{m,p}(k) · gain_p(ls, k, m) dp

with:
I ls = (n, p̂): label statistics
I p: true posterior at the candidate's position
I m: number of candidates to be acquired (budget)
I k: number of candidates with positive label realisations

I with the performance gain as the difference between future and current performance:

gain_p(ls, k, m) = perf_p( (n·p̂ + k) / (n + m) ) − perf_p(p̂)
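Replacing the Bernoulli expectation by a Binomial over k yields a direct numerical implementation. A hedged sketch with accuracy as perf (the slides later use misclassification loss; grid size and example statistics are assumptions):

```python
# Non-myopic probabilistic gain G_OPAL(ls, m): expectation over the Beta-
# distributed true posterior p and the Binomial number k of positives among
# the m acquired labels, normalised by the budget m.
import math

def beta_pdf(p, a, b):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    lognorm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(lognorm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def perf(p, p_hat):
    return p if p_hat > 0.5 else (1 - p) if p_hat < 0.5 else 0.5

def gopal(n, p_hat, m, steps=2000):
    a, b = n * p_hat + 1, n * (1 - p_hat) + 1
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        inner = sum(math.comb(m, k) * p**k * (1 - p)**(m - k)
                    * (perf(p, (n * p_hat + k) / (n + m)) - perf(p, p_hat))
                    for k in range(m + 1))
        total += beta_pdf(p, a, b) * inner / steps
    return total / m

# With ls = (n=4, p_hat=0.75), one extra label can never flip the decision
# (myopic gain 0), but a budget of three can, so the non-myopic gain is larger:
assert gopal(4, 0.75, 3) > gopal(4, 0.75, 1)
```

This illustrates exactly why the non-myopic view matters: a myopic selector would score such a candidate as worthless.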
Cost-Sensitive Classification
Given a situation with
I p ∈ [0, 1]: true posterior probability of the positive class in a neighbourhood
I q ∈ [0, 1]: share of instances therein that are classified as positive
I cost_FP = τ ∈ [0, 1]: cost of each false positive classification
I misclassification loss as performance measure

Resulting Cost-Optimal Classification

q* = { 0        if p̂ < τ
     { 1 − τ    if p̂ = τ                                    (1)
     { 1        if p̂ > τ

Misclassification Loss under Cost-Optimal Classification

perf_{p,τ}(p̂) = −ML_{p,τ}(p̂) = − { p · (1 − τ)    if p̂ < τ
                                  { τ · (1 − τ)    if p̂ = τ   (2)
                                  { τ · (1 − p)    if p̂ > τ
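Equations (1) and (2) can be transcribed directly. A minimal sketch (function names are assumptions):

```python
# Cost-optimal decision rule and its misclassification loss: tau is the cost
# of a false positive, (1 - tau) the cost of a false negative.
def q_opt(p_hat, tau):
    """Share of instances classified positive given posterior estimate p_hat."""
    if p_hat < tau:
        return 0.0
    if p_hat > tau:
        return 1.0
    return 1.0 - tau                  # indifferent case at p_hat = tau

def misclass_loss(p, p_hat, tau):
    """ML_{p,tau}(p_hat): expected loss when the true posterior is p."""
    if p_hat < tau:
        return p * (1 - tau)          # all classified negative: FNs cost (1 - tau)
    if p_hat > tau:
        return tau * (1 - p)          # all classified positive: FPs cost tau
    return tau * (1 - tau)

# tau = 0.5 recovers plain majority voting; tau = 0.1 makes false positives
# cheap, so even an estimate of 0.2 leads to a positive classification:
assert q_opt(0.2, 0.5) == 0.0
assert q_opt(0.2, 0.1) == 1.0
```

The loss difference ML(p̂) − ML of the updated estimate is exactly the gain term used in the cost-sensitive G_OPAL on the next slide.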
Fast Closed-Form Solution
Non-Myopic, Cost-Sensitive Probabilistic Gain
I Combining misclassification loss as performance measure and
I the non-myopic probabilistic gain yields the
I probabilistic misclassification loss reduction:

G_OPAL(ls, τ, m) = (1/m) · ∫_0^1 Beta_{α,β}(p) · Σ_{k=0}^{m} Bin_{m,p}(k) · ( ML_{p,τ}(p̂) − ML_{p,τ}( (n·p̂ + k) / (n + m) ) ) dp

Closed-Form Solution

G_OPAL(n, p̂, τ, m) = ((n + 1) / m) · C(n, n·p̂) · ( I_ML(n, p̂, τ, 0, 0) − Σ_{k=0}^{m} I_ML(n, p̂, τ, m, k) )

I_ML(n, p̂, τ, m, k) = C(m, k) ·
  { (1 − τ) · Γ(1 − k + m + n − n·p̂) · Γ(2 + k + n·p̂) / Γ(3 + m + n)    if (n·p̂ + k)/(n + m) < τ
  { (τ − τ²) · Γ(1 − k + m + n − n·p̂) · Γ(1 + k + n·p̂) / Γ(2 + m + n)   if (n·p̂ + k)/(n + m) = τ
  { τ · Γ(2 − k + m + n − n·p̂) · Γ(1 + k + n·p̂) / Γ(3 + m + n)          if (n·p̂ + k)/(n + m) > τ

with C(·, ·) denoting the binomial coefficient.
Probabilistic Gain – GOPAL
Probabilistic Gain for Equal Misclassification Costs
[Figure: probabilistic gain in accuracy as a surface over the observed posterior p̂ and the number of labels n]

I The probabilistic gain in accuracy as a function of ls = (n, p̂) is
  I monotone with variable n,
  I symmetric with respect to p̂ = 0.5,
  I zero for irrelevant candidates.
I Compare to uncertainty (in confidence), which is constant w.r.t. n:

[Figure: confidence-based uncertainty as a function of the posterior alone]

I Unequal misclassification costs: asymmetric, as sampling from the "cheaper" class is preferred to avoid potentially costly errors
Evaluation – Setup
Experimental Setup
I OPAL compared against its myopic, cost-sensitive PAL (csPAL) counterpart and: Uncertainty Sampling without (U.S.) and with self-training (U.S. st), Certainty Sampling (C.S.), Expected Error Reduction with beta-prior (Chap) or cost-sensitive extension (Marg), non-myopic expected entropy reduction (Zhao)
I Same classifier (Parzen window classifier with Gaussian kernels),
I implemented in MATLAB and run on the same platform,
I with the same (dataset-specific, pre-tuned) bandwidth parameter,
I on several synthetic and real-world data sets,
I using cross-validation (100 random permutations),
I reporting learning curves as arithmetic mean of the misclassification loss, and wins at learning steps.
I More results are at our website http://kmd.cs.ovgu.de/res/opal/
Overall Classification Performance
20 labels acquired, OPAL vs.:

           csPAL  U.S.  U.S. st  C.S.  Marg1  Chap1  Zhao1  Rand
τ* = 0.10  47%    62%*  70%*     72%*  66%*   56%*   72%*   62%*
τ* = 0.25  51%*   63%*  75%*     88%*  81%*   62%*   70%*   65%*
τ* = 0.50  1%     64%*  72%*     92%*  87%*   63%*   69%*   68%*
τ* = 0.75  53%*   60%*  67%*     86%*  80%*   50%*   48%*   58%*
τ* = 0.90  42%    61%*  66%*     77%*  75%*   53%*   57%*   62%*

40 labels acquired, OPAL vs.:

           csPAL  U.S.  U.S. st  C.S.  Marg1  Chap1  Zhao1  Rand
τ* = 0.10  43%    55%*  71%*     75%*  69%*   62%*   69%*   57%*
τ* = 0.25  56%*   59%*  73%*     89%*  79%*   65%*   69%*   58%*
τ* = 0.50  4%     61%*  72%*     93%*  89%*   74%*   76%*   62%*
τ* = 0.75  57%*   64%*  71%*     90%*  81%*   59%*   56%*   54%*
τ* = 0.90  46%    55%*  63%*     82%*  77%*   57%*   64%*   56%*

Table: Percentages of runs over all data sets where OPAL performs better than its competitor. Significantly better performance is denoted by *, significantly worse performance by †. The significance level in the one-sided Wilcoxon signed-rank test was 0.001 for both. Algorithms are marked with 1 if not every data set could be used in the evaluation due to their long execution time.
Multi-Class Extension (McPAL)8
Motivation
I Many applications involve multinomial (rather than binary) labels (i.e. C > 2)

Task & Notation
I As before, pool of labelled (x, y) ∈ L and unlabelled (x, ·) ∈ U instances
I Multi-class (C > 2, not binary) classification: y ∼ Multinomial(p), where
  I x is the instance's feature vector,
  I p = (p_1, . . . , p_C) is the instance's true posterior vector,
  I k = (k_1, . . . , k_C) are the instance's label statistics
I Realisation of m ≤ M additional labels: l = (l_1, . . . , l_C) ∈ N^C, s.t. Σ l_i = m
Our Contributions
I Modelling as probabilistic active learning problem and deriving a closed-form solution
I Identification & evaluation of three influence factors
8Kottke, Krempl, Lang, Teschner, Spiliopoulou, ECAI, 2016.
Multi-Class Extension (McPAL): Selection Score
alScore(x | L, U) = P(x | L ∪ U) · perfGain( cl(x | L) )                          (3)
                    [impact]       [posterior & reliability]

perfGain(k) = max_{m≤M} (1/m) · ( expPerf(k, m) − expPerf(k, 0) )                 (4)
                                  [new perf.]     [curr. perf.]

expPerf(k, m) = E_p [ E_l [ perf(k + l | p) ] ]                                    (5)

              = Σ_l ( Π_{j=Σ(k_i+1)}^{(Σ(k_i+l_i+d_i+1))−1} 1/j ) · ( Π_i Π_{j=k_i+1}^{k_i+l_i+d_i} j ) · Γ((Σ l_i) + 1) / Π Γ(l_i + 1)    (6)

where l ∈ N^C is a label realisation
From Pools to Evolving Streams
[Figure: instances (x1, y1), (x2, y2), . . . arriving sequentially over time in a stream]
Data Stream
I Instances arrive sequentially
I Possibly infinite number of instances
I Non-stationary distributions (drift)
I “Big Data” is often streaming data
General Challenges
I Adaptation to change
I Limited computational resources
Active Learning-Specific Challenges
I Budget Management & Change Detection
I Evaluation & Performance Guarantees
I . . .
Classification in Evolving Datastreams
Chunk-Based Processing (Krempl, Ha, Spiliopoulou, DS, 2015)
I Clustering-based approach (COPAL)
I Diversity-maximising micro selection, and PAL-based macro selection
I Amnesic (COPAL-A) and incremental (COPAL-I) variants
I Experimental results: COPAL-I is better than COPAL-A; the quality of the clustering has a large impact on the results

One-by-One Processing (Kottke, Krempl, Spiliopoulou, IDA, 2015)
I Notion of temporal usefulness, complementing spatial usefulness
I Budget management: guarantees that the budget restriction is met
I Temporal selection: Balanced Incremental Quantile Filter (BIQF)
I Spatial selection: Probabilistic Active Learning
I Experimental results: the combination of BIQF and PAL is best for small budgets
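The temporal-selection idea can be sketched with a simple quantile filter in the spirit of BIQF, though this is a hedged re-implementation, not the authors' algorithm: label an arriving instance when its spatial usefulness exceeds the (1 − budget)-quantile of recently seen scores, so roughly a `budget` fraction of the stream gets labelled under bounded memory.

```python
# Quantile-based budget management for one-by-one stream processing.
from collections import deque
import random

class QuantileFilter:
    def __init__(self, budget=0.1, window=100):
        self.budget = budget
        self.scores = deque(maxlen=window)   # bounded memory for streams

    def acquire(self, usefulness):
        """True if this instance's usefulness is among the top `budget` share
        of the recent window, i.e. its label should be bought."""
        self.scores.append(usefulness)
        ranked = sorted(self.scores)
        threshold = ranked[int((1 - self.budget) * (len(ranked) - 1))]
        return usefulness >= threshold

random.seed(0)
f = QuantileFilter(budget=0.1, window=100)
labelled = sum(f.acquire(random.random()) for _ in range(1000))
assert 50 <= labelled <= 200   # close to the 10% budget
```

In the combined scheme described above, the `usefulness` score fed into such a filter would be the spatial score, e.g. the density-weighted probabilistic gain.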
Summary of this part
I Active learning problem: applications where collecting ground truth (e.g. labels) is not possible for every single example
  Efficient allocation of limited resources: sample where we expect something insightful
I Different tasks and scenarios: query synthesis, pool-based or stream-based sampling; active acquisition of labels, features, instances from specific classes, . . .
I Uncertainty Sampling & Expected Error Reduction sometimes perform poorly due to ignoring other types of uncertainty
  Use a combination with other measures, or a probabilistic approach:
I Probabilistic Active Learning: expected gain in classification performance; models label realisation and true posterior as random variables; considers the posterior estimate, its reliability, and impact as influence factors
  Decision-theoretic, non-myopic, cost-sensitive, multi-class; fast and competitive performance
Thank you!
Questions?
Bibliography I
Angluin, D. (1988). Queries and concept learning. Machine Learning, 2:319-342.
Attenberg, J., Melville, P., Provost, F., and Saar-Tsechansky, M. (2011). Selective data acquisition for machine learning. In Cost-Sensitive Machine Learning. CRC Press.
Beyer, C., Krempl, G., and Lemaire, V. (2015). How to select information that matters: A comparative study on active learning strategies for classification. In Proc. of the 15th Int. Conf. on Knowledge Technologies and Data-Driven Business (i-KNOW 2015), pages 2:1-2:8. ACM.
Chapelle, O. (2005). Active learning for parzen window classifier. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 49-56.
Cohn, D. (2010). Active learning. In Sammut, C. and Webb, G. I., editors, Encyclopedia of Machine Learning, pages 10-14. Springer.
Bibliography II
Cohn, D., Atlas, L., Ladner, R., El-Sharkawi, M., Marks, R., Aggoune, M., and Park, D. (1990). Training connectionist networks with queries and selective sampling. In Advances in Neural Information Processing Systems (NIPS). Morgan Kaufmann.
Cohn, D. A., Ghahramani, Z., and Jordan, M. I. (1996). Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129-145.
Fedorov, V. V. (1972). Theory of Optimal Experiments Design. Academic Press.
Fu, Y., Zhu, X., and Li, B. (2012). A survey on instance selection for active learning. Knowledge and Information Systems, 35(2):249-283.
Krempl, G., Kottke, D., and Spiliopoulou, M. (2014). Probabilistic active learning: Towards combining versatility, optimality and efficiency. In Dzeroski, S., Panov, P., Kocev, D., and Todorovski, L., editors, Proceedings of the 17th Int. Conf. on Discovery Science (DS), Bled, volume 8777 of Lecture Notes in Computer Science, pages 168-179. Springer.
Bibliography III
Reitmaier, T. and Sick, B. (2013). Let us know your decision: Pool-based active training of a generative classifier with the selection strategy 4DS. Information Sciences, 230:106-131.
Roy, N. and McCallum, A. (2001). Toward optimal active learning through sampling estimation of error reduction. In Proc. of the 18th Int. Conf. on Machine Learning, ICML 2001, Williamstown, MA, USA, pages 441-448, San Francisco, CA, USA. Morgan Kaufmann.
Ruff, R. A. and Dietterich, T. (1989). What good are experiments? In Proc. of the sixth int. workshop on machine learning.
Saar-Tsechansky, M., Melville, P., and Provost, F. (2009). Active feature-value acquisition. Management Science, 55(4):664-684.
Senge, R., Bosner, S., Dembczynski, K., Haasenritter, J., Hirsch, O., Donner-Banzhoff, N., and Hullermeier, E. (2014). Reliable classification: Learning classifiers that distinguish aleatoric and epistemic uncertainty. Information Sciences, 255:16-29.
Bibliography IV
Settles, B. (2009). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, Madison, Wisconsin, USA.
Settles, B. (2012). Active Learning. Number 18 in Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers.
Seung, H. S., Opper, M., and Sompolinsky, H. (1992). Query by committee. In Proc. of the fifth workshop on computational learning theory. Morgan Kaufmann.
Weigl, E., Heidl, W., Lughofer, E., Radauer, T., and Eitzinger, C. (2015). On improving performance of surface inspection systems by online active learning and flexible classifier updates. Machine Vision and Applications, 27(1):103-127.
Zhao, Y., Yang, G., Xu, X., and Ji, Q. (2012). A near-optimal non-myopic active learning method. In Proceedings of the 21st International Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, pages 1715-1718. IEEE.
Bibliography V
Zliobaite, I., Bifet, A., Pfahringer, B., and Holmes, G. (2013). Active learning with drifting streaming data. IEEE Transactions on Neural Networks and Learning Systems, 25(1):27-39.