A Brief Introduction to Adaboost - Middle East Technical...
1
A Brief Introduction
to Adaboost
Hongbo Deng
6 Feb, 2007
Some of the slides are borrowed from Derek Hoiem & Jan Šochman.
2
Outline
Background
Adaboost Algorithm
Theory/Interpretations
3
What’s So Good About Adaboost
Can be used with many different classifiers
Improves classification accuracy
Commonly used in many areas
Simple to implement
Often resistant to overfitting in practice
4
A Brief History
Resampling for estimating a statistic:
Bootstrapping
Resampling for classifier design:
Bagging
Boosting (Schapire 1989)
Adaboost (Freund & Schapire 1995)
5
Bootstrap Estimation
Repeatedly draw n samples from D with replacement
For each set of samples, estimate a statistic
The bootstrap estimate is the mean of the individual estimates
Used to estimate a statistic (parameter) and its variance
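The bootstrap procedure can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the function name `bootstrap_estimate`, the default of 1000 resamples, and the fixed seed are assumptions, and the reported variance is the spread of the per-resample estimates.

```python
import random
import statistics


def bootstrap_estimate(data, statistic, num_resamples=1000, seed=0):
    """Bootstrap estimate of `statistic` and of its variance.

    Each resample draws len(data) points from `data` with replacement;
    the bootstrap estimate is the mean of the per-resample estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(num_resamples):
        resample = rng.choices(data, k=n)  # n draws with replacement
        estimates.append(statistic(resample))
    estimate = statistics.fmean(estimates)      # mean of individual estimates
    variance = statistics.pvariance(estimates)  # spread of those estimates
    return estimate, variance


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
    est, var = bootstrap_estimate(sample, statistics.median)
    print(f"bootstrap median ~ {est:.3f}, variance ~ {var:.4f}")
```

Any function of a list can serve as `statistic`; the median is used here because, unlike the mean, it has no simple closed-form variance.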
6
Bagging - Aggregate Bootstrapping
For i = 1 .. M
Draw n* < n samples from D with replacement
Learn classifier Ci
Final classifier is a majority vote of C1 .. CM
Increases classifier stability/reduces variance
[Figure: bootstrap sample sets D1, D2, D3 drawn from D]
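A minimal bagging sketch in Python, assuming one-dimensional inputs, labels in {-1, +1}, and decision stumps as the component classifiers (the slides do not fix a base learner). `fit_stump` and `bagging` are hypothetical helper names; by default each bootstrap sample has n points, so pass `sample_size` to use n* < n as on the slide.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Decision stump on 1-D inputs with labels in {-1, +1}.

    Tries every threshold (plus one below all points, i.e. a constant
    classifier) and both polarities; keeps the one with fewest errors.
    """
    values = sorted(set(xs))
    best = None
    for thr in [values[0] - 1.0] + values:
        for pol in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys)
                         if (pol if x > thr else -pol) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, pol)
    _, thr, pol = best
    return lambda x: pol if x > thr else -pol


def bagging(xs, ys, num_models=25, sample_size=None, seed=0):
    """Train `num_models` stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    m = sample_size or n  # the slide uses n* < n; defaulting to n is an assumption
    models = []
    for _ in range(num_models):
        idx = [rng.randrange(n) for _ in range(m)]  # draw with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return 1 if votes[1] >= votes[-1] else -1  # ties go to +1

    return predict


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [-1, -1, -1, 1, 1, 1]
    vote = bagging(xs, ys, sample_size=4)
    print([vote(x) for x in xs])
```

Individual stumps trained on different bootstrap samples place their thresholds in different spots; the vote averages out that variability, which is the variance reduction the slide refers to.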
7
Boosting (Schapire 1989)
Consider creating three component classifiers for a two-category problem through boosting.
Randomly select n1 < n samples from D without replacement to obtain D1
Train weak learner C1
Select n2 < n samples from D, half of them misclassified by C1, to obtain D2
Train weak learner C2
Select all remaining samples from D on which C1 and C2 disagree to obtain D3
Train weak learner C3
Final classifier is a majority vote of the three weak learners
[Figure: data set D with subsets D1, D2, D3 and + / - samples]
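The three-classifier scheme can be sketched as follows, assuming one-dimensional data with labels in {-1, +1}. The weak learner `train_weak` (a threshold halfway between the two class means) and the function `boost_three` are hypothetical; when C1 makes too few mistakes to fill half of D2, this sketch simply uses all of them.

```python
import random


def train_weak(points):
    """Hypothetical weak learner for 1-D (x, y) pairs with y in {-1, +1}:
    threshold halfway between the class means; constant if a class is missing."""
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == -1]
    if not pos or not neg:
        label = 1 if pos else -1
        return lambda x: label
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    thr = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos > mean_neg else -1
    return lambda x: sign if x > thr else -sign


def boost_three(data, n1, n2, seed=0):
    """Boosting with three component classifiers, following the steps above."""
    rng = random.Random(seed)
    pool = list(data)
    rng.shuffle(pool)
    # D1: n1 samples drawn without replacement; train C1
    d1, pool = pool[:n1], pool[n1:]
    c1 = train_weak(d1)
    # D2: up to n2 remaining samples, (at most) half misclassified by C1
    wrong = [p for p in pool if c1(p[0]) != p[1]]
    right = [p for p in pool if c1(p[0]) == p[1]]
    k = n2 // 2
    d2 = wrong[:k] + right[:n2 - k]
    pool = wrong[k:] + right[n2 - k:]
    c2 = train_weak(d2)
    # D3: all remaining samples on which C1 and C2 disagree; train C3
    d3 = [p for p in pool if c1(p[0]) != c2(p[0])]
    c3 = train_weak(d3)

    def classify(x):
        # Majority vote of three +/-1 outputs (the sum is never zero)
        return 1 if c1(x) + c2(x) + c3(x) > 0 else -1

    return classify
```

Because C3 is trained only where C1 and C2 disagree, it effectively acts as the tie-breaker in the final vote.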
8
Adaboost - Adaptive Boosting
Instead of resampling, uses training-set re-weighting: each training sample has a weight that determines its probability of being selected for a training set.
AdaBoost is an algorithm for constructing a “strong” classifier as a linear combination of simple “weak” classifiers
Final classification based on weighted vote of weak classifiers
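Using weights as selection probabilities amounts to sampling with probability proportional to weight. A small sketch built on Python's `random.choices`; the helper name `draw_weighted_training_set` is hypothetical.

```python
import random


def draw_weighted_training_set(samples, weights, k, seed=0):
    """Draw k training samples (with replacement), each chosen with
    probability proportional to its current weight."""
    rng = random.Random(seed)
    return rng.choices(samples, weights=weights, k=k)


if __name__ == "__main__":
    samples = ["a", "b", "c", "d"]
    weights = [0.1, 0.1, 0.1, 0.7]  # "d" plays the role of a hard, heavily weighted example
    drawn = draw_weighted_training_set(samples, weights, k=1000)
    print({s: drawn.count(s) for s in samples})
```

With these weights, "d" should make up roughly 70% of the drawn set, so the next component classifier sees the hard example far more often than the others.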
9
Adaboost Terminology
$h_t(x)$ … “weak” or basis classifier (Classifier = Learner = Hypothesis)
$H(x)$ … “strong” or final classifier
Weak Classifier: < 50% error over any distribution
Strong Classifier: thresholded linear combination of weak classifier outputs
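The terminology can be summarized in one formula. Here α_t ≥ 0 is the vote weight of h_t and T the number of boosting rounds; both symbols are the usual notation, assumed here because this slide does not define them.

```latex
f(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x), \qquad
H(x) = \operatorname{sign}\bigl(f(x)\bigr), \qquad
h_t(x) \in \{-1, +1\}, \quad \alpha_t \ge 0
```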
10
Discrete Adaboost Algorithm
Each training sample has a weight, which determines the probability of being selected for training the component classifier
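A compact Discrete AdaBoost sketch in Python, assuming 1-D inputs, labels in {-1, +1}, and decision stumps as weak learners. One substitution is deliberate: instead of resampling a training set according to the weights as described on the slide, the weak learner here minimizes the weighted error directly (re-weighting), a common alternative. The function names `best_stump` and `adaboost` are hypothetical.

```python
import math


def best_stump(xs, ys, w):
    """Weak learner: the 1-D threshold stump with the lowest weighted error.

    Returns (error, threshold, polarity); the stump predicts `polarity`
    when x > threshold and -polarity otherwise.
    """
    values = sorted(set(xs))
    best = None
    for thr in [values[0] - 1.0] + values:
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (pol if x > thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns the strong classifier H."""
    n = len(xs)
    w = [1.0 / n] * n                            # start with uniform weights
    ensemble = []                                # (alpha_t, threshold, polarity)
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, w)    # h_t minimizes weighted error
        if err >= 0.5:                           # no better than chance: stop
            break
        err = max(err, 1e-10)                    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of h_t
        ensemble.append((alpha, thr, pol))
        # Reweight: factor exp(-alpha) if y*h(x) = 1, exp(+alpha) if y*h(x) = -1
        w = [wi * math.exp(-alpha * y * (pol if x > thr else -pol))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]             # normalize (divide by Z_t)

    def strong(x):
        """H(x) = sign(sum_t alpha_t * h_t(x))."""
        score = sum(a * (p if x > t else -p) for a, t, p in ensemble)
        return 1 if score >= 0 else -1

    return strong


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1, 1, -1, -1, 1, 1]   # no single stump separates these labels
    H = adaboost(xs, ys, rounds=3)
    print([H(x) for x in xs])   # [1, 1, -1, -1, 1, 1]
```

On this toy data, positives at both ends and negatives in the middle, the first round can only pick a constant stump, yet after three rounds the weighted vote classifies all six training points correctly.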
11
Find the Weak Classifier
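Selecting the weak classifier means minimizing the weighted training error. A sketch in the usual notation (assumed, not shown in the transcript), where the weights w_t(i) sum to 1 and the indicator is 1 when its argument holds and 0 otherwise:

```latex
\epsilon_t(h) = \sum_{i=1}^{n} w_t(i)\, \mathbb{1}\bigl[h(x_i) \neq y_i\bigr],
\qquad
h_t = \arg\min_{h} \epsilon_t(h),
\qquad
\text{require } \epsilon_t(h_t) < \tfrac{1}{2}
```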
13
The algorithm core
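At the core of each round, the chosen weak classifier receives a vote weight that grows as its weighted error ε_t shrinks; α_t is positive exactly when ε_t < 1/2. Written in the standard Freund and Schapire notation (assumed here), with a worked value:

```latex
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t},
\qquad
\text{e.g. } \epsilon_t = 0.3 \;\Rightarrow\; \alpha_t = \tfrac{1}{2} \ln \tfrac{7}{3} \approx 0.424
```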
14
Reweighting
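$y_i h_t(x_i) = 1$: sample correctly classified, its weight is decreased
$y_i h_t(x_i) = -1$: sample misclassified, its weight is increased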
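The reweighting rule for these two cases, in the same assumed notation, with Z_t the normalizer that keeps the weights summing to 1:

```latex
w_{t+1}(i) = \frac{w_t(i)\, \exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr) =
\begin{cases}
e^{-\alpha_t} & \text{if } y_i h_t(x_i) = 1 \\
e^{\alpha_t}  & \text{if } y_i h_t(x_i) = -1
\end{cases},
\qquad
Z_t = \sum_{i=1}^{n} w_t(i)\, \exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)
```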
15
Reweighting
In this way, AdaBoost “focuses on” the informative or “difficult” examples.
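The effect of one reweighting step can be checked numerically. In this hypothetical Python sketch, `reweight` applies the update with α_t = ½ ln((1 - ε_t)/ε_t); after normalization the misclassified samples hold exactly half of the total weight, which is why the next weak learner cannot simply repeat the previous one.

```python
import math


def reweight(weights, correct):
    """One AdaBoost reweighting step.

    weights: current sample weights (summing to 1)
    correct: booleans, True where y_i * h_t(x_i) = 1
    Returns (alpha_t, new normalized weights).
    """
    eps = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(new)  # normalizer Z_t
    return alpha, [w / z for w in new]


if __name__ == "__main__":
    w = [0.1] * 10
    correct = [True] * 8 + [False] * 2  # weighted error 0.2
    alpha, w_new = reweight(w, correct)
    print(round(alpha, 4))  # 0.6931
    print(round(sum(x for x, ok in zip(w_new, correct) if not ok), 4))  # 0.5
```

Here each misclassified sample goes from weight 0.1 to 0.25, while each correctly classified one drops to 0.0625.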
17
Algorithm recapitulation
t = 1
25
Pros and cons of AdaBoost
Advantages
Very simple to implement
Performs feature selection, resulting in a relatively simple classifier
Fairly good generalization
Disadvantages
Suboptimal solution (greedy, stage-wise fitting)
Sensitive to noisy data and outliers
26
References
Duda, Hart, etc. – “Pattern Classification”
Freund – “An adaptive version of the boost by majority algorithm”
Freund, Schapire – “Experiments with a new boosting algorithm”
Freund, Schapire – “A decision-theoretic generalization of on-line learning and an application to boosting”
Friedman, Hastie, etc. – “Additive Logistic Regression: A Statistical View of Boosting”
Jin, Liu, etc. (CMU) – “A New Boosting Algorithm Using Input-Dependent Regularizer”
Li, Zhang, etc. – “Floatboost Learning for Classification”
Opitz, Maclin – “Popular Ensemble Methods: An Empirical Study”
Rätsch, Warmuth – “Efficient Margin Maximization with Boosting”
Schapire, Freund, etc. – “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods”
Schapire, Singer – “Improved Boosting Algorithms Using Confidence-Weighted Predictions”
Schapire – “The Boosting Approach to Machine Learning: An Overview”
Zhang, Li, etc. – “Multi-view Face Detection with Floatboost”
27
Appendix
Bound on training error
Adaboost Variants
28
Bound on Training Error (Schapire)
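Schapire's bound on the training error of H can be stated as follows, assuming α_t = ½ ln((1 - ε_t)/ε_t) in every round (which is what makes Z_t = 2√(ε_t(1 - ε_t))) and writing γ_t = ½ - ε_t for the edge over random guessing:

```latex
\frac{1}{n} \bigl|\{ i : H(x_i) \neq y_i \}\bigr|
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2}
\;\le\; \exp\Bigl(-2 \sum_{t=1}^{T} \gamma_t^2\Bigr),
\qquad \gamma_t = \tfrac{1}{2} - \epsilon_t
```

For example, if every round has ε_t = 0.3, the bound after T = 10 rounds is (2√0.21)^10 ≈ 0.42, and it keeps shrinking exponentially in T as long as each weak classifier stays better than chance.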
29
Discrete Adaboost (DiscreteAB)
(Friedman’s wording)
30
Discrete Adaboost (DiscreteAB)
(Freund and Schapire’s wording)
31
Adaboost with Confidence-Weighted Predictions (RealAB)
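One way to write a RealAB round, following the Real AdaBoost formulation in Friedman, Hastie, etc. (listed in the references): labels are y ∈ {-1, +1}, the weak learner returns a weighted class-probability estimate instead of a hard label, and the notation p_m, f_m, M is assumed here. The sign of f_m(x) is the prediction and its magnitude the confidence.

```latex
\begin{aligned}
p_m(x) &= \hat{P}_w(y = 1 \mid x) \in [0, 1], \\
f_m(x) &= \tfrac{1}{2} \log \frac{p_m(x)}{1 - p_m(x)}, \\
w_i &\leftarrow w_i \exp\bigl(-y_i f_m(x_i)\bigr) \quad \text{(then renormalize)}, \\
H(x) &= \operatorname{sign}\Bigl(\sum_{m=1}^{M} f_m(x)\Bigr)
\end{aligned}
```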
32
Adaboost Variants Proposed By Friedman
LogitBoost
Solves
Requires care to avoid numerical problems
GentleBoost
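Update is $f_m(x) = P_w(y=1 \mid x) - P_w(y=-1 \mid x)$ instead of $\frac{1}{2}\log\frac{P_w(y=1 \mid x)}{P_w(y=-1 \mid x)}$ (Real AdaBoost), so each update is bounded in [-1, 1]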
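The difference between the two updates shows up numerically. A hypothetical Python sketch, assuming p = P_w(y=1 | x) has already been estimated (for example, as the weighted fraction of positives in a stump's leaf): the Real AdaBoost update is half the log-odds and grows without bound as p approaches 0 or 1, while the GentleBoost update p - (1 - p) stays in [-1, 1].

```python
import math


def real_update(p):
    """Real AdaBoost: f(x) = 1/2 * log(p / (1 - p)); unbounded as p -> 0 or 1."""
    return 0.5 * math.log(p / (1 - p))


def gentle_update(p):
    """GentleBoost: f(x) = P_w(y=1|x) - P_w(y=-1|x) = p - (1 - p), within [-1, 1]."""
    return p - (1 - p)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.999):
        print(f"p={p}: real={real_update(p):.3f}, gentle={gentle_update(p):.3f}")
```

Near-pure leaves (p close to 0 or 1) produce huge Real AdaBoost steps, which is the instability the bounded GentleBoost update avoids.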
33
Adaboost Variants Proposed By Friedman
LogitBoost
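A sketch of the quantities LogitBoost fits in each round of the two-class case, following the formulation in Friedman, Hastie, etc. (listed in the references): labels y* ∈ {0, 1}, working response z = (y* - p)/(p(1 - p)) and weight w = p(1 - p). The division by p(1 - p) is the source of the numerical problems noted on the previous slide, so this sketch clips z; the bound `z_max = 4.0` and the function names are assumptions.

```python
import math


def working_response(y_star, p, z_max=4.0):
    """Two-class LogitBoost: working response z and weight w for one sample.

    y_star is the 0/1 label and p the current estimate of P(y* = 1 | x).
    z = (y* - p) / (p * (1 - p)) explodes as p approaches 0 or 1, so it is
    clipped to [-z_max, z_max] (z_max = 4.0 is an assumed default).
    """
    diff = y_star - p
    w = p * (1 - p)
    if w == 0.0:                       # p is exactly 0 or 1: ratio undefined
        z = 0.0 if diff == 0 else math.copysign(z_max, diff)
    else:
        z = max(-z_max, min(z_max, diff / w))
    return z, w


def prob_from_score(f):
    """p(x) = e^F / (e^F + e^-F), written as 1 / (1 + e^(-2F))."""
    return 1.0 / (1.0 + math.exp(-2.0 * f))


if __name__ == "__main__":
    y = [1, 1, 0, 1]
    F = [0.0] * len(y)                 # F(x) = 0 initially, so p = 1/2
    p = [prob_from_score(f) for f in F]
    print([working_response(yi, pi) for yi, pi in zip(y, p)])
    # [(2.0, 0.25), (2.0, 0.25), (-2.0, 0.25), (2.0, 0.25)]
    # Next (not shown): fit f_m to z by weighted least squares with weights w,
    # then update F(x) <- F(x) + f_m(x) / 2 and recompute p.
```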
34
Adaboost Variants Proposed By Friedman
GentleBoost
35
Thanks!!!
Any comments or questions?