A Speech about Boosting. Presenter: Roberto Valenti
TRANSCRIPT
A Speech about Boosting
Presenter: Roberto Valenti
The Paper*
*R. Schapire. The Boosting Approach to Machine Learning: An Overview, 2001
I want YOU…
…TO UNDERSTAND
Overview
• Introduction
• Adaboost
– How does it work?
– Why does it work?
– Demo
– Extensions
– Performance & Applications
• Summary & Conclusions
• Questions
Introduction to Boosting
Let's start
Introduction
• An example of machine learning: a spam classifier
• Highly accurate rule: difficult to find
• Inaccurate rule: "BUY NOW"
• Introducing boosting: "an effective method of producing an accurate prediction rule from inaccurate rules"
Introduction
• History of boosting:
– 1989: Schapire
• First provable polynomial-time boosting algorithm
– 1990: Freund
• Much more efficient, but with practical drawbacks
– 1995: Freund & Schapire
• Adaboost: the focus of this presentation
– …
Introduction
• The boosting approach:
– Many weak classifiers
– One strong classifier
• Boosting key points:
– Give importance to misclassified data
– Find a way to combine the weak classifiers into a general rule
Adaboost
How does it work?
Adaboost – How does it work?
• Given training examples (x1,y1),…,(xm,ym) with labels yi Є {−1,+1}
• Initialize weights: D1(i) = 1/m
• For t = 1,…,T:
– Train the base learner on distribution Dt, get base hypothesis ht
– Choose αt and update: Dt+1(i) = Dt(i) exp(−αt yi ht(xi)) / Zt
• Output the final hypothesis: H(x) = sign(Σt αt ht(x))
Adaboost – How does it work?
• Base learner job:
– Find a base hypothesis: ht : X → {−1,+1}
– Minimize the weighted error: εt = Pri~Dt[ht(xi) ≠ yi]
• Choose αt = (1/2) ln((1 − εt)/εt)
Adaboost – How does it work?
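To make the loop on the preceding slides concrete, here is a minimal sketch of Adaboost in Python (illustrative only, not code from the talk; `base_learner` is an assumed helper that returns a callable hypothesis):

```python
import numpy as np

def adaboost(X, y, base_learner, T=50):
    """Minimal Adaboost sketch. Labels y must be in {-1, +1};
    base_learner(X, y, D) must return a callable h with h(X) in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D1: uniform distribution
    alphas, hypotheses = [], []
    for t in range(T):
        h = base_learner(X, y, D)              # find base hypothesis ht
        pred = h(X)
        eps = D[pred != y].sum()               # weighted error et
        if eps >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # choose at
        D *= np.exp(-alpha * y * pred)         # up-weight misclassified points
        D /= D.sum()                           # renormalize (divide by Zt)
        alphas.append(alpha)
        hypotheses.append(h)
    def H(X_new):                              # final strong classifier
        return np.sign(sum(a * h(X_new) for a, h in zip(alphas, hypotheses)))
    return H
```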
Adaboost
Why does it work?
Adaboost – Why does it work?
• Basic property: it reduces the training error
• On binary problems, write the round-t error as εt = 1/2 − γt
• Training error bounded by: ∏t 2√(εt(1 − εt)) = ∏t √(1 − 4γt²)
• If every γt ≥ γ, this is at most e^(−2γ²T) → drops exponentially!
• Generalization error bounded by: Pr̂[H(x) ≠ y] + Õ(√(Td/m))
– T = number of iterations
– m = sample size
– d = Vapnik-Chervonenkis dimension
– Pr̂[.] = empirical probability
– Õ = hides logarithmic and constant factors
• The bound predicts overfitting as T grows!
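Spelling out the bullets above, the chain of bounds from Schapire's paper is the following, where Zt is the normalization factor of the round-t weight update:

```latex
\[
\frac{1}{m}\bigl|\{\, i : H(x_i) \neq y_i \,\}\bigr|
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2}
\;\le\; \exp\Bigl( -2 \sum_{t=1}^{T} \gamma_t^2 \Bigr)
\]
```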
Adaboost – Why does it work?
Adaboost – Why does it work?
• Margins of the training examples:
margin(x,y) = y Σt αt ht(x) / Σt αt
• Positive only if correctly classified by H
• Its magnitude gives the confidence in the prediction
• Qualitative explanation of effectiveness
– Not quantitative
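As a small illustration (not from the slides), the margins can be computed directly from this definition, reusing the `alphas` and `hypotheses` produced by the earlier sketch:

```python
import numpy as np

def margins(alphas, hypotheses, X, y):
    """Normalized margins in [-1, 1]; positive iff H classifies correctly."""
    f = sum(a * h(X) for a, h in zip(alphas, hypotheses))
    return y * f / np.sum(alphas)
```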
Adaboost – Other View
• Adaboost as a zero-sum game:
– Game matrix M
– Row player: Adaboost
– Column player: base learner
– Row player plays rows with distribution P
– Column player plays columns with distribution Q
– Expected loss: PᵀMQ
• Play the matrix game repeatedly
Adaboost – Other View
• Von Neumann's minmax theorem:
maxP minQ PᵀMQ = minQ maxP PᵀMQ
• If, for every distribution, there exists a base classifier with error ≤ 1/2 − γ…
• …then there exists a combination of base classifiers with margin ≥ 2γ
• Adaboost has the potential of success
• Relations with linear programming and online learning
Adaboost
Demo
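The live demo itself is not recoverable from the transcript. As a hypothetical stand-in (not part of the talk), here is a minimal Adaboost demo on a synthetic dataset using scikit-learn, whose AdaBoostClassifier uses depth-1 decision stumps as its default base learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy two-class problem standing in for the talk's demo data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=50, random_state=0)  # T = 50 rounds
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```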
Adaboost
Extensions
Adaboost - Extensions
• History of boosting:
– …
– 1997: Freund & Schapire
• Adaboost.M1
– First multiclass generalization
– Fails if the weak learner achieves less than 50% accuracy
• Adaboost.M2
– Creates a set of binary problems
– For x, is label l1 better than label l2?
– 1999: Schapire & Singer
• Adaboost.MH
– For x, is label l1 better than one of the others?
Adaboost - Extensions
– 2001: Rochery, Schapire et al.
• Incorporating human knowledge
• Adaboost is data-driven
• Human knowledge can compensate for the lack of data
• Human expert:
– Chooses a rule p mapping each x to p(x) Є [0,1]
– Difficult!
– Simple rules should work…
Adaboost - Extensions
• To incorporate human knowledge, balance fit to the data against fit to the expert rule p by minimizing:
Σi [ ln(1 + e^(−yi f(xi))) + η RE(p(xi) || 1/(1 + e^(−f(xi)))) ]
• Where
RE(p||q) = p ln(p/q) + (1−p) ln((1−p)/(1−q))
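For illustration (not from the slides), RE can be computed directly from the formula above:

```python
import numpy as np

def relative_entropy(p, q):
    """Binary relative entropy RE(p||q) as defined above, for p, q in (0, 1)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
```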
Adaboost
Performance and Applications
Adaboost - Performance & Applications
Error Rates on Text Categorization
[Charts: error rates on Reuters newswire articles and AP newswire headlines]
Adaboost - Performance & Applications
Six-Class Text Classification (TREC)
[Charts: training error and test error]
Adaboost - Performance & Applications
Spoken Language Classification
[Charts: "How may I help you" and "Help desk" tasks]
Adaboost - Performance & Applications
OCR: Outliers
[Images: hardest training examples, labeled "class, label1/weight1, label2/weight2", after rounds 4, 12, and 25]
Adaboost - Applications
• Text filtering
– Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering. 1998
• Routing
– Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing. 2000
• "Ranking" problems
– Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences. 1998
• Image retrieval
– Tieu, Viola. Boosting image retrieval. 2000
• Medical diagnosis
– Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. 2001
Adaboost - Applications
• Learning problems in natural language processing
– Abney, Schapire, Singer. Boosting applied to tagging and PP attachment. 1999
– Collins. Discriminative reranking for natural language parsing. 2000
– Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation. 2000
– Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser. 1999
– Moreno, Logan, Raj. A boosting approach for confidence scoring. 2001
– Walker, Rambow, Rogati. SPoT: A trainable sentence planner. 2001
Summary and Conclusions
At last…
Summary
• Boosting takes a weak learner and converts it to a strong one
• Works by asymptotically minimizing the training error
• Effectively maximizes the margin of the combined hypothesis
• Adaboost is related to many other topics
• It works!
Conclusions
• Adaboost advantages:
– Fast, simple, and easy to program
– No parameters required (apart from the number of rounds T)
• Performance depends on:
– Sample size: (Skurichina, 2001) boosting is only useful for large sample sizes
– The choice of weak classifier
– The incorporation of classifier weights
– The data distribution
Questions
? (don't be mean)