A speech about Boosting
Presenter: Roberto Valenti

Upload: anissa-clarke

Post on 28-Dec-2015


TRANSCRIPT

Page 1: A speech about Boosting Presenter: Roberto Valenti

A speech about Boosting
Presenter: Roberto Valenti

Page 2: A speech about Boosting Presenter: Roberto Valenti

The Paper*

*R. Schapire. The Boosting Approach to Machine Learning: An Overview, 2001

Page 3: A speech about Boosting Presenter: Roberto Valenti

I want YOU…

…TO UNDERSTAND

Page 4: A speech about Boosting Presenter: Roberto Valenti

Overview

• Introduction
• Adaboost
  – How does it work?
  – Why does it work?
  – Demo
  – Extensions
  – Performance & Applications
• Summary & Conclusions
• Questions

Page 5: A speech about Boosting Presenter: Roberto Valenti

Introduction to Boosting
Let’s start

Page 6: A speech about Boosting Presenter: Roberto Valenti

Introduction

• An example of Machine Learning: Spam classifier
• Highly accurate rule: difficult to find
• Inaccurate rule: “BUY NOW”
• Introducing Boosting: “An effective method of producing an accurate prediction rule from inaccurate rules”

Page 7: A speech about Boosting Presenter: Roberto Valenti

Introduction

• History of boosting:
  – 1989: Schapire
    • First provable polynomial time boosting
  – 1990: Freund
    • Much more efficient, but practical drawbacks
  – 1995: Freund & Schapire
    • Adaboost: Focus of this Presentation
  – …

Page 8: A speech about Boosting Presenter: Roberto Valenti

Introduction

• The Boosting Approach
  – Lots of Weak Classifiers
  – One Strong Classifier
• Boosting key points:
  – Give importance to misclassified data
  – Find a way to combine the weak classifiers into one general rule.

Page 9: A speech about Boosting Presenter: Roberto Valenti

Adaboost
How does it work?

Page 10: A speech about Boosting Presenter: Roberto Valenti

Adaboost – How does it work?

Page 11: A speech about Boosting Presenter: Roberto Valenti

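Adaboost – How does it work?

Base Learner Job:
  – Find a base Hypothesis: $h_t : X \to \{-1, +1\}$
  – Minimize the error: $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
• Choose $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$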
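The loop around the base learner can be sketched in a few lines. This is a minimal illustration, not the presenter's demo: it assumes one-dimensional inputs, labels in {-1, +1}, threshold "decision stumps" as the weak classifiers, and the standard AdaBoost weight $\alpha_t = \frac{1}{2}\ln((1-\epsilon_t)/\epsilon_t)$. The names `train_stump`, `adaboost` and `predict` and the toy data are made up for this example.

```python
import math

def stump_predict(x, threshold, sign):
    """A decision stump: predicts `sign` when x >= threshold, otherwise -sign."""
    return sign if x >= threshold else -sign

def train_stump(xs, ys, weights):
    """Hypothetical base learner: return the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (+1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best  # (weighted error, threshold, sign)

def adaboost(xs, ys, rounds):
    """Run `rounds` boosting iterations; return a list of (alpha, threshold, sign)."""
    m = len(xs)
    weights = [1.0 / m] * m                      # start from the uniform distribution
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalize to a distribution
    return ensemble

def predict(ensemble, x):
    """Final classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, threshold, sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump classifies perfectly.
xs = [1, 2, 3, 4, 5, 6]
ys = [+1, +1, -1, -1, +1, +1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

Exhaustive search over thresholds keeps the base learner trivial to read; any learner that accepts example weights could take its place.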

Page 12: A speech about Boosting Presenter: Roberto Valenti

Adaboost – How does it work?

Page 13: A speech about Boosting Presenter: Roberto Valenti

Adaboost
Why does it work?

Page 14: A speech about Boosting Presenter: Roberto Valenti

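Adaboost – Why does it work?

• Basic property: reduce the training error
• On binary distributions: $\epsilon_t = \frac{1}{2} - \gamma_t$
• Training error bounded by:

$\prod_t \left[ 2\sqrt{\epsilon_t (1 - \epsilon_t)} \right] = \prod_t \sqrt{1 - 4\gamma_t^2} \le \exp\left(-2 \sum_t \gamma_t^2\right)$

• If every $\gamma_t \ge \gamma > 0$, this is at most $e^{-2T\gamma^2}$ -> drops exponentially!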
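The training-error bound is easy to evaluate numerically. The sketch below assumes the standard statement of the bound from Schapire's overview, $\prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)} \le \exp(-2\sum_t \gamma_t^2)$ with $\gamma_t = \frac{1}{2} - \epsilon_t$. The function name and the sample error values are made up for illustration.

```python
import math

def training_error_bounds(errors):
    """Given weighted errors eps_t of each round, return
    (product bound, exponential bound) on the training error."""
    product_bound = 1.0
    gamma_sq_sum = 0.0
    for eps in errors:
        gamma = 0.5 - eps                                  # edge over random guessing
        product_bound *= 2.0 * math.sqrt(eps * (1.0 - eps))
        gamma_sq_sum += gamma ** 2
    return product_bound, math.exp(-2.0 * gamma_sq_sum)

# Ten rounds, each weak classifier only slightly better than chance.
product_bound, exp_bound = training_error_bounds([0.3] * 10)
print(product_bound, exp_bound)
```

Even with weak classifiers that are wrong 30% of the time, both bounds shrink geometrically as rounds are added.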

Page 15: A speech about Boosting Presenter: Roberto Valenti

Adaboost – Why does it work?

• Generalization Error bounded by:

$\hat{\Pr}[H(x) \neq y] + \tilde{O}\left(\sqrt{\frac{Td}{m}}\right)$

  – T = number of iterations
  – m = sample size
  – d = Vapnik-Chervonenkis dimension
  – $\hat{\Pr}[\cdot]$ = empirical probability
  – $\tilde{O}$ = hides logarithmic and constant factors
• Overfitting in T!

Page 16: A speech about Boosting Presenter: Roberto Valenti

Adaboost – Why does it work?

• Margins of the training examples

$\mathrm{margin}(x, y) = \frac{y \sum_t \alpha_t h_t(x)}{\sum_t |\alpha_t|}$

• Positive only if correctly classified by H
• Confidence in prediction: the magnitude of the margin
• Qualitative Explanation of Effectiveness
  – Not Quantitative.
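A margin can be computed directly from the classifier weights and the individual votes. This sketch assumes the normalized definition used in Schapire's overview, $y \sum_t \alpha_t h_t(x) / \sum_t |\alpha_t|$, so the value lies in [-1, +1]. The function name and the sample numbers are illustrative only.

```python
def margin(alphas, votes, y):
    """Normalized margin of one example.

    alphas: weights alpha_t of the weak classifiers
    votes:  their predictions h_t(x), each -1 or +1
    y:      the true label, -1 or +1
    """
    total = sum(abs(a) for a in alphas)
    return y * sum(a * h for a, h in zip(alphas, votes)) / total

# Three weak classifiers; the second one disagrees with the other two.
print(margin([0.5, 0.5, 1.0], [+1, -1, +1], y=+1))
```

The sign says whether the combined classifier is right, and the magnitude says how much of the weighted vote would have to flip before the prediction changes.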

Page 17: A speech about Boosting Presenter: Roberto Valenti

Adaboost – Other View

• Adaboost as a zero-sum Game
  – Game matrix M
  – Row Player: Adaboost
  – Column Player: Base Learner
  – Row player plays rows with distribution P
  – Column player plays with distribution Q
  – Expected Loss: $P^{\top} M Q$
• Play a repeated matrix game
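The expected loss of the game is a plain bilinear form. The sketch below assumes the encoding used in Schapire's overview: rows are training examples, columns are base classifiers, and an entry is 1 when that classifier labels that example correctly, 0 otherwise. The matrix and the two distributions are made-up numbers.

```python
def expected_loss(p, matrix, q):
    """Expected loss P^T M Q when rows are drawn from p and columns from q."""
    return sum(p[i] * matrix[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))

# 3 examples (rows) x 2 base classifiers (columns).
M = [[1, 0],
     [0, 1],
     [1, 1]]
P = [0.5, 0.5, 0.0]   # booster concentrates on the two hard examples
Q = [0.5, 0.5]        # base learner mixes both classifiers equally
print(expected_loss(P, M, Q))
```

Under this encoding the booster (row player) wants a distribution over examples on which the classifiers do badly, which is why weight moves toward misclassified examples.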

Page 18: A speech about Boosting Presenter: Roberto Valenti

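Adaboost – Other View

• Von Neumann’s minmax theorem:

$\min_P \max_Q P^{\top} M Q = \max_Q \min_P P^{\top} M Q$

• If, for every distribution over the examples, there exists a classifier with error $< \frac{1}{2} - \gamma$
• Then there exists a combination of base classifiers with margin $> 2\gamma$
• Adaboost has potential of success
• Relations with Linear Programming and Online Learning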

Page 19: A speech about Boosting Presenter: Roberto Valenti

Adaboost
Demo

Page 21: A speech about Boosting Presenter: Roberto Valenti

Adaboost
Extensions

Page 22: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Extensions

• History of Boosting:
  – …
  – 1997: Freund & Schapire
    • Adaboost.M1
      – First Multiclass Generalization
      – Fails if weak learner achieves less than 50% accuracy
    • Adaboost.M2
      – Creates a set of binary problems
      – For x, is l1 or l2 the better label?
  – 1999: Schapire & Singer
    • Adaboost.MH
      – For x, is l1 the better label, or one of the others?
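The reduction to binary problems can be pictured as a data transformation. This is a rough sketch of the Adaboost.MH-style question ("is this label the right one, or is it one of the others?"), assuming each example has exactly one correct label. The function name and the sample data are hypothetical, and the boosting step that would run on the expanded data is omitted.

```python
def expand_to_binary(examples, labels):
    """Turn each (x, correct_label) into one binary example per candidate label.

    The binary target is +1 when the candidate is the correct label, else -1.
    """
    binary = []
    for x, correct in examples:
        for label in labels:
            binary.append(((x, label), +1 if label == correct else -1))
    return binary

# One document with true class "sports", three candidate classes.
for item in expand_to_binary([("doc-1", "sports")], ["sports", "politics", "tech"]):
    print(item)
```

An ordinary binary booster can then be trained on the (x, label) pairs, at the cost of multiplying the data set by the number of classes.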

Page 23: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Extensions

  – 2001: Rochery, Schapire et al.
    • Incorporating Human Knowledge
• Adaboost is data-driven
• Human knowledge can compensate for a lack of data
• Human expert:
  – Choose rule p mapping x to p(x) ∈ [0,1]
  – Difficult!
  – Simple rules should work.

Page 24: A speech about Boosting Presenter: Roberto Valenti

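Adaboost - Extensions

• To incorporate human knowledge, minimize:

$\sum_i \ln\left(1 + e^{-y_i f(x_i)}\right) + \eta \sum_i \mathrm{RE}\left(p(x_i) \,\|\, \frac{1}{1 + e^{-f(x_i)}}\right)$

• Where

$\mathrm{RE}(p \,\|\, q) = p \ln(p/q) + (1-p) \ln((1-p)/(1-q))$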
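The binary relative entropy RE(p||q) is straightforward to compute. The sketch below follows the formula on the slide and adds the usual convention that 0 ln 0 = 0, so that p = 0 and p = 1 are handled. It assumes 0 < q < 1, and the function name is made up for this example.

```python
import math

def binary_relative_entropy(p, q):
    """RE(p || q) = p ln(p/q) + (1-p) ln((1-p)/(1-q)), requiring 0 < q < 1."""
    def term(a, b):
        # Convention: 0 * ln(0 / b) is taken to be 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)

# The expert's rule says 0.9; the model currently predicts 0.5.
print(binary_relative_entropy(0.9, 0.5))
```

The value is zero when the model's probability matches the expert's rule and grows as the two drift apart, which is what lets it act as a penalty for ignoring the human-supplied rule.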

Page 25: A speech about Boosting Presenter: Roberto Valenti

Adaboost
Performance and Applications

Page 26: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Performance & Applications
Error Rates on Text Categorization

[Figure: two panels, Reuters newswire articles and AP newswire headlines]

Page 27: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Performance & Applications
Six Class Text Classification (TREC)

[Figure: two panels, Training Error and Test Error]

Page 28: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Performance & Applications
Spoken Language Classification

[Figure: two panels, “How may I help you” and “Help desk”]

Page 29: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Performance & Applications
OCR: Outliers

[Figure: outlier examples after 4, 12 and 25 rounds, each captioned as class, label1/weight1, label2/weight2]

Page 30: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Applications

• Text filtering
  – Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering. 1998
• Routing
  – Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing. 2000
• “Ranking” problems
  – Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences. 1998
• Image retrieval
  – Tieu, Viola. Boosting image retrieval. 2000
• Medical diagnosis
  – Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. 2001

Page 31: A speech about Boosting Presenter: Roberto Valenti

Adaboost - Applications

• Learning problems in natural language processing
  – Abney, Schapire, Singer. Boosting applied to tagging and PP attachment. 1999
  – Collins. Discriminative reranking for natural language parsing. 2000
  – Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation. 2000
  – Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser. 1999
  – Moreno, Logan, Raj. A boosting approach for confidence scoring. 2001
  – Walker, Rambow, Rogati. SPoT: A trainable sentence planner. 2001

Page 32: A speech about Boosting Presenter: Roberto Valenti

Summary and Conclusions
At last…

Page 33: A speech about Boosting Presenter: Roberto Valenti

Summary

• Boosting takes a weak learner and converts it to a strong one

• Works by asymptotically minimizing the training error

• Effectively maximizes the margin of the combined hypothesis

• Adaboost is related to many other topics

• It Works!

Page 34: A speech about Boosting Presenter: Roberto Valenti

Conclusions

• Adaboost advantages:
  – Fast, simple and easy to program
  – No parameters to tune (except the number of rounds T)
• Performance Dependency:
  – (Skurichina, 2001) Boosting is only useful for large sample sizes.
  – Choice of weak classifier
  – Incorporation of classifier weights
  – Data distribution

Page 35: A speech about Boosting Presenter: Roberto Valenti

Questions

? (don’t be mean)