The AdaBoost Algorithm


TRANSCRIPT

Page 1: The AdaBoost Algorithm

Page 2: A typical learning curve

Page 3: …and a boosting one

Page 4: this lecture

Page 5: supervised learning

Page 6: boosting: introduction

Page 7: boosting example

•  Start with uniform distribution on data
•  Weak learners = halfplanes (a minimal code sketch of this setup follows below)
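
The transcript keeps only the two bullets above, so here is a minimal, self-contained sketch of that setup: AdaBoost over a toy 2-D dataset, starting from the uniform distribution and using axis-aligned halfplanes (decision stumps) as weak learners. The dataset, the exhaustive stump search, and every name in the code are illustrative assumptions, not the example actually shown on the slides.

    import numpy as np

    def best_stump(X, y, D):
        """Pick the axis-aligned halfplane (decision stump) with the lowest
        weighted error under the current distribution D."""
        best = (0, 0.0, 1, np.inf)               # (feature, threshold, sign, weighted error)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (+1, -1):
                    pred = np.where(X[:, j] >= thr, sign, -sign)
                    err = D[pred != y].sum()
                    if err < best[3]:
                        best = (j, thr, sign, err)
        return best

    def adaboost(X, y, T=20):
        m = len(y)
        D = np.full(m, 1.0 / m)                  # round 1: uniform distribution on the data
        stumps, alphas = [], []
        for t in range(T):
            j, thr, sign, eps = best_stump(X, y, D)
            alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
            pred = np.where(X[:, j] >= thr, sign, -sign)
            D = D * np.exp(-alpha * y * pred)    # up-weight mistakes, down-weight correct points
            D = D / D.sum()                      # renormalize so D stays a distribution
            stumps.append((j, thr, sign))
            alphas.append(alpha)

        def H(Xnew):                             # final hypothesis: weighted majority vote
            votes = sum(a * np.where(Xnew[:, j] >= thr, s, -s)
                        for a, (j, thr, s) in zip(alphas, stumps))
            return np.sign(votes)
        return H

    # toy usage: an OR-like labeling that no single halfplane can separate
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(40, 2))
    y = np.where((X[:, 0] > 0) | (X[:, 1] > 0), 1, -1)
    H = adaboost(X, y, T=20)
    print("training error:", np.mean(H(X) != y))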

Page 8: round 1

Page 9: round 2

Page 10: round 3

Page 11: final hypothesis

Page 12: the hypothesis points space

Page 13: separation

Page 14: AdaBoost - technical

Page 15: a Bayesian interpretation

Page 16: AdaBoost – update rule
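
The body of this slide is not reproduced in the transcript. For readability, here is the standard choice of round weight and combined hypothesis from Freund & Schapire ’97, which is presumably what the slide states (ε_t denotes the weighted error of the weak hypothesis h_t):

    \alpha_t = \frac{1}{2}\,\ln\frac{1-\epsilon_t}{\epsilon_t},
    \qquad
    H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big)

A weak hypothesis with error below ½ gets positive weight, and the weight grows as its error approaches 0.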

Page 17: online allocation - Hedge algorithm
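
This slide's content is also missing from the transcript. As background (and as an assumption about what the slide covers), Freund & Schapire derive AdaBoost from their Hedge(β) algorithm for online allocation over N actions with per-round losses ℓ_{t,i} ∈ [0, 1]; its multiplicative weight update is:

    p_{t,i} = \frac{w_{t,i}}{\sum_{j=1}^{N} w_{t,j}},
    \qquad
    w_{t+1,i} = w_{t,i}\,\beta^{\ell_{t,i}},
    \qquad \beta \in (0,1)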

Page 18: AdaBoost – distribution update
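
The slide itself is not reproduced here; the standard AdaBoost distribution update (with Z_t the normalization constant that keeps D_{t+1} a probability distribution) is:

    D_{t+1}(i) = \frac{D_t(i)\,\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}

Examples that h_t misclassifies (y_i h_t(x_i) = −1) have their weight multiplied by e^{α_t}, while correctly classified examples are down-weighted, so the next weak learner concentrates on the current mistakes.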

Page 19: training error

•  Theorem [Freund & Schapire ’97] (standard form restated below)
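
The statement of the theorem is not preserved in the transcript. Its standard form bounds the training error of the combined hypothesis H in terms of the weighted errors ε_t = ½ − γ_t of the weak hypotheses:

    \widehat{\mathrm{err}}_S(H)
    \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
    \;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^2}
    \;\le\; \exp\Big(-2\sum_{t=1}^{T}\gamma_t^2\Big)

So if every weak hypothesis has edge γ_t ≥ γ > 0, the training error drops exponentially fast in the number of rounds T.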

Page 20: Boosting and margin distribution

[figure: margin distribution plot; axis variable is the margin threshold θ]
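
Only the placeholder above survives of this slide's plot. The quantity whose distribution such a plot shows is, in the standard formulation, the normalized margin of a training example (x, y):

    \mathrm{margin}_f(x, y) \;=\; \frac{y \sum_{t} \alpha_t\, h_t(x)}{\sum_{t} \alpha_t} \;\in\; [-1, 1]

A positive margin means the weighted vote classifies the example correctly, and a larger margin means a more confident vote; the curve presumably plots the fraction of training examples with margin at most θ, for each threshold θ.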

Page 21: Why margins are important

•  Generalization error

Page 22: generalization error - based on margins
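
The repeated page headers of this transcript preserve the opening of the theorem on this slide ("Let D be a distribution over X × {−1, 1}, and let S be a sample of m examples chosen independently at random according to D. Assume that..."). This is the margin-based bound of Schapire, Freund, Bartlett & Lee ’98; its usual conclusion, stated here in its common form rather than copied from the slide, is that with probability at least 1 − δ over the draw of S, for every θ > 0:

    \Pr_{D}\big[y f(x) \le 0\big]
    \;\le\; \Pr_{S}\big[y f(x) \le \theta\big]
    \;+\; O\!\left(\sqrt{\frac{d \log^2(m/d)}{m\,\theta^2} + \frac{\log(1/\delta)}{m}}\right)

where f is the normalized vote and d is the VC dimension of the base hypothesis class. Notably, the bound does not depend on the number of boosting rounds T, which is the usual explanation for why AdaBoost can keep improving test error after the training error reaches zero.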

Page 23: AdaBoost - remarks

•  AdaBoost is adaptive
   •  does not need to know γ or T a priori
   •  can exploit εt ≪ ½ − γ

•  GOOD: does not overfit
•  BAD: susceptible to noise
•  ...but a nice property: can identify outliers

Page 24: Recent advances

Page 25:

Page 26: boosting vs bagging

Page 27: Boosting vs SVMs

Page 28: …hope you are still with me

Page 29: boosting using confidence-rated predictions

Page 30: multiclass: AdaBoost.M1
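
The slide body is not in the transcript. As a reminder of the algorithm being named (the standard AdaBoost.M1 of Freund & Schapire ’97, not copied from the slide): the weak hypotheses now predict a class label directly, each round requires weighted error ε_t < ½, and the final hypothesis is a weighted plurality vote:

    H(x) \;=\; \arg\max_{y \in Y} \;\sum_{t \,:\, h_t(x) = y} \log\frac{1-\epsilon_t}{\epsilon_t}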

Page 31: multiclass, multilabel: AdaBoost.MH

Page 32: output coding for multiclass problems

Page 33: InfoBoost – [Aslam ’00]

Page 34: applications

Page 35: so