Boosting (Penn Engineering CIS 520, cis520/lectures/12_boosting.pdf)


Page 1:

Boosting
Lyle Ungar

Learning objectives:
- Review stagewise regression
- Know the AdaBoost and gradient boosting algorithms

Page 2:

Stagewise Regression
- Sequentially learn the weights a_t
  - Never readjust previously learned weights

h(x) = 0 or average(y)
For t = 1:T
    r_t = y - h(x)
    regress r_t = a_t h_t(x) to find a_t
    h(x) = h(x) + a_t h_t(x)
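The loop above can be sketched in Python. This is an illustrative sketch, not code from the lecture: the weak learners are depth-1 regression stumps on a 1-D input, and names like `fit_stump` are my own.

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression stump: pick the threshold on x that best fits
    the residuals r, predicting the mean residual on each side."""
    best = None
    for thr in np.unique(x)[:-1]:                # keep both sides non-empty
        pred = np.where(x <= thr, r[x <= thr].mean(), r[x > thr].mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, r[x <= thr].mean(), r[x > thr].mean())
    _, thr, lo, hi = best
    return lambda z: np.where(z <= thr, lo, hi)

def stagewise_regression(x, y, T=20):
    """Stagewise regression: repeatedly fit a weak learner to the current
    residuals and add it in; earlier weights are never readjusted."""
    h = np.full_like(y, y.mean(), dtype=float)    # h(x) = average(y)
    learners = []
    for t in range(T):
        r = y - h                                 # r_t = y - h(x)
        ht = fit_stump(x, r)
        pred = ht(x)
        # a_t from a one-variable least-squares regression of r_t on h_t(x)
        a = (r @ pred) / (pred @ pred + 1e-12)
        h = h + a * pred                          # h(x) = h(x) + a_t h_t(x)
        learners.append((a, ht))
    return h, learners
```

On a noisy sine curve, the training error shrinks steadily as T grows, even though each stump alone is a very weak fit.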

Page 3:

Boosting
- Ensemble method
  - Weighted combination of weak learners h_t(x)
- Learned stagewise
  - At each stage, boosting gives more weight to what it got wrong before

Page 4:

AdaBoost

where a_t = (1/2) ln((1 − ε_t)/ε_t), with ε_t the weighted probability of the prediction being wrong; a_t is half the log-odds of the prediction being right

Page 5:

AdaBoost example
- https://alliance.seas.upenn.edu/~cis520/dynamic/2019/wiki/index.php?n=Lectures.Boosting

Page 6:

AdaBoost minimizes exponential loss
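Concretely, the objective is the exponential loss of the margin y_i F(x_i) (the standard formulation; the derivation itself was on the slide image):

```latex
L_{\exp} \;=\; \sum_{i=1}^{n} \exp\bigl(-y_i F(x_i)\bigr),
\qquad
F(x) \;=\; \sum_{t=1}^{T} a_t\, h_t(x)
```

Each stage of AdaBoost greedily adds the (a_t, h_t) pair that most decreases this sum, which is what yields the half-log-odds formula for a_t.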

Page 7:

And it learns exponentially fast

The average training error decreases exponentially in the number of stages T and the edge γ of the weak learner
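The standard Freund–Schapire bound (not shown on the surviving slide text) makes this precise: if every weak learner has weighted error ε_t ≤ 1/2 − γ, the training error of the ensemble satisfies

```latex
\frac{1}{n}\sum_{i=1}^{n}
\mathbf{1}\bigl[\,y_i \ne \operatorname{sign}(F(x_i))\,\bigr]
\;\le\;
\prod_{t=1}^{T} 2\sqrt{\varepsilon_t(1-\varepsilon_t)}
\;\le\;
e^{-2\gamma^{2} T}
```

so any weak learner that is consistently better than chance by a margin γ drives the training error to zero exponentially fast in T.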

Page 8:

Gradient Tree Boosting
- Current state of the art for moderate-sized data sets
  - i.e., on average very slightly better than random forests when you don't have enough data to do deep learning

Page 9:

Gradient Boosting
- Model: F(x) = Σ_i g_i h_i(x) + const
- Loss function: L(y, F(x))
  - L2 or logistic or ...
- Base learner: h_i(x)
  - Decision tree of specified depth
- Optionally subsample features
  - "stochastic gradient boosting"
- Do stagewise estimation of F(x)
  - Estimate h_i(x) and g_i at each iteration i
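The stagewise loop can be sketched as below. This is an illustrative sketch under my own naming, with a depth-1 regression stump standing in for the fixed-depth tree, and the per-stage weight g_i folded into a fixed learning rate; the L1 case is included to show that the pseudo-residual is the negative gradient of the loss, not always the plain residual.

```python
import numpy as np

def stump(x, r):
    """Depth-1 regression tree fit to the pseudo-residuals r:
    best threshold split on x, mean of r in each leaf."""
    best_sse, best_t = np.inf, None
    for t in np.unique(x)[:-1]:                  # keep both sides non-empty
        pred = np.where(x <= t, r[x <= t].mean(), r[x > t].mean())
        sse = ((r - pred) ** 2).sum()
        if sse < best_sse:
            best_sse, best_t = sse, t
    lo, hi = r[x <= best_t].mean(), r[x > best_t].mean()
    return lambda z: np.where(z <= best_t, lo, hi)

def gradient_boost(x, y, T=50, lr=0.1, loss="l2"):
    """Stagewise F(x) = const + sum_i g_i h_i(x): each h_i is fit to the
    negative gradient of the loss at the current F (the pseudo-residual)."""
    start = np.median(y) if loss == "l1" else y.mean()
    F = np.full_like(y, start, dtype=float)
    trees = []
    for _ in range(T):
        # pseudo-residual -dL/dF: y - F for L2, sign(y - F) for L1
        r = (y - F) if loss == "l2" else np.sign(y - F)
        h = stump(x, r)
        F = F + lr * h(x)                        # g_i folded into the fixed lr
        trees.append(h)
    return F, trees
```

Swapping in a different loss changes only the pseudo-residual line; the tree-fitting machinery is untouched, which is the point of the gradient-boosting abstraction.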

Page 10:

https://en.wikipedia.org/wiki/Gradient_boosting

The pseudo-residual is the negative gradient of the loss with respect to the current prediction, r_i = −∂L(y_i, F(x_i))/∂F(x_i). For squared error, this is just the standard residual y_i − F(x_i)

Page 11:

Gradient Tree Boosting for Regression

- Loss function: L2
- Base learners h_i(x)
  - Fixed-depth regression tree fit on the pseudo-residual
  - Gives a constant prediction for each leaf of the tree
- Stagewise: find weights on each h_i(x)
  - Fancy version: fit a different weight for each leaf of the tree

Page 12:

Regularization helps

http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html

Subsample = stochastic gradient boosting

Learning rate = shrinkage on g
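The scikit-learn example linked above exposes both knobs directly; assuming scikit-learn is available, a minimal version looks like this (the data here is my own toy example, not the one from the linked page):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# learning_rate shrinks each tree's contribution (shrinkage on g);
# subsample < 1.0 fits each tree on a random subsample of the rows
# ("stochastic gradient boosting")
model = GradientBoostingRegressor(
    n_estimators=200, max_depth=2,
    learning_rate=0.1, subsample=0.5, random_state=0)
model.fit(X, y)
print(model.score(X, y))   # R^2 on the training data
```

Lowering the learning rate generally requires more trees to reach the same training fit, but tends to generalize better; the linked plot shows exactly this trade-off.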

Page 13:

What you should know
- Boosting
  - Stagewise regression upweighting previous errors
  - Gives highly accurate ensemble models
  - Relatively fast
  - Tends not to overfit (but still: use early stopping!)
- Gradient Tree Boosting
  - Uses pseudo-residuals
  - Very accurate!!!