
Boosting
Lyle Ungar

Learning objectives
- Review stagewise regression
- Know the AdaBoost and gradient boosting algorithms

Stagewise Regression
- Sequentially learn the weights a_t
  - Never readjust previously learned weights
- Algorithm (a Python sketch follows below):

    h(x) = 0 or average(y)
    For t = 1:T
        r_t = y - h(x)
        regress r_t = a_t h_t(x) to find a_t
        h(x) = h(x) + a_t h_t(x)
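A minimal sketch of this loop in Python, assuming scikit-learn regression stumps as the base learners h_t (the slide does not fix a particular weak learner) and a closed-form least-squares fit for each a_t:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stagewise_regression(X, y, T=100, max_depth=1):
    """Stagewise regression: fit a_t * h_t(x) to the current residual;
    previously learned weights are never readjusted."""
    const = y.mean()                  # h(x) = average(y)
    h = np.full(len(y), const)
    learners, weights = [], []
    for t in range(T):
        r = y - h                     # r_t = y - h(x)
        ht = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        pred = ht.predict(X)
        # least-squares solution of r_t = a_t h_t(x)
        a = pred @ r / (pred @ pred + 1e-12)
        h = h + a * pred              # h(x) = h(x) + a_t h_t(x)
        learners.append(ht)
        weights.append(a)
    return const, learners, weights
```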

Boosting
- Ensemble method
  - Weighted combination of weak learners h_t(x)
- Learned stagewise
  - At each stage, boosting gives more weight to what it got wrong before

Adaboost
- At each stage t, fit the weak learner h_t(x) to the reweighted training examples and add it to the ensemble with weight a_t
  - a_t = ½ ln((1 − ε_t)/ε_t), where ε_t is the weighted probability of the prediction being wrong (a Python sketch follows)
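A minimal sketch of the discrete AdaBoost loop, assuming labels y in {-1, +1} and decision stumps as the weak learners (both are assumptions; the slide does not pin these down):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    """Discrete AdaBoost; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # start with uniform weights
    stumps, alphas = [], []
    for t in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = w[pred != y].sum()              # weighted error eps_t
        if eps >= 0.5:                        # no better than chance: stop
            break
        eps = max(eps, 1e-12)                 # guard the log when eps == 0
        a = 0.5 * np.log((1 - eps) / eps)     # a_t from the weighted error
        w *= np.exp(-a * y * pred)            # upweight the mistakes
        w /= w.sum()                          # renormalize
        stumps.append(stump)
        alphas.append(a)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Weighted vote: sign of sum_t a_t h_t(x)."""
    F = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(F)
```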

Adaboost example
- https://alliance.seas.upenn.edu/~cis520/dynamic/2019/wiki/index.php?n=Lectures.Boosting

Adaboost minimizes exponential loss
- And it learns it exponentially fast
- Average training error is exponential in the number of stages T and the accuracy γ of the weak learner (the bound is written out below)
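Written out (standard AdaBoost facts, filling in notation the slide leaves implicit: ε_t is the weighted error and γ_t = 1/2 − ε_t the edge of the weak learner at stage t):

```latex
% Exponential loss that AdaBoost greedily minimizes:
L(F) = \sum_{i=1}^{n} \exp\bigl(-y_i F(x_i)\bigr),
\qquad F(x) = \sum_{t=1}^{T} a_t h_t(x)

% Training-error bound, exponential in T and the edges \gamma_t:
\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\mathrm{sign}(F(x_i)) \neq y_i\}
\;\le\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t (1 - \varepsilon_t)}
\;\le\; \exp\Bigl(-2 \sum_{t=1}^{T} \gamma_t^2\Bigr)
```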

Gradient Tree Boosting
- Current state of the art for moderate-sized data sets
  - i.e., on average very slightly better than random forests when you don't have enough data to do deep learning

Gradient Boosting
- Model: F(x) = Σ_i g_i h_i(x) + const
- Loss function: L(y, F(x))
  - L2 or logistic or ...
- Base learner: h_i(x)
  - Decision tree of specified depth
- Optionally subsample features
  - "stochastic gradient boosting"
- Do stagewise estimation of F(x)
  - Estimate h_i(x) and g_i at each iteration i

https://en.wikipedia.org/wiki/Gradient_boosting

Each h_i(x) is fit to the pseudo-residual r = -∂L(y, F(x))/∂F(x). For squared error, this is just the standard residual y - F(x). A Python sketch follows.
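A minimal sketch of the L2 case, assuming fixed-depth scikit-learn regression trees for the h_i and a single shared shrinkage factor standing in for the stage weights g_i:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_l2(X, y, n_stages=100, max_depth=3, lr=0.1):
    """Gradient tree boosting with L2 loss: each stage fits a tree to
    the pseudo-residual, which for squared error is just y - F(x)."""
    const = y.mean()                          # F_0(x) = const
    F = np.full(len(y), const)
    trees = []
    for i in range(n_stages):
        r = y - F                             # pseudo-residual = -dL/dF
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        F += lr * tree.predict(X)             # shrunken stage update
        trees.append(tree)
    return const, trees

def gb_predict(X, const, trees, lr=0.1):
    """F(x) = const + sum_i g_i h_i(x), with g_i = lr here."""
    return const + lr * sum(t.predict(X) for t in trees)
```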

Gradient Tree Boosting for Regression
- Loss function: L2
- Base learners h_i(x)
  - Fixed-depth regression tree fit on the pseudo-residual
  - Gives a constant prediction for each leaf of the tree
- Stagewise: find weights on each h_i(x)
  - Fancy version: fit a different weight for each leaf of the tree (see the sketch below)
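A sketch of the "fancy version" for one stage, assuming scikit-learn trees (tree.apply gives each example's leaf index); for L2 the per-leaf fit is just the mean residual in the leaf, while other losses would need a per-leaf line search:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_leaf_weights(X, r, max_depth=3):
    """Fit one tree to the pseudo-residual r, then fit a separate
    weight for each leaf of the tree."""
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
    leaf_of = tree.apply(X)                   # leaf index per example
    gamma = {leaf: r[leaf_of == leaf].mean()  # optimal L2 leaf weight
             for leaf in np.unique(leaf_of)}
    return tree, gamma

def leaf_predict(tree, gamma, X):
    """Predict with the refitted per-leaf weights."""
    return np.array([gamma[leaf] for leaf in tree.apply(X)])
```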

Regularization helps
- Subsample = stochastic gradient boosting
- Learning rate = shrinkage on g
(the corresponding scikit-learn knobs are shown below)

http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html
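For concreteness, both knobs in scikit-learn's GradientBoostingRegressor, on synthetic data purely for illustration; the early-stopping options tie in with the advice on the last slide:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=500,
    learning_rate=0.05,       # shrinkage on the stage weights g
    subsample=0.5,            # < 1.0 => stochastic gradient boosting
    max_depth=3,              # fixed-depth base trees
    validation_fraction=0.1,  # held-out fraction for early stopping
    n_iter_no_change=10,      # stop when the validation score stalls
).fit(X, y)
```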

What you should know
- Boosting
  - Stagewise regression upweighting previous errors
  - Gives highly accurate ensemble models
  - Relatively fast
  - Tends not to overfit (but still: use early stopping!)
- Gradient Tree Boosting
  - Uses pseudo-residuals
  - Very accurate!!!
