Tree-Based Methods
TRANSCRIPT
The intuition behind tree-based methods
SUPERVISED LEARNING IN R: REGRESSION
Nina Zumel and John Mount, Win-Vector, LLC
Example: Predict animal intelligence from Gestation Time and Litter Size
Decision Trees
Rules of the form:
    if a AND b AND c THEN y
Non-linear concepts:
    intervals
    non-monotonic relationships
    non-additive interactions
AND: similar to multiplication
Decision Trees
IF Litter < 1.15 AND Gestation ≥ 268 → intelligence = 0.315
IF Litter IN [1.15, 4.3) → intelligence = 0.131
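As an illustration of how such rules are fit in R, here is a minimal sketch using rpart, assuming a data frame animals with columns intelligence, gestation, and litter (illustrative names, not necessarily the course's exact dataset):

```r
# A sketch, not the course's code: fit a regression tree with rpart,
# assuming a data frame 'animals' with columns intelligence, gestation, litter
library(rpart)

tree_model <- rpart(intelligence ~ gestation + litter,
                    data = animals,
                    method = "anova")   # "anova" fits a regression tree

print(tree_model)                        # shows the learned IF/THEN splits
animals$pred <- predict(tree_model, animals)
```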
Decision Trees
Pro: Trees Have an Expressive Concept Space
Model RMSE
linear 0.1200419
tree 0.1072732
Decision Trees
Con: Coarse-Grained Predictions
It's Hard for Trees to Express Linear Relationships
Trees Predict Axis-Aligned Regions
It's Hard for Trees to Express Linear Relationships
It's Hard to Express Lines with Steps
Other Issues with Trees
Tree with too many splits (deep tree):
    Too complex - danger of overfit
Tree with too few splits (shallow tree):
    Predictions too coarse-grained
Ensembles of Trees
Ensembles Give Finer-Grained Predictions than Single Trees
Ensembles of Trees
Ensemble Model Fits Animal Intelligence Data Better than Single Tree
Model RMSE
linear 0.1200419
tree 0.1072732
random forest 0.0901681
Let's practice!
Random forests
SUPERVISED LEARNING IN R: REGRESSION
Nina Zumel and John Mount, Win-Vector, LLC
Random Forests
Multiple diverse decision trees averaged together
Reduces overfit
Increases model expressiveness
Finer-grained predictions
Building a Random Forest Model
1. Draw bootstrapped samples from the training data
2. For each sample, grow a tree
    At each node, pick the best variable to split on (from a random subset of all variables)
    Continue until the tree is grown
3. To score a datum, evaluate it with all the trees and average the results.
Example: Bike Rental Data

cnt ~ hr + holiday + workingday + weathersit + temp + atemp + hum + windspeed
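The next slide passes this formula to ranger() as fmla; a minimal sketch of storing it as an R formula object:

```r
# Bike rental count as a function of hour, calendar indicators, and weather
fmla <- cnt ~ hr + holiday + workingday +
  weathersit + temp + atemp + hum + windspeed
```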
Random Forests with ranger()
model <- ranger(fmla, bikesJan,
                num.trees = 500,
                respect.unordered.factors = "order")

formula, data
num.trees (default 500) - use at least 200
mtry - number of variables to try at each node
    default: square root of the total number of variables
respect.unordered.factors - recommended: set to "order"
    "safe" hashing of categorical variables
Random Forests with ranger()
model
Ranger result
...
OOB prediction error (MSE): 3103.623
R squared (OOB): 0.7837386
The random forest algorithm returns estimates of out-of-sample performance.
Predicting with a ranger() model
bikesFeb$pred <- predict(model, bikesFeb)$predictions
predict() inputs:
model
data
Predictions can be accessed in the element predictions.
Evaluating the model
Calculate RMSE:

bikesFeb %>%
  mutate(residual = pred - cnt) %>%
  summarize(rmse = sqrt(mean(residual^2)))

      rmse
1 67.15169
Model RMSE
Quasipoisson 69.3
Random forests 67.15
Let's practice!
One-Hot-Encoding Categorical Variables
SUPERVISED LEARNING IN R: REGRESSION
Nina Zumel and John Mount, Win-Vector, LLC
Why Convert Categoricals Manually?
Most R functions manage the conversion for you, via model.matrix()
xgboost() does not
Must convert categorical variables to a numeric representation
Conversion to indicators: one-hot encoding (see the sketch below)
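As a toy illustration (the data frame and column names here are made up), base R's model.matrix() performs this kind of indicator expansion:

```r
# Toy data: one categorical column and one numeric column
df <- data.frame(color = factor(c("red", "blue", "green", "blue")),
                 size  = c(10, 20, 15, 12))

# '- 1' drops the intercept so every level of 'color' gets its own indicator column
model.matrix(~ color + size - 1, data = df)
```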
One-hot-encoding and data cleaning with `vtreat`
Basic idea (see the sketch after this list):
designTreatmentsZ() to design a treatment plan from the training data, then
prepare() to create "clean" data:
    all numerical
    no missing values
Use prepare() with the treatment plan for all future data
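A minimal sketch of that two-step workflow, assuming training data dframe with input variable names in vars (future_data stands in for any later data set); the next slides walk through a concrete example:

```r
library(vtreat)

# Design the treatment plan from the training data
treatplan <- designTreatmentsZ(dframe, vars, verbose = FALSE)

# Prepare "clean" training data: all numeric, no missing values
training.treat <- prepare(treatplan, dframe)

# Apply the same plan to all future data
future.treat <- prepare(treatplan, future_data)
```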
A Small vtreat Example
Training Data
x u y
one 44 0.4855671
two 24 1.3683726
three 66 2.0352837
two 22 1.6396267
Test Data
x u y
one 5 2.6488148
three 12 1.5012938
one 56 0.1993731
two 28 1.2778516
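For reference, these two frames can be constructed in R directly from the tables above; dframe and test are the names used in the calls on the following slides:

```r
# Training data (values from the table above)
dframe <- data.frame(
  x = c("one", "two", "three", "two"),
  u = c(44, 24, 66, 22),
  y = c(0.4855671, 1.3683726, 2.0352837, 1.6396267)
)

# Test data (values from the table above)
test <- data.frame(
  x = c("one", "three", "one", "two"),
  u = c(5, 12, 56, 28),
  y = c(2.6488148, 1.5012938, 0.1993731, 1.2778516)
)
```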
Create the Treatment Plan

vars <- c("x", "u")
treatplan <- designTreatmentsZ(dframe, vars, verbose = FALSE)

Inputs to designTreatmentsZ():
dframe: training data
varlist: list of input variable names
Set verbose = FALSE to suppress progress messages
Get the New Variables
The scoreFrame describes the variable mapping and types

(scoreFrame <- treatplan$scoreFrame %>%
   select(varName, origName, code))

        varName origName  code
1   x_lev_x.one        x   lev
2 x_lev_x.three        x   lev
3   x_lev_x.two        x   lev
4        x_catP        x  catP
5       u_clean        u clean

Get the names of the new lev and clean variables

(newvars <- scoreFrame %>%
   filter(code %in% c("clean", "lev")) %>%
   use_series(varName))

"x_lev_x.one" "x_lev_x.three" "x_lev_x.two" "u_clean"
Prepare the Training Data for Modeling

training.treat <- prepare(treatplan, dframe, varRestriction = newvars)

Inputs to prepare():
treatmentplan: treatment plan
dframe: data frame
varRestriction: list of variables to prepare (optional)
    default: prepare all variables
Before and After Data Treatment
Training Data
x u y
one 44 0.4855671
two 24 1.3683726
three 66 2.0352837
two 22 1.6396267
Treated Training Data
x_lev_x.one  x_lev_x.three  x_lev_x.two  u_clean
          1              0            0       44
          0              0            1       24
          0              1            0       66
          0              0            1       22
Prepare the Test Data Before Model Application

(test.treat <- prepare(treatplan, test, varRestriction = newvars))
x_lev_x.one x_lev_x.three x_lev_x.two u_clean
1 1 0 0 5
2 0 1 0 12
3 1 0 0 56
4 0 0 1 28
vtreat Treatment is Robust
Previously unseen x level: four
x u y
one 4 0.2331301
two 14 1.9331760
three 66 3.1251029
four 25 4.0332491
four encodes to (0, 0, 0)
prepare(treatplan, toomany, ...)
x_lev_x.one  x_lev_x.three  x_lev_x.two  u_clean
          1              0            0        4
          0              0            1       14
          0              1            0       66
          0              0            0       25
Let's practice!
Gradient boosting machines
SUPERVISED LEARNING IN R: REGRESSION
Nina Zumel and John Mount, Win-Vector, LLC
How Gradient Boosting Works
1. Fit a shallow tree T_1 to the data: M_1 = T_1
How Gradient Boosting Works
1. Fit a shallow tree T_1 to the data: M_1 = T_1
2. Fit a tree T_2 to the residuals. Find γ such that M_2 = M_1 + γT_2 is the best fit to the data
How Gradient Boosting Works
Regularization: learning rate η ∈ (0, 1)
M_2 = M_1 + ηγT_2
Larger η: faster learning
Smaller η: less risk of overfit
How Gradient Boosting Works
1. Fit a shallow tree T_1 to the data: M_1 = T_1
2. Fit a tree T_2 to the residuals: M_2 = M_1 + ηγ_2 T_2
3. Repeat (2) until the stopping condition is met

Final Model: M = M_1 + η Σ_i γ_i T_i

(A toy version of this loop is sketched below.)
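To make the loop concrete, here is a toy boosting sketch using rpart on made-up one-dimensional data. It illustrates the idea only (it is not xgboost's implementation), and the per-tree multiplier γ_i is folded into each tree's fitted values:

```r
library(rpart)

# Made-up 1-D regression data
set.seed(1)
df <- data.frame(x = runif(200, 0, 10))
df$y <- sin(df$x) + rnorm(200, sd = 0.2)

eta  <- 0.3                           # learning rate
pred <- rep(mean(df$y), nrow(df))     # initial model: predict the mean

for (i in 1:50) {
  df$resid <- df$y - pred             # residuals of the current model
  t_i <- rpart(resid ~ x, data = df,  # shallow tree T_i fit to the residuals
               control = rpart.control(maxdepth = 2))
  pred <- pred + eta * predict(t_i, df)   # M_i = M_(i-1) + eta * T_i
}

sqrt(mean((df$y - pred)^2))           # training RMSE of the boosted model
```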
Cross-validation to Guard Against Overfit
Training error keeps decreasing, but test error doesn't
Best Practice (with xgboost())
1. Run xgb.cv() with a large number of rounds (trees).
Best Practice (with xgboost())
1. Run xgb.cv() with a large number of rounds (trees).
2. xgb.cv()$evaluation_log: records estimated RMSE for each round.
    Find the number of trees that minimizes estimated RMSE: n_best
Best Practice (with xgboost())
1. Run xgb.cv() with a large number of rounds (trees).
2. xgb.cv()$evaluation_log: records estimated RMSE for each round.
    Find the number of trees that minimizes estimated RMSE: n_best
3. Run xgboost(), setting nrounds = n_best
Example: Bike Rental Model
First, prepare the data

treatplan <- designTreatmentsZ(bikesJan, vars)

newvars <- treatplan$scoreFrame %>%
  filter(code %in% c("clean", "lev")) %>%
  use_series(varName)

bikesJan.treat <- prepare(treatplan, bikesJan, varRestriction = newvars)

For xgboost():
Input data: as.matrix(bikesJan.treat)
Outcome: bikesJan$cnt
Training a model with xgboost() / xgb.cv()

cv <- xgb.cv(data = as.matrix(bikesJan.treat),
             label = bikesJan$cnt,
             objective = "reg:linear",
             nrounds = 100, nfold = 5, eta = 0.3, depth = 6)

Key inputs to xgb.cv() and xgboost():
data: input data as matrix; label: outcome
objective: for regression - "reg:linear"
nrounds: maximum number of trees to fit
eta: learning rate
depth: maximum depth of individual trees
nfold (xgb.cv() only): number of folds for cross-validation
Find the Right Number of Trees
elog <- as.data.frame(cv$evaluation_log)
(nrounds <- which.min(elog$test_rmse_mean))
78
Run xgboost() for the final model

nrounds <- 78

model <- xgboost(data = as.matrix(bikesJan.treat),
                 label = bikesJan$cnt,
                 nrounds = nrounds,
                 objective = "reg:linear",
                 eta = 0.3,
                 depth = 6)
Predict with an xgboost() model
Prepare February data, and predict

bikesFeb.treat <- prepare(treatplan, bikesFeb, varRestriction = newvars)
bikesFeb$pred <- predict(model, as.matrix(bikesFeb.treat))

Model performances on February data

Model RMSE
Quasipoisson 69.3
Random forests 67.15
Gradient Boosting 54.0
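The gradient boosting RMSE in this table can be computed with the same pattern used earlier for the random forest model; a minimal sketch, assuming bikesFeb now contains both cnt and the predictions pred from above:

```r
library(dplyr)

bikesFeb %>%
  mutate(residual = pred - cnt) %>%
  summarize(rmse = sqrt(mean(residual^2)))
```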
Visualize the Results
Predictions vs. Actual Bike Rentals, February
Predictions and Hourly Bike Rentals, February
Let's practice!