Regression Tree and
Multivariate Adaptive Regression Splines (MARS)
Munmun Biswas
Dept. of Statistics, Brahmananda Keshab Chandra College
July 28, 2020
M.Biswas (BKC College) Regression Tree and Multivariate Adaptive Regression Splines (MARS)July 28, 2020 1 / 20
Classification Problem

CART in the classification problem

We have discussed the idea
We have seen the rpart package for implementation
Also certain limitations of the CART algorithm

To improve effectiveness, certain ensemble methods have been proposed:
Bagging
Boosting
Random Forest

To be discussed in the next class
Regression Problem

Data: (yᵢ, xᵢ), i = 1, . . . , n, where xᵢ = (xᵢ₁, . . . , xᵢⱼ, . . . , xᵢₚ)′

The CART algorithm splits the x-space into partitions, say {R₁, R₂, . . . , R_M}
The regression tree models the response as
f(x) = ∑_{m=1}^{M} c_m I(x ∈ R_m), where c_m = ave(yᵢ | xᵢ ∈ R_m)

[Figure: partition of the (x₁, x₂) plane into regions R₁, R₂, R₃, R₄]
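The fitted model above is just a lookup: find the region containing x and return that region's mean response. The following is a minimal Python sketch of this idea, assuming axis-aligned rectangles in two dimensions; the region format and function names are illustrative, not the rpart implementation.

```python
# Sketch of f(x) = sum_m c_m * I(x in R_m), with c_m = ave(y_i | x_i in R_m).
# Regions are hypothetical axis-aligned rectangles (lo1, hi1, lo2, hi2).

def region_means(X, y, regions):
    """For each rectangle, average the responses of the points falling inside."""
    means = []
    for (lo1, hi1, lo2, hi2) in regions:
        inside = [yi for (x1, x2), yi in zip(X, y)
                  if lo1 <= x1 < hi1 and lo2 <= x2 < hi2]
        means.append(sum(inside) / len(inside))
    return means

def predict(x, regions, means):
    """Return c_m for the region R_m containing x."""
    x1, x2 = x
    for (lo1, hi1, lo2, hi2), cm in zip(regions, means):
        if lo1 <= x1 < hi1 and lo2 <= x2 < hi2:
            return cm
    raise ValueError("x falls outside every region")

# Four regions partitioning [0,2) x [0,2), mirroring the R1..R4 picture
regions = [(0, 1, 0, 1), (1, 2, 0, 1), (0, 1, 1, 2), (1, 2, 1, 2)]
X = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (0.2, 0.9)]
y = [1.0, 2.0, 3.0, 4.0, 2.0]
means = region_means(X, y, regions)
print(predict((0.7, 0.3), regions, means))  # mean of the responses in R1: 1.5
```

Because the prediction is piecewise constant over the regions, the fitted surface is flat within each R_m, which is the source of the non-smoothness discussed later.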
The CART algorithm

The partitioning of variables is done in a top-down, greedy fashion.
A partition performed earlier in the tree will not change based on later partitions.
The model begins with the entire data set, S. It searches every distinct value of every input variable to find the predictor and split value that partition the data into two regions R₁ and R₂ such that the overall sum of squared errors is minimized:
Minimize SSE = ∑_{i∈R₁} (yᵢ − c₁)² + ∑_{i∈R₂} (yᵢ − c₂)²
Having found the best binary split, we partition the data into the two resulting regions.
Repeat the splitting process on each of the two regions.
This process is continued until some stopping criterion is reached.
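The exhaustive search over predictors and split values can be sketched in a few lines. This is an illustrative Python version of the SSE-minimizing split search described above (function and variable names are assumptions), not the rpart code:

```python
# Greedy search for the single best binary split: for every feature j and
# every distinct value s of that feature, split into R1 = {x_j < s} and
# R2 = {x_j >= s}, and score SSE(R1) + SSE(R2) around each region's mean.

def sse(values):
    """Sum of squared deviations from the mean of `values`."""
    if not values:
        return 0.0
    c = sum(values) / len(values)
    return sum((v - c) ** 2 for v in values)

def best_split(X, y):
    """Return (feature index, threshold) minimizing SSE(R1) + SSE(R2)."""
    best = (None, None, float("inf"))
    n_features = len(X[0])
    for j in range(n_features):
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            if not left or not right:
                continue  # skip degenerate splits
            score = sse(left) + sse(right)
            if score < best[2]:
                best = (j, s, score)
    return best[0], best[1]

# Feature 0 separates the two response clusters; feature 1 is constant.
X = [(1.0, 5.0), (2.0, 5.0), (8.0, 5.0), (9.0, 5.0)]
y = [1.0, 1.2, 4.0, 4.2]
print(best_split(X, y))  # splits on feature 0 at 8.0
```

Recursing this search on each resulting region, until a stopping criterion fires, gives the full greedy tree-growing procedure.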
Cost Complexity Parameter

Stopping rules:
minsplit: the minimum number of data points required in a node to attempt a split; below this, a terminal node is created.
maxdepth: the maximum number of internal nodes between the root node and the terminal nodes.

We typically grow a very large tree as defined in the previous section and then prune it back to find an optimal subtree.
There is often a balance to be achieved between the depth and complexity of the tree to optimize predictive performance on unseen data.
Minimize {SSE + α × |T|}, where |T| is the number of terminal nodes of the tree.
Obtain the smallest pruned tree that has the lowest penalized error.
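The selection rule SSE + α|T| trades training fit against tree size: α = 0 keeps the full tree, while larger α favours smaller subtrees. A minimal Python sketch of the selection step, using made-up candidate subtrees (the names and numbers are illustrative, not output from rpart):

```python
# Among candidate pruned subtrees, pick the one minimizing the penalized
# error SSE + alpha * |T|, where |T| is the number of terminal nodes.

def penalized_error(sse, n_leaves, alpha):
    return sse + alpha * n_leaves

def choose_subtree(candidates, alpha):
    """candidates: list of (name, sse, n_leaves); return the winning name."""
    return min(candidates,
               key=lambda c: penalized_error(c[1], c[2], alpha))[0]

candidates = [
    ("full tree", 10.0, 8),  # best training fit, most complex
    ("pruned",    14.0, 4),
    ("stump",     30.0, 2),
]
print(choose_subtree(candidates, alpha=0.0))  # full tree
print(choose_subtree(candidates, alpha=2.0))  # pruned
```

In rpart this trade-off is controlled through the complexity parameter cp, and plotcp() displays the cross-validated error across values of it.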
Implementation of Regression Tree

R package rpart
Using rattle()

Rattle is a free graphical user interface for data science, developed using R. R is a free software environment for statistical computing, graphics, machine learning and artificial intelligence. Together, Rattle and R provide a sophisticated environment for data science, statistical analyses, and data visualisation.
Data Description

Body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, are given for 507 physically active individuals: 247 men and 260 women.
Figure: Body Dimension Data
Implementation using rpart
l i b r a r y ( r samp le )l i b r a r y ( r p a r t )l i b r a r y ( r p a r t . p l o t )%l i b r a r y ( Me t r i c s ) #rmse f u n c t i o n
b o d y d im e n s i o n d a t a i n t e r e s t <− r ead . c sv (”˜/ Desktop /Workshop s ta t ml /My ta lk / b o d y d im e n s i o n d a t a i n t e r e s t . c s v ” , row . names=NULL)View ( b o d y d im e n s i o n d a t a i n t e r e s t )
s e t . s eed (123)bodyd im sp l i t<− i n i t i a l s p l i t ( b o d y d im e n s i o n d a t a i n t e r e s t , propo =.7)bodyd im t ra i n<−t r a i n i n g ( b o d y d im s p l i t )bodyd im tes t<−t e s t i n g ( b o d y d im s p l i t )
m1<−r p a r t ( f o rmu la=we ight ˜ . , data=bodyd im t ra i n , method=”anova ”)r p a r t . p l o t (m1) #to v iew the t r e ep l o t c p (m1) #To check f o r the p runn ing
m2 <− r p a r t ( f o rmu la=we ight ˜ . , data= bodyd im t ra i n , method= ”anova ” , c o n t r o l= l i s t ( cp = 0 , x v a l = 10))p l o t c p (m2)
pred <− p r e d i c t (m1, newdata = bodyd im te s t )obs<−bodyd im te s t$we i gh trmse ( pred , obs )
Illustration using rattle()
Advantages / Disadvantages

Advantages
Trees are easy to interpret
Trees can handle multicollinearity
The tree method is nonparametric (assumption-free)

Disadvantages
High variance caused by the hierarchical nature of the process
Lack of smoothness of the predictor surface (which MARS alleviates)
Difficulty in modeling additive structure (which MARS captures)
M.Biswas (BKC College) Regression Tree and Multivariate Adaptive Regression Splines (MARS)July 28, 2020 12 / 20
![Page 35: Regression Tree and Multivariate Adaptive Regression Splines (MARS) · 23 hours ago · For MARS a similar process is used to nd the best split with reference to the deviances from](https://reader034.vdocument.in/reader034/viewer/2022043004/5f884fa992542371da69c9f0/html5/thumbnails/35.jpg)
Advantage/ Disadvantage
Advantages
Trees are easy to interpret
Trees can handle multicolinearity
Tree-method is a non parametric method (assumptions free)
Disadvantages
High variance caused by the hierarchical nature of the process
Lack of smoothness of the predictor surface (MARS alleviate)
Difficulty in modeling additive structure (MARS capture).
MARS
In the regression tree process, the data were partitioned in a way that produced the "best" split with reference to the deviances from the mean on either side of the split.
For MARS, a similar process is used to find the best split with reference to the deviances from a spline function on either side of the split.
The spline functions used by MARS are:
(x − t)+ = x − t if x > t, and 0 otherwise,
and (t − x)+ = t − x if t > x, and 0 otherwise.
Each function is piecewise linear. By multiplying these splines together it is possible to produce quadratic or cubic curves.
The pair of functions (X − t)+, (t − X)+ is called a reflected pair, while t is called a knot.
Recall: a regression tree uses as basis functions I(Xj > c) and I(Xj ≤ c).
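The implementation later in these slides is in R, but the hinge functions themselves are easy to state in any language. A minimal Python sketch (the names hinge_right and hinge_left are my own):

```python
def pos_part(u):
    """(u)+ : u if u > 0, else 0."""
    return u if u > 0 else 0.0

def hinge_right(x, t):
    # (x - t)+ : nonzero only to the right of the knot t
    return pos_part(x - t)

def hinge_left(x, t):
    # (t - x)+ : nonzero only to the left of the knot t
    return pos_part(t - x)

# A reflected pair at knot t = 2:
print(hinge_right(3.0, 2.0))  # 1.0
print(hinge_left(3.0, 2.0))   # 0.0
# A product of two hinges is piecewise quadratic:
print(hinge_right(3.0, 2.0) * hinge_right(3.0, 1.0))  # 2.0
```

Each hinge is zero on one side of its knot and linear on the other, which is exactly what makes the fitted surface continuous where the indicator basis of a tree would jump.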
Illustrative Example
Multivariate Adaptive Regression Splines
Data: (yi , xi ), i = 1, . . . , n, where xi = (xi1, . . . , xij , . . . , xip)′
Consider basis functions of the form
C = {(Xj − xij)+, (xij − Xj)+ : i = 1, . . . , n, j = 1, . . . , p}
Model building strategy
The forward pass
f(X) = β0 + ∑_{m=1}^{M} βm hm(X)
Each hm(X) is either a function in C or a product of two or more such functions
In each step the βm's are estimated by minimizing the residual sum of squares
The backward pass
It prunes the model into its most effective part
The forward pass
Step 1: Start with h0(X) = 1; f^(1) = β0^(1); M(1) = {h0(X)}
Step 2: Add to the model a function of the form b1(Xj − t)+ + b2(t − Xj)+, with t ∈ {x1j, . . . , xNj}, that produces the largest decrease in training error. Say this is achieved by j = J and t = xkJ.
Model: f^(2) = β0^(2) + β1^(2)(XJ − xkJ)+ + β2^(2)(xkJ − XJ)+,
M(2) = {h0(X), h1(X), h2(X)}, where h1(X) = (XJ − xkJ)+ etc.
. . .
Step m + 1: Add to the model a function of the form b2m−1 hl(X)(Xj − t)+ + b2m hl(X)(t − Xj)+, with hl(X) ∈ M(m), that produces the largest decrease in training error. Say this is achieved by j = J′, t = xk′J′ and l = L.
M(m+1) = M(m) ∪ {h2m−1(X), h2m(X)}, where
h2m−1(X) = hL(X)(XJ′ − xk′J′)+ and h2m(X) = hL(X)(xk′J′ − XJ′)+
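The steps above can be sketched as a brute-force Python implementation. This is illustrative only: knots are restricted to observed data values, the model is refit by ordinary least squares for every candidate pair, and none of earth's speedups or term restrictions are imposed.

```python
import numpy as np

def mars_forward_pass(X, y, max_terms=5):
    """Greedy forward pass: repeatedly add the reflected pair
    h_l(x)*(x_j - t)+ , h_l(x)*(t - x_j)+ that most reduces the RSS."""
    n, p = X.shape
    basis = [np.ones(n)]          # h0(X) = 1
    terms = [("intercept",)]
    while len(basis) < max_terms:
        best = None
        for l, hl in enumerate(basis):          # existing term to multiply
            for j in range(p):                  # candidate variable
                for t in X[:, j]:               # candidate knot (data values)
                    b1 = hl * np.maximum(X[:, j] - t, 0.0)
                    b2 = hl * np.maximum(t - X[:, j], 0.0)
                    B = np.column_stack(basis + [b1, b2])
                    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
                    rss = float(np.sum((y - B @ coef) ** 2))
                    if best is None or rss < best[0]:
                        best = (rss, b1, b2, (l, j, float(t)))
        _, b1, b2, info = best
        basis.extend([b1, b2])                  # keep the winning pair
        terms.extend([info + ("+",), info + ("-",)])
    B = np.column_stack(basis)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B, coef, terms

# Tiny demo: the response has a kink exactly at one of the data values,
# so a single reflected pair at that knot fits it exactly.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 1))
knot = X[0, 0]
y = 3.0 * np.maximum(X[:, 0] - knot, 0.0)
B, coef, terms = mars_forward_pass(X, y, max_terms=3)
print(float(np.sum((y - B @ coef) ** 2)) < 1e-8)  # True
```

Because every data value of every variable is a candidate knot, each step costs O(n·p·|M|) least-squares fits, which is why earth's real implementation uses incremental updating rather than refitting from scratch.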
The backward pass
The forward pass algorithm stops when the model set contains some preset number of terms.
Stopping rule: one can also fix the maximum degree of the interaction terms
The resulting model typically overfits the data, since it includes a large number of terms
Backward pass prunes the model
It deletes the least effective terms one by one
Generalized cross-validation is used to choose the size of the model and the most effective subset of terms:
GCV(λ) = ∑_{i=1}^{n} (yi − fλ(xi))^2 / (1 − M(λ)/n)^2,
where λ is the size of the model and M(λ) is the effective number of parameters in the model.
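The GCV criterion is a one-liner once M(λ) is known; a small Python sketch (the function name gcv and passing M(λ) directly as m_eff are my own choices — in MARS, M(λ) is usually derived from the number of terms and knots rather than supplied by hand):

```python
def gcv(y, yhat, m_eff):
    """GCV(lambda) = sum((y_i - f(x_i))^2) / (1 - M/n)^2,
    where m_eff is the effective number of parameters M(lambda)."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return rss / (1.0 - m_eff / n) ** 2

y    = [1.0, 2.0, 3.0, 4.0]
yhat = [1.1, 1.9, 3.2, 3.8]
print(round(gcv(y, yhat, m_eff=2), 4))  # 0.4
```

The denominator (1 − M/n)^2 inflates the training RSS as the model grows, so the backward pass can compare models of different sizes without a held-out set.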
Implementation of MARS in R
# Implementation of MARS
library(earth)
library(caret)

mars1 <- earth(weight ~ ., data = bodydim_train)
print(mars1)    # for a model summary
summary(mars1)
plot(mars1, which = 1)

mars2 <- earth(weight ~ ., data = bodydim_train, degree = 2)
summary(mars2)
plot(mars2, which = 1, legend.pos = 0)

# Performance of the model on the test dataset
summary(mars1, newdata = bodydim_test)
yhat <- predict(mars1, newdata = bodydim_test)
yobs <- bodydim_test$weight
rmse(yhat, yobs)
Advantage and Disadvantage of MARS
Advantages:
Accurate if the local linear relationships are correct.
Quick computation.
Can work well with both large and small data sets.
Provides automated feature selection.
The non-linear relationships between the features and the response are fairly intuitive.
Can be used for both regression and classification problems.
Does not require feature standardization.
Disadvantages:
Not accurate if the local linear relationships are incorrect.
Typically not as accurate as more advanced non-linear algorithms (random forests, gradient boosting machines).
The earth package does not incorporate more advanced spline features (e.g. piecewise cubic models).
Missing values must be pre-processed.
Thank You